ۿ۴ý

Nonparametric Iterated-Logarithm Extensions of the Sequential Generalized Likelihood Ratio Test

Submitted by admin on Mon, 10/28/2024 - 01:24

We develop a nonparametric extension of the sequential generalized likelihood ratio (GLR) test and corresponding time-uniform confidence sequences for the mean of a univariate distribution. By utilizing a geometric interpretation of the GLR statistic, we derive a simple analytic upper bound on the probability that it exceeds any prespecified boundary; these are intractable to approximate via simulations due to infinite horizon of the tests and the composite nonparametric nulls under consideration.

Cautious Reinforcement Learning via Distributional Risk in the Dual Domain

Submitted by admin on Mon, 10/28/2024 - 01:24

We study the estimation of risk-sensitive policies in reinforcement learning (RL) problems defined by a Markov Decision Process (MDPs) whose state and action spaces are countably finite. Prior efforts are predominately afflicted by computational challenges associated with the fact that risk-sensitive MDPs are time-inconsistent, and hence modifications of Bellman's equations are required, which often do not permit efficient fixed point schemes.

Asynchronous Decentralized Accelerated Stochastic Gradient Descent

Submitted by admin on Mon, 10/28/2024 - 01:24

In this paper, we introduce an asynchronous decentralized accelerated stochastic gradient descent type of algorithm for decentralized stochastic optimization. Considering communication and synchronization costs are the major bottlenecks for decentralized optimization, we attempt to reduce these costs from an algorithmic design aspect, in particular, we are able to reduce the number of agents involved in one round of update via randomization.

Curiosity Killed or Incapacitated the Cat and the Asymptotically Optimal Agent

Submitted by admin on Mon, 10/28/2024 - 01:24

Reinforcement learners are agents that learn to pick actions that lead to high reward. Ideally, the value of a reinforcement learner’s policy approaches optimality—where the optimal informed policy is the one which maximizes reward. Unfortunately, we show that if an agent is guaranteed to be “asymptotically optimal” in any (stochastically computable) environment, then subject to an assumption about the true environment, this agent will be either “destroyed” or “incapacitated” with probability 1.

Asynchronous Delayed Optimization With Time-Varying Minibatches

Submitted by admin on Mon, 10/28/2024 - 01:24

Large-scale learning and optimization problems are often solved in parallel. In a master-worker distributed setup, worker nodes are most often assigned fixed-sized minibatches of data points to process. However, workers may take different amounts of time to complete their per-batch calculations. To deal with such variability in processing times, an alternative approach has recently been proposed wherein each worker is assigned a fixed duration to complete the calculations associated with each batch.

Robust Change Detection via Information Projection

Submitted by admin on Mon, 10/28/2024 - 01:24

We study the robust transient and quickest change detection problems with unknown post-change distributions under finite alphabets. When the distribution after the change point is unknown, the change in the distribution of observations may occur in multiple ways without much structure on the observations, whereas, before the change point, a false alarm is highly structured, following a particular sample path with respect to the distribution of observations and the detection scheme.

Bandit-Based Monte Carlo Optimization for Nearest Neighbors

Submitted by admin on Mon, 10/28/2024 - 01:24

The celebrated Monte Carlo method estimates an expensive-to-compute quantity by random sampling. Bandit-based Monte Carlo optimization is a general technique for computing the minimum of many such expensive-to-compute quantities by adaptive random sampling. The technique converts an optimization problem into a statistical estimation problem which is then solved via multi-armed bandits.

On No-Sensing Adversarial Multi-Player Multi-Armed Bandits With Collision Communications

Submitted by admin on Mon, 10/28/2024 - 01:24

We study the notoriously difficult no-sensing adversarial multi-player multi-armed bandits (MP-MAB) problem from a new perspective. Instead of focusing on the hardness of multiple players, we introduce a new dimension of hardness, called attackability. All adversaries can be categorized based on the attackability and we introduce Adversary-Adaptive Collision-Communication (A2C2), a family of algorithms with forced-collision communication among players.

Quickest Detection of Moving Anomalies in Sensor Networks

Submitted by admin on Mon, 10/28/2024 - 01:24

The problem of sequentially detecting a moving anomaly is studied, in which the anomaly affects different parts of a sensor network over time. Each network sensor is characterized by a pre- and post-change distribution. Initially, the observations of each sensor are generated according to the corresponding pre-change distribution. After some unknown but deterministic time instant, a moving anomaly emerges, affecting different sets of sensors as time progresses.

Universal Active Learning via Conditional Mutual Information Minimization

Submitted by admin on Mon, 10/28/2024 - 01:24

Modern machine learning systems require massive amounts of labeled training data in order to achieve high accuracy rates which is very expensive in terms of time and cost. Active learning is an approach which uses feedback to only label the most informative data points and significantly reduce the labeling effort. Many heuristics for selecting data points have been developed in recent years which are usually tailored to a specific task and a general unified framework is lacking.