# Efficient algorithms for high-dimensional bandit problems¶

Dr. Daniel Hsu

Microsoft Research New England

Abstract:
Many online decision problems are characterized by limited feedback: the agent observes the outcome only for the action it chose, and learns nothing about the other actions it could have taken. In such problems, the agent faces the exploration/exploitation dilemma: whether to spend time learning more about possible actions, or to choose actions previously discovered to yield high reward. This is neatly captured in the classical multi-armed bandit problem, for which optimal solutions are now well-known. However, modern applications go beyond the classical framework. For instance, the number of actions available to the agent may be very large or even infinite, or there may be no single action that gives high reward. In each of these cases, additional structure must be exploited to achieve good performance.
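To make the classical setting concrete, here is a minimal sketch of UCB1, one of the standard index policies for the multi-armed bandit; it is illustrative only and is not one of the algorithms presented in the talk. The function names `ucb1` and `pull` are my own, and `pull(arm)` is assumed to return a reward in [0, 1] for the chosen arm only, matching the limited-feedback setting described above.

```python
import math

def ucb1(num_arms, pull, horizon):
    """UCB1 sketch for the classical multi-armed bandit.

    `pull(arm)` returns a reward in [0, 1]; only the chosen arm's
    reward is observed (bandit feedback).
    """
    counts = [0] * num_arms     # number of times each arm was pulled
    means = [0.0] * num_arms    # running average reward per arm
    total_reward = 0.0
    for t in range(1, horizon + 1):
        if t <= num_arms:
            arm = t - 1  # pull each arm once to initialize estimates
        else:
            # optimism in the face of uncertainty: mean + confidence bonus
            arm = max(range(num_arms),
                      key=lambda a: means[a] + math.sqrt(2.0 * math.log(t) / counts[a]))
        r = pull(arm)
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]  # incremental mean update
        total_reward += r
    return counts, total_reward
```

The confidence bonus shrinks as an arm is pulled more often, so the algorithm automatically shifts from exploration to exploitation.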
In the first part of the talk, I'll present an algorithm for the convex bandit problem, where the set of actions is an arbitrary compact and convex subset of d-dimensional Euclidean space (and thus uncountably infinite), but the cost function is convex. Here, the challenge is to avoid the curse of dimensionality, both in terms of the performance measure as well as in the running-time of the decision-making algorithm.
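For intuition on how one can optimize a convex cost with only bandit (scalar) feedback, here is a sketch of the classical one-point gradient-estimation approach in the style of Flaxman, Kalai, and McMahan; it is not the talk's algorithm, which specifically targets the curse of dimensionality. All parameter names (`delta`, `eta`, `radius`) are illustrative defaults, and the action set is taken to be a Euclidean ball for simplicity.

```python
import random

def one_point_gradient_descent(loss, dim, horizon, delta=0.1, eta=0.01, radius=1.0):
    """Bandit convex optimization sketch via a one-point gradient estimate.

    `loss(x)` returns only the scalar cost of playing point x; no
    gradients are ever observed.
    """
    x = [0.0] * dim
    for _ in range(horizon):
        # sample a direction u uniformly from the unit sphere
        u = [random.gauss(0.0, 1.0) for _ in range(dim)]
        norm = sum(c * c for c in u) ** 0.5
        u = [c / norm for c in u]
        y = [xi + delta * ui for xi, ui in zip(x, u)]  # perturbed play point
        c = loss(y)                                    # single scalar feedback
        # one-point estimate of the gradient of the smoothed loss
        grad_est = [dim * c / delta * ui for ui in u]
        x = [xi - eta * gi for xi, gi in zip(x, grad_est)]
        # project back onto the ball (the convex action set)
        nx = sum(xi * xi for xi in x) ** 0.5
        if nx > radius:
            x = [xi * radius / nx for xi in x]
    return x
```

The single evaluation `loss(y)` at a randomly perturbed point yields a (high-variance) gradient estimate, which is exactly where the dimension dependence the talk aims to control enters.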
In the second part of the talk, I'll describe a setting where context information is available to the agent in each round; the goal is then to compete against the best policy (mapping contexts to actions) in some class, rather than just the best single action. A crucial feature of the algorithm I'll describe is its polylog(N) running time, where N is the number of policies in the class (and is typically very large, e.g., exponential in natural dimension quantities).
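As a point of contrast for the polylog(N) result, here is a naive epsilon-greedy contextual bandit baseline that uses inverse-propensity scoring (IPS) to estimate every policy's value; it spends O(N) time per round enumerating the class, which is exactly the cost the talk's algorithm avoids. The function and variable names are my own, and `policies` is assumed to be an explicit list of context-to-action functions.

```python
import random

def epsilon_greedy_contextual(policies, rounds, epsilon=0.1, seed=0):
    """Naive O(N)-per-round contextual bandit baseline with IPS.

    `policies` is a list of functions mapping a context to an action
    index; `rounds` yields (context, reward_vector) pairs, but only the
    chosen action's reward entry is revealed to the learner.
    Returns the index of the policy with the highest IPS value estimate.
    """
    rng = random.Random(seed)
    n_policies = len(policies)
    ips_value = [0.0] * n_policies  # cumulative IPS value estimate per policy
    for context, rewards in rounds:
        n_actions = len(rewards)
        best = max(range(n_policies), key=lambda i: ips_value[i])
        greedy_action = policies[best](context)
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)  # explore uniformly
        else:
            action = greedy_action             # exploit current best policy
        # marginal probability the chosen action was played (for IPS)
        p = epsilon / n_actions + (1 - epsilon) * (action == greedy_action)
        r = rewards[action]  # bandit feedback: only this entry is seen
        for i, pi in enumerate(policies):
            if pi(context) == action:
                ips_value[i] += r / p  # unbiased importance-weighted credit
    return max(range(n_policies), key=lambda i: ips_value[i])
```

The importance weight `r / p` keeps each policy's value estimate unbiased despite the partial feedback, but the per-round loop over all N policies is what a polylog(N) algorithm must dispense with.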
This talk is based on joint work with Alekh Agarwal, Dean Foster, Sham Kakade, Sasha Rakhlin; and with Miro Dudik, Satyen Kale, Nikos Karampatziakis, John Langford, Lev Reyzin, and Tong Zhang.

Bio:
Daniel Hsu is currently a postdoc at Microsoft Research New England.
He received his Ph.D. in Computer Science in 2010 from the Department of Computer Science and Engineering at UC San Diego, where he was advised by Sanjoy Dasgupta. In 2010-2011, he was a postdoc in the Department of Statistics at Rutgers University and the Department of Statistics at the University of Pennsylvania, supervised by Tong Zhang and Sham M. Kakade. His research interests are in algorithmic statistics and machine learning.