Advanced Optimization (2023 Fall)

Course Information

This course aims to provide an overview of modern optimization theory designed for machine learning, particularly focusing on the gradient-based first-order optimization methods (with offline and online deployments), which serve as the foundational optimization tools for modern large-scale learning tasks.



Course agenda

Week Date Topic
Lecture Notes/ Readings
1 09.22 Introduction; Mathematical Background Lecture 1 on matrix norm
09.29 no lecture (due to National Holiday)
2 10.08 Convex Optimization Basics; Function Properties Lecture 2 Chapter 3.2 of Boyd and Vandenberghe’s book (on operations that preserve convexity)
Chapter 3 of Amir Beck’s book (on subgradients)
Chapter 5 of Amir Beck’s book (on smoothness and strong convexity)
3 10.13 GD Methods I: GD method, Lipschitz optimization Lecture 3 Chapter 8.2 of Amir Beck’s book (on GD methods for convex and strongly convex functions)
4 10.20 GD Methods II: GD method, smooth optimization, Nesterov’s AGD, composite optimization Lecture 4 Chapter 3.2 of Bubeck’s book (on GD methods for smooth functions)
Chapter 14 & 15 of Ashok Cutkosky’s lecture note (on momentum and acceleration)
Chapter 10 of Amir Beck’s book (on composite optimization and proximal gradient)
5 10.27 Online Convex Optimization: OGD, convex functions, strongly convex functions, online Newton step, exp-concave functions Lecture 5 Chapter 3 of Hazan’s book (on OGD for convex and strongly convex functions)
Chapter 4 of Hazan’s book (on ONS for exp-concave functions)
11.03 no lecture (due to Sports Day)
6 11.07 Prediction with Expert Advice: Hedge, minimax bound, lower bound; mirror descent (motivation and preliminary) Lecture 6 Lecture Note 2 of Luo’s course (on PEA problem)
7 11.10 Online Mirror Descent: OMD framework, regret analysis, primal-dual view, mirror map, FTRL, dual averaging Lecture 7 Chapter 6 of Orabona’s note (on OMD)
Chapter 7 of Orabona’s note (on FTRL)
Chapter 4 of Bubeck’s book (on mirror map, dual averaging)
8 11.17 Adaptive Online Convex Optimization: problem-dependent guarantee, small-loss bound, self-confident tuning, small-loss OCO, self-bounding property bound Lecture 8 Lecture Note 4 of Luo’s course (on small-loss PEA)
Chapter 4.2 of Orabona’s note (on small-loss OCO)
9 11.24 Optimistic Online Mirror Descent: optimistic online learning, predictable sequence, small-loss bound, gradient-variance bound, gradient-variation bound Lecture 9
Chapter 7.12 of Orabona’s note (on variance/variation bound of OCO)
10 12.01 Online Learning in Games: two-player zero-sum games, repeated play, minimax theorem, fast convergence Lecture 10 Chapter 8 of Hazan’s book (on games and repeated play)
Lecture Note 7 of Luo’s course (on fast-convergence games )
11 12.08 Adversarial Bandits: MAB, IW estimator, Exp3, lower bound, BCO, gradient estimator, self-concordant barrier Lecture 11 Lecture Note 12 of Luo’s course (on adversarial MAB)
Chapter 12 of Lattimore and Szepesvári’s book (on Exp3.IX algorithm for high-probability regret)
12 12.15 Stochastic Bandits: MAB, UCB, linear bandits, self-normalized concentration, generalized linear bandits Lecture 12 Lecture Note 14 of Luo’s course (on stochastic MAB)
Lecture Note 15 of Luo’s course (on stochastic linear bandits)
Chapter 6 & 7 of Lattimore and Szepesvári’s book (on ETC and UCB)
12.22 no lecture
13 12.29 Advanced Topics: non-stationary online learning, universal online learning, online ensemble, base algorithm, meta algorithm Lecture 13 see the references in the slides


Familiar with calculus, probability, and linear algebra. Basic knowledge in convex optimization and machine learning.


Unfortunately, we don’t have a specific textbook for this course. In addition to the course slides and lecture notes (will write if time permits), the following books are very good materials for extra readings.

Some related courses:

Last modified: 2023-12-29 by Peng Zhao