Course topics
Module 1: Bandit algorithms.
- Markov Decision Problems and Dynamic Programming.
- Practice: programming some bandit algorithms. Bandit algorithms for stock picking.
- Tabular methods (Monte Carlo and Temporal Difference).
- Practice: implement some of these methods in OpenAI Gym.
- On-policy prediction and control with function approximation. Deep Reinforcement Learning.
- Practice: OpenAI Gym (FrozenLake/MountainCar).
- Policy Optimization / Policy gradients.
- Practice: OpenAI Gym (Pong).
- Two-player games. Evolutionary games.
- Practice: Counterfactual Regret Minimization. Evolutionary game theory.
- Meta-learning.
- Learning through self-play.
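
To give a flavor of the first practice topic (programming some bandit algorithms), here is a minimal sketch of an epsilon-greedy agent on a Gaussian multi-armed bandit. It is an illustration only, not the course's own material: the function name epsilon_greedy, the arm means, the unit reward noise, and the default parameters are all assumptions chosen for the example.

```python
import random


def epsilon_greedy(true_means, steps=5000, epsilon=0.1, seed=0):
    """Run epsilon-greedy on a bandit with Gaussian rewards (unit variance).

    true_means is a hypothetical list of per-arm mean rewards; the agent
    never sees it directly, only the noisy rewards it samples.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms        # number of pulls per arm
    estimates = [0.0] * n_arms   # running mean reward per arm
    total_reward = 0.0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the highest current estimate.
            arm = max(range(n_arms), key=lambda a: estimates[a])

        reward = rng.gauss(true_means[arm], 1.0)
        counts[arm] += 1
        # Incremental update of the sample mean for the pulled arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward

    return estimates, counts, total_reward


if __name__ == "__main__":
    est, cnt, total = epsilon_greedy([0.0, 1.0, 2.0])
    print("estimated means:", [round(e, 2) for e in est])
    print("pull counts:", cnt)
    print("average reward:", round(total / sum(cnt), 3))
```

With a fixed exploration rate, the agent keeps pulling suboptimal arms a constant fraction of the time, which is one reason a course on bandits would typically go on to compare this baseline with other strategies.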