Deep Learning

Course topics

Part 1. Training Multilayer Perceptrons

  • Multilayer Perceptrons
  • General learning theory
  • Derivation of backpropagation
  • MLP for classification: one-hot encoding, softmax, and the cross-entropy loss function (see the sketch below).
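
A minimal NumPy sketch of the pieces named in this list: one-hot encoding of class labels, a numerically stable softmax, and the cross-entropy loss. The logits and labels are toy values invented for illustration:

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(probs, targets_one_hot, eps=1e-12):
    """Mean cross-entropy between predicted probabilities and one-hot targets."""
    return -np.mean(np.sum(targets_one_hot * np.log(probs + eps), axis=1))

# Toy batch: 3 samples, 4 classes (values invented for illustration).
logits = np.array([[2.0, 1.0, 0.1, -1.0],
                   [0.5, 2.5, 0.3,  0.0],
                   [1.0, 1.0, 1.0,  1.0]])
labels = np.array([0, 1, 3])
one_hot = np.eye(4)[labels]          # one-hot encoding of the class labels

probs = softmax(logits)
print("probabilities:\n", probs.round(3))
print("cross-entropy loss:", round(cross_entropy(probs, one_hot), 3))
```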

Part 2. Deep Convolutional Neural Networks

  • Convolutional Neural Networks
  • Cortical receptive fields
  • Comparison of MLP and CNN architectures.
  • Convolutional net building blocks: convolution, pooling, non-linear activation functions, fully-connected layers (see the sketch below).
  • 1D/2D convolution operations.
  • ConvNet architectures: AlexNet, Network in Network, VGG, ResNet, SqueezeNet.
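
A minimal NumPy sketch of the basic ConvNet blocks listed above: a naive 2D convolution ("valid" mode), a ReLU non-linearity, and max pooling. The input image and filter are toy values chosen only for illustration:

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Naive 2D cross-correlation ('valid' padding), as computed in CNN layers."""
    H, W = image.shape
    kH, kW = kernel.shape
    out = np.zeros((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kH, j:j + kW] * kernel)
    return out

def max_pool2d(x, size=2):
    """Non-overlapping max pooling with a size x size window."""
    H, W = x.shape
    H2, W2 = H // size, W // size
    return x[:H2 * size, :W2 * size].reshape(H2, size, W2, size).max(axis=(1, 3))

image = np.arange(36, dtype=float).reshape(6, 6)        # toy 6x6 input
kernel = np.array([[-1.0, 0.0, 1.0]] * 3)               # simple horizontal-gradient filter
feature_map = np.maximum(conv2d_valid(image, kernel), 0.0)   # ReLU activation
print(max_pool2d(feature_map))                          # pooled feature map
```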

Part 3. Regularization & Optimization for Deep Learning

  • Regularization: the idea and historical issues.
  • Model capacity control: the AIC criterion.
  • Ensemble models: averaging over bootstrap samples (bagging).
  • Injecting noise into the output targets.
  • Gradient-based optimization of the error function.
  • Batch gradient descent, stochastic gradient descent, and the use of momentum.
  • Nesterov accelerated gradient.
  • Adaptive learning-rate methods: AdaDelta, AdaGrad, RMSProp, Adam (momentum and Adam updates are sketched below).
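
A minimal NumPy sketch of two of the update rules above, SGD with momentum and Adam, applied to a toy quadratic loss f(w) = 0.5·||w||². The loss function and all hyperparameter values are assumptions made for illustration:

```python
import numpy as np

def grad(w):
    """Gradient of the toy quadratic loss f(w) = 0.5 * ||w||^2."""
    return w

w_sgd = np.array([1.0, -2.0]); v = np.zeros(2)            # SGD with momentum
w_adam = w_sgd.copy(); m = np.zeros(2); s = np.zeros(2)    # Adam moment estimates
lr, beta, beta1, beta2, eps = 0.1, 0.9, 0.9, 0.999, 1e-8

for t in range(1, 101):
    # Momentum: accumulate a velocity, then step along it.
    v = beta * v + grad(w_sgd)
    w_sgd -= lr * v

    # Adam: bias-corrected first and second moments of the gradient.
    g = grad(w_adam)
    m = beta1 * m + (1 - beta1) * g
    s = beta2 * s + (1 - beta2) * g**2
    m_hat, s_hat = m / (1 - beta1**t), s / (1 - beta2**t)
    w_adam -= lr * m_hat / (np.sqrt(s_hat) + eps)

print(w_sgd, w_adam)    # both should approach the minimum at the origin
```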

Part 4. The Vanishing Gradient Effect. Recurrent Neural Networks

  • Motivation for going deeper: overview of the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) winners, 2011-2015.
  • Smart initialization: derivation of the Xavier (Glorot) approach (see the sketch below).
  • Batch normalization.
  • Smart orthogonal initialization.
  • Dynamic Neural Networks
  • Modes of RNN operation. Backpropagation Through Time (BPTT) for recurrent networks.
  • Simple Recurrent Neural Network (SRN). SRN Forward Dynamics.
  • Simple Recurrent Neural Network backpropagated through time. Effect of different weight initializations for the SRN.
  • Long Short-Term Memory (LSTM)
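
A minimal NumPy sketch of why initialization scale matters in deep nets: it propagates a random batch through 20 tanh layers and compares a too-small weight scale with Xavier/Glorot scaling. Here fan_in = fan_out = n, so sqrt(1/n) coincides with the Glorot formula sqrt(2/(fan_in + fan_out)); the layer width, depth, and batch size are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n, depth = 256, 20
x = rng.standard_normal((1000, n))          # a random input batch

def forward(x, scale):
    """Propagate through `depth` tanh layers and record the activation std per layer."""
    stds, h = [], x
    for _ in range(depth):
        W = rng.standard_normal((n, n)) * scale
        h = np.tanh(h @ W)
        stds.append(h.std())
    return stds

naive = forward(x, 0.01)                    # too-small weights: the signal vanishes
xavier = forward(x, np.sqrt(1.0 / n))       # Xavier/Glorot scaling keeps it stable
print("naive  last-layer activation std:", round(naive[-1], 4))
print("xavier last-layer activation std:", round(xavier[-1], 4))
```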

Part 5. Learning Representations

  • Learning of Representations (Features).
  • Unsupervised pretraining.
  • Autoencoders (see the sketch below).
  • Undercomplete, sparse, and denoising autoencoders.
  • Unsupervised greedy layer-wise pretraining.
  • Deep, big, simple neural nets (Schmidhuber, 2012): no pre-training, simple backpropagation + gradient descent.
  • Case study: DeepFace (Facebook, 2014), supervised pre-training for face recognition.
  • Databases for DeepFace pre-training.
  • Transfer Learning: CNNs as feature extractors. Caffe Model Zoo.
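
A minimal NumPy sketch of an undercomplete (linear) autoencoder trained with plain backpropagation and gradient descent. The synthetic rank-2 data, layer sizes, and learning rate are assumptions made for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
# Data lying on a 2-D subspace of R^3, so a 2-unit bottleneck can reconstruct it.
X = rng.standard_normal((500, 2)) @ np.array([[1.0, 0.5, -0.5],
                                              [0.0, 1.0,  1.0]])

n_in, n_hidden, lr = 3, 2, 0.05
W1 = rng.standard_normal((n_in, n_hidden)) * 0.1; b1 = np.zeros(n_hidden)
W2 = rng.standard_normal((n_hidden, n_in)) * 0.1; b2 = np.zeros(n_in)

for epoch in range(2000):
    H = X @ W1 + b1            # encoder (bottleneck code)
    Xr = H @ W2 + b2           # decoder (reconstruction)
    err = Xr - X
    loss = np.mean(err**2)
    # Backpropagate the mean-squared reconstruction error.
    dXr = 2 * err / X.shape[0]
    dW2 = H.T @ dXr;  db2 = dXr.sum(axis=0)
    dH = dXr @ W2.T
    dW1 = X.T @ dH;   db1 = dH.sum(axis=0)
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print("final reconstruction MSE:", round(loss, 6))   # should approach zero for this rank-2 data
```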

Part 6. Deep Learning for Natural Language Processing

  • Sequential and tree-like data structures in NLP.
  • Traditional feature vectors for texts: structural, lexical, syntactic and other features.  
  • Training representations for texts: word2vec.
  • The Distributional Hypothesis. word2vec: two basic neural network models, CBOW and skip-gram (a skip-gram sketch follows below).
  • Multi-task learning for NLP. Neural Machine Translation (NMT). Sequence-to-sequence model, RNN encoder-decoder.
  • Beam search. Attention mechanism for NMT.
  • Attention beyond NMT: generating descriptions from images (image captioning).
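
A minimal NumPy sketch of the word2vec skip-gram model with a full softmax over a toy vocabulary. Real word2vec training uses far larger corpora and negative sampling or hierarchical softmax; the corpus, embedding size, and hyperparameters here are invented for illustration:

```python
import numpy as np

corpus = "the cat sat on the mat the dog sat on the rug".split()
vocab = sorted(set(corpus))
word_to_id = {w: i for i, w in enumerate(vocab)}
V, D, window, lr = len(vocab), 8, 2, 0.05

rng = np.random.default_rng(0)
W_in = rng.standard_normal((V, D)) * 0.1    # center-word (input) embeddings
W_out = rng.standard_normal((V, D)) * 0.1   # context-word (output) embeddings

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

for epoch in range(200):
    for pos, word in enumerate(corpus):
        center = word_to_id[word]
        for offset in range(-window, window + 1):
            ctx_pos = pos + offset
            if offset == 0 or not (0 <= ctx_pos < len(corpus)):
                continue
            context = word_to_id[corpus[ctx_pos]]
            v = W_in[center]
            p = softmax(W_out @ v)              # predicted context distribution
            dscores = p.copy()
            dscores[context] -= 1.0             # gradient of cross-entropy: p - one_hot(context)
            grad_in = W_out.T @ dscores
            grad_out = np.outer(dscores, v)
            W_in[center] -= lr * grad_in
            W_out -= lr * grad_out

def nearest(word, k=2):
    """Cosine-similarity neighbours of `word` among the input embeddings."""
    v = W_in[word_to_id[word]]
    sims = W_in @ v / (np.linalg.norm(W_in, axis=1) * np.linalg.norm(v) + 1e-9)
    return [vocab[i] for i in np.argsort(-sims)[1:k + 1]]

print("nearest to 'cat':", nearest("cat"))      # words sharing contexts tend to drift together
```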

Part 7. Neural Generative Models

  • Discriminative and generative neural models
  • PixelCNN / PixelRNN. Variational autoencoders.
  • Forward and reverse Kullback-Leibler divergence (see the sketch below).
  • Generative Adversarial Networks.
  • Convolutional Architectures for GANs.
  • Conditional GANs, supervised and unsupervised settings: pix2pix, CycleGAN.
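
A minimal NumPy sketch contrasting forward and reverse KL divergence on two hand-picked discrete distributions (the probability vectors are invented for illustration): forward KL(p||q) grows quickly when the model q assigns little mass to a mode of the data p, while reverse KL(q||p) penalizes mass the model puts where the data has little:

```python
import numpy as np

def kl(p, q, eps=1e-12):
    """KL(p || q) for discrete distributions given as probability vectors."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return np.sum(p * np.log((p + eps) / (q + eps)))

# A bimodal "data" distribution and a "model" covering mainly one of its modes.
p_data = np.array([0.45, 0.05, 0.05, 0.45])
q_model = np.array([0.80, 0.10, 0.05, 0.05])

print("forward KL(p||q):", round(kl(p_data, q_model), 3))  # large: q misses the second mode
print("reverse KL(q||p):", round(kl(q_model, p_data), 3))  # smaller: q stays inside one mode
```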

Part 8. Neural Networks for Control

  • The AI approach to control; the difference between static and dynamic systems.
  • Control of dynamic plants; examples of plant states.
  • Direct Inverse Neurocontrol.
  • Model Reference Adaptive Neurocontrol.
  • Backpropagation Through Time for Neurocontrol (Danil Prokhorov’s models).
  • Cascade training of differentiable models, Generative Adversarial Nets as a side example.
  • Approximate Dynamic Programming (ADP) and Bellman's principle of optimality (see the value-iteration sketch below).
  • A straightforward solution: Model Predictive Control. ADP: policy and value iteration.
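
A minimal value-iteration sketch on a toy deterministic chain "plant" (the states, rewards, and discount factor are invented for illustration), showing the Bellman optimality backup that underlies Approximate Dynamic Programming:

```python
import numpy as np

# Toy deterministic chain: states 0..4, actions move left/right,
# reward 1.0 whenever the next state is the goal state 4.
n_states, gamma = 5, 0.9
actions = [-1, +1]

def step(state, action):
    next_state = min(max(state + action, 0), n_states - 1)
    reward = 1.0 if next_state == n_states - 1 else 0.0
    return next_state, reward

V = np.zeros(n_states)
for _ in range(500):                    # value iteration: repeated Bellman backups
    V_new = np.array([max(r + gamma * V[s2]
                          for s2, r in (step(s, a) for a in actions))
                      for s in range(n_states)])
    delta = np.max(np.abs(V_new - V))
    V = V_new
    if delta < 1e-8:
        break

# Greedy policy with respect to the converged value function.
policy = [max(actions, key=lambda a, s=s: step(s, a)[1] + gamma * V[step(s, a)[0]])
          for s in range(n_states)]
print("V*:", V.round(3))
print("greedy policy (+1 = move toward the goal):", policy)
```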

Prerequisites