Deep Learning for Audio


Course Description

The course focuses on speech synthesis using neural networks. We will cover the key building blocks of an end-to-end speech synthesis system, diving into the details and building intuition for how the whole system works. The course brings together various concepts that are usually learned in high school and later forgotten. As a useful side effect, you’ll also become an expert at reading spectrograms, which is quite a rare skill.
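To make the idea of a spectrogram concrete, here is a minimal sketch that computes a magnitude spectrogram (a short-time Fourier transform with a Hann window) in plain Python. It uses a naive DFT so it runs without dependencies. The function names, `n_fft=64`, `hop=16`, and the 8 kHz test tone are illustrative assumptions, not course material. In practice you would use `torch.stft` in PyTorch, which is far faster.

```python
import cmath
import math

def hann_window(n):
    # Periodic Hann window, a common choice for STFT analysis
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]

def magnitude_spectrogram(signal, n_fft=64, hop=16):
    """Return a list of frames; each frame holds |DFT| for bins 0..n_fft//2."""
    window = hann_window(n_fft)
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        # Cut one frame and taper its edges with the window
        segment = [signal[start + i] * window[i] for i in range(n_fft)]
        # Naive O(n^2) DFT, keeping only the non-negative frequencies
        spectrum = []
        for k in range(n_fft // 2 + 1):
            acc = sum(segment[i] * cmath.exp(-2j * math.pi * k * i / n_fft)
                      for i in range(n_fft))
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames

# Usage: a 1 kHz sine sampled at 8 kHz
sr = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sr) for t in range(1024)]
spec = magnitude_spectrogram(tone, n_fft=64, hop=16)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
print(peak_bin, peak_bin * sr / 64)  # 8 1000.0
```

Each bin spans 8000 / 64 = 125 Hz, so the 1 kHz tone lands exactly on bin 8 and every frame shows a bright horizontal line there. Hann-window leakage puts half of the peak magnitude into the neighboring bins 7 and 9.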

Course tools

  • Python
  • PyTorch deep learning framework
  • Hardware with NVIDIA GPUs

Prerequisites

  • Strong knowledge of Python
  • Being able to read and implement scientific papers
  • Concepts: Fourier transform; Hidden Markov Models; Neural Networks; Dynamic Programming.

Level of complexity: Advanced
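Two of the listed prerequisites, hidden Markov models and dynamic programming, meet in the Viterbi algorithm. As a quick refresher, here is a minimal log-space sketch in plain Python. The function name, the argument layout, and the toy Healthy/Fever model (a common textbook example) are illustrative assumptions and are not taken from the course.

```python
import math

def _log(p):
    # log(0) -> -inf, so impossible transitions are never chosen
    return math.log(p) if p > 0 else float("-inf")

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (most likely state path, its probability) for a discrete HMM."""
    # score[s]: best log-probability of any path ending in state s at the current step
    score = {s: _log(start_p[s]) + _log(emit_p[s][observations[0]]) for s in states}
    backpointers = []
    for obs in observations[1:]:
        prev_score = score
        score, pointers = {}, {}
        for s in states:
            # Dynamic-programming step: keep only the best predecessor of state s
            best_prev = max(states, key=lambda p: prev_score[p] + _log(trans_p[p][s]))
            score[s] = prev_score[best_prev] + _log(trans_p[best_prev][s]) + _log(emit_p[s][obs])
            pointers[s] = best_prev
        backpointers.append(pointers)
    # Trace back from the best final state
    last = max(states, key=lambda s: score[s])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, math.exp(score[last])

# Usage: toy two-state model
states = ("Healthy", "Fever")
start_p = {"Healthy": 0.6, "Fever": 0.4}
trans_p = {"Healthy": {"Healthy": 0.7, "Fever": 0.3},
           "Fever": {"Healthy": 0.4, "Fever": 0.6}}
emit_p = {"Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
          "Fever": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6}}
path, prob = viterbi(("normal", "cold", "dizzy"), states, start_p, trans_p, emit_p)
print(path, round(prob, 5))  # ['Healthy', 'Healthy', 'Fever'] 0.01512
```

Working in log space keeps long sequences (such as thousands of audio frames) from underflowing to zero. Because each step keeps only the best predecessor per state, the cost is O(T·S²) for T observations and S states, instead of enumerating all S^T paths.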


Mr. Taras Sereda

Taras’s path in machine learning began with a passion for visual arts, music, languages, and mathematics. It evolved through several projects ranging from object detection and classification to audio source separation, spoken word detection, and voice generation. He currently collaborates with SyncWords to develop next-generation voice synthesis technology. Fields of interest: deep learning, linear algebra, probability theory.
