Natural Language Processing

Brief summary of the course

This course teaches the fundamental concepts of natural language processing (NLP), one of the most important applied areas of artificial intelligence. It covers the general principles of the field, including a basic understanding of structural linguistics, and traces the progression of NLP approaches from symbolic, rule-based systems through classic machine-learning methods to modern neural-network solutions. The practical topics are NLP techniques for information extraction, classification, sequence modeling, parsing, semantics and sense representation, and language modeling and generation.

Course plan

  1. Introduction to Structural Linguistics
    • Definition and history of structural linguistics
    • Grammar and syntax
    • Parts of speech and morphological analysis
    • Phrase structure and constituent structure analysis
    • Dependency grammar and dependency parsing
    • Named entity recognition
    • Text classification
    • Sentiment analysis
    • Regular expressions
    • Tokenization, stemming, and lemmatization
  2. Full Cycle of NLP Projects, Rule-Based Approaches
    • Overview of the NLP project lifecycle
    • Data collection and pre-processing techniques
    • Text normalization and cleaning
    • Rule-based approaches to NLP, including regular expressions and text classification
    • Named entity recognition
    • Sentiment analysis
    • Text generation
    • Evaluation metrics for NLP projects
    • Handling of out-of-vocabulary words and rare words
    • Techniques for error analysis and model improvement
  3. Bag-of-Words Approach and Unsupervised NLP
    • Introduction to the bag-of-words approach
    • Term frequency and inverse document frequency
    • N-grams and character n-grams
    • Unsupervised NLP techniques, including clustering and dimensionality reduction
    • Latent Dirichlet allocation (LDA) and topic modeling
    • Document similarity and information retrieval
    • Text classification using unsupervised methods
    • Visualizing text data using word clouds and t-SNE plots
  4. Syntactic Parsing and Semantic Analysis
    • Introduction to syntactic parsing
    • Constituency and dependency parsing
    • Part-of-speech tagging
    • Coreference resolution
    • Semantic role labeling
    • Word sense disambiguation
    • Frame semantic parsing
    • Predicate argument structure
    • Textual entailment and semantic relatedness
  5. Language Modeling and Generation
    • N-gram language models
    • Perplexity and evaluation metrics for language models
    • Text generation techniques, including maximum likelihood and sampling-based methods
    • Dialogue generation and chatbots
    • Text style transfer and text manipulation
    • Extractive and abstractive text summarization
    • Evaluating the quality and diversity of generated text
  6. Deep Learning Approaches
    • Overview of deep learning for NLP
    • Convolutional Neural Networks (CNNs) for NLP
    • Recurrent Neural Networks (RNNs) and variants such as LSTMs and GRUs
    • Transformers and self-attention mechanisms
    • BERT and other pre-trained language models
    • Fine-tuning pre-trained models for NLP tasks
    • Sequence labeling tasks, such as part-of-speech tagging and named entity recognition
    • Sentiment analysis and text classification using deep learning
    • Question answering and text generation using deep learning
    • Transfer learning and multi-task learning in NLP

Course topics

  • Introduction to Natural Language Processing
  • Structural Linguistics
  • Working with data: create, get, prepare
  • Rule-based NLP systems
  • Basic classification
  • Bag-of-words model
  • Sequences, n-grams, language modeling
  • Syntax
  • Semantics
  • Unsupervised NLP
  • Deep Learning, RNN for NLP

Learning outcomes

    1. Understanding of the fundamental concepts of structural linguistics, including grammar and syntax.
    2. Knowledge of the full cycle of NLP projects, including data collection, pre-processing, and evaluation.
    3. Familiarity with rule-based approaches to NLP, including regular expressions and text classification.
    4. Understanding of the bag-of-words approach and the ability to apply unsupervised NLP techniques.
    5. Ability to use syntactic parsing and semantic analysis techniques to extract meaning from text.
    6. Knowledge of language modeling and generation techniques, and experience with deep learning approaches to NLP.

Prerequisites