Introduction to Statistical Programming with R

Brief summary of the course

This course covers the features of the R programming language, with an emphasis on functions designed for statistical analysis and modeling. It will make it easier for you to study econometrics, because R lets you carry out statistical data analysis while writing very little code of your own.

Course topics

Parts 1-2. Introduction to R programming (3 hours)

  • Data structures in R
  • Concept of vectorization
  • Loops in R
  • Functions in R
  • Apply-family functions

Part 3. Useful R packages (1.5 hours)

  • Important libraries: dplyr, purrr, tidyr, ggplot2
  • Reading, writing, and manipulating data
  • Data visualization in R

Part 4. Descriptive Statistics (1.5 hours)

  • Basic descriptive functions
  • Characteristics of univariate data sets
  • Characteristics of bivariate data sets
  • Descriptive visualizations

Part 5. Probability Theory in R (1.5 hours)

  • Probability of events
  • Random variables and distribution functions
  • Visualizing data distributions

Part 6. Inferential Statistics (1.5 hours)

  • Parameter estimation
  • Confidence intervals
  • Statistical tests
  • Hypothesis tests

Part 7. Regression (part I) (1.5 hours)

  • Linear regression
  • OLS
  • Testing the coefficients
  • Goodness-of-fit
  • Missing data
  • Multicollinearity
  • Heteroscedasticity
  • Autocorrelation
  • Outliers

Part 8. Regression (part II) (1.5 hours)

  • Model selection
  • Omission of relevant regressors
  • Inclusion of irrelevant regressors
  • Stepwise model selection
  • Generalized least squares
  • Nonlinear regression

Part 9. Nonparametric regression (1.5 hours)

  • Kernel density estimator
  • Univariate nonparametric regression
  • Lasso regression
  • Ridge regression

Part 10. Modeling binary, nominal, and count data (1.5 hours)

  • Regression trees (CART, CHAID)
  • Modeling binary data
  • Binary data with CART/CHAID
  • Modeling nominal data
  • Modeling count data

Part 11. Time-series: decomposition (1.5 hours)

  • Time series components: trend, seasonality, irregular component
  • Time series models

Part 12. Time-series: forecasting (1.5 hours)

  • forecast package
  • Parameter estimation for ARMA processes
  • Goodness of forecasts
  • Naive forecasts
  • Exponential smoothing
  • Forecast combinations

Homework assignments

Homework                                      Description                                          Points
Homework 1 – Introduction to R programming    6 programming tasks to check R programming skills    25
Homework 2 – Hypothesis testing               6 hypotheses to test using R                         25
Homework 3 – Regression competition           Challenge to develop the best regression model       50