Introduction to Statistical Programming with R

Course topics

Parts 1-2. Introduction to R programming

  • Data structures in R
  • Concept of vectorization
  • Loops in R
  • Functions in R
  • Apply-family functions

Part 3. Useful R packages

  • Important libraries: dplyr, purrr, tidyr, ggplot2
  • Data reading, writing, and manipulation
  • Data visualization in R

Part 4. Descriptive Statistics

  • Basic descriptive functions
  • Characteristics of univariate data sets
  • Characteristics of bivariate data sets
  • Descriptive visualizations

Part 5. Probability Theory in R

  • Probability of events
  • Random variables and distribution functions
  • Visualizing data distributions

Part 6. Inferential Statistics

  • Parameter estimation
  • Confidence intervals
  • Statistical tests
  • Hypothesis tests

Part 7. Regression (part I)

  • Linear regression
  • Ordinary least squares (OLS)
  • Testing the coefficients
  • Goodness-of-fit
  • Missing data
  • Multicollinearity
  • Heteroscedasticity
  • Autocorrelation
  • Outliers

Part 8. Regression (part II)

  • Model selection
  • Omission of relevant regressors
  • Inclusion of irrelevant regressors
  • Stepwise model selection
  • Generalized least squares
  • Nonlinear regression

Part 9. Nonparametric regression

  • Kernel density estimator
  • Univariate nonparametric regression
  • Lasso regression
  • Ridge regression

Part 10. Modeling binary, nominal, and count data

  • Regression trees (CART, CHAID)
  • Modeling binary data
  • Binary data with CART/CHAID
  • Modeling nominal data
  • Modeling count data

Part 11. Time series: decomposition

  • Time series components: trend, seasonality, irregular component
  • Time series models

Part 12. Time series: forecasting

  • The forecast package
  • Parameter estimation for ARMA processes
  • Goodness of forecasts
  • Naive forecasts
  • Exponential smoothing
  • Forecast combinations

Prerequisites

  • Introduction to Data Science
  • Statistics and Econometrics

Homework

  1. (50 points) Hypothesis testing exercises (using a provided dataset)
  2. (50 points) Regression modeling exercises