Introduction to Statistical Programming with R

Course topics

Parts 1-2. Introduction to R programming

  • Data structures in R
  • Concept of vectorization
  • Loops in R
  • Functions in R
  • Apply-family functions
  • Important libraries: dplyr, ggplot2
  • Data reading, writing, and manipulation
  • Data visualizations in R

Part 3. Descriptive Statistics and Probability Theory in R

  • Basic descriptive functions
  • Characteristics of univariate data sets
  • Characteristics of bivariate data sets
  • Probability of events
  • Random variables and distribution functions

Part 4. Inferential Statistics

  • Parameter estimation
  • Confidence intervals
  • Statistical tests
  • Hypothesis tests

Part 5. Regression (part I)

  • Linear regression
  • Ordinary least squares (OLS)
  • Testing the coefficients
  • Goodness-of-fit
  • Missing data
  • Multicollinearity
  • Heteroscedasticity
  • Autocorrelation
  • Outliers

Part 6. Regression (part II)

  • Model selection
  • The omission of relevant regressors
  • Inclusion of irrelevant regressors
  • Stepwise model selection
  • Generalized least squares
  • Nonlinear regression

Part 7. Nonparametric regression, modeling binary, nominal, and count data

  • Kernel density estimator
  • Univariate nonparametric regression
  • Lasso regression
  • Regression trees (CART, CHAID)
  • Modeling binary data
  • Binary data with CART/CHAID
  • Modeling nominal data
  • Modeling count data

Part 8. Time-series: decomposition and forecasting

  • Time series components: trend, seasonality, irregular component
  • Time series models
  • The forecast package
  • Parameter estimation for ARMA processes
  • Goodness of forecasts
  • Naive forecasts
  • Exponential smoothing
  • Forecast combinations

Prerequisites

  • Introduction to Data Science
  • Statistics and Econometrics

Homework

  1. (50 points) Hypothesis testing exercises (using a given dataset)
  2. (50 points) Regression models exercise