## Introduction to Statistical Programming with R

### Brief summary of the course

This course covers the R programming language, with an emphasis on functions designed for statistical analysis and modeling. It should make studying econometrics easier, because R lets you carry out statistical data analysis while writing very little code of your own.

### Course topics

Parts 1-2. Introduction to R programming (3 hours)

• Data structures in R
• Concept of vectorization
• Loops in R
• Functions in R
• Apply-family functions

Part 3. Useful R packages (1.5 hours)

• Important packages: dplyr, purrr, tidyr, ggplot2
• Reading, writing, and manipulating data
• Data visualization in R

Part 4. Descriptive Statistics (1.5 hours)

• Basic descriptive functions
• Characteristics of univariate data sets
• Characteristics of bivariate data sets
• Descriptive visualizations

Part 5. Probability Theory in R (1.5 hours)

• Probability of events
• Random variables and distribution functions
• Visualizing data distributions

Part 6. Inferential Statistics (1.5 hours)

• Parameter estimation
• Confidence intervals
• Statistical tests
• Hypothesis tests

Part 7. Regression (part I) (1.5 hours)

• Linear regression
• OLS
• Testing the coefficients
• Goodness-of-fit
• Missing data
• Multicollinearity
• Heteroscedasticity
• Autocorrelation
• Outliers

Part 8. Regression (part II) (1.5 hours)

• Model selection
• Omission of relevant regressors
• Inclusion of irrelevant regressors
• Stepwise model selection
• Generalized least squares
• Nonlinear regression

Part 9. Nonparametric regression (1.5 hours)

• Kernel density estimator
• Univariate nonparametric regression
• Lasso regression
• Ridge regression

Part 10. Modeling binary, nominal, and count data (1.5 hours)

• Regression trees (CART, CHAID)
• Modeling binary data
• Binary data with CART/CHAID
• Modeling nominal data
• Modeling count data

Part 11. Time-series: decomposition (1.5 hours)

• Time series components: trend, seasonality, irregular component
• Time series models

Part 12. Time-series: forecasting (1.5 hours)

• forecast package
• Parameter estimation for ARMA processes
• Goodness of forecasts
• Naive forecasts
• Exponential smoothing
• Forecast combinations

### Homework

| Homework | Description | Points |
|---|---|---|
| Homework 1 – Introduction to R programming | 6 programming tasks to check R programming skills | 25 |
| Homework 2 – Hypothesis testing | 6 hypotheses to test using R | 25 |
| Homework 3 – Regression competition | Challenge to develop the best-performing regression model | 50 |