Introduction to Statistical Programming with R
Brief summary of the course
This course covers the R programming language, with an emphasis on its functions for statistical analysis and modeling. It should make studying econometrics easier, because R lets you carry out statistical data analysis with very little custom code.
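As a rough illustration of how little code a basic analysis can need, the sketch below summarizes a variable and fits a linear regression in a few lines. The `mtcars` dataset (bundled with base R) and the choice of variables are assumptions made for this example; they are not taken from the course materials.

```r
# Illustrative only: mtcars ships with base R in the datasets package.
data(mtcars)

# Descriptive statistics for a single variable (fuel efficiency)
mpg_summary <- summary(mtcars$mpg)
print(mpg_summary)

# Fit a linear regression of fuel efficiency on car weight by OLS
fit <- lm(mpg ~ wt, data = mtcars)

# Coefficients, standard errors, t-tests, and R-squared in one call
print(summary(fit))
```

Built-in functions handle both the estimation and the reporting here, so no user-written routines are needed.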
Course topics
Part 1-2. Introduction to R programming (3 hours)
- Data structures in R
- Concept of vectorization
- Loops in R
- Functions in R
- Apply-family functions
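The topics in this part relate to one another, and a minimal sketch can show how. The example below computes the same result three ways: with an explicit loop, with a vectorized expression, and with an apply-family function. The vector `x` and the helper `square` are hypothetical names chosen for illustration.

```r
# A small numeric vector (one of R's basic data structures)
x <- c(1, 2, 3, 4, 5)

# A user-defined function (hypothetical helper for this example)
square <- function(v) v^2

# 1. Explicit loop: preallocate the result, then fill it element by element
squares_loop <- numeric(length(x))
for (i in seq_along(x)) {
  squares_loop[i] <- square(x[i])
}

# 2. Vectorization: the operator works on the whole vector at once
squares_vec <- x^2

# 3. Apply family: sapply calls the function on each element
squares_apply <- sapply(x, square)

# All three approaches should agree
print(all.equal(squares_loop, squares_vec))
print(all.equal(squares_vec, squares_apply))
```

In practice the vectorized form is usually the shortest and the fastest of the three, which is why vectorization gets its own topic.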
Part 3. Useful R packages (1.5 hours)
- Important packages: dplyr, purrr, tidyr, ggplot2
- Reading, writing, and manipulating data
- Data visualization in R
Part 4. Descriptive Statistics (1.5 hours)
- Basic descriptive functions
- Characteristics of univariate data sets
- Characteristics of bivariate data sets
- Descriptive visualizations
Part 5. Probability Theory in R (1.5 hours)
- Probability of events
- Random variables and distribution functions
- Visualizing data distributions
Part 6. Inferential Statistics (1.5 hours)
- Parameter estimation
- Confidence intervals
- Statistical tests
- Hypothesis tests
Part 7. Regression (part I) (1.5 hours)
- Linear regression
- Ordinary least squares (OLS)
- Testing the coefficients
- Goodness-of-fit
- Missing data
- Multicollinearity
- Heteroscedasticity
- Autocorrelation
- Outliers
Part 8. Regression (part II) (1.5 hours)
- Model selection
- Omission of relevant regressors
- Inclusion of irrelevant regressors
- Stepwise model selection
- Generalized least squares
- Nonlinear regression
Part 9. Nonparametric regression (1.5 hours)
- Kernel density estimator
- Univariate nonparametric regression
- Lasso regression
- Ridge regression
Part 10. Modeling binary, nominal, and count data (1.5 hours)
- Regression trees (CART, CHAID)
- Modeling binary data
- Binary data with CART/CHAID
- Modeling nominal data
- Modeling count data
Part 11. Time series: decomposition (1.5 hours)
- Time series components: trend, seasonality, irregular component
- Time series models
Part 12. Time series: forecasting (1.5 hours)
- The forecast package
- Parameter estimation for ARMA processes
- Goodness of forecasts
- Naive forecasts
- Exponential smoothing
- Forecast combinations
Homework assignments
| Homework | Description | Points |
| --- | --- | --- |
| Homework 1 – Introduction to R programming | 6 programming tasks to check R programming skills | 25 |
| Homework 2 – Hypothesis testing | 6 hypotheses to test using R | 25 |
| Homework 3 – Regression competition | Challenge to develop the best-performing regression model | 50 |