Introduction to Statistical Programming with R

Brief summary of the course

This course covers the main features of the R programming language, with an emphasis on functions designed for statistical analysis and modeling. It should also make studying econometrics easier, since R lets you carry out statistical data analysis while writing very little code of your own.

Learning outcomes

By the end of the course, students will have specialized conceptual knowledge that incorporates recent advances in computer science and serves as a basis for original thinking and research, including critical analysis of problems within computer science and at its intersections with other fields. They will be able to develop algorithms and software for data analysis (including big data); design architectural solutions for information and computer systems of various kinds; test software; identify and fix issues that arise during software operation; and formulate requirements for its modification or reengineering.

Course plan

Parts 1-2. Introduction to R programming
● Data structures in R
● Concept of vectorization
● Loops in R
● Functions in R
● Apply-family functions
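
To give a feel for how the topics in this part relate, here is a minimal sketch in base R that computes the same result three ways: with an explicit loop, with a vectorized expression, and with an apply-family function. The vector `x` and the helper `square` are hypothetical names chosen for illustration only.

```r
# Hypothetical example data: the integers 1 to 5
x <- 1:5

# 1. Explicit loop: preallocate the result, then fill it element by element
squares_loop <- numeric(length(x))
for (i in seq_along(x)) {
  squares_loop[i] <- x[i]^2
}

# 2. Vectorized expression: the operator is applied to the whole vector at once
squares_vec <- x^2

# 3. Apply-family function: sapply() calls the function on each element
square <- function(v) v^2
squares_apply <- sapply(x, square)

print(squares_loop)
print(squares_vec)
print(squares_apply)
```

All three approaches give the same values; the vectorized form is usually the shortest and the most idiomatic in R.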

Part 3. Useful R packages
● Important libraries: dplyr, purrr, tidyr, ggplot2
● Data reading, writing, and manipulations
● Data visualizations in R

Part 4. Descriptive Statistics
● Basic descriptive functions
● Characteristics of univariate data sets
● Characteristics of bivariate data sets
● Descriptive visualizations
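
As a rough illustration of the descriptive functions covered in this part, the sketch below uses only base R and the built-in `mtcars` data set. The choice of data set and of the variables `mpg` and `wt` is an assumption made for the example, not part of the course material.

```r
# Univariate characteristics of one variable (miles per gallon)
mpg <- mtcars$mpg
univariate <- c(
  mean   = mean(mpg),
  median = median(mpg),
  sd     = sd(mpg),
  iqr    = IQR(mpg)
)

# Minimum, quartiles, and maximum
five_number <- quantile(mpg)

# Bivariate characteristic: linear correlation between mpg and weight
correlation <- cor(mtcars$mpg, mtcars$wt)

print(univariate)
print(five_number)
print(correlation)
```

Functions such as `hist()` and `boxplot()` give the matching descriptive plots for the same variables.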

Part 5. Probability Theory in R
● Probability of events
● Random variables and distribution functions
● Visualizing data distributions
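
Base R names its distribution functions with the prefixes `d` (density or probability mass), `p` (cumulative distribution function), `q` (quantile function), and `r` (random draws). The sketch below illustrates this convention for the normal and binomial distributions; the specific numbers and the seed are arbitrary choices for the example.

```r
# Standard normal distribution
d_zero  <- dnorm(0)       # density at 0, equal to 1 / sqrt(2 * pi)
p_below <- pnorm(1.96)    # P(Z <= 1.96)
q_975   <- qnorm(0.975)   # the 97.5% quantile

# Binomial distribution: P(X = 3) for 10 trials with success probability 0.5
p_binom <- dbinom(3, size = 10, prob = 0.5)

# Random draws; the seed is arbitrary and only makes the draws reproducible
set.seed(1)
draws <- rnorm(1000)

print(c(d_zero = d_zero, p_below = p_below, q_975 = q_975, p_binom = p_binom))
print(summary(draws))
```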

Part 6. Inferential Statistics
● Parameter estimation
● Confidence intervals
● Statistical tests
● Hypothesis tests
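
The following sketch shows how a point estimate, a confidence interval, and a hypothesis test can be obtained with a single call to `t.test()` from base R. The simulated sample, its parameters, and the seed are assumptions made purely for illustration.

```r
# Hypothetical sample: 50 draws from a normal distribution with mean 5
set.seed(42)  # arbitrary seed so the simulated sample is reproducible
sample_data <- rnorm(50, mean = 5, sd = 2)

# Point estimate of the mean
estimate <- mean(sample_data)

# One-sample t-test of H0: mu = 5, with a 95% confidence interval
result  <- t.test(sample_data, mu = 5, conf.level = 0.95)
ci      <- result$conf.int
p_value <- result$p.value

print(estimate)
print(ci)
print(p_value)
```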

Part 7. Regression (part I)
● Linear regression
● Ordinary least squares (OLS)
● Testing the coefficients
● Goodness-of-fit
● Missing data
● Multicollinearity
● Heteroscedasticity
● Autocorrelation
● Outliers
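
A minimal sketch of an OLS fit in base R, assuming the built-in `mtcars` data set and a simple model of fuel consumption on weight (both chosen only for illustration). It covers the estimation, coefficient tests, and goodness-of-fit items above; the diagnostic topics are not covered here.

```r
# Fit mpg = b0 + b1 * wt by ordinary least squares
fit <- lm(mpg ~ wt, data = mtcars)

# Coefficient table: estimates, standard errors, t statistics, p-values
coef_table <- summary(fit)$coefficients

# Goodness of fit: share of the variance in mpg explained by the model
r_squared <- summary(fit)$r.squared

# Residuals are the starting point for most diagnostic checks
res <- residuals(fit)

print(coef_table)
print(r_squared)
```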

Part 8. Regression (part II)
● Model selection
● Omission of relevant regressors
● Inclusion of irrelevant regressors
● Stepwise model selection
● Generalized least squares
● Nonlinear regression

Part 9. Nonparametric regression
● Kernel density estimator
● Univariate nonparametric regression
● Lasso regression
● Ridge regression

Part 10. Modeling binary, nominal, and count data
● Regression trees (CART, CHAID)
● Modeling binary data
● Binary data with CART/CHAID
● Modeling nominal data
● Modeling count data
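
For binary and count outcomes, base R provides `glm()` with the `binomial` and `poisson` families. The sketch below is an illustration under assumed choices: it uses the built-in `mtcars` data set, treats `am` (transmission type) as the binary response and `carb` (number of carburetors) as the count response, and does not cover tree-based methods.

```r
# Binary data: logistic regression of transmission type on weight
logit_fit <- glm(am ~ wt, family = binomial, data = mtcars)
prob_manual <- predict(logit_fit, type = "response")  # fitted probabilities

# Count data: Poisson regression of the number of carburetors on weight
pois_fit <- glm(carb ~ wt, family = poisson, data = mtcars)
expected_counts <- predict(pois_fit, type = "response")  # fitted means

print(coef(logit_fit))
print(coef(pois_fit))
```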

Part 11. Time-series: decomposition
● Time series components: trend, seasonality, irregular component
● Time series models
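
The decomposition into trend, seasonal, and irregular components can be sketched with `decompose()` from base R. The built-in `AirPassengers` series and the additive form are assumptions chosen for the example.

```r
# Classical additive decomposition: series = trend + seasonal + random
dec <- decompose(AirPassengers, type = "additive")

trend     <- dec$trend     # moving-average trend (NA at both ends)
seasonal  <- dec$seasonal  # repeating seasonal pattern
irregular <- dec$random    # what remains after removing trend and seasonality

# The three components add back up to the original series
reconstructed <- trend + seasonal + irregular
observed <- !is.na(reconstructed)
print(all.equal(as.numeric(reconstructed[observed]),
                as.numeric(AirPassengers[observed])))
```

Calling `plot(dec)` draws the observed series and the three components in stacked panels.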

Part 12. Time-series: forecasting
● forecast package
● Parameter estimation for ARMA processes
● Goodness of forecasts
● Naive forecasts
● Exponential smoothing
● Forecast combinations
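
The sketch below illustrates naive forecasts, exponential smoothing, a simple forecast combination, and an accuracy measure. To stay self-contained it uses base R's `HoltWinters()` in place of the forecast package listed above; the `AirPassengers` series, the one-year holdout, the equal-weight combination, and the helper `rmse` are all assumptions made for the example.

```r
# Hold out the last year of the series for evaluation
train <- window(AirPassengers, end = c(1959, 12))
test  <- window(AirPassengers, start = c(1960, 1))
h <- length(test)

# Naive forecast: repeat the last observed value
naive_fc <- rep(as.numeric(train)[length(train)], h)

# Simple exponential smoothing: no trend (beta) and no seasonal (gamma) term
ses_fit <- HoltWinters(train, beta = FALSE, gamma = FALSE)
ses_fc  <- as.numeric(predict(ses_fit, n.ahead = h))

# Forecast combination: equal-weight average of the two forecasts
combined_fc <- (naive_fc + ses_fc) / 2

# Goodness of forecasts: root mean squared error on the holdout year
rmse <- function(actual, forecast) sqrt(mean((actual - forecast)^2))
errors <- c(
  naive    = rmse(as.numeric(test), naive_fc),
  ses      = rmse(as.numeric(test), ses_fc),
  combined = rmse(as.numeric(test), combined_fc)
)
print(errors)
```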