Date of Award
2022
Document Type
Thesis
Degree Name
Bachelors
Department
Natural Sciences
First Advisor
Gillman, David
Area of Concentration
Computer Science and Statistics
Abstract
Variable selection algorithms are used to identify the predictor variables that are correlated with the response variable. Correlation among the predictor variables themselves is undesirable, because it makes it difficult to separate their respective effects on the response. This phenomenon, called "multicollinearity", generally refers to a strong linear interrelationship among predictors. In this thesis, we compare three variable selection algorithms (Stepwise, LASSO, and Sparse Principal Component Regression) on how well they pick the correct set of predictors under various levels of multicollinearity. We run simulations on randomly generated datasets, varying parameters that control the severity of multicollinearity, the proportion of relevant predictors, and the signal-to-noise ratio in the response variable. We also apply all three algorithms to a real dataset and report the differences in their behavior as well as their overall performance. Finally, we discuss how to perform computationally demanding simulations in R and present methods to make them faster.
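The abstract names three simulation parameters but not the data-generating model. The sketch below is one common way such datasets can be built and is only an illustration under assumed choices, not the thesis's own procedure (the thesis works in R; Python is used here for illustration). It assumes equicorrelated, unit-variance predictors with pairwise correlation `rho` (the multicollinearity knob), coefficients of 1 on the first `round(prop_relevant * p)` predictors and 0 elsewhere, and Gaussian noise scaled so that Var(signal) / Var(noise) equals `snr`. The names `simulate_dataset` and `pearson` and all parameter names are hypothetical.

```python
import math
import random


def simulate_dataset(n, p, rho, prop_relevant, snr, seed=None):
    """Hypothetical generator: returns (X, y, relevant).

    Assumed model: X_j = sqrt(rho) * Z + sqrt(1 - rho) * E_j, so every
    predictor has variance 1 and every pair has correlation rho.
    """
    if not 0.0 <= rho < 1.0:
        raise ValueError("rho must be in [0, 1)")
    if snr <= 0:
        raise ValueError("snr must be positive")
    rng = random.Random(seed)
    k = max(1, round(prop_relevant * p))       # number of relevant predictors
    relevant = list(range(k))                  # assumption: the first k carry signal
    beta = [1.0 if j < k else 0.0 for j in range(p)]
    # Variance of the sum of k equicorrelated unit-variance predictors.
    signal_var = k + k * (k - 1) * rho
    noise_sd = math.sqrt(signal_var / snr)     # enforces Var(signal)/Var(noise) = snr
    X, y = [], []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)                # shared factor that induces correlation
        row = [math.sqrt(rho) * z + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
               for _ in range(p)]
        signal = sum(b * x for b, x in zip(beta, row))
        X.append(row)
        y.append(signal + rng.gauss(0.0, noise_sd))
    return X, y, relevant


def pearson(a, b):
    """Sample Pearson correlation, used to check the induced multicollinearity."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (v - mb) for x, v in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)


X, y, relevant = simulate_dataset(n=5000, p=5, rho=0.8, prop_relevant=0.4, snr=2.0, seed=1)
print(relevant, round(pearson([r[0] for r in X], [r[1] for r in X]), 2))
```

Under this assumed model, raising `rho` toward 1 while holding `prop_relevant` and `snr` fixed isolates the effect of multicollinearity. A selection algorithm's output can then be scored against `relevant`, the known set of true predictors.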
Recommended Citation
Kutlay, Atalay, "EFFECTS OF MULTICOLLINEARITY IN VARIABLE SELECTION ALGORITHMS" (2022). Theses & ETDs. 6258.
https://digitalcommons.ncf.edu/theses_etds/6258