Author

Atalay Kutlay

Date of Award

2022

Document Type

Thesis

Degree Name

Bachelors

Department

Natural Sciences

First Advisor

Gillman, David

Area of Concentration

Computer Science and Statistics

Abstract

Variable selection algorithms are used to find the predictor variables that are correlated with the response variable. Correlation among the predictor variables themselves is undesirable, because it makes their respective effects on the response difficult to separate. This phenomenon, called "multicollinearity", generally implies a strong linear interrelationship among predictors. In this thesis, we compare three variable selection algorithms (Stepwise, LASSO, and Sparse Principal Component Regression) on how well they pick the right set of predictors under various levels of multicollinearity. We run simulations on randomly generated datasets whose parameters control the severity of multicollinearity, the proportion of relevant predictors, and the signal-to-noise ratio in the response variable. We also apply all algorithms to a real dataset and report the differences in their behavior, as well as their overall performance. Finally, we discuss how to carry out computationally demanding simulations in R and present methods to make them faster.
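The simulation design described above can be illustrated with a minimal sketch in Python (NumPy). It is not the thesis's own generator, which was written in R and may differ. The sketch assumes three things. First, every pair of predictors shares the same correlation `rho`, so a larger `rho` means more severe multicollinearity. Second, relevant predictors get unit coefficients. Third, Gaussian noise is scaled to hit a target signal-to-noise ratio. The names `simulate_dataset` and `selection_scores` are hypothetical, and so is the true/false-positive-rate scoring.

```python
import numpy as np


def simulate_dataset(n, p, rho, prop_relevant, snr, seed=None):
    """Hypothetical simulation design (an assumption, not the thesis's exact setup).

    n             -- number of observations
    p             -- number of candidate predictors
    rho           -- common pairwise correlation between predictors (0 <= rho < 1)
    prop_relevant -- fraction of predictors with a nonzero coefficient
    snr           -- target ratio Var(signal) / Var(noise)
    """
    rng = np.random.default_rng(seed)

    # Equicorrelated covariance: 1 on the diagonal, rho everywhere else.
    cov = np.full((p, p), rho)
    np.fill_diagonal(cov, 1.0)
    X = rng.multivariate_normal(np.zeros(p), cov, size=n)

    # Pick which predictors are truly relevant; give them unit coefficients.
    k = max(1, int(round(prop_relevant * p)))
    relevant = np.sort(rng.choice(p, size=k, replace=False))
    beta = np.zeros(p)
    beta[relevant] = 1.0

    # Population variance of the signal is beta' Sigma beta; scale the noise
    # so that this variance divided by the noise variance equals snr.
    sigma = np.sqrt(beta @ cov @ beta / snr)
    y = X @ beta + rng.normal(0.0, sigma, size=n)
    return X, y, relevant


def selection_scores(selected, relevant, p):
    """Hypothetical metric: true- and false-positive rates of a selected set."""
    selected, relevant = set(selected), set(relevant)
    irrelevant = set(range(p)) - relevant
    tpr = len(selected & relevant) / len(relevant)
    fpr = len(selected & irrelevant) / len(irrelevant) if irrelevant else 0.0
    return tpr, fpr


if __name__ == "__main__":
    X, y, relevant = simulate_dataset(n=200, p=20, rho=0.9,
                                      prop_relevant=0.25, snr=2.0, seed=1)
    print(X.shape, relevant)  # 200 x 20 design with 5 relevant predictors
    # A selection algorithm (Stepwise, LASSO, SPCR, ...) would produce a set
    # of chosen column indices; here we score a made-up selection.
    print(selection_scores([0, 1, 2], relevant, p=20))
```

Sweeping `rho`, `prop_relevant`, and `snr` over a grid and averaging `selection_scores` across many seeds gives one way to compare selection algorithms as multicollinearity grows.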
