Beyond Box Scores: Statcast’s Impact on Predicting Pitcher Performance in Baseball

Author

Ryan Fortune

Date of Award

2025

Document Type

Thesis

Degree Name

Bachelors

Department

Natural Sciences

First Advisor

Loveland, Rohan

Second Advisor

Skripnikov, Andrey

Area of Concentration

Computer Science

Abstract

Predicting the Earned Run Average (ERA) of Major League Baseball (MLB) pitchers remains a complex problem, and these evaluations carry significant financial and competitive implications for MLB organizations when decisions based on them do not pan out. Traditional evaluation metrics often fail to isolate a pitcher’s underlying skill from the influence of luck, team defense, and other external variables, which makes this a noisy problem. This thesis explores whether newly tracked Statcast metrics, and metrics derived from them, can meaningfully improve predictions of next-season ERA beyond what conventional box score statistics and run estimators achieve alone. To address this question, the research uses an out-of-sample validation process across four distinct feature sets (Box Score, Run Estimators, Raw Statcast, and Enhanced Statcast) and applies a collection of machine learning models to each, including Linear Regression, Support Vector Regression (SVR), and XGBoost. The strongest results came from a hybrid SVR model that combines five critical features: sp-stuff, strikeouts per nine innings pitched (K-per-9), pitch-type diversity, hard-hit percentage (Hard%), and innings pitched per appearance (IPA). This approach yielded a test-set R-squared value of approximately 0.27, substantially exceeding the performance of any single-domain or traditional baseline model I tested in this research. Overall, the results indicate that strategically combining select Statcast measurements, especially those reflecting the physical quality of pitches and the pitcher’s ability to control contact, with established measures of strikeout proficiency and pitcher workload leads to a significant improvement in forecasting ERA.
By uniting these modern and traditional elements, the study presents a more sophisticated picture of pitcher ability, demonstrating that a blended analytical approach can more effectively capture the underlying dynamics of pitcher performance.
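
For readers unfamiliar with the quantities named in the abstract, the following is a minimal sketch of the standard definitions of ERA, K-per-9, and the R-squared score used to judge out-of-sample predictions. It is not taken from the thesis: the function names and all numbers are hypothetical illustrations, and innings pitched are assumed to be given as decimal innings.

```python
def era(earned_runs, innings_pitched):
    """Earned Run Average: earned runs allowed per nine innings."""
    return 9.0 * earned_runs / innings_pitched


def k_per_9(strikeouts, innings_pitched):
    """Strikeouts per nine innings pitched."""
    return 9.0 * strikeouts / innings_pitched


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    SS_tot is measured around the mean of the actual values, so a model
    that always predicts that mean scores 0.
    """
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical season line (made up): 60 earned runs, 200 strikeouts, 180 innings.
print(era(60, 180.0))
print(k_per_9(200, 180.0))

# Hypothetical held-out next-season ERAs and model predictions (made up).
actual_era = [3.0, 4.0, 5.0]
predicted_era = [3.5, 4.0, 4.5]
print(r_squared(actual_era, predicted_era))
```

Under this definition, a test-set value near 0.27 would mean the predictions account for roughly 27% of the variance in held-out next-season ERA, relative to always predicting the mean of those held-out values.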

This document is currently not available here.
