Date of Award

5-2026

Document Type

Thesis

Degree Name

Bachelor of Arts (BA)

Department

Natural Sciences

First Advisor

Skripnikov, Andrey

Area of Concentration

Natural Sciences

Abstract

This study investigates whether deviations in age-performance trajectories can be used to statistically identify patterns consistent with anabolic steroid use in Major League Baseball (MLB). Athletic performance typically follows a predictable biological trajectory: improvement at early ages, a peak in the late twenties to early thirties, and a decline thereafter. Because anabolic-androgenic steroids are known to enhance muscle mass, strength, and recovery, their use may alter this natural progression and produce detectable distortions in performance over time. Using data from the Lahman Baseball Database, this analysis examines performance metrics for MLB players, with a primary focus on isolated power (ISO) as a proxy for power output. Age-performance relationships are modeled using quadratic regression with interaction terms to compare known steroid users and non-users. To address the confounding effect of player talent, within-player standardization is applied to isolate trajectory shape rather than absolute performance level. Two classification approaches are used: a prototype-distance method, which compares individual trajectories to the average profiles of known users and non-users, and a logistic regression model based on extracted trajectory features such as peak performance, peak age, and pre-peak and post-peak slopes. Results indicate that models based on raw performance are heavily influenced by player talent, leading to a “greatness detection” problem. In contrast, standardized trajectory-based methods more effectively capture patterns of delayed decline and extended peak performance, which are consistent with the hypothesized effects of steroid use. While the models do not provide causal identification of steroid use, they demonstrate that deviations in age-performance trajectories can serve as a meaningful statistical signal. This framework contributes to the development of anomaly detection methods in sports analytics and highlights the importance of focusing on performance dynamics rather than levels when analyzing potential performance-enhancing behavior.
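
The within-player standardization and prototype-distance steps described in the abstract can be illustrated with a minimal Python sketch. This is not the thesis's implementation, which is not shown in this record. All function names are hypothetical, the root-mean-square distance is an assumed choice of metric, and the numbers in the usage example are synthetic. ISO is computed with the standard definition, slugging percentage minus batting average, which equals extra bases per at-bat.

```python
from statistics import mean, pstdev


def iso(doubles, triples, homers, at_bats):
    """Isolated power: extra bases per at-bat (SLG minus AVG)."""
    return (doubles + 2 * triples + 3 * homers) / at_bats


def standardize(seasons):
    """Within-player z-scores. `seasons` maps age -> ISO for one player."""
    values = list(seasons.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        # A flat career carries no shape information.
        return {age: 0.0 for age in seasons}
    return {age: (v - mu) / sigma for age, v in seasons.items()}


def prototype(trajectories):
    """Average standardized value at each age across a group of players."""
    by_age = {}
    for traj in trajectories:
        for age, z in traj.items():
            by_age.setdefault(age, []).append(z)
    return {age: mean(zs) for age, zs in by_age.items()}


def distance(traj, proto):
    """Root-mean-square gap over the ages shared with the prototype.

    Assumption: the metric used in the thesis is not stated in the abstract.
    """
    shared = [age for age in traj if age in proto]
    if not shared:
        return float("inf")
    return (sum((traj[a] - proto[a]) ** 2 for a in shared) / len(shared)) ** 0.5


def classify(traj, user_proto, nonuser_proto):
    """Label a trajectory by whichever group prototype it sits closer to."""
    if distance(traj, user_proto) < distance(traj, nonuser_proto):
        return "user-like"
    return "non-user-like"


# Synthetic usage example (made-up numbers, not real player data).
late_peak = standardize({28: 0.150, 32: 0.200, 36: 0.250})
early_peak = standardize({28: 0.250, 32: 0.200, 36: 0.150})
user_proto = prototype([late_peak])
nonuser_proto = prototype([early_peak])
label = classify(standardize({28: 0.100, 32: 0.140, 36: 0.180}),
                 user_proto, nonuser_proto)
```

Because each player is scaled by his own mean and spread, a strong hitter and a weak hitter with the same career shape receive the same standardized trajectory. That is the property the abstract relies on to avoid classifying on talent alone.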

Rights

The author has granted New College of Florida the nonexclusive right to archive, make accessible, and distribute for educational purposes this work in whole or in part in all forms of media, now or hereafter known. The copyright of this work remains with the author.
