Leveraging Statistics to Understand How to Build a Playoff Baseball Team

Completion: December 2023

This project explores which underlying team statistics best predict a team's likelihood of reaching the MLB playoffs.
The approach is twofold: an "offensive" regression that regresses wins on offensive statistics and a "defensive" regression that regresses wins on defensive statistics. These regressions show which statistics significantly influence wins, along with the magnitude and direction of each effect. Because the goal is a model for building a playoff-caliber team, we then run a Monte Carlo simulation for an imagined team with specific stats and estimate how often that team would make the playoffs by reaching a threshold number of wins.
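The simulation step can be sketched roughly as follows. This is a minimal Python illustration, not the project's actual code (the project lists R and Excel among its tools). The function name, the coefficient values, the statistics chosen (obp, slg, hr), the residual standard deviation, and the 90-win threshold are all hypothetical placeholders, not fitted results from the project's regressions. It also assumes normally distributed regression error, which the write-up does not state.

```python
import random


def simulate_playoff_probability(intercept, coefs, team_stats, residual_sd,
                                 win_threshold, n_sims=10_000, seed=0):
    """Estimate how often an imagined team reaches a win threshold.

    Expected wins come from a linear model (intercept + sum of coef * stat).
    Each simulated season adds normally distributed noise with standard
    deviation residual_sd (an assumption), then wins are rounded and
    clamped to the 0-162 range of a full MLB season.
    """
    rng = random.Random(seed)  # seeded so runs are reproducible
    expected_wins = intercept + sum(
        coefs[name] * team_stats[name] for name in coefs
    )
    seasons_at_threshold = 0
    for _ in range(n_sims):
        wins = round(rng.gauss(expected_wins, residual_sd))
        wins = min(162, max(0, wins))
        if wins >= win_threshold:
            seasons_at_threshold += 1
    return expected_wins, seasons_at_threshold / n_sims


# Hypothetical placeholder values, NOT coefficients from the project.
example_intercept = -30.0
example_coefs = {"obp": 200.0, "slg": 100.0, "hr": 0.05}
example_team = {"obp": 0.330, "slg": 0.430, "hr": 220}

expected, probability = simulate_playoff_probability(
    example_intercept, example_coefs, example_team,
    residual_sd=6.0, win_threshold=90,
)
print(f"expected wins: {expected:.1f}")
print(f"share of simulated seasons reaching the threshold: {probability:.3f}")
```

Swapping in the fitted coefficients and residual standard error from either regression, plus whatever win threshold is treated as playoff-caliber, would give the corresponding estimate for a real or imagined roster.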
The data covers the 2008 through 2023 seasons; 2020 was omitted because the coronavirus pandemic led to a shortened season and a unique playoff format.

Tags: R Microsoft Excel Linear Regression Statistics Data-visualization Teamwork Columbia

How to contact me


Email
cjd2186@columbia.edu