Sequences of wins and losses of sports teams over the course of a season provide a rich source of real-world "pseudo-random" sequences to explore from a variety of angles. In this project we will focus on phenomena such as streakiness, momentum, and over/underperformance that may be observed in these sequences. We will develop mathematical tools to quantify phenomena of this type and then apply these tools to data from professional sports leagues such as the MLB, NBA, NHL, and NFL. This will allow us to answer, in a mathematically sound manner, questions such as the following: Which is the "streakiest" team in the history of the MLB? How do the major sports leagues compare in terms of their streakiness? How does streakiness in win/loss sequences of sports teams compare to streakiness in randomly generated sequences? This project has a theoretical component, involving reading background literature, but most of the work will be on the coding side. All participants should be proficient with Python and ideally should have some experience with large-scale coding projects in Python. Knowledge of the sports mentioned is not required. However, if you are very knowledgeable about a particular sport and its recent history, please mention this in your application.
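To give a flavor of the kind of question raised above, here is a minimal sketch of one possible way to compare the streakiness of a win/loss sequence with that of random sequences. It is an illustration only, not the method the project will necessarily adopt: the function names (`run_lengths`, `longest_run`, `num_runs`, `shuffle_p_value`) are hypothetical, the choice of "longest run" as the streakiness measure is an assumption, and the `season` string is made-up data rather than a real team's record.

```python
import random
from itertools import groupby


def run_lengths(seq):
    """Lengths of the maximal blocks of identical consecutive outcomes."""
    return [len(list(group)) for _, group in groupby(seq)]


def longest_run(seq):
    """Length of the longest streak (of wins or of losses); 0 if seq is empty."""
    return max(run_lengths(seq), default=0)


def num_runs(seq):
    """Number of streaks; fewer runs for the same record suggests more streakiness."""
    return len(run_lengths(seq))


def shuffle_p_value(seq, stat=longest_run, trials=10_000, seed=0):
    """Estimate how often a random reordering of the same wins and losses
    produces a statistic at least as large as the observed one.

    Shuffling keeps the win/loss totals fixed, so only the ordering varies.
    """
    rng = random.Random(seed)
    observed = stat(seq)
    pool = list(seq)
    hits = 0
    for _ in range(trials):
        rng.shuffle(pool)
        if stat(pool) >= observed:
            hits += 1
    return hits / trials


# Made-up 20-game sequence, for illustration only (W = win, L = loss).
season = "WWLWWWWLLWLLLLWWWLWW"

print("run lengths:", run_lengths(season))
print("longest run:", longest_run(season))
print("number of runs:", num_runs(season))
print("shuffle p-value:", shuffle_p_value(season, trials=2_000))
```

A small p-value here would suggest that the observed longest streak is unusually long for a team with that win/loss total. Other statistics, such as the number of runs, can be passed through the `stat` argument, although for `num_runs` the natural comparison runs in the opposite direction (fewer runs means streakier).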
Aside from completion of the calculus sequence, there are no hard course prerequisites. The necessary background in probability and statistics can be acquired during the course of the project. A high level of proficiency with Python is essential; if you have a GitHub profile, please mention it in your application. Experience with web-scraping tools such as Selenium *may* be helpful for certain aspects of the project, but is not required.