Sunday, January 26, 2020

A Team Game: Team Objects in the Passes & Patterns Code

1. Abstract

A custom class was created for teams in the Passes & Patterns database to hold metadata about each team, with the team objects stored in a globally accessible dict. Linking teams to their relevant game objects will allow much more efficient team-specific analysis and will centralize team metadata. Future development involves creating player-level objects and linking team logo objects.
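As a rough illustration of this structure, the sketch below defines a hypothetical Team class and a globally accessible dict of team objects; the names and attributes (Team, TEAMS, get_team) are assumptions for the example, not the actual Passes & Patterns code.

```python
# Minimal sketch of the team-object approach described above; names and
# attributes are illustrative assumptions, not the actual database code.

class Team:
    """Holds metadata for one team and links to its game objects."""

    def __init__(self, name, conference=None):
        self.name = name
        self.conference = conference
        self.games = []          # references to the relevant game objects

    def add_game(self, game):
        """Link a game object to this team for team-specific analysis."""
        self.games.append(game)


# Globally accessible registry of team objects, keyed by team name
TEAMS = {}


def get_team(name):
    """Return the Team for `name`, creating and registering it on first use."""
    if name not in TEAMS:
        TEAMS[name] = Team(name)
    return TEAMS[name]
```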

Friday, January 24, 2020

The Numbers Behind the Numbers: EP Classification Models at the Class Level

1. Abstract

Using the three highest-performing models from the previous examination of EP classification models (Clement 2019a), this work looks at how well these models predict the individual classes of future scoring. The k-Nearest Neighbours (kNN) model performs adequately, but not as well as the Multi-Layer Perceptron (MLP) and the Gradient Boosting Classifier (GBC), both of which perform very well, with the GBC slightly ahead of the MLP. All of the models show bias by quarter, demonstrating the need to include game time as a factor, as well as bias related to home-field advantage. Future work will involve feature selection and fine-tuning of the MLP and GBC models.
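As a hedged sketch of how such a class-level comparison might be set up in scikit-learn, the example below fits the three classifiers on placeholder data and pulls out per-class probabilities; the features, labels, and hyperparameters are illustrative assumptions, not those of the models discussed above.

```python
# Illustrative sketch only: placeholder data and hyperparameters, not the
# actual EP classification pipeline.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
X = rng.random((2000, 4))          # placeholder features: down, distance, field position, ...
y = rng.integers(0, 7, size=2000)  # placeholder next-score classes

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

models = {
    "kNN": KNeighborsClassifier(n_neighbors=50),
    "MLP": MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=500),
    "GBC": GradientBoostingClassifier(),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    proba = model.predict_proba(X_test)   # one probability column per scoring class
    print(name, "accuracy:", model.score(X_test, y_test))
```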

Monday, January 6, 2020

Fresh Start: 2019 Play-by-Play Data in the Passes & Patterns Database


1. Abstract

New play-by-play data from the 2019 U Sports and CFL seasons was added to the Passes & Patterns database: 125 games and 21,549 plays were added to the U Sports database, and 95 games and 14,025 plays to the CFL database. All data was cleaned and error-checked to conform to the existing data format standards. A new scraper was developed in Python to streamline the CFL data collection process.
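A minimal sketch of the kind of scraper described is shown below; the endpoint URL and JSON layout are placeholders, not the actual CFL data source used for the database.

```python
# Hypothetical scraper sketch: the endpoint and response layout are
# placeholders, not the actual source used for the Passes & Patterns database.
import requests

BASE_URL = "https://example.com/games"   # placeholder endpoint


def scrape_game(game_id):
    """Fetch the raw play-by-play for one game as a list of play dicts."""
    response = requests.get(f"{BASE_URL}/{game_id}/plays", timeout=30)
    response.raise_for_status()
    return response.json()


def scrape_season(game_ids):
    """Collect the plays from every game in a season into one flat list."""
    plays = []
    for game_id in game_ids:
        plays.extend(scrape_game(game_id))
    return plays
```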

Wednesday, January 1, 2020

Reverting to the Mean: Regression EP Models in U Sports Football

1. Abstract

Five different regression models were tested as measures of Expected Points, paralleling prior work in the field (Clement 2019): the Multi-Layer Perceptron, Stochastic Gradient Descent, Elastic Net, Ada Boost, and Bayesian Ridge models. The model outputs were compared to the raw data, and calibration graphs were developed for each model, both overall and broken down by down, quarter, and home/away. The Multi-Layer Perceptron proved the only generally effective model: the Elastic Net and Bayesian Ridge models were effective only in certain limited circumstances, the Ada Boost was of very limited use, and the Stochastic Gradient Descent proved completely useless as a predictor of Expected Points.
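As a rough sketch of how the five regressors might be lined up for comparison in scikit-learn, the example below uses placeholder data and largely default settings; it is not the models as actually configured in this work.

```python
# Illustrative comparison only: placeholder data and default settings.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.linear_model import SGDRegressor, ElasticNet, BayesianRidge
from sklearn.ensemble import AdaBoostRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.random((2000, 4))            # placeholder features: down, distance, field position, ...
y = rng.normal(0.0, 3.0, size=2000)  # placeholder target: net points of the next score

models = {
    "Multi-Layer Perceptron": MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=1000),
    "Stochastic Gradient Descent": SGDRegressor(),
    "Elastic Net": ElasticNet(),
    "Ada Boost": AdaBoostRegressor(),
    "Bayesian Ridge": BayesianRidge(),
}

for name, model in models.items():
    mae = -cross_val_score(model, X, y, cv=5, scoring="neg_mean_absolute_error")
    print(f"{name}: mean absolute error = {mae.mean():.3f}")
```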

Three Downs Away: P(1D) In U Sports Football

1. Abstract

A data set of U Sports football play-by-play data was analyzed to determine the First Down Probability (P(1D)) of down & d...
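A hedged sketch of how P(1D) by down & distance could be tabulated from play-by-play data with pandas is given below; the column names (down, distance, converted) are assumptions for the example, not the database schema.

```python
# Sketch only: the column names are assumed, not the actual database schema.
import pandas as pd


def p1d_table(plays: pd.DataFrame) -> pd.DataFrame:
    """Return the observed first-down conversion rate for each down & distance.

    Expects integer `down` and `distance` columns and a boolean `converted`
    column marking whether the series ultimately gained a first down (or scored).
    """
    grouped = plays.groupby(["down", "distance"])["converted"]
    return grouped.agg(p1d="mean", n="size").reset_index()
```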