Saturday, June 30, 2018

Appendix to It's the Data, Stupid: Developing a U Sports Football Database

6- Appendix 1: List of Games

The table below provides a complete list of all the games for which play-by-play data has been found and included in the database as of the date of publication. The games are listed chronologically by date played, oldest game first.

It's the Data, Stupid: Development of a U Sports Football Database

1-Abstract

A discussion of the development of a fully parsed database of U Sports football play-by-play data, including the sources and structures of available play-by-play data, examples of common errors in the data and methods to rectify them, and a complete discussion of the code, structured as individual functions, used to parse the data into discrete columns. The code used is included within the discussion of each function. The parsed data forms a relational database of plays that allows further research to be conducted at the individual play level. This database is believed to be the first of its kind for both U Sports and Canadian football in general.

Friday, June 8, 2018

You Play to Win the Game: Win Probability in American Football

1-Abstract

The conclusion of a three-part series discussing the three major fields of research in American football analytics: First Down Probability (P(1D)) (Clement 2018a), Expected Points (EP) (Clement 2018b), and Win Probability (WP). This chapter discusses the existing body of work regarding WP and the various models derived to estimate it over the last 30 years. The growing sophistication of model-building techniques is an ongoing theme, and a better understanding of measures of uncertainty is an emergent topic as competing models enter public view and analytic notions infiltrate the sporting lexicon.

Tuesday, June 5, 2018

Score, Score, Score Some More: Expected Points in American Football

1 – Abstract
A continuation of prior efforts to consolidate the body of knowledge in American football analytics (Clement 2018), this work discusses the development of Expected Points as an analytical model over the past decades, with a focus on results, statistical methods, data management techniques, and historiographical change.
Over the past forty years, the statistical techniques used to estimate EP have progressed from linear approximations based on two data points to advanced smoothing techniques and non-linear fits. This progress has been aided by the exponential growth in dataset sizes that has paralleled the rise of cheaply available computing.

Sunday, June 3, 2018

Keep the Drive Alive: First Down Probability in American Football

1 – Abstract
An examination of the existing scholarship on First Down Probability (P(1D)) in American football. The question is fairly unambiguous, and this work reviews ten studies from the past 45 years, most of them from the last decade. Sources generally agree that P(1D) decreases linearly with distance-to-gain on 1st and 2nd downs, while they differ on 3rd down, modelling it as either a weakly fit linear relationship or a slight exponential fit.
4th down was not examined fully by any source because of insufficient data. The data points that can be placed with confidence lie very close to the 3rd down values, leading to discussion over whether 3rd down data can serve as a proxy for 4th down in decision-making models.
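
As a rough illustration of the linear relationship described above, the sketch below groups plays by distance-to-gain, computes an empirical conversion rate for each distance, and fits a straight line by ordinary least squares. It is not drawn from any of the reviewed studies: the function names `conversion_rates` and `fit_line` and the input shape (pairs of distance and a converted flag) are assumptions made for this example only.

```python
def conversion_rates(plays):
    """Empirical P(1D) per distance from (distance, converted) pairs.

    The (distance, converted) input shape is an assumption for this sketch.
    """
    totals = {}
    for distance, converted in plays:
        made, seen = totals.get(distance, (0, 0))
        totals[distance] = (made + int(converted), seen + 1)
    return {d: made / seen for d, (made, seen) in sorted(totals.items())}


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope
```

Passing the distances and rates returned by `conversion_rates` for a single down to `fit_line` gives an intercept and a slope; a negative slope corresponds to P(1D) falling as distance-to-gain grows. An unweighted fit treats every distance equally, so distances with few observed plays can distort the line; weighting by sample size would be a natural refinement.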

Three Downs Away: P(1D) In U Sports Football

1-Abstract

A data set of U Sports football play-by-play data was analyzed to determine the First Down Probability (P(1D)) of down & d...