Monday, December 17, 2018

Three Point Plays: The Analytics of Field Goals

1. Abstract

A look at the development of models determining field goal probability, and related studies of meaningful predictor variables. Logistic regression is the most common way of incorporating different predictor variables; only one study uses a neural network. Distance is unfailingly the strongest predictor of success, by an order of magnitude over any other variable. Useful environmental variables also include wind, rain, and altitude. The question of clutch kicking is discussed ad nauseam, with a general consensus that there is little effect, if any. Icing the kicker similarly finds little support in most works, and in some cases is even contraindicated.

Thursday, October 18, 2018

Kick it Away, Kick it Away, Kick it Away Now: The Analytics of Punting


    1. Abstract


    A study of the development of research on punting in football. Punting is shown to have improved steadily over time, by both measures of yardage and more sophisticated measures such as Expected Points (EP). The existing official statistics are criticized as being result-oriented, without regard to the process or to factors beyond the punter’s control. Better measures are suggested, along with a method of assessing punts relative to their expectation, based on a large number of environmental and circumstantial factors, in order to rate punters independently of their opportunities.

    Saturday, October 13, 2018

    The Roman Numerals of Computing: An Object-Oriented Database of U Sports Football

    1. Abstract

    A redevelopment, this time in Python, of the U Sports parser and calculator previously described (Clement 2018b), which was built using VBA. Data is imported through Python’s csv package and parsed using an object-oriented approach, creating games and plays as classes with attributes as appropriate to each. Further objects exist to support analysis of the parsed data, and the numpy package with its arrays allows for far faster calculation of results. Discussion of future work built on this restructured database includes examination of special teams, Expected Points (EP), and Win Probability (WP).
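As a rough illustration of the object-oriented parsing described above, the sketch below reads rows with the csv package and groups them into game and play objects. It is not the author's code: the class names, the column names (`game_id`, `down`, `distance`, `yards_gained`), and the sample rows are all hypothetical, and the first-down check deliberately ignores penalties, turnovers, and scoring plays.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class Play:
    """One play; the attribute names are hypothetical, not the database's."""
    down: int
    distance: int
    yards_gained: int

    @property
    def converted(self) -> bool:
        # Simplification: a play converts when it gains at least the
        # distance needed. Penalties and turnovers are ignored here.
        return self.yards_gained >= self.distance


@dataclass
class Game:
    """A game is a container of Play objects, keyed by a hypothetical id."""
    game_id: str
    plays: list = field(default_factory=list)


def parse_games(csv_file) -> dict:
    """Read rows with csv.DictReader and group them into Game objects."""
    games = {}
    for row in csv.DictReader(csv_file):
        game = games.setdefault(row["game_id"], Game(row["game_id"]))
        game.plays.append(
            Play(int(row["down"]), int(row["distance"]), int(row["yards_gained"]))
        )
    return games


# Made-up sample rows, for illustration only.
SAMPLE = """game_id,down,distance,yards_gained
G1,1,10,4
G1,2,6,7
G2,1,10,12
"""

games = parse_games(io.StringIO(SAMPLE))
```

Keeping each play as an object makes per-play attributes easy to extend; for bulk calculations, the numeric attributes could then be copied into numpy arrays, as the abstract describes.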

    Monday, September 17, 2018

    The Whole Ten Yards: P(1D) in the CFL

    1-Abstract

    P(1D) values were calculated for CFL data across down & distance. & Goal situations were separated and viewed independently. Results were compared to those obtained using the same methods on U Sports data. CFL offenses generally follow the same trends as seen in U Sports in terms of P(1D), though CFL P(1D) is consistently ~5 percentage points higher than U Sports under the same conditions. 1st down follows the same linear trend, while 2nd down shows exponential decay, with the previously discussed “Stupidity Asymptote” at 10%. However, CFL teams are markedly less willing to attempt 3rd down conversions, causing a lack of usable data points for 3rd down. For & Goal situations the disparity between CFL and U Sports P(1D) does not seem to be as prevalent.

    Friday, August 31, 2018

    Appendix to Going Pro: Developing a CFL Play-by-Play Database

    6-Appendix 1: List of Games

    The table below provides a complete list of all the games for which play-by-play data has been found and included in the database as of the date of publication. The games are listed chronologically by date played, oldest game first.

    Thursday, August 30, 2018

    Going Pro: Developing a CFL Play-by-Play Database

    1-Abstract

    CFL play-by-play data was collected from the official website into a single database. Idiosyncrasies in the structure of the data required the development of a VBA HTML scraper to load the data into the database. The data was parsed with a method that parallels the U Sports database, making it available for future investigation.

    Thursday, August 23, 2018

    Three Downs Away: P(1D) In U Sports Football

    1-Abstract

    A data set of U Sports football play-by-play data was analyzed to determine the First Down Probability (P(1D)) of down & distance states. P(1D) was treated as a binomial variable, and confidence intervals were determined iteratively until convergence at the 10^-10 level. For each down, fitted regression lines were added to enable discussion of overall trends with respect to distance.
    Only points with a minimum N of 100 instances were considered. 1st down trended linearly, bearing only points at 5-yard intervals. 2nd and 3rd downs followed an exponential decay fit. Special attention was given to the non-zero asymptotes of these functions, and their implications towards the nature of the game. A review of & Goal data failed to provide any deeper insight.
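One way an iterative binomial confidence interval could be computed is sketched below. The abstract does not say which interval was used, so the Clopper-Pearson (exact) interval, the bisection search, and the function names are assumptions made for illustration. Only the 1e-10 convergence tolerance comes from the text.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p). Adequate for modest n only."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def solve(func, lo: float = 0.0, hi: float = 1.0, tol: float = 1e-10) -> float:
    """Bisect until the bracket is narrower than tol.

    func must change sign between lo and hi.
    """
    f_lo = func(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = func(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid  # root lies in the upper half
        else:
            hi = mid  # root lies in the lower half
    return (lo + hi) / 2


def clopper_pearson(k: int, n: int, alpha: float = 0.05, tol: float = 1e-10):
    """Exact two-sided interval for k successes in n trials (assumed method)."""
    # Lower bound: the p at which P(X >= k) equals alpha / 2.
    lower = 0.0 if k == 0 else solve(
        lambda p: (1 - binom_cdf(k - 1, n, p)) - alpha / 2, tol=tol
    )
    # Upper bound: the p at which P(X <= k) equals alpha / 2.
    upper = 1.0 if k == n else solve(
        lambda p: binom_cdf(k, n, p) - alpha / 2, tol=tol
    )
    return lower, upper


# Hypothetical counts: 50 conversions in 100 attempts.
low, high = clopper_pearson(50, 100)
```

With the minimum N of 100 used in the study, the direct summation in `binom_cdf` is fast enough; for much larger samples a statistics library would be the more practical choice.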

    Due North: Analytics Research in Canadian Football

    1-Abstract

    A broad survey of the history of scholarship in Canadian football analytics as it relates to in-game decision-making. Works discussed span from 1982 to 2016, and while most focus on CFL data, there is some mention of Canadian university football research (then CIS, now known as U Sports). Eleven different works covering a variety of topics form the basis of modern analytics in Canadian football.

    Saturday, June 30, 2018

    Appendix to It's the Data, Stupid: Developing a U Sports Football Database

    6- Appendix 1: List of Games

    The table below provides a complete list of all the games for which play-by-play data has been found and included in the database as of the date of publication. The games are listed chronologically by date played, oldest game first.

    It's the Data, Stupid: Development of a U Sports Football Database

    1-Abstract

    A discussion of the development of a fully parsed database of U Sports football play-by-play data, including the sources and structures of available play-by-play data, examples of common errors in the data and methods to rectify them, and a complete discussion of the code, structured as individual functions, used to parse the data into discrete columns. The code used is included within the discussion of each function. The parsed data forms a relational database of plays that allows further research to be conducted at the individual play level. This database is believed to be the first of its kind for both U Sports and Canadian football in general.

    Friday, June 8, 2018

    You Play to Win the Game: Win Probability in American Football

    1-Abstract

    The conclusion of a three-part series discussing the three major fields of research in American football analytics: First Down Probability (P(1D)) (Clement 2018a), Expected Points (EP) (Clement 2018b), and Win Probability (WP). This chapter discusses the existing body of work regarding WP and the various models derived to estimate it over the last 30 years. Development in the sophistication of model-building techniques is an ongoing theme, and better understanding of measures of uncertainty is an emergent topic as competing models enter public view and analytic notions infiltrate the sporting lexicon.

    Tuesday, June 5, 2018

    Score, Score, Score Some More: Expected Points in American Football

    1 – Abstract
    A continuation of prior efforts to consolidate the body of knowledge in American football analytics (Clement 2018), this work discusses the development of Expected Points as an analytical model over the past decades, with a focus on results, statistical methods, data management techniques, and historiographical change.
    Over the past forty years, EP has seen steady advancement in the statistical techniques employed, growing from linear approximations based on two data points to advanced smoothing techniques and non-linear fits. This growth is aided by the exponential growth in dataset sizes that has paralleled the rise in cheaply available computing.

    Sunday, June 3, 2018

    Keep the Drive Alive: First Down Probability in American Football

    1 – Abstract
    An examination of the existing scholarship on First Down Probability (P(1D)) in American football. The question itself is fairly unambiguous, and this work reviews ten studies from the past 45 years, most of them from the last decade. Sources generally agree that P(1D) decreases linearly with distance-to-gain for 1st and 2nd downs, while different sources model 3rd down as either a weakly fit linear relationship or a slight exponential fit.
    4th down was not examined fully by any source because of insufficient data. What data points can be confidently placed seem very close to 3rd down, leading to discussion over whether 3rd down data can serve as a proxy for 4th down in decision-making models.
