Sunday, January 26, 2020

A Team Game: Team Objects in the Passes & Patterns Code

1. Abstract

A custom class was created for teams in the Passes & Patterns database to hold metadata about each team, with the team objects stored in a globally accessible dict. Linking to relevant game objects will allow much more efficient team-specific analysis and centralize team metadata. Future development involves creating player-level objects and linking team logo objects.

2. Introduction

Currently, teams in the Passes & Patterns database are identified by a unique three-letter code that refers to either the school name for U Sports teams or the team nickname for CFL teams. While this method is certainly adequate, and it allows us to look at team-level concepts, it requires us to loop over the database repeatedly for each team, or employ a burdensome nested list structure. Furthermore, any team-level information must be stored in a variety of different lists. Creating teams as objects allows all data about teams to be stored in a single location, and allows teams to be studied more efficiently, much as the stadium objects (Clement 2019a) allowed METAR data to be included with each play (Clement 2019b), permitting analyses of the impacts of weather in U Sports football (Clement 2019c).

While most previous work in the Passes & Patterns archives refers to general analyses, including all data from all teams to establish baselines, going deeper requires looking at the data on a team-by-team basis, and even by team-season, to suss out confounding issues and to determine the applicability of generalized models to specific teams. In short, are teams meaningfully unique, or can the same general recommendations apply across the board?

3. Team objects

Team objects are stored in a dict in the Globals module, in the same way that stadia are held, and exist as a custom class created from a list of data about each team. Diacritics have been removed for enhanced compatibility with different character encodings. Table 1 gives a summary of all the data of the U Sports teams. CFL teams are similarly structured.


| Abbr | Short Name | Long Name | Nickname | Conference |
| --- | --- | --- | --- | --- |
| ACA | Acadia | Acadia University | Axemen | {season:"AUS" for season in range(2002, 2020)} |
| SMU | Saint Mary’s | Saint Mary’s University | Huskies | {season:"AUS" for season in range(2002, 2020)} |
| SFX | St. Francis Xavier | St. Francis Xavier University | X-Men | {season:"AUS" for season in range(2002, 2020)} |
| MTA | Mount Allison | Mount Allison University | Mounties | {season:"AUS" for season in range(2002, 2020)} |
| BIS | Bishop’s | Bishop’s University | Gaiters | **{season:"AUS" for season in range(2017, 2020)}} |
| SHE | Sherbrooke | Universite de Sherbrooke | Vert et Or | {season:"RSEQ" for season in range(2003, 2020)} |
| LAV | Laval | Universite de Laval | Rouge et Or | {season:"RSEQ" for season in range(2002, 2020)} |
| MON | Montreal | Universite de Montreal | Carabins | {season:"RSEQ" for season in range(2002, 2020)} |
| CON | Concordia | Concordia University | Stingers | {season:"RSEQ" for season in range(2002, 2020)} |
| MCG | McGill | McGill University | Martlets | {season:"RSEQ" for season in range(2002, 2020)} |
| CAR | Carleton | Carleton University | Ravens | {season:"OUA" for season in range(2013, 2020)} |
| OTT | Ottawa | Ottawa | Gee-Gees | {season:"OUA" for season in range(2002, 2020)} |
| QUE | Queen’s | Queen’s University | Golden Gaels | {season:"OUA" for season in range(2002, 2020)} |
| TOR | Toronto | University of Toronto | Varsity Blues | {season:"OUA" for season in range(2002, 2020)} |
| YRK | York | York University | Lions | {season:"OUA" for season in range(2002, 2020)} |
| MAC | McMaster | McMaster University | Marauders | {season:"OUA" for season in range(2002, 2020)} |
| GUE | Guelph | University of Guelph | Gryphons | {season:"OUA" for season in range(2002, 2020)} |
| WAT | Waterloo | University of Waterloo | Warriors | {season:"OUA" for season in range(2002, 2020)} |
| WLU | Wilfrid Laurier | Wilfrid Laurier University | Golden Hawks | {season:"OUA" for season in range(2002, 2020)} |
| WES | Western | University of Western Ontario | Mustangs | {season:"OUA" for season in range(2002, 2020)} |
| WIN | Windsor | University of Windsor | Lancers | {season:"OUA" for season in range(2002, 2020)} |
| MAN | Manitoba | University of Manitoba | Bisons | {season:"CWUAA" for season in range(2002, 2020)} |
| SKH | Saskatchewan | University of Saskatchewan | Huskies | {season:"CWUAA" for season in range(2002, 2020)} |
| REG | Regina | University of Regina | Rams | {season:"CWUAA" for season in range(2002, 2020)} |
| ALB | Alberta | University of Alberta | Golden Bears | {season:"CWUAA" for season in range(2002, 2020)} |
| CGY | Calgary | University of Calgary | Dinos | {season:"CWUAA" for season in range(2002, 2020)} |
| UBC | British Columbia | University of British Columbia | Thunderbirds | {season:"CWUAA" for season in range(2002, 2020)} |
| SFU | Simon Fraser | Simon Fraser University | Clan | {season:"CWUAA" for season in range(2003, 2010)} |

Table 1 Team Object Data
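
As a rough illustration of the structure described above, the following minimal sketch builds team objects from lists of data and stores them in a dict keyed by abbreviation. The class name Team, the attribute names, and the names TEAM_DATA and TEAMS are assumptions made for this example; the actual Passes & Patterns code may be organized differently. Only two rows of Table 1 are reproduced.

```python
from dataclasses import dataclass, field


@dataclass
class Team:
    """Hypothetical team object; the attribute names are assumptions."""
    abbreviation: str
    short_name: str
    long_name: str
    nickname: str
    conference: dict = field(default_factory=dict)  # season -> conference


# Two rows copied from Table 1, each as a list of data about one team.
TEAM_DATA = [
    ["ACA", "Acadia", "Acadia University", "Axemen",
     {season: "AUS" for season in range(2002, 2020)}],
    ["MAN", "Manitoba", "University of Manitoba", "Bisons",
     {season: "CWUAA" for season in range(2002, 2020)}],
]

# Dict keyed by abbreviation, standing in for the dict held in Globals.
TEAMS = {row[0]: Team(*row) for row in TEAM_DATA}

print(TEAMS["MAN"].long_name)
print(TEAMS["ACA"].conference[2010])
```

Keying the dict by abbreviation means any play or game record that already carries the three-letter code can reach the full team object with a single lookup.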

a. Abbreviation

The abbreviation attribute is a string that holds the three-letter code by which teams are referred to throughout the database, such as “MAN.” This is a unique identifier for every school and is the primary means by which teams are labelled in all situations. The abbreviation is always in capital letters. For U Sports data it refers to the name of the school, not the name of the team, whereas for CFL data it refers to the team name, such as “ALS.” This is because the U Sports database preceded the CFL data, and to do otherwise would create ambiguity, since a number of CFL teams are collocated with universities named after cities, such as Ottawa, Toronto, Montreal, and Calgary. Furthermore, the existence of two different Ottawa franchises in the CFL data means that the city name cannot be used for this purpose. The abbreviation also serves as the key for the dict in which team objects are stored together, so that they can easily be referenced using their most common appellation in the data.

b. Short name

The short name is the common name of the university in everyday parlance. Generally it is the name of the university, less the word “university” and any associated articles (“of,” “the”), such as “Manitoba.” This is the form that will be used to identify teams in video footage, a future development. For CFL teams it is the full city name (or “BC” in the case of the B. C. Lions), also in full caps. This does create the ambiguity that the abbreviation avoids, so this form is only used to identify teams in video files.

c. Long name

The long name is the full proper name of the university, such as “University of Manitoba,” or the full team name, such as “Montreal Alouettes.” Any preceding “The” is omitted. This is written in title case as dictated by the styling of the organization, including the block caps of “Ottawa REDBLACKS,” which, in spite of all manner of sense, style, and taste, remains the preference of the organization.

d. Nickname

This string holds the team nickname, e.g. “Bisons,” for reference purposes. The McGill program, since relinquishing its nickname “Redmen,” is without a nickname. The stand-in “Martlets” is used here until the university comes to a decision on a future nickname.

e. Conference

The conference attribute is a Python dict, with the keys being the seasons and the values being the conference to which the team belonged in that season. Previously, determining U Sports conferences required a set of conditional statements to account for Bishop’s changing conferences. The dict allows us to reference the conferences much more easily, especially in the CFL, where conferences have been adjusted somewhat frequently as the number of teams and the schedule structure have evolved.
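
A short sketch of how such a season-keyed dict replaces conditional statements follows. The helper conference_in is hypothetical, and the pre-2017 “RSEQ” entries for Bishop’s are an assumption made for illustration, since Table 1 shows only the 2017 to 2019 portion of that entry. The Simon Fraser dict is copied from Table 1.

```python
# Assumption for illustration: Bishop's seasons before 2017 are "RSEQ".
bishops_conference = {
    **{season: "RSEQ" for season in range(2002, 2017)},
    **{season: "AUS" for season in range(2017, 2020)},
}

# Copied from Table 1: Simon Fraser only has entries for 2003 to 2009.
simon_fraser_conference = {season: "CWUAA" for season in range(2003, 2010)}


def conference_in(conference, season):
    """Return the conference for a season, or None if there is no entry."""
    return conference.get(season)


print(conference_in(bishops_conference, 2016))
print(conference_in(bishops_conference, 2017))
print(conference_in(simon_fraser_conference, 2015))
```

Using dict.get rather than indexing means a season with no entry yields None instead of raising a KeyError, which is one way to handle teams that were absent from the data in a given year.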

4. Further development

The existence of these objects now allows for far more team-level analysis without the need to repeatedly loop over the entire database for each team. This will allow efficient development of analyses by team, in order to determine whether and where there exist meaningful differences between teams, and where it is acceptable to use gross averages. 

a. Home stadia

Home stadia is a list holding references to all the stadium objects (Clement 2019a) that have served as that team’s home stadium. Only stadia that have been regular home venues for teams are included here, not every stadium in which a team was designated as the home team. This avoids a number of teams accumulating erroneous home stadia because of neutral-site games, but it does mean that the process of identifying these stadia must be done manually. This attribute is the inversion of the home_teams attribute associated with each stadium. Each team listed as a home team for a stadium object will have that stadium listed as one of its home venues. 
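
Because the home stadia are identified manually, a consistency check between the two directions of the link may be useful. The sketch below is one possible way to do this, not the method used in the codebase. The stripped-down Stadium and Team classes, the home_stadia attribute name, the helper functions, and the stadium name “Example Field” are all placeholders invented for this example; only home_teams is named in the text.

```python
class Stadium:
    """Hypothetical stand-in for the stadium objects."""
    def __init__(self, name):
        self.name = name
        self.home_teams = []


class Team:
    """Hypothetical stand-in for the team objects."""
    def __init__(self, abbreviation):
        self.abbreviation = abbreviation
        self.home_stadia = []


def link_home(team, stadium):
    """Record the relationship in both directions, without duplicates."""
    if stadium not in team.home_stadia:
        team.home_stadia.append(stadium)
    if team not in stadium.home_teams:
        stadium.home_teams.append(team)


def links_are_consistent(teams, stadia):
    """Check that home_stadia is the exact inversion of home_teams."""
    forward = {(t.abbreviation, s.name) for t in teams for s in t.home_stadia}
    backward = {(t.abbreviation, s.name) for s in stadia for t in s.home_teams}
    return forward == backward


man = Team("MAN")
field_a = Stadium("Example Field")  # placeholder name, not a real venue
link_home(man, field_a)
print(links_are_consistent([man], [field_a]))
```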

b. Game list

The game list is a list referencing all the game objects in which the team participated. For technical reasons this attribute will generally be deactivated, because its use impedes the process of pickling the game objects: it creates cross-links that prevent games from being pickled individually. When used, it allows a team’s statistics to be searched much more quickly.
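
One possible way around the cross-link problem, offered as a sketch rather than as the approach taken in the codebase, is to have the team object leave its game list out of its pickled state and rebuild the list after loading. The Team and Game classes, their attribute names, and the rebuild_game_lists helper are all hypothetical.

```python
import pickle


class Team:
    """Hypothetical team object with a list of its games."""
    def __init__(self, abbreviation):
        self.abbreviation = abbreviation
        self.games = []

    def __getstate__(self):
        # Leave the game list out of the pickled state, so that pickling
        # one game does not drag in every other game the team played.
        state = self.__dict__.copy()
        state["games"] = []
        return state


class Game:
    """Hypothetical game object that references both teams."""
    def __init__(self, game_id, home, away):
        self.game_id = game_id
        self.home = home
        self.away = away
        home.games.append(self)
        away.games.append(self)


def rebuild_game_lists(games):
    """Re-create the team-to-game links after games have been unpickled."""
    for game in games:
        for team in (game.home, game.away):
            if game not in team.games:
                team.games.append(game)


man, reg = Team("MAN"), Team("REG")
first = Game(1, man, reg)
second = Game(2, reg, man)

# Only the first game is serialized; the second is not pulled along.
restored = pickle.loads(pickle.dumps(first))
rebuild_game_lists([restored])
```

Note that the teams inside an unpickled game are copies, so a fuller version would need to reattach them to the team objects in the global dict, for example by abbreviation.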

c. Player list

A future project is to create player objects along the same lines, and this link would then reference each player associated with that team. Given the inconsistencies in the play-by-play that have already been repaired, it will likely be a substantial undertaking not only to correct all the play-by-play data for accuracy, but also to standardize the names of each player. Technical issues similar to those experienced with the game list are expected. The solutions to these problems are known, but relatively tedious.

d. Team Logos

Acquiring image files of each team’s logo and linking them to the team object will allow for improved visualizations where teams can be plotted individually and identified by their logo, making for a cleaner presentation than labelling individual data points, and helping to spot possible team-specific patterns.

5. Conclusion

With the addition of team objects into the codebase we can now look at how different teams behave, and quickly access information about teams. We can also look forward to improved visualizations of teams against one another, and the ability to look deeper into the individuality of teams. Though some technical issues are still being resolved, this is an important step in the development of U Sports analytics.

6. References


