1. Abstract
A custom class was created for teams in the Passes & Patterns database to hold metadata about each team, with the team objects stored in a globally accessible dict. Linking to relevant game objects will allow much more efficient team-specific analysis and centralize team metadata. Future development involves creating player-level objects and linking team logo objects.
2. Introduction
Currently, teams in the Passes & Patterns database are identified by a unique three-letter code that refers to either the school name for U Sports teams or the team nickname for CFL teams. While this method is adequate and allows us to look at team-level concepts, it requires us to loop over the database repeatedly for each team, or to employ a burdensome nested list structure. Furthermore, any team-level information must be stored in a variety of different lists. Creating teams as objects allows all data about teams to be stored in a single location, and allows teams to be studied more efficiently. This is similar to how the stadium objects (Clement 2019a) allowed METAR data to be included with each play (Clement 2019b), permitting analyses of the impacts of weather in U Sports football (Clement 2019c).
While most previous work in the Passes & Patterns archives consists of general analyses, including all data from all teams to establish baselines, going deeper requires looking at the data on a team-by-team basis, and even by team-season, to suss out confounding issues and to determine the applicability of generalized models to specific teams. In short, are teams meaningfully unique, or can the same general recommendations apply across the board?
3. Team objects
Team objects are stored in a dict in the Globals module, in the same way that stadia are held, and exist as a custom class created from a list of data about each team. Diacritics have been removed for enhanced compatibility with different character encodings. Table 1 gives a summary of all the data of the U Sports teams. CFL teams are similarly structured.
Table 1 Team Object Data
a. Abbreviation
The abbreviation attribute is a string that holds the three-letter code by which teams are referred to throughout the database, such as “MAN.” This is a unique identifier for every team and is the primary means by which teams are labelled in all situations. The abbreviation is always in capital letters. For U Sports data it refers to the name of the school, not the name of the team, whereas for CFL data it refers to the team name, such as “ALS.” This is because the U Sports database came first, and because to do otherwise would create ambiguity: a number of CFL teams are collocated with universities named after cities, such as Ottawa, Toronto, Montreal, and Calgary. Furthermore, the existence of two different Ottawa franchises in the CFL data means that the city name cannot be used for this purpose. The abbreviation also serves as the key for the dict in which team objects are stored together, so that teams can be easily referenced by their most common appellation in the data.
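The arrangement described here, a custom class built from a list of data and stored in a dict keyed by abbreviation, might be sketched as follows. This is a minimal illustration, not the actual Passes & Patterns code: the class name `Team`, the field names, the `from_row` helper, and the `TEAMS` dict are all assumptions, and the two rows use only examples that appear in this post (pairing “ALS” with the Montreal Alouettes is likewise an assumption).

```python
from dataclasses import dataclass


@dataclass
class Team:
    """Hypothetical team record; the field names are assumptions."""
    abbreviation: str  # three-letter code, always upper case, e.g. "MAN"
    short_name: str
    long_name: str
    nickname: str

    @classmethod
    def from_row(cls, row):
        # Build a team from a flat list of data about that team.
        abbreviation, short_name, long_name, nickname = row
        return cls(abbreviation.upper(), short_name, long_name, nickname)


# Illustrative rows only, not the real database contents.
TEAM_ROWS = [
    ["MAN", "Manitoba", "University of Manitoba", "Bisons"],
    ["ALS", "MONTREAL", "Montreal Alouettes", "Alouettes"],
]

# Stand-in for the dict held in the Globals module, keyed by abbreviation.
TEAMS = {team.abbreviation: team for team in map(Team.from_row, TEAM_ROWS)}

print(TEAMS["MAN"].long_name)
```

Keying the dict on the abbreviation means any play or game record that carries a three-letter code can reach the full team object with a single lookup.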
b. Short name
The short name is the common name of the university in spoken parlance. Generally it is the name of the university, less the word “university” and any associated articles (“of,” “the”), such as “Manitoba.” This is the form used to identify teams in video footage, a future development. For CFL teams it is the full city name (or “BC” in the case of the B. C. Lions), also in full caps. This does create ambiguity in the manner avoided above with the abbreviation, so this form is only used to identify teams in video files.
c. Long name
The long name is the full proper name of the university, such as “University of Manitoba,” or the full team name, such as “Montreal Alouettes.” Any preceding “The” is omitted. This is written in title case as dictated by the styling of the organization, including the block caps of “Ottawa REDBLACKS,” which, in spite of all manner of sense, style, and taste, remains the preference of the organization.
d. Nickname
This string holds the team nickname, e.g. “Bisons,” for reference purposes. The McGill program, since relinquishing its nickname “Redmen,” is without a nickname. The stand-in “Martlets” is used here until the university comes to a decision on a future nickname.
e. Conference
The conference attribute is a Python dict, with the keys being the seasons, and the values being the conference to which the team belonged in that season. Previously, determining U Sports conferences required a set of conditional statements to account for Bishop’s changing conferences. The dict allows us to reference the conferences much more easily, especially in the CFL, where conferences have been adjusted somewhat frequently as the number of teams and the schedule structure have evolved.
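As a rough sketch of how a season-keyed dict replaces the conditional statements, consider the following. The helper names are hypothetical, and the conference codes and seasons are placeholders rather than the real history of any team.

```python
# Hypothetical conference history keyed by season. The codes and years
# below are placeholders, not the actual record of any team.
conference_by_season = {2015: "CONF_A", 2016: "CONF_A", 2017: "CONF_B"}


def conference_in(conferences, season):
    """Return the conference for a season, or None if there is no entry."""
    return conferences.get(season)


def seasons_in(conferences, conference):
    """List, in order, the seasons a team spent in a given conference."""
    return sorted(s for s, c in conferences.items() if c == conference)


print(conference_in(conference_by_season, 2017))
print(seasons_in(conference_by_season, "CONF_A"))
```

A change of conference becomes a matter of data rather than code: a team that moves simply has a different value from that season onward, and no special case is needed.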
4. Further development
The existence of these objects now allows for far more team-level analysis without the need to repeatedly loop over the entire database for each team. This will allow efficient development of analyses by team, in order to determine whether and where there exist meaningful differences between teams, and where it is acceptable to use gross averages.
a. Home stadia
Home stadia is a list holding references to all the stadium objects (Clement 2019a) that have served as that team’s home stadium. Only stadia that have been regular home venues for teams are included here, not every stadium in which a team was designated as the home team. This avoids a number of teams accumulating erroneous home stadia because of neutral-site games, but it does mean that the process of identifying these stadia must be done manually. This attribute is the inversion of the home_teams attribute associated with each stadium. Each team listed as a home team for a stadium object will have that stadium listed as one of its home venues.
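Because the two attributes mirror each other, one way to keep them in step is to set both through a single helper and to check the pairing afterwards. The sketch below assumes minimal stand-in classes; only the `home_teams` attribute name comes from the text, while `home_stadia`, `link_home_stadium`, and `links_are_consistent` are hypothetical names, and the stadium name is a placeholder.

```python
class Stadium:
    """Minimal stand-in for the stadium objects (Clement 2019a)."""

    def __init__(self, name):
        self.name = name
        self.home_teams = []  # teams that regularly call this stadium home


class Team:
    """Minimal stand-in for a team object."""

    def __init__(self, abbreviation):
        self.abbreviation = abbreviation
        self.home_stadia = []  # hypothetical attribute name


def link_home_stadium(team, stadium):
    """Record a manually identified home venue on both objects."""
    if stadium not in team.home_stadia:
        team.home_stadia.append(stadium)
    if team not in stadium.home_teams:
        stadium.home_teams.append(team)


def links_are_consistent(teams, stadia):
    """Check that home_stadia and home_teams are exact inverses."""
    forward = {(t.abbreviation, s.name) for t in teams for s in t.home_stadia}
    backward = {(t.abbreviation, s.name) for s in stadia for t in s.home_teams}
    return forward == backward


man = Team("MAN")
venue = Stadium("Example Stadium")  # placeholder name
link_home_stadium(man, venue)
print(links_are_consistent([man], [venue]))
```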
b. Game list
The game list is a list referencing all the game objects in which the team participated. For technical reasons this attribute will generally be deactivated because its use impedes the process of pickling the game objects, since it creates cross links that prevent games from being pickled individually. When used, it allows a team’s statistics to be searched much more quickly.
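One possible way around the cross-link problem, offered only as a sketch and not as the approach used in the database, is to have the team object leave its game list out of its pickled state. Python's `pickle` module calls `__getstate__` when it is defined, so a single game can then be pickled without dragging every other game along through the team. The class and attribute names here are assumptions, and “XXX” is a placeholder opponent code.

```python
import pickle


class Team:
    def __init__(self, abbreviation):
        self.abbreviation = abbreviation
        self.games = []  # references to game objects; hypothetical name

    def __getstate__(self):
        # Copy the instance dict and drop the cross links before pickling.
        state = self.__dict__.copy()
        state["games"] = []
        return state


class Game:
    def __init__(self, game_id, home, away):
        self.game_id = game_id
        self.home = home
        self.away = away
        # Register the game with both teams, creating the cross links.
        home.games.append(self)
        away.games.append(self)


man = Team("MAN")
opp = Team("XXX")  # placeholder opponent
games = [Game(i, man, opp) for i in range(3)]

# Pickle one game on its own; the teams come along without their game lists.
restored = pickle.loads(pickle.dumps(games[0]))
print(restored.home.abbreviation, len(restored.home.games))
```

The trade-off is that the game lists are empty after loading and would have to be rebuilt by walking the loaded games once.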
c. Player list
A future project is to create player objects along the same lines, and this link would then reference each player associated with that team. Given the inconsistencies in the play-by-play that have already been repaired, it will likely be a substantial undertaking not only to correct all the play-by-play data for accuracy, but also to standardize the names of each player. Technical issues similar to those experienced with the game list are expected. The solutions to these problems are known, but relatively tedious.
d. Team Logos
Acquiring image files of each team’s logo and linking them to the team object will allow for improved visualizations where teams can be plotted individually and identified by their logo, making for a cleaner presentation than labelling individual data points, and helping to spot possible team-specific patterns.
5. Conclusion
With the addition of team objects into the codebase we can now look at how different teams behave, and quickly access information about teams. We can also look forward to improved visualizations of teams against one another, and the ability to look deeper into the individuality of teams. Though some technical issues are still being resolved, this is an important step in the development of U Sports analytics.
6. References
Clement, Christopher M. 2019a. “Home Sweet Home: Football Stadia in Canada.” Passes & Patterns. March 6, 2019. https://passesandpatterns.blogspot.com/2019/03/home-sweet-home-football-stadia-in.html.
———. 2019b. “Rain or Shine: Incorporating Weather Data into the U Sports Database.” Passes & Patterns. April 3, 2019. https://passesandpatterns.blogspot.com/2019/04/rain-or-shine-incorporating-weather.html.
———. 2019c. “It’s Up and It’s Good: Field Goals in U Sports.” Passes & Patterns. July 19, 2019. https://passesandpatterns.blogspot.com/2019/07/its-up-and-its-good-field-goals-in-u.html.