Welcome back to Sports Analytics 101, a series of blog posts outlining the core concepts behind sports analytics in non-technical terms. You can find all available installments in the series here.
In the previous two posts, we covered the three categories of analytics use cases for teams: in-game strategy, player personnel strategy and sports science. We now turn our attention to analytics use cases for leagues.
League-level sports analytics work can generally be classified into two categories: league-orchestrated development and league-level decision making.
A league might be interested in helping to supplement or drive the analytics work of its constituent teams to further the broader development of analytics and technology related to the sport.
For example, let’s say the National Elbowball League (NEL) is the most influential league in the very made-up sport of elbowball. While elbowball is rapidly gaining popularity, the league office realizes that elbowball analytics research and technology significantly trail their counterparts in sports like baseball and basketball, where analytics has been gaining momentum for some time.
The NEL wants to boost the profile of the burgeoning sport and the league, and sees a more robust analytics ecosystem as one tool to make that happen. For example, the league wants to see panels at the MIT Sloan Sports Analytics Conference on the latest developments in elbowball, and talking heads on ESPN discussing whether Elbow Efficiency Rating is a fair way to determine who is the greatest elbowball player of all time.
The issue is, nobody has been collecting good data on elbowball, so even if one of the NEL’s teams wanted to start using analytics, it would presumably have to pay a lot of money for a sports data company to start collecting NEL data. That might be an insurmountable barrier for any individual team. However, the league office can harness the collective power of the entire league to engage a company to collect data for the league, and give that data to the individual teams. Yes, this too would likely be a significant investment, but the league office can spread the cost across the entire league to save money on behalf of each team.
A similar situation could occur even if there is already extensive data for a particular league. For example, if capturing a certain type of data requires technology embedded in individual players’ equipment, the league might have to negotiate with the players’ union and other stakeholders to facilitate collection of that data. The league would probably even have to change its own equipment rules. This is the type of data collection arrangement that would likely have to be undertaken at the league level.
Let’s say that the NEL wants to start collecting data that tracks player movement throughout the game. It determines that the best way to do this is to embed a chip in the players’ helmets (yes, of course you wear a helmet when you play elbowball!).
First, the league might have to change the rulebook to allow chips in helmets, which could involve some negotiation with the players’ union.
Second, the players’ union might want assurances over who will have access to the data and how it will be used, like whether the data will be made public, or whether it will be used to make health evaluations. All parties involved will want to ensure that embedding the chip in the helmet does not make the helmet less safe.
Third, the league will have to work with a technology company to build the specialized chips that fit in the helmets, and any additional technology required to support the tracking infrastructure.
It might not be possible for an individual team to orchestrate this complex process. Therefore, the rollout of this type of data collection technology requires coordination by the league office.
Unlike individual teams, leagues are generally not looking to analytics to uncover competitive edges that might make a difference in-game. However, there are still insights to be gleaned from game data that can be quite useful in the decisions leagues need to make. For example, consider rule changes. When an existing league is evaluating a potential rule change, or an upstart league is building its rulebook from scratch, analytics can help evaluate the impact of potential rules. This applies to in-game rules and rules that govern the structure of the season. Let’s walk through examples of both.
Say an upstart football league, the Other Football League (OFL), has decided that it will not allow kickoffs and will instead have teams start with the ball somewhere on their own side of the 50-yard line. The league would prefer games to be high-scoring, but it doesn’t want the final scores to be too high. So where should the ball be placed in lieu of a kickoff? To answer this question, it would be helpful to have some estimates of how much scoring the league could expect for each placement of the ball. The league’s analytics staff turns to statistical simulation to build these estimates.
Statistical simulation uses historical data to model real-life situations as closely as possible. In this case, the league’s analytics staff uses historical NFL data to determine how often NFL teams have scored from different starting points on the field. Using this information, they can build a game simulator that closely mimics an NFL game, while letting the analysts alter the rules slightly to understand the effects of prospective rule changes. The analytics staff first builds a game simulator in which teams start with the ball on the 35-yard line. They simulate thousands of games using this rule and determine the average total score of those simulations. Next, they go through the same process, this time with teams starting with the ball on the 40-yard line, and so on. Using these different simulations, the analytics staff can go to league executives and say something along the lines of:
Based on our simulations, if teams started with the ball on the 35-yard line instead of receiving a kickoff, the average total score of a game would be roughly 55 points. If teams started with the ball on the 40, the average total score would be roughly 58 points. If teams started with the ball on the 45, the average total score would be roughly 61 points.
Of course, all of these results are estimates that would need to be taken with a grain of salt. For example, the simulations are built using NFL data, but there is certainly no guarantee that OFL players would be as effective at scoring as NFL players. Nevertheless, this type of analysis could be a helpful guide for OFL decision-makers building their rulebook.
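For readers curious what a simulator like this might look like under the hood, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the per-drive scoring probabilities in `DRIVE_OUTCOMES` are placeholder numbers (not real NFL data), the `DRIVES_PER_TEAM` value is a guess, a touchdown is always worth 7 points, and every drive is treated as starting from the same spot on the field. A real league simulator would be far more detailed.

```python
import random

# HYPOTHETICAL per-drive outcome probabilities by starting yard line.
# These are placeholder numbers for illustration, NOT real NFL data.
DRIVE_OUTCOMES = {
    35: {"touchdown": 0.22, "field_goal": 0.15},
    40: {"touchdown": 0.24, "field_goal": 0.16},
    45: {"touchdown": 0.26, "field_goal": 0.17},
}
DRIVES_PER_TEAM = 11  # assumed number of drives each team gets per game


def simulate_drive(start_yard_line, rng):
    """Return the points scored on one simulated drive."""
    probs = DRIVE_OUTCOMES[start_yard_line]
    roll = rng.random()
    if roll < probs["touchdown"]:
        return 7  # simplification: touchdown plus a made extra point
    if roll < probs["touchdown"] + probs["field_goal"]:
        return 3
    return 0


def simulate_game(start_yard_line, rng):
    """Return the combined score of both teams in one simulated game."""
    total_drives = 2 * DRIVES_PER_TEAM
    return sum(simulate_drive(start_yard_line, rng) for _ in range(total_drives))


def average_total_score(start_yard_line, n_games=10_000, seed=0):
    """Simulate many games and return the average combined score."""
    rng = random.Random(seed)
    total = sum(simulate_game(start_yard_line, rng) for _ in range(n_games))
    return total / n_games


for yard_line in (35, 40, 45):
    avg = average_total_score(yard_line)
    print(f"Start on the {yard_line}: average total score {avg:.1f}")
```

The structure mirrors the process described above: pick a rule (the starting yard line), simulate thousands of games under it, average the results, then repeat with the next rule and compare.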
Simulation can also be used to analyze the impact of changes to a league’s schedule structure. For this example, we’ll return to the slightly more established National Elbowball League (NEL). Historically, the NEL season has consisted of 100 regular season games and a postseason contested by the top sixteen teams in the league. All rounds of the postseason follow a best-of-seven format. Leadership at the league feels that the postseason is too predictable, with the best-of-seven format allowing the higher-ranked team to win virtually every series.
Over the course of seven games, the better team has more opportunity to leverage its strengths. The underdog has to beat the favorite four times to advance, a tough task. On the other hand, if all rounds of the tournament were to consist of single-elimination games, the underdog would only need to win once, improving its chances of advancing in each round. The league, however, is hesitant to make the dramatic shift to a single-game knockout tournament for fear that, with more potential for upsets, tournament seeding would matter less and the regular season wouldn’t be nearly as important.
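The effect of series length can be checked with a bit of arithmetic. The short Python sketch below computes an underdog's chance of winning a best-of-N series, assuming the underdog has the same chance of winning every game and that games are independent of one another. The 40% per-game figure is a hypothetical value chosen for illustration, not an NEL statistic.

```python
from math import comb


def series_win_probability(p_game, best_of):
    """Chance of winning a best-of-`best_of` series given a fixed
    per-game win probability `p_game` (games assumed independent).

    Winning the series is equivalent to winning a majority of the games
    if all `best_of` games were played, so we sum binomial probabilities.
    """
    wins_needed = best_of // 2 + 1
    return sum(
        comb(best_of, k) * p_game**k * (1 - p_game) ** (best_of - k)
        for k in range(wins_needed, best_of + 1)
    )


UNDERDOG_GAME_WIN_PROB = 0.40  # hypothetical value for illustration

for best_of in (1, 3, 5, 7):
    chance = series_win_probability(UNDERDOG_GAME_WIN_PROB, best_of)
    print(f"best-of-{best_of}: underdog advances {chance:.1%} of the time")
```

Under these assumptions, an underdog that wins 40% of individual games advances 40% of the time in a single-game knockout but only about 29% of the time in a best-of-seven series.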
With this in mind, the league wants to find a happy medium that produces some, but not too many, upsets. In the real world, the league would almost certainly have an interest in maximizing the total number of games as well, because more games mean more ticket and TV revenue. For the sake of simplicity in this example, though, let’s pretend that money doesn’t matter to the NEL, and that the league is only interested in finding a healthy rate of upsets.
The analytics department at the NEL goes to work constructing a statistical season simulator. Using statistically realistic representations of NEL teams, they build a model that simulates each individual game of a season, including the regular season and the playoffs. With this model, they can simulate seasons under each playoff format to determine the rate of upsets in each round under each format. As in the previous example with the OFL, the NEL analytics staff comes back to league decision-makers with results that might sound something like this:
Under our current best-of-seven format, only 10% of round-of-sixteen and quarterfinal matchups end in an upset, making these two rounds of the playoffs very predictable. Based on our simulations, if we moved to a single-elimination format, we would see the percentage of upsets in the round-of-sixteen and quarterfinals rise to 40%. Under a best-of-three format, the percentage of upsets would be 30%, and under a best-of-five format, it would be 20%.
These simulations could also provide even more granular insights, such as the probability that each seed advances from each round under the different playoff formats. For a league decision-maker, having numerical estimates to guide decision-making could be invaluable.
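To make the idea concrete, here is a pared-down Python sketch of a playoff simulator. It is a simplification of the process described above: it skips the regular season and simulates only a sixteen-team bracket with fixed seeds. The team ratings in `STRENGTHS`, the win-probability formula in `play_series`, and the bracket order in `BRACKET` are all hypothetical choices made for illustration, not anything a real league is known to use.

```python
import random

# HYPOTHETICAL strength ratings for seeds 1..16 (higher = stronger team).
STRENGTHS = {seed: 32 - seed for seed in range(1, 17)}

# Assumed bracket order: adjacent seeds meet, and winners of adjacent
# matchups meet in the next round (1 v 16, 8 v 9, and so on).
BRACKET = [1, 16, 8, 9, 5, 12, 4, 13, 6, 11, 3, 14, 7, 10, 2, 15]

ROUND_NAMES = ["round of sixteen", "quarterfinals", "semifinals", "final"]


def play_series(a, b, best_of, rng):
    """Simulate a best-of-`best_of` series and return the winning seed."""
    # Assumed model: chance of winning a game is proportional to strength.
    p_a = STRENGTHS[a] / (STRENGTHS[a] + STRENGTHS[b])
    wins_needed = best_of // 2 + 1
    wins_a = wins_b = 0
    while wins_a < wins_needed and wins_b < wins_needed:
        if rng.random() < p_a:
            wins_a += 1
        else:
            wins_b += 1
    return a if wins_a == wins_needed else b


def upset_rates(best_of, n_sims=2_000, seed=0):
    """Return the share of series won by the lower-ranked team, per round."""
    rng = random.Random(seed)
    upsets = [0] * len(ROUND_NAMES)
    series_played = [0] * len(ROUND_NAMES)
    for _ in range(n_sims):
        field = list(BRACKET)
        round_index = 0
        while len(field) > 1:
            winners = []
            for i in range(0, len(field), 2):
                a, b = field[i], field[i + 1]
                winner = play_series(a, b, best_of, rng)
                series_played[round_index] += 1
                if winner == max(a, b):  # larger seed number = lower-ranked
                    upsets[round_index] += 1
                winners.append(winner)
            field = winners
            round_index += 1
    return [u / s for u, s in zip(upsets, series_played)]


for best_of in (1, 3, 5, 7):
    rates = upset_rates(best_of)
    summary = ", ".join(f"{n}: {r:.0%}" for n, r in zip(ROUND_NAMES, rates))
    print(f"best-of-{best_of} -> {summary}")
```

The same loop could be extended to record how often each seed survives each round, which is the kind of more granular output mentioned above.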
Additional league analytics questions include:
We’ve now covered analytics use cases for teams and leagues. Generally, we can categorize team use cases into three groups based on the types of decision making they’re intended to support: in-game strategy, player personnel strategy, and sports science. Similarly, we can categorize league use cases into two groups: league-orchestrated development and league-level decision making.
In the next post, we’ll shift our focus to use cases for the media.