Sports Analytics 101: Team Use Cases (Part 2)

October 23, 2020

Welcome back to Sports Analytics 101, a series of blog posts outlining the core concepts behind sports analytics in non-technical terms. In the previous post, we covered the first of three categories of analytics use cases for teams: in-game strategy. In this post, we’ll focus on the other two categories: player personnel strategy and sports science.

Player Personnel Strategy

Player personnel strategy relates to the process of building and maintaining a roster. In most organizations, this strategy is devised and executed under the leadership of a general manager, technical director, president of football/basketball/hockey operations, or equivalent leader. Sometimes the head coach or manager takes on this role in addition to standard coaching duties.

There are two particular functions within player personnel strategy where analytics can prove incredibly useful:

The identification of players
The valuation of players

Let’s start with the identification of players.

In the process of constructing and maintaining a roster, front office leadership will identify areas of weakness within the existing roster and look to patch up those holes. In these situations, the team is, to borrow a phrase specific to drafts, “drafting for need” rather than “drafting best available.” Analytics can make the process of identifying players that fit a specific need more efficient. This process can be particularly critical for a team with resource constraints (in other words, a team that can’t afford to hire a large staff of scouts).

Let’s return to the example from the first installment of this series in which a soccer club (we’ll call it Green Lake FC) is in need of a new left back. Green Lake FC has several criteria for the player: that he be young, fast, and adept at free kicks.

Because soccer is a global sport, there are thousands of professional left backs worldwide. Green Lake has a team of scouts but, as good as they may be, the scouts have only so much time and can’t possibly watch film of all of the left backs that could fit the desired profile. Before Green Lake can deploy its scouts to begin watching the prospects, it has to narrow down the pool of prospects to a manageable size.

There are a few ways Green Lake could narrow down this pool. The club could only look at players that come recommended through the professional networks of the coaching staff and scouts. Alternatively, the club could only look at players that the coaching staff is already aware of. But in either of these cases, a large portion of the professional left backs in the world are completely disregarded without having been properly scouted. It’s simply not possible for the coaching and scouting staff and their networks to simultaneously be aware of and familiar with all of the professional left backs in the world.

Using data, however, Green Lake can very quickly “look” at a much larger portion of the professional left backs in the world and more efficiently filter the pool. Let’s say Green Lake has a database of players from many of the professional leagues worldwide. The analytics staff can write an algorithm that combs this database to find the subset of professionals that fit the stated criteria of being young, fast, and adept at free kicks. From here, the scouts can begin watching film and live games of players in this subset. Thus, the quality of the subset of prospects watched by the scouts is likely high because that group was sourced from a very large pool and whittled down via analytics.
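For readers curious what such a filter might look like, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `Player` fields, the thresholds for "young," "fast," and "adept at free kicks," and the five sample players are invented, and a real club would use its own database and definitions.

```python
from dataclasses import dataclass


@dataclass
class Player:
    name: str
    position: str
    age: int
    top_speed_kmh: float         # hypothetical speed metric
    free_kick_success: float     # hypothetical share of successful free kicks


def shortlist_left_backs(players, max_age=23, min_speed_kmh=33.0, min_free_kick_success=0.10):
    """Keep left backs that meet every criterion, fastest first."""
    matches = [
        p for p in players
        if p.position == "LB"
        and p.age <= max_age
        and p.top_speed_kmh >= min_speed_kmh
        and p.free_kick_success >= min_free_kick_success
    ]
    return sorted(matches, key=lambda p: p.top_speed_kmh, reverse=True)


# Invented sample data standing in for a worldwide player database.
players = [
    Player("Player A", "LB", 21, 34.1, 0.14),
    Player("Player B", "LB", 29, 34.5, 0.18),  # too old
    Player("Player C", "LB", 22, 30.2, 0.12),  # too slow
    Player("Player D", "RB", 20, 35.0, 0.20),  # wrong position
    Player("Player E", "LB", 19, 33.4, 0.11),
]

shortlist = shortlist_left_backs(players)
print([p.name for p in shortlist])
```

The scouts would then spend their film time on the short list rather than on the full pool.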

The second common use for analytics within player personnel strategy is player valuation: predicting the future performance of the player in order to help answer common front office questions like:

How high should this player go in the draft?
How much are we willing to pay this player?
What assets are we willing to give up in a trade for this player?

Analytics can be quite useful in helping to predict the future performance of current players or prospects through the process of predictive modeling. Predictive modeling, in this context, involves analyzing historical data to identify which components of a player’s performance are predictive of future performance, and how predictive each of those components is.

Let’s return to the Bears, the professional baseball team from the first post in this series. The Bears are preparing for the upcoming draft and would like to maximize their chances of drafting a future All-Star with their first round pick. The General Manager asks the analytics department to prepare a ranked list of potential draftees that are most likely to be All-Stars in the major leagues. The General Manager plans to combine the information from this list with intel from the scouting department in deciding who to draft.

The Bears’ analytics staff goes to work building a predictive model to estimate the likelihood that a player will be an All-Star. They begin by compiling the college and high school stats of as many players as they can, including some players that ultimately became All-Stars and some that did not. The analytics staff then uses one or more of the many applicable statistical techniques to determine which high school and college stats are predictive of eventually being an All-Star, and which of these stats are the most predictive. With this information, they can analyze the stats of the potential draftees and identify which ones are most likely to be All-Stars.
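As a rough illustration, the sketch below uses logistic regression, which is only one of the many techniques that could apply here and is not necessarily what any real team uses. The historical stats, the two chosen features (batting average and home runs per season), and the three prospects are all made up for the example.

```python
import math

# Made-up history: (batting average, home runs per season, became an All-Star? 1/0)
HISTORY = [
    (0.340, 18, 1), (0.325, 15, 1), (0.310, 12, 1), (0.315, 16, 1),
    (0.300, 14, 0), (0.290, 6, 0), (0.275, 8, 0), (0.260, 4, 0),
]
# Made-up draft prospects: (name, batting average, home runs per season)
DRAFTEES = [("Prospect X", 0.335, 17), ("Prospect Y", 0.270, 5), ("Prospect Z", 0.305, 11)]


def column_stats(rows):
    """Mean and standard deviation of each stat, used to put stats on one scale."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) for c, m in zip(cols, means)]
    return means, stds


def scale(row, means, stds):
    return [(v - m) / s for v, m, s in zip(row, means, stds)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.5, steps=2000):
    """Fit logistic regression weights with plain gradient descent."""
    weights, bias, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(steps):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(bias + sum(w * v for w, v in zip(weights, xi))) - yi
            grad_b += err
            for j, v in enumerate(xi):
                grad_w[j] += err * v
        bias -= lr * grad_b / n
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
    return weights, bias


features = [row[:2] for row in HISTORY]
labels = [row[2] for row in HISTORY]
means, stds = column_stats(features)
weights, bias = fit_logistic([scale(r, means, stds) for r in features], labels)


def all_star_probability(stats):
    x = scale(stats, means, stds)
    return sigmoid(bias + sum(w * v for w, v in zip(weights, x)))


# Rank prospects from most to least likely All-Star, per this toy model.
ranking = sorted(DRAFTEES, key=lambda d: all_star_probability(d[1:]), reverse=True)
for name, avg, hr in ranking:
    print(name, round(all_star_probability((avg, hr)), 2))
```

The fitted weights indicate how strongly each stat is associated with All-Star status in the sample, and the ranked list is the kind of output the General Manager could weigh alongside scouting reports.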

One particular variety of predictive modeling in sports analytics goes further by identifying historical players that were very similar in high school and college to each potential draftee. The theory behind this is that players that were very similar statistically (and even physically) in high school and college are more likely to have similar career trajectories in the majors. For example, if the Bears’ analytics staff would like to predict the career trajectory of a standout high school outfielder named Jorge, they might comb their database of historical high school players to find the players that had the most similar stats and attributes to Jorge in high school. The analytics staff would then analyze the career trajectories of these comparable players to estimate the career trajectory of Jorge. This is the general approach taken by Nate Silver’s famous PECOTA model.
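The comparable-player idea can be sketched as a simple nearest-neighbor search. This is an illustration of the general concept only, not a description of how PECOTA is actually implemented. The historical players, the three high school stats, the "career value" numbers, and Jorge's stat line are all invented.

```python
import math

# Invented history: (name, HS batting average, HS home runs, HS stolen bases, career value)
HISTORICAL = [
    ("Hist 1", 0.410, 10, 30, 25.0),
    ("Hist 2", 0.405, 12, 28, 31.0),
    ("Hist 3", 0.350, 20, 5, 12.0),
    ("Hist 4", 0.330, 4, 8, 1.5),
    ("Hist 5", 0.415, 9, 33, 28.0),
]
JORGE = (0.408, 11, 29)  # invented high school stat line


def feature_stds(rows):
    """Standard deviation of each stat, so no single stat dominates the distance."""
    stds = []
    for col in zip(*rows):
        mean = sum(col) / len(col)
        stds.append(math.sqrt(sum((v - mean) ** 2 for v in col) / len(col)))
    return stds


def distance(a, b, stds):
    """Euclidean distance between two stat lines after scaling each stat."""
    return math.sqrt(sum(((x - y) / s) ** 2 for x, y, s in zip(a, b, stds)))


def find_comparables(target, historical, k=3):
    """Return the k historical players whose stats are closest to the target's."""
    stds = feature_stds([row[1:4] for row in historical])
    ranked = sorted(historical, key=lambda row: distance(target, row[1:4], stds))
    return ranked[:k]


comparables = find_comparables(JORGE, HISTORICAL)
# Project Jorge's career value as the average of his comparables' career values.
projection = sum(row[4] for row in comparables) / len(comparables)
print([row[0] for row in comparables], projection)
```

Averaging the comparables is the simplest possible projection; a real system would likely weight closer matches more heavily and look at full career curves rather than a single number.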

Notice how this process of combing a database of historical high school baseball players to find players with certain attributes is very similar to Green Lake FC’s process of using an algorithm to search a database for left backs with certain characteristics. One of the key advantages of using data is the ability to efficiently identify players in a database with certain desired qualities without having to watch hours of film on thousands of players. This ability helps not only in identifying prospects but also in predicting the future performance of those prospects.

Additional player personnel strategy questions include:

Soccer: How much is a Major League Soccer international roster spot worth?
Baseball: Is it worth signing this free agent pitcher at the price his agent is asking?
Hockey: If we traded our first round draft pick, what is fair compensation?
Football: All else equal, is it more efficient to use a high draft pick on a QB or a WR?

Sports Science

The goal of sports science is to protect the health and maximize the physical output of athletes. Often, a team’s sports scientists work in concert with the team’s medical staff. In fact, sports scientists may have a background in a medical field themselves.

Sports science involves monitoring each player’s vital physical markers and using that data to make important decisions, such as how hard a player’s training should be in a particular week, or whether a player should be permitted to play an entire game.

Consider a hockey team, the Rhinos. The Rhinos have a sports science staff that works closely with the team trainers and doctors. These members of the sports science and medical staff work with the coaching staff to determine how many minutes a player should practice in a week and how many minutes a player should play in a game. For example, the Rhinos might have a center who is returning from a hamstring injury, Lucy. The coaching staff is eager to bring Lucy back into the lineup, but wants to be cautious so that she doesn’t reinjure her hamstring and require additional time out. The sports science staff has a large dataset of historical players, their minutes played over a period, their injury history, and other key physical attributes. Using this data, the sports science staff builds a predictive model that estimates the likelihood that a player will get injured during a game given the player’s physical attributes and the amount of time played in the game. The Rhinos’ sports science staff might come to the coaching staff and say something along the lines of:

Given Lucy’s injury history, recovery time, and overall physique, our model predicts that she has roughly a 40% chance of getting injured if she plays the next game. However, if she’s given an additional week of rest, we predict that she will only have roughly a 5% chance of getting injured in next week’s game.

As you can imagine, this type of insight can be quite useful to a coaching staff that needs to balance the desire to have Lucy back in the lineup with the need to protect her health for the long term.

Another application of sports science is the use of data to plan differential training sessions for various players on a team given their respective recent output. Let’s return to Green Lake FC. Particularly in soccer, because 11 players play most of the game and only three substitutions are allowed, a subset of starters will play the vast majority of minutes on game days while the remainder of the squad will either play little or not at all. In the days after a game, the team’s manager will likely want to plan a lighter training session for the players that played most or all of the game, and a harder training session for the players that didn’t play. Sports science can help determine how much training each player should undertake to maintain fitness without being overworked.

Green Lake’s sports science staff can use wearable devices to track all player movement during both training sessions and games. These wearable devices are small trackers that players wear under their jerseys. The devices track metrics like distance and speed, which help the sports science staff determine the players’ physical output. This information can be combined with supplementary data, like how much the players have slept and how they’re feeling mentally, to estimate the amount of rest the players need in order to be in optimal physical condition for the next game.
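To make this concrete, here is a toy sketch of how tracker data and self-reported data might be combined into a training recommendation. The load formula, the thresholds, and the sample squad are purely hypothetical and chosen for readability; they are not drawn from any real sports science protocol.

```python
from dataclasses import dataclass


@dataclass
class PlayerLoad:
    name: str
    match_distance_km: float  # from the wearable tracker
    sprint_count: int         # from the wearable tracker
    sleep_hours: float        # self-reported
    wellness: int             # self-reported, 1 (poor) to 5 (great)


def recommend_session(p):
    """Suggest a training intensity using a hypothetical, hand-picked rule."""
    # Hypothetical load score: distance covered plus a small bonus per sprint.
    load = p.match_distance_km + 0.1 * p.sprint_count
    # Hypothetical fatigue flags based on the supplementary data.
    fatigued = p.sleep_hours < 7 or p.wellness <= 2
    if load >= 10 or fatigued:
        return "light"
    if load >= 5:
        return "moderate"
    return "hard"


# Invented squad data from the most recent game.
squad = [
    PlayerLoad("Starter", 11.2, 25, 8.0, 4),       # played the full game
    PlayerLoad("Late sub", 4.8, 12, 7.5, 4),       # played part of the game
    PlayerLoad("Unused sub", 0.0, 0, 8.5, 5),      # did not play
    PlayerLoad("Tired reserve", 0.0, 0, 5.5, 2),   # did not play but slept poorly
]

plan = {p.name: recommend_session(p) for p in squad}
print(plan)
```

In practice, the staff would calibrate a rule like this against their own historical data rather than picking thresholds by hand.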

Additional sports science questions include:

Basketball: How many minutes should we play him in tonight’s regular season game to ensure he’s adequately rested for the playoffs?
Baseball: How many pitches can we expect from our starter before they need to come out?
Tennis: How much sleep does she need to be getting in the week before her next match?

In Summary

We’ve now covered the three categories of analytics use cases for teams: in-game strategy (in the previous post), player personnel strategy, and sports science. Remember, there are no hard boundaries between these categories and a piece of analysis could fall into more than one category.

In the next post, we’ll dive into analytics use cases for leagues.