Sports Analytics 101: Descriptive vs. Predictive

Sports Analytics 101 is a series of blog posts outlining the core concepts behind sports analytics in non-technical terms. You can find all available installments in the series here.

In an earlier post, I introduced a framework for thinking about an individual sports analytics metric. This framework is essentially mental “paperwork” to fill out whenever you use a new metric to ensure you understand what the metric is and what it isn’t.

In using the framework, we first establish the name of the metric and what it’s being used to quantify. Next, we establish whether the metric is a fact or a proxy. Once all that’s done, we arrive at the question of whether the metric is descriptive or predictive.

If you’ve followed sports long enough, particularly player personnel moves, you’ve probably heard the phrase “paying for past performance.” It is typically used as an indictment of a general manager or front office for offering a player a contract commensurate with their past production rather than their expected future production. There are many reasons past performance might not resemble future performance, but one of the most notable is age. If a player has performed at a high level in recent years but is on the way out of their physical prime, you probably shouldn’t expect them to perform at the same level going forward.

Nonetheless, it’s easy to see why a front office might still end up “paying for past performance.” After all, many of the traditional sources of information that a front office might factor into its decision-making are from the past. We can’t watch tape of a player’s future games to determine what to pay them for those games. We don’t know for sure how many points the player will score next season. Without much information on future performance, the front office might assume that future performance will be similar to past performance. It probably doesn’t help that if the player has had a few good seasons recently, any good agent will focus on that past performance in contract negotiations.

But future performance doesn’t always resemble past performance, so it’s important to distinguish between what happened in the past and what we expect to happen in the future, especially when it comes to sports analytics metrics. This brings us to the distinction between descriptive metrics and predictive metrics: descriptive metrics are intended to describe what has happened in the past, while predictive metrics are intended to provide insight into what might happen in the future. (Note the use of “might” here: virtually any prediction of the future carries some uncertainty.)

To illustrate this distinction, let’s return to an example from the media use cases post. Recall the hypothetical blogger, Simone, who builds a metric to rate how good soccer players have been at taking corner kicks. In this case, Simone is building a metric to summarize what happened in the past, specifically how good certain players have been at taking corner kicks in previous games. Perhaps this metric could be used to estimate how good these players will be at taking corner kicks in the future, but the explicit output of the metric is a description of past corner-kick-taking performance.

Contrast this with a metric like win probability, which is often used to estimate the probability that a team will win a particular future game. A win probability model might take historical data as inputs, but the output metric itself, the probability that a team will win a particular future matchup, is forward-looking. Win probability, in this case, is an estimate of what might happen in the future, not a description of what has happened in the past.

As it turns out, most predictive metrics are built using descriptive metrics. It’s difficult to build a metric that accurately predicts what will happen in the future without knowing how similar situations have played out in the past. If I asked you to predict tomorrow’s weather in Boston, your first question would probably be: “What is the weather typically like in Boston at this time of year?” You would most likely use this historical, descriptive information to make an educated prediction about tomorrow’s weather.

So, what ultimately determines whether a metric is descriptive or predictive? It's all about context. Whether a metric is descriptive or predictive comes down to what we’re using the metric to quantify. If we’re using the metric to predict something in the future, it’s a predictive metric in that context, for better or worse.

Some metrics can theoretically operate as either descriptive metrics or predictive metrics, depending on how they're being used. Let’s continue with the example of Reading Hours from an earlier post. If we’re using Reading Hours to quantify how many hours Brian spent reading last year, it’s a descriptive metric. If we’re using Reading Hours to quantify how many hours we predict Brian will read next year, it’s a predictive metric.
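To make the Reading Hours example concrete, here is a minimal Python sketch that uses the same metric in both capacities. Everything in it is hypothetical: the reading log, its numbers, and the function names are invented for illustration. The predictive rule (“next year will look like the average of past years”) is also just an assumed naive forecast, not a recommended model. It shows how a predictive metric can be built directly from descriptive ones, and how doing so quietly assumes the future will resemble the past.

```python
# Hypothetical reading log: hours Brian logged per reading session, by year.
# These numbers are made up purely for illustration.
reading_log = {
    2021: [1.5, 2.0, 0.5],
    2022: [1.0, 3.0, 2.0],
    2023: [2.5, 1.5, 1.0, 2.0],
}


def reading_hours(year: int) -> float:
    """Descriptive use: total hours Brian actually spent reading in a past year."""
    return sum(reading_log[year])


def predicted_reading_hours(past_years: list[int]) -> float:
    """Predictive use (hypothetical naive rule): forecast next year's total as
    the average of past yearly totals. This assumes the future will resemble
    the past and ignores anything that might change, such as a new job."""
    totals = [reading_hours(y) for y in past_years]
    return sum(totals) / len(totals)


last_year = reading_hours(2023)
next_year_estimate = predicted_reading_hours([2021, 2022, 2023])
print(f"Reading Hours last year (descriptive): {last_year}")
print(f"Reading Hours next year (predictive): {next_year_estimate:.2f}")
```

With these made-up numbers, the descriptive value for last year is 7.0 hours, while the predictive estimate for next year is about 5.67 hours. Same metric name, two different questions being answered.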

That said, many advanced metrics were created explicitly to be used in either a descriptive or a predictive capacity. If you’re using a metric built by someone else, it’s important to understand whether the metric was constructed to be descriptive or predictive. No matter who built the metric, however, ask yourself: Am I using this particular metric to make a prediction and, if so, does the metric adequately account for the ways in which the future may differ from the past?