Sports Analytics 101: Facts and Proxies

December 1, 2020

Sports Analytics 101 is a series of blog posts outlining the core concepts behind sports analytics in non-technical terms. You can find all available installments in the series here.

When using the metric framework (introduced in the previous post) to analyze a metric, one of the key things we need to establish is whether the metric, when used in a particular way, is a statement of fact or a proxy.

Metrics that are statements of fact generally include everything you could find in a boxscore: the total points scored by each team, the number of goals scored by a particular player, or the number of blocks. They also include averages of boxscore-type stats, like the average number of points scored by a particular player. A player either scored ten points or they didn’t. They either averaged 15.6 points or they didn’t.

A proxy is a substitute for, or representative of, something else, but isn’t exactly the thing it’s representing. For example, you may have heard of the term “vote by proxy.” Voting by proxy consists of sending a representative to vote on your behalf because you’re unable to vote for one reason or another. Proxies have a similar function in sports analytics. A sports analytics metric is a proxy if it’s intended to at least roughly represent something that is difficult or impossible to quantify.

Whether a metric is a fact or a proxy depends on what the metric is being used to quantify. A metric could be a fact when used to quantify one thing and a proxy when used to quantify something else. Continuing the example from the previous post in this series, Reading Hours is a factual metric when it’s used to quantify Brian’s time spent reading, because the number of hours Brian spends reading indisputably quantifies the amount of time Brian spends reading. On the other hand, Reading Hours can only ever be a proxy for Brian’s understanding of American history (which is too abstract to truly quantify), and probably not a great proxy at that.

Shifting to sports, consider a common feature of both analytics-based and non-analytics-based sportswriting: power rankings. Power rankings for just about every sport are published by a variety of analysts, using different methodologies. Some writers create power rankings by watching games and using their deep understanding of the game to rank teams. Others build complicated mathematical models to rank teams. In all cases, the power ranking is an attempt to quantify the relative strengths of teams.

However, as good as a power ranking may be, it’s not a purely factual ranking. The relative strengths of teams in a league are impossible to indisputably quantify. Yes, win-loss records and standings quantify how a team performed in a given season, but are records factual reflections of how strong teams intrinsically are? I would argue they aren’t. Some might counter that a team is only as good as its record; however, it is not an indisputable fact that any standings or power rankings reflect the true relative intrinsic strengths of teams. We can all agree that a team won 10 games or scored 50 points. Those are facts. We may not all agree about how strong a team is relative to its competitors.

Of course, just because power rankings aren’t statements of fact doesn’t mean they aren’t valuable or useful. Power rankings can still be useful proxies for the relative strengths of teams.

As you delve further into advanced sports analytics metrics, you’ll find that essentially all advanced metrics are proxies. Many of these metrics are incredibly useful and offer important insight, but because they are proxies, they are not purely factual: some subjective human thought went into constructing them and into defining their relationship to whatever they’re quantifying. This is fine and entirely expected. If we only looked at simple factual metrics like goals scored, analytics wouldn’t be all that useful.

However, it’s important to always keep in mind that proxies inherently contain this subjective influence. While analytical measures often help us cut through human biases, it’s also possible that human biases inform how a proxy metric is constructed. It’s the job of the analyst to understand whether biases exist in a metric and how they might affect the relationship between the metric and what the metric is intended to quantify.
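
For readers who like to see ideas in code, here is a minimal Python sketch of how subjective choices can shape a proxy. Everything in it is made up for illustration: the team names, the season totals, the `power_ranking` function, and both sets of weights are assumptions, not anyone’s real methodology. The wins and point differentials play the role of facts, while each ranking is a proxy for team strength built from those facts.

```python
# Hypothetical season totals. These are "facts": boxscore-style numbers
# that everyone can agree on.
teams = {
    "Aardvarks": {"wins": 10, "point_diff": 15},
    "Badgers": {"wins": 9, "point_diff": 60},
    "Cougars": {"wins": 7, "point_diff": -20},
}


def power_ranking(teams, win_weight, diff_weight):
    """Rank teams, strongest first, by a weighted score.

    The weights are the analyst's subjective choice, which is what
    makes the resulting ranking a proxy rather than a fact.
    """
    def score(name):
        stats = teams[name]
        return win_weight * stats["wins"] + diff_weight * stats["point_diff"]

    return sorted(teams, key=score, reverse=True)


# Analyst A believes only wins matter.
ranking_a = power_ranking(teams, win_weight=1.0, diff_weight=0.0)

# Analyst B believes margin of victory also says something about strength.
ranking_b = power_ranking(teams, win_weight=1.0, diff_weight=0.1)

print(ranking_a)  # ['Aardvarks', 'Badgers', 'Cougars']
print(ranking_b)  # ['Badgers', 'Aardvarks', 'Cougars']
```

Both analysts start from the same facts, yet they disagree about which team is strongest. Neither ranking is wrong. Each simply reflects a different judgment about what team strength means.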