Free Sports Data Sources

One of the first steps in any analytics project is acquiring the right dataset. Much of the more advanced data (e.g., player tracking data) remains inaccessible to the public. Even so, a significant amount of free data is only a few clicks or a couple of lines of code away.

Two weeks ago, I published a Twitter thread listing some of these free data sources. Not everyone likes their lists in Twitter thread form, so I've replicated that list here. I've split it into two parts: pre-built datasets, and R/Python packages you can use to import data. If you have any recommendations for additions to this list, get in touch!

I'll be keeping this list updated as new data sources emerge.

Datasets

Football 🏈

Pro Football Reference

College Football Reference

College Football Data

Basketball 🏀

Basketball Reference

College Basketball Reference

ShotQuality

Hockey 🏒

Hockey Reference

MoneyPuck

Evolving Hockey

Natural Stat Trick

Puck On Net

Soccer ⚽

FBref

WhoScored

Understat

StatsBomb

smarterscout

Metrica Sports

Baseball ⚾

Baseball Reference

Lahman's Baseball Database

FanGraphs

Retrosheet

Baseball Savant

Cricket 🏏

ESPN Cricinfo

Cycling 🚲

Pro Cycling Stats

Tennis 🎾

tennis_atp

Multiple Sports

Public Sports Science Datasets

R & Python Packages

Football 🏈

nflscrapR (R)

nflfastR (R)

cfbscrapR (R)

Basketball 🏀

ncaahoopR (R)

wncaahoopR (R)

nba_scraper (🐍)

airball (R)

wehoop (R)

Soccer ⚽

worldfootballR (R)

tyrone_mings (R)

Baseball ⚾

pybaseball (🐍)

baseballr (R)

Lahman (R)

Cricket 🏏

cricketr (R)

Swimming 🏊‍♀️

SwimmeR (R)

Track & Field 👟

JumpeR (R)

Australian Rules Football 🏉

fitzRoy (R)

Finally, thank you to the folks who pointed me toward some of these resources: @_b4billy_, @KurtKuluz, @papamoon92, @DaniTreisman, @Westlake_CJW, @Dander_Bogaerts, @AbeJaroszewski, @PySportOrg, @MarkWood14, @Costa_Klad, @SaiemGilani, @JoshHerzenberg, and @EthanCDouglas.