One of the first steps in any analytics project is acquiring the right dataset. While much of the more advanced data (e.g. player tracking data) remains inaccessible to the public, there's still a significant amount of free data only a few clicks or a couple of lines of code away.
Two weeks ago, I published a Twitter thread listing some of these free data sources. Because not everyone likes their lists in Twitter thread form, I've replicated that list here and split it into pre-built datasets and R/Python packages that can be used to import data. Packages are tagged (R) for R and (🐍) for Python. If you have any recommendations for additions to this list, get in touch!
I'll be keeping this list updated as new data sources emerge.
Public Sports Science Datasets
nflscrapR (R)
nflfastR (R)
cfbscrapR (R)
ncaahoopR (R)
wncaahoopR (R)
nba_scraper (🐍)
airball (R)
wehoop (R)
worldfootballR (R)
tyrone_mings (R)
pybaseball (🐍)
baseballr (R)
Lahman (R)
cricketR (R)
SwimmeR (R)
JumpeR (R)
fitzRoy (R)
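To give a sense of what "a couple of lines of code" looks like in practice, here is a minimal sketch using pybaseball, one of the Python packages listed above. It assumes pybaseball is installed (`pip install pybaseball`) and that you have an internet connection. The wrapper name `load_statcast_sample` is my own, purely for illustration, and the dates mentioned below are arbitrary.

```python
def load_statcast_sample(start_dt: str, end_dt: str):
    """Fetch pitch-level Statcast data for a date range (YYYY-MM-DD strings).

    Hypothetical helper for illustration; the actual work is done by
    pybaseball's statcast() function, which returns a pandas DataFrame.
    """
    # Imported inside the function so this snippet can be loaded
    # even on a machine where pybaseball is not installed yet.
    from pybaseball import statcast

    # Downloads the data over the network; larger ranges take longer.
    return statcast(start_dt=start_dt, end_dt=end_dt)
```

Calling `load_statcast_sample("2019-06-24", "2019-06-25")` should hand back a DataFrame of pitches thrown on those two days, ready for the usual `head()` or `groupby()` exploration. The R packages on this list follow a similar pattern: install, load, and call a single loader function.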
Finally, thank you to the folks who pointed me toward some of these resources: @_b4billy_, @KurtKuluz, @papamoon92, @DaniTreisman, @Westlake_CJW, @Dander_Bogaerts, @AbeJaroszewski, @PySportOrg, @MarkWood14, @Costa_Klad, @SaiemGilani, @JoshHerzenberg, & @EthanCDouglas.