Languages and tools to learn for sports analytics

One of the first questions most sports analytics newcomers have is: Which languages and tools do I need to learn to be successful in the field?

Learning to code can be a big time investment, and most folks understandably want to make sure they're spending time on the important stuff. With that in mind, I've outlined a "prioritized list" of languages and tools to learn for sports analytics. There are obviously other ways to get started in this field, but this is how I personally would approach it if I were starting from scratch now.

Priority 1: Get comfortable with Excel

Most high-level sports analytics is not done in Excel these days; however, it's still important to know your way around a spreadsheet.

Don't worry about getting too fancy. You don't really need to learn how to code in VBA (Excel's programming/automation language), but you should know how to write basic formulas, build charts, and create pivot tables.
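It can help to see what a pivot table actually computes, because the same idea comes back once you move to R or Python. Below is a minimal sketch in plain Python (standard library only) of a "Sum of points" pivot with teams as rows and home/away as columns. The `pivot_sum` helper and the game rows are hypothetical and invented purely for illustration; in Excel you would get the same numbers by dragging fields into the Rows, Columns, and Values boxes.

```python
from collections import defaultdict

# Made-up sample rows for illustration only (not real game data).
games = [
    {"team": "A", "venue": "home", "points": 24},
    {"team": "A", "venue": "away", "points": 17},
    {"team": "A", "venue": "home", "points": 31},
    {"team": "B", "venue": "away", "points": 20},
    {"team": "B", "venue": "home", "points": 14},
]

def pivot_sum(rows, row_field, col_field, value_field):
    """Hypothetical helper: sum value_field for every (row_field, col_field)
    pair, like a pivot table's 'Sum of ...' value setting."""
    table = defaultdict(lambda: defaultdict(int))
    for r in rows:
        table[r[row_field]][r[col_field]] += r[value_field]
    # Convert the nested defaultdicts to plain dicts for readable output.
    return {row: dict(cols) for row, cols in table.items()}

result = pivot_sum(games, "team", "venue", "points")
print(result)  # {'A': {'home': 55, 'away': 17}, 'B': {'away': 20, 'home': 14}}
```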

Again, Excel is not the program of choice in sports analytics, but it's important to at least be familiar with. You never know if you'll be asked to put something in a spreadsheet, and you definitely don't want to be caught off guard by one of the most commonly used data analysis tools in the world.

If you don't have access to Excel, getting comfortable with Google Sheets (which is free) will do the trick.

Priority 2: Learn either R or Python

R and Python are the core languages of sports analytics, and most roles will require that you know at least one of them.

When you're getting started, don't worry about learning both. Ultimately, they're very similar languages and if you're really comfortable with one of them, it won't be that hard to pick up the other if necessary.

This brings us to the obvious question: Which language should you learn first? Per a very unscientific Twitter poll, most sports analytics pros use R most often, though a sizable 31% use Python most often.

You can't really make a "bad" choice between R and Python, but if I had to make a recommendation, I'd put it like this:

  • R is better suited for analysis. It's easier to play around with data and build visualizations. If you see yourself as more of an analyst than a programmer, start with R.
  • Python is better suited for production. It's easier to build Python into larger codebases or connect it to other applications and APIs. If you see yourself as more of a programmer than an analyst, start with Python.

Looking for a way to start learning? I put together a list of resources for learning to code in the context of sports analytics.

Priority 3: Get even better at R and/or Python

Beyond being rock solid at R or Python, everything else on this list is good to know but most likely won't be a game-changer for your career prospects. In most cases, you'll probably do more for yourself by spending time improving at R or Python (learning to create better visualizations, building more advanced models, or learning whichever of the two languages you haven't learned yet) than by moving on to the next items in this list.

Half of the reason for this is that R and Python will most likely be the center of your universe in sports analytics. The other half of the reason is that the stuff below is relatively easy to learn on the job.

Priority 4: Learn SQL

In an organization with lots of data and decent data infrastructure, SQL (Structured Query Language) is usually the language you’ll use to access that data. Typically, it goes like this: you’ll use SQL code to pull in the data you need, and then you’ll analyze that data in R or Python.
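To make that "pull with SQL, analyze in R or Python" workflow concrete, here is a minimal sketch in Python. It assumes an in-memory SQLite database (via the standard library's `sqlite3` module) standing in for an organization's real database, and a hypothetical `shots` table with invented rows. SQL does the filtering and selecting; Python then computes a simple make rate per player.

```python
import sqlite3

# In-memory SQLite database standing in for a team's real database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE shots (player TEXT, distance_ft REAL, made INTEGER)")
# Hypothetical, made-up shot data for illustration only.
conn.executemany(
    "INSERT INTO shots VALUES (?, ?, ?)",
    [
        ("Player 1", 24.0, 1),
        ("Player 1", 26.5, 0),
        ("Player 1", 8.0, 1),
        ("Player 2", 25.0, 1),
        ("Player 2", 23.5, 1),
        ("Player 2", 4.0, 0),
    ],
)

# Step 1: use SQL to pull only the data you need (attempts from 22+ feet).
rows = conn.execute(
    "SELECT player, made FROM shots WHERE distance_ft >= ? ORDER BY player",
    (22.0,),
).fetchall()
conn.close()

# Step 2: analyze the pulled rows in Python (make rate per player).
by_player = {}
for player, made in rows:
    by_player.setdefault(player, []).append(made)
make_rate = {p: sum(results) / len(results) for p, results in by_player.items()}
print(make_rate)  # {'Player 1': 0.5, 'Player 2': 1.0}
```

In practice the connection would point at your organization's database rather than `":memory:"`, and the analysis step would usually use a library like pandas, but the division of labor stays the same.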

SQL is relatively easy to learn on the job, and a lot of employers will assume that entry-level analysts don't know it already. In fact, another not-so-scientific Twitter poll of mine showed that most sports analytics professionals learned SQL on the job.

SQL is a nice-to-have and might save you some start-up time at a new job, but it's usually not crucial to know when you're just getting started.

Priority 5: Learn Tableau, Power BI, or other visualization tools

Some organizations lean on Tableau and/or Power BI to create interactive dashboards and visualizations for decision-makers. Like SQL, these tools aren't something that employers often expect you to know on Day 1, but if you're already comfortable with R or Python and a bit of SQL, it can't hurt to get familiar with them.

While the full versions of these products generally require paid licenses, you can play around with Tableau Public for free.

In summary

  1. Make sure you know your way around a spreadsheet
  2. Learn either R or Python
  3. Get even better at R or Python and/or learn whichever one you didn't to begin with
  4. Learn SQL
  5. Learn visualization tools like Tableau

If you have any questions or are still unsure where to start, feel free to reach out. I’m happy to provide more specific recommendations based on your circumstances and interests.