One of the first questions most sports analytics newcomers have is: Which languages and tools do I need to learn to be successful in the field?
Learning to code can be a big time investment, and most folks understandably want to make sure they're spending time on the important stuff. With that in mind, I've outlined a "prioritized list" of languages and tools to learn for sports analytics. There are obviously other ways to get started in this field, but this is how I personally would approach it if I were starting from scratch now.
Most high-level sports analytics work isn't done in Excel these days. However, it's still important to know your way around a spreadsheet.
Don't worry about getting too fancy. You don't really need to learn how to code in VBA (Excel's programming/automation language), but you should know how to write basic formulas, build charts, and create pivot tables.
Again, Excel is not the program of choice in sports analytics, but it's important to be at least familiar with it. You never know when you'll be asked to put something in a spreadsheet, and you definitely don't want to be caught off guard by one of the most commonly used data analysis tools in the world.
If you don't have access to Excel, getting comfortable with Google Sheets (which is free) will do the trick.
R and Python are the core languages of sports analytics, and most roles will require that you know at least one of them.
When you're getting started, don't worry about learning both. Ultimately, they're very similar languages and if you're really comfortable with one of them, it won't be that hard to pick up the other if necessary.
This brings us to the obvious question: Which language should you learn first? Per a very unscientific Twitter poll, most sports analytics pros use R most often, while a decent 31% use Python most often.
You can't really make a "bad" choice between R and Python, but if I had to make a recommendation, I'd put it like this:
Looking for a way to start learning? I put together a list of resources for learning to code in the context of sports analytics.
My point here is that anything beyond being rock solid at R and Python can be good-to-know, but most likely won’t be a game-changer for your career prospects. In most cases, you'll probably do more for yourself by spending time improving at R or Python (learning to create better visualizations, more advanced models, or learning whichever of the two languages you haven't already) than you will by moving on to the next items in this list.
Half of the reason for this is that R and Python will most likely be the center of your universe in sports analytics. The other half of the reason is that the stuff below is relatively easy to learn on the job.
In an organization with lots of data and decent data infrastructure, SQL (Structured Query Language) is usually the language you’ll use to access that data. Typically, it goes like this: you’ll use SQL code to pull in the data you need, and then you’ll analyze that data in R or Python.
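To make that workflow concrete, here's a rough sketch in Python. Everything in it is made up for illustration: the `games` table, the team names, and the point totals are hypothetical, and an in-memory SQLite database stands in for whatever database your organization actually uses. The shape is what matters: SQL pulls only the rows you need, and Python does the analysis.

```python
import sqlite3
from statistics import mean


def build_demo_db():
    """Create a tiny in-memory database (a stand-in for a real team database)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE games (team TEXT, game_date TEXT, points INTEGER)"
    )
    # Hypothetical data, invented purely for this example.
    conn.executemany(
        "INSERT INTO games VALUES (?, ?, ?)",
        [
            ("Hawks", "2024-01-02", 101),
            ("Owls", "2024-01-02", 88),
            ("Hawks", "2024-01-05", 95),
            ("Owls", "2024-01-06", 99),
            ("Hawks", "2024-01-09", 110),
        ],
    )
    conn.commit()
    return conn


def load_team_points(conn, team):
    """Step 1: use SQL to pull just the data we need."""
    rows = conn.execute(
        "SELECT points FROM games WHERE team = ? ORDER BY game_date",
        (team,),  # parameterized query, so the team name is passed safely
    ).fetchall()
    return [row[0] for row in rows]


conn = build_demo_db()

# Step 2: analyze the pulled data in Python.
hawks_points = load_team_points(conn, "Hawks")
hawks_average = mean(hawks_points)
print(f"Hawks scored {hawks_points}, averaging {hawks_average} points")
```

In a real job, the connection would point at your organization's database instead of an in-memory one, and you'd likely hand the query results to a data analysis library rather than a plain list, but the two-step pattern stays the same.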
SQL is relatively easy to learn on the job, and a lot of employers will assume that entry-level analysts don't know it already. In fact, another not-so-scientific Twitter poll of mine showed that most sports analytics professionals learned SQL on the job.
SQL is a nice-to-have and might save you some start-up time at a new job, but it's usually not crucial to know when you're just getting started.
Some organizations lean on Tableau and/or Power BI to create interactive dashboards and visualizations for decision-makers. Like SQL, these tools aren’t something that employers often expect you to know on Day 1, but if you've already mastered R, Python, and a bit of SQL, it can't hurt to get familiar.
While these products require subscriptions, you can play around with Tableau Public for free.
If you have any questions or are still unsure where to start, feel free to reach out. I’m happy to provide more specific recommendations based on your circumstances and interests.