20 Free Sports Datasets for Machine Learning

Article by Alex Nguyen | April 08, 2019

The lack of public sports data sources has been a major obstacle to modern, reproducible research and sports analytics. To make the available sources easier to find, we at Lionbridge AI have created a cheat sheet of publicly available sports datasets for machine learning. These include NBA datasets, soccer datasets, American football datasets, and more.

Sports Datasets for Machine Learning

Soccer Datasets

football.db: A free and open public domain football database & schema for use in any programming language.

FIFA 19 complete player dataset: Detailed attributes for every player registered in the latest edition of the FIFA 19 database scraped from SoFIFA.

FIFA 18 More Complete Player Dataset: An extension of the earlier FIFA 18 complete player dataset, this version contains several extra fields and is pre-cleaned to a much greater extent.

World Cup Dataset: This dataset contains information about every historical World Cup, along with match data.

International football results from 1872 to 2018: This dataset contains 40,000 results of football matches, from the very first official match in 1872 up until 2018. Matches range from the FIFA World Cup to regular friendly matches.
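As a starting point for a results file like the one above, here is a minimal sketch that tallies home wins, away wins, and draws. The column names (home_score, away_score, and so on) are assumptions about the CSV layout, and the sample rows are made up for illustration; check the actual file before relying on them.

```python
import csv
import io
from collections import Counter

# Illustrative rows only; they are NOT taken from the real dataset.
# The column names are assumptions about how a results CSV might be laid out.
SAMPLE_CSV = """date,home_team,away_team,home_score,away_score,tournament
2001-01-01,Team A,Team B,2,1,Friendly
2001-02-01,Team B,Team C,0,0,Friendly
2001-03-01,Team C,Team A,1,3,Example Cup
2001-04-01,Team A,Team C,1,1,Example Cup
"""


def outcome_counts(csv_file):
    """Count home wins, away wins, and draws in a match-results CSV."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        home = int(row["home_score"])
        away = int(row["away_score"])
        if home > away:
            counts["home_win"] += 1
        elif home < away:
            counts["away_win"] += 1
        else:
            counts["draw"] += 1
    return counts


counts = outcome_counts(io.StringIO(SAMPLE_CSV))
total = sum(counts.values())
for outcome in ("home_win", "away_win", "draw"):
    print(f"{outcome}: {counts[outcome]} of {total}")
```

To run this on a downloaded file, pass an opened file object (for example `open("results.csv", newline="")`, where the filename is hypothetical) instead of the in-memory sample.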


Basketball and NBA Datasets

NBA Play by Play: This dataset includes data for every player in the league’s history. It also includes play-by-play information for each team in the league dating as far back as the 2000/2001 season.

NBA shot logs: Data on shots taken during the 2014-2015 season, including which player took the shot, where on the floor it was taken from, who the nearest defender was, how far away that defender was, the time on the shot clock, and much more.

NBA Player of the Week Data: This NBA dataset covers the players of the week from the 1984-85 to 2018-19 seasons, scraped from the RealGM basketball site.

Daily Fantasy Basketball: This dataset contains 20 days of DraftKings NBA fantasy basketball contest data scraped at the end of 2017.

NCAA Basketball: This dataset contains data about NCAA Basketball teams and games. It covers play-by-play and box scores from 2009 and final scores from 1996.
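Shot-level data such as the NBA shot logs listed above lends itself to simple feature exploration. The sketch below compares make rates for tightly defended and open shots. The column names, the 4-foot threshold, and the sample rows are all hypothetical and only illustrate the idea; they do not reflect the real file or real results.

```python
import csv
import io
from collections import defaultdict

# Made-up rows for illustration; column names are assumptions, not the real schema.
SAMPLE_CSV = """player,shot_dist_ft,defender_dist_ft,shot_clock_s,made
Player 1,24.0,6.5,12.0,1
Player 1,3.0,1.0,20.0,0
Player 2,18.0,2.5,4.0,0
Player 2,25.0,7.0,15.0,0
Player 3,10.0,3.0,9.0,1
Player 3,22.0,5.0,2.0,1
"""


def make_rate_by_coverage(csv_file, tight_threshold_ft=4.0):
    """Return the fraction of made shots for 'tight' and 'open' coverage.

    A shot counts as 'tight' when the nearest defender is closer than
    tight_threshold_ft (an arbitrary cutoff chosen for this sketch).
    """
    attempts = defaultdict(int)
    makes = defaultdict(int)
    for row in csv.DictReader(csv_file):
        defender_dist = float(row["defender_dist_ft"])
        coverage = "tight" if defender_dist < tight_threshold_ft else "open"
        attempts[coverage] += 1
        makes[coverage] += int(row["made"])
    return {coverage: makes[coverage] / attempts[coverage] for coverage in attempts}


rates = make_rate_by_coverage(io.StringIO(SAMPLE_CSV))
for coverage, rate in sorted(rates.items()):
    print(f"{coverage}: {rate:.2f}")
```

The same grouping pattern works for other fields described in the entry, such as shot clock time or shot distance, by swapping the bucketing rule.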


American Football Datasets

NFLsavant.com: A website dedicated to providing NFL statistics in a simple interface. All data is compiled from publicly available NFL play-by-play data.

Detailed NFL Play-by-Play Data 2009-2018: Regular season plays from 2009-2018 containing information on players, game situation, results, and advanced metrics such as expected points and win probability values.

NFL Draft Outcomes: This sports dataset includes all players selected in the NFL Draft from 1985-2015 including outcome statistics.


Racing Datasets

Ergast Formula One Dataset: An experimental web service which provides a historical record of motor racing data for non-commercial purposes.

Formula 1 Race Data: This dataset contains data from 1950 all the way through the 2017 season, and consists of tables describing constructors, race drivers, lap times, pit stops and more.


Miscellaneous Sports Datasets

FiveThirtyEight: A news and sports site with data-driven articles. They make their datasets openly available on GitHub.

SPORTS-1M: 1M sports videos, averaging 5.5 minutes in length, labelled with 487 sports classes.

120 years of Olympic history: A historical dataset on the Olympic Games, including all the Games from Athens 1896 to Rio 2016 with data scraped from sports-reference.com.

Daily and Sports Activities Data Set: Motion sensor data of 19 daily and sports activities, each performed by 8 subjects in their own style for 5 minutes.

Lahman’s Baseball Database: A complete history of major league baseball stats from 1871 to 2018, including batting and pitching stats, standings, team stats, managerial records, post-season data, and more.

NHL Game Data: Game, team, player and play data including x,y coordinates measured for each game in the NHL in the past 6 years.


We hope this list of sports datasets will help you find the data you need in your own projects.

In case you missed our previous dataset compilations, you can find them all here. Still can’t find the custom data you need to train your model? Lionbridge AI provides training data in dozens of languages for machine learning projects.

Contact us to learn how Lionbridge AI can improve your training data.

The Author
Alex Nguyen

Alex manages content production for Lionbridge’s marketing team. Originally from San Francisco but based in Tokyo, she loves all things culture and design. When not at Lionbridge, she’s likely brushing up on her Japanese, letting loose at indie electronic shows or trying out new ice cream spots in the city.

