The lack of public sports data sources has been a major obstacle in the creation of modern, reproducible research and sports analytics. To help, we at Lionbridge AI have created a cheat sheet of publicly available machine learning datasets categorized by sport.
football.db: A free and open public domain football database & schema for use in any programming language.
Fifa 18 More Complete Player Dataset: An extension of the previous dataset, this version contains several extra fields and is pre-cleaned to a much greater extent.
World Cup Dataset: This dataset shows all information about historical World Cups as well as all match data.
International football results from 1872 to 2018: This dataset contains 40,000 results of international football matches, from the very first official match in 1872 up until 2018. Matches range from the FIFA World Cup to regular friendly matches.
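As a rough illustration of what a match-results file like the one above can be used for, the sketch below tallies home wins, away wins, and draws using only Python's standard library. The column names (`home_score`, `away_score`, and so on) and the inline sample rows are assumptions made for this example, not a confirmed description of the actual file, so check them against the CSV you download.

```python
import csv
import io
from collections import Counter

# Illustrative sample rows. The header names are an assumption about the
# file layout and may differ from the real dataset.
SAMPLE_CSV = """date,home_team,away_team,home_score,away_score,tournament
1872-11-30,Scotland,England,0,0,Friendly
1873-03-08,England,Scotland,4,2,Friendly
1874-03-07,Scotland,England,2,1,Friendly
"""


def tally_outcomes(csv_file):
    """Count home wins, away wins, and draws in a results CSV."""
    outcomes = Counter()
    for row in csv.DictReader(csv_file):
        home = int(row["home_score"])
        away = int(row["away_score"])
        if home > away:
            outcomes["home_win"] += 1
        elif home < away:
            outcomes["away_win"] += 1
        else:
            outcomes["draw"] += 1
    return outcomes


# Read from the in-memory sample; swap in open("results.csv", newline="")
# (a hypothetical filename) to run against a downloaded copy.
outcomes = tally_outcomes(io.StringIO(SAMPLE_CSV))
print(dict(outcomes))
```

The same loop works on a file of any size because `csv.DictReader` streams one row at a time.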
NBA Play by Play: This dataset includes data for every player in the league’s history, as well as play-by-play information for each team in the league dating back to the 2000/2001 season.
NBA shot logs: Data on shots taken during the 2014-2015 season: which player took the shot, where on the floor it was taken from, who the nearest defender was, how far away that defender was, the time on the shot clock, and much more.
Daily Fantasy Basketball: This dataset contains 20 days of DraftKings NBA fantasy basketball contest data scraped at the end of 2017.
NCAA Basketball: This dataset contains data about NCAA Basketball games, teams, and players. It covers play-by-play and box scores from 2009 and final scores from 1996.
American Football datasets
NFLsavant.com: A website dedicated to providing NFL statistics in a simple interface. All data is compiled from publicly available NFL play-by-play data.
Detailed NFL Play-by-Play Data 2009-2018: Regular season plays from 2009-2016 containing information on players, game situation, results, and advanced metrics such as expected points and win probability values.
NFL Draft Outcomes: All players selected in the NFL Draft from 1985-2015 including outcome statistics.
Ergast Formula One Dataset: An experimental web service which provides a historical record of motor racing data for non-commercial purposes.
Formula 1 Race Data: This dataset contains data from 1950 all the way through the 2017 season, and consists of tables describing constructors, race drivers, lap times, pit stops and more.
Miscellaneous Sports Datasets
FiveThirtyEight: A news and sports site with data-driven articles. They make their datasets openly available on GitHub.
SPORTS-1M: 1 million sports videos with an average length of 5.5 minutes, labelled with 487 sports classes.
Daily and Sports Activities Data Set: Motion sensor data of 19 daily and sports activities, each performed by 8 subjects in their own style for 5 minutes.
Lahman’s Baseball Database: A complete history of major league baseball stats from 1871 to 2018, including batting and pitching stats, standings, team stats, managerial records, post-season data, and more.
NHL Game Data: Game, team, player and play data including x,y coordinates measured for each game in the NHL in the past 6 years.
In case you missed our previous dataset compilations, you can find them all here. Still can’t find the custom data you need to train your model? Lionbridge AI provides training data in dozens of languages for machine learning projects.
Contact us to learn how Lionbridge AI can improve your training data.