11 Best Climate Change Datasets for Machine Learning

Article by Lucas Scott | October 15, 2020

Data is a central piece of the climate change debate. With the climate change datasets on this list, many data scientists have created visualizations and models to measure and track the change in surface temperatures, sea ice levels, and more. Many of these datasets have been made public to allow people to contribute and add valuable insight into the way the climate is changing and its causes. 

We hope this collection gives you a jumping-off point for using your skills to contribute to one of the biggest and most important challenges of our time.

 

Global Climate Change Datasets

1. Berkeley Earth Surface Temperature Data – From the Berkeley Earth Data page, this dataset is made up of temperature recordings from the Earth’s surface.


The data ranges from November 1st, 1743 to December 1st, 2015. The dataset is divided into several files including: 

  • GlobalTemperatures
  • GlobalLandTemperaturesByCountry
  • GlobalLandTemperaturesByState
  • GlobalLandTemperaturesByMajorCity
  • GlobalLandTemperaturesByCity
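As a starting point for working with files like these, here is a minimal sketch that averages monthly temperature records by decade and skips months with no reading. The column names `dt` and `LandAverageTemperature` are assumptions about the file layout, not something confirmed above, and the sample values are made up for illustration; check the actual file headers before adapting it.

```python
import csv
import io
from collections import defaultdict
from datetime import date

# Illustrative rows only. The column names `dt` and `LandAverageTemperature`
# are assumptions about the file layout, and the values are made up.
SAMPLE_CSV = """dt,LandAverageTemperature
1750-01-01,3.0
1750-02-01,
1755-01-01,5.0
1760-01-01,4.0
"""


def decade_means(csv_text):
    """Return {decade: mean temperature}, ignoring rows with no reading."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        value = row["LandAverageTemperature"].strip()
        if not value:
            continue  # long historical records can have gaps; skip them
        decade = date.fromisoformat(row["dt"]).year // 10 * 10
        sums[decade] += float(value)
        counts[decade] += 1
    return {decade: sums[decade] / counts[decade] for decade in sorted(sums)}


means = decade_means(SAMPLE_CSV)
```

To run this on a real file, replace `io.StringIO(csv_text)` with an opened file handle and adjust the column names to match.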

2. Global Climate Change Data – This dataset includes information from the Climate Change Knowledge Portal and World Development Indicators. It covers topics such as greenhouse gas emissions, energy consumption, and more. The data covers 1990 to 2011.

3. International Greenhouse Gas Emissions – Created by the United Nations, this Kaggle dataset contains Greenhouse Gas Inventory Data from 1990 to 2014. The official UN website has updated the dataset up to 2017. It includes emission levels by country and region for the following gases:

  • carbon dioxide (CO2)
  • methane (CH4)
  • nitrous oxide (N2O)
  • hydrofluorocarbons (HFCs)
  • perfluorocarbons (PFCs)
  • unspecified mix of HFCs and PFCs
  • sulphur hexafluoride (SF6)
  • nitrogen trifluoride (NF3)

4. Daily Sea Ice Extent Data – This climate change dataset comes from the National Snow and Ice Data Center, which maintains data on the Earth’s cryosphere, including glaciers, ice, snow, and frozen ground. The dataset has seven columns: year, month, day, extent, missing, source, and hemisphere. Extent refers to the area of ocean that has at least some sea ice; NSIDC counts an area as ice-covered when its sea ice concentration is 15 percent or higher.
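To show how the seven columns fit together, here is a minimal sketch that averages daily extent per hemisphere and year. The column names come from the description above; the sample rows, the `example` source label, and the hemisphere values `north` and `south` are made up for illustration and may not match the real file.

```python
import csv
import io
from collections import defaultdict

# Made-up rows that only mimic the seven columns named above.
# The values are illustrative, not real measurements.
SAMPLE_CSV = """year,month,day,extent,missing,source,hemisphere
1980,1,1,14.0,0.0,example,north
1980,1,3,14.2,0.0,example,north
1980,1,1,6.0,0.0,example,south
1981,1,1,13.8,0.0,example,north
"""


def mean_extent_by_year(csv_text):
    """Return {(hemisphere, year): mean extent} for the given CSV text."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [running sum, row count]
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (row["hemisphere"], int(row["year"]))
        totals[key][0] += float(row["extent"])
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


yearly_means = mean_extent_by_year(SAMPLE_CSV)
```

Grouping by hemisphere before averaging matters because Arctic and Antarctic sea ice follow opposite seasonal cycles, so a combined average would hide the trend in each.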


5. Climate Change Adaptation of Coffee Production – From the Harvard Dataverse, this dataset was created to determine the impact of climate change on coffee production quality in Nicaragua. The dataset is divided into six GeoTIFF raster files.

6. Climate Change in Russia – As Russia is one of the largest producers of CO2 emissions worldwide, this portal on Statista highlights Russia’s CO2 emissions volume from 1985 to 2019. It also includes information about the percentage of the Russian population that has been exposed to pollution.

Please note that this dataset is from Statista. Some of the charts and statistics within this dataset may require a premium Statista account. 

7. The Climate Change Knowledge Portal – This portal from World Bank Group is an easy-to-navigate platform where you can view climate change data visualizations based on historical data and projections. You can browse the data by impact sectors: energy, water, agriculture, and health. Alternatively, you can also browse by country, region, and watershed. Most importantly, the data is available for free download.

 

United States Data

8. Climate Change Projections and Impacts for New York State – This dataset is published on the New York State government website. It contains climate data projections for three time periods: the 2020s, 2050s, and 2080s. The dataset includes the following data variables:

  • Average annual temperature
  • Average annual rainfall
  • Extreme weather events
  • Rise of sea levels

9. SGMA Climate Change Resources – From the California Natural Resources Agency, the SGMA Climate Change Resources Dataset includes data on changes in precipitation and bodies of water within the state of California. Some of the data provided includes climate condition projections for 2030 and 2070.

 

Social Media Climate Change Datasets


10. Harvard Dataset of Climate Change Tweet IDs – Collected between September 2017 and May 2019, the Climate Change Tweet IDs Dataset contains the IDs from over 39 million tweets about climate change. The tweets were tracked and curated using these hashtags related to climate change:

  • #climatechange
  • #climatechangeisreal
  • #actonclimate
  • #globalwarming
  • #climatechangehoax
  • #climatedeniers
  • #climatechangeisfalse 
  • #globalwarminghoax 
  • #climatechangenotreal

11. Sentiment of Climate Change – From Crowdflower, this dataset includes tweets that were classified for their sentiment by human contributors. The tweets were classified as:

  • Yes = Content suggests global warming is happening
  • No = Content suggests global warming is not happening
  • I can’t tell = Content is not clear or completely not related to global warming
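Before training a classifier on labels like these, they usually need to be turned into numeric class IDs. The sketch below shows one way to do that. The ID assignments and the `encode_label` helper are hypothetical choices for illustration, and the exact label strings in the real file may differ from the three listed above.

```python
# Hypothetical mapping from the three label strings above to class IDs.
# The numeric IDs are an arbitrary choice for illustration.
LABEL_TO_ID = {
    "yes": 1,           # content suggests global warming is happening
    "no": 0,            # content suggests global warming is not happening
    "i can't tell": 2,  # unclear or unrelated content
}


def encode_label(raw_label):
    """Normalize case, whitespace, and curly apostrophes, then look up the ID."""
    normalized = raw_label.strip().lower().replace("\u2019", "'")
    return LABEL_TO_ID[normalized]


encoded = [encode_label(label) for label in ["Yes", "No", "I can\u2019t tell"]]
```

An unknown label raises a `KeyError` here on purpose, so unexpected values in the data surface early instead of being silently mislabeled.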

We hope you found this list of climate change datasets useful. If you couldn’t find the data you need, check out our datasets library. Please be sure to subscribe to our newsletter below for more open datasets, AI news, and machine learning guides. 

The Author
Lucas Scott

Lucas is a seasoned writer, with a specialization in pop culture and tech. He spends most of his free time coaching high-school basketball, watching Netflix, and working on the next great American novel.
