Top 12 Free Demographics Datasets for Machine Learning Projects

Article by Alex Nguyen | February 15, 2019

The power of demographic data lies in its potential to improve government and society by serving as the basis for important economic decisions. In the same vein, machine learning models trained on demographic data can help policymakers identify trends and prepare for issues related to population growth, aging and migration. To help data scientists harness this information, we at Lionbridge AI have prepared a list of the best free demographics datasets compiled from public data sources.

Check out the full list below:


Free Demographic Datasets for Machine Learning

American FactFinder: The Census Bureau’s web-based, self-service tool to search a variety of population, economic, geographic and housing information.

U.S. Healthcare Data: Data on population health, diseases, drugs, health plans and more, collected from sources including the FDA drug database and the USDA Food Composition Database.

New York City Census Data: Population, racial/ethnic demographic information, employment and commuting characteristics for New York City neighborhoods.

DataFerrett: Provides a wide variety of population, health, economic, geographic and housing information about the United States to individuals, businesses, governments, and organizations.

US Public Assistance for Women and Children: Data on public assistance in the United States, initially covering the WIC Program. Files may include participation and spending data for state WIC programs, and poverty data for each state from 2012–2016.

Silicon Valley Diversity Data: Demographics for 23 Silicon Valley tech companies, including factors like race, gender and salary.

World Gender Statistics: A free demographics dataset with the latest sex-disaggregated data and gender statistics covering demography, education, health, access to economic opportunities, public life and decision-making, and agency.

Demographic Trends (1970-2010) for Coastal Geographies: Data derived from Census Block Group Data for 13 different coastal geographies.

National Student Loan Data System (NSLDS): A centralized, integrated view of loans and grants during their complete life cycle, from aid approval through disbursement, repayment, deferment, delinquency, and closure.

ZIP Code Data: This study provides detailed tabulations of individual income tax return data at the state and ZIP code level.

Nutrition, Physical Activity, and Obesity – Women, Infant, and Child: Data on weight status for children aged 3 months to 4 years from the Women, Infants, and Children Participant and Program Characteristics (WIC-PC) survey.

The Demographic /r/ForeverAlone Dataset: Demographic data collected from a survey of subscribers to the subreddit /r/ForeverAlone.


We hope that this list of free demographics datasets can help you with your own projects.

In case you missed our previous dataset articles, you can find them all here. Still can’t find the custom data you need to train your model? Lionbridge AI provides custom AI training data in over 300 languages for your specific machine learning project needs.


