How Google is Leading the Quest for Data with Google Dataset Search

Article by Limarc Ambalina | June 20, 2019

For researchers and developers in the artificial intelligence industry, the demand for high-quality AI training data keeps growing. Yet one of the biggest challenges ML developers face is finding quality data to train their algorithms, and as demand grows, companies and private organizations seem to be guarding their data more closely.

Even open data from social media platforms like Twitter and Facebook is difficult to collect: custom-built scrapers or API clients are often required just to gather it. With the goal of making open data easier to access for researchers, developers, journalists, and general enthusiasts, Google launched Google Dataset Search late last year. When the company behind the world’s largest search engine builds a search engine specifically for datasets, it’s bound to excite AI developers and machine learning researchers.

Image: the Google Dataset Search interface (via toolbox.google.com/datasetsearch)

What is Google Dataset Search?

As the name implies, Google Dataset Search is a search engine specifically for finding datasets. With a dataset search engine modeled on Google Scholar, the company aims to improve worldwide access to open data. The search engine is free to use and available in multiple languages, with more to be added in the future.

The Interface

Although Google Dataset Search is still in beta, its UX is well developed, giving a succinct overview of each indexed dataset. When the information is available, Google Dataset Search displays the following details for a dataset:

  • Area/geolocale covered
  • File formats
  • Author(s)
  • License information
  • Creation date
  • Providing company or facility
  • Dataset description
  • Time period covered
  • Date of most recent update
  • Variables measured
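
Most of these fields appear to correspond to properties in the schema.org Dataset vocabulary. The Python sketch below is a rough way to check which of the fields above a given metadata record could populate. The label-to-property mapping and the names `DISPLAY_FIELDS` and `available_fields` are our own assumptions for illustration, since Google does not publish an exact correspondence.

```python
from typing import Any

# Hypothetical mapping (an assumption, not Google's documented logic):
# display label -> schema.org Dataset property. A dotted path means the value
# is read from nested objects, e.g. the encodingFormat of each distribution entry.
DISPLAY_FIELDS = {
    "Area/geolocale covered": "spatialCoverage",
    "File formats": "distribution.encodingFormat",
    "Author(s)": "creator",
    "License information": "license",
    "Creation date": "dateCreated",
    "Providing company or facility": "provider",
    "Dataset description": "description",
    "Time period covered": "temporalCoverage",
    "Date of most recent update": "dateModified",
    "Variables measured": "variableMeasured",
}


def _has_value(record: dict[str, Any], path: str) -> bool:
    """Return True if the (possibly nested) property is present and non-empty."""
    if "." not in path:
        return bool(record.get(path))
    parent, child = path.split(".", 1)
    items = record.get(parent) or []
    if isinstance(items, dict):  # a single object instead of a list
        items = [items]
    return any(isinstance(item, dict) and item.get(child) for item in items)


def available_fields(record: dict[str, Any]) -> list[str]:
    """List the display fields that the given metadata record could populate."""
    return [label for label, path in DISPLAY_FIELDS.items() if _has_value(record, path)]


# Hypothetical example record.
example = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "City air quality",
    "description": "Hourly air-quality readings (hypothetical example).",
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "distribution": [{"@type": "DataDownload", "encodingFormat": "text/csv"}],
}
print(available_fields(example))
# ['File formats', 'License information', 'Dataset description']
```

A record that fills in more of these properties gives the search engine more to show in its overview. Fields with no matching property are simply left out.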

How do you make your dataset available on Google Dataset Search?

Google Dataset Search crawls and indexes datasets from websites and online repositories, just as Google's main web search engine does. For your dataset to be crawled and indexed properly, you must describe it with schema.org Dataset markup or one of the other structured data formats described in Google's dataset developer documentation.
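
As a rough illustration, the Python sketch below builds a minimal schema.org `Dataset` description and wraps it in a JSON-LD `<script>` tag that could be placed in a dataset's landing page. The function name `dataset_jsonld` and all example values (the example.org URLs, names, and file format) are hypothetical. Google's dataset developer documentation lists which properties are required and recommended, so check it before publishing.

```python
import json


def dataset_jsonld(name: str, description: str, url: str, license_url: str,
                   creator: str, download_url: str, file_format: str) -> str:
    """Return a minimal schema.org Dataset description as a JSON-LD <script> tag."""
    metadata = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "license": license_url,
        # In schema.org the creator can be a Person or an Organization.
        "creator": {"@type": "Organization", "name": creator},
        # Each downloadable file is described as a DataDownload with its format.
        "distribution": [{
            "@type": "DataDownload",
            "encodingFormat": file_format,
            "contentUrl": download_url,
        }],
    }
    body = json.dumps(metadata, indent=2)
    return f'<script type="application/ld+json">\n{body}\n</script>'


# Hypothetical example values; replace them with your dataset's real metadata.
print(dataset_jsonld(
    name="City air quality readings",
    description="Hourly PM2.5 and NO2 measurements from a network of city sensors.",
    url="https://example.org/datasets/air-quality",
    license_url="https://creativecommons.org/licenses/by/4.0/",
    creator="Example Environmental Lab",
    download_url="https://example.org/datasets/air-quality.csv",
    file_format="text/csv",
))
```

Embedding the metadata as JSON-LD keeps it separate from the visible HTML. The same schema.org properties can also be expressed inline as microdata or RDFa.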

To learn more about Google Dataset Search, check out the FAQ thread on the community help page.

As technology continues to advance rapidly year after year, the search for quality data sources will remain a challenge. If you’re still having trouble finding the training data you need, get in touch with Lionbridge AI to learn how our crowd of multilingual experts can help you meet your project’s needs.

The Author
Limarc Ambalina

Limarc writes content for Lionbridge’s website as part of the marketing team. Born and raised in Canada, Limarc’s love of Japanese pop culture brought him to Japan in 2016 and living in Japan has been his dream come true. Apart from Lionbridge content, you can catch Limarc online writing about anime, video games, and other nerd culture.