Interview with Fujitsu Cloud Technologies: 80% of Data Science is Pre-processing

Article by Hengtee Lim | January 31, 2020

With the constant development of AI technology in Japan, the term data scientist has become something of a buzzword. Put simply, however, data scientists analyze data and extract information from it for a variety of uses. In a professional setting, for example, this could mean analyzing company data to automate processes or to make business-related predictions. Data scientists also play a key role in preparing and refining the training data for machine learning models.

In this interview, we talk to Yusuke Takahashi and Takahito Hori, data scientists at Fujitsu Cloud Technologies. We cover the general work of data scientists, the importance of data pre-processing, advice for AI implementation, and more.


What does the Fujitsu Cloud Technologies data science team look like?

The data science team at Fujitsu consists of three different roles, each with its own responsibilities:

  • Data Scientist: Data analysis and the construction of AI models.
  • Data Engineer: Data warehouse design, data processing, and integrating AI models with APIs and relevant systems.
  • Data Director: Acts as the point of communication between the client and the data science team.

It is not uncommon for a data science team to also have a data analyst role. Data analysts find and pull new information from data to share with the client. At Fujitsu, the data analyst role is covered as part of the data scientist’s responsibilities.

What does a data scientist do?

Takahashi: Data pre-processing is so important that you could say it's 80% of our job. You can't construct an AI model without first preparing and organizing the data for it. That's what data pre-processing is: organizing, supplementing, and processing data so that you can build efficient AI models.

Hori: Data pre-processing is the most difficult part of our job, and the most interesting; it’s where we discover new information.

For example, let's say you're working on predicting the number of customers for a restaurant. You build a model based on records of past customer numbers, but when you check its accuracy, you discover that its predictions are off by a factor of two over a particular period. When you look more closely at that period, you find that the restaurant was running a special campaign that week. In that case, you build another predictive model that includes the relevant campaign data to improve accuracy. Supplementing a model with related data like this is one example of data pre-processing.
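To make the restaurant example more concrete, here is a minimal sketch in Python (standard library only) of how supplementing customer records with a campaign flag can reduce prediction error. The daily figures, the campaign flag, and the two toy "models" (one overall average versus one average per campaign status) are hypothetical illustrations chosen for clarity; they are not Fujitsu's data or method.

```python
from statistics import mean

# Hypothetical daily records: (customers, campaign_running).
# Illustrative numbers only: customer counts roughly double during the campaign week.
records = [
    (50, False), (52, False), (48, False), (51, False), (49, False),
    (100, True), (104, True), (98, True), (102, True), (96, True),
]

def baseline_model(train):
    """Toy model: always predict the overall average, ignoring campaigns."""
    avg = mean(customers for customers, _ in train)
    return lambda campaign: avg

def campaign_aware_model(train):
    """Toy model: predict the average of past days with the same campaign status.

    The campaign flag stands in for the supplementary data added during pre-processing.
    """
    flags = {flag for _, flag in train}
    by_flag = {
        flag: mean(customers for customers, f in train if f == flag)
        for flag in flags
    }
    return lambda campaign: by_flag[campaign]

def mean_abs_error(predict, data):
    """Average absolute difference between predicted and actual customer counts."""
    return mean(abs(predict(flag) - customers) for customers, flag in data)

baseline = baseline_model(records)
aware = campaign_aware_model(records)
print(f"baseline MAE: {mean_abs_error(baseline, records):.1f}")            # baseline MAE: 25.0
print(f"with campaign flag MAE: {mean_abs_error(aware, records):.1f}")     # with campaign flag MAE: 1.8
```

To keep the sketch short, both errors are measured on the same records used to compute the averages; a real evaluation would compare the models on held-out data.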

It's really great when, as a result of pre-processing, you see an increase in the accuracy of your predictive model. I think that's when data scientists feel a real sense of value in their work.

Takahashi: In addition, I think pre-processing is a reliable way to increase model accuracy. At first glance, it might seem that fine-tuning models is the data scientist's key role, but data pre-processing is actually the core of what we do.


On data analysis and the importance of communication with the client:

Hori: A lot of analytical projects fail when there isn't close communication about the analytical process. Success is impossible without a mutual understanding of each side's knowledge and information, such as our analytical methods and the client's business expertise.

For this reason, as data scientists, we take care to stay in constant contact with clients. We work to incorporate the necessary industry knowledge into the AI model and to ensure there's no misinterpretation of the data. We also try to explain the process simply, which can include what we do in pre-processing, why we choose particular methods, and how to read the statistical data.

Takahashi: I think communication is an indispensable skill for developing the role of data scientists in society. Beyond data science as a field of study, our work is to solve the problems our clients bring to us.


What advice do you have for businesses that want to implement AI, or for AI project supervisors?

Takahashi: When starting AI implementation, it's important to be clear about what problem the AI is solving and what data you'll use to solve it. People who are thinking of implementing AI often don't have a firm grasp of these two points. When the problem and the data aren't clear, you end up with an AI system that doesn't actually solve any problems, and that isn't really AI implementation.

Hori: AI is not an all-purpose tool. After an AI system is developed, you need someone experienced to manage it. It's worth keeping in mind that if you don't have a suitable person to do this, you'll need to hire or train someone. To be frank, it's worth asking whether you're ready for the person in charge of the project to become a data scientist. As data scientists, we of course explain our analytical processes and results as best we can, but to clients we're still outside support. With this in mind, it's no exaggeration to say that a key to a successful project is having someone on the client side who is motivated to get really hands-on with the project. I think it's important for people supervising AI projects to be aware of this.


About Fujitsu Cloud Technologies

Fujitsu Cloud Technologies provides a range of ICT services, including Data Design, which harnesses AI technology to improve and enhance data analysis in areas such as geographic statistical analysis, data assessment, and natural language processing.


About Lionbridge’s AI training data services

Lionbridge is a leading provider of data services, including data collection, annotation, and validation. We support the development of AI projects across the world with a community of 1,000,000+ crowd workers ready to complete projects quickly, at scale, and in any timezone.


The Author
Hengtee Lim

Hengtee is a writer with the Lionbridge marketing team. An Australian who now calls Tokyo home, he can often be found crafting short stories in cafes and coffee shops around the city.

