The Ethical Debate on AI Applications: An Interview with Data Scientist Dr. Iain Brown

Article by Rei Morikawa | August 21, 2019

Dr. Iain Brown is the Head of Data Science for SAS UK&I. In the past few years, he has focused on the practical implications of AI, and its potential positive and negative outcomes. For our interview, Iain shared his expertise about the ethical debate on AI applications.

 

Lionbridge: What are the most common fears and misconceptions about AI?

Iain: Based on a recent study that SAS conducted with Forbes Insights, 61% of employees completely trust their organization’s ability to use AI technology ethically. We’ve seen firsthand how that trust can be further tested by large corporate data breaches; the recent Capital One breach is a prime example.

I would suggest there’s a general misunderstanding and mistrust of AI. People are often suspicious that companies are collecting data for the wrong reasons, such as prying into or hacking people’s private information. The public tends to think that AI is uncontrollable, and a lot of that fear may stem from television and pop-culture depictions of AI. But in reality, most businesses are simply trying to understand their clients’ needs better and provide them with more appropriate products or services.

Another misconception is that we can only use AI systems that are 100% accurate. The value of AI, however, should really be compared against the status quo. Let’s take the pharmaceutical sector as an example, and say that the manual (human) process of identifying defective products on a production line is accurate 90% of the time, including false positives (leading to wastage of non-defective products) and false negatives (missing defective products). If machines can identify defective products with 99% accuracy, that’s still a huge improvement, even though it isn’t perfect. Keep in mind that the company must throw out the whole batch if even one sample is defective, so increasing accuracy by nine percentage points could save millions of dollars and reduce the risk of defective products reaching the shelves.
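To make that comparison concrete, here is a minimal back-of-the-envelope sketch. All of the figures (batch size, defect rate, unit costs) are hypothetical and are not from the interview; they only illustrate how a 90% versus 99% accurate inspection process translates into an expected cost per batch.

```python
# A rough, hypothetical illustration of comparing AI accuracy against the
# status quo. These figures are invented for illustration only.

BATCH_SIZE = 10_000            # units per production batch (hypothetical)
DEFECT_RATE = 0.01             # true fraction of defective units (hypothetical)
WASTE_COST = 50.0              # cost of discarding one good unit (false positive)
MISSED_DEFECT_COST = 5_000.0   # cost of a defect reaching the shelf (false negative)

def expected_cost_per_batch(accuracy: float) -> float:
    """Expected cost if the process misclassifies (1 - accuracy) of both
    good and defective units."""
    error_rate = 1.0 - accuracy
    false_positives = BATCH_SIZE * (1 - DEFECT_RATE) * error_rate  # good units thrown away
    false_negatives = BATCH_SIZE * DEFECT_RATE * error_rate        # defects that slip through
    return false_positives * WASTE_COST + false_negatives * MISSED_DEFECT_COST

print(f"Manual inspection (90% accurate): ${expected_cost_per_batch(0.90):,.0f} per batch")
print(f"Automated inspection (99% accurate): ${expected_cost_per_batch(0.99):,.0f} per batch")
```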

 

L: How can companies keep their data secure?

Iain: An increasing number of companies are moving their data storage to the cloud, and security is becoming an important concern. Although still hotly debated, there is a growing consensus that it is actually more secure to store data with public cloud providers such as Google Cloud Platform (GCP), Amazon Web Services (AWS), and Microsoft Azure than on premises. It’s the cloud provider’s business to host, manage, and store data, so they have tight protocols in place. Cloud storage providers need to be extremely secure because they can’t operate as a business without trust.

 

L: How can we make sure that AI systems are as accurate as possible? Who should be held accountable if the system fails?

Iain: The topic of accountability has been hugely debated in the news of late. A topical example: if an autonomous vehicle crashes, who is responsible, the owner, the driver, the insurance company, or the manufacturer? As a society, we’re going to need to grapple with this question as soon as possible, since we’re not too far from having autonomous vehicles on the roads.

I’m a big advocate of the FATE framework for ethical AI development. FATE stands for fairness, accountability, transparency, and explainability. Fairness refers to removing bias and discrimination from decisions. Accountability refers to ownership of decision-making and the willingness to take responsibility. Transparency refers to avoiding black-box processes and having a good understanding of the process from start to finish. Finally, explainability refers to the ability to explain and justify your model’s results. It’s important to have internal governance around each of these four pillars.

In addition, I’d prefer to keep a level of human oversight instead of completely automating processes, particularly in the healthcare and legal sectors, where the impact of making a mistake is huge. In healthcare, we are already helping practitioners use image processing for tumor detection, but that doesn’t mean the machine’s output determines the patient’s final diagnosis. MRI and CT scans are useful tools for flagging anomalies, but practitioners take those results and use them to inform the final decision about next steps for their patient.

 

L: What are the current problems with AI innovation?

Iain: First, there’s a shortage of business leaders who understand the value AI can deliver well enough to drive the right projects. Tech companies in major markets such as the United States, Europe, and Japan tend to have an appetite and desire to implement AI, but they often haven’t specified which business challenges they’d like to solve with it. Many companies feel pressure to “do AI” after hearing that their competitors have an AI team, but sometimes these companies don’t have a solid understanding of AI applications.

In this situation, you should start with the problems you wish to solve and identify the use cases that are tangible and will also drive business value. The next question is typically whether the data exists to support those use cases. In most cases it doesn’t, so the next step is to collect data and compile the information necessary to drive the decisions you’re trying to make.

You need to be able to walk before you run: develop an effective strategy for collecting and organizing your data, and implement quality controls. The concept of building on strong data foundations is crucial for successful AI projects.

The second problem is that there’s a shortage of high-quality training data for machine learning. A lot of companies struggle with collecting data and feeding it to machine learning systems.

 

There’s a lot of IP and competitive advantage tied up in data. In recent years, however, we’ve seen a rise in data being viewed as a sellable asset, creating more of a shared data ecosystem. Large companies that have traditionally stored petabytes of data are now sharing it with others, because they see a potential revenue stream if they can monetize that data. The caveat is that when data is lost, as in the recent Capital One and Equifax breaches, companies naturally become more skeptical about data sharing.

At the same time, regulations around the usage of data are becoming stricter. The GDPR (General Data Protection Regulation) in Europe affects any business operating with European customers. The GDPR is a stringent set of rules about how data can be used, and a breach has massive implications, including fines of up to 4% of a company’s annual revenue.

The third problem is that most companies aren’t using their unstructured data to its full potential. On average, 80% of a company’s data is in an unstructured form (for example, text records and images). However, most companies have all of that data stored away in the cloud as a static dump, thinking they might use it in the future. There’s a cost to storing data, but right now, most companies aren’t realizing its full value.

For example, online retailers might already use transactional history to recommend new products to their customers. But many online retailers have yet to consider search relevance to improve the customer experience, sentiment analysis to understand public opinion about their products, or natural language generation for customer support. The majority of companies haven’t looked into natural language processing or computer vision. Over the next decade, I foresee more organizations incorporating unstructured data into their decision-making processes to provide a more personalized experience for their customers.

 

L: How can we eliminate bias from machine learning systems?

Iain: This is a common issue that companies need to be conscious of when building AI models. Bias can exist in many places: in the training data, in us as humans, and within the machine learning systems themselves.

For example, I was recently engaged in a project with a large mortgage provider to develop a machine learning model to pre-score customers using stochastic gradient boosting. The organization had previously relied on a simpler linear classifier (logistic regression), and although that approach was simple to explain, biases inherent in the data were still present in its decisions. I found that the more advanced machine learning approach not only improved accuracy by 10% but also produced decisions that were less biased with respect to certain customer attributes.
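As an illustration of that kind of comparison (not the actual mortgage model), here is a minimal sketch on synthetic data: it fits both a logistic regression and a gradient boosting classifier, then reports accuracy alongside a crude check of how decision rates differ across a hypothetical customer attribute.

```python
# A minimal sketch comparing a linear classifier with gradient boosting on
# synthetic data. The "group" attribute and all figures are hypothetical.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5_000, n_features=10, random_state=0)
group = np.random.RandomState(0).randint(0, 2, size=len(y))  # stand-in customer attribute

X_tr, X_te, y_tr, y_te, g_tr, g_te = train_test_split(
    X, y, group, test_size=0.3, random_state=0)

for name, model in [("logistic regression", LogisticRegression(max_iter=1000)),
                    ("gradient boosting", GradientBoostingClassifier(random_state=0))]:
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    acc = accuracy_score(y_te, pred)
    # Crude disparity check: difference in positive-decision rates between groups.
    gap = abs(pred[g_te == 0].mean() - pred[g_te == 1].mean())
    print(f"{name}: accuracy={acc:.3f}, decision-rate gap={gap:.3f}")
```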

People often assume that AI is complicated, but in fact the underpinning approaches range from simple to complex. A simple linear classification model might say that as your age increases, your risk decreases. What this misses, however, are the nonlinear interactions that can be picked up by more complex models such as neural networks, advanced tree-based learners, and support vector machines.

What is developing at pace is a focus on model interpretability and new methodologies for explaining traditional black-box techniques. You can use partial dependence plots, for example, to see what marginal effect a certain variable has on the predicted outcome of a model, or individual conditional expectation (ICE) plots to see how an individual customer’s prediction changes as a feature varies. In addition, local interpretable model-agnostic explanations (LIME) explain the individual predictions of a black-box model. These techniques work on any model and are extremely powerful for explaining the relationships that exist within a dataset.
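A minimal sketch of these tools on synthetic data is below. The feature names are invented for illustration; the partial dependence and ICE plots use scikit-learn, and the LIME example relies on the third-party lime package.

```python
# A minimal sketch of partial dependence, ICE, and LIME on synthetic data.
# Feature names are hypothetical; this is not a real credit-risk model.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import PartialDependenceDisplay
import matplotlib.pyplot as plt

feature_names = ["age", "income", "loan_to_value", "tenure"]
X, y = make_classification(n_samples=2_000, n_features=4, n_informative=3,
                           n_redundant=0, random_state=0)
model = GradientBoostingClassifier(random_state=0).fit(X, y)

# Partial dependence (average marginal effect) with ICE curves overlaid
# (kind="both") for two of the features.
PartialDependenceDisplay.from_estimator(
    model, X, features=[0, 2], feature_names=feature_names, kind="both")
plt.show()

# LIME: a local, model-agnostic explanation of one individual prediction.
from lime.lime_tabular import LimeTabularExplainer  # pip install lime
explainer = LimeTabularExplainer(X, feature_names=feature_names,
                                 class_names=["reject", "accept"],
                                 mode="classification")
explanation = explainer.explain_instance(X[0], model.predict_proba, num_features=4)
print(explanation.as_list())
```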

In the beginning stages of your machine learning project, make sure that the training data isn’t biased. This can be difficult to confirm, depending on your industry and where the data is sourced from. If possible, implement a tight quality-control process and make sure there is clear metadata (data about the data: who collected it, when it was added to the database, whether it’s still valid, and so on). These are the fundamental building blocks for good models.
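As a small illustration of what such metadata might look like in practice, here is a hypothetical record structure; the exact fields any organization tracks will differ.

```python
# A hypothetical sketch of a dataset metadata record ("data about the data").
# The fields below are illustrative only.
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class DatasetMetadata:
    name: str            # what the dataset contains
    collected_by: str    # who collected it
    added_on: date       # when it was added to the database
    source: str          # where the data came from
    valid_until: date    # when it should be re-validated or retired

    def is_valid(self, today: Optional[date] = None) -> bool:
        """Whether the dataset is still considered current."""
        return (today or date.today()) <= self.valid_until

record = DatasetMetadata(
    name="customer_transactions_2019",
    collected_by="data engineering team",
    added_on=date(2019, 6, 1),
    source="point-of-sale system export",
    valid_until=date(2020, 6, 1),
)
print(record.name, "still valid:", record.is_valid())
```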

 

Keep the big picture in mind: you’re looking to improve the quality of your training data. Make sure you don’t have duplicate customer records. Sample the data so that it’s randomly selected and representative of the target population you’re looking to apply it to; sampling this way gives you a robust, stratified training dataset for modeling.
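A minimal sketch of those two steps, de-duplication and stratified sampling, is below using pandas and scikit-learn; the column names and figures are hypothetical.

```python
# Remove duplicate customer records, then take a stratified train/test split
# so the target's class balance stays representative. Data is hypothetical.
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.DataFrame({
    "customer_id": [1, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    "age":         [34, 34, 51, 27, 45, 38, 29, 62, 41, 55],
    "defaulted":   [0, 0, 1, 0, 0, 1, 0, 1, 0, 0],
})

# Drop duplicate customer records before modeling.
df = df.drop_duplicates(subset="customer_id")

# Stratifying on the target keeps its class balance in both partitions.
train, test = train_test_split(
    df, test_size=0.3, stratify=df["defaulted"], random_state=0)
print("train default rate:", train["defaulted"].mean())
print("test default rate: ", test["defaulted"].mean())
```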

 

L: How do you foresee the job market changing as a result of automation?

Iain: We’re now part of the Fourth Industrial Revolution, a new era building and extending the impact of digitization in new and unanticipated ways. AI empowers people and potentially augments the work that people are doing today. Yes, there will be job replacements, but organizations have always been changing the way their staff operates. We as humans are skilled at adapting our work styles. I believe improving today’s jobs and taking away some of the monotonous work gives us more space to be creative, which after all is what humans do best. 

The Author
Rei Morikawa

Rei writes content for Lionbridge’s website, blog articles, and social media. Born and raised in Tokyo, she also studied abroad in the US. A huge people person, she is passionate about long-distance running, traveling, and discovering new music on Spotify.
