Over the past decade we’ve seen a rapid rise in natural language processing applications for businesses. This includes chatbots, virtual assistants, search engines, customer analysis, and general data analytics. The recent development of natural language processing tools and systems such as GPT-3 has many excited for the future of NLP technology for business.
To learn more about natural language processing companies, I talked with the co-founder of Linalgo, Arnaud Rachez. Linalgo is a machine learning and natural language processing consultancy that helps clients design, implement, and deploy data products. Through a blend of computer vision and natural language processing tools, they’ve developed a platform for analyzing and digitizing printed documents at scale.
In this article, we talk about the demand for natural language processing applications, tips for planning and designing successful NLP projects, and why high-quality data is the bedrock of any language-related automated system.
The Demand for NLP: Realities and Misconceptions
To get a sense of trends in the market, I asked Rachez which natural language processing applications he thought were most popular at present. He pointed to chatbot systems, saying, “In the past few years we’ve had quite a few clients asking about [chatbots]. They want to understand how to transform customer service processes that were once handled through email or phone to conversational agents.”
Outside of chatbot technology, Rachez has noticed a general increase in the demand for NLP in marketing, having worked on a number of social network analysis projects. “NLP is hugely important when it comes to analyzing communities and what they talk about at scale,” he said, “and there’s a lot of value that brands can capture when they understand who says what about different topics on social networks.”
On the flip side, Rachez said that growing demand has also resulted in misconceptions regarding the realistic capabilities of natural language processing and machine learning. When I asked for more detail, Rachez said, “Sometimes there’s a misunderstanding as to what machine learning can actually do. There’s this idea that if you have a team of data scientists, you’re magically going to be able to extract value from your data and solve all your problems. But that’s not what actually happens. To achieve your goals, you really need to understand what you’re working on and look at machine learning as a tool to help you get there. Machine learning is not going to find your problems for you. But if you know what you’re doing, it might help you solve them in a more efficient way.”
He stressed that when it comes to implementing a business solution with automated technologies, it’s not about a specific algorithm or machine learning technique. Rather, it’s about ensuring you have everything you need to make the algorithms and techniques work. As examples, Rachez pointed to the availability of high-quality annotated data, and the ability to deploy, monitor, and update your models.
What Makes For a Successful NLP Application?
Rachez pointed to two areas that can determine the success or failure of NLP projects in business: data availability and the ability to deploy into production.
“First, you need to have the right data. Data scientists and machine learning engineers won’t be able to do their job properly without quality, annotated data. This is especially true in natural language processing because of how machine learning works: the algorithms learn how to do their task based on lots of examples. Only humans are capable of language, so if you want to automate an NLP task, you need human beings to produce a large number of examples to feed the machine. The difficulty here is that most of the time, raw utterances are not enough; you need structured productions, and that requires domain modeling.”
“Take a reservation chatbot that handles airline tickets, for example. You can’t train it by just collecting sentences such as ‘Book me a Tokyo-Paris flight next Monday at 1 PM.’ You need to teach the algorithm that Tokyo and Paris are cities, that Monday is a date, and that 1 PM is a time. That’s why the process of tagging data is so valuable in NLP.”
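To make the tagging idea concrete, here is a minimal Python sketch of what an annotated version of that sentence could look like. It is only an illustration, not Linalgo's method: the label names (CITY, DATE, TIME), the helper functions, and the BIO tagging convention are all assumptions chosen for the example.

```python
import re
from dataclasses import dataclass


@dataclass
class Span:
    """A labeled stretch of text, given as character offsets (hypothetical schema)."""
    start: int
    end: int
    label: str


def annotate(text, entities):
    """Turn (surface form, label) pairs into character spans.

    Simplification: only the first occurrence of each surface form is used.
    """
    spans = []
    for surface, label in entities:
        start = text.find(surface)
        if start == -1:
            raise ValueError(f"{surface!r} not found in text")
        spans.append(Span(start, start + len(surface), label))
    return sorted(spans, key=lambda s: s.start)


def to_bio(text, spans):
    """Split text into word and punctuation tokens and give each a BIO tag."""
    tagged = []
    for match in re.finditer(r"\w+|[^\w\s]", text):
        tag = "O"  # default: token is outside any entity
        for span in spans:
            if match.start() >= span.start and match.end() <= span.end:
                prefix = "B-" if match.start() == span.start else "I-"
                tag = prefix + span.label
                break
        tagged.append((match.group(), tag))
    return tagged


UTTERANCE = "Book me a Tokyo-Paris flight next Monday at 1 PM"
# What a human annotator would supply on top of the raw sentence.
ENTITIES = [("Tokyo", "CITY"), ("Paris", "CITY"), ("Monday", "DATE"), ("1 PM", "TIME")]

if __name__ == "__main__":
    for token, tag in to_bio(UTTERANCE, annotate(UTTERANCE, ENTITIES)):
        print(f"{token:8} {tag}")
```

The raw sentence alone carries none of this structure. The entity list is the part that has to come from human annotators working from an agreed set of labels.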
As for deployment, Rachez described it as a common roadblock. Many projects, he said, end as mere exploratory analyses or simple proofs of concept.
“Many projects never make it to production. You try something, you get preliminary results, but you’re not able to put them into production. This could be for a variety of reasons. Perhaps the project didn’t fulfill expectations, or it’s too complicated to implement, or a much simpler approach performs just as well. So it’s important to know that the data you have before production is not the data you’ll have in production. The production stage is when you get real data, and it might well be different from the data you trained on.”
The Road to Improved Natural Language Processing Applications
I asked Rachez to elaborate on the idea of projects that don’t make it to production, and why implementing natural language processing applications is so difficult.
“There’s still a general lack of understanding around the requirements of machine learning for NLP,” he said. “This means that objectives aren’t always clear and expectations aren’t realistic.”
To address this lack of understanding, Rachez pointed to the importance of teamwork, even before deciding on any natural language processing tools. At the inception of a machine learning project, it’s important for the business, the domain experts, and the engineers to work closely together.
However, since implementation is becoming more common, the awareness of NLP systems and their capabilities is growing by the day. According to Rachez, this means it’s easier to implement processes to review progress on a regular basis. This in turn helps to confirm a project’s feasibility much earlier.
“We’re gradually building a better understanding of what is needed to solve problems with machine learning. Unfortunately, it is still almost impossible to predict the results of training a machine learning algorithm on a specific task, but we can build processes to detect failure at the early stages. This means we don’t have to wait until we’ve annotated hundreds of thousands of documents before getting the results we need. With the right feedback loop or with human-in-the-loop processes, you can spot trends early. This will let you know if you should continue investing in a project or shut it down because it won’t deliver on expectations.”
The Importance of Quality
Because machine learning models often rely on huge amounts of data, I asked Rachez to talk a little about the impact of data quality on natural language processing tools. He said the key is to know exactly what data you need before collecting and annotating your data. This means having clearly defined business goals and a good understanding of your industry. A talented data science team on its own might not be enough.
“First you need someone to scope or model the domain you’re working in,” said Rachez. “With chatbots, for example, that domain expert will answer questions like ‘What kind of requests are there?’ and ‘What kind of answers can you provide?’ For many chatbots, you’ll have a variety of customer claims that need to be classified. Are they claims for products that are late? Broken? Did the customer get the wrong product? All of these categories need to be clearly defined well before you get into the data science aspects of the work.”
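One way to picture the outcome of this scoping step is as a fixed label schema that annotated data is checked against. The Python sketch below assumes the three claim categories mentioned in the quote. The class name, the label strings, and the sample records are hypothetical, made up only to illustrate the idea.

```python
from collections import Counter
from enum import Enum


class ClaimType(Enum):
    """Claim categories agreed with the domain expert (hypothetical names)."""
    LATE_DELIVERY = "late"
    BROKEN_PRODUCT = "broken"
    WRONG_PRODUCT = "wrong_product"


def validate_labels(records):
    """Split (text, label) records into those inside and outside the schema."""
    allowed = {claim.value for claim in ClaimType}
    valid, rejected = [], []
    for text, label in records:
        (valid if label in allowed else rejected).append((text, label))
    return valid, rejected


def label_distribution(records):
    """Count how many examples each category has, to spot thin or skewed classes."""
    return Counter(label for _, label in records)


# Made-up sample records, for illustration only.
SAMPLE = [
    ("My parcel still has not arrived.", "late"),
    ("The screen was cracked in the box.", "broken"),
    ("I ordered a blue one and got red.", "wrong_product"),
    ("Can I change my password?", "account"),  # not in the schema
]

if __name__ == "__main__":
    valid, rejected = validate_labels(SAMPLE)
    print(label_distribution(valid))
    print("outside the domain model:", rejected)
```

Records that fall outside the schema are a useful signal in their own right. They show either an annotation mistake or a category the domain model has not covered yet.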
In short, domain modeling helps data scientists to build the classifiers and data extraction algorithms that can provide answers to customer questions. It also helps to ensure that data is not biased, and is an important part of ensuring quality.
“Domain modeling is a tedious process,” said Rachez, “but it’s a very important part of the data science process. Once it is done, you can finally start to measure data quality with rigorous processes and concrete metrics like inter-rater agreement scores. Without it, you won’t be able to implement a data quality verification process.”
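As an illustration of one such metric, the Python sketch below computes Cohen's kappa, a common inter-rater agreement score for two annotators. Rachez did not say which score his team uses, so the choice of kappa is an assumption. The two label lists are invented for the example.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items.

    kappa = (observed agreement - chance agreement) / (1 - chance agreement)
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    # Share of items on which the two annotators chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Invented labels from two annotators over the same six customer claims.
ANNOTATOR_A = ["late", "late", "broken", "wrong", "broken", "late"]
ANNOTATOR_B = ["late", "broken", "broken", "wrong", "broken", "late"]

if __name__ == "__main__":
    print(f"kappa = {cohen_kappa(ANNOTATOR_A, ANNOTATOR_B):.3f}")
```

A score near 1 means the annotators agree far more often than chance would predict. A low score usually points back to categories that were not defined clearly enough in the domain model.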
The implementation of natural language processing applications can improve businesses in a variety of ways, from automating customer correspondence to drawing important analysis from historical data. However, the best implementation of these systems starts with a conscientious approach. To ensure a high-quality end result, care has to be taken with regard to data collection, data annotation, and data exploration.
If you’re looking to implement natural language processing applications in your own business, or need data for a particular NLP project, our data experts can help you define your business needs and construct a clear understanding of what data needs to be collected and how to prepare it. Our custom platform makes annotation projects easy to manage and easy to scale. For more information, get in touch today.