50 Beginner AI Terms You Should Know

Article by Daniel Smith | June 05, 2019

AI is a field filled with technical terms. It can be difficult to pin down exactly what they mean, particularly if you don’t work directly with data every day.

That’s why we’ve created a glossary of 50 terms that frequently come up in discussions about AI. If you can lock down these basics, you should be able to hold your own in any discussion about machine learning. Let’s run through them in alphabetical order.

 

Algorithm

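A set of rules or step-by-step instructions that a machine follows to complete a task. In machine learning, the algorithm is the procedure a machine uses to learn how to do a task from data.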

 

Artificial Intelligence

This refers to the general concept of machines acting in a way that simulates or mimics human intelligence. AI can have a variety of features, such as human-like communication or decision making.

 

Autonomous

A machine is described as autonomous if it can perform its task without the need for a human.

 

Backward Chaining

A reasoning method in which a system starts with a desired conclusion, or goal, and works backward through its rules to find facts that support it.
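To make the idea concrete, here is a minimal sketch of backward chaining in Python. The rule base, the fact names and the `backward_chain` helper are all hypothetical, and the sketch assumes the rules contain no cycles.

```python
# Hypothetical rule base: each goal maps to the premises needed to prove it.
RULES = {
    "can_fly": ["is_bird", "has_wings"],
    "is_bird": ["has_feathers"],
}
# Hypothetical known facts.
FACTS = {"has_feathers", "has_wings"}


def backward_chain(goal, rules, facts):
    """Return True if `goal` can be proven from `facts` using `rules`."""
    if goal in facts:        # the goal is already a known fact
        return True
    if goal not in rules:    # no rule concludes this goal
        return False
    # Work backward: the goal holds only if every premise can be proven.
    return all(backward_chain(premise, rules, facts) for premise in rules[goal])


print(backward_chain("can_fly", RULES, FACTS))  # True
```

The search starts at the goal ("can_fly") and only looks at the facts needed to support it.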

 

Bias

Assumptions made by a model that simplify the process of learning to do its assigned task. Most supervised machine learning models perform better with low bias, as these assumptions can negatively affect results.

 

Big Data

Datasets that are too large or complex to be used by traditional data processing applications.

 

Bounding Box

Commonly used in image or video tagging, this is an imaginary box drawn on an image. The contents of the box are labeled to help a model recognize it as a distinct type of object.
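In annotation data, a bounding box is often stored as a label plus the pixel coordinates of two opposite corners. The sketch below shows one possible representation; the `BoundingBox` class, its field names and the coordinate values are assumptions made for illustration, not a standard format.

```python
from dataclasses import dataclass


@dataclass
class BoundingBox:
    """A labeled rectangle on an image, in pixel coordinates (assumed layout)."""
    label: str
    x_min: int
    y_min: int
    x_max: int
    y_max: int

    def area(self) -> int:
        return (self.x_max - self.x_min) * (self.y_max - self.y_min)

    def contains(self, x: int, y: int) -> bool:
        # True if the pixel (x, y) falls inside the box, edges included.
        return self.x_min <= x <= self.x_max and self.y_min <= y <= self.y_max


box = BoundingBox("car", x_min=40, y_min=60, x_max=200, y_max=160)
print(box.label, box.area())  # car 16000
```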

 

 

Chatbot

A program that is designed to communicate with people through text or voice commands in a way that mimics human-to-human conversation.

 

Cognitive Computing

This is effectively another way to say artificial intelligence. It’s used by marketing teams at some companies to avoid the science fiction aura that sometimes surrounds AI.

 

Computational Learning Theory

A field within artificial intelligence that is primarily concerned with the mathematical design and analysis of machine learning algorithms.

 

Corpus

A large dataset of written or spoken material that can be used to train a machine to perform linguistic tasks.

 

Data Mining

The process of analyzing datasets in order to discover new patterns that might improve the model.

 

Data Science

Drawing from statistics, computer science and information science, this interdisciplinary field aims to use a variety of scientific methods, processes and systems to solve problems involving data.

 

Dataset

A collection of related data points, usually with a uniform order and tags. For a variety of datasets for different use cases, check out our collection of the 50 best datasets for machine learning.

 

Deep Learning

A subset of machine learning that uses neural networks with many layers to learn useful features directly from data, rather than relying on an algorithm that’s programmed to do one specific thing. The layered structure is loosely inspired by the human brain.

 

Entity Annotation

The process of labeling unstructured sentences with information so that a machine can read them. For example, this could involve labeling all people, organizations and locations in a document.
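One common way to store entity annotations is as character spans with a label attached. The following is a small sketch of that idea; the example sentence, the `annotate` helper and the label names are hypothetical.

```python
text = "Ada Lovelace worked with Charles Babbage in London."


def annotate(text, phrase, label):
    """Return a span annotation for the first occurrence of `phrase`."""
    start = text.find(phrase)
    if start == -1:
        raise ValueError(f"{phrase!r} not found in text")
    # `end` is exclusive, so text[start:end] gives back the phrase.
    return {"start": start, "end": start + len(phrase), "label": label}


annotations = [
    annotate(text, "Ada Lovelace", "PERSON"),
    annotate(text, "Charles Babbage", "PERSON"),
    annotate(text, "London", "LOCATION"),
]

for a in annotations:
    print(text[a["start"]:a["end"]], "->", a["label"])
```

In a real project the spans would be marked by human annotators or a model rather than found by string search.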

 

 

Entity Extraction

An umbrella term referring to the process of identifying and classifying entities, such as people, organizations and locations, in unstructured data, adding structure so that a machine can read it. This may be done by humans or by a machine learning model.

 

Forward Chaining

A reasoning method in which a machine starts with the known facts of a problem and repeatedly applies its rules to derive new facts until it reaches a potential solution. Along the way, the AI must determine which rules are relevant to the problem.
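A minimal sketch of forward chaining might look like the following. The rules, the fact names and the `forward_chain` helper are hypothetical and only meant to illustrate the direction of reasoning, from known facts toward conclusions.

```python
# Hypothetical rules: (set of premises, conclusion).
RULES = [
    ({"has_feathers"}, "is_bird"),
    ({"is_bird", "has_wings"}, "can_fly"),
]


def forward_chain(facts, rules):
    """Apply rules repeatedly until no new facts can be derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            # Fire a rule when all its premises are known and it adds something new.
            if premises <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known


derived = forward_chain({"has_feathers", "has_wings"}, RULES)
print(sorted(derived))  # ['can_fly', 'has_feathers', 'has_wings', 'is_bird']
```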

 

General AI

An AI that could successfully do any intellectual task that any given human being currently can. This is sometimes referred to as strong AI, although they aren’t entirely equivalent terms.

 

Hyperparameter

Occasionally used interchangeably with parameter, although the terms have some subtle differences. Hyperparameters are values that affect the way your model learns. They are usually set manually outside the model.
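The difference is easier to see in code. In the sketch below, which fits a straight line through the origin with gradient descent, `learning_rate` and `epochs` are hyperparameters chosen by the person running the training, while the slope `w` is a parameter learned from the data. The function name and the toy data are assumptions made for illustration.

```python
def fit_slope(xs, ys, learning_rate=0.01, epochs=200):
    """Fit y = w * x by gradient descent on mean squared error."""
    w = 0.0  # parameter: learned from the data
    n = len(xs)
    for _ in range(epochs):  # hyperparameter: number of passes over the data
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad  # hyperparameter: size of each update step
    return w


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # toy data generated with a true slope of 2

w = fit_slope(xs, ys, learning_rate=0.01, epochs=200)
print(round(w, 3))  # 2.0
```

With this toy data, a learning rate of 0.2 makes the updates overshoot and the slope moves further from 2 on every step, which is why hyperparameters usually need tuning.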

 

Intent

Commonly used in training data for chatbots and other Natural Language Processing tasks, this is a type of label which defines the purpose or goal of what is said. For example, the intent for the phrase “turn the volume down” could be “decrease volume”.

 

Label

A part of training data which identifies the desired output for that particular piece of data.

 

Linguistic Annotation

Tagging a dataset of sentences with the subject of each sentence, ready for some form of analysis or assessment. Common uses for linguistically annotated data include sentiment analysis and natural language processing.

 

Machine Intelligence

An umbrella term for various types of learning algorithms, including machine learning and deep learning.

 

Machine Learning

This subset of AI is particularly focused on developing algorithms that help machines to learn from data and change in response to new data, without being explicitly programmed for each task.

 

Machine Translation

The translation of text by an algorithm, independent of any human involvement.

 

Model

A loosely-defined term referring to the product of AI training, created by running a machine learning algorithm on training data.

 

Neural Network

Also called a neural net, this is a computing system made up of layers of connected nodes, loosely inspired by the neurons of the human brain. Although researchers are still working on creating a machine model of the human brain, existing neural networks can perform many tasks involving speech, vision and board game strategy.
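As a toy illustration, the sketch below wires three artificial neurons into a tiny network that computes XOR ("one or the other, but not both"). The weights here are set by hand purely to show how the pieces connect; in a real neural network they would be learned from training data. All names are hypothetical.

```python
def step(z):
    """Step activation: the neuron fires (1) if its input is positive."""
    return 1 if z > 0 else 0


def neuron(inputs, weights, bias):
    """Weighted sum of inputs plus a bias, passed through the activation."""
    return step(sum(i * w for i, w in zip(inputs, weights)) + bias)


def xor_network(a, b):
    # Hidden layer: two neurons looking at the same inputs.
    h_or = neuron([a, b], [1, 1], -0.5)   # fires if a OR b
    h_and = neuron([a, b], [1, 1], -1.5)  # fires if a AND b
    # Output layer: fires if OR is on but AND is off.
    return neuron([h_or, h_and], [1, -1], -0.5)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", xor_network(a, b))
```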

 

Natural Language Generation (NLG)

This refers to the process by which a machine turns structured data into text or speech that humans can understand. Essentially, NLG is concerned with what a machine writes or says itself as the end part of the communication process.

 

 

Natural Language Processing (NLP)

The umbrella term for any machine’s ability to perform conversational tasks, such as recognizing what is said to it, understanding the intended meaning and responding intelligibly. Check out our recent article for a more in-depth discussion about natural language processing.

 

Natural Language Understanding (NLU)

As a subset of NLP, this deals with helping machines to recognize the intended meaning of language, taking into account its subtle nuances and any grammatical errors.

 

Overfitting

A common problem in machine learning training in which a model fits its training data so closely that it is only able to work on or identify the specific examples present in that data. A working model should be able to use the general trends behind the data to work on new examples.
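The sketch below exaggerates the problem to make it visible. A hypothetical "memorizer" model stores the training examples and guesses on anything else, while a second function captures the general trend behind the data. The usual warning sign of overfitting is the gap between accuracy on the training examples and accuracy on new ones. The data and function names are invented for illustration.

```python
train = [(1, "odd"), (2, "even"), (3, "odd"), (4, "even")]
test = [(5, "odd"), (6, "even"), (7, "odd"), (8, "even")]

memory = dict(train)


def memorizer(x):
    """Overfit model: perfect recall of training data, a fixed guess otherwise."""
    return memory.get(x, "even")


def general_rule(x):
    """Model that captures the trend behind the data."""
    return "even" if x % 2 == 0 else "odd"


def accuracy(predict, examples):
    correct = sum(1 for x, label in examples if predict(x) == label)
    return correct / len(examples)


print(accuracy(memorizer, train), accuracy(memorizer, test))        # 1.0 0.5
print(accuracy(general_rule, train), accuracy(general_rule, test))  # 1.0 1.0
```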

 

Parameter

A variable inside the model that helps it to make predictions. A parameter’s value is estimated from data and is usually not set by the person running the model.

 

Pattern Recognition

The distinction between this and machine learning is often blurry, but this field is basically concerned with finding trends and patterns in data.

 

Predictive Analytics

By combining data mining and machine learning, this type of analytics is built to forecast what will happen within a given timeframe based on historical data and trends.

 

Python

A popular general-purpose programming language that is widely used for machine learning and data science.

 

Reinforcement Learning

This is a method of teaching AI in which the model, often called an agent, learns by trial and error rather than from a set of correct answers. The agent tests different actions, receives feedback in the form of rewards or penalties, and adjusts its behavior in the next scenario to get better results.
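As a rough illustration of learning from feedback, here is a simplified sketch of tabular Q-learning, one well-known reinforcement learning technique. A hypothetical agent stands in a short corridor and is rewarded only when it reaches the far end. To keep the example deterministic, it tries every action in every position in turn instead of exploring randomly, which is a simplification. The corridor, the reward values and all names are assumptions.

```python
GOAL = 4            # corridor positions 0..4; reaching 4 ends the episode
ACTIONS = (-1, +1)  # step left or step right


def step(state, action):
    """Environment: move along the corridor, reward 1.0 on reaching the goal."""
    next_state = min(max(state + action, 0), GOAL)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward


def train(sweeps=100, alpha=0.5, gamma=0.9):
    """Learn action values (Q) from rewards using the Q-learning update."""
    q = {s: {a: 0.0 for a in ACTIONS} for s in range(GOAL)}
    for _ in range(sweeps):
        for s in range(GOAL):
            for a in ACTIONS:
                nxt, reward = step(s, a)
                future = 0.0 if nxt == GOAL else max(q[nxt].values())
                # Nudge the estimate toward reward + discounted future value.
                q[s][a] += alpha * (reward + gamma * future - q[s][a])
    return q


def greedy_policy(q):
    """Pick the highest-valued action in each state."""
    return {s: max(ACTIONS, key=q[s].get) for s in q}


policy = greedy_policy(train())
print(policy)  # {0: 1, 1: 1, 2: 1, 3: 1}
```

Nobody tells the agent that "right" is correct; it works that out from the rewards alone.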

 

Semantic Annotation

Tagging different search queries or products with the goal of improving the relevance of a search engine.

 

Sentiment Analysis

The process of identifying and categorizing opinions in a piece of text, often with the goal of determining the writer’s attitude towards something.
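A very simple way to approximate this is to count positive and negative words. The sketch below does exactly that with a tiny, made-up word list. Production systems typically rely on trained models rather than hand-written lists, so treat this only as an illustration of the input and output.

```python
# Hypothetical mini-lexicons, invented for this example.
POSITIVE = {"good", "great", "love", "excellent"}
NEGATIVE = {"bad", "terrible", "hate", "poor"}


def sentiment(text):
    """Classify text as positive, negative or neutral by counting cue words."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(sentiment("I love this phone, the camera is great!"))  # positive
print(sentiment("Terrible battery and poor screen."))        # negative
```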

 

 

Strong AI

This field of research is focused on developing AI that is equal to the human mind when it comes to ability. General AI is a similar term often used interchangeably.

 

Supervised Learning

This is a type of machine learning where structured datasets, with inputs and labels, are used to train and develop an algorithm.
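For a sense of what "inputs and labels" look like in practice, here is a minimal sketch using a nearest-neighbor classifier, one of the simplest supervised methods. It labels a new point with the label of the closest training example. The measurements and labels are invented for illustration.

```python
# Labeled training data: (input features, label).
train = [
    ((1.0, 1.0), "cat"),
    ((1.2, 0.8), "cat"),
    ((5.0, 5.0), "dog"),
    ((4.8, 5.2), "dog"),
]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict(point, examples):
    """Return the label of the training example closest to `point`."""
    _, label = min(examples, key=lambda ex: squared_distance(ex[0], point))
    return label


print(predict((1.1, 0.9), train))  # cat
print(predict((5.1, 4.9), train))  # dog
```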

 

Test Data

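The data held back from training and used to check that a machine learning model is able to perform its assigned task on examples it has not seen before. The model is not shown the labels for this data; they are kept aside and compared with its predictions.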

 

Training Data

This can refer either to all the data used during the process of training a machine learning algorithm, or to the specific dataset used for training rather than testing.

 

Transfer Learning

This method of learning involves taking a model that has already been trained on one task and reusing it as the starting point for a related task, which usually takes less data and time than training from scratch. One potential example of this is taking a model that analyzes sentiment in product reviews and fine-tuning it to analyze tweets.

 

Turing Test

Named after Alan Turing, this tests a machine’s ability to pass for a human, particularly in the fields of language and behavior. After being graded by a human, the machine passes if its output is indistinguishable from that of human participants in the test.

 

Unsupervised Learning

This is a form of training where the algorithm is asked to make inferences from datasets that don’t contain labels. These inferences are what help it to learn.
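Clustering is a typical example: the algorithm groups similar data points without being told what the groups are. The sketch below is a stripped-down, one-dimensional version of k-means with two clusters. The data, the starting centers and the function name are assumptions made for illustration.

```python
def two_means(values, iterations=10):
    """Split unlabeled numbers into two clusters around their means."""
    centers = [min(values), max(values)]  # simple deterministic starting guess
    groups = [[], []]
    for _ in range(iterations):
        # Assign each value to its nearest center.
        groups = [[], []]
        for v in values:
            nearest = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            groups[nearest].append(v)
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups


values = [1.0, 1.5, 2.0, 9.0, 9.5, 10.0]  # no labels attached
centers, groups = two_means(values)
print(centers)  # [1.5, 9.5]
print(groups)   # [[1.0, 1.5, 2.0], [9.0, 9.5, 10.0]]
```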

 

Validation Data

Structured like training data with an input and labels, this data is used to test a recently trained model against new data and to analyze performance, with a particular focus on checking for overfitting.
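In practice, training, validation and test data are often carved out of a single labeled dataset. The sketch below shows one way to do that; the 60/20/20 proportions, the fixed shuffle seed and the function name are assumptions rather than a fixed rule.

```python
import random


def split_dataset(examples, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle a dataset and split it into training, validation and test sets."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    train_end = round(len(shuffled) * train_frac)
    val_end = train_end + round(len(shuffled) * val_frac)
    return shuffled[:train_end], shuffled[train_end:val_end], shuffled[val_end:]


examples = list(range(10))  # stand-ins for ten labeled examples
train_set, val_set, test_set = split_dataset(examples)
print(len(train_set), len(val_set), len(test_set))  # 6 2 2
```

Keeping the three sets separate is what makes the validation and test results a fair check on new data.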

 

Variance

The amount that the function learned by a machine learning model would change if it were trained on different training data. Despite being flexible, models with high variance are prone to overfitting and low predictive accuracy, since they are overly reliant on their training data.

 

Variation

Also called queries or utterances, these work in tandem with intents for Natural Language Processing. The variation is what a person might say to achieve a certain purpose or goal. For example, if the intent is “pay by credit card”, the variation might be “I’d like to pay by card, please.”
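Chatbot training data is often organized as a set of intents, each with a list of variations. The sketch below shows that structure along with a deliberately naive matcher that picks the intent whose variations share the most words with what the user said. The extra variations and the matching method are hypothetical; real systems use trained language models instead.

```python
# Intents mapped to example variations (utterances).
TRAINING_DATA = {
    "decrease volume": ["turn the volume down", "make it quieter"],
    "pay by credit card": ["I'd like to pay by card, please.", "can I use my credit card"],
}


def tokens(text):
    """Lowercase words with surrounding punctuation removed."""
    return {w.strip(".,!?").lower() for w in text.split()}


def guess_intent(utterance):
    """Return the intent whose variations overlap most with the utterance."""
    words = tokens(utterance)

    def overlap(intent):
        return max(len(words & tokens(v)) for v in TRAINING_DATA[intent])

    return max(TRAINING_DATA, key=overlap)


print(guess_intent("please turn down the volume"))  # decrease volume
print(guess_intent("I want to pay by card"))        # pay by credit card
```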

 

Weak AI

Also called narrow AI, this is a model that has a set range of skills and focuses on one particular set of tasks. Most AI currently in use is weak AI, unable to learn or perform tasks outside of its specialist skill set.

 

Still confused?

A single term doesn’t always mean very much in isolation. However, when combined with a basic understanding of the way machine learning works, these terms can form a powerful foundation that will enable you to make informed business decisions. If you want to continue building out your knowledge base, start with training: the process that will make or break your model. Check out our introduction to training data for more.

The Author
Daniel Smith

Daniel writes a variety of content for Lionbridge’s website as part of the marketing team. Born and raised in the UK, he first came to Japan by chance in 2013 and is continually surprised that no one has thrown him out yet. Outside of Lionbridge, he loves to travel, take photos and listen to music that his neighbors really, really hate.
