This glossary defines general AI and machine learning terms. If you haven’t quite locked down the more basic AI terms, we’d recommend starting with these 50 Beginner AI Terms You Should Know.
What is anomaly detection?
Anomaly detection is the task of identifying suspicious or unusual elements in a dataset or data stream, based on how much those elements differ from the rest of the data on relevant criteria.
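The glossary does not name a method, so here is a minimal sketch of one simple statistical approach. It flags any value whose z-score (its distance from the mean, measured in standard deviations) exceeds a threshold. The threshold of 2.0 and the sample readings are arbitrary illustrative choices, and production systems usually need more robust techniques.

```python
import statistics


def zscore_anomalies(values, threshold=2.0):
    """Return the indices of values whose absolute z-score exceeds threshold."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)  # population standard deviation
    if stdev == 0:
        return []  # all values are identical, so nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mean) / stdev > threshold]


# Hypothetical sensor readings with one obvious outlier at index 6.
readings = [10.1, 9.8, 10.0, 10.3, 9.9, 10.2, 42.0, 10.0]
print(zscore_anomalies(readings))  # [6]
```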
Binary (Binomial) Classification
What is binary classification?
Binary classification is the task of classifying elements into two groups, based on a classification rule, defined below.
Classification Rule (Classifier)
What is a classification rule?
Given a population where elements belong to different categories, a classification rule is a procedure to predict which elements belong to which categories.
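As a minimal sketch, a classification rule can be as simple as a threshold on one feature, which also gives a binary classifier. The feature name `num_links`, the threshold of 5, and the spam labels are hypothetical choices for illustration, not a recommended spam filter.

```python
def make_threshold_rule(feature, threshold, labels=("negative", "positive")):
    """Build a rule that assigns labels[1] when record[feature] >= threshold."""
    def rule(record):
        return labels[1] if record[feature] >= threshold else labels[0]
    return rule


# Hypothetical rule: treat emails with 5 or more links as spam.
rule = make_threshold_rule("num_links", 5, labels=("not spam", "spam"))
emails = [{"num_links": 0}, {"num_links": 7}, {"num_links": 5}]
print([rule(e) for e in emails])  # ['not spam', 'spam', 'spam']
```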
What is a complex system?
A complex system is a system made up of many components that interact with each other, so that the behavior of the whole is hard to predict from the behavior of its individual parts.
What is computational intelligence?
Computational intelligence is the ability of a computer to learn a specific task from training data or experimental observation.
What is computer vision?
Computer vision enables machines to understand the content of images and videos. The goal is to automate tasks that the human visual system can do.
What is data cleansing?
Data cleansing is the process of improving data quality, usually by removing or correcting incorrect, incomplete, or duplicated values. It is an important step to complete before beginning a machine learning project.
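A minimal sketch of a cleansing pass in plain Python. The record fields (`name`, `age`), the valid age range, and the normalization rules are hypothetical; real pipelines choose rules that fit the data at hand.

```python
def clean_records(records):
    """Normalize names, drop invalid rows, and remove duplicates."""
    cleaned, seen = [], set()
    for rec in records:
        name = (rec.get("name") or "").strip().title()
        age = rec.get("age")
        # Drop rows with a missing name or a missing/implausible age.
        if not name or not isinstance(age, int) or not 0 <= age <= 120:
            continue
        key = (name, age)
        if key in seen:  # duplicate once normalized
            continue
        seen.add(key)
        cleaned.append({"name": name, "age": age})
    return cleaned


raw = [
    {"name": "  alice ", "age": 34},
    {"name": "Alice", "age": 34},   # duplicate after normalization
    {"name": "bob", "age": -5},     # impossible age
    {"name": None, "age": 40},      # missing name
    {"name": "carol", "age": 29},
]
print(clean_records(raw))  # [{'name': 'Alice', 'age': 34}, {'name': 'Carol', 'age': 29}]
```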
What is game theory?
Game theory is the study of mathematical models of strategic interaction between rational decision makers. In simple terms, it studies how people make decisions when the outcome of each person's choice depends on the choices of others. It is used to analyze problems in economics, political science, and other sciences.
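A central idea in game theory is the Nash equilibrium: a combination of choices in which no player can do better by changing only their own choice. The sketch below finds the pure-strategy equilibria of a two-player game by checking each player's best responses, using the textbook prisoner's dilemma payoffs as illustrative data.

```python
from itertools import product


def pure_nash_equilibria(payoffs, row_moves, col_moves):
    """Return (row, col) pairs where neither player gains by deviating alone.

    payoffs[(r, c)] = (row player's payoff, column player's payoff)
    """
    equilibria = []
    for r, c in product(row_moves, col_moves):
        row_best = all(payoffs[(r, c)][0] >= payoffs[(r2, c)][0] for r2 in row_moves)
        col_best = all(payoffs[(r, c)][1] >= payoffs[(r, c2)][1] for c2 in col_moves)
        if row_best and col_best:
            equilibria.append((r, c))
    return equilibria


# Prisoner's dilemma with standard illustrative payoffs (higher is better).
moves = ["cooperate", "defect"]
dilemma = {
    ("cooperate", "cooperate"): (3, 3),
    ("cooperate", "defect"): (0, 5),
    ("defect", "cooperate"): (5, 0),
    ("defect", "defect"): (1, 1),
}
print(pure_nash_equilibria(dilemma, moves, moves))  # [('defect', 'defect')]
```

Both players would be better off cooperating, yet mutual defection is the only equilibrium, which is what makes the dilemma a classic example.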
What is grid search?
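Grid search is a hyperparameter tuning method that trains and evaluates a model for every combination of values in a predefined grid of hyperparameter values, then keeps the combination that performs best (typically on validation data). This matters because a model's performance can depend heavily on its hyperparameter values.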
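A minimal grid search sketch. The function `toy_validation_score` is a hypothetical stand-in for training a model with the given hyperparameters and scoring it on validation data; the hyperparameter names and candidate values are also illustrative.

```python
from itertools import product


def grid_search(score_fn, grid):
    """Evaluate every combination in grid and return (best_params, best_score)."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_validation_score(params):
    # Hypothetical: peaks at learning_rate=0.1, max_depth=4.
    return -(params["learning_rate"] - 0.1) ** 2 - (params["max_depth"] - 4) ** 2


grid = {"learning_rate": [0.01, 0.1, 1.0], "max_depth": [2, 4, 8]}
best, score = grid_search(toy_validation_score, grid)
print(best)  # {'learning_rate': 0.1, 'max_depth': 4}
```

The cost grows multiplicatively: this 3 × 3 grid needs 9 evaluations, and adding a third hyperparameter with 3 candidate values would need 27.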
What is ground truth?
In machine learning, ground truth refers to the reference labels or measurements that are known (or assumed) to be correct, such as the labels of a training dataset used in supervised learning. Models are trained and evaluated against the ground truth, and statistical models use it to support or refute research hypotheses.
Heuristic Search Techniques
What are heuristic search techniques?
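Heuristic search techniques use a heuristic (a rule of thumb or an estimate of how close a state is to the goal) to guide the search toward promising options and eliminate unpromising ones. This narrows the search space and often finds good solutions much faster than exhaustive search, although not every heuristic guarantees an optimal solution.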
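One well-known heuristic search algorithm is A*, sketched here on a small grid maze (the maze and coordinates are hypothetical). The heuristic is the Manhattan distance to the goal. Because it never overestimates the true remaining cost on a 4-connected grid, A* still returns a shortest path length while typically exploring fewer cells than an uninformed search.

```python
import heapq


def a_star(grid, start, goal):
    """A* on a 4-connected grid of strings ('#' = wall). Returns path length or None."""
    rows, cols = len(grid), len(grid[0])

    def h(cell):
        # Manhattan distance: an admissible heuristic on a 4-connected grid.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    frontier = [(h(start), 0, start)]  # entries are (g + h, g, cell)
    best_g = {start: 0}
    while frontier:
        _, g, cell = heapq.heappop(frontier)
        if cell == goal:
            return g
        if g > best_g.get(cell, float("inf")):
            continue  # stale entry: a cheaper route to this cell was found
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] != "#":
                ng = g + 1
                if ng < best_g.get((nr, nc), float("inf")):
                    best_g[(nr, nc)] = ng
                    heapq.heappush(frontier, (ng + h((nr, nc)), ng, (nr, nc)))
    return None  # goal unreachable


maze = [
    "....",
    ".##.",
    "....",
]
print(a_star(maze, (0, 0), (2, 3)))  # 5
```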
What is logarithmic loss?
Logarithmic loss (log loss) measures the performance of a classification model whose output is a probability between 0 and 1. It penalizes confident predictions that turn out to be wrong especially heavily. Lower values are better, so training aims to minimize it; a perfect model has a log loss of 0.
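For binary labels y (0 or 1) and predicted probabilities p, log loss is usually computed as Log Loss = -(1/N) Σ [y·log(p) + (1 - y)·log(1 - p)]. A minimal sketch follows; clipping probabilities with a small `eps` to avoid log(0) is a common convention, and the value chosen here is an assumption.

```python
import math


def log_loss(y_true, y_prob, eps=1e-15):
    """Binary log loss: the mean of -[y*log(p) + (1 - y)*log(1 - p)]."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip so log() never receives 0
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


print(round(log_loss([1, 0], [0.5, 0.5]), 4))  # 0.6931 (uninformed guess)
print(round(log_loss([1, 0], [0.9, 0.1]), 4))  # 0.1054 (confident and correct)
print(round(log_loss([1, 0], [0.1, 0.9]), 4))  # 2.3026 (confident and wrong)
```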
What is logic programming?
Logic programming is a programming paradigm in which computation is carried out by reasoning over a knowledge base of facts and rules. Prolog is the best-known logic programming language; Prolog and LISP (a functional language) have both long been used in AI.
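Prolog answers queries by working backward from a goal (SLD resolution). The sketch below instead uses forward chaining, a different and simpler strategy, only to illustrate how new facts follow from stored facts and a rule. The predicates and names are hypothetical.

```python
def forward_chain(facts):
    """Derive grandparent facts from parent facts until nothing new appears.

    Encodes the Prolog-style rule: grandparent(X, Z) :- parent(X, Y), parent(Y, Z).
    """
    known = set(facts)
    while True:
        parents = [f for f in known if f[0] == "parent"]
        new = {
            ("grandparent", x, z)
            for (_, x, y1) in parents
            for (_, y2, z) in parents
            if y1 == y2
        } - known
        if not new:
            return known  # fixed point: no rule produces anything new
        known |= new


facts = {("parent", "ann", "bob"), ("parent", "bob", "cara")}
derived = forward_chain(facts)
print(("grandparent", "ann", "cara") in derived)  # True
```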
Long Short-Term Memory (LSTM)
What is long short-term memory (LSTM)?
Long short-term memory (LSTM) is a recurrent neural network architecture used in deep learning. Unlike standard feedforward neural networks, an LSTM has feedback connections, so it can process not only single data points but also entire sequences of data, such as speech or text. Gates inside each LSTM cell control what information is kept, written, and output, which lets the network retain information across long sequences.
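A minimal sketch of LSTM time steps, simplified to a scalar input and scalar states so the gate arithmetic is easy to follow (real LSTMs use vectors and weight matrices). The weight values are hypothetical; in practice they are learned during training.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, w):
    """One LSTM step with scalar input x, hidden state h, and cell state c.

    w maps each gate name to (input weight, recurrent weight, bias).
    """
    def gate(name, activation):
        wx, wh, b = w[name]
        return activation(wx * x + wh * h_prev + b)

    f = gate("forget", sigmoid)       # how much of the old cell state to keep
    i = gate("input", sigmoid)        # how much new information to write
    g = gate("candidate", math.tanh)  # candidate values to write
    o = gate("output", sigmoid)       # how much of the cell state to expose
    c = f * c_prev + i * g            # updated cell state (the "memory")
    h = o * math.tanh(c)              # new hidden state / output
    return h, c


# Hypothetical hand-picked weights.
weights = {
    "forget": (0.5, 0.5, 1.0),
    "input": (1.0, 0.5, 0.0),
    "candidate": (1.0, 0.5, 0.0),
    "output": (1.0, 0.5, 0.0),
}
h, c = 0.0, 0.0
for x in [0.5, -1.0, 2.0]:  # process a whole sequence, one element at a time
    h, c = lstm_step(x, h, c, weights)
print(round(h, 3), round(c, 3))
```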
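What is naive Bayes?
Naive Bayes is a probabilistic machine learning classifier based on Bayes' theorem. It assumes that features are conditionally independent given the class (the "naive" assumption) and assigns each item to the class with the highest posterior probability, which is the Maximum A Posteriori (MAP) decision rule. Naive Bayes classifiers are commonly used for text classification and are a traditional solution for spam detection.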
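A minimal multinomial naive Bayes sketch for spam detection. It sums log probabilities instead of multiplying raw probabilities to avoid numerical underflow, and uses add-one (Laplace) smoothing so that an unseen word does not force a class probability to zero. The four training messages and whitespace tokenization are simplifying assumptions.

```python
import math
from collections import Counter


def train_nb(docs, labels):
    """Count word frequencies per class for a multinomial naive Bayes model."""
    class_counts = Counter(labels)
    word_counts = {c: Counter() for c in class_counts}
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_counts, word_counts, vocab


def predict_nb(model, doc):
    """Pick the class with the highest log posterior (the MAP decision rule)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for word in doc.lower().split():
            # Add-one smoothing: every word gets a pseudo-count of 1.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


docs = ["win money now", "cheap money offer", "meeting at noon", "lunch meeting tomorrow"]
labels = ["spam", "spam", "ham", "ham"]
model = train_nb(docs, labels)
print(predict_nb(model, "cheap money"))   # spam
print(predict_nb(model, "noon meeting"))  # ham
```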
Named Entity Recognition (NER)
What is named entity recognition (NER)?
Named entity recognition (NER) is the task of locating named entities in a body of text and classifying them into predefined categories such as person, organization, and location.
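The simplest form of NER is a dictionary (gazetteer) lookup, sketched below with a hypothetical three-entry gazetteer. It cannot recognize names it has not seen or resolve ambiguous ones, which is why practical NER systems are usually trained on labeled text.

```python
import re

# Hypothetical gazetteer mapping known names to entity categories.
GAZETTEER = {
    "Ada Lovelace": "PERSON",
    "United Nations": "ORGANIZATION",
    "London": "LOCATION",
}


def tag_entities(text):
    """Return (entity, label) pairs for gazetteer names, in order of appearance."""
    found = []
    for name, label in GAZETTEER.items():
        # \b keeps "London" from matching inside a longer word.
        for match in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            found.append((match.start(), name, label))
    return [(name, label) for _, name, label in sorted(found)]


print(tag_entities("Ada Lovelace was born in London."))
# [('Ada Lovelace', 'PERSON'), ('London', 'LOCATION')]
```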
What is natural intelligence?
Natural intelligence refers to how humans and animals think, as opposed to artificial intelligence.
Optical Character Recognition (OCR)
What is optical character recognition (OCR)?
Optical character recognition (OCR) technology enables computers to extract text data from images. Once a document (typed, handwritten, or printed) undergoes OCR processing, the text data can easily be edited, searched, indexed, and retrieved.
What is an optimization problem?
In mathematics and computer science, an optimization problem is the problem of finding the best solution among all feasible solutions, according to an objective (such as cost, distance, or value) to be minimized or maximized, rather than settling for any solution that works.
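To make the difference between a solution that works and the best solution concrete, the sketch below solves a tiny 0/1 knapsack problem by exhaustive search. Many subsets fit within the capacity (they are feasible), but only one maximizes total value. The items and capacity are hypothetical, and brute force is practical only for very small inputs.

```python
from itertools import combinations


def best_knapsack(items, capacity):
    """Find the highest-value subset whose total weight fits in capacity.

    items: list of (name, weight, value). Checks all 2**n subsets.
    """
    best_subset, best_value = (), 0
    for k in range(len(items) + 1):
        for subset in combinations(items, k):
            weight = sum(w for _, w, _ in subset)
            value = sum(v for _, _, v in subset)
            if weight <= capacity and value > best_value:  # feasible and better
                best_subset, best_value = subset, value
    return [name for name, _, _ in best_subset], best_value


items = [("tent", 5, 10), ("stove", 3, 6), ("camera", 2, 5), ("book", 4, 4)]
print(best_knapsack(items, capacity=8))  # (['tent', 'stove'], 16)
```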
What is phrase chunking?
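Phrase chunking (also called shallow parsing) is the process of grouping words that have already been tagged with their parts of speech into short phrases, such as noun phrases and verb phrases, without building a full parse tree.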
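A minimal noun-phrase chunker, assuming the words already carry Penn Treebank part-of-speech tags. The single pattern used here (optional determiner, any number of adjectives, one or more nouns) is a hypothetical simplification of the grammars real chunkers use.

```python
def np_chunk(tagged):
    """Group (word, POS) pairs into noun phrases matching: DT? JJ* NN+."""
    chunks, i, n = [], 0, len(tagged)
    while i < n:
        j = i
        if j < n and tagged[j][1] == "DT":  # optional determiner
            j += 1
        while j < n and tagged[j][1] == "JJ":  # any adjectives
            j += 1
        k = j
        while k < n and tagged[k][1].startswith("NN"):  # NN, NNS, NNP, ...
            k += 1
        if k > j:  # at least one noun, so tagged[i:k] is a noun phrase
            chunks.append(" ".join(word for word, _ in tagged[i:k]))
            i = k
        else:
            i += 1
    return chunks


sentence = [("the", "DT"), ("quick", "JJ"), ("brown", "JJ"), ("fox", "NN"),
            ("jumped", "VBD"), ("over", "IN"), ("the", "DT"), ("lazy", "JJ"), ("dog", "NN")]
print(np_chunk(sentence))  # ['the quick brown fox', 'the lazy dog']
```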
What is robotics?
Robotics deals with the design, construction, operation, and use of robots, as well as computer systems for their control, sensory feedback, and information processing.
What is search relevance?
Search relevance measures how well the results a search engine returns match the intent behind a user's query. High relevance lets users find the information they need quickly and easily.
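Relevance is often quantified with ranking metrics. One common example is precision at k: the fraction of the top k results that are relevant. The ranked results and relevance judgments below are hypothetical.

```python
def precision_at_k(results, relevant, k):
    """Fraction of the top-k returned results that are relevant to the query."""
    top_k = results[:k]
    if not top_k:
        return 0.0
    return sum(1 for doc in top_k if doc in relevant) / len(top_k)


# Hypothetical ranked results and human relevance judgments for one query.
results = ["doc3", "doc7", "doc1", "doc9", "doc4"]
relevant = {"doc1", "doc3", "doc4"}
print(precision_at_k(results, relevant, k=3))  # 0.6666666666666666 (2 of 3)
```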
What is soft computing?
Soft computing, sometimes also called computational intelligence, is the use of inexact but usable solutions to solve complex computational problems.
What is stemming?
Stemming is the process of reducing words to their root form, or stem. For example, the word robotics would be reduced to the stem robot. The stem does not need to be a valid word: the Porter stemmer, a widely used algorithm for removing common suffixes from English words, reduces universal, university, and universe to the stem univers.
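The toy stemmer below strips the longest matching suffix from a small, hypothetical suffix list. It is not the Porter algorithm, which applies several ordered rule phases with conditions on the remaining stem, but it happens to reproduce the examples above.

```python
# Hypothetical suffix list, checked longest first.
SUFFIXES = sorted(["ities", "ity", "ics", "ing", "al", "ed", "es", "e", "s"],
                  key=len, reverse=True)


def toy_stem(word, min_stem=3):
    """Strip the longest matching suffix, keeping at least min_stem characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


for w in ["robotics", "universal", "university", "universe"]:
    print(w, "->", toy_stem(w))
```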
Support Vector Machines (SVM)
What are support vector machines (SVM)?
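Support vector machines (SVMs) are supervised learning models, with associated learning algorithms, used for classification and regression analysis. For classification, an SVM finds the boundary (hyperplane) that separates the classes with the widest possible margin.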
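A minimal sketch of a linear SVM classifier (no kernel), trained by stochastic sub-gradient descent on the regularized hinge loss. This training method, the hyperparameters, and the toy data are assumptions chosen for brevity; SVM libraries typically use dedicated solvers.

```python
def train_linear_svm(X, y, lam=0.01, epochs=200, lr=0.1):
    """Train a linear SVM with stochastic sub-gradient descent.

    X: list of feature lists; y: labels in {-1, +1}.
    Objective: lam/2 * ||w||^2 + mean(max(0, 1 - y * (w.x + b))).
    """
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:
                # Inside the margin or misclassified: the hinge term is active.
                w = [wj - lr * (lam * wj - yi * xj) for wj, xj in zip(w, xi)]
                b += lr * yi
            else:
                # Correct with room to spare: only the regularizer shrinks w.
                w = [wj - lr * lam * wj for wj in w]
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on which side of the hyperplane x falls."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Hypothetical, linearly separable toy data.
X = [[-2, -1], [-1, -2], [-2, -2], [2, 1], [1, 2], [2, 2]]
y = [-1, -1, -1, 1, 1, 1]
w, b = train_linear_svm(X, y)
print([predict(w, b, x) for x in X])                   # [-1, -1, -1, 1, 1, 1]
print(predict(w, b, [3, 3]), predict(w, b, [-3, -3]))  # 1 -1
```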
What is swarm behavior?
From the perspective of the mathematical modeler, swarm behavior is an emergent behavior arising from simple rules that are followed by individuals and does not involve any central coordination.
What is systems engineering?
Systems engineering is a sub-field of engineering that focuses on how to design and manage complex systems throughout their life cycles.
What is term frequency?
Term frequency, used in text mining, natural language processing, and information retrieval, measures how frequently a term (word or phrase) occurs in a document. Because documents differ in length, a term is likely to appear more times in a longer document than in a shorter one. To normalize for this, the number of times the term appears is divided by the total number of terms in the document:
Term Frequency = [Number of times the term appears in the document] / [Total number of terms in the document].
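The formula translates directly into code. This sketch assumes single-word terms, lowercasing, and whitespace tokenization.

```python
from collections import Counter


def term_frequency(term, document):
    """Occurrences of term divided by the total number of terms in document."""
    words = document.lower().split()
    if not words:
        return 0.0
    return Counter(words)[term.lower()] / len(words)


doc = "the cat sat on the mat"
print(term_frequency("the", doc))  # 0.3333333333333333 (2 of 6 terms)
```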
tf-idf (term frequency-inverse document frequency)
What is tf-idf (term frequency-inverse document frequency)?
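Tf-idf (term frequency-inverse document frequency) is a numerical statistic that reflects how important a word is to a document in a corpus. It multiplies the word's term frequency in the document by its inverse document frequency, a weight that shrinks as more documents in the corpus contain the word. Words that are frequent in one document but rare elsewhere therefore score highly, while words common to most documents score low.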
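One common formulation is tf-idf(t, d) = tf(t, d) × log(N / df(t)), where N is the number of documents in the corpus and df(t) is the number of documents that contain term t. Many variants add smoothing or use different weightings; the minimal sketch below uses this unsmoothed version on a hypothetical three-document corpus.

```python
import math
from collections import Counter


def tf_idf(term, doc, corpus):
    """tf(t, d) * log(N / df(t)), an unsmoothed variant."""
    term = term.lower()
    words = doc.lower().split()
    if not words:
        return 0.0
    tf = Counter(words)[term] / len(words)
    df = sum(1 for d in corpus if term in d.lower().split())
    if df == 0:
        return 0.0  # term never appears in the corpus
    return tf * math.log(len(corpus) / df)


corpus = ["the cat sat on the mat", "the dog sat on the log", "cats and dogs"]
print(tf_idf("the", corpus[0], corpus))  # "the" appears in 2 of 3 documents
print(tf_idf("cat", corpus[0], corpus))  # "cat" appears in only 1 document
```

Although "the" appears twice in the first document and "cat" only once, "cat" scores higher because it is rare across the corpus.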
What is unstructured data?
Unstructured data is data that does not follow a predefined data model or schema, which makes it hard to search and analyze with traditional tools. Examples include free text, audio, video, and social media content.
What are word vectors?
Word vectors (also called word embeddings) represent each word as a vector, giving it a position in a multi-dimensional space. Because words become vectors, they can be used in mathematical operations; for example, you can measure the distance or angle between two word vectors to quantify how closely related the words are.
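A common way to measure how related two word vectors are is cosine similarity, the cosine of the angle between them. The three-dimensional vectors below are hypothetical; embeddings learned from real text typically have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy vectors chosen so that "king" and "queen" point the same way.
vectors = {
    "king": [0.8, 0.6, 0.1],
    "queen": [0.78, 0.65, 0.15],
    "apple": [0.1, 0.2, 0.9],
}
print(round(cosine_similarity(vectors["king"], vectors["queen"]), 3))
print(round(cosine_similarity(vectors["king"], vectors["apple"]), 3))
```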