It’s a known fact that machine learning models need data to be trained. Without training data, even complex algorithms are essentially useless. So what do you do when you don’t have the necessary amount of data?
Over the years, researchers have developed a variety of ingenious solutions. Data Augmentation is one of these solutions. Instead of trying to find and label more data points, we construct new ones based on what we have.
Wait, can we do that? Sure we can. It is not always easy and it might not produce the best results, but it’s still a viable alternative.
Expanding a dataset with Data Augmentation methods is not only helpful for the challenge of limited data. It can also reduce overfitting and improve the generalization of our models because it increases the diversity of our training set.
So let’s cut to the chase: How can we perform Data Augmentation?
I think the image below says it all. In this article, we will focus on image augmentation as it is currently the most active field of research, but most of these techniques can be applied to all sorts of data.
Basic Image Manipulations
The first simple thing to do is to apply geometric transformations to the image. You might assume that a machine learning model can easily recognize the same image under a different rotation, but unless it has seen such variations during training, it often can’t.
Image flipping, cropping, rotations, and translations are some obvious first steps. We can also change the color space of the image using contrast, sharpening, white balancing, color jittering, random color manipulation and many other techniques (called photometric transformations). If you’ve ever used Instagram filters or Snapseed, then you’ll get what I’m saying. You can get as creative as you want.
Moreover, you can mix images together, randomly erase segments of an image, and of course, combine all of the above in all sorts of ways.
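To make these manipulations concrete, here is a minimal sketch in plain NumPy. The function names are hypothetical helpers written for this article, and the image is random noise standing in for a real photo with pixel values in [0, 1]. In practice you would probably reach for an image library, but the underlying operations are this simple.

```python
import numpy as np

def horizontal_flip(img):
    """Mirror the image left to right."""
    return img[:, ::-1]

def random_crop(img, size, rng):
    """Cut a random size x size window out of the image."""
    h, w = img.shape[:2]
    top = rng.integers(0, h - size + 1)
    left = rng.integers(0, w - size + 1)
    return img[top:top + size, left:left + size]

def translate(img, dx, dy):
    """Shift the image by (dx, dy) pixels and fill the exposed border with zeros."""
    h, w = img.shape[:2]
    out = np.zeros_like(img)
    src = img[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
    out[max(0, dy):max(0, dy) + src.shape[0],
        max(0, dx):max(0, dx) + src.shape[1]] = src
    return out

def adjust_contrast(img, factor):
    """Scale pixel values around the mean; factor > 1 increases contrast."""
    mean = img.mean()
    return np.clip((img - mean) * factor + mean, 0.0, 1.0)

def random_erase(img, size, rng):
    """Blank out a random size x size patch."""
    h, w = img.shape[:2]
    top = rng.integers(0, h - size + 1)
    left = rng.integers(0, w - size + 1)
    out = img.copy()
    out[top:top + size, left:left + size] = 0.0
    return out

def mix_images(img_a, img_b, alpha):
    """Blend two images pixel by pixel; alpha is the weight of img_a."""
    return alpha * img_a + (1.0 - alpha) * img_b

rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))  # stand-in for a real photo, values in [0, 1)

augmented = [
    horizontal_flip(image),
    np.rot90(image, k=1),                 # rotation by 90 degrees
    translate(image, dx=4, dy=-2),
    random_crop(image, 24, rng),
    adjust_contrast(image, 1.5),
    random_erase(image, 8, rng),
    mix_images(image, horizontal_flip(image), 0.7),
]
```

Each entry in `augmented` is a new training sample derived from the same original. Keep in mind that when you mix two images from different classes, their labels usually need to be mixed in the same proportion.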
Data Augmentation using Machine Learning
Besides basic image manipulations, more and more engineers are starting to use machine learning and deep learning techniques to augment their data. Think about it this way: We can use machine learning models to produce more data to train more machine learning models. Some of the most promising approaches are described below:
Feature Space Augmentation and Autoencoders
In the above examples, we transformed images on the input space. We can also apply transformations in feature space.
With Neural Networks we can very efficiently map high-dimensional inputs (such as images) into a lower-dimensional representation (known as feature or latent space). Think of it as encoding a 3D tensor into a 1D vector without losing too much information. Having an image encoded in a few dimensions makes it much easier to perform augmentations.
Many interesting papers suggest different ways to do so, such as combining a point with its k nearest neighbors, adding noise, interpolating between points, and more.
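Here is a rough sketch of two of these ideas, noise and nearest-neighbor interpolation, applied to feature vectors. The `features` array is random data standing in for the output of an encoder, and both function names are made up for illustration. This is not the exact procedure of any specific paper.

```python
import numpy as np

def add_noise(features, scale, rng):
    """Jitter every feature vector with Gaussian noise."""
    return features + rng.normal(0.0, scale, size=features.shape)

def interpolate_with_neighbors(features, k, lam, rng):
    """Move each vector a fraction `lam` of the way toward one of its k nearest neighbors."""
    n = len(features)
    # Pairwise squared Euclidean distances, shape (n, n).
    diff = features[:, None, :] - features[None, :, :]
    dist = (diff ** 2).sum(axis=-1)
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbor
    neighbors = np.argsort(dist, axis=1)[:, :k]
    # Pick one of the k neighbors at random for every vector.
    choice = neighbors[np.arange(n), rng.integers(0, k, size=n)]
    return features + lam * (features[choice] - features)

rng = np.random.default_rng(0)
features = rng.normal(size=(100, 16))  # stand-in for encoder outputs

new_points = np.concatenate([
    add_noise(features, 0.05, rng),
    interpolate_with_neighbors(features, k=5, lam=0.5, rng=rng),
])
```

With `lam` between 0 and 1 the new point lies between a sample and its neighbor. For labeled data you would normally run this separately for each class, so that an interpolated point keeps a meaningful label.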
Autoencoders are a popular choice for extracting feature space representations. They are a special type of neural network that tries to reconstruct its input. They consist of two networks, an encoder and a decoder. The encoder takes the input and encodes it into a lower-dimensional vector (feature space). The decoder takes that vector and tries to reconstruct the original input.
By doing this, the latent vector in the middle holds a compressed representation of the input, which can be extracted and used for all sorts of things, including data augmentation.
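The encode, perturb, decode idea can be sketched with a deliberately simplified stand-in. Real autoencoders are nonlinear networks trained with gradient descent. The toy below assumes a purely linear encoder and decoder fitted in closed form with an SVD (which is equivalent to PCA), so it runs with NumPy alone. The class and function names are hypothetical, and the "images" are synthetic.

```python
import numpy as np

class LinearAutoencoder:
    """Linear encoder/decoder pair fitted with an SVD (equivalent to PCA)."""

    def __init__(self, latent_dim):
        self.latent_dim = latent_dim

    def fit(self, x):
        self.mean = x.mean(axis=0)
        # Rows of vt are the principal directions of the centered data.
        _, _, vt = np.linalg.svd(x - self.mean, full_matrices=False)
        self.components = vt[:self.latent_dim]  # (latent_dim, input_dim)
        return self

    def encode(self, x):
        """Map inputs to latent vectors."""
        return (x - self.mean) @ self.components.T

    def decode(self, z):
        """Map latent vectors back to input space."""
        return z @ self.components + self.mean

def augment_in_latent_space(model, x, noise_scale, rng):
    """Encode, add Gaussian noise to the latent vectors, and decode again."""
    z = model.encode(x)
    z_noisy = z + rng.normal(0.0, noise_scale, size=z.shape)
    return model.decode(z_noisy)

rng = np.random.default_rng(0)
# Toy "images": 200 flattened 8x8 samples lying close to a 4-dimensional subspace.
basis = rng.normal(size=(4, 64))
images = rng.normal(size=(200, 4)) @ basis + 0.01 * rng.normal(size=(200, 64))

model = LinearAutoencoder(latent_dim=4).fit(images)
reconstructed = model.decode(model.encode(images))
augmented = augment_in_latent_space(model, images, noise_scale=0.1, rng=rng)
```

Here every 64-value sample is squeezed into 4 numbers, nudged a little, and expanded back to a new 64-value sample. Swapping the linear model for a trained neural autoencoder would leave `augment_in_latent_space` unchanged.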
GAN-based Data Augmentation
Generative Modeling is one of the most exciting techniques at the moment because it can produce completely new images. Generative models can generate new patterns in data because they learn the distribution of the data itself rather than only the boundary between classes (as most discriminative machine learning models do).
In that direction, Generative Adversarial Networks (GANs) have become the industry and research standard. GANs consist of two networks, the generator and the discriminator. The generator’s job is to produce fake data with nothing but noise as input. The second model, the discriminator, receives as input both the real images and the fake images (produced by the generator) and learns to classify each image as real or fake.
As these networks compete against each other, and by training them simultaneously (in a process called adversarial training), the magic begins:
The generator becomes better and better at image generation because its ultimate goal is to fool the discriminator. The discriminator, in turn, becomes better and better at distinguishing fake from real images, because its goal is to not be fooled. The result is incredibly realistic fake data from the generator.
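To see the adversarial loop in code, here is a toy sketch that is nowhere near a real image GAN. It assumes one-dimensional "real" data drawn from a normal distribution with mean 4, a generator with just two parameters, and a logistic-regression discriminator, with gradients written out by hand so that it needs only NumPy. The two networks are updated in alternating steps, which is a common way to implement the simultaneous training. All names and hyperparameters are illustrative choices.

```python
import numpy as np

def sigmoid(s):
    # Clip the input so np.exp never overflows.
    return 1.0 / (1.0 + np.exp(-np.clip(s, -30.0, 30.0)))

def train_toy_gan(steps=500, batch=64, lr=0.01, seed=0):
    rng = np.random.default_rng(seed)
    a, b = 1.0, 0.0  # generator: G(z) = a * z + b
    w, c = 0.0, 0.0  # discriminator: D(x) = sigmoid(w * x + c)
    for _ in range(steps):
        real = rng.normal(4.0, 1.0, size=batch)  # "real" data
        z = rng.normal(size=batch)               # noise fed to the generator
        fake = a * z + b

        # Discriminator step: minimize -log D(real) - log(1 - D(fake)).
        d_real = sigmoid(w * real + c)
        d_fake = sigmoid(w * fake + c)
        grad_w = np.mean((d_real - 1.0) * real) + np.mean(d_fake * fake)
        grad_c = np.mean(d_real - 1.0) + np.mean(d_fake)
        w, c = w - lr * grad_w, c - lr * grad_c

        # Generator step: minimize -log D(fake), i.e. try to fool the discriminator.
        d_fake = sigmoid(w * fake + c)
        grad_a = np.mean((d_fake - 1.0) * w * z)
        grad_b = np.mean((d_fake - 1.0) * w)
        a, b = a - lr * grad_a, b - lr * grad_b
    return {"a": a, "b": b, "w": w, "c": c}

def generate(params, n, rng):
    """Draw n fake samples from the trained generator."""
    return params["a"] * rng.normal(size=n) + params["b"]

params = train_toy_gan()
samples = generate(params, 1000, np.random.default_rng(1))
```

The structure is the part that carries over to real GANs: one gradient step that makes the discriminator better at telling real from fake, followed by one step that makes the generator better at fooling it. GAN training is known to be unstable, so don’t expect even this toy to converge neatly for every learning rate.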
GANs have produced some of the most realistic images and videos we’ve ever seen. Remember deep fakes? GANs are among the techniques behind them. So why not use them for data augmentation instead of replacing Jack Nicholson with Jim Carrey in The Shining?
Meta-Learning
Last but not least, we have Meta-Learning. Meta-Learning is a relatively new area, in which we use neural networks to optimize other neural networks by tuning their hyperparameters, improving their architecture, and more. In terms of data augmentation, things get a little more complicated.
In simple terms, we use a classification network to tune an augmentation network into generating better images.
Take a look at the image below: We feed random images to the Augmentation Network (most likely a GAN), which generates augmented images. Both the augmented image and the original are passed into a second network, which compares them and tells us how good the augmented image is. After repeating the process, the augmentation network becomes better and better at producing new images.
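The feedback loop itself can be illustrated with a much simpler stand-in, which is not a meta-learning method in its own right. In the sketch below, the "augmentation network" is replaced by a hypothetical one-parameter augmenter (Gaussian noise of a given strength), the second network by a nearest-centroid classifier, and the learning step by a plain search over candidate strengths. The data is a synthetic two-class toy problem.

```python
import numpy as np

def make_data(n, rng):
    """Two Gaussian blobs in 2D as a stand-in for an image classification dataset."""
    x0 = rng.normal([-1.0, 0.0], 1.0, size=(n, 2))
    x1 = rng.normal([1.0, 0.0], 1.0, size=(n, 2))
    return np.concatenate([x0, x1]), np.array([0] * n + [1] * n)

def augment(x, y, strength, rng):
    """Hypothetical one-parameter augmenter: append a noisy copy of every sample."""
    noisy = x + rng.normal(0.0, strength, size=x.shape)
    return np.concatenate([x, noisy]), np.concatenate([y, y])

def centroid_accuracy(x_train, y_train, x_val, y_val):
    """Fit a nearest-centroid classifier and return its validation accuracy."""
    centroids = np.stack([x_train[y_train == k].mean(axis=0) for k in (0, 1)])
    dist = ((x_val[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return float(np.mean(dist.argmin(axis=1) == y_val))

def tune_augmenter(candidates, seed=0):
    """Score every candidate strength with the classifier and keep the best one."""
    rng = np.random.default_rng(seed)
    x_train, y_train = make_data(10, rng)  # deliberately small training set
    x_val, y_val = make_data(500, rng)     # held-out data used as feedback
    scores = {}
    for strength in candidates:
        x_aug, y_aug = augment(x_train, y_train, strength, rng)
        scores[strength] = centroid_accuracy(x_aug, y_aug, x_val, y_val)
    best = max(scores, key=scores.get)
    return best, scores

candidates = [0.0, 0.1, 0.5, 1.0]
best_strength, scores = tune_augmenter(candidates)
```

The point is the direction of the information flow: the classifier’s performance is the signal that decides how the augmenter should behave. In the approach described above, that signal would update the weights of an augmentation network instead of selecting a value from a short list.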
Of course, this procedure is not the only one available, but it’s a very good starting point for the different research papers in the area.
Data Augmentation is by no means an easy task. There have been quite a few interesting works (here is a collection of them) but there is still much room for improvement. For example, we can still improve the quality of GAN samples, find new ways to use Meta-Learning, and perhaps establish a taxonomy of different augmentation techniques.
And of course, we can still discover new ways to use these techniques in other forms of data, such as text, tabular data, graph data, and more. Why not also expand beyond that? How about Reinforcement Learning? Or search algorithms? The stage is yours.