In the field of machine learning, meta learning research focuses on developing automated learning algorithms. To get a better understanding of the field, I got in touch with Nicholas Guttenberg, a senior research scientist at CrossLabs.
I reached out to Guttenberg because I had touched on meta learning in my discussion with Olaf Witkowski but wanted a deeper understanding of the topic. When I realized that Guttenberg’s current research, a collaboration project with GoodAI, focuses on that very field, I hoped he could provide further insight.
What is Meta Learning?
When asked to define meta learning, Guttenberg said “Meta learning is zooming out [from traditional machine learning] and saying, ‘instead of the system simply learning a task, [a meta learning system] is going to treat the task as data to learn to deal with new tasks that come in.’ In this sense, meta learning is the process of learning how to learn.”
In other words, rather than learning how to do one thing, meta learning algorithms operate one level above. It’s about building algorithms that can develop the ability to perform new tasks. This is an essential area of research, given the current state of ML and its shortcomings.
“At present, when you train an AI to do a specific task, it’s going to learn things about the dataset and information that you give it directly,” Guttenberg said. “That takes time and it takes a lot of data because you’re essentially learning from scratch. You can imagine this like having a child, raising the kid through to college age and then spending their college education teaching them to identify cats versus dogs until they’re an expert. But when you want to train a system for planes versus boats, you say goodbye to your first child, have another, and take the new child through that same long process to teach them planes versus boats. So you literally start from a blank slate. You have to train a network or model from scratch every time. That requires massive amounts of data.”
“On the other hand, let’s take a situation where you have an adult human and you say, ‘Here are two new kinds of objects you’ve never seen before in your life. I want you to learn the difference between these objects.’ In this case, it’s not going to take many examples for a human to tell the objects apart. Similarly, if a human sits down and plays a video game they’ve never played before, they’re not going to be incompetent at it just because it’s new. They’re not going to suddenly forget how to use a keyboard or mouse just because it’s a new game.”
This is a big part of how humans learn: our learning is built on cooperation and on the knowledge of previous generations. Think of how developments in computing usually come from refining existing technology, or how athletic performance keeps improving as research accumulates over time. Much, if not all, of our intelligence and learning is built on past knowledge. However, this feature is missing from modern AI technology, which raises the question Guttenberg’s research aims to answer: what would we need to do to tap into that potential learning benefit?
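To make “treating the task as data” concrete, here is a minimal sketch of one published meta learning algorithm, Reptile (Nichol, Achiam, and Schulman, 2018). It is not Guttenberg’s method, and everything in it is an illustrative assumption: each hypothetical task is fitting a straight line whose slope and intercept are drawn from shared ranges, the inner learner is plain gradient descent, and the step sizes and iteration counts are arbitrary. The outer loop learns a starting point from many tasks, so a new task from the same family needs only a couple of gradient steps, closer to the adult in Guttenberg’s analogy than to the child raised from scratch.

```python
import random

# Hypothetical task family: every task is "fit the line y = slope * x + intercept",
# with slope and intercept drawn from shared ranges so the tasks have something in common.
def sample_task(rng):
    return rng.uniform(1.5, 2.5), rng.uniform(-3.5, -2.5)

def task_data(task, rng, n=10):
    """Draw n (x, y) training examples for one task."""
    slope, intercept = task
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, slope * x + intercept) for x in xs]

def loss(params, data):
    """Mean squared error of the line w * x + b on the data."""
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

def gradient_descent(params, data, steps, lr):
    """Inner learner: full-batch gradient descent on the mean squared error."""
    w, b = params
    for _ in range(steps):
        gw = sum(2 * (w * x + b - y) * x for x, y in data) / len(data)
        gb = sum(2 * (w * x + b - y) for x, y in data) / len(data)
        w, b = w - lr * gw, b - lr * gb
    return w, b

def reptile(meta_iters=1000, inner_steps=5, inner_lr=0.1, meta_lr=0.1, seed=0):
    """Outer learner (Reptile): treat each task as one data point and nudge a
    shared initialization toward the weights the inner learner reaches on it."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    for _ in range(meta_iters):
        data = task_data(sample_task(rng), rng)
        w_task, b_task = gradient_descent((w, b), data, inner_steps, inner_lr)
        w += meta_lr * (w_task - w)
        b += meta_lr * (b_task - b)
    return w, b

if __name__ == "__main__":
    meta_init = reptile()
    rng = random.Random(42)
    new_task = task_data(sample_task(rng), rng)  # a task never seen during meta-training
    for name, init in [("from scratch", (0.0, 0.0)), ("meta-learned init", meta_init)]:
        adapted = gradient_descent(init, new_task, steps=2, lr=0.1)
        print(f"{name}: loss after 2 steps = {loss(adapted, new_task):.4f}")
```

Both runs use the same inner learner and the same two gradient steps; the only difference is whether the starting point was learned from earlier tasks.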
How Does Meta Learning Work?
With this idea in mind, I wanted to get Guttenberg’s take on another fundamental question: What are learning algorithms, and on a basic level, how do they work?
“To pin it down,” said Guttenberg, “If individual AIs learned to receive information from other AIs in their environment, and then used that information for tasks, they would learn to interpret incoming knowledge. So you’d have a representation of whatever the enabling information is that lets you succeed on a task. That information wouldn’t be about the specific instance, because the AI that communicates the knowledge wouldn’t know exactly what task the receiving AI is going to look at. So the communicated knowledge would have to be something that generalizes and doesn’t care about those details; it would only care about the substance, like how to recognize a cat versus a dog, or how to navigate a maze. It should distill the skills of one AI into a format that can be transmitted to another. And if you have a bunch of AI passing distilled bits of knowledge around, modifying them, and then passing them to other AI, that is a kind of learning algorithm.”
The goal, then, is to communicate generalized knowledge in a format that can pass from one system to another, so that the second system benefits from the experience of the first. But how do we get there?
Meta Learning Challenges
So I wondered, what challenges face the field of meta learning and emergent learning? What’s stopping us from finding these particular algorithms? Guttenberg says it has to do with giving an algorithm freedom without losing the guarantees that simple, hand-designed algorithms often have. To demonstrate, he pointed to gradient descent, the optimization algorithm used to find a local minimum of a differentiable function, and a go-to algorithm for training neural networks.
“Think about what gradient descent can be guaranteed to do; if you have a two-dimensional landscape with no hills and valleys, where everything is downhill to a specific point, you can prove things about it, like convergence and rate of convergence. However, neural network loss landscapes are like an unknown landscape with hills, valleys, and no guarantees of where it’s going to go. But even if you take that [simple, downhill landscape] and add a hundred million dimensions, you can still prove the same things with gradient descent. So there’s an underlying property or guarantee built into the way we constructed gradient descent as an algorithm, and some other very simple algorithms share this guarantee.”
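The guarantee Guttenberg describes can be checked numerically. A standard textbook result: for a quadratic bowl whose curvature in every direction lies between mu and L, gradient descent with step size 1/L shrinks the distance to the minimum by at least a factor of (1 - mu/L) on every step, however many dimensions there are. The sketch below assumes a diagonal quadratic with randomly drawn curvatures, mu = 0.5 and L = 2 (so the bound is 0.75 per step), and uses 10,000 dimensions rather than a hundred million to keep a pure-Python run fast. It says nothing about non-convex neural network losses, where no such guarantee holds.

```python
import math
import random

def gradient_descent_on_bowl(dim, mu=0.5, big_l=2.0, steps=50, seed=0):
    """Gradient descent on f(x) = 0.5 * sum(c_i * x_i**2), a convex bowl whose
    curvature c_i in every direction lies in [mu, big_l]. The minimum is the
    origin. Returns the distance to the minimum before and after every step."""
    rng = random.Random(seed)
    curv = [rng.uniform(mu, big_l) for _ in range(dim)]
    x = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    lr = 1.0 / big_l  # the classic 1/L step size
    dists = [math.sqrt(sum(v * v for v in x))]
    for _ in range(steps):
        # The gradient of f is c_i * x_i, coordinate by coordinate.
        x = [xi - lr * ci * xi for xi, ci in zip(x, curv)]
        dists.append(math.sqrt(sum(v * v for v in x)))
    return dists

if __name__ == "__main__":
    rate = 1 - 0.5 / 2.0  # guaranteed contraction factor (1 - mu / L)
    for dim in (2, 10_000):
        d = gradient_descent_on_bowl(dim)
        worst = max(d[k + 1] / d[k] for k in range(len(d) - 1))
        print(f"dim={dim}: worst per-step ratio {worst:.3f} (bound {rate})")
```

The same per-step bound holds in 2 dimensions and in 10,000, which is the “underlying property” in miniature: it comes from the shape of the function and the step size, not from the number of dimensions.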
The importance of algorithms like gradient descent, according to Guttenberg, is that even though they were invented while looking at a very simple world, they generalize to a very complex one. The challenge for meta learning is to build learned algorithms with a similar property: ones that keep working on problems far from anything they saw during training.
“If we are trying to create a learning algorithm, we’re showing a neural network a large number of examples of learning, and then asking it to copy that. Then we hope it generalizes to something very far outside of that set of learning examples. But in a learning algorithm, you don’t have conservation laws [like those of gradient descent]; that means you can only force it to obey particular constraints.”
“This means that depending on choices that are not entirely clear right now, sometimes a system generalizes well, but sometimes it doesn’t generalize at all. So [the challenge is] to systematically figure out the choices and design principles that guarantee convergence for an algorithm even when it’s far from what it saw during training. At the same time, we also want to give it enough freedom that it can discover something honestly new.”
Designing Learning Systems
“Somewhere in between constrained and unconstrained systems,” said Guttenberg, “we think there should be some way of discovering how to impose long-term convergence properties, or far-from-distribution convergence properties. [For example,] maybe the network doesn’t start with these properties, but it is encouraged to develop them.”
This idea has challenges of its own, because you need a way of representing convergence properties without looking at data. That goes against the grain of current data science, which succeeds because of statistically rich, heterogeneous training data. Still, Guttenberg has been experimenting with other ideas.
“One approach we’ve investigated is having two networks, where one fixes the mistakes of the other. So you have one unconstrained network that proposes solutions and might not converge, and another network with guaranteed convergence, whose job is to fix the errors of the first. The idea is that gradient descent is part of the inference step. You have all the nice properties of gradient descent, but the unconstrained part of the network decides what to do gradient descent over, and what modules to combine to create a solution. [In this system,] we still rely on gradient descent to figure out the best combination, but now what you combine can be very sophisticated. So the meta learning bit controls what moving pieces the system is allowed to play with to create a solution. If those moving pieces are a diverse set of functions that have nice properties, the other learners will have an easy time of it; you’ll have fast convergence, and you’ll be able to solve problems you couldn’t solve otherwise.”
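A toy version of the two-network idea might look like the sketch below. It is a hypothetical illustration, not the architecture Guttenberg and GoodAI use. The “unconstrained” proposer is only a stand-in that enumerates pairs of modules; a learned network would take its place and could propose anything, including bad choices. The module library and the target function are invented for the example. The guaranteed-convergence part is gradient descent on a least-squares loss, which is convex in the combination weights, so it cannot diverge whatever modules it is handed.

```python
import itertools
import math

# Hypothetical library of "moving pieces" the proposer may choose from.
MODULES = {
    "x": lambda x: x,
    "x^2": lambda x: x * x,
    "sin": math.sin,
    "cos": math.cos,
    "gauss": lambda x: math.exp(-x * x),
    "abs": abs,
}

def fit_weights(names, xs, ys, steps=1000):
    """Guaranteed-convergence part: gradient descent on the convex least-squares
    loss for the weights of a linear combination of the chosen modules."""
    feats = [[MODULES[n](x) for n in names] for x in xs]
    n = len(xs)
    # The trace of the (positive semidefinite) Hessian bounds its largest
    # eigenvalue, so a step of 1 / trace keeps gradient descent stable.
    trace = 2.0 / n * sum(f * f for row in feats for f in row)
    lr = 1.0 / trace
    w = [0.0] * len(names)
    for _ in range(steps):
        resid = [sum(wi * fi for wi, fi in zip(w, row)) - y for row, y in zip(feats, ys)]
        grad = [2.0 / n * sum(r * row[j] for r, row in zip(resid, feats)) for j in range(len(w))]
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    resid = [sum(wi * fi for wi, fi in zip(w, row)) - y for row, y in zip(feats, ys)]
    return w, sum(r * r for r in resid) / n

def propose_and_fit(xs, ys, proposals):
    """Stand-in for the unconstrained part: try each proposed set of modules,
    let gradient descent fit it, and keep the lowest-loss combination."""
    best = None
    for names in proposals:
        w, err = fit_weights(names, xs, ys)
        if best is None or err < best[2]:
            best = (names, w, err)
    return best

if __name__ == "__main__":
    xs = [-3.0 + 0.1 * i for i in range(61)]
    ys = [2.0 * math.sin(x) + 0.5 * x * x for x in xs]  # hypothetical target function
    names, w, err = propose_and_fit(xs, ys, itertools.combinations(MODULES, 2))
    print(names, [round(v, 3) for v in w], err)
```

The split mirrors the quote: the freedom lives in which pieces get chosen, while convergence comes from the convex fit over those pieces.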
Meta learning is often seen as a stepping stone towards artificial general intelligence, but learned learning algorithms are also likely to have a big impact on traditional machine learning systems. In theory, systems that generalize to new tasks could engage in recursive self-improvement and learn new tasks from limited data, making machine learning systems quicker to train and build in general.
Guttenberg and GoodAI’s meta learning research is ongoing, and interest in the field is growing. As experiments and developments continue, more researchers are gathering to discuss meta learning and artificial general intelligence, as evidenced by the large turnout at the recent Meta-Learning & Multi-Agent Learning Workshop, which included institutions such as Google Brain, DeepMind, Oxford, and MIT. The workshop also marked the launch of the GoodAI Grants initiative, aimed at funding further research.
You can follow more of Guttenberg’s work on his official Twitter account. For more about meta learning and artificial general intelligence, be sure to check out my previous interview with Olaf Witkowski and subscribe to the Lionbridge AI newsletter below for new articles direct to your inbox.