Neural Synesthesia is an AI art project that aims to create new and unique audiovisual experiences with artificial intelligence. It does this through collaborations between humans and generative networks. The results feel almost like organic art. Swirls of color and images blend together as faces, scenery, objects, and architecture transform to music. The imagery swings between feeling entirely new and oddly familiar.
Neural Synesthesia was created by Xander Steenbrugge, an online content creator who got his start in data science while working on brain-computer interfaces. For his master’s thesis, he helped build a system that classified imagined movement from brain signals. This system allowed patients suffering from locked-in syndrome to manipulate physical objects with their minds. The experience impressed upon Steenbrugge the importance of machine learning, and the potential for AI technology to build amazing things.
Outside of Neural Synesthesia, Steenbrugge works with a startup using machine learning for drug discovery and runs a popular YouTube channel. He’s also working on wzrd.ai, a platform that uses AI to augment audio with immersive video. In this interview, we discuss Neural Synesthesia’s inspiration, how it works, and the relationship between AI and creativity.
What were the inspirations for Neural Synesthesia?
I’ve always had a fascination for aesthetics. Examples are mountain panoramas, indie game design, scuba diving in coral reefs, psychedelic experiences, and films by Tarkovsky. Beautiful visual scenes have the power to convey meaning without words. It’s almost like a primal, visual language we all speak intuitively.
When I saw the impressive advances in generative models (especially GANs), I started imagining where this could lead. Just like the camera and the projector brought about the film industry, I wondered what narratives could be built on top of the deep learning revolution. To get hands-on with this, my first idea was to simply tweak the existing codebases for GANs to allow for direct visualization of audio. This was how Neural Synesthesia was born.
How much work did you do for the first Neural Synesthesia piece? Did you face any unique challenges?
I think coding for the first rendered video took over six months because I was doing it in my spare time. The biggest challenge was how to manipulate the GAN’s latent input space using features extracted from the audio track. I wanted to create a satisfying match between visual and auditory perception for viewers.
Here’s a little insight into what I do: I apply a Fourier Transform to extract time-varying frequency components from the audio. I also perform harmonic/percussive decomposition, which basically separates the melody from the rhythmic components of the track. These three signals (instantaneous frequency content, melodic energy, and beats) are then combined to manipulate the GAN’s latent space, resulting in visuals that are directly controlled by the audio.
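To make the pipeline described above more concrete, here is a minimal sketch in Python using only NumPy. It is an illustration, not Steenbrugge’s code: the function names, the synthetic test signal, the median-filter approach to harmonic/percussive separation, and the way the three signals are mixed into a latent vector are all assumptions made for this example. A real system would feed each latent vector to a trained GAN generator to render one video frame.

```python
import numpy as np

def stft_magnitude(signal, frame_len=1024, hop=512):
    """Short-time Fourier magnitudes, shape (n_frames, frame_len // 2 + 1)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames * window, axis=1))

def median_filter(x, size, axis):
    """Sliding median of odd width `size` along one axis, with edge padding."""
    pad = [(0, 0)] * x.ndim
    pad[axis] = (size // 2, size // 2)
    padded = np.pad(x, pad, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, size, axis=axis)
    return np.median(windows, axis=-1)

def hpss(mag, size=17):
    """Split a spectrogram into harmonic and percussive parts via soft masks."""
    harm = median_filter(mag, size, axis=0)  # smooth over time: sustained tones
    perc = median_filter(mag, size, axis=1)  # smooth over frequency: broadband hits
    total = harm ** 2 + perc ** 2 + 1e-10
    return mag * harm ** 2 / total, mag * perc ** 2 / total

def audio_to_latents(signal, latent_dim=64, seed=0):
    """Map audio to one latent vector per frame (hypothetical mixing scheme)."""
    rng = np.random.default_rng(seed)
    mag = stft_magnitude(signal)
    harmonic, percussive = hpss(mag)

    def normalize(v):
        return v / (v.max() + 1e-10)

    melody = normalize(harmonic.sum(axis=1))    # melodic energy per frame
    beats = normalize(percussive.sum(axis=1))   # percussive energy per frame
    spectrum = mag / (mag.sum(axis=1, keepdims=True) + 1e-10)

    base = rng.standard_normal(latent_dim)                    # starting point
    projection = rng.standard_normal((mag.shape[1], latent_dim))
    drift_dir = rng.standard_normal(latent_dim)               # slow melodic drift
    kick_dir = rng.standard_normal(latent_dim)                # jump on each beat

    latents = (base
               + spectrum @ projection
               + 0.05 * np.cumsum(melody)[:, None] * drift_dir
               + beats[:, None] * kick_dir)
    return latents, melody, beats

# Synthetic stand-in for a track: a 440 Hz tone with a click every half second.
sr = 22050
t = np.arange(2 * sr) / sr
signal = 0.5 * np.sin(2 * np.pi * 440 * t)
signal[[11025, 22050, 33075]] += 1.0

latents, melody, beats = audio_to_latents(signal)
print(latents.shape)
```

In this sketch the melodic energy moves the latent vector slowly along one direction, while each percussive hit adds a short jolt along another, which is one plausible way to get visuals that drift with the melody and pulse with the beat.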
Is every image dataset unique? How do you collect images for these datasets, and how many images do you need?
I spent a lot of time collecting large and diverse image datasets to create interesting generative models. Unlike the datasets behind most GANs, these have aesthetics as their primary goal rather than realism. Experimenting with various blends of image collections is time-consuming, since GAN training requires lots of compute and I don’t exactly have a data center at my disposal.
Most of the datasets I use are image sets I’ve encountered over the years. I saved them because I knew one day I’d have a use for them. I’ve always had an interest in aesthetics so when I discover something that triggers that sixth sense, I save it.
Most GAN papers use datasets of more than 50,000 images, but in practice you can get away with fewer examples. First, you can start from a pre-trained GAN model that has already been trained on a large dataset. This means the convolutional filters in the model are already well-shaped and contain useful information about the visual world. Second, there’s data augmentation, which is basically flipping or rotating an image to effectively increase the amount of training data. Since I don’t really care about sample realism, I can actually afford to do very aggressive image augmentation. This results in many more training images than actual source images. For example, the model I used for a recent performance at Tate Modern had only 3,000 real images, aggressively augmented to a training set of around 70,000.
Recently, a lot of new research has explicitly addressed the low-data regime for GANs. My current codebase leverages these techniques to train GANs with as few as a few hundred images.
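The augmentation arithmetic in this answer is easy to check: going from 3,000 source images to roughly 70,000 training images means about 23 augmented copies per image. The sketch below shows one way that could look in Python with NumPy. The specific transforms (random 90-degree rotations, horizontal flips, and random crops), the function names, and the random stand-in images are assumptions for illustration; the interview only mentions flipping and rotating, and a real pipeline would typically resize the crops back to the model’s input resolution.

```python
import numpy as np

def augment(image, rng, crop_frac=0.8):
    """Return one randomly rotated, flipped, and cropped copy of an (H, W, C) image."""
    out = np.rot90(image, k=int(rng.integers(4)))   # rotate by 0, 90, 180, or 270 degrees
    if rng.random() < 0.5:
        out = out[:, ::-1]                          # horizontal flip
    h, w = out.shape[:2]
    ch, cw = int(h * crop_frac), int(w * crop_frac)
    top = int(rng.integers(h - ch + 1))             # random crop position
    left = int(rng.integers(w - cw + 1))
    return out[top:top + ch, left:left + cw]

def expand_dataset(images, copies_per_image, seed=0):
    """Build a training set with `copies_per_image` augmented variants per source image."""
    rng = np.random.default_rng(seed)
    return [augment(img, rng) for img in images for _ in range(copies_per_image)]

# Stand-in data: 30 random 64x64 RGB "images" instead of a real collection.
source = [np.random.default_rng(i).random((64, 64, 3)) for i in range(30)]
training = expand_dataset(source, copies_per_image=23)

print(len(source), "source images ->", len(training), "training images")
print("3,000 source images at the same ratio ->", 3000 * 23)
```

Because every copy draws a fresh rotation, flip, and crop position, most variants of the same source image differ from one another, which is what lets a small collection stand in for a much larger one when realism is not the priority.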
You talk about Neural Synesthesia as a collaboration between yourself and AI. What kind of potential do you see for the future of creative projects utilizing AI technology?
This is actually the most interesting part of the entire project. I usually set out with specific intentions as to what type of visual I want to create. I then curate my dataset, tune the parameters of the training script, and start training the model. A full training run usually requires a few days to converge. Very quickly though, the model starts returning samples that are often unexpected and surprising. This sets an intriguing feedback loop into motion, where I change the code of the model, the model responds with different samples, I react, and it goes on. The creative process is no longer fully under my control; I am effectively collaborating with an AI system to create these works.
I truly believe this is the biggest strength of this approach: you are not limited by your own imagination. There’s an entirely alien system that is also influencing the same space of ideas, often in unexpected and interesting ways. This leads you as a creator into areas you never would have wandered by yourself.
Looking at the tremendous pace of progress in the field of AI strongly motivates me to imagine what might be possible 10 years from now. After all, modern deep learning is only 8 years old! I expect that Moore’s law will continue to bring more powerful computing capabilities, that AI models will continue to scale with more compute, and that the possibilities of this medium will follow this exponential trend.
Neural Synesthesia in its current form is a prototype. It’s a version 0.1 of a grander idea to leverage deep learning as the core component of the advanced interactive media experiences of the future.
What kind of creative works do you have planned for the future of Neural Synesthesia? Do you have any goals or future plans?
I’ve always been fascinated by the overview effect, where astronauts describe how seeing the Earth in its entirety from space profoundly changes their worldview, kindling the awareness that we are all part of the same, fragile ecosystem, suspended in the blackness of space.
To me, this is great evidence that profound, alienating experiences can have spectacular effects on people’s choices and behaviors. And what we need is a shift in perception away from tribal feelings of us versus them. We need to move towards a global society with common goals and common challenges.
Our world is increasingly facing global issues that stem from our locally centered world views. These views are deeply rooted in our genes; we evolved in small tribes that only needed to attend to their local environments. However, the world is evolving towards a globally connected web of events, where the present can no longer be disconnected from the system as a whole. For example, look at climate change, or at people fighting over artificially drawn borders of nationality, race, or even gender.
As such, my long-term vision is to create rich, immersive experiences with the power to shift perspectives. Cinema 2.0, if you will. I imagine an interactive experience, where a group of people can enter an AI-generated world (e.g. using Virtual Reality headsets) where the visual scenery is so utterly alien and breathtaking that it forces the mind to temporarily halt its usual narrative of describing what’s going on. This is essentially the goal of meditation: to experience the world as it is, emphasizing the experience of the present moment rather than the narrative we construct around it.
The goal, then, is to mimic the perceptual shift one can experience from a positive psychedelic experience, meditative insight, or a trip to space: to realize that our ‘normal’ world view is just a tiny sliver of what it is possible to experience. I believe this perceptual shift is probably the most unique human characteristic. It allows the great wonder of imagination to power our world, and is the most powerful tool we have to tackle the world’s largest challenges.
From a technology standpoint, how far away are we from creating these basic “cinema 2.0” experiences?
I would say that from a technical point of view, we’re getting very close. The latest generative models (e.g. StyleGAN2 or BigGAN-deep) are able to create very realistic samples and allow for very high diversity. What is lacking at present are creative tools that let non-coders use this technology to get creative. The main challenge, at least for me, is to create a compelling narrative.
You can see more of Steenbrugge’s Neural Synesthesia work at its dedicated homepage, and try out wzrd.ai here. He’s also active on YouTube and Twitter, and open to collaborating with other creatives who have similar ideas and aspirations. You can contact him at firstname.lastname@example.org.