Prototyping Spatial Audio for Movement Art

One of Oscillations’ technical goals for this quarter’s Knight Lab Studio class was an exploration of spatial audio. Spatial audio is sound that exists in three dimensions. It is a perfect complement to 360 video, because sound sources can be localized to specific parts of the video. Oscillations is especially interested in using spatial audio to reinforce the neuroscientific principles of audiovisual synchrony that it aims to emphasize in its productions. Existing work in spatial audio has centered almost entirely on capturing 360 audio at the source to complement 360 video. Oscillations, however, wants to go further and build spatial audio from non-spatialized sources, creating novel auditory experiences that couldn’t simply be captured at the source.

For example, Oscillations has the idea of matching individual dancers in a 360 video to specific instruments of a song. One person likened it to the experience of standing amongst an orchestra; depending on one’s position, different instruments are amplified. For an augmented spatial audio experience, the viewer would be able to hear where the dancers are by tracking where the instruments attached to them are. The sound of a specific instrument would be amplified when the viewer looks at a specific dancer. This would create a unique audio-visual synchronous experience unlike anything done before.
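
One way to picture the gaze-driven amplification described above is as a gain that depends on the angle between where the viewer is looking and where a dancer is. The sketch below is a minimal illustration in Python, not part of our Unity project; the function names, the `min_gain` floor, and the `sharpness` exponent are all assumptions chosen for the example.

```python
import math


def direction(yaw_deg, pitch_deg):
    """Unit vector for a yaw (left/right) and pitch (up/down) given in degrees."""
    yaw, pitch = math.radians(yaw_deg), math.radians(pitch_deg)
    return (
        math.cos(pitch) * math.sin(yaw),  # x: right
        math.sin(pitch),                  # y: up
        math.cos(pitch) * math.cos(yaw),  # z: forward
    )


def gaze_gain(gaze, source, min_gain=0.3, sharpness=2.0):
    """Gain in [min_gain, 1] that peaks when the viewer looks straight at the source.

    min_gain and sharpness are hypothetical tuning knobs, not values from any library.
    """
    dot = sum(g * s for g, s in zip(gaze, source))
    # Map the cosine of the angle from [-1, 1] to [0, 1]: 0 is behind, 1 is dead ahead.
    alignment = (max(-1.0, min(1.0, dot)) + 1.0) / 2.0
    return min_gain + (1.0 - min_gain) * alignment ** sharpness


# Example: the viewer looks straight ahead while a dancer stands 90 degrees to the right.
viewer = direction(0, 0)
dancer = direction(90, 0)
violin_volume = gaze_gain(viewer, dancer)
```

With these defaults, an instrument attached to a dancer directly behind the viewer plays at 0.3 of full volume, and one straight ahead plays at full volume. A steeper `sharpness` would make the effect feel more like a spotlight.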

We set out to develop a workflow for prototyping a spatial audio experience, with the ultimate goal of porting a final prototype to easily accessible viewing platforms such as Facebook 360 or YouTube 360 Videos. Oscillations provided us with wonderful 360 footage of some dancers to work with, and our goal was to match specific audio sources to the dancers’ locations as they moved around the camera. We had only limited time to dedicate to this side project, because most of our time went into a detailed comparison of motion capture techniques (which you can read about in a separate blog post). As a result, we never finished a refined prototype with convincing spatialized sound. However, we did develop an effective (and free!) workflow, and with more time put into fine-tuning the process, we think it can create convincing augmented spatial audio experiences.

This workflow is designed around Unity. The components you’ll need are listed in step 2 of the workflow below.

We decided on Unity because of its ease-of-use and many plugins that deal directly with 360 video and spatial audio. Google Resonance Audio is a new, free option designed exactly for what we want to do: create augmented spatial audio out of non-spatialized audio sources.

Here’s what it looks like:

Spatial audio Unity test

And here is our workflow:

  1. Acquire footage. We worked with Oscillations footage shot in early 2018 using an Insta360 Pro. It captured 8K monoscopic 360 and 6K stereoscopic 360 footage. Unity handled these large resolutions fairly well, and it can play both monoscopic and stereoscopic footage. Earlier in the quarter we also successfully used footage from a Gear 360.

  2. Set up a Unity project with these components:

    1. SteamVR. We did our testing with the HTC Vive.

    2. Google Resonance Audio. Project configuration

    3. Unity Panoramic Skybox Shader

  3. Add 360 video to Unity. This works by turning 360 video into the skybox. Instructions for setup.

  4. Add audio to Unity. Kyle (a Knight Lab fellow) helped us out by creating high- and low-frequency versions of the song used in one of the videos shot by Oscillations. Ideally, we would have access to songs with instrument separation for more interesting audio spatialization. Attach each audio file to an AudioSource, and follow the Google Resonance Audio instructions to set the AudioSource properties for proper spatialization.

  5. Use Timeline keyframing to move the audio source. This is the trickiest part of the project. If you’re mapping spatial audio onto a non-moving object or person, you can skip this step. But if you’re mapping audio onto a moving subject, you need a way to move the AudioSource so that it follows the subject in the video. There is no easy way to do this, since the video lives in the skybox rather than as objects in the scene. We were hoping that motion capture of the subjects within the 360 video could give us data that we could export into Unity and use to attach the AudioSources, but this was not technically feasible due to limitations in the software we used (MochaVR). So we developed a workaround using Unity Timeline, which allows for keyframing to manually move the audio’s location so that it matches the subject’s location in the video.

    1. Here’s an introduction to Timeline we found helpful

    2. And here’s detailed Timeline documentation

    3. Important note: Go to Edit > Project Settings > Audio and set Doppler Factor to 0. This prevents an annoying artificial Doppler effect that ruins the spatial audio experience.

  6. Export the 360 video and ambisonic audio to a popular platform for viewing. Keeping the project in Unity is fine for the prototyping stages, but it limits the experience to a Unity app, which demands a lot of work from potential viewers. Exporting lets anyone experience the video with a smartphone, Google Cardboard, and headphones. While we did not test this part of the project, we found several solutions that should work.

    1. Free option: Unity beta 360 video capture. This is a brand new solution from Unity that allows for exporting 360 videos from within Unity. This can be coupled with Google Resonance Audio’s ability to export ambisonic audio in the AmbiX ACN-SN3D format, which is used by Facebook 360 and YouTube. All spatial audio properties afforded by Resonance Audio can be exported in this format.

    2. VR Panorama 360 PRO Renderer. This paid plugin supports up to 8K stereoscopic 360 video export with ambisonic audio that should work with YouTube.

    3. Helios. This is another paid plugin that supports up to 8K stereoscopic 360 video export, but it’s not clear how compatible it would be with Resonance Audio’s ambisonic audio export.
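
To make steps 5 and 6 more concrete, here is a rough Python sketch of the underlying idea: hand-placed keyframes describe where a sound source sits over time, and each sample is then weighted into the four channels of first-order AmbiX (ACN channel order, SN3D normalization). This is only an illustration of the math under simplifying assumptions; it is not how Unity Timeline or Resonance Audio are implemented. The keyframe values and all function names are hypothetical, and the interpolation is plain linear, whereas Timeline offers editable curves.

```python
import bisect
import math

# Hypothetical hand-placed keyframes: (time in seconds, azimuth in degrees, elevation in degrees).
# Azimuth is measured counter-clockwise from straight ahead (positive = left), as in AmbiX.
KEYFRAMES = [(0.0, 0.0, 0.0), (2.0, 90.0, 0.0), (4.0, 170.0, 10.0), (6.0, -170.0, 10.0)]


def interpolate(keyframes, t):
    """Linearly interpolate (azimuth, elevation) at time t, clamping at both ends."""
    times = [k[0] for k in keyframes]
    if t <= times[0]:
        return keyframes[0][1:]
    if t >= times[-1]:
        return keyframes[-1][1:]
    i = bisect.bisect_right(times, t)
    (t0, az0, el0), (t1, az1, el1) = keyframes[i - 1], keyframes[i]
    u = (t - t0) / (t1 - t0)
    # Take the short way around the circle, so 170 -> -170 moves 20 degrees, not 340.
    d_az = (az1 - az0 + 180.0) % 360.0 - 180.0
    az = (az0 + u * d_az + 180.0) % 360.0 - 180.0
    return az, el0 + u * (el1 - el0)


def ambix_gains(az_deg, el_deg):
    """First-order AmbiX gains in ACN order (W, Y, Z, X) with SN3D normalization."""
    az, el = math.radians(az_deg), math.radians(el_deg)
    return (
        1.0,                          # W: omnidirectional
        math.sin(az) * math.cos(el),  # Y: left/right
        math.sin(el),                 # Z: up/down
        math.cos(az) * math.cos(el),  # X: front/back
    )


def encode(mono, keyframes, sample_rate):
    """Encode mono samples into four-channel frames along the keyframed path."""
    frames = []
    for n, sample in enumerate(mono):
        az, el = interpolate(keyframes, n / sample_rate)
        frames.append(tuple(sample * g for g in ambix_gains(az, el)))
    return frames


# Example: one second of a 440 Hz tone at a deliberately low sample rate.
RATE = 8000
tone = [math.sin(2 * math.pi * 440 * n / RATE) for n in range(RATE)]
ambix_frames = encode(tone, KEYFRAMES, RATE)
```

Each entry in `ambix_frames` holds the W, Y, Z, and X values for one sample. In practice, the tools listed in step 6 would handle the encoding and file export; the sketch only shows why a keyframed position is enough information to produce a spatialized track.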

We think spatial audio can create amazing, immersive experiences that take full advantage of the audiovisual synchrony principles promoted by Oscillations. The workflow we have presented here is an accessible way to prototype spatial audio experiences with the possibility of reaching a wide audience.

About the project

Oscillations: Immersive Virtual Experiences in the Performing Arts

Advancements in neuroscience and immersive technologies offer mechanisms for engineering an entirely new mode of performance art, one that engages audiences to an unprecedented degree. Using the latest VR production techniques, students used motion capture and machine learning to teach a computer to improvise a performance, creating an engaging VR experience.

About the authors

Gabriel Caniglia

Interested in the future of human-computer interaction.
