I’ve been working with Iyad Assaf on the implementation of 3DBARE for some months now. He’s the coding guru on this team. My initial concept for 3DBARE was inspired by the visionary avant-gardist Edgard Varèse who, writing in the 1930s, was able to envision technical solutions to musical problems that would not materially exist until decades later. There remains a significant gap between the ‘impossible’ music it is now possible to make in the digital studio, and the still very limited, fixed means of listening which we use. 3DBARE is the search for what Varèse refers to as ‘zones of intensities’, the ‘differentiation of masses of sound, hitherto unobtainable’.
Back in 2011, I approached and eventually persuaded ISVR (Institute of Sound & Vibration Research) to get on board and build an initial 3D audio engine. I remember the naïve optimism of thinking that this would be the bulk of the work done.
They built two engines that were impressive in some ways, with different benefits and drawbacks; I’ll write more about these elsewhere, but they had a common problem. They guzzled data at such a rate that they’d only work with a single user listening to very few sounds at once. This made both tools essentially irrelevant to our intended purpose: for a sizable audience to walk inside an apparently live performance and hear it according to the way they explored the physical space.
Iyad heard about some of the work I was doing with GPS-based tracking for music listening, using noTours software, and offered to help out: a partnership was born. Over the past months, he has been working on a number of possible solutions for the real crux of the engine: the tracking controllers.
After trying for some time to obtain a workable multi-user solution to ISVR’s engine, we went back to the drawing board.
Iyad’s initial, apparently simple task of creating a user interface to control the second of ISVR’s audio engines was to turn into a complex ongoing development project which is at last yielding some very interesting fruit.
The ISVR engine created a 3D virtual sound world by convolving sources with head-related transfer functions (HRTFs), made by recording sounds from different orientations around a dummy head (a mannequin with microphones where the ears should be). It worked reasonably well with very few audio sources but became completely inefficient with more than a couple of sounds. It was built using Max/MSP, which is intrinsically data-heavy and difficult to port to other platforms because there is no readily accessible text-based code (the user works in a graphical layout).
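To give a rough sense of why per-source HRTF convolution gets expensive so quickly, here is a minimal, illustrative C sketch (not ISVR’s code) of rendering one mono source binaurally by direct convolution with a pair of head-related impulse responses; the HRIR arrays are assumed to be loaded elsewhere, and a real engine would use FFT-based convolution and interpolate between measured directions.

```c
#include <stddef.h>

/* Illustrative sketch only: binaural rendering of one mono source by
 * direct (time-domain) convolution with left/right head-related impulse
 * responses (HRIRs). Doing this separately for every source is what makes
 * many simultaneous sources so costly. */
void binaural_convolve(const float *mono, size_t n,
                       const float *hrir_l, const float *hrir_r, size_t taps,
                       float *out_l, float *out_r)
{
    for (size_t i = 0; i < n; i++) {
        float l = 0.0f, r = 0.0f;
        for (size_t k = 0; k < taps && k <= i; k++) {
            l += mono[i - k] * hrir_l[k];   /* left-ear filter */
            r += mono[i - k] * hrir_r[k];   /* right-ear filter */
        }
        out_l[i] = l;
        out_r[i] = r;
    }
}
```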
The new interface was created using Processing, a Java-based creative coding platform used widely by artists, and controlled a much simpler binaural plugin, again built in Max/MSP. The UI allows mapping where audio should be placed in a virtual space in relation to the listener. In its current state, the user can export the source data to be used on an independent device.
This is what it looks like (the listener is the black dot at the hub of the radiating red lines; individual sounds are marked as blue circles. Only the ones connected to the listener by red lines are audible, for a good reason I’ll explain later).
The creator of the second engine at ISVR was doing a Masters in Acoustics and was about to head straight off into a job in soundproofing. He was therefore pretty bemused by our excitable urgency regarding imaginary musical instruments dotted around an otherwise empty room. We couldn’t always work out what he’d done in the programming, or why, and communication difficulties finally led to starting from first principles on a completely new audio engine.
Iyad started in Max/MSP, using an external plug-in called ‘binaural+’, which did all of the directional processing and removed the need to load individual HRTF files. The engine was designed to be controlled by the user interface (above), creating an instance of the binaural panning external for each audio source, scaling the loudness depending on the listener’s distance from a source and mixing it with all other sources to create the impression of multiple sources emanating from points in the physical space.
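The exact scaling in Iyad’s patch isn’t reproduced here, but the idea can be sketched with a standard inverse-distance gain law; the C below is illustrative only, and the reference distance and rolloff constants are assumptions rather than values from the 3DBARE engine.

```c
#include <math.h>

/* Illustrative sketch of distance-based gain scaling: loudness falls off
 * as the listener moves away from a source, clamped inside a reference
 * distance. Constants are assumptions, not values from the engine. */
typedef struct { float x, y; } Vec2;

float distance_gain(Vec2 listener, Vec2 source,
                    float reference_dist, float rolloff)
{
    float dx = source.x - listener.x;
    float dy = source.y - listener.y;
    float d  = sqrtf(dx * dx + dy * dy);
    if (d < reference_dist) d = reference_dist;   /* no boost in the near field */
    /* Inverse-distance law: with rolloff == 1.0 the gain halves each
       time the distance doubles. */
    return reference_dist / (reference_dist + rolloff * (d - reference_dist));
}
```

Each source’s output is then multiplied by its gain and summed with the others to form the binaural mix.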
We were still limited to a single plane of rotation – a horizontal ring in which to rotate sounds’ positions – and this is of course a major block to realism.
Acknowledging the complexity of adding the other two planes (a total of “6 degrees of freedom”), we remained convinced of the need to develop in a way that would allow these to be added later without complete reconstruction. At first though, without much thought for how we would track the user, the concept was to have a base computer performing extensive audio processing and sending the audio to the audience members via wireless headsets.
We remembered how much data was used by the first audio engine, even with a single user turning an onscreen rotary slider to create the binaural panning and listening to just two audio sources. (This engine, too, worked not in 3D but in single planar rotation only. Rather than sounds everywhere, as you would hear in the real world, the effect was similar to 1970s and 80s ‘ambisonics’, with rings of speakers around a seated audience.)
So we looked at how, instead of streaming all the data – not only each listener’s movement but the audio itself, for lots of listeners at once – we could use independent mobile devices.
This had several advantages, perhaps the biggest of which is that all of the audio processing is onboard. These devices use wireless technologies like Wi-Fi and Bluetooth to track the user; the application can be downloaded and used by many independent listeners at the same time; and it can be implemented at low cost.
Mobile devices and Max/MSP don’t work together, so if that remained our framework, the code would need complete rewriting for compatibility with mobile operating systems.
We researched hardware and coding solutions that innovators around the web have been developing at incredible speed over the past year or two. Should we start with iOS, Android, or even the Raspberry Pi?
The current way forward is to take a version of the Processing interface online, using OpenAL (Open Audio Library), a cross-platform, open-source audio library that can place audio in a 3D environment. In this way we can demonstrate the audio rendering remotely. Here the tracking is user-operated, as an avatar seen from above.
The library responds to relative vector-based positions of listener and source to control perceived distance and direction of sounds.
Similar libraries such as FMOD are used in many first-person video games to give a greater sense of presence and immersion in the gaming environment: where, for example, you might hear a vehicle approaching from behind, the library ‘places’ the sound behind the user with a realistic spatialised effect.
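As a concrete illustration of this vector-based control, here is a minimal C sketch against the standard OpenAL API: a listener and one source are positioned, and updating only the listener’s position lets the library re-derive direction and distance for every source. The coordinates are arbitrary examples, and error handling, buffers and audio data are omitted.

```c
/* Minimal OpenAL positioning sketch. Coordinates are arbitrary examples;
 * buffers, audio data and error checking are omitted. */
#ifdef __APPLE__
#include <OpenAL/al.h>
#include <OpenAL/alc.h>
#else
#include <AL/al.h>
#include <AL/alc.h>
#endif

int main(void)
{
    ALCdevice  *dev = alcOpenDevice(NULL);
    ALCcontext *ctx = alcCreateContext(dev, NULL);
    alcMakeContextCurrent(ctx);

    /* Listener at the origin, facing down the negative z axis
       ("at" vector followed by "up" vector). */
    alListener3f(AL_POSITION, 0.0f, 0.0f, 0.0f);
    ALfloat orientation[] = { 0.0f, 0.0f, -1.0f,   0.0f, 1.0f, 0.0f };
    alListenerfv(AL_ORIENTATION, orientation);

    /* One virtual source placed two metres to the listener's right. */
    ALuint src;
    alGenSources(1, &src);
    alSource3f(src, AL_POSITION, 2.0f, 0.0f, 0.0f);
    alSourcef(src, AL_REFERENCE_DISTANCE, 1.0f);
    alSourcef(src, AL_ROLLOFF_FACTOR, 1.0f);
    alDistanceModel(AL_INVERSE_DISTANCE_CLAMPED);

    /* As tracking data arrives, only the listener state is updated;
       OpenAL re-derives direction and distance for every source. */
    alListener3f(AL_POSITION, 1.0f, 0.0f, -0.5f);

    alDeleteSources(1, &src);
    alcMakeContextCurrent(NULL);
    alcDestroyContext(ctx);
    alcCloseDevice(dev);
    return 0;
}
```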
Developing for iOS and testing on the almost identical OS X platform, we have now completed substantial experimental work on the audio playback system. One influential finding was that OpenAL is not optimized for the lengthy synchronized audio files the project requires, so the audio has to be played through the Audio Toolbox framework and then streamed into OpenAL.
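The post doesn’t detail the hand-off, but the general shape of this approach can be sketched as buffer queueing: short chunks of PCM (decoded elsewhere, e.g. by Audio Toolbox) are streamed into OpenAL rather than loaded as one long buffer. `fill_pcm_from_audio_toolbox()` below is a hypothetical placeholder, not a real Audio Toolbox call.

```c
/* Illustrative sketch of the buffer-queueing pattern: short chunks of PCM
 * are queued onto an OpenAL source and recycled as they finish playing.
 * fill_pcm_from_audio_toolbox() is a hypothetical decoder hook. */
#ifdef __APPLE__
#include <OpenAL/al.h>
#else
#include <AL/al.h>
#endif
#include <stdint.h>
#include <stddef.h>

#define NUM_BUFFERS  3
#define CHUNK_FRAMES 4096

/* Hypothetical: writes up to max_frames of mono 16-bit PCM into pcm and
   returns the number of bytes written (0 at end of file). */
extern size_t fill_pcm_from_audio_toolbox(int16_t *pcm, size_t max_frames);

void stream_source(ALuint source, ALsizei sample_rate)
{
    ALuint  buffers[NUM_BUFFERS];
    int16_t pcm[CHUNK_FRAMES];

    alGenBuffers(NUM_BUFFERS, buffers);

    /* Pre-fill and queue the initial buffers, then start playback. */
    for (int i = 0; i < NUM_BUFFERS; i++) {
        size_t bytes = fill_pcm_from_audio_toolbox(pcm, CHUNK_FRAMES);
        alBufferData(buffers[i], AL_FORMAT_MONO16, pcm, (ALsizei)bytes, sample_rate);
        alSourceQueueBuffers(source, 1, &buffers[i]);
    }
    alSourcePlay(source);

    /* Playback loop (in real code, driven by a timer or audio thread):
       recycle buffers OpenAL has finished with and refill them. */
    for (;;) {
        ALint done = 0;
        alGetSourcei(source, AL_BUFFERS_PROCESSED, &done);
        while (done-- > 0) {
            ALuint buf;
            alSourceUnqueueBuffers(source, 1, &buf);
            size_t bytes = fill_pcm_from_audio_toolbox(pcm, CHUNK_FRAMES);
            if (bytes == 0) return;   /* end of audio */
            alBufferData(buf, AL_FORMAT_MONO16, pcm, (ALsizei)bytes, sample_rate);
            alSourceQueueBuffers(source, 1, &buf);
        }
    }
}
```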
Ultimately, development for Android is also anticipated, although this will require another complete reengineering of the code to work with that very different platform.
Our task at the moment is to produce a working prototype for iOS, written in a language called Objective-C (Android works with Java). Both platforms do, however, support the OpenAL framework.
The main focus of the project at the moment is developing really accurate tracking and using it to control the audio rendering as the listener moves, determining their proximity and orientation in relation to each virtual source.
Wireless signal tracking works by measuring the Bluetooth or Wi-Fi signal strength of several anchor points – if the receiving device (smartphone) knows where the anchor points are, it can roughly determine its own location. An issue with this is that the signal strength can fluctuate significantly in different environmental conditions.
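For illustration, the usual way to turn signal strength into a position estimate is a two-step sketch: convert each anchor’s RSSI to an approximate distance with a log-distance path-loss model, then trilaterate from three or more anchors. The C below is a rough, illustrative version; the transmit power and path-loss exponent are assumed calibration values, not figures from this project.

```c
#include <math.h>

/* Rough localisation sketch: RSSI -> distance via a log-distance path-loss
 * model, then linearised 2-D trilateration from three anchors. tx_power
 * (RSSI at 1 m) and the exponent n are assumed calibration values. */
double rssi_to_distance(double rssi, double tx_power, double n)
{
    /* e.g. tx_power = -59 dBm at 1 m; n ~ 2 in free space, higher indoors */
    return pow(10.0, (tx_power - rssi) / (10.0 * n));
}

typedef struct { double x, y, d; } Anchor;   /* anchor position + estimated distance */

/* Returns 0 on success, -1 if the anchors are (nearly) collinear. */
int trilaterate(Anchor a1, Anchor a2, Anchor a3, double *x, double *y)
{
    double A = 2.0 * (a2.x - a1.x), B = 2.0 * (a2.y - a1.y);
    double C = a1.d * a1.d - a2.d * a2.d
             - a1.x * a1.x + a2.x * a2.x - a1.y * a1.y + a2.y * a2.y;
    double D = 2.0 * (a3.x - a1.x), E = 2.0 * (a3.y - a1.y);
    double F = a1.d * a1.d - a3.d * a3.d
             - a1.x * a1.x + a3.x * a3.x - a1.y * a1.y + a3.y * a3.y;
    double det = A * E - B * D;
    if (fabs(det) < 1e-9) return -1;
    *x = (C * E - B * F) / det;
    *y = (A * F - C * D) / det;
    return 0;
}
```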
Iyad is now working with wireless signal strength tracking via Bluetooth: the processing can be done on the device, and it is the lowest-cost option for implementation over a wide physical space. Signal strength fluctuations can be minimized using smoothing algorithms based upon static anchor point locations. Both Bluetooth and Wi-Fi are capable of this, but since Apple’s removal of Wi-Fi support, Bluetooth is the chosen method. In the last two generations of iOS devices, Apple has moved to the ‘Bluetooth Low Energy’ (Bluetooth LE) or ‘Bluetooth 4.0’ specification. In contrast to previous iterations, which send continual data to other devices, Bluetooth LE can be used to send small bits of information without rapidly draining battery power.
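The post doesn’t say which smoothing algorithm is used; one common low-cost choice is an exponential moving average per anchor, sketched below, where the smoothing factor `alpha` is an assumed tuning parameter.

```c
/* Illustrative smoothing sketch: an exponential moving average per anchor
 * tames moment-to-moment RSSI fluctuation. alpha (0..1) is an assumed
 * tuning parameter: smaller values smooth more but respond more slowly. */
typedef struct {
    double value;
    int    initialized;
} RssiFilter;

double rssi_smooth(RssiFilter *f, double raw_rssi, double alpha)
{
    if (!f->initialized) {
        f->value = raw_rssi;        /* seed with the first reading */
        f->initialized = 1;
    } else {
        f->value = alpha * raw_rssi + (1.0 - alpha) * f->value;
    }
    return f->value;
}
```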
Bluetooth LE works well for this project: the only information the smartphone requires is the signal strength value, so very little data needs to be transmitted. For this purpose, Iyad has been experimenting with four Bluetooth LE stickers that run on watch batteries for up to a year.
Sold as ‘The StickNFind’, they are intended to allow the user to locate lost belongings by way of signal strength – exactly what we need for this project. An early concern was that, since the technology uses very little energy, the update rate would be insufficient to create a fluid tracking model, but it is in fact frequent enough to remove the requirement for complex interpolation algorithms.
The 3DBARE product will be intended for the following applications:
- 3DBARE Composer: A user interface where the composer can place their sources in different positions around a given space and set their attributes.
- 3DBARE Setup: A means to set up the wireless tracking system within a space; the user measures the area and the computer returns vector-position values for where the trackers must be positioned in order to track users effectively.
- 3DBARE Client: The 3DBARE audio application as used by listeners, which will be available on iOS and Android.
Another future possibility for the project involves streaming live musicians through the system in order to have a live but virtual performance. This would involve one or more microphones per instrument feeding into a computer, with the audio then streamed over a local network to the mobile devices. It also means that the handset devices always have to be within range of the Bluetooth device.
A further possibility is the development of external control to change the perceptual positions of active audio sources via automation, in a kind of ultra-acousmatic setting where listeners are motionless in a motion-filled sonic field.
The current stage involves synchronous multi-channel audio playback through Audio Toolbox and OpenAL, and control of virtual spatialisation via Bluetooth tracking.
We are also conducting some really exciting research into using HTML5’s new audio rendering capability, which will be multi-browser and multi-device. Here you will shortly be able to try out our first online demo, with an original composition from the digital studio: another experiment in advanced virtual performance simulation.
Subscribe or follow on Twitter; #benjaminmawson for updates and to be the first to hear it!