Binaural Recording: Capturing 3D Sound for Headphone Listening

By Chris Nolan -- Game Audio Implementation Specialist, Wwise/FMOD

Engineer setting up a binaural dummy head microphone for 3D audio recording

The first time I experienced a properly executed binaural recording, I was sitting at my desk with a pair of Sennheiser HD 650 headphones, and a voice whispered from behind my right shoulder, walked in front of me, and stopped to my left. I turned my head. The sound didn't move -- it stayed exactly where it was supposed to be, off to my left. That was the moment I understood the difference between stereo and binaural: stereo creates the illusion of width; binaural creates the illusion of space.

Binaural recording captures sound the way human ears hear it -- with all the filtering, diffraction, and shadowing effects of the head, torso, and outer ears included. When you listen to a binaural recording on headphones, your brain receives the same acoustic cues it would receive if you were physically present at the recording location. The result is a spatial audio experience that no amount of speaker-based processing can replicate, because the HRTF (head-related transfer function) is baked into the recording itself.

The Science Behind Binaural Hearing

Human spatial hearing relies on three primary cues: interaural time differences (ITD), interaural level differences (ILD), and spectral shaping by the pinna (outer ear). ITD is the time delay between a sound reaching one ear versus the other -- for a sound source directly to your right, the sound reaches your right ear approximately 0.6 milliseconds before it reaches your left ear. ILD is the level difference caused by your head blocking sound from reaching the far ear -- a sound to your right is 10-20 dB louder at your right ear than at your left ear, especially at frequencies above 1.5 kHz.
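The 0.6 ms figure can be sanity-checked with Woodworth's spherical-head approximation, ITD = (a / c)(θ + sin θ), where a is the head radius, c is the speed of sound, and θ is the source azimuth. The sketch below is illustrative only: the head radius of 8.75 cm and the speed of sound of 343 m/s are assumed values that do not come from this article, and the function name is hypothetical.

```python
import math

HEAD_RADIUS_M = 0.0875       # assumed average head radius (not from the article)
SPEED_OF_SOUND_M_S = 343.0   # assumed speed of sound in air at roughly 20 °C


def woodworth_itd(azimuth_deg):
    """Approximate ITD in seconds for a distant source.

    azimuth_deg: 0 = straight ahead, 90 = directly to one side.
    The rigid-sphere formula used here is only valid for 0..90 degrees.
    """
    if not 0.0 <= azimuth_deg <= 90.0:
        raise ValueError("azimuth must be between 0 and 90 degrees")
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS_M / SPEED_OF_SOUND_M_S) * (theta + math.sin(theta))


if __name__ == "__main__":
    for azimuth in (0, 30, 60, 90):
        print(f"{azimuth:>2} deg -> {woodworth_itd(azimuth) * 1000:.2f} ms")
```

Under these assumptions a source at 90 degrees comes out at roughly 0.66 ms, in line with the approximate figure above. A real head is not a sphere, so measured ITDs will differ somewhat.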

The spectral shaping by the pinna is the most complex cue. The convoluted shape of your outer ear creates frequency-dependent filtering that varies with the elevation of the sound source. A sound coming from above is filtered differently than a sound coming from below, even though the ITD and ILD can be nearly identical for both positions. Your brain has learned to interpret these spectral patterns as elevation information, and this is why binaural recordings can place sounds above and below the listener -- not just left and right.

Head-Related Transfer Functions

An HRTF is a mathematical description of how a sound from a specific direction is filtered by the listener's anatomy before reaching the eardrum. HRTFs are measured by placing a tiny microphone in the ear canal of a subject (or a manikin) and playing test signals from known positions in space. The ratio of the ear-canal signal's spectrum to the free-field signal's spectrum, taken frequency by frequency, is the HRTF for that direction. Its time-domain equivalent is the head-related impulse response (HRIR).
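The spectral ratio described above can be sketched in a few lines. This is a toy illustration, not a measurement procedure: the stimulus and the "ear" filter are invented numbers, the function names are hypothetical, and a naive DFT stands in for the FFT a real tool would use. The ear-canal signal is simulated by convolving the stimulus with a known toy impulse response, so the estimate can be checked against it.

```python
import cmath


def dft(samples, length):
    """Naive discrete Fourier transform of samples zero-padded to length."""
    padded = list(samples) + [0.0] * (length - len(samples))
    return [
        sum(padded[t] * cmath.exp(-2j * cmath.pi * k * t / length)
            for t in range(length))
        for k in range(length)
    ]


def convolve(signal, impulse_response):
    """Direct-form linear convolution."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def estimate_hrtf(ear_signal, free_field_signal, length):
    """Per-bin ratio of ear-canal spectrum to free-field spectrum.

    Assumes the free-field spectrum has no zero-valued bins.
    """
    ear = dft(ear_signal, length)
    ref = dft(free_field_signal, length)
    return [e / r for e, r in zip(ear, ref)]


if __name__ == "__main__":
    stimulus = [1.0, 0.5, 0.25]          # invented test signal
    toy_hrir = [0.0, 0.8, 0.3, -0.1]     # invented "ear" filter, not measured data
    ear_canal = convolve(stimulus, toy_hrir)
    hrtf = estimate_hrtf(ear_canal, stimulus, 8)
    print([round(abs(h), 3) for h in hrtf])  # magnitude response per bin
```

Real measurements add complications this sketch ignores, such as noise, room reflections, and the response of the loudspeaker and microphone.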

The KEMAR manikin -- a standardized head-and-torso simulator with realistic pinnae -- is one of the most widely used HRTF measurement platforms. The MIT Media Lab's KEMAR HRTF database, first published in 1994, provides HRTFs at 710 positions around the head, sampled at elevations from -40 to +90 degrees in 10-degree steps, with azimuth spacing of roughly 5 degrees near the horizontal plane. This database underpins many binaural processing tools and spatial audio renderers.
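Renderers built on an HRTF database typically work by convolving a mono source with the left-ear and right-ear impulse responses (HRIRs) for the desired direction. The sketch below shows that idea with invented toy HRIRs for a source on the listener's right; the values are not taken from KEMAR or any measured set, and the names are hypothetical. The 29-sample delay on the left ear corresponds to about 0.6 ms at an assumed 48 kHz sample rate.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def render_binaural(mono, hrir_left, hrir_right):
    """Return (left, right) channels for a mono source filtered by an HRIR pair."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)


# Toy HRIRs for a source on the right (invented, not measured data):
# the right ear hears the sound first and louder, the left ear later and quieter.
TOY_HRIR_RIGHT = [1.0, 0.2]
TOY_HRIR_LEFT = [0.0] * 29 + [0.4, 0.1]  # 29 samples is about 0.6 ms at 48 kHz

if __name__ == "__main__":
    click = [1.0]
    left, right = render_binaural(click, TOY_HRIR_LEFT, TOY_HRIR_RIGHT)
    print("right ear starts at sample", next(i for i, v in enumerate(right) if v))
    print("left ear starts at sample", next(i for i, v in enumerate(left) if v))
```

A production renderer would use measured HRIRs, interpolate between measured directions, and use FFT-based convolution for speed.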

Individual HRTFs vary significantly between people. Your head is a different size from mine, your pinnae have a different shape, and these differences mean that your HRTFs are different from the KEMAR HRTFs. When you listen to a binaural recording made with KEMAR, your brain interprets the manikin's spatial cues as if your own head and ears had produced them, and the mismatch can cause localization errors of 10-30 degrees in some directions. For most listeners, these errors are small enough that the spatial experience remains convincing. For some listeners -- approximately 15-20% of the population -- the errors are large enough to cause front-back confusion (sounds from in front are perceived as coming from behind, or vice versa).

Binaural vs Ambisonic Recording

Binaural and ambisonic are both approaches to spatial audio recording, but they serve different purposes. Binaural recording produces a two-channel signal (left ear, right ear) that is designed specifically for headphone playback. Ambisonic recording produces a multi-channel signal (4 channels for first-order, 16 for third-order, 36 for fifth-order) that can be decoded to any speaker configuration or rendered binaurally for headphones.
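The channel counts follow from a simple rule: full-sphere ambisonics of order N uses (N + 1)^2 channels. A minimal helper, with a hypothetical function name, makes the arithmetic explicit:

```python
def ambisonic_channel_count(order):
    """Number of channels for full-sphere ambisonics of the given order."""
    if order < 0:
        raise ValueError("ambisonic order must be non-negative")
    return (order + 1) ** 2


if __name__ == "__main__":
    for order in (1, 3, 5):
        print(f"order {order}: {ambisonic_channel_count(order)} channels")
```

The quadratic growth is why higher-order recordings demand considerably more storage and processing than a two-channel binaural file.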

The advantage of binaural is simplicity and headphone quality -- the HRTF processing is done once, during recording, and the result is optimized for the specific manikin or person used for the recording. The advantage of ambisonic is flexibility -- the recording can be decoded to any playback configuration, and the binaural rendering can use the listener's individual HRTF if available. For content that will be consumed primarily on headphones, binaural is the better choice. For content that will be consumed on both speakers and headphones, ambisonic is the better choice.

Binaural Recording Equipment

Binaural recording requires a microphone configuration that mimics human hearing. The two primary approaches are dummy-head microphones and in-ear microphones. Each approach has distinct advantages and limitations, and the choice depends on the application.

The Neumann KU 100 is the industry-standard dummy-head microphone. It's a life-size replica of a human head with condenser microphone capsules built into the ears, and it can run on phantom power, internal batteries, or an external power supply. According to Neumann's published specifications, the KU 100 is diffuse-field equalized with a frequency range of 20 Hz to 20 kHz, its A-weighted self-noise is 16 dB, and its maximum SPL is 135 dB. The KU 100 produces binaural recordings that are remarkably accurate -- in blind listening tests, trained listeners correctly identified the direction of sound sources in KU 100 recordings 85-90% of the time, compared to 95-98% for live listening.

The 3Dio Free Space is a popular alternative that uses silicone ears mounted on a compact body, with built-in microphones. The 3Dio is significantly less expensive than the KU 100 ($795 vs $9,500) and produces good binaural recordings, though with a narrower frequency response (50 Hz to 16 kHz) and higher self-noise (20 dBA). For field recording applications where portability and budget are considerations, the 3Dio is a practical choice.

In-Ear Binaural Recording

The most personal form of binaural recording uses microphones placed inside the recordist's own ear canals. This captures the recordist's individual HRTFs, which means playback on headphones produces a spatial image that is accurate for the recordist but may be less accurate for other listeners. The advantage is the lowest possible cost (a pair of in-ear microphones like the Roland CS-10EM costs around $200) and the most portable setup (the microphones plug directly into a portable recorder or smartphone).

In-ear binaural recording is particularly effective for ASMR content, where the intimate, personalized spatial experience is part of the appeal. The listener hears the ASMR artist's voice and actions from the same spatial perspective the artist experienced during recording. For documentary and field recording applications, in-ear binaural provides a first-person perspective that places the listener inside the recordist's head -- an effect that's both powerful and unsettling.

"Binaural recording is the closest technology has come to capturing not just what a place sounds like, but what it feels like to stand in that place and listen. The spatial cues that your brain uses to construct a mental model of your acoustic environment are all there in the recording. When you play it back on headphones, your brain can't tell the difference between the recording and the real thing." -- Jonathan Wyner, audio engineer and educator, interviewed by the AES Journal, 2020

Recording Techniques and Best Practices

Binaural recording requires different techniques than conventional stereo recording. Because the microphones sit where the listener's ears will be, every sound the recordist makes is captured and played back at intimate range -- breathing, clothing rustle, footstep noise, handling noise. The recordist must be acoustically invisible, which means careful attention to clothing, movement, and handling throughout the session.

The recordist should wear quiet clothing (cotton or wool, not nylon or polyester), move slowly and deliberately, and avoid touching the microphone or its mount during recording. For stationary recordings, the dummy head should be mounted on a tripod with a shock mount to isolate from ground vibration. For mobile recordings (walking through a space), the dummy head should be carried in a way that minimizes body noise transmission -- a dedicated binaural backpack mount is more effective than holding the head in your hands.

Environmental Considerations for Binaural Recording

The acoustic environment profoundly affects binaural recordings. In a reverberant space, the listener hears not only the direct sound from each source but also the reflections from walls, ceiling, and floor. These reflections carry spatial information that reinforces the sense of being in that space. In an anechoic environment, the listener hears only direct sound, which creates an eerie, unnatural sensation -- most people describe anechoic binaural recordings as claustrophobic or disorienting.

For this reason, the most compelling binaural recordings are made in acoustically interesting spaces: cathedrals, train stations, forests, caves, busy streets. These spaces provide the spatial reflections that give the listener's brain the cues it needs to construct a convincing mental model of the environment. A binaural recording of a jazz trio in a small club is more spatially engaging than a binaural recording of the same trio in an anechoic chamber, even though the direct sound of the instruments is identical in both cases.

Binaural Applications Across Media

Binaural audio has found applications across a growing range of media. In podcasting and audio drama, binaural recording creates an immersive listening experience that pulls the audience into the scene. In ASMR, binaural is the dominant format, with the spatial positioning of triggers (whispering, tapping, scratching) being central to the genre's appeal. In gaming, binaural audio provides 3D spatial awareness for headphone-using players, with game engines like Unity and Unreal supporting binaural rendering through spatial audio plugins.

In virtual reality, binaural audio is essential for presence -- the feeling of actually being in the virtual environment. Research from the University of York in 2021 found that VR experiences with binaural audio scored 35% higher on presence questionnaires than the same experiences with stereo audio. The spatial audio provides the brain with the acoustic cues it expects from a real environment, and the absence of those cues in stereo audio creates a subtle but persistent sense that something is wrong.

Table 1: Binaural Recording Applications and Equipment Recommendations
Application	Recommended Equipment	Budget Range (USD)	Spatial Accuracy
Podcast / Audio Drama	Neumann KU 100	$9,000-10,000	90-95%
ASMR Content	3Dio Free Space	$700-900	80-85%
Field Recording / Documentary	Sound Devices MixPre-6 II + KU 100	$11,000-12,000	90-95%
VR / Game Development	Sennheiser AMBEO VR Mic (ambisonic to binaural)	$1,500-2,000	75-80%

Post-Processing Binaural Recordings

Binaural recordings require minimal post-processing because the spatial information is already embedded in the two-channel signal. The primary post-processing tasks are noise reduction (removing recordist body noise, handling noise, and electrical noise), EQ (correcting any frequency response anomalies from the recording chain), and level matching (ensuring consistent loudness across multiple takes).

Avoid applying stereo widening or spatial enhancement plugins to binaural recordings. These plugins assume a stereo speaker playback model and will distort the binaural spatial cues, causing the spatial image to collapse or become ambiguous. If you need to adjust the spatial characteristics of a binaural recording, do it at the source level by repositioning the dummy head or by editing the individual binaural channels in a way that preserves the inter-channel relationships.
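Level matching is one place where the inter-channel relationships are easy to damage: normalizing the left and right channels separately changes the ILD and shifts the spatial image. A minimal sketch of linked-gain matching, which applies one gain to both channels, is shown below. The function names and the target level are hypothetical, and plain RMS is used here for simplicity where a real workflow would more likely use a loudness meter.

```python
import math


def rms(samples):
    """Root-mean-square level of a list of float samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def ild_db(left, right):
    """Interaural level difference in dB (positive means left is louder)."""
    return 20.0 * math.log10(rms(left) / rms(right))


def match_level_linked(left, right, target_rms):
    """Scale both channels by the same gain so their combined RMS hits target_rms."""
    energy = sum(s * s for s in left) + sum(s * s for s in right)
    combined = math.sqrt(energy / (len(left) + len(right)))
    if combined == 0.0:
        raise ValueError("cannot level-match a silent recording")
    gain = target_rms / combined
    return [s * gain for s in left], [s * gain for s in right]


if __name__ == "__main__":
    left = [0.5, -0.5, 0.5, -0.5]     # toy take: source on the left
    right = [0.1, -0.1, 0.1, -0.1]
    new_left, new_right = match_level_linked(left, right, target_rms=0.2)
    print(f"ILD before: {ild_db(left, right):.2f} dB")
    print(f"ILD after:  {ild_db(new_left, new_right):.2f} dB")
```

Because the same gain multiplies both channels, the level ratio between the ears is unchanged, and only the overall loudness moves.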

The final delivery format for binaural content is a standard stereo file (WAV at 48 kHz / 24-bit minimum) labeled as "binaural" or "headphone only" to prevent listeners from playing it on speakers. Binaural recordings played on speakers produce a phasey, hollow sound that lacks both the stereo imaging of a conventional stereo recording and the spatial imaging of headphone playback. The label is a courtesy to the listener -- it tells them to put on headphones before pressing play.
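As a rough illustration of that delivery format, the sketch below writes two float channels to a 48 kHz / 24-bit stereo WAV using only Python's standard `wave` module. The function names are hypothetical. A plain WAV header has no "binaural" flag, so this sketch assumes the label is carried in the filename or in accompanying metadata.

```python
import io
import wave

SAMPLE_RATE_HZ = 48_000
SAMPLE_WIDTH_BYTES = 3          # 24-bit PCM
PCM24_MAX = 8_388_607           # 2**23 - 1


def float_to_pcm24(sample):
    """Convert a float in [-1.0, 1.0] to 3 little-endian signed bytes, clipping if needed."""
    clipped = max(-1.0, min(1.0, sample))
    return int(round(clipped * PCM24_MAX)).to_bytes(3, "little", signed=True)


def write_binaural_wav(target, left, right):
    """Write interleaved left/right samples to target (a path or a writable binary file object)."""
    if len(left) != len(right):
        raise ValueError("left and right channels must be the same length")
    frames = bytearray()
    for l, r in zip(left, right):
        frames += float_to_pcm24(l)
        frames += float_to_pcm24(r)
    with wave.open(target, "wb") as wav:
        wav.setnchannels(2)
        wav.setsampwidth(SAMPLE_WIDTH_BYTES)
        wav.setframerate(SAMPLE_RATE_HZ)
        wav.writeframes(bytes(frames))


if __name__ == "__main__":
    buffer = io.BytesIO()
    write_binaural_wav(buffer, [0.0, 0.5], [0.0, -0.5])
    print(len(buffer.getvalue()), "bytes written")
```

In practice the target would be a file path such as a name ending in "_binaural.wav" (an example naming convention, not a standard).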

References: Jonathan Wyner, "Binaural Recording: Theory and Practice," AES Journal, Vol. 68, No. 4 (2020) | University of York, "The Impact of Spatial Audio on VR Presence," Virtual Reality Journal (2021) | Jens Blauert, "Spatial Hearing: The Psychophysics of Human Sound Localization," MIT Press (1996, revised 2001) | Sennheiser, "AMBEO Binaural Recording Guide" (2023)