Creating Explosion Foley for AR

By Sarah Okonkwo

1) Introduction: the technical problem AR exposes

Explosions are one of the most abused sound categories in media: they’re often designed for impact on stereo speakers, not for plausibility in a listener’s immediate environment. Augmented Reality (AR) forces the issue. When a “virtual” explosion is anchored to a real room, your audio must survive scrutiny from a brain that is continuously cross-checking audio cues against the user’s actual acoustics, head movement, and visual tracking. A conventional cinematic boom may still “sound cool,” but it can fail in AR because:

This article treats “explosion foley for AR” as an engineering task: building assets and playback logic that preserve believable shock, scale, direction, and environmental integration under head tracking and device constraints.

2) Background: physics and engineering principles behind “explosion sound”

2.1 What the ear interprets as an explosion

An explosion in air is fundamentally a rapid release of energy that creates a high-pressure front (shock wave in the near field of the blast) followed by turbulent flow, debris interactions, and often a sustained burn/roar. For audio purposes, listeners parse explosions into layers:

2.2 Distance law, air absorption, and the “boom that vanishes” on small speakers

In free field, sound pressure level decays approximately with inverse distance (−6 dB per doubling of distance), assuming a point source and no boundary gain. AR experiences are rarely free-field: a living room adds early reflections and low-frequency modal buildup, while outdoor scenes may have strong ground reflections. Your foley therefore has to hold up across this whole range of acoustic environments, not just the free field it may have been designed for.
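
For quick level planning, the inverse-distance law fits in a one-line function. This is a minimal sketch that assumes a point source in the free field with no boundary gain, reflections, or air absorption. The function name and the 1 m reference are illustrative choices and do not belong to any engine's API.

```python
import math


def spl_at_distance(spl_ref_db: float, r_ref_m: float, r_m: float) -> float:
    """Free-field level at r_m, given a reference level measured at r_ref_m.

    Inverse-distance law: L(r) = L_ref - 20*log10(r / r_ref), i.e. about
    -6.02 dB per doubling of distance. Ignores boundaries and air absorption.
    """
    if r_ref_m <= 0 or r_m <= 0:
        raise ValueError("distances must be positive")
    return spl_ref_db - 20.0 * math.log10(r_m / r_ref_m)


# Relative level at a few AR staging distances, referenced to 0 dB at 1 m.
for r in (1.0, 2.0, 4.0, 30.0, 50.0):
    print(f"{r:5.1f} m: {spl_at_distance(0.0, 1.0, r):6.1f} dB")
```

In a real room, measured falloff is usually shallower than this. Reflections and modal buildup add energy that the free-field model leaves out.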

Air absorption increases with frequency and distance. At 20°C and 50% RH, attenuation above 10 kHz can become noticeable over tens of meters; at 100 m, high-frequency content is heavily reduced compared to low-mid energy. This is critical for AR because visual distance is explicit: if the explosion is rendered as 50 m away but retains lots of 8–12 kHz “crackle,” it contradicts the physics cues.
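
You can put numbers on air absorption by computing the pure-tone absorption coefficient from ISO 9613-1. The sketch below is a transcription of that standard's formula under its stated assumptions (still air, pure tones). Check the constants against the standard itself before relying on them. The function name is illustrative.

```python
import math


def air_absorption_db_per_m(f_hz, temp_c=20.0, rh_pct=50.0, pressure_kpa=101.325):
    """Pure-tone atmospheric absorption coefficient in dB/m (ISO 9613-1 form)."""
    T = temp_c + 273.15              # ambient temperature, K
    T0 = 293.15                      # reference temperature, K
    T01 = 273.16                     # triple-point isotherm temperature, K
    pa_pr = pressure_kpa / 101.325   # pressure relative to the reference atmosphere
    # Molar concentration of water vapour (%) derived from relative humidity.
    c = -6.8346 * (T01 / T) ** 1.261 + 4.6151
    h = rh_pct * (10.0 ** c) / pa_pr
    # Relaxation frequencies of oxygen and nitrogen, Hz.
    fr_o = pa_pr * (24.0 + 4.04e4 * h * (0.02 + h) / (0.391 + h))
    fr_n = pa_pr * (T / T0) ** -0.5 * (
        9.0 + 280.0 * h * math.exp(-4.170 * ((T / T0) ** (-1.0 / 3.0) - 1.0)))
    f2 = f_hz * f_hz
    # Classical/rotational term plus the two vibrational relaxation terms.
    return 8.686 * f2 * (
        1.84e-11 / pa_pr * (T / T0) ** 0.5
        + (T / T0) ** -2.5 * (
            0.01275 * math.exp(-2239.1 / T) / (fr_o + f2 / fr_o)
            + 0.1068 * math.exp(-3352.0 / T) / (fr_n + f2 / fr_n)))


for f in (1000, 4000, 8000, 12000):
    a = air_absorption_db_per_m(f)
    print(f"{f:>6} Hz: {a * 50:5.1f} dB at 50 m, {a * 100:5.1f} dB at 100 m")
```

At the defaults (20°C, 50% RH, sea-level pressure), this gives roughly 5 dB/km at 1 kHz but roughly 0.1 dB/m at 8 kHz. A source 50 m away therefore loses on the order of 5 dB at 8 kHz and around 11 dB at 12 kHz, on top of the inverse-distance loss. One way to apply it is to evaluate a handful of bands per distance and fit a gentle high-shelf or low-pass to them.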

On mobile/headset speakers, energy below ~150–250 Hz may be reproduced weakly or only via psychoacoustic bass enhancement, so the 50–120 Hz “feel” of an explosion translates poorly. Explosion design for AR must therefore carry perceptual low-end cues: harmonics, controlled distortion, or resonant layers that translate on small drivers while staying believable on headphones.
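
The harmonic approach relies on the “missing fundamental” effect. When a driver can reproduce the upper harmonics of a low tone, listeners tend to infer the pitch of the fundamental even though it is absent. The sketch below picks which harmonics of a fundamental survive a given driver cutoff and renders a static test tone from them. The 1/n weighting, the harmonic limit, and the function names are assumptions. Real bass-enhancement processors derive harmonics dynamically from the program material (for example, with a nonlinear waveshaper on a low-passed band), which this sketch does not attempt.

```python
import math


def audible_harmonics(f0_hz, driver_cutoff_hz, max_harmonic=8):
    """Harmonic numbers n >= 2 whose frequency n * f0 is at or above the driver cutoff."""
    return [n for n in range(2, max_harmonic + 1) if n * f0_hz >= driver_cutoff_hz]


def virtual_bass_tone(f0_hz, driver_cutoff_hz, seconds,
                      sample_rate=48000, max_harmonic=8):
    """Static tone built only from reproducible harmonics of f0 (fundamental omitted).

    Harmonics are weighted 1/n (an arbitrary choice) and the result is
    normalized to a peak of 1.0.
    """
    harmonics = audible_harmonics(f0_hz, driver_cutoff_hz, max_harmonic)
    out = []
    for i in range(int(seconds * sample_rate)):
        t = i / sample_rate
        out.append(sum(math.sin(2.0 * math.pi * n * f0_hz * t) / n for n in harmonics))
    peak = max((abs(x) for x in out), default=0.0)
    return [x / peak for x in out] if peak > 0.0 else out


# A 60 Hz "thump" on a driver that rolls off around 150 Hz (hypothetical figures).
print(audible_harmonics(60.0, 150.0, max_harmonic=6))  # → [3, 4, 5, 6]
tone = virtual_bass_tone(60.0, 150.0, seconds=0.25)
```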

2.3 Time structure: rise time, precedence, and head tracking

Humans localize transients using interaural time and level differences (ITD/ILD) and onset timing. A true blast has a steep onset; if your foley uses slow fades or smeared transients, localization becomes ambiguous. Under head tracking, that ambiguity becomes obvious: the sound “swims” instead of staying pinned to its AR anchor.
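
To flag smeared onsets across a batch of assets, a 10–90% rise time is a simple proxy borrowed from edge measurements in signal analysis. This sketch measures it on the rectified waveform. That is crude: real material would usually be measured on a smoothed envelope. Any pass/fail threshold you apply is your own choice, and the helper names are hypothetical.

```python
def rise_time_ms(samples, sample_rate, lo=0.1, hi=0.9):
    """Crude 10-90% onset time measured on the rectified waveform.

    Returns the time between the first sample reaching lo * peak and the
    first sample reaching hi * peak.
    """
    env = [abs(x) for x in samples]
    peak = max(env, default=0.0)
    if peak == 0.0:
        raise ValueError("silent input")
    i_lo = next(i for i, v in enumerate(env) if v >= lo * peak)
    i_hi = next(i for i, v in enumerate(env) if v >= hi * peak)
    return 1000.0 * (i_hi - i_lo) / sample_rate


def linear_attack(attack_ms, sample_rate=48000, hold_samples=1000):
    """Hypothetical test envelope: linear rise to 1.0, then a flat hold."""
    n = int(sample_rate * attack_ms / 1000)
    return [i / n for i in range(n)] + [1.0] * hold_samples


sr = 48000
print(rise_time_ms(linear_attack(2.0), sr))    # steep, blast-like onset
print(rise_time_ms(linear_attack(40.0), sr))   # smeared, fade-in onset
```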

Early reflections are interpreted under the precedence effect: the first arriving wavefront dominates localization, while later arrivals contribute spaciousness. In AR, the real room already provides reflections from loudspeaker playback (or headphone leakage), so adding strong synthetic early reflections can shift apparent source position or blur it.
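
To reason about which early arrivals a renderer would be adding, it helps to compute one. The sketch below uses the image-source method for a single floor reflection. The source is mirrored below the floor, and the reflection's extra path length gives its delay. Under spherical spreading alone, the path ratio also gives its level relative to the direct sound. The sketch assumes a perfectly reflecting floor, 343 m/s, and illustrative heights; the function name is hypothetical.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # dry air at roughly 20 °C


def floor_reflection(src_height_m, listener_height_m, horizontal_m):
    """Delay (ms) and level (dB) of a first-order floor reflection vs. the direct path.

    Image-source model: the source is mirrored below the floor plane. The level
    uses spherical spreading only (perfectly reflecting floor, no absorption).
    """
    direct = math.hypot(horizontal_m, src_height_m - listener_height_m)
    reflected = math.hypot(horizontal_m, src_height_m + listener_height_m)
    delay_ms = 1000.0 * (reflected - direct) / SPEED_OF_SOUND_M_S
    level_db = 20.0 * math.log10(direct / reflected)
    return delay_ms, level_db


# Coffee-table source (0.5 m), standing listener's ears (~1.6 m), 2 m apart.
delay_ms, level_db = floor_reflection(0.5, 1.6, 2.0)
print(f"reflection arrives {delay_ms:.2f} ms late at {level_db:.1f} dB")
```

In that geometry, the floor reflection trails the direct sound by about 1.8 ms and is only about 2 dB lower. With loudspeaker playback in a real room, the physical floor contributes its own reflection as well. A synthetic copy like this is exactly the kind of added early energy that can pull or blur the apparent position.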

2.4 Standards and metrics relevant to AR asset prep

While AR platforms vary, professional pipelines benefit from consistent measurement:
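
Whatever loudness standard your platform adopts (ITU-R BS.1770 integrated loudness is the common broadcast reference), a few raw measurements are cheap to keep consistent across an explosion library. The sketch below computes sample peak, RMS, and crest factor only. It has no K-weighting or gating, so it is not a BS.1770 loudness meter, and the function names are hypothetical. A low crest factor on an explosion is often a sign of heavy bus limiting in the source asset.

```python
import math


def to_dbfs(value):
    """Amplitude relative to full scale (1.0) in dB; -inf for zero."""
    return 20.0 * math.log10(value) if value > 0.0 else float("-inf")


def measure(samples):
    """Sample peak (dBFS), RMS (dBFS), and crest factor (dB) of a float buffer.

    Full scale is a sample value of 1.0 and RMS uses the same reference (no
    +3 dB sine convention). Sample peak only: no oversampling, so true
    inter-sample peaks can be higher. Crest factor is NaN for digital silence.
    """
    if not samples:
        raise ValueError("empty buffer")
    peak = max(abs(x) for x in samples)
    rms = math.sqrt(sum(x * x for x in samples) / len(samples))
    return {
        "peak_dbfs": to_dbfs(peak),
        "rms_dbfs": to_dbfs(rms),
        "crest_db": to_dbfs(peak) - to_dbfs(rms),
    }


# A full-scale 1 kHz sine: peak near 0 dBFS, RMS and crest factor near 3 dB apart.
sine = [math.sin(2.0 * math.pi * 1000.0 * n / 48000.0) for n in range(48000)]
print(measure(sine))
```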

3) Detailed technical analysis: building explosion foley that survives AR

3.1 Source layering architecture (recommended)

A robust AR explosion asset is not a single WAV; it’s a bundle of coherent elements with separate control in-engine:
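
One way to represent such a bundle in code is as a list of layer descriptors that the engine schedules and trims independently. The sketch below is a minimal data model whose layer roles are taken from the physical components described in section 2.1 (shock, debris, roar). The class names, file names, offsets, and gains are all hypothetical placeholders, not recommended values.

```python
from dataclasses import dataclass, field


@dataclass
class Layer:
    name: str                 # hypothetical role, e.g. "shock", "debris", "roar"
    asset: str                # hypothetical file path or procedural generator id
    offset_ms: float = 0.0    # start time relative to the event trigger
    gain_db: float = 0.0      # per-layer trim, adjustable in-engine
    positional: bool = True   # whether the spatializer positions this layer


@dataclass
class ExplosionEvent:
    layers: list[Layer] = field(default_factory=list)

    def schedule(self, trigger_time_s: float) -> dict[str, float]:
        """Absolute start time of each layer for one trigger."""
        return {lay.name: trigger_time_s + lay.offset_ms / 1000.0 for lay in self.layers}

    def linear_gains(self) -> dict[str, float]:
        """Per-layer gains converted from dB to linear amplitude."""
        return {lay.name: 10.0 ** (lay.gain_db / 20.0) for lay in self.layers}


event = ExplosionEvent([
    Layer("shock", "shock_close.wav"),
    Layer("debris", "debris_scatter.wav", offset_ms=120.0, gain_db=-6.0),
    Layer("roar", "roar_loop.wav", offset_ms=60.0, gain_db=-3.0),
])
print(event.schedule(10.0))
```

Keeping offsets and gains as data rather than baking them into one file is what lets the engine re-balance layers per distance, playback device, or intensity setting.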

3.2 Data points: timing and spectral targets that read as “explosion”

Real blasts vary enormously, but for designed foley that reads well in AR, the following heuristics are effective:

3.3 Recording and foley construction methods

AR explosions are typically designed rather than recorded literally, for safety and control. Professional foley/sound design techniques that produce convincing components:

Microphone strategy: even for foley, record multiple perspectives. A practical setup includes a close mic for transient detail (e.g., a dynamic or small-diaphragm condenser) and a mid/far mic for natural air and time smear. Maintain phase awareness: align layers or offset them deliberately, but avoid accidental cancellations around 80–200 Hz, where the “body” lives.
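
Two small helpers make the phase check concrete. The first estimates the offset between a close and a far recording by brute-force cross-correlation. The second lists the comb-filter notch frequencies produced when two equal-level copies are summed with a given time offset Δt; these fall at odd multiples of 1/(2Δt). Both are sketches with hypothetical function names. The brute-force search is only practical for short clips; FFT-based correlation is the usual choice for long ones. With unequal levels, the notches are shallower than the complete cancellation the equal-level case predicts.

```python
def estimate_lag(close, far, max_lag):
    """Lag in samples that best aligns `far` to `close` (positive: `far` arrives later)."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, c in enumerate(close):
            j = i + lag
            if 0 <= j < len(far):
                score += c * far[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def comb_notches_hz(offset_s, f_max_hz):
    """Notches of an equal-level sum of a signal and a copy delayed by offset_s seconds."""
    if offset_s <= 0:
        raise ValueError("offset must be positive")
    notches, k = [], 0
    while True:
        f = (2 * k + 1) / (2.0 * offset_s)   # odd multiples of 1 / (2 * offset)
        if f > f_max_hz:
            return notches
        notches.append(f)
        k += 1


# Synthetic check: an impulse that reaches the far mic 10 samples later.
close = [0.0] * 400
far = [0.0] * 400
close[100] = 1.0
far[110] = 0.6
print(estimate_lag(close, far, max_lag=20))     # 10
print(comb_notches_hz(0.005, 600.0))            # notches for a 5 ms offset
```

A 5 ms offset, about 1.7 m of extra path at 343 m/s, puts the first notch at 100 Hz, right in the body region. Shifting the far mic by the estimated lag removes the comb. If you want a deliberate offset instead, check where its notches land before committing to it.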

3.4 Processing chain: preserving punch under spatial rendering

Spatializers (HRTF binaural, Ambisonics decoders, object-based renderers) can change peak levels and perceived brightness. A conservative processing approach:
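
A conservative way to budget headroom for a spatializer is to treat each per-direction filter (for example, one HRIR) as an FIR filter. For any input whose peak is at most 1, the output peak cannot exceed the sum of the absolute filter coefficients, so 20·log10(Σ|h|) is a worst-case peak gain. The sketch below computes that bound and the peak gain actually produced on a given signal. The three-tap filter is a made-up stand-in, not a real HRIR. In practice, you would take the maximum over every filter in the set.

```python
import math


def fir_filter(x, h):
    """Direct-form FIR convolution; output length is len(x) + len(h) - 1."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for k, hk in enumerate(h):
            y[i + k] += xi * hk
    return y


def worst_case_peak_gain_db(h):
    """Upper bound on peak gain for any input: 20 * log10(sum of |h[k]|)."""
    return 20.0 * math.log10(sum(abs(c) for c in h))


def measured_peak_gain_db(x, h):
    """Peak gain the filter actually produces on signal x."""
    peak_in = max(abs(v) for v in x)
    peak_out = max(abs(v) for v in fir_filter(x, h))
    return 20.0 * math.log10(peak_out / peak_in)


h = [0.5, 1.2, 0.3]                               # hypothetical stand-in filter
print(worst_case_peak_gain_db(h))                 # bound for any input
print(measured_peak_gain_db([1.0], h))            # a single click
print(measured_peak_gain_db([1.0, 1.0, 1.0], h))  # a short step reaches the bound
```

The gap between the value measured on your actual explosion assets and the bound shows how much of the reserved headroom you are really using.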

3.5 AR-specific: environmental integration without double-rooming

In AR, a common failure mode is double-rooming: you add a big tail, then the user’s room adds another, resulting in a smeared, detached explosion. Strategies:
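
When an asset arrives with its tail baked in, one option for limiting double-rooming is to shorten that tail before it meets the user's room. The sketch below assumes the baked tail decays roughly exponentially (a single slope) with a known or measured RT60. An exponential decay loses 60/RT60 dB per second, so multiplying by an extra decay of 60/RT_target - 60/RT_orig dB per second yields the target RT60. The function names and the target value are hypothetical, and multi-slope tails would need more than one stage.

```python
def tail_shortening_envelope(rt_orig_s, rt_target_s, seconds, sample_rate=48000):
    """Gain envelope that shortens an exponential tail from rt_orig_s to rt_target_s (RT60).

    Extra decay rate r = 60/rt_target - 60/rt_orig dB/s, so the combined
    decay rate becomes 60/rt_target dB/s.
    """
    if not 0.0 < rt_target_s <= rt_orig_s:
        raise ValueError("target RT60 must be positive and no longer than the original")
    rate_db_per_s = 60.0 / rt_target_s - 60.0 / rt_orig_s
    n = int(seconds * sample_rate)
    return [10.0 ** (-rate_db_per_s * (i / sample_rate) / 20.0) for i in range(n)]


def apply_envelope(samples, envelope):
    """Multiply a tail by the envelope, sample by sample (truncates to the shorter)."""
    return [s * g for s, g in zip(samples, envelope)]


# A baked 3 s tail pulled down to 1 s; start the envelope where the tail starts,
# not at the transient, so the attack is left untouched.
env = tail_shortening_envelope(rt_orig_s=3.0, rt_target_s=1.0, seconds=1.0)
```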

3.6 Visual description: a practical layer timeline diagram

Imagine a horizontal timeline from 0 to 6 seconds:

4) Real-world implications: mixing, playback systems, and user safety

4.1 Headphones vs speakers: the AR split

AR users may listen on open speakers (phone/tablet), near-ear transducers, or sealed headphones. Each changes your explosion design priorities:

4.2 Safety and comfort

Explosions can cause discomfort if dynamic peaks are aggressive, especially in headsets. A pragmatic engineering stance is to cap effect peaks on the mix bus rather than by crushing individual assets, and to keep loudness consistent relative to UI and dialogue. If your platform supports it, provide a “reduced intensity” mode that lowers transient level and LF body by a few dB without destroying the event’s readability.
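
A bus-level cap can be as simple as a gain computer with instantaneous attack and exponential release, which guarantees that the output never exceeds the ceiling. This is a sketch, not a production limiter. It has no look-ahead, so gain drops on the very sample that overshoots, which can distort sharp transients. It also detects sample peaks, not true peaks. The -1 dBFS default, the 50 ms release, and the 3 dB reduced-intensity trims (standing in for “a few dB”) are assumptions, and the layer names are hypothetical.

```python
import math

# Hypothetical reduced-intensity trims; "a few dB" is the guidance, 3 dB is assumed.
REDUCED_INTENSITY_DB = {"transient": -3.0, "lf_body": -3.0}


def layer_gain_db(layer, base_gain_db, reduced_intensity):
    """Per-layer gain with the optional reduced-intensity trim applied."""
    trim = REDUCED_INTENSITY_DB.get(layer, 0.0) if reduced_intensity else 0.0
    return base_gain_db + trim


def peak_cap(samples, ceiling_dbfs=-1.0, release_ms=50.0, sample_rate=48000):
    """Instant-attack, exponential-release gain reduction on a mix bus.

    Gain is clamped to the target on overshoot and otherwise moves toward the
    target without passing it, so |output| never exceeds the ceiling (up to
    float rounding).
    """
    ceiling = 10.0 ** (ceiling_dbfs / 20.0)
    coeff = 1.0 - math.exp(-1.0 / (release_ms * 1e-3 * sample_rate))
    gain, out = 1.0, []
    for x in samples:
        target = min(1.0, ceiling / abs(x)) if x != 0.0 else 1.0
        if target < gain:
            gain = target                      # attack: clamp immediately
        else:
            gain += (target - gain) * coeff    # release: recover gradually
        out.append(x * gain)
    return out


burst = [0.0] * 100 + [1.0, -0.95, 0.9, -0.8] + [0.1] * 1000
capped = peak_cap(burst, ceiling_dbfs=-6.0)
```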

4.3 Asset memory and CPU budgets

Long, high-sample-rate stereo files consume bandwidth and memory. AR benefits from a hybrid approach: short PCM assets for transient/body, and procedurally generated or parameterized tails (reverb, filtered noise) computed in-engine. When streaming is required, make sure your codec doesn’t smear transients with pre-echo; explosions are particularly revealing of transform-coding artifacts.
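
The memory side of that trade-off is simple arithmetic, and a parameterized tail can be a few lines of DSP. The sketch below computes raw PCM size (no container or codec). It also generates a decaying, one-pole low-passed noise tail at runtime from three parameters. The noise model, the seeded generator, and the function names are illustrative assumptions, not any engine's API.

```python
import math
import random


def pcm_bytes(seconds, sample_rate, channels, bit_depth):
    """Raw PCM payload in bytes (no container header, no compression)."""
    return int(seconds * sample_rate) * channels * (bit_depth // 8)


def procedural_tail(seconds, rt60_s, cutoff_hz, sample_rate=48000, seed=0):
    """Exponentially decaying, one-pole low-passed white noise generated at runtime."""
    rng = random.Random(seed)                                     # deterministic per seed
    a = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)  # one-pole coefficient
    step = 10.0 ** (-60.0 / (rt60_s * sample_rate) / 20.0)        # -60 dB after rt60_s
    y, gain, out = 0.0, 1.0, []
    for _ in range(int(seconds * sample_rate)):
        y += a * (rng.uniform(-1.0, 1.0) - y)
        out.append(y * gain)
        gain *= step
    return out


print(pcm_bytes(6.0, 48000, 2, 24))   # 6 s stereo, 24-bit: 1728000
print(pcm_bytes(0.5, 48000, 1, 16))   # 0.5 s mono transient, 16-bit: 48000
tail = procedural_tail(seconds=2.0, rt60_s=1.5, cutoff_hz=800.0)
```

Here the stored transient is about 3% of the size of a fully rendered 6 s stereo file, while the tail costs only its parameters plus CPU time.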

5) Case studies: professional patterns that work

5.1 “Tabletop detonation” in a living room AR demo

Scenario: a small virtual charge detonates on a coffee table, with users standing 1–2 m away. The room is the user’s actual living room; you cannot predict RT60.

5.2 “Street-level blast” in an outdoor AR navigation experience

Scenario: a virtual explosion occurs 30–50 m down a street. Visuals show a flash and dust plume.
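
One physical cue worth checking in this scenario is the flash-to-bang delay. At 30–50 m the light is effectively instantaneous, but the sound takes roughly 87 to 146 ms to arrive at 20°C. The sketch below uses the common ideal-gas approximation c ≈ 331.3·sqrt(1 + T/273.15) m/s for dry air. Whether the experience honors this delay or syncs the audio to the flash is a design decision, and the function names are hypothetical.

```python
import math


def speed_of_sound(temp_c=20.0):
    """Ideal-gas approximation for dry air, m/s."""
    return 331.3 * math.sqrt(1.0 + temp_c / 273.15)


def flash_to_bang_ms(distance_m, temp_c=20.0):
    """Delay between the (effectively instantaneous) visual flash and the sound's arrival."""
    return 1000.0 * distance_m / speed_of_sound(temp_c)


for d in (30.0, 40.0, 50.0):
    print(f"{d:4.0f} m: audio starts {flash_to_bang_ms(d):5.1f} ms after the flash")
```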

5.3 “Cinematic” explosion adapted for head-tracked binaural

Scenario: a pre-existing library explosion designed for film is reused in AR. Typical issues: heavy bus limiting, wide stereo low end, baked reverb, and phasey layers.
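
For the wide-stereo-low-end problem, one common remedy is to collapse the side channel below a crossover. Low frequencies then exist only in the mid (mono) signal, which a spatializer can position as a single source. The sketch assumes a 120 Hz crossover (a hypothetical choice) and a gentle first-order split. It is not a linear-phase crossover, and the function name is illustrative.

```python
import math


def mono_below(left, right, crossover_hz=120.0, sample_rate=48000):
    """Collapse stereo width below roughly crossover_hz using mid/side processing.

    The side channel is high-passed (side minus its one-pole low-pass), so
    low-frequency content survives only in the mid channel. The split is
    first order (6 dB/octave); steeper crossovers are a common alternative.
    """
    a = 1.0 - math.exp(-2.0 * math.pi * crossover_hz / sample_rate)
    lp = 0.0
    out_l, out_r = [], []
    for l, r in zip(left, right):
        mid = 0.5 * (l + r)
        side = 0.5 * (l - r)
        lp += a * (side - lp)          # one-pole low-pass of the side signal
        side_hp = side - lp            # remaining higher-frequency side content
        out_l.append(mid + side_hp)
        out_r.append(mid - side_hp)
    return out_l, out_r


# Anti-phase DC-like content (pure low-frequency "width") decays to silence,
# while identical left/right content passes through unchanged.
wide_l, wide_r = mono_below([1.0] * 4000, [-1.0] * 4000)
```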

6) Common misconceptions (and what actually works)

7) Future trends: where AR explosion audio is heading

8) Key takeaways for practicing engineers

AR doesn’t require that explosions be “real.” It requires that they be consistent with the user’s sensory context. When the transient is localizable, the spectrum implies the correct scale, and the environmental response doesn’t fight the room the user is standing in, explosions stop sounding like a sound effect and start sounding like an event.