How to Create Whooshes from Scratch

By Marcus Chen

1) Introduction: why “whoosh” is a technical problem

A convincing whoosh is not “just noise with a filter sweep.” In professional sound design, a whoosh has to read as motion, scale, and intent—often in less than a second—while remaining mix-ready across playback systems. The technical question is: what acoustic cues make the ear interpret a broadband event as an object moving past us, or energy ramping through space?

At a minimum, a whoosh must encode: (1) evolving spectrum (brightening/dulling), (2) amplitude envelope (approach/impact/decay), (3) spatial trajectory (pan/width/depth), and often (4) turbulence or modulation that implies airflow. The best synthetic whooshes deliberately shape these cues using engineering principles: time-varying filters, controlled modulation, psychoacoustic loudness management, and spatial processing aligned with localization science.

2) Background: physics and engineering principles behind whooshes

2.1 Spectral motion and auditory interpretation

The auditory system treats changes in spectral centroid and bandwidth as proxies for changing excitation and distance. A rising spectral centroid is commonly interpreted as an approach or acceleration; a falling centroid reads as departure or dissipation. This is not mystical: it’s a learned mapping from everyday sources (airflow, friction, engines) where higher velocities tend to increase broadband noise and shift energy upward.

2.2 Turbulence, “air,” and why noise works

Many whooshes are fundamentally shaped turbulence. Airflow noise is broadband with energy often concentrated in mid/high bands, and its “alive” character comes from correlated amplitude modulation across bands. Simple white noise is too static; it lacks the slow, random fluctuations (“gustiness”) characteristic of real turbulence.

Two useful models for synthetic turbulence are:

2.3 Motion cues: Doppler, level change, and spectral tilt

True Doppler shift is often overused or misapplied in whooshes. In real pass-bys, Doppler is noticeable on tonal components and narrowband resonances (engines, whistles), not on purely broadband noise. For broadband whooshes, the dominant motion cues are the other two named in this section's heading: level change and spectral tilt.

2.4 Engineering standards and constraints

Whooshes are often used in film/TV and games where deliverables may be measured under standards such as ITU-R BS.1770 loudness (LUFS) and true-peak constraints. Even when not strictly specified, mix translation demands headroom and consistent perceived loudness. A whoosh with uncontrolled sub energy can steal headroom, trigger limiters, and mask dialogue.

3) Detailed technical analysis: building blocks, data points, and repeatable recipes

3.1 Choose a synthesis topology: subtractive, hybrid, or convolutional

For “from scratch” work, three practical topologies cover most professional needs:

3.2 Noise source selection: white vs pink vs shaped

Start with noise whose baseline spectrum matches your target:

Practical data point: in most cinematic mixes, a whoosh rarely needs significant energy below 40–60 Hz unless it’s intentionally a “sub swoop.” A safe starting high-pass is 24 dB/oct at 30–50 Hz, then adjust by context.
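As a rough illustration of that starting high-pass, the sketch below cascades two second-order sections into a 4th-order (24 dB/oct) Butterworth response, using the widely published RBJ "Audio EQ Cookbook" biquad formulas. The function names, the 40 Hz default, and the 48 kHz sample rate are assumptions chosen for the example, not values prescribed by this article.

```python
import math

def highpass_biquad(fc, fs, q):
    """RBJ cookbook high-pass coefficients, normalized so a0 == 1."""
    w0 = 2.0 * math.pi * fc / fs
    alpha = math.sin(w0) / (2.0 * q)
    cos_w0 = math.cos(w0)
    a0 = 1.0 + alpha
    b = [(1.0 + cos_w0) / 2.0 / a0, -(1.0 + cos_w0) / a0, (1.0 + cos_w0) / 2.0 / a0]
    a = [1.0, -2.0 * cos_w0 / a0, (1.0 - alpha) / a0]
    return b, a

def biquad(x, b, a):
    """Direct form I second-order filter over a list of samples."""
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for s in x:
        out = b[0] * s + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, s
        y2, y1 = y1, out
        y.append(out)
    return y

def highpass_24db(x, fc=40.0, fs=48000):
    """4th-order Butterworth high-pass: two biquads with the Butterworth Q pair."""
    for q in (0.54119610, 1.30656296):
        b, a = highpass_biquad(fc, fs, q)
        x = biquad(x, b, a)
    return x
```

Two octaves below the corner (10 Hz for a 40 Hz cutoff) this attenuates by roughly 48 dB, which is the point of choosing a steep slope: sub energy is removed without touching the audible body of the whoosh.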

3.3 The envelope: the simplest predictor of believability

Use an amplitude envelope that implies pass-by physics rather than a generic fade. A common shape is an asymmetric rise and fall with a short “proximity peak.” Suggested starting values (adjust to tempo and picture):

To avoid a flat, synthesizer-like contour, add 1–3 dB of slow random amplitude variation (low-passed noise at 2–8 Hz) on top of the macro envelope.
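One possible way to build such an envelope is sketched below: an asymmetric rise and fall around a peak, multiplied by a slowly wandering gain derived from one-pole low-passed noise. The peak position, curve exponents, 5 Hz rate, and the interpretation of depth as plus or minus `depth_db` are all illustrative assumptions, not the article's suggested values.

```python
import math
import random

def macro_envelope(n, peak_pos=0.65, rise_curve=3.0, fall_curve=2.0):
    """Asymmetric rise/fall envelope that reaches 1.0 at peak_pos (fraction of length)."""
    peak = int(n * peak_pos)
    env = []
    for i in range(n):
        if i <= peak:
            env.append((i / peak) ** rise_curve)
        else:
            env.append((1.0 - (i - peak) / (n - 1 - peak)) ** fall_curve)
    return env

def slow_gain(n, fs=48000, rate_hz=5.0, depth_db=1.0, seed=0):
    """Linear gain that wanders within +/- depth_db, from low-passed noise."""
    rng = random.Random(seed)
    coeff = math.exp(-2.0 * math.pi * rate_hz / fs)  # one-pole low-pass coefficient
    state, lp = 0.0, []
    for _ in range(n):
        state = coeff * state + (1.0 - coeff) * rng.uniform(-1.0, 1.0)
        lp.append(state)
    peak = max(abs(v) for v in lp) or 1.0
    # Normalize to +/-1, scale to +/-depth_db, convert dB to linear gain
    return [10.0 ** (depth_db * v / peak / 20.0) for v in lp]

def gusty_envelope(n, fs=48000, **kw):
    """Macro envelope multiplied by the slow random gain."""
    env = macro_envelope(n)
    gain = slow_gain(n, fs, **kw)
    return [e * g for e, g in zip(env, gain)]
```

Multiplying a noise buffer by `gusty_envelope(len(noise))` applies both layers at once; changing `seed` gives a different "gust" pattern on the same macro shape.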

3.4 Time-varying filtering: spectral centroid as the motion control

A whoosh’s “travel” is primarily a controlled shift of spectral energy. Implement it with:

Specific, mix-proven ranges:

Visual description (spectrum over time): imagine a spectrogram where a dense noise cloud “tilts” upward—energy gradually climbing from midrange into the presence band—then thinning and darkening after the peak.
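The upward "tilt" described above can be approximated with a band-pass filter whose center frequency rises over time. The sketch below uses a Chamberlin state-variable filter with an exponential sweep; the 400 Hz to 6 kHz range and the Q of 2 are placeholder assumptions, and only the rising half of the gesture is modeled. A zero-crossing-rate helper is included as a crude brightness proxy.

```python
import math
import random

def sweep_bandpass(noise, fs=48000, f_start=400.0, f_end=6000.0, q=2.0):
    """Chamberlin state-variable band-pass with an exponential center-frequency sweep."""
    n = len(noise)
    low = band = 0.0
    damp = 1.0 / q
    out = []
    for i, x in enumerate(noise):
        t = i / max(n - 1, 1)                      # 0..1 position in the sweep
        fc = f_start * (f_end / f_start) ** t      # exponential (musical) sweep
        f = 2.0 * math.sin(math.pi * fc / fs)      # SVF frequency coefficient
        low += f * band
        high = x - low - damp * band
        band += f * high
        out.append(band)
    return out

def zero_crossing_rate(x):
    """Fraction of adjacent sample pairs that change sign (rough brightness proxy)."""
    crossings = sum(1 for a, b in zip(x, x[1:]) if (a < 0.0) != (b < 0.0))
    return crossings / len(x)

def demo(n=24000, seed=0):
    """Filter seeded white noise and compare brightness of the first and last quarter."""
    rng = random.Random(seed)
    noise = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    y = sweep_bandpass(noise)
    quarter = n // 4
    return zero_crossing_rate(y[:quarter]), zero_crossing_rate(y[-quarter:])
```

This form of state-variable filter is only stable when the center frequency stays well below the sample rate (roughly under one sixth of it), which is why the sweep here tops out at 6 kHz for 48 kHz audio.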

3.5 Modulation: making noise feel like moving air

Static filtered noise reads as “static.” Add controlled modulation in multiple domains:

Engineering note: if the whoosh may be summed to mono, keep inter-channel delays very short and check correlation regularly. Extremely wide, decorrelated noise can partially cancel in the mono sum, changing the perceived level and brightness.
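A simple way to put numbers on that check is to compute the zero-lag correlation between channels and the level change when folding to mono. The helper names below are hypothetical, and the fold-down is assumed to be a plain (L+R)/2 sum.

```python
import math

def _rms(x):
    return math.sqrt(sum(v * v for v in x) / len(x))

def correlation(left, right):
    """Zero-lag correlation: +1 is effectively mono, 0 decorrelated, -1 phase-inverted."""
    num = sum(l * r for l, r in zip(left, right))
    den = math.sqrt(sum(l * l for l in left) * sum(r * r for r in right))
    return num / den if den else 0.0

def mono_fold_change_db(left, right):
    """Level of (L+R)/2 relative to the average per-channel level, in dB."""
    mono = [(l + r) * 0.5 for l, r in zip(left, right)]
    ref = math.sqrt((_rms(left) ** 2 + _rms(right) ** 2) / 2.0)
    mono_rms = _rms(mono)
    if mono_rms == 0.0 or ref == 0.0:
        return float("-inf")
    return 20.0 * math.log10(mono_rms / ref)
```

Identical channels give a correlation of 1 and no level change; fully decorrelated channels give a correlation near 0 and lose about 3 dB in the mono sum. Values trending negative are the warning sign for audible thinning.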

3.6 Nonlinearity and “edge”: saturation that survives the mix

Broadband elements can disappear under music because they lack midrange density. Moderate saturation increases perceived loudness and presence without requiring large peak levels. A practical chain is:

Keep true-peak headroom. If your deliverable is capped at -1 dBTP (a common streaming constraint), design the whoosh to peak below that even after bus processing.
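To illustrate why saturation raises density without raising peaks, the sketch below uses a tanh soft clipper as a stand-in for the saturation stage (an assumption, not this article's specific chain) and then trims to a ceiling. Note that it measures sample peak only; true-peak measurement per ITU-R BS.1770 requires oversampling, so the -1.5 dBFS ceiling here is an assumed safety margin below a -1 dBTP limit.

```python
import math

def saturate(x, drive_db=6.0):
    """tanh soft clip; normalized so a full-scale input still maps to full scale."""
    g = 10.0 ** (drive_db / 20.0)
    norm = math.tanh(g)
    return [math.tanh(g * v) / norm for v in x]

def peak_dbfs(x):
    """Sample peak in dBFS (not true peak: no oversampling is applied)."""
    p = max(abs(v) for v in x)
    return 20.0 * math.log10(p) if p > 0.0 else float("-inf")

def rms_db(x):
    """Un-weighted RMS level in dBFS."""
    ms = sum(v * v for v in x) / len(x)
    return 10.0 * math.log10(ms) if ms > 0.0 else float("-inf")

def trim_to_ceiling(x, ceiling_db=-1.5):
    """Scale the buffer so its sample peak sits exactly at ceiling_db."""
    p = max(abs(v) for v in x)
    if p == 0.0:
        return list(x)
    gain = 10.0 ** (ceiling_db / 20.0) / p
    return [v * gain for v in x]
```

Comparing `rms_db(trim_to_ceiling(saturate(x)))` against `rms_db(trim_to_ceiling(x))` shows the effect: at the same peak ceiling, the saturated version carries more RMS energy, which is what helps it survive under music.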

3.7 Spatial motion: pan is not enough

A convincing whoosh often needs a trajectory: left-to-right, front-to-back, expanding width, or “overhead.” You can synthesize this using a combination of:

If mixing for immersive formats, object-based panning plus height sends can add realism. In stereo-only contexts, avoid hard-left-to-hard-right moves unless you want a stylized “wipe.” Many natural pass-bys never fully collapse to one speaker; they arc through a field.
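For the stereo case, a minimal sketch of an arc that never fully collapses to one speaker is a constant-power pan whose position is limited to a partial range. The plus or minus 0.7 limits and the linear motion are illustrative assumptions.

```python
import math

def pan_arc(mono, start=-0.7, end=0.7):
    """Constant-power pan from start to end, where -1 is hard left and +1 hard right."""
    n = len(mono)
    left, right = [], []
    for i, s in enumerate(mono):
        pos = start + (end - start) * i / max(n - 1, 1)
        theta = (pos + 1.0) * math.pi / 4.0   # map -1..+1 to 0..pi/2
        left.append(s * math.cos(theta))      # cos/sin gains keep L^2 + R^2 constant
        right.append(s * math.sin(theta))
    return left, right
```

Because the position stops short of the extremes, both channels always carry some signal, and the sin/cos law keeps total power steady so the motion does not read as a level dip in the center.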

3.8 A repeatable “from-scratch” recipe (subtractive core)

Core chain: Pink noise → HPF → band-pass sweep → gentle saturation → transient shaping (optional) → stereo decorrelation → short ambience

This produces a “neutral cinematic” whoosh that can then be specialized: brighter and shorter for UI; darker and longer for large-scale transitions; more resonant and distorted for sci-fi.

4) Real-world implications: designing whooshes that mix and translate

4.1 Loudness, crest factor, and why whooshes cause limiter pumping

Whooshes can have high RMS energy over a short window, which is exactly what makes bus compression and limiting react. If a whoosh is too broadband and too loud, it can cause “ducking” of dialogue or music, even if peaks aren’t extreme. Manage this by:

Practical check: monitor short-term loudness (3 s window) and momentary loudness (400 ms window) if available. Even without targeting a specific LUFS for a single effect, these meters reveal whether your whoosh will dominate program loudness briefly.
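When no loudness meter is at hand, a sliding-window RMS gives a rough approximation of the same check. The sketch below is deliberately simplified: it omits the K-weighting filter that ITU-R BS.1770 applies, so it reports un-weighted dBFS values, not LUFS. The function name and the 100 ms hop are assumptions.

```python
import math

def windowed_level_db(x, fs, window_s, hop_s=0.1):
    """Sliding-window mean-square level in dBFS (un-weighted, so not LUFS)."""
    win = max(1, int(window_s * fs))
    hop = max(1, int(hop_s * fs))
    # Running sum of squares lets each window be evaluated in O(1)
    cum = [0.0]
    for v in x:
        cum.append(cum[-1] + v * v)
    levels = []
    for start in range(0, max(1, len(x) - win + 1), hop):
        end = min(start + win, len(x))
        ms = (cum[end] - cum[start]) / (end - start)
        levels.append(10.0 * math.log10(ms) if ms > 0.0 else float("-inf"))
    return levels

def momentary_and_short_term_max(x, fs):
    """Maximum level over 400 ms and 3 s windows."""
    return max(windowed_level_db(x, fs, 0.4)), max(windowed_level_db(x, fs, 3.0))
```

A short, dense whoosh typically shows a much higher 400 ms maximum than 3 s maximum; a large gap between the two is a hint that the effect may briefly dominate the program and trigger bus dynamics.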

4.2 Mono compatibility and playback systems

Many whooshes are used in environments where mono fold-down happens (mobile devices, TVs at distance, in-store systems). If your whoosh relies on phasey widening, it may thin out in mono. Confirm by summing to mono and checking:

If it does, reduce inter-channel delay, favor mid/side EQ over phasey wideners, and keep essential energy in the mid channel.

5) Case studies: professional workflows and examples

5.1 Film trailer transition whoosh (big but controlled)

Goal: A 1.2 s whoosh that feels massive, supports a title reveal, and doesn’t obscure narration.

Build:

Mix notes: dynamic EQ dips 2–4 kHz by 1–3 dB during narration-sensitive moments; sidechain from dialogue bus if needed. Peak managed to remain under -1 dBTP post-bus limiting.

5.2 UI swipe whoosh (fast, tactile, consistent across devices)

Goal: 120–250 ms swish that reads on phone speakers.

Build: white noise → steep band-pass around 1–6 kHz → fast envelope → light transient shaping.

Consistency trick: normalize perceived level using short-term loudness, not peak normalization. Two swishes with identical peaks can feel very different in loudness depending on bandwidth and saturation.
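The difference between the two normalization strategies can be seen in a small sketch. Un-weighted RMS is used here as a simple stand-in for a proper loudness measurement, and the -20 dB target is an arbitrary example value.

```python
import math

def peak(x):
    """Absolute sample peak (linear)."""
    return max(abs(v) for v in x)

def rms_db(x):
    """Un-weighted RMS level in dBFS."""
    ms = sum(v * v for v in x) / len(x)
    return 10.0 * math.log10(ms) if ms > 0.0 else float("-inf")

def normalize_rms(x, target_db=-20.0):
    """Scale the buffer so its RMS level equals target_db."""
    current = rms_db(x)
    if current == float("-inf"):
        return list(x)
    gain = 10.0 ** ((target_db - current) / 20.0)
    return [v * gain for v in x]
```

A sine and a square wave with the same peak differ by about 3 dB in RMS, so peak-matching them leaves the denser one sounding louder. After `normalize_rms`, their levels match and the peaks differ instead, which is the trade this consistency trick accepts.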

5.3 Game whoosh library (variation without chaos)

Goal: Dozens of whooshes that share a signature but don’t repeat obviously.

Workflow: design a parametric patch where randomness is bounded:

This produces controlled variation that remains mix-compatible and stylistically coherent.
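Bounded randomness of this kind can be sketched as a parameter set drawn from fixed ranges with a per-variation seed. Every parameter name and range below is hypothetical, chosen only to show the pattern; a real patch would expose whatever controls its signal chain actually has.

```python
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class WhooshParams:
    duration_s: float
    peak_pos: float
    f_start_hz: float
    f_end_hz: float
    drive_db: float

# Hypothetical (low, high) bounds that define the library's shared "signature"
BOUNDS = {
    "duration_s": (0.6, 1.2),
    "peak_pos": (0.55, 0.75),
    "f_start_hz": (300.0, 600.0),
    "f_end_hz": (4000.0, 7000.0),
    "drive_db": (3.0, 9.0),
}

def make_variation(seed):
    """Draw one parameter set; the same seed always yields the same variation."""
    rng = random.Random(seed)
    return WhooshParams(**{name: rng.uniform(lo, hi) for name, (lo, hi) in BOUNDS.items()})

def make_library(count, base_seed=0):
    """Generate a reproducible list of bounded variations."""
    return [make_variation(base_seed + i) for i in range(count)]
```

Seeding each variation makes the library reproducible: a whoosh that works can be regenerated exactly, and tightening or widening a single entry in `BOUNDS` changes the spread of the whole set without touching the patch.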

6) Common misconceptions (and what to do instead)

7) Future trends: where whoosh design is heading

8) Key takeaways for practicing engineers

When whooshes are built from first principles—envelope physics, spectral motion, turbulence modulation, and robust spatial strategy—they stop being generic transitions and become controllable, scene-specific tools. The result is not just a better “swoosh,” but a mix-aware motion cue that reads immediately and holds up under professional delivery constraints.