
How to Create Whooshes from Scratch
1) Introduction: why “whoosh” is a technical problem
A convincing whoosh is not “just noise with a filter sweep.” In professional sound design, a whoosh has to read as motion, scale, and intent—often in less than a second—while remaining mix-ready across playback systems. The technical question is: what acoustic cues make the ear interpret a broadband event as an object moving past us, or energy ramping through space?
At a minimum, a whoosh must encode: (1) evolving spectrum (brightening/dulling), (2) amplitude envelope (approach/impact/decay), (3) spatial trajectory (pan/width/depth), and often (4) turbulence or modulation that implies airflow. The best synthetic whooshes deliberately shape these cues using engineering principles: time-varying filters, controlled modulation, psychoacoustic loudness management, and spatial processing aligned with localization science.
2) Background: physics and engineering principles behind whooshes
2.1 Spectral motion and auditory interpretation
The auditory system treats changes in spectral centroid and bandwidth as proxies for changing excitation and distance. A rising spectral centroid is commonly interpreted as an approach or acceleration; a falling centroid reads as departure or dissipation. This is not mystical: it’s a learned mapping from everyday sources (airflow, friction, engines) where higher velocities tend to increase broadband noise and shift energy upward.
2.2 Turbulence, “air,” and why noise works
Many whooshes are fundamentally shaped turbulence. Airflow noise is broadband with energy often concentrated in mid/high bands, and its “alive” character comes from correlated amplitude modulation across bands. Simple white noise is too static; it lacks the slow, random fluctuations (“gustiness”) characteristic of real turbulence.
Two useful models for synthetic turbulence are:
- Filtered noise (white noise into time-varying band-pass or shelving filters) to set timbre and motion.
- Noise with low-frequency modulation (e.g., 2–15 Hz random or quasi-random AM) to simulate gusts without audible tremolo.
2.3 Motion cues: Doppler, level change, and spectral tilt
True Doppler shift is often overused or misapplied in whooshes. In real pass-bys, Doppler is noticeable on tonal components and narrowband resonances (engines, whistles), not on purely broadband noise. For broadband whooshes, the dominant motion cues are:
- Amplitude envelope (approach = increasing, pass = peak, depart = decreasing).
- Spectral tilt (closer = more high-frequency detail; farther/occluded = rolled-off highs).
- Interaural cues: ILD (level difference) and ITD (time difference) and evolving inter-channel coherence.
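To put a number on the Doppler point above, here is a minimal sketch of the textbook formula for a source moving straight toward or away from a stationary listener. The speed of sound (343 m/s) and the 30 m/s pass-by speed are assumed illustration values, and the function names are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at about 20 °C (assumed value)


def doppler_ratio(radial_speed, c=SPEED_OF_SOUND):
    """Observed/emitted frequency ratio for a moving source, static listener.

    radial_speed > 0: source approaching; radial_speed < 0: source receding.
    """
    if abs(radial_speed) >= c:
        raise ValueError("this sketch only covers subsonic sources")
    return c / (c - radial_speed)


def ratio_to_semitones(ratio):
    """Convert a frequency ratio to a pitch interval in semitones."""
    return 12.0 * math.log2(ratio)


# Hypothetical pass-by at 30 m/s (108 km/h), heard head-on.
approach = ratio_to_semitones(doppler_ratio(30.0))
depart = ratio_to_semitones(doppler_ratio(-30.0))
swing_semitones = approach - depart  # total pitch drop across the pass
```

With these assumed numbers the swing is roughly three semitones. That is obvious on a whistle or engine tone, but scaling the frequencies of spectrally flat noise by the same ratio leaves it sounding nearly unchanged, which is why a tonal or resonant layer is needed for Doppler to register.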
2.4 Engineering standards and constraints
Whooshes are often used in film/TV and games where deliverables may be measured under standards such as ITU-R BS.1770 loudness (LUFS) and true-peak constraints. Even when not strictly specified, mix translation demands headroom and consistent perceived loudness. A whoosh with uncontrolled sub energy can steal headroom, trigger limiters, and mask dialogue.
3) Detailed technical analysis: building blocks, data points, and repeatable recipes
3.1 Choose a synthesis topology: subtractive, hybrid, or convolutional
For “from scratch” work, three practical topologies cover most professional needs:
- Subtractive: noise source → time-varying filters → modulation → saturation → spatial.
- Hybrid: subtractive core + tonal/resonant layer (FM, wavetable, resonator) to add identity.
- Convolutional / resonant: noise excites an impulse response or physical model (tube, plate, cavity) to imply size.
3.2 Noise source selection: white vs pink vs shaped
Start with noise whose baseline spectrum matches your target:
- White noise: equal power per Hz; sounds bright and “hissy.” Useful for sci-fi, fast, sharp whooshes.
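- Pink noise: power falls as ~1/f (about -3 dB/octave, i.e. equal power per octave); more energy in lows; reads fuller, more “wind-like.”
- Brown/red noise: power falls as ~1/f² (about -6 dB/octave); even more low-heavy; can become boomy; use cautiously and high-pass.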
Practical data point: in most cinematic mixes, a whoosh rarely needs significant energy below 40–60 Hz unless it’s intentionally a “sub swoop.” A safe starting high-pass is 24 dB/oct at 30–50 Hz, then adjust by context.
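The three noise colors can be sketched with the standard library alone. This is an illustrative approximation, not a calibrated generator: pink noise uses the Voss-McCartney scheme (several held random values refreshed at octave-spaced rates), brown noise uses a leaky integrator, and the `rows` and `leak` parameters are assumed values. The zero-crossing rate is included only as a crude brightness check.

```python
import random


def white_noise(n, rng):
    """Uniform white noise in [-1, 1]: flat power per Hz."""
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]


def pink_noise(n, rng, rows=8):
    """Voss-McCartney approximation of pink (about -3 dB/octave) noise."""
    held = [rng.uniform(-1.0, 1.0) for _ in range(rows)]
    out = []
    for i in range(n):
        if i > 0:
            # Row k is refreshed every 2**(k+1) samples: k = trailing zero bits of i.
            k = (i & -i).bit_length() - 1
            if k < rows:
                held[k] = rng.uniform(-1.0, 1.0)
        # Add one fresh white sample so the top octave is not left empty.
        out.append((sum(held) + rng.uniform(-1.0, 1.0)) / (rows + 1))
    return out


def brown_noise(n, rng, leak=0.99):
    """Leaky integration of white noise (about -6 dB/octave above the leak corner)."""
    y, out = 0.0, []
    for _ in range(n):
        y = leak * y + (1.0 - leak) * rng.uniform(-1.0, 1.0)
        out.append(y)
    return out


def zero_crossing_rate(x):
    """Fraction of adjacent sample pairs that change sign (brightness proxy)."""
    crossings = sum(1 for a, b in zip(x, x[1:]) if (a < 0.0) != (b < 0.0))
    return crossings / (len(x) - 1)


rng = random.Random(1)
white = white_noise(20000, rng)
pink = pink_noise(20000, rng)
brown = brown_noise(20000, rng)
```

The brown output is quiet because of the `(1 - leak)` scaling, so normalize it before comparing levels by ear. None of these generators applies the high-pass mentioned above; that stage is still needed afterwards.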
3.3 The envelope: the simplest predictor of believability
Use an amplitude envelope that implies pass-by physics rather than a generic fade. A common shape is an asymmetric rise and fall with a short “proximity peak.” Suggested starting values (adjust to tempo and picture):
- Length: 300 ms to 1.5 s (UI and transitions shorter; cinematic moves longer).
- Attack: 50–300 ms (approach).
- Peak hold: 0–60 ms (the “closest point”).
- Release: 150–900 ms (departure).
To avoid a flat, synthesizer-like contour, add 1–3 dB of slow random amplitude variation (low-passed noise at 2–8 Hz) on top of the macro envelope.
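The macro envelope described above can be sketched as three segments joined end to end. The `curve` exponent is a hypothetical shaping parameter (not part of the values listed), and the 48 kHz sample rate is assumed.

```python
def whoosh_envelope(attack_ms, hold_ms, release_ms, sample_rate=48000, curve=2.0):
    """Asymmetric rise / hold / fall gain curve in the range 0..1.

    With curve > 1 the rise accelerates toward the peak, and the fall drops
    quickly at first and then tails off.
    """
    n_a = max(1, round(attack_ms * sample_rate / 1000.0))
    n_h = round(hold_ms * sample_rate / 1000.0)
    n_r = max(1, round(release_ms * sample_rate / 1000.0))
    attack = [((i + 1) / n_a) ** curve for i in range(n_a)]          # ends at 1.0
    hold = [1.0] * n_h                                               # proximity peak
    release = [(1.0 - (i + 1) / n_r) ** curve for i in range(n_r)]   # ends at 0.0
    return attack + hold + release


# Example values taken from within the suggested ranges.
envelope = whoosh_envelope(attack_ms=120, hold_ms=15, release_ms=420)
duration_s = len(envelope) / 48000
```

Multiply the envelope sample by sample with the noise source. A release several times longer than the attack is what gives the pass-by its asymmetric shape.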
3.4 Time-varying filtering: spectral centroid as the motion control
A whoosh’s “travel” is primarily a controlled shift of spectral energy. Implement it with:
- Band-pass sweep (classic): center frequency moves from low to high (or reverse).
- Low-pass / high-shelf automation: approach adds brightness; departure removes it.
- Formant or resonant filters: adds a recognizable “body” (air passing a cavity).
Specific, mix-proven ranges:
- Band-pass center: ~200 Hz → 6 kHz for broad cinematic whooshes; narrower for focused swishes.
- Q (resonance): 0.5–2 for natural wind; 2–8 for stylized, “laser-swish” edges.
- High shelf: +2 to +8 dB above 4–8 kHz for approach; automate down on exit.
Visual description (spectrum over time): imagine a spectrogram where a dense noise cloud “tilts” upward—energy gradually climbing from midrange into the presence band—then thinning and darkening after the peak.
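A band-pass sweep of this kind can be sketched with a trapezoidal (TPT) state-variable filter, a structure chosen here because it stays well behaved when the cutoff changes every sample. The exponential glide, the 48 kHz sample rate, and the function name are assumptions for illustration; a plugin's own filter will differ in detail.

```python
import math
import random


def sweep_bandpass(x, f_start, f_end, q, sample_rate=48000):
    """Band-pass x while the center frequency glides exponentially f_start -> f_end."""
    n = len(x)
    k = 1.0 / q                      # damping: lower Q means a wider band
    ic1 = ic2 = 0.0                  # integrator states
    out = []
    for i, v0 in enumerate(x):
        t = i / max(1, n - 1)
        fc = f_start * (f_end / f_start) ** t
        g = math.tan(math.pi * fc / sample_rate)
        a1 = 1.0 / (1.0 + g * (g + k))
        a2 = g * a1
        a3 = g * a2
        v3 = v0 - ic2
        v1 = a1 * ic1 + a2 * v3          # band-pass node
        v2 = ic2 + a2 * ic1 + a3 * v3    # low-pass node (needed for the state)
        ic1 = 2.0 * v1 - ic1
        ic2 = 2.0 * v2 - ic2
        out.append(k * v1)               # scaled for unity gain at the center
    return out


rng = random.Random(7)
noise = [rng.uniform(-1.0, 1.0) for _ in range(24000)]   # 0.5 s of white noise
swept = sweep_bandpass(noise, f_start=200.0, f_end=6000.0, q=1.0)
```

Swapping `f_start` and `f_end` gives the darkening "departure" version. Raising `q` toward the stylized range narrows the band and makes the sweep more whistle-like.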
3.5 Modulation: making noise feel like moving air
Static filtered noise reads as “static.” Add controlled modulation in multiple domains:
- Filter cutoff modulation: random LFO at 0.5–5 Hz (subtle) to mimic turbulent eddies changing the effective aperture.
- Amplitude modulation: 2–12 Hz random or sample-and-hold with smoothing. Keep depth modest (5–20%) to avoid audible tremolo.
- Stereo decorrelation: micro-delays (0.2–1.5 ms) or all-pass networks to widen without obvious echo.
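Engineering note: if the mix may be summed to mono, keep inter-channel delays very short and check correlation regularly. A delayed copy summed with the original comb-filters the result (the first notch sits at 1/(2 × delay), about 500 Hz for a 1 ms delay), and fully decorrelated noise sums about 3 dB lower than a coherent signal, so both the perceived level and the brightness can change in mono.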
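A slow random gain curve for the amplitude modulation above can be sketched by low-pass filtering white noise and rescaling it around unity. The two cascaded one-pole smoothers and the peak-based rescaling are assumptions of this sketch, as is the name `gust_modulator`.

```python
import math
import random


def gust_modulator(n, rate_hz, depth, sample_rate=48000, rng=None):
    """Random gain curve centered on 1.0 with peak deviation `depth`.

    White noise is smoothed by two one-pole low-pass stages at `rate_hz`,
    then recentered and scaled so the gain stays within 1 +/- depth.
    """
    rng = rng or random.Random()
    coeff = 1.0 - math.exp(-2.0 * math.pi * rate_hz / sample_rate)
    s1 = s2 = 0.0
    raw = []
    for _ in range(n):
        w = rng.uniform(-1.0, 1.0)
        s1 += coeff * (w - s1)   # first smoothing stage
        s2 += coeff * (s1 - s2)  # second stage removes residual roughness
        raw.append(s2)
    mean = sum(raw) / n
    peak = max(abs(v - mean) for v in raw) or 1.0
    return [1.0 + depth * (v - mean) / peak for v in raw]


# One second of 4 Hz "gustiness" at 10% depth.
gain = gust_modulator(48000, rate_hz=4.0, depth=0.10, rng=random.Random(3))
```

A 10% depth keeps the gain between 0.9 and 1.1, a little under ±1 dB. The same curve, scaled differently, can also drive the filter cutoff so that level and brightness fluctuate together.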
3.6 Nonlinearity and “edge”: saturation that survives the mix
Broadband elements can disappear under music because they lack midrange density. Moderate saturation increases perceived loudness and presence without requiring large peak levels. A practical chain is:
- Soft clip / tape-like saturation after the filter sweep.
- Drive to add 2nd/3rd harmonic density in 1–5 kHz (where human sensitivity is high).
- Oversampling if available (2×–8×) to reduce aliasing, especially for aggressive brightness or high-Q resonances.
Keep true-peak headroom. If your deliverable is -1 dBTP max (common streaming constraint), design the whoosh to peak below that even after bus processing.
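As a minimal sketch of the drive-then-trim idea, the code below uses a `tanh` soft clipper and reports crest factor before and after. It is not oversampled and it measures sample peaks rather than true peaks, so treat it as an illustration of the gain staging only; the function names and the 6 dB / -1 dB settings are assumed.

```python
import math
import random


def soft_clip(x, drive_db, trim_db=0.0):
    """tanh soft clipper: input gain (drive) before the curve, output trim after."""
    drive = 10.0 ** (drive_db / 20.0)
    trim = 10.0 ** (trim_db / 20.0)
    return [trim * math.tanh(drive * s) for s in x]


def crest_factor_db(x):
    """Peak-to-RMS ratio in dB; lower means denser, louder-sounding material."""
    peak = max(abs(s) for s in x)
    rms = math.sqrt(sum(s * s for s in x) / len(x))
    return 20.0 * math.log10(peak / rms)


rng = random.Random(5)
noise = [rng.uniform(-1.0, 1.0) for _ in range(20000)]
shaped = soft_clip(noise, drive_db=6.0, trim_db=-1.0)

crest_before = crest_factor_db(noise)
crest_after = crest_factor_db(shaped)
```

Because `tanh` never exceeds ±1, the trim value sets a hard ceiling on sample peaks; the drop in crest factor is the extra density gained without raising that ceiling. Inter-sample peaks can still exceed it, which is why a true-peak meter remains the final check.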
3.7 Spatial motion: pan is not enough
A convincing whoosh often needs a trajectory: left-to-right, front-to-back, expanding width, or “overhead.” You can synthesize this using a combination of:
- Pan automation with equal-power law.
- Width automation: narrow at distance, wider at closest pass, narrow again on exit.
- Early reflections: short room response to place the whoosh in the scene.
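- Pre-delay automation: shorten pre-delay slightly as the source recedes and lengthen it as it approaches (subtle: 5–25 ms range); a longer gap between the direct sound and its first reflections tends to read as closer.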
If mixing for immersive formats, object-based panning plus height sends can add realism. In stereo-only contexts, avoid hard-left-to-hard-right moves unless you want a stylized “wipe.” Many natural pass-bys never fully collapse to one speaker; they arc through a field.
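The equal-power law and a partial-width trajectory can be sketched as follows. The sine/cosine law is the common textbook form; the ±0.7 endpoints are an assumed example of a move that never collapses fully to one speaker.

```python
import math


def equal_power_pan(position):
    """Map position (-1 = hard left, 0 = center, +1 = hard right) to (gain_l, gain_r).

    The gains satisfy gain_l**2 + gain_r**2 == 1, so summed power stays constant.
    """
    p = max(-1.0, min(1.0, position))
    angle = (p + 1.0) * math.pi / 4.0   # 0 .. pi/2
    return math.cos(angle), math.sin(angle)


def pan_trajectory(mono, start=-0.7, end=0.7):
    """Pan a mono signal linearly from `start` to `end` over its length."""
    n = len(mono)
    left, right = [], []
    for i, s in enumerate(mono):
        pos = start + (end - start) * i / max(1, n - 1)
        gl, gr = equal_power_pan(pos)
        left.append(s * gl)
        right.append(s * gr)
    return left, right


center_l, center_r = equal_power_pan(0.0)   # both about 0.707 (-3 dB)
left, right = pan_trajectory([1.0] * 5)
```

A linear position ramp is the simplest choice. Easing the ramp so the source crosses center fastest near the envelope peak usually reads as more physical.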
3.8 A repeatable “from-scratch” recipe (subtractive core)
Core chain: Pink noise → HPF → band-pass sweep → gentle saturation → transient shaping (optional) → stereo decorrelation → short ambience
- HPF: 30–50 Hz, 24 dB/oct.
- Band-pass: center 300 Hz → 5 kHz; Q 0.8–1.5.
- Macro envelope: 120 ms attack, 15 ms hold, 420 ms release (adjust length).
- Random AM: 4 Hz low-pass noise mod, 10% depth.
- Saturation: 2–6 dB drive; output trim to maintain headroom.
- Ambience: early reflections or 0.3–0.8 s room, low mix (5–15%).
This produces a “neutral cinematic” whoosh that can then be specialized: brighter and shorter for UI; darker and longer for large-scale transitions; more resonant and distorted for sci-fi.
4) Real-world implications: designing whooshes that mix and translate
4.1 Loudness, crest factor, and why whooshes cause limiter pumping
Whooshes can have high RMS energy over a short window, which is exactly what makes bus compression and limiting react. If a whoosh is too broadband and too loud, it can cause “ducking” of dialogue or music, even if peaks aren’t extreme. Manage this by:
- Controlling sub energy (HPF and dynamic EQ).
- Reducing 200–500 Hz buildup if it masks speech fundamentals.
- Managing 2–5 kHz where harshness and perceived loudness accumulate.
Practical check: monitor short-term loudness (3 s window) and momentary loudness (400 ms window) if available. Even without targeting a specific LUFS for a single effect, these meters reveal whether your whoosh will dominate program loudness briefly.
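Where no loudness meter is at hand, a sliding mean-square level gives a rough stand-in for these two windows. This sketch is unweighted: a real BS.1770 meter applies K-weighting before measuring, so the figures below are not LUFS and only indicate relative behavior. The hop size and function name are assumptions.

```python
import math


def windowed_level_db(x, window_s, sample_rate=48000, hop_s=0.1):
    """Mean-square level in dB for each window position (unweighted, not LUFS).

    Signals shorter than the window are treated as padded with silence.
    """
    win = max(1, round(window_s * sample_rate))
    hop = max(1, round(hop_s * sample_rate))
    levels = []
    for start in range(0, max(1, len(x) - win + 1), hop):
        block = x[start:start + win]
        mean_square = sum(s * s for s in block) / win
        levels.append(10.0 * math.log10(mean_square) if mean_square > 0.0
                      else float("-inf"))
    return levels


# Sanity check on a 1 s, full-scale 1 kHz sine.
tone = [math.sin(2.0 * math.pi * 1000.0 * i / 48000) for i in range(48000)]
momentary = windowed_level_db(tone, window_s=0.4)   # 400 ms windows
short_term = windowed_level_db(tone, window_s=3.0)  # 3 s window, mostly silence
```

The one-second tone reads several dB lower in the 3 s window than in the 400 ms windows because most of the longer window is silence. A brief whoosh behaves the same way, so the momentary reading is the more telling one for short effects.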
4.2 Mono compatibility and playback systems
Many whooshes are used in environments where mono fold-down happens (mobile devices, TVs at distance, in-store systems). If your whoosh relies on phasey widening, it may thin out in mono. Confirm by summing to mono and checking:
- Does the peak drop dramatically (more than ~3 dB)?
- Does the timbre lose highs or become hollow?
If it does, reduce inter-channel delay, favor mid/side EQ over phasey wideners, and keep essential energy in the mid channel.
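The fold-down check can also be run numerically. This sketch compares the power of the (L+R)/2 mono sum with the average per-channel power and reports the zero-lag correlation; it uses RMS power rather than peak level, and the function names are hypothetical.

```python
import math
import random


def correlation(left, right):
    """Zero-lag correlation coefficient: +1 identical, 0 unrelated, -1 inverted."""
    num = sum(l * r for l, r in zip(left, right))
    den = math.sqrt(sum(l * l for l in left) * sum(r * r for r in right))
    return num / den if den > 0.0 else 0.0


def mono_fold_change_db(left, right):
    """Level of the (L+R)/2 mono sum relative to the mean per-channel power."""
    n = len(left)
    mono_power = sum(((l + r) * 0.5) ** 2 for l, r in zip(left, right)) / n
    stereo_power = (sum(l * l for l in left) + sum(r * r for r in right)) / (2 * n)
    if mono_power <= 0.0 or stereo_power <= 0.0:
        return float("-inf")
    return 10.0 * math.log10(mono_power / stereo_power)


rng = random.Random(11)
a = [rng.uniform(-1.0, 1.0) for _ in range(20000)]
b = [rng.uniform(-1.0, 1.0) for _ in range(20000)]
identical_db = mono_fold_change_db(a, a)                 # fully correlated pair
independent_db = mono_fold_change_db(a, b)               # fully decorrelated pair
inverted_db = mono_fold_change_db(a, [-s for s in a])    # polarity-flipped pair
```

Identical channels lose nothing, independent noise loses about 3 dB, and a polarity-inverted pair cancels completely. A drop much larger than 3 dB therefore points to negative correlation, which is the case worth fixing.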
5) Case studies: professional workflows and examples
5.1 Film trailer transition whoosh (big but controlled)
Goal: A 1.2 s whoosh that feels massive, supports a title reveal, and doesn’t obscure narration.
Build:
- Layer A (air): pink noise band-pass sweep (250 Hz → 4.5 kHz), mild saturation.
- Layer B (body): resonator or convolved noise through a “large metal tube” IR; HPF at 60 Hz, LPF at 6–8 kHz.
- Layer C (edge): short white-noise burst with high-pass at 2 kHz, 150–250 ms long, aligned near the peak for definition.
Mix notes: dynamic EQ dips 2–4 kHz by 1–3 dB during narration-sensitive moments; sidechain from dialogue bus if needed. Peak managed to remain under -1 dBTP post-bus limiting.
5.2 UI swipe whoosh (fast, tactile, consistent across devices)
Goal: 120–250 ms swish that reads on phone speakers.
Build: white noise → steep band-pass around 1–6 kHz → fast envelope → light transient shaping.
- Keep energy centered in 1.5–5 kHz where small speakers reproduce reliably.
- Avoid heavy sub content; it wastes headroom and won’t translate.
Consistency trick: normalize perceived level with a loudness measurement (for clips this short, momentary loudness with its 400 ms window is the closer fit), not peak normalization. Two swishes with identical peaks can feel very different in loudness depending on bandwidth and saturation.
5.3 Game whoosh library (variation without chaos)
Goal: Dozens of whooshes that share a signature but don’t repeat obviously.
Workflow: design a parametric patch where randomness is bounded:
- Randomize sweep time within ±15%.
- Randomize filter cutoff endpoints within a constrained range (e.g., end frequency 4–7 kHz).
- Randomize modulation depth slightly (e.g., 8–14%).
This produces controlled variation that remains mix-compatible and stylistically coherent.
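Bounded randomness of this kind can be sketched as a small parameter record plus a seeded generator. The field names, the 0.8 s base sweep time, and the 300 Hz start frequency are hypothetical; the ranges follow the examples above.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class WhooshParams:
    sweep_time_s: float   # duration of the filter sweep
    start_hz: float       # sweep start frequency (fixed for a shared signature)
    end_hz: float         # sweep end frequency
    am_depth: float       # random amplitude-modulation depth, 0..1


def make_variant(rng, base_sweep_s=0.8, start_hz=300.0):
    """Draw one variant; every randomized value stays inside a fixed range."""
    return WhooshParams(
        sweep_time_s=base_sweep_s * rng.uniform(0.85, 1.15),  # within +/-15%
        start_hz=start_hz,
        end_hz=rng.uniform(4000.0, 7000.0),
        am_depth=rng.uniform(0.08, 0.14),
    )


# A fixed seed makes the whole library reproducible from one number.
rng = random.Random(2024)
library = [make_variant(rng) for _ in range(24)]
```

Storing the seed alongside the rendered files lets a specific variant be regenerated or tweaked later without redesigning the patch.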
6) Common misconceptions (and what to do instead)
- Misconception: “A whoosh is just a filter sweep.”
Correction: The ear listens for multi-cue motion: envelope, turbulence modulation, and spatial evolution. Add gustiness and width/depth changes, not only cutoff automation.
- Misconception: “More sub makes it bigger.”
Correction: Sub energy often reads as “rumble,” not “air.” Size is frequently conveyed by low-mid body (100–300 Hz), controlled brightness, and early reflections. Use sub intentionally and manage it with high-pass/dynamic EQ.
- Misconception: “Doppler on everything equals realism.”
Correction: Doppler is most audible on tonal components. For broadband whooshes, focus on spectral tilt and amplitude trajectory; add a resonant or tonal layer if you want Doppler to be perceptible and meaningful.
- Misconception: “Wider is always better.”
Correction: Extreme decorrelation can collapse unpredictably in mono and smear localization. Use width automation and early reflections; keep the core readable in mono.
7) Future trends: where whoosh design is heading
- Procedural/parameterized sound design in engines: More whooshes are generated at runtime (especially in games/VR) using bounded randomness and real-time filtering tied to gameplay variables (speed, distance, camera motion).
- Better turbulence models: Expect increased use of physically informed noise modulation—band-limited stochastic processes and multi-band correlated modulators that mimic real airflow statistics more closely than simple LFOs.
- Immersive and binaural-first workflows: With wider adoption of Dolby Atmos and binaural rendering, whooshes will increasingly be designed as moving objects with controlled divergence and height cues, not as static stereo beds.
- AI-assisted variation (with constraints): Not “generate a whoosh,” but “generate 20 variants within these spectral and loudness bounds,” helping libraries scale while preserving mix discipline.
8) Key takeaways for practicing engineers
- Design motion, not just timbre: spectral centroid movement + asymmetric envelope + evolving spatial cues is the core recipe.
- Use noise intelligently: pink for natural wind-like body, white for crisp edges; add gustiness via low-rate random modulation.
- Control bandwidth and headroom: high-pass below ~30–50 Hz unless the brief demands sub; manage 2–5 kHz to avoid harshness and masking.
- Layer for readability: “air” (broadband), “body” (resonant/IR), and “edge” (short bright accent) yields whooshes that survive music-heavy mixes.
- Mix translation matters: check mono fold-down and correlation; avoid relying on phase tricks for the main identity.
- Make it repeatable: build parameterized patches with bounded randomness for consistent libraries across projects.
When whooshes are built from first principles—envelope physics, spectral motion, turbulence modulation, and robust spatial strategy—they stop being generic transitions and become controllable, scene-specific tools. The result is not just a better “swoosh,” but a mix-aware motion cue that reads immediately and holds up under professional delivery constraints.









