
Convolution Texture Creation Guide
1) Introduction: why “texture” is the real question behind convolution
Convolution is usually introduced as “make a signal sound like it’s in a space” via an impulse response (IR). That explanation is accurate, but it undersells what engineers actually chase in production: texture. Texture is the perceptual blend of time-domain microstructure (early reflections, combing, diffusion), spectral shaping (air absorption, boundary coloration), and nonlinear “density” cues (which classic convolution cannot reproduce directly, but can suggest through clever IR design).
This guide frames convolution as a texture creation tool: how to design, capture, edit, and deploy IRs to sculpt depth, grit, size, smoothness, and “material feel” in a repeatable, technically defensible way. The focus is on experienced engineers and acoustics-minded users who want control over time alignment, deconvolution quality, phase, noise floor, and the perceptual tradeoffs between realism and mix utility.
2) Background: convolution, LTI assumptions, and what an IR really encodes
At its core, convolution reverb implements the output of a linear time-invariant (LTI) system:
y(t) = x(t) * h(t)
where x(t) is the dry signal, h(t) is the impulse response, and * denotes convolution. In discrete time, this is:
y[n] = Σ_k x[k] · h[n−k]
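To make the sum concrete, here is a minimal sketch, assuming NumPy (the guide names no tooling), that evaluates the discrete formula literally and checks it against an FFT-based version. The direct form costs len(x)·len(h) multiply-adds per block, which is why practical convolvers work in the frequency domain, typically with partitioned FFT convolution for low latency. The function names are illustrative only.

```python
import numpy as np

def convolve_direct(x: np.ndarray, h: np.ndarray) -> np.ndarray:
    """Literal evaluation of y[n] = sum_k x[k] * h[n - k]."""
    n_out = len(x) + len(h) - 1
    y = np.zeros(n_out)
    for n in range(n_out):
        # Only k where both x[k] and h[n - k] exist contribute.
        k_min = max(0, n - len(h) + 1)
        k_max = min(n, len(x) - 1)
        for k in range(k_min, k_max + 1):
            y[n] += x[k] * h[n - k]
    return y

def convolve_fft(x: np.ndarray, h: np.ndarray) -> np.ndarray:
    """Same linear convolution via zero-padded FFT multiplication."""
    n_out = len(x) + len(h) - 1
    n_fft = 1 << (n_out - 1).bit_length()  # next power of two >= n_out
    spectrum = np.fft.rfft(x, n_fft) * np.fft.rfft(h, n_fft)
    return np.fft.irfft(spectrum, n_fft)[:n_out]

rng = np.random.default_rng(0)
x = rng.standard_normal(64)                                  # toy "dry" signal
h = rng.standard_normal(16) * np.exp(-np.arange(16) / 4.0)   # toy decaying IR
print(np.allclose(convolve_direct(x, h), convolve_fft(x, h)))  # True
```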
An IR is the system’s response to a Dirac impulse, but in practice we measure it using sweeps or noise and recover it via deconvolution. The IR encodes:
- Early reflections: discrete arrival events (often 1–80 ms) that define size, distance, and boundary geometry.
- Late reverberation: a dense stochastic tail whose decay is often summarized by RT60 and frequency-dependent decay rates.
- Spectral coloration: boundary absorption, air loss, microphone/directivity effects, and source spectrum if the measurement isn’t carefully normalized.
- Phase/time dispersion: frequency-dependent arrival structure and comb-filter patterns.
Convolution assumes linearity and time invariance. Many “textures” we love (tape compression, plate drive, loudspeakers pushed into compression, spring tanks driven hard) involve nonlinearity or time variance. Convolution can still approximate parts of those textures by capturing the linear component accurately and then combining it with controlled nonlinear stages (pre/post), or by using multi-IR and dynamic convolution methods.
3) Detailed technical analysis: measurement, deconvolution, editing, and numeric targets
3.1 Measurement signals: impulse, MLS, and log sine sweep
Direct impulse excitation (starter pistol, balloon pop) has historically been popular but is often compromised by limited bandwidth and poor repeatability. Modern practice favors:
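- MLS (Maximum Length Sequence): good SNR efficiency, but sensitive to time variance and distortion; nonlinear products spread through the IR as spurious peaks instead of separating out, and the result can smear if the system isn’t strictly LTI.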
- Exponential/log sine sweep (ESS): the current workhorse because it separates harmonic distortion in time during deconvolution, enabling cleaner linear IR extraction.
For texture-focused IRs, ESS is typically best. A common sweep setup for rooms and hardware is:
- Sweep length: 10–30 s (longer for higher SNR in large spaces)
- Bandwidth: 20 Hz–20 kHz (or 10 Hz–24 kHz when capturing infra/ultra content)
- Sample rate: 48 kHz minimum; 96 kHz recommended when you expect strong HF cues or plan heavy time-stretching
- Level target: keep the measurement chain at least 12 dB below clipping, and avoid driving the loudspeaker into compression
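A sketch of the ESS workflow, assuming NumPy and a Farina-style inverse filter: the time-reversed sweep weighted by an envelope proportional to instantaneous frequency (+6 dB/octave), which compensates the sweep’s pink energy distribution. The sweep is shortened to 1 s so the example runs quickly; real captures would use the 10–30 s lengths listed above, plus short fade-in/fade-out ramps that are omitted here. A simulated two-tap “room” stands in for a recorded measurement, and the function names are hypothetical.

```python
import numpy as np

def fft_convolve(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Linear convolution via zero-padded FFTs."""
    n_out = len(a) + len(b) - 1
    n_fft = 1 << (n_out - 1).bit_length()
    return np.fft.irfft(np.fft.rfft(a, n_fft) * np.fft.rfft(b, n_fft), n_fft)[:n_out]

def ess_and_inverse(f1: float, f2: float, duration: float, fs: int):
    """Exponential sine sweep plus an amplitude-compensated inverse filter."""
    n = int(round(duration * fs))
    t = np.arange(n) / fs
    rate = duration / np.log(f2 / f1)                 # sweep rate constant (s)
    sweep = np.sin(2 * np.pi * f1 * rate * (np.exp(t / rate) - 1.0))
    # Reverse the sweep; the envelope exp(-t/rate) is proportional to the
    # instantaneous frequency of the reversed sweep (+6 dB/octave weighting).
    inverse = sweep[::-1] * np.exp(-t / rate)
    # Normalise so that deconvolving the bare sweep yields a unit peak.
    inverse /= np.max(np.abs(fft_convolve(sweep, inverse)))
    return sweep, inverse

fs = 48_000
sweep, inverse = ess_and_inverse(20.0, 20_000.0, duration=1.0, fs=fs)  # short demo sweep

# Simulated system: direct sound plus a reflection 100 samples later at half level.
h_true = np.zeros(200)
h_true[0], h_true[100] = 1.0, 0.5
recorded = fft_convolve(sweep, h_true)

# Deconvolve. The linear IR begins near index len(sweep) - 1; harmonic
# distortion products from a real chain would appear *before* that index.
ir_full = fft_convolve(recorded, inverse)
start = len(sweep) - 1
linear_ir = ir_full[start:start + len(h_true)]
```

Because the distortion products land earlier in time than the linear response, trimming from `start` onward is what gives ESS its clean linear IR.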
3.2 Deconvolution quality: alignment, windowing, and noise floor
After recording the sweep, you deconvolve using the inverse sweep. The resulting IR must be time-aligned and cleanly windowed. Small errors here are exactly what separates “realistic” from “phasey” texture.
Time alignment: identify the direct sound peak (or the first physically plausible arrival) and place it at a defined sample position. For mix textures, many engineers intentionally shift the IR so that the first energy arrives after 0 ms, building in a predelay; if you do, choose that offset deliberately rather than leaving arbitrary measurement latency in the file. A practical predelay range:
- Intimate/room texture: 0–10 ms
- Vocal depth without wash: 15–35 ms
- Cinematic separation: 40–80 ms
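One way to make predelay deliberate, sketched with NumPy: detect the first significant arrival with a threshold relative to the peak (an assumed heuristic; the −20 dB default is illustrative, not a standard), trim everything before it, then pad the chosen predelay back in. `align_ir` is a hypothetical helper.

```python
import numpy as np

def align_ir(ir: np.ndarray, fs: int, predelay_ms: float = 0.0,
             threshold_db: float = -20.0, guard: int = 0) -> np.ndarray:
    """Remove leading latency, then insert an intentional predelay.

    The onset is the first sample within threshold_db of the absolute peak;
    `guard` keeps a few samples before it so a band-limited arrival's leading
    edge is not chopped off."""
    mag = np.abs(ir)
    threshold = mag.max() * 10 ** (threshold_db / 20)
    onset = max(int(np.argmax(mag >= threshold)) - guard, 0)
    pad = int(round(predelay_ms * fs / 1000.0))
    return np.concatenate([np.zeros(pad), ir[onset:]])

# Example: a raw IR whose direct sound arrives 237 samples late, re-aligned
# with a deliberate 20 ms vocal-depth predelay at 48 kHz (960 samples).
raw = np.zeros(1000)
raw[237], raw[400] = 1.0, 0.3
aligned = align_ir(raw, 48_000, predelay_ms=20.0)
```

On real captures with noise ahead of the direct sound, a threshold set too low will trigger on that noise, so it is worth checking the detected onset visually.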
Windowing: early reflections carry localization; the late tail carries “envelopment.” When creating texture IRs, you can treat these as separate layers:
- Early window: typically 0–80 ms (small room) up to 0–120 ms (larger spaces)
- Late window: from early cutoff to tail end (often 1.0–6.0 s depending on desired decay)
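Splitting an IR into early and late layers can be done with complementary raised-cosine fades, so the two layers sum back to the original exactly. This is a minimal NumPy sketch; the 80 ms split and 10 ms fade are example values from the ranges above, and the helper name is hypothetical. Amplitude-complementary (rather than equal-power) fades are used because both layers come from the same coherent IR.

```python
import numpy as np

def split_early_late(ir: np.ndarray, fs: int, split_ms: float = 80.0,
                     fade_ms: float = 10.0):
    """Return (early, late) layers whose sum reconstructs `ir`."""
    n = len(ir)
    split = int(round(split_ms * fs / 1000.0))
    fade = max(1, int(round(fade_ms * fs / 1000.0)))
    start = max(0, split - fade // 2)          # fade is centred on the split point
    stop = min(n, start + fade)
    w_early = np.ones(n)
    ramp = np.arange(stop - start) / fade
    w_early[start:stop] = 0.5 * (1.0 + np.cos(np.pi * ramp))   # 1 -> 0
    w_early[stop:] = 0.0
    early = ir * w_early
    late = ir * (1.0 - w_early)                # complementary weight
    return early, late
```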
Noise floor: If your tail decays into a noisy HVAC bed or preamp hiss, convolution will “print” that noise onto every signal. For professional-grade IR libraries, aim for a tail noise floor at least 60–70 dB below the early peak. If you can reach 80 dB below peak (for example, in a quiet hall at night with a long sweep), the IR will survive heavier send levels without audible “air hiss.”
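A quick way to check that target, assuming the last part of the file is noise-dominated: compare the RMS of the final 200 ms (an arbitrary choice) with the absolute peak. This is a rough NumPy sketch with a hypothetical helper name, not a standardized measurement.

```python
import numpy as np

def tail_noise_floor_db(ir: np.ndarray, fs: int, tail_ms: float = 200.0) -> float:
    """Tail RMS relative to the IR's absolute peak, in dB (negative = below peak).

    Assumes the final tail_ms of the file contains only the noise floor."""
    n_tail = max(1, int(round(tail_ms * fs / 1000.0)))
    tail = ir[-n_tail:]
    rms = np.sqrt(np.mean(tail ** 2))
    return 20.0 * np.log10(rms / np.max(np.abs(ir)))
```

A result of −72, say, would meet the 60–70 dB library target but fall short of the 80 dB ideal.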
3.3 Frequency-dependent decay and RT60 shaping (without pretending it’s just EQ)
Texture is strongly controlled by frequency-dependent decay. Real spaces often have longer LF decays and shorter HF decays due to air absorption and material losses. When editing IRs, avoid treating the tail as a static EQ problem: decay is time-varying by frequency.
Engineers often target approximate decay profiles, for example:
- Modern scoring stage vibe: RT60 around 1.6–2.2 s, with HF (8–16 kHz) decaying 20–40% faster than mids
- Controlled studio room: RT60 around 0.3–0.6 s, relatively flat to avoid tonal “room notes”
- Plate-like smoothness: perceived RT60 1.2–2.8 s but with very dense early buildup and fewer distinct discrete reflections
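To check an IR against decay targets like these, a common approach is Schroeder backward integration of the squared IR, with a line fitted over part of the resulting decay curve and extrapolated to −60 dB. The sketch below, assuming NumPy, uses a −5 to −35 dB fit range (a T30-style evaluation); running it on band-filtered copies of the IR gives the frequency-dependent picture. The helper name is hypothetical.

```python
import numpy as np

def schroeder_rt60(ir: np.ndarray, fs: int, fit_db=(-5.0, -35.0)) -> float:
    """RT60 estimate from Schroeder backward integration of the squared IR.

    A line is fitted to the energy decay curve between the two fit_db levels
    and extrapolated to -60 dB."""
    energy = np.asarray(ir, dtype=float) ** 2
    edc = np.cumsum(energy[::-1])[::-1]               # energy remaining after each sample
    edc_db = 10.0 * np.log10(np.maximum(edc / edc[0], 1e-300))
    upper, lower = fit_db
    idx = np.flatnonzero((edc_db <= upper) & (edc_db >= lower))
    if len(idx) < 2:
        raise ValueError("decay does not span the requested fit range")
    slope, _ = np.polyfit(idx / fs, edc_db[idx], 1)   # dB per second (negative)
    return -60.0 / slope
```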
To shape this inside an IR, you can apply multiband envelope processing: split the IR into bands (e.g., four bands with crossovers at 250 Hz, 1 kHz, and 4 kHz), apply a different exponential decay envelope to each, then recombine. This preserves the “decay character” better than broad EQ alone.
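A minimal sketch of that multiband approach, assuming NumPy. Instead of IIR crossovers it uses zero-phase FFT masks with raised-cosine transitions (a swapped-in choice whose masks sum to exactly one, so the bands recombine perfectly when no reshaping is applied). Each band receives the envelope exp(−ln(1000)·t·(1/RT_target − 1/RT_orig)), which converts a decay of RT_orig seconds into RT_target seconds. The per-band RT_orig values are assumed to come from an analysis such as Schroeder integration; all names are hypothetical.

```python
import numpy as np

LN1000 = np.log(1000.0)  # 60 dB of amplitude decay is a factor of 1000

def band_masks(freqs, crossovers, width_oct=0.5):
    """Zero-phase band masks that sum to 1 at every frequency.

    Each (ascending) crossover gets a raised-cosine transition width_oct octaves wide."""
    logf = np.log2(np.maximum(freqs, 1e-6))
    lowpasses = []
    for fc in crossovers:
        x = np.clip((logf - np.log2(fc)) / width_oct + 0.5, 0.0, 1.0)
        lowpasses.append(0.5 * (1.0 + np.cos(np.pi * x)))  # 1 well below fc, 0 well above
    masks, prev = [], np.zeros_like(freqs)
    for lp in lowpasses:
        masks.append(lp - prev)          # band between consecutive crossovers
        prev = lp
    masks.append(1.0 - prev)             # top band
    return masks

def reshape_decay(ir, fs, crossovers, rt_orig, rt_target, width_oct=0.5):
    """Apply a per-band exponential envelope (rt_orig[i] -> rt_target[i] seconds)."""
    n = len(ir)
    n_fft = 2 * n                        # zero-pad so acausal filter tails wrap into the discarded half
    spec = np.fft.rfft(ir, n_fft)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    t = np.arange(n) / fs
    out = np.zeros(n)
    for mask, r0, r1 in zip(band_masks(freqs, crossovers, width_oct), rt_orig, rt_target):
        band = np.fft.irfft(spec * mask, n_fft)[:n]
        out += band * np.exp(-LN1000 * t * (1.0 / r1 - 1.0 / r0))
    return out
```

Lengthening a decay (RT_target greater than RT_orig) makes the envelope grow with time and lifts the noise floor along with the tail, so in practice this is most useful for shortening bands. Zero-phase masks also spread a little energy ahead of sharp arrivals; wider transitions (larger `width_oct`) reduce that.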
3.4 Phase, minimum-phase conversion, and when “wrong” is useful
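Every IR carries phase as well as magnitude information, and the phase response contributes to comb filtering and to the sense of “edge” or “hollowness.” Many tools offer minimum-phase conversion, which keeps the magnitude response while discarding the excess (all-pass) phase, so the energy is packed as early in time as that magnitude allows. This can: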
- Reduce pre-ringing artifacts in certain IR-derived EQ processes
- Make short ambiences feel tighter and less “swimmy”
But it can also destroy the physical early reflection timing that creates spatial plausibility. A practical guideline:
- Use minimum-phase for color IRs (cabinet, mic, resonator textures) where you want stable tonal imprint.
- Keep linear/true phase for space IRs where early reflection geometry matters.
3.5 Stereo and surround IRs: true stereo, LR, MS, and decorrelation
Texture becomes convincing when the spatial channels are measured and used appropriately. Common formats:
- Mono IR: fast, but collapses spatial cues; best for “tone verbs.”
- Stereo IR (L→L, R→R): essentially dual mono with no cross-channel paths; any width comes only from differences between the two IRs.
- True stereo (4-channel IR: L→L, L→R, R→L, R→R): captures cross-channel energy that creates width and envelopment.
- MS-based capture: offers flexible width control; useful when you want adjustable ambience spread.
For texture creation, true stereo IRs often feel larger and more “expensive” at lower send levels because crossfeed reflections fill the panorama. If CPU is a concern, a pragmatic compromise is early reflections in true stereo with a mono or dual-mono late tail.
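The true-stereo routing amounts to four convolutions and two sums. In this NumPy sketch, `h_lr` denotes the response at the right output to excitation from the left input, matching the L→R notation above. The function names are hypothetical, `np.convolve` (direct form) stands in for the FFT-based engine a real plug-in would use, and all IRs are assumed to share one length, as are the two inputs.

```python
import numpy as np

def true_stereo(in_l, in_r, h_ll, h_lr, h_rl, h_rr):
    """4-channel (true stereo) convolution.

    h_xy is the response at output y to excitation at input x:
    out_L = in_L * h_LL + in_R * h_RL,  out_R = in_L * h_LR + in_R * h_RR."""
    out_l = np.convolve(in_l, h_ll) + np.convolve(in_r, h_rl)
    out_r = np.convolve(in_l, h_lr) + np.convolve(in_r, h_rr)
    return out_l, out_r

def dual_mono(in_l, in_r, h_ll, h_rr):
    """The plain 'stereo IR' case: no cross-channel paths."""
    return np.convolve(in_l, h_ll), np.convolve(in_r, h_rr)
```

The CPU compromise described above would run `true_stereo` on the early segments only and `dual_mono` (or a single mono convolution) on the late tail.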
3.6 A “diagram” of IR anatomy (visual description)
Picture an IR waveform plotted over time:
- 0 ms: a tall spike (direct sound arrival, if included).
- 1–20 ms: several smaller spikes (first-order reflections from nearby boundaries), spaced irregularly.
- 20–120 ms: increasingly dense spikes (higher-order reflections), approaching a noise-like texture.
- 120 ms onward: a decaying “fuzz” (late reverb tail) with amplitude envelope falling roughly exponentially, often with frequency-dependent decay visible in a spectrogram as HF dying sooner than LF.
When building a convolution texture, you’re essentially sculpting the spike timing, the density ramp, and the spectral decay profile.
4) Real-world implications: how convolution texture choices show up in mixes
Texture decisions become audible as:
- Depth and distance: stronger early reflections relative to direct sound push sources back; longer predelay preserves front placement.
- Size illusion: early reflection spacing and density imply room dimensions. Sparse early reflections with long gaps can feel “warehouse-like.” Dense early buildup reads as smaller or more diffuse environments.
- Sibilance behavior: HF decay shape affects whether vocals feel airy or spitty. A tail that holds 8–12 kHz too long exaggerates “sss” and cymbal hash.
- Low-end clarity: LF energy in the tail masks kick/bass. Highpassing the IR (or using a band-limited tail) often improves translation.
A particularly useful mental model: convolution texture is a controlled way to add correlated complexity. Unlike algorithmic reverb, where parameters generate new structure, convolution repeats the same measured structure every time. That repeatability is a strength for cinematic consistency and a weakness when you want lively modulation to avoid metallic buildup. Engineers often add subtle modulation after the convolver (micro-pitch or chorus-style movement at 0.1–0.3 Hz with a depth of a few cents) to simulate time variance.
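One simple way to implement that post-convolution movement is a sinusoidally modulated delay on the wet signal. For a delay d(t) = d0 + D·sin(2πft), the playback rate is 1 − d′(t), so the peak pitch deviation is a ratio of 1 + 2πfD; solving for D gives D = (2^(cents/1200) − 1)/(2πf), which at 0.2 Hz and 3 cents is roughly 1.4 ms. The sketch assumes NumPy and uses linear interpolation for the fractional delay (a simplification that slightly softens high frequencies at fractional positions). Names and defaults are illustrative.

```python
import numpy as np

def modulate_wet(wet: np.ndarray, fs: int, rate_hz: float = 0.2,
                 depth_cents: float = 3.0, base_delay_ms: float = 5.0) -> np.ndarray:
    """Sinusoidally modulated fractional delay applied to a convolved (wet) signal."""
    # Delay swing D that produces the requested peak pitch deviation in cents.
    depth_s = (2.0 ** (depth_cents / 1200.0) - 1.0) / (2.0 * np.pi * rate_hz)
    if depth_s * 1000.0 >= base_delay_ms:
        raise ValueError("base delay must exceed the modulation depth")
    n = np.arange(len(wet))
    delay = base_delay_ms * fs / 1000.0 + depth_s * fs * np.sin(2.0 * np.pi * rate_hz * n / fs)
    # Read the wet signal at fractional positions n - delay (zeros before the start).
    return np.interp(n - delay, n, wet, left=0.0, right=0.0)
```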
5) Case studies: professional workflows and repeatable recipes
Case study A: “Room glue” for multi-mic drums without obvious reverb
Goal: Make close mics feel like they belong together, with minimal audible tail.
Method:
- Capture or choose an IR with strong early reflections and short RT: 0.3–0.6 s.
- Edit IR to emphasize 0–60 ms region; fade tail aggressively after 250–500 ms.
- Highpass IR at 120–200 Hz to avoid kick/bass masking; optionally lowpass at 8–12 kHz.
- Predelay: 0–5 ms (or none), to keep immediacy.
Why it works: The ear interprets early reflection patterns as shared space cues. A short, filtered convolution tail increases cohesion without washing transients. This is especially effective on parallel sends from snare/toms rather than inserting on each channel.
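A rough sketch of the Case A edits, assuming NumPy: a second-order high-pass built from the RBJ Audio EQ Cookbook biquad formulas, followed by a raised-cosine fade that silences and truncates the tail. The 150 Hz corner, 300 ms fade start, and 150 ms fade length are example values picked from the ranges above; the optional low-pass is omitted, and the helper names are hypothetical.

```python
import numpy as np

def rbj_highpass(x: np.ndarray, fs: int, f0: float, q: float = 0.707) -> np.ndarray:
    """Second-order high-pass (RBJ cookbook biquad), direct form I."""
    w0 = 2.0 * np.pi * f0 / fs
    alpha = np.sin(w0) / (2.0 * q)
    cw = np.cos(w0)
    b = np.array([(1 + cw) / 2, -(1 + cw), (1 + cw) / 2])
    a = np.array([1 + alpha, -2 * cw, 1 - alpha])
    b, a = b / a[0], a / a[0]                    # normalise so a[0] == 1
    y = np.zeros(len(x))
    x1 = x2 = y1 = y2 = 0.0
    for i, xi in enumerate(x):
        yi = b[0] * xi + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, xi
        y2, y1 = y1, yi
        y[i] = yi
    return y

def room_glue_ir(ir: np.ndarray, fs: int, hp_hz: float = 150.0,
                 fade_start_ms: float = 300.0, fade_ms: float = 150.0) -> np.ndarray:
    """High-pass the IR, then fade the tail to silence and truncate it."""
    out = rbj_highpass(ir, fs, hp_hz)
    start = int(round(fade_start_ms * fs / 1000.0))
    length = int(round(fade_ms * fs / 1000.0))
    end = min(start + length, len(out))
    if start < len(out):
        ramp = np.arange(end - start) / length
        out[start:end] *= 0.5 * (1.0 + np.cos(np.pi * ramp))   # 1 -> 0
        out = out[:end]
    return out
```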
Case study B: Vocal “front-and-wide” using split IR layers
Goal: Keep the vocal upfront while creating width and a premium halo.
Method:
- Create two IR derivatives from the same source: an early IR (0–90 ms) and a late IR (90 ms onward).
- Early IR: true stereo, widened via MS decode or stereo width tool, predelay 10–20 ms.
- Late IR: mono or narrow stereo, darker HF decay (lowpass around 7–10 kHz) and reduced 2–4 kHz to avoid presence masking.
- Blend: early higher than late; keep late tail 10–15 dB below early energy for pop clarity.
Why it works: The early layer provides lateral cues (width) without pushing the vocal back; the late layer supplies sustain and luxury without turning consonants into haze.
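The “10–15 dB below” blend in Case B can be set numerically. This NumPy sketch treats layer energy as the plain sum of squared samples (an assumption; a loudness-weighted measure may track perception better) and returns the linear gain that places the late layer a chosen offset below the early layer. The helper name is hypothetical.

```python
import numpy as np

def late_gain_for_offset(early: np.ndarray, late: np.ndarray,
                         offset_db: float = 12.0) -> float:
    """Linear gain that puts the late layer's energy offset_db below the early layer's."""
    e_early = np.sum(np.square(early))
    e_late = np.sum(np.square(late))
    current_db = 10.0 * np.log10(e_late / e_early)   # late relative to early, in dB
    return 10.0 ** ((-offset_db - current_db) / 20.0)
```

With an early layer and a late layer of equal energy, a 12 dB offset gives a gain of about 0.25.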
Case study C: Hardware “color convolution” for post and sound design
Goal: Impose a recognizable material or device fingerprint (megaphone, handset, resonant cavity, passive filter network).
Method:
- Measure a device chain using ESS at 96 kHz, moderate level to keep it linear.
- Trim IR to the device’s significant response (10–200 ms typical for resonant objects; shorter for filters).
- Minimum-phase convert to stabilize transient behavior if the goal is tonal imprint rather than spatial realism.
- Combine with mild saturation after convolution to restore the nonlinear “push” users expect from hardware.
Why it works: Convolution excels at linear coloration and resonant signatures. Paired with nonlinear processing, it becomes a reliable texture engine for post effects that must match across scenes.
6) Common misconceptions (and corrections)
- Misconception: “An IR is the room.”
Correction: An IR is the room plus loudspeaker, mic, placement, directivity, and the specific source/receiver geometry. Changing the mic pattern from omni to cardioid materially changes HF balance and early reflection capture.
- Misconception: “Longer IR always sounds more realistic.”
Correction: Past a point, you’re convolving noise floor and HVAC. Realism comes from clean early structure and plausible decay behavior, not necessarily a 15-second tail.
- Misconception: “Convolution can’t be creative.”
Correction: IR editing (time slicing, band-envelope shaping, spectral tilts, reverse segments) enables textures algorithmic reverbs don’t naturally produce, especially “found spaces” and object resonances.
- Misconception: “Stereo IRs are just wider.”
Correction: True stereo is about cross-channel reflection energy. It changes depth and envelopment perception at the same send level, not just width.
- Misconception: “EQ before or after convolution is equivalent.”
Correction: That holds only for static, linear EQ: LTI stages commute, so a fixed EQ on the wet path gives the same result on either side of the convolver. Order matters once the chain includes dynamic or nonlinear processing (de-essers, dynamic EQ, compression, saturation). Processing before the convolver changes what excites the IR, which is why taming sibilance there is often more effective; processing after it shapes the resulting composite, which is the more typical way to fit the reverb into a mix.
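A quick numerical check of where processing order matters, assuming NumPy: a static three-tap FIR stands in for a fixed EQ, and a tanh soft clip stands in for any nonlinear or level-dependent stage. Both are illustrative choices, not specific plug-ins, and the function names are hypothetical.

```python
import numpy as np

def saturate(s: np.ndarray, drive: float = 2.0) -> np.ndarray:
    """Stand-in nonlinear stage: tanh soft clip."""
    return np.tanh(drive * s)

def order_matters():
    rng = np.random.default_rng(0)
    x = rng.standard_normal(2000)                               # dry signal
    ir = rng.standard_normal(500) * np.exp(-np.arange(500) / 100.0)
    eq = np.array([0.25, 0.5, 0.25])                            # static FIR "EQ" (gentle low-pass)

    linear_same = bool(np.allclose(np.convolve(np.convolve(x, eq), ir),
                                   np.convolve(np.convolve(x, ir), eq)))
    nonlinear_same = bool(np.allclose(np.convolve(saturate(x), ir),
                                      saturate(np.convolve(x, ir))))
    return linear_same, nonlinear_same

print(order_matters())  # (True, False)
```

The first comparison holds because convolutions commute; the second fails because the clipper reacts to the signal level it sees, which differs before and after the IR.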
7) Future trends: beyond static IRs
Several developments are pushing convolution from static realism toward dynamic texture engines:
- Dynamic convolution / level-dependent IR sets: multiple IRs captured at different drive levels or conditions, interpolated in real time. This addresses the “linear-only” limitation, especially for springs, plates, and certain analog chains.
- Time-varying convolution (modulated IRs): subtle randomized micro-shifts or phase modulation to reduce metallic repetition. Expect more hybrid engines that combine algorithmic modulation with convolution-derived early structure.
- Spatial/immersive IR workflows: higher-order ambisonic (HOA) IRs and object-based rendering for Atmos and beyond. Texture creation becomes about steering reflection energy and maintaining timbral consistency across playback layouts.
- Machine-learning assisted IR cleaning and synthesis: denoising tails without destroying decay character, estimating frequency-dependent decay curves, and generating “plausible” late fields from measured early reflections (hybrid convolution/algorithmic tails).
One practical near-term direction: measured early reflections + synthetic late reverb. Early reflections are the most location-specific and difficult to fake; late tails can be generated with high quality and modulation. Hybrid engines reduce CPU and increase mix flexibility while preserving the “fingerprint” that makes convolution attractive.
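A bare-bones version of that hybrid idea, assuming NumPy: keep the measured early segment and append exponentially decaying Gaussian noise with a chosen RT60, scaled so its first 10 ms match the RMS of the last 10 ms of the measured part. It is mono and uses a single broadband decay; a practical engine would add frequency-dependent decay (for example with the multiband envelope approach from section 3.3), decorrelated channels, and a short crossfade at the junction. The function name and defaults are hypothetical.

```python
import numpy as np

def hybrid_ir(early_ir: np.ndarray, fs: int, rt60: float = 1.8,
              length_s: float = 2.5, match_ms: float = 10.0, seed: int = 0) -> np.ndarray:
    """Measured early segment followed by a synthetic, level-matched late tail."""
    rng = np.random.default_rng(seed)
    n_total = int(round(length_s * fs))
    n_match = int(round(match_ms * fs / 1000.0))
    n_tail = n_total - len(early_ir)
    if n_tail < n_match or len(early_ir) < n_match:
        raise ValueError("segments too short for level matching")
    t = np.arange(n_tail) / fs
    # Amplitude falls by 60 dB (a factor of 1000) over rt60 seconds.
    tail = rng.standard_normal(n_tail) * np.exp(-np.log(1000.0) * t / rt60)
    ref_rms = np.sqrt(np.mean(early_ir[-n_match:] ** 2))
    tail *= ref_rms / np.sqrt(np.mean(tail[:n_match] ** 2))
    return np.concatenate([early_ir, tail])
```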
8) Key takeaways for practicing engineers
- Texture is timing + density + decay shape, not just “a reverb sound.” Treat the IR as an editable object with anatomy.
- Capture cleanly: ESS sweeps (10–30 s), sufficient headroom, and a tail noise floor ideally < −70 dB relative to peak will outperform “cool” spaces captured poorly.
- Align and window deliberately: define the direct/early start, set predelay intentionally, and consider splitting early/late layers for control.
- Shape frequency-dependent decay with multiband envelopes when you need realism or mix-fit; EQ alone doesn’t replicate time-varying absorption.
- Choose phase strategy based on purpose: true-phase for spatial realism; minimum-phase for tonal/color IR applications.
- True stereo matters when you want width and envelopment at low send levels; it’s not just “stereo marketing.”
- Hybridize when needed: add subtle modulation, saturation, or algorithmic late tails to overcome the static, linear nature of classic convolution.
Convolution texture creation is less about finding the perfect IR and more about building a repeatable chain: disciplined capture, careful deconvolution, purposeful editing, and mix-aware deployment. When you treat IRs as engineered assets—measured, validated, and sculpted—you gain a level of spatial and tonal authorship that’s difficult to achieve any other way.









