The Physics of Comb Filtering Explained

By Marcus Chen · April 28, 2026


1) Project overview: what, where, who, and why

In March 2026, Sonus Gear Flow was brought into a mid-sized corporate venue in Austin, Texas to troubleshoot a problem that sounded simple but behaved like a physics lesson: speech was intelligible in some seats and hollow or “phasey” in others. The room was a 22 m (72 ft) wide by 30 m (98 ft) deep multipurpose hall with a 6.5 m (21 ft) ceiling, used for quarterly town halls, panel discussions, and hybrid livestreams.

The client’s internal AV team had recently upgraded to a more “modern” audio chain—two flown line-array modules per side, a DSP-driven matrix, and multiple front fills. They also added a confidence foldback wedge for presenters and deployed two floor-standing delay speakers for overflow seating. On paper, coverage looked fine. In reality, the room had strong comb filtering across large sections of the audience and, more critically, in the livestream feed when multiple mics were open.

The team on-site included one system engineer (lead), one RF/mic tech, and a project manager coordinating with facilities and the client’s broadcast operator. The goal was not just to “fix the sound,” but to document a repeatable process the client could use for future room changes. We also wanted to turn the outcome into a training artifact: comb filtering is often discussed abstractly; this was a chance to show it as an operational issue with measurable causes and fixes.

2) Challenges and requirements at the outset

At kickoff, the client listed four requirements:

Improve intelligibility across the first 18 rows where executives typically sit.
Stabilize tonal consistency so a presenter’s voice didn’t change dramatically as they paced.
Fix livestream coloration caused by open mics and acoustic leakage from PA into the stage mics.
Keep downtime minimal: one overnight window (6:00 pm–2:00 am) plus a half-day the next morning for verification.

Constraints shaped the approach. The rigging points were fixed, and ceiling access required a lift scheduled weeks out, so we couldn’t reposition the arrays. The room had glass on the back wall and painted drywall side walls, all reflective surfaces that produce strong early reflections. The client also insisted on keeping front fills because VIP seating extended under the balcony lip. Finally, the AV team had been “tuning by ear” during events, which left inconsistent DSP presets and ad hoc delays.

Comb filtering was audible in two classic situations:

Multiple sources reproducing the same program with misaligned arrival times (mains + front fills + delays).
Early reflections arriving within roughly 5–30 ms of the direct sound, creating frequency-dependent cancellations.

Our job was to determine which of those dominated, quantify it, then implement corrections that survived normal operational use.

3) Approach and methodology chosen

We treated this like a forensic alignment project. The core methodology was:

Measure before changing: capture impulse responses, coherence, and transfer functions at representative listener positions.
Separate causes: isolate loudspeaker-to-loudspeaker interference from reflection-driven comb filtering.
Align by precedence: set delays so the intended “priority” source arrives first by a controlled margin (typically 5–10 ms), minimizing destructive summation while preserving localization.
Control coverage, not EQ: avoid “EQing out comb filtering.” Use timing, level, and directivity decisions first.

Tools and instrumentation were chosen for speed and repeatability:

Measurement: Rational Acoustics Smaart v9 on a laptop with a Focusrite Scarlett 2i2 interface.
Mics: Earthworks M30 (reference), plus a Shure SM58 as a “sanity check” for what common mics hear.
DSP access: Q-SYS Core 110f (existing), with control via Q-SYS Designer.
Console: Yamaha DM7 (existing), for routing pink noise and talkback, and verifying livestream sends.

We defined nine measurement positions: three in the front seating, three mid-room, two under the balcony, and one at FOH. We also measured on-stage at the lectern and at the panel table to understand PA spill into microphones.

4) Step-by-step execution narrative

Step 1: Establish a baseline and confirm symptom patterns (6:30–7:15 pm).
We started with a speech playback track and pink noise routed through the mains only. Then we added the front fills, then the delays, then the stage wedge. The “phasey” character increased dramatically when the front fills were enabled, and became worse when the delay speakers were enabled without event-specific delay updates. That immediately pointed to multi-source interaction rather than reflections as the primary driver, though reflections were clearly present.

Step 2: Impulse response capture and arrival-time mapping (7:15–8:20 pm).
Using Smaart’s impulse response measurement, we captured arrival times at each listening position for mains, front fills, and delays independently. A consistent pattern appeared in the first 10 rows: front fills were arriving only 0.6–2.5 ms after the mains, depending on seat position. That is a worst-case window for comb filtering: the offset is small enough that the first and widest cancellation notches land squarely in the midrange, and it changes from seat to seat.
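
Smaart does this inside the analyzer, but the underlying idea of reading arrival times off impulse responses is simple enough to sketch. The following is a minimal Python illustration, not Smaart’s method: it assumes each impulse response has been exported as a list of samples captured against the same time reference, and it treats the largest-magnitude sample as the arrival. The function names and the synthetic data are hypothetical.

```python
def arrival_time_ms(impulse_response, sample_rate_hz):
    """Time (ms) of the largest-magnitude sample in an impulse response.

    Peak picking assumes the direct sound is the strongest arrival. That usually
    holds close to a loudspeaker but can fail where a reflection or another
    source is louder, so treat the result as a first estimate.
    """
    if not impulse_response:
        raise ValueError("impulse response is empty")
    peak_index = max(range(len(impulse_response)), key=lambda i: abs(impulse_response[i]))
    return 1000.0 * peak_index / sample_rate_hz


def relative_arrival_ms(ir_reference, ir_other, sample_rate_hz):
    """Positive result: ir_other arrives later than ir_reference."""
    return arrival_time_ms(ir_other, sample_rate_hz) - arrival_time_ms(ir_reference, sample_rate_hz)


if __name__ == "__main__":
    fs = 48_000
    # Synthetic, hypothetical impulse responses (one spike each) standing in
    # for exported measurements: mains peak at 30 ms, fills peak at 31.5 ms.
    mains_ir = [0.0] * 4800
    fills_ir = [0.0] * 4800
    mains_ir[1440] = 1.0
    fills_ir[1512] = 0.8
    # prints: fills arrive 1.50 ms after mains
    print(f"fills arrive {relative_arrival_ms(mains_ir, fills_ir, fs):.2f} ms after mains")
```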

To connect those 0.6–2.5 ms offsets to physics in practical terms: a time offset of 1 ms corresponds to about 0.343 m (13.5 in) of path difference. When two equal-level copies of a signal sum, the first deep cancellation occurs at the frequency where the offset equals half a period (half a wavelength of path difference). For a 1 ms offset that frequency is 500 Hz: one cycle at 500 Hz lasts 2 ms, so a 1 ms delay puts the two copies exactly out of phase. Further notches repeat every 1 kHz above it (1.5 kHz, 2.5 kHz, and so on). A 1 ms offset therefore cuts deep notches through the vowel-formant and consonant range, exactly where speech intelligibility lives.
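
The half-period rule extends to every notch, not just the first. A short sketch, assuming two equal-level copies and a speed of sound of 343 m/s, lists the cancellation frequencies for offsets across the 0.6–2.5 ms range measured in the front rows; the helper names are illustrative.

```python
SPEED_OF_SOUND_M_S = 343.0  # approximate value at about 20 °C


def path_difference_m(offset_ms):
    """Extra path length that produces a given arrival-time offset."""
    return SPEED_OF_SOUND_M_S * offset_ms / 1000.0


def notch_frequencies_hz(offset_ms, max_hz=20_000.0):
    """Cancellation frequencies for two equal-level copies offset by offset_ms.

    A notch falls wherever the offset equals an odd number of half-periods:
    f_k = (2k + 1) / (2 * dt) for k = 0, 1, 2, ..., so notches repeat every 1 / dt.
    """
    if offset_ms <= 0:
        raise ValueError("offset must be positive")
    dt = offset_ms / 1000.0
    notches = []
    k = 0
    while True:
        f = (2 * k + 1) / (2 * dt)
        if f > max_hz:
            return notches
        notches.append(f)
        k += 1


for offset in (0.6, 1.0, 2.5):
    notches = notch_frequencies_hz(offset, max_hz=4000.0)
    listed = ", ".join(f"{f:.0f}" for f in notches)
    print(f"{offset} ms ({path_difference_m(offset):.2f} m): notches below 4 kHz at {listed} Hz")
```

For 1 ms this reproduces the 500 Hz first notch, with further notches every 1 kHz. At 2.5 ms the notches fall every 400 Hz (200, 600, 1000, 1400 Hz, and so on), which is why small offset changes move cancellations around the vocal range so audibly.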

Step 3: Identify whether the issue is delay polarity, processing latency, or geometry (8:20–9:00 pm).
We confirmed all outputs were polarity-consistent using a polarity pulse and by checking the transfer-function phase traces. No inverted wiring was found. Processing latency differences, however, were real: the front fill output path included 0.75 ms of FIR filtering that the mains path did not, and the delay-speaker outputs used a different limiter look-ahead that added 1.2 ms beyond the nominal delay setting. These are normal DSP realities, but they have to be included in the time alignment.
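
To see why a 0.75 ms latency difference matters, it helps to model each path’s arrival as flight time plus the delay setting plus any hidden processing latency. The sketch below uses hypothetical distances chosen so the acoustic paths differ by exactly 1 ms; adding 0.75 ms of FIR latency to the fill path pushes the offset to 1.75 ms and moves the first notch from about 500 Hz down to about 286 Hz. The class and field names are illustrative, not a Q-SYS API.

```python
from dataclasses import dataclass

SPEED_OF_SOUND_M_S = 343.0


@dataclass
class OutputPath:
    """One loudspeaker output as heard from one seat. Values below are hypothetical."""
    distance_m: float
    set_delay_ms: float = 0.0
    processing_latency_ms: float = 0.0  # FIR, limiter look-ahead, extra I/O stages

    def arrival_ms(self, include_latency=True):
        flight_ms = 1000.0 * self.distance_m / SPEED_OF_SOUND_M_S
        latency_ms = self.processing_latency_ms if include_latency else 0.0
        return self.set_delay_ms + latency_ms + flight_ms


def first_notch_hz(offset_ms):
    """First cancellation for two equal-level copies: the offset equals half a period."""
    if offset_ms == 0:
        raise ValueError("zero offset has no comb notch")
    return 1000.0 / (2.0 * abs(offset_ms))


mains = OutputPath(distance_m=6.86)                              # 20 ms of flight
fill = OutputPath(distance_m=7.203, processing_latency_ms=0.75)  # 21 ms of flight plus FIR

for label, include in (("distance only", False), ("with latency", True)):
    offset = fill.arrival_ms(include) - mains.arrival_ms(include)
    print(f"{label}: offset {offset:.2f} ms, first notch ~{first_notch_hz(offset):.0f} Hz")
```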

Step 4: Choose precedence strategy for each zone (9:00–10:10 pm).
We defined intended source priorities:

Front seating (rows 1–10): front fills should anchor imaging and clarity, with mains as support.
Mid-room: mains should dominate; front fills should be effectively invisible.
Under balcony / overflow: delays should dominate; mains should be late enough to avoid combing but not so late as to create an echo.

In the front zone, letting the fills lead would have required delaying the mains, and that was not an option: the mains are a single feed serving the whole room, and a signal can only be delayed, never advanced. So the workable strategy was the reverse: delay the front fills so they arrive later than the mains, by enough time to reduce destructive summation while keeping them useful as near-field support. The target offset we selected was 7 ms at the measurement point where overlap was greatest. That moves the fills out of the tight window where the deepest midrange notches occurred and uses the precedence effect so localization stays with the earliest source (the mains), while the fills add level without fighting phase in the critical range.
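
The only-delay constraint can be expressed as a tiny calculation. This is a hypothetical sketch, not the tool used on site: it assumes arrival times were measured at the overlap seat with no added delay, and the numbers (fills trailing the mains by 1.5 ms, a 7 ms target) are illustrative rather than this project’s measurements.

```python
def added_delay_ms(leader_arrival_ms, follower_arrival_ms, margin_ms):
    """Extra delay on the follower so it arrives margin_ms after the leader.

    Arrival times are measured at the overlap seat before any change. DSP can
    only add delay, so the answer is clamped at zero: if the follower already
    trails by at least margin_ms, it needs nothing.
    """
    current_lag_ms = follower_arrival_ms - leader_arrival_ms
    return max(0.0, margin_ms - current_lag_ms)


# Hypothetical overlap seat: the fills currently trail the mains by 1.5 ms.
mains_ms, fills_ms = 20.0, 21.5

# Fills lead by 7 ms: the extra delay lands on the mains, which also feed
# every other zone, so this option is rejected.
print(f"fills-first needs {added_delay_ms(fills_ms, mains_ms, 7.0):.1f} ms on the mains")

# Mains lead, fills follow 7 ms later: the extra delay lands on the fills.
print(f"mains-first needs {added_delay_ms(mains_ms, fills_ms, 7.0):.1f} ms on the fills")
```

Whichever source should lead, the added delay always lands on the other one. When that other one is a shared feed like the mains, the fills-first option is off the table.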

Step 5: Implement and verify front fill timing and level (10:10–11:20 pm).
Front fills were six compact coaxials (client-owned) mounted along the stage edge, previously set at 0 ms delay. We applied 7.5 ms of delay to the fills, then verified at three front-row positions. Comparing the mains-only and mains+fills transfer functions after the change showed improved coherence from 250 Hz to 4 kHz, and the magnitude response smoothed significantly, most noticeably where repeated 6–10 dB notches between 600 Hz and 1.6 kHz had been audible as “honkiness” and hollowness.

Then we reduced the fill level by 3 dB overall. The client had set the fills “hot” to give the first rows an impressive boost, but that increased interference. The combination of more delay and slightly less level reduced comb depth without sacrificing coverage.
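
Why a few dB of level matters: for two coherent copies that differ only in time and level, the sum swings between (1 + r) and (1 - r) times the louder copy’s amplitude, where r is the quieter copy’s relative amplitude. The sketch below tabulates peak and notch depth for a few hypothetical level differences; the project notes don’t give the absolute mains-to-fill ratio at each seat, so these are not measured values.

```python
import math


def comb_extremes_db(level_difference_db):
    """Peak and notch (dB re the louder source alone) for two coherent copies.

    Assumes both copies have the same frequency response and differ only in
    time and level, so their sum swings between (1 + r) and (1 - r), where r
    is the quieter copy's amplitude relative to the louder one.
    """
    if level_difference_db < 0:
        raise ValueError("give the level difference as a non-negative number of dB")
    r = 10 ** (-level_difference_db / 20.0)
    peak_db = 20 * math.log10(1 + r)
    notch_db = 20 * math.log10(1 - r) if r < 1 else float("-inf")
    return peak_db, notch_db


for diff_db in (0.0, 3.0, 6.0, 10.0):
    peak_db, notch_db = comb_extremes_db(diff_db)
    print(f"{diff_db:4.1f} dB apart: peak {peak_db:+.1f} dB, notch {notch_db:+.1f} dB, "
          f"peak-to-notch {peak_db - notch_db:.1f} dB")
```

The biggest gain comes near equal level: going from 0 dB to 3 dB apart turns a theoretically bottomless notch into one of roughly 11 dB, while going from 6 dB to 10 dB apart changes the notch depth by under 3 dB.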

Step 6: Align delay speakers with real processing latency accounted for (11:20 pm–12:35 am).
The delay speakers were positioned about 20 m from the mains. The AV team had previously used a simple distance-only calculation (20 m ÷ 343 m/s ≈ 58 ms) and set 58 ms. Measurements showed the effective offset at seats under the balcony was only about 51–53 ms once DSP latency differences were considered, putting the mains and delays too close in time and producing strong combing for those listeners.

We set the delay times to achieve a consistent 10 ms precedence advantage for the delay speakers in the under-balcony zone: delay speakers first, mains second. Again, delaying the mains wasn’t an option because they serve the whole room, so the only lever was the delay speakers’ own setting, and to make them arrive first you reduce their delay. We reduced the delay time from 58 ms to 46.5 ms, then iterated in 0.5 ms steps. The final setting was 46.0 ms, which, combined with the measured processing latencies, produced roughly a 9–11 ms advantage for the delay speakers at the key under-balcony positions.
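
The same reasoning can be turned into a small solver for the delay-speaker setting. This is a sketch under stated assumptions: straight-line flight times at 343 m/s, a 1.2 ms look-ahead latency on the delay-speaker outputs (the figure measured in Step 3), and hypothetical seat distances. It solves for one reference seat, snaps the result to 0.5 ms steps as in the on-site iteration, then reports the advantage actually achieved at neighboring seats.

```python
SPEED_OF_SOUND_M_S = 343.0


def flight_ms(distance_m):
    """Straight-line travel time in ms."""
    return 1000.0 * distance_m / SPEED_OF_SOUND_M_S


def delay_setting_ms(seat_to_mains_m, seat_to_delay_m, advantage_ms,
                     mains_latency_ms=0.0, delay_latency_ms=0.0):
    """Delay-field value that makes the delay speaker lead the mains by advantage_ms.

    The latency arguments are processing delay that the delay field does not
    show (FIR, limiter look-ahead, extra I/O stages).
    """
    mains_arrival_ms = flight_ms(seat_to_mains_m) + mains_latency_ms
    setting = mains_arrival_ms - advantage_ms - flight_ms(seat_to_delay_m) - delay_latency_ms
    if setting < 0:
        raise ValueError("target advantage is not reachable by adding delay")
    return setting


def achieved_advantage_ms(setting_ms, seat_to_mains_m, seat_to_delay_m,
                          mains_latency_ms=0.0, delay_latency_ms=0.0):
    """How far ahead of the mains the delay speaker actually arrives at a seat."""
    mains_arrival_ms = flight_ms(seat_to_mains_m) + mains_latency_ms
    delay_arrival_ms = setting_ms + delay_latency_ms + flight_ms(seat_to_delay_m)
    return mains_arrival_ms - delay_arrival_ms


# Hypothetical under-balcony seats: (distance to mains, distance to delay speaker) in m.
SEATS = [(23.0, 3.5), (24.0, 4.0), (25.5, 5.0)]
LOOKAHEAD_MS = 1.2  # extra latency measured on the delay-speaker outputs

exact = delay_setting_ms(*SEATS[1], advantage_ms=10.0, delay_latency_ms=LOOKAHEAD_MS)
setting = round(exact * 2) / 2  # snap to 0.5 ms steps
print(f"solved {exact:.2f} ms, using {setting:.1f} ms")
for seat in SEATS:
    adv = achieved_advantage_ms(setting, *seat, delay_latency_ms=LOOKAHEAD_MS)
    print(f"seat {seat}: delay speaker leads by {adv:.1f} ms")
```

One delay value can hit the target exactly at only one position, so a spread across seats is expected, which is consistent with quoting the result as a range rather than a single number.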

Step 7: Address reflection-driven comb filtering with practical mitigation (12:35–1:25 am).
Not all comb filtering was from multiple loudspeakers. The glass back wall generated a strong reflection arriving around 18–22 ms after the direct sound at mid-room positions. Notch spacing equals one over the delay, so that reflection produces notches roughly 1/0.022 s ≈ 45 Hz to 1/0.018 s ≈ 55 Hz apart, which shows up as a repeating ripple in the magnitude response. You can’t eliminate that with timing changes; it’s a room behavior.
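
A single reflection behaves like a second, quieter copy of the source, so the same comb math applies. A minimal sketch, assuming one reflection with a frequency-independent gain (a simplification; the 0.5 gain, about 6 dB down, is hypothetical), evaluates the direct-plus-reflection response on a fine frequency grid. With a 20 ms reflection the level alternates between about +3.5 dB and -6.0 dB every 25 Hz, a full ripple cycle every 50 Hz.

```python
import cmath
import math


def notch_spacing_hz(reflection_delay_ms):
    """Spacing between comb notches for a single reflection: 1 / delay."""
    return 1000.0 / reflection_delay_ms


def direct_plus_reflection_db(freq_hz, reflection_delay_ms, reflection_gain):
    """Level (dB re the direct sound alone) of direct sound plus one reflection.

    reflection_gain is the reflection's amplitude relative to the direct sound
    (0 to 1) and is treated as frequency-independent, which real glass and
    drywall are not.
    """
    tau_s = reflection_delay_ms / 1000.0
    h = 1 + reflection_gain * cmath.exp(-2j * math.pi * freq_hz * tau_s)
    return 20 * math.log10(max(abs(h), 1e-12))  # floor avoids log10(0)


for delay_ms in (18.0, 20.0, 22.0):
    print(f"{delay_ms:.0f} ms reflection: notches every {notch_spacing_hz(delay_ms):.1f} Hz")

# Hypothetical: one 20 ms reflection, about 6 dB below the direct sound (gain 0.5).
for freq_hz in range(1000, 1101, 25):
    print(f"{freq_hz} Hz: {direct_plus_reflection_db(freq_hz, 20.0, 0.5):+.1f} dB")
```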

With no budget or time for architectural treatment during this window, we made two practical adjustments:

Reduce high-frequency energy aimed at the rear wall by adjusting the mains’ HF shading. The installed arrays supported manufacturer presets with adjustable HF taper; we reduced HF by 2 dB on the lower module and adjusted aiming in software (no physical movement) to shift energy down into the audience rather than back toward the wall.
Update the broadcast mix strategy so the livestream relied less on “room sound” captured by open mics. We tightened automix parameters and added high-pass filtering to reduce low-mid buildup that made reflections more audible online.

Step 8: Morning verification with real program and mic workflow (9:00–11:30 am next day).
We returned with the client’s typical event setup: two lavs (Shure ULX-D), two handhelds, a lectern gooseneck, and a panel boundary mic. We walked the room during live speech and verified at multiple seats. We also checked that changing console scenes did not overwrite DSP delay values—this had happened before. The project manager worked with the client to lock DSP parameters behind a “tuning” user role.

5) Technical decisions and trade-offs made

We did not EQ away comb filtering. Some notches were 8–12 dB deep and moved with listener position. EQ would only “fix” one seat and worsen others. We used EQ only after timing/level decisions, and only for broad tonal shaping (e.g., a gentle -2 dB shelf above 8 kHz on front fills to match voicing).

We accepted a precedence-based compromise. Setting fills 7.5 ms behind mains slightly reduced their ability to improve localization in the first row, but it prevented the audible “swim” that was the larger complaint. For speech reinforcement, stable timbre and intelligibility beat razor-sharp localization.

We chose fewer, stronger zones over many weak overlaps. The previous configuration attempted to have mains, fills, and delays all contribute everywhere. We retuned so that in any seating area, one system is clearly primary and the others are supportive but not time-competitive.

We documented DSP latency explicitly. The hidden trade-off of advanced processing (FIR, look-ahead limiting, multiple I/O stages) is latency variance. We logged measured latencies per output so future changes wouldn’t reintroduce alignment errors.

6) Results and outcomes with specific details

The improvements were measurable and audible:

Front seating consistency: At three front-row measurement points, magnitude response ripple between 500 Hz and 2 kHz reduced from roughly ±6–8 dB to ±3–4 dB with mains+fills enabled.
Coherence improvement: Average coherence in the critical speech band (250 Hz–4 kHz) improved from ~0.55–0.70 to ~0.75–0.85 at front positions when combining sources.
Under-balcony clarity: After delay adjustment, the “double image” effect disappeared. Audience members seated under the balcony reported the sound as “more direct” and less hollow. Measurements showed fewer deep notches and reduced phase wrap in the crossover region.
Livestream tonal stability: With automix tightened and open-mic gain reduced by 3–6 dB on average, the livestream feed exhibited less room coloration and fewer audible comb artifacts when multiple panel mics were active.

Timeline-wise, the heavy lifting fit into the planned window: about 7 hours overnight for measurement and tuning, plus 2.5 hours the next morning for verification and documentation. The client’s AV lead requested the final “alignment sheet” as a one-page reference: delays, levels, and which zones should be active for each room configuration.

7) Lessons learned and what could be done differently

Comb filtering is often self-inflicted. The strongest artifacts were not the room’s fault. They came from multiple loudspeakers reproducing the same content with small time offsets—exactly the condition that generates dense midrange notches.

Distance-based delay math is a starting point, not an answer. DSP latency differences of 0.7–1.2 ms per path mattered. That’s enough to shift cancellation notches into or out of the vocal range. Next time, we’d measure latency per output immediately and build alignment from that baseline instead of discovering it mid-process.

Operational discipline prevents regressions. The room had been tuned repeatedly, but values drifted because scenes and presets overwrote DSP. Locking parameters and documenting a change-control process was as important as the acoustic work.

Some comb filtering is the room speaking back. The rear-wall reflection remained visible in measurements. If the venue budgets for improvements, adding absorption or diffusion on the back wall (even 30–40 m² of treatment) would reduce the 18–22 ms reflection and improve consistency without touching the PA.

8) Takeaways applicable to other projects

Use precedence intentionally: In overlap zones, pick a primary source and make other sources clearly earlier or later by 5–10 ms, rather than “almost aligned.” “Almost” is where comb filtering is worst.
Measure impulse response and transfer function together: Impulse response shows arrival times; transfer function shows the audible consequence (coherence, magnitude ripple). You need both to make confident decisions.
Don’t EQ notches that move: If the cancellation changes with position, timing/level/directivity are the tools. EQ is for stable, broad issues.
Account for DSP latency and processing variance: FIR filters, limiters, and different signal paths add latency. Measure it and write it down. Your future self will thank you.
Document and lock: Once tuned, protect settings from scene recalls and unauthorized edits. Comb filtering often returns because someone “nudged a delay” during a show and forgot.
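
Documenting and locking can be backed by a simple audit. The sketch below compares a live readout of delay values against the alignment sheet and reports anything that drifted. The 7.5 ms and 46.0 ms values are this project’s final settings; the output names, the tolerance, and the live readout are hypothetical, and pulling live values from the DSP is outside the scope of this sketch.

```python
def find_drift(sheet_ms, live_ms, tolerance_ms=0.05):
    """Compare live DSP delay values against the documented alignment sheet.

    Both arguments map an output name to a delay in ms. Returns a list of
    readable problems; an empty list means nothing has drifted.
    """
    problems = []
    for output, documented in sorted(sheet_ms.items()):
        live = live_ms.get(output)
        if live is None:
            problems.append(f"{output}: missing from live readout")
        elif abs(live - documented) > tolerance_ms:
            problems.append(f"{output}: live {live:.2f} ms vs documented {documented:.2f} ms")
    for output in sorted(set(live_ms) - set(sheet_ms)):
        problems.append(f"{output}: not on the alignment sheet")
    return problems


# Final delay settings from this project; output names are hypothetical labels.
ALIGNMENT_SHEET_MS = {"front_fills": 7.5, "delay_speakers": 46.0}

# Hypothetical live readout after someone "nudged a delay" during a show.
live_readout_ms = {"front_fills": 5.0, "delay_speakers": 46.0}

for problem in find_drift(ALIGNMENT_SHEET_MS, live_readout_ms):
    print(problem)
```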

Comb filtering isn’t just a concept from textbooks—it’s what happens when two versions of the same sound arrive close together in time. This project reinforced a practical definition: if your system design allows multiple sources to compete in the same seats within a few milliseconds, the physics will show up in the most unforgiving place—human voice. The fix is rarely glamorous. It’s measurement, deliberate precedence, disciplined zoning, and documentation that keeps the system aligned long after the tuning laptop is packed away.