The Physics of Comb Filtering Explained

By Marcus Chen

1) Project overview: what, where, who, and why

In March 2026, Sonus Gear Flow was brought into a mid-sized corporate venue in Austin, Texas to troubleshoot a problem that sounded simple but behaved like a physics lesson: speech was intelligible in some seats and hollow or “phasey” in others. The room was a 22 m (72 ft) wide by 30 m (98 ft) deep multipurpose hall with a 6.5 m (21 ft) ceiling, used for quarterly town halls, panel discussions, and hybrid livestreams.

The client’s internal AV team had recently upgraded to a more “modern” audio chain—two flown line-array modules per side, a DSP-driven matrix, and multiple front fills. They also added a confidence foldback wedge for presenters and deployed two floor-standing delay speakers for overflow seating. On paper, coverage looked fine. In reality, the room had strong comb filtering across large sections of the audience and, more critically, in the livestream feed when multiple mics were open.

The team on-site included one system engineer (lead), one RF/mic tech, and a project manager coordinating with facilities and the client’s broadcast operator. The goal was not just to “fix the sound,” but to document a repeatable process the client could use for future room changes. We also wanted to turn the outcome into a training artifact: comb filtering is often discussed abstractly; this was a chance to show it as an operational issue with measurable causes and fixes.

2) Challenges and requirements at the outset

At kickoff, the client listed four requirements:

Constraints shaped the approach. The rigging points were fixed; ceiling access required a lift scheduled weeks out, so we couldn’t reposition the arrays. The room had glass on the back wall and painted drywall side walls, reflective surfaces that produce strong early reflections. The client also insisted on keeping front fills because VIP seating extended under the balcony lip. Finally, the AV team had been “tuning by ear” during events, which created inconsistent DSP presets and ad hoc delays.

Comb filtering was audible in two classic situations:

Our job was to determine which of those dominated, quantify it, then implement corrections that survived normal operational use.

3) Approach and methodology chosen

We treated this like a forensic alignment project. The core methodology was:

Tools and instrumentation were chosen for speed and repeatability:

We defined nine measurement positions: three in the front seating, three mid-room, two under the balcony, and one at FOH. We also measured on-stage at the lectern and at the panel table to understand PA spill into microphones.

4) Step-by-step execution narrative

Step 1: Establish a baseline and confirm symptom patterns (6:30–7:15 pm).
We started with a speech playback track and pink noise routed through the mains only. Then we added the front fills, then the delay speakers, then the stage wedge. The “phasey” character increased dramatically when the front fills were enabled, and became worse still when the delay speakers were enabled without event-specific delay updates. That immediately pointed to multi-source interaction rather than reflections as the primary driver, though reflections were clearly present.

Step 2: Impulse response capture and arrival-time mapping (7:15–8:20 pm).
Using Smaart’s impulse response measurement, we captured arrival times at each listening position for the mains, front fills, and delays independently. A consistent pattern appeared in the first 10 rows: the front fills were arriving only 0.6–2.5 ms after the mains, depending on seat position. That is a worst-case window for comb filtering. An offset that short puts the first notch between roughly 200 Hz and 830 Hz and repeats it every 400 Hz to 1.7 kHz, so the notches are wide and widely spaced, carving whole sections out of the midrange rather than thin slices.

To connect that to physics in practical terms: a time offset of 1 ms corresponds to about 0.343 m (13.5 in) of path difference. The first deep cancellation in a two-source sum occurs at the frequency where the offset equals half a period (equivalently, where the path difference equals half a wavelength). For 1 ms, that frequency is 500 Hz: its period is 2 ms, so a 1 ms lag puts the two copies exactly out of phase. At 1 kHz the same lag is a full cycle and the copies add again; the next cancellation falls at 1.5 kHz, and the pattern repeats every 1 kHz. That means a 1 ms offset produces strong cancellations at 500 Hz, 1.5 kHz, 2.5 kHz, and 3.5 kHz, straight through the vocal harmonics and the consonant range where intelligibility lives.
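
The arithmetic above can be checked with a short Python sketch. It assumes a speed of sound of 343 m/s and two equal-level, same-polarity copies of the signal; the function names are illustrative and not part of any measurement tool.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed: dry air at roughly 20 °C


def path_difference_m(offset_ms: float) -> float:
    """Path-length difference (m) equivalent to an arrival-time offset (ms)."""
    return SPEED_OF_SOUND_M_S * offset_ms / 1000.0


def comb_notches_hz(offset_ms: float, max_hz: float = 20_000.0) -> list[float]:
    """Cancellation frequencies for two equal, same-polarity copies offset by offset_ms.

    The copies cancel where the offset is an odd number of half-periods,
    f_k = (2k + 1) / (2 * dt), so successive notches are 1 / dt apart.
    """
    if offset_ms <= 0:
        raise ValueError("offset_ms must be positive")
    dt = offset_ms / 1000.0
    notches = []
    k = 0
    while True:
        f = (2 * k + 1) / (2 * dt)
        if f > max_hz:
            break
        notches.append(f)
        k += 1
    return notches


print(round(path_difference_m(1.0), 3))                       # 0.343
print([round(f) for f in comb_notches_hz(1.0, max_hz=4000)])  # [500, 1500, 2500, 3500]
print([round(f) for f in comb_notches_hz(2.5, max_hz=1200)])  # [200, 600, 1000]
```

Running the 0.6 ms and 2.5 ms ends of the measured range through `comb_notches_hz` shows how quickly the notch pattern moves across the voice band as the offset changes by a millisecond or two.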

Step 3: Identify whether the issue is delay polarity, processing latency, or geometry (8:20–9:00 pm).
We confirmed all outputs were polarity-consistent using a polarity pulse and by verifying transfer-function phase traces. No inverted wiring was found. Processing latency differences, however, were real: the front fill output path included an additional 0.75 ms of FIR filter latency that the mains path did not have, and the delay-speaker outputs used a different limiter look-ahead that added 1.2 ms beyond the nominal delay setting. These are normal DSP realities, but they have to be included in time alignment.
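
To see why a fraction of a millisecond of hidden latency matters, here is a minimal sketch that models each path's arrival as flight time plus DSP delay setting plus processing latency. The seat geometry and the 1.0 ms nominal offset are hypothetical; only the 0.75 ms FIR figure comes from the measurements above, and 343 m/s is assumed.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed


def arrival_ms(distance_m: float, set_delay_ms: float = 0.0, latency_ms: float = 0.0) -> float:
    """Arrival time at a seat: acoustic flight time + DSP delay setting + hidden
    processing latency (FIR, limiter look-ahead, extra I/O stages)."""
    return 1000.0 * distance_m / SPEED_OF_SOUND_M_S + set_delay_ms + latency_ms


def first_notch_hz(offset_ms: float) -> float:
    """First cancellation frequency for two equal copies offset by offset_ms."""
    if offset_ms == 0:
        raise ValueError("zero offset: the copies sum without a comb")
    return 1000.0 / (2.0 * abs(offset_ms))


# Hypothetical seat 6 m from both the mains and a front fill.
mains = arrival_ms(6.0)
fill_nominal = arrival_ms(6.0, set_delay_ms=1.0)                  # what the delay field shows
fill_actual = arrival_ms(6.0, set_delay_ms=1.0, latency_ms=0.75)  # plus the FIR latency

print(round(first_notch_hz(fill_nominal - mains)))  # 500
print(round(first_notch_hz(fill_actual - mains)))   # 286
```

In this toy case the unlisted 0.75 ms pulls the first notch from 500 Hz down to roughly 286 Hz, which is why the measured latencies, not the delay fields alone, have to drive the alignment.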

Step 4: Choose precedence strategy for each zone (9:00–10:10 pm).
We defined intended source priorities:

In practice, adding delay to the mains was not an option in the front zone (the mains are a single feed serving the whole room), and a DSP cannot “advance” a signal, only delay it. So the workable strategy was to delay the front fills further, so that they arrive later than the mains by enough time to reduce destructive summation while still serving as near-field support. The target offset we selected was 7 ms at the measurement point where overlap was greatest. That moves the fills out of the tight comb-filter window and leverages the precedence effect: localization stays with the earliest source (the mains), while the fills add level without fighting them for phase in the critical range.
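
The “delay only, never advance” constraint can be written as a small helper. This is a sketch, not a DSP API: `added_delay_ms` is a hypothetical name, and the 1.0 ms and 9.0 ms measured offsets in the usage lines are illustrative values.

```python
def added_delay_ms(measured_offset_ms: float, target_offset_ms: float) -> float:
    """Extra DSP delay to put on the supporting source (hypothetical helper).

    measured_offset_ms: support arrival minus primary arrival at the reference
        seat (positive means the support already lands after the primary).
    target_offset_ms: how far behind the primary the support should land.
    A DSP can only add delay, so a negative result is rejected.
    """
    extra = target_offset_ms - measured_offset_ms
    if extra < 0:
        raise ValueError(
            f"would need {-extra:.2f} ms of advance; a signal can only be delayed"
        )
    return extra


print(added_delay_ms(measured_offset_ms=1.0, target_offset_ms=7.0))  # 6.0
try:
    added_delay_ms(measured_offset_ms=9.0, target_offset_ms=7.0)
except ValueError as err:
    print(err)  # would need 2.00 ms of advance; a signal can only be delayed
```

When the helper refuses, the only remaining options are to delay the other source or to change the target, which is exactly the choice described for the front zone.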

Step 5: Implement and verify front fill timing and level (10:10–11:20 pm).
Front fills were six compact coaxials (client-owned) mounted along the stage edge, previously set at 0 ms delay. We applied 7.5 ms of delay to the fills, then verified at three front-row positions. The transfer function between mains-only and mains+fills showed improved coherence from 250 Hz to 4 kHz, and the magnitude response smoothed significantly—particularly the repeated 6–10 dB notches between 600 Hz and 1.6 kHz that had been audible as “honkiness” and hollowness.

Then we reduced the fill level by 3 dB overall. The client had set the fills “hot” to give the first rows an impressive boost, but that extra level deepened the interference. The combination of more delay and slightly less level reduced comb depth without sacrificing coverage.
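
Why a level change helps: when the quieter copy has amplitude ratio a relative to the louder one, the sum swings between 1 + a (in phase) and 1 − a (out of phase). The sketch below uses a hypothetical helper in pure Python to print peak and notch depth for a few level differences. Note that the 3 dB trim described above is a change in fill level, not necessarily the final difference between fills and mains at any given seat.

```python
import math


def comb_extremes_db(level_diff_db: float) -> tuple[float, float]:
    """Peak and notch (dB re the louder copy) for two coherent copies level_diff_db apart.

    a = 10 ** (-diff / 20) is the quieter copy's amplitude ratio; in-phase
    frequencies sum to 1 + a and out-of-phase frequencies to 1 - a.
    """
    a = 10 ** (-abs(level_diff_db) / 20)
    peak = 20 * math.log10(1 + a)
    notch = 20 * math.log10(1 - a) if a < 1 else -math.inf  # equal levels: total cancellation
    return peak, notch


for diff in (0, 3, 6, 10):
    peak, notch = comb_extremes_db(diff)
    print(f"{diff:>2} dB apart: peak {peak:+.1f} dB, notch {notch:+.1f} dB")
```

The trend is the point: equal levels allow a theoretically bottomless notch, while each extra decibel of separation makes the comb shallower.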

Step 6: Align delay speakers with real processing latency accounted for (11:20 pm–12:35 am).
The delay speakers were positioned about 20 m from the mains. The AV team had previously used a simple distance/343 calculation and set 58 ms. Measurements showed the effective offset at seats under the balcony was only about 51–53 ms once DSP latency differences were considered, putting the mains and delays too close in time and producing strong combing for those listeners.

We set the delays to achieve a consistent 10 ms precedence advantage for the delay speakers in the under-balcony zone: delay speakers first, mains second. Delaying the mains was not an option, since they serve the whole room, and a signal cannot be advanced, so the only lever was the delay speakers’ own setting. To make them arrive first, we reduced their delay. We cut the delay time from 58 ms to 46.5 ms, then iterated in 0.5 ms steps. The final setting was 46.0 ms, which, combined with the measured processing latencies, produced roughly a 9–11 ms advantage for the delay speakers at the key under-balcony positions.
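
A latency-aware version of the delay calculation is sketched below. The function name and seat geometry (a listener 22 m from the mains and 2 m from a delay speaker) are hypothetical; the 1.2 ms look-ahead and the 10 ms target come from Steps 3 and 6, and 343 m/s is assumed for the speed of sound.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed


def delay_setting_ms(
    mains_to_seat_m: float,
    delay_spk_to_seat_m: float,
    mains_latency_ms: float,
    delay_spk_latency_ms: float,
    advantage_ms: float,
) -> float:
    """DSP delay for a delay speaker so it lands advantage_ms before the mains at a seat.

    Positive advantage_ms = delay speaker first (the under-balcony strategy);
    a negative value would make it arrive after the mains instead.
    """
    mains_arrival = 1000 * mains_to_seat_m / SPEED_OF_SOUND_M_S + mains_latency_ms
    flight = 1000 * delay_spk_to_seat_m / SPEED_OF_SOUND_M_S
    setting = mains_arrival - advantage_ms - flight - delay_spk_latency_ms
    if setting < 0:
        raise ValueError("requested advantage exceeds what delay alone can provide")
    return setting


naive = 1000 * 20.0 / SPEED_OF_SOUND_M_S  # distance-only estimate over the 20 m spacing
tuned = delay_setting_ms(22.0, 2.0, mains_latency_ms=0.0,
                         delay_spk_latency_ms=1.2, advantage_ms=10.0)
print(f"distance-only {naive:.1f} ms, latency-aware {tuned:.1f} ms")
```

The distance-only figure reproduces the 58 ms the AV team had used. The latency-aware figure depends entirely on the assumed seat geometry, which is why the final value still had to be confirmed by measurement at real seats.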

Step 7: Address reflection-driven comb filtering with practical mitigation (12:35–1:25 am).
Not all comb filtering was from multiple loudspeakers. The glass back wall generated a strong reflection arriving around 18–22 ms after the direct sound at mid-room positions. Notches from a single reflection are spaced 1/Δt apart, so that delay range gives a spacing of 1/0.022 s to 1/0.018 s, roughly 45–55 Hz, which shows up as a repeating ripple in the magnitude response. You can’t eliminate that with timing changes, because the direct sound and its reflection leave the same loudspeaker; it’s a room behavior.
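
A numerical check of that ripple: the sketch sums the direct sound with a single delayed, attenuated copy and reports the local minima of the level on a 1 Hz grid. The reflection gain of 0.5 (about 6 dB down) is an assumption for illustration, not a measured property of the glass wall.

```python
import cmath
import math


def two_path_db(freq_hz: float, delay_s: float, gain: float) -> float:
    """Level (dB) of direct sound plus one reflection delayed by delay_s and scaled by gain."""
    h = 1 + gain * cmath.exp(-2j * math.pi * freq_hz * delay_s)
    return 20 * math.log10(abs(h))


def notch_frequencies(delay_s: float, gain: float, f_max: int = 200) -> list[int]:
    """Local minima of the two-path response, sampled every 1 Hz up to f_max."""
    levels = [two_path_db(f, delay_s, gain) for f in range(f_max + 1)]
    return [f for f in range(1, f_max)
            if levels[f] < levels[f - 1] and levels[f] < levels[f + 1]]


print(notch_frequencies(0.020, 0.5))  # [25, 75, 125, 175]
```

With a 20 ms reflection the notches sit 50 Hz apart; changing loudspeaker delay leaves them untouched because both paths share the same source, so only reducing the reflection's gain (treatment) makes the ripple shallower.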

With no budget or time for architectural treatment during this window, we made two practical adjustments:

Step 8: Morning verification with real program and mic workflow (9:00–11:30 am next day).
We returned with the client’s typical event setup: two lavs (Shure ULX-D), two handhelds, a lectern gooseneck, and a panel boundary mic. We walked the room during live speech and verified at multiple seats. We also checked that changing console scenes did not overwrite DSP delay values—this had happened before. The project manager worked with the client to lock DSP parameters behind a “tuning” user role.

5) Technical decisions and trade-offs made

We did not EQ away comb filtering. Some notches were 8–12 dB deep and moved with listener position. EQ would only “fix” one seat and worsen others. We used EQ only after timing/level decisions, and only for broad tonal shaping (e.g., a gentle -2 dB shelf above 8 kHz on front fills to match voicing).
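
The reason EQ cannot follow these notches is geometric: the mains-to-fill path difference changes from seat to seat, so the whole comb slides. The sketch below uses a hypothetical side-section geometry (flown mains at 4.5 m, a stage-lip fill at 0.8 m, ears at 1.2 m), ignores DSP delay, and only illustrates the trend.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed

# Hypothetical side section, in metres: (distance from stage lip, height).
MAINS = (0.0, 4.5)
FILL = (0.0, 0.8)
EAR_HEIGHT_M = 1.2


def first_notch_at_seat(x_m: float) -> float:
    """First mains/fill cancellation (Hz) at a seat x_m from the stage, geometry only."""
    seat = (x_m, EAR_HEIGHT_M)
    offset_s = abs(math.dist(MAINS, seat) - math.dist(FILL, seat)) / SPEED_OF_SOUND_M_S
    return 1.0 / (2.0 * offset_s)


for x in (4.0, 7.0, 10.0):
    print(f"seat {x:.0f} m back: first notch near {first_notch_at_seat(x):.0f} Hz")
```

A fixed EQ boost at one seat's notch frequency lands on a peak a few rows away, which is the seat-to-seat trade-off described above.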

We accepted a precedence-based compromise. Delaying the fills by 7.5 ms slightly reduced their ability to improve localization in the first row, but it prevented the audible “swim” that was the larger complaint. For speech reinforcement, stable timbre and intelligibility beat razor-sharp localization.

We chose fewer, stronger zones over many weak overlaps. The previous configuration attempted to have mains, fills, and delays all contribute everywhere. We retuned so that in any seating area, one system is clearly primary and the others are supportive but not time-competitive.

We documented DSP latency explicitly. The hidden trade-off of advanced processing (FIR, look-ahead limiting, multiple I/O stages) is latency variance. We logged measured latencies per output so future changes wouldn’t reintroduce alignment errors.
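
A per-output log like this can double as a regression check after scene recalls. In this sketch the record fields, output names, and gain values are hypothetical; the delay and latency figures are the ones reported in this case study, with latencies expressed relative to the mains.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class OutputAlignment:
    """One row of a per-output alignment log (field names are illustrative)."""
    delay_ms: float             # delay value dialed into the DSP
    gain_db: float              # output level trim
    measured_latency_ms: float  # processing latency measured on top of delay_ms


def drift_report(baseline: dict[str, OutputAlignment],
                 current: dict[str, OutputAlignment],
                 tol_ms: float = 0.1, tol_db: float = 0.5) -> list[str]:
    """Describe every output whose current settings no longer match the log."""
    report = []
    for name, ref in baseline.items():
        now = current.get(name)
        if now is None:
            report.append(f"{name}: missing from current settings")
            continue
        if abs(now.delay_ms - ref.delay_ms) > tol_ms:
            report.append(f"{name}: delay {ref.delay_ms} -> {now.delay_ms} ms")
        if abs(now.gain_db - ref.gain_db) > tol_db:
            report.append(f"{name}: gain {ref.gain_db} -> {now.gain_db} dB")
        if abs(now.measured_latency_ms - ref.measured_latency_ms) > tol_ms:
            report.append(f"{name}: latency {ref.measured_latency_ms} -> "
                          f"{now.measured_latency_ms} ms")
    return report


# Delays and latencies from this case study; names and gains are hypothetical.
baseline = {
    "mains": OutputAlignment(delay_ms=0.0, gain_db=0.0, measured_latency_ms=0.0),
    "front_fills": OutputAlignment(delay_ms=7.5, gain_db=-3.0, measured_latency_ms=0.75),
    "under_balcony": OutputAlignment(delay_ms=46.0, gain_db=0.0, measured_latency_ms=1.2),
}

# Simulate a scene recall that restores the old distance-only delay.
current = dict(baseline)
current["under_balcony"] = replace(baseline["under_balcony"], delay_ms=58.0)

print(drift_report(baseline, current))  # ['under_balcony: delay 46.0 -> 58.0 ms']
```

Keeping the log as data rather than prose makes it cheap to re-run the comparison after every preset change.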

6) Results and outcomes with specific details

The improvements were measurable and audible:

Timeline-wise, the heavy lifting fit into the planned window: about 7 hours overnight for measurement and tuning, plus 2.5 hours the next morning for verification and documentation. The client’s AV lead requested the final “alignment sheet” as a one-page reference: delays, levels, and which zones should be active for each room configuration.

7) Lessons learned and what could be done differently

Comb filtering is often self-inflicted. The strongest artifacts were not the room’s fault. They came from multiple loudspeakers reproducing the same content with small time offsets—exactly the condition that generates dense midrange notches.

Distance-based delay math is a starting point, not an answer. DSP latency differences of 0.75–1.2 ms per path mattered. That’s enough to shift cancellation notches into or out of the vocal range. Next time, we’d measure latency per output immediately and build alignment from that baseline instead of discovering it mid-process.

Operational discipline prevents regressions. The room had been tuned repeatedly, but values drifted because scenes and presets overwrote DSP. Locking parameters and documenting a change-control process was as important as the acoustic work.

Some comb filtering is the room speaking back. The rear-wall reflection remained visible in measurements. If the venue budgets for improvements, adding absorption or diffusion on the back wall (even 30–40 m² of treatment) would reduce the 18–22 ms reflection and improve consistency without touching the PA.

8) Takeaways applicable to other projects

Comb filtering isn’t just a concept from textbooks—it’s what happens when two versions of the same sound arrive close together in time. This project reinforced a practical definition: if your system design allows multiple sources to compete in the same seats within a few milliseconds, the physics will show up in the most unforgiving place—human voice. The fix is rarely glamorous. It’s measurement, deliberate precedence, disciplined zoning, and documentation that keeps the system aligned long after the tuning laptop is packed away.