Sidechain Compression for Podcast and Spoken Word

By Marcus Chen

1) What you’ll learn (and why it matters)

Sidechain compression is a way to make one sound automatically move out of the way of another. In podcast and spoken-word work, that usually means your voice stays intelligible while music beds, intro themes, sound effects, or remote guests stay present but never compete with the message. Done well, it reduces manual automation, keeps levels consistent from episode to episode, and prevents the classic “music swallows the first sentence” problem.

This tutorial walks you through a practical, repeatable workflow: setting up routing, choosing a compressor, dialing in threshold/ratio/attack/release, filtering the sidechain so the compressor reacts to speech rather than plosives, and troubleshooting the common failures (pumping, late ducking, plosive-triggered dips, over-ducking, and music that never recovers).

2) Prerequisites / setup requirements

3) Step-by-step instructions

  1. Action: Decide what should trigger the ducking

    What to do: Choose the “key” signal that controls the compressor. For most podcasts, the key is the main host vocal. If you have multiple speakers, create a Dialogue Key bus that receives all voice tracks (host + guest + co-host) and use that bus as the sidechain input.

    Why: If only the host triggers ducking, a guest may start talking and the music won’t move, creating sudden masking. A combined key ensures the music responds to whichever voice is active.

    Technique: Route all dialogue tracks to a bus named DIA KEY at unity gain (0.0 dB). This bus does not need to be audible; it can be set to “no output” if your DAW allows.

    Pitfalls: If the send feeding your key bus is post-fader and you automate the dialogue fader, the level reaching the sidechain changes with every fader move, and so does the ducking. Prefer a pre-fader send to the key bus when possible so ducking stays consistent even if you ride dialogue levels.
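    Worked example: The key bus is just a unity-gain sum of every voice track. The sketch below illustrates that idea on plain lists of samples; the function name mix_key_bus and the sample values are made up for illustration, and a real DAW does this through routing rather than code.

```python
def mix_key_bus(tracks):
    """Sum equal-length dialogue tracks sample by sample at unity gain."""
    if not tracks:
        return []
    length = len(tracks[0])
    if any(len(track) != length for track in tracks):
        raise ValueError("all dialogue tracks must have the same length")
    # zip(*tracks) groups the samples that occur at the same instant.
    return [sum(samples) for samples in zip(*tracks)]


# Hypothetical sample values: the host speaks first, then the guest.
host = [0.5, 0.4, 0.0, 0.0]
guest = [0.0, 0.0, 0.3, 0.2]

dia_key = mix_key_bus([host, guest])
print(dia_key)  # [0.5, 0.4, 0.3, 0.2]
```

    Because the key carries whichever voice is active, the compressor on the music sees a signal in all four samples, not only while the host is talking.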

  2. Action: Insert a compressor on the music bed (or music bus)

    What to do: Put the compressor on the music track if there’s only one music file, or on a MUSIC bus if you have multiple cues. Enable external sidechain and select DIA KEY as the sidechain input.

    Why: Compressing the music in response to speech is more transparent than compressing the voice. The listener perceives clarity without hearing a “processed” vocal.

    Starting settings (baseline):

    • Ratio: 4:1
    • Attack: 10 ms
    • Release: 200 ms
    • Knee: soft (if available, start around 6 dB knee)
    • Makeup gain: off (you don’t want the compressor “helping” the music get louder)

    Pitfalls: Putting the compressor on the master bus and keying it from dialogue will duck everything (including SFX and sometimes the voice, depending on routing). Keep ducking targeted: music and/or ambience, not the entire mix.

  3. Action: Set the threshold by watching gain reduction, not guessing

    What to do: Play a section where the voice starts over music (common: the first sentence after an intro). Lower the threshold until you see consistent gain reduction when speech occurs. Aim for 3–6 dB of gain reduction on normal speech, with occasional peaks hitting 8–10 dB for louder moments.

    Why: Gain reduction targets are more repeatable than “it sounds good.” In spoken word, 3–6 dB typically clears space without the listener noticing the music “dropping.”

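    Specific approach: If your dialogue averages around -18 LUFS on the track, expect the threshold to end up a few dB below the level the key signal reaches during normal speech, so that ordinary phrases exceed it by enough to produce the target GR. Exact numbers vary by compressor calibration and detector type, so use the GR meter as the reference rather than the threshold readout.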

    Pitfalls: Over-ducking (12–20 dB GR) can make the bed feel like it’s switching on/off, especially with sustained pads. Under-ducking (1–2 dB GR) may be inaudible and won’t solve masking.
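    Worked example: To see how threshold, ratio, and knee combine into a GR figure, here is a minimal sketch of a textbook static gain computer with a quadratic soft knee. It is not the curve of any particular plugin; the function name gain_reduction_db and the -26 dB threshold are assumptions chosen for illustration.

```python
def gain_reduction_db(key_db, threshold_db, ratio, knee_db=0.0):
    """Static gain reduction (positive dB) for a given key level.

    Textbook soft-knee curve; real compressors may differ.
    """
    over = key_db - threshold_db
    slope = 1.0 - 1.0 / ratio
    # Inside the knee, blend smoothly from no reduction to full ratio.
    if knee_db > 0 and 2 * abs(over) <= knee_db:
        return slope * (over + knee_db / 2) ** 2 / (2 * knee_db)
    if over <= 0:
        return 0.0
    return over * slope


# Baseline from step 2: 4:1 ratio, 6 dB knee. Threshold is hypothetical.
for key_level in (-30, -26, -22, -18):
    gr = gain_reduction_db(key_level, threshold_db=-26, ratio=4, knee_db=6)
    print(f"key {key_level} dB -> {gr:.2f} dB GR")
```

    Under these assumptions, a key signal 4–8 dB above the threshold lands in the 3–6 dB GR window at 4:1, which is one way to sanity-check what the meter shows.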

  4. Action: Tune attack so consonants stay clear without “thumps”

    What to do: Adjust attack while listening to the first consonants of phrases (“T”, “K”, “P”, “S”). For most podcast beds, an attack between 5–15 ms works well. Start at 10 ms, then:

    • If the first syllable is still fighting the music, reduce attack toward 5 ms.
    • If the bed audibly “clicks” or the downbeat feels chopped, increase attack toward 15–25 ms.

    Why: Too fast an attack clamps the music instantly and can create a noticeable dip right on transients (drum hits, guitar pick). Too slow an attack lets the music mask the initial consonants that drive intelligibility.

    Pitfalls: If you hear a “pumping” sensation synchronized with every syllable, the issue might be release time (next step) rather than attack.

  5. Action: Set release to recover naturally between phrases

    What to do: Adjust release so the music returns smoothly when the speaker pauses. Typical release times for spoken word ducking are 150–350 ms. Start at 200 ms and fine-tune:

    • Music feels like it “breathes” too obviously between words: increase release to 300–450 ms.
    • Music stays too quiet too long after speech stops: decrease release to 120–180 ms.

    Why: Release controls the emotional continuity of the bed. A slightly slower release preserves the bed’s momentum and avoids that nervous up/down motion.

    Pitfalls: Extremely long release (1–2 seconds) can make music never fully recover in fast dialogue. Extremely short release (under 80 ms) often causes chatter/pumping, especially with sibilant speakers.
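    Worked example: Attack and release can be pictured as two smoothing speeds applied to the gain reduction. The sketch below uses a simple one-pole smoother and treats each time as an exponential time constant; that definition is an assumption, since compressors define attack and release differently. The 1 kHz control rate and the function name smooth_gain_reduction are also hypothetical.

```python
import math


def smooth_gain_reduction(target_gr_db, sample_rate, attack_ms, release_ms):
    """One-pole smoothing: attack when GR rises, release when it falls."""
    attack_coeff = math.exp(-1.0 / (attack_ms * 0.001 * sample_rate))
    release_coeff = math.exp(-1.0 / (release_ms * 0.001 * sample_rate))
    smoothed = []
    current = 0.0
    for target in target_gr_db:
        coeff = attack_coeff if target > current else release_coeff
        current = coeff * current + (1.0 - coeff) * target
        smoothed.append(current)
    return smoothed


RATE = 1000  # one value per millisecond keeps the example small
# 100 ms of speech asking for 6 dB GR, then 600 ms of silence.
target = [6.0] * 100 + [0.0] * 600
gr = smooth_gain_reduction(target, RATE, attack_ms=10, release_ms=200)

print(f"10 ms into speech:  {gr[9]:.2f} dB")    # roughly 63% of 6 dB
print(f"end of speech:      {gr[99]:.2f} dB")   # essentially the full 6 dB
print(f"200 ms after pause: {gr[299]:.2f} dB")  # about 37% still held
print(f"600 ms after pause: {gr[699]:.2f} dB")  # nearly recovered
```

    With this model, a 200 ms release still holds over a third of the ducking 200 ms into a pause, which is why short gaps between words do not let the bed jump back up.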

  6. Action: Filter the sidechain so plosives don’t over-trigger the ducking

    What to do: If your compressor provides a sidechain EQ or high-pass filter, enable it. Set a high-pass on the sidechain around 100–150 Hz (start at 120 Hz). If the speaker has heavy plosives or mic proximity, go higher, up to 180 Hz.

    Why: Low-frequency bursts (“P” and “B” plosives) can be much louder than the perceived speech level. Without filtering, the compressor will duck the music aggressively on plosives, even though the listener doesn’t need extra space there.

    Optional refinement: If sibilance (“S”, “SH”) is causing rapid pumping, try a gentle dip in the sidechain around 5–8 kHz by 2–4 dB, or simply lengthen release slightly.

    Pitfalls: Over-filtering the sidechain (HPF too high, e.g., 300 Hz) can make ducking ignore the body of the voice and respond mostly to midrange, sometimes feeling late or inconsistent.
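    Worked example: The sketch below shows what a sidechain high-pass does to the key signal, using a basic first-order filter at 120 Hz. This is an assumption for illustration: many compressors use steeper sidechain filters, and the 60 Hz and 1 kHz sine waves are only stand-ins for plosive thump and speech energy.

```python
import math


def one_pole_highpass(samples, sample_rate, cutoff_hz):
    """First-order (6 dB/octave) high-pass, RC-style discretization."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = rc / (rc + dt)
    out = []
    prev_in = 0.0
    prev_out = 0.0
    for x in samples:
        prev_out = alpha * (prev_out + x - prev_in)
        prev_in = x
        out.append(prev_out)
    return out


def rms(samples):
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def sine(freq_hz, sample_rate, seconds):
    count = int(sample_rate * seconds)
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate)
            for i in range(count)]


SAMPLE_RATE = 48000
thump = sine(60, SAMPLE_RATE, 0.5)          # stand-in for plosive energy
speech_band = sine(1000, SAMPLE_RATE, 0.5)  # stand-in for speech energy

thump_ratio = rms(one_pole_highpass(thump, SAMPLE_RATE, 120)) / rms(thump)
speech_ratio = (rms(one_pole_highpass(speech_band, SAMPLE_RATE, 120))
                / rms(speech_band))

print(f"60 Hz kept:  {thump_ratio:.2f} of original level")
print(f"1 kHz kept:  {speech_ratio:.2f} of original level")
```

    Even this gentle filter cuts the 60 Hz component to less than half its level while leaving 1 kHz almost untouched, so the detector reacts far less to low-frequency bursts than to the voice itself.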

  7. Action: Calibrate ducking depth with ratio and/or a compressor range control

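    What to do: If your compressor has a range (maximum gain reduction) parameter, set it to 6 dB as a safety ceiling; raise it only if you want louder phrases to duck further, as in the 8–10 dB peaks mentioned in step 3. If there is no range control, use ratio and threshold together: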

    • For subtle beds: 2:1 to 3:1 ratio, aim for 2–4 dB GR.
    • For energetic music under dense speech: 4:1 to 6:1 ratio, aim for 4–8 dB GR.

    Why: Ratio and range help you avoid the “DJ duck” effect where music collapses too far. Speech needs space, but the bed should still feel continuous.

    Pitfalls: High ratio with a low threshold can create a hard, obvious dip. If it sounds unnatural, lower the ratio first, then re-set threshold to maintain a similar GR target.
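    Worked example: The arithmetic behind "lower the ratio, then re-set threshold" can be sketched as follows, assuming a hard-knee compressor. The helper names overshoot_for_target and apply_range are hypothetical, and a range control is modeled simply as a cap on gain reduction.

```python
def overshoot_for_target(target_gr_db, ratio):
    """dB the key must exceed the threshold to reach a GR target (hard knee)."""
    if ratio <= 1:
        raise ValueError("ratio must be greater than 1:1 to produce any GR")
    return target_gr_db / (1.0 - 1.0 / ratio)


def apply_range(gr_db, range_db):
    """Model a range control as a ceiling on gain reduction."""
    return min(gr_db, range_db)


# Same 4 dB GR target at three ratios.
for ratio in (2, 4, 6):
    needed = overshoot_for_target(4.0, ratio)
    print(f"{ratio}:1 needs the key {needed:.1f} dB over threshold")

# A loud laugh asks for 11 dB of GR, but a 6 dB range holds it back.
print(apply_range(11.0, 6.0))  # 6.0
```

    At 2:1 the key has to sit 8 dB over the threshold for 4 dB of GR, versus about 5.3 dB at 4:1, so dropping the ratio without lowering the threshold reduces the ducking depth.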

  8. Action: Verify in real podcast scenarios (not just one clip)

    What to do: Audition at least three sections:

    • Intro: voice enters over full music.
    • Normal conversation: quick back-and-forth with short pauses.
    • Emotional emphasis: louder laughter, a raised voice, or an excited moment.

    Watch the GR meter. In conversation, you want fairly steady GR during active speech and smooth recovery during pauses.

    Why: A setting that works for a slow monologue might pump during a fast interview. Validating across situations prevents surprises late in the mix.

    Pitfalls: If you only tune on an intro, you may set release too long and the music will never come back during the rest of the show.

  9. Action: Troubleshoot common problems

    Problem: Music “pumps” on every syllable.
    Fix: Increase release by 80–150 ms (e.g., from 150 to 280 ms). If still pumping, reduce ratio from 6:1 to 4:1 and re-adjust threshold for 3–6 dB GR.

    Problem: Ducking is late; first words get masked.
    Fix: Reduce attack from 15 ms to 5–8 ms. Also check routing/latency: some DAWs delay the sidechain path if plugins add latency. Try disabling lookahead limiters on the dialogue key path, or enable plugin delay compensation.

    Problem: A single plosive causes a huge dip.
    Fix: Increase sidechain HPF from 120 Hz to 160–180 Hz. Also consider a dedicated de-plosive edit or clip gain on the dialogue track.

    Problem: Music never returns to full level between sentences.
    Fix: Shorten release (e.g., 350 ms down to 200 ms) or raise threshold slightly so low-level room tone doesn’t keep the compressor engaged.

    Problem: Ducking feels too obvious even at modest GR.
    Fix: Use a softer knee, reduce ratio, and consider lowering the music bed itself by 1–2 dB. Sometimes the simplest fix is the right one: the bed was just too hot.
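    Worked example: For the "ducking is late" problem, it helps to convert plugin latency from samples to milliseconds and compare it with the attack time. The sketch below uses a deliberately rough model in which any uncompensated delay on the key path simply adds to the attack; the latency figures and function names are hypothetical.

```python
def latency_ms(latency_samples, sample_rate):
    """Convert a latency reported in samples to milliseconds."""
    return latency_samples / sample_rate * 1000.0


def effective_onset_ms(attack_ms, key_latency_samples, sample_rate):
    """Rough model: uncompensated key-path delay adds to the attack time."""
    return attack_ms + latency_ms(key_latency_samples, sample_rate)


SAMPLE_RATE = 48000
# Hypothetical latencies for plugins sitting on the dialogue key path.
for samples in (0, 64, 512):
    onset = effective_onset_ms(10.0, samples, SAMPLE_RATE)
    print(f"{samples:>4} samples of key latency -> ducking onset ~{onset:.1f} ms")
```

    In this model, 512 samples at 48 kHz is about 10.7 ms, which roughly doubles the delay of a 10 ms attack. That is large enough to explain masked first consonants even when the attack setting itself looks reasonable.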

4) Before and after: what to expect

Before sidechain: Music sits at a constant level. When the host starts speaking, the first consonants blur into the music, especially if the bed has strong midrange (guitars, synth leads, snare). You compensate by turning the music down too far, making transitions feel flat, or by drawing lots of automation.

After sidechain (typical target): During speech, the bed drops smoothly by 3–6 dB (occasionally 8–10 dB on louder phrases). When the speaker pauses, the bed returns over 200–350 ms without noticeable “breathing.” The voice reads as closer and clearer even though you didn’t boost it, and you do far less manual level riding.

5) Pro tips to take it further

6) Wrap-up (practice goals)

Sidechain compression is one of the fastest ways to make spoken-word mixes feel professional: consistent clarity, smoother transitions, and less time drawing automation. Practice by saving a few presets based on real use cases (gentle bed, aggressive theme, dynamic EQ ducking), then tweak only threshold and release per episode. The skill is learning how much gain reduction is enough to create space without calling attention to the processing.