
Synthesis Before and After Comparison
1) Introduction: what “before and after” really means in synthesis
“Before and after” comparisons in synthesis are often treated like simple A/B demos: bypass the synth, enable the synth, and decide which is “better.” For serious engineering work, the more useful question is: what exactly changed—spectrally, temporally, statistically, and perceptually—between the signal before synthesis and the signal after synthesis or synthesis-derived processing?
In practice, “synthesis” can mean generating a signal from scratch (subtractive, FM, wavetable, physical modeling), or it can mean re-synthesizing an existing signal (vocoder, spectral morphing, sinusoidal modeling, neural resynthesis). In both cases, comparison is not just about tonal preference; it is about whether the transformation preserved (or intentionally altered) key attributes such as pitch stability, transient integrity, spectral envelope, modulation statistics, phase behavior, and loudness. This article frames a rigorous comparison methodology, anchored in established audio engineering principles and measurement conventions, with enough practical guidance to apply it in a studio, post room, or research lab.
2) Background: physics and engineering principles behind synthesis changes
A synthesis engine is fundamentally a controlled mechanism for shaping energy in time and frequency. The most common “after” differences come from:
- Bandlimiting and alias control: Digital oscillators must prevent harmonics above Nyquist from folding back (aliasing). Any bandlimiting method (BLEP/BLAMP, minBLEP, polyBLEP, oversampling, wavetable mipmapping) alters high-frequency content relative to an “ideal” mathematical waveform.
- Filter topology and nonlinearity: “Analog-modeled” filters combine a discretization scheme (zero-delay feedback, topology-preserving transform) with modeled nonlinearities such as saturation. The nonlinearities add harmonic distortion and often level-dependent behavior (the frequency response depends on input level).
- Phase behavior: Minimum-phase vs linear-phase filters, phase-reset oscillators, and unison detuning alter waveform symmetry, transient timing, and inter-channel correlation.
- Time-frequency resolution constraints: In spectral or granular synthesis, STFT window size, hop size, and overlap determine what transients survive. A 2048-sample window at 48 kHz gives ~42.7 ms time span—excellent frequency resolution, but it can smear transients unless special transient handling is used.
- Control-rate sampling and modulation: Modulators (LFOs, envelopes) may run at control rate (e.g., 1 kHz) rather than audio rate. Fast modulation can imprint stepping or limit bandwidth, affecting sideband structure in FM/AM scenarios.
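The window arithmetic in the time-frequency bullet above can be checked with a few lines. This is a minimal sketch; the helper name `stft_window_stats` is illustrative and not taken from any library.

```python
def stft_window_stats(window_size: int, sample_rate: float) -> tuple[float, float]:
    """Return (time span in ms, bin spacing in Hz) for one STFT window."""
    span_ms = 1000.0 * window_size / sample_rate
    bin_hz = sample_rate / window_size
    return span_ms, bin_hz


# Longer windows give finer frequency bins but a wider time span per frame.
for size in (512, 2048, 4096):
    span_ms, bin_hz = stft_window_stats(size, 48_000)
    print(f"{size:5d} samples: {span_ms:6.2f} ms span, {bin_hz:6.2f} Hz per bin")
```

The trade-off is direct: doubling the window halves the bin spacing and doubles the time over which a transient can be spread.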
Two standards-adjacent concepts help anchor comparisons:
- Loudness and level: A fair comparison demands level matching. Integrated loudness per ITU-R BS.1770 (LUFS) is a common baseline; short-term loudness is critical for transient-rich examples.
- Noise and distortion measures: While “THD+N” is traditionally a hardware metric, analogous measurement thinking applies to synthesis engines: quantify spurious components, modulation artifacts, and noise floors.
3) Detailed technical analysis: what to measure (with concrete numbers)
3.1 Level matching: the precondition for meaningful evaluation
Human perception is strongly level-dependent; a 0.5–1.0 dB mismatch can bias preference. For synthesis A/B:
- Match integrated loudness within 0.2 LU (practical threshold) using BS.1770 gating where appropriate.
- Also check true peak (dBTP) because “after” synthesis often increases inter-sample peaks via nonlinearity and sharp waveforms. A common engineering target is to keep true peak below -1.0 dBTP for distribution safety, but for internal comparison you mainly need to avoid clipping artifacts that confound the test.
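As a rough illustration of the level-matching step, the sketch below computes the gain needed to align two signals. It assumes plain RMS level as a stand-in for BS.1770 integrated loudness (no K-weighting, no gating), so it only approximates a real loudness match. The function names are hypothetical.

```python
import math


def rms_dbfs(samples):
    """RMS level in dB relative to full scale (a proxy, not BS.1770 LUFS)."""
    mean_square = sum(x * x for x in samples) / len(samples)
    return 10.0 * math.log10(max(mean_square, 1e-20))


def matching_gain_db(reference, candidate):
    """Gain in dB to apply to `candidate` so its RMS equals `reference`."""
    return rms_dbfs(reference) - rms_dbfs(candidate)


def apply_gain_db(samples, gain_db):
    """Scale samples by a gain expressed in dB."""
    factor = 10.0 ** (gain_db / 20.0)
    return [factor * x for x in samples]


# "before" is a 440 Hz sine; "after" is the same signal about 1.9 dB quieter.
before = [0.5 * math.sin(2 * math.pi * 440 * n / 48_000) for n in range(4800)]
after = [0.8 * x for x in before]

gain = matching_gain_db(before, after)
matched = apply_gain_db(after, gain)
residual = abs(rms_dbfs(before) - rms_dbfs(matched))
print(f"gain to apply: {gain:+.2f} dB, residual mismatch: {residual:.4f} dB")
```

For a formal test, a BS.1770-compliant meter should replace `rms_dbfs`; the gain-offset logic stays the same.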
3.2 Harmonic structure and aliasing: reading the spectrum correctly
Consider a sawtooth at 48 kHz sample rate. An ideal saw has harmonics at all integer multiples, with amplitude proportional to 1/n. But a bandlimited digital saw must roll off harmonics as they approach Nyquist (24 kHz). The “before” might be a naïve saw (for demonstration) and the “after” a bandlimited oscillator. What changes?
- Naïve oscillator: Harmonics above Nyquist fold back as aliases. Example: a 10 kHz fundamental has harmonics at 20, 30, 40 kHz… Components above 24 kHz reflect back: 30 kHz aliases to 18 kHz, 40 kHz aliases to 8 kHz, etc. This produces inharmonic content that is not musically related to the fundamental.
- Bandlimited oscillator: Harmonics above Nyquist are suppressed; the top octave may be quieter and smoother. The “after” spectrum shows less high-frequency hash and fewer inharmonic components.
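The fold-back arithmetic in the naïve-oscillator example can be reproduced with a small helper. This is a sketch; `alias_frequency` is an illustrative name, and it models only where a component lands, not its amplitude.

```python
def alias_frequency(freq_hz: float, sample_rate: float) -> float:
    """Fold a frequency into the 0..Nyquist band, as sampling would."""
    folded = freq_hz % sample_rate
    if folded > sample_rate / 2:
        folded = sample_rate - folded
    return folded


fs = 48_000
f0 = 10_000
for k in range(1, 6):
    harmonic = k * f0
    landed = alias_frequency(harmonic, fs)
    tag = "kept" if landed == harmonic else "aliased"
    print(f"harmonic {k}: {harmonic:6d} Hz -> {landed:8.1f} Hz ({tag})")
```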
Quantify this by measuring alias-to-signal ratio (ASR): integrate energy at non-harmonic bins relative to harmonic bins for a steady-state tone. In practice, with a high-quality BLEP oscillator and moderate oversampling (e.g., 2×–4×), ASR can be well below -80 dB for many notes, while a naïve saw can show inharmonic partials as high as -30 to -40 dB depending on pitch and analysis windowing.
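One way to sketch an ASR measurement is shown below. It assumes a steady tone whose fundamental falls exactly on a DFT bin, so no analysis window is needed, and it uses a direct O(N^2) DFT to stay dependency-free. The function names and the test pitch (1100 Hz at 48 kHz, 480 samples) are choices made for this example only.

```python
import cmath
import math


def naive_saw(f0, fs, n_samples):
    """Unbandlimited sawtooth in [-1, 1): the aliased 'before' signal."""
    return [2.0 * ((f0 * n / fs) % 1.0) - 1.0 for n in range(n_samples)]


def bin_energies(signal):
    """Squared DFT magnitudes for bins 0 .. N/2 - 1 (direct DFT)."""
    n = len(signal)
    energies = []
    for k in range(n // 2):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(signal))
        energies.append(abs(acc) ** 2)
    return energies


def alias_to_signal_ratio_db(signal, f0, fs):
    """Energy in non-harmonic bins relative to harmonic bins, in dB.

    Assumes f0 sits exactly on a bin; DC and Nyquist bins are excluded.
    """
    f0_bin = round(f0 * len(signal) / fs)
    harmonic = 0.0
    inharmonic = 0.0
    for k, energy in enumerate(bin_energies(signal)):
        if k == 0:
            continue  # skip DC
        if k % f0_bin == 0:
            harmonic += energy
        else:
            inharmonic += energy
    return 10.0 * math.log10(max(inharmonic, 1e-30) / harmonic)


fs, n_samples, f0 = 48_000, 480, 1_100  # 11 full periods in the frame
asr_saw = alias_to_signal_ratio_db(naive_saw(f0, fs, n_samples), f0, fs)
sine = [math.sin(2 * math.pi * f0 * i / fs) for i in range(n_samples)]
asr_sine = alias_to_signal_ratio_db(sine, f0, fs)
print(f"naive saw ASR: {asr_saw:.1f} dB, pure sine ASR: {asr_sine:.1f} dB")
```

This integrates all non-harmonic energy, so its result is not directly comparable to a figure quoted for a single inharmonic partial. For arbitrary pitches, a windowed FFT with a tolerance band around each harmonic would be needed.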
3.3 Temporal behavior: transients, envelopes, and “punch”
Engineers often describe “after” synthesis as losing attack or sounding “smeared.” This is measurable. Use:
- Attack time metrics: time from -60 dB to -10 dB on an amplitude envelope (or other standardized thresholds). Compare before/after; a difference of 5–15 ms is audible on percussive sources.
- Crest factor (peak-to-RMS): transient-rich signals typically have higher crest factors. Heavy saturation, limiting, or some resynthesis approaches reduce crest factor. A change from, say, 14 dB to 9 dB indicates a materially different transient profile.
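Both metrics above are simple to compute once an amplitude envelope is available. The sketch below assumes the envelope has already been extracted (envelope followers vary and are not shown) and uses thresholds relative to the envelope peak. `crest_factor_db` and `attack_time_ms` are illustrative names.

```python
import math


def crest_factor_db(samples):
    """Peak-to-RMS ratio in dB."""
    peak = max(abs(x) for x in samples)
    rms = math.sqrt(sum(x * x for x in samples) / len(samples))
    return 20.0 * math.log10(peak / rms)


def attack_time_ms(envelope, sample_rate, low_db=-60.0, high_db=-10.0):
    """Time for the envelope to rise from low_db to high_db below its peak."""
    peak = max(envelope)
    low = peak * 10.0 ** (low_db / 20.0)
    high = peak * 10.0 ** (high_db / 20.0)
    start = next(i for i, v in enumerate(envelope) if v >= low)
    end = next(i for i, v in enumerate(envelope) if v >= high)
    return 1000.0 * (end - start) / sample_rate


fs = 48_000
sine = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(480)]
square = [1.0 if x >= 0 else -1.0 for x in sine]
# Toy envelope: 10 ms linear ramp to full level, then a 10 ms hold.
ramp = [min(1.0, i / 480) for i in range(960)]

cf_sine = crest_factor_db(sine)
cf_square = crest_factor_db(square)
attack = attack_time_ms(ramp, fs)
print(f"sine crest: {cf_sine:.2f} dB, square crest: {cf_square:.2f} dB")
print(f"ramp attack (-60 to -10 dB): {attack:.2f} ms")
```

Running the same two functions on the "before" and "after" renders, with the same envelope follower, gives directly comparable numbers.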
For spectral resynthesis (phase vocoder-style), windowing is key. A 4096-sample window at 48 kHz spans ~85.3 ms. Without transient preservation, an impulse-like transient is distributed across that window, reducing perceived punch. A “before” drum loop and “after” resynthesized loop can match spectral balance yet differ greatly in microdynamics.
3.4 Phase, stereo, and correlation: what changes when you add “width”
Many synth patches add stereo via unison detune, per-voice phase offsets, or chorus-like modulation. The comparison should include:
- Inter-channel cross-correlation: Values near +1 indicate strong mono compatibility, values near 0 indicate decorrelation, and negative values indicate out-of-phase content that cancels in a mono sum. Wide unison can push correlation toward 0.2–0.6 depending on detune and modulation depth.
- Mid/Side energy ratios: Track how much energy shifts into the Side channel after enabling unison or ensemble effects.
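Both stereo measures can be sketched in a few lines. The example assumes zero-lag correlation over the whole buffer (a hardware-style meter would use a short sliding window) and the common M = (L+R)/2, S = (L-R)/2 convention. The function names are illustrative.

```python
import math


def correlation(left, right):
    """Zero-lag normalized cross-correlation between two channels."""
    num = sum(l * r for l, r in zip(left, right))
    den = math.sqrt(sum(l * l for l in left) * sum(r * r for r in right))
    return num / den if den > 0 else 0.0


def side_to_mid_db(left, right):
    """Side energy relative to Mid energy, in dB."""
    e_mid = sum(((l + r) / 2) ** 2 for l, r in zip(left, right))
    e_side = sum(((l - r) / 2) ** 2 for l, r in zip(left, right))
    return 10.0 * math.log10(max(e_side, 1e-30) / max(e_mid, 1e-30))


fs, n = 48_000, 4800  # 0.1 s, exactly 22 periods of 220 Hz
left = [math.sin(2 * math.pi * 220 * i / fs) for i in range(n)]
# Right channel: same tone shifted by 60 degrees, a crude stand-in for width.
right = [math.sin(2 * math.pi * 220 * i / fs + math.pi / 3) for i in range(n)]

corr = correlation(left, right)
ratio = side_to_mid_db(left, right)
print(f"correlation: {corr:+.3f}, side/mid: {ratio:+.2f} dB")
```

Comparing these two numbers with unison off and on shows how much of the perceived improvement is simply energy moving into the Side channel.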
Be cautious: a patch judged “better” is often simply “wider,” and extra width can collapse poorly in mono or in acoustically reflective environments where channel decorrelation interacts with room reflections.
3.5 Nonlinearity and dynamic coloration: saturation is not a footnote
If the “after” state includes analog-modeled filter drive or waveshaping, measure harmonic distortion with a simple sine test at representative levels (e.g., -18 dBFS RMS, -12 dBFS RMS). You’ll often see:
- Odd/even harmonic profiles depending on symmetry. Asymmetry introduces even harmonics; symmetrical clipping emphasizes odd.
- Level-dependent frequency response in nonlinear filters: resonance peak and cutoff behavior may shift with drive.
On a 1 kHz sine at -18 dBFS RMS, subtle saturation might produce 2nd/3rd harmonics around -70 to -55 dB. Harder drive can bring harmonics up to -40 dB or higher, clearly audible and materially changing mix placement.
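The sine test can be sketched as follows. The `tanh` waveshaper here is a hypothetical stand-in for whatever drive stage is under test, and the `bias` parameter is an assumption used only to make the transfer curve asymmetric. Harmonic levels are read from single DFT bins, which works because the test tone completes a whole number of cycles in the frame.

```python
import cmath
import math


def harmonic_level_db(signal, harmonic, cycles):
    """Amplitude of one harmonic in dBFS, read from a single DFT bin."""
    n = len(signal)
    k = harmonic * cycles
    acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
              for i, x in enumerate(signal))
    return 20.0 * math.log10(max(2.0 * abs(acc) / n, 1e-20))


def saturate(x, bias=0.0):
    """Hypothetical drive stage: tanh, optionally biased to be asymmetric."""
    return math.tanh(x + bias) - math.tanh(bias)


def relative_harmonics(signal, cycles):
    """Return (2nd, 3rd) harmonic levels in dB relative to the fundamental."""
    h1 = harmonic_level_db(signal, 1, cycles)
    h2 = harmonic_level_db(signal, 2, cycles)
    h3 = harmonic_level_db(signal, 3, cycles)
    return h2 - h1, h3 - h1


fs, f0, n = 48_000, 1_000, 480
cycles = f0 * n // fs                        # 10 full cycles in the frame
peak = math.sqrt(2) * 10 ** (-18 / 20)       # -18 dBFS RMS sine
sine = [peak * math.sin(2 * math.pi * f0 * i / fs) for i in range(n)]

sym_h2, sym_h3 = relative_harmonics([saturate(x) for x in sine], cycles)
asym_h2, asym_h3 = relative_harmonics([saturate(x, bias=0.1) for x in sine], cycles)
print(f"symmetric:  H2 {sym_h2:7.1f} dB, H3 {sym_h3:6.1f} dB")
print(f"asymmetric: H2 {asym_h2:7.1f} dB, H3 {asym_h3:6.1f} dB")
```

The symmetric curve should show essentially no 2nd harmonic, while the biased curve adds one, which matches the odd/even behavior described in the list above. Repeating the run at -12 dBFS RMS shows how quickly the harmonics grow with level.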
3.6 Visual description: a “before/after” measurement panel
Imagine a three-row diagnostic view:
- Row 1: Spectrum (magnitude) from 20 Hz–24 kHz. “Before” shows dense high-frequency content and possible inharmonic spikes; “after” shows a cleaner harmonic series and a controlled roll-off near Nyquist.
- Row 2: Spectrogram over 5 seconds. “Before” transient hits appear as sharp vertical lines; “after” may show horizontal smearing if resynthesis windowing is long.
- Row 3: Stereo vectorscope + correlation meter. “Before” is a narrow ellipse (more mono); “after” becomes a wider cloud with correlation reduced.
4) Real-world implications: how these differences affect mixes, translation, and deliverables
The practical consequences of “after” synthesis changes tend to surface in three domains:
- Mix translation and headroom: Bandlimited oscillators often reduce brittle top-end and free headroom; conversely, nonlinear filters can inflate midrange density and true peaks, requiring gain staging. A patch that measures -14 LUFS integrated might still hit -0.5 dBTP due to sharp waveform peaks and saturation.
- Masking and arrangement: Aliasing energy frequently occupies upper mids and highs in a non-harmonic way, masking cymbals, sibilance, or strings. Cleaning aliasing can open space without EQ. Conversely, adding controlled harmonic distortion can increase audibility on small speakers by creating upper harmonics that survive bandwidth limitations.
- Mono compatibility and playback context: Wide unison sounds impressive in nearfield monitoring but can partially cancel in mono (especially if width is achieved via phase manipulation). Broadcast and club playback often sums to mono or quasi-mono in parts of the chain; correlation monitoring matters.
5) Case studies: professional scenarios where before/after comparisons decide outcomes
Case study A: aliasing cleanup in a high-register lead
A production lead line sits around E5–E6 (659–1319 Hz) with a bright saw and aggressive filter modulation. In a naïve oscillator, upper harmonics fold back. The lead sounds “exciting” solo but becomes grainy and masks vocal air around 10–14 kHz.
Before: spectrogram shows inharmonic components wandering with pitch; the upper partials do not track the fundamental consistently. Integrated inharmonic energy in 8–16 kHz measures only 10–15 dB below harmonic energy, which is too high for a “clean” lead.
After: switching to a bandlimited oscillator with 4× oversampling reduces inharmonic components by ~30 dB (typical of a strong implementation), and a small shelf boost can restore perceived brightness without reintroducing non-harmonic hash. The lead sits above guitars with less EQ carving.
Case study B: transient preservation in spectral resynthesis for post
In sound design for film, a metallic impact is resynthesized to allow pitch control and time stretching. A basic phase vocoder with a 2048–4096 sample window yields smooth tonal control but blunts the initial transient.
Before: transient peak occurs within 1–2 ms; crest factor ~16 dB.
After: transient spreads across ~30–80 ms depending on window; crest factor drops to ~10–12 dB, perceived as less “violent.”
Solution: hybrid approach—transient detection and separate treatment (copy transient from original or use multi-resolution STFT). After correction, crest factor returns closer to the original while retaining resynthesis pitch flexibility.
Case study C: unison width vs mono robustness in club-oriented mixes
A supersaw stack uses 8–16 voices with random phase and stereo spread. In stereo it feels huge; in mono it thins dramatically.
Before (stereo-only perspective): correlation ~0.3–0.5, Side energy strong, sounds wide.
After (optimized): constrain low frequencies to mono (e.g., below 120 Hz), reduce detune variance in the lowest voices, and introduce subtle mid-only saturation to keep presence in mono. Correlation increases to ~0.6–0.8 while preserving perceived width in the upper band. The drop translates better to club systems and broadcast downmix.
6) Common misconceptions and corrections
- Misconception: “More harmonics always means more detail.”
Correction: Harmonics that are musically related can add detail; inharmonic aliasing often reads as grit or fizz and masks other elements. Detail is not the same as broadband energy.
- Misconception: “If two spectra match, the sounds match.”
Correction: Time structure and phase matter. Two sounds can share a similar long-term magnitude spectrum while differing in transient timing, modulation, and fine structure. Spectrograms and envelope metrics reveal differences that a static FFT hides.
- Misconception: “Linear-phase is always more accurate.”
Correction: Linear-phase filtering preserves phase relationships but introduces pre-ringing on sharp transients. Minimum-phase filters avoid pre-ringing but shift phase. “Accuracy” depends on the perceptual and technical goal.
- Misconception: “Stereo width is free.”
Correction: Width achieved through phase manipulation or decorrelation can reduce mono compatibility and change perceived punch. Always check mono sum and correlation, particularly for broadcast and club contexts.
- Misconception: “Oversampling fixes everything.”
Correction: Oversampling reduces aliasing from nonlinearities and sharp discontinuities, but it costs CPU and can introduce its own filter artifacts if poorly implemented. Also, it does not solve time-resolution issues in spectral processes.
7) Future trends: where synthesis comparison is heading
Several developments are changing what “before and after” means:
- Neural and differentiable synthesis: Neural oscillators and resynthesis models can reproduce instrument-like microstructure, but introduce new artifacts—latent-space “averaging,” temporal instability, and content-dependent noise. Expect more standardized evaluation methods beyond simple THD-like measures, including perceptual metrics and artifact classifiers.
- Perceptual evaluation frameworks: Borrowing from codec testing, controlled listening tests (MUSHRA-like paradigms) and perceptually weighted error metrics are becoming more common for evaluating resynthesis and time-frequency methods.
- Multi-rate and event-based engines: Modern synth architectures increasingly mix audio-rate modulation, control-rate logic, and event-based per-voice processing. Comparisons will need to include modulation bandwidth and stepping artifacts, not just oscillator spectra.
- Spatial and immersive contexts: With Atmos and other multichannel formats, “after” changes include spatial image stability, channel coherence, and downmix behavior. A patch that works in stereo can misbehave when rendered to objects or folded down to binaural.
8) Key takeaways for practicing engineers
- Level-match first: Use BS.1770 loudness (LUFS) and check true peak (dBTP). Keep loudness within ~0.2 LU for meaningful A/B judgments.
- Separate harmonic intent from artifacts: Measure and listen for inharmonic aliasing versus musically related harmonics. A clean top end often mixes better than a “brighter” but aliased one.
- Analyze in time and frequency: Use spectra, spectrograms, and transient metrics (attack time, crest factor). Static FFT matching is not sufficient.
- Monitor stereo robustness: Correlation and mid/side balance matter. Always check mono sum for unison-heavy or phase-based width.
- Nonlinearity is a primary design element: Saturation and filter drive change not only tone but dynamics and headroom. Measure harmonic generation at realistic input levels.
- Choose the right “after” for the job: Bandlimited oscillators for clean leads, hybrid transient handling for spectral resynthesis, controlled unison for club translation—each “after” has a context where it wins.
Engineers who treat synthesis “before and after” as a measurable transformation—rather than a vibe check—gain repeatable control. The payoff is faster decisions, fewer mix surprises, and synth parts that translate across monitoring systems and distribution formats without losing the character you intended.









