Synthesis Before and After Comparison

By Sarah Okonkwo

1) Introduction: what “before and after” really means in synthesis

“Before and after” comparisons in synthesis are often treated like simple A/B demos: bypass the synth, enable the synth, and decide which is “better.” For serious engineering work, the more useful question is: what exactly changed—spectrally, temporally, statistically, and perceptually—between the signal before synthesis and the signal after synthesis or synthesis-derived processing?

In practice, “synthesis” can mean generating a signal from scratch (subtractive, FM, wavetable, physical modeling), or it can mean re-synthesizing an existing signal (vocoder, spectral morphing, sinusoidal modeling, neural resynthesis). In both cases, comparison is not just about tonal preference; it is about whether the transformation preserved (or intentionally altered) key attributes such as pitch stability, transient integrity, spectral envelope, modulation statistics, phase behavior, and loudness. This article frames a rigorous comparison methodology, anchored in established audio engineering principles and measurement conventions, with enough practical guidance to apply it in a studio, post room, or research lab.

2) Background: physics and engineering principles behind synthesis changes

A synthesis engine is fundamentally a controlled mechanism for shaping energy in time and frequency. The most common “after” differences come from:

Two standards-adjacent concepts help anchor comparisons:

3) Detailed technical analysis: what to measure (with concrete numbers)

3.1 Level matching: the precondition for meaningful evaluation

Human perception is strongly level-dependent; a 0.5–1.0 dB mismatch can bias preference. For synthesis A/B:
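As a minimal sketch of level matching, assume both signals are mono NumPy float arrays and that plain RMS is an acceptable proxy for loudness (a loudness-weighted meter would track perception more closely). The “after” signal can then be gain-trimmed to the “before” RMS ahead of any listening test. The function names are illustrative, not taken from any particular tool.

```python
import numpy as np

def rms_db(x):
    """RMS level in dB relative to an amplitude of 1.0 (assumed full scale)."""
    rms = np.sqrt(np.mean(np.square(x)))
    return 20.0 * np.log10(max(rms, 1e-12))

def match_level(before, after):
    """Scale `after` so its RMS equals that of `before`; also return the gain in dB."""
    gain_db = rms_db(before) - rms_db(after)
    return after * 10.0 ** (gain_db / 20.0), gain_db

# Illustrative signals: the "after" tone is about 4 dB hotter than the "before" tone.
sr = 48000
t = np.arange(sr) / sr
before = 0.25 * np.sin(2 * np.pi * 220.0 * t)
after = 0.40 * np.sin(2 * np.pi * 220.0 * t)

matched, gain_db = match_level(before, after)
```

Returning the applied gain makes it easy to log the trim alongside the comparison, so the residual mismatch can be checked against the 0.5 dB threshold mentioned above.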

3.2 Harmonic structure and aliasing: reading the spectrum correctly

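Consider a sawtooth at a 48 kHz sample rate. An ideal saw has harmonics at every integer multiple of the fundamental, with the nth harmonic’s amplitude proportional to 1/n. A bandlimited digital saw must contain no harmonics above Nyquist (24 kHz), and many implementations also roll off the harmonics approaching it; in a naïve saw, harmonics above Nyquist fold back into the audio band as inharmonic aliases. The “before” might be a naïve saw (for demonstration) and the “after” a bandlimited oscillator. What changes?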

Quantify this by measuring alias-to-signal ratio (ASR): integrate energy at non-harmonic bins relative to harmonic bins for a steady-state tone. In practice, with a high-quality BLEP oscillator and moderate oversampling (e.g., 2×–4×), ASR can be well below -80 dB for many notes, while a naïve saw can show inharmonic partials as high as -30 to -40 dB depending on pitch and analysis windowing.
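The ASR measurement can be sketched as follows, under simplifying assumptions that are not from the text above: the tone is steady, the fundamental is an integer number of Hz, and the analysis frame is exactly one second, so every harmonic and every alias lands on an exact FFT bin and no window is needed. The additive saw is only a stand-in for a bandlimited oscillator; it is not a BLEP implementation.

```python
import numpy as np

SR = 48000          # sample rate in Hz
N = SR              # one-second frame, so FFT bin k sits at exactly k Hz

def naive_saw(f0):
    """Non-bandlimited saw: a wrapped phase ramp, which aliases."""
    n = np.arange(N)
    return 2.0 * ((f0 * n / SR) % 1.0) - 1.0

def additive_saw(f0):
    """Reference bandlimited saw: sine harmonics summed up to Nyquist."""
    t = np.arange(N) / SR
    out = np.zeros(N)
    k = 1
    while k * f0 < SR / 2:
        out += np.sin(2 * np.pi * k * f0 * t) / k
        k += 1
    return out

def alias_to_signal_db(x, f0):
    """Energy in non-harmonic bins relative to harmonic bins, in dB."""
    power = np.abs(np.fft.rfft(x)) ** 2
    harmonic_bins = np.arange(f0, SR // 2, f0)   # requires integer f0 in Hz
    is_alias = np.ones(len(power), dtype=bool)
    is_alias[harmonic_bins] = False
    is_alias[0] = False                          # ignore DC
    harmonic = power[harmonic_bins].sum()
    alias = power[is_alias].sum()
    return 10.0 * np.log10((alias + 1e-30) / harmonic)

f0 = 1319  # close to E6, chosen as an integer so partials are bin-exact
asr_naive = alias_to_signal_db(naive_saw(f0), f0)
asr_bandlimited = alias_to_signal_db(additive_saw(f0), f0)
```

Note that this integrates all non-harmonic energy, so the figure for a naïve saw will read higher than the level of any single aliased partial. For arbitrary pitches or real recordings, a windowed analysis with some tolerance around each harmonic bin would be needed instead of exact bin indexing.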

3.3 Temporal behavior: transients, envelopes, and “punch”

Engineers often describe “after” synthesis as losing attack or sounding “smeared.” This is measurable. Use:

For spectral resynthesis (phase vocoder-style), windowing is key. A 4096-sample window at 48 kHz spans ~85.3 ms. Without transient preservation, an impulse-like transient is distributed across that window, reducing perceived punch. A “before” drum loop and “after” resynthesized loop can match spectral balance yet differ greatly in microdynamics.
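The window arithmetic and the loss of microdynamics can both be put into numbers. The sketch below computes the window span and uses crest factor (peak-to-RMS ratio) as a simple punch indicator. The “smeared” signal is a crude stand-in, assumed here purely for illustration: a noise burst shaped by a Hann window of the analysis length, not the output of an actual phase vocoder.

```python
import numpy as np

def window_span_ms(window_samples, sample_rate):
    """Duration of an analysis window in milliseconds."""
    return 1000.0 * window_samples / sample_rate

def crest_factor_db(x):
    """Peak-to-RMS ratio in dB; higher values indicate sharper transients."""
    peak = np.max(np.abs(x))
    rms = np.sqrt(np.mean(np.square(x)))
    return 20.0 * np.log10(peak / rms)

span = window_span_ms(4096, 48000)   # the ~85.3 ms quoted above

# Illustrative comparison: a single-sample click versus energy spread
# across one 4096-sample Hann-shaped burst in the same buffer.
rng = np.random.default_rng(0)
n_total, n_win = 8192, 4096
click = np.zeros(n_total)
click[n_total // 2] = 1.0

smeared = np.zeros(n_total)
start = (n_total - n_win) // 2
smeared[start:start + n_win] = rng.standard_normal(n_win) * np.hanning(n_win)

cf_click = crest_factor_db(click)
cf_smeared = crest_factor_db(smeared)
```

Crest factor is scale-invariant, so it can be compared across “before” and “after” files without level matching first, provided both are measured over the same time span.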

3.4 Phase, stereo, and correlation: what changes when you add “width”

Many synth patches add stereo via unison detune, per-voice phase offsets, or chorus-like modulation. The comparison should include:

Be cautious: a patch judged “better” is often simply “wider,” and a wider patch can collapse poorly in mono or in acoustically reflective environments where channel decorrelation interacts with room reflections.
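A minimal sketch of stereo checks that fit this kind of comparison, assuming equal-length left and right NumPy arrays: zero-lag correlation, Side-to-Mid energy ratio, and the level change on mono fold-down. The function names and the 0.5 scaling of Mid and Side are conventions chosen for this example.

```python
import numpy as np

def stereo_correlation(left, right):
    """Zero-lag normalized correlation: +1 mono, 0 decorrelated, -1 inverted."""
    denom = np.sqrt(np.sum(left ** 2) * np.sum(right ** 2))
    return float(np.sum(left * right) / denom) if denom > 0 else 0.0

def side_to_mid_db(left, right):
    """Side energy relative to Mid energy, in dB."""
    mid = 0.5 * (left + right)
    side = 0.5 * (left - right)
    return 10.0 * np.log10((np.sum(side ** 2) + 1e-30) / (np.sum(mid ** 2) + 1e-30))

def mono_fold_down_db(left, right):
    """Power of the mono sum relative to the average per-channel power, in dB."""
    mono = 0.5 * (left + right)
    stereo_power = 0.5 * (np.mean(left ** 2) + np.mean(right ** 2))
    return 10.0 * np.log10((np.mean(mono ** 2) + 1e-30) / stereo_power)
```

Running all three on the “before” and “after” renders, ideally per frequency band, shows whether added width was bought with mono level loss.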

3.5 Nonlinearity and dynamic coloration: saturation is not a footnote

If the “after” state includes analog-modeled filter drive or waveshaping, measure harmonic distortion with a simple sine test at representative levels (e.g., -18 dBFS RMS, -12 dBFS RMS). You’ll often see:

On a 1 kHz sine at -18 dBFS RMS, subtle saturation might produce 2nd/3rd harmonics around -70 to -55 dB. Harder drive can bring harmonics up to -40 dB or higher, clearly audible and materially changing mix placement.
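The sine test can be sketched as below. Several things are assumptions for illustration: `soft_clip` is a hypothetical tanh waveshaper standing in for whatever drive stage is under test, dBFS RMS is taken relative to an amplitude of 1.0 (so a full-scale sine reads about -3 dBFS RMS), and harmonic levels are reported relative to the measured fundamental. A one-second frame puts the 1 kHz tone and its harmonics on exact FFT bins.

```python
import numpy as np

SR = 48000
N = SR                      # one-second frame, so bin k sits at exactly k Hz
F0 = 1000                   # test tone in Hz

def sine_at_dbfs_rms(level_db):
    """Sine whose RMS is `level_db` relative to amplitude 1.0 (assumed reference)."""
    amp = np.sqrt(2.0) * 10.0 ** (level_db / 20.0)
    return amp * np.sin(2 * np.pi * F0 * np.arange(N) / SR)

def harmonic_levels_db(y, count=5):
    """Levels of harmonics 2..count in dB relative to the fundamental."""
    mag = np.abs(np.fft.rfft(y))
    fundamental = mag[F0]
    return {k: 20.0 * np.log10((mag[k * F0] + 1e-30) / fundamental)
            for k in range(2, count + 1)}

def soft_clip(x, drive=1.0):
    """Hypothetical stand-in for a modeled drive stage (symmetric tanh shaper)."""
    return np.tanh(drive * x) / drive

tone = sine_at_dbfs_rms(-18.0)
subtle = harmonic_levels_db(soft_clip(tone, drive=1.0))
harder = harmonic_levels_db(soft_clip(tone, drive=4.0))
```

Because tanh is symmetric, this stand-in produces only odd harmonics; an asymmetric transfer curve would be needed to see a 2nd harmonic. Repeating the measurement at -12 dBFS RMS shows how quickly the distortion products rise with level.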

3.6 Visual description: a “before/after” measurement panel

Imagine a three-row diagnostic view:

4) Real-world implications: how these differences affect mixes, translation, and deliverables

The practical consequences of “after” synthesis changes tend to surface in three domains:

5) Case studies: professional scenarios where before/after comparisons decide outcomes

Case study A: aliasing cleanup in a high-register lead

A production lead line sits around E5–E6 (659–1319 Hz) with a bright saw and aggressive filter modulation. In a naïve oscillator, upper harmonics fold back. The lead sounds “exciting” solo but becomes grainy and masks vocal air around 10–14 kHz.

Before: spectrogram shows inharmonic components wandering with pitch; correlation between pitch and upper partials is inconsistent. Integrated inharmonic energy in 8–16 kHz measures only 10–15 dB below harmonic energy—too high for a “clean” lead.

After: switching to a bandlimited oscillator with 4× oversampling reduces inharmonic components by ~30 dB (typical of a strong implementation), and a small shelf boost can restore perceived brightness without reintroducing non-harmonic hash. The lead sits above guitars with less EQ carving.

Case study B: transient preservation in spectral resynthesis for post

In sound design for film, a metallic impact is resynthesized to allow pitch control and time stretching. A basic phase vocoder with a 2048–4096 sample window yields smooth tonal control but blunts the initial transient.

Before: transient peak occurs within 1–2 ms; crest factor ~16 dB.

After: transient spreads across ~30–80 ms depending on window; crest factor drops to ~10–12 dB, perceived as less “violent.”

Solution: hybrid approach—transient detection and separate treatment (copy transient from original or use multi-resolution STFT). After correction, crest factor returns closer to the original while retaining resynthesis pitch flexibility.

Case study C: unison width vs mono robustness in club-oriented mixes

A supersaw stack uses 8–16 voices with random phase and stereo spread. In stereo it feels huge; in mono it thins dramatically.

Before (stereo-only perspective): correlation ~0.3–0.5, Side energy strong, sounds wide.

After (optimized): constrain low frequencies to mono (e.g., below 120 Hz), reduce detune variance in the lowest voices, and introduce subtle mid-only saturation to keep presence in mono. Correlation increases to ~0.6–0.8 while preserving perceived width in the upper band. The drop translates better to club systems and broadcast downmix.
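The low-frequency mono constraint can be sketched offline with a Mid/Side split. This version zeroes Side content below the cutoff with a brickwall FFT mask, which is an assumption made for brevity; a production chain would more likely use a crossover filter with a controlled slope. The function name and test signals are illustrative.

```python
import numpy as np

def mono_below(left, right, sample_rate, cutoff_hz=120.0):
    """Remove Side content below `cutoff_hz` so the low band is identical in L and R."""
    mid = 0.5 * (left + right)
    side = 0.5 * (left - right)
    spectrum = np.fft.rfft(side)
    freqs = np.fft.rfftfreq(len(side), d=1.0 / sample_rate)
    spectrum[freqs < cutoff_hz] = 0.0            # brickwall mask, offline use only
    side_hp = np.fft.irfft(spectrum, n=len(side))
    return mid + side_hp, mid - side_hp

# Illustrative input: a 60 Hz component that differs between channels,
# plus a 3 kHz component that is fully out of phase (pure Side).
sr = 48000
t = np.arange(sr) / sr
left = np.sin(2 * np.pi * 60 * t) + 0.5 * np.sin(2 * np.pi * 3000 * t)
right = 0.5 * np.sin(2 * np.pi * 60 * t) - 0.5 * np.sin(2 * np.pi * 3000 * t)

new_left, new_right = mono_below(left, right, sr)
```

After processing, the 60 Hz content is shared equally by both channels while the 3 kHz Side content is untouched, which is the behavior the optimization above relies on.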

6) Common misconceptions and corrections

7) Future trends: where synthesis comparison is heading

Several developments are changing what “before and after” means:

8) Key takeaways for practicing engineers

Engineers who treat synthesis “before and after” as a measurable transformation—rather than a vibe check—gain repeatable control. The payoff is faster decisions, fewer mix surprises, and synth parts that translate across monitoring systems and distribution formats without losing the character you intended.