Additive Synthesis for Interactive AR

By Marcus Chen

1) Introduction: what you’ll build and why it matters

Additive synthesis is the most direct way to “design timbre on purpose”: you build a sound by stacking sine-wave partials (harmonic and inharmonic) with controlled amplitudes and envelopes. For interactive AR, that control is practical: you can map user distance, gaze, gesture speed, object material, or lighting changes to partial levels and spectral shape without the phasey artifacts that heavy filtering or time-stretching often introduce.

In this tutorial you’ll build an additive synth patch suitable for AR interactions: a stable “object tone” that changes convincingly as the user approaches, rotates around it, or triggers events. You’ll also set up performance safeguards (CPU, aliasing, amplitude management) and an interaction mapping that feels responsive but not twitchy.

2) Prerequisites / setup

3) Step-by-step instructions

  1. Step 1 — Define the sound role and interaction target

    Action: Decide what the additive synth represents (object hum, UI beacon, “energy field,” material resonance) and which interaction will control it.

    Why: Additive is powerful, but unfocused patches turn into “pretty tones” that don’t read as interactive. A clear role tells you whether to emphasize stable pitch, noisy inharmonics, or spectral motion.

    Concrete target: Build a “smart artifact hum” that gets brighter and more complex as the user moves from 2.0 m to 0.2 m away. Map distance → brightness and amplitude, with subtle animation from hand movement.

    Common pitfalls: Mapping too many AR parameters at once; interaction feels chaotic. Start with one main mapping (distance) and one secondary (gesture speed).

  2. Step 2 — Choose a base frequency and partial budget

    Action: Pick a fundamental frequency and how many partials you can afford.

    Why: The fundamental determines perceived pitch stability and how quickly partials reach Nyquist (and thus alias or need muting). The partial budget is your CPU and mix headroom.

    Settings to use:

    • Fundamental (f0): 110 Hz (A2) is a good starting point for “object hum” without being sub-heavy.
    • Partials: Start with 24 harmonics (n = 1…24). The 24th harmonic sits at 24 × 110 Hz = 2640 Hz, far below the 24 kHz Nyquist limit at a 48 kHz sample rate, so the stack is alias-free and clear. You can push higher later.
    • Oscillators: Sine oscillators only; no detune yet.
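    A quick way to sanity-check the budget is to compute the partial frequencies directly. The following is a minimal Python sketch (the function names are illustrative, not from any library); it lists the harmonics of f0 and counts how many fit under the 22 kHz safety cutoff that Step 8 uses at a 48 kHz sample rate:

```python
import math


def harmonic_frequencies(f0: float, num_partials: int) -> list:
    """Frequencies of harmonics n = 1..num_partials (f0 * n), in Hz."""
    return [f0 * n for n in range(1, num_partials + 1)]


def max_harmonics_below(f0: float, cutoff_hz: float) -> int:
    """Largest harmonic number n with f0 * n <= cutoff_hz."""
    return math.floor(cutoff_hz / f0)


if __name__ == "__main__":
    sample_rate = 48_000.0
    nyquist = sample_rate / 2.0               # 24 kHz
    freqs = harmonic_frequencies(110.0, 24)   # A2 fundamental, 24 harmonics
    print(freqs[-1])                          # 2640.0
    print(freqs[-1] < nyquist)                # True
    # How many harmonics fit under a 22 kHz safety cutoff for two fundamentals:
    print(max_harmonics_below(110.0, 22_000.0))  # 200
    print(max_harmonics_below(440.0, 22_000.0))  # 50
```

    Quadrupling f0 from 110 Hz to 440 Hz cuts the harmonics that fit under the cutoff from 200 to 50, which is the "how quickly partials reach Nyquist" trade-off described above.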

    Common pitfalls: Starting at f0 = 440 Hz with 40 partials—your spectrum crowds the upper mids fast and becomes fatiguing in headphones, especially in AR where users may listen longer than expected.

  3. Step 3 — Create a controlled amplitude roll-off (timbre foundation)

    Action: Assign each partial an amplitude using a predictable curve, then normalize overall gain.

    Why: A stable roll-off prevents harshness and gives you a reliable “neutral” tone that will respond well to interaction. Without it, small parameter changes can cause big perceived loudness jumps.

    Technique (recommended): Use a power-law amplitude curve:

    A(n) = 1 / (n^p), where p is typically 1.0 to 1.6.

    • Start with p = 1.2.
    • Set A(1) = 1.0, A(2) ≈ 0.435, A(3) ≈ 0.268, …
    • After summing partials, apply a master gain of -18 dB to start (you will refine later).
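    The roll-off and master gain can be sketched in a few lines. This is a minimal, offline Python illustration under stated assumptions: harmonic partials only, all starting at zero phase, and sin() evaluated directly per sample rather than with the per-partial phase accumulators a real-time engine would use. The helper names are hypothetical.

```python
import math


def power_law_amplitudes(num_partials: int, p: float) -> list:
    """A(n) = 1 / n**p for n = 1..num_partials, so A(1) = 1.0."""
    return [1.0 / (n ** p) for n in range(1, num_partials + 1)]


def db_to_gain(db: float) -> float:
    """Convert a level in dB to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)


def render_block(f0, amps, sample_rate, num_samples, master_db=-18.0):
    """Sum sine partials at f0 * n with the given amplitudes, then apply master gain.

    Evaluates sin() directly from the sample index for clarity; a real-time
    version would keep one phase accumulator per partial instead.
    """
    g = db_to_gain(master_db)
    out = []
    for i in range(num_samples):
        t = i / sample_rate
        s = sum(a * math.sin(2.0 * math.pi * f0 * n * t)
                for n, a in enumerate(amps, start=1))
        out.append(g * s)
    return out


if __name__ == "__main__":
    amps = power_law_amplitudes(24, 1.2)
    block = render_block(110.0, amps, 48_000.0, 480)   # 10 ms at 48 kHz
    print(f"peak: {max(abs(s) for s in block):.3f} (linear)")
```

    Changing p re-weights every partial without touching the rest of the code, which is what Step 5 relies on when it animates p.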

    Normalization approach: Don’t “normalize to 0 dBFS.” Instead, aim for -20 to -16 LUFS short-term while the patch is in a typical interaction state, leaving headroom for interaction-driven brightness.

    Common pitfalls: Using equal amplitude partials. That produces an organ-like buzz that becomes abrasive quickly and masks spatial cues.

    Troubleshooting: If the tone feels thin, reduce p to 1.0. If it’s too bright/raspy, raise p to 1.4–1.6 or reduce partial count to 16.

  4. Step 4 — Add per-partial envelopes to avoid “static synth” syndrome

    Action: Give partial groups different attack/decay times so the tone has life when triggered and doesn’t sound like a frozen test oscillator.

    Why: In AR, users associate responsiveness with believable physics. Real resonant objects don’t bring all frequencies up instantly; higher modes often bloom or decay differently.

    Settings to use (grouped envelopes):

    • Partials 1–4 (core): Attack 20 ms, Decay 300 ms, Sustain 0.7, Release 250 ms.
    • Partials 5–12 (body): Attack 40 ms, Decay 500 ms, Sustain 0.5, Release 350 ms.
    • Partials 13–24 (air): Attack 80 ms, Decay 800 ms, Sustain 0.25, Release 500 ms.

    Implementation note: If your system uses one envelope per voice, multiply each partial amplitude by a per-group envelope scalar.
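    One way to express the grouped envelopes is a small linear-segment ADSR plus a lookup from partial number to group. This Python sketch is an illustration under assumptions, not a particular engine's envelope: segments are linear, release starts from whatever level the envelope had at note-off, and partials above 24 are assumed to share the air envelope. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ADSR:
    attack: float    # seconds
    decay: float     # seconds
    sustain: float   # level, 0..1
    release: float   # seconds

    def _held(self, t: float) -> float:
        """Level t seconds after note-on while the note is still held (linear segments)."""
        if t <= 0.0:
            return 0.0
        if t < self.attack:
            return t / self.attack
        t -= self.attack
        if t < self.decay:
            return 1.0 + (self.sustain - 1.0) * (t / self.decay)
        return self.sustain

    def level(self, t: float, note_off: Optional[float] = None) -> float:
        """Level at t seconds after note-on; release ramps down from the level at note_off."""
        if note_off is None or t < note_off:
            return self._held(t)
        start = self._held(note_off)
        progress = (t - note_off) / self.release
        return max(0.0, start * (1.0 - progress))


CORE = ADSR(0.020, 0.300, 0.70, 0.250)   # partials 1-4
BODY = ADSR(0.040, 0.500, 0.50, 0.350)   # partials 5-12
AIR = ADSR(0.080, 0.800, 0.25, 0.500)    # partials 13-24


def envelope_for_partial(n: int) -> ADSR:
    """Pick the group envelope for harmonic number n (1-based).

    Assumption: partials above 24 (possible once the partial count grows)
    reuse the AIR envelope.
    """
    if n < 1:
        raise ValueError("partial numbers start at 1")
    if n <= 4:
        return CORE
    if n <= 12:
        return BODY
    return AIR


def partial_gain(n: int, base_amp: float, t: float, note_off: Optional[float] = None) -> float:
    """Static amplitude A(n) multiplied by its group's envelope scalar."""
    return base_amp * envelope_for_partial(n).level(t, note_off)


if __name__ == "__main__":
    # Envelope level of a core, body and air partial 40 ms after note-on.
    for n in (1, 8, 20):
        print(n, round(envelope_for_partial(n).level(0.040), 3))
```

    Each partial's output amplitude is then partial_gain(n, A(n), t, note_off), the per-group envelope scalar described in the implementation note.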

    Common pitfalls: Too-fast attacks on high partials (0–10 ms) create “clicky glitter,” which is fatiguing and can sound like tracking noise rather than designed audio.

    Troubleshooting: If you hear clicks at note-on/off, raise the minimum attack and release to 10 ms and make sure every gain change is smoothed with a 5–20 ms ramp rather than applied instantly.

  5. Step 5 — Map distance to spectral tilt (brightness) with smoothing

    Action: Use user distance to tilt the spectrum: closer = brighter (more high partial energy), farther = darker.

    Why: Brightness changes read clearly even in noisy real-world environments, and they remain intelligible when spatial audio is imperfect (common on mobile speakers or open-ear AR devices).

    Mapping strategy:

    • Measure distance d in meters. Clamp to 0.2–2.0 m.
    • Convert to proximity x (0–1): x = 1 - ((d - 0.2) / (2.0 - 0.2)).
    • Smooth x with a 1-pole filter or glide: time constant 120 ms (fast enough to feel responsive, slow enough to avoid zipper noise).
    • Apply spectral tilt by changing p dynamically: p = 1.6 - 0.7*x. That yields p ≈ 1.6 when far (dark), p ≈ 0.9 when near (bright).
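    The mapping chain (clamp, normalize, smooth, tilt) can be sketched as follows. The one-pole coefficient exp(-1 / (tau * rate)) is a standard choice for a smoother updated at a fixed control rate; the 60 Hz rate in the example assumes one update per AR frame, and the class and function names are hypothetical.

```python
import math

D_NEAR, D_FAR = 0.2, 2.0   # metres, the clamp range from Step 5


def proximity(d: float) -> float:
    """Map distance d (m) to proximity x in [0, 1]: 1 at 0.2 m, 0 at 2.0 m."""
    d = min(max(d, D_NEAR), D_FAR)
    return 1.0 - (d - D_NEAR) / (D_FAR - D_NEAR)


class OnePole:
    """One-pole low-pass smoother, called once per control update."""

    def __init__(self, time_constant_s: float, update_rate_hz: float, initial: float = 0.0):
        # After time_constant_s worth of updates, a step input has covered ~63% of the way.
        self.coeff = math.exp(-1.0 / (time_constant_s * update_rate_hz))
        self.y = initial

    def process(self, x: float) -> float:
        self.y = x + self.coeff * (self.y - x)
        return self.y


def tilt_exponent(x: float) -> float:
    """p = 1.6 - 0.7 * x: 1.6 (dark) when far, 0.9 (bright) when near."""
    return 1.6 - 0.7 * x


if __name__ == "__main__":
    smoother = OnePole(time_constant_s=0.120, update_rate_hz=60.0)
    for d in [2.0, 1.5, 1.0, 0.5, 0.2, 0.2, 0.2]:   # user walking toward the object
        x = smoother.process(proximity(d))
        print(f"d={d:.1f} m  x={x:.3f}  p={tilt_exponent(x):.3f}")
```

    Because p is computed from the smoothed x, the tilt inherits the 120 ms glide, and the same smoother with a 250 ms time constant covers the troubleshooting case.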

    Common pitfalls: Mapping distance directly to overall volume only. That feels like a crude proximity effect; brightness mapping feels more “interactive object” and less like a speaker getting louder.

    Troubleshooting: If brightness jumps erratically, your AR distance estimate is noisy. Increase smoothing to 250 ms or apply a median filter over the last 5–9 frames before smoothing.
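    A running median for the noisy-tracking case might look like this minimal Python sketch (the class name and the default window of 7 frames are illustrative choices within the 5–9 frame range above):

```python
import statistics
from collections import deque


class MedianFilter:
    """Running median over the last `window` distance samples (one per AR frame).

    An odd window gives a true middle value. A single-frame outlier lands at
    one end of the sorted window, so the median ignores it, whereas a mean
    would be pulled toward it.
    """

    def __init__(self, window: int = 7):
        if window < 1 or window % 2 == 0:
            raise ValueError("window should be a positive odd number")
        self.samples = deque(maxlen=window)

    def process(self, d: float) -> float:
        self.samples.append(d)
        return statistics.median(self.samples)


if __name__ == "__main__":
    mf = MedianFilter(5)
    noisy = [1.0, 1.0, 1.02, 0.35, 1.01, 0.99, 1.0]   # one bad tracking frame
    print([round(mf.process(d), 2) for d in noisy])
```

    Feed its output into the 1-pole smoother rather than replacing it: the median rejects isolated spikes, and the smoother removes the small steps that remain.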

  6. Step 6 — Add a controlled inharmonic layer for realism and “tech” character

    Action: Add 4–6 inharmonic partials (non-integer multiples of f0) at low level, and modulate them subtly with motion.

    Why: Perfect harmonic stacks can sound too “musical instrument.” Many AR objects (energy fields, holograms, sci-fi artifacts) benefit from a slight inharmonic shimmer that implies electronics or complex resonance.

    Specific recipe:

    • Add sine oscillators at frequency ratios: 2.37, 3.91, 5.22, 7.14, 9.63 × f0.
    • Set their base amplitudes to -24 dB relative to the fundamental (multiply by ~0.063).
    • Modulate their amplitudes with hand speed v (0–1): Ainh = Abase * (0.7 + 0.6*v).
    • Smooth v with 80 ms glide.
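    The inharmonic recipe reduces to a list of (frequency, amplitude) pairs summed alongside the harmonic stack. This Python sketch assumes v is already normalized to 0–1 and smoothed with the 80 ms glide; how raw hand speed becomes v depends on your tracking data and is not shown. The names are hypothetical.

```python
INHARMONIC_RATIOS = (2.37, 3.91, 5.22, 7.14, 9.63)   # multiples of f0, from Step 6
BASE_LEVEL_DB = -24.0                                # relative to the fundamental (A(1) = 1.0)


def db_to_gain(db: float) -> float:
    """Convert a level in dB to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)


def inharmonic_layer(f0: float, v: float, base_db: float = BASE_LEVEL_DB,
                     ratios=INHARMONIC_RATIOS) -> list:
    """Return (frequency_hz, amplitude) pairs for the inharmonic partials.

    v is hand speed, assumed already normalized to 0..1 and smoothed.
    Amplitude follows Ainh = Abase * (0.7 + 0.6 * v).
    """
    v = min(max(v, 0.0), 1.0)
    a_inh = db_to_gain(base_db) * (0.7 + 0.6 * v)
    return [(f0 * r, a_inh) for r in ratios]


if __name__ == "__main__":
    for freq, amp in inharmonic_layer(110.0, v=0.5):
        print(f"{freq:8.1f} Hz  amp {amp:.4f}")
```

    The troubleshooting variant in this step (three partials at -30 dB) corresponds to inharmonic_layer(f0, v, base_db=-30.0, ratios=INHARMONIC_RATIOS[:3]).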

    Common pitfalls: Too much inharmonic level turns into “detuned bell” and distracts from spatial cues. Keep it subtle; you should miss it when muted, not hear it as a separate sound.

    Troubleshooting: If it sounds like beating/warbling in a bad way, reduce inharmonic count to 3 and lower them to -30 dB.

  7. Step 7 — Prevent clipping and maintain consistent loudness

    Action: Implement gain staging, a safety limiter, and equal-loudness compensation for brightness changes.

    Why: Additive sums can spike unexpectedly when partials align or when you brighten the spectrum. In AR, clipping is especially damaging because it reads as a device failure, not a creative choice.

    Settings to use:

    • Master bus: Insert a limiter with ceiling -1.0 dBFS, lookahead 1 ms, release 80 ms.
    • Pre-limiter headroom: Aim for peaks around -6 dBFS during normal interaction (limiter should rarely work).
    • Brightness compensation: As p decreases (brighter), reduce master gain slightly: G = -12 dB - (3 dB * x). This counters the psychoacoustic “brighter = louder” effect.
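    The brightness compensation is a one-line formula, and it helps to pair it with a peak estimate. This Python sketch (hypothetical helper names) computes G from x, a worst-case peak assuming every partial hits its maximum at the same instant, and the measured peak of a rendered block; it does not implement the limiter itself.

```python
import math


def master_gain_db(x: float) -> float:
    """Brightness compensation: G = -12 dB - 3 dB * x (x = smoothed proximity)."""
    x = min(max(x, 0.0), 1.0)
    return -12.0 - 3.0 * x


def worst_case_peak_dbfs(amps, gain_db: float) -> float:
    """Peak if every partial hit +1 at the same instant: gain * sum(amps), in dBFS."""
    total = sum(abs(a) for a in amps)
    return gain_db + 20.0 * math.log10(total)


def peak_dbfs(samples) -> float:
    """Measured sample peak of a rendered block, in dBFS (-inf for silence)."""
    peak = max((abs(s) for s in samples), default=0.0)
    return -math.inf if peak == 0.0 else 20.0 * math.log10(peak)


if __name__ == "__main__":
    # Near (x = 1): p = 0.9 and 32 partials, per Steps 5 and 8.
    near_amps = [1.0 / n ** 0.9 for n in range(1, 33)]
    g = master_gain_db(1.0)
    print(f"gain {g:.1f} dB, worst-case peak {worst_case_peak_dbfs(near_amps, g):.1f} dBFS")
```

    The worst-case figure is pessimistic, since it is reached only when all partial phases align (for example, every partial starting at cosine phase on note-on). If it lands well above the -6 dBFS target, though, expect the limiter to work during transients.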

    Common pitfalls: Relying on the limiter as a volume control. That makes interaction feel squashed and reduces perceived dynamics.

    Troubleshooting: If the limiter is constantly reducing more than 3 dB, lower partial amplitudes globally by 6 dB or reduce the number of active partials when near.

  8. Step 8 — Optimize CPU: dynamic partial allocation and Nyquist-aware muting

    Action: Only run partials that matter, and mute any partials above Nyquist.

    Why: AR apps often share CPU/GPU with tracking, rendering, and networking. A patch that’s fine on desktop can glitch on mobile. Also, partials above Nyquist fold back as aliasing, which sounds like metallic noise unrelated to your design.

    Technique:

    • Nyquist rule: At 48 kHz, Nyquist is 24 kHz. Mute any partial where f0 * ratio > 22000 Hz (use 22 kHz as a safety margin).
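    • Dynamic partial count: Set max harmonics based on proximity: N = round(8 + 24*x). That gives N = 8 when far and N = 32 when near. Partials 25–32 fall outside the envelope groups from Step 4; assigning them to the “air” group is the simplest option.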
    • Update rate: Recalculate partial gains at 60 Hz (frame rate) but smooth gains at audio rate with 10–20 ms ramps to avoid zippering.
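    The allocation logic and the audio-rate smoothing can be separated: a control-rate function decides each partial's target gain, and a per-partial ramp moves toward it sample by sample. This Python sketch assumes harmonic partials only, a fixed pool of 32 oscillators, and linear ramps; the names are hypothetical.

```python
NYQUIST_SAFETY_HZ = 22_000.0   # 22 kHz margin at a 48 kHz sample rate
MAX_PARTIALS = 32


def active_partial_count(x: float) -> int:
    """N = round(8 + 24 * x): 8 partials when far, 32 when near."""
    x = min(max(x, 0.0), 1.0)
    return round(8 + 24 * x)


def target_gains(f0: float, x: float, p: float, cutoff_hz: float = NYQUIST_SAFETY_HZ) -> list:
    """Per-partial target gains for harmonics 1..MAX_PARTIALS.

    A partial gets A(n) = 1 / n**p only if it is within the active count N
    and at or below the safety cutoff; everything else targets 0 so it fades out.
    """
    n_active = active_partial_count(x)
    gains = []
    for n in range(1, MAX_PARTIALS + 1):
        audible = n <= n_active and f0 * n <= cutoff_hz
        gains.append(1.0 / n ** p if audible else 0.0)
    return gains


class LinearRamp:
    """Per-sample linear gain ramp; the control thread calls set_target at ~60 Hz."""

    def __init__(self, sample_rate: float, ramp_ms: float = 15.0, initial: float = 0.0):
        self.length = max(1, int(sample_rate * ramp_ms / 1000.0))
        self.value = initial
        self.target = initial
        self.step = 0.0
        self.remaining = 0

    def set_target(self, target: float) -> None:
        # Restart from the current value, so retargeting mid-ramp never jumps.
        self.target = target
        self.step = (target - self.value) / self.length
        self.remaining = self.length

    def tick(self) -> float:
        """Advance one sample and return the current gain."""
        if self.remaining > 0:
            self.remaining -= 1
            # Land exactly on the target at the end to avoid float drift.
            self.value = self.target if self.remaining == 0 else self.value + self.step
        return self.value
```

    Partials that leave the active set target 0.0 and fade over the ramp instead of being cut, which avoids the stepping this step's pitfall describes. In a real engine, one ramp per partial would live inside the audio callback, retargeted from target_gains once per frame.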

    Common pitfalls: Updating oscillator frequency/gain at audio buffer boundaries without smoothing. You’ll hear stepping as the user moves.

    Troubleshooting: If you get crackles when moving quickly, reduce update rate to 30 Hz but increase smoothing to 150 ms, or precompute gain tables and interpolate.

  9. Step 9 — Place it in AR space: spatial and environmental considerations

    Action: Integrate basic spatialization and a light environmental send so the additive tone sits in the user’s world.

    Why: Additive synths are very “pure,” which can feel detached. A small amount of room/early reflections helps the sound glue to the environment while staying readable.

    Suggested settings:

    • Spatialization: Use your engine’s HRTF/spatializer. Keep source width narrow: 0.0–0.2 (or “mono” source) so localization remains stable.
    • Distance rolloff: Use a gentle curve; start with rolloff factor 0.7 and min/max distances 0.2 m / 8 m.
    • Reverb send: Small room or early reflections. Pre-delay 10 ms, decay 0.8 s, high-cut 6 kHz, wet send around -18 dB relative to dry.
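    Distance rolloff curves differ between engines, so it is worth printing what the settings actually do. This Python sketch assumes an OpenAL-style inverse-distance-clamped model with the 0.2 m minimum distance used as the reference distance; your engine's formula may differ, so treat the numbers as an illustration. The function names are hypothetical.

```python
import math


def inverse_distance_clamped(d: float, ref: float = 0.2, max_d: float = 8.0,
                             rolloff: float = 0.7) -> float:
    """OpenAL-style inverse-distance-clamped attenuation (linear gain, 1.0 at ref)."""
    d = min(max(d, ref), max_d)
    return ref / (ref + rolloff * (d - ref))


def gain_to_db(gain: float) -> float:
    return 20.0 * math.log10(gain)


def db_to_gain(level_db: float) -> float:
    return 10.0 ** (level_db / 20.0)


if __name__ == "__main__":
    for d in (0.2, 0.5, 1.0, 2.0, 8.0):
        print(f"{d:>4.1f} m: {gain_to_db(inverse_distance_clamped(d)):6.1f} dB")
    print(f"reverb send at -18 dB = x{db_to_gain(-18.0):.3f} of the dry level")
```

    Under these assumptions the spatializer alone drops the level by about 17 dB between 0.2 m and 2.0 m, on top of the brightness and gain mapping from Steps 5 and 7, so listen to the combined result before tuning either in isolation.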

    Common pitfalls: Too much reverb on a bright additive sound becomes splashy and masks interaction detail. Keep reverb subtle and filtered.

    Troubleshooting: If localization feels “wobbly,” reduce high partial levels (raise p slightly) and keep the source mono; overly wide sources confuse HRTF cues.

4) Before and after: expected results

Before (typical first attempt): One oscillator or a fixed harmonic stack with static levels. The sound is either dull (too few partials) or harsh (too many highs), and distance changes feel like simple volume automation. Fast movement produces zipper noise or clicks.

After (what you should hear): At 2.0 m the object hum is warm and stable, with a clear pitch center and restrained highs. As you approach 0.2–0.5 m, the spectrum opens smoothly—higher partials bloom rather than snapping on—while overall loudness stays controlled. Hand/gesture speed introduces a subtle inharmonic shimmer that reads as “active energy” without turning into a separate bell tone. Peaks remain under control with minimal limiter action, and there’s no stepping or crackling when moving quickly.

5) Pro tips for taking it further

6) Wrap-up

Additive synthesis rewards careful engineering: a clear partial strategy, smoothing on every control signal, and disciplined gain staging. In AR, those fundamentals translate directly into an interaction that feels physical and intentional rather than “audio glued on top.” Build one sound with one strong mapping, test it while moving quickly and slowly, and refine the partial curve and smoothing until it stays musical under real tracking conditions. Repeat with a second “material” preset, and you’ll start developing a practical additive toolkit you can drop into production.