
Procedural Audio: Generating Sound Through Algorithms Instead of Samples

By Tom Bradley -- Spatial Audio Specialist, Dolby Atmos Mix Engineer

[Image: Procedural audio synthesis visualization on a multi-monitor audio workstation]

Every sound in a traditional game or interactive application is a recording -- a sample captured at a specific moment, stored as a file, and played back when triggered. This approach works well until the number of variations needed exceeds what's practical to record and store. Footsteps on 50 different surfaces, with 12 variations per surface, at 3 different speeds each, require 1,800 individual audio files. That's manageable. Now multiply that by every sound category in an open-world game and the storage requirements become a serious constraint.
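To make the arithmetic concrete, here is a small sketch of the file count. The surface, variation, and speed counts come from the paragraph above; the per-file size is purely an assumption for illustration, since real sizes depend on length, channel count, and codec.

```python
# Counts come from the text above; the per-file size is an assumed placeholder.
SURFACES = 50
VARIATIONS_PER_SURFACE = 12
SPEEDS = 3
ASSUMED_KB_PER_FILE = 100  # assumption: one short compressed footstep recording


def footstep_file_count(surfaces: int, variations: int, speeds: int) -> int:
    """Recordings needed to cover every surface/variation/speed combination."""
    return surfaces * variations * speeds


files = footstep_file_count(SURFACES, VARIATIONS_PER_SURFACE, SPEEDS)
storage_mb = files * ASSUMED_KB_PER_FILE / 1024
print(f"{files} files, roughly {storage_mb:.0f} MB at {ASSUMED_KB_PER_FILE} KB each")
```

Under that assumed file size, footsteps alone would take on the order of 176 MB, and the count grows multiplicatively with every extra dimension you add (shoe type, wetness, and so on).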

Procedural audio takes a different approach. Instead of playing back recordings, it generates sound in real time using mathematical models and algorithms. A procedural footstep system doesn't play a recording of a foot hitting concrete -- it models the physical interaction between a foot, a shoe sole, and a concrete surface, and generates the resulting sound through synthesis. The storage cost drops from 1,800 files to a few kilobytes of algorithm parameters. The variation is infinite rather than finite.

The Three Approaches to Procedural Sound Generation

Procedural audio falls into three broad categories: physical modeling synthesis, granular synthesis, and algorithmic composition. Each generates sound through a different mechanism and suits different types of content, and understanding them helps you choose the right technique for each application. This article concentrates on the first two.

Physical modeling synthesis uses mathematical models of physical systems -- vibrating strings, resonant cavities, impact mechanics -- to generate sound. When you strike a virtual drum, the system calculates the force of the impact, the resonant frequencies of the drum shell, the damping characteristics of the drum head, and the acoustic radiation pattern. The result is a sound that responds naturally to changes in the input parameters -- hit it harder and it gets louder and brighter, hit it softer and it gets quieter and darker, just like a real drum.
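A drum model is too long for a short listing, so as a stand-in here is a minimal sketch of one of the simplest physical models, the Karplus-Strong plucked string. It is not the drum described above, but it shows the same behavior: the `velocity` parameter (a name chosen for this example) makes the pluck both louder and brighter. The damping constant and the way velocity shapes the excitation are assumptions for illustration.

```python
import random


def pluck_string(freq_hz, velocity, duration_s=0.5, sample_rate=22050, seed=0):
    """Karplus-Strong string: a noise burst circulating in a damped delay line.

    velocity in (0, 1] scales loudness and how much high-frequency energy the
    initial excitation keeps (harder pluck = louder and brighter).
    """
    rng = random.Random(seed)
    period = max(2, int(sample_rate / freq_hz))  # delay-line length sets the pitch
    smooth = 1.0 - velocity  # 0 = raw noise, close to 1 = heavily low-passed
    line, prev = [], 0.0
    for _ in range(period):
        # One-pole low-pass on the noise: soft plucks lose their highs.
        prev = smooth * prev + (1.0 - smooth) * rng.uniform(-1.0, 1.0)
        line.append(velocity * prev)
    out = []
    for n in range(int(duration_s * sample_rate)):
        i = n % period
        out.append(line[i])
        # Averaging neighbouring samples models frequency-dependent damping;
        # 0.996 is an assumed overall loss per pass around the string.
        line[i] = 0.996 * 0.5 * (line[i] + line[(i + 1) % period])
    return out


hard = pluck_string(220.0, velocity=1.0)
soft = pluck_string(220.0, velocity=0.2)
```

The same model produces both notes; only the input parameter changes, which is the property that makes physical modeling attractive for interactive sound.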

Granular synthesis generates sound by breaking source material into micro-sounds (grains) and reorganizing them in time, pitch, and density. Unlike physical modeling, granular synthesis starts with recorded source material -- but the grains are so short (10-200 milliseconds) that, with enough rearrangement, the resulting sound can bear little resemblance to the source. Rain can become a shimmering pad. A vocal recording can become a swarm of metallic textures. The power of granular synthesis lies in its ability to transform familiar sounds into unfamiliar textures while retaining some recognizable harmonic content from the source.

Physical Modeling in Practice

One of the best-known commercial physical modeling engines is Yamaha's Virtual Acoustic (VA) synthesis, which models the acoustic behavior of wind and string instruments by simulating the equations that describe air column vibration and string vibration. The resulting sound responds to breath pressure, finger position, and bow speed in ways that can be hard to distinguish from the real instruments.

For game audio, physical modeling is most commonly applied to impact sounds -- footsteps, collisions, weapon impacts, and environmental interactions. The Csound physical modeling toolkit provides a set of unit generators for impact, resonance, and radiation that can be combined to model almost any physical sound-generating system. On a recent project, I used Csound to model the sound of a sword striking different armor types -- chainmail produces a dense, complex impact with high-frequency ringing; plate armor produces a single, lower-frequency impact with longer decay; leather produces a dull thud with minimal resonance. All three were generated from the same physical model with different parameter sets.

Granular Synthesis for Ambient Soundscapes

Granular synthesis excels at generating evolving ambient soundscapes that never repeat. By taking a short source recording (a few seconds of ocean waves, for instance) and spreading it across a 30-second granular cloud with randomized grain position, pitch, and density, you create a texture that sounds like the source but never plays the same sequence twice. This is particularly valuable for game environments that the player might spend hours in -- the ambient sound needs to sustain attention without becoming repetitive.

The key parameters in granular ambient generation are grain size (20-80 ms for ambient textures), grain density (50-200 grains per second), pitch randomization (+/- 2 to 5 semitones), and spatial randomization (panning each grain to a different position in the stereo or spatial field). Increasing grain density while reducing grain size creates a smoother, more continuous texture. Decreasing grain density while increasing grain size creates a more rhythmic, granular texture.
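The parameters above map fairly directly onto code. The following is a minimal sketch of a granular cloud generator, assuming a mono source held as a list of floats; the function name, the Hann grain window, and the equal-power panning law are choices made for this example rather than anything prescribed by the technique.

```python
import math
import random


def granular_cloud(source, sample_rate, duration_s, grain_ms=50.0,
                   density_hz=100.0, pitch_range_semitones=3.0, seed=0):
    """Scatter Hann-windowed grains of `source` across a stereo buffer."""
    rng = random.Random(seed)
    n_out = int(duration_s * sample_rate)
    grain_len = int(grain_ms / 1000.0 * sample_rate)
    left, right = [0.0] * n_out, [0.0] * n_out
    for _ in range(int(duration_s * density_hz)):
        onset = rng.randrange(0, n_out - grain_len)  # where the grain lands
        semis = rng.uniform(-pitch_range_semitones, pitch_range_semitones)
        rate = 2.0 ** (semis / 12.0)                 # playback-rate pitch shift
        # Leave headroom so the resampled read never runs past the source.
        start = rng.randrange(0, len(source) - int(grain_len * rate) - 2)
        pan = rng.random()                           # 0 = left, 1 = right
        gain_l = math.cos(pan * math.pi / 2)
        gain_r = math.sin(pan * math.pi / 2)
        for k in range(grain_len):
            pos = start + k * rate
            i = int(pos)
            frac = pos - i
            sample = source[i] * (1.0 - frac) + source[i + 1] * frac
            window = 0.5 - 0.5 * math.cos(2 * math.pi * k / (grain_len - 1))
            left[onset + k] += gain_l * window * sample
            right[onset + k] += gain_r * window * sample
    return left, right


# Usage: a 1-second test tone stands in for a real recording.
tone = [math.sin(2 * math.pi * 220 * i / 8000) for i in range(8000)]
cloud_l, cloud_r = granular_cloud(tone, 8000, duration_s=2.0, seed=1)
```

A useful number to watch is the average overlap, grain duration times density: with the defaults here, 0.05 s x 100 grains per second gives about 5 grains sounding at any moment. Changing the seed gives a different arrangement of the same material, which is what keeps long ambiences from looping audibly.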

Procedural Audio vs Sample-Based Audio

The choice between procedural and sample-based audio isn't about which is better -- it's about which is more appropriate for the specific use case. Procedural audio offers infinite variation and minimal storage requirements but demands more CPU and requires careful design to achieve natural-sounding results. Sample-based audio offers predictable quality and lower CPU cost but requires significant storage and produces finite variation.

Table 1: Procedural vs Sample-Based Audio Comparison
Criteria	Procedural Audio	Sample-Based Audio
Storage per sound type	1-50 KB (algorithm parameters)	100 KB - 10 MB (audio files)
Variation	Infinite (parameter-driven)	Finite (number of samples)
CPU cost per voice	5-30% of a core	1-5% of a core
Sound quality predictability	Variable (depends on model quality)	Consistent (fixed recording)
Real-time responsiveness	Excellent (parameters change in real time)	Limited (crossfades between samples)

"Procedural audio is not about replacing recordings. It's about creating sound that responds to the player's actions in ways that recordings simply cannot. When every footstep is slightly different because the surface, the speed, and the force are all different, the player feels the world under their character's feet in a way that no sample library can deliver." -- Andy Farnell, author of "Designing Sound" (2010) and procedural audio pioneer

Building a Procedural Sound System

The architecture of a procedural sound system follows a signal flow from input parameters through synthesis engines to output mixing. The input parameters come from the game or application -- impact force, surface type, material properties, environmental conditions. These parameters feed into synthesis engine modules that generate the raw sound. The output of the synthesis engines passes through mixing and spatialization stages before reaching the final output.

A procedural footstep system for a game might have the following architecture: the game engine sends impact velocity (0-15 m/s), surface material (wood, concrete, grass, metal, water), and shoe type (barefoot, sneaker, boot, heel) to the synthesis engine. The synthesis engine selects the appropriate impact model, excites it with the velocity parameter, and applies the material-specific resonance filter. The output is then spatialized based on the player's position and orientation, and mixed with other game audio on the appropriate bus.
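The front end of that architecture can be sketched as a small mapping from game-side events to synthesis parameters. Everything below is hypothetical: the `FootstepEvent` type, the parameter names, and especially the tuning values in the tables, which are placeholders rather than measurements.

```python
from dataclasses import dataclass


@dataclass
class FootstepEvent:
    velocity_mps: float  # impact velocity, 0-15 m/s as sent by the game engine
    surface: str         # "wood", "concrete", "grass", "metal", "water"
    shoe: str            # "barefoot", "sneaker", "boot", "heel"


# Placeholder tuning: (resonance centre in Hz, decay time in seconds).
SURFACE_RESONANCE = {
    "wood": (220.0, 0.12),
    "concrete": (140.0, 0.05),
    "grass": (90.0, 0.03),
    "metal": (900.0, 0.60),
    "water": (400.0, 0.20),
}

# Placeholder tuning: excitation burst length in milliseconds.
SHOE_BURST_MS = {"barefoot": 30.0, "sneaker": 20.0, "boot": 12.0, "heel": 5.0}


def synthesis_params(event: FootstepEvent) -> dict:
    """Translate one game event into parameters for the impact model."""
    v = min(max(event.velocity_mps, 0.0), 15.0) / 15.0  # clamp, then normalise
    freq_hz, decay_s = SURFACE_RESONANCE[event.surface]
    return {
        "gain": v,
        "brightness": 0.3 + 0.7 * v,  # harder steps keep more high end
        "resonance_hz": freq_hz,
        "decay_s": decay_s,
        "burst_ms": SHOE_BURST_MS[event.shoe],
    }


params = synthesis_params(FootstepEvent(7.5, "metal", "boot"))
```

Keeping the mapping in plain data tables like this means a sound designer can retune a surface without touching the synthesis code.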

The Impact-Resonance-Radiation Model

Most procedural impact sounds use the impact-resonance-radiation model. The impact stage generates the initial excitation -- a short, broadband impulse that represents the physical collision. The resonance stage applies the resonant characteristics of the struck object -- its natural frequencies, decay rates, and mode shapes. The radiation stage models how the vibrating object radiates sound into the surrounding air -- its directional characteristics and distance attenuation.

The impact stage is typically modeled as a filtered noise burst with a duration of 5-50 milliseconds, depending on the hardness of the colliding objects. A metal-on-metal impact produces a 5-millisecond burst with significant high-frequency content. A rubber-on-wood impact produces a 30-millisecond burst with most energy below 2 kHz. The resonance stage uses a bank of bandpass filters tuned to the object's natural frequencies, with Q factors determined by the material's damping characteristics. The radiation stage applies distance attenuation and directional filtering.
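The three stages can be sketched in a few dozen lines. This is a simplified illustration, not a production model: the excitation is an unfiltered noise burst with a linear fade, the resonators are standard two-pole band-pass biquads, and the radiation stage is reduced to inverse-distance attenuation with no directional filtering. The mode list `METAL_MODES` is an invented example, not measured data.

```python
import math
import random


def noise_burst(duration_ms, sample_rate, seed=0):
    """Impact stage: a short noise burst that fades out linearly."""
    rng = random.Random(seed)
    n = max(1, int(duration_ms / 1000.0 * sample_rate))
    return [rng.uniform(-1.0, 1.0) * (1.0 - i / n) for i in range(n)]


def bandpass(signal, freq_hz, q, sample_rate):
    """Resonance stage: one band-pass biquad; higher Q rings longer."""
    w0 = 2.0 * math.pi * freq_hz / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b0, b2 = alpha / a0, -alpha / a0
    a1, a2 = -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in signal:
        y = b0 * x + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1, y2, y1 = x1, x, y1, y
        out.append(y)
    return out


def impact_sound(burst_ms, modes, distance_m, duration_s=0.3, sample_rate=22050):
    """modes is a list of (frequency in Hz, Q, gain) tuples."""
    n = int(duration_s * sample_rate)
    excitation = noise_burst(burst_ms, sample_rate)
    excitation += [0.0] * (n - len(excitation))  # silence lets the modes ring out
    out = [0.0] * n
    for freq_hz, q, gain in modes:
        for i, y in enumerate(bandpass(excitation, freq_hz, q, sample_rate)):
            out[i] += gain * y
    # Radiation stage (simplified): inverse-distance law, clamped inside 1 m.
    attenuation = 1.0 / max(distance_m, 1.0)
    return [attenuation * s for s in out]


METAL_MODES = [(800.0, 60.0, 1.0), (2100.0, 80.0, 0.6), (4700.0, 100.0, 0.3)]
near = impact_sound(5.0, METAL_MODES, distance_m=1.0)
far = impact_sound(5.0, METAL_MODES, distance_m=4.0)
```

Swapping the mode list and burst length is all it takes to move from a metallic ring to a dull thud: fewer modes, lower Q values, and a longer burst push the result toward softer materials.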

Procedural Weather Systems

Weather is one of the most compelling applications of procedural audio in games and interactive media. A realistic weather soundscape involves wind (varying by speed and direction), rain (varying by intensity and drop size), thunder (varying by distance and atmospheric conditions), and environmental interactions (rain on different surfaces, wind through different vegetation types). Recording all of these variations as samples would require thousands of files. Procedural audio generates them from a handful of algorithms.

A procedural wind system uses filtered noise as the base signal, with the filter characteristics controlled by wind speed and direction parameters. Low wind speeds map to a narrow band-pass filter covering roughly 200-400 Hz, creating a gentle whisper. High wind speeds widen the band to roughly 100 Hz to 4 kHz, creating a roaring sound. Directional information controls the panning position and the Doppler shift applied to the wind sound.
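As a rough sketch of the speed-to-filter mapping, the code below interpolates the band edges between the calm and strong-wind settings quoted above and filters white noise with two one-pole filters. The linear interpolation, the normalised 0-1 `speed` input, and the filter design are assumptions for this example; panning and Doppler are left out.

```python
import math
import random


def wind_band(speed):
    """Band edges in Hz for a wind speed normalised to 0 (calm) .. 1 (storm)."""
    speed = min(max(speed, 0.0), 1.0)
    low = 200.0 + (100.0 - 200.0) * speed
    high = 400.0 + (4000.0 - 400.0) * speed
    return low, high


def one_pole_coeff(cutoff_hz, sample_rate):
    """Smoothing coefficient for a one-pole low-pass at the given cutoff."""
    return 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)


def wind_noise(speed, duration_s, sample_rate=16000, seed=0):
    rng = random.Random(seed)
    low, high = wind_band(speed)
    a_high = one_pole_coeff(high, sample_rate)
    a_low = one_pole_coeff(low, sample_rate)
    lowpassed = below = 0.0
    out = []
    for _ in range(int(duration_s * sample_rate)):
        # Low-pass at the upper band edge.
        lowpassed += a_high * (rng.uniform(-1.0, 1.0) - lowpassed)
        # Track the content under the lower edge, then subtract it (high-pass).
        below += a_low * (lowpassed - below)
        out.append(lowpassed - below)
    return out


calm = wind_noise(0.0, 1.0)
storm = wind_noise(1.0, 1.0)
```

In a game you would smooth `speed` over time (gusts) rather than switch it abruptly, and one-pole filters are gentle enough that a real system would likely use steeper ones.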

Procedural rain generation uses a combination of impact synthesis (individual raindrops hitting surfaces) and continuous noise (the general sound of rainfall). The impact synthesis generates individual drop sounds at random intervals, with the rate determined by rainfall intensity. Each drop's sound is determined by the surface it hits -- a drop on a metal roof produces a sharp, high-frequency ping; a drop on grass produces a soft, broadband thud. The continuous noise layer provides the underlying rainfall bed that fills the gaps between individual drops.
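The "random intervals" part is commonly modeled as a Poisson process, where the gaps between drops follow an exponential distribution. The sketch below schedules drop events that way; the `MAX_DROPS_PER_S` ceiling and the 0-1 `intensity` scale are hypothetical, and each scheduled event would then be handed to an impact model for its surface.

```python
import random

MAX_DROPS_PER_S = 400.0  # hypothetical ceiling for audible individual drops


def schedule_drops(intensity, duration_s, surfaces, seed=0):
    """Return (time in seconds, surface) pairs for individually voiced drops."""
    rate = min(max(intensity, 0.0), 1.0) * MAX_DROPS_PER_S
    if rate <= 0.0:
        return []
    rng = random.Random(seed)
    t, events = 0.0, []
    while True:
        t += rng.expovariate(rate)  # exponential gaps give Poisson arrivals
        if t >= duration_s:
            return events
        events.append((t, rng.choice(surfaces)))


drops = schedule_drops(0.5, 5.0, ["metal_roof", "grass"])
```

Beyond some rate, individual drops blur together perceptually, which is one reason to cap the impact layer and let the continuous noise bed carry heavier rainfall.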

Performance Optimization for Procedural Audio

Procedural audio's CPU cost is its primary limitation. Generating sound in real time through synthesis is computationally more expensive than decoding and playing back a stored audio file. The key to practical procedural audio is managing the computational budget -- allocating CPU resources to the most important sounds and optimizing the synthesis algorithms for efficiency.

The most effective optimization strategy is hierarchical synthesis. The most important sounds (player actions, nearby events) use high-quality, detailed synthesis models. Less important sounds (distant events, ambient background) use simplified models with fewer resonances and coarser parameter resolution. The least important sounds use pre-computed approximations -- the procedural system generates a simplified version that's cached and reused rather than regenerated each time.
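One way to express that hierarchy in code is a tier lookup plus a cache for the lowest tier. The distance thresholds, mode counts, and the `cached_render` stand-in below are all illustrative assumptions; a real system would also weigh loudness, masking, and gameplay importance.

```python
import math
from functools import lru_cache

# (tier name, maximum distance in metres, number of resonant modes to compute)
TIERS = (("full", 10.0, 24), ("reduced", 40.0, 6))


def choose_tier(distance_m, is_player_action=False):
    """Pick a synthesis tier; player actions always get the full model."""
    if is_player_action:
        return ("full", 24)
    for name, max_distance_m, modes in TIERS:
        if distance_m <= max_distance_m:
            return (name, modes)
    return ("cached", 0)  # too far away: reuse a pre-computed approximation


@lru_cache(maxsize=128)
def cached_render(freq_hz, decay_s, sample_rate=8000):
    """Lowest tier: render one decaying sine once, then reuse the result."""
    n = int(decay_s * sample_rate)
    return tuple(
        math.exp(-5.0 * i / n) * math.sin(2.0 * math.pi * freq_hz * i / sample_rate)
        for i in range(n)
    )


first = cached_render(180.0, 0.25)
second = cached_render(180.0, 0.25)  # served from the cache, not re-rendered
```

The trade-off is the one the cached tier always carries: distant sounds lose their variation, so the thresholds should sit where listeners are unlikely to notice.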

On a console platform with 6 CPU cores dedicated to audio, a typical budget allocation might be: 2 cores for dialogue and music playback (sample-based), 2 cores for critical procedural sounds (footsteps, combat, vehicle), 1 core for ambient procedural sounds (weather, wildlife, environmental), and 1 core for mixing, spatialization, and effects processing. This allocation assumes approximately 50-80 concurrent procedural voices, which covers the needs of most open-world game scenarios.
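It is worth checking what that allocation implies per voice. The short calculation below uses only the figures from the paragraph above; the dictionary keys are names invented for this example.

```python
# Core allocation from the text; the key names are illustrative.
CORE_BUDGET = {
    "dialogue_music": 2,
    "critical_procedural": 2,
    "ambient_procedural": 1,
    "mix_spatial_fx": 1,
}
PROCEDURAL_GROUPS = ("critical_procedural", "ambient_procedural")


def percent_of_core_per_voice(voices):
    """Average share of one core available to each procedural voice."""
    cores = sum(CORE_BUDGET[group] for group in PROCEDURAL_GROUPS)
    return cores * 100.0 / voices


for voices in (50, 80):
    print(f"{voices} voices -> {percent_of_core_per_voice(voices):.2f}% of a core each")
```

Three procedural cores shared by 50-80 voices leaves roughly 4-6% of a core per voice on average. That sits at or just below the low end of the 5-30% range in Table 1, so a budget like this only seems to hold if most voices run the simplified or cached models and only a few use the expensive ones.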

Tools and Frameworks for Procedural Audio

The procedural audio tooling landscape has matured significantly in the past five years. The primary tools and frameworks include: Csound (one of the oldest computer music languages still in active use, with extensive physical modeling capabilities), Faust (a functional programming language for audio synthesis, compiled to efficient C++ code), SuperCollider (a real-time synthesis server with a powerful programming language), and gen~ (the code-generation environment in Max, which can export procedural synthesis algorithms as C++ code).

For game audio specifically, CsoundUnity (the Csound integration for Unity) and Faust (whose compiled output can be wrapped as a custom Wwise plugin) are the most practical options. Both allow the audio designer to write synthesis algorithms in a high-level language and deploy them as real-time audio engines within the game. CsoundUnity, in particular, provides a direct bridge between Unity game parameters and Csound synthesis parameters, making it possible to drive procedural audio directly from game logic.

If you're starting with procedural audio for the first time, I recommend beginning with Faust. Its functional programming model maps naturally to signal flow concepts, the online editor lets you test algorithms in the browser without installing anything, and the compiled output runs efficiently on everything from desktop computers to embedded processors. The Faust tutorial on the GRAME website walks you through building a complete procedural footstep system in about 200 lines of code.

References: Andy Farnell, "Designing Sound," MIT Press (2010) | GRAME, "Faust Programming Language Documentation" (2024) | Julius O. Smith III, "Physical Audio Signal Processing," W3K Publishing (2010) | Game Audio Network Guild, "Procedural Audio in Games: State of the Industry" (2023)