Taming sox
Bart Massey and Claude Code
Source: https://github.com/pdx-cs-sound/taming-sox.
Licensed under CC-BY 4.0.
Introduction
Sox is a command-line audio Swiss Army knife: it converts formats, applies DSP effects, mixes files, generates tones, and slots cleanly into shell pipelines. Its CLI is genuinely strange — effects come after the output filename, format flags are positional, and a typo can silently mean something completely different. This tutorial introduces those quirks in an order that makes them feel inevitable rather than arbitrary.
What this book covers. Sox has many more effects than any one
tutorial can reasonably teach. The approach here: show the arguments
that teach concepts — the q/t/h/l shapes in fade, the
Q/Hz/octaves units in equalizer, the transfer-function syntax in
compand. For arguments that just tune existing behavior, the sox
man page is the right reference. When an effect is mentioned only in
passing, that’s why.
Sox has real limits — they’re collected at the end of the last chapter if you want to check whether it fits your problem before investing in learning it.
Sample audio files are provided; test tones are generated as we go.
sox and sox_ng
Original sox stopped releasing in 2015. In 2024 a community fork,
sox_ng, picked up active development. Most modern distros now ship
sox_ng under the name sox — when this book says “sox,” it means
whichever binary you have. Everything here works on both. A handful
of features in the later chapters are sox_ng 14.5+ only; those are
called out where they come up.
The title of this book stays Taming sox because the command is
still sox. “sox_ng” only comes up when the fork itself is the
subject.
Getting sox
Most modern distros ship sox_ng under the name sox. Some older
stable releases still ship the 2015 legacy build. Homebrew and the
BSDs vary.
Run sox --version to check. SoX_ng means you’re set. SoX 14.4.2
means you have the legacy build — everything in this book still works,
but you’ll miss the sox_ng-flagged features later, and you’ll be
running an unmaintained binary.
If your platform doesn’t ship sox_ng and you want it, build from https://codeberg.org/sox_ng/sox_ng. Windows binaries are on the releases page.
Getting Started
Check your install
sox --version
You should see a version line. If it starts with SoX_ng, you have
the maintained fork; if it says SoX 14.4.2, you have the 2015
legacy release. The book works on both — a few sox_ng-only features
are flagged where they come up.
Inspecting files with soxi
soxi reads metadata without touching the audio. The book ships
with a short voice recording you can try it on:
⬇ voice.wav (CC0)
soxi voice.wav
Input File : 'voice.wav'
Channels : 1
Sample Rate : 48000
Precision : 16-bit
Duration : 00:00:04.09 = 196162 samples ~ 306.503 CDDA sectors
File Size : 392k
Bit Rate : 768k
Sample Encoding: 16-bit Signed Integer PCM
Useful individual fields (handy in shell scripts):
soxi -r voice.wav # sample rate
soxi -b voice.wav # bit depth
soxi -c voice.wav # channels
soxi -D voice.wav # duration in seconds
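As a sketch of how those single-field flags combine in a script, the helper below prints a one-line summary per file. The name `describe_audio` is made up for illustration and is not part of sox; only the `soxi` flags shown above are assumed.

```shell
# describe_audio FILE...
# Print a one-line summary per file, built from soxi's single-field flags.
# The function name is illustrative; only soxi itself comes from sox.
describe_audio() {
    for f in "$@"; do
        rate=$(soxi -r "$f") || return 1      # sample rate in Hz
        bits=$(soxi -b "$f") || return 1      # bits per sample
        chans=$(soxi -c "$f") || return 1     # channel count
        secs=$(soxi -D "$f") || return 1      # duration in seconds
        printf '%s: %s Hz, %s-bit, %s ch, %s s\n' \
            "$f" "$rate" "$bits" "$chans" "$secs"
    done
}

# Usage: describe_audio voice.wav test.wav
```

Because each field is fetched separately, the helper stops with a non-zero status as soon as `soxi` cannot read a file.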
A test tone
When you want predictable audio to experiment with, generate it:
sox -n test.wav synth 10 sine 440 gain -6
play test.wav
Three new pieces, one sentence each: -n in the input position
means “no input file — generate audio instead.” synth 10 sine 440
synthesizes ten seconds of a 440 Hz sine wave. gain -6 knocks it
down 6 dB (roughly half amplitude), which leaves headroom so later
effects don’t clip. Chapter 10 goes deeper on synth; this is all
you need for now.
Playing audio with play
play test.wav
play is sox with your speaker as the implicit output. It is
literally a symlink to the same binary. It needs a working audio
device — on headless servers, use sox ... -n with stat or
soxi to verify results instead.
Scale the playback amplitude with -v (1.0 = unchanged,
0.5 = half amplitude). Despite the name, -v operates on amplitude,
which is linear: halving the amplitude lowers the level by about
6 dB, which sounds quieter but not half as loud. Use gain (chapter 3)
if you want to think in decibels:
play -v 0.5 test.wav
Play only the first half-second:
play test.wav trim 0 0.5
That trim 0 0.5 at the end is an effect. Don’t worry about the
syntax yet — chapter 2 will make it click.
Recording with rec
rec is the mirror image: sox with your microphone as the implicit
input.
rec capture.wav # record until Ctrl-C
rec capture.wav trim 0 5 # record for 5 seconds
Conversions and Anatomy
Your first conversion
Sox infers format from file extension. Converting is often just:
sox test.wav test.mp3
sox test.wav test.flac
sox test.wav test.ogg
Use soxi to verify the output matches your expectations.
The null output -n discards the result entirely — useful for
checking that sox can read a file without writing anything:
sox test.wav -n
The anatomy of a sox command
Every sox command follows this structure:
sox [global opts] [input opts] infile(s) [output opts] outfile [effects...]
──────────── ────────────────────── ──────────────────── ──────────
globals input output effects
- globals: options affecting the whole run (-V, --buffer, …)
- input: format options for the input(s), placed immediately before the filename
- output: the output file, with its own optional format options before it
- effects: the effects chain, applied left to right
The -v flag from chapter 1 lives in the input section: it scales the
input amplitude as the file is read, before any effects run. The vol
effect (chapter 3) does the same arithmetic but in the effects chain.
These two commands produce identical output:
sox -v 0.5 test.wav out.wav # scale at input
sox test.wav out.wav vol 0.5 # scale in effects chain
-v is input-only — sox will reject it as an output flag. To scale the
output, use vol or gain in the effects chain.
play and rec are just sox with one section missing. play has
no output section (the speaker is implicit). rec has no input section
(the microphone is implicit). Any effect chain you’d write after the
output filename in sox comes directly after the input in play:
sox test.wav out.wav highpass 300 norm -3
play test.wav highpass 300 norm -3
Format flags are positional
A format flag describes the next filename in the command — input or output, depending on where you place it:
sox -r 8000 test.wav out.wav # override input's declared rate; output inherits 8000 Hz
sox test.wav -r 8000 out.wav # resample output to 8000 Hz; input rate unchanged
sox -r 16000 test.wav -r 8000 out.wav # both specified explicitly
All three are valid and mean different things. Placing a flag in the wrong section will silently produce a different result than you intended, which is the most common source of bugs in sox commands. Format options are covered fully in chapter 5.
Effects come last
Effects go after the output filename. This surprises most people once and never again:
sox test.wav out.wav trim 5 10 reverse
# ───────────────── effects
play out.wav
Multiple effects are applied left to right: first trim, then
reverse on the trimmed result.
Basic Effects
trim — cut out a section
play test.wav trim start [length]
trim takes a start position and a length, not start and end.
play test.wav trim 0 5 # first 5 seconds
play test.wav trim 3 4 # 4 seconds starting at 3s
play test.wav trim 5 # skip the first 5 seconds
play test.wav trim -3 # last 3 seconds
play test.wav trim 00:01:30 # start at 1m30s
reverse — play backwards
play test.wav reverse
Sox loads the whole file into memory to do this; large files are slow.
fade — smooth edges
play test.wav fade [type] fade-in [stop-position [fade-out]]
The type can be q (quarter-sine, natural sounding), t (linear),
h (half-sine), or l (logarithmic). Omitting type defaults to
logarithmic.
play test.wav fade 1 # 1s fade-in (default shape), play to end
play test.wav fade q 2 0 2 # 2s fade-in, play to end, 2s fade-out
A stop position of 0 means “play to the natural end of the file.”
vol and gain — adjust volume
vol takes a multiplier; gain takes decibels:
play test.wav vol 0.5 # half amplitude
play test.wav vol 2.0 # double (can clip!)
play test.wav gain -6 # quieter by 6 dB
play test.wav gain 6 # louder by 6 dB (can clip!)
A rough guide: −6 dB ≈ half amplitude; +6 dB ≈ double. Perceived loudness moves more slowly: it takes roughly 10 dB to sound half or twice as loud.
sox_ng 14.5+:
vol accepts a second argument that enables a soft-clipping limiter so boosts don’t hard-clip when they exceed 0 dBFS. See man sox for the exact argument. On legacy sox, vol 2 clips; on sox_ng with the limiter, it shapes the peak instead.
norm — automatic normalization
norm brings the peak sample to a target level (default 0 dBFS):
play test.wav norm # peak to 0 dBFS
play test.wav norm -3 # peak to -3 dBFS (safer headroom)
To save the result: sox test.wav out.wav norm -3.
stat — measure levels
Use -n as the output to discard audio and just print statistics:
sox test.wav -n stat # linear amplitudes, whole file mixed to mono
sox test.wav -n stats # dB levels, per-channel columns
stats is generally more useful: it reports in dB and breaks out
each channel separately. stat reports linear amplitude values,
which are harder to interpret. Both print to stderr.
Chaining Effects
Effects in the effects chain are applied strictly left to right. The output of one effect becomes the input to the next.
play test.wav trim 5 10 fade q 1 0 1 norm -3
# ──────── ──────────── ──────
# 1. trim 2. fade 3. norm
Order matters
# norm then gain: normalize to 0 dBFS, then boost 6 dB — likely clips
play test.wav norm gain 6
# gain then norm: boost first, then normalize back down — norm undoes the gain
play test.wav gain 6 norm
Neither is wrong — they just do different things. Think through the pipeline before you run it.
Writing to a file
When you’re happy with the chain, swap play for sox and add an
output filename:
sox test.wav output.wav trim 5 10 fade q 1 0 1 norm -3
play output.wav
Sox converts the format and applies the effects in a single pass, so trimming and converting to MP3 is one command:
sox test.wav output.mp3 trim 0 30 norm -3
play output.mp3
Format Options
Sox detects format from file extensions. When that isn’t possible — raw PCM files, pipes, unusual encodings — you provide it explicitly.
Recall from chapter 2: format flags describe the next filename. Put them in the wrong section and they apply to the wrong file.
The four core flags
These describe the audio itself. Sox will resample, convert, or remix as needed to meet them.
| Flag | Meaning | Example |
|---|---|---|
| -r | sample rate (Hz) | -r 44100 |
| -b | bit depth | -b 16 |
| -c | channels | -c 1 (mono), -c 2 (stereo) |
| -e | encoding | -e signed-integer |
Common encodings: signed-integer, unsigned-integer,
floating-point, a-law, u-law.
The file type flag
-t is different: it names the container format (WAV, AIFF, FLAC,
raw, and so on) rather than a property of the audio. Sox normally
infers it from the filename extension, so you rarely set it. Reach
for -t only when there’s no extension to read (pipes with -,
headerless raw files) or the extension lies about the content.
-t raw # headerless PCM
-t wav # force WAV regardless of extension
Resampling
sox input.wav -r 8000 telephone.wav # downsample to 8 kHz
play telephone.wav # noticeably lo-fi
sox input.wav -r 48000 hq.wav # upsample to 48 kHz
The format flag before telephone.wav describes the output.
Sox resamples automatically.
Changing bit depth and channels
sox input.wav -b 24 output.wav # convert to 24-bit
sox stereo.wav -c 1 mono.wav # stereo → mono (averages channels)
play mono.wav
sox mono.wav -c 2 stereo.wav # mono → stereo (duplicates channel)
-c uses sox’s default algorithm: averaging when going down,
duplication when going up. For anything more specific — dropping a
channel, swapping L and R, custom mix weights — use remix (chapter 8).
Fully-specified output
Sometimes you want to know exactly what comes out: a specific sample rate, bit depth, channel count, and encoding. This matters for archival (so the artifact doesn’t drift with the default audio config), for interop (another tool expects 16-bit 44.1 kHz stereo signed-integer and nothing else), and for pipelines that hand audio to downstream processes with narrow assumptions.
The recipe: specify all four flags on the output.
sox input.wav -r 44100 -b 16 -c 2 -e signed-integer output.wav
That produces a WAV with exactly those properties regardless of
what the input looked like — sox resamples, converts bit depth,
remixes channels, and re-encodes as needed. Verify with soxi output.wav.
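In a script, that verification step can be automated. A minimal sketch, assuming only the `soxi -r`, `-b`, and `-c` flags from chapter 1; the helper name `has_format` is hypothetical:

```shell
# has_format FILE RATE BITS CHANNELS
# Succeed only if soxi reports exactly the expected sample rate,
# bit depth, and channel count. The helper name is hypothetical.
has_format() {
    [ "$(soxi -r "$1")" = "$2" ] &&
    [ "$(soxi -b "$1")" = "$3" ] &&
    [ "$(soxi -c "$1")" = "$4" ]
}

# Usage:
#   has_format output.wav 44100 16 2 || echo "unexpected format" >&2
```

The comparison is a plain string match, so it expects `soxi` to print bare integers for these three fields.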
The same four-flag pattern works for raw output — just add -t raw:
sox input.wav -t raw -r 44100 -b 16 -c 1 -e signed-integer output.raw
Reading raw PCM
Raw files have no header, so you must describe them completely:
sox -r 44100 -b 16 -c 1 -e signed-integer input.raw output.wav
play output.wav
Writing raw output (see “Fully-specified output” above):
sox input.wav -t raw -r 8000 -b 8 -c 1 -e unsigned-integer output.raw
Piping
Use - for stdin or stdout, with -t to specify the format:
# Two sox processes in a pipeline; raw has no header, so both sides state the format
sox input.wav -t raw -r 44100 -b 16 -c 1 -e signed-integer - | sox -t raw -r 44100 -b 16 -c 1 -e signed-integer - output.wav
For piping between two sox processes specifically, the -p flag
emits sox’s own internal format, which avoids specifying all those
flags manually:
sox test.wav -p trim 0 5 | sox - output.wav norm -3
Filters
Filters shape the frequency content of audio. A quick reference: human hearing spans roughly 20 Hz (low rumble) to 20 kHz (high hiss).
Filtering a single sine wave is uninteresting — it either passes or it doesn’t. Pink noise has energy across the whole spectrum, so filters produce an audible and visible change. Generate some:
sox -n noise.wav synth 5 pinknoise gain -6
play noise.wav
highpass and lowpass
Remove everything below or above a cutoff frequency:
play noise.wav highpass 2000 # remove everything below 2 kHz
play noise.wav lowpass 2000 # remove everything above 2 kHz
play noise.wav highpass 300 lowpass 3400 # telephone band
The telephone band example is a good one to listen to: the characteristic “tinny phone” sound comes entirely from cutting the low and high ends.
bass and treble are shelving variants of equalizer — convenient
when you just want to lift or cut one end. See man sox for arguments.
equalizer — parametric EQ
Three arguments: center frequency, width, gain in dB. Width units are controlled by a suffix:
| Suffix | Unit | Example |
|---|---|---|
| q | Q factor | 2q = Q of 2 |
| o | octaves | 1o = one octave wide |
| h | Hz | 200h = 200 Hz wide |
A bare number with no suffix is read as a Q factor, so write the h whenever you mean Hz.
Q and Hz are inversely related: a higher Q means a narrower band.
Q = center / bandwidth, so 2q at 1 kHz equals a 500 Hz bandwidth.
Q is more useful when you want consistent relative width across
different center frequencies.
Stack multiple equalizer effects to build a full EQ:
play noise.wav equalizer 1000 200h -6 # cut 6 dB at 1 kHz, 200 Hz wide
play noise.wav equalizer 1000 2q -6 # same centre, Q=2 (500 Hz wide)
play noise.wav equalizer 3000 1o 3 # boost 3 dB at 3 kHz, one octave wide
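The Q-to-bandwidth relationship above is plain division, so it is easy to check before committing to a width. A small sketch, assuming awk is available; the helper names `q_to_hz` and `hz_to_q` are illustrative and not part of sox:

```shell
# q_to_hz CENTER_HZ Q
# Bandwidth in Hz for a given centre frequency and Q (bandwidth = centre / Q).
q_to_hz() {
    awk -v f="$1" -v q="$2" 'BEGIN { printf "%g\n", f / q }'
}

# hz_to_q CENTER_HZ BANDWIDTH_HZ
# The inverse: Q for a given centre frequency and bandwidth in Hz.
hz_to_q() {
    awk -v f="$1" -v bw="$2" 'BEGIN { printf "%g\n", f / bw }'
}

q_to_hz 1000 2      # prints 500
q_to_hz 4000 2      # prints 2000: same Q, wider band at a higher centre
hz_to_q 1000 200    # prints 5
```

The second call shows why Q gives a consistent relative width: the band stays half the centre frequency wherever the centre sits.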
A practical voice cleanup chain
⬇ voice.wav (CC0)
sox voice.wav clean.wav \
highpass 100 \
equalizer 3000 500h 2 \
norm -3
play clean.wav
Removes low-frequency noise, adds a little presence, normalizes.
sox_ng 14.5+: adds a FIR filter designed from frequency-response knots: you specify points on the desired magnitude response and sox builds the filter. Useful when neither a shelving nor a parametric shape fits what you want. See man sox.
Time and Pitch
Four effects; two axes:
| Effect | Changes speed? | Changes pitch? | Notes |
|---|---|---|---|
| rate | no | no | proper resampler; changes sample rate only |
| speed | yes | yes | varispeed tape |
| tempo | yes | no | time-stretch, pitch preserved |
| pitch | no | yes | pitch-shift, duration preserved |
rate — resampling
Resamples the audio to a new sample rate. Pitch and duration are both preserved — the output just has fewer (or more) samples per second. Use it to change the technical format of a file, not to alter how it sounds:
sox test.wav out.wav rate 22050 # downsample to 22050 Hz
play out.wav
sox test.wav out.wav rate 48000 # upsample to 48000 Hz
play out.wav
This is equivalent to writing -r 22050 out.wav as an output format
flag, but as an explicit effect it fits naturally in a chain and
gives access to quality options:
- -h (high quality): longer anti-aliasing filter, better stopband rejection, audibly cleaner on music
- -v (very high quality): even longer filter; diminishing returns over -h but useful for archival or repeated resampling where rounding errors accumulate
speed — varispeed
Like a tape running faster or slower. Factor > 1 speeds up and raises pitch; < 1 slows down and lowers pitch.
play test.wav speed 1.5 # faster and higher
play test.wav speed 0.75 # slower and lower
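Because speed changes pitch and duration together, both follow from the factor alone: the pitch moves by 1200 × log2(factor) cents (1200 cents to the octave) and the duration is divided by the factor. A sketch of that arithmetic, assuming awk; the helper names are illustrative:

```shell
# speed_to_cents FACTOR: pitch change in cents caused by a speed factor.
speed_to_cents() {
    awk -v f="$1" 'BEGIN { printf "%.0f\n", 1200 * log(f) / log(2) }'
}

# speed_duration SECONDS FACTOR: resulting duration after the speed change.
speed_duration() {
    awk -v d="$1" -v f="$2" 'BEGIN { printf "%.2f\n", d / f }'
}

speed_to_cents 1.5      # prints 702 (about a perfect fifth up)
speed_to_cents 0.75     # prints -498 (about a perfect fourth down)
speed_duration 10 1.5   # prints 6.67
```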
tempo — time-stretch only
Changes duration while preserving pitch using the WSOLA algorithm (chops audio into overlapping segments and re-stitches them). Practical range: 0.5–2.0.
play test.wav tempo 1.2 # 20% faster, same pitch
play test.wav tempo 0.8 # 20% slower, same pitch
Three presets tune the algorithm for different material:
play test.wav tempo -m 1.2 # music (default)
play test.wav tempo -s 0.75 # speech
play test.wav tempo -l 1.1 # linear (least CPU, more artifacts)
pitch — pitch-shift only
Argument is in cents (100 cents = 1 semitone, 1200 = one octave).
play test.wav pitch 200 # up 2 semitones
play test.wav pitch -1200 # down one octave
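Cents are logarithmic, so the frequency ratio for a shift is 2^(cents/1200). A small sketch of the conversion, assuming awk; `cents_to_ratio` and `semitones_to_cents` are illustrative names, not sox features:

```shell
# cents_to_ratio CENTS
# Frequency ratio for a shift of CENTS cents: ratio = 2^(cents / 1200).
cents_to_ratio() {
    awk -v c="$1" 'BEGIN { printf "%.4f\n", 2 ^ (c / 1200) }'
}

# semitones_to_cents SEMITONES: 100 cents per semitone.
semitones_to_cents() {
    awk -v s="$1" 'BEGIN { printf "%d\n", s * 100 }'
}

cents_to_ratio 1200     # prints 2.0000 (one octave up doubles the frequency)
cents_to_ratio -1200    # prints 0.5000
semitones_to_cents 2    # prints 200
```

So `pitch 200` on the 440 Hz test tone should land near 440 × 1.1225, roughly 494 Hz.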
pitch uses the same WSOLA algorithm as tempo — it is implemented
as a tempo stretch followed by a rate resample in the opposite
direction, so the duration cancels out and only the pitch shift
remains. The -m/-s/-l presets are not exposed on pitch, but you
can pass the same segment, search, and overlap tuning parameters if needed.
Combining them
tempo and pitch are independent effects applied in sequence:
play test.wav tempo 1.2 pitch -400 # faster but lower
See also: Rubber Band
Sox has no phase vocoder. When quality matters — especially for
time-stretching, pitch-shifting, or formant-preserved vocal shifts —
rubberband is the standard
tool. The “Beyond sox” chapter covers how to reach for it.
Combining Files
Sample files — download and place in your working directory:
⬇ music.wav — “Erase Data” by Koi-discovery (CC0)
⬇ voice.wav (CC0)
Setup:
sox -n a.wav synth 3 sine 440 gain -6
sox -n b.wav synth 3 sine 660 gain -6
# normalise samples to a common format for mixing
sox samples/music.wav -c 1 -r 44100 music.wav
sox samples/voice.wav -r 44100 voice.wav
Per-input format flags
With multiple inputs, input-section format flags repeat independently for each input file — place them immediately before the file they describe:
sox [input-a] infile_a [input-b] infile_b [output] outfile [effects]
Any input flag works this way: -v, -r, -b, -c, -t, -e.
The most common use is -v for per-input volume (shown below), and
format flags when combining files of different types or encodings.
-v takes a linear multiplier only — there is no dB form. Common
conversions: −6 dB ≈ 0.5, −12 dB ≈ 0.25, −20 dB = 0.1.
sox -v 0.8 a.wav -t raw -r 48000 -b 32 -c 1 -e signed-integer -v 0.5 b.raw out.wav
play out.wav
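The -v conversions quoted above follow from amplitude = 10^(dB/20). A minimal sketch for values that are not in that short list, assuming awk is available; the helper names `db_to_amp` and `amp_to_db` are illustrative and not part of sox:

```shell
# db_to_amp DB: linear amplitude multiplier for a gain in dB (10^(dB/20)).
db_to_amp() {
    awk -v db="$1" 'BEGIN { printf "%.3f\n", 10 ^ (db / 20) }'
}

# amp_to_db AMP: gain in dB for a linear multiplier (20 * log10(amp)).
amp_to_db() {
    awk -v a="$1" 'BEGIN { printf "%.2f\n", 20 * log(a) / log(10) }'
}

db_to_amp -6     # prints 0.501
db_to_amp -12    # prints 0.251
db_to_amp -20    # prints 0.100
amp_to_db 0.5    # prints -6.02
```

The result can be dropped straight into a command, for example `sox -v "$(db_to_amp -9)" a.wav quieter.wav`.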
Concatenation — A then B
List multiple inputs before the output:
sox a.wav b.wav combined.wav
play combined.wav
Files must have identical sample rates and channel counts — sox
hard-fails if they differ. Use rate to resample first if needed.
For a smooth crossfade at the join, use the splice effect:
sox a.wav b.wav out.wav splice 3 # crossfade at the 3-second mark
play out.wav
Mixing — A over B
The -m global flag sums inputs together rather than concatenating:
play -m music.wav voice.wav
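By default sox scales each input by 1/n when mixing so the sum cannot clip, which often leaves the result quieter than you want. Normalize afterward to bring the level back up:
play -m music.wav voice.wav norm -3
Set per-file volume with -v immediately before each input. Once you give any -v, sox stops balancing the inputs automatically, so leave headroom or keep norm in the chain:
play -m -v 0.3 music.wav -v 1.0 voice.wav norm -3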
Merging channels — A and B side by side
-M puts channels from each file side by side. Two mono files
become one stereo file:
sox -M left.wav right.wav stereo.wav
play stereo.wav
remix — channel routing
Where -c uses sox’s default averaging/duplication, remix gives
explicit control. Each argument describes one output channel by
naming the input channel(s) that feed it.
play stereo.wav remix 2 1 # swap L and R
play stereo.wav remix - # average all channels to mono
play stereo.wav remix 1 # keep left channel only, drop right
play stereo.wav remix 1,2 1,2 # both output channels = L+R mix
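- averages all input channels into one output channel, equivalent
to -c 1 but as an explicit effect. 1,2 mixes channels 1 and 2; unless you give explicit volumes, remix scales such a mix by 1/n, so the result is an average rather than a raw sum.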
Effects and Dynamics
Sample file — download and place in your working directory:
⬇ music.wav — “Erase Data” by Koi-discovery (CC0)
Setup:
# Varying dynamics for compand: loud / quiet / loud
sox -n _loud.wav synth 2 sawtooth 220 gain -6
sox -n _quiet.wav synth 2 sawtooth 220 gain -20
sox _loud.wav _quiet.wav _loud.wav dynamics.wav
play dynamics.wav
reverb
Simulates room acoustics. Arguments: reverberance (0–100), HF damping (0–100), room scale (0–100). Defaults are reasonable.
play music.wav reverb
play music.wav reverb 80 50 100 # large, bright room
--wet-only removes the dry signal, leaving only the wet (reverberated) signal:
play music.wav reverb --wet-only 80
Note:
reverb does not extend the output file. The reverb decay is truncated at the input length. To capture the full tail, pad silence onto the end of the input first:
play music.wav pad 0 2 reverb 80
silence — trim silence
These effects need a file that actually has silence. Generate one
with pad, which adds silence (in seconds) to the start and end:
sox -n padded.wav synth 5 sawtooth 220 gain -6 pad 1 1
play padded.wav
Remove leading and trailing silence:
play padded.wav silence 1 0.1 1% -1 0.1 1%
Each group is: periods duration threshold. The first group handles
the start; the second (preceded by -1) handles the end.
For voice recordings, vad (voice activity detection) is simpler —
it finds the onset of audio activity and trims everything before it:
play padded.wav vad
compand — dynamic range compression
compand reduces the gap between loud and quiet passages.
dynamics.wav from the setup has 14 dB of range to work with.
play dynamics.wav compand 0.3,1 6:-70,-60,-20 -5 -90 0.2
Breaking that down:
- 0.3,1 is the attack (0.3 s) and decay (1 s)
- 6:-70,-60,-20 is the transfer function: a 6 dB soft knee, then input/output dB points
- -5 is the output gain offset in dB (reduce if sox warns about clipping)
- -90 is the initial signal level in dB
- 0.2 is the delay in seconds before processing
A practical podcast leveling chain:
sox dynamics.wav podcast.wav \
highpass 80 \
compand 0.3,1 6:-70,-60,-20 -5 -90 0.2 \
norm -3
Other time-based effects
Sox also provides echo (discrete delays), chorus, and flanger.
Their defaults are reasonable starting points; man sox covers the
tuning parameters.
Synthesis
synth — generating audio from nothing
-n in the input position means “no input file; generate audio.”
synth tells sox what to generate.
play -n synth duration waveform frequency
Waveforms
play -n synth 3 sine 440
play -n synth 3 square 440
play -n synth 3 triangle 440
play -n synth 3 sawtooth 440
Noise
play -n synth 5 whitenoise
play -n synth 5 pinknoise
play -n synth 5 brownnoise
Sweeps
Specify frequency as a range to sweep:
play -n synth 5 sine 100:8000 # 100 Hz → 8 kHz over 5s
Chords
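Give synth several waveforms and each one is generated on its own channel at the same time. Adding -c 1 as an output option mixes them down to a single channel so they sound as a chord:
# C major: C4, E4, G4
play -n -c 1 synth 2 sine 261.63 sine 329.63 sine 392.00 gain -6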
Specifying output format
The output format follows your system’s default audio configuration,
which may not be what you want. Specify it explicitly with output
format flags between -n and the output filename (see “Fully-specified
output” in chapter 5):
sox -n -r 44100 -b 16 -c 1 out.wav synth 3 sine 440
Adding effects
play -n synth accepts a full effects chain:
play -n synth 10 sine 440 reverb 80
Batch Processing
Sox works well in shell scripts and pipelines. The examples here
assume a POSIX shell (bash, zsh, etc.).
Shell loops
Process a directory
mkdir -p normalized
for f in *.wav; do
sox "$f" "normalized/$f" norm -3
done
Construct output filenames
for f in *.wav; do
out="${f%.wav}_clean.wav"
sox "$f" "$out" highpass 100 norm -3
done
${f%.wav} strips the .wav suffix.
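The same parameter expansion works for other renaming patterns. A short sketch in plain POSIX shell; the helper names `clean_name` and `with_ext` are made up for illustration:

```shell
# clean_name FILE: derive "<stem>_clean.wav" from "<stem>.wav".
clean_name() {
    printf '%s\n' "${1%.wav}_clean.wav"
}

# with_ext FILE EXT: replace whatever follows the last dot with EXT.
with_ext() {
    printf '%s\n' "${1%.*}.$2"
}

clean_name "my take.wav"    # prints: my take_clean.wav
with_ext "my take.wav" mp3  # prints: my take.mp3
```

`${1%.*}` removes the shortest suffix matching `.*`, so only the final extension is replaced; keeping every expansion quoted is what lets filenames with spaces survive.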
Batch format conversion
mkdir -p mp3
for f in *.wav; do
sox "$f" "mp3/${f%.wav}.mp3"
done
Use soxi in scripts
duration=$(soxi -D "$f")
if awk "BEGIN { exit !($duration > 5) }"; then
sox "$f" trimmed.wav trim 0 5
fi
Parallel processing
mkdir -p out
printf '%s\0' *.wav | xargs -0 -P 4 -I{} sox {} "out/{}" norm -3
Check exit codes
Sox exits non-zero on errors. Always check in scripts:
for f in *.wav; do
sox "$f" "out/$f" norm -3 || echo "Failed: $f" >&2
done
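For longer batches it can help to keep going after a failure and report an overall status at the end. A sketch of one way to do that; the function name `normalize_all` and the choice of `norm -3` are illustrative assumptions:

```shell
# normalize_all OUTDIR FILE...
# Normalize each FILE into OUTDIR, keep going after a failure, and return
# non-zero if any file failed. The function name is hypothetical.
normalize_all() {
    outdir=$1
    shift
    mkdir -p "$outdir" || return 1
    failed=0
    for f in "$@"; do
        # ${f##*/} drops any leading directory so outputs land flat in OUTDIR
        if ! sox "$f" "$outdir/${f##*/}" norm -3; then
            echo "Failed: $f" >&2
            failed=$((failed + 1))
        fi
    done
    [ "$failed" -eq 0 ]
}

# Usage: normalize_all out *.wav || echo "some files failed" >&2
```

Returning a single status lets a calling script or CI job decide what to do, while the per-file messages on stderr say which inputs need attention.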
Piping between sox processes
The -p flag emits sox’s internal format on stdout — no need to
specify sample rate, bit depth, or encoding on the receiving end:
sox voice.wav -p trim 0 3 | play - reverb 80
This avoids intermediate files in multi-step pipelines.
Troubleshooting
Most sox problems fall into a small set of patterns. Each one has a quick diagnostic.
Silent output
The file exists, soxi shows sensible numbers, and you hear nothing.
Likely causes, in order:
- Audio device: another app has the output, or play is pointed at the wrong sink. Try play on a known-good file first (play -n synth 1 sine 440). If that is silent, it’s not a sox problem.
- -v in the wrong section: sox -v 0 input.wav out.wav scales the input to zero, producing a silent file with no error. Check that -v belongs where you put it (see chapter 2).
- System volume muted at the OS level: check that independently.
Clipping
Clipping sounds like harsh distortion on loud passages — a kind of fuzzy crunch that tracks peaks rather than being continuous. Common causes:
- gain N after norm: norm lifts the peak to 0 dBFS, then gain pushes above it. Reorder, or use norm -N instead.
- Mixing without headroom: -m with explicit -v values sums the inputs as given, so two full-scale signals clip immediately. Either -v 0.5 each input or norm -3 the result.
- Upsampling a signal that was already at 0 dBFS: the interpolator’s ringing can exceed the original peak.
Detect clipping with stats:
sox output.wav -n stats
Watch the Pk lev dB line and the Flat factor / Num samples
report for saturated counts. A non-zero Flat factor on output that
shouldn’t have any flat runs is a strong signal.
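A rough way to script that check, assuming the stats layout in which the Pk lev dB line carries the overall figure as its fourth whitespace-separated field. The helper names and the 0.1 dB margin are illustrative choices, not sox defaults:

```shell
# peak_db FILE
# Print the overall peak level in dB as reported by the stats effect.
# stats writes to stderr, so redirect it into the pipe.
peak_db() {
    sox "$1" -n stats 2>&1 | awk '/^Pk lev dB/ { print $4 }'
}

# is_near_full_scale FILE [MARGIN_DB]
# Succeed if the peak is within MARGIN_DB (default 0.1) of 0 dBFS.
# Exits with status 2 if no peak line was found at all.
is_near_full_scale() {
    peak_db "$1" | awk -v m="${2:-0.1}" '
        { found = 1; exit !($1 >= -m) }
        END { if (!found) exit 2 }'
}

# Usage: is_near_full_scale output.wav && echo "check for clipping" >&2
```

A peak at full scale is only a hint, not proof of clipping, so treat a hit as a prompt to look at the Flat factor and listen.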
Format mismatch on concat or mix
Sox hard-fails when inputs to concatenation or -m mixing differ in
sample rate or channel count:
sox FAIL sox: Input files must have the same sample-rate
Fix by pre-processing the outlier:
sox other.wav -r 44100 other-44k.wav # match the rate
sox mono.wav -c 2 mono-stereo.wav # match channels
sox main.wav other-44k.wav combined.wav # now concat works
Or do it in a single pipeline with -p:
sox other.wav -p rate 44100 channels 2 | sox main.wav - combined.wav
play fails on headless systems
play needs a working audio device. On servers, CI, and containers,
it typically can’t find one and errors out. Diagnose a chain without
actually playing:
# Write to /dev/null-style null output and read stats
sox input.wav -n stats
# Or render to a temp file and inspect with soxi
sox input.wav out.wav <effects>
soxi out.wav
-n as the output lets effects run through to stat/stats
without needing a device.
Typos that silently mean something else
Effect names in sox aren’t validated against a “did you mean” list;
a misspelling is often a valid effect that does something completely
different. bass and bas both parse; one of them isn’t the
shelving filter. Similarly, format flags put in the wrong section
apply to the wrong file without a warning.
The defense: eyeball the command before running, and use -V3 to
see what sox actually thinks it’s doing.
Use -V for diagnostics
Sox takes a verbosity level from -V1 (errors only) up through
-V4 (everything it knows). -V3 is the usual sweet spot — it
prints the effect chain as sox understands it, including which
effects actually ran and with what arguments:
sox -V3 input.wav out.wav highpass 100 norm -3
If an effect isn’t doing what you expected, -V3 usually tells you
why in the first few lines.
Reading sox error messages
Most sox errors are in the form sox FAIL <subsystem>: <message>.
A few common ones:
- sox FAIL formats: no handler for file extension ... means you asked sox to read or write a format without a -t flag, and the extension didn’t disambiguate. Add -t wav (or whatever).
- sox FAIL sox: Input files must have the same sample-rate is covered in the format-mismatch section above.
- sox FAIL rate: Input sample-rate ... is unchanged means you asked rate to resample to the rate it’s already at. Remove the effect.
- sox WARN ...: clipped N samples; ... means the output clipped; see the clipping section above.
When in doubt, re-run with -V3 — the warning is usually right next
to the line that caused it.
Beyond sox
The manual
man sox is the authoritative reference — comprehensive, well-written,
and covers every effect and flag in detail. Two companion pages are
also worth bookmarking:
man sox # effects, global options, examples
man soxformat # format flags, encodings, file type details
LADSPA plugins
Sox can load any LADSPA plugin via the ladspa effect, which opens
up hundreds of production-quality processors — noise gates, limiters,
multiband compressors, pitch correction, and more:
# List installed plugins
listplugins
# Apply a plugin: name the module file, then the plugin label if the module holds several
play voice.wav ladspa <module> [plugin-label] [params...]
On Debian/Ubuntu, apt install swh-plugins installs Steve Harris’s
widely-used collection. LADSPA extends sox without changing its
pipeline model.
ffmpeg
ffmpeg handles the containers and codecs sox can’t: AAC, Opus,
MP4, video tracks, streaming protocols. The two tools pair naturally
— use ffmpeg to get audio into or out of awkward formats, sox for
signal processing:
# Extract audio from a video, then process with sox
ffmpeg -i video.mp4 -vn audio.wav
sox audio.wav processed.wav highpass 100 norm -3
Rubber Band: high-quality time-stretching and pitch-shifting
Sox has no phase vocoder. WSOLA (tempo, pitch) is fast and
reasonable, but on complex music you can hear it working. For
higher-quality time-stretching, pitch-shifting, or near-unity
resampling, reach for
rubberband:
rubberband --tempo 1.2 input.wav output.wav # 20% faster (same sense as sox tempo)
rubberband --time 0.8 input.wav output.wav # 0.8x duration (--time is 1/--tempo)
rubberband --pitch 2 input.wav output.wav # up 2 semitones (not cents)
rubberband --pitch -2 --tempo 1.1 input.wav output.wav # combine freely
Rubber Band has two engines: R2 (the default, faster) and R3 (slower, higher quality, noticeably better on music):
rubberband --fine --pitch 4 input.wav output.wav # R3 engine
rubberband-r3 --pitch 4 input.wav output.wav # equivalent
For vocal pitch-shifting, --formant preserves the formant
structure so voices don’t sound cartoonish:
rubberband --fine --formant --pitch 3 voice.wav output.wav
rubberband doesn’t support stdout, so to combine with sox for
format conversion, route through a temp file:
rubberband -q --pitch 2 input.wav tmp.wav && sox tmp.wav output.flac
libsox
Sox is also a C library. If you need to embed audio processing in a
program, libsox exposes the full effect chain and format I/O via
a C API. The header is sox.h; the source ships with examples.
What sox isn’t
Sox is a Swiss Army knife, but some problems aren’t shaped like a knife. Knowing what sox is not good at saves time:
- No phase vocoder. WSOLA (tempo, pitch) works well but produces artifacts on complex material; use Rubber Band (above) when quality matters.
- No multitrack routing. Sox processes one stream at a time. For independent tracks with sends and returns, look at ecasound or a DAW.
- No streaming protocols. Sox reads and writes files and pipes; it has no RTSP, HLS, or WebRTC support.