Combining Files
Sample files: download and place in a samples/ subdirectory of your working directory (the setup commands read from samples/):
⬇ music.wav — “Erase Data” by Koi-discovery (CC0)
⬇ voice.wav (CC0)
Setup:
sox -n a.wav synth 3 sine 440 gain -6
sox -n b.wav synth 3 sine 660 gain -6
# normalise samples to a common format for mixing
sox samples/music.wav -c 1 -r 44100 music.wav
sox samples/voice.wav -r 44100 voice.wav
Per-input format flags
With multiple inputs, format flags apply per file: each group of flags describes only the input file that immediately follows it.
sox [input-a] infile_a [input-b] infile_b [output] outfile [effects]
Any input flag works this way: -v, -r, -b, -c, -t, -e.
The most common use is -v for per-input volume (shown below), and
format flags when combining files of different types or encodings.
-v takes a linear amplitude multiplier only; there is no dB form.
To convert, use multiplier = 10^(dB/20). Common conversions:
−6 dB ≈ 0.5, −12 dB ≈ 0.25, −20 dB = 0.1.
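# b.raw has no header, so create it first and spell out its format when reading it
sox b.wav -t raw -r 48000 -b 32 -c 1 -e signed-integer b.raw
sox -v 0.8 a.wav -t raw -r 48000 -b 32 -c 1 -e signed-integer -v 0.5 b.raw out.wav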
play out.wav
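Because -v only accepts linear multipliers, a small helper can do the dB conversion for you. This is a minimal sketch assuming a POSIX shell with awk; db_to_linear is a hypothetical name, not something sox provides.

```shell
# db_to_linear DB: print the linear amplitude multiplier for a level in dB.
# Hypothetical helper (not part of sox); applies multiplier = 10^(dB/20).
db_to_linear() {
    awk -v db="$1" 'BEGIN { printf "%.3f\n", exp(db / 20 * log(10)) }'
}

# Example use once the function is defined:
#   sox -v "$(db_to_linear -6)" a.wav quieter.wav
```

The result is rounded to three decimals, which is more precision than a volume setting needs.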
Concatenation — A then B
List multiple inputs before the output:
sox a.wav b.wav combined.wav
play combined.wav
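Inputs must have the same sample rate and channel count; sox
exits with an error if they differ. Convert the odd one out first:
the rate effect (or -r on the output) changes the sample rate, and
-c or remix changes the channel count.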
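To catch a mismatch before sox does, you can compare what soxi reports for each file. The sketch below assumes soxi is installed alongside sox; same_format is a hypothetical helper name.

```shell
# same_format FILE...: succeed only if every file has the same sample rate
# and channel count as the first one. Hypothetical helper built on soxi
# (-r prints the sample rate, -c prints the channel count).
same_format() {
    ref="$(soxi -r "$1") $(soxi -c "$1")" || return 2
    for f in "$@"; do
        if [ "$(soxi -r "$f") $(soxi -c "$f")" != "$ref" ]; then
            echo "format mismatch: $f (expected rate/channels: $ref)" >&2
            return 1
        fi
    done
}

# Example use once the function is defined:
#   same_format a.wav b.wav && sox a.wav b.wav combined.wav
```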
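For a crossfade at the join, use the splice effect. Its argument is the position of the join, which is the length of the first file (3 seconds for a.wav). With the default settings the crossfade is very short, on the order of 10 ms: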
sox a.wav b.wav out.wav splice 3 # crossfade at the 3-second mark
play out.wav
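For a longer, clearly audible crossfade, splice also accepts an "excess" value after the position. The sketch below follows the cross-fade recipe in the sox manual (splice -q with the first file's duration from soxi -D); crossfade is a hypothetical wrapper name, and the timing is worth checking by ear on your own files.

```shell
# crossfade FIRST SECOND OUT SECONDS
# Hypothetical wrapper: joins FIRST and SECOND with an equal-power (-q)
# crossfade. SECONDS is passed as splice's "excess"; the fade spans roughly
# twice that long, centred SECONDS before the end of FIRST.
crossfade() {
    first_len=$(soxi -D "$1") || return 1   # duration of FIRST in seconds
    sox "$1" "$2" "$3" splice -q "$first_len,$4"
}

# Example use once the function is defined:
#   crossfade a.wav b.wav faded.wav 1
```

Both files need to be longer than the requested overlap for this to make sense.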
Mixing — A over B
The -m global flag sums inputs together rather than concatenating:
play -m music.wav voice.wav
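When no -v is given, sox scales each of the n inputs by 1/n so the sum cannot clip, which can leave the mix quieter than its sources. Normalize afterward to bring the peak back up to −3 dB:
play -m music.wav voice.wav norm -3
Set per-file volume with -v immediately before each input. Manual volumes turn off the automatic 1/n scaling, so the sum can clip; sox warns when that happens, and the fix is to lower the multipliers:
play -m -v 0.3 music.wav -v 1.0 voice.wav norm -3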
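One conservative way to choose -v values for a mix is to treat them as relative weights and rescale them so they sum to 1; a weighted sum like that cannot exceed full scale if the inputs do not. The sketch below assumes a POSIX shell with awk; balance_volumes is a hypothetical helper name.

```shell
# balance_volumes W1 W2 ...: rescale relative weights so they sum to 1.
# Hypothetical helper (not part of sox); prints the new weights on one line.
balance_volumes() {
    awk 'BEGIN {
        for (i = 1; i < ARGC; i++) sum += ARGV[i]
        for (i = 1; i < ARGC; i++)
            printf "%s%.3f", (i > 1 ? " " : ""), ARGV[i] / sum
        print ""
    }' "$@"
}

# balance_volumes 0.3 1.0   prints: 0.231 0.769
# which could then be used as:
#   play -m -v 0.231 music.wav -v 0.769 voice.wav norm -3
```

The ratio between the two sources stays the same; only the overall level drops, and norm restores it.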
Merging channels — A and B side by side
-M puts the channels of each file side by side, so the output has
as many channels as all inputs combined. Two mono files become one
stereo file, with the first input as the left channel:
sox -M a.wav b.wav stereo.wav
play stereo.wav
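Since -M stacks channels, the output channel count is the sum of the inputs' counts. A quick way to predict it is to add up what soxi -c reports; merged_channels below is a hypothetical helper name and assumes soxi is available.

```shell
# merged_channels FILE...: print how many channels `sox -M FILE... out` would
# produce, by summing the channel count soxi reports for each input.
# Hypothetical helper, not part of sox.
merged_channels() {
    total=0
    for f in "$@"; do
        n=$(soxi -c "$f") || return 1
        total=$((total + n))
    done
    echo "$total"
}

# Example use once the function is defined:
#   merged_channels a.wav b.wav
```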
remix — channel routing
Where -c uses sox’s default averaging/duplication, remix gives
explicit control. Each argument describes one output channel by
naming the input channel(s) that feed it.
play stereo.wav remix 2 1 # swap L and R
play stereo.wav remix - # average all channels to mono
play stereo.wav remix 1 # keep left channel only, drop right
play stereo.wav remix 1,2 1,2 # both output channels = L+R mix
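- averages all input channels into one output channel, equivalent
to -c 1 but as an explicit effect. 1,2 mixes channels 1 and 2; by
default remix scales each by 1/2 to preserve the level, so the
result is an average rather than a raw sum. Use remix -m to turn
that scaling off.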
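remix can also undo a merge by pulling each channel back out into its own file. The sketch below uses only the single-channel form of remix; split_stereo is a hypothetical wrapper name.

```shell
# split_stereo IN LEFT_OUT RIGHT_OUT
# Hypothetical wrapper: writes channel 1 of IN to LEFT_OUT and channel 2 to
# RIGHT_OUT, each as a mono file.
split_stereo() {
    sox "$1" "$2" remix 1 &&   # keep channel 1 (left) only
    sox "$1" "$3" remix 2      # keep channel 2 (right) only
}

# Example use once the function is defined:
#   split_stereo stereo.wav left.wav right.wav
```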