If you buy into Malcolm Gladwell’s interpretation of the “10,000-Hour Rule,” popularized in his book Outliers, I’m a long way from being an expert at audio production. A very long way. SoundStage! Solo editor Brent Butterworth and I have recorded 22 episodes of the SoundStage! Audiophile Podcast to date, and I’ve mixed and mastered about half of them. Each episode takes me anywhere between 16 and 20 hours to produce and edit, which means I’m creeping up on somewhere around 200 hours of experience in the field of audio production. And 200 divided by 10,000 is math.
Needless to say, that makes me a novice. (Although, if this were D&D 5e instead of real life, I’d technically have enough XP to be a 5th-level artificer.) And yet, even with as little experience as I have, my little toe-dip into audio production has influenced the way I think about audio reproduction in some pretty significant ways, which is of course going to influence the way I write about audio. It has reinforced some of my biases, challenged others, and solidified my thoughts on one of the most-oft-parroted-but-rarely-followed maxims of the audiophilia world. And it occurs to me that the last point is just clickbaity enough to save for last.
But where to start? Let’s go with the least controversial.
Point 1: The “Circle of Confusion” is real, y’all.
This is a concept that Floyd Toole articulated in Sound Reproduction: The Acoustics and Psychoacoustics of Loudspeakers and Rooms, although you can read a more digestible summary of the book on Sean Olive’s blog. What it boils down to, though, is this: audio is largely produced on gear whose quality is judged by how well it reproduces audio that was produced on gear whose quality was judged by how well it produces audio that—well, you get the picture. Without objective standards, it’s a tautology. Or maybe it’s an ouroboros; I don’t know. What I do know is that although both Floyd and Sean explain it brilliantly, it wasn’t until I started producing audio that I realized what a difficult problem it is.
I explained this a bit in my recent rant about audio and video standards, but just to recap: until recently, when producing the podcast, I would do my first-pass mix on a pair of headphones before switching over to my nearfield desktop system, which always revealed some tonal quirks that I would have to go back and tweak with EQ. And once I got a mix sounding good on my headphones and nearfield system, I’d throw it over to my hi-fi stereo system, only to hear more quirks that needed to be EQ’d. Then I would hop in the car for another listen, only to discover more weirdness. And back and forth I went. Settling on a mix that sounded good on high-quality headphones, crappy earbuds, a nearfield desktop system, my stereo rig, and the pretty rockin’ sound system in my wife’s Hyundai Tucson was like playing a game of Whac-A-Mole, but with a scalpel instead of a mallet.
As discussed in the aforementioned rant, though, since adding a plugin to my stereo bus that corrects the frequency response and phase of my headphones, I don’t have to whack moles anymore. My first-pass EQ, done via headphones that now conform to the Harman curve, sounds great on my desktop system, great on my hi-fi system, great in my wife’s new whip, and even pretty darned good on the bargain-bin true wireless earphones I normally use only for phone calls.
So what bearing does any of this have on my attitude toward audio reproduction? It really comes down to this: I’m now even less inclined to tolerate tonally idiosyncratic electronics with their own distinctive voices. I used to pull out my B&W PX7 wireless cans every few months—mostly when my preferred Sonys were charging and I needed to rock right now—just to try and find some love for them. So many of my friends adore them, after all, and they are gorgeous. Frankly, though, I’ve finally given up even trying to tolerate them, and now I want to defenestrate them because the B&W app won’t let me EQ their distinctive tonal character out of existence.
It’s hard enough to produce audio that sounds consistently good across a number of reasonably neutral sound systems. Add radically tonally idiosyncratic gear into the mix, and it really just becomes a frustrating guessing game. More bass or less bass is fine. That’s a crapshoot anyway. But once you start mucking around with the midrange and treble, you’ve lost me. And now I finally have the weensiest bit of production experience to explain why.
Point 2: Hi-res is stupid, but it isn’t. But it is. But not really.
You may be asking yourself what podcast production has to do with high-res audio, given that Brent and I record in 16/44.1 WAV and distribute in MP3 at the same sample rate. And when I first started mixing and mastering some of those earlier episodes (which I can barely even stand to listen to now), I would have agreed with you.
But if you’ve listened to the podcast (and if you haven’t, why not?!), you’ll know that Brent and I insert interstitial music between segments. And when you introduce music under dialogue, you have to use side-chain compression to keep the mix from turning into a jumble of unintelligible noise, or to keep one element from getting completely lost. If you don’t know what the heck side-chain compression is, by the way, make sure to check out Mark Phillips’s excellent article about audio mixing over on SoundStage! Experience. He digs way deeper than I have the space to do here.
For the purposes of this article, all you really need to keep in mind is that as soon as you add dynamic-range compression to a music track, you’ve moved into the domain of music production, not merely podcast audio production. And I’ve learned that I prefer different types of compressors when I’m working with music as opposed to our voices. With our main dialogue track, I like really clean digital compressors, like ToneBoosters’ Compressor 4, along with an automated fader rider. When I’m compressing Brent’s demo music tracks, or those from our friend Terry Landry, I tend to prefer using plugins that emulate old analog gear, like the CLA-2A plugin from Waves.
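If it helps to see the idea in miniature, here is a rough sketch of what side-chain compression does to a music bed: the compressor listens to the voice, but turns down the music. This is not how Compressor 4, the CLA-2A, or Reaper actually implement it. The function names, the threshold, the ratio, and the attack and release coefficients are all made up for illustration.

```python
def envelope(samples, attack=0.5, release=0.05):
    """Simple envelope follower: rises quickly, falls back slowly.

    The attack/release coefficients are illustrative assumptions,
    not values taken from any real plugin.
    """
    level = 0.0
    out = []
    for s in samples:
        target = abs(s)
        coeff = attack if target > level else release
        level += coeff * (target - level)
        out.append(level)
    return out


def duck(music, voice, threshold=0.1, ratio=4.0):
    """Turn the music down whenever the voice gets louder than the threshold.

    The voice is the side-chain (detector) signal; the gain reduction it
    triggers is applied to the music instead of to the voice itself.
    """
    if len(music) != len(voice):
        raise ValueError("music and voice must be the same length")
    ducked = []
    for m, env in zip(music, envelope(voice)):
        if env > threshold:
            # Standard compressor curve in the linear domain:
            # level above the threshold is reduced by the ratio.
            compressed = threshold * (env / threshold) ** (1.0 / ratio)
            gain = compressed / env
        else:
            gain = 1.0
        ducked.append(m * gain)
    return ducked


# Hypothetical signals: constant-level music, voice that starts halfway in.
music = [0.5] * 200
voice = [0.0] * 100 + [0.8] * 100
result = duck(music, voice)
print("music level before voice:", result[0])
print("music level under voice: ", round(result[-1], 3))
```

The music plays at full level until the voice comes in, then drops out of the way. A real compressor adds look-ahead, a soft knee, makeup gain, and far better envelope detection, which is where the different flavors of compressor start to sound different.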
It took me forever to figure out why I couldn’t get the CLA-2A to sound right, though. And what I eventually discovered is that if you want to emulate analog in the digital domain, you really need to work in 96kHz or even 192kHz. And it’s really just a function of aliasing. This will be easier to explain in video form, so if you don’t mind, join me over on YouTube for just a minute.
Now, here’s the thing that might not be intuitive. I can hear the difference between working in 44.1kHz (the sample rate at which we record) and working in 96kHz so I can oversample my analog-emulating compressors. Big difference. The sort of difference that anyone can hear. But when I low-pass the results and render the file back down to 44.1kHz, I can’t hear a bit of difference between the CD-resolution render and the 96kHz working file. Not a bit.
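For anyone who would rather see the aliasing arithmetic than take my word for it, here is a minimal sketch. It assumes a hypothetical 15kHz tone running through a saturation stage that adds second and third harmonics, which is a big simplification of what an analog-emulating plugin really does. The function name `alias_frequency` is my own, not something from any audio library.

```python
def alias_frequency(freq_hz, sample_rate_hz):
    """Return the frequency at which a tone shows up after sampling.

    Anything above half the sample rate (the Nyquist frequency)
    folds back down into the band below it.
    """
    folded = freq_hz % sample_rate_hz
    nyquist = sample_rate_hz / 2
    return sample_rate_hz - folded if folded > nyquist else folded


fundamental = 15_000  # Hz, a hypothetical high-frequency tone

for rate in (44_100, 96_000):
    for n in (2, 3):
        harmonic = n * fundamental
        landed = alias_frequency(harmonic, rate)
        print(f"{rate} Hz project: harmonic {n} at {harmonic} Hz lands at {landed} Hz")
```

In the 44.1kHz project, the 45kHz third harmonic folds down to 900Hz, smack in the midrange and musically unrelated to the original tone. In the 96kHz project it stays at 45kHz, where nobody can hear it and where a low-pass filter can remove it before the render back down to 44.1kHz.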
In other words, it makes all the sense in the world to mix and master in high-res, but I can’t see any benefit whatsoever to playing back a high-res file on my stereo system. Working in high-res is essential. Listening in high-res is just sort of whatever.
And all of the above sort of sounds like my status quo position for years now, true. I’ve never made much secret of the fact that I think streaming and downloading high-res audio is sort of a cargo-cult ritual. What’s different is that I now have just enough hands-on experience to understand why so many music producers work in high-res. And, hey, if you want to lay claim to owning (or streaming) a perfect copy of the studio master, then sure. There’s something to that. The next point might provide some ammunition for such a stance, depending on how you look at it. But when it comes to resolution, what works for production doesn’t necessarily have much bearing on what’s needed for full-fidelity reproduction.
Point 3: Sometimes, the weirdest stuff affects sound quality.
There are, of course, some really obvious things that can ruin the sound quality of an episode of the podcast. Technical glitches. Monitoring problems that don’t let me hear one of our signals clipping when we’re rolling tape. Brent recording from a random Motel 6 in the middle of nowhere with blankets hung over chairs and dressers as makeshift acoustical treatments.
But there are also some unexpected bugbears that legitimately make a difference. Thankfully, I have more control over these. I’ll give you one example. As mentioned above, Brent and I record in WAV, but of course no podcast is actually released in anything other than MP3 or AAC or Ogg Vorbis. That might change if Qobuz ever gets into the podcasting game in a serious way, but for now it’s all lossy compression all the time.
But here’s the thing about Reaper, the digital audio workstation (DAW) that Brent and I use to produce the podcast: you have a bazillion and three options for the formatting of the file it spits out once all the effects are baked and the stems are rendered and a wall full of disparate audio elements is folded down to a simple stereo track. I can have Reaper spit out a 192kbps MP3 track or a high-res FLAC file in anything up to 24/192. We always deliver our final edit in 320kbps MP3, though, so the first time I rendered the final track for an episode I was mixing, that was the format I chose.
And it sounded kinda bad. Not horrible, but the artifacts bothered me. That’s doubly weird given that I’ve always said a truly well-encoded 320kbps MP3 file doesn’t really sound meaningfully different from uncompressed CD audio or even high-res with most music (there are, of course, tracks that will reveal the flaws in almost any lossy format, but that’s generally not the sort of music I dig).
Around the time I realized this, though, a memory hit me. When I was doing some editing in Audacity, before I switched to Reaper, I didn’t think the MP3s it spat out sounded bad at all. So I went back to Reaper and rendered that episode in 16/44.1 WAV, pulled that into Audacity, and converted it to MP3. ABXing the resulting files, I found no meaningful difference between the Audacity MP3 and the Reaper WAV, but the Reaper MP3 stood out like a sore thumb every time. So I started digging. Both Reaper and Audacity rely on LAME for MP3 encoding. Reaper uses v3.99.5; Audacity uses v3.100. The differences between those two versions? Nothing that ought to affect audio quality, as far as I can see. But they sound different. I can hear it every time.
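A quick aside on what "every time" buys you in an ABX test. The sketch below computes the odds of scoring at least a given number of correct identifications by pure guessing. The trial counts are hypothetical, since I haven't listed how many rounds I ran, and `abx_p_value` is just a name I made up for this illustration.

```python
from math import comb


def abx_p_value(correct, trials):
    """Probability of getting at least `correct` right out of `trials`
    if the listener is guessing (50/50 chance on each trial)."""
    if not 0 <= correct <= trials:
        raise ValueError("correct must be between 0 and trials")
    ways = sum(comb(trials, k) for k in range(correct, trials + 1))
    return ways / 2 ** trials


# Hypothetical session lengths and scores, for illustration only.
for correct, trials in ((10, 10), (8, 10), (6, 10)):
    print(f"{correct}/{trials} correct: p = {abx_p_value(correct, trials):.4f}")
```

Ten for ten would happen by luck only about once in a thousand sessions, while eight for ten would happen a bit more than 5% of the time. That is why a clean sweep is so much more convincing than "mostly right."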
Of course, there are going to be some self-professed golden-eared snobs who read this and claim victory, secure in their belief that they knew all along how horrible lossy compression was. But it doesn’t prove that. All it really proves is that the encoder matters every bit as much as the format. This is something we all need to keep closer to the forefront of our minds.
Another example: as I mentioned above, Terry Landry has been kind enough to give us his demo tracks for an album he’s working on, so we have a little more variety in our bumper music. Aside from—I believe—the saxophone, everything in these tracks is MIDI, and none of it was intended as a final mix or master, so understandably it’s a little flat. And I mean that in terms of soundstaging, not the composition. As such, Terry gave me permission to remaster the demos for our purposes, not only to make some room for our voices, but also just to give the music a little extra zhuzh.
Here’s the craziest thing: I was using these demos to tinker around with a new tape-bus emulator I was trying out, and at a certain point while futzing with the exact balance of tape speed, saturation, hiss, and asperity, all of a sudden the music just leaped right out of my speakers and into the room. The depth of the soundstage was unmistakable. There was air and “room” in the music that simply didn’t exist in the original demo track. And all I’d really added were distortion and noise.
I copied my settings and tried them on another track. They didn’t work. I had to futz with the knobs for a solid five minutes to get a similar effect, but “similar” was about as close as I could come.
Mind you, give me a few hundred more hours of audio production experience, or another few levels in artificer if you want to pick the D&D metaphor up again, and maybe I’ll understand the precise relationship between distortion, noise, and soundstaging. But for now, what I’m learning is mostly just how much I don’t know about how audio works.
And that’s OK. If anything, it keeps me honest—more inclined to simply report what I’m hearing rather than attempting to explain it unless I’m 100% secure in my understanding of the explanation.
But it also reinforces my notion that I want any sort of analog chaos in my music to come from the studio, not from my electronics. Sure, something a bit funkier and more boutique in terms of electronics might add some depth or air to my favorite recordings, but how that happens is a bit of a crapshoot, so let’s not pretend it’s true to the recording. Let the engineers and artists add what effects they want. They’ll come through just fine on gear that doesn’t do its own editorializing.
Point 4: Whoever said “trust your ears” is a freaking genius.
I went into this podcast knowing exactly diddly squat about audio production. When we first started, Brent had to kick my ass on a nigh-weekly basis to beat bad practices out of me before they became habits. I listened to YouTube gurus who recommended this plugin or that sure-fire shortcut for compressing vocals. I made a ton of mistakes and listened to a lot of the wrong people.
But at every step of the way, when critiquing my work, Brent would repeat that old nugget of wisdom that we all know so well: “trust your ears.”
I’ll admit, I’ve always sort of snickered at that adage. I’m more inclined to trust measurements than my own listening impressions. I generally doubt my own conclusions, and I’ll admit that I’m not as skeptical of objective measurements as I should be—especially given that measurements are performed by fallible humans whose methodology isn’t always rock-solid. That’s uncontroversial.
So many of the mistakes I made in those early days, I should have recognized as mistakes before I ever shared my rough drafts with Brent. I knew I didn’t like the way things sounded, but I was doing it the way the so-called experts said to do it. If I had just trusted my ears, I would have learned a long time ago that, while there are all sorts of objective standards for audio reproduction, audio production is largely a subjectivist’s game. I just want it to sound good. Not “accurate.” Not “neutral.” Not “high-fidelity.” Those are characteristics I’m looking for from my electronics and transducers. With production, though, I just want to make as many ears as happy as possible. And only my ears can tell me whether I’m coming close to that goal.
Mind you, I use spectrum analyzers and meters of all sorts to help me figure out why something might not sound good if it doesn’t. But when it comes right down to it, they can’t tell me whether something sounds good or not.
And it’s the same with the measurements we publish here on the SoundStage! Network. When we ran my review of the Monitor Audio Silver 300 7G loudspeakers, I legitimately lost sleep over the fact that the measurements from the NRC revealed a tiny little peak at 3.5kHz that I didn’t hear in my extensive subjective listening. It shook me so badly that I didn’t want to do another speaker review for a long time afterward. I prostrated myself in front of SoundStage! Network founder Doug Schneider and offered to perform seppuku. Here’s a speaker that I positively adored. One whose tonal balance I described as “pitch perfect.” But the measurements didn’t bear that out 100%. Close, but no lobster.
In retrospect, the lesson I should have learned from that sooner is that a little 1.5dB peak at 3.5kHz doesn’t affect my perception of a speaker’s tonal neutrality. But that doesn’t make the measurements any less valuable, especially when there’s an audible flaw I need to explain or verify.
In conclusion, working in audio production—even on a silly podcast—has changed the way I think about audio, and that’s necessarily going to change the way I write about audio. I honestly kinda think all audiophiles should do this. Go buy a copy of Reaper or some other DAW, or just use Audacity if you want a free solution. Go buy a halfway decent microphone and record yourself strumming a guitar or banging on the walls or just hitting a wooden block with a spoon. Or record a more musically inclined friend playing some groovy tunes. Then play with your recordings. Just play. Compress them. EQ them. Mix them. Rough them up with harmonic distortion and add some noise and hiss.
I don’t know what lessons you’ll take from it, but I guarantee it’ll change the way you think about audio.
. . . Dennis Burger