An Audio Quality Primer

December 29, 2020

#culture #philosophy #science #technology

Basics

Sound is a vibration that travels through air as waves, or pulses, of pressure. The number of those waves that pass per second is the sound’s frequency.
Frequency is measured in hertz (Hz): one hertz is one wave per second.
The theoretical limits of human hearing are 20 Hz (20 waves per second) and 20 kHz (20,000 waves per second).
There isn’t much musical information beyond 10-15 kHz.
Most humans can’t hear much beyond 15 kHz anyway, although our theoretical limit is 20 kHz.
When you listen to a live band you’re hearing analog sound, meaning the mechanics of your ear are being directly influenced by the air coming off of instruments and vocal cords.
When you listen through technology there’s a necessary analog to digital conversion that takes place, because computers can only do 1’s and 0’s.
The trick is that when you go from analog to digital, you have many options.
The two main options are how often you take a sample from the analog (real) thing, known as the sample rate, and how much dynamic range is possible within each sample, known as the bit depth.
CD quality is known as 16/44, which is shorthand for 16-bit samples taken at 44.1 kHz, meaning 44,100 samples are taken per second. This level is considered very high quality compared to most music streaming services from before 2018 or so, because most of them were encoding down to much lower quality to avoid buffering over mobile internet connections.
A very high-quality file or stream sits around 24/96 (24-bit samples at 96 kHz), which can be found in offerings like MQA on Tidal.
The extremes here are quality levels like 32/192 and beyond.
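To get a feel for how much data these quality levels involve, here is a rough sketch that computes the raw, uncompressed data rate for each one. The function name and the choice of two channels (stereo) are my assumptions for illustration, not something the shorthand itself specifies.

```python
def pcm_bitrate_kbps(bit_depth: int, sample_rate_hz: int, channels: int = 2) -> float:
    """Uncompressed data rate in kilobits per second (1 kbit = 1000 bits).

    channels=2 is an assumption (stereo); the 16/44 style shorthand
    only names bit depth and sample rate.
    """
    return bit_depth * sample_rate_hz * channels / 1000


cd_kbps = pcm_bitrate_kbps(16, 44_100)
hires_kbps = pcm_bitrate_kbps(24, 96_000)
extreme_kbps = pcm_bitrate_kbps(32, 192_000)

print(f"16/44.1: {cd_kbps:.1f} kbps")
print(f"24/96:   {hires_kbps:.1f} kbps")
print(f"32/192:  {extreme_kbps:.1f} kbps")
```

Under those assumptions, stereo CD audio works out to roughly 1,411 kbps, which is why streaming services historically encoded down to something much smaller.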
You need to sample at a rate of at least twice the highest frequency you want to capture, so to capture 20 kHz you need to sample at 40 kHz or higher, hence 44.1 or 48 kHz.
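The twice-the-frequency rule can be sketched in a few lines. The helper names here are hypothetical, and the second function models an idealized sampler with no anti-aliasing filter, which is a simplification: it shows where a pure tone ends up if it is above half the sample rate.

```python
def min_sample_rate_hz(highest_freq_hz: float) -> float:
    """Lowest sample rate that can still capture highest_freq_hz."""
    return 2 * highest_freq_hz


def alias_frequency_hz(tone_hz: float, sample_rate_hz: float) -> float:
    """Frequency a pure tone appears at after sampling.

    Tones at or below half the sample rate come back unchanged;
    higher tones fold back into the 0..sample_rate/2 band.
    """
    folded = tone_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)


print(min_sample_rate_hz(20_000))          # 40000
print(alias_frequency_hz(20_000, 44_100))  # 20000, captured as-is
print(alias_frequency_hz(30_000, 44_100))  # 14100, folded down into the audible band
```

So a 20 kHz tone survives sampling at 44.1 kHz, while a 30 kHz tone would show up as a false 14.1 kHz tone unless it is filtered out before sampling.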
The bit depth, in turn, determines the dynamic range, which is the distance between the quietest and loudest sound a file can hold.
A 16-bit recording has a dynamic range of about 96 decibels (dB), and a 24-bit file has a dynamic range of about 144 dB.
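Those two figures can be reproduced with the usual rule of thumb that each bit adds about 6 dB. This is a minimal sketch that assumes ideal quantization and ignores things like dither; the function name is mine.

```python
import math


def dynamic_range_db(bit_depth: int) -> float:
    """Ratio between the largest and smallest representable level, in dB.

    A sample with bit_depth bits has 2**bit_depth distinct levels, and
    amplitude ratios convert to decibels as 20 * log10(ratio).
    """
    return 20 * math.log10(2 ** bit_depth)


for bits in (16, 24):
    print(f"{bits}-bit: {dynamic_range_db(bits):.2f} dB")
```

This gives roughly 96.3 dB for 16-bit and 144.5 dB for 24-bit, matching the rounded numbers above.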
Contrary to what many believe, a higher bit depth than 16 bits, or a higher sample rate than 44.1 kHz, does not give you better sound by itself. Those numbers already exceed the capabilities of human hearing, so going beyond them does nothing for the sound quality during playback.
There are production reasons for tracking and mixing at a 24- or 32-bit depth, however, which basically come down to giving yourself room to make mistakes.

Human ear sensitivity is 10⁻¹² W/m²

Decibels are measures of loudness, and they’re logarithmic, meaning they scale by the exponent: every step of 10 dB is a tenfold increase in sound intensity. The name comes from the bel, and a decibel is one tenth (deci) of a bel.
Humans can hear from around 0 dB upward with no fixed ceiling, but the scale is extreme.
A whisper is around 40 dB, a normal voice is around 60, a playground is around 80, 90 is where you start damaging your hearing, a loud concert might be around 100, a plane taking off is around 130, and you can evidently kill someone with around 160 to 180 decibels.
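To show just how extreme the scale is, here is a small sketch that converts decibel levels back into sound intensity, taking the ear sensitivity figure of 1e-12 W/m² as the 0 dB reference. The constant and function names are hypothetical, chosen for this example.

```python
# Reference intensity for 0 dB, taken from the ear sensitivity figure above.
THRESHOLD_W_PER_M2 = 1e-12


def intensity_w_per_m2(level_db: float) -> float:
    """Sound intensity for a given decibel level relative to the threshold."""
    return THRESHOLD_W_PER_M2 * 10 ** (level_db / 10)


def intensity_ratio(db_a: float, db_b: float) -> float:
    """How many times more intense db_a is than db_b."""
    return 10 ** ((db_a - db_b) / 10)


print(intensity_w_per_m2(100))   # loud concert, in W/m^2
print(intensity_ratio(60, 40))   # normal voice vs. whisper
print(intensity_ratio(100, 40))  # loud concert vs. whisper
```

A 20 dB gap means 100 times the intensity, and the 60 dB gap between a whisper and a loud concert means a million times the intensity.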
There is tons of confusion in comparing audio sources, streaming music services, and the many different file types.
Encoding is the process of converting from one format to another, including from analog to digital.
Whenever you encode you have the possibility of losing data, and this is especially true if you’re trying to make a smaller file.
If you’re trying to go from CD quality (16/44) to a smaller file, for example because of limited bandwidth, you need a lossy encoding like mp3, which comes in different levels of quality.
Low-quality mp3 files are very small, but sound worse because they strip away data from the original.
As you move to higher-quality mp3 settings, such as 256 and 320 kbps, you lose less audio quality but end up with larger file sizes.
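The size trade-off is easy to estimate, since file size is just bitrate times duration. In this sketch the four-minute track length, the 128 kbps tier, and the 1,411.2 kbps figure for stereo CD audio are my assumptions for illustration; real files also carry a little container overhead that is ignored here.

```python
def file_size_mb(bitrate_kbps: float, duration_s: float) -> float:
    """Approximate size in megabytes (1 MB = 1,000,000 bytes) at a constant bitrate."""
    bits = bitrate_kbps * 1000 * duration_s
    return bits / 8 / 1_000_000


SONG_SECONDS = 240  # hypothetical four-minute track

encodings = [
    ("mp3 128 kbps", 128),          # assumed low-quality tier
    ("mp3 256 kbps", 256),
    ("mp3 320 kbps", 320),
    ("CD 16/44.1 stereo", 1411.2),  # assumes two channels, uncompressed
]

for label, kbps in encodings:
    print(f"{label:>18}: {file_size_mb(kbps, SONG_SECONDS):6.2f} MB")
```

Under these assumptions the 320 kbps mp3 comes to about 9.6 MB against roughly 42 MB for the uncompressed CD-quality version of the same track.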
The highest quality comes from super high-quality recordings of the analog experience, which happens in the original studio recordings, and quality there can be as high as 32/192 (and higher). But again, these don’t give you better playback by themselves; they just help with production.
It’s good to have this level of original to work with, which you can then encode downwards from for various uses, such as streaming.

If you sample at a little more than double the highest frequency, you capture both the crest and the trough of every wave, so you can recreate the rest of the wave perfectly. There is no need whatsoever to go higher.

When you go down from those super high quality levels to 16-bit, you do actually lose data in the various ranges, especially at higher frequencies.
The part that is controversial is how much REAL WORLD effect the various encodings have on the actual listening experience.
This is why there are so many philosophical debates within the music and audiophile communities, and this is also why I wrote this primer.
The truth is this: there are many variables in play in the equation of listener experience. Here are some of them:
- The quality of the original recording
- The quality of the encoding of the file you’re listening to
- The quality of the equipment you’re listening on and the environment you’re listening in
- Your own personal biases and psychological priming that’s currently in effect as you listen
It’s established science that the human mind can easily be tricked into thinking something is better or worse based on what the person was told beforehand, or what they just experienced right before.
So if you hear a horrible recording encoded into a tiny mp3 file, and go from that to a halfway decent situation, the jump might sound far more dramatic than a jump from decent to extraordinary.
The human brain is a major factor in this equation, and it’s too often discounted as an explanation for differences.

Analysis and takeaways

When you are doing comparisons, as a human, between two different musical audio experiences, you have to consider the full stack of variables.
Are you listening on the same equipment?
Are you listening on the same app?
Do those apps have different EQ settings built in that can radically change the sound?
Do the two apps have different versions of the actual song, i.e., different recordings?
Is one a different mastering of the song than the other?
And finally—are you primed psychologically to hear one thing or the other?
You should always suspect bias in yourself, and look for ways to reduce it.

TL;DR: Optimize your entire chain, and be suspicious if you hear someone say that the difference in an experience comes from bit depths or sample rates above 16/44. It’s probably one or more of the factors above.

Notes

Logarithms take super large or super small numbers and turn them into nice numbers. The base-10 logarithm (log10) of 100,000 is 5 because 100,000 has 5 zeroes. The log10 of one billion is 9 because it has nine zeroes. The cool part about this is that you can deal with massive numbers like 100K and 1 billion by working instead with 5 and 9.
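Here is a quick sketch of that idea using Python’s standard `math.log10`. The last lines show one reason working with the exponents is convenient: multiplying the big numbers is the same as adding their logs.

```python
import math

hundred_thousand = 100_000
one_billion = 1_000_000_000

log_a = math.log10(hundred_thousand)
log_b = math.log10(one_billion)
print(log_a, log_b)

# Multiplying the originals corresponds to adding the logarithms.
log_product = math.log10(hundred_thousand * one_billion)
print(log_product, log_a + log_b)
```

Both values on the last line come out to 14, which is 5 plus 9.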