And how it affects your audio quality.
In the audiophile industry, there is an endless list of topics that spark debate. Contentious topics like expensive cables and high-resolution (hi-res) audio are some that especially rile up the community.
By the usual definition, any music file recorded with a sample rate and bit depth higher than 44.1kHz/16-bit counts as hi-res audio, also called high definition (HD) audio.
In this article, we will cover the fundamentals of sample rate and bit depth along with their impact on perceived audio quality.
We will also touch on another concept: bit rate. Bit rate, or bitrate, is commonly used to describe audio stream quality for music streaming services.
When sound is produced, it creates a pressure wave that propagates through the air. If the diaphragm of a recording device, such as a microphone, is nearby, the pressure waves in the air create a vibration in the diaphragm. Through the magic of transducers, this vibration, in turn, creates an electrical signal that varies continuously with the waves in the air.
This continuous and proportionate variation is where the term “analog” comes from.
The signal created by the diaphragm is often not strong enough on its own. Typically, a preamplifier first boosts the signal so that it can be recorded in a number of ways.
Throughout history, various materials have been used to record and store analog signals. This includes wax, vinyl disks, and magnetic tapes. Eventually, digital records were introduced and became commonplace.
Digital systems (ones and zeroes) record analog signals (continuously variable values) by sampling them.
By grabbing enough samples of an incoming analog signal and saving them into memory, digital recordings are able to capture and later reproduce that signal.
A typical digital audio recording has as many as 44,100 samples every second. However, it is not unusual to see 96,000 samples a second with some digital audio formats.
There are several types of sampling methods but Pulse Code Modulation (PCM) is the de facto standard.
PCM serves as the industry standard for storing analog waves in a digital format. In a PCM stream, the amplitude of the audio is sampled at a uniform interval. PCM is non-proprietary so anyone can use it for free!
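To make "sampled at a uniform interval" concrete, here is a minimal sketch that samples a pure sine tone the way a PCM recorder would, before any rounding to a bit depth. The function name `sample_sine` and the 1kHz test tone are illustrative assumptions, not part of any standard.

```python
import math

def sample_sine(freq_hz, sample_rate_hz, duration_s, amplitude=1.0):
    """Return amplitude samples of a sine tone taken at a uniform interval."""
    num_samples = int(round(sample_rate_hz * duration_s))
    interval_s = 1.0 / sample_rate_hz  # time between consecutive samples
    return [
        amplitude * math.sin(2 * math.pi * freq_hz * n * interval_s)
        for n in range(num_samples)
    ]

# One millisecond of a 1kHz tone at the CD sample rate.
samples = sample_sine(freq_hz=1000, sample_rate_hz=44100, duration_s=0.001)
print(len(samples))  # 44
```

Each value in `samples` is one snapshot of the wave's amplitude; a real recorder would then round each snapshot to the nearest level allowed by the bit depth.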
However, it is uncommon to find audio in PCM format due to two reasons:
- File size
- Playback compatibility
As PCM is uncompressed, the file size of the recorded audio is massive. It is possible to compress audio files using lossy or even lossless compression algorithms to retain most or all of the fidelity of the audio while reducing the file size.
Dolby Digital and DTS are lossy compression formats that are often used for this purpose, as they’re capable of reducing PCM audio file sizes by as much as 90%.
Unfortunately, the way that Dolby and DTS encode PCM channels into a bitstream for storage and then decode it back for playback is not perfect. The resulting audio, though smaller in file size, isn’t always as clean and crisp as the original, resulting in a drop-off in accuracy and quality.
This is where lossless formats such as Dolby TrueHD and DTS-HD Master Audio come in. When decoded, they reproduce the PCM audio signals exactly as they were originally captured.
Unfortunately, popular operating systems (OS) do not support the playback of raw PCM files natively. IBM and Microsoft defined the Waveform Audio File Format (WAV) for Windows, while Apple used the Audio Interchange File Format (AIFF) for the Macintosh OS. Both formats are just a wrapper around the PCM audio data, with additional information such as the author and the title of the track.
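The "wrapper" idea can be seen with Python's standard `wave` module. This is a small sketch, assuming signed 16-bit mono samples; the helper name `pcm_to_wav_bytes` and the four sample values are made up for illustration.

```python
import io
import struct
import wave

def pcm_to_wav_bytes(samples, sample_rate_hz=44100, channels=1):
    """Wrap signed 16-bit PCM sample values in a WAV container, in memory."""
    # Raw PCM payload: little-endian signed 16-bit integers.
    pcm_bytes = struct.pack("<%dh" % len(samples), *samples)
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit depth
        wav_file.setframerate(sample_rate_hz)
        wav_file.writeframes(pcm_bytes)
    return pcm_bytes, buffer.getvalue()

pcm, wav = pcm_to_wav_bytes([0, 1000, -1000, 32767])
print(wav[:4], wav[8:12])   # container identifiers
print(wav.endswith(pcm))    # the PCM payload is stored unchanged at the end
```

The header in front of the payload is what tells a player the sample rate, bit depth, and channel count, which raw PCM bytes cannot communicate on their own.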
The fidelity/quality of a PCM stream is represented by two attributes:
- Sample Rate
- Bit Depth
These two attributes indicate how accurate the digital recording is to the original analog signal.
Think back to animated films from a couple of decades ago.
Films were just slides of still images being shown one after another to create the illusion of movement. The speed of the transition determined how smooth the resulting animation was. The faster the transition, the better the illusion of animation.
The speed of the changing slides is just like framerate when it comes to modern video.
In digital audio recordings, sample rate is analogous to the framerate in video. The more sound data (samples) gathered per period of time, the closer to the original analog sound the captured data becomes.
In a typical digital audio CD recording, the sampling rate is 44,100 Hz, or 44.1kHz. You may wonder why the rate is so high when the human ear can only hear frequencies up to 20kHz at best. It’s because of the Nyquist-Shannon sampling theorem.
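Commonly referred to as the Nyquist theorem, this states that to prevent any loss of information when digitally sampling a signal, you have to sample at a rate of at least twice the highest expected signal frequency. Half the sample rate is known as the Nyquist frequency. At 44.1kHz that is 22.05kHz, which sits just above the 20kHz limit of human hearing.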
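What happens when the rule is broken can be sketched in a few lines. The function below is an illustrative helper (not from any library) that computes the frequency a pure tone appears to have once it has been sampled, assuming no anti-aliasing filter is applied first.

```python
def alias_frequency(signal_hz, sample_rate_hz):
    """Apparent frequency of a sampled pure tone (no anti-aliasing filter)."""
    nyquist_hz = sample_rate_hz / 2
    # Sampling cannot tell a tone at f apart from one at f + k * sample_rate.
    folded = signal_hz % sample_rate_hz
    # Anything above the Nyquist frequency mirrors back below it.
    return folded if folded <= nyquist_hz else sample_rate_hz - folded

print(alias_frequency(20000, 44100))  # 20000: below the limit, preserved
print(alias_frequency(30000, 44100))  # 14100: folds back into the audible band
```

A 30kHz tone is inaudible on its own, but sampled at 44.1kHz it would show up as a false 14.1kHz tone, which is why recorders filter out content above the Nyquist frequency before sampling.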
Other examples of common sampling rates are 8,000 Hz in telephones and anywhere between 96,000 Hz and 192,000 Hz for Blu-ray audio tracks. A sample rate of 384,000 Hz is also used in certain special situations, like when recording animals that produce ultrasonic sound.
Computers store information as 1s and 0s. Those binary values are called bits. The higher the number of bits, the more space there is for storing information.
When a signal is sampled, the sampled audio information needs to be stored in bits. This is where bit depth comes into play. The bit depth determines how much information can be stored for each sample. A sample with 24-bit depth can store more nuance, and is hence more precise, than a sample with 16-bit depth.
To be more explicit, let’s see what is the maximum number of values each bit depth can store.
- 16-bit: We are able to store up to 65,536 levels of information
- 24-bit: We are able to store up to 16,777,216 levels of information
You can see the huge difference in the number of possible values between the two bit depths.
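The level counts above are simply 2 raised to the bit depth. The sketch below also shows what the extra levels buy: a smaller rounding error per sample. The `quantize` helper and the test value are illustrative assumptions, and the rounding scheme is a simplified one rather than what any particular converter does.

```python
def quantization_levels(bit_depth):
    """Number of distinct values a sample of this bit depth can take."""
    return 2 ** bit_depth

def quantize(value, bit_depth):
    """Round a value in [-1.0, 1.0] to the nearest step and return it as a float."""
    max_code = 2 ** (bit_depth - 1) - 1  # largest positive signed integer code
    code = round(value * max_code)
    return code / max_code

value = 0.123456789  # arbitrary test amplitude
for bits in (16, 24):
    error = abs(quantize(value, bits) - value)
    print(bits, quantization_levels(bits), error)
```

Running it shows the 24-bit rounding error is far smaller than the 16-bit one, although both are tiny relative to full scale.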
Another important factor that bit depth affects is the dynamic range of a signal. 16-bit digital audio has a maximum dynamic range of 96dB, while 24-bit audio gives us a maximum of 144dB.
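Those two figures follow from the ratio between the largest and smallest representable level, expressed in decibels, which works out to roughly 6dB per bit. A minimal sketch, with an illustrative function name:

```python
import math

def dynamic_range_db(bit_depth):
    """Theoretical ratio of largest to smallest level, in decibels."""
    return 20 * math.log10(2 ** bit_depth)

print(round(dynamic_range_db(16), 1))  # 96.3
print(round(dynamic_range_db(24), 1))  # 144.5
```

This is the theoretical ceiling for an ideal converter; real hardware adds its own noise and lands below it.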
CD quality audio is recorded at 16-bit depth because, in general, we only want to deal with sound that’s loud enough for us to hear but, at the same time, not loud enough to damage equipment or eardrums.
A bit depth of 16-bit for a sample rate of 44.1kHz is enough to reproduce the audible frequency and dynamic range for the average person, which is why it became the standard CD format.
Although there are no limits to sample rate and bit depth, 192kHz/24-bit is the gold standard for hi-res audio. (There are already manufacturers touting the 32-bit depth capability, eeks!) We will use 192kHz/24-bit as the reference for the pinnacle of recording fidelity.
So when is such fidelity required?
We know that the higher the sample rate and bit depth, the more similar our digital signal will be to the original analog signal. But it also gives us extra headroom.
Headroom refers to the difference between the audio signal’s dynamic range and what’s allowed by the bit depth. It’s kind of like driving a truck that’s 3 meters high through an overpass with a vertical clearance of 5 meters. This gives you 2 meters of headroom to work with, just in case you have an unusually tall load to haul.
Sampling in 16-bit gives audio engineers a dynamic range of 96dB to work with. On the other hand, 24-bit ups the dynamic range to as high as 144dB, although, realistically, most audio equipment can only go as high as 125dB.
With the extra headroom, audio engineers can minimize if not eliminate the possibility of excessive noise or clipping, which is when sound waves essentially become flattened and cause audible distortion.
Clipping happens when the incoming electrical signal exceeds the largest value that can be represented numerically. A shallow bit depth leaves less room between the noise floor and that ceiling, which makes clipping harder to avoid.
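Here is a rough sketch of clipping in a 16-bit system. It assumes input samples are floats where ±1.0 is full scale; the helper name and the "too loud" waveform are invented for the example.

```python
INT16_MIN, INT16_MAX = -32768, 32767

def to_int16_with_clipping(samples):
    """Convert floats (full scale = +/-1.0) to 16-bit codes, clipping overshoots."""
    codes = []
    for s in samples:
        code = round(s * INT16_MAX)
        # Values beyond the representable range are pinned to the limit.
        codes.append(max(INT16_MIN, min(INT16_MAX, code)))
    return codes

loud_wave = [0.0, 0.6, 1.2, 1.5, 1.2, 0.6, 0.0]  # peaks above full scale
print(to_int16_with_clipping(loud_wave))
# [0, 19660, 32767, 32767, 32767, 19660, 0]
```

The three peak samples all come out as 32767, so the rounded top of the wave becomes a flat line. That flattening is the distortion you hear.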
As the possible signal range of professional audio equipment is much larger than what the average person can hear, using 24-bit allows audio professionals to cleanly apply the thousands of effects and operations involved in mixing and mastering audio to make it ready for reproduction and distribution.
Other than the potentially redundant headroom, a higher fidelity recording creates a much larger file size.
Just to give you an idea of the difference in file size, let’s try and come up with a hypothetical scenario involving a five-minute uncompressed song.
1) First, calculate the bit rate using the formula: sample rate x bit depth x number of channels.
- 44.1kHz/16-bit: 44,100 x 16 x 2 = 1,411,200 bits per second (1.4Mbps)
- 192kHz/24-bit: 192,000 x 24 x 2 = 9,216,000 bits per second (9.2Mbps)
2) Then multiply the bit rate by the length of the recording in seconds, and divide by 8 to convert bits to bytes.
- 44.1kHz/16-bit: 1.4Mbps x 300s = 420Mb (52.5MB)
- 192kHz/24-bit: 9.2Mbps x 300s = 2,760Mb (345MB)
Audio recorded in 192kHz/24-bit will take up about 6.5x as much file space as the same audio sampled at 44.1kHz/16-bit.
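The two steps above can be written as a short script. The function names are illustrative, and megabytes are taken as 1,000,000 bytes. Because the script uses the unrounded bit rates, its results differ slightly from the rounded figures in the text.

```python
def pcm_bit_rate(sample_rate_hz, bit_depth, channels=2):
    """Bits per second of an uncompressed PCM stream."""
    return sample_rate_hz * bit_depth * channels

def pcm_file_size_mb(sample_rate_hz, bit_depth, seconds, channels=2):
    """Uncompressed size in megabytes (1 MB = 1,000,000 bytes)."""
    total_bits = pcm_bit_rate(sample_rate_hz, bit_depth, channels) * seconds
    return total_bits / 8 / 1_000_000

cd = pcm_file_size_mb(44100, 16, 300)       # five-minute stereo song
hires = pcm_file_size_mb(192000, 24, 300)
print(round(cd, 1), round(hires, 1), round(hires / cd, 2))
# 52.9 345.6 6.53
```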
So when do you need to record in 192kHz/24-bit?
It’s all down to what you want to do with the audio recording. Do you want to manipulate the recording and do you have unlimited memory storage? Then 192kHz/24-bit should be a no-brainer. But if you are intending to stream your music to your listeners, 192kHz/24-bit will suck up your listener’s bandwidth and rack up their internet bill.
Chris Montgomery, a professional audio engineer and the founder of the Xiph.Org Foundation, provides an in-depth and technical explanation of why sampling in 192kHz/24-bit doesn’t necessarily result in a superior listening experience.
He uses a combination of signal processing and how we humans perceive audio to help explain why sampling in 192kHz/24-bit makes no sense, while also giving readers an idea of how to conduct their own listening tests at home to try and verify things on their own.
You can check out the article by Chris.
Our opinion is that the law of diminishing returns applies to sample rate/bit depth. Once you hit a certain threshold, the marginal improvement in sound quality becomes smaller and smaller until it becomes negligible.
Bitrate (or bit rate, if you prefer) refers to the number of bits conveyed or processed per second, or minute, or whatever unit of time is used as measurement.
It’s kind of like the sample rate, except that what’s measured is the number of bits rather than the number of samples.
Bitrate is used more commonly in a playback/streaming context than a recording one.
The term bitrate isn’t exclusive to the audio industry. It is also prevalent in multimedia and networking. However, in music, a higher bitrate is commonly associated with higher quality. This is because each bit in an audio file captures a piece of data we can use to reproduce the original sound.
In essence, the more bits you can fit into a unit of time, the closer it comes to recreating the original continuously variable sound wave, and thus the more accurate it is as a representation of the song.
Unfortunately, a higher bitrate also means a bigger file size, which is a big no-no when storage space and bandwidth are a concern, such as with music streaming services like Apple Music and Spotify.
From the above section, we see that to stream an uncompressed 5-min song recorded in 44.1kHz/16-bit, it will take a bitrate of 1.4Mbps which is a significant amount of bandwidth.
Apple Music and Spotify circumvent this bandwidth issue by compressing the audio. Of course, file compression doesn’t come without consequences. For starters, Spotify limits the bitrate of audio files to 160kbps for desktop users and 96kbps for mobile users. However, premium subscribers have the option to listen to 320kbps audio on a desktop. Meanwhile, Apple Music subscribers are “limited” to a bitrate of 256 kbps.
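To put those bitrates in perspective, this sketch converts a bitrate into data used per hour of listening. The function name is an assumption, the bitrates are the ones quoted above, and megabytes are taken as 1,000,000 bytes.

```python
def megabytes_per_hour(bitrate_kbps):
    """Data consumed by one hour of streaming at a constant bitrate."""
    bits_per_hour = bitrate_kbps * 1000 * 3600
    return bits_per_hour / 8 / 1_000_000

streams = [
    ("Uncompressed CD quality", 1411.2),
    ("320kbps", 320),
    ("256kbps", 256),
    ("160kbps", 160),
    ("96kbps", 96),
]
for label, kbps in streams:
    print(label, round(megabytes_per_hour(kbps)), "MB per hour")
```

An hour of uncompressed CD-quality audio works out to roughly 635MB, against 144MB at 320kbps and 72MB at 160kbps.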
There are also audio streaming services for those who prefer to listen to music with higher bitrates.
Both TIDAL and Qobuz Sublime+ are widely considered the go-to audio streaming services for those who prefer the best audio streaming quality, with Hi-Fi options available for a monthly subscription of $19.99.
TIDAL supports 44.1kHz/16-bit FLAC files that can be streamed at a bitrate of 1411kbps.
Of the two, the TIDAL Hi-Fi subscription offers more value for the money. This is because you gain access to a huge library of high-quality FLAC files, as well as 50,000 master-quality songs compressed using the proprietary Master Quality Authenticated (MQA) technology for better sound quality.
Given our example earlier, a typical five-minute 44.1kHz/16-bit song would have had a file size of 50+ megabytes uncompressed.
The MP3 codec was developed to solve this problem by making it possible to compress CD-quality audio without a significant audible loss of quality. Early MP3 encoders started off with 128kbps or 192kbps before eventually moving on to 320kbps to compete with other codecs. However, in audio streaming, Ogg Vorbis (Spotify) and AAC (Apple Music) are used.
“It’s open source, in the public domain, and delivers high quality relative to the bandwidth required to stream it. We tried out several different file formats and did another shoot out a couple of years ago, and the Ogg Vorbis format came out on top.
The obscurity of the format isn’t so relevant in that users never see the files themselves, so if for some reason another format came into prominence that delivered a better ROI, it isn’t difficult to change to that new format.” – Spotify’s former Vice President, on Ogg Vorbis
Circling back to Chris Montgomery’s explanation, we now know that anything north of 192kbps on a decent encoder doesn’t really matter — the average human ear simply isn’t precise enough to be able to tell the difference.
This means that any music at a bitrate of 192kbps or higher becomes indistinguishable from its original audio analog as long as it was properly encoded in an Ogg, MP3, AAC, or FLAC audio file.
Of course, this doesn’t mean that a high bitrate isn’t useful. It can contribute to a superior listening experience, but only in specific situations, for example, if you have a complete Hi-Fi audio system that can take advantage of the minute improvements in audio quality when streaming Hi-Fi audio files.
In general, the casual listener using the average headphone won’t benefit from streaming audio north of 192kbps.
In summary, sample rate is the number of audio samples recorded per unit of time, and bit depth measures how precisely each sample is encoded. Finally, bit rate is the number of bits recorded or conveyed per unit of time.
That wasn’t so hard now, was it?
Hopefully, we helped clear up some of the mysteries surrounding sample rate, bit depth, and bit rate using our guide.
Going forward, you should now be able to think critically when someone tells you how much “clearer” an audio file sounds based on its encoding process. More importantly, you should now find it easier to find the relevant audio formats and streaming services that meet your auditory needs.