Sample Rate and Bit Depth

Quick start

Up to a point, the better you understand sample rate and bit depth and how they affect quality in a digital audio system, the better your final product will be. A good understanding can inform future equipment and software purchases, help you decide what quality to work at during each stage of a project, from initial concept to distribution-ready product, and help you avoid converting back and forth between sample rates and bit depths more often than necessary.

That said, this understanding is only useful up to a point. The law of diminishing returns comes into play as you get deeper into the numbers, and people can tie themselves in knots over processing choices whose results are contested even among audio professionals. Here are the essentials that you absolutely must understand, followed by a more in-depth explanation of the whys and wherefores.

The higher the sample rate, the wider the range of frequencies that audio sampled at that rate can contain. Note that you do not improve the quality of a piece of audio just by increasing its sample rate. It has already been recorded and the chance to capture those higher frequencies is gone. In fact, you may worsen the audio quality. However, processing audio at a higher sample rate during the mixing, post-production and mastering stages may make a higher quality end result possible.
Sampling at too low a rate can cause aliasing, an unpleasant form of digital distortion.
You must sample at over twice the highest frequency component that you want to capture in order to capture it accurately.
The higher the bit depth, the greater the dynamic range and the fewer and less significant the rounding errors in digital processing. Mix and process your audio at as high a bit depth as possible, regardless of what it was recorded in. As with sample rate, converting to a higher bit depth won't improve the dynamic range of your source, but it will help you preserve the dynamic range of what you have.
Too low a bit depth can cause quantization distortion. Always dither when converting to a lower bit depth to mask this.
CD quality is 44.1 kHz 16-bit.
DVD quality is 48 kHz 24-bit.
Sound effects and royalty-free music for use in productions are sold at a variety of bit depths and sample rates, but 48 kHz 24-bit is fast becoming the standard so that they are compatible with film.

Analogue and Digital

To all practical purposes, the real world is analogue. A sound wave is a vibrating stream of air molecules that moves smoothly back and forth. The frequency of these vibrations determines the pitch of the sound. The violence of these vibrations determines the sound's amplitude. There are no breaks in this sound wave.

When the sound wave causes the diaphragm of a microphone to vibrate, the vibrations are turned into an electrical signal with a certain voltage. All these quantities, the sound's rate of vibration, the violence of its vibration and the voltage it induces, are what is called continuous. That is to say, there is a virtually infinite number of different possible values these quantities may have. A sound wave may vibrate at a frequency of 20 Hz or it may vibrate at a frequency of 20.0000001 Hz, with a pressure level of 50 millipascals or 50.000000001 millipascals. These quantities may also vary by equally tiny amounts continuously. Analogue is a virtually infinite number of shades of grey.

Computers are digital systems storing information in binary code, which is made up of 0s and 1s. Zero means off, 1 means on. There is nothing continuous about that. There is no maybe in a digital system, no half on or almost off. It's all very black and white.

A computer must use groups of 0s and 1s, with a certain number of 0s and 1s per group, to make a model of the analogue world. The trick to making that model convincing is for the computer to take a snapshot or sample of whatever it is trying to model, the sound wave in this case, fast enough not to miss any discernible variations. Each sample is one group of 0s and 1s. Each sample must have enough places for zeros or 1s (called bits) to produce numerical representations of the breadth of values between which the quantity being modeled ranges. A full explanation of binary notation is beyond the scope of this article, but suffice it to say that, for every extra place you allow a computer to store either a 0 or a 1, an on or an off, the number of possible values that can be resolved doubles.
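The doubling described above can be checked with a few lines of Python. This is only an illustrative sketch; the function name levels_for_bits is made up for this example.

```python
def levels_for_bits(bits):
    """Number of distinct values a group of `bits` binary digits can represent."""
    return 2 ** bits

# Each extra bit doubles the count of values that can be resolved.
for bits in (1, 2, 3, 8, 16, 24):
    print(f"{bits:>2} bits -> {levels_for_bits(bits):,} possible values")
```

An 8-bit group gives 256 values and a 16-bit group gives 65,536.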

Sample Rate

The analogue to digital converter in an audio device measures the amplitude of the incoming audio at regular intervals and assigns it a value. Each measurement is a sample. Each sample is a point plotted on the graph that makes up the sound wave. From these points, the number of complete cycles of the sound wave per second, which is the frequency, can be calculated.
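As a rough sketch of what taking samples means, the following Python measures an idealised sine wave at regular intervals. The function name, the 1 kHz tone and the 8 kHz sample rate are assumptions chosen to keep the numbers small; a real converter measures a voltage rather than evaluating a formula.

```python
import math

def sample_sine(freq_hz, sample_rate_hz, count):
    """Return `count` amplitude measurements of a sine wave, one per sample period."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate_hz) for n in range(count)]

# One complete cycle of a 1 kHz tone sampled at 8 kHz takes exactly 8 samples.
for n, amplitude in enumerate(sample_sine(1000, 8000, 8)):
    print(f"sample {n}: {amplitude:+.3f}")
```

Each printed value is one plotted point on the wave; the peak falls on sample 2 and the trough on sample 6.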

Nyquist

The higher the sound wave's frequency, the faster its amplitude will vary and the faster the analogue to digital converter will have to sample it to keep up. Like frequency, sample rate is measured in Hertz.

Nyquist's Theorem states: whatever the highest frequency of a sound you wish to capture, you must sample at more than double that frequency.

This means that if the highest frequency you want to capture is 20000 Hertz, you would theoretically need to sample at just over 40000 Hertz to capture all the fine detail required.

Half the Story

We've established that, in order to capture all the fine detail of a 20 kilohertz sine wave, we would need to sample at slightly more than twice that, 40 kilohertz. However, what if the sound being sampled contains frequencies above that 20 kilohertz limit? Are they missed out altogether? Unfortunately not. Their presence will be recorded, but inaccurately. The samples are too far apart to follow the wave's variations, so the sound that is too high appears to jump from one amplitude to another rather than making a smooth transition, and it is stored as a spurious lower frequency that was never in the original. The audible effect of this is called aliasing. A harsh ringing quality can be imparted to the audio.

Aliasing does not just occur in audio systems. In the days of slower video cameras, it was possible to film a spinning wheel and for that wheel to appear to be spinning the wrong way, because the camera's frame rate was too slow for the speed at which the wheel was revolving. This is exactly like what happens when a system has to deal with frequencies for which the sample rate is too low.
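To put a number on this, a pure tone above half the sample rate is stored as a lower tone, folded back around that halfway point. The sketch below assumes no filtering at all before sampling; the function name alias_frequency and the example frequencies are chosen purely for illustration.

```python
import math

def alias_frequency(freq_hz, sample_rate_hz):
    """Frequency a pure tone appears to have after sampling without any filtering."""
    folded = freq_hz % sample_rate_hz
    if folded > sample_rate_hz / 2:
        folded = sample_rate_hz - folded
    return folded

print(alias_frequency(15000, 40000))  # below half the sample rate, so unchanged
print(alias_frequency(25000, 40000))  # above it, so folded down to 15000

# The samples themselves are indistinguishable: a 25 kHz cosine sampled at
# 40 kHz produces the same numbers as a 15 kHz cosine.
rate = 40000
high = [math.cos(2 * math.pi * 25000 * n / rate) for n in range(16)]
low = [math.cos(2 * math.pi * 15000 * n / rate) for n in range(16)]
print(all(math.isclose(h, l, abs_tol=1e-9) for h, l in zip(high, low)))
```

Once the two sets of samples are identical, nothing downstream can tell the false 15 kHz tone from a real one, which is why the problem has to be dealt with before sampling.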

The solution

The solution is simply to record more than you need. If we want to capture everything within the range of human hearing accurately and keep it unadulterated by aliasing, we must capture frequencies higher than those we need and then filter them out using an anti-aliasing filter. This is why the standard sample rate for commercially distributed music is 44100 Hertz rather than 40000. It means that we can accurately sample frequencies up to 22050 Hertz, since 22050 is half of 44100.

There may be frequencies in our incoming audio above even this value, which would be inaccurately captured. However, the aliasing problems these would create can be prevented with an anti-aliasing filter, applied before the audio is sampled, that does not allow anything above 20000 Hertz through at all, leaving our audio theoretically pristine.

Bit Depth

Bit depth is the number of bits in a given sample. The number of bits determines the number and range of possible amplitude values that can be recorded.

Think of our audio like a mine-shaft for a moment. Bit depth is the rope ladder you have brought with you to climb down it and explore. The number of bits dictates the difference between the maximum amplitude that can be recorded, 0 dBFS, at the top of the shaft, and the quietest possible volume that can be recorded. Bit depth, then, dictates the length of your ladder. If your ladder isn't long enough to reach the bottom of the shaft, too bad. You never get to find out what is below the bottommost rung because you can't get down any further, i.e. any sounds that are quieter than the lowest amplitude that can be recorded are snipped off.

To find out the theoretical dynamic range that can be resolved for a given bit depth, multiply the number of bits by six. The lowest amplitude that can be recorded in an 8-bit file is about -48 dB. The lowest amplitude that can be recorded in a 16-bit file is about -96 dB.
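The multiply-by-six rule is a rounding of 20 times the base-10 logarithm of 2, which is about 6.02 dB per bit. A small sketch, with a function name chosen only for illustration:

```python
import math

def dynamic_range_db(bits):
    """Theoretical dynamic range, in dB, of integer samples with `bits` bits."""
    return 20 * math.log10(2 ** bits)

for bits in (8, 16, 24):
    print(f"{bits}-bit: {dynamic_range_db(bits):.1f} dB (rule of thumb: {bits * 6} dB)")
```

The exact figures come out slightly above the rule of thumb, roughly 48.2 dB for 8-bit and 96.3 dB for 16-bit.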

Bit-depth doesn't just dictate the length of the ladder though, it also dictates the number of rungs it has and therefore the smoothness of your climb.

Using binary notation, the bits in a given sample make up an integer, i.e. a whole number. As mentioned earlier, the more bits per sample, the greater the number of values we can make.

An 8-bit sound can have 256 different amplitudes. But what if the analogue sound is at an amplitude that falls some way between two of these values? The amplitude must be rounded to the nearest resolvable value. Lots of variation in our original sound will be lost between the widely spaced rungs of our ladder, even if it is long enough to reach the bottom of the mine-shaft.
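The rounding can be sketched as follows. The code assumes amplitudes are expressed as fractions of full scale between -1 and 1 and stored as signed integers, which is a common convention but an assumption here; the function name quantize is hypothetical.

```python
def quantize(sample, bits):
    """Round an amplitude in the range -1..1 to the nearest level of a signed `bits`-bit format."""
    scale = 2 ** (bits - 1)                    # 128 steps each side of zero for 8-bit
    code = round(sample * scale)               # nearest whole-number rung
    code = max(-scale, min(scale - 1, code))   # stay inside the available range
    return code / scale

print(quantize(0.3, 8))     # lands on the nearest 8-bit rung, slightly off 0.3
print(quantize(0.001, 8))   # too quiet for 8-bit: rounds to silence
print(quantize(0.001, 16))  # 16-bit has rungs fine enough to keep it
```

The second line shows the ladder running out: a very quiet sound that 16-bit can still represent simply becomes zero at 8-bit.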

A Practical Example

What would the result be if we recorded a classical concert at 8-bit? Classical pieces have a wide dynamic range. If we want to capture the loudest instruments without clipping and we only have 48 dB of dynamic range to play with, we are going to lose little bits of the quieter passages, maybe even whole sections. The dynamic range will be truncated. Also, the tremendous range of different amplitudes encompassed by a complex piece of music, with all its crescendos etc., must be simplified to just 256 different values. This rounding is called quantization.

What will the recording sound like? It will contain quantization distortion, i.e. the music may sound gritty, harsh and crunchy, and will lack the warmth of the original performance.

The solution

The solution, believe it or not, is to take the bit of our sample that represents the quietest sounds, the least significant bit (LSB), and pick whether it will be a 1 or a 0 at random. The audible result of this is to add hiss at a very low level to the audio. This is like inserting a false bottom into our mine-shaft when we drill it, before we lower the ladder. We create a noise floor so that nothing can go below it, beyond the dynamic range allowed by the bit depth.

Adding this hiss is called dither. There are various different types of dither and you can make it even less audible with noise-shaping. A discussion of the relative merits of different dither types is beyond the scope of this article and, compared to other aspects of audio, not very important.
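A rough sketch of why the hiss helps. Rather than literally randomising the last bit, this example adds a little random noise just before rounding, using a triangular distribution, which is one commonly used kind of dither; that choice, the function name and the test signal are all assumptions made for illustration, not something this article prescribes.

```python
import random

def quantize_codes(samples, dither=False, rng=None):
    """Round samples (expressed in units of one LSB) to whole-number codes."""
    rng = rng or random.Random(0)  # fixed seed so the example is repeatable
    codes = []
    for s in samples:
        if dither:
            # Triangular noise spanning +/-1 LSB: the sum of two uniform values.
            s += rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)
        codes.append(round(s))
    return codes

# A steady signal sitting at 0.3 of one LSB, too quiet to reach the first rung.
quiet = [0.3] * 20000
plain = quantize_codes(quiet)
dithered = quantize_codes(quiet, dither=True)

print(sum(plain) / len(plain))        # 0.0: the signal has vanished
print(sum(dithered) / len(dithered))  # close to 0.3: preserved on average, under hiss
```

Without dither every sample rounds to zero and the signal is gone. With dither the individual samples are noisy, but their average still carries the original level.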

CD Quality

CD quality audio is 16-bit, which means it has a dynamic range of 96 dB. This doesn't cover the whole range of auditory perception, which is estimated at approximately 130 dB. However, in order to perceive the full dynamic range of a CD, you would have to have the average stereo system at 75 % volume, which would not be a safe listening level. This standard is a trade-off between file size and audio quality.

Why Go Any Higher

We have established that the CD quality standard, 44.1 kHz 16-bit audio, covers the entire range of human hearing and most of the range of auditory perception in terms of volume. Given that uncompressed WAV files are large enough at this format as it is, and given that consumers are more than happy to listen to lower quality, lossily compressed files, why would we ever want to bother with higher qualities? There are several answers to this.
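To give a sense of how large uncompressed audio is, the data rate is simply sample rate times bytes per sample times number of channels. The sketch below assumes two-channel stereo and ignores the small file header; the function name is hypothetical.

```python
def bytes_per_second(sample_rate_hz, bit_depth, channels=2):
    """Raw data rate of uncompressed audio, assuming stereo unless told otherwise."""
    return sample_rate_hz * (bit_depth // 8) * channels

for label, rate, bits in (("CD quality", 44100, 16), ("48 kHz 24-bit", 48000, 24)):
    per_minute_mb = bytes_per_second(rate, bits) * 60 / 1_000_000
    print(f"{label}: {bytes_per_second(rate, bits):,} bytes/s, about {per_minute_mb:.1f} MB per minute")
```

CD quality stereo works out at 176,400 bytes every second, or roughly 10.6 MB per minute.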

Hearing What You Can't Hear

While most consumers struggle to tell if a sound changes by as much as a decibel and do not possess ears with the full theoretical frequency response, we are referring here to conscious perception. Many argue that, while a consumer wouldn't be able to explain how CD quality audio sounded different to the real thing, the quality loss inherent in the conversion from analogue to digital and back again is still discernible.

Intermodulation Distortion

There has never been any evidence that humans can hear anything above around 20 kHz. Even to hear sounds at that frequency, in some cases, the volume of the source has to be turned up loud enough to be close to the threshold of pain. However, sounds and harmonics of sounds above that frequency still exist and evidence suggests that we can hear them, albeit indirectly.

A complex phenomenon called intermodulation distortion forms the bridge between sounds we can hear and sounds we cannot. Intermodulation distortion is the complex interaction between sounds that produces new frequency components, at the sums and differences of the original frequencies. It is called distortion because it is a change from how sounds in their purest form would sound, but we never hear them that way. This distortion is how we hear real life every day.

If we cut frequencies above 20 kHz from our audio, they aren't there to interact with everything else and give sounds we can hear that familiar coloration. Audio enthusiasts argue that we should make audio available at higher sample rates so that this distortion is preserved.

The counter-argument to this is that digital to analogue converters, the hardware responsible for converting digitally stored data back into the voltage that will make our speakers work, are often not good enough to reproduce these ultrasonic frequencies well, and that the results are not true to real life and can cause undesirable harshness.

Keep Out the Chill

Even though 16-bit files can produce so many different amplitudes across their 96 dB of dynamic range that the gap between each level is tiny, audio enthusiasts claim that these gaps between levels and the rounding errors they necessitate, which aren't present in analogue systems, rob audio of some of its warmth and clarity. Whether this is indeed true is the subject of hot debate. What is certain, however, is that one would need good equipment and good ears to tell the difference between 16-bit and 24-bit audio. The amount of difference is also likely to depend on the type of audio being compared. You are more likely to be able to tell the difference when listening to a piece of classical music with a wide dynamic range than with a hip-hop track where the levels are almost constantly maxed.

Behind the scenes

Where higher quality settings really come in handy is at the audio production stage, rather than at the distribution stage.

Every time you apply a process to a piece of audio you change it, often in unpredictable ways.

If, for example, you raise the pitch of audio by an octave and you are dealing with audio at 44.1 kHz, you will run into the Nyquist frequency pretty quickly, which will mean a high risk of aliasing. Likewise, many effects will change the audio such that higher frequency content will be added, which a sample rate of 44.1 kHz may not be able to accommodate.

Conversely, if you wish to lower the pitch of audio recorded at 44.1 kHz, it will quickly sound dull, that is to say all the brightness and sparkle imparted by the high frequencies will diminish as the pitch is decreased. Recording at higher sample rates means that, assuming your equipment is good enough, ultrasonic content will be captured, which will be shifted into the audible range and keep the audio sounding brighter as the pitch decreases.

Recording at 24-bit means that the recordist is walking less of a tightrope between the risk of clipping and running into the digital noise floor. If they record at 16-bit with their levels too low, the very quietest sounds on the recording may be truncated. If they record with the levels as high as possible for the optimum signal to noise ratio, an unexpectedly loud sound may cause clipping, the damage from which is very difficult to reverse. Recording at 24-bit gives the recordist more wiggle room.
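The dulling effect can be estimated with simple arithmetic. The sketch below assumes the simplest kind of pitch change, where playback is slowed so that every frequency drops by the same ratio, and that the recording contains content right up to half its sample rate; the function name is made up for this example.

```python
def highest_frequency_after_shift(sample_rate_hz, octaves_down):
    """Top of the frequency content after lowering pitch by slowing playback."""
    nyquist = sample_rate_hz / 2
    return nyquist / (2 ** octaves_down)

for rate in (44100, 96000):
    top = highest_frequency_after_shift(rate, 1)
    print(f"{rate} Hz recording, one octave down: content up to {top:.0f} Hz")
```

Dropping a 44.1 kHz recording by one octave leaves nothing above 11025 Hz, whereas a 96 kHz recording still reaches 24000 Hz, beyond the top of the audible range.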

Mixing with more than 16 bits cuts down the cumulative rounding error. Every time you perform an operation on a piece of audio that changes its volume, there is a high probability that rounding errors will occur. The best method of explaining this that I've ever seen is provided by Bob Katz.

Instead of losing you with esoteric concepts like 2's complement notation, fixed versus floating point, and other digital details, I'm going to talk about digital dollars. Suppose that the value of your first digital audio sample was expressed in dollars instead of volts, for example, a dollar 51 cents--$1.51. And suppose you wanted to take it down (attenuate it) by 6 dB. If you do this wrong, you'll lose more than money, by the way. 6 dB is half the original value (it has to do with logarithms; don't worry about it). So, to attenuate our $1.51 sample, we divide it by 2.

Oops! $1.51 divided by 2 equals 75-1/2 cents, or .755. So, we've just gained an extra decimal place. What should we do with it, anyway? It turns out that dealing with extra places is what good digital audio is all about. If we just drop the extra five, we've theoretically only lost half a penny--but you have to realize that half a penny contains a great deal of the natural ambience, reverberation, decay, warmth, and stereo separation that was present in the original $1.51 sample! Lose the half penny, and there goes your sound. The dilemma of digital audio is that most calculations result in a longer word length than you started with. Getting more decimal places in our digital dollars is analogous to having more bits in our digital words. When a gain calculation is performed, the word length can increase infinitely, depending on the precision we use in the calculation. A 1 dB gain boost involves multiplying by 1.122018454 (to 9 place accuracy). Multiply $1.51 by 1.122018454, and you get $1.694247866 (try it on your calculator). Every extra decimal place may seem insignificant to you, until you realize that DSPs require repeated calculations to perform filtering, equalization, and compression. One dB up here, one dB down here, up and down a few times, and the end number may not resemble the right product at all, unless adequate precision is maintained. Remember, the more precision, the cleaner your digital audio will sound in the end (up to a reasonable limit).

It is far better to mix at high bit depths and then convert to a lower bit-depth right at the end, with dither, to preserve as much of the original audio as possible.
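The effect of rounding between processing steps can be sketched with whole-number sample values. The example turns a sample down by 30 dB and back up again, once rounding to a whole number after every step and once keeping full precision throughout. The starting value of 151 echoes the $1.51 above; the function name and the 30 dB figure are assumptions chosen to make the error obvious.

```python
def apply_gains(code, gains_db, round_each_step):
    """Apply a series of gain changes (in dB) to one integer sample value."""
    value = float(code)
    for db in gains_db:
        value *= 10 ** (db / 20)           # convert decibels to a multiplier
        if round_each_step:
            value = float(round(value))    # store back into a whole-number sample
    return value

moves = [-30.0, 30.0]  # turn down by 30 dB, then back up by 30 dB
print(apply_gains(151, moves, round_each_step=True))   # 158.0: the detail is gone
print(apply_gains(151, moves, round_each_step=False))  # very close to 151
```

With rounding at each step, the sample comes back as 158 instead of 151, because the intermediate value of about 4.775 had to be stored as 5. Keeping the extra precision until the end avoids that loss.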

Under Your Mixer's Hood

Once you've done your part and given your Digital Audio Workstation (DAW) the most pristine audio that is possible or practical, it's then the job of the DAW to handle it with great care. For this reason, Reaper, for example, mixes at 64-bit floating point. For a discussion of the differences between floating point and the fixed integer bit depths used by digital recorders and sound cards, which we have been discussing here, have a look at this article, which explains the concepts pretty well.

Audio Bit Depth

Conclusion

If you keep your audio quality as high as possible for as long as possible, making as few conversions as possible, subject to the limitations of your equipment and the capacity of your storage media, you will have grasped the essence of managing digital audio in such a way that it should reach the end of its journey pretty much unscathed.
