Sanders Sound Systems - Digital Recording White Paper

Digital Recording White Paper

Most audiophiles do not understand exactly how digital recording works, so allow me to shed some light on the subject. Since I do not know your level of technical knowledge, I will start with the basics. I apologize if some of this is review; please bear with me.

I will be discussing mainly linear PCM (Pulse Code Modulation) recording because this is the format used by CDs. There are other digital systems (like SACD and DSD), but they are not as good as linear PCM and are not generally commercially available, so I will only briefly mention them.

The linear PCM standard used by CDs is specified in detail in the "Red Book" developed by the Sony and Philips engineers when they invented the CD back in the 1980s. I will be referring to this standard often throughout this discussion.

The Red Book engineers wanted to produce a digital recording system that would reproduce music perfectly -- but use the minimum amount of data so that they could maximize the recording time on a CD. They wisely did not compromise performance, but neither did they use more data than was necessary.

Analog recordings of the day were compromised in many ways. Specifically, the frequency bandwidth of LPs and FM multiplex broadcasts was limited to 30 Hz - 15 KHz. The S/N (Signal to Noise ratio) was limited to around 40 dB. Even the best open-reel studio tape decks could barely achieve a bandwidth of 20 Hz - 20 KHz with a S/N of 68 dB. By the late 1970s, better tape oxide formulations got the S/N up to 72 dB. No analog system could capture the full dynamic range of a symphony orchestra. None had a silent background.

Analog tape decks had plenty of flutter caused by imperfect capstan bearings, capstan shafts that weren't perfectly round, and tape scrape flutter (where the tape moves in tiny jerks across the tape head). LPs were plagued with "wow" caused by eccentricity: the center hole was not perfectly concentric with the record grooves. Both wow and flutter are unwanted variations in speed, which show up as variations in the frequency (pitch) of the music.

Tape decks also suffered from amplitude flutter caused by variations in the tape coating thickness. If you recorded a steady-state tone from an audio generator, you would see +/- 2 dB fluctuations in the output from a good tape deck. This flutter can easily be heard on music that has sustained tones. A good example is slow, sustained piano music.

The frequency response of these analog recording systems was not only limited to a portion of the human hearing range, but its linearity was also quite poor. Frequency response variations greater than plus/minus 3 dB were typical.

As if all this weren't bad enough, the THD (Total Harmonic Distortion) and IMD (InterModulation Distortion) of analog recording systems often exceeded several percent. As a result, analog recordings always sounded obviously different from the live microphone feed. Analog tape recorders simply could not capture, store, and reproduce music accurately.

LPs had substantially worse performance than open-reel tape. So by the time you made a recording on analog tape and then transferred it to a vinyl LP, the accumulated errors were severe. As a result, LP playback on an audio system sounded quite different from the original, live performance. A vastly better way to record music was needed. Digital recording was developed precisely to resolve the serious problems and limitations of analog recording.

The Sony and Philips engineers who developed the CD decided to produce a digital system that would solve all these problems. This meant that the S/N of the new system had to be greater than 86 dB. This is the minimum needed to reproduce the full dynamic range of a symphony orchestra, which is about 72 dB, because the noise floor needs to be at least 10 dB below the quietest passages to produce a silent background. So they settled on a very conservative S/N of 96 dB, which was 10 dB better than the minimum required.

The engineers wanted to capture and reproduce the full frequency range of human hearing, so their CD was able to record 20 Hz to 20 KHz. Actually the nature of a CD is such that it will record right down to DC, which is zero Hz. But the highs are limited to the extremes of human hearing at 20 KHz.

Most adults cannot hear 20 KHz. But the Red Book engineers made no compromises, so they pushed the frequency response all the way to 20 KHz. By comparison, analog recordings were limited to 15 KHz.

They insisted on having extremely linear frequency response. Plus/minus 3 dB produces very obvious flaws in the reproduced sound, which simply was not acceptable. So the Red Book frequency response was specified to be better than 0.1% across the entire bandwidth.

The Red Book engineers would not accept any short-term frequency or amplitude variations. They found that the usual wow and flutter errors in analog systems exceeded 2%. These flaws ruined the realism of the sound and could not be accepted. To solve this problem, the engineers replaced mechanical speed control with a quartz clock that times the playback of every sample, which reduces frequency variations to incredibly low levels -- less than 0.001%! Amplitude flutter disappears as well, because the level of each sample is stored as a number rather than depending on the thickness of a tape coating.

Finally, the THD and IM distortion of the system had to be reduced to a tiny fraction of a percent. A typical CD player will have digital distortion well under a thousandth of a percent. The usual limit on distortion will be the inherent distortion in the analog input and output buffer amplifiers, which will be far more than that produced by the digital system.

Now let's look at how a digital system works to solve all the above problems. You have surely noted the two key specifications on a PCM system. They are the sampling rate and the bit depth. Just what do these do and how do they work?

Most audiophiles completely misunderstand how they work. For example, they think that the sampling rate defines the "resolution" of the system. They know that the sampling rate is the number of times per second the wave form is sampled, but they imagine that during playback the wave form is reconstructed only as those discrete points. They further imagine that these points are connected by straight lines that form "stair steps" in the digital wave form.

Now based on this view of digital recording, it is completely understandable that these audiophiles would conclude that a digital wave form is missing information compared to the original analog wave form -- and that the digital wave form is not smooth. They would further assume that a higher sampling rate would provide more detail to the wave form, thereby increasing the "resolution" and accuracy of that wave form.

Their view is total fantasy. Digital systems simply do not work that way.

In particular, the whole purpose of a DAC (Digital to Analog Converter) is to produce a perfectly smooth, complete, and accurate wave form. There are absolutely no "stair steps" in a digital wave form. In fact, a digital recording system produces a far more accurate wave form of the original signal than any analog system can.

If you doubt this, let me point out that if there were "stair steps" in the wave form, the distortion would measure extremely high (higher than 50% THD). But the distortion in digital signals is vastly lower than any analog system, measuring only a couple thousandths of 1% at worst, while analog recording systems measure several percent.

In short, a digital system produces an essentially perfect and complete wave form. There are no steps in it. It is more accurate than the wave form produced by an analog recording system.
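The difference between a smooth reconstruction and "stair steps" can be checked numerically. The minimal Python sketch below (an illustration, not a model of any particular DAC) samples a sine wave at the Red Book rate of 44.1 KHz and rebuilds the wave form halfway between samples using ideal band-limited (sinc) interpolation, which is the operation a DAC's reconstruction filter is designed to approximate. To keep the arithmetic exact with a finite number of samples, it assumes the tone repeats exactly within a hypothetical 10 ms window of 441 samples and uses the periodic form of the sinc kernel; the window length and the function names are illustrative choices. For comparison, it also measures a "stair step" (sample-and-hold) wave form at the same points.

```python
import math

FS = 44100   # Red Book sampling rate, samples per second
N = 441      # hypothetical 10 ms window; an odd length keeps the kernel below exact

def periodic_sinc(u: float, n: int) -> float:
    """Periodic sinc (Dirichlet) kernel for an odd window length n."""
    denom = n * math.sin(math.pi * u / n)
    if abs(denom) < 1e-12:  # u is a multiple of n, where the kernel equals 1
        return 1.0
    return math.sin(math.pi * u) / denom

def reconstruct(samples: list[float], t: float) -> float:
    """Band-limited value of the wave form at fractional sample position t."""
    return sum(x * periodic_sinc(t - k, len(samples)) for k, x in enumerate(samples))

def worst_errors(freq_hz: float) -> tuple[float, float]:
    """Largest error halfway between samples: (smooth reconstruction, stair steps)."""
    samples = [math.sin(2 * math.pi * freq_hz * k / FS) for k in range(N)]
    smooth_err = step_err = 0.0
    for k in range(N):
        t = k + 0.5  # midway between samples, where stair steps would be worst
        exact = math.sin(2 * math.pi * freq_hz * t / FS)
        smooth_err = max(smooth_err, abs(reconstruct(samples, t) - exact))
        step_err = max(step_err, abs(samples[k] - exact))  # hold the previous sample
    return smooth_err, step_err

for f in (1000, 20000):  # both tones complete a whole number of cycles in the window
    smooth, steps = worst_errors(f)
    print(f"{f} Hz: reconstruction error {smooth:.1e}, stair-step error {steps:.3f}")
```

Both test tones fit the window exactly, including 20 KHz at only about 2.2 samples per cycle. The smooth reconstruction matches the original sine to within floating-point rounding, while the stair-step version is badly wrong at 20 KHz.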

So if the sampling rate doesn't determine the "resolution", just what exactly does it do? Before explaining, let me point out that there is no such specification as "resolution" in audio engineering. This is another audiophile myth. So the sampling rate does not define resolution; rather, it defines the highest audio frequency that the system can capture, store, and reproduce.

In a linear PCM system, the sampling rate must be more than twice the highest frequency of interest. Half the sampling rate is called the Nyquist frequency, and it is the highest frequency the system can represent. Since the Red Book engineers wanted to reproduce 20 KHz music, they had to sample the music at more than twice that, or 40 KHz.

You may note that Red Book CD does not sample at 40 KHz. It samples at 44.1 KHz. Why?

A bit of extra bandwidth is required because the digital system must not be fed any frequencies higher than its Nyquist frequency (half the sampling rate). Such frequencies are not simply lost; they fold back ("alias") into the audio band as false tones that were never in the music, which is a gross form of distortion. So all frequencies above 20 KHz must be eliminated by a filter. This is called an "anti-aliasing filter."
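Aliasing is easy to demonstrate. In the Python sketch below (the 25 KHz test tone and the function names are illustrative assumptions), a 25 KHz tone sampled at 44.1 KHz without an anti-aliasing filter produces the same sample values as a 19.1 KHz tone, which lies inside the audio band. Once the samples are stored, nothing can tell the two apart, which is why the offending frequencies must be filtered out before sampling.

```python
import math

FS = 44100  # Red Book sampling rate

def alias_frequency(freq_hz: float, fs: float = FS) -> float:
    """Frequency a tone appears at after sampling at fs with no anti-aliasing filter."""
    folded = freq_hz % fs
    return fs - folded if folded > fs / 2 else folded

def sampled_cosine(freq_hz: float, count: int, fs: float = FS) -> list[float]:
    """The first `count` samples of a cosine tone."""
    return [math.cos(2 * math.pi * freq_hz * k / fs) for k in range(count)]

ultrasonic = 25000.0                  # above the 22.05 KHz Nyquist frequency
ghost = alias_frequency(ultrasonic)   # where it lands: 44100 - 25000 = 19100 Hz
difference = max(abs(a - b) for a, b in zip(sampled_cosine(ultrasonic, 1000),
                                             sampled_cosine(ghost, 1000)))
print(f"{ultrasonic:.0f} Hz is recorded as {ghost:.0f} Hz "
      f"(largest sample difference: {difference:.1e})")
```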

The anti-aliasing filter will require some additional bandwidth in which to operate. By using a digital filter, the Red Book engineers were able to roll off the high frequencies at 96 dB/octave, thereby needing only 4.1 KHz of additional sampling to accommodate it.

Note that this means that in the worst case (20 KHz), the digital system will sample the wave form only about twice per cycle (44,100 / 20,000 = 2.2 samples). So if the audiophile belief that digital systems produce "stair steps" in the wave form were correct, a 20 KHz sine wave would actually be reproduced as something close to a square wave.

But it is easy to see that this is not true. Feed a 20 KHz sine wave into a digital device like a digital signal processor or digital crossover and observe its output on an oscilloscope. These components feed the analog signal through an A/D converter to digitize it, then back out through a DAC to convert it back to analog. So the signal goes through a pair of digital converters. If audiophiles were correct, you would see a square wave at the output of the DAC. Instead, you will see that the output is a perfect sine wave with vanishingly low distortion -- not a square wave.

In short, linear PCM does not have any "missing pieces" or "stair steps" in the wave. The entire purpose of a DAC is to reconstruct the wave completely accurately and with virtually no distortion. They do so magnificently.

So where does upsampling fit into this picture? To begin, let me point out the obvious, which is that one cannot add information or replace "missing pieces" of a wave form after the fact. So upsampling cannot produce accurate musical information where none was originally recorded, any more than one can reconstruct a drop-out on analog tape.

So what is the value of upsampling? Not much actually. But remember the extra sampling needed to produce a sliver of bandwidth for the anti-aliasing filter I mentioned earlier? It was only 4.1 KHz wide and required that a digital "brick wall" filter be used at 96 dB/octave. This conserved data space on the CD.

Some audiophiles believe that such a steep filter degrades the sound -- even though the filter operates in the supersonic range, well above the frequencies that humans can hear. They believe that a more gradual, analog filter will sound better. By upsampling the data stream, they can add as much bandwidth as they want, which lets them use such gentler analog filters.

Can the effect of a supersonic filter, steep or gentle, be heard? Obviously, anything that does not produce frequencies in the range of human hearing cannot be heard. But that doesn't keep some audiophiles from believing that they can hear the effects of supersonic filters. So some CD player manufacturers use upsampling thinking that will please audiophiles.

Now let's look at the word length. Red Book CDs operate using 16 bits. Why? What effect does the number of bits have on the sound?

Simply put, the word length defines the S/N of the system. Each bit is worth about 6 dB of S/N.

As mentioned previously, one needs a S/N of at least 86 dB to produce a silent background. Sixteen bits will produce a S/N of about 96 dB, which is about 10 dB better than required.
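The 6 dB-per-bit rule of thumb comes from the ratio between the largest value a word can hold and a single step of its smallest bit: 20·log10(2^N) dB for N bits. A short Python check, assuming nothing beyond that formula:

```python
import math

def digital_snr_db(bits: int) -> float:
    """Ratio of digital full scale to one quantization step, in dB."""
    return 20 * math.log10(2 ** bits)

for bits in (16, 20, 24):
    snr = digital_snr_db(bits)
    print(f"{bits} bits: {snr:.1f} dB ({snr / bits:.2f} dB per bit)")
```

This gives roughly 96 dB for 16 bits, 120 dB for 20 bits, and 144 dB for 24 bits. When a full-scale sine wave is measured against the quantization noise, textbooks quote the slightly higher figure of 6.02N + 1.76 dB (about 98 dB for 16 bits); the simpler figure is the one used throughout this discussion.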

Actually, in the real world, for many technical reasons including the need for "dither" and the fact that very few analog electronics can produce a S/N of 96 dB, most CD players only produce a S/N of about 92 dB. But this still produces a silent background and full dynamic range and is far better than any analog recording system.
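One of the reasons given above, dither, can be illustrated with a short simulation. The Python sketch below is a hypothetical experiment, not a model of any particular CD player: it quantizes one second of a 997 Hz sine wave to 16-bit integers, once plainly and once with triangular (TPDF) dither of plus/minus 1 LSB added first, and measures the resulting S/N. The test frequency, the dither type, and the fixed random seed are assumptions chosen for the illustration.

```python
import math
import random

FS = 44100         # one second of samples at the Red Book rate
FREQ = 997.0       # prime-numbered test tone, so the samples never fall into a short repeating pattern
AMPLITUDE = 32766  # just under 16-bit full scale so +/-1 LSB of dither cannot overflow

def measured_snr_db(dither: bool, seed: int = 1) -> float:
    """S/N of a sine wave after rounding to 16-bit integers, optionally with TPDF dither."""
    rng = random.Random(seed)
    signal_power = noise_power = 0.0
    for k in range(FS):
        x = AMPLITUDE * math.sin(2 * math.pi * FREQ * k / FS)
        d = rng.triangular(-1.0, 1.0, 0.0) if dither else 0.0  # dither, in LSBs
        q = round(x + d)                 # the 16-bit integer actually stored
        signal_power += x * x
        noise_power += (q - x) ** 2      # total error: quantization plus dither
    return 10 * math.log10(signal_power / noise_power)

print(f"undithered:  {measured_snr_db(False):.1f} dB")
print(f"TPDF dither: {measured_snr_db(True):.1f} dB")
```

The undithered figure lands close to the textbook 98 dB for 16 bits, and adding dither costs roughly 5 dB, bringing the result into the low 90s. Dither is worth that small cost because it turns quantization error, which would otherwise be correlated with the music, into a steady, benign hiss.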

The Red Book engineers picked the number of bits required to achieve a silent background and record the full dynamic range of all music. They did not use any more bits than necessary, nor did they include extra bits that would waste data space. Simply put, 16 bits is the number of bits required to reproduce music with a perfectly silent background. I'm sure you would agree that all properly-recorded CDs have silent backgrounds. You do not hear hiss and noise like you do with analog recordings.

So why would anybody want to use more than 16 bits? What would this gain you?

Many audiophiles believe that using 24 bits will produce better recordings. But when pressed to explain why this would be, they can't tell you.

The truth is that 24 bits will produce a digital S/N of 144 dB. Note that I said "digital" S/N. In reality, we can't listen to a digital signal. We must convert it to analog to play it through a speaker. There is no analog system that can produce a 144 dB S/N. Even the quietest analog electronics reach a S/N of only around 120 dB.

But the quietest microphones have a S/N of only 92 dB due to Brownian motion. This is the noise caused by air molecules, in constant random motion at room temperature, striking the diaphragm of a microphone and causing it to make a small amount of noise. So it is virtually impossible to record music with a S/N greater than 92 dB.

A digital S/N greater than that is of no practical use when playing back music. After all, a silent background is silent and it is impossible to make silence any quieter. So adding bits on playback is simply a waste of data space.

Although there is no point in using more than 16 bits on playback, there is a good reason to use more than 16 bits when making a live recording. Understand that to get the full dynamic range of a 16 bit system, you must accurately place the dynamic range of the music in the 16 bit "window." If the recording level is too low, you won't use the entire 16 bit range and you will hear background noise on quiet passages of music. If you have the level too high, you will exceed the maximum level defined by the 16 bits and massive distortion will result.

Now when a finished recording is being prepared for playback, the recording engineer will always know its levels, and it is a simple matter for him to place it correctly in a 16 bit window. But when recording -- especially when recording live concerts -- the maximum sound level is not known in advance. So the recording levels must be conservative, as exceeding the maximum digital recording level will result in massive distortion that will ruin the recording.

So for recording, it is best to have some extra headroom. Therefore, recording studios use 20 or 24 bit recording systems.
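The headroom argument can be put into numbers. The Python sketch below assumes the simple 6 dB-per-bit rule and two hypothetical safety margins (12 dB of headroom at 16 bits and 24 dB at 24 bits), and checks how much S/N is left against the 86 dB target mentioned earlier.

```python
import math

REQUIRED_DB = 86  # minimum S/N for a silent background, per the figures above

def usable_snr_db(bits: int, headroom_db: float) -> float:
    """S/N left for the music when its peaks stay headroom_db below digital full scale."""
    return 20 * math.log10(2 ** bits) - headroom_db  # about 6 dB per bit, minus the margin

# Hypothetical safety margins for a live recording whose peak level is unknown.
for bits, headroom_db in ((16, 12), (24, 24)):
    snr = usable_snr_db(bits, headroom_db)
    verdict = "meets" if snr >= REQUIRED_DB else "misses"
    print(f"{bits} bit, {headroom_db} dB headroom: {snr:.0f} dB "
          f"({verdict} the {REQUIRED_DB} dB target)")
```

With those assumed margins, the 16 bit recording falls short of 86 dB, while the 24 bit recording still has plenty to spare.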

The extra headroom provided by more bits also is useful when the recording engineer needs to do mixing and processing where equalization may be desired. Boosting the energy at some frequencies using equalization requires more bits and might exceed the maximum digital limit. Once the recording is made, mixed, and processed, the final product can then be accurately placed in a 16 bit window so that it has a silent background.

So for recording, 20 bit or 24 bit systems make sense. But there simply is no point in using more than 16 bits for playback.

This brings up the topic of "High Resolution" (Hi-Rez) audio. Many audiophiles believe that higher sampling rates and more bits increase the "resolution" of the recording. This is utter nonsense for all the reasons that I have outlined above. The Red Book CD standard makes essentially perfectly accurate recordings, and increasing the sampling rate and word length does not make the recording any more perfect.

The latest fads in sampling rate are 96 KHz and even 192 KHz sampling. 96 KHz will record sounds up to 40 KHz (80 KHz captures the 40 KHz sound and the remaining 16 KHz are used for the anti-aliasing filter). 192 KHz sampling will record 80 KHz sounds (160 KHz sampling captures 80 KHz sounds while the remaining 32 KHz are used for the anti-aliasing filter).

Now think about that. What good does it do to record 40 KHz sounds? No music microphone records above 20 KHz, so the additional 20 KHz available in a 40 KHz recording system simply captures supersonic noise and wastes 50% of the data space.

A 192 KHz sampling system is even worse. Fully 75% of its bandwidth is used to record supersonic noise, so 75% of the data space is wasted.

Of course, these higher sampling rates are also combined with 24 bits. So the wasted data space is much greater than just described, as 24 bit samples need 50% more data space than 16 bit samples.
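The data-space arithmetic is simple to check: uncompressed linear PCM needs (sampling rate) × (bits per sample) × (channels) bits every second. The Python sketch below assumes stereo and ignores file headers and other overhead.

```python
def pcm_kbps(sample_rate_hz: int, bits: int, channels: int = 2) -> float:
    """Uncompressed linear PCM data rate in kilobits per second."""
    return sample_rate_hz * bits * channels / 1000

cd = pcm_kbps(44100, 16)
for label, rate, bits in (("CD (44.1 KHz/16 bit)", 44100, 16),
                          ("96 KHz/24 bit", 96000, 24),
                          ("192 KHz/24 bit", 192000, 24)):
    kbps = pcm_kbps(rate, bits)
    print(f"{label}: {kbps:.1f} kbps, {kbps / cd:.2f} times the CD data rate")
```

At the same sampling rate, 24 bit samples take exactly 1.5 times the space of 16 bit samples, and a 192 KHz/24 bit stream needs about 6.5 times the data of a CD.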

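Amazingly, these so-called "Hi-Rez" recordings actually degrade the sound quality. This is because the supersonic noise produces intermodulation products (beat frequencies) lower in the audio range whenever it passes through a slightly nonlinear component in the playback chain, such as an amplifier or a tweeter.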

For example, noise frequencies at 37 KHz and 38 KHz will interact together to form intermodulation frequencies at 1 KHz, which is a frequency humans can hear. So hi-rez recordings will actually produce noise and distortion in the audio bandwidth, which degrades the sound while making no improvement in the sound in other ways.
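Intermodulation only happens where something in the signal path is slightly nonlinear; a perfectly linear system would pass the two supersonic tones without creating anything new. The Python sketch below models that nonlinearity with a hypothetical second-order term, y = x + 0.01·x², chosen purely for illustration (real amplifiers and tweeters have their own, more complicated behavior). It measures the 1 KHz component before and after the nonlinear stage with a single-bin discrete Fourier transform.

```python
import math

FS = 192000   # sample rate high enough to represent 37 and 38 KHz tones
N = 1920      # 10 ms, so every tone involved completes a whole number of cycles

def tone_amplitude(signal: list[float], freq_hz: float) -> float:
    """Amplitude of a single frequency component, using a one-bin DFT."""
    re = sum(x * math.cos(2 * math.pi * freq_hz * k / FS) for k, x in enumerate(signal))
    im = sum(x * math.sin(2 * math.pi * freq_hz * k / FS) for k, x in enumerate(signal))
    return 2 * math.hypot(re, im) / len(signal)

# Two supersonic "noise" tones at 37 KHz and 38 KHz, each at half of full scale.
clean = [0.5 * math.sin(2 * math.pi * 37000 * k / FS)
         + 0.5 * math.sin(2 * math.pi * 38000 * k / FS) for k in range(N)]

# Hypothetical, slightly nonlinear playback stage: y = x + 0.01 * x**2.
distorted = [x + 0.01 * x * x for x in clean]

print(f"1 KHz before the nonlinear stage: {tone_amplitude(clean, 1000):.1e}")
print(f"1 KHz after the nonlinear stage:  {tone_amplitude(distorted, 1000):.1e}")
```

The clean signal contains essentially no 1 KHz energy, while the distorted one contains a 1 KHz component of amplitude 0.0025 (0.01 × 0.5 × 0.5), the difference frequency of the two supersonic tones.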

Fortunately the amount of distortion and noise these systems add is small enough that most humans can't hear it. However, sensitive instruments like a distortion analyzer easily reveal the flaws.

There is an interesting article on this subject that I think you will enjoy. Here is a direct link to it:
http://drewdaniels.com/audible.pdf

DSD and SACD digital formats work differently than PCM encoding. They actually do produce steps in the musical wave form because they do not use a DAC to smooth it out. As a result, they are extremely noisy and have high distortion levels.

To deal with these problems, engineers use "noise shaping" to move the noise and distortion up into the supersonic region so we cannot hear it. The result is that they sound as good as a CD. They don't measure as well as a CD, and their flaws are still present, but a human cannot hear them.
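DSD is produced by a type of 1-bit encoder called a delta-sigma (or sigma-delta) modulator, and noise shaping is built into the way such modulators work. The Python sketch below is a deliberately simple first-order modulator, an assumption made for clarity (the modulators actually used for DSD are higher-order designs), running at the DSD rate of 2.8224 MHz. Each output value is either +1 or -1, and the modulator keeps a running total of its accumulated error so that, averaged over many samples, the 1-bit stream follows the input. The crude block average used to recover the signal is an illustrative stand-in for a proper low-pass filter.

```python
import math
from collections.abc import Sequence

RATE = 64 * 44100   # DSD64 sample rate: 2.8224 MHz, 64 times the CD rate
WIDTH = 64          # average blocks of 64 one-bit values (back to roughly the CD rate)

def delta_sigma(samples: Sequence[float]) -> list[int]:
    """First-order delta-sigma modulator: inputs in [-1, 1] become a +1/-1 stream."""
    integrator = 0.0
    previous = 0
    stream = []
    for x in samples:
        integrator += x - previous            # accumulate the error made so far
        previous = 1 if integrator >= 0 else -1
        stream.append(previous)
    return stream

def block_average(values: Sequence[float], width: int) -> list[float]:
    """Crude low-pass filter: mean of each consecutive block of `width` values."""
    return [sum(values[i:i + width]) / width
            for i in range(0, len(values) - width + 1, width)]

# 1 ms of a 1 KHz tone at half of full scale.
tone = [0.5 * math.sin(2 * math.pi * 1000 * n / RATE) for n in range(RATE // 1000)]
bits = delta_sigma(tone)
worst = max(abs(a - b) for a, b in zip(block_average(bits, WIDTH), block_average(tone, WIDTH)))
print(f"output levels: {sorted(set(bits))}, worst block-average error: {worst:.3f}")
```

Only two output levels are ever used, yet the block averages track the 1 KHz input closely; the error left in the 1-bit stream sits mostly at high frequencies, which is the noise shaping described above.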

Because DSD and SACD do not use a DAC to produce a smooth wave form, they must sample at extremely high rates in order to make the "stair steps" in their wave forms small enough to keep distortion at a reasonably low level. This is why the sampling rate of these formats must be several MHz. DSD samples at 2.8 MHz and some of the newer DSD formats sample as high as 8 MHz.

Sampling at such high frequencies requires a huge amount of data storage. Since data storage costs money, it is very unlikely that these formats will gain wide acceptance or that the major recording labels will release music on this format. It simply makes more sense to use a DAC and PCM encoding to make recordings that are technically better and use less data than DSD. So do not expect DSD to take over the market any more than the now defunct SACD did.

Digital recording systems simply produce much more accurate recordings than analog can. For this reason, all modern recordings are made using digital equipment. Also, because digital recordings are essentially perfect, they can be copied repeatedly without any degradation.

By comparison, the serious flaws in analog recording mean that every copy is substantially worse than the one from which it was made. By the time an analog recording has been copied several times, its audio quality is so bad that it is unlistenable. It should now be clear why digital recordings sound better than analog ones.

If digital recording is more accurate, why do some audiophiles think that analog recordings sound better? Poor sound from digital recordings is caused by the poor quality of the recording itself. In other words, garbage in gets you garbage out, no matter how accurate the recording system is.

Most of today's recordings are "engineered" to sound good in cars. To do so, they have been highly compressed, their frequency response has been altered, and they have a lot of artificial reverberation. Such heavily processed recordings do not sound natural when reproduced through a high quality audio system.

Many of the old recordings you find on LPs were made before the availability of inexpensive mixing equipment. So some old recordings were recorded in a very simple and pure way in excellent acoustical environments instead of in sterile recording studios. As a result, these recordings sound very natural and realistic.

What this means is that many modern digital recordings sound awful, despite the perfection of digital recording. Many old recordings sound wonderful, despite the flaws of analog recording.

Because audiophiles fail to do valid testing, they do not understand the cause of the poor sound they hear. So they make false assumptions. They assume that it is the digital recording medium that is the cause of the poor sound quality, when in fact it is the nature of the recording that is at fault.

In summary, digital recording is vastly superior to analog recording. Only digital recording can accurately record music, which is why all serious recording engineers use digital equipment.

To this point, I have discussed the most pure and common type of digital recording -- linear PCM. But the cost of data storage and transmission brings us to what is becoming the most popular digital format -- MP3.

MP3 uses complex computer algorithms to reduce the amount of data required. The algorithms are highly detailed, and there is insufficient time to discuss them in this opus. Suffice it to say that MP3 is able to record high quality sound while dramatically reducing data storage and transmission requirements compared to linear PCM (or DSD) recording. The use of MP3 is why iPods can store thousands of songs and why you can listen to music over the internet.

When discussing MP3, it is essential to understand that the recording quality is defined by MP3's data rate. The data rate is the amount of data processed per second, and it greatly affects the sound quality. The data rate is measured in KBPS (KiloBits Per Second). Basically, higher data rates improve the sound quality at the expense of higher data storage requirements.
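The arithmetic behind these data rates is easy to check. The Python sketch below compares the MP3 data rates discussed in this paper, which are quoted in kilobits per second, with the uncompressed Red Book data rate (44.1 KHz × 16 bits × 2 channels) and estimates the storage for one song. The four-minute song length is an arbitrary assumption, and the sizes ignore file headers and metadata.

```python
CD_KBPS = 44100 * 16 * 2 / 1000   # Red Book stereo PCM: 1411.2 kilobits per second
SONG_SECONDS = 4 * 60             # hypothetical four-minute song

def size_megabytes(kbps: float, seconds: float) -> float:
    """Storage at a data rate given in kilobits per second, in megabytes (10**6 bytes)."""
    return kbps * 1000 * seconds / 8 / 1_000_000

for kbps in (64, 128, 192, 320, CD_KBPS):
    print(f"{kbps:7.1f} kbps: {size_megabytes(kbps, SONG_SECONDS):5.1f} MB, "
          f"compression {CD_KBPS / kbps:.1f}:1 relative to CD")
```

At 128 KBPS, for example, the song needs about 3.8 MB, roughly one eleventh of the 42 MB the same song occupies as CD-quality PCM.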

The relationship between data rate and sound quality means that you cannot simply use the term "MP3" without also specifying its data rate. Low data rate MP3 sounds very different from high data rate MP3. Therefore audiophiles can't simply make a blanket statement like "MP3 sounds bad." Faults can be heard in low data rate MP3, but high data rate MP3 can sound flawless.

Specifically, data rates of 64 KBPS and below significantly compromise sound quality. You can clearly hear a difference between the source and the recording at low data rates. The type of music and its inherent amount of compression has a big influence on whether you can hear the difference, but in general low data rates are only acceptable for speech -- not music.

However, when you get to 128 KBPS and higher data rates, it becomes quite difficult to hear any difference between the source and the recording. In my tests with groups of "golden ear" audiophiles, most could not detect a difference between the source and the recording when a 128 KBPS data rate was used.

However, 128 KBPS is not quite perfect. The ability to hear faults is highly dependent on the source material. Typical "pop" recordings with lots of percussive sounds generally sound perfect at 128 KBPS. The most critical material was quiet orchestral music. A sustained piano note was the most taxing test, and most listeners could hear a slight difference at a 128 KBPS data rate.

If you want MP3 recordings to sound exactly like the source, you must use 192 KBPS or higher. 192 KBPS is considered "CD quality." No human can hear any difference between an MP3 at that rate (or higher) and a CD.

Note that because most audiophiles fail to control the variables in their listening tests, they are easily deceived. For example, they often hear differences in the sound and blame those differences on the recording format, when in most cases the format is not responsible.

If you have not already done so, it is essential that you read my "Testing White Paper" to understand this. When I say that no human can hear any difference between an MP3 at 192 KBPS and a CD, this is true -- but only when the listening test properly controls all the variables.

Some of the better on-line music sources like www.pandora.com give you the option of selecting MP3 at 192 KBPS. Other on-line music sources only use high data rates. For example, www.mog.com streams at 320 KBPS. So be sure you check out the data rate used by your favorite on-line music source. Rest assured that if you use a suitably high data rate, the sound quality will be essentially perfect.

Some audiophiles believe that MP3 sounds bad. But the truth is that it sounds exactly like the source as long as the data rate is 192 KBPS or higher. This is a good thing because the future of music is on-line and music services will continue to use MP3 to conserve data.

In summary, you do not need to buy hi-rez recordings. CD quality is so accurate that you can't hear any difference between a properly-recorded CD and the original microphone feed. Therefore it is impossible to get better sound from any other source. But since the future of music is on-line, you will be getting your music in MP3 format. As long as it has a high data rate, it too will sound flawless.
