In digital audio, 44,100 Hz (also written as 44.1 kHz) is a common sampling frequency. Analog audio is often recorded by sampling it 44,100 times per second, and these samples are then used to reconstruct the audio signal on playback.

The 44.1 kHz audio sampling rate is widely used because of the compact disc (CD) format, and its use dates back to Sony's adoption of the rate in 1979.

History

Early digital audio was recorded on U-matic video cassette tapes.

The 44.1 kHz sampling rate originated in the late 1970s with PCM adaptors, which recorded digital audio on video cassettes,[note 1] notably the Sony PCM-1600, introduced in 1979, and the subsequent models in that series. This rate then became the basis for Compact Disc Digital Audio (CD-DA), defined in the Red Book standard in 1980.[1]: sec. 2.6  Its use has continued as an option in 1990s standards such as DVD and in 2000s standards such as HDMI. This sampling frequency is commonly used for MP3 and other consumer audio file formats, which were originally created from material ripped from compact discs.

Origin

The selection of the sample rate was based primarily on the need to reproduce the audible frequency range of 20–20,000 Hz (20 kHz). The Nyquist–Shannon sampling theorem states that a sampling rate of more than twice the maximum frequency of the signal to be recorded is needed, resulting in a required rate of greater than 40 kHz. The exact sampling rate of 44.1 kHz was inherited from PCM adaptors, which were the most affordable way to transfer data from the recording studio to the CD manufacturer at the time the CD specification was being developed.[1]: sec. 2.6 

The rate was chosen following debate between manufacturers, notably Sony and Philips, and its implementation by Sony yielded a de facto standard. Alternatives under discussion included 44.1 / 1.001 ≈ 44.056 kHz (corresponding to the NTSC color field rate of 60 / 1.001 ≈ 59.94 Hz) or approximately 44 kHz, proposed by Philips. Ultimately Sony prevailed on both sample rate (44.1 kHz) and bit depth (16 bits per sample, rather than 14 bits per sample). The technical reasoning behind the chosen rate is tied to characteristics of human hearing and of early digital audio recording systems, as described below.[1]: sec. 8.5 

Human hearing and signal processing

The Nyquist–Shannon sampling theorem says the sampling frequency must be greater than twice the maximum frequency one wishes to reproduce. To capture the human hearing range of roughly 20 Hz to 20,000 Hz, the sampling rate had to be greater than 40 kHz.

To avoid aliasing when sampling, however, signals must first be bandlimited to within half the sampling frequency, which can be achieved with low-pass filtering. An ideal low-pass filter (a sinc filter) would pass frequencies below 20 kHz without attenuating them and completely cut frequencies above 20 kHz, but such a filter is theoretically and practically impossible to implement because it is noncausal. In practice a transition band is necessary, in which frequencies are partly attenuated. The wider this transition band is, the easier and more economical it is to make an anti-aliasing filter. The 44.1 kHz sampling frequency allows for a 2.05 kHz transition band, from 20 kHz up to half the sampling rate at 22.05 kHz.
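The transition-band figure follows from simple arithmetic, sketched below in Python. The function name and the default 20 kHz passband edge are illustrative assumptions taken from the figures quoted above, not part of any standard.

```python
def transition_band_hz(sample_rate_hz: float, passband_edge_hz: float = 20_000.0) -> float:
    """Width of the band between the passband edge and half the sampling rate."""
    nyquist_hz = sample_rate_hz / 2  # highest frequency representable without aliasing
    if passband_edge_hz >= nyquist_hz:
        raise ValueError("passband edge must lie below half the sampling rate")
    return nyquist_hz - passband_edge_hz


print(transition_band_hz(44_100))  # 2050.0
```

A rate of exactly 40 kHz would leave no transition band at all, which is why the function rejects a passband edge at or above half the sampling rate.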

Recording on video equipment

Early digital audio was recorded to existing analog video cassette tapes, as VCRs were the only available transports with sufficient capacity to store meaningful lengths of digital audio.[note 2] To enable reuse with minimal modification of the video equipment, these ran at the same speed as video, and used much of the same circuitry. 44.1 kHz was deemed the highest usable rate compatible with both PAL and NTSC video and requiring encoding no more than 3 samples per video line per audio channel.

The sample rate is composed as follows:[note 3]

| Video standard | Active lines/field | Fields/second | Samples/line | Resulting sample rate |
|---|---|---|---|---|
| PAL | 294 | 50 | 3 | 294 × 50 × 3 = 44,100 Hz |
| B&W NTSC | 245 | 60 | 3 | 245 × 60 × 3 = 44,100 Hz |
| Color NTSC | 245 | ≈59.94 | 3 | 245 × 59.94 × 3 ≈ 44,056 Hz |

NTSC has 490 active lines per frame, out of 525 lines total; PAL has 588 active lines per frame, out of 625 lines total.
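The composition of the rate can be checked with a short Python sketch. The dictionary and function names are illustrative, and the color NTSC field rate is taken as 60 / 1.001 Hz, as given earlier in the article; exact fractions are used to avoid rounding in the intermediate steps.

```python
from fractions import Fraction

SAMPLES_PER_LINE = 3  # samples per video line per audio channel

# (active lines per field, fields per second) for each video standard
VIDEO_STANDARDS = {
    "PAL": (294, Fraction(50)),
    "B&W NTSC": (245, Fraction(60)),
    "Color NTSC": (245, Fraction(60_000, 1_001)),  # 60 / 1.001 Hz
}


def sample_rate_hz(active_lines_per_field, fields_per_second,
                   samples_per_line=SAMPLES_PER_LINE):
    """Audio sample rate carried by a video signal, per channel."""
    return active_lines_per_field * fields_per_second * samples_per_line


for name, (lines, fields) in VIDEO_STANDARDS.items():
    print(f"{name}: {float(sample_rate_hz(lines, fields)):.1f} Hz")
```

PAL and monochrome NTSC both give exactly 44,100 Hz, while color NTSC gives a value that rounds to 44,056 Hz.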

44,100 is the product of the squares of the first four prime numbers (2² × 3² × 5² × 7² = 44,100) and hence has many useful integer factors.
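As an illustration, the factorization and the resulting number of integer divisors can be verified with a small trial-division sketch in Python. The helper names are hypothetical and the approach is only suitable for small integers like this one.

```python
def prime_factors(n: int) -> dict:
    """Return {prime: exponent} for n, using trial division."""
    factors = {}
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors[d] = factors.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:  # whatever remains is itself prime
        factors[n] = factors.get(n, 0) + 1
    return factors


def divisor_count(n: int) -> int:
    """Number of positive divisors: product of (exponent + 1) over the primes."""
    count = 1
    for exponent in prime_factors(n).values():
        count *= exponent + 1
    return count


print(prime_factors(44_100))  # {2: 2, 3: 2, 5: 2, 7: 2}
print(divisor_count(44_100))  # 81
```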

Various halvings and doublings of 44.1 kHz are also used. The lower rates of 11.025 kHz and 22.05 kHz are found in WAV files and suit low-bandwidth applications, while the higher rates of 88.2 kHz and 176.4 kHz are used in mastering and in DVD-Audio. The higher rates provide additional resolution, making the audio less sensitive to distortions introduced by editing, and they also make low-pass filtering easier, since a much wider transition band (between the 20 kHz limit of human hearing and half the sampling rate) is available. The 88.2 kHz and 176.4 kHz rates are primarily used when the ultimate target is a CD.
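To give a sense of how much wider the transition band becomes at the doubled rates, the following Python sketch tabulates it for 44.1 kHz and its multiples. The function name and the 20 kHz audible limit are illustrative assumptions based on the figures in the text.

```python
def transition_bands(base_rate_hz=44_100, multipliers=(1, 2, 4), audible_limit_hz=20_000):
    """Map each sample rate to the width of its transition band in Hz."""
    return {
        base_rate_hz * m: base_rate_hz * m / 2 - audible_limit_hz
        for m in multipliers
    }


for rate, band in transition_bands().items():
    print(f"{rate} Hz -> {band} Hz transition band")
```

Under these assumptions the band grows from 2.05 kHz at 44.1 kHz to 24.1 kHz at 88.2 kHz and 68.2 kHz at 176.4 kHz.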

Other rates

Several other sampling rates were also used in early digital audio. Soundstream used a 50 kHz sample rate in the 1970s, following a 37 kHz prototype. In the early 1980s, a 32 kHz sampling rate was used in broadcast (especially in the UK and Japan), because this is sufficient for FM stereo broadcasts, which have 15 kHz bandwidth. Some digital audio was provided for domestic use in two incompatible EIAJ formats, corresponding to 525/59.94 (44,056 Hz sampling) and 625/50 (44.1 kHz sampling).

The Digital Audio Tape (DAT) format was released in 1987 with 48 kHz sampling. This sample rate has become the standard rate for professional audio.[2] Sample rate conversion between 44,100 Hz and 48,000 Hz was long complicated by the large integers in the ratio between the two rates, which is 147:160 in lowest terms, but with modern technology this conversion is accomplished quickly and efficiently.[3] Early consumer DAT machines did not support 44.1 kHz, and this difference made it difficult to make direct digital copies of 44.1 kHz CDs using 48 kHz DAT equipment.[4]
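The 147:160 ratio can be reproduced by dividing both rates by their greatest common divisor, as in this Python sketch. The function name and the (up, down) return convention are illustrative assumptions; in a rational resampler these integers would be the upsampling and downsampling factors.

```python
from fractions import Fraction
from math import gcd


def resampling_factors(source_hz: int, target_hz: int) -> tuple:
    """Smallest integer (up, down) pair such that source * up / down == target."""
    g = gcd(source_hz, target_hz)
    return target_hz // g, source_hz // g


print(gcd(44_100, 48_000))                 # 300
print(Fraction(44_100, 48_000))            # 147/160
print(resampling_factors(44_100, 48_000))  # (160, 147)
```

By comparison, converting between 44.1 kHz and 88.2 kHz needs only the factors (2, 1), which is one reason multiples of 44.1 kHz are convenient when the target is a CD.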

Status

Due to the popularity of CDs, a great deal of 44.1 kHz equipment exists, as does a great deal of audio recorded in 44.1 kHz (or multiples thereof). However, some more recent standards use 48 kHz in addition to or instead of 44.1 kHz.[2] In video, 48 kHz is now the standard, but for audio targeted at CDs, 44.1 kHz (and multiples) are still used.

The HDMI TV standard (2003) allows both 44.1 kHz and 48 kHz (and multiples thereof). This provides compatibility with DVD players playing CD, VCD and SVCD content. The DVD-Video and Blu-ray Disc standards use multiples of 48 kHz only.

Most PC sound cards contain a digital-to-analog converter capable of operating natively at either 44.1 kHz or 48 kHz. Some older processors include only 44.1 kHz output, and some cheaper newer processors only include 48 kHz output, requiring the PC to perform digital sample rate conversion to output other sample rates. Similarly, cards have limitations on the sample rates they support for recording.

Notes

  1. Specifically U-matic cassettes.
  2. Digital audio recording that uses a VCR as the transport has been termed pseudo-video.[1]
  3. In actual practice, different machines used different video standards; for example, the Sony PCM-1610 only used 525/60 monochrome video (NTSC, US), not 625/50 (PAL, Europe) or NTSC color.

References

  1. Watkinson, John (1989). The art of digital audio (Revised Reprint ed.). Oxford: Focal Press. ISBN 978-0-08-049936-9. OCLC 171287847.
  2. AES5-2008 (r2013): AES recommended practice for professional digital audio - Preferred sampling frequencies for applications employing pulse-code modulation (revision of AES5-2003), Audio Engineering Society.
  3. Larry Jordan. "Understanding Audio Sample Rate Conversions". Retrieved 2018-05-14.
  4. Henning Schulzrinne. "Explanation of 44.1 kHz CD sampling rate". Retrieved 2022-10-23.