Detailed explanation of audio knowledge (1)

In real life, the sounds we hear are continuous in time; we call such a signal an analog signal. An analog signal must be digitized before a computer can work with it.

At present, audio playback on a computer relies on audio files. An audio file is produced by sampling, quantizing and encoding the sound information into a digital signal. The human ear can hear frequencies from roughly 20 Hz up to 20 kHz, so the maximum bandwidth an audio file format needs to cover is about 20 kHz. According to the Nyquist theorem, the digitized signal can only be restored to the original sound when the sampling frequency is more than twice the highest frequency in the signal. Therefore the sampling rate of audio files is generally 40-50 kHz; the most common example, CD quality, uses 44.1 kHz.
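
As a quick sanity check on the Nyquist condition, here is a minimal sketch in Python (assuming a 20 kHz upper hearing limit) that tests whether a few common sampling rates exceed twice the highest audible frequency:

# Nyquist condition: sampling rate must exceed 2 x the highest signal frequency
highest_audible_hz = 20_000              # upper limit of human hearing (~20 kHz)
nyquist_minimum_hz = 2 * highest_audible_hz

for rate_hz in (8_000, 22_050, 44_100, 48_000):
    ok = rate_hz > nyquist_minimum_hz
    print(f"{rate_hz} Hz sampling: {'covers the full audible band' if ok else 'too low for the full audible band'}")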

Sampling: the waveform is infinitely smooth. Sampling is the process of reading the amplitude of the wave at discrete points in time, which is how the analog signal is digitized.
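
To make the idea concrete, here is a minimal sketch (plain Python; an arbitrary 1 Hz sine wave stands in for the continuous signal, and a deliberately tiny sampling rate is used so the output is readable):

import math

SAMPLE_RATE = 8           # samples per second (deliberately tiny, for illustration)
FREQ = 1.0                # a 1 Hz sine wave standing in for the analog signal

# "Sampling": evaluate the continuous waveform only at discrete instants t = n / SAMPLE_RATE
samples = []
for n in range(SAMPLE_RATE):          # one second of audio
    t = n / SAMPLE_RATE
    amplitude = math.sin(2 * math.pi * FREQ * t)
    samples.append(amplitude)

print(samples)   # 8 discrete amplitude values taken from the smooth wave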

Sampling frequency: the number of samples taken from the analog signal per unit time. The higher the sampling frequency, the more faithfully and naturally the sound is reproduced, but the larger the amount of data. Common rates are 8 kHz (the rate used by telephones, sufficient for human speech), 22.05 kHz (roughly FM-broadcast quality, adequate for speech and medium-quality music), 44.1 kHz (the most common standard and, in theory, the limit of CD sound quality) and 48 kHz (slightly more accurate still; the human ear cannot distinguish sampling frequencies above this, so higher rates have little practical value on computers).

Sampling bits (also known as quantization level, sample size, or quantized data bits): the number of bits used to represent each sample, which determines the range of values a sample can take. It is usually 8 or 16 bits. The more bits per sample, the finer the changes in sound that can be recorded, and the larger the resulting data. 8-bit quantization gives low quality, 16-bit gives high quality; 16 bit is the most common sample precision.
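
The relationship between the number of sampling bits and the number of representable levels is simply 2^bits. A small sketch:

# Number of distinct amplitude levels for common sample sizes
for bits in (8, 16, 24):
    levels = 2 ** bits
    lo, hi = -(levels // 2), levels // 2 - 1
    # e.g. signed 16-bit PCM covers -32768 .. 32767
    # (8-bit WAV data is usually stored unsigned instead, 0 .. 255)
    print(f"{bits}-bit: {levels} levels, signed range {lo} .. {hi}")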

Quantization: the process of expressing the amplitude of each sampled (discrete) value as a binary number. (In everyday terms, quantizing means fixing a set of intervals in advance and then recording which interval each measured value falls into.)
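
A minimal sketch of quantization, assuming the sampled amplitudes are floats in the range -1.0 .. 1.0 and a 16-bit word length:

def quantize_16bit(amplitude: float) -> int:
    """Map a float in [-1.0, 1.0] to a signed 16-bit integer."""
    amplitude = max(-1.0, min(1.0, amplitude))   # clamp out-of-range values
    return round(amplitude * 32767)              # 32767 = 2**15 - 1

print(quantize_16bit(0.0))     # 0
print(quantize_16bit(0.5))     # 16384
print(quantize_16bit(-1.0))    # -32767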

PCM (Pulse Code Modulation): sound that has only been sampled and quantized, with no further encoding or compression applied.
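
Because PCM stores every sample directly, its data rate follows straight from the parameters above. A sketch, assuming CD parameters (44.1 kHz, 16 bit, stereo):

sample_rate = 44_100      # samples per second per channel
bits_per_sample = 16
channels = 2              # stereo

bytes_per_second = sample_rate * channels * bits_per_sample // 8
print(bytes_per_second)                 # 176400 bytes/s
print(bytes_per_second * 8 / 1000)      # 1411.2 kbit/s, the raw CD bit rate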

Encoding: the sampled and quantized signal is not yet a digital signal; it still has to be converted into digitally coded pulses, and this conversion is called encoding. The binary sequence obtained after analog audio has been sampled, quantized and encoded is the digital audio signal.
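
As an illustration of turning quantized values into a binary sequence, the sketch below packs a few signed 16-bit samples as little-endian bytes, which is how uncompressed PCM is laid out in a WAV file (the sample values are arbitrary examples):

import struct

quantized_samples = [0, 16384, -16384, 32767, -32768]

# '<h' = little-endian signed 16-bit integer; one per sample
pcm_bytes = b''.join(struct.pack('<h', s) for s in quantized_samples)

print(pcm_bytes.hex(' '))   # the digital audio signal as a raw byte sequence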

Number of channels: The number of channels refers to the number of speakers that can produce different sounds. It is one of the important indicators for measuring audio equipment.

Bit rate (also called code rate): the amount of data that passes through the stream per second; it reflects the compression quality. For example, common MP3 bit rates are 128 kbit/s, 160 kbit/s and 320 kbit/s; the higher the bit rate, the better the sound quality. An MP3 file consists of ID3 data and audio data; ID3 stores metadata such as the song title, artist, album and track number.
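
Bit rate translates directly into file size. A rough sketch, assuming a 4-minute track and ignoring the (small) ID3 tag:

duration_s = 4 * 60          # a 4-minute track

for bitrate_kbps in (128, 160, 320):
    size_bytes = bitrate_kbps * 1000 / 8 * duration_s
    print(f"{bitrate_kbps} kbit/s  ->  about {size_bytes / 1_000_000:.1f} MB")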

Audio frame: audio data is a stream, with no inherent notion of a frame. In practice, for the convenience of audio processing and transmission, it is customary to treat 2.5 ms to 60 ms of data as one audio frame. This duration is called the "sampling time"; there is no special standard for its length, and it is chosen according to the needs of the codec and the specific application.
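
The frame duration and the sampling rate together determine how many samples one frame contains. A sketch, assuming 44.1 kHz (for reference, an MPEG-1 Layer III (MP3) frame is fixed at 1152 samples per channel, about 26 ms at this rate):

sample_rate = 44_100

for frame_ms in (2.5, 20, 60):
    samples_per_frame = round(sample_rate * frame_ms / 1000)
    print(f"{frame_ms} ms frame at {sample_rate} Hz -> {samples_per_frame} samples")

# An MP3 frame is fixed at 1152 samples per channel:
print(1152 / sample_rate * 1000)   # about 26.1 ms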

Analog signal -> input device (transmits voltage values) -> sound card (sampling and quantization, i.e. assigning numeric values to properties such as loudness) -> disk (file) -> sound card -> output device -> analog signal
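
The recording half of this chain, from "analog" waveform to a file on disk, can be sketched with nothing but the Python standard library (a 440 Hz sine wave stands in for the microphone input, and the output filename tone.wav is an arbitrary choice; the wave module handles the file container):

import math
import struct
import wave

SAMPLE_RATE = 44_100
DURATION_S = 1.0
FREQ_HZ = 440.0            # an A4 tone standing in for the analog input

# Sample + quantize: evaluate the waveform at discrete instants, map to 16-bit integers
frames = bytearray()
for n in range(int(SAMPLE_RATE * DURATION_S)):
    t = n / SAMPLE_RATE
    amplitude = math.sin(2 * math.pi * FREQ_HZ * t)
    frames += struct.pack('<h', round(amplitude * 32767))

# Encode + store: write the PCM data into a WAV container on disk
with wave.open('tone.wav', 'wb') as wav_file:
    wav_file.setnchannels(1)          # mono
    wav_file.setsampwidth(2)          # 16 bit = 2 bytes per sample
    wav_file.setframerate(SAMPLE_RATE)
    wav_file.writeframes(bytes(frames))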

Physically, sound is represented by waveforms, and we call these waveforms analog signals. A computer disk can only store data in binary form (010101...), so we convert the analog signal into a binary form that can be stored on disk and call the result a digital signal. This conversion process is called analog-to-digital conversion.

The sound (analog signal) we produce is continuous. If we tried to convert the analog signal continuously, the resulting digital signal would be enormous, so we sample it instead. The sampling rate is the number of times per second the computer samples the analog signal. The most common rate is the 44.1 kHz mentioned above, a value arrived at after years of practical experience: below it the quality degrades noticeably, while above it the improvement is hard to hear.

After sampling, the data becomes a string of bits (0101010110100101...). Sound also has loudness, so how does this string of data represent how loud the sound is? That is determined by the sampling bit depth: the number of bits per sample decides how many levels the amplitude is divided into. For example, with 8 bits per sample there are 2^8 = 256 possible values, from 0 (00000000) to 255 (11111111), and each value represents one amplitude level.

After sampling, quantization and encoding, the sound has become a digital signal and is stored as a file.

The file is what stores the digital signal. It records the bit rate, sampling rate, number of channels and encoding method, along with the encoded digital signal itself.
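
For an uncompressed WAV file these stored parameters can be read back directly with the standard library (this sketch assumes the tone.wav file written in the earlier example exists):

import wave

with wave.open('tone.wav', 'rb') as wav_file:
    print("channels:      ", wav_file.getnchannels())
    print("sample width:  ", wav_file.getsampwidth() * 8, "bit")
    print("sampling rate: ", wav_file.getframerate(), "Hz")
    print("frames stored: ", wav_file.getnframes())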

A file format is a name defined by its vendor, and each file format supports a specific set of encoding formats. By analogy, a file is a container that can hold different kinds of water: some containers hold only one kind, others hold several.

The sampled digital signal is very large, and we often do not need all of it, so it is encoded and compressed. Such compression is usually lossy: some high-frequency or low-frequency data is discarded without greatly affecting the perceived audio quality.
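
The payoff of compression is easy to quantify. A back-of-the-envelope sketch comparing raw CD-quality PCM with a 128 kbit/s MP3:

raw_bitrate_kbps = 44_100 * 16 * 2 / 1000   # CD-quality PCM: 1411.2 kbit/s
mp3_bitrate_kbps = 128

print(f"raw PCM : {raw_bitrate_kbps} kbit/s")
print(f"MP3     : {mp3_bitrate_kbps} kbit/s")
print(f"roughly {raw_bitrate_kbps / mp3_bitrate_kbps:.0f}x smaller")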

An encoding format can be understood as the particular encoding and decoding (codec) method used for a given kind of audio data.

In other words, the encapsulation (container) format is the file format, and the codec determines the encoding format.

After understanding these basic concepts, we can outline a classic audio playback process (taking MP3 as an example): read the MP3 file -> parse the ID3 tag and frame headers -> decode the frames into PCM data -> send the PCM to the sound card -> the output device plays the sound.

In iOS, Apple encapsulates the above process and provides interfaces at several different levels (the relevant diagrams can be found in the official documentation).

The official documentation also describes the functions of the mid- and high-level interfaces. As can be seen, the interface types Apple provides are very rich and can meet the needs of a wide range of use cases.

Reference articles:

/p/5c5e95d89c4f - the writing is quite good
/p/423726cc9090 - the knowledge points are very comprehensive
/p/b3db09fb69dc - the summary is very good
/p/a75f2411225f - a bit technical; I understood part of it
/liusandian/article/details/52488078 - the concepts are very clear and easy to understand