Masking effect


The masking effect is the phenomenon in which the perception of a weaker sound is impaired by the presence of a stronger sound. It is a property of the human ear and plays an important role in practical acoustic applications.

Suppose that in a quiet environment the hearing threshold for sound A is 30 dB. If sound B is present at the same time, the threshold for sound A rises to 40 dB because of B, that is, 10 dB higher than before. In this case B is called the masking sound and A the masked sound. The number of decibels by which the hearing threshold of the masked sound is raised is called the masking amount: here the masking amount is 10 dB, and 40 dB is called the masking threshold.
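The arithmetic in this example can be written out as a minimal Python sketch. The function name `masking_amount` and its parameter names are illustrative assumptions, not terms from any library.

```python
def masking_amount(quiet_threshold_db: float, masked_threshold_db: float) -> float:
    """Return how many dB the masker raises the hearing threshold."""
    return masked_threshold_db - quiet_threshold_db


# Values from the example in the text: sound A alone, then with masker B present.
quiet_threshold_db = 30.0   # threshold of A in a quiet environment
masked_threshold_db = 40.0  # threshold of A while B is sounding (masking threshold)

amount_db = masking_amount(quiet_threshold_db, masked_threshold_db)
print(f"masking amount: {amount_db} dB")
```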

Masking can be divided into frequency domain masking and time domain masking.

In fact, masking is not only a matter of loudness: when the masking sound and the masked sound differ in frequency, the masking is less pronounced. Even so, a loud pure tone can easily mask another pure tone of higher frequency.

A strong pure tone masks weaker pure tones that sound at the same time at nearby frequencies. This property is called frequency domain masking, also known as simultaneous masking, as shown in Figure 1.

As Figure 1 shows, a sound near 300 Hz at about 60 dB masks a sound near 150 Hz at about 40 dB. As another example, take a 1000 Hz pure tone at 60 dB together with an 1100 Hz pure tone that is 18 dB weaker: our ears hear only the stronger 1000 Hz tone. If instead the 1000 Hz tone is accompanied by a 2000 Hz pure tone that is 18 dB weaker, we hear both tones. To make the 2000 Hz tone inaudible, it must be reduced to 45 dB below the 1000 Hz tone. Generally speaking, the closer a weak pure tone is in frequency to a strong pure tone, the more easily it is masked.

The curves in Figure 2 show the masking effect of pure tones at 250 Hz, 1 kHz and 4 kHz, each with a sound intensity of 60 dB. Figure 2 shows that:

1) Masking of other pure tones is strongest near the masker frequencies of 250 Hz, 1 kHz and 4 kHz.

2) Low-frequency pure tones can effectively mask high-frequency pure tones, but high-frequency pure tones have little masking effect on low-frequency pure tones.

Since the relationship between sound frequency and the masking curves is not linear, the concept of the "critical band" is introduced so that frequency can be measured uniformly from a perceptual point of view. It is generally believed that there are 24 critical bands in the range of 20 Hz to 16 kHz. The unit of the critical band scale is called the Bark.

1 Bark = the width of a critical frequency band

When f (frequency) < 500 Hz, the Bark value is approximately f/100

When f (frequency) > 500 Hz, the Bark value is approximately 9 + 4 log2(f/1000)
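The piecewise approximation above can be sketched in Python as follows. The function name `hz_to_bark` is an illustrative assumption. The sketch also assumes the logarithm is base 2, which is what makes the two branches agree at 500 Hz (both give 5 Bark).

```python
import math


def hz_to_bark(frequency_hz: float) -> float:
    """Approximate position on the Bark scale using the piecewise rule in the text."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    if frequency_hz < 500.0:
        # Below 500 Hz the scale is roughly linear: 100 Hz per Bark.
        return frequency_hz / 100.0
    # From 500 Hz upward the scale is roughly logarithmic:
    # each doubling of frequency adds about 4 Bark (assumes log base 2).
    return 9.0 + 4.0 * math.log2(frequency_hz / 1000.0)


for f in (250.0, 500.0, 1000.0, 4000.0):
    print(f"{f:7.1f} Hz -> {hz_to_bark(f):.2f} Bark")
```

With this rule the three masker frequencies used in Figure 2 (250 Hz, 1 kHz and 4 kHz) fall at 2.5, 9 and 17 Bark. Because the rule is only an approximation, values near the edges of the audible range should be treated with caution.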

The discussion above covered loudness, pitch and masking, with an emphasis on subjective human perception. Of these, the masking effect is particularly important and is the basis of psychoacoustic models.

In addition to masking between sounds that occur at the same time, masking also occurs between sounds that are adjacent in time; this is called time domain masking, or temporal masking. It is divided into pre-masking and post-masking, as shown in Figure 3. The main reason for temporal masking is that the human brain needs a certain amount of time to process information. Generally speaking, pre-masking is very short, only about 5 to 20 ms, while post-masking can last 50 to 200 ms.
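As a rough illustration of these time windows, the sketch below classifies a weak sound by its timing relative to a strong masker. The function name `temporal_masking_kind`, the sign convention for `gap_ms`, and the default windows (the upper ends of the ranges quoted above) are all assumptions made for this example. Real masking also depends on the levels of both sounds and fades gradually, not at a hard cutoff.

```python
def temporal_masking_kind(gap_ms: float,
                          pre_window_ms: float = 20.0,
                          post_window_ms: float = 200.0) -> str:
    """Classify a weak sound by its timing relative to a strong masker.

    gap_ms < 0: the weak sound ends |gap_ms| before the masker starts.
    gap_ms > 0: the weak sound starts gap_ms after the masker stops.
    gap_ms == 0: the two sounds overlap in time.
    """
    if gap_ms == 0:
        return "simultaneous"
    if -pre_window_ms <= gap_ms < 0:
        return "pre-masking"
    if 0 < gap_ms <= post_window_ms:
        return "post-masking"
    return "none"


for gap in (-50.0, -10.0, 0.0, 100.0, 300.0):
    print(f"gap {gap:6.1f} ms -> {temporal_masking_kind(gap)}")
```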

Temporal masking differs from simultaneous masking in what it depends on. Simultaneous masking is related to the frequencies and relative volumes of the sounds involved, while temporal masking is related only to time.

If two sounds are very close together in time, we have difficulty telling them apart. For example, if a very loud sound is immediately followed by a very weak one, the weak sound is hard to hear. But if the second sound is played some time after the first has stopped, it can be heard. For pure tones this interval is generally 5 milliseconds. The effect also works in reverse: if a weaker sound appears shortly before a louder one, we do not hear the weaker sound.

The masking effect means that the human ear is sensitive only to the most prominent sounds and much less sensitive to weaker ones. For example, if the sound in one frequency band is relatively strong, listeners become insensitive to sounds in other frequency bands.

Applying this principle, people invented compressed digital music formats such as MP3. Files in these formats record in detail only the mid-frequency sounds to which human ears are most sensitive, while sounds at higher and lower frequencies are recorded only in simplified form, which greatly reduces the storage space required.

MP3 users can specify how many bits are used to store each second of music. The MP3 codec only cares about the relationship between frequencies and volume.

During encoding, the signal is compared against a mathematical model of human psychoacoustics, together with the bit rate chosen for compression, to determine which "unwanted components" to throw away. MP3 compression today generally uses a bit rate of 128 kbps. The encoder takes this figure into account when outputting each frame of data. If the bit rate is relatively low, the definition of "irrelevant" and "redundant" data is relaxed, so a large amount of data is treated as useless. The compressed audio then loses a lot of detail and the sound quality drops. In contrast, at a higher bit rate the criteria for "irrelevance" and "redundancy" are stricter and more detail is preserved, but the file is larger.
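The trade-off between bit rate and file size is simple arithmetic, sketched below in Python. The function name `encoded_size_bytes` and the 4-minute track length are hypothetical values chosen for illustration. The comparison with uncompressed CD audio (44,100 samples per second, 16 bits, 2 channels) is added here for scale and is not part of the discussion above. The sketch assumes a constant bit rate and ignores headers and metadata.

```python
def encoded_size_bytes(bit_rate_kbps: float, duration_s: float) -> float:
    """Size of a constant-bit-rate stream, ignoring headers and metadata."""
    # 1 kbps is 1000 bits per second; 8 bits make one byte.
    return bit_rate_kbps * 1000.0 * duration_s / 8.0


# Uncompressed CD audio: 44,100 samples/s x 16 bits x 2 channels = 1411.2 kbps.
CD_BIT_RATE_KBPS = 44_100 * 16 * 2 / 1000.0

duration_s = 4 * 60  # hypothetical 4-minute track
mp3_bytes = encoded_size_bytes(128.0, duration_s)
cd_bytes = encoded_size_bytes(CD_BIT_RATE_KBPS, duration_s)

print(f"MP3 at 128 kbps: {mp3_bytes / 1e6:.2f} MB")
print(f"CD audio:        {cd_bytes / 1e6:.2f} MB")
print(f"ratio:           {cd_bytes / mp3_bytes:.1f} : 1")
```

Under these assumptions the 128 kbps stream takes 3.84 MB for four minutes, roughly one eleventh of the uncompressed size.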

In addition, the auditory masking effect is widely used in the electroacoustic field. For example, dynamic noise reduction is designed on the principle that different programs mask noise to different degrees.

The masking effect is not only a physiological phenomenon of hearing but also a psychological one, of which the "cocktail party effect" is an example. The cocktail party effect means that when attention is highly focused, or when a sound is relatively familiar, a listener can selectively pick out the sound they want to hear even under heavy masking noise. At a cocktail party where many people are gathered, you can still hear one particular person's speech clearly. This also has many applications in practical recording.