What is a voice chip?

Voice chip

1. Definition of voice chip: A voice chip converts voice signals into digital data through sampling, stores the data in the ROM of the IC, and then restores the data in the ROM to a voice signal through its playback circuit.

Playback on an ordinary voice chip is essentially a DAC process. The ADC side, including sampling, compression, EQ and other processing of the voice signal, is done beforehand on a computer.

A recording chip includes both the ADC and the DAC process, and both are performed by the chip itself. This covers voice data acquisition, analysis, compression, storage, and playback.

ADC = analog-to-digital converter; DAC = digital-to-analog converter.

Sound quality depends on the number of bits of the ADC and DAC. For example, in Weitron's WTV series both the ADC and the DAC are 16-bit, which is close to CD sound quality. The DAC in Weichuang's WTB series is 8-bit and gives ordinary sound quality.

2. Quantitative expression of speech signal

(1) Quantization of the speech signal

Three quantities are involved: sampling rate (f), number of bits per sample (n), and baud rate (T).

Sampling: converting an analog voice signal into a digital signal.

Sampling rate: the number of samples taken per second (with 8-bit samples, each sample is one byte).

Baud rate: the number of bits sampled per second, T = n*f, measured in bps (bits per second). Strictly speaking this quantity is the bit rate. The baud rate directly determines the sound quality.

The number of sampling bits is the number of binary bits per sample. Unless otherwise specified, voice samples are 8-bit, ranging from 00H to FFH, with silence at 80H.
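The 8-bit range described above can be sketched in a few lines. This is only an illustration: the function names and the choice of a normalized input range of -1.0 to 1.0 are assumptions, not something specified by any particular chip.

```python
def quantize_u8(x: float) -> int:
    """Map a normalized sample in [-1.0, 1.0] to an unsigned 8-bit code.

    Assumption for illustration: the analog input is already scaled to
    the range -1.0 .. 1.0. Silence (0.0) lands on 80H.
    """
    x = max(-1.0, min(1.0, x))            # clip out-of-range input
    code = int(round(x * 128.0)) + 0x80   # centre the range on 80H
    return max(0x00, min(0xFF, code))     # keep the result in 00H .. FFH


def dequantize_u8(code: int) -> float:
    """Inverse mapping, back to the normalized range."""
    return (code - 0x80) / 128.0


if __name__ == "__main__":
    for value in (-1.0, -0.5, 0.0, 0.5, 1.0):
        print(value, format(quantize_u8(value), "02X") + "H")
```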

(2) Sampling rate

Nyquist sampling theorem: to restore the original signal from the sampled signal without distortion, the sampling frequency must be greater than twice the highest frequency in the signal. When the sampling frequency is less than twice the highest frequency of the spectrum, the spectrum of the sampled signal aliases. When it is greater than twice the highest frequency, there is no aliasing.
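As a small illustration of the theorem, the sketch below computes the frequency at which a pure tone appears after sampling. The function name is hypothetical, and the model assumes an ideal sampler with no anti-aliasing filter in front of it.

```python
def alias_frequency(f_signal: float, f_sample: float) -> float:
    """Apparent frequency of a pure tone after ideal sampling.

    Tones below f_sample / 2 come back unchanged. Tones above it fold
    back into the range 0 .. f_sample / 2, which is aliasing.
    """
    f = f_signal % f_sample        # the sampled spectrum repeats every f_sample
    return min(f, f_sample - f)    # fold into 0 .. f_sample / 2


if __name__ == "__main__":
    for tone in (1000.0, 3000.0, 5000.0, 7000.0):
        print(tone, "Hz sampled at 8000 Hz appears at",
              alias_frequency(tone, 8000.0), "Hz")
```

With an 8 kHz sampling rate, a 3 kHz tone is preserved, while a 5 kHz tone is indistinguishable from a 3 kHz tone after sampling.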

The frequency range of audible sound is about 20 Hz to 20 kHz, while ordinary speech lies at about 3 kHz or below. This is why CD audio uses a 44.1 kHz sampling rate at 16 bits. For special sounds such as musical instruments, 48 kHz and 24 bits may also be used, but this is not mainstream.

For ordinary voice ICs, the sampling rate is at most 16 kHz. Speech is generally sampled at 8 kHz (telephone sound quality) or at about 6 kHz. Below 6 kHz the result is poor.

In microcontroller applications, a higher sampling rate means more frequent timer interrupts, which can interfere with monitoring and detecting other signals, so the sampling rate must be chosen with the whole system in mind.
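The trade-off can be estimated with simple arithmetic. The sketch below assumes one timer interrupt per output sample; the 8 MHz clock and the 100-cycle interrupt routine used in the example are hypothetical figures chosen only for illustration.

```python
def interrupt_period_us(sample_rate_hz: int) -> float:
    """Time between two sample interrupts, in microseconds."""
    return 1_000_000 / sample_rate_hz


def cycles_per_sample(cpu_hz: int, sample_rate_hz: int) -> int:
    """CPU cycles available between two sample interrupts."""
    return cpu_hz // sample_rate_hz


def isr_load(cpu_hz: int, sample_rate_hz: int, isr_cycles: int) -> float:
    """Fraction of CPU time spent inside the sample interrupt routine."""
    return isr_cycles * sample_rate_hz / cpu_hz


if __name__ == "__main__":
    CPU_HZ = 8_000_000   # hypothetical microcontroller clock
    ISR_CYCLES = 100     # hypothetical cost of one playback interrupt
    for rate in (6_000, 8_000, 16_000):
        print(rate, "Hz:",
              interrupt_period_us(rate), "us per sample,",
              cycles_per_sample(CPU_HZ, rate), "cycles per sample,",
              isr_load(CPU_HZ, rate, ISR_CYCLES), "of CPU time in the ISR")
```

Under these assumed figures, going from 8 kHz to 16 kHz halves the time left between interrupts for everything else the microcontroller has to do.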

(3) Voice compression technology.

Because the amount of voice data is large, it must be compressed effectively so that more voice content fits in the limited ROM space. There are several methods:

Speech segmentation: extract the parts of the speech that repeat, store each of them once, and reproduce the complete content at playback by arranging and combining the segments.

Voice sampling: the frequency response of the speakers normally used is concentrated in the mid frequencies, and high frequencies are rarely reproduced. Therefore, as long as the sound quality through the speaker remains acceptable, the sampling frequency can be reduced to achieve compression. This process is irreversible and the original signal cannot be restored, so it is lossy compression.

Mathematical compression: this mainly reduces the number of bits per sample and is also lossy. For example, the commonly used ADPCM format compresses voice data from 16 bits to 4 bits per sample, a compression ratio of 4:1. MP3 compresses the data stream and involves data prediction. It reduces the bit rate by a factor of about 10.

Usually, the above compression methods are used in combination.
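The speech segmentation method listed above can be sketched as a table of stored segments plus a playlist built at playback time. The segment names and the temperature-readout scenario are hypothetical and serve only to show the idea.

```python
# Each name stands for one recorded segment stored once in ROM.
# The names and the use case are assumptions made for illustration.
DIGIT_SEGMENTS = ["zero", "one", "two", "three", "four",
                  "five", "six", "seven", "eight", "nine"]


def playlist_for_reading(value: int) -> list:
    """Return the ordered segment names to play for one reading."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative readings")
    digits = [DIGIT_SEGMENTS[int(d)] for d in str(value)]
    return ["temperature"] + digits + ["degrees"]


def stored_segment_count() -> int:
    """Number of distinct recordings that must be kept in ROM."""
    return len(DIGIT_SEGMENTS) + 2   # ten digits plus two fixed phrases


if __name__ == "__main__":
    print(playlist_for_reading(25))
    print(stored_segment_count(), "segments cover readings 0 to 99")
```

Twelve stored segments are enough to announce one hundred different readings, instead of recording each full sentence separately.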

(4) Commonly used voice formats

PCM format: Pulse Code Modulation samples the analog sound signal to obtain quantized voice data. It is the most basic, original voice format. The RAW and SND formats are very similar to it. They are all pure voice-data formats.

WAV format: the Waveform Audio File Format is a sound file format developed by Microsoft and IBM, also called a waveform sound file, and is widely supported by the Windows platform and its applications. The WAV format supports many compression algorithms and a variety of bit depths, sampling frequencies and channel counts. However, uncompressed WAV files take up a lot of storage space and are inconvenient to exchange and distribute. Each block of data stored in a WAV file has its own identifier, which tells the reader what kind of data it is. The stored information includes the sampling frequency, the number of bits, mono or stereo, and so on.
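As an illustration of the identifiers and parameters carried by a WAV file, the sketch below writes a short 8-bit mono file in memory with Python's standard `wave` module and reads the parameters back. The helper names are hypothetical, and the sample data is placeholder silence (80H).

```python
import io
import wave


def make_wav(sample_rate: int, samples: bytes) -> bytes:
    """Wrap raw 8-bit unsigned mono PCM samples in a WAV container."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as writer:
        writer.setnchannels(1)           # mono
        writer.setsampwidth(1)           # 1 byte per sample = 8-bit PCM
        writer.setframerate(sample_rate)
        writer.writeframes(samples)
    return buf.getvalue()


def describe_wav(data: bytes) -> dict:
    """Read back the parameters stored in the WAV header."""
    with wave.open(io.BytesIO(data), "rb") as reader:
        return {
            "channels": reader.getnchannels(),
            "bits": reader.getsampwidth() * 8,
            "sample_rate": reader.getframerate(),
            "frames": reader.getnframes(),
        }


if __name__ == "__main__":
    one_second_of_silence = bytes([0x80]) * 8000
    wav_bytes = make_wav(8000, one_second_of_silence)
    print(wav_bytes[:4], wav_bytes[8:12])   # container and format identifiers
    print(describe_wav(wav_bytes))
```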

ADPCM format: Adaptive Differential Pulse Code Modulation uses several past sample values to predict the current input sample, compares the prediction with the actual value, and quantizes the difference. The quantization step is adapted automatically so that it always follows the changes in the signal. It is suitable for situations where the voice changes at a moderate rate and playback is brief. Its advantage is relatively realistic reproduction of the human voice, with fidelity generally put at more than 90%, and it has been widely used in telephone communications.
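The predict, compare and adapt loop can be sketched as follows. This is a deliberately simplified codec written for illustration: it is not the IMA or ITU-T G.726 algorithm and not any chip vendor's format. It predicts from one past sample instead of several, and the step limits, the adaptation rule and all names are assumptions.

```python
STEP_MIN, STEP_MAX, STEP_INITIAL = 4, 4096, 16   # assumed tuning values


def _apply_code(code: int, predicted: int, step: int):
    """Update the predictor and step size from one 4-bit code.

    Used by both encoder and decoder so their states stay identical.
    """
    magnitude = code & 0x7
    delta = (2 * magnitude + 1) * step // 2      # mid-point of the chosen bin
    if code & 0x8:                               # top bit carries the sign
        delta = -delta
    predicted = max(-32768, min(32767, predicted + delta))
    if magnitude >= 4:                           # step too small: grow it
        step = min(STEP_MAX, step * 2)
    elif magnitude <= 1:                         # step too large: shrink it
        step = max(STEP_MIN, step // 2)
    return predicted, step


def adpcm_encode(samples: list) -> list:
    """Encode signed 16-bit samples into 4-bit codes (values 0 to 15)."""
    predicted, step = 0, STEP_INITIAL
    codes = []
    for sample in samples:
        diff = sample - predicted
        magnitude = min(7, abs(diff) // step)
        code = magnitude | (0x8 if diff < 0 else 0)
        codes.append(code)
        predicted, step = _apply_code(code, predicted, step)
    return codes


def adpcm_decode(codes: list) -> list:
    """Rebuild approximate 16-bit samples from the 4-bit codes."""
    predicted, step = 0, STEP_INITIAL
    out = []
    for code in codes:
        predicted, step = _apply_code(code, predicted, step)
        out.append(predicted)
    return out


def pack_nibbles(codes: list) -> bytes:
    """Store two 4-bit codes per byte, padding with a zero code if needed."""
    if len(codes) % 2:
        codes = codes + [0]
    return bytes((codes[i] << 4) | codes[i + 1]
                 for i in range(0, len(codes), 2))
```

Because each 16-bit sample becomes one 4-bit code, 50 samples that would occupy 100 bytes as PCM pack into 25 bytes, which is the 4:1 ratio usually quoted for ADPCM.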

MP3 format: MPEG Audio Layer III (MPEG stands for Moving Picture Experts Group), abbreviated MP3. It uses a perceptual coding algorithm: the encoder first analyzes the spectrum of the audio, then uses a filter to remove the components below the noise level, then quantizes the remaining audio and packs the resulting bits. The result is an MP3 file with a high compression ratio whose playback still sounds close to the original sound source. With VBR (variable bit rate), the encoder dynamically selects an appropriate bit rate for the content being encoded, so the result preserves sound quality while keeping the file size down.

MP3 achieves a compression ratio of 10:1 or even 12:1. It was one of the earliest high-compression-ratio audio formats to appear.

Linear Scale format: the sound is divided into several segments according to its rate of change, and each segment is compressed with a linear ratio that can vary from segment to segment. The Linear Scale formats of SUNLINK Company and ALPHA Company use 5 bits.

Logpcm format: essentially compresses the whole sound linearly by removing the lowest few bits. This method is easy to implement in hardware, but the sound quality is worse than Linear Scale, especially at low volume and for delicate sounds. It is mainly used for pure speech.
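Dropping the lowest bits, as described for Logpcm, can be sketched as below. This follows only the description in the text and is not a vendor specification. The function names and the 8-bit to 5-bit example are assumptions.

```python
def drop_low_bits(sample_u8: int, keep_bits: int) -> int:
    """Keep only the top keep_bits of an unsigned 8-bit sample."""
    return sample_u8 >> (8 - keep_bits)


def restore(code: int, keep_bits: int) -> int:
    """Shift a shortened code back up to the 8-bit range."""
    return code << (8 - keep_bits)


if __name__ == "__main__":
    # A quiet passage just above silence (80H): every sample collapses
    # onto the same 5-bit code, so the fine detail is lost.
    quiet = [0x80, 0x81, 0x83, 0x85, 0x87]
    print([drop_low_bits(s, 5) for s in quiet])
    print([format(restore(drop_low_bits(s, 5), 5), "02X") for s in quiet])
```

The quiet samples all map to one code, which matches the remark that this method suffers most at low volume.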

3. Expression of voice ROM space

The ROM space of a voice chip is expressed in an intuitive way, as the length of voice (in seconds) that it can hold.

a) Ordinary voice chips calculate the voice length based on a 6 kHz sampling rate.

b) Recording ICs use a 4 kHz sampling rate as the basis for calculating the voice length.

4. Elements of voice chips

For chips of the same type, the cost is directly proportional to the size of the chip.

a) The number of I/O ports and the size of the ROM (voice seconds) determine the chip cost. Voice chips with fewer seconds have fewer I/O ports.

b) Raising the sound quality means raising the sampling rate, which shortens the voice seconds. Lowering the sound quality means lowering the sampling rate, which lengthens the voice seconds.

c) Calculation method of voice seconds: M/(n*f), where M is the ROM size (bits) and n*f is the baud rate (bits per second).
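The formula M/(n*f) can be checked with a short calculation. The ROM sizes in the example are hypothetical and chosen only to show how the result scales with n and f.

```python
def voice_seconds(rom_bits: int, bits_per_sample: int,
                  sample_rate_hz: int) -> float:
    """Voice length in seconds: M / (n * f)."""
    return rom_bits / (bits_per_sample * sample_rate_hz)


if __name__ == "__main__":
    ROM_BITS = 1 << 20   # hypothetical 1 Mbit ROM (1,048,576 bits)
    print(round(voice_seconds(ROM_BITS, 8, 6000), 2), "s at 8 bits, 6 kHz")
    print(round(voice_seconds(ROM_BITS, 8, 8000), 2), "s at 8 bits, 8 kHz")
    print(round(voice_seconds(ROM_BITS, 4, 6000), 2), "s at 4 bits, 6 kHz")
```

Halving the number of bits per sample doubles the voice seconds for the same ROM, and raising the sampling rate shortens them.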

5. Introduction to sound processing software

1) Sound Forge

2) Cool Edit

3) GoldWave

Voice chip classification:

Common chip classifications on the market now:

Short-duration chips: 10 seconds, 20 seconds, 40 seconds, 80 seconds, and 170 seconds. Models: WTV series and ISD1700 series chips.

Commonly used modules: 6 minutes, 8 minutes, 16 minutes, 1 hour, etc. Model: WT588D series voice module.

Long-duration chips: 340 seconds, 500 seconds, 1000 seconds, and longer than 2000 seconds. Models: WTV340 and ISD4000 series voice chips.

Common chips: 3 seconds to 340 seconds. Models: WTV series, WTB series, APLUS series.

Voice chips can also be classified by the type of integrated circuit. All integrated circuits related to sound are collectively called voice chips (also called voice ICs, in the sense of Voice IC), but within this broad category they are divided into two types: speech ICs (Speech IC) and music ICs (Music IC).

Voice Chip supplier: Guangzhou Weichuang Electronics Li Yiping