A chip is the general name for semiconductor component products. It is the physical carrier of an integrated circuit (IC) and is cut from a wafer.
A die is a small piece of silicon that contains an integrated circuit and forms part of a computer or other electronic device. Definition of a voice chip: a voice signal is converted into digital data by sampling and stored in the ROM of the IC; the data in the ROM is then converted back into a voice signal by the chip's circuitry.
Voice chips fall into two categories according to their output mode: PWM output and DAC output. With PWM output, the volume is not continuously adjustable and the output cannot drive an ordinary power amplifier; most voice chips currently on the market use PWM output. With DAC output, the signal is shaped by the chip's internal EQ and amplifier, the volume is continuously adjustable and can be controlled digitally, and the output can be connected to an external power amplifier.
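As a rough illustration of how a PWM output stage encodes samples, here is a minimal Python sketch; the function name and the 8-bit, 0x80-mute convention are assumptions for illustration (the 80H mute level is mentioned later in this article), not taken from any particular chip's datasheet.

```python
# Minimal sketch: mapping unsigned 8-bit PCM samples to PWM duty cycles.
# Assumes 0x80 is the mute/midpoint level; a real voice chip does this
# conversion in hardware at the sampling rate.

def sample_to_duty(sample: int) -> float:
    """Map an unsigned 8-bit sample (0x00-0xFF) to a duty cycle in percent."""
    if not 0 <= sample <= 0xFF:
        raise ValueError("expected an 8-bit sample")
    return sample / 0xFF * 100.0

if __name__ == "__main__":
    for s in (0x00, 0x80, 0xFF):   # minimum, mute level, maximum
        print(f"sample 0x{s:02X} -> {sample_to_duty(s):.1f}% duty cycle")
```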
The playback function of an ordinary voice chip is essentially a DAC process; the ADC side is completed beforehand on a computer, including sampling, compression, EQ and other processing of the voice signal.
The recording chip includes two processes, ADC and DAC, both of which are completed by the chip itself, including steps such as voice data collection, analysis, compression, storage, and playback.
ADC = Analog-to-Digital Converter (analog-to-digital conversion)
DAC = Digital-to-Analog Converter (digital-to-analog conversion)
Sound quality depends on the number of ADC and DAC bits. Typical products offer voice lengths ranging from a minimum of about 10 or 20 seconds up to 340 seconds.
Intuitively, judging from the name, a voice chip is a chip related to voice: the voice is an electronic sound stored on the chip, and any chip that can emit sound is loosely called a voice chip. It is commonly known as a sound chip; the more accurate English term is Voice IC. Within the large family of voice chips, two types can be distinguished according to the kind of sound produced: speech ICs (Speech IC) and music ICs (Music IC). This is the professional way of classifying voice chips.
There are two production methods. Mask production: in layman's terms, the sound is burned into the chip first and the chip is then packaged; there is generally a minimum-quantity requirement.
OTP production: OTP means one-time programming. The chip is packaged first, and the sound is then burned in with software.
Based on the physical structure of the IC itself, a voice chip has one or more channels (the channel count determines how many sounds it can emit at the same time), and chips can be divided into several types accordingly:
First, single channel:
1. Single-channel speech IC (Speech IC) (this kind of voice chip does not support the music-IC storage method). Common speech ICs are single-channel voice chips; the DKC020-OTP 20-second chip and the DKA010 animal-sound chip are the most typical single-channel voice chips.
2. A single-channel music IC (Music IC) can play only one piece of music at any one time; its electronic sound source is a single-channel file, typically with a .Mid suffix.
The often-mentioned monophonic chip is the most basic music IC. Its effect is determined by the number of notes it can output within a given period, with common specifications of 64 notes, 128 notes and so on. Monophonic chips have a wide range of applications and are extremely cheap; the most common examples are the monophonic chips in Happy Birthday greeting cards. A typical part is the DK20S.
Strictly speaking, the structures of a single-channel music IC and a monophonic chip are different.
Second, 2-channel:
1. 2-channel voice IC. With 2-channel and multi-channel voice chips, voice playback in actual applications is generally fixed to one channel (equivalent to a single channel), but these products cost more than a single-channel speech IC (Speech IC). To balance product price and application, voice chip manufacturers generally make the functional support and sound effects of these parts more complete.
This structure is largely determined by the application field and the price of actual products and solutions: voice chip output is generally single-channel, and very few products support stereo. For high-end products, solutions such as MP3 main-control chips must be chosen instead.
2. 2-channel music chip, commonly known as a Music With Dual Tone IC. As the name suggests, a two-channel music IC can play two channels of music at the same time; its electronic sound source is generally a two-channel .Mid file. Common examples include the Christmas series of music ICs.
A few more words here: there is also a type of music chip on the market called a melody chip. What is its definition? Simply put, its effect is better than that of a monophonic music chip but not as good as that of a polyphonic (chord) music chip, so the dual-tone chip is also called a melody music chip. Structurally, a melody chip should be regarded as a more advanced monophonic chip, or as a monophonic chip with twice the effect.
Third, 4-channel, 8-channel and above:
Music with three or more channels is known as polyphonic (chord) music. The often-mentioned 4-chord Music IC refers to a 4-channel music IC, such as the DKC040...
Generally, multi-channel voice chips support both music IC (Music IC) and voice IC (Speech IC) functions.
(a) "Voice chip" introduction:
(1) Quantification of speech signal
Sampling rate (f), number of bits (n), baud rate (T)
Sampling: Convert voice analog signals into digital signals.
Sampling rate: the number of samples taken per second (for 8-bit samples, this equals bytes per second).
Baud rate: the number of bits produced per second by sampling (bps, bits per second). The baud rate directly determines the sound quality.
The number of sampling bits is the width of each sample in binary. Unless otherwise specified, sound samples are 8 bits wide, giving values from 00H to FFH, with mute defined as 80H.
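To make the relationship between sampling rate f, sample width n and the resulting baud rate concrete, here is a minimal Python sketch; the function name is illustrative only.

```python
# Minimal sketch: baud rate (bps) from sampling rate f and sample width n.
# Uses the relationship given in this article: baud rate = n * f.

def bit_rate(sample_rate_hz: int, bits_per_sample: int) -> int:
    """Return the bit rate in bits per second (bps)."""
    return sample_rate_hz * bits_per_sample

if __name__ == "__main__":
    # 8-bit samples at 6 kHz, 8 kHz (telephone quality) and 16 kHz
    for f in (6000, 8000, 16000):
        print(f"{f} Hz x 8 bit = {bit_rate(f, 8)} bps")
```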
(2) Sampling rate
Nyquist sampling theorem (Nyquist law): to reconstruct the original signal from its samples without distortion, the sampling frequency must be greater than twice the highest frequency in the signal. When the sampling frequency is less than twice the highest frequency in the spectrum, the spectrum of the sampled signal is aliased; when it is greater than twice the highest frequency, there is no aliasing.
The audible frequency range is roughly 20 Hz to 20 kHz, and ordinary speech occupies about 3 kHz and below. This is why standard CD quality is 44.1 kHz at 16 bits; special sources such as musical instruments may be recorded at 48 kHz and 24 bits, but that is not mainstream.
When dealing with ordinary voice ICs, the sampling rate goes up to about 16 kHz; speech is generally sampled at 8 kHz (telephone quality) or around 6 kHz, and the result is poor below 6 kHz. The DKC series voice chips can sample at up to 22 kHz.
When a microcontroller handles playback, a higher sampling rate means a faster timer-interrupt rate, which affects the monitoring and detection of other signals, so the choice must be weighed comprehensively.
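As a sketch of the Nyquist constraint in practice, the following assumes NumPy and SciPy are available: scipy.signal.decimate applies a low-pass (anti-aliasing) filter before discarding samples, so a 16 kHz signal can be reduced to 8 kHz without spectral aliasing.

```python
# Minimal sketch: downsampling with an anti-aliasing filter (Nyquist theorem).
# decimate() low-pass filters the signal before keeping every q-th sample,
# which prevents spectral aliasing at the new, lower sampling rate.
import numpy as np
from scipy import signal

fs = 16_000                        # original sampling rate (Hz)
t = np.arange(0, 0.1, 1 / fs)      # 100 ms of signal
x = np.sin(2 * np.pi * 1000 * t)   # 1 kHz tone, well below the new Nyquist limit

y = signal.decimate(x, 2)          # 16 kHz -> 8 kHz; new Nyquist frequency is 4 kHz
print(len(x), "samples at 16 kHz ->", len(y), "samples at 8 kHz")
```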
(3) Voice compression technology.
Because the amount of voice data is huge, it must be compressed effectively so that more voice content can be recorded in the limited ROM space. There are several approaches:
Voice segmentation: intercept the repeatable parts of the speech, and play back the content completely through arrangement and combination.
Voice sampling: the frequency response of the speakers we generally use is concentrated in the mid-band, and the high frequencies are rarely reproduced. Therefore, when the speaker's sound quality is acceptable, the sampling frequency can be reduced appropriately to achieve compression. This process is irreversible and cannot restore the original signal, so it is called lossy compression.
Mathematical compression: mainly compresses the number of bits per sample; this method is also lossy. For example, the commonly used ADPCM format compresses voice data from 16 bits to 4 bits, a 4:1 compression ratio (a short sketch of this follows after this list). MP3 compresses the data stream and involves data prediction; its baud-rate compression ratio is about 10:1.
Usually, the above compression methods are used in combination.
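As a concrete example of the ADPCM figure quoted above (16 bits down to 4 bits, a 4:1 ratio), here is a minimal sketch using Python's standard-library audioop module (available up to Python 3.12); the ramp data merely stands in for real voice samples.

```python
# Minimal sketch: ADPCM compression of 16-bit PCM to 4-bit samples.
# Uses the standard-library audioop module (available up to Python 3.12).
import array
import audioop

# 1000 synthetic 16-bit samples (a simple ramp) standing in for voice data
pcm = array.array("h", [(i * 30) % 20000 - 10000 for i in range(1000)]).tobytes()

adpcm, _state = audioop.lin2adpcm(pcm, 2, None)  # width = 2 bytes per sample
print("PCM bytes:  ", len(pcm))
print("ADPCM bytes:", len(adpcm))
print("compression ratio: %.1fx" % (len(pcm) / len(adpcm)))
```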
(4) Commonly used voice formats
PCM format: Pulse Code Modulation. The analog sound signal is sampled and quantized to obtain voice data; it is the most basic and original speech format. The RAW and SND formats are very similar to it; all of them are pure audio-data formats.
WAV format: Wave Audio File is a sound file format developed by Microsoft, also called the waveform sound file, and is widely supported on the Windows platform and by its applications. The WAV format supports many compression algorithms as well as a variety of bit depths, sampling frequencies and channel counts, but WAV files require a lot of storage space, which makes them inconvenient to exchange and distribute. Each block of data stored in a WAV file has its own identifier that tells the reader what the data is; these fields include the sampling frequency, the bit depth, mono or stereo, and so on.
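A minimal sketch of reading those WAV descriptor fields with Python's standard-library wave module; "speech.wav" is a placeholder file name.

```python
# Minimal sketch: reading the descriptive fields of a WAV file.
import wave

with wave.open("speech.wav", "rb") as wf:
    print("channels:     ", wf.getnchannels())        # mono or stereo
    print("sample width: ", wf.getsampwidth(), "bytes")
    print("sample rate:  ", wf.getframerate(), "Hz")
    print("frames:       ", wf.getnframes())
    seconds = wf.getnframes() / wf.getframerate()
    print("duration:      %.2f s" % seconds)
```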
ADPCM format: it uses several past sample values to predict the current input sample, adaptively compares the prediction with the actual value, and automatically adjusts the quantization step so that it always tracks the signal. It is suitable for voice whose rate of change is moderate and for short playback passages. Its advantage is that the reproduction of human voices is relatively realistic, generally above 90%, and it has been widely used in telephone communications.
MP3 format: Moving Picture Experts Group Audio Layer III, MP3 for short. It uses MPEG Audio Layer 3 technology and an encoding approach known as perceptual coding: the audio is first analyzed in the frequency domain, components below the noise/masking threshold are filtered out, the remaining audio is quantized, and the bits are packed and arranged to form an MP3 file with a high compression ratio, so that the compressed file sounds close to the original source when played back.
In essence it uses VBR (variable bit rate), dynamically choosing an appropriate baud rate according to the content being encoded, so the result preserves sound quality while keeping the file size down.
MP3 achieves a compression ratio of 10 or even 12 times; it was the first widely adopted high-compression-ratio audio format.
Linear Scale format: According to the change rate of the sound, the sound is divided into several segments, and each segment is compressed using a linear ratio, but its ratio is variable.
Logpcm format: essentially compresses the whole sound linearly by removing the last few bits of each sample. This compression method is easy to implement in hardware, but the sound quality is worse than Linear Scale, especially where the volume is low and the sound is delicate. It is mainly used for pure speech.
Mid format: Mid-format music occupies very little space; sometimes more than ten pieces of Mid-format music can be loaded into a chip rated at only 20 seconds.
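Returning to the Logpcm idea of removing the last few bits of each sample, here is a minimal sketch of that scheme as described above; note that true log-PCM codecs such as u-law/A-law use a logarithmic curve rather than simple truncation, so this follows only the simpler description given in this article.

```python
# Minimal sketch of "remove the last few bits" compression:
# keep only the top 8 bits of each signed 16-bit sample.

def truncate_sample(sample16: int, keep_bits: int = 8) -> int:
    """Drop the low-order bits of a signed 16-bit sample."""
    return sample16 >> (16 - keep_bits)

if __name__ == "__main__":
    for s in (-32768, -1200, 0, 1234, 32767):
        print(f"{s:6d} -> {truncate_sample(s):4d}")
```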
(b) Introduction to "Music Chip":
(1) Music channels and timbres:
Envelope (envelope), square wave (patch), channel (channel)
Envelope: part of a synthesized timbre; the change in note output over time, commonly described as "ADSR" (attack, decay, sustain, release; a sketch follows after these definitions).
Square wave: part of a synthesized timbre; the change of the note's square-wave current over time. (Triangle waves and other waveforms are also used.)
Channel: the number of notes the chip outputs at the same time, i.e. the number of simultaneous "monophonic instruments".
PCT: a type of simulated timbre that reproduces the pitch of each note from a 256-point sample of an instrument's sound. (The sound is soft and takes up little space, but is not realistic enough.)
FULL WAVE: reproduces the pitch of each note from a full recording of the instrument's sound. (The instrument sounds real, but it takes up a lot of space and places high demands on the quality of the captured timbre.)
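Here is the envelope sketch referred to above: a minimal ADSR amplitude envelope generated in Python with NumPy; the timing and level values are illustrative only, not taken from any particular music IC.

```python
# Minimal sketch: generating an ADSR amplitude envelope (attack, decay,
# sustain, release), the "envelope" component of a synthesized timbre.
import numpy as np

def adsr(attack, decay, sustain_level, sustain_time, release, fs=8000):
    """Return an ADSR envelope (amplitude 0..1) sampled at fs Hz."""
    a = np.linspace(0.0, 1.0, int(attack * fs))           # rise to peak
    d = np.linspace(1.0, sustain_level, int(decay * fs))  # fall to sustain level
    s = np.full(int(sustain_time * fs), sustain_level)    # hold
    r = np.linspace(sustain_level, 0.0, int(release * fs))  # fade out
    return np.concatenate([a, d, s, r])

if __name__ == "__main__":
    env = adsr(attack=0.02, decay=0.05, sustain_level=0.6,
               sustain_time=0.2, release=0.1)
    print(len(env), "envelope samples, peak =", env.max())
```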
(2) Compression of music:
Because the amount of music data is huge, it must be compressed effectively so that more music can be recorded in the limited ROM space. There are several approaches:
Music segmentation: Cut out the repeatable parts of the music, and play back the content completely through arrangement and combination.
Timbre: choose among Full wave, PCT and dual tone according to how full the music needs to sound and the requirements of the application; each timbre type occupies a different amount of space and delivers a different quality.
Mathematical compression: mainly compresses the sampled timbre (Full wave). This method is also lossy: the timbre to be used is downsampled and processed to reduce its size (the same idea as for voice data).
The capacity of a voice chip is expressed intuitively as a voice length, measured in seconds:
a) Ordinary voice chips use a 6 kHz sampling rate to calculate the rated voice length; their maximum sampling rate is 22 kHz.
b) Recording ICs likewise use a 6 kHz sampling rate as the standard for calculating voice length.
That is, the rated length is the number of seconds the chip can play at a 6 kHz sampling rate. For chips of the same family, cost is directly proportional to the size of the chip (die):
a) The number of I/O ports and the size of the ROM (voice seconds) determine the chip cost; low-second voice chips have fewer I/O ports.
b) The sound quality is improved, the sampling is improved, and the voice seconds are shortened.
The sound quality is reduced, the sampling is reduced, and the voice seconds become longer M---ROM size (bit) n*f---Baud rate
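Using the relationship just given, here is a minimal Python sketch that converts a ROM size into voice seconds at different sampling rates; the ROM size used is an illustrative figure, not a specific part's capacity.

```python
# Minimal sketch: voice length in seconds from ROM size, using
# M (ROM bits) = n (bits per sample) * f (Hz) * t (s), i.e. t = M / (n * f).

def voice_seconds(rom_bits: int, bits_per_sample: int, sample_rate_hz: int) -> float:
    """Seconds of voice that fit in rom_bits at the given bit rate."""
    return rom_bits / (bits_per_sample * sample_rate_hz)

if __name__ == "__main__":
    rom_bits = 1 * 1024 * 1024  # example: a 1 Mbit voice ROM (illustrative size)
    # 4-bit ADPCM data at the 6 kHz rating standard vs. 16 kHz sampling
    for f in (6000, 16000):
        print(f"{f} Hz, 4-bit: {voice_seconds(rom_bits, 4, f):.1f} s")
```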
Introduction to sound processing software
1) Sound Forge
2) Cool Edit
3) GoldWave
4) Cakewalk