Generally speaking, the most popular speech evaluation technologies on the market are based on statistical models. Think of how a child learns: you say "hello" in front of the child and tell them this is the standard pronunciation. The child gradually learns how to pronounce "hello", and when someone else later reads the word, the child can tell whether it was said accurately. A computer is trained in much the same way. You collect recordings of standard pronunciation and feed them to the computer, which converts the sound signal into numbers that we call features. When the computer compares the features of the same word spoken by different speakers, it finds that some features (numbers) vary a lot from speaker to speaker, and those features are discarded. As more and more data accumulates, the features that have nothing to do with the pronunciation itself are filtered out. The feature data that remains is called the acoustic model, which represents the correct pronunciation of the word. Then, when a learner practices this word, the computer extracts the same features from the learner's speech and compares them with the acoustic model to produce a score.
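To make the idea concrete, here is a toy sketch in Python of the process described above. It is only an illustration under simplifying assumptions, not how any real product works: the feature vectors are made-up numbers, the function names `build_model` and `score` and the parameters `max_spread` and `tolerance` are hypothetical, and "varies a lot between speakers" is approximated by the standard deviation of each feature.

```python
from statistics import mean, pstdev


def build_model(reference_features, max_spread=1.0):
    """Keep only the feature positions that stay stable across speakers.

    reference_features: one list of numbers per reference speaker,
    all of the same length. Returns {feature_index: average_value}.
    """
    model = {}
    for index, values in enumerate(zip(*reference_features)):
        # A large spread means the feature depends on the speaker,
        # not on the pronunciation, so it is dropped.
        if pstdev(values) <= max_spread:
            model[index] = mean(values)
    return model


def score(model, learner_features, tolerance=1.0):
    """Return a score from 0 to 100 for one learner utterance."""
    if not model:
        raise ValueError("model has no retained features")
    penalties = []
    for index, center in model.items():
        distance = abs(learner_features[index] - center)
        # Cap each penalty at 1.0 so one bad feature cannot go below 0.
        penalties.append(min(distance / tolerance, 1.0))
    return round(100 * (1 - mean(penalties)), 1)


# Made-up features for three reference speakers saying "hello".
# The middle feature differs wildly between speakers (think of pitch).
references = [
    [1.0, 5.0, 0.2],
    [1.1, 9.0, 0.3],
    [0.9, 1.0, 0.1],
]
model = build_model(references, max_spread=0.5)

good = score(model, [1.0, 7.0, 0.2])  # close to the references
bad = score(model, [2.5, 7.0, 1.5])   # far from the references
print(sorted(model), good, bad)
```

In this sketch the middle feature is thrown away because it changes too much from speaker to speaker, so the learner is judged only on the two stable features. Real systems use far richer features and statistical models, but the flow is the same: collect references, keep what is stable, compare, and score.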
As far as I know, iFlytek and Chisheng are both technically strong in this area. iFlytek works on many projects, and speech evaluation is just one of them. Chisheng is more narrowly focused on research into speech evaluation technology.
I hope my answer helps. It took a while to type all of this, so if you found it useful, please accept it as the answer!