Voiceprint identification technology automatically recognizes speaker characteristics

Research on speaker recognition began in the 1930s. As research methods and tools improved, the field gradually moved beyond its early reliance on simple human listening. L. G. Kersta of Bell Labs identified speakers by visually inspecting spectrograms and proposed the concept of the “voiceprint”. China's voiceprint identification technology started late. In the late 1980s, the Ministry of Public Security unit that is now its Physical Evidence Identification Center introduced the DSP5500 sound spectrograph from the United States and began scientific research and casework in voiceprint identification. In 1992, the Center completed a ministerial-level research project, “Research on the Application of the 5500 Sound Spectrograph in Voiceprint Identification”. In 2001, a national key science and technology project of the Ninth Five-Year Plan undertaken by the Center, "Recognition system research", passed acceptance. It produced the VS99 voice workstation, which has independent intellectual property rights and indicates that China's voiceprint identification technology is maturing.

The project “Voiceprint identification and automatic recognition technology research” was completed by the Physical Evidence Identification Center of the Ministry of Public Security together with other units. Its main achievement is the integration of automatic voiceprint recognition into the VS99 speech workstation. The system automatically analyzes and evaluates speaker characteristics, and its spectrogram display and measurement functions can be combined with expert examination to determine a speaker's identity, which suits the practical needs of forensic science. The resulting workstation combines the sound spectrograph and the automatic speaker identification system used in current voiceprint casework in a single practical system, which greatly improves the accuracy of conclusions.

◆ Innovative Technology:

1. Anti-noise processing. The effect of noise on test results cannot be ignored. For non-stationary noise, the researchers proposed combining an HMM that takes even-numbered frame segment features as input with an SS method smoothed in the time direction, in order to improve the robustness of the Chinese continuous speech recognition system in noisy environments. This method yields better recognition results.
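
The subtraction stage can be illustrated with a minimal sketch, assuming that "SS" stands for spectral subtraction. The source gives no formulas, so the function names, the moving-average smoother, and the parameters `over`, `floor`, and `width` are illustrative assumptions. The HMM stage is not shown, and the magnitude spectra are taken as already computed.

```python
def smooth_over_time(frames, width=3):
    """Moving average of each frequency bin across neighbouring frames."""
    half = width // 2
    smoothed = []
    for t in range(len(frames)):
        lo, hi = max(0, t - half), min(len(frames), t + half + 1)
        window = frames[lo:hi]
        smoothed.append([sum(f[k] for f in window) / len(window)
                         for k in range(len(frames[t]))])
    return smoothed


def spectral_subtract(frames, noise_frames, over=1.0, floor=0.05, width=3):
    """Subtract an estimated noise magnitude spectrum from each frame.

    frames       : list of magnitude spectra (one list of floats per frame)
    noise_frames : frames assumed to contain noise only
    over, floor, width : illustrative tuning values, not taken from the source
    """
    n_bins = len(frames[0])
    # Average the noise-only frames to estimate the noise spectrum.
    noise = [sum(f[k] for f in noise_frames) / len(noise_frames)
             for k in range(n_bins)]
    cleaned = []
    for frame in smooth_over_time(frames, width):
        # Keep a small spectral floor so no bin becomes negative.
        cleaned.append([max(m - over * noise[k], floor * m)
                        for k, m in enumerate(frame)])
    return cleaned


noise_only = [[1.0, 1.0], [1.0, 1.0]]
noisy = [[1.0, 1.0], [1.0, 1.0], [4.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
print(spectral_subtract(noisy, noise_only, width=1)[2])
```

Smoothing along the time axis reduces frame-to-frame fluctuation in the spectrum before subtraction, at the cost of blurring short speech events, so the window width is a trade-off that would need tuning on real data.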

2. Voice endpoint detection. Endpoint detection avoids false triggering and misidentification caused by noise, and it is important for accurately locating the start of a voice signal and improving the accuracy of the recognition system. A traditional speech activity detector (SAD) can easily miss speech activity. In addition, a large interference signal may be mistaken for speech, causing a false detection. To overcome these shortcomings, the researchers used a correlation-based speech activity detector: they defined an effective correlation function, found a way to determine the decision threshold, and devised methods to prevent both missed and false detections.
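
One way a correlation-based detector might work is sketched below. The source does not give its correlation function or threshold rule, so this example substitutes a common choice: the peak of the normalized autocorrelation over a range of lags. The names `max_normalized_autocorr` and `detect_speech`, the frame length, the lag range, and the threshold of 0.6 are all assumptions made for illustration.

```python
import math


def max_normalized_autocorr(frame, min_lag, max_lag):
    """Largest normalized autocorrelation of a frame over the given lags."""
    if sum(x * x for x in frame) == 0.0:
        return 0.0  # pure silence carries no periodicity
    best = 0.0
    for lag in range(min_lag, max_lag + 1):
        n = len(frame) - lag
        num = sum(frame[i] * frame[i + lag] for i in range(n))
        e0 = sum(frame[i] ** 2 for i in range(n))
        e1 = sum(frame[i + lag] ** 2 for i in range(n))
        if e0 > 0 and e1 > 0:
            best = max(best, num / math.sqrt(e0 * e1))
    return best


def detect_speech(signal, frame_len=160, min_lag=20, max_lag=100,
                  threshold=0.6):
    """Flag each non-overlapping frame as speech (True) or not (False).

    The threshold is an illustrative constant, not the source's rule.
    """
    flags = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        peak = max_normalized_autocorr(frame, min_lag, max_lag)
        flags.append(peak > threshold)
    return flags


silence = [0.0] * 160
voiced = [math.sin(2 * math.pi * i / 40) for i in range(160)]  # periodic
click = [0.0] * 160
click[50] = 100.0  # loud but non-periodic interference
print(detect_speech(silence + voiced + click))
```

Because the decision depends on periodicity rather than energy, the loud click is rejected while the quieter periodic frame is accepted, which is the kind of false detection an energy-only detector is prone to.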

3. Identification algorithm. This system uses an optimization algorithm based on the GMM (Gaussian mixture model).

(1) Modified GMM training method. Experiments showed that the EM algorithm suffers from a significant defect, the appearance of singular matrices, whereas maximum likelihood (ML) estimation gives a relatively low recognition rate but produces no singular matrices. The researchers therefore used the ML-estimated model as the initial model and then applied the EM algorithm, scaling the correction made at each step by a ratio α. They called this the improved EM algorithm.
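
The source does not state the update rule, so the following is only one plausible reading of "scaling the correction by α": each parameter moves a fraction α of the way from its current value toward the standard EM re-estimate. The sketch uses a one-dimensional mixture for brevity. The variance floor and all function names are assumptions, and the initial model is simply passed in by the caller.

```python
import math


def gauss(x, mean, var):
    """Density of a 1-D Gaussian."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)


def em_step(data, weights, means, variances):
    """One standard EM re-estimation for a 1-D Gaussian mixture."""
    k = len(weights)
    r_sum, x_sum, xx_sum = [0.0] * k, [0.0] * k, [0.0] * k
    for x in data:
        p = [weights[j] * gauss(x, means[j], variances[j]) for j in range(k)]
        total = sum(p) or 1e-300  # guard against numerical underflow
        for j in range(k):
            r = p[j] / total
            r_sum[j] += r
            x_sum[j] += r * x
            xx_sum[j] += r * x * x
    new_w = [r_sum[j] / len(data) for j in range(k)]
    new_m = [x_sum[j] / max(r_sum[j], 1e-12) for j in range(k)]
    new_v = [xx_sum[j] / max(r_sum[j], 1e-12) - new_m[j] ** 2 for j in range(k)]
    return new_w, new_m, new_v


def damped_em(data, weights, means, variances, alpha=0.5, iters=50,
              var_floor=1e-3):
    """EM in which each correction is scaled by alpha (assumed reading).

    var_floor is an extra safeguard added here, not taken from the source.
    """
    for _ in range(iters):
        w_em, m_em, v_em = em_step(data, weights, means, variances)
        weights = [w + alpha * (e - w) for w, e in zip(weights, w_em)]
        means = [m + alpha * (e - m) for m, e in zip(means, m_em)]
        variances = [max(v + alpha * (e - v), var_floor)
                     for v, e in zip(variances, v_em)]
    return weights, means, variances


data = [-0.5, 0.0, 0.5, 9.5, 10.0, 10.5]
print(damped_em(data, [0.5, 0.5], [1.0, 8.0], [4.0, 4.0]))
```

With α = 1 this reduces to ordinary EM. Smaller values slow each correction, which limits how quickly a component's variance can collapse toward zero.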

(2) GMM optimization based on a genetic algorithm. The researchers improved the traditional genetic algorithm and applied it to GMM parameter optimization, which greatly improved how well the model is optimized.

(3) Optimized GMM-based speaker recognition. The researchers proposed a new optimized GMM-based speaker recognition scheme. First, the likelihood of each frame of a single utterance under a given model is transformed in a specific way. Then the total likelihood of the syllable is calculated, that is, the syllable's total score under that model, denoted Sc. The speaker whose model yields the largest Sc is taken as the target speaker.
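
The scoring scheme in (3) can be sketched as follows. The source does not say what change is applied to each frame's likelihood, so `transform` below is a hypothetical stand-in (clipping very low log-likelihoods). Real systems score multi-dimensional feature vectors, while scalar features are used here only to keep the example short. The model format and all names are assumptions.

```python
import math


def frame_log_likelihood(x, model):
    """Log-likelihood of one scalar feature under a 1-D Gaussian mixture.

    model is a list of (weight, mean, variance) tuples.
    """
    total = 0.0
    for w, mean, var in model:
        total += w * math.exp(-0.5 * (x - mean) ** 2 / var) \
            / math.sqrt(2 * math.pi * var)
    return math.log(max(total, 1e-300))


def transform(ll, floor=-20.0):
    """Hypothetical per-frame change: limit the impact of outlier frames."""
    return max(ll, floor)


def identify(frames, models):
    """Return the speaker whose model gives the largest total score Sc."""
    scores = {
        name: sum(transform(frame_log_likelihood(x, model)) for x in frames)
        for name, model in models.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


models = {
    "speaker_A": [(1.0, 0.0, 1.0)],
    "speaker_B": [(1.0, 5.0, 1.0)],
}
best, scores = identify([4.8, 5.1, 5.3], models)
print(best)
```

Summing the transformed per-frame values gives the total score Sc for each model, and the argmax over models selects the target speaker.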

◆ Social Benefits:

At present, the VS99 voice workstation, a national Ninth Five-Year Plan research achievement completed by the Physical Evidence Identification Center of the Ministry of Public Security, is in widespread use in China and has played an important role in actual casework. This project adds automatic identification to the VS99, further improving the efficiency of case handling and the accuracy of identification.

The automatic voiceprint identification system developed by this project has fully independent intellectual property rights and is highly practical, and it is well suited to the actual needs of public security work. It can screen a large number of suspects during an investigation, which helps set the direction of the investigation, narrow its scope, and improve efficiency. The system also displays spectrograms in real time, which makes it suitable for speech signal acquisition in mobile settings. Since 2002, 200 cases have been examined and identified with it, including criminal, economic, civil, and public security cases. According to case feedback and court trial results, the rate of correct conclusions was 100%.
