|  | Chinese Journal of Computers Full Text |
| Title | Solving the Pinyin-to-Chinese-Character Conversion Problem Based on Hybrid Word Lattice |
| Authors | ZHANG Sen |
| Address | (Information and Computational Sciences Research Laboratory, Beijing University of Technology, Beijing 100022) |
| Year | 2007 |
| Issue | No. 7 (pp. 1145-1153) |
| Abstract & Background | Abstract: Pinyin-to-Chinese-Character conversion is a core technique of Chinese input systems, Chinese speech recognition, and Chinese information processing. First, the state of the art of Pinyin-to-Chinese-Character conversion is briefly discussed, and its principles and shortcomings are analyzed. Then a conversion approach based on a hybrid word lattice is proposed, and the implementation of its main architecture is studied. The related problems of the hybrid language model and the algorithms for solving the word lattice are investigated. Finally, the automatic prediction algorithm and the machine learning technology used in Chinese intelligent input systems are discussed. A prototype system based on the proposed approach is presented and compared with the MS Pinyin input system in Windows XP. The experimental results show that the correct conversion rate from Pinyin to Chinese characters is significantly improved. Keywords: Pinyin-to-Chinese-Character conversion; n-gram language model; Markov model; word lattice; user's action. Background: Pinyin-to-Chinese-Character conversion is the fundamental and core technique in Chinese input systems, Chinese speech recognition, and Chinese information processing. Research and development in this area have made great progress and have significantly advanced Chinese information processing theory and technology since the 1980s; for example, the conversion accuracy, which is the most important evaluation criterion, can reach 90% or higher in some environments. However, the conversion accuracy can still be improved by exploring the users' input activities and exploiting the dynamic knowledge produced during the interaction between users and systems. The purpose of the author's work is to provide a high-performance Pinyin-to-Chinese-Character conversion approach based on large-scale hybrid language models and a word lattice decoding algorithm. 
Hence, the proposed approach integrates dynamic information, such as the recently selected context and the user's recent profile, together with the automatic prediction algorithm and machine learning technology, to improve the performance of Pinyin-to-Chinese-Character conversion. This report not only contributes some new techniques for Pinyin-to-Chinese-Character conversion research, but also tests their effectiveness in a Chinese input system and a Chinese speech recognition system. The research is partly supported by the National Natural Science Foundation of China under the project "The Study of Non-Linear Features of Speech Based on Re-producing Kernel", grant No. 60572125. This project investigates new features of speech and exploits them in speech recognition, with the aim of improving the performance, especially the accuracy, of Mandarin speech recognition systems. The author began research on Chinese information processing in the early 1990s. In the last few years, his research groups have completed many works, and some of their papers and reports have appeared in important domestic and overseas journals and proceedings, such as Journal of Chinese Information Processing, Journal of Software, and Int. Conference on Audio, Speech and Signal Processing. |
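
The abstract names word lattice decoding with an n-gram language model but gives no details. As an illustration only, the following is a minimal sketch of how a word lattice could be built from a Pinyin syllable sequence and decoded with a Viterbi-style search. It assumes a plain bigram model, not the paper's hybrid language model, and the lexicon, probabilities, and function names (`LEXICON`, `BIGRAM`, `build_lattice`, `decode`) are all hypothetical.

```python
import math

# Hypothetical toy lexicon: Pinyin syllable tuple -> candidate words.
LEXICON = {
    ("zhong",): ["中", "种"],
    ("guo",): ["国", "果"],
    ("zhong", "guo"): ["中国"],
    ("ren",): ["人"],
}

# Hypothetical bigram log-probabilities; "<s>" marks the sentence start.
BIGRAM = {
    ("<s>", "中国"): math.log(0.5),
    ("中国", "人"): math.log(0.6),
    ("<s>", "中"): math.log(0.2),
    ("中", "国"): math.log(0.3),
    ("国", "人"): math.log(0.2),
}
UNSEEN = math.log(1e-4)  # crude floor for unseen bigrams (assumption)


def build_lattice(syllables, max_len=4):
    """Return edges[start] = [(end, word), ...] for every lexicon match."""
    n = len(syllables)
    edges = {i: [] for i in range(n)}
    for i in range(n):
        for j in range(i + 1, min(n, i + max_len) + 1):
            for word in LEXICON.get(tuple(syllables[i:j]), []):
                edges[i].append((j, word))
    return edges


def decode(syllables):
    """Viterbi search over the lattice; a state is (position, last word)."""
    n = len(syllables)
    edges = build_lattice(syllables)
    best = {(0, "<s>"): (0.0, None)}  # state -> (score, back-pointer)
    for i in range(n):
        # Snapshot the states ending at position i before extending them.
        states = [(k, v) for k, v in best.items() if k[0] == i]
        for (pos, prev), (score, _) in states:
            for end, word in edges[i]:
                cand = score + BIGRAM.get((prev, word), UNSEEN)
                key = (end, word)
                if key not in best or cand > best[key][0]:
                    best[key] = (cand, (pos, prev))
    finals = [(v[0], k) for k, v in best.items() if k[0] == n]
    if not finals:
        return []  # no path covers the whole input
    _, state = max(finals)
    words = []
    while state != (0, "<s>"):
        words.append(state[1])
        state = best[state][1]
    return words[::-1]


print(decode(["zhong", "guo", "ren"]))
```

In this toy setting the two-syllable word wins over the character-by-character path because its bigram scores are higher. A system of the kind described above would presumably replace the fixed `BIGRAM` table with a larger model and could adjust scores with user-specific information such as recent selections.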