Chinese Journal of Computers
Title: A Non-Uniform Clustering Synthesis Instances Pruning Approach for Corpus-Based TTS
Authors: ZHANG Wei 1),2); WU Xiao-Ru 3); LIU Jiang 3); WANG Ren-Hua 2)
Address: 1) (Department of Computer Science, Ocean University of China, Qingdao, Shandong 266100)
2) (Department of Electronic Engineering and Information Science, University of Science & Technology of China, Hefei 230027)
3) (Anhui USTC Iflytek Co., Ltd., Hefei 230088)
Year: 2007
Issue: No. 11 (2017-2024)
Abstract & Background
Abstract: Non-uniform units greatly help corpus-based TTS synthesize highly natural speech. However, tailoring a TTS voice font, or pruning redundant synthesis instances, usually results in the loss of non-uniform units. To solve this problem, this paper proposes an algorithm named NuClustering-VPA. In this algorithm, the high-level non-uniform units containing the same syllables are clustered around several centers, and the centers are then projected onto the low-level non-uniform units, so that the centers' projections guide the clustering of the low-level units. This sequence of steps avoids erasing or destroying the non-uniform units that are key to synthesis. In experiments, the naturalness measured by MOS does not degrade severely when the reduction rate is above 39.63%. The approach has been applied in software products of Iflytek Co., Ltd.

Keywords: corpus-based TTS; tailoring TTS voice font; pruning redundant synthesis instances; scalable TTS

Background: This work is part of the Scalable Speech Synthesis (Text-to-Speech) System project, which is supported by the National Natural Science Foundation of China (60602017) and the National High Technology Research and Development Program (863 Program) of China (2004AA114030). The research addresses the theory and key techniques of pruning corpus redundancy and of making the TTS system scalable to different hardware. The Iflytek research team that carries out this project has built a Corpus-Based Continuous Chinese-English Text-to-Speech Engine, which was awarded best performance in the routine national "863" evaluation.
This paper addresses the problem of tailoring a TTS voice font, or pruning redundant synthesis instances, without severely degrading the naturalness measured by MOS. The result can therefore be used to analyze the redundancy of a large corpus and to shrink the database of synthesis instances to fit the target hardware.
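
The cluster-then-project idea summarized in the abstract can be illustrated with a small sketch. This is not the NuClustering-VPA algorithm itself, whose details are not given on this page. The data model (`Syl`, `Unit`), the feature tuples, the parameters `k_high` and `k_low`, and the use of farthest-first selection as a stand-in for the clustering step are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from math import dist  # Euclidean distance (Python 3.8+)

@dataclass(frozen=True)
class Syl:
    """Low-level instance: one recorded syllable (hypothetical model)."""
    uid: int
    label: str
    feat: tuple  # assumed prosodic features, e.g. (pitch, duration)

@dataclass(frozen=True)
class Unit:
    """High-level non-uniform unit: a run of consecutive syllables."""
    syls: tuple  # tuple of Syl
    feat: tuple

def pick_centers(items, k, seeds=()):
    """Farthest-first selection, a simple stand-in for a clustering step.

    Seeds are always retained, even if there are more than k of them.
    """
    centers = list(dict.fromkeys(seeds))  # de-duplicate, keep order
    pool = [i for i in items if i not in centers]
    if not centers and pool:
        # Start from the medoid: smallest total distance to all items.
        first = min(pool, key=lambda i: sum(dist(i.feat, j.feat) for j in items))
        centers.append(first)
        pool.remove(first)
    while len(centers) < k and pool:
        # Add the item farthest from its nearest already-chosen center.
        nxt = max(pool, key=lambda i: min(dist(i.feat, c.feat) for c in centers))
        centers.append(nxt)
        pool.remove(nxt)
    return centers

def prune(syllables, units, k_high=1, k_low=2):
    """Return the uids of the syllable instances to keep."""
    kept = set()
    for label in sorted({s.label for s in syllables}):
        # 1) Cluster the high-level units that contain this syllable.
        high = [u for u in units if any(s.label == label for s in u.syls)]
        centers = pick_centers(high, k_high)
        # 2) Project the chosen centers down to the syllable level.
        seeds = [s for u in centers for s in u.syls if s.label == label]
        # 3) Cluster the low-level instances, guided by the projections.
        low = [s for s in syllables if s.label == label]
        kept.update(s.uid for s in pick_centers(low, k_low, seeds))
    return kept

# Toy data: four "ma" instances and two "hao" instances.
a1 = Syl(1, "ma", (1.0, 1.0))
a2 = Syl(2, "ma", (1.1, 1.0))
a3 = Syl(3, "ma", (5.0, 5.0))
a4 = Syl(4, "ma", (1.05, 1.0))
b1 = Syl(5, "hao", (2.0, 2.0))
b2 = Syl(6, "hao", (2.1, 2.0))
units = [Unit((a1, b1), (1.5, 1.5)), Unit((a2, b2), (1.6, 1.5))]

kept = prune([a1, a2, a3, a4, b1, b2], units)
print(sorted(kept))
```

In this toy run, the "ma" instance that belongs to the retained high-level unit survives because it is seeded into the low-level step, the outlying instance `a3` is kept as a second representative, and the remaining near-duplicates are pruned. Seeding the low-level step with the projections is what keeps the instances needed by long units from being discarded.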