A Matlab tutorial toolbox on corpus-based Text-to-Speech synthesis
Download the zipped toolbox (version 1.1): 8MB
Add-on for another way of smoothing concatenation, by M. Cernak (uses pitch marking)
TTSBOX is a MATLAB toolbox for teaching Text-to-Speech synthesis to undergraduate and graduate students. It was designed in the hope of increasing students' personal involvement in their TTS courses. I conceived it while teaching TTS in the EPFL post-graduate course in computer science "Language and Speech Engineering", and later involved my own graduate students at FPMs (Belgium) in its design.
TTSBOX performs the synthesis of Genglish (for "Generic English"), an imaginary language obtained by replacing English words with generic words. Genglish therefore has a rather limited lexicon, but its pronunciation retains most of the problems encountered in natural languages. TTSBOX uses simple data-driven techniques (bigrams, CARTs, NUUs) and keeps the code minimal, so that it remains readable for students with reasonable MATLAB practice.
As the Chinese proverb says: "Tell me and I'll forget. Show me and I'll remember. But involve me and I'll understand."
Text-to-Speech synthesis, however, is a complex combination of language processing, signal processing, and computer science. Students are therefore usually introduced to it in a top-down approach that emphasises the problems to be solved and introduces solutions on paper, but offers little real practice: designing a TTS system takes too much time, and modifying one is usually impossible unless you took part in its design (and even then, only if it was properly documented). Apart from the FESTIVAL TTS system, which uses SCHEME as an interactive language for letting students play with TTS basics, no real "hands-on" toolbox was available, especially for engineering students (who are most often familiar with MATLAB).
If you have created MATLAB files extending TTSBOX, or designed student projects that include it, please drop us a line (thierry.dutoit at fpms.ac.be). We'll create a repository of pointers and data.
I would like to express my warmest thanks to the people who helped me on this project. First, to some of my Master's students at FPMs, who contributed to the tools mentioned in this chapter in several ways: Mathieu Jospin and Grégory Lenoir, who initiated the MATLAB programming of simple CART trees, and Julien Hamaide and Stéphanie Devuyst, who worked on the n-gram tagger (and designed part of the Genglish training and test corpora). I am also indebted to Laurent Couvreur, who segmented the Genglish speech corpus using the HMM/ANN tools at TCTS Lab. Last but not least, many thanks to Milos Cernak at Slovak University of Technology (Bratislava, Slovakia) for his work on NUU synthesis of Genglish (and for pushing me to finish this project!). Thanks also to David Dorran for proposing a fine-tuned xcorr-based smoothing algorithm, which greatly improves the final synthesis quality!