A Short Introduction to Text-to-Speech Synthesis

by Thierry Dutoit

Voice Quality research team, TCTS Lab.

As the Chinese proverb says: "Tell me and I'll forget. Show me and I'll remember. But involve me and I'll understand."

Text-to-Speech synthesis, however, is a complex combination of language processing, signal processing, and computer science. Students are therefore usually introduced to it in a top-down approach that emphasises the problems to be solved and presents solutions on paper, but offers little real practice: designing a TTS system takes too much time, and modifying one is usually impossible unless you took part in its design (and even then, only if it was correctly documented).

You will find here a short but comprehensive introduction to state-of-the-art Text-to-Speech (TTS) synthesis, which I wrote when designing TTSBOX, a Matlab toolbox for teaching Text-to-Speech synthesis to undergraduate and graduate students.

You can also find an older and more "top-down" introduction to TTS here.

For a much more detailed introduction to the subject, the reader is invited to refer to my book on TTS synthesis (Dutoit, 1996).