TTS Synthesis with the MBROLA algorithm

Together with three other TTS systems based on the same diphone database, the Multi-Band Re-synthesis OverLap-Add (MBROLA) concatenative synthesizer demonstrated here was used for a general comparison of speech models in the context of TTS synthesis, reported in:

"High Quality Text-To-Speech Synthesis: A Comparison of Four Candidate Algorithms", T. Dutoit, Proc. ICASSP'94, Adelaide, Australia, 19-22 April 1994, vol. 1, pp. 565-568. (PostScript file of a draft version: 36 KB).

The MBROLA TTS synthesizer has recently been made available to Internet users as part of the MBROLA Project.

Demo files (16 kHz/16 bits - SUN AU format)

IMPORTANT: To test the segmental quality of this concatenation-based synthesizer independently of suprasegmental effects, we provided it with prosodic information stylized directly from a natural pronunciation of the text.

For example, "bonjour.raw" was obtained from the following input file (bonjour.pho):

_ 51 25 114
b 62
on 127 48 170
j 110 53 116
ou 211
r 150 50 91
_ 91

Each line contains a phoneme name, a duration (in ms), and a series of zero or more pitch pattern points. Each point consists of two integers: the position of the point within the phoneme (in % of the phoneme's total duration) and the pitch value (in Hz) at that position. Hence, the first line of bonjour.pho:

_ 51 25 114

tells the synthesizer to produce a silence of 51 ms and to place a pitch pattern point of 114 Hz at 25% of those 51 ms, that is, 12.75 ms into the silence. Pitch pattern points define a piecewise linear pitch curve: successive points are joined by straight line segments.
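The format described above can be illustrated with a short Python sketch. It is not part of MBROLA: the function names (parse_pho, pitch_targets, pitch_at) are hypothetical, and holding the pitch constant before the first point and after the last one is an assumption made only for this illustration, since the text above does not say how the synthesizer treats those regions.

```python
from typing import List, Tuple

Phoneme = Tuple[str, int, List[Tuple[int, int]]]

# The example input quoted above (bonjour.pho).
PHO_TEXT = """\
_ 51 25 114
b 62
on 127 48 170
j 110 53 116
ou 211
r 150 50 91
_ 91
"""


def parse_pho(text: str) -> List[Phoneme]:
    """Return (name, duration_ms, [(position_percent, pitch_hz), ...]) per line."""
    phonemes = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        name, duration = fields[0], int(fields[1])
        values = [int(v) for v in fields[2:]]
        if len(values) % 2:
            raise ValueError(f"incomplete pitch pattern point in line: {line!r}")
        # Pair up the remaining integers as (position %, pitch Hz).
        points = list(zip(values[0::2], values[1::2]))
        phonemes.append((name, duration, points))
    return phonemes


def pitch_targets(phonemes: List[Phoneme]) -> List[Tuple[float, float]]:
    """Convert per-phoneme points into absolute (time_ms, pitch_hz) pairs."""
    targets = []
    start = 0.0  # start time of the current phoneme, in ms
    for _name, duration, points in phonemes:
        for percent, hz in points:
            targets.append((start + duration * percent / 100.0, float(hz)))
        start += duration
    return targets


def pitch_at(targets: List[Tuple[float, float]], t_ms: float) -> float:
    """Piecewise linear pitch at time t_ms.

    Assumption: pitch is held constant outside the first and last points.
    """
    if not targets:
        raise ValueError("no pitch pattern points")
    if t_ms <= targets[0][0]:
        return targets[0][1]
    for (t0, f0), (t1, f1) in zip(targets, targets[1:]):
        if t_ms <= t1:
            return f0 + (f1 - f0) * (t_ms - t0) / (t1 - t0)
    return targets[-1][1]


if __name__ == "__main__":
    phonemes = parse_pho(PHO_TEXT)
    targets = pitch_targets(phonemes)
    print("total duration (ms):", sum(d for _, d, _ in phonemes))
    for time_ms, hz in targets:
        print(f"{time_ms:8.2f} ms  {hz:6.1f} Hz")
```

Under these assumptions, the first pitch pattern point falls 12.75 ms into the utterance, and phonemes that carry no points of their own (such as "b" and "ou") simply take their pitch from the line segment joining the neighbouring points.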

Last updated December 17, 1999.