The MBROLA Algorithm

The MBROLA synthesizer uses a synthesis method that is itself called MBROLA. The method has not been published yet, but it is inspired by the MBR-PSOLA algorithm described in:

T. DUTOIT, H. LEICH, "MBR-PSOLA: Text-To-Speech Synthesis based on an MBE Re-Synthesis of the Segments Database", Speech Communication, Elsevier Publisher, November 1993, vol. 13, no. 3-4.

and in more detail in:

T. DUTOIT, "An Introduction to Text-To-Speech Synthesis", Kluwer Academic Publishers, Dordrecht, April 1997, hardbound, 312 pp., ISBN 0-7923-4498-7.

The MBROLA algorithm produces speech by concatenating elementary speech units called diphones. Unlike other time-domain methods such as PSOLA-TD (TM of France Telecom), it uses a diphone database specially adapted to the requirements of the synthesizer. This database is obtained by applying a complex processing step (a hybrid Harmonic/Stochastic analysis-synthesis) to an original diphone database, i.e., a database composed of speech samples. The resulting synthesis technique takes advantage of the flexibility of parametric speech models while keeping the computational simplicity of time-domain synthesizers. It therefore compares favourably with other methods, for the following reasons:

  • Its computational complexity has been kept as low as 7 operations per sample on average, while still allowing the synthesizer to apply spectral smoothing in the time domain between neighbouring segments, a feature unique to MBROLA. MBROLA is indeed a time-domain algorithm with outstanding diphone smoothing capabilities, owing to the particular format of the diphone database.
  • As a result, MBROLA-based speech is more fluid, so that even diphones (as opposed to, e.g., triphones) produce high-quality synthetic speech. Moreover, there is no need to optimize the original diphone database through the classical, painstaking trial-and-error process of rejecting and re-recording bad diphones (i.e., segments that introduce large discontinuities when concatenated with others). The quality available with the French database FR1, for instance, was obtained in one go. This feature is precisely what makes MBROLA the best candidate for multilingual synthesis using diphone databases developed all over the world.
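
The two ideas above, diphone concatenation and time-domain smoothing at segment boundaries, can be illustrated with a small Python sketch. Since MBROLA itself is unpublished, this is not the MBROLA algorithm: it is only a toy model that assumes every unit is stored as a list of frames of identical length, and that the mismatch at a joint can be spread linearly over a few frames on each side. The names `to_diphones`, `smooth_join` and `span`, as well as the hyphenated diphone labels, are hypothetical and chosen for illustration only.

```python
from typing import List

Frame = List[float]  # assumption: one fixed-length frame of samples


def to_diphones(phonemes: List[str]) -> List[str]:
    """Pair each phoneme with its successor to name the diphones needed."""
    return [f"{a}-{b}" for a, b in zip(phonemes, phonemes[1:])]


def smooth_join(left: List[Frame], right: List[Frame], span: int) -> List[Frame]:
    """Concatenate two non-empty units, spreading the boundary mismatch
    over at most `span` frames on each side of the joint."""
    span = min(span, len(left), len(right))
    # Sample-by-sample mismatch between the two frames that meet at the joint.
    diff = [r - l for l, r in zip(left[-1], right[0])]
    out_left = [frame[:] for frame in left]
    out_right = [frame[:] for frame in right]
    for j in range(span):
        # The weight is 0.5 at the joint and decays linearly with distance j.
        w = 0.5 * (span - j) / span
        out_left[-1 - j] = [s + w * d for s, d in zip(left[-1 - j], diff)]
        out_right[j] = [s - w * d for s, d in zip(right[j], diff)]
    return out_left + out_right


if __name__ == "__main__":
    print(to_diphones(["_", "b", "o", "_"]))
    unit_a = [[0.0, 0.0], [0.0, 0.0]]
    unit_b = [[1.0, 2.0], [1.0, 2.0]]
    for frame in smooth_join(unit_a, unit_b, span=2):
        print(frame)
```

In this sketch the two frames that meet at the joint are both pulled to their midpoint, and the correction fades out over the neighbouring frames. The work per sample is one multiplication and one addition in the smoothed region, which is in the spirit of the low per-sample cost claimed above, though the real figure of 7 operations per sample refers to MBROLA itself and not to this toy.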

Last updated December 17, 1999. Send comments to the Mbrola Team.