TTS Synthesis with the TD-PSOLA algorithm

Together with three other TTS systems based on the same diphone database, the Time-Domain Pitch-Synchronous Overlap-Add (TD-PSOLA) concatenative synthesizer demonstrated here has been used in a general comparison of speech models for TTS synthesis, reported in:

T. Dutoit, "High Quality Text-To-Speech Synthesis: A Comparison of Four Candidate Algorithms", Proc. ICASSP'94, Adelaide, Australia, 19-22 April 1994, vol. 1, pp. 565-568. (PostScript file of a draft version: 36 KB.)
The synthesizer's general organization and parameter settings are detailed in that paper.

Demo files (16 kHz, 16-bit, Sun AU format)

Since this synthesizer is not based on any particular speech model, no analysis algorithm has been applied to the segments. In practice, however, the positions of the pitch markers must still be determined throughout each segment. This was done semi-automatically, with the help of a pitch determination algorithm and dedicated signal editing tools.
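The pitch markers matter because TD-PSOLA works directly on the waveform: it cuts the signal into short grains centred on the analysis pitch marks and overlap-adds them at new (synthesis) pitch marks. The following is a simplified sketch of that overlap-add step, not the configuration used in the paper. The two-period Hann window, the nearest-mark mapping between synthesis and analysis marks, the lack of amplitude normalization and all function names (hann, local_period, nearest_index, td_psola) are assumptions made for illustration. At least two analysis marks are required.

```python
import bisect
import math


def hann(length):
    """Hann window of odd length 2P+1 whose end samples are exactly zero.

    For a constant period P, such windows placed P samples apart sum to
    one, so overlap-add reconstructs the input when marks are unchanged.
    """
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (length - 1))
            for i in range(length)]


def local_period(marks, i):
    """Local pitch period (samples) at analysis mark i; needs >= 2 marks."""
    if i + 1 < len(marks):
        return marks[i + 1] - marks[i]
    return marks[i] - marks[i - 1]


def nearest_index(marks, t):
    """Index of the analysis mark closest in time to sample t."""
    j = bisect.bisect_left(marks, t)
    if j == 0:
        return 0
    if j == len(marks):
        return len(marks) - 1
    return j if marks[j] - t < t - marks[j - 1] else j - 1


def td_psola(signal, analysis_marks, synthesis_marks, n_out):
    """Overlap-add two-period windowed grains at the synthesis marks.

    Each synthesis mark reuses the grain centred on the nearest analysis
    mark, so closer synthesis marks raise the pitch and wider ones lower
    it, while the overall duration stays the same.
    """
    out = [0.0] * n_out
    for t in synthesis_marks:
        i = nearest_index(analysis_marks, t)
        m = analysis_marks[i]
        p = local_period(analysis_marks, i)
        w = hann(2 * p + 1)
        for k in range(-p, p + 1):
            src, dst = m + k, t + k
            if 0 <= src < len(signal) and 0 <= dst < n_out:
                out[dst] += w[k + p] * signal[src]
    return out


if __name__ == "__main__":
    # Toy "voiced" signal with a 10-sample period and one mark per period.
    sig = [math.sin(2 * math.pi * n / 10) for n in range(200)]
    marks = list(range(10, 191, 10))
    same = td_psola(sig, marks, marks, 200)                     # pitch unchanged
    higher = td_psola(sig, marks, list(range(10, 191, 8)), 200)  # period 10 -> 8
    print(max(abs(a - b) for a, b in zip(same[10:191], sig[10:191])))
```

With identical analysis and synthesis marks the interior of the signal is reconstructed exactly, which is a convenient sanity check; the quality of a real TD-PSOLA synthesizer depends heavily on how accurately the marks were placed, hence the semi-automatic marking described above.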

IMPORTANT: In order to test the segmental quality of this concatenation-based synthesizer independently of suprasegmental effects, we provided it with prosodic information stylized directly from a natural pronunciation of the text.

For example, "bonjour.raw" was obtained from the following input file, bonjour.pho:

_ 51 25 114
b 62
on 127 48 170
j 110 53 116
ou 211
r 150 50 91
_ 91

Each line contains a phoneme name, a duration (in ms), and a series (possibly empty) of pitch pattern points, each made of two integers: the position of the point within the phoneme (as a percentage of the phoneme's total duration) and the pitch value (in Hz) at that position. For instance, the first line of bonjour.pho:

_ 51 25 114

tells the synthesizer to produce 51 ms of silence and to place a pitch pattern point of 114 Hz at 25% of those 51 ms, i.e. 12.75 ms into the segment. The pitch pattern points define a piecewise linear pitch curve.
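To make the file format concrete, here is a minimal sketch that reads bonjour.pho, converts each percentage into an absolute time, evaluates the piecewise linear pitch curve, and places output pitch marks one local pitch period apart at 16 kHz. The function names are hypothetical, and two behaviours are assumptions not stated above: the pitch is held constant before the first and after the last pitch point, and voicing is ignored, so marks are placed even in silences.

```python
FS_HZ = 16000  # sampling rate of the demo files

BONJOUR_PHO = """\
_ 51 25 114
b 62
on 127 48 170
j 110 53 116
ou 211
r 150 50 91
_ 91
"""


def parse_pho(text):
    """Return (phoneme, duration_ms, [(percent, hz), ...]) for each line."""
    segments = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        name, duration = fields[0], int(fields[1])
        values = [int(v) for v in fields[2:]]
        if len(values) % 2:
            raise ValueError(f"unpaired pitch value in line: {line!r}")
        segments.append((name, duration, list(zip(values[0::2], values[1::2]))))
    return segments


def pitch_targets(segments):
    """Convert per-phoneme percentages into absolute (time_ms, hz) points."""
    targets, start = [], 0.0
    for _, duration, points in segments:
        for percent, hz in points:
            targets.append((start + percent * duration / 100.0, hz))
        start += duration
    return targets


def pitch_at(targets, t):
    """Piecewise linear pitch (Hz) at time t (ms).

    Assumption: pitch is held constant outside the first/last points.
    """
    if t <= targets[0][0]:
        return targets[0][1]
    if t >= targets[-1][0]:
        return targets[-1][1]
    for (t0, f0), (t1, f1) in zip(targets, targets[1:]):
        if t0 <= t < t1:
            return f0 + (f1 - f0) * (t - t0) / (t1 - t0)
    return targets[-1][1]  # not reached for time-ordered points


def synthesis_marks(targets, total_ms, fs=FS_HZ):
    """Place output pitch marks (sample indices) one local period apart."""
    marks, t = [], 0.0
    while t < total_ms:
        marks.append(round(t * fs / 1000.0))
        t += 1000.0 / pitch_at(targets, t)
    return marks


if __name__ == "__main__":
    segments = parse_pho(BONJOUR_PHO)
    targets = pitch_targets(segments)
    total = sum(duration for _, duration, _ in segments)
    print(targets)
    print(len(synthesis_marks(targets, total)), "pitch marks over", total, "ms")
```

For bonjour.pho the four pitch points fall at 12.75 ms (114 Hz), about 174 ms (170 Hz), about 298 ms (116 Hz) and 636 ms (91 Hz), out of a total duration of 802 ms.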

Last updated December 17, 1999. Send comments to