"It would be a considerable invention indeed, that of
a machine able to mimic speech, with its sounds and articulations.
I think it is not impossible."
Leonhard Euler (1761)
A Matlab tutorial toolbox on corpus-based Text-to-Speech synthesis
Wed 14/07/2004 : After a lot of preparatory work, TTSBOX 1.0 has just been released.
Thu 07/10/2004 : Milos Cernak proposes an add-on, ttsbox1.0.milos.zip, which refines unit concatenation by smoothing the clicks in the signal. Listen to the s3*.wav examples in the add-on. The concatenation uses pitchmark files (in a .\pm directory) obtained with the Edinburgh Speech Tools: the pitchmarks nearest to the diphone boundaries are located first, then a Hann window is applied for fade-in/fade-out smoothing.
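The two steps described above (snap each boundary to its nearest pitchmark, then smooth the join with Hann fades) can be sketched as follows. This is only an illustration written in Python, not the code of the add-on: the function names, the list-based signals and the choice to overlap-add the two fades around aligned pitchmarks are all assumptions.

```python
import math


def nearest_pitchmark(pitchmarks, boundary):
    """Return the pitchmark (a sample index) closest to a unit boundary."""
    return min(pitchmarks, key=lambda pm: abs(pm - boundary))


def hann_ramps(length):
    """Rising and falling halves of a Hann window.

    The half-sample offset makes the two ramps sum to 1 at every sample.
    """
    rise = [0.5 * (1.0 - math.cos(math.pi * (n + 0.5) / length))
            for n in range(length)]
    return rise, rise[::-1]


def join_units(left, right, left_pms, right_pms, left_end, right_start, fade_len):
    """Cross-fade two units around the pitchmarks nearest to their boundaries.

    Assumption: the fade is an overlap-add centred on the two pitchmarks,
    so that the pitchmark of the left unit lands on that of the right unit.
    """
    cut_l = nearest_pitchmark(left_pms, left_end)
    cut_r = nearest_pitchmark(right_pms, right_start)
    # Half-width of the overlap, shrunk if a unit is too short around its cut.
    half = min(fade_len // 2, cut_l, len(left) - cut_l, cut_r, len(right) - cut_r)
    fade_in, fade_out = hann_ramps(2 * half)
    overlap = [
        left[cut_l - half + n] * fade_out[n] + right[cut_r - half + n] * fade_in[n]
        for n in range(2 * half)
    ]
    return left[:cut_l - half] + overlap + right[cut_r + half:]
```

Because the two ramps are complementary, a signal that is identical on both sides of the join passes through unchanged; only a mismatch between the units is smoothed.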
Mon 21/03/2005 : T. Dutoit and M. Cernak presented TTSBOX in a poster session at ICASSP'05, Philadelphia. TTSBOX (the paper and the complete Matlab toolbox) is included on the ICASSP'05 CD-ROM. Get the ICASSP'05 poster here.
Thu 02/06/2005 : TTSBOX 1.1 : David Dorran proposes a modified concatenation function, which is now used as the standard (the old file has been renamed "tts_concantenate_using_xorr_old" so that you can hear the difference). It improves the concatenation of non-consecutive units by (1) extending the range of the correlation function to twice the duration of the longest likely pitch period (approx. 10 ms), i.e. about 300 samples in a 16 kHz signal, and (2) preventing the correlation search range from being too small (less than 5 ms).
Together, these two changes ensure that a good overlap position is identified, and they considerably improve the final quality. This is a clear example of the importance of fine-tuning a good initial idea so that it works optimally.
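The idea behind these two changes can be sketched as a correlation search whose lag range is bounded from both sides. The Python below is a minimal illustration under stated assumptions, not the toolbox's concatenation function: the names `search_range` and `best_overlap_lag`, the use of normalized correlation and the zero-padding past the end of a unit are all hypothetical. With the values quoted above, twice a 10 ms pitch period is 320 samples at 16 kHz (consistent with the "about 300" figure), and the 5 ms floor is 80 samples.

```python
import math


def search_range(fs, available, max_pitch_period_s=0.010, min_range_s=0.005):
    """Number of lags to search, in samples.

    Aim for twice the longest likely pitch period, limited by the samples
    available, but never go below the minimum range.
    """
    target = round(2 * max_pitch_period_s * fs)
    floor = round(min_range_s * fs)
    return max(floor, min(target, available))


def normalized_xcorr(a, b, lag, win):
    """Normalized correlation of a[lag:lag+win] with b[:win].

    Samples past the end of either signal are treated as zero.
    """
    num = energy_a = energy_b = 0.0
    for n in range(win):
        x = a[lag + n] if lag + n < len(a) else 0.0
        y = b[n] if n < len(b) else 0.0
        num += x * y
        energy_a += x * x
        energy_b += y * y
    denom = math.sqrt(energy_a * energy_b)
    return num / denom if denom > 0.0 else 0.0


def best_overlap_lag(a, b, fs, win):
    """Lag into `a` at which the start of `b` correlates best."""
    n_lags = search_range(fs, max(len(a) - win, 0))
    scores = [normalized_xcorr(a, b, lag, win) for lag in range(n_lags + 1)]
    return max(range(len(scores)), key=scores.__getitem__)
```

A search range shorter than one pitch period can miss the true alignment entirely, which is why the sketch enforces a floor; a range of two periods guarantees that at least one full period is covered whatever the phase of the two units.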