Welcome to the TTS research group homepage


Projects

The Speech Synthesis Research Group of the Faculté Polytechnique de Mons was created in the 1990s and produced several widely distributed tools, mostly in the context of the MBROLA project and its follow-up projects, like MBROLIGN and MBRDICO, and W. The activities of the TTS group later (from 2000 onward) evolved towards the development of the EULER Project (an attempt at a generic, open source TTS solution).
These research activities have been transferred to MULTITEL ASBL, a spin-off R&D Center of the University, which is now in charge of developing the eLiteTTS synthesizer, incorporating the LION corpus-based speech synthesis engine. In parallel, the main IP results of our R&D have been transferred to Babel Technologies, S.A.

In 2003, the activities of the group were retargeted toward the processing of voice quality effects, and the group was renamed "VOQUAL". See the VOQUAL Group Web pages for more info.

Current R&D projects

MAGE / pHTS (2010 - 2013) - PhD Thesis Maria Astrinaki

This project is based on the HMM-Based Speech Synthesis System (HTS), a statistical parametric speech synthesis system in which the vocal tract, vocal source and prosody of speech are modelled simultaneously by HMMs, and the synthetic speech is generated from the HMMs themselves. HTS provides intelligibility and expressivity; it is flexible, easily adapted and has a small footprint, but it is not reactive to real-time user input and control. Going one step further, towards on-the-fly control over the synthesised speech, we developed pHTS (performative HTS), which allows reactive speech synthesis, and MAGE, the engine-independent and thread-safe layer of pHTS that can be used in reactive application designs. This will enable performative creation of synthetic speech, by one or multiple users, on one or multiple platforms, using different user interfaces and applications. This PhD thesis is supported by a public-private partnership between the University of Mons and Acapela Group SA, Belgium.

The DiYSE project (2009 - 2011)

The Do-it-Yourself Smart Experiences project (DiYSE) aims at enabling ordinary people to easily create, set up and control applications in their smart living environments as well as in the public Internet-of-Things space, allowing them to leverage aware services and smart objects for obtaining highly personalised, social, interactive, flowing experiences at home and in the city.

The MediaTIC project (2008 - 2015)

The MediaTIC portfolio was submitted in September 2007 in response to the first call for proposals of the ERDF and started on 1 July 2008. This ambitious project falls within the scope of measure 2.2, dedicated to exploiting the potential of research centres. More concretely, the project's objective is to increase the competitiveness of innovative technology SMEs in Wallonia through collective projects driven by concrete industrial requests. It works as a cross-cutting action for innovation in the NTIC component of each strategic line defined by the Walloon Marshall Plan.
To reach that goal, Multitel, as project leader, has gathered a consortium of academic entities and research centres spread across the Walloon territory. MediaTIC was submitted under both objectives of the 2007-2013 period of the European structural funds programme, namely "Convergence" and "Regional Competitiveness and Employment". The project draws on the know-how of laboratories such as the SEMI, TCTS and Telecommunications units of the Faculté polytechnique de Mons, the TELE laboratory of the Catholic University of Louvain-la-Neuve, the research units in microelectronics (Microsys) and signal & image processing (Intelsig) of the University of Liège, the Centexbel and SIRRIS research centres and, finally, the GIE MUWAC. By calling upon complementary partners, Multitel aimed to give MediaTIC the leverage typical of collaborative research and to let the projects focus on common objectives.
MediaTIC is a portfolio of six integrated projects oriented towards specific industrial needs. Each one is run by a specialist from Multitel in the targeted field. These thematic platforms are Transmedia, Envimedia, Tracemedia, Intermedia, 3Dmedia and Optimedia.

The COST 2102 project (2007 - 2011)

The main objective of the Action is to develop an advanced acoustical, perceptual and psychological analysis of verbal and non-verbal communication signals originating in spontaneous face-to-face interaction, in order to identify algorithms and automatic procedures capable of identifying human emotional states. Several key aspects will be considered, such as the integration of the developed algorithms and procedures for application in telecommunication, and for the recognition of emotional states, gestures, speech and facial expressions, in anticipation of the implementation of intelligent avatars and interactive dialogue systems that could be exploited to improve user access to future telecommunication services.

The Edutain project (2004 - 2008)

The R&D activities of TCTS Lab in the area of edutainment and speech communication have led to the development of real-time voice interfaces based on acoustic features. Such tools play an important part in the voice control of information systems, as studied in a multi-modal perspective by the SIMILAR European Network of Excellence.

The MAIS project (2004 - 2007)

The objective of MAIS is to develop a low-cost, low-consumption, secure smart card that will be readable from a distance. The main applications of the project will be freight train traceability and inclusion in windshields. For this last application, the project partners work in close collaboration with Glaverbel.

The STRUT project (1996 - 2000)

The Speech Training and Recognition Unified Tool (STRUT) has been developed for research on speech recognition and for fast development and testing of related applications. The software can perform speech analysis, model training and speech recognition. The tool consists of many "independent" small pieces of code, one for each identified module in the speech recognition process: sampling, feature extraction, clustering, probability estimation, and decoding. It is now being extended (version 2.0) in collaboration with MULTITEL ASBL.


Past R&D projects

The KWS Predict project (2007 - 2008)

Automatic speech recognition is of great importance for the automatic indexing of audiovisual documents. Indexing broadcast news over long periods of time is a challenge from a vocabulary point of view, because of new words, new names and new places. Techniques for updating LVCSR language models (vocabulary and grammar) are therefore necessary. An alternative to LVCSR is keyword spotting, which only requires the phonetic transcription of the new words to be detected. Not all keywords are equally "detectable". The work focuses on predicting keyword spotting performance, and on improving keyword spotting accuracy by adapting decision parameters given a priori information on the words to be detected.

The IRMA project (2005 - 2008)

The objective of IRMA is to design and develop an innovative modular interface for personalised, efficient, secure and economically viable multimodal search and navigation in indexed audiovisual databases. It will allow contextual, intuitive and natural search, complemented by fluid navigation. In this way, IRMA will provide an environment that makes the best possible use of the intelligence of the search engine's user.

The IC&C project (2004 - 2006)

The IC&C project aims to develop a natural human-machine interface for computer-aided drawing and design systems. In contrast to conventional interfaces such as mice, keyboards, icons and menus, the IC&C project proposes a novel interface based on software agents that combine the interpretation of freehand sketches, image interpretation and speech recognition.

The DOMINI project (2004 - 2006)

This project deals with the development of computerized medical records, which calls upon competences in hospital needs analysis, mastery of data-processing technologies, and computational linguistics. It also requires taking into account the legal aspects of protecting privacy and medical data.

The MODIVOC project (2002 - 2004)

Speech-based interfaces are about to be used in many applications, the most demanding requirement being the ability to recognize any person (without prior training of the machine), even in noisy conditions. The techniques required to achieve this are mostly available, but their use in real portable applications is limited by their memory and CPU consumption. MODIVOC aims at:

  • simplifying ASR algorithms
  • increasing their robustness
  • dispatching CPU load among portable computers in a network
  • specifying generic models to apply this solution in heterogeneous environments

The COST 278 project (2001 - 2008)

The main objective of this Action is to create knowledge in several problem areas of spoken language interaction in telecommunications in order to achieve communicative interfaces that provide a natural human-computer interaction through more cognitive, intuitive and robust interfaces, whether monolingual, multilingual or multimodal. The scientific programme emphasises speech and dialogue processing in multimodal communication interfaces, issues related to robustness and multilinguality, human-computer dialogue theories, and models and systems and associated tools for the establishment of interactive systems. The programme also involves the evaluation of telecommunication applications in which spoken language is the only or one of many types of input or output modalities.

The ARTHUR project (2000 - 2003)

The ARTHUR prototype system will be a point of convergence for the most advanced information technology research groups of the Walloon Region, around the theme of intelligent and user-friendly information technologies. By focusing on one specific activity, assisting an emergency physician during interventions, it becomes possible to model a complete chain in an integrated and original way, including research in areas as topical as intelligent voice-driven human-machine interfaces, multicast for secure communications, the creation and storage of active and secure multimedia documents, and user-friendly graphical interfaces.

DIALOGUE (2000 - 2004) - PhD Thesis Olivier Pietquin

This book addresses the problems of spoken dialogue system design and especially automatic learning of optimal strategies for man-machine dialogues. Besides the description of the learning methods, this text proposes a framework for realistic simulation of human-machine dialogues based on probabilistic techniques, which allows automatic evaluation and unsupervised learning of dialogue strategies. This framework relies on stochastic modelling of modules composing spoken dialogue systems as well as on user modelling. Special care has been taken to build models that can either be hand-tuned or learned from generic data.

CONFIDENCE (2000 - 2004) - PhD Thesis Erhan Mengusoglu

Confidence measures for the results of speech/speaker recognition make the systems more useful in real-time applications. Confidence measures provide a test statistic for accepting or rejecting the recognition hypothesis of the speech/speaker recognition system. Speech/speaker recognition systems are usually based on statistical modeling techniques. In this thesis we defined confidence measures for the statistical modeling techniques used in speech/speaker recognition systems. For speech recognition, we tested available confidence measures and the newly defined acoustic prior information based confidence measure in two different conditions which cause errors: out-of-vocabulary words and the presence of additive noise. We showed that the newly defined confidence measure performs better in both tests. A review of speech recognition and speaker recognition techniques and some related statistical methods is given throughout the thesis. We also defined a new interpretation technique for confidence measures, based on the Fisher transformation of likelihood ratios obtained in speaker verification. This transformation provided us with a linearly interpretable confidence level which can be used directly in real-time applications such as dialog management. We also tested the confidence measures for speaker verification systems and evaluated the efficiency of the confidence measures for adaptation of speaker models. We showed that using confidence measures to select adaptation data improves the accuracy of the speaker model adaptation process. Another contribution of this thesis is the preparation of a phonetically rich continuous speech database for the Turkish language. The database is used for developing an HMM/MLP hybrid speech recognition system for the Turkish language.
Experiments on the test sets of the database showed that the speech recognition system has good accuracy for long speech sequences, while performance is lower for short words, as is the case for current speech recognition systems for other languages. A new language modeling technique for the Turkish language is introduced in this thesis, which can also be used for other agglutinative languages. Performance evaluations showed that the newly defined language modeling technique outperforms the classical n-gram language modeling technique.

The RESPITE project (1999 - 2002)

REcognition of Speech by Partial Information TEchniques ESPRIT Long Term Research RTD Project Ref. 28149.
RESPITE extended and applied two novel technologies, missing data theory and multi-stream theory, to the problem of robust automatic speech recognition (ASR), with particular application to cellular phones and in-car environments. It also supported studies whose purpose was to inform this endeavour. The specific measurable objectives were to:

  • develop techniques for identifying reliable data,
  • advance the theory of multi-stream processing,
  • advance the theory of missing and masked data handling,
  • inform the above by obtaining new perceptual data on speech recognition,
  • combine missing data and multi-stream processing with existing robust ASR methods,
  • evaluate all this within a framework of demonstrator ASR applications to cellular phones and in cars.
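
As a rough illustration of the missing data idea mentioned above, the following minimal Python sketch scores a feature frame against a diagonal-covariance Gaussian while ignoring the components flagged as unreliable. This is one common textbook formulation (marginalisation) and is not claimed to be the method used in RESPITE; the function names, the two-band frame and the reliability mask are all hypothetical.

```python
import math
from typing import Sequence


def log_gauss_1d(x: float, mean: float, var: float) -> float:
    """Log density of a one-dimensional Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def missing_data_loglik(
    x: Sequence[float],
    mean: Sequence[float],
    var: Sequence[float],
    reliable: Sequence[bool],
) -> float:
    """Score a frame using only the components flagged as reliable.

    With a diagonal covariance, integrating an unreliable component out of
    the Gaussian contributes a factor of 1, so it is simply skipped.
    """
    return sum(
        log_gauss_1d(xi, mi, vi)
        for xi, mi, vi, ok in zip(x, mean, var, reliable)
        if ok
    )


# Hypothetical two-band frame whose second band is swamped by noise.
frame = [0.0, 100.0]
model_mean = [0.0, 0.0]
model_var = [1.0, 1.0]

full_score = missing_data_loglik(frame, model_mean, model_var, [True, True])
masked_score = missing_data_loglik(frame, model_mean, model_var, [True, False])
print(full_score, masked_score)
```

In this toy case the masked score depends only on the clean band, so the corrupted band no longer drags the likelihood down. How the reliability mask is obtained is a separate problem (the first objective in the list above) and is not addressed by this sketch.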

The DEMOSTHENES project (1998 - 1999)

Acquiring a good command of spoken Dutch is a non-trivial task for most French-speaking learners of the language. To this end, two Belgian research teams have combined their expertise in speech recognition (Polytechnique - Mons) and software development for foreign language learning (Namur University) to produce multimedia courseware for Dutch pronunciation, which detects and corrects the typical errors made by French-speaking learners, using the hybrid HMM/ANN systems mastered at TCTS Lab. The final product discriminates pronunciation errors at the phoneme level.

The THISL project (1997 - 2000)

Thematic Indexing of Spoken Language (EC RTD Long Term Research Project 23495)
The aim of the THISL project was to produce a broadcast news retrieval demonstrator for the BBC. The approach adopted was to transcribe radio and television broadcasts using the Abbot speech recognizer and then to index the resulting transcriptions using the thislIR information retrieval system - similar to a web search engine - which allows users to search for news items of interest to them. ThislIR returns a list of news clips most relevant to each query which users can listen to. Demonstrators have been produced with both text and spoken query interfaces.

The SPRACH project (1995 - 1998)

SPeech Recognition Algorithms for Connectionist Hybrids (ESPRIT Long Term Research RTD Project Ref. 20077)
The goal of the proposed project is to further improve the current state-of-the-art in continuous speech recognition using Artificial Neural Network (ANN) and Hidden Markov Model (HMM) approaches. Pursuing the theoretical and development work successfully carried out under the WERNICKE project (ESPRIT Basic Research Project 6487, October 1992-October 1995), this new project, referred to as SPRACH (SPeech Recognition Algorithms for Connectionist Hybrids), will extend the research to robust and flexible speech recognition systems that can easily be adapted to new languages and new domains with new lexica and new syntaxes.

The COST 250 project (1995 - 2000)

Speaker Recognition in Telephony

The COST 249 project (1994 - 2000)

The main objective of the project is to co-ordinate research efforts in the area of multilingual continuous speech recognition for future public network services. This will be accomplished by establishing a unified language-independent speech recognition concept, and by investigating specific topics within the framework of this concept. This way it should be possible to validate the partners' efforts in signal processing, statistical pattern recognition and linguistic processing in a more unified way.

The OOBP project (1994 - 2005)

OOBP is a programming paradigm developed at TCTS Lab since 1994. It is defined as Object Oriented Programming around processes and combines OOP and block descriptions. Plug and Play Software extends OOBP by defining input and output data as abstract streams.
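
The general idea of processing blocks connected by abstract streams can be sketched in a few lines of Python. This is only an illustration of the concept under our own assumptions, not the OOBP or Plug and Play Software API: the `Block` class, the `connect` helper and the example blocks are hypothetical names.

```python
from typing import Callable, Iterable, Iterator


class Block:
    """A processing block: consumes one abstract stream, yields another."""

    def __init__(self, fn: Callable[[float], float]) -> None:
        self.fn = fn

    def process(self, stream: Iterable[float]) -> Iterator[float]:
        # Transform items lazily, so a block never needs the whole signal.
        for item in stream:
            yield self.fn(item)


def connect(source: Iterable[float], *blocks: Block) -> Iterator[float]:
    """Chain blocks so each one reads the previous block's output stream."""
    stream: Iterable[float] = source
    for block in blocks:
        stream = block.process(stream)
    return iter(stream)


# Two hypothetical blocks wired in series: a gain stage, then an offset.
gain = Block(lambda x: 2.0 * x)
offset = Block(lambda x: x + 1.0)
result = list(connect([1.0, 2.0, 3.0], gain, offset))
print(result)
```

Because each block only sees an iterable, the source could equally be a file reader or a live audio callback; the blocks themselves would not change.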

The HIMARNNET project (1993 - 1995)

The project covers the development and assessment of neural network techniques for improving the robustness of medium vocabulary (50-100 words), speaker-independent, isolated word recognisers for telephone transmission quality speech. The dominant technology is Hidden Markov Models (HMMs), but this has significant limitations, some of which could be alleviated by the judicious use of artificial neural networks (ANNs) or hybrid combinations of both techniques. Direct comparisons of ANN-based, HMM-based, and hybrid ANN/HMM techniques for speech recognition will be made. The developments will be integrated and validated in the context of a telephone application including speech recognition capabilities. A number of prototypes have been demonstrated on low-cost commodity systems. The telephone application developed within the project will be the basis for product development by Tedas.