Projects
The Speech Synthesis Research Group of the Faculté Polytechnique de Mons was created in the 1990s and produced several widely used tools, mostly in the context of the MBROLA project and its follow-up projects, like MBROLIGN and MBRDICO, and W. From 2000 onward, the activities of the TTS group evolved toward the development of the EULER Project (an attempt at a generic, open source TTS solution).
In 2003, the activities of the group were retargeted toward the processing of voice quality effects, and the group was renamed "VOQUAL". See the VOQUAL Group Web pages for more info.
Current R&D projects
This project is based on the HMM-Based Speech Synthesis System (HTS), a statistical parametric speech synthesis system in which the vocal tract, vocal source and prosody of speech are modelled simultaneously by HMMs, and the synthetic speech is generated from the HMMs themselves. HTS provides intelligibility and expressivity; it is flexible, easily adapted and has a small footprint, but it does not react to real-time user input and control. Going one step further, toward on-the-fly control over the synthesised speech, we developed pHTS (performative HTS), which allows reactive speech synthesis, and MAGE, the engine-independent and thread-safe layer of pHTS that can be used in reactive application designs. This will enable performative creation of synthetic speech by a single user or multiple users, on one or multiple platforms, using different user interfaces and applications. This PhD thesis is supported by a public-private partnership between University of Mons and Acapela Group SA, Belgium.
The Do-it-Yourself Smart Experiences project (DiYSE) aims at enabling ordinary people to easily create, setup and control applications in their smart living environments as well as in the public Internet-of-Things space, allowing them to leverage aware services and smart objects for obtaining highly personalised, social, interactive, flowing experiences at home and in the city.
The MediaTIC portfolio was submitted in September 2007 in response to the first call for proposals of the ERDF and started on 1st July, 2008. This ambitious project falls within the scope of measure 2.2, dedicated to the exploitation of the potential of research centres. More concretely, the project's objective is to increase the competitiveness of innovative technological SMEs in Wallonia through collective projects driven by concrete industrial requests. It works as a cross-cutting action for innovation in the NTIC component of each strategic line defined by the Walloon Marshall Plan.
The main objective of the Action is to develop an advanced acoustical, perceptual and psychological analysis of verbal and non-verbal communication signals originating in spontaneous face-to-face interaction, in order to identify algorithms and automatic procedures capable of identifying human emotional states. Several key aspects will be considered, such as the integration of the developed algorithms and procedures for application in telecommunication, and for the recognition of emotional states, gestures, speech and facial expressions, in anticipation of the implementation of intelligent avatars and interactive dialogue systems that could be exploited to improve user access to future telecommunication services.
The R&D activities of TCTS Lab in the area of edutainment and speech communication have led to the development of real-time voice interfaces based on acoustic features. Such tools play an important part in the voice control of information systems, as studied in a multi-modal perspective by the SIMILAR European Network of Excellence.
The objective of MAIS is to develop a low-cost, low-power, secure smart card that can be read from a distance. The main applications of the project will be freight train traceability and inclusion in windshields. For the latter application, the project partners work in close collaboration with Glaverbel.
The Speech Training and Recognition Unified Tool (STRUT) has been developed for research on speech recognition and for fast development and testing of related applications. The software performs speech analysis, model training and speech recognition. The tool consists of many "independent" small pieces of code, one for each identified module in the speech recognition process: sampling, feature extraction, clustering, probability estimation, and decoding. It is now being extended (version 2.0) in collaboration with MULTITEL ASBL.
Past R&D projects
Automatic speech recognition is of great importance in the field of automatic indexing of audiovisual documents. Indexing broadcast news collected over long periods of time is a challenge from a vocabulary point of view, because of new words, new names and new places. Techniques for updating LVCSR language models (vocabulary and grammar) are therefore necessary. An alternative to LVCSR is keyword spotting, which only needs the phonetic transcription of the new words to be detected. Not all keywords are equal in terms of "detectability". The work focuses on predicting keyword spotting performance, and on improving keyword spotting accuracy by adapting decision parameters given a priori information on the words to be detected.
The objective of IRMA is to design and develop an innovative modular interface for personalized, efficient, secure and economically viable multimodal search and navigation in indexed audiovisual databases. It will allow contextual, intuitive and natural search complemented by fluid navigation. In this way, IRMA will provide an environment that makes the best use of the intelligence of the search engine's user.
The IC&C project aims to develop a natural human-machine interface for computer-aided drawing and design systems. Unlike conventional interfaces such as mice, keyboards, icons and menus, IC&C proposes a novel interface based on software agents that combine the interpretation of freehand sketches, image interpretation and speech recognition.
This project deals with the development of computerized medical records, which calls upon expertise in hospital needs analysis, mastery of data-processing technologies, and computational linguistics. It also requires taking into account the legal aspects of protecting privacy and medical data.
Speech-based interfaces are about to be used in many applications, for which the most demanding requirement is the ability to recognize any person (without prior training of the machine), even in noisy conditions. The techniques required to achieve this are mostly available, but their use in real portable applications is limited by their memory and CPU consumption. MODIVOC aims at:
The main objective of this Action is to create knowledge in several problem areas of spoken language interaction in telecommunications in order to achieve communicative interfaces that provide a natural human-computer interaction through more cognitive, intuitive and robust interfaces, whether monolingual, multilingual or multimodal. The scientific programme emphasises speech and dialogue processing in multimodal communication interfaces, issues related to robustness and multilinguality, human-computer dialogue theories, and models and systems and associated tools for the establishment of interactive systems. The programme also involves the evaluation of telecommunication applications in which spoken language is the only or one of many types of input or output modalities.
The ARTHUR prototype system will be a point of convergence for the Walloon Region's most advanced information technology research groups around the theme of intelligent, user-friendly information technologies. By focusing on one specific activity, assistance to the interventions of an emergency physician, it becomes possible to model a complete chain in an integrated and original way, including research on topics as hot as intelligent voice-driven human-machine interfaces, multicast for secure communications, the creation and storage of active, secure multimedia documents, and user-friendly graphical interfaces.
This book addresses the problems of spoken dialogue system design and especially automatic learning of optimal strategies for man-machine dialogues. Besides the description of the learning methods, this text proposes a framework for realistic simulation of human-machine dialogues based on probabilistic techniques, which allows automatic evaluation and unsupervised learning of dialogue strategies. This framework relies on stochastic modelling of modules composing spoken dialogue systems as well as on user modelling. Special care has been taken to build models that can either be hand-tuned or learned from generic data.
Confidence measures for the results of speech/speaker recognition make these systems more useful in real-time applications. A confidence measure provides a test statistic for accepting or rejecting the recognition hypothesis of the speech/speaker recognition system. Speech/speaker recognition systems are usually based on statistical modeling techniques. In this thesis we defined confidence measures for the statistical modeling techniques used in speech/speaker recognition systems. For speech recognition we tested available confidence measures and the newly defined confidence measure based on acoustic prior information in two conditions that cause errors: out-of-vocabulary words and the presence of additive noise. We showed that the newly defined confidence measure performs better in both tests. A review of speech recognition and speaker recognition techniques and of some related statistical methods is given throughout the thesis. We also defined a new interpretation technique for confidence measures, based on the Fisher transformation of likelihood ratios obtained in speaker verification. The transformation provides a linearly interpretable confidence level that can be used directly in real-time applications such as dialog management. We also tested the confidence measures for speaker verification systems and evaluated their efficiency for the adaptation of speaker models. We showed that using confidence measures to select adaptation data improves the accuracy of the speaker model adaptation process. Another contribution of this thesis is the preparation of a phonetically rich continuous speech database for the Turkish language. The database is used for developing an HMM/MLP hybrid speech recognition system for Turkish.
Experiments on the test sets of the database showed that the speech recognition system has good accuracy for long speech sequences while performance is lower for short words, as is the case for current speech recognition systems for other languages. A new language modeling technique for the Turkish language is introduced in this thesis, which can also be used for other agglutinative languages. Performance evaluations showed that the newly defined language modeling technique outperforms the classical n-gram language modeling technique.
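As an illustration of how a confidence measure can act as a test statistic for accepting or rejecting a hypothesis, here is a minimal sketch in Python. It is not the method defined in the thesis: it assumes one-dimensional Gaussian models for the claimed speaker and for a background population, and uses the average log-likelihood ratio with a fixed threshold. The function names, model parameters and threshold are all hypothetical, and the Fisher transformation step is not reproduced.

```python
import math


def gaussian_log_pdf(x, mean, var):
    """Log-density of a 1-D Gaussian (assumed model family, for illustration only)."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def average_llr(frames, target, background):
    """Average log-likelihood ratio of the frames: target model vs. background model.

    `target` and `background` are hypothetical (mean, variance) pairs.
    """
    total = 0.0
    for x in frames:
        total += gaussian_log_pdf(x, *target) - gaussian_log_pdf(x, *background)
    return total / len(frames)


def accept(frames, target, background, threshold=0.0):
    """Accept the hypothesis when the confidence score exceeds the threshold."""
    return average_llr(frames, target, background) > threshold
```

With a target model `(0.0, 1.0)` and a background model `(3.0, 1.0)`, frames close to 0 give a positive score and are accepted, while frames close to 3 are rejected. In practice the threshold would be tuned on held-out data.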
REcognition of Speech by Partial Information TEchniques ESPRIT Long Term Research RTD Project Ref. 28149.
Acquiring a good command of spoken Dutch is a non-trivial task for most French-speaking learners of the language. To this end, two Belgian research teams have joined their expertise in speech recognition (Polytechnique - Mons) and software development for foreign language learning (Namur University) to produce multimedia courseware for Dutch pronunciation, which detects and corrects the typical errors made by French-speaking learners, using the hybrid HMM/ANN systems mastered at TCTS Lab. The final product discriminates pronunciation errors at the phoneme level.
Thematic Indexing of Spoken Language (EC RTD Long Term Research Project 23495)
SPeech Recognition Algorithms for Connectionist Hybrids (ESPRIT Long Term Research RTD Project Ref. 20077)
Speaker Recognition in Telephony
The main objective of the project is to co-ordinate research efforts in the area of multilingual continuous speech recognition for future public network services. This will be accomplished by establishing a unified language-independent speech recognition concept, and by investigating specific topics within the framework of this concept. This way it should be possible to validate the partners' efforts in signal processing, statistical pattern recognition and linguistic processing in a more unified way.
OOBP is a programming paradigm developed at TCTS Lab since 1994. It is defined as Object Oriented Programming around processes and combines OOP and block descriptions. Plug and Play Software extends OOBP by defining input and output data as abstract streams.
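The idea of processing blocks connected by abstract streams can be sketched as follows. This is only an illustrative reading of the description above, not the TCTS implementation: the class names `Block`, `Gain` and `MovingAverage` and the helper `connect` are hypothetical, and Python iterators stand in for the abstract input and output streams.

```python
from typing import Iterable, Iterator, List


class Block:
    """Hypothetical base class: a process with one input and one output stream."""

    def process(self, stream: Iterable[float]) -> Iterator[float]:
        raise NotImplementedError


class Gain(Block):
    """Multiplies every sample of the input stream by a constant factor."""

    def __init__(self, factor: float) -> None:
        self.factor = factor

    def process(self, stream: Iterable[float]) -> Iterator[float]:
        for sample in stream:
            yield sample * self.factor


class MovingAverage(Block):
    """Averages the last `width` samples seen so far (fewer at the start)."""

    def __init__(self, width: int) -> None:
        self.width = width

    def process(self, stream: Iterable[float]) -> Iterator[float]:
        window: List[float] = []
        for sample in stream:
            window.append(sample)
            if len(window) > self.width:
                window.pop(0)
            yield sum(window) / len(window)


def connect(blocks: List[Block], source: Iterable[float]) -> Iterator[float]:
    """Plug blocks together: the output stream of each feeds the next one."""
    stream: Iterable[float] = source
    for block in blocks:
        stream = block.process(stream)
    return iter(stream)


def run_demo() -> List[float]:
    chain = [Gain(2.0), MovingAverage(2)]
    return list(connect(chain, [1.0, 2.0, 3.0]))
```

Because each block only sees an iterable, the source could equally be a file reader or a live audio buffer without changing the blocks, which is the property the stream abstraction is meant to provide.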
The project covers the development and assessment of neural network techniques for improving the robustness of medium vocabulary (50-100 words), speaker-independent, isolated word recognisers for telephone transmission quality speech. The dominant technology is Hidden Markov Models (HMMs), but this has significant limitations, some of which could be alleviated by the judicious use of artificial neural networks (ANNs) or hybrid combinations of both techniques. Direct comparisons of ANN-based, HMM-based, and hybrid ANN/HMM techniques for speech recognition will be made. The developments will be integrated and validated in the context of a telephone application including speech recognition capabilities. A number of prototypes have been demonstrated on low-cost commodity systems. The telephone application developed within the project will be the basis for product development by Tedas.