Current R&D projects
Social Communicative Events Processing (2014 - ) - PhD Thesis Kévin El Haddad
Human-machine interaction is becoming ever more anchored in our daily lives, yet the field remains poorly explored compared with what it could achieve. Human-machine dialogue is one of the most promising sub-domains to develop for easier and more natural interaction. This thesis aims to improve that dialogue on two sides: the machine's expressiveness, and its understanding of users' messages. The main strategy adopted so far is to "teach" the machine to imitate (synthesize) and understand (recognize) human social communicative signals and expressions of emotion, such as fillers, laughter and confusion, together with their "meaning" in as many social contexts as possible.
Thesis Onur Babacan (2010 - ) - PhD Thesis Onur Babacan
Past R&D projects
AVLASYN - Audio-Visual Laughter Synthesis (2012 - 2016) - PhD Thesis Hüseyin Cakmak
Laughter is one of the most important signals in human interaction. It serves various social functions, such as conveying emotion, back-channeling, displaying affiliation or mitigating an unpleasant comment. With advances in human-machine interaction and developments in speech processing, interest in laughter processing has grown over the last decades. Detecting, analyzing and producing laughter have become tasks that a machine should be able to perform. This project aims at producing convincing, synchronized acoustic and visual laughter. Possible application fields include human-machine interaction (smartphones, navigation systems, etc.), video game development, animated film production and humanoid robot control.
The HandSketch project (2012 - 2014)
Development of a new digital musical instrument that will give a musician the possibility to perform synthetic singing on stage.
Degree of Articulation (2009 - 2013) - PhD Thesis Benjamin Picart
Nowadays, speech synthesis is part of various daily life applications. The ultimate goal of such technologies is to extend the possibilities of interaction with the machine, in order to get closer to human-like communication. However, current state-of-the-art systems often lack realism: although high-quality speech synthesis can be produced by many researchers and companies around the world, synthetic voices are generally perceived as hyperarticulated. In any case, their degree of articulation is fixed once and for all.
The present thesis falls within the more general quest for enriching expressivity in speech synthesis. The main idea is to improve statistical parametric speech synthesis, whose best-known example is Hidden Markov Model (HMM) based speech synthesis, by introducing control of the degree of articulation, so that synthesizers can automatically adapt their way of speaking to the contextual situation, as humans do. The degree of articulation, which is probably the least studied prosodic parameter, is characterized by modifications of phonetic context, of speech rate and of spectral dynamics (vocal tract rate of change). It depends on the surrounding environment and the communication context, and provides information on the relationship between the speaker and the listener(s).
The DiYSE project (2009 - 2011)
The Do-it-Yourself Smart Experiences project (DiYSE) aims at enabling ordinary people to easily create, set up and control applications in their smart living environments as well as in the public Internet-of-Things space, allowing them to leverage aware services and smart objects to obtain highly personalised, social, interactive, flowing experiences at home and in the city.
The COST 2102 project (2007 - 2011)
The main objective of the Action is to develop an advanced acoustical, perceptual and psychological analysis of verbal and non-verbal communication signals originating in spontaneous face-to-face interaction, in order to identify algorithms and automatic procedures capable of identifying human emotional states. Several key aspects will be considered, such as the integration of the developed algorithms and procedures for application in telecommunication, and for the recognition of emotional states, gestures, speech and facial expressions, in anticipation of the implementation of intelligent avatars and interactive dialogue systems that could be exploited to improve user access to future telecommunication services.
The TTSBOX project (2004 - 2008)
TTSBOX performs the synthesis of Genglish (for "Generic English"), an imaginary language obtained by replacing English words by generic words. Genglish therefore has a rather limited lexicon, but its pronunciation maintains most of the problems encountered in natural languages. TTSBOX uses simple data-driven techniques (Bigrams, CARTs, NUUs) while trying to keep the code minimal, so as to keep it readable for students with reasonable MATLAB practice.
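As an illustration of the simplest of the data-driven techniques named above, here is a minimal sketch of a maximum-likelihood bigram model. It is written in Python rather than MATLAB, and it does not reproduce TTSBOX code: the description does not say how TTSBOX uses bigrams, so the function names `train_bigrams` and `bigram_prob` and the `<s>`/`</s>` boundary markers are assumptions made for this example.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List


def train_bigrams(sentences: Iterable[List[str]]) -> Dict[str, Counter]:
    """Count how often each token follows each other token.

    Hypothetical helper: sentence boundaries are marked with <s> and </s>.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for sentence in sentences:
        tokens = ["<s>"] + list(sentence) + ["</s>"]
        for prev, cur in zip(tokens, tokens[1:]):
            counts[prev][cur] += 1
    return counts


def bigram_prob(counts: Dict[str, Counter], prev: str, cur: str) -> float:
    """Maximum-likelihood estimate of P(cur | prev); 0.0 if prev was never seen."""
    following = counts.get(prev, Counter())
    total = sum(following.values())
    return following[cur] / total if total else 0.0


if __name__ == "__main__":
    model = train_bigrams([["the", "cat"], ["the", "dog"]])
    print(bigram_prob(model, "the", "cat"))
```

A real system would normally add smoothing so that unseen pairs do not get a probability of exactly zero; that step is left out here to keep the sketch short.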
The MBROLA project (1995 - 1999)
The goal of the MBROLA project is to obtain a set of high-quality speech synthesizers for as many languages as possible, free for use in non-commercial applications. The ultimate goal is to boost academic research on speech synthesis, and particularly on prosody generation, known as one of the biggest challenges in Text-to-Speech Synthesis for the years to come. As of 2003, 26 languages and more than 50 voices are available. Many other languages are in preparation. The software has been compiled on 21 machine/OS combinations.
MAGE / pHTS (2010 - 2014) - PhD Thesis Maria Astrinaki
This project is based on the HMM-Based Speech Synthesis System (HTS), a statistical parametric speech synthesis system in which the vocal tract, vocal source and prosody of speech are modelled simultaneously by HMMs and the synthetic speech is generated from the HMMs themselves. HTS provides intelligibility and expressivity, is flexible and easily adapted, and has a small footprint, but it does not react to real-time user input and control. Going one step further, towards on-the-fly control over the synthesised speech, we developed pHTS (performative HTS), which allows reactive speech synthesis, and MAGE, the engine-independent and thread-safe layer of pHTS that can be used in reactive application designs. This will enable performative creation of synthetic speech by one or several users, on one or several platforms, using different user interfaces and applications.
This PhD thesis is supported by a public-private partnership between University of Mons and Acapela Group SA, Belgium.
The MediaTIC project (2008 - 2015)
The MediaTIC portfolio was submitted in September 2007 in response to the first call for proposals of the ERDF and started on 1 July 2008. This ambitious project falls within the scope of measure 2.2, dedicated to exploiting the potential of research centres. More concretely, the project's objective is to increase the competitiveness of innovative technological SMEs in Wallonia through collective projects driven by concrete industrial requests. It works as a cross-cutting action for innovation in the NTIC component of each strategic line defined by the Walloon Marshall Plan.
To reach that goal, Multitel, as project leader, has gathered a consortium of academic entities and research centres spread across the Walloon territory. MediaTIC was in fact submitted under both objectives of the 2007-2013 period of the European structural funds programme, namely "Convergence" and "Regional Competitiveness and Employment". The project draws on the know-how of laboratories such as the SEMI, TCTS and Telecommunications units of the Faculté polytechnique de Mons, the TELE laboratory of the Catholic University of Louvain-la-Neuve, the research units in microelectronics (Microsys) and signal & image processing (Intelsig) of the University of Liege, the Centexbel and SIRRIS research centres and, finally, the GIE MUWAC. By calling upon complementary partners, Multitel aimed to give MediaTIC the typical leverage of collaborative research and to let the projects focus on common objectives.
MediaTIC is a portfolio of six integrated projects oriented towards specific industrial needs. Each one is run by a specialist from Multitel in the targeted field. These thematic platforms are Transmedia, Envimedia, Tracemedia, Intermedia, 3Dmedia and Optimedia.
The CALLAS project (2007 - 2010)
CALLAS ("Conveying Affectiveness in Leading-Edge Living Adaptive Systems") is a European Integrated Project (FP6). It aims at designing and developing multimodal architectures giving a strong importance to emotions, for Arts and Entertainment.
The overall idea of the project is that new media targeting the recognition and production of emotions can enhance the experience and interaction of users (or spectators). CALLAS is thus investigating how, at the input level, emotions can be detected and how, at the output level, these emotions can be processed to generate new audiovisual content that enriches the users' experience. The input modalities include both vocal and body language (recorded through video cameras and haptic devices). In order to improve the recognition of emotions, the problem of merging the information coming from these different modalities will also be examined. Applications range from digital theatre productions (playing audio or visual content in relation to the actors' and spectators' feelings) to real or virtual museum tours (taking the visitor's interest into account to reshape the exhibition and select the level of information the audioguide will give), as well as interactive television (modifying a scenario according to the spectator's emotions).
The NUMEDIART project (2007 - 2012)
Numediart is a long-term research programme centered on Digital Media Arts, funded by Région Wallonne, Belgium (grant N°716631). Its main goal is to foster the development of new media technologies through digital performances and installations, in connection with local companies and artists.
It is organized around three major R&D themes: HyFORGE - hypermedia navigation, COMEDIA - body and media, COPI - digital instrument making. It is carried out as a series of short (3-month) projects, typically 3 or 4 of them in parallel, each concluded by a 1-week "hands on" workshop.
Numediart is the result of collaboration between Polytech.Mons (Information Technology R&D Department) and UCL (TELE Lab), with a center of gravity in Mons, the cultural capital of Wallonia. It also benefits from the expertise of the Multitel research center on multimedia and telecommunications. As such, it is the R&D component of MONS2015, a broader effort towards making Mons the cultural capital of Europe in 2015.
The KWS Predict project (2007 - 2008)
Automatic speech recognition is of great importance for the automatic indexing of audiovisual documents. Indexing broadcast news spread over time is a challenge from a vocabulary point of view, because new words, new names and new places keep appearing. Techniques for updating LVCSR language models (vocabulary and grammar) are therefore necessary. An alternative to LVCSR is keyword spotting, which only needs the phonetic transcription of the new words to be detected. Not all keywords are equal in terms of "detectability". The work focuses on predicting keyword spotting performance, and on improving keyword spotting accuracy by adapting decision parameters given a priori information on the words to be detected.
HMM2SPEECH (2007 - 2011) - PhD Thesis Thomas Drugman
Intelligibility and expressivity have become the keywords in speech synthesis. In this respect, a system (HTS) based on the statistical generation of voice parameters from Hidden Markov Models has recently shown its potential efficiency and flexibility. Nevertheless, this approach has not yet reached maturity and is limited by the buzziness it produces. This drawback is undoubtedly due to the parametric representation of speech, which induces a lack of voice quality. The first part of this thesis is consequently devoted to the high-quality analysis of speech. In the future, applications oriented towards voice conversion and expressive speech synthesis could also be carried out.
LAUGHTER (2007 - 2014) - PhD Thesis Jérôme Urbain
Human speech contains many paralinguistic sounds conveying information about the speaker's (affective) state. Laughter is one of those signals. Due to its high variability, both between and within speakers (the same person will laugh differently depending on their emotional state, environment, etc.), it is difficult to recognize laughter in an audio recording or to synthesize natural-sounding, human-like laughter. In the framework of the CALLAS project, our study aims at capturing the global patterns of laughter in order to develop algorithms that detect it in real time and produce natural laughter utterances. Potential uses cover the broad range of applications using automatic speech recognition and synthesis for human-computer interaction.
The ECLIPSE project (2006 - 2012)
There are various methods of analysis for classifying vocal pathologies, but none is really powerful. First, "perceptive" analysis lets the doctor rate the quality of the voice according to several criteria; the problem with this method is the subjectivity of the judgement. That is why specialists prefer "acoustic" analysis, a computer-assisted method that computes from the vocal signal a series of objective parameters used to qualify the patient's voice. But this method is only effective for analyzing sustained vowels, not continuous speech, which would be more suitable. Moreover, strongly hoarse speakers are unable to produce pseudoperiodic speech.
The ECLIPSE project aims to develop acoustic analysis software for any type of voice and any degree of hoarseness. The project implements the simultaneous analysis of vocal signals and images of vocal cord vibration. In addition to a clinical prototype, it aims to produce a portable device for following up at-risk patients at their workplace.
The IRMA project (2005 - 2008)
The objective of IRMA is to design and develop an innovative modular interface for personalized, efficient, secure and economically viable multimodal search and navigation in indexed audiovisual databases. It will allow contextual, intuitive and natural search, complemented by fluid navigation. In this way, IRMA will provide an environment that makes the best possible use of the intelligence of the search engine's user.
RAMCESS (2005 - 2009) - PhD Thesis Nicolas D'Alessandro
RAMCESS stands for "Realtime and Accurate Musical Control of Expressivity in Sound Synthesis". Expressivity is nowadays one of the most challenging topics studied by researchers in both speech and music processing. Recent synthesizers provide acceptable results in terms of naturalness and articulation, but the need to improve human/computer interaction has led researchers to develop systems with more human-like expressive skills. Currently, most research seems to converge towards applications where huge databases are recorded (non-uniform unit selection or giga-sampling), corresponding to a certain number of labelled expressions. At synthesis time, the expression of the virtual source is set by choosing the units inside the corresponding corpus, and then concatenating or overlapping them. On the other hand, systems based on physical modeling try to provide concrete access to the underlying acoustic mechanisms, but today still show some problems in naturalness. This PhD thesis (N. d'Alessandro, supervisor: Prof. T. Dutoit) proposes to "reconsider" the data-based approach by investigating the short-term analysis of signals, the description of expressive attributes of sound, the realization of realtime and "smart" database browsing techniques and the study of some control-based layers.
The COST 277 project (2004 - 2005)
The main objective of this COST Action is to improve the quality and capabilities of the voice services for telecommunication systems through the development of new nonlinear speech processing techniques. The proposed new mathematical methods are expected to provide advances in generic speech processing functions. Examples of these are: higher quality speech synthesis, more efficient speech coding, improved speech recognition, and improved speaker identification.
The SIMILAR project (2003 - 2007)
The SIMILAR European Network of Excellence will create an integrated task force on multimodal interfaces that respond intelligently to speech, gestures, vision, haptics and direct brain connections by merging into a single research group excellent European laboratories in Human-Computer Interaction (HCI) and in Signal Processing.
SIMILAR will develop a common theoretical framework for fusion and fission of multimodal information using the most advanced Signal Processing tools constrained by Human Computer Interaction rules.
SIMILAR will develop a network of usability test facilities and will establish an assessment methodology.
SIMILAR will develop a common distributed software platform available for researchers and the public at large through www.openinterface.org
SIMILAR will address Grand Challenges in the field of edutainment, interfaces for disabled people and interfaces for medical applications.
SIMILAR will establish a top-level foundation which will manage an International Journal, Special Sessions in existing conferences, organize summer schools, interact with key European industrial partners and promote new research activities at the European level.
TCTS Lab's contribution will be on Grand Challenges related to TTS and ASR technologies, and their integration into a multimodal framework. We will also work on enhancing Brain Computer Interfaces. SIMILAR is considered a central project for the evolution of our lab.
The ARMAGEDDON project (2003 - 2004)
Armageddon is an opera sung and played by human-controlled robots, in real time. Created by Art Zoyd; robot voices taken from the MBROLA Project (under Max/MSP).
The STOP project (2003 - 2006)
The STOP Project aims at studying the relationship between speech dynamics and voice quality, based on home-made tools for efficient source-tract separation.
The NUMBROLA project (2001 - 2005)
NUMBROLA is an extension of MBROLA towards corpus-based, non-uniform unit (NUU) selection techniques in speech synthesis. The goal of NUMBROLA is to provide a standard concatenative synthesizer to people active in NUU research. A French database and a first version of the software have been made available. We are currently working on an improved version, based on a modified MBROLA algorithm: TP-MBROLA.
The COST 278 project (2001 - 2008)
The main objective of this Action is to create knowledge in several problem areas of spoken language interaction in telecommunications in order to achieve communicative interfaces that provide a natural human-computer interaction through more cognitive, intuitive and robust interfaces, whether monolingual, multilingual or multimodal.
The scientific programme emphasises speech and dialogue processing in multimodal communication interfaces, issues related to robustness and multilinguality, human-computer dialogue theories, and models and systems and associated tools for the establishment of interactive systems. The programme also involves the evaluation of telecommunication applications in which spoken language is the only or one of many types of input or output modalities.
The MLRR project (2000 - 2001)
The goal of this program is to transcribe a symbolic input, i.e. a string of symbols belonging to some alphabet, into a symbolic output according to a regular grammar described in terms of a system of multi-level rewriting rules (MLRR). "Symbols" and "alphabet" have to be understood here as generic terms: they can be characters, phonemes, syllables, words, phrases, etc.
This project is closed but the software is available in Open Source format.
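The multi-level rewriting idea described above can be illustrated with a small sketch. It is not the MLRR software or its rule syntax: the rule format (a pattern tuple mapped to a replacement tuple, with no left or right context), the function names `apply_level` and `rewrite`, and the example rules are all assumptions made for illustration. Each level scans the symbol string from left to right, applies the first matching rule at each position, and passes its output to the next level.

```python
from typing import List, Sequence, Tuple

Symbol = str
# Hypothetical rule format: (pattern, replacement), both tuples of symbols.
Rule = Tuple[Tuple[Symbol, ...], Tuple[Symbol, ...]]


def apply_level(symbols: Sequence[Symbol], rules: List[Rule]) -> List[Symbol]:
    """Rewrite one level: first matching rule wins at each position."""
    if any(len(pattern) == 0 for pattern, _ in rules):
        raise ValueError("rule patterns must contain at least one symbol")
    out: List[Symbol] = []
    i = 0
    while i < len(symbols):
        for pattern, replacement in rules:
            if tuple(symbols[i:i + len(pattern)]) == pattern:
                out.extend(replacement)
                i += len(pattern)
                break
        else:
            # No rule matched here: copy the symbol unchanged.
            out.append(symbols[i])
            i += 1
    return out


def rewrite(symbols: Sequence[Symbol], levels: List[List[Rule]]) -> List[Symbol]:
    """Apply the levels in order; each level sees the previous level's output."""
    current = list(symbols)
    for rules in levels:
        current = apply_level(current, rules)
    return current


if __name__ == "__main__":
    # Made-up rules: level 1 merges "p h" into "f", level 2 upper-cases "f".
    levels = [
        [(("p", "h"), ("f",))],
        [(("f",), ("F",))],
    ]
    print(rewrite(list("phone"), levels))
```

Because symbols are plain strings, the same code works whether they stand for characters, phonemes, syllables or words.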
DIALOGUE (2000 - 2004) - PhD Thesis Olivier Pietquin
This book addresses the problems of spoken dialogue system design and especially automatic learning of optimal strategies for man-machine dialogues. Besides the description of the learning methods, this text proposes a framework for realistic simulation of human-machine dialogues based on probabilistic techniques, which allows automatic evaluation and unsupervised learning of dialogue strategies. This framework relies on stochastic modelling of modules composing spoken dialogue systems as well as on user modelling. Special care has been taken to build models that can either be hand-tuned or learned from generic data.
CONFIDENCE (2000 - 2004) - PhD Thesis Erhan Mengusoglu
Confidence measures for the results of speech/speaker recognition make such systems more useful in real-time applications. Confidence measures provide a test statistic for accepting or rejecting the recognition hypothesis of the speech/speaker recognition system.
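To make the accept/reject idea concrete, here is a minimal sketch of a generic log-likelihood-ratio test. It is not one of the confidence measures defined in the thesis, which are not specified here: the function names, the use of a competing "alternative" model score and the default threshold of 0.0 are assumptions made for illustration.

```python
def log_likelihood_ratio(loglik_hypothesis: float, loglik_alternative: float) -> float:
    """Test statistic: how much better the hypothesis explains the data
    than a competing (alternative) model, in the log domain."""
    return loglik_hypothesis - loglik_alternative


def accept_hypothesis(
    loglik_hypothesis: float,
    loglik_alternative: float,
    threshold: float = 0.0,  # hypothetical default; tuned on held-out data in practice
) -> bool:
    """Accept the recognition hypothesis if the statistic reaches the threshold."""
    return log_likelihood_ratio(loglik_hypothesis, loglik_alternative) >= threshold


if __name__ == "__main__":
    print(accept_hypothesis(-10.0, -12.0))
    print(accept_hypothesis(-10.0, -12.0, threshold=3.0))
```

Raising the threshold rejects more hypotheses, trading false acceptances for false rejections.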
Speech/speaker recognition systems are usually based on statistical modeling techniques. In this thesis we defined confidence measures for statistical modeling techniques used in speech/speaker recognition systems.
For speech recognition we tested available confidence measures and the newly defined acoustic prior information based confidence measure in two different conditions which cause errors: the out-of-vocabulary words and presence of additive noise. We showed that the newly defined confidence measure performs better in both tests.
A review of speech recognition and speaker recognition techniques and some related statistical methods is given throughout the thesis.
We also defined a new interpretation technique for confidence measures, based on the Fisher transformation of likelihood ratios obtained in speaker verification. The transformation provided us with a linearly interpretable confidence level which can be used directly in real-time applications such as dialog management.
We have also tested the confidence measures for speaker verification systems and evaluated the efficiency of the confidence measures for adaptation of speaker models. We showed that use of confidence measures to select adaptation data improves the accuracy of the speaker model adaptation process.
Another contribution of this thesis is the preparation of a phonetically rich continuous speech database for the Turkish language. The database is used for developing an HMM/MLP hybrid speech recognition system for Turkish. Experiments on the test sets of the database showed that the speech recognition system has good accuracy for long speech sequences, while performance is lower for short words, as is the case for current speech recognition systems for
A new language modeling technique for the Turkish language is introduced in this thesis, which can be used for other agglutinative languages. Performance evaluations showed that the new technique outperforms the classical n-gram language modeling technique.
The EULER project (1997 - 2001)
For years, non-coordinated research effort on the design of text-to-speech (TTS) systems has led to unavoidable cross-system and cross-language incompatibility. The EULER project aimed at producing a unified, extensible, and publicly available research, development and production environment for multilingual TTS synthesis. EULER has led to the development of a corpus-based French TTS system. The project is no longer supported, but the software components are still available.
EULER has been reworked into eLITE, by the TTS team of MULTITEL ASBL.
The MBRDICO project (1997 - 2001)
MBRDICO is a talking dictionary using MBROLA as a back-end speech synthesizer. Text processing is performed using a complete GNU GPL package for automatic phonetization training (letter/phoneme alignment, decision tree building, stress assignment) and duration/intonation generation. French, US English, and Arabic are available. We no longer work directly on this project, but all its sources are available for use or extension. This work is the result of a collaboration between:
- Faculté Polytechnique de Mons
- Carnegie Mellon University
- University of Edinburgh
The MBROLIGN project (1997 - 2001)
MBROLIGN is a fast MBROLA-based text-to-speech aligner. It is provided free for use in non-commercial applications. The goal of this project is to create large phonetically and prosodically labeled databases for as many languages as possible, thereby drastically expanding the reach of speech technology. This project is currently closed, but the software is available for database creation.
The W project (1997 - 2001)
The W project aimed at creating a fast computer keyboard driver for people with speech disabilities. The related software is based on the grade II Braille languages developed by associations of blind people all over the world, and minimizes the number of keystrokes needed to utter a word (the name of the project is the grade II abbreviation for "word" in English).
This project has been extended by MULTITEL ASBL in the framework of the FASTY EC/FP5 Project.
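The keystroke-saving principle described above can be sketched as a simple abbreviation expander. This is not the W driver: only the "w" to "word" pair comes from the description above, while the other table entries, the function names and the whitespace-based tokenization are placeholders invented for this example.

```python
from typing import Dict

# Only "w" -> "word" is taken from the project description;
# the remaining entries are made-up placeholders for illustration.
ABBREVIATIONS: Dict[str, str] = {
    "w": "word",
    "k": "knowledge",
    "p": "people",
}


def expand(typed: str, table: Dict[str, str] = ABBREVIATIONS) -> str:
    """Replace each whitespace-separated token by its expansion, if it has one."""
    return " ".join(table.get(token, token) for token in typed.split())


def keystrokes_saved(typed: str, table: Dict[str, str] = ABBREVIATIONS) -> int:
    """Number of characters the user did not have to type."""
    return len(expand(typed, table)) - len(typed)


if __name__ == "__main__":
    message = "the w is short"
    print(expand(message))
    print(keystrokes_saved(message))
```

A real driver would also need a way to type a literal single letter, which this sketch does not handle.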
The COST 250 project (1995 - 2000)
Speaker Recognition in Telephony
The OOBP project (1994 - 2005)
OOBP is a programming paradigm developed at TCTS Lab since 1994. It is defined as Object Oriented Programming around processes and combines OOP and block descriptions.
Plug and Play Software extends OOBP by defining input and output data as abstract streams.