Audio Processing Group
The audio processing group studies audio and acoustic signal processing, with special interest in low bit rate audio coding (using parametric, harmonic and/or hybrid coders), detection and localization of sounds (with microphone networks), noise cancelling techniques, audio signal recognition, audio watermarking (adding an inaudible pseudo-random signal to audio signals), and music synthesis, with special focus on real-time interfaces.
Applications: audio coding and broadcasting, robust speech recognition, multimodal man-machine interfaces, and the creation of hidden communication channels in a conventional radio channel.
Current R&D projects
The DIGISTORM project (2016 - 2020)
The SonixTrip project (2013 - 2016)
The DiYSE project (2009 - 2011)
The Do-it-Yourself Smart Experiences project (DiYSE) aims at enabling ordinary people to easily create, set up and control applications in their smart living environments as well as in the public Internet-of-Things space, allowing them to leverage aware services and smart objects to obtain highly personalised, social, interactive, flowing experiences at home and in the city.
The MaxMBROLA project (2004 - 2008)
The main topics of this research project are:
- The development of a flexible external object for Max/MSP (4.5) encapsulating the main features of the MBROLA speech synthesizer and the adaptation of the MBROLA functions to the asynchronous request-based architecture of the Max/MSP environment.
- Discussions and Max/MSP developments about the real-time control issues in the phonetic/prosodic content generation process. This research topic is a good "first-trial" concerning overall issues of real-time manipulation of concatenation-based signals.
- Proposals for various real-time concatenation-based applications (standalone, virtual instruments or Max/MSP patches) allowing performers to produce versatile voice with standard musical devices.
Past R&D projects
The SLOWDIO project (2013 - 2015)
In this project we implement new methods for time-stretching of stereo audio signals, especially audio generated during sport events. This will enable viewers to watch slow-motion videos with synchronous time-stretched quality-preserved sound.
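To make the idea of time-stretching concrete, here is a minimal sketch of plain overlap-add (OLA) stretching for a mono signal. It is a textbook baseline, not the method developed in the project: the function names, frame size and hop size are assumptions chosen for illustration. Plain OLA tends to produce audible artifacts on tonal material, which is one reason more elaborate methods are needed.

```python
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def ola_stretch(x, factor, frame=1024, synth_hop=256):
    """Stretch mono signal x by `factor` (> 1 means slower) with plain OLA.

    Frames are read every synth_hop / factor samples and written every
    synth_hop samples, so the output is roughly `factor` times longer.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    if len(x) < frame:
        raise ValueError("signal shorter than one frame")

    ana_hop = synth_hop / factor            # analysis hop (may be fractional)
    win = hann(frame)
    n_frames = int((len(x) - frame) / ana_hop) + 1
    out_len = (n_frames - 1) * synth_hop + frame

    out = [0.0] * out_len                   # overlap-added windowed frames
    norm = [0.0] * out_len                  # overlap-added window weights

    for m in range(n_frames):
        a = int(round(m * ana_hop))         # read position in the input
        s = m * synth_hop                   # write position in the output
        for k in range(frame):
            out[s + k] += x[a + k] * win[k]
            norm[s + k] += win[k]

    # Divide by the summed window so overlapping frames do not change the gain.
    return [o / w if w > 1e-8 else 0.0 for o, w in zip(out, norm)]
```

For example, ola_stretch(samples, 2.0) returns a signal about twice as long as samples, suitable for half-speed video playback. A stereo signal could be handled by stretching each channel with the same frame positions, which keeps the channels aligned.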
MAGE / pHTS (2010 - 2014) - PhD Thesis Maria Astrinaki
This project is based on the HMM-Based Speech Synthesis System (HTS), a statistical parametric speech synthesis system in which the vocal tract, vocal source and prosody of speech are modelled simultaneously by HMMs and the synthetic speech is generated from the HMMs themselves. HTS provides intelligibility and expressivity, is flexible and easily adapted, and has a small footprint, but it does not react to real-time user input and control. Going one step further, towards on-the-fly control over the synthesised speech, we developed pHTS (performative HTS), which allows reactive speech synthesis, and MAGE, the engine-independent and thread-safe layer of pHTS that can be used in reactive application designs. This will enable performative creation of synthetic speech, by a single user or multiple users, on one or multiple platforms, using different user interfaces and applications.
This PhD thesis is supported by a public-private partnership between the University of Mons and Acapela Group SA, Belgium.
The COMPTOUX project (2010 - 2013)
Designing interaction for browsing media collections (by similarity) (2010 - 2015) - PhD Thesis Christian Frisson
Sound designers source sounds in massive and heavily tagged collections. When searching for media content, once queries are filtered by keywords, hundreds of items still need to be reviewed. How can we present these results efficiently? This doctoral work aims at improving the usability of browsers of media collections by blending techniques from multimedia information retrieval (MIR) and human-computer interaction (HCI). We produced an in-depth state of the art on media browsers. We reviewed the HCI and MIR techniques that support our work: organization by content-based similarity (MIR), information visualization and gestural interaction (HCI). We developed the MediaCycle framework for organization by content-based similarity and the DeviceCycle toolbox for rapid prototyping of gestural interaction, both of which facilitated the design of several media browsers. We evaluated the usability of some of our media browsers.
Our main contribution is AudioMetro, an interactive visualization of sound collections. Sounds are represented by content-based glyphs, mapping perceptual sharpness (audio) to brightness and contour (visual). These glyphs are positioned in a starfield display using Student t-distributed Stochastic Neighbor Embedding (t-SNE) for dimension reduction, then a proximity grid optimized for preserving direct neighbors. Known-item search evaluation shows that our technique significantly outperforms a grid of sounds represented by dots and ordered by filename.
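The grid step can be illustrated with a short sketch. The code below is a simplified greedy stand-in, not AudioMetro's optimized proximity grid: it assumes the 2D positions (for instance the output of a dimension reduction such as t-SNE) have already been scaled to the range [0, 1], and the function name snap_to_grid is hypothetical. Each point simply takes the nearest grid cell that is still free, so no two sounds overlap.

```python
def snap_to_grid(points, cols, rows):
    """Assign each 2D point (coordinates in [0, 1]) to a distinct grid cell.

    Greedy illustration only: points are processed in order and each one
    takes the closest free cell, with ties broken by cell coordinates.
    Returns a dict mapping point index -> (column, row).
    """
    if len(points) > cols * rows:
        raise ValueError("grid has fewer cells than points")

    free = {(c, r) for c in range(cols) for r in range(rows)}
    placement = {}

    for idx, (x, y) in enumerate(points):
        # Position of the point in continuous grid coordinates.
        gx, gy = x * (cols - 1), y * (rows - 1)
        cell = min(
            free,
            key=lambda cr: ((cr[0] - gx) ** 2 + (cr[1] - gy) ** 2, cr),
        )
        free.remove(cell)
        placement[idx] = cell

    return placement
```

Because the assignment is greedy, the result depends on the order of the points and does not guarantee that direct neighbors are preserved; an optimized grid would search for a placement that keeps neighbors in the embedding adjacent on the grid.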
The MediaTIC project (2008 - 2015)
The MediaTIC portfolio was submitted in September 2007 in response to the first call for proposals of the ERDF and started on 1st July, 2008. This ambitious project falls within the scope of measure 2.2 dedicated to the exploitation of the potential of research centres. More concretely, the project's objective is to increase the competitiveness of innovative technological SMEs in Wallonia through collective projects dictated by concrete industrial requests. It works as a cross-action for innovation in the NTIC component of each strategic line defined by the Walloon Marshall Plan.
To reach that goal, Multitel, as project leader, has gathered a consortium composed of academic entities and research centres spread across the Walloon territory. MediaTIC was submitted under both objectives of the 2007-2013 period of the European structural funds programme, namely "Convergence" and "Regional Competitiveness and employment". The project counts on the know-how of laboratories such as the SEMI, TCTS and Telecommunications units of the Faculté polytechnique de Mons, the TELE laboratory from the Catholic University of Louvain-la-Neuve, the research units in microelectronics (Microsys) and signal & image processing (Intelsig) from the University of Liege, the Centexbel and SIRRIS research centres and, finally, the GIE MUWAC. By calling upon complementary partners, Multitel aimed at providing MediaTIC with the typical leverage of collaborative research and allowing the projects to focus on common objectives.
MediaTIC is a portfolio of six integrated projects oriented towards specific industrial needs. Each one is run by a specialist from Multitel in the targeted field. These thematic platforms are Transmedia, Envimedia, Tracemedia, Intermedia, 3Dmedia and Optimedia.
SLAW (2008 - 2013) - PhD Thesis Alexis Moinet
In this project we develop new methods for time-stretching of audio signals, especially audio generated during sport events. This will enable viewers to watch slow-motion videos with synchronous time-stretched quality-preserved sound.
The NUMEDIART project (2007 - 2012)
Numediart is a long-term research programme centered on Digital Media Arts, funded by Région Wallonne, Belgium (grant N°716631). Its main goal is to foster the development of new media technologies through digital performances and installations, in connection with local companies and artists.
It is organized around three major R&D themes: HyFORGE - hypermedia navigation, COMEDIA - body and media, COPI - digital instrument making. It is performed as a series of short (3-month) projects, typically 3 or 4 of them in parallel, each concluded by a 1-week "hands on" workshop.
Numediart is the result of collaboration between Polytech.Mons (Information Technology R&D Department) and UCL (TELE Lab), with a center of gravity in Mons, the cultural capital of Wallonia. It also benefits from the expertise of the Multitel research center on multimedia and telecommunications. As such, it is the R&D component of MONS2015, a broader effort towards making Mons the cultural capital of Europe in 2015.
HMM2SPEECH (2007 - 2011) - PhD Thesis Thomas Drugman
Intelligibility and expressivity have become the keywords in speech synthesis. For this, a system (HTS) based on the statistical generation of voice parameters from Hidden Markov Models has recently shown its potential efficiency and flexibility. Nevertheless this approach has not yet reached its maturity and is limited by the buzziness it produces. This latter inconvenience is undoubtedly due to the parametrical representation of speech inducing a lack of voice quality. The first part of this thesis is consequently devoted to the high-quality analysis of speech. In the future, applications oriented towards voice conversion and expressive speech synthesis could also be carried out.
LAUGHTER (2007 - 2014) - PhD Thesis Jérôme Urbain
Human speech contains a lot of paralinguistic sounds conveying information about the speaker's (affective) state. Laughter is one of those signals. Due to its high variability, both inter- and intra-speaker (the same person will laugh differently depending on their emotional state, environment, etc.), it is difficult to recognize laughter in an audio recording or to synthesize human-like, natural-sounding laughter. In the framework of the CALLAS project, our study aims at capturing the global patterns of laughter in order to develop algorithms to detect it in real time and to produce natural laughter utterances. Potential uses cover the broad range of applications using automatic speech recognition and synthesis for human-computer interaction.
The COST SID project (2007 - 2011)
Sonic Interaction Design is the exploitation of sound as one of the principal channels conveying information, meaning, and aesthetic/emotional qualities in interactive contexts. The Action proactively contributes to the creation and consolidation of new design theories, tools, and practices in this innovative and interdisciplinary domain. While being advanced through a few sparse projects, this field relies on the COST - SID Action to strengthen the links between scientists, artists, and designers in the European Research Area. The COST - SID platform stands on four legs: (i) perception, cognition, and emotion; (ii) design; (iii) interactive art; (iv) information display and exploration. These are each supported by the research and development of the requisite new interactive technologies. Due to the breadth of its application spectrum, the COST - SID Action has the potential of affecting everyday life through physical and virtual interactive objects, as today there is the possibility to design and actively control their acoustic response so that it conveys an intended aesthetic, informational, or emotional content.
The SERKET project (2006 - 2009)
The goals of SERKET are twofold:
- define the requirements and the specifications of an open security platform for public places and events
- demonstrate the new architectural principle for security systems on realistic scenarios, by integrating heterogeneous sensors (video, audio, human, etc.), by applying advanced fusion technologies for multimedia information and by automatically assessing threats.
The MOUSTIC project (2005 - 2007)
The MOUSTIC project aims at developing new frameworks, complementary to the existing ones, for the diffusion of road information in Wallonia.
It would use new diffusion channels, which we propose to develop and to integrate into the existing steps of the WHIST project (Walloon Highway Information System for Traffic).
The system consists of creating a free communication channel on top of existing broadcasts.
During radio transmission, information will be hidden in the form of pseudo-random noise that is inaudible to humans.
A low-cost receiver will decode this information and synthesize it vocally, or display it on a screen.
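The principle of hiding data as pseudo-random noise can be sketched as a basic spread-spectrum scheme. This is an illustrative assumption, not the MOUSTIC implementation: the function names, the shared seed and the parameter values are hypothetical, and the noise strength is left unshaped, whereas a real system would have to keep the noise below the audibility threshold and survive the radio channel. Each bit adds or subtracts a pseudo-random sequence of +/-1 values, and the receiver recovers it by correlating with the same sequence.

```python
import random


def pn_sequence(length, seed):
    """Pseudo-random +/-1 sequence; sender and receiver share the seed."""
    rng = random.Random(seed)
    return [rng.choice((-1.0, 1.0)) for _ in range(length)]


def embed(host, bits, chips_per_bit, strength, seed):
    """Add one scaled PN sequence per bit (+ for 1, - for 0) to the host."""
    if len(host) < len(bits) * chips_per_bit:
        raise ValueError("host signal too short for the message")
    pn = pn_sequence(chips_per_bit, seed)
    out = list(host)
    for i, bit in enumerate(bits):
        sign = 1.0 if bit else -1.0
        start = i * chips_per_bit
        for k in range(chips_per_bit):
            out[start + k] += sign * strength * pn[k]
    return out


def extract(signal, n_bits, chips_per_bit, seed):
    """Recover bits from the sign of the correlation with the PN sequence."""
    pn = pn_sequence(chips_per_bit, seed)
    bits = []
    for i in range(n_bits):
        start = i * chips_per_bit
        corr = sum(signal[start + k] * pn[k] for k in range(chips_per_bit))
        bits.append(1 if corr > 0 else 0)
    return bits
```

The host audio acts as interference during correlation, so longer sequences per bit (a larger chips_per_bit) allow a weaker, less audible noise at the cost of a lower data rate.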
The IRMA project (2005 - 2008)
The aim of IRMA is to design and develop an innovative modular interface for personalised, efficient, secure and economically viable multimodal search and navigation in indexed audiovisual databases. It will enable contextual, intuitive and natural search, complemented by fluid navigation. In this way, IRMA will provide an environment that makes the best possible use of the intelligence of the search engine's user.
RAMCESS (2005 - 2009) - PhD Thesis Nicolas D'Alessandro
RAMCESS stands for "Realtime and Accurate Musical Control of Expressivity in Sound Synthesis". Expressivity is nowadays one of the most challenging topics studied by researchers in both speech and music processing. Indeed, recent synthesizers provide acceptable results in terms of naturalness and articulation, but the need to improve human/computer interactions has led researchers to develop systems that present more human-like expressive skills. Currently most of the research seems to converge towards applications where huge databases are recorded (non-uniform unit selection or giga-sampling), corresponding to a certain number of labelled expressions. At synthesis time, the expression of the virtual source is set by choosing the units inside the corresponding corpus, and then concatenating or overlapping them. On the other side, systems based on physical modeling try to provide concrete access to the underlying acoustic mechanisms, though currently with some problems in naturalness. This PhD thesis (N. d'Alessandro, supervisor: Prof. T. Dutoit) proposes to "reconsider" the data-based approach by investigating the short-term analysis of signals, the description of expressive attributes of sound, the realization of realtime and "smart" database browsing techniques and the study of some control-based layers.
The SIMILAR project (2003 - 2007)
The SIMILAR European Network of Excellence will create an integrated task force on multimodal interfaces that respond intelligently to speech, gestures, vision, haptics and direct brain connections by merging into a single research group excellent European laboratories in Human-Computer Interaction (HCI) and in Signal Processing.
SIMILAR will develop a common theoretical framework for fusion and fission of multimodal information using the most advanced Signal Processing tools constrained by Human Computer Interaction rules.
SIMILAR will develop a network of usability test facilities and will establish an assessment methodology.
SIMILAR will develop a common distributed software platform available for researchers and the public at large through www.openinterface.org
SIMILAR will address Grand Challenges in the field of edutainment, interfaces for disabled people and interfaces for medical applications.
SIMILAR will establish a top-level foundation which will manage an International Journal, Special Sessions in existing conferences, organize summer schools, interact with key European industrial partners and promote new research activities at the European level.
TCTS Lab's contribution will be on Grand Challenges related to TTS and ASR technologies, and their integration into a multimodal framework. We will also work on enhancing Brain Computer Interfaces. SIMILAR is considered a central project for the evolution of our lab.
DIALOGUE (2000 - 2004) - PhD Thesis Olivier Pietquin
This book addresses the problems of spoken dialogue system design and especially automatic learning of optimal strategies for man-machine dialogues. Besides the description of the learning methods, this text proposes a framework for realistic simulation of human-machine dialogues based on probabilistic techniques, which allows automatic evaluation and unsupervised learning of dialogue strategies. This framework relies on stochastic modelling of modules composing spoken dialogue systems as well as on user modelling. Special care has been taken to build models that can either be hand-tuned or learned from generic data.
Full Professor, Head
TCTS Lab - FPMs
tel: +32 65 37 47 30