Audio Processing Group
Projects
The audio processing group studies audio and acoustic signal processing, with special interest in low bit rate audio coding (using parametric, harmonic and/or hybrid coders), detection and localization of sounds (with microphone networks), noise cancelling techniques, audio signal recognition, audio watermarking (adding an inaudible pseudo-random signal to audio signals), and music synthesis, with special focus on real-time interfaces.
Applications: audio coding and broadcasting, robust speech recognition, multimodal man-machine interfaces, creation of hidden communication channels in a conventional radio channel.
Current R&D projects
The MediaTIC project (2008 - 2013)
The MediaTIC portfolio was submitted in September 2007 in response to the first call for proposals of the ERDF and started on 1 July 2008. This ambitious project falls within the scope of measure 2.2, dedicated to exploiting the potential of research centres. More concretely, the project’s objective is to increase the competitiveness of innovative technology SMEs in Wallonia through collective projects driven by concrete industrial requests. It works as a cross-cutting action for innovation in the ICT (NTIC) component of each strategic line defined by the Walloon Marshall Plan.
To reach that goal, Multitel, as project leader, has gathered a consortium of academic entities and research centres spread across the Walloon territory. MediaTIC was submitted under both objectives of the 2007-2013 period of the European structural funds programme, namely “Convergence” and “Regional Competitiveness and Employment”. The project draws on the know-how of laboratories such as the SEMI, TCTS and Telecommunications units of the Faculté polytechnique de Mons, the TELE laboratory of the Catholic University of Louvain-la-Neuve, the research units in microelectronics (Microsys) and signal & image processing (Intelsig) of the University of Liege, the Centexbel and SIRRIS research centres and, finally, the GIE MUWAC. By calling upon complementary partners, Multitel aimed to give MediaTIC the leverage typical of collaborative research and to let the projects focus on common objectives.
MediaTIC is a portfolio of six integrated projects oriented towards specific industrial needs. Each one is run by a specialist from Multitel in the targeted field. These thematic platforms are Transmedia, Envimedia, Tracemedia, Intermedia, 3Dmedia and Optimedia.
The NUMEDIART project (2007 - )
Numediart is a long-term research programme centered on Digital Media Arts, funded by Région Wallonne, Belgium (grant N°716631). Its main goal is to foster the development of new media technologies through digital performances and installations, in connection with local companies and artists.
It is organized around three major R&D themes: HyFORGE – hypermedia navigation, COMEDIA – body and media, COPI – digital instrument making. It is carried out as a series of short (3-month) projects, typically 3 or 4 of them in parallel, each concluded by a 1-week “hands-on” workshop.
Numediart is the result of collaboration between Polytech.Mons (Information Technology R&D Department) and UCL (TELE Lab), with a center of gravity in Mons, the cultural capital of Wallonia. It also benefits from the expertise of the Multitel research center on multimedia and telecommunications. As such, it is the R&D component of MONS2015, a broader effort towards making Mons the cultural capital of Europe in 2015.
The HMM2SPEECH project (2007 - )
Intelligibility and expressivity have become the keywords in speech synthesis. A system (HTS) based on the statistical generation of voice parameters from Hidden Markov Models has recently shown its potential efficiency and flexibility. Nevertheless, this approach has not yet reached maturity and is limited by the buzziness it produces. This drawback is undoubtedly due to the parametric representation of speech, which degrades voice quality. The first part of this thesis is consequently devoted to the high-quality analysis of speech. In the future, applications oriented towards voice conversion and expressive speech synthesis could also be carried out.
The PAST project (2007 - )
PAST stands for Pathology Assessment by Source-Tract separation of speech. Speech is one of the most natural ways for humans to communicate, and it can be affected by disorders when used intensively. In particular, such problems affect people like singers or teachers. When the pathology becomes painful, these persons have to undergo a speech assessment performed by a clinician. This examination consists of acoustical, aerodynamic and image recordings which help the clinician to diagnose the degree of pathology.
In the field of speech processing, most researchers have been interested in estimating the contributions of the glottal source and the vocal tract in the speech signal. Among these approaches, the ZZT representation was recently proposed and suggests very interesting perspectives. This PhD thesis proposes to use this representation and others in order to evaluate the impact of pathology by estimating the glottal source and vocal tract contributions in the speech signal.
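For readers unfamiliar with the ZZT (Zeros of the Z-Transform) representation mentioned above, its usual definition can be written as follows. This is a sketch of the standard formulation, not a description of the specific processing used in PAST; the symbols x(n), N and Z_m are notation introduced here for illustration, and x(0) is assumed to be non-zero.

```latex
% Z-transform of an N-sample windowed speech frame x(n), factored over its zeros Z_m
X(z) \;=\; \sum_{n=0}^{N-1} x(n)\, z^{-n}
      \;=\; x(0)\, z^{-N+1} \prod_{m=1}^{N-1} \left( z - Z_m \right)
```

The ZZT of the frame is the set of roots Z_m. In the ZZT literature, and under suitable windowing around the glottal closure instant, zeros outside the unit circle are generally attributed to the anticausal (glottal open phase) contribution, and zeros inside it to the causal contribution, which includes the vocal tract.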
The COST SID project (2007 - 2011)
Sonic Interaction Design is the exploitation of sound as one of the principal channels conveying information, meaning, and aesthetic/emotional qualities in interactive contexts. The Action proactively contributes to the creation and consolidation of new design theories, tools, and practices in this innovative and interdisciplinary domain. While being advanced through a few sparse projects, this field relies on the COST-SID Action to strengthen the links between scientists, artists, and designers in the European Research Area. The COST-SID platform stands on four legs: (i) perception, cognition, and emotion; (ii) design; (iii) interactive art; (iv) information display and exploration. These are each supported by the research and development of the requisite new interactive technologies. Due to the breadth of its application spectrum, the COST-SID Action has the potential to affect everyday life through physical and virtual interactive objects, since it is now possible to design and actively control their acoustic response so that it conveys an intended aesthetic, informational, or emotional content.
The SERKET project (2006 - )
The goals of SERKET are twofold:
- define the requirements and the specifications of an open security platform for public places and events
- demonstrate the new architectural principle for security systems on realistic scenarios, by integrating heterogeneous sensors (video, audio, human, etc.), by applying advanced fusion technologies for multimedia information and by automatically assessing threats.
The RAMCESS project (2005 - )
RAMCESS stands for "Realtime and Accurate Musical Control of Expressivity in Sound Synthesis". Expressivity is nowadays one of the most challenging topics studied by researchers in both speech and music processing. Recent synthesizers provide acceptable results in terms of naturalness and articulation, but the need to improve human/computer interaction has led researchers to develop systems with more human-like expressive skills. Currently, most research seems to converge towards applications where huge databases are recorded (non-uniform unit selection or giga-sampling), corresponding to a certain number of labelled expressions. At synthesis time, the expression of the virtual source is set by choosing the units inside the corresponding corpus, and then concatenating or overlapping them. On the other hand, systems based on physical modeling try to provide concrete access to the underlying acoustic mechanisms, though currently with some problems in naturalness. This PhD thesis (N. d'Alessandro, supervisor: Prof. T. Dutoit) proposes to "re-consider" the data-based approach by investigating the short-term analysis of signals, the description of expressive attributes of sound, the realization of realtime and "smart" database browsing techniques and the study of some control-based layers.
The MaxMBROLA project (2004 - )
The main topics of this research project are:
- The development of a flexible external object for Max/MSP (4.5) encapsulating the main features of the MBROLA speech synthesizer and the adaptation of the MBROLA functions to the asynchronous request-based architecture of the Max/MSP environment.
- Discussions and Max/MSP developments about the real-time control issues in the phonetic/prosodic content generation process. This research topic is a good "first-trial" concerning overall issues of real-time manipulation of concatenation-based signals.
- Proposals for various real-time concatenation-based applications (standalone, virtual instruments or Max/MSP patches) allowing performers to produce a versatile voice with standard musical devices.
Past R&D projects
The MOUSTIC project (2005 - 2007)
The MOUSTIC project aims at developing new frameworks, complementary to the existing ones, for the dissemination of road information in Wallonia.
It would use new dissemination channels, which we propose to develop and to integrate into the existing steps of the WHIST project (Walloon Highway Information System for Traffic).
The system consists of creating a free communication channel on top of existing broadcasts.
During radio transmission, information will be hidden in the form of pseudo-random noise inaudible to humans.
A low-cost receiver will decode this information and synthesize it vocally, or display it on a screen.
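The principle of hiding data in pseudo-random noise and recovering it at the receiver can be illustrated with a minimal spread-spectrum sketch. It is not the MOUSTIC implementation: the function names, the key, the chip length and the embedding strength `alpha` are all assumptions made for illustration, and a real system would shape the noise with a psychoacoustic model so that it stays inaudible, which this sketch does not do.

```python
import math
import random


def make_chips(key, n):
    """Pseudo-random +/-1 sequence; embedder and receiver share the key (hypothetical)."""
    rng = random.Random(key)
    return [rng.choice((-1.0, 1.0)) for _ in range(n)]


def embed(host, bits, key, chips_per_bit=4096, alpha=0.05):
    """Add a low-level pseudo-random sequence whose sign carries each bit."""
    needed = len(bits) * chips_per_bit
    if len(host) < needed:
        raise ValueError("host signal too short for the payload")
    chips = make_chips(key, needed)
    marked = list(host)
    for i in range(needed):
        symbol = 1.0 if bits[i // chips_per_bit] else -1.0
        marked[i] += alpha * symbol * chips[i]
    return marked


def extract(received, n_bits, key, chips_per_bit=4096):
    """Correlate each segment with the same sequence; the sign gives the bit."""
    chips = make_chips(key, n_bits * chips_per_bit)
    bits = []
    for b in range(n_bits):
        start = b * chips_per_bit
        score = sum(received[start + i] * chips[start + i]
                    for i in range(chips_per_bit))
        bits.append(1 if score > 0 else 0)
    return bits


# Usage: hide 8 bits in a 440 Hz tone standing in for the radio programme.
fs = 16000  # assumed sampling rate in Hz
payload = [1, 0, 1, 1, 0, 0, 1, 0]
host = [0.5 * math.sin(2 * math.pi * 440 * n / fs)
        for n in range(len(payload) * 4096)]
marked = embed(host, payload, key=1234)
decoded = extract(marked, len(payload), key=1234)
```

Spreading each bit over many chips is what lets the receiver recover it by correlation even though the added noise is much weaker than the host signal; a receiver without the key sees only a slight increase in noise.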
The IRMA project (2005 - 2008)
The objective of IRMA is to design and develop an innovative modular interface for personalized, efficient, secure and economically viable multimodal search and navigation in indexed audiovisual databases. It will enable contextual, intuitive and natural search, complemented by fluid navigation. In this way, IRMA will provide an environment that makes the best possible use of the intelligence of the search engine's user.
The SIMILAR project (2003 - 2007)
The SIMILAR European Network of Excellence will create an integrated task force on multimodal interfaces that respond intelligently to speech, gestures, vision, haptics and direct brain connections by merging into a single research group excellent European laboratories in Human-Computer Interaction (HCI) and in Signal Processing.
SIMILAR will develop a common theoretical framework for fusion and fission of multimodal information using the most advanced Signal Processing tools constrained by Human Computer Interaction rules.
SIMILAR will develop a network of usability test facilities and will establish an assessment methodology.
SIMILAR will develop a common distributed software platform available for researchers and the public at large through www.openinterface.org
SIMILAR will address Grand Challenges in the field of edutainment, interfaces for disabled people and interfaces for medical applications.
SIMILAR will establish a top-level foundation which will manage an International Journal, Special Sessions in existing conferences, organize summer schools, interact with key European industrial partners and promote new research activities at the European level.
TCTS Lab's contribution will be on Grand Challenges related to TTS and ASR technologies, and their integration into a multimodal framework. We will also work on enhancing Brain Computer Interfaces. SIMILAR is considered a central project for the evolution of our lab.
The DIALOGUE project (2000 - 2004)
This book addresses the problems of spoken dialogue system design and especially automatic learning of optimal strategies for man-machine dialogues. Besides the description of the learning methods, this text proposes a framework for realistic simulation of human-machine dialogues based on probabilistic techniques, which allows automatic evaluation and unsupervised learning of dialogue strategies. This framework relies on stochastic modelling of modules composing spoken dialogue systems as well as on user modelling. Special care has been taken to build models that can either be hand-tuned or learned from generic data.
Research staff
Academics
Thierry DUTOIT | Full Professor, TCTS Lab - FPMs | tel: +32 65 37 47 74 | thierry.dutoit
Joël HANCQ | Full Professor, Head, TCTS Lab - FPMs | tel: +32 65 37 47 30 | joel.hancq
Researchers
Frédéric BETTENS | Senior researcher, PhD, TCTS Lab - FPMs | tel: +32 65 37 47 65 | frederic.bettens
Anderson MILLS | Researcher, PhD, TCTS Lab - FPMs | tel: +32 65 37 47 31 | john.mills
Alexis MOINET | Researcher, PhD Student, TCTS Lab - FPMs | tel: +32 65 37 47 47 | alexis.moinet