TCTS Lab Research Groups

Welcome to the ASR group homepage

Current research on speech recognition covers speaker-independent isolated word recognition, large vocabulary continuous speech recognition based on keyword spotting, speaker adaptation, and robust speech recognition using multi-band models (patent pending), in the framework of the THISL, RESPITE, SPRACH, and LKIT research projects.

Our twenty years of experience in this area are embodied in the STRUT speech recognition software toolkit, which includes hybrid HMM/ANN (Hidden Markov Models/Artificial Neural Networks) speech modeling, for which we are recognized as one of the major research units worldwide.

Applications: speech recognition systems for embedded applications (car navigation, consumer products), computer-assisted language learning, multimodal interfaces.

See "A short introduction to speech recognition" for further details about ASR technology.

Check the speech recognition part of comp.speech for interesting links to ASR sites.

Research areas

Through several European and national projects, and in collaboration with the MULTITEL ASBL research center, the ASR team addresses many aspects of the speech recognition problem:

  • Hybrid HMM/ANN: combining hidden Markov models with artificial neural networks is a powerful alternative to classical stochastic models. This technique has been extensively studied and is now used as the baseline approach of our ASR systems. 

  • Software engineering: over the last few years we have built the STRUT toolkit for the fast development of ASR applications. Based on a plug-and-play programming philosophy, this software library implements many ASR-related algorithms: signal processing, feature extraction, GMM, ANN, Viterbi decoding, state alignment, ...

  • Robust speech recognition: through several projects (RESPITE, MODIVOC, AURORA) we have investigated the problem of robust speech recognition in depth: spectral subtraction, Wiener filtering, noise estimation, multi-band, mixture of experts, missing data, microphone arrays, ...

  • Keyword spotting: keyword spotting is crucial in real-life applications, where it helps handle spontaneous speech in man-machine dialogues. Alongside keyword spotting, the rejection of out-of-vocabulary or poorly recognized words has also been studied through the estimation of relevant confidence levels.

  • Model adaptation: fast model adaptation makes it possible to update the statistical models with very little data. Noise adaptation and speaker adaptation have been studied, particularly in the framework of hybrid HMM/ANN systems.

  • Automatic phonetization

  • Microphone arrays

  • Dialogue management

  • Distributed speech recognition: adapting voice interfaces to mobile environments is constrained by limited hardware capacity. Distributed speech recognition performs part of the processing (as light as possible) on the mobile device, transmits an internal data representation, and performs the rest of the recognition on fixed servers. This approach keeps the CPU load on the mobile device and the bandwidth used under control while preserving ASR performance. This problem has been addressed through the MODIVOC and AURORA projects.

  • Embedded speech recognition.
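
As an illustration of the hybrid HMM/ANN approach listed above, the following is a minimal sketch of how network outputs are commonly turned into HMM emission scores and decoded with the Viterbi algorithm. It is not taken from STRUT: the function names, the two-state toy model, and all numerical values are assumptions made for the example. The only technique shown is the standard one of dividing the state posteriors estimated by the network by the state priors to obtain scaled likelihoods.

```python
import math


def scaled_log_likelihoods(posteriors, priors, floor=1e-12):
    """Turn per-frame state posteriors P(q|x) into log(P(q|x) / P(q)).

    In a hybrid HMM/ANN system these scaled likelihoods replace the
    emission densities of a classical HMM. Values are floored to avoid log(0).
    """
    return [
        [math.log(max(p, floor)) - math.log(max(pr, floor))
         for p, pr in zip(frame, priors)]
        for frame in posteriors
    ]


def viterbi(log_emissions, log_trans, log_init):
    """Return the best state path and its log score (all inputs in log domain)."""
    n_states = len(log_init)
    delta = [log_init[s] + log_emissions[0][s] for s in range(n_states)]
    backpointers = []
    for frame in log_emissions[1:]:
        new_delta, pointers = [], []
        for s in range(n_states):
            # Best predecessor of state s at this frame.
            best = max(range(n_states), key=lambda r: delta[r] + log_trans[r][s])
            pointers.append(best)
            new_delta.append(delta[best] + log_trans[best][s] + frame[s])
        delta = new_delta
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: delta[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path, max(delta)


# Toy example (hypothetical values): 2 states, 4 frames of network posteriors.
posteriors = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]]
priors = [0.5, 0.5]
log_trans = [[math.log(0.7), math.log(0.3)],
             [math.log(0.3), math.log(0.7)]]
log_init = [math.log(0.5), math.log(0.5)]

path, score = viterbi(scaled_log_likelihoods(posteriors, priors), log_trans, log_init)
print(path)  # [0, 0, 1, 1]
```

Working in the log domain avoids numerical underflow on long utterances; a real recognizer would additionally constrain the transitions with a lexicon and a language model.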

Ongoing projects

2009 - 2011
Do-it-Yourself Smart Experiences

COST 2102
2007 - 2011
COST 2102

2004 - 2008

1996 - 2000
Speech Training and Recognition Unified Tool

Former projects

2010 - 2014
PhD Thesis Maria Astrinaki

2008 - 2015

KWS Predict
2007 - 2008
KWS Prediction

2005 - 2008
Multimodal Search Interface for Audiovisual content

2004 - 2006
Interface Créative & Conception

2004 - 2006

2004 - 2007
Mobile Access Information System

2002 - 2004
Systèmes MObiles et DIstribués à interface VOCale

COST 278
2001 - 2008
Spoken Language Interaction in Telecommunication

2000 - 2003
ARchitecture de Télécommunication Hospitalière pour les services d'Urgence

2000 - 2004
PhD Thesis Olivier Pietquin

2000 - 2004
PhD Thesis Erhan Mengusoglu

1999 - 2002
REcognition of Speech by Partial Information Techniques

1998 - 1999

1997 - 2000
THematic Indexing of Spoken Language

1995 - 1998
SPeech Recognition Algorithms for Connectionist Hybrids

COST 250
1995 - 2000
Automatic Speaker Recognition over the Telephone Network

COST 249
1994 - 2000
Continuous Speech Recognition Over the Telephone Network

1994 - 2005
Object-Oriented Block Processing

1993 - 1995
HIdden MARkov models and Neural NETworks