Research Projects
 
 

Our Research

The TCTS Lab turned to digital processing techniques as early as the 1970s, mostly through the study of digital filters. Its research activities then naturally led it to investigate many areas of 1D signal processing, such as speech processing and biomedical engineering. It later expanded to 2D signal processing with the creation of an image research group.
Finally, a new, more cross-cutting research axis was added, dealing with the adaptation of biological attention to computers. Computational attention can be applied to 1D audio signals, 2D still images and 2.5D videos alike.

More recently, our lab has become involved in the digital arts through the Numediart project.

You can find here our R&D groups, ongoing and past R&D projects and PhD theses. You may also take a look at our publication page.


Our R&D groups
Speech Analysis and Synthesis
Speech Recognition
Audio Processing
Sensors and Data Fusion
Biomedical Data Processing
Hardware and Software for Signal Processing
Image Processing




Ongoing R&D projects

The DIGISTORM project (2016 - 2020)

The ROBIGame project (2014 - 2017)

The i-Treasures project (2013 - 2017)

i-Treasures (Intangible Treasures - Capturing the Intangible Cultural Heritage and Learning the Rare Know-How of Living Human Treasures FP7-ICT-2011-9-600676-i-Treasures) is an Integrated Project (IP) of the European Union's 7th Framework Programme 'ICT for Access to Cultural Resources'. The project started on February 1, 2013, and will last 48 months.
Cultural expression is not limited to architecture, monuments or collections of artifacts. It also includes fragile intangible live expressions, which involve knowledge and skills. Such expressions include music, dance, singing, theatre, human skills and craftsmanship. These manifestations of human intelligence and creativeness constitute our Intangible Cultural Heritage (ICH). ICH is at the same time traditional, contemporary and living, because it does not only refer to inherited knowledge but also to the renewal of contemporary cultural expressions. It refers to the past, to the present, and, certainly to the future and is the mainspring of humanity's cultural diversity.
The main objective of i-Treasures is to develop an open and extendable platform to provide access to ICH resources, enable knowledge exchange between researchers and contribute to the transmission of rare know-how from Living Human Treasures to apprentices.

The JOKER project (2013 - 2016)

This project will build and develop JOKER, a generic intelligent user interface providing a multimodal dialogue system with social communication skills including humor, empathy, compassion, charm, and other informal socially-oriented behavior.
JOKER will emphasize the fusion of verbal and non-verbal channels for emotional and social behavior perception, interaction and generation capabilities. Our paradigm invokes two types of decision: intuitive (mainly based upon non-verbal multimodal cues) and cognitive (based upon fusion of semantic and contextual information with non-verbal multimodal cues). The intuitive type will be used dynamically in the interaction at the non-verbal level (empathic behavior: synchrony of mimicry such as smiles and nods) but also at the verbal level for reflex small talk (politeness behavior: verbal synchrony with hello, how are you, thanks, etc.). Cognitive decisions will be used for reasoning on the strategy of the dialog and deciding more complex social behaviors (humor, compassion, white lies, etc.), taking into account the user profile and contextual information.
JOKER will react in real-time with a robust perception module (sensing user's facial expressions, gaze, voice, audio and speech style and content), a social interaction module modelling user and context, with long-term memories, and a generation and synthesis module for maintaining social engagement with the user.

The SonixTrip project (2013 - 2016)

The PREDATTOR project (2012 - 2015)

Based on neuro-psychological and human attention modelling research, the Predattor prototype is able to automatically compute an attention map of any image. This map shows where you will gaze when you see an image, and gives results very close to eye-tracking data. Predattor focuses on neuromarketing and helps you optimize your website or online ads. Because it is cheap and fast, you can use Predattor throughout the creative process until your key message is visible enough and you stand out from the competition.

The DiYSE project (2009 - 2011)

The Do-it-Yourself Smart Experiences project (DiYSE) aims at enabling ordinary people to easily create, set up and control applications in their smart living environments as well as in the public Internet-of-Things space, allowing them to leverage aware services and smart objects to obtain highly personalised, social, interactive, flowing experiences at home and in the city.

The BIOFACT project (2009 - 2012)

In the field of orthoses and prostheses, products offered daily to patients suffering from paralysis or limb amputation, little progress has been made despite significant technological developments in the manufacturing industry and in industrial robotics. However, advances in neurophysiological knowledge and in microelectronic and IT technologies should allow better use of biological signals that can be acquired non-invasively, such as the electroencephalogram (EEG) and the electromyogram (EMG), to supplement the deficient motor commands of people with disabilities. Researchers around the world agree that this century will be the one that integrates human cerebral capacities with devices made from new materials. Biomanufacturing, the production of biocompatible parts by additive manufacturing, covers four fields of application: medical devices, orthoses and prostheses, tissue engineering, and anatomical models for decision support. Although the last sector is already well represented, the latest EBM (Electron Beam Melting) developments open new perspectives for the direct manufacturing of biocompatible titanium and stainless steel parts. Today, there is still no tool for the digital simulation of the EBM process. Developing one within the MORFEO digital simulation platform will add an essential link in the chain of digital manufacturing of innovative products. As biomanufacturing R&D projects multiply throughout Europe, it is time to turn results into services offered to businesses active in the sector. The creation of this biomanufacturing platform in Hainaut will help meet this growing demand from businesses.

The RECITE project (2007 - 2009)

RECITE aims at extending OCR applications in machine vision to objects with different surfaces (metal and so on) and widely varying characters. Close to natural scene text understanding, this project focuses on interactively configurable recognition software, in order to give access to non-expert users (in SMEs, for instance). Hence, the main goal is to enable the creation of dedicated recognizers for particular applications. Through smart dialogues between the computer and the end user, the particularities of the application and the degradations present in the images will be defined semi-automatically in order to build an efficient recognizer. The project also addresses challenges such as the extraction and recognition of engraved or embossed characters, which are a limitation of systems dealing with natural scene text. In that context, a first example is used to make the model more versatile: the recognition of characters engraved on metallic, reflective surfaces in an uncontrolled environment.

The TRANSLOGISTIC project (2007 - 2011)

TransLogisTIC is an ambitious research project financed by the Walloon Region (2.5 years, 14 million euros), built around a long-term strategy aimed at developing a complete and efficient multimodal transport system in Wallonia as well as high-quality logistics services with high added value. Supported by internationally recognized Walloon actors (10 enterprises and 5 universities), the project will result in the creation of innovative and efficient products and services.

The COST 2102 project (2007 - 2011)

The main objective of the Action is to develop an advanced acoustical, perceptual and psychological analysis of verbal and non-verbal communication signals originating in spontaneous face-to-face interaction, in order to identify algorithms and automatic procedures capable of identifying human emotional states. Several key aspects will be considered, such as the integration of the developed algorithms and procedures for application in telecommunication, and for the recognition of emotional states, gestures, speech and facial expressions, in anticipation of the implementation of intelligent avatars and interactive dialogue systems that could be exploited to improve user access to future telecommunication services.

The IAP - DYSCO project (2007 - 2011)

The Interuniversity Attraction Poles programme is managed by the Belgian Science Policy Office (BELSPO). The programme was created in 1987 by Guy Verhofstadt, then Minister for Scientific Affairs. It is presently in its sixth phase. Full information about the programme can be found at the IAP website of BELSPO. The aim of the programme is to fund basic research by promoting collaborative research through networks of research teams in different Belgian universities. There are presently 44 networks in Belgium, covering all disciplines. The year 2007 is the starting year for Phase VI of our IAP network "Dynamical systems, control and optimization" (DYSCO), which covers the period 2007 - 2011. Several of the DYSCO teams have participated in previous IAP networks.

The PIST project (2004 - 2008)

The PIST project (for Safe and Intelligent Positioning for Transport) deals with the development of sensor fusion systems for the self-positioning of vehicles (navigation). The PIST team will design the algorithms for use in applications where safety is crucial, such as railway signalling. The project combines aspects of signal processing, data fusion, system modelling and integrity assessment.
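As an illustration of what sensor fusion for positioning can mean in its simplest form, the sketch below combines two independent position estimates by inverse-variance weighting. This is a generic textbook technique, not the PIST algorithms, which the description above does not specify. The function name `fuse` and the odometer/GNSS framing are hypothetical.

```python
def fuse(pos_a, var_a, pos_b, var_b):
    """Fuse two independent position estimates (hypothetical helper).

    Each estimate is weighted by the inverse of its variance, so the
    more certain sensor dominates. Returns the fused position and its
    variance, which is never larger than either input variance.
    """
    weight_a = 1.0 / var_a
    weight_b = 1.0 / var_b
    fused_var = 1.0 / (weight_a + weight_b)
    fused_pos = fused_var * (weight_a * pos_a + weight_b * pos_b)
    return fused_pos, fused_var


if __name__ == "__main__":
    # Assumed example values: an odometer and a GNSS receiver reporting
    # a train's position along the track, in metres, with variances in m^2.
    position, variance = fuse(100.0, 1.0, 104.0, 3.0)
    print(position, variance)
```

The fused variance gives a first, crude handle on integrity assessment: a safety-critical application would compare it against the accuracy the application requires before trusting the position.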

The Edutain project (2004 - 2008)

The R&D activities of TCTS Lab in the area of edutainment and speech communication have led to the development of real-time voice interfaces based on acoustic features. Such tools play an important part in the voice control of information systems, as studied in a multi-modal perspective by the SIMILAR European Network of Excellence.

The TTSBOX project (2004 - 2008)

TTSBOX performs the synthesis of Genglish (for "Generic English"), an imaginary language obtained by replacing English words by generic words. Genglish therefore has a rather limited lexicon, but its pronunciation maintains most of the problems encountered in natural languages. TTSBOX uses simple data-driven techniques (Bigrams, CARTs, NUUs) while trying to keep the code minimal, so as to keep it readable for students with reasonable MATLAB practice.

The MaxMBROLA project (2004 - 2008)

The main topics of this research project are:

  • The development of a flexible external object for Max/MSP (4.5) encapsulating the main features of the MBROLA speech synthesizer and the adaptation of the MBROLA functions to the asynchronous request-based architecture of the Max/MSP environment.

  • Discussions and Max/MSP developments about the real-time control issues in the phonetic/prosodic content generation process. This research topic is a good "first-trial" concerning overall issues of real-time manipulation of concatenation-based signals.

  • Propositions of various real-time concatenation-based applications (standalone, virtual instruments or Max/MSP patches) allowing performers to produce versatile voice with standard musical devices.

The LASEF project (2004 - 2008)

The aim of this project is to demonstrate a LIDAR (LIght Detection And Ranging) system for detecting turbulence and air flows, and to establish a theoretical model of it. The system will be based on detecting the movement of airborne particles through the Doppler effect. This technique involves a laser source coupled with a device that detects the light backscattered by the particles. The project will culminate in the demonstration of a LIDAR-type detection measurement, using a reliable device that can be transported in the field. Landings and take-offs at airports are so frequent that it is essential to verify that the distance between aircraft is sufficient, particularly in the wake of large aircraft (notably the A380). The direct application sought is therefore the measurement of turbulence when an aircraft lands or takes off.
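The Doppler principle behind such a LIDAR can be sketched in a few lines. For light backscattered by a particle, the frequency shift is twice the radial velocity divided by the laser wavelength. The function names and the 1.55 µm wavelength below are assumptions for illustration; the project description does not state which laser is used.

```python
def doppler_shift_hz(radial_velocity_ms, wavelength_m):
    """Frequency shift of backscattered light for a given radial velocity.

    The factor 2 accounts for the round trip: the particle both receives
    and re-emits Doppler-shifted light. Assumed sign convention: positive
    velocity means motion toward the instrument.
    """
    return 2.0 * radial_velocity_ms / wavelength_m


def radial_velocity_ms(shift_hz, wavelength_m):
    """Invert the relation above to recover the radial velocity."""
    return shift_hz * wavelength_m / 2.0


if __name__ == "__main__":
    wavelength = 1.55e-6  # assumed laser wavelength in metres (hypothetical)
    shift = doppler_shift_hz(10.0, wavelength)  # particle moving at 10 m/s
    print(shift, radial_velocity_ms(shift, wavelength))
```

Only the velocity component along the laser beam is measured this way, which is why a single beam gives a radial velocity rather than a full wind vector.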

The HCR-NN project (1998 - 2002)

Off-Line Handwritten Character Recognition using Neural Networks

The STRUT project (1996 - 2000)

The Speech Training and Recognition Unified Tool (STRUT) has been developed for research on speech recognition and for the fast development and testing of related applications. The software can perform speech analysis, model training and speech recognition. The tool consists of many "independent" small pieces of code, one for each identified module in the speech recognition process: sampling, feature extraction, clustering, probability estimation, and decoding. It is now being extended (version 2.0) in collaboration with MULTITEL ASBL.

The MBROLA project (1995 - 1999)

The goal of the MBROLA project is to obtain a set of high-quality speech synthesizers for as many languages as possible, free for use in non-commercial applications. The ultimate goal is to boost academic research on speech synthesis, and particularly on prosody generation, known as one of the biggest challenges in Text-to-Speech Synthesis for the years to come. As of 2003, 26 languages are available, and more than 50 voices. Many other languages are in preparation. The software has been compiled on 21 machine/OS combinations.


Past R&D projects

The SLOWDIO project (2013 - 2015)

In this project we implement new methods for time-stretching of stereo audio signals, especially audio generated during sport events. This will enable viewers to watch slow-motion videos with synchronous time-stretched quality-preserved sound.

The HandSketch project (2012 - 2014)

Development of a new digital musical instrument that will give a musician the possibility to perform synthetic singing on stage.

The ILHAIRE project (2011 - 2014)

ILHAIRE is funded under the Future and Emerging Technologies (FET) chapter of the 7th framework program for research of the European Union, a very competitive line of research funding where less than 6% of research proposals get funded. It intends to study the role of laughter during interactions between humans and machines and to develop new paradigms for natural man-machine interactions, including through anthropomorphic avatars that may play an important role in future digital media. The project focuses in particular on non-verbal social communication cues related to smiles and laughter, within a framework that considers laughter as part of dialogs, and uses technologies for accurate multimodal capture of the different facets of social communication (voice, gestures, posture and facial expressions). The ILHAIRE consortium is composed of an interdisciplinary team of nine organizations.

The LinkedTV project (2011 - 2015)

LinkedTV is supported under the Networked Media and Search Systems strategic objective of the 7th framework program for research of the European Union. The project aims to provide a novel practical approach to Networked Media based on four phases: annotation, interlinking, retrieval, and presentation. LinkedTV will make it possible to seamlessly connect multimedia content on the Web by integrating networked media analysis, personalization and presentation technologies within a coherent framework. UMONS is involved in novel approaches for gathering user preferences through behavior analysis, and for presentation interfaces facilitating content and video search. The LinkedTV consortium is composed of twelve organizations, and is led by Fraunhofer IAIS.

The COMPTOUX project (2010 - 2013)

The COST IC0903 project (2009 - 2013)

Knowledge Discovery from Moving Objects (MOVE)

The main objective of the Action is to develop improved methods for knowledge extraction from massive amounts of data regarding moving objects. This Action aims to build a network for collaboration that leads to the improvement of ICT methods for knowledge extraction from massive amounts of data about moving objects. This knowledge is essential to substantiate decision making in public and private sectors. Moving object data typically include trajectories of concrete objects (e.g. humans, vehicles, animals, and goods), as well as trajectories of abstract concepts (e.g. spreading diseases). While movement records are nowadays generated in huge volumes, methods for extracting useful information are still immature, due to fragmentation of research and lack of comprehensiveness from monodisciplinary approaches. Overcoming these limitations calls for COST-like networking. In response to a strong expression of interest from the academic, industrial, and user communities, this Action will empower the development of substantial and widely applicable methods in mobility analysis, focusing on representation and analysis of movement, including spatio-temporal data mining, and visual analytics. Results will be demonstrated through showcases for decision makers. Researchers from various subdomains in computer and geographic information sciences will join domain specialists from a broad range of relevant applications, from courier services and transportation to ecology, and epidemiology, among others. This will make Europe a central stakeholder in an emerging key domain.

The EUCogII project (2009 - 2012)

EUCogII is a European network for researchers in artificial cognitive systems and related areas who want to connect to other researchers and reflect on the challenges and aims of the discipline. The network funds meetings, workshops, members' participation in academic events, faculty exchanges and other activities that further its aims. It continues and builds on the work of the FP6 euCognition network (2006-2008). EUCogII is funded by the Information and Communication Technologies division of the European Commission, Cognitive Systems and Robotics unit, under the 7th Research Framework Programme. FP7-ICT-EUCogII-231281

The MediaTIC project (2008 - 2015)

The MediaTIC portfolio was submitted in September 2007 in response to the first call for proposals of the ERDF and started on 1 July 2008. This ambitious project falls within the scope of measure 2.2, dedicated to exploiting the potential of research centres. More concretely, the project's objective is to increase the competitiveness of innovative technology SMEs in Wallonia through collective projects driven by concrete industrial requests. It works as a cross-cutting action for innovation in the ICT component of each strategic line defined by the Walloon Marshall Plan.
To reach that goal, Multitel, as project leader, has gathered a consortium of academic entities and research centres spread across the Walloon territory. MediaTIC was submitted under both objectives of the 2007-2013 European structural funds programme, namely "Convergence" and "Regional Competitiveness and Employment". The project relies on the know-how of laboratories such as the SEMI, TCTS and Telecommunications units of the Faculté polytechnique de Mons, the TELE laboratory of the Catholic University of Louvain-la-Neuve, the research units in microelectronics (Microsys) and signal & image processing (Intelsig) of the University of Liege, the Centexbel and SIRRIS research centres and, finally, the GIE MUWAC. By calling upon complementary partners, Multitel aimed to give MediaTIC the leverage typical of collaborative research and to focus the projects on common objectives.
MediaTIC is a portfolio of six integrated projects oriented towards specific industrial needs. Each one is run by a specialist from Multitel in the targeted field. These thematic platforms are Transmedia, Envimedia, Tracemedia, Intermedia, 3Dmedia and Optimedia.

The OLIMP project (2008 - 2013)

Live interactive multimedia (LIM) applications demand very high performance to meet requirements for quality and processing speed. This project focuses on the study and development of software and hardware tools to meet the needs of real-time multimedia users. In particular, we are currently working on exploiting graphics processors (GPUs) for compute-intensive image processing, both for medical applications (edge and motion detection) and for digital arts applications (motion analysis, detection and tracking, and compositing of virtual elements into real images). On this last point, there is close collaboration with the Numediart Programme of Excellence (see http://www.numediart.org).

The CALLAS project (2007 - 2010)

CALLAS ("Conveying Affectiveness in Leading-Edge Living Adaptive Systems") is a European Integrated Project (FP6). It aims at designing and developing multimodal architectures that give strong importance to emotions, for Arts and Entertainment. The overall idea of the project is that new media, targeting the recognition and production of emotions, can enhance users' (or spectators') experience and interaction. CALLAS is thus investigating how, at the input level, emotions can be detected and how, at the output level, these emotions can be processed to generate new audiovisual content enriching users' experience. The input modalities include both vocal and body language (recorded through video cameras and haptic devices). In order to improve the recognition of emotions, the problem of merging the information coming from these different modalities will also be examined. Applications range from digital theatre productions (playing audio or visual content in relation to the actors' and spectators' feelings) to real or virtual museum tours (taking the visitor's interest into account to reshape the exhibition and select the level of information the audioguide will give), and interactive television (modifying a scenario according to the spectator's emotions).

The NUMEDIART project (2007 - 2012)

Numediart is a long-term research programme centered on Digital Media Arts, funded by Région Wallonne, Belgium (grant N°716631). Its main goal is to foster the development of new media technologies through digital performances and installations, in connection with local companies and artists.
It is organized around three major R&D themes: HyFORGE - hypermedia navigation, COMEDIA - body and media, COPI - digital instrument making. It is performed as a series of short (3-month) projects, typically 3 or 4 of them in parallel, each concluded by a 1-week "hands-on" workshop.
Numediart is the result of collaboration between Polytech.Mons (Information Technology R&D Department) and UCL (TELE Lab), with a center of gravity in Mons, the cultural capital of Wallonia. It also benefits from the expertise of the Multitel research center on multimedia and telecommunications. As such, it is the R&D component of MONS2015, a broader effort towards making Mons the cultural capital of Europe in 2015.

The COST SID project (2007 - 2011)

Sonic Interaction Design is the exploitation of sound as one of the principal channels conveying information, meaning, and aesthetic/emotional qualities in interactive contexts. The Action proactively contributes to the creation and consolidation of new design theories, tools, and practices in this innovative and interdisciplinary domain. While being advanced through a few sparse projects, this field relies on the COST - SID Action to strengthen the links between scientists, artists, and designers in the European Research Area. The COST - SID platform stands on four legs: (i) perception, cognition, and emotion; (ii) design; (iii) interactive art; (iv) information display and exploration. These are each supported by the research and development of the requisite new interactive technologies. Due to the breadth of its application spectrum, the COST - SID Action has the potential of affecting everyday life through physical and virtual interactive objects, as today there is the possibility to design and actively control their acoustic response so that it conveys an intended aesthetic, informational, or emotional content.

The SERKET project (2006 - 2009)

The goals of SERKET are twofold:

  • define the requirements and the specifications of an open security platform for public places and events
  • demonstrate the new architectural principle for security systems on realistic scenarios, by integrating heterogeneous sensors (video, audio, human, etc.), by applying advanced fusion technologies to multimedia information and by automatically assessing threats.

The ECLIPSE project (2006 - 2012)

There are various methods of analysis aiming at classifying vocal pathologies, but none is really powerful. First, "perceptual" analysis allows the doctor to rate the quality of the voice according to several criteria; the problem with this method is the subjectivity of the judgement. That is why specialists prefer "acoustic" analysis, a computer-assisted method that consists in computing from the vocal signal a series of objective parameters used to characterize the patient's voice. But this method is only effective for analyzing sustained vowels, and not continuous speech, which would be more suitable. Moreover, strongly hoarse speakers are unable to produce pseudoperiodic speech.
The ECLIPSE project aims to develop acoustic analysis software for any type of voice and any degree of hoarseness. The project implements the simultaneous analysis of vocal signals and images of vocal cord vibration and aims, in addition to building a clinical prototype, to build a portable device intended to monitor at-risk patients at their workplace.

The TANIA project (2006 - 2009)

Within the TANIA project, we aim at designing a decision support tool for anesthesiologists. The research involves diverse fields of applied mathematics, in particular data mining and signal processing techniques.

The MOUSTIC project (2005 - 2007)

The MOUSTIC project aims at developing new frameworks, complementary to the existing ones, for the dissemination of road information in Wallonia. It would use new dissemination channels, which we propose to develop and integrate into the existing stages of the WHIST project (Walloon Highway Information System for Traffic). The system creates a free communication channel on top of existing radio broadcasts: during transmission, information is hidden in the form of pseudo-random noise that is inaudible to humans. A low-cost receiver will decode this information and either synthesize it as speech or display it on a screen.
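Hiding data as pseudo-random noise is the idea behind direct-sequence spread-spectrum watermarking, and the sketch below shows that generic technique rather than the MOUSTIC implementation, whose details are not given here. Each bit is spread over many low-amplitude ±1 "chips" added to the audio, and the receiver recovers it by correlating with the same seeded sequence. All names and the values of `CHIPS_PER_BIT` and `ALPHA` are hypothetical, and no psychoacoustic shaping is modelled, so real inaudibility is not demonstrated.

```python
import math
import random

CHIPS_PER_BIT = 4096  # hypothetical spreading factor (samples per hidden bit)
ALPHA = 0.1           # hypothetical noise amplitude for a unit-amplitude host


def pn_sequence(seed, length):
    """Pseudo-random +/-1 sequence shared by transmitter and receiver."""
    rng = random.Random(seed)
    return [rng.choice((-1.0, 1.0)) for _ in range(length)]


def embed(host, bits, seed):
    """Add each bit to the host audio as low-level signed pseudo-noise."""
    pn = pn_sequence(seed, len(bits) * CHIPS_PER_BIT)
    marked = list(host)
    for i, bit in enumerate(bits):
        sign = 1.0 if bit else -1.0
        for j in range(i * CHIPS_PER_BIT, (i + 1) * CHIPS_PER_BIT):
            marked[j] += ALPHA * sign * pn[j]
    return marked


def decode(received, n_bits, seed):
    """Recover bits from the sign of the correlation with the sequence."""
    pn = pn_sequence(seed, n_bits * CHIPS_PER_BIT)
    bits = []
    for i in range(n_bits):
        start = i * CHIPS_PER_BIT
        corr = sum(received[j] * pn[j]
                   for j in range(start, start + CHIPS_PER_BIT))
        bits.append(1 if corr > 0 else 0)
    return bits


def demo():
    """Hide 8 bits in a 440 Hz tone sampled at 44.1 kHz and read them back."""
    bits = [1, 0, 1, 1, 0, 0, 1, 0]
    n_samples = len(bits) * CHIPS_PER_BIT
    host = [math.sin(2 * math.pi * 440 * t / 44100) for t in range(n_samples)]
    marked = embed(host, bits, seed=1234)
    return bits, decode(marked, len(bits), seed=1234)


if __name__ == "__main__":
    sent, received = demo()
    print(sent, received)
```

Spreading each bit over thousands of samples is what lets the receiver detect noise that is far weaker than the programme audio: the correlation grows with the number of chips, while the contribution of the uncorrelated host grows only with its square root.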

The IRMA project (2005 - 2008)

The objective of IRMA is to design and develop an innovative modular interface for personalised, efficient, secure and economically viable multimodal search and navigation in indexed audiovisual databases. It will allow contextual, intuitive and natural search complemented by fluid navigation. In this way, IRMA will provide an environment that makes the best use of the intelligence of the search engine's user.

The COST 277 project (2004 - 2005)

The main objective of this COST Action is to improve the quality and capabilities of the voice services for telecommunication systems through the development of new nonlinear speech processing techniques. The proposed new mathematical methods are expected to provide advances in generic speech processing functions. Examples of these are: higher quality speech synthesis, more efficient speech coding, improved speech recognition, and improved speaker identification.

The IC&C project (2004 - 2006)

The IC&C project aims to develop a natural human-machine interface for computer-aided drawing and design systems. Unlike conventional interfaces such as mice, keyboards, icons and menus, the IC&C project proposes a novel interface based on software agents that combine the interpretation of freehand sketches, image interpretation and speech recognition.

The DOMINI project (2004 - 2006)

This project deals with the development of computerized medical records, which calls upon competences in the analysis of hospital needs, mastery of data-processing technologies, and computational linguistics. It also requires taking into account the legal aspects related to the protection of privacy and medical data.

The F3M project (2004 - 2008)

The goal of the project is to assess the usability of a solution based on a wearable computer connected through a wireless network to improve the workflow in field maintenance, for instance for planes in the aviation sector and for trains in the railway sector. Our concept will equip every field technician with a mobile wearable computer allowing real-time communication with colleagues and with a central server that supervises the whole field maintenance process and is connected to the existing, more traditionally used maintenance database.

The MAIS project (2004 - 2007)

The objective of MAIS is to develop a low-cost, low-consumption, secure smart card that will be readable from a distance. The main applications of the project will be freight train traceability and inclusion in windshields. For this last application, the project partners work in close collaboration with Glaverbel.

The DREAMS project (2003 - 2008)

Sleep scoring is essential for the detection of sleep pathologies in hospitals. It is usually performed manually by visual inspection of polysomnograms (PSG: mainly EEG, EMG and EOG). Automated techniques exist, but fail to provide reliable results for pathological sleep.
The DREAMS project aims precisely at producing automated sleep scoring techniques that remain reliable in the presence of sleep pathologies.

The iMed project (2003 - 2006)

The iMed project is about the design of a method to automatically detect emboli in the vessel tree of the pulmonary artery, from HCT (helicoidal computed tomography) millimeter slices.

The MERCATOR project (2003 - 2007)

In the context of preoperative image visualization and computer-assisted surgical planning, the Mercator project aims at updating the plans made before the operation by integrating real-time information from intra-operative events, so that the plans and the initial data can be readjusted to the actual course of the operation or radiotherapy.

The SYPOLE project (2003 - 2006)

Blind or partially sighted people number 17.5 million in Europe and about 75,000 in Wallonia. For most of them, much information that exists in written or pictorial form is not easily accessible. The main aim of the Sypole project is to address these needs by building a prototype device that is portable, autonomous, small and easy to use for blind or partially sighted people. The device will be able to recognize text and coloured shapes, such as logos, and to automatically generate a speech signal.

Top

The SIMILAR project (2003 - 2007)

The SIMILAR European Network of Excellence will create an integrated task force on multimodal interfaces that respond intelligently to speech, gestures, vision, haptics and direct brain connections by merging into a single research group excellent European laboratories in Human-Computer Interaction (HCI) and in Signal Processing.
SIMILAR will develop a common theoretical framework for fusion and fission of multimodal information using the most advanced Signal Processing tools constrained by Human Computer Interaction rules.
SIMILAR will develop a network of usability test facilities and will establish an assessment methodology.
SIMILAR will develop a common distributed software platform available for researchers and the public at large through www.openinterface.org.
SIMILAR will address Grand Challenges in the field of edutainment, interfaces for disabled people and interfaces for medical applications.
SIMILAR will establish a top-level foundation which will manage an International Journal, Special Sessions in existing conferences, organize summer schools, interact with key European industrial partners and promote new research activities at the European level.
TCTS Lab's contribution will be on Grand Challenges related to TTS and ASR technologies, and their integration into a multimodal framework. We will also work on enhancing Brain-Computer Interfaces. SIMILAR is considered a central project for the evolution of our lab.

Top

The ARMAGEDDON project (2003 - 2004)

Armageddon is an opera sung and played in real time by human-controlled robots. It was created by Art Zoyd; the robot voices are taken from the MBROLA Project (under Max/MSP).

Top

The STOP project (2003 - 2006)

The STOP Project aims at studying the relationship between speech dynamics and voice quality, based on home-made tools for efficient source-tract separation.

Top

The CAPA project (2002 - 2004)

The CAPA (Automatic Classification of Agricultural Products) project involves four labs from three universities, which combine their respective skills to develop an automatic classification system for agricultural products, such as apples, according to the quality norms currently applied in practice. The quality will be estimated from any marks on the products, their color, or their shape. The aim is to obtain a concrete prototype that demonstrates the algorithmic and mechanical feasibility of automatically sorting fruits or vegetables.

Top

The MODIVOC project (2002 - 2004)

Speech-based interfaces are about to be used in many applications, the most demanding of which require recognizing any person (without prior training of the machine), even in noisy conditions. The techniques required to achieve this are mostly available, but their use in real portable applications is limited by their memory and CPU consumption. MODIVOC aims at:

  • simplifying ASR algorithms
  • increasing their robustness
  • dispatching CPU load among portable computers in a network
  • specifying generic models to apply this solution in heterogeneous environments

Top

The NUMBROLA project (2001 - 2005)

NUMBROLA is an extension of MBROLA towards corpus-based, non-uniform unit (NUU) selection techniques in speech synthesis. The goal of NUMBROLA is to provide a standard concatenative synthesizer to people active in NUU research. A French database and a first version of the software have been made available. We are currently working on an improved version, based on a modified MBROLA algorithm: TP-MBROLA.

Top

The COST 278 project (2001 - 2008)

The main objective of this Action is to create knowledge in several problem areas of spoken language interaction in telecommunications in order to achieve communicative interfaces that provide a natural human-computer interaction through more cognitive, intuitive and robust interfaces, whether monolingual, multilingual or multimodal. The scientific programme emphasises speech and dialogue processing in multimodal communication interfaces, issues related to robustness and multilinguality, human-computer dialogue theories, and models and systems and associated tools for the establishment of interactive systems. The programme also involves the evaluation of telecommunication applications in which spoken language is the only or one of many types of input or output modalities.

Top

The MLRR project (2000 - 2001)

The goal of this program is to transcribe a symbolic input, i.e. a string of symbols belonging to some alphabet, into a symbolic output according to a regular grammar described in terms of a system of multi-level rewriting rules (MLRR). "Symbols" and "alphabet" have to be understood here as generic terms: they can be characters, phonemes, syllables, words, phrases, etc. This project is closed but the software is available in Open Source format.
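
As a rough illustration of the idea (not the MLRR software itself, whose rule syntax is not described here), the following Python sketch applies ordered rewriting rules level by level to a string of symbols. The rule set, the `LEVELS` name and the `rewrite` function are hypothetical and chosen only for the example.

```python
import re

# Hypothetical rule set: each level is an ordered list of
# (pattern, replacement) pairs. Real MLRR rules may look different.
LEVELS = [
    # Level 1: simplify digraphs.
    [(r"ph", "f"), (r"ck", "k")],
    # Level 2: context-sensitive rule ("c" before e or i becomes "s"),
    # followed by a default rule for every remaining "c".
    [(r"c(?=[ei])", "s"), (r"c", "k")],
]

def rewrite(symbols: str, levels=LEVELS) -> str:
    """Apply every rule of every level, in order, to the input string."""
    for rules in levels:
        for pattern, replacement in rules:
            symbols = re.sub(pattern, replacement, symbols)
    return symbols

print(rewrite("phonetic"))  # fonetik
```

The output of one level is the input of the next, which is what makes the scheme multi-level; here the symbols are characters, but the same loop would work on any alphabet that can be encoded as a string.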

Top

The ARTHUR project (2000 - 2003)

The ARTHUR prototype system will be a point of convergence for the most advanced information technology research groups of the Walloon Region, around the theme of intelligent and user-friendly information technologies. By focusing on a specific activity, assistance to the interventions of an emergency physician, it becomes possible to model a complete chain in an integrated and original way, including research on topics as hot as intelligent voice-driven human-machine interfaces, multicast for secure communications, the creation and storage of active and secure multimedia documents, and user-friendly graphical interfaces.

Top

The RESPITE project (1999 - 2002)

REcognition of Speech by Partial Information TEchniques (ESPRIT Long Term Research RTD Project Ref. 28149)
RESPITE extended and applied two novel technologies, missing data theory and multi-stream theory, to the problem of robust automatic speech recognition (ASR), with particular application to cellular phones and in-car environments. It also supported studies whose purpose was to inform this endeavour. The specific measurable objectives were to:

  • develop techniques for identifying reliable data,
  • advance the theory of multi-stream processing,
  • advance the theory of missing and masked data handling,
  • inform the above by obtaining new perceptual data on speech recognition,
  • combine missing data and multi-stream processing with existing robust ASR methods,
  • evaluate all this within a framework of demonstrator ASR applications to cellular phones and in cars.

Top

The DEMOSTHENES project (1998 - 1999)

Acquiring a good command of spoken Dutch is a non-trivial task for most French-speaking learners of the language. To address this, two Belgian research teams have combined their expertise in speech recognition (Polytechnique - Mons) and software development for foreign language learning (Namur University) to produce multimedia courseware for Dutch pronunciation, which detects and corrects the typical errors made by French-speaking learners, using the hybrid HMM/ANN systems mastered at TCTS Lab. The final product discriminates pronunciation errors at the phoneme level.

Top

The EULER project (1997 - 2001)

For years, non-coordinated research effort on the design of text-to-speech (TTS) systems has led to unavoidable cross-system and cross-language incompatibility. The EULER project aimed at producing a unified, extensible, and publicly available research, development and production environment for multilingual TTS synthesis. EULER has led to the development of a corpus-based French TTS system. The project is no longer supported, but the software components are still available.
EULER has been reworked into eLITE, by the TTS team of MULTITEL ASBL.

Top

The MBRDICO project (1997 - 2001)

MBRDICO is a talking dictionary using MBROLA as a back-end speech synthesizer. Text processing is performed using a complete GNU GPL package for automatic phonetization training (letter/phoneme alignment, decision tree building, stress assignment) and duration/intonation generation. French, US English, and Arabic are available. We no longer work directly on this project, but all its sources are available for use or extension. This work is the result of a collaboration between:

  • Faculté Polytechnique de Mons
  • Carnegie Mellon University
  • University of Edinburgh

Top

The MBROLIGN project (1997 - 2001)

MBROLIGN is a fast MBROLA-based text-to-speech aligner. It is provided free for use in non-commercial applications. The goal of this project is to create large phonetically and prosodically labeled databases for as many languages as possible, thereby drastically expanding the reach of speech technology. This project is currently closed, but the software is available for database creation.

Top

The W project (1997 - 2001)

The W project aimed at creating a fast computer keyboard driver for people with speech disabilities. The related software is based on grade II Braille languages developed by associations of blind people all over the world and minimizes the number of keystrokes needed to utter a word (the name of the project is the grade II abbreviation for "word" in English). This project has been extended by MULTITEL ASBL in the framework of the FASTY EC/FP5 Project.

Top

The THISL project (1997 - 2000)

Thematic Indexing of Spoken Language (EC RTD Long Term Research Project 23495)
The aim of the THISL project was to produce a broadcast news retrieval demonstrator for the BBC. The approach adopted was to transcribe radio and television broadcasts using the Abbot speech recognizer and then to index the resulting transcriptions using the thislIR information retrieval system - similar to a web search engine - which allows users to search for news items of interest to them. ThislIR returns a list of news clips most relevant to each query which users can listen to. Demonstrators have been produced with both text and spoken query interfaces.
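
The transcribe-then-index approach can be sketched as follows. This is a minimal illustration and not the thislIR system: the transcripts are invented stand-ins for recognizer output, and the TF-IDF scoring, `build_index` and `search` are assumptions made for the example.

```python
import math
from collections import Counter, defaultdict

# Hypothetical transcripts, standing in for speech recognizer output.
TRANSCRIPTS = {
    "clip1": "the prime minister visited brussels today",
    "clip2": "heavy rain caused floods in the north",
    "clip3": "the minister announced flood relief funding",
}

def build_index(transcripts):
    """Build an inverted index: word -> {clip_id: term count}."""
    index = defaultdict(dict)
    for clip_id, text in transcripts.items():
        for word, count in Counter(text.lower().split()).items():
            index[word][clip_id] = count
    return index

def search(query, index, n_clips):
    """Rank clips by a simple TF-IDF score summed over the query words."""
    scores = defaultdict(float)
    for word in query.lower().split():
        postings = index.get(word, {})
        if not postings:
            continue  # word never occurs in any transcript
        idf = math.log(n_clips / len(postings))
        for clip_id, tf in postings.items():
            scores[clip_id] += tf * idf
    return sorted(scores, key=scores.get, reverse=True)

INDEX = build_index(TRANSCRIPTS)
print(search("minister brussels", INDEX, len(TRANSCRIPTS)))
```

In this toy setting the query "minister brussels" ranks clip1 first, because "brussels" occurs in only one transcript and therefore carries more weight than "minister". A real system would also have to cope with recognition errors in the transcripts, which this sketch ignores.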

Top

The SPRACH project (1995 - 1998)

SPeech Recognition Algorithms for Connectionist Hybrids (ESPRIT Long Term Research RTD Project Ref. 20077)
The goal of the proposed project is to further improve the current state-of-the-art in continuous speech recognition using Artificial Neural Network (ANN) and Hidden Markov Model (HMM) approaches. Pursuing the theoretical and development work successfully carried out under the WERNICKE project (ESPRIT Basic Research Project 6487, October 1992-October 1995), this new project, referred to as SPRACH (SPeech Recognition Algorithms for Connectionist Hybrids), will extend the research to robust and flexible speech recognition systems that can easily be adapted to new languages and new domains with new lexica and new syntaxes.

Top

The COST 250 project (1995 - 2000)

Speaker Recognition in Telephony

Top

The COST 249 project (1994 - 2000)

The main objective of the project is to co-ordinate research efforts in the area of multilingual continuous speech recognition for future public network services. This will be accomplished by establishing a unified language-independent speech recognition concept, and by investigating specific topics within the framework of this concept. In this way it should be possible to validate the partners' efforts in signal processing, statistical pattern recognition and linguistic processing in a more unified way.

Top

The OOBP project (1994 - 2005)

OOBP is a programming paradigm developed at TCTS Lab since 1994. It is defined as Object Oriented Programming around processes and combines OOP and block descriptions. Plug and Play Software extends OOBP by defining input and output data as abstract streams.

Top

The HIMARNNET project (1993 - 1995)

The project covers the development and assessment of neural network techniques for improving the robustness of medium vocabulary (50-100 words), speaker-independent, isolated word recognisers for telephone transmission quality speech. The dominant technology is Hidden Markov Models (HMMs) but this has significant limitations, some of which could be alleviated by the judicious use of artificial neural networks (ANNs) or hybrid combinations of both techniques. Direct comparisons of ANN-based, HMM-based, and hybrid ANN/HMM techniques for speech recognition will be made. The developments will be integrated and validated in the context of a telephone application including speech recognition capabilities. A number of prototypes have been demonstrated on low cost commodity systems. The telephone application developed within the project will be the basis for product development by Tedas.

Top



Ongoing PhD Theses

Saliency models for video protection applications (2015 - ) - PhD Thesis Pierre Marighetto

Saliency is, in computer science, the way to model visual attention, which is the ability to selectively focus on an external stimulus and take into account the most important information.

Many tools can be used for video protection. Each targets a specific goal, such as abandoned luggage detection, people tracking, abnormal event detection or crowd analysis. These methods need ground truth to learn normal and abnormal situations. However, they usually fail in conditions they have not learned. This issue could be resolved using saliency mechanisms.

By merging one of these approaches with saliency models, we could catch abnormal events, defined as salient events, and redefine normality.

Top

Thesis Thierry Ravet (2014 - ) - PhD Thesis Thierry Ravet

Top

Social Communicative Events Processing (2014 - ) - PhD Thesis Kévin El Haddad

Human-machine interactions are becoming more and more anchored in our daily lives. Yet the field remains poorly explored compared with what could be achieved in the future. For an easier and more natural interaction, human-machine dialogue is one of the most interesting sub-domains to develop. This thesis focuses on improving this dialogue by enhancing the machine's expressions on one side and its understanding of the users' messages on the other. The main strategy adopted so far is to "teach" the machine to imitate (synthesize) and understand (recognize) humans' social communicative signals and emotion expressions, such as fillers, laughter or confusion, and their "meaning" in all possible social contexts.

Top

Thesis Omar Seddati (2014 - ) - PhD Thesis Omar Seddati

Top

Thesis Gueorgui Pironkov (2014 - ) - PhD Thesis Gueorgui Pironkov

In recent years, deep neural networks have been outperforming Gaussian mixture models on various speech recognition and speech synthesis tasks. Deep neural networks have proven to be a powerful modeling tool. Thanks to their many levels of non-linearities, they can compute signal features that become increasingly invariant and discriminative as the networks become deeper. In this context, this thesis investigates architectural variants as well as different methods for adapting neural networks to particular voices. The performance of the different algorithms will be compared and evaluated for both speech recognition and speech synthesis. This thesis is part of the European Project HITNI 2.0.

Top

Real time motion recognition and motion quality assessment (2014 - ) - PhD Thesis Sohaib Laraba

The PhD subject is the design, realization and validation of a real-time gesture recognition methodology, based on skeleton tracking data, that takes into account the recognition or characterization of the motion "quality" or motion "style". The analysed motions will be traditional and contemporary dance motions, recorded in the framework of the i-Treasures project using both markerless motion capture techniques (depth cameras) and optical motion capture systems. The effect of the quality of the recorded motion data (highly dependent on the motion capture technology) on the motion recognition results will have to be studied and compared.
This thesis is part of the European Project i-Treasures.

Top

Thesis Mickaël Tits (2014 - ) - PhD Thesis Mickaël Tits

Top

Thesis Willy Yvart (2013 - ) - PhD Thesis Willy Yvart

Top

Thesis François Rocca (2011 - ) - PhD Thesis François Rocca

Top

Thesis Onur Babacan (2010 - ) - PhD Thesis Onur Babacan

Top

Thesis Radhwan Ben Madhkour (2009 - ) - PhD Thesis Radhwan Ben Madhkour

Top


 

Past PhD Theses

AVLASYN - Audio-Visual Laughter Synthesis (2012 - 2016) - PhD Thesis Hüseyin Cakmak

Laughter is one of the most important signals of human interactions. It serves various important functions in the social context, such as conveying emotions, back-channeling, displaying affiliation or mitigating an unpleasant comment. With the advances in human-machine interactions and the developments in speech processing, interest in laughter processing has grown over the last decades. Detecting, analyzing and producing laughter have become tasks that a machine should be able to perform. This project aims at producing convincing synchronous acoustic and visual laughter. Possible application fields include human-machine interactions (smartphones, navigation systems, etc.), video game development, animated movie production and humanoid robot control.

Top

MAGE / pHTS (2010 - 2014) - PhD Thesis Maria Astrinaki

This project is based on the HMM-Based Speech Synthesis System (HTS), a statistical parametric speech synthesis system in which the vocal tract, vocal source and prosody of speech are modelled simultaneously by HMMs and the synthetic speech is generated from the HMMs themselves. HTS provides intelligibility and expressivity; it is flexible, easily adapted and has a small footprint, but it is not reactive to real-time user input and control. Going one step further towards on-the-fly control over the synthesised speech, we developed pHTS (performative HTS), which allows reactive speech synthesis, and MAGE, the engine-independent and thread-safe layer of pHTS that can be used in reactive application designs. This will enable performative creation of synthetic speech, by a single user or multiple users, on one or multiple platforms, using different user interfaces and applications. This PhD thesis is supported by a public-private partnership between University of Mons and Acapela Group SA, Belgium.

Top

VISION: Video and Image Saliency Detection (2010 - 2016) - PhD Thesis Nicolas Riche

The human visual system receives 80% of the information in our daily lives, but this amount of visual data exceeds the capacity of our brain. The mechanism that overcomes this fundamental issue and determines which part of the incoming ocular information is interesting and must be processed first is called visual attention. Since the early 2000s, modeling visual attention has been a very active research area. This thesis adds a brick to this crucial endeavor of understanding and modeling human attention: after presenting several state-of-the-art models, it proposes new methods illustrated through several practical applications. These algorithms also need to be evaluated fairly, which is the scope of the second part of the thesis: it develops a new framework to assess saliency models.

Top

Thesis Julien Leroy (2010 - 2016) - PhD Thesis Julien Leroy

Top

Designing interaction for browsing media collections (by similarity) (2010 - 2015) - PhD Thesis Christian Frisson

Sound designers source sounds in massive and heavily tagged collections. When searching for media content, once queries are filtered by keywords, hundreds of items need to be reviewed. How can we present these results efficiently? This doctoral work aims at improving the usability of browsers of media collections by blending techniques from multimedia information retrieval (MIR) and human-computer interaction (HCI). We produced an in-depth state-of-the-art review of media browsers. We surveyed the HCI and MIR techniques that support our work: organization by content-based similarity (MIR), information visualization and gestural interaction (HCI). We developed the MediaCycle framework for organization by content-based similarity and the DeviceCycle toolbox for rapid prototyping of gestural interaction; both facilitated the design of several media browsers. We evaluated the usability of some of our media browsers. Our main contribution is AudioMetro, an interactive visualization of sound collections. Sounds are represented by content-based glyphs, mapping perceptual sharpness (audio) to brightness and contour (visual). These glyphs are positioned in a starfield display using Student t-distributed Stochastic Neighbor Embedding (t-SNE) for dimension reduction, then a proximity grid optimized for preserving direct neighbors. Known-item search evaluation shows that our technique significantly outperforms a grid of sounds represented by dots and ordered by filename.

Top

Simulation of the retinal response under mesopic lighting conditions (2009 - 2013) - PhD Thesis Justine Decuypere

Simulation of the retinal response under mesopic lighting conditions

Top

Brain-Computer Interfaces for Ambulatory Applications (2009 - 2014) - PhD Thesis Matthieu Duvinage

Disabilities affecting mobility, in particular, often lead to exacerbated isolation and thus fewer communication opportunities, resulting in a limited participation in social life. Additionally, as costs for the health-care system can be huge, rehabilitation-related devices and lower-limb prostheses (or orthoses) have been intensively studied so far. However, although many devices are now available, they rarely integrate the direct will of the patient. Indeed, they basically use motion sensors or the residual muscle activities to track the next move.

Therefore, to integrate more direct control by the patient, Brain-Computer Interfaces (BCIs) are proposed here and studied under ambulatory conditions. Basically, a BCI allows a user to control any electric device without needing to activate muscles. In this work, the conversion of brain signals into a prosthesis kinematic control is studied following two approaches. In the first, the subject transmits the desired walking speed to the BCI. This high-level command is then converted into a kinematics signal by a Central Pattern Generator (CPG)-based gait model, which is able to produce automatic gait patterns. Here our work focuses on how BCIs behave in ambulatory conditions. The second strategy is based on the assumption that the brain is continuously controlling the lower limb. Thus, a direct interpretation, i.e. decoding, of the brain signals is performed. Here, our work consists in determining which part of the brain signals can be used.

Top

Degree of Articulation (2009 - 2013) - PhD Thesis Benjamin Picart

Nowadays, speech synthesis is part of various daily life applications. The ultimate goal of such technologies consists in extending the possibilities of interaction with the machine, in order to get closer to human-like communications. However, current state-of-the-art systems often lack realism: although high-quality speech synthesis can be produced by many researchers and companies around the world, synthetic voices are generally perceived as hyperarticulated. In any case, their degree of articulation is fixed once and for all.

The present thesis falls within the more general quest for enriching expressivity in speech synthesis. The main idea consists in improving statistical parametric speech synthesis, whose most famous example is Hidden Markov Model (HMM) based speech synthesis, by introducing a control of the articulation degree, so as to enable synthesizers to automatically adapt their way of speaking to the contextual situation, like humans do. The degree of articulation, which is probably the least studied prosodic parameter, is characterized by modifications of phonetic context, of speech rate and of spectral dynamics (vocal tract rate of change). It depends upon the surrounding environment and the communication context, and provides information on the relationship between the speaker and the listener(s).

Top

SLAW (2008 - 2013) - PhD Thesis Alexis Moinet

In this project we develop new methods for time-stretching of audio signals, especially audio generated during sport events. This will enable viewers to watch slow-motion videos with synchronous time-stretched quality-preserved sound.

Top

HMM2SPEECH (2007 - 2011) - PhD Thesis Thomas Drugman

Intelligibility and expressivity have become the keywords in speech synthesis. For this, a system (HTS) based on the statistical generation of voice parameters from Hidden Markov Models has recently shown its potential efficiency and flexibility. Nevertheless this approach has not yet reached its maturity and is limited by the buzziness it produces. This latter inconvenience is undoubtedly due to the parametrical representation of speech inducing a lack of voice quality. The first part of this thesis is consequently devoted to the high-quality analysis of speech. In the future, applications oriented towards voice conversion and expressive speech synthesis could also be carried out.

Top

LAUGHTER (2007 - 2014) - PhD Thesis Jérôme Urbain

Human speech contains a lot of paralinguistic sounds conveying information about the speaker's (affective) state. Laughter is one of those signals. Due to its high variability, both inter- and intra-speaker (the same person will laugh differently depending on their emotional state, environment, etc.), it is difficult to recognize laughter from an audio recording or to synthesize human-like, natural-sounding laughter. In the framework of the CALLAS project, our study aims at capturing the global patterns of laughter in order to develop algorithms to detect it in real time and to produce natural laughter utterances. Potential uses cover the broad range of applications using automatic speech recognition and synthesis for human-computer interactions.

Top

Stylistic walk - Walk synthesis (2007 - 2013) - PhD Thesis Joëlle Tilmanne

In this thesis, we tackle the problem of model-based walk synthesis and highlight the strong parallelism that exists between speech and motion. We analyze how speech synthesis approaches, and more specifically Hidden Semi Markov Models (HSMM) taking the data dynamics into account, can be adapted to motion. Although the main scope of our work is to analyze how motion can be synthesized by applying probabilistic modeling techniques such as Hidden Markov Models, we also tested two methods which represent the motion space by a set of simpler functions, through Principal Component Analysis (PCA) or Fourier transform.

Top

Thesis Thomas Dubuisson (2006 - 2011) - Glottal Source Estimation and Automatic Detection of Dysphonic Speakers

This thesis is devoted to the development of methods for detecting dysphonic speakers. The pathological aspects of these phonations are usually assessed in clinics by means of perceptive and objective analysis. In support of this assessment, there is a need to develop new objective methods in order to detect a pathology or evaluate the voice quality before and after surgery. After a broad overview of existing methods in terms of features and classification approaches and a comparison between different methodologies for feature selection, the thesis investigates to what extent a limited number of features can be combined in a simple classification approach to detect the presence of a pathology. A first application shows that the correlation between acoustic descriptors, which do not require the estimation of the fundamental period, is able to discriminate well between normal and pathological sustained vowels. A second application shows the interest of combining the information extracted from the speech signal and the estimation of the glottal source for the detection of voice pathologies. In this application, two features (one computed on the speech signal and the other on the glottal contribution) are selected by means of a mutual information-based measure, and their distribution for normal and pathological voices is estimated to derive a simple classifier based on Gaussian Mixture Models. The ability of this classification approach to discriminate between normal and pathological sustained vowels is demonstrated, and it is proposed to nuance the decision provided by the classifier by including indetermination zones in the normal/pathological decision. These precautions increase the reliability of the decision provided to the clinician.
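
The idea of a two-feature classifier with an indetermination zone can be sketched as below. This is only an illustration under strong assumptions and not the method of the thesis: each class is modelled by a single diagonal Gaussian (a one-component mixture) instead of a fitted Gaussian mixture, the parameters in `NORMAL` and `PATHOLOGICAL` are invented, and the zone is defined as a hypothetical band on the log-likelihood ratio.

```python
import math

def gaussian_logpdf(x, mean, var):
    """Log density of a diagonal Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

# Hypothetical (mean, variance) parameters for two features; a real
# system would fit Gaussian mixtures on labelled recordings.
NORMAL = ([0.0, 0.0], [1.0, 1.0])
PATHOLOGICAL = ([3.0, 3.0], [1.0, 1.0])

def classify(x, margin=2.0):
    """Return 'normal', 'pathological' or 'undetermined'.

    The decision is withheld when the log-likelihood ratio falls
    inside the band [-margin, +margin] (the indetermination zone).
    """
    llr = gaussian_logpdf(x, *PATHOLOGICAL) - gaussian_logpdf(x, *NORMAL)
    if llr > margin:
        return "pathological"
    if llr < -margin:
        return "normal"
    return "undetermined"

print(classify([0.0, 0.0]), classify([1.5, 1.5]), classify([3.0, 3.0]))
```

Widening `margin` trades coverage for reliability: more recordings are left undecided, but the decisions that are returned rest on a larger likelihood gap.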

Top

RAMCESS (2005 - 2009) - PhD Thesis Nicolas D'Alessandro

RAMCESS stands for "Realtime and Accurate Musical Control of Expressivity in Sound Synthesis". Expressivity is nowadays one of the most challenging topics studied by researchers in both speech and music processing. Indeed, recent synthesizers provide acceptable results in terms of naturalness and articulation, but the need to improve human/computer interactions has led researchers to develop systems that present more human-like expressive skills. Currently most of the research seems to converge towards applications where huge databases are recorded (non-uniform unit selection or giga-sampling), corresponding to a certain number of labelled expressions. At synthesis time, the expression of the virtual source is set by choosing the units inside the corresponding corpus, and then concatenating or overlapping them. On the other hand, systems based on physical modeling try to provide concrete access to the underlying acoustic mechanisms, though currently with some problems in naturalness. This PhD thesis (N. d'Alessandro, supervisor: Prof. T. Dutoit) proposes to reconsider the data-based approach by investigating the short-term analysis of signals, the description of expressive attributes of sound, the realization of realtime and "smart" database browsing techniques and the study of some control-based layers.

Top

ATTENTION (2003 - 2007) - PhD Thesis Matei Mancas

Attention is a simplification or filtering process which transforms a huge acquired unstructured data set into a smaller structured one while preserving the main information. All cognitive processes need attention; humans pay attention (consciously or unconsciously) from birth to death, at every single moment. Attention is even used during dreams and the R.E.M. (Rapid Eye Movement) sleep phase.

Nevertheless, attention is not a specifically human process: it is used by every living being, from humans to insects. Attention is the beginning of intelligence: there is no intelligence without attention!

Just as attention is the beginning of intelligence in biology, computational attention may be the starting point of artificial intelligence in engineering applications. Computational attention provides machines with human-like reactions and behaviours and leaves them free to make decisions even in unexpected situations:

  • A computer which pays attention is able to be surprised and interested in novel data.
  • A computer which pays attention is able to understand novel situations and to choose the important data it will learn.

Top

UNDERSTAND (2003 - 2006) - PhD Thesis Céline Thillou

With the drastic expansion of low-priced cameras, text recognition is nowadays a fast-changing field, in particular natural scene text understanding, which aims at extracting text from everyday images. From text extraction to the correction of recognition errors, each sub-step is studied in depth to enhance versatility and handle the most complex images. In both color camera-based images and low-resolution thumbnails, inherent degradations, such as complex backgrounds, artistic fonts, uneven lighting or unsatisfactory resolution, must be taken into account. In order to circumvent or correct them, studies of image formation and degradation sources led us to move beyond overly constrained definitions of color spaces. Hence our selective metric text extraction attempts to combine magnitude and directional processing of colors in an unsupervised framework. Text extraction from the background is simultaneously linked to the subsequent steps of character segmentation and recognition. This intermingled chain mainly aims at combining color, intensity and spatial information of pixels for robustness and accuracy. Each of these features addresses different issues: the first one serves text extraction, and the latter two serve to recover the initial separation between characters through log-Gabor filtering. In order to reach higher quality results, pre- and post-processing of natural scene text understanding are necessary. Pre-processing deals with Teager-based super-resolution, assuming a simple affine motion between frames, with our SURETEXT proposition; post-processing associates recognition outputs and linguistic information through lightweight finite state machines.

PhD Thesis S. DEVUYST (2003 - 2011) - Automatic Analysis of Polysomnographic Traces from Adults

This thesis was carried out in collaboration with the sleep laboratory of the André Vésale hospital in Charleroi and aims at the automatic analysis of adult sleep signals. More specifically, its goal is, on the one hand, to detect a set of micro-events that appear in certain sleep states or are characteristic of certain pathologies (such as sleep apnea) and, on the other hand, to automatically distinguish the different sleep stages. To this end, particular attention was paid to artifact processing. Notably, an original method for correcting cardiac interference in electroencephalograms was developed. In addition, the automatic sleep stage classification procedures were revised to comply with the new sleep stage scoring rules of the AASM (American Academy of Sleep Medicine). Finally, several classification methods were compared by evaluating their detection results on a single database of 47 full-night polysomnographic recordings.

APPLE (2002 - 2006) - PhD Thesis Devrim Unay

Quality inspection of apple fruits, traditionally performed by human experts, has to be automated by machine vision to reduce error, variation, fatigue and cost due to humans as well as to increase speed... A typical apple inspection system should employ image processing and pattern recognition techniques to precisely segment defective skin with minimal confusion with stem/calyx areas and to classify the fruit into the correct quality category. In this thesis, we present work on the quality inspection of bi-colored apples using multispectral images, tackling each of these sub-problems (namely, stem/calyx recognition, defect detection and fruit grading) individually. Stems and calyxes are natural parts of apples that are confused with some defects in machine vision systems. A precise inspection system requires their discrimination, which is achieved by a highly accurate support vector machine-based approach. Defect detection of apples by machine vision is very problematic due to the numerous defect types present as well as the high natural variability of skin color. This task is accomplished by multi-layer perceptrons (an artificial neural network), which outperformed several other methods in accuracy and speed. Final grading of the fruit is obtained by binary and multi-category classification with different classifiers, and the results achieved are very encouraging.

EMBOLI (2002 - 2007) - PhD Thesis Raphael Sebbe

Pulmonary embolism (PE) is an extremely common and highly lethal condition that is a leading cause of death in all age groups. Over the past 10 years, computed tomography (CT) scanners have gained acceptance as a minimally invasive method for diagnosing PE. In this book, a framework for computer-aided diagnosis of PE in contrast-enhanced CT images is presented. It consists of a combination of a method for segmenting the pulmonary arteries (PA), emboli detection methods, and a scheme for evaluating their performance. The segmentation of the PA serves one of the clot detection methods, and is carried out through a region growing method that makes use of a priori knowledge of vessel topology. Two different approaches for clot detection are proposed. The first one performs clot detection by analyzing the concavities in the segmentation of the pulmonary arterial tree. It works in a semi-automatic way and enables the detection of thrombi in the larger sections of the PA. The second method does not make use of PA segmentation and is thus fully automatic, enabling detection of clots farther in the vessels. The combination of these methods provides a robust detection technique that can be used as a safeguard by radiologists, or even as a preliminary computer-aided diagnosis (CAD) tool. The evaluation of the method is also discussed, and a scheme for measuring its performance is proposed, including a practical approach for radiologists to produce reference detection data, or ground truths.
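
As a rough illustration of the region growing step mentioned above, the following Python sketch grows a region from a seed voxel in a 3D volume. It is only a minimal sketch under stated assumptions: the intensity window (low, high), the 6-connectivity and the function name region_grow are hypothetical choices made for this example, and the a priori knowledge of vessel topology used in the thesis is not modelled here.

```python
from collections import deque

# The six face-connected neighbours of a voxel (an assumption of this sketch).
NEIGHBOURS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))

def region_grow(volume, seed, low, high):
    """Return the set of voxels connected to `seed` whose intensity lies in [low, high].

    `volume` is a nested list indexed as volume[z][y][x]; `seed` is a (z, y, x) tuple.
    """
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])

    def accepted(z, y, x):
        return low <= volume[z][y][x] <= high

    if not accepted(*seed):
        return set()

    region = {seed}
    queue = deque([seed])
    while queue:
        z, y, x = queue.popleft()
        for dz, dy, dx in NEIGHBOURS:
            nz, ny, nx = z + dz, y + dy, x + dx
            inside = 0 <= nz < depth and 0 <= ny < height and 0 <= nx < width
            if inside and (nz, ny, nx) not in region and accepted(nz, ny, nx):
                region.add((nz, ny, nx))
                queue.append((nz, ny, nx))
    return region
```

A breadth-first queue keeps the sketch simple; a voxel that has a vessel-like intensity but is not connected to the seed is left out of the region.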

ZZT (2001 - 2005) - PhD Thesis Baris Bozkurt

This study proposes a new spectral representation called the Zeros of Z-Transform (ZZT), which is an all-zero representation of the z-transform of the signal. In addition, new chirp group delay processing techniques are developed for analysis of resonances of a signal. The combination of the ZZT representation with the chirp group delay processing algorithms provides a useful domain to study resonance characteristics of source and filter components of speech. Using the two representations, effective algorithms are developed for: source-tract decomposition of speech, glottal flow parameter estimation, formant tracking and feature extraction for speech recognition. The ZZT representation is mainly important for theoretical studies. Studying the ZZT of a signal is essential to be able to develop effective chirp group delay processing methods. Therefore, first the ZZT representation of the source-filter model of speech is studied for providing a theoretical background. We confirm through ZZT representation that anti-causality of the glottal flow signal introduces mixed-phase characteristics in speech signals. The ZZT of windowed speech signals is also studied since windowing cannot be avoided in practical signal processing algorithms and the effect of windowing on ZZT representation is drastic. We show that separate patterns exist in ZZT representations of windowed speech signals for the glottal flow and the vocal tract contributions. A decomposition method for source-tract separation is developed based on these patterns in ZZT. We define chirp group delay as the group delay calculated on a circle other than the unit circle in the z-plane. The need to compute group delay on a circle other than the unit circle comes from the fact that group delay spectra are often very noisy and cannot be easily processed for formant tracking purposes (the reasons are explained through ZZT representation).
In this thesis, we propose methods to avoid such problems by modifying the ZZT of a signal and further computing the chirp group delay spectrum. New algorithms based on processing of the chirp group delay spectrum are developed for formant tracking and feature estimation for speech recognition. The proposed algorithms are compared to state-of-the-art techniques. Equivalent or higher efficiency is obtained for all proposed algorithms. The theoretical parts of the thesis further discuss a mixed-phase model for speech and phase processing problems in detail.
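
The two representations described above can be sketched in a few lines of Python with NumPy. This is a minimal illustration, not the thesis implementation: the function names are hypothetical, the ZZT is obtained with numpy.roots on the frame samples, and the chirp group delay is evaluated directly from the definition on the circle z = radius * exp(j*omega). The split of zeros into those inside and those outside the unit circle is an assumption of this sketch about how the two patterns could be separated; the actual decomposition rule is not stated in the text.

```python
import numpy as np

def zzt(frame):
    """Zeros of the z-transform of a finite frame x[0], ..., x[N-1].

    X(z) = sum_n x[n] z^(-n) shares its non-trivial zeros with the polynomial
    x[0] z^(N-1) + ... + x[N-1], whose roots numpy.roots returns.
    """
    return np.roots(np.asarray(frame, dtype=float))

def split_zeros(zeros):
    """Assumed split: zeros strictly inside vs. strictly outside the unit circle."""
    modulus = np.abs(zeros)
    return zeros[modulus < 1.0], zeros[modulus > 1.0]

def chirp_group_delay(frame, radius, n_freqs=256):
    """Group delay of the frame evaluated on the circle |z| = radius."""
    x = np.asarray(frame, dtype=float)
    n = np.arange(len(x))
    omega = np.linspace(0.0, np.pi, n_freqs)
    # z^(-n) for every frequency (rows) and sample index (columns).
    z_neg_n = np.exp(-1j * np.outer(omega, n)) * float(radius) ** (-n)
    X = z_neg_n @ x          # z-transform on the circle
    Y = z_neg_n @ (n * x)    # z-transform of n * x[n]
    # Group delay = -d(arg X)/d(omega) = Re(Y / X); undefined where X is zero.
    return omega, np.real(Y / X)

# Toy mixed-phase frame: one zero inside (0.5) and one outside (2.0) the unit circle.
frame = np.convolve([1.0, -0.5], [1.0, -2.0])
inside, outside = split_zeros(zzt(frame))
omega, delay = chirp_group_delay(frame, radius=1.2)
```

Choosing a radius away from the moduli of the zeros keeps X away from zero on the analysis circle, which is one way to read the remark that group delay computed on the unit circle is often very noisy.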

DIALOGUE (2000 - 2004) - PhD Thesis Olivier Pietquin

This book addresses the problems of spoken dialogue system design and especially automatic learning of optimal strategies for man-machine dialogues. Besides the description of the learning methods, this text proposes a framework for realistic simulation of human-machine dialogues based on probabilistic techniques, which allows automatic evaluation and unsupervised learning of dialogue strategies. This framework relies on stochastic modelling of modules composing spoken dialogue systems as well as on user modelling. Special care has been taken to build models that can either be hand-tuned or learned from generic data.

CONFIDENCE (2000 - 2004) - PhD Thesis Erhan Mengusoglu

Confidence measures for the results of speech/speaker recognition make these systems more useful in real-time applications. Confidence measures provide a test statistic for accepting or rejecting the recognition hypothesis of the speech/speaker recognition system. Speech/speaker recognition systems are usually based on statistical modeling techniques. In this thesis we defined confidence measures for the statistical modeling techniques used in speech/speaker recognition systems. For speech recognition we tested available confidence measures and the newly defined acoustic prior information based confidence measure in two different conditions which cause errors: out-of-vocabulary words and the presence of additive noise. We showed that the newly defined confidence measure performs better in both tests. A review of speech recognition and speaker recognition techniques and some related statistical methods is given throughout the thesis. We also defined a new interpretation technique for confidence measures, which is based on the Fisher transformation of likelihood ratios obtained in speaker verification. This transformation provided us with a linearly interpretable confidence level which can be used directly in real-time applications such as dialog management. We have also tested the confidence measures for speaker verification systems and evaluated the efficiency of the confidence measures for adaptation of speaker models. We showed that the use of confidence measures to select adaptation data improves the accuracy of the speaker model adaptation process. Another contribution of this thesis is the preparation of a phonetically rich continuous speech database for the Turkish language. The database is used for developing an HMM/MLP hybrid speech recognition system for the Turkish language.
Experiments on the test sets of the database showed that the speech recognition system has good accuracy for long speech sequences while performance is lower for short words, as is the case for current speech recognition systems for other languages. A new language modeling technique for the Turkish language is introduced in this thesis, which can be used for other agglutinative languages. Performance evaluations of the newly defined language modeling technique showed that it outperforms the classical n-gram language modeling technique.
