Stéphane Dupont

Machine Intelligence Research (MIR) Group
Numediart and Infortech Research Institutes
University of Mons


Research Areas

Multimedia Understanding

Providing computers with the capability to perceive and understand the environment around us through audio-visual modalities has long been a fascinating goal, both for its scientific appeal (artificial intelligence) and its applications (automation, natural human-machine interaction). Machine learning is the most promising answer to these questions because it allows the computer to learn to solve such problems through an optimization process. In our group, we have both extensive knowledge and tools that leverage deep artificial neural networks. Our expertise includes:
  • automatic recognition and detection of sounds, objects in images (computer vision), sketches, etc.
  • automatic recognition and transcription of music (in particular guitar and voice).
  • joint understanding of images and text, and machine translation.
  • search engines for large audiovisual databases.
  • vertical search engines, tailored to specific markets.
  • indexing and enriching audiovisual streams, digitized cultural archives, collections of creative content, collections of commercial (including musical) content, or the physical world.
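The learning-via-optimization idea above can be sketched with a toy example. The sketch below trains a minimal binary classifier (logistic regression, the simplest one-neuron network) by gradient descent on synthetic 2-D data; it is purely illustrative and does not reflect the group's actual models or datasets.

```python
import numpy as np

# Synthetic, illustrative data: class 0 centred at (-1, -1), class 1 at (+1, +1).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 0.5, (100, 2)), rng.normal(1, 0.5, (100, 2))])
y = np.concatenate([np.zeros(100), np.ones(100)])

w, b = np.zeros(2), 0.0     # parameters to be learned
lr = 0.5                    # learning rate
for _ in range(200):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))    # sigmoid predictions
    grad_w = X.T @ (p - y) / len(y)           # gradient of the cross-entropy loss
    grad_b = np.mean(p - y)
    w -= lr * grad_w                          # gradient-descent update
    b -= lr * grad_b

accuracy = np.mean((p > 0.5) == y)
```

Deep neural networks follow the same recipe — a differentiable model whose parameters are adjusted by gradient-based optimization — only with many stacked layers and far larger datasets.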
Our goal is to master the most advanced artificial intelligence technologies by developing partnerships with major players in the field in Europe and around the world. Beyond AI, we also propose innovative interactions and interfaces for intelligent content search for the professional world, big data and the semantic web (specialized search engines), as well as for the general public (museum installations or interactive arts).

Natural Human-Computer Interaction

Natural human-computer interaction methods, especially through voice, broaden the possibilities for developing innovative interfaces between humans and computers. Beyond voice, non-verbal and expressive behaviors can also be detected and used to facilitate interaction with users, or to influence their mood. Our expertise includes:
  • speech recognition, and identification of a person by their voice or face.
  • synthesis of speech and non-verbal signals related to emotions and expressiveness.
  • affective computing, enabling the computer to recognize the emotional state (especially amusement and laughter) of a person, and to create interactive agents with expressive faces and natural behavior, in the context of educational or playful applications.
  • situated interaction and grounded understanding, merging computer vision and natural language.
  • use of synthesized stimuli for research in the psychology of personality.
We have long-standing experience in speech processing, with many advances in the field, and skills in automatic recognition and synthesis as well as in more generic sound processing: noise reduction, synthesis, etc. We are also undertaking research on the understanding of speech and language via embodied and situated approaches, where understanding the environment is necessary to disambiguate language.
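As one concrete instance of the generic sound processing mentioned above, the sketch below illustrates classic spectral-subtraction noise reduction on a synthetic signal (a pure tone plus white noise). All signals and parameters here are illustrative assumptions, not the group's production pipeline.

```python
import numpy as np

# Illustrative noisy signal: a 440 Hz tone (1 s at 16 kHz) plus white noise.
rng = np.random.default_rng(0)
sr = 16000
t = np.arange(sr) / sr
clean = np.sin(2 * np.pi * 440 * t)
noisy = clean + 0.3 * rng.normal(size=sr)

frame = 512
# Estimate the noise magnitude spectrum from a noise-only segment.
noise_mag = np.abs(np.fft.rfft(0.3 * rng.normal(size=frame)))

denoised = np.zeros_like(noisy)
for start in range(0, sr - frame + 1, frame):
    spec = np.fft.rfft(noisy[start:start + frame])
    mag = np.maximum(np.abs(spec) - noise_mag, 0.0)     # subtract the noise floor
    # Rebuild the frame with the reduced magnitude and the noisy phase.
    denoised[start:start + frame] = np.fft.irfft(mag * np.exp(1j * np.angle(spec)))

# Error energy relative to the clean signal, before and after denoising
# (restricted to the samples covered by whole frames).
n = frame * (sr // frame)
err_before = np.mean((noisy[:n] - clean[:n]) ** 2)
err_after = np.mean((denoised[:n] - clean[:n]) ** 2)
```

Real systems add refinements (overlapping windowed frames, smoothed noise tracking, over-subtraction control), but the magnitude-subtract-and-resynthesize structure is the same.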
