



The basic theme of this proposal, referred to as SPRACH (SPeech Recognition Algorithms for Connectionist Hybrids), is to build upon WERNICKE (ESPRIT Basic Research Project 6487, October 1992-October 1995) to further develop new theories, algorithms, hardware and software tools for the extension of hybrid Hidden Markov Model (HMM) / Artificial Neural Network (ANN) methods for different continuous speech recognition systems. While continuing the theoretical and development work successfully carried out in WERNICKE, this new project also aims at extending the WERNICKE results to new languages (UK English, French and Portuguese) and to flexible speech recognition systems that can easily be adapted to new domains with new lexica and new syntaxes. SPRACH will therefore also develop powerful tools that allow the known (as well as the newly developed) technology to be easily adapted to, and tested on, different tasks.

In WERNICKE, on top of substantial theoretical results [Bourlard & Morgan, 1994], it was demonstrated (see, e.g., [Robinson et al., 1993]), using standard international reference databases (such as the unlimited vocabulary ARPA North American Business News database and the EU-funded SQALE project), that the hybrid HMM/ANN approaches lead to competitive state-of-the-art systems. Furthermore, the investigated hybrid approach was shown to have additional advantages in terms of CPU utilization and memory bandwidth. It is, however, our belief that such systems can also be more flexible and more robust. In addition to building on the WERNICKE large vocabulary continuous speech recognition system, SPRACH will investigate the development of systems for smaller, task-independent applications, with no need to retrain the system or develop a new lexicon or grammar when moving from one task to another. Although some typical examples of those tasks are given in Section 1.4, they will eventually be redefined during the project to take into account the input and recommendations of our Industrial Advisory Board (see Section 1.3 and Appendix B.2).

Industrial Relevance


The industrial relevance of this project is high, and many useful results are expected. Speech processing, and speech recognition in particular, will play a major role in future multimedia and telematics applications. For example, most of the applications foreseen in the Language Engineering area of the Telematics Application Programme (area D, sector 12) as well as the Multimedia Systems (Domain 3) of the RTD in Information Technologies will benefit from robust and flexible speech recognition systems. Additionally, this research will have an impact on the Software Technologies domain of the RTD in IT, particularly tasks 1.25 and 1.26.

Motivated by the results achieved in WERNICKE, several industrial and academic laboratories have recently compared the hybrid approaches developed in WERNICKE with the best classical HMM approaches on a number of speech recognition tasks. In cases where the comparison was controlled, the hybrid approach performed better when the number of parameters was similar, and about the same in some cases in which the classical system used many more parameters. Evidence for this can be found in a number of sources, including:

The most recent results, those of the EU-funded SQALE evaluations, show the hybrid approach slightly ahead of more traditional HMM systems. The hybrid system was evaluated on both British and American English tasks, using a 20,000-word vocabulary and a trigram language model, along with the other leading European systems produced by LIMSI (France), Philips (Germany) and Cambridge University/HTK (UK) [Steeneken & Van Leeuwen, 1995]. Additionally, the hybrid system was efficient in its runtime CPU and memory requirements.

Finally, the hybrid HMM/ANN approaches developed in WERNICKE are quite general and can be applied to other tasks. Recently, this approach was adopted by several laboratories to handle speaker verification [Naik & Lubensky, 1994] (NYNEX), handwriting recognition [Schenkel et al., 1994] (AT&T), and gene classification and fault diagnosis [Smyth et al., 1994].

Industrial Advisory Board


To further reinforce the industrial relevance of this project and its possible industrial impact, four major industrial partners expressed an interest in the current proposal and agreed to be part of a SPRACH Industrial Advisory Board with the aim of (1) guiding the research partners through the cooperative definition of potential applications, test tasks and development prototypes, and (2) maintaining an awareness of current and future developments in the area.

These industrial partners are: (1) British Broadcasting Corporation (BBC), UK, (2) Thomson CSF, France, (3) Daimler-Benz, Germany, and (4) CSELT, Italy. A short description of each of these industrial advisors, together with the reasons for their interest in the current proposal, is given in Appendix B.2. All of them are highly interested in the possible outputs of the present project. Furthermore, it is worth noting that:

  1. BBC and Thomson are particularly interested in automatic indexing of spoken language and in recognition of broadcast speech (which will be one of the applications considered in this project; see Section 1.4).
  2. Daimler-Benz is very active in the area of speech recognition and has an interest in learning about potential advantages of the hybrid HMM/ANN technology, particularly for robust systems. Daimler-Benz is also one of the German companies funding ICSI, the US subcontractor of the current proposal.
  3. CSELT is also a major player in European speech recognition technology and is committed to turning this technology into products. Recently, they presented a (patent pending) speech recognition system based on hybrid HMM/ANN technology [Gemello et al., 1994].

Expected Results


Possible applications targeted in this project, and the resulting demonstration systems, will include, e.g.:

  1. Very large vocabulary (64K words) continuous speech recognition of read speech. This will be an essential enabling technology for many multimedia and telematics applications.
  2. Voice-driven typewriter: A dictation system running in real time with simple editing commands.
  3. Flexible continuous speech recognizer in which lexica and grammars can be defined on the spot, without the need for training.
  4. Smaller (but realistic) tasks, including, e.g., robust recognition of free format numbers. This could be done on the basis of existing databases like the OGI numbers databases.
  5. Recognition of broadcast speech: transcription of radio or television speech (e.g. newsreaders).
  6. Extension of the above to several European languages. On top of the properties discussed above, another interesting feature of the hybrid systems is that they do not seem to require extensive knowledge of the languages or their phonological rules to adapt the recognizer. With appropriate databases (which are becoming increasingly available), development for a new language is quite straightforward.

As a by-product, this project will also give the partners access to the new fast and flexible hardware and software that is being developed by the same team that provided these capabilities for the WERNICKE project. As was shown in that project, the availability of common hardware and software that is somewhat customized for the research approaches under investigation permitted both the incorporation of very computationally intensive algorithms and the comparison of their efficacy across the different sites. In SPRACH, we intend to incorporate some of the newest relevant hardware and software technology that has been developed largely to support research such as that being done in WERNICKE and SPRACH.

This proposal is ambitious. However, due to the quality of the partners, their strong relationships, and the subcontract which will provide the project with the necessary hardware and software tools, the research should be successful and should make several important technical and scientific contributions to both ANN and HMM speech recognition technologies. Through its many innovative aspects (including software and hardware), this project can have a direct impact on future products, such as multimedia applications.


Jean-Marc Boite
Mon Dec 9 18:18:02 MET 1996