The basic theme of this proposal, referred to as SPRACH (SPeech Recognition Algorithms for Connectionist Hybrids), is to build upon WERNICKE (ESPRIT Basic Research Project 6487, October 1992 to October 1995) to further develop new theories, algorithms, and hardware and software tools for the extension of hybrid Hidden Markov Model (HMM) / Artificial Neural Network (ANN) methods for different continuous speech recognition systems. While continuing the theoretical and development work successfully carried out in WERNICKE, this new project also aims to extend the WERNICKE results to new languages (UK English, French and Portuguese) and to flexible speech recognition systems that can easily be adapted to new domains with new lexica and new syntaxes. SPRACH will therefore also develop powerful tools that allow the known (as well as the newly developed) technology to be easily adapted to, and tested on, different tasks.
In WERNICKE, on top of substantial theoretical results [Bourlard & Morgan, 1994], it was demonstrated (see, e.g., [Robinson et al., 1993]), using standard international reference databases (such as the unlimited-vocabulary ARPA North American Business News database and the EU-funded SQALE project), that the hybrid HMM/ANN approaches lead to competitive state-of-the-art systems. Furthermore, the investigated hybrid approach was shown to have additional advantages in terms of CPU utilization and memory bandwidth. It is, however, our belief that such systems can also be more flexible and more robust. In addition to building on the WERNICKE large-vocabulary continuous speech recognition system, SPRACH will investigate the development of systems for smaller, task-independent applications, with no need to retrain the system or develop a new lexicon or grammar when moving from one task to another. Although some typical examples of those tasks are given in Section 1.4, they will eventually be redefined during the project to take into account the input and recommendations of our Industrial Advisory Board (see Section 1.3 and Appendix B.2).
The industrial relevance of this project is high, and many useful results are expected. It is clear that speech processing, and speech recognition in particular, will play a major role in future multimedia and telematics applications. For example, most of the applications foreseen in the Language Engineering area of the Telematics Application Programme (area D, sector 12), as well as in the Multimedia Systems domain (Domain 3) of the RTD in Information Technologies, will benefit from robust and flexible speech recognition systems. Additionally, this research will have an impact on the Software Technologies domain of the RTD in IT, particularly tasks 1.25 and 1.26.
Motivated by the results achieved in WERNICKE, several industrial and academic laboratories have recently compared the hybrid approaches developed in WERNICKE with the best classical HMM approaches on a number of speech recognition tasks. In cases where the comparison was controlled, the hybrid approach performed better when the number of parameters was similar, and about the same in some cases where the classical system used many more parameters. Evidence for this can be found in a number of sources, including:
Finally, the hybrid HMM/ANN approaches developed in WERNICKE are quite general and can be applied to other tasks. This approach has recently been adopted by several laboratories to handle speaker verification [Naik & Lubensky, 1994] (NYNEX), handwriting recognition [Schenkel et al., 1994] (AT&T), gene classification, and fault diagnosis [Smyth et al., 1994].
To further reinforce the industrial relevance of this project and its possible industrial impact, four major industrial partners have expressed an interest in the current proposal and have agreed to be part of a SPRACH Industrial Advisory Board with the aim of (1) guiding the research partners through the cooperative definition of potential applications, test tasks and development prototypes, and (2) maintaining an awareness of current and future developments in the area.
These industrial partners are: (1) British Broadcasting Corporation (BBC), UK, (2) Thomson CSF, France, (3) Daimler-Benz, Germany, and (4) CSELT, Italy. A short description of each of these industrial advisors, together with the reasons for their interest in the current proposal, is given in Appendix B.2. All of them are highly interested in the possible outputs of the present project. Furthermore, it is worth noting that:
Possible applications targeted in this project, and the resulting demonstration systems, will include, e.g.:
As a by-product, this project will also give the partners access to the new fast and flexible hardware and software being developed by the same team that provided these capabilities for the WERNICKE project. As was shown in that project, the availability of common hardware and software that is somewhat customized for the research approaches under investigation permitted both the incorporation of very computationally intensive algorithms and the comparison of their efficacy across the different sites. In SPRACH, we intend to incorporate some of the newest relevant hardware and software technology, which has been developed largely to support research such as that being done in WERNICKE and SPRACH.
This proposal is ambitious. However, due to the quality of the partners, their strong relationships, and the subcontract which will provide the project with the necessary hardware and software tools, the research should be successful and should make several important technical and scientific contributions to both ANN and HMM speech recognition technologies. Through its many innovative aspects (including software and hardware), this project can have a direct impact on future products, such as multimedia applications.