SPeech Recognition Algorithms for Connectionist Hybrids
ESPRIT Long Term Research RTD Project Ref. 20077
This is the official website of the SPRACH project.
The goal of the proposed project is to further improve the current state of the art in continuous speech recognition using Artificial Neural Network (ANN) and Hidden Markov Model (HMM) approaches. Continuing the theoretical and development work successfully carried out under the WERNICKE project (ESPRIT Basic Research Project 6487, October 1992-October 1995), this new project, referred to as SPRACH (SPeech Recognition Algorithms for Connectionist Hybrids), will extend the research to robust and flexible speech recognition systems that can easily be adapted to new languages and new domains with new lexica and new syntaxes.
In WERNICKE, in addition to substantial theoretical results, it was demonstrated, using standard international reference databases (such as the unlimited vocabulary ARPA North American Business News database, and the EU-funded SQALE project), that hybrid HMM/ANN approaches lead to competitive state-of-the-art speech recognizers. Furthermore, the investigated hybrid approach was shown to have additional advantages in terms of CPU utilization and memory bandwidth. These conclusions have been confirmed by many different independent sources.
While building on the WERNICKE large vocabulary continuous speech recognition system, SPRACH will also investigate the development of flexible systems for smaller, task-independent applications in different languages (UK English, French and Portuguese).
The industrial relevance of this project is high, and many useful results are expected. Firstly, it is clear that speech processing, and speech recognition in particular, will play a major role in future multimedia and telematics applications. In this respect, while SPRACH fully exploits the promising HMM/ANN technology, it also addresses most of the relevant issues of speech recognition in general, such as language and lexicon modeling, application domain adaptation, and prototype development. Secondly, beyond its obvious relevance to speech recognition technology, it is also important to note that, motivated by the results achieved in WERNICKE, these hybrid systems have already been adopted by several industries and laboratories in many different areas. To reinforce the industrial relevance of this project and its possible industrial impact, a SPRACH Industrial Advisory Board including BBC (UK), CSELT (I), Daimler-Benz (D) and Thomson (F) has been set up.
Possible applications and demonstration systems that will be targeted in this project are summarized in the Expected Results section.
The basic theme of this proposal, referred to as SPRACH (SPeech Recognition Algorithms for Connectionist Hybrids), is to build upon WERNICKE (ESPRIT Basic Research Project 6487, October 1992-October 1995) to further develop new theories, algorithms, hardware and software tools for the extension of hybrid Hidden Markov Model (HMM) / Artificial Neural Network (ANN) methods to different continuous speech recognition systems. However, while continuing the theoretical and development work successfully carried out in WERNICKE, this new project also aims at extending the WERNICKE results to new languages (UK English, French and Portuguese) and to flexible speech recognition systems that can easily be adapted to new domains with new lexica and new syntaxes. This means that SPRACH will also develop powerful tools to allow easy adaptation and testing of the known (as well as the newly developed) technology on different tasks.
In WERNICKE, in addition to substantial theoretical results [Bourlard & Morgan, 1994], it was demonstrated (see, e.g., [Robinson et al., 1993]), using standard international reference databases (such as the unlimited vocabulary ARPA North American Business News database and the EU-funded SQALE project), that hybrid HMM/ANN approaches lead to competitive state-of-the-art systems. Furthermore, the investigated hybrid approach was shown to have additional advantages in terms of CPU utilization and memory bandwidth. It is, however, our belief that such systems can also be more flexible and more robust. In addition to building on the WERNICKE large vocabulary continuous speech recognition system, SPRACH will investigate the development of systems for smaller, task-independent applications, with no need to retrain the system or develop a new lexicon or grammar when moving from one task to another. Although some typical examples of those tasks are given in the Expected Results section, they will likely be redefined during the project to take the input and recommendations of our Industrial Advisory Board into account.
As already mentioned, this project will build upon the 1992-1995 ESPRIT project WERNICKE, which developed a state-of-the-art, speaker-independent, large vocabulary continuous speech recognition system (comparable with the best) that is significantly more compact and efficient than its competitors.
WERNICKE also demonstrated that hybrid HMM/ANN technology is a viable, and probably preferable, basis for the goals of this project (e.g., more compact, less "specialized" and, consequently, easier to adapt to new tasks and new languages). Indeed, the resulting hybrid HMM/ANN systems have proven to be good alternatives to standard HMM technology. This is particularly promising since it seems to be more and more difficult to improve on standard HMMs, and the need for alternative technologies and new paradigms is often acknowledged by scientists working in this field. The output of WERNICKE can thus be considered successful and has already attracted substantial interest from several industries. However, it is clear that there is still much to be done to improve the existing system. Given what has been achieved in WERNICKE, the remaining research issues to be addressed (and that will be addressed in SPRACH) are listed and briefly described in this section.
As briefly explained in the Objectives section, the fundamental aim of the present project is to further develop and optimize our hybrid HMM/ANN speaker-independent, large vocabulary (>64K words), continuous speech recognizers, and to continue their comparison with other state-of-the-art systems. In addition, the advantages of hybrid HMM/ANN systems will be further exploited by extending the systems to new languages (UK English, French and Portuguese) and to flexible speech recognition systems that can easily be adapted to new domains with new lexica and new syntaxes.
To achieve this goal, the approach followed in this project has been built upon several basic parts, spread across different Work Packages with very strong relationships and inter-dependencies:
Recently, our WERNICKE Subcontractor ICSI (also Subcontractor of the current proposal) released (as originally planned in WERNICKE) their full-custom single-chip vector microprocessor that will be used in this project. This processor was designed to be a good match to the kind of research that is being done by the teams in this proposal. However, to surpass the level of performance obtained by high-end workstations, the design needed to be somewhat specialized for the relevant styles of computation. In particular, what was built was (in addition to an on-chip general-purpose MIPS-compatible core) a fixed-point vector coprocessor. To use this chip efficiently while keeping it flexible enough for the lines of research pursued by this group, some further software development will be required. In particular, ICSI will be developing software classes that perform all the computation for the kinds of neural networks that we use (feedforward MLPs and the kind of recurrent networks used at Cambridge) using fixed-point computation internally, while looking externally like a floating-point block of computation. This has been done for some pilot examples and appears to lose very little in efficiency for the sizes of networks we are interested in, while permitting users to think in terms of floating-point computation. Furthermore, the parallelism for such a block is also hidden. In general, the funding for this component will cover the costs of developing software building blocks that will be used by the partners to develop their application software.
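The idea of a block that computes in fixed point internally but looks like floating point from the outside can be illustrated with a minimal sketch. This is not the ICSI software: the class name FixedPointLayer, the helper functions, and the choice of 12 fractional bits are all hypothetical, and a single linear layer in plain Python stands in for the MLPs and the vector hardware.

```python
from typing import List

# Assumption for illustration only: a Q-format with 12 fractional bits.
FRAC_BITS = 12
SCALE = 1 << FRAC_BITS


def to_fixed(x: float) -> int:
    """Quantize a float to a fixed-point integer with FRAC_BITS fractional bits."""
    return int(round(x * SCALE))


def to_float(q: int) -> float:
    """Convert a fixed-point integer back to a float."""
    return q / SCALE


class FixedPointLayer:
    """Hypothetical linear layer: float interface, integer arithmetic inside."""

    def __init__(self, weights: List[List[float]], biases: List[float]) -> None:
        # Weights and biases are quantized once, when the layer is built.
        self._w = [[to_fixed(w) for w in row] for row in weights]
        self._b = [to_fixed(b) for b in biases]

    def forward(self, inputs: List[float]) -> List[float]:
        x = [to_fixed(v) for v in inputs]
        outputs = []
        for row, bias in zip(self._w, self._b):
            # Each product carries 2 * FRAC_BITS fractional bits.
            acc = sum(w * xi for w, xi in zip(row, x))
            # Shift back to FRAC_BITS fractional bits, then add the bias.
            acc = (acc >> FRAC_BITS) + bias
            outputs.append(to_float(acc))
        return outputs


if __name__ == "__main__":
    layer = FixedPointLayer([[0.5, -0.25], [1.0, 2.0]], [0.1, -0.2])
    print(layer.forward([1.0, 2.0]))
```

The caller passes and receives ordinary floats; quantization, integer accumulation, and rescaling stay inside the class, which is also where any parallel execution would be hidden. The outputs differ from a pure floating-point computation only by a small quantization error that depends on the number of fractional bits chosen.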
Possible applications that will be targeted in this project and resulting demonstration systems will include, e.g.:
As a by-product, this project will also give the partners access to the new fast and flexible hardware and software that is being developed by the same team that provided these capabilities for the WERNICKE project. As was shown in that project, the availability of common hardware and software that is somewhat customized for the research approaches under investigation permitted both the incorporation of very computationally intensive algorithms and the comparison of their efficacy across the different sites. In SPRACH, we intend to incorporate some of the newest relevant hardware and software technology that has been developed largely to support research such as that being done in WERNICKE and SPRACH.
This proposal is ambitious. However, given the quality of the partners, their strong relationships, and the subcontract which will provide the project with the necessary hardware and software tools, the research should be successful and should make several important technical and scientific contributions to both ANN and HMM speech recognition technologies. Through its many innovative aspects (including software and hardware), this project can have a direct impact on future products, such as multimedia applications.
The SPRACHdemo, a live multilingual speech recognition system, able to recognize English, French and Portuguese.