
General Analysis

As already mentioned, this project builds upon the 1992-1995 ESPRIT project WERNICKE which developed a state-of-the-art, speaker independent, large vocabulary continuous speech recognition system (comparable with the best) that is significantly more compact and efficient than its competitors.

WERNICKE also demonstrated that hybrid HMM/ANN technology is a viable, and probably preferable, basis for the goals of this project (e.g., it is more compact, less ``specialized'' and, consequently, easier to adapt to new tasks and new languages). The resulting hybrid HMM/ANN systems have proven to be good alternatives to standard HMM technology. This is particularly promising since it is becoming increasingly difficult to improve on standard HMMs, and scientists working in this field often acknowledge the need for alternative technologies and new paradigms. As briefly discussed in Section 0.1.2, this technology has also proven potentially useful in other application domains. The output of WERNICKE can thus be considered successful and has already attracted substantial interest from several industries. However, much clearly remains to be done to improve the existing system.

As briefly explained in Section 0.1.1, the fundamental aim of the present project is to further develop and optimize our hybrid HMM/ANN speaker independent, large vocabulary ($\geq$ 64K words), continuous speech recognizers, and continue their comparison with other state-of-the-art systems. In SPRACH, the advantages of hybrid HMM/ANN systems are further exploited by extending the systems to new languages (UK English, French and Portuguese) and to flexible speech recognition systems that can easily be adapted to new domains with new lexica and new syntaxes.

To achieve this goal, the approach followed in this project has been built upon several basic parts, spread across different Work Packages, with very strong relationships and inter-dependencies:

Extension of baseline HMM/ANN systems (available for American English and UK English) to French and Portuguese, and adaptation to different assessment databases. This is covered by Work Packages WP1 (for databases and baseline systems), WP2 (for lexica and automatic learning of lexica) and WP3 (for language models and language model adaptation).
Development of task-independent hybrid HMM/ANN recognizers in UK English, US English (for international assessment), French and Portuguese, and their assessment on the applications defined in Section 0.1.5. This requires: (1) large databases in the targeted languages (covered by WP1), (2) automatic generation of phonetic transcriptions and phonological rules for new lexica (covered by WP2), (3) fast adaptation of language models (covered by WP3), and (4) task-independent acoustic models robust to noise and channel conditions (covered by WP4).

Formal assessment of these systems will not always be possible. Prototype systems will nevertheless be set up regularly and made available for testing by our industrial advisors (on applications possibly defined by them); this is covered by WP7. Whenever suitable smaller databases are available, formal assessment will also be done on them (with or without retraining); this will be the case for the OGI free format numbers, as mentioned in Section 0.1.5.

Following the WERNICKE format, formal assessment and comparison with other state-of-the-art systems are being pursued via international competition on the basis of common databases. This project therefore has to invest substantial effort in using the speech data that are widely used around the world for evaluating continuous speech recognizers. This is covered by WP1 and WP7 (training and assessment on large common databases require substantial effort, which was originally underestimated in WERNICKE). In WP7, a task exclusively devoted to maintaining a good and efficient decoder for large lexica has been added.
Development and evaluation of new theories and methods to improve or go beyond the existing hybrid HMM/ANN systems. This constitutes the ``research core'' of this project, and is addressed in Work Package WP5, which lists several promising approaches that could improve on, or go beyond, the initial hybrid HMM/ANN systems. Although this part is more research oriented, it is not too speculative, since preliminary work has already been done in each of the areas mentioned and since these areas are closely related to the issues described above.
Use of common hardware and software tools to help the research and to implement resulting algorithms (covered by WP6). This was shown to be particularly useful and efficient in WERNICKE since:
This forces all the partners to work on the same software and hardware.
Although hybrid HMM/ANN approaches appear to offer several advantages in terms of performance and reduced complexity during recognition, these come at the cost of drastically increased training time. Without special hardware, this makes further European developments and investigations in this field (and probably also in many other problems involving ANN algorithms) impractical. This kind of hardware and its associated software do not yet exist in Europe, and their development would probably require tens of man-years. Note that some more specialized computers have been developed for this purpose in Europe, but they are less suited to the flexible programming needs of a research environment such as that of WERNICKE.
This significantly reduces research and test cycles.
Recently, our subcontractor ICSI released (as originally planned in WERNICKE) their full-custom single-chip vector microprocessor, which will be used in this project. This processor was designed to be a good match for the kind of research being done by the SPRACH partners. However, to surpass the level of performance obtained by high-end workstations, the design needed to be somewhat specialized for the relevant styles of computation. To allow this chip to be used efficiently while remaining flexible enough for the lines of research pursued by this group, ICSI continues to develop software classes that support all the computation required by the kinds of neural networks used in this project.

Christophe Ris