
APPROACH

 

As already mentioned, this project will build upon the 1992-1995 ESPRIT project WERNICKE, which developed a state-of-the-art, speaker-independent, large-vocabulary continuous speech recognition system. That system is comparable in performance with the best and significantly more compact and efficient than its competitors.

WERNICKE also demonstrated that hybrid HMM/ANN technology is a viable, and probably preferable, basis for the goals of this project (e.g., it is more compact, less "specialized" and, consequently, easier to adapt to new tasks and new languages). The resulting hybrid HMM/ANN systems have proven to be good alternatives to standard HMM technology. This is particularly promising since standard HMMs are becoming more and more difficult to improve, and scientists working in this field often acknowledge the need for alternative technologies and new paradigms. As briefly discussed in Section 1.2, this technology has also proven to be potentially useful in other application domains. The output of WERNICKE can thus be considered successful and has already attracted substantial interest from several industries. However, much remains to be done to improve the existing system. Given what has been achieved in WERNICKE, the remaining research issues, all of which will be addressed in SPRACH, are listed and briefly described in this section.

As briefly explained in Section 1.1, the fundamental aim of the present project is to further develop and optimize our hybrid HMM/ANN speaker-independent, large-vocabulary (64K words), continuous speech recognizers, and to continue their comparison with other state-of-the-art systems. In addition, the advantages of hybrid HMM/ANN systems will be further exploited by extending the systems to new languages (UK English, French and Portuguese) and to flexible speech recognition systems that can easily be adapted to new domains with new lexica and new syntaxes.

To achieve this goal, the approach followed in this project is built upon several basic parts, spread across different Work Packages with very strong relationships and inter-dependencies (the Work Packages referred to below are described in detail in Section 3):

  1. Extension of baseline HMM/ANN systems (available for American English and UK English) to French and Portuguese, and adaptation to different assessment databases. This will be covered by Work Packages WP1 (for databases and baseline systems), WP2 (for lexica and automatic learning of lexica) and WP3 (for language models and language model adaptation).
  2. Development of task-independent hybrid HMM/ANN recognizers in UK English, US English (for international assessment), French and Portuguese, and their assessment on the applications defined in Section 1.4. This requires: (1) large databases in the targeted languages (covered by WP1), (2) automatic generation of phonetic transcriptions and phonological rules for new lexica (covered by WP2), (3) fast adaptation of language models (covered by WP3), and (4) task-independent acoustic models robust to noise and channel conditions (covered by WP4).

    Formal assessment of these systems will not always be possible. Prototype systems will nevertheless be set up regularly and made available for testing by our industrial advisors (on applications possibly defined by them); this will be covered by WP7. Whenever suitable smaller databases are available, formal assessment will also be done on them (with or without retraining); this will be the case for the OGI free format numbers, as mentioned in Section 1.4.

  3. Following the WERNICKE format, formal assessment and comparison with other state-of-the-art systems via international competition on the basis of common databases will still be pursued. This project will therefore devote substantial effort to the speech data that are widely used around the world for evaluating continuous speech recognizers. This will be covered by WP1 and WP7 (training and assessment on large common databases require substantial effort, which was originally underestimated in WERNICKE). In WP7, a task exclusively devoted to maintaining a good and efficient decoder for large lexica has been added.
  4. Development and evaluation of new theories and methods to improve or go beyond the existing hybrid HMM/ANN systems. This constitutes the "research core" of this project and is addressed in Work Package WP5, which lists several promising approaches that could improve on the initial hybrid HMM/ANN systems. Although this part is more research oriented, it is not too speculative: preliminary work has already been done in each of the areas mentioned, and these areas are closely related to the issues described above.
  5. Use of common hardware and software tools to help the research and to implement resulting algorithms (covered by WP6). This was shown to be particularly useful and efficient in WERNICKE since:
    1. This forces all the partners to work on the same software and hardware.
    2. Although hybrid HMM/ANN approaches appear to show several advantages in terms of performance and reduced complexity during recognition, these are achieved at the cost of drastically increased training time. This makes further European developments and investigations in this field (and probably also in many other problems involving ANN algorithms) impossible without special hardware. This kind of hardware and its associated software does not yet exist in Europe, and its development would probably require tens of man-years. Some more specialized computers have been developed for this purpose in Europe, but they are less suited to the flexible programming needs of a research environment such as that of WERNICKE.
    3. This significantly reduces research and test cycles.
Recently, our WERNICKE subcontractor ICSI (also a subcontractor in the current proposal) released, as originally planned in WERNICKE, their full-custom single-chip vector microprocessor, which will be used in this project. This processor was designed to be a good match for the kind of research being done by the teams in this proposal. However, to surpass the level of performance obtained by high-end workstations, the design needed to be somewhat specialized for the relevant styles of computation. In particular, what was built was a fixed-point vector coprocessor, in addition to an on-chip general-purpose MIPS-compatible core.

To use this chip efficiently while retaining the flexibility that the research pursued by this group requires, some further software development will be needed. In particular, ICSI will develop software classes that perform all the computation for the kinds of neural networks that we use (feedforward MLPs and the kind of recurrent networks used at Cambridge) with fixed-point arithmetic internally, while appearing externally as a floating-point block of computation. This has been done for some pilot examples and appears to lose very little in efficiency for the sizes of networks we are interested in, while permitting users to think in terms of floating-point computation. The parallelism within such a block is likewise hidden from the user. In general, the funding for this component will cover the costs of developing software building blocks that the partners will use to develop their applications software.
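
The idea of a block that computes in fixed point internally but looks like floating point from the outside can be illustrated with a minimal sketch. This is not the ICSI software: the class name FixedPointLayer, the helper functions, and the choice of 12 fractional bits are all hypothetical and chosen only for illustration. As a further simplification, the sigmoid is evaluated in floating point after the integer accumulation, and nothing here models the vector parallelism of the chip.

```python
import math

FRAC_BITS = 12  # assumed number of fractional bits (hypothetical choice)
SCALE = 1 << FRAC_BITS


def to_fixed(x):
    """Quantize a float to a signed fixed-point integer."""
    return int(round(x * SCALE))


def to_float(q, frac_bits=FRAC_BITS):
    """Convert a fixed-point integer back to a float."""
    return q / float(1 << frac_bits)


class FixedPointLayer:
    """One sigmoid MLP layer: float interface, integer arithmetic inside."""

    def __init__(self, weights, biases):
        # weights: one row of floats per output unit.
        self.w = [[to_fixed(v) for v in row] for row in weights]
        # Biases are stored at the accumulator scale (2 * FRAC_BITS),
        # because a product of two fixed-point values has that scale.
        self.b = [to_fixed(v) << FRAC_BITS for v in biases]

    def forward(self, inputs):
        x = [to_fixed(v) for v in inputs]
        outputs = []
        for row, bias in zip(self.w, self.b):
            acc = bias
            for wq, xq in zip(row, x):
                acc += wq * xq  # integer multiply-accumulate
            # Simplification: leave fixed point before the nonlinearity.
            pre = to_float(acc, 2 * FRAC_BITS)
            outputs.append(1.0 / (1.0 + math.exp(-pre)))
        return outputs


def float_layer(weights, biases, inputs):
    """Plain floating-point reference, used only for comparison."""
    out = []
    for row, b in zip(weights, biases):
        pre = b + sum(w * x for w, x in zip(row, inputs))
        out.append(1.0 / (1.0 + math.exp(-pre)))
    return out
```

A caller passes and receives ordinary floats, for example FixedPointLayer(weights, biases).forward(inputs), and can compare the result against float_layer(weights, biases, inputs) to see how much precision the internal quantization costs for a given network size.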






Jean-Marc Boite
Mon Dec 9 18:18:02 MET 1996