TCTS Lab Research Groups
 
 

The project


SPeech Recognition Algorithms for Connectionist Hybrids

ESPRIT Long Term Research RTD Project Ref. 20077

This is the official website of the SPRACH project.

Partners

LTR Consortium

Industrial Advisory Board


Summary

The goal of the proposed project is to further improve the current state of the art in continuous speech recognition using Artificial Neural Network (ANN) and Hidden Markov Model (HMM) approaches. Pursuing the theoretical and development work successfully carried out under the WERNICKE project (ESPRIT Basic Research Project 6487, October 1992-October 1995), this new project, referred to as SPRACH (SPeech Recognition Algorithms for Connectionist Hybrids), will extend the research to robust and flexible speech recognition systems that can easily be adapted to new languages and new domains with new lexica and new syntaxes.

In WERNICKE, on top of substantial theoretical results, it was demonstrated, using standard international reference databases (such as the unlimited vocabulary ARPA North American Business News database, and the EU-funded SQALE project), that the hybrid HMM/ANN approaches lead to competitive state-of-the-art speech recognizers. Furthermore, the investigated hybrid approach was shown to have additional advantages in terms of CPU utilization and memory bandwidth. These conclusions have been confirmed by many independent sources.

While building on the WERNICKE large vocabulary continuous speech recognition system, SPRACH will also investigate the development of flexible systems for smaller, task-independent applications in different languages (UK English, French and Portuguese).

The industrial relevance of this project is high, and many useful results are expected. Firstly, it is clear that speech processing, and speech recognition in particular, will play a major role in future multimedia and telematics applications. In this respect, while SPRACH fully exploits the promising HMM/ANN technology, it also addresses most of the relevant issues of speech recognition in general, such as language and lexicon modeling, application domain adaptation, and prototype development. Secondly, on top of its obvious relevance to speech recognition technology, it is also important to note that, motivated by the results achieved in WERNICKE, these hybrid systems have already been adopted by several industries and laboratories in many different areas. To reinforce the industrial relevance of this project and its possible industrial impact, a SPRACH Industrial Advisory Board including BBC (UK), CSELT (I), Daimler-Benz (D) and Thomson (F) has been set up.

Possible applications and demonstration systems that will be targeted in this project are summarized in the Expected Results section.

Keywords: speech recognition, hidden Markov models (HMM), artificial neural networks (ANN), statistical inference in ANNs, hybrid HMM/ANN technology, language models, application domain adaptation, neural network hardware and software, speech recognition applications.


Objectives

The basic theme of this proposal, referred to as SPRACH (SPeech Recognition Algorithms for Connectionist Hybrids), is to build upon WERNICKE (ESPRIT Basic Research Project 6487, October 1992-October 1995) to further develop new theories, algorithms, hardware and software tools for the extension of hybrid Hidden Markov Model (HMM) and Artificial Neural Network (ANN) methods to different continuous speech recognition systems. However, while continuing the theoretical and development work successfully carried out in WERNICKE, this new project also aims at extending the WERNICKE results to new languages (UK English, French and Portuguese) and to flexible speech recognition systems that can easily be adapted to new domains with new lexica and new syntaxes. This means that SPRACH will also develop powerful tools to allow easy adaptation and testing of the known (as well as the newly developed) technology on different tasks.

In WERNICKE, on top of substantial theoretical results [Bourlard & Morgan, 1994], it was demonstrated (see, e.g., [Robinson et al., 1993]), using standard international reference databases (such as the unlimited vocabulary ARPA North American Business News database and the EU-funded SQALE project), that the hybrid HMM/ANN approaches lead to competitive state-of-the-art systems. Furthermore, the investigated hybrid approach was shown to have additional advantages in terms of CPU utilization and memory bandwidth. It is, however, our belief that such systems can also be more flexible and more robust. In addition to building on the WERNICKE large vocabulary continuous speech recognition system, SPRACH will investigate the development of systems for smaller, task-independent applications, with no need to retrain the system or develop a new lexicon or grammar when moving from one task to another. Although some typical examples of those tasks are given in the Expected Results section, they will eventually be redefined during the project to take the input and recommendations of our Industrial Advisory Board into account.


Analysis

As already mentioned, this project will build upon the 1992-1995 ESPRIT project WERNICKE, which developed a state-of-the-art, speaker-independent, large vocabulary continuous speech recognition system (comparable with the best) that is significantly more compact and efficient than its competitors.

WERNICKE also demonstrated that hybrid HMM/ANN technology is a viable, and probably preferable, basis for the goals of this project (e.g., it is more compact, less "specialized" and, consequently, easier to adapt to new tasks and new languages). The resulting hybrid HMM/ANN systems have proven to be good alternatives to standard HMM technology. This is particularly promising since it seems to be more and more difficult to improve on standard HMMs, and the need for alternative technologies and new paradigms is often acknowledged by scientists working in this field. The output of WERNICKE can thus be considered successful and has already attracted substantial interest from several industries. However, there is still much to be done to improve the existing system. Given what has been achieved in WERNICKE, the remaining research issues that will be addressed in SPRACH are listed and briefly described in this section.

As briefly explained in the Objectives section, the fundamental aim of the present project is to further develop and optimize our hybrid HMM/ANN speaker-independent, large vocabulary (>64K words), continuous speech recognizers, and to continue their comparison with other state-of-the-art systems. In addition, the advantages of hybrid HMM/ANN systems will be further exploited by extending the systems to new languages (UK English, French and Portuguese) and to flexible speech recognition systems that can easily be adapted to new domains with new lexica and new syntaxes.

To achieve this goal, the approach followed in this project is built upon several basic parts, spread across different Work Packages with very strong relationships and interdependencies:

  • Extension of baseline HMM/ANN systems (available for American English and UK English) to French and Portuguese, and adaptation to different assessment databases. This will be covered by Work Packages WP1 (for databases and baseline systems), WP2 (for lexica and automatic learning of lexica) and WP3 (for language models and language model adaptation).

  • Development of task-independent hybrid HMM/ANN recognizers in UK English, US English (for international assessment), French and Portuguese, and their assessment on the applications defined in the Expected Results section. This requires:

    • large databases in the targeted languages (covered by WP1),

    • automatic generation of phonetic transcription and phonological rules of new lexica (covered by WP2),

    • fast adaptation of language models (covered by WP3), and

    • task-independent acoustic models robust to noise and channel conditions (covered by WP4).

Formal assessment of these systems will not always be possible. However, prototype systems will be set up regularly and will be made available for testing by our industrial advisors (on applications possibly defined by them); this will be covered by WP7. Whenever possible, formal assessment will also be done on smaller databases (with or without retraining) when these are available.

  • Following the WERNICKE format, formal assessment and comparisons with other state-of-the-art systems via international competition on the basis of common databases will still be pursued. Therefore, this project will devote a large effort to the use of speech data that are widely used for evaluating continuous speech recognizers around the world. This will be covered by WP1 and WP7 (since training and assessment on large common databases require substantial effort, which was originally underestimated in WERNICKE). In WP7, a task exclusively devoted to maintaining a good and efficient decoder for large lexica has been added.

  • Development and evaluation of new theories and methods to improve or go beyond the existing hybrid HMM/ANN systems. This constitutes the research core of this project, and is addressed in work package WP5. In this work package, several promising approaches that could improve on or go beyond the initial hybrid HMM/ANN systems have been listed. Although this part is more research oriented, it is not too speculative, since preliminary work has already been done in each of the mentioned areas and since these are closely related to the issues mentioned above.

  • Use of common hardware and software tools to help the research and to implement resulting algorithms (covered by WP6). This was shown to be particularly useful and efficient in WERNICKE since:

    • This forces all the partners to work on the same software and hardware.

    • Although hybrid HMM/ANN approaches appear to show several advantages in terms of performance and reduced complexity during recognition, this is achieved at the cost of drastically increased training time, which makes further European developments and investigations in this field (and probably also in many other problems involving ANN algorithms) practically impossible without special hardware. This kind of hardware and associated software does not yet exist in Europe, and its development would probably require tens of man-years. Note that some more specialized computers have been developed for this purpose in Europe, but they are less suited to the flexible programming needs of a research environment such as that of WERNICKE.

    • This significantly reduces research and test cycles.

Recently, our WERNICKE subcontractor ICSI (also a subcontractor in the current proposal) released, as originally planned in WERNICKE, its full-custom single-chip vector microprocessor, which will be used in this project. This processor was designed to be a good match for the kind of research being done by the teams in this proposal. However, to surpass the level of performance obtained by high-end workstations, the design needed to be somewhat specialized for the relevant styles of computation. In particular, what was built was a fixed-point vector coprocessor, in addition to an on-chip general-purpose MIPS-compatible core. Some further software development will be required to make use of this chip both efficient and flexible enough for the lines of research pursued by this group. In particular, ICSI will develop software classes that perform all the computation for the kinds of neural networks that we use (feedforward MLPs and the kind of recurrent networks used at Cambridge) using fixed-point computation internally, while looking externally like a floating-point block of computation. This has been done for some pilot examples and appears to lose very little in efficiency for the sizes of networks we are interested in, while permitting users to think in terms of floating-point computation. Furthermore, the parallelism for such a block is also hidden. In general, the funding for this component will cover the costs of developing software building blocks that will be used by the partners to develop their application software.
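The idea of a block that computes in fixed point internally while presenting a floating-point interface can be illustrated with a minimal Python sketch. This is not the ICSI software: the class name FixedPointLayer, the helper functions, the choice of 12 fractional bits, and the evaluation of the sigmoid in floating point are all assumptions made purely for illustration.

```python
import math

# Assumed number of fractional bits (a hypothetical choice for this sketch).
FRAC_BITS = 12
SCALE = 1 << FRAC_BITS


def to_fixed(x):
    """Quantize a float to a signed fixed-point integer."""
    return int(round(x * SCALE))


def to_float(q):
    """Convert a fixed-point integer back to a float."""
    return q / SCALE


class FixedPointLayer:
    """One sigmoid MLP layer: float in, float out, integer arithmetic inside."""

    def __init__(self, weights, biases):
        # weights: one row of floats per output unit; biases: one float per unit.
        self.w = [[to_fixed(v) for v in row] for row in weights]
        self.b = [to_fixed(v) for v in biases]

    def forward(self, inputs):
        x = [to_fixed(v) for v in inputs]
        outputs = []
        for row, bias in zip(self.w, self.b):
            # Each product carries 2 * FRAC_BITS fractional bits,
            # so shift the accumulated sum back down to FRAC_BITS.
            acc = sum(wi * xi for wi, xi in zip(row, x)) >> FRAC_BITS
            acc += bias
            # Simplification: the sigmoid is evaluated in floating point here;
            # real fixed-point hardware would more likely use a lookup table.
            outputs.append(1.0 / (1.0 + math.exp(-to_float(acc))))
        return outputs


def float_forward(weights, biases, inputs):
    """Pure floating-point reference for the same layer."""
    return [
        1.0 / (1.0 + math.exp(-(sum(w * x for w, x in zip(row, inputs)) + b)))
        for row, b in zip(weights, biases)
    ]


weights = [[0.5, -0.25], [1.0, 0.75]]
biases = [0.1, -0.2]
inputs = [0.3, 0.8]

layer = FixedPointLayer(weights, biases)
fixed_out = layer.forward(inputs)
float_out = float_forward(weights, biases, inputs)
```

The caller passes and receives ordinary floats; the quantization is confined to the class, so the two outputs above differ only by a small rounding error that depends on the assumed number of fractional bits.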


Expected Results

Possible applications that will be targeted in this project and resulting demonstration systems will include, e.g.:

  • Very large vocabulary (>64K words) continuous speech recognition of read speech: this will be an essential enabling technology for many multimedia and telematics applications.

  • Voice-driven typewriter: A dictation system running in real time with simple editing commands.

  • Flexible continuous speech recognizer in which lexica and grammars can be defined on the spot, without the need for training.

  • Smaller (but realistic) tasks, including, e.g., robust recognition of free-format numbers. This could be done on the basis of existing databases such as the OGI numbers databases.

  • Recognition of broadcast speech: transcription of radio or television speech (e.g. newsreaders).

  • Extension of the above to several European languages. On top of the properties discussed above, another interesting feature of the hybrid systems is that they do not seem to require extensive knowledge of the languages or their phonological rules to adapt the recognizer. With appropriate databases (which are becoming more and more available), development of a recognizer for a new language is quite straightforward.

As a by-product, this project will also give the partners access to the new fast and flexible hardware and software being developed by the same team that provided these capabilities for the WERNICKE project. As was shown in that project, the availability of common hardware and software that is somewhat customized for the research approaches under investigation permitted both the incorporation of very computationally intensive algorithms and the comparison of their efficacy across the different sites. In SPRACH, we intend to incorporate some of the newest relevant hardware and software technology, which has been developed largely to support research such as that being done in WERNICKE and SPRACH.

This proposal is ambitious. However, due to the quality of the partners, their strong relationships, and the subcontract which will provide the project with the necessary hardware and software tools, the research should be successful and should make several important technical and scientific contributions to both ANN and HMM speech recognition technologies. Through its many innovative aspects (including software and hardware), this project can have a direct impact on future products, such as multimedia applications.


Project workplan

SPRACH Technical Annex


Reports

Demo

The SPRACHdemo is a live multilingual speech recognition system able to recognize English, French and Portuguese.