
WP8: RESULTS DISSEMINATION AND EXPLOITATION

Work Package Manager: SU
Executing Partners: all partners, including the Industrial Advisory Board

WP8 Overview

The objective of this WP is to effectively disseminate and exploit results arising from the project.

WP8 Milestones and Deliverables

WP8 Milestones: M8.1, M8.2

WP8 Deliverable

There are no deliverables per se for WP8; application software is delivered in WP7.

Task 8.1: Information Dissemination

Task Coordinator: SU
Executing Partners: all
 

Task 8.1: Task Objective

The objective of this task is to efficiently disseminate the results arising from the project.

Task 8.1: Status

  1. SPRACH home page (http://tcts.fpms.ac.be/sprach/sprach.html). This was constructed at the beginning of the project, and has been updated to reflect project progress.
  2. SPRACHWORKS. At the last internal meeting (see the meeting minutes), it was agreed to release a bundled, documented software package including the source code, associated data and demonstrations resulting from WERNICKE and SPRACH.
  3. HP-Abbot Research Contract. A research contract, sponsored by Hewlett Packard Labs (Bristol, UK), ran in collaboration with Cambridge and Sheffield Universities for 6 months during 1996. HP sought access to ABBOT for its own research purposes. The aim of the project was to deliver a complete, documented version of ABBOT to HP, to port all software to widely available platforms, and to provide support for its use.
  4. Publications. Over 20 papers supported by SPRACH have been published or accepted during the period. In addition, one PhD thesis and five Master's theses describing work supported by the project have been submitted.

Task 8.1: Technical Description

SPRACHWORKS

The SPRACHWORKS software release, scheduled for April 1997, will include:

  1. Overview document: how to build a system,
  2. STRUT: the FPMs training and recognition toolkit (see progress report for a full description),
  3. RASTA code,
  4. ICSI software, including QuickNet and fast matrix/vector libraries,
  5. NOWAY: the SU/CUED large vocabulary decoder,
  6. CMU-Cambridge Language Model toolkit,
  7. BEEP/CMU dictionary/Moby,
  8. Pyne Hill Robinson (a small single-speaker speech database),
  9. ABBOTDEMO: the CUED demonstration system (using NOWAY).

Several options for the distribution license agreement were considered:

  1. GNU General Public License (``copyleft''): everything is free (but not public domain), including commercial use, as long as the source code is always distributed;
  2. Free access, free use and free distribution of source code, excluding commercial use;
  3. Hybrid license agreement.

Option (2), i.e., free use excluding commercial use, was finally accepted by all partners.

However, bundling this software package requires a major effort in which CUED, SU, FPMs and ICSI will have to be actively involved. It was decided that:

  1. General management of this effort will be done by CUED.
  2. Name of the software package: SPRACHWORKS.
  3. Targeted internal release: February 1, 1997.
  4. Targeted official release: ICASSP'97 (April 1997).

The directory structure for the SPRACHWORKS package was agreed upon by comparing the current directory structures of STRUT (FPMs), DRSPEECH (ICSI) and ABBOT (CUED). The resulting directory structure is shown in Figure 8.1.

 
Figure 8.1: Directory structure for the SPRACHWORKS software release.

HP-Abbot

The key developments achieved under the HP-ABBOT contract were:

The final packaged software tree was burned onto CD and distributed to each of the SPRACH partners.

SPRACH Publications

Below we list all publications from the SPRACH consortium related to this project. Some of these publications are attached to this document for information or are included as deliverables.

Konig, Y., Bourlard, H., and Morgan, N. (1996), ``REMAP: Experiments with Speech Recognition,'' IEEE Proc. Intl. Conf. on Acoustics, Speech, and Signal Processing (Atlanta, GA), pp. VI:3350-3353, May 7-10, 1996.

Bourlard, H., Konig, Y., Morgan, N., and Ris, C. (1996), ``A New Training Algorithm for Hybrid HMM/ANN Speech Recognition Systems,'' Proceedings of VIII European Signal Processing Conference (EUSIPCO'96) (Trieste, Italy), pp. 101-104, Sep. 10-13, 1996.

Bourlard, H., Dupont, S., Hermansky, H., and Morgan, N. (1996), ``Towards Subband-Based Speech Recognition,'' Proceedings VIII European Signal Processing Conference (EUSIPCO'96) (Trieste, Italy), pp. 1579-1582, Sep. 10-13, 1996.

Bourlard, H., Hermansky, H., and Morgan, N. (1996), ``Towards Increasing Speech Recognition Error Rates,'' special-interest invited paper, SPEECH COMMUNICATION, vol. 18, no. 3, pp. 205-231, June 1996.

Bourlard, H. (1996), ``Copernicus and the ASR Challenge -- Waiting for Kepler,'' Invited Talk, Proceedings of ARPA Speech Recognition Workshop, Arden House, NY, pp. 157-162, Feb. 18-21, 1996.

Bourlard, H. (1996), ``Reconnaissance Automatique de la Parole: Modélisation ou Description?'' [Automatic Speech Recognition: Modelling or Description?], Actes des XXIèmes Journées d'Etude sur la Parole (JEP), Plenary Talk, pp. 263-272, Avignon (France), June 1996.

Bilmes, J., Morgan, N., Wu, S.-L., and Bourlard, H. (1996), ``Stochastic Perceptual Speech Models with Durational Dependence,'' Proc. of Intl. Conf. on Spoken Language Processing (ICSLP), Philadelphia, Oct. 3-6, 1996.

Bourlard, H., Konig, Y., and Morgan, N. (1996), ``A Training Algorithm for Statistical Sequence Recognition with Applications to Transition-Based Speech Recognition,'' IEEE SIGNAL PROCESSING LETTERS, vol. 3, no. 7, pp. 203-205.

Bourlard, H. and Dupont, S. (1996), ``A New ASR Approach Based on Independent Processing and Recombination of Partial Frequency Bands,'' Proc. of Intl. Conf. on Spoken Language Processing (ICSLP), Philadelphia, Oct. 3-6, 1996.

Bourlard, H. and Morgan, N. (1996), ``Connectionist Techniques,'' to be published in the NSF-EC Survey on the STATE OF THE ART IN SPEECH AND NATURAL LANGUAGE PROCESSING, R. Cole, J. Mariani, H. Uszkoreit, A. Zaenen, and V. Zue (Eds.), Springer Verlag, 1996.

Bourlard, H. and Dupont, S. (1997), ``Subband-Based Speech Recognition,'' accepted to IEEE Intl. Conf. on Acoustics, Speech, and Signal Processing, Munich, April 1997.

Dupont, S., Bourlard, H., Deroo, O., Fontaine, V., and Boite, J.-M. (1997), ``Hybrid HMM/ANN Systems for Training Independent Tasks: Experiments on `Phonebook' and Related Improvements,'' accepted to IEEE Intl. Conf. on Acoustics, Speech, and Signal Processing, Munich, April 1997.

Renals, S. (1996), ``Phone Deactivation Pruning in Large Vocabulary Continuous Speech Recognition,'' IEEE SIGNAL PROCESSING LETTERS, vol. 3, pp. 4-6, 1996.

Renals, S. and Hochberg, M. (1996), ``Efficient Evaluation of the LVCSR Search Space Using the NOWAY Decoder,'' IEEE Proc. Intl. Conf. on Acoustics, Speech, and Signal Processing (Atlanta, GA), vol. 1, pp. 149-152, May 1996.

Neto, J., Martins, C., and Almeida, L. (1996), ``An Incremental Speaker-Adaptation Technique for Hybrid HMM-MLP Recognizer,'' Proc. of Intl. Conf. on Spoken Language Processing (ICSLP), Philadelphia, pp. 1289-1292, Oct. 3-6, 1996.

Clarkson, P. and Robinson, T. (1997), ``Language Model Adaptation Using Mixtures and an Exponentially Decaying Cache,'' accepted to IEEE Intl. Conf. on Acoustics, Speech, and Signal Processing, Munich, April 1997.

Waterhouse, S., Kershaw, D., and Robinson, T. (1996), ``Smoothed Local Adaptation of Connectionist Systems,'' Proc. of Intl. Conf. on Spoken Language Processing (ICSLP), Philadelphia, Oct. 3-6, 1996.

Cook, G., Kershaw, D., Christie, J., Seymour, C., and Waterhouse, S. (1997), ``Transcription of Broadcast Television and Radio News: The 1996 Abbot System,'' accepted to IEEE Intl. Conf. on Acoustics, Speech, and Signal Processing, Munich, April 1997.

Cook, G. and Robinson, T. (1996), ``Boosting the Performance of Connectionist Large Vocabulary Speech Recognition,'' Proc. of Intl. Conf. on Spoken Language Processing (ICSLP), Philadelphia, Oct. 3-6, 1996.

Theses

As a result of SPRACH and the earlier WERNICKE project, one PhD thesis, entitled ``Phonetic Context-Dependency in a Hybrid ANN/HMM Speech Recognition System'', has been submitted [35] and two more are close to completion [31,39], all at Cambridge University. Related issues were explored in Master's theses, namely:

Task 8.1: Future Developments

The SPRACHWORKS software release is scheduled for April 1997, with an alpha release planned for February 1997.

Although the HP-ABBOT project is now complete, maintaining and managing this software tree is an on-going task.

Task 8.2: Exploitation of Results

Task Coordinator: SU
Executing Partners: all

Task 8.2: Task Objective

The objective of this task is to efficiently exploit results arising from the project, with the Industrial Advisory Board playing a leading role.

Task 8.2: Status

Section 6 of the technical annex discussed the possibility of forming a ``spin-off'' company to assist in the exploitation of the results. An agreement has been reached between an existing start-up company, SoftSound, and Cambridge University, under which the university has granted the company a licence. This is expected to be an on-going relationship, and it is hoped that it will provide a suitable mechanism for exploiting the results of this project.

The BBC has experimented with the ABBOTDEMO system using a collection of recordings made for BBC programmes. Although ABBOT can be spectacularly successful on speech read 'live' into a microphone, its performance is markedly worse on the recorded material the BBC would typically use.

Task 8.2: Technical Description

The initial performance of the ABBOTDEMO system on the BBC data is as follows:

  1. With a newsreader speaking clearly and with little background noise, the word recognition rate is about 65%. This is the best that we can achieve with any of the recordings.
  2. With clear studio recordings of unscripted speech, the best recognition rate we have achieved is 60%. In this case hesitations, repetitions/stuttering, and "ers" and "ums" throw the recogniser, but much of what remains is understandable.
  3. For an interview recorded in a studio, the recognition rate is about 15%. The main difference here is the casual, conversational delivery, although some complete sentences are spoken clearly and without interruption. Even isolating these sentences does not improve performance.
  4. In examples with a noticeably higher background noise level (even though recorded in a studio), the recognition rate is again poor, at about 15%.
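The report does not state exactly how the "word recognition rate" above was scored. As a hedged illustration only, assuming the common convention of aligning the recognised word string against a reference transcription by minimum edit distance, percent words correct is (N - S - D)/N and word accuracy is (N - S - D - I)/N, where N is the number of reference words and S, D, I count substitutions, deletions and insertions. The function names `align_counts` and `word_rates` are hypothetical and are not part of ABBOT, ABBOTDEMO or SPRACHWORKS.

```python
from typing import List, Tuple


def align_counts(ref: List[str], hyp: List[str]) -> Tuple[int, int, int]:
    """Return (substitutions, deletions, insertions) from a minimum edit
    distance alignment of hypothesis words against reference words."""
    n, m = len(ref), len(hyp)
    # cost[i][j] = (total_errors, subs, dels, ins) for ref[:i] vs hyp[:j]
    cost = [[(0, 0, 0, 0)] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = (i, 0, i, 0)  # all reference words deleted
    for j in range(1, m + 1):
        cost[0][j] = (j, 0, 0, j)  # all hypothesis words inserted
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            miss = 0 if ref[i - 1] == hyp[j - 1] else 1
            e, s, d, k = cost[i - 1][j - 1]
            diag = (e + miss, s + miss, d, k)  # match or substitution
            e, s, d, k = cost[i - 1][j]
            dele = (e + 1, s, d + 1, k)  # reference word missed
            e, s, d, k = cost[i][j - 1]
            ins = (e + 1, s, d, k + 1)  # extra hypothesis word
            # Tuples compare on total errors first, so this is the minimum-cost path.
            cost[i][j] = min(diag, dele, ins)
    _, s, d, k = cost[n][m]
    return s, d, k


def word_rates(reference: str, hypothesis: str) -> Tuple[float, float]:
    """Return (percent words correct, percent word accuracy)."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcription must not be empty")
    s, d, k = align_counts(ref, hyp)
    n = len(ref)
    correct = 100.0 * (n - s - d) / n
    accuracy = 100.0 * (n - s - d - k) / n
    return correct, accuracy
```

For example, `word_rates("the cat sat", "the cat sat down")` gives 100% words correct but about 66.7% accuracy, because the inserted word is penalised only by the accuracy measure. This distinction matters for unscripted speech, where hesitations such as "ers" and "ums" tend to produce insertions.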

Task 8.2: Future Developments

The BBC suggests that the reported variations in performance be investigated, as they are fundamental to the wider application of continuous speech recognition (CSR). (The BBC can put these recordings on the SPRACH ftp site if appropriate.) The BBC is also planning further investigations to identify the limitations of the recogniser as more recordings are gathered, and will report further when more results are available.

A new ESPRIT Long Term Research project (THISL) is scheduled to start in February 1997. The objective of THISL is information retrieval from broadcast speech; it will draw on the results of this project (as well as the preceding WERNICKE project).

WP8: Conclusion

This WP is progressing very well, with a wide range of publications arising from the project and with project advances disseminated through follow-on projects and the planned release of the SPRACHWORKS software.






Jean-Marc Boite
Tue Jan 7 12:46:31 MET 1997