Next: Task 7.1: Future Developments Up: Task 7.1: Decoder Issues Previous: Task 7.1: Status

Task 7.1: Technical Description

The NOWAY decoder was initially developed as part of the WERNICKE project, and development has continued in SPRACH (and THISL). It was designed for large vocabulary continuous speech recognition with arbitrary language models. It is a modified stack decoder, referred to as a start-synchronous decoder. The most complete description of the algorithms used, and of the decoder's performance, is given in the paper submitted to IEEE Trans. Speech and Audio Proc., which is included in the appendix. Over the past year, much of the decoder development has focused on two areas: online decoding, and support for the various LMs investigated in WP3.

Previously, decoding proceeded utterance by utterance, with each segment passed to the decoder having a duration of 30s or less. For many applications, such as decoding a stream of broadcast audio, this is unrealistic, and online decoding is required. By online we mean:

1. The capability to process an input stream of acoustic data of arbitrary length;
2. Incremental output of recognition hypotheses;
3. No increase in CPU/memory usage with increasing "utterance length".

The efficient implementation of online decoding in NOWAY required a substantial re-implementation of the hypothesis classes, in which common histories are merged and prefixes common to all hypotheses are output incrementally. The hypothesis data structure is now tree-structured: each node of the tree corresponds to a hypothesis element (word) and includes an end time, a local (word) probability, a global (hypothesis) probability, a pointer to its parent (previous hypothesis) and a count of the number of children. Extending a hypothesis by one word involves dynamically allocating a new child of the parent node. Deleting a hypothesis involves deleting the node corresponding to its final word and decrementing the child count of the parent.

Parents with a count of zero are added to a garbage collection list (garbage collection is not immediate, since new children may still be generated). This list is used for a "reverse" garbage collection, in which nodes with an end time earlier than the current reference time are deleted; the process continues recursively, deleting the parents of deleted nodes once they have no children.

A second, "forward" garbage collection phase is used to provide online output. Starting at the root of the tree, the information for each node that ends before the current reference time and has exactly one child is output, and the node is deleted. This form of online output has an average "algorithm delay" of about 2s on a typical broadcast news decoding.
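The hypothesis tree and its two garbage collection phases can be sketched as follows. This is a minimal illustration in Python, not NOWAY's actual code: the names `Node`, `HypothesisTree`, `extend`, `delete`, `reverse_gc` and `forward_output` are hypothetical, probabilities are assumed to be log probabilities, and, since nodes hold only parent pointers, the forward phase is assumed to locate the common prefix by walking up from any surviving hypothesis.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass(eq=False)  # identity-based hashing, so nodes can live in a set
class Node:
    word: Optional[str]            # None marks the start-of-stream root
    end_time: float
    local_logprob: float           # log probability of this word
    global_logprob: float          # log probability of the whole hypothesis
    parent: Optional["Node"] = None
    n_children: int = 0
    deleted: bool = False


class HypothesisTree:
    def __init__(self) -> None:
        self.root = Node(None, 0.0, 0.0, 0.0)
        self.gc_pending: Set[Node] = set()  # childless parents, collected later

    def extend(self, parent: Node, word: str, end_time: float,
               logprob: float) -> Node:
        """Extend a hypothesis by one word: allocate a new child of `parent`."""
        child = Node(word, end_time, logprob,
                     parent.global_logprob + logprob, parent)
        parent.n_children += 1
        return child

    def _unlink(self, node: Node) -> Optional[Node]:
        parent, node.parent, node.deleted = node.parent, None, True
        if parent is not None:
            parent.n_children -= 1
        return parent

    def delete(self, leaf: Node) -> None:
        """Delete a hypothesis; a parent left childless is only queued."""
        parent = self._unlink(leaf)
        if parent is not None and parent.n_children == 0:
            self.gc_pending.add(parent)

    def reverse_gc(self, ref_time: float) -> None:
        """Delete queued childless nodes that end before ref_time, then
        recurse towards the root through parents that become childless."""
        pending = list(self.gc_pending)
        self.gc_pending.clear()
        while pending:
            node = pending.pop()
            if node.deleted or node.n_children > 0 or node is self.root:
                continue                       # reused, gone, or the root
            if node.end_time >= ref_time:
                self.gc_pending.add(node)      # may still gain children
                continue
            parent = self._unlink(node)
            if parent is not None and parent.n_children == 0:
                pending.append(parent)

    def forward_output(self, leaf: Node,
                       ref_time: float) -> List[Tuple[str, float]]:
        """Output and delete the single-child chain hanging from the root."""
        path = []
        node: Optional[Node] = leaf
        while node is not None:                # walk up to the root
            path.append(node)
            node = node.parent
        path.reverse()
        if path[0] is not self.root:
            raise ValueError("leaf is not attached to the tree")
        out = []
        for node, child in zip(path, path[1:]):
            if node.n_children != 1 or node.end_time >= ref_time:
                break
            if node.word is not None:
                out.append((node.word, node.end_time))
            node.deleted, child.parent = True, None
            self.root = child                  # the only child becomes the root
        return out


# Two competing hypotheses: "the cat sat" and "the cap sat".
tree = HypothesisTree()
a = tree.extend(tree.root, "the", 0.3, -1.0)
b = tree.extend(a, "cat", 0.7, -2.0)
c = tree.extend(a, "cap", 0.7, -2.5)
d = tree.extend(b, "sat", 1.2, -1.5)
e = tree.extend(c, "sat", 1.2, -1.5)

tree.delete(e)                       # prune "the cap sat"; "cap" is queued
tree.reverse_gc(ref_time=1.0)        # "cap" ends before 1.0 and is collected
emitted = tree.forward_output(d, ref_time=1.0)
print(emitted)
```

In this sketch the tree never retains more than the part of the hypotheses that is still ambiguous, which is what keeps memory use independent of the length of the input stream.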

Previously, the NOWAY decoder supported only bigram and trigram language models. To support work in WP3, the decoder now supports n-grams for any n, mixtures of n-gram language models, and class-based language models for the named entity language modelling work in T3.1. All these language models may be applied in a single decoding pass, without the need for interim lattice construction. Since these language model classes were written for LM experimentation, relatively little effort has been directed towards efficiency.
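As an illustration of how such models can share one interface and so be applied in a single pass, the sketch below treats every language model as a function returning P(word | history). It is not NOWAY's implementation: the names `table_lm`, `mixture_lm` and `class_lm` are hypothetical, the mixture is assumed to be a linear interpolation with fixed weights, the class-based model is assumed to factor as P(class | class history) P(word | class), and the constant `floor` stands in for proper n-gram backoff.

```python
from typing import Callable, Dict, Sequence, Tuple

History = Tuple[str, ...]
LM = Callable[[str, History], float]        # returns P(word | history)


def table_lm(table: Dict[Tuple[History, str], float], order: int,
             floor: float = 1e-6) -> LM:
    """Toy n-gram of any order: table lookup with a constant floor
    (an assumption standing in for real backoff)."""
    def prob(word: str, history: History) -> float:
        context = tuple(history[-(order - 1):]) if order > 1 else ()
        return table.get((context, word), floor)
    return prob


def mixture_lm(components: Sequence[LM], weights: Sequence[float]) -> LM:
    """Linear interpolation: P(w|h) = sum_i weight_i * P_i(w|h)."""
    if len(components) != len(weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("need one weight per component, summing to 1")

    def prob(word: str, history: History) -> float:
        return sum(w * lm(word, history)
                   for w, lm in zip(weights, components))
    return prob


def class_lm(class_of: Dict[str, str], class_ngram: LM,
             member_prob: Dict[Tuple[str, str], float]) -> LM:
    """Class-based model: P(w|h) = P(c(w) | c(h)) * P(w | c(w)).
    Words without a class are treated as their own single-member class."""
    def prob(word: str, history: History) -> float:
        cls = class_of.get(word, word)
        class_history = tuple(class_of.get(w, w) for w in history)
        return class_ngram(cls, class_history) * member_prob.get((word, cls), 1.0)
    return prob


# Toy word bigram, and a class bigram with a CITY named-entity class.
general = table_lm({(("in",), "london"): 0.02, (("in",), "paris"): 0.01},
                   order=2)
named = class_lm(
    class_of={"london": "CITY", "paris": "CITY"},
    class_ngram=table_lm({(("in",), "CITY"): 0.2}, order=2),
    member_prob={("london", "CITY"): 0.5, ("paris", "CITY"): 0.5},
)
mixed = mixture_lm([general, named], weights=[0.7, 0.3])
print(mixed("london", ("in",)))
```

Because the decoder only ever asks for P(word | history), any of these models, or a mixture of them, can be consulted directly during search rather than being applied afterwards to a lattice.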

Much of the work in this task may be classed as software maintenance and support, including ongoing bug fixing and compatibility with various file formats (ARPA/NIST CTM and SRT formats, HTK SLF lattice format, ARPA/NIST index (ndx) files).


Christophe Ris
1998-11-10