Previously, decoding consisted of processing speech utterance by utterance, with each segment passed to the decoder having a duration of 30s or less. For many applications, such as decoding a continuous stream of broadcast audio, this is unrealistic; in such cases we require online decoding. By online we mean:
The efficient implementation of online decoding in NOWAY required a substantial re-implementation of the hypothesis classes, in which common histories are merged and prefixes common to all hypotheses are output incrementally. The hypothesis data structure is now tree-structured: each node of the tree corresponds to a hypothesis element (word) and includes an end time, a local (word) probability, a global (hypothesis) probability, a pointer to its parent (the previous hypothesis) and a count of the number of children. Incrementing a hypothesis by one word involves dynamically allocating a new child of the parent node. Deleting a hypothesis involves deleting the node corresponding to the final word and decrementing the count of its parent. Parents whose count reaches zero are added to a garbage collection list (garbage collection is not immediate, since new children may still be generated). This list drives a ``reverse'' garbage collection in which nodes with an end time earlier than the current reference time are deleted; the process continues recursively, deleting parents of deleted nodes that are left with no children. A second ``forward'' garbage collection phase provides the online output: starting from the root of the tree, the information for each node (earlier than the current reference time) with exactly one child is output, and the node is deleted. This form of online output has an average ``algorithm delay'' of about 2s on a typical broadcast news decoding.
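The tree-structured hypothesis store and the two garbage-collection phases can be sketched as follows. This is an illustrative sketch only: the class and function names (`HypNode`, `extend`, `delete_hyp`, `reverse_gc`, `forward_gc`) are invented for this example and are not NOWAY's actual identifiers, and the real decoder deletes nodes during forward collection rather than merely reading them out.

```python
class HypNode:
    """One hypothesis element (word) in the tree-structured store."""

    def __init__(self, word, end_time, word_prob, hyp_prob, parent=None):
        self.word = word            # hypothesis element (word)
        self.end_time = end_time    # end time of this word
        self.word_prob = word_prob  # local (word) probability
        self.hyp_prob = hyp_prob    # global (hypothesis) probability
        self.parent = parent        # pointer to the previous hypothesis
        self.children = []          # live one-word extensions
        if parent is not None:
            parent.children.append(self)


def extend(parent, word, end_time, word_prob):
    """Increment a hypothesis by one word: allocate a new child node."""
    return HypNode(word, end_time, word_prob,
                   parent.hyp_prob * word_prob, parent)


def delete_hyp(node, garbage):
    """Delete a hypothesis (its final word); a parent left childless goes
    on the garbage list, since collection is not immediate."""
    parent = node.parent
    if parent is not None:
        parent.children.remove(node)
        if not parent.children:
            garbage.append(parent)


def reverse_gc(garbage, ref_time):
    """``Reverse'' phase: delete listed nodes older than the reference
    time, recursing to parents left with no children."""
    while garbage:
        node = garbage.pop()
        if node.end_time < ref_time and not node.children:
            delete_hyp(node, garbage)


def forward_gc(root, ref_time):
    """``Forward'' phase: starting from the root, emit the prefix common
    to all hypotheses, i.e. the chain of nodes (earlier than ref_time)
    with exactly one child."""
    output, node = [], root
    while len(node.children) == 1 and node.children[0].end_time < ref_time:
        node = node.children[0]
        output.append(node.word)
    return output
```

With this structure the common prefix grows only as hypotheses are pruned down to a single surviving branch, which is the source of the roughly 2s average algorithm delay noted above.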
Previously, the NOWAY decoder supported only bigram and trigram language models. To support the work in WP3, the decoder now supports n-grams for any n, mixtures of n-gram language models, and class-based language models for the named entity language modelling work in T3.1. All of these language models may be applied in a single decoding pass, without the need for interim lattice construction. Since these classes were written for language model experimentation, relatively little effort has so far been directed towards efficiency.
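As a minimal sketch of how a mixture of n-gram models can be applied in a single pass, each component scores a word given the history truncated to its own order, and the mixture linearly interpolates the component probabilities with fixed weights. The dict-backed `NGramLM` and `MixtureLM` classes below are stand-ins invented for illustration, not NOWAY's actual LM classes, and real models would use proper backoff rather than a probability floor.

```python
class NGramLM:
    """Toy n-gram model of a given order, backed by an explicit table."""

    def __init__(self, order, probs):
        self.order = order
        self.probs = probs  # maps (history_tuple, word) -> probability

    def prob(self, word, history):
        # Condition only on the last n-1 words of the history.
        h = tuple(history[-(self.order - 1):]) if self.order > 1 else ()
        # Crude floor for unseen events; a real LM would back off instead.
        return self.probs.get((h, word), 1e-6)


class MixtureLM:
    """Linear interpolation of component n-gram models."""

    def __init__(self, components, weights):
        self.components = components
        self.weights = weights  # should sum to 1

    def prob(self, word, history):
        return sum(w * lm.prob(word, history)
                   for w, lm in zip(self.weights, self.components))
```

Because the mixture exposes the same `prob(word, history)` interface as a single model, the decoder can query it during search exactly as it would a plain trigram, which is what makes single-pass application possible.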
Much of the work in this task may be classed as software maintenance and support, including ongoing bug fixing and maintaining compatibility with various file formats (the ARPA/NIST CTM and SRT formats, the HTK SLF lattice format, and ARPA/NIST index (ndx) files).