Future Development
Work on a variety of novel language modelling techniques is in
progress, and these are being incorporated into our existing speech
recognition systems.
Results are promising for the Named Entity
tagged n-grams, particularly with respect to increasing vocabulary
size to well beyond 64K. The Latent Semantic Analysis approach to
adaptive language modelling, which derives a "document space", has
been developed. Small improvements on text corpora have been
observed, and speech recognition experiments are in progress. The n-gram
cache approach has been extensively investigated; the results suggest
that even a perfect cache (one built from the reference transcription)
does not improve speech recognition performance.
Language modelling techniques, at both word and morpheme level, have
been developed for Portuguese and evaluated on text. Speech
recognition tests are planned for these LMs.
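
To make the cache idea mentioned above concrete, here is a minimal sketch of a cache language model, simplified to a unigram cache interpolated with a static model. It is not the project's implementation: the function names, the toy probabilities, and the interpolation weight `lam` are all assumptions chosen for illustration. In a "perfect cache" experiment, the history would be taken from the reference transcription rather than from the recogniser's own output.

```python
from collections import Counter

def cache_interpolated_prob(word, history, static_prob, lam=0.9):
    """Return lam * P_static(word) + (1 - lam) * P_cache(word | history).

    `history` is the list of recently seen words (the cache).
    `lam` is a hypothetical interpolation weight, not a value from the report.
    """
    if not history:
        # Nothing cached yet: fall back to the static model alone.
        return static_prob(word)
    counts = Counter(history)
    p_cache = counts[word] / len(history)  # relative frequency in the cache
    return lam * static_prob(word) + (1.0 - lam) * p_cache

# Toy static unigram model (illustrative numbers only).
STATIC = {"the": 0.5, "cache": 0.1, "model": 0.2, "speech": 0.2}

def static_prob(word):
    return STATIC.get(word, 0.0)

# Words seen so far in the current document.
history = ["speech", "cache", "cache", "model"]
p = cache_interpolated_prob("cache", history, static_prob, lam=0.8)
```

Here the recently repeated word "cache" receives a higher probability than the static model alone would give it, which is the effect the cache is meant to produce.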