

Work on a variety of novel language modelling techniques is in progress, and these are being incorporated into our existing speech recognition systems. Results are promising for the Named Entity tagged n-grams, particularly with respect to increasing vocabulary size to well beyond 64K. The Latent Semantic Analysis approach to adaptive language modelling, which derives a "document space", has been developed. Small improvements on text corpora have been observed, and speech recognition experiments are in progress. The n-gram cache approach has been extensively investigated; the results appear to indicate that a perfect cache (one based on a reference transcription) does not improve speech recognition performance. Language modelling techniques, at both word and morpheme level, have been developed for Portuguese and evaluated on text. Speech recognition tests are planned for these LMs.

Christophe Ris