Named entity tagged language models seem to be a very promising
approach to extending the vocabulary size of an LVCSR system.
Immediate future work, some of which may be carried out in THISL because
the THISL task has a naturally growing vocabulary (new names keep
appearing in the news), will concentrate on the following areas:
- Improved text normalization and preprocessing;
- Specialization of the NE tagger for language modeling rather
than the DARPA MUC task;
- Possible investigation of statistical NE tagger (using n-grams