
Task 3.1: Status

SU's work in this area has concentrated on developing n-gram language models that use named entity tags. In our experience, around 70% of out-of-vocabulary (OOV) words are names (personal, location, company, etc.). We have investigated running a named entity tagger over the LM training text to build language models that incorporate these tags. This approach appears very promising in speech recognition experiments on the NAB database: using these language models to increase the vocabulary of the system from 20,000 words to over 100,000 words reduced the word error rate from 20.5% to 17.7%.
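The general idea of training an n-gram model on entity-tagged text can be illustrated with a minimal sketch. This is not SU's implementation: the toy gazetteer `TOY_ENTITIES`, the tag names, and the helper functions `tag_entities`, `bigram_counts` and `bigram_prob` are all hypothetical, and a simple dictionary lookup stands in for a real named entity tagger. The sketch only shows how replacing names by class tags lets the model pool statistics across names.

```python
from collections import Counter

# Hypothetical toy gazetteer standing in for a real named entity tagger.
TOY_ENTITIES = {
    "london": "<LOCATION>",
    "paris": "<LOCATION>",
    "smith": "<PERSON>",
    "acme": "<COMPANY>",
}


def tag_entities(tokens):
    """Lowercase each token and replace gazetteer entries by their entity tag."""
    return [TOY_ENTITIES.get(tok.lower(), tok.lower()) for tok in tokens]


def bigram_counts(sentences):
    """Count bigram histories and bigrams over entity-tagged sentences."""
    history = Counter()
    bigrams = Counter()
    for sentence in sentences:
        tagged = ["<s>"] + tag_entities(sentence.split()) + ["</s>"]
        for prev, cur in zip(tagged, tagged[1:]):
            history[prev] += 1
            bigrams[(prev, cur)] += 1
    return history, bigrams


def bigram_prob(history, bigrams, prev, cur):
    """Maximum-likelihood estimate of P(cur | prev); 0.0 for an unseen history."""
    if history[prev] == 0:
        return 0.0
    return bigrams[(prev, cur)] / history[prev]


# Assumed toy training text; a real system would use the LM training corpus.
corpus = [
    "Smith flew to London",
    "Smith flew to Paris",
]
history, bigrams = bigram_counts(corpus)

# Both city names collapse onto one tag, so their counts are shared.
p_location_after_to = bigram_prob(history, bigrams, "to", "<LOCATION>")
```

In this toy setting, both sentences contribute to the same bigram (`to`, `<LOCATION>`), whereas a purely word-based model would split the counts between the two city names. A name absent from the word vocabulary could then be handled through its tag, provided the tagger recognises it; how the tags are expanded back into words at recognition time is not covered by this sketch.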

Christophe Ris