
Task 3.3: Objective

In constructing a language model intended for general text, one faces the following trade-off. A model trained on material from a specific domain will perform well on text from that domain but poorly on more general text. A model trained on text from many diverse sources will perform better on general text but will not be especially well suited to any particular domain. The ideal would be a general language model whose parameters are tuned automatically to the style of the text it is attempting to model.

In this task we are investigating dynamic and adaptive language modelling algorithms that can operate in an unsupervised manner. Various adaptive language models have been developed, many of which have been shown to have a lower perplexity than the equivalent baseline trigram model. However, when these models have been incorporated into speech recognition systems their effect on word error rate has been less encouraging. One possible explanation for this phenomenon is that the adaptation techniques will, in general, require an initial transcription upon which to base the adaptation. If this transcription contains many errors then it will be of little use in providing information about the topic and style of the text, and so the resulting adaptation will be poor.
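As a rough illustration of why the quality of the initial transcription matters, the following minimal sketch interpolates a static unigram model with a cache of previously seen words and measures perplexity on a reference text. It is a toy under stated assumptions, not the adaptive models or the trigram baseline referred to above: the unigram cache, the add-one smoothing, the interpolation weight `lam`, and the names `cache_perplexity` and `transcript` are all hypothetical choices made for the example.

```python
import math
from collections import Counter


def cache_perplexity(reference, background, vocab, lam=0.0, transcript=None):
    """Perplexity of `reference` under a static unigram model interpolated
    with a unigram cache.

    The cache is filled from `transcript`, a hypothetical stand-in for the
    recogniser's initial transcription. If `transcript` is None, the cache
    is filled from the reference itself (an error-free transcription).
    With lam == 0 the cache is ignored and the static model is used alone.
    """
    if transcript is None:
        transcript = reference
    if len(transcript) != len(reference):
        raise ValueError("transcript and reference must have the same length")

    bg_counts = Counter(background)
    bg_total = len(background)
    vocab_size = len(vocab)

    cache = Counter()  # counts of transcript words seen so far
    seen = 0           # number of transcript words seen so far
    log_sum = 0.0

    for ref_word, hyp_word in zip(reference, transcript):
        # Static model: add-one smoothed unigram estimate (an assumption).
        p = (bg_counts[ref_word] + 1) / (bg_total + vocab_size)
        if lam > 0 and seen > 0:
            # Mix in the relative frequency of the word in the cache.
            p = (1 - lam) * p + lam * cache[ref_word] / seen
        log_sum += math.log(p)
        # Adapt on what was transcribed, not on what was actually said.
        cache[hyp_word] += 1
        seen += 1

    return math.exp(-log_sum / len(reference))


# Toy data: a general background corpus and an out-of-domain reference text.
background = "the cat sat on the mat the dog sat on the log".split()
reference = "stock prices fell as stock markets fell and stock prices fell".split()
noisy = "the cat sat on the mat the dog sat on the".split()  # every word wrong
vocab = set(background) | set(reference)

baseline = cache_perplexity(reference, background, vocab)
adapted = cache_perplexity(reference, background, vocab, lam=0.5)
degraded = cache_perplexity(reference, background, vocab, lam=0.5, transcript=noisy)
```

On this toy data, adapting on an error-free transcription gives a lower perplexity than the static model, while adapting on a transcription in which every word is wrong gives a higher one, since probability mass is moved onto words that never occur in the reference.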

This work aims to develop a method for language model adaptation which will be effective even if the initial transcription has a high error rate. In addition, the premise that adaptation performance is linked to the error rate of the initial transcription is investigated.

We are also concerned with adapting the language model technology to other languages, specifically Portuguese. Here the aim is to develop techniques for building robust statistical language models for highly inflectional languages such as Portuguese.


Christophe Ris
1998-11-10