In this task we are investigating dynamic and adaptive language modelling algorithms that can operate in an unsupervised manner. Various adaptive language models have been developed, many of which have been shown to have a lower perplexity than the equivalent baseline trigram model. However, when these models have been incorporated into speech recognition systems, their effect on word error rate has been less encouraging. One possible explanation for this is that adaptation techniques generally require an initial transcription upon which to base the adaptation. If this transcription contains many errors, it will provide little reliable information about the topic and style of the text, and the resulting adaptation will be poor.
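To make the perplexity comparison concrete, the following is a minimal sketch of how trigram perplexity is computed. The smoothing method (add-one) and the toy data are illustrative assumptions, not the models used in this work; real systems would use more sophisticated discounting.

```python
import math
from collections import Counter

def train_trigram(tokens):
    # Collect trigram counts and the bigram counts of their contexts.
    trigrams = Counter(zip(tokens, tokens[1:], tokens[2:]))
    bigrams = Counter(zip(tokens, tokens[1:]))
    vocab = set(tokens)
    return trigrams, bigrams, vocab

def perplexity(tokens, trigrams, bigrams, vocab):
    # Add-one (Laplace) smoothed trigram probabilities; perplexity is
    # the exponentiated average negative log-probability per word.
    log_prob, n, V = 0.0, 0, len(vocab)
    for w1, w2, w3 in zip(tokens, tokens[1:], tokens[2:]):
        p = (trigrams[(w1, w2, w3)] + 1) / (bigrams[(w1, w2)] + V)
        log_prob += math.log(p)
        n += 1
    return math.exp(-log_prob / n)
```

An adaptive model "lowers perplexity" when, after adapting its counts or interpolation weights to the transcription, this quantity drops relative to the static baseline.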
This work aims to develop a method for language model adaptation which will be effective even if the initial transcription has a high error rate. In addition, the premise that adaptation performance is linked to the error rate of the initial transcription is investigated.
We are also concerned with adapting language modelling technology to other languages, in particular Portuguese. Here the aim is to develop techniques for building robust statistical language models for highly inflectional languages of this kind.
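One standard difficulty with inflectional languages is vocabulary growth: many surface forms share a stem, so whole-word statistics fragment. A common remedy is to model sub-word units. The sketch below, a hypothetical illustration rather than the method adopted in this work, splits words into stem + ending using a hand-picked suffix list; a real system would use a proper morphological analyser for Portuguese.

```python
# Illustrative Portuguese verb suffixes (an assumption for this sketch).
SUFFIXES = ["amos", "ando", "ou", "ar"]

def decompose(word, suffixes=SUFFIXES):
    # Split an inflected form into stem + ending so that n-gram
    # statistics are shared across inflections of the same stem.
    for suf in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suf) and len(word) > len(suf) + 2:
            return [word[: -len(suf)], "+" + suf]
    return [word]

words = ["falou", "falamos", "falando", "cantou", "cantamos", "cantando"]
units = [u for w in words for u in decompose(w)]
```

Here six distinct word forms reduce to five sub-word units (two stems plus three endings), and the effect grows with corpus size, which is why such decompositions can make n-gram estimates more robust for inflectional languages.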