
Task 3.3: Status

The principal contribution of the work at SU is to characterize the document space produced by Latent Semantic Analysis (LSA) and to demonstrate the use of that approach in mixture language models. Text experiments have been carried out using the British National Corpus, and we expect speech recognition results by the time of the review meeting.
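As a rough illustration of what a mixture language model computes, the sketch below linearly interpolates the word probabilities of several component models. It is not the SU implementation: the names `mixture_prob` and `unigram_lm`, the toy component models, and the fixed mixture weights are all assumptions made for the example, and in an LSA-based system the weights would presumably be derived from the document space rather than set by hand.

```python
from typing import Callable, Dict, Sequence

# A component LM maps (word, history) to a probability (hypothetical interface).
ComponentLM = Callable[[str, Sequence[str]], float]


def unigram_lm(table: Dict[str, float]) -> ComponentLM:
    """Wrap a word -> probability table as a component LM that ignores history."""
    def prob(word: str, history: Sequence[str]) -> float:
        return table.get(word, 0.0)
    return prob


def mixture_prob(word: str,
                 history: Sequence[str],
                 components: Sequence[ComponentLM],
                 weights: Sequence[float]) -> float:
    """Linear interpolation: P(w | h) = sum_k weight_k * P_k(w | h)."""
    if len(components) != len(weights):
        raise ValueError("need exactly one weight per component model")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixture weights must sum to 1")
    return sum(w * lm(word, history) for w, lm in zip(weights, components))


# Two toy topic-specific components (made-up probabilities).
finance_lm = unigram_lm({"bank": 0.4, "river": 0.1})
nature_lm = unigram_lm({"bank": 0.8, "river": 0.2})
```

Calling `mixture_prob("bank", [], [finance_lm, nature_lm], [0.25, 0.75])` weights the second component three times as heavily as the first; adapting the model to a document then amounts to changing the weights rather than retraining the components.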

CUED have developed an adaptation method termed an N-best cache. Its effectiveness has been assessed on DARPA Hub-4 Broadcast News data. The results show that, although large reductions in perplexity are possible, the effect on word error rate is minimal. Even when language model adaptation is applied in supervised mode (i.e., when the adaptation is based on the correct transcription), no significant effect on word error rate is observed.

INESC have worked on Portuguese language modelling. A decomposition method using both words and morphemes has been developed and applied. Experiments have been carried out on an 11-million-word portion of the BD-PUBLICO database, and results are reported in terms of perplexity, out-of-vocabulary (OOV) rate and memory requirements.
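For readers unfamiliar with the two text-based metrics mentioned in this section, the following minimal sketch shows their standard definitions. The function names `oov_rate` and `perplexity` are hypothetical, and the code assumes the language model has already assigned a non-zero probability to each test token; it does not reproduce the tools used by INESC or CUED.

```python
import math
from typing import Iterable, Sequence, Set


def oov_rate(tokens: Sequence[str], vocabulary: Set[str]) -> float:
    """Fraction of test tokens that are not in the recognizer vocabulary."""
    if not tokens:
        raise ValueError("token sequence is empty")
    unknown = sum(1 for token in tokens if token not in vocabulary)
    return unknown / len(tokens)


def perplexity(token_probs: Iterable[float]) -> float:
    """exp of the negative mean log-probability the model gives the test tokens."""
    probs = list(token_probs)
    if not probs:
        raise ValueError("no token probabilities given")
    if any(p <= 0.0 for p in probs):
        raise ValueError("every token needs a non-zero probability")
    mean_log_prob = sum(math.log(p) for p in probs) / len(probs)
    return math.exp(-mean_log_prob)
```

A model that assigns probability 0.25 to every test token has a perplexity of about 4, i.e. it is as uncertain as a uniform choice among four words. Lower perplexity means the model predicts the text better, although, as the CUED results indicate, this does not guarantee a lower word error rate.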


Christophe Ris
1998-11-10