#### Mixture LM

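A mixture LM, $P_M$, is constructed as a weighted sum of component LMs, each derived from one partition of the corpus (partitioned either by hand labeling or automatically) [7]. Given a document $D$, i.e., a sequence of words $w_1, w_2, \ldots, w_N$, its probability is computed from the conventional trigram component LMs $P_j$ by

$$P_M(D) = \prod_{i=1}^{N} \sum_{j} c_j \, P_j(w_i \mid w_{i-2}, w_{i-1}) \tag{3.6}$$

where $c_j$ is a mixing factor such that $\sum_j c_j = 1$.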
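The weighted sum above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the `ComponentLM` callable interface, the `<s>` padding symbol, and the toy table-based components (which ignore the trigram history) are all hypothetical stand-ins, not part of the original system.

```python
import math
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical interface: a component LM maps (word, two-word history)
# to a conditional probability P_j(w_i | w_{i-2}, w_{i-1}).
ComponentLM = Callable[[str, Tuple[str, str]], float]


def mixture_log_prob(
    words: Sequence[str],
    components: Sequence[ComponentLM],
    weights: Sequence[float],
    bos: str = "<s>",  # assumed padding symbol for the first two histories
) -> float:
    """Log-probability of a word sequence under a weighted sum of component LMs."""
    if len(components) != len(weights):
        raise ValueError("need exactly one mixing factor per component LM")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixing factors must sum to 1")

    history = (bos, bos)
    total = 0.0
    for word in words:
        # Weighted sum over components for this word, then accumulate in log space.
        p = sum(c * lm(word, history) for c, lm in zip(weights, components))
        total += math.log(p)
        history = (history[1], word)
    return total


def make_toy_lm(table: Dict[str, float], floor: float = 1e-6) -> ComponentLM:
    """Toy stand-in for a trigram LM: looks up the word and ignores the history."""
    def lm(word: str, history: Tuple[str, str]) -> float:
        return table.get(word, floor)
    return lm


sports = make_toy_lm({"goal": 0.5, "match": 0.4, "market": 0.1})
finance = make_toy_lm({"goal": 0.1, "match": 0.1, "market": 0.8})

doc = ["goal", "market"]
logp = mixture_log_prob(doc, [sports, finance], [0.5, 0.5])
```

Working in log space avoids underflow for long documents; a real system would replace the toy tables with trained trigram models.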

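The mixing factors $c_j$ are tuned on the fly to the previously processed part of the document using an expectation-maximization (EM) type algorithm. Suppose $n$ words, $w_1, \ldots, w_n$, have been processed from the beginning of the document. Then, considering the likelihood function for the mixture LM, it is straightforward to derive an incremental update formula for $c_j(n)$:

$$c_j(n) = \left(1 - \frac{1}{n}\right) c_j(n-1) + \frac{1}{n}\,\hat{c}_j(n) \tag{3.7}$$

where $\hat{c}_j(n)$ is estimated by

$$\hat{c}_j(n) = \frac{c_j(n)\,P_j(w_n \mid w_{n-2}, w_{n-1})}{\sum_k c_k(n)\,P_k(w_n \mid w_{n-2}, w_{n-1})} \tag{3.8}$$

iterated with an appropriate terminating condition. Note that a posterior mode may be used instead by incorporating a prior into Equation (3.7).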
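The incremental adjustment of the mixing factors can be sketched in Python as follows. This is a minimal sketch under an assumed form of the update, not a definitive implementation: the new weights are taken to be a running average of the previous weights and the component posteriors for the newest word, and because the posteriors themselves depend on the new weights, the two are iterated to a fixed point. The function name `update_weights`, the tolerance `tol`, and the cap `max_iter` (standing in for the terminating condition) are hypothetical.

```python
from typing import List, Sequence


def update_weights(
    prev: Sequence[float],   # mixing factors after n-1 words
    probs: Sequence[float],  # component probabilities of the n-th word given its history
    n: int,                  # number of words processed so far, including the new one
    tol: float = 1e-8,       # assumed terminating condition: largest change below tol
    max_iter: int = 50,      # assumed safety cap on the number of iterations
) -> List[float]:
    """One incremental EM-type update of the mixing factors (assumed form)."""
    if n < 1:
        raise ValueError("n must be at least 1")

    current = list(prev)  # start the fixed-point iteration from the old weights
    for _ in range(max_iter):
        joint = [c * p for c, p in zip(current, probs)]
        norm = sum(joint)
        if norm <= 0.0:
            # No component assigns the word any probability: keep the old weights.
            return list(prev)
        posterior = [x / norm for x in joint]

        # Running average of the old weights and the posterior for the new word.
        updated = [
            (1.0 - 1.0 / n) * c_old + (1.0 / n) * post
            for c_old, post in zip(prev, posterior)
        ]
        converged = max(abs(a - b) for a, b in zip(updated, current)) < tol
        current = updated
        if converged:
            break
    return current


# Example: the fourth word is four times more likely under the first component.
new_weights = update_weights(prev=[0.5, 0.5], probs=[0.4, 0.1], n=4)
```

The weights stay normalized because both the previous weights and the posteriors sum to one. In this sketch the case `n = 1` gives the previous weights zero influence, so the iteration drifts toward a single component; a caller may prefer to skip the iteration or apply a prior for the first few words.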

Christophe Ris
1998-11-10