Language modeling

From the total PÚBLICO texts we selected 80% as the training set, 10% as the development set, and 10% as the evaluation set. From the training set, bigram back-off closed-vocabulary language models were computed. On the development test set, the 5K-word language model yielded a perplexity of 231, which makes this a large-perplexity task.
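The report does not specify the exact back-off scheme used. As an illustration only, the sketch below trains a minimal bigram back-off model (absolute discounting, backing off to unigram probabilities) and evaluates perplexity on held-out tokens; the discount value and the helper names are assumptions, not the project's actual implementation.

```python
import math
from collections import Counter

def train_bigram_backoff(tokens, discount=0.5):
    """Train a toy bigram LM with absolute discounting and unigram back-off.

    The discount of 0.5 is an illustrative assumption, not a tuned value.
    Returns a function bigram_p(prev, w) giving P(w | prev).
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    total = len(tokens)
    vocab = set(tokens)

    def unigram_p(w):
        return unigrams[w] / total

    def bigram_p(prev, w):
        c_prev = unigrams[prev]
        # Number of distinct words ever seen after `prev`.
        n_types = sum(1 for (a, _) in bigrams if a == prev)
        if c_prev == 0 or n_types == 0:
            return unigram_p(w)          # no bigram evidence: pure unigram
        c_big = bigrams[(prev, w)]
        if c_big > 0:
            return (c_big - discount) / c_prev
        # Mass freed by discounting, redistributed over unseen continuations
        # in proportion to their unigram probabilities.
        alpha = discount * n_types / c_prev
        unseen_mass = sum(unigram_p(v) for v in vocab
                          if bigrams[(prev, v)] == 0)
        return alpha * unigram_p(w) / unseen_mass

    return bigram_p

def perplexity(bigram_p, tokens):
    """Perplexity = 2^(-average log2 probability) over the bigrams of `tokens`."""
    log_sum = 0.0
    n = 0
    for prev, w in zip(tokens, tokens[1:]):
        log_sum += math.log2(bigram_p(prev, w))
        n += 1
    return 2 ** (-log_sum / n)
```

On a real corpus the same evaluation would be run on the 10% development set, mirroring the 80/10/10 split described above; toolkits such as SRILM implement the full back-off estimation and perplexity computation.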

Christophe Ris