From the total PÚBLICO texts, we selected 80% as the training set, 10% as the
development set and 10% as the evaluation set.
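The selection criterion is not stated here; a minimal sketch of such a partition, assuming a random document-level split with a fixed seed (the 80/10/10 proportions come from the text, everything else is illustrative), could look as follows in Python:

    import random

    def split_corpus(docs, seed=0):
        """Shuffle documents, then split 80/10/10 into train/dev/eval."""
        docs = list(docs)
        random.Random(seed).shuffle(docs)  # assumption: random selection
        n_train = int(0.80 * len(docs))
        n_dev = int(0.10 * len(docs))
        return (docs[:n_train],                 # training set (80%)
                docs[n_train:n_train + n_dev],  # development set (10%)
                docs[n_train + n_dev:])         # evaluation set (10%)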
Closed-vocabulary bigram backoff language models were computed from the
training set. On the 5K development test set, the language model yielded a
perplexity of 231, which makes this a high-perplexity task.
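A bigram backoff model and its perplexity can be sketched as follows. The discounting scheme used for the PÚBLICO models is not specified above, so absolute discounting (d = 0.5) stands in for it here; the corpus file names in the usage lines are hypothetical. The closed-vocabulary assumption (no out-of-vocabulary words in the development set) follows the text.

    import math
    from collections import Counter, defaultdict

    def train_backoff_bigram(tokens, discount=0.5):
        """Absolute-discounting bigram model that backs off to unigrams.

        A closed vocabulary is assumed: every development-set word also
        occurs in training, so unigram probabilities are never zero.
        """
        unigram = Counter(tokens)
        bigram = Counter(zip(tokens, tokens[1:]))
        total = sum(unigram.values())
        p_uni = {w: c / total for w, c in unigram.items()}

        # Record which continuations were seen after each history word.
        seen_after = defaultdict(set)
        for (h, w) in bigram:
            seen_after[h].add(w)

        # Backoff weight alpha(h): the probability mass reserved by
        # discounting, renormalised over the unigram probability of the
        # words *not* seen after h.
        alpha = {}
        for h, followers in seen_after.items():
            reserved = discount * len(followers) / unigram[h]
            unseen_mass = 1.0 - sum(p_uni[w] for w in followers)
            alpha[h] = reserved / unseen_mass if unseen_mass > 0 else 0.0

        def prob(h, w):
            c_hw = bigram.get((h, w), 0)
            if c_hw > 0:
                return (c_hw - discount) / unigram[h]
            return alpha.get(h, 1.0) * p_uni[w]

        return prob

    def perplexity(prob, tokens):
        """Per-word perplexity over the bigram events in `tokens`."""
        log2_sum = sum(math.log2(prob(h, w))
                       for h, w in zip(tokens, tokens[1:]))
        return 2.0 ** (-log2_sum / (len(tokens) - 1))

    # Hypothetical usage; file names are illustrative only.
    train = open("publico.train.txt").read().split()
    dev = open("publico.dev.txt").read().split()
    lm = train_backoff_bigram(train)
    print("dev-set perplexity: %.1f" % perplexity(lm, dev))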