
Word Recognition Experiments

The following experiments were performed with PLP features. We used the same test set as for the phone recognition experiment and defined lexicons of different sizes. The smallest lexicon (1K) contains the 1093 words found in the test sentences. The 3K and 13K lexicons extend the 1K vocabulary with the most common words in the original text corpus. The 64K lexicon is open vocabulary.

In addition, we used bigram and trigram language models estimated with the CMU-Cambridge SLM toolkit on texts extracted from the French newspaper Le Monde (1990-1992, 80M words). The bigram model has a perplexity of 151.6 and the trigram model a perplexity of 94.4. The results obtained with our baseline CI system are reported in Tables 1.7 and 1.8.
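As a reminder of what the perplexity figures measure, here is a minimal sketch assuming the usual definition: the inverse geometric mean of the per-word probabilities assigned by the language model. The function name `perplexity` and the toy probabilities are illustrative only; how the toolkit treated out-of-vocabulary words and sentence boundaries when producing the figures above is not described here.

```python
import math

def perplexity(log10_probs):
    """Perplexity of a word sequence from per-word log10 probabilities.

    PP = 10 ** ( -(1/N) * sum_i log10 P(w_i | history) )
    """
    n = len(log10_probs)
    if n == 0:
        raise ValueError("need at least one word probability")
    avg_log_prob = sum(log10_probs) / n
    return 10 ** (-avg_log_prob)

# Toy data (hypothetical): every word is predicted with probability 1/100,
# so the model is as uncertain as a uniform choice among 100 words.
toy_log_probs = [math.log10(0.01)] * 5
print(round(perplexity(toy_log_probs), 3))
```

Under this definition, a lower perplexity means the model concentrates more probability on the words that actually occur, which is consistent with the trigram scoring lower than the bigram.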


 
Table 1.7: Word error rates using a classical hybrid system and PLP features on different vocabulary sizes - Bigram language model

Lexicon size   Substitution   Deletion   Insertion   WER
1K             12.9%          5.0%       2.9%        20.8%
3K             14.0%          5.2%       3.4%        22.5%
13K            17.4%          6.2%       3.1%        26.7%
64K            18.9%          6.3%       2.7%        28.0%
 


 
Table 1.8: Word error rates using a classical hybrid system and PLP features on different vocabulary sizes - Trigram language model

Lexicon size   Substitution   Deletion   Insertion   WER
1K             9.1%           4.2%       2.3%        15.7%
3K             12.2%          4.6%       3.0%        19.8%
13K            15.3%          5.6%       3.1%        24.1%
64K            16.5%          6.0%       2.5%        25.0%
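
The columns of the two tables follow the usual decomposition of the word error rate into substitutions, deletions and insertions. The sketch below shows one common way to obtain these counts, assuming a minimum-edit-distance alignment between reference and hypothesis word sequences and a normalisation by the number of reference words. The scoring tool actually used for the tables is not stated here, and the function names and example sentences are hypothetical.

```python
def align_counts(ref, hyp):
    """Return (substitutions, deletions, insertions) for one
    minimum-edit-distance alignment of two word lists."""
    n, m = len(ref), len(hyp)
    # cost[i][j] = (total_errors, subs, dels, ins) for ref[:i] vs hyp[:j]
    cost = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = (0, 0, 0, 0)
    for i in range(1, n + 1):
        cost[i][0] = (i, 0, i, 0)          # every reference word deleted
    for j in range(1, m + 1):
        cost[0][j] = (j, 0, 0, j)          # every hypothesis word inserted
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            t, s, d, ins = cost[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                best = (t, s, d, ins)              # match, no error
            else:
                best = (t + 1, s + 1, d, ins)      # substitution
            t, s, d, ins = cost[i - 1][j]
            best = min(best, (t + 1, s, d + 1, ins))   # deletion
            t, s, d, ins = cost[i][j - 1]
            best = min(best, (t + 1, s, d, ins + 1))   # insertion
            cost[i][j] = best
    return cost[n][m][1:]

def word_error_rate(ref, hyp):
    """WER in percent: (S + D + I) / number of reference words."""
    subs, dels, ins = align_counts(ref, hyp)
    return 100.0 * (subs + dels + ins) / len(ref)

# Hypothetical example: one substitution (chat -> chien), one deletion (la).
reference = "le chat mange la souris".split()
hypothesis = "le chien mange souris".split()
print(align_counts(reference, hypothesis), word_error_rate(reference, hypothesis))
```

Read this way, the WER column of each table is the sum of the three error columns, up to rounding of the individual rates.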
 

Note that the baseline system developed so far suffers from some weaknesses, such as:


Christophe Ris
1998-11-10