
Task 4.1: Technical Description

At FPMs, we have trained our system on PHONEBOOK, an isolated-word database. The database contains 106 word lists, each composed of 75 or 76 words pronounced by a few speakers (typically around 11). The speakers and the words are different for each word list. We have tested our system on 8 word lists that did not belong to the training set. Since the lexicon is different for each of those 8 word lists, we can either recognize the 8 word lists as a whole (yielding a lexicon of 600 words) or recognize each word list independently with a lexicon of about 75 words. In the second case, the reported recognition rate is the (unweighted) average of the 8 per-list recognition rates. So far, only a "small" training set (5 hours of speech) has been used, although the final systems will also be trained on the "full" training set.

Two phonetic dictionaries have been used. The first is the one released with PHONEBOOK; it contains the phonetic transcriptions of the PHONEBOOK words according to a 42-phoneme inventory. The second is the 110,000-word CMU 0.4 dictionary, which uses 39 phonemes (a subset of the TIMIT phonemes). The PHONEBOOK words that were not present in CMU 0.4 have been transcribed manually.
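As an illustration of what "unweighted" means when each word list is recognized independently, the following minimal Python sketch compares the unweighted average of per-list recognition rates with a rate pooled over all utterances. The function names and the counts are hypothetical and chosen only for the example; they are not results from our system.

```python
from typing import Sequence, Tuple

# Each entry is (correctly recognized utterances, total utterances) for one
# word list. This data layout is an assumption made for the example.
ListResult = Tuple[int, int]


def unweighted_average_rate(results: Sequence[ListResult]) -> float:
    """Mean of the per-list recognition rates; every list counts equally."""
    rates = [correct / total for correct, total in results]
    return sum(rates) / len(rates)


def pooled_rate(results: Sequence[ListResult]) -> float:
    """Rate over all utterances; larger lists weigh more (shown for contrast)."""
    total_correct = sum(correct for correct, _ in results)
    total_utterances = sum(total for _, total in results)
    return total_correct / total_utterances


# Invented counts for two word lists of different sizes.
example_results = [(90, 100), (40, 50)]

unweighted = unweighted_average_rate(example_results)
pooled = pooled_rate(example_results)
print(f"unweighted average: {unweighted:.4f}")
print(f"pooled rate:        {pooled:.4f}")
```

The two figures differ whenever the word lists do not contain the same number of test utterances. Note that the pooled rate above still assumes a separate lexicon per list; it is not the rate that would be obtained by recognizing the 8 word lists as a whole with the 600-word lexicon.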



 
Christophe Ris
1998-11-10