The goals of large-vocabulary, speaker-independent recognition, domain adaptation and task independence require appropriate pronunciation lexica, covering a very large number of words and possibly multiple pronunciations of each word. Recent work has shown that pronunciation lexica are among the most important building blocks in the development of large-vocabulary, speaker-independent recognition systems.
The main focus of this workpackage is on the portability of these systems to new languages and on the development of the new lexica that those languages require. In this respect, the two new languages involved in SPRACH are in different situations. For French, a significant amount of speech recognition work has already been done, and consequently relatively large lexica already exist, although they may have to be augmented. For Portuguese, the only existing work is on small hand-built vocabularies for very limited tasks.
To cover the necessary developments, we divided this workpackage into two tasks. Task 2.1, Baseline Dictionaries for New Languages, lasted only the first year of the project; in it we developed baseline lexica for French and Portuguese. Task 2.2, Automatic Learning of New Dictionaries, runs for the following two years of the project; its aim is to develop techniques for automatically creating new pronunciations for existing words in the lexicon and for adding new words, with their pronunciations, to the baseline lexica created in task 2.1. A detailed description of task 2.2 follows.