In this report, we describe the BREF-80 database on which the system
is based, the labeling and training procedure, and the preliminary results
we obtained in phoneme and word recognition experiments. All of these
experiments were carried out with context-independent hybrid
HMM/MLP technology, using the STRUT software for
training and phonetic alignment and the NOWAY decoder for large
vocabulary recognition. BREF-80 is a large read speech
corpus from 80 speakers. The text material was selected from the
French newspaper Le Monde so as to provide large vocabularies
(over 20,000 words) and a wide range of phonetic contexts. Because BREF
contains 1115 distinct diphones and over 17,500 triphones, it is well
suited to training phonetic models. The base lexicon,
represented by 35 phonemes, was obtained using a text-to-phoneme tool
and was manually verified. The lexicon was extended in order to deal
with potential liaisons between words.
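As a rough illustration of what extending a lexicon for liaisons can look like, the sketch below adds a second pronunciation variant to words whose spelling ends in a consonant that may surface before a vowel-initial word. Everything in it is an assumption made for illustration: the phoneme symbols, the LATENT_CONSONANT table, and the function extend_with_liaisons are hypothetical and are not the rules or tools used to build the lexicon described here. The rule also over-generates on purpose, since it only marks potential liaisons.

```python
from typing import Dict, List

Pronunciation = List[str]

# Hypothetical mapping from a word-final letter to the consonant that may be
# pronounced in a liaison context. Illustrative only, not the actual rule set.
LATENT_CONSONANT: Dict[str, str] = {
    "s": "z", "x": "z", "z": "z",
    "t": "t", "d": "t",
    "n": "n",
}


def extend_with_liaisons(
    base: Dict[str, Pronunciation],
) -> Dict[str, List[Pronunciation]]:
    """Return a lexicon where each word keeps its base pronunciation and,
    when a latent final consonant exists, gains one liaison variant."""
    extended: Dict[str, List[Pronunciation]] = {}
    for word, phones in base.items():
        variants = [list(phones)]
        consonant = LATENT_CONSONANT.get(word[-1]) if word else None
        # Skip the variant if the base form already ends in that consonant.
        if consonant is not None and (not phones or phones[-1] != consonant):
            variants.append(list(phones) + [consonant])
        extended[word] = variants
    return extended


if __name__ == "__main__":
    # Toy base lexicon with made-up phoneme symbols.
    base_lexicon = {
        "les": ["l", "e"],
        "petit": ["p", "@", "t", "i"],
        "la": ["l", "a"],
    }
    for word, variants in extend_with_liaisons(base_lexicon).items():
        print(word, [" ".join(v) for v in variants])
```

With variants stored side by side, a decoder or alignment tool can choose between the plain and the liaison form of a word depending on what follows it.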
The training set used throughout our experiments consists of 3737 sentences (3363 for training and 374 for cross-validation) from 56 speakers, or approximately 9 hours of speech. The test set consists of 144 sentences from 8 speakers (4 male, 4 female).
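The sentence counts above can be checked with a few lines of arithmetic. The sketch below only restates the figures given in the text; the constant and function names are hypothetical.

```python
# Sentence counts as stated in the text.
TRAIN_SENTENCES = 3363
CROSS_VALIDATION_SENTENCES = 374
TRAINING_SET_TOTAL = 3737
TEST_SENTENCES = 144


def cross_validation_fraction(train: int, cross_validation: int) -> float:
    """Share of the training set that is held out for cross-validation."""
    return cross_validation / (train + cross_validation)


if __name__ == "__main__":
    # The two subsets should add up to the stated training-set size.
    assert TRAIN_SENTENCES + CROSS_VALIDATION_SENTENCES == TRAINING_SET_TOTAL
    fraction = cross_validation_fraction(
        TRAIN_SENTENCES, CROSS_VALIDATION_SENTENCES
    )
    print(f"cross-validation share: {fraction:.1%}")
    print(f"test sentences: {TEST_SENTENCES}")
```

The held-out portion comes to roughly one tenth of the training set.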