... Toolkit 1.1
Available at http://svr-www.eng.cam.ac.uk/~prc14/CMU-Camb_Toolkit_v2-BETA.4.tar.gz
... Y0 1.2
Pronounced ``Why nought''. It was one of the decoders used in the WERNICKE project and is still in use to make phonetic alignments.
... system 1.3
This system was developed at INESC in collaboration with CLUL, in the group of Prof. Isabel Trancoso.
... words 3.1
There are variations of this procedure that lead to slightly different modeling of the tagged corpus; e.g., one may first mark the document with a set of tags ${\cal T}$ (regardless of whether each word is in the vocabulary), then choose the vocabulary from the most frequent identifiers.
... [6] 3.2
For calibration, using this acoustic model with a $60\:000$-word vocabulary and the `standard' 1995 back-off trigram LM, computed from 230 million words of text data, resulted in a word error rate of 15.3% on this test data.
... pronunciations) 3.3
Due to text processing problems (mis-spellings, abbreviations, etc.), around $16\:000$ words in the tagged unigram word set were not used in these speech recognition experiments (i.e., pronunciations were not provided for this set of words).
... significant 3.4
Statistical significance tests were performed using the matched pairs test described in [24].
... morphemes 3.5
In morphological analysis, the ``building blocks'' of words are called morphemes. A word can itself be a morpheme, as in the word ``can'', or it can be composed of several morphemes, as in ``destabilize''.
... / 3.6
Here ``/'' is used only as a decomposition mark; it does not appear in Portuguese spelling.
... morphologically 3.7
For that purpose we used a morphological analysis tool, PALAVROSO [10], which classifies any Portuguese word along morphological lines.
... show 4.1
ABC Nightline: Episode 05/23/96
... 4.2
Note that the minimum WER of 67% is calculated over the full 5186 words that occur in the half-hour broadcast, not only the 3500 that occur in the evaluation data.
... maximizing 5.1
The a posteriori-based formulation (finding the model $M$ maximizing $P(M|X)$) is not discussed here. For further details, see [16].
... segmentation 5.2
This was achieved by using time-dependent syllable transition penalties, where the penalties are very high in the time slots where a syllable transition is not allowed.
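As a rough illustration of how such penalties can constrain a decoder, the sketch below runs a simple left-to-right Viterbi alignment in which a per-frame penalty is subtracted whenever the path moves to the next syllable. It is not the decoder used in this work: the function name, the input layout (`frame_scores`, `penalties`) and the left-to-right topology are all assumptions made for the example.

```python
import math

def align_with_transition_penalties(frame_scores, penalties):
    """Left-to-right Viterbi alignment of T frames to S syllables.

    frame_scores[t][s]: log-score of frame t under syllable s (hypothetical input).
    penalties[t]: cost subtracted when the path enters a new syllable at frame t;
                  a very large value effectively forbids a transition there.
    Returns the syllable index chosen for each frame (requires T >= S).
    """
    T, S = len(frame_scores), len(frame_scores[0])
    best = [[-math.inf] * S for _ in range(T)]
    back = [[0] * S for _ in range(T)]
    best[0][0] = frame_scores[0][0]  # the path must start in the first syllable
    for t in range(1, T):
        for s in range(S):
            stay = best[t - 1][s]
            move = best[t - 1][s - 1] - penalties[t] if s > 0 else -math.inf
            if move > stay:
                best[t][s], back[t][s] = move + frame_scores[t][s], s - 1
            else:
                best[t][s], back[t][s] = stay + frame_scores[t][s], s
    # Backtrace from the last syllable at the last frame.
    path = [S - 1]
    for t in range(T - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path

# Acoustic scores alone favour a transition at frame 2 ...
scores = [[0, -5], [0, -5], [-5, 0], [-5, 0], [-5, 0], [-5, 0]]
free = align_with_transition_penalties(scores, [0.0] * 6)
# ... but the penalties below only permit a transition at frame 4.
allowed_at_4 = [1e9, 1e9, 1e9, 1e9, 0.0, 1e9]
forced = align_with_transition_penalties(scores, allowed_at_4)
```

With zero penalties the boundary follows the acoustic scores; with the large penalties the boundary moves to the only frame where a transition is cheap, which is the effect described above.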
... tsylb2 5.3
The actual syllabification of the lexicon was done by Eric Fosler, of the International Computer Science Institute.
... 0.05) 5.4
Significance tests were performed using the two-tailed matched pairs method described in [24].
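For readers unfamiliar with this kind of test, the sketch below shows one common normal-approximation form of a two-tailed matched pairs test on per-segment error counts. The exact procedure of [24] is not reproduced here, so the segmentation into independent segments, the function name and the input format are assumptions made for illustration only.

```python
import math

def matched_pairs_test(errors_a, errors_b):
    """Two-tailed matched pairs test (normal approximation).

    errors_a[i], errors_b[i]: error counts of systems A and B on the same
    segment i (hypothetical inputs). Returns (z, p_value).
    """
    if len(errors_a) != len(errors_b):
        raise ValueError("both systems must be scored on the same segments")
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two segments")
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # unbiased variance
    if var == 0:
        # All differences identical: no evidence if they are zero, otherwise
        # the statistic is unbounded.
        return (0.0, 1.0) if mean == 0 else (math.copysign(math.inf, mean), 0.0)
    z = mean / math.sqrt(var / n)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value

z, p = matched_pairs_test([3, 2, 4, 5, 3, 4, 2, 5], [1, 1, 2, 2, 1, 2, 1, 2])
significant = p < 0.05  # compare against the 0.05 level
```

The normal approximation is only reasonable when the number of segments is fairly large; the eight segments above are made-up numbers that merely exercise the code.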
... 0.05 5.5
Significance tests were performed using the two-tailed matched pairs method described in [24].
Christophe Ris
1998-11-10