
Syllable Based Decoding

The NOWAY [22,23] stack decoder was used to incorporate syllable boundary information into the decoding process. A context-independent phone may occur either at a syllable onset or elsewhere in a syllable. This can be seen in the example pronunciation below, in which the schwa (ax) occurs both as the first phone of the first syllable and as the second phone of the last syllable. Phones that occur at syllable onsets are tagged with _on.
ABATEMENTS = { ax_on bcl b_on ey tcl m_on ax n tcl s }
Therefore two phone models are required for each context-independent phone in the system: one for when the phone occurs at a syllable onset, and one for when it does not. The same acoustic model generates the observation probabilities for both the syllable onset phones and the standard phones (i.e. those not at syllable onsets). This assumes that the realization of a phone is not affected by whether or not it is the onset of a syllable. The observation probabilities of the onset phone models are set to zero when no onset is detected, and to those of the corresponding standard model when a syllable onset is detected. The decoder can therefore only choose syllable onset phones when a syllable onset is detected, which allows syllable boundary information to be incorporated into a standard decoder.
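The masking scheme described above can be illustrated with a minimal sketch. It assumes frame-level observation probabilities stored as plain lists and a per-frame boolean onset decision. The function name, the data layout and the toy values are hypothetical, and the way NOWAY actually receives these probabilities is not specified here.

```python
def expand_with_onset_models(phones, obs_probs, onset_detected):
    """Duplicate each phone into a standard and an onset (_on) variant.

    phones:         list of context-independent phone labels
    obs_probs:      one list of observation probabilities per frame,
                    produced by the single shared acoustic model
    onset_detected: one boolean per frame (hypothetical onset detector output)
    """
    labels = list(phones) + [p + "_on" for p in phones]
    expanded = []
    for frame_probs, is_onset in zip(obs_probs, onset_detected):
        if is_onset:
            # Onset detected: onset models copy the standard probabilities.
            onset_probs = list(frame_probs)
        else:
            # No onset detected: onset models get zero probability,
            # so the decoder cannot select them in this frame.
            onset_probs = [0.0] * len(frame_probs)
        expanded.append(list(frame_probs) + onset_probs)
    return labels, expanded


# Toy example: two phones, two frames, onset detected only in the first frame.
labels, expanded = expand_with_onset_models(
    ["ax", "b"],
    [[0.7, 0.3], [0.2, 0.8]],
    [True, False],
)
```

In this sketch the standard models are never masked, so the only effect of the onset information is to restrict where the _on phones can be hypothesised.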


 
Table 5.3: Word error rates (%) by acoustic condition for a context-independent (CI) system and a CI system incorporating syllable boundary information.

Acoustic condition    Standard CI system    CI + syllable onset system
BASELINE SPEECH       22.5                  21.1
SPONTANEOUS SPEECH    38.4                  32.1
TELEPHONE SPEECH      43.6                  40.0
SPEECH IN MUSIC       39.2                  37.1
SPEECH IN NOISE       32.1                  31.5
NON-NATIVE SPEECH     33.3                  32.3
ALL OTHER SPEECH      63.4                  59.3
OVERALL               31.5                  28.8
 

The results for context-independent systems with and without syllable boundary information are shown in Table 5.3. Incorporating syllable onset information reduced the word error rate in every focus condition and gave an overall relative reduction in word error rate of 8.6%, from 31.5% to 28.8% (significant at p < 0.05) [footnote 5.4].
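The 8.6% figure is a relative reduction, which can be checked directly from the values in Table 5.3. The short sketch below recomputes it and confirms that every condition improves. The variable and function names are illustrative only, and the significance test itself is not reproduced.

```python
# Word error rates (%) from Table 5.3: (standard CI, CI + syllable onset).
wer = {
    "BASELINE SPEECH": (22.5, 21.1),
    "SPONTANEOUS SPEECH": (38.4, 32.1),
    "TELEPHONE SPEECH": (43.6, 40.0),
    "SPEECH IN MUSIC": (39.2, 37.1),
    "SPEECH IN NOISE": (32.1, 31.5),
    "NON-NATIVE SPEECH": (33.3, 32.3),
    "ALL OTHER SPEECH": (63.4, 59.3),
    "OVERALL": (31.5, 28.8),
}


def relative_reduction(baseline, new):
    """Relative word error rate reduction, in percent of the baseline."""
    return 100.0 * (baseline - new) / baseline


# Overall relative reduction: (31.5 - 28.8) / 31.5, expressed in percent.
overall = relative_reduction(*wer["OVERALL"])

# True if the onset system has a lower error rate in every row of the table.
all_improved = all(new < base for base, new in wer.values())

for condition, (base, new) in wer.items():
    print(f"{condition:20s} {relative_reduction(base, new):5.1f}% relative")
```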


Christophe Ris
1998-11-10