
Task 5.5: Technical Description

The method used to implement context-dependent (CD) phone models is based on the factorization of conditional context-class probabilities [30,31]. The joint a posteriori probability of context class j and phone class i is given by

 
\begin{displaymath}y_{ij}(t) = y_i(t)\, y_{j\vert i}(t),
\end{displaymath} (5.7)

where $y_i(t)$ is the phone-class posterior estimated by the recurrent network. Single-layer networks or ``modules'' are used to estimate the conditional context-class posterior,

\begin{displaymath}y_{j\vert i}(t) \simeq \Pr( c_{j}(t) \vert q_i(t) ),
\end{displaymath} (5.8)

where $c_j(t)$ is the context class for phone class $q_i(t)$. The input to each module is the internal state of the recurrent network (similar to the hidden layer of an MLP), since it is assumed that the state vector contains all the contextual information needed to discriminate between different context classes of the same monophone. The context classes for each context module are determined using a decision-tree-based approach. This ensures sufficient statistics for training each class and keeps the system compact, which allows fast context training.
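The factorization in (5.7) and (5.8) can be illustrated with a minimal sketch. It assumes that each single-layer module is a linear map followed by a softmax over the context classes of one phone; the text only says "single-layer", so the softmax output is an assumption. The function names, weights, state vector and phone posteriors are all hypothetical toy values, not taken from the actual system.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def context_module(state, weights, biases):
    """Single-layer module (assumed softmax output): estimates y_{j|i}(t)
    for every context class j of one phone class i from the state vector."""
    scores = [sum(w * x for w, x in zip(row, state)) + b
              for row, b in zip(weights, biases)]
    return softmax(scores)

def joint_posteriors(phone_post, state, modules):
    """Return y_ij(t) = y_i(t) * y_{j|i}(t) as a nested list indexed [i][j]."""
    joint = []
    for y_i, (weights, biases) in zip(phone_post, modules):
        cond = context_module(state, weights, biases)
        joint.append([y_i * y_ji for y_ji in cond])
    return joint

# Hypothetical toy setup: 2 phone classes, 3-dimensional state vector.
state = [0.2, -0.1, 0.7]      # stand-in for the recurrent network's internal state
phone_post = [0.6, 0.4]       # stand-in for y_i(t) from the recurrent network
modules = [
    # phone 0: two context classes
    ([[0.5, 0.1, -0.3], [-0.2, 0.4, 0.9]], [0.0, 0.1]),
    # phone 1: three context classes
    ([[0.3, 0.3, 0.3], [0.0, -0.5, 0.2], [1.0, 0.0, -1.0]], [0.0, 0.0, 0.0]),
]

joint = joint_posteriors(phone_post, state, modules)
total = sum(sum(row) for row in joint)
```

Because each module's outputs sum to one, the joint posteriors for phone class i sum back to $y_i(t)$, and the full set sums to one whenever the phone posteriors do. Each phone class may also have a different number of context classes, as in the toy setup above.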


 
Table 5.6: Word error rates (%) by focus condition for different numbers of context-dependent (CD) phone models. CI denotes the context-independent system.

Acoustic condition    CI system   Number of CD phone models
                                  589     697     792     1002
BASELINE SPEECH       22.5        20.1    19.9    20.5    21.2
SPONTANEOUS SPEECH    38.4        34.6    33.7    35.5    34.5
TELEPHONE SPEECH      43.6        45.5    40.0    39.1    43.6
SPEECH IN MUSIC       39.2        32.2    31.4    28.8    31.2
SPEECH IN NOISE       32.1        30.9    31.2    29.7    29.4
NON-NATIVE SPEECH     33.3        35.4    34.4    34.9    37.5
ALL OTHER SPEECH      63.4        63.8    60.6    61.0    63.4
OVERALL               31.5        28.9    28.2    28.5    29.2
 

Word error rates are shown in Table 5.6 for systems with different numbers of context-dependent phone models. The test data is an episode of NPR Marketplace recorded on 12 July 1996, and consists of 30 minutes of data containing 4413 words. The number of context-dependent models has only a small effect on recognition performance: the overall word error rate ranges from 28.2% to 29.2% across the four context-dependent systems, and the differences between them are not significant at p < 0.05. However, introducing context-dependent models provides a significant (at p < 0.05) improvement over the context-independent system, which has an overall word error rate of 31.5%.
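To put the overall figures in perspective, the relative reduction in word error rate over the context-independent system can be computed directly from the OVERALL row of Table 5.6. The short sketch below does only that arithmetic; the helper name `relative_reduction` is illustrative, and the significance test used in the report is not reproduced here.

```python
def relative_reduction(baseline_wer, system_wer):
    """Relative WER reduction, in percent, of a system over a baseline."""
    return 100.0 * (baseline_wer - system_wer) / baseline_wer

# OVERALL row of Table 5.6 (word error rates in %).
ci_overall = 31.5
cd_overall = {589: 28.9, 697: 28.2, 792: 28.5, 1002: 29.2}

# Relative reduction for each number of CD phone models.
reductions = {n: relative_reduction(ci_overall, wer)
              for n, wer in cd_overall.items()}

# Number of CD models giving the lowest overall WER.
best = min(cd_overall, key=cd_overall.get)

for n in sorted(reductions):
    print(f"{n} CD models: {reductions[n]:.1f}% relative reduction")
```

The relative reductions fall between roughly 7% and 10%, with the 697-model system giving the lowest overall word error rate.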


Christophe Ris
1998-11-10