Next: WP5: Conclusion
Up: Task 5.5: Context Dependent
Previous: Task 5.5: Status
The method used to implement context-dependent (CD) phone models is based on
the factorization of conditional context-class
probabilities [30,31]. The joint a posteriori
probability of context class j and phone class i is given by
y_{ij}(t) = y_{i}(t) y_{j|i}(t),    (5.7)

where y_{i}(t) is estimated by the recurrent network.
Single-layer networks or "modules" are used to estimate the conditional
context-class posterior,

y_{j|i}(t) = Pr(c_{j}(t) | q_{i}(t)),    (5.8)

where c_{j}(t) is the context class for
phone class q_{i}(t). The input to each module is the internal state
(similar to the hidden layer of an MLP) of the
recurrent network, since it is assumed that the state vector contains
all the relevant contextual information necessary to discriminate
between different context classes of the same monophone. The context
classes for each context module are determined using a
decision-tree-based approach. This provides sufficient statistics for
training each class and keeps the system compact (allowing fast context
training).
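
The factorization in (5.7) can be illustrated with a small sketch. It is not the system's implementation: the softmax output of each module, the toy dimensions, the random weights, and the names `context_module` and `joint_posteriors` are all assumptions made for illustration. The sketch only shows how one phone posterior per phone class and one single-layer module per phone class, both driven by the same state vector, combine into joint posteriors that sum to one.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def context_module(state, weights, biases):
    """Single-layer module: one affine map of the state vector, then softmax.

    Returns an estimate of y_{j|i}(t) for every context class j of one
    phone class i. The softmax output is an assumption of this sketch.
    """
    scores = [
        sum(w * x for w, x in zip(row, state)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(scores)


def joint_posteriors(phone_post, state, modules):
    """Combine y_i(t) with each module's y_{j|i}(t) as in (5.7)."""
    joint = []
    for y_i, (weights, biases) in zip(phone_post, modules):
        cond = context_module(state, weights, biases)
        joint.append([y_i * y_j_given_i for y_j_given_i in cond])
    return joint


# Toy setup: every size and value below is made up for illustration.
rng = random.Random(0)
state_dim = 4
contexts_per_phone = [2, 3, 1]  # hypothetical context-class counts per phone
state = [rng.uniform(-1, 1) for _ in range(state_dim)]
phone_post = softmax([rng.uniform(-1, 1) for _ in contexts_per_phone])
modules = [
    (
        [[rng.uniform(-1, 1) for _ in range(state_dim)] for _ in range(n)],
        [0.0] * n,
    )
    for n in contexts_per_phone
]

joint = joint_posteriors(phone_post, state, modules)
total = sum(sum(row) for row in joint)
print(f"sum of joint posteriors: {total:.6f}")
```

Because each module's outputs sum to one, the joint posteriors for phone class i sum back to y_{i}(t), so adding context modules leaves the monophone posteriors unchanged.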
Table 5.6: Word error rates (%) by focus condition for different numbers of
context-dependent phone models.

                        CI        Number of CD phone models
Acoustic condition      system    589     697     792     1002
----------------------------------------------------------------
BASELINE SPEECH         22.5      20.1    19.9    20.5    21.2
SPONTANEOUS SPEECH      38.4      34.6    33.7    35.5    34.5
TELEPHONE SPEECH        43.6      45.5    40.0    39.1    43.6
SPEECH IN MUSIC         39.2      32.2    31.4    28.8    31.2
SPEECH IN NOISE         32.1      30.9    31.2    29.7    29.4
NON-NATIVE SPEECH       33.3      35.4    34.4    34.9    37.5
ALL OTHER SPEECH        63.4      63.8    60.6    61.0    63.4
OVERALL                 31.5      28.9    28.2    28.5    29.2

Word error rates are shown in Table 5.6 for systems
with different numbers of context-dependent phone models. The test
data is an episode of NPR Marketplace recorded on 12 July 1996 and
consists of 30 minutes of data containing 4413 words. The number of
context-dependent models has only a small effect on recognition
performance: the overall word error rate ranges from 28.2% to 29.2%
across the four context-dependent systems. The differences among the
context-dependent systems are not significant at p < 0.05 (see
footnote 5.5). However, introducing context-dependent models provides
a significant (at p < 0.05) improvement over a context-independent
system.
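
To make the size of these differences concrete, the following sketch computes the absolute and relative reduction in overall word error rate for each context-dependent system against the context-independent system, using only the OVERALL row of Table 5.6. The helper name `relative_reduction` is introduced here for illustration. The arithmetic says nothing about statistical significance, which depends on the test referred to in the text.

```python
# Overall word error rates (%) copied from the OVERALL row of Table 5.6.
ci_wer = 31.5
cd_wer = {589: 28.9, 697: 28.2, 792: 28.5, 1002: 29.2}


def relative_reduction(baseline, system):
    """Relative WER reduction, in percent of the baseline WER."""
    return 100.0 * (baseline - system) / baseline


reductions = {n: relative_reduction(ci_wer, wer) for n, wer in cd_wer.items()}

# Spread of overall WER among the context-dependent systems themselves.
spread = max(cd_wer.values()) - min(cd_wer.values())

for n_models, wer in cd_wer.items():
    print(
        f"{n_models:5d} CD models: WER {wer:.1f}%, "
        f"{ci_wer - wer:.1f} points absolute, "
        f"{reductions[n_models]:.1f}% relative reduction"
    )
print(f"Spread among CD systems: {spread:.1f} points")
```

The spread among the context-dependent systems is smaller than the gap between any of them and the context-independent system, which matches the pattern of significance reported above.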
Christophe Ris
1998-11-10