Experiments were carried out to compare context-independent (CI) models with context-independent models combined with context-dependent (CD) models (CI&CD). In all systems, a minimum duration of half the mean duration of the phoneme considered was imposed. The ANNs were feed-forward multilayer perceptrons, and the number of parameters was kept at 166,000 for all of the compared systems.
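The text does not say how the minimum duration is enforced. A common way in HMM-based systems is to model each phoneme as a left-to-right chain of states that all share the same emission probability, so that any path must spend at least as many frames in the phoneme as there are states. The sketch below assumes that approach. It also assumes that half the mean duration is rounded half-up to whole frames and that only the last state has a self-loop with a hypothetical probability `p_stay`; none of these details come from the original experiments.

```python
import math
from typing import List, Tuple


def min_duration_frames(mean_frames: float) -> int:
    # Half the phoneme's mean duration, rounded half-up to whole frames
    # (the rounding rule is an assumption), and never less than one frame.
    return max(1, math.floor(mean_frames / 2 + 0.5))


def min_duration_topology(mean_frames: float, p_stay: float) -> Tuple[List[List[float]], float]:
    """Build a left-to-right chain whose states all share one phone probability.

    The first n-1 states must be left after a single frame, so every path
    spends at least n frames in the phoneme. Only the last state has a
    self-loop (hypothetical probability p_stay), which covers longer durations.
    Returns the n x n transition matrix and the last state's exit probability.
    """
    n = min_duration_frames(mean_frames)
    trans = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        trans[i][i + 1] = 1.0      # forced move: no self-loop before the last state
    trans[n - 1][n - 1] = p_stay   # self-loop models durations beyond the minimum
    return trans, 1.0 - p_stay


trans, p_exit = min_duration_topology(mean_frames=8.0, p_stay=0.7)
print(len(trans), p_exit)  # a phoneme with a mean of 8 frames gets a 4-state chain
```

Sharing one emission probability across the chain adds no ANN outputs, so this kind of constraint leaves the parameter count of the network unchanged.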
Three tying configurations were compared. In the CI&CD(I) case, generalised transitions were defined by the phonetic classes (9 classes + silence) of the left and right phonemes of the transition. Hence, the neural network estimating both context-independent and context-dependent probabilities had 46 (CI phones) + 10 × 10 (CD transitions) = 146 outputs. In the CI&CD(II) case, generalised transitions were defined by the place of articulation of the left and right phonemes of the transition (8 classes + silence), giving 46 + 9 × 9 = 127 outputs. Finally, in the CI&CD(III) case, we used an automatic, data-driven clustering approach to obtain a set of 81 transition classes, giving a neural network with 46 + 81 = 127 outputs.
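The output-layer sizes above follow directly from the class counts, and because the total number of parameters was fixed at 166,000, a larger output layer leaves room for fewer hidden units. The sketch below makes that trade-off concrete under stated assumptions: a single hidden layer with biases in both layers, a CI-only baseline with 46 outputs, and a hypothetical input dimension `INPUT_DIM = 234`. The actual input size and hidden-layer layout are not given in the text.

```python
N_CI_PHONES = 46  # context-independent phone outputs

# Number of generalised-transition (CD) outputs per configuration.
TRANSITION_CLASSES = {
    "CI": 0,              # CI-only baseline (assumed to have no transition outputs)
    "CI&CD(I)": 10 * 10,  # (9 phonetic classes + silence) on each side
    "CI&CD(II)": 9 * 9,   # (8 place-of-articulation classes + silence) on each side
    "CI&CD(III)": 81,     # classes from data-driven clustering
}

INPUT_DIM = 234   # hypothetical input size, not stated in the text
BUDGET = 166_000  # parameter budget shared by all compared systems


def n_outputs(n_transitions: int) -> int:
    # CI phone outputs plus one output per generalised-transition class.
    return N_CI_PHONES + n_transitions


def mlp_params(n_in: int, n_hidden: int, n_out: int) -> int:
    # One hidden layer; weights plus biases in both layers (assumption).
    return (n_in + 1) * n_hidden + (n_hidden + 1) * n_out


def hidden_units_for_budget(n_in: int, n_out: int, budget: int) -> int:
    # Largest hidden layer whose parameter count does not exceed the budget:
    # (n_in + 1) * H + (H + 1) * n_out <= budget  =>  H <= (budget - n_out) / (n_in + n_out + 1)
    return (budget - n_out) // (n_in + n_out + 1)


for name, n_tr in TRANSITION_CLASSES.items():
    n_out = n_outputs(n_tr)
    h = hidden_units_for_budget(INPUT_DIM, n_out, BUDGET)
    print(f"{name:11s} outputs={n_out:3d} hidden={h:3d} params={mlp_params(INPUT_DIM, h, n_out)}")
```

Changing `INPUT_DIM` shifts every hidden-layer size, but the ordering stays the same: CI&CD(I), with 146 outputs, always gets the smallest hidden layer under the shared budget, and CI&CD(II) and CI&CD(III) get identical networks because both have 127 outputs.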