Next: Discussion Up: Task 4.1: Technical Description Previous: The model

Phone to phone transition clustering

Two strategies have been used to cluster the whole set of transitions into a limited number of generalized classes.

The first strategy defined transition classes from the phonetic classes of the left and right phones. Whereas each transition in the complete set is defined by its left and right phones, each generalized transition is characterized by the left and right phonetic classes. The underlying assumption is that left (or right) phones from the same phonetic class have similar effects on the realization of the transition. We considered two criteria to define the phonetic classes, giving the broad classes of Table 4.1 and the place-of-articulation classes of Table 4.2.

Table 4.1: Broad phonetic classes.
broad class          TIMIT phonemes
voiced stops         b, bcl, d, dcl, g, gcl
unvoiced stops       p, pcl, t, tcl, k, kcl
affricates           jh, ch
unvoiced fricatives  s, sh, f
voiced fricatives    z, zh, th, v, dh
nasals               m, n, ng
semivowels           l, r, w, y
whisper              hh
vowels               iy, ih, eh, ey, ae, aa, aw, ay, ah, ao, oy, ow, uh, uw, er

Table 4.2: Broad phonetic classes based on the place of articulation.
broad class        TIMIT phonemes
closures           bcl, dcl, gcl, pcl, tcl, kcl
front consonants   b, p, f, th, v, dh, m
middle consonants  d, t, jh, ch, s, sh, z, zh, n, l
back consonants    g, k, ng, hh
front vowels       y, iy, ih, eh, ae, ey, ay, oy
middle vowels      r, er, aa, ah, ao
back vowels        uh, uw
middle diphthongs  w, aw, ow
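
As an illustration of the first strategy, the following minimal Python sketch maps a phone-to-phone transition to its generalized class, using the broad classes of Table 4.1. The names `BROAD_CLASSES`, `PHONE_TO_CLASS` and `generalized_transition` are hypothetical and chosen only for this example; the sketch assumes that every ordered pair of classes is a distinct generalized transition, which with nine classes gives at most 9 x 9 = 81 of them.

```python
# Broad phonetic classes copied from Table 4.1.
BROAD_CLASSES = {
    "voiced stops": ["b", "bcl", "d", "dcl", "g", "gcl"],
    "unvoiced stops": ["p", "pcl", "t", "tcl", "k", "kcl"],
    "affricates": ["jh", "ch"],
    "unvoiced fricatives": ["s", "sh", "f"],
    "voiced fricatives": ["z", "zh", "th", "v", "dh"],
    "nasals": ["m", "n", "ng"],
    "semivowels": ["l", "r", "w", "y"],
    "whisper": ["hh"],
    "vowels": ["iy", "ih", "eh", "ey", "ae", "aa", "aw",
               "ay", "ah", "ao", "oy", "ow", "uh", "uw", "er"],
}

# Invert the table so that each phone points to its broad class.
PHONE_TO_CLASS = {
    phone: broad_class
    for broad_class, phones in BROAD_CLASSES.items()
    for phone in phones
}


def generalized_transition(left_phone, right_phone):
    """Return the (left class, right class) pair of a phone-to-phone transition.

    Raises KeyError for a phone that is not listed in Table 4.1.
    """
    return PHONE_TO_CLASS[left_phone], PHONE_TO_CLASS[right_phone]
```

With this mapping, the transitions b-iy and d-ih both fall into the generalized class (voiced stops, vowels), so under this strategy they would share the same model. The same code applies to Table 4.2 by swapping in its class table.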

The second strategy was based on automatic, data-driven clustering of all phone-to-phone transitions. In this approach, we identified states whose pdfs could be tied with minimal loss of modelling capability, which avoids introducing a priori (and usually inaccurate) knowledge into the system. The pdf of each state was modelled by a single Gaussian distribution, and a K-means algorithm was used to cluster the Gaussians. The distance measure between two Gaussians i and j was defined by

\begin{displaymath}d(i,j)=\left[\frac{1}{V} \sum_{k=1}^{V} \frac{(\mu_{ik}-\mu_{jk})^2}{\sqrt{\sigma_{ik}^2 \sigma_{jk}^2}}\right]^{1/2}\end{displaymath}

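where $\mu_i$ is the mean vector and $\sigma_i$ the standard deviation vector of Gaussian $i$, $\mu_{ik}$ and $\sigma_{ik}$ are their $k$-th components, and $V$ is the dimension of these vectors.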
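
The distance measure and the clustering step can be sketched in plain Python as follows. This is only an illustrative sketch under stated assumptions: the function names `gaussian_distance` and `kmeans_gaussians`, the fixed iteration count `n_iter`, the caller-supplied initial centroids, and above all the centroid update rule (component-wise averaging of the member means and standard deviations) are not specified in the text and are chosen here for the example.

```python
import math


def gaussian_distance(mu_i, sigma_i, mu_j, sigma_j):
    """Distance d(i, j) between two diagonal Gaussians.

    Each argument is a list of V floats (means or standard deviations).
    """
    V = len(mu_i)
    total = 0.0
    for k in range(V):
        # Squared mean difference, normalized by sqrt(sigma_ik^2 * sigma_jk^2).
        norm = math.sqrt(sigma_i[k] ** 2 * sigma_j[k] ** 2)
        total += (mu_i[k] - mu_j[k]) ** 2 / norm
    return math.sqrt(total / V)


def kmeans_gaussians(gaussians, initial_centroids, n_iter=10):
    """Cluster (mu, sigma) pairs with K-means using gaussian_distance.

    Returns (assignments, centroids); assignments[n] is the cluster index
    of gaussians[n].
    """
    centroids = list(initial_centroids)
    assignments = [0] * len(gaussians)
    for _ in range(n_iter):
        # Assignment step: attach each Gaussian to its nearest centroid.
        for n, (mu, sigma) in enumerate(gaussians):
            assignments[n] = min(
                range(len(centroids)),
                key=lambda c: gaussian_distance(mu, sigma, *centroids[c]),
            )
        # Update step (assumption): average the member means and the member
        # standard deviations component by component.
        for c in range(len(centroids)):
            members = [g for n, g in enumerate(gaussians) if assignments[n] == c]
            if not members:
                continue  # leave the centroid of an empty cluster unchanged
            V = len(members[0][0])
            mu_c = [sum(m[0][k] for m in members) / len(members) for k in range(V)]
            sigma_c = [sum(m[1][k] for m in members) / len(members) for k in range(V)]
            centroids[c] = (mu_c, sigma_c)
    return assignments, centroids
```

Here each state contributes one `(mu, sigma)` pair, and states assigned to the same cluster would be the candidates for tying. Other centroid updates (for example, pooling the variances of the members) are equally plausible and would change only the update step.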

Christophe Ris