EPG data can be reduced because EPG sequences are redundant, owing to the physical constraints of the tongue-palate system. This redundancy is both spatial and temporal. At present we investigate only the spatial redundancy.

We have experimented with a subset of the multilingual EUR-ACCOR database (ESPRIT II Basic Research Actions 3279 and 7098), which was designed for the cross-language study of coarticulation. The EPG frames from the English-language recordings were selected and divided into training and test sets, and the various dimensionality reduction algorithms were evaluated on this data.

The results are presented and discussed in detail in a paper contained in the appendix to this report. In summary:

- In terms of log likelihood, factor analysis performs slightly better than PCA.
- Although GTM is limited to a two-dimensional latent space, it performs better than factor analysis in terms of test-set log likelihood, and as well as PCA or factor analysis with 10 latent variables in terms of test-set reconstruction error.
- Mixtures of factor analysers are difficult to optimize with the EM algorithm, because the likelihood surface has singularities at degenerate solutions.
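
The two evaluation criteria used above, test-set reconstruction error and test-set log likelihood, can be illustrated for the PCA case with a minimal sketch. It is not the code used for the experiments. The data are synthetic binary frames, the frame size of 62 contacts is an assumption, and the log likelihood is computed under the probabilistic PCA model, which is one possible way of assigning a likelihood to PCA and may differ from the definition used in the paper. All function names are hypothetical.

```python
import numpy as np

def fit_pca(train, n_latent):
    """Fit PCA by SVD on the training frames."""
    mean = train.mean(axis=0)
    centered = train - mean
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    eigvals = s ** 2 / train.shape[0]
    # Probabilistic PCA noise variance: average of the discarded eigenvalues.
    d = train.shape[1]
    noise_var = eigvals[n_latent:].sum() / (d - n_latent)
    return mean, vt[:n_latent], eigvals[:n_latent], noise_var

def reconstruction_error(test, mean, components):
    """Mean squared reconstruction error per frame."""
    centered = test - mean
    recon = centered @ components.T @ components
    return float(np.mean(np.sum((centered - recon) ** 2, axis=1)))

def ppca_log_likelihood(test, mean, components, eigvals, noise_var):
    """Average log likelihood per frame under probabilistic PCA (assumed model)."""
    d = test.shape[1]
    # Model covariance: U (Lambda - sigma^2 I) U^T + sigma^2 I.
    cov = (components.T * (eigvals - noise_var)) @ components
    cov += noise_var * np.eye(d)
    centered = test - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = np.sum(centered * np.linalg.solve(cov, centered.T).T, axis=1)
    return float(np.mean(-0.5 * (d * np.log(2 * np.pi) + logdet + maha)))

# Synthetic stand-in for EPG frames: binary patterns driven by 3 hidden factors.
rng = np.random.default_rng(0)
n_frames, n_contacts = 2000, 62  # 62 contacts is an assumed frame size
hidden = rng.normal(size=(n_frames, 3))
loadings = rng.normal(size=(3, n_contacts))
noise = 0.5 * rng.normal(size=(n_frames, n_contacts))
frames = (hidden @ loadings + noise > 0).astype(float)

train, test = frames[:1500], frames[1500:]

errors, log_likelihoods = {}, {}
for q in (2, 10):
    mean, comps, eigvals, noise_var = fit_pca(train, q)
    errors[q] = reconstruction_error(test, mean, comps)
    log_likelihoods[q] = ppca_log_likelihood(test, mean, comps, eigvals, noise_var)
    print(q, errors[q], log_likelihoods[q])
```

Because the 2-dimensional PCA subspace is nested inside the 10-dimensional one, the PCA reconstruction error cannot increase when latent variables are added. This is what makes it notable that GTM with only two latent dimensions matches a 10-variable linear model on this criterion.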