Task 4.2: Status

We have used methods derived from acoustic confidence measures, which are based on estimates of local posterior probabilities produced by an RNN, to segment continuous audio into regions where speech recognition should be applied and regions where it should not. The segmentation is cheap to compute and has led to a lower overall word error rate and reduced decoding time. The technique was evaluated on material from the Broadcast News corpus.
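The idea can be illustrated with a minimal sketch. It assumes per-frame entropy of the posterior distribution as the confidence measure, averaged over a short window and compared with a fixed threshold; the report does not state which measure, window, or threshold was used, so the function names and default values below are hypothetical.

```python
import math


def frame_entropy(posteriors):
    """Entropy (in nats) of one frame's phone posterior distribution."""
    return -sum(p * math.log(p) for p in posteriors if p > 0.0)


def segment_by_confidence(frames, window=3, threshold=0.5):
    """Split a sequence of frames into recognizable and non-recognizable runs.

    frames: list of per-frame posterior distributions (each sums to 1).
    window, threshold: illustrative values only, not taken from the report.
    Returns a list of (start, end, recognize) tuples with end exclusive.
    """
    entropies = [frame_entropy(f) for f in frames]
    half = window // 2

    # Low average entropy means the network is confident about the phone
    # class, which we treat here as evidence of recognizable speech.
    labels = []
    for i in range(len(entropies)):
        lo, hi = max(0, i - half), min(len(entropies), i + half + 1)
        mean_entropy = sum(entropies[lo:hi]) / (hi - lo)
        labels.append(mean_entropy < threshold)

    # Merge consecutive frames that share a label into segments.
    segments = []
    start = 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            segments.append((start, i, labels[start]))
            start = i
    return segments
```

Only the segments marked for recognition would be passed to the decoder, which is where a saving in decoding time would come from under these assumptions.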

Frequency warping results were obtained on the WSJCAM0 database and show that vocal tract length normalization is an effective method for adaptation.
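For illustration, one common form of frequency warping is a piecewise-linear function of a per-speaker factor. The sketch below assumes that form; the report does not say which warping function was used, and the names alpha, f_nyquist, and cut_ratio and their default values are hypothetical.

```python
def warp_frequency(f, alpha, f_nyquist=8000.0, cut_ratio=0.85):
    """Piecewise-linear frequency warp (an assumed form, not from the report).

    f: input frequency in Hz, between 0 and f_nyquist.
    alpha: per-speaker warping factor; 1.0 leaves the axis unchanged.
    Below the cutoff the axis is scaled by alpha; above it, a second linear
    piece is used so that f_nyquist still maps to f_nyquist.
    """
    f_cut = cut_ratio * f_nyquist
    if alpha * f_cut >= f_nyquist:
        raise ValueError("alpha too large for the chosen cutoff")
    if f <= f_cut:
        return alpha * f
    # Slope chosen so the upper piece joins (f_cut, alpha * f_cut)
    # to (f_nyquist, f_nyquist) without a jump.
    slope = (f_nyquist - alpha * f_cut) / (f_nyquist - f_cut)
    return alpha * f_cut + slope * (f - f_cut)
```

In a typical setup the warp would be applied to the filterbank centre frequencies during feature extraction, with alpha chosen separately for each speaker.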

Christophe Ris