STRUT
The Speech Training and Recognition Unified Tool

The Speech Training and Recognition Unified Tool (STRUT) has been developed for research on speech recognition and for fast development and testing of related applications. The software can perform speech analysis, model training, and speech recognition. The tool consists of many small "independent" pieces of code, one for each identified module in the speech recognition process: sampling, feature extraction, clustering, probability estimation, and decoding.
Data exchange between the programs can be done in three ways:
For this structure to work well, data and file formats have to be precisely defined. Each data file has an ASCII header of at least 1024 bytes describing the contents of the file. Putting as much information as possible in those headers allows the user to check each step of the recognition or training process. Routines have been developed to edit or remove those headers, as well as to read or write the data. The format of the header was inspired by the format defined by the National Institute of Standards and Technology (NIST) for speech waveform files. This makes the files compatible with databases provided by the Linguistic Data Consortium (LDC) and allows the SPeech HEader REsources (SPHERE) software package to be used to handle those headers.

Figure 1: Recognizer block diagram

The block diagram of a recognizer is presented in figure 1. If you don't work with discrete probabilities, the clustering block can be removed.
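
To make the header description above more concrete, here is a minimal Python sketch of reading a NIST SPHERE-style ASCII header. It assumes the header follows the SPHERE layout exactly: a `NIST_1A` line, a line giving the header size in bytes, `name -type value` fields, and a closing `end_head` line. The STRUT format is only described as inspired by NIST, so its actual layout may differ. The function names and the `feature_kind` and `frame_shift` fields are hypothetical and are not part of STRUT or SPHERE.

```python
import io


def parse_fields(text):
    """Parse 'name -type value' lines until 'end_head' (SPHERE convention)."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if line == "end_head":
            break
        if not line or line.startswith(";"):  # skip blank lines and comments
            continue
        name, type_tag, value = line.split(None, 2)
        if type_tag == "-i":            # integer field
            fields[name] = int(value)
        elif type_tag == "-r":          # real-valued field
            fields[name] = float(value)
        elif type_tag.startswith("-s"):  # string field, e.g. -s4 = 4 bytes
            fields[name] = value[: int(type_tag[2:])]
        else:
            raise ValueError(f"unknown field type: {type_tag!r}")
    return fields


def read_header(stream):
    """Read the header from a binary stream; leave it positioned at the data."""
    preamble = stream.read(16)  # 'NIST_1A\n' + 7-character size + '\n'
    magic, size = preamble.decode("ascii").split()
    if magic != "NIST_1A":
        raise ValueError("not a NIST_1A-style header")
    header_size = int(size)
    rest = stream.read(header_size - 16).decode("ascii", errors="replace")
    fields = parse_fields(rest)
    fields["header_size"] = header_size
    return fields


def build_demo_header(fields_text, size=1024):
    """Build a space-padded demo header (illustration only, assumed layout)."""
    body = f"NIST_1A\n{size:>7}\n{fields_text}end_head\n".encode("ascii")
    return body.ljust(size, b" ")


if __name__ == "__main__":
    # Hypothetical fields, followed by two bytes standing in for the data.
    demo = build_demo_header(
        "sample_rate -i 16000\nfeature_kind -s4 mfcc\nframe_shift -r 0.01\n"
    ) + b"\x01\x02"
    stream = io.BytesIO(demo)
    print(read_header(stream))
    print(stream.read())  # the data bytes that follow the header
```

Because the header states its own size, a reader can skip straight to the data without understanding every field, which is what lets each small program inspect or ignore the information recorded by earlier steps.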
In a Viterbi training diagram, the decoder block is replaced by a state path decoder (figure 2). The resulting segmentation is then used by another program to update the models, whatever they are. Baum-Welch training can also be implemented easily: the probabilities for each state are stored instead of the segmentation.

Documentation

Have a look at the STRUT user's guide.