The general motivation of the multi-stream approach discussed in this paper is to allow for the parallel processing of several feature streams, each resulting from a particular observation of the speech phenomenon. These different information sources, possibly representing different properties of the speech signal, are treated independently up to some recombination point (e.g., at the syllable level). In this context, the different streams are not restricted to the same frame rate, and the underlying HMMs associated with each stream do not necessarily have the same topology.
This multi-stream approach is a principled way of merging different sources of temporal information (possibly asynchronous and/or with different frame rates) and has many potential advantages. In the case of subband-based recognition, a particular case of multi-stream recognition, it was shown on several databases that this approach yields much better noise robustness.
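To make the recombination step concrete, the following is a minimal sketch of one common choice: a weighted linear combination of per-stream log-likelihoods at the recombination point. The stream names, weights, and likelihood values are purely illustrative assumptions, not taken from the paper; the actual recombination strategy used in a given system may differ.

```python
import numpy as np

# Hypothetical per-stream log-likelihoods for one syllable-level unit,
# e.g. produced by independently decoded subband HMMs up to the
# recombination point (names and values are illustrative).
stream_loglik = {
    "subband_0_1kHz": -42.7,
    "subband_1_2kHz": -39.1,
    "subband_2_4kHz": -51.3,
}

def recombine(loglik_by_stream, weights=None):
    """Linearly recombine per-stream log-likelihoods at the
    recombination point. Equal (unit) weights reduce to the plain
    sum of log-likelihoods, i.e. the streams are treated as
    independent; unequal weights can emphasize more reliable
    streams (e.g. less noisy subbands)."""
    streams = sorted(loglik_by_stream)
    ll = np.array([loglik_by_stream[s] for s in streams])
    if weights is None:
        w = np.ones(len(ll))
    else:
        w = np.array([weights[s] for s in streams])
    return float(np.dot(w, ll))

combined = recombine(stream_loglik)
```

Because each stream is decoded independently up to the recombination point, a noisy subband can be down-weighted (or dropped) without retraining the other streams, which is one source of the noise robustness noted above.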