We have tested two common assumptions about multi-band ASR: 1) the objection raised by critics of multi-band ASR that it is inherently inferior to a full-band approach, because phonetic information is lost when the frequency space is divided into sub-bands; and 2) the assumption made by multi-band ASR researchers that transitions in sub-bands often occur asynchronously (i.e., at different times than the full-band transition).
To study the first point, we calculated phonetic feature transmission for sub-bands. Not only did we fail to substantiate the above objection, but we observed the contrary. We confirmed the second hypothesis by analyzing the transition lags in each sub-band.
Our exploration of the first question further showed that, even with a simple multi-band merging method, phonetic features are transmitted better by the multi-band system (60.94% for our database) than by the comparable full-band system (59.06%).
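As an illustration of how a feature transmission percentage of this kind can be obtained, the following minimal sketch computes relative information transmitted from a confusion matrix of feature values. It assumes the common definition of transmission as the mutual information between presented and recognized feature values, normalized by the entropy of the presented values; this section does not restate the exact formula used, so the definition, the function name `relative_transmission`, and the example matrices are illustrative assumptions rather than the procedure or data behind the figures above.

```python
import math


def relative_transmission(confusions):
    """Relative information transmitted for one phonetic feature.

    confusions[i][j] is the number of tokens whose true feature value
    was i and whose recognized feature value was j (hypothetical layout).
    Returns a value in [0, 1]: mutual information divided by the entropy
    of the true feature values.
    """
    total = sum(sum(row) for row in confusions)
    if total == 0:
        raise ValueError("confusion matrix is empty")

    n_cols = len(confusions[0])
    # Marginal probabilities of the true (row) and recognized (column) values.
    row_p = [sum(row) / total for row in confusions]
    col_p = [sum(row[j] for row in confusions) / total for j in range(n_cols)]

    # Mutual information in bits; zero cells contribute nothing.
    mutual = 0.0
    for i, row in enumerate(confusions):
        for j, count in enumerate(row):
            if count > 0:
                p = count / total
                mutual += p * math.log2(p / (row_p[i] * col_p[j]))

    # Entropy of the true feature values, used as the normalizer.
    input_entropy = -sum(p * math.log2(p) for p in row_p if p > 0)
    return mutual / input_entropy if input_entropy > 0 else 0.0


# Illustrative matrices only (not data from this study).
perfect = [[50, 0], [0, 50]]    # every token recognized correctly
chance = [[25, 25], [25, 25]]   # recognition independent of the input
```

Under this definition, a system that never confuses feature values transmits all of the input information, and one whose output is independent of the input transmits none.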
For the second question, we found no consistent frequency-dependent delay or advance of phone transitions: the per-band transition lags had a mean close to zero. However, the spread of these transition lags depended both on frequency and on the contrast conditions (speaking rate and reverberation). In particular, roughly one-third of the sub-band transitions in the control condition do not occur within 50 ms of each other. Furthermore, the spread of the high-frequency band timings depends strongly on speaking rate.
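The lag statistics summarized here (mean, spread, and the share of transitions falling outside a 50 ms window) can be sketched as follows. This is a minimal example under stated assumptions: each lag is taken to be the signed time difference, in milliseconds, between a sub-band transition and a reference transition, spread is measured as the population standard deviation, and a transition counts as asynchronous when its absolute lag exceeds the window. The function name `lag_summary` and the sample values are hypothetical.

```python
import statistics


def lag_summary(lags_ms, window_ms=50.0):
    """Summarize signed transition lags for one sub-band.

    lags_ms: signed lags in milliseconds (sub-band transition time minus
             reference transition time; an assumed convention).
    window_ms: synchrony window; lags with a larger absolute value are
               counted as asynchronous.
    Returns (mean lag, standard deviation, fraction outside the window).
    """
    if not lags_ms:
        raise ValueError("no lags supplied")
    mean_lag = statistics.fmean(lags_ms)
    spread = statistics.pstdev(lags_ms)
    outside = sum(1 for lag in lags_ms if abs(lag) > window_ms) / len(lags_ms)
    return mean_lag, spread, outside


# Illustrative lags only (not data from this study).
example_lags = [-60.0, -10.0, 0.0, 10.0, 60.0, 0.0]
mean_lag, spread, outside = lag_summary(example_lags)
```

Reporting the mean and the spread separately matters here: a mean near zero rules out a systematic per-band delay, while a large spread or a large fraction outside the window still indicates substantial asynchrony.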