Over the last two years, CUED has developed a baseline Broadcast News system.
This year, we are choosing a number of promising developments from the
SPRACH project and attempting to find ways to integrate them into the
It was agreed that for convenience in integrating ideas from distant
partners we would, at least initially, use n-best lists and word lattices
from the baseline system; these would be rescored using the new
approaches from the partners. Work on the baseline system that would
generate the lattices will continue at CUED.
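As an illustration of the rescoring arrangement described above, the following is a minimal sketch of n-best rescoring in which a partner's new model adds a weighted log-domain score to each baseline hypothesis. The names `Hypothesis`, `rescore_nbest`, `new_score_fn` and `weight` are hypothetical and chosen for this example only; the actual list format and score combination used by the partners are not specified here.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Hypothesis:
    words: List[str]
    baseline_score: float  # log-domain score from the baseline system (assumed)


def rescore_nbest(
    nbest: List[Hypothesis],
    new_score_fn: Callable[[List[str]], float],
    weight: float = 0.5,
) -> List[Hypothesis]:
    """Return the hypotheses sorted best-first by combined log score.

    The combined score is baseline_score + weight * new_score_fn(words).
    The linear combination is an assumption made for this sketch.
    """
    scored = [
        (h.baseline_score + weight * new_score_fn(h.words), h) for h in nbest
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [h for _, h in scored]


# Toy usage: the new model penalises one particular word.
nbest = [
    Hypothesis(["the", "cat"], -10.0),
    Hypothesis(["the", "cap"], -9.5),
]
reranked = rescore_nbest(
    nbest, lambda words: -5.0 if "cap" in words else 0.0, weight=1.0
)
```

In practice the weight would be tuned on a development set; word lattices would need a path search rather than a simple sort.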
The cross-site collaborative work will primarily consist of the following:
Static pronunciation modeling for baseline system: CUED, Sheffield,
and ICSI will all be involved in this. The primary collaboration will
be between Gethin Williams of Sheffield and Eric Fosler of ICSI; Williams
has been spending January through May at ICSI working with Fosler.
CUED will in all cases act as integrator of the new approaches into the
baseline system. Gary Cook from CUED will also be visiting ICSI in May
to further facilitate this joint work. Sheffield will also work on
including compound words in the pronunciation lexicon, looking for
instance at the bigrams with the highest prior probabilities, as well
as considering named entities.
Dynamic pronunciation modeling (incorporating on-line measures of
speaking rate, confidence): this activity involves the same cast of
characters as the previous one, but we list it separately to suggest that
it is a trickier and higher-risk approach. It will almost certainly be
applied at the level of rescoring word lattices or n-best lists, as
opposed to integration into the baseline system as will be done for the
Multiple streams: CUED, ICSI, and FPMS will work on this. In
ICSI and FPMS have developed multistream measures and approaches which
could in principle be applied either at the frame level in the current
decoder or as a postprocessing step for the baseline system. In each
case CUED will act as the final arbiter to determine which of these
work well enough in combination with the baseline to be included in
the final system. In ICSI's case, an MLP-based system will be trained on
alignments from CUED, using ``sluggish'' features
(modulation spectral variables) in addition to RASTA-PLP. This is
the combination that ICSI found to be useful in reducing the effects of
environmental acoustics. ICSI will also experiment with probability
estimators based on telephone-bandwidth features, to further complement
the basic CUED set. FPMS will consider the use of NLDA and other
promising multistream approaches (e.g., syllabic time scale mergers).
We have agreed to synchronise on alignments with CUED, so that at all
times each site will work with the same CUED-generated phonetic targets for the
Language modeling: CUED and Sheffield will work together on this,
comparing approaches from the exploratory phases in SPRACH
and incorporating the best in the BN evaluation system. Current
candidates are incorporating named entities, mixture LSA models,
and top-down variable order LMs.
Decoder development: this will continue as before, but the
between Cambridge and Sheffield will be tightened in the coming months
to ensure that the next set of changes in the Noway decoder correspond
to the immediate requirements of the partners' participation in the
Segmentation: CUED and Sheffield are both interested in conducting
a few experiments to improve the segmentation of Broadcast News data
into focus sections (e.g., segmenting out the musical regions).
Experiments are planned using functions of posterior estimates, as well as
long-time covariance matrices from acoustic features. Poor segmentation
was a significant contributor to error for CUED's systems in the last
evaluation, so it is worth spending a little time on this.
Integration: CUED will coordinate all of this
collaborative effort on the BN system.
Synchronisation will be provided by periodic agreement on a new set of
training alignments and development test lattices from the previous
round. CUED will provide these alignments, and specify the development tests.
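To make the frame-level multistream option mentioned above more concrete, here is a minimal sketch of one common way to merge per-frame class posteriors from several streams: a weighted geometric mean (log-linear combination) followed by renormalisation. This rule is an assumption for illustration, not necessarily the measure that ICSI or FPMS will use; `combine_streams` and its arguments are hypothetical names.

```python
import math
from typing import List, Optional

Frame = List[float]  # posterior probabilities over classes for one frame


def combine_streams(
    stream_posteriors: List[List[Frame]],
    weights: Optional[List[float]] = None,
) -> List[Frame]:
    """Combine per-frame posteriors from several streams.

    Uses a weighted geometric mean per class, then renormalises each frame
    so that it sums to one. All streams are assumed to share the same
    frame alignment and class inventory.
    """
    n_streams = len(stream_posteriors)
    if weights is None:
        weights = [1.0 / n_streams] * n_streams
    combined = []
    for t in range(len(stream_posteriors[0])):
        n_classes = len(stream_posteriors[0][t])
        # Weighted sum of log posteriors; floor avoids log(0).
        log_p = [
            sum(
                w * math.log(max(stream[t][k], 1e-12))
                for w, stream in zip(weights, stream_posteriors)
            )
            for k in range(n_classes)
        ]
        peak = max(log_p)
        unnorm = [math.exp(v - peak) for v in log_p]
        total = sum(unnorm)
        combined.append([u / total for u in unnorm])
    return combined


# Toy usage: two streams, one frame, two classes.
merged = combine_streams([[[0.8, 0.2]], [[0.6, 0.4]]])
```

The same-alignment assumption is what the agreed CUED-generated targets are meant to guarantee across sites.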
In addition to these collaborative efforts, the baseline system itself
will be further developed at CUED. The major developments planned are
noise estimation, cross-word CD modelling and training, and vocal tract
length normalisation/adaptation using covariance.
In the next two months, we will agree on
common dev test sets with known characteristics and a
core 64K vocabulary. Sheffield has also promised to provide a
new decoder (Noway) release. Sheffield and ICSI will generate a baseline
set of static pronunciation models, incorporating confidence measures,
and Sheffield will generate an initial list of
compound lexical items (e.g. top 200 bigrams, or most frequent named entities).
Sheffield and CUED will test and release a segmenter, and CUED will
conduct tests on their existing BN system
using the new decoder, different decoder parameters as
suggested by Sheffield, and the new segmenter. ICSI will also build up
a preliminary MLP-based system in this period incorporating RASTA-PLP.
FPMS will build a preliminary MLP-based system as well.
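The initial list of compound lexical items mentioned above (e.g. the top 200 bigrams) could be drafted along the following lines. This is a sketch only: it ranks bigrams by raw count in a tokenised training text, which gives the same ordering as ranking by relative frequency. The function name `top_bigrams` and the input format are assumptions made for this example.

```python
from collections import Counter
from typing import List, Tuple


def top_bigrams(sentences: List[List[str]], k: int = 200) -> List[Tuple[str, str]]:
    """Return the k most frequent word bigrams, most frequent first.

    Bigrams are counted within sentences only, never across sentence
    boundaries.
    """
    counts = Counter()
    for words in sentences:
        counts.update(zip(words, words[1:]))
    return [bigram for bigram, _ in counts.most_common(k)]


# Toy usage on three short "sentences".
corpus = [
    ["new", "york", "city"],
    ["new", "york", "times"],
    ["the", "new", "york"],
]
candidates = top_bigrams(corpus, k=1)
```

Each selected bigram would then need its own pronunciation entry in the lexicon; how those pronunciations are derived is outside the scope of this sketch.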
It should be noted that ICSI has vastly greater SPERT-based
computational capability than the other sites. This will be made
available to the collaboration for all of the MLP-based training,
particularly as time gets short before the evaluation.
By the end of the SPRACH project, we will have an improved Broadcast
News system incorporating the most effective ideas from each of the
sites. Those that are most promising will be considered for inclusion
at a more tightly integrated level (i.e., in the decoder) for the