The basic theme of this project, referred to as SPRACH (SPeech Recognition
Algorithms for Connectionist Hybrids),
is to build upon WERNICKE
(ESPRIT Basic Research Project 6487, October 1992-October 1995)
to further develop
new theories, algorithms, hardware and software tools
for the extension of hybrid Hidden Markov Model (HMM) / Artificial
Neural Network (ANN)
methods for different continuous speech recognition tasks.
While continuing the theoretical and development work
successfully carried out in WERNICKE,
this new project also aims at extending the WERNICKE results
to new languages (UK English, French and Portuguese) and to
flexible speech recognition systems that can easily be adapted to new
domains with new lexica and new syntaxes. This means that
one of the SPRACH objectives
is also to develop powerful tools to allow easy adaptation and testing
of the known (as well as the newly developed) technology to different
domains.
In WERNICKE, on top of substantial theoretical results, it was
demonstrated,
using standard international reference databases (such
as the unlimited vocabulary ARPA North American Business News
database and the EU funded SQALE project), that the hybrid HMM/ANN
approaches lead to competitive
state-of-the-art systems. Furthermore, the investigated hybrid
approach was shown to have additional advantages in terms of CPU
utilization and memory bandwidth. It is, however, our belief that such
systems can also be more flexible and more robust.
In addition to building on the WERNICKE large vocabulary continuous
speech recognition system, SPRACH investigates the development of
systems for smaller, task independent applications, with no need to
retrain the system or develop a new lexicon or grammar when
moving from one task to another.
Motivated by the results achieved in WERNICKE, several industrial and
academic laboratories have recently compared the
hybrid approaches developed in WERNICKE with the best classical
HMM approaches on
a number of speech recognition tasks. In cases where the comparison
was controlled, the hybrid approach
performed better when the number of parameters was similar,
and about the same for some cases in which the classical system
used many more parameters.
Evidence for this can be found in a number of sources, including:
The most recent results, those of the EU funded SQALE evaluations, show
the hybrid approach slightly ahead of more traditional HMM systems.
The hybrid system was evaluated on both British and American English
tasks, using a 20,000 word vocabulary and a trigram language model,
along with the other leading European systems produced by LIMSI
(France), Philips (Germany) and
Cambridge University/HTK (UK) .
Additionally, the hybrid system was efficient in its runtime CPU and
memory usage.
Results on Resource Management (a standard
reference database for testing ASR systems) obtained in the
framework of the WERNICKE project.
Work at NYNEX, in which high recognition
accuracy on a connected digit recognition task is achieved using a fairly
straightforward HMM/ANN hybrid and compared with the state of the art.
Finally, the hybrid HMM/ANN approaches developed in WERNICKE
are quite general and can be applied to other tasks.
Recently, this approach was adopted by several laboratories
to handle speaker verification (NYNEX),
handwriting recognition (AT&T),
gene classification and fault diagnosis.
The partners are:
Faculté Polytechnique de Mons (FPMs), Belgium -- Expertise
in HMMs and ANNs (multilayer perceptrons) for speech recognition
and other signal classification problems; strong collaboration
with ICSI.
Cambridge University Engineering Department (CUED), UK --
Experience with recurrent neural networks, statistically motivated
neural network architectures and language models.
Sheffield University (SU), UK -- Expertise in HMMs, ANNs (both
multilayer perceptrons and recurrent neural networks) and large vocabulary
continuous speech recognition.
INESC, Portugal -- Expertise in signal processing, neural networks
and use of hybrid HMM/MLP systems for speaker adaptation.
International Computer Science Institute (ICSI), USA, as a subcontractor of FPMs
-- Expertise in hybrid systems, feature extraction and "neural" hardware.
To further reinforce the industrial relevance of this project and its possible
industrial impact, four major industrial partners agreed to be part
of the SPRACH Industrial Advisory Board with the aim of
(1) guiding the research partners through the cooperative
definition of potential applications, test tasks and development
prototypes, and (2) maintaining an awareness of current and
future developments in the area.
These industrial partners
are: (1) British Broadcasting Corporation (BBC), UK, (2) Thomson CSF, France,
(3) Daimler-Benz, Germany, and (4) CSELT, Italy.
It is clear that all of them are highly interested in the possible outputs
of the present project.
Furthermore, it is worth noting that:
BBC and Thomson are particularly interested
in automatic indexing of spoken language and in recognition of broadcast
speech (which is one of the specific applications considered in this
project).
Daimler-Benz is very active in the area of speech recognition and has
an interest in learning about potential advantages
of the hybrid HMM/ANN technology, particularly for robust systems.
Additionally, Daimler-Benz is
one of the German companies funding ICSI, the US subcontractor of the
project.
CSELT is also a major player in European speech recognition technology
and is committed to turning this technology into products. Recently, they
presented a (patent pending) speech recognition system
based on hybrid HMM/ANN technology.
Possible applications and demonstration systems that are
targeted in this project include:
Very large vocabulary (64K words) continuous speech recognition
of read speech; this will be an essential enabling technology for
many multimedia and telematics applications.
Voice-driven typewriter: A dictation system running in real time with
simple editing commands.
Flexible continuous speech recognizer in which lexica and grammars can
be defined on the spot, without the need of training.
Smaller (but realistic) tasks, including, e.g., robust recognition
of free format numbers. This could be done on the basis of
existing databases like the OGI numbers databases.
Recognition of broadcast speech: transcription of radio or television
speech (e.g. newsreaders).
Extension of the above to several European languages. In addition to the
properties discussed above, another interesting
feature of the hybrid systems is that they do not seem to require extensive
knowledge of the languages or their phonological rules to adapt the recognizer.
With appropriate databases (which are becoming increasingly available), development
for a new language is quite straightforward.
To conclude this introduction, we also remind the reader that in this
project all the partners use common, fast and flexible
hardware (SPERT) that has been developed by ICSI, the SPRACH
subcontractor.
As already shown in WERNICKE, the availability of common
hardware and software that is somewhat customized for the research approaches
under investigation permitted both the incorporation of very
computationally-intensive algorithms, and the comparison of their
efficacy across the different sites.
As already mentioned,
this project builds upon the 1992-1995 ESPRIT project WERNICKE
which developed a state-of-the-art, speaker independent,
large vocabulary continuous speech
recognition system (comparable with the best) that is significantly
more compact and efficient than its competitors.
WERNICKE also demonstrated that hybrid HMM/ANN technology is a viable,
and probably preferable, basis for the goals of this project
(e.g., more compact, less "specialized" and, consequently, easier to
adapt to new tasks and new languages).
Actually, the resulting hybrid HMM/ANN systems have proven to be
good alternatives to standard HMM technology. This is particularly
promising since it seems to be more and more difficult to improve on
standard HMMs and the need for alternative technologies and
new paradigms is often acknowledged by scientists working in this field.
As briefly discussed in Section 0.1.2, this technology has also
proven to be potentially useful in other application domains.
The output of WERNICKE can thus be considered as successful and has already attracted
substantial interest from several industries.
However, it is clear that there is still much to be done
to improve the existing system.
As briefly explained in Section 0.1.1,
the fundamental aim of the present project is to
further develop and optimize our hybrid HMM/ANN
speaker independent, large vocabulary (64K words), continuous
speech recognizers, and continue their comparison with other state-of-the-art
systems. In SPRACH, the advantages of hybrid HMM/ANN systems
are further exploited by extending the systems
to new languages (UK English, French and Portuguese) and to
flexible speech recognition systems that can easily be adapted to new
domains with new lexica and new syntaxes.
To achieve this goal, the approach followed in this project has been
built upon several basic parts, spread across different
Work Packages, with very strong relationships
between them:
Recently, our subcontractor ICSI
released (as originally
planned in WERNICKE) their full-custom single chip vector microprocessor
that will be used in this project. This processor was designed to
be a good match to the kind of research that is being done by the
SPRACH partners. However, to surpass the level of performance
obtained by high-end workstations, the design needed to be somewhat
specialized for the relevant styles of computation. To make
the chip both efficient to use and flexible enough for the
lines of research pursued by this group, ICSI keeps developing
software classes that support all the computation for the kinds of
neural networks that are used in this project.
Extension of baseline HMM/ANN systems (available for American English
and UK English) to French and
Portuguese, and adaptation to different assessment databases.
This is covered by Work Packages
WP1 (for databases and baseline systems), WP2 (for lexica and automatic
learning of lexica) and WP3 (for language
models and language model adaptation).
Development, and assessment on applications defined in Section 0.1.5,
of task independent hybrid HMM/ANN recognizers in UK English,
US English (for international
assessment), French and Portuguese. This requires:
(1) large databases in the targeted languages
(covered by WP1),(2) automatic generation of
phonetic transcription and phonological rules of new lexica (covered by WP2),
(3) fast adaptation of language models (covered by WP3), and
(4) task-independent acoustic models robust to noise and channel
conditions (covered by WP4).
Formal assessment of these systems will not always be possible.
However, prototype systems will be set up regularly and will be made
available for testing by our industrial advisors (on applications
possibly defined by them); this is covered by WP7.
Nevertheless, whenever possible, formal assessment will be
done on smaller databases (with or without retraining) when available;
this will be the case for the OGI free format numbers, as mentioned in
Section 0.1.5.
Following the WERNICKE format,
formal assessment and comparisons with other state-of-the-art systems via
international competition on the basis of common databases are
essential.
Therefore, this project has to put considerable effort into the use of speech
data that are widely used for evaluating continuous speech recognizers
around the world. This is covered by WP1 and WP7 (since training and
assessment on large common databases require substantial effort, which was originally
underestimated in WERNICKE). In WP7, a task exclusively devoted
to maintaining a good and efficient decoder for large lexica has been added.
Development and evaluation
of new theories and methods to improve or go beyond the existing
hybrid HMM/ANN systems. This constitutes the "research core" of this project,
and is addressed in work package WP5. In this work package,
several promising approaches that could go beyond the initial hybrid
HMM/ANN systems and improve them have been listed. Although this part is more
research oriented, it is not too speculative since preliminary work has already been
done in each of the mentioned areas and since these are closely related to the above.
Use of common hardware and
software tools to help the research and to implement
resulting algorithms (covered by WP6). This was shown to be
particularly useful and efficient in WERNICKE since:
This forces all the partners to work on the same software and hardware.
Although hybrid HMM/ANN approaches appear to show
several advantages in terms of performance
and reduced complexity during recognition,
this is achieved at the cost of drastically increased training time,
which makes further European developments and investigations in this field
(and probably also in many other problems involving ANN algorithms)
impossible without special hardware. This kind of hardware
and associated software does not exist in Europe yet and its development would
probably require tens of man-years. Note that some more
specialized computers have been developed for this purpose in Europe,
but they are less suited to the flexible programming
needs of a research environment such as
that of WERNICKE.
This significantly reduces research and test cycles.
In short, on top of WP0 on Project Management,
eight work packages have been defined:
WP1: Database gathering from different sources and set up of baseline
systems. In this framework, the large vocabulary, continuous speech
recognizer resulting from WERNICKE will be extended to French and
Portuguese.
WP2: Development of (and development tools for) lexica for multiple
languages, including baseline dictionaries for new languages and
automatic learning of new dictionaries.
WP3: Development tools and research on different approaches to
represent and adapt language models (LM), with particular focus on generality
and ease of use.
WP4: Development tools and research on application domain independence
and adaptation, including task independence of acoustic models,
and unsupervised adaptation and training of speakers and
WP5: More fundamental research into important issues related to
speech recognition in general and hybrid systems in particular, including
perceptual models, global discrimination, mixture of experts, and others.
It is expected that, as for WERNICKE, research into those
very well defined promising research areas will lead to further enhancement
of our existing systems.
WP6: Development of the software and hardware tools necessary
to carry out the proposed work. As already shown with WERNICKE, this
is particularly important in (1) reducing the research cycle and (2)
forcing all the partners to work on the same software and hardware basis.
WP7: Evaluation and prototype development to regularly assess
the progress of SPRACH. Building upon the WERNICKE software,
it is expected that some of those demonstration systems will actually
be close to real "products".
WP8: Results dissemination and exploitation.
In the table below, we summarize the eight work packages broken
down into their component tasks and their estimated manpower.
Strong interaction between all the partners and
all work packages is guaranteed through the use of the same
hardware, software and (research, i.e., US English) databases.
Only language specific developments (UK English, French and Portuguese)
will be carried out by the respective sites.
At the end of its first year, the general status of this project
is quite satisfactory. The preparation of the Portuguese database
acquisition is progressing well, and a Portuguese dictionary has been built.
A first version of a baseline Portuguese speaker-independent
continuous speech recognition system was built successfully.
Unfortunately, no work could be done on the
development of a French large vocabulary speech recognition system, due
to the lack of a database. Work on a variety of novel language modelling
techniques is in progress, with some preliminary results reported in
WP 3. A vocabulary independent isolated word recognition
system has been developed.
The Linear Input Network (LIN) technique for speaker adaptation
has been further investigated at CUED.
Of course, work on large vocabulary continuous speech recognition has
Many new techniques have been investigated in WP 5, including:
SPAM, sub-band based models, REMAP, and mixtures of experts for speaker
All the partners have been equipped with the SPERT board developed at
ICSI, allowing the use of algorithms requiring high-performance hardware.
The software was roughly functional at an early stage, but
16-bit variables were found to be inadequate for the weights used in
our training algorithms. Modified software was developed to
update, store, and retrieve 32-bit weights. The modified fixed-point
routines appear to have resolved the differences between floating-point
and fixed-point training for the feedforward neural network. We are now
working on a corresponding resolution for the recurrent neural network (RNN).
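The weight-precision issue described above can be illustrated with a small sketch. It is not the SPERT software: the fixed-point layouts (12 fractional bits standing in for a 16-bit word, 28 for a 32-bit word), the update size, and the helper names `quantize` and `accumulate` are assumptions chosen only for illustration. The sketch shows how a weight update smaller than half the quantization step is rounded away at low precision, while the same update accumulates at higher precision.

```python
def quantize(value, frac_bits):
    """Round value to the nearest point of a fixed-point grid with frac_bits fractional bits."""
    scale = 1 << frac_bits
    return round(value * scale) / scale


def accumulate(frac_bits, steps=1000, update=1e-5, start=0.5):
    """Apply the same small update `steps` times, re-quantizing the weight after each one."""
    weight = quantize(start, frac_bits)
    for _ in range(steps):
        weight = quantize(weight + update, frac_bits)
    return weight


# Hypothetical layouts (assumptions, not the SPERT formats):
# 12 fractional bits for a 16-bit weight, 28 fractional bits for a 32-bit weight.
w16 = accumulate(frac_bits=12)
w32 = accumulate(frac_bits=28)

print("16-bit style weight:", w16)
print("32-bit style weight:", w32)
```

Under these assumed settings the low-precision weight never leaves its starting value, because each update is below half of the 2^-12 step, whereas the higher-precision weight moves by roughly the expected total of 0.01.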
A speech training and recognition toolkit, compatible with existing
has been developed at FPMs.
CUED released a new version of AbbotDemo in September. The
consortium has decided to make the software developed in the framework
of WERNICKE and SPRACH available to the research community.
The general guidelines for the format of this progress report were
"short" and "precise". Theoretical results developed previously and
related to the approaches used in this work are not recalled in
the written documents (although they will probably be briefly recalled during
the formal presentation); only references to these theoretical results
are provided. In case of new results, general technical descriptions are
given in the technical section of the progress report
(section on "Technical Description" for each
task); a more detailed description is given in the Deliverables when necessary.
Technical reports and publications have been included as parts of
The outline for each workpackage write-up is the following:
WP Overview: List of workpackage manager and partners and short description
of the workpackage.
Milestones and Deliverables:
List of T0+12 Milestones and Deliverables
and pointers to the following sections.
For each Task x:
3.x.1 -- Task x.1: Task Objective
3.x.2 -- Task x.2: Status
3.x.3 -- Task x.3: Technical Description
3.x.4 -- Task x.4: Future Developments
Tue Jan 7 12:46:31 MET 1997