Running MBROLA

Installing MBROLA
Testing MBROLA
Command line format of MBROLA
Phonetic file format of MBROLA V2.04
Limitations of the program

Installing MBROLA
Create a directory named mbrola (the name is not critical),
copy the mbrXXX.zip file into it (where XXX stands for a version
number), and unzip the file :
unzip mbrXXX.zip (or pkunzip on PC/DOS)
You are now ready to synthesize your first French words.
First try
mbrola
to get a help screen on how to use the software.
Testing MBROLA
To go further, you need an MBROLA language/voice database
from the MBROLA project homepage. Let us assume you have
copied the FR1 database and followed the accompanying fr1.txt
file to install it. Then try :
mbrola fr1/fr1 bonjour.pho bonjour.wav
to create a sound file for the word 'bonjour'.
Command line format of MBROLA
mbrola diphone_database phonetic_file output_file
The phonetic_file format is described in the section "Phonetic file format of MBROLA V2.04".
The output file is composed of signed 16-bit integers, corresponding to samples at the sampling frequency of the MBROLA voice/language database (16 kHz for FR1, the diphone database supplied by the author of MBROLA). Since release 2.00b, MBROLA produces .au, .wav, .aiff, .aif, and .raw file formats depending on the output_file extension. If the extension is not recognized, the format is RAW.
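The extension rule above can be illustrated with a short Python sketch. It is not taken from the MBROLA sources: the function name guess_output_format and the format labels are hypothetical, and matching on the lower-cased extension is an assumption.

```python
import os

# Extensions listed in the text; anything else falls back to RAW.
# The label strings are illustrative names, not MBROLA identifiers.
KNOWN_FORMATS = {
    ".au": "AU",
    ".wav": "WAV",
    ".aiff": "AIFF",
    ".aif": "AIFF",
    ".raw": "RAW",
}


def guess_output_format(output_file: str) -> str:
    """Return the audio format implied by the output file extension."""
    # Assumption: the comparison ignores case.
    ext = os.path.splitext(output_file)[1].lower()
    return KNOWN_FORMATS.get(ext, "RAW")


print(guess_output_format("bonjour.wav"))  # WAV
print(guess_output_format("bonjour.xyz"))  # RAW (extension not recognized)
```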
Optional parameters let you shorten or lengthen synthetic speech and
transpose it by providing optional time and frequency ratios:
mbrola -t 1.2 -f 0.8 fr1/fr1 bonjour.pho bonjour.wav
for instance, will result in a RIFF Wav file bonjour.wav 1.2 times longer than
the previous one (slower rate), and containing speech in which all fundamental
frequency values have been multiplied by 0.8 (so the voice sounds deeper).
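When mbrola is driven from a script, the command line can be assembled programmatically. The Python sketch below uses only the -t and -f options described above. The helper names build_mbrola_command and scaled are hypothetical, and the second function merely restates the arithmetic of the two ratios.

```python
from typing import List, Optional, Tuple


def build_mbrola_command(database: str, pho_file: str, output_file: str,
                         time_ratio: Optional[float] = None,
                         freq_ratio: Optional[float] = None) -> List[str]:
    """Assemble an argument list; only -t and -f from the text are used."""
    cmd = ["mbrola"]
    if time_ratio is not None:
        cmd += ["-t", str(time_ratio)]
    if freq_ratio is not None:
        cmd += ["-f", str(freq_ratio)]
    cmd += [database, pho_file, output_file]
    return cmd


def scaled(duration_ms: float, pitch_hz: float,
           time_ratio: float, freq_ratio: float) -> Tuple[float, float]:
    """Duration and pitch one would expect after applying both ratios."""
    return duration_ms * time_ratio, pitch_hz * freq_ratio


print(build_mbrola_command("fr1/fr1", "bonjour.pho", "bonjour.wav", 1.2, 0.8))
```

The resulting list can be passed to subprocess.run, provided mbrola is on the PATH. With the ratios of the example, a 100 ms phone at 100 Hz would come out at about 120 ms and 80 Hz.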
A - instead of phonetic_file or output_file means stdin or stdout. On multitasking
machines, it is easy to run the synthesizer in real time to obtain audio
output from the audio device, by using pipes.
On Solaris, try :
mbrola fr1/fr1 bonjour.pho -.au | audioplay
On HP-UX, try :
mbrola fr1/fr1 bonjour.pho -.au | splayer
More generally, on a UN*X machine with SoX installed, try :
mbrola fr1/fr1 bonjour.pho -.au | sox -t au - -t raw -r 8000 -Ub - > /dev/audio
As these examples show, the format of stdout output can be forced. For example, "mbrola fr1/fr1 bonjour.pho -.wav"
writes the audio file to stdout with a RIFF Wav header. Note, however, that the length
of the audio file declared in the header is invalid, since the length is only known
once the whole sentence has been synthesized, and stdout cannot be rewound. Some audio
file manipulation programs cope with this; others do not.
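One way to work around the invalid header length is to capture headerless samples and add the header afterwards, once the total length is known. The following Python sketch does this with the standard wave module. It assumes mono, little-endian, 16-bit samples at 16 kHz (the FR1 rate quoted earlier); the function name wrap_raw_as_wav is hypothetical and is not part of MBROLA.

```python
import io
import wave


def wrap_raw_as_wav(raw_samples: bytes, sample_rate: int = 16000) -> bytes:
    """Wrap headerless 16-bit PCM in a WAV container with a correct length."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)            # assumption: mono output
        wav.setsampwidth(2)            # 16-bit samples, as stated in the text
        wav.setframerate(sample_rate)  # 16 kHz for FR1, per the text
        # Assumption: the samples are little-endian, as WAV requires.
        wav.writeframes(raw_samples)
    return buffer.getvalue()


# One second of silence stands in for synthesizer output here.
wav_bytes = wrap_raw_as_wav(b"\x00\x00" * 16000)
with wave.open(io.BytesIO(wav_bytes), "rb") as reader:
    print(reader.getnframes())  # 16000
```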
Phonetic file format of MBROLA V2.04
(For automatic production of pho files from recorded speech, see MBROLIGN.)
The input file bonjour.pho supplied as an example with FR1 simply contains :
_ 51 25 114
b 62
o~ 127 48 170
Z 110 53 116
u 211
R 150 50 91
_ 91
This shows the format of the input data required by MBROLA. Each
line contains a phoneme name, a duration (in ms), and a series
(possibly none) of pitch pattern points composed of two integer
numbers each : the position of the pitch pattern point within
the phoneme (in % of its total duration), and the pitch value
(in Hz) at this position.
Hence, the first line of bonjour.pho :
_ 51 25 114
tells the synthesizer to produce a silence of 51 ms, and to put
a pitch pattern point of 114 Hz at 25% of 51 ms. Pitch pattern
points define a piecewise linear pitch curve. Notice that the
pitch pattern they define is continuous, since the program automatically
drops pitch information when synthesizing unvoiced phones.
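To make the format concrete, here is a small Python sketch that parses such lines and converts the per-phone percentages into absolute times along the utterance. It is an illustration, not MBROLA's own parser: the function names are hypothetical, durations are assumed to be whole milliseconds as in the example, and comments and command lines are not handled.

```python
from typing import List, Tuple


def parse_pho_line(line: str) -> Tuple[str, int, List[Tuple[int, int]]]:
    """Split one line into phoneme name, duration (ms) and pitch points."""
    fields = line.split()  # blanks or tabs
    name, duration = fields[0], int(fields[1])
    values = [int(v) for v in fields[2:]]
    if len(values) % 2:
        raise ValueError("pitch points must come in (position %, Hz) pairs")
    points = list(zip(values[0::2], values[1::2]))
    return name, duration, points


def pitch_curve(lines: List[str]) -> List[Tuple[float, int]]:
    """Convert per-phone percentages into (absolute time in ms, Hz) points."""
    curve, start = [], 0
    for line in lines:
        _, duration, points = parse_pho_line(line)
        for percent, hz in points:
            curve.append((start + duration * percent / 100, hz))
        start += duration
    return curve


BONJOUR = ["_ 51 25 114", "b 62", "o~ 127 48 170", "Z 110 53 116",
           "u 211", "R 150 50 91", "_ 91"]
print(pitch_curve(BONJOUR)[0])  # (12.75, 114): 25% of 51 ms
```

Joining the returned points with straight lines gives the piecewise linear pitch curve described above.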
The data on each line is separated by blank characters or tabs. Comments
can optionally be introduced in command files, starting with a
semi-colon (;). A comment beginning with "T=ratio" or "F=ratio" changes
the time or frequency ratio respectively.
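A tool that pre-processes pho files may want to recognize these ratio comments. The Python sketch below shows one way to do so. The exact syntax accepted by mbrola is not spelled out here, so the sketch assumes that the ratio follows the leading semi-colon(s), optionally separated by blanks; parse_comment is a hypothetical helper.

```python
from typing import Optional, Tuple


def parse_comment(line: str) -> Optional[Tuple[str, float]]:
    """Return ("T", ratio) or ("F", ratio) for a ratio comment, else None."""
    stripped = line.strip()
    if not stripped.startswith(";"):
        return None  # not a comment line
    body = stripped.lstrip(";").strip()
    # Assumption: the comment body starts with T=<number> or F=<number>.
    for key in ("T", "F"):
        prefix = key + "="
        if body.startswith(prefix):
            try:
                return key, float(body[len(prefix):].split()[0])
            except (ValueError, IndexError):
                return None  # malformed ratio: treat as a plain comment
    return None


print(parse_comment("; T=1.2"))          # ('T', 1.2)
print(parse_comment("; just a remark"))  # None
```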
Notice, finally, that the synthesizer outputs chunks of synthetic
speech determined as sections of the piecewise linear pitch curve.
Phones inside a section of this curve are synthesized in one go.
The last one of each chunk, however, cannot be properly synthesized
until the next phone is known (since the program uses diphones
as base speech units). When using mbrola with pipes, this may
be a problem. Imagine, for instance, that mbrola is used to
create a pipe-based speaking clock on an HP :
speaking_clock | mbrola fr1/fr1 - -.au | splayer
which tells the time, say, every 30 seconds. The last phone of
each time announcement will only be synthesized when the next announcement
starts. To bypass this problem, mbrola accepts a special command
phone, which flushes the synthesis buffer : "#"
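A program feeding mbrola through a pipe could therefore end each announcement with the flush command. The Python sketch below illustrates the idea. It assumes that "#" is written alone on its own line, and write_announcement is a hypothetical helper, not part of MBROLA.

```python
import io
from typing import Iterable, TextIO


def write_announcement(pho_lines: Iterable[str], stream: TextIO) -> None:
    """Write one announcement, then ask the synthesizer to flush."""
    for line in pho_lines:
        stream.write(line.rstrip("\n") + "\n")
    stream.write("#\n")  # assumption: "#" stands alone on its own line
    stream.flush()       # also flush the pipe itself, not only the synthesizer


# Demonstration on an in-memory stream; a real clock would use sys.stdout.
demo = io.StringIO()
write_announcement(["_ 51 25 114", "b 62"], demo)
print(demo.getvalue(), end="")
```

In the speaking clock example, the clock program would call such a function once per announcement, writing to its standard output.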
Limitations of the program
- There may be up to 20 pitch pattern points in each phone,
although two or three are usually sufficient to copy natural
prosody. We have set a higher limit so as to enable the use
of MBROLA to produce synthetic singing voices, in which case
long vowels with vibrato may require a large number of pitch pattern
points.
- Phones can be synthesized with a maximum duration which depends
on the fundamental frequency with which they are produced. The
higher the frequency, the shorter the maximum duration. For a frequency
of 133 Hz, the maximum duration is 7.5 sec. For a frequency of
66.5 Hz, it is 15 sec. For a frequency of 266 Hz, it is 3.75 sec.
- Although pitch pattern points are optional, the synthesizer
will refuse to produce sequences of more than 250 phones with
no pitch information.
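These limits can be checked before a file is handed to the synthesizer. The Python sketch below is a rough pre-flight check, not MBROLA's own validation. All names are hypothetical. The inverse law for the maximum duration is an assumption inferred from the three figures above, which all satisfy duration (sec) x frequency (Hz) = 997.5, and the check only looks at the pitch points of the phone itself rather than at the interpolated curve.

```python
from typing import List, Tuple

MAX_PITCH_POINTS = 20    # per phone, from the text
MAX_UNPITCHED_RUN = 250  # phones in a row without pitch information
# Assumption: 133 * 7.5 = 66.5 * 15 = 266 * 3.75 = 997.5, so an inverse
# relation between frequency and maximum duration is assumed.
DURATION_TIMES_F0 = 997.5

Phone = Tuple[str, int, List[Tuple[int, int]]]  # name, duration ms, pitch points


def max_duration_s(f0_hz: float) -> float:
    """Maximum phone duration in seconds under the assumed inverse law."""
    return DURATION_TIMES_F0 / f0_hz


def check_limits(phones: List[Phone]) -> List[str]:
    """Return warnings; an empty list means no limit was hit."""
    warnings, unpitched = [], 0
    for index, (name, duration_ms, points) in enumerate(phones):
        if len(points) > MAX_PITCH_POINTS:
            warnings.append(f"phone {index} ({name}): more than 20 pitch points")
        unpitched = 0 if points else unpitched + 1
        if unpitched > MAX_UNPITCHED_RUN:
            warnings.append(f"phone {index} ({name}): over 250 phones without pitch")
            unpitched = 0  # report once per run, then start counting again
        for _, hz in points:
            if duration_ms / 1000 > max_duration_s(hz):
                warnings.append(f"phone {index} ({name}): too long for {hz} Hz")
                break
    return warnings


print(check_limits([("a", 8000, [(50, 133)])]))  # 8 sec exceeds 7.5 sec at 133 Hz
```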

Last updated December 17, 1999. Send comments to the MBROLA Team.