Running MBROLA

Installing MBROLA

Testing MBROLA

Command line format of MBROLA

Phonetic file format of MBROLA V2.04

Limitations of the program

Installing MBROLA

Create a directory, for example mbrola (the name is not critical), copy the distribution file into it (XXX in the file name stands for a version number), and unzip the file:

unzip (or pkunzip on PC/DOS)

You are now ready to synthesize your first French words.

First try


to get a help screen on how to use the software

Testing MBROLA

To go further, you need an MBROLA language/voice database from the MBROLA project homepage. Let us assume you have copied the FR1 database and followed the accompanying fr1.txt file to install it. Then try:

mbrola fr1/fr1 bonjour.pho bonjour.wav

to create a sound file for the word 'bonjour'.

Command line format of MBROLA

mbrola diphone_database phonetic_file output_file

The phonetic_file format is described in the section "Phonetic file format of MBROLA V2.04".

The output file consists of signed 16-bit integer samples at the sampling frequency of the MBROLA voice/language database (16 kHz for Fr1, the diphone database supplied by the author of MBROLA). Since release 2.00b, MBROLA produces .au, .wav, .aiff, .aif, and .raw file formats depending on the output_file extension. If the extension is not recognized, the format is RAW.
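The extension rule above can be mirrored in a few lines of Python, for example to predict which format a given output name will yield before calling the synthesizer. This is only a sketch: the function name and the format labels are hypothetical, and lower-casing the extension is an assumption (the text does not say whether MBROLA matches extensions case-sensitively).

```python
import os

# Extensions listed in the text; the labels on the right are illustrative only.
KNOWN_FORMATS = {
    ".au": "AU",
    ".wav": "WAV",
    ".aiff": "AIFF",
    ".aif": "AIFF",
    ".raw": "RAW",
}


def guess_output_format(output_file: str) -> str:
    """Return the format implied by the output file extension, RAW if unknown."""
    ext = os.path.splitext(output_file)[1].lower()  # lower-casing is an assumption
    return KNOWN_FORMATS.get(ext, "RAW")
```

For instance, guess_output_format("bonjour.wav") gives "WAV", while a name with no extension or an unlisted one falls back to "RAW".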

Optional parameters let you shorten or lengthen synthetic speech and transpose it by providing optional time and frequency ratios:

mbrola -t 1.2 -f 0.8 fr1/fr1 bonjour.pho bonjour.wav

for instance, produces a RIFF WAV file bonjour.wav that is 1.2 times longer than the previous one (slower speech rate) and in which all fundamental frequency values have been multiplied by 0.8 (the voice sounds deeper).
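As a rough illustration of what the two ratios do, the sketch below builds the argument list for the command shown above and applies the same ratios to a single duration/pitch pair. The helper names are hypothetical, and treating the ratios as plain multipliers on each duration and pitch value is a simplification of what the text describes for the utterance as a whole.

```python
def build_command(database, pho_file, out_file, time_ratio=None, freq_ratio=None):
    """Assemble an mbrola argument list using the -t and -f options from the text."""
    argv = ["mbrola"]
    if time_ratio is not None:
        argv += ["-t", str(time_ratio)]
    if freq_ratio is not None:
        argv += ["-f", str(freq_ratio)]
    argv += [database, pho_file, out_file]
    return argv


def apply_ratios(duration_ms, pitch_hz, time_ratio=1.0, freq_ratio=1.0):
    """Scale one duration and one pitch value the way -t and -f are described."""
    return duration_ms * time_ratio, pitch_hz * freq_ratio


# A 51 ms phone with a 114 Hz pitch point, under -t 1.2 -f 0.8
scaled_duration, scaled_pitch = apply_ratios(51, 114, time_ratio=1.2, freq_ratio=0.8)
```

With these ratios the 51 ms phone lasts about 61.2 ms and the 114 Hz pitch point drops to about 91.2 Hz.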

A "-" in place of phonetic_file or output_file means stdin or stdout, respectively. On multitasking machines, it is easy to run the synthesizer in real time and send its output to the audio device by using pipes.

On Solaris, try:

mbrola fr1/fr1 bonjour.pho -.au | audioplay

On HP-UX, try:

mbrola fr1/fr1 bonjour.pho - | splayer

Generally, on a UN*X machine, provided you have SoX, try:

mbrola fr1/fr1 bonjour.pho -.au | sox -t au - -t raw -r 8000 -Ub - > /dev/audio

The format of stdout output can be forced by appending an extension to the "-". For example, "mbrola fr1/fr1 bonjour.pho -.wav" writes the audio to stdout with a RIFF WAV header. Note, however, that the length declared in the header is then invalid (some audio programs tolerate this, others do not): the length is only known once the whole sentence has been synthesized, and stdout cannot be rewound.

Phonetic file format of MBROLA V2.04

(For automatic production of pho files from recorded speech please check out MBROLIGN).

The input file bonjour.pho supplied as an example with FR1 simply contains :

_ 51 25 114 
b 62 
o~ 127 48 170 
Z 110 53 116 
u 211 
R 150 50 91 
_ 91

This shows the format of the input data required by MBROLA. Each line contains a phoneme name, a duration (in ms), and a series (possibly empty) of pitch pattern points, each composed of two integers: the position of the pitch pattern point within the phoneme (in % of its total duration) and the pitch value (in Hz) at this position.

Hence, the first line of bonjour.pho :

_ 51 25 114

tells the synthesizer to produce a silence of 51 ms, and to put a pitch pattern point of 114 Hz at 25% of 51 ms. Pitch pattern points define a piecewise linear pitch curve. Note that the curve they define is continuous, even across unvoiced phones: the program simply drops the pitch information when synthesizing those phones.
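To make the piecewise linear curve concrete, the following sketch converts the pitch pattern points of bonjour.pho to absolute times and interpolates between them. It is an illustration of the description above, not MBROLA's own code: the function names are hypothetical, and holding the first and last pitch values constant outside the range of the points is an assumption.

```python
from bisect import bisect_right

# (phoneme, duration_ms, [(position_percent, pitch_hz), ...]) copied from bonjour.pho
BONJOUR = [
    ("_", 51, [(25, 114)]),
    ("b", 62, []),
    ("o~", 127, [(48, 170)]),
    ("Z", 110, [(53, 116)]),
    ("u", 211, []),
    ("R", 150, [(50, 91)]),
    ("_", 91, []),
]


def absolute_pitch_points(phones):
    """Turn per-phone (percent, Hz) points into (time_ms, Hz) points."""
    points = []
    start = 0.0
    for _name, duration, pattern in phones:
        for percent, hz in pattern:
            points.append((start + duration * percent / 100.0, float(hz)))
        start += duration
    return points


def pitch_at(points, t_ms):
    """Linearly interpolate the pitch at t_ms.

    Outside the first/last point the nearest value is held (an assumption).
    """
    if not points:
        raise ValueError("no pitch pattern points")
    times = [t for t, _ in points]
    i = bisect_right(times, t_ms)
    if i == 0:
        return points[0][1]
    if i == len(points):
        return points[-1][1]
    (t0, f0), (t1, f1) = points[i - 1], points[i]
    return f0 + (f1 - f0) * (t_ms - t0) / (t1 - t0)


POINTS = absolute_pitch_points(BONJOUR)
```

The first point lands at 12.75 ms (25% of 51 ms), and the second at about 173.96 ms (51 + 62 + 48% of 127 ms); halfway between them the interpolated pitch is about 142 Hz.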

The data on each line is separated by blank characters or tabs. Comments can optionally be introduced in command files, starting with a semicolon (;). A comment beginning with "T=ratio" or "F=ratio" changes the time or frequency ratio, respectively.
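The line format described above is simple enough to read with a short parser. The sketch below is a hypothetical helper, not part of MBROLA: it splits on whitespace, discards everything after a semicolon, and does not interpret "T=ratio" or "F=ratio" comments. Reading durations as floats and pitch points as integers is an assumption based on the wording above.

```python
def parse_pho(text):
    """Parse phonetic-file text into (phoneme, duration_ms, [(percent, hz), ...])."""
    phones = []
    for raw in text.splitlines():
        line = raw.split(";", 1)[0].strip()  # drop comments and surrounding blanks
        if not line:
            continue
        fields = line.split()  # blanks or tabs
        if len(fields) < 2:
            raise ValueError(f"missing duration: {raw!r}")
        values = fields[2:]
        if len(values) % 2 != 0:
            raise ValueError(f"incomplete pitch pattern point: {raw!r}")
        pattern = [(int(values[i]), int(values[i + 1])) for i in range(0, len(values), 2)]
        phones.append((fields[0], float(fields[1]), pattern))
    return phones


BONJOUR_PHO = """\
; bonjour.pho as shown above
_ 51 25 114
b 62
o~ 127 48 170
Z 110 53 116
u 211
R 150 50 91
_ 91
"""

PHONES = parse_pho(BONJOUR_PHO)
```

Parsing the example yields seven entries whose durations add up to 802 ms.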

Notice, finally, that the synthesizer outputs chunks of synthetic speech that correspond to sections of the piecewise linear pitch curve. Phones inside a section of this curve are synthesized in one go. The last phone of each chunk, however, cannot be properly synthesized until the next phone is known (since the program uses diphones as base speech units). When using mbrola with pipes, this may be a problem. Imagine, for instance, that mbrola is used to create a pipe-based speaking clock on an HP:

speaking_clock | mbrola - | splayer

which tells the time, say, every 30 seconds. The last phone of each time announcement will only be synthesized when the next announcement starts. To bypass this problem, mbrola accepts a special command phone, "#", which flushes the synthesis buffer.
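A producer such as the speaking clock could emit the flush command after every announcement, along the lines of the sketch below. The function name is hypothetical, and writing "#" alone on its own line is an assumption about how the command phone is entered. The call to stream.flush() concerns Python's own output buffering on the pipe, which is separate from the synthesis buffer inside mbrola.

```python
import io
import sys


def write_announcement(pho_lines, stream=sys.stdout):
    """Write one announcement, then the "#" command phone to flush the synthesizer."""
    for line in pho_lines:
        stream.write(line.rstrip("\n") + "\n")
    stream.write("#\n")  # flush command phone (own-line placement is an assumption)
    stream.flush()       # push the data through the pipe right away


# Usage: capture the output in memory instead of piping it to mbrola
demo = io.StringIO()
write_announcement(["_ 51 25 114", "b 62"], demo)
```

After the call, demo holds the two phone lines followed by a line containing only "#".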

Limitations of the program

  1. There may be up to 20 pitch pattern points in each phone, although two or three are usually sufficient to copy natural prosody. We have set a higher limit so that MBROLA can be used to produce synthetic singing voices, in which case long vowels with vibrato may require a large number of pitch pattern points.
  2. Phones can be synthesized with a maximum duration which depends on the fundamental frequency with which they are produced. The higher the frequency, the shorter the maximum duration. For a frequency of 133 Hz, the maximum duration is 7.5 sec. For a frequency of 66.5 Hz, it is 15 sec. For a frequency of 266 Hz, it is 3.75 sec.
  3. Although pitch pattern points are optional, the synthesizer will refuse to produce sequences of more than 250 phones with no pitch information.
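The three limits can be checked before a file is sent to the synthesizer. In the sketch below, the constant 997.5 is inferred from the three examples in item 2 (133 x 7.5 = 66.5 x 15 = 266 x 3.75 = 997.5); it is not a documented value, and the assumption that the limit is inversely proportional to frequency at other pitches is only an extrapolation. All names are hypothetical.

```python
MAX_PITCH_POINTS = 20           # limit 1
MAX_PHONES_WITHOUT_PITCH = 250  # limit 3
# Inferred from the examples in limit 2: frequency (Hz) x max duration (s) = 997.5
DURATION_TIMES_F0 = 997.5


def max_duration_s(f0_hz):
    """Estimated maximum phone duration in seconds at a given fundamental frequency."""
    return DURATION_TIMES_F0 / f0_hz


def check_limits(phones):
    """Return a list of problems for (phoneme, duration_ms, [(percent, hz), ...]) entries."""
    problems = []
    run_without_pitch = 0
    for index, (name, _duration_ms, pattern) in enumerate(phones):
        if len(pattern) > MAX_PITCH_POINTS:
            problems.append(f"phone {index} ({name}): {len(pattern)} pitch pattern points")
        if pattern:
            run_without_pitch = 0
        else:
            run_without_pitch += 1
            # Report each overlong run once, when it first exceeds the limit
            if run_without_pitch == MAX_PHONES_WITHOUT_PITCH + 1:
                problems.append(f"phone {index} ({name}): more than 250 phones with no pitch")
    return problems
```

For example, max_duration_s(133) gives 7.5, matching the figure quoted in item 2, and a sequence of 251 phones without any pitch pattern point is reported once.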

Last updated December 17, 1999. Send comments to the MBROLA Team.