MULTI-LINGUAL PROSODIC TRANSPLANTATION TOOL


Contents

Terms and conditions for the distribution of the program

Terms and conditions on the use of the program

Disclaimer

Download, installation

How to align speech?

How to modify the phonetic alignment and the pitch curve?

How to create or modify a phonetic transcription?

How to listen to a speech signal (original or synthetic)?

How to zoom in on a specific region?

How to zoom out?

How to change the diphone database?

File types


License

This program is being provided to "you", the licensee, by Fabrice Malfrère, the "author", under the following license, which applies to any program or other work which contains a notice placed by the copyright holder saying it may be distributed under the terms of this license. The "program", below, refers to any such program or work.

By obtaining, using and/or copying this program, you agree that you have read, understood, and will comply with these terms and conditions.

Terms and conditions for the distribution of the program

This program may not be sold or incorporated into any product which is sold without prior permission from the author.

When no charge is made, this program may be copied and distributed freely, provided that this notice is copied and distributed with it. Each time you redistribute the program (or any work based on the program), the recipient automatically receives a license from the original licenser to copy or distribute the program subject to these terms and conditions. You may not impose any further restrictions on the recipients' exercise of the rights granted herein. You are not responsible for enforcing compliance by third parties to this license. If you wish to incorporate the program into other free programs whose distribution conditions are different, write to the author to ask for permission.

If, as a consequence of a court judgment or allegation of patent infringement or for any other reason (not limited to patent issues), conditions are imposed on you (whether by court order, agreement or otherwise) that contradict the conditions of this license, they do not excuse you from the conditions of this license. If you cannot distribute so as to satisfy simultaneously your obligations under this license and any other pertinent obligations, then as a consequence you may not distribute the program at all. For example, if a patent license would not permit royalty free redistribution of the program by all those who receive copies directly or indirectly through you, then the only way you could satisfy both it and this license would be to refrain entirely from distribution of the program.

Terms and conditions on the use of the program

Permission is granted to use this software for non-commercial, non-military purposes. In return, the author asks you to mention the MBROLIGN reference paper in any scientific publication referring to work for which this program has been used:

"High Quality Speech Synthesis for Phonetic Speech Segmentation", F. Malfrère & T. Dutoit, Proceedings of the European Conference on Speech Communication and Technology, pp. 2631-2634, 1997.

Disclaimer

THIS SOFTWARE CARRIES NO WARRANTY, EXPRESSED OR IMPLIED. THE USER ASSUMES ALL RISKS, KNOWN OR UNKNOWN, DIRECT OR INDIRECT, WHICH INVOLVE THIS SOFTWARE IN ANY WAY. IN PARTICULAR, THE AUTHOR DOES NOT TAKE ANY COMMITMENT IN VIEW OF ANY POSSIBLE THIRD PARTY RIGHTS.



NEW in MBROLIGN 1.1

Brief description of the MBROLIGN software

MBROLIGN v.1.0 is a prosody transplantation tool based on the MBROLA speech synthesizer (http://tcts.fpms.ac.be/synthesis/mbrola.html).

It takes as input a .wav (Windows format) sound file sampled at 16 kHz with 16-bit samples, together with its phonetic transcription in the SAMPA alphabet. The system temporally aligns the phonetic transcription with the speech signal (the .wav file) and generates a .pho file, which MBROLA can use to produce natural-sounding synthetic speech. The alignment approach is described in "An Alignment System for Prosodic Parameter Extraction of a French Text" (Malfrère & Dutoit, 1997) (see References).
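The two file formats mentioned above can be illustrated with a short Python sketch using only the standard library. `check_input_wav` tests the stated input requirements (16 kHz sampling rate, 16-bit samples) with the `wave` module, and `format_pho` writes aligned segments in the MBROLA .pho layout: one phoneme per line, followed by its duration in milliseconds and optional pairs of (position in % of the phoneme, F0 in Hz). Both function names and the segment tuple layout are hypothetical; this is not MBROLIGN's own code.

```python
import wave


def check_input_wav(path):
    """Return True if the file is sampled at 16 kHz with 16-bit samples."""
    with wave.open(path, "rb") as w:
        return w.getframerate() == 16000 and w.getsampwidth() == 2


def format_pho(segments):
    """Format (phoneme, duration_ms, [(percent, f0_hz), ...]) tuples as .pho text.

    Each output line is: phoneme duration [percent f0]...
    """
    lines = []
    for phoneme, duration_ms, targets in segments:
        fields = [phoneme, str(round(duration_ms))]
        for percent, f0 in targets:
            # Pitch target: position inside the phoneme (%) and F0 (Hz).
            fields += [str(round(percent)), str(round(f0))]
        lines.append(" ".join(fields))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(format_pho([("_", 50, []), ("b", 62, [(50, 120)]),
                      ("o", 127, [(0, 125), (100, 110)])]))
```

A silence (`_`) with no pitch target simply gets a duration; voiced phonemes can carry one or more targets.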

A more detailed description of MBROLIGN v.1.0 is given in "How to use MBROLIGN?".


Distribution

The distribution of MBROLIGN v.1.0. contains the following files:

If you want to use the software, download the MBROLA databases you need to perform the alignment for a specific language. You may download several databases from the MBROLA homepage:

http://tcts.fpms.ac.be/synthesis.





Download, installation

NEW: the updated version, MBROLIGN v.1.1, is designed for Windows 95, NT, 2000 and XP.

  1. Download the MbrolaTools.exe file and install MBROLA.
  2. Install the MBROLA databases you need; the available databases can be found at http://tcts.fpms.ac.be/synthesis .
  3. Download the MBROLIGN_INSTALL.exe file.
  4. Run the setup program.

    How to … ?
     
     

    How to align speech?

    First, select the speech file you want to align (File > Open), then select the Mbrolign option in the main menu to perform the alignment. If there is no phonetic transcription (a .txt or .pho file associated with the .wav file), the system asks you to enter one (see "How to create or modify a phonetic transcription?"). Once the transcription is entered, the system first determines the fundamental frequency (F0) curve with an autocorrelation method (a more precise algorithm, MBE, is available in the commercial version). This curve is then stylized with a piecewise-linear approximation (other methods are available in the commercial version). The segmentation step is then performed following the approach described in "High Quality Speech Synthesis for Phonetic Speech Segmentation". The results of the alignment then appear on the screen.
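The autocorrelation idea behind the F0 extraction step can be sketched in a few lines of Python: within one analysis frame, the lag that maximizes the normalized autocorrelation inside a plausible pitch range is taken as the period, and F0 is the sampling rate divided by that lag. This is a generic textbook version, not MBROLIGN's actual extractor; the 60-400 Hz search range, the 0.3 voicing threshold and the function name are assumptions chosen for illustration.

```python
import math


def autocorr_f0(frame, sample_rate=16000, f0_min=60.0, f0_max=400.0,
                voicing_threshold=0.3):
    """Estimate the F0 (Hz) of one frame by autocorrelation; 0.0 means unvoiced."""
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]                    # remove any DC offset
    energy = sum(s * s for s in x)                   # lag-0 autocorrelation
    if energy == 0.0:
        return 0.0
    lag_min = int(sample_rate / f0_max)              # shortest period searched
    lag_max = min(int(sample_rate / f0_min), n - 1)  # longest period searched
    best_lag, best_r = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        # Autocorrelation at this lag, normalized by the frame energy.
        r = sum(x[i] * x[i + lag] for i in range(n - lag)) / energy
        if r > best_r:
            best_lag, best_r = lag, r
    if best_r < voicing_threshold:
        return 0.0                                   # too little periodicity
    return sample_rate / best_lag


if __name__ == "__main__":
    # 40 ms of a 200 Hz sine wave sampled at 16 kHz.
    tone = [math.sin(2 * math.pi * 200 * i / 16000) for i in range(640)]
    print(autocorr_f0(tone))
```

Running it frame by frame over the signal yields the raw (red) F0 curve that is then stylized.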

    To create the synthetic speech, choose Synthesize in the main menu.
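Outside the graphical interface, a .pho file can also be rendered with the MBROLA command-line synthesizer, which takes a diphone database, an input .pho file and an output file (`mbrola database input.pho output.wav`). The Python sketch below only builds and runs that command with `subprocess`; it assumes an `mbrola` executable is on the PATH, the function names are hypothetical, and it is not how MBROLIGN itself calls MBROLA.

```python
import shutil
import subprocess


def mbrola_command(database, pho_path, wav_path, executable="mbrola"):
    """Argument list for: mbrola <database> <input.pho> <output.wav>."""
    return [executable, database, pho_path, wav_path]


def synthesize(database, pho_path, wav_path, executable="mbrola"):
    """Run MBROLA on a .pho file; return False if it is missing or fails."""
    if shutil.which(executable) is None:
        return False                      # no MBROLA executable on the PATH
    cmd = mbrola_command(database, pho_path, wav_path, executable)
    return subprocess.run(cmd).returncode == 0
```

For example, `synthesize("fr1/fr1", "test1.pho", "out.wav")`, where `fr1/fr1` is a placeholder for the path of whichever installed database you use.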

    The main frame of MBROLIGN then shows, from top to bottom: the original speech signal, the phonetic transcription time-aligned with the speech, the corresponding synthetic speech signal and the pitch curve (in red, the curve output by the pitch extractor; in blue, the stylized curve).

    How to modify the phonetic alignment and the pitch curve?

    Once the alignment is done, you can correct any mistakes with the mouse.

    Modify the phonetic alignment:

    To modify the phonetic alignment, drag and drop the boundary you want to move (right button). To insert a new phonetic label, double-click with the right button between two existing labels; a dialog box appears (with the default phoneme _) in which you enter the new label. To change the name of a label, double-click with the right button near that label; the dialog box then appears with the label to change. Finally, to delete a label, click Delete in the dialog box.
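Behind these mouse operations, an alignment can be modelled as an ordered list of labels plus one more boundary time than there are labels. The hypothetical Python class below (not MBROLIGN's internal data structure) mirrors the four edits described above: moving a boundary is clamped between its neighbours, and inserting a label splits a segment using the default name _. Deletion merges the freed time into the previous segment; this is an assumption, since the manual does not say where that time goes.

```python
class Alignment:
    """Labels with boundary times (ms): label i spans boundaries[i]..boundaries[i + 1]."""

    def __init__(self, labels, boundaries):
        if len(boundaries) != len(labels) + 1:
            raise ValueError("need exactly one more boundary than labels")
        self.labels = list(labels)
        self.boundaries = list(boundaries)

    def move_boundary(self, i, new_time):
        """Drag inner boundary i, clamped between its two neighbours."""
        lo, hi = self.boundaries[i - 1], self.boundaries[i + 1]
        self.boundaries[i] = min(max(new_time, lo), hi)

    def insert_label(self, time, name="_"):
        """Split the segment containing `time`; the new label starts there."""
        for i in range(len(self.labels)):
            if self.boundaries[i] < time < self.boundaries[i + 1]:
                self.labels.insert(i + 1, name)
                self.boundaries.insert(i + 1, time)
                return True
        return False                      # time falls on a boundary or outside

    def rename_label(self, i, name):
        self.labels[i] = name

    def delete_label(self, i):
        """Remove label i; its span goes to the previous segment (assumption),
        or to the next one when deleting the first label."""
        del self.labels[i]
        del self.boundaries[max(i, 1)]
```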

    Insertion of a new label (default _ )

    Modification or deletion of a label (d in this case)

    Modify the stylized pitch curve:

    Only the stylized pitch curve (in blue) can be modified. To modify it, place the mouse pointer in the pitch curve frame where you want to change the curve, then press and hold the left button to select and move the curve. Release the button to fix the curve in place.
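One way to picture the stylized curve being edited here is as a short list of (time, F0) targets joined by straight lines, so that dragging the curve changes one target's F0. The Python sketch below builds such targets from a raw F0 track with a recursive split in the spirit of the Douglas-Peucker algorithm, interpolates between them, and moves a target. The manual only states that the stylization is piecewise-linear; the fitting method, the 5 Hz tolerance and all function names are assumptions.

```python
def stylize(times, f0, tolerance_hz=5.0):
    """Approximate an F0 track by (time, F0) targets joined by straight lines.

    Recursive split: keep the sample farthest from the current line segment
    until every sample lies within tolerance_hz of the piecewise-linear curve.
    Times must be strictly increasing.
    """
    if len(times) < 2:
        return list(zip(times, f0))

    def split(lo, hi):
        worst, worst_err = None, tolerance_hz
        for i in range(lo + 1, hi):
            t = (times[i] - times[lo]) / (times[hi] - times[lo])
            err = abs(f0[i] - (f0[lo] + t * (f0[hi] - f0[lo])))
            if err > worst_err:
                worst, worst_err = i, err
        if worst is None:
            return [lo]                   # segment lo..hi is close enough
        return split(lo, worst) + split(worst, hi)

    keep = split(0, len(times) - 1) + [len(times) - 1]
    return [(times[i], f0[i]) for i in keep]


def interpolate(targets, t):
    """F0 at time t, by linear interpolation between the surrounding targets."""
    if t <= targets[0][0]:
        return targets[0][1]
    for (t0, fa), (t1, fb) in zip(targets, targets[1:]):
        if t <= t1:
            return fa + (t - t0) / (t1 - t0) * (fb - fa)
    return targets[-1][1]


def move_target(targets, index, new_f0):
    """Mimic dragging one point of the stylized curve to a new F0 value."""
    moved = list(targets)
    moved[index] = (moved[index][0], new_f0)
    return moved
```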

    How to create or modify a phonetic transcription?

    To create or modify the phonetic transcription of the speech signal, select Transcription in the main menu; the phonetic transcription dialog box appears.

    Phonetic transcription dialog box (no initial transcription)

    Phonetic transcription dialog box (with initial transcription)

    Copy and paste can also be used to create, edit or modify a phonetic transcription.

    How to listen to a speech signal (original or synthetic)?

    To listen to the original or the synthetic speech, place the mouse pointer in the corresponding speech signal frame and select the part you want to hear with the right mouse button: press and hold the right button (start of the region), move the mouse across the window (extent of the region) and release the button to play the selected region. Double-click with the right button to play the whole part of the signal currently displayed.
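The select-then-play interaction boils down to cutting a time interval out of the signal. As an illustration outside the GUI, this hypothetical Python function uses the standard `wave` module to copy the samples between two times into a new .wav file, which can then be played with any audio player; it is not part of MBROLIGN.

```python
import wave


def extract_region(src, dst, t0, t1):
    """Copy the samples between times t0 and t1 (seconds, any order) into dst."""
    with wave.open(src, "rb") as w:
        rate = w.getframerate()
        n = w.getnframes()
        # Convert times to frame indices, clipped to the signal length.
        first = min(max(0, int(min(t0, t1) * rate)), n)
        last = min(max(first, int(max(t0, t1) * rate)), n)
        w.setpos(first)
        frames = w.readframes(last - first)
        params = w.getparams()
    with wave.open(dst, "wb") as out:
        out.setparams(params)             # same channels, sample width and rate
        out.writeframes(frames)           # header frame count is patched on write
```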

    How to zoom in on a specific region?

    The main part of the MBROLIGN interface is composed of four windows: the original speech signal, the phonetic transcription, the synthetic speech signal and the pitch curve.

    All these windows are synchronized with each other, and there are several ways to navigate and zoom in the signal. First, you can use the scrollbar at the top of the main window; the scrollbar parameters are defined in the ZOOM section of mbrolign.ini. ALT+ (towards the end) and ALT- (towards the beginning) can also be used to move through the signals, with the same parameters as the scrollbar. Finally, you can select the region you want to zoom in on in any of the windows with the left mouse button: press and hold the left button (start of the region), move the mouse across the window (extent of the region) and release the button to zoom in on the selected region.

    How to zoom out?

    To zoom out to the full signal, double-click with the left mouse button in any of the windows (original speech, phonetic transcription, synthetic speech or pitch curve).
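Because the four windows are synchronized, scrolling and zooming can be modelled as a single visible time interval shared by all of them. In this hypothetical Python sketch, the scroll step is passed as a plain number of seconds; in MBROLIGN the scrollbar parameters come from the ZOOM section of mbrolign.ini, whose keys are not documented here.

```python
class View:
    """Visible interval [start, end] (seconds) shared by all synchronized windows."""

    def __init__(self, duration):
        self.duration = duration
        self.start, self.end = 0.0, duration

    def scroll(self, step):
        """Shift the view by `step` seconds (positive: towards the end, like ALT+)."""
        width = self.end - self.start
        self.start = min(max(self.start + step, 0.0), self.duration - width)
        self.end = self.start + width

    def zoom_to(self, t0, t1):
        """Zoom in on a mouse selection, clipped to the signal."""
        t0, t1 = sorted((t0, t1))
        self.start, self.end = max(t0, 0.0), min(t1, self.duration)

    def zoom_out(self):
        """Show the full signal again (left-button double-click)."""
        self.start, self.end = 0.0, self.duration
```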

    How to change the diphone database?

    To select another diphone database, use the Database option in the main menu. The Database dialog box appears and lets you choose among the databases listed in the mbrolign.ini file.

    File types

    Output files: all output files are stored in the directory specified in the mbrolign.ini file. There are two outputs: a .pho file with the same name as the original speech file, containing the results of the alignment, and a synthetic speech file with the same name as the original plus "s" (e.g. original = test1.wav, .pho = test1.pho, synthetic = test1s).
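The naming convention above can be expressed as a small Python helper. The output directory is passed as an argument here (in MBROLIGN it is read from mbrolign.ini), and since the manual does not state the synthetic file's extension, the helper returns only its name without one; the function name is hypothetical.

```python
from pathlib import Path


def output_paths(wav_path, output_dir):
    """Return (pho_path, synthetic_stem) for an input speech file.

    The .pho file keeps the original name; the synthetic speech file uses the
    original name plus "s" (no extension is added here).
    """
    stem = Path(wav_path).stem            # "test1" for "test1.wav"
    out = Path(output_dir)
    return out / (stem + ".pho"), out / (stem + "s")
```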

    Acknowledgments

    I would like to thank Thierry Dutoit for his support and his interest during the development of this Text-to-Speech alignment system.

    I would also like to thank Vincent Pagel, Michel Bagein and Alain Ruelle for their comments on the MBROLIGN interface.

    I am also grateful to all the members of the TCTS Lab of the Faculté Polytechnique de Mons who tried and evaluated MBROLIGN. Special thanks to Olivier Deroo and Vincent Pagel for their intensive testing.

    Finally, I thank the FRIA (Fonds pour la Formation à la Recherche dans l'Industrie et dans l'Agriculture) for its financial support.
     
     


    MBROLIGN mailing list

    To subscribe to the MBROLIGN mailing list, send an email to mbrolign-request@tcts.fpms.ac.be with "subscribe" as the subject.

    To send a message to the list: mbrolign@tcts.fpms.ac.be


    References


    Contacting the author

    Fabrice Malfrère

    Faculté Polytechnique de Mons,

    Circuit Theory and Signal Processing Lab,

    31, Boulevard Dolez, B-7000 Mons, Belgium.

    Tel: + 32.65.37.41.33

    Fax: + 32.65.37.41.29

    WWW: http://tcts.fpms.ac.be/~malfrere

    E-mail: malfrere@tcts.fpms.ac.be, for general information and questions about installing the software.