The ID3Phonetizer is a GNU generic phonetizer based on decision trees trained with the ID3 algorithm, as implemented (and made available) in the MBRDICO project. This module computes the phonetization of each Word in the Word layer, on the basis of its spelling and part-of-speech (which must be available in the Word).
The tree file must be declared like this:
An optional (z-score based) phoneme duration file can be provided:
It is also possible to provide a tag conversion file for converting part-of-speech tags in Word items into the tags used in tree_file_name:
The -setLower or -setUpper flag can be used to convert each word to lower or upper case, to match the case used in the tree features.
ID3Phonet = ID3Phonetizer.dll -ID3File french.tree -zsFile fr.zs -tagFile tags.rul -setLower
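As a rough illustration of how the flags in the declaration above fit together, the following Python sketch splits such a line into a module name, a library, and its options, then applies the case flag to a word. The helpers `parse_declaration` and `apply_case` are hypothetical and not part of the module; only the flag names are taken from the example line, and the real module may read its declaration differently.

```python
import shlex

# Flags that take a file name, as seen in the example declaration.
FILE_FLAGS = {"-ID3File", "-zsFile", "-tagFile"}
# Flags that stand alone.
SWITCH_FLAGS = {"-setLower", "-setUpper"}


def parse_declaration(line):
    """Split 'Name = Library.dll -flag value ...' into its parts (hypothetical helper)."""
    name, _, rest = line.partition("=")
    tokens = shlex.split(rest)
    if not tokens:
        raise ValueError("missing library name")
    library, args = tokens[0], tokens[1:]
    options = {}
    i = 0
    while i < len(args):
        flag = args[i]
        if flag in FILE_FLAGS:
            if i + 1 >= len(args):
                raise ValueError(f"{flag} needs a file name")
            options[flag] = args[i + 1]
            i += 2
        elif flag in SWITCH_FLAGS:
            options[flag] = True
            i += 1
        else:
            raise ValueError(f"unknown flag: {flag}")
    return name.strip(), library, options


def apply_case(word, options):
    """Convert a word to the case the tree expects, if a case flag is set."""
    if options.get("-setLower"):
        return word.lower()
    if options.get("-setUpper"):
        return word.upper()
    return word


if __name__ == "__main__":
    decl = ("ID3Phonet = ID3Phonetizer.dll -ID3File french.tree "
            "-zsFile fr.zs -tagFile tags.rul -setLower")
    name, library, options = parse_declaration(decl)
    print(name, library, options)
    print(apply_case("Bonjour", options))
```

With -setLower set, a capitalized word is lowered before it is looked up, so the spelling matches a tree trained on lower-case entries.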
The .tree and the .zs files can be obtained with the tools made available in the MBRDICO project. More information on their format can be found there.
For an example of a tag conversion file, see euler/databases/tags_pm.rul
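The conversion step itself amounts to a lookup from the tags found in Word items to the tag set used by the tree. A minimal Python sketch, assuming an in-memory table: the actual file format is not described here, and the tag names, the function `convert_tag`, and the pass-through of unknown tags are all invented for illustration.

```python
# Invented tag table: Word-layer tags on the left, tree tags on the right.
TAG_TABLE = {
    "NOM": "NOUN",
    "VER": "VERB",
    "PUNCT": "SILENT",  # punctuation mapped to the SILENT keyword
}


def convert_tag(pos, table=TAG_TABLE):
    """Map a Word tag to a tree tag; unknown tags pass through unchanged (assumption)."""
    return table.get(pos, pos)


if __name__ == "__main__":
    for tag in ("NOM", "PUNCT", "ADJ"):
        print(tag, "->", convert_tag(tag))
```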
ID3Phonetizer creates a Phoneme layer (composed of Phonemes), from the 'Name' and 'POS' features of each Word in the Word layer of the MLC.
If a phoneme duration file is defined, the 'Duration' feature of Phoneme items is set accordingly. Otherwise, it is set to a default value (100 ms).
Phoneme items are linked with their corresponding Word items (only the first Phoneme item is linked, actually).
NB: Setting the 'Duration' feature of phonemes (to either default values or to values set using the phoneme duration file) makes it possible to listen to the output of this module, using the MBROLAInterface module. Following the same idea, this module actually assigns a 100 ms "silence" ("_") phoneme to any Word item whose 'POS' feature is found to be SILENT ("_" and SILENT are therefore keywords predefined by this module). As a result, it is possible to assign fixed pauses to punctuation characters in the input text, by setting their 'POS' feature to SILENT (this is either done by the RulesLemmatizer module or defined in the tag_file_name, for instance).
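The behaviour described above (one Phoneme layer built from the Word layer, a 100 ms default duration, a "_" phoneme for SILENT words, and a link from each Word to its first Phoneme only) can be sketched in Python as follows. This is not the module's implementation: the `Word` and `Phoneme` classes, `build_phoneme_layer`, and the toy lookup standing in for the ID3 tree are hypothetical, and the z-score duration file is left out, so every phoneme keeps the default duration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

DEFAULT_DURATION_MS = 100  # default duration stated in the text
SILENCE_PHONEME = "_"      # keyword predefined by the module
SILENT_TAG = "SILENT"      # keyword predefined by the module


@dataclass
class Phoneme:
    name: str
    duration_ms: int = DEFAULT_DURATION_MS


@dataclass
class Word:
    name: str
    pos: str
    first_phoneme: Optional[Phoneme] = None  # only the first Phoneme is linked


def build_phoneme_layer(
    words: List[Word],
    phonetize: Callable[[str, str], List[str]],
) -> List[Phoneme]:
    """Build a flat Phoneme layer from the 'Name' and 'POS' of each Word."""
    layer: List[Phoneme] = []
    for word in words:
        if word.pos == SILENT_TAG:
            # SILENT words become a single silence phoneme.
            symbols = [SILENCE_PHONEME]
        else:
            symbols = phonetize(word.name, word.pos)
        phonemes = [Phoneme(symbol) for symbol in symbols]
        if phonemes:
            word.first_phoneme = phonemes[0]
        layer.extend(phonemes)
    return layer


def toy_phonetize(name: str, pos: str) -> List[str]:
    """Invented lookup table standing in for the decision tree."""
    table = {("bonjour", "NOUN"): ["b", "o~", "Z", "u", "R"]}
    return table.get((name, pos), [])


if __name__ == "__main__":
    words = [Word("bonjour", "NOUN"), Word(",", SILENT_TAG)]
    for phoneme in build_phoneme_layer(words, toy_phonetize):
        print(phoneme.name, phoneme.duration_ms)
```

Because every item carries a duration, the resulting layer already holds a phoneme name and a duration for each item, including the pause produced by the comma.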
No option flag is available for customizing run-time behaviour.
ID3-based phonetization tree files have already been made available in the context of the MBRDICO project, for Arabic, Dutch, Spanish, British English, French, and American English.
The MBRDICO project also provides the ID3 training tools to create phonetization trees for other languages, using a phonetized corpus.
This module and its sources are distributed under the GNU license; see the source for details.
Copyright © 1999 TCTS LAB, Faculté Polytechnique de Mons, Belgium