How To Build Your Own Voice File
We provide daily built voice files at http://www.rockbox.org/daily.shtml, built using Festival TTS. If you have a better-sounding TTS engine (or lots of spare time) and would like to make your own voice files, read on!
This is an in-depth description of how to build your own voice files. It requires a little more than basic computer knowledge. For an easier solution (though one not suitable for batch-generating voice files), you can also try RockboxUtility's voice generation feature.
First, you need the [language].lang file for your language. Bring it up to date if necessary, and fill in the voice: values for those entries that have them in the current english.lang file (the master of all .lang files). The voice file is then created in the following four steps:
- Convert each voice: value into a .wav file, either with text-to-speech software or by recording a human voice reading the items. Only the first alternative can be automated. The VOICE_PAUSE entry is a special case: it is a short silent clip rather than speech (a 300 msec pause .wav file is listed below).
- (Optional, but recommended.) Trim the .wav clips so that there is (almost) no leading or trailing silence. This keeps the voice file small and makes spelled-out words and numbers sound fluent.
- Compress all .wav snippets into .mp3 files using lame or another encoder.
- Assemble the final .voice file from all these .mp3 snippets with the voicefont tool (provided below). For Archos Jukeboxes, the voice file can be no larger than 1.4 MB; if yours is larger, go back to the compression step and choose a higher compression.
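The first step starts from the voice: strings in your .lang file. As a minimal sketch, the Python function below collects them into a mapping from phrase id to text; the name `parse_voice_strings` is hypothetical and not part of the Rockbox tools. It assumes two layouts: flat `id:` and `voice: "..."` lines, as this page describes, and `<voice>` blocks whose `*:` entry holds the text. Target-specific entries and escaped quotes are not handled.

```python
import re

# Assumed .lang layouts: flat 'voice: "..."' lines, or '*: "..."' entries inside <voice> blocks.
ID_RE = re.compile(r'^\s*id:\s*(\S+)')
FLAT_VOICE_RE = re.compile(r'^\s*voice:\s*"(.*)"\s*$')
BLOCK_ENTRY_RE = re.compile(r'^\s*\*:\s*"(.*)"\s*$')


def parse_voice_strings(lines):
    """Return {phrase_id: voice_text}, skipping phrases whose voice string is empty."""
    voices = {}
    current_id = None
    in_voice_block = False
    for line in lines:
        stripped = line.strip()
        id_match = ID_RE.match(line)
        if id_match:
            current_id = id_match.group(1)
            continue
        if stripped == "<voice>":
            in_voice_block = True
            continue
        if stripped == "</voice>":
            in_voice_block = False
            continue
        match = FLAT_VOICE_RE.match(line)
        # Only '*:' entries inside <voice> count; <source>/<dest> entries are ignored.
        if match is None and in_voice_block:
            match = BLOCK_ENTRY_RE.match(line)
        if match and current_id and match.group(1):
            voices[current_id] = match.group(1)
    return voices


if __name__ == "__main__":
    sample = [
        "<phrase>",
        "  id: LANG_SET_BOOL_YES",
        "  <source>",
        '    *: "Yes"',
        "  </source>",
        "  <voice>",
        '    *: "Yes"',
        "  </voice>",
        "</phrase>",
    ]
    print(parse_voice_strings(sample))  # {'LANG_SET_BOOL_YES': 'Yes'}
```

The sample phrase is illustrative only. Each resulting id/text pair is what you would feed to your TTS engine (or read aloud) in the first step.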
Using Linux or Cygwin
You will need a build environment and the Rockbox source, either from Git or from the download pages. Once that's done, follow these simple steps:
- Follow the on-screen instructions, selecting:
  - your target platform (a number)
  - A for advanced
  - V for voice, and confirm "Voice build selected"
  - your language (a number), or press ENTER for English
  - your TTS engine (F for Festival, E for eSpeak, L for Flite or S for SAPI5)
  - your TTS engine options (ENTER for defaults)
  - your LAME options (ENTER for defaults)
You now have a .voice file for your language.
Human Voices (under Windows)
This is a good approach for languages without a usable TTS engine. Go to Start->Run and type sndrec32. Keep a printed copy of the .lang file at hand and record the strings in the order they appear in it. Save each clip using the phrase's id as the file name. When that is complete, put voicefont.exe, yourlang.lang and english.lang into the directory in which you saved all the clips. Open a command prompt, change to that directory, and run lame on all the clips. Once that is done, you can run voicefont to get a .voice file.
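Running lame by hand on every clip quickly gets tedious. Below is a minimal Python sketch that builds and runs one lame command per .wav file in the clip directory, so each .mp3 keeps the clip's name (its phrase id). It assumes only that `lame` is on your PATH and is invoked as `lame [options] input.wav output.mp3`; the helper names `lame_commands` and `encode_all` are hypothetical, and the encoder options are left to you.

```python
import subprocess
from pathlib import Path


def lame_commands(clip_dir, lame_options):
    """Build one lame command per .wav clip; the .mp3 keeps the clip's name (its phrase id)."""
    commands = []
    for wav in sorted(Path(clip_dir).glob("*.wav")):
        mp3 = wav.with_suffix(".mp3")
        commands.append(["lame", *lame_options, str(wav), str(mp3)])
    return commands


def encode_all(clip_dir, lame_options):
    """Run lame on every clip, raising CalledProcessError at the first failure."""
    for cmd in lame_commands(clip_dir, lame_options):
        subprocess.run(cmd, check=True)
```

For example, `encode_all(r"C:\clips", ["-V", "9"])` would encode every clip with lame's low-quality `-V 9` VBR setting; that option is only an illustration, not a recommended Rockbox setting. Building the command list separately from running it lets you inspect the commands before encoding anything.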
Human Voices (under Linux)
Download the script in FS#7650 and save it in a subdirectory of the Rockbox source. Put your recorded voice clips in a directory called "originals"; each file name should be the string being voiced followed by .wav (for example "Yes.wav"). Next, edit the top lines of the script so that LANGNAME is set to the correct language. Then run the script: it should produce the voice files, or tell you which strings are missing.
Warning: This script needs to be updated to use rbspeexenc; this is easily done on line 180 of the script.
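The FS#7650 script reports strings that have no recording. Purely as an illustration of that check (this is not the script's own code), a Python sketch comparing the needed strings against the clips in "originals" could look like this. It assumes each clip is named exactly `<string>.wav`, as described above; `missing_clips` is a hypothetical name.

```python
from pathlib import Path


def missing_clips(voice_strings, originals_dir="originals"):
    """Return the voice strings that have no matching '<string>.wav' in originals_dir."""
    # Path.stem drops only the final ".wav", so strings containing dots still match.
    recorded = {p.stem for p in Path(originals_dir).glob("*.wav")}
    return sorted(s for s in set(voice_strings) if s not in recorded)
```

For example, if "originals" contains only Yes.wav, `missing_clips(["Yes", "No"])` returns `["No"]`.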
Attachments
- Enhanced wavtrim tool, originally written by Jörg Hohensohn. The enhanced version not only trims digital silence but also scans the file inward from each end until it finds a sample whose value exceeds the given noise floor. It then trims that end, leaving 10 msec of the quiet part in place so that faint beginnings and endings are not cut off.
- Enhanced wavtrim tool, Windows executable.
- Voicefont tool, written by Jörg Hohensohn.
- Voicefont tool, Windows executable.
- Pause .wav file (300 msec).
- VBScript for automated .voice building. Note: currently out of date; volunteers are needed to fix it.
- VBScript that lists all installed SAPI5 voices.
- VBScript that lists all installed SAPI4 voices.
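The trimming behaviour of the enhanced wavtrim tool (scan inward from each end until a sample exceeds the noise floor, then keep 10 msec of the quiet part) can be sketched in Python as follows. This is an illustrative reimplementation, not the attached tool's code: it assumes mono 16-bit PCM input and treats the noise floor as an absolute sample value, and the names `trim_bounds` and `trim_wav` are hypothetical.

```python
import array
import sys
import wave


def trim_bounds(samples, noise_floor, sample_rate, margin_ms=10):
    """Return (start, end) of the region to keep, end exclusive; (0, 0) if all quiet."""
    # Scan forward from the start for the first sample above the noise floor.
    first = next((i for i, s in enumerate(samples) if abs(s) > noise_floor), None)
    if first is None:
        return 0, 0
    # Scan backward from the end for the last sample above the noise floor.
    last = next(i for i in range(len(samples) - 1, -1, -1) if abs(samples[i]) > noise_floor)
    margin = sample_rate * margin_ms // 1000  # samples of quiet audio kept at each end
    return max(0, first - margin), min(len(samples), last + 1 + margin)


def trim_wav(src, dst, noise_floor, margin_ms=10):
    """Trim a mono 16-bit PCM .wav file (an assumption of this sketch) into dst."""
    with wave.open(src, "rb") as w:
        if w.getsampwidth() != 2 or w.getnchannels() != 1:
            raise ValueError("this sketch only handles mono 16-bit PCM")
        rate = w.getframerate()
        samples = array.array("h", w.readframes(w.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    start, end = trim_bounds(samples, noise_floor, rate, margin_ms)
    kept = samples[start:end]
    if sys.byteorder == "big":
        kept.byteswap()  # back to little-endian for writing
    with wave.open(dst, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)
        out.setframerate(rate)
        out.writeframes(kept.tobytes())
```

Scanning only from the two ends, rather than removing every quiet stretch, leaves pauses inside a phrase untouched. A suitable noise floor depends on your recordings.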
Copyright © by the contributing authors.