Rockbox mail archive

Subject: Re: a solution for voice clips (talkbox/UI)

From: Brian Wolven <>
Date: Thu, 04 Mar 2004 18:34:10 -0500

[IDC]Dragon wrote:
> Hello,
> the topic of blind usage brought me into the field of text-to-speech
> converters.
> I learned about the Microsoft Speech API (SAPI), which makes it very easy to
> program talking applications, but sounds pretty crappy. I have made a first
> command line program. All it does is "speak" the text of the first command
> line argument into a wav file. This can be useful to make Talkbox clips in an
> automated way, recursively across the whole disk if you like and if somebody
> makes a Perl script for it (hint, wink).
> My program is here:
> I don't know if this runs on a plain standard Windows box, maybe you need to
> install some SAPI runtime:
> The best speech is done by AT&T Natural Voices. I have no program which uses
> it, but there's an interactive web demo:
> Just an idea: would it be possible to abuse this demo and harvest the wav
> files it generates, in an automated way? I have no web programming skills,
> maybe somebody else can look behind the scenes.
> First I played with the eval version of TextAloud. It can talk better, but
> is not made to be scripted. I convinced it with a trick, to get all rockbox
> language strings as wav files. The program remembers "articles" to be spoken,
> so I made many little articles from the strings. In the program directory
> there will be a last-settings file username_TAExport.txt, which I modified like
> this:
> ||||||TextAloud MP3 Start of Article||||||
> Sound Settings
> ||||||TextAloud MP3 End of Article||||||
> ||||||TextAloud MP3 Start of Article||||||
> General Settings
> ||||||TextAloud MP3 End of Article||||||
> and so forth. When playing this into separate files I get a bunch of wav's,
> which lame can then nicely encode. The integrated lame is too restricted,
> won't do VBR, no quality settings avail. So I have all the UI clips. I have
> placed the first shot here:
> A drawback is that TextAloud generates a lot of pause (silence) around the
> actual clip, which I'd like to remove. Lame has no option for that, I haven't
> found a tool yet which does it on a batch of files.
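On the silence problem mentioned at the end of the quoted message, here is a minimal sketch of one way to trim it in a batch, using only the Python standard library. It assumes the clips are uncompressed 16-bit PCM wav files and that "silence" means samples below a fixed amplitude. The function name trim_silence and the threshold of 500 are made up for illustration and would need tuning against real TextAloud output.

```python
import array
import sys
import wave


def trim_silence(src_path, dst_path, threshold=500):
    """Copy a 16-bit PCM WAV file, dropping quiet frames at both ends.

    Returns the number of frames kept. The threshold is an assumed value.
    """
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        if params.sampwidth != 2:
            raise ValueError("only 16-bit PCM input is handled in this sketch")
        samples = array.array("h")
        samples.frombytes(src.readframes(params.nframes))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian

    channels = params.nchannels
    frame_count = len(samples) // channels

    def is_loud(frame_index):
        # A frame counts as loud if any channel exceeds the threshold.
        start = frame_index * channels
        return any(abs(s) > threshold for s in samples[start:start + channels])

    # Walk inwards from both ends until a loud frame is found.
    first = 0
    while first < frame_count and not is_loud(first):
        first += 1
    last = frame_count
    while last > first and not is_loud(last - 1):
        last -= 1

    kept = samples[first * channels:last * channels]
    if sys.byteorder == "big":
        kept.byteswap()
    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)  # the frame count is corrected on close
        dst.writeframes(kept.tobytes())
    return last - first
```

Looping this over a folder of wav files before handing them to lame would cover the batch case. A hard cut at the first loud sample can clip soft consonants, so keeping a few milliseconds of margin on each side may sound better.
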

I've got a couple of working VBscripts (Windows Script Host - not for
nixers) now, using the free-but-crappy MS voices, and assisted by Jörg's
speak.exe (link is above in his post) and an executable version of
lame. You will need both apps to run these scripts.

readlang.vbs takes a language file (command line, dialog box, or just
drag the file onto the script in your file manager) and generates all
the voice clips, storing them in a subfolder named after the language.
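For readers who are not on Windows, the idea behind readlang.vbs can be sketched roughly as follows. This is not the script itself. It assumes a language file whose entries carry lines of the form id: NAME and eng: "text" (the real layout may differ), and the choice to name each clip after its id with a .wav extension is purely hypothetical. The sketch only works out which clip to create for which text; the actual speaking and encoding steps are left out because the command lines of speak.exe and lame are not given here.

```python
import os
import re

# Assumed entry layout: lines such as   id: LANG_FOO   and   eng: "Foo"
ENTRY_LINE = re.compile(r'^(id|eng):\s*(.*)$')


def read_strings(lang_text):
    """Return (id, english text) pairs found in language-file text."""
    pairs = []
    current_id = None
    for line in lang_text.splitlines():
        match = ENTRY_LINE.match(line.strip())
        if not match:
            continue  # other fields are ignored in this sketch
        key, value = match.groups()
        if key == "id":
            current_id = value.strip()
        elif current_id is not None:
            text = value.strip().strip('"')
            if text:  # skip entries with an empty string
                pairs.append((current_id, text))
            current_id = None
    return pairs


def plan_clips(lang_path, lang_text):
    """Map each string to a clip path in a subfolder named after the language."""
    folder = os.path.splitext(os.path.basename(lang_path))[0]
    return [(os.path.join(folder, ident + ".wav"), text)
            for ident, text in read_strings(lang_text)]
```

Each (path, text) pair would then be passed to whatever speech tool is at hand.
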

mp3ClipGen.vbs recursively generates all of the talkbox-style clips for
the directories in your music folder. Just drag the folder[s] you wish
to have processed onto the script, or use the command line (e.g.,
mp3clipgen.vbs "c:\music\music files"), or simply run it and enter the
folder at the prompt. If you have all your music folders under one root
folder, it will process the whole thing in one bite, although it may
need to chew for a while.
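The recursive part of mp3ClipGen.vbs can be pictured with a short sketch like the one below. Again, this is an illustration rather than the script's source. It assumes the clip for each folder is stored inside that folder under the name _dirname.talk and that the spoken text is simply the folder's own name. The function name plan_folder_clips is made up.

```python
import os


def plan_folder_clips(root):
    """Yield (clip_path, spoken_text) for root and every folder below it."""
    for folder, subfolders, _files in os.walk(root):
        subfolders.sort()  # visit folders in a predictable order
        # The text to speak is the last path component of the folder.
        name = os.path.basename(os.path.normpath(folder))
        # Assumed clip name: "_dirname.talk" inside the folder itself.
        yield os.path.join(folder, "_dirname.talk"), name
```

Running list(plan_folder_clips(r"c:\music\music files")) would give one entry per folder, each of which still has to be spoken to a wav file and encoded.
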

Both scripts (including source =P) are available at:

Received on 2004-03-05
