Rockbox mail archive

Subject: Re: a solution for voice clips (talkbox/UI)

From: Brian Wolven
Date: Fri, 05 Mar 2004 09:20:16 -0500

[IDC]Dragon wrote:
>>>A drawback is that TextAloud generates a lot of pause (silence) around the
>>>actual clip, which I'd like to remove. Lame has no option for that, I haven't
>>>found a tool yet which does it on a batch of files.
> So I had to write my own, it is here:

You are a coding machine - the good kind. =)

> I have applied this to my UI clips, the result is in the same place as
> before:
> The strings have been edited to remove the printf formatting and
> abbreviations.
> For lame, I use the options "-V 9 --vbr-new -t" to save space. This is
> important for the UI clips, we have to cram them all into ~1.5MB for the mp3
> buffer. For talkbox, it doesn't matter.

I can change my scripts to use those options. It'll be interesting to
see how much space it saves.
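
For illustration only, here is a minimal Python sketch of how a batch script might assemble the lame call with those options and check the result against the buffer limit. The function names, the default executable name "lame", and the reading of "~1.5MB" as 1,500,000 bytes are assumptions for this example, not part of the existing scripts.

```python
import os

# Options quoted above: VBR quality 9, the newer VBR routine, no Xing/Info tag.
LAME_OPTIONS = ["-V", "9", "--vbr-new", "-t"]

# Assumption: "~1.5MB" is taken as 1,500,000 bytes for this sketch.
UI_CLIP_BUDGET_BYTES = 1_500_000


def lame_command(wav_path, mp3_path, lame="lame"):
    """Build the argument list for encoding one clip (hypothetical helper)."""
    return [lame, *LAME_OPTIONS, wav_path, mp3_path]


def fits_budget(mp3_paths, budget_bytes=UI_CLIP_BUDGET_BYTES):
    """Return True if the encoded clips together stay within the budget."""
    return sum(os.path.getsize(p) for p in mp3_paths) <= budget_bytes
```

The list returned by lame_command() can be handed to subprocess.run(..., check=True) for each clip, and fits_budget() can be run over the output directory afterwards.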

>>I've got a couple of working VBscripts (Windows Script Host - not for
>>nixers) now, using the free-but-crappy MS voices, and assisted by Jörg's
>>speak.exe (link is above in his post) and an executable version of
>>lame. You will need both apps to run these scripts.
> Excellent!
> Especially your mp3ClipGen.vbs is exactly what I was "asking" for. I already
> ran that across my drive. (Now I need the talkbox patch)
> From vbs, you may be able to use SAPI directly via automation, so you could
> go without my speak.exe program.

I'd given that a quick try, but had no success with the first cut. It
was easiest to just use your .exe in the interest of getting a workable
version out. Tweaking can go on... indefinitely. =)

> Maybe some minor string processing can improve the quality. The MS speech
> just ignores hyphenation. I have dir names like "Artist name - Album name" and
> that's spoken without separation. Sounds better if speak.exe gets a full stop
> in between, so giving it a string like "Artist name. Album name".

It is possible, at least according to the SAPI docs, to change the TTS
modes using "context" tags. I'm not sure whether there is a mode that
will be suitable for all cases, though. OTOH, it's easy enough to define
some regular expressions and apply them to each string, both for the UI
clips and the talkbox clips, to handle most of the 'odd' cases we are
likely to encounter in our strings, e.g., replace ' - ' with appropriate
words or chars to generate the desired speech outcome. I have artists
with hyphenated names, so we just need to be a bit... selective ("-" !=
" - "). Some renaming and retagging (hopefully in batch mode!) may also
be desirable in order to produce consistent results with spoken names.
The names for classical works, in particular, can be pretty hilarious
(Opus vs. Op., Vol., etc.), and who wants to listen to Vye-Vall-Dee?
Fortunately, the list of strings to be processed can easily be expanded
or upgraded by anyone who knows a little about regexps. I'll give that
a start in version 2. We might also want to add a phonetic 'lookup
table' that can be edited by the user, allowing them to specify a better
pronunciation for exceptionally bad (as far as the TTS engine goes)
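
To make the idea concrete, here is a rough sketch in Python rather than VBScript. The rule list, the PHONETIC table, and the respelling of "Vivaldi" are all made-up placeholders for illustration; the real list would have to be tuned by ear against the TTS engine.

```python
import re

# Hypothetical user-editable lookup table: written form -> respelling for the
# TTS engine. The entry below is only a placeholder.
PHONETIC = {
    "Vivaldi": "Vih-vahl-dee",
}

# Ordered (pattern, replacement) rules; extend as odd cases turn up.
RULES = [
    # " - " (hyphen with spaces) becomes a full stop; "Jean-Michel" is untouched.
    (re.compile(r"\s+-\s+"), ". "),
    # Expand common classical abbreviations when followed by a number.
    (re.compile(r"\bOp\.\s*(?=\d)"), "Opus "),
    (re.compile(r"\bVol\.\s*(?=\d)"), "Volume "),
]


def normalize(text, phonetic=PHONETIC):
    """Apply the regex rules, then the whole-word phonetic substitutions."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    for word, spoken in phonetic.items():
        # A function replacement keeps backslashes in `spoken` literal.
        text = re.sub(r"\b" + re.escape(word) + r"\b",
                      lambda match, s=spoken: s, text)
    return text
```

With these rules, "Jean-Michel Jarre - Oxygene" comes out as "Jean-Michel Jarre. Oxygene", so the hyphenated name survives while the separator turns into a pause.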

BTW, has anyone tried to compile a program using the SAPI code with the
Dev-C++ IDE? I don't have the MS tools, nor any real desire to purchase
them, but trying to compile the 'speak' code gave me exceptions and
errors out the wazoo. Importing one of the sample projects that came
with the SAPI SDK did the same. There must be a 'secret bit' that needs
to be set somewhere?

Received on 2004-03-05
