
Rockbox mail archive

Subject: Re: a solution for voice clips (talkbox/UI)

From: Rocker <>
Date: Fri, 05 Mar 2004 20:30:06 -0700

You guys freaking blow me away! I think I'll go pull out my Adam computer!
lol! As far as costs go I would be more than happy to pitch in some cash in
order to make the best possible product. Maybe I can buy a case lot of
those 8 bit chips eh?

Keep up the great work. You guys bring tears to these rock & roll eyes!


----- Original Message -----
From: "Glenn Ervin at Home" <>
To: "Rockbox development" <>
Sent: Friday, March 05, 2004 8:11 PM
Subject: Re: a solution for voice clips (talkbox/UI)

You would be surprised at how well you can make synthetic speech enunciate
multiple words, or separate words put together, if you capitalize certain
letters. If you capitalize the first letters of the words you want spoken as
separate words, it sounds much better, so you will not have to hyphenate.

----- Original Message -----
From: "Brian Wolven" <>
To: <>
Sent: Friday, March 05, 2004 8:20 AM
Subject: Re: a solution for voice clips (talkbox/UI)

[IDC]Dragon wrote:
>>>A drawback is that TextAloud generates a lot of pause (silence) around the
>>>actual clip, which I'd like to remove. Lame has no option for that, I
>>>haven't found a tool yet which does it on a batch of files.
> So I had to write my own, it is here:

You are a coding machine - the good kind. =)
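
The tool itself is not shown in this thread, so the following is only a minimal sketch of the general idea of trimming silence, not Jörg's implementation. It assumes the clip is already decoded to a list of signed 16-bit PCM samples, and the name trim_silence and the threshold value are hypothetical.

```python
def trim_silence(samples, threshold=500):
    """Drop leading and trailing samples whose amplitude is at or below threshold.

    samples: list of signed integers (assumed 16-bit PCM, already decoded).
    threshold: hypothetical cut-off; tune it by ear for a given TTS voice.
    """
    # Indexes of every sample loud enough to count as part of the clip.
    loud = [i for i, s in enumerate(samples) if abs(s) > threshold]
    if not loud:
        return []  # the whole clip is silence
    # Keep everything between the first and last loud sample, inclusive.
    return samples[loud[0]:loud[-1] + 1]
```

Quiet samples inside the clip are kept, so pauses between spoken words survive; only the padding at either end is removed.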

> I have applied this to my UI clips, the result is in the same place as
> before:
> The strings have been edited to remove the printf formatting and
> abbreviations.
> For lame, I use the options "-V 9 --vbr-new -t" to save space. This is
> important for the UI clips, we have to cram them all into ~1.5MB for the
> buffer. For talkbox, it doesn't matter.

I can change my scripts to use those options. It'll be interesting to
see how much space it saves.
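
As a rough sketch of what such a script change could look like, the snippet below builds a lame command line with the quoted options and checks a set of encoded clip sizes against the buffer. The function names are hypothetical, lame is assumed to be on the PATH, and "~1.5MB" is read here as 1.5 * 1024 * 1024 bytes, which is only an assumption.

```python
import subprocess

# Options exactly as quoted in the message above.
LAME_OPTIONS = ["-V", "9", "--vbr-new", "-t"]

# Assumed budget: "~1.5MB" interpreted as 1.5 * 1024 * 1024 bytes.
UI_BUFFER_BYTES = int(1.5 * 1024 * 1024)

def lame_command(wav_path, mp3_path, lame="lame"):
    """Build the argument list for one clip (does not run anything)."""
    return [lame, *LAME_OPTIONS, wav_path, mp3_path]

def encode(wav_path, mp3_path):
    """Run lame on one clip; raises CalledProcessError if lame fails."""
    subprocess.run(lame_command(wav_path, mp3_path), check=True)

def fits_in_buffer(clip_sizes, budget=UI_BUFFER_BYTES):
    """True if the encoded clips together fit in the assumed buffer."""
    return sum(clip_sizes) <= budget
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.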

>>I've got a couple of working VBscripts (Windows Script Host - not for
>>nixers) now, using the free-but-crappy MS voices, and assisted by Jörg's
>>speak.exe (link is above in his post) and an executable version of
>>lame. You will need both apps to run these scripts.
> Excellent!
> Especially your mp3ClipGen.vbs is exactly what I was "asking" for. I
> ran that across my drive. (Now I need the talkbox patch)
> From vbs, you may be able to use SAPI directly via automation, so you
> go without my speak.exe program.

I'd given that a quick try, but had no success with the first cut. It
was easiest to just use your .exe in the interest of getting a workable
version out. Tweaking can go on... indefinitely. =)

> Maybe some minor string processing can improve the quality. The MS speech
> just ignores hyphenation. I have dir names like "Artist name - Album name"
> that's spoken without separation. Sounds better if speak.exe gets a full
> stop in between, so giving it a string like "Artist name. Album name".

It is possible, at least according to the SAPI docs, to change the TTS
modes using "context" tags. I'm not sure whether there is a mode that
will be suitable for all cases, though. OTOH, it's easy enough to define
some regular expressions and apply them to each string, both for the UI
clips and the talkbox clips, to handle most of the 'odd' cases we are
likely to encounter in our strings, e.g., replace ' - ' with appropriate
words or chars to generate the desired speech outcome. I have artists
with hyphenated names, so we just need to be a bit... selective ("-" !=
" - "). Some renaming and retagging (hopefully in batch mode!) may also
be desirable in order to produce consistent results with spoken names.
The names for classical works, in particular, can be pretty hilarious
(Opus vs. Op., Vol., etc.), and who wants to listen to Vye-Vall-Dee?
Fortunately, the list of strings to be processed can easily be expanded
or upgraded by anyone who knows a little about regexp's. I'll give that
a start in version 2. We might also want to add a phonetic 'lookup
table' that can be edited by the user, allowing them to specify a better
pronunciation for exceptionally bad (as far as the TTS engine goes)

BTW, has anyone tried to compile a program using the SAPI code with the
Dev-C++ IDE? I don't have the MS tools, nor any real desire to purchase
them, but trying to compile the 'speak' code gave me exceptions and
errors out the wazoo. Importing one of the sample projects that came
with the SAPI SDK did the same. There must be a 'secret bit' that needs
to be set somewhere?



Received on 2004-03-06
