• Status Closed
  • Percent Complete
  • Task Type Patches
  • Category User Interface
  • Assigned To No-one
  • Operating System All players
  • Severity Low
  • Priority Very Low
  • Reported Version Daily build (which?)
  • Due in Version Undecided
  • Due Date Undecided
  • Votes
  • Private
Attached to Project: Rockbox
Opened by sdoyon - 2006-10-10
Last edited by nls - 2007-08-06

FS#6159 - Add voice to roughly 100 splash screens and yes-no menus

I’m a new Rockbox user, and I’m both blind and a programmer. I noticed
there are a lot of contexts in which my rockbox’ified player remains
silent and leaves me “in the dark”. This patch set goes after the
low-hanging fruit.

I’ve split it into four patches to facilitate review.

talkmore-infra: This patch adds tiny bits of infrastructure that help
avoid clutter for the simple cases.
-A function and helper macro to speak a sequence of IDs (from an array),
conditional to talking menus being enabled.
-The ability for gui_syncsplash and gui_syncyesno_run to decode those
so-called virtual pointers that can represent either an ordinary string or a
language / voicefont ID. Strings are processed as usual, and encountered
IDs can be spoken.
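
To make the virtual-pointer idea concrete, here is a minimal sketch of one way such an encoding could work. Every name in it (the sk_ prefix, the table, the stub that records what was "spoken") is hypothetical and chosen for illustration; it is not the Rockbox definition of ID2P() or of the splash code. The assumption is simply that IDs are encoded as addresses inside a reserved region, so the receiver can tell them apart from ordinary strings.

```c
#include <stdint.h>

/* All names here are hypothetical stand-ins, not the Rockbox definitions. */
#define SK_MAX_IDS 1024

/* Reserved region: an address inside it encodes a language ID, not text. */
const unsigned char sk_virt[SK_MAX_IDS] = {0};

/* Toy language table standing in for the loaded language strings. */
enum { SK_LANG_WAIT, SK_LANG_CANCEL, SK_LANG_COUNT };
const char *const sk_strings[SK_LANG_COUNT] = { "Loading...", "Cancelled" };

/* Records the last ID handed to the (stubbed) voice output. */
int sk_last_spoken = -1;

/* Encode an ID as a pointer into the reserved region. */
const unsigned char *sk_id2p(int id)
{
    return &sk_virt[id];
}

/* Decode: return the ID, or -1 if p is an ordinary string. */
int sk_p2id(const unsigned char *p)
{
    /* Unsigned subtraction wraps, so addresses below the region fail too. */
    uintptr_t off = (uintptr_t)p - (uintptr_t)sk_virt;
    return off < SK_MAX_IDS ? (int)off : -1;
}

/* Resolve either kind of pointer to displayable text. */
const char *sk_p2str(const unsigned char *p)
{
    int id = sk_p2id(p);
    if (id < 0)
        return (const char *)p;              /* ordinary string */
    return id < SK_LANG_COUNT ? sk_strings[id] : "";
}

/* Splash-like helper: returns the text to display and, when talking
 * menus are enabled and p encodes an ID, "speaks" that ID. */
const char *sk_splash(const unsigned char *p, int talk_menus)
{
    int id = sk_p2id(p);
    if (id >= 0 && talk_menus)
        sk_last_spoken = id;   /* a real build would queue the voice clip */
    return sk_p2str(p);
}
```

With this shape, a caller that passes sk_id2p(SK_LANG_WAIT) gets both the text and the voice, while a caller that passes a plain string gets the text only, which is why swapping str() for ID2P() at a call site is enough to make a splash talk.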

talkmore-lang: english.lang update, adding about 40 voice strings. There
are only 2 new phrases (for a special case), all the others are existing
phrases that did not have the voice part filled in.

talkmore-doit: Grunt work for lots of simple cases. Just substitutions of
ID2P() in place of str() for about 80 splash screens and 4 yesno
screens. This is not a mindless search&replace however: I’ve considered
each case. (Well I’m new to this code so I’m not saying I couldn’t have
missed some issue…) I’ve left out some contexts: progress report
splashes (they probably update very fast), some cases in which errors are
reported that imply the voice can’t be working, messages before the voice
is initialized, and some spots I just wasn’t sure about. As I said, I’m
going for the 80-90% of low-hanging fruit.

talkmore-special: A few slightly more complicated cases where I was just
a bit more creative than a simple substitution. About 11 more splashes, 5
more yesno screens and one menu (eq_advanced_menu). For example: passing
“%s %s” to gui_syncsplash() in order to concatenate two language
IDs. Some variable print messages were given a constant voice
correspondence for simplicity.

Speech for splash screens more or less assumes that the talking will
complete within the splash screen’s delay (the time it’s shown on the
screen). Usually that’s the case, as messages are pretty short and most
splashes last for a full second or more. If the speech lasts longer, it
runs the risk of being interrupted by the next utterance that the
subsequent context will want to voice. One interesting case is those
splashes with a 0 duration. Those that are used for progress report I
haven’t touched yet. The most used 0 duration splash is LANG_WAIT
(“Loading”). This is used for things that might take anywhere from a
fraction of a second to complete, to a long time. Even a short voice
message is annoying for the frequent cases where the actual waiting is
imperceptible or very short. Yet I do want feedback, even for cases with
moderate waiting, if only to tell me that the player really did pick up
my key press. The solution to this is to play a very short audio clip,
instead of speaking something. I have this 0.15s sound which I put in as
the clip for LANG_WAIT. On hearing that I know the player got my key
press and I just have to wait a bit. With such a short duration, it’s not
annoying and not likely to interfere with anything else. It does mean a
new special case for the voice building scripts, but it’s exactly like
the VOICE_PAUSE case. And nothing breaks if your voice building script
doesn’t handle this case yet or if you have an older voice file: the
entry remains blank and you just don’t hear anything on LANG_WAIT.


The task depends upon
ID Project Summary Priority Severity Assigned To Progress
6574 Rockbox  FS#6574 - Lang v2 cleanup  Very Low Low
The task blocks this from closing
ID Project Summary Priority Severity Assigned To Progress
6338 Rockbox  FS#6338 - Playlist playing time  Very Low Low
Closed by  nls
2007-08-06 13:09
Reason for closing:  Accepted

Updated talkmore-infra:
-Fix array sizeof() error I had made in gui_syncyesno_run().
-Add a flag in talk.c to force enqueuing the next utterance, so that
splash messages are not interrupted. The messages usually don’t take
longer to speak than the splash remains on screen, however sometimes there’s
a delay before we manage to actually start talking.

Updated talkmore-special.diff: use my new enforce_enqueue flag
where appropriate.
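
For readers unfamiliar with the idea, here is a minimal sketch of how a "force enqueue" flag could behave. The queue, the sk_ names and the one-shot behaviour of the flag are all assumptions made for illustration; this is not the code from talk.c. The point it shows: a normal talk call with enqueue set to false drops whatever is pending, but if the flag was raised beforehand the new utterance is appended instead, so the splash message gets to finish.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the voice queue; not the talk.c code. */
#define SK_QUEUE_MAX 8
int sk_queue[SK_QUEUE_MAX];
int sk_queue_len = 0;

/* Assumed one-shot flag: applies to the next utterance only. */
bool sk_force_enqueue_next = false;

/* Ask that the next utterance be queued rather than interrupting. */
void sk_talk_force_enqueue_next(void)
{
    sk_force_enqueue_next = true;
}

/* Queue a clip. With enqueue == false the pending clips are dropped
 * first (an interruption), unless the flag overrides that. */
void sk_talk_id(int id, bool enqueue)
{
    if (sk_force_enqueue_next) {
        enqueue = true;
        sk_force_enqueue_next = false;   /* consume the flag */
    }
    if (!enqueue)
        sk_queue_len = 0;                /* interrupt: discard pending clips */
    if (sk_queue_len < SK_QUEUE_MAX)
        sk_queue[sk_queue_len++] = id;
}
```

A splash would speak its message, raise the flag, and return; whatever the following screen voices next then lands behind the splash message instead of cutting it off.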

Yet another small revision to talkmore-infra: cut down on unnecessary
stack usage, and remove stupid TABs.

CVS resync for talkmore-doit.

Is there really no interest at all for any of this?
I think the talking yesno screens are particularly important.

I’ve gone through and applied all the patches at their current latest version and summarised it into one patch.
I’ve also synced the patch to CVS, and added support for the shutting down splash screen - so this patch now supersedes my shuttingdown.patch at FS#6097.

As for general comments on the patch, it works as expected, and it’s a great difference to the usability.
I hope this gets committed some time soon.

I have taken Will’s patch (and removed it from his post) and done it properly with cvs diff, and also removed the bit which he accidentally added in… I’m also looking through it in the hope it can be committed.

Project Manager

The patch looks really good, apart from where you commented out code in main_menu.c instead of removing it.

I can see one problem with this, and that’s the increased size of the voice file. How big is it with all these new phrases?

Linus: That part was not supposed to be in the patch, that was a silly mistake.

As for voice file size, it looks like it rose by about 200 KB.
I think that it could be kept under 1.5 MB, but I’ve used slow, higher quality voice files, so my english.voice is 1.8 MB (which works fine on the iPods).

As to voice file size: well I’m using espeak myself, and a home brew
generation script, so YMMV I guess, but here are some numbers:

espeak’s default rate is 160 words per minute, but I find that too slow,
I use 240 words per minute.
Lame options: --vbr-new -t

-espeak at 160wpm, lame -V 9 --resample 12 -B 64 --vbr-new ⇒ 1166896 bytes
-espeak at 240wpm, lame -V 6 --resample 12 -B 64 ⇒ 1524544 bytes
-espeak at 240wpm, lame -V 9 --resample 12 -B 64 --vbr-new ⇒ 947248 bytes
-espeak at 240wpm, lame -V 9 --resample 16 --vbr-new ⇒ 1120384 bytes

I have read somewhere the recommendation not to use lame -V >6, but to my
ear -V 9 works fine.

Another thing that comes to mind: it’s nice to have a very short sound
clip to play for LANG_WAIT. That requires a special case in the
generation script just like the one for VOICE_PAUSE ⇒ _voice_pause.wav.

I do have such a clip, it lasts 0.09s, and it sounds nice. Only problem
is I can’t remember where I got it. I doubt anyone can assert copyright
over 0.09s of audio but I guess we don’t want trouble :-) so I didn’t
upload it. I guess I ought to find or generate something else.

Here’s a revised version.

People seem to prefer it all in one big patch. I thought my separate
patches were easier to review, but whatever, here’s a combined patch.

-Some advanced EQ menu stuff seems to have been lost in the summary patch
somehow, put it back in.

-Someone apparently added some lang entries that I hadn’t put in there… It reminded me to add voice for the “Load Last Bookmark?” query,
which I did.

-Sync up with CVS.

-Handle new battery warnings on shutdown.

-Try to wait for the “shutting down” message to finish before going on:
voice_wait() function.

-Export shutup() function from talk.c. It’ll be useful in future patches.

-Export talk_idarray() function, move talk_menu condition into macro.
cond_talk_idarray_fq() macro used in only one place, so just expand
it there and take it out of talk.h.

-Straightened out a few phrases in english.lang, must have gotten mixed
up in previous patch manipulations. Patch probably has a hard
time with the format of the language file: the id: field is far
enough from the voice field that it’s not part of the context, so
patch must rely on line numbering. It’s probably confused when the
same string appears for multiple phrases like
probably redundant).

-Added some progress notification during insertion into playlist.

Re voice file size: it increases by about 12%.

It’s been a while since this has seen the light of day, but here’s a bit of an update.
The patch is a sync to SVN, however the equalizer menu implementation has changed in the SVN code making it difficult to resync.
As a result I’ve dropped the voicing of that particular setting menu for the time being.

Other than that, I really feel that this patch is a worthy candidate for SVN inclusion - I’m not sure why it’s taken so long.

nls commented on 2007-03-14 12:10

I tested this patch for a little while on my h300 and it works nicely :-) however i get a warning in bookmark.c when building.
The patch also modifies the id strings in two manual files and the hunk that used
to apply to apps/sound_menu.c does not apply and is afaik not needed any more.

apps/sound_menu.c was split into apps/menus/sound_menu.c and apps/menus/recording_menu.c and it’s very likely it will still be needed… (/me hasn’t checked the patch tho)

nls commented on 2007-03-14 12:53

ah, yes you’re right, line 641 in apps/menus/recording_menu.c should be changed from
gui_syncsplash(50, true, str(LANG_MENU_SETTING_CANCEL));
to
gui_syncsplash(50, true, ID2P(LANG_MENU_SETTING_CANCEL));

nls commented on 2007-03-16 09:41

I resynced the patch, fixed the warning in bookmark.c and made
it build for hwcodec targets by ifdeffing out a call to
voice_wait(); in misc.c
I do not know if this has been tested on hwcodec, and do not want
to commit it until it has been confirmed working.

nls commented on 2007-03-16 10:03

I marked this task as depending on the Lang V2 cleanup FS#6574 because that will decrease the size of voice files for hwcodec
to accommodate the increase that this patch results in. This also
needs to be tested on archos targets. I think that the voice file
can be made small enough by using lower quality or higher compression.

Resync’d again.

Resynced, but this time I added the equaliser settings that were lost recently.
This is the complete patch, once again.

jteh commented on 2007-06-21 09:40

Synced to latest svn.

Synced to latest svn.
A couple of things:
The shutting down message wasn’t working so I have fixed that
Also some of the voice strings weren’t being added to english.lang so I have fixed that.

And I am unsure about the low battery warning:
If the battery is between 2 and 10, Rockbox will voice “battery low” when you shut down the player. And when the battery is less than 2, Rockbox will voice “battery empty”. So is this correct? When the battery is less than 2, is the battery empty?
Hopefully it is ok.

I forgot to say in my last comment that there was a problem with settings.c. So could someone make sure I haven’t taken anything important out?

Sync’ed to r14205, which includes the commit for FS#6574, on which this
task was depending.

I tried to be thorough:
-Had a quick look through all uses of str() for all trivial cases. Added
a couple ID2P conversions I believe…
-Checked that all newly voiced IDs have a definition in the lang file.
-Checked for merge errors in lang file (was one).
-Had a quick reread of the whole patch

Non-trivial cases that are spoken:
-shutdown message,
-progress for insertion into playlists,
-database building progress (new).

I did drop a few things from the patch:

-Advanced equalizer peak filter items: I’d really like the simple eq menu
to talk, and that will require a separate patch, with some kind of menu
talk callback. When that’s done, there will be no need to add duplicate
lang entries for peak 1, 2 and 3, so it seems to me there’s no point
doing that here.

-A couple of splashes in the playlist viewer and playlist catalog: as
long as the listed items are not spoken, those are not really usable with
voice, and there is not much point speaking a few random errors. It can
be done in separate patches addressing the viewer and catalog lists.

I increased the number of context lines for english.lang, hoping to avoid
unnoticed merge errors.

This patch adds voice text for 48 lang IDs, increasing voice file size by
about 14%.

Here are some voice file size stats, building for an IAUDIO X5.
-With my preferred voice settings:

  1. Without this patch: 959156
  2. With this patch: 1091896

-With default voice settings:

  1. Without this patch: 3141152
  2. With this patch: 3587524
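
As a quick check of the “about 14%” figure against the numbers above, here is a small helper that computes the relative growth. The function name is made up for this example, and the sizes are taken as reported (presumably bytes).

```c
/* Percentage growth from `before` to `after`, both in the same unit. */
double percent_increase(long before, long after)
{
    return 100.0 * (double)(after - before) / (double)before;
}
```

Called with 959156 and 1091896 it gives roughly 13.8%, and with 3141152 and 3587524 roughly 14.2%, so both voice settings agree with the stated increase.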

