FS#11541 - Add Voice Announcement of Summary Info to WPS hotkey

Attached to Project: Rockbox
Opened by Sean Inglis (seani) - Sunday, 15 August 2010, 01:51 GMT
Task Type Patches
Category User Interface
Status Unconfirmed
Assigned To No-one
Operating System All players
Severity Low
Priority Normal
Reported Version Release 3.6
Due in Version Undecided
Due Date Undecided
Percent Complete 0%
Votes 0
Private No

Details

This patch adds an additional menu option to the list of WPS hotkey choices, "Announce Information". This voices basic information about the player, currently playing file and current tracklist.


This feature reads a simple definition with a single line of no more than 20 characters from the file "announce.cfg" in the .rockbox directory.


This definition consists of a string of single character tokens, each of which announces a particular canned piece of information.


The tokens are:

A - current time
B - elapsed time and total length of the current track
C - current track number and total tracks in playlist
D - battery level as a percentage
E - battery remaining in hours and minutes
F - sleep timer remaining in minutes


If more than one token is specified, more than one piece of information is announced, so:

AB

would announce the time followed by elapsed time and track length


Multiple announcements may be bound to the hotkey by separating one or more groups of tokens with a ":"

A:DF


In this example, the first time the hotkey is pressed, the time is announced. If it is pressed again within 10 seconds, the battery level and sleep timer are announced and so on.


A space between tokens adds a short pause to the announcement.


If no announce.cfg file is present, the time is announced by default.
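
The format described above can be sketched in C, the language Rockbox is written in. This is a minimal illustration, not the code from the attached patch: the names `announce_select_group` and `announce_token_name` are hypothetical, and wrapping back to the first group after the last one is an assumption, since the description only says "and so on".

```c
#include <stddef.h>
#include <string.h>

#define ANNOUNCE_MAX_LEN 20 /* line length limit stated in the description */

/* Hypothetical helper: map a single-character token to what it announces. */
const char *announce_token_name(char token)
{
    switch (token) {
    case 'A': return "current time";
    case 'B': return "elapsed time and total length of the current track";
    case 'C': return "current track number and total tracks in playlist";
    case 'D': return "battery level as a percentage";
    case 'E': return "battery remaining in hours and minutes";
    case 'F': return "sleep timer remaining in minutes";
    case ' ': return "short pause";
    default:  return NULL; /* unknown token */
    }
}

/* Hypothetical helper: copy the ':'-separated group chosen by the 0-based
 * press count into out. Presses past the last group wrap around, which is
 * an assumption. Returns the number of characters copied. */
size_t announce_select_group(const char *fmt, unsigned press,
                             char *out, size_t out_size)
{
    size_t groups = 1, len = 0;
    const char *p;

    if (out_size == 0)
        return 0;
    /* No format available: announce the time by default. Treating an
     * empty string the same way is an assumption. */
    if (fmt == NULL || *fmt == '\0')
        fmt = "A";

    for (p = fmt; *p; p++)
        if (*p == ':')
            groups++;

    press %= groups;
    p = fmt;
    while (press > 0) {
        /* press < groups, so enough ':' separators remain ahead of p */
        if (*p++ == ':')
            press--;
    }
    while (*p && *p != ':' && len + 1 < out_size)
        out[len++] = *p++;
    out[len] = '\0';
    return len;
}
```

With the format "A:DF", press 0 selects "A" and press 1 selects "DF"; each character of the selected group would then be passed through `announce_token_name` (or the real voicing routine) in turn.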

Comment by Jonathan Gordon (jdgordon) - Sunday, 15 August 2010, 02:09 GMT
nice patch.
Is this only for the WPS, or everywhere the hotkey works?
Can you change to using the config.cfg (and global_settings) instead of a separate file?
Comment by Sean Inglis (seani) - Sunday, 15 August 2010, 09:35 GMT
I couldn't see - straight off the bat - how you enable editing for a text setting if it's in global_settings. Was also a bit uncertain what would be considered the "best" place for the definition to live. I'll go back and peer at the settings* files.

At the moment, this is only added to the hotkey list for the wps. I was going to address the browse hotkey as a separate exercise and keep this patch simple.
Comment by Sean Inglis (seani) - Sunday, 15 August 2010, 18:44 GMT
Correct build failure for non-RTC targets
Comment by Steve Bavin (pondlife) - Monday, 16 August 2010, 06:40 GMT
I like this idea very much, but rather than a separate announce.cfg, might it be better to read out the current WPS? A visually impaired user could then set up a simple WPS with just the required fields. A blank line would insert a pause. We'd need to ensure that all the above fields were available as WPS tags, of course!

Less to document, more extensible and more obvious IMHO.
Comment by Jonathan Gordon (jdgordon) - Monday, 16 August 2010, 06:47 GMT
voicing skins does add some interestingness, but voicing isn't only for blind people... it would be nice to have a fancy wps (or sbs even) which will say a limited set of tags on a press (the time for example, or whatever).

The obvious way to do this would be using a viewport which isn't drawn and is only used for the voicing. But yeah, this means a bunch of new voice strings need to be added
Comment by Steve Bavin (pondlife) - Monday, 16 August 2010, 08:44 GMT
FWIW - I use voice when driving, and also a dedicated CFG/theme (big font etc.).

How about a modifier tag which means "voice the following tag if possible"...?
Comment by Jonathan Gordon (jdgordon) - Monday, 16 August 2010, 09:17 GMT
I think another tag would just be annoying. A viewport would, I think, be simplest (something like how the ui viewport is done so it can be conditionally chosen perhaps).
The problem is still getting the extra voice strings in and making it all work :)

by the way, one of the things stopping me using the skin engine for the id3 info screen is no voice support :)
Comment by Sean Inglis (seani) - Monday, 16 August 2010, 09:31 GMT
All good ideas, but please remember that the *primary* driver for the patch is to allow easy access to this information to visually impaired users.

As such, maintaining a separate file with the simplest possible syntax means that the user can easily create a file in their editor of choice and deploy it to the .rockbox directory. I think this also applies to editing a setting in place; I'm not visually impaired but - with no disrespect to the authors - find the built-in text editing awkward to use.

Including these definitions somehow in the WPS raises the bar in terms of accessibility. If anything, the feedback I've had so far is to make the format even simpler - it's currently case-sensitive with an eye towards a possible 52 fabulous tokens - and that's identified as an issue for some screen-readers.

Secondly, WPS development seems to be very active and that "feels" like it might make it much more difficult to get in. And changes that then break skin compatibility are likely to be a bigger deal for the main target audience. (although reading it back, I think JDG is proposing "in addition"?)

Last, this only looks at the WPS hotkey. There are obvious applications for a browser hotkey to do something similar on a playlist / directory basis. Then I think it has to live in a separate config setting / file anyway and it would seem appropriate to make the two consistent.

Comment by Jonathan Gordon (jdgordon) - Monday, 16 August 2010, 09:59 GMT
yes, sorry for going overboard :)
I still think putting it directly into the config.cfg and global_settings is better than a separate file; after all, the settings code needs to know what file to try to read.
Comment by alex wallis (alexwallis646) - Monday, 16 August 2010, 10:10 GMT
Hi, just to come in on this.
Checking case of letters isn't a problem with screen readers.
My reason for suggesting case insensitivity is simply that Windows commands are not case sensitive. When I read the patch description I was reading it as a description rather than going through it closely, so I didn't pick up on the fact that the letters had to be in upper case. When you are just reading, let's say, a long document, screen readers normally do not report the case of letters unless you go through it letter by letter or set the reader to announce case.

Case sensitivity is not a problem, provided that in whatever form this is eventually committed, that the fact the tags are case sensitive is specifically put in the manual entry so that if someone with a screen reader is reading it they will actually know that case is important.
Personally I think we should stick with case sensitivity to get the maximum amount of flexibility.
Comment by Sean Inglis (seani) - Monday, 16 August 2010, 10:17 GMT
@JDG - understood, I'll put adding a property with a simple custom editor on the todos

@alex - my bad, I misunderstood, I'll put a manual entry on the list as well.
Comment by Sean Inglis (seani) - Tuesday, 17 August 2010, 22:32 GMT
Changes:


Added a menu option under Settings -> General Settings -> Voice that allows the format to be edited.

Format now saved under "wps announcement format" in standard config rather than separate file.

Tokens are now two chars in size and case insensitive. This allows a bit of grouping, more combinations, and saves having to flip between pages of characters when editing the format string on the dap.

For similar reasons, "." is now used as separator to keep everything on one page


Token list:


Prefix A: Date and time

Aa = Time
Ab = Date
Ac = "TIME" Time
Ad = "DATE" Date
Ae = Date Time
Af = sleeptimer remaining
Ag = "SLEEPTIMER" sleeptimer remaining


Prefix B: Track information

Ba = elapsed time
Bb = track length
Bc = remaining time
Bd = elapsed time "ELAPSED" remaining time "REMAINING"
Be = elapsed time "OF" track length


Prefix C: Playlist information

Ca = current track
Cb = number of tracks
Cc = tracks remaining
Cd = "TRACK" current track "OF" number of tracks


Prefix D: Battery and sleeptimer

Da = battery level percentage
Db = battery level in minutes
Dc = "BATTERY TIME" battery level percentage
Dd = "BATTERY TIME" battery level in minutes


Prefix S: Various prefixes, suffixes, connectives etc.

Sa = "TIME"
Sb = "DATE"
Sc = "TRACK"
Sd = "ELAPSED"
Se = "REMAINING"
Sf = "OF"
Sg = "BATTERY TIME"
Sh = "SLEEP TIMER"
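
A case-insensitive lookup for the two-character tokens listed above might look like the following C sketch. It is illustrative only and is not taken from the patch: the table holds just a handful of the listed tokens, and the names `announce_token`, `announce_tokens` and `announce_lookup` are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical table entry: a two-character code and what it voices. */
struct announce_token {
    const char code[3];   /* stored in lower case */
    const char *meaning;
};

/* A few entries copied from the token list; the full list is longer. */
static const struct announce_token announce_tokens[] = {
    { "aa", "time" },
    { "ab", "date" },
    { "ba", "elapsed time" },
    { "bb", "track length" },
    { "ca", "current track" },
    { "da", "battery level percentage" },
    { "sf", "\"OF\"" },
};

/* Look up the token at the start of two_chars, ignoring case.
 * Returns NULL for an unknown or truncated token. */
const char *announce_lookup(const char *two_chars)
{
    size_t i;
    char a, b;

    if (two_chars == NULL || two_chars[0] == '\0' || two_chars[1] == '\0')
        return NULL;

    a = (char)tolower((unsigned char)two_chars[0]);
    b = (char)tolower((unsigned char)two_chars[1]);

    for (i = 0; i < sizeof announce_tokens / sizeof announce_tokens[0]; i++) {
        if (announce_tokens[i].code[0] == a && announce_tokens[i].code[1] == b)
            return announce_tokens[i].meaning;
    }
    return NULL;
}
```

Because both characters are folded to lower case before comparison, "Aa", "AA" and "aa" all resolve to the same entry. A full parser would walk the format string two characters at a time and start a new announcement group at each "." separator.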
Comment by Marek Salaba (salaba) - Saturday, 21 August 2010, 12:39 GMT
This is a great function for blind and visually impaired users.
Please integrate it.
Salaba
Comment by Sean Inglis (seani) - Saturday, 21 August 2010, 17:03 GMT
Mark, can I ask, have you tried this out in a patch yourself or are you guessing it would be useful?
Comment by Sean Inglis (seani) - Saturday, 21 August 2010, 17:03 GMT
Oops, I mean Marek, not Mark.
Comment by Jonathan Gordon (jdgordon) - Sunday, 22 August 2010, 01:13 GMT
I don't like this latest patch. I fail to understand how a 2 letter combo with no semblance to what they each link to could possibly be easy for someone (blind or otherwise) to update without the docs (which means at the computer).

If you are going to do 2 letter combos then you absolutely must use the same combos as the skin code, preferably using the actual skin files also.

doing it that way would allow something like this: %?pS(.5)<%Tk(%ia, -, %in, -, %it)> which would talk (the %Tk() tag) the track artist, pause, track number, pause, track title when the song changes. Yes it might be more complex but far more useful, and I don't believe that someone wouldn't have someone to help out if needed (especially when we have the forum and irc)
Comment by Sean Inglis (seani) - Sunday, 22 August 2010, 01:53 GMT
I disagree.

1) Linking the announcement to the WPS is unnecessary. The announcement happens to be made from the WPS. It isn't part of the WPS and doesn't need to be.

2) The format you suggest will be significantly more difficult for someone blind or visually impaired to parse and amend.

3) Allowing someone to amend the announcement configuration from memory isn't a goal for this patch, and I don't agree that making any significant effort to allow them to do so is worthwhile.

Maybe there are people happily creating themes on their DAP, using the built-in text editor plugin and recalling the WPS tags from memory away from their PC, who knows? But they have to be in an "enthusiastic" minority. I can see no justification for compromising ease of use for the *main* target audience to make that very narrow use case any easier.


If you have suggestions for two letter combinations that you feel better reflect the intention of the tokens I'm allowing access to here, by all means suggest them.
Comment by Jonathan Gordon (jdgordon) - Sunday, 22 August 2010, 02:05 GMT
1) well, yes and no

2,3) either way you need to be at a computer to change what is being said which means you can get help. I never said anything about making it easier to update on the dap, just that whatever magic letters are used they need to be with docs to be understood.

Yes, using the skin engine is more complicated (which I don't really see as a massive issue when there is always someone around who can help) but it is far more useful
Comment by Sean Inglis (seani) - Sunday, 22 August 2010, 09:14 GMT
1) Then we're agreed that needing a reference to the tokens in question is a non-issue, and that being at a computer is a non-issue.

Except that, for the target audience, actually editing and amending the announcement will be far easier in this format than in a WPS format of the type suggested.

I believe this will be particularly true when I've completed the accessibility changes for the vkeyboard. Current WPS tokens bear no semblance to what they link to. They are familiar (for some people), not intuitive. And they won't be particularly familiar to the target audience.

2) Totally agree on the documentation. The feedback I've had so far stresses the importance of unambiguous documentation for blind users. My aim is Blind FAQ on the wiki, then manual.

3) I don't agree that using the skin engine is far more useful for this use case. I do agree that it's more complicated, and for no benefit. If there was general TTS on the DAP as opposed to the semi-static announcements supported, I'd probably have more to agree with on that point. But that isn't currently the case.


Looking at the history, there have been proposals to add TTS to the core since at least 2009 and laudable efforts ongoing to do it now. If that's achieved for required targets, it might be worth revisiting.


But I stress again that the purpose of this patch is to allow access for visually impaired users to key information on demand, in a format they find useful, not to add generalised voice support to the WPS.
Comment by Sean Inglis (seani) - Sunday, 22 August 2010, 09:17 GMT
When I say "what they link to" I meant "intuitive link to actual information".

But I've looked again at the tokens and I'm dead wrong on that score, the first letter grouping *is* linked more logically, my mistake.

I'll revisit the actual two letter codes used in the patch.
Comment by Marek Salaba (salaba) - Sunday, 22 August 2010, 09:27 GMT
Sean, I cannot test it before it is released in SVN. I can test it afterwards, because I work with blind people and have many blind friends and clients. I am slightly visually impaired myself and work at Czech Blind United as a network supervisor and IT technician. I am not an experienced programmer, I'm just starting out. I believe this function could certainly be very useful for us.

I agree with Gordon that the letter abbreviations in this patch are not entirely clear. I think they should hint at their meaning, but I certainly would not use the same ones as in the WPS.

It would be practical to use something like the following scheme:

Prefix D: Date and time

Dt = Time
Dd = Date
DT = "TIME" Time
DD = "DATE" Date
D = Date Time
Ts = sleeptimer remaining
TS = "SLEEPTIMER" sleeptimer remaining


Prefix T: Track information

Te = elapsed time
Tl = track length
Tr = remaining time
T = elapsed time "ELAPSED" remaining time "REMAINING"
To = elapsed time "OF" track length


Prefix P: Playlist information

Pc = current track
Pn = number of tracks
Pr = tracks remaining
P = "TRACK" current track "OF" number of tracks


Prefix B: Battery and sleeptimer

Bp = battery level percentage
Bm = battery level in minutes
BP = "BATTERY TIME" battery level percentage
BM = "BATTERY TIME" battery level in minutes


Prefix V: Various prefixes, suffixes, connectives etc.

Vt = "TIME"
Vd = "DATE"
Vt = "TRACK"
Ve = "ELAPSED"
Vr = "REMAINING"
Vo = "OF"
Vb = "BATTERY TIME"
Vs = "SLEEP TIMER"
Comment by Sean Inglis (seani) - Sunday, 22 August 2010, 09:49 GMT
Marek,


I'd have to say I've moved closer towards JD's position on the actual tokens used. There is certainly some link between the first letter grouping the tokens, for instance:


Prefix C = Clock

Prefix P = Playlist - track number etc.

Prefix B = Battery and charging


and the others are decent stabs at a link of some kind to the type of information.


At the moment, I'm thinking of using those prefixes with some additions for canned phrases and connectives.
Comment by alex wallis (alexwallis646) - Sunday, 22 August 2010, 10:16 GMT
Hi.
I agree partially with Sean.

I can't see why this patch needs to be connected with the wps at all; it's about giving quick access to status information on the player, not adding new tags to the wps.

However, I can totally understand Jonathan's point about the tokens needing to be more intuitive to work with. I think if possible they should be based on the letters of what they will voice, so for example dt could be date time, and bp could be battery percentage.

On the other hand, Jonathan, I am interested in what you mean when you say you think linking it with the wps could be more useful,
what things could be voiced if the patch was linked to the wps in the way you suggest?
I do think that the wps tags are not very intuitive at the moment, But I am always interested in the possibility of making the voice more flexible, so I would be interested to know what else could be voiced.

Personally I think if the only benefit you would get from linking the patch to the wps is the ability to voice artist names and that kind of thing the benefit is far outweighed by the complexity of the wps tags, but it really depends on what else this would allow you to voice.
Perhaps this patch could be treated as one particular feature, and having some kind of voice tags in the wps could be another task.
Just my thoughts.
Comment by Steve Bavin (pondlife) - Monday, 23 August 2010, 07:26 GMT
The feature doesn't need to be linked to the WPS per se, but I'd much prefer a "read screen" function that voiced whatever skin is currently displayed (i.e. SBS/WPS/FMS). We already have a well documented set of tags there (plus a parser already built in).

Most importantly IMHO it's good to unify user interfaces, rather than forking.
Comment by Sean Inglis (seani) - Monday, 23 August 2010, 10:21 GMT
@pondlife

I can understand that sighted users would like, as a convenience, an attempt at voicing the current WPS. A moment's reflection will show that this is fraught with complexity and (sometimes arbitrary) design choices. ISTR that this (generic voice rendering of the WPS) has been suggested before, some time ago, and so far it hasn't happened. It's worth reflecting on why that is.


The patch isn't meant to voice the current WPS. It's meant to announce information available when a hotkey is pressed from the WPS context. In the fullness of time, I'd hope to extend this to the browser, but I won't be doing that in the same patch.

Where should the format for announcements be stored then?


This patch is *specifically* for blind or visually impaired users to get access to information on demand, by pressing a hotkey. This information can be tailored to a small extent, and expanded on with subsequent presses.

The format is designed to be as easy to manage as possible for the *target* audience. Compromises to allow a sighted audience marginally greater familiarity with the format are not a goal, and don't need to be.

For this audience, simplicity, stability and consistency are key. I admire the work done on the WPS and its flexibility, but these three qualities could not be considered hallmarks of the code as it stands.

I'll change the tags to follow the 2 character WPS format where it's appropriate to do so, and where one is available, but I won't be making any effort to fold the code into the WPS. It doesn't belong there, and indeed there are positive advantages in maintaining separation from the WPS code both in terms of source contention and "psychological" distance.
Comment by Jonathan Gordon (jdgordon) - Monday, 23 August 2010, 10:24 GMT
"A moment's reflection will show that this is fraught with complexity and (sometimes arbitrary) design choices. ISTR that this (generic voice rendering of the WPS) has been suggested before, some time ago, and so far it hasn't happened. It's worth reflecting on why that is." I can tell you exactly why that is.
Until very recently the parser was fairly simple: if a tag wanted to have other tags as what we call params, it basically had to reimplement the parser to work. With the new one we can do an incredibly simple (but very usable) skin talking patch in maybe a few dozen lines (which I might just do now to prove :D )
Comment by Sean Inglis (seani) - Monday, 23 August 2010, 10:38 GMT
Sounds like it will be a good start on the voicing WPS patch that @pondlife raises above. I'm unconvinced it should be a replacement for this patch.
Comment by Steve Bavin (pondlife) - Monday, 23 August 2010, 10:59 GMT
My argument is absolutely not about sighted vs. visually impaired. Quite the opposite, it's about having one UI to support and not dividing the community. If the display UI and voice UI diverge then it may be harder to maintain and get better voice support in general.

Whilst it's a fair bit of work, I don't see any fundamental reason why each WPS tag couldn't have voice support, even if it involves more spelling than I'd like. I don't see where the "compromise" comes in. Is it that the WPS syntax is awkward to use with a screenreader, or that you'd want different details displayed?

(If the standard WPS tags are used, then there's always the possibility of improving/adding accessibility to the WPS editor.)

Disclosure - I'm sighted, but with poor eyesight and use the voice UI all the time. I very much want the non-voiced areas to get full voice support. I'd particularly like to have a hotkey option which would voice whatever is currently on-screen, including the SBS and WPS. (For a list screen it would probably just voice the selected item; not sure reading the entire list is sensible!)

Either way, it's your patch and you're the one putting the work in, so do what you will. I'm only speaking up in the hope it will help long-term support of the voice UI, which is (to me) the #1 feature of Rockbox.
Comment by Sean Inglis (seani) - Monday, 23 August 2010, 11:29 GMT
I have a few principal objections and you correctly allude to them above:

1) The syntax for announcing information is considerably more complex. The driver behind simplifying it is to make it accessible for editing, and remove distractions. I take your point that a theme editor with improved accessibility should address that concern. I'm pretty sceptical that it would get done in any reasonable timeframe, and I wouldn't be the one to do it. I hope you could accept that isn't sour grapes, just an honest reflection of the discretionary time I have available.

2) The patch is meant as a first step to a similar patch for browsing playlists etc, so a companion hotkey would be added. Where should that definition live? To me, the natural grouping is that these are on demand announcements with varying context; grouping with the WPS is artificial


I don't make any apologies for aiming it squarely at visually impaired users as a priority. I don't think the "compromise" - of having to maintain a separate setting where the syntax isn't necessarily identical - is a consideration.


Part of the driver for implementing the patch in this way - as a separate discrete function with a very modest impact - is to attempt to get it adopted and prove useful in a reasonable timeframe. It's pretty clear to me that this approach isn't going to work.

The reason for the patch is to improve accessibility. I personally won't use it, so although I've tried to think through how it would operate in some detail, I've asked for feedback on the way it works at the moment and it's been informed by that feedback.

If the consensus is that it can be used as a way of applying pressure to improve the general accessibility, and that adopting it would negatively affect a long-term goal, I certainly have an opinion on the wisdom of that approach, but it's not something I feel I have a "vote" on.


So I think I'll bow out and leave it as it is. I'm certainly willing to give feedback on the WPS voice tags if anyone feels it would be useful.
Comment by Sean Inglis (seani) - Monday, 23 August 2010, 11:59 GMT
@JDG

As a follow up on the "how do we define WPS voice tags":


1) How would you add voiced prefixes, suffixes and connectives?

As it stands, I use a set of tags to link to voice entries for the likes of "TRACK", "REMAINING" etc. rather than spelling them. I also added a few entries to the LANG file to generate definitions I thought were required, so you might need to do that.

I was considering @A @B @C etc. as the tokens to represent pre-generated entries like this, and just adding to the hard-coded list if any cropped up.

I was also considering adding #A #B #1 #2 etc. to indicate that a specific character should be spelled out at that point in the announcement.



2) The patch allows differing announcements to be made, depending on how many times the hotkey has been pressed. If a press is made within a timeout period, you get the next defined announcement. If it's after the timeout period, you start from scratch.

This allows the most common announcement to be specified as the first entry, but access to other information without "cluttering up" the common one. You might want to consider doing something similar (some sort of alternating subline-esque syntax?).
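
The press-count behaviour in point 2 can be sketched as a small state machine in C. This is a hedged illustration rather than the patch's code: `announce_state` and `announce_next_index` are hypothetical names, the 10-second value comes from the task description, and both the wrap-around after the last announcement and the exact boundary handling at the timeout are assumptions.

```c
#include <stdbool.h>

#define ANNOUNCE_TIMEOUT_SECS 10 /* value given in the task description */

/* Hypothetical state kept between hotkey presses. */
struct announce_state {
    bool pressed_before;   /* false until the first press */
    long last_press_secs;  /* time of the previous press, in seconds */
    unsigned index;        /* announcement chosen by the previous press */
};

/* Return the 0-based announcement to voice for a press at now_secs.
 * A press inside the timeout advances to the next announcement;
 * a press after the timeout starts again from the first one. */
unsigned announce_next_index(struct announce_state *st, long now_secs,
                             unsigned group_count)
{
    if (group_count == 0)
        return 0;

    if (!st->pressed_before ||
        now_secs - st->last_press_secs >= ANNOUNCE_TIMEOUT_SECS)
        st->index = 0;
    else
        st->index = (st->index + 1) % group_count; /* wrap: an assumption */

    st->pressed_before = true;
    st->last_press_secs = now_secs;
    return st->index;
}
```

Keeping the selection logic separate from the voicing code means the same counter could serve a browser hotkey later, with only the list of announcements differing.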
Comment by alex wallis (alexwallis646) - Monday, 23 August 2010, 13:52 GMT
Hi, I am in favour of voicing the wps if possible, I like the approach that this patch adopts with being able to flexibly set what is and isn't announced.

speaking for myself I find it extremely useful being able to get basic status information on the player for example the battery with just a few key presses, rather than having to mess around going through all the menus just to get it and even then when I did that I would have to listen to a huge long message with the info as both a percentage and remaining time. This patch greatly speeds up getting status information.

It has been quite frustrating for me not being able to get status information on elapsed and remaining time of a track or playlist. Of course I haven't even had access to status information on the sleep timer before.

I think if a serious attempt is going to be made to voice the wps, it should have an option to use the hotkey for scrolling through the wps information as this patch does, and perhaps even make it so you can customise what you want spoken in a particular wps and in what order.
It is important to minimise the number of key presses needed to get information, because you can't pause a file and still use speech.
The big thing I like with this patch is the flexibility it offers.

As for building in accessibility to the theme editor, that's a possibility, but does it use qt? As of course rbutil suffers from using qt which isn't very friendly with some screen readers.

The wps tags are not too friendly with screen readers, as I believe they use a lot of punctuation signs such as %. In order to work with them, you would probably have to set your screen reader to read all punctuation symbols, and then to understand a string of tags, you would probably have to arrow through one letter or word at a time from left to right.
You couldn't just scroll down a whole line, as it wouldn't make much sense; it would just sound like a meaningless string of letters.
What I am saying is it would probably take a while to interpret a given list of tags, as at least at first you would have to keep flipping between what you wanted to understand and a listing of all the tags.

At least with Sean's patch, I don't have to worry about stuff I don't need to understand such as colours or layout. I can just give a list of tags, and know they are only applicable to what is voiced.
Comment by Jonathan Gordon (jdgordon) - Monday, 23 August 2010, 14:03 GMT
FS#11564 is my first attempt at making the talking wps, and this is the last I will say about it in this thread.
Sean, I don't see any technical reason for this patch not to build on top of the other one (making it mostly an accessibility patch). This patch could easily set up a bunch of canned voice "sentences" which can be loaded and talked on the hotkey press outside of the skin system (but still use it to do the actual talking).

Any special connectors, strings, etc would obviously be more useful in the other patch where everyone would be able to use them, but they could be just as easily available here.

One possible solution might be to have a separate file with a list of "name: sentence" (which could be shipped in the rockbox zip) and then the announce global_settings var might be just "a:b" which would be "talk the a sentence the first time, and b the second" (where sentences here would be valid skin syntax, i.e. %Tx(%cl, %cM) )
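
Reading one "name: sentence" line from such a file could be sketched in C as below. No file format was actually specified in this thread, so everything here is an assumption for illustration: the function name `parse_sentence_line` is hypothetical, the name is taken to end at the first ":", and leading whitespace before the sentence is skipped.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical parser for one "name: sentence" line.
 * Splits at the first ':' and skips blanks before the sentence.
 * Returns 0 on success, -1 if the line is malformed or a buffer is too small. */
int parse_sentence_line(const char *line,
                        char *name, size_t name_size,
                        char *sentence, size_t sentence_size)
{
    const char *colon = strchr(line, ':');
    const char *text;
    size_t name_len, text_len;

    if (colon == NULL || colon == line)
        return -1; /* no separator, or empty name */

    name_len = (size_t)(colon - line);
    text = colon + 1;
    while (*text == ' ' || *text == '\t')
        text++;
    text_len = strcspn(text, "\r\n"); /* stop at the end of the line */

    if (name_len >= name_size || text_len >= sentence_size)
        return -1;

    memcpy(name, line, name_len);
    name[name_len] = '\0';
    memcpy(sentence, text, text_len);
    sentence[text_len] = '\0';
    return 0;
}
```

For the line "a: %Tx(%cl, %cM)" this yields the name "a" and the sentence "%Tx(%cl, %cM)", which would then be handed to the skin engine for voicing. The "a:b" setting would select names in press order, much as the ":" groups do in the original format.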
Comment by Frank Gevaerts (fg) - Sunday, 12 December 2010, 21:29 GMT
Synced to r28818 (untested, and no opinion about whether or not we want this)