Rockbox mail archive
Subject: Re: Voice patches
From: Stéphane Doyon <s.doyon_at_videotron.ca>
Date: Fri, 05 Oct 2007 23:21:30 -0400 (EDT)
On Thu, 4 Oct 2007, Daniel Dalton wrote:
> On 28/09/2007 1:03 AM, Stéphane Doyon wrote:
>> I have a lot more ideas on making Rockbox's voice interface more usable,
> Could you tell us some of your ideas?
Well, I have some 15 or 20 patches languishing in the tracker, so, as I
said before, I don't think there's much point in adding more until I see
progress on some of this.
P7563, P6325 and P6171 should be ready for inclusion IMO.
And kbd-accessible.diff from P6324 as well.
I'm happy with P7774 and P7775 too, although perhaps a bit wider testing
would be a good idea.
I could use advice on P6239: I need to enqueue multiple thumbnails (.talk
clips). I suppose it wouldn't be acceptable to just declare several more
buffers. I'd rather not make this part more complex by adding another
thread and reading files from another context. Right now I'm using a
hack... but I'd welcome suggestions.
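One approach that avoids declaring several more buffers is to pack all the queued clips into a single buffer and keep a small FIFO of (offset, length) descriptors into it. This is offered as a suggestion, not something decided in this thread. The sketch below is a standalone model under that assumption. `clip_queue`, `clip_enqueue()`, `clip_dequeue()`, `CLIP_BUF_SIZE` and `MAX_QUEUED_CLIPS` are hypothetical names, not Rockbox APIs. Reading the .talk files stays with the caller, in the caller's own context.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sizes; real code would reuse the existing voice buffer. */
#define CLIP_BUF_SIZE    4096
#define MAX_QUEUED_CLIPS 8

struct clip_desc {
    size_t offset;  /* where the clip starts inside buf */
    size_t len;     /* clip length in bytes */
};

struct clip_queue {
    unsigned char buf[CLIP_BUF_SIZE];     /* one buffer shared by all clips */
    size_t used;                          /* bytes handed out so far */
    struct clip_desc q[MAX_QUEUED_CLIPS]; /* ring of clips still to play */
    int head, count;
};

/* Forget all queued clips and reclaim the whole buffer, e.g. once the
 * sequence has been spoken or was interrupted. */
void clip_queue_reset(struct clip_queue *cq)
{
    cq->used = 0;
    cq->head = 0;
    cq->count = 0;
}

/* Copy one clip (e.g. a .talk file the caller has read) into the shared
 * buffer and queue it. Returns 0 on success, -1 if it doesn't fit. */
int clip_enqueue(struct clip_queue *cq, const void *data, size_t len)
{
    assert(cq != NULL && data != NULL);
    if (cq->count == MAX_QUEUED_CLIPS || len > CLIP_BUF_SIZE - cq->used)
        return -1;
    int tail = (cq->head + cq->count) % MAX_QUEUED_CLIPS;
    memcpy(cq->buf + cq->used, data, len);
    cq->q[tail].offset = cq->used;
    cq->q[tail].len = len;
    cq->used += len;
    cq->count++;
    return 0;
}

/* Return the next clip to play (and its length), or NULL if none.
 * The data stays valid until clip_queue_reset() is called. */
const unsigned char *clip_dequeue(struct clip_queue *cq, size_t *len)
{
    assert(cq != NULL && len != NULL);
    if (cq->count == 0)
        return NULL;
    const struct clip_desc *d = &cq->q[cq->head];
    cq->head = (cq->head + 1) % MAX_QUEUED_CLIPS;
    cq->count--;
    *len = d->len;
    return cq->buf + d->offset;
}
```

Space is only reclaimed by clip_queue_reset(). That keeps the bookkeeping trivial, but it means space is not reused in the middle of a sequence.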
Depending on that, talk_file and playlist_catalog from P6323 are ready,
as well as P6240: improved feedback in bookmark selection. I've had
several positive comments about that one: it's useful, and people like
it. It could use testing on HWCODEC, if there are any HWCODEC voice users.
P7653 is also good, although perhaps the setting I added there is
overkill and the feature should simply always be on.
P6331 and P7777 are also ready.
Still, I guess that's a lame answer to your question. About other ideas...
-Of course there are the low-level issues I'd like to work on: the ability
to speak while paused, interrupting voice quickly while music is playing,
and a cleaner interface between talk.c and playback.c. A quick one would
be to have mp3_play_stop() do the same as voice_stop(); that would
eliminate some stuttering. And what voice_stop() does should perhaps be a
shutup() call from playlist_start(). But there's a lot of higher-level
stuff we can do without depending on this.
-I'm considering implementing an alternate quick screen for blind users:
something to put a few key functions closer at hand. We would like a
quick way to have the time spoken; the Rockbox info menu is just too far
away. I'd like to put in a function that temporarily overrides file and
dir .talk clips, for those times when you can't make out what the
synthesizer was saying in a .talk clip because it's a weird band name
you haven't heard before, and you need it spelled out just this
once. Also a quick way to adjust volumes when browsing files or
menus, and perhaps a hotkey to toggle tracklock / study mode (P6188).
-Voice memo recorder functionality: the recording screen is meant more
for elaborate music recording jobs. I'd like a context where I can be
sure what's going on with little or no feedback. There's the issue of
voice being disabled during recording, but beyond that, I'd like to be
able to record a quick memo on my player without having to take out my
earphones (and put them away again afterwards). It could be as simple as
a button context that records only while you hold the RECORD key. In any
case, avoid having one button toggle start/pause, or one button that does
both STOP and exit, so that if you're not confident your keypress went in,
you can always press it a second time. Then find ways to make a collection
of memos easier to manage: it's nice having the date in the filename, but
it's unmanageable when spelled out. Add a context menu function to allow
on-the-spot recording of a .talk clip associated with a particular file or
dir. Eventually, perhaps implement cut-and-paste editing of an audio file,
as long as it's uncompressed.
-Infrastructure to load secondary voice files. Use that to make plugins
talk, without having to increase the size of the main voice file.
-Increase playback speed without affecting pitch. I'm not starting from
scratch here: I have something more or less working; only the integration
into Rockbox is still somewhat rough.
-Make some basic WPS info accessible, at least the time position and track
duration, perhaps in a list like the id3 screen.
-Coarse navigation function: I once had an entire audiobook in a single
track lasting 24 hours. It would be nice to be able to jump by coarse
increments like 5 minutes, 30 minutes or 3 hours, then jump to within
10 seconds of the end of the track, and perhaps make proportional jumps
like 10% of the track. Part of the problem is that fast-forward/rewind
gives no progress feedback at all for blind users, because of the issue
with pausing. But beyond that, a tool to move around big tracks might be
useful.
-A kind of voice database: the idea is partly from Mario Lang. The metadata
spoken in the database browser or id3 screen is always spelled out, which is
much slower. What if we preprocessed all files on the host computer,
extracted all metadata tag text, had it all voiced similarly to what we do
for .talk clips, and put all that in some sort of mini-database: perhaps
just a big blob with some kind of hash index, or perhaps even each entry in
a separate file? The idea is to have the voiced audio data indexed by the
string being spoken. These clips could then be loaded on demand and cached.
We could try to mitigate disk access delays by keeping statistics on the
most often used tags and preloading those in one pass. I'd need to
experiment to find out whether this is feasible. If it does work, it would
be useful beyond metadata tags. OTOH, if the espeak plugin works and its
response time is good, perhaps this is really not needed.
-See if we could implement DAISY, or a subset of it, by preprocessing the
DAISY on the host machine and coming up with a cue sheet for each DAISY
level. We'd need to have a playlist backed by multiple cue sheets and the
ability to switch between them. I'm not entirely sure this makes sense;
I'd need to look into it more seriously.
-Spontaneous battery level warning, say at 50% and 20%.
-Make some plugins talk, at least those called from core.
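For the voice memo item in the list above, the key handling can be modelled as a tiny state machine. In it, every key event leads to a fixed target state instead of toggling, so pressing a key a second time when unsure is harmless. This is a minimal sketch that assumes hold-to-record plus a separate STOP key. The state and event names are hypothetical.

```c
#include <assert.h>

enum memo_state { MEMO_IDLE, MEMO_RECORDING, MEMO_PAUSED };

/* Hypothetical events: RECORD pressed/released (hold to record) and a
 * separate STOP key. */
enum memo_event { EV_RECORD_PRESSED, EV_RECORD_RELEASED, EV_STOP };

/* Each event leads to a fixed state instead of toggling, so repeating an
 * event whose keypress you weren't sure about changes nothing. */
enum memo_state memo_handle(enum memo_state s, enum memo_event ev)
{
    switch (ev) {
    case EV_RECORD_PRESSED:
        return MEMO_RECORDING;  /* record while the key is held */
    case EV_RECORD_RELEASED:
        return s == MEMO_RECORDING ? MEMO_PAUSED : s;
    case EV_STOP:
        return MEMO_IDLE;       /* a real recorder would close the file here */
    }
    return s;
}
```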
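For the secondary voice files item, one possible scheme is to reserve a range of clip IDs for a plugin's own voice file. The main voice file's IDs stay untouched, and the plugin file only needs to be loaded while the plugin runs. This is an assumption, not existing Rockbox behaviour. `SECONDARY_BASE` and the functions below are hypothetical, and strings stand in for clip data.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical split: IDs below SECONDARY_BASE index the main voice file,
 * IDs from SECONDARY_BASE upward index the loaded plugin voice file. */
#define SECONDARY_BASE 0x8000u

struct voice_table {
    const char *const *clips; /* stand-in for clip data, one per ID */
    size_t count;
};

static struct voice_table main_voice;
static struct voice_table plugin_voice; /* count == 0 when nothing loaded */

void voice_set_main(const char *const *clips, size_t count)
{
    main_voice.clips = clips;
    main_voice.count = count;
}

/* Called when a plugin starts. */
void voice_load_secondary(const char *const *clips, size_t count)
{
    plugin_voice.clips = clips;
    plugin_voice.count = count;
}

/* Called when the plugin exits. */
void voice_unload_secondary(void)
{
    plugin_voice.clips = NULL;
    plugin_voice.count = 0;
}

/* Resolve a clip ID; NULL if the ID isn't present in the relevant file. */
const char *voice_lookup(unsigned id)
{
    const struct voice_table *t = &main_voice;
    if (id >= SECONDARY_BASE) {
        t = &plugin_voice;
        id -= SECONDARY_BASE;
    }
    return id < t->count ? t->clips[id] : NULL;
}
```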
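For the faster-playback item, the simplest textbook technique is plain overlap-add (OLA). Windowed frames are read from the input at a larger hop than they are written to the output. Duration shrinks, while each frame's waveform, and hence its pitch, is copied unchanged. This sketch only illustrates that technique; it is not the implementation mentioned in the list. Plain OLA produces audible phasing artifacts, which variants such as SOLA/WSOLA reduce by aligning each frame on a cross-correlation peak. `FRAME`, `HOP_OUT` and `ola_stretch()` are hypothetical, and a real Rockbox version would likely use fixed-point arithmetic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define FRAME   256          /* frame length in samples (hypothetical) */
#define HOP_OUT (FRAME / 2)  /* output hop: frames overlap by 50% */

/* Plain overlap-add time compression (no pitch change). Frames are read
 * every hop_in = round(speed * HOP_OUT) input samples and added to the
 * output every HOP_OUT samples. A triangular window is used because, at
 * 50% overlap, it sums to 1, so steady signals keep their level. Returns
 * the number of output samples, or 0 if the input is too short, out_cap
 * is too small, or memory runs out. */
size_t ola_stretch(const float *in, size_t in_len, double speed,
                   float *out, size_t out_cap)
{
    assert(in != NULL && out != NULL && speed > 0.0);
    size_t hop_in = (size_t)(HOP_OUT * speed + 0.5);
    if (hop_in == 0 || in_len < FRAME)
        return 0;
    size_t frames = (in_len - FRAME) / hop_in + 1;
    size_t out_len = (frames - 1) * HOP_OUT + FRAME;
    if (out_len > out_cap)
        return 0;

    /* Accumulated window weight per output sample, used to normalise
     * the edges where fewer frames overlap. */
    float *wsum = calloc(out_len, sizeof *wsum);
    if (wsum == NULL)
        return 0;
    for (size_t i = 0; i < out_len; i++)
        out[i] = 0.0f;

    for (size_t k = 0; k < frames; k++) {
        const float *src = in + k * hop_in;
        size_t base = k * HOP_OUT;
        for (size_t n = 0; n < FRAME; n++) {
            float w = n < FRAME / 2 ? (float)(2 * n) / FRAME
                                    : (float)(2 * (FRAME - n)) / FRAME;
            out[base + n] += w * src[n];
            wsum[base + n] += w;
        }
    }
    for (size_t i = 0; i < out_len; i++)
        if (wsum[i] > 1e-6f)
            out[i] /= wsum[i];
    free(wsum);
    return out_len;
}
```

With speed = 2.0, a 10240-sample input yields 5248 output samples, roughly half the original duration.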
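The coarse navigation jumps in the list above come down to clamped arithmetic on the track position. This is a minimal sketch that assumes positions in milliseconds and uses hypothetical function names. 24 hours is 86,400,000 ms, which still fits in a 32-bit `long`.

```c
#include <assert.h>

/* Hypothetical coarse-seek helpers; positions are in milliseconds. */

/* Jump by a signed offset (e.g. +5 min, -30 min), clamped to the track. */
long seek_relative(long pos, long offset, long duration)
{
    assert(duration >= 0);
    long target = pos + offset;
    if (target < 0)
        return 0;
    if (target > duration)
        return duration;
    return target;
}

/* Proportional jump: go to `percent` (0..100) of the track. */
long seek_percent(long duration, int percent)
{
    if (percent < 0)
        percent = 0;
    if (percent > 100)
        percent = 100;
    return (long)((long long)duration * percent / 100);
}

/* Jump to within `margin` ms of the end of the track. */
long seek_near_end(long duration, long margin)
{
    return duration > margin ? duration - margin : 0;
}
```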
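For the voice database item, the lookup side could be a hash table keyed by the spoken string, with entries pointing at clip data inside one big blob. The thread doesn't settle on a hash or a layout. 32-bit FNV-1a with linear probing is used here only as a common, simple choice. `index_put()`, `index_get()` and the table size are hypothetical. Entries keep the caller's string pointer rather than a copy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define INDEX_SLOTS 64  /* power of two; hypothetical size */

struct voice_entry {
    const char *text;   /* voiced tag text (pointer kept, not copied); NULL = empty */
    uint32_t offset;    /* where its clip starts in the blob */
    uint32_t length;    /* clip length in bytes */
};

static struct voice_entry index_tab[INDEX_SLOTS];

/* 32-bit FNV-1a hash of a NUL-terminated string. */
uint32_t fnv1a(const char *s)
{
    uint32_t h = 2166136261u;
    while (*s) {
        h ^= (unsigned char)*s++;
        h *= 16777619u;
    }
    return h;
}

/* Insert or update with linear probing; returns 0 on success, -1 if full. */
int index_put(const char *text, uint32_t offset, uint32_t length)
{
    uint32_t i = fnv1a(text) & (INDEX_SLOTS - 1);
    for (int probes = 0; probes < INDEX_SLOTS; probes++) {
        struct voice_entry *e = &index_tab[i];
        if (e->text == NULL || strcmp(e->text, text) == 0) {
            e->text = text;
            e->offset = offset;
            e->length = length;
            return 0;
        }
        i = (i + 1) & (INDEX_SLOTS - 1);
    }
    return -1;
}

/* Find the clip for a string; NULL if that string was never voiced. */
const struct voice_entry *index_get(const char *text)
{
    uint32_t i = fnv1a(text) & (INDEX_SLOTS - 1);
    for (int probes = 0; probes < INDEX_SLOTS; probes++) {
        const struct voice_entry *e = &index_tab[i];
        if (e->text == NULL)
            return NULL;
        if (strcmp(e->text, text) == 0)
            return e;
        i = (i + 1) & (INDEX_SLOTS - 1);
    }
    return NULL;
}
```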
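For the DAISY item, suppose each level becomes a cue sheet over the same audio. Switching levels could then keep the playback position and simply change which sheet next/previous consult. Finding the current entry is a binary search for the last start time not after the position. The structure and names below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: one cue sheet per DAISY level, each a sorted array of
 * entry start times (ms) within the same underlying audio. */
struct cue_level {
    const long *start;  /* ascending start positions */
    size_t count;
};

/* Index of the entry containing pos (last start <= pos), found by binary
 * search; 0 if pos precedes the first entry. */
size_t cue_find(const struct cue_level *lv, long pos)
{
    assert(lv != NULL && lv->count > 0);
    size_t lo = 0, hi = lv->count;  /* answer lies in [lo, hi) */
    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;
        if (lv->start[mid] <= pos)
            lo = mid;
        else
            hi = mid;
    }
    return lo;
}

/* "Next" at the chosen level: start of the following entry, or the start
 * of the last entry if already there. */
long cue_next(const struct cue_level *lv, long pos)
{
    size_t i = cue_find(lv, pos);
    return (i + 1 < lv->count) ? lv->start[i + 1] : lv->start[i];
}
```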
--
Stéphane Doyon <s.doyon_at_videotron.ca>
http://pages.videotron.com/sdoyon/
Received on 2007-10-06