


Rockbox mail archive

Subject: Re: Voice patches


From: Stéphane Doyon <s.doyon_at_videotron.ca>
Date: Sun, 07 Oct 2007 23:43:57 -0400 (EDT)

On Sat, 6 Oct 2007, Daniel Weck wrote:
> On 6 Oct 2007, at 04:21, Stéphane Doyon wrote:
>> -Increase playback speed without affecting pitch. I'm not starting from
>> scratch here, I have something more or less working, only the integration
>> into Rockbox is still somewhat rough.
>
> As part of my programming work in the context of self-voicing user
> interfaces,

What sort of work would that be exactly? That's OT though I guess, so
perhaps privately...

> I have used 2 open-source implementations (GPL I think) of
> "timescale modification" algorithms.

Could you give me references? I'm aware of Soundtouch.

> Stéphane, what algorithm have you been playing with ? (WSOLA ?)

Err what's that? Looking it up. Hmm yes, that's it :-). Gives you an idea
of what a strong theoretical background I have on this subject :-).

What I'm using now is a previously unreleased implementation by Nicolas
Pitre <nico_at_cam.org>. I've helped him with a bit of tuning and a bug
report or two, but the credit really goes to him. He implemented that from
scratch. He worked from a good understanding of the algorithm, although
I'm pretty sure he did not read that paper :-). We both used that
implementation for a few years on our Linux'ified iPAQ H3600 handhelds, to
speed up talking books. Those have a StrongArm processor running at
206MHz, which is relatively modest. Since I was familiar with Nicolas's
implementation and I knew it did not require too much CPU power, I
naturally used that when trying to speed up playback on Rockbox. Nicolas
said he was happy to release his implementation under the GPL, so
integrating it into Rockbox officially is quite possible.
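For readers who, like me, had to look up WSOLA (Waveform Similarity Overlap-Add), here is a minimal textbook-style sketch in C. It is NOT Nicolas's code, which hasn't been posted yet. It assumes mono float samples and a triangular window of 2*HOP samples with 50% overlap, and it searches +/-TOL samples around each nominal input position for the segment that best matches the natural continuation of the previous one. HOP, TOL and the wsola() function are hypothetical names and values chosen for illustration.

```c
#include <stddef.h>
#include <float.h>

/* Illustrative parameters (assumptions, not from any real implementation). */
#define HOP 256   /* synthesis hop = crossfade length, in samples */
#define TOL 128   /* search range around the nominal input position */

/* Time-scale mono `in` by `speed` (2.0 = twice as fast) without changing
 * pitch, using WSOLA with a triangular window of length 2*HOP.
 * Returns the number of samples written to `out`. */
size_t wsola(const float *in, size_t n_in, float *out, size_t out_cap,
             double speed)
{
    if (n_in < 2 * HOP || out_cap < HOP || speed <= 0.0)
        return 0;

    /* The first half-window has nothing to blend with: copy it. */
    for (size_t i = 0; i < HOP; i++)
        out[i] = in[i];
    size_t n_out = HOP;
    size_t prev = 0;  /* input start of the previously chosen segment */

    for (size_t k = 1; prev + 2 * HOP <= n_in && n_out + HOP <= out_cap; k++) {
        /* Natural continuation of the previous segment. */
        const float *cont = in + prev + HOP;

        /* The nominal input position advances by speed * HOP per hop. */
        long nominal = (long)(k * speed * HOP + 0.5);
        long lo = nominal - TOL, hi = nominal + TOL;
        if (lo < 0) lo = 0;
        if (hi > (long)(n_in - HOP)) hi = (long)(n_in - HOP);
        if (lo > hi) break;

        /* Choose the candidate most similar to `cont`. xc*|xc|/energy ranks
         * candidates the same way as normalized cross-correlation
         * xc/sqrt(energy), without needing sqrt(). */
        long best = lo;
        double best_score = -DBL_MAX;
        for (long p = lo; p <= hi; p++) {
            double xc = 0.0, en = 1e-12;
            for (size_t i = 0; i < HOP; i++) {
                xc += (double)cont[i] * in[p + i];
                en += (double)in[p + i] * in[p + i];
            }
            double score = xc * (xc < 0 ? -xc : xc) / en;
            if (score > best_score) { best_score = score; best = p; }
        }

        /* Overlap-add: fade out the continuation, fade in the new segment. */
        for (size_t i = 0; i < HOP; i++) {
            float w = (float)i / HOP;
            out[n_out + i] = (1.0f - w) * cont[i] + w * in[best + i];
        }
        n_out += HOP;
        prev = (size_t)best;
    }
    return n_out;
}
```

With speed 1.0 this reproduces its input, and with speed 2.0 it emits roughly half as many samples as it reads. The similarity search is what keeps pitch intact: segments are spliced where the waveforms line up instead of being resampled. That search is also the main CPU cost, so a real-time version would likely use smaller search ranges or integer arithmetic.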

My current integration of that implementation into Rockbox is rather
rough. I do intend to post it as a patch soon. Then I'd need help from
some dsp.c guru to fit it in properly. One difficulty is that this
algorithm needs a relatively large sound buffer to work on, which means
about 0.1 s of latency, and AFAICT this would be the first Rockbox DSP
effect with this kind of requirement.
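To put numbers on that requirement: 0.1 s at 44.1 kHz is 4410 frames, or 17640 bytes of 16-bit stereo. An effect that needs a window of past samples must hold data back until that window is full, so it cannot emit output for each incoming chunk the way a sample-by-sample effect can. Below is a hypothetical accumulate-before-process stage illustrating this. stretch_fifo, fifo_push() and the other names are made up for this sketch and are not part of Rockbox's dsp.c.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Frames needed to hold `ms` milliseconds of audio at `rate` Hz. */
static size_t latency_frames(unsigned rate, unsigned ms)
{
    return (size_t)rate * ms / 1000;
}

#define MAX_FRAMES 8192  /* hypothetical static limit: > 0.1 s at 48 kHz */

struct stretch_fifo {
    int16_t buf[MAX_FRAMES * 2]; /* interleaved 16-bit stereo */
    size_t need;                 /* frames required before processing */
    size_t have;                 /* frames currently buffered */
};

static void fifo_init(struct stretch_fifo *f, unsigned rate, unsigned ms)
{
    f->need = latency_frames(rate, ms);
    if (f->need > MAX_FRAMES)
        f->need = MAX_FRAMES;
    f->have = 0;
}

/* Accept up to `count` frames from `src` and return how many were taken.
 * Nothing can be emitted until the buffer holds `need` frames. */
static size_t fifo_push(struct stretch_fifo *f, const int16_t *src, size_t count)
{
    size_t room = f->need - f->have;
    size_t n = count < room ? count : room;
    memcpy(f->buf + f->have * 2, src, n * 2 * sizeof(int16_t));
    f->have += n;
    return n;
}

/* True once enough audio is buffered for the time-stretch to run. */
static int fifo_ready(const struct stretch_fifo *f)
{
    return f->have == f->need;
}
```

For example, if the decoder delivers 1024-frame chunks at 44.1 kHz, the first four chunks (4096 frames) are simply absorbed, and only 314 frames of the fifth are accepted before the stage is ready. Those roughly 0.1 s are the latency described above.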

The good news is it works for me on an X5 and an e200. Audio books are
typically lower bitrate and mono, and those I can speed up by a factor of 3
without problems. High bitrate music can also be accelerated to some
degree, just not as much. My rough integration causes latency in the UI
when speeding up high bitrate files to near the max capacity of the CPU.
But for audio books, it works well. I haven't measured the effect on my
battery life, but I've been using this for a while and I know that
qualitatively, the effect is not disastrous.

I have not done a thorough comparison of this implementation vs
Soundtouch, or other implementations. I did do a quick subjective
comparison with Soundtouch: for speeding up speech, Nicolas's algorithm
appeared to sound a bit superior, while for slowing down music, Soundtouch
was better. I must admit that this works well enough for me that I am not
really motivated to further investigate alternatives.

>> -See if we could implement DAISY, or a subset of it, by preprocessing the
>> DAISY on the host machine and coming up with a cue sheet for each DAISY
>> level. We'd need to have a playlist backed by multiple cue sheets and the
>> ability to switch between them. I'm not entirely sure this makes sense,
>> would need to look into it more seriously.
>
> Right. I think that would be a good start, as it would be easy to implement
> using existing Rockbox features.
>
> However, I would like to offer blind/visually-impaired users the convenience
> of using their standard Daisy 2.02 or even 3.0 DTBs (Digital Talking Books),
> without conversion.

I imagine that would be more convenient, and probably acceptable as long
as things like XML parsing happen in a plugin. Although personally I don't
see preprocessing on the host as a big limitation because it would be near
instantaneous (unlike video conversion for example).

> 2) text support: well, I think this is unnecessary to start with. When we've

I agree.

> 3) the ability to play Daisy books from a plugin: my test implementations so

I'm not sure about this one. Why play from a plugin? Or is this just a
step in your development strategy?

Admittedly I am not familiar with the newer versions of the DAISY standard
and may be missing some issues.

-- 
Stéphane Doyon
<s.doyon_at_videotron.ca>
http://pages.videotron.com/sdoyon/
Received on 2007-10-08

Page was last modified "Jan 10 2012" The Rockbox Crew