FS#8894 - Speeding playback up/down without affecting pitch
Opened by Stephane Doyon (sdoyon) - Tuesday, 15 April 2008, 01:23 GMT
Last edited by Steve Bavin (pondlife) - Friday, 12 June 2009, 07:21 GMT
Speeding playback up/down without affecting pitch
aka time scaling.
The good news is: this actually works.
The bad news: it still needs work.
I've been using this on my players for many months now. Life has made it
so that my Rockbox spare time has been reduced to very little,
unfortunately. And this works just well enough that I have not felt
a pressing need to complete this. Therefore I am putting this up in its
current state so that it can be useful to others and in the hope that
someone will pick it up and give it the love it needs.
This work is based on a previously unreleased implementation by
Nicolas Pitre <email@example.com>. So the credit for this mostly goes to him.
It's loosely based on the WSOLA algorithm. Nicolas implemented this from
scratch, working from a good understanding of the general algorithm. My
contributions: I helped with a bit of tuning and a bug report or two, and
I started on this half-baked integration into Rockbox which is still
Nicolas and I both used this implementation for a few years on our
Linux'ified iPAQ H3600 handhelds, to speed up talking books. Those have a
StrongARM processor running at 206 MHz, which is relatively modest and
does not support floating point operations.
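For readers unfamiliar with this family of algorithms, here is a minimal integer-only sketch of the general idea: step through the input faster than the output, and at each step search near the nominal position for the segment that best continues what was already played, then cross-fade into it. This is NOT Nicolas's code and not the patch itself; it is a simplified relative of WSOLA (similarity search plus linear cross-fade rather than proper windowed overlap-add). The function name, the 8.8 fixed-point speed parameter and the FRAME/OVERLAP/SEARCH values are all made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define FRAME   256  /* samples emitted per step (assumed value) */
#define OVERLAP 64   /* cross-fade length in samples (assumed value) */
#define SEARCH  64   /* search radius around the nominal position (assumed) */

/* Sum of absolute differences: a cheap integer similarity measure. */
static uint32_t sad(const int16_t *a, const int16_t *b, size_t n)
{
    uint32_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += (uint32_t)abs(a[i] - b[i]);
    return acc;
}

/* Hypothetical helper: time-scale mono 16-bit PCM.
 * speed_q8 is the speed in 8.8 fixed point (256 = 1.0x, 512 = 2.0x).
 * Returns the number of samples written to out. */
size_t timescale_mono(const int16_t *in, size_t in_len,
                      int16_t *out, size_t out_cap, uint32_t speed_q8)
{
    size_t out_len = 0;
    size_t prev = 0; /* input index that would naturally continue the output */

    for (size_t k = 0; ; k++) {
        /* Where the k-th output frame should nominally come from. */
        size_t nominal = (size_t)(((uint64_t)k * FRAME * speed_q8) >> 8);
        if (nominal + SEARCH + FRAME > in_len ||
            prev + OVERLAP > in_len ||
            out_len + FRAME > out_cap)
            break;

        /* Find the candidate that best matches the natural continuation. */
        size_t lo = nominal > SEARCH ? nominal - SEARCH : 0;
        size_t best = lo;
        uint32_t best_cost = UINT32_MAX;
        for (size_t cand = lo; cand <= nominal + SEARCH; cand++) {
            uint32_t cost = sad(in + cand, in + prev, OVERLAP);
            if (cost < best_cost) {
                best_cost = cost;
                best = cand;
            }
        }

        /* Linear cross-fade from the old continuation into the new segment. */
        for (size_t i = 0; i < OVERLAP; i++) {
            int32_t a = in[prev + i];
            int32_t b = in[best + i];
            out[out_len + i] =
                (int16_t)((a * (int32_t)(OVERLAP - i) + b * (int32_t)i) / OVERLAP);
        }
        /* Copy the remainder of the chosen segment unchanged. */
        for (size_t i = OVERLAP; i < FRAME; i++)
            out[out_len + i] = in[best + i];

        prev = best + FRAME;
        out_len += FRAME;
    }
    return out_len;
}
```

The search loop is where most of the CPU time goes in a sketch like this, so the search radius and overlap length are the obvious knobs for trading quality against CPU load.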
Nicolas released the code to me under GPL, with the explicit
understanding that I would post it here for integration into Rockbox. He
is not himself a Rockbox user at this time.
This patch has been tested on X5 and E200. It works well enough for
speeding up audio books (which are typically lower bit rate and mono). I
cannot stress enough how tremendously useful this feature is to me.
Slowing down speech works, but intelligibility is not much
improved. Music can also be sped up or slowed, but with significant
I can speed up low bit rate speech by a factor of about 3, although in actual
use one would normally use a factor of 1.6 to 2. Speeding up high bit
rate music runs out of CPU at a somewhat lower factor.
Since I was familiar with Nicolas's implementation and I knew it did not
require too much CPU power, I naturally used that when trying to speed up
playback on Rockbox. Nowadays there are other implementations that could
potentially be used. The only one I have actually tried is SoundTouch,
and that comparison was admittedly done in haste. My findings were that
for speeding up speech, Nicolas's algorithm appeared to sound somewhat
better (less clicking distortion), while for slowing down music,
SoundTouch was better. Since my goal is to speed up audio books, and
since this implementation works well enough for me, I am not really
motivated to further investigate alternatives.
The main difficulty in integrating this algorithm into Rockbox is that it
needs a relatively large sound buffer to work on (a latency of about
0.1s), and this would be the first Rockbox DSP effect to have this kind of
requirement, AFAICT. Also, the implementation was meant to process larger
chunks at a time, and I do not have a very accurate estimate of the input
buffer size the algorithm requires, so I am feeding it larger chunks than
absolutely necessary.
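To give a feel for what a 0.1s buffer means in bytes, here is a small back-of-the-envelope helper. The function name and the sample formats used below (16-bit PCM, 44.1 kHz stereo or 22.05 kHz mono) are assumptions for illustration only; the formats the patch actually works on inside the DSP chain may differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: bytes of PCM needed to hold latency_ms of audio.
 * The frame count is rounded up so the buffer is never too short. */
size_t pcm_bytes_for_latency(uint32_t sample_rate_hz, unsigned channels,
                             unsigned bytes_per_sample, uint32_t latency_ms)
{
    uint64_t frames = ((uint64_t)sample_rate_hz * latency_ms + 999) / 1000;
    return (size_t)(frames * channels * bytes_per_sample);
}
```

Under those assumptions, 0.1s of 44.1 kHz stereo is 4410 frames, or 17640 bytes, while 0.1s of 22.05 kHz mono is only 4410 bytes.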
Some latency can be felt in the UI: little or none for low bitrate files,
but pretty bad for high bit rate files. A better integration with dsp.c
and better buffering estimations would presumably prevent this.
I haven't measured the effect on my battery life. Subjectively, it
doesn't feel disastrous, but I imagine it could be improved.
I've bypassed the IRAM buffer that was too small for my needs. It should
be easy to add logic to use the IRAM buffer at least when time scaling is
not in effect.
I've also left a bunch of debugging macros in there.
The algorithm has several tunables that trade quality for CPU
utilization. I imagine some DSP gurus might like to tinker with these and
with the code. I have played with this a bit and I think the current
quality level is (subjectively) just about right for speech.
Another interesting feature to add would be a true pitch shift function:
combining this time scaling function with the sampling rate alteration
effect (what Rockbox currently calls pitch) to produce an effect that
shifts pitch without affecting speed, or that allows controlling both
speed and pitch independently. I imagine musicians would find that useful.
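The arithmetic behind such a combination is simple. Resampling by a ratio p changes both pitch and speed by p, so to end up with an overall speed s and pitch p, the time scaler has to supply the remaining factor s / p. A sketch in 16.16 fixed point follows; the struct and function names are hypothetical and do not correspond to anything in the patch or in Rockbox.

```c
#include <stdint.h>

/* All ratios are 16.16 fixed point: 65536 means 1.0. */
struct ts_plan {
    uint32_t resample_q16;  /* ratio for the sampling rate alteration stage */
    uint32_t timescale_q16; /* speed factor for the time scaling stage */
};

/* Hypothetical helper: split a desired (speed, pitch) pair into the two
 * stage settings. The resampler changes pitch AND speed by pitch_q16, so
 * the time scaler must contribute speed / pitch to reach the target speed. */
struct ts_plan plan_speed_pitch(uint32_t speed_q16, uint32_t pitch_q16)
{
    struct ts_plan p;
    if (pitch_q16 == 0)     /* guard against division by zero */
        pitch_q16 = 65536;
    p.resample_q16 = pitch_q16;
    p.timescale_q16 = (uint32_t)(((uint64_t)speed_q16 << 16) / pitch_q16);
    return p;
}
```

For example, raising pitch by an octave at unchanged speed (speed 1.0, pitch 2.0) means resampling by 2.0 and time scaling by 0.5, i.e. slowing down by half to cancel the speed-up from resampling.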
I hope this will make other speech listeners as happy as it's made me.
Friday, 12 June 2009, 07:21 GMT
Reason for closing: Accepted
Additional comments about closing: Thanks!