This is the bug/patch tracker for Rockbox. Click here for more information.
Quick links: Bugs · Patches · Rockbox frontpage
FS#12223 - sinc resampling for rockbox
Attached to Project:
Rockbox
Opened by Stefan Keller (drezon) - Tuesday, 09 August 2011, 12:36 GMT+2
Opened by Stefan Keller (drezon) - Tuesday, 09 August 2011, 12:36 GMT+2
|
DetailsThe attached patch implements sinc based resampling in rockbox.
I have only tested it stand-alone so far (i.e. the code outside of rockbox -- I am waiting for a replacement for my broken android phone to test it there). There is (of course) quite some room for improvement. Things which could be improved: - buffer handling - calculate the correct sinc spanning the whole table for downsampling. Then the downsampling and upsampling code would be the same (no need for sinc_increment, saving some additions). Also the result would be more accurate resampling. - Save some memory by always calculating the correct sinc and putting it into resample_data. Would need to use fp_sincos() for that. - Save some more memory by taking advantage of sinc's symmetry, only storing half of it in the table. I know that I'm breaking the coding-style by using more than 80 chars in some lines. I will fix that when I have time. I wanted to get this code finally out to get some feedback after it sitting for two weeks (basically unchanged) on my HD. |
This task depends upon
Now it WorksForMe on my Android phone.
I also think the sinc tables should be int16_t to save some space.
However, since sinc_increment depends only on the sample rate conversion, theres only a few possible values it can take for downsampling (corresponding to 48->44.1, 88.2->44.1 and 96->44.1). Would it be possible to just store the needed coefficients for each of these three conversions (and perhaps also for the upsample cases 11.025->44.1, 22.05->44.1 and 32->44.1)? This would have the added advantage that we'd have regular stride lengths, which are much easier to optimize, particularly on systems like ours with small cache memory.
How would you get rid of the fixed-point current in sinc_upsample? Note, that the highest 8 bit of the fractional part are used as an index into the sinc (to get the sinc's value between the two samples / zero-crossings).
- exploit sinc's symmetry and only store one half
- stretch and scale for downsampling in resampler_new_delta to have unified core up/downsampling code
- reverse sinc indices to help keeping related data in a cache line
Fixed resample_data usage to not have to adapt all assembler code
sinc is alloced using buffer_alloc (not enough IRAM on Clip+)
Corrected input/output sample number calculations
Without DSP: 668.14% 35.92 MHz
With DSP linear interpolation: 561.49% 42.74 MHz
With DSP sinc interpolation: 364.41% 65.85 MHz
Linear interpolation used the ARM assembler code, for sinc there is only C so far -- I don't know how much speedup one could expect by using assembler.
Edit: I just tried v1.4 in my sim for a 48kHz-file, down-sampled to 44.1kHz and several upsampled files. The upsampling results in heavy distortions...
Some performance measurements on my iPod Video or 48 kHz to 44.1 kHz downasampling of a VBR MP3 file:
svn (no resampling): ~24 MHz
svn (linear interpolation): ~31 MHz
v1.5 (no ASM): ~58 MHz
v1.5 (v04 ASM): ~49 MHz
v1.5 (v05 ASM): ~44 MHz
The speedup from v1.5 to v1.5_v04 is smaller as I have expected.
@drezon: Could you review all the memcpy's that are inserted for sinc-resampling. Those will most likely hurt some of our targets. Is it possible to get rid of them?
I was (and currently am) on vacation and don't really have internet access. I did a quick check on your patches and noticed some things:
- the sinc is static again, but you need two of them because there are two dsp states (for voice and audio) with possibly different sampling rates. Maybe we should use linear interpolation for voice.
- the cache-align code was left in but then one need to allocate CACHEALIGN-1 more Bytes than actually needed
- for the 11 tap version the sinc should be calculated differently (the windowing should be done differently). Also for this version one only needs 256*6 int32_ts, saving some memory in addition to cpu cycles.
Changing the octave script to scale the sinc by 2^12 to get rid of the shifts is easy. Once I'm back from vacation I will do this (and enhance the other things mentioned above).
Getting rid of the memcpys isn't that easy. By merging the two buffers we could get rid of some memcpys but the code would become pretty complex and hard to read I guess. I'll ponder over this. :)
DSP stages must obey output constraints as any output buffer request may in fact get the size truncated (crossfade will do this at all times). tdspeed is already a troublesome thing in this respect due to storing data and then spitting out whatever it wants when it's ready to output a frame (bad, bad, bad). Input/output counts based upon frequency ratios aren't sufficient hints.
Another thing is it's not a good idea to use this resampling for all codecs on all processors. Some are already near the limit of cababilities of the CPU (like SPC). Linear must remain available and hints given by the metadata parsers or some other mechanism as to which to use.