This is the bug/patch tracker for Rockbox. Click here for more information.
Quick links: Bugs · Patches · Rockbox frontpage
FS#9882 - Speed-up mod for large block (Vorbis)
Attached to Project:
Rockbox
Opened by Sei Aoyumi (Aoyumi) - Sunday, 08 February 2009, 08:23 GMT+2
Last edited by MichaelGiacomelli (saratoga) - Sunday, 02 August 2009, 20:31 GMT+2
Opened by Sei Aoyumi (Aoyumi) - Sunday, 08 February 2009, 08:23 GMT+2
Last edited by MichaelGiacomelli (saratoga) - Sunday, 02 August 2009, 20:31 GMT+2
|
DetailsThis patch speeds up the decode of the OggVorbis file of mainly "q-1(/-2)[32kHz~48kHz]" equivalency encoded in libvorbis.
I tested this patch in Sansa e200, but probably even Sansa c200, iaudio M5/X5, iPod color/Photo/Mini 2nd gen effective. |
This task depends upon
Closed by MichaelGiacomelli (saratoga)
Sunday, 02 August 2009, 20:31 GMT+2
Reason for closing: Accepted
Additional comments about closing: Accepted in r20783 and r20843 so I think its safe to close this now.
Sunday, 02 August 2009, 20:31 GMT+2
Reason for closing: Accepted
Additional comments about closing: Accepted in r20783 and r20843 so I think its safe to close this now.
Under the present conditions, it is used only in the window lookup table and a PCM buffer.
some of my test files are decodign faster with your patch and some without...
attatched are the test codec logs
1. Create pcm back buffer (used by block.c) in IRAM if possible also. if we can do this, we can then use a true double-buffer approach (flip between bufferA and bufferB when synthesising), meaning vect_add can operate iram-to-iram, and save a memcpy for each block. If we can't do this, we use the old back buffer approach where we just memcpy the right-hand subframe of the current frame
2. replace call to vect_copy with call to memcpy instead, as they are surely interchangable and memcpy ought to be optimal
3. move vorbis_synthesis_blockin into ICODE so that vect_add operations (which are inline) operate in icode.
4. move vorbis_apply_window into ICODE so that vect_mult operations (which are also inline) operate in icode
5. On ARM, save one instruction for each register multiply in vect_mult_bw, by removing the left-shift (i.e. effectively use MULT32 instead of MULT31), and we then do the left-shift inline in vect_add_xxx. Implemented C code version (misc.h) also, although not yet for Coldfire (not sure if coldfire would have a benefit from that approach anyway)
6. Remove memset in vorbis_apply_window, since we already only add the parts of the frames that we know overlap, and the data outside that range is not used for lapping anyway
I would be very interested to know what improvement people observe on targets, especially non ARM targets. I see about 1.5MHz imrprovement over svn, with greatest improvement on lower bitrates.
Also seems my assertion #2 is flawed for coldfire - vect_copy indeed seems to run faster than memcpy (anyone know why that should be). Hacking it to use vect_copy again (only on coldfire; memcpy on all other targets) gives me the attached test_codec output. Still seems odd to me that coldfire should now run 2MHz slower on some bitrates than before the change, which is I guess the same thing nls observed earlier. Not sure if I've made it worse in fact.
Perhaps its worth comparing the .map files to see how memory allocation differs with this patch.
>Also seems my assertion #2 is flawed for coldfire - vect_copy indeed seems to run faster than memcpy (anyone know why that should be)
amiconn once told me that different code should be used for IRAM access then DRAM access in order to maximize write performance. I don't really understand why, but it seems to make a pretty big difference for WMA:
http://svn.rockbox.org/viewvc.cgi?view=rev&revision=18228
This could be an issue where the memcpy isn't optimal because of IRAM or DRAM use. You may want to ask amiconn about it.
Updated patch. This also now incorporates both the patches from
FS#10140also.Patch also now has /*comment*/ instead of // comments, and also includes patched apps/codecs/lib/* (to also incorporate
FS#10140, plus using memcpy for vect_copy on arm). Note that apps/codecs/lib/* does NOT have the vorbis trick of delaying the shift in vect_mult until we do the vect_add : thinking that apps/codecs/lib/* is generic, I shouldn't assume that anyone using the lib would always and only call vect_add after vect_mult. So that optimisation is only in libtremor.On coldfire-h120 I now see an extra 1MHz improvement (rather than worse performance than svn). (That's extra on top of
FS#10140). So about 2MHz improvement over svn.On ipod-video-64MB I now see about 2MHz improvement over current svn.
nls - would be interested to know what performance difference you get on your test files.
Does anything actually use those vect_xx functions in codecs/lib/ ? I couldn't see anything..
Good work!
I'll want to do more tests before committing
It causes a noise sound with high probability after processing to seek.
When a noise occurs once, my e200 continues producing a noise till rockbox initialize the libtremor again.
If iram_pcm_storage > pcmend (i.e. we have at least one pcm buffer in iram), then we need to setup the buffer pointer, but only once.
From that point onwards, it should either flip automatically after each block (if we have two pcm buffers in iram), or the buffer will remain unchanged (if we do a memcpy between that buffer and the one in regular mem). Either way, we shouldn't reset the buffer pointer in synthesis - the buffer pointer is under the control of block.c
Except - on track change when bitrate changes and we now need a pcm buffer that doesn't fit in iram. (and from that point it should behave as before, with buffer pointer managed by block.c - in this case if the pcm buffer doesn't fit in iram then the buffer pointer doesn't need to change at all since we will always memcpy rather than buffer flip)
Except - on track change when the bitrate changes to something smaller again, and we go back to a pcm buffer that does fit into iram, when we need to reset the buffer pointer again to an initial state that is consistent between both synthesis.c and block.c.
I think I hadn't appreciated that seeking will go through vorbis_synthesis_restart - so maybe after a seek we just need to reset buffer pointers again to be consistent in both block.c and synthesis.c
Now actually, if the pcm buffer doesn't fit in iram (synthesis.c) we probably shouldn't care about reset_pcmb at all - we should always create a buffer in mem. So I think one other thing that's missing is the logic to selectively oggfree the vb->pcm buffer if it wasn't in iram. I'm not sure where that went, thought it was there but can't see it right now ..
The 2nd PCM buffer is initialized after processing to seek now.