Rockbox

FS#8750 - add some ARM assembler for dsp-routines

Attached to Project: Rockbox
Opened by Andree Buschmann (Buschel) - Monday, 17 March 2008, 12:39 GMT
Last edited by Nils Wallménius (nls) - Wednesday, 19 March 2008, 17:25 GMT
Task Type Patches
Category Music playback
Status Closed
Assigned To No-one
Operating System PortalPlayer-based
Severity Low
Priority Normal
Reported Version Daily build (which?)
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 0
Private No

Details

Adds ARM assembler for the DSP routines sample_output_mono() and sample_output_stereo(). Saves roughly 0.6 MHz during playback of 44.1 kHz stereo audio.

Closed by Nils Wallménius (nls)
Wednesday, 19 March 2008, 17:25 GMT
Reason for closing: Accepted
Comment by Andree Buschmann (Buschel) - Monday, 17 March 2008, 14:17 GMT
New version:
- minor change after discussion on IRC (different solution for "bx lr")
- added ARM assembler for channels_process_sound_chan_karaoke() -- can be further optimized, but has almost no influence on CPU load
Comment by Andree Buschmann (Buschel) - Monday, 17 March 2008, 14:17 GMT
Now with attached patch.
Comment by Andree Buschmann (Buschel) - Monday, 17 March 2008, 20:51 GMT
Now using faster clipping (suggested by preglow). Total speedup for stereo signals is now +22%.
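
For readers without the patch at hand: the exact clipping code is not quoted in this task, so the following is only a sketch of one well-known way to clip a 32-bit intermediate sample to 16 bits with a single comparison. The function name is hypothetical, and the code assumes arithmetic right shift of negative values and wrap-around narrowing to int16_t, which holds on ARM with GCC but is implementation-defined in C.

```c
#include <stdint.h>

/* Hypothetical helper, not taken from the patch.
 * Clamps a 32-bit intermediate sample to the signed 16-bit range. */
static inline int32_t clip_sample_16(int32_t sample)
{
    /* An in-range sample survives the round trip through int16_t unchanged,
     * so one comparison detects both positive and negative overflow.
     * (Narrowing an out-of-range value is implementation-defined;
     * wrap-around is assumed here.) */
    if ((int16_t)sample != sample) {
        /* sample >> 31 is 0 for positive and -1 for negative values
         * (arithmetic shift assumed), so the XOR yields 0x7fff or -0x8000. */
        sample = 0x7fff ^ (sample >> 31);
    }
    return sample;
}
```

The usual alternative, two separate comparisons against the upper and lower limit, costs an extra compare and branch per sample in the common in-range case.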
Comment by Andree Buschmann (Buschel) - Monday, 17 March 2008, 21:44 GMT
Final version tonight, added ARM asm for channels_process_sound_chan_mono().
Comment by Andree Buschmann (Buschel) - Monday, 17 March 2008, 21:45 GMT
Now with attached patch.
Comment by Andree Buschmann (Buschel) - Tuesday, 18 March 2008, 11:43 GMT
Next patch version for further speed up. Changes:

- store 2 halfword samples via packing into 1 word
- process 2 samples in each loop -> use of ldm and stm possible

ToDo: Odd sample counts are not handled properly. With an odd count the routines process one additional, unneeded sample. This should have no ill effects, because the extra access always hits a valid buffer address (odd counts are always smaller than SAMPLE_BUF_COUNT/2).
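
The packing idea above can be sketched in C roughly as follows. This is an illustration only, not the assembler from the patch: the function names are hypothetical, a little-endian layout (first sample in the low halfword) is assumed, and unlike the patch, which simply processes one extra sample on odd counts, this sketch handles the trailing sample explicitly.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: pack two 16-bit samples into one 32-bit word,
 * 'first' in the low halfword (little-endian memory order assumed). */
static inline uint32_t pack_samples(int16_t first, int16_t second)
{
    return (uint32_t)(uint16_t)first | ((uint32_t)(uint16_t)second << 16);
}

/* Hypothetical routine: store 'count' samples two at a time.
 * 'dst' must have room for (count + 1) / 2 words. */
static void store_packed(uint32_t *dst, const int16_t *src, size_t count)
{
    size_t i;

    /* Main loop: one word store per pair of samples. */
    for (i = 0; i + 1 < count; i += 2)
        *dst++ = pack_samples(src[i], src[i + 1]);

    /* Odd count: pad the last word with a zero sample. */
    if (i < count)
        *dst = pack_samples(src[count - 1], 0);
}
```

Writing one word instead of two halfwords halves the number of store instructions, and working on pairs is what makes multi-register loads and stores (ldm/stm) usable in the assembler version.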

Speed of this patch version vs. the C code:
- sound processing mono: +76%
- sound processing karaoke: +77%
- playback mono: +83%
- playback stereo: +41%
Comment by Andree Buschmann (Buschel) - Tuesday, 18 March 2008, 20:26 GMT
Next patch after review on IRC.

- in mono/karaoke sound processing, the division by 2 is now done before adding/subtracting to avoid possible overflow
- added a note regarding the behaviour with odd sample counts (tested, showed no negative effect)
- only call yield() in dsp.c once per tick, not every 128 samples
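
To illustrate the first point: if both channels are near full scale, (left + right) can exceed the 32-bit range before the division, whereas halving each channel first keeps the result in range at the cost of the lowest bit. A minimal C sketch, with hypothetical function names and assuming 32-bit samples and arithmetic right shift (not the actual assembler from the patch):

```c
#include <stdint.h>

/* Hypothetical helper: mono downmix, halving each channel before the add
 * so the sum cannot overflow 32 bits. */
static inline int32_t mix_mono(int32_t left, int32_t right)
{
    return (left >> 1) + (right >> 1);
}

/* Hypothetical helper: karaoke (centre-cancelling) mix, halving each channel
 * before the subtraction for the same reason. */
static inline int32_t mix_karaoke(int32_t left, int32_t right)
{
    return (left >> 1) - (right >> 1);
}
```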
