- Status Closed
- Percent Complete
- Task Type Patches
- Category Music playback
- Assigned To No-one
- Operating System PortalPlayer-based
- Severity Low
- Priority Very Low
- Reported Version Daily build (which?)
- Due in Version Undecided
- Due Date Undecided
- Votes
- Private
FS#8750 - add some ARM assembler for dsp-routines
Adds ARM assembler for the dsp routines sample_output_mono() and sample_output_stereo(). Saves roughly 0.6 MHz during 44.1 kHz stereo playback.
New version:
- minor change after discussion in IRC (other solution for "bx lr")
- added ARM assembler channels_process_sound_chan_karaoke() – can be further optimized, but has nearly no influence on CPU-load
Now with attached patch.
Now using faster clipping (suggested by preglow). The total speedup for stereo signals is now +22%.
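For readers who have not seen the idiom, here is a minimal C sketch of one common way to clip a 32-bit intermediate sample to 16 bits with a single comparison. It is an illustration only: the function name is hypothetical, and it is not confirmed to be the exact clipping used in the attached patch.

```c
#include <stdint.h>

/* Hypothetical helper (name not taken from the patch): clamp a 32-bit
 * intermediate sample to the signed 16-bit range.
 * Assumes the usual two's-complement behaviour: narrowing to int16_t
 * truncates, and >> on a negative int32_t is an arithmetic shift. */
int32_t clip_sample_16(int32_t sample)
{
    /* A value that survives the round trip through int16_t is in range. */
    if ((int16_t)sample != sample) {
        /* sample >> 31 is 0 for positive and -1 for negative values,
         * so the XOR yields 0x7fff or -0x8000 respectively. */
        sample = 0x7fff ^ (sample >> 31);
    }
    return sample;
}
```

In-range samples take only the comparison; the sign of an out-of-range sample selects the limit without a second compare.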
Final version tonight, added ARM asm for channels_process_sound_chan_mono().
Now with attached patch.
Next patch version for further speed up. Changes:
- store 2 halfword samples via packing into 1 word
- process 2 samples in each loop → use of ldm and stm possible
ToDo: Odd sample counts are not handled properly. For an odd count the routines process one additional (unneeded) sample. This should not cause any harm, because the extra access always hits a valid buffer address (odd counts are always smaller than SAMPLE_BUF_COUNT/2).
Speed of this patch version vs. the C code:
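The packing idea and the odd-count behaviour described above can be sketched in C roughly as follows. This is an assumption-laden illustration, not the patch itself: pack_samples() and output_pairs() are hypothetical names, the low-half-first layout assumes a little-endian target, and the real routines do this in ARM assembler with ldm/stm.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: pack two 16-bit samples into one 32-bit word,
 * first sample in the low half, second in the high half. */
uint32_t pack_samples(int16_t first, int16_t second)
{
    return (uint32_t)(uint16_t)first | ((uint32_t)(uint16_t)second << 16);
}

/* Hypothetical output loop: consumes two samples per iteration and stores
 * them with a single word write. Samples are assumed to be clipped to
 * 16 bits already. For an odd count the last iteration also reads and
 * stores src[count], so both buffers must have room for count rounded
 * up to the next even number. */
void output_pairs(const int32_t *src, uint32_t *dst, size_t count)
{
    size_t i;

    for (i = 0; i < count; i += 2)
        dst[i / 2] = pack_samples((int16_t)src[i], (int16_t)src[i + 1]);
}
```

With count = 3 the loop runs twice and touches four samples, which is the "additional sample" mentioned in the ToDo note; it is harmless only as long as the buffers are large enough.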
- sound processing mono: +76%
- sound processing karaoke: +77%
- playback mono: +83%
- playback stereo: +41%
Next patch after review via irc.
- within sound processing mono/karaoke the division by 2 is done before adding/subtracting to avoid possible overflow
- added a note regarding the behaviour with odd sample counts (tested and showed no negative effect)
- only perform yield() in dsp.c once per tick, not every 128 samples
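To illustrate the overflow point in the first item, here is a small C sketch of halving each channel before combining them. The function names are hypothetical and the code is not taken from the patch; it uses / 2 for portability, where the assembler version would more likely use an arithmetic shift.

```c
#include <stdint.h>

/* Hypothetical mono downmix: halve each channel first, then add.
 * left + right could exceed the int32_t range; left/2 + right/2 cannot. */
int32_t mix_mono(int32_t left, int32_t right)
{
    return left / 2 + right / 2;
}

/* Hypothetical karaoke mix: same idea with subtraction, which removes
 * the content common to both channels. */
int32_t mix_karaoke(int32_t left, int32_t right)
{
    return left / 2 - right / 2;
}
```

The cost is the lowest bit of each input, which is lost before the sum; for example mix_mono(1, 1) yields 0 rather than 1.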