FS#9318 - MP3 synthesis filter on COP

Attached to Project: Rockbox
Opened by MichaelGiacomelli (saratoga) - Monday, 25 August 2008, 01:50 GMT
Last edited by MichaelGiacomelli (saratoga) - Saturday, 20 September 2008, 22:06 GMT
Task Type Patches
Category Codecs
Status Closed
Assigned To No-one
Operating System PortalPlayer-based
Severity Low
Priority Normal
Reported Version Daily build (which?)
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 0
Private No


My measurements of the COP and boost power consumptions for the PP5024 suggest that load balancing rockbox as much as possible across both of the cores is optimal for battery life. To test this, I've put the synthesis filter from MAD on the coprocessor. This reduces CPU load sufficiently that the CPU doesn't need to boost at 30MHz, and probably very little at 24MHz.

Unfortunately, theres still some glitching, either because of some cache coherency problem, or because I don't properly synchronize the two threads.
This task depends upon

Closed by  MichaelGiacomelli (saratoga)
Saturday, 20 September 2008, 22:06 GMT
Reason for closing:  Accepted
Comment by Ryan Sawhill (ryran) - Monday, 25 August 2008, 03:10 GMT
Seems to be a little problem with the very beginning of your patchfile, saratoga. Can't see where that first hunk was supposed to go in mpa.c...
Comment by Ryan Sawhill (ryran) - Monday, 25 August 2008, 03:32 GMT
Mmmm. On closer inspection...I should be asleep already. No problem fixing it. Giving it a go.
However, the patchfile is still b0rked.
Comment by MichaelGiacomelli (saratoga) - Monday, 25 August 2008, 12:20 GMT
I tried it again, and it applied and compiled cleanly for me when applied to fresh SVN. If you're trying to use a PP5020 player, it might not work due to lack of IRAM, and I think COP isn't working at all on PP5002.

Though if you mean MP3 decoding is broken, I'm still trying to figure out why it has that distortion.
Comment by Ryan Sawhill (ryran) - Monday, 25 August 2008, 15:50 GMT
With a completely clean & updated SVN tree, it fails on the first hunk for me. I can see why and as I said, I can fix it. Just seems odd. Also, for some reason I've always thought the 5g had a 5024 chipset. After looking again at the wiki, now I see I was wrong. So I suppose that explains why mp3 decoding doesn't work with this on my 5g. :)
Comment by MichaelGiacomelli (saratoga) - Tuesday, 26 August 2008, 01:28 GMT
Improved version of the patch with feedback from amiconn.

Properly decodes a single file on my Sansa, but I have not yet figured out how to clean up after my threads, and so track changes do not work yet. Additionally, running test_codec still isn't possible, so theres still some sort of bug thats only apparent under certain conditions.

Additionally, I have not yet investigated the codecs.c code here:
Comment by MichaelGiacomelli (saratoga) - Wednesday, 27 August 2008, 22:55 GMT
Fixed a stupid mistake on my part that caused mp3 playback to break on track changes. Playback now appears correct, although test_codec still doesn't work.
Comment by Nicolas Pennequin (nicolas_p) - Saturday, 06 September 2008, 17:56 GMT
On an iPod 5.5G 80GB, the boosting is down to almost nothing. UI navigation seems to be improved a bit at first glance, though it would still benefit from some boosting. Great work!
Comment by MichaelGiacomelli (saratoga) - Saturday, 06 September 2008, 21:32 GMT
Updated to use the event functions for synchronization. Should allow the cores to sleep while they wait, hopefully conserving additional power.
Comment by Michael Sevakis (MikeS) - Sunday, 07 September 2008, 04:00 GMT
A single count semaphore will degenerate into exactly the behavior of an automatic event. I kind of considered events to be an experimental device that might be expanded upon probably incompatibly with what is there in SVN. It was useful with codec swapping but at this point their inclusion at all is barely justified IMHO. Just a heads-up on those.

Why not use 2-count semas and a circular buffer to obviate the memcpy usage? SPC uses that method and allows simultaneous DSP processing and EMU execution.

Edit: I might be seeing now why it's that way. Some details about what libmad does with the sbsample arrays might help improve parallelism.
Comment by MichaelGiacomelli (saratoga) - Sunday, 07 September 2008, 16:22 GMT
sbsample is the input to the filterbank. memcpy allows the next frame's sbsample values to be computed while the current frame is still being synthesized. A circular buffer might be a good idea because it would allow avoiding the memcpy.

Regarding a binary semaphore, I was under the impression that they worked differently. I need a way for one thread to release a lock for a single iteration of some code, and then automatically retrieve it after the second thread is finished. I don't really understand semaphores, but it looks like once a thread releases a binary semaphore, the code it blocks becomes open to a second thread indefinitely, and I couldn't see a way for the second thread to give back the lock to the first thread. Events seem like a good fit because a thread blocking on an event can only receive each event once, and thus they automatically block if a thread calls it twice.
Comment by Michael Sevakis (MikeS) - Sunday, 07 September 2008, 20:59 GMT
There's lots of info online about counted semaphores. semaphore_wait reduces the internal count by one. if the count remains >= 0, the thread passes; if the count goes negative, the wait blocks. semaphore_release increments the count by one (saturating at the max count). If one or more threads were blocked at the time, the next thread in the wait queue is woken up and allowed to pass. With a max count of one, this is the same signalling as an automatic event.

"The Little Book of Semaphores" by Downey is a good read.
Comment by MichaelGiacomelli (saratoga) - Saturday, 13 September 2008, 22:20 GMT
Updated to use semaphores and a circular buffer as suggested by MikeS.

I've made minor changes to Mpegplayer as well to accommodate the changes to libmad.
Comment by Michael Sevakis (MikeS) - Sunday, 14 September 2008, 00:08 GMT
So far I haven't had any problems. COP usage still seems light while CPU heavy. Perhaps it's balanced when CPU has no other duties like DSP. Personally I'm curious about running more decoder on COP and how that would impact performance. I guess we covered some of that in IRC but went ahead and summarized.
Comment by Michael Sevakis (MikeS) - Sunday, 14 September 2008, 00:54 GMT
Something is curious here. I'd expect your sem inits to be:

ci->semaphore_init(&synth_done_sem, 2, 0); /* No blocks used */
ci->semaphore_init(&synth_pending_sem, 2, 2); /* All blocks free */


ci->semaphore_init(&synth_done_sem, 2, 0); /* No blocks used */
ci->semaphore_init(&synth_pending_sem, 2, 0); /* No blocks free */

followed by

ci->semaphore_release(&synth_pending_sem); /* Bring count back up to 2 to let the synth run */

If I understand the code correctly, the way it is now it can only fill one buffer at a time. Perhaps my evaluation is whacked. :)
Comment by Craig Mann (inigomontoya) - Sunday, 14 September 2008, 04:34 GMT
I just spent a some time testing this patch. Very nice, hardly any boosting at all. I am however noticing some noise in the right channel. It occurs at the beginning of a track. I can reproduce by playing a track, pausing the track for a few seconds and then playing again. The noise is not identical each time but always in the same spot. It seems to happen on specific files and not at all on others. Sounds like popping or crackling. Can anyone confirm? I'm using an e200 with 18509
Comment by Craig Mann (inigomontoya) - Sunday, 14 September 2008, 16:59 GMT
Correction, the above only happens when the code is modified with MikeS's first suggestion. It's fine with the last patch. Also, I have changed the default cpu speed to 24Mhz in my build. With this patch it still never needs to boost while playing a song, only while initially buffering. This should further enhance battery life yes?
Comment by Michael Sevakis (MikeS) - Sunday, 14 September 2008, 21:15 GMT
Hehe. Craig, that really wasn't a suggestion for immediate implementation and it won't work. Yes, the patch ran all night for me.

The confusing part for me was that there is no parallelism until the second iteration of the decoding loop (well, it's not my code so it needed a good staring session). The mutual waiting is quite tight however and to loosen it up into a full FIFO would need separate head and tail pointers and a PCM queue as well. As it is, one buffer is always free to one of the threads where a full FIFO would allow one thread to write or read all elements.