


Rockbox mail archive

Subject: Re: Profiling the codecs

From: Brandon Low <lostlogic_at_lostlogicx.com>
Date: Fri, 26 Jan 2007 06:25:35 -0600

I notice that 5.7% of CPU time on mpa and a whopping 7.3% on vorbis is
spent in switch_thread. Can you by chance try turning off priority
scheduling and see how that impacts those percentages? That seems like
an awfully high scheduling overhead.

Nice work on the profiling -- definitely much higher information content
than I was able to achieve with on-device profiling.

--Brandon

On 2007-01-25 (Thu) at 22:14:08 +0100, Tomasz Malesinski wrote:
> I have used my iFP emulator (http://www.mimuw.edu.pl/~tmal/rockbox/)
> to do some profiling. The emulator counts CPU cycles, memory
> accesses and cache events (iFP has 8 kB of 2-way unified cache).
>
> The results are here:
> http://www.mimuw.edu.pl/~tmal/rockbox/profile_mpa.txt
> http://www.mimuw.edu.pl/~tmal/rockbox/profile_vorbis.txt
>
> The counters are:
> CHIT - cache hits
> CMIS - cache misses
> CFLU - cache flushes
> IRMA - IRAM accesses
> SRMA - SRAM (32-bit) accesses
> CPUC - CPU cycles (not including the memory wait states)
>
> What is interesting in the mpa codec is that most of the memory
> accesses happen in dct32 and synth_full. This is most likely caused
> by the huge size (~4 kB on ARM) of the dct32 routine. The libmad Makefile
> even uses -O instead of -O2 on Ipod to get smaller code. I suppose that
> if someone could implement a small, non-unrolled dct32 routine, it
> would give a significant speedup.
>
> Does anyone know of a compact implementation of dct32? The transform
> is called DCT-II in
>
> http://en.wikipedia.org/wiki/Discrete_cosine_transform
>
> If no one volunteers I might try to find some time and implement/find
> something.
>
> --
> Tomek Malesinski
Received on 2007-01-26

