Rockbox mail archive

Subject: Re: Profiling the codecs
From: Brandon Low <lostlogic_at_lostlogicx.com>
Date: Fri, 26 Jan 2007 06:25:35 -0600

I notice that on mpa 5.7% and on vorbis a whopping 7.3% of CPU time is
spent in switch_thread. Can you by chance try turning off priority
scheduling and see how that impacts those percentages? That seems like
an awfully high scheduling overhead.

Nice work on the profiling -- definitely much higher information content
than I was able to achieve with on-device profiling.

--Brandon

On 2007-01-25 (Thu) at 22:14:08 +0100, Tomasz Malesinski wrote:
> I have used my iFP emulator (http://www.mimuw.edu.pl/~tmal/rockbox/)
> to do some profiling. The emulator counts CPU cycles, memory
> accesses and cache events (iFP has 8 kB of 2-way unified cache).
>
> The results are here:
> http://www.mimuw.edu.pl/~tmal/rockbox/profile_mpa.txt
> http://www.mimuw.edu.pl/~tmal/rockbox/profile_vorbis.txt
>
> The counters are:
> CHIT - cache hits
> CMIS - cache misses
> CFLU - cache flushes
> IRMA - IRAM accesses
> SRMA - SRAM (32-bit) accesses
> CPUC - CPU cycles (not including the memory wait states)
>
> What is interesting in the mpa codec is that most of the memory
> accesses happen in dct32 and synth_full. This is most likely caused
> by the huge size (~4 kB on ARM) of the dct32 routine. The libmad Makefile
> even uses -O instead of -O2 on iPod to get smaller code. I suppose that
> if someone could implement a small, non-unrolled dct32 routine, it
> would give a significant speedup.
>
> Does someone know a compact implementation of dct32? It is called
> DCT-II in
>
> http://en.wikipedia.org/wiki/Discrete_cosine_transform
>
> If no one volunteers I might try to find some time and implement/find
> something.
>
> --
> Tomek Malesinski

Received on 2007-01-26