


Rockbox mail archive

Subject: Profiling the codecs


From: Tomasz Malesinski <tmal_at_mimuw.edu.pl>
Date: Thu, 25 Jan 2007 22:14:08 +0100

I have used my iFP emulator (http://www.mimuw.edu.pl/~tmal/rockbox/)
to do some profiling. The emulator counts CPU cycles, memory
accesses and cache events (the iFP has an 8 kB, 2-way set-associative
unified cache).

The results are here:
http://www.mimuw.edu.pl/~tmal/rockbox/profile_mpa.txt
http://www.mimuw.edu.pl/~tmal/rockbox/profile_vorbis.txt

The counters are:
CHIT - cache hits
CMIS - cache misses
CFLU - cache flushes
IRMA - IRAM accesses
SRMA - SRAM (32-bit) accesses
CPUC - CPU cycles (not including the memory wait states)

What is interesting in the mpa codec is that most of the memory
accesses happen in dct32 and synth_full. This is most likely caused
by the huge size of the dct32 routine (~4 kB on ARM, i.e. half of the
iFP's 8 kB cache). The libmad Makefile even uses -O instead of -O2 on
the iPod to get smaller code. I suppose that if someone could implement
a small, non-unrolled dct32 routine, it would give a significant speedup.

Does anyone know a compact implementation of dct32? It is called
the DCT-II in

http://en.wikipedia.org/wiki/Discrete_cosine_transform

If no one volunteers, I might try to find some time to implement or
find something.

-- 
Tomek Malesinski
Received on 2007-01-25

Page was last modified "Jan 10 2012" The Rockbox Crew