Rockbox mail archive
Subject: Profiling the codecs
From: Tomasz Malesinski <tmal_at_mimuw.edu.pl>
Date: Thu, 25 Jan 2007 22:14:08 +0100

I have used my iFP emulator (http://www.mimuw.edu.pl/~tmal/rockbox/) to do
some profiling. The emulator counts CPU cycles, memory accesses and cache
events (iFP has 8 kB of 2-way unified cache).

The results are here:

http://www.mimuw.edu.pl/~tmal/rockbox/profile_mpa.txt
http://www.mimuw.edu.pl/~tmal/rockbox/profile_vorbis.txt

The counters are:

CHIT - cache hits
CMIS - cache misses
CFLU - cache flushes
IRMA - IRAM accesses
SRMA - SRAM (32-bit) accesses
CPUC - CPU cycles (not including the memory wait states)

What is interesting in the mpa codec is that most of the memory accesses
happen in dct32 and synth_full. This is most likely caused by the huge size
(~4 kB on ARM) of the dct32 routine. The libmad Makefile even uses -O instead
of -O2 on Ipod to get smaller code.

I suppose that if someone could implement a small, non-unrolled dct32
routine, it would give a significant speedup. Does anyone know a compact
implementation of dct32? It is called DCT-II in
http://en.wikipedia.org/wiki/Discrete_cosine_transform

If no one volunteers, I might try to find some time and implement or find
something.

-- 
Tomek Malesinski

Received on 2007-01-25