On Sat, 16 Jul 2005 15:08:31 +0200
Magnus Holmgren <firstname.lastname@example.org> wrote:
> Pedro Vasconcelos wrote:
> > The optimizations that I made were fairly straightforward: writing short
> > asm routines for 32-bit arithmetic in the hooks provided and placing
> > some critical arrays in the fast IRAM.
> Any idea how much the cosine table is used? It can be made to fit
> (perhaps by throwing out some of the window lookup tables that aren't
> used anyway), but when I tried that, I didn't notice much of a difference.
I tried keeping the most heavily used data in IRAM: the pcm buffer used in
floor synthesis, the sine/cosine tables and the window lookup tables. There
isn't enough space for all the windows, so only the most commonly used
window sizes (256 and 2048) are kept in IRAM.
Also, the Vorbis spec defines two methods for floor encoding, and only
the floor1 method is used by current encoders, so only the floor1 tables
are kept in IRAM.
Right now Rockbox Tremor uses almost all of the 32 KB of IRAM available to
a codec, and I can't see much else we could put there that would make much
of a difference.
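A minimal sketch of the IRAM placement described above, assuming a GCC-style
section attribute. The macro name IRAM_DATA, the TARGET_HAS_IRAM switch, the
".idata" section name, and all table names and sizes are hypothetical and
only illustrate the idea; they are not the actual Rockbox or Tremor
definitions. On a host build the macro expands to nothing, so the budget
check can be run anywhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical macro: on the target it would ask the linker to place the
 * data in on-chip SRAM; on a host build it expands to nothing.
 * The section name ".idata" is an assumption about the linker script. */
#ifdef TARGET_HAS_IRAM
#define IRAM_DATA __attribute__((section(".idata")))
#else
#define IRAM_DATA
#endif

#define IRAM_BUDGET (32 * 1024)   /* bytes available to a codec */

/* Illustrative tables only; names and sizes are assumptions. */
int32_t win_256[128]       IRAM_DATA;  /* half-window for 256-sample blocks  */
int32_t win_2048[1024]     IRAM_DATA;  /* half-window for 2048-sample blocks */
int32_t sincos_table[2050] IRAM_DATA;  /* sine/cosine lookup                 */
int32_t pcm_buf[4096]      IRAM_DATA;  /* floor synthesis work buffer        */

/* Total bytes this sketch would claim from IRAM. */
size_t iram_bytes_used(void)
{
    return sizeof win_256 + sizeof win_2048 +
           sizeof sincos_table + sizeof pcm_buf;
}

/* Abort early if the chosen tables no longer fit the budget. */
void check_iram_budget(void)
{
    assert(iram_bytes_used() <= IRAM_BUDGET);
}
```

With these made-up sizes the tables take most of the budget, which is the
situation described above: any further table has to displace an existing
one.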
> > The other difficulty is the lack of profiling on the actual iriver
> > hardware. I have done Tremor profiling on my P4 to get an idea of what
> > the critical functions were, but the Coldfire is very different (cache,
> > pipelines, etc) so it is all guesswork.
> Still, that should give an indication of which functions are used a
> lot, to see what to focus on and what to put in IRAM... Have you asked
> on the Tremor mailing list about profiling data (on ARM or ColdFire
Sure, I have done that profiling and it does give an indication of the
most relevant functions; here is the profile obtained from decoding a
2-minute Q6 song:
  %   cumulative     self              self    total
  time   seconds  seconds    calls  ms/call  ms/call  name
 20.30      4.41     4.41   346878     0.01     0.01  mdct_butterfly_generic
 13.79      7.41     3.00    15778     0.19     0.60  mdct_backward
 10.06      9.59     2.19  6065325     0.00     0.00  decode_packed_entry_number
  9.92     11.74     2.15   453691     0.00     0.01  vorbis_book_decodevv_add
  8.20     13.53     1.78    15778     0.11     0.11  _vorbis_apply_window
  7.09     15.06     1.54  7847603     0.00     0.00  oggpack_look
  5.82     16.33     1.26   164430     0.01     0.01  render_line
  3.98     17.20     0.86    15778     0.05     0.06  mdct_bitreverse
  2.85     17.82     0.62                             vorbis_synthesis_blockin
The mdct functions (mdct_butterfly_generic, mdct_backward and
mdct_bitreverse, together about 38% of the run time) seem like the best
candidates for optimisation, which was also what the Tremor list people
suggested. The most promising route seems to be replacing the MDCT with an
FFT together with some shuffling of the data to give a mathematically
equivalent answer. From what I gather the FFT algorithm is much more
regular and can be optimised to use the MAC pipeline more effectively. But
I have no idea how to do this replacement.
Received on Sat Jul 16 18:34:25 2005