Rockbox mail archive
Subject: Re: Optimized bitswap
From: Linus Nielsen Feltzing (linus_at_haxx.se)
On Sun, 18 Aug 2002 20:54:43 +0200, Magnus Holmgren wrote:
> After testing using a small fliptable (i.e. 256 bytes rather than
> 128 kB), I've found that the following code in mpeg.c:bitswap()
> actually is quick enough (at least when executing in internal RAM)
> on my Player.
Remember that the Player is faster than the Recorder.
> Seems to slow down scrolling a bit though. :)
So is it really quick enough then? :-)
> The memory accesses seem to be the critical part.
Yes, _all_ memory accesses matter. The 128 kB table is quick because
it allows a tight loop with half as many iterations as there are
bytes. Of all the memory accesses, only 3 deal with the MP3 data and
the table; the rest are instruction fetches.
Thus, having the code in IRAM is what gives us the major performance
boost. Let's say that today's word-based loop does 10 assembly
instructions per MP3 data word, 1 cycle each. Add to that the slow
MP3 data and table accesses: 1 MP3 data read, 1 table lookup read, 1
MP3 data write. Let's say that they take 3 cycles each. The cost for
100 words is then 10x100+100x3x3 = 1900 cycles.
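To make the estimate concrete, here is a minimal sketch in C of the kind of word-based loop being costed, together with the arithmetic. It is not the code from mpeg.c. The names fliptable16, init_fliptable16, bitswap_words and estimate_cycles are made up for illustration, and the sketch assumes that bitswap reverses the bit order within each byte and that the 128 kB table holds 65536 16-bit entries.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table: 65536 entries x 2 bytes = 128 kB. */
static uint16_t fliptable16[65536];

/* Reverse the bit order within one byte (only used to fill the table). */
uint8_t reverse_byte(uint8_t b)
{
    uint8_t r = 0;
    for (int bit = 0; bit < 8; bit++) {
        r = (uint8_t)((r << 1) | (b & 1));
        b >>= 1;
    }
    return r;
}

/* Fill the table so that each entry has both of its bytes bit-reversed. */
void init_fliptable16(void)
{
    for (uint32_t i = 0; i < 65536; i++) {
        uint8_t hi = reverse_byte((uint8_t)(i >> 8));
        uint8_t lo = reverse_byte((uint8_t)(i & 0xFF));
        fliptable16[i] = (uint16_t)((hi << 8) | lo);
    }
}

/* One iteration per 16-bit word, i.e. half as many as there are bytes. */
void bitswap_words(uint16_t *data, size_t nwords)
{
    for (size_t i = 0; i < nwords; i++) {
        uint16_t w = data[i];   /* slow access 1: MP3 data read  */
        w = fliptable16[w];     /* slow access 2: table lookup   */
        data[i] = w;            /* slow access 3: MP3 data write */
    }
}

/* Cost model from the text: 1-cycle instructions plus slow accesses. */
unsigned long estimate_cycles(unsigned long words,
                              unsigned long instrs_per_word,
                              unsigned long slow_per_word,
                              unsigned long cycles_per_slow)
{
    return instrs_per_word * words + words * slow_per_word * cycles_per_slow;
}
```

With the assumed figures, estimate_cycles(100, 10, 3, 3) gives the 1900 cycles quoted above. The instruction and cycle counts are guesses, not measurements.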
Your new version is somewhat more complicated. I haven't looked at
the generated assembly code, but I would guess that it is in the
range of 15-20 instructions per iteration. You have one extra table
lookup, resulting in 4 slow accesses. Estimated cost, taking 17
instructions: 17x100+100x4x3 = 2900. That is still faster than a
straightforward byte-based swap.
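For comparison, a rough sketch of what a word-at-a-time loop over a 256-byte table might look like. This is a guess at the general shape, not the code under discussion. The names fliptable8, init_fliptable8 and bitswap_words_small are hypothetical, and it again assumes that the swap reverses the bit order within each byte.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical small table: 256 entries x 1 byte = 256 bytes. */
static uint8_t fliptable8[256];

/* Fill the table with the bit-reversed value of every possible byte. */
void init_fliptable8(void)
{
    for (unsigned i = 0; i < 256; i++) {
        unsigned v = i, r = 0;
        for (int bit = 0; bit < 8; bit++) {
            r = (r << 1) | (v & 1);
            v >>= 1;
        }
        fliptable8[i] = (uint8_t)r;
    }
}

/* Still one iteration per 16-bit word, but two table lookups each. */
void bitswap_words_small(uint16_t *data, size_t nwords)
{
    for (size_t i = 0; i < nwords; i++) {
        uint16_t w  = data[i];              /* slow access 1: MP3 data read  */
        uint16_t hi = fliptable8[w >> 8];   /* slow access 2: table lookup   */
        uint16_t lo = fliptable8[w & 0xFF]; /* slow access 3: table lookup   */
        data[i] = (uint16_t)((hi << 8) | lo); /* slow access 4: MP3 data write */
    }
}
```

The extra shift, mask and combine steps are where the higher instruction count per word would come from, and the second lookup is the fourth slow access.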
If we now put the table in IRAM, so that the two table lookups take
1 cycle each, it would then be 17x100+100x2x3+100x2 = 2500. So in
theory, it could be about 30% slower than the word-based swap (about
50% slower with the table outside IRAM), and we save 128 kB.
Definitely something to
Linus Nielsen Feltzing, linus_at_haxx.se on 2002-08-18
Page was last modified "Jan 10 2012" The Rockbox Crew