Rockbox mail archive
Subject: Re: Optimized bitswap
From: Linus Nielsen Feltzing <linus_at_haxx.se>
Date: Sun, 18 Aug 2002 21:11:22 +0200

On Sun, 18 Aug 2002 20:54:43 +0200, Magnus Holmgren wrote:
> Hi,
>
> After testing using a small fliptable (i.e. 256 bytes rather than
> 128 kB), I've found that the following code in mpeg.c:bitswap()
> actually is quick enough (at least when executing in internal RAM)
> on my Player.

Remember that the Player is faster than the Recorder.

> Seems to slow down scrolling a bit though. :)

So is it really quick enough then? :-)

> The memory accesses seems to be the critical part.

Yes, _all_ memory accesses matter. The reason that the 128 kB table is
quick is that it allows for a tight loop with half as many laps as
there are bytes. Out of all the memory accesses, only 3 of them deal with
the MP3 data and the table. The rest are instruction fetches. Thus, having
the code in IRAM is what gives us the major performance boost.

Let's say that today's word-based loop does 10 assembly instructions per
MP3 data word, 1 cycle each. Add to that the slow MP3 data and table
accesses: 1 MP3 data read, 1 table lookup read, 1 MP3 data write. Let's
say that they take 3 cycles each. The cost for 100 words is then
10x100+100x3x3 = 1900 cycles.

Your new version is somewhat more complicated. I haven't looked at the
generated assembly code, but I can guess that it would be in the range of
15-20 instructions per loop. You have one extra table lookup, resulting in
4 slow accesses. Estimated cost: 17x100+100x4x3 = 2900. It's faster than a
straightforward byte-based swap.

If we now put the table in IRAM, it would then be
17x100+100x2x3+100x2 = 2500.

So in theory, it could be about 50% slower than the word-based swap, and
we save 128 kB. Definitely something to consider.

/Linus
--
Linus Nielsen Feltzing, linus_at_haxx.se

Received on 2002-08-18
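
The cycle estimates in the message can be reproduced with a small C sketch. This is only the back-of-the-envelope model from the text, not a measurement: it assumes 1 cycle per instruction, 3 cycles per external-RAM access, and 1 cycle per IRAM access (the last value is read off the "100x2" term in the final estimate). The function name `estimate_cycles` and its parameters are hypothetical and do not come from the Rockbox source.

```c
/* Rough cost model for a bitswap loop, following the estimate in the
 * message. All constants are assumptions taken from that estimate,
 * not measured values. */
enum {
    INSN_CYCLES = 1, /* assumed: every instruction takes 1 cycle        */
    SLOW_CYCLES = 3, /* assumed: external-RAM access (MP3 data, table)  */
    FAST_CYCLES = 1  /* assumed: access to a table placed in IRAM       */
};

/* Estimated cycles for `words` loop iterations, where each iteration
 * runs `insns` instructions and performs `slow` external-RAM accesses
 * and `fast` IRAM accesses. */
unsigned long estimate_cycles(unsigned long words, unsigned insns,
                              unsigned slow, unsigned fast)
{
    unsigned long per_word = (unsigned long)insns * INSN_CYCLES
                           + (unsigned long)slow * SLOW_CYCLES
                           + (unsigned long)fast * FAST_CYCLES;
    return words * per_word;
}
```

With 100 words, `estimate_cycles(100, 10, 3, 0)` gives the word-based figure of 1900, `estimate_cycles(100, 17, 4, 0)` gives 2900 for the small table in external RAM, and `estimate_cycles(100, 17, 2, 2)` gives 2500 once the table lookups are served from IRAM. Changing `insns` between 15 and 20 shows how sensitive the comparison is to the guessed instruction count.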