Rockbox mail archive
Subject: Re: Optimized bitswap
From: Magnus Holmgren (lear_at_algonet.se)
Linus Nielsen Feltzing wrote:
> > After testing using a small fliptable (i.e. 256 bytes rather than
> > 128 kB), I've found that the following code in mpeg.c:bitswap()
> > actually is quick enough (at least when executing in internal RAM)
> > on my Player.
> Remember that the Player is faster than the Recorder.
> > Seems to slow down scrolling a bit though. :)
> So is it really quick enough then? :-)
Yes, playback is fine. No other problems noticed (but I haven't tested
*that* much yet).
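
For readers without the patch at hand: the code under discussion is not reproduced in this message, so what follows is only a minimal sketch of the small-table idea, not the actual mpeg.c:bitswap(). It assumes that "bitswap" means reversing the bit order inside each byte, and the names fliptable, init_fliptable and bitswap_bytes are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 256-entry table: fliptable[v] holds v with its bits reversed. */
static uint8_t fliptable[256];

/* Fill the table once at startup. */
static void init_fliptable(void)
{
    for (unsigned v = 0; v < 256; v++) {
        uint8_t r = 0;
        for (unsigned bit = 0; bit < 8; bit++) {
            if (v & (1u << bit))
                r |= (uint8_t)(0x80u >> bit);  /* bit 0 -> bit 7, bit 1 -> bit 6, ... */
        }
        fliptable[v] = r;
    }
}

/* Reverse the bits of every byte in the buffer, in place.
   Each lap costs one data read, one table read and one data write. */
static void bitswap_bytes(uint8_t *data, size_t length)
{
    for (size_t i = 0; i < length; i++)
        data[i] = fliptable[data[i]];
}
```

A 16-bit-indexed table (65536 entries of two bytes, i.e. 128 kB) would handle two bytes per lap, which is where the "half as many laps" in the quoted reply comes from.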
> Yes, _all_ memory accesses matter. The reason that the 128 kB table is
> quick is because it allows for a tight loop with half as many laps as
> there are bytes. Out of all the memory accesses, only 3 of them deal
> with the MP3 data and the table. The rest are instruction fetches.
And maybe word fetches are faster than byte fetches? For instance, aren't
there architectures that always fetch longs, even for byte accesses?
I've no idea how the SH1 works...
> Your new version is somewhat more complicated. I haven't looked at
> the generated assembly code, but I can guess that it would be in the
> range of 15-20 instructions per loop. You have one extra table
> lookup, resulting in 4 slow accesses. Estimated cost: 17x100+100x4x3
> = 2900. It's faster than a straightforward byte-based swap.
Are there any L1 caches that could keep (much of) the table in memory?
> If we now put the table in IRAM, it would then be
> 17x100+100x2x3+100x2=2500. So in theory, it could be about 50% slower
> than the word-based swap, and we save 128 kB. Definitely something to
In my tests, the table was not in IRAM...
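
The two estimates quoted above can be reproduced with a small helper. This is only a sketch of one reading of the figures: it assumes 17 is the instruction count per lap, 100 the number of laps, 3 the cost of a slow (external RAM) access and 1 the cost of an IRAM access. The function name and parameters are hypothetical and do not come from the Rockbox sources.

```c
/* Hypothetical cost model, one reading of the quoted figures:
   total = laps * instructions
         + laps * slow_accesses * slow_cost
         + laps * fast_accesses * fast_cost */
static unsigned estimate_cost(unsigned laps, unsigned instructions,
                              unsigned slow_accesses, unsigned slow_cost,
                              unsigned fast_accesses, unsigned fast_cost)
{
    return laps * instructions
         + laps * slow_accesses * slow_cost
         + laps * fast_accesses * fast_cost;
}

/* Table in external RAM: all 4 data accesses are slow. */
static unsigned cost_table_in_dram(void)
{
    return estimate_cost(100, 17, 4, 3, 0, 1);   /* 1700 + 1200 = 2900 */
}

/* Table in IRAM: 2 slow accesses to the data, 2 fast accesses to the table. */
static unsigned cost_table_in_iram(void)
{
    return estimate_cost(100, 17, 2, 3, 2, 1);   /* 1700 + 600 + 200 = 2500 */
}
```

Under these assumptions, moving the table to IRAM saves 400 of the 2900 units, since the instruction fetches dominate either way.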