Rockbox mail archive
Subject: Descrambling
From: Magnus Holmgren <lear_at_algonet.se>
Date: Sat, 09 Aug 2003 13:09:49 +0200
Someone "complained" that the descrambling in rolo_load() was slow, so I
had a look at it. :) As divides on the SH-1 are fairly slow, having two
per descrambled byte explains a lot. So I first tried replacing the
two divides with one multiply (basically use the "addr" calculation from
scramble.c instead). This halved the execution time.
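To illustrate the change, here is a rough C sketch. It assumes the scrambler interleaves the image four ways (slen = length / 4) and stores each byte rotated left by one bit and inverted; that layout, the function names and the helpers are illustrative assumptions, not copied from rolo.c or scramble.c. The first descrambler walks the source and needs a divide and a modulo per byte; the second walks the destination and needs a single multiply.

```c
#include <stddef.h>
#include <stdint.h>

/* 8-bit rotates; the casts drop the bits shifted past bit 7. */
static uint8_t rol8(uint8_t b) { return (uint8_t)((b << 1) | (b >> 7)); }
static uint8_t ror8(uint8_t b) { return (uint8_t)((b >> 1) | (b << 7)); }

/* Assumed scrambler: four-way interleave, then rotate left and invert.
   length is assumed to be a multiple of 4. */
void scramble(const uint8_t *in, uint8_t *out, size_t length)
{
    size_t slen = length / 4;
    for (size_t i = 0; i < length; i++) {
        size_t addr = (i >> 2) + (i & 3) * slen;
        out[addr] = (uint8_t)~rol8(in[i]);
    }
}

/* Walks the source: one divide and one modulo per byte. */
void descramble_div(const uint8_t *src, uint8_t *dst, size_t length)
{
    size_t slen = length / 4;
    for (size_t i = 0; i < length; i++) {
        size_t addr = ((i % slen) << 2) + i / slen;
        dst[addr] = ror8((uint8_t)~src[i]);
    }
}

/* Walks the destination: the scrambler's own address calculation,
   one multiply per byte and no divides. */
void descramble_mul(const uint8_t *src, uint8_t *dst, size_t length)
{
    size_t slen = length / 4;
    for (size_t i = 0; i < length; i++) {
        size_t addr = (i >> 2) + (i & 3) * slen;
        dst[i] = ror8((uint8_t)~src[addr]);
    }
}
```

Both descramblers produce the same output for lengths that are a multiple of 4. If the multiply only takes 16-bit operands, slen would have to fit in 16 bits, which would cap length at 4 x 65536 bytes = 256 kB; whether that is the source of the limit mentioned in this post is a guess.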
Then I rewrote the new version in assembler and halved the execution
time again. :) Moving it to the internal RAM cut another 20-25 percent.
In numbers, that means the execution time has gone down from 1.6 seconds
to 0.3 seconds (32 ticks). There's one limitation though: the image can
be at most 256 kB large... Maybe that's one reason for the 200 kB limit
imposed by the Archos firmware?
I also noticed that the code that really needed to be in the "top" RAM
wasn't much, so I moved that to the internal RAM too, so the .topcode
section could be removed. It means an additional 80 bytes of the
internal RAM is used (this includes the descramble code). Even if the
ROLO stuff, by definition, isn't executed much, I'd say it's worth it
:) But what do you think? Should I leave that in the patch?
Also, before submitting it, I think I'll convert the checksumming to
assembler as well. It will only add a few bytes anyway, and only
one cycle per byte in execution time.
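As a sketch of why folding the checksum into the descramble loop is cheap: the byte is already in a register, so the extra work is one add per byte. The C below assumes the checksum is a 16-bit sum of the descrambled bytes and reuses the assumed four-way interleave with rotate and invert; the function name descramble_sum is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Descramble src into dst and return a 16-bit sum of the output bytes.
   The interleave layout and the checksum definition are assumptions. */
uint16_t descramble_sum(const uint8_t *src, uint8_t *dst, size_t length)
{
    size_t slen = length / 4;
    uint16_t sum = 0;
    for (size_t i = 0; i < length; i++) {
        uint8_t b = src[(i >> 2) + (i & 3) * slen];
        b = (uint8_t)~(uint8_t)((b >> 1) | (b << 7)); /* rotate right, invert */
        dst[i] = b;
        sum = (uint16_t)(sum + b);                    /* the one extra add */
    }
    return sum;
}
```

The caller would compare the returned value against the checksum stored with the image before jumping to the new code.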
Btw, while doing this, I noticed a potential bug in bitswap.S: a nop
should be added after the rts instruction. It isn't a problem with the
current code, but if it is changed, the bug could be triggered, I think...
-- Magnus Holmgren
Received on 2003-08-09