Rockbox mail archive
Subject: Descrambling
From: Magnus Holmgren <lear_at_algonet.se>
Date: Sat, 09 Aug 2003 13:09:49 +0200

Hi,

Someone "complained" that the descrambling in rolo_load() was slow, so I had a look at it. :) As divides on the SH-1 are fairly slow, having two per descrambled byte explains a lot.

So I first tried replacing the two divides with one multiply (basically using the "addr" calculation from scramble.c instead). This halved the execution time. Then I rewrote the new version in assembler and halved the execution time again. :) Moving it to the internal RAM cut another 20-25 percent. In numbers, the execution time has gone down from 1.6 seconds to 0.3 seconds (32 ticks). There's one limitation, though: the image can be at most 256 kB. Maybe that's one reason for the 200 kB limit imposed by the Archos firmware?

I also noticed that the code that really needed to be in the "top" RAM wasn't much, so I moved it to the internal RAM too, which means the .topcode section could be removed. This uses an additional 80 bytes of internal RAM (including the descramble code). Even if the ROLO stuff, by definition, isn't executed much, I'd say it's worth it. :) But what do you think? Should I leave that in the patch?

Also, before submitting it, I think I'll convert the checksumming to assembler as well. It will only add a few bytes anyway, and it adds only one cycle per byte to the execution time.

Btw, while doing this, I noticed a potential bug in bitswap.S: a nop should be added after the rts instruction. It isn't a problem with the current code, but if the code is changed, the bug could be triggered, I think...

--
Magnus Holmgren

Received on 2003-08-09
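The divide-versus-multiply change described in the message can be illustrated with a minimal C sketch. It assumes, for illustration only, that plain byte i is stored in stripe (i % 4) at offset (i / 4) of the scrambled image, and that each byte is transformed by "rotate left one bit, then invert". The function names (scramble, descramble_div, descramble_mul) are hypothetical, and the real scramble.c and rolo_load() code may differ in details. descramble_div walks the scrambled buffer and needs a division and a modulo per byte; descramble_mul walks the output buffer instead and computes the source address with one multiply, which is the kind of rewrite the message describes.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed byte transform (illustrative only): rotate left by one bit, then invert. */
static unsigned char scramble_byte(unsigned char b)
{
    return (unsigned char)~((b << 1) | (b >> 7));
}

/* Inverse of scramble_byte: undo the inversion, then rotate right by one bit. */
static unsigned char descramble_byte(unsigned char b)
{
    unsigned char d = (unsigned char)~b;
    return (unsigned char)((d >> 1) | (d << 7));
}

/* Assumed layout: plain byte i goes to stripe (i & 3) at offset (i >> 2).
 * length must be a non-zero multiple of 4. */
static void scramble(const unsigned char *in, unsigned char *out, size_t length)
{
    size_t slen = length / 4;  /* size of one stripe */
    assert(length >= 4 && length % 4 == 0);
    for (size_t i = 0; i < length; i++)
        out[(i >> 2) + (i & 3) * slen] = scramble_byte(in[i]);
}

/* Walks the scrambled source: one modulo and one division per byte. */
static void descramble_div(const unsigned char *src, unsigned char *dst, size_t length)
{
    size_t slen = length / 4;
    assert(length >= 4 && length % 4 == 0);
    for (size_t j = 0; j < length; j++) {
        size_t addr = ((j % slen) << 2) + j / slen;  /* offset * 4 + stripe */
        dst[addr] = descramble_byte(src[j]);
    }
}

/* Walks the destination: same address as in scramble(), one multiply, no divides. */
static void descramble_mul(const unsigned char *src, unsigned char *dst, size_t length)
{
    size_t slen = length / 4;
    assert(length >= 4 && length % 4 == 0);
    for (size_t i = 0; i < length; i++) {
        size_t addr = (i >> 2) + (i & 3) * slen;
        dst[i] = descramble_byte(src[addr]);
    }
}
```

Because slen is constant for the whole image, (i & 3) * slen only takes four distinct values, so a hand-written loop could also precompute the four stripe offsets and drop the multiply entirely; either way, the per-byte divides are gone.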