FS#2039 - 20% faster bitswap, turbocharged copy_read_sectors etc.

Attached to Project: Rockbox
Opened by Jens Arnold (amiconn) - Tuesday, 02 March 2004, 19:20 GMT
Last edited by Jörg Hohensohn (hohensoh) - Monday, 05 April 2004, 08:36 GMT
Task Type Patches
Status Closed
Assigned To Jörg Hohensohn (hohensoh)
Operating System
Severity Low
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 0%
Votes 0
Private No


This patch applies to 4 files:

- 20 % faster (from 18 to 15 clock cycles per loop),
- corrected alignment to longword, saves space
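For readers unfamiliar with the routine: going from 18 to 15 clock cycles per loop means 18/15 = 1.2, which is the quoted 20 % speedup. As a rough illustration only, assuming that the bitswap named in the task title reverses the bit order within each byte of a buffer, a portable C version could look like the sketch below. The names `reverse_bits8` and `bitswap_sketch` are hypothetical and are not taken from the patch, which is written in assembler.

```c
#include <stddef.h>
#include <stdint.h>

/* Reverse the bit order within one byte: swap nibbles, then bit
   pairs, then neighbouring bits. */
static uint8_t reverse_bits8(uint8_t b)
{
    b = (uint8_t)(((b & 0xF0u) >> 4) | ((b & 0x0Fu) << 4));
    b = (uint8_t)(((b & 0xCCu) >> 2) | ((b & 0x33u) << 2));
    b = (uint8_t)(((b & 0xAAu) >> 1) | ((b & 0x55u) << 1));
    return b;
}

/* Hypothetical portable stand-in (not the patch's code): reverse the
   bits of every byte in the buffer, in place. */
static void bitswap_sketch(uint8_t *data, size_t length)
{
    for (size_t i = 0; i < length; i++)
        data[i] = reverse_bits8(data[i]);
}
```

A hand-written assembler loop would typically process more than one byte per iteration, which is where a per-loop cycle count matters.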

- new, turbo-charged copy_read_sectors in assembler

Speed figures (from clock cycle counting, taking
pipeline stalls into account):

routine     | word-aligned      | unaligned
C original  | 2.02 MB/s (100 %) | 1.71 MB/s (100 %)
[IDC]Dragon | 3.42 MB/s (169 %) | 2.22 MB/s (130 %)
new version | 4.76 MB/s (236 %) | 3.33 MB/s (195 %)
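The percentages in the table appear to be each routine's throughput relative to the C original in the same column. A small sketch of that arithmetic, using a hypothetical helper `relative_percent` that is not part of the patch:

```c
/* Hypothetical helper: throughput relative to a baseline, rounded to
   the nearest whole percent. */
static int relative_percent(double speed, double baseline)
{
    return (int)(100.0 * speed / baseline + 0.5);
}
```

For example, `relative_percent(4.76, 2.02)` reproduces the 236 % in the word-aligned column, and `relative_percent(3.33, 1.71)` the 195 % in the unaligned column.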

If there are wait states, the speed differences may be
smaller, but the speed relation between the routines
should be preserved. If there are memory wait states, my
routine for unaligned data should be even faster
compared to the others since it writes words, not

This is not enabled by default; if you want it compiled
in, you have to comment out "#define PREFER_C".
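To illustrate the unaligned case mentioned above, here is one way a copy loop can still store whole words when the destination buffer starts at an odd address: store one leading byte, then pair the leftover low byte of each word with the high byte of the next one. This is only a sketch under assumptions, not the patch's assembler. The `word_source` type stands in for the ATA data register, the byte order (high byte first) is assumed, and all names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the ATA data register: hands out 16-bit
   words from an array, one per call. */
typedef struct {
    const uint16_t *words;
    size_t pos;
} word_source;

static uint16_t next_word(word_source *src)
{
    return src->words[src->pos++];
}

/* Copy wordcount 16-bit words to a destination that starts at an odd
   address, high byte first (assumed byte order). Only the first and
   last bytes are stored individually; everything in between is stored
   as 2-byte pairs at even addresses. */
static void copy_words_odd_dest(uint8_t *dest, word_source *src,
                                size_t wordcount)
{
    if (wordcount == 0)
        return;

    uint16_t w = next_word(src);
    *dest++ = (uint8_t)(w >> 8);         /* leading byte; dest is now even */
    uint8_t carry = (uint8_t)(w & 0xFF); /* low byte waits for a partner */

    for (size_t i = 1; i < wordcount; i++) {
        w = next_word(src);
        uint8_t pair[2] = { carry, (uint8_t)(w >> 8) };
        memcpy(dest, pair, 2);           /* a single 16-bit store on the target */
        dest += 2;
        carry = (uint8_t)(w & 0xFF);
    }

    *dest = carry;                       /* trailing byte */
}
```

The `memcpy` of two bytes keeps the sketch independent of host endianness; in assembler the same step would be one aligned word store, which halves the number of memory writes compared with storing every byte separately.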

- corrected to the desired longword alignment, saves space

- took out now unnecessary variable "oldlevel" from

The patch is against current rockbox source (2004-03-02)

Closed by  Jörg Hohensohn (hohensoh)
Monday, 05 April 2004, 08:36 GMT
Reason for closing:  Accepted
Additional comments about closing:

Although the fast read is disabled by default, I think we can
close this as a patch.
Comment by Linus Nielsen Feltzing (linusnielsen) - Tuesday, 02 March 2004, 19:47 GMT

Nice work! I'll review and test it ASAP.

Comment by Jörg Hohensohn (hohensoh) - Wednesday, 03 March 2004, 08:15 GMT

alignment and bitswap committed (14% increase in the real
ATA code pending, I measured a real world speedup of
18%/34% aligned/misaligned.
Comment by Jens Arnold (amiconn) - Thursday, 04 March 2004, 00:06 GMT

New version of ata.c patch. Didn't include bitswap and other
fixes this time since they are already committed. The new
ata.c patch

- further improved read speed for both aligned/unaligned (only
about 3% this time)
- shorter code (6 instructions fewer) to save IRAM
- scratches one register less than my old code

- alternative, only slightly optimized (versus C version)
to test with slow drives that have problems with the turbo
version. This one is very short.

Look in the ata.c header area for an additional #define
Comment by Jens Arnold (amiconn) - Monday, 08 March 2004, 00:47 GMT

Corrected bug within alternative assembler routine.
Comment by Jörg Hohensohn (hohensoh) - Monday, 05 April 2004, 08:36 GMT

Although the fast read is disabled by default, I think we can
close this as a patch.