dev builds
themes manual
device status forums
mailing lists
IRC bugs
dev guide

Rockbox mail archive

Subject: Re: Various bug fixes and improvements - where to put?
From: Jens Arnold (
Date: 2004-03-02

Hello [IDC]Dragon

On 02.03.2004, you wrote:

>> as I have done various patches and improvements on different
>> low-level routines in Rockbox I would like to know where I
>> can put these in order to be revised and eventually
>> integrated into the main cvs.

> The patch tracker is the official way to post patches. I know
> it's difficult for the core developers to follow users'
> ...
> I'm very interested in your patches and like to see them
> there.

Ok, will file patches then (not now, has to wait until the
>> - corrected memory waste in "bitswap.s" and "descramble.s"
>> (simple .align issue)

> Wasting 2 bytes, or what? The alignment may be intentional,
> because the SH fetches two 16-bit instruction in one 32-bit
> bus cycle. The execution time may depend on this pairing.

Not 2 bytes, but more like 12 bytes (maximum) for each align.
I know well from the SH1 manual that memory load/store
instructions in IRAM should be long aligned in order to not
cause contention with instruction fetch. The problem is with the
syntax of the .align pseudo-instruction:
".align <n>" does not align to <n> bytes but to 2^<n> bytes,
so the ".align 4" which is used there aligns to 16 bytes and not
4. For longword alignment ".align 2" must be used.

This is _confirmed_ by assembler output of compiler (gcc -S
xxx.c), internet gcc tips and by comparing final Rockbox size.

>> - 20% faster bitswap

> How that? (OK, I'll wait for your patch posting)

You will see my new loop...

>> - new turbo-charged copy_read_sectors() in "ata.c", with
>> speed
>> relations as follows:

>> | word-aligned | unaligned
>> -------------+-------------------+--------------------
>> C original | 2.02 MB/s (100 %) | 1.71 MB/s (100 %)
>> [IDC]Dragon | 3.42 MB/s (169 %) | 2.22 MB/s (130 %)
>> new version | 4.76 MB/s (236 %) | 3.33 MB/s (195 %)

> Which "C original" is that? The current routines have already
> been optimized on C level by me (look into the cvs history of
> ata.c), so the gain could be even more drastic if you compare
> with the 2.1 code.

This is against _current_ "ata.c" code ("loop compiles to...)

> assembler code. Again. I'm interested in what you did. A bug
> (?) in the inline assembler prevented me from further
> unrolling the copy loop. I did not manage to do stores with an
> immediate offset, so I had to use r0 for that, and this is
> only good for one level of unrolling.

I _do_ actually use stores with immediate offsets! You have to
obey the constraints of these inctructions:

- the source register must always be r0
- the offset can _only_ be positive (no sign extension) and very
- with the inline assembler syntax, you have to give the offset
  already shifted according to data width, e.g. if you want to
  store a word 2 bytes ahead, you have to write
  "mov.w r0,@(2,r<n>)"
  If you use "mov.w r0,@(1,r<n>)" you get an assemble error.

>> - the take-out of cli() / sti() and replacement of these by
>> set_irq_level() made disk loading in the video player
>> extremely
>> slow.

>> I don't understand what the intention in replacing cli() /
>> sti() by set_irq_level() really is. The comment says that
>> cli() / sti() are not safe.

> I didn't understand this either, and it seems I have not yet
> compiled this into the version I'm running, else I may have
> noticed the drag. They should be inline functions.

> For the LCD code, we can take out the interrupt disable (like
> my comment says), I only put this in to be safe for the

I know your comment, but I wanted to retain your safety measure.
Please let me know what you think about my idea about nesting
cli() / sti().

Regards, Jens


Page was last modified "Jan 10 2012" The Rockbox Crew