Rockbox

  • Status Closed
  • Percent Complete
    100%
  • Task Type Patches
  • Category Codecs
  • Assigned To
    Buschel
  • Operating System All players
  • Severity Low
  • Priority Very Low
  • Reported Version Release 3.4
  • Due in Version Undecided
  • Due Date Undecided
  • Votes
  • Private
Attached to Project: Rockbox
Opened by Buschel - 2010-06-13
Last edited by Buschel - 2010-06-15

FS#11398 - libmad: changes to mad_bit_read()

When reviewing the libmad code segment mad_bit_read() it became obvious that on LITTLE_ENDIAN targets the function betoh32() is called in each call of mad_bit_read(). This is not needed as the address from which the data is loaded does not change on each call of mad_bit_read(). So, we can buffer betoh32’ed data.

Patch v01 adds two global variables. The first holds the address, the second holds the data. If the address did not change in the current call of mad_bit_read(), the old – already betoh32’ed – data is used. In case of the need to load the new address the global variables are set accordingly.

Measurements on a sample file showed that buffered data is used in 2/3 of all calls. So, on LITTLE_ENDIAN targets 2/3 of betoh32()-calls are eliminated, on Coldfire targets it might be of interest to measure speed as well. Reason is that the buffered variables are located in IRAM. A very similar approach in libmpc sped up Coldfire by several MHz.
Strange thing is that the effect on arm7tdmi is (slightly) negative. Without patch 37.0 MHz (single core), with patch 37.15 MHz (single core).

Patch v02 gets rid of global variables and adds both variables to the struct mad_bitptr. Libmad’s code defined two different structs of this name! Therefore patch v02 needs to add dummy variables to the “other” struct as well to avoid overwriting of memory. I will check for a better solution which cleans up this mess…

Needed:
a) Tests on Coldfire targets!
b) Ideas why there is a slight slowdown on arm (which should be LITTLE_ENDIAN)?
c) Clean up of struct mess.

Closed by  Buschel
2010-06-15 06:34
Reason for closing:  Rejected
Additional comments about closing:  

Slower an arm and coldfire.

There were several double dedfined macros and structs in mad.h. This patch cleans up mad.h in addition the the buffering change.

Submitted the clean up of mad.h with r26838. Added a detaioed comment and updated patch.

nls commented on 2010-06-14 08:37

Hi, i'm not familiar with the mad code but wouldn't it make sense to byteswap that data in one go when it's loaded intsead of this?

nls, I was thinking of the same. Maybe this could be a future solution for LITTLE_ENDIAN targets. Nevertheless this change needs to be tested on Coldfire due to the DRAM-vs-IRAM accesses.

tested on MPIO HD200 (mcf5249)
virgin svn: 463.19% realtime 26.81MHz needed for realtime
libmad_v04.patch: 458.22% realtime 27.10MHz needed for realtime

I'm puzzled :/

nls commented on 2010-06-14 12:35

for coldfire it doesn't seem so surprising, iiuc your patch introduces an extra load from iram (bitptr→buffered_addr), a comparison and a conditional branch while replacing a load from dram with one from iram.

Ok. I expected at least some gain as DRAM access is sloooow on Coldfire… In mpc's bit fiddling there were 3 DRAM accesses per call instead of only 1. Maybe this made up the difference…

Just disassembled the resulting asm for betoh32() – it is only 4 cycles. When taking into account the added code and compare it to the unpatched code the break-even for arm is reached when (betoh32() + ldr from DRAM) is ~10 cycles. If it takes more cycles, the patch will speed up, otherwise it will slow down.

I'll close this entry.

Loading...

Available keyboard shortcuts

Tasklist

Task Details

Task Editing