Rockbox

  • Status Closed
  • Percent Complete 100%
  • Task Type Patches
  • Category LCD
  • Assigned To No-one
  • Operating System PortalPlayer-based
  • Severity Low
  • Priority Very Low
  • Reported Version Daily build (which?)
  • Due in Version Undecided
  • Due Date Undecided
Attached to Project: Rockbox
Opened by Buschel - 2007-10-14
Last edited by amiconn - 2007-10-22

FS#7951 - Speed optimization for yuv-conversion

This patch speeds up the lcd_yuv_blit function for iPod Video a bit and should speed it up a lot for iPod nano, iPod color and iriver H10.

For the iPod Video two changes were made:
- use ICODE_ATTR for the lcd_yuv_blit() function
- simplify the boundary check for red/green/blue components a bit

Result for iPod Video via test_fps:
6.7 → 7.2 fps (full screen, 30MHz)
20.7 → 21.8 fps (quarter screen, 30MHz)

For all other players the optimized yuv-conversion was ported from iPod Video to the respective LCD drivers. This still needs to be tested though, as I do not have any of these players.

Please post your experiences/results here.

Thanks :)

Closed by  amiconn
2007-10-22 00:39
Reason for closing:  Accepted
Additional comments about closing:  

Adapted and committed to SVN.

Just to mention one thing: the asm routines for the yuv-conversion which were written for sansa/gigabeat could also be optimized further. As I have no way to test on these targets I won't post any patches for them. But if you take a look at the patches above you will see that nearly all of the offsets can be summed into a single constant, which changes e.g.

G = ( 74 * (Y' - 16) - 24 * (Cb - 128) - 51 * (Cr - 128) + 128 ) >> 8

to

G = ( 74 * Y' - 24 * Cb - 51 * Cr + ROUNDOFFG ) >> 8, where ROUNDOFFG = -74*16 + 24*128 + 51*128 + 128 = 8544 = const

The asm routines currently apply offsets to Y', Cb and Cr, and again before shifting. This can be reduced to one single offset before shifting for each of the components R, G and B.
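The folding described above can be checked with a small C sketch. It is not taken from the patch: the function names are hypothetical, and only the coefficients and ROUNDOFFG come from the formulas in this comment. Both forms share the same numerator, so they agree for every 8-bit input, whatever the compiler does when right-shifting a negative value.

```c
#include <assert.h>

/* ROUNDOFFG as given in the comment above: the per-input offsets and the
 * rounding term folded into one constant (evaluates to 8544). */
#define ROUNDOFFG (-74 * 16 + 24 * 128 + 51 * 128 + 128)

/* Green with an offset applied to each input first (original form).
 * Hypothetical name, for illustration only. */
static int green_separate(int y, int cb, int cr)
{
    return (74 * (y - 16) - 24 * (cb - 128) - 51 * (cr - 128) + 128) >> 8;
}

/* Green with a single folded offset (proposed form). */
static int green_folded(int y, int cb, int cr)
{
    return (74 * y - 24 * cb - 51 * cr + ROUNDOFFG) >> 8;
}

/* Counts the 8-bit (Y', Cb, Cr) triples for which the two forms differ. */
long count_green_mismatches(void)
{
    long bad = 0;
    for (int y = 0; y < 256; y++)
        for (int cb = 0; cb < 256; cb++)
            for (int cr = 0; cr < 256; cr++)
                if (green_separate(y, cb, cr) != green_folded(y, cb, cr))
                    bad++;
    return bad;
}

/* Exhaustive self-check over all 2^24 input triples. */
void check_green_fold(void)
{
    assert(ROUNDOFFG == 8544);
    assert(count_green_mismatches() == 0);
}
```

Calling check_green_fold() from a test program exercises every input combination. The same folding should apply to R and B, but their constants are not spelled out in this comment, so they are left out here.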

In detail for lcd-as-c200.S

following lines are obsolete:

  sub         r7, r7, #16             @ r7 = Y = (Y' - 16)*74
  sub         r8, r8, #128            @ Cb -= 128
  sub         r9, r9, #128            @ Cr -= 128

and lines

  add         r8, r8, #2              @ r8 = bu = (Cb*128 + 128) >> 8 -> wrong comment, should be: r8 = bu = (Cb*128 + 256) >> 9
  mov         r8, r8, asr #2          @
  add         r9, r9, #256            @ r9 = rv = (r9 + 256) >> 9
  mov         r9, r9, asr #9          @
  rsb         r10, r10, #128          @ r10 = guv = (-r10 + 128) >> 8
  mov         r10, r10, asr #8        @

should be replaced with something like this

  mov         r8, r8, asl #2          @ r8 = bu = (Cb*128 + ROUNDOFFB) >> 9
  sub         r8, r8, #541            @
  mov         r8, r8, asr #4          @
  sub         r9, r9, #13856          @ r9 = rv = (r9 + ROUNDOFFR) >> 9
  mov         r9, r9, asr #9          @
  rsb         r10, r10, #8544         @ r10 = guv = (-r10 + ROUNDOFFG) >> 8
  mov         r10, r10, asr #8        @

no guarantee, can't test it here…

MikeS commented on 2007-10-15 22:07

Immediate constants on ARM can be no more than 8 bits wide, rotated by an even number of bits. A one's complement move (mvn) is possible as well.

mov r0, #0xff : r0 = 255
or
mvn r0, #0xff : r0 = -256 (identity: -x = ~x + 1)
mvn r0, #0 : r0 = -1

mov r0, #0x1fe : not legal
mov r0, #0x3fc : ok
mov r0, #0xc000003f : ok
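The rule above can be sketched as a small C check. This is only an illustration under the assumption of classic ARM-mode data-processing immediates (an 8-bit value rotated right by an even amount). The function names are hypothetical and not part of Rockbox.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Rotate a 32-bit value left by n bits (0 <= n < 32). */
static uint32_t rotl32(uint32_t v, unsigned n)
{
    return n ? (v << n) | (v >> (32 - n)) : v;
}

/* True if v is an 8-bit value rotated right by an even amount, i.e. it
 * can be written directly as "#v" in mov/add/sub/rsb. */
static bool arm_imm_ok(uint32_t v)
{
    for (unsigned rot = 0; rot < 32; rot += 2)
        if (rotl32(v, rot) <= 0xffu)   /* undoing the rotation leaves 8 bits */
            return true;
    return false;
}

/* True if v can be loaded with mvn, i.e. its complement is encodable. */
static bool arm_mvn_ok(uint32_t v)
{
    return arm_imm_ok(~v);
}

/* The examples from this comment, plus the offsets proposed earlier. */
void check_arm_immediates(void)
{
    assert(arm_imm_ok(0xffu));
    assert(!arm_imm_ok(0x1feu));        /* 8 bits wide, but odd rotation */
    assert(arm_imm_ok(0x3fcu));
    assert(arm_imm_ok(0xc000003fu));    /* 0xff rotated right by 2 */
    assert(arm_mvn_ok(0xffffff00u));    /* -256 via mvn r0, #0xff */
    assert(!arm_imm_ok(541u));
    assert(!arm_imm_ok(13856u));
    assert(!arm_imm_ok(8544u));
}
```

By this check the offsets 541, 13856 and 8544 from the proposed replacement code each span more than 8 bits, so they cannot be used as single immediates.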

MikeS commented on 2007-10-15 22:12

I forgot to mention that you _can_ load a value like 0x123456 but it's done like this:

ldr r0, =0x123456

Which is fine but this requires loading the constant from a memory location so to add 0x123456 to something you'd do:
ldr r0, =0x123456 : this comes from a "constant pool" somewhere near the function
add r1, r1, r0
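Besides the constant pool, a wide offset can also be applied as two or more add/sub instructions, each with an encodable immediate. The C sketch below is not from this thread: arm_split_imm is a hypothetical helper that greedily cuts a value into chunks of 8 bits starting at an even bit position. It ignores rotations that wrap around bit 31, so it does not always find the shortest split.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: split v into chunks that each fit the "8 bits at an
 * even bit position" immediate form. Writes the chunks to out (lowest
 * first) and returns how many were written; never more than 4. */
static size_t arm_split_imm(uint32_t v, uint32_t out[4])
{
    size_t n = 0;
    while (v != 0) {
        unsigned p = 0;
        while (((v >> p) & 1u) == 0)    /* find the lowest set bit */
            p++;
        p &= ~1u;                       /* round down to an even position */
        uint32_t chunk = v & ((uint32_t)0xff << p);
        out[n++] = chunk;
        v &= ~chunk;                    /* drop the bits just emitted */
    }
    return n;
}

/* The three offsets from the proposed replacement code. */
void check_arm_split(void)
{
    uint32_t out[4];

    assert(arm_split_imm(13856u, out) == 2);    /* 0x3620 */
    assert(out[0] == 0x620u && out[1] == 0x3000u);

    assert(arm_split_imm(8544u, out) == 2);     /* 0x2160 */
    assert(out[0] == 0x160u && out[1] == 0x2000u);

    assert(arm_split_imm(541u, out) == 2);      /* 0x21d */
    assert(out[0] == 0x1du && out[1] == 0x200u);
}
```

Under this assumption, subtracting 13856 would take two instructions (#0x620 and #0x3000) instead of one load from the constant pool plus one subtract. Which variant is faster on a given target would need measuring.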

So, this one should be ok then.

Works for Nano, 2 FPS gain average for one file.

There are immediate plans (by amiconn) to port these YUV blit functions to assembler, so don't spend too much time on optimising the C versions, btw :)

preglow: heard about this in IRC yesterday – after working on my patch :/

Committed the Nano and Video ones, can't test the rest. Will close this when the ASM versions are committed.

New version of the patch with assembler optimization for iPod Video. All other targets were dropped, as there seems to be an assembler optimization under development for them right now.

YUV-blit performance increases from 7.2 fps to 8.5 fps (30MHz) for fullscreen on the iPod Video.

I could not add the chroma_buf, as I get an internal compiler error as soon as I start working on this… Maybe someone can solve this?

New version with chroma-buffer :o)

With this patch the following speed is reached:

fps YUV (full/quarter screen):
8.7 / 25.2 (30MHz), 19.1 / 42 (80MHz)

fps MPEGplayer (elephants dream):
resolution  fps
128x96      47
128x128     41
160x96      43
160x128     37
176x96      41
176x128     35
224x128     30
224x176     24
320x176     17
320x240     13
