Rockbox

  • Status Closed
  • Percent Complete
    100%
  • Task Type Patches
  • Category LCD
  • Assigned To No-one
  • Operating System PortalPlayer-based
  • Severity Low
  • Priority Very Low
  • Reported Version Daily build (which?)
  • Due in Version Undecided
  • Due Date Undecided
  • Votes
  • Private
Attached to Project: Rockbox
Opened by Andree Buschmann - 2007-10-14
Last edited by Jens Arnold - 2007-10-22

FS#7951 - Speed optimization for yuv-conversion

This patch speeds up the lcd_yuv_blit function for iPod Video a bit and should speed it up a lot for iPod nano, iPod color and iriver H10.

For the iPod Video two changes were made:
- use ICODE_ATTR for lcd_yuv_blit()-function
- simplify the boundary check for red/green/blue components a bit
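
For readers unfamiliar with this kind of change, here is a minimal sketch of one common way to cheapen a per-pixel boundary check. It is not taken from the patch: the names `CHANNEL_MAX`, `clamp_channel` and `clamp_rgb` are hypothetical, and an 8-bit channel range is assumed purely for illustration. The idea is to OR the three components together and test once, so the common in-range case costs a single comparison.

```c
/* Hypothetical channel range for illustration; must be of the form 2^n - 1. */
#define CHANNEL_MAX 255

/* Plain per-component clamp to [0, CHANNEL_MAX]. */
static int clamp_channel(int v)
{
    if (v < 0)
        return 0;
    if (v > CHANNEL_MAX)
        return CHANNEL_MAX;
    return v;
}

/* Clamp r, g and b in place.
 * If any component is negative, the OR is negative and becomes a huge
 * unsigned value; if any component exceeds CHANNEL_MAX, the OR has a bit
 * set above the channel range. Either way the single test catches it. */
static void clamp_rgb(int *r, int *g, int *b)
{
    if ((unsigned)(*r | *g | *b) > (unsigned)CHANNEL_MAX) {
        *r = clamp_channel(*r);
        *g = clamp_channel(*g);
        *b = clamp_channel(*b);
    }
}
```

Calling `clamp_rgb(&r, &g, &b)` with r = -5, g = 100, b = 300 leaves g untouched and clamps r and b to the ends of the range.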

Result for iPod Video via test_fps:
6.7 → 7.2 fps (full screen, 30MHz)
20.7 → 21.8 fps (quarter screen, 30MHz)

For all other players the optimized yuv-conversion was ported from iPod Video to the respective LCD drivers. This still needs to be tested, though, as I do not have any of these players.

Please post your experiences/results here.

Thanks :)

Closed by  Jens Arnold
2007-10-22 00:39
Reason for closing:  Accepted
Additional comments about closing:  

Adapted and committed to SVN.

Andree Buschmann commented on 2007-10-15 19:54

Just to mention one thing: the asm routines for the yuv-conversion that were written for sansa/gigabeat could also be optimized further. As I cannot test on such a target, I won't post any patches for this. But if you take a look at the patches above, you will see that it's possible to combine nearly all of the offsets into one constant, changing e.g.

G = ( 74 * (Y' - 16) - 24 * (Cb - 128) - 51 * (Cr - 128) + 128 ) >> 8

to

G = ( 74 * Y' - 24 * Cb - 51 * Cr + ROUNDOFFG ) >> 8, where ROUNDOFFG = -74*16 + 24*128 + 51*128 + 128 = const

The asm routines currently apply separate offsets to Y', Cb and Cr, plus a rounding offset before shifting. This can be reduced to a single offset before shifting for each of the components R, G and B.
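
As a quick sanity check of the green-channel formulas above, the following C sketch compares the two numerators over all 8-bit inputs. The function names are made up for illustration; only the coefficients and `ROUNDOFFG` come from the comment. Since the numerators are identical, the final `>> 8` yields the same result in both forms.

```c
#include <stdint.h>

/* All offsets folded into one constant, as defined in the comment above. */
#define ROUNDOFFG (-74 * 16 + 24 * 128 + 51 * 128 + 128)

/* Green numerator with separate offsets on Y', Cb and Cr (first formula). */
static int32_t g_sum_separate(int y, int cb, int cr)
{
    return 74 * (y - 16) - 24 * (cb - 128) - 51 * (cr - 128) + 128;
}

/* Green numerator with the single folded offset (second formula). */
static int32_t g_sum_folded(int y, int cb, int cr)
{
    return 74 * y - 24 * cb - 51 * cr + ROUNDOFFG;
}

/* Returns 1 if both forms agree for every 8-bit Y', Cb, Cr, else 0. */
static int g_forms_agree(void)
{
    for (int y = 0; y < 256; y++)
        for (int cb = 0; cb < 256; cb++)
            for (int cr = 0; cr < 256; cr++)
                if (g_sum_separate(y, cb, cr) != g_sum_folded(y, cb, cr))
                    return 0;
    return 1;
}
```

`ROUNDOFFG` evaluates to 8544 (-1184 + 3072 + 6528 + 128).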

Andree Buschmann commented on 2007-10-15 20:39

In detail for lcd-as-c200.S

following lines are obsolete:

  sub         r7, r7, #16             @ r7 = Y = (Y' - 16)*74
  sub         r8, r8, #128            @ Cb -= 128
  sub         r9, r9, #128            @ Cr -= 128

and lines

  add         r8, r8, #2              @ r8 = bu = (Cb*128 + 128) >> 8 -> wrong comment, should be: r8 = bu = (Cb*128 + 256) >> 9
  mov         r8, r8, asr #2          @
  add         r9, r9, #256            @ r9 = rv = (r9 + 256) >> 9
  mov         r9, r9, asr #9          @
  rsb         r10, r10, #128          @ r10 = guv = (-r10 + 128) >> 8
  mov         r10, r10, asr #8        @

should be replaced with something like this

  mov         r8, r8, asl #2          @ r8 = bu = (Cb*128 + ROUNDOFFB) >> 9
  sub         r8, r8, #541            @
  mov         r8, r8, asr #4          @
  sub         r9, r9, #13856          @ r9 = rv = (r9 + ROUNDOFFR) >> 9
  mov         r9, r9, asr #9          @
  rsb         r10, r10, #8544         @ r10 = guv = (-r10 + ROUNDOFFG) >> 8
  mov         r10, r10, asr #8        @
Andree Buschmann commented on 2007-10-15 20:44

oops, double clicked.. sorry for any inconvenience :o)

Andree Buschmann commented on 2007-10-15 21:35

no guarantee, can't test it here…

Michael Sevakis commented on 2007-10-15 22:07

Immediate constants on ARM can be no more than 8 bits wide, rotated by an even number of bits. A one's complement move (mvn) is possible as well.

mov r0, #0xff : r0 = 255
or
mvn r0, #0xff : r0 = -256 (identity: -x = ~x + 1)
mvn r0, #0 : r0 = -1

mov r0, #0x1fe : not legal
mov r0, #0x3fc : ok
mov r0, #0xc000003f : ok
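
The rule described here can be sketched in C as follows. This is an illustration only, not part of any Rockbox tool; the function names are hypothetical. A value is accepted if some even left rotation of it fits in 8 bits, which is the inverse of "8-bit constant rotated right by an even amount". The `mvn` case is handled by testing the bitwise complement.

```c
#include <stdint.h>

/* Rotate a 32-bit value left by n bits (n taken modulo 32). */
static uint32_t rotl32(uint32_t v, unsigned n)
{
    n &= 31u;
    return n ? (v << n) | (v >> (32u - n)) : v;
}

/* Returns 1 if value is an 8-bit constant rotated right by an even
 * number of bits, i.e. usable directly as a mov/add/sub immediate. */
static int arm_imm_encodable(uint32_t value)
{
    for (unsigned rot = 0; rot < 32; rot += 2) {
        /* Undo a rotate-right by rot and see if 8 bits are enough. */
        if (rotl32(value, rot) <= 0xffu)
            return 1;
    }
    return 0;
}

/* Returns 1 if value can be loaded with a single mvn, i.e. its
 * one's complement is an encodable immediate. */
static int arm_mvn_encodable(uint32_t value)
{
    return arm_imm_encodable(~value);
}
```

By this check 0xff and 0x3fc are accepted and 0x1fe is rejected, matching the examples above. The offsets 541, 13856 and 8544 used in the suggested replacement lines earlier in this thread are also rejected, which is presumably the point of the comment.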

Michael Sevakis commented on 2007-10-15 22:12

I forgot to mention that you _can_ load a value like 0x123456 but it's done like this:

ldr r0, =0x123456

This is fine, but it requires loading the constant from a memory location, so to add 0x123456 to something you'd do:
ldr r0, =0x123456 : this comes from a "constant pool" somewhere near the function
add r1, r1, r0

Andree Buschmann commented on 2007-10-16 06:31

So, this one should be ok then.

Thom Johansen commented on 2007-10-16 11:17

Works for Nano, 2 FPS gain average for one file.

Thom Johansen commented on 2007-10-16 14:50

There are immediate plans (by amiconn) to port these YUV blit functions to assembler, so don't spend too much time on optimising the C versions, btw :)

Andree Buschmann commented on 2007-10-16 16:32

preglow: heard about this in IRC yesterday – after working on my patch :/

Thom Johansen commented on 2007-10-16 19:47

Committed the Nano and Video ones, can't test the rest. Will close this when the ASM versions are committed.

Andree Buschmann commented on 2007-10-21 00:52

New version of the patch with assembler optimization for iPod Video. All other targets were dropped, as there seems to be an assembler optimization under development right now.

YUV-blit performance increases from 7.2 fps → 8.5 fps (30MHz) for fullscreen on the iPod Video.

I could not add the chroma_buf, as I get an internal compiler error when I start working on this… Maybe someone can solve this?

Andree Buschmann commented on 2007-10-21 20:33

New version with chroma-buffer :o)

With this patch the following speed is reached:

fps YUV (full/quarter screen):
8.7 / 25.2 (30MHz), 19.1 / 42 (80MHz)

fps MPEGplayer (elephants dream):
128x96  128x128  160x96  160x128  176x96  176x128  224x128  224x176  320x176  320x240
47      41       43      37       41      35       30       24       17       13
