FS#7951 - Speed optimization for yuv-conversion
Attached to Project:
Rockbox
Opened by Andree Buschmann (Buschel) - Sunday, 14 October 2007, 19:45 GMT+2
Last edited by Jens Arnold (amiconn) - Monday, 22 October 2007, 02:39 GMT+2
Details
This patch speeds up the lcd_yuv_blit function a bit for iPod Video and should speed it up a lot for iPod nano, iPod color and iriver H10.

For the iPod Video two changes were made:
- use ICODE_ATTR for the lcd_yuv_blit() function
- simplify the boundary check for the red/green/blue components a bit

Result for iPod Video via test_fps:
6.7 -> 7.2 fps (full screen, 30MHz)
20.7 -> 21.8 fps (quarter screen, 30MHz)

For all other players the optimized yuv-conversion was ported from iPod Video to the respective LCD drivers. This still needs to be tested, though, as I do not have any of these players. Please post your experiences/results here. Thanks :)
Closed by Jens Arnold (amiconn)
Monday, 22 October 2007, 02:39 GMT+2
Reason for closing: Accepted
Additional comments about closing: Adapted and committed to SVN.
G = ( 74 * (Y' - 16) - 24 * (Cb - 128) - 51 * (Cr - 128) + 128 ) >> 8
to
G = ( 74 * Y' - 24 * Cb - 51 * Cr + ROUNDOFFG ) >> 8, where ROUNDOFFG = -74*16 + 24*128 + 51*128 + 128 = 8544 is a constant
The asm routines currently apply offsets to Y', Cb and Cr, and then again before shifting. This can be reduced to one single offset before shifting for each of the components R, G and B.
The following lines become obsolete:
sub r7, r7, #16 @ r7 = Y = (Y' - 16)*74
sub r8, r8, #128 @ Cb -= 128
sub r9, r9, #128 @ Cr -= 128
and the lines
add r8, r8, #2 @ r8 = bu = (Cb*128 + 128) >> 8 -> wrong comment, should be: r8 = bu = (Cb*128 + 256) >> 9
mov r8, r8, asr #2 @
add r9, r9, #256 @ r9 = rv = (r9 + 256) >> 9
mov r9, r9, asr #9 @
rsb r10, r10, #128 @ r10 = guv = (-r10 + 128) >> 8
mov r10, r10, asr #8 @
should be replaced with something like this:
mov r8, r8, asl #2 @ r8 = bu = (Cb*128 + ROUNDOFFB) >> 9
sub r8, r8, #541 @
mov r8, r8, asr #4 @
sub r9, r9, #13856 @ r9 = rv = (r9 + ROUNDOFFR) >> 9
mov r9, r9, asr #9 @
rsb r10, r10, #8544 @ r10 = guv = (-r10 + ROUNDOFFG) >> 8
mov r10, r10, asr #8 @
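
To double-check the offset folding for the green component, here is a minimal C sketch that compares both forms of the G equation over all 8-bit inputs. The function names are made up for illustration and are not part of the Rockbox sources; only the coefficients and ROUNDOFFG are taken from the equations above.

```c
#include <stdint.h>

/* ROUNDOFFG as given above: -74*16 + 24*128 + 51*128 + 128 = 8544 */
#define ROUNDOFFG (-74 * 16 + 24 * 128 + 51 * 128 + 128)

/* Original form: offsets are applied to Y', Cb and Cr individually,
 * plus the rounding constant before the shift. */
static int32_t green_per_input_offsets(int32_t y, int32_t cb, int32_t cr)
{
    return (74 * (y - 16) - 24 * (cb - 128) - 51 * (cr - 128) + 128) >> 8;
}

/* Folded form: one combined constant is added before the shift. */
static int32_t green_single_offset(int32_t y, int32_t cb, int32_t cr)
{
    return (74 * y - 24 * cb - 51 * cr + ROUNDOFFG) >> 8;
}

/* Counts the 8-bit (Y', Cb, Cr) triples for which the two forms differ.
 * The sums can be negative, and >> on a negative int is
 * implementation-defined in C, but both forms shift the very same sum,
 * so the comparison does not depend on that behaviour. */
static long count_green_mismatches(void)
{
    long mismatches = 0;
    for (int32_t y = 0; y < 256; y++)
        for (int32_t cb = 0; cb < 256; cb++)
            for (int32_t cr = 0; cr < 256; cr++)
                if (green_per_input_offsets(y, cb, cr) !=
                    green_single_offset(y, cb, cr))
                    mismatches++;
    return mismatches;
}
```

Calling count_green_mismatches() should return 0, since the folded form is just the algebraic expansion of the original one. The same check could be repeated for R and B once their constants are written out.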
mov r0, #0xff : r0 = 255
or
mvn r0, #0xff : r0 = -256 (identity: -x = ~x + 1)
mvn r0, #0 : r0 = -1
mov r0, #0x1fe : not legal
mov r0, #0x3fc : ok
mov r0, #0xc000003f : ok (0xff rotated right by 2)
ldr r0, =0x123456
This is fine, but it requires loading the constant from a memory location, so to add 0x123456 to something you'd do:
ldr r0, =0x123456 : this comes from a "constant pool" somewhere near the function
add r1, r1, r0
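
The legal and illegal examples above follow from the classic ARM-mode immediate encoding: an 8-bit value rotated right by an even number of bits. As a rough sketch (the helper names are hypothetical, and this does not cover Thumb-2 encodings), a constant can be checked in C like this:

```c
#include <stdbool.h>
#include <stdint.h>

/* Rotate a 32-bit value left by n bits (0 <= n < 32). */
static uint32_t rotl32(uint32_t v, unsigned n)
{
    return n == 0 ? v : (v << n) | (v >> (32 - n));
}

/* True if 'value' can be written as an ARM data-processing immediate,
 * i.e. an 8-bit constant rotated right by an even amount. */
static bool arm_imm_encodable(uint32_t value)
{
    for (unsigned rot = 0; rot < 32; rot += 2) {
        /* Rotating left by 'rot' undoes a rotate-right by 'rot';
         * if what remains fits into 8 bits, the encoding exists. */
        if (rotl32(value, rot) <= 0xffu)
            return true;
    }
    return false;
}

/* True if 'value' can be loaded with a single mov or mvn.
 * mvn loads the bitwise inverse of its immediate. */
static bool arm_mov_or_mvn_encodable(uint32_t value)
{
    return arm_imm_encodable(value) || arm_imm_encodable(~value);
}
```

With this check, 0xff, 0x3fc and 0xc000003f are encodable, while 0x1fe and 0x123456 are not. The constants 541, 8544 and 13856 from the proposed replacement code are not encodable either, so they would need a constant-pool load or a split into several instructions.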
YUV-blit performance increases from 7.2 fps to 8.5 fps (30MHz) for fullscreen on the iPod Video.
I could not add the chroma_buf, as I hit an internal compiler error when I started working on this... Maybe someone can solve this?
With this patch the following speed is reached:
fps YUV (full/quarter screen):
8.7 / 25.2 (30MHz), 19.1 / 42 (80MHz)
fps MPEGplayer (elephants dream):
128x96  128x128  160x96  160x128  176x96  176x128  224x128  224x176  320x176  320x240
47      41       43      37       41      35       30       24       17       13