FS#8075 - 5G LCD speed up

Attached to Project: Rockbox
Opened by Andree Buschmann (Buschel) - Friday, 02 November 2007, 21:44 GMT
Last edited by Jens Arnold (amiconn) - Monday, 26 November 2007, 23:48 GMT
Task Type Patches
Category LCD
Status Closed
Assigned To No-one
Operating System iPod 5G
Severity Low
Priority Normal
Reported Version Daily build (which?)
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 0
Private No


As I am not sure whether my last patch was closed by accident, here is the current update of the LCD optimization for LCD/YUV, based on amiconn's rework.

- include outer loop (--height) into asm routine for lcd_update_rect()
- include two line updates into a single asm routine for lcd_yuv_blit()
- use 32-bit access for the chroma buffer; this needs more space but is a lot faster because stm/ldm can be used

Results (@80MHz):
LCD 1/1 screen: 101fps (same)
LCD 1/4 screen: 399fps (+2%)
YUV 1/1 screen: 28.5fps (+4%)
YUV 1/4 screen: 112.5fps (+4%)

Closed by  Jens Arnold (amiconn)
Monday, 26 November 2007, 23:48 GMT
Reason for closing:  Accepted
Comment by Andree Buschmann (Buschel) - Saturday, 03 November 2007, 07:50 GMT
Needed to correct one issue which could cause a crash for full-width videos (but not in test_fps, which is kind of strange).

- calculate the correct length of the chroma buffer (width/2 * 3, not width*3). Btw, the buffer length is also wrong in trunk.

This one was tested on all available resolutions of Elephants Dream as well as via test_fps.
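
The length correction above can be sketched in C. This is only an illustration, not code from the patch: the helper names are hypothetical, and it assumes the factor 3 stands for three chroma-derived values kept per horizontal pixel pair, each stored as a 32-bit word as described in the first post.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: three chroma-derived values are kept per horizontal pair
 * of pixels (one chroma sample covers two pixels in a row). */
#define CHROMA_TERMS_PER_PAIR 3

/* Hypothetical helper: number of chroma buffer entries needed for a line
 * of `width` pixels (width assumed even): width/2 * 3, not width * 3. */
static size_t chroma_buf_entries(size_t width)
{
    return (width / 2) * CHROMA_TERMS_PER_PAIR;
}

/* Size in bytes when every entry is stored as a 32-bit word. */
static size_t chroma_buf_bytes(size_t width)
{
    return chroma_buf_entries(width) * sizeof(uint32_t);
}
```

For a 320-pixel-wide line this gives 480 entries, where width*3 would give 960.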
Comment by Andree Buschmann (Buschel) - Saturday, 03 November 2007, 10:14 GMT
Just the next speed-up for lcd_yuv_blit().

- pack the 2 pixels of each loop iteration into one word and write them with a single str
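
A minimal C sketch of the packing idea, for readers who do not want to dig through the asm. The function names and the destination pointer are hypothetical, and the halfword order (first pixel in the low half) is an assumption; the real order depends on how the LCD expects the data.

```c
#include <stdint.h>

/* Pack two 16-bit pixels into one 32-bit word.
 * Assumption: the pixel drawn first goes into the low halfword, which is
 * where a little-endian 32-bit store puts the lower address. */
static inline uint32_t pack_pixels(uint16_t first, uint16_t second)
{
    return (uint32_t)first | ((uint32_t)second << 16);
}

/* Write `count` pixels (count assumed even) as count/2 word stores
 * instead of count halfword stores. `dst` is a hypothetical, 32-bit
 * aligned destination. */
static void write_packed(volatile uint32_t *dst, const uint16_t *src,
                         unsigned count)
{
    for (unsigned i = 0; i < count; i += 2)
        *dst++ = pack_pixels(src[i], src[i + 1]);
}
```

The gain comes from halving the number of store instructions per pixel pair.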

Results (@80MHz) vs. trunk:
YUV 1/1 screen: 29.5fps (+8%)
YUV 1/4 screen: 117fps (+8%)
Comment by Andree Buschmann (Buschel) - Monday, 19 November 2007, 12:07 GMT
Just a new idea: as we now have the ability to set the destination address, we could drop the chroma buffer entirely and write all 4 pixels: 2 pixels for the first line, then update the destination address and do the 2 pixels for the next line. When doing so there is no need to save the chroma bytes anymore. I'll try to change the code :o)
Comment by Andree Buschmann (Buschel) - Wednesday, 21 November 2007, 07:12 GMT
Dropped this idea. We would save 1x stmia (3*2+1) and 1x ldmia (3*2+1) per 4 pixels, but instead we would need to add at least 2x str + 2x ldrh + several movs for setting the LCD registers, as there are not enough ARM registers left...
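
For context, the 2x2 sharing behind both the chroma buffer and the dropped idea can be sketched in C: one Cb/Cr pair serves four luma samples, so the chroma terms are computed once and reused for the second line. This is a sketch only. It uses a commonly quoted BT.601 integer approximation, which is not necessarily the formula in the patch, and all names are hypothetical.

```c
#include <stdint.h>

/* Round a value scaled by 256 and clamp it to 0..255. */
static uint8_t clamp8(int32_t v)
{
    v += 128;                 /* rounding */
    if (v < 0) return 0;      /* clamp before shifting a negative value */
    v >>= 8;
    return v > 255 ? 255 : (uint8_t)v;
}

/* Pack 8-bit R, G, B into one RGB565 pixel. */
static uint16_t rgb565(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

/* Convert one 2x2 block: y[0], y[1] are the upper line, y[2], y[3] the
 * lower line. The three chroma terms are computed once for all four. */
static void convert_block_2x2(const uint8_t y[4], uint8_t cb, uint8_t cr,
                              uint16_t out[4])
{
    int32_t u = cb - 128, v = cr - 128;
    int32_t rv  = 409 * v;              /* red term   */
    int32_t guv = -100 * u - 208 * v;   /* green term */
    int32_t bu  = 516 * u;              /* blue term  */

    for (int i = 0; i < 4; i++) {
        int32_t c = 298 * (y[i] - 16);
        out[i] = rgb565(clamp8(c + rv), clamp8(c + guv), clamp8(c + bu));
    }
}
```

In C the three terms simply stay in local variables; in the asm they compete with the luma, pixel and LCD values for the ARM registers, which is the shortage described above.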
Comment by Andree Buschmann (Buschel) - Monday, 26 November 2007, 16:37 GMT
Put some more details in the comments -- especially corrected the YUV-conversion formula.
Comment by Andree Buschmann (Buschel) - Monday, 26 November 2007, 20:45 GMT
Removed the changes for lcd_update_rect() and kept the changes for lcd_yuv_blit(), as discussed on IRC.

Speed of YUV blit with this patch is about +8% vs. svn:
30MHz -> 1/1: 11.0fps / 1/4: 43.5fps
80MHz -> 1/1: 29.5fps / 1/4: 117.0fps