Rockbox

Tasklist

FS#12325 - e200v1 screen corruption after USB connection since r30475

Attached to Project: Rockbox
Opened by Michael Chicoine (mc2739) - Saturday, 08 October 2011, 23:56 GMT
Last edited by Thomas Martitz (kugel.) - Friday, 28 October 2011, 16:55 GMT
Task Type Bugs
Category Themes
Status Closed
Assigned To No-one
Operating System Sansa e200
Severity High
Priority Urgent
Reported Version Daily build (which?)
Due in Version Next release
Due Date Undecided
Percent Complete 100%
Votes 0
Private No

Details

Since r30475, I am seeing screen corruption of the statusbar and/or backdrop on e200v1.

Reproduction procedure:

1. From power off, connect device to computer
2. Safely remove device and disconnect USB cable
3. If corruption does not happen immediately on USB disconnect, press a button or rotate the wheel
4. Attempting to enter the View buflib allocs debug screen at this point causes the device to hardlock

The corruption will also occur after connecting USB after device is already powered on, but it happens less often.

r30471 does not have this problem. r30475 up to current svn will fail using this procedure.
This task depends upon

Closed by  Thomas Martitz (kugel.)
Friday, 28 October 2011, 16:55 GMT
Reason for closing:  Fixed
Additional comments about closing:  r30845
Comment by Michael Chicoine (mc2739) - Saturday, 15 October 2011, 15:46 GMT
Here is a screendump of the screen corruption I am seeing.
Comment by Richard Brittain (rbbrittain) - Sunday, 16 October 2011, 21:36 GMT
With r30764 and theme ipodosha-v1.4 ( http://themes.rockbox.org/index.php?themeid=1416&target=sansae200 ), I get "Data abort at 000719B4 (0)" after step 2; this happens with both Vista host and Ubuntu 11.10 guest running in VirtualBox (VM configured to capture Rockbox USB). However, if new firmware is loaded during the USB session, the restart screen appears as usual; it restarts normally.

If theme is changed to cabbiev2, problem is reproduced exactly as Michael described (i.e., corruption occurs when wheel turned, locks up when "View buflib allocs" selected).

The rockbox.map file from a compile of r30764 shows the following at address 000719B4:
.text 0x00071820 0x300 /home/ubuntu/rockbox/build/firmware/libfirmware.a(font_cache.o)
0x0007182c search
0x000718e8 font_cache_get
0x00071a84 font_cache_create
Comment by Richard Brittain (rbbrittain) - Sunday, 16 October 2011, 22:41 GMT
The bug is almost certainly in either r30474 or more likely r30475; the other two patches in Michael's gap don't involve fonts. Before this bug font_cache.c hadn't been changed since 2008, so I'm sure its change in r30763 isn't an issue (even though rockbox.map points to that file).
Comment by Richard Brittain (rbbrittain) - Sunday, 16 October 2011, 22:46 GMT
Upon further review, the bug *must* be in r30475, the buflib-for-backdrops patch. (The other three patches all involved website code, i.e., build table and IRC viewer.)
Comment by Richard Brittain (rbbrittain) - Sunday, 16 October 2011, 22:58 GMT
I should note that I do *NOT* get this issue with ipodosha-v1.4 if the device is powered on before USB connection. (Haven't tested that with cabbiev2.)

Since r30475 is rather complex *and* both files have been changed since then for further buflib implementation, it can't be fixed by reverting r30475. A skin or font expert (like JD or Freddy) will need to look at it more closely.
Comment by Andree Buschmann (Buschel) - Monday, 17 October 2011, 04:59 GMT
Maybe connected to  FS#12310 .
Comment by Andree Buschmann (Buschel) - Friday, 21 October 2011, 10:32 GMT
This is not connected  FS#12310 .

Please re-test against latest svn r30812.
Comment by Richard Brittain (rbbrittain) - Friday, 21 October 2011, 10:51 GMT
With r30812, I get "Data abort at 0000C200 (0)" after safely detaching with ipodosha-v1.4. (With r30811 it was "Undefined instruction", but in a similar binary range.) With cabbiev2, screen corruption continues as before.
Comment by Andree Buschmann (Buschel) - Friday, 21 October 2011, 11:36 GMT
What happens if you just leave buflib_move_callback() with "return BUFLIB_CB_CANNOT_MOVE;"?
Comment by Richard Brittain (rbbrittain) - Friday, 21 October 2011, 13:15 GMT
I think you just went over my head, Buschel. I'm not sure what part of the code you're referring to; I'll still fairly new at this.
Comment by Andree Buschmann (Buschel) - Friday, 21 October 2011, 13:53 GMT
/trunk/apps/gui/skin_engine/skin_backdrops.c, function buflib_move_callback(). Just place "return BUFLIB_CB_CANNOT_MOVE;" as the first line of code into buflib_move_callback().
Comment by Michael Chicoine (mc2739) - Friday, 21 October 2011, 14:03 GMT
Buschel,

That does not help - I still get screen corruption after that change.
Comment by Thomas Martitz (kugel.) - Tuesday, 25 October 2011, 16:56 GMT
I'm unable to reproduce with SVN as of today.
Comment by Richard Brittain (rbbrittain) - Tuesday, 25 October 2011, 17:32 GMT
With r30835 I still get screen corruption with cabbiev2 after moving the scroll wheel. With ipodosha-v1.4, it locks up with "SWI at 0000C1D0 (0)" just after the main menu appears.

Remember, the issue only occurs reliably if the Sansa's power is *OFF* when you insert the USB cable (i.e., Sansa connector). Michael claims to have seen it sometimes with power on, but I've only seen it with power off.
Comment by Michael Chicoine (mc2739) - Wednesday, 26 October 2011, 01:00 GMT
kugel.,

Recent builds do not fail exactly the same as they did when this was first reported. I have an e260 that I use for testing (broken headphone jack) and it will no longer fail on r30835. This device failed consistently on earlier builds. My e280 still fails consistently on r30835. The main difference between the two devices is that the storage on the e260 is nearly empty and the storage on the e280 is approximately 80% full (about 2000 total files).
Comment by Michael Chicoine (mc2739) - Wednesday, 26 October 2011, 01:38 GMT
Further testing shows that this bug does not occur when dircache is disabled.
Comment by Thomas Martitz (kugel.) - Wednesday, 26 October 2011, 10:56 GMT
I hoped for a second that r30812 fixed it. But I could repro this one in the meanwhile :(
Comment by Thomas Martitz (kugel.) - Thursday, 27 October 2011, 06:53 GMT
This patch changes behavior on the sim so that all threads need to ACK the usb connection, like on target. (press U to enter simulated usb mode).

Hopefully that makes reproducing this in the sim possible. I haven't managed yet, though. Maybe you have more luck.
Comment by Michael Chicoine (mc2739) - Thursday, 27 October 2011, 16:05 GMT
kugel.,

I was not able to get the sim to fail using the usb-sim.patch

Also, using r30844 with default settings, I have found these additional quirks:

1. clear the backdrop (Settings->Theme Settings->Clear Backdrop) and follow test procedure - it does not fail
2. clear the backdrop and follow test procedure, then go to Files->.rockbox->backdrops and highlight cabbiev2.bmp, access context menu and select Set As Backdrop - device hard locks
3. after clean power-up with no backdrop, go to Files->.rockbox->backdrops and highlight cabbiev2.bmp, access context menu and select Set As Backdrop - the backdrop is not displayed, but after power cycle it is displayed
4. I have not been able to reproduce the problem by connecting USB after device is powered up. I'm wondering now if I ever saw the corruption when connecting after power-up, or if some other change has helped in this situation
Comment by Richard Brittain (rbbrittain) - Thursday, 27 October 2011, 16:46 GMT
Using r30835 with ipodosha-v1.4 backdrop (ipodosha-v1.0.bmp) cleared, #1 still fails with same error message as before ("SWI at 0000C1D0 (0)"); thus I can't get to #2. With cabbiev2, #1 and #2 reproduce exactly as Michael described.

#3 reproduces as Michael described with both skins; in addition, it reproduces if you clear the backdrop & then try to set it again manually without powering down. (That's true whether the backdrop matches the skin or not.) However, if you clear the backdrop and then select a theme (same or different), the theme's backdrop loads properly whether you power-cycle or not.

Of course, #4 has been my experience all along; for me the issue *only* appeared when my e280 was powered up by USB connection.
Comment by Richard Brittain (rbbrittain) - Thursday, 27 October 2011, 16:49 GMT
Clarification: With the ipodosha-v1.4 skin *and* its backdrop cleared, #1 still fails with same error message.
Comment by Michael Chicoine (mc2739) - Thursday, 27 October 2011, 18:09 GMT
I finally did get the win32 sim to fail with screen corruption. This happened after increasing the number of files on the simdisk.

The failure happened after the second simulated USB connection after starting the sim. Here is the console log at the time of the failure:

USB inserted. Waiting for 4632022 acks...
usb: got ack, 4632021 to go...
usb: got ack, 4632020 to go...
usb: got ack, 4632019 to go...
usb: got ack, 4632018 to go...
usb: got ack, 4632017 to go...
usb: got ack, 4632016 to go...
usb: got ack, 4632015 to go...
USB inserted. Waiting for 6 acks...
usb: got ack, 5 to go...
usb: got ack, 4 to go...
usb: got ack, 3 to go...
usb: got ack, 2 to go...
usb: got ack, 1 to go...
All threads have acknowledged the connect.
usb: got ack, 0 to go...

Subsequent tests would fail after various connection attempts, I did get a failure on the first connection, but sometimes up to 5 connections were needed to cause the failure.

Here is some info of the test setup:

1. win32 e200 sim r30844 running on win7 32 bit
2. There is a symlink to my music library in the symdisk folder - library is approx. 21.2GB, 5822 files, 1033 folders

After I disabled dircache, I was not able to reproduce the failure.
Comment by Fred Bauer (freddyb) - Friday, 28 October 2011, 04:30 GMT
Do fd_bindings[] or append_position need to have their pointers updated in dircache.c:move_callback()?
Comment by Thomas Martitz (kugel.) - Friday, 28 October 2011, 06:32 GMT
This should fix the screen corruption.

I was able to reproduce on the sim using my above patch. The key is to be really fast pressung U, since dircache scan is rather quickly done on a PC (with my SSD even more so).

The problem happens when the USB insertion interrupt the initial dircache scan. The scan is aborted, and the buffer is freed. However, allocated_size isn't set to 0, which makes the re-start of the scan take the first code path in dircache_build() even though no allocation exists.
Comment by Michael Chicoine (mc2739) - Friday, 28 October 2011, 11:38 GMT
kugel.,

dircache.c.vc.diff does seem to correct the screen corruption problems.

The problem with setting a backdrop and it not being displayed until reboot still occurs, but is probably not related - I'll open a new bug report for this.
Comment by Fred Bauer (freddyb) - Friday, 28 October 2011, 14:39 GMT
Here is some debugging code that shows fd_bindings[] and append_position point inside the buflib_allocation.
Comment by Richard Brittain (rbbrittain) - Friday, 28 October 2011, 15:14 GMT
Kugel & Freddy, it seems your patches change completely different parts of dircache.c. Are they compatible with each other? Should both be applied?
Comment by Richard Brittain (rbbrittain) - Friday, 28 October 2011, 16:03 GMT
Michael's patch (alone) worked as he said. Freddy's patch (alone) produced various warnings & errors:

/home/ubuntu/rockbox/firmware/common/dircache.c: In function ‘move_callback’:
/home/ubuntu/rockbox/firmware/common/dircache.c:153: warning: implicit declaration of function ‘_logf’
* * *
/home/ubuntu/rockbox/build/firmware/libfirmware.a(dircache.o): In function `move_callback':
dircache.c:(.text+0x15f0): undefined reference to `_logf'
dircache.c:(.text+0x15fc): undefined reference to `_logf'
dircache.c:(.text+0x1608): undefined reference to `_logf'
dircache.c:(.text+0x1614): undefined reference to `_logf'
dircache.c:(.text+0x1624): undefined reference to `_logf'
/home/ubuntu/rockbox/build/firmware/libfirmware.a(dircache.o):dircache.c:(.text+0x1704): more undefined references to `_logf' follow
collect2: ld returned 1 exit status
make: *** [/home/ubuntu/rockbox/build/rockbox.elf] Error 1

Comment by Fred Bauer (freddyb) - Friday, 28 October 2011, 16:05 GMT
Richard, my patch is for debugging not production but you should be able to add them both if you want to. You might have to reconfigure with logf enabled to be able to check the log file and there will only be output if the dircache memory gets moved. It does update some extra pointers so maybe it will fix something.
Comment by Richard Brittain (rbbrittain) - Friday, 28 October 2011, 16:19 GMT
I think I'm sticking with Kugel's patch for now; it seems to resolve all but one minor issue--i.e., new backdrops set in the filetree don't load until reboot. (This happens whether you clear the backdrop first or not.) I don't think that's a showstopper; backdrops in new themes load just fine.

IMO Kugel's patch should be committed for 3.10; the remaining issue should continue in another bug report (as Michael suggested) for more work after 3.10.
Comment by Fred Bauer (freddyb) - Friday, 28 October 2011, 16:50 GMT
Ok. If you get curious here's a patch without the logf stuff so you can try without reconfiguring.
Comment by Thomas Martitz (kugel.) - Friday, 28 October 2011, 16:55 GMT
I'll close this as the bug is fixed. Fred, please open a new report or let's discuss on ML/irc (preferably before the freeze ends :) )

Loading...