FS#11877 - Filesystem corruption after entering USB mode
Attached to Project: Rockbox
Opened by sideral (sideral) - Friday, 14 January 2011, 08:59 GMT+2
Last edited by sideral (sideral) - Saturday, 09 April 2011, 22:46 GMT+2
Details
Various users on IRC recently reported cases of filesystem corruption. I have seen them as well, especially when using USB mode (I'm using a Sansa ClipV2, USB-enabled through
One oddity I've noticed is that when my system has detected filesystem corruption, it is often enough to unplug the player from USB and then replug it, without repairing the filesystem and without rebooting Rockbox; on the next USB mount, the filesystem is clean. This might indicate that some buffer is flushed too late for the USB host to see. [EDIT: Removed previously attached patch, which is now tracked separately as FS#11883.]
We need another (more manual) way of ensuring that disk_unmount_all is called on all paths that might expose the disk to the outside world; I'll come up with a patch soon.
In the meantime, I've done a fair bit of debugging. It is true that some files still remain open when usb_slave_mode is entered, most notably the current font and the currently playing audio file. This should be fixed. However, these files are read-only and do not lead to dirty buffers in memory.
Instead, I've found a much scarier problem: Rockbox sometimes exposes a corrupt FAT on my ClipV2, in which some 4-KiB aligned bytes are replaced with the constant 0x7d. Rebooting to the OF (or sometimes just reconnecting the player) makes these phantom corruptions vanish. My current theory is that this is a caching problem that's triggered by the zero-copy optimization in the SD-card driver. I'm now running with the attached patch to see whether disabling this optimization might help.
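For anyone who wants to check for the same phantom corruption, here is a rough sketch of how two dumps of the FAT region could be compared: one known-good dump (for example taken from the OF) and one taken while Rockbox exposes the disk over USB. The function names and the way the dumps are obtained are my own assumptions, not part of any Rockbox tool.

```python
def diff_offsets(good, suspect):
    """Return the byte offsets at which the two dumps differ."""
    if len(good) != len(suspect):
        raise ValueError("dumps must have the same length")
    return [i for i, (a, b) in enumerate(zip(good, suspect)) if a != b]


def summarize(good, suspect, align=0x1000):
    """Summarize the differences: count, alignment, and the bad byte values."""
    offsets = diff_offsets(good, suspect)
    return {
        "count": len(offsets),
        # True if every differing offset sits on a 4-KiB boundary.
        "all_aligned": all(off % align == 0 for off in offsets),
        # Distinct byte values found at the differing offsets in the suspect dump.
        "values": sorted({suspect[off] for off in offsets}),
    }


if __name__ == "__main__":
    # Synthetic data only: a zeroed "FAT" with two bytes overwritten.
    good = bytes(0x3000)
    bad = bytearray(good)
    bad[0x1000] = 0x7D
    bad[0x2000] = 0x7D
    print(summarize(good, bytes(bad)))
```

With real dumps, the two byte strings would be read from files of equal size covering the same sector range.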
I've had another look at the FAT corruptions (of both the first and the second incident) that are exposed over USB, but aren't actually on the disk (unless written out as part of a FAT update). I have noticed some more patterns:
* The byte offsets on disk at which the 1-byte corruptions appear are always 4-KiB (0x1000) aligned, but only comprise hex offsets ending in *[01234567]000 (that is, there's no hex offset ending in *8000, for example).
* In both incidents, about 3000 of the two FATs' 14.7 MiB were damaged.
* The corruptions are single-byte only, and are always one of four byte values (in order of number of occurrences): 0x7d, 0x7e, 0xfd, and 0xfe.
* In each case of byte corruption, the preceding valid 20 bytes have a pair of either 0x7d,0x7e or 0xfd,0xfe bytes, spaced 4 bytes away.
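The patterns listed above can be checked mechanically against a dump. The following is only a sketch under my own assumptions: the function names are made up, and I read "spaced 4 bytes away" as meaning that the two bytes of the pair sit 4 positions apart from each other (in either order) somewhere in the 20 bytes before the corrupted byte.

```python
from collections import Counter

# Assumption: the pair may appear in either order.
PAIRS = {(0x7D, 0x7E), (0x7E, 0x7D), (0xFD, 0xFE), (0xFE, 0xFD)}


def has_spaced_pair(window, gap=4):
    """True if the window holds one of PAIRS with its bytes `gap` positions apart."""
    for i in range(len(window) - gap):
        if (window[i], window[i + gap]) in PAIRS:
            return True
    return False


def classify(image, offsets, lookback=20):
    """Count how many corrupted offsets match each observed pattern."""
    report = {
        "aligned_4k": 0,       # offset is a multiple of 0x1000
        "nibble_0_to_7": 0,    # fourth hex digit from the right is 0..7
        "values": Counter(),   # histogram of the corrupted byte values
        "pair_before": 0,      # spaced pair found in the preceding bytes
    }
    for off in offsets:
        if off % 0x1000 == 0:
            report["aligned_4k"] += 1
        if ((off >> 12) & 0xF) <= 7:
            report["nibble_0_to_7"] += 1
        report["values"][image[off]] += 1
        window = image[max(0, off - lookback):off]
        if has_spaced_pair(window):
            report["pair_before"] += 1
    return report
```

The `offsets` argument is the list of corrupted byte offsets, for example obtained by diffing the exposed dump against a known-good one.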
I've googled for the bad hex values, and apparently at least 0x7d and 0x7e are used as escape bytes in some part of low-level USB signaling, which I find quite suspicious. This leads me to suspect that the unstable USB support for my ClipV2 (enabled by way of FS#11664) is to blame.
I have verified that the SD driver reads correct data from the disk, so it's likely that the data was corrupted on its way out through USB.
Looking at the pattern of USB-controller resets in my logf output, I theorized that there is a race in the USB-reset path in the AMSv2 USB driver. I tried disabling the USB hard-reset patch (FS#11664), but I could still observe the data corruption. Then I tried using a cancel_all_transfers(true) in place of the controller reset, which seemed to work around the corruption issue.
Finally, jhMikeS and pamaury found and fixed a number of synchronization issues in the USB driver (r29129 & r29130). These may have been the root cause of the corruption issue, as well as other issues with the AMSv2 USB driver. I haven't seen any data corruption since adopting these changes. I'll test them some more over the next few days.
Really though, hopefully it's sound now.
While the transfers themselves seem to have been trouble-free for me, there are still some issues: it asks for a high-speed port (usually after the 3rd plug), and after replacing the main rockbox binary, the entire theme disappears, showing only the default "safe" theme, and it ends up in the main menu instead of showing the yes/no screen to restart.
This patch at least gets me an SD mount every other time I reinsert the card while connected. It's better than the freezing when one TRAN wait went into an infinite loop just before doing transfers from memory.
BTW, I added a setting of the disconnect bit in the driver, which seemed to cure my speed warnings. It appears to work much like the ARC controller, where taking it out of the run state makes it turn off the pullups and makes the module safe to clock gate. If that isn't done, the whole thing is left in an undefined state with the D+/- lines set at who knows what. ETA: done in r29149
The patch in MikeS' latest comment probably should be tracked elsewhere so that we can close this task.
Some recent revisions with which I have observed the bug:
* r29679 on Clip+
* r29583 on Clip+
* r29506 on ClipV2
There aren't many recent changes to the USB code that might have caused this. One theory is that r29492 is the culprit, as it plays with synchronization and includes changes to the USB driver. I'll attempt to back that change out and will report back.
* The AMSv2 variant of my Clip+ is 0.
* It doesn't make a difference whether an SD card has been inserted in the slot or not.
* Occasionally, the Clip+ panics with this message: "usb-drv: EP0 completion while waiting for SETUP"