Rockbox

FS#11877 - Filesystem corruption after entering USB mode

Attached to Project: Rockbox
Opened by sideral (sideral) - Friday, 14 January 2011, 07:59 GMT
Last edited by sideral (sideral) - Saturday, 09 April 2011, 20:46 GMT
Task Type Bugs
Category Operating System/Drivers
Status New (Reopened)
Assigned To No-one
Operating System Sansa AMSv2
Severity Low
Priority Normal
Reported Version Daily build (which?)
Due in Version Undecided
Due Date Undecided
Percent Complete 0%
Votes 0
Private No

Details

Various users on IRC have recently reported cases of filesystem corruption. I have seen them as well, especially when using USB mode (I'm using a Sansa ClipV2, USB-enabled through FS#11664).

One strange thing I've experienced is that when my system has detected filesystem corruption, it is often enough to unplug the player from USB and plug it back in, without repairing the filesystem and without rebooting Rockbox; on the next USB mount, the filesystem is clean. This might indicate that some buffer is flushed too late for the USB host to see.

[EDIT: Removed previously attached patch, which is now tracked separately as FS#11883.]

Comment by sideral (sideral) - Saturday, 15 January 2011, 00:06 GMT
After an IRC discussion, it became rather obvious that moving disk unmounting/mounting away from usb_slave_mode is a rather dumb idea: doing it in system_flush/restore breaks the well-defined order of doing unmounts last and mounts first.

We need another (more manual) way of ensuring that disk_unmount_all is called on all paths that might expose the disk to the outside world; I'll come up with a patch soon.
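A minimal sketch of that ordering, assuming hypothetical stand-ins: `fs_unmount_all()` and `fs_mount_all()` play the role of `disk_unmount_all` and the remount step, and `expose_disk_to_host()` / `return_disk_from_host()` are invented helper names, not Rockbox functions. The idea is only that every path handing the disk to the host funnels through one helper, so the unmount always happens first and the mount last.

```c
#include <string.h>

/* Hypothetical call log so the ordering can be checked. */
static char call_log[128];

static void log_call(const char *name)
{
    strncat(call_log, name, sizeof(call_log) - strlen(call_log) - 1);
    strncat(call_log, ";", sizeof(call_log) - strlen(call_log) - 1);
}

/* Hypothetical stand-ins for the real unmount/mount and USB hand-over code. */
static void fs_unmount_all(void)  { log_call("unmount"); }
static void fs_mount_all(void)    { log_call("mount"); }
static void usb_attach_disk(void) { log_call("attach"); }
static void usb_detach_disk(void) { log_call("detach"); }

/* Every path that exposes the disk goes through here:
 * unmount (flushing dirty buffers) strictly before the host can see it. */
static void expose_disk_to_host(void)
{
    fs_unmount_all();
    usb_attach_disk();
}

/* ...and mounting happens strictly after the host has let go. */
static void return_disk_from_host(void)
{
    usb_detach_disk();
    fs_mount_all();
}
```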

In the meantime, I've done a fair bit of debugging. It is true that some files still remain open when usb_slave_mode is entered, most notably the current font and the currently playing audio file. This should be fixed. However, these files are read-only and do not lead to dirty buffers in memory.
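To illustrate why read-only files are harmless in this respect, the sketch below audits a hypothetical open-file table (the `struct open_file` layout and `audit_open_files()` are assumptions, not Rockbox's real structures). Every file still open at USB entry is counted, but only writable files holding unflushed data count as a dirty-buffer risk.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical open-file table entry; not Rockbox's real struct. */
struct open_file {
    const char *path;
    bool in_use;
    bool writable;   /* opened with write access */
    bool dirty;      /* has unflushed data in a RAM buffer */
};

/* Count files still open, and how many of those could hold dirty data.
 * Read-only files (e.g. a font or the current audio track) are counted
 * as open but never contribute dirty buffers. */
static void audit_open_files(const struct open_file *tbl, size_t n,
                             size_t *open_count, size_t *dirty_count)
{
    *open_count = 0;
    *dirty_count = 0;
    for (size_t i = 0; i < n; i++) {
        if (!tbl[i].in_use)
            continue;
        (*open_count)++;
        if (tbl[i].writable && tbl[i].dirty)
            (*dirty_count)++;
    }
}
```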

Instead, I've found a much scarier problem: Rockbox sometimes exposes a corrupt FAT on my ClipV2, in which some 4-KiB-aligned bytes are replaced with the constant 0x7d. Rebooting into the OF (original firmware), or sometimes just reconnecting the player, makes these phantom corruptions vanish. My current theory is that this is a caching problem triggered by the zero-copy optimization in the SD card driver. I'm now running with the attached patch to see whether disabling this optimization helps.
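For readers unfamiliar with the term, a zero-copy optimization in a block driver usually means DMA-ing straight into the caller's buffer instead of going through an aligned bounce buffer. The sketch below is a generic illustration under stated assumptions (a 32-byte cache line, a fake DMA routine, and a hypothetical `ZERO_COPY_ENABLED` switch); it is not the Rockbox SD driver. It shows why such a path is sensitive to caching: it is only safe when the buffer covers whole cache lines, because invalidating a partially shared line after DMA could discard neighbouring data.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CACHE_LINE  32   /* assumed; the real value depends on the CPU */
#define SECTOR_SIZE 512

/* Hypothetical switch: set to 0 to force every transfer through the
 * bounce buffer, i.e. to disable the zero-copy path. */
#ifndef ZERO_COPY_ENABLED
#define ZERO_COPY_ENABLED 1
#endif

static _Alignas(CACHE_LINE) uint8_t bounce_buf[SECTOR_SIZE];

/* Zero-copy is only safe if the buffer starts and ends on cache-line
 * boundaries, so no unrelated data shares a line with the DMA target. */
static bool can_dma_directly(const void *buf, size_t len)
{
    uintptr_t p = (uintptr_t)buf;
    return ZERO_COPY_ENABLED && (p % CACHE_LINE) == 0 && (len % CACHE_LINE) == 0;
}

/* Stand-in for the DMA engine: it just fills the buffer. */
static void fake_dma_read(void *dst, size_t len)
{
    memset(dst, 0xA5, len);
}

/* Read one sector via zero-copy when allowed, else via the bounce buffer.
 * Returns true if the zero-copy path was taken. */
static bool read_sector(void *dst)
{
    if (can_dma_directly(dst, SECTOR_SIZE)) {
        fake_dma_read(dst, SECTOR_SIZE);
        /* a real driver would invalidate the dcache over dst here */
        return true;
    }
    fake_dma_read(bounce_buf, SECTOR_SIZE);
    /* a real driver would invalidate the dcache over bounce_buf here */
    memcpy(dst, bounce_buf, SECTOR_SIZE);
    return false;
}
```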
Comment by sideral (sideral) - Sunday, 16 January 2011, 22:25 GMT
I've split out the clean-disk-unmount patch into a separate item, FS#11883, as it appears to be unrelated to this issue.
Comment by sideral (sideral) - Sunday, 16 January 2011, 23:40 GMT
In the meantime I've had one more occurrence of this problem, with symptoms similar to those described in yesterday's comment. So disabling the SD driver's zero-copy optimization does not seem to have helped.

I've had another look at the FAT corruptions (of both the first and the second incident) that are exposed over USB, but aren't actually on the disk (unless written out as part of a FAT update). I have noticed some more patterns:

* The byte offsets on disk at which the 1-byte corruptions appear are always 4-KiB (0x1000) aligned, but only at hex offsets ending in *[01234567]000 (that is, there's no hex offset ending in *8000, for example).

* In both incidents, about 3000 bytes of the two FATs' 14.7 MiB were damaged.

* The corruptions are single-byte only, and always take one of four byte values (in order of number of occurrences): 0x7d, 0x7e, 0xfd, and 0xfe.

* In each case of byte corruption, the preceding 20 valid bytes contain a pair of either 0x7d,0x7e or 0xfd,0xfe bytes, spaced 4 bytes apart.
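The offset and value observations above can be checked mechanically by comparing the FAT as read over USB against a known-good image. The sketch below assumes both are already loaded into memory as byte arrays (`scan()` and `struct corruption_stats` are hypothetical names); it counts differing bytes and how many of them sit at a 4-KiB-aligned offset ending in [0-7]000 with one of the four suspect values. The neighbouring-pair observation is not checked here.

```c
#include <stddef.h>
#include <stdint.h>

/* Differences between what the host read over USB ("seen")
 * and what is actually on the disk ("actual"). */
struct corruption_stats {
    size_t diffs;            /* total differing bytes */
    size_t matches_pattern;  /* diffs at *[0-7]000 with a suspect value */
};

static int is_suspect_value(uint8_t b)
{
    return b == 0x7d || b == 0x7e || b == 0xfd || b == 0xfe;
}

/* 4 KiB aligned and the hex offset ends in [0-7]000,
 * i.e. bit 15 is clear within each 64 KiB block. */
static int is_suspect_offset(size_t off)
{
    return (off & 0xfff) == 0 && (off & 0x8000) == 0;
}

static struct corruption_stats scan(const uint8_t *seen, const uint8_t *actual,
                                    size_t len)
{
    struct corruption_stats s = {0, 0};
    for (size_t i = 0; i < len; i++) {
        if (seen[i] == actual[i])
            continue;
        s.diffs++;
        if (is_suspect_offset(i) && is_suspect_value(seen[i]))
            s.matches_pattern++;
    }
    return s;
}
```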

I've googled the bad hex values, and apparently at least 0x7d and 0x7e are used as escape bytes in some part of low-level USB signaling, which I find quite suspicious. This leads me to suspect that the unstable USB support for my ClipV2 (enabled by way of FS#11664) is to blame.
Comment by sideral (sideral) - Monday, 24 January 2011, 21:13 GMT
Update:

I have verified that the SD driver reads correct data from the disk, so it's likely that the data was corrupted on its way out through USB.

Looking at the pattern of USB controller resets in my logf output, I theorized that there is a race in the USB reset path of the AMSv2 USB driver. I tried disabling the USB hard-reset patch (FS#11664), but I could still observe the data corruption. Then I tried using cancel_all_transfers(true) in place of the controller reset, which seemed to work around the corruption issue.

Finally, jhMikeS and pamaury found and fixed a number of synchronization issues in the USB driver (r29129 & r29130). These may have been the root cause of the corruption issue, as well as of other issues with the AMSv2 USB driver. I haven't seen any data corruption since adopting these changes. I'll test them some more over the next few days.
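As a purely hypothetical illustration of how a reset race of this kind can be closed (this is not a description of what r29129/r29130 actually change), a driver can tag each queued transfer with a generation number that every reset increments, and ignore completions from an older generation. All names are invented, and a real driver would also need to run these functions with interrupts masked or under a lock.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical endpoint state; not the AMSv2 driver's real structures. */
struct ep_state {
    uint32_t generation;  /* bumped on every controller reset */
    bool busy;
};

struct transfer {
    uint32_t generation;  /* generation at the time the transfer was queued */
};

static void ep_queue(struct ep_state *ep, struct transfer *t)
{
    t->generation = ep->generation;
    ep->busy = true;
}

/* A reset abandons every in-flight transfer by starting a new generation. */
static void ep_reset(struct ep_state *ep)
{
    ep->generation++;
    ep->busy = false;
}

/* A completion from an older generation refers to a buffer the upper
 * layer may already have reused; drop it instead of acting on it.
 * Returns true if the completion was accepted. */
static bool ep_complete(struct ep_state *ep, const struct transfer *t)
{
    if (t->generation != ep->generation)
        return false;
    ep->busy = false;
    return true;
}
```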
Comment by Michael Sevakis (MikeS) - Tuesday, 25 January 2011, 13:43 GMT
I iz teh rockbox concurrency cop.

Really though, hopefully it's sound now.

While transfers themselves seem to have been trouble-free for me, some problems remain: it asks for a high-speed port (usually after the 3rd plug), and after replacing the main Rockbox binary, the entire theme disappears, showing only the default "safe" theme, and it ends up in the main menu instead of showing the yes/no screen to restart.
Comment by Michael Sevakis (MikeS) - Friday, 28 January 2011, 04:38 GMT
I don't think this needs a separate task.

This patch at least gets me an SD mount every other time I reinsert the card while connected. That's better than the freezing that happened when a TRAN wait went into an infinite loop just before doing transfers from memory.

BTW, I added setting of the disconnect bit in the driver, which seemed to cure my speed warnings. It appears to work much like the ARC controller, where taking it out of the run state turns off the pull-ups and makes the module safe to clock gate. If that isn't done, the whole thing is left in an undefined state with the D+/D- lines set to who knows what. ETA: done in r29149.
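The shutdown order described above can be sketched with simulated registers. The bit names and positions below are hypothetical, not taken from the AMSv2 or ARC documentation; the model simply treats the pull-up as active while the controller is running and the disconnect bit is clear, and refuses to gate the clock in that state.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions in a simulated control register. */
#define CTRL_RUN       (1u << 0)
#define CTRL_SOFT_DISC (1u << 1)

struct usb_ctrl {
    uint32_t ctrl;       /* simulated control register */
    bool clock_enabled;
};

/* Model: the D+ pull-up is driven while running and not disconnected. */
static bool pullup_active(const struct usb_ctrl *c)
{
    return (c->ctrl & CTRL_RUN) && !(c->ctrl & CTRL_SOFT_DISC);
}

/* Gating the clock while the pull-up is still driven would leave
 * D+/D- in an undefined state, so refuse to do it. */
static bool usb_ctrl_gate_clock(struct usb_ctrl *c)
{
    if (pullup_active(c))
        return false;
    c->clock_enabled = false;
    return true;
}

static void usb_ctrl_shutdown(struct usb_ctrl *c)
{
    c->ctrl &= ~CTRL_RUN;        /* leave the run state */
    c->ctrl |= CTRL_SOFT_DISC;   /* set the disconnect bit */
    (void)usb_ctrl_gate_clock(c); /* now safe to clock gate */
}
```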
Comment by sideral (sideral) - Saturday, 29 January 2011, 14:56 GMT
I think it's now safe to say that the original bug (USB data corruption) has been squashed with r29129.

The patch in MikeS' latest comment should probably be tracked elsewhere so that we can close this task.
Comment by Michael Sevakis (MikeS) - Saturday, 29 January 2011, 17:05 GMT
The patch is obsolete. The problem was identified and was a stale data issue, not a hardware one. r29169 makes some changes, treading cautiously, and the commit message explains.
Comment by sideral (sideral) - Saturday, 09 April 2011, 20:46 GMT
  • Field changed: Details
  • Field changed: Status (Unconfirmed → New)
  • Field changed: Operating System (All players → Sansa AMSv2)
I have reopened this task because I can now again reproduce this problem (more or less reliably) on my Clip+. I have also observed this problem once on my ClipV2, but otherwise USB has been very stable for me on the ClipV2. (Others on IRC have reported stable USB on the FuzeV2 as well.)

Some recent revisions with which I have observed the bug:
* r29679 on Clip+
* r29583 on Clip+
* r29506 on ClipV2

There aren't many recent changes to the USB code that might have caused this. One theory is that r29492 is the culprit, as it touches synchronization and includes changes to the USB driver. I'll attempt to back that change out and will report back.
Comment by sideral (sideral) - Saturday, 09 April 2011, 21:45 GMT
No, backing out r29492 does not fix the problem. So, it looks like the original fix, r29129 & r29130, did not manage to fix the problem completely.
Comment by sideral (sideral) - Monday, 18 April 2011, 22:53 GMT
Some additional observations (partially requested by pamaury):
* The AMSv2 variant of my Clip+ is 0.
* It doesn't make a difference whether an SD card has been inserted in the slot or not.
* Occasionally, the Clip+ panics with this message: "usb-drv: EP0 completion while waiting for SETUP"
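For context, the panic message describes a violated invariant of the EP0 control-transfer state machine: a transfer completion arrived while the endpoint was idle, waiting for the next SETUP packet. A simplified, hypothetical version of such a state machine is sketched below; it is not the AMSv2 driver's code, it assumes every control transfer has a data stage, and it treats a new SETUP as aborting any transfer in progress, as USB control transfers allow. The unexpected completion shows up as a rejected event.

```c
#include <stdbool.h>

/* Hypothetical EP0 control-transfer states and events. */
enum ep0_state { EP0_WAIT_SETUP, EP0_DATA_STAGE, EP0_STATUS_STAGE };
enum ep0_event { EV_SETUP_RECEIVED, EV_DATA_DONE, EV_STATUS_DONE };

/* Advance the state machine; returns false for an event that is not
 * valid in the current state. */
static bool ep0_step(enum ep0_state *s, enum ep0_event ev)
{
    /* A new SETUP aborts whatever control transfer was in progress. */
    if (ev == EV_SETUP_RECEIVED) {
        *s = EP0_DATA_STAGE;
        return true;
    }
    if (*s == EP0_DATA_STAGE && ev == EV_DATA_DONE) {
        *s = EP0_STATUS_STAGE;
        return true;
    }
    if (*s == EP0_STATUS_STAGE && ev == EV_STATUS_DONE) {
        *s = EP0_WAIT_SETUP;
        return true;
    }
    /* e.g. a completion while still waiting for SETUP */
    return false;
}
```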