FS#12429 - Fuze+: playback failures (data aborts/undef instr/etc) with several codecs

Attached to Project: Rockbox
Opened by Jean-Louis Biasini (JeanLouisBiasini) - Monday, 05 December 2011, 17:29 GMT
Last edited by Andree Buschmann (Buschel) - Sunday, 11 December 2011, 16:34 GMT
Task Type Bugs
Category Music playback
Status Closed
Assigned To No-one
Operating System Another
Severity High
Priority Normal
Reported Version Daily build (which?)
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 0
Private No

Details

Several problems occur with MPC and FLAC playback on the Fuze+ port. The problem has been identified as player-specific: it does not occur on the Clip+ and has been seen on two different Fuze+ units.

The problems are fairly random, which makes them hard to track down precisely. Nevertheless, three different phases can be observed in the behaviour of the Fuze+ with mpc files:

- Normal behaviour (files can be selected and played, and playlists play without problems).
- Non-playing behaviour (files are skipped one after the other until the end of the dynamic playlist). Nothing consistent was found on how to get from normal to non-playing behaviour or back.
- Buggy behaviour (you can play and skip between files, but after playing one file the player either hangs or panics when loading the next one).
Two different sets of values have been observed so far (the backtrace remains the same):
Data abort at 63E57B28
FSR 0x8
(domain 0, fault 8)
address 0x64000000
backtrace start
A: 0x63E56AE8
A: 0x60040D28
A: 0x6003D87C
A: 0x6003DB50
backtrace end

or:

Data abort at 63E57B24
FSR 0x8
(domain 0, fault 8)
address 0x64000001
backtrace start
A: 0x63E56AE8
A: 0x60040D28
A: 0x6003D87C
A: 0x6003DB50
backtrace end

Several tests were made so far to establish that:

1) The problem is not file-specific: normal, non-playing and buggy behaviour all occur on the same sets of files. Using different tag encodings, and even wiping all the tags, did not help either.

2) The problem is not related to the database: not building it and/or erasing all DB files does not eliminate the buggy behaviour.

3) The problem is not related to specific settings or, if it is, it is related to the default settings. Recompiling and reinstalling Rockbox from scratch does not resolve the buggy behaviour.

4) Nothing consistent was found about how the player goes from normal to non-playing or back. However, some consistent (systematic) ways to go from the non-playing to the buggy phase were found:
- Reinitializing the database (whether there is no DB or an up-to-date one is already present). The player is then back in the non-playing phase after a reboot.
- Playing a big directory with a lot of files: after skipping 20 to 50 of them, the player eventually starts playing one of them (a random one; nothing consistent here either).
- Deactivating the directory cache option (Settings > System > Disk). This is the only way whose effect persists after a reboot.

5) The backtrace of the panic occurring in the buggy behaviour was fed to the ./utils/analysis/find_addr.pl tool, with the following result:
https://gist.github.com/1434466

6) The FLAC format has been seen with the same non-playing behaviour, but the buggy phase gave no backtrace:
Data abort at 63E71120
FSR 0x1
(domain 0, fault 1)
address 0x00004D6B
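
The "(domain N, fault M)" line in the panic output is derived from the FSR value. As a minimal sketch, assuming the ARMv5-style fault status register layout (fault status in bits 3:0, domain in bits 7:4), the two fields can be split out as shown here. The struct and function names are illustrative and are not taken from the Rockbox sources. The status names are an assumption based on the ARM926EJ-S fault encodings and should be checked against the reference manual of the actual core.

```c
#include <stdint.h>

/* Hypothetical helper type, not part of Rockbox. */
struct fsr_fields {
    unsigned domain; /* FSR bits 7:4 */
    unsigned status; /* FSR bits 3:0, shown as "fault" on the panic screen */
};

/* Split a raw FSR value into its domain and status fields. */
static struct fsr_fields fsr_decode(uint32_t fsr)
{
    struct fsr_fields f;
    f.domain = (fsr >> 4) & 0xF;
    f.status = fsr & 0xF;
    return f;
}

/* Assumed ARM926EJ-S-style names for the two status values seen here. */
static const char *fsr_status_name(unsigned status)
{
    switch (status) {
    case 0x1: case 0x3: return "alignment fault";
    case 0x8: case 0xA: return "external abort";
    default:            return "other (see the core's reference manual)";
    }
}
```

Under these assumptions, FSR 0x8 decodes to domain 0, fault 8 and FSR 0x1 to domain 0, fault 1, matching the panic screens quoted above.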

Closed by  Andree Buschmann (Buschel)
Sunday, 11 December 2011, 16:34 GMT
Reason for closing:  Fixed
Additional comments about closing:  The MPC sv7 crash is fixed with r31211, flac crash is fixed with r31207 (enlarged MAX_FRAMESIZE).

The undefined instruction and the dircache influence will be handled in separate tasks.
Comment by Jean-Louis Biasini (JeanLouisBiasini) - Monday, 05 December 2011, 17:32 GMT
find_address 0x63E71120 1 (flac's data abort) returns:
https://gist.github.com/1434486
Comment by Jean-Louis Biasini (JeanLouisBiasini) - Monday, 05 December 2011, 17:42 GMT
The revision for all these tests is r31145.
Comment by Andree Buschmann (Buschel) - Monday, 05 December 2011, 17:52 GMT
Yesterday evening I had a lockup issue several times with my iPod nano 2G as well. When playing back mpc and changing the volume, the device stopped playback, the time position was stuck, and no pause was signalled. Only a restart solved the problem. This was with r30907 (and not with r31055 as claimed before I edited this comment).

Do you use other formats as well? To me this does not sound like an mpc- or flac-related issue, but like a general code playback issue.


Comment by Jean-Louis Biasini (JeanLouisBiasini) - Monday, 05 December 2011, 18:54 GMT
I'm also using mp3 and ogg (quality 8 and 7) and never had the problem with them. The volume doesn't seem to be related to this problem (it always happens at the end or beginning of a file). The main issue is files being skipped and the panic when they finally get played. Did you experience this?
Comment by Jean-Louis Biasini (JeanLouisBiasini) - Monday, 05 December 2011, 18:58 GMT
But I agree it is not directly related to the codec, since this issue doesn't seem to appear on other devices... The backtrace clearly points to the codec, but there must be another problem there...
Comment by Andree Buschmann (Buschel) - Monday, 05 December 2011, 19:29 GMT
I did not experience issues after skipping yet. Such problems could point to buffering... Another guess: I know that mpc is not very error resilient. If the data is corrupted in RAM (e.g. some overwritten or incomplete data segment which holds audio file data) mpc might crash. mp3 is more stable, I cannot judge ogg and flac though...
Comment by Jean-Louis Biasini (JeanLouisBiasini) - Monday, 05 December 2011, 19:39 GMT
I'm not yet experienced enough with programming to know about memory, but there are some clues that could point to a problem in memory handling. I sometimes have an issue even with mp3 and ogg: "undefined instruction" when reaching the end of a song. Furthermore, the fact that the database is not related to the problem but that initializing it has some effect on it is quite strange!
Comment by Jean-Louis Biasini (JeanLouisBiasini) - Monday, 05 December 2011, 19:52 GMT
And I noticed today that mpc plays okay in a small directory but will end with a data abort in a big directory full of files (2.6 GB, 313 files).
Comment by Jean-Louis Biasini (JeanLouisBiasini) - Monday, 05 December 2011, 19:54 GMT
Arf, just to be clear: they sometimes play OK in a small folder, then we are back in buggy or skippy mode. But the fact that the problems differ between big and small folders is also a clue pointing to memory, isn't it?
Comment by Andree Buschmann (Buschel) - Monday, 05 December 2011, 20:04 GMT
Do you have those issues with both enabled and disabled directory cache? For testing please change this setting and perform a clean restart.
Comment by Jean-Louis Biasini (JeanLouisBiasini) - Monday, 05 December 2011, 22:38 GMT
As written in the bug description, deactivating the directory cache gets rid of the file-skipping problem but not of the data abort.
Comment by Jean-Louis Biasini (JeanLouisBiasini) - Monday, 05 December 2011, 22:46 GMT
someone on the forum thread of the fuze+'s port seems to have the same issue with "m4a (aac hev2, encoded by Nero AAC, two-pass mode), and ogg/vorbis (encoded by the version of aoTuv included in the latest FreAc)"
Comment by Andree Buschmann (Buschel) - Tuesday, 06 December 2011, 07:18 GMT
Can you place a link to the relevant forum thread?
Comment by Boris Gjenero (dreamlayers) - Tuesday, 06 December 2011, 15:53 GMT
It would be nice to have .elf files, but I guess the MPC codec I built from r31155 matches the reported addresses.

The 0x63E57B28 and 0x63E57B24 data aborts are where values are loaded from r->buff[] in mpc_bits_read() at:
ret = (r->buff[0] | (r->buff[-1] << 8)) >> r->count;
It is found in mpc_bits_reader.h and inlined right after the first mpc_demux_fill() call in the d->si.stream_version <= 7 branch of mpc_demux_decode_inner(), which is inlined within mpc_demux_decode(). The start of the backtrace is the call of mpc_demux_decode() from codec_run(). It's interesting that the failing data accesses are just past the end of RAM.
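
To illustrate the "just past the end of RAM" observation, here is a minimal sketch. It assumes RAM is mapped at 0x60000000 and is 64 MiB in size. Both values are only inferred from the addresses in this report and are not taken from the Rockbox sources. The quoted expression reads r->buff[-1] and r->buff[0], so both bytes must lie inside RAM. The function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define RAM_BASE 0x60000000u /* assumption inferred from the backtrace addresses */
#define RAM_SIZE 0x04000000u /* assumption: 64 MiB, so RAM ends at 0x64000000 */

/* True if addr lies inside the assumed RAM window [RAM_BASE, RAM_BASE + RAM_SIZE). */
static bool addr_in_ram(uint32_t addr)
{
    return addr >= RAM_BASE && (addr - RAM_BASE) < RAM_SIZE;
}

/* The quoted expression touches buff[-1] and buff[0]; both must be valid. */
static bool bits_read_in_ram(uint32_t buff_addr)
{
    return addr_in_ram(buff_addr - 1u) && addr_in_ram(buff_addr);
}
```

With these assumed values, the faulting addresses 0x64000000 and 0x64000001 both fall outside the window, while the backtrace addresses such as 0x6003D87C fall inside it.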

The FLAC data abort at 0x63E71120 is weird. It is past the end of .bss, and it doesn't seem like FLAC uses codec_malloc().
Comment by Jean-Louis Biasini (JeanLouisBiasini) - Tuesday, 06 December 2011, 16:15 GMT
Regarding mpc, I can confirm that the bug seems to appear only with sv7. So far sv8 seems to be unaffected. Regarding the flac one, it's also weird that it leaves no backtrace. As I said, the problem is also that the bug is not always there... For the moment I don't have the problem with flac anymore... I will try different compression levels to be sure...
Comment by Jean-Louis Biasini (JeanLouisBiasini) - Tuesday, 06 December 2011, 16:21 GMT
Oh, and another thing I noticed: when a bug occurs on mpc, it SEEMS (I'm not absolutely sure) that when the end of the file is reached, the progress bar still shows some space left before the countdown would reach its end...
Another thing: when the player just hangs rather than panicking, the WPS shows the progress bar at about 1/6 of the file, even though the bug occurs at the end of the file.
Here is the link to the forum thread: http://forums.rockbox.org/index.php/topic,26284.msg186784.html#msg186784 and the following posts by the same user.
Comment by Boris Gjenero (dreamlayers) - Tuesday, 06 December 2011, 17:25 GMT
> Regarding the flac one, it's also weird that it leave no backtrace.

The backtrace algorithm examines both the stack and code. If the code jumps somewhere crazy, I don't expect to get a backtrace. There, FLAC is executing code from past the end of .bss. I don't see anything that would put code there, so it seems PC shouldn't be in that area. There is probably some memory corruption. (BTW. Stack corruption can also cause backtrace to fail.)
Comment by Thomas Martitz (kugel.) - Wednesday, 07 December 2011, 09:52 GMT
Why is this high severity? It only affects the unsupported fuze+ target, no?
Comment by Jean-Louis Biasini (JeanLouisBiasini) - Wednesday, 07 December 2011, 15:44 GMT
OK, the skipping bug (files getting skipped one after the other) is back, and it does affect sv8 too! I can confirm that deactivating the directory cache gets rid of it and reactivating the directory cache brings it back. No idea why it came back suddenly, but it was just after a reinstallation. Perhaps I should open a separate bug report for this skipping behaviour, since it doesn't seem related?
Comment by Andree Buschmann (Buschel) - Wednesday, 07 December 2011, 17:03 GMT
> Why is this high severity? It only affects the unsupported fuze+ target, no?
Severity is high, because the impact of the failure itself is high. We could lower the priority of course.
Comment by Jean-Louis Biasini (JeanLouisBiasini) - Saturday, 10 December 2011, 14:42 GMT
I have reproduced the frequently occurring undefined instruction with the new backtrace patch applied:
Undefined instruction at 0000AC44
backtrace start
pc: 0x0000AC44
sp: 0x00005350
backtrace end

find_addr.pl returns:
jean-louis@debian:~/Bureau/rockbox-devtree/rockbox/buidl$ ../utils/analysis/find_addr.pl 0x0000AC44 1
/home/jean-louis/Bureau/rockbox-devtree/rockbox/buidl/firmware/libfirmware.a(thread.o) -> threads

jean-louis@debian:~/Bureau/rockbox-devtree/rockbox/buidl$ ../utils/analysis/find_addr.pl 0x00005350 1
/home/jean-louis/Bureau/rockbox-devtree/rockbox/buidl/apps/codec_thread.o ->
Comment by Andree Buschmann (Buschel) - Sunday, 11 December 2011, 16:14 GMT
Short update on this: The MPC sv7 crash is fixed with r31211, flac crash is fixed with r31207 (enlarged MAX_FRAMESIZE).
