Rockbox

Tasklist

FS#12093 - Playback hanging after codec/playback rework

Attached to Project: Rockbox
Opened by Andree Buschmann (Buschel) - Tuesday, 03 May 2011, 06:04 GMT
Last edited by Michael Sevakis (MikeS) - Monday, 09 May 2011, 21:20 GMT
Task Type Bugs
Category Music playback
Status Closed
Assigned To No-one
Operating System iPod Nano 2G
Severity High
Priority Normal
Reported Version Release 3.8.1
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 0
Private No

Details

I experienced several hangups during playback in the middle of tracks. Those hangups appeared nearly each 1h of playback. Seek fwd/rwd or play/pause does not solve the issue. Skipping forward lets the next track play normally. The track that formerly hung up can be played normally after a skip.

Player: iPod nano 2G
Tracks: mpc / mp3

Hangups occured with mpc only, but that might just be an effect of my music collection which mostly consists of mpc.

Thread states:
codec *R 16 16 11% (=> always *R, not alternating *R/T)
audio T 15 15 57%
buffering T 15 15 25%

I will attach the .config file later.
This task depends upon

Closed by  Michael Sevakis (MikeS)
Monday, 09 May 2011, 21:20 GMT
Reason for closing:  Fixed
Additional comments about closing:  Reporter says the last patch provided fixes the problem.
Comment by Michael Sevakis (MikeS) - Tuesday, 03 May 2011, 06:17 GMT
Then, I guess you're the guy to ask if anything is potentially wrong in the MPC codec because of the change? :-)

Anyway, I thought about it and it might be a good idea to keep moving nonstop (if you hadn't done that when bringing the backlight up) on the scrollwheel just to be absolutely sure about the codec state. If it really is always *R, the thread is trapped in a loop somewhere of some type that is external to playback itself (not even buffering loops waiting for a handle's data should produce that since they sleep) but obviously yields. Perhaps I forgot about something too. :)

You said it can be played normally after a skip. Is it a reproducable pattern of hang in middle->skip->skip back to track->plays normally? Is the track completely buffered?

I'll probably have more questions too.
Comment by Michael Sevakis (MikeS) - Tuesday, 03 May 2011, 06:24 GMT
Another thing that comes to mind is stack space. Perhaps some silent overflow now exists. How tight is MPC on the codec stack?
Comment by Andree Buschmann (Buschel) - Tuesday, 03 May 2011, 20:10 GMT
Here is the .config file.
Comment by Andree Buschmann (Buschel) - Tuesday, 03 May 2011, 21:03 GMT
As far as I can tell from what I have seen in the buffering screen the playback was stuck when playing the last buffered track. Is it possible that this hangup occurs when the next tracks should be buffered?
Comment by Andree Buschmann (Buschel) - Wednesday, 04 May 2011, 05:47 GMT
Happened again on my iPod nano 2G. If I move the finger on the scrollwheel while I am in the "OS stack" screen I can see that "codec" very shortly switches to "S", but most of the time shows "*R".

OS stack debug screen:
===================
S 16 16 10% scroll
B 18 18 33% usb
T/*R 16 16 17% backlight -> *R only very shortly when scrolling the wheel
S 18 18 26% power
S 16 16 30% nand
B 20 20 48% dircache
T 20 20 8% tagcache
T 20 20 34% playlist cachectrl
*R/S 16 16 11% codec -> S only very shortly when scrolling the wheel
T 15 15 57% audio
B 16 16 12% voice
T 15 15 25% buffering

buffering thread debug screen:
========================
pcm 0/529.200
alloc 29.313.567/29.313.568
real 76.019/29.313.568
usefl 54.187/29.313.568

Doesn't this again show that the hangup might have happened when trying to buffer?

Regarding your other questions I can say that "hang in middle"->skip to next track->skip back to track->"plays normally" worked twice for me. I am pretty sure that when skipping to next track re-buffering took place. "hang in middle"->"seek fwd"->"seek rwd" did not get the device back to playback state, the time position and the bar were updated properly though. "hang in middle"->play/pause did not work too, the WPS kept displaying the "Play"-Icon. So, I guess the state of the playback engine did not change.

Edit: FWIW, my iPod Video did not lock up so far (using r29792).
Comment by Michael Sevakis (MikeS) - Thursday, 05 May 2011, 02:23 GMT
Thank you for obtaining these details. Very brief "S" along with responsiveness to the audio thread's quit request indicates strongly that it's stuck in prep_bufdata in buffering.c. However, some points to corruption of the playback state if seeking won't recover it and play/pause won't change the engine state as seen from the WPS (see how simple audio_on_pause() really is and how could that not work?).

ETA:
To answer your question, yes, rebuffering seems quite certain. I'm not sure why. A queue overflow is possible and must not happen. alloc: full, real: low and useful: low looks interesting though.
Handle count, track count and watermark in the buffering screen might have been useful too.
Comment by Michael Sevakis (MikeS) - Thursday, 05 May 2011, 05:37 GMT
To me right now from that set of quantities and a fair amount of consideration, it's looking like a case of a reset audio handle (to force rebuffer for the codec's data to a new position) + failure to shrink the buffer in order to get metadata out of the way so new data can be added.
Comment by Andree Buschmann (Buschel) - Thursday, 05 May 2011, 06:04 GMT
Good to see I could help you with these details. I will try to give some more details about handles, track count, etc. next time -- I didn't use my iPod since the last hangup.
Comment by Michael Sevakis (MikeS) - Friday, 06 May 2011, 01:40 GMT
Give this one a try and see what happens.
Comment by Andree Buschmann (Buschel) - Friday, 06 May 2011, 09:52 GMT
Lockup after 3h with the patch from above:

buffering thread debug screen:
========================
pcm 0/529.200
alloc 29.406.639/29.406.640
real 62.115/29.406.640
usefl 22.147/29.406.640
data_rem 623.147
track count 6
handlecount 16
pcmbufdesc 0/21
watermark 21.500

OS stack debug screen:
===================
*R 16 16 53% main
S 16 16 10% scroll
B 18 18 4% usb
T/*R 16 16 17% backlight -> *R only very shortly when scrolling the wheel
S 18 18 26% power
S 16 16 36% nand
B 20 20 48% dircache
T 20 20 8% tagcache
T 20 20 34% playlist cachectrl
*R/S 16 16 11% codec -> S only very shortly when scrolling the wheel
T 15 15 57% audio
B 16 16 12% voice
T 15 15 25% buffering

Just happened again:

buffering thread debug screen:
========================
pcm 0/529.200
alloc 29.406.639/29.406.640
real 25.703.947/29.406.640
usefl 35.227/29.406.640
data_rem 6.103.683
track count 4
handlecount 8
pcmbufdesc 0/21
watermark 21.125

It seems like you can force the issue when using the buffering thread debug screen and skipping until there is only a minor buffer left (usefl ) and then the playback continue until it would need to rebuffer.

Edit 1: I could reproduce this issue 4-5 time today. The buffering screen always looks very similar: pcm 0, alloc and real nearly filled, usefl nearly empty, pcmbufdesc 0.

Edit 2: It observed that the watermark is set to 0 during rebuffering. Can this cause any unwanted condition in the pcm buffer handling (like "Decode when PCM buffer is below watermark. watermark is =0, we are fine.")?

Comment by Michael Sevakis (MikeS) - Friday, 06 May 2011, 16:14 GMT
It occurred to me that the codec couldn't be only stuck waiting for data, it must be exiting the buffering call then making no further progress to see *R, since waiting for data only calls sleep() but outside that calls yield().

In both cases, the usefl is above the watermark, but did it just add a small amount of data or quit just before scheduled rebuffering? (Rhetorical atm; I know you might not know that since you'd have to see it at the right time)

The numbers in the buffering screen don't change but are stuck at those values?

When you said, "...then the playback continue until it would need to rebuffer." Does that mean it quits again at the next rebuffer?

Comment by Andree Buschmann (Buschel) - Friday, 06 May 2011, 17:10 GMT
> but did it just add a small amount of data or quit just before scheduled rebuffering?
usefl indeed sometimes changed (going up by several 100 KB) before rebuffering -- from few 10 KB to some 100 KB's.

> The numbers in the buffering screen don't change but are stuck at those values?
If the playback hangs the numbers in buffering screen do not change at all.

> Does that mean it quits again at the next rebuffer?
No. Bad english :/ I meant to say that you just skip until usefl is so small, that any further skip would immediately result in rebuffering. Then you just wait until the playback engine wants to rebuffer. In some cases it will lock up, in most cases it will rebuffer normally. I just have the impression that you do not need to wait to play the whole buffered data, you can just skip and wait for rebuffering. This saves a lot of time for reproduction.
Comment by Michael Sevakis (MikeS) - Saturday, 07 May 2011, 00:00 GMT
One moment! You're saying that you have recipe? Skip enough to force rebuffer-> wait for normal rebuffer after that = sometimes stalls. ??

Now it locks or you mean just stalls without pcm output? I wouldn't call any state that can be exited normally a "lockup".
Comment by Michael Sevakis (MikeS) - Saturday, 07 May 2011, 02:20 GMT
With this one, it also sends BUFFER_EVENT_REBUFFER to clear old tracks when there isn't enough space even if the handle is the last one where before if the last handle was the one being rebuffered, no event was sent. To not clear it could lead to a buffer jam, unable to shrink the contents, if previous tracks exist that aren't moveable. Please reports!
Comment by Michael Sevakis (MikeS) - Saturday, 07 May 2011, 05:40 GMT
On my fuze I can reproduce this pretty easily (meaning pretty much at will) with a mixed bunch of test files, the one at the key point being flac. It nearly drains the buffer, then buffering for that last one on the buffer stops prematurely. usefl is quite close to but greater than the watermark.

Next thing I'll do is to check for queue overflows which are really a very bad thing here should they happen. Every message is precious.

ETA:

The premature stop is what is expected if the buffering didn't yet have enough room to complete and collides with ridx or the next handle. Buffer above the watermark won't trigger a proper clear of old handles yet the request cannot complete because of those handles. It was probably due for a refill but a buffer seek caused a handle reset instead of sending the low buffer event, which after that refill for the handle leaves it too full to rebuffer, then it never gets low enough again to rebuffer inside playback. The codec state being anything other than S has me wondering about there being just one more ingredient.
Comment by Michael Sevakis (MikeS) - Saturday, 07 May 2011, 06:21 GMT
Haha...well a case of the dumb overcame me and I don't want that responsible for misleading anyone. It's possible to see *R if it timed out in the sleep() call but hadn't yet been scheduled. derp.
Comment by Michael Sevakis (MikeS) - Saturday, 07 May 2011, 06:41 GMT
Yep, splashed it out nice and cheap and it really looks like a buffer jam with the codec thread spinning around waiting for data that can't be loaded. I'm checking all this with plain SVN, FYI. The previous patch might cure it but I'm not yet sure it should be fixed exactly that way.
Comment by Andree Buschmann (Buschel) - Saturday, 07 May 2011, 06:57 GMT
Sounds like you have a clue what is going on. I will wait for a patch to test or any support request from your side.
Comment by Michael Sevakis (MikeS) - Saturday, 07 May 2011, 07:25 GMT
Sure do! :-) I think the real cause of the regression is the removal of sending BUFFER_EVENT_BUFFER_LOW when processing Q_START_FILL. There is no seek-related problem, at least in how my testing proceeded, since I checked if it actually went through those cases and it didn't. The position must be in range but not the extent of the data needed. That patch won't quite take care of things universally.

Really, those low buffer events tend to be a nuissance and I don't want it to fire them (risking queue overflows) just to make sure we beat it hard enough.
Comment by Michael Sevakis (MikeS) - Sunday, 08 May 2011, 07:15 GMT
Try this one out. I wasn't able to reproduce any more on the Fuze (or was getting very unlucky :-).

This really isn't quite proper but should tell if it's the right thing overall. It also removes buf_request_buffer_handle as a buffering API since that function can be considered a kludge because simply asking for data should be what requests buffering automatically if needed. Also ups the plugin API version.
Comment by Andree Buschmann (Buschel) - Sunday, 08 May 2011, 17:34 GMT
I tested a build with this patch today and experienced no issues anymore. My nano ran for ~6h. I will start an overnight run in a few hours.
Comment by Andree Buschmann (Buschel) - Monday, 09 May 2011, 05:59 GMT
Results of the overnight endurance test => works! :o)

I would like to have this one submitted to have stable playback in svn. You can then further work on smarter solution. What do you think?
Comment by Michael Sevakis (MikeS) - Monday, 09 May 2011, 20:31 GMT
Ok, guess I'll just go with this for now (in a few minutes) and think about a nicer, more universal implementation for later. My time is limited at the moment anyway.

Loading...