• Status New
  • Percent Complete
  • Task Type Bugs
  • Category LCD
  • Assigned To No-one
  • Operating System All players
  • Severity Low
  • Priority Very Low
  • Reported Version Daily build (which?)
  • Due in Version Undecided
  • Due Date Undecided
  • Votes
  • Private
Attached to Project: Rockbox
Opened by dreamlayers - 2009-04-21

FS#10156 - LCD sleep can theoretically happen after the backlight turns on again

If LCD_SLEEP is put on the backlight queue immediatly after BACKLIGHT_ON, the LCD will go to sleep immediately after the backlight turns on. On the 5G iPod, the next backlight_on() call will wake it, but on some other targets it might not wake again.

For example, consider the following sequence of events:
1) backlight_on() is called, putting BACKLIGHT_ON on the queue
2) The LCD sleep timeout from the previous backlight shutoff puts LCD_SLEEP on the queue
3) BACKLIGHT_ON is received and the backlight is turned on. The call of backlight_lcd_sleep_countdown(false) does nothing about the LCD_SLEEP on the queue.
5) The LCD_SLEEP is received, putting the LCD to sleep again.

This patch causes LCD_SLEEP to be ignored by backlight_thread() when the backlight is on. It also removes a workaround from 5G iPod LCD sleep code.

This issue is theoretical and very improbable. I am not able to trigger this bug in r20774.

We're talking about this all within 1 tick right?

Anyway, I think the calls to sleep_countdown(), lcd_sleep()/awake(), lcd_enable() should be moved into backlight.c, this isn't really target specific afterall.

The window of opportunity may be much shorter than a tick. The tick interrupt has to occur between queue_post(&backlight_queue, BACKLIGHT_ON, 0) in backlight_on() and backlight_lcd_sleep_countdown(false) in _backlight_on(). (_backlight_on() is in backlight.c when using PWM fading. Otherwise it is in target-specific backlight code.)

Anyway, I think the calls to sleep_countdown(), lcd_sleep()/awake(), lcd_enable() should be moved into backlight.c, this isn't really target specific afterall.

I agree that the functions should be moved, but that won't fix this bug, so it requires yet another backlight-related task. I'm willing to do it.

For this patch, wouldn't it also work (or be even better(?)) LCD_SLEEP is not even posted if BACKLIGHT_ON isn't handled yet?
Something like:
if (queue_empty(&backlight_queue)) queue_post(&backlight_queue, LCD_SLEEP, 0)

if (queue_empty(&backlight_queue)) queue_post(&backlight_queue, LCD_SLEEP, 0)

That still leaves a very small window of vulnerability, between the queue_wait that gets BACKLIGHT_ON out of the queue and backlight_lcd_sleep_countdown(false).

I don't quite understand. Reading your first post again, it seems your identify the problem as being "putting LCD_SLEEP when BACKLIGHT_ON has just been posted)". This can only happen if the backlight thread didn't run between that point (which is already uncredible unlikely, and you even said you cannot reproduce it anymore).

With what I proposed, it should be impossible that LCD_SLEEP is posted until BACKLIGHT_ON has been handled.

Or am I wrong? Not allowing LCD_SLEEP with backlight being on seems weird to me too, but on the other hand, I can imagine requiring backlight_off() before is reasonable too.

LCD_SLEEP is posted from backlight_tick() and that is called from a timer interrupt. This means it can be posted any time before backlight_lcd_sleep_countdown(false) sets lcd_sleep_timer to 0. I didn't mean to imply that LCD_SLEEP has to be put on the queue while BACKLIGHT_ON is still on the queue.

Actually, now I see the same issue with the three …light_off() functions which are called from backlight_tick(). In those cases, if the timer expires between posting of the corresponding …LIGHT_ON and the reset of the corresponding timer, the button press fails to keep the light on or turn it back on. Here the backlight thread could be made to turn off the light only if the corresponding timer is zero, and …light_off() calls from other parts of Rockbox could zero the timer before posting the …LIGHT_OFF message.

I know I'm being very pedantic here. I noticed this possibility when I was preparing to commit  FS#10129  and wondering if I could remove the workaround in backlight-nano_video.c. (Removal of that workaround is the backlight-nano_video.c part of prevent_unwanted_lcd_sleep.patch)

But nothing except PWM fading uses a external timer or interrupt. Everything else uses the "ordinary" tick task timer. And what you mentioned is that we don't "allow" calling the _*light_off functions from outside the backlight thread, but only the normal *light_off ones (without underscore in the beginning).

The backlight tick is always called in the normal timer interrupt. backlight_lcd_sleep_countdown(false) shouldn't actually be called outside the backlight thread. Most targets I saw though calls it in their backlight driver in _backlight_off (which in turn is only called from the backlight thread) which should be fine.

It seems that at least on PP502x, tick tasks are called from a timer interrupt. In particular, irq_handler() (in system-pp502x.c) calls TIMER1() (in kernel-pp.c) which then calls call_tick_tasks() (inline, in kernel.h), which in turn calls backlight_tick().

I wasn't talking about functions which start with an underscore. By "…" I meant something like a wildcard; I didn't use * because that has another meaning in C. I guess * would have been clearer. Sorry about how my second paragraph was unclear. What I meant is the following sequence of events:
1) Key press results in backlight_on() and BACKLIGHT_ON is posted.
3) Before backlight_timer is updated in backlight_update_state(), backlight_tick() is called from the timer interrupt, the timer expires, and BACKLIGHT_OFF is posted.
3) The BACKLIGHT_OFF causes backlight_timer to be set to zero and the backlight to be turned off.
I could say the same thing for the buttonlight and remote backlight.

As an unrelated issue: _*light_on() and _*light_off() functions can be called from outside backlight_thread() when changing backlight timeout settings. See  FS#10130 .

All tick tasks are called via a timer interrupt on all targets. And the backlight tick runs once per tick (and can only post 1 message).

Keypress scans are also run in a tick task (button tick), and can also only post 1 messsage to the backlight queue (newer ipods and sansa wheels are special, they're read in an interrupt, but they only post to the button queue if it's empty). Anyway, the button tick will do the post to the backlight queue, which is also only per tick.

It doesn't really matter that the tick tasks are interrupts, since all are run only every 10ms.

Thus, I'm not entirely sure if I understand the issue.

RE:  FS#10130  (maybe this should go there, but meh), that's indeed weird and should be done cleaner.

I put the following in one of the battery debug screens:
long wait = current_tick;
while (!TIME_AFTER(current_tick, wait + HZ*2));
That doesn't affect tick functions, since they execute via an interrupt, but it blocks backlight_thread() from executing during those waits, because the while loop doesn't yield. That extends the window of opportunity. After button_tick() posts BACKLIGHT_ON, there can be up to two seconds until backlight_thread() can run and cancel the LCD sleep countdown via backlight_lcd_sleep_countdown(false). If lcd_sleep_timer gets to zero during that time, LCD_SLEEP will be posted.

I set the LCD sleep timer to 5 seconds. If I waited about 4 seconds after fade-out and then pressed the select button, the backlight would come on soon after, and then maybe two seconds after that the LCD would go white. I wasn't always able to do this; I managed to do it on maybe half of my attempts. prevent_unwanted_lcd_sleep.patch did not seem to help. I thought that was probably because the one second backlight timeout had already expired, so I changed the test to "if ((bl_dim_current|bl_dim_target) == 0)". That made the problem go away. Increasing the backlight timeout to 3 s also made the problem go away.

Without the patch, I could also reproduce the problem when the unyielding wait was HZ/2. It was just harder.

So, I'm still thinking checking queue_empty before posting LCD_SLEEP would fix it.

This will prevent LCD_SLEEP from being posted until BACKLIGHT_ON is handled (i.e. until queue_wait returns with being BACKLIGHT_ON).
This means that the only possibility for LCD_SLEEP is *after* queue_wait, but after queue_wait there's already directly the switch-case, i.e. only some clocks until backlight_update_state is executed.

Or are we just not yielding enough at some point?

The timer interrupt can still occur in the few clocks between queue_wait and "lcd_sleep_timer = 0". The probability of the timer interrupt happening then is very low, and in addition that would have to be the tick which makes lcd_sleep_timer reach zero. Together, this is such a low probability that I guess it is good enough as a solution.

I would rather check once LCD_SLEEP is received, because I think that makes the problem impossible, and I think that's better than making the problem very improbable.
I don't think the LCD should ever sleep while the backlight is on, so I don't see any disadvantages. I do have one concern however: since LCD sleep is under target control on some targets, I'm not absolutely certain about correctness. I think the solution to that is what you said earlier:
> Anyway, I think the calls to sleep_countdown(), lcd_sleep()/awake(), lcd_enable() should be moved into backlight.c, this isn't really target specific afterall.
Once that has been done, we can be sure that LCD_SLEEP can be ignored when backlight is on.

Well, I'm just thinking your fix equals disbelieving in our tick task/thread/timer interrupt system. If you argue like that (which is perfectly reasonable) you'll want to fix similar situations in dozens of other places in Rockbox too.

Checking for backlight on isn't 100% safe either. Because between posting BACKLIGHT_ON and actually is_backlight_on() returning true there's quite a few clocks within too (upto 1 tick)

With either fix you have a very few clocks of "undefined behavior". But I think my is even shorter. You need to wait from "post BACKLIGHT_ON" until "backlight_update_state returns". With my way, you only safe the time until the next tick, only having the time after the return of queue_wait until the return of backlight_update_state.

Or is my thinking wrong?

MikeS commented on 2009-04-23 21:49

I think something related actually happened to me one time on Gigabeat S, and only one time. It got stuck where the backlight didn't come on again and is probably related. Things should work, whether the required timing is improbable or not.

The best solution would be to get rid of the backlight tick (it's really quite pointless imho) and use a ⇐ 1s timeout on thread, performing all functions on the backlight thread itself. I pondered doing it a couple times since it probably wouldn't even be that complicated a change overall.

I don't think there's any undefined behaviour when doing the check in the LCD_SLEEP case.
If the timer expires before BACKLIGHT_ON is put in the queue, effectively the backlight was turned on too late to prevent the LCD from going to sleep. LCD_SLEEP will go on the queue before BACKLIGHT_ON, and the LCD will go to sleep, wake and then the backlight will go on.
If the timer expires after BACKLIGHT_ON was put in the queue (including after BACKLIGHT_ON was taken out of the queue), code executed in response to BACKLIGHT_ON will finish before backlight_thread() does another queue_wait to get LCD_SLEEP. So, when LCD_SLEEP is executed, it will see everything that was done in response to BACKLIGHT_ON. There is no way for the LCD_SLEEP case to see only part of what the BACKLIGHT_ON case does, because all of that code is in one thread.

I originally wasn't questioning every instance of the task/thread/timer interrupt system. But in backlight.c, there certainly are more issues. For example, in the previous paragraph, think of the backlight timer and BACKLIGHT_OFF instead of LCD_SLEEP. Just like the LCD sleep timer, the backlight timer can expire after BACKLIGHT_ON was put on the thread but before the timer was reset. In that case the timer reset is ignored; effectively, that keypress was ignored for backlight purposes. (I'm focusing on the LCD sleep timer more because it seems like a worse problem. It leads a bad state: backlight on but LCD sleeping. That's worse than ignoring a keypress for backlight purposes.)

So far, I don't see any way for anything to be permanently stuck. If a keypress fails to keep the backlight on or turn it on, the next press should turn it on. If the backlight is on and the LCD is sleeping, letting the backlight turn off and then turning it back on must wake the LCD.

I agree that getting rid of backlight_tick would be good. Why does code need to always run 100 times a second to handle occasional timeouts which are several seconds in length? I wasn't intending to propose such a major rewrite of backlight code though.

Not a major rewrite at all :)

edit: updated patch slightly

You're right, not a major rewrite. Thanks! I do have some concerns about the patch however:
1) The MIN in "unsigned long delta = MIN(HZ,(unsigned long)(current_tick - last_tick));" seems unneeded and wrong. When fading uses SYS_TIMEOUT, timeouts are 20 to 40 ms, and the MIN would make all the timers run too quickly.
2) Interrupts between the queue_wait_w_tmo and the queue_post resulting from a timer running out can cause the sorts of problems I described. Since backlight_timeouts_handler() is executed within backlight_thread(), there is no need to use the backlight_queue; the functions which perform the actual work can be called directly.
3) I think "if (buttonlight_timer == 0)" is wrong; it should be ⇐ 0 like the others.
4) There's still LCD_SLEEP which can be posted after the PWM fade is complete when "LCD Sleep (After backlight off)" is set to always. I guess that could be transformed to a 1-tick delay, though that can become a 1-second delay. (I'm sorry if  FS#10129  makes things harder. Perhaps I should have delayed that patch until these things are ironed out.)
5) I'd like to use the shortest time remaining (of all the timers) as a timeout. Then, when no timer is active, the thread could just call queue_wait, and do nothing until it's needed. It would also avoid delaying some timers by up to a second.
6) I'm tempted to replace timers which count down with the tick number when the timer will run out, to avoid cumulative error, but that kind of precision is unnecessary.

I'm attaching an patch with modifications 1, 2 and 3. It's late now; I'll look at the other stuff tomorrow.

The MIN() is needed. If you press buttons, SYS_TIMEOUT will not happen, which leads to increasing delta between current_tick and last_tick (more than HZ). And this will make the timeout calculation wrong (if the delta is already say 8*HZ, the light will already turn of 7 seconds after stopping pressing buttons, not 15 (given that 15s backlight timeout is set)).

5) sounds reasonable. But maybe not changing too much at once is better :)

Re: your other concerns. This was more a proof-of-concept (to show it's not a major rewrite), surely it can be optimized :)

Ok, now I understand the increasing delta issue. (Last night I was wondering why the backlight seemed to turn off too soon sometimes. This was the reason.)
Sorry, I was confused about the MIN; my point 1) above is incorrect.

Hm, I agree that the backlight tick seems obsolete and the patch works quite well. I guess we could commit your version with the MIN fixed, and mess your other 3 concerns later?


Available keyboard shortcuts


Task Details

Task Editing