RE: [Xen-devel] [PATCH] Fix softlockup issue after vcpu hotplug
> No, the patch that Kevin provided cannot work because it touches the
> watchdog before jiffies has been updated. Since both jiffy update and
> watchdog check happens inside do_timer(), this is a hard problem to fix
> for Linux 2.6.16. You could push the watchdog touch inside the following
> loop that calls do_timer(): I think that would work!

OK, I've spent a little time to really understand this today (hopefully!) and I think I now know why none of the patches to date (for 2.6.16 anyway) work -- the problem is that they only touched the watchdog timer once, BUT timer_interrupt in time-xen.c has a loop that repeatedly calls do_timer to advance the jiffies and check for timeout until the entire delta time since the last call is accounted for. Any single one of those do_timer calls might result in a watchdog timer expiration.

It's also not really correct to only touch the watchdog if the stolen time is > 5s -- you might currently be sitting at 8s since the watchdog was last updated and get called after 2s of stolen time, and that will cause a timeout. What's more, if you get called with more than 20s of stolen time (e.g. after save/restore or pause/unpause), you really need to tickle the watchdog timer multiple times (at least once for every 10s worth of jiffies in the total stolen time).

So -- my proposal (patch attached for 2.6.16) is to touch the watchdog inside the loop that calls do_timer(), right after the call, IF the remaining amount of stolen time is greater than NS_PER_TICK. Since each call to do_timer advances jiffies by one, this could only go wrong if there was only a single jiffy left until the watchdog timer expires on entry, and I think that's OK!

I also considered only touching the watchdog timer every 5s or so, but I think the code to do that would have more overhead than simply touching it for every do_timer() call (since it's just a call that copies jiffies to the per-cpu watchdog timer value).
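To make the reasoning about the catch-up loop concrete, here is a small user-space model of it. This is only a sketch and not the attached patch: `struct sim`, `do_timer_model()`, `catch_up()` and `simulate()` are hypothetical names, HZ=100 is an assumed tick rate, the 10s threshold is taken from the figures used in this message, and the whole delta is treated as stolen time for simplicity.

```c
#include <assert.h>
#include <stdint.h>

#define HZ           100                 /* assumed tick rate */
#define NS_PER_SEC   1000000000LL
#define NS_PER_TICK  (NS_PER_SEC / HZ)
#define LOCKUP_TICKS (10 * HZ)           /* assumed 10s soft-lockup threshold */

/* Hypothetical stand-in for the kernel state involved. */
struct sim {
    uint64_t jiffies;          /* global tick counter */
    uint64_t touch_ts;         /* per-cpu watchdog timestamp */
    int      lockup_reported;  /* set once the threshold is exceeded */
};

/* Models touching the watchdog: copy jiffies to the timestamp. */
static void touch_watchdog(struct sim *s)
{
    s->touch_ts = s->jiffies;
}

/* Models do_timer(): advance jiffies by one, then run the watchdog check. */
static void do_timer_model(struct sim *s)
{
    s->jiffies++;
    if (s->jiffies - s->touch_ts > LOCKUP_TICKS)
        s->lockup_reported = 1;
}

/* Models the catch-up loop: one do_timer call per tick of pending time.
 * With touch_in_loop set, the watchdog is touched after each call while
 * more than one tick of time remains to be accounted for. */
static int catch_up(struct sim *s, int64_t delta_ns, int touch_in_loop)
{
    assert(delta_ns >= 0);
    while (delta_ns >= NS_PER_TICK) {
        delta_ns -= NS_PER_TICK;
        do_timer_model(s);
        if (touch_in_loop && delta_ns > NS_PER_TICK)
            touch_watchdog(s);
    }
    return s->lockup_reported;
}

/* Start with the watchdog last touched ticks_since_touch ticks ago,
 * then account for stolen_ns of stolen time. Returns 1 on a lockup report. */
int simulate(uint64_t ticks_since_touch, int64_t stolen_ns, int touch_in_loop)
{
    struct sim s = { ticks_since_touch, 0, 0 };
    return catch_up(&s, stolen_ns, touch_in_loop);
}
```

In this model, `simulate(8 * HZ, 3 * NS_PER_SEC, 0)` reports a lockup while the same call with `touch_in_loop` set to 1 does not. The single-jiffy corner case mentioned above still shows up: `simulate(10 * HZ, 3 * NS_PER_SEC, 1)` reports a lockup because the first do_timer call crosses the threshold before any touch happens.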
Take a look and let me know what you think (the printk could be removed -- I just put it in so I could tell the code was running).

Simon

Attachment: softlockup.patch

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel