[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-users] Watchdog and live migration



>>> On 16.03.12 at 15:58, Ian Campbell <Ian.Campbell@xxxxxxxxxx> wrote:
> On Fri, 2012-03-16 at 14:34 +0000, Jan Beulich wrote:
>> >>> On 16.03.12 at 13:32, Ian Campbell <Ian.Campbell@xxxxxxxxxx> wrote:
>> > I think this is a bug, moving to xen-devel@ and CCing the drivers author
>> > (Hi Jan).
>> > 
>> > On Fri, 2012-03-16 at 12:29 +0000, Wouter de Geus wrote:
>> >> * Ian Campbell <Ian.Campbell@xxxxxxxxxx> [2012-03-16 10:55:34 +0000]:
>> >> 
>> >> > > In the domU a /dev/watchdog exists, but I never touch it myself.
>> >> > 
>> >> > Does your distro start a watchdogd for you?
>> >> > 
>> >> > Even if it did then it should obviously continue to poke the watchdog
>> >> > even after a migration, but it might provide some hints.
>> >> 
>> >> Nope, it doesn't.
>> >> As an experiemtn I tried sending a 'V' to /dev/watchdog just after 
>> >> migration, which seems to be the magic byte to stop the watchdog 
>> >> according to the linux kernel docs.
>> >> 
>> >> (according to Documentation/watchdog/watchdog-api.txt) - 
>> >> "If a driver supports "Magic Close", the driver will not disable the
>> >> watchdog unless a specific magic character 'V' has been sent to
>> >> /dev/watchdog just before closing the file."
>> >> 
>> >> Surprisingly, the domU seemed to stay alive after doing this.
>> >> So it would appear that somehow the watchdog is triggered after migration.
>> >> No idea why though.
>> > 
>> > drivers/watchdog/xen_wdt.c:xen_wdt_resume() unconditionally calls
>> > xen_wdt_start(). Shouldn't it only do this if the wdt is active?
>> 
>> Could you give the patch below a try?
>> 
>> Jan
>> 
>> --- a/drivers/watchdog/xen_wdt.c
>> +++ b/drivers/watchdog/xen_wdt.c
>> @@ -297,11 +297,19 @@ static void xen_wdt_shutdown(struct plat
>>  
>>  static int xen_wdt_suspend(struct platform_device *dev, pm_message_t state)
>>  {
>> -    return xen_wdt_stop();
>> +    typeof(wdt.id) id = wdt.id;
> 
> typeof here is a bit odd.

But I want to match the original field's type.

>> +    int rc = xen_wdt_stop();
>> +
>> +    wdt.id = id;
>> +
>> +    return rc;
>>  }
>>  
>>  static int xen_wdt_resume(struct platform_device *dev)
>>  {
>> +    if (!wdt.id)
>> +            return 0;
> 
> Can't you check is_active instead and avoid having to play tricks in
> xen_wdt_suspend to preserve a non-0 wdt.id when the watchdog is active?

I first thought of this too, but is_active doesn't represent whether
a watchdog is actually engaged - it merely says whether the
watchdog device is currently open (but watchdog setup itself
may have failed).

Looking at the code for this again, I see another problem though:
is_active gets cleared in xen_wdt_release() even when a release
was not expected (similar to how iTCO_wdt, pcwd_pci, or pcwd_usb
behave, but imo buggy nevertheless), or when xen_wdt_stop()
failed.

Jan

>> +    wdt.id = 0;
>>      return xen_wdt_start();
>>  }
>>  
>> 
>> 




_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxx
http://lists.xen.org/xen-users


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.