[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 08/16] xen/arm: add watchdog domain suspend/resume helpers



Hi,

On Thu, Mar 13, 2025 at 5:34 PM Jan Beulich <jbeulich@xxxxxxxx> wrote:
>
> On 05.03.2025 10:11, Mykola Kvach wrote:
> > From: Mykola Kvach <mykola_kvach@xxxxxxxx>
> >
> > This patch implements suspend/resume helpers for the watchdog.
> > While a domain is suspended its watchdogs must be paused. Otherwise,
> > if the domain stays in the suspend state for a longer period of time
> > compared to the watchdog period, the domain would be shutdown on resume.
> > Proper solution to this problem is to stop (suspend) the watchdog timers
> > after the domain suspends and to restart (resume) the watchdog timers
> > before the domain resumes. The suspend/resume of watchdog timers is done
> > in Xen and is invisible to the guests.
> >
> > Signed-off-by: Mirela Simonovic <mirela.simonovic@xxxxxxxxxx>
>
> From: != first S-o-b: is always raising the question of who's the original
> author of a patch.

I'll try to change the From field if possible. Thank you for pointing that out.

>
> > --- a/xen/common/sched/core.c
> > +++ b/xen/common/sched/core.c
> > @@ -1605,6 +1605,45 @@ void watchdog_domain_destroy(struct domain *d)
> >          kill_timer(&d->watchdog_timer[i].timer);
> >  }
> >
> > +#ifdef CONFIG_SYSTEM_SUSPEND
> > +
> > +void watchdog_domain_suspend(struct domain *d)
> > +{
> > +    unsigned int i;
> > +
> > +    spin_lock(&d->watchdog_lock);
> > +
> > +    for ( i = 0; i < NR_DOMAIN_WATCHDOG_TIMERS; i++ )
> > +    {
> > +        if ( test_bit(i, &d->watchdog_inuse_map) )
> > +        {
> > +            stop_timer(&d->watchdog_timer[i].timer);
> > +        }
>
> We generally prefer to omit the braces if the body of an if() (or
> whatever it is) is a single statement / line.

will change

>
> > +    }
> > +
> > +    spin_unlock(&d->watchdog_lock);
> > +}
> > +
> > +void watchdog_domain_resume(struct domain *d)
> > +{
> > +    unsigned int i;
> > +
> > +    spin_lock(&d->watchdog_lock);
> > +
> > +    for ( i = 0; i < NR_DOMAIN_WATCHDOG_TIMERS; i++ )
> > +    {
> > +        if ( test_bit(i, &d->watchdog_inuse_map) )
> > +        {
> > +            set_timer(&d->watchdog_timer[i].timer,
> > +                      NOW() + SECONDS(d->watchdog_timer[i].timeout));
>
> The timeout may have almost expired before suspending; restoring to the
> full original period feels wrong. At the very least, if that's indeed
> intended behavior, imo this needs spelling out explicitly.

It takes some time to wake up the domain, but the watchdog timeout is
reset by a userspace daemon. As a result, we can easily encounter a
watchdog trigger during the  resume process. It looks like we should
stop the watchdog timer from the guest, and in that case, we can drop
all changes related to the watchdog in this patch series.
In any case, in this patch, we restore the default timeout instead of
using the real remaining time, so the behavior won't change. However,
I'm not sure exactly how much effort this would require. We can
enable/disable the watchdog using the Linux kernel driver and the Xen
watchdog daemon, but the Linux kernel already handles suspend/resume
of the Xen watchdog timer.

>
> > --- a/xen/include/xen/sched.h
> > +++ b/xen/include/xen/sched.h
> > @@ -1109,6 +1109,15 @@ void scheduler_disable(void);
> >  void watchdog_domain_init(struct domain *d);
> >  void watchdog_domain_destroy(struct domain *d);
> >
> > +#ifdef CONFIG_SYSTEM_SUSPEND
> > +/*
> > + * Suspend/resume watchdogs of domain (while the domain is suspended its
> > + * watchdogs should be on pause)
> > + */
> > +void watchdog_domain_suspend(struct domain *d);
> > +void watchdog_domain_resume(struct domain *d);
> > +#endif /* CONFIG_SYSTEM_SUSPEND */
>
> I don't think the #ifdef is strictly needed here?

ok, I'll drop it

>
> Jan

Best regards,
Mykola



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.