[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 08/16] xen/arm: add watchdog domain suspend/resume helpers



On Fri, Mar 21, 2025 at 4:04 PM Jan Beulich <jbeulich@xxxxxxxx> wrote:
>
> On 21.03.2025 10:50, Mykola Kvach wrote:
> > On Thu, Mar 13, 2025 at 5:34 PM Jan Beulich <jbeulich@xxxxxxxx> wrote:
> >> On 05.03.2025 10:11, Mykola Kvach wrote:
> >>> +void watchdog_domain_resume(struct domain *d)
> >>> +{
> >>> +    unsigned int i;
> >>> +
> >>> +    spin_lock(&d->watchdog_lock);
> >>> +
> >>> +    for ( i = 0; i < NR_DOMAIN_WATCHDOG_TIMERS; i++ )
> >>> +    {
> >>> +        if ( test_bit(i, &d->watchdog_inuse_map) )
> >>> +        {
> >>> +            set_timer(&d->watchdog_timer[i].timer,
> >>> +                      NOW() + SECONDS(d->watchdog_timer[i].timeout));
> >>
> >> The timeout may have almost expired before suspending; restoring to the
> >> full original period feels wrong. At the very least, if that's indeed
> >> intended behavior, imo this needs spelling out explicitly.
> >
> > It takes some time to wake up the domain, but the watchdog timeout is
> > reset by a userspace daemon. As a result, we can easily encounter a
> > watchdog trigger during the  resume process.
>
> Which may mean the restoring is done too early, or needs doing in two
> phases.
>
> > It looks like we should
> > stop the watchdog timer from the guest, and in that case, we can drop
> > all changes related to the watchdog in this patch series.
>
> Except that then you require a guest to be aware of host suspend. Which
> may not be desirable.

I think this is not a problem; at least, I don't see how the guest
could be aware of the host suspend.

For now, we have three cases:

1) The guest is suspended (actually paused) because the system
suspends, and we pause all non-hardware domains.
2) The guest is suspended via the `xl` tool (x86 only, at least for now).
3) The guest requests S2R via `echo mem > /sys/power/state` or
`systemctl suspend`.

Let's review all these cases:

1) There is no action required here; it should be handled correctly by
domain pause. However, I think it is not handled properly right
now—but that is not an issue with the current patch series.
2) There are potential problems here. We should either notify the
domain that it will be suspended (which is hard to implement and the
guest will be aware of the host suspending) or suspend watchdog
directly during the execution of `xl` commands (more preferable).
3) As far as I know, if `watchdogd` is running, we can simply add an
action to it on suspend/resume events (need to review not Linux kernel
cases). In the case of the Linux kernel driver, it already handles
suspending/resuming the Xen watchdog correctly.

So, if I am not mistaken, we can drop all patches related to watchdog
suspend in Xen until `xl suspend/resume` for ARM64 is implemented. For
other cases, we should handle suspend/resume of the watchdog via the
`watchdogd` service.

Note: As far as I know, only the control domain has `watchdogd`
(though we could potentially set it up for other domains). DomUs can
only use the Xen watchdog Linux kernel driver.

>
> Jan

~Mykola



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.