|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH] xen/vcpu: remove vcpu_set_singleshot_timer flags field
On 19.04.2023 11:02, Roger Pau Monné wrote:
> On Wed, Apr 19, 2023 at 09:07:41AM +0200, Jan Beulich wrote:
>> On 18.04.2023 17:54, Andrew Cooper wrote:
>>> On 18/04/2023 4:42 pm, Roger Pau Monne wrote:
>>>> The addition of the flags field in the vcpu_set_singleshot_timer in
>>>> 505ef3ea8687 is an ABI breakage, as the size of the structure is
>>>> increased.
>>>>
>>>> Remove such field addition and drop the implementation of the
>>>> VCPU_SSHOTTMR_future flag. If a timer provides an expired timeout
>>>> value just inject the timer interrupt.
>>>>
>>>> Bump the Xen interface version, and keep the flags field and
>>>> VCPU_SSHOTTMR_future available for guests using the old interface.
>>>>
>>>> Note the removal of the field from the vcpu_set_singleshot_timer
>>>> struct allows removing the compat translation of the struct.
>>>>
>>>> Fixes: 505ef3ea8687 ('Add flags field to VCPUOP_set_singlsehot_timer.')
>>>> Reported-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
>>>> Signed-off-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>
>>>
>>> While everything said is true, this isn't the reason to to get rid of
>>> VCPU_SSHOTTMR_future
>>>
>>> It 505ef3ea8687 does appear to have been an ABI break, that's
>>> incidental. It dates from 2007 so whatever we have now is the de-facto
>>> ABI, whether we like it or not.
>>>
>>> The reason to delete this is because it is a monumentality and entirely
>>> stupid idea which should have been rejected outright at the time, and
>>> the only guest we can find which uses it also BUG_ON()'s in response to
>>> -ETIME.
>>
>> The instance in Linux (up to 4.6) that I could find was
>>
>> BUG_ON(ret != 0 && ret != -ETIME);
>>
>> i.e. not really dying when getting back -ETIME. (And if there really was
>> a BUG_ON(ret) somewhere despite setting the flag, it would be a bug there,
>> not something to "fix" in Xen.) I'm afraid I also disagree on "stupid
>> idea" as well as ...
>
> The logic in old Linux is indeed 'fine' in the sense that it doesn't
> hit a BUG_ON.
>
> The problem we are seeing is that when logdirty is enabled on a guest
> with 32vCPUs (and without any kind of logdirty hardware assistance)
> the contention on the p2m lock is so high that by the time
> VCPUOP_set_singleshot_timer has copied the hypercall data from HVM
> context the provided timeout has already expired, leading to:
>
> [ 65.543736] hrtimer: interrupt took 10817714 ns
> [ 65.514171] CE: xen increased min_delta_ns to 150000 nsec
> [ 65.514171] CE: xen increased min_delta_ns to 225000 nsec
> [ 65.514171] CE: xen increased min_delta_ns to 337500 nsec
> [ 65.566495] CE: xen increased min_delta_ns to 150000 nsec
> [ 65.514171] CE: xen increased min_delta_ns to 506250 nsec
> [ 65.573088] CE: xen increased min_delta_ns to 150000 nsec
> [ 65.572884] CE: xen increased min_delta_ns to 150000 nsec
> [ 65.514171] CE: xen increased min_delta_ns to 759375 nsec
> [ 65.638644] CE: xen increased min_delta_ns to 150000 nsec
> [ 65.566495] CE: xen increased min_delta_ns to 225000 nsec
> [ 65.514171] CE: xen increased min_delta_ns to 1000000 nsec
> [ 65.572884] CE: xen increased min_delta_ns to 225000 nsec
> [ 65.573088] CE: xen increased min_delta_ns to 225000 nsec
> [ 65.630224] CE: xen increased min_delta_ns to 150000 nsec
> ...
>
> xenrt1062821 login: [ 82.752788] CE: Reprogramming failure. Giving up
> [ 82.779470] CE: xen increased min_delta_ns to 1000000 nsec
> [ 82.793075] CE: Reprogramming failure. Giving up
> [ 82.779470] CE: Reprogramming failure. Giving up
> [ 82.821864] CE: xen increased min_delta_ns to 506250 nsec
> [ 82.821864] CE: xen increased min_delta_ns to 759375 nsec
> [ 82.821864] CE: xen increased min_delta_ns to 1000000 nsec
> [ 82.821864] CE: Reprogramming failure. Giving up
> [ 82.856256] CE: Reprogramming failure. Giving up
> [ 84.566279] CE: Reprogramming failure. Giving up
> [ 84.649493] Freezing user space processes ...
> [ 130.604032] INFO: rcu_sched detected stalls on CPUs/tasks: { 14} (detected
> by 10, t=60002 jiffies, g=4006, c=4005, q=14130)
> [ 130.604032] Task dump for CPU 14:
> [ 130.604032] swapper/14 R running task 0 0 1
> 0x00000000
> [ 130.604032] Call Trace:
> [ 130.604032] [<ffffffff90160f5d>] ? rcu_eqs_enter_common.isra.30+0x3d/0xf0
> [ 130.604032] [<ffffffff907b9bde>] ? default_idle+0x1e/0xd0
> [ 130.604032] [<ffffffff90039570>] ? arch_cpu_idle+0x20/0xc0
> [ 130.604032] [<ffffffff9010820a>] ? cpu_startup_entry+0x14a/0x1e0
> [ 130.604032] [<ffffffff9005d3a7>] ? start_secondary+0x1f7/0x270
> [ 130.604032] [<ffffffff900000d5>] ? start_cpu+0x5/0x14
> [ 549.654536] INFO: rcu_sched detected stalls on CPUs/tasks: { 26} (detected
> by 24, t=60002 jiffies, g=6922, c=6921, q=7013)
> [ 549.655463] Task dump for CPU 26:
> [ 549.655463] swapper/26 R running task 0 0 1
> 0x00000000
> [ 549.655463] Call Trace:
> [ 549.655463] [<ffffffff90160f5d>] ? rcu_eqs_enter_common.isra.30+0x3d/0xf0
> [ 549.655463] [<ffffffff907b9bde>] ? default_idle+0x1e/0xd0
> [ 549.655463] [<ffffffff90039570>] ? arch_cpu_idle+0x20/0xc0
> [ 549.655463] [<ffffffff9010820a>] ? cpu_startup_entry+0x14a/0x1e0
> [ 549.655463] [<ffffffff9005d3a7>] ? start_secondary+0x1f7/0x270
> [ 549.655463] [<ffffffff900000d5>] ? start_cpu+0x5/0x14
> [ 821.888478] INFO: rcu_sched detected stalls on CPUs/tasks: { 26} (detected
> by 24, t=60002 jiffies, g=8499, c=8498, q=7664)
> [ 821.888596] Task dump for CPU 26:
> [ 821.888622] swapper/26 R running task 0 0 1
> 0x00000000
> [ 821.888677] Call Trace:
> [ 821.888712] [<ffffffff90160f5d>] ? rcu_eqs_enter_common.isra.30+0x3d/0xf0
> [ 821.888771] [<ffffffff907b9bde>] ? default_idle+0x1e/0xd0
> [ 821.888818] [<ffffffff90039570>] ? arch_cpu_idle+0x20/0xc0
> [ 821.888865] [<ffffffff9010820a>] ? cpu_startup_entry+0x14a/0x1e0
> [ 821.888917] [<ffffffff9005d3a7>] ? start_secondary+0x1f7/0x270
> [ 821.888966] [<ffffffff900000d5>] ? start_cpu+0x5/0x14
>
> At some point Linux simply gives up trying to reprogram the timer, and
> that obviously lead to CPU stalls.
And that's all with old enough Linux then, I suppose?
> Ignoring the VCPU_SSHOTTMR_future flag allows the guest to survive, by
> not returning ETIME and just injecting the expired interrupt.
>
> Overall I'm not sure how useful VCPU_SSHOTTMR_future is at least when
> implemented as done currently in Linux.
>
> Instead of trying to reprogram the timer Linux should do the
> equivalent of self-inject a timer interrupt in order to cope with the
> fact that the selected timeout has already expired.
Indeed - that's what I was expecting would be happening. But I didn't
go check their code ... Yet them getting it wrong still isn't a reason
to ignore the request, at least not unconditionally. OSes could be
getting it right, and they could then benefit from the avoided event.
As to "unconditionally": Introducing a per-guest control is likely too
much overhead for something that, aiui, isn't commonly used (anymore).
But tying this to a command line option might make sense - engaging it
shouldn't (hopefully) lead to misbehavior in guests properly using the
flag, so ought to be okay to enable in a system-wide manner.
I vaguely recall considerations for similar overrides to hypercall
behavior in other areas, so such an option - if made extensible -
might find further uses down the road.
Jan
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |