[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[PATCH v3] xen/vcpu: ignore VCPU_SSHOTTMR_future


  • To: xen-devel@xxxxxxxxxxxxxxxxxxxx
  • From: Roger Pau Monne <roger.pau@xxxxxxxxxx>
  • Date: Wed, 19 Apr 2023 16:31:55 +0200
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=citrix.com; dmarc=pass action=none header.from=citrix.com; dkim=pass header.d=citrix.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=Lj4RJ6agQM5HczTdtZAY9gfemG/J0BB2RY0p7h0DKPM=; b=KPZs+JfClQ3Mi95dnqne6t+2T2qJwVQ9uXmzS/8iYqpAYibNgSnyXvDJFld+yyfhwAe0sCZf3CsjkkIapUssI1GnlRJSct4t1aNjsv34oQzvhqjI5j8T952fq9nPh3xrCqs5u47aMwbcEpFsKGUqFErztuEJaNg8MQokEkqI0YwGpgH9fOACQHC6Zrof3++wFpsnTwPabwsi8Q5BaaGVWMH6XxM34+H3kU1sjbKCQbSyvnP0oMfji9mSiwT/NTTuqxV/Jl1mMUkYI+PK18pDIOMfM3rk7Ro5CwSeuVlaGTR5QBoZZycdXHY1u/LNnc0TJLOihsXaQK58Q3WIJaNRKw==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=FTFHBd25KYuz5FpXNgfT0MHtdG8XrxA5w2yOpH9WEPYbJiZIB6Vm3DYGpFUusff2nfre0TnrGHlr/X1myOvgRsBS4CLBFwsbY4abYVPrU0u0irY5Ix/a6VkTGjj7n2paIoAqcpdtsZFpVgEHwnw6M5HalDmovC9/Mlsne3lNPEz+GwGfR3cR75B2XYBle3FKyBb5VzZyUY2d00UBlSqmoPSEMX2/dlbz7EPtU+gtMJlzGk4phbjSlpOQw2aY3PIKAeJrkA1x+YQj346w9CoWQJLav8okdmd2k+fDKaqoebokXVPapK9ynDBJD78WGfk7FctB0WFguW27ZbKjSBUUaw==
  • Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=citrix.com;
  • Cc: Roger Pau Monne <roger.pau@xxxxxxxxxx>, Henry Wang <Henry.Wang@xxxxxxx>, Community Manager <community.manager@xxxxxxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>, Julien Grall <julien@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>
  • Delivery-date: Wed, 19 Apr 2023 14:32:25 +0000
  • Ironport-data: A9a23:+lbQSK1CHmI7NE9BafbD5Vdwkn2cJEfYwER7XKvMYLTBsI5bpzEBz jFLD2mAOfeJZjT0LtFwPoy+/EsA6JLWn9diQAo5pC1hF35El5HIVI+TRqvS04F+DeWYFR46s J9OAjXkBJppJpMJjk71atANlVEliefTAOK6ULWeUsxIbVcMYD87jh5+kPIOjIdtgNyoayuAo tq3qMDEULOf82cc3lk8tuTS+XuDgNyo4GlD5gBnNKgS1LPjvyJ94Kw3dPnZw0TQGuG4LsbiL 87fwbew+H/u/htFIrtJRZ6iLyXm6paLVeS/oiI+t5qK23CulQRrukoPD9IOaF8/ttm8t4sZJ OOhF3CHYVxB0qXkwIzxWvTDes10FfUuFLTveRBTvSEPpqFvnrSFL/hGVSkL0YMkFulfXEpA0 eYJdG83YzuPqPPomKy8E9VKmZF2RCXrFNt3VnBI6xj8VK5ja7acBqLA6JlfwSs6gd1IEbDGf c0FZDFzbRPGJRpSJlMQD5F4l+Ct7pX9W2QA9BTJ+uxouy6KlFAZPLvFabI5fvSQQspYhACAr 3/u9GXlGBAKcteYzFJp91r13r+SwHunBtp6+LuQ9NNz2kO1/0IqBSJKEkKbgOSYgUu0VIcKQ 6AT0m90xUQoz2S7Q9+4UxCmrXqsuh8HR8EWA+A88BuKyKff/0CeHGdsZh5MbsY38vA/QzMC3 0WM2djuAFRHu7qQTG+b96uF6za7PyEaIHUqdSICVREC4dTovMc0lB2nZtRpHbOxj9b1MSrt2 D3Mpy87750RkMoK2qOT7V3BxTW2qfDhVRUp7w/aWmak6AJRZ4O/YYGsr1/B4p5oM4KxXlSH+ n8elKCjAPsmCJiMkGmHRroLFbTwv/KdamSE3RhoAoUr8Cmr9zi7Z4dM7TpiJUBvdMEZZTvuZ 0yVsgRUjHNOAEaXgWZMS9rZI6wXIWLISLwJiti8ggJyX6VM
  • Ironport-hdrordr: A9a23:0yhEoqsA8plkXFWcn2tZV1jl7skCYoAji2hC6mlwRA09TyXBrb HXoBwavSWVtN9jYgBmpTngAtjHfZq4z/VICOYqTNOftWXd1ldAabsSlLcKoAeQbREWlNQtsp uIGpIWYLedYmSSz/yKhjVQeOxQo+VvhZrY4Ns2uE0dLz2CBZsA0+/VYTz3LmRGAC19QbYpHp uV4cRK4xKmZHQsd8y+QlUVQuTZoNXPtZT+JToLHQQu5gWihS6hrOeSKWnR4j4uFxd0hZsy+2 nMlAL0oo2lrvGA0xfZk0PD8phMn9Pl691bQOiBkNIcJDnAghuhIK5hR7qBljYop/zH0idirP D85zMbe+hj4XLYeW+45TH33RP77Too43j+jXeFnHrKu6XCNXgHIvsEobgcXgrS6kImst05+r lMxXilu51eCg6FtDjh5uLPSwphmiOP0DcfeK8o/jBiuLklGfFsRL8kjQJo+VA7bWLHAbUcYa ZT5QfnlbVrmB2hHjLkVyJUsaeRtzwIb227qs9ogL3Q79CD90oJinfwgvZv1Uso5dYzTYJJ6P /DNbktnLZSTtUOZaY4H+sZR9CrY1a9NC4kn1jiX2gPOZt3SE4lkaSHkokd9aWvYtgF3ZEykJ POXBdRsnMzYVvnDYmL0IdQ+h7ATW2hVXC1o/sukKRRq/n5Xv7mICeDQFchn4+ppOgeGNTSX7 K2NIhNC/HuIGPyEcJC3hH4WZNVNX4COfdlzuoTShaLuIbGO4fqvuvUfLLaI6fsCy8tXiflDn 4KTFHIVbV9B4CQKw7FaTTqKgzQkxbEjO9N+YDhjpQu9LQ=
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

The usage of VCPU_SSHOTTMR_future in Linux prior to 4.7 is bogus.
When the hypervisor returns -ETIME (timeout in the past) Linux keeps
retrying to setup the timer with a higher timeout instead of
self-injecting a timer interrupt.

On boxes without any hardware assistance for logdirty we have seen HVM
Linux guests < 4.7 with 32vCPUs give up trying to setup the timer when
logdirty is enabled:

CE: Reprogramming failure. Giving up
CE: xen increased min_delta_ns to 1000000 nsec
CE: Reprogramming failure. Giving up
CE: Reprogramming failure. Giving up
CE: xen increased min_delta_ns to 506250 nsec
CE: xen increased min_delta_ns to 759375 nsec
CE: xen increased min_delta_ns to 1000000 nsec
CE: Reprogramming failure. Giving up
CE: Reprogramming failure. Giving up
CE: Reprogramming failure. Giving up
Freezing user space processes ...
INFO: rcu_sched detected stalls on CPUs/tasks: { 14} (detected by 10, t=60002 
jiffies, g=4006, c=4005, q=14130)
Task dump for CPU 14:
swapper/14      R  running task        0     0      1 0x00000000
Call Trace:
 [<ffffffff90160f5d>] ? rcu_eqs_enter_common.isra.30+0x3d/0xf0
 [<ffffffff907b9bde>] ? default_idle+0x1e/0xd0
 [<ffffffff90039570>] ? arch_cpu_idle+0x20/0xc0
 [<ffffffff9010820a>] ? cpu_startup_entry+0x14a/0x1e0
 [<ffffffff9005d3a7>] ? start_secondary+0x1f7/0x270
 [<ffffffff900000d5>] ? start_cpu+0x5/0x14
INFO: rcu_sched detected stalls on CPUs/tasks: { 26} (detected by 24, t=60002 
jiffies, g=6922, c=6921, q=7013)
Task dump for CPU 26:
swapper/26      R  running task        0     0      1 0x00000000
Call Trace:
 [<ffffffff90160f5d>] ? rcu_eqs_enter_common.isra.30+0x3d/0xf0
 [<ffffffff907b9bde>] ? default_idle+0x1e/0xd0
 [<ffffffff90039570>] ? arch_cpu_idle+0x20/0xc0
 [<ffffffff9010820a>] ? cpu_startup_entry+0x14a/0x1e0
 [<ffffffff9005d3a7>] ? start_secondary+0x1f7/0x270
 [<ffffffff900000d5>] ? start_cpu+0x5/0x14
INFO: rcu_sched detected stalls on CPUs/tasks: { 26} (detected by 24, t=60002 
jiffies, g=8499, c=8498, q=7664)
Task dump for CPU 26:
swapper/26      R  running task        0     0      1 0x00000000
Call Trace:
 [<ffffffff90160f5d>] ? rcu_eqs_enter_common.isra.30+0x3d/0xf0
 [<ffffffff907b9bde>] ? default_idle+0x1e/0xd0
 [<ffffffff90039570>] ? arch_cpu_idle+0x20/0xc0
 [<ffffffff9010820a>] ? cpu_startup_entry+0x14a/0x1e0
 [<ffffffff9005d3a7>] ? start_secondary+0x1f7/0x270
 [<ffffffff900000d5>] ? start_cpu+0x5/0x14

Thus leading to CPU stalls and a broken system as a result.

Workaround this bogus usage by ignoring the VCPU_SSHOTTMR_future in
the hypervisor.  Old Linux versions are the only ones known to have
(wrongly) attempted to use the flag, and ignoring it is compatible
with the behavior expected by any guests setting that flag.

Note the usage of the flag has been removed from Linux by commit:

c06b6d70feb3 xen/x86: don't lose event interrupts

Which landed in Linux 4.7.

Signed-off-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>
Acked-by: Henry Wang <Henry.Wang@xxxxxxx> # CHANGELOG
---
Changes since v2:
 - Reword the comment about VCPU_SSHOTTMR_future being ignored.
 - s/ENOTIME/ETIME/ in the commit message.
 - Reword changelog message.

Changes since v1:
 - Just ignore the flag, as there's no ABI breakage.
---
 CHANGELOG.md              |  2 ++
 xen/common/domain.c       | 13 ++++++++++---
 xen/include/public/vcpu.h |  5 ++++-
 3 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 5dbf8b06d72c..5bfd3aa5c0d5 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -9,6 +9,8 @@ The format is based on [Keep a 
Changelog](https://keepachangelog.com/en/1.0.0/)
 ### Changed
  - Repurpose command line gnttab_max_{maptrack_,}frames options so they don't
    cap toolstack provided values.
+ - Ignore VCPUOP_set_singleshot_timer's VCPU_SSHOTTMR_future flag. The only
+   known user doesn't use it properly, leading to in-guest breakage.
 
 ### Added
  - On x86, support for features new in Intel Sapphire Rapids CPUs:
diff --git a/xen/common/domain.c b/xen/common/domain.c
index 626debbae095..6a440590fe2a 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -1762,9 +1762,16 @@ long common_vcpu_op(int cmd, struct vcpu *v, 
XEN_GUEST_HANDLE_PARAM(void) arg)
         if ( copy_from_guest(&set, arg, 1) )
             return -EFAULT;
 
-        if ( (set.flags & VCPU_SSHOTTMR_future) &&
-             (set.timeout_abs_ns < NOW()) )
-            return -ETIME;
+        if ( set.timeout_abs_ns < NOW() )
+        {
+            /*
+             * Simplify the logic if the timeout has already expired and just
+             * inject the event.
+             */
+            stop_timer(&v->singleshot_timer);
+            send_timer_event(v);
+            break;
+        }
 
         migrate_timer(&v->singleshot_timer, smp_processor_id());
         set_timer(&v->singleshot_timer, set.timeout_abs_ns);
diff --git a/xen/include/public/vcpu.h b/xen/include/public/vcpu.h
index 81a3b3a7438c..a836b264a911 100644
--- a/xen/include/public/vcpu.h
+++ b/xen/include/public/vcpu.h
@@ -150,7 +150,10 @@ typedef struct vcpu_set_singleshot_timer 
vcpu_set_singleshot_timer_t;
 DEFINE_XEN_GUEST_HANDLE(vcpu_set_singleshot_timer_t);
 
 /* Flags to VCPUOP_set_singleshot_timer. */
- /* Require the timeout to be in the future (return -ETIME if it's passed). */
+ /*
+  * Request the timeout to be in the future (return -ETIME if it's passed)
+  * but can be ignored by the hypervisor.
+  */
 #define _VCPU_SSHOTTMR_future (0)
 #define VCPU_SSHOTTMR_future  (1U << _VCPU_SSHOTTMR_future)
 
-- 
2.40.0




 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.