[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v2 2/3] Limit retry in ControllerGetResponse


  • To: Tu Dinh <ngoc-tu.dinh@xxxxxxxxxx>, "win-pv-devel@xxxxxxxxxxxxxxxxxxxx" <win-pv-devel@xxxxxxxxxxxxxxxxxxxx>
  • From: Owen Smith <owen.smith@xxxxxxxxxx>
  • Date: Fri, 7 Nov 2025 09:11:01 +0000
  • Accept-language: en-GB, en-US
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=citrix.com; dmarc=pass action=none header.from=citrix.com; dkim=pass header.d=citrix.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector10001; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=4RVT8y4YMUjfhoTDpcyjL6eWxy3THS+zDPHVNH3HyZA=; b=PrNALVjdblAl/J9o8dfqfHYeRHoEA35CFFB9p7ikodotnj+TloWuYgU3Nta7WFhkDqBljcdvqqRQBUSEP/bXZN7cuKvPVgnfzahy/tj2ZMG7KgGztvG+YP7+0jxcWAEJflOF9633QSlIm6hJ00XTaViNRMHA9SM8/BGCF3DASU4rdeSGvmwFtsOUw4HuM+umKeDAwqyP+1MzTXWVj5jlGq+RF3HEedsgHvpY7sJOAQq1+jatYPyNUPNye0aOtEtL7xMirQLEv7r6VMnaRTHAjqKlWeS0Yo5aNVe5yT7hKwOnDaLUBAOYc2rDKkkZEeBSNT5r7CgkaJVBtQCNetk7pg==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector10001; d=microsoft.com; cv=none; b=CGO6glVS05VFwkNgptekjNn0HjjtT2cY3uob5k8hEIjmPybzBpLfHnzBtNeEFi0vGVq5sp7/JW/+EqfwIXPK8abEYgu4CnqMnm0p0FfcapTbmPKGd6D0SUzKTrw8Uli/Ouma8yyg4E8zeYa/9YrdRRexdKBYpyvrrMH3vV2H3oMVE0V1DiI+5TSCdoBT5y4stb+wXGiMB+L2C7CN/CyglafXIo55GwHWuceGoy1XJ03DqygHr2B58LkzxBY++SdkBLKwlr4hR0wVtHlYkWWWOVZ1qRuUSAwkKPJN+0vOT87uzvhowYd7A+0jD4ndygesC5bFPyb2iLdqdnwX+7pmYw==
  • Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=citrix.com;
  • Delivery-date: Fri, 07 Nov 2025 09:11:13 +0000
  • List-id: Developer list for the Windows PV Drivers subproject <win-pv-devel.lists.xenproject.org>
  • Msip_labels:
  • Thread-index: AQHcTyhPf4yajOGgv025SjSgwRoOp7Tm7gP8
  • Thread-topic: [PATCH v2 2/3] Limit retry in ControllerGetResponse

Would keeping a count of the number of STATUS_TIMEOUTs returned by 
ControllerGetResponse() and breaking out of the loop after 50 timeouts be 
simpler?
Or is it required to query the time to ensure a more accurate poll?

Owen

________________________________________
From: win-pv-devel <win-pv-devel-bounces@xxxxxxxxxxxxxxxxxxxx> on behalf of Tu 
Dinh <ngoc-tu.dinh@xxxxxxxxxx>
Sent: 06 November 2025 2:18 PM
To: win-pv-devel@xxxxxxxxxxxxxxxxxxxx
Cc: Tu Dinh
Subject: [PATCH v2 2/3] Limit retry in ControllerGetResponse

During unplug, NDIS may issue a hash reconfiguration. However, the
backend may disappear before the request has had a chance to complete.
This is observable by e.g. adding a delay at the beginning of
VifReceiverSetHashAlgorithm and unplugging. If this happens, Xenvif will
get stuck waiting forever in ControllerGetResponse, at DISPATCH_LEVEL to
boot.

Limit the total wait time from calling EvtchnWait to 5 seconds, and
return an error code to the caller if the wait has failed.

Signed-off-by: Tu Dinh <ngoc-tu.dinh@xxxxxxxxxx>
---
 src/xenvif/controller.c | 36 +++++++++++++++++++++++++++++++-----
 1 file changed, 31 insertions(+), 5 deletions(-)

diff --git a/src/xenvif/controller.c b/src/xenvif/controller.c
index e032972..6a091d6 100644
--- a/src/xenvif/controller.c
+++ b/src/xenvif/controller.c
@@ -246,9 +246,11 @@ fail1:

 #define TIME_US(_us)        ((_us) * 10)
 #define TIME_MS(_ms)        (TIME_US((_ms) * 1000))
+#define TIME_S(_s)          (TIME_MS((_s) * 1000))
 #define TIME_RELATIVE(_t)   (-(_t))

-#define XENVIF_CONTROLLER_POLL_PERIOD 100 // ms
+#define XENVIF_CONTROLLER_POLL_PERIOD   TIME_MS(100)
+#define XENVIF_CONTROLLER_MAX_WAIT      TIME_S(5)

 _IRQL_requires_(DISPATCH_LEVEL)
 static NTSTATUS
@@ -257,10 +259,15 @@ ControllerGetResponse(
     OUT PULONG                      Data OPTIONAL
     )
 {
-    LARGE_INTEGER                   Timeout;
+    LARGE_INTEGER                   PollTimeout;
+    LARGE_INTEGER                   MaxTimeout;
+    LARGE_INTEGER                   Start;
     NTSTATUS                        status;

-    Timeout.QuadPart = TIME_RELATIVE(TIME_MS(XENVIF_CONTROLLER_POLL_PERIOD));
+    PollTimeout.QuadPart = TIME_RELATIVE(XENVIF_CONTROLLER_POLL_PERIOD);
+    MaxTimeout.QuadPart = TIME_RELATIVE(XENVIF_CONTROLLER_MAX_WAIT);
+
+    KeQuerySystemTime(&Start);

     for (;;) {
         ULONG   Count;
@@ -279,9 +286,27 @@ ControllerGetResponse(
                                &Controller->EvtchnInterface,
                                Controller->Channel,
                                Count + 1,
-                               &Timeout);
-        if (status == STATUS_TIMEOUT)
+                               &PollTimeout);
+        if (status == STATUS_TIMEOUT) {
+            LARGE_INTEGER       Now;
+            LONGLONG            Delta;
+
+            KeQuerySystemTime(&Now);
+
+            // Relative timeout
+            Delta = Now.QuadPart - Start.QuadPart;
+            if (Delta > -MaxTimeout.QuadPart)
+                break;
+
             __ControllerSend(Controller);
+        }
+    }
+
+    // Use STATUS_TRANSACTION_TIMED_OUT as an error code since STATUS_TIMEOUT 
is
+    // a success code.
+    if (Controller->Response.id != Controller->Request.id) {
+        status = STATUS_TRANSACTION_TIMED_OUT;
+        goto done;
     }

     ASSERT3U(Controller->Response.type, ==, Controller->Request.type);
@@ -311,6 +336,7 @@ ControllerGetResponse(
     if (NT_SUCCESS(status) && Data != NULL)
         *Data = Controller->Response.data;

+done:
     RtlZeroMemory(&Controller->Request,
                   sizeof (struct xen_netif_ctrl_request));
     RtlZeroMemory(&Controller->Response,
--
2.51.0.windows.2



--
Ngoc Tu Dinh | Vates XCP-ng Developer

XCP-ng & Xen Orchestra - Vates solutions

web: https://vates.tech





 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.