[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 2/2] Limit retry in ControllerGetResponse



On 21/10/2025 15:07, Owen Smith wrote:
> Is there a way of detecting the backend has disappeared during 
> ControllerGetResponse?
> As currently, there is a 100ms delay between attempts to trigger the backend 
> - its possible the backend could take longer to respond in normal operation
> 
> Owen

I couldn't think of a clean way to do that. One way is to try and probe 
FrontendIsBackendOnline every timeout expiration to see if it's still 
working. It wouldn't work if the backend became unresponsive, but I can 
add that as an additional check for v2.

Another problem is with XENVIF_CONTROLLER_POLL_PERIOD as passed to 
EvtchnWait. This is similar to other places where we call 
KeStallExecutionProcessor in a loop at DISPATCH_LEVEL. This really 
limits how long we can wait in ControllerGetResponse, and I feel like 
10x10ms is already way too long.

IMO a better way would be to rework the controller (and eventually other 
components) to not run requests full-time at DISPATCH_LEVEL, so that 
we're free to wait for as long as we need. This is what I've been trying 
to do here: 
https://github.com/xcp-ng/win-xenvif/tree/dnt/controller-rework, but I 
need to find time to continue it...

> 
> ________________________________________
> From: win-pv-devel <win-pv-devel-bounces@xxxxxxxxxxxxxxxxxxxx> on behalf of 
> Tu Dinh <ngoc-tu.dinh@xxxxxxxxxx>
> Sent: 16 October 2025 11:16 AM
> To: win-pv-devel@xxxxxxxxxxxxxxxxxxxx
> Cc: Tu Dinh
> Subject: [PATCH 2/2] Limit retry in ControllerGetResponse
> 
> During unplug, NDIS may issue a hash reconfiguration. However, the
> backend may disappear before the request has had a chance to complete.
> This is observable by e.g. adding a delay at the beginning of
> VifReceiverSetHashAlgorithm and unplugging. If this happens, Xenvif will
> get stuck waiting forever in ControllerGetResponse, at DISPATCH_LEVEL to
> boot.
> 
> Limit the EvtchnWait retry to 10 attempts (10ms each to keep the
> previous 100ms limit) and return an error code to the caller if the wait
> has failed.
> 
> Signed-off-by: Tu Dinh <ngoc-tu.dinh@xxxxxxxxxx>
> ---
>   src/xenvif/controller.c | 13 +++++++++++--
>   1 file changed, 11 insertions(+), 2 deletions(-)
> 
> diff --git a/src/xenvif/controller.c b/src/xenvif/controller.c
> index e032972..109c030 100644
> --- a/src/xenvif/controller.c
> +++ b/src/xenvif/controller.c
> @@ -248,7 +248,7 @@ fail1:
>   #define TIME_MS(_ms)        (TIME_US((_ms) * 1000))
>   #define TIME_RELATIVE(_t)   (-(_t))
> 
> -#define XENVIF_CONTROLLER_POLL_PERIOD 100 // ms
> +#define XENVIF_CONTROLLER_POLL_PERIOD 10 // ms
> 
>   _IRQL_requires_(DISPATCH_LEVEL)
>   static NTSTATUS
> @@ -258,11 +258,12 @@ ControllerGetResponse(
>       )
>   {
>       LARGE_INTEGER                   Timeout;
> +    ULONG                           Attempt;
>       NTSTATUS                        status;
> 
>       Timeout.QuadPart = 
> TIME_RELATIVE(TIME_MS(XENVIF_CONTROLLER_POLL_PERIOD));
> 
> -    for (;;) {
> +    for (Attempt = 0; Attempt < 10; Attempt++) {
>           ULONG   Count;
> 
>           Count = XENBUS_EVTCHN(GetCount,
> @@ -284,6 +285,13 @@ ControllerGetResponse(
>               __ControllerSend(Controller);
>       }
> 
> +    // Use STATUS_TRANSACTION_TIMED_OUT as an error code since 
> STATUS_TIMEOUT is
> +    // a success code.
> +    if (Controller->Response.id != Controller->Request.id) {
> +        status = STATUS_TRANSACTION_TIMED_OUT;
> +        goto done;
> +    }
> +
>       ASSERT3U(Controller->Response.type, ==, Controller->Request.type);
> 
>       switch (Controller->Response.status) {
> @@ -311,6 +319,7 @@ ControllerGetResponse(
>       if (NT_SUCCESS(status) && Data != NULL)
>           *Data = Controller->Response.data;
> 
> +done:
>       RtlZeroMemory(&Controller->Request,
>                     sizeof (struct xen_netif_ctrl_request));
>       RtlZeroMemory(&Controller->Response,
> --
> 2.51.0.windows.2
> 
> 
> 
> --
> Ngoc Tu Dinh | Vates XCP-ng Developer
> 
> XCP-ng & Xen Orchestra - Vates solutions
> 
> web: https://vates.tech
> 
> 
> 



--
Ngoc Tu Dinh | Vates XCP-ng Developer

XCP-ng & Xen Orchestra - Vates solutions

web: https://vates.tech




 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.