
Re: [PATCH v2 0/7] Implement forced unplug and forced activation



On 24/07/2025 12:49, Marek Marczykowski-Górecki wrote:
> On Thu, Jul 24, 2025 at 10:43:32AM +0000, Tu Dinh wrote:
>> Hi Marek,
>>
>> On 24/07/2025 12:40, Marek Marczykowski-Górecki wrote:
>>> On Wed, Jul 23, 2025 at 01:58:27PM +0000, Tu Dinh wrote:
>>>> The goal of these two features is to simplify driver servicing and avoid
>>>> having to store driver state in the Registry.
>>>>
>>>> Forced unplug, as the name implies, forcefully unplugs emulated devices when
>>>> a driver is present, rather than when it's active. It defines a Registry
>>>> key at CurrentControlSet\XEN\ForceUnplug. Drivers can opt into forced
>>>> unplug by creating an appropriate value (DISKS/NICS) in this key.
>>>>
>>>> Forced activation is the companion to forced unplug. It aims to make
>>>> activation of Xenbus FDOs deterministic and stateless, using a
>>>> precedence mapping based on device IDs, prioritizing the vendor device
>>>> over the generic ones. This avoids situations where the wrong FDO is
>>>> activated, which will prevent Windows Update from working.
>>>>
>>>> With forced activation, Xenfilt is now installed via INF on top of the
>>>> current installation routines. This means PV drivers can be injected
>>>> offline without needing another reboot to be reconfigured.
>>>>
>>>> To avoid affecting older drivers, the two features are conditioned
>>>> behind new build variables FORCE_UNPLUG and FORCE_ACTIVATE.
>>>>
>>>> The following scenarios have been successfully tested, both requiring
>>>> only one reboot:
>>>> * Offline driver installation via DISM
>>>> * Toggling vendor device
>>>
>>> Hi,
>>>
>>> I tested this series (or rather
>>> https://github.com/xcp-ng/win-xenbus/tree/emulated-v3 at
>>> 9424383b7c2f0b6d9f4a7fd554662c8679cc935f) and got the following failure:
>>>
>>>       xen platform: xenbus|PdoQueryId: (ULONG_PTR)Buffer - (ULONG_PTR)Id.Buffer = 1032
>>>       xen platform: xenbus|PdoQueryId: REGSTR_VAL_MAX_HCID_LEN = 1024
>>>       xen platform: xenbus|PdoQueryId: ASSERTION FAILED: ((ULONG_PTR)Buffer - (ULONG_PTR)Id.Buffer) < (1024)
>>>       xen platform: XEN|DEBUG: ====> (xenbus.sys + 0000000000038990)
>>>       xen platform: xenbus|SUSPEND: Count = 0
>>>       xen platform: xenbus|SUSPEND: EARLY: xenbus.sys + 0000000000010910 (FFFFC08B03CFF130)
>>>       xen platform: xenbus|SUSPEND: EARLY: xenbus.sys + 000000000002F3D0 (FFFFC08B03F11910)
>>>       xen platform: xenbus|SUSPEND: EARLY: xenbus.sys + 0000000000026DA0 (FFFFC08B04B63640)
>>>       xen platform: xenbus|SUSPEND: EARLY: xenbus.sys + 00000000000373A0 (FFFFC08B09393000)
>>>       xen platform: xenbus|SUSPEND: LATE: xenbus.sys + 0000000000010B10 (FFFFC08B03CFF130)
>>>       xen platform: xenbus|SUSPEND: LATE: xenbus.sys + 0000000000037440 (FFFFC08B09393000)
>>>       xen platform: xenbus|SUSPEND: LATE: xenbus.sys + 0000000000004520 (FFFFC08B09265260)
>>>       xen platform: xenbus|SUSPEND: LATE: xenbus.sys + 0000000000021110 (FFFFC08B092DA010)
>>>       xen platform: XEN|DEBUG: <==== (xenbus.sys + 0000000000038990)
>>>       xen platform: XEN|DEBUG: ====> (xenbus.sys + 000000000003C5C0)
>>>       xen platform: xenbus|RANGE_SET: RANGE SETS:
>>>       xen platform: xenbus|RANGE_SET:  - gnttab:
>>>       xen platform: xenbus|RANGE_SET:    {20 - 1ff}*
>>>       xen platform: xenbus|RANGE_SET:  - balloon:
>>>       xen platform: xenbus|RANGE_SET:    EMPTY
>>>       xen platform: XEN|DEBUG: <==== (xenbus.sys + 000000000003C5C0)
>>>       xen platform: XEN|DEBUG: ====> (xenbus.sys + 000000000000D850)
>>>       xen platform: xenbus|EVTCHN: EVENT CHANNELS:
>>>       xen platform: xenbus|EVTCHN: - (0001) BY xenbus.sys + 0000000000034A41 ACTIVE
>>>       xen platform: xenbus|EVTCHN: FIXED
>>>       xen platform: xenbus|EVTCHN: Count = 40
>>>       xen platform: xenbus|EVTCHN: - (0002) BY xenbus.sys + 0000000000003AD6 ACTIVE
>>>       xen platform: xenbus|EVTCHN: FIXED
>>>       xen platform: xenbus|EVTCHN: Count = 0
>>>       xen platform: xenbus|EVTCHN: - (0006) BY xenbus.sys + 000000000002355B ACTIVE
>>>       xen platform: xenbus|EVTCHN: VIRQ: Index = 1
>>>       xen platform: xenbus|EVTCHN: Count = 0
>>>       xen platform: XEN|DEBUG: <==== (xenbus.sys + 000000000000D850)
>>>       xen platform: XEN|DEBUG: ====> (xenbus.sys + 000000000002E680)
>>>       xen platform: xenbus|SHARED_INFO: Address = 00000000.7ee78000
>>>       xen platform: XEN|DEBUG: <==== (xenbus.sys + 000000000002E680)
>>>       xen platform: XEN|DEBUG: ====> (xenbus.sys + 0000000000042710)
>>>       xen platform: XEN|DEBUG: <==== (xenbus.sys + 0000000000042710)
>>>       xen platform: XEN|DEBUG: ====> (xenbus.sys + 0000000000025D60)
>>>       xen platform: xenbus|GNTTAB: [0] Address = 7ee66000.884c0740
>>>       xen platform: XEN|DEBUG: <==== (xenbus.sys + 0000000000025D60)
>>>       xen platform: XEN|DEBUG: ====> (xenbus.sys + 00000000000341A0)
>>>       xen platform: xenbus|STORE: Address = 00000000.feffc000
>>>       xen platform: xenbus|STORE: Events = 40 Dpcs = 4 Polls = 44
>>>       xen platform: xenbus|STORE: WATCHES:
>>>       xen platform: xenbus|STORE: - (9255) ON device BY xenbus.sys + 0000000000022879 [ACTIVE]
>>>       xen platform: xenbus|STORE: - (9256) ON control/shutdown BY xenbus.sys + 0000000000022908 [ACTIVE]
>>>       xen platform: xenbus|STORE: - (9257) ON memory/target BY xenbus.sys + 0000000000022A0A [ACTIVE]
>>>       xen platform: XEN|DEBUG: <==== (xenbus.sys + 00000000000341A0)
>>>       xen platform: XEN|DEBUG: ====> (xenbus.sys + 0000000000003680)
>>>       xen platform: xenbus|CONSOLE: Address = 00000000.fefff000
>>>       xen platform: xenbus|CONSOLE: Events = 0 Dpcs = 1
>>>       xen platform: XEN|DEBUG: <==== (xenbus.sys + 0000000000003680)
>>>       xen platform: XEN|DEBUG: ====> (xenbus.sys + 0000000000016F90)
>>>       xen platform: xenbus|FDO: VIRQS:
>>>       xen platform: xenbus|FDO: - DEBUG: (0:0) Count = 0
>>>       xen platform: XEN|DEBUG: <==== (xenbus.sys + 0000000000016F90)
>>>       xen platform: xen|BUGCHECK: ====>
>>>       xen platform: xen|BUGCHECK: SYSTEM_THREAD_EXCEPTION_NOT_HANDLED: FFFFFFFFC0000420 FFFFF8028845B042 FFFFB0849D5E9178 FFFFB0849D5E89B0
>>>       xen platform: xen|BUGCHECK: C0000420 AT xenbus.sys + 000000000002B042
>>>       xen platform: xen|BUGCHECK: EXCEPTION (FFFFB0849D5E9178):
>>>       xen platform: xen|BUGCHECK: - Code = C0000420
>>>       xen platform: xen|BUGCHECK: - Flags = 00000000
>>>       xen platform: xen|BUGCHECK: - Address = FFFFF8028845B042
>>>       xen platform: xen|BUGCHECK: CONTEXT (FFFFB0849D5E89B0):
>>>       xen platform: xen|BUGCHECK: - GS = 002B
>>>       xen platform: xen|BUGCHECK: - FS = 0053
>>>       xen platform: xen|BUGCHECK: - ES = 002B
>>>       xen platform: xen|BUGCHECK: - DS = 002B
>>>       xen platform: xen|BUGCHECK: - SS = 0018
>>>       xen platform: xen|BUGCHECK: - CS = 0010
>>>       xen platform: xen|BUGCHECK: - EFLAGS = 00040282
>>>       xen platform: xen|BUGCHECK: - RDI = 00000000090FCA70
>>>       xen platform: xen|BUGCHECK: - RSI = 000000009D5E9700
>>>       xen platform: xen|BUGCHECK: - RBX = 0000000009389340
>>>       xen platform: xen|BUGCHECK: - RDX = 0000000000000059
>>>       xen platform: xen|BUGCHECK: - RCX = 00000000FD000000
>>>       xen platform: xen|BUGCHECK: - RAX = 0000000000000001
>>>       xen platform: xen|BUGCHECK: - RBP = 0000000000000000
>>>       xen platform: xen|BUGCHECK: - RIP = 000000008845B042
>>>       xen platform: xen|BUGCHECK: - RSP = 000000009D5E93B0
>>>       xen platform: xen|BUGCHECK: - R8 = 000000000000004D
>>>       xen platform: xen|BUGCHECK: - R9 = 0000000000000000
>>>       xen platform: xen|BUGCHECK: - R10 = 00000000719679B0
>>>       xen platform: xen|BUGCHECK: - R11 = 0000000000000000
>>>       xen platform: xen|BUGCHECK: - R12 = 0000000080000FB4
>>>       xen platform: xen|BUGCHECK: - R13 = 0000000000000001
>>>       xen platform: xen|BUGCHECK: - R14 = 00000000C00000BB
>>>       xen platform: xen|BUGCHECK: - R15 = 0000000009389340
>>>       xen platform: xen|BUGCHECK: STACK:
>>>       xen platform: xen|BUGCHECK: 000000009D5E9500: (00000000091AA140 00000000090FCA70 0000000000000013 000000008847C370) xenbus.sys + 0000000000028668
>>>       xen platform: xen|BUGCHECK: 000000009D5E9560: (00000000091AA140 00000000090FCA70 0000000000000518 0000000000000000) xenbus.sys + 0000000000028408
>>>       xen platform: xen|BUGCHECK: 000000009D5E95A0: (00000000091AA140 00000000090FCA70 0000000000000000 000000007182E891) xenbus.sys + 000000000000A3CC
>>>       xen platform: xen|BUGCHECK: 000000009D5E9620: (0000000009389340 00000000090FCA70 0000000000000001 0000000069706E04) ntoskrnl.exe + 000000000022A6B5
>>>       xen platform: xen|BUGCHECK: 000000009D5E9660: (0000000000000000 0000000009389340 000000009D5E9700 0000000000000001) ntoskrnl.exe + 0000000000694A88
>>>       xen platform: xen|BUGCHECK: 000000009D5E96D0: (00000000C00000BB 000000000938CC90 000000009D5E9878 0000000009389340) ntoskrnl.exe + 0000000000733716
>>>       xen platform: xen|BUGCHECK: 000000009D5E9760: (0000000000000000 000000009D5E9878 0000000003D5A500 0000000000000000) ntoskrnl.exe + 00000000007335F4
>>>       xen platform: xen|BUGCHECK: 000000009D5E97C0: (0000000000000000 000000009D5E98C0 0000000000020000 000000000938CC90) ntoskrnl.exe + 0000000000730EC1
>>>
>>>
>>> This is on a fresh Windows 10 (22H2) domU, with opt-in patches for both xenvbd
>>> and xennet.
>>>
>>> The test was done on our CI system; more logs are available at
>>> https://openqa.qubes-os.org/tests/147544#downloads. In particular, they
>>> include a tarball of the whole /var/log, where you can find
>>> /var/log/xen/console/guest-windows-test-dm.log with the messages preceding
>>> the above failure.
>>>
>>
>> That tree is stale, could you try again (or apply the patches directly)?
>> That BSOD should already be fixed with 3e93bee44bbc "xenbus: Use
>> nonpaged pool in FdoQueryId".
>
> Hm, I have this patch included already. Now that I've fetched again, the
> diff shows two logical differences:
> - change in evtchn_fifo.c (change while loop to for)
> - changes in xenbus.inf adding FORCE_ACTIVATION among other things
>
> BTW, should I also enable FORCE_ACTIVATION?
>

You're right, I missed the assertion failure. It looks like adding rev
0900000C pushed the compatible ID list past REGSTR_VAL_MAX_HCID_LEN
(1032 bytes against a limit of 1024, per your log). I don't yet see how
to fix this without breaking compatibility in some way.
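
The arithmetic behind that assertion can be sketched roughly as below.
This is a minimal user-mode C sketch, not the xenbus code: the names
multi_sz_bytes, ids_that_fit and wchar16 are hypothetical, and it assumes
the list is a REG_MULTI_SZ-style sequence of 2-byte characters whose
total byte length must stay strictly below the limit, as the logged
check "< (1024)" suggests.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Value logged for REGSTR_VAL_MAX_HCID_LEN in the failure above. */
#define MAX_HCID_LEN 1024

/* Stand-in for the 2-byte Windows WCHAR (assumption for portability). */
typedef uint16_t wchar16;

/*
 * Bytes needed for a multi-string list built from the first `count` IDs:
 * every ID is followed by a NUL, and one extra NUL terminates the list.
 */
static size_t multi_sz_bytes(const char *const *ids, size_t count)
{
    size_t chars = 1; /* final list terminator */

    for (size_t i = 0; i < count; i++)
        chars += strlen(ids[i]) + 1;

    return chars * sizeof(wchar16);
}

/*
 * Largest number of leading IDs whose list stays strictly below `limit`
 * bytes. IDs are assumed to be ordered from most to least specific.
 */
static size_t ids_that_fit(const char *const *ids, size_t count, size_t limit)
{
    size_t n = count;

    while (n > 0 && multi_sz_bytes(ids, n) >= limit)
        n--;

    return n;
}
```

For a list whose computed size is 1032 bytes, ids_that_fit(ids, count,
MAX_HCID_LEN) would return fewer entries than count. Dropping trailing
IDs is exactly the kind of compatibility break in question, so this only
illustrates the size check, not a fix.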

The evtchn_fifo change is a minor fix that only affects failure cleanup.

If you use forced unplug, you'll very likely want forced activation as
well, since it avoids the reboots otherwise needed to change the active
Xenbus FDO.
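
To illustrate the idea from the cover letter quoted above (a precedence
mapping based on device IDs, with the vendor device ahead of the generic
ones), here is a rough sketch in plain C. It is not the code from the
series: the table contents, the ID strings and the names rank_of and
pick_active are assumptions made up for the example.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical precedence table: earlier entries win.
 * The ID strings are illustrative placeholders, not taken from the patches.
 */
static const char *const precedence[] = {
    "VEN_5853&DEV_C000", /* assumed vendor device */
    "VEN_5853&DEV_0002", /* assumed generic device */
    "VEN_5853&DEV_0001", /* assumed generic device */
};

#define PRECEDENCE_COUNT (sizeof(precedence) / sizeof(precedence[0]))

/* Rank of an ID in the table; unknown IDs rank after every known one. */
static size_t rank_of(const char *id)
{
    for (size_t i = 0; i < PRECEDENCE_COUNT; i++) {
        if (strcmp(precedence[i], id) == 0)
            return i;
    }
    return PRECEDENCE_COUNT;
}

/*
 * Index of the present device that should become active, or -1 if none
 * of the present devices is in the table. The result depends only on
 * which devices are present, so no stored state is needed.
 */
static int pick_active(const char *const *present, size_t count)
{
    int best = -1;
    size_t best_rank = PRECEDENCE_COUNT;

    for (size_t i = 0; i < count; i++) {
        size_t rank = rank_of(present[i]);

        if (rank < best_rank) {
            best_rank = rank;
            best = (int)i;
        }
    }
    return best;
}
```

Because the choice is a pure function of the devices present, toggling
the vendor device changes the outcome without anything having to be
rewritten in the Registry first.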


Ngoc Tu Dinh | Vates XCP-ng Developer

XCP-ng & Xen Orchestra - Vates solutions

web: https://vates.tech

