
Re: [PATCH 00/17] Q35 initial support for HVM guests


  • To: Alexey G <x1917x@xxxxxxxxx>
  • From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Date: Tue, 5 May 2026 16:15:32 +0200
  • Cc: Thierry Escande <thierry.escande@xxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx, Jan Beulich <jbeulich@xxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Anthony PERARD <anthony.perard@xxxxxxxxxx>, Michal Orzel <michal.orzel@xxxxxxx>, Julien Grall <julien@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Juergen Gross <jgross@xxxxxxxx>
  • Delivery-date: Tue, 05 May 2026 14:16:05 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Tue, May 05, 2026 at 03:07:36PM +0200, Alexey G wrote:
> On Tue, 28 Apr 2026 09:48:41 +0200
> Roger Pau Monné <roger.pau@xxxxxxxxxx> wrote:
> 
> >On Fri, Mar 13, 2026 at 04:35:01PM +0000, Thierry Escande wrote:
> >> This series introduces initial Q35 chipset support for HVM guests,
> >> based on the patchset at [1] by Alexey Gerasimenko.
> >> 
> >> Basic support means that this patchset allows starting an HVM guest
> >> with a Q35 chipset emulated by Qemu, and implements access to the
> >> PCIe extended configuration space of the devices emulated by Qemu.
> >> 
> >> Support for PCIe device passthrough is not implemented yet. It is
> >> planned, but implies modifications in the hypervisor and the
> >> firmware, mainly to support multiple PCI buses.
> >
> >Why do you need multi bus support to expose PCIe capabilities?  I'm
> >not seeing the relation between those two.  You could still expose a
> >single bus on the MCFG table.
> 
> The problem with the PCIe bus is that it's very "topological" by design
> - and it always wants a valid hierarchy.
> 
> Each PCIe device manifests itself (via its PCIe Capabilities entry)
> as either a chipset-integrated device or a regular PCIe endpoint
> device, which is the most common case. There are more types IIRC but
> these are what we deal with mostly - both for PT devices and
> QEMU-emulated ones.
> 
> But, being a PCIe endpoint means that the device must have some parent
> device. It can be located below a PCIe switch or, in the simplest and
> the most common case, below a PCIe Root Port device.
> 
> In both cases the 'parent' is a PCI-PCI bridge technically, with the
> PCIe endpoint device being located on its secondary bus.
> 
> As the Q35 patch series was done mostly with PCIe device passthrough
> in mind, this brings the main complication: in order to properly place
> a passed-through device on the PCIe bus, we need an
> emulated/real/hybrid Root Port device.
> 
> A much lengthier description is in this patch message:
> https://lists.xenproject.org/archives/html/xen-devel/2018-03/msg01197.html
> 
> To summarize, we need this 'valid PCIe topology' nonsense just to keep
> the Windows kernel (the pci.sys driver specifically) from discarding
> our PT device when it checks the PCIe bus hierarchy above it.
> 
> This limitation was found/confirmed via debugging - luckily, pci.sys
> had symbols, and the failing function had a very telling name -
> something like pcieCheckTopology or similar.
> 
> Emulating the "chipset-integrated device" type in the PT device's PCIe
> Capabilities was a simple hack which allowed bypassing the requirement
> to have a valid PCIe hierarchy with multiple buses. But the proper
> direction going forward is to emulate Root Ports or PCIe switches, I
> guess.

Oh, I see.  We discussed this with Jan; it wasn't clear whether it
would be a strict requirement or not.  We have our answer now: it is a
strict requirement for pass-through to Windows guests.

> >> The PCIe MMCONFIG area is configured by hvmloader and its base
> >> address and size are set in Xen using a new pair of hypercalls
> >> HVMOP_get|set_ecam_space.
> >
> >I guess I will see how that looks in the series, but the setting
> >of the ECAM region would better be done by the toolstack.  Setting it
> >in hvmloader is possibly not the best placement, because it doesn't
> >run for PVH guests (and we will want ECAM support for PVH at some
> >point), and there's also a vague plan/intention to get rid of
> >hvmloader even for HVM guests eventually.
> 
> This is the situation where the difference between HVM and PVH might be
> very problematic I'm afraid. HVM guests assume full freedom over the
> IO/MMIO resources setup inside their sandboxed environment.
> 
> It's not just Windows reallocating PCI BARs to its liking; this also
> extends to the emulated chipset's resources. In the worst case we
> could have MMCONFIG reinitialization implemented even in Intel's Q35
> drivers installed inside an HVM guest. Fortunately, as far as I
> remember that is not the case, but in theory a Q35 driver could do
> things like this.

Indeed.  In later patches my recommendation was to trap accesses to
the root complex registers that control the position and size of the
ECAM region, and forward those to the hypervisor, instead of hvmloader
using a side-band hypercall to set the position and size of the ECAM
region.

I've also discussed this with Jan: alternatively we could trap the
registers directly in Xen itself, but then Xen would need to know that
the domain has an emulated Q35.  We might need to do that at some
point, but we are likely not there yet.

Thanks, Roger.
