
Re: [Xen-devel] [early RFC] ARM PCI Passthrough design document



On Tue, Jan 24, 2017 at 05:17:06PM +0000, Julien Grall wrote:
> Hi Roger,
> 
> On 06/01/17 15:12, Roger Pau Monné wrote:
> > On Thu, Dec 29, 2016 at 02:04:15PM +0000, Julien Grall wrote:
> > > So given a specific SBDF, it would be possible to find the host bridge
> > > and the RID associated to a PCI device.
> > > 
> > > # Interaction of the PCI subsystem with other subsystems
> > > 
> > > In order to have a PCI device fully working, Xen will need to configure
> > > other subsystems subsytems such as the SMMU and the Interrupt Controller.
> >                    ^ duplicated.
> > > 
> > > The interaction expected between the PCI subsystem and the other is:
> >                                                          ^ this seems quite
> >                                                          confusing, what's
> >                                                          "the other"?
> 
> By "other" I meant "IOMMU and Interrupt Controller". Would the wording "and
> the other subsystems" be better?

Yes, I think so.

> > >     * Add a device
> > >     * Remove a device
> > >     * Assign a device to a guest
> > >     * Deassign a device from a guest
> > > 
> > > XXX: Detail the interaction when assigning/deassigning device
> > 
> > Assigning a device will probably entail setting up some direct MMIO
> > mappings (BARs and ROMs) plus a bunch of traps in order to perform
> > emulation of accesses to the PCI config space (or those can be set up when
> > a new bridge is registered with Xen).
> 
> I am planning to detail the root complex emulation in a separate section. I
> sent the design document before writing it.
> 
> In brief, I would expect the registration of a new bridge to set up the trap
> to emulate accesses to the PCI configuration space. On ARM, the first
> approach will rely on the OS to set up the BARs and ROMs. So they will be
> mapped by the PCI configuration space emulation.
> 
> The reason for relying on the OS to set up the BARs/ROMs is to reduce the
> work needed for a first version. Otherwise we would have to add code in the
> toolstack to decide where to place the BARs/ROMs. I don't think it is a lot
> of work, but it is not that important because it does not require a stable
> ABI (this is an interaction between the hypervisor and the toolstack).
> Furthermore, Linux (at least on ARM) assigns the BARs during setup. From my
> understanding, this is the expected behavior with both DT (the DT has a
> property to skip the scan) and ACPI.

This approach might work for Dom0, but for DomU you certainly need to know
where the MMIO regions of a device are, and either the toolstack or Xen needs
to set this up in advance (or at least mark which MMIO regions are available
to the DomU). Allowing a DomU to map random MMIO regions is certainly a
security issue.
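
As a reference point, a minimal sketch of what the toolstack side could look
like, reusing the existing libxc helpers (the BAR address/size and the chosen
guest address are made-up placeholders, not a real device):

#include <xenctrl.h>

#define PAGE_SHIFT 12

/*
 * Sketch only: allow a DomU to access one BAR of an assigned device and
 * install the stage-2 mapping at a toolstack-chosen guest address.
 */
static int expose_bar_to_domu(xc_interface *xch, uint32_t domid,
                              uint64_t bar_maddr, uint64_t bar_size,
                              uint64_t guest_gaddr)
{
    unsigned long mfn = bar_maddr >> PAGE_SHIFT;
    unsigned long gfn = guest_gaddr >> PAGE_SHIFT;
    unsigned long nr = (bar_size + (1UL << PAGE_SHIFT) - 1) >> PAGE_SHIFT;
    int rc;

    /* Mark the MMIO range as accessible to the domain... */
    rc = xc_domain_iomem_permission(xch, domid, mfn, nr, 1 /* allow */);
    if ( rc )
        return rc;

    /* ...and create the guest physical -> machine mapping. */
    return xc_domain_memory_mapping(xch, domid, gfn, mfn, nr,
                                    1 /* add mapping */);
}

Anything not granted this way would then fault, so a DomU could not map random
MMIO regions behind the toolstack's back.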

> > 
> > > ## Interrupt controller
> > > 
> > > PCI supports three kind of interrupts: legacy interrupt, MSI and MSI-X.
> > > On ARM legacy interrupts will be mapped to SPIs. MSI and MSI-x will be
> > > either mapped to SPIs or LPIs.
> > > 
> > > Whilst SPIs can be programmed using an interrupt number, LPIs can be
> > > identified via a pair (DeviceID, EventID) when configure through the ITS.
> >                                                           ^d
> > 
> > > 
> > > The DeviceID is a unique identifier for each MSI-capable device that can
> > > be deduced from the RID with the help of the firmware tables (see below).
> > > 
> > > XXX: Figure out if something is necessary for GICv2m
> > > 
> > > # Information available in the firmware tables
> > > 
> > > ## ACPI
> > > 
> > > ### Host bridges
> > > 
> > > The static table MCFG (see 4.2 in [1]) will describe the host bridges 
> > > available
> > > at boot and supporting ECAM. Unfortunately there are platforms out there
> > > (see [2]) that re-use MCFG to describe host bridge that are not fully ECAM
> >                                                     ^s
> > 
> > > compatible.
> > > 
> > > This means that Xen needs to account for possible quirks in the host
> > > bridge. The Linux community are working on a patch series (see [2] and
> > > [3]) where quirks will be detected with:
> > >     * OEM ID
> > >     * OEM Table ID
> > >     * OEM Revision
> > >     * PCI Segment (from _SEG)
> > >     * PCI bus number range (from _CRS, wildcard allowed)
> > 
> > So segment and bus number range needs to be fetched from ACPI objects? Is
> > that because the information in the MCFG is lacking/wrong?
> 
> All the host bridges will be described in ASL. Only the ones available at
> boot will be described in the MCFG. So it looks more sensible to rely on the
> ASL from Linux's POV.

Yes, that's right. We need to rely on PHYSDEVOP_pci_mmcfg_reserved or similar
so that Dom0 can tell Xen about hot-plugged host bridges found in the ACPI
namespace.
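
For reference, the current interface is small; quoting the struct from
xen/include/public/physdev.h from memory, so double-check the exact layout:

struct physdev_pci_mmcfg_reserved {
    uint64_t address;    /* [IN] Start of the ECAM (MMCFG) window.        */
    uint16_t segment;    /* [IN] PCI segment group of the host bridge.    */
    uint8_t start_bus;   /* [IN] First bus decoded by the window.         */
    uint8_t end_bus;     /* [IN] Last bus decoded by the window.          */
    uint32_t flags;      /* [IN] Whether the region is properly reserved. */
};

Dom0 would issue one such call per host bridge it finds in the namespace, much
as x86 Dom0 already does for MMCFG regions discovered at runtime.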

> > 
> > > 
> > > Based on what Linux is currently doing, there are two kind of quirks:
> > >     * Accesses to the configuration space of certain sizes are not allowed
> > >     * A specific driver is necessary for driving the host bridge
> > 
> > Hm, so what are the issues that make these bridges need specific drivers?
> > 
> > This might be quite problematic if you also have to emulate this broken
> > behavior inside of Xen (because Dom0 is using a specific driver).
> 
> I am not expecting to emulate the configuration space access for DOM0. I
> know you mentioned that it would be necessary to hide PCI devices used by
> Xen (such as the UART) from DOM0, or for configuring MSI. But for ARM, the
> UART is integrated in the SOC and MSI will be configured through the
> interrupt controller.

Right, we certainly need to do it for x86, but I don't know the ARM
architecture well enough to tell whether that's needed or not. I'm also
wondering if having both Xen and Dom0 directly accessing the ECAM area is
fine, even if they use different cache mapping attributes?

> > > So Xen needs to rely on DOM0 to discover the host bridges and notify Xen
> > > with all the relevant informations. This will be done via a new hypercall
> > > PHYSDEVOP_pci_host_bridge_add. The layout of the structure will be:
> > > 
> > > struct physdev_pci_host_bridge_add
> > > {
> > >     /* IN */
> > >     uint16_t seg;
> > >     /* Range of bus supported by the host bridge */
> > >     uint8_t  bus_start;
> > >     uint8_t  bus_nr;
> > >     uint32_t res0;  /* Padding */
> > >     /* Information about the configuration space region */
> > >     uint64_t cfg_base;
> > >     uint64_t cfg_size;
> > > }
> > 
> > Why do you need the cfg_size attribute? Isn't it always going to be 4096
> > bytes in size?
> 
> The cfg_size is here to help us match the corresponding node in the device
> tree. The cfg_size may differ depending on how the hardware has implemented
> the access to the configuration space.

But certainly cfg_base needs to be aligned to PAGE_SIZE? And according to the
spec cfg_size cannot be bigger than 4KB (PAGE_SIZE), so in any case you will
end up mapping a whole 4KB page, because that's the minimum granularity of
the p2m?
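
For reference, with standard ECAM addressing each function's config space is
exactly a 4KB slice of the window, and the overall window size follows
directly from the bus range, roughly:

/* Standard ECAM layout: 1MB per bus, 32KB per device, 4KB per function. */
static inline uint64_t ecam_cfg_addr(uint64_t cfg_base, uint8_t bus_start,
                                     uint8_t bus, uint8_t dev, uint8_t fn,
                                     uint16_t reg)
{
    return cfg_base + (((uint64_t)(bus - bus_start) << 20) |
                       ((uint64_t)dev << 15) |
                       ((uint64_t)fn  << 12) |
                       (reg & 0xfff));
}

/* Size of the whole window, deducible from the bus range alone. */
static inline uint64_t ecam_cfg_size(uint8_t bus_nr)
{
    return (uint64_t)bus_nr << 20;
}

So for a fully ECAM-compatible bridge cfg_size looks redundant; it would only
carry information for the non-compliant ones.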

> But to be fair, I think we can deal without this property. For ACPI, the
> size will vary with the number of buses handled and can be deduced. For DT,
> the base address and bus range should be enough to find the associated node.
> 
> > 
> > If that field is removed you could use the PHYSDEVOP_pci_mmcfg_reserved
> > hypercalls.
> > 
> > > DOM0 will issue the hypercall PHYSDEVOP_pci_host_bridge_add for each host
> > > bridge available on the platform. When Xen is receiving the hypercall,
> > > the driver associated to the host bridge will be instantiated.
> > > 
> > > XXX: Shall we limit DOM0 the access to the configuration space from that
> > > moment?
> > 
> > Most definitely yes, you should instantiate an emulated bridge over the real
> > one, in order to proxy Dom0 accesses to the PCI configuration space. You for
> > example don't want Dom0 moving the position of the BARs of PCI devices
> > without Xen being aware (and properly changing the second stage translation).
> 
> The problem is on ARM we don't have a single way to access the configuration
> space. So we would need different emulators in Xen, which I don't like unless
> there is a strong reason to do it.
> 
> We could prevent DOM0 from modifying the position of the BARs after setup. I
> also remember you mentioned MSI configuration; for ARM this is done via the
> interrupt controller.
> 
> > 
> > > ## Discovering and register PCI
> > > 
> > > Similarly to x86, PCI devices will be discovered by DOM0 and register
> > > using the hypercalls PHYSDEVOP_pci_add_device or 
> > > PHYSDEVOP_manage_pci_add_ext.
> > 
> > Why do you need this? If you have access to the bridges you can scan them
> > from Xen and discover the devices AFAICT.
> 
> I am a bit confused. Are you saying that you plan to ditch them for PVH? If
> so, why are they called by Linux today?

I think we can get away with PHYSDEVOP_pci_mmcfg_reserved only, but maybe I'm
missing something. AFAICT Xen should be able to gather all the other data by
itself from the PCI config space once it knows the details about the host
bridge.
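
To make that concrete, the kind of scan I have in mind is roughly the
following, where pci_config_read32() and handle_new_device() are placeholders
for whatever accessor and registration code Xen ends up with:

/*
 * Rough sketch: probe every (bus, dev, fn) behind a host bridge by reading
 * the vendor/device ID word and registering whatever answers.
 */
static void scan_host_bridge(uint16_t seg, uint8_t bus_start, uint8_t bus_nr)
{
    for ( unsigned int bus = bus_start; bus < bus_start + bus_nr; bus++ )
        for ( unsigned int dev = 0; dev < 32; dev++ )
            for ( unsigned int fn = 0; fn < 8; fn++ )
            {
                uint32_t id = pci_config_read32(seg, bus, dev, fn, 0x00);

                if ( (id & 0xffff) == 0xffff )
                {
                    if ( fn == 0 )
                        break;      /* Empty slot. */
                    continue;       /* Function not implemented. */
                }

                handle_new_device(seg, bus, dev, fn);

                /* Stop at fn 0 unless the header type says multi-function. */
                if ( fn == 0 &&
                     !(pci_config_read32(seg, bus, dev, 0, 0x0c) & (1u << 23)) )
                    break;
            }
}

That would leave the add/remove hypercalls for the cases where Dom0 knows
something Xen cannot find out on its own (on x86 that's the NUMA node
information, for example).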

> > 
> > > By default all the PCI devices will be assigned to DOM0. So Xen would have
> > > to configure the SMMU and Interrupt Controller to allow DOM0 to use the
> > > PCI devices. As mentioned earlier, those subsystems will require the StreamID
> > > and DeviceID. Both can be deduced from the RID.
> > > 
> > > XXX: How to hide PCI devices from DOM0?
> > 
> > By adding the ACPI namespace of the device to the STAO and blocking Dom0
> > access to this device in the emulated bridge that Dom0 will have access to
> > (returning 0xFFFF when Dom0 tries to read the vendor ID from the PCI header).
> 
> Sorry I was not clear here. By hiding, I meant DOM0 not instantiating a
> driver (similarly to xen-pciback.hide). We still want DOM0 to access the PCI
> config space in order to reset the device. Unless you plan to import all the
> reset quirks in Xen?

I don't have a clear opinion here, and I don't know all the details of these
reset hacks.

Roger.


 

