Re: [Xen-devel] PCI Pass-through in Xen ARM: Draft 4
CC'ing a few other x86 people as this is likely to be the same approach
that will be taken by PVH.
On Thu, 13 Aug 2015, Manish Jaggi wrote:
> -----------------------------
> | PCI Pass-through in Xen ARM |
> -----------------------------
> manish.jaggi@xxxxxxxxxxxxxxxxxx
> -------------------------------
>
> Draft-4
>
>
> -----------------------------------------------------------------------------
> Introduction
> -----------------------------------------------------------------------------
> This document describes the design for the PCI passthrough support in Xen
> ARM. The target system is an ARM 64bit SoC with GICv3 and SMMU v2 and PCIe
> devices.
>
> -----------------------------------------------------------------------------
> Revision History
> -----------------------------------------------------------------------------
> Changes from Draft-1:
> ---------------------
> a) map_mmio hypercall removed from earlier draft
> b) device bar mapping into guest not 1:1
> c) Reserved Area in guest address space for mapping PCI-EP BARs in Stage2.
> d) Xenstore Update: For each PCI-EP BAR (IPA-PA mapping info).
>
> Changes from Draft-2:
> ---------------------
> a) DomU boot information updated with boot-time device assignment and
> hotplug.
> b) SMMU description added
> c) Mapping between streamID - bdf - deviceID.
> d) assign_device hypercall to include virtual(guest) sbdf.
> Toolstack to generate guest sbdf rather than pciback.
>
> Changes from Draft-3:
> ---------------------
> a) Fixed typos and added more description
> b) NUMA and PCI passthrough description removed for now.
> c) Added example from Ian's Mail
>
> -----------------------------------------------------------------------------
> Index
> -----------------------------------------------------------------------------
> (1) Background
>
> (2) Basic PCI Support in Xen ARM
> (2.1) pci_hostbridge and pci_hostbridge_ops
> (2.2) PHYSDEVOP_HOSTBRIDGE_ADD hypercall
> (2.3) XEN Internal API
>
> (3) SMMU programming
> (3.1) Additions for PCI Passthrough
> (3.2) Mapping between streamID - deviceID - pci sbdf - requesterID
>
> (4) Assignment of PCI device
> (4.1) Dom0
> (4.1.1) Stage 2 Mapping of GITS_ITRANSLATER space (64k)
> (4.1.1.1) For Dom0
> (4.1.1.2) For DomU
> (4.1.1.2.1) Hypercall Details: XEN_DOMCTL_get_itranslater_space
>
> (4.2) DomU
> (4.2.1) Reserved Areas in guest memory space
> (4.2.2) Xenstore Update: For each PCI-EP BAR (IPA-PA mapping info).
> (4.2.3) Hypercall Modification for bdf mapping notification to xen
>
> (5) DomU FrontEnd Bus Changes
> (5.1) Change in Linux PCI frontend bus and gicv3-its node binding for domU
>
> (6) Glossary
>
> (7) References
> -----------------------------------------------------------------------------
>
> 1. Background of PCI passthrough
> -----------------------------------------------------------------------------
> Passthrough refers to assigning a PCI device to a guest domain (domU) such
> that the guest has full control over the device. The MMIO space / interrupts
> are managed by the guest itself, close to how a bare kernel manages a device.
>
> A device's access to the guest address space needs to be isolated and
> protected. The SMMU (System MMU, the IOMMU on ARM) is programmed by the Xen
> hypervisor to allow the device to access guest memory for data transfers and
> to send MSI/MSI-X interrupts. The message signalled interrupt writes
> generated by PCI devices target guest addresses, so they are also translated
> by the SMMU.
>
> For this reason the GITS_ITRANSLATER space (the Interrupt Translation
> Register space of the ITS) is mapped into the guest address space.
>
> 2. Basic PCI Support for ARM
> -----------------------------------------------------------------------------
> The APIs to read/write the PCI configuration space are based on segment:bdf.
> How the sbdf is mapped to a physical address is the responsibility of the
> PCI host controller.
>
> ARM PCI support in Xen introduces a PCI host controller abstraction similar
> to the one in Linux. Host controller drivers register callbacks, which are
> invoked when the compatible property of a pci device tree node matches.
>
> Note: since pci devices are enumerated rather than described in the device
> tree, the pci node in the device tree refers to the host controller.
>
> (TODO: ACPI support is not covered yet.)
>
> 2.1 pci_hostbridge and pci_hostbridge_ops
> -----------------------------------------------------------------------------
> The init function of a PCI host controller driver calls the following to
> register its hostbridge callbacks:
>
> int pci_hostbridge_register(pci_hostbridge_t *pcihb);
>
> struct pci_hostbridge_ops {
>     u32 (*pci_conf_read)(struct pci_hostbridge *, u32 bus, u32 devfn,
>                          u32 reg, u32 bytes);
>     void (*pci_conf_write)(struct pci_hostbridge *, u32 bus, u32 devfn,
>                            u32 reg, u32 bytes, u32 val);
> };
>
> struct pci_hostbridge {
>     u32 segno;
>     paddr_t cfg_base;
>     paddr_t cfg_size;
>     struct dt_device_node *dt_node;
>     struct pci_hostbridge_ops ops;
>     struct list_head list;
> };
>
> A PCI conf_read function would internally look as follows:
>
> u32 pcihb_conf_read(u32 seg, u32 bus, u32 devfn, u32 reg, u32 bytes)
> {
>     pci_hostbridge_t *pcihb;
>
>     list_for_each_entry(pcihb, &pci_hostbridge_list, list)
>     {
>         if ( pcihb->segno == seg )
>             return pcihb->ops.pci_conf_read(pcihb, bus, devfn, reg, bytes);
>     }
>     return -1;
> }
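>
> For illustration, a minimal sketch of an ECAM-style host controller driver
> registering with this framework is given below. The ecam_* names, the use of
> a single statically mapped window and the probe flow are assumptions made
> for this sketch, not part of the proposal; only pci_hostbridge_register()
> and struct pci_hostbridge come from the design above.
>
> static void __iomem *cfg_va;          /* ioremap'd ECAM window (assumed) */
> static pci_hostbridge_t ecam_hb;
>
> static u32 ecam_conf_read(struct pci_hostbridge *pcihb, u32 bus, u32 devfn,
>                           u32 reg, u32 bytes)
> {
>     /* ECAM offset: bus[27:20] devfn[19:12] reg[11:0] */
>     void __iomem *addr = cfg_va + (bus << 20) + (devfn << 12) + reg;
>
>     switch ( bytes )
>     {
>     case 1:  return readb(addr);
>     case 2:  return readw(addr);
>     default: return readl(addr);
>     }
> }
>
> static void ecam_conf_write(struct pci_hostbridge *pcihb, u32 bus, u32 devfn,
>                             u32 reg, u32 bytes, u32 val)
> {
>     void __iomem *addr = cfg_va + (bus << 20) + (devfn << 12) + reg;
>
>     switch ( bytes )
>     {
>     case 1:  writeb(val, addr); break;
>     case 2:  writew(val, addr); break;
>     default: writel(val, addr); break;
>     }
> }
>
> static int ecam_hostbridge_init(struct dt_device_node *node)
> {
>     u64 addr, size;
>
>     /* Configuration space base/size taken from the node's "reg" property. */
>     if ( dt_device_get_address(node, 0, &addr, &size) )
>         return -1;
>
>     ecam_hb.dt_node  = node;
>     ecam_hb.cfg_base = addr;
>     ecam_hb.cfg_size = size;
>     ecam_hb.ops.pci_conf_read  = ecam_conf_read;
>     ecam_hb.ops.pci_conf_write = ecam_conf_write;
>     cfg_va = ioremap_nocache(addr, size);
>
>     return pci_hostbridge_register(&ecam_hb);
> }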
>
> 2.2 PHYSDEVOP_pci_host_bridge_add hypercall
> -----------------------------------------------------------------------------
> Xen accesses the PCI configuration space based on the sbdf received from the
> guest. The order in which the pci device tree nodes appear may not match the
> order in which dom0 enumerates the devices. Thus there needs to be a
> mechanism to bind the segment number assigned by dom0 to the corresponding
> pci host controller. The following hypercall is introduced:
>
> #define PHYSDEVOP_pci_host_bridge_add <<>>
>
> struct physdev_pci_host_bridge_add {
>     /* IN */
>     uint16_t seg;
>     uint64_t cfg_base;
>     uint64_t cfg_size;
> };
>
> This hypercall is invoked before dom0 invokes the PHYSDEVOP_pci_device_add
> hypercall.
>
> To explain the requirement in detail, Ian's example is quoted below:
> -- Ref: [1]
> Imagine we have two PCI host bridges, one with CFG space at 0xA0000000 and
> a second with CFG space at 0xB0000000.
>
> Xen discovers these and assigns segment 0=0xA0000000 and segment
> 1=0xB0000000.
>
> Dom0 discovers them too but assigns segment 1=0xA0000000 and segment
> 0=0xB0000000 (i.e. the other way).
>
> Now Dom0 makes a hypercall referring to a device as (segment=1,BDF), i.e.
> the device with BDF behind the root bridge at 0xA0000000. (Perhaps this is
> the PHYSDEVOP_manage_pci_add_ext call).
>
> But Xen thinks it is talking about the device with BDF behind the root
> bridge at 0xB0000000 because Dom0 and Xen do not agree on what the segments
> mean. Now Xen will use the wrong device ID in the IOMMU (since that is
> associated with the host bridge), or poke the wrong configuration space, or
> whatever.
>
> Or maybe Xen chose 42=0xB0000000 and 43=0xA0000000 so when Dom0 starts
> talking about segment=0 and =1 it has no idea what is going on.
>
> PHYSDEVOP_pci_host_bridge_add is intended to allow Dom0 to say "Segment 0
> is the host bridge at 0xB0000000" and "Segment 1 is the host bridge at
> 0xA0000000". With this there is no confusion between Xen and Dom0 because
> Xen isn't picking a segment ID, it is being told what it is by Dom0 which
> has done the picking.
> --
>
> The hypercall handler invokes the following to update the segment number in
> the matching pci_hostbridge:
>
> int pci_hostbridge_setup(uint32_t segno, uint64_t cfg_base,
>                          uint64_t cfg_size);
>
> Subsequent calls to pci_conf_read/write are completed by the
> pci_hostbridge_ops of the respective pci_hostbridge.
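>
> A possible implementation of pci_hostbridge_setup() is sketched below: it
> matches the host bridge by the configuration space base/size reported by
> dom0 and records the segment number dom0 has chosen. The exact error
> handling is an assumption.
>
> int pci_hostbridge_setup(uint32_t segno, uint64_t cfg_base, uint64_t cfg_size)
> {
>     pci_hostbridge_t *pcihb;
>
>     list_for_each_entry(pcihb, &pci_hostbridge_list, list)
>     {
>         if ( pcihb->cfg_base == cfg_base && pcihb->cfg_size == cfg_size )
>         {
>             /* Bind the dom0-assigned segment number to this host bridge. */
>             pcihb->segno = segno;
>             return 0;
>         }
>     }
>     return -ENODEV;
> }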
>
> 2.3 XEN Internal API
> -----------------------------------------------------------------------------
> a) pci_hostbridge_dt_node
>
> struct dt_device_node* pci_hostbridge_dt_node(uint32_t segno);
>
> Returns the device tree node pointer of the pci node bound to the given
> segment number. The API can only be called after pci_hostbridge_setup has
> been invoked for that segment.
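>
> A minimal sketch, following the same list-lookup pattern as pcihb_conf_read()
> above:
>
> struct dt_device_node *pci_hostbridge_dt_node(uint32_t segno)
> {
>     pci_hostbridge_t *pcihb;
>
>     list_for_each_entry(pcihb, &pci_hostbridge_list, list)
>         if ( pcihb->segno == segno )
>             return pcihb->dt_node;
>
>     return NULL;
> }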
>
> 3. SMMU programming
> -----------------------------------------------------------------------------
>
> 3.1. Additions for PCI Passthrough
> -----------------------------------------------------------------------------
>
> 3.1.1 - add_device in iommu_ops is implemented.
> -----------------------------------------------------------------------------
>
> This is called when PHYSDEVOP_pci_device_add / PHYSDEVOP_manage_pci_add_ext
> is called from dom0.
>
> .add_device = arm_smmu_add_dom0_dev,
>
> static int arm_smmu_add_dom0_dev(u8 devfn, struct device *dev)
> {
>     if ( dev_is_pci(dev) )
>     {
>         struct pci_dev *pdev = to_pci_dev(dev);
>
>         return arm_smmu_assign_dev(pdev->domain, devfn, dev);
>     }
>     return -1;
> }
>
> 3.1.2 - remove_device in iommu_ops is implemented.
> -----------------------------------------------------------------------------
> This is called when PHYSDEVOP_pci_device_remove is called from dom0/domU.
>
> .remove_device = arm_smmu_remove_dev.
> TODO: add implementation details of arm_smmu_remove_dev.
>
> 3.1.3 dev_get_dev_node is modified for pci devices.
> -----------------------------------------------------------------------------
> The function is modified to return the dt_node of the pci hostbridge from
> the device tree. This is required because devices that are not described in
> the device tree (PCI devices are enumerated) need a way to find which smmu
> they are attached to.
>
> static struct arm_smmu_device *find_smmu_for_device(struct device *dev)
> {
>     struct device_node *dev_node = dev_get_dev_node(dev);
>     ....
>
> static struct device_node *dev_get_dev_node(struct device *dev)
> {
>     if ( dev_is_pci(dev) )
>     {
>         struct pci_dev *pdev = to_pci_dev(dev);
>
>         return pci_hostbridge_dt_node(pdev->seg);
>     }
>     ...
>
>
> 3.2. Mapping between streamID - deviceID - pci sbdf - requesterID
> -----------------------------------------------------------------------------
> In the simple case all of these are equal to the BDF. However, some devices
> use the wrong requester ID for DMA transactions; the Linux kernel carries
> PCI quirks for them. Whether the same quirks should be implemented in Xen or
> a different approach should be taken is a TODO.
>
> Until then, the basic implementation assumes that all of these IDs are equal
> to the BDF, as sketched below.
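>
> A minimal sketch of this 1:1 assumption follows. The helper names are
> illustrative only, not existing Xen APIs, and the sbdf encoding (segment in
> bits 31:16, BDF in bits 15:0) is assumed to match the machine_sbdf layout
> used later in this document.
>
> static inline uint16_t bdf_from_sbdf(uint32_t sbdf)
> {
>     return sbdf & 0xffff;               /* bus[15:8] dev[7:3] fn[2:0] */
> }
>
> /* requesterID seen by the SMMU: identity-mapped to the BDF for now.
>  * Devices with broken requester IDs (PCI quirks) are a TODO. */
> static inline uint32_t requester_id_from_sbdf(uint32_t sbdf)
> {
>     return bdf_from_sbdf(sbdf);
> }
>
> /* deviceID presented to the ITS: same identity mapping. */
> static inline uint32_t its_device_id_from_sbdf(uint32_t sbdf)
> {
>     return bdf_from_sbdf(sbdf);
> }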
>
> 4. Assignment of PCI device
> -----------------------------------------------------------------------------
>
> 4.1 Dom0
> -----------------------------------------------------------------------------
> All PCI devices are assigned to dom0 unless hidden via the pciback.hide boot
> argument in dom0. Dom0 enumerates the PCI devices. For each device the MMIO
> space has to be mapped in dom0's stage 2 translation. For dom0, Xen maps the
> ranges from the device tree pci nodes into the stage 2 translation during
> boot.
>
> While processing the PHYSDEVOP_pci_device_add hypercall,
> its_add_device(machine_sbdf) should be called, as sketched below. This
> allocates the ITS-specific data structures for the device. (Reference [2])
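>
> A sketch of that call site, under the assumption that a wrapper like the one
> below is invoked from the PHYSDEVOP_pci_device_add handler once the device
> has been added to Xen's PCI device list; its_add_device() itself comes from
> [2], the wrapper name and sbdf encoding are assumptions.
>
> static int pci_device_added_notify_its(uint16_t seg, uint8_t bus, uint8_t devfn)
> {
>     /* machine_sbdf: segment in bits 31:16, bus in 15:8, devfn in 7:0 */
>     uint32_t machine_sbdf = ((uint32_t)seg << 16) | (bus << 8) | devfn;
>
>     /* Allocate the per-device ITS structures (device table entry, ITT). */
>     return its_add_device(machine_sbdf);
> }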
>
>
> 4.1.1 Stage 2 Mapping of GITS_ITRANSLATER space (64k)
> -----------------------------------------------------------------------------
>
> The GITS_ITRANSLATER space (64k) must be mapped in the stage 2 translation
> so that the SMMU can translate MSI(X) writes from the device using the page
> tables of the domain.
>
> 4.1.1.1 For Dom0
> -----------------------------------------------------------------------------
> The GITS_ITRANSLATER address space is mapped 1:1 during dom0 boot; this
> mapping is done in the vgic driver. For domU the mapping is done by the
> toolstack (see 4.1.1.2).
>
> 4.1.1.2 For DomU
> -----------------------------------------------------------------------------
> For domU, while creating the domain, the toolstack reads the IPA from the
> macro GITS_ITRANSLATER_SPACE in xen/include/public/arch-arm.h. The PA is
> obtained from a new hypercall which returns the PA of the GITS_ITRANSLATER
> space.
>
> Subsequently the toolstack issues a hypercall to create the stage 2 mapping,
> as sketched after the structure below.
>
> Hypercall Details: XEN_DOMCTL_get_itranslater_space
>
> /* XEN_DOMCTL_get_itranslater_space */
> struct xen_domctl_get_itranslater_space {
>     /* OUT variables. */
>     uint64_aligned_t start_addr;
>     uint64_aligned_t size;
> };
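>
> A toolstack-side sketch (libxc) is given below. It assumes a wrapper
> xc_domain_get_itranslater_space() is added for the new domctl; the
> GITS_ITRANSLATER_SPACE macro is the guest IPA from
> xen/include/public/arch-arm.h as proposed above, while
> xc_domain_memory_mapping() and DPCI_ADD_MAPPING are existing libxc/Xen
> interfaces.
>
> #include <xenctrl.h>
>
> static int map_itranslater_space(xc_interface *xch, uint32_t domid)
> {
>     uint64_t pa, size;
>     int rc;
>
>     /* Hypothetical wrapper around XEN_DOMCTL_get_itranslater_space. */
>     rc = xc_domain_get_itranslater_space(xch, &pa, &size);
>     if ( rc )
>         return rc;
>
>     /* Stage 2 mapping: guest IPA of the ITRANSLATER area -> physical ITS frame. */
>     return xc_domain_memory_mapping(xch, domid,
>                                     GITS_ITRANSLATER_SPACE >> XC_PAGE_SHIFT,
>                                     pa >> XC_PAGE_SHIFT,
>                                     size >> XC_PAGE_SHIFT,
>                                     DPCI_ADD_MAPPING);
> }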
>
> 4.2 DomU
> -----------------------------------------------------------------------------
>
> 4.2.1 Mapping BAR regions in guest address space
> -----------------------------------------------------------------------------
> When a PCI-EP device is assigned to a domU, the toolstack reads the BAR
> registers from the pci configuration space. The toolstack allocates a
> virtual BAR region for each BAR region from an area reserved in the guest
> address space for mapping BARs, referred to as the Guest BAR area. This area
> is defined in public/arch-arm.h:
>
> /* For 32bit BARs*/
> #define GUEST_BAR_BASE_32 <<>>
> #define GUEST_BAR_SIZE_32 <<>>
>
> /* For 64bit BARs*/
> #define GUEST_BAR_BASE_64 <<>>
> #define GUEST_BAR_SIZE_64 <<>>
>
> The toolstack then invokes the xc_domain_memory_mapping domctl to create the
> stage 2 mapping. If a BAR address is 32-bit, the BASE_32 area is used,
> otherwise the 64-bit area. Support for a device requiring a combination of
> both is a TODO.
>
> The toolstack manages these areas and allocates from them. Allocation and
> deallocation are done using APIs similar to malloc and free; a minimal
> allocator sketch follows.
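>
> A minimal allocator sketch, under the assumption that a simple bump
> allocator per area is sufficient for a first implementation. The names are
> illustrative; a real implementation would also track frees for hot-unplug.
>
> struct guest_bar_area {
>     uint64_t base;      /* GUEST_BAR_BASE_32 or GUEST_BAR_BASE_64 */
>     uint64_t size;      /* GUEST_BAR_SIZE_32 or GUEST_BAR_SIZE_64 */
>     uint64_t next;      /* next free address, initialised to base */
> };
>
> /* PCI BARs are power-of-two sized and naturally aligned, so align the
>  * allocation to the BAR size. Returns 0 when the area is exhausted. */
> static uint64_t guest_bar_alloc(struct guest_bar_area *area, uint64_t bar_size)
> {
>     uint64_t addr = (area->next + bar_size - 1) & ~(bar_size - 1);
>
>     if ( addr + bar_size > area->base + area->size )
>         return 0;
>
>     area->next = addr + bar_size;
>     return addr;        /* virtual BAR (guest IPA) to be mapped in stage 2 */
> }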
>
> 4.2.2 Xenstore Update: For each PCI-EP BAR (IPA-PA mapping info).
> ----------------------------------------------------------------------------
> The toolstack also updates the xenstore information for the device
> (virtual BAR : physical BAR). This information is read by xen-pciback and
> returned to the xen-pcifront driver in domU when it reads the BAR values
> from the configuration space.
>
> Entries created are as follows:
> /local/domain/0/backend/pci/1/0
> vdev-N
> BDF = ""
> BAR-0-IPA = ""
> BAR-0-PA = ""
> BAR-0-SIZE = ""
> ...
> BAR-M-IPA = ""
> BAR-M-PA = ""
> BAR-M-SIZE = ""
>
> Note: if BAR-M-SIZE is 0, it is not a valid entry.
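>
> A sketch of the toolstack writing these keys via libxenstore is given below.
> The path layout follows the example entries above (assuming the BAR keys
> live under the vdev-N node); the helper name and the value formatting are
> assumptions.
>
> #include <inttypes.h>
> #include <stdbool.h>
> #include <stdio.h>
> #include <string.h>
> #include <xenstore.h>
>
> static bool write_bar_keys(struct xs_handle *xsh, const char *be_path,
>                            int vdev, int bar,
>                            uint64_t ipa, uint64_t pa, uint64_t size)
> {
>     const struct { const char *suffix; uint64_t v; } keys[] = {
>         { "IPA",  ipa  },
>         { "PA",   pa   },
>         { "SIZE", size },
>     };
>     char path[256], val[32];
>     unsigned int i;
>
>     for ( i = 0; i < sizeof(keys) / sizeof(keys[0]); i++ )
>     {
>         snprintf(path, sizeof(path), "%s/vdev-%d/BAR-%d-%s",
>                  be_path, vdev, bar, keys[i].suffix);
>         snprintf(val, sizeof(val), "0x%" PRIx64, keys[i].v);
>         if ( !xs_write(xsh, XBT_NULL, path, val, strlen(val)) )
>             return false;
>     }
>     return true;
> }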
>
> 4.2.3 Hypercall Modification (XEN_DOMCTL_assign_device)
> ----------------------------------------------------------------------------
> When a device is assigned to a domU, a guest sbdf needs to be generated for
> the machine sbdf. Currently this is done by xen-pciback. As per the
> discussions [4] on xen-devel, the sbdf generation should be done by the
> toolstack rather than by xen-pciback.
>
> Since there is only one pci-frontend bus in domU, the guest s:b:d.f is
> 0:0:d.f. It is proposed in this design document that the sbdf generation be
> done by the toolstack and that the xenstore keys also be created by the
> toolstack.
>
> Following guest_sbdf generation, the domctl to assign the device is invoked.
> This hypercall is updated to include the *guest_sbdf*. The Xen ITS driver
> can store the mapping domID : guest_sbdf : machine_sbdf and use it later.
>
> struct xen_domctl_assign_device {
>     uint32_t dev;   /* XEN_DOMCTL_DEV_* */
>     union {
>         struct {
>             uint32_t machine_sbdf;  /* machine PCI ID of assigned device */
>             uint32_t guest_sbdf;    /* guest PCI ID of assigned device */
>         } pci;
>         struct {
>             uint32_t size;                  /* Length of the path */
>             XEN_GUEST_HANDLE_64(char) path; /* path to the device tree node */
>         } dt;
>     } u;
> };
>
> In the handler of this hypercall, the internal API function
>
>     its_assign_device(domid, machine_sbdf, guest_sbdf)
>
> (reference [2]) is called; it stores the mapping between machine_sbdf and
> guest_sbdf. A toolstack-side sketch of the guest sbdf generation follows.
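>
> A toolstack-side sketch of the guest sbdf generation, assuming the guest
> segment and bus are always 0 (single pci-frontend bus) and virtual device
> slots are handed out sequentially. The macro, helper name and per-domain
> counter are illustrative assumptions.
>
> #define GUEST_SBDF(seg, bus, dev, fn) \
>     (((uint32_t)(seg) << 16) | ((bus) << 8) | ((dev) << 3) | (fn))
>
> static uint32_t next_virtual_dev;   /* per-domain counter kept by the toolstack */
>
> static uint32_t alloc_guest_sbdf(void)
> {
>     /* segment 0, bus 0, function 0, next free device slot */
>     return GUEST_SBDF(0, 0, next_virtual_dev++, 0);
> }
>
> /* The toolstack then invokes the extended assign_device domctl with both
>  * machine_sbdf and guest_sbdf so that Xen/ITS can record the mapping. */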
>
> 5. Change in Linux PCI frontend-backend driver for MSI/X programming
> -----------------------------------------------------------------------------
>
> 5.1 pci-frontend bus and gicv3-its node binding for domU
> -----------------------------------------------------------------------------
> It is assumed that the toolstack generates a gicv3-its node in the domU
> device tree. As of now the ARM PCI passthrough design supports device
> assignment only to guests that have gicv3-its support; PCI passthrough with
> a gicv2 guest is not supported.
>
> All the devices assigned to domU are enumerated on a PCI frontend bus. On
> ARM systems the interrupt parent of this bus is set to the gicv3-its. As the
> gicv3-its is emulated in Xen, all accesses by the domU driver are trapped.
> This allows configuration and direct injection of MSIs (LPIs) into the
> guest, so the frontend-backend communication path for MSI is no longer
> required.
>
> Frontend-backend communication is only required for dom0 to perform PCI
> configuration space accesses on behalf of domU.
>
> 6. Glossary
> -----------------------------------------------------------------------------
> MSI: Message Signalled Interrupt
> ITS: Interrupt Translation Service
> GIC: Generic Interrupt Controller
> LPI: Locality-specific Peripheral Interrupt
>
>
> 7. References
> -----------------------------------------------------------------------------
> [1]. http://osdir.com/ml/general/2015-08/msg15346.html
> [2]. http://lists.xen.org/archives/html/xen-devel/2015-07/msg01984.html
> [3]. http://xenbits.xen.org/people/ianc/vits/draftG.html
> [4]. http://lists.xen.org/archives/html/xen-devel/2015-07/msg05513.html
>