Re: [Xen-devel] [RFC + Queries] Flow of PCI passthrough in ARM
On Wed, 1 Oct 2014, manish jaggi wrote:
> On 25 September 2014 15:57, Stefano Stabellini
> <stefano.stabellini@xxxxxxxxxxxxx> wrote:
> > On Thu, 25 Sep 2014, manish jaggi wrote:
> >> On 24 September 2014 19:40, Stefano Stabellini
> >> <stefano.stabellini@xxxxxxxxxxxxx> wrote:
> >> > CC'ing Matt and Dave at ARM for an opinion about device tree, SMMUs and
> >> > stream ids. See below.
> >> >
> >> > On Wed, 24 Sep 2014, manish jaggi wrote:
> >> >> On 22 September 2014 16:15, Stefano Stabellini
> >> >> <stefano.stabellini@xxxxxxxxxxxxx> wrote:
> >> >> > On Thu, 18 Sep 2014, manish jaggi wrote:
> >> >> >> Hi,
> >> >> >> Below is the flow I am working on. Please provide your comments; I
> >> >> >> have a couple of queries as well.
> >> >> >>
> >> >> >> a) Device tree has smmu nodes and each smmu node has the mmu-masters
> >> >> >> property.
> >> >> >> In our SoC DT the mmu-master is a pcie node in the device tree.
> >> >> >
> >> >> > Do you mean that both the smmu nodes and the pcie node have the
> >> >> > mmu-master property? The pcie node is the pcie root complex, right?
> >> >> >
> >> >> The pci node is the pcie root complex; the pci node is the mmu master in
> >> >> the smmu node.
> >> >>
> >> >> smmu1@0x8310,00000000 {
> >> >>         ...
> >> >>         mmu-masters = <&pcie1 0x100>;
> >> >> };
> >> >>
> >> >> >> b) Xen parses the device tree and prepares a list which stores the pci
> >> >> >> device tree node pointers. The order in the device tree is mapped to the
> >> >> >> segment number in subsequent calls, e.g. the 1st pci node found is
> >> >> >> segment 0, the 2nd is segment 1.
> >> >> >
> >> >> > What's a segment number? Something from the PCI spec?
> >> >> > If you have several pci nodes on device tree, does that mean that you
> >> >> > have several different pcie root complexes?
> >> >> >
> >> >> Yes.
> >> >> The segment is the pci root complex number.
> >> >>
> >> >> >> c) During SMMU init the pcie nodes in DT are saved as smmu masters.
> >> >> >
> >> >> > At this point you should also be able to find via DT the stream-id range
> >> >> > supported by each SMMU and program the SMMU with them, assigning
> >> >> > everything to dom0.
> >> >> Currently pcie enumeration is not done in xen, it is done in dom0.
> >> >
> >> > Yes, but we don't really need to walk any PCIe busses in order to
> >> > program the SMMU, right? We only need the requestor id and the stream id
> >> > ranges. We should be able to get them via device tree.
> >> >
> >> Yes, but I have a doubt here.
> >> Before booting dom0, for each smmu the mask in the SMR can be set to enable
> >> stream ids for dom0.
> >> This can be fixed or read from the device tree.
> >> There are 2 points here:
> >> a) PCI bus enumeration
> >> b) Programming the SMMU for dom0
> >> For (b) the enumeration is not required provided we set the mask.
> >> So are you also saying that (a) should be done in Xen and not in dom0?
> >> If yes, how would dom0 get to know about PCIe EPs from its device tree?
> >
> > No, I think that doing (a) via PHYSDEVOP_pci_device_add is OK.
> > I am saying that we should consider doing (b) in Xen before booting
> > dom0.
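For reference, a minimal sketch of what step (a) looks like from the dom0
side, modelled on the existing x86 flow. The structure layout follows Xen's
public physdev.h, but the function name and surrounding code are illustrative
only, not a verbatim copy of the Linux driver:

    /* Sketch: report one enumerated PCI device to Xen (illustrative). */
    static int report_device_to_xen(struct pci_dev *pdev)
    {
        struct physdev_pci_device_add add = {
            .seg   = pci_domain_nr(pdev->bus),  /* "segment" = root complex number */
            .bus   = pdev->bus->number,
            .devfn = pdev->devfn,
        };

        /* Xen's handler ends up in iommu_ops->add_device, which in the flow
         * described above uses .seg to find the pcie DT node and hence the
         * SMMU to program for this device. */
        return HYPERVISOR_physdev_op(PHYSDEVOP_pci_device_add, &add);
    }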
> >> >> >> d) Dom0 enumerates PCI devices and calls the hypercall
> >> >> >> PHYSDEVOP_pci_device_add.
> >> >> >> - In Xen the SMMU iommu_ops add_device is called. I have implemented
> >> >> >> the add_device function.
> >> >> >> - In the add_device function the segment number is used to locate the
> >> >> >> device tree node pointer of the pcie node, which helps to find the
> >> >> >> corresponding smmu.
> >> >> >> - In the same PHYSDEVOP the BAR regions are mapped to Dom0.
> >> >> >>
> >> >> >> Note: The current SMMU driver maps the complete domain's address space
> >> >> >> for the device in SMMU hardware.
> >> >> >>
> >> >> >> The above flow works currently for us.
> >> >> >
> >> >> > It would be nice to be able to skip d): in a system where all dma capable
> >> >> > devices are behind smmus, we should be capable of booting dom0 without
> >> >> > the 1:1 mapping hack. If we do that, it would be better to program the
> >> >> > smmus before booting dom0. Otherwise there is a risk that dom0 is going
> >> >> > to start using these devices and doing dma before we manage to secure
> >> >> > the devices via smmus.
> >> >> >
> >> >> In our current case we are programming the smmu in the
> >> >> PHYSDEVOP_pci_device_add flow, so before domain 0 accesses the
> >> >> device it is mapped; otherwise xen gets an SMMU fault.
> >> >
> >> > Good.
> >> >
> >> >
> >> >> > Of course we can do that if there are no alternatives. But in our case
> >> >> > we should be able to extract the stream-ids from the device tree and program
> >> >> > the smmus right away, right? Do we really need to wait for dom0 to call
> >> >> > PHYSDEVOP_pci_device_add? We could just assign everything to dom0 for a
> >> >> > start.
> >> >> >
> >> >> We cannot get the streamid from the device tree, as it is only known
> >> >> after enumeration.
> >> >
> >> > I am not sure what the current state of the device tree spec is, but I
> >> > am pretty sure that the intention is to express stream id and requestor
> >> > id ranges directly in the dts so that the SMMU can be programmed right
> >> > away without walking the PCI bus.
> >> >
> >> >
> >> >> > I would like to know from the x86 guys if this is really how it is
> >> >> > supposed to work on PVH too. Do we rely on PHYSDEVOP_pci_device_add to
> >> >> > program the IOMMU?
> >> >> >
> >> >> >
> >> >> I was waiting but no one has commented.
> >> >
> >> > Me too. Everybody is very busy at the moment with the 4.5 release.
> >> >
> >> >
> >> >> >> Now when I call pci-assignable-add I see that the iommu_ops
> >> >> >> remove_device in the smmu driver is not called. If that is not called,
> >> >> >> the SMMU would still have the dom0 address space mappings for that device.
> >> >> >>
> >> >> >> Can you please suggest the best place (kernel / xl-tools) to put the
> >> >> >> code which would call the remove_device in iommu_ops in the control
> >> >> >> flow from pci-assignable-add.
> >> >> >>
> >> >> >> One way I see is to introduce a DOMCTL_iommu_remove_device in
> >> >> >> pci-assignable-add / pci-detach and a DOMCTL_iommu_add_device in
> >> >> >> pci-attach. Is that a valid approach?
> >> >> >
> >> >> > I am not 100% sure, but I think that before assigning a PCI device to
> >> >> > another guest, you are supposed to bind the device to xen-pciback (see
> >> >> > drivers/xen/xen-pciback, also see
> >> >> > http://wiki.xen.org/wiki/Xen_PCI_Passthrough). The pciback driver is
> >> >> > going to hide the device from dom0 and as a consequence
> >> >> > drivers/xen/pci.c:xen_remove_device ends up being called, which issues a
> >> >> > PHYSDEVOP_pci_device_remove hypercall.
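For completeness, the teardown counterpart of the earlier sketch, i.e. what
reaches Xen once pciback hides the device from dom0. Again this is
illustrative, not a verbatim copy of drivers/xen/pci.c, and the function name
is made up:

    /* Sketch: tell Xen a device is gone from dom0 so it can drop the
     * dom0-side SMMU/IOMMU mappings before the device is reassigned. */
    static int report_device_removal_to_xen(struct pci_dev *pdev)
    {
        struct physdev_pci_device device = {
            .seg   = pci_domain_nr(pdev->bus),
            .bus   = pdev->bus->number,
            .devfn = pdev->devfn,
        };

        return HYPERVISOR_physdev_op(PHYSDEVOP_pci_device_remove, &device);
    }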
> >> >> xen_remove_device is not called at all; in pci-attach
> >> >> iommu_ops->assign_device is called.
> >> >> In Xen the nomenclature is confusing and there are no comments in
> >> >> iommu.h:
> >> >> iommu_ops.add_device is called when dom0 issues the hypercall
> >> >> iommu_ops.assign_dt_device is called when xen attaches a device tree
> >> >> device to dom0
> >> >> iommu_ops.assign_device is called when xl pci-attach is called
> >> >> iommu_ops.reassign_device is called when xl pci-detach is called
> >> >>
> >> >> As of now we are able to assign devices to domU and the driver in domU is
> >> >> running, but we did some hacks, like:
> >> >> a) in the xen pcifront driver, bus->msi is assigned to the ITS msi_chip
> >> >>
> >> >> ---- pcifront_scan_root()
> >> >>     ...
> >> >>     b = pci_scan_bus_parented(&pdev->xdev->dev, bus,
> >> >>                               &pcifront_bus_ops, sd);
> >> >>     if (!b) {
> >> >>         dev_err(&pdev->xdev->dev,
> >> >>                 "Error creating PCI Frontend Bus!\n");
> >> >>         err = -ENOMEM;
> >> >>         pci_unlock_rescan_remove();
> >> >>         goto err_out;
> >> >>     }
> >> >>
> >> >>     bus_entry->bus = b;
> >> >> +   msi_node = of_find_compatible_node(NULL, NULL, "arm,gic-v3-its");
> >> >> +   if (msi_node) {
> >> >> +       b->msi = of_pci_find_msi_chip_by_node(msi_node);
> >> >> +       if (!b->msi) {
> >> >> +           printk(KERN_ERR "Unable to find bus->msi node\n");
> >> >> +           goto err_out;
> >> >> +       }
> >> >> +   } else {
> >> >> +       printk(KERN_ERR "Unable to find arm,gic-v3-its compatible node\n");
> >> >> +       goto err_out;
> >> >> +   }
> >> >
> >> > It seems to me that of_pci_find_msi_chip_by_node should be called by
> >> > common code somewhere else. Maybe people at linux-arm would know where
> >> > to suggest this initialization should go.
> >> >
> >> This is a workaround to attach an msi-controller to the xen pcifront bus.
> >> We are avoiding the xen frontend ops for msi.
> >
> > I think I would need to see a proper patch series to really evaluate this
> > change.
> >
> >
> >> >
> >> >> ----
> >> >>
> >> >> Using this, the ITS emulation code in xen is able to trap ITS command
> >> >> writes by the driver.
> >> >> But we are facing a problem now, where your help is needed.
> >> >>
> >> >> The StreamID is generated from segment:bus:device:function, which is
> >> >> fed as the DevID in ITS commands. In Dom0 the streamID is correctly
> >> >> generated, but in domU the StreamID for a passthrough device is
> >> >> 0:0:0:0; when emulating this in Xen it is a problem, as xen does not
> >> >> know how to get the physical stream id.
> >> >>
> >> >> (Eg: xl pci-attach 1 0001:00:05.0
> >> >> DomU has the device but in DomU the id is 0000:00:00.0.)
> >> >>
> >> >> Could you suggest how to go about this.
> >> >
> >> > I don't think that the ITS patches have been posted yet, so it is
> >> > difficult for me to understand the problem and propose a solution.
> >>
> >> Put more simply, it is about what StreamID a driver running in
> >> domU sees, which is programmed into the ITS commands,
> >> and how to map the domU streamID to the actual streamID in Xen when the
> >> ITS command write traps.
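To make the mismatch concrete: the DeviceID that ends up in the ITS commands
(and the StreamID the SMMU sees) is normally just the PCI requester id derived
from segment/bus/devfn, so dom0 and the guest compute different values for the
same physical device. A small illustrative helper follows; the exact encoding
is SoC and root complex specific, so treat it as an assumption:

    #include <stdint.h>

    /* Illustrative only: common scheme for forming an ITS DeviceID /
     * SMMU StreamID from a PCI requester id. */
    static inline uint32_t requester_id(uint16_t seg, uint8_t bus, uint8_t devfn)
    {
        return ((uint32_t)seg << 16) | ((uint32_t)bus << 8) | devfn;
    }

    /* dom0 sees 0001:00:05.0 -> requester_id(1, 0, 0x28) = 0x10028
     * domU sees 0000:00:00.0 -> requester_id(0, 0, 0x00) = 0x00000
     * so the value the guest driver programs into the ITS has to be
     * translated back to the physical id somewhere. */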
> > Wouldn't it be possible to pass the correct StreamID to DomU via device
> > tree? Does it really need to match the PCI BDF?
>
> Device Tree provides a static mapping; runtime attaching a device (using
> the xl tools) to a domU is what I am working on.

As I wrote before, it is difficult to answer without the patches and/or a
design document.

You should be able to specify StreamID ranges in Device Tree to cover a bus.
So you should be able to say that the virtual PCI bus in the guest has
StreamIDs [0-8] for slots [0-8]. Then in your example below you need to make
sure to insert the passthrough device in virtual slot 1 instead of virtual
slot 0. I don't know if you were aware of this, but you can already specify
the virtual slot number to pci-attach, see:

    xl pci-attach --help

Otherwise you could let the frontend know the StreamID via xenbus: the
backend should know the correct StreamID for the device, and it could just
add it to xenstore as a new parameter for the frontend.

Either way you should be able to tell the frontend what the right StreamID
for the device is.

> > Otherwise, if the command traps into Xen, couldn't Xen do the translation?
>
> Xen does not know how to map the BDF in domU to the actual streamID.
>
> I had thought of adding a hypercall, issued when xl pci-attach is called:
>
> PHYSDEVOP_map_streamid {
>     dom_id,
>     phys_streamid,  // bdf
>     guest_streamid,
> }
>
> But I am not able to get the correct BDF of domU.

I don't think that a hypercall is a good way to solve this.

> For instance, the logs at 2 different places give different BDFs:
>
> # xl pci-attach 1 '0002:01:00.1,permissive=1'
>
> xen-pciback pci-1-0: xen_pcibk_export_device exporting dom 2 bus 1 slot 0 func 1
> xen_pciback: vpci: 0002:01:00.1: assign to virtual slot 1
> xen_pcibk_publish_pci_dev 0000:00:01.00
>
> Code that generated the print:
>
> static int xen_pcibk_publish_pci_dev(struct xen_pcibk_device *pdev,
>                                      unsigned int domain, unsigned int bus,
>                                      unsigned int devfn, unsigned int devid)
> {
>     ...
>     printk(KERN_ERR "%s %04x:%02x:%02x.%02x", __func__, domain, bus,
>            PCI_SLOT(devfn), PCI_FUNC(devfn));
>
> While in xen_pcibk_do_op the print is:
>
> xen_pcibk_do_op Guest SBDF=0:0:1.1 (this is the output of lspci in domU)
>
> Code that generated the print:
>
> void xen_pcibk_do_op(struct work_struct *data)
> {
>     ...
>     if (dev == NULL)
>         op->err = XEN_PCI_ERR_dev_not_found;
>     else {
>         printk(KERN_ERR "%s Guest SBDF=%d:%d:%d.%d\n", __func__,
>                op->domain, op->bus, op->devfn >> 3, op->devfn & 0x7);
>
> Stefano, I need your help in this.
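For the xenbus option above, a rough sketch of what the backend side could
look like. This is not existing pciback code: the helper name and the
"stream-id-%u" key are made up for illustration, and the physical StreamID is
assumed to already be known to the backend:

    /* Publish the physical requester id / StreamID of an exported device
     * in xenstore, so the frontend can use it for its ITS/MSI setup
     * instead of the virtual BDF it sees on the frontend bus. */
    static int xen_pcibk_publish_stream_id(struct xen_pcibk_device *pdev,
                                           unsigned int devid, u32 stream_id)
    {
        char node[32];

        snprintf(node, sizeof(node), "stream-id-%u", devid);
        return xenbus_printf(XBT_NIL, pdev->xdev->nodename, node,
                             "%u", stream_id);
    }

The frontend could then read the value back with xenbus_scanf() when it sets
up the device, instead of deriving it from the virtual BDF.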
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel