
Re: [Xen-devel] [RFC + Queries] Flow of PCI passthrough in ARM



On 25 September 2014 15:57, Stefano Stabellini
<stefano.stabellini@xxxxxxxxxxxxx> wrote:
> On Thu, 25 Sep 2014, manish jaggi wrote:
>> On 24 September 2014 19:40, Stefano Stabellini
>> <stefano.stabellini@xxxxxxxxxxxxx> wrote:
>> > CC'ing Matt and Dave at ARM for their opinion about device tree, SMMUs and
>> > stream ids. See below.
>> >
>> > On Wed, 24 Sep 2014, manish jaggi wrote:
>> >> On 22 September 2014 16:15, Stefano Stabellini
>> >> <stefano.stabellini@xxxxxxxxxxxxx> wrote:
>> >> > On Thu, 18 Sep 2014, manish jaggi wrote:
>> >> >> Hi,
>> >> >> Below is the flow I am working on. Please provide your comments; I
>> >> >> have a couple of queries as well.
>> >> >>
>> >> >> a) The device tree has smmu nodes, and each smmu node has the
>> >> >> mmu-masters property.
>> >> >> In our SoC DT the mmu-master is a pcie node in the device tree.
>> >> >
>> >> > Do you mean that both the smmu nodes and the pcie node have the
>> >> > mmu-master property? The pcie node is the pcie root complex, right?
>> >> >
>> >> The pci node is the pcie root complex, and it is listed as the
>> >> mmu-master in the smmu node.
>> >>
>> >>   smmu1@0x8310,00000000 {
>> >> ...
>> >>
>> >>                  mmu-masters = <&pcie1 0x100>;
>> >>          };
>> >>
>> >> >> b) Xen parses the device tree and prepares a list which stores the pci
>> >> >> device tree node pointers. The order in device tree is mapped to
>> >> >> segment numbers in subsequent calls, e.g. the 1st pci node found is
>> >> >> segment 0, the 2nd is segment 1.
>> >> >
>> >> > What's a segment number? Something from the PCI spec?
>> >> > If you have several pci nodes on device tree, does that mean that you
>> >> > have several different pcie root complexes?
>> >> >
>> >> Yes.
>> >> The segment is the PCI RC (root complex) number.
>> >> >
>> >> >> c) During SMMU init the pcie nodes in DT are saved as smmu masters.
>> >> >
>> >> > At this point you should also be able to find via DT the stream-id range
>> >> > supported by each SMMU and program the SMMU with them, assigning
>> >> > everything to dom0.
>> >> Currently pcie enumeration is not done in xen; it is done in dom0.
>> >
>> > Yes, but we don't really need to walk any PCIe busses in order to
>> > program the SMMU, right? We only need the requestor id and the stream id
>> > ranges. We should be able to get them via device tree.
>> >
>> Yes, but I have a doubt here.
>> Before booting dom0, the mask in each SMMU's SMR can be set so that its
>> stream IDs are routed to dom0.
>> This can be fixed or read from the device tree.
>> There are 2 points here:
>> a) PCI bus enumeration
>> b) Programming the SMMU for dom0
>> For (b) the enumeration is not required, provided we set the mask.
>> So are you also saying that (a) should be done in Xen and not in dom0?
>> If yes, how would dom0 get to know about PCIe EPs from its device tree?
>
> No, I think that doing (a) via PHYSDEVOP_pci_device_add is OK.
> I am saying that we should consider doing (b) in Xen before booting
> dom0.
>
>
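For (b) above, here is a minimal sketch of what I mean by "setting the mask":
given a power-of-two sized, aligned stream-id range read from the device tree,
one SMR can be programmed to match the whole range before dom0 boots. The
field layout is from my reading of the ARM SMMUv2 spec; treat this as
pseudocode, not the final driver change.

#include <stdint.h>

/* SMR = VALID | (MASK << 16) | ID; MASK bits set to 1 are ignored when
 * matching incoming stream IDs. */
#define SMR_VALID       (1U << 31)
#define SMR_MASK_SHIFT  16
#define SMR_ID_MASK     0x7fffU

/* Build the SMR value covering [base, base + size), size a power of two. */
static uint32_t smr_for_range(uint32_t base, uint32_t size)
{
    uint32_t mask = (size - 1) & SMR_ID_MASK;  /* bits to ignore on match */
    uint32_t id   = base & SMR_ID_MASK;

    return SMR_VALID | (mask << SMR_MASK_SHIFT) | id;
}

The corresponding S2CR would then be pointed at dom0's context bank, so every
stream ID in the range is translated through dom0's stage-2 tables from the
start.
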
>> >> >> d) Dom0 enumerates PCI devices and calls the hypercall
>> >> >> PHYSDEVOP_pci_device_add.
>> >> >> - In Xen the SMMU iommu_ops add_device is called. I have implemented
>> >> >> the add_device function.
>> >> >> - In the add_device function the segment number is used to locate the
>> >> >> device tree node pointer of the pcie node, which lets us find the
>> >> >> corresponding smmu.
>> >> >> - In the same PHYSDEVOP the BAR regions are mapped to Dom0.
>> >> >>
>> >> >> Note: the current SMMU driver maps the domain's complete address space
>> >> >> for the device in the SMMU hardware.
>> >> >>
>> >> >> The above flow works currently for us.
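
For reference, (d) boils down to roughly the following shape. The helper
names below (find_pci_host_node(), smmu_for_host(), smmu_attach_dev_to_domain())
are placeholders for illustration, not the actual functions in our tree, and
the signature is abridged.

#include <xen/device_tree.h>
#include <xen/errno.h>
#include <xen/sched.h>

/* Rough sketch of our add_device hook. */
static int add_device_sketch(uint16_t seg, uint8_t bus, uint8_t devfn)
{
    /* The segment was assigned from the order of pci nodes found in the
     * DT, so it indexes straight into the saved list of pcie nodes. */
    struct dt_device_node *pci_node = find_pci_host_node(seg);

    /* The smmu that lists this pcie node as mmu-master is the one that
     * translates DMA for the device. */
    struct arm_smmu_device *smmu = smmu_for_host(pci_node);
    if ( !smmu )
        return -ENODEV;

    /* Attach the device to dom0's context; the current driver maps the
     * domain's complete address space, so no per-BAR IOMMU work is needed
     * here (the BARs themselves are mapped to dom0 in the same PHYSDEVOP). */
    return smmu_attach_dev_to_domain(smmu, hardware_domain, bus, devfn);
}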
>> >> >
>> >> > It would be nice to be able to skip d): in a system where all dma 
>> >> > capable
>> >> > devices are behind smmus, we should be capable of booting dom0 without
>> >> > the 1:1 mapping hack. If we do that, it would be better to program the
>> >> > smmus before booting dom0. Otherwise there is a risk that dom0 is going
>> >> > to start using these devices and doing dma before we manage to secure
>> >> > the devices via smmus.
>> >> >
>> >> In our current case we program the smmu in the
>> >> PHYSDEVOP_pci_device_add flow, so the device is mapped before dom0
>> >> accesses it; otherwise xen gets an SMMU fault.
>> >
>> > Good.
>> >
>> >
>> >> > Of course we can do that if there are no alternatives. But in our case
>> >> > we should be able to extract the stream-ids from device tree and program
>> >> > the smmus right away, right?  Do we really need to wait for dom0 to call
>> >> > PHYSDEVOP_pci_device_add? We could just assign everything to dom0 for a
>> >> > start.
>> >> >
>> >> We cannot get the streamid from the device tree, as it is only
>> >> generated during enumeration.
>> >
>> > I am not sure what the current state of the device tree spec is, but I
>> > am pretty sure that the intention is to express stream id and requestor
>> > id ranges directly in the dts so that the SMMU can be programmed right
>> > away without walking the PCI bus.
>> >
>> >
>> >> > I would like to know from the x86 guys, if this is really how it is
>> >> > supposed to work on PVH too. Do we rely on PHYSDEVOP_pci_device_add to
>> >> > program the IOMMU?
>> >> >
>> >> >
>> >> I was waiting, but no one has commented.
>> >
>> > Me too. Everybody is very busy at the moment with the 4.5 release.
>> >
>> >
>> >> >> Now when I call pci-assignable-add I see that the iommu_ops
>> >> >> remove_device in the smmu driver is not called. If that is not called,
>> >> >> the SMMU would still have the dom0 address space mappings for that
>> >> >> device.
>> >> >>
>> >> >> Can you please suggest the best place (kernel / xl tools) to put the
>> >> >> code which would call remove_device in iommu_ops in the control
>> >> >> flow from pci-assignable-add?
>> >> >>
>> >> >> One way I see is to introduce a DOMCTL_iommu_remove_device in
>> >> >> pci-assignable-add / pci-detach and DOMCTL_iommu_add_device in
>> >> >> pci-attach. Is that a valid approach?
>> >> >
>> >> > I am not 100% sure, but I think that before assigning a PCI device to
>> >> > another guest, you are supposed to bind the device to xen-pciback (see
>> >> > drivers/xen/xen-pciback, also see
>> >> > http://wiki.xen.org/wiki/Xen_PCI_Passthrough). The pciback driver is
>> >> > going to hide the device from dom0 and, as a consequence,
>> >> > drivers/xen/pci.c:xen_remove_device ends up being called, which issues a
>> >> > PHYSDEVOP_pci_device_remove hypercall.
>> >>
>> >> xen_remove_device is not called at all; in pci-attach
>> >> iommu_ops->assign_device is called.
>> >> In Xen the nomenclature is confusing and there are no comments in iommu.h:
>> >> iommu_ops.add_device is called when dom0 issues the hypercall
>> >> iommu_ops.assign_dt_device is called when xen attaches a device tree
>> >> device to dom0
>> >> iommu_ops.assign_device is called when xl pci-attach is called
>> >> iommu_ops.reassign_device is called when xl pci-detach is called
>> >>
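To summarize that nomenclature in one place, a commented sketch (field names
as used in the hooks listed above; signatures abridged, not the full struct
from iommu.h):

struct iommu_ops_summary {
    /* Called when dom0 reports a device via PHYSDEVOP_pci_device_add. */
    int (*add_device)(/* ... */);
    /* Called when Xen hands a device tree device to dom0. */
    int (*assign_dt_device)(/* ... */);
    /* Called when "xl pci-attach" assigns a device to a domU. */
    int (*assign_device)(/* ... */);
    /* Called when "xl pci-detach" takes the device back. */
    int (*reassign_device)(/* ... */);
};
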
>> >> As of now we are able to assign devices to domU and the driver in domU
>> >> is running; we did some hacks, like:
>> >> a) in the xen pcifront driver, bus->msi is assigned to the ITS msi_chip
>> >>
>> >> ---- pcifront_scan_root()
>> >> ...
>> >> b = pci_scan_bus_parented(&pdev->xdev->dev, bus,
>> >>                   &pcifront_bus_ops, sd);
>> >>     if (!b) {
>> >>         dev_err(&pdev->xdev->dev,
>> >>             "Error creating PCI Frontend Bus!\n");
>> >>         err = -ENOMEM;
>> >>         pci_unlock_rescan_remove();
>> >>         goto err_out;
>> >>     }
>> >>
>> >>     bus_entry->bus = b;
>> >> +        msi_node = of_find_compatible_node(NULL, NULL, "arm,gic-v3-its");
>> >> +        if (msi_node) {
>> >> +            b->msi = of_pci_find_msi_chip_by_node(msi_node);
>> >> +            if (!b->msi) {
>> >> +                printk(KERN_ERR "Unable to find bus->msi node\r\n");
>> >> +                goto err_out;
>> >> +            }
>> >> +        } else {
>> >> +            printk(KERN_ERR "Unable to find arm,gic-v3-its compatible node\r\n");
>> >> +            goto err_out;
>> >> +        }
>> >
>> > It seems to be that of_pci_find_msi_chip_by_node should be called by
>> > common code somewhere else. Maybe people at linux-arm would know where
>> > to suggest this initialization should go.
>> >
>> This is a workaround to attach an msi-controller to the xen pcifront bus.
>> We are avoiding the xen frontend ops for msi.
>
> I think I would need to see a proper patch series to really evaluate this 
> change.
>
>
>> >
>> >> ----
>> >>
>> >> Using this, the ITS emulation code in xen is able to trap ITS command
>> >> writes by the driver.
>> >> But we are facing a problem now, where your help is needed.
>> >>
>> >> The StreamID is generated from segment:bus:device:function, which is
>> >> fed as the DevID in ITS commands. In Dom0 the streamID is correctly
>> >> generated, but in domU the StreamID for a passthrough device is
>> >> 0:0:0:0. When emulating this in Xen it is a problem, as xen does not
>> >> know how to get the physical stream id.
>> >>
>> >> (E.g. after "xl pci-attach 1 0001:00:05.0",
>> >> DomU has the device, but in DomU the id is 0000:00:00.0.)
>> >>
>> >> Could you suggest how to go about this?
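
For clarity, the derivation I am describing is the usual requester-id style
encoding, sketched below; whether the segment is folded into the DevID or
only selects the ITS is my assumption.

#include <stdint.h>

/* Sketch: StreamID/DevID derived from segment:bus:device.function.  For
 * dom0 the real BDF gives a usable ID; for a domU the virtual BDF
 * (0000:00:00.0 in the example above) gives a DevID that Xen cannot map
 * back to the physical device on its own. */
static uint32_t sbdf_to_devid(uint16_t seg, uint8_t bus, uint8_t devfn)
{
    return ((uint32_t)seg << 16) | ((uint32_t)bus << 8) | devfn;
}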
>> >
>> > I don't think that the ITS patches have been posted yet, so it is
>> > difficult for me to understand the problem and propose a solution.
>>
>> Put more simply, it is a question of what StreamID a driver running in
>> domU sees (which is what gets programmed into the ITS commands),
>> and how to map that domU streamID to the actual streamID in Xen when the
>> ITS command write traps.
>
> Wouldn't it be possible to pass the correct StreamID to DomU via device
> tree? Does it really need to match the PCI BDF?
The device tree provides a static mapping; runtime attaching of a device
(using xl tools) to a domU is what I am working on.

> Otherwise if the command trap into Xen, couldn't Xen do the translation?
Xen does not know how to map the BDF in domU to the actual streamID.

I had thought of adding a hypercall, issued when xl pci-attach is called:

PHYSDEVOP_map_streamid {
    dom_id,
    phys_streamid,  /* physical BDF */
    guest_streamid, /* BDF as seen by the domU */
}
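
As a sketch of how Xen could consume that (all of the below is the proposal,
none of it exists today; streamid_map_insert()/streamid_map_lookup() stand in
for whatever per-domain table we end up with):

/* What the proposed PHYSDEVOP_map_streamid would record per domain. */
struct streamid_map_entry {
    uint32_t guest_streamid;  /* DevID the domU puts in ITS commands */
    uint32_t phys_streamid;   /* real StreamID from the physical BDF */
};

/* Hypercall handler: remember the pair for this domain. */
static int map_streamid(struct domain *d, uint32_t guest_streamid,
                        uint32_t phys_streamid)
{
    return streamid_map_insert(d, guest_streamid, phys_streamid);
}

/* ITS command trap path: rewrite the DevID before the command is issued
 * to the physical ITS. */
static int translate_devid(const struct domain *d, uint32_t guest_streamid,
                           uint32_t *phys_streamid)
{
    return streamid_map_lookup(d, guest_streamid, phys_streamid);
}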

But I am not able to get the correct BDF of the domU.
For instance, the logs at 2 different places give different BDFs:

#xl pci-attach 1 '0002:01:00.1,permissive=1'

xen-pciback pci-1-0: xen_pcibk_export_device exporting dom 2 bus 1 slot 0 func 1
xen_pciback: vpci: 0002:01:00.1: assign to virtual slot 1
xen_pcibk_publish_pci_dev 0000:00:01.00

Code that generated the print:
static int xen_pcibk_publish_pci_dev(struct xen_pcibk_device *pdev,
                                   unsigned int domain, unsigned int bus,
                                   unsigned int devfn, unsigned int devid)
{
    ...
        printk(KERN_ERR"%s %04x:%02x:%02x.%02x",__func__, domain, bus,
                            PCI_SLOT(devfn), PCI_FUNC(devfn));


While in xen_pcibk_do_op the print is:

xen_pcibk_do_op Guest SBDF=0:0:1.1 (this matches what lspci shows in domU)

Code that generated the print:

void xen_pcibk_do_op(struct work_struct *data)
{
        ...
        if (dev == NULL)
                op->err = XEN_PCI_ERR_dev_not_found;
        else {
                printk(KERN_ERR "%s Guest SBDF=%d:%d:%d.%d\r\n", __func__,
                       op->domain, op->bus, op->devfn >> 3, op->devfn & 0x7);
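
One possible place to issue the proposed hypercall is where the virtual slot
is chosen, since at that point pciback knows both the physical BDF and the
virtual one. A sketch follows; PHYSDEVOP_map_streamid and struct
physdev_map_streamid are the proposed interface from above, not something
that exists today.

/* Called from xen_pciback's vpci code once the virtual slot is assigned. */
static int publish_streamid_mapping(struct pci_dev *dev, domid_t domid,
                                    unsigned int vslot)
{
    struct physdev_map_streamid op = {
        .dom_id         = domid,
        .phys_streamid  = (pci_domain_nr(dev->bus) << 16) |
                          (dev->bus->number << 8) | dev->devfn,
        .guest_streamid = PCI_DEVFN(vslot, PCI_FUNC(dev->devfn)),
    };

    return HYPERVISOR_physdev_op(PHYSDEVOP_map_streamid, &op);
}

That would let the pci-attach path tell Xen the guest BDF/DevID directly,
instead of trying to recover it from the prints above.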


Stefano, I need your help with this.

-Regards
Manish
