
Re: [Xen-devel] PCIe 2.0, VT-d and Intel 82576 enhancement for Xen SR-IOV



On Fri, Mar 20, 2009 at 12:10:54AM +0800, Espen Skoglund wrote:
> [Yu Zhao]
> > Yes, using the master BDF can move current logic into Dom0 and makes
> > hypervisor cleaner. And it does work for VT-d spec 1.2.
> 
> > But if VT-d spec 1.3 (or AMD/IBM/Sun IOMMU specs) says that the ARI
> > device and the Virtual Function have their own remapping unit or
> > something like this, rather than use their masters', how could we
> > support it using the master BDF?
> 
> If this happens the dom0 kernel will detect it and pass a different
> master BDF to the hypervisor.  This was the whole point of my comment;
> the hypervisor should need not know what type of device function it is
> dealing with.  The logic for handling this should if possible be kept
> out of the hypervisor (and if these kind of changes came along you
> would still need dom0 support for handling it anyway).

Yes, I understand your point, but I didn't make myself clear:
1) we can't extend the device_add that already exists in the 3.3 release,
for compatibility reasons.
2) the master BDF can only cover the current VT-d 1.2 spec case -- IOMMUs
from other vendors may require the ARI Extended Function or the Virtual
Function to use a separate remapping unit that can't be indicated by the
master BDF. They may use a simple algorithm such as:
    if (is_ari_extfn)
        use IOMMU_1
    else if (is_sriov_virtfn)
        use IOMMU_2
    else
        use BDF to find a proper IOMMU
or something similar that has no master BDF concept at all.
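
A minimal C sketch of the selection logic above, to show that it needs
only the function type and never a master BDF. All names here
(pci_fn_info, iommu_id, select_iommu) are made up for illustration and
are not existing Xen or vendor interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of a PCI function as Dom0 might report it. */
struct pci_fn_info {
    uint8_t bus;
    uint8_t devfn;
    bool is_ari_extfn;     /* ARI Extended Function */
    bool is_sriov_virtfn;  /* SR-IOV Virtual Function */
};

/* Hypothetical remapping unit identifiers, mirroring the pseudocode. */
enum iommu_id {
    IOMMU_BY_BDF = 0,  /* fall back to a lookup keyed on the BDF */
    IOMMU_1 = 1,
    IOMMU_2 = 2
};

/* Pick a remapping unit from the function type alone. */
enum iommu_id select_iommu(const struct pci_fn_info *fn)
{
    assert(fn != NULL);

    if (fn->is_ari_extfn)
        return IOMMU_1;
    if (fn->is_sriov_virtfn)
        return IOMMU_2;
    return IOMMU_BY_BDF;
}
```

Whatever the real rule turns out to be, the input it needs is the
function type, which a master BDF does not carry.
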
Virtual Function ATS also has the following requirement:

PCI SR-IOV 1.0 section 3.7.4:
However, all VFs associated with a PF share a single input queue in the
PF. To implement Invalidation flow control, the TA must ensure that the
total number of outstanding Invalidate Requests to the shared PF queue
(targeted to the PF and its associated VFs) does not exceed the value
in the PF Invalidate Queue Depth field.

This means that to enable ATS for a Virtual Function, we must first know
that it is a Virtual Function, and then which Physical Function it is
associated with. Knowing only its master BDF doesn't give the IOMMU
enough information to set up the Invalidation Queue (the IOMMU can't
figure out the function type behind the master BDF).
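
A rough C sketch of the flow control quoted from SR-IOV 1.0 section
3.7.4: the PF and all of its VFs are charged against one counter that
lives with the PF. The types and function names are hypothetical, and
queue_depth is assumed to already hold the usable limit read from the
PF Invalidate Queue Depth field.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-PF state, shared by the PF and all of its VFs. */
struct pf_ats_state {
    uint32_t queue_depth;  /* limit taken from the PF (assumed decoded) */
    uint32_t outstanding;  /* Invalidate Requests currently in flight */
};

/* Hypothetical per-function handle; a VF points at its PF's state. */
struct ats_fn {
    bool is_virtfn;
    struct pf_ats_state *pf;
};

/* Returns true if one more Invalidate Request may be issued now. */
bool ats_try_send_invalidate(struct ats_fn *fn)
{
    struct pf_ats_state *q = fn->pf;

    if (q->outstanding >= q->queue_depth)
        return false;  /* shared PF queue is full, caller must wait */
    q->outstanding++;
    return true;
}

/* Called when an Invalidate Completion arrives for this function. */
void ats_invalidate_completed(struct ats_fn *fn)
{
    if (fn->pf->outstanding > 0)
        fn->pf->outstanding--;
}
```

The counter can only be shared correctly if the code knows which PF each
VF belongs to, which is the information the master BDF leaves out.
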

In the end we still need to pass the function type to the hypervisor and
let the IOMMU code do the extra work, even if Dom0 has already found the
master BDF for DRHD unit matching. So I see little difference between
putting a small part of this logic in Dom0 while leaving most of it in
the hypervisor, and putting all of it in the hypervisor.
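
For illustration only, here is what an extended "add device" argument
might carry if the function type is passed down. The layout and names
are hypothetical and not taken from any existing Xen header; the sketch
only shows that the function type and the owning PF travel together.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bus/devfn pair identifying one PCI function. */
struct pci_bdf {
    uint8_t bus;
    uint8_t devfn;
};

/* Hypothetical argument for an extended "add device" call. */
struct device_add_ext_args {
    struct pci_bdf dev;     /* the function being added */
    bool is_extfn;          /* ARI Extended Function */
    bool is_virtfn;         /* SR-IOV Virtual Function */
    struct pci_bdf physfn;  /* owning PF, meaningful only if is_virtfn */
};

/* Basic sanity check: a VF must name a PF other than itself. */
bool device_add_ext_args_valid(const struct device_add_ext_args *a)
{
    if (a->is_virtfn &&
        a->physfn.bus == a->dev.bus &&
        a->physfn.devfn == a->dev.devfn)
        return false;
    return true;
}
```
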

> >                                    Things evolve fast, we would need
> > to add another hypercall to enhance the master BDF one after it's in
> > 3.4 -- it would be like when the device_add was added, the VT-d spec
> > didn't have such requirement, but now we have to add device_add_ext
> > because the compatibility requirement.
> 
> > Passing these device specific information down and doing the IOMMU
> > specific work inside the hypervisor hereditarily come with current
> > passthrough architecture. After choosing putting all IOMMU things
> > (both high level remapping data structures and logics, and low level
> > hardware drivers) into hypervisor, we lost the flexibility to split
> > the matching up logic and move it back to the Dom0 kernel.
> 
> I don't buy this argument.  You seem to be indicating that the
> mechanism for configuring a given setup can not be separated from the
> mechanism which actually enforces that configuration.  This is not
> true.  It's all a matter of finding the right abstraction for the
> configuration interface.  Flexibility need not be sacrificed.  I guess
> the main problem here is that there was never much thought put into
> how to best express the interfaces and abstractions for dealing with
> IOMMUs, and as newer generations of IOMMU and PCIe hardware came along
> the lack of flexibility in the original abstractions has come back to
> bite us.

Yes, VT-d originally did not cover SR-IOV/ARI devices, because the PCIe
IOV specs appeared relatively late and hardware with these new features
is still rarely supported by VMMs.

Any comments on improving interfaces and abstractions are welcome.

Thanks,
Yu

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

