Re: [Xen-devel] PCIe 2.0, VT-d and Intel 82576 enhancement for Xen SR-IOV
On Fri, Mar 20, 2009 at 12:10:54AM +0800, Espen Skoglund wrote:
> [Yu Zhao]
> > Yes, using the master BDF can move the current logic into Dom0 and
> > makes the hypervisor cleaner. And it does work for VT-d spec 1.2.
> >
> > But if VT-d spec 1.3 (or the AMD/IBM/Sun IOMMU specs) says that the
> > ARI device and the Virtual Function have their own remapping unit or
> > something like this, rather than using their masters', how could we
> > support it using the master BDF?
>
> If this happens the dom0 kernel will detect it and pass a different
> master BDF to the hypervisor. This was the whole point of my comment;
> the hypervisor should not need to know what type of device function
> it is dealing with. The logic for handling this should, if possible,
> be kept out of the hypervisor (and if these kinds of changes came
> along you would still need dom0 support for handling them anyway).

Yes, I understand your point, but I didn't make myself clear:

1) We can't extend the device_add that already exists in the 3.3
   release, for compatibility reasons.

2) The master BDF can only cover the current VT-d 1.2 spec case --
   IOMMUs from other vendors may require the ARI Extended Function or
   the Virtual Function to use a separate remapping unit that can't be
   indicated by the master BDF. They may use a simple algorithm such
   as:

       if (is_ari_extfn)
           use IOMMU_1
       else if (is_sriov_virtfn)
           use IOMMU_2
       else
           use BDF to find a proper IOMMU

   or something like this, which doesn't have the master BDF concept
   at all.

And the Virtual Function ATS has the following requirement (PCI SR-IOV
1.0, section 3.7.4):

    However, all VFs associated with a PF share a single input queue
    in the PF. To implement Invalidation flow control, the TA must
    ensure that the total number of outstanding Invalidate Requests to
    the shared PF queue (targeted to the PF and its associated VFs)
    does not exceed the value in the PF Invalidate Queue Depth field.

This means that if we want to enable ATS for a Virtual Function, we
must first know it is a Virtual Function, and then find its associated
Physical Function. Knowing only its master BDF doesn't give the IOMMU
code enough of a hint to set up the Invalidation Queue (the IOMMU
can't figure out the function type behind the master BDF). So even if
we have found the master BDF for DRHD unit matching in Dom0, we would
eventually still need to pass the function type down to the hypervisor
and let the IOMMU code act on it there. To me there is no real
difference between putting a small part of this kind of logic in Dom0
while leaving most of it in the hypervisor, and putting all of it in
the hypervisor. (The two sketches below try to make this concrete.)
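To illustrate point 2) -- this is not the actual interface, all field
and function names below are invented for this mail -- an extended
hypercall argument could carry the function type and the associated
PF's BDF explicitly, and the hypervisor-side IOMMU code could then
dispatch on it:

    #include <stdint.h>

    struct iommu;                       /* opaque for this sketch */
    extern struct iommu *iommu_for_ari;
    extern struct iommu *iommu_for_virtfn;
    extern struct iommu *iommu_lookup_bdf(uint16_t seg, uint8_t bus,
                                          uint8_t devfn);

    /* Illustrative only: what a device_add_ext argument might carry. */
    struct pci_add_ext {
        uint16_t seg;                   /* PCI segment */
        uint8_t  bus, devfn;            /* BDF of the function added */
        uint8_t  is_extfn;              /* ARI Extended Function? */
        uint8_t  is_virtfn;             /* SR-IOV Virtual Function? */
        struct {
            uint8_t bus, devfn;         /* associated PF, if is_virtfn */
        } physfn;
    };

    /* Hypothetical selection in the hypervisor, for an IOMMU that
     * dedicates separate remapping units to ARI extended functions
     * and to Virtual Functions. */
    struct iommu *select_iommu(const struct pci_add_ext *dev)
    {
        if ( dev->is_extfn )
            return iommu_for_ari;
        if ( dev->is_virtfn )
            return iommu_for_virtfn;
        return iommu_lookup_bdf(dev->seg, dev->bus, dev->devfn);
    }

Even if the DRHD matching itself stayed in Dom0, is_extfn/is_virtfn
would still have to travel down for a case like select_iommu() above.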
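And the ATS point in code form: since all VFs share the PF's input
queue, whoever issues Invalidate Requests must account against the
PF's state, whichever function the request actually targets. A
minimal sketch (again with invented names):

    #include <errno.h>

    /* Per-PF accounting of outstanding ATS Invalidate Requests, per
     * the SR-IOV 1.0 section 3.7.4 language quoted above. A real
     * implementation would sit next to the queued invalidation code
     * and hold the appropriate lock. */
    struct pf_ats_state {
        unsigned int queue_depth;   /* PF Invalidate Queue Depth */
        unsigned int outstanding;   /* in flight to PF and its VFs */
    };

    /* Call before issuing an Invalidate Request to the PF or any of
     * its VFs -- the caller must know which PF the target belongs to. */
    int ats_inval_reserve(struct pf_ats_state *pf)
    {
        if ( pf->outstanding >= pf->queue_depth )
            return -EBUSY;          /* would overflow the shared queue */
        pf->outstanding++;
        return 0;
    }

    /* Call when the corresponding Invalidate Completion comes back. */
    void ats_inval_complete(struct pf_ats_state *pf)
    {
        pf->outstanding--;
    }

ats_inval_reserve() can only be handed the right pf_ats_state if the
caller knows "this BDF is a VF of that PF" -- a master BDF alone
doesn't say that.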
> > Things evolve fast; we would need to add another hypercall to
> > enhance the master BDF one after it's in 3.4 -- much like when
> > device_add was added, the VT-d spec didn't have such a requirement,
> > but now we have to add device_add_ext because of the compatibility
> > requirement.
> >
> > Passing this device specific information down and doing the IOMMU
> > specific work inside the hypervisor comes inherently with the
> > current passthrough architecture. Having chosen to put all IOMMU
> > things (both the high level remapping data structures and logic,
> > and the low level hardware drivers) into the hypervisor, we lost
> > the flexibility to split off the matching logic and move it back
> > to the Dom0 kernel.
>
> I don't buy this argument. You seem to be indicating that the
> mechanism for configuring a given setup cannot be separated from the
> mechanism which actually enforces that configuration. This is not
> true. It's all a matter of finding the right abstraction for the
> configuration interface. Flexibility need not be sacrificed. I guess
> the main problem here is that there was never much thought put into
> how best to express the interfaces and abstractions for dealing with
> IOMMUs, and as newer generations of IOMMU and PCIe hardware came
> along the lack of flexibility in the original abstractions has come
> back to bite us.

Yes, VT-d originally didn't cover SR-IOV/ARI devices because the PCIe
IOV specs appeared relatively late, and hardware with these new
features is rarely supported by VMMs. Any comments on improving the
interfaces and abstractions are welcome.

Thanks,
Yu

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel