
Re: [Xen-devel] [PATCH 5/5] AMD IOMMU: widen NUMA nodes to be allocated from



On 09/03/15 15:42, Suravee Suthikulpanit wrote:
> On 3/6/2015 6:15 AM, Andrew Cooper wrote:
>> On 06/03/2015 07:50, Jan Beulich wrote:
>>>>>> On 05.03.15 at 18:30, <andrew.cooper3@xxxxxxxxxx> wrote:
>>>> On 26/02/15 13:56, Jan Beulich wrote:
>>>>> --- a/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h
>>>>> +++ b/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h
>>>>> @@ -158,12 +158,12 @@ static inline unsigned long region_to_pa
>>>>>       return (PAGE_ALIGN(addr + size) - (addr & PAGE_MASK)) >> PAGE_SHIFT;
>>>>>   }
>>>>>
>>>>> -static inline struct page_info* alloc_amd_iommu_pgtable(void)
>>>>> +static inline struct page_info *alloc_amd_iommu_pgtable(struct domain *d)
>>>>>   {
>>>>>       struct page_info *pg;
>>>>>       void *vaddr;
>>>>>
>>>>> -    pg = alloc_domheap_page(NULL, 0);
>>>>> +    pg = alloc_domheap_page(d, MEMF_no_owner);
>>>> Same comment as with the VT-d side of things.  This should be based on
>>>> the proximity information of the IOMMU, not of the owning domain.
>>> I think I buy this argument on the VT-d side (under the assumption
>>> that there's going to be at least one IOMMU per node), but I'm not
>>> sure here: the most modern AMD box I have has just a single
>>> IOMMU for the 4 nodes it reports.
>>
>> It is not possible for an IOMMU to cover multiple NUMA nodes' worth of
>> IO, because of the position it has to sit in relative to the IO root
>> ports and QPI/HT links.
>>
>> In AMD systems, the IOMMUs live in the northbridges, meaning one per
>> NUMA node (as it is the northbridges which contain the HyperTransport
>> links).
>>
>> The BIOS/firmware will only report IOMMUs from northbridges which have
>> IO connected to their IO HyperTransport link (most systems in the wild
>> have all IO hanging off one or two NUMA nodes).  On the other hand, I
>> have an AMD system with 8 IOMMUs in use.
>
>
> Actually, a single IOMMU could handle multiple nodes. For example, in
> a multi-chip-module (MCM) setup there could be two to four nodes
> sharing one IOMMU, depending on how the platform vendor configures the
> system. In server platforms, the IOMMU is in the AMD northbridge
> chipsets (e.g. SR56xx). This website has an example of such a system
> configuration
> (http://www.qdpma.com/systemarchitecture/SystemArchitecture_Opteron.html).

Ok - I was basing my example on the last layout I had the manual for,
which I believe was Bulldozer.

However, my point still stands that there is an IOMMU between any IO and
RAM.  An individual IOMMU will always benefit from having its IO page
tables on its local NUMA node, rather than on the NUMA node(s) which the
domain owning the device is running on.
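
As a rough sketch of what I mean (the per-IOMMU "node" field below is
hypothetical - nothing records it today - while alloc_domheap_page() and
MEMF_node() are the existing allocator interfaces):

static inline struct page_info *alloc_amd_iommu_pgtable(
    const struct amd_iommu *iommu)
{
    struct page_info *pg;
    void *vaddr;

    /* Allocate on the node the IOMMU itself sits on, rather than on
     * whichever node(s) the owning domain is running on.  iommu->node
     * is the assumed per-IOMMU locality field. */
    pg = alloc_domheap_page(NULL, MEMF_node(iommu->node));
    if ( pg == NULL )
        return NULL;

    vaddr = __map_domain_page(pg);
    clear_page(vaddr);
    unmap_domain_page(vaddr);

    return pg;
}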

>
> For AMD IOMMU, the IVRS table specifies the PCI bus/device ranges to
> be handled by each IOMMU. This should probably be considered here.
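
If the allocation were keyed off those ranges, something along these
lines could pick the node (find_iommu_for_device() does already exist
in the AMD IOMMU code; the "node" field is again a hypothetical
addition):

static nodeid_t amd_iommu_pgtable_node(int seg, int bdf)
{
    const struct amd_iommu *iommu = find_iommu_for_device(seg, bdf);

    /* No node preference if no IOMMU in the IVRS table covers this
     * device. */
    return iommu ? iommu->node : NUMA_NO_NODE;
}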

Presumably a PCI transaction must never get onto the HT bus without
having already undergone translation, or there can be no guarantee that
it would be routed via the IOMMU?  Or are you saying that there are
cases where a transaction will enter the HT bus, route sideways to an
IOMMU, undergo translation, then route back onto the HT bus to the
target RAM/processor?

~Andrew

