Re: [Xen-devel] PV-vNUMA issue: topology is misinterpreted by the guest
On 27/07/2015 18:42, Dario Faggioli wrote:
> On Mon, 2015-07-27 at 17:33 +0100, Andrew Cooper wrote:
>> On 27/07/15 17:31, David Vrabel wrote:
>>>
>>>> Yeah, indeed. That's the downside of Juergen's "Linux scheduler
>>>> approach". But the issue is there, even without taking vNUMA into
>>>> account, and I think something like that would really help (only for
>>>> Dom0, and Linux guests, of course).
>>> I disagree. Whether we're using vNUMA or not, Xen should still ensure
>>> that the guest kernel and userspace see a consistent and correct
>>> topology using the native mechanisms.
>>
>> +1
>>
> +1 from me as well. In fact, a mechanism for making exactly that
> happen was what I was after when I started this thread.
>
> Then it came up that CPUID needs to serve at least two different and
> potentially conflicting purposes, that we want to support both, and
> that, however and for whatever reason it is used, Linux configures
> its scheduler based on it, potentially resulting in rather
> pathological setups.
I don't see what the problem is here. Fundamentally, "NUMA
optimise" vs "comply with licence" is a user/admin decision at boot
time, and we need not cater to both halves at the same time.
Supporting either, as chosen by the admin, is worthwhile.
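For reference, both halves end up parsing the same CPUID leaves. A
minimal sketch of what they read (x86 only, recent GCC/Clang cpuid.h
builtins assumed; illustrative only, not Xen or Linux code), using
the extended topology leaf 0xB:

/* topo.c - derive SMT/core/package IDs from CPUID leaf 0xB, the
 * native mechanism both the kernel scheduler and licence checks
 * consume.  Build: gcc -o topo topo.c  (x86 only; a sketch). */
#include <stdio.h>
#include <cpuid.h>

int main(void)
{
    unsigned int eax, ebx, ecx, edx, subleaf;
    unsigned int smt_bits = 0, core_bits = 0, x2apic_id = 0;

    if (!__get_cpuid_count(0xb, 0, &eax, &ebx, &ecx, &edx) || !ebx) {
        fprintf(stderr, "CPUID leaf 0xb not supported\n");
        return 1;
    }

    for (subleaf = 0; ; subleaf++) {
        __get_cpuid_count(0xb, subleaf, &eax, &ebx, &ecx, &edx);
        unsigned int level_type = (ecx >> 8) & 0xff;

        if (level_type == 0)       /* no further levels */
            break;
        if (level_type == 1)       /* SMT level: low bits = thread ID */
            smt_bits = eax & 0x1f;
        else if (level_type == 2)  /* core level: shift spans SMT+core */
            core_bits = (eax & 0x1f) - smt_bits;
        x2apic_id = edx;           /* this CPU's x2APIC ID */
    }

    printf("SMT_ID=%u CORE_ID=%u PKG_ID=%u\n",
           x2apic_id & ((1u << smt_bits) - 1),
           (x2apic_id >> smt_bits) & ((1u << core_bits) - 1),
           x2apic_id >> (smt_bits + core_bits));
    return 0;
}

Whichever policy the admin picks, the values returned here have to be
self-consistent, because the guest scheduler at boot and any licence
check later will typically walk exactly this kind of loop (or its
leaf 1/leaf 4 equivalent).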
>
>
> It's at that point that some decoupling started to appear
> interesting... :-P
>
> Also, are we really being consistent? If my methodology is correct
> (which it might not be, so please double-check, and sorry for that),
> I'm seeing quite a bit of inconsistency:
>
> HOST:
> root@Zhaman:~# xl info -n
> ...
> cpu_topology           :
> cpu:    core    socket     node
>   0:       0        1        0
>   1:       0        1        0
>   2:       1        1        0
>   3:       1        1        0
>   4:       9        1        0
>   5:       9        1        0
>   6:      10        1        0
>   7:      10        1        0
>   8:       0        0        1
>   9:       0        0        1
>  10:       1        0        1
>  11:       1        0        1
>  12:       9        0        1
>  13:       9        0        1
>  14:      10        0        1
>  15:      10        0        1
o_O
What kind of system results in this layout? Can you dump the ACPI
tables and make them available?
>
> ...
> root@Zhaman:~# xl vcpu-list test
> Name                                ID  VCPU   CPU State   Time(s) Affinity (Hard / Soft)
> test                                 2     0     0   r--       1.5  0 / all
> test                                 2     1     1   r--       0.2  1 / all
> test                                 2     2     8   -b-       2.2  8 / all
> test                                 2     3     9   -b-       2.0  9 / all
>
> GUEST (HVM, 4 vcpus):
> root@test:~# cpuid|grep CORE_ID
>    (APIC synth): PKG_ID=0 CORE_ID=16 SMT_ID=0
>    (APIC synth): PKG_ID=0 CORE_ID=16 SMT_ID=1
>    (APIC synth): PKG_ID=0 CORE_ID=0 SMT_ID=0
>    (APIC synth): PKG_ID=0 CORE_ID=0 SMT_ID=1
>
> HOST:
> root@Zhaman:~# xl vcpu-pin 2 all 0
> root@Zhaman:~# xl vcpu-list 2
> Name                                ID  VCPU   CPU State   Time(s) Affinity (Hard / Soft)
> test                                 2     0     0   -b-      43.7  0 / all
> test                                 2     1     0   -b-      38.4  0 / all
> test                                 2     2     0   -b-      36.9  0 / all
> test                                 2     3     0   -b-      38.8  0 / all
>
> GUEST:
> root@test:~# cpuid|grep CORE_ID
>    (APIC synth): PKG_ID=0 CORE_ID=16 SMT_ID=0
>    (APIC synth): PKG_ID=0 CORE_ID=16 SMT_ID=0
>    (APIC synth): PKG_ID=0 CORE_ID=16 SMT_ID=0
>    (APIC synth): PKG_ID=0 CORE_ID=16 SMT_ID=0
>
> HOST:
> root@Zhaman:~# xl vcpu-pin 2 0 7
> root@Zhaman:~# xl vcpu-pin 2 1 7
> root@Zhaman:~# xl vcpu-pin 2 2 15
> root@Zhaman:~# xl vcpu-pin 2 3 15
> root@Zhaman:~# xl vcpu-list 2
> Name                                ID  VCPU   CPU State   Time(s) Affinity (Hard / Soft)
> test                                 2     0     7   -b-      44.3  7 / all
> test                                 2     1     7   -b-      38.9  7 / all
> test                                 2     2    15   -b-      37.3  15 / all
> test                                 2     3    15   -b-      39.2  15 / all
>
> GUEST:
> root@test:~# cpuid|grep CORE_ID
>    (APIC synth): PKG_ID=0 CORE_ID=26 SMT_ID=1
>    (APIC synth): PKG_ID=0 CORE_ID=26 SMT_ID=1
>    (APIC synth): PKG_ID=0 CORE_ID=10 SMT_ID=1
>    (APIC synth): PKG_ID=0 CORE_ID=10 SMT_ID=1
>
> So, it looks to me that:
>  1) any application using CPUID for either licensing or
>     placement/performance optimization will get (potentially) random
>     results;
>  2) whatever set of values the kernel used during guest boot to build
>     up its internal scheduling data structures has no guarantee of
>     being related to any value returned by CPUID at a later point.
>
> Hence, I think I'm seeing inconsistency between kernel and userspace
> (and between userspace and itself, over time) already... Am I
> overlooking something?
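Worth noting when reading the transcripts above: the "(APIC synth)"
lines are not read from any single leaf. The cpuid(1) tool
reconstructs PKG_ID/CORE_ID/SMT_ID from the initial APIC ID in leaf 1
and the core count in leaf 4, roughly as in this sketch -- Intel
only, recent GCC/Clang cpuid.h builtins assumed, not the tool's
actual source:

/* apic_synth.c - approximate cpuid(1)'s "(APIC synth)" output:
 * split the initial APIC ID (leaf 1) into package/core/thread IDs
 * using the widths implied by leaves 1 and 4 (Intel only). */
#include <stdio.h>
#include <cpuid.h>

static unsigned int bits_for(unsigned int count)
{
    unsigned int bits = 0;
    while ((1u << bits) < count)   /* round up to a power of two */
        bits++;
    return bits;
}

int main(void)
{
    unsigned int eax, ebx, ecx, edx;

    __get_cpuid(1, &eax, &ebx, &ecx, &edx);
    unsigned int apic_id = ebx >> 24;           /* initial APIC ID */
    unsigned int logical = (ebx >> 16) & 0xff;  /* logical CPUs/package */

    __get_cpuid_count(4, 0, &eax, &ebx, &ecx, &edx);
    unsigned int cores = (eax >> 26) + 1;       /* cores/package */

    unsigned int smt_bits  = bits_for(logical / cores);
    unsigned int core_bits = bits_for(cores);

    printf("PKG_ID=%u CORE_ID=%u SMT_ID=%u\n",
           apic_id >> (smt_bits + core_bits),
           (apic_id >> smt_bits) & ((1u << core_bits) - 1),
           apic_id & ((1u << smt_bits) - 1));
    return 0;
}

Since the initial APIC ID the guest reads evidently tracks whichever
pCPU the vCPU happened to be running on, every ID synthesised from it
moves around with pinning, which is exactly the instability shown
above.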
All current CPUID values presented to guests are about as reliable
as being picked from /dev/urandom. (This isn't strictly true - the
feature flags will be in the right ballpark if the VM has not
migrated yet).
Fixing this (as described in my feature-levelling design document)
is sufficiently non-trivial that it has been deferred until after the
feature-levelling work.
~Andrew