
Re: [Xen-devel] Xen-unstable: Bisected Host boot failure on AMD Phenom



On 02/03/17 19:15, Boris Ostrovsky wrote:
> On 03/02/2017 01:56 PM, Andrew Cooper wrote:
>> On 02/03/17 18:51, Sander Eikelenboom wrote:
>>> On 02/03/17 19:29, Andrew Cooper wrote:
>>>> On 02/03/17 18:25, Sander Eikelenboom wrote:
>>>>> On 02/03/17 18:38, Andrew Cooper wrote:
>>>>>> On 02/03/17 17:29, Sander Eikelenboom wrote:
>>>>>>> On 02/03/17 15:55, Andrew Cooper wrote:
>>>>>>>> On 02/03/17 14:42, Sander Eikelenboom wrote:
>>>>>>>>> Hi Andrew / Jan,
>>>>>>>>>
>>>>>>>>> While testing current xen-unstable staging I ran into my host
>>>>>>>>> rebooting in early kernel boot. 
>>>>>>>>> Bisection has turned up:
>>>>>>>>>     5cecf60f439e828f4bc0d2a368ced9a73b130cb7 is the first bad commit
>>>>>>>>>     Author: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
>>>>>>>>>     Date:   Fri Feb 17 17:10:50 2017 +0000
>>>>>>>>>
>>>>>>>>>     x86/cpuid: Handle leaf 0x1 in guest_cpuid()
>>>>>>>>>
>>>>>>>>> Hardware is an AMD Phenom X6.
>>>>>>>>> Below is the output of serial console of a failed boot.
>>>>>>>> Hmm.  Sorry for breaking this (although my AMD servers are booting 
>>>>>>>> fine).
>>>>>>> No problem, it is the staging branch of the unstable tree anyway ;-)
>>>>>>>
>>>>>>>> It is unfortunately not entirely obvious what Linux is objecting to,
>>>>>>>> but it must be related to something visible in the emulated view.
>>>>>>>>
>>>>>>>> Does this delta make any difference?
>>>>>>> Yes it does, boots fine with this patch applied, thanks !
>>>>>> That is bad though. :s
>>>>>>
>>>>>> It means that something in dom0 has an aversion to my attempt to lie
>>>>>> less about the topology.
>>>>>>
>>>>>> Do you mind checking whether
>>>>>>
>>>>>> res->b = cpuid_ebx(0x1) & 0xff00ffffu;
>>>>>>
>>>>>> causes it to break again?
>>>>> Used that in the is_hardware_domain() case and it boots fine.
>>>> Hmm - curious.  I am now even more confused.
>>>>
>>>> What about this?
>>>>
>>>> res->b = cpuid_ebx(0x1) & 0x00ffffffu;
>>>>
>>>> It will leave the APIC_ID field zeroed rather than feeding v->vcpu_id
>>>> back into it.
>>> Also boots fine.
>> Right.  For my sanity, what about
>>
>> res->b = cpuid_ebx(0x1) & 0x00ffffffu;
>> res->b |= (v->vcpu_id * 2) << 24;
> FWIW, I booted a 2-node
>
>   (XEN) CPU Vendor: AMD, Family 21 (0x15), Model 1 (0x1), Stepping 2 (raw 00600f12)
>
> with Linux 4.10 and latest staging. (I thought perhaps my nightly missed
> something because it's a single node)

I expect it might have something to do with the fact that the system
failing to boot has 6 cores rather than a power of two, at which point I
doubt the APIC IDs follow a linear trend.
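
For reference, the masks tried earlier in this thread line up with the
fields of CPUID leaf 0x1 EBX: bits 31:24 hold the initial APIC ID and bits
23:16 the logical processor count. The following is a minimal standalone
sketch of those three experiments, not the actual Xen code; the helper
names are hypothetical and exist only for illustration.

```c
#include <stdint.h>

/*
 * CPUID leaf 0x1, EBX layout:
 *   bits  7:0   brand index
 *   bits 15:8   CLFLUSH line size (in 8-byte units)
 *   bits 23:16  logical processor count
 *   bits 31:24  initial local APIC ID
 *
 * All helper names below are hypothetical; they only mirror the
 * expressions quoted in this thread.
 */

/* Extract the initial APIC ID field (bits 31:24). */
static inline uint8_t ebx_apic_id(uint32_t ebx)
{
    return (uint8_t)(ebx >> 24);
}

/* Extract the logical processor count field (bits 23:16). */
static inline uint8_t ebx_logical_cpus(uint32_t ebx)
{
    return (uint8_t)((ebx >> 16) & 0xffu);
}

/* Experiment 1: "& 0xff00ffffu" zeroes the logical processor count. */
static inline uint32_t ebx_clear_logical_cpus(uint32_t ebx)
{
    return ebx & 0xff00ffffu;
}

/* Experiment 2: "& 0x00ffffffu" zeroes the APIC ID field. */
static inline uint32_t ebx_clear_apic_id(uint32_t ebx)
{
    return ebx & 0x00ffffffu;
}

/* Experiment 3: replace the APIC ID with vcpu_id * 2. */
static inline uint32_t ebx_set_apic_id_from_vcpu(uint32_t ebx, uint32_t vcpu_id)
{
    return (ebx & 0x00ffffffu) | ((vcpu_id * 2) << 24);
}
```

With a made-up hardware value of 0x05060800 (APIC ID 5, 6 logical
processors), the third helper called with vcpu_id 3 yields 0x06060800.
The vcpu_id * 2 mapping is linear by construction, which is exactly what
the real APIC IDs on a 6-core part may not be.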

(Properly fixing the reported topology is going to be a can of worms. 
All this series is trying to do is use just enough duct-tape to get the
hypervisor into a state where we can sensibly fix the reported topology.)

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel