
Re: [Xen-devel] support for more than 32 VCPUs when migrating PVHVM guest



Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx> writes:

> On Mon, Feb 02, 2015 at 12:03:28PM +0100, Vitaly Kuznetsov wrote:
>> Andrew Cooper <andrew.cooper3@xxxxxxxxxx> writes:
>> 
>> > On 02/02/15 10:47, Vitaly Kuznetsov wrote:
>> >> Hi Konrad,
>> >>
>> >> I just hit an issue with PVHVM guests after save/restore (or migration),
>> >> if a PVHVM guest has > 32 VCPUs it hangs. Turns out, you saw it almost a
>> >> year ago and even wrote patches to call VCPUOP_register_vcpu_info after
>> >> resume. Unfortunately these patches never made it to xen/kernel. Do you
>> >> have a plan to pick this up? What were the arguments against your
>> >> suggestion?
>> >
>> > 32 VCPUs is the legacy limit for HVM guests, but should not have any
>> > remaining artefacts these days.
>> >
>> > Do you know why the hang occurs?  I can't spot anything in the legacy
>> > migration code which would enforce such a limit.
>> >
>> > What is the subject of the thread you reference so I can search for it?
>> >
>> 
>> Sorry, I should have sent the link:
>> 
>> http://lists.xen.org/archives/html/xen-devel/2014-04/msg00794.html
>> 
>> Konrad's patches:
>> 
>> http://lists.xen.org/archives/html/xen-devel/2014-04/msg01199.html
>> 
>> The issue is that we don't call VCPUOP_register_vcpu_info after
>> suspend/resume (or migration), and doing so is mandatory.
>
> The issue I saw was that, with that enabled (which is what Jan
> requested), everything seemed to work - except that I... ah, here it is:
>
> http://lists.xen.org/archives/html/xen-devel/2014-04/msg02875.html
> er, rather:
>
> http://lists.xen.org/archives/html/xen-devel/2014-04/msg02945.html
>
>       > The VCPUOP_send_nmi did cause the HVM guest to get an NMI, and it
>       > spat out 'Dazed and confused'. The guest also reported corruption:
>       > 
>       > [    3.611742] Corrupted low memory at c000fffc (fffc phys) = 00029b00
>       > [    2.386785] Corrupted low memory at ffff88000000fff8 (fff8 phys) = 2990000000000
>       > 
>       > Which is odd, because there does not seem to be anything in the
>       > hypervisor's path that would cause this.
>
>       Indeed. This looks a little like a segment descriptor got modified here
>       with a descriptor table base of zero and a selector of 0xfff8. That
>       corruption needs to be hunted down in any case before enabling
>       VCPUOP_send_nmi for HVM.
>
> I did not get a chance to "hunt down" that pesky issue. That is the only
> thing holding up this patchset.
>
> Said patch is in my queue of patches to upstream (amongst 30 others) -
> and I am working through the reviews/issues - but it will take me quite
> some time - so if you feel like taking a stab at this, please do!

Thanks for summing this up for me; if anything pops up regarding this
corruption issue, I'll report back.

-- 
  Vitaly

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

