
Re: [Xen-devel] error in xen/arch/x86/mm.c:get_page during migration



>>> On 21.02.13 at 15:48, Olaf Hering <olaf@xxxxxxxxx> wrote:

> While doing "while xm migrate --live domU localhost;do sleep 2;done" I
> see many errors from get_page:
> 
> ...
> (XEN) HVM56 restore: TSC_ADJUST 0
> (XEN) HVM56 restore: TSC_ADJUST 1
> (XEN) mm.c:1982:d0 Error pfn 41a863: rd=ffff83036ffef000, od=0000000000000000, caf=180000000000000, taf=7400000000000001
> (XEN) mm.c:1982:d0 Error pfn 41be1c: rd=ffff83036ffef000, od=0000000000000000, caf=180000000000000, taf=7400000000000001
> (XEN) mm.c:1982:d0 Error pfn 41a862: rd=ffff83036ffef000, od=0000000000000000, caf=180000000000000, taf=7400000000000001
> (XEN) mm.c:1982:d0 Error pfn 41b90f: rd=ffff83036ffef000, od=0000000000000000, caf=180000000000000, taf=7400000000000001
> (XEN) mm.c:1982:d0 Error pfn 41b49a: rd=ffff83036ffef000, od=0000000000000000, caf=180000000000000, taf=7400000000000001
> (XEN) mm.c:1982:d0 Error pfn 41b48d: rd=ffff83036ffef000, od=0000000000000000, caf=180000000000000, taf=7400000000000001
> (XEN) irq.c:375: Dom56 callback via changed to Direct Vector 0xf3
> (XEN) HVM56 save: CPU
> ...
> 
> The pfn values and the number of affected pfns differ between iterations,
> but in the end only these two variants appear:
> 
> # xm dmesg | grep -w mm | cut -d : -f 4- | sort | uniq -c | sort
>      22  rd=ffff83036ffef000, od=0000000000000000, caf=180000000000000, taf=7400000000000001
>      46  rd=ffff83036ffef000, od=0000000000000000, caf=180000000000000, taf=0000000000000001
> 
> 
> It does not seem to cause issues other than the log output.
> Does it indicate a real bug?

I'm afraid it does - a non-zero type count should generally not be
accompanied by a zero general count. That's specifically because
lone put_page_type() calls are pretty rare, and going through all
of them I don't see any that could be the one still outstanding
in your case.
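
To make the condition concrete, here is a minimal sketch of the check being
described, applied to the caf/taf values from the log. It assumes that caf and
taf are the general and type count-and-flags words and that each keeps its
reference count in the low-order bits. ASSUMED_COUNT_MASK and both helper names
are hypothetical; the real field widths are defined by the hypervisor headers
and are narrower or wider than the 32 bits used here for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * ASSUMPTION: the reference count lives in the low-order bits of each
 * word. The 32-bit width is illustrative only, not Xen's definition.
 */
#define ASSUMED_COUNT_MASK 0xffffffffULL

/* Extract the (assumed) reference count from a count-and-flags word. */
static uint64_t assumed_count(uint64_t field)
{
    return field & ASSUMED_COUNT_MASK;
}

/*
 * True when the type count is non-zero while the general count is zero,
 * i.e. the combination that should generally not occur.
 */
static bool looks_like_lone_type_ref(uint64_t caf, uint64_t taf)
{
    return assumed_count(caf) == 0 && assumed_count(taf) != 0;
}
```

Under these assumptions, both variants in the log (caf=180000000000000 with
taf=7400000000000001 or taf=0000000000000001) match the suspicious pattern.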

I'm surprised this doesn't cause an assertion to trigger somewhere.
You are using a debug hypervisor, aren't you?

Of course, if this truly is just a "leaked" type reference, then no
other bad consequences are to be expected.

What you could do to get a better understanding of when this
happens is to add a WARN_ON() alongside the printk() (perhaps
such that it triggers only once for each of the two different
cases), and then let us look at the call trace.
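
A rough user-space sketch of the "warn once per case" idea follows. It is not
a patch against mm.c: the WARN_ON() here is a stand-in macro that only counts
and prints, and first_report(), report_get_page_failure(), seen_taf and
MAX_SEEN are hypothetical names. It also assumes the two cases can be told
apart by their taf value, and it ignores locking, which code inside the
hypervisor would have to consider.

```c
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-in for the hypervisor's WARN_ON(); it counts hits so the
 * behaviour can be checked from user space. */
static unsigned int warn_count;
#define WARN_ON(cond)                                              \
    do {                                                           \
        if (cond) {                                                \
            warn_count++;                                          \
            fprintf(stderr, "WARNING at %s:%d\n", __FILE__, __LINE__); \
        }                                                          \
    } while (0)

/* Two distinct taf values were observed, so remember at most two. */
#define MAX_SEEN 2

static uint64_t seen_taf[MAX_SEEN];
static unsigned int nr_seen;

/* Returns true only the first time a given taf value is reported.
 * Once the table is full, unknown values no longer trigger. */
static bool first_report(uint64_t taf)
{
    for (unsigned int i = 0; i < nr_seen; i++)
        if (seen_taf[i] == taf)
            return false;
    if (nr_seen == MAX_SEEN)
        return false;
    seen_taf[nr_seen++] = taf;
    return true;
}

/* Mirrors the existing log message, then warns once per distinct case. */
static void report_get_page_failure(uint64_t pfn, uint64_t caf, uint64_t taf)
{
    printf("Error pfn %" PRIx64 ": caf=%" PRIx64 ", taf=%" PRIx64 "\n",
           pfn, caf, taf);
    WARN_ON(first_report(taf));
}
```

Calling report_get_page_failure() repeatedly with the two taf values from the
log would print every error line but raise the warning only twice, so the call
traces stay readable.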

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

