
Re: [Xen-devel] VMX status report. Xen:26323 & Dom0:3.7.1

To: <xen-devel@xxxxxxxxxxxxx>
From: Mats Petersson <mats.petersson@xxxxxxxxxx>
Date: Thu, 10 Jan 2013 17:27:04 +0000
Delivery-date: Thu, 10 Jan 2013 17:27:24 +0000
List-id: Xen developer discussion <xen-devel.lists.xen.org>

On 10/01/13 17:10, Andres Lagar-Cavilla wrote:

On Jan 10, 2013, at 3:57 AM, "Jan Beulich" <JBeulich@xxxxxxxx> wrote:

On 10.01.13 at 08:51, "Ren, Yongjie" <yongjie.ren@xxxxxxxxx> wrote:

New issue(1)
==============
1. Sometimes live migration fails and reports a call trace in dom0
  http://bugzilla.xen.org/bugzilla/show_bug.cgi?id=1841

For the failed allocation, the only obvious candidate appears to be

        err_array = kcalloc(m.num, sizeof(int), GFP_KERNEL);

which quite obviously can be of (almost) arbitrary size because

        nr_pages = m.num;
        if ((m.num <= 0) || (nr_pages > (LONG_MAX >> PAGE_SHIFT)))
                return -EINVAL;

really only checks for completely insane values.
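
To make the point concrete, here is a minimal userspace sketch (not the
kernel code) of how little that check constrains the allocation. The
4 KiB page size and all helper names are assumptions made for
illustration only.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define SKETCH_PAGE_SHIFT 12                      /* assumption: 4 KiB pages */
#define SKETCH_PAGE_SIZE  (1UL << SKETCH_PAGE_SHIFT)

/* Mirrors the quoted check: only zero or absurdly large counts fail. */
static int num_passes_check(unsigned long num)
{
    if (num == 0 || num > ((unsigned long)LONG_MAX >> SKETCH_PAGE_SHIFT))
        return 0;
    return 1;
}

/* Bytes that a kcalloc(num, sizeof(int), ...) style request asks for. */
static unsigned long err_array_bytes(unsigned long num)
{
    assert(num <= ULONG_MAX / sizeof(int));       /* no overflow in the sketch */
    return num * sizeof(int);
}

/* Number of pages a contiguous buffer of that many bytes spans. */
static unsigned long pages_spanned(unsigned long bytes)
{
    return (bytes + SKETCH_PAGE_SIZE - 1) >> SKETCH_PAGE_SHIFT;
}
```

Under these assumptions a count of 100000 passes the check, yet the
resulting array spans many contiguous pages, which is the kind of
request that can fail on a fragmented system.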

This got introduced by Andres' "xen/privcmd: add PRIVCMD_MMAPBATCH_V2
ioctl" and is becoming worse with Mukesh's recent "xen: privcmd:
support autotranslated physmap guests", which added another
similar (twice as large) allocation in alloc_empty_pages().

Perhaps it is the err_array in this case, since alloc_empty_pages is only
called for auto-translated dom0s.

I am not sure whether libxl changes (or is even capable of changing) parameters 
of the migration code. But right now in libxc, mapping is done in 
MAX_BATCH_SIZE batches, which are of size 1024. So we are talking about 1024 
ints, which is *one* page.
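
The arithmetic behind that estimate can be sketched as follows. This is
an illustration only: the helper names are hypothetical, and a 4-byte
int is an assumption that the caller passes in explicitly.

```c
#include <assert.h>
#include <stddef.h>

/* Number of batches needed to map total_pages, batch pages at a time. */
static size_t batch_count(size_t total_pages, size_t batch)
{
    assert(batch > 0);
    return (total_pages + batch - 1) / batch;
}

/* Size in bytes of the per-batch error array (one element per page). */
static size_t batch_err_bytes(size_t batch, size_t elem_size)
{
    return batch * elem_size;
}
```

With a batch of 1024 and 4-byte elements, batch_err_bytes() gives 4096
bytes, which is exactly one page if pages are 4 KiB.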

So is the kernel really incapable of allocating one measly page?

This leads me to think that it might be gather_array, but that one would 
allocate a grand total of two pages.

In any case, both functions allocate an arbitrary number of pages, and that is
the fundamental problem.

What is the approach in the forward ported kernel wrt gather_array?

The cleanest alternative I can think of is to refactor the body of 
mmap_batch to allocate one page for each array, and iteratively call 
traverse_pages recycling the local arrays and increasing the pointers in the 
source user space arrays.
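
The shape of that refactoring can be sketched in plain userspace C. This
is not the privcmd code: process_in_chunks(), chunk_fn and the tally
worker are hypothetical names, memcpy() stands in for copy_from_user(),
and the 4096-byte chunk assumes 4 KiB pages.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CHUNK_BYTES 4096u   /* assumption: one 4 KiB page per local array */
#define CHUNK_ELEMS (CHUNK_BYTES / sizeof(unsigned long))

/* Hypothetical per-chunk worker, standing in for traverse_pages(). */
typedef int (*chunk_fn)(const unsigned long *chunk, size_t n, void *state);

/* Walk a large source array using one fixed-size, recycled buffer. */
static int process_in_chunks(const unsigned long *src, size_t total,
                             chunk_fn fn, void *state)
{
    unsigned long local[CHUNK_ELEMS];   /* reused for every chunk */

    while (total > 0) {
        size_t n = total < CHUNK_ELEMS ? total : CHUNK_ELEMS;
        int rc;

        /* A kernel version would use copy_from_user() here. */
        memcpy(local, src, n * sizeof(local[0]));

        rc = fn(local, n, state);
        if (rc)
            return rc;                  /* stop at the first failure */

        src += n;                       /* advance the source pointer */
        total -= n;
    }
    return 0;
}

/* Example worker: sums the elements and counts how often it is called. */
struct tally {
    unsigned long sum;
    size_t calls;
};

static int tally_chunk(const unsigned long *chunk, size_t n, void *state)
{
    struct tally *t = state;
    size_t i;

    for (i = 0; i < n; i++)
        t->sum += chunk[i];
    t->calls++;
    return 0;
}
```

The memory footprint stays at one buffer regardless of how many elements
the caller passes in; the cost is one extra copy and one worker call per
chunk.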

Having said that, that would allocate two pages (always), and the code right now 
allocates max three (for libxc driven migrations). So maybe the problem is 
elsewhere….

Thanks,
Andres

Whilst this may not add much to the discussion, where I have been
working on the improved privcmd.c, I have been using 3.7.0rc5 and 3.8.0.
Both of these seem to work fine for migration using the libxc interface
(since I've been using the XenServer build, the migration is not done
through libxl).

I have not had a single failure to allocate pages in the migration. I
have a script that loops around migrating the guest to the same host as
quickly as it can, and I have used guests up to 64GB (and left that
running overnight; it takes about 3 minutes, so a night gives several
hundred iterations).


So I'm wondering what is different between my setup and this one...

--
Mats

I'd like to note that the forward ported kernels don't appear to
have a similar issue, as they never allocate more than a page at
a time. Was that code consulted at all when that addition was
done?

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel




