
Re: [Xen-devel] [xen-unstable test] 16788: regressions - FAIL



At 16:34 +0000 on 04 Mar (1362414893), Jan Beulich wrote:
> >>> On 04.03.13 at 14:20, "Jan Beulich" <JBeulich@xxxxxxxx> wrote:
> > Having some problem testing this - not with the change itself,
> > but with something else that I don't understand yet
> > (XENMEM_add_to_physmap:XENMAPSPACE_grant_table failing
> > during domain construction, but only when hap=0 in the config
> > file).
> 
> So this is due to a number of factors:
> 
> shadow_enable() calls sh_set_allocation() with 1024 as the page
> count argument. My guest has 8 vCPUs (and 2GB of memory), so
> (once all vCPUs have been created) shadow_min_acceptable_pages()
> returns 9*128=1152. That means there won't ever be a successful
> return from shadow_alloc_p2m_page().

Argh.  sh_set_allocation() deliberately leaves (d->tot_pages / 256) of
overhead on top of shadow_min_acceptable_pages(), but this first call
happens before tot_pages is set. :(
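
The rule it aims for is roughly the per-vCPU minimum plus one 4K page
per MB of guest RAM (tot_pages / 256), something like this sketch
(sh_min_allocation is an illustrative name, not necessarily a real
helper in this tree):

    /* Sketch only: the floor sh_set_allocation() tries to keep.
     * d->tot_pages / 256 is one 4K page per MB of guest RAM, meant
     * as p2m headroom -- but it's zero if evaluated before the
     * guest's memory has been populated. */
    static unsigned int sh_min_allocation(const struct domain *d)
    {
        return shadow_min_acceptable_pages(d) + d->tot_pages / 256;
    }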

> Lowering the
> vCPU count to 2 doesn't help - the portion of pages that becomes
> available will all be consumed for the p2m, and by the time the
> grant table physmap insertion happens there's again nothing left.
> Additionally lowering the memory size to 512M finally did help.
> 
> So apparently xl doesn't set the shadow size early enough: by the
> time domain creation fails, I can't observe any invocation of
> shadow_domctl(), not even libxl__arch_domain_create()'s
> XEN_DOMCTL_SHADOW_OP_SET_ALLOCATION.

Hmmm.  libxl__arch_domain_create() seems to be called from
domcreate_attach_pci(), of all places -- oh, because the whole
process is a series of tail-calls -- and that's very late in
domain build.  I'm pretty sure xend allocated shadow RAM up front.
Or maybe we're only now getting to see VMs big enough that this is a
problem.
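
For context, the call that arrives too late is libxl's shadow-pool
sizing, roughly like this sketch (the exact code and the
xc_shadow_control() parameters in tools/libxl/libxl_x86.c may differ):

    /* Sketch of libxl__arch_domain_create()'s shadow sizing; it only
     * runs at the end of the tail-call chain described above. */
    unsigned long shadow_mb = (info->shadow_memkb + 1023) / 1024;

    /* Ask Xen to resize the domain's shadow pool to shadow_mb MB. */
    xc_shadow_control(ctx->xch, domid,
                      XEN_DOMCTL_SHADOW_OP_SET_ALLOCATION,
                      NULL, 0, &shadow_mb, 0, NULL);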

> That's all very unsatisfying, all the more so since the failure
> isn't easily recognizable as an out-of-(shadow-)memory one. So

How annoying.  I'll add at least a one-time console warning and look
into some more sensible error propagation.
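Something along these lines, perhaps (a sketch only; the check and
message are illustrative, and with no printk_once() available a
static flag does the one-time part):

    /* Sketch: warn once when the shadow pool can't supply a p2m page
     * (e.g. in shadow_alloc_p2m_page()). */
    if ( d->arch.paging.shadow.total_pages
         < shadow_min_acceptable_pages(d) + 1 )
    {
        static bool_t warned;

        if ( !warned )
        {
            warned = 1;
            printk(XENLOG_WARNING
                   "d%d: shadow pool too small to allocate p2m pages\n",
                   d->domain_id);
        }
        return NULL;
    }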

> in the end I spent a couple of hours figuring all this out, rather
> than just running a simple test to verify the change above (in
> order to commit it quickly, so that hopefully the test failures would
> get addressed).
> 
> And in the end, Tim, it doesn't look like Linux HVM guests exercise
> the cross-page-boundary emulation path, so I can't really test the
> code. Shall I put it in nevertheless, or would you be able to give
> this a go beforehand?

Unfortunately there's no chance I can do any testing this week before
Thursday.  I think the patch should go in anyway -- it's not going to
make the situation any worse for VMs that do hit that path.

Tim.
