[Xen-devel] PoD code killing domain before it really gets started
George,

in the hope that you might have some insight, or might remember something like this having been reported before (and ideally fixed), I'll try to describe a problem a customer of ours reported. Unfortunately this is with Xen 4.0.x (plus numerous backports), and it is not known whether the same issue exists on 4.1.x or -unstable.

For a domain with maxmem=16000M and memory=3200M, what gets logged is

(XEN) p2m_pod_demand_populate: Out of populate-on-demand memory! tot_pages 480 pod_entries 221184
(XEN) domain_crash called from p2m.c:1150
(XEN) Domain 3 reported crashed by domain 0 on cpu#6:
(XEN) p2m_pod_demand_populate: Out of populate-on-demand memory! tot_pages 480 pod_entries 221184
(XEN) domain_crash called from p2m.c:1150

Translated to hex, the numbers are 0x1e0 and 0x36000. The latter varies across the (rather infrequent) cases where this happens (but was always a multiple of 0x1000 - see below), and immediate retries to create the affected domain have always succeeded so far (i.e. the failure is definitely not due to a lack of free memory).

Given that the memory= target had not been reached yet, I would conclude that this happens in the middle of (4.0.x file name used here) tools/libxc/xc_hvm_build.c:setup_guest()'s main physmap population code. However, the way I read the code there, I would think that the sequence of population should be (using hex GFNs) 0-9f, c0-7ff, 800-fff, 1000-17ff, etc. That, however, appears to be inconsistent with the logged numbers above - tot_pages should always be at least 0x7e0 (low 2Mb less the VGA hole), especially when pod_entries is divisible by 0x800 (the increment by which large page population happens). As a result of this apparent inconsistency I can't really conclude anything from the logged numbers.

The main question, irrespective of any numbers, of course is: How would p2m_pod_demand_populate() be invoked at all during this early phase of domain construction? Nothing should be touching any of the memory...
If this is nevertheless possible (even if just for a single page), then perhaps the tools ought to make sure the pages put into the low 2Mb actually get zeroed, so the PoD code has a chance to find victim pages.

Thanks for any thoughts or pointers,
Jan
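The hex arithmetic in the message can be re-checked with a short Python sketch. It only models the population sequence as read in the message (GFNs 0-9f, c0-7ff, then steps of 0x800); it does not reproduce what setup_guest() or the hypervisor actually does. The 4 KiB page size and the helper name expected_tot_pages are assumptions made here for illustration and are not taken from the Xen sources.

```python
# Assumptions (not from the original message): 4 KiB pages, and a simplified
# model in which population proceeds in whole chunks of 0x800 frames.
PAGE_SIZE = 4096
CHUNK = 0x800                  # population increment quoted in the message
VGA_HOLE = range(0xA0, 0xC0)   # GFNs skipped between 9f and c0

# Values taken from the logged line.
tot_pages = 480
pod_entries = 221184


def expected_tot_pages(chunks_done):
    """Hypothetical helper: frames populated after `chunks_done` full chunks,
    following the sequence described in the message (VGA hole excluded)."""
    if chunks_done == 0:
        return 0
    return chunks_done * CHUNK - len(VGA_HOLE)


if __name__ == "__main__":
    print("tot_pages   =", hex(tot_pages))
    print("pod_entries =", hex(pod_entries))
    print("pod_entries multiple of 0x1000:", pod_entries % 0x1000 == 0)
    print("pod_entries multiple of 0x800: ", pod_entries % CHUNK == 0)
    print("expected minimum after first chunk:", hex(expected_tot_pages(1)))
    print("logged tot_pages below that minimum:",
          tot_pages < expected_tot_pages(1))
    print("pod_entries in MiB:", pod_entries * PAGE_SIZE // 2**20)
```

Under these assumptions the logged tot_pages of 0x1e0 is below the 0x7e0 that the first chunk alone would account for, which is the inconsistency the message describes. With 4 KiB pages, 0x800 frames span 8 MiB.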