[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [xen-unstable-smoke test] 173492: regressions - FAIL
On 12/10/2022 11:49, Julien Grall wrote: > Hi Andrew, > > On 12/10/2022 11:29, Andrew Cooper wrote: >> On 12/10/2022 11:01, Julien Grall wrote: >>> (+ Bertrand & Stefano) >>> >>> Hi Henry, >>> >>> On 12/10/2022 07:39, Henry Wang wrote: >>>>> -----Original Message----- >>>>> Subject: Re: [xen-unstable-smoke test] 173492: regressions - FAIL >>>>> >>>>> On 11.10.2022 18:29, osstest service owner wrote: >>>>>> flight 173492 xen-unstable-smoke real [real] >>>>>> http://logs.test-lab.xenproject.org/osstest/logs/173492/ >>>>>> >>>>>> Regressions :-( >>>>>> >>>>>> Tests which did not succeed and are blocking, >>>>>> including tests which could not be run: >>>>>> test-arm64-arm64-xl-xsm 14 guest-start fail >>>>>> REGR. vs. 173457 >>>>> >>>>> Parsing config from /etc/xen/debian.guest.osstest.cfg >>>>> libxl: debug: libxl_create.c:2079:do_domain_create: ao >>>>> 0xaaaacaccf680: >>>>> create: how=(nil) callback=(nil) poller=0xaaaacaccefd0 >>>>> libxl: detail: libxl_create.c:661:libxl__domain_make: passthrough: >>>>> disabled >>>>> libxl: debug: libxl_arm.c:148:libxl__arch_domain_prepare_config: >>>>> Configure >>>>> the domain >>>>> libxl: debug: libxl_arm.c:151:libxl__arch_domain_prepare_config: - >>>>> Allocate >>>>> 0 SPIs >>>>> libxl: error: libxl_create.c:709:libxl__domain_make: domain creation >>>>> fail: No >>>>> such file or directory >>> >>> So this is -ENOENT which could be returned by the P2M is it can't >>> allocate a page table (see p2m_set_entry()). >>> >>>>> libxl: error: libxl_create.c:1294:initiate_domain_create: cannot >>>>> make domain: >>>>> -3 >>>>> >>>>> Later flights don't fail here anymore, though. >>>>> >>>>>> test-armhf-armhf-xl 14 guest-start fail >>>>>> REGR. vs. 173457 >>>>> >>>>> Similar log contents here, but later flights continue to fail the >>>>> same way. >>>>> >>>>> I'm afraid I can't draw conclusions from this; I haven't been able >>>>> to spot >>>>> anything helpful in the hypervisor logs. My best guess right now is >>>>> the use >>>>> of some uninitialized memory, which just happened to go fine in the >>>>> later >>>>> flights for 64-bit. >>> >>> It looks like the smoke flight failed on laxton0 but passed on >>> rochester{0, 1}. The former is using GICv2 whilst the latter are using >>> GICv3. >>> >>> In the case of GICv2, we will create a P2M mapping when the domain is >>> created. This is not necessary in the GICv3. >>> >>> IIRC the P2M pool is only populated later on (we don't add a few pages >>> like on x86). So I am guessing this is why we are seen failure. >>> >>> If that's correct, then this is a complete oversight from me (I >>> haven't done any GICv2 testing) while reviewing the series. >>> >>> The easy way to solve it would be to add a few pages in the pool when >>> the domain is created. I don't like it, but I think there other >>> possible solutions would require more work as we would need to delay >>> the mappings. >> >> Honestly, I've considered doing this on x86 too. > > AFAICT, this is already the case for HAP (see call to > hap_set_allocation() in hap_enable()). 256 pages will be pre-allocated. Right, but it's asymmetric with shadow. This wants fixing and simplifying. > >> >> There are several things which want allocating in domain_create(), but >> are deferred to max_vcpus() because they require the P2M having a >> non-zero allocation. This in turn means we've got a load of checks in >> paths where we'd ideally not have them. >> >> We already have a calculation of the absolutely minimum we will ever >> permit the p2m pool to be. IMO we ought to allocate this minimum size >> in domain_create(). > > It depends on the number. At the moment domain_create() is not > preemptible, so we don't want to allocate too many pages (I think even > 256 pages could be risky on some Arm platform). > > Maybe the solution is to have domain_create() preemptible. But it is > not something that could be done in the 4.17 time frame. domain_create() can't be pre-emptible in its current form, because it depends on "atomically" taking the domid from not existing to existing. Specifically, until the hypercall completes, other hypercalls can't find a struct domain* for the domid. This is necessary, because we guarantee that when you can look up a domain by domid, e.g. the predicates work on it, and d->max_vcpus is nonzero, etc. In some future where the error paths have been made idempotent and we have a clean split between teardown and destroy, we probably can alter the existing creation path to do a more basic initial setup (which can be cleaned up by the destroy logic), then insert the domain into dom hashtable and automatically continue into a different subop and perform more long-running setup. But yeah - absolutely definitely not 4.17 content. ~Andrew
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |