Re: [PATCH V3 (resend) 01/19] x86: Create per-domain mapping of guest_root_pt
Hi Jan,

On 14/05/2024 15:51, Jan Beulich wrote:
> On 13.05.2024 15:40, Elias El Yandouzi wrote:
>> From: Hongyan Xia <hongyxia@xxxxxxxxxx>
>>
>> Create a per-domain mapping of PV guest_root_pt as direct map is being
>> removed.
>>
>> Note that we do not map and unmap root_pgt for now since it is still a
>> xenheap page.
>>
>> Signed-off-by: Hongyan Xia <hongyxia@xxxxxxxxxx>
>> Signed-off-by: Julien Grall <jgrall@xxxxxxxxxx>
>> Signed-off-by: Elias El Yandouzi <eliasely@xxxxxxxxxx>
>>
>> ----
>> Changes in V3:
>>     * Rename SHADOW_ROOT
>>     * Haven't addressed the potentially over-allocation issue as I
>>       don't get it
>
> I thought I had explained in enough detail that the GDT/LDT area needs
> quite a bit more space (2 times 64k per vCPU) than the root PT one (4k
> per vCPU). Thus while d->arch.pv.gdt_ldt_l1tab really needs to point at
> a full page (as long as not taking into account dynamic domain
> properties), d->arch.pv.root_pt_l1tab doesn't need to (and hence might
> better be allocated using xzalloc() / xzalloc_array(), even when also
> not taking into account dynamic domain properties, i.e. vCPU count).

I just understood your point and yes you're correct I was
over-allocating... Sorry, it took me so long to get it.

I'll go instead with:

@@ -371,6 +396,12 @@ int pv_domain_initialise(struct domain *d)
         goto fail;
     clear_page(d->arch.pv.gdt_ldt_l1tab);
 
+    d->arch.pv.root_pt_l1tab =
+        xzalloc_array(l1_pgentry_t *,
+                      DIV_ROUND_UP(d->max_vcpus, L1_PAGETABLE_ENTRIES));
+    if ( !d->arch.pv.root_pt_l1tab )
+        goto fail;
+
     if ( levelling_caps & ~LCAP_faulting &&
          (d->arch.pv.cpuidmasks = xmemdup(&cpuidmask_defaults)) == NULL )
         goto fail;

However, I noticed quite a weird bug while doing some testing. I may
need your expertise to find the root cause.

In the case where I have more vCPUs than pCPUs (and let's consider we
have one pCPU for two vCPUs), I noticed that I would always get a page
fault in dom0 kernel (5.10.0-13-amd64) at the exact same location. I did
a bit of investigation but I couldn't come to a clear conclusion.
Looking at the stack trace [1], I have the feeling the crash occurs in a
loop or a recursive call.

I tried to identify where the crash occurred using addr2line:

    > addr2line -e vmlinux-5.10.0-29-amd64 0xffffffff810218a0
    debian/build/build_amd64_none_amd64/arch/x86/xen/mmu_pv.c:880

It turns out to point at the closing bracket of the function
xen_mm_unpin_all() [2].

I thought the crash could happen while returning from the function, in
the assembly epilogue, but the output of objdump doesn't even show the
address.

The only theory I could think of was that because we only have one pCPU,
we may never execute one of the two vCPUs, and never set up the mapping
to the guest_root_pt in write_ptbase(), hence the page fault. This is
just a random theory, I couldn't find any hint suggesting it would be
the case though. Any idea how I could debug this?

[1] https://pastebin.com/UaGRaV6a
[2] https://github.com/torvalds/linux/blob/v5.10/arch/x86/xen/mmu_pv.c#L880

Elias
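
To make the sizing argument from the review concrete, here is a minimal sketch (not Xen code) that counts how many L1 page tables each per-domain area needs. The helper names l1_tables_needed() and l1tab_array_bytes() are hypothetical, and the PAGE_SIZE and L1_PAGETABLE_ENTRIES values are assumptions for x86-64 with 4 KiB pages and 8-byte entries. The per-vCPU sizes (2 times 64k for GDT/LDT, 4k for the root PT) are the ones quoted in the review.

```c
#include <stddef.h>

/* Assumed x86-64 values; they are not spelled out in the thread. */
#define PAGE_SIZE            4096u
#define L1_PAGETABLE_ENTRIES 512u   /* 4 KiB page / 8-byte entry */
#define DIV_ROUND_UP(n, d)   (((n) + (d) - 1) / (d))

/* Pages mapped per vCPU, derived from the sizes quoted in the review. */
#define GDT_LDT_PAGES_PER_VCPU (2u * 64u * 1024u / PAGE_SIZE)   /* 32 */
#define ROOT_PT_PAGES_PER_VCPU 1u                               /* 4k */

/*
 * Hypothetical helper: number of L1 page tables needed to map
 * pages_per_vcpu pages for each of vcpus vCPUs.
 */
unsigned int l1_tables_needed(unsigned int vcpus, unsigned int pages_per_vcpu)
{
    return DIV_ROUND_UP(vcpus * pages_per_vcpu, L1_PAGETABLE_ENTRIES);
}

/*
 * Hypothetical helper: bytes needed for the array of pointers to those
 * L1 tables (one pointer per table).
 */
size_t l1tab_array_bytes(unsigned int vcpus, unsigned int pages_per_vcpu)
{
    return (size_t)l1_tables_needed(vcpus, pages_per_vcpu) * sizeof(void *);
}
```

Taking 8192 vCPUs as a hypothetical upper bound, l1_tables_needed(8192, GDT_LDT_PAGES_PER_VCPU) gives 512 pointers, which fills a 4 KiB page when pointers are 8 bytes, while l1_tables_needed(8192, ROOT_PT_PAGES_PER_VCPU) gives only 16. That is the gap between the two arrays, and why an xzalloc_array() sized by d->max_vcpus is enough for root_pt_l1tab.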