Re: [RFC] analysis of runtime xenheap memory allocations
On 18/03/2025 01:35, Andrew Cooper wrote:
> On 17/03/2025 6:48 pm, Andrea Bastoni wrote:
>> Hi,
>>
>> In the last months we have been working with Stefano's team at AMD on
>> tools to facilitate the analysis of "anonymous" memory allocations
>> performed at "runtime" from the shared Xen heap.
>>
>> - Anonymous here means: allocations that draw from the common Xen heap
>>   pool (i.e., not from a per-domain heap pool).
>> - Runtime here means: system_state >= SYS_STATE_active.
>>
>> TL;DR: A set of patches prints a warning stack when anonymous
>> allocations are detected at runtime. Scripts help to parse the stack
>> traces, which can then be checked to evaluate how problematic certain
>> allocation paths are, or how difficult it would be to update them. A
>> full example of the results is e.g. in [1], and the README details are
>> in [2]. More details and some commented stack traces are below in the
>> email.
>>
>> Feedback, especially on the paths that are currently marked as
>> "UNDECIDED", is very welcome.
>>
>> [1]: https://gitlab.com/minervasys/public/xen/-/blob/minerva/warn/minerva_analysis/1674833972/parsed/x86_64/xilinx-hyperlaunch-x86_64-gcc-debug-virtio-pci-net
>> [2]: https://gitlab.com/minervasys/public/xen/-/blob/minerva/warn/minerva_analysis/README.md
>
> Hello,
>
> A few observations.
>
> As you note, assigning ownership to current->domain is usually the
> wrong thing. Setting PGC_allocated however is worse; you've created a
> security vulnerability where guests can free memory that Xen thinks is
> private to it.
>
> Similarly, you can't interchange xenheap and domheap like that. It will
> work on small systems, but not on large ones. Nevertheless, it's an ok
> start for debugging.

Thank you for the feedback. As you saw, there are two parts. Here you're
referring to
https://gitlab.com/minervasys/public/xen/-/commit/c1d6baae27932d2a7f07e82560deae5f64c5536a
which we used as a debugging aid to find out where the configuration
needed to be updated. This is in a separate branch from the rest of the
work (minerva/per-domain-xenheap-idea-approach vs minerva/warn).

> Ages ago, I tried to make a start on this problem,
> https://xenbits.xen.org/gitweb/?p=people/andrewcoop/xen.git;a=shortlog;h=refs/heads/xen-fault-ttl
> simply by counting the number of xmalloc()'s to the domain with which
> they were logically associated. It's stalled because the
> fault-injection test found a refcounting bug and there isn't even an
> agreement on whether it's a bug or not, or how to fix it.

The other part follows a similar line, trying to identify e.g. who
triggered the xmalloc.

> I'm struggling to follow the analysis on
> https://gitlab.com/xen-project/people/sstabellini/xen/-/jobs/9207014172 ;
> however, it appears to be a mix of reports between arm64 and x86_64
> jobs, and they need separating to get a clearer picture.

The job prints a summary of each analyzed log. The actual analysis is
created as an artifact of the pipeline and can be downloaded. The output
is then split per architecture, similarly to what has been committed
here:
https://gitlab.com/minervasys/public/xen/-/tree/minerva/warn/minerva_analysis/1674833972/parsed

> The bottom of each stack trace appears to be doubled. Is that
> intentional?

Nice catch. On arm the parser of the logs needs to be modified to avoid
counting this frame twice:

(XEN) [ 240.296766] [<00000a000022f784>] _xmalloc+0xf0/0x32c (PC)
(XEN) [ 240.298386] [<00000a000022f784>] _xmalloc+0xf0/0x32c (LR)
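For reference, a minimal Python sketch of how the parser could drop the
duplicated (LR) entry. The regex and helper below are illustrative only,
assume the line format shown above, and are not the actual parser code:

import re

# Matches e.g.:
#   (XEN) [ 240.296766] [<00000a000022f784>] _xmalloc+0xf0/0x32c (PC)
# The trailing (PC)/(LR) tag is optional: only the top of the trace has it.
FRAME_RE = re.compile(
    r'\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<sym>\S+)(?:\s+\((?P<kind>PC|LR)\))?')

def parse_frames(lines):
    """Extract (address, symbol) frames, keeping a single copy of the
    top frame when it is reported both as (PC) and as (LR)."""
    frames = []
    for line in lines:
        m = FRAME_RE.search(line)
        if not m:
            continue
        frame = (m.group('addr'), m.group('sym'))
        if m.group('kind') == 'LR' and frames and frames[-1] == frame:
            continue  # same frame already recorded from the (PC) line
        frames.append(frame)
    return frames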
> Part of the problem is that xmalloc() isn't separated from
> {xen,dom}heap, despite the former being layered on top of the latter.
> All *heap pages are going to a multiple of 4k as that's the allocation
> unit. It will probably be better to terminate at _xmalloc() and ignore
> the call down into alloc_domheap_page().

Those are tracked separately (and the free also tries to match them
separately): "DEBUG:...xmem_pool_alloc" terminates at the _xmalloc(),
while "DEBUG...xmalloc_whole_pages" tracks the replenishments.

> Importantly, _xmalloc()
> allocating another 4k page isn't strictly related to the caller, when
> it's doing so to replenish its free pool.

Yes, the ownership tracking in free_xenheap_pages() is fuzzy.

> The 1-byte allocations are silly. bitmap_to_xenctl_bitmap() is buggy
> for trying to cater to an endianness case we simply don't support.
> There's no memory allocation needed at all. There are patches to
> resolve this from aaages back, blocked on a request to fix up other
> endian helpers in the meantime.
>
> evtchn_fifo_init_control() seems to show up a lot, and seems to be the
> only dynamic allocation triggered by the guest itself. It's reasonably
> large, but probably ought to be allocated up front. event_fifo is
> pretty ubiquitous these days. I'm not sure where the alternative sizes
> come from; they ought to all be 392 bytes.

You mean like here?
https://gitlab.com/minervasys/public/xen/-/blob/minerva/warn/minerva_analysis/1674833972/parsed/arm64/qemu-dom0less-arm64-gcc-debug-virtio-pci-net#L222

> // MINERVA: PROB_SAFE
> // 1) evtchn_fifo_init_* should only happen once per domain at boot
> //    (NOTE: Count not always 2)
> // 2) evtchn fifo evtchn can be disabled in favor of the older 2l evtchn
> //    interface that doesn't require allocations (xen_fifo_events=false)
> Domain : d1
> Occurrences: 2
> Total size : 1424 B
> entry.o#guest_sync_slowpath
> do_trap_guest_sync
> traps.c#do_trap_hypercall
> do_event_channel_op
> evtchn_fifo_init_control
> _xzalloc
> _xmalloc
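As context for the summaries quoted above and below: they are
essentially grouped-and-counted allocation paths. A rough Python sketch
of that aggregation step, where the field names mirror the output but
the data model and helper are illustrative rather than the actual
analysis code:

from collections import defaultdict

def summarize(traces):
    # Each trace is assumed to be a dict with 'domain', 'size' (bytes)
    # and 'frames' (list of symbols); the real parser's data model may
    # differ. Identical (domain, call path) pairs are merged.
    buckets = defaultdict(lambda: {"count": 0, "bytes": 0})
    for t in traces:
        key = (t["domain"], tuple(t["frames"]))
        buckets[key]["count"] += 1
        buckets[key]["bytes"] += t["size"]
    for (domain, frames), stats in buckets.items():
        print(f"Domain : {domain}")
        print(f"Occurrences: {stats['count']}")
        print(f"Total size : {stats['bytes']} B")
        for frame in frames:
            print(frame)
        print()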
> I'm concerned by the xfree() in efi_runtime_call() seeing as there's no
> associated allocation that I can see.

The verbose log files contain all the parsed allocation paths, e.g.:
https://gitlab.com/minervasys/public/xen/-/blob/minerva/warn/minerva_analysis/1674833972/parsed/x86_64/verbose/xilinx-hyperlaunch-x86_64-gcc-debug-virtio-pci-net#L25
The summary misses this information. This can be improved.

> Finally, it's not just anonymous allocations you need to worry about.
> Even ones out of the p2m memory pool are still fatal at the point of
> exhaustion, and can cause safety issues.

Some of the allocations on such paths are tracked, e.g.:
https://gitlab.com/minervasys/public/xen/-/blob/minerva/warn/minerva_analysis/1674833972/parsed/x86_64/xilinx-hyperlaunch-x86_64-gcc-debug-virtio-pci-net#L860

> // MINERVA: UNDECIDED
> // Similar to guest_remove_page
> // allocation related to the memory management of iommu page tables for
> // a domain (the domain is available down the chain).
> // The allocation of the iommu PDE is anonymous.
> // If no memory is available, the domain is crashed.
> // Unclear if only selected VMs can trigger this path.
> Domain : d0
> Occurrences: 2
> Total size : 8192 B
> svm_stgi_label
> svm_vmexit_handler
> hvm_hypercall
> hvm_memory_op
> do_memory_op
> memory.c#xenmem_add_to_physmap_batch
> xenmem_add_to_physmap_one
> set_foreign_p2m_entry
> p2m.c#set_typed_p2m_entry
> p2m_set_entry
> p2m-pt.c#p2m_pt_set_entry
> iommu_legacy_map
> iommu_map
> amd_iommu_map_page
> iommu_map.c#iommu_pde_from_dfn
> iommu_alloc_pgtable
> alloc_domheap_pages

Maybe there is a better place to terminate / catch these cases?

Thanks,
Andrea