[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [RFC] analysis of runtime xenheap memory allocations
On 17/03/2025 6:48 pm, Andrea Bastoni wrote: > Hi, > > In the last months we have been working with Stefano's team in AMD at tools to > facilitate the analysis of "anonymous" memory allocations performed at > "runtime" > from the shared Xen heap. > > - Anonymous here means: allocations that insist on the common Xen heap pool > (i.e., not on a per-domain heap pool). > - Runtime here means: system_state >= SYS_STATE_active. > > TL;DR: A set of patches print a warning stack when anonymous allocations are > detected at runtime. Scripts help to parse the stack traces that can be > checked > to evaluate how problematic certain allocations paths can be or how difficult > it > could be to update them. A full example of the results is e.g., in [1], and > the > Readme-details in [2]. Below in the email more details and some commented > stack > traces. > > Feedback, especially on the paths that are currently marked as "UNDECIDED" > is very welcome. > > [1]: > https://gitlab.com/minervasys/public/xen/-/blob/minerva/warn/minerva_analysis/1674833972/parsed/x86_64/xilinx-hyperlaunch-x86_64-gcc-debug-virtio-pci-net > [2]: > https://gitlab.com/minervasys/public/xen/-/blob/minerva/warn/minerva_analysis/README.md Hello, A few observations. As you note, assigning ownership to current->domain is usually the wrong thing. Setting PGC_allocated however is worse; you've created a security vulnerability where guests can free memory that Xen things is private to it. Similarly, you can't interchange xenheap and domheap like that. It will work on small systems, but not on large ones. Nevertheless, it's an ok start for debugging. Ages ago, I tried to make a start on this problem, https://xenbits.xen.org/gitweb/?p=people/andrewcoop/xen.git;a=shortlog;h=refs/heads/xen-fault-ttl simply by counting the number of xmalloc()'s to the domain with with they were logically associated. It's stalled because the fault-injection test found a refcounting bug and there isn't even an agreement on whether it's a bug or not, or how to fix it. I'm struggling to follow the analysis on https://gitlab.com/xen-project/people/sstabellini/xen/-/jobs/9207014172 ; However, it appears to be a mix of reports between arm64 and x86_64 jobs, and they need separating to get a clearer picture. The bottom of each stack trace appears to be doubled. Is that intentional? Part of the problem is that xmalloc() isn't separated from {xen,dom}heap, despite the former being layered on top of the latter. All *heap pages are going to a multiple of 4k as that's the allocation unit. It will probably be better to terminate at _xmalloc() and ignore the call down into alloc_domheap_page(). Importantly, _malloc allocating another 4k page isn't strictly related to the caller, when it's doing so to replenish it's free pool. The 1-byte allocations are silly. bitmap_to_xenctl_bitmap() is buggy for trying to cater to an endianness case we simply don't support. There's no memory allocation needed at all. There are patches to resolve this from aaages back, blocked on a request to fix up other endian helpers in the meantime. evtchn_fifo_init_control() seems to show up a lot, and seems to be the only dynamic allocation triggered by the guest itself. It's reasonably large, but probably ought to be allocated up front. event_fifo is pretty ubiquitous these days. I'm not sure where the alternative sizes come from; they ought to all be 392 bytes. I'm concerned by the xfree() in efi_runtime_call() seeing as there's no associated allocation that I can see. Finally, it's not just anonymous allocations you need to worry about. Even ones out of the p2m memory pool are still fatal at the point of exhaustion, and can cause safety issues. ~Andrew
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |