
[RFC] analysis of runtime xenheap memory allocations


  • To: xen-devel@xxxxxxxxxxxxxxxxxxxx
  • From: Andrea Bastoni <andrea.bastoni@xxxxxxxxxxxxxxx>
  • Date: Mon, 17 Mar 2025 19:48:59 +0100
  • Cc: Stefano Stabellini <stefano.stabellini@xxxxxxx>, Ayan Kumar Halder <ayan.kumar.halder@xxxxxxx>, Michal Orzel <michal.orzel@xxxxxxx>, VictorM.Lira@xxxxxxx, Carlo Nonato <carlo.nonato@xxxxxxxxxxxxxxx>, Alex Zuepke <alex.zuepke@xxxxxxxxxxxxxxx>, Julien Grall <julien@xxxxxxx>, Bertrand Marquis <Bertrand.Marquis@xxxxxxx>, jgross@xxxxxxxx, Jan Beulich <jbeulich@xxxxxxxx>, Anthony PERARD <anthony.perard@xxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
  • Delivery-date: Mon, 17 Mar 2025 18:49:11 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

Hi,

Over the last months we have been working with Stefano's team at AMD on tools
to facilitate the analysis of "anonymous" memory allocations performed at
"runtime" from the shared Xen heap.

- Anonymous here means: allocations that draw on the common Xen heap pool
  (i.e., not on a per-domain heap pool).
- Runtime here means: system_state >= SYS_STATE_active.
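
To make the distinction concrete, here is a minimal sketch of the two flavors
as they appear at a call site. The allocator signatures are the existing Xen
ones; the surrounding function is hypothetical:

```
#include <xen/mm.h>
#include <xen/xmalloc.h>
#include <xen/cache.h>

/* Hypothetical caller, for illustration only. */
static void example_allocations(struct domain *d)
{
    /* Per-domain: the pages are accounted against d's heap pool. */
    struct page_info *pg_dom = alloc_domheap_pages(d, 0, 0);

    /* "Anonymous": d == NULL draws on the common Xen heap pool. */
    struct page_info *pg_anon = alloc_domheap_pages(NULL, 0, 0);

    /* xmalloc()/xfree() are likewise anonymous (xenheap-backed). */
    void *buf = _xmalloc(128, SMP_CACHE_BYTES);

    xfree(buf);
    if ( pg_anon )
        free_domheap_pages(pg_anon, 0);
    if ( pg_dom )
        free_domheap_pages(pg_dom, 0);
}
```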

TL;DR: A set of patches prints a warning stack trace when anonymous
allocations are detected at runtime. Scripts help parse the stack traces,
which can then be reviewed to evaluate how problematic certain allocation
paths are and how difficult it would be to update them. A full example of the
results is in [1], and the README with the details is in [2]. More details and
some commented stack traces follow below.

Feedback, especially on the paths that are currently marked as "UNDECIDED"
is very welcome.

[1]: 
https://gitlab.com/minervasys/public/xen/-/blob/minerva/warn/minerva_analysis/1674833972/parsed/x86_64/xilinx-hyperlaunch-x86_64-gcc-debug-virtio-pci-net
[2]: 
https://gitlab.com/minervasys/public/xen/-/blob/minerva/warn/minerva_analysis/README.md


Here is the long version.

This work is a starting point to identify the paths leading to anonymous
allocations, determine whether they could be "problematic" (see below), and
simplify decisions on possible changes targeting such allocations.

Why
---
In contexts like those targeted by Xen for safety, we want to run VMs of
different criticality side-by-side (in parallel).
Xen provides services to both critical and non-critical VMs. Shared resources
at the Xen level may become problematic when a low- or no-criticality VM can
prevent higher-criticality VMs from receiving the requested service.

This is also relevant in a security context, where we want to ensure
that VMs cannot deplete resources shared with other VMs.

The shared Xen heap (xenheap) is one such shared resource: it is accessed on
behalf of both potentially critical and non-critical VMs to serve various
memory allocations. While for system_state < SYS_STATE_active we can
statically determine the amount of Xen heap the system will consume, at
runtime this becomes blurry, as it depends on how the system is used. This is
problematic when a critical VM cannot receive its intended service due to
xenheap memory depletion.

What
----
We want to identify these anonymous allocations, understand the paths that
lead to them, and the sizes they allocate. Also, many allocations are expected
to be matched by a "free", and it is useful to keep track of how much memory
is left allocated after a service completes.

To address this, we have been following multiple directions:
a) Runtime testing
b) Offline call path analysis
c) Draft of changes needed for a shared -> per-domain heap switch

Runtime testing
---------------
See: 
https://gitlab.com/minervasys/public/xen/-/blob/minerva/warn/minerva_analysis/README.md

The idea is to hook into the Xen testing infrastructure and detect anonymous
allocations at runtime.

The system is composed of a set of patches to `xen/common/page_alloc.c` and
`xen/common/xmalloc_tlsf.c` that produce a warning stack trace at runtime when
anonymous allocations are detected.
In such cases, the warning also tries to gather additional information, such
as the currently running domain and the pointer to the allocated memory area.
Such information is later matched (if possible) against corresponding frees.
Specifically, if possible, anonymous memory allocated via `alloc_domheap_pages`
is matched against `free_domheap_pages`, while memory allocated via `xmalloc` is
matched against `xfree`.
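
The actual patches are in the branch above; as a rough sketch of the idea, the
hook amounts to something like the following (the function name is
hypothetical; system_state, current and dump_execution_state() are existing
Xen facilities):

```
#include <xen/lib.h>
#include <xen/sched.h>

/*
 * Hypothetical hook, called from the alloc_domheap_pages() and
 * _xmalloc() paths; the real patches are in the minerva/warn branch.
 */
static void warn_anonymous_alloc(const struct domain *d, const void *ptr,
                                 unsigned long size)
{
    /* Boot-time or properly attributed allocations: nothing to report. */
    if ( system_state < SYS_STATE_active || d != NULL )
        return;

    /*
     * Record the running domain and the pointer, so the log parser can
     * later match this allocation against the corresponding free.
     */
    printk("MINERVA: anonymous alloc %p size %lu (current: d%d)\n",
           ptr, size, current->domain->domain_id);

    dump_execution_state(); /* emits the warning stack trace */
}
```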

Depending on the type of test, the amount of information discovered by the tool
might differ.

The execution can be integrated into the Xen CI pipeline (e.g.,
https://gitlab.com/xen-project/people/sstabellini/xen/-/pipelines/1682205969).

A script
(https://gitlab.com/minervasys/public/xen/-/blob/minerva/warn/minerva_analysis/log_parser.py)
parses the CI output and produces statistics and the stack traces leading to
each allocation. Those stack traces can be reviewed to evaluate how
problematic certain allocation paths are and how difficult it would be to
update them.

The tool supports building a "database" of comments associated with a stack
trace and can automatically (re)apply comments to known stack traces in
subsequent runs. (See "Automated Generation of Comments" in the README.)

An example of the processing can be seen here (for pipeline 1674833972):
https://gitlab.com/minervasys/public/xen/-/tree/minerva/warn/minerva_analysis/1674833972
Here, the parsed files are already commented, for example:

```
// MINERVA: UNDECIDED
// Anonymous allocation increases per-CPU heap.
// Allocation not attributable to any domain.
Domain     : d0
Occurrences: 1
Total size : 128 B
return_to_new_vcpu64
  leave_hypervisor_to_guest
    traps.c#check_for_pcpu_work
      do_softirq
        softirq.c#__do_softirq
          timer.c#timer_softirq_action
            _xmalloc
              _xmalloc
```
In this example, a direct `_xmalloc` is triggered from an ISR.
No domain can be held accountable for such an allocation, and a different
strategy to ensure/justify the boundedness of such allocations should be
planned.
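
For context, this trace corresponds to the per-CPU timer heap being grown on
demand from softirq context. A simplified, hypothetical sketch of that pattern
(the function below is illustrative, not the actual timer.c code):

```
#include <xen/timer.h>
#include <xen/xmalloc.h>

/* Illustrative sketch: grow a per-CPU array from softirq context. */
static struct timer **grow_timer_heap(struct timer **heap,
                                      unsigned int new_limit)
{
    /*
     * No domain is accountable here: the softirq runs on behalf of the
     * whole system, so xmalloc_array() draws anonymously on the shared
     * xenheap.
     */
    struct timer **newheap = xmalloc_array(struct timer *, new_limit + 1);

    if ( newheap == NULL )
        return heap; /* keep the old, smaller heap on failure */

    /* (Copying of the old entries and xfree() of the old array omitted.) */
    return newheap;
}
```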

```
// MINERVA: PROB_SAFE (under the assumption that hwdom are safe)
// the path is already attributed to a domain pool, except if the domain is
// a hardware domain, in this case the allocation is anonymous (p2m_alloc_page).
alloc_domheap_pages()
entry.o#guest_sync_slowpath
  do_trap_guest_sync
    traps.c#do_trap_hypercall
      do_memory_op
        xenmem_add_to_physmap
          xenmem_add_to_physmap_one
            p2m_set_entry
              p2m.c#__p2m_set_entry
                p2m.c#p2m_next_level
                  p2m.c#p2m_alloc_page
                    alloc_domheap_pages
                      alloc_domheap_pages
```
In this example, the allocation is marked as "probably safe", since the path
is taken only when the domain is a hardware domain.
Such domains can (in the usage scenarios of interest) be trusted to perform
allocations.
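
A hypothetical sketch of the branch described above (the pool helper is made
up; the real logic lives in p2m.c#p2m_alloc_page):

```
#include <xen/mm.h>
#include <xen/sched.h>

/* Hypothetical: stands in for the per-domain paging-pool logic. */
static struct page_info *take_page_from_p2m_pool(struct domain *d);

/* Illustrative sketch of the hardware-domain special case. */
static struct page_info *example_p2m_alloc_page(struct domain *d)
{
    /*
     * Hardware domain: the page is an anonymous xenheap-backed
     * allocation, not accounted against any per-domain pool.
     */
    if ( is_hardware_domain(d) )
        return alloc_domheap_pages(NULL, 0, 0);

    /* All other domains: the page comes out of a pre-sized P2M pool. */
    return take_page_from_p2m_pool(d);
}
```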


Offline Call Path Analysis
--------------------------
See: 
https://gitlab.com/minervasys/public/xen/-/blob/minerva/warn/minerva_static_analysis/README.md

This analysis complements the runtime discovery of anonymous allocations; it
uses compiler support to generate call graphs (via -fcallgraph-info).

Two scripts support parsing and filtering the (large amount of) data generated.

`callpath.py` generates the call paths leading to one selected function.
`filter.py` can filter the call paths based on a whitelist/blacklist of
functions that should be included/excluded.

For example, in the above branch we have used the scripts to generate and
refine the paths leading to `alloc_domheap_pages()`.

The offline analysis can complement the runtime one by reporting reachable
paths that have not (yet) been exercised in the testing scenarios.


Draft of changes for per-domain heap
------------------------------------
See: 
https://gitlab.com/minervasys/public/xen/-/tree/minerva/per-domain-xenheap-idea-approach

We have prototyped some of the changes that would be needed (especially in the
configuration) if more of the allocations that are currently anonymous were
served from a per-domain heap instead.

The main change in the prototype is to enforce the current domain as the
target heap for all anonymous (i.e., domain == NULL) allocations triggered
from alloc_xenheap_pages().
We also force the use of the newly modified alloc_xenheap_page() in
alloc_xen_pagetable():

```
-    pg = alloc_domheap_pages(NULL, order, memflags | MEMF_no_scrub);
+    struct domain *d = NULL; /* stays anonymous before runtime */
+
+    if (system_state == SYS_STATE_active)
+        d = current->domain;
+
+    pg = alloc_domheap_pages(d, order, memflags | MEMF_no_scrub);
```

The change is _not_ correct in many cases (since current->domain might not be
the right domain), but it allows finding the needed configuration changes and
potential needs for increasing the sizes of per-domain heaps.

The commit:
https://gitlab.com/minervasys/public/xen/-/commit/c1d6baae27932d2a7f07e82560deae5f64c5536a
implements these changes.

Thanks,
Andrea



 

