[RFC] analysis of runtime xenheap memory allocations
Hi,

In the last months we have been working with Stefano's team at AMD on tools to facilitate the analysis of "anonymous" memory allocations performed at "runtime" from the shared Xen heap.

- Anonymous here means: allocations that draw from the common Xen heap pool (i.e., not from a per-domain heap pool).
- Runtime here means: system_state >= SYS_STATE_active.

TL;DR: A set of patches prints a warning stack when anonymous allocations are detected at runtime. Scripts help parse the resulting stack traces, which can then be checked to evaluate how problematic certain allocation paths are or how difficult it could be to update them. A full example of the results is in [1], and the README with the details is in [2]. More details and some commented stack traces follow below.

Feedback, especially on the paths that are currently marked as "UNDECIDED", is very welcome.

[1]: https://gitlab.com/minervasys/public/xen/-/blob/minerva/warn/minerva_analysis/1674833972/parsed/x86_64/xilinx-hyperlaunch-x86_64-gcc-debug-virtio-pci-net
[2]: https://gitlab.com/minervasys/public/xen/-/blob/minerva/warn/minerva_analysis/README.md

Here is the long version.

The work is a starting point to identify the paths leading to anonymous allocations, to decide whether they could be "problematic" (see below), and to simplify decisions on possible changes targeting such allocations.

Why
---
In contexts like those targeted by Xen for safety, we want to run VMs with different criticality side by side (in parallel). Xen provides services to both critical and non-critical VMs. Shared resources at the Xen level can become problematic when a low- or no-criticality VM can prevent higher-criticality VMs from receiving the requested service. This is also relevant in a security context, where we want to ensure that VMs cannot deplete resources shared with other VMs.

The shared Xen heap (xenheap) is one such shared resource: it is accessed on behalf of both potentially critical and non-critical VMs to serve various memory allocations. While for system_state < SYS_STATE_active we can statically define the amount of Xen heap that will be consumed by the system, at runtime this becomes blurry as it depends on the system's usage. This is problematic when a critical VM cannot receive its intended service due to xenheap memory depletion.

What
----
We want to identify those anonymous allocations, understand the paths that lead to them, and the allocated sizes. Also, many allocations are expected to be followed by a "free", and it is useful to keep track of how much memory is left allocated after a service completes.

To address this, we have been following multiple directions:
a) Runtime testing
b) Offline call path analysis
c) Draft of changes needed for a shared -> per-domain heap switch

Runtime testing
---------------
See: https://gitlab.com/minervasys/public/xen/-/blob/minerva/warn/minerva_analysis/README.md

The idea is to hook into the testing infrastructure of Xen and detect anonymous allocations at runtime. The system is composed of a set of patches added to `xen/common/page_alloc.c` and `xen/common/xmalloc_tlsf.c` that produce a warning stack at runtime when anonymous allocations are detected. In such cases, the warning also tries to record additional information, such as the domain currently running and the pointer to the allocated memory area.
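For illustration only, a minimal sketch of what such a detection hook might look like is below. The helper name and the message format are made up here; the actual instrumentation is in the patches on the branch referenced above.

```
/*
 * Illustrative sketch only (not the actual patches): a hypothetical helper
 * that the instrumented allocation paths in page_alloc.c / xmalloc_tlsf.c
 * could call when an allocation is not attributed to any domain.
 */
static void warn_anonymous_alloc(const void *ptr, unsigned long bytes)
{
    /* Only post-boot ("runtime") anonymous allocations are of interest. */
    if ( system_state < SYS_STATE_active )
        return;

    /* Record the allocated pointer/size and the domain currently running. */
    printk("MINERVA: anonymous allocation of %lu bytes at %p (current: d%d)\n",
           bytes, ptr, current->domain->domain_id);

    /* Dump the call stack so the log parser can reconstruct the path. */
    dump_execution_state();
}
```

The warning plus its stack trace is what the parsing script (below) later groups, counts, and, where possible, pairs with the matching free.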
Such information is later matched (if possible) against the corresponding frees: anonymous memory allocated via `alloc_domheap_pages` is matched against `free_domheap_pages`, while memory allocated via `xmalloc` is matched against `xfree`. Depending on the type of test, the amount of information discovered by the tool might differ.

The execution can be integrated in the Xen CI pipeline (e.g., https://gitlab.com/xen-project/people/sstabellini/xen/-/pipelines/1682205969). A script (https://gitlab.com/minervasys/public/xen/-/blob/minerva/warn/minerva_analysis/log_parser.py) parses the CI output and produces statistics and the stack traces leading to each allocation. These stack traces can be checked to evaluate how problematic certain allocation paths are or how difficult it could be to update them. The tool supports building a "database" of comments associated with a stack trace and can automatically (re)apply comments to known stack traces in further runs. (See "Automated Generation of Comments" in the README.)

An example of the processing can be seen here (for pipeline 1674833972):
https://gitlab.com/minervasys/public/xen/-/tree/minerva/warn/minerva_analysis/1674833972

There, the parsed files are already commented, for example:

```
// MINERVA: UNDECIDED
// Anonymous allocation increases per-CPU heap.
// Allocation not attributable to any domain.

Domain     : d0
Occurrences: 1
Total size : 128 B

return_to_new_vcpu64
leave_hypervisor_to_guest
traps.c#check_for_pcpu_work
do_softirq
softirq.c#__do_softirq
timer.c#timer_softirq_action
_xmalloc
_xmalloc
```

In this example, a direct `_xmalloc` is triggered from interrupt context (a timer softirq). No domain can be held accountable for such an allocation, and a different strategy to ensure/motivate the boundedness of such allocations should be planned.

```
// MINERVA: PROB_SAFE (under the assumption that hwdoms are safe)
// The path is already attributed to a domain pool, except if the domain is
// a hardware domain; in this case the allocation is anonymous (p2m_alloc_page).

alloc_domheap_pages()
entry.o#guest_sync_slowpath
do_trap_guest_sync
traps.c#do_trap_hypercall
do_memory_op
xenmem_add_to_physmap
xenmem_add_to_physmap_one
p2m_set_entry
p2m.c#__p2m_set_entry
p2m.c#p2m_next_level
p2m.c#p2m_alloc_page
alloc_domheap_pages
alloc_domheap_pages
```

In this example, the allocation is marked as "probably safe" since the anonymous path is taken only when the domain is a hardware domain, and such domains can (in the usage domains of interest) be allowed to perform allocations.

Offline Call Path Analysis
--------------------------
See: https://gitlab.com/minervasys/public/xen/-/blob/minerva/warn/minerva_static_analysis/README.md

This analysis complements the runtime discovery of anonymous allocations and uses compiler support to generate call graphs (using -fcallgraph-info). Two scripts support parsing and filtering the (large amount of) data generated: `callpath.py` generates the call paths leading to one selected function, and `filter.py` can filter the call paths based on whitelists/blacklists of functions that should be included/excluded.

For example, in the above branch we have used the scripts to generate and refine the paths leading to `alloc_domheap_pages()`. The offline analysis can complement the runtime one by reporting reachable paths that have not (yet) been exercised in testing scenarios.
Draft of changes for per-domain heap
------------------------------------
See: https://gitlab.com/minervasys/public/xen/-/tree/minerva/per-domain-xenheap-idea-approach

We have prototyped some of the changes that would be needed (especially in the configuration) if more of the allocations that are currently anonymous were served by a per-domain heap instead.

The main change in the prototype is to use the current domain as the target heap for all anonymous (i.e., domain == NULL) allocations triggered from alloc_xenheap_pages(). We also force the use of the newly modified alloc_xenheap_page() in alloc_xen_pagetable():

```
- pg = alloc_domheap_pages(NULL, order, memflags | MEMF_no_scrub);
+ if (system_state == SYS_STATE_active)
+     d = current->domain;
+
+ pg = alloc_domheap_pages(d, order, memflags | MEMF_no_scrub);
```

The change is _not_ correct in many cases (since current->domain might not be the right domain), but it allows finding the needed configuration changes and the potential need to increase the sizes of the per-domain heaps.

The commit
https://gitlab.com/minervasys/public/xen/-/commit/c1d6baae27932d2a7f07e82560deae5f64c5536a
implements these changes.

Thanks,
Andrea