
[RFC] analysis of runtime xenheap memory allocations


  • To: xen-devel@xxxxxxxxxxxxxxxxxxxx
  • From: Andrea Bastoni <andrea.bastoni@xxxxxxxxxxxxxxx>
  • Date: Mon, 17 Mar 2025 19:48:59 +0100
  • Cc: Stefano Stabellini <stefano.stabellini@xxxxxxx>, Ayan Kumar Halder <ayan.kumar.halder@xxxxxxx>, Michal Orzel <michal.orzel@xxxxxxx>, VictorM.Lira@xxxxxxx, Carlo Nonato <carlo.nonato@xxxxxxxxxxxxxxx>, Alex Zuepke <alex.zuepke@xxxxxxxxxxxxxxx>, Julien Grall <julien@xxxxxxx>, Bertrand Marquis <Bertrand.Marquis@xxxxxxx>, jgross@xxxxxxxx, Jan Beulich <jbeulich@xxxxxxxx>, Anthony PERARD <anthony.perard@xxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
  • Delivery-date: Mon, 17 Mar 2025 18:49:11 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

Hi,

Over the last months we have been working with Stefano's team at AMD on tools
to facilitate the analysis of "anonymous" memory allocations performed at
"runtime" from the shared Xen heap.

- Anonymous here means: allocations that draw on the common Xen heap pool
  (i.e., not on a per-domain heap pool).
- Runtime here means: system_state >= SYS_STATE_active.
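
To make the distinction concrete, here is a minimal sketch of the two flavors
as they appear at a call site. The allocator signatures are the existing Xen
ones; the surrounding function is hypothetical:

```
#include <xen/mm.h>
#include <xen/xmalloc.h>
#include <xen/cache.h>

/* Hypothetical caller, for illustration only. */
static void example_allocations(struct domain *d)
{
    /* Per-domain: the pages are accounted against d's heap pool. */
    struct page_info *pg_dom = alloc_domheap_pages(d, 0, 0);

    /* "Anonymous": d == NULL draws on the common Xen heap pool. */
    struct page_info *pg_anon = alloc_domheap_pages(NULL, 0, 0);

    /* xmalloc()/xfree() are likewise anonymous (xenheap-backed). */
    void *buf = _xmalloc(128, SMP_CACHE_BYTES);

    xfree(buf);
    if ( pg_anon )
        free_domheap_pages(pg_anon, 0);
    if ( pg_dom )
        free_domheap_pages(pg_dom, 0);
}
```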

TL;DR: A set of patches prints a warning stack trace when anonymous
allocations are detected at runtime. Scripts help parse the stack traces,
which can then be reviewed to evaluate how problematic certain allocation
paths are and how difficult it would be to update them. A full example of the
results is in [1], and the README with the details is in [2]. More details and
some commented stack traces follow below.

Feedback, especially on the paths that are currently marked as "UNDECIDED"
is very welcome.

[1]: 
https://gitlab.com/minervasys/public/xen/-/blob/minerva/warn/minerva_analysis/1674833972/parsed/x86_64/xilinx-hyperlaunch-x86_64-gcc-debug-virtio-pci-net
[2]: 
https://gitlab.com/minervasys/public/xen/-/blob/minerva/warn/minerva_analysis/README.md


Here is the long version.

This work is a starting point to identify the paths leading to anonymous
allocations, determine whether they could be "problematic" (see below), and
simplify decisions on possible changes targeting such allocations.

Why
---
In contexts like those targeted by Xen for safety, we want to run VMs of
different criticality side-by-side (in parallel).
Xen provides services to both critical and non-critical VMs. Shared resources
at the Xen level may become problematic when a low- or no-criticality VM can
prevent higher-criticality VMs from receiving the requested service.

This is also relevant in a security context, where we want to ensure
that VMs cannot deplete resources shared with other VMs.

The shared Xen heap (xenheap) is one such shared resource: it is accessed on
behalf of both potentially critical and non-critical VMs to serve various
memory allocations. While for system_state < SYS_STATE_active we can
statically determine the amount of Xen heap the system will consume, at
runtime this becomes blurry, as it depends on how the system is used. This is
problematic when a critical VM cannot receive its intended service due to
xenheap memory depletion.

What
----
We want to identify these anonymous allocations, understand the paths that
lead to them, and the sizes they allocate. Also, many allocations are expected
to be matched by a "free", and it is useful to keep track of how much memory
is left allocated after a service completes.

To address this, we have been following multiple directions:
a) Runtime testing
b) Offline call path analysis
c) Draft of changes needed for a shared -> per-domain heap switch

Runtime testing
---------------
See: 
https://gitlab.com/minervasys/public/xen/-/blob/minerva/warn/minerva_analysis/README.md

The idea is to hook into the Xen testing infrastructure and detect anonymous
allocations at runtime.

The system is composed of a set of patches to `xen/common/page_alloc.c` and
`xen/common/xmalloc_tlsf.c` that produce a warning stack trace at runtime when
anonymous allocations are detected.
In such cases, the warning also tries to gather additional information, such
as the currently running domain and the pointer to the allocated memory area.
Such information is later matched (if possible) against corresponding frees.
Specifically, if possible, anonymous memory allocated via `alloc_domheap_pages`
is matched against `free_domheap_pages`, while memory allocated via `xmalloc` is
matched against `xfree`.
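
The actual patches are in the branch above; as a rough sketch of the idea, the
hook amounts to something like the following (the function name is
hypothetical; system_state, current and dump_execution_state() are existing
Xen facilities):

```
#include <xen/lib.h>
#include <xen/sched.h>

/*
 * Hypothetical hook, called from the alloc_domheap_pages() and
 * _xmalloc() paths; the real patches are in the minerva/warn branch.
 */
static void warn_anonymous_alloc(const struct domain *d, const void *ptr,
                                 unsigned long size)
{
    /* Boot-time or properly attributed allocations: nothing to report. */
    if ( system_state < SYS_STATE_active || d != NULL )
        return;

    /*
     * Record the running domain and the pointer, so the log parser can
     * later match this allocation against the corresponding free.
     */
    printk("MINERVA: anonymous alloc %p size %lu (current: d%d)\n",
           ptr, size, current->domain->domain_id);

    dump_execution_state(); /* emits the warning stack trace */
}
```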

Depending on the type of test, the amount of information discovered by the tool
might differ.

The execution can be integrated into the Xen CI pipeline (e.g.,
https://gitlab.com/xen-project/people/sstabellini/xen/-/pipelines/1682205969).

A script
(https://gitlab.com/minervasys/public/xen/-/blob/minerva/warn/minerva_analysis/log_parser.py)
parses the CI output and produces statistics and the stack traces leading to
each allocation. Those stack traces can be reviewed to evaluate how
problematic certain allocation paths are and how difficult it would be to
update them.

The tool supports building a "database" of comments associated with a stack
trace and can automatically (re)apply comments to known stack traces in
subsequent runs. (See "Automated Generation of Comments" in the README.)

An example of the processing can be seen here (for pipeline 1674833972):
https://gitlab.com/minervasys/public/xen/-/tree/minerva/warn/minerva_analysis/1674833972
Here, the parsed files are already commented, for example:

```
// MINERVA: UNDECIDED
// Anonymous allocation increases per-CPU heap.
// Allocation not attributable to any domain.
Domain     : d0
Occurrences: 1
Total size : 128 B
return_to_new_vcpu64
  leave_hypervisor_to_guest
    traps.c#check_for_pcpu_work
      do_softirq
        softirq.c#__do_softirq
          timer.c#timer_softirq_action
            _xmalloc
              _xmalloc
```
In this example, a direct `_xmalloc` is triggered from an ISR.
No domain can be held accountable for such an allocation, and a different
strategy to ensure/justify the boundedness of such allocations should be
planned.
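
For context, this trace corresponds to the per-CPU timer heap being grown on
demand from softirq context. A simplified, hypothetical sketch of that pattern
(the function below is illustrative, not the actual timer.c code):

```
#include <xen/timer.h>
#include <xen/xmalloc.h>

/* Illustrative sketch: grow a per-CPU array from softirq context. */
static struct timer **grow_timer_heap(struct timer **heap,
                                      unsigned int new_limit)
{
    /*
     * No domain is accountable here: the softirq runs on behalf of the
     * whole system, so xmalloc_array() draws anonymously on the shared
     * xenheap.
     */
    struct timer **newheap = xmalloc_array(struct timer *, new_limit + 1);

    if ( newheap == NULL )
        return heap; /* keep the old, smaller heap on failure */

    /* (Copying of the old entries and xfree() of the old array omitted.) */
    return newheap;
}
```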

```
// MINERVA: PROB_SAFE (under the assumption that hwdom are safe)
// the path is already attributed to a domain pool, except if the domain is
// a hardware domain, in this case the allocation is anonymous (p2m_alloc_page).
alloc_domheap_pages()
entry.o#guest_sync_slowpath
  do_trap_guest_sync
    traps.c#do_trap_hypercall
      do_memory_op
        xenmem_add_to_physmap
          xenmem_add_to_physmap_one
            p2m_set_entry
              p2m.c#__p2m_set_entry
                p2m.c#p2m_next_level
                  p2m.c#p2m_alloc_page
                    alloc_domheap_pages
                      alloc_domheap_pages
```
In this example, the allocation is marked as "probably safe", since the path
is taken only when the domain is a hardware domain.
Such domains can (in the usage scenarios of interest) be trusted to perform
allocations.
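
A hypothetical sketch of the branch described above (the pool helper is made
up; the real logic lives in p2m.c#p2m_alloc_page):

```
#include <xen/mm.h>
#include <xen/sched.h>

/* Hypothetical: stands in for the per-domain paging-pool logic. */
static struct page_info *take_page_from_p2m_pool(struct domain *d);

/* Illustrative sketch of the hardware-domain special case. */
static struct page_info *example_p2m_alloc_page(struct domain *d)
{
    /*
     * Hardware domain: the page is an anonymous xenheap-backed
     * allocation, not accounted against any per-domain pool.
     */
    if ( is_hardware_domain(d) )
        return alloc_domheap_pages(NULL, 0, 0);

    /* All other domains: the page comes out of a pre-sized P2M pool. */
    return take_page_from_p2m_pool(d);
}
```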


Offline Call Path Analysis
--------------------------
See: 
https://gitlab.com/minervasys/public/xen/-/blob/minerva/warn/minerva_static_analysis/README.md

This analysis complements the runtime discovery of anonymous allocations; it
uses compiler support to generate call graphs (via -fcallgraph-info).

Two scripts support parsing and filtering the (large amount of) data generated.

`callpath.py` generates the call paths leading to one selected function.
`filter.py` can filter the call paths based on a whitelist/blacklist of
functions that should be included/excluded.

For example, in the above branch we have used the scripts to generate and
refine the paths leading to `alloc_domheap_pages()`.

The offline analysis can complement the runtime one by reporting reachable
paths that have not (yet) been exercised in the testing scenarios.


Draft of changes for per-domain heap
------------------------------------
See: 
https://gitlab.com/minervasys/public/xen/-/tree/minerva/per-domain-xenheap-idea-approach

We have prototyped some of the changes that would be needed (especially in the
configuration) if more of the allocations that are currently anonymous were
served from a per-domain heap instead.

The main change in the prototype is to enforce the current domain as the
target heap for all anonymous (i.e., domain == NULL) allocations triggered
from alloc_xenheap_pages().
We also force the use of the newly modified alloc_xenheap_page() in
alloc_xen_pagetable():

```
-    pg = alloc_domheap_pages(NULL, order, memflags | MEMF_no_scrub);
+    struct domain *d = NULL; /* stays anonymous before runtime */
+
+    if (system_state == SYS_STATE_active)
+        d = current->domain;
+
+    pg = alloc_domheap_pages(d, order, memflags | MEMF_no_scrub);
```

The change is _not_ correct in many cases (since current->domain might not be
the right domain), but it allows finding the needed configuration changes and
potential needs for increasing the sizes of per-domain heaps.

The commit:
https://gitlab.com/minervasys/public/xen/-/commit/c1d6baae27932d2a7f07e82560deae5f64c5536a
implements these changes.

Thanks,
Andrea



 

