Re: Pinned, non-revocable mappings of VRAM: will bad things happen?
On 4/21/26 12:55, Val Packett wrote:
>
> On 4/20/26 4:12 PM, Demi Marie Obenour wrote:
>> On 4/20/26 14:53, Christian König wrote:
>>> On 4/20/26 20:46, Demi Marie Obenour wrote:
>>>> On 4/20/26 13:58, Christian König wrote:
>>>>> On 4/20/26 19:03, Demi Marie Obenour wrote:
>>>>>> On 4/20/26 04:49, Christian König wrote:
>>>>>>> On 4/17/26 21:35, Demi Marie Obenour wrote:
>>>>> ...
>>>>>>>> Are any of the following reasonable options?
>>>>>>>>
>>>>>>>> 1. Change the guest kernel to only map (and thus pin) a small subset
>>>>>>>>    of VRAM at any given time. If unmapped VRAM is accessed, the guest
>>>>>>>>    traps the page fault, evicts an old VRAM mapping, and creates a
>>>>>>>>    new one.
>>>>>>> Yeah, that could potentially work.
>>>>>>>
>>>>>>> This is basically what we do in the host kernel driver when we can't
>>>>>>> resize the BAR for some reason. In that use case, VRAM buffers are
>>>>>>> shuffled in and out of the CPU-accessible window of VRAM on demand.
>>>>>> How much is this going to hurt performance?
>>>>> Hard to say; resizing the BAR can easily give you 10-15% more
>>>>> performance in some use cases.
>>>>>
>>>>> But that involves physically transferring the data using a DMA. For
>>>>> this solution we basically only have to transfer a few messages
>>>>> between host and guest.
>>>>>
>>>>> No idea how performant that is.
>>>> In this use case, 20-30% performance penalties are likely to be
>>>> "business as usual".
>>> Well, that is quite a bit.
>>>
>>>> Close to native performance would be ideal, but to be useful it just
>>>> needs to beat software rendering by a wide margin, and not cause data
>>>> corruption or vulnerabilities.
>>> That should still easily be the case; even trivial use cases are
>>> multiple orders of magnitude faster on GPUs compared to software
>>> rendering.
>> Makes sense. If only GPUs supported easy and flexible virtualization the
>> way CPUs do :(.
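[The windowed-VRAM scheme in option 1 above amounts to an LRU cache of CPU mappings: a fixed number of slots, a "fault" on any access to an unmapped page, and eviction of the least-recently-used mapping. A toy Python model for illustration only; the class and all names are invented here, not part of any real driver:]

```python
from collections import OrderedDict

class VramWindow:
    """Toy model of a guest that keeps only a few VRAM pages mapped.

    Accessing an unmapped page "faults": the least-recently-used
    mapping is evicted and the new page is mapped in its place.
    """

    def __init__(self, max_mappings):
        self.max_mappings = max_mappings
        self.mapped = OrderedDict()  # page -> True, ordered by last use
        self.faults = 0
        self.evictions = 0

    def access(self, page):
        if page in self.mapped:
            self.mapped.move_to_end(page)  # hit: refresh LRU position
            return
        self.faults += 1                   # miss: would trap in the guest
        if len(self.mapped) >= self.max_mappings:
            self.mapped.popitem(last=False)  # evict the LRU mapping
            self.evictions += 1
        self.mapped[page] = True
```

[With a 2-slot window, the access pattern 0, 1, 0, 2, 1 produces four faults and two evictions; only the fault path costs host/guest messages, which is why the hit rate of the window determines the overhead.]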
>>
>>>>>>> But I have one question: when Xen has a problem handling faults from
>>>>>>> the guest on the host, then how does that work for system memory
>>>>>>> mappings?
>>>>>>>
>>>>>>> There is really no difference between VRAM and system memory in the
>>>>>>> handling for the GPU driver stack.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Christian.
>>>>>> Generally, Xen makes the frontend (usually an unprivileged VM)
>>>>>> responsible for providing mappings to the backend (usually the host).
>>>>>> That is possible with system RAM but not with VRAM, because Xen has
>>>>>> no awareness of VRAM. To Xen, VRAM is just a PCI BAR.
>>>>> No, that doesn't work with system memory allocations of GPU drivers
>>>>> either.
>>>>>
>>>>> We have already had it happen multiple times that people tried to be
>>>>> clever, incremented the page reference counter on driver-allocated
>>>>> system memory, and were totally surprised that this can result in
>>>>> security issues and data corruption.
>>>>>
>>>>> I seriously hope that this isn't the case here again. As far as I
>>>>> know, Xen already has support for accessing VMAs with VM_PFN;
>>>>> otherwise I don't know how driver-allocated system memory access
>>>>> could possibly work.
>>>>>
>>>>> Accessing VRAM is pretty much the same use case as far as I can see.
>>>>>
>>>>> Regards,
>>>>> Christian.
>>>> The Xen-native approach would be for system memory allocations to be
>>>> made using the Xen driver and then imported into the virtio-GPU driver
>>>> via dmabuf. Is there any chance this could be made to happen?
>>> That could be. Adding Pierre-Eric to comment, since he knows that use
>>> case much better than I do.
>>>
>>>> If it's a lost cause, then how much is the memory overhead of pinning
>>>> everything ever used in a dmabuf? It should be possible to account
>>>> pinned host memory against a guest's quota, but if that leads to an
>>>> unusable system it isn't going to be good.
>>> That won't work at all.
>>>
>>> We have use cases where you *must* migrate a DMA-buf to VRAM, or
>>> otherwise the GPU can't use it.
>>>
>>> A simple scanout to a monitor is such a use case, for example; that is
>>> usually not possible from system memory.
>> Direct scanout isn't a concern here.
>>
>>>> Is supporting page faults in Xen the only solution that will be viable
>>>> long-term, considering the tolerance for very substantial performance
>>>> overheads compared to native? AAA gaming isn't the initial goal here.
>>>> Qubes OS already supports PCI passthrough for that.
>>> We have had AAA gaming working on Xen through native context for quite
>>> a while.
>>>
>>> Pierre-Eric can tell you more about that.
>>>
>>> Regards,
>>> Christian.
>> I've heard of that, but last I checked it required downstream patches to
>> Xen, Linux, and QEMU. I don't know if any of those have been upstreamed
>> since, but I believe that upstreaming the Xen and Linux patches (or
>> rewriting them and upstreaming the rewritten versions) would be
>> necessary. Qubes OS (which I don't work for anymore but still want to
>> help with this) almost certainly won't be using QEMU for GPU stuff.
>
> Yeah, our plan is to use xen-vhost-frontend[1] + vhost-device-gpu,
> ported/extended/modified as necessary. (I already have xen-vhost-frontend
> itself working on amd64 PVH with purely xenbus-based
> hotplug/configuration, and I am currently working on cleaning up and
> submitting the necessary patches.)
>
> I'm curious to hear more details about how AMD has it working, but last
> time I checked there weren't any missing pieces in Xen or Linux that
> we'd need. The AMD downstream changes were mostly related to QEMU.
>
> As for the memory management concerns, I would like to remind everyone
> once again that the pinning of GPU dmabufs in regular graphics workloads
> would be *very* short-term.
> In GPU paravirtualization (native contexts
> or venus or whatever else) the guest mostly operates on *opaque handles*
> that refer to buffers owned by the host GPU process. The typical
> rendering process (roughly) only involves submitting commands to the GPU
> that refer to memory using these handles. Only upon mmap() would a
> buffer be pinned/granted to the guest, and those mappings are typically
> only used for *uploads*, where the guest immediately does its memcpy()
> and unmaps the buffer.
>
> So I'm not worried about (unintentionally) pinning too much GPU driver
> memory.
>
> In terms of deliberate denial-of-service attacks from the guest on the
> host, the only reasonable response is:
>
> ¯\_(ツ)_/¯
>
> CPU-mapping lots of GPU memory is far from the only DoS vector; the GPU
> commands themselves can easily wedge the GPU core in a million ways (and
> last time I checked, amdgpu was noooot so good at recovering from hangs).
>
> [1]: https://github.com/vireshk/xen-vhost-frontend
>
> ~val

I think it is best to handle things like GPU crashes by giving the guest
some time to unmap its grants and, if that fails, crashing it. This
should be done from a revoke callback, as afterwards the VRAM might get
reused. Does amdgpu call revoke callbacks when the device is reset and
VRAM is lost? It seems like it at least ought to.

As an aside, Qubes needs to use the process isolation mode of the amdgpu
driver. This means that only one process will be on the GPU at a time,
so it _should_ be possible to blow away all GPU-resident state except
VRAM without affecting other processes. Unfortunately, I think AMD GPUs
might have HW or FW limitations that prevent that, at least on dGPUs.

It might make sense to recommend KDE with GPU acceleration. KWin can
recover from losing VRAM.

--
Sincerely,
Demi Marie Obenour (she/her/hers)
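[The recovery policy proposed above — on device reset, revoke every grant, give the guest a grace period to unmap, and crash it if it doesn't comply — can be modelled as a small state machine. A toy Python sketch; every name here is invented for illustration and is not amdgpu, Xen, or xen-vhost-frontend API. The grace period is simplified to a single synchronous revoke callback:]

```python
class Guest:
    """Toy guest holding grants on host VRAM buffers."""

    def __init__(self, name, cooperative=True):
        self.name = name
        self.cooperative = cooperative  # does it honor revoke requests?
        self.grants = set()
        self.crashed = False

    def on_revoke(self, buffer):
        # A well-behaved guest unmaps the grant when asked.
        if self.cooperative:
            self.grants.discard(buffer)


def handle_vram_loss(guests, lost_buffers):
    """On device reset, revoke every grant; crash guests that keep any.

    A real implementation would give each guest a timeout to respond
    before the VRAM is reused; here "missed the deadline" is modelled
    simply as still holding a grant after on_revoke() returns.
    """
    for guest in guests:
        for buf in lost_buffers:
            if buf in guest.grants:
                guest.on_revoke(buf)
        if guest.grants & set(lost_buffers):
            guest.crashed = True  # grant still held: crash the guest
    return [g.name for g in guests if g.crashed]
```

[The key property this models is that the revoke must complete before the lost VRAM is handed out again: a cooperative guest survives with its grants cleared, while one that ignores the callback is terminated rather than left with a mapping of reused memory.]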