[RFC PATCH v2] xen: privcmd: fix ioeventfd crash under PV domain



Starting a virtio backend in a PV domain panics the kernel in
alloc_ioreq(), which tries to dereference vma->vm_private_data as a
pages array when on PV it actually still holds PRIV_VMA_LOCKED.

Avoid the crash by handling the PRIV_VMA_LOCKED case in alloc_ioreq().
PV support requires mapping the virtio ioreq page explicitly into
the kernel's page tables, so do that on demand when PRIV_VMA_LOCKED
is seen.

Signed-off-by: Val Packett <val@xxxxxxxxxxxxxxxxxxxxxx>
---
Changes from RFC v1[1]:
* An actually working patch now, not just a request for help :)
  All I needed to know was that PV doesn't emulate a common physical
  address space *at all*: the pfn mapped by xen_remap_domain_mfn_array
  was *only* available in the userspace process, and the *only* way to
  get kernel access to the same memory was to perform the same call
  against the kernel's mm (see the short sketch after this list).
* Everything now happens lazily / on-demand inside the ioeventfd code,
  addressing the review concern about extra work being performed for
  non-ioeventfd usage. (The kmalloc specifically is gone entirely.)
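
For anyone hitting the same wall, the pre-patch failure boils down to
the condensed sketch below (an illustration only, not the patch itself;
PRIV_VMA_LOCKED really is defined this way in privcmd.c):

  /* privcmd stores a sentinel, not a pages array, for PV
   * (non-auto-translated) mappings: */
  #define PRIV_VMA_LOCKED ((void *)1)

  /* pre-patch, alloc_ioreq() dereferenced it unconditionally: */
  pages = vma->vm_private_data;             /* == (void *)1 on PV */
  kioreq->ioreq = (struct ioreq *)(page_to_virt(pages[0]));  /* faults */

On PV the only way out is to repeat the remap against init_mm, which is
what the PRIV_VMA_LOCKED branch in the patch below does.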

I'm leaving this as RFC mostly because of the "fake" vm_area_struct used
as a workaround for xen_remap_domain_mfn_array (or rather, its underlying
xen_remap_pfn) accepting a vm_area_struct only to take its vm_mm (and to
check its flags with a BUG assertion). It was written for userspace
processes, since that was the only use case anyone could imagine... until
ioeventfd came along.

So it would probably be better to change xen_remap_pfn to take the mm
directly? Maybe? I wanted to ask for advice first before trying to
refactor other code.
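
For concreteness, one possible shape (an entirely untested, hypothetical
sketch, not something in the tree):

  /* take the mm directly so kernel callers don't have to fabricate
   * a vma just to carry init_mm; the VM_PFNMAP | VM_IO BUG_ON would
   * move out into the vma-based wrapper */
  int xen_remap_pfn(struct mm_struct *mm, unsigned long addr,
                    xen_pfn_t *pfn, int nr, int *err_ptr, pgprot_t prot,
                    unsigned int domid, bool no_translate);

That would let alloc_ioreq() pass &init_mm and drop the fake kern_vma
used in the patch below.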

Thanks,
~val

[1]: https://lore.kernel.org/all/20251015195713.6500-1-val@xxxxxxxxxxxxxxxxxxxxxx/
---
 drivers/xen/privcmd.c | 34 ++++++++++++++++++++++++++++++++--
 1 file changed, 32 insertions(+), 2 deletions(-)

diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
index f52a457b302d..a3ad10f149ec 100644
--- a/drivers/xen/privcmd.c
+++ b/drivers/xen/privcmd.c
@@ -818,6 +818,8 @@ static long privcmd_ioctl_mmap_resource(struct file *file,
                        DOMID_SELF : kdata.dom;
                int num, *errs = (int *)pfns;
 
+               vma->vm_pgoff = pfns[0]; /* store the acquired pfn for ioeventfd access */
+
                BUILD_BUG_ON(sizeof(*errs) > sizeof(*pfns));
                num = xen_remap_domain_mfn_array(vma,
                                                 kdata.addr & PAGE_MASK,
@@ -1248,10 +1250,38 @@ struct privcmd_kernel_ioreq *alloc_ioreq(struct privcmd_ioeventfd *ioeventfd)
                goto error_kfree;
        }
 
-       pages = vma->vm_private_data;
-       kioreq->ioreq = (struct ioreq *)(page_to_virt(pages[0]));
        mmap_write_unlock(mm);
 
+       /* In a PV domain, we must manually map the pages into the kernel */
+       if (vma->vm_private_data == PRIV_VMA_LOCKED) {
+               /* This should never ever happen outside of PV */
+               if (WARN_ON_ONCE(!xen_pv_domain())) {
+                       ret = -EINVAL;
+                       goto error_kfree;
+               }
+
+               /* xen_remap_domain_mfn_array only really needs the mm */
+               struct vm_area_struct kern_vma = {
+                       .vm_flags = VM_PFNMAP | VM_IO,
+                       .vm_mm = &init_mm,
+               };
+               xen_pfn_t pfn = vma->vm_pgoff;
+               int num, err;
+
+               /* Don't provide NULL as the errors array as that results in pfn increment */
+               num = xen_remap_domain_mfn_array(&kern_vma, (unsigned long)pfn_to_kaddr(pfn),
+                                               &pfn, 1, &err, PAGE_KERNEL, ioeventfd->dom);
+               if (num < 0) {
+                       ret = num;
+                       goto error_kfree;
+               }
+
+               kioreq->ioreq = (struct ioreq *)(pfn_to_kaddr(pfn));
+       } else {
+               pages = vma->vm_private_data;
+               kioreq->ioreq = (struct ioreq *)(page_to_virt(pages[0]));
+       }
+
        ports = memdup_array_user(u64_to_user_ptr(ioeventfd->ports),
                                  kioreq->vcpus, sizeof(*ports));
        if (IS_ERR(ports)) {
-- 
2.51.0