[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] Why does xc_map_foreign_range() refuse to map pfns below 1M from a domU



On Wed, Dec 04, 2013 at 11:36:33AM +0000, Jan Beulich wrote:
> >>> On 04.12.13 at 12:23, Tomasz Wroblewski <tomasz.wroblewski@xxxxxxxxxx> 
> >>> wrote:
> > On 12/04/2013 12:04 PM, Ian Campbell wrote:
> >> On Wed, 2013-12-04 at 10:54 +0000, Jan Beulich wrote:
> >>>>>> On 04.12.13 at 11:45, Ian Campbell <Ian.Campbell@xxxxxxxxxx> wrote:
> >>>> Correct. The check for mapping domain 0's 1:1 map is overly broad I
> >>>> think, and erroneously prevents a domU from mapping a foreign PFN < 1M.
> >>>
> >>> But that's the source of my not understanding: xen_make_pte()
> >>> derives addr from the passed in pte, and that pte can - for a
> >>> foreign domain's page - hardly hold a PFN. Otherwise how would
> >>> the translation to MFN be supposed to happen? Yet, if it's a
> >>> machine address that's coming in, it can't point into the low 1Mb.
> >>
> >> Isn't it a foreign gpfn at this point, which for an HVM guest is
> >> actually a PFN not an MFN?
> >>
> >> You are making me think I might be talking out my a**e though, because
> >> what is a foreign mapping even doing in xen_make_pte -- those need to be
> >> instantiated in a special way.
> >>
> > I believe the callpath for this is
> > 
> > xen_remap_domain_range() (mmu.c)
> >   |
> >   v
> > remap_area_pfn_pte() (mmu.c)
> >   |
> >   v
> > pfn_pte() (somewhere, one of the pgtable.h hdrs)
> >   |
> >   v
> > __pte() (paravirt.h)
> >   |
> >   v
> > xen_make_pte (mmu.c) via pv_mmu_ops.make_pte
> > 
> > Sorry, can't offer much insight as to why addr in pte holds the hvm's PFN, 
> > but it seems the case.
> 
> But that's a fundamental thing to explain. As Ian says - foreign PFNs
> shouldn't make it here, or else how do you know how to translate
> them to MFNs (as you can't consult the local P2M table to do so)?

This is all done via the toolstack which does the /dev/xen ioctl to map
some of its user-space memory in the guest memory. It ends up getting
the MFNs via some hypercall (forgotten which) and inputs those in the
IOCTL_PRIVCMD_MMAP ioctl. That function ends up calling remap with
_PAGE_IOMAP (well actually VM_IO) so that the xen_make_pte will ignore
the P2M and use that specific MFN value.

It is kind of nasty. I was hoping we could remove the _PAGE_IOMAP usage
out - but this is the last bastion where it is used.

The check that the xen_make_pte for the VM_IO for 1:1 pages is not
really needed anymore - as we have the 1:1 pages in the P2M (except for
the InfiniBand MMIO regions which are at 60TB and the P2M doesn't reach
there - but that is different bug).

So the check there could actually be lessen - and we can piggyback on
the _PTE_SPECIAL. Hm, and only keep the _PAGE_IOMAP check in the
xen_pte_val - which we would only be set by xen_make_pte iff P2M says
the page is 1:1.


Not compile tested:

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index ce563be..98efb65 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -409,7 +409,8 @@ static pteval_t pte_pfn_to_mfn(pteval_t val)
                        if (mfn & IDENTITY_FRAME_BIT) {
                                mfn &= ~IDENTITY_FRAME_BIT;
                                flags |= _PAGE_IOMAP;
-                       }
+                       } else
+                               flags &= _PAGE_IOMAP;
                }
                val = ((pteval_t)mfn << PAGE_SHIFT) | flags;
        }
@@ -441,7 +442,7 @@ static pteval_t xen_pte_val(pte_t pte)
                pteval = (pteval & ~_PAGE_PAT) | _PAGE_PWT;
        }
 #endif
-       if (xen_initial_domain() && (pteval & _PAGE_IOMAP))
+       if (pteval & _PAGE_IOMAP) /* Set by xen_make_pte for 1:1 PFNs. */
                return pteval;
 
        return pte_mfn_to_pfn(pteval);
@@ -498,17 +499,14 @@ static pte_t xen_make_pte(pteval_t pte)
 #endif
        /*
         * Unprivileged domains are allowed to do IOMAPpings for
-        * PCI passthrough, but not map ISA space.  The ISA
-        * mappings are just dummy local mappings to keep other
-        * parts of the kernel happy.
+        * PCI passthrough. _PAGE_SPECIAL is done when user-space uses
+        * IOCTL_PRIVCMD_MMAP and gives us the MFNs. The _PAGE_IOMAP
+        * is supplied to use by xen_set_fixmap.
         */
-       if (unlikely(pte & _PAGE_IOMAP) &&
-           (xen_initial_domain() || addr >= ISA_END_ADDRESS)) {
+       if (unlikely(pte & _PAGE_SPECIAL | _PAGE_IOMAP))
                pte = iomap_pte(pte);
-       } else {
-               pte &= ~_PAGE_IOMAP;
+       else
                pte = pte_pfn_to_mfn(pte);
-       }
 
        return native_make_pte(pte);
 }


> Jan
> 
> 

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.