[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH] arm: xen: foreign mapping PTEs are special.



On Sun, 2013-12-08 at 17:32 +0000, Stefano Stabellini wrote:
> On Fri, 6 Dec 2013, Ian Campbell wrote:
> > On Fri, 2013-12-06 at 17:48 +0000, Stefano Stabellini wrote:
> > > On Wed, 4 Dec 2013, Ian Campbell wrote:
> > > > These mappings are in fact special and require special handling in 
> > > > privcmd,
> > > > which already exists. Failure to mark the PTE as special on arm64 
> > > > causes all sorts of bad PTE fun.
> > > > 
> > > > x86 already gets this correct.
> > > 
> > > Yes, but x86 does that for PV guests, not for autotranslate guests (for
> > > which the function return -EINVAL).
> > > 
> > > Given that in the ARM case we are changing the p2m underneath, why do we
> > > also need to mark them special?
> > 
> > It's not about the p2m, it's about the handling in privcmd wrt setup and
> > teardown of the stage one mapping which goes along with the p2m
> > manipulations.
> > 
> > Without this the normal rmap counting kicks in and complains about the
> > mapcount being -1.
> 
> In the case of PV guests we need to mark the pte special because there
> is no struct page associated to it: the userspace virtual address is
> mapped to a foreign mfn. No dom0 pages are involved.
> 
> In this case we do have a corresponding physical address and a
> corresponding struct page in dom0.  In fact the dom0 page is a
> xenballooned_pages allocated by alloc_empty_pages.

special is not 100% the same as "no struct page". It means "not normal"
where normal also has other requirements, such as being normal user
process RAM established in the normal way (e.g. get_user_pages or
similar), which privcmd does not do.

> What exactly is the kernel complaining about?

There are a few different symptoms, but frequently it is hitting a bad
pte path on teardown. This is caused by vm_normal_page returning true
when it should be false (where true and false are actually non-NULL and
NULL).

e.g.:

BUG: Bad page map in process xl  pte:e0004077b33f53 pmd:4079575003
page:ffffffbce1a2f328 count:1 mapcount:-1 mapping:          (null) index:0x0
page flags: 0x4000000000000014(referenced|dirty)
addr:0000007fb5259000 vm_flags:040644fa anon_vma:          (null) 
mapping:ffffffc03a6fda58 index:0
vma->vm_ops->fault: privcmd_fault+0x0/0x38
vma->vm_file->f_op->mmap: privcmd_mmap+0x0/0x2c
CPU: 0 PID: 2657 Comm: xl Not tainted 3.12.0+ #102
Call trace:  
[<ffffffc0000880f8>] dump_backtrace+0x0/0x12c
[<ffffffc000088238>] show_stack+0x14/0x1c
[<ffffffc0004b67e0>] dump_stack+0x70/0x90
[<ffffffc000125690>] print_bad_pte+0x12c/0x1bc
[<ffffffc0001268f4>] unmap_single_vma+0x4cc/0x700
[<ffffffc0001273b4>] unmap_vmas+0x68/0xb4
[<ffffffc00012c050>] unmap_region+0xcc/0x1d4
[<ffffffc00012df20>] do_munmap+0x218/0x314
[<ffffffc00012e060>] vm_munmap+0x44/0x64
[<ffffffc00012ed78>] SyS_munmap+0x24/0x34

Or:

BUG: Bad page state in process xl  pfn:4077b4d
page:ffffffbce1a2f8d8 count:0 mapcount:-1 mapping:          (null) index:0x0
page flags: 0x4000000000000014(referenced|dirty)
Modules linked in:
CPU: 0 PID: 2657 Comm: xl Tainted: G    B        3.12.0+ #102
Call trace:
[<ffffffc0000880f8>] dump_backtrace+0x0/0x12c
[<ffffffc000088238>] show_stack+0x14/0x1c
[<ffffffc0004b67e0>] dump_stack+0x70/0x90
[<ffffffc00010f798>] bad_page+0xc4/0x110
[<ffffffc00010f8b4>] free_pages_prepare+0xd0/0xd8
[<ffffffc000110e94>] free_hot_cold_page+0x28/0x178
[<ffffffc000111460>] free_hot_cold_page_list+0x38/0x60
[<ffffffc000114cf0>] release_pages+0x190/0x1dc
[<ffffffc00012c0e0>] unmap_region+0x15c/0x1d4
[<ffffffc00012df20>] do_munmap+0x218/0x314
[<ffffffc00012e060>] vm_munmap+0x44/0x64
[<ffffffc00012ed78>] SyS_munmap+0x24/0x34

> Didn't we allocate the pages correctly?

It depends what you mean by correctly, we've done everything right for a
special page, except telling the kernel (via the pte) that the page is
special.

> Didn't we call get_page appropriately?

The count which is wrong is not page->count it is page->_mapcount, which
is used by the rmap stuff AFAIK.

> There must be a way to make the mapcount happy without marking the pages
> as special.

I'm sure there is, but it would be semantically incorrect, these pages
are not normal process memory, even though they do happen to have a
struct page. I really don't think making these pages appear normal is
what we want. We don't want them being considered page cache, available
to be swapped or any of that "normal" stuff.

> Also consider that on 3.13-rc2 pte_mkspecial on arm is implemented as:
> 
> static inline pte_t pte_mkspecial(pte_t pte) { return pte; }

This is because 32-bit arm does not have a spare PTE bit to use as
PTE_SPECIAL, end therefore does not define __HAVE_ARCH_PTE_SPECIAL. It
takes the slow/fallback path in vm_normal, which works fine.

arm64 does have such a spare bit, so vm_normal takes the fast path but
reaches the wrong conclusion because we didn't set the special bit like
we should.

Ian.


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.