[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH] arm: xen: foreign mapping PTEs are special.
On Sun, 2013-12-08 at 17:32 +0000, Stefano Stabellini wrote: > On Fri, 6 Dec 2013, Ian Campbell wrote: > > On Fri, 2013-12-06 at 17:48 +0000, Stefano Stabellini wrote: > > > On Wed, 4 Dec 2013, Ian Campbell wrote: > > > > These mappings are in fact special and require special handling in > > > > privcmd, > > > > which already exists. Failure to mark the PTE as special on arm64 > > > > causes all sorts of bad PTE fun. > > > > > > > > x86 already gets this correct. > > > > > > Yes, but x86 does that for PV guests, not for autotranslate guests (for > > > which the function return -EINVAL). > > > > > > Given that in the ARM case we are changing the p2m underneath, why do we > > > also need to mark them special? > > > > It's not about the p2m, it's about the handling in privcmd wrt setup and > > teardown of the stage one mapping which goes along with the p2m > > manipulations. > > > > Without this the normal rmap counting kicks in and complains about the > > mapcount being -1. > > In the case of PV guests we need to mark the pte special because there > is no struct page associated to it: the userspace virtual address is > mapped to a foreign mfn. No dom0 pages are involved. > > In this case we do have a corresponding physical address and a > corresponding struct page in dom0. In fact the dom0 page is a > xenballooned_pages allocated by alloc_empty_pages. special is not 100% the same as "no struct page". It means "not normal" where normal also has other requirements, such as being normal user process RAM established in the normal way (e.g. get_user_pages or similar), which privcmd does not do. > What exactly is the kernel complaining about? There are a few different symptoms, but frequently it is hitting a bad pte path on teardown. This is caused by vm_normal_page returning true when it should be false (where true and false are actually non-NULL and NULL). e.g.: BUG: Bad page map in process xl pte:e0004077b33f53 pmd:4079575003 page:ffffffbce1a2f328 count:1 mapcount:-1 mapping: (null) index:0x0 page flags: 0x4000000000000014(referenced|dirty) addr:0000007fb5259000 vm_flags:040644fa anon_vma: (null) mapping:ffffffc03a6fda58 index:0 vma->vm_ops->fault: privcmd_fault+0x0/0x38 vma->vm_file->f_op->mmap: privcmd_mmap+0x0/0x2c CPU: 0 PID: 2657 Comm: xl Not tainted 3.12.0+ #102 Call trace: [<ffffffc0000880f8>] dump_backtrace+0x0/0x12c [<ffffffc000088238>] show_stack+0x14/0x1c [<ffffffc0004b67e0>] dump_stack+0x70/0x90 [<ffffffc000125690>] print_bad_pte+0x12c/0x1bc [<ffffffc0001268f4>] unmap_single_vma+0x4cc/0x700 [<ffffffc0001273b4>] unmap_vmas+0x68/0xb4 [<ffffffc00012c050>] unmap_region+0xcc/0x1d4 [<ffffffc00012df20>] do_munmap+0x218/0x314 [<ffffffc00012e060>] vm_munmap+0x44/0x64 [<ffffffc00012ed78>] SyS_munmap+0x24/0x34 Or: BUG: Bad page state in process xl pfn:4077b4d page:ffffffbce1a2f8d8 count:0 mapcount:-1 mapping: (null) index:0x0 page flags: 0x4000000000000014(referenced|dirty) Modules linked in: CPU: 0 PID: 2657 Comm: xl Tainted: G B 3.12.0+ #102 Call trace: [<ffffffc0000880f8>] dump_backtrace+0x0/0x12c [<ffffffc000088238>] show_stack+0x14/0x1c [<ffffffc0004b67e0>] dump_stack+0x70/0x90 [<ffffffc00010f798>] bad_page+0xc4/0x110 [<ffffffc00010f8b4>] free_pages_prepare+0xd0/0xd8 [<ffffffc000110e94>] free_hot_cold_page+0x28/0x178 [<ffffffc000111460>] free_hot_cold_page_list+0x38/0x60 [<ffffffc000114cf0>] release_pages+0x190/0x1dc [<ffffffc00012c0e0>] unmap_region+0x15c/0x1d4 [<ffffffc00012df20>] do_munmap+0x218/0x314 [<ffffffc00012e060>] vm_munmap+0x44/0x64 [<ffffffc00012ed78>] SyS_munmap+0x24/0x34 > Didn't we allocate the pages correctly? It depends what you mean by correctly, we've done everything right for a special page, except telling the kernel (via the pte) that the page is special. > Didn't we call get_page appropriately? The count which is wrong is not page->count it is page->_mapcount, which is used by the rmap stuff AFAIK. > There must be a way to make the mapcount happy without marking the pages > as special. I'm sure there is, but it would be semantically incorrect, these pages are not normal process memory, even though they do happen to have a struct page. I really don't think making these pages appear normal is what we want. We don't want them being considered page cache, available to be swapped or any of that "normal" stuff. > Also consider that on 3.13-rc2 pte_mkspecial on arm is implemented as: > > static inline pte_t pte_mkspecial(pte_t pte) { return pte; } This is because 32-bit arm does not have a spare PTE bit to use as PTE_SPECIAL, end therefore does not define __HAVE_ARCH_PTE_SPECIAL. It takes the slow/fallback path in vm_normal, which works fine. arm64 does have such a spare bit, so vm_normal takes the fast path but reaches the wrong conclusion because we didn't set the special bit like we should. Ian. _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |