
Re: [Xen-devel] dom0 pvops crash



On 01/27/2010 09:26 AM, Ian Campbell wrote:
On Mon, 2010-01-25 at 20:02 +0000, Jeremy Fitzhardinge wrote:
IanC, Pasi, myself and others explored a number of other ways to try
to fix it in the Xen pvops code, but they all turned out to be very
expensive, to simply not work (they just pushed the race around), or
to require new pvops just for this case.
Just to brainstorm a bit more:

There's no way a kunmap_atomic pvop would be acceptable? It would at
least make the API symmetrical.

We could propose it, but I think we have bigger things to spend our capital on. And I'm not sure it would help:

In theory xen_kmap_atomic could take the pte lock and unmap_atomic could release it. But kmap_atomic doesn't have enough information to be able to take the lock, and unmap wouldn't either unless we passed it some odd parameters. And even if we did take the lock, the calling kernel code will also attempt to take the lock if it actually wants to make a pte change, so we'd have to change the logic there.
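
For reference, a rough sketch of the interfaces involved (prototypes as in the 2.6.32-era tree, shown for illustration rather than as a patch):

/*
 * kmap_atomic() is handed only the page being mapped, and
 * kunmap_atomic() only a fixmap virtual address; neither sees the mm
 * or pmd that pte_lockptr() wants, so a kunmap_atomic pvop could not
 * find (let alone drop) the split pte lock without being passed some
 * rather odd extra parameters.
 */
void *kmap_atomic(struct page *page, enum km_type type);
void kunmap_atomic(void *kvaddr, enum km_type type);

spinlock_t *ptl = pte_lockptr(mm, pmd);	/* needs mm + pmd */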

What about a hypercall which would set a PTE, with the writable bit
chosen atomically depending on the pinned status of the referenced
page? (I haven't even vaguely thought this idea through.)

It doesn't really help, because the core issue is the race in which the page's state changes halfway through. If we create a writable mapping, a pin on another CPU is going to fail. We could fix it by locking the pte while it is mapped, but then we wouldn't need a new hypercall.
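
Concretely, the interleaving we keep hitting is something like this (a simplified sketch; the CPU split and exact ordering are illustrative):

/*
 *  CPU A                              CPU B
 *  -----                              -----
 *  kmap_atomic(highmem pte page)
 *    creates a writable mapping
 *                                     pin the pagetable for a new mm:
 *                                       every mapping of the pte page
 *                                       must be read-only, so the pin
 *                                       fails
 *  write the pte
 *  kunmap_atomic()
 */

A hypercall that chooses the writable bit based on pinned status only narrows the window: if the page is unpinned at map time we still end up with a writable mapping, and a pin arriving while it exists will still fail.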

Is there some way we can disable HIGHPTE at runtime even if
CONFIG_HIGHPTE=y? It looks like that might be relatively self-contained
in pte_alloc_one(). All the actual uses of high PTEs go through
kmap_atomic, which explicitly tests for PageHighmem, so by ensuring
PTEs are never highmem at allocation time we would skip all those
paths. Something like the untested patch below, but not so skanky,
obviously.

That's a thought. It could be generally useful too; highpte should only be used in extreme circumstances (to prevent ptes from filling most of lowmem), not on every system with highmem. IOW, use a generic flag rather than making it explicitly Xen-related; then we can just set that flag (something like the sketch below).
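
One shape such a flag could take (a sketch only; the __userpte_alloc_gfp name and the Xen hook are invented here for illustration, nothing of the sort exists yet):

/* arch/x86/mm/pgtable.c */
#ifdef CONFIG_HIGHPTE
#define PGALLOC_USER_GFP __GFP_HIGHMEM
#else
#define PGALLOC_USER_GFP 0
#endif

gfp_t __userpte_alloc_gfp = PGALLOC_GFP | PGALLOC_USER_GFP;

pgtable_t pte_alloc_one(struct mm_struct *mm, unsigned long address)
{
	struct page *pte;

	pte = alloc_pages(__userpte_alloc_gfp, 0);
	if (pte)
		pgtable_page_ctor(pte);
	return pte;
}

/* then, early in Xen setup (e.g. xen_start_kernel()): */
__userpte_alloc_gfp &= ~__GFP_HIGHMEM;

That would keep user pte pages in lowmem under Xen without touching any of the kmap_atomic paths, and a boot parameter could flip the same flag for native kernels that want to avoid highptes.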

Or we could just put a big fat config dependency in.

This last would be nice since it would also remove the
crippling-for-virtualisation overhead, so it would potentially benefit
KVM and VMI as well...

VMI is a non-issue, and I don't think HIGHPTE is extraordinarily expensive on kvm.

Given that HIGHPTE is generally a bad idea and should be deprecated
(any machine big enough to need it should definitely be running a
64-bit kernel), I've left it on the backburner hoping for some
inspiration to strike.  So far it has not.
Unfortunately distros seem to be using it for their native kernels,
and since pvops means they won't have a separate Xen kernel, I think
we need to figure something out.

We could lobby for them to turn it off. I wonder whether there's real user demand for it these days; it could only matter for users with lots of physical memory and a 32-bit-only CPU, which can't be common now. (There should be no problem with running a 64-bit kernel even if userspace is all 32-bit.)

    J

Ian.

diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index 65215ab..49f8e83 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -28,7 +28,10 @@ pgtable_t pte_alloc_one(struct mm_struct *mm, unsigned long address)
        struct page *pte;

  #ifdef CONFIG_HIGHPTE
-       pte = alloc_pages(PGALLOC_GFP | __GFP_HIGHMEM, 0);
+       if (xen_pv_domain())
+               pte = alloc_pages(PGALLOC_GFP, 0);
+       else
+               pte = alloc_pages(PGALLOC_GFP | __GFP_HIGHMEM, 0);
  #else
        pte = alloc_pages(PGALLOC_GFP, 0);
  #endif





_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

