
Re: [Xen-devel] xen.git branch reorg / success with 2.6.30-rc3 pv_ops dom0



On Thu, Jun 11, 2009 at 08:18:15AM -0700, Jeremy Fitzhardinge wrote:
> On 06/11/09 02:02, Ian Campbell wrote:
> >On Tue, 2009-06-09 at 13:28 -0400, Jeremy Fitzhardinge wrote:
> >   
> >>Ian Campbell wrote:
> >>     
> >>>I wonder how this interacts with the logic in
> >>>arch/x86/xen/mmu.c:xen_pin_page() which holds the lock while waiting for
> >>>the (deferred) pin multicall to occur? Hmm, no this is about the
> >>>PagePinned flag on the struct page which is out of date WRT the actual
> >>>pinned status as Xen sees it -- we update the PagePinned flag early in
> >>>xen_pin_page() long before Xen sees the pin hypercall so this window is the
> >>>other way round to what would be needed to trigger this bug.
> >>>
> >>>       
> >>Yes, it looks like you could get a bad mapping here.  An obvious fix
> >>would be to defer clearing the pinned flag in the page struct until
> >>after the hypercall has issued.  That would make the racy
> >>kmap_atomic_pte map RO, which would be fine unless it actually tries to
> >>modify it (but I can't imagine it would do that unlocked).
> >>     
> >
> >But would it redo the mapping after taking the lock? It doesn't look
> >like it does (why would it). So we could end up writing to an unpinned
> >pte via a R/O mapping.
> >   
> 
> Hm, yep.  One thing I noticed is that set_pte() is used very rarely, so 
> it would be no cost to always use a hypercall in that case.  But 
> xen_set_pte_at() ends up calling xen_set_pte() as well, and I think 
> that's more common.  Certainly we need to make sure that we're actually 
> taking advantage of late-pin by direct writing unpinned ptes.
> 
> I've been thinking of rearranging the set_pte(_at) pvops a little bit 
> anyway; it's not obvious we're really getting much benefit from using the 
> update_va_mapping hypercall, and if we're not using it, then the 
> set_pte_at pvop is taking a lot of unused parameters.
> 
> If we switch to just using mmu_update, then we can just pass the address 
> and pte value.  But we could also pass the struct page * (which makes a 
> bit of conceptual sense), so we could easily test directly whether the pte 
> is pinned, and either use a direct write or hypercall accordingly.
> 
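
The idea in the paragraph above (pick a direct write or a hypercall
depending on whether the page holding the pte is pinned) could be sketched
roughly as below. This is a userspace model only, not the kernel code:
every name here (model_pte_t, struct model_page, model_mmu_update,
model_set_pte) is a hypothetical stand-in, and the "hypercall" is just a
counter plus a plain store.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
typedef struct { uint64_t pte; } model_pte_t;

struct model_page {
    bool pinned;   /* models the PagePinned flag on struct page */
};

/* Counters so a caller can see which path each update took. */
static unsigned long model_hypercalls;
static unsigned long model_direct_writes;

/*
 * Simulated mmu_update. In the real system Xen would validate and apply
 * the update; here we only count the call and store the value.
 */
static void model_mmu_update(model_pte_t *ptep, model_pte_t val)
{
    model_hypercalls++;
    *ptep = val;
}

/*
 * Choose the update path from the pinned state of the page that holds
 * the pte: pinned pagetable pages go through the (simulated) hypercall,
 * unpinned ones are written directly.
 */
static void model_set_pte(struct model_page *pt_page,
                          model_pte_t *ptep, model_pte_t val)
{
    if (pt_page->pinned) {
        model_mmu_update(ptep, val);
    } else {
        model_direct_writes++;
        *ptep = val;
    }
}
```

Note that this only models the path selection. It does nothing about the
window discussed in this thread, where the flag and Xen's view of the
page can disagree, so the flag test is only as good as the ordering of
the flag update relative to the pin/unpin hypercall.
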
> >As an experiment I tried the simple approach of flushing the multicalls
> >explicitly in xen_unpin_page and then clearing the Pinned bit and it all
> >goes a bit wrong. eip is "ptep->pte_low = 0" so I think the unpinned but
> >R/O theory holds...
> >   
> 
> Yes, I think the theory is sound.  But I'm curious why Pasi seems to be 
> able to hit the race easily, but we have not...
> 

Yeah, I've been thinking about that too.. 

My hardware is ~5 years old, but it has been running stably with multiple
distributions and kernel versions, under various types of load. I think the
hardware should be fine.

At the moment I'm running Fedora 10 and Fedora 11 on it, and both seem
stable with the distro-provided kernels.

I.e. I'm only seeing the problem with the pv_ops dom0 kernel.

My installation is pretty basic/standard.. root-fs on LVM-volume. Can't
really think of anything special.. 

And the problem seems to be _always_ reproducible with a simple 
"make clean && make bzImage && make modules" command on dom0 .. 

Anyway, I'll continue testing. Hopefully we get this hunted down :)

-- Pasi

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

