
Re: [Xen-devel] xen: oops at atomic64_read_cx8+0x4



On Thu, Jun 07, 2012 at 12:33:55PM +0200, Andrea Arcangeli wrote:
> On Thu, Jun 07, 2012 at 02:33:33AM -0500, Jonathan Nieder wrote:
> > Sergio Gelato wrote[1]:
> > 
> > >                          That 3.4.1-1~experimental.1 build
> > > (3.4-trunk-686-pae #1 SMP Wed Jun 6 15:11:31 UTC 2012 i686 GNU/Linux)
> > > is even less well-behaved under Xen: I'm getting a kernel OOPS at
> > > EIP: [<c1168e54>] atomic64_read_cx8+0x4/0xc SS:ESP e021:ca853c6c
> > > The top of the trace message unfortunately scrolled off the console 
> > > before I
> > > could see it, and the message doesn't have time to make it to syslog 
> > > (either
> > > local or remote).
> > [...]
> > > Non-Xen boots proceed normally.
> > 
> > Yeah, apparently[2] that's caused by
> > 
> >     commit 26c191788f18
> >     Author: Andrea Arcangeli <aarcange@xxxxxxxxxx>
> >     Date:   Tue May 29 15:06:49 2012 -0700
> > 
> >         mm: pmd_read_atomic: fix 32bit PAE pmd walk vs pmd_populate SMP 
> > race condition
> > 
> > which was also included in Debian kernel 3.2.19-1.
> > 
> > [1] http://bugs.debian.org/676360
> > [2] https://bugzilla.redhat.com/show_bug.cgi?id=829016#c4
> 
> Oops, sorry I didn't imagine atomic64_read on a pmd would trip.

Hmm, so it looks like it used to do this:

 pmd_t *pmd = pmd_offset(pud, addr);
 ..
 pmd_t pmdval = *pmd;

but now you do:
 u32 *tmp = (u32 *)pmd;
 pmdval_t ret = (pmdval_t)(*tmp);
 ret |= ((pmdval_t)*(tmp + 1)) << 32;

which would read the low first and then the high one next
(or is the other way around?).  The 'pmd_offset' beforehand
manufactures the pmd using the PFN to MFN lookup tree (so
that there aren't any hypercalls or traps).
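
By "manufactures" I mean roughly the following: on a PV guest the raw
entry in the pagetable carries a machine frame number, and the
pmd_val()/pte_val() pv-ops convert that machine frame back into a
pseudo-physical one (and pfn_to_mfn() the other way on writes), using
lookup tables that live in guest memory, so a plain read never needs a
hypercall. A conceptual sketch only (the function name is made up, and
this is not the actual xen/mmu.c code):

 /* Conceptual sketch only -- not the actual xen/mmu.c code. */
 static pmdval_t xen_pmd_val_sketch(pmd_t pmd)
 {
         pmdval_t val = pmd.pmd;

         if (val & _PAGE_PRESENT) {
                 unsigned long mfn = (val & PTE_PFN_MASK) >> PAGE_SHIFT;
                 unsigned long pfn = mfn_to_pfn(mfn); /* M2P lookup */

                 val = ((pmdval_t)pfn << PAGE_SHIFT) | (val & PTE_FLAGS_MASK);
         }
         return val;
 }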

Hm, with your change, you are still looking at the 'pmd'
and its contents, except that you are reading the low and
then the high part. Why that would trip the hypervisor
is not clear to me. Perhaps in the past it only read the
low bits?
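
For reference, the EIP in the report is atomic64_read_cx8, which I take
to be the CONFIG_TRANSPARENT_HUGEPAGE=y branch of the new helper, i.e.
a cmpxchg8b-backed atomic64_read of the whole pmd rather than the split
read above. Roughly (a sketch from my reading of the commit, not a
verbatim copy):

 /* Sketch of the THP=y case -- not a verbatim copy of the commit. */
 static inline pmd_t pmd_read_atomic(pmd_t *pmdp)
 {
         /*
          * atomic64_read() on 32-bit x86 ends up in the cmpxchg8b-based
          * atomic64_read_cx8, which is the EIP in the oops.
          */
         return (pmd_t) { atomic64_read((atomic64_t *)pmdp) };
 }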

If there were a Xen hypervisor log, that might give some ideas. Is
there any chance that the Linode folks could send that over?

> 
> Unfortunately, to support pagetable walking with mmap_sem held for
> reading, we need an atomic read on 32bit PAE if
> CONFIG_TRANSPARENT_HUGEPAGE=y.
> 
> The only case requiring this is 32bit PAE with
> CONFIG_TRANSPARENT_HUGEPAGE=y at build time. If you set
> CONFIG_TRANSPARENT_HUGEPAGE=n temporarily you should be able to work
> around this as I optimized the code in a way to avoid an expensive
> cmpxchg8b.

Ah, by just skipping the thing if the low bits are zero.
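
I.e. something along these lines for the THP=n build, if I read it
right (a sketch, not the exact code):

 /* THP=n sketch: only read the high word when the low word is
  * non-zero, so no cmpxchg8b is needed. */
 static inline pmd_t pmd_read_atomic(pmd_t *pmdp)
 {
         u32 *tmp = (u32 *)pmdp;
         pmdval_t ret = (pmdval_t)(*tmp);

         if (ret) {
                 smp_rmb();      /* order the low read before the high one */
                 ret |= ((pmdval_t)*(tmp + 1)) << 32;
         }
         return (pmd_t) { ret };
 }
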
> 
> I guess if Xen can't be updated to handle an atomic64_read on a pmd in
> the guest, we can add a pmd_read paravirt op? Or if we don't want to
> break the paravirt interface a loop like gup_fast with irq disabled
> should also work but looping + local_irq_disable()/enable() sounded
> worse and more complex than an atomic64_read (gup fast already disables
> irqs because it doesn't hold the mmap_sem so it's a different cost

I am not really sure what is afoot. It sounds like the hypervisor
didn't like somebody reading the high and the low bits, but isn't
pmdval_t still 64-bit? So I would have thought this would have been
triggered before as well. Or is it that the code behind pmd_val() never
actually read the high bits (before your addition of the atomic64_read)?

> looping there). AFAIK Xen disables THP during boot, so a check on THP
> being enabled and falling back in the THP=n version of
> pmd_read_atomic, would also be safe, but it's not so nice to do it
> with a runtime check.

The thing is that I did install a 32-bit PAE guest (a Fedora) on a Fedora
17 dom0 and it works there. So it looks like reading the high part is fine
on the newer hypervisors, but not with the older ones. And the older ones
are what Amazon EC2 runs, so some sort of hack to work around older
hypervisors could be added.
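
For completeness, the irq-disabled looping alternative Andrea mentions
above (a la gup_fast) would presumably look something like the sketch
below. The helper name is made up, and in the real gup_fast path the
irq-disabling is done by the caller around the whole walk rather than
per read:

 /* Hypothetical sketch of the looping read: retry until the low word
  * is stable, so the low and high halves belong together. */
 static inline pmd_t pmd_read_loop(pmd_t *pmdp)
 {
         u32 *tmp = (u32 *)pmdp;
         pmdval_t low, high;
         unsigned long flags;

         local_irq_save(flags);
         do {
                 low = *tmp;
                 smp_rmb();
                 high = *(tmp + 1);
                 smp_rmb();
         } while (unlikely(low != *tmp));
         local_irq_restore(flags);

         return (pmd_t) { low | (high << 32) };
 }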


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

