Re: [Xen-devel] [GIT PULL] x86/mm changes for v3.9-rc1

To: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
From: "H. Peter Anvin" <hpa@xxxxxxxxx>
Date: Fri, 22 Feb 2013 09:31:27 -0800
Cc: linux-mips <linux-mips@xxxxxxxxxxxxxx>, Jeremy Fitzhardinge <jeremy@xxxxxxxx>, Gleb Natapov <gleb@xxxxxxxxxx>, "H. J. Lu" <hjl.tools@xxxxxxxxx>, Frederic Weisbecker <fweisbec@xxxxxxxxx>, Joe Millenbach <jmillenbach@xxxxxxxxx>, virtualization <virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx>, Gokul Caushik <caushik1@xxxxxxxxx>, Ralf Baechle <ralf@xxxxxxxxxxxxxx>, Pavel Machek <pavel@xxxxxx>, sparclinux@xxxxxxxxxxxxxxx, Christoph Lameter <cl@xxxxxxxxx>, Ingo Molnar <mingo@xxxxxxxxxx>, Ville Syrjälä <ville.syrjala@xxxxxxxxxxxxxxx>, Marek Szyprowski <m.szyprowski@xxxxxxxxxxx>, Andrea Arcangeli <aarcange@xxxxxxxxxx>, Lee Schermerhorn <Lee.Schermerhorn@xxxxxx>, "Xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>, Russell King <linux@xxxxxxxxxxxxxxxx>, Len Brown <len.brown@xxxxxxxxx>, Joerg Roedel <joro@xxxxxxxxxx>, Stefano Stabellini <stefano.stabellini@xxxxxxxxxxxxx>, Hugh Dickins <hughd@xxxxxxxxxx>, Yasuaki Ishimatsu <isimatu.yasuaki@xxxxxxxxxxxxxx>, Mel Gorman <mgorman@xxxxxxx>, Ingo Molnar <mingo@xxxxxxx>, Borislav Petkov <bp@xxxxxxx>, Paul Turner <pjt@xxxxxxxxxx>, Avi Kivity <avi@xxxxxxxxxx>, Alexander Duyck <alexander.h.duyck@xxxxxxxxx>, Fenghua Yu <fenghua.yu@xxxxxxxxx>, Peter Zijlstra <a.p.zijlstra@xxxxxxxxx>, Arnd Bergmann <arnd@xxxxxxxx>, Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>, Rusty Russell <rusty@xxxxxxxxxxxxxxx>, Jamie Lokier <jamie@xxxxxxxxxxxxx>, Josh Triplett <josh@xxxxxxxxxxxxxxxx>, Steven Rostedt <rostedt@xxxxxxxxxxx>, "Rafael J. Wysocki" <rjw@xxxxxxx>, Matt Fleming <matt.fleming@xxxxxxxxx>, Borislav Petkov <bp@xxxxxxxxx>, Andrzej Pietrasiewicz <andrzej.p@xxxxxxxxxxx>, Shuah Khan <shuah.khan@xxxxxx>, Thomas Gleixner <tglx@xxxxxxxxxxxxx>, Yinghai Lu <yinghai@xxxxxxxxxx>, Jarkko Sakkinen <jarkko.sakkinen@xxxxxxxxx>, Daniel J Blueman <daniel@xxxxxxxxxxxxxxxxxx>, Zachary Amsden <zamsden@xxxxxxxxx>, "linux-pm@xxxxxxxxxxxxxxx" <linux-pm@xxxxxxxxxxxxxxx>, Marcelo Tosatti <mtosatti@xxxxxxxxxx>, Jacob Shin <jacob.shin@xxxxxxx>, Linux Kernel Mailing List <linux-kernel@xxxxxxxxxxxxxxx>, stable <stable@xxxxxxxxxxxxxxx>, Dave Hansen <dave@xxxxxxxxxxxxxxxxxx>, Pekka Enberg <penberg@xxxxxxxxxx>, Kyungmin Park <kyungmin.park@xxxxxxxxxxx>, "Michael S. Tsirkin" <mst@xxxxxxxxxx>, "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx>, Rob Landley <rob@xxxxxxxxxxx>, Johannes Weiner <hannes@xxxxxxxxxxx>, Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>, "H. Peter Anvin" <hpa@xxxxxxxxxxxxxxx>, "David S. Miller" <davem@xxxxxxxxxxxxx>, Shuah Khan <shuahkhan@xxxxxxxxx>
Delivery-date: Fri, 22 Feb 2013 17:39:22 +0000
List-id: Xen developer discussion <xen-devel.lists.xen.org>

On 02/22/2013 08:22 AM, Linus Torvalds wrote:


Ugh. So I've tried to walk through this, and it's painful. If this
results in problems, we're going to be *so* screwed. Is it bisectable?

I can't tell you for sure that it is bisectable at every point. There
are definite bisection points in there, though, as this is several
pieces of work from two kernel cycles that were independently tested.

I also don't understand how "early_idt_handler" could *possibly* work.
In particular, it seems to rely on the trap number being set up in the
stack frame:

         cmpl $14,72(%rsp)       # Page fault?

but that's not even *true*. Why? Because we export both the
early_idt_handlers[] array (that sets up the trap number and makes the
stack frame be reliable) and the single early_idt_handler function
(that relies on the trap number and the reliable stack frame), AND
AFAIK WE USE THE LATTER!

See x86_64_start_kernel():

         for (i = 0; i < NUM_EXCEPTION_VECTORS; i++) {
#ifdef CONFIG_EARLY_PRINTK
                 set_intr_gate(i, &early_idt_handlers[i]);
#else
                 set_intr_gate(i, early_idt_handler);
#endif
         }

so unless you have CONFIG_EARLY_PRINTK, the interrupt gate will point
to that raw early_idt_handler function that doesn't *work* on its own,
afaik.

This is a (pre-existing!) bug that absolutely needs to be fixed, which
ought to break other things too (early use of *msr_safe for example, or
anything else that relies on an early exception entry, which there
aren't a lot of so far). The fix is simple and obvious.

But you're right... what the heck is going on here?

My own testing would probably not have caught this, as I consider
EARLY_PRINTK a must have, but Ingo's test machines definitely would have.

Btw, it's not just the page fault index testing that is wrong. The whole

         cmpl $__KERNEL_CS,96(%rsp)
         jne 11f

also relies on the stack frame being set up the same way for all
exceptions - which again is only true if we ran through the
early_idt_handlers[] prologue that added the extra stack entry.
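[Editor's illustration, not part of the original message.] The dependency described above can be sketched with a small user-space simulation in C. It is only a model under stated assumptions: the nine saved registers, the slot order, the `KERNEL_CS` value, and every function name are hypothetical, inferred from the offsets 72 and 96 quoted in this thread rather than taken from the kernel source.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define KERNEL_CS   0x10u  /* assumed stand-in for __KERNEL_CS */
#define SAVED_REGS  9      /* assumed: 9 registers saved, 9 * 8 = 72 bytes */
#define STACK_SLOTS 32

/* Simulated stack of 64-bit slots; it grows toward lower indices. */
struct sim_stack {
    uint64_t slot[STACK_SLOTS];
    int top;               /* index of the slot the simulated %rsp points at */
};

void push(struct sim_stack *s, uint64_t v)
{
    assert(s->top > 0);
    s->slot[--s->top] = v;
}

/* Read the slot at byte offset `off` from the simulated %rsp. */
uint64_t at(const struct sim_stack *s, int off)
{
    int idx = s->top + off / 8;
    assert(off % 8 == 0 && idx < STACK_SLOTS);
    return s->slot[idx];
}

/*
 * Build the stack as the common handler would see it after saving
 * registers. hw_errcode: the CPU pushed an error code for this vector.
 * via_stub: entry went through a per-vector stub that makes the frame
 * uniform (dummy error code if needed, then the vector number).
 */
void take_exception(struct sim_stack *s, int vector, int hw_errcode, int via_stub)
{
    memset(s, 0, sizeof *s);
    s->top = STACK_SLOTS;

    push(s, 0x18);                   /* ss */
    push(s, 0xdead0000ull);          /* interrupted rsp */
    push(s, 0x2);                    /* rflags */
    push(s, KERNEL_CS);              /* cs */
    push(s, 0xffffffff81000000ull);  /* rip */
    if (hw_errcode)
        push(s, 0x2);                /* hardware error code */
    if (via_stub) {
        if (!hw_errcode)
            push(s, 0);              /* dummy error code */
        push(s, (uint64_t)vector);   /* vector number */
    }
    for (int i = 0; i < SAVED_REGS; i++)
        push(s, 0);                  /* registers saved by the handler */
}

/* Models `cmpl $14,72(%rsp)`: a 32-bit compare at a fixed offset. */
int looks_like_page_fault(const struct sim_stack *s)
{
    return (uint32_t)at(s, 72) == 14;
}

/* Models `cmpl $__KERNEL_CS,96(%rsp)`. */
int came_from_kernel_cs(const struct sim_stack *s)
{
    return (uint32_t)at(s, 96) == KERNEL_CS;
}
```

In this model, calling `take_exception(&s, 14, 1, 1)` leaves the vector at offset 72 and the saved CS at offset 96, so both checks pass. With `via_stub` set to 0, the same page fault leaves the hardware error code at offset 72 and the saved rflags at offset 96, so both checks read the wrong slot.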

How does this even work for me? I don't have EARLY_PRINTK enabled.

What am I missing?

I just ran a simulation without EARLY_PRINTK; presumably based on the
memory layout, we can apparently go through the entire bootup sequence
without actually ever taking an early trap. It is a bug, though, and it
is a bug even without this patchset. I will submit a fix. However, the
Xen "we tested this, this worked, now it doesn't" worries me a lot.


        -hpa


--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


