
Re: [Xen-devel] regression from 22242:7831b8e5aae2 (x86 guest pagetable walker: check for invalid bits in pagetable entries)?


  • To: Jan Beulich <JBeulich@xxxxxxxx>
  • From: Tim Deegan <tim@xxxxxxx>
  • Date: Fri, 17 Feb 2012 16:23:44 +0000
  • Cc: xen-devel <xen-devel@xxxxxxxxxxxxx>
  • Delivery-date: Fri, 17 Feb 2012 16:24:03 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xensource.com>

Hi, 

At 15:50 +0000 on 17 Feb (1329493838), Jan Beulich wrote:
> this c/s or-s PFEC_reserved_bit into the error code passed to the guest
> in two places, and I'm afraid it is responsible for problems we're seeing
> when a Linux HVM PAE guest on a non-HAP platform runs into
> invocations of pgtable_bad(). Linux checks the reserved bit before
> looking at the present bit, and expects the reserved bit to always be
> clear when the present bit also is. The respective lines from the
> guest's kernel log however are
> 
> Bad pagetable: 000c [#1] SMP
> Bad pagetable: 000e [#2] SMP
> Bad pagetable: 000e [#3] SMP
> Bad pagetable: 000c [#4] SMP
> Bad pagetable: 000c [#5] SMP
> Bad pagetable: 000c [#6] SMP
> Bad pagetable: 000a [#7] SMP
> Bad pagetable: 000a [#8] SMP
> Bad pagetable: 000c [#10] SMP
> Bad pagetable: 000c [#11] SMP
> 
> (i.e. reserved flag always set, present flag always clear) with the
> corresponding page table dumps being
> 
> *pdpt = 0000000035682001 *pde = 0000000013014067 *pte = 002dea0000000000
> *pdpt = 0000000035d65001 *pde = 0000000028a8a067 *pte = 002dd16000000000
> *pdpt = 0000000013177001 *pde = 0000000028bf9067 *pte = 002dd16000000000
> *pdpt = 0000000035480001 *pde = 000000005e6b1067
> *pdpt = 00000000133b6001 *pde = 000000005e6c1067
> *pdpt = 00000000131b7001 *pde = 000000005de15067
> *pdpt = 00000000354bd001 *pde = 00000000355ad067 *pte = 002dfca000000000
> *pdpt = 000000002bd5c001 *pde = 000000002bf2c067 *pte = 002dfa6000000000
> *pdpt = 000000002bd79001 *pde = 0000000000000000
> *pdpt = 000000002bdc6001 *pde = 000000005e63c067
> *pdpt = 00000000130ef001 *pde = 000000005e690067
> 
> (i.e. no invalid bits at all, but paged out entries with page indexes
> into the swap file indicating a swap size of over 10G - with anything
> up to 4G, the reserved bits would never be run into).

Right - so the problem is that the walker is choking on the reserved
bits even though the present bit is clear in that particular PTE?
Yeah, that's a bug.
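
For reference, the error codes in the quoted log can be decoded with a
small helper. This is only a sketch: the bit positions are the
architectural x86 page-fault error code bits, the PFEC_* spellings are
assumed to match the ones used in this thread, and describe_pfec() and
rsvd_without_present() are hypothetical names made up for illustration.

```c
#include <stdio.h>
#include <stddef.h>

/* Architectural x86 page-fault error code bits.  The macro names are
 * assumptions modelled on the PFEC_* names used in this thread. */
#define PFEC_page_present (1u << 0)
#define PFEC_write_access (1u << 1)
#define PFEC_user_mode    (1u << 2)
#define PFEC_reserved_bit (1u << 3)
#define PFEC_insn_fetch   (1u << 4)

/* Hypothetical helper: true for the combination that hardware never
 * delivers (reserved-bit fault reported for a not-present entry). */
int rsvd_without_present(unsigned int pfec)
{
    return (pfec & PFEC_reserved_bit) && !(pfec & PFEC_page_present);
}

/* Hypothetical helper: render the individual flags of an error code. */
void describe_pfec(unsigned int pfec, char *buf, size_t len)
{
    snprintf(buf, len, "P=%d W=%d U=%d RSVD=%d I=%d",
             !!(pfec & PFEC_page_present),
             !!(pfec & PFEC_write_access),
             !!(pfec & PFEC_user_mode),
             !!(pfec & PFEC_reserved_bit),
             !!(pfec & PFEC_insn_fetch));
}
```

All three codes seen in the log (000c, 000e, 000a) decode to RSVD=1
with P=0, which is the combination real hardware does not produce,
since reserved bits are only checked in present entries.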

> The problem appears to be that the or-ing in of PFEC_reserved_bit
> happens without consideration of PFEC_present. If you can confirm
> I'm not mis-interpreting things, fixing this should be relatively
> straightforward (though the question would be whether it should be
> guest_walk_tables() or its callers that should be fixed).

We should fix it in guest_walk_tables, since AFAICS it's possible for
PFEC_reserved_bit to be set based on a bad higher-level entry even if a
lower-level one has _PAGE_PRESENT clear.

Something like the attached (compile-tested only) patch? 
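
The attachment is not reproduced in this archive. Purely as an
illustration of the intended rule (and not the attached patch, nor the
real guest_walk_tables() code), the hardware behaviour being emulated
can be sketched as follows. struct level_result and walk_pfec() are
hypothetical names, and the walk is reduced to a per-level summary of
"present" and "reserved bits set".

```c
#include <stdint.h>

/* Architectural x86 page-fault error code bits; names assumed. */
#define PFEC_page_present (1u << 0)
#define PFEC_reserved_bit (1u << 3)

/* Hypothetical per-level summary of one pagetable entry in a walk. */
struct level_result {
    int present;       /* _PAGE_PRESENT is set in this entry */
    int reserved_bad;  /* bits that would be reserved are set */
};

/* Hypothetical sketch: compute the P and RSVD bits of the error code,
 * walking from the top level (index 0) downwards.  access_pfec carries
 * the access-type bits (write, user, fetch) of the faulting access. */
uint32_t walk_pfec(const struct level_result *lvl, int nlevels,
                   uint32_t access_pfec)
{
    uint32_t pfec = access_pfec & ~(PFEC_page_present | PFEC_reserved_bit);
    int i;

    for ( i = 0; i < nlevels; i++ )
    {
        /* A not-present entry ends the walk: its other bits are
         * software-defined (e.g. a swap entry) and are never checked. */
        if ( !lvl[i].present )
            return pfec;

        /* Reserved bits only count in a present entry, so RSVD is
         * always reported together with P. */
        if ( lvl[i].reserved_bad )
            return pfec | PFEC_page_present | PFEC_reserved_bit;
    }

    /* Every level present and well-formed: any fault is a protection
     * fault, reported with P set and RSVD clear. */
    return pfec | PFEC_page_present;
}
```

With this rule a swapped-out PTE like the ones in the dumps above
yields P=0 and RSVD=0, while a present higher-level entry with
reserved bits set still yields P=1 and RSVD=1 whatever the
lower-level entries contain.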

(Sigh; this PT walker was once relatively readable and efficient.)

Cheers,

Tim.

Attachment: pfec
Description: Text document

