Xen project Mailing List

Re: [Xen-devel] Re: Possible regression in "x86-64: reduce range spanned by 1:1 mapping and frame table indexes"

From: Simon Horman <horms@xxxxxxxxxxxx>

Date: Fri, 11 Dec 2009 22:56:18 +1100

Cc: "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>, Keir Fraser <Keir.Fraser@xxxxxxxxxxxxx>, Jan Beulich <JBeulich@xxxxxxxxxx>

Delivery-date: Fri, 11 Dec 2009 03:56:41 -0800

List-id: Xen developer discussion <xen-devel.lists.xensource.com>

On Fri, Dec 11, 2009 at 10:15:32AM +0000, Tim Deegan wrote: > Hi, > > I seem to have missed this thread earlier; sorry about that. > > At 01:39 +0000 on 11 Dec (1260495591), Simon Horman wrote: > > * The memory access is to a MMIO region - a control register for the NIC. > > The access is made by the e1000 gPXE driver. > > > > * Interestingly, if there is a write to the page (or perhaps even anywhere > > in the entire 20-page MMIO region?) before a read, the problem doesn't > > occur. So a simple work-around in the gPXE driver is to just write to a > > register before reading any. > > > > * In sh_page_fault() the call to gfn_to_mfn_guest_dbg(d, gfn, &p2mt) sets > > p2mt to p2m_mmio_direct, which seems to be correct. I'm a bit > > stuck in working out what goes wrong from there. > > So from the trace below it looks like the shadow fault handler is trying > to emulate an access to a passthrough MMIO address. That's surprising. > And if it only happens when the first access is a read, that's a bit > suspect too. Sorry, I wasn't as clear as I might have been. The problem occurs if the first access is a write. The work-around is to make the first access a read. > > 106706 sh: sh_page_fault__guest_2(): d:v=1:0 va=0xf30000d8 err=2, rip=6f6 > > Oh, but this is a write, according to the error code. > > > 106707 sh: sh_page_fault__guest_2(): 2954: A > > 106708 sh: sh_page_fault__guest_2(): 2998: B > > 106709 sh: sh_page_fault__guest_2(): 3077: page_fault_slow_path: > > 106710 sh: sh_page_fault__guest_2(): 3081: C > > 106711 sh: sh_page_fault__guest_2(): 3095: rewalk: > > 106712 sh: sh_page_fault__guest_2(): 3097: D > > 106713 sh: sh_page_fault__guest_2(): 3106: E > > 106714 sh: sh_page_fault__guest_2(): 3122: F > > (XEN) _gfn_to_mfn_type_dbg: current > > 106715 sh: sh_page_fault__guest_2(): 3141: gfn_to_mfn_guest_dbg: p2mt=5 > > 106716 sh: sh_page_fault__guest_2(): 3143: G > > 106717 sh: sh_page_fault__guest_2(): 3181: H > > 106718 sh: sh_page_fault__guest_2(): 3193: I > > 106719 sh: sh_page_fault__guest_2(): 3204: J > > 106720 sh: sh_page_fault__guest_2(): 3216: K > > 106721 shdebug: make_fl1_shadow(): (f3000)=>118644 > > FL1 shadow means we're building a shadow L1 that's equivalent to a > single PSE guest entry, pointing a guest frames f3000 to f31ff. > > > 106722 sh: set_fl1_shadow_status(): gfn=f3000, type=00000002, smfn=118644 > > 106723 shdebug: _sh_propagate(): demand write level 2 guest f30000e7 shadow > > 0000000118644067 > > 106724 sh: sh_page_fault__guest_2(): 3241: L > > 106725 shdebug: _sh_propagate(): demand write level 1 guest f3000067 shadow > > 00000000f0500037 > > 106726 sh: sh_page_fault__guest_2(): 3263: M > > 106727 shdebug: _sh_propagate(): prefetch level 1 guest f3001067 shadow > > 00000000f0501037 > > 106728 shdebug: _sh_propagate(): prefetch level 1 guest f3002067 shadow > > 00000000f0502037 > > [...] and the shadow PTE flags look OK: A, PCD, U/S, R/W, P. > > > 106758 sh: sh_page_fault__guest_2(): 3285: N > > 106759 sh: sh_page_fault__guest_2(): 3310: O > > 106760 sh: sh_page_fault__guest_2(): 3319: P > > 106761 sh: sh_page_fault__guest_2(): 3332: Q > > 106762 sh: sh_page_fault__guest_2(): 3343: goto emulate; > > That's the interesting one. I guess your line numbers are different > because of the extra printout but it must be this: > > if ( is_hvm_domain(d) > && unlikely(!hvm_wp_enabled(v)) > && regs->error_code == (PFEC_write_access|PFEC_page_present) ) > { > perfc_incr(shadow_fault_emulate_wp); > goto emulate; > } Yes, it is that. > Hah. That explains why we're emulating. The guest has CR0.WP clear, > and this was a write fault that wouldn't have happened on real hardware, > so we need to emulate it because we can't actually disable CR0.WP or the > shadow pagetables stop working altogether. > > In fact in this case we don't need to emulate it because retrying would > be good enough, but in the general case we might. I'm not sure that I understand why that is true. > > 106763 sh: sh_page_fault__guest_2(): 3361: emulate: > > 106764 sh: sh_page_fault__guest_2(): 3367: R > > 106765 sh: sh_page_fault__guest_2(): 3390: emulate_readonly: > > 106766 sh: sh_page_fault__guest_2(): 3403: early_emulation: > > 106767 sh: sh_page_fault__guest_2(): 3405: S > > 106768 sh: sh_page_fault__guest_2(): emulate: eip=0x6f6 esp=0x3d264 > > 106769 sh: sh_page_fault__guest_2(): 3446: T > > 106770 sh: sh_page_fault__guest_2(): emulator failure, unshadowing mfn > > 0xf0500 > > The emulation fails (because the emulator can't/won't map MMIO space) > and we're about to bail out, try unshadowing everything and retry (at > which point everything will just work). Except: > > > 106771 sh: sh_remove_shadows(): d=1, v=0, gmfn=f0500 > > sh_remove_shadows isn't ready to be called with an MMIO MFN. We used to > get away with this before the m2p was made sparse, but now MMIO holes > mean m2p holes. > > Two fixes suggest themselves. The first is to gate the CR0.WP emulation > path on mfn_valid(). The emulator won't handle it so it only leads to > confusion and delay if we try: > > --- a/xen/arch/x86/mm/shadow/multi.c Fri Dec 11 09:17:09 2009 +0000 a > +++ b/xen/arch/x86/mm/shadow/multi.c Fri Dec 11 10:06:39 2009 +0000 > @@ -3305,7 +3305,8 @@ > * fault was a non-user write to a present page. */ > if ( is_hvm_domain(d) > && unlikely(!hvm_wp_enabled(v)) > - && regs->error_code == (PFEC_write_access|PFEC_page_present) ) > + && regs->error_code == (PFEC_write_access|PFEC_page_present) > + && mfn_valid(gmfn) ) > { > perfc_incr(shadow_fault_emulate_wp); > goto emulate; That resolves the problem that I observed. > > > The second is to make sh_remove_shadows() safe against being called with > a bogus MFN. I'm not quite decided whether the right answer is to put > "if (!mfn_valid(gmfn)) return;" at the top of it (which would stop a > certain class of crashes) or just "ASSERT(gmfn_valid(gmfn));" to make it > clearer what's gone wrong. Adding "if (!mfn_valid(gmfn)) return;" near the top of sh_remove_shadows() also resolves the problem that I was seeing. Thanks (x2) ! Although I'm not really that familiar with the code in question, I think that I have a preference for both guarding the goto emulate and adding an ASSERT to sh_remove_shadows(). That is: Index: xen-unstable.hg/xen/arch/x86/mm/shadow/common.c =================================================================== --- xen-unstable.hg.orig/xen/arch/x86/mm/shadow/common.c 2009-12-11 20:31:48.000000000 +0900 +++ xen-unstable.hg/xen/arch/x86/mm/shadow/common.c 2009-12-11 20:48:48.000000000 +0900 @@ -2752,6 +2752,7 @@ void sh_remove_shadows(struct vcpu *v, m }; ASSERT(!(all && fast)); + ASSERT(mfn_valid(gmfn)); /* Although this is an externally visible function, we do not know * whether the shadow lock will be held when it is called (since it Index: xen-unstable.hg/xen/arch/x86/mm/shadow/multi.c =================================================================== --- xen-unstable.hg.orig/xen/arch/x86/mm/shadow/multi.c 2009-12-11 20:47:17.000000000 +0900 +++ xen-unstable.hg/xen/arch/x86/mm/shadow/multi.c 2009-12-11 20:48:17.000000000 +0900 @@ -3305,7 +3305,8 @@ static int sh_page_fault(struct vcpu *v, * fault was a non-user write to a present page. */ if ( is_hvm_domain(d) && unlikely(!hvm_wp_enabled(v)) - && regs->error_code == (PFEC_write_access|PFEC_page_present) ) + && regs->error_code == (PFEC_write_access|PFEC_page_present) + && mfn_valid(gmfn) ) { perfc_incr(shadow_fault_emulate_wp); goto emulate; _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxxxxxxxx http://lists.xensource.com/xen-devel

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.