[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH] x86/mm: drop bogus assertion
>>> On 30.11.17 at 12:33, <andrew.cooper3@xxxxxxxxxx> wrote: > On 30/11/17 09:10, Jan Beulich wrote: >> Olaf has observed this assertion to trigger after an aborted migration >> of a PV guest (it remains to be determined why there is a page fault in >> the first place here: >> >> (XEN) Xen call trace: >> (XEN) [<ffff82d0802a85dc>] do_page_fault+0x39f/0x55c >> (XEN) [<ffff82d08036b7d8>] x86_64/entry.S#handle_exception_saved+0x66/0xa4 >> (XEN) [<ffff82d0802a9274>] __copy_to_user_ll+0x22/0x30 >> (XEN) [<ffff82d0802772d4>] update_runstate_area+0x19c/0x228 >> (XEN) [<ffff82d080277371>] domain.c#_update_runstate_area+0x11/0x39 >> (XEN) [<ffff82d080277596>] context_switch+0x1fd/0xf25 >> (XEN) [<ffff82d0802395c5>] schedule.c#schedule+0x303/0x6a8 >> (XEN) [<ffff82d08023d067>] softirq.c#__do_softirq+0x6c/0x95 >> (XEN) [<ffff82d08023d0da>] do_softirq+0x13/0x15 >> (XEN) [<ffff82d08036b2f1>] x86_64/entry.S#process_softirqs+0x21/0x30 >> >> i.e. the guest specified a runstate area address which has a non-present >> mapping in the page tables [EC=0002 CR2=ffff88003d405220], but that's >> not something the hypervisor needs to be concerned about.) Release >> builds work fine, which is a first indication that the assertion isn't >> really needed. >> >> What's worse though - there appears to be a timing window where the >> guest runs in shadow mode, but not in log-dirty mode, and that is what >> triggers the assertion (the same could, afaict, be achieved by test- >> enabling shadow mode on a PV guest). This is because turing off log- >> dirty mode is being performed in two steps: First the log-dirty bit gets >> cleared (paging_log_dirty_disable() [having paused the domain] -> >> sh_disable_log_dirty() -> shadow_one_bit_disable()), followed by >> unpausing the domain and only then clearing shadow mode (via >> shadow_test_disable(), which pauses the domain a second time). >> >> Hence besides removing the ASSERT() here (or optionally replacing it by >> explicit translate and refcounts mode checks, but this seems rather >> pointless now that the three are tied together) I wonder whether either >> shadow_one_bit_disable() should turn off shadow mode if no other bit >> besides PG_SH_enable remains set (just like shadow_one_bit_enable() >> enables it if not already set), or the domain pausing scope should be >> extended so that both steps occur without the domain getting a chance to >> run in between. >> >> Reported-by: Olaf Hering <olaf@xxxxxxxxx> >> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx> >> >> --- a/xen/arch/x86/traps.c >> +++ b/xen/arch/x86/traps.c >> @@ -1338,12 +1338,8 @@ static int fixup_page_fault(unsigned lon >> */ >> if ( paging_mode_enabled(d) && !paging_mode_external(d) ) >> { >> - int ret; >> + int ret = paging_fault(addr, regs); >> >> - /* Logdirty mode is the only expected paging mode for PV guests. */ >> - ASSERT(paging_mode_only_log_dirty(d)); > > This assert was introduced for a specific reason, as a result of > d5c251c22b7, although the later bugfix 7b9f21cabc complicated things > somewhat. I don't see how this later change complicated things; see also below. > The observation about shadow/logdirty happening in two phases does > invalidate the check in this form. Its original purpose was to check > that we hadn't slipped into PV autotranslate mode, and I think that is > still worth keeping around. > > How about reducing it to just a simple ASSERT(!paging_mode_translate(d)) ? Well, I've mentioned this as an option, so I don't mind; the reason I didn't do it this way is because, as said, we now have uniformly external == refcounts == translate anyway, and this is in a !external conditional already. Jan _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxxxxxxxxx https://lists.xenproject.org/mailman/listinfo/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |