Xen project Mailing List

Re: [Xen-devel] nvmx deadlock with MSR bitmaps

To: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>

Date: Thu, 12 Mar 2020 09:59:48 +0100

Cc: xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Kevin Tian <kevin.tian@xxxxxxxxx>, Paul Durrant <paul@xxxxxxx>, Wei Liu <wl@xxxxxxx>, Roger Pau Monne <roger.pau@xxxxxxxxxx>

Delivery-date: Thu, 12 Mar 2020 08:59:56 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 11.03.2020 19:04, Andrew Cooper wrote:
> Specifically, this is a switch from an HVM vcpu, to a PV vcpu, where the
> mapcache code tries to access the per-domain mappings on the HVM monitor
> table. It ends up trying to recursively acquire the mapcache lock while
> trying to walk %cr2 to identify the source of the fault.
>
> For nvmx->msr_merged, this needs to either be a xenheap page, or a
> globally mapped domheap page. I'll draft a patch in a moment.
>
> For map_domain_page(), is there anything we can rationally do to assert
> that it isn't called in the middle of a context switch? This is the
> kind of thing which needs to blow up reliably in a debug build.

Well, it's not inherently unsafe to do, it's just that
mapcache_current_vcpu() would need to avoid using current from
context_switch()'s call to set_current() through to
__context_switch()'s call to write_ptbase(). A possible detection
(if we don't want to make the case work) would seem to be
ASSERT(current == this_cpu(curr_vcpu)).

But of course there's also this extra logic in mapcache_current_vcpu()
to deal with a PV vCPU having a null v->arch.guest_table, which I'm
once again struggling to see under what conditions it might happen.
The Dom0 building case can't be meant with there being
mapcache_override_current() on that path. I'm wondering if the comment
there is misleading and it's really to cover the case where, coming
from a PV vCPU, current was already set to the idle vCPU by
context_switch() (which would have a null v->arch.guest_table) - I
wouldn't call this "we are running a paravirtualised guest". But in
such a case the logic here would simply be a (too) special case of
what you're describing as the issue with nVMX.

Jan
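
For readers following the thread, the window in which current and this_cpu(curr_vcpu) disagree can be illustrated with a toy model. This is only a sketch and not Xen code: the single-CPU globals and every toy_* function below are hypothetical stand-ins, and the model assumes that set_current() updates current first while curr_vcpu only catches up once __context_switch() has completed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy single-CPU model. None of this is actual Xen code. */
struct vcpu { int id; };

static struct vcpu *current_vcpu; /* stands in for Xen's "current" */
static struct vcpu *curr_vcpu;    /* stands in for this_cpu(curr_vcpu) */

/* Condition behind the suggested ASSERT(current == this_cpu(curr_vcpu)). */
static bool toy_mapcache_state_consistent(void)
{
    return current_vcpu == curr_vcpu;
}

/* Hypothetical stand-in for context_switch()'s call to set_current(). */
static void toy_set_current(struct vcpu *next)
{
    current_vcpu = next;
}

/*
 * Hypothetical stand-in for __context_switch() having reached
 * write_ptbase(): the page tables now match "current" again.
 */
static void toy_finish_switch(void)
{
    curr_vcpu = current_vcpu;
}

/* Walks through one switch and checks the state at each step. */
static void toy_demo(void)
{
    struct vcpu hvm = { 1 }, pv = { 2 };

    current_vcpu = curr_vcpu = &hvm;
    assert(toy_mapcache_state_consistent());   /* before the switch */

    toy_set_current(&pv);
    assert(!toy_mapcache_state_consistent());  /* inside the window */

    toy_finish_switch();
    assert(toy_mapcache_state_consistent());   /* switch completed */
}
```

In this model a map_domain_page() call made between toy_set_current() and toy_finish_switch() is exactly the case the assertion would catch in a debug build.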
