Xen project Mailing List

Re: [Xen-devel] nvmx deadlock with MSR bitmaps

From: Roger Pau Monné <roger.pau@xxxxxxxxxx>

Date: Thu, 12 Mar 2020 14:44:35 +0100

Authentication-results: esa4.hc3370-68.iphmx.com; dkim=none (message not signed) header.i=none; spf=None smtp.pra=roger.pau@xxxxxxxxxx; spf=Pass smtp.mailfrom=roger.pau@xxxxxxxxxx; spf=None smtp.helo=postmaster@xxxxxxxxxxxxxxx

Cc: Kevin Tian <kevin.tian@xxxxxxxxx>, Wei Liu <wl@xxxxxxx>, Paul Durrant <paul@xxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>, xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>

Delivery-date: Thu, 12 Mar 2020 13:45:01 +0000

Ironport-sdr: Q6JNUJVV2zOGvkRAXjK+KT0i4cLN1cl3D3By7GTTQgpBamkGGsbPaUcY86PuEbdWwPi+QpRIxl RZVIjyWnastXgNc6CW8wUNXTOO5UPNdBs/G1tdeQX68pL1kPNfyDG/rywup86KKqbtpxUV5GaV s8FMaeOm9rvRk3E+Dc2vice4MBClTNxnUnx5+TFUxWMhFF3WnwvPYH2bFVntHau5OL+1QCNp50 se3SnqT8NvYUw7nepXM8qo9uvGD8+Ybem1Jou5y1XdkKfk+4bkPEKjbA9/iR0RCrxNjZrXtuBZ wT8=

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Thu, Mar 12, 2020 at 12:12:12PM +0100, Jürgen Groß wrote: > On 12.03.20 11:56, Roger Pau Monné wrote: > > On Thu, Mar 12, 2020 at 09:59:48AM +0100, Jan Beulich wrote: > > > On 11.03.2020 19:04, Andrew Cooper wrote: > > > > Specifically, this is a switch from an HVM vcpu, to a PV vcpu, where the > > > > mapcache code tries to access the per-domain mappings on the HVM monitor > > > > table. It ends up trying to recursively acquire the mapcache lock while > > > > trying to walk %cr2 to identify the source of the fault. > > > > > > > > For nvmx->msr_merged, this needs to either be a xenheap page, or a > > > > globally mapped domheap page. I'll draft a patch in a moment. > > > > > > > > For map_domain_page(), is there anything we can rationally do to assert > > > > that it isn't called in the middle of a context switch? This is the > > > > kind of thing which needs to blow up reliably in a debug build. > > > > > > Well, it's not inherently unsafe to do, it's just that > > > mapcache_current_vcpu() would need to avoid using current from > > > context_switch()'s call to set_current() through to > > > __context_switch()'s call to write_ptbase(). A possible > > > detection (if we don't want to make the case work) would > > > seem to be ASSERT(current == this_cpu(curr_vcpu)). But of course > > > there's also this extra logic in mapcache_current_vcpu() to deal > > > with a PV vCPU having a null v->arch.guest_table, which I'm once > > > again struggling to see under what conditions it might happen. > > > The Dom0 building case can't be meant with there being > > > mapcache_override_current() on that path. I'm wondering if the > > > comment there is misleading and it's really to cover the case > > > where, coming from a PV vCPU, current was already set to the > > > idle vCPU by context_switch() (which would have a null > > > v->arch.guest_table) - I wouldn't call this "we are running a > > > paravirtualised guest". But in such a case the logic here would > > > simply be a (too) special case of what you're describing as the > > > issue with nVMX. > > > > Looking at the code in context_switch and __context_switch would it be > > possible to set current to the next vCPU after all the from hooks have > > been called? > > > > Ie: set_current could be moved into __context_switch after the call to > > pd->arch.ctxt_switch->from(p). > > No, wouldn't work. When switching to idle __context_switch() is normally > not called in order to avoid switching the address space in case the > same vcpu will be scheduled again after idle. This is the reason why > current and curr_vcpu can be different. Since the idle vCPU context switch is already a special case we could maybe place the call to set_current in such special handling, while leaving the call for the rest of vCPUs in __context_switch after the ->from hooks have been executed? > > I'm also not sure I understand the difference between context_switch > > and __context_switch, and how are callers supposed to use them. Jan points out that __context_switch signals a pending context switch, in which case my proposal to use set_current might not be suitable, as we would be changing current without actually doing the context switch? I also wonder why __context_switch then needs to call the ->from hook just to signal a pending context switch. It seems like __context_switch does a lot of work just to signal a context switch, which will then be redone when context_switch is actually called? Thanks, Roger. _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxxxxxxxxx https://lists.xenproject.org/mailman/listinfo/xen-devel

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.