Re: [Xen-devel] [PATCH] x86/vvmx: Fix deadlock with MSR bitmap merging



On 11.03.2020 19:34, Andrew Cooper wrote:
> c/s c47984aabead "nvmx: implement support for MSR bitmaps" introduced a use of
> map_domain_page() which may get used in the middle of context switch.
> 
> This is not safe, and causes Xen to deadlock on the mapcache lock:
> 
>   (XEN) Xen call trace:
>   (XEN)    [<ffff82d08022d6ae>] R _spin_lock+0x34/0x5e
>   (XEN)    [<ffff82d0803219d7>] F map_domain_page+0x250/0x527
>   (XEN)    [<ffff82d080356332>] F do_page_fault+0x420/0x780
>   (XEN)    [<ffff82d08038da3d>] F x86_64/entry.S#handle_exception_saved+0x68/0x94
>   (XEN)    [<ffff82d08031729f>] F __find_next_zero_bit+0x28/0x69
>   (XEN)    [<ffff82d080321a4d>] F map_domain_page+0x2c6/0x527
>   (XEN)    [<ffff82d08029eeb2>] F nvmx_update_exec_control+0x1d7/0x323
>   (XEN)    [<ffff82d080299f5a>] F vmx_update_cpu_exec_control+0x23/0x40
>   (XEN)    [<ffff82d08029a3f7>] F arch/x86/hvm/vmx/vmx.c#vmx_ctxt_switch_from+0xb7/0x121
>   (XEN)    [<ffff82d08031d796>] F arch/x86/domain.c#__context_switch+0x124/0x4a9
>   (XEN)    [<ffff82d080320925>] F context_switch+0x154/0x62c
>   (XEN)    [<ffff82d080252f3e>] F common/sched/core.c#sched_context_switch+0x16a/0x175
>   (XEN)    [<ffff82d080253877>] F common/sched/core.c#schedule+0x2ad/0x2bc
>   (XEN)    [<ffff82d08022cc97>] F common/softirq.c#__do_softirq+0xb7/0xc8
>   (XEN)    [<ffff82d08022cd38>] F do_softirq+0x18/0x1a
>   (XEN)    [<ffff82d0802a2fbb>] F vmx_asm_do_vmentry+0x2b/0x30
> 
> Convert the domheap page into being a xenheap page.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>

Reviewed-by: Jan Beulich <jbeulich@xxxxxxxx>
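
For anyone reading the trace above: map_domain_page() appears twice, with
a page fault in between, and the inner call ends up in _spin_lock(). The
sketch below is a toy model of that shape only, not Xen code. All names in
it (toy_lock, toy_trylock, toy_map_page) are hypothetical, and it uses a
trylock so that the re-entry is reported instead of hanging.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy non-recursive lock standing in for the mapcache lock (hypothetical). */
struct toy_lock {
    bool held;
};

/* Returns true if the lock was acquired, false if it is already held. */
static bool toy_trylock(struct toy_lock *l)
{
    if (l->held)
        return false;
    l->held = true;
    return true;
}

static void toy_unlock(struct toy_lock *l)
{
    assert(l->held);
    l->held = false;
}

static struct toy_lock mapcache_lock;

/*
 * Toy stand-in for a mapping routine that takes the lock.  When
 * fault_inside is true, it models a fault taken while the lock is held,
 * whose handler calls back into the same routine.
 *
 * Returns 0 on success, or -1 where a real spinlock would spin forever.
 */
static int toy_map_page(bool fault_inside)
{
    int rc = 0;

    if (!toy_trylock(&mapcache_lock))
        return -1; /* re-entered with the lock already held */

    if (fault_inside)
        rc = toy_map_page(false); /* nested call from the "fault handler" */

    toy_unlock(&mapcache_lock);
    return rc;
}
```

In this model toy_map_page(false) succeeds, while toy_map_page(true)
returns -1 at the nested acquisition, which is the point where a plain
spinlock would never make progress.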

> I suspect this is the not-quite-consistent-enough-to-bisect issue which
> OSSTest is hitting and interfering with pushes to master.

Having looked at a number of (albeit not all) failures, I don't
think I've seen any sign of a crash like the one above. Do you
think there are more subtle manifestations of the issue? Also
it is my understanding that this issue shouldn't get in the
way of any non-nested tests (of which we've had varying sets of
failures).

Jan
