Re: [Xen-devel] [PATCH] x86/vvmx: Fix deadlock with MSR bitmap merging

On 12/03/2020 09:21, Jan Beulich wrote:
> On 11.03.2020 19:34, Andrew Cooper wrote:
>> c/s c47984aabead "nvmx: implement support for MSR bitmaps" introduced a use of
>> map_domain_page() which may get used in the middle of context switch.
>> This is not safe, and causes Xen to deadlock on the mapcache lock:
>>   (XEN) Xen call trace:
>>   (XEN)    [<ffff82d08022d6ae>] R _spin_lock+0x34/0x5e
>>   (XEN)    [<ffff82d0803219d7>] F map_domain_page+0x250/0x527
>>   (XEN)    [<ffff82d080356332>] F do_page_fault+0x420/0x780
>>   (XEN)    [<ffff82d08038da3d>] F x86_64/entry.S#handle_exception_saved+0x68/0x94
>>   (XEN)    [<ffff82d08031729f>] F __find_next_zero_bit+0x28/0x69
>>   (XEN)    [<ffff82d080321a4d>] F map_domain_page+0x2c6/0x527
>>   (XEN)    [<ffff82d08029eeb2>] F nvmx_update_exec_control+0x1d7/0x323
>>   (XEN)    [<ffff82d080299f5a>] F vmx_update_cpu_exec_control+0x23/0x40
>>   (XEN)    [<ffff82d08029a3f7>] F arch/x86/hvm/vmx/vmx.c#vmx_ctxt_switch_from+0xb7/0x121
>>   (XEN)    [<ffff82d08031d796>] F arch/x86/domain.c#__context_switch+0x124/0x4a9
>>   (XEN)    [<ffff82d080320925>] F context_switch+0x154/0x62c
>>   (XEN)    [<ffff82d080252f3e>] F common/sched/core.c#sched_context_switch+0x16a/0x175
>>   (XEN)    [<ffff82d080253877>] F common/sched/core.c#schedule+0x2ad/0x2bc
>>   (XEN)    [<ffff82d08022cc97>] F common/softirq.c#__do_softirq+0xb7/0xc8
>>   (XEN)    [<ffff82d08022cd38>] F do_softirq+0x18/0x1a
>>   (XEN)    [<ffff82d0802a2fbb>] F vmx_asm_do_vmentry+0x2b/0x30
>> Convert the domheap page into being a xenheap page.
>> Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
> Reviewed-by: Jan Beulich <jbeulich@xxxxxxxx>
>> I suspect this is the not-quite-consistent-enough-to-bisect issue which
>> OSSTest is hitting and interfering with pushes to master.
> Having looked at a number of (albeit not all) failures, I don't
> think I've seen any sign of a crash like the one above. Do you
> think there are more subtle manifestations of the issue?

This stack trace was produced by an NMI watchdog timeout. I thought
OSSTest didn't run with the watchdog enabled, but I see I was wrong.

In which case this probably isn't what OSSTest is seeing, but it is a
genuine issue.
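
A minimal sketch of the failure mode in the quoted trace, as a toy model rather than Xen code. It assumes the trace reads as follows: map_domain_page() faults while holding the mapcache lock, and the fault handler calls map_domain_page() again and spins on the same non-recursive lock. Every name below (toy_lock, toy_map, toy_always_mapped, backing_page) is hypothetical, and the lock reports re-entry instead of spinning so that the example terminates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a non-recursive spinlock. */
struct toy_lock {
    bool held;
};

/* Returns false in the case where a real spinlock would spin forever. */
static bool toy_trylock(struct toy_lock *l)
{
    if (l->held)
        return false;
    l->held = true;
    return true;
}

static void toy_unlock(struct toy_lock *l)
{
    assert(l->held);
    l->held = false;
}

static struct toy_lock mapcache_lock;
static char backing_page[4096];

/*
 * Models an on-demand mapping that takes a lock. When fault_during_scan
 * is set, a simulated fault handler re-enters the mapping path while the
 * lock is still held, which is the pattern assumed from the trace.
 */
static void *toy_map(bool fault_during_scan, bool *would_deadlock)
{
    if (!toy_trylock(&mapcache_lock)) {
        *would_deadlock = true;   /* a real spinlock would hang here */
        return NULL;
    }

    if (fault_during_scan)
        (void)toy_map(false, would_deadlock);   /* re-entry from the handler */

    toy_unlock(&mapcache_lock);
    return backing_page;
}

/*
 * Models a page that stays mapped for its whole lifetime: access needs
 * no mapping step and therefore no lock, so there is nothing to re-enter.
 */
static void *toy_always_mapped(void)
{
    return backing_page;
}
```

Calling toy_map(true, &flag) sets the flag, while toy_always_mapped() has no lock to contend on. That is the property the patch relies on by converting the domheap page into a xenheap page.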

