
Re: [Xen-devel] [PATCH] x86/vvmx: Fix deadlock with MSR bitmap merging



On 12/03/2020 13:25, Jan Beulich wrote:
> On 12.03.2020 13:06, Andrew Cooper wrote:
>> On 12/03/2020 10:58, Roger Pau Monné wrote:
>>> On Thu, Mar 12, 2020 at 10:30:35AM +0100, Roger Pau Monné wrote:
>>>> On Wed, Mar 11, 2020 at 06:34:55PM +0000, Andrew Cooper wrote:
>>>>> c/s c47984aabead "nvmx: implement support for MSR bitmaps" introduced a
>>>>> use of map_domain_page() which may get used in the middle of context
>>>>> switch.
>>>>>
>>>>> This is not safe, and causes Xen to deadlock on the mapcache lock:
>>>>>
>>>>>   (XEN) Xen call trace:
>>>>>   (XEN)    [<ffff82d08022d6ae>] R _spin_lock+0x34/0x5e
>>>>>   (XEN)    [<ffff82d0803219d7>] F map_domain_page+0x250/0x527
>>>>>   (XEN)    [<ffff82d080356332>] F do_page_fault+0x420/0x780
>>>>>   (XEN)    [<ffff82d08038da3d>] F x86_64/entry.S#handle_exception_saved+0x68/0x94
>>>>>   (XEN)    [<ffff82d08031729f>] F __find_next_zero_bit+0x28/0x69
>>>>>   (XEN)    [<ffff82d080321a4d>] F map_domain_page+0x2c6/0x527
>>>>>   (XEN)    [<ffff82d08029eeb2>] F nvmx_update_exec_control+0x1d7/0x323
>>>>>   (XEN)    [<ffff82d080299f5a>] F vmx_update_cpu_exec_control+0x23/0x40
>>>>>   (XEN)    [<ffff82d08029a3f7>] F arch/x86/hvm/vmx/vmx.c#vmx_ctxt_switch_from+0xb7/0x121
>>>>>   (XEN)    [<ffff82d08031d796>] F arch/x86/domain.c#__context_switch+0x124/0x4a9
>>>>>   (XEN)    [<ffff82d080320925>] F context_switch+0x154/0x62c
>>>>>   (XEN)    [<ffff82d080252f3e>] F common/sched/core.c#sched_context_switch+0x16a/0x175
>>>>>   (XEN)    [<ffff82d080253877>] F common/sched/core.c#schedule+0x2ad/0x2bc
>>>>>   (XEN)    [<ffff82d08022cc97>] F common/softirq.c#__do_softirq+0xb7/0xc8
>>>>>   (XEN)    [<ffff82d08022cd38>] F do_softirq+0x18/0x1a
>>>>>   (XEN)    [<ffff82d0802a2fbb>] F vmx_asm_do_vmentry+0x2b/0x30
>>>>>
>>>>> Convert the domheap page into being a xenheap page.
>>>> Fixes: c47984aabead5391 ('nvmx: implement support for MSR bitmaps')
>>>>
>>>>> Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
>>>>> ---
>>>>> CC: Jan Beulich <JBeulich@xxxxxxxx>
>>>>> CC: Wei Liu <wl@xxxxxxx>
>>>>> CC: Roger Pau Monné <roger.pau@xxxxxxxxxx>
>>>>> CC: Kevin Tian <kevin.tian@xxxxxxxxx>
>>>>>
>>>>> I suspect this is the not-quite-consistent-enough-to-bisect issue which
>>>>> OSSTest is hitting and interfering with pushes to master.
>>>>> ---
>>>>>  xen/arch/x86/hvm/vmx/vvmx.c        | 19 ++++---------------
>>>>>  xen/include/asm-x86/hvm/vmx/vvmx.h |  2 +-
>>>>>  2 files changed, 5 insertions(+), 16 deletions(-)
>>>>>
>>>>> diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
>>>>> index 926a11c15f..f049920196 100644
>>>>> --- a/xen/arch/x86/hvm/vmx/vvmx.c
>>>>> +++ b/xen/arch/x86/hvm/vmx/vvmx.c
>>>>> @@ -130,12 +130,9 @@ int nvmx_vcpu_initialise(struct vcpu *v)
>>>>>  
>>>>>      if ( cpu_has_vmx_msr_bitmap )
>>>>>      {
>>>>> -        nvmx->msr_merged = alloc_domheap_page(d, MEMF_no_owner);
>>>>> +        nvmx->msr_merged = alloc_xenheap_page();
>>>> Could we also use __map_domain_page_global here (keeping the domheap
>>>> allocation) in order to map the page on init and keep it mapped until
>>>> the domain is destroyed?
>>> Just read 'nvmx deadlock with MSR bitmaps' now and realized that you
>>> mention using map_domain_page_global there as an option also, so I
>>> guess you went with the xenheap page option because it was simpler.
>> A domheap page which is mapped globally for its entire lifetime is
>> strictly greater overhead than a xenheap page, because it also uses vmap
>> space.
>>
>> global domheap mappings are for where we need to maintain a mapping for
>> more than a single transient access, but we don't know if/what/where at
>> the time the domain is created.
> I didn't think that's the only criterion:

It isn't the only criterion.  domheap+global also lets you get
working NUMA positioning.  However...

> On large systems the
> xenheap may be exhausted while the domheap isn't, and hence
> using domheap pages (and global mappings) allows us to avoid
> -ENOMEM.

... on large systems, you are more likely to run out of vmap space than
xenheap space, seeing as the former is limited to 64G (inc iomap/fixmap
mappings), and the latter tops out at 4T.
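
To put rough numbers on that comparison, here is a back-of-the-envelope
sketch.  It assumes 4 KiB pages and reads "64G" and "4T" as binary units
(GiB/TiB).  The macro and function names are made up for illustration and
are not Xen code.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumption: 4 KiB pages, as on x86. */
#define PAGE_SHIFT 12

/* Assumption: "64G" and "4T" are read as binary units (GiB / TiB). */
#define VMAP_LIMIT_BYTES    (64ULL << 30)  /* shared with iomap/fixmap */
#define XENHEAP_LIMIT_BYTES (4ULL << 40)

/* Number of whole pages that fit in a region of the given size. */
static uint64_t pages_in(uint64_t bytes)
{
    return bytes >> PAGE_SHIFT;
}

/* Print both ceilings, in pages, and the ratio between them. */
static void print_limits(void)
{
    uint64_t vmap = pages_in(VMAP_LIMIT_BYTES);
    uint64_t xenheap = pages_in(XENHEAP_LIMIT_BYTES);

    printf("vmap ceiling:    %llu pages\n", (unsigned long long)vmap);
    printf("xenheap ceiling: %llu pages\n", (unsigned long long)xenheap);
    printf("ratio:           %llu\n", (unsigned long long)(xenheap / vmap));
}
```

Calling print_limits() from a main() gives the two ceilings in pages.  The
vmap figure is an upper bound only, since iomap/fixmap mappings come out of
the same 64G.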

~Andrew
