
Re: [Xen-devel] [PATCH v4 0/3] x86: modify_ldt improvement, test, and config option



On 29/07/15 06:28, Andy Lutomirski wrote:
> On Tue, Jul 28, 2015 at 8:01 PM, Boris Ostrovsky
> <boris.ostrovsky@xxxxxxxxxx> wrote:
>> On 07/28/2015 08:47 PM, Andrew Cooper wrote:
>>> On 29/07/2015 01:21, Andy Lutomirski wrote:
>>>> On Tue, Jul 28, 2015 at 10:10 AM, Boris Ostrovsky
>>>> <boris.ostrovsky@xxxxxxxxxx> wrote:
>>>>> On 07/28/2015 01:07 PM, Andy Lutomirski wrote:
>>>>>> On Tue, Jul 28, 2015 at 9:30 AM, Andrew Cooper
>>>>>> <andrew.cooper3@xxxxxxxxxx> wrote:
>>>>>>> I suspect that the set_ldt(NULL, 0) call hasn't reached Xen before
>>>>>>> xen_free_ldt() is attempting to nab back the pages which Xen still has
>>>>>>> mapped as an LDT.
>>>>>>>
>>>>>> I just instrumented it with yet more LSL instructions.  I'm pretty
>>>>>> sure that set_ldt really is clearing at least LDT entry zero.
>>>>>> Nonetheless the free_ldt call still oopses.
>>>>>>
>>>>> Yes, I added some instrumentation to the hypervisor and we definitely
>>>>> set LDT to NULL before failing.
>>>>>
>>>>> -boris
>>>> Looking at map_ldt_shadow_page: what keeps shadow_ldt_mapcnt from
>>>> getting incremented once on each CPU at the same time if both CPUs
>>>> fault in the same shadow LDT page at the same time?
>>> Nothing, but that is fine.  If a page is in use in two vcpus' LDTs, it is
>>> expected to have a type refcount of 2.
>>>
>>>> Similarly, what
>>>> keeps both CPUs from calling get_page_type at the same time and
>>>> therefore losing track of the page type reference count?
>>> a cmpxchg() loop in the depths of __get_page_type().
>>>
>>>> I don't see why vmalloc or vm_unmap_aliases would have anything to do
>>>> with this, though.
>>
>> So just for kicks I made lazy_max_pages() return 0 to free vmaps immediately
>> and the problem went away.
> As far as I can tell, this affects TLB flushes but not unmaps.  That
> means that my patch is totally bogus -- vm_unmap_aliases() *flushes*
> aliases but isn't involved in removing them from the page tables.
> That must be why xen_alloc_ldt and xen_set_ldt work today.
>
> So what does flushing the TLB have to do with anything?  The only
> thing I can think of is that it might force some deferred hypercalls
> out.  I can reproduce this easily on UP, so IPIs aren't involved.
>
> The other odd thing is that it seems like this happens when clearing
> the LDT and freeing the old one but not when setting the LDT and
> freeing the old one.  This is plausibly related to the lazy mode in
> effect at the time, but I have no evidence for that.
>
> Two more data points:  Putting xen_flush_mc before and after the
> SET_LDT multicall has no effect.  Putting flush_tlb_all() in
> xen_free_ldt doesn't help either, while vm_unmap_aliases() in the
> exact same place does help.

FYI, I have got a repro now and am investigating.

~Andrew
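
For anyone following the quoted exchange about __get_page_type(): a
cmpxchg() loop is what lets two CPUs take type references on the same
page at the same time without losing a count.  Below is a minimal C11
sketch of that general pattern.  It is not Xen's code: the struct
layout, the bit widths, and the names get_type_ref() and put_type_ref()
are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout (an assumption, not Xen's real one):
 * low 24 bits = type reference count, high 8 bits = page type. */
#define COUNT_MASK 0x00ffffffu
#define TYPE_SHIFT 24

struct page {
    _Atomic uint32_t type_info;
};

/* Take one type reference.  Fails if the page is already in use with a
 * different type, or if the counter would overflow. */
static bool get_type_ref(struct page *pg, uint32_t type)
{
    uint32_t old = atomic_load(&pg->type_info);

    for (;;) {
        uint32_t count = old & COUNT_MASK;
        uint32_t cur_type = old >> TYPE_SHIFT;

        if (count != 0 && cur_type != type)
            return false;               /* conflicting type in use */
        if (count == COUNT_MASK)
            return false;               /* counter would overflow */

        uint32_t next = (type << TYPE_SHIFT) | (count + 1);

        /* Succeeds only if nobody changed type_info since we read it.
         * On failure, 'old' is reloaded with the current value and the
         * checks above are redone, so no increment is ever lost. */
        if (atomic_compare_exchange_weak(&pg->type_info, &old, next))
            return true;
    }
}

/* Drop one type reference taken by get_type_ref(). */
static void put_type_ref(struct page *pg)
{
    uint32_t old = atomic_load(&pg->type_info);

    for (;;) {
        assert((old & COUNT_MASK) != 0);    /* must hold a reference */
        if (atomic_compare_exchange_weak(&pg->type_info, &old, old - 1))
            return;
    }
}
```

With this shape, two callers racing on the same page each end up
contributing exactly one increment, which matches the "type refcount
of 2" expectation described in the quoted text for a page used in two
vcpus' LDTs.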

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
