
Re: [Xen-devel] [PATCH 5/5] mm: Don't hold heap lock in alloc_heap_pages() longer than necessary

To: Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>, Jan Beulich <JBeulich@xxxxxxxx>
From: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
Date: Wed, 30 Aug 2017 14:06:06 +0100
Cc: sstabellini@xxxxxxxxxx, wei.liu2@xxxxxxxxxx, George.Dunlap@xxxxxxxxxxxxx, tim@xxxxxxx, ian.jackson@xxxxxxxxxxxxx, xen-devel@xxxxxxxxxxxxx
Delivery-date: Wed, 30 Aug 2017 13:07:57 +0000
List-id: Xen developer discussion <xen-devel.lists.xen.org>

On 30/08/17 13:59, Boris Ostrovsky wrote:
>>> This patch has been applied to staging, but it's got problems.  The
>>> following crash is rather trivial to provoke:
>>>
>>> ~Andrew
>>>
>>> (d19) Test result: SUCCESS
>>> (XEN) ----[ Xen-4.10-unstable  x86_64  debug=y   Tainted:    H ]----
>>> (XEN) CPU:    5
>>> (XEN) RIP:    e008:[<ffff82d0802252fc>] page_alloc.c#free_heap_pages+0x786/0x7a1
>>> ...
>>> (XEN) Pagetable walk from ffff82ffffffffe4:
>>> (XEN)  L4[0x105] = 00000000abe5b063 ffffffffffffffff
>>> (XEN)  L3[0x1ff] = 0000000000000000 ffffffffffffffff
>> Some negative offset into somewhere, it seems. Upon second
>> look I think the patch is simply wrong in its current shape:
>> free_heap_pages() looks for page_state_is(..., free) when
>> trying to merge chunks, while alloc_heap_pages() now sets
>> PGC_state_inuse outside of the locked area. I'll revert it right
>> away.
> Yes, so we do need to update page state under heap lock. I'll then move
> scrubbing (and checking) only to outside the lock.
>
> I am curious though, what was the test to trigger this? I ran about 100
> parallel reboots under memory pressure and never hit this.

# git clone git://xenbits.xen.org/xtf.git
# cd xtf
# make -j4 -s
# ./xtf-runner -qa

./xtf-runner deliberately doesn't wait synchronously for VMs to be
fully destroyed before starting the next test.  (Synchronously
destroying an HVM guest takes ~800ms longer than a PV guest, which I
expect is down to an interaction with qemu.  I got sufficiently annoyed
that I coded around the issue.)

As a result, destruction of one domain overlaps with construction of
the next one.
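
To illustrate why that overlap matters, here is a toy model of the
ordering Jan describes above.  It is only a sketch, not Xen code: the
struct, the on_free_list flag and simulate() are hypothetical names, and
the two "CPUs" are replayed as one fixed interleaving rather than real
threads.  It assumes the free path decides whether to merge purely from
the neighbour's state, as page_state_is(..., free) does.

```c
#include <stdbool.h>

/* Toy model of two buddy pages; NOT Xen's real data structures. */
enum page_state { STATE_FREE, STATE_INUSE };

struct page {
    enum page_state state; /* what the free path's merge check inspects */
    bool on_free_list;     /* whether the heap still owns the page */
};

/*
 * Replay one fixed interleaving: CPU A allocates page 0 while CPU B
 * frees its buddy, page 1.  Returns true if the free path tried to
 * merge with a page that had already been handed out.
 */
bool simulate(bool set_state_under_lock)
{
    struct page pg[2] = {
        { STATE_FREE,  true  },  /* page 0: free, about to be allocated */
        { STATE_INUSE, false },  /* page 1: in use, about to be freed */
    };
    bool bad_merge = false;

    /* CPU A, alloc path, heap lock held: take page 0 off the free list. */
    pg[0].on_free_list = false;
    if ( set_state_under_lock )
        pg[0].state = STATE_INUSE;
    /* CPU A drops the heap lock here. */

    /* CPU B, free path, heap lock held: free page 1, look at its buddy. */
    pg[1].state = STATE_FREE;
    if ( pg[0].state == STATE_FREE )
    {
        /* Buddy looks free, so merge; it had better be on the free list. */
        if ( !pg[0].on_free_list )
            bad_merge = true;
    }
    else
        pg[1].on_free_list = true;
    /* CPU B drops the heap lock here. */

    /* CPU A, outside the lock: the late state update of the broken order. */
    if ( !set_state_under_lock )
        pg[0].state = STATE_INUSE;

    return bad_merge;
}
```

In this model simulate(false) reports a bad merge and simulate(true)
does not, which matches the conclusion above that the page state has to
be updated while the heap lock is still held.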

~Andrew

