
Re: [Xen-devel] [PATCH 5/5] mm: Don't hold heap lock in alloc_heap_pages() longer than necessary

On 30/08/17 13:59, Boris Ostrovsky wrote:
>>> This patch has been applied to staging, but it's got problems.  The
>>> following crash is rather trivial to provoke:
>>> ~Andrew
>>> (d19) Test result: SUCCESS
>>> (XEN) ----[ Xen-4.10-unstable  x86_64  debug=y   Tainted:    H ]----
>>> (XEN) CPU:    5
>>> (XEN) RIP:    e008:[<ffff82d0802252fc>] page_alloc.c#free_heap_pages+0x786/0x7a1
>>> ...
>>> (XEN) Pagetable walk from ffff82ffffffffe4:
>>> (XEN)  L4[0x105] = 00000000abe5b063 ffffffffffffffff
>>> (XEN)  L3[0x1ff] = 0000000000000000 ffffffffffffffff
>> Some negative offset into somewhere, it seems. Upon second
>> look I think the patch is simply wrong in its current shape:
>> free_heap_pages() looks for page_state_is(..., free) when
>> trying to merge chunks, while alloc_heap_pages() now sets
>> PGC_state_inuse outside of the locked area. I'll revert it right
>> away.
> Yes, so we do need to update page state under heap lock. I'll then move
> scrubbing (and checking) only to outside the lock.
> I am curious though, what was the test to trigger this? I ran about 100
> parallel reboots under memory pressure and never hit this.

# git clone git://xenbits.xen.org/xtf.git
# cd xtf
# make -j4 -s
# ./xtf-runner -qa

Purposefully, ./xtf-runner doesn't synchronously wait for VMs to be
fully destroyed before starting the next test.  (Synchronously destroying
an HVM guest takes ~800ms longer than destroying a PV guest, which I
expect is down to an interaction with qemu.  I got sufficiently annoyed
that I coded around the issue.)

As a result, destruction of one domain will be happening while
construction of the next one is happening.
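The invariant at stake (Jan's diagnosis and Boris's proposed fix quoted above: update page state under the heap lock, and move only the scrubbing outside it) can be sketched with a toy allocator. This is a minimal illustration under stated assumptions, not Xen's page_alloc.c: `struct toy_page`, `toy_alloc()`, `toy_free()`, the 8-page heap and the "buddy is index ^ 1" rule are all hypothetical, and a pthread mutex stands in for the heap lock. The point it shows is that a page is claimed and marked in-use in the same critical section, so a concurrent free that inspects the buddy's state never sees a claimed page that still looks free.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy model of a heap; NOT Xen's page_alloc.c. */
enum toy_state { TOY_FREE = 0, TOY_INUSE };

#define TOY_NR_PAGES  8
#define TOY_PAGE_SIZE 64

struct toy_page {
    enum toy_state state;
    unsigned char data[TOY_PAGE_SIZE];
};

/* Zero-initialised, so every page starts out TOY_FREE. */
static struct toy_page toy_pages[TOY_NR_PAGES];
/* Stand-in for the heap lock. */
static pthread_mutex_t toy_heap_lock = PTHREAD_MUTEX_INITIALIZER;

/* Claim a free page and mark it in use under the lock; scrub it afterwards. */
static struct toy_page *toy_alloc(void)
{
    struct toy_page *pg = NULL;

    pthread_mutex_lock(&toy_heap_lock);
    for (size_t i = 0; i < TOY_NR_PAGES; i++) {
        if (toy_pages[i].state == TOY_FREE) {
            pg = &toy_pages[i];
            pg->state = TOY_INUSE;   /* state change stays inside the lock */
            break;
        }
    }
    pthread_mutex_unlock(&toy_heap_lock);

    /* Safe outside the lock: the page is already TOY_INUSE, so no other
     * allocator will claim it and no free will treat it as mergeable. */
    if (pg)
        memset(pg->data, 0, sizeof(pg->data));
    return pg;
}

/* Free a page; return true if its (hypothetical) buddy is free and could merge. */
static bool toy_free(struct toy_page *pg)
{
    size_t idx = (size_t)(pg - toy_pages);
    bool can_merge;

    pthread_mutex_lock(&toy_heap_lock);
    assert(pg->state == TOY_INUSE);
    pg->state = TOY_FREE;
    /* The buddy's state is read under the same lock that protects claiming. */
    can_merge = (toy_pages[idx ^ 1].state == TOY_FREE);
    pthread_mutex_unlock(&toy_heap_lock);
    return can_merge;
}
```

Freeing page 0 while page 1 is still allocated reports no merge; freeing page 1 afterwards does, and reallocating page 0 returns it scrubbed. If the claim and the state update were split across the unlock, as Jan points out the reverted patch did, a free running in that window could see the just-claimed buddy as free and merge it.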


Xen-devel mailing list