[Xen-devel] [PATCH] kexec: Add spinlock for the whole hypercall
When we concurrently try to unload and load crash images we eventually get:

Xen call trace:
   [<ffff82d08018b04f>] machine_kexec_add_page+0x3a0/0x3fa
   [<ffff82d08018b184>] machine_kexec_load+0xdb/0x107
   [<ffff82d080116e8d>] kexec.c#kexec_load_slot+0x11/0x42
   [<ffff82d08011724f>] kexec.c#kexec_load+0x119/0x150
   [<ffff82d080117c1e>] kexec.c#do_kexec_op_internal+0xab/0xcf
   [<ffff82d080117c60>] do_kexec_op+0xe/0x1e
   [<ffff82d08025c620>] pv_hypercall+0x20a/0x44a
   [<ffff82d080260116>] cpufreq.c#test_all_events+0/0x30

Pagetable walk from ffff820040088320:
   L4[0x104] = 00000002979d1063 ffffffffffffffff
   L3[0x001] = 00000002979d0063 ffffffffffffffff
   L2[0x000] = 00000002979c7063 ffffffffffffffff
   L1[0x088] = 80037a91ede97063 ffffffffffffffff

The interesting thing is that the page bits (063) look legitimate. The
operation on which we blow up is writing the L1 entry, only to find that
the L2 entry points to some bizarre MFN. It smells of a race, and the
issue appears to be the lack of any locking around the crash kernel
region.

Specifically, we concurrently call kimage_alloc_crash_control_page(),
which iterates over kexec_crash_area.start -> kexec_crash_area.size and,
once a hole is found, clears it:

    if ( page )
    {
        image->next_crash_page = hole_end;
        clear_domain_page(_mfn(page_to_mfn(page)));
    }

Since the callers provide the parameters for which MFN to use (and the
area to search is bounded), the 'page' is most likely the same in both
callers. So, first, we concurrently clear the 'control_code_page'.

The next step is passing this 'control_code_page' to
machine_kexec_add_page(). That function needs the MFN:
page_to_maddr(image->control_code_page). This always resolves to the
same address, as the MFN of the control_code_page lies inside the
kexec_crash_area.start -> kexec_crash_area.size region. Then
machine_kexec_add_page() updates the L1 entry, which can also happen
concurrently, and on subsequent calls we mangle it up.

This is all a theory at this time, but testing reveals that adding a
spinlock around the whole kexec hypercall fixes the crash.

An alternative solution would be to take the spinlock only on paths
that touch the CRASH region rather than around everything (a sketch of
that alternative follows the notes below).

NOTE: The spinlock in kexec_swap_images() is removed, as that function
is only reachable via the kexec hypercall, which is now entirely
protected by a spinlock; the local spinlock is therefore no longer
necessary.

NOTE: This patch follows 5c5216 ("kexec: clear kexec_image slot when
unloading kexec image") in preventing crashes during simultaneous
load/unloads.
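For illustration only, here is a minimal sketch of that alternative,
keyed off KEXEC_TYPE_CRASH. The helper crash_aware_load() and the
crash_region_lock are hypothetical names invented for this sketch, not
the actual Xen code, and this is not what the patch below does:

    /*
     * Hypothetical finer-grained alternative (NOT this patch): take a
     * lock only around work that touches the reserved crash region.
     * crash_aware_load() and crash_region_lock are illustrative names.
     */
    static DEFINE_SPINLOCK(crash_region_lock);

    static int crash_aware_load(struct kexec_image *kimage)
    {
        int ret;

        if ( kimage->type == KEXEC_TYPE_CRASH )
        {
            /*
             * The hole scan over kexec_crash_area, the
             * clear_domain_page() of the control page, and the
             * machine_kexec_add_page() L1 updates must be serialized,
             * or two loads can pick and then map the same page.
             */
            spin_lock(&crash_region_lock);
            ret = machine_kexec_load(kimage);
            spin_unlock(&crash_region_lock);
        }
        else
            /* Default-type images do not touch the crash region. */
            ret = machine_kexec_load(kimage);

        return ret;
    }

The trade-off is scope: the whole-hypercall lock is simpler and also
covers unload and swap, at the cost of serializing operations that never
touch the crash region.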
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
Signed-off-by: Eric DeVolder <eric.devolder@xxxxxxxxxx>
Reviewed-by: Daniel Kiper <daniel.kiper@xxxxxxxxxx>
---
 xen/common/kexec.c | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/xen/common/kexec.c b/xen/common/kexec.c
index 072cc8e..5bcc356 100644
--- a/xen/common/kexec.c
+++ b/xen/common/kexec.c
@@ -819,7 +819,6 @@ static int kexec_exec(XEN_GUEST_HANDLE_PARAM(void) uarg)
 static int kexec_swap_images(int type, struct kexec_image *new,
                              struct kexec_image **old)
 {
-    static DEFINE_SPINLOCK(kexec_lock);
     int base, bit, pos;
     int new_slot, old_slot;
 
@@ -831,8 +830,6 @@ static int kexec_swap_images(int type, struct kexec_image *new,
     if ( kexec_load_get_bits(type, &base, &bit) )
         return -EINVAL;
 
-    spin_lock(&kexec_lock);
-
     pos = (test_bit(bit, &kexec_flags) != 0);
     old_slot = base + pos;
     new_slot = base + !pos;
@@ -845,8 +842,6 @@ static int kexec_swap_images(int type, struct kexec_image *new,
     clear_bit(old_slot, &kexec_flags);
     *old = kexec_image[old_slot];
 
-    spin_unlock(&kexec_lock);
-
     return 0;
 }
 
@@ -1187,12 +1182,22 @@ static int do_kexec_op_internal(unsigned long op,
                                 XEN_GUEST_HANDLE_PARAM(void) uarg,
                                 bool_t compat)
 {
+    static DEFINE_SPINLOCK(kexec_lock);
     int ret = -EINVAL;
 
     ret = xsm_kexec(XSM_PRIV);
     if ( ret )
         return ret;
 
+    /*
+     * Because we write directly to the reserved memory
+     * region when loading crash kernels, we need a spinlock here to
+     * prevent multiple crash kernels from attempting to load
+     * simultaneously, and to prevent a crash kernel from loading
+     * over the top of an in-use crash kernel.
+     */
+    spin_lock(&kexec_lock);
+
     switch ( op )
     {
     case KEXEC_CMD_kexec_get_range:
@@ -1227,6 +1232,8 @@ static int do_kexec_op_internal(unsigned long op,
         break;
     }
 
+    spin_unlock(&kexec_lock);
+
     return ret;
 }
--
2.7.4