[Xen-devel] Strange PVM spinlock case revisited
A while ago I reported an issue that causes Xen PVM guests to hang
(apparently related to spinlocks). Since then I have been able to
gather a few more facts, which I try to provide below. As to the root
cause, I am still puzzled and would be thankful for any hints or
directions.
- Happens with Xen 4.1.2 and Xen 4.2 host-side.
- PVM guest with 8 VCPUs is running 800 threads doing DB updates.
- When hanging, at least 2 VCPUs are in xen_spin_lock_slow() with the
  lock free. IIRC the weird lock always belongs to one VCPU's runqueue.
- The testcase shows the problem for guest kernel versions v3.2..v3.5
  (not verified farther back). Since v3.6 it cannot be reproduced. A
  bisect ends up at commit 611ae8e3f5204f7480b3b405993b3352cfa16662
  ("x86/tlb: enable tlb flush range support for x86").
- The other potential fix was to prevent xen_spin_lock_slow() from
  unmasking events (enabling interrupts) if they were enabled before
  the spinlock call.
Maybe allowing or disallowing interrupts in xen_spin_lock_slow() just
tips the likelihood of nesting deeper, in the same way that the changed
TLB flush may reduce it by causing fewer IPIs.
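For reference, the control flow in question looks roughly like this. This is a heavily simplified, self-contained model of the v3.2-era slow path, not the real kernel code; every helper here is a stand-in, and the nesting counter only illustrates where an upcall could re-enter:

```c
#include <assert.h>
#include <stdbool.h>

static bool irqs_enabled;   /* models the VCPU event-channel mask */
static int  nesting_depth;  /* counts where event handling could nest */

static void raw_local_irq_enable(void)  { irqs_enabled = true; }
static void raw_local_irq_disable(void) { irqs_enabled = false; }

/* Stand-in for xen_poll_irq(): while the VCPU blocks in the poll
 * hypercall, a pending event can be delivered only if events are
 * unmasked -- which is exactly where deeper nesting can start. */
static void xen_poll_irq(void)
{
    if (irqs_enabled)
        nesting_depth++;  /* an upcall (and another slow path) may run here */
}

static void xen_spin_lock_slow(bool irq_enable)
{
    /* The suspect step: re-enable event delivery while waiting,
     * if the caller had interrupts enabled before taking the lock. */
    if (irq_enable)
        raw_local_irq_enable();

    xen_poll_irq();  /* block until kicked by the lock holder */

    raw_local_irq_disable();
}
```

With irq_enable suppressed (the second potential fix above), the poll never runs with events unmasked, so no upcall can nest into the slow path from there.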
From the static information I can extract from a dump it is hard to
tell what exactly is going on. VCPU2 seems to be doing something but
looks to be rather deep in event handling. Its irq_count is at 2, but
irq_count seems to be used in a few more places than just the
xen_hypervisor callback, though that at least is the entry I can see
on VCPU2's stack. The contents of the per-cpu variables irq_regs and
irq_stack_ptr at least look consistent with a callback coming in while
being in the sched_op hypercall. But then the bt command seems to be
operating on a completely different stack.
VCPU2 seems to be active, not polling (event processing could be
initiated from enable_irqs via the force callback, or before finally
exiting the sched_op hypercall); there seems to be a pending upcall,
but upcalls are masked.
The next thing I would try is to instrument the sites incrementing and
decrementing irq_count.
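Such instrumentation could look roughly like the following. Everything here is hypothetical (none of these names exist in the kernel); it only sketches the shape of the experiment, recording the call site of each transition so one could see which paths left irq_count at 2:

```c
#include <assert.h>

#define IRQ_COUNT_LOG 16

struct irq_count_event {
    int         delta;  /* +1 or -1 */
    const char *site;   /* __func__ of the adjusting caller */
};

static int irq_count = -1;  /* matches the idle value seen in the dump */
static struct irq_count_event log_buf[IRQ_COUNT_LOG];
static int log_pos;

/* Every increment/decrement goes through here so the transition and
 * its call site end up in a small ring buffer for post-mortem reading. */
static void irq_count_adjust(int delta, const char *site)
{
    irq_count += delta;
    log_buf[log_pos % IRQ_COUNT_LOG] =
        (struct irq_count_event){ .delta = delta, .site = site };
    log_pos++;
}

#define irq_count_inc() irq_count_adjust(+1, __func__)
#define irq_count_dec() irq_count_adjust(-1, __func__)

/* Example caller: the hypervisor callback path would bump the counter
 * on entry and drop it again on exit. */
static void fake_hypervisor_callback(void)
{
    irq_count_inc();
    /* ... dispatch pending events, possibly nesting ... */
    irq_count_dec();
}
```

A dump taken while hung would then show not just irq_count == 2 but which two increments were never balanced.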
---
Below some of the info seen from dom0 debugging keys and looking
at a dump with crash:
Dom0 Info:
Polling vCPUs: {1,5-7}
VCPU0: CPU0 [has=F] poll=0 upcall_pend = 00, upcall_mask = 00
VCPU1: CPU1 [has=F] poll=10 upcall_pend = 00, upcall_mask = 00
VCPU2: CPU6 [has=T] poll=0 upcall_pend = 01, upcall_mask = 01
VCPU3: CPU2 [has=F] poll=0 upcall_pend = 00, upcall_mask = 00
VCPU4: CPU4 [has=F] poll=0 upcall_pend = 00, upcall_mask = 00
VCPU5: CPU3 [has=F] poll=0 upcall_pend = 00, upcall_mask = 00
VCPU6: CPU7 [has=F] poll=40 upcall_pend = 01, upcall_mask = 01
VCPU7: CPU4 [has=F] poll=46 upcall_pend = 01, upcall_mask = 01
10 [0/1]: s=6 n=1 x=0
40 [0/1]: s=6 n=6 x=0
46 [0/1]: s=6 n=7 x=0
Guest Info:
lock_spinners:
[0] (struct xen_spinlock *) 0x
[1] (struct xen_spinlock *) 0xffff8803bffc8ae8 = { lock=1, spinners=2 }
[2] (struct xen_spinlock *) 0xffff8803bfc93700 = { lock=0, spinners=2 }
[3] (struct xen_spinlock *) 0x
[4] (struct xen_spinlock *) 0x
[5] (struct xen_spinlock *) 0xffff8803bffc8ae8 -> [1]
[6] (struct xen_spinlock *) 0xffff8803bfc93700 -> [2]
[7] (struct xen_spinlock *) 0xffffffff81f15ef0 = { lock=1, spinners=1 }
VCPU[2,6] -> lock of runqueue[2] = (struct rq *) 0xffff8803bfc93700
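For readers unfamiliar with the pre-ticketlock PV spinlock code: lock_spinners is a per-VCPU pointer to the lock that VCPU is currently polling on, which is why two VCPUs can legitimately show the same lock, as VCPUs 2 and 6 do above. A minimal single-threaded model of that bookkeeping (simplified stand-ins, no atomics, not the real kernel code):

```c
#include <assert.h>
#include <stddef.h>

struct xen_spinlock {
    unsigned char  lock;      /* 0: free, 1: held */
    unsigned short spinners;  /* VCPUs waiting in the slow path */
};

#define NR_VCPUS 8
static struct xen_spinlock *lock_spinners[NR_VCPUS];

/* Record @xl as the lock this VCPU polls on; return the lock it was
 * already spinning on (non-NULL when an interrupt nested into the
 * slow path), so the caller can restore it afterwards. */
static struct xen_spinlock *
spinning_lock(int cpu, struct xen_spinlock *xl)
{
    struct xen_spinlock *prev = lock_spinners[cpu];
    lock_spinners[cpu] = xl;
    xl->spinners++;
    return prev;
}

static void
unspinning_lock(int cpu, struct xen_spinlock *xl, struct xen_spinlock *prev)
{
    xl->spinners--;
    lock_spinners[cpu] = prev;  /* restore the outer lock, if any */
}
```

The nesting save/restore is also why a stale entry can survive if an unbalanced path skips the restore.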
irq_count[1] = 1
irq_count[2] = 2
It is -1 for the rest.
*(struct pt_regs *) irq_regs[2] = {
.ip = 0xffffffff810013aa == hypercall_page+938 -> sched_op
.sp = 0xffff8803bfc83ce8
}
char *irq_stack_ptr[2] = 0xffff8803bfc83fc0
#> bt -T ffff88033d65c4a0
PID: 2050 TASK: ffff88033d65c4a0 CPU: 2 COMMAND: "postgres"
[ffff88033d72d530] get_page_from_freelist at ffffffff8111e6df
[ffff88033d72d600] __alloc_pages_nodemask at ffffffff8111ebec
[ffff88033d72d620] check_preempt_curr at ffffffff81050254
[ffff88033d72d640] pull_task at ffffffff8105e33e
[ffff88033d72d670] balance_tasks at ffffffff8105e4b6
[ffff88033d72d680] radix_tree_lookup_slot at ffffffff8130db5e
[ffff88033d72d690] find_get_page at ffffffff811161ee
[ffff88033d72d6b0] find_lock_page at ffffffff81116286
[ffff88033d72d6e0] shmem_getpage_gfp at ffffffff8112dd10
[ffff88033d72d748] pte_pfn_to_mfn at ffffffff81005a78
[ffff88033d72d7a0] get_page_from_freelist at ffffffff8111e6df
[ffff88033d72d7c0] _raw_spin_lock at ffffffff81655e4e
[ffff88033d72d7d0] ext4_ext_check_cache at ffffffff81239248
[ffff88033d72d820] ext4_ext_map_blocks at ffffffff8123e269
[ffff88033d72d870] _raw_spin_lock at ffffffff81655e4e
[ffff88033d72d880] enqueue_to_backlog at ffffffff8153e55f
[ffff88033d72d890] __wake_up_common at ffffffff8104c1c8
[ffff88033d72d8e0] netif_rx.part.82 at ffffffff8153f2de
[ffff88033d72d920] netif_rx at ffffffff8153f400
[ffff88033d72d960] loopback_xmit at ffffffff8146fb1c
[ffff88033d72d990] dev_hard_start_xmit at ffffffff8153fea6
[ffff88033d72d9d0] do_softirq at ffffffff81016284
[ffff88033d72d9f0] local_bh_enable at ffffffff8106d614
[ffff88033d72da00] dev_queue_xmit at ffffffff81540309
[ffff88033d72da50] ip_finish_output at ffffffff815769ab
[ffff88033d72da90] ip_output at ffffffff81577518
[ffff88033d72dac0] ip_local_out at ffffffff81576c09
[ffff88033d72dae0] ip_send_skb at ffffffff81577f4b
[ffff88033d72db00] udp_send_skb at ffffffff8159aaa1
[ffff88033d72db40] ip_generic_getfrag at ffffffff81575030
[ffff88033d72db50] udp_sendmsg at ffffffff8159bd38
[ffff88033d72db90] irq_to_desc at ffffffff810d70b7
[ffff88033d72dbd0] notify_remote_via_irq at ffffffff813a74c1
[ffff88033d72dc10] ttwu_queue at ffffffff81052672
[ffff88033d72dc38] _raw_spin_unlock_irqrestore at ffffffff8165605e
[ffff88033d72dc90] inet_sendmsg at ffffffff815a6474
--- bt -t would end here
[ffff88033d72dcb0] apparmor_socket_sendmsg at ffffffff812d43f7
[ffff88033d72dcd0] sock_sendmsg at ffffffff815266fe
[ffff88033d72dd70] md_make_request at ffffffff814e0606
[ffff88033d72ddd0] kmem_cache_free at ffffffff811605cf
[ffff88033d72de10] mempool_free_slab at ffffffff81118507
[ffff88033d72de20] mempool_free at ffffffff811185b7
[ffff88033d72de50] sys_sendto at ffffffff8152974d
[ffff88033d72dee0] ext4_sync_file at ffffffff8120f07b
[ffff88033d72df00] vfs_write at ffffffff811764b0
[ffff88033d72df50] xen_hypervisor_callback at ffffffff816606db
RIP: 00007f6852c7b39a RSP: 00007fff431454e0 RFLAGS: 00000212
RAX: 00007f6852fb38b8 RBX: 00007f6852fb3720 RCX: 000000000000014a
RDX: 0000000000000150 RSI: 0000000000000140 RDI: 00007f6852fb3720
RBP: 0000000000000150 R8: 0000000000000015 R9: 0000000000000000
R10: 0000000000000000 R11: 00007f6852c9174a R12: 00007f6852fb3778
R13: 0000000000000140 R14: 00007f6852fb38b8 R15: 0000000000000001
ORIG_RAX: ffffffffffffffff CS: e033 SS: e02b
#> bt -T ffff88033b812dc0
PID: 2069 TASK: ffff88033b812dc0 CPU: 6 COMMAND: "postgres"
[ffff88033b897530] get_page_from_freelist at ffffffff8111e6df
[ffff88033b897600] __alloc_pages_nodemask at ffffffff8111ebec
[ffff88033b897640] radix_tree_lookup at ffffffff8130db6b
[ffff88033b897650] irq_to_desc at ffffffff810d70b7
[ffff88033b897660] irq_get_irq_data at ffffffff810d9f4e
[ffff88033b897670] balance_tasks at ffffffff8105e493
[ffff88033b897680] radix_tree_lookup_slot at ffffffff8130db5e
[ffff88033b897690] find_get_page at ffffffff811161ee
[ffff88033b8976b0] find_lock_page at ffffffff81116286
[ffff88033b8976e0] shmem_getpage_gfp at ffffffff8112dd10
[ffff88033b897748] pte_pfn_to_mfn at ffffffff81005a78
[ffff88033b897770] xen_set_pte_at at ffffffff81008e29
[ffff88033b897788] __raw_callee_save_xen_make_pte at ffffffff810052cd
[ffff88033b8977b0] unlock_page at ffffffff8111589a
[ffff88033b8977d0] __do_fault at ffffffff81138ac9
[ffff88033b8978a0] pvclock_clocksource_read at ffffffff8103dd15
[ffff88033b8978f0] xen_clocksource_read at ffffffff8100a850
[ffff88033b897900] sched_clock at ffffffff8101bd79
[ffff88033b897910] blk_rq_init at ffffffff812ec262
[ffff88033b897930] get_request at ffffffff812f004e
[ffff88033b8979c0] cpumask_next_and at ffffffff81308c66
[ffff88033b8979e0] find_busiest_group at ffffffff8105a061
--- bt -t ends here
[ffff88033b897a30] radix_tree_lookup at ffffffff8130db6b
[ffff88033b897a40] irq_to_desc at ffffffff810d70b7
[ffff88033b897a50] irq_get_irq_data at ffffffff810d9f4e
[ffff88033b897a60] info_for_irq at ffffffff813a636e
[ffff88033b897a80] xen_poll_irq_timeout at ffffffff813a696e
[ffff88033b897ac0] xen_poll_irq at ffffffff813a8450
[ffff88033b897ad0] xen_spin_lock_slow at ffffffff8163ad77
[ffff88033b897b20] xen_spin_lock at ffffffff810121da
[ffff88033b897b40] _raw_spin_lock at ffffffff81655e4e
[ffff88033b897b50] double_rq_lock at ffffffff8105119c
[ffff88033b897b80] load_balance at ffffffff8105e788
[ffff88033b897b88] _raw_spin_unlock_irqrestore at ffffffff8165605e
[ffff88033b897c10] idle_balance at ffffffff8163d57c
[ffff88033b897c60] __schedule at ffffffff81653ea9
[ffff88033b897cc0] sleep_on_page at ffffffff81115810
[ffff88033b897ce0] schedule at ffffffff81653fcf
[ffff88033b897cf0] io_schedule at ffffffff8165407f
[ffff88033b897d10] sleep_on_page at ffffffff8111581e
[ffff88033b897d20] __wait_on_bit at ffffffff8165489f
[ffff88033b897d70] wait_on_page_bit at ffffffff81115988
[ffff88033b897da0] wake_bit_function at ffffffff8108a140
[ffff88033b897dc0] filemap_fdatawait_range at ffffffff81115a9c
[ffff88033b897e60] do_writepages at ffffffff81120ee1
[ffff88033b897eb0] filemap_write_and_wait_range at ffffffff81117398
[ffff88033b897ee0] ext4_sync_file at ffffffff8120ef9f
[ffff88033b897f00] vfs_write at ffffffff811764b0
[ffff88033b897f40] do_fsync at ffffffff811a4a36
[ffff88033b897f70] sys_fdatasync at ffffffff811a4d83
[ffff88033b897f80] system_call_fastpath at ffffffff8165e442
RIP: 00007f6852ce8240 RSP: 00007fff43145618 RFLAGS: 00000246
RAX: 000000000000004b RBX: ffffffff8165e442 RCX: 00007f6852ce1900
RDX: 00000000000000c5 RSI: 000000000000000c RDI: 000000000000000c
RBP: 00000000000000c5 R8: 000000000073a550 R9: 0000000000000000
R10: 0000000000000004 R11: 0000000000000246 R12: ffffffff811a4d83
R13: ffff88033b897f78 R14: 0000000000000001 R15: 0000000000af7488
ORIG_RAX: 000000000000004b CS: e033 SS: e02b
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel