[Xen-devel] Intermittent xenvif_disconnect() hang on domU destroy
I'm seeing the xenwatch kernel thread hang intermittently when destroying a domU on a recent stable Xen 4.5 with a Linux 4.4.11 + grsec dom0. The domU is created with a virtual network interface connected to a physical interface (ixgbevf) via an openvswitch virtual switch. Everything works fine until the domain is destroyed. Once in a while, a few seconds after the domain goes away, xenwatch hangs in xenvif_disconnect() while calling kthread_stop() on a dealloc task.

I added a warning to xenvif_dealloc_kthread_should_stop() that fires when kthread_should_stop() is true and queue->inflight_packets > 0, printing inflight_packets as well as the stats.tx_zerocopy_* counters. Each time the hang occurs, inflight_packets == 1 and tx_zerocopy_sent == tx_zerocopy_success + tx_zerocopy_fail + 1. I also added a warning to xenvif_skb_zerocopy_complete() for the case where queue->task is NULL.

If I manually bring down the physical interface to which the vif was connected (ifconfig down), this somehow causes the last in-flight packet to be freed, and everything is unblocked. The log below shows xenwatch hung trying to stop vif44.0-q0-dealloc, then waking up again after I bring down the physical interface net0_52:

[xl destroy]
...
[ 2914.510070] net vif44.0: stopping vif44.0-q0-dealloc task (pid 28045)
[ 2914.510224] xen_netback:xenvif_dealloc_kthread_should_stop: vif44.0-q0-dealloc task (pid 28045) should_stop=1 inflight_packets=1 tx_zerocopy_sent=209494 tx_zerocopy_success=209492 tx_zerocopy_fail=1
...
[ 2933.561404] device net0_52 left promiscuous mode
[ 2933.564813] device vif44.0 left promiscuous mode
[ 3136.324009] INFO: task xenwatch:29 blocked for more than 120 seconds.
[ 3136.324119] Not tainted 4.4.11-grsec-skyport #66
[ 3136.324181] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 3136.324284] xenwatch D ffffc90005d0baf8 12640 29 2 0x00000000
[ 3136.324411] ffffc90005d0baf8 0000000000000000 ffffffff81b70d60 ffff8804ec08c500
[ 3136.324536] ffff8804ec08dbb8 ffffc900115abec0 ffffc900115abeb8 ffff8804ec08c500
[ 3136.324662] 0000000000000005 ffffc90005d0bb10 ffffffff816d6ff7 7fffffffffffffff
[ 3136.324806] Call Trace:
[ 3136.324866] [<ffffffff816d6ff7>] schedule+0x37/0x80
[ 3136.324932] [<ffffffff816dbdec>] schedule_timeout+0x1bc/0x240
[ 3136.325004] [<ffffffff8100a73c>] ? xen_clocksource_read+0x1c/0x30
[ 3136.325078] [<ffffffff810194e3>] ? sched_clock+0x13/0x20
[ 3136.325153] [<ffffffff8109f72c>] ? local_clock+0x1c/0x20
[ 3136.325228] [<ffffffff810bdd69>] ? mark_held_locks+0x79/0xa0
[ 3136.325298] [<ffffffff816dd077>] ? _raw_spin_unlock_irq+0x27/0x50
[ 3136.325367] [<ffffffff810bdecd>] ? trace_hardirqs_on_caller+0x13d/0x1d0
[ 3136.325441] [<ffffffff816d8426>] wait_for_completion+0xd6/0x110
[ 3136.325514] [<ffffffff81099570>] ? wake_up_q+0x70/0x70
[ 3136.325585] [<ffffffff8108f087>] kthread_stop+0x47/0x80
[ 3136.325660] [<ffffffff814f1661>] xenvif_disconnect+0xb1/0x130
[ 3136.325729] [<ffffffff814ef3c6>] set_backend_state+0x116/0xde0
[ 3136.325805] [<ffffffff8143717e>] ? xenbus_gather+0x10e/0x140
[ 3136.325881] [<ffffffff811a5e42>] ? kfree+0x1c2/0x1e0
[ 3136.325960] [<ffffffff8109f72c>] ? local_clock+0x1c/0x20
[ 3136.326026] [<ffffffff814f0577>] frontend_changed+0xb7/0xc0
[ 3136.326095] [<ffffffff81437fb0>] xenbus_otherend_changed+0x80/0x90
[ 3136.330341] [<ffffffff81437410>] ? unregister_xenbus_watch+0x260/0x260
[ 3136.330414] [<ffffffff81438d2b>] frontend_changed+0xb/0x10
[ 3136.330483] [<ffffffff8143744a>] xenwatch_thread+0x3a/0x130
[ 3136.330553] [<ffffffff810b2ba0>] ? wake_up_atomic_t+0x30/0x30
[ 3136.330621] [<ffffffff8108eccc>] kthread+0xfc/0x120
[ 3136.330686] [<ffffffff8108ebd0>] ? kthread_create_on_node+0x240/0x240
[ 3136.330775] [<ffffffff816ddbee>] ret_from_fork+0x3e/0x70
[ 3136.330840] [<ffffffff8108ebd0>] ? kthread_create_on_node+0x240/0x240
[ 3136.330911] 1 lock held by xenwatch/29:
[ 3136.330972] #0: (xenwatch_mutex){+.+.+.}, at: [<ffffffff814374a7>] ffffffff814374a7
...
[ifconfig net0_52 down]
...
[ 3162.217907] ixgbe 0000:81:00.0 net0: VF Reset msg received from vf 52
[ 3162.228840] ------------[ cut here ]------------
[ 3162.228945] WARNING: CPU: 3 PID: 31978 at drivers/net/xen-netback/interface.c:71 xenvif_skb_zerocopy_complete+0x79/0x90()
[ 3162.229104] vif44.0: dead queue vif44.0-q0 decremented inflight_packets to 0
[ 3162.229184] Modules linked in: xt_physdev br_netfilter bridge stp llc tun xen_blkback ip_gre ip_tunnel gre ixgbevf drbg ansi_cprng dm_crypt algif_skcipher af_alg xen_evtchn xenfs xen_privcmd xen_pciback openvswitch nf_defrag_ipv6 libcrc32c nfsd auth_rpcgss nfs_acl nfs lockd grace fscache sunrpc nf_log_ipv4 nf_log_common xt_LOG xt_limit xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack iptable_filter ip_tables x_tables iTCO_wdt iTCO_vendor_support x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul raid1 aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd sb_edac lpc_ich mfd_core mei_me mei i2c_i801 ioatdma ipmi_ssif ipmi_msghandler squashfs lz4_decompress ixgbe mdio vxlan xhci_pci xhci_hcd ip6_udp_tunnel udp_tunnel ptp pps_core dca ahci libahci ehci_pci ehci_hcd usbcore usb_common tpm_tis
[ 3162.230580] CPU: 3 PID: 31978 Comm: ifconfig Not tainted 4.4.11-grsec-skyport #66
[ 3162.230681] Hardware name: ABCD, BIOS SE5C610.86B.01.01.0009.C1.060120151350 06/01/2015
[ 3162.230814] 0000000000000000 ffffc9000f7c3a68 ffffffff8137d3f8 0000000000000000
[ 3162.230938] ffffc9000f7c3ab0 ffffffff81aa9060 ffffc9000f7c3aa0 ffffffff81068de8
[ 3162.231062] ffffc90011565000 ffffc9001156f788 0000000000000001 ffffc90011565000
[ 3162.231187] Call Trace:
[ 3162.231251] [<ffffffff8137d3f8>] dump_stack+0x9a/0xe2
[ 3162.231324] [<ffffffff81068de8>] warn_slowpath_common+0x78/0xb0
[ 3162.231392] [<ffffffff81068e67>] warn_slowpath_fmt+0x47/0x50
[ 3162.231469] [<ffffffff810bdecd>] ? trace_hardirqs_on_caller+0x13d/0x1d0
[ 3162.231540] [<ffffffff814f0dc9>] xenvif_skb_zerocopy_complete+0x79/0x90
[ 3162.231612] [<ffffffff814ede0f>] xenvif_zerocopy_callback+0x9f/0xc0
[ 3162.231694] [<ffffffff8156cdb4>] skb_release_data+0xc4/0xe0
[ 3162.231761] [<ffffffff8156cdef>] skb_release_all+0x1f/0x30
[ 3162.231828] [<ffffffff8156cecd>] consume_skb+0x1d/0x40
[ 3162.231900] [<ffffffff81586a74>] __dev_kfree_skb_any+0x34/0x40
[ 3162.231974] [<ffffffffa0bd00d0>] ixgbevf_unmap_and_free_tx_resource.isra.46+0x20/0x80 [ixgbevf]
[ 3162.232083] [<ffffffffa0bd016c>] ixgbevf_clean_tx_ring+0x3c/0x80 [ixgbevf]
[ 3162.232156] [<ffffffffa0bd418e>] ixgbevf_down+0x2be/0x330 [ixgbevf]
[ 3162.232226] [<ffffffffa0bd5332>] ixgbevf_close+0x22/0xa0 [ixgbevf]
[ 3162.232299] [<ffffffff81581240>] __dev_close_many+0x90/0xe0
[ 3162.232378] [<ffffffff815813be>] __dev_close+0x2e/0x50
[ 3162.232448] [<ffffffff8158cee8>] __dev_change_flags+0x98/0x160
[ 3162.232517] [<ffffffff8158cfd4>] dev_change_flags+0x24/0x60
[ 3162.232590] [<ffffffff8161f534>] devinet_ioctl+0x834/0x8f0
[ 3162.232661] [<ffffffff8162077b>] inet_ioctl+0x4b/0x70
[ 3162.232739] [<ffffffff8155e6c0>] sock_do_ioctl+0x20/0x50
[ 3162.232804] [<ffffffff8155e8d0>] sock_ioctl+0x1e0/0x290
[ 3162.232876] [<ffffffff811da630>] do_vfs_ioctl+0x430/0x7f0
[ 3162.232939] [<ffffffff811daa64>] SyS_ioctl+0x74/0x80
[ 3162.233009] [<ffffffff816dd83a>] entry_SYSCALL_64_fastpath+0x16/0x7e
[ 3162.233129] ---[ end trace 2e4a237dee6f3318 ]---
[ 3162.357009] xen:events: domain 45 does not have 103 anymore
[ 3162.357099] xen:events: domain 45 does not have 102 anymore
...
[xenwatch unblocked]
...
[ 3162.394227] net vif50.0: stopping vif50.0-q0-dealloc task (pid 29413)
[ 3162.667880] net vif42.0: stopping vif42.0-q0-dealloc task (pid 27272)
[ 3162.705464] net vif48.0: stopping vif48.0-q0-dealloc task (pid 28779)

It's not clear to me whether the problem lies in netback, ixgbevf, or somewhere in between. Is the root cause ixgbevf holding on to an skb for so long and doing nothing with it until I bring the interface down, or is that a symptom of some other problem? Or is netback supposed to somehow flush in-flight transmit packets before it gets as far as xenvif_disconnect()? Or should it forget about the in-flight packets, since the interface is disappearing anyway?

Any clues would be appreciated.

--Ed

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
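A minimal userspace model of the stop/drain handshake described in the report may help when reasoning about the hang. It assumes, as the instrumentation suggests, that the dealloc thread only exits once kthread_should_stop() is true and inflight_packets has dropped to 0, and that the only thing decrementing the counter is the zerocopy completion fired when the lower driver finally frees the skb. The logged counters are consistent with exactly one outstanding packet: 209494 - 209492 - 1 = 1. Everything prefixed model_ is hypothetical, and POSIX threads stand in for kernel kthreads; this is a sketch under those assumptions, not netback code.

```c
/*
 * Hypothetical userspace model (not netback code) of the teardown handshake:
 * the stopper joins the dealloc thread, which waits until no zerocopy
 * packets are in flight. Build with -pthread.
 */
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

struct model_queue {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    uint64_t sent, success, fail; /* mirrors stats.tx_zerocopy_* */
    int inflight;                 /* mirrors queue->inflight_packets */
    bool stop_requested;          /* mirrors kthread_should_stop() */
};

/* Packets handed to the lower device whose completion has not fired yet. */
static uint64_t model_outstanding(uint64_t sent, uint64_t success, uint64_t fail)
{
    return sent - success - fail;
}

/* Assumed exit condition for the dealloc thread: stop requested AND drained. */
static bool model_can_exit(const struct model_queue *q)
{
    return q->stop_requested && q->inflight == 0;
}

/* Stand-in for the dealloc kthread: sleeps until model_can_exit() holds. */
static void *model_dealloc_thread(void *arg)
{
    struct model_queue *q = arg;

    pthread_mutex_lock(&q->lock);
    while (!model_can_exit(q))
        pthread_cond_wait(&q->cond, &q->lock);
    pthread_mutex_unlock(&q->lock);
    return NULL;
}

/* Stand-in for the zerocopy callback run when the driver frees the skb. */
static void model_zerocopy_complete(struct model_queue *q, bool ok)
{
    pthread_mutex_lock(&q->lock);
    if (ok)
        q->success++;
    else
        q->fail++;
    q->inflight--;
    pthread_cond_broadcast(&q->cond);
    pthread_mutex_unlock(&q->lock);
}

/*
 * Replays the reported sequence with the logged counters. Returns 1 if the
 * thread was held back by the in-flight packet and exited only after the
 * late completion, 0 on an unexpected state, -1 on a pthread error.
 */
static int model_teardown(void)
{
    struct model_queue q = { .sent = 209494, .success = 209492, .fail = 1 };
    pthread_t tid;
    bool blocked;
    int ok;

    pthread_mutex_init(&q.lock, NULL);
    pthread_cond_init(&q.cond, NULL);
    q.inflight = (int)model_outstanding(q.sent, q.success, q.fail); /* 1 */

    if (pthread_create(&tid, NULL, model_dealloc_thread, &q) != 0) {
        pthread_cond_destroy(&q.cond);
        pthread_mutex_destroy(&q.lock);
        return -1;
    }

    /* Like kthread_stop(): request the stop and wake the thread... */
    pthread_mutex_lock(&q.lock);
    q.stop_requested = true;
    pthread_cond_broadcast(&q.cond);
    blocked = !model_can_exit(&q); /* ...which still cannot exit: inflight == 1 */
    pthread_mutex_unlock(&q.lock);

    /* Plays the role of ixgbevf_clean_tx_ring() freeing the last skb.
     * Without this call, the join below would block forever (the hang). */
    model_zerocopy_complete(&q, true);

    if (pthread_join(tid, NULL) != 0)
        return -1;

    ok = blocked && q.inflight == 0 && q.success == 209493;
    pthread_cond_destroy(&q.cond);
    pthread_mutex_destroy(&q.lock);
    return ok ? 1 : 0;
}
```

Removing the model_zerocopy_complete() call reproduces the hang in miniature: pthread_join() never returns, just as kthread_stop() did not return until the physical interface was brought down. The model takes no side on the open questions in the post (whether the driver should release the skb sooner, or whether netback should stop waiting for it); it only illustrates why a single unreleased skb is enough to block teardown.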