
Re: [Xen-devel] [BUG] Centos 7 shutdown can cause dom0 kernel panic



On Wed, 2015-05-13 at 11:03 +0200, Armin Zentai wrote:
> Dear Xen Developers!
> 
> 
> I'd like to report a bug that can cause an HV reboot.

Thanks.

FWIW this is a dom0 kernel panic somewhere in either the network or
iscsi stack, not a panic in the hypervisor itself.

I mention this because the set of people who would be expected to look
into such a thing would be different. I've also tweaked the subject to
reflect this.

Ian.

> 
> Initiating an xm shutdown command against a CentOS 7 VM can cause a 
> hypervisor kernel panic.
> 
> Fortunately I can reproduce this bug on roughly every second shutdown. 
> This HV was pulled out of the production environment, so from now on I 
> can test anything on it.
> 
> Running: xm shutdown p39glp9m68muq2
> After a few seconds, the HV kernel panics.
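> 
> A rough repro loop (sketch only: the guest config path below is 
> hypothetical and the sleep values are guesses):
> 
>    # boot and shut down the guest repeatedly until the panic hits;
>    # statistically it only takes a couple of iterations
>    while true; do
>        xm create /etc/xen/p39glp9m68muq2.cfg
>        sleep 60    # let the guest finish booting
>        xm shutdown p39glp9m68muq2
>        sleep 30    # give the shutdown time to complete
>    done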
> 
> 
> 
> Dom0 Operating system:
> Linux c2-node08 3.10.55-11.el6.centos.alt.x86_64 #1 SMP Fri Sep 26 19:08:24 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
> 
> DomU Operating system (guest centos7):
> Linux centos7memtest 3.10.0-123.20.1.el7.onapp.x86_64 #1 SMP Fri Feb 6 14:54:22 EET 2015 x86_64 x86_64 x86_64 GNU/Linux
> 
> 
> HW:
> CPU: Intel Xeon E5645 @ 2.40 GHz
> Chassis/Motherboard: Dell PowerEdge R410
> Memory: 48GB (4x16GB HMT42GR7BMR4A-G7)
> Disk: INTEL SSDSA2CT04 - 40GB
> NIC: Intel Corporation Ethernet 10G 2P X520 Adapter (rev 01)
> 
> We're using a SAN with 2x2 multipath iSCSI against an EMC VNX 5300 
> storage array. iSCSI is connected via 2x10 Gbit through two Cisco 
> Nexus 5548 switches.
> 
> 
> Grub config:
>    kernel /boot/xen.gz dom0_mem=3145728 dom0_max_vcpus=6 loglvl=all guest_loglvl=all noreboot=true
>    module /boot/vmlinuz-3.10.55-11.el6.centos.alt.x86_64 ro root=UUID=796d645a-c97a-4574-a4fc-716cbeb7247e rd_NO_LUKS rd_NO_LVM LANG=en_US.UTF-8 rd_NO_MD SYSFONT=latarcyrheb-sun16 crashkernel=auto KEYBOARDTYPE=pc KEYTABLE=us rd_NO_DM pcie_aspm=off
>    module /boot/initramfs-3.10.55-11.el6.centos.alt.x86_64.img
> 
> We are applying these extra settings at startup:
> ethtool -K tge1 tso off gso off lro off
> ethtool -K tge2 tso off gso off lro off
> 
> sysctl -w vm.min_free_kbytes=262144
> sysctl -w kernel.sem="250 32000 100 512"
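> 
> (To double-check that the offloads really stay off after boot, 
> something like:
> 
>    # ethtool -k lists the current offload state per interface
>    ethtool -k tge1 | grep -E 'segmentation-offload|large-receive'
>    ethtool -k tge2 | grep -E 'segmentation-offload|large-receive'
> )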
> 
> Dmesg output from dom0 is attached as dom0_dmesg.txt
> xl dmesg output from dom0 is attached as dom0_xl_dmesg.txt
> 
> Instead of serial we are using netconsole, which gives us good 
> output; it is attached as netconsole_log.txt. Here is the top of the 
> call stack:
> [<ffffffffa02932c8>] ? iscsi_tcp_recv_skb+0x1b8/0x3c0 [libiscsi_tcp]
> [<ffffffffa02adfe9>] iscsi_sw_tcp_recv+0x49/0xe0 [iscsi_tcp]
> [<ffffffff8156ed75>] tcp_read_sock+0x95/0x1e0
> [<ffffffff81063b87>] ? local_bh_enable+0x27/0xa0
> [<ffffffffa02adfa0>] ? iscsi_sw_tcp_state_change+0xd0/0xd0 [iscsi_tcp]
> [<ffffffffa02ae3d7>] iscsi_sw_tcp_data_ready+0x57/0xd8 [iscsi_tcp]
> [<ffffffff8157739d>] ? tcp_try_rmem_schedule+0x6d/0x130
> [<ffffffff81577e6f>] tcp_data_queue+0x37f/0x5c0
> [<ffffffff8157aaa9>] tcp_rcv_established+0x319/0
> 
> If it's required I can configure a serial link to the HV to capture 
> better output.
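> 
> For the serial setup, a minimal sketch of the grub changes (assuming 
> the box exposes a standard com1 port; the baud rate would need 
> checking against the BMC settings, and "..." stands for the existing 
> options shown above):
> 
>    kernel /boot/xen.gz ... com1=115200,8n1 console=com1,vga
>    module /boot/vmlinuz-3.10.55-11.el6.centos.alt.x86_64 ... console=hvc0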
> 
> Dmesg output from domU is attached as centos7_dmesg.txt
> 
> Xen version: 4.2.4-33.el6; the full xen info output is attached as 
> xen_info.txt.
> 
> It's probably related to iSCSI, so here is some info about the 
> network and iSCSI settings:
> 
> 
> Interfaces:
> 
> 4: tge1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP qlen 1000
>      link/ether a0:36:9f:2a:70:90 brd ff:ff:ff:ff:ff:ff
> 5: tge2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP qlen 1000
>      link/ether a0:36:9f:2a:70:92 brd ff:ff:ff:ff:ff:ff
> 6: tge1.20@tge1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
>      link/ether a0:36:9f:2a:70:90 brd ff:ff:ff:ff:ff:ff
> 7: tge1.12@tge1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UP
>      link/ether a0:36:9f:2a:70:90 brd ff:ff:ff:ff:ff:ff
> 8: tge1.4@tge1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UP
>      link/ether a0:36:9f:2a:70:90 brd ff:ff:ff:ff:ff:ff
> 9: tge2.5@tge2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UP
>      link/ether a0:36:9f:2a:70:92 brd ff:ff:ff:ff:ff:ff
> 
> iSCSI traffic is sent via the tge1.4 and tge2.5 VLAN interfaces:
> 
> 8: tge1.4@tge1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UP
>      link/ether a0:36:9f:2a:70:90 brd ff:ff:ff:ff:ff:ff
>      inet 10.0.1.18/24 brd 10.0.1.255 scope global tge1.4
>         valid_lft forever preferred_lft forever
>      inet6 fe80::a236:9fff:fe2a:7090/64 scope link
>         valid_lft forever preferred_lft forever
> 9: tge2.5@tge2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UP
>      link/ether a0:36:9f:2a:70:92 brd ff:ff:ff:ff:ff:ff
>      inet 10.0.2.18/24 brd 10.0.2.255 scope global tge2.5
>         valid_lft forever preferred_lft forever
>      inet6 fe80::a236:9fff:fe2a:7092/64 scope link
>         valid_lft forever preferred_lft forever
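> 
> (Per-session state, including which path maps to which SCSI device, 
> can be dumped with:
> 
>    # -P 3 prints each session together with its attached SCSI devices
>    iscsiadm -m session -P 3
> )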
> 
> 
> Multipathd is working with the following config:
> devices {
>          device {
>                  vendor                  "DGC"
>                  product                 "*"
>                  product_blacklist       "LUNZ"
>                  path_grouping_policy    "group_by_prio"
>                  getuid_callout          "/sbin/scsi_id -g -u -d /dev/%n"
>                  path_selector           "round-robin 0"
>                  features                "1 queue_if_no_path"
>                  hardware_handler        "1 alua"
>                  prio                    "alua"
>                  path_checker            "emc_clariion"
>                  no_path_retry           60
>                  failback                immediate
>                  rr_weight               uniform
>                  rr_min_io               1000
>          }
> }
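> 
> (Path group and per-path status for each LUN can be checked with:
> 
>    # shows the multipath topology, priority groups and path states
>    multipath -ll
> )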
> 
> 
> 
> iSCSI config:
> 
> iscsid.startup = /etc/rc.d/init.d/iscsid force-start
> node.startup = automatic
> node.leading_login = No
> node.session.timeo.replacement_timeout = 10
> node.conn[0].timeo.login_timeout = 5
> node.conn[0].timeo.logout_timeout = 5
> node.conn[0].timeo.noop_out_interval = 5
> node.conn[0].timeo.noop_out_timeout = 5
> node.session.err_timeo.abort_timeout = 10
> node.session.err_timeo.lu_reset_timeout = 10
> node.session.err_timeo.tgt_reset_timeout = 10
> node.session.initial_login_retry_max = 8
> node.session.cmds_max = 512
> node.session.queue_depth = 32
> node.session.xmit_thread_priority = -20
> node.session.iscsi.InitialR2T = No
> node.session.iscsi.ImmediateData = Yes
> node.session.iscsi.FirstBurstLength = 262144
> node.session.iscsi.MaxBurstLength = 16776192
> node.conn[0].iscsi.MaxRecvDataSegmentLength = 262144
> node.conn[0].iscsi.MaxXmitDataSegmentLength = 0
> discovery.sendtargets.iscsi.MaxRecvDataSegmentLength = 32768
> node.conn[0].iscsi.HeaderDigest = None
> node.session.nr_sessions = 1
> node.session.iscsi.FastAbort = Yes
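> 
> (The values actually negotiated with the VNX can be verified with, 
> for example:
> 
>    # -P 2 includes the negotiated iSCSI parameters for each session
>    iscsiadm -m session -P 2 | grep -iE 'burst|r2t|immediate|datasegment'
> )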
> 
> 
> 
> XM config for the virtual machine:
> 
> bootloader = "/usr/bin/pygrub"
> vcpus = "1"
> memory = "400"
> name = "p39glp9m68muq2"
> 
> vif = [ "mac=00:16:3e:84:XX:XX, bridge=x0evss6g1ztoa4, ip=XX.XX.XX.XX, vifname=gh4txstv16yoaw, rate=100Mb/s" ]
> 
> disk = [ "phy:/dev/onapp-p65uo6l3rgns6n/x5c58a6aj8cgiw,xvda1,w",
>          "phy:/dev/onapp-p65uo6l3rgns6n/s89aa4a3m3uzi2,xvda2,w" ]
> vfb = [ "type=vnc,vncpasswd=lysuad" ]
> 
> 
> 
> Thanks for all,
>   - Armin
> 
> 
> 



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

