
Re: [Xen-devel] Trying to unmap invalid handle! pending_idx: @ drivers/net/xen-netback/netback.c:998 causes kernel panic/reboot



Hello

On Mon, Jul 14, 2014 at 04:25:54AM +0200, Armin Zentai wrote:
> Dear Xen Developers!
> 
> 
> We're running Xen on multiple machines, most of them Dell R410 or SM
> X8DTL, with one E5645 CPU and 48 GB of RAM. We've updated the kernel
> to 3.15.4 after some of our hypervisors started rebooting at random
> times.
> 
> The logs were empty and we had no information about the crashes. We
> tried some tricks, and in the end the netconsole kernel module helped,
> giving us a very thin layer of remote kernel logging. We found the
> following in the remote logs:

It's good you've got netconsole working. I would still like to point out
that we have a wiki page on setting up serial console on Xen, which
might be helpful.

http://wiki.xen.org/wiki/Xen_Serial_Console
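
For completeness, netconsole's parameter syntax is
[src-port]@[src-ip]/[dev],[tgt-port]@[tgt-ip]/[tgt-mac]; a typical
invocation looks like the line below (the addresses, ports, interface
and MAC are placeholders for your own setup):

    modprobe netconsole netconsole=6665@10.0.0.5/eth0,6666@10.0.0.2/00:16:3e:aa:bb:cc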

> 
> Jul 13 00:46:58 node11 [157060.106323] vif vif-2-0 h14z4mzbvfrrhb: Trying to
> unmap invalid handle! pending_idx: c
> Jul 13 00:46:58 node11 [157060.106476] ------------[ cut here ]------------
> Jul 13 00:46:58 node11 [157060.106546] kernel BUG at
> drivers/net/xen-netback/netback.c:998!
> Jul 13 00:46:58 node11 [157060.106616] invalid opcode: 0000 [#1]
> Jul 13 00:46:58 SMP
> Jul 13 00:46:58 node11
[...]
> Jul 13 00:46:58 node11 [157060.112705] CPU: 0 PID: 0 Comm: swapper/0
> Tainted: G            E 3.15.4 #1
> Jul 13 00:46:58 node11 [157060.112776] Hardware name: Supermicro
> X8DTL/X8DTL, BIOS 1.1b    03/19/2010
> Jul 13 00:46:58 node11 [157060.112848] task: ffffffff81c1b480 ti:
> ffffffff81c00000 task.ti: ffffffff81c00000
> Jul 13 00:46:58 node11 [157060.112936] RIP: e030:[<ffffffffa025f61d>]
> Jul 13 00:46:58 node11  [<ffffffffa025f61d>] xenvif_idx_unmap+0x11d/0x130
> [xen_netback]
> Jul 13 00:46:58 node11 [157060.113078] RSP: e02b:ffff88008ea03d48 EFLAGS:
> 00010292
> Jul 13 00:46:58 node11 [157060.113147] RAX: 000000000000004a RBX:
> 000000000000000c RCX: 0000000000000000
> Jul 13 00:46:58 node11 [157060.113234] RDX: ffff88008a40b600 RSI:
> ffff88008ea03a18 RDI: 000000000000021b
> Jul 13 00:46:58 node11 [157060.113321] RBP: ffff88008ea03d88 R08:
> 0000000000000000 R09: ffff88008a40b600
> Jul 13 00:46:58 node11 [157060.113408] R10: ffff88008a0004e8 R11:
> 00000000000006d8 R12: ffff8800569708c0
> Jul 13 00:46:58 node11 [157060.113495] R13: ffff88006558fec0 R14:
> ffff8800569708c0 R15: 0000000000000001
> Jul 13 00:46:58 node11 [157060.113589] FS:  00007f351684b700(0000)
> GS:ffff88008ea00000(0000) knlGS:0000000000000000
> Jul 13 00:46:58 node11 [157060.113679] CS:  e033 DS: 0000 ES: 0000 CR0:
> 000000008005003b
> Jul 13 00:46:58 node11 [157060.113747] CR2: 00007fc2a4372000 CR3:
> 00000000049f3000 CR4: 0000000000002660
> Jul 13 00:46:58 node11 [157060.113835] Stack:
> Jul 13 00:46:58 node11 [157060.113896]  ffff880056979f90
> Jul 13 00:46:58 node11  ff00000000000001
> Jul 13 00:46:58 node11  ffff880b0605e000
> Jul 13 00:46:58 node11  0000000000000000
> Jul 13 00:46:58 node11
> Jul 13 00:46:58 node11 [157060.114143]  ffff0000ffffffff
> Jul 13 00:46:58 node11  00000000fffffff6
> Jul 13 00:46:58 node11  0000000000000001
> Jul 13 00:46:58 node11  ffff8800569769d0
> Jul 13 00:46:58 node11
> Jul 13 00:46:58 node11 [157060.114390]  ffff88008ea03e58
> Jul 13 00:46:58 node11  ffffffffa02622fc
> Jul 13 00:46:58 node11  ffff88008ea03dd8
> Jul 13 00:46:58 node11  ffffffff810b5223
> Jul 13 00:46:58 node11
> Jul 13 00:46:58 node11 [157060.114637] Call Trace:
> Jul 13 00:46:58 node11 [157060.114700]  <IRQ>
> Jul 13 00:46:58 node11
> Jul 13 00:46:58 node11 [157060.114750]
> Jul 13 00:46:58 node11  [<ffffffffa02622fc>] xenvif_tx_action+0x27c/0x7f0
> [xen_netback]
> Jul 13 00:46:58 node11 [157060.114927]  [<ffffffff810b5223>] ?
> __wake_up+0x53/0x70
> Jul 13 00:46:58 node11 [157060.114998]  [<ffffffff810ca077>] ?
> handle_irq_event_percpu+0xa7/0x1b0
> Jul 13 00:46:58 node11 [157060.115073]  [<ffffffffa02647d1>]
> xenvif_poll+0x31/0x64 [xen_netback]
> Jul 13 00:46:58 node11 [157060.115147]  [<ffffffff81653d4b>]
> net_rx_action+0x10b/0x290
> Jul 13 00:46:58 node11 [157060.115221]  [<ffffffff81071c73>]
> __do_softirq+0x103/0x320
> Jul 13 00:46:58 node11 [157060.115292]  [<ffffffff81072015>]
> irq_exit+0x135/0x140
> Jul 13 00:46:58 node11 [157060.115363]  [<ffffffff8144759c>]
> xen_evtchn_do_upcall+0x3c/0x50
> Jul 13 00:46:58 node11 [157060.115436]  [<ffffffff8175c07e>]
> xen_do_hypervisor_callback+0x1e/0x30
> Jul 13 00:46:58 node11 [157060.115506]  <EOI>
> Jul 13 00:46:58 node11
> Jul 13 00:46:58 node11 [157060.115551]
> Jul 13 00:46:58 node11  [<ffffffff810013aa>] ?
> xen_hypercall_sched_op+0xa/0x20
> Jul 13 00:46:58 node11 [157060.115722]  [<ffffffff810013aa>] ?
> xen_hypercall_sched_op+0xa/0x20
> Jul 13 00:46:58 node11 [157060.115794]  [<ffffffff8100a200>] ?
> xen_safe_halt+0x10/0x20
> Jul 13 00:46:58 node11 [157060.115869]  [<ffffffff8101dbbf>] ?
> default_idle+0x1f/0xc0
> Jul 13 00:46:58 node11 [157060.115939]  [<ffffffff8101d38f>] ?
> arch_cpu_idle+0xf/0x20
> Jul 13 00:46:58 node11 [157060.116009]  [<ffffffff810b5aa1>] ?
> cpu_startup_entry+0x201/0x360
> Jul 13 00:46:58 node11 [157060.116084]  [<ffffffff817420a7>] ?
> rest_init+0x77/0x80
> Jul 13 00:46:58 node11 [157060.116156]  [<ffffffff81d3a156>] ?
> start_kernel+0x406/0x413
> Jul 13 00:46:58 node11 [157060.116227]  [<ffffffff81d39b6e>] ?
> repair_env_string+0x5b/0x5b
> Jul 13 00:46:58 node11 [157060.116298]  [<ffffffff81d39603>] ?
> x86_64_start_reservations+0x2a/0x2c
> Jul 13 00:46:58 node11 [157060.116373]  [<ffffffff81d3d5dc>] ?
> xen_start_kernel+0x584/0x586
[...]
> Jul 13 00:46:58 node11
> Jul 13 00:46:58 node11 [157060.119179] RIP
> Jul 13 00:46:58 node11  [<ffffffffa025f61d>] xenvif_idx_unmap+0x11d/0x130
> [xen_netback]
> Jul 13 00:46:58 node11 [157060.119312]  RSP <ffff88008ea03d48>
> Jul 13 00:46:58 node11 [157060.119395] ---[ end trace 7e021c96c8cfea53 ]---
> Jul 13 00:46:58 node11 [157060.119465] Kernel panic - not syncing: Fatal
> exception in interrupt
> 
> 
> h14z4mzbvfrrhb was the name of a VIF. This VIF belonged to a Windows
> Server 2008 R2 x64 virtual machine. We have had 6 random reboots so
> far; all of the VIFs belonged to the same operating system, but to
> different virtual machines. So only the virtual interfaces of Windows
> Server 2008 R2 x64 systems caused the crashes, and these systems have
> been provisioned from different installs or templates. The GPLPV
> driver versions are also different.
> 

Unfortunately I don't have Windows Server 2008 R2. :-(

This bug is in the guest TX path. What's the workload of your guest? Is
there any pattern to its traffic?
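
For reference, the BUG at netback.c:998 is the sanity check on the TX
unmap path: every mapped guest TX slot records a grant handle, and
hitting an already-invalidated handle means the same pending_idx is
being unmapped twice. Abridged from the 3.15 source (line numbers may
differ slightly in your tree):

    static inline void xenvif_grant_handle_reset(struct xenvif *vif,
                                                 u16 pending_idx)
    {
            /* Unmapping a slot whose handle was already reset means
             * a double unmap of the same pending_idx, hence BUG(). */
            if (unlikely(vif->grant_tx_handle[pending_idx] ==
                         NETBACK_INVALID_HANDLE)) {
                    netdev_err(vif->dev,
                               "Trying to unmap invalid handle! pending_idx: %x\n",
                               pending_idx);
                    BUG();
            }
            vif->grant_tx_handle[pending_idx] = NETBACK_INVALID_HANDLE;
    }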

I've checked the changesets between 3.15.4 and 3.16-rc5 and there's no
fix for this, so this is the first report of this issue. If there's a
reliable way to reproduce it, that would be great.
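
For anyone who wants to double-check, something like the following,
run in a tree that carries both tags, should list the relevant
netback commits:

    git log --oneline v3.15.4..v3.16-rc5 -- drivers/net/xen-netback/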

Zoltan, have you seen this before? Can your work on pktgen help?

> [root@c2-node11 ~]# uname -a
> Linux c2-node11 3.15.4 #1 SMP Tue Jul 8 17:58:26 CEST 2014 x86_64 x86_64
> x86_64 GNU/Linux
> 
> 
> The xm create config file of the specified VM (the other VMs' config
> files are the same):
> 
> kernel = "/usr/lib/xen/boot/hvmloader"
> device_model = "/usr/lib64/xen/bin/qemu-dm"
> builder = "hvm"
> memory = "2000"
> name = "vna3mhwnv9pn4m"
> vcpus = "1"
> 
> timer_mode = "2"
> viridian = "1"
> 
> vif = [ "type=ioemu, mac=00:16:3e:64:c8:ba, bridge=x0evss6g1ztoa4, ip=...,
> vifname=h14z4mzbvfrrhb, rate=100Mb/s" ]
> 
> disk = [ "phy:/dev/q7jiqc2gh02b2b/xz7wget4ycmp77,ioemu:hda,w" ]
> vnc = 1
> vncpasswd="aaaaa1"
> usbdevice="tablet"
> 
> 
> The HV's networking looks like the following:
> We are using dual Emulex 10Gbit network adapters with bonding (LACP),
> and on top of the bond we're using VLANs for the VM, management and
> iSCSI traffic.
> We've tried to reproduce the error, but we couldn't; the crash/reboot
> happened randomly every time.
> 

In that case you will need to instrument netback to spit out more
information. Zoltan, is there any other information that you would like
to know?
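
For example (a hypothetical local debug patch, nothing that exists in
the tree), printing the ring state just before the BUG() in
xenvif_grant_handle_reset() would tell us whether the pending ring
indices look sane at the time of the crash:

    /* Hypothetical debug aid: dump ring indices before the BUG()
     * fires. Field names assume the 3.15 struct xenvif layout. */
    netdev_err(vif->dev,
               "pending_prod: %u pending_cons: %u dealloc_prod: %u dealloc_cons: %u\n",
               vif->pending_prod, vif->pending_cons,
               vif->dealloc_prod, vif->dealloc_cons);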

Wei.

> Thanks for your help,
> 
>  - Armin Zentai

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

