
Re: [Xen-devel] null domains after xl destroy



On 19/04/17 22:09, Juergen Gross wrote:
On 19/04/17 09:16, Roger Pau Monné wrote:
On Wed, Apr 19, 2017 at 06:39:41AM +0200, Juergen Gross wrote:
On 19/04/17 03:02, Glenn Enright wrote:
Thanks Juergen. I applied that to our 4.9.23 dom0 kernel, which still
shows the issue. When replicating the leak I now see this trace (via
dmesg). Hopefully that is useful.

Please note, I'm going to be offline next week, but am keen to keep on
with this; it may just be a while before I follow up.

Regards, Glenn
http://rimuhosting.com


------------[ cut here ]------------
WARNING: CPU: 0 PID: 19 at drivers/block/xen-blkback/xenbus.c:508
xen_blkbk_remove+0x138/0x140
Modules linked in: xen_pciback xen_netback xen_gntalloc xen_gntdev
xen_evtchn xenfs xen_privcmd xt_CT ipt_REJECT nf_reject_ipv4
ebtable_filter ebtables xt_hashlimit xt_recent xt_state iptable_security
iptable_raw iptable_mangle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4
nf_nat_ipv4 nf_nat nf_conntrack iptable_filter ip_tables bridge stp llc
ipv6 crc_ccitt ppdev parport_pc parport serio_raw sg i2c_i801 i2c_smbus
i2c_core e1000e ptp pps_core i3000_edac edac_core raid1 sd_mod ahci libahci floppy
dm_mirror dm_region_hash dm_log dm_mod
CPU: 0 PID: 19 Comm: xenwatch Not tainted 4.9.23-1.el6xen.x86_64 #1
Hardware name: Supermicro PDSML/PDSML+, BIOS 6.00 08/27/2007
 ffffc90040cfbba8 ffffffff8136b61f 0000000000000013 0000000000000000
 0000000000000000 0000000000000000 ffffc90040cfbbf8 ffffffff8108007d
 ffffea0001373fe0 000001fc33394434 ffff880000000001 ffff88004d93fac0
Call Trace:
 [<ffffffff8136b61f>] dump_stack+0x67/0x98
 [<ffffffff8108007d>] __warn+0xfd/0x120
 [<ffffffff810800bd>] warn_slowpath_null+0x1d/0x20
 [<ffffffff814ebde8>] xen_blkbk_remove+0x138/0x140
 [<ffffffff814497f7>] xenbus_dev_remove+0x47/0xa0
 [<ffffffff814bcfd4>] __device_release_driver+0xb4/0x160
 [<ffffffff814bd0ad>] device_release_driver+0x2d/0x40
 [<ffffffff814bbfd4>] bus_remove_device+0x124/0x190
 [<ffffffff814b93a2>] device_del+0x112/0x210
 [<ffffffff81448113>] ? xenbus_read+0x53/0x70
 [<ffffffff814b94c2>] device_unregister+0x22/0x60
 [<ffffffff814ed7cd>] frontend_changed+0xad/0x4c0
 [<ffffffff810a974e>] ? schedule_tail+0x1e/0xc0
 [<ffffffff81449b57>] xenbus_otherend_changed+0xc7/0x140
 [<ffffffff816f1436>] ? _raw_spin_unlock_irqrestore+0x16/0x20
 [<ffffffff810a974e>] ? schedule_tail+0x1e/0xc0
 [<ffffffff81449fe0>] frontend_changed+0x10/0x20
 [<ffffffff814477fc>] xenwatch_thread+0x9c/0x140
 [<ffffffff810bffa0>] ? woken_wake_function+0x20/0x20
 [<ffffffff816ed93a>] ? schedule+0x3a/0xa0
 [<ffffffff816f1436>] ? _raw_spin_unlock_irqrestore+0x16/0x20
 [<ffffffff810c0c5d>] ? complete+0x4d/0x60
 [<ffffffff81447760>] ? split+0xf0/0xf0
 [<ffffffff810a051d>] kthread+0xcd/0xf0
 [<ffffffff810a974e>] ? schedule_tail+0x1e/0xc0
 [<ffffffff810a0450>] ? __kthread_init_worker+0x40/0x40
 [<ffffffff810a0450>] ? __kthread_init_worker+0x40/0x40
 [<ffffffff816f1b45>] ret_from_fork+0x25/0x30
---[ end trace ee097287c9865a62 ]---

Konrad, Roger,

this was triggered by a debug patch in xen_blkbk_remove():

        if (be->blkif)
-               xen_blkif_disconnect(be->blkif);
+               WARN_ON(xen_blkif_disconnect(be->blkif));

So I guess we need something like xen_blk_drain_io() for those calls to
xen_blkif_disconnect() which are not allowed to fail, either at the call
sites of xen_blkif_disconnect() or in the function itself, depending on
a new boolean parameter indicating whether it should wait for
outstanding I/Os.
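For illustration, a rough and untested sketch of the in-function
variant (the per-ring cleanup is abbreviated, and note that
xen_blk_drain_io() is currently static to blkback.c, so it would have
to be made callable from xenbus.c):

    /* Hypothetical: xen_blkif_disconnect() gains a 'wait' flag. */
    static int xen_blkif_disconnect(struct xen_blkif *blkif, bool wait)
    {
            unsigned int r;

            for (r = 0; r < blkif->nr_rings; r++) {
                    struct xen_blkif_ring *ring = &blkif->rings[r];

                    if (ring->xenblkd) {
                            kthread_stop(ring->xenblkd);
                            wake_up(&ring->shutdown_wq);
                            ring->xenblkd = NULL;
                    }

                    if (atomic_read(&ring->inflight) > 0) {
                            if (!wait)
                                    return -EBUSY;
                            /* Caller cannot tolerate failure: block
                             * until all in-flight I/O has completed.
                             */
                            xen_blk_drain_io(ring);
                    }

                    /* ... unmap rings and free requests as today ... */
            }

            return 0;
    }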

I can try a patch, but I'd appreciate if you could confirm this wouldn't
add further problems...

Hello,

Thanks for debugging this. The easiest solution seems to be to replace
the ring->inflight atomic_read check in xen_blkif_disconnect with a call
to xen_blk_drain_io instead, and to make xen_blkif_disconnect return
void (to prevent further issues like this one).
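As an untested fragment, and again assuming xen_blk_drain_io() is made
callable from xenbus.c, that would be roughly:

-static int xen_blkif_disconnect(struct xen_blkif *blkif)
+static void xen_blkif_disconnect(struct xen_blkif *blkif)
 ...
-		if (atomic_read(&ring->inflight) > 0)
-			return -EBUSY;
+		/* Drain in-flight I/O instead of failing. */
+		xen_blk_drain_io(ring);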

Glenn,

can you please try the attached patch (in dom0)?


Juergen


(resending with full CC list)

I'm back. After testing, unfortunately I'm still seeing the leak. The trace below is with the debug patch applied as well, under 4.9.25. It looks very similar to me, and I am still able to replicate this reliably.

Regards, Glenn
http://rimuhosting.com

------------[ cut here ]------------
WARNING: CPU: 0 PID: 19 at drivers/block/xen-blkback/xenbus.c:511 xen_blkbk_remove+0x138/0x140
Modules linked in: ebt_ip xen_pciback xen_netback xen_gntalloc xen_gntdev xen_evtchn xenfs xen_privcmd xt_CT ipt_REJECT nf_reject_ipv4 ebtable_filter ebtables xt_hashlimit xt_recent xt_state iptable_security iptable_raw iptable_mangle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_filter ip_tables bridge stp llc ipv6 crc_ccitt ppdev parport_pc parport serio_raw i2c_i801 i2c_smbus i2c_core sg e1000e ptp pps_core i3000_edac edac_core raid1 sd_mod ahci libahci floppy dm_mirror dm_region_hash dm_log dm_mod
CPU: 0 PID: 19 Comm: xenwatch Not tainted 4.9.25-1.el6xen.x86_64 #1
Hardware name: Supermicro PDSML/PDSML+, BIOS 6.00 08/27/2007
 ffffc90040cfbb98 ffffffff8136b76f 0000000000000013 0000000000000000
 0000000000000000 0000000000000000 ffffc90040cfbbe8 ffffffff8108007d
 ffffea0000141720 000001ff41334434 ffff880000000001 ffff88004d3aedc0
Call Trace:
 [<ffffffff8136b76f>] dump_stack+0x67/0x98
 [<ffffffff8108007d>] __warn+0xfd/0x120
 [<ffffffff810800bd>] warn_slowpath_null+0x1d/0x20
 [<ffffffff814ec0a8>] xen_blkbk_remove+0x138/0x140
 [<ffffffff81449b07>] xenbus_dev_remove+0x47/0xa0
 [<ffffffff814bd2b4>] __device_release_driver+0xb4/0x160
 [<ffffffff814bd38d>] device_release_driver+0x2d/0x40
 [<ffffffff814bc2b4>] bus_remove_device+0x124/0x190
 [<ffffffff814b9682>] device_del+0x112/0x210
 [<ffffffff81448423>] ? xenbus_read+0x53/0x70
 [<ffffffff814b97a2>] device_unregister+0x22/0x60
 [<ffffffff814eda9d>] frontend_changed+0xad/0x4c0
 [<ffffffff81449e67>] xenbus_otherend_changed+0xc7/0x140
 [<ffffffff816f1486>] ? _raw_spin_unlock_irqrestore+0x16/0x20
 [<ffffffff8144a2f0>] frontend_changed+0x10/0x20
 [<ffffffff81447b0c>] xenwatch_thread+0x9c/0x140
 [<ffffffff810bffb0>] ? woken_wake_function+0x20/0x20
 [<ffffffff816ed98a>] ? schedule+0x3a/0xa0
 [<ffffffff816f1486>] ? _raw_spin_unlock_irqrestore+0x16/0x20
 [<ffffffff810c0c6d>] ? complete+0x4d/0x60
 [<ffffffff81447a70>] ? split+0xf0/0xf0
 [<ffffffff810a0535>] kthread+0xe5/0x100
 [<ffffffff810a051d>] ? kthread+0xcd/0x100
 [<ffffffff810a0450>] ? __kthread_init_worker+0x40/0x40
 [<ffffffff810a0450>] ? __kthread_init_worker+0x40/0x40
 [<ffffffff810a0450>] ? __kthread_init_worker+0x40/0x40
 [<ffffffff816f1bc5>] ret_from_fork+0x25/0x30
---[ end trace ea3a48c80e4ad79d ]---

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel
