
Re: [Xen-devel] [PATCHv5 12/14] xen-blkback: safely unmap grants in case they are still in use



Hi David,

Recently I hit an issue that is likely related to this patch. It
happened while running a block benchmark on domU; the backend was an
iSCSI disk connected to dom0. I got the panic below at
put_page_testzero() on dom0: at that point the ixgbe network card was
freeing skb pages in __skb_frag_unref(), but the page->_count was
already 0.
Do you think it is possible that the page had already been freed by blkback?
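
For reference, the check that fires at mm.h:288 is the refcount
underflow check in put_page_testzero(), reached from the ixgbe TX
cleanup via __skb_frag_unref(). This is only a rough paraphrase of the
3.8-era sources, quoted to show where the BUG triggers:

    /* include/linux/skbuff.h (roughly): drop one frag reference */
    static inline void __skb_frag_unref(skb_frag_t *frag)
    {
            put_page(skb_frag_page(frag));
    }

    /* include/linux/mm.h (roughly): */
    static inline int put_page_testzero(struct page *page)
    {
            /* Fires if the last reference was already dropped elsewhere. */
            VM_BUG_ON(atomic_read(&page->_count) == 0);
            return atomic_dec_and_test(&page->_count);
    }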

Thanks,
-Bob
------------[ cut here ]------------
kernel BUG at include/linux/mm.h:288!
invalid opcode: 0000 [#1] SMP
Modules linked in: dm_queue_length ib_iser rdma_cm ib_cm iw_cm ib_sa
ib_mad ib_core ib_addr iscsi_tcp xt_mac xt_nat nf_conntrack_netlink
xt_conntrack ipt_REJECT xt_TCPMSS xt_comment xt_connmark iptable_raw
xt_REDIRECT ext4 jbd2 xt_state xt_NFQUEUE iptable_nat nf_conntrack_ipv4
nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip_gre gre
nfnetlink_queue nfnetlink ip6table_filter ip6_tables ebtable_nat
ebtables softdog iptable_filter ip_tables xen_pciback xen_netback
xen_blkback xen_gntalloc xen_gntdev xen_evtchn xenfs xen_privcmd 8021q
garp bridge stp llc sunrpc bonding mlx4_en mlx4_core ipv6 ipmi_devintf
ipmi_si ipmi_msghandler vhost_net macvtap macvlan tun iTCO_wdt
iTCO_vendor_support coretemp freq_table mperf intel_powerclamp
ghash_clmulni_intel microcode pcspkr i2c_i801 i2c_core lpc_ich mfd_core
shpchp ioatdma sg ext3 jbd mbcache dm_round_robin sd_mod crc_t10dif
aesni_intel ablk_helper cryptd lrw aes_x86_64 xts gf128mul ahci libahci
megaraid_sas ixgbe hwmon dca dm_multipath dm_mirror dm_region_hash
dm_log dm_mod crc32c_intel be2iscsi bnx2i cnic uio cxgb4i cxgb4 cxgb3i
libcxgbi cxgb3 mdio libiscsi_tcp qla4xxx iscsi_boot_sysfs libiscsi
scsi_transport_iscsi [last unloaded: bonding]
CPU 0
Pid: 31309, comm: kworker/0:0 Tainted: G        W
3.8.13-48.el6uek.20150306.x86_64 #2 Oracle Corporation SUN FIRE X4170 M3
    /ASSY,MOTHERBOARD,1U
RIP: e030:[<ffffffff8113d481>]  [<ffffffff8113d481>] put_page+0x31/0x50
RSP: e02b:ffff880278e03d10  EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff8802692257b8 RCX: 00000000ffffffff
RDX: ffff88026ea4b2c0 RSI: 0000000000000200 RDI: ffffea00088670c0
RBP: ffff880278e03d10 R08: ffff88026a6e4500 R09: ffff880270a25098
R10: 0000000000000001 R11: ffff880278e03cf0 R12: 0000000000000006
R13: 00000000ffffff8e R14: ffff880270a25098 R15: ffff88026c95d9f0
FS:  00007fbb497cf700(0000) GS:ffff880278e00000(0000) knlGS:0000000000000000
CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000007aaed0 CR3: 00000002703c2000 CR4: 0000000000042660
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process kworker/0:0 (pid: 31309, threadinfo ffff88020c608000, task
ffff880200e24140)
Stack:
 ffff880278e03d30 ffffffff814c3855 ffff8802692257b8 ffff8802692257b8
 ffff880278e03d50 ffffffff814c38ee 0000000000000000 ffffc9001188faa0
 ffff880278e03d70 ffffffff814c3ff1 ffffc9001188faa0 ffff88026c95d8e0
Call Trace:
 <IRQ>
 [<ffffffff814c3855>] skb_release_data+0x75/0xf0
 [<ffffffff814c38ee>] __kfree_skb+0x1e/0xa0
 [<ffffffff814c3ff1>] consume_skb+0x31/0x70
 [<ffffffff814ce6ed>] dev_kfree_skb_any+0x3d/0x50
 [<ffffffffa01d0bdc>] ixgbe_clean_tx_irq+0xac/0x530 [ixgbe]
 [<ffffffffa01d10b3>] ixgbe_poll+0x53/0x1a0 [ixgbe]
 [<ffffffff814d3d05>] net_rx_action+0x105/0x2b0
 [<ffffffff81066587>] __do_softirq+0xd7/0x240
 [<ffffffff815a7c5c>] call_softirq+0x1c/0x30
 [<ffffffff810174b5>] do_softirq+0x65/0xa0
 [<ffffffff8106636d>] irq_exit+0xbd/0xe0
 [<ffffffff8133d3e5>] xen_evtchn_do_upcall+0x35/0x50
 [<ffffffff815a7cbe>] xen_do_hypervisor_callback+0x1e/0x30
 <EOI>
 [<ffffffff8100128a>] ? xen_hypercall_grant_table_op+0xa/0x20
 [<ffffffff8100128a>] ? xen_hypercall_grant_table_op+0xa/0x20
 [<ffffffff8133a4e6>] ? gnttab_unmap_refs+0x26/0x70
 [<ffffffff8133a5ba>] ? __gnttab_unmap_refs_async+0x8a/0xb0
 [<ffffffff8133a672>] ? gnttab_unmap_work+0x22/0x30
 [<ffffffff8107bf10>] ? process_one_work+0x180/0x420
 [<ffffffff8107df4e>] ? worker_thread+0x12e/0x390
 [<ffffffff8107de20>] ? manage_workers+0x180/0x180
 [<ffffffff8108329e>] ? kthread+0xce/0xe0
 [<ffffffff810039ee>] ? xen_end_context_switch+0x1e/0x30
 [<ffffffff810831d0>] ? kthread_freezable_should_stop+0x70/0x70
 [<ffffffff815a682c>] ? ret_from_fork+0x7c/0xb0
 [<ffffffff810831d0>] ? kthread_freezable_should_stop+0x70/0x70
Code: 66 66 90 66 f7 07 00 c0 75 25 8b 47 1c 85 c0 74 1a f0 ff 4f 1c 0f
94 c0 84 c0 75 06 c9 c3 0f 1f 40 00 e8 43 fd ff ff c9 66 90 c3 <0f> 0b
eb fe 66 66 2e 0f 1f 84 00 00 00 00 00 e8 5b fd ff ff c9
RIP  [<ffffffff8113d481>] put_page+0x31/0x50
 RSP <ffff880278e03d10>
---[ end trace 9f93fe018444fc09 ]---
Kernel panic - not syncing: Fatal exception in interrupt
(XEN) Domain 0 crashed: rebooting machine in 5 seconds.
(XEN) Resetting with ACPI MEMORY or I/O RESET_REG.
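
For context, my reading of the completion path added by the patch
quoted below (condensed from the diff, so a sketch rather than the
literal code) is that the grant pages are only returned to blkback's
free pool from the async-unmap callback, i.e. after
gnttab_unmap_refs_async() has decided they are no longer in use:

    /* Condensed from the patch quoted below -- a sketch, not the literal code. */
    static void xen_blkbk_unmap_and_respond(struct pending_req *req)
    {
            struct gntab_unmap_queue_data *work = &req->gnttab_unmap_data;

            work->count = xen_blkbk_unmap_prepare(req->blkif, req->segments,
                                                  req->nr_pages, req->unmap,
                                                  req->unmap_pages);
            work->data = req;
            work->done = xen_blkbk_unmap_and_respond_callback;
            work->unmap_ops = req->unmap;
            work->kunmap_ops = NULL;
            work->pages = req->unmap_pages;

            /* The callback (which calls put_free_pages() and sends the ring
             * response) should run only once the async unmap reports the
             * pages are no longer referenced. */
            gnttab_unmap_refs_async(work);
    }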

On 01/28/2015 12:44 AM, David Vrabel wrote:
> From: Jennifer Herbert <jennifer.herbert@xxxxxxxxxx>
> 
> Use gnttab_unmap_refs_async() to wait until the mapped pages are no
> longer in use before unmapping them.
> 
> This allows blkback to use network storage which may retain refs to
> pages in queued skbs after the block I/O has completed.
> 
> Signed-off-by: Jennifer Herbert <jennifer.herbert@xxxxxxxxxx>
> Acked-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>
> Acked-by: Jens Axboe <axboe@xxxxxxxxx>
> Signed-off-by: David Vrabel <david.vrabel@xxxxxxxxxx>
> ---
>  drivers/block/xen-blkback/blkback.c |  169 ++++++++++++++++++++++++-----------
>  drivers/block/xen-blkback/common.h  |    3 +
>  2 files changed, 122 insertions(+), 50 deletions(-)
> 
> diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
> index 908e630..2a04d34 100644
> --- a/drivers/block/xen-blkback/blkback.c
> +++ b/drivers/block/xen-blkback/blkback.c
> @@ -47,6 +47,7 @@
>  #include <asm/xen/hypervisor.h>
>  #include <asm/xen/hypercall.h>
>  #include <xen/balloon.h>
> +#include <xen/grant_table.h>
>  #include "common.h"
>  
>  /*
> @@ -262,6 +263,17 @@ static void put_persistent_gnt(struct xen_blkif *blkif,
>       atomic_dec(&blkif->persistent_gnt_in_use);
>  }
>  
> +static void free_persistent_gnts_unmap_callback(int result,
> +                                             struct gntab_unmap_queue_data *data)
> +{
> +     struct completion *c = data->data;
> +
> +     /* BUG_ON used to reproduce existing behaviour,
> +        but is this the best way to deal with this? */
> +     BUG_ON(result);
> +     complete(c);
> +}
> +
>  static void free_persistent_gnts(struct xen_blkif *blkif, struct rb_root *root,
>                                   unsigned int num)
>  {
> @@ -269,8 +281,17 @@ static void free_persistent_gnts(struct xen_blkif *blkif, struct rb_root *root,
>       struct page *pages[BLKIF_MAX_SEGMENTS_PER_REQUEST];
>       struct persistent_gnt *persistent_gnt;
>       struct rb_node *n;
> -     int ret = 0;
>       int segs_to_unmap = 0;
> +     struct gntab_unmap_queue_data unmap_data;
> +     struct completion unmap_completion;
> +
> +     init_completion(&unmap_completion);
> +
> +     unmap_data.data = &unmap_completion;
> +     unmap_data.done = &free_persistent_gnts_unmap_callback;
> +     unmap_data.pages = pages;
> +     unmap_data.unmap_ops = unmap;
> +     unmap_data.kunmap_ops = NULL;
>  
>       foreach_grant_safe(persistent_gnt, n, root, node) {
>               BUG_ON(persistent_gnt->handle ==
> @@ -285,9 +306,11 @@ static void free_persistent_gnts(struct xen_blkif *blkif, struct rb_root *root,
>  
>               if (++segs_to_unmap == BLKIF_MAX_SEGMENTS_PER_REQUEST ||
>                       !rb_next(&persistent_gnt->node)) {
> -                     ret = gnttab_unmap_refs(unmap, NULL, pages,
> -                             segs_to_unmap);
> -                     BUG_ON(ret);
> +
> +                     unmap_data.count = segs_to_unmap;
> +                     gnttab_unmap_refs_async(&unmap_data);
> +                     wait_for_completion(&unmap_completion);
> +
>                       put_free_pages(blkif, pages, segs_to_unmap);
>                       segs_to_unmap = 0;
>               }
> @@ -653,18 +676,14 @@ void xen_blkbk_free_caches(struct xen_blkif *blkif)
>       shrink_free_pagepool(blkif, 0 /* All */);
>  }
>  
> -/*
> - * Unmap the grant references, and also remove the M2P over-rides
> - * used in the 'pending_req'.
> - */
> -static void xen_blkbk_unmap(struct xen_blkif *blkif,
> -                            struct grant_page *pages[],
> -                            int num)
> +static unsigned int xen_blkbk_unmap_prepare(
> +     struct xen_blkif *blkif,
> +     struct grant_page **pages,
> +     unsigned int num,
> +     struct gnttab_unmap_grant_ref *unmap_ops,
> +     struct page **unmap_pages)
>  {
> -     struct gnttab_unmap_grant_ref unmap[BLKIF_MAX_SEGMENTS_PER_REQUEST];
> -     struct page *unmap_pages[BLKIF_MAX_SEGMENTS_PER_REQUEST];
>       unsigned int i, invcount = 0;
> -     int ret;
>  
>       for (i = 0; i < num; i++) {
>               if (pages[i]->persistent_gnt != NULL) {
> @@ -674,21 +693,95 @@ static void xen_blkbk_unmap(struct xen_blkif *blkif,
>               if (pages[i]->handle == BLKBACK_INVALID_HANDLE)
>                       continue;
>               unmap_pages[invcount] = pages[i]->page;
> -             gnttab_set_unmap_op(&unmap[invcount], vaddr(pages[i]->page),
> +             gnttab_set_unmap_op(&unmap_ops[invcount], vaddr(pages[i]->page),
>                                   GNTMAP_host_map, pages[i]->handle);
>               pages[i]->handle = BLKBACK_INVALID_HANDLE;
> -             if (++invcount == BLKIF_MAX_SEGMENTS_PER_REQUEST) {
> -                     ret = gnttab_unmap_refs(unmap, NULL, unmap_pages,
> -                                             invcount);
> +             invcount++;
> +       }
> +
> +       return invcount;
> +}
> +
> +static void xen_blkbk_unmap_and_respond_callback(int result, struct gntab_unmap_queue_data *data)
> +{
> +     struct pending_req* pending_req = (struct pending_req*) (data->data);
> +     struct xen_blkif *blkif = pending_req->blkif;
> +
> +     /* BUG_ON used to reproduce existing behaviour,
> +        but is this the best way to deal with this? */
> +     BUG_ON(result);
> +
> +     put_free_pages(blkif, data->pages, data->count);
> +     make_response(blkif, pending_req->id,
> +                   pending_req->operation, pending_req->status);
> +     free_req(blkif, pending_req);
> +     /*
> +      * Make sure the request is freed before releasing blkif,
> +      * or there could be a race between free_req and the
> +      * cleanup done in xen_blkif_free during shutdown.
> +      *
> +      * NB: The fact that we might try to wake up pending_free_wq
> +      * before drain_complete (in case there's a drain going on)
> +      * it's not a problem with our current implementation
> +      * because we can assure there's no thread waiting on
> +      * pending_free_wq if there's a drain going on, but it has
> +      * to be taken into account if the current model is changed.
> +      */
> +     if (atomic_dec_and_test(&blkif->inflight) && atomic_read(&blkif->drain)) {
> +             complete(&blkif->drain_complete);
> +     }
> +     xen_blkif_put(blkif);
> +}
> +
> +static void xen_blkbk_unmap_and_respond(struct pending_req *req)
> +{
> +     struct gntab_unmap_queue_data* work = &req->gnttab_unmap_data;
> +     struct xen_blkif *blkif = req->blkif;
> +     struct grant_page **pages = req->segments;
> +     unsigned int invcount;
> +
> +     invcount = xen_blkbk_unmap_prepare(blkif, pages, req->nr_pages,
> +                                        req->unmap, req->unmap_pages);
> +
> +     work->data = req;
> +     work->done = xen_blkbk_unmap_and_respond_callback;
> +     work->unmap_ops = req->unmap;
> +     work->kunmap_ops = NULL;
> +     work->pages = req->unmap_pages;
> +     work->count = invcount;
> +
> +     gnttab_unmap_refs_async(&req->gnttab_unmap_data);
> +}
> +
> +
> +/*
> + * Unmap the grant references.
> + *
> + * This could accumulate ops up to the batch size to reduce the number
> + * of hypercalls, but since this is only used in error paths there's
> + * no real need.
> + */
> +static void xen_blkbk_unmap(struct xen_blkif *blkif,
> +                            struct grant_page *pages[],
> +                            int num)
> +{
> +     struct gnttab_unmap_grant_ref unmap[BLKIF_MAX_SEGMENTS_PER_REQUEST];
> +     struct page *unmap_pages[BLKIF_MAX_SEGMENTS_PER_REQUEST];
> +     unsigned int invcount = 0;
> +     int ret;
> +
> +     while (num) {
> +             unsigned int batch = min(num, BLKIF_MAX_SEGMENTS_PER_REQUEST);
> +             
> +             invcount = xen_blkbk_unmap_prepare(blkif, pages, batch,
> +                                                unmap, unmap_pages);
> +             if (invcount) {
> +                     ret = gnttab_unmap_refs(unmap, NULL, unmap_pages, invcount);
>                       BUG_ON(ret);
>                       put_free_pages(blkif, unmap_pages, invcount);
> -                     invcount = 0;
>               }
> -     }
> -     if (invcount) {
> -             ret = gnttab_unmap_refs(unmap, NULL, unmap_pages, invcount);
> -             BUG_ON(ret);
> -             put_free_pages(blkif, unmap_pages, invcount);
> +             pages += batch;
> +             num -= batch;
>       }
>  }
>  
> @@ -982,32 +1075,8 @@ static void __end_block_io_op(struct pending_req *pending_req, int error)
>        * the grant references associated with 'request' and provide
>        * the proper response on the ring.
>        */
> -     if (atomic_dec_and_test(&pending_req->pendcnt)) {
> -             struct xen_blkif *blkif = pending_req->blkif;
> -
> -             xen_blkbk_unmap(blkif,
> -                             pending_req->segments,
> -                             pending_req->nr_pages);
> -             make_response(blkif, pending_req->id,
> -                           pending_req->operation, pending_req->status);
> -             free_req(blkif, pending_req);
> -             /*
> -              * Make sure the request is freed before releasing blkif,
> -              * or there could be a race between free_req and the
> -              * cleanup done in xen_blkif_free during shutdown.
> -              *
> -              * NB: The fact that we might try to wake up pending_free_wq
> -              * before drain_complete (in case there's a drain going on)
> -              * it's not a problem with our current implementation
> -              * because we can assure there's no thread waiting on
> -              * pending_free_wq if there's a drain going on, but it has
> -              * to be taken into account if the current model is changed.
> -              */
> -             if (atomic_dec_and_test(&blkif->inflight) && atomic_read(&blkif->drain)) {
> -                     complete(&blkif->drain_complete);
> -             }
> -             xen_blkif_put(blkif);
> -     }
> +     if (atomic_dec_and_test(&pending_req->pendcnt))
> +             xen_blkbk_unmap_and_respond(pending_req);
>  }
>  
>  /*
> diff --git a/drivers/block/xen-blkback/common.h b/drivers/block/xen-blkback/common.h
> index f65b807..cc90a84 100644
> --- a/drivers/block/xen-blkback/common.h
> +++ b/drivers/block/xen-blkback/common.h
> @@ -350,6 +350,9 @@ struct pending_req {
>       struct grant_page       *indirect_pages[MAX_INDIRECT_PAGES];
>       struct seg_buf          seg[MAX_INDIRECT_SEGMENTS];
>       struct bio              *biolist[MAX_INDIRECT_SEGMENTS];
> +     struct gnttab_unmap_grant_ref unmap[MAX_INDIRECT_SEGMENTS];
> +     struct page                   *unmap_pages[MAX_INDIRECT_SEGMENTS];
> +     struct gntab_unmap_queue_data gnttab_unmap_data;
>  };
>  
