
Re: [Xen-devel] [PATCHv5 12/14] xen-blkback: safely unmap grants in case they are still in use



Hi David,

Recently I hit an issue that is likely related to this patch. It
happened while running a block benchmark on domU; the backend was an
iSCSI disk connected to dom0. I got the panic below at
put_page_testzero() on dom0: at that point the ixgbe network card was
freeing skb pages in __skb_frag_unref(), but the page->_count was
already 0.
Do you think it is possible that the page had already been freed by blkback?
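
For reference, the check that fires at mm.h:288 is the refcount
underflow check in put_page_testzero(), reached from the ixgbe TX
cleanup via __skb_frag_unref(). This is only a rough paraphrase of the
3.8-era sources, quoted to show where the BUG triggers:

    /* include/linux/skbuff.h (roughly): drop one frag reference */
    static inline void __skb_frag_unref(skb_frag_t *frag)
    {
            put_page(skb_frag_page(frag));
    }

    /* include/linux/mm.h (roughly): */
    static inline int put_page_testzero(struct page *page)
    {
            /* Fires if the last reference was already dropped elsewhere. */
            VM_BUG_ON(atomic_read(&page->_count) == 0);
            return atomic_dec_and_test(&page->_count);
    }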

Thanks,
-Bob
------------[ cut here ]------------
kernel BUG at include/linux/mm.h:288!
invalid opcode: 0000 [#1] SMP
Modules linked in: dm_queue_length ib_iser rdma_cm ib_cm iw_cm ib_sa
ib_mad ib_core ib_addr iscsi_tcp xt_mac xt_nat nf_conntrack_netlink
xt_conntrack ipt_REJECT xt_TCPMSS xt_comment xt_connmark iptable_raw
xt_REDIRECT ext4 jbd2 xt_state xt_NFQUEUE iptable_nat nf_conntrack_ipv4
nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip_gre gre
nfnetlink_queue nfnetlink ip6table_filter ip6_tables ebtable_nat
ebtables softdog iptable_filter ip_tables xen_pciback xen_netback
xen_blkback xen_gntalloc xen_gntdev xen_evtchn xenfs xen_privcmd 8021q
garp bridge stp llc sunrpc bonding mlx4_en mlx4_core ipv6 ipmi_devintf
ipmi_si ipmi_msghandler vhost_net macvtap macvlan tun iTCO_wdt
iTCO_vendor_support coretemp freq_table mperf intel_powerclamp
ghash_clmulni_intel microcode pcspkr i2c_i801 i2c_core lpc_ich mfd_core
shpchp ioatdma sg ext3 jbd mbcache dm_round_robin sd_mod crc_t10dif
aesni_intel ablk_helper cryptd lrw aes_x86_64 xts gf128mul ahci libahci
megaraid_sas ixgbe hwmon dca dm_multipath dm_mirror dm_region_hash
dm_log dm_mod crc32c_intel be2iscsi bnx2i cnic uio cxgb4i cxgb4 cxgb3i
libcxgbi cxgb3 mdio libiscsi_tcp qla4xxx iscsi_boot_sysfs libiscsi
scsi_transport_iscsi [last unloaded: bonding]
CPU 0
Pid: 31309, comm: kworker/0:0 Tainted: G        W
3.8.13-48.el6uek.20150306.x86_64 #2 Oracle Corporation SUN FIRE X4170 M3
    /ASSY,MOTHERBOARD,1U
RIP: e030:[<ffffffff8113d481>]  [<ffffffff8113d481>] put_page+0x31/0x50
RSP: e02b:ffff880278e03d10  EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff8802692257b8 RCX: 00000000ffffffff
RDX: ffff88026ea4b2c0 RSI: 0000000000000200 RDI: ffffea00088670c0
RBP: ffff880278e03d10 R08: ffff88026a6e4500 R09: ffff880270a25098
R10: 0000000000000001 R11: ffff880278e03cf0 R12: 0000000000000006
R13: 00000000ffffff8e R14: ffff880270a25098 R15: ffff88026c95d9f0
FS:  00007fbb497cf700(0000) GS:ffff880278e00000(0000) knlGS:0000000000000000
CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000007aaed0 CR3: 00000002703c2000 CR4: 0000000000042660
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process kworker/0:0 (pid: 31309, threadinfo ffff88020c608000, task
ffff880200e24140)
Stack:
 ffff880278e03d30 ffffffff814c3855 ffff8802692257b8 ffff8802692257b8
 ffff880278e03d50 ffffffff814c38ee 0000000000000000 ffffc9001188faa0
 ffff880278e03d70 ffffffff814c3ff1 ffffc9001188faa0 ffff88026c95d8e0
Call Trace:
 <IRQ>
 [<ffffffff814c3855>] skb_release_data+0x75/0xf0
 [<ffffffff814c38ee>] __kfree_skb+0x1e/0xa0
 [<ffffffff814c3ff1>] consume_skb+0x31/0x70
 [<ffffffff814ce6ed>] dev_kfree_skb_any+0x3d/0x50
 [<ffffffffa01d0bdc>] ixgbe_clean_tx_irq+0xac/0x530 [ixgbe]
 [<ffffffffa01d10b3>] ixgbe_poll+0x53/0x1a0 [ixgbe]
 [<ffffffff814d3d05>] net_rx_action+0x105/0x2b0
 [<ffffffff81066587>] __do_softirq+0xd7/0x240
 [<ffffffff815a7c5c>] call_softirq+0x1c/0x30
 [<ffffffff810174b5>] do_softirq+0x65/0xa0
 [<ffffffff8106636d>] irq_exit+0xbd/0xe0
 [<ffffffff8133d3e5>] xen_evtchn_do_upcall+0x35/0x50
 [<ffffffff815a7cbe>] xen_do_hypervisor_callback+0x1e/0x30
 <EOI>
 [<ffffffff8100128a>] ? xen_hypercall_grant_table_op+0xa/0x20
 [<ffffffff8100128a>] ? xen_hypercall_grant_table_op+0xa/0x20
 [<ffffffff8133a4e6>] ? gnttab_unmap_refs+0x26/0x70
 [<ffffffff8133a5ba>] ? __gnttab_unmap_refs_async+0x8a/0xb0
 [<ffffffff8133a672>] ? gnttab_unmap_work+0x22/0x30
 [<ffffffff8107bf10>] ? process_one_work+0x180/0x420
 [<ffffffff8107df4e>] ? worker_thread+0x12e/0x390
 [<ffffffff8107de20>] ? manage_workers+0x180/0x180
 [<ffffffff8108329e>] ? kthread+0xce/0xe0
 [<ffffffff810039ee>] ? xen_end_context_switch+0x1e/0x30
 [<ffffffff810831d0>] ? kthread_freezable_should_stop+0x70/0x70
 [<ffffffff815a682c>] ? ret_from_fork+0x7c/0xb0
 [<ffffffff810831d0>] ? kthread_freezable_should_stop+0x70/0x70
Code: 66 66 90 66 f7 07 00 c0 75 25 8b 47 1c 85 c0 74 1a f0 ff 4f 1c 0f
94 c0 84 c0 75 06 c9 c3 0f 1f 40 00 e8 43 fd ff ff c9 66 90 c3 <0f> 0b
eb fe 66 66 2e 0f 1f 84 00 00 00 00 00 e8 5b fd ff ff c9
RIP  [<ffffffff8113d481>] put_page+0x31/0x50
 RSP <ffff880278e03d10>
---[ end trace 9f93fe018444fc09 ]---
Kernel panic - not syncing: Fatal exception in interrupt
(XEN) Domain 0 crashed: rebooting machine in 5 seconds.
(XEN) Resetting with ACPI MEMORY or I/O RESET_REG.
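
For context, my reading of the completion path added by the patch
quoted below (condensed from the diff, so a sketch rather than the
literal code) is that the grant pages are only returned to blkback's
free pool from the async-unmap callback, i.e. after
gnttab_unmap_refs_async() has decided they are no longer in use:

    /* Condensed from the patch quoted below -- a sketch, not the literal code. */
    static void xen_blkbk_unmap_and_respond(struct pending_req *req)
    {
            struct gntab_unmap_queue_data *work = &req->gnttab_unmap_data;

            work->count = xen_blkbk_unmap_prepare(req->blkif, req->segments,
                                                  req->nr_pages, req->unmap,
                                                  req->unmap_pages);
            work->data = req;
            work->done = xen_blkbk_unmap_and_respond_callback;
            work->unmap_ops = req->unmap;
            work->kunmap_ops = NULL;
            work->pages = req->unmap_pages;

            /* The callback (which calls put_free_pages() and sends the ring
             * response) should run only once the async unmap reports the
             * pages are no longer referenced. */
            gnttab_unmap_refs_async(work);
    }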

On 01/28/2015 12:44 AM, David Vrabel wrote:
> From: Jennifer Herbert <jennifer.herbert@xxxxxxxxxx>
> 
> Use gnttab_unmap_refs_async() to wait until the mapped pages are no
> longer in use before unmapping them.
> 
> This allows blkback to use network storage which may retain refs to
> pages in queued skbs after the block I/O has completed.
> 
> Signed-off-by: Jennifer Herbert <jennifer.herbert@xxxxxxxxxx>
> Acked-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>
> Acked-by: Jens Axboe <axboe@xxxxxxxxx>
> Signed-off-by: David Vrabel <david.vrabel@xxxxxxxxxx>
> ---
>  drivers/block/xen-blkback/blkback.c |  169 ++++++++++++++++++++++++-----------
>  drivers/block/xen-blkback/common.h  |    3 +
>  2 files changed, 122 insertions(+), 50 deletions(-)
> 
> diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
> index 908e630..2a04d34 100644
> --- a/drivers/block/xen-blkback/blkback.c
> +++ b/drivers/block/xen-blkback/blkback.c
> @@ -47,6 +47,7 @@
>  #include <asm/xen/hypervisor.h>
>  #include <asm/xen/hypercall.h>
>  #include <xen/balloon.h>
> +#include <xen/grant_table.h>
>  #include "common.h"
>  
>  /*
> @@ -262,6 +263,17 @@ static void put_persistent_gnt(struct xen_blkif *blkif,
>       atomic_dec(&blkif->persistent_gnt_in_use);
>  }
>  
> +static void free_persistent_gnts_unmap_callback(int result,
> +                                             struct gntab_unmap_queue_data *data)
> +{
> +     struct completion *c = data->data;
> +
> +     /* BUG_ON used to reproduce existing behaviour,
> +        but is this the best way to deal with this? */
> +     BUG_ON(result);
> +     complete(c);
> +}
> +
>  static void free_persistent_gnts(struct xen_blkif *blkif, struct rb_root *root,
>                                   unsigned int num)
>  {
> @@ -269,8 +281,17 @@ static void free_persistent_gnts(struct xen_blkif *blkif, struct rb_root *root,
>       struct page *pages[BLKIF_MAX_SEGMENTS_PER_REQUEST];
>       struct persistent_gnt *persistent_gnt;
>       struct rb_node *n;
> -     int ret = 0;
>       int segs_to_unmap = 0;
> +     struct gntab_unmap_queue_data unmap_data;
> +     struct completion unmap_completion;
> +
> +     init_completion(&unmap_completion);
> +
> +     unmap_data.data = &unmap_completion;
> +     unmap_data.done = &free_persistent_gnts_unmap_callback;
> +     unmap_data.pages = pages;
> +     unmap_data.unmap_ops = unmap;
> +     unmap_data.kunmap_ops = NULL;
>  
>       foreach_grant_safe(persistent_gnt, n, root, node) {
>               BUG_ON(persistent_gnt->handle ==
> @@ -285,9 +306,11 @@ static void free_persistent_gnts(struct xen_blkif *blkif, struct rb_root *root,
>  
>               if (++segs_to_unmap == BLKIF_MAX_SEGMENTS_PER_REQUEST ||
>                       !rb_next(&persistent_gnt->node)) {
> -                     ret = gnttab_unmap_refs(unmap, NULL, pages,
> -                             segs_to_unmap);
> -                     BUG_ON(ret);
> +
> +                     unmap_data.count = segs_to_unmap;
> +                     gnttab_unmap_refs_async(&unmap_data);
> +                     wait_for_completion(&unmap_completion);
> +
>                       put_free_pages(blkif, pages, segs_to_unmap);
>                       segs_to_unmap = 0;
>               }
> @@ -653,18 +676,14 @@ void xen_blkbk_free_caches(struct xen_blkif *blkif)
>       shrink_free_pagepool(blkif, 0 /* All */);
>  }
>  
> -/*
> - * Unmap the grant references, and also remove the M2P over-rides
> - * used in the 'pending_req'.
> - */
> -static void xen_blkbk_unmap(struct xen_blkif *blkif,
> -                            struct grant_page *pages[],
> -                            int num)
> +static unsigned int xen_blkbk_unmap_prepare(
> +     struct xen_blkif *blkif,
> +     struct grant_page **pages,
> +     unsigned int num,
> +     struct gnttab_unmap_grant_ref *unmap_ops,
> +     struct page **unmap_pages)
>  {
> -     struct gnttab_unmap_grant_ref unmap[BLKIF_MAX_SEGMENTS_PER_REQUEST];
> -     struct page *unmap_pages[BLKIF_MAX_SEGMENTS_PER_REQUEST];
>       unsigned int i, invcount = 0;
> -     int ret;
>  
>       for (i = 0; i < num; i++) {
>               if (pages[i]->persistent_gnt != NULL) {
> @@ -674,21 +693,95 @@ static void xen_blkbk_unmap(struct xen_blkif *blkif,
>               if (pages[i]->handle == BLKBACK_INVALID_HANDLE)
>                       continue;
>               unmap_pages[invcount] = pages[i]->page;
> -             gnttab_set_unmap_op(&unmap[invcount], vaddr(pages[i]->page),
> +             gnttab_set_unmap_op(&unmap_ops[invcount], vaddr(pages[i]->page),
>                                   GNTMAP_host_map, pages[i]->handle);
>               pages[i]->handle = BLKBACK_INVALID_HANDLE;
> -             if (++invcount == BLKIF_MAX_SEGMENTS_PER_REQUEST) {
> -                     ret = gnttab_unmap_refs(unmap, NULL, unmap_pages,
> -                                             invcount);
> +             invcount++;
> +       }
> +
> +       return invcount;
> +}
> +
> +static void xen_blkbk_unmap_and_respond_callback(int result, struct gntab_unmap_queue_data *data)
> +{
> +     struct pending_req* pending_req = (struct pending_req*) (data->data);
> +     struct xen_blkif *blkif = pending_req->blkif;
> +
> +     /* BUG_ON used to reproduce existing behaviour,
> +        but is this the best way to deal with this? */
> +     BUG_ON(result);
> +
> +     put_free_pages(blkif, data->pages, data->count);
> +     make_response(blkif, pending_req->id,
> +                   pending_req->operation, pending_req->status);
> +     free_req(blkif, pending_req);
> +     /*
> +      * Make sure the request is freed before releasing blkif,
> +      * or there could be a race between free_req and the
> +      * cleanup done in xen_blkif_free during shutdown.
> +      *
> +      * NB: The fact that we might try to wake up pending_free_wq
> +      * before drain_complete (in case there's a drain going on)
> +      * it's not a problem with our current implementation
> +      * because we can assure there's no thread waiting on
> +      * pending_free_wq if there's a drain going on, but it has
> +      * to be taken into account if the current model is changed.
> +      */
> +     if (atomic_dec_and_test(&blkif->inflight) && atomic_read(&blkif->drain)) {
> +             complete(&blkif->drain_complete);
> +     }
> +     xen_blkif_put(blkif);
> +}
> +
> +static void xen_blkbk_unmap_and_respond(struct pending_req *req)
> +{
> +     struct gntab_unmap_queue_data* work = &req->gnttab_unmap_data;
> +     struct xen_blkif *blkif = req->blkif;
> +     struct grant_page **pages = req->segments;
> +     unsigned int invcount;
> +
> +     invcount = xen_blkbk_unmap_prepare(blkif, pages, req->nr_pages,
> +                                        req->unmap, req->unmap_pages);
> +
> +     work->data = req;
> +     work->done = xen_blkbk_unmap_and_respond_callback;
> +     work->unmap_ops = req->unmap;
> +     work->kunmap_ops = NULL;
> +     work->pages = req->unmap_pages;
> +     work->count = invcount;
> +
> +     gnttab_unmap_refs_async(&req->gnttab_unmap_data);
> +}
> +
> +
> +/*
> + * Unmap the grant references.
> + *
> + * This could accumulate ops up to the batch size to reduce the number
> + * of hypercalls, but since this is only used in error paths there's
> + * no real need.
> + */
> +static void xen_blkbk_unmap(struct xen_blkif *blkif,
> +                            struct grant_page *pages[],
> +                            int num)
> +{
> +     struct gnttab_unmap_grant_ref unmap[BLKIF_MAX_SEGMENTS_PER_REQUEST];
> +     struct page *unmap_pages[BLKIF_MAX_SEGMENTS_PER_REQUEST];
> +     unsigned int invcount = 0;
> +     int ret;
> +
> +     while (num) {
> +             unsigned int batch = min(num, BLKIF_MAX_SEGMENTS_PER_REQUEST);
> +             
> +             invcount = xen_blkbk_unmap_prepare(blkif, pages, batch,
> +                                                unmap, unmap_pages);
> +             if (invcount) {
> +                     ret = gnttab_unmap_refs(unmap, NULL, unmap_pages, invcount);
>                       BUG_ON(ret);
>                       put_free_pages(blkif, unmap_pages, invcount);
> -                     invcount = 0;
>               }
> -     }
> -     if (invcount) {
> -             ret = gnttab_unmap_refs(unmap, NULL, unmap_pages, invcount);
> -             BUG_ON(ret);
> -             put_free_pages(blkif, unmap_pages, invcount);
> +             pages += batch;
> +             num -= batch;
>       }
>  }
>  
> @@ -982,32 +1075,8 @@ static void __end_block_io_op(struct pending_req *pending_req, int error)
>        * the grant references associated with 'request' and provide
>        * the proper response on the ring.
>        */
> -     if (atomic_dec_and_test(&pending_req->pendcnt)) {
> -             struct xen_blkif *blkif = pending_req->blkif;
> -
> -             xen_blkbk_unmap(blkif,
> -                             pending_req->segments,
> -                             pending_req->nr_pages);
> -             make_response(blkif, pending_req->id,
> -                           pending_req->operation, pending_req->status);
> -             free_req(blkif, pending_req);
> -             /*
> -              * Make sure the request is freed before releasing blkif,
> -              * or there could be a race between free_req and the
> -              * cleanup done in xen_blkif_free during shutdown.
> -              *
> -              * NB: The fact that we might try to wake up pending_free_wq
> -              * before drain_complete (in case there's a drain going on)
> -              * it's not a problem with our current implementation
> -              * because we can assure there's no thread waiting on
> -              * pending_free_wq if there's a drain going on, but it has
> -              * to be taken into account if the current model is changed.
> -              */
> -             if (atomic_dec_and_test(&blkif->inflight) && atomic_read(&blkif->drain)) {
> -                     complete(&blkif->drain_complete);
> -             }
> -             xen_blkif_put(blkif);
> -     }
> +     if (atomic_dec_and_test(&pending_req->pendcnt))
> +             xen_blkbk_unmap_and_respond(pending_req);
>  }
>  
>  /*
> diff --git a/drivers/block/xen-blkback/common.h b/drivers/block/xen-blkback/common.h
> index f65b807..cc90a84 100644
> --- a/drivers/block/xen-blkback/common.h
> +++ b/drivers/block/xen-blkback/common.h
> @@ -350,6 +350,9 @@ struct pending_req {
>       struct grant_page       *indirect_pages[MAX_INDIRECT_PAGES];
>       struct seg_buf          seg[MAX_INDIRECT_SEGMENTS];
>       struct bio              *biolist[MAX_INDIRECT_SEGMENTS];
> +     struct gnttab_unmap_grant_ref unmap[MAX_INDIRECT_SEGMENTS];
> +     struct page                   *unmap_pages[MAX_INDIRECT_SEGMENTS];
> +     struct gntab_unmap_queue_data gnttab_unmap_data;
>  };
>  
