Re: [Xen-devel] [PATCH] Xen backend support for paged out grant targets.
On Fri, Aug 31, 2012 at 03:33:17PM -0400, Andres Lagar-Cavilla wrote:
> But msleep will wind up calling schedule(). We definitely cannot afford to
> pin down a dom0 vcpu when the pager itself is in dom0.

Duh! Somehow I was thinking it would just do a dumb loop. Maybe I am thinking
of a different type of early XYZsleep variant.

> Iiuc....
>
> Andres
> On Aug 31, 2012 12:55 PM, "Konrad Rzeszutek Wilk" <konrad.wilk@xxxxxxxxxx>
> wrote:
>
> > On Fri, Aug 31, 2012 at 11:42:32AM -0400, Andres Lagar-Cavilla wrote:
> > > Actually acted upon your feedback ipso facto:
> > >
> > > commit d5fab912caa1f0cf6be0a6773f502d3417a207b6
> > > Author: Andres Lagar-Cavilla <andres@xxxxxxxxxxxxxxxx>
> > > Date:   Sun Aug 26 09:45:57 2012 -0400
> > >
> > >     Xen backend support for paged out grant targets.
> > >
> > >     Since Xen 4.2, HVM domains may have portions of their memory paged out. When a
> > >     foreign domain (such as dom0) attempts to map these frames, the map will
> > >     initially fail. The hypervisor returns a suitable errno and kicks off an
> > >     asynchronous page-in operation carried out by a helper. The foreign domain is
> > >     expected to retry the mapping operation until it eventually succeeds. The
> > >     foreign domain is not put to sleep because it could itself be the one running
> > >     the pager assist (the typical scenario for dom0).
> > >
> > >     This patch adds support for this mechanism to backend drivers that use grant
> > >     mapping and copying operations. Specifically, this covers the blkback and
> > >     gntdev drivers (which map foreign grants), and the netback driver (which
> > >     copies foreign grants).
> > >
> > >     * Add GNTST_eagain, already exposed by Xen, to the grant interface.
> > >     * Add a retry method for grants that fail with GNTST_eagain (i.e. because the
> > >       target foreign frame is paged out).
> > >     * Insert hooks with appropriate macro decorators in the aforementioned drivers.
> > >
> > >     The retry loop is only invoked if the grant operation status is GNTST_eagain.
> > >     It is guaranteed to leave a new status code different from GNTST_eagain. Any
> > >     other status code results in identical code execution as before.
> > >
> > >     The retry loop performs up to 256 attempts with increasing time intervals over
> > >     a 32 second period. It uses msleep to yield while waiting for the next retry.
> >
> > Would it make sense to yield to other processes (so call schedule())? Or
> > perhaps have this in a workqueue?
> >
> > I mean the 'msleep' just looks like a hack... 32 seconds of doing 'msleep'
> > on a 1-VCPU dom0 could trigger the watchdog, I think?
> >
> > >     V2 after feedback from David Vrabel:
> > >     * Explicit MAX_DELAY instead of wrap-around of the delay into zero
> > >     * Abstract the GNTST_eagain check into core grant table code for the
> > >       netback module.
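As a concrete illustration of the mechanism described in the commit message, a
backend mapping a foreign grant through the gnttab_map_grant_no_eagain() wrapper
added by the patch might look roughly like the sketch below. This is an untested
illustration only, not part of the posted patch; every identifier other than the
gnttab_* helpers and the Xen grant-table types is hypothetical. The posted patch
itself follows.

#include <xen/grant_table.h>
#include <xen/interface/grant_table.h>

static int example_map_foreign_page(domid_t otherend_id, grant_ref_t gref,
                                    void *vaddr, grant_handle_t *handle)
{
        struct gnttab_map_grant_ref op;

        gnttab_set_map_op(&op, (unsigned long)vaddr, GNTMAP_host_map,
                          gref, otherend_id);

        /* Issues the hypercall and transparently retries while the target
         * frame is paged out; op.status is never GNTST_eagain on return. */
        gnttab_map_grant_no_eagain(&op);

        if (op.status != GNTST_okay)
                return op.status;

        *handle = op.handle;
        return GNTST_okay;
}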
> > >
> > >     Signed-off-by: Andres Lagar-Cavilla <andres@xxxxxxxxxxxxxxxx>
> > >
> > > diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
> > > index 682633b..5610fd8 100644
> > > --- a/drivers/net/xen-netback/netback.c
> > > +++ b/drivers/net/xen-netback/netback.c
> > > @@ -635,9 +635,7 @@ static void xen_netbk_rx_action(struct xen_netbk *netbk)
> > >                 return;
> > >
> > >         BUG_ON(npo.copy_prod > ARRAY_SIZE(netbk->grant_copy_op));
> > > -       ret = HYPERVISOR_grant_table_op(GNTTABOP_copy, &netbk->grant_copy_op,
> > > -                                       npo.copy_prod);
> > > -       BUG_ON(ret != 0);
> > > +       gnttab_batch_copy_no_eagain(netbk->grant_copy_op, npo.copy_prod);
> > >
> > >         while ((skb = __skb_dequeue(&rxq)) != NULL) {
> > >                 sco = (struct skb_cb_overlay *)skb->cb;
> > > @@ -1460,18 +1458,15 @@ static void xen_netbk_tx_submit(struct xen_netbk *netbk)
> > >  static void xen_netbk_tx_action(struct xen_netbk *netbk)
> > >  {
> > >         unsigned nr_gops;
> > > -       int ret;
> > >
> > >         nr_gops = xen_netbk_tx_build_gops(netbk);
> > >
> > >         if (nr_gops == 0)
> > >                 return;
> > > -       ret = HYPERVISOR_grant_table_op(GNTTABOP_copy,
> > > -                                       netbk->tx_copy_ops, nr_gops);
> > > -       BUG_ON(ret);
> > >
> > > -       xen_netbk_tx_submit(netbk);
> > > +       gnttab_batch_copy_no_eagain(netbk->tx_copy_ops, nr_gops);
> > >
> > > +       xen_netbk_tx_submit(netbk);
> > >  }
> > >
> > >  static void xen_netbk_idx_release(struct xen_netbk *netbk, u16 pending_idx)
> > > diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
> > > index eea81cf..96543b2 100644
> > > --- a/drivers/xen/grant-table.c
> > > +++ b/drivers/xen/grant-table.c
> > > @@ -38,6 +38,7 @@
> > >  #include <linux/vmalloc.h>
> > >  #include <linux/uaccess.h>
> > >  #include <linux/io.h>
> > > +#include <linux/delay.h>
> > >  #include <linux/hardirq.h>
> > >
> > >  #include <xen/xen.h>
> > > @@ -823,6 +824,26 @@ unsigned int gnttab_max_grant_frames(void)
> > >  }
> > >  EXPORT_SYMBOL_GPL(gnttab_max_grant_frames);
> > >
> > > +#define MAX_DELAY 256
> > > +void
> > > +gnttab_retry_eagain_gop(unsigned int cmd, void *gop, int16_t *status,
> > > +                       const char *func)
> > > +{
> > > +       unsigned delay = 1;
> > > +
> > > +       do {
> > > +               BUG_ON(HYPERVISOR_grant_table_op(cmd, gop, 1));
> > > +               if (*status == GNTST_eagain)
> > > +                       msleep(delay++);
> > > +       } while ((*status == GNTST_eagain) && (delay < MAX_DELAY));
> > > +
> > > +       if (delay >= MAX_DELAY) {
> > > +               printk(KERN_ERR "%s: %s eagain grant\n", func, current->comm);
> > > +               *status = GNTST_bad_page;
> > > +       }
> > > +}
> > > +EXPORT_SYMBOL_GPL(gnttab_retry_eagain_gop);
> > > +
> > >  int gnttab_map_refs(struct gnttab_map_grant_ref *map_ops,
> > >                     struct gnttab_map_grant_ref *kmap_ops,
> > >                     struct page **pages, unsigned int count)
> > > @@ -836,6 +857,11 @@ int gnttab_map_refs(struct gnttab_map_grant_ref *map_ops,
> > >         if (ret)
> > >                 return ret;
> > >
> > > +       /* Retry eagain maps */
> > > +       for (i = 0; i < count; i++)
> > > +               if (map_ops[i].status == GNTST_eagain)
> > > +                       gnttab_retry_eagain_map(map_ops + i);
> > > +
> > >         if (xen_feature(XENFEAT_auto_translated_physmap))
> > >                 return ret;
> > >
> > > diff --git a/drivers/xen/xenbus/xenbus_client.c b/drivers/xen/xenbus/xenbus_client.c
> > > index b3e146e..749f6a3 100644
> > > --- a/drivers/xen/xenbus/xenbus_client.c
> > > +++ b/drivers/xen/xenbus/xenbus_client.c
> > > @@ -490,8 +490,7 @@ static int xenbus_map_ring_valloc_pv(struct xenbus_device *dev,
> > >
> > >         op.host_addr = arbitrary_virt_to_machine(pte).maddr;
> > >
> > > -       if (HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, &op, 1))
> > > -               BUG();
> > > +       gnttab_map_grant_no_eagain(&op);
> > >
> > >         if (op.status != GNTST_okay) {
> > >                 free_vm_area(area);
> > > @@ -572,8 +571,7 @@ int xenbus_map_ring(struct xenbus_device *dev, int gnt_ref,
> > >         gnttab_set_map_op(&op, (unsigned long)vaddr, GNTMAP_host_map, gnt_ref,
> > >                           dev->otherend_id);
> > >
> > > -       if (HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, &op, 1))
> > > -               BUG();
> > > +       gnttab_map_grant_no_eagain(&op);
> > >
> > >         if (op.status != GNTST_okay) {
> > >                 xenbus_dev_fatal(dev, op.status,
> > > diff --git a/include/xen/grant_table.h b/include/xen/grant_table.h
> > > index 11e27c3..2fecfab 100644
> > > --- a/include/xen/grant_table.h
> > > +++ b/include/xen/grant_table.h
> > > @@ -43,6 +43,7 @@
> > >  #include <xen/interface/grant_table.h>
> > >
> > >  #include <asm/xen/hypervisor.h>
> > > +#include <asm/xen/hypercall.h>
> > >
> > >  #include <xen/features.h>
> > >
> > > @@ -183,6 +184,43 @@ unsigned int gnttab_max_grant_frames(void);
> > >
> > >  #define gnttab_map_vaddr(map) ((void *)(map.host_virt_addr))
> > >
> > > +/* Retry a grant map/copy operation when the hypervisor returns GNTST_eagain.
> > > + * This is typically due to paged out target frames.
> > > + * Generic entry-point, use macro decorators below for specific grant
> > > + * operations.
> > > + * Will retry for 1, 2, ... 255 ms, i.e. 256 times during 32 seconds.
> > > + * Return value in *status guaranteed to no longer be GNTST_eagain. */
> > > +void gnttab_retry_eagain_gop(unsigned int cmd, void *gop, int16_t *status,
> > > +                            const char *func);
> > > +
> > > +#define gnttab_retry_eagain_map(_gop)                                   \
> > > +       gnttab_retry_eagain_gop(GNTTABOP_map_grant_ref, (_gop),          \
> > > +                               &(_gop)->status, __func__)
> > > +
> > > +#define gnttab_retry_eagain_copy(_gop)                                  \
> > > +       gnttab_retry_eagain_gop(GNTTABOP_copy, (_gop),                   \
> > > +                               &(_gop)->status, __func__)
> > > +
> > > +#define gnttab_map_grant_no_eagain(_gop)                                \
> > > +do {                                                                    \
> > > +       if (HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, (_gop), 1)) \
> > > +               BUG();                                                   \
> > > +       if ((_gop)->status == GNTST_eagain)                              \
> > > +               gnttab_retry_eagain_map((_gop));                         \
> > > +} while(0)
> > > +
> > > +static inline void
> > > +gnttab_batch_copy_no_eagain(struct gnttab_copy *batch, unsigned count)
> > > +{
> > > +       unsigned i;
> > > +       struct gnttab_copy *op;
> > > +
> > > +       BUG_ON(HYPERVISOR_grant_table_op(GNTTABOP_copy, batch, count));
> > > +       for (i = 0, op = batch; i < count; i++, op++)
> > > +               if (op->status == GNTST_eagain)
> > > +                       gnttab_retry_eagain_copy(op);
> > > +}
> > > +
> > >  int gnttab_map_refs(struct gnttab_map_grant_ref *map_ops,
> > >                     struct gnttab_map_grant_ref *kmap_ops,
> > >                     struct page **pages, unsigned int count);
> > > diff --git a/include/xen/interface/grant_table.h b/include/xen/interface/grant_table.h
> > > index 7da811b..66cb734 100644
> > > --- a/include/xen/interface/grant_table.h
> > > +++ b/include/xen/interface/grant_table.h
> > > @@ -520,6 +520,7 @@ DEFINE_GUEST_HANDLE_STRUCT(gnttab_get_version);
> > >  #define GNTST_permission_denied (-8) /* Not enough privilege for operation.  */
> > >  #define GNTST_bad_page         (-9) /* Specified page was invalid for op.   */
> > >  #define GNTST_bad_copy_arg    (-10) /* copy arguments cross page boundary.  */
> > > +#define GNTST_eagain          (-12) /* Retry.                               */
> > >
> > >  #define GNTTABOP_error_msgs {                   \
> > >         "okay",                                 \
> > > @@ -533,6 +534,7 @@ DEFINE_GUEST_HANDLE_STRUCT(gnttab_get_version);
> > >         "permission denied",                    \
> > >         "bad page",                             \
> > >         "copy arguments cross page boundary"    \
> > > +       "retry"                                 \
> > >  }
> > >
> > >  #endif /* __XEN_PUBLIC_GRANT_TABLE_H__ */
> > >
> > >
> > > On Aug 31, 2012, at 10:45 AM, Andres Lagar-Cavilla wrote:
> > >
> > > >
> > > > On Aug 31, 2012, at 10:32 AM, David Vrabel wrote:
> > > >
> > > >> On 27/08/12 17:51, andres@xxxxxxxxxxxxxxxx wrote:
> > > >>> From: Andres Lagar-Cavilla <andres@xxxxxxxxxxxxxxxx>
> > > >>>
> > > >>> Since Xen 4.2, HVM domains may have portions of their memory paged out. When a
> > > >>> foreign domain (such as dom0) attempts to map these frames, the map will
> > > >>> initially fail. The hypervisor returns a suitable errno and kicks off an
> > > >>> asynchronous page-in operation carried out by a helper. The foreign domain is
> > > >>> expected to retry the mapping operation until it eventually succeeds. The
> > > >>> foreign domain is not put to sleep because it could itself be the one running
> > > >>> the pager assist (the typical scenario for dom0).
> > > >>>
> > > >>> This patch adds support for this mechanism to backend drivers that use grant
> > > >>> mapping and copying operations. Specifically, this covers the blkback and
> > > >>> gntdev drivers (which map foreign grants), and the netback driver (which
> > > >>> copies foreign grants).
> > > >>>
> > > >>> * Add GNTST_eagain, already exposed by Xen, to the grant interface.
> > > >>> * Add a retry method for grants that fail with GNTST_eagain (i.e. because the
> > > >>>   target foreign frame is paged out).
> > > >>> * Insert hooks with appropriate macro decorators in the aforementioned drivers.
> > > >>
> > > >> I think you should implement wrappers around HYPERVISOR_grant_table_op()
> > > >> and have the wrapper do the retries, instead of every backend having to
> > > >> check for EAGAIN and issue the retries itself. Similar to the
> > > >> gnttab_map_grant_no_eagain() function you've already added.
> > > >>
> > > >> Why do some operations not retry anyway?
> > > >
> > > > All operations retry. The reason why I could not make it as elegant as
> > > > you suggest is that grant operations are submitted in batches and their
> > > > statuses are later checked individually elsewhere. This is the case for
> > > > netback. Note that both blkback and gntdev use a more linear structure with
> > > > the gnttab_map_refs helper, which allows me to hide all the retry gore from
> > > > those drivers inside the grant table code. Likewise for the xenbus ring
> > > > mapping.
> > > >
> > > > In summary, outside of core grant table code, only the netback driver
> > > > needs to check explicitly for retries, due to its
> > > > batch-copy-delayed-per-slot-check structure.
> > > >
> > > >>
> > > >>> +void
> > > >>> +gnttab_retry_eagain_gop(unsigned int cmd, void *gop, int16_t *status,
> > > >>> +                       const char *func)
> > > >>> +{
> > > >>> +       u8 delay = 1;
> > > >>> +
> > > >>> +       do {
> > > >>> +               BUG_ON(HYPERVISOR_grant_table_op(cmd, gop, 1));
> > > >>> +               if (*status == GNTST_eagain)
> > > >>> +                       msleep(delay++);
> > > >>> +       } while ((*status == GNTST_eagain) && delay);
> > > >>
> > > >> Terminating the loop when delay wraps is a bit subtle. Why not make
> > > >> delay unsigned and check delay <= MAX_DELAY?
> > > >
> > > > Good idea (MAX_DELAY == 256). I'd like to get Konrad's feedback before
> > > > a re-spin.
> > > >
> > > >>
> > > >> Would it be sensible to ramp the delay faster? Perhaps double it each
> > > >> iteration, with a maximum possible delay of e.g. 256 ms.
> > > >
> > > > Generally speaking, we've never seen it go past three retries. I am open to
> > > > changing the algorithm, but there is a significant possibility it won't
> > > > matter at all.
> > > >
> > > >>
> > > >>> +#define gnttab_map_grant_no_eagain(_gop)                                \
> > > >>> +do {                                                                    \
> > > >>> +       if ( HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, (_gop), 1)) \
> > > >>> +               BUG();                                                   \
> > > >>> +       if ((_gop)->status == GNTST_eagain)                              \
> > > >>> +               gnttab_retry_eagain_map((_gop));                         \
> > > >>> +} while(0)
> > > >>
> > > >> Inline functions, please.
> > > >
> > > > I want to retain the original calling context for debugging. Eventually we
> > > > print __func__ if things go wrong.
> > > >
> > > > Thanks, great feedback.
> > > > Andres
> > > >
> > > >>
> > > >> David
> > >

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
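For completeness: the exponential back-off David suggests above was not adopted in
the v2 patch posted at the top of this thread. Purely to illustrate that alternative,
a reworked retry helper might look roughly like the sketch below. This is an untested
sketch, not code from this thread; the function name, the macro names, and the 32
second overall budget are assumptions carried over from the v2 helper.

#include <linux/bug.h>
#include <linux/delay.h>
#include <linux/kernel.h>
#include <linux/printk.h>
#include <linux/sched.h>
#include <xen/grant_table.h>
#include <asm/xen/hypercall.h>

#define EAGAIN_MAX_DELAY_MS     256     /* longest single sleep, per the suggestion */
#define EAGAIN_BUDGET_MS        32000   /* give up after roughly 32 s, as in v2 */

static void gnttab_retry_eagain_gop_backoff(unsigned int cmd, void *gop,
                                            int16_t *status, const char *func)
{
        unsigned int delay = 1, waited = 0;

        for (;;) {
                BUG_ON(HYPERVISOR_grant_table_op(cmd, gop, 1));
                if (*status != GNTST_eagain)
                        return;
                if (waited >= EAGAIN_BUDGET_MS)
                        break;
                /* Sleep (yielding the vcpu), then double the interval. */
                msleep(delay);
                waited += delay;
                delay = min_t(unsigned int, delay * 2, EAGAIN_MAX_DELAY_MS);
        }

        pr_err("%s: %s eagain grant\n", func, current->comm);
        *status = GNTST_bad_page;
}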