|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH 1/6] mmu: Introduce XENMEM_claim_pages (subop of memory ops).
On Tue, Mar 05, 2013 at 12:01:56PM +0000, Tim Deegan wrote:
> Hi,
>
> At 12:47 -0500 on 04 Mar (1362401229), Konrad Rzeszutek Wilk wrote:
> > --- a/xen/common/page_alloc.c
> > +++ b/xen/common/page_alloc.c
> > @@ -252,11 +252,124 @@ static long midsize_alloc_zone_pages;
> > #define MIDSIZE_ALLOC_FRAC 128
> >
> > static DEFINE_SPINLOCK(heap_lock);
> > +static long total_unclaimed_pages; /* total outstanding claims by all
> > domains */
>
> Could this field (& associated per-domain stuff) have a better name?
Yes. d->outstanding_pages ?
> AFAICT from the code, this is a count of _claimed_ pages (specifically,
> claimed but not allocated). It caused me to double-take almost every
> time I saw it used in this patch.
The word 'unclaimed' does not help either.
>
> How about outstanding_claimed_pages, or just outstanding_claims?
outstanding_claims is a great name.
>
> > unsigned long domain_adjust_tot_pages(struct domain *d, long pages)
> > {
> > + long dom_before, dom_after, dom_claimed, sys_before, sys_after;
> > +
> > ASSERT(spin_is_locked(&d->page_alloc_lock));
> > - return d->tot_pages += pages;
> > + d->tot_pages += pages;
> > +
> > + /*
> > + * can test d->claimed_pages race-free because it can only change
> > + * if d->page_alloc_lock and heap_lock are both held, see also
> > + * domain_set_unclaimed_pages below
> > + */
> > + if ( !d->unclaimed_pages )
> > + goto out;
> > +
> > + spin_lock(&heap_lock);
> > + /* adjust domain unclaimed_pages; may not go negative */
> > + dom_before = d->unclaimed_pages;
> > + dom_after = dom_before - pages;
> > + if ( (dom_before > 0) && (dom_after < 0) )
> > + dom_claimed = 0;
>
> Why the test for dom_before > 0 ? I think it might be better to
> BUG_ON() any of these counts ever being negative, rather than to carry
> on regardless.
I put in a BUG_ON( dom_before <= 0 ) and simplified that logic to be
BUG_ON(dom_before < 0);
dom_claimed = dom_after < 0 ? 0 : dom_after;
>
> Or is this meant to be a cunning way of handling the case where
> sizeof (long) == 4 and unclaimed_pages > 2^31 ? I suspect that
> will fall foul of the undefinedness of signed overflow.
>
> > + else
> > + dom_claimed = dom_after;
> > + d->unclaimed_pages = dom_claimed;
> > + /* flag accounting bug if system unclaimed_pages would go negative */
> > + sys_before = total_unclaimed_pages;
> > + sys_after = sys_before - (dom_before - dom_claimed);
> > + BUG_ON((sys_before > 0) && (sys_after < 0));
>
> Same question here.
That can certainly be simplified to be:
+ BUG_ON(sys_after < 0);
>
> > +/*
> > + * XENMEM_claim_pages flags:
> > + * normal: claim is successful only if sufficient free pages
> > + * are available.
> > + * tmem: claim is successful only if sufficient free pages
> > + * are available and (if tmem is enabled) hypervisor
> > + * may also consider tmem "freeable" pages to satisfy the claim.
>
> The 'normal' restriction isn't actually enforced except at claim time.
> Since the gate in alloc_heap_pages is effectively:
>
> (request > free + freeable - total_reserved)
>
> later allocations (from any source) can take up all the free memory as
> long as freeable >= total_reserved).
>
> Is there a use for it? Presumably by turning on tmem you're happy with
> the idea that allocations might have to come from freeable memory?
The big thing is *might*. I put this in the code path to explain better:
/* note; The usage of tmem claim means that allocation from a guest *might*
+ * have to come from freeable memory. Using free memory is always better,
if
+ * it is available, than using freeable memory. This flag is for the use
+ * case where the toolstack cannot be constantly aware of the exact current
+ * value of free and/or freeable on each machine and is multi-machine
+ * capable. It can try/fail a "normal" claim on all machines first then,
+ * and if the normal claim on all machines fail, then "fallback" to a
+ * tmem-flag type claim.
+ *
+ * The memory claimed by a normal claim isn't enforced against "freeable
+ * pages" once the claim is successful. That's by design. Once the claim
+ * has been made, it still can take minutes before the claim is fully
+ * satisfied. Tmem can make use of the unclaimed pages during this time
+ * (to store ephemeral/freeable pages only, not persistent pages).
+ */
Here is the updated patch (with the long git commit description removed).
diff --git a/xen/common/domain.c b/xen/common/domain.c
index b360de1..90ce40b 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -507,6 +507,7 @@ int domain_kill(struct domain *d)
evtchn_destroy(d);
gnttab_release_mappings(d);
tmem_destroy(d->tmem);
+ domain_set_unclaimed_pages(d, 0, 0);
d->tmem = NULL;
/* fallthrough */
case DOMDYING_dying:
diff --git a/xen/common/domctl.c b/xen/common/domctl.c
index b7f6619..c98e99c 100644
--- a/xen/common/domctl.c
+++ b/xen/common/domctl.c
@@ -154,6 +154,7 @@ void getdomaininfo(struct domain *d, struct
xen_domctl_getdomaininfo *info)
info->tot_pages = d->tot_pages;
info->max_pages = d->max_pages;
+ info->outstanding_pages = d->outstanding_pages;
info->shr_pages = atomic_read(&d->shr_pages);
info->paged_pages = atomic_read(&d->paged_pages);
info->shared_info_frame = mfn_to_gmfn(d, virt_to_mfn(d->shared_info));
diff --git a/xen/common/memory.c b/xen/common/memory.c
index 08550ef..aa698b1 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -712,6 +712,37 @@ long do_memory_op(unsigned long cmd,
XEN_GUEST_HANDLE_PARAM(void) arg)
break;
}
+ case XENMEM_claim_pages:
+ if ( !IS_PRIV(current->domain) )
+ return -EPERM;
+
+ if ( copy_from_guest(&reservation, arg, 1) )
+ return -EFAULT;
+
+ if ( !guest_handle_is_null(reservation.extent_start) )
+ return -EINVAL;
+
+ if ( reservation.extent_order != 0 )
+ return -EINVAL;
+
+ d = rcu_lock_domain_by_id(reservation.domid);
+ if ( d == NULL )
+ return -EINVAL;
+
+ rc = domain_set_unclaimed_pages(d, reservation.nr_extents,
+ reservation.mem_flags);
+
+ rcu_unlock_domain(d);
+
+ break;
+
+ case XENMEM_get_unclaimed_pages:
+ if ( !IS_PRIV(current->domain) )
+ return -EPERM;
+
+ rc = get_outstanding_claims();
+ break;
+
default:
rc = arch_memory_op(op, arg);
break;
diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 9e9fb15..73e2392 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -252,11 +252,137 @@ static long midsize_alloc_zone_pages;
#define MIDSIZE_ALLOC_FRAC 128
static DEFINE_SPINLOCK(heap_lock);
+static long outstanding_claims; /* total outstanding claims by all domains */
unsigned long domain_adjust_tot_pages(struct domain *d, long pages)
{
+ long dom_before, dom_after, dom_claimed, sys_before, sys_after;
+
ASSERT(spin_is_locked(&d->page_alloc_lock));
- return d->tot_pages += pages;
+ d->tot_pages += pages;
+
+ /*
+ * can test d->claimed_pages race-free because it can only change
+ * if d->page_alloc_lock and heap_lock are both held, see also
+ * domain_set_unclaimed_pages below
+ */
+ if ( !d->outstanding_pages )
+ goto out;
+
+ spin_lock(&heap_lock);
+ /* adjust domain outstanding pages; may not go negative */
+ dom_before = d->outstanding_pages;
+ dom_after = dom_before - pages;
+ BUG_ON(dom_before < 0);
+ dom_claimed = dom_after < 0 ? 0 : dom_after;
+ d->outstanding_pages = dom_claimed;
+ /* flag accounting bug if system outstanding_claims would go negative */
+ sys_before = outstanding_claims;
+ sys_after = sys_before - (dom_before - dom_claimed);
+ BUG_ON(sys_after < 0);
+ outstanding_claims = sys_after;
+ spin_unlock(&heap_lock);
+
+out:
+ return d->tot_pages;
+}
+
+int domain_set_unclaimed_pages(struct domain *d, unsigned long pages,
+ unsigned long flags)
+{
+ int ret = -ENOMEM;
+ unsigned long claim, avail_pages;
+
+ /*
+ * Sanity check.
+ */
+ switch ( flags ) {
+ case XENMEM_CLAIMF_normal:
+ case XENMEM_CLAIMF_tmem:
+ if ( pages == 0 )
+ return -EINVAL;
+ break;
+ case XENMEM_CLAIMF_cancel:
+ pages = 0;
+ break;
+ default:
+ return -ENOSYS;
+ }
+
+ /*
+ * take the domain's page_alloc_lock, else all d->tot_page adjustments
+ * must always take the global heap_lock rather than only in the much
+ * rarer case that d->outstanding_pages is non-zero
+ */
+ spin_lock(&d->page_alloc_lock);
+ spin_lock(&heap_lock);
+
+ /* pages==0 means "unset" the claim. */
+ if ( pages == 0 )
+ {
+ outstanding_claims -= d->outstanding_pages;
+ d->outstanding_pages = 0;
+ ret = 0;
+ goto out;
+ }
+
+ /* only one active claim per domain please */
+ if ( d->outstanding_pages )
+ {
+ ret = -EINVAL;
+ goto out;
+ }
+
+ /* disallow a claim not exceeding current tot_pages or above max_pages */
+ if ( (pages <= d->tot_pages) || (pages > d->max_pages) )
+ {
+ ret = -EINVAL;
+ goto out;
+ }
+
+ /* how much memory is available? */
+ avail_pages = total_avail_pages;
+ /* note; The usage of tmem claim means that allocation from a guest *might*
+ * have to come from freeable memory. Using free memory is always better,
if
+ * it is available, than using freeable memory. This flag is for the use
+ * case where the toolstack cannot be constantly aware of the exact current
+ * value of free and/or freeable on each machine and is multi-machine
+ * capable. It can try/fail a "normal" claim on all machines first then,
+ * and if the normal claim on all machines fail, then "fallback" to a
+ * tmem-flag type claim.
+ *
+ * The memory claimed by a normal claim isn't enforced against "freeable
+ * pages" once the claim is successful. That's by design. Once the claim
+ * has been made, it still can take minutes before the claim is fully
+ * satisfied. Tmem can make use of the unclaimed pages during this time
+ * (to store ephemeral/freeable pages only, not persistent pages).
+ */
+ if ( flags & XENMEM_CLAIMF_tmem )
+ avail_pages += tmem_freeable_pages();
+ avail_pages -= outstanding_claims;
+
+ /*
+ * note, if domain has already allocated memory before making a claim
+ * then the claim must take tot_pages into account
+ */
+ claim = pages - d->tot_pages;
+ if ( claim > avail_pages )
+ goto out;
+
+ /* yay, claim fits in available memory, stake the claim, success! */
+ d->outstanding_pages = claim;
+ outstanding_claims += d->outstanding_pages;
+ ret = 0;
+
+out:
+ spin_unlock(&heap_lock);
+ spin_unlock(&d->page_alloc_lock);
+ return ret;
+}
+
+long get_outstanding_claims(void)
+{
+ return outstanding_claims;
}
static unsigned long init_node_heap(int node, unsigned long mfn,
@@ -397,7 +523,7 @@ static void __init setup_low_mem_virq(void)
static void check_low_mem_virq(void)
{
unsigned long avail_pages = total_avail_pages +
- (opt_tmem ? tmem_freeable_pages() : 0);
+ (opt_tmem ? tmem_freeable_pages() : 0) - outstanding_claims;
if ( unlikely(avail_pages <= low_mem_virq_th) )
{
@@ -466,6 +592,15 @@ static struct page_info *alloc_heap_pages(
spin_lock(&heap_lock);
/*
+ * Claimed memory is considered unavailable unless the request
+ * is made by a domain with sufficient unclaimed pages.
+ */
+ if ( (outstanding_claims + request >
+ total_avail_pages + tmem_freeable_pages()) &&
+ (d == NULL || d->outstanding_pages < request) )
+ goto not_found;
+
+ /*
* TMEM: When available memory is scarce due to tmem absorbing it, allow
* only mid-size allocations to avoid worst of fragmentation issues.
* Others try tmem pools then fail. This is a workaround until all
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index deb19db..113b8dc 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -36,7 +36,7 @@
#include "grant_table.h"
#include "hvm/save.h"
-#define XEN_DOMCTL_INTERFACE_VERSION 0x00000008
+#define XEN_DOMCTL_INTERFACE_VERSION 0x00000009
/*
* NB. xen_domctl.domain is an IN/OUT parameter for this operation.
@@ -95,6 +95,7 @@ struct xen_domctl_getdomaininfo {
uint32_t flags; /* XEN_DOMINF_* */
uint64_aligned_t tot_pages;
uint64_aligned_t max_pages;
+ uint64_aligned_t outstanding_pages;
uint64_aligned_t shr_pages;
uint64_aligned_t paged_pages;
uint64_aligned_t shared_info_frame; /* GMFN of shared_info struct */
diff --git a/xen/include/public/memory.h b/xen/include/public/memory.h
index 1c5ca19..5c264ca 100644
--- a/xen/include/public/memory.h
+++ b/xen/include/public/memory.h
@@ -68,6 +68,8 @@ struct xen_memory_reservation {
* IN: GPFN bases of extents to populate with memory
* OUT: GMFN bases of extents that were allocated
* (NB. This command also updates the mach_to_phys translation table)
+ * XENMEM_claim_pages:
+ * IN: must be zero
*/
XEN_GUEST_HANDLE(xen_pfn_t) extent_start;
@@ -430,10 +432,51 @@ typedef struct xen_mem_sharing_op xen_mem_sharing_op_t;
DEFINE_XEN_GUEST_HANDLE(xen_mem_sharing_op_t);
/*
- * Reserve ops for future/out-of-tree "claim" patches (Oracle)
+ * Attempt to stake a claim for a domain on a quantity of pages
+ * of system RAM, but _not_ assign specific pageframes. Only
+ * arithmetic is performed so the hypercall is very fast and need
+ * not be preemptible, thus sidestepping time-of-check-time-of-use
+ * races for memory allocation. Returns 0 if the hypervisor page
+ * allocator has atomically and successfully claimed the requested
+ * number of pages, else non-zero.
+ *
+ * Any domain may have only one active claim. When sufficient memory
+ * has been allocated to resolve the claim, the claim silently expires.
+ * Claiming zero pages effectively resets any outstanding claim and
+ * is always successful.
+ *
+ * Note that a valid claim may be staked even after memory has been
+ * allocated for a domain. In this case, the claim is not incremental,
+ * i.e. if the domain's tot_pages is 3, and a claim is staked for 10,
+ * only 7 additional pages are claimed.
+ *
+ * Caller must be privileged or the hypercall fails.
*/
#define XENMEM_claim_pages 24
-#define XENMEM_get_unclaimed_pages 25
+
+/*
+ * XENMEM_claim_pages flags:
+ * normal: claim is successful only if sufficient free pages
+ * are available.
+ * tmem: claim is successful only if sufficient free pages
+ * are available and (if tmem is enabled) hypervisor
+ * may also consider tmem "freeable" pages to satisfy the claim.
+ * cancel: cancel the outstanding claim for the domain.
+ */
+#define XENMEM_CLAIMF_ignored 0
+
+#define _XENMEM_CLAIMF_normal 1
+#define XENMEM_CLAIMF_normal (1U<<_XENMEM_CLAIMF_normal)
+#define _XENMEM_CLAIMF_tmem 2
+#define XENMEM_CLAIMF_tmem (1U<<_XENMEM_CLAIMF_tmem)
+#define _XENMEM_CLAIMF_cancel 3
+#define XENMEM_CLAIMF_cancel (1U<<_XENMEM_CLAIMF_cancel)
+/*
+ * Get the number of pages currently claimed (but not yet "possessed")
+ * across all domains. The caller must be privileged but otherwise
+ * the call never fails.
+ */
+#define XENMEM_get_unclaimed_pages 25
#endif /* defined(__XEN__) || defined(__XEN_TOOLS__) */
diff --git a/xen/include/xen/mm.h b/xen/include/xen/mm.h
index 2f701f5..8a355b6 100644
--- a/xen/include/xen/mm.h
+++ b/xen/include/xen/mm.h
@@ -49,7 +49,11 @@ void free_xenheap_pages(void *v, unsigned int order);
#define alloc_xenheap_page() (alloc_xenheap_pages(0,0))
#define free_xenheap_page(v) (free_xenheap_pages(v,0))
+/* Claim handling */
unsigned long domain_adjust_tot_pages(struct domain *d, long pages);
+int domain_set_unclaimed_pages(
+ struct domain *d, unsigned long pages, unsigned long flags);
+long get_outstanding_claims(void);
/* Domain suballocator. These functions are *not* interrupt-safe.*/
void init_domheap_pages(paddr_t ps, paddr_t pe);
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index e108436..569e76e 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -242,6 +242,7 @@ struct domain
struct page_list_head page_list; /* linked list */
struct page_list_head xenpage_list; /* linked list (size xenheap_pages) */
unsigned int tot_pages; /* number of pages currently possesed */
+ unsigned int outstanding_pages; /* pages claimed but not possessed */
unsigned int max_pages; /* maximum value for tot_pages */
atomic_t shr_pages; /* number of shared pages */
atomic_t paged_pages; /* number of paged-out pages */
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |