Re: [Xen-devel] [PATCH 1/6] mmu: Introduce XENMEM_claim_pages (subop of memory ops).
On Tue, Mar 05, 2013 at 12:01:56PM +0000, Tim Deegan wrote:
> Hi,
>
> At 12:47 -0500 on 04 Mar (1362401229), Konrad Rzeszutek Wilk wrote:
> > --- a/xen/common/page_alloc.c
> > +++ b/xen/common/page_alloc.c
> > @@ -252,11 +252,124 @@ static long midsize_alloc_zone_pages;
> >  #define MIDSIZE_ALLOC_FRAC 128
> >
> >  static DEFINE_SPINLOCK(heap_lock);
> > +static long total_unclaimed_pages; /* total outstanding claims by all domains */
>
> Could this field (& associated per-domain stuff) have a better name?

Yes. d->outstanding_pages ?

> AFAICT from the code, this is a count of _claimed_ pages (specifically,
> claimed but not allocated). It caused me to double-take almost every
> time I saw it used in this patch.

The word 'unclaimed' does not help either.

> How about outstanding_claimed_pages, or just outstanding_claims?

outstanding_claims is a great name.

> >  unsigned long domain_adjust_tot_pages(struct domain *d, long pages)
> >  {
> > +    long dom_before, dom_after, dom_claimed, sys_before, sys_after;
> > +
> >      ASSERT(spin_is_locked(&d->page_alloc_lock));
> > -    return d->tot_pages += pages;
> > +    d->tot_pages += pages;
> > +
> > +    /*
> > +     * can test d->claimed_pages race-free because it can only change
> > +     * if d->page_alloc_lock and heap_lock are both held, see also
> > +     * domain_set_unclaimed_pages below
> > +     */
> > +    if ( !d->unclaimed_pages )
> > +        goto out;
> > +
> > +    spin_lock(&heap_lock);
> > +    /* adjust domain unclaimed_pages; may not go negative */
> > +    dom_before = d->unclaimed_pages;
> > +    dom_after = dom_before - pages;
> > +    if ( (dom_before > 0) && (dom_after < 0) )
> > +        dom_claimed = 0;
>
> Why the test for dom_before > 0 ?  I think it might be better to
> BUG_ON() any of these counts ever being negative, rather than to carry
> on regardless.

I put in a BUG_ON(dom_before < 0) and simplified that logic to:

    BUG_ON(dom_before < 0);
    dom_claimed = dom_after < 0 ? 0 : dom_after;

> Or is this meant to be a cunning way of handling the case where
> sizeof (long) == 4 and unclaimed_pages > 2^31 ?  I suspect that
> will fall foul of the undefinedness of signed overflow.
>
> > +    else
> > +        dom_claimed = dom_after;
> > +    d->unclaimed_pages = dom_claimed;
> > +    /* flag accounting bug if system unclaimed_pages would go negative */
> > +    sys_before = total_unclaimed_pages;
> > +    sys_after = sys_before - (dom_before - dom_claimed);
> > +    BUG_ON((sys_before > 0) && (sys_after < 0));
>
> Same question here.

That can certainly be simplified to:

    BUG_ON(sys_after < 0);

> > +/*
> > + * XENMEM_claim_pages flags:
> > + *  normal: claim is successful only if sufficient free pages
> > + *    are available.
> > + *  tmem: claim is successful only if sufficient free pages
> > + *    are available and (if tmem is enabled) hypervisor
> > + *    may also consider tmem "freeable" pages to satisfy the claim.
>
> The 'normal' restriction isn't actually enforced except at claim time.
> Since the gate in alloc_heap_pages is effectively:
>
>   (request > free + freeable - total_reserved)
>
> later allocations (from any source) can take up all the free memory as
> long as freeable >= total_reserved.
>
> Is there a use for it?  Presumably by turning on tmem you're happy with
> the idea that allocations might have to come from freeable memory?

The big thing is *might*. I put this in the code path to explain it better:

+    /* note: The usage of a tmem claim means that allocation from a guest *might*
+     * have to come from freeable memory. Using free memory is always better, if
+     * it is available, than using freeable memory. This flag is for the use
+     * case where the toolstack cannot be constantly aware of the exact current
+     * value of free and/or freeable on each machine and is multi-machine
+     * capable. It can try/fail a "normal" claim on all machines first, and
+     * if the normal claim on all machines fails, "fall back" to a
+     * tmem-flag type claim.
+     *
+     * The memory claimed by a normal claim isn't enforced against "freeable
+     * pages" once the claim is successful. That's by design. Once the claim
+     * has been made, it still can take minutes before the claim is fully
+     * satisfied. Tmem can make use of the unclaimed pages during this time
+     * (to store ephemeral/freeable pages only, not persistent pages).
+     */

Here is the updated patch (with the long git commit description removed).

diff --git a/xen/common/domain.c b/xen/common/domain.c
index b360de1..90ce40b 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -507,6 +507,7 @@ int domain_kill(struct domain *d)
         evtchn_destroy(d);
         gnttab_release_mappings(d);
         tmem_destroy(d->tmem);
+        domain_set_unclaimed_pages(d, 0, 0);
         d->tmem = NULL;
         /* fallthrough */
     case DOMDYING_dying:
diff --git a/xen/common/domctl.c b/xen/common/domctl.c
index b7f6619..c98e99c 100644
--- a/xen/common/domctl.c
+++ b/xen/common/domctl.c
@@ -154,6 +154,7 @@ void getdomaininfo(struct domain *d, struct xen_domctl_getdomaininfo *info)
 
     info->tot_pages = d->tot_pages;
     info->max_pages = d->max_pages;
+    info->outstanding_pages = d->outstanding_pages;
     info->shr_pages = atomic_read(&d->shr_pages);
     info->paged_pages = atomic_read(&d->paged_pages);
     info->shared_info_frame = mfn_to_gmfn(d, virt_to_mfn(d->shared_info));
diff --git a/xen/common/memory.c b/xen/common/memory.c
index 08550ef..aa698b1 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -712,6 +712,37 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
         break;
     }
 
+    case XENMEM_claim_pages:
+        if ( !IS_PRIV(current->domain) )
+            return -EPERM;
+
+        if ( copy_from_guest(&reservation, arg, 1) )
+            return -EFAULT;
+
+        if ( !guest_handle_is_null(reservation.extent_start) )
+            return -EINVAL;
+
+        if ( reservation.extent_order != 0 )
+            return -EINVAL;
+
+        d = rcu_lock_domain_by_id(reservation.domid);
+        if ( d == NULL )
+            return -EINVAL;
+
+        rc = domain_set_unclaimed_pages(d, reservation.nr_extents,
+                                        reservation.mem_flags);
+
+        rcu_unlock_domain(d);
+
+        break;
+
+    case XENMEM_get_unclaimed_pages:
+        if ( !IS_PRIV(current->domain) )
+            return -EPERM;
+
+        rc = get_outstanding_claims();
+        break;
+
     default:
         rc = arch_memory_op(op, arg);
         break;
diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 9e9fb15..73e2392 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -252,11 +252,137 @@ static long midsize_alloc_zone_pages;
 #define MIDSIZE_ALLOC_FRAC 128
 
 static DEFINE_SPINLOCK(heap_lock);
+static long outstanding_claims; /* total outstanding claims by all domains */
 
 unsigned long domain_adjust_tot_pages(struct domain *d, long pages)
 {
+    long dom_before, dom_after, dom_claimed, sys_before, sys_after;
+
     ASSERT(spin_is_locked(&d->page_alloc_lock));
-    return d->tot_pages += pages;
+    d->tot_pages += pages;
+
+    /*
+     * can test d->outstanding_pages race-free because it can only change
+     * if d->page_alloc_lock and heap_lock are both held, see also
+     * domain_set_unclaimed_pages below
+     */
+    if ( !d->outstanding_pages )
+        goto out;
+
+    spin_lock(&heap_lock);
+    /* adjust domain outstanding pages; may not go negative */
+    dom_before = d->outstanding_pages;
+    dom_after = dom_before - pages;
+    BUG_ON(dom_before < 0);
+    dom_claimed = dom_after < 0 ? 0 : dom_after;
+    d->outstanding_pages = dom_claimed;
+    /* flag accounting bug if system outstanding_claims would go negative */
+    sys_before = outstanding_claims;
+    sys_after = sys_before - (dom_before - dom_claimed);
+    BUG_ON(sys_after < 0);
+    outstanding_claims = sys_after;
+    spin_unlock(&heap_lock);
+
+out:
+    return d->tot_pages;
+}
+
+int domain_set_unclaimed_pages(struct domain *d, unsigned long pages,
+                               unsigned long flags)
+{
+    int ret = -ENOMEM;
+    unsigned long claim, avail_pages;
+
+    /*
+     * Sanity check.
+     */
+    switch ( flags ) {
+    case XENMEM_CLAIMF_normal:
+    case XENMEM_CLAIMF_tmem:
+        if ( pages == 0 )
+            return -EINVAL;
+        break;
+    case XENMEM_CLAIMF_cancel:
+        pages = 0;
+        break;
+    default:
+        return -ENOSYS;
+    }
+
+    /*
+     * take the domain's page_alloc_lock, else all d->tot_page adjustments
+     * must always take the global heap_lock rather than only in the much
+     * rarer case that d->outstanding_pages is non-zero
+     */
+    spin_lock(&d->page_alloc_lock);
+    spin_lock(&heap_lock);
+
+    /* pages == 0 means "unset" the claim. */
+    if ( pages == 0 )
+    {
+        outstanding_claims -= d->outstanding_pages;
+        d->outstanding_pages = 0;
+        ret = 0;
+        goto out;
+    }
+
+    /* only one active claim per domain please */
+    if ( d->outstanding_pages )
+    {
+        ret = -EINVAL;
+        goto out;
+    }
+
+    /* disallow a claim that doesn't exceed current tot_pages or that exceeds max_pages */
+    if ( (pages <= d->tot_pages) || (pages > d->max_pages) )
+    {
+        ret = -EINVAL;
+        goto out;
+    }
+
+    /* how much memory is available? */
+    avail_pages = total_avail_pages;
+    /* note: The usage of a tmem claim means that allocation from a guest *might*
+     * have to come from freeable memory. Using free memory is always better, if
+     * it is available, than using freeable memory. This flag is for the use
+     * case where the toolstack cannot be constantly aware of the exact current
+     * value of free and/or freeable on each machine and is multi-machine
+     * capable. It can try/fail a "normal" claim on all machines first, and
+     * if the normal claim on all machines fails, "fall back" to a
+     * tmem-flag type claim.
+     *
+     * The memory claimed by a normal claim isn't enforced against "freeable
+     * pages" once the claim is successful. That's by design. Once the claim
+     * has been made, it still can take minutes before the claim is fully
+     * satisfied. Tmem can make use of the unclaimed pages during this time
+     * (to store ephemeral/freeable pages only, not persistent pages).
+     */
+    if ( flags & XENMEM_CLAIMF_tmem )
+        avail_pages += tmem_freeable_pages();
+    avail_pages -= outstanding_claims;
+
+    /*
+     * note, if domain has already allocated memory before making a claim
+     * then the claim must take tot_pages into account
+     */
+    claim = pages - d->tot_pages;
+    if ( claim > avail_pages )
+        goto out;
+
+    /* yay, claim fits in available memory, stake the claim, success! */
+    d->outstanding_pages = claim;
+    outstanding_claims += d->outstanding_pages;
+    ret = 0;
+
+out:
+    spin_unlock(&heap_lock);
+    spin_unlock(&d->page_alloc_lock);
+    return ret;
+}
+
+long get_outstanding_claims(void)
+{
+    return outstanding_claims;
+}
 
 static unsigned long init_node_heap(int node, unsigned long mfn,
@@ -397,7 +523,7 @@ static void __init setup_low_mem_virq(void)
 static void check_low_mem_virq(void)
 {
     unsigned long avail_pages = total_avail_pages +
-        (opt_tmem ? tmem_freeable_pages() : 0);
+        (opt_tmem ? tmem_freeable_pages() : 0) - outstanding_claims;
 
     if ( unlikely(avail_pages <= low_mem_virq_th) )
     {
@@ -466,6 +592,15 @@ static struct page_info *alloc_heap_pages(
     spin_lock(&heap_lock);
 
     /*
+     * Claimed memory is considered unavailable unless the request
+     * is made by a domain with sufficient unclaimed pages.
+     */
+    if ( (outstanding_claims + request >
+          total_avail_pages + tmem_freeable_pages()) &&
+         (d == NULL || d->outstanding_pages < request) )
+        goto not_found;
+
+    /*
      * TMEM: When available memory is scarce due to tmem absorbing it, allow
      * only mid-size allocations to avoid worst of fragmentation issues.
      * Others try tmem pools then fail. This is a workaround until all
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index deb19db..113b8dc 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -36,7 +36,7 @@
 #include "grant_table.h"
 #include "hvm/save.h"
 
-#define XEN_DOMCTL_INTERFACE_VERSION 0x00000008
+#define XEN_DOMCTL_INTERFACE_VERSION 0x00000009
 
 /*
  * NB. xen_domctl.domain is an IN/OUT parameter for this operation.
@@ -95,6 +95,7 @@ struct xen_domctl_getdomaininfo {
     uint32_t flags;              /* XEN_DOMINF_* */
     uint64_aligned_t tot_pages;
     uint64_aligned_t max_pages;
+    uint64_aligned_t outstanding_pages;
     uint64_aligned_t shr_pages;
     uint64_aligned_t paged_pages;
     uint64_aligned_t shared_info_frame; /* GMFN of shared_info struct */
diff --git a/xen/include/public/memory.h b/xen/include/public/memory.h
index 1c5ca19..5c264ca 100644
--- a/xen/include/public/memory.h
+++ b/xen/include/public/memory.h
@@ -68,6 +68,8 @@ struct xen_memory_reservation {
      *   IN:  GPFN bases of extents to populate with memory
      *   OUT: GMFN bases of extents that were allocated
      *   (NB. This command also updates the mach_to_phys translation table)
+     * XENMEM_claim_pages:
+     *   IN: must be zero
      */
     XEN_GUEST_HANDLE(xen_pfn_t) extent_start;
@@ -430,10 +432,51 @@ typedef struct xen_mem_sharing_op xen_mem_sharing_op_t;
 DEFINE_XEN_GUEST_HANDLE(xen_mem_sharing_op_t);
 
 /*
- * Reserve ops for future/out-of-tree "claim" patches (Oracle)
+ * Attempt to stake a claim for a domain on a quantity of pages
+ * of system RAM, but _not_ assign specific pageframes. Only
+ * arithmetic is performed so the hypercall is very fast and need
+ * not be preemptible, thus sidestepping time-of-check-time-of-use
+ * races for memory allocation. Returns 0 if the hypervisor page
+ * allocator has atomically and successfully claimed the requested
+ * number of pages, else non-zero.
+ *
+ * Any domain may have only one active claim. When sufficient memory
+ * has been allocated to resolve the claim, the claim silently expires.
+ * Claiming zero pages effectively resets any outstanding claim and
+ * is always successful.
+ *
+ * Note that a valid claim may be staked even after memory has been
+ * allocated for a domain. In this case, the claim is not incremental,
+ * i.e. if the domain's tot_pages is 3, and a claim is staked for 10,
+ * only 7 additional pages are claimed.
+ *
+ * Caller must be privileged or the hypercall fails.
  */
 #define XENMEM_claim_pages 24
-#define XENMEM_get_unclaimed_pages 25
+
+/*
+ * XENMEM_claim_pages flags:
+ *  normal: claim is successful only if sufficient free pages
+ *    are available.
+ *  tmem: claim is successful only if sufficient free pages
+ *    are available and (if tmem is enabled) hypervisor
+ *    may also consider tmem "freeable" pages to satisfy the claim.
+ *  cancel: cancel the outstanding claim for the domain.
+ */
+#define XENMEM_CLAIMF_ignored 0
+
+#define _XENMEM_CLAIMF_normal 1
+#define XENMEM_CLAIMF_normal (1U<<_XENMEM_CLAIMF_normal)
+#define _XENMEM_CLAIMF_tmem 2
+#define XENMEM_CLAIMF_tmem (1U<<_XENMEM_CLAIMF_tmem)
+#define _XENMEM_CLAIMF_cancel 3
+#define XENMEM_CLAIMF_cancel (1U<<_XENMEM_CLAIMF_cancel)
+/*
+ * Get the number of pages currently claimed (but not yet "possessed")
+ * across all domains. The caller must be privileged but otherwise
+ * the call never fails.
+ */
+#define XENMEM_get_unclaimed_pages 25
 
 #endif /* defined(__XEN__) || defined(__XEN_TOOLS__) */
diff --git a/xen/include/xen/mm.h b/xen/include/xen/mm.h
index 2f701f5..8a355b6 100644
--- a/xen/include/xen/mm.h
+++ b/xen/include/xen/mm.h
@@ -49,7 +49,11 @@ void free_xenheap_pages(void *v, unsigned int order);
 #define alloc_xenheap_page() (alloc_xenheap_pages(0,0))
 #define free_xenheap_page(v) (free_xenheap_pages(v,0))
 
+/* Claim handling */
 unsigned long domain_adjust_tot_pages(struct domain *d, long pages);
+int domain_set_unclaimed_pages(
+    struct domain *d, unsigned long pages, unsigned long flags);
+long get_outstanding_claims(void);
 
 /* Domain suballocator. These functions are *not* interrupt-safe.*/
 void init_domheap_pages(paddr_t ps, paddr_t pe);
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index e108436..569e76e 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -242,6 +242,7 @@ struct domain
     struct page_list_head page_list;   /* linked list */
     struct page_list_head xenpage_list; /* linked list (size xenheap_pages) */
     unsigned int tot_pages;            /* number of pages currently possesed */
+    unsigned int outstanding_pages;    /* pages claimed but not possessed */
     unsigned int max_pages;            /* maximum value for tot_pages */
     atomic_t shr_pages;                /* number of shared pages */
     atomic_t paged_pages;              /* number of paged-out pages */

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel