Re: [Xen-devel] [PATCH 3 of 5] Rework locking in the PoD layer
On Wed, 2012-02-01 at 19:56 +0000, Andres Lagar-Cavilla wrote:
> @@ -114,15 +114,20 @@ p2m_pod_cache_add(struct p2m_domain *p2m
> unmap_domain_page(b);
> }
>
> + /* First, take all pages off the domain list */
> lock_page_alloc(p2m);
> -
> - /* First, take all pages off the domain list */
> for(i=0; i < 1 << order ; i++)
> {
> p = page + i;
> page_list_del(p, &d->page_list);
> }
>
> + /* Ensure that the PoD cache has never been emptied.
> + * This may cause "zombie domains" since the page will never be freed. */
> + BUG_ON( d->arch.relmem != RELMEM_not_started );
> +
> + unlock_page_alloc(p2m);
> +
This assert should stay where it is. The idea is to verify after the
fact that the page_list_add_tail()s *have not* raced with emptying the
PoD cache. Having the assert before cannot guarantee that they *will
not* race with emptying the PoD cache. Alternatively, if we're positive
that condition can never happen again, we should just remove the
BUG_ON().
If I recall correctly, I added this because something ended up calling
cache_add() after empty_cache(), potentially with the p2m lock already
held; that's why the BUG_ON() went in to begin with.
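To be concrete, here's roughly the current shape of the function, from
memory and abbreviated, with the check staying after the list
additions:

    lock_page_alloc(p2m);

    /* First, take all pages off the domain list */
    for ( i = 0; i < (1 << order); i++ )
        page_list_del(page + i, &d->page_list);

    /* Then add them to the appropriate pod list */
    switch ( order )
    {
    case PAGE_ORDER_2M:
        page_list_add_tail(page, &p2m->pod.super);
        break;
    case PAGE_ORDER_4K:
        page_list_add_tail(page, &p2m->pod.single);
        break;
    }

    /* Only now can we check, after the fact, that the additions
     * haven't raced with p2m_pod_empty_cache() */
    BUG_ON( d->arch.relmem != RELMEM_not_started );

    unlock_page_alloc(p2m);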
> @@ -922,6 +929,12 @@ p2m_pod_emergency_sweep(struct p2m_domai
> limit = (start > POD_SWEEP_LIMIT) ? (start - POD_SWEEP_LIMIT) : 0;
>
> /* FIXME: Figure out how to avoid superpages */
> + /* NOTE: Promote to globally locking the p2m. This will get complicated
> + * in a fine-grained scenario. Even if we're to lock each gfn
> + * individually we must be careful about recursion limits and
> + * POD_SWEEP_STRIDE. This is why we don't enforce deadlock constraints
> + * between p2m and pod locks */
> + p2m_lock(p2m);
> for ( i=p2m->pod.reclaim_single; i > 0 ; i-- )
> {
> p2m_access_t a;
> @@ -940,7 +953,7 @@ p2m_pod_emergency_sweep(struct p2m_domai
> /* Stop if we're past our limit and we have found *something*.
> *
> * NB that this is a zero-sum game; we're increasing our cache size
> - * by re-increasing our 'debt'. Since we hold the p2m lock,
> + * by re-increasing our 'debt'. Since we hold the pod lock,
> * (entry_count - count) must remain the same. */
> if ( p2m->pod.count > 0 && i < limit )
> break;
> @@ -949,6 +962,7 @@ p2m_pod_emergency_sweep(struct p2m_domai
> if ( j )
> p2m_pod_zero_check(p2m, gfns, j);
>
> + p2m_unlock(p2m);
> p2m->pod.reclaim_single = i ? i - 1 : i;
>
> }
> @@ -965,8 +979,9 @@ p2m_pod_demand_populate(struct p2m_domai
> int i;
>
> ASSERT(gfn_locked_by_me(p2m, gfn));
> + pod_lock(p2m);
>
> - /* This check is done with the p2m lock held. This will make sure that
> + /* This check is done with the pod lock held. This will make sure that
> * even if d->is_dying changes under our feet, p2m_pod_empty_cache()
> * won't start until we're done. */
> if ( unlikely(d->is_dying) )
> @@ -977,6 +992,7 @@ p2m_pod_demand_populate(struct p2m_domai
> * 1GB region to 2MB chunks for a retry. */
> if ( order == PAGE_ORDER_1G )
> {
> + pod_unlock(p2m);
> gfn_aligned = (gfn >> order) << order;
> /* Note that we are supposed to call set_p2m_entry() 512 times to
> * split 1GB into 512 2MB pages here. But We only do once here because
> @@ -1000,11 +1016,15 @@ p2m_pod_demand_populate(struct p2m_domai
>
> /* If we're low, start a sweep */
> if ( order == PAGE_ORDER_2M && page_list_empty(&p2m->pod.super) )
> + /* Note that sweeps scan other ranges in the p2m. In a scenario
> + * in which p2m locks are order-enforced wrt the pod lock and p2m
> + * locks are fine-grained, this will result in a deadlock panic */
> p2m_pod_emergency_sweep_super(p2m);
>
> if ( page_list_empty(&p2m->pod.single) &&
> ( ( order == PAGE_ORDER_4K )
> || (order == PAGE_ORDER_2M &&
> page_list_empty(&p2m->pod.super) ) ) )
> + /* Same comment regarding deadlock applies */
> p2m_pod_emergency_sweep(p2m);
> }
>
Regarding locking on emergency sweeps: I think it should suffice if we
could do the equivalent of a spin_trylock() on each gpfn, and just move
on (not checking that gfn) if it fails; something like the sketch
below. What do you think?
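A completely untested sketch, assuming hypothetical p2m_trylock_gfn()
and p2m_unlock_gfn() helpers:

    for ( i = p2m->pod.reclaim_single; i > 0; i-- )
    {
        /* If someone else holds this gfn, skip it on this sweep
         * rather than blocking on it */
        if ( !p2m_trylock_gfn(p2m, i) )
            continue;

        /* ... existing zero-check logic for gfn i ... */

        p2m_unlock_gfn(p2m, i);
    }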
Other than that, I don't see anything wrong with this locking at first
glance; but it's complicated enough that I don't think I've quite
grokked it yet. I'll look at it again tomorrow and see if things are
more clear. :-)
-George
> @@ -1012,8 +1032,6 @@ p2m_pod_demand_populate(struct p2m_domai
> if ( q == p2m_guest && gfn > p2m->pod.max_guest )
> p2m->pod.max_guest = gfn;
>
> - lock_page_alloc(p2m);
> -
> if ( p2m->pod.count == 0 )
> goto out_of_memory;
>
> @@ -1026,8 +1044,6 @@ p2m_pod_demand_populate(struct p2m_domai
>
> BUG_ON((mfn_x(mfn) & ((1 << order)-1)) != 0);
>
> - unlock_page_alloc(p2m);
> -
> gfn_aligned = (gfn >> order) << order;
>
> set_p2m_entry(p2m, gfn_aligned, mfn, order, p2m_ram_rw,
> p2m->default_access);
> @@ -1038,7 +1054,7 @@ p2m_pod_demand_populate(struct p2m_domai
> paging_mark_dirty(d, mfn_x(mfn) + i);
> }
>
> - p2m->pod.entry_count -= (1 << order); /* Lock: p2m */
> + p2m->pod.entry_count -= (1 << order);
> BUG_ON(p2m->pod.entry_count < 0);
>
> if ( tb_init_done )
> @@ -1056,20 +1072,24 @@ p2m_pod_demand_populate(struct p2m_domai
> __trace_var(TRC_MEM_POD_POPULATE, 0, sizeof(t), &t);
> }
>
> + pod_unlock(p2m);
> return 0;
> out_of_memory:
> - unlock_page_alloc(p2m);
> + pod_unlock(p2m);
>
> printk("%s: Out of populate-on-demand memory! tot_pages %" PRIu32 "
> pod_entries %" PRIi32 "\n",
> __func__, d->tot_pages, p2m->pod.entry_count);
> domain_crash(d);
> out_fail:
> + pod_unlock(p2m);
> return -1;
> remap_and_retry:
> BUG_ON(order != PAGE_ORDER_2M);
> - unlock_page_alloc(p2m);
> + pod_unlock(p2m);
>
> /* Remap this 2-meg region in singleton chunks */
> + /* NOTE: In a p2m fine-grained lock scenario this might
> + * need promoting the gfn lock from gfn->2M superpage */
> gfn_aligned = (gfn>>order)<<order;
> for(i=0; i<(1<<order); i++)
> set_p2m_entry(p2m, gfn_aligned+i, _mfn(0), PAGE_ORDER_4K,
> @@ -1137,9 +1157,11 @@ guest_physmap_mark_populate_on_demand(st
> rc = -EINVAL;
> else
> {
> - p2m->pod.entry_count += 1 << order; /* Lock: p2m */
> + pod_lock(p2m);
> + p2m->pod.entry_count += 1 << order;
> p2m->pod.entry_count -= pod_count;
> BUG_ON(p2m->pod.entry_count < 0);
> + pod_unlock(p2m);
> }
>
> gfn_unlock(p2m, gfn, order);
> diff -r fb9c06df05f2 -r 56ceab0118cb xen/arch/x86/mm/p2m-pt.c
> --- a/xen/arch/x86/mm/p2m-pt.c
> +++ b/xen/arch/x86/mm/p2m-pt.c
> @@ -954,6 +954,7 @@ long p2m_pt_audit_p2m(struct p2m_domain
> struct domain *d = p2m->domain;
>
> ASSERT(p2m_locked_by_me(p2m));
> + ASSERT(pod_locked_by_me(p2m));
>
> test_linear = ( (d == current->domain)
> && !pagetable_is_null(current->arch.monitor_table) );
> diff -r fb9c06df05f2 -r 56ceab0118cb xen/arch/x86/mm/p2m.c
> --- a/xen/arch/x86/mm/p2m.c
> +++ b/xen/arch/x86/mm/p2m.c
> @@ -72,6 +72,7 @@ boolean_param("hap_2mb", opt_hap_2mb);
> static void p2m_initialise(struct domain *d, struct p2m_domain *p2m)
> {
> mm_lock_init(&p2m->lock);
> + mm_lock_init(&p2m->pod.lock);
> INIT_LIST_HEAD(&p2m->np2m_list);
> INIT_PAGE_LIST_HEAD(&p2m->pages);
> INIT_PAGE_LIST_HEAD(&p2m->pod.super);
> @@ -587,8 +588,10 @@ guest_physmap_add_entry(struct domain *d
> rc = -EINVAL;
> else
> {
> - p2m->pod.entry_count -= pod_count; /* Lock: p2m */
> + pod_lock(p2m);
> + p2m->pod.entry_count -= pod_count;
> BUG_ON(p2m->pod.entry_count < 0);
> + pod_unlock(p2m);
> }
> }
>
> @@ -1372,6 +1375,7 @@ p2m_flush_table(struct p2m_domain *p2m)
> /* "Host" p2m tables can have shared entries &c that need a bit more
> * care when discarding them */
> ASSERT(p2m_is_nestedp2m(p2m));
> + /* Nested p2m's do not do pod, hence the asserts (and no pod lock) */
> ASSERT(page_list_empty(&p2m->pod.super));
> ASSERT(page_list_empty(&p2m->pod.single));
>
> @@ -1529,6 +1533,7 @@ void audit_p2m(struct domain *d,
> P2M_PRINTK("p2m audit starts\n");
>
> p2m_lock(p2m);
> + pod_lock(p2m);
>
> if (p2m->audit_p2m)
> pmbad = p2m->audit_p2m(p2m);
> @@ -1589,6 +1594,7 @@ void audit_p2m(struct domain *d,
> }
> spin_unlock(&d->page_alloc_lock);
>
> + pod_unlock(p2m);
> p2m_unlock(p2m);
>
> P2M_PRINTK("p2m audit complete\n");
> diff -r fb9c06df05f2 -r 56ceab0118cb xen/include/asm-x86/p2m.h
> --- a/xen/include/asm-x86/p2m.h
> +++ b/xen/include/asm-x86/p2m.h
> @@ -261,25 +261,12 @@ struct p2m_domain {
> unsigned long max_mapped_pfn;
>
> /* Populate-on-demand variables
> - * NB on locking. {super,single,count} are
> - * covered by d->page_alloc_lock, since they're almost always used in
> - * conjunction with that functionality. {entry_count} is covered by
> - * the domain p2m lock, since it's almost always used in conjunction
> - * with changing the p2m tables.
> - *
> - * At this point, both locks are held in two places. In both,
> - * the order is [p2m,page_alloc]:
> - * + p2m_pod_decrease_reservation() calls p2m_pod_cache_add(),
> - * which grabs page_alloc
> - * + p2m_pod_demand_populate() grabs both; the p2m lock to avoid
> - * double-demand-populating of pages, the page_alloc lock to
> - * protect moving stuff from the PoD cache to the domain page list.
> - *
> - * We enforce this lock ordering through a construct in mm-locks.h.
> - * This demands, however, that we store the previous lock-ordering
> - * level in effect before grabbing the page_alloc lock. The unlock
> - * level is stored in the arch section of the domain struct.
> - */
> + * All variables are protected with the pod lock. We cannot rely on
> + * the p2m lock if it's turned into a fine-grained lock.
> + * We only use the domain page_alloc lock for additions and
> + * deletions to the domain's page list. Because we use it nested
> + * within the PoD lock, we enforce its ordering (by remembering
> + * the unlock level in the arch_domain sub struct). */
> struct {
> struct page_list_head super, /* List of superpages */
> single; /* Non-super lists */
> @@ -288,6 +275,8 @@ struct p2m_domain {
> unsigned reclaim_super; /* Last gpfn of a scan */
> unsigned reclaim_single; /* Last gpfn of a scan */
> unsigned max_guest; /* gpfn of max guest demand-populate */
> + mm_lock_t lock; /* Locking of private pod structs, not relying on the p2m lock. */
> } pod;
> };
>
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel