Re: [Xen-devel] [PATCH 3 of 5] Rework locking in the PoD layer
On Wed, 2012-02-01 at 19:56 +0000, Andres Lagar-Cavilla wrote:
> @@ -114,15 +114,20 @@ p2m_pod_cache_add(struct p2m_domain *p2m
> unmap_domain_page(b);
> }
>
> + /* First, take all pages off the domain list */
> lock_page_alloc(p2m);
> -
> - /* First, take all pages off the domain list */
> for(i=0; i < 1 << order ; i++)
> {
> p = page + i;
> page_list_del(p, &d->page_list);
> }
>
> + /* Ensure that the PoD cache has never been emptied.
> + * This may cause "zombie domains" since the page will never be freed. */
> + BUG_ON( d->arch.relmem != RELMEM_not_started );
> +
> + unlock_page_alloc(p2m);
> +
This assert should stay where it is. The idea is to verify after the
fact that the page_list_add_tail()s *have not* raced with emptying the
PoD cache. Having the assert before cannot guarantee that they *will
not* race with emptying the PoD cache. Alternatively, if we're positive
that condition can never happen again, we should just remove the
BUG_ON().
If I recall correctly, I added this because something ended up calling
cache_add() after empty_cache(), potentially with the p2m lock already
held; that's why the BUG_ON() went in to begin with.
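To be concrete, here's roughly the current shape of the function, from
memory and abbreviated, with the check staying after the list
additions:

    lock_page_alloc(p2m);

    /* First, take all pages off the domain list */
    for ( i = 0; i < (1 << order); i++ )
        page_list_del(page + i, &d->page_list);

    /* Then add them to the appropriate pod list */
    switch ( order )
    {
    case PAGE_ORDER_2M:
        page_list_add_tail(page, &p2m->pod.super);
        break;
    case PAGE_ORDER_4K:
        page_list_add_tail(page, &p2m->pod.single);
        break;
    }

    /* Only now can we check, after the fact, that the additions
     * haven't raced with p2m_pod_empty_cache() */
    BUG_ON( d->arch.relmem != RELMEM_not_started );

    unlock_page_alloc(p2m);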
> @@ -922,6 +929,12 @@ p2m_pod_emergency_sweep(struct p2m_domai
> limit = (start > POD_SWEEP_LIMIT) ? (start - POD_SWEEP_LIMIT) : 0;
>
> /* FIXME: Figure out how to avoid superpages */
> + /* NOTE: Promote to globally locking the p2m. This will get complicated
> + * in a fine-grained scenario. Even if we're to lock each gfn
> + * individually we must be careful about recursion limits and
> + * POD_SWEEP_STRIDE. This is why we don't enforce deadlock constraints
> + * between p2m and pod locks */
> + p2m_lock(p2m);
> for ( i=p2m->pod.reclaim_single; i > 0 ; i-- )
> {
> p2m_access_t a;
> @@ -940,7 +953,7 @@ p2m_pod_emergency_sweep(struct p2m_domai
> /* Stop if we're past our limit and we have found *something*.
> *
> * NB that this is a zero-sum game; we're increasing our cache size
> - * by re-increasing our 'debt'. Since we hold the p2m lock,
> + * by re-increasing our 'debt'. Since we hold the pod lock,
> * (entry_count - count) must remain the same. */
> if ( p2m->pod.count > 0 && i < limit )
> break;
> @@ -949,6 +962,7 @@ p2m_pod_emergency_sweep(struct p2m_domai
> if ( j )
> p2m_pod_zero_check(p2m, gfns, j);
>
> + p2m_unlock(p2m);
> p2m->pod.reclaim_single = i ? i - 1 : i;
>
> }
> @@ -965,8 +979,9 @@ p2m_pod_demand_populate(struct p2m_domai
> int i;
>
> ASSERT(gfn_locked_by_me(p2m, gfn));
> + pod_lock(p2m);
>
> - /* This check is done with the p2m lock held. This will make sure that
> + /* This check is done with the pod lock held. This will make sure that
> * even if d->is_dying changes under our feet, p2m_pod_empty_cache()
> * won't start until we're done. */
> if ( unlikely(d->is_dying) )
> @@ -977,6 +992,7 @@ p2m_pod_demand_populate(struct p2m_domai
> * 1GB region to 2MB chunks for a retry. */
> if ( order == PAGE_ORDER_1G )
> {
> + pod_unlock(p2m);
> gfn_aligned = (gfn >> order) << order;
> /* Note that we are supposed to call set_p2m_entry() 512 times to
> * split 1GB into 512 2MB pages here. But We only do once here because
> @@ -1000,11 +1016,15 @@ p2m_pod_demand_populate(struct p2m_domai
>
> /* If we're low, start a sweep */
> if ( order == PAGE_ORDER_2M && page_list_empty(&p2m->pod.super) )
> + /* Note that sweeps scan other ranges in the p2m. In a scenario
> + * in which p2m locks are order-enforced wrt the pod lock and p2m
> + * locks are fine-grained, this will result in a deadlock panic */
> p2m_pod_emergency_sweep_super(p2m);
>
> if ( page_list_empty(&p2m->pod.single) &&
> ( ( order == PAGE_ORDER_4K )
> || (order == PAGE_ORDER_2M &&
> page_list_empty(&p2m->pod.super) ) ) )
> + /* Same comment regarding deadlock applies */
> p2m_pod_emergency_sweep(p2m);
> }
>
Regarding locking on emergency sweeps: I think it should suffice if we
could do the equivalent of a spin_trylock() on each gpfn, and just move
on (not checking that gfn) if it fails; something like the sketch
below. What do you think?
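A completely untested sketch, assuming hypothetical p2m_trylock_gfn()
and p2m_unlock_gfn() helpers:

    for ( i = p2m->pod.reclaim_single; i > 0; i-- )
    {
        /* If someone else holds this gfn, skip it on this sweep
         * rather than blocking on it */
        if ( !p2m_trylock_gfn(p2m, i) )
            continue;

        /* ... existing zero-check logic for gfn i ... */

        p2m_unlock_gfn(p2m, i);
    }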
Other than that, I don't see anything wrong with this locking at first
glance; but it's complicated enough that I don't think I've quite
grokked it yet. I'll look at it again tomorrow and see if things are
more clear. :-)
-George
> @@ -1012,8 +1032,6 @@ p2m_pod_demand_populate(struct p2m_domai
> if ( q == p2m_guest && gfn > p2m->pod.max_guest )
> p2m->pod.max_guest = gfn;
>
> - lock_page_alloc(p2m);
> -
> if ( p2m->pod.count == 0 )
> goto out_of_memory;
>
> @@ -1026,8 +1044,6 @@ p2m_pod_demand_populate(struct p2m_domai
>
> BUG_ON((mfn_x(mfn) & ((1 << order)-1)) != 0);
>
> - unlock_page_alloc(p2m);
> -
> gfn_aligned = (gfn >> order) << order;
>
> set_p2m_entry(p2m, gfn_aligned, mfn, order, p2m_ram_rw,
> p2m->default_access);
> @@ -1038,7 +1054,7 @@ p2m_pod_demand_populate(struct p2m_domai
> paging_mark_dirty(d, mfn_x(mfn) + i);
> }
>
> - p2m->pod.entry_count -= (1 << order); /* Lock: p2m */
> + p2m->pod.entry_count -= (1 << order);
> BUG_ON(p2m->pod.entry_count < 0);
>
> if ( tb_init_done )
> @@ -1056,20 +1072,24 @@ p2m_pod_demand_populate(struct p2m_domai
> __trace_var(TRC_MEM_POD_POPULATE, 0, sizeof(t), &t);
> }
>
> + pod_unlock(p2m);
> return 0;
> out_of_memory:
> - unlock_page_alloc(p2m);
> + pod_unlock(p2m);
>
> printk("%s: Out of populate-on-demand memory! tot_pages %" PRIu32 "
> pod_entries %" PRIi32 "\n",
> __func__, d->tot_pages, p2m->pod.entry_count);
> domain_crash(d);
> out_fail:
> + pod_unlock(p2m);
> return -1;
> remap_and_retry:
> BUG_ON(order != PAGE_ORDER_2M);
> - unlock_page_alloc(p2m);
> + pod_unlock(p2m);
>
> /* Remap this 2-meg region in singleton chunks */
> + /* NOTE: In a p2m fine-grained lock scenario this might
> + * need promoting the gfn lock from gfn->2M superpage */
> gfn_aligned = (gfn>>order)<<order;
> for(i=0; i<(1<<order); i++)
> set_p2m_entry(p2m, gfn_aligned+i, _mfn(0), PAGE_ORDER_4K,
> @@ -1137,9 +1157,11 @@ guest_physmap_mark_populate_on_demand(st
> rc = -EINVAL;
> else
> {
> - p2m->pod.entry_count += 1 << order; /* Lock: p2m */
> + pod_lock(p2m);
> + p2m->pod.entry_count += 1 << order;
> p2m->pod.entry_count -= pod_count;
> BUG_ON(p2m->pod.entry_count < 0);
> + pod_unlock(p2m);
> }
>
> gfn_unlock(p2m, gfn, order);
> diff -r fb9c06df05f2 -r 56ceab0118cb xen/arch/x86/mm/p2m-pt.c
> --- a/xen/arch/x86/mm/p2m-pt.c
> +++ b/xen/arch/x86/mm/p2m-pt.c
> @@ -954,6 +954,7 @@ long p2m_pt_audit_p2m(struct p2m_domain
> struct domain *d = p2m->domain;
>
> ASSERT(p2m_locked_by_me(p2m));
> + ASSERT(pod_locked_by_me(p2m));
>
> test_linear = ( (d == current->domain)
> && !pagetable_is_null(current->arch.monitor_table) );
> diff -r fb9c06df05f2 -r 56ceab0118cb xen/arch/x86/mm/p2m.c
> --- a/xen/arch/x86/mm/p2m.c
> +++ b/xen/arch/x86/mm/p2m.c
> @@ -72,6 +72,7 @@ boolean_param("hap_2mb", opt_hap_2mb);
> static void p2m_initialise(struct domain *d, struct p2m_domain *p2m)
> {
> mm_lock_init(&p2m->lock);
> + mm_lock_init(&p2m->pod.lock);
> INIT_LIST_HEAD(&p2m->np2m_list);
> INIT_PAGE_LIST_HEAD(&p2m->pages);
> INIT_PAGE_LIST_HEAD(&p2m->pod.super);
> @@ -587,8 +588,10 @@ guest_physmap_add_entry(struct domain *d
> rc = -EINVAL;
> else
> {
> - p2m->pod.entry_count -= pod_count; /* Lock: p2m */
> + pod_lock(p2m);
> + p2m->pod.entry_count -= pod_count;
> BUG_ON(p2m->pod.entry_count < 0);
> + pod_unlock(p2m);
> }
> }
>
> @@ -1372,6 +1375,7 @@ p2m_flush_table(struct p2m_domain *p2m)
> /* "Host" p2m tables can have shared entries &c that need a bit more
> * care when discarding them */
> ASSERT(p2m_is_nestedp2m(p2m));
> + /* Nested p2m's do not do pod, hence the asserts (and no pod lock) */
> ASSERT(page_list_empty(&p2m->pod.super));
> ASSERT(page_list_empty(&p2m->pod.single));
>
> @@ -1529,6 +1533,7 @@ void audit_p2m(struct domain *d,
> P2M_PRINTK("p2m audit starts\n");
>
> p2m_lock(p2m);
> + pod_lock(p2m);
>
> if (p2m->audit_p2m)
> pmbad = p2m->audit_p2m(p2m);
> @@ -1589,6 +1594,7 @@ void audit_p2m(struct domain *d,
> }
> spin_unlock(&d->page_alloc_lock);
>
> + pod_unlock(p2m);
> p2m_unlock(p2m);
>
> P2M_PRINTK("p2m audit complete\n");
> diff -r fb9c06df05f2 -r 56ceab0118cb xen/include/asm-x86/p2m.h
> --- a/xen/include/asm-x86/p2m.h
> +++ b/xen/include/asm-x86/p2m.h
> @@ -261,25 +261,12 @@ struct p2m_domain {
> unsigned long max_mapped_pfn;
>
> /* Populate-on-demand variables
> - * NB on locking. {super,single,count} are
> - * covered by d->page_alloc_lock, since they're almost always used in
> - * conjunction with that functionality. {entry_count} is covered by
> - * the domain p2m lock, since it's almost always used in conjunction
> - * with changing the p2m tables.
> - *
> - * At this point, both locks are held in two places. In both,
> - * the order is [p2m,page_alloc]:
> - * + p2m_pod_decrease_reservation() calls p2m_pod_cache_add(),
> - * which grabs page_alloc
> - * + p2m_pod_demand_populate() grabs both; the p2m lock to avoid
> - * double-demand-populating of pages, the page_alloc lock to
> - * protect moving stuff from the PoD cache to the domain page list.
> - *
> - * We enforce this lock ordering through a construct in mm-locks.h.
> - * This demands, however, that we store the previous lock-ordering
> - * level in effect before grabbing the page_alloc lock. The unlock
> - * level is stored in the arch section of the domain struct.
> - */
> + * All variables are protected with the pod lock. We cannot rely on
> + * the p2m lock if it's turned into a fine-grained lock.
> + * We only use the domain page_alloc lock for additions and
> + * deletions to the domain's page list. Because we use it nested
> + * within the PoD lock, we enforce its ordering (by remembering
> + * the unlock level in the arch_domain sub struct). */
> struct {
> struct page_list_head super, /* List of superpages */
> single; /* Non-super lists */
> @@ -288,6 +275,8 @@ struct p2m_domain {
> unsigned reclaim_super; /* Last gpfn of a scan */
> unsigned reclaim_single; /* Last gpfn of a scan */
> unsigned max_guest; /* gpfn of max guest demand-populate */
> + mm_lock_t lock; /* Locking of private pod structs, not relying on the p2m lock. */
> } pod;
> };
>
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel