
Re: [Xen-devel] [RFC PATCH v2] xen: free_domheap_pages: delay page scrub to idle loop

>>> On 20.05.14 at 04:15, <lliubbo@xxxxxxxxx> wrote:
> So I use a percpu scrub page list in this patch, the tradeoff is we may not 
> use all idle cpus. It depends on free_domheap_pages() runs on which cpu.

So this means that until all memory can be used again, a completely
arbitrary amount of time can pass, as it depends on the freeing CPU
being idle long enough (many minutes according to your observations)
- even if all the rest of the system was idle.

I see the problem with the lock contention, but I think an at least
slightly more sophisticated solution is going to be needed.

> @@ -633,6 +635,9 @@ static struct page_info *alloc_heap_pages(
>                      goto found;
>          } while ( zone-- > zone_lo ); /* careful: unsigned zone may wrap */
> +        if ( scrub_free_pages() )
> +            continue;

This will recover memory only from the current CPU's list - the larger
the system, the less likely that this will turn up anything. Furthermore
you're creating an almost unbounded loop here - for order > 0, the
ability of scrub_free_pages() to make memory available doesn't mean
that the loop wouldn't come back here on the next iteration.
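To illustrate the point, one way to bound the retry would be to re-run the zone scan at most once after a successful scrub, instead of issuing `continue` every time scrub_free_pages() reports progress. The sketch below is illustrative only (try_alloc_with_scrub() and the function pointers are stand-ins for the zone-scan logic in alloc_heap_pages(), not real Xen code):

```c
/* Hedged sketch: allow one bounded retry after scrubbing, since for
 * order > 0 a non-zero scrub count doesn't guarantee a suitably sized
 * contiguous chunk became available. try_alloc() stands in for the
 * zone scan; scrub() stands in for scrub_free_pages(). */
static int try_alloc_with_scrub(int (*try_alloc)(void),
                                unsigned int (*scrub)(void))
{
    int tried_scrub = 0;

    for ( ; ; )
    {
        if ( try_alloc() )
            return 1;                /* allocation succeeded */
        if ( tried_scrub || !scrub() )
            return 0;                /* nothing left to recover: fail */
        tried_scrub = 1;             /* retry the scan exactly once */
    }
}

/* Illustrative stubs for a self-test: the allocation succeeds only
 * after one round of scrubbing has freed something up. */
static int alloc_state;
static int fake_alloc(void) { return alloc_state > 0; }
static unsigned int fake_scrub(void) { alloc_state = 1; return 1; }
static unsigned int fake_scrub_empty(void) { return 0; }
```

With this shape the allocation path can loop at most twice per call, regardless of how much the scrubber recovers.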

> @@ -1417,6 +1422,23 @@ void free_xenheap_pages(void *v, unsigned int order)
>  #endif
> +unsigned long scrub_free_pages(void)

The return value and the local variable below could easily be
unsigned int.

> +{
> +    struct page_info *pg;
> +    unsigned long nr_scrubed = 0;
> +
> +    /* Scrub around 400M memory every time */
> +    while ( nr_scrubed < 100000 )

Without explanation, such a hard-coded number wouldn't be acceptable
in any case. How long does it take to scrub 400MB on a _slow_ system?
I hope you realize that the amount of work you do here affects the
wakeup latency of a vCPU supposed to run on the given CPU.

> @@ -1564,8 +1586,15 @@ void free_domheap_pages(struct page_info *pg, unsigned 
> int order)
>           * domain has died we assume responsibility for erasure.
>           */
>          if ( unlikely(d->is_dying) )
> +        {
> +            /*
> +             * Add page to page_scrub_list to speed up domain destroy, those
> +             * pages will be zeroed later by scrub_page_tasklet.
> +             */
>              for ( i = 0; i < (1 << order); i++ )
> -                scrub_one_page(&pg[i]);
> +                page_list_add_tail( &pg[i], &this_cpu(page_scrub_list) );
> +            goto out;
> +        }

If done this way, I see no reason why you couldn't add the page in one
chunk to the list (i.e. even if order > 0), by making use of PFN_ORDER()
to communicate the order to the scrubbing routine.
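To make the suggestion concrete, the freeing side could record the order in the head page and do a single list insertion, leaving the scrubber to walk the whole chunk. The sketch below is a simplified stand-in, not Xen code: page_info, the next pointer, and the order field approximate struct page_info, the page list linkage, and PFN_ORDER().

```c
#define MAX_PAGES 64

struct page_info {
    unsigned int order;      /* stand-in for PFN_ORDER(pg) */
    struct page_info *next;  /* stand-in for the page-list linkage */
    int scrubbed;
};

static struct page_info pages[MAX_PAGES];
static struct page_info *scrub_list; /* stand-in for this_cpu(page_scrub_list) */

/* Freeing path: queue a 2^order chunk with one insertion instead of
 * 1 << order individual page_list_add_tail() calls. */
static void queue_for_scrub(struct page_info *pg, unsigned int order)
{
    pg->order = order;
    pg->next = scrub_list;
    scrub_list = pg;
}

/* Scrubbing path: pop one head page and scrub the whole recorded chunk. */
static unsigned int scrub_one_chunk(void)
{
    struct page_info *pg = scrub_list;
    unsigned int i, n;

    if ( !pg )
        return 0;
    scrub_list = pg->next;
    n = 1u << pg->order;
    for ( i = 0; i < n; i++ )
        pg[i].scrubbed = 1;  /* stands in for scrub_one_page(&pg[i]) */
    return n;
}
```

This keeps the order > 0 freeing path O(1) in list operations and moves the per-page work entirely into the scrubber.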

But sending a v2 patch without the conceptual questions being sorted
out seems odd to me anyway. I.e. before sending another version I
think you need to
- explain that the latency gain here outweighs the performance effects
  on other guests,
- explain why alternative approaches (like the suggested flagging of the
  pages as needing scrubbing during freeing, and doing the scrubbing in
  the background as well as on the allocation path) are worse, or at least
  not better than this one (this model may even allow avoiding the
  scrubbing altogether for hypervisor internal allocations).
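The flagging alternative mentioned above could look roughly like the following. This is a hedged sketch: PGC_need_scrub is used here as an illustrative bit and the helpers are hypothetical, not Xen's actual flag layout or API.

```c
/* Illustrative "needs scrub" flag bit; not the real count_info layout. */
#define PGC_need_scrub (1u << 0)

struct page_info {
    unsigned int count_info;
};

/* Freeing path: just set the flag - O(1), no list manipulation, and it
 * works identically for any chunk size. */
static void mark_dirty(struct page_info *pg)
{
    pg->count_info |= PGC_need_scrub;
}

/* Allocation path: scrub only when the consumer actually needs a clean
 * page; an idle-loop scrubber can clear flags in the background too. */
static void alloc_page_for(struct page_info *pg, int needs_clean)
{
    if ( (pg->count_info & PGC_need_scrub) && needs_clean )
    {
        /* ... scrub the page here ... */
    }
    /* Hypervisor-internal allocations (needs_clean == 0) take the page
     * as-is, but the flag must survive so that a later guest-visible
     * allocation still gets it scrubbed. */
    if ( needs_clean )
        pg->count_info &= ~PGC_need_scrub;
}
```

This is how the model could avoid scrubbing altogether for internal allocations while still guaranteeing clean pages to guests.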

