
Re: [Xen-devel] [PATCH V5] xen/privcmd.c: improve performance of MMAPBATCH_V2.



I've added Andres since I think this accumulation of ENOENT in err_ptr
is a paging-related thing, or at least I think he understands this
code ;-)

> @@ -89,37 +89,112 @@ static int remap_pte_fn(pte_t *ptep, pgtable_t token, unsigned long addr,
>         struct page *page = info->pages[info->index++];
>         unsigned long pfn = page_to_pfn(page);
>         pte_t pte = pfn_pte(pfn, info->prot);
> -
> -       if (map_foreign_page(pfn, info->fgmfn, info->domid))
> -               return -EFAULT;
> +       int err;
> +       // TODO: We should really batch these updates.
> +       err = map_foreign_page(pfn, *info->fgmfn, info->domid);
> +       *info->err_ptr++ = err;
>         set_pte_at(info->vma->vm_mm, addr, ptep, pte);
> +       info->fgmfn++;
> 
> -       return 0;
> +       return err;

This will cause apply_to_page_range to stop walking and return an error.
AIUI the intention of the err_ptr array is to accumulate the individual
success/error result for every entry in the range. The caller can then
take care of the successes, failures and ENOENTs as appropriate (in
particular it doesn't want to abort a batch because of an ENOENT; it
wants to do as much as possible).
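That accumulate-and-continue contract can be sketched in plain C (a
userspace stand-in; map_range_accumulate, map_fn and fake_map are
illustrative names, not the kernel's API):

```c
#include <assert.h>
#include <errno.h>

/* Illustrative stand-in for a per-page mapping operation. */
typedef int (*map_fn)(unsigned long mfn);

/*
 * Sketch of the intended err_ptr contract: record every entry's result
 * and keep walking, so an -ENOENT on one page never prevents the
 * remaining pages from being mapped. Returns the number of failures;
 * the per-page detail lives in err_ptr[].
 */
static int map_range_accumulate(const unsigned long *mfns, int nr,
                                int *err_ptr, map_fn map_one)
{
    int i, nr_failed = 0;

    for (i = 0; i < nr; i++) {
        err_ptr[i] = map_one(mfns[i]);
        if (err_ptr[i] < 0)
            nr_failed++;    /* note the failure, but do not abort */
    }
    return nr_failed;
}

/* Example map_one: pretend mfn 2 is paged out. */
static int fake_map(unsigned long mfn)
{
    return mfn == 2 ? -ENOENT : 0;
}
```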

On x86 (when err_ptr != NULL) you accumulate all of the errors from
HYPERVISOR_mmu_update rather than aborting on the first one, and this
seems correct to me.
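Stripped of the hypercall machinery, that x86 loop looks roughly like
this (a userspace sketch: fake_call stands in for HYPERVISOR_mmu_update
and all names are illustrative). The key point is that the call reports
how many entries it completed via *done, so on error the loop records
the error against the failing entry, steps past it, and resumes:

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Stand-in for HYPERVISOR_mmu_update: processes entries until one
 * fails, reporting how many succeeded through *done. */
typedef int (*batch_fn)(const int *req, int nr, int *done);

/* Sketch of the accumulate-and-resume loop: returns the number of
 * failed entries, with per-entry results in err_ptr[]. */
static int apply_batch(const int *req, int nr, int *err_ptr, batch_fn call)
{
    int index = 0, batch_left = nr, nr_failed = 0;

    memset(err_ptr, 0, nr * sizeof(*err_ptr));
    while (batch_left) {
        int done = 0;
        int err = call(&req[index], batch_left, &done);

        if (err < 0) {
            err_ptr[index + done] = err;  /* the entry that failed */
            nr_failed++;
            done++;                       /* skip it and resume */
        }
        batch_left -= done;
        index += done;
    }
    return nr_failed;
}

/* Example batch_fn: negative request values fail with -EINVAL. */
static int fake_call(const int *req, int nr, int *done)
{
    int i;

    for (i = 0; i < nr; i++) {
        if (req[i] < 0) {
            *done = i;
            return -EINVAL;
        }
    }
    *done = nr;
    return 0;
}
```

(Returning the failure count here is a deliberate simplification; what
the real function should return is a separate question.)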

>  int xen_unmap_domain_mfn_range(struct vm_area_struct *vma,
> diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
> index 01de35c..323a2ab 100644
> --- a/arch/x86/xen/mmu.c
> +++ b/arch/x86/xen/mmu.c
> [...]

> @@ -2528,23 +2557,98 @@ int xen_remap_domain_mfn_range(struct vm_area_struct *vma,
>                 if (err)
>                         goto out;
> 
> -               err = HYPERVISOR_mmu_update(mmu_update, batch, NULL, domid);
> -               if (err < 0)
> -                       goto out;
> +               /* We record the error for each page that gives an error, but
> +                * continue mapping until the whole set is done */
> +               do {
> +                       err = HYPERVISOR_mmu_update(&mmu_update[index],
> +                                                   batch_left, &done, domid);
> +                       if (err < 0) {
> +                               if (!err_ptr)
> +                                       goto out;
> +                               /* increment done so we skip the error item */
> +                               done++;
> +                               last_err = err_ptr[index] = err;
> +                       }
> +                       batch_left -= done;
> +                       index += done;
> +               } while (batch_left);
> 
>                 nr -= batch;
>                 addr += range;
> +               if (err_ptr)
> +                       err_ptr += batch;
>         }
> 
> -       err = 0;
> +       err = last_err;

This means that if you have 100 failures followed by one success, you
return success overall. Is that intentional? That doesn't seem right.

>  out:
> 
>         xen_flush_tlb_all();
> 
>         return err;
>  }
[...]
> diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
> index 0bbbccb..8f86a44 100644
> --- a/drivers/xen/privcmd.c
> +++ b/drivers/xen/privcmd.c
[...]
> @@ -283,12 +317,10 @@ static int mmap_batch_fn(void *data, void *state)
>         if (xen_feature(XENFEAT_auto_translated_physmap))
>                 cur_page = pages[st->index++];
> 
> -       ret = xen_remap_domain_mfn_range(st->vma, st->va & PAGE_MASK, *mfnp, 1,
> -                                        st->vma->vm_page_prot, st->domain,
> -                                        &cur_page);
> -
> -       /* Store error code for second pass. */
> -       *(st->err++) = ret;
> +       BUG_ON(nr < 0);
> +       ret = xen_remap_domain_mfn_array(st->vma, st->va & PAGE_MASK, mfnp, nr,
> +                                        st->err, st->vma->vm_page_prot,
> +                                        st->domain, &cur_page);
> 
>         /* And see if it affects the global_error. */

The code which follows needs adjustment to cope with the fact that we
now batch. I think it needs to walk st->err and set global_error as
appropriate. This is related to my comment about the return value of
xen_remap_domain_mfn_range too.

I think rather than trying to fabricate some sort of meaningful error
code for an entire batch xen_remap_domain_mfn_range should just return
an indication about whether there were any errors or not and leave it to
the caller to figure out the overall result by looking at the err array.

Perhaps return either the number of errors or the number of successes
(which turns the following if into either (ret) or (ret < nr)
respectively).
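For illustration, the follow-on check could then look something like
this (a sketch, not the actual patch: fold_batch_errors is a made-up
name, and it assumes the remap helper fills a per-page err array). It
walks the array once and folds each entry into global_error, keeping
-ENOENT's special paging meaning:

```c
#include <assert.h>
#include <errno.h>

/*
 * Sketch: fold per-page results into the batch-wide global_error the
 * same way the single-page path did -- -ENOENT is remembered specially
 * (paging), any other failure just marks "at least one error".
 */
static void fold_batch_errors(const int *err, int nr, int *global_error)
{
    int i;

    for (i = 0; i < nr; i++) {
        if (err[i] == -ENOENT)
            *global_error = -ENOENT;
        else if (err[i] < 0 && *global_error == 0)
            *global_error = 1;
    }
}
```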

>         if (ret < 0) {
> @@ -300,7 +332,7 @@ static int mmap_batch_fn(void *data, void *state)
>                                 st->global_error = 1;
>                 }
>         }
> -       st->va += PAGE_SIZE;
> +       st->va += PAGE_SIZE * nr;
> 
>         return 0;
>  }
> @@ -430,8 +462,8 @@ static long privcmd_ioctl_mmap_batch(void __user *udata, int version)
>         state.err           = err_array;
> 
>         /* mmap_batch_fn guarantees ret == 0 */
> -       BUG_ON(traverse_pages(m.num, sizeof(xen_pfn_t),
> -                            &pagelist, mmap_batch_fn, &state));
> +       BUG_ON(traverse_pages_block(m.num, sizeof(xen_pfn_t),
> +                                   &pagelist, mmap_batch_fn, &state));

Can we make traverse_pages and _block common by passing a block size
parameter?
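Something like the following, perhaps (an illustrative userspace sketch
over a flat buffer -- the real traverse_pages walks a list of allocated
pages, and none of these names are the kernel's): one walker that hands
the callback at most `block` records per call, so block == 1 reproduces
the per-record behaviour and a larger block gives the batched variant:

```c
#include <assert.h>
#include <stddef.h>

typedef int (*traverse_fn)(void *records, unsigned nr, void *state);

/*
 * Sketch of a unified traverse: hand the callback at most 'block'
 * records at a time. block == 1 gives the old per-record behaviour;
 * a larger block lets the callback batch its work.
 */
static int traverse_records(unsigned nelem, size_t size, void *records,
                            unsigned block, traverse_fn fn, void *state)
{
    char *p = records;
    unsigned done = 0;

    while (done < nelem) {
        unsigned nr = nelem - done;
        int ret;

        if (nr > block)
            nr = block;
        ret = fn(p, nr, state);
        if (ret)
            return ret;     /* callback aborts the walk */
        p += (size_t)nr * size;
        done += nr;
    }
    return 0;
}

/* Example callback: sums the ints it is handed. */
static int sum_cb(void *records, unsigned nr, void *state)
{
    int *v = records, *sum = state;
    unsigned i;

    for (i = 0; i < nr; i++)
        *sum += v[i];
    return 0;
}
```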

Ian.


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
