
Re: [Xen-devel] Bug: privcmd failure in mmaping 160+ pages



2013/8/5 Guanglin Xu <mzguanglin@xxxxxxxxx>:
> 2013/7/31 Andres Lagar-Cavilla <andreslc@xxxxxxxxxxxxxx>:
>> On Jul 31, 2013, at 8:39 PM, Guanglin Xu <mzguanglin@xxxxxxxxx> wrote:
>>
>>> Hi Andres,
>>>
>>> Thanks for your reply. However, I ran into some problems both in
>>> applying your diff and in compiling the resulting privcmd.c (after
>>> patching it by hand).
>>>
>>> Which kernel version are you using? Can you provide the complete
>>> privcmd.c file?
>> I posted this against linux-next.
>
> Hi Andres,
>
> I have successfully applied your patch to linux-next as well, but it
> doesn't solve the problem of mmapping 160+ pages.
>
> My configuration: linux-next 3.11.0-rc3; Xen 4.1
>
> Do you think I should try a Xen release newer than 4.1?
>
>>
>> Please be less vague when posting. For example, what did the final diff
>> look like, and what was your compile error? We can probably fix the
>> compile error easily.
>
> Apologies for the confusion. It's due to my lack of experience in
> building kernels.
>
> 1. The diff didn't apply just because the leading blanks on some context
> lines were missing. For example,
>
> -#ifndef HAVE_ARCH_PRIVCMD_MMAP
> <!!a blank should be here!!>static int privcmd_enforce_singleshot_mapping(struct vm_area_struct *vma);
> -#endif
>
> 2. The compile error was because I hadn't pointed the KDIR variable in
> my Makefile at the right kernel headers.
>
> Correct:    KDIR := /usr/src/linux-headers-3.11.0-rc3-xen-next-20130801
> Incorrect:  KDIR := /root/linux-next
> Incorrect:  KDIR := /lib/modules/$(shell uname -r)/build
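>
> For reference, the kind of minimal out-of-tree module Makefile this
> corresponds to looks roughly like the sketch below (the object name is
> illustrative; the KDIR line is the part that mattered, and the recipe
> lines must be indented with a tab):
>
> obj-m += privcmd.o
>
> KDIR := /usr/src/linux-headers-3.11.0-rc3-xen-next-20130801
>
> all:
> 	$(MAKE) -C $(KDIR) M=$(PWD) modules
>
> clean:
> 	$(MAKE) -C $(KDIR) M=$(PWD) clean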
>
> Thanks,
>
> Guanglin
>
>>
>> Posting the privcmd.c from linux-next would not really help.
>>
>> Thanks,
>> Andres
>>
>>
>>>
>>> Thanks,
>>> Guanglin
>>>
>>> 2013/7/31 Andres Lagar-Cavilla <andreslc@xxxxxxxxxxxxxx>:
>>>> Hi Konrad, David and everybody else,
>>>>
>>>> I have found a privcmd bug: no more than 160 pages can be mmapped via
>>>> IOCTL_PRIVCMD_MMAPBATCH or IOCTL_PRIVCMD_MMAPBATCH_V2.
>>>>
>>>> I started a thread (see below) on the xen-users list and then
>>>> discussed the issue with Ian Campbell, who recommended bringing it to
>>>> the xen-devel list and CC'ing you.
>>>>
>>>> I appreciate your comments.
>>>>
>>>>
>>>> You know, this might have been me, or it might be in the process of
>>>> being fixed by a patch I just posted.
>>>>
>>>> As you may know by now, xc_map_foreign* first sets up a region of virtual
>>>> memory with an mmap call, and then issues the mapping ioctl to populate the
>>>> region.
>>>>
>>>> Let's say for the sake of argument that the mapping operation is broken up
>>>> into pieces. Current checks in the implementation of PRIVCMD_MMAPBATCH*
>>>> prevent you from populating the memory region piecemeal.
>>>>
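>>>> Roughly, the userspace side of that two-step flow looks like the
>>>> sketch below. This is not the actual libxc code: error checking is
>>>> omitted, fd is an already-open /proc/xen/privcmd (or /dev/xen/privcmd)
>>>> descriptor, and the struct and ioctl definitions are assumed to come
>>>> from the privcmd header shipped with the Xen tree.
>>>>
>>>> #include <stddef.h>
>>>> #include <sys/ioctl.h>
>>>> #include <sys/mman.h>
>>>> #include <xen/sys/privcmd.h>
>>>>
>>>> #define PAGE_SIZE 4096
>>>>
>>>> static void *map_foreign(int fd, unsigned int dom,
>>>>                          const xen_pfn_t *pfns, int *err,
>>>>                          unsigned int nr)
>>>> {
>>>>     /* Step 1: reserve a virtual address range backed by privcmd. */
>>>>     void *addr = mmap(NULL, (size_t)nr * PAGE_SIZE,
>>>>                       PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
>>>>
>>>>     /* Step 2: ask privcmd to populate that range with guest frames. */
>>>>     struct privcmd_mmapbatch_v2 batch = {
>>>>         .num  = nr,
>>>>         .dom  = dom,
>>>>         .addr = (unsigned long)addr,
>>>>         .arr  = pfns,   /* guest frame numbers to map */
>>>>         .err  = err,    /* per-page status written back by privcmd */
>>>>     };
>>>>     if (ioctl(fd, IOCTL_PRIVCMD_MMAPBATCH_V2, &batch) < 0)
>>>>         return NULL;    /* inspect err[i] for the per-page reason */
>>>>     return addr;
>>>> }
>>>>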
>>>> So: can you try the patch I pasted below?
>>>>
>>>> If that doesn't fix it, then you'll need to give us the output of those
>>>> printks.
>>>>
>>>> Thanks
>>>> Andres
>>>>
>>>> [PATCH] Xen: Fix retry calls into PRIVCMD_MMAPBATCH*.
>>>>
>>>> When a foreign mapper attempts to map guest frames that are paged out,
>>>> the mapper receives an ENOENT response and will have to try again
>>>> while a helper process pages the target frame back in.
>>>>
>>>> Gating checks on PRIVCMD_MMAPBATCH* ioctl args were preventing retries
>>>> of mapping calls.
>>>>
>>>> Signed-off-by: Andres Lagar-Cavilla <andres@xxxxxxxxxxxxxxxx>
>>>> ---
>>>> drivers/xen/privcmd.c |   32 +++++++++++++++++++++++++++-----
>>>> 1 files changed, 27 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
>>>> index f8e5dd7..44a26c6 100644
>>>> --- a/drivers/xen/privcmd.c
>>>> +++ b/drivers/xen/privcmd.c
>>>> @@ -43,9 +43,12 @@ MODULE_LICENSE("GPL");
>>>>
>>>> #define PRIV_VMA_LOCKED ((void *)1)
>>>>
>>>> -#ifndef HAVE_ARCH_PRIVCMD_MMAP
>>>> static int privcmd_enforce_singleshot_mapping(struct vm_area_struct *vma);
>>>> -#endif
>>>> +
>>>> +static int privcmd_enforce_singleshot_mapping_granular(
>>>> +               struct vm_area_struct *vma,
>>>> +               unsigned long addr,
>>>> +               unsigned long nr_pages);
>>>>
>>>> static long privcmd_ioctl_hypercall(void __user *udata)
>>>> {
>>>> @@ -422,9 +425,9 @@ static long privcmd_ioctl_mmap_batch(void __user *udata, int version)
>>>> vma = find_vma(mm, m.addr);
>>>> if (!vma ||
>>>>    vma->vm_ops != &privcmd_vm_ops ||
>>>> -     (m.addr != vma->vm_start) ||
>>>> -     ((m.addr + (nr_pages << PAGE_SHIFT)) != vma->vm_end) ||
>>>> -     !privcmd_enforce_singleshot_mapping(vma)) {
>>>> +     (m.addr < vma->vm_start) ||
>>>> +     ((m.addr + (nr_pages << PAGE_SHIFT)) > vma->vm_end) ||
>>>> +     !privcmd_enforce_singleshot_mapping_granular(vma, m.addr, nr_pages)) {
>>>> up_write(&mm->mmap_sem);
>>>> ret = -EINVAL;
>>>> goto out;
>>>> @@ -540,11 +543,30 @@ static int privcmd_mmap(struct file *file, struct vm_area_struct *vma)
>>>> return 0;
>>>> }
>>>>
>>>> +/* For MMAP */
>>>> static int privcmd_enforce_singleshot_mapping(struct vm_area_struct *vma)
>>>> {
>>>> return !cmpxchg(&vma->vm_private_data, NULL, PRIV_VMA_LOCKED);
>>>> }
>>>>
>>>> +/* For MMAPBATCH*. This allows asserting the singleshot mapping
>>>> + * on a per pfn/pte basis. Mapping calls that fail with ENOENT
>>>> + * can be then retried until success. */
>>>> +static int enforce_singleshot_mapping_fn(pte_t *pte, struct page *pmd_page,
>>>> +                 unsigned long addr, void *data)
>>>> +{
>>>> + return pte_none(*pte) ? 0 : -EBUSY;
>>>> +}
>>>> +
>>>> +static int privcmd_enforce_singleshot_mapping_granular(
>>>> +            struct vm_area_struct *vma,
>>>> +            unsigned long addr,
>>>> +            unsigned long nr_pages)
>>>> +{
>>>> + return apply_to_page_range(vma->vm_mm, addr, nr_pages << PAGE_SHIFT,
>>>> +               enforce_singleshot_mapping_fn, NULL) == 0;
>>>> +}
>>>> +
>>>> const struct file_operations xen_privcmd_fops = {
>>>> .owner = THIS_MODULE,
>>>> .unlocked_ioctl = privcmd_ioctl,
>>>> --
>>>> 1.7.1
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> Guanglin
>>>>
>>>>
>>>> ---------- Forwarded message ----------
>>>> From: Ian Campbell <Ian.Campbell@xxxxxxxxx>
>>>> Date: 2013/7/30
>>>> Subject: Re: [Xen-users] BUG : privcmd mmap 160+ pages
>>>> To: Guanglin Xu <mzguanglin@xxxxxxxxx>
>>>> Cc: "xen-users@xxxxxxxxxxxxx" <xen-users@xxxxxxxxxxxxx>
>>>>
>>>>
>>>> On Tue, 2013-07-30 at 07:55 -0400, Guanglin Xu wrote:
>>>>
>>>> 2013/7/30 Ian Campbell <Ian.Campbell@xxxxxxxxx>:
>>>>
>>>> On Mon, 2013-07-29 at 16:16 -0400, Guanglin Xu wrote:
>>>>
>>>> Hi all,
>>>>
>>>> I have just found an xc_map_foreign_range() problem in Xen.
>>>>
>>>> xc_map_foreign_range(), a libxc API backed by privcmd (a Xen kernel
>>>> module), can be used to mmap guest VM memory into dom0. However, if we
>>>> mmap more than 160 pages, the mapping fails.
>>>>
>>>> Internally, xc_map_foreign_range() uses ioctl to communicate with
>>>> privcmd. There are two ioctl commands: IOCTL_PRIVCMD_MMAPBATCH
>>>> (legacy) and IOCTL_PRIVCMD_MMAPBATCH_V2 (newer). Both of them
>>>> consistently return 0 (success), but the mappings fail once 160 pages
>>>> have been mapped.
>>>>
>>>> Firstly, when my Linux kernel version was 3.5, IOCTL_PRIVCMD_MMAPBATCH
>>>> was the only ioctl command available: rc = ioctl(fd,
>>>> IOCTL_PRIVCMD_MMAPBATCH, &ioctlx). After 160 pages had been mapped,
>>>> subsequent ioctl calls would set the ioctlx.arr[] items to
>>>> 140737344202616, overwriting the original pfn numbers.
>>>>
>>>> I then updated my Linux kernel to 3.8 to test
>>>> IOCTL_PRIVCMD_MMAPBATCH_V2: rc = ioctl(fd, IOCTL_PRIVCMD_MMAPBATCH_V2,
>>>> &ioctlx). This time, the post-160 ioctl calls would set the
>>>> ioctlx.err[] items to EINVAL.
>>>>
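>>>> For reference, a minimal per-page check at the libxc level looks
>>>> roughly like the following sketch (xc_map_foreign_bulk() fills a
>>>> per-page err[] array; names are illustrative and most error handling
>>>> is trimmed):
>>>>
>>>> #include <stdio.h>
>>>> #include <stdlib.h>
>>>> #include <sys/mman.h>
>>>> #include <xenctrl.h>
>>>>
>>>> /* Map nr frames of domid read-only and report which pages privcmd
>>>>  * rejected. */
>>>> static void probe_mapping(xc_interface *xch, uint32_t domid,
>>>>                           xen_pfn_t *pfns, unsigned int nr)
>>>> {
>>>>     int *err = calloc(nr, sizeof(*err));
>>>>     void *mem = xc_map_foreign_bulk(xch, domid, PROT_READ,
>>>>                                     pfns, err, nr);
>>>>
>>>>     for (unsigned int i = 0; i < nr; i++)
>>>>         if (err[i])
>>>>             printf("page %u (pfn 0x%lx): err=%d\n",
>>>>                    i, (unsigned long)pfns[i], err[i]);
>>>>
>>>>     if (mem)
>>>>         munmap(mem, (size_t)nr * 4096);
>>>>     free(err);
>>>> }
>>>>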
>>>> Although I have inserted printk() calls into privcmd.c to track its
>>>> execution flow, the output showed what looks like a complete path,
>>>> which is quite weird. I have no idea what is happening inside privcmd.
>>>>
>>>> Can anyone figure out this problem?
>>>>
>>>>
>>>> I wouldn't be all that surprised if there was a hardcoded batch size
>>>> limit somewhere in either libxc or the privcmd driver.
>>>>
>>>>
>>>> Hi Ian,
>>>>
>>>> Thank you very much for your reply.
>>>>
>>>> I can confirm that the problem is in privcmd, because I have debugged
>>>> libxc and narrowed the problem down to ioctl(), where privcmd sets
>>>> ioctlx.arr[] (IOCTL_PRIVCMD_MMAPBATCH) or ioctlx.err[]
>>>> (IOCTL_PRIVCMD_MMAPBATCH_V2) to indicate the failure.
>>>>
>>>>
>>>> If you map your memory in batches of e.g. 128 pages, does it all work OK?
>>>>
>>>>
>>>> Yes. For example, [0-127] succeeds. However, the subsequent [128-255]
>>>> fails, because the size of the whole mapped region then exceeds 160
>>>> pages.
>>>>
>>>>
>>>> That's quite interesting.
>>>>
>>>>
>>>> If you want to get to the bottom of the 160 page limit you'll probably
>>>> have to trace through the code looking for hardcoded sizes or limits on
>>>> sizes (e.g. of arrays) or perhaps integer indexes etc which are too
>>>> small and are overflowing.
>>>>
>>>> 160 * the size of the various structs involved doesn't look to be all
>>>> that interesting (i.e. just over a page size boundary or something), but
>>>> that's the sort of direction I would start by looking in.
>>>>
>>>> If you can't spot it by eye then you'll likely have to instrument the
>>>> code paths with prints to try and track the progress of the initially
>>>> supplied buffer through to the hypercall etc.
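>>>>
>>>> (Purely as an illustration, something as small as a temporary print at
>>>> the top of the batch handler shows what each call sees; the lines
>>>> below assume they are dropped into privcmd_ioctl_mmap_batch() in
>>>> drivers/xen/privcmd.c, after the arguments have been copied in:)
>>>>
>>>> /* Illustrative instrumentation only, to be removed afterwards. */
>>>> pr_info("privcmd: mmapbatch v%d: dom=%u addr=%#llx num=%u\n",
>>>>         version, (unsigned int)m.dom,
>>>>         (unsigned long long)m.addr, m.num);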
>>>>
>>>>
>>>> Yes. I have been debugging libxc and instrumenting privcmd, but I
>>>> couldn't find any hardcoded limit or other limiting code.
>>>>
>>>> The libxc code is easier to trace, but privcmd is hard for me, since I
>>>> lack kernel development experience. I couldn't even find where privcmd
>>>> copy_to_user()s or put_user()s the ioctlx.err[] or ioctlx.arr[] items,
>>>> even though the execution path through privcmd_ioctl() looks complete
>>>> according to my printk() output.
>>>>
>>>> Do you have any idea how else privcmd might be setting ioctlx.err[]?
>>>>
>>>>
>>>>
>>>> I'm not familiar with this code. It sounds like this is most probably
>>>> a kernel bug; I'd recommend taking it to the xen-devel list, perhaps
>>>> CC'ing Konrad Wilk and David Vrabel.
>>>>
>>>> Ian.
>>>>
>>>>
>>


Dear Andres and all,

After a few days of exploration, I have realized that the problem I found
is actually about "memory holes". I must clarify that privcmd can
definitely map more than 160 pages; it just can't map certain memory
holes.

For example, in a 1 GB guest memory configuration, where 262144 pages
(4096 bytes per page) should be available, privcmd refuses to map pages
160 ~ 191 and 260096 ~ 262143. Pages 160 ~ 191 correspond to
guest-physical addresses 0xA0000 ~ 0xBFFFF, which according to
www.intel.la/content/dam/www/public/us/en/documents/datasheets/4th-gen-core-family-desktop-vol-2-datasheet.pdf
is the legacy video buffer; and in my configuration, pages 260096 ~
262143 are the 8 MB of memory reserved for VGA just below the 1 GB
boundary.
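
In the meantime, a foreign mapper can paper over the holes itself. The
sketch below is untested and only illustrates the idea: it maps a range
with libxc's xc_map_foreign_bulk() and then overlays an anonymous zero
page on every slot that privcmd refused (the MAP_FIXED overlay is my own
assumption of a workable approach, not something privcmd does).

#include <stdlib.h>
#include <sys/mman.h>
#include <xenctrl.h>

#define PAGE_SIZE 4096

static void *map_with_hole_padding(xc_interface *xch, uint32_t domid,
                                   xen_pfn_t *pfns, unsigned int nr)
{
    int *err = calloc(nr, sizeof(*err));
    char *mem = xc_map_foreign_bulk(xch, domid, PROT_READ,
                                    pfns, err, nr);

    if (mem) {
        for (unsigned int i = 0; i < nr; i++) {
            if (err[i] == 0)
                continue;               /* mapped fine */
            /* Replace the hole with an anonymous zero-filled page so the
             * whole region stays readable instead of faulting. */
            mmap(mem + (size_t)i * PAGE_SIZE, PAGE_SIZE, PROT_READ,
                 MAP_FIXED | MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        }
    }
    free(err);
    return mem;
}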

However, this result is not fully satisfying. I would be grateful for
your thoughts on two further points:

1. Have privcmd add dummy padding for the above memory holes rather than
refusing to map them.
http://lists.xen.org/archives/html/xen-devel/2013-08/msg01052.html

2. Fix Xen so that it no longer reports the wrong memory size for a domU.
http://lists.xen.org/archives/html/xen-users/2013-08/msg00098.html



Thanks,

Guanglin

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
