[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Xen-devel] Re: [PATCH 1/3] libxc: add xc_gnttab_map_grant_ref_notify



On 09/21/2011 11:25 AM, Ian Campbell wrote:
> On Wed, 2011-09-21 at 16:02 +0100, Daniel De Graaf wrote:
>> On 09/21/2011 06:03 AM, Ian Campbell wrote:
>>> On Mon, 2011-09-19 at 23:43 +0100, Daniel De Graaf wrote:
>>>> @@ -116,4 +116,35 @@ struct ioctl_gntdev_set_max_grants {
>>>>    uint32_t count;
>>>>  };
>>>>  
>>>> +/*
>>>> + * Sets up an unmap notification within the page, so that the other side 
>>>> can do
>>>> + * cleanup if this side crashes. Required to implement cross-domain robust
>>>> + * mutexes or close notification on communication channels.
>>>> + *
>>>> + * Each mapped page only supports one notification; multiple calls 
>>>> referring to
>>>> + * the same page overwrite the previous notification. You must clear the
>>>> + * notification prior to the IOCTL_GNTALLOC_DEALLOC_GREF if you do not 
>>>> want it
>>>> + * to occur.
>>>> + */
>>>> +#define IOCTL_GNTDEV_SET_UNMAP_NOTIFY \
>>>> +_IOC(_IOC_NONE, 'G', 7, sizeof(struct ioctl_gntdev_unmap_notify))
>>>> +struct ioctl_gntdev_unmap_notify {
>>>> +  /* IN parameters */
>>>> +  /* Offset in the file descriptor for a byte within the page (same as
>>>> +   * used in mmap).
>>>
>>> I'm probably being thick but I don't understand what this means, i.e.
>>> what this thing is relative to.
>>
>> This is an offset that acts like a byte offset in the /dev/xen/gntdev device.
>> Conceptually, if /dev/xen/evtchn implemented pwrite() then this offset would
>> be the offset you would pass to modify your target byte.
>>
>> For example, if you use gntdev to map two pages, the first will be at offset
>> 0 and the second at 4096; you would pass 4098 here to indicate the third byte
>> of the second page. The offsets (0, 4096) are returned by the map-grant 
>> ioctls.
> 
> Hmm. I think I was confused because it was an offset into the file
> rather than, say, a virtual address.
> 
> When I map a page how do I know what the offset of it is wrt the file
> descriptor? DO I just have to remember how many pages I mapped an *=
> 4096?

You had the offset at the time you mapped it, because that's what you
passed in the offset parameter to mmap(). Just don't lose the number :)

The xen interfaces do forget the number: for gntdev, they recover it
via IOCTL_GNTDEV_GET_OFFSET_FOR_VADDR and for gntalloc, it is not needed
because the pages are unhooked as part of the map process so the offset
is no longer valid. This technique is now possible for gntdev, but older
kernels will fail the unmap ioctl and may crash if you close anyway.

>>
>>>>  If using UNMAP_NOTIFY_CLEAR_BYTE, this is the byte to
>>>> +   * be cleared. Otherwise, it can be any byte in the page whose
>>>> +   * notification we are adjusting.
>>>> +   */
>>>> +  uint64_t index;
>>>> +  /* Action(s) to take on unmap */
>>>> +  uint32_t action;
>>>> +  /* Event channel to notify */
>>>> +  uint32_t event_channel_port;
>>>
>>> evtchn_port_t ?
>>
>> Using that would require an include dependency on event_channel.h which is
>> not exposed as a userspace-visible header. Linux's evtchn.h also does not
>> use evtchn_port_t (it uses unsigned int).
>>
>> Since this is just a direct copy of the header from the linux source tree,
>> any changes really need to happen there first.
> 
> OK, that's fine as it is then.
> 
>>
>>>> +};
>>>> +
>>>> +/* Clear (set to zero) the byte specified by index */
>>>> +#define UNMAP_NOTIFY_CLEAR_BYTE 0x1
>>>> +/* Send an interrupt on the indicated event channel */
>>>> +#define UNMAP_NOTIFY_SEND_EVENT 0x2
>>>> +
>>>>  #endif /* __LINUX_PUBLIC_GNTDEV_H__ */
>>>> diff --git a/tools/libxc/xc_gnttab.c b/tools/libxc/xc_gnttab.c
>>>> index 4f55fce..3d3c63b 100644
>>>> --- a/tools/libxc/xc_gnttab.c
>>>> +++ b/tools/libxc/xc_gnttab.c
>>>> @@ -18,6 +18,7 @@
>>>>   */
>>>>  
>>>>  #include "xc_private.h"
>>>> +#include <errno.h>
>>>>  
>>>>  int xc_gnttab_op(xc_interface *xch, int cmd, void * op, int op_size, int 
>>>> count)
>>>>  {
>>>> @@ -174,6 +175,28 @@ void *xc_gnttab_map_domain_grant_refs(xc_gnttab *xcg,
>>>>                                                    count, domid, refs, 
>>>> prot);
>>>>  }
>>>>  
>>>> +void *xc_gnttab_map_grant_ref_notify(xc_gnttab *xcg,
>>>> +                                     uint32_t domid,
>>>> +                                     uint32_t ref,
>>>> +                                     uint32_t notify_offset,
>>>> +                                     evtchn_port_t notify_port,
>>>> +                                     int *notify_result)
>>>> +{
>>>> +  if (xcg->ops->u.gnttab.map_grant_ref_notify)
>>>> +          return xcg->ops->u.gnttab.map_grant_ref_notify(xcg, 
>>>> xcg->ops_handle,
>>>> +                  domid, ref, notify_offset, notify_port, notify_result);
>>>> +  else {
>>>> +          void* area = xcg->ops->u.gnttab.map_grant_ref(xcg, 
>>>> xcg->ops_handle,
>>>> +                  domid, ref, PROT_READ|PROT_WRITE);
>>>> +          if (area && notify_result) {
>>>> +                  *notify_result = -1;
>>>> +                  errno = ENOSYS;
>>>> +          }
>>>> +          return area;
>>>> +  }
>>>
>>> I think the new public interface is fine but do we really need a new
>>> internal interface here?
>>>
>>> I think you can just add the notify_* arguments to the existing OSDEP
>>> function and have those OS backends which don't implement that feature
>>> return ENOSYS if notify_offset != 0 (or ~0 or whatever invalid value
>>> works).
>>>
>>> Why doesn't the *_notify variant take a prot argument?
>>
>> At least for the byte-clear portion of the notify, the page must be writable
>> or the notify will not work. I suppose an event channel alone could be used
>> for a read-only mapping, but normally the unmapping of a read-only mapping is
>> not an important event - although I guess you could contrive a use for this
>> type of notificaiton, so there's no reason not to allow it.
> 
> If you combine this the two functions then returning EINVAL for attempts
> to map without PROT_WRITE (or whatever perm is necessary) would be
> reasonable IMHO.
> 

The ioctl already prevents you from requesting the impossible, so this should
just work.

>>> I'd be tempted to do away with notify_result too -- if the caller asked
>>> for notification and we fail to give that then we can cleanup and return
>>> an error. If they want to try again without the notification then that's
>>> up to them.
>>
>> The source of the error might be unclear, but this would make the interface
>> cleaner.
>>
>>>
>>>> +}
>>>> +
>>>> +
>>>>  int xc_gnttab_munmap(xc_gnttab *xcg,
>>>>                       void *start_address,
>>>>                       uint32_t count)
>>>> diff --git a/tools/libxc/xc_linux_osdep.c b/tools/libxc/xc_linux_osdep.c
>>>> index dca6718..3040cb6 100644
>>>> --- a/tools/libxc/xc_linux_osdep.c
>>>> +++ b/tools/libxc/xc_linux_osdep.c
>>>> @@ -613,6 +613,62 @@ static void 
>>>> *linux_gnttab_map_domain_grant_refs(xc_gnttab *xcg, xc_osdep_handle
>>>>      return do_gnttab_map_grant_refs(xcg, h, count, &domid, 0, refs, prot);
>>>>  }
>>>>  
>>>> +static void *linux_gnttab_map_grant_ref_notify(xc_gnttab *xch, 
>>>> xc_osdep_handle h,
>>>> +                                               uint32_t domid, uint32_t 
>>>> ref,
>>>> +                                               uint32_t notify_offset,
>>>> +                                               evtchn_port_t notify_port,
>>>> +                                               int *notify_result)
>>>> +{
>>>> +    int fd = (int)h;
>>>> +    int rv = 0;
>>>> +    struct ioctl_gntdev_map_grant_ref map;
>>>> +    struct ioctl_gntdev_unmap_notify notify;
>>>> +    void *addr;
>>>> +
>>>> +    map.count = 1;
>>>> +    map.refs[0].domid = domid;
>>>> +    map.refs[0].ref = ref;
>>>> +
>>>> +    if ( ioctl(fd, IOCTL_GNTDEV_MAP_GRANT_REF, &map) ) {
>>>> +        PERROR("xc_gnttab_map_grant_ref: ioctl MAP_GRANT_REF failed");
>>>> +        return NULL;
>>>> +    }
>>>> +
>>>> +    addr = mmap(NULL, XC_PAGE_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 
>>>> map.index);
>>>> +    if ( addr == MAP_FAILED )
>>>> +    {
>>>> +        int saved_errno = errno;
>>>> +        struct ioctl_gntdev_unmap_grant_ref unmap_grant;
>>>> +
>>>> +        PERROR("xc_gnttab_map_grant_ref: mmap failed");
>>>> +        unmap_grant.index = map.index;
>>>> +        unmap_grant.count = 1;
>>>> +        ioctl(fd, IOCTL_GNTDEV_UNMAP_GRANT_REF, &unmap_grant);
>>>> +        errno = saved_errno;
>>>> +        return NULL;
>>>> +    }
>>>
>>> The non-notify variant handles EAGAIN, why doesn't this one need to do
>>> so?
>>
>> The non-notify variant doesn't need to handle EAGAIN anymore (with modern
>> kernels, at least... perhaps it should remain for older kernels). Also,
>> do_gnttab_map_grant_refs does not handle EAGAIN.
> 
> OK I guess that is fine (although if you combine them as I suggest it
> comes back?)
> 
> I hadn't noticed that we have both map_gratn_ref and map_grant_refs,
> that seems like an area which could be cleaned up. (not saying you
> should, just noticing it)

Since I'm already rewriting the osdep layer functions, I think I can replace
all 3-4 existing map functions with a single function.

It looks like the current 2.6.32.x xen kernels also aren't returning EAGAIN,
so I'm unsure as to why this support was added. The commit in question is
20689:fe42b16855aa by Grzegorz Milos (committed by Keir Fraser), but I don't
see any discussion on xen-devel for it. It's also unclear why repeating the
request every millisecond in an infinite loop is better than letting the
caller handle an -EAGAIN.

>>
>>>> +
>>>> +    notify.index = map.index;
>>>> +    notify.action = 0;
>>>> +    if (notify_offset >= 0) {
>>>> +        notify.index += notify_offset;
>>>> +        notify.action |= UNMAP_NOTIFY_CLEAR_BYTE;
>>>> +    }
>>>> +    if (notify_port >= 0) {
>>>> +        notify.event_channel_port = notify_port;
>>>> +        notify.action |= UNMAP_NOTIFY_SEND_EVENT;
>>>> +    }
>>>> +    if (notify.action)
>>>> +        rv = ioctl(fd, IOCTL_GNTDEV_SET_UNMAP_NOTIFY, &notify);
>>>
>>> Is there a race if the other end (or this process) dies between the MAP
>>> ioctl and here?
>>>
>>> Ian.
>>>
>>
>> Technically it's a race, but it doesn't impact any reasonable use of the
>> notification. The local process can't actually be using the shared page
>> at this point, and the other side will not be certain that the map has
>> actually succeeded until after the function returns (and it is notified
>> in some way - libvchan changes the notify byte from 2->1 at this point).
>>
>> If the domain whose memory we are mapping crashes, this ioctl will succeed
>> unless the event channel it refers to has already been invalidated - but
>> either way, the notifications are now irrelevant as there is nobody to
>> listen to them.
> 
> OK.
> 
> Thanks,
> Ian.
> 
>>
> 
> 
> 


-- 
Daniel De Graaf
National Security Agency

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.