[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v2 2/5] libxl: react correctly to POLLHUP



Roger Pau Monne writes ("[PATCH v2 2/5] libxl: react correctly to POLLHUP"):
> When received POLLHUP on datacopier_readable/writable, kill the
> datacopier and call the callback with onwrite=-2. On the bootloader
> callback kill the bootloader process and wait for the callback, but
> without setting the bl->rc error code.
...
> +    if (revents & POLLHUP) {
> +        LOG(DEBUG, "received POLLHUP on %s during copy of %s, "
> +                   "killing the bootloader process",
> +                   dc->readwhat, dc->copywhat);

The datacopier logic is correct here as far as it goes I think, but as
Ian says, the message is wrong.

However I think you need also to check for POLLHUP on reading the
pty.  There are four fd events registered:
  - reading bootloader pty master, POLLHUP expected
  - writing bootloader pty master, POLLHUP expected
  - reading xenconsole client pty master, POLLHUP not expected
  - writing xenconsole client pty master, POLLHUP not expected
(The writing events may not always be registered.)

When the POLLHUP occurs on the bootloader pty master it should be
handled the same way whether we first notice it with respect to the
reading or writing event.  (These correspond to different poll slots
and there is no particular reason why it we would process one of these
before the other - although of course we the writing one is only
present if we have data to write.)

I'm assuming that libxl doesn't get a POLLHUP on the xenconsole pty
master when the client disconnects, because (a) if it did that would
be impossible for libxl to handle properly and (b) xenconsoled doesn't
have any code for this - although it uses select so it's not 100%
analogous.  So in this case I think the POLLHUP should be an error, as
before.


So having thought about it this way, I don't think this interface:

> --- a/tools/libxl/libxl_internal.h
> +++ b/tools/libxl/libxl_internal.h
> @@ -2073,6 +2073,7 @@ typedef struct libxl__datacopier_buf 
> libxl__datacopier_buf;
>   *     errnoval==0 means we got eof and all data was written
>   *     errnoval!=0 means we had a read error, logged
>   * onwrite==-1 means some other internal failure, errnoval not valid, logged
> + * onwrite==-2 means we got a POLLHUP, errnoval not valid, logged
>   * in all cases copier is killed before calling this callback */
>  typedef void libxl__datacopier_callback(libxl__egc *egc,
>       libxl__datacopier_state *dc, int onwrite, int errnoval);

is suitable.  We need to be able to distinguish POLLHUP on the writing
fd from POLLHUP on the reading fd.  That means that you shouldn't
abuse "onwrite" like this.

I can see two possible options:

1. Since we have "int errnoval" and errno values are always positive
you could signal this with a specific -ve value for errnoval.  That
is, call datacopier's callers must now be prepared to deal with this
`magic' value of errnoval which is not a proper errno value.  That is
going to be annoying for other users who will need special error
handling behaviour.

2. Provide a new callback function pointer for POLLHUP:
    struct libxl__datacopier_state {
        ...
        libxl__datacopier_callback *callback;
        libxl__datacopier_callback *callback_pollhup;
        ...
    }
with something like the following spec:
   /* If callback_pollhup!=NULL, POLLHUP will be reported
    * by calling callback_pollhup->(..., onwrite, -1).
    * If callback_pollhup==NULL, POLLHUP will be reported
    * as an internal failure.
    * in all cases copier is killed before calling these callbacks */
and make sure that the other users have callback_pollhup==0.


Thanks,
Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.