
Re: [Xen-devel] [PATCH V3] Switch from select() to poll() in xenconsoled's IO loop.



On Mon, 2013-01-07 at 10:20 +0000, Ian Campbell wrote:
> On Fri, 2013-01-04 at 17:17 +0000, Wei Liu wrote:
> > In Linux select() typically supports up to 1024 file descriptors. This can 
> > be
> > a problem when user tries to boot up many guests. Switching to poll() has
> > minimum impact on existing code and has better scalibility.
> > 
> > Tracking arrays are dynamically allocated / reallocated. If the tracking
> > arrays fail to expand, we just ignore the incoming fd.
> > 
> > Changes from V2:
> >   * remove unnecessary malloc in initialize_pollfd_arrays
> >   * use ROUND_UP to get new size of arrays
> > 
> > Signed-off-by: Wei Liu <wei.liu2@xxxxxxxxxx>
> > ---
> >  tools/console/daemon/io.c |  147 +++++++++++++++++++++++++++++++++------------
> >  1 file changed, 109 insertions(+), 38 deletions(-)
> > 
> > diff --git a/tools/console/daemon/io.c b/tools/console/daemon/io.c
> > index 48fe151..0c53d30 100644
> > --- a/tools/console/daemon/io.c
> > +++ b/tools/console/daemon/io.c
> > @@ -28,7 +28,7 @@
> >  #include <stdlib.h>
> >  #include <errno.h>
> >  #include <string.h>
> > -#include <sys/select.h>
> > +#include <poll.h>
> >  #include <fcntl.h>
> >  #include <unistd.h>
> >  #include <termios.h>
> > @@ -928,9 +928,87 @@ static void handle_log_reload(void)
> >         }
> >  }
> > 
> > +
> > +/* Should have at least max_fd + 1 elements */
> > +static struct pollfd  *fds;
> > +static struct pollfd **fd_to_pollfd;
> > +static unsigned int current_array_size;
> > +static unsigned int nr_fds;
> > +
> > +static void initialize_pollfd_arrays(void)
> > +{
> > +       fds = NULL;
> > +       fd_to_pollfd = NULL;
> > +       current_array_size = 0;
> > +}
> 
> These can all be done as part of the definition of the variables.
> 
> > +
> > +static void destroy_pollfd_arrays(void)
> > +{
> > +       free(fds);
> > +       free(fd_to_pollfd);
> > +       current_array_size = 0;
> > +}
> 
> Having done the above I'd be inclined to inline this at the single
> callsite.
> 

I left the initialize() there just to pair with destroy(). But as you
suggested, I can eliminate both.

> > +
> > +#define ROUNDUP(_x,_w) (((unsigned long)(_x)+(1UL<<(_w))-1) & ~((1UL<<(_w))-1))
> > +static void set_fds(int fd, short events)
> > +{
> > +       if (current_array_size < fd+1) {
> > +               struct pollfd  *p1 = NULL;
> > +               struct pollfd **p2 = NULL;
> > +               unsigned long newsize;
> > +
> > +               /* Round up to a 2^8 boundary; in practice this just
> > +                * makes newsize larger than current_array_size.
> > +                */
> > +               newsize = ROUNDUP(fd+1, 8);
> > +
> > +               p1 = realloc(fds, sizeof(struct pollfd)*newsize);
> > +               if (!p1)
> > +                       goto fail;
> > +               fds = p1;
> > +
> > +               p2 = realloc(fd_to_pollfd, sizeof(struct pollfd *)*newsize);
> > +               if (!p2)
> > +                       goto fail;
> > +               fd_to_pollfd = p2;
> 
> Do these two arrays really need to be the same size?
> 

Not necessary. I will see what I can do.

> I can see why fd_to_pollfd needs to be as big as the highest fd we are
> polling but the fds array only actually needs to scale with the total
> *number* of fds we are polling. Is it the case that the set of fds we
> are waiting on is pretty densely packed?
> 

It depends. If we bring up many guests and then destroy them all, the
arrays can become sparse. There is currently no way to shrink the
arrays. But see below.

> Or maybe even if the fds are sparse it doesn't matter and this is
> simpler at the cost of a little bit of extra memory usage?
> 
> 

Even if we have thousands of guests, the tracking arrays only occupy a
few megabytes.

> > +
> > +               memset(&fds[0] + current_array_size, 0,
> > +                      sizeof(struct pollfd) * (newsize-current_array_size));
> > +               memset(&fd_to_pollfd[0] + current_array_size, 0,
> > +                      sizeof(struct pollfd *) * (newsize-current_array_size));
> > +               current_array_size = newsize;
> > +       }
> > +
> > +       fds[nr_fds].fd = fd;
> > +       fds[nr_fds].events = events;
> > +       fd_to_pollfd[fd] = &fds[nr_fds];
> > +       nr_fds++;
> > +
> > +       return;
> > +fail:
> > +       dolog(LOG_ERR, "realloc failed, ignoring fd %d\n", fd);
> > +       return;
> > +}
> [...]
> > -               if (log_hv && FD_ISSET(xc_evtchn_fd(xce_handle), &readfds))
> > +               if (log_hv && fd_revents(xc_evtchn_fd(xce_handle)) & POLLIN)
> >                         handle_hv_logs();
> > 
> >                 if (ret <= 0)
> >                         continue;
> > 
> > -               if (FD_ISSET(xs_fileno(xs), &readfds))
> > +               if (fd_revents(xs_fileno(xs)) & POLLIN)
> 
> It seems like both this and the logging fd could be handled with a
> struct pollfd * variable specifically for each. Which combined with...
> 
> >                         handle_xs();
> > 
> >                 for (d = dom_head; d; d = n) {
> >                         n = d->next;
> >                         if (d->event_count < RATE_LIMIT_ALLOWANCE) {
> >                                 if (d->xce_handle != NULL &&
> > -                                   FD_ISSET(xc_evtchn_fd(d->xce_handle),
> > -                                            &readfds))
> > +                                   fd_revents(xc_evtchn_fd(d->xce_handle)) &
> 
> ... adding some struct pollfd pointers to struct domain (for this and
> master_fd) would remove the need for the fd_to_pollfd lookup array
> altogether.
> 

Yes, this is possible. But my original intent was to have as little
impact on the existing code as possible, so I didn't touch struct domain
and chose to add extra tracking facilities to make it just work.

If we hit a poll() bottleneck we can move to libevent or libev, which
would require major surgery on the existing code.

Of course I can do that as well if that suits you.


Wei.


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

