[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Xen-devel] çåï Re: [PATCH] fix race condition between libvirtd event handling and libxl fd deregister



Hi, Ian

Thanks your comments, I do not think it deeply before it.

I am not sure about your "remaining problem". I guess this problem should not 
happen, 
Because the different registration should refer to different ev(lixl__ev_time, 
lixl__ev_fd).

About the race condition, I still not understand why it occurred. Libvirt 
should not got the event while libxl thought such event is useless. 
Currently, fd is deregister twice, the first time is just after transfer done 
and second time is 
during cleanup function such as bootloader cleanup. If we only do it in the 
cleanup functions,
this race should not happen.

>>> Ian Jackson <Ian.Jackson@xxxxxxxxxxxxx> 12å11æ28æ äå 3:08 >>>
Bamvor Jian Zhang writes ("[PATCH] fix race condition between libvirtd event 
handling and libxl fd deregister"):
> the race condition may be encounted at the following senaro:

Thanks for this report.  You are correct that there is a race here.
Unfortunately it's more complicated than your analysis reveals and I
think your proposed fix is not sufficient.

I have worked up a patch - see below - which I think fixes most of the
problem.  Please see the commit message and the big new comment just
after the definition of OSEVENT_HOOK_VOID for details.

But I think there may be one serious problem remaining.  It seems to
me that entry to libxl__osevent_occurred_timeout can race with the
hook timeout_deregister.  Specifically, the API presented by libxl
does not allow libxl to tell whether a call to
libxl__osevent_occurred_timeout is one which was entered before a call
to libxl__ev_time_deregister and thence timeout_deregister.

Let us suppose that the previous timeout registration is similar to
the new one, but has a different for_app_reg value.

If the call to libxl__osevent_occurred_timeout was entered a long time
ago then it refers to the old timeout registration and the current
timeout registration is not affected, and must still be deregistered if
necessary.  Not deregistering it would leak memory.

If the call was entered "recently" it refers to the current timeout
registration.  That means that the current timeout for_app_reg value
(the libvirt event request) has been freed and deregistering it would
be a memory management error.

Bamvor, do you agree with this analysis ?  If so, what change to the
libxl API syntax or semantics would fix the problem ?

I see that the notion that a timeout event registration by libxl with
the application is automatic deregistered is not expressed clearly in
libxl.h.  So perhaps one fix would be to decree that libxl is supposed
to call timeout_deregister from within
libxl__osevent_occurred_timeout if it wasn't called before.

Thanks,
Ian.


From: Ian Jackson <ian.jackson@xxxxxxxxxxxxx>
Subject: [PATCH] RFC: libxl: fix stale fd/timeout event callback race

DO NOT APPLY

Because there is not necessarily any lock held at the point the
application (eg, libvirt) calls libxl_osevent_occurred_timeout and
..._fd, in a multithreaded program those calls may be arbitrarily
delayed in relation to other activities within the program.

libxl therefore needs to be prepared to receive very old event
callbacks.  Arrange for this to be the case.

Document the problem and the solution in a comment in libxl_event.c
just before the definition of struct libxl__osevent_hook_nexus.

Reported-by: Bamvor Jian Zhang <bjzhang@xxxxxxxx>
Signed-off-by: Ian Jackson <ian.jackson@xxxxxxxxxxxxx>

---
 tools/libxl/libxl_event.c    |  188 +++++++++++++++++++++++++++++++++++------
 tools/libxl/libxl_internal.h |    8 ++-
 2 files changed, 166 insertions(+), 30 deletions(-)

diff --git a/tools/libxl/libxl_event.c b/tools/libxl/libxl_event.c
index 72cb723..7c8719d 100644
--- a/tools/libxl/libxl_event.c
+++ b/tools/libxl/libxl_event.c
@@ -38,23 +38,123 @@
  * The application's registration hooks should be called ONLY via
  * these macros, with the ctx locked.  Likewise all the "occurred"
  * entrypoints from the application should assert(!in_hook);
+ *
+ * Dur+ * evaluated - ev->nexus is guaranteed to be valid and refer to the
+ * nexus which is being used for this event registration.  The
+ * arguments should specify ev->nexus for the for_libxl argument and
+ * ev->nexus->for_app_reg (or a pointer to it) for for_app_reg.
  */
-#define OSEVENT_HOOK_INTERN(retval, hookname, ...) do {                      \
-    if (CTX->osevent_hooks) {                                                \
-        CTX->osevent_in_hook++;                                              \
-        retval CTX->osevent_hooks->hookname(CTX->osevent_user, __VA_ARGS__); \
-        CTX->osevent_in_hook--;                                              \
-    }                                                                        \
+#define OSEVENT_HOOK_INTERN(retval, failedp, evkind, hookop, ...) do {  \
+    if (CTX->osevent_hooks) {                                           \
+        CTX->osevent_in_hook++;                                         \
+        libxl__osevent_hook_nexi *nexi = &CTX->hook_##evkind##_nexi_idle; \
+        osevent_hook_pre_##hookop(gc, ev, nexi, &ev->nexus);            \
+        retval CTX->osevent_hooks->evkind##_##hookop                    \
+            (CTX->osevent_user, __VA_ARGS__);                           \
+        if ((failedp))                                                  \
+            osevent_hook_failed_##hookop(gc, ev, nexi, &ev->nexus);     \
+        CTX->osevent_in_hook--;                                         \
+    }                                                                   \
 } while (0)
 
-#define OSEVENT_HOOK(hookname, ...) ({                                       \
-    int osevent_hook_rc = 0;                                                 \
-    OSEVENT_HOOK_INTERN(osevent_hook_rc = , hookname, __VA_ARGS__);          \
-    osevent_hook_rc;                                                         \
+#define OSEVENT_HOOK(evkind, hookop, ...) ({                   \
+    int osevent_hook_rc = 0;                                    \
+    OSEVENT_HOOK_INTERN(osevent_hook_rc =, !!osevent_hook_rc,   \
+                        evkind, hookop, __VA_ARGS__);          \
+    osevent_hook_rc;                                            \
 })
 
-#define OSEVENT_HOOK_VOID(hookname, ...) \
-    OSEVENT_HOOK_INTERN(/* void */, hookname, __VA_ARGS__)
+#define OSEVENT_HOOK_VOID(evkind, hookop, ...) \
+    OSEVENT_HOOK_INTERN(/* void */, 0, evkind, hookop, __VA_ARGS__)
+
+/*
+ * The application's calls to libxl_osevent_occurred_... may be
+ * indefinitely delayed with respect to the rest of the program (since
+ * they are not necessarily called with any lock held).  So the
+ * for_libxl value we receive may be (almost) arbitrarily old.  All we
+ * know is that it came from this ctx.
+ *
+ * Therefore we may not free the object referred to by any for_libxl
+ * value until we free the whole libxl_ctx.  And if we reuse it we
+ * must be able to tell when an old use turns up, and discard the
+ * stale event.
+ *
+ * Thus we cannot use the ev directly as the for_libxl value - we need
+ * a layer of indirection.
+ *
+ * We do this by keeping a pool of libxl__osevent_hook_nexus structs,
+ * and use pointers to them as for_libxl values.  In fact, there are
+ * two pools: one for fds and one for timeouts.  This ensures that we
+ * don't risk a type error when we upcast nexus->ev.  In each nexus
+ * the ev is either null or points to a valid libxl__ev_time or
+ * libxl__ev_fd, as applicable.
+ *
+ * We /do/ allow ourselves to reassociate an old nexus with a new ev
+ * as otherwise we would have to leak nexi.  (This reassociation
+ * might, of course, be an old ev being reused for a new purpose so
+ * simply comparing the ev pointer is not sufficient.)  Thus the
+ * libxl_osevent_occurred functions need to check that the condition
+ * allegedly signalled by this event actually exists.
+ *
+ * The nexi and the lists are all protected by the ctx lock.
+ */
+ 
+struct libxl__osevent_hook_nexus {
+    LIBXL_SLIST_ENTRY(libxl__osevent_hook_nexus) next;
+};
+
+static void osevent_hook_pre_register(libxl__gc *gc, void *ev,
+                                      libxl__osevent_hook_nexi *nexi_idle,
+                                      libxl__osevent_hook_nexus **nexus_r)
+{
+    libxl__osevent_hook_nexus *nexus = LIBXL_SLIST_FIRST(nexi_idle);
+    if (nexus) {
+        LIBXL_SLIST_REMOVE_HEAD(nexi_idle, next);
+    } else {
+        nexus = libxl__zalloc(NOGC, sizeof(*nexus));
+    }
+    nexus->ev = ev;
+    *nexus_r = nexus;
+}
+static void osevent_hook_pre_deregister(libxl__gc *gc, void *ev,
+                                        libxl__osevent_hook_nexi *nexi_idle,
+                                        libxl__osevent_hook_nexus **nexus)
+{
+    (*nexus)->ev = 0;
+    LIBXL_SLIST_INSERT_HEAD(nexi_idle, *nexus, next);
+}
+static void osevent_hook_pre_modify(libxl__gc *gc, void *ev,
+                                    libxl__osevent_hook_nexi *nexi_idle,
+                                    libxl__osevent_hook_nexus **nexus)
+{
+}
+
+static void osevent_hook_failed_register(libxl__gc *gc, void *ev,
+                                         libxl__osevent_hook_nexi *nexi_idle,
+                                         libxl__osevent_hook_nexus **nexus)
+{
+    osevent_hook_pre_deregister(gc, ev, nexi_idle, nexus);
+}
+static void osevent_hook_failed_deregister(libxl__gc *gc, void *ev,
+                                           libxl__osevent_hook_nexi *nexi_idle,
+                                           libxl__osevent_hook_nexus **nexus)
+{
+    abort();
+}
+static void osevent_hook_failed_modify(libxl__gc *gc, void *ev,
+                                       libxl__osevent_hook_nexi *nexi_idle,
+                                       libxl__osevent_hook_nexus **nexus)
+{
+}
+
+static void *osevent_ev_from_hook_nexus(libxl_ctx *ctx, void *for_libxl)
+{
+    libxl__osevent_hook_nexus *nexus = for_libxl;
+    return nexus->ev;
+}
 
 /*
  * fd events
@@ -72,7 +172,8 @@ int libxl__ev_fd_register(libxl__gc *gc, libxl__ev_fd *ev,
 
     DBG("ev_fd=%p register fd=%d events=%x", ev, fd, events);
 
-    rc = OSEVENT_HOOK(fd_register, fd, &ev->for_app_reg, events, ev);
+    rc = OSEVENT_HOOK(fd, register, fd, &ev->nexus->for_app_reg,
+                      events, ev->nexus);
     if (rc) goto out;
 
     ev->fd = fd;
@@ -97,7 +198,7 @@ int libxl__ev_fd_modify(libxl__gc *gc, libxl__ev_fd *ev, 
short events)
 
     DBG("ev_fd=%p modify fd=%d events=%x", ev, ev->fd, events);
 
-    rc = OSEVENT_HOOK(fd_modify, ev->fd, &ev->for_app_reg, events);
+    rc = OSEVENT_HOOK(fd, modify, ev->fd, &ev->nexus->for_app_reg, events);
     if (rc) goto out;
 
     ev->events = events;
@@ -119,7 +220,7 @@ void libxl__ev_fd_deregister(libxl__gc *gc, libxl__ev_fd 
*ev)
 
     DBG("ev_fd=%p deregister fd=%d", ev, ev->fd);
 
-    OSEVENT_HOOK_VOID(fd_deregister, ev->fd, ev->for_app_reg);
+    OSEVENT_HOOK_VOID(fd, deregister, ev->fd, ev->nexus->for_app_reg);
     LIBXL_LIST_REMOVE(ev, entry);
     ev->fd = -1;
 
@@ -171,7 +272,8 @@ static int time_register_finite(libxl__gc *gc, 
libxl__ev_time *ev,
 {
     int rc;
 
-    rc = OSEVENT_HOOK(timeout_register, &ev->for_app_reg, absolute, ev);
+    rc = OSEVENT_HOOK(timeout, register, &ev->nexus->for_app_reg,
+                      absolute, ev->nexus);
     if (rc) return rc;
 
     ev->infinite = 0;
@@ -184,7 +286,7 @@ static int time_register_finite(libxl__gc *gc, 
libxl__ev_time *ev,
 static void time_deregister(libxl__gc *gc, libxl__ev_time *ev)
 {
     if (!ev->infinite) {
-        OSEVENT_HOOK_VOID(timeout_deregister, ev->for_app_reg);
+        OSEVENT_HOOK_VOID(timeout, deregister, ev->nexus->for_app_reg);
         LIBXL_TAILQ_REMOVE(&CTX->etimes, ev, entry);
     }
 }
@@ -270,7 +372,7 @@ int libxl__ev_time_modify_abs(libxl__gc *gc, libxl__ev_time 
*ev,
         rc = time_register_finite(gc, ev, absolute);
         if (rc) goto out;
     } else {
-        rc = OSEVENT_HOOK(timeout_modify, &ev->for_app_reg, absolute);
+        rc = OSEVENT_HOOK(         LIBXL_TAILQ_REMOVE(&CTX->etimes, ev, entry);
@@ -1010,35 +1112,65 @@ void libxl_osevent_register_hooks(libxl_ctx *ctx,
 
 
 void libxl_osevent_occurred_fd(libxl_ctx *ctx, void *for_libxl,
-                               int fd, short events, short revents)
+                               int fd, short events_ign, short revents_ign)
 {
-    libxl__ev_fd *ev = for_libxl;
-
     EGC_INIT(ctx);
     CTX_LOCK;
     assert(!CTX->osevent_in_hook);
 
-    assert(fd == ev->fd);
-    revents &= ev->events;
-    if (revents)
-        ev->func(egc, ev, fd, ev->events, revents);
+    libxl__ev_fd *ev = osevent_ev_from_hook_nexus(ctx, for_libxl);
+    if (!ev) goto out;
+    if (ev->fd != fd) goto out;
+
+    struct pollfd check;
+    for (;;) {
+        check.fd = fd;
+        check.events = ev->events;
+        int r = poll(&check, 1, 0);
+        if (!r)
+            goto out;
+        if (r==1)
+            break;
+        assert(r<0);
+        if (errno != EINTR) {
+            LIBXL__EVENT_DISASTER(egc, "failed poll to check for fd", errno, 
0);
+            goto out;
+        }
+    }
 
+    if (check.revents)
+        ev->func(egc, ev, fd, ev->events, check.revents);
+
+ out:
     CTX_UNLOCK;
     EGC_FREE;
 }
 
 void libxl_osevent_occurred_timeout(libxl_ctx *ctx, void *for_libxl)
 {
-    libxl__ev_time *ev = for_libxl;
-
     EGC_INIT(ctx);
     CTX_LOCK;
     assert(!CTX->osevent_in_hook);
 
-    assert(!ev->infinite);
+    libxl__ev_time *ev = osevent_ev_from_hook_nexus(ctx, for_libxl);
+    if (!ev) goto out;
+    if (ev->infinite) goto out;
+
+    struct timeval now;
+    int r = libxl__gettimeofday(gc, &now);
+    if (r)
+        /* probably(?) safer to risk a race causing a time event to
+         * happen early than to ignore the event and call DISASTER */
+        goto out;
+
+    if (timercmp(&ev->abs, &now, >))
+        /* lost the race - this is a stale timer event callback */
+        goto out;
+
     LIBXL_TAILQ_REMOVE(&CTX->etimes, ev, entry);
     ev->func(egc, ev, &ev->abs);
 
+ out:
     CTX_UNLOCK;
     EGC_FREE;
 }
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index cba3616..6484bcb 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -136,6 +136,8 @@ typedef struct libxl__gc libxl__gc;
 typedef struct libxl__egc libxl__egc;
 typedef struct libxl__ao libxl__ao;
 typedef struct libxl__aop_occurred libxl__aop_occurred;
+typedef struct libxl__osevent_hook_nexus libxl__osevent_hook_nexus;
+typedef struct libxl__osevent_hook_nexi libxl__osevent_hook_nexi;
 
 _hidden void libxl__alloc_failed(libxl_ctx *, const char *func,
                          size_t nmemb, size_t size) __attribute__((noreturn));
@@ -163,7 +165,7 @@ struct libxl__ev_fd {
     libxl__ev_fd_callback *func;
     /* remainder is private for libxl__ev_fd... */
     LIBXL_LIST_ENTRY(libxl__ev_fd) entry;
-    void *for_app_reg;
+    libxl__osevent_hook_nexus *nexus;
 };
 
 
@@ -178,7 +180,7 @@ struct libxl__ev_time {
     int infinite; /* not registered in list or with app if infinite */
     LIBXL_TAILQ_ENTRY(libxl__ev_time) entry;
     struct timeval abs;
-    void *for_app_reg;
+    libxl__osevent_hook_nexus *nexus;
 };
 
 typedef struct libxl__ev_xswatch libxl__ev_xswatch;
@@ -329,6 +331,8 @@ struct libxl__ctx {
     libxl__poller poller_app; /* libxl_osevent_beforepoll and _afterpoll */
     LIBXL_LIST_HEAD(, libxl__poller) pollers_event, pollers_idle;
 
+    LIBXL_SLIST_HEAD(libxl__osevent_hook_nexi, libxl__osevent_hook_nexus)
+        hook_fd_nexi_idle, hook_timeout_nexi_idle;
     LIBXL_LIST_HEAD(, libxl__ev_fd) efds;
     LIBXL_TAILQ_HEAD(, libxl__ev_time) etimes;
 
-- 
tg: (bfe7025..) t/xen/xl.eventfix2.fdreg.indirect (depends on: base)



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.