[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [BUG] xl devd segmentation fault on xl block-detach
Wei Liu writes ("Re: [Xen-devel] [BUG] xl devd segmentation fault on xl block-detach"): > Can you give the following patch a try? This applies to 4.8. > > Not sure if there is a better way to fix it though. Ian and Roger? I find the logic here rather awkward. I do remember reviewing it and becoming a bit confused at the time and it seems that even though I eventually convinced myself it was OK, I was wrong. > From: Wei Liu <wei.liu2@xxxxxxxxxx> > Date: Wed, 3 May 2017 17:55:42 +0100 > Subject: [PATCH] libxl: fix backend_watch_callback > > That function needs to cope with spurious events. The original "skip" > path blindly freed dguest even when it needed to stay in ddomain list. > > Free dguest iff it is newly added to the list. That way we don't free > the one that should stay on the list and we don't unnecessarily add a > stale dguest entry to ddomain list. AFAICT right now you are right. But I see another possible way of fixing it: How about moving the num_devs == 0 check, and associated cleanup, to the exit path ? That way a if new guest struct is spuriously allocated, it will automatically be freed. It would mean that the freeing of dguest would depend only on other invariants already in the code, rather than on explicit tracking. The invariants are, I think: * Any libxl__ddomain_device is either * on some list libxl__ddomain_guest->devices * being processed for removal, and referenced by a device remove async call initiated by remove_device and which will call device_complete() when done but not both! * Any libxl__domain_guest is on the list libxl__ddomain->guests. The above apply even within any function, except very briefly when transitioning from one state to another (eg creation, destruction). * SUM(libxl__domain_guest->num_*) != 0, when we return from the outermost callback. (Ie, there are no leftover empty guest structs.) Thinking about this like this, and observing the control flow, leads me to think I have found another bug. Consider what happens if a device is removed while it is still being added. That is, an event comes in which causes us to call add_device. add_device sets up the callback and starts doing work (eg hotplug scripts). Before that finishes, the device is removed again. backend_watch_callback will tear the device down and free dev. But dev is still referenced by the add_device operation, and when it completes, device_complete will call libxl__device_backend_path(gc, aodev->dev) There ought to be a (perhaps implicit) invariant that * Any dev referenced by an aodev call is legit But this invariant is violated by backend_watch_callback, which frees it despite it not knowing whether there is a callback in flight. Perhaps we should do explicit reference counting. Ian. _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx https://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |