Re: [Xen-devel] Race condition on device add handling in xl devd
On Thu, Feb 28, 2019 at 11:08:37AM +0100, Roger Pau Monné wrote:
> On Mon, Feb 25, 2019 at 12:14:02AM +0100, Marek Marczykowski-Górecki wrote:
> > On Mon, Dec 17, 2018 at 05:09:19PM +0100, Roger Pau Monné wrote:
> > > On Mon, Dec 17, 2018 at 02:42:23PM +0000, Paul Durrant wrote:
> > > > I suspect I must be remembering a XenServer-specific hack^Wpatch then.
> > > > I'd have to dig... it's been a while since I messed with the netif
> > > > state model, which is of course different from the blkif state model.
> > >
> > > Quite likely. With udev scripts it was feasible to only execute
> > > hotplug scripts for vifs with an attached frontend.
> > >
> > > With libxl this is not possible, since hotplug scripts are run during
> > > domain creation, at which point the guest is completely paused.
> > >
> > > I'm not that familiar with bridges and vifs, but maybe the vifs status
> > > can be set to offline until there's a frontend attached in order to
> > > reduce the bridge distributor load? (if that's not already the case).
> >
> > I've found what the problem was, and with some definition of "race
> > condition" it could be named this way.
> > The problem is that for some reason the xenstore watch on device add
> > sometimes does not fire in xl devd. But then, when libxl in dom0
> > times out and removes the device, the xenstore watch in xl devd fires
> > and the hotplug script is called. At this point the device is already
> > gone, so it fails. xl devd then quickly calls the hotplug script a
> > second time, for device removal.
> >
> > I have no idea why this xenstore watch does not fire, but triggering a
> > no-op write into the watched path (to trigger the watch again) works
> > around the problem. I use a xenstore watch in dom0 for that[1] - which
> > works. I suspect something related to KVM nested virtualization (lost
> > interrupt?)...
>
> That's very weird, could you try to run xenstored in dom0 with trace
> enabled [0] in order to try to figure out what's happening?

I've tried already, but it was way too slow (remember it's nested KVM,
which doesn't really improve performance). I hit multiple timeouts even
without hitting this problem. Unfortunately I don't have logs from that
experiment anymore. I can try again...

> I assume this only happens when running nested in KVM?

I'd say so. I'm not entirely sure, because I've seen similar symptoms on
bare metal Xen too in the past, but I think it could be a different
problem, and I haven't seen it in the past 3 months.

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?

Attachment: signature.asc

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel
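
For readers following the thread, the sequence described above (a watch
that does not fire on device add, fires late when the device is removed
after a timeout, and can be nudged with a no-op write) can be sketched as
a toy model. This is only an illustration under stated assumptions: it
does not use the real xenstore API or any libxl code. ToyStore, make_devd,
run_scenario, the drop_next_event flag and the path in PATH are all
hypothetical names invented for this sketch.

```python
# Toy model of the lost-watch sequence. NOT the real xenstore or xl devd.
# All names here (ToyStore, make_devd, run_scenario, PATH) are hypothetical.

PATH = "backend/vif/1/0"  # illustrative path only


class ToyStore:
    """Minimal key/value store with prefix watches."""

    def __init__(self):
        self.data = {}
        self.watches = {}             # prefix -> list of callbacks
        self.drop_next_event = False  # simulates one lost notification

    def watch(self, prefix, callback):
        self.watches.setdefault(prefix, []).append(callback)

    def write(self, path, value):
        self.data[path] = value
        self._notify(path)

    def remove(self, path):
        self.data.pop(path, None)
        self._notify(path)

    def _notify(self, path):
        if self.drop_next_event:
            # Assumption: the event is silently lost exactly once.
            self.drop_next_event = False
            return
        for prefix, callbacks in self.watches.items():
            if path == prefix or path.startswith(prefix + "/"):
                for callback in callbacks:
                    callback(path)


def make_devd(store, log):
    """Return a watch callback that mimics the behaviour described above."""
    seen = set()

    def on_event(path):
        if path in store.data:
            if path not in seen:
                seen.add(path)
                log.append(("add", path, "ok"))
        else:
            if path not in seen:
                # First event for this device arrives after it is gone.
                log.append(("add", path, "failed: device gone"))
            log.append(("remove", path, "ok"))
            seen.discard(path)

    return on_event


def run_scenario(nudge):
    store, log = ToyStore(), []
    store.watch("backend", make_devd(store, log))
    store.drop_next_event = True      # the watch on device add is lost
    store.write(PATH, "1")
    if nudge:
        # Workaround: rewrite the same value to re-trigger the watch.
        store.write(PATH, store.data[PATH])
    else:
        # No workaround: the toolstack times out and removes the device.
        store.remove(PATH)
    return log


if __name__ == "__main__":
    print("without nudge:", run_scenario(nudge=False))
    print("with nudge:   ", run_scenario(nudge=True))
```

Without the nudge the model logs a failed add followed by a removal; with
the no-op rewrite it logs a single successful add. It says nothing about
why the real watch is lost, which is still the open question here.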