[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: possible kernel/libxl race with xl network-attach
On Thu, Jan 13, 2022 at 11:19:46AM +0000, James Dingwall wrote: > Hi, > > I have been trying to debug a problem where a vif with the backend in a > driver domain is added to dom0. Intermittently the hotplug script is > not invoked by libxl (running as xl devd) in the driver domain. By > enabling some debug for the driver domain kernel and libxl I have these > messages: > > driver domain kernel (Ubuntu 5.4.0-92-generic): > > [Thu Jan 13 01:39:31 2022] [1408] 564: vif vif-0-0 vif0.0: Successfully > created xenvif > [Thu Jan 13 01:39:31 2022] [26] 583: xen_netback:frontend_changed: > /local/domain/0/device/vif/0 -> Initialising > [Thu Jan 13 01:39:31 2022] [26] 470: xen_netback:backend_switch_state: > backend/vif/0/0 -> InitWait > [Thu Jan 13 01:39:31 2022] [26] 583: xen_netback:frontend_changed: > /local/domain/0/device/vif/0 -> Connected > [Thu Jan 13 01:39:31 2022] vif vif-0-0 vif0.0: Guest Rx ready > [Thu Jan 13 01:39:31 2022] [26] 470: xen_netback:backend_switch_state: > backend/vif/0/0 -> Connected > > xl devd (Xen 4.14.3): > > 2022-01-13 01:39:31 UTC libxl: debug: libxl_event.c:750:watchfd_callback: > watch w=0x7ffd416b0528 wpath=/local/domain/2/backend token=3/0: event > epath=/local/domain/2/backend/vif/0/0/state > 2022-01-13 01:39:31 UTC libxl: debug: > libxl_event.c:2445:libxl__nested_ao_create: ao 0x5633ac569700: nested ao, > parent 0x5633ac567f90 > 2022-01-13 01:39:31 UTC libxl: debug: libxl_event.c:750:watchfd_callback: > watch w=0x5633ac569180 wpath=/local/domain/2/backend/vif/0/0/state token=2/1: > event epath=/local/domain/2/backend/vif/0/0/state > 2022-01-13 01:39:31 UTC libxl: debug: libxl_event.c:1055:devstate_callback: > backend /local/domain/2/backend/vif/0/0/state wanted state 2 still waiting > state 4 > 2022-01-13 01:39:31 UTC libxl: debug: libxl_event.c:750:watchfd_callback: > watch w=0x7ffd416b0528 wpath=/local/domain/2/backend token=3/0: event > epath=/local/domain/2/backend/vif/0/0/state > 2022-01-13 01:39:31 UTC libxl: debug: > libxl_event.c:2445:libxl__nested_ao_create: ao 0x5633ac56a220: nested ao, > parent 0x5633ac567f90 > 2022-01-13 01:39:31 UTC libxl: debug: libxl_event.c:750:watchfd_callback: > watch w=0x5633ac569180 wpath=/local/domain/2/backend/vif/0/0/state token=2/1: > event epath=/local/domain/2/backend/vif/0/0/state > 2022-01-13 01:39:31 UTC libxl: debug: libxl_event.c:1055:devstate_callback: > backend /local/domain/2/backend/vif/0/0/state wanted state 2 still waiting > state 4 > 2022-01-13 01:39:51 UTC libxl: debug: > libxl_aoutils.c:88:xswait_timeout_callback: backend > /local/domain/2/backend/vif/0/0/state (hoping for state change to 2): xswait > timeout (path=/local/domain/2/backend/vif/0/0/state) > 2022-01-13 01:39:51 UTC libxl: debug: > libxl_event.c:850:libxl__ev_xswatch_deregister: watch w=0x5633ac569180 > wpath=/local/domain/2/backend/vif/0/0/state token=2/1: deregister slotnum=2 > 2022-01-13 01:39:51 UTC libxl: debug: libxl_event.c:1039:devstate_callback: > backend /local/domain/2/backend/vif/0/0/state wanted state 2 timed out > 2022-01-13 01:39:51 UTC libxl: debug: > libxl_event.c:864:libxl__ev_xswatch_deregister: watch w=0x5633ac569180: > deregister unregistered > 2022-01-13 01:39:51 UTC libxl: debug: > libxl_device.c:1092:device_backend_callback: calling device_backend_cleanup > 2022-01-13 01:39:51 UTC libxl: debug: > libxl_event.c:864:libxl__ev_xswatch_deregister: watch w=0x5633ac569180: > deregister unregistered > 2022-01-13 01:39:51 UTC libxl: error: > libxl_device.c:1105:device_backend_callback: unable to add device with path > /local/domain/2/backend/vif/0/0 > 2022-01-13 01:39:51 UTC libxl: debug: > libxl_event.c:864:libxl__ev_xswatch_deregister: watch w=0x5633ac569280: > deregister unregistered > 2022-01-13 01:39:51 UTC libxl: debug: libxl_device.c:1470:device_complete: > device /local/domain/2/backend/vif/0/0 add failed > 2022-01-13 01:39:51 UTC libxl: debug: libxl_event.c:2035:libxl__ao__destroy: > ao 0x5633ac568f30: destroy > > the xenstore content for the backend: > > # xenstore-ls /local/domain/2/backend/vif/0 > 0 = "" > frontend = "/local/domain/0/device/vif/0" > frontend-id = "0" > online = "1" > state = "4" > script = "/etc/xen/scripts/vif-zynstra" > vifname = "dom0.0" > mac = "00:16:3e:6c:de:82" > bridge = "cluster" > handle = "0" > type = "vif" > feature-sg = "1" > feature-gso-tcpv4 = "1" > feature-gso-tcpv6 = "1" > feature-ipv6-csum-offload = "1" > feature-rx-copy = "1" > feature-rx-flip = "0" > feature-multicast-control = "1" > feature-dynamic-multicast-control = "1" > feature-split-event-channels = "1" > multi-queue-max-queues = "2" > feature-ctrl-ring = "1" > hotplug-status = "connected" > > My guess is that the libxl callback is started waiting for the backend > state key to be set to XenbusStateInitWait (2) but the frontend in dom0 > has already triggered the backend to transition to XenbusStateConnected > (4) and therefore it does not successfully complete. I think your analysis is correct. The question is who sets hotplug-status to "connected" in xenstore if the hotplug script in the driver domain hasn't been launched by `xl devd`. It would seem to me some other entity is launching the hotplug scripts, do you have some udev rules in the driver domain that do that and race with `xl devd` maybe? Regards, Roger.
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |