[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: possible kernel/libxl race with xl network-attach


  • To: James Dingwall <james-xen@xxxxxxxxxxxxxx>
  • From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Date: Thu, 13 Jan 2022 17:11:07 +0100
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=citrix.com; dmarc=pass action=none header.from=citrix.com; dkim=pass header.d=citrix.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=Slr2GhhCNaXEWifmCr49UNvxjA6VYstXfirn1wac16k=; b=L6HytB3tpvt5OVEIbX7Dafivq9gjWjswlKCy2KpKSGZEqrsrfX/ahV+B9dyqhm2DgHcdCt9eB0ygU8QVLcSiTPN3GdTSAEKhonDheUFwvqtu/rABoUWSEgjJlfQzF8dOEJmqLMeuVy1j0luTzuY/l2T3P96Cv9W0/z96iwdrEtG/SdDkcQm/2FF9EHON5Lk7y52GmGs0w3/uALKkn/fdF0QUXN3d8/aC0AEX5h9I/Rz/IvGbUeqLn3oj/Q+C08RyK7sMaNyDr+tKDrrBlx9QqikpZgSx94b3OuLGVK1gJ9Bwt1MTA23TslEI/15CrWX6tOIHIGIFl/E7or20HaOJOg==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=PP4LLuQsjHh4ZetM7mT5g+PWeC41k9ZCOfzgEewHv0csHlSo9sIjQ8RdoST5DyYZkyyHnJ/4vg77f2YTA0XboLIT64t0g8Ts8UxHwogW9SEgnceUdKxYvLvLSQ2b6ZQFkFmzXRP/izE55WaOFI//sr0oWzdpbddy4xx6pG2XRgeA4wTY0ul6kWxaGidR7p4p0dVP/D1qzQYhZ65pG8mM2lCTioVD8RxPaSbTft9tuX2JWsznMAkGyG7WdjYFrokTTdpn4fwncLUf2RiOgMlTMSxukMZyQPw0b4fPzMwiQWCDUjIB9qJ2RiZi1r+eR5XoB/ktSVQCvhc8Axtmp0gWzw==
  • Authentication-results: esa6.hc3370-68.iphmx.com; dkim=pass (signature verified) header.i=@citrix.onmicrosoft.com
  • Cc: <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Thu, 13 Jan 2022 16:12:19 +0000
  • Ironport-data: A9a23:wUllZq3mjKCFhFgUKPbD5Ql2kn2cJEfYwER7XKvMYLTBsI5bp2RWm 2cbCDqAaPaDMzanL99wPtnl9UsGuZeAnIcxHFNkpC1hF35El5HIVI+TRqvS04J+DSFhoGZPt Zh2hgzodZhsJpPkS5PE3oHJ9RGQ74nRLlbHILOCanAZqTNMEn9700o6wrVh2OaEvPDia++zk YKqyyHgEAfNNw5cagr4PIra9XuDFNyr0N8plgRWicJj5TcypFFMZH4rHomjLmOQf2VhNrXSq 9Avbl2O1jixEx8FUrtJm1tgG6EAaua60QOm0hK6V0U+6/TrS+NbPqsTbZIhhUlrZzqhro0u6 dhEqZmKRzgPJ4Hg295BaER5HHQrVUFG0OevzXmXtMWSywvNcmf2wuUoB0YzVWEa0r8pWycUr 6VecW1TKEDY7w616OvTpu1EnMMsIdOtJIoCknph0SvYHbAtRpWrr6Diu4YIgWxr3J0m8fD2a 5JAOT9xakX5egxjM34uUMxgtei5ryyqG9FfgA3M/vdmi4TJ9yRh3b6oPNfLd9iiQcROgl3et m/A537+ABwRKJqY0zXt2nClg+LU2DP2XqoeFbu36fduhBuWyyoOC3UruUCT+KfjzBTkApQGd hJSqnFGQbUOGFKDf4SifhCxvHu+sCE6QvZ8UNdm01ysxf+Bi+qGPVQsQjlEYd0gkcY5Qz02y 1OE9+/U6SxTXK69EizEqOrNxd+mEW1MdDJZO3dYJecQy4S7+OkOYgTzosGP+UJfpvn8AnnOz j+Dt0DSbJ1D3JdQh81XEb0q6g9AR6QlrCZpt207vUr/t2uVgbJJgaTyuTA3Ct4afe6koqGp5 iRspiRnxLlm4WuxvCKMWv4RO7qi+uyINjbR6XY2QcV7r232oyD/LdsKiN2bGKuPGpxUEdMOS BWC0T69GbcJZCf6BUOJS9/Z5zsWIVjISo2+C6G8gitmaZltbg6XlByClmbLt10BZHMEyPllU b/CKJ7EJS9DVcxPkWTqL89Aj+5D7n1ulAv7GMGgpzz6gOX2WZJgYepfWLd4RrpnvPrsTcS82 4s3CvZmPD0EAbKuOXeGoNdDRb3IRFBiba3LRwVsXrfrCiJtGX07Cu+XxrUkeod/mL9SmPuO9 Xa4MnK0AnKm2xUr8C2GNSJubq3BR5F6oS5pNCAgJw/wiXMifZyu/OEUcJ5uJesr8+lqzPhVS fgZeprfXqQTG2qfozlNP4PgqIFCdQiwgV7cNSSSfzViLYVrQBbE+4G4c1K3pjUOFCe+qeA3v 6akilHAWZMGSgk7VJTWZfujwkmfp38YnO4uDULELsMKIBfn8ZRwKjy3hfgyepleJRLGzzqc9 gCXHRZH+rWd/95rqIHE3PnWoZ2oHu1yGlthM1PatbvmZzPH+meDwJNbVLradz7qS26pqr6pY v9Yzq+gPaRfzkpKqYd1D51i0bk6u4n0v7ZfwwlpQCfLYlCsBu8yK3WKx5AS5KhEx7sfsgqqQ EOfvNJdPOzRas/iFVcQIisjb/iCiq5IymWDs6xtLRWo/jJz8ZqGTV5WbkuFhyFqJbdoNJ8on LU6s8kM5g3j0hcnP75qVMyPG7hg+pDYb5gaiw==
  • Ironport-hdrordr: A9a23:V6bmvaiSwd1p5PHechlygUOFv3BQXzh13DAbv31ZSRFFG/FwyP rAoB1L73PJYWgqNU3I+ergBEGBKUmskqKdxbNhR4tKPTOWw1dASbsN0WKM+UyDJ8STzJ856U 4kSdkCNDSSNykFsS+Z2njALz9I+rDum8rJ9ITjJjVWPHlXgslbnnhE422gYytLrWd9dP4E/M 323Ls6m9PsQwVeUu2LQl0+G8TTrdzCk5zrJTYAGh4c8QGLyRel8qTzHRS01goXF2on+8ZpzU H11yjCoomzufCyzRHRk0fV8pRtgdPkjv9OHtaFhMQ5IijlziyoeINicbufuy1dmpDl1H8a1P 335zswNcV67H3cOkmzvBvWwgHllA0j7nfzoGXo9kfLkIjcfnYXGsBBjYVWfl/y8Ew7puxx16 pNwiawq4dXJQmoplWz2/H4EzVR0makq3srluAey1ZFV5EFVbNXpYsDuGtIDZY7Gj7g4oxPKp gjMCjl3ocWTbqmVQGYgoE2q+bcHUjbXy32D3Tqg/blnQS/xxtCvgklLM92pAZ1yHtycegA2w 3+CNUZqFh5dL5iUUtMPpZxfSKJMB2/ffvtChPlHb21LtBPB5ryw6SHkondotvaPKA18A==
  • Ironport-sdr: o6AEBBgvBkW2mgvUSJSj3o2Xi+DEjThJE/Ko027lIhjtO0hGFKD78zuJk/oZG5DuV9DMN1FqmO +172BZ11ifXldY1FfCYPg57JeeOqRF/XdQebqHAR0A4i3IWK0UiH4QYXUjqB+OLUECl0yJwMYI iEIWOVE51vUSiGiXzlLLzKrlj2zwltR0422YbV+/AFE00ul5tO5xhBvEWstWRQcR/Ff1NyfRK/ Uce0faikvAWijgYRxrQGbj9MxmbJ1Bf3hOQopm7W46oo1cDdXyLXXsU5s/duu5NJAFPtro7oDZ dstb2VZOUeYX3lhmj+2PMEL5
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Thu, Jan 13, 2022 at 11:19:46AM +0000, James Dingwall wrote:
> Hi,
> 
> I have been trying to debug a problem where a vif with the backend in a 
> driver domain is added to dom0.  Intermittently the hotplug script is 
> not invoked by libxl (running as xl devd) in the driver domain.  By 
> enabling some debug for the driver domain kernel and libxl I have these 
> messages:
> 
> driver domain kernel (Ubuntu 5.4.0-92-generic):
> 
> [Thu Jan 13 01:39:31 2022] [1408] 564: vif vif-0-0 vif0.0: Successfully 
> created xenvif
> [Thu Jan 13 01:39:31 2022] [26] 583: xen_netback:frontend_changed: 
> /local/domain/0/device/vif/0 -> Initialising
> [Thu Jan 13 01:39:31 2022] [26] 470: xen_netback:backend_switch_state: 
> backend/vif/0/0 -> InitWait
> [Thu Jan 13 01:39:31 2022] [26] 583: xen_netback:frontend_changed: 
> /local/domain/0/device/vif/0 -> Connected
> [Thu Jan 13 01:39:31 2022] vif vif-0-0 vif0.0: Guest Rx ready
> [Thu Jan 13 01:39:31 2022] [26] 470: xen_netback:backend_switch_state: 
> backend/vif/0/0 -> Connected
> 
> xl devd (Xen 4.14.3):
> 
> 2022-01-13 01:39:31 UTC libxl: debug: libxl_event.c:750:watchfd_callback: 
> watch w=0x7ffd416b0528 wpath=/local/domain/2/backend token=3/0: event 
> epath=/local/domain/2/backend/vif/0/0/state
> 2022-01-13 01:39:31 UTC libxl: debug: 
> libxl_event.c:2445:libxl__nested_ao_create: ao 0x5633ac569700: nested ao, 
> parent 0x5633ac567f90
> 2022-01-13 01:39:31 UTC libxl: debug: libxl_event.c:750:watchfd_callback: 
> watch w=0x5633ac569180 wpath=/local/domain/2/backend/vif/0/0/state token=2/1: 
> event epath=/local/domain/2/backend/vif/0/0/state
> 2022-01-13 01:39:31 UTC libxl: debug: libxl_event.c:1055:devstate_callback: 
> backend /local/domain/2/backend/vif/0/0/state wanted state 2 still waiting 
> state 4
> 2022-01-13 01:39:31 UTC libxl: debug: libxl_event.c:750:watchfd_callback: 
> watch w=0x7ffd416b0528 wpath=/local/domain/2/backend token=3/0: event 
> epath=/local/domain/2/backend/vif/0/0/state
> 2022-01-13 01:39:31 UTC libxl: debug: 
> libxl_event.c:2445:libxl__nested_ao_create: ao 0x5633ac56a220: nested ao, 
> parent 0x5633ac567f90
> 2022-01-13 01:39:31 UTC libxl: debug: libxl_event.c:750:watchfd_callback: 
> watch w=0x5633ac569180 wpath=/local/domain/2/backend/vif/0/0/state token=2/1: 
> event epath=/local/domain/2/backend/vif/0/0/state
> 2022-01-13 01:39:31 UTC libxl: debug: libxl_event.c:1055:devstate_callback: 
> backend /local/domain/2/backend/vif/0/0/state wanted state 2 still waiting 
> state 4
> 2022-01-13 01:39:51 UTC libxl: debug: 
> libxl_aoutils.c:88:xswait_timeout_callback: backend 
> /local/domain/2/backend/vif/0/0/state (hoping for state change to 2): xswait 
> timeout (path=/local/domain/2/backend/vif/0/0/state)
> 2022-01-13 01:39:51 UTC libxl: debug: 
> libxl_event.c:850:libxl__ev_xswatch_deregister: watch w=0x5633ac569180 
> wpath=/local/domain/2/backend/vif/0/0/state token=2/1: deregister slotnum=2
> 2022-01-13 01:39:51 UTC libxl: debug: libxl_event.c:1039:devstate_callback: 
> backend /local/domain/2/backend/vif/0/0/state wanted state 2  timed out
> 2022-01-13 01:39:51 UTC libxl: debug: 
> libxl_event.c:864:libxl__ev_xswatch_deregister: watch w=0x5633ac569180: 
> deregister unregistered
> 2022-01-13 01:39:51 UTC libxl: debug: 
> libxl_device.c:1092:device_backend_callback: calling device_backend_cleanup
> 2022-01-13 01:39:51 UTC libxl: debug: 
> libxl_event.c:864:libxl__ev_xswatch_deregister: watch w=0x5633ac569180: 
> deregister unregistered
> 2022-01-13 01:39:51 UTC libxl: error: 
> libxl_device.c:1105:device_backend_callback: unable to add device with path 
> /local/domain/2/backend/vif/0/0
> 2022-01-13 01:39:51 UTC libxl: debug: 
> libxl_event.c:864:libxl__ev_xswatch_deregister: watch w=0x5633ac569280: 
> deregister unregistered
> 2022-01-13 01:39:51 UTC libxl: debug: libxl_device.c:1470:device_complete: 
> device /local/domain/2/backend/vif/0/0 add failed
> 2022-01-13 01:39:51 UTC libxl: debug: libxl_event.c:2035:libxl__ao__destroy: 
> ao 0x5633ac568f30: destroy
> 
> the xenstore content for the backend:
> 
> # xenstore-ls /local/domain/2/backend/vif/0
> 0 = ""
>  frontend = "/local/domain/0/device/vif/0"
>  frontend-id = "0"
>  online = "1"
>  state = "4"
>  script = "/etc/xen/scripts/vif-zynstra"
>  vifname = "dom0.0"
>  mac = "00:16:3e:6c:de:82"
>  bridge = "cluster"
>  handle = "0"
>  type = "vif"
>  feature-sg = "1"
>  feature-gso-tcpv4 = "1"
>  feature-gso-tcpv6 = "1"
>  feature-ipv6-csum-offload = "1"
>  feature-rx-copy = "1"
>  feature-rx-flip = "0"
>  feature-multicast-control = "1"
>  feature-dynamic-multicast-control = "1"
>  feature-split-event-channels = "1"
>  multi-queue-max-queues = "2"
>  feature-ctrl-ring = "1"
>  hotplug-status = "connected"
> 
> My guess is that the libxl callback is started waiting for the backend 
> state key to be set to XenbusStateInitWait (2) but the frontend in dom0 
> has already triggered the backend to transition to XenbusStateConnected 
> (4) and therefore it does not successfully complete.

I think your analysis is correct. The question is who sets
hotplug-status to "connected" in xenstore if the hotplug script in the
driver domain hasn't been launched by `xl devd`.

It would seem to me some other entity is launching the hotplug
scripts, do you have some udev rules in the driver domain that do that
and race with `xl devd` maybe?

Regards, Roger.



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.