[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v3 6/6] tools/libxl: Adjust datacopiers POLLHUP handling when the fd is also readable



On 01/04/15 11:34, Roger Pau Monnà wrote:
> Hello,
>
> El 30/03/15 a les 12.40, Ian Campbell ha escrit:
>> On Thu, 2015-03-26 at 16:20 +0100, Roger Pau Monnà wrote:
>>> El 16/03/15 a les 14.29, Ross Lagerwall ha escrit:
>>>> From: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
>>>>
>>>> POLLHUP|POLLIN is a valid revent to receive when there is readable data in 
>>>> a
>>>> pipe, but the writable fd has been closed.  This occurs in migration v2 
>>>> when
>>>> the legacy conversion process (which transforms the data inline) completes 
>>>> and
>>>> exits successfully.
>>>>
>>>> In the case that there is data to read, suppress the POLLHUP.  POSIX states
>>>> that the hangup state is latched[1], which means it will reoccur on 
>>>> subsequent
>>>> poll() calls.  The datacopier is thus provided the opportunity to read 
>>>> until
>>>> EOF, if possible.
>>>>
>>>> A POLLHUP on its own is treated exactly as before, indicating a different
>>>> error with the fd.
>>>>
>>>> [1] http://pubs.opengroup.org/onlinepubs/9699919799/functions/poll.html
>>> This patch breaks booting pygrub guests on FreeBSD, it seems to trigger 
>>> some kind of error when pygrub exits:
>>>
>>> libxl: error: libxl_bootloader.c:596:bootloader_copyfail: unexpected eof 
>>> copying bootloader output
>>> libxl: error: libxl_create.c:1129:domcreate_rebuild_done: cannot (re-)build 
>>> domain: -3
>>>
>>> I will try to look into it, have you tested if the same happens on 
>>> Linux?
>> Ross/Andrew, have you tried this on Linux?
>>
>> Roger, Does xl -vvv or even strace give any clues as to the sequences of
>> events which lead to the error?
> Here is the output of xl -vvv, it doesn't seem to contain any useful 
> info apart from what I've already posted:
>
> # xl -vvv create debian.cfg
> Parsing config from debian.cfg
> libxl: debug: libxl_create.c:1521:do_domain_create: ao 0x802834080: create: 
> how=0x0 callback=0x0 poller=0x80281f0a0
> libxl: debug: libxl_device.c:269:libxl__device_disk_set_backend: Disk 
> vdev=xvda spec.backend=unknown
> libxl: debug: libxl_device.c:298:libxl__device_disk_set_backend: Disk 
> vdev=xvda, using backend phy
> libxl: debug: libxl_create.c:924:initiate_domain_create: running bootloader
> libxl: debug: libxl_device.c:269:libxl__device_disk_set_backend: Disk 
> vdev=(null) spec.backend=phy
> libxl: debug: libxl.c:3065:libxl__device_disk_local_initiate_attach: locally 
> attaching PHY disk /dev/zvol/tank/debian
> libxl: debug: libxl_bootloader.c:411:bootloader_disk_attached_cb: Config 
> bootloader value: pygrub
> libxl: debug: libxl_bootloader.c:427:bootloader_disk_attached_cb: Checking 
> for bootloader in libexec path: /usr/local/lib/xen/bin/pygrub
> libxl: debug: libxl_create.c:1537:do_domain_create: ao 0x802834080: 
> inprogress: poller=0x80281f0a0, flags=i
> libxl: debug: libxl_event.c:577:libxl__ev_xswatch_register: watch 
> w=0x8028367a0 wpath=/local/domain/2 token=3/0: register slotnum=3
> libxl: debug: libxl_event.c:1942:libxl__ao_progress_report: ao 0x802834080: 
> progress report: ignored
> libxl: debug: libxl_bootloader.c:539:bootloader_gotptys: executing 
> bootloader: /usr/local/lib/xen/bin/pygrub
> libxl: debug: libxl_bootloader.c:543:bootloader_gotptys:   bootloader arg: 
> /usr/local/lib/xen/bin/pygrub
> libxl: debug: libxl_bootloader.c:543:bootloader_gotptys:   bootloader arg: 
> --output=/var/run/xen/bootloader.2.out
> libxl: debug: libxl_bootloader.c:543:bootloader_gotptys:   bootloader arg: 
> --output-format=simple0
> libxl: debug: libxl_bootloader.c:543:bootloader_gotptys:   bootloader arg: 
> --output-directory=/var/run/xen/bootloader.2.d
> libxl: debug: libxl_bootloader.c:543:bootloader_gotptys:   bootloader arg: 
> /dev/zvol/tank/debian
> libxl: debug: libxl_event.c:514:watchfd_callback: watch w=0x8028367a0 
> wpath=/local/domain/2 token=3/0: event epath=/local/domain/2
> libxl: error: libxl_bootloader.c:596:bootloader_copyfail: unexpected eof 
> copying bootloader output
> libxl: debug: libxl_bootloader.c:638:bootloader_finished: bootloader completed
> libxl: debug: libxl_event.c:615:libxl__ev_xswatch_deregister: watch 
> w=0x8028367a0 wpath=/local/domain/2 token=3/0: deregister slotnum=3
> libxl: error: libxl_create.c:1138:domcreate_rebuild_done: cannot (re-)build 
> domain: -3
> libxl: info: libxl.c:1699:devices_destroy_cb: forked pid 763 for destroy of 
> domain 2
> libxl: debug: libxl_event.c:1766:libxl__ao_complete: ao 0x802834080: 
> complete, rc=-3
> libxl: debug: libxl_event.c:1738:libxl__ao__destroy: ao 0x802834080: destroy
> xc: debug: hypercall buffer: total allocations:44 total releases:44
> xc: debug: hypercall buffer: current allocations:0 maximum allocations:2
> xc: debug: hypercall buffer: cache current size:2
> xc: debug: hypercall buffer: cache hits:35 misses:2 toobig:7
>
> And this is (I think) the relevant output from ktrace/kdump:
>
>    766 xl       RET   read 20/0x14
>    766 xl       CALL  read(0xc,0x80280702f,0x3cd)
>    766 xl       RET   read -1 errno 35 Resource temporarily unavailable
>    766 xl       CALL  gettimeofday(0x7fffffffde90,0)
>    766 xl       RET   gettimeofday 0
>    766 xl       CALL  poll(0x802816180,0x6,0xffffffff)
>    766 xl       RET   poll 1
>    766 xl       CALL  gettimeofday(0x7fffffffde90,0)
>    766 xl       RET   gettimeofday 0
>    766 xl       CALL  write(0x10,0x802807014,0x1b)
>    766 xl       GIO   fd 16 wrote 27 bytes
>        0x0000 1b5b 3f31 6c1b 3e1b 5b32 343b 3148 0000 0000 000d 1b5b  
> |.[?1l.>.[24;1H.......[|
>        0x0016 3f31 6c1b 3e                                            |?1l.>|
>
>    766 xl       RET   write 27/0x1b
>    766 xl       CALL  gettimeofday(0x7fffffffde90,0)
>    766 xl       RET   gettimeofday 0
>    766 xl       CALL  poll(0x802816180,0x5,0xffffffff)
>    766 xl       RET   poll 1
>    766 xl       PSIG  SIGCHLD caught handler=0x80114bdf0 mask=0x0 
> code=CLD_EXITED
>    766 xl       CALL  sigprocmask(SIG_SETMASK,0x7fffffffd81c,0)
>    766 xl       RET   sigprocmask 0
>    766 xl       CALL  write(0xe,0x800af1fd2,0x1)
>    766 xl       GIO   fd 14 wrote 1 byte
>        "\0"
>    766 xl       RET   write 1
>    766 xl       CALL  sigreturn(0x7fffffffd440)
>    766 xl       RET   sigreturn JUSTRETURN
>    766 xl       CALL  gettimeofday(0x7fffffffde90,0)
>    766 xl       RET   gettimeofday 0
>    766 xl       CALL  read(0xc,0x802807014,0x3e8)
>    766 xl       GIO   fd 12 read 0 bytes
>        ""
>    766 xl       RET   read 0
>    766 xl       CALL  write(0x2,0x7fffffffd0b0,0x7)
>    766 xl       GIO   fd 2 wrote 7 bytes
>        "libxl: "
>    766 xl       RET   write 7
>    766 xl       CALL  write(0x2,0x7fffffffd0b0,0x7)
>    766 xl       GIO   fd 2 wrote 7 bytes
>        "error: "
>
>> Did/does commit 916735e1814d "tools/libxl: Restore errnoval behavior for
>> datacopier callback" make any difference?
> No, I still get the same error. I will try to take a closer look into this 
> when I finish some pending stuff...

This works on Linux

libxl: debug: libxl_event.c:577:libxl__ev_xswatch_register: watch
w=0xb50870 wpath=/local/domain/11 token=3/0: register slotnum=3
libxl: debug: libxl_event.c:1942:libxl__ao_progress_report: ao 0xb53700:
progress report: ignored
libxl: debug: libxl_bootloader.c:539:bootloader_gotptys: executing
bootloader: /usr/lib/xen/bin/pygrub
libxl: debug: libxl_bootloader.c:543:bootloader_gotptys:   bootloader
arg: /usr/lib/xen/bin/pygrub
libxl: debug: libxl_bootloader.c:543:bootloader_gotptys:   bootloader
arg: --args=root=/dev/xvda2 ro
libxl: debug: libxl_bootloader.c:543:bootloader_gotptys:   bootloader
arg: --output=/var/run/xen/bootloader.11.out
libxl: debug: libxl_bootloader.c:543:bootloader_gotptys:   bootloader
arg: --output-format=simple0
libxl: debug: libxl_bootloader.c:543:bootloader_gotptys:   bootloader
arg: --output-directory=/var/run/xen/bootloader.11.d
libxl: debug: libxl_bootloader.c:543:bootloader_gotptys:   bootloader
arg: /dev/vm/andrewcoop-jessie-disk
libxl: debug: libxl_event.c:514:watchfd_callback: watch w=0xb50870
wpath=/local/domain/11 token=3/0: event epath=/local/domain/11
libxl: debug: libxl_bootloader.c:638:bootloader_finished: bootloader
completed
libxl: debug: libxl_bootloader.c:138:bootloader_result_command:
bootloader output contained kernel
/var/run/xen/bootloader.11.d/boot_kernel.t2OsEV
libxl: debug: libxl_bootloader.c:138:bootloader_result_command:
bootloader output contained ramdisk
/var/run/xen/bootloader.11.d/boot_ramdisk.3i2X1b
libxl: debug: libxl_bootloader.c:138:bootloader_result_command:
bootloader output contained args root=/dev/xvda2 ro elevator=noop
root=/dev/xvda2 ro
libxl: debug: libxl_bootloader.c:651:bootloader_finished: bootloader
execution successful


As 7e9ec5 has positively been identified as the cause of the regression
on FreeBSD, that presumably means that FreeBSD won't permit a read() of
an FD which has POLLHUP set.

POSIX appears to specifically allow this action.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.