[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v3 6/6] tools/libxl: Adjust datacopiers POLLHUP handling when the fd is also readable



Hello,

El 30/03/15 a les 12.40, Ian Campbell ha escrit:
> On Thu, 2015-03-26 at 16:20 +0100, Roger Pau Monnà wrote:
>> El 16/03/15 a les 14.29, Ross Lagerwall ha escrit:
>>> From: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
>>>
>>> POLLHUP|POLLIN is a valid revent to receive when there is readable data in a
>>> pipe, but the writable fd has been closed.  This occurs in migration v2 when
>>> the legacy conversion process (which transforms the data inline) completes 
>>> and
>>> exits successfully.
>>>
>>> In the case that there is data to read, suppress the POLLHUP.  POSIX states
>>> that the hangup state is latched[1], which means it will reoccur on 
>>> subsequent
>>> poll() calls.  The datacopier is thus provided the opportunity to read until
>>> EOF, if possible.
>>>
>>> A POLLHUP on its own is treated exactly as before, indicating a different
>>> error with the fd.
>>>
>>> [1] http://pubs.opengroup.org/onlinepubs/9699919799/functions/poll.html
>>
>> This patch breaks booting pygrub guests on FreeBSD, it seems to trigger 
>> some kind of error when pygrub exits:
>>
>> libxl: error: libxl_bootloader.c:596:bootloader_copyfail: unexpected eof 
>> copying bootloader output
>> libxl: error: libxl_create.c:1129:domcreate_rebuild_done: cannot (re-)build 
>> domain: -3
>>
>> I will try to look into it, have you tested if the same happens on 
>> Linux?
> 
> Ross/Andrew, have you tried this on Linux?
> 
> Roger, Does xl -vvv or even strace give any clues as to the sequences of
> events which lead to the error?

Here is the output of xl -vvv, it doesn't seem to contain any useful 
info apart from what I've already posted:

# xl -vvv create debian.cfg
Parsing config from debian.cfg
libxl: debug: libxl_create.c:1521:do_domain_create: ao 0x802834080: create: 
how=0x0 callback=0x0 poller=0x80281f0a0
libxl: debug: libxl_device.c:269:libxl__device_disk_set_backend: Disk vdev=xvda 
spec.backend=unknown
libxl: debug: libxl_device.c:298:libxl__device_disk_set_backend: Disk 
vdev=xvda, using backend phy
libxl: debug: libxl_create.c:924:initiate_domain_create: running bootloader
libxl: debug: libxl_device.c:269:libxl__device_disk_set_backend: Disk 
vdev=(null) spec.backend=phy
libxl: debug: libxl.c:3065:libxl__device_disk_local_initiate_attach: locally 
attaching PHY disk /dev/zvol/tank/debian
libxl: debug: libxl_bootloader.c:411:bootloader_disk_attached_cb: Config 
bootloader value: pygrub
libxl: debug: libxl_bootloader.c:427:bootloader_disk_attached_cb: Checking for 
bootloader in libexec path: /usr/local/lib/xen/bin/pygrub
libxl: debug: libxl_create.c:1537:do_domain_create: ao 0x802834080: inprogress: 
poller=0x80281f0a0, flags=i
libxl: debug: libxl_event.c:577:libxl__ev_xswatch_register: watch w=0x8028367a0 
wpath=/local/domain/2 token=3/0: register slotnum=3
libxl: debug: libxl_event.c:1942:libxl__ao_progress_report: ao 0x802834080: 
progress report: ignored
libxl: debug: libxl_bootloader.c:539:bootloader_gotptys: executing bootloader: 
/usr/local/lib/xen/bin/pygrub
libxl: debug: libxl_bootloader.c:543:bootloader_gotptys:   bootloader arg: 
/usr/local/lib/xen/bin/pygrub
libxl: debug: libxl_bootloader.c:543:bootloader_gotptys:   bootloader arg: 
--output=/var/run/xen/bootloader.2.out
libxl: debug: libxl_bootloader.c:543:bootloader_gotptys:   bootloader arg: 
--output-format=simple0
libxl: debug: libxl_bootloader.c:543:bootloader_gotptys:   bootloader arg: 
--output-directory=/var/run/xen/bootloader.2.d
libxl: debug: libxl_bootloader.c:543:bootloader_gotptys:   bootloader arg: 
/dev/zvol/tank/debian
libxl: debug: libxl_event.c:514:watchfd_callback: watch w=0x8028367a0 
wpath=/local/domain/2 token=3/0: event epath=/local/domain/2
libxl: error: libxl_bootloader.c:596:bootloader_copyfail: unexpected eof 
copying bootloader output
libxl: debug: libxl_bootloader.c:638:bootloader_finished: bootloader completed
libxl: debug: libxl_event.c:615:libxl__ev_xswatch_deregister: watch 
w=0x8028367a0 wpath=/local/domain/2 token=3/0: deregister slotnum=3
libxl: error: libxl_create.c:1138:domcreate_rebuild_done: cannot (re-)build 
domain: -3
libxl: info: libxl.c:1699:devices_destroy_cb: forked pid 763 for destroy of 
domain 2
libxl: debug: libxl_event.c:1766:libxl__ao_complete: ao 0x802834080: complete, 
rc=-3
libxl: debug: libxl_event.c:1738:libxl__ao__destroy: ao 0x802834080: destroy
xc: debug: hypercall buffer: total allocations:44 total releases:44
xc: debug: hypercall buffer: current allocations:0 maximum allocations:2
xc: debug: hypercall buffer: cache current size:2
xc: debug: hypercall buffer: cache hits:35 misses:2 toobig:7

And this is (I think) the relevant output from ktrace/kdump:

   766 xl       RET   read 20/0x14
   766 xl       CALL  read(0xc,0x80280702f,0x3cd)
   766 xl       RET   read -1 errno 35 Resource temporarily unavailable
   766 xl       CALL  gettimeofday(0x7fffffffde90,0)
   766 xl       RET   gettimeofday 0
   766 xl       CALL  poll(0x802816180,0x6,0xffffffff)
   766 xl       RET   poll 1
   766 xl       CALL  gettimeofday(0x7fffffffde90,0)
   766 xl       RET   gettimeofday 0
   766 xl       CALL  write(0x10,0x802807014,0x1b)
   766 xl       GIO   fd 16 wrote 27 bytes
       0x0000 1b5b 3f31 6c1b 3e1b 5b32 343b 3148 0000 0000 000d 1b5b  
|.[?1l.>.[24;1H.......[|
       0x0016 3f31 6c1b 3e                                            |?1l.>|

   766 xl       RET   write 27/0x1b
   766 xl       CALL  gettimeofday(0x7fffffffde90,0)
   766 xl       RET   gettimeofday 0
   766 xl       CALL  poll(0x802816180,0x5,0xffffffff)
   766 xl       RET   poll 1
   766 xl       PSIG  SIGCHLD caught handler=0x80114bdf0 mask=0x0 
code=CLD_EXITED
   766 xl       CALL  sigprocmask(SIG_SETMASK,0x7fffffffd81c,0)
   766 xl       RET   sigprocmask 0
   766 xl       CALL  write(0xe,0x800af1fd2,0x1)
   766 xl       GIO   fd 14 wrote 1 byte
       "\0"
   766 xl       RET   write 1
   766 xl       CALL  sigreturn(0x7fffffffd440)
   766 xl       RET   sigreturn JUSTRETURN
   766 xl       CALL  gettimeofday(0x7fffffffde90,0)
   766 xl       RET   gettimeofday 0
   766 xl       CALL  read(0xc,0x802807014,0x3e8)
   766 xl       GIO   fd 12 read 0 bytes
       ""
   766 xl       RET   read 0
   766 xl       CALL  write(0x2,0x7fffffffd0b0,0x7)
   766 xl       GIO   fd 2 wrote 7 bytes
       "libxl: "
   766 xl       RET   write 7
   766 xl       CALL  write(0x2,0x7fffffffd0b0,0x7)
   766 xl       GIO   fd 2 wrote 7 bytes
       "error: "

> Did/does commit 916735e1814d "tools/libxl: Restore errnoval behavior for
> datacopier callback" make any difference?

No, I still get the same error. I will try to take a closer look into this 
when I finish some pending stuff...

Roger.


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.