
Re: [Xen-devel] [xen-unstable test] 19308: regressions - FAIL



On 16/09/13 11:55, Wei Liu wrote:
> On Mon, Sep 16, 2013 at 09:49:42AM +0200, Roger Pau Monné wrote:
>> On 15/09/13 14:50, Ian Campbell wrote:
>>> On Sun, 2013-09-15 at 07:09 +0100, xen.org wrote:
>>>> flight 19308 xen-unstable real [real]
>>>> http://www.chiark.greenend.org.uk/~xensrcts/logs/19308/
>>>>
>>>> Regressions :-(
>>>>
>>>> Tests which did not succeed and are blocking,
>>>> including tests which could not be run:
>>>>  test-amd64-i386-qemuu-rhel6hvm-intel 11 leak-check/check  fail REGR. vs. 19208
>>>>  test-amd64-i386-rhel6hvm-intel 11 leak-check/check        fail REGR. vs. 19208
>>>>  test-amd64-i386-qemuu-rhel6hvm-amd 11 leak-check/check    fail REGR. vs. 19208
>>>>  test-amd64-i386-qemut-rhel6hvm-amd 11 leak-check/check    fail REGR. vs. 19208
>>>>  test-amd64-i386-rhel6hvm-amd 11 leak-check/check          fail REGR. vs. 19208
>>>
>>> These are due to /var/run/xen-hotplug/block getting leaked
>>>
> 
> The error message in XenStore shows blkback tries to get hold of the
> block device 0:0 but there's no such device entry in system.
> 
>>>>  test-amd64-i386-xl-win7-amd64 12 guest-localmigrate/x10      fail REGR. vs. 19208
>>>>  test-amd64-amd64-xl-win7-amd64 12 guest-localmigrate/x10     fail REGR. vs. 19208
>>>>  test-amd64-amd64-xl-qemut-winxpsp3 12 guest-localmigrate/x10 fail REGR. vs. 19208
>>>
>>> These are:
>>>         libxl: error: libxl_device.c:894:device_backend_callback: unable to add device with path /local/domain/0/backend/vbd/9/5632
>>>         libxl: error: libxl_create.c:935:domcreate_launch_dm: unable to add disk devices
>>>
>>> /var/log/xen/xenhotplug.log contains:
>>>         xenstore-read: couldn't read path backend/vbd/9/5632/node
>>>
>>> For both of these I'm suspicious of:
>>> 11a63a1 libxl, hotplug/Linux: default to phy backend for raw format file
>>
>> Hello,
>>
>> I've tracked this down to libxl writing a wrong physical-device
>> xenstore node when using regular files. When using block devices, libxl
>> can write the physical-device node itself, because the value can be
>> fetched without running the block script. With regular files this is
>> not the case: we must first execute the block script to attach the
>> regular file to a loop device, and then fetch the physical-device from
>> that loop device. The following patch solves the issue for me.
>>
> 
> Yes, that's the code in question I think. That code snippet was introduced in:
> 
> commit 15116f1c254a8aa7774e2f73a3e1340a6decd867
> Author: Ian Campbell <Ian.Campbell@xxxxxxxxxx>
> Date:   Tue Aug 7 14:26:29 2012 +0100
> 
>     libxl: write physical-device node if user did not supply a block script
>     
>     This reverts one of the intentional changes from 25733:353bc0801b11.
>     That change exposed an issue with the xl migration protocol, which
>     although safe triggers the hotplug scripts device sharing logic.
>     
>     For 4.2 we disable this logic by writing the physical-device xenstore
>     node ourselves if a user did not supply a script. If the user did
>     supply a script then we continue to rely on it to write the
>     physical-device node (not least because the script may create the
>     device and therefore it is not available before we run the script).
>     
>     This means that to support localhost migration a block hotplug script
>     needs to be robust against adding a device twice and should not
>     deactivate the device until it has been removed twice.
>     
>     This should be revisited for 4.3.
>     
>     Signed-off-by: Ian Campbell <ian.campbell@xxxxxxxxxx>
>     Acked-by: Ian Jackson <Ian.Jackson@xxxxxxxxxxxxx>
>     Committed-by: Ian Campbell <ian.campbell@xxxxxxxxxx>
> 
> And in the commit message it says this behavior should be revisited.
> 
> Tracing back to 25733 
> (http://xenbits.xen.org/hg/xen-unstable.hg/rev/353bc0801b11)
> things look more complicated. One interesting snippet in the commit
> message is:
> 
> - libxl should not write the "physical-device" node. This is the
>   responsibility of the block script. Writing the "physical-device"
>   node in libxl basically completely short-cuts the standard block
>   hotplug script which uses "physical-device" to know if it has run
>   already or not.
> 
> That makes me believe the following fix is the correct thing to do in
> the long term.
> 
> I have to admit that I cannot fully digest the commit message of 25733
> in one day, so unless you (Ian) can confirm that Roger's fix will not
> cause further regressions, I would suggest reverting my change for now.

My fix deals with one part of the problem, but it will fail on local
migrate (the block script will refuse to attach the same device twice).
This is indeed a tricky issue, and I cannot see an easy way to deal with it.
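
To illustrate the part that the fix does address, here is a minimal Python
sketch of why a regular file yields a bogus physical-device value. It is not
libxl code. It assumes the node holds the device's major:minor pair in hex,
and the function names (naive_physical_device, needs_block_script) are made
up for this example. A regular file has no device number of its own, so
formatting its st_rdev gives the "0:0" that blkback complains about in the
quoted XenStore error.

```python
import os
import stat
import tempfile


def naive_physical_device(path):
    """Format st_rdev as hex "major:minor" without checking the file type.

    Hypothetical helper: the hex major:minor format is an assumption here.
    """
    st = os.stat(path)
    return "%x:%x" % (os.major(st.st_rdev), os.minor(st.st_rdev))


def needs_block_script(path):
    """Return True when the path is not itself a block device.

    For such paths (e.g. a raw image file) the real device number only
    exists after something has attached the file to a loop device.
    """
    return not stat.S_ISBLK(os.stat(path).st_mode)


if __name__ == "__main__":
    # A temporary regular file stands in for a raw disk image.
    with tempfile.NamedTemporaryFile() as image:
        print(naive_physical_device(image.name))
        print(needs_block_script(image.name))
```

On Linux the temporary file reports a device number of zero, so the value
computed this way is only meaningful once needs_block_script() is False.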

The proper way to fix this would be to unplug the devices from the
suspended domain before creating the new domain, but I'm sure this is
not trivial (this would also imply reattaching the devices to the
original domain if migration fails).
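
The ordering I have in mind can be sketched as follows. This is only a toy
model in Python, not a proposal for the libxl implementation: the function
local_migrate and the detach, attach and create_target callables are all
hypothetical placeholders. It just shows the unplug-first ordering and the
rollback that a failed migration would require.

```python
def local_migrate(devices, detach, attach, create_target):
    """Unplug devices from the source, then create the target domain.

    All callables are hypothetical stand-ins supplied by the caller:
      detach(domain, dev), attach(domain, dev), create_target(devices).
    If anything fails, devices already unplugged are reattached to the
    source domain in reverse order and the error is re-raised.
    """
    detached = []
    try:
        for dev in devices:
            detach("source", dev)
            detached.append(dev)
        # Only now is it safe for the target to claim the same devices.
        create_target(devices)
    except Exception:
        for dev in reversed(detached):
            attach("source", dev)
        raise


if __name__ == "__main__":
    log = []

    def failing_create(devs):
        raise RuntimeError("target domain creation failed")

    try:
        local_migrate(
            ["vbd0", "vbd1"],
            lambda dom, dev: log.append(("detach", dom, dev)),
            lambda dom, dev: log.append(("attach", dom, dev)),
            failing_create,
        )
    except RuntimeError:
        pass
    for entry in log:
        print(entry)
```

Even in this toy form the rollback path doubles the amount of state to
track, which is why I don't expect the real change to be trivial.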


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

