
Re: [Xen-devel] Basic blktap2 functionality issues.



On Thu, 2012-03-29 at 15:39 +0100, greg@xxxxxxxxxxxx wrote:
> Hi, hope the day is going well for everyone.
> 
> I had posted a note about this issue two weeks ago and didn't get any
> response.  I don't know if that indicates that people are not using
> blktap or if the question stumped everyone.
> 
> First of all there is a documentable issue with blktap in 4.1.2 and
> xl.  Anyone trying to use it to export files to a guest will be
> affected.

The issue you refer to here is the deadlock described below, right? The
block device is actually functional while the guest is up and running?

>   Using xl to do 'block-attach' and 'block-detach' in dom0
> also doesn't work in stock 4.1.2 so I suspect there are generic issues
> with blktap and xl.

Stefano was looking at block-attach to dom0 more generally in
xen-unstable too.

> In stock 4.1.2 using xl results in the tapdisk2 server process being
> orphaned.  There is a patch floating around from Ian Campbell which
> fixes this problem and either creates or uncovers what may be a more
> fundamental problem.

That patch has been applied to xen-unstable -- does this hang occur for
you there? (I'm in the process of installing a suitable system to try
and reproduce).

> With Ian's patch applied the tapdisk2 process terminates but the
> tapdisk device is not released resulting in a steady accumulation of
> orphan minor numbers.  The underlying cause of this appears to be a
> resource deadlock between xl requesting a detach of the VBD and the
> tapdisk2 process.

When you say "the tapdisk2 process terminates" I guess you mean "tries
to terminate but gets blocked" rather than actually quits, since
otherwise I don't think there would be a deadlock?

I think an approach worth trying would be to have
tapdisk_control_detach_vbd respond to TAPDISK_MESSAGE_DETACH before
doing the actual detach. i.e. it would respond with "Yes, I will do
that" rather than "Yes, I have done that". My speculation is that this
will allow libxl to continue and hopefully avoid the deadlock.

A similar alternative would be to have tap_ctl_detach not wait for a
response at all.

In both cases I think it could be argued that if the
TAPDISK_MESSAGE_DETACH message does not result in tapdisk2 shutting down
there isn't much that libxl could do about it anyway so it might as well
get on with its life.

The other approach would be to figure out what libxl does (or perhaps
fails to do) after its call to tap_ctl_detach which actually has to be
done first, e.g. perhaps destroying the xenstore backend directory?

> I've also heard rumbles about a mythical 'blktap3' which runs
> completely in userspace.  If that is the direction things are going I
> would certainly be willing to hammer on that rather than put more time
> into blktap2 if there is 'blktap3' code someplace.

I expect that a lot of the control-plane stuff will be the same between
blktap2 and 3 so any issues fixed in 2 will likely help 3 too.

It is also worth fixing issues in blktap2 simply for the sake of fixing
them since there will still be people who are using it in 4.2.

Ian.


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

