
Re: [Xen-devel] [RFC 7/7] libxl: Wait for QEMU startup in stubdomain

On Fri, Feb 06, 2015 at 08:56:40AM -0500, Eric Shelton wrote:
> On Fri, Feb 6, 2015 at 6:16 AM, Wei Liu <wei.liu2@xxxxxxxxxx> wrote:
> > Thanks for posting.
> >
> > ...
> >
> > FWIW we are now experiencing a problem with this startup protocol (not
> > Linux stubdom specific) -- the path that libxl is waiting for is wrong.
> I simply used the code already present in the QEMU upstream code,
> which is writing to that particular path to indicate "running."  Since
> it is distinct from the path used by the QEMU instance running in
> Dom0, it works for my intended purpose: ensuring the device model is
> running before unpausing the HVM guest.  When you say it is "wrong,"
> is that just because you ultimately intend to rearchitect this and use
> something different?  If so, maybe the path I am using is "good
> enough" until that happens.  Otherwise, can you suggest a better path
> or mechanism?

It is not "good enough". It just happens to be working.

Currently the path is hardcoded "/local/domain/0/BLAH". It's wrong,
because the QEMU in stubdom is not running in 0. The correct prefix
should be "/local/domain/$stubdom_id".

Consider another QEMU process running in Dom0 that provides some kind
of backend directly to the guest. It will also write to the same
"/local/domain/0/BLAH" path. We then have both QEMU in stubdom and QEMU
in Dom0 writing to the same xenstore path, which is of course wrong.
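
To illustrate the collision, here is a minimal sketch that models the
xenstore tree as a plain dict. It is not the real xenstore API; the
function names are hypothetical, "BLAH" is the same placeholder used
above, and the domain ID 5 for the stubdom is an arbitrary example.

```python
# Toy model only: a dict stands in for the xenstore tree.

def dm_state_path(domid, suffix):
    """Build a device-model state path under the given domain.

    `suffix` stands in for the elided "BLAH" part of the path.
    """
    return "/local/domain/%d/%s" % (domid, suffix)


def write_state_hardcoded(store, suffix, value):
    # Current behaviour: every QEMU writes under domain 0.
    store[dm_state_path(0, suffix)] = value


def write_state_per_domain(store, own_domid, suffix, value):
    # Proposed prefix: each QEMU writes under the domain it runs in.
    store[dm_state_path(own_domid, suffix)] = value


# Hardcoded prefix: stubdom QEMU and Dom0 QEMU share one key.
hardcoded = {}
write_state_hardcoded(hardcoded, "BLAH", "running (stubdom QEMU)")
write_state_hardcoded(hardcoded, "BLAH", "running (Dom0 QEMU)")

# Per-domain prefix: stubdom (assumed domid 5) and Dom0 get distinct keys.
per_domain = {}
write_state_per_domain(per_domain, 5, "BLAH", "running")
write_state_per_domain(per_domain, 0, "BLAH", "running")
```

With the hardcoded prefix the second write overwrites the first, so a
reader cannot tell which QEMU reported "running". With the per-domain
prefix each writer has its own key.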

We would still rely on xenstore as the basic mechanism to present device
model state, but the protocol as-is is broken.  Since this is broken
anyway we might as well consider redesigning that protocol to make it
both work properly and be future proof.

> > Unfortunately this problem can't be solved without putting in
> > significant effort and time (involves redesigning the protocol and handling
> > all the compatibility issues). We can't say for sure when the solution
> > is going to land.
> I noticed some discussion about this on xen-devel.  Unfortunately, I
> was unable to find anything that laid out specifically what the
> problems are - can you point me to a bug report or such?  The libxl
> startup code - with callbacks on top of callbacks, callbacks within
> callbacks, and callbacks stashed away in little places only to be
> called _much_ later - is really convoluted, I suspect particularly so
> for stubdom startup.  I am not surprised it got broken - who can
> remember how it works?

It's not how libxl is coded. It's the startup protocol that is broken.
The breakage of stubdom in Xen 4.5 is a latent bug exposed by a new

I guess I should just send a bug report saying "Device model startup
protocol is broken". But I don't have much to say at this point, because
thorough research for both qemu-trad and qemu-upstream is required to
produce a sensible report.

> While working on these patches reviving Anthony's work, I consistently
> ran into HVM startup problems with QEMU upstream in a stub domain (it
> always failed).  What I could not figure out is why QEMU-traditional
> did not have a similar problem; it seemed to me that the same race
> existed for QEMU-traditional stubdom.  I wrote it off as either (1)
> MiniOS startup was so much faster than Linux that QEMU-traditional
> always won the race, or (2) there was some implicit mechanism in

My bet is on 1).

> QEMU-traditional that ensured the HVM guest would wait for the device
> model to be in place.  It sounds like maybe the race actually is being

QEMU-trad stubdom is suffering from the same problem.

> lost in 4.5.

So prior to 4.5, when a guest vcpu issues an emulation request, that
request is put on a ring and the guest vcpu is paused. When a DM shows
up it processes that request, posts a response, and then the guest vcpu
is unpaused. So the DM implicitly depends on Xen's behaviour to work.

In 4.5, a new feature called ioreq server is added. When Xen sees an
io request that has no backing DM, it returns immediately. The guest
sees some weird value and crashes. That is, Xen's behaviour has changed
and a latent bug in stubdom's startup protocol is exposed.
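
The difference between the two behaviours can be sketched with a toy
model. This is not Xen code: the class, the method names and the
UNHANDLED placeholder are all made up for illustration, and the real
value a guest reads back is not specified here.

```python
from collections import deque

# Placeholder for whatever value the guest reads back when no DM exists.
UNHANDLED = "unhandled"


class ToyXen:
    """Toy model of how an emulation request is handled without a DM."""

    def __init__(self, queue_without_dm):
        # True models the pre-4.5 behaviour, False the 4.5 behaviour.
        self.queue_without_dm = queue_without_dm
        self.ring = deque()
        self.dm_attached = False
        self.vcpu_paused = False

    def guest_io(self, port):
        if self.dm_attached:
            return ("emulated", port)
        if self.queue_without_dm:
            # Pre-4.5: park the request and pause the vcpu until a DM appears.
            self.ring.append(port)
            self.vcpu_paused = True
            return None
        # 4.5: no backing DM, return immediately.
        return UNHANDLED

    def attach_dm(self):
        # The DM drains the ring, posts responses, and the vcpu resumes.
        self.dm_attached = True
        responses = [("emulated", port) for port in self.ring]
        self.ring.clear()
        self.vcpu_paused = False
        return responses


pre_45 = ToyXen(queue_without_dm=True)
early_result_pre = pre_45.guest_io(0x70)
late_responses = pre_45.attach_dm()

xen_45 = ToyXen(queue_without_dm=False)
early_result_45 = xen_45.guest_io(0x70)
```

In the first case the early request is answered once the DM attaches,
which hides the race. In the second case the guest gets a bogus answer
straight away, so the DM must be running before the guest is unpaused.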

> If the problem you are contending with is that the HVM guest is being
> unpaused before the device model is in place, I suggest that this
> patch, or something much like it, should address it.  I note that I
> merely verified it did not break QEMU-traditional stubdom, but it is
> just a matter of ensuring QEMU-traditional writes to _some_ xenstore
> path when it is ready (it might do this already, in fact), and that
> this patch waits on that path.  Also, it should be pretty easy to
> extend this concept to ensure any additional stubdoms, such as vTPM,
> are up and running before leaving the code in libxl_dm.c and unpausing
> the HVM domain - we just chain through additional callbacks as needed.

Yes, that's the basic idea, chaining things together.
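
A rough sketch of such chaining, using an in-memory stand-in for
xenstore watches. Nothing here is libxl code: ToyStore, wait_for_all,
the "running" value check and the example paths are all hypothetical.

```python
class ToyStore:
    """In-memory stand-in for xenstore with per-path watch callbacks."""

    def __init__(self):
        self.data = {}
        self.watches = {}

    def watch(self, path, callback):
        self.watches.setdefault(path, []).append(callback)
        # Fire at once if the value is already there.
        if path in self.data:
            callback(path, self.data[path])

    def write(self, path, value):
        self.data[path] = value
        for callback in list(self.watches.get(path, [])):
            callback(path, value)


def wait_for_all(store, paths, done):
    """Wait for each path in turn to read "running", then call done()."""
    if not paths:
        done()
        return
    head, rest = paths[0], paths[1:]
    fired = [False]  # guard so repeated writes do not re-run the chain

    def on_change(path, value):
        if value == "running" and not fired[0]:
            fired[0] = True
            wait_for_all(store, rest, done)

    store.watch(head, on_change)


events = []
store = ToyStore()
# Hypothetical readiness paths for a device-model stubdom and a vTPM stubdom.
ready_paths = ["/local/domain/5/dm-state", "/local/domain/6/vtpm-state"]
wait_for_all(store, ready_paths, lambda: events.append("unpause guest"))

store.write("/local/domain/6/vtpm-state", "running")  # not enough on its own
events_after_vtpm = list(events)
store.write("/local/domain/5/dm-state", "running")    # chain completes
```

The guest is unpaused only after every stubdom in the chain has
reported in, regardless of the order in which they become ready.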

> There may be a desire to do a major rework of libxl_dm.c, etc., but
> this patch might be a reasonable bandaid now for Xen 4.5.1.
> > Also upstream QEMU stubdom, as you already notice, doesn't have a
> > critical functionality -- save / restore. Adding that in might involve
> > upstreaming some changes to QEMU, which has a time frame that is out of
> > our control.
> Xen maintains a separate repo for the QEMU code it uses.  I presume
> this is because there is always something a little out of sync with
> the mainstream QEMU release.  I do not understand why we cannot rely
> on this to make available any needed changes to QEMU pending their
> incorporation into QEMU proper.

ISTR our policy is upstream first. That is, though we maintain our own
qemu tree, those changesets are all upstream changesets. Arguably there
might be some bandaid changesets that are not upstream, but big changes
like this need to be upstreamed first.

Stefano, could you clarify this and correct me if I'm wrong?

> > So my hunch is that we're not going to make it in time for
> > 4.6. :-/
> >
> > Wei.
> 4.5 was _just_ released, and Xen is on a ~10 month release cycle.  Why
> can't this get done?  Someone just has to take a little time to sit

Notably, many of those months are code freeze.

And due to our upstream first QEMU policy we would also need to upstream
changes to QEMU.

> down and think about this.  I remain baffled why Xen did not
> transition to QEMU-upstream stubdom 2 years ago.  Running the device
> model directly in Dom0 is an obvious and significant security concern
> - the QEMU codebase is in constant flux, is too big, and is too
> complex to be allowed to be part of the TCB.  I do not think this, or
> even chroot jailing the device model (which I understand has been done
> for some Xen-based projects), meets the standards for security
> demonstrated by the rest of the project.

I would like to have upstream QEMU as much as you do. :-)

I don't object to Linux-based stubdom (if it works, why not); however I
object to the idea that we continue to use the broken protocol.  I shall
find time to kick off a discussion on xen-devel with regard to the new

> Can we arrive at an agreement that a Linux-based QEMU-upstream stubdom
> should _at least_ be a technical preview for Xen 4.6?  A year ago,

If we really want to make this happen before the new protocol and
implementation are in place, it would be "tech preview" or
"experimental", whichever is the term for the least mature technology.
Note that this is not due to the route it chooses (Linux based), it's
due to the fact that the protocol is broken and destined to be changed.


> George kicked around the idea that QEMU-upstream stubdom should be a
> blocker for Xen 4.5 - clearly this notion fell through the cracks.
> For a reasonable number of users, specifically those wishing to use
> Xen as a desktop solution, save/restore is not required, and could be
> omitted in 4.6.  I understand that rumpkernel has been a preferred
> route, but realistically that looks like a Xen 5.0 feature - I have
> seen no indication we are anywhere near making that happen, whereas
> Linux will work now, with very few technical hurdles to overcome
> (right now, the main issues seem to be getting xenfb hooked up
> correctly, and deciding how we wish to handle certain elements of the
> build process).
> Best,
> Eric
