Xen project Mailing List

Re: [Xen-devel] [RFC 7/7] libxl: Wait for QEMU startup in stubdomain

From: Eric Shelton <eshelton@xxxxxxxxx>

Date: Fri, 6 Feb 2015 08:56:40 -0500

Cc: Anthony PERARD <anthony.perard@xxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxx, Ian Campbell <Ian.Campbell@xxxxxxxxxx>, Stefano Stabellini <stefano.stabellini@xxxxxxxxxxxxx>

Delivery-date: Fri, 06 Feb 2015 13:57:10 +0000

List-id: Xen developer discussion <xen-devel.lists.xen.org>

On Fri, Feb 6, 2015 at 6:16 AM, Wei Liu <wei.liu2@xxxxxxxxxx> wrote:
> Thanks for posting.
>
> ...
>
> FWIW we are now experiencing problem with this startup protocol (not
> Linux stubdom specific) -- that path that libxl waiting for is wrong.

I simply used the code already present in QEMU upstream, which writes to that particular path to indicate "running." Since it is distinct from the path used by the QEMU instance running in Dom0, it works for my intended purpose: ensuring the device model is running before unpausing the HVM guest.

When you say it is "wrong," is that just because you ultimately intend to rearchitect this and use something different? If so, maybe the path I am using is "good enough" until that happens. Otherwise, can you suggest a better path or mechanism?

> Unfortunately this problem can't be solved without putting in
> significant effort and time (involves redesign of protocol and handle
> all the compatibility issues). We can't say for sure when the solution
> is going to land.

I noticed some discussion about this on xen-devel. Unfortunately, I was unable to find anything that laid out specifically what the problems are - can you point me to a bug report or such?

The libxl startup code - with callbacks on top of callbacks, callbacks within callbacks, and callbacks stashed away in little places only to be called _much_ later - is really convoluted, I suspect particularly so for stubdom startup. I am not surprised it got broken - who can remember how it works?

While working on these patches reviving Anthony's work, I consistently ran into HVM startup problems with QEMU upstream in a stub domain (it always failed). What I could not figure out is why QEMU-traditional did not have a similar problem; it seemed to me that the same race existed for QEMU-traditional stubdom. I wrote it off as either (1) MiniOS startup was so much faster than Linux that QEMU-traditional always won the race, or (2) there was some implicit mechanism in QEMU-traditional that ensured the HVM guest would wait for the device model to be in place. It sounds like maybe the race actually is being lost in 4.5.

If the problem you are contending with is that the HVM guest is being unpaused before the device model is in place, I suggest that this patch, or something much like it, should address it. I note that I merely verified it did not break QEMU-traditional stubdom, but it is just a matter of ensuring QEMU-traditional writes to _some_ xenstore path when it is ready (it might do this already, in fact), and that this patch waits on that path. Also, it should be pretty easy to extend this concept to ensure any additional stubdoms, such as vTPM, are up and running before leaving the code in libxl_dm.c and unpausing the HVM domain - we just chain through additional callbacks as needed.

There may be a desire to do a major rework of libxl_dm.c, etc., but this patch might be a reasonable bandaid now for Xen 4.5.1.

> Also upstream QEMU stubdom, as you already notice, doesn't have a
> critical functionality -- save / restore. Adding that in might involve
> upstreaming some changes to QEMU, which has a time frame that is out of
> our control.

Xen maintains a separate repo for the QEMU code it uses. I presume this is because there is always something a little out of sync with the mainstream QEMU release. I do not understand why we cannot rely on this to make available any needed changes to QEMU pending their incorporation into QEMU proper.

> So my hunch is that we're not going to make it in time for
> 4.6. :-/
>
> Wei.

4.5 was _just_ released, and Xen is on a ~10 month release cycle. Why can't this get done? Someone just has to take a little time to sit down and think about this.

I remain baffled why Xen did not transition to QEMU-upstream stubdom 2 years ago. Running the device model directly in Dom0 is an obvious and significant security concern - the QEMU codebase is in constant flux, is too big, and is too complex to be allowed to be part of the TCB. I do not think this, or even chroot jailing the device model (which I understand has been done for some Xen-based projects), meets the standards for security demonstrated by the rest of the project.

Can we arrive at an agreement that a Linux-based QEMU-upstream stubdom should _at least_ be a technical preview for Xen 4.6? A year ago, George kicked around the idea that QEMU-upstream stubdom should be a blocker for Xen 4.5 - clearly this notion fell through the cracks. For a reasonable number of users, specifically those wishing to use Xen as a desktop solution, save/restore is not required, and could be omitted in 4.6.

I understand that rumpkernel has been a preferred route, but realistically that looks like a Xen 5.0 feature - I have seen no indication we are anywhere near making that happen, whereas Linux will work now, with very few technical hurdles to overcome (right now, the main issues seem to be getting xenfb hooked up correctly, and deciding how we wish to handle certain elements of the build process).

Best,
Eric

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
