Xen project Mailing List

Re: [Xen-devel] [RFC 7/7] libxl: Wait for QEMU startup in stubdomain

From: Eric Shelton <eshelton@xxxxxxxxx>

Date: Fri, 6 Feb 2015 10:46:15 -0500

Cc: Anthony PERARD <anthony.perard@xxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxx, Ian Campbell <Ian.Campbell@xxxxxxxxxx>, Stefano Stabellini <stefano.stabellini@xxxxxxxxxxxxx>

Delivery-date: Fri, 06 Feb 2015 15:46:42 +0000

List-id: Xen developer discussion <xen-devel.lists.xen.org>

On Fri, Feb 6, 2015 at 9:59 AM, Wei Liu <wei.liu2@xxxxxxxxxx> wrote:
> On Fri, Feb 06, 2015 at 08:56:40AM -0500, Eric Shelton wrote:
>> On Fri, Feb 6, 2015 at 6:16 AM, Wei Liu <wei.liu2@xxxxxxxxxx> wrote:
>>
>> I simply used the code already present in the QEMU upstream code,
>> which is writing to that particular path to indicate "running." Since
>> it is distinct from the path used by the QEMU instance running in
>> Dom0, it works for my intended purpose: ensuring the device model is
>> running before unpausing the HVM guest. When you say it is "wrong,"
>> is that just because you ultimately intend to rearchitect this and use
>> something different? If so, maybe the path I am using is "good
>> enough" until that happens. Otherwise, can you suggest a better path
>> or mechanism?
>>
>
> It is not "good enough". It just happens to be working.
>
> Currently the path is hardcoded "/local/domain/0/BLAH". It's wrong,
> because the QEMU in stubdom is not running in 0. The correct prefix
> should be "/local/domain/$stubdom_id".

OK; that definitely makes more sense - I recall the same idea crossing my mind when I first dug into this. Although the revised protocol may go in a different direction, I will adopt this approach for now.

>> I noticed some discussion about this on xen-devel. Unfortunately, I
>> was unable to find anything that laid out specifically what the
>> problems are - can you point me to a bug report or such? The libxl
>> startup code - with callbacks on top of callbacks, callbacks within
>> callbacks, and callbacks stashed away in little places only to be
>> called _much_ later - is really convoluted, I suspect particularly so
>> for stubdom startup. I am not surprised it got broken - who can
>> remember how it works?
>>
>
> It's not how libxl is coded. It's the startup protocol that is broken.
> The breakage of stubdom in Xen 4.5 is a latent bug exposed by a new
> feature.
>
> I guess I should just send a bug report saying "Device model startup
> protocol is broken". But I don't have much to say at this point, because
> thorough research for both qemu-trad and qemu-upstream is required to
> produce a sensible report.

So, just where is the current protocol breaking down? Is there a contemplated bandaid for 4.5.1? I'm just trying to figure out what I might want to do differently.

> So prior to 4.5, when there is emulation request issued by a guest vcpu,
> that request is put on a ring, guest vcpu is paused. When a DM shows up
> it processes that request, posts response, then guest vcpu is unpaused.
> So there is implicit dependency on Xen's behaviour for DM to work.
>
> In 4.5, a new feature called ioreq server is added. When Xen sees an
> io request which has no backing DM, it returns immediately. Guest sees some
> weird value and crashes. That is, Xen's behaviour has changed and a
> latent bug in stubdom's startup protocol is exposed.

So, is the approach that I took - waiting for the stubdom DM to finish initializing - a reasonable short-term solution? I guess I am wondering whether the fix you are contemplating is in libxl, the hypervisor, or both.

Thanks,
Eric

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
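
For readers following the thread, the two ideas discussed above (rooting the state path at the stubdomain's own domid rather than at 0, and waiting for the device model to report "running" before unpausing the guest) can be illustrated with a small sketch. This is not libxl or QEMU code. The function names, the `read` callback, and the polling loop are all assumptions made for illustration; libxl itself is written in C and is callback-driven, as the message notes. The path suffix is left as a parameter because the thread only gives it as "BLAH".

```python
import time
from typing import Callable, Optional


def dm_state_path(dm_domid: int, suffix: str) -> str:
    """Build a xenstore-style path rooted at the domain hosting the device model.

    dm_domid is 0 when QEMU runs in Dom0 and the stubdomain's id when QEMU
    runs in a stubdomain. `suffix` is hypothetical: the thread elides it.
    """
    return f"/local/domain/{dm_domid}/{suffix.lstrip('/')}"


def wait_for_running(read: Callable[[str], Optional[str]], path: str,
                     timeout: float = 10.0, interval: float = 0.1) -> bool:
    """Poll `read(path)` until it returns "running" or the timeout expires.

    `read` is a caller-supplied stand-in for a xenstore read; it should
    return None while the key does not exist. Returns True if the device
    model reported "running" in time, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while True:
        if read(path) == "running":
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


if __name__ == "__main__":
    # A plain dict plays the role of xenstore in this demonstration.
    fake_store = {}
    path = dm_state_path(7, "BLAH")  # 7 is an arbitrary example stubdom id
    fake_store[path] = "running"
    if wait_for_running(fake_store.get, path, timeout=1.0):
        print("device model is running; the guest could be unpaused now")
```

The guest would be unpaused only when `wait_for_running` returns True. A False return corresponds to the case where unpausing early would let the guest issue an I/O request with no device model behind it.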
