Xen project Mailing List

Re: [Xen-devel] Xen Platform QoS design discussion

To: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, George Dunlap <George.Dunlap@xxxxxxxxxxxxx>

From: "Xu, Dongxiao" <dongxiao.xu@xxxxxxxxx>

Date: Thu, 8 May 2014 05:21:14 +0000

Accept-language: en-US

Cc: Ian Campbell <Ian.Campbell@xxxxxxxxxx>, Jan Beulich <JBeulich@xxxxxxxx>, "xen-devel@xxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxx>

Delivery-date: Thu, 08 May 2014 05:21:51 +0000

List-id: Xen developer discussion <xen-devel.lists.xen.org>

Thread-index: Ac9kk87wxWHtz3jCTGiTUahAmD2eXv//fisA//74dFCAA6whgP//SxMQgADrzQD//R3q8P/3B02g/+syd8D/1fcbsA==

Thread-topic: [Xen-devel] Xen Platform QoS design discussion

> -----Original Message-----
> From: Andrew Cooper [mailto:andrew.cooper3@xxxxxxxxxx]
> Sent: Thursday, May 08, 2014 5:19 AM
> To: George Dunlap
> Cc: Xu, Dongxiao; Ian Campbell; Jan Beulich; xen-devel@xxxxxxxxxxxxx
> Subject: Re: [Xen-devel] Xen Platform QoS design discussion
>
> On 07/05/14 14:26, George Dunlap wrote:
> > On Tue, May 6, 2014 at 11:06 AM, Andrew Cooper
> > <andrew.cooper3@xxxxxxxxxx> wrote:
> >> On 06/05/14 02:40, Xu, Dongxiao wrote:
> >>>> -----Original Message-----
> >>>> From: Xu, Dongxiao
> >>>> Sent: Sunday, May 04, 2014 8:46 AM
> >>>> To: Jan Beulich
> >>>> Cc: Andrew Cooper(andrew.cooper3@xxxxxxxxxx); Ian Campbell;
> >>>> xen-devel@xxxxxxxxxxxxx
> >>>> Subject: RE: Xen Platform QoS design discussion
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
> >>>>> Sent: Friday, May 02, 2014 8:40 PM
> >>>>> To: Xu, Dongxiao
> >>>>> Cc: Andrew Cooper(andrew.cooper3@xxxxxxxxxx); Ian Campbell;
> >>>>> xen-devel@xxxxxxxxxxxxx
> >>>>> Subject: RE: Xen Platform QoS design discussion
> >>>>>
> >>>>>>>> On 02.05.14 at 14:30, <dongxiao.xu@xxxxxxxxx> wrote:
> >>>>>>> -----Original Message-----
> >>>>>>> From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
> >>>>>>> Sent: Friday, May 02, 2014 5:24 PM
> >>>>>>> To: Xu, Dongxiao
> >>>>>>> Cc: Andrew Cooper(andrew.cooper3@xxxxxxxxxx); Ian Campbell;
> >>>>>>> xen-devel@xxxxxxxxxxxxx
> >>>>>>> Subject: RE: Xen Platform QoS design discussion
> >>>>>>>
> >>>>>>>>>> On 01.05.14 at 02:56, <dongxiao.xu@xxxxxxxxx> wrote:
> >>>>>>>>> From: Ian Campbell [mailto:Ian.Campbell@xxxxxxxxxx]
> >>>>>>>>> Have you asked yourself whether this information even needs to be
> >>>>>>>>> exposed all the way up to libxl? Who are the expected consumers of
> >>>>>>>>> this interface? Are they low-level CLI tools (i.e. like xenpm is) or
> >>>>>>>>> are you expecting toolstacks to plumb this information all the way up
> >>>>>>>>> to their GUI or CLI (e.g. xl or virsh)?
> >>>>>>>> The information returned to libxl users is the cache utilization for
> >>>>>>>> a certain domain in certain socket, and the main consumers are cloud
> >>>>>>>> users like openstack, etc. Of course, we will also provide an xl
> >>>>>>>> command to present such information.
> >>>>>>> To me this doesn't really address the question Ian asked, yet knowing
> >>>>>>> who's going to be the consumer of the data is also quite relevant for
> >>>>>>> answering your original question on the method to obtain that data.
> >>>>>>> Obviously, if the main use of it is per-domain, a domctl would seem
> >>>>>>> like a suitable approach despite the data being more of sysctl kind.
> >>>>>>> But if a global view would be more important, that model would seem to
> >>>>>>> make life needlessly hard for the consumers. In turn, if using a
> >>>>>>> domctl, I tend to agree that not using shared pages would be
> >>>>>>> preferable; iirc their use was mainly suggested because of the size of
> >>>>>>> the data.
> >>>>>> From the discussion with openstack developers, on certain cloud host,
> >>>>>> all running VM's information (e.g., domain ID) will be stored in a
> >>>>>> database, and openstack software will use libvirt/XenAPI to query
> >>>>>> specific domain information. That libvirt/XenAPI API interface
> >>>>>> basically accepts the domain ID as input parameter and get the domain
> >>>>>> information, including the platform QoS one.
> >>>>>>
> >>>>>> Based on above information, I think we'd better design the QoS
> >>>>>> hypercall per-domain.
> >>>>> If you think that this is going to be the only (or at least prevalent)
> >>>>> usage model, that's probably okay then. But I'm a little puzzled that
> >>>>> all this effort is just for a single, rather specific consumer. I
> >>>>> thought that if this is so important to Intel there would be wider
> >>>>> interested audience.
> >>> Since there is no further comments, I suppose we all agreed on making the
> >>> hypercall per-domain and use data copying mechanism between hypervisor and
> >>> Dom0 tool stack?
> >>>
> >> No - the onus is very much on you to prove that your API will *not* be
> >> used in the following way:
> >>
> >> every $TIMEPERIOD
> >>     for each domain
> >>         for each type of information
> >>             get-$TYPE-information-for-$DOMAIN
> >>
> >>
> >> Which is the source of my concerns regarding overhead.
> >>
> >> As far as I can see, as soon as you provide access to this QoS
> >> information, higher level toolstacks are going to want all information
> >> for all domains. Given your proposed domctl, they will have exactly one
> >> (bad) way of getting this information.
> > Is this really going to be that much of a critical path that we need
> > to even have this discussion?
>
> Absolutely.
>
> If that logical set of nested loops is on a remote control instance
> where get-$TYPE-information-for-$DOMAIN involves rpc to a particular
> dom0, then the domctls can be approximated as being functionally
> infinite time periods apart.
>
> If the set of nested loops is a daemon or script in dom0, the domctls
> will be very close together.
>
> As the current implementation involves taking a global spinlock, IPI'ing
> the other sockets and MSR interactions, the net impact on the running
> system can be massive, particularly if back-to-back IPIs interrupt HVM
> guests.
>
> >
> > We have two different hypercalls right now for getting "dominfo": a
> > domctl and a sysctl. You use the domctl if you want information about
> > a single domain, you use sysctl if you want information about all
> > domains. The sysctl implementation calls the domctl implementation
> > internally.
>
> It is not a fair comparison, given the completely different nature of
> the domctls in question. XEN_DOMCTL_getdomaininfo is doing very little
> more than reading specific bits of data out the appropriate struct
> domain and its struct vcpu's which can trivially be done by the cpu
> handling the hypercall.
>
> >
> > Is there a problem with doing the same thing here? Or, with starting
> > with a domctl, and then creating a sysctl if iterating over all
> > domains (and calling the domctl internally) if we measure the domctl
> > to be too slow for many callers?
> >
> > -George
>
> My problem is not with the domctl per-se.
>
> My problem is that this is not a QoS design discussion; this is an
> email thread about a specific QoS implementation which is not answering
> the concerns raised against it to the satisfaction of people raising the
> concerns.
>
> The core argument here is that a statement of "OpenStack want to get a
> piece of QoS data back from libvirt/xenapi when querying a specific
> domain" is being used to justify implementing the hypercall in an
> identical fashion.
>
> This is not a libxl design; this is a single user story forming part of
> the requirement "I as a cloud service provider would like QoS
> information for each VM to be available to my
> $CHOSEN_ORCHESTRATION_SOFTWARE so I can {differentially charge
> customers, balance my load more evenly, etc}".
>
> The only valid justification for implementing a brand new hypercall in a
> certain way is "Because $THIS_CERTAIN_WAY is the $MOST_SENSIBLE way to
> perform the actions I need to perform", for appropriately
> substitutions. Not "because it is the same way I want to hand this
> information off at the higher level".
>
> As part of this design discussion. I have raised a concern saying "I
> believe the usecase of having a stats gathering daemon in dom0 has not
> been appropriately considered", qualified with "If you were to use the
> domctl as currently designed from a stats gathering daemon, you will
> cripple Xen with the overhead".
>
> Going back to the original use, xenapi has a stats daemon for these
> things. It has an rpc interface so a query given a specific domain can
> return some or all data for that domain, but it very definitely does not
> translate each request into a hypercall for the requested information.
> I have no real experience with libvirt, so can't comment on stats
> gathering in that context.
>
> I have proposed an alternative Xen->libxc interface designed with a
> stats daemon in mind, explaining why I believe it has lower overheads to
> Xen and why is more in line with what I expect ${VENDOR}Stack to
> actually want.
>
> I am now waiting for a reasoned rebuttal which has more content than
> "because there are a set of patches which already implement it in this way".

No, I don't have a patch for the domctl implementation.

Over the past half year, all of the previous patches (v1-v10) were implemented the sysctl way. People raised many comments on that approach (large memory size, runtime non-zero-order memory allocation, page sharing with user space, special CPU online/offline logic, etc.), and addressing them has made the platform QoS implementation in Xen more and more complex. That is why I am proposing the domctl method, which can make things easier.

I have nothing more to argue or rebut. If you prefer sysctl, I can continue with a v11, v12 or more that presents the big two-dimensional array to the end user and lets them extract the data they actually need, still including the extra CPU online/offline logic to handle runtime allocation of the QoS resources.

Thanks,
Dongxiao

>
> ~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
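
For readers following the overhead argument in this thread, here is a minimal toy sketch of the two access patterns being compared: the nested per-domain, per-type polling loop, and a stats daemon that takes one batched snapshot and answers per-domain queries from a cache. Nothing here is a Xen, libxc or libxl API. `FakeHypervisor`, `query_one`, `query_all`, `StatsDaemon` and the type names are hypothetical, and the cost model (every call costs exactly one cross-socket round, however much data it returns) is an assumption made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class FakeHypervisor:
    """Toy stand-in for the hypervisor (hypothetical, not a Xen interface)."""
    domains: list
    types: list
    expensive_rounds: int = 0  # counts assumed lock + IPI + MSR rounds

    def _sample(self, domid, qos_type):
        # Placeholder value; real data would come from hardware counters.
        return (domid, qos_type)

    def query_one(self, domid, qos_type):
        """Per-domain, per-type query: assumed to cost one round per call."""
        self.expensive_rounds += 1
        return self._sample(domid, qos_type)

    def query_all(self):
        """Batched query: assumed to cost one round for the whole snapshot."""
        self.expensive_rounds += 1
        return {(d, t): self._sample(d, t)
                for d in self.domains for t in self.types}


def poll_per_domain(hv):
    """The nested-loop pattern: one call per (domain, type) pair."""
    return {(d, t): hv.query_one(d, t)
            for d in hv.domains for t in hv.types}


class StatsDaemon:
    """Refreshes one snapshot per period and serves queries from its cache."""

    def __init__(self, hv):
        self.hv = hv
        self.cache = {}

    def refresh(self):
        self.cache = self.hv.query_all()

    def get(self, domid, qos_type):
        # Answering a per-domain request does not touch the hypervisor.
        return self.cache[(domid, qos_type)]
```

Under these assumptions, with 50 domains and 2 information types, one polling period costs 100 rounds in the nested-loop pattern and 1 round with the cached snapshot, while both hand the same data back to a caller asking about a single domain. The real costs depend on the actual implementation and are not modelled here.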
