
Re: [Xen-devel] Xen Platform QoS design discussion

> -----Original Message-----
> From: Andrew Cooper [mailto:andrew.cooper3@xxxxxxxxxx]
> Sent: Thursday, May 08, 2014 5:19 AM
> To: George Dunlap
> Cc: Xu, Dongxiao; Ian Campbell; Jan Beulich; xen-devel@xxxxxxxxxxxxx
> Subject: Re: [Xen-devel] Xen Platform QoS design discussion
> On 07/05/14 14:26, George Dunlap wrote:
> > On Tue, May 6, 2014 at 11:06 AM, Andrew Cooper
> > <andrew.cooper3@xxxxxxxxxx> wrote:
> >> On 06/05/14 02:40, Xu, Dongxiao wrote:
> >>>> -----Original Message-----
> >>>> From: Xu, Dongxiao
> >>>> Sent: Sunday, May 04, 2014 8:46 AM
> >>>> To: Jan Beulich
> >>>> Cc: Andrew Cooper(andrew.cooper3@xxxxxxxxxx); Ian Campbell;
> >>>> xen-devel@xxxxxxxxxxxxx
> >>>> Subject: RE: Xen Platform QoS design discussion
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
> >>>>> Sent: Friday, May 02, 2014 8:40 PM
> >>>>> To: Xu, Dongxiao
> >>>>> Cc: Andrew Cooper(andrew.cooper3@xxxxxxxxxx); Ian Campbell;
> >>>>> xen-devel@xxxxxxxxxxxxx
> >>>>> Subject: RE: Xen Platform QoS design discussion
> >>>>>
> >>>>>>>> On 02.05.14 at 14:30, <dongxiao.xu@xxxxxxxxx> wrote:
> >>>>>>>  -----Original Message-----
> >>>>>>> From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
> >>>>>>> Sent: Friday, May 02, 2014 5:24 PM
> >>>>>>> To: Xu, Dongxiao
> >>>>>>> Cc: Andrew Cooper(andrew.cooper3@xxxxxxxxxx); Ian Campbell;
> >>>>>>> xen-devel@xxxxxxxxxxxxx
> >>>>>>> Subject: RE: Xen Platform QoS design discussion
> >>>>>>>
> >>>>>>>>>> On 01.05.14 at 02:56, <dongxiao.xu@xxxxxxxxx> wrote:
> >>>>>>>>> From: Ian Campbell [mailto:Ian.Campbell@xxxxxxxxxx]
> >>>>>>>>> Have you asked yourself whether this information even needs to be
> >>>>>>>>> exposed all the way up to libxl? Who are the expected consumers of
> >>>>>>>>> this interface? Are they low-level CLI tools (i.e. like xenpm is)
> >>>>>>>>> or are you expecting toolstacks to plumb this information all the
> >>>>>>>>> way up to their GUI or CLI (e.g. xl or virsh)?
> >>>>>>>> The information returned to libxl users is the cache utilization for
> >>>>>>>> a certain domain in certain socket, and the main consumers are cloud
> >>>>>>>> users like openstack, etc. Of course, we will also provide an xl
> >>>>>>>> command to present such information.
> >>>>>>> To me this doesn't really address the question Ian asked, yet knowing
> >>>>>>> who's going to be the consumer of the data is also quite relevant for
> >>>>>>> answering your original question on the method to obtain that data.
> >>>>>>> Obviously, if the main use of it is per-domain, a domctl would seem
> >>>>>>> like a suitable approach despite the data being more of sysctl kind.
> >>>>>>> But if a global view would be more important, that model would seem
> >>>>>>> to make life needlessly hard for the consumers. In turn, if using a
> >>>>>>> domctl, I tend to agree that not using shared pages would be
> >>>>>>> preferable; iirc their use was mainly suggested because of the size
> >>>>>>> of the data.
> >>>>>> From the discussion with openstack developers, on certain cloud host,
> >>>>>> all running VM's information (e.g., domain ID) will be stored in a
> >>>>>> database, and openstack software will use libvirt/XenAPI to query
> >>>>>> specific domain information. That libvirt/XenAPI API interface
> >>>>>> basically accepts the domain ID as input parameter and get the domain
> >>>>>> information, including the platform QoS one.
> >>>>>>
> >>>>>> Based on above information, I think we'd better design the QoS
> >>>>>> hypercall per-domain.
> >>>>> If you think that this is going to be the only (or at least prevalent)
> >>>>> usage model, that's probably okay then. But I'm a little puzzled that
> >>>>> all this effort is just for a single, rather specific consumer. I
> >>>>> thought that if this is so important to Intel there would be wider
> >>>>> interested audience.
> >>> Since there is no further comments, I suppose we all agreed on making the
> >>> hypercall per-domain and use data copying mechanism between hypervisor and
> >>> Dom0 tool stack?
> >>>
> >> No - the onus is very much on you to prove that your API will *not* be
> >> used in the following way:
> >>
> >> every $TIMEPERIOD
> >>   for each domain
> >>     for each type of information
> >>       get-$TYPE-information-for-$DOMAIN
> >>
> >>
> >> Which is the source of my concerns regarding overhead.
> >>
> >> As far as I can see, as soon as you provide access to this QoS
> >> information, higher level toolstacks are going to want all information
> >> for all domains.  Given your proposed domctl, they will have exactly one
> >> (bad) way of getting this information.
> > Is this really going to be that much of a critical path that we need
> > to even have this discussion?
> Absolutely.
>
> If that logical set of nested loops is on a remote control instance
> where get-$TYPE-information-for-$DOMAIN involves rpc to a particular
> dom0, then the domctls can be approximated as being functionally
> infinite time periods apart.
>
> If the set of nested loops is a daemon or script in dom0, the domctls
> will be very close together.
>
> As the current implementation involves taking a global spinlock, IPI'ing
> the other sockets and MSR interactions, the net impact on the running
> system can be massive, particularly if back-to-back IPIs interrupt HVM
> guests.
> >
> > We have two different hypercalls right now for getting "dominfo": a
> > domctl and a sysctl.  You use the domctl if you want information about
> > a single domain, you use sysctl if you want information about all
> > domains.  The sysctl implementation calls the domctl implementation
> > internally.
> It is not a fair comparison, given the completely different nature of
> the domctls in question.  XEN_DOMCTL_getdomaininfo is doing very little
> more than reading specific bits of data out the appropriate struct
> domain and its struct vcpu's which can trivially be done by the cpu
> handling the hypercall.
> >
> > Is there a problem with doing the same thing here?  Or, with starting
> > with a domctl, and then creating a sysctl if iterating over all
> > domains (and calling the domctl internally) if we measure the domctl
> > to be too slow for many callers?
> >
> >  -George
> My problem is not with the domctl per-se.
>
> My problem is that this is not a QoS design discussion;  this is an
> email thread about a specific QoS implementation which is not answering
> the concerns raised against it to the satisfaction of people raising the
> concerns.
>
> The core argument here is that a statement of "OpenStack want to get a
> piece of QoS data back from libvirt/xenapi when querying a specific
> domain" is being used to justify implementing the hypercall in an
> identical fashion.
>
> This is not a libxl design; this is a single user story forming part of
> the requirement "I as a cloud service provider would like QoS
> information for each VM to be available to my
> $CHOSEN_ORCHESTRATION_SOFTWARE so I can {differentially charge
> customers, balance my load more evenly, etc}".
>
> The only valid justification for implementing a brand new hypercall in a
> certain way is "Because $THIS_CERTAIN_WAY is the $MOST_SENSIBLE way to
> perform the actions I need to perform", for appropriate
> substitutions.  Not "because it is the same way I want to hand this
> information off at the higher level".
>
> As part of this design discussion, I have raised a concern saying "I
> believe the usecase of having a stats gathering daemon in dom0 has not
> been appropriately considered", qualified with "If you were to use the
> domctl as currently designed from a stats gathering daemon, you will
> cripple Xen with the overhead".
>
> Going back to the original use, xenapi has a stats daemon for these
> things.  It has an rpc interface so a query given a specific domain can
> return some or all data for that domain, but it very definitely does not
> translate each request into a hypercall for the requested information.
>
> I have no real experience with libvirt, so can't comment on stats
> gathering in that context.
>
> I have proposed an alternative Xen->libxc interface designed with a
> stats daemon in mind, explaining why I believe it has lower overheads to
> Xen and why it is more in line with what I expect ${VENDOR}Stack to
> actually want.
>
> I am now waiting for a reasoned rebuttal which has more content than
> "because there are a set of patches which already implement it in this way".

No, I don't have a patch for the domctl implementation.

Over the past half year, all of the previous v1-v10 patches were implemented
the sysctl way. Based on those, people raised a lot of comments (large memory
size, runtime non-0 order memory allocation, page sharing with user space,
special CPU online/offline logic, etc.), and these made the platform QoS
implementation more and more complex in Xen. That's why I am proposing the
domctl method, which can make things easier.

I don't have anything more to argue or rebut. If you prefer sysctl, I can
continue to work out a v11, v12 or more that presents the big 2-dimensional
array to the end user and lets them extract the data they actually need, and
that still includes the extra CPU online/offline logic to handle the QoS
resource runtime allocation.
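
For reference, the two access patterns being weighed in this thread can be
compared with a toy Python model. This is only a sketch: ToyHost,
query_domain, query_all and the "one IPI per other socket per collection"
cost are all hypothetical assumptions made for illustration. None of them is
a Xen, libxc or libxl interface, and the counter values are placeholders.

```python
from dataclasses import dataclass


@dataclass
class ToyHost:
    """Hypothetical stand-in for a host; not a real Xen interface."""
    domains: list
    sockets: int
    ipis_sent: int = 0  # cross-socket interrupts triggered so far

    def _interrupt_other_sockets(self):
        # Assumption: one collection pass interrupts every other socket once.
        self.ipis_sent += self.sockets - 1

    def query_domain(self, domid):
        """domctl-style call: one collection pass per domain queried."""
        self._interrupt_other_sockets()
        return {sock: 0 for sock in range(self.sockets)}  # placeholder data

    def query_all(self):
        """sysctl-style call: one collection pass covers every domain."""
        self._interrupt_other_sockets()
        return {d: {sock: 0 for sock in range(self.sockets)}
                for d in self.domains}


def poll_per_domain(host):
    # The nested-loop pattern: a stats daemon asks for each domain in turn.
    return {d: host.query_domain(d) for d in host.domains}


def poll_batched(host):
    # The alternative: fetch everything once and serve lookups from the result.
    return host.query_all()
```

Under these assumptions, polling N domains one at a time costs N collection
passes per period, while the batched call costs one, and both return the same
per-domain, per-socket table.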


> ~Andrew