
Re: [Xen-devel] Xen Platform QoS design discussion



On Fri, May 16, 2014 at 6:11 AM, Xu, Dongxiao <dongxiao.xu@xxxxxxxxx> wrote:
>> -----Original Message-----
>> From: Xu, Dongxiao
>> Sent: Tuesday, May 13, 2014 9:53 AM
>> To: Andrew Cooper
>> Cc: George Dunlap; Ian Campbell; Jan Beulich; xen-devel@xxxxxxxxxxxxx
>> Subject: RE: [Xen-devel] Xen Platform QoS design discussion
>>
>> > -----Original Message-----
>> > From: Xu, Dongxiao
>> > Sent: Friday, May 09, 2014 10:41 AM
>> > To: Andrew Cooper
>> > Cc: George Dunlap; Ian Campbell; Jan Beulich; xen-devel@xxxxxxxxxxxxx
>> > Subject: RE: [Xen-devel] Xen Platform QoS design discussion
>> >
>> > > -----Original Message-----
>> > > From: Andrew Cooper [mailto:andrew.cooper3@xxxxxxxxxx]
>> > > Sent: Thursday, May 08, 2014 7:26 PM
>> > > To: Xu, Dongxiao
>> > > Cc: George Dunlap; Ian Campbell; Jan Beulich; xen-devel@xxxxxxxxxxxxx
>> > > Subject: Re: [Xen-devel] Xen Platform QoS design discussion
>> > >
>> > > On 08/05/14 06:21, Xu, Dongxiao wrote:
>> > >
>> > > <massive snip>
>> > >
>> > > >>
>> > > >>> We have two different hypercalls right now for getting "dominfo": a
>> > > >>> domctl and a sysctl.  You use the domctl if you want information 
>> > > >>> about
>> > > >>> a single domain, you use sysctl if you want information about all
>> > > >>> domains.  The sysctl implementation calls the domctl implementation
>> > > >>> internally.
>> > > >> It is not a fair comparison, given the completely different nature of
>> > > >> the domctls in question.  XEN_DOMCTL_getdomaininfo is doing very 
>> > > >> little
>> > > >> more than reading specific bits of data out of the appropriate struct
>> > > >> domain and its struct vcpu's which can trivially be done by the cpu
>> > > >> handling the hypercall.
>> > > >>
>> > > >>> Is there a problem with doing the same thing here?  Or with starting
>> > > >>> with a domctl, and then creating a sysctl that iterates over all
>> > > >>> domains (and calls the domctl internally) if we measure the domctl
>> > > >>> to be too slow for many callers?
>> > > >>>
>> > > >>>  -George
>> > > >> My problem is not with the domctl per-se.
>> > > >>
>> > > >> My problem is that this is not a QoS design discussion;  this is an
>> > > >> email thread about a specific QoS implementation which is not 
>> > > >> answering
>> > > >> the concerns raised against it to the satisfaction of people raising 
>> > > >> the
>> > > >> concerns.
>> > > >>
>> > > >> The core argument here is that a statement of "OpenStack want to get a
>> > > >> piece of QoS data back from libvirt/xenapi when querying a specific
>> > > >> domain" is being used to justify implementing the hypercall in an
>> > > >> identical fashion.
>> > > >>
>> > > >> This is not a libxl design; this is a single user story forming part 
>> > > >> of
>> > > >> the requirement "I as a cloud service provider would like QoS
>> > > >> information for each VM to be available to my
>> > > >> $CHOSEN_ORCHESTRATION_SOFTWARE so I can {differentially charge
>> > > >> customers, balance my load more evenly, etc}".
>> > > >>
>> > > >> The only valid justification for implementing a brand new hypercall 
>> > > >> in a
>> > > >> certain way is "Because $THIS_CERTAIN_WAY is the $MOST_SENSIBLE way to
>> > > >> perform the actions I need to perform", for appropriate
>> > > >> substitutions.  Not "because it is the same way I want to hand this
>> > > >> information off at the higher level".
>> > > >>
>> > > >> As part of this design discussion, I have raised a concern saying "I
>> > > >> believe the usecase of having a stats gathering daemon in dom0 has not
>> > > >> been appropriately considered", qualified with "If you were to use the
>> > > >> domctl as currently designed from a stats gathering daemon, you will
>> > > >> cripple Xen with the overhead".
>> > > >>
>> > > >> Going back to the original use, xenapi has a stats daemon for these
>> > > >> things.  It has an rpc interface so a query given a specific domain 
>> > > >> can
>> > > >> return some or all data for that domain, but it very definitely does 
>> > > >> not
>> > > >> translate each request into a hypercall for the requested information.
>> > > >> I have no real experience with libvirt, so can't comment on stats
>> > > >> gathering in that context.
>> > > >>
>> > > >> I have proposed an alternative Xen->libxc interface designed with a
>> > > >> stats daemon in mind, explaining why I believe it has lower overheads 
>> > > >> to
>> > > >> Xen and why it is more in line with what I expect ${VENDOR}Stack to
>> > > >> actually want.
>> > > >>
>> > > >> I am now waiting for a reasoned rebuttal which has more content than
>> > > >> "because there are a set of patches which already implement it in this
>> > > >> way".
>> > > > No, I don't have a patch for the domctl implementation.
>> > > >
>> > > > In the past half year, all previous v1-v10 patches were implemented in
>> > > > the sysctl way; however, based on that, people raised a lot of comments
>> > > > (large memory size, runtime non-0-order memory allocation, page sharing
>> > > > with user space, CPU online/offline special logic, etc.), and these make
>> > > > the platform QoS implementation more and more complex in Xen. That's why
>> > > > I am proposing the domctl method, which can make things easier.
>> > > >
>> > > > I don't have anything more to argue or rebut, and if you prefer sysctl,
>> > > > I can continue to work on a v11, v12 or more, presenting the big
>> > > > 2-dimensional array to the end user and letting them extract the data
>> > > > they actually need, still including the extra CPU online/offline logic
>> > > > to handle the QoS resource runtime allocation.
>> > > >
>> > > > Thanks,
>> > > > Dongxiao
>> > >
>> > > I am sorry - I was not trying to make an argument for one of the
>> > > proposed mechanisms over the other.  The point I was trying to make
>> > > (which on further consideration isn't as clear as I was hoping) is that
>> > > you cannot possibly design the hypercall interface before knowing the
>> > > library usecases, and there is a clear lack of understanding (or at
>> > > least communication) in this regard.
>> > >
>> > >
>> > > So, starting from the top. OpenStack want QoS information, and want to
>> > > get it from libvirt/XenAPI.  I think libvirt/XenAPI is the correct level
>> > > to do this at, and think exactly the same would apply to CloudStack as
>> > > well.  The relevant part of this is the question "how does
>> > > libvirt/XenAPI collect stats".
>> > >
>> > > XenAPI collects stats with the RRD Daemon, running in dom0.  It has an
>> > > internal database of statistics, and hands data from this database out
>> > > upon RPC requests.  It also has threads whose purpose is to periodically
>> > > refresh the data in the database.  This provides a disconnect between
>> > > ${FOO}Stack requesting stats for a domain and the logic to obtain stats
>> > > for that domain.
>> > >
>> > > I am however unfamiliar with libvirt in this regard.  Could you please
>> > > explain how the libvirt daemon deals with stats?
>> >
>> > I am not a libvirt expert either.
>> > Consulting others who work on libvirt, I understand that libvirt doesn't
>> > maintain the domain status itself, but just exposes APIs for the upper
>> > cloud/OpenStack layer to query, and these APIs accept the domain id as an
>> > input parameter.
>>
>> Hi Andrew,
>>
>> Do you have any further thoughts on this libvirt usage?
>
> Ping...

So AndyC and I had a chat about this, and I think we came up with
something that would be do-able.  (This is from memory, so please
correct me if I missed anything, Andy.)

So the situation, as I understand it, is:

Stats are exposed via MSRs on each CPU.  Collecting the stats from
the CPUs is potentially fairly expensive, requiring a number of IPIs;
and reading the MSRs themselves may be expensive as well.
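
For reference, a rough sketch of what one per-socket read might look
like (hedged: the MSR numbers are the documented CQM ones, but the
carrier struct and the IPI plumbing around it are purely illustrative):

    /* Illustrative only: read L3 occupancy for one RMID on the local
     * socket.  Has to run on a CPU in that socket, hence one IPI per
     * socket when collecting. */
    #define MSR_IA32_QM_EVTSEL   0x0c8d
    #define MSR_IA32_QM_CTR      0x0c8e
    #define QM_EVT_L3_OCCUPANCY  0x01

    struct qm_request {              /* hypothetical carrier struct */
        uint32_t rmid;
        uint64_t data;
        bool valid;
    };

    static void read_qm_ctr(void *info)
    {
        struct qm_request *req = info;
        uint64_t val;

        /* Select <RMID, event>, then read the counter. */
        wrmsrl(MSR_IA32_QM_EVTSEL,
               ((uint64_t)req->rmid << 32) | QM_EVT_L3_OCCUPANCY);
        rdmsrl(MSR_IA32_QM_CTR, val);

        /* Bits 63/62 flag error/unavailable; data is in the low bits. */
        req->valid = !(val & (3ull << 62));
        req->data  = val & ((1ull << 62) - 1);
    }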

However, we expect that many callers (including perhaps libvirt, or
xl/libxl) will want to view the information on a per-domain basis, and
may want to collect it on, say, a 1-second granularity.  On a large
installation, iterating over each domain and collecting the stats for
each one separately may mean spending a non-negligible amount of time
just doing IPIs and reading MSRs, introducing an unacceptable level of
overhead.

So it seems like it would be better to collect information for all
domains at one time, amortizing the cost in one set of IPIs, and then
answering queries about each domain from that bit of "stored"
information.

The initial idea that comes to mind is having a daemon in dom0 collect
the metrics on a specified granularity (say, 1s) and then answer
per-domain queries.  However, we don't actually have the
infrastructure and standard architecture in place in libxl for
starting, managing, and talking to such a daemon; the entire thing
would have to be designed from scratch.

But in reality, all we need the daemon for is a place to store the
information to query.  The idea we came up with was to allocate memory
*inside the hypervisor* to store the information.  The idea is that
we'd have a sysctl to prompt Xen to *collect* the data into some
memory buffers inside of Xen, and then a domctl that would allow you to
query the data on a per-domain basis.
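
To make that a bit more concrete, the split might look something like
this (a sketch only -- the names and layout are invented, not a
proposed ABI):

    /* XEN_SYSCTL_qos_collect (hypothetical): ask Xen to refresh its
     * internal QoS buffers for all domains -- one set of IPIs, with
     * the results stored inside Xen. */
    struct xen_sysctl_qos_collect {
        uint32_t flags;             /* IN: which event types to refresh */
    };

    /* XEN_DOMCTL_qos_query (hypothetical): read back the stored value
     * for one domain, without touching any MSRs. */
    struct xen_domctl_qos_query {
        uint32_t event;             /* IN: e.g. L3 cache occupancy */
        uint64_aligned_t data;      /* OUT: value from the last collect */
        uint64_aligned_t timestamp; /* OUT: when that collect happened */
    };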

That should be a good balance -- it's not quite as good as having a
separate daemon, but it's a pretty good compromise.

Thoughts?

There are a couple of options regarding collecting the data.  One is
to simply require the caller to do a "poll" sysctl every time they
want to refresh the data.  Another possibility would be to have a
sysctl "freshness" knob: you could say, "Please make sure the data is
no more than 1000ms old"; Xen could then automatically do a refresh
when necessary.
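
If we went the "freshness" route, the same collect sysctl could simply
carry a staleness bound (again, only a sketch of the idea):

    /* Variant with a freshness bound: only do the IPIs if the stored
     * data is older than max_age_ms, otherwise return immediately. */
    struct xen_sysctl_qos_collect {
        uint32_t flags;
        uint32_t max_age_ms;    /* 0 == always refresh (plain "poll") */
    };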

The advantage of the "poll" method is that you could get a consistent
snapshot across all domains; but you'd have to add in code to do the
refresh.  (An xl command querying an individual domain would
undoubtedly end up calling the poll on each execution, for instance.)

An advantage of the "freshness" knob, on the other hand, is that you
automatically get coalescing without having to do anything special
with the interface.
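
Either way, the caller-side flow stays much the same; under the "poll"
scheme an xl-level query might look roughly like this (libxl_qos_* does
not exist today -- the function names below are invented for
illustration):

    #include <stdio.h>
    #include <inttypes.h>
    #include <libxl.h>

    /* Sketch: dump one domain's L3 occupancy via poll-then-query. */
    static int dump_domain_qos(libxl_ctx *ctx, uint32_t domid)
    {
        uint64_t occupancy;

        /* One sysctl: refresh the stored data for *all* domains. */
        if (libxl_qos_collect(ctx))
            return 1;

        /* One domctl: cheap read of the stored value for this domain. */
        if (libxl_qos_query(ctx, domid, &occupancy))
            return 1;

        printf("Domain %u: L3 occupancy %" PRIu64 " bytes\n",
               domid, occupancy);
        return 0;
    }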

Does that make sense?  Is that something you might be willing to
implement, Dongxiao?

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

