
RE: [Xen-devel] [RFC] Physical hot-add cpus and TSC



> From: Keir Fraser [mailto:keir.fraser@xxxxxxxxxxxxx]
> Sent: Friday, May 28, 2010 1:04 AM
> To: Jiang, Yunhong; Dan Magenheimer; Xen-Devel (xen-devel@xxxxxxxxxxxxxxxxxxx); Ian Pratt
> Subject: Re: [Xen-devel] [RFC] Physical hot-add cpus and TSC
> 
> On 28/05/2010 07:29, "Jiang, Yunhong" <yunhong.jiang@xxxxxxxxx> wrote:
> 
> >> It is impossible to meet that level of TSC consistency when doing CPU
> >> physical-add, without emulating all guest TSCs. We may need to add that as
> >> an option, at least, to keep a small class of apps that care (like Oracle's
> >> DB, we assume) happy.
> >
> > So a option to make TSC_MODE_DEFAULT as d->arch.vtsc=0 ?.
> > When CPU_hotadd, we should at least warning if that option is not set, am I
> > right?
> 
> Xen-unstable:21469.

Well, although it's better than nothing, it seems pretty
lame to only put an advisory warning in Xen's log about a
condition that may affect many guest OSes and applications,
with hard-to-identify symptoms or failures that may show up
at some random point days, weeks, or months after the event
occurs.  Consider a cloud service provider, for example.

The advantage of turning hot-add-cpu off by default
is that, if it is explicitly turned on at Xen boot time,
TSC emulation can always be enabled for all guests at
guest boot and the condition never arises.
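
A minimal sketch of that policy, to make the proposal concrete.
This is not actual Xen code: the struct, its field, and both
function names are hypothetical and stand in for whatever boot
option and hooks would really be used.

```c
#include <stdbool.h>

/* Hypothetical host-wide setting, fixed once at Xen boot time. */
struct host_policy {
    bool cpu_hotadd_allowed;   /* assumed boot option, off by default */
};

/*
 * Decide at guest boot whether the guest's TSC must be emulated.
 * If a physical CPU may be hot-added later, no guest can rely on
 * the hardware TSC staying consistent, so emulation is forced.
 */
bool guest_needs_tsc_emulation(const struct host_policy *p,
                               bool guest_requested_emulation)
{
    return p->cpu_hotadd_allowed || guest_requested_emulation;
}

/* A hot-add request is honoured only if it was enabled at boot. */
bool cpu_hotadd_permitted(const struct host_policy *p)
{
    return p->cpu_hotadd_allowed;
}
```

With the option off, a hot-add request is refused and guests keep
whatever TSC mode they asked for; with it on, every guest boots
with an emulated TSC, so a later hot-add cannot surprise it.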

Are there any other questionable conditions that might
arise from hot-adding physical CPUs?  For example (my
favorite), are any order>0 allocations required?  Or
what if the hot-added CPU results in mixed generations
(e.g. a Nehalem, which lacks AES-NI, is added to an
all-Westmere system where the apps are using AES
instructions)?  Anything
else?
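
For the mixed-generation case, the kind of dynamic check in
question might look like the sketch below. It is illustrative
only: the function names are hypothetical, and it looks at a
single CPUID feature word (leaf 1, ECX, where bit 25 reports
AES-NI) rather than the full feature set a real check would
have to cover.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID leaf 1, ECX bit 25 reports AES-NI support. */
#define FEATURE_ECX_AESNI (1u << 25)

/*
 * Return the feature bits that guests may already be using
 * (exposed_ecx) but that the hot-added CPU does not provide.
 */
uint32_t hotadd_missing_features(uint32_t exposed_ecx,
                                 uint32_t new_cpu_ecx)
{
    return exposed_ecx & ~new_cpu_ecx;
}

/* The new CPU is safe to bring online only if nothing is missing. */
bool hotadd_features_compatible(uint32_t exposed_ecx,
                                uint32_t new_cpu_ecx)
{
    return hotadd_missing_features(exposed_ecx, new_cpu_ecx) == 0;
}
```

Adding a CPU without AES-NI to a system whose guests have been
shown AES-NI would fail this check; the reverse direction (a newer
CPU joining older ones) would pass, since nothing already exposed
is lost.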

In other words, maybe it would be nice to be able
to rule out other special dynamic checks for hot-added
CPUs that aren't needed for simultaneously-reset CPUs?
Requiring a boot option to allow hot-adding physical CPUs
might make a nasty future support problem a lot easier
to deal with.

> "Undetectable" by Dan's definition means undetectable by
> a multi-threaded app on a multi-vcpu guest. Any detected
> warp would therefore be a problem.

This is actually Linux's definition, a requirement
for selecting tsc as Linux's default clocksource,
and measured by the same algorithm in Xen and Linux.
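
The core of that warp test can be modelled as follows. This is a
simplified, single-threaded sketch, not the real code: the actual
check runs concurrently on several CPUs, each reading the TSC
under a shared lock and comparing against the last value any CPU
recorded. Here the reads are assumed to be already listed in the
order they happened, and the type and function names are
hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* One TSC read, listed in the order the reads actually happened. */
struct tsc_sample {
    unsigned int cpu;   /* CPU that did the read (informational only) */
    uint64_t tsc;       /* value it observed */
};

/*
 * Return the largest backwards step between consecutive reads,
 * or 0 if time never appeared to go backwards.  Any non-zero
 * result means a warp was detected.
 */
uint64_t max_tsc_warp(const struct tsc_sample *s, size_t n)
{
    uint64_t last = 0, max_warp = 0;

    for (size_t i = 0; i < n; i++) {
        /* A later read returning a smaller value is a warp. */
        if (i > 0 && s[i].tsc < last) {
            uint64_t warp = last - s[i].tsc;
            if (warp > max_warp)
                max_warp = warp;
        }
        last = s[i].tsc;
    }
    return max_warp;
}
```

A multi-threaded app comparing timestamps across vcpus is doing
essentially the same comparison, which is why any warp this test
can see is one an app could trip over.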

Linux is a bit more flexible than apps in that, if
Linux detects a problem, it can fall back from using
tsc as the clocksource to some other clocksource.
But it remains to be seen how well this will work
in a virtual environment, where there are a number
of conditions that a bare-metal OS can detect
that a virtualized guest OS (or an app running
on a physical or virtualized OS) cannot.

But to summarize, IMHO, correctness comes first,
performance second, and functionality that might
be needed on only a small fraction of systems
comes third.  I think enterprise customers dependent
on Xen would agree.
