
RE: [Xen-devel] [PATCH] replace rdtsc emulation-vs-native xen boot option with per-domain (hypervisor part)



> But aside from that, all I'm asking for is a way for a domain to
> explicitly request that its tsc not be synthesized (or failing that,
> something that looks exactly like an unsynthesized tsc), so that
> usermode pvclock can work without needing edits to the config file.

Maybe I'm misunderstanding, but I think what you are
asking for is that the kernel override a setting
specified by the administrator.  When that
setting (regardless of what you choose as a default)
is tsc_native==0, it means the administrator
has decided that correctness is more important
than performance: now or sometime in the
future an app will use rdtsc and expect it to
behave as it does either on hardware with a
reliable TSC or on VMware.

So in that circumstance every rdtsc is slow, and
if pvclock+vsyscall depends on rdtsc, then
pvclock+vsyscall is slow too.

I'm certainly open to a solution that solves both
problems, I just don't see one.

Perhaps a better choice would be for emulated tsc
to return Xen system time in both kernel and user
mode (which is what it does for HVM domains) and,
when a domain has tsc_native==0, for Xen to set
its pvclock parameters so that no scaling occurs?
This would result in the guest reporting that it
has a 1GHz clock, but may be more consistent.

> -----Original Message-----
> From: Jeremy Fitzhardinge [mailto:jeremy@xxxxxxxx]
> Sent: Monday, October 05, 2009 6:21 PM
> To: Dan Magenheimer
> Cc: Xen-Devel (E-mail); Keir Fraser
> Subject: Re: [Xen-devel] [PATCH] replace rdtsc emulation-vs-native xen
> boot option with per-domain (hypervisor part)
> 
> 
> On 10/05/09 16:42, Dan Magenheimer wrote:
> >> I've run into a few problems from this patch:
> >>
> >>    1. I'm seeing occasional messages on the console: "hrtimer:
> >>       interrupt too slow, forcing clock min delta to 9001953 ns",
> >>       which indicates that the kernel is noticing that timer
> >>       operations are taking too long.
> >>     
> > Hmmm... that seems unlikely since it is highly probable that on
> > ANY machine, emulated TSC is faster than other highres timers,
> > for example, HPET. Perhaps it is a side effect of your assumptions
> > described in (2) below?
> >   
> 
> The kernel is measuring all time using pvclock, so it is using rdtsc
> for that too.  It's simply noticing that rdtsc-to-rdtsc time is taking
> longer than expected.  (This is a domU, so it has no access to any
> other form of time.)
> 
> >>    2. A domain can't turn its own tsc emulation on and off.  I'm
> >>       working on vsyscall support for pvclock (done, aside from this
> >>       issue), so I need native tsc in usermode (or at least, one with
> >>       the same parameters in kernel and userspace).  I was getting very
> >>       confused because I didn't expect emulation to *only* apply to
> >>       usermode; I was expecting it to be done uniformly to both user
> >>       and kernel tscs, with appropriate adjustments to the
> >>       vcpu_time_info values.
> >>     
> > Sorry, I should probably have said this explicitly in the patch
> > prologue, but there is no (easy/fast) way to turn on emulation
> > for userland and turn off emulation for the kernel, so rather
> > than requiring a change to the pvclock ABI, rdtsc emulation
> > returns a raw TSC value when the rdtsc was executed in kernel
> > mode and Xen system time (nsec) when the rdtsc was executed
> > in userland.
> >   
> 
> You misunderstand me.  I was expecting that it would treat the tsc the
> same in user and kernel mode, and return a set of appropriate pvclock
> parameters.  This would allow user and kernel tscs to be consistent.
> That wouldn't require any changes to the pvclock ABI, and it would
> retain the TSC's "global timestamp" property, even across the
> kernel/usermode boundary.
> 
> Having them different is very awkward for tools which do things like
> measure kernel->user latency by taking tsc timestamps, or, as I'm
> trying to do, use the pvclock parameters in userspace.  You've
> effectively added an entirely new "usermode tsc" vs "kernel mode tsc"
> concept to the architecture.
> 
> But aside from that, all I'm asking for is a way for a domain to
> explicitly request that its tsc not be synthesized (or failing that,
> something that looks exactly like an unsynthesized tsc), so that
> usermode pvclock can work without needing edits to the config file.
> 
> The current situation is very difficult because the kernel can't even
> tell if usermode is getting the same tsc properties that it is.
> 
> > To ensure both correctness and maximum performance across
> > a wide range of conditions, WITHOUT destroying backward
> > compatibility for the pvclock ABI, a decision tree similar
> > to the one I just posted for apps could be employed.
> >   
> Well, having a variance between usermode and kernel tscs does break
> backwards compatibility and is very surprising.
> 
> > http://xen.markmail.org/message/uj4twbcsdw57z5zp
> >   
> 
> That looks very complicated.
> 
> > But this might rely on adding the same new Xen features
> > described in the parent to that post.
> >
> > OTOH, if pvclock is executed sometimes in userland and
> > sometimes in kernel (e.g. depending on the setting of
> > a sysfs variable), it seems like some kind of decision
> > tree is required anyway.
> >   
> 
> It is run in usermode as part of the normal vsyscall mechanism; the
> kernel also uses the same information and tsc to compute time for
> itself.  At the moment the only test it makes is to check whether Xen
> supports the new hypercall I added to support this.
> 
>     J
>

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

