Re: [Xen-devel] rdtscP and xen (and maybe the app-tsc answer I've been looking for)
On 09/22/09 00:39, Jan Beulich wrote:
>> 1. It is running under Xen (or not, if you expect this to be
>> implemented on multiple hypervisors)
>> 2. rdtscp is available
>> 3. the ABI is actually being implemented, ie:
>> 1. the tsc_aux value actually has the correct meaning
>> 2. it has a working mechanism for getting the tsc scaling
> This sub-2 can certainly be assumed to imply the respective sub-1.
Yeah, they're the minimum requirements of a "working ABI". But I think
we should also have something workable if only rdtsc is available.
>> The obvious thing to do is to pack a version number and pcpu number into
>> TSC_AUX. Usermode would maintain an array of pv_clock parameters, one
>> for each pcpu. If the version number matches, then it uses the
>> parameters it has; if not it fetches new parameters and repeats the
>> rdtscp. There's no need to worry about either thread or vcpu context
>> switches because you get the (tsc,params) tuple atomically, which is the
>> tricky bit without rdtscp.
>> (The version number would be truncated wrt the normal pvclock version
>> number, but it just needs to be large enough to avoid aliasing from
>> wrapping; I'm assuming something like 24 bits version and 8 bits cpu
> I continue to think that it would be fundamentally wrong to use pCPU
> numbers here: Not only do you share information with the app that it
> shouldn't really care about, but you also push scalability issues to it
> that the kernel is supposed to abstract out for apps.
As far as usermode is concerned, they're just tags to distinguish
distinct sets of parameters. We could remap them from actual pcpu
numbers to some other key space, but I don't see much point in doing
so. To usermode the numbers are opaque and carry no inherent meaning.
(Of course we could add some inherent structure to them, like adding
node numbers for NUMA systems, so that usermode has at least some idea
of how it is being mapped to hardware, at least at that instant. But
that's a whole other discussion.)
> In particular,
> - the interface must not imply an upper bound for the number of
> pCPU-s (i.e. a fixed 8-/24-bit separation won't work, but reducing the
> version to significantly below 24 bits may cause issues),
Yeah. I was considering a mechanism whereby the version/cpu split was a
runtime option fetched from Xen. Running out of space for CPU numbers
would be a disaster, but a smaller version space can be dealt with by
making sure that there's a new pvclock param update before the version
wraps (which you can achieve by requiring an update every X units of
wallclock time, where X is less than the expected minimum time of a wrap).
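
To make the runtime split concrete, here is a rough sketch of how TSC_AUX could be packed and unpacked when the width of the cpu field is a value fetched from Xen. Everything named here (struct aux_layout, cpu_bits, the aux_* helpers, and putting the tag in the low bits) is made up for illustration; nothing like it exists in Xen today.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: the low cpu_bits of TSC_AUX hold the parameter tag
 * (e.g. the pcpu number); the remaining high bits hold the truncated
 * pvclock version. cpu_bits would be fetched from Xen at startup. */
struct aux_layout {
    unsigned cpu_bits;   /* 1..31 */
};

static inline uint32_t aux_tag_mask(const struct aux_layout *l)
{
    assert(l->cpu_bits >= 1 && l->cpu_bits <= 31);
    return (UINT32_C(1) << l->cpu_bits) - 1;
}

/* Hypervisor side: combine version and tag. The version is truncated by
 * the shift; a tag that does not fit is a hard error, not a truncation. */
static inline uint32_t aux_pack(const struct aux_layout *l,
                                uint32_t version, uint32_t tag)
{
    assert(tag <= aux_tag_mask(l));
    return (version << l->cpu_bits) | tag;
}

/* Usermode side: split the value returned by rdtscp. */
static inline uint32_t aux_tag(const struct aux_layout *l, uint32_t aux)
{
    return aux & aux_tag_mask(l);
}

static inline uint32_t aux_version(const struct aux_layout *l, uint32_t aux)
{
    return aux >> l->cpu_bits;
}

/* Number of distinct truncated versions before the counter wraps. */
static inline uint64_t aux_version_space(const struct aux_layout *l)
{
    return UINT64_C(1) << (32 - l->cpu_bits);
}
```

With cpu_bits = 8 this gives the 24/8 split mentioned earlier; with cpu_bits = 12 the version space shrinks to 2^20, which is the case where the forced periodic update matters.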
> - the app must not imply the number of pCPU-s is bounded in any way
> (since, due to migration or CPU hotplug, it may grow).
Usermode might have to use a more flexible structure than a simple array
to handle arbitrary parameter keys (aka pcpu numbers).
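
As a sketch of what that could look like, the following keeps the cached parameters in a small growable hash table keyed by the tag, and wraps the "read, check version, refetch, retry" loop around it. All names are hypothetical: the rdtscp_fn and fetch_fn callbacks stand in for the real rdtscp instruction and for whatever mechanism ends up fetching fresh parameters, the tag is assumed to sit in the low cpu_bits of TSC_AUX, and the fake_* functions exist only so the sketch runs without Xen.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct clock_params {
    uint32_t tag;       /* opaque key taken from TSC_AUX */
    uint32_t version;   /* truncated version these parameters belong to */
    bool used;          /* slot is occupied */
    bool valid;         /* parameters have been fetched at least once */
    /* the pvclock scaling fields would live here */
};

struct param_table {
    struct clock_params *slots;
    size_t cap;         /* zero or a power of two */
    size_t count;
};

/* Linear probing: return the slot holding tag, or the empty slot for it. */
static struct clock_params *table_probe(struct clock_params *slots,
                                        size_t cap, uint32_t tag)
{
    size_t i = (size_t)(tag * UINT32_C(2654435761)) & (cap - 1);
    while (slots[i].used && slots[i].tag != tag)
        i = (i + 1) & (cap - 1);
    return &slots[i];
}

static int table_grow(struct param_table *t)
{
    size_t ncap = t->cap ? t->cap * 2 : 8;
    struct clock_params *n = calloc(ncap, sizeof *n);
    if (!n)
        return -1;
    for (size_t i = 0; i < t->cap; i++)
        if (t->slots[i].used)
            *table_probe(n, ncap, t->slots[i].tag) = t->slots[i];
    free(t->slots);
    t->slots = n;
    t->cap = ncap;
    return 0;
}

/* Find or create the entry for tag; NULL on allocation failure.
 * The table is kept at most half full, so probing always terminates. */
static struct clock_params *table_get(struct param_table *t, uint32_t tag)
{
    if ((t->count + 1) * 2 > t->cap && table_grow(t) != 0)
        return NULL;
    struct clock_params *p = table_probe(t->slots, t->cap, tag);
    if (!p->used) {
        p->used = true;
        p->valid = false;
        p->tag = tag;
        t->count++;
    }
    return p;
}

typedef uint64_t (*rdtscp_fn)(uint32_t *aux);                   /* stand-in for rdtscp */
typedef void (*fetch_fn)(uint32_t tag, struct clock_params *p); /* must set p->version */

/* Return a (tsc, params) pair that belong together. The returned pointer
 * is only good until the next call, since the table may be reallocated. */
static const struct clock_params *
clock_sample(struct param_table *t, unsigned cpu_bits,
             rdtscp_fn rd, fetch_fn fetch, uint64_t *tsc)
{
    assert(cpu_bits >= 1 && cpu_bits <= 31);
    for (;;) {
        uint32_t aux;
        *tsc = rd(&aux);
        uint32_t tag = aux & ((UINT32_C(1) << cpu_bits) - 1);
        struct clock_params *p = table_get(t, tag);
        if (!p)
            return NULL;
        if (p->valid && p->version == (aux >> cpu_bits))
            return p;
        fetch(tag, p);      /* stale or never seen: refresh, then re-read */
        p->valid = true;
    }
}

/* Fakes for trying the sketch out; they assume an 8-bit tag field. */
static uint32_t fake_aux;
static uint64_t fake_tsc;
static unsigned fake_fetches;

static uint64_t fake_rdtscp(uint32_t *aux) { *aux = fake_aux; return fake_tsc++; }

static void fake_fetch(uint32_t tag, struct clock_params *p)
{
    (void)tag;
    p->version = fake_aux >> 8;
    fake_fetches++;
}
```

Usermode never needs to know how many tags exist or what they mean: a previously unseen tag simply costs one extra fetch. A real implementation would also want per-thread tables or locking, which this leaves out.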
> While both can be addressed, this really isn't something an app should
> (have to) care about.
I agree. All this machinery should be wrapped up in the form of
a vsyscall. That would simplify many aspects of this discussion.