
Re: [Xen-devel] Xen domU Timekeeping (a.k.a TSC/HPET issues)

  • To: xen-devel@xxxxxxxxxxxxxxxxxxx
  • From: "Andres Lagar-Cavilla" <andres@xxxxxxxxxxxxxxxx>
  • Date: Fri, 17 Feb 2012 08:28:17 -0800
  • Cc: Ian Campbell <Ian.Campbell@xxxxxxxxxx>, Qrux <qrux.qed@xxxxxxxxx>
  • Delivery-date: Fri, 17 Feb 2012 16:28:34 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xensource.com>

> Date: Fri, 17 Feb 2012 12:06:05 +0000
> From: Ian Campbell <Ian.Campbell@xxxxxxxxxx>
> To: Qrux <qrux.qed@xxxxxxxxx>
> Cc: "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>
> Subject: Re: [Xen-devel] Xen domU Timekeeping (a.k.a TSC/HPET issues)
> Message-ID: <1329480365.3131.50.camel@xxxxxxxxxxxxxxxxxxxxxx>
> Content-Type: text/plain; charset="UTF-8"
> I'm afraid I don't know the answer to most of your questions (hence I'm
> afraid I've trimmed the quotes rather aggressively) but here's some of
> what I do know.

I'm gonna add another data point.

We're seeing the Windows 7 QueryPerformanceCounter (QPC) get mightily
confused. People have reported this on Amazon EC2 as well.

We've tracked it down to the HPET. Xen schedules an interrupt delivery for
an HPET tick, but the vcpu is asleep. That could be an admin pause, a sleep
on a wait queue, a pause while qemu does its thing, a pause while a mem
event is processed...

When the vcpu wakes up, it receives the "late" HPET tick. We believe
Windows 7 QPC also reads the TSC at that point. The TSC kept on ticking
while the vcpu was paused. Windows does not know what to do about the
discrepancy and reports a large time leap, usually consistent with a full
round trip of the 32-bit HPET counter at the Xen-emulated 1/16 GHz
(62.5 MHz).
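
For anyone who wants to check a reported leap against that figure, here is
the arithmetic as a small Python sketch. The only inputs are the two numbers
above (32-bit counter, 1/16 GHz); the function and constant names are mine
and purely illustrative, not anything from Xen or Windows.

```python
# Wrap period of a free-running 32-bit counter at 1/16 GHz (62.5 MHz).
# Names here are illustrative only; they do not come from Xen or Windows.
HPET_FREQ_HZ = 1_000_000_000 // 16   # 62,500,000 Hz
COUNTER_BITS = 32


def hpet_wrap_seconds(freq_hz=HPET_FREQ_HZ, bits=COUNTER_BITS):
    """Seconds for a counter of the given width to wrap around once."""
    return (1 << bits) / freq_hz


def leap_in_wraps(leap_seconds, freq_hz=HPET_FREQ_HZ, bits=COUNTER_BITS):
    """Express an observed time leap as a number of full counter wraps."""
    return leap_seconds / hpet_wrap_seconds(freq_hz, bits)


print(f"{hpet_wrap_seconds():.3f} s per wrap")
```

So a leap of roughly 68.7 seconds, or a multiple of it, would be consistent
with one or more full round trips of the counter.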

MSDN forums blame "bad hardware" for this. Funny.

So, we could solve our particular problem if the TSC were not to tick
during vcpu sleep. And I get an inkling that would help with the original
post as well. But I don't think any of the advertised timer or TSC modes do
that.


I'm not sure this will help with the original post, but there's gotta be
somebody who
>> But, practically, is there a safe CPU configuration?
> I think that part of the problem here is that it is very hard to
> determine this at the hardware level. There are at least 3 (if not more)
> CPUID feature bits which say "no really, the TSC is good and safe to use
> this time, you can rely on that" because they keep inventing new ways to
> get it wrong.
> [...]
>> Since September, I can't find any further information about this
>> issue. What is the state of this issue?  The inconsistency I see right
>> now is this: in the July 2010 TSC discussion, a "Stefano Stabellini"
>> posted this:
>> ====
>> > /me wonders if timer_mode=1 is the default for xl?
>> > Or only for xm?
>> no, it is not.
>> Xl defaults to 0 [zero], I am going to change it right now.
>> ====
>> So, it seems like (at least as of July 2010), xl is defaulting to
>> "timer_mode=1".  That is, assuming that the then-current timer_mode is
>> the same as present-day tsc_mode.
> No, I believe they are different things.
> tsc_mode is to do with the TSC, emulation vs direct exposure etc. Per
> xen/include/asm-x86/time.h and (in recent xen-unstable) xl.cfg(5)
> timer_mode is to do with the way that timer interrupts are injected
> into the guest. This is described in xen/include/public/hvm/params.h.
> This isn't documented in xl.cfg(5) because I couldn't make head nor tail
> of the meaning of that header :-(
>>   In addition, I'm assuming he was changing it from 0 (zero) to 1
>> (one)--and not some other mode.  But,
>>         xen-4.1.2/docs/misc/tscmode.txt
> Remember that he was referring to timer_mode not tsc_mode...
>> says:
>>         "The default mode (tsc_mode==0) checks TSC-safeness of the
>>         underlying hardware on which the virtual machine is launched.
>>         If it is TSC-safe, rdtsc will execute at hardware speed; if it
>>         is not, rdtsc will be emulated."
>> Which implies the default is always 0 (zero).  Which is it?
> It seems that xl, in xen-unstable, defaults to:
>       timer_mode = 1
>       tsc_mode = 0
> as does 4.1 as far as I can tell via code inspection.
>> More importantly, is the solution to force tsc_mode=2?
> IMHO this is safe in most situations unless you are running some sort of
> workload (e.g. a well known database) which has stringent requirements
> regarding the TSC for transactional consistency (hence the conservative
> default).
>>   If so, under what BIOS/xen-boot-params/dom0-boot-params conditions?
>> And--please excuse my exasperation--but WTH does this have to do with
>> ext3 versus ext4?  Is ext4 exquisitely sensitive to TSC/HPET
>> "jumpiness" (if that's even what's happening)?
> Sorry, I have no idea how/why the filesystem would be related to the
> TSC.
> It is possible you are actually seeing two bugs I suppose -- there have
> been issues relating to ext4 and barriers in some kernel versions (I'm
> afraid I don't recall the details, the list archives ought to contain
> something).
> Ian.
