
Re: [Xen-devel] Xen domU Timekeeping (a.k.a TSC/HPET issues)


  • To: xen-devel@xxxxxxxxxxxxxxxxxxx
  • From: "Andres Lagar-Cavilla" <andres@xxxxxxxxxxxxxxxx>
  • Date: Fri, 17 Feb 2012 08:28:17 -0800
  • Cc: Ian Campbell <Ian.Campbell@xxxxxxxxxx>, Qrux <qrux.qed@xxxxxxxxx>
  • Delivery-date: Fri, 17 Feb 2012 16:28:34 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xensource.com>

> Date: Fri, 17 Feb 2012 12:06:05 +0000
> From: Ian Campbell <Ian.Campbell@xxxxxxxxxx>
> To: Qrux <qrux.qed@xxxxxxxxx>
> Cc: "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>
> Subject: Re: [Xen-devel] Xen domU Timekeeping (a.k.a TSC/HPET issues)
> Message-ID: <1329480365.3131.50.camel@xxxxxxxxxxxxxxxxxxxxxx>
> Content-Type: text/plain; charset="UTF-8"
>
> I'm afraid I don't know the answer to most of your questions (hence I'm
> afraid I've trimmed the quotes rather aggressively) but here's some of
> what I do know.

I'm gonna add another data point.

We're seeing the Windows 7 QueryPerformanceCounter (QPC) get mightily
confused. People have reported this in Amazon EC2 as well:
https://forums.aws.amazon.com/thread.jspa?threadID=41426

We've tracked it down to the HPET. Xen schedules an interrupt delivery for
an HPET tick, but the vcpu is asleep. It could be an admin pause, a sleep on
a wait queue, a pause while qemu does its thing, a pause while a mem event
is processed...

When the vcpu wakes up it receives the "late" HPET tick. We believe
Windows 7 QPC also reads the TSC at that point, and the TSC kept on ticking
while the vcpu was paused. Windows does not know what to do about the
discrepancy and reports a large time leap, usually consistent with a full
round trip of the 32-bit HPET counter at the Xen-emulated 1/16 GHz
(62.5 MHz) frequency.
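
To put a number on that round trip, here is a quick sketch of the
arithmetic. The wrap period follows directly from the counter width and
frequency given above. The second function is only an assumption on my
part about how such a leap could arise: if a consumer computes elapsed
ticks modulo 2**32 and its reference point ends up slightly ahead of the
value it reads, the small negative delta shows up as almost a full wrap.
Nothing here is confirmed to be what Windows actually does, and the
function names are made up for illustration.

```python
# Rough arithmetic for a 32-bit counter at the "1/16GHz" rate quoted above.
HPET_HZ = 1_000_000_000 // 16   # 62.5 MHz
COUNTER_BITS = 32


def wrap_period_seconds(freq_hz=HPET_HZ, bits=COUNTER_BITS):
    """Seconds for a free-running counter of `bits` width to wrap once."""
    return (1 << bits) / freq_hz


def elapsed_ticks(prev, now, bits=COUNTER_BITS):
    """Elapsed ticks computed modulo 2**bits, like unsigned hardware math.

    Hypothetical consumer logic: if `prev` is slightly ahead of `now`,
    the result is close to 2**bits instead of a small negative number.
    """
    return (now - prev) & ((1 << bits) - 1)


if __name__ == "__main__":
    print(f"wrap period: {wrap_period_seconds():.2f} s")
    # Assumed scenario: the reference point is 100 ticks ahead of the read.
    ticks = elapsed_ticks(prev=1_000_100, now=1_000_000)
    print(f"apparent elapsed time: {ticks / HPET_HZ:.2f} s")
```

Both figures come out at roughly 68.7 seconds, which is the size of leap
I mean by a "full round trip".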

MSDN forums blame "bad hardware" for this. Funny.

So, we could solve our particular problem if the TSC did not tick
during vcpu sleep. I have an inkling that would help with the original
post as well. But I don't think any of the advertised timer or TSC modes
do that.
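
To make the idea concrete, here is a toy model of a guest TSC that does
not advance while the vcpu is paused. It is a sketch of the behaviour I am
wishing for, not of any existing Xen timer or TSC mode, and the class and
method names are hypothetical. The guest value is the host TSC minus the
cycles accumulated across pauses.

```python
class PausableTsc:
    """Toy guest TSC: host TSC minus cycles spent paused (hypothetical)."""

    def __init__(self):
        self.offset = 0         # total host cycles hidden from the guest
        self.paused_at = None   # host TSC at pause time, None when running

    def pause(self, host_tsc):
        """Record the host TSC at which the vcpu stopped running."""
        if self.paused_at is None:
            self.paused_at = host_tsc

    def resume(self, host_tsc):
        """Fold the paused interval into the offset and start running."""
        if self.paused_at is not None:
            self.offset += host_tsc - self.paused_at
            self.paused_at = None

    def read(self, host_tsc):
        """Guest-visible TSC: frozen while paused, offset while running."""
        if self.paused_at is not None:
            return self.paused_at - self.offset
        return host_tsc - self.offset


if __name__ == "__main__":
    tsc = PausableTsc()
    tsc.pause(100)      # vcpu goes to sleep at host cycle 100
    tsc.resume(500)     # and wakes at host cycle 500
    print(tsc.read(600))
```

With this kind of offsetting, a late HPET tick delivered at wake-up would
be paired with a TSC value that also skipped the paused interval.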

Thanks,
Andres



I'm not sure this will help with the original post, but there's gotta be
somebody who
>
>> But, practically, is there a safe CPU configuration?
>
> I think that part of the problem here is that it is very hard to
> determine this at the hardware level. There are at least 3 (if not more)
> CPUID feature bits which say "no really, the TSC is good and safe to use
> this time, you can rely on that" because they keep inventing new ways to
> get it wrong.
>
> [...]
>>
>> Since September, I can't find any further information about this
>> issue. What is the state of this issue?  The inconsistency I see right
>> now is this: in the July 2010 TSC discussion, a "Stefano Stabellini"
>> posted this:
>>
>> ====
>> > /me wonders if timer_mode=1 is the default for xl?
>> > Or only for xm?
>>
>> no, it is not.
>> Xl defaults to 0 [zero], I am going to change it right now.
>> ====
>>
>> So, it seems like (at least as of July 2010), xl is defaulting to
>> "timer_mode=1".  That is, assuming that the then-current timer_mode is
>> the same as present-day tsc_mode.
>
> No, I believe they are different things.
>
> tsc_mode is to do with the TSC, emulation vs direct exposure etc. Per
> xen/include/asm-x86/time.h and (in recent xen-unstable) xl.cfg(5)
>
> timer_mode is to do with the way that timer interrupts are injected
> into the guest. This is described in xen/include/public/hvm/params.h.
> This isn't documented in xl.cfg(5) because I couldn't make head nor tail
> of the meaning of that header :-(
>
>>   In addition, I'm assuming he was changing it from 0 (zero) to 1
>> (one)--and not some other mode.  But,
>>
>>         xen-4.1.2/docs/misc/tscmode.txt
>
> Remember that he was referring to timer_mode not tsc_mode...
>
>> says:
>>
>>         "The default mode (tsc_mode==0) checks TSC-safeness of the underlying
>>         hardware on which the virtual machine is launched.  If it is
>>         TSC-safe, rdtsc will execute at hardware speed; if it is not, rdtsc
>>         will be emulated."
>>
>> Which implies the default is always 0 (zero).  Which is it?
>
> It seems that xl, in xen-unstable, defaults to:
>       timer_mode = 1
>       tsc_mode = 0
> as does 4.1 as far as I can tell via code inspection.
>
>> More importantly, is the solution to force tsc_mode=2?
>
> IMHO this is safe in most situations unless you are running some sort of
> workload (e.g. a well known database) which has stringent requirements
> regarding the TSC for transactional consistency (hence the conservative
> default).
>
>>   If so, under what BIOS/xen-boot-params/dom0-boot-params conditions?
>> And--please excuse my exasperation--but WTH does this have to do with
>> ext3 versus ext4?  Is ext4 exquisitely sensitive to TSC/HPET
>> "jumpiness" (if that's even what's happening)?
>
> Sorry, I have no idea how/why the filesystem would be related to the
> TSC.
>
> It is possible you are actually seeing two bugs I suppose -- there have
> been issues relating to ext4 and barriers in some kernel versions (I'm
> afraid I don't recall the details, the list archives ought to contain
> something).
>
> Ian.
>
>



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

