Xen project Mailing List

Re: [Xen-devel] Xen domU Timekeeping (a.k.a TSC/HPET issues)

From: "Andres Lagar-Cavilla" <andres@xxxxxxxxxxxxxxxx>

Date: Fri, 17 Feb 2012 08:28:17 -0800

Cc: Ian Campbell <Ian.Campbell@xxxxxxxxxx>, Qrux <qrux.qed@xxxxxxxxx>

Delivery-date: Fri, 17 Feb 2012 16:28:34 +0000

List-id: Xen developer discussion <xen-devel.lists.xensource.com>

> Date: Fri, 17 Feb 2012 12:06:05 +0000
> From: Ian Campbell <Ian.Campbell@xxxxxxxxxx>
> To: Qrux <qrux.qed@xxxxxxxxx>
> Cc: "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>
> Subject: Re: [Xen-devel] Xen domU Timekeeping (a.k.a TSC/HPET issues)
> Message-ID: <1329480365.3131.50.camel@xxxxxxxxxxxxxxxxxxxxxx>
> Content-Type: text/plain; charset="UTF-8"
>
> I'm afraid I don't know the answer to most of your questions (hence I'm
> afraid I've trimmed the quotes rather aggressively) but here's some of
> what I do know.

I'm gonna add another data point. We're seeing the Windows 7 Query
Performance Counter get mightily confused. People have reported this in
Amazon EC2 as well
https://forums.aws.amazon.com/thread.jspa?threadID=41426

We've tracked it down to the hpet. Xen schedules an interrupt delivery for
an hpet tick, but the vcpu is asleep. Could be an admin pause, a sleep on a
wait queue, paused while qemu does its thing, paused while a mem event is
processed... When the vcpu wakes up it receives the "late" hpet tick. We
believe Windows 7 QPC also reads the TSC at that point. The TSC kept on
ticking while the vcpu was paused. Windows does not know what to do about
the discrepancy and reports a large time leap, usually consistent with a
full round trip of the 32 bit hpet counter at the Xen-emulated 1/16GHz
frequency.

MSDN forums blame "bad hardware" for this. Funny.

So, we could solve our particular problem if the tsc were not to tick
during vcpu sleep. And I get an inkling that would help with this post as
well. But I don't think any of the advertised timer or tsc modes do that.

Thanks,
Andres

I'm not sure this will help with the original post, but there's gotta be
somebody who

>
>> But, practically, is there a safe CPU configuration?
>
> I think that part of the problem here is that it is very hard to
> determine this at the hardware level. There are at least 3 (if not more)
> CPUID feature bits which say "no really, the TSC is good and safe to use
> this time, you can rely on that" because they keep inventing new ways to
> get it wrong.
>
> [...]
>>
>> Since September, I can't find any further information about this
>> issue. What is the state of this issue? The inconsistency I see right
>> now is this: in the July 2010 TSC discussion, a "Stefano Stabellini"
>> posted this:
>>
>> ====
>> > /me wonders if timer_mode=1 is the default for xl?
>> > Or only for xm?
>>
>> no, it is not.
>> Xl defaults to 0 [zero], I am going to change it right now.
>> ====
>>
>> So, it seems like (at least as of July 2010), xl is defaulting to
>> "timer_mode=1". That is, assuming that the then-current timer_mode is
>> the same as present-day tsc_mode.
>
> No, I believe they are different things.
>
> tsc_mode is to do with the TSC, emulation vs direct exposure etc. Per
> xen/include/asm-x86/time.h and (in recent xen-unstable) xl.cfg(5)
>
> timer_mode is to do with the the way that timer interrupts are injected
> into the guest. This is described in xen/include/public/hvm/params.h.
> This isn't documented in xl.cfg(5) because I couldn't make head nor tail
> of the meaning of that header :-(
>
>> In addition, I'm assuming he was changing it from 0 (zero) to 1
>> (one)--and not some other mode. But,
>>
>> xen-4.1.2/docs/misc/tscmode.txt
>
> Remember that he was referring to timer_mode not tsc_mode...
>
>> says:
>>
>> "The default mode (tsc_mode==0) checks TSC-safeness of the underlying
>> hardware on which the virtual machine is launched. If it is
>> TSC-safe, rdtsc will execute at hardware speed; if it is not, rdtsc
>> will be emulated."
>>
>> Which implies the default is always 0 (zero). Which is it?
>
> It seems that xl, in xen-unstable, defaults to:
>     timer_mode = 1
>     tsc_mode = 0
> as does 4.1 as far as I can tell via code inspection.
>
>> More importantly, is the solution to force tsc_mode=2?
>
> IMHO this is safe in most situations unless you are running some sort of
> workload (e.g. a well known database) which has stringent requirements
> regarding the TSC for transactional consistency (hence the conservative
> default).
>
>> If so, under what BIOS/xen-boot-params/dom0-boot-params conditions?
>> And--please excuse my exasperation--but WTH does this have to do with
>> ext3 versus ext4? Is ext4 exquisitely sensitive to TSC/HPET
>> "jumpiness" (if that's even what's happening)?
>
> Sorry, I have no idea how/why the filesystem would be related to the
> TSC.
>
> It is possible you are actually seeing two bugs I suppose -- there have
> been issues relating to ext4 and barriers in some kernel versions (I'm
> afraid I don't recall the details, the list archives ought to contain
> something).
>
> Ian.
>
>

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
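
For anyone trying to match the reported leap against their own logs, the size of a "full round trip" of the hpet counter follows directly from the two figures given in the message: a 32 bit counter and a 1/16GHz (62.5 MHz) tick rate. This is a rough sketch only; the function and constant names are made up for illustration and are not Xen identifiers.

```python
# Back-of-the-envelope check of the leap size described in the message.
# Assumptions taken from the message text: the emulated hpet main counter
# is 32 bits wide and ticks at 1/16 GHz. Names here are hypothetical.

HPET_COUNTER_BITS = 32
HPET_FREQ_HZ = 1_000_000_000 // 16  # 1/16 GHz = 62,500,000 Hz


def hpet_wrap_seconds(bits=HPET_COUNTER_BITS, freq_hz=HPET_FREQ_HZ):
    """Seconds for a counter of `bits` width to wrap once at `freq_hz`."""
    return (1 << bits) / freq_hz


if __name__ == "__main__":
    print(f"one full hpet wrap takes about {hpet_wrap_seconds():.2f} s")
```

Under those assumptions one wrap is a bit under 69 seconds, so a QPC jump of roughly that size (or a multiple of it) would be consistent with the behaviour described.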
