[Xen-devel] [PATCH v5] new config option vtsc_tolerance_khz to avoid TSC emulation
Add an option to control when vTSC emulation will be activated for a
domU with tsc_mode=default. Without such an option each TSC access from
domU will be emulated, which causes a significant performance drop for
workloads that make use of rdtsc.
One option to avoid the TSC emulation is to run domUs with tsc_mode=native.
This has the drawback that migrating a domU from a "2.3GHz" class host
to a "2.4GHz" class host may change the rate at which the TSC counter
increases, and the domU may not be prepared for that.
With the new option the host admin can decide how a domU should behave
when it is migrated across systems of the same class. Since there is
always some jitter when Xen calibrates the cpu_khz value, all hosts of
the same class will most likely have slightly different values. As a
result vTSC emulation is unavoidable. Data collected during the incident
which triggered this change showed a jitter of up to 200 kHz across
systems of the same class.
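The comparison this option introduces can be sketched in isolation as
follows. This is a minimal standalone illustration, not the hypervisor
code: the helper name within_tolerance() and the sample frequencies in
the note after it are assumptions made for the example. Only the
absolute-difference comparison mirrors the tsc_set_info() hunk in this
patch.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of the patch): returns true if the
 * measured host frequency is close enough to the frequency the domU
 * expects, so that native TSC access could be kept.
 * A tolerance of 0 degenerates to an exact-match check.
 */
static bool within_tolerance(uint32_t host_khz, uint32_t guest_khz,
                             uint16_t tolerance_khz)
{
    /* Absolute difference, computed without unsigned underflow. */
    uint32_t khz_diff = host_khz > guest_khz ? host_khz - guest_khz
                                             : guest_khz - host_khz;

    return khz_diff <= tolerance_khz;
}
```

With a tolerance of 200, a host measured at 2300150 kHz would be
accepted for a domU started at 2300000 kHz, while a 2400000 kHz host
would not and would still require emulation.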
Existing padding fields are reused to store vtsc_tolerance_khz as u16.
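Why reusing the padding keeps the interface layout stable can be checked
with a small sketch. The struct names tsc_info_old and tsc_info_new are
hypothetical, and plain uint64_t stands in for uint64_aligned_t as a
simplification; the field order follows the domctl.h hunk in this patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Layout before the change: 32 bits of padding after incarnation. */
struct tsc_info_old {
    uint32_t tsc_mode;
    uint32_t gtsc_khz;
    uint32_t incarnation;
    uint32_t pad;
    uint64_t elapsed_nsec;
};

/* Layout after the change: the padding is split into two 16-bit fields. */
struct tsc_info_new {
    uint32_t tsc_mode;
    uint32_t gtsc_khz;
    uint32_t incarnation;
    uint16_t vtsc_tolerance_khz;
    uint16_t pad;
    uint64_t elapsed_nsec;
};

/* The overall size and the offset of the trailing field must not move. */
_Static_assert(sizeof(struct tsc_info_old) == sizeof(struct tsc_info_new),
               "struct size changed");
_Static_assert(offsetof(struct tsc_info_old, elapsed_nsec) ==
               offsetof(struct tsc_info_new, elapsed_nsec),
               "elapsed_nsec moved");
/* The new field occupies the start of the former padding. */
_Static_assert(offsetof(struct tsc_info_new, vtsc_tolerance_khz) ==
               offsetof(struct tsc_info_old, pad),
               "tolerance field is not inside the old padding");
```

A caller that still writes zero into the former padding therefore
requests a tolerance of 0, which corresponds to the previous
exact-match behaviour.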
v5:
- reduce functionality to allow setting of the tolerance value
only at initial domU startup
v4:
- add missing copyback in XEN_DOMCTL_set_vtsc_tolerance_khz
v3:
- rename vtsc_khz_tolerance to vtsc_tolerance_khz
- separate domctls to adjust values
- more docs
- update libxl.h
- update python tests
- flask check bound to tsc permissions
- not runtime tested due to dlsym() build errors in staging
Signed-off-by: Olaf Hering <olaf@xxxxxxxxx>
---
docs/man/xen-tscmode.pod.7 | 16 ++++++++++++++++
docs/man/xl.cfg.pod.5.in | 10 ++++++++++
docs/specs/libxc-migration-stream.pandoc | 6 ++++--
tools/libxc/include/xenctrl.h | 2 ++
tools/libxc/xc_domain.c | 4 ++++
tools/libxc/xc_sr_common_x86.c | 6 ++++--
tools/libxc/xc_sr_stream_format.h | 3 ++-
tools/libxl/libxl.h | 6 ++++++
tools/libxl/libxl_types.idl | 1 +
tools/libxl/libxl_x86.c | 3 ++-
tools/python/xen/lowlevel/xc/xc.c | 2 +-
tools/xl/xl_parse.c | 3 +++
xen/arch/x86/domain.c | 2 +-
xen/arch/x86/domctl.c | 2 ++
xen/arch/x86/time.c | 31 ++++++++++++++++++++++++++++---
xen/include/asm-x86/domain.h | 1 +
xen/include/asm-x86/time.h | 6 ++++--
xen/include/public/domctl.h | 3 ++-
18 files changed, 93 insertions(+), 14 deletions(-)
diff --git a/docs/man/xen-tscmode.pod.7 b/docs/man/xen-tscmode.pod.7
index 3bbc96f201..122ae36679 100644
--- a/docs/man/xen-tscmode.pod.7
+++ b/docs/man/xen-tscmode.pod.7
@@ -99,6 +99,9 @@ whether or not the VM has been saved/restored/migrated
=back
+If the tsc_mode is set to "default" the decision to emulate TSC can be
+tweaked further with the "vtsc_tolerance_khz" option.
+
To understand this in more detail, the rest of this document must
be read.
@@ -211,6 +214,19 @@ is emulated. Note that, though emulated, the "apparent" TSC frequency
will be the TSC frequency of the initial physical machine, even after
migration.
+Since the calibration of the TSC frequency may not be 100% accurate, the
+exact value of the frequency can change even across reboots. This also
+means that several otherwise identical systems can have slightly different
+TSC frequencies. As a result TSC access will be emulated if a domU is
+migrated from one host to another, identical host. To avoid the
+performance impact of TSC emulation a certain tolerance of the measured
+host TSC frequency can be specified with "vtsc_tolerance_khz". If the
+measured "cpu_khz" value is within the tolerance range, TSC access
+remains native. Otherwise it will be emulated. This allows migrating
+domUs between identical hardware. If the domU will be migrated to a
+different kind of hardware, say from a "2.3GHz" to a "2.5GHz" system,
+TSC will be emulated to maintain the TSC frequency expected by the domU.
+
For environments where both TSC-safeness AND highest performance
even across migration is a requirement, application code can be specially
modified to use an algorithm explicitly designed into Xen for this purpose.
diff --git a/docs/man/xl.cfg.pod.5.in b/docs/man/xl.cfg.pod.5.in
index 2c1a6e1422..0b36265e4f 100644
--- a/docs/man/xl.cfg.pod.5.in
+++ b/docs/man/xl.cfg.pod.5.in
@@ -1891,6 +1891,16 @@ determined in a similar way to that of B<default> TSC mode.
Please see B<xen-tscmode(7)> for more information on this option.
+=item B<vtsc_tolerance_khz="KHZ">
+
+B<(x86 only, relevant only for tsc_mode=default)>
+When a domU is started, the CPU frequency of the host is used by the domU for
+TSC related time measurement. Once the domU is either migrated or
+saved/restored on another host, that CPU frequency has to be emulated to avoid
+time drift. To avoid the performance penalty of the TSC emulation, allow a
+certain amount of jitter of the measured CPU frequency on the hosts the domU
+is supposed to run on.
+
=item B<localtime=BOOLEAN>
Set the real time clock to local time or to UTC. False (0) by default,
diff --git a/docs/specs/libxc-migration-stream.pandoc b/docs/specs/libxc-migration-stream.pandoc
index 73421ff393..0d0f17edb1 100644
--- a/docs/specs/libxc-migration-stream.pandoc
+++ b/docs/specs/libxc-migration-stream.pandoc
@@ -3,7 +3,7 @@
Andrew Cooper <<andrew.cooper3@xxxxxxxxxx>>
Wen Congyang <<wency@xxxxxxxxxxxxxx>>
Yang Hongyang <<hongyang.yang@xxxxxxxxxxxx>>
-% Revision 2
+% Revision 3
Introduction
============
@@ -472,7 +472,7 @@ XEN\_DOMCTL\_{get,set}tscinfo hypercall sub-ops.
+------------------------+------------------------+
| nsec |
+------------------------+------------------------+
- | incarnation | (reserved) |
+ | incarnation | tolerance | (reserved) |
+------------------------+------------------------+
--------------------------------------------------------------------
@@ -485,6 +485,8 @@ khz TSC frequency, in kHz.
nsec Elapsed time, in nanoseconds.
incarnation Incarnation.
+
+tolerance Amount of jitter, in kHz, the domU can handle after migration.
--------------------------------------------------------------------
\clearpage
diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 058e832c47..96bdd5609d 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -1360,6 +1360,7 @@ int xc_domain_set_tsc_info(xc_interface *xch,
uint32_t tsc_mode,
uint64_t elapsed_nsec,
uint32_t gtsc_khz,
+ uint16_t vtsc_tolerance_khz,
uint32_t incarnation);
int xc_domain_get_tsc_info(xc_interface *xch,
@@ -1367,6 +1368,7 @@ int xc_domain_get_tsc_info(xc_interface *xch,
uint32_t *tsc_mode,
uint64_t *elapsed_nsec,
uint32_t *gtsc_khz,
+ uint16_t *vtsc_tolerance_khz,
uint32_t *incarnation);
int xc_domain_disable_migrate(xc_interface *xch, uint32_t domid);
diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
index 26b4b908b9..36acc1c45f 100644
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -852,6 +852,7 @@ int xc_domain_set_tsc_info(xc_interface *xch,
uint32_t tsc_mode,
uint64_t elapsed_nsec,
uint32_t gtsc_khz,
+ uint16_t vtsc_tolerance_khz,
uint32_t incarnation)
{
DECLARE_DOMCTL;
@@ -860,6 +861,7 @@ int xc_domain_set_tsc_info(xc_interface *xch,
domctl.u.tsc_info.tsc_mode = tsc_mode;
domctl.u.tsc_info.elapsed_nsec = elapsed_nsec;
domctl.u.tsc_info.gtsc_khz = gtsc_khz;
+ domctl.u.tsc_info.vtsc_tolerance_khz = vtsc_tolerance_khz;
domctl.u.tsc_info.incarnation = incarnation;
return do_domctl(xch, &domctl);
}
@@ -869,6 +871,7 @@ int xc_domain_get_tsc_info(xc_interface *xch,
uint32_t *tsc_mode,
uint64_t *elapsed_nsec,
uint32_t *gtsc_khz,
+ uint16_t *vtsc_tolerance_khz,
uint32_t *incarnation)
{
int rc;
@@ -882,6 +885,7 @@ int xc_domain_get_tsc_info(xc_interface *xch,
*tsc_mode = domctl.u.tsc_info.tsc_mode;
*elapsed_nsec = domctl.u.tsc_info.elapsed_nsec;
*gtsc_khz = domctl.u.tsc_info.gtsc_khz;
+ *vtsc_tolerance_khz = domctl.u.tsc_info.vtsc_tolerance_khz;
*incarnation = domctl.u.tsc_info.incarnation;
}
return rc;
diff --git a/tools/libxc/xc_sr_common_x86.c b/tools/libxc/xc_sr_common_x86.c
index 98f1cef30f..ea3e551a83 100644
--- a/tools/libxc/xc_sr_common_x86.c
+++ b/tools/libxc/xc_sr_common_x86.c
@@ -12,7 +12,8 @@ int write_tsc_info(struct xc_sr_context *ctx)
};
if ( xc_domain_get_tsc_info(xch, ctx->domid, &tsc.mode,
- &tsc.nsec, &tsc.khz, &tsc.incarnation) < 0 )
+ &tsc.nsec, &tsc.khz, &tsc.vtsc_tolerance,
+ &tsc.incarnation) < 0 )
{
PERROR("Unable to obtain TSC information");
return -1;
@@ -34,7 +35,8 @@ int handle_tsc_info(struct xc_sr_context *ctx, struct xc_sr_record *rec)
}
if ( xc_domain_set_tsc_info(xch, ctx->domid, tsc->mode,
- tsc->nsec, tsc->khz, tsc->incarnation) )
+ tsc->nsec, tsc->khz, tsc->vtsc_tolerance,
+ tsc->incarnation) )
{
PERROR("Unable to set TSC information");
return -1;
diff --git a/tools/libxc/xc_sr_stream_format.h b/tools/libxc/xc_sr_stream_format.h
index 15ff1c7efb..9b52f6ace6 100644
--- a/tools/libxc/xc_sr_stream_format.h
+++ b/tools/libxc/xc_sr_stream_format.h
@@ -121,7 +121,8 @@ struct xc_sr_rec_tsc_info
uint32_t khz;
uint64_t nsec;
uint32_t incarnation;
- uint32_t _res1;
+ uint16_t vtsc_tolerance;
+ uint16_t _res1;
};
/* HVM_PARAMS */
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index edd244278a..7e2b703251 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -354,6 +354,12 @@
#define LIBXL_HAVE_BUILDINFO_BOOTLOADER 1
#define LIBXL_HAVE_BUILDINFO_BOOTLOADER_ARGS 1
+/*
+ * LIBXL_HAVE_VTSC_TOLERANCE_KHZ indicates that libxl_domain_build_info
+ * has the vtsc_tolerance_khz field.
+ */
+#define LIBXL_HAVE_VTSC_TOLERANCE_KHZ 1
+
/*
* libxl ABI compatibility
*
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index dbb287d6fe..8b898bb3c9 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -466,6 +466,7 @@ libxl_domain_build_info = Struct("domain_build_info",[
("vcpu_soft_affinity", Array(libxl_bitmap, "num_vcpu_soft_affinity")),
("numa_placement", libxl_defbool),
("tsc_mode", libxl_tsc_mode),
+ ("vtsc_tolerance_khz", uint32),
("max_memkb", MemKB),
("target_memkb", MemKB),
("video_memkb", MemKB),
diff --git a/tools/libxl/libxl_x86.c b/tools/libxl/libxl_x86.c
index 1e9f98961b..ab5ff9aa8b 100644
--- a/tools/libxl/libxl_x86.c
+++ b/tools/libxl/libxl_x86.c
@@ -313,7 +313,8 @@ int libxl__arch_domain_create(libxl__gc *gc, libxl_domain_config *d_config,
default:
abort();
}
- xc_domain_set_tsc_info(ctx->xch, domid, tsc_mode, 0, 0, 0);
+ xc_domain_set_tsc_info(ctx->xch, domid, tsc_mode, 0, 0,
+ d_config->b_info.vtsc_tolerance_khz, 0);
if (libxl_defbool_val(d_config->b_info.disable_migrate))
xc_domain_disable_migrate(ctx->xch, domid);
rtc_timeoffset = d_config->b_info.rtc_timeoffset;
diff --git a/tools/python/xen/lowlevel/xc/xc.c b/tools/python/xen/lowlevel/xc/xc.c
index f501764100..e73e2cafc7 100644
--- a/tools/python/xen/lowlevel/xc/xc.c
+++ b/tools/python/xen/lowlevel/xc/xc.c
@@ -1522,7 +1522,7 @@ static PyObject *pyxc_domain_set_tsc_info(XcObject *self, PyObject *args)
if (!PyArg_ParseTuple(args, "ii", &dom, &tsc_mode))
return NULL;
- if (xc_domain_set_tsc_info(self->xc_handle, dom, tsc_mode, 0, 0, 0) != 0)
+ if (xc_domain_set_tsc_info(self->xc_handle, dom, tsc_mode, 0, 0, 0, 0) != 0)
return pyxc_error_to_exception(self->xc_handle);
Py_INCREF(zero);
diff --git a/tools/xl/xl_parse.c b/tools/xl/xl_parse.c
index 8b999825d2..ddaddd6e65 100644
--- a/tools/xl/xl_parse.c
+++ b/tools/xl/xl_parse.c
@@ -1126,6 +1126,9 @@ void parse_config_data(const char *config_source,
}
}
+ if (!xlu_cfg_get_long(config, "vtsc_tolerance_khz", &l, 0))
+ b_info->vtsc_tolerance_khz = l < 0 || l > UINT16_MAX ? UINT16_MAX : l;
+
if (!xlu_cfg_get_long(config, "rtc_timeoffset", &l, 0))
b_info->rtc_timeoffset = l;
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index fbb320da9c..d40b91721e 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -561,7 +561,7 @@ int arch_domain_create(struct domain *d,
ASSERT_UNREACHABLE(); /* Not HVM and not PV? */
/* initialize default tsc behavior in case tools don't */
- tsc_set_info(d, TSC_MODE_DEFAULT, 0UL, 0, 0);
+ tsc_set_info(d, TSC_MODE_DEFAULT, 0UL, 0, 0, 0);
/* PV/PVH guests get an emulated PIT too for video BIOSes to use. */
pit_init(d, cpu_khz);
diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index 8fbbf3aeb3..d86ff58482 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -939,6 +939,7 @@ long arch_do_domctl(
tsc_get_info(d, &domctl->u.tsc_info.tsc_mode,
&domctl->u.tsc_info.elapsed_nsec,
&domctl->u.tsc_info.gtsc_khz,
+ &domctl->u.tsc_info.vtsc_tolerance_khz,
&domctl->u.tsc_info.incarnation);
domain_unpause(d);
copyback = true;
@@ -954,6 +955,7 @@ long arch_do_domctl(
tsc_set_info(d, domctl->u.tsc_info.tsc_mode,
domctl->u.tsc_info.elapsed_nsec,
domctl->u.tsc_info.gtsc_khz,
+ domctl->u.tsc_info.vtsc_tolerance_khz,
domctl->u.tsc_info.incarnation);
domain_unpause(d);
}
diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
index 84c1c0c082..df25be1c45 100644
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -2064,7 +2064,7 @@ int host_tsc_is_safe(void)
*/
void tsc_get_info(struct domain *d, uint32_t *tsc_mode,
uint64_t *elapsed_nsec, uint32_t *gtsc_khz,
- uint32_t *incarnation)
+ uint16_t *vtsc_tolerance_khz, uint32_t *incarnation)
{
bool enable_tsc_scaling = is_hvm_domain(d) &&
hvm_tsc_scaling_supported && !d->arch.vtsc;
@@ -2080,6 +2080,7 @@ void tsc_get_info(struct domain *d, uint32_t *tsc_mode,
*elapsed_nsec = *gtsc_khz = 0;
break;
case TSC_MODE_DEFAULT:
+ *vtsc_tolerance_khz = d->arch.vtsc_tolerance_khz;
if ( d->arch.vtsc )
{
case TSC_MODE_ALWAYS_EMULATE:
@@ -2122,7 +2123,8 @@ void tsc_get_info(struct domain *d, uint32_t *tsc_mode,
*/
void tsc_set_info(struct domain *d,
uint32_t tsc_mode, uint64_t elapsed_nsec,
- uint32_t gtsc_khz, uint32_t incarnation)
+ uint32_t gtsc_khz, uint16_t vtsc_tolerance_khz,
+ uint32_t incarnation)
{
ASSERT(!is_system_domain(d));
@@ -2134,9 +2136,12 @@ void tsc_set_info(struct domain *d,
switch ( d->arch.tsc_mode = tsc_mode )
{
+ bool disable_vtsc;
bool enable_tsc_scaling;
case TSC_MODE_DEFAULT:
+ d->arch.vtsc_tolerance_khz = vtsc_tolerance_khz;
+ /* Fallthrough. */
case TSC_MODE_ALWAYS_EMULATE:
d->arch.vtsc_offset = get_s_time() - elapsed_nsec;
d->arch.tsc_khz = gtsc_khz ?: cpu_khz;
@@ -2149,8 +2154,26 @@ void tsc_set_info(struct domain *d,
* When a guest is created, gtsc_khz is passed in as zero, making
* d->arch.tsc_khz == cpu_khz. Thus no need to check incarnation.
*/
+ disable_vtsc = d->arch.tsc_khz == cpu_khz;
+
+ if ( tsc_mode == TSC_MODE_DEFAULT && gtsc_khz &&
+ d->arch.vtsc_tolerance_khz )
+ {
+ uint32_t khz_diff;
+
+ khz_diff = cpu_khz > gtsc_khz ?
+ cpu_khz - gtsc_khz : gtsc_khz - cpu_khz;
+ disable_vtsc = khz_diff <= d->arch.vtsc_tolerance_khz;
+
+ printk(XENLOG_G_INFO "%s: d%u: host has %lu kHz,"
+ " domU expects %u kHz,"
+ " difference of %u is %s tolerance of %u\n",
+ __func__, d->domain_id, cpu_khz, gtsc_khz, khz_diff,
+ disable_vtsc ? "within" : "outside",
+ d->arch.vtsc_tolerance_khz);
+ }
if ( tsc_mode == TSC_MODE_DEFAULT && host_tsc_is_safe() &&
- (d->arch.tsc_khz == cpu_khz ||
+ (disable_vtsc ||
(is_hvm_domain(d) &&
hvm_get_tsc_scaling_ratio(d->arch.tsc_khz))) )
{
@@ -2239,6 +2262,8 @@ static void dump_softtsc(unsigned char key)
printk(",ofs=%#"PRIx64, d->arch.vtsc_offset);
if ( d->arch.tsc_khz )
printk(",khz=%"PRIu32, d->arch.tsc_khz);
+ if ( d->arch.vtsc_tolerance_khz )
+ printk(",tol=%"PRIu16, d->arch.vtsc_tolerance_khz);
if ( d->arch.incarnation )
printk(",inc=%"PRIu32, d->arch.incarnation);
#if !defined(NDEBUG) || defined(CONFIG_PERF_COUNTERS)
diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
index a12ae47f1b..7743995934 100644
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -374,6 +374,7 @@ struct arch_domain
uint64_t vtsc_offset; /* adjustment for save/restore/migrate */
uint32_t tsc_khz; /* cached guest khz for certain emulated or
hardware TSC scaling cases */
+ uint32_t vtsc_tolerance_khz; /* domU handles that much jitter in cpu_khz */
struct time_scale vtsc_to_ns; /* scaling for certain emulated or
hardware TSC scaling cases */
struct time_scale ns_to_vtsc; /* scaling for certain emulated or
diff --git a/xen/include/asm-x86/time.h b/xen/include/asm-x86/time.h
index b3ae832df4..ef9be7a701 100644
--- a/xen/include/asm-x86/time.h
+++ b/xen/include/asm-x86/time.h
@@ -61,10 +61,12 @@ u64 gtime_to_gtsc(struct domain *d, u64 time);
u64 gtsc_to_gtime(struct domain *d, u64 tsc);
void tsc_set_info(struct domain *d, uint32_t tsc_mode, uint64_t elapsed_nsec,
- uint32_t gtsc_khz, uint32_t incarnation);
+ uint32_t gtsc_khz, uint16_t vtsc_tolerance_khz,
+ uint32_t incarnation);
void tsc_get_info(struct domain *d, uint32_t *tsc_mode, uint64_t *elapsed_nsec,
- uint32_t *gtsc_khz, uint32_t *incarnation);
+ uint32_t *gtsc_khz, uint16_t *vtsc_tolerance_khz,
+ uint32_t *incarnation);
void force_update_vcpu_system_time(struct vcpu *v);
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index ec7a860afc..70a58ae2e4 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -702,7 +702,8 @@ struct xen_domctl_tsc_info {
uint32_t tsc_mode;
uint32_t gtsc_khz;
uint32_t incarnation;
- uint32_t pad;
+ uint16_t vtsc_tolerance_khz;
+ uint16_t pad;
uint64_aligned_t elapsed_nsec;
};
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel