[Xen-devel] [PATCH v9] new config option vtsc_tolerance_khz to avoid TSC emulation
Add an option to control when vTSC emulation will be activated for a domU with tsc_mode=default. Without such an option each TSC access from the domU will be emulated, which causes a significant performance drop for workloads that make use of rdtsc.

One option to avoid the TSC emulation is to run domUs with tsc_mode=native. This has the drawback that migrating a domU from a "2.3GHz" class host to a "2.4GHz" class host may change the rate at which the TSC counter increases, and the domU may not be prepared for that.

With the new option the host admin can decide how a domU should behave when it is migrated across systems of the same class. Since there is always some jitter when Xen calibrates the cpu_khz value, all hosts of the same class will most likely have slightly different values. As a result vTSC emulation is unavoidable. Data collected during the incident which triggered this change showed a jitter of up to 200 kHz across systems of the same class.

Existing padding fields are reused to store vtsc_tolerance_khz as u16. The padding is sent as zero in write_tsc_info to the receiving host. The padding is undefined if the changed code runs as receiver. handle_tsc_info has no code to verify that the padding is indeed zero. Due to the lack of a version field it is impossible to know if the sender already has the newly introduced vtsc_tolerance field. In the worst case the receiving domU will get an unemulated TSC.
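For illustration only, the tolerance decision described above can be sketched as a small standalone C function. The helper name tsc_within_tolerance and its parameters are hypothetical and not part of the patch; the logic is modelled on the check this patch adds to tsc_set_info. The real code additionally requires host_tsc_is_safe() (or HVM TSC scaling) before it leaves the TSC unemulated, which this sketch does not model.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper, not part of the patch: returns true if the
 * frequency check alone would allow the TSC to stay native.
 *
 * host_khz      - measured cpu_khz of the receiving host
 * guest_khz     - frequency the domU was started with (0 on first boot)
 * tolerance_khz - value of vtsc_tolerance_khz (0 means no tolerance)
 */
static bool tsc_within_tolerance(unsigned long host_khz, uint32_t guest_khz,
                                 uint16_t tolerance_khz)
{
    long khz_diff;

    /* A freshly created domU adopts the host frequency, so it matches. */
    if ( guest_khz == 0 )
        return true;

    /* Without a tolerance only an exact match counts. */
    if ( tolerance_khz == 0 )
        return host_khz == guest_khz;

    /* Compare the absolute difference against the allowed jitter. */
    khz_diff = labs((long)host_khz - (long)guest_khz);

    return khz_diff <= (long)tolerance_khz;
}
```

With a tolerance of 200 kHz, a domU started at 2300100 kHz and restored on a host measured at 2299998 kHz passes the check (difference 102 kHz), while the same domU restored on a 2400000 kHz host does not.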
Signed-off-by: Olaf Hering <olaf@xxxxxxxxx> Reviewed-by: Wei Liu <wei.liu2@xxxxxxxxxx> (v07/v08) Reviewed-by: Jan Beulich <jbeulich@xxxxxxxx> (v08) -- v9: - extend commit msg, mention potential issues with xc_sr_rec_tsc_info._res1 v8: - adjust also python stream checker for added tolerance member v7: - use uint16 in libxl_types.idl to match type used elsewhere in the patch v6: - mention default value in xl.cfg - tsc_set_info: remove usage of __func__, use %d for domid - tsc_set_info: use ABS to calculate khz_diff v5: - reduce functionality to allow setting of the tolerance value only at initial domU startup v4: - add missing copyback in XEN_DOMCTL_set_vtsc_tolerance_khz v3: - rename vtsc_khz_tolerance to vtsc_tolerance_khz - separate domctls to adjust values - more docs - update libxl.h - update python tests - flask check bound to tsc permissions - not runtime tested due to dlsym() build errors in staging --- docs/man/xen-tscmode.pod.7 | 16 +++++++++++++ docs/man/xl.cfg.pod.5.in | 10 ++++++++ docs/specs/libxc-migration-stream.pandoc | 6 +++-- tools/libxc/include/xenctrl.h | 2 ++ tools/libxc/xc_domain.c | 4 ++++ tools/libxc/xc_sr_common_x86.c | 6 +++-- tools/libxc/xc_sr_stream_format.h | 3 ++- tools/libxl/libxl.h | 6 +++++ tools/libxl/libxl_types.idl | 1 + tools/libxl/libxl_x86.c | 3 ++- tools/python/xen/lowlevel/xc/xc.c | 2 +- tools/python/xen/migration/libxc.py | 8 +++---- tools/xl/xl_parse.c | 3 +++ xen/arch/x86/domain.c | 2 +- xen/arch/x86/domctl.c | 2 ++ xen/arch/x86/time.c | 30 +++++++++++++++++++++--- xen/include/asm-x86/domain.h | 1 + xen/include/asm-x86/time.h | 6 +++-- xen/include/public/domctl.h | 3 ++- 19 files changed, 96 insertions(+), 18 deletions(-) diff --git a/docs/man/xen-tscmode.pod.7 b/docs/man/xen-tscmode.pod.7 index 3bbc96f201..122ae36679 100644 --- a/docs/man/xen-tscmode.pod.7 +++ b/docs/man/xen-tscmode.pod.7 @@ -99,6 +99,9 @@ whether or not the VM has been saved/restored/migrated =back +If the tsc_mode is set to "default" the decision to 
emulate TSC can be +tweaked further with the "vtsc_tolerance_khz" option. + To understand this in more detail, the rest of this document must be read. @@ -211,6 +214,19 @@ is emulated. Note that, though emulated, the "apparent" TSC frequency will be the TSC frequency of the initial physical machine, even after migration. +Since the calibration of the TSC frequency may not be 100% accurate, the +exact value of the frequency can change even across reboots. This means +also several otherwise identical systems can have a slightly different +TSC frequency. As a result TSC access will be emulated if a domU is +migrated from one host to another, identical host. To avoid the +performance impact of TSC emulation a certain tolerance of the measured +host TSC frequency can be specified with "vtsc_tolerance_khz". If the +measured "cpu_khz" value is within the tolerance range, TSC access +remains native. Otherwise it will be emulated. This allows to migrate +domUs between identical hardware. If the domU will be migrated to a +different kind of hardware, say from a "2.3GHz" to a "2.5GHz" system, +TSC will be emualted to maintain the TSC frequency expected by the domU. + For environments where both TSC-safeness AND highest performance even across migration is a requirement, application code can be specially modified to use an algorithm explicitly designed into Xen for this purpose. diff --git a/docs/man/xl.cfg.pod.5.in b/docs/man/xl.cfg.pod.5.in index 47d88243b1..995277794f 100644 --- a/docs/man/xl.cfg.pod.5.in +++ b/docs/man/xl.cfg.pod.5.in @@ -1898,6 +1898,16 @@ determined in a similar way to that of B<default> TSC mode. Please see B<xen-tscmode(7)> for more information on this option. +=item B<vtsc_tolerance_khz="KHZ"> + +B<(x86 only, relevant only for tsc_mode=default)> +When a domU is started, the CPU frequency of the host is used by the domU for +TSC related time measurement. 
Once the domU is either migrated or +saved/restored on another host that CPU frequency has to be emulated to avoid +timedrift. To avoid the performance penalty of the TSC emulation, allow a +certain amount of jitter of the measured CPU frequency on the hosts the domU +is supposed to run on. Default value is 0, i.e. no tolerance. + =item B<localtime=BOOLEAN> Set the real time clock to local time or to UTC. False (0) by default, diff --git a/docs/specs/libxc-migration-stream.pandoc b/docs/specs/libxc-migration-stream.pandoc index 73421ff393..0d0f17edb1 100644 --- a/docs/specs/libxc-migration-stream.pandoc +++ b/docs/specs/libxc-migration-stream.pandoc @@ -3,7 +3,7 @@ Andrew Cooper <<andrew.cooper3@xxxxxxxxxx>> Wen Congyang <<wency@xxxxxxxxxxxxxx>> Yang Hongyang <<hongyang.yang@xxxxxxxxxxxx>> -% Revision 2 +% Revision 3 Introduction ============ @@ -472,7 +472,7 @@ XEN\_DOMCTL\_{get,set}tscinfo hypercall sub-ops. +------------------------+------------------------+ | nsec | +------------------------+------------------------+ - | incarnation | (reserved) | + | incarnation | tolerance | (reserved) | +------------------------+------------------------+ -------------------------------------------------------------------- @@ -485,6 +485,8 @@ khz TSC frequency, in kHz. nsec Elapsed time, in nanoseconds. incarnation Incarnation. 
+ +tolerance Amount of Jitter the domU can handle after migration -------------------------------------------------------------------- \clearpage diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h index 408fa1c6a4..e74c480ae2 100644 --- a/tools/libxc/include/xenctrl.h +++ b/tools/libxc/include/xenctrl.h @@ -1360,6 +1360,7 @@ int xc_domain_set_tsc_info(xc_interface *xch, uint32_t tsc_mode, uint64_t elapsed_nsec, uint32_t gtsc_khz, + uint16_t vtsc_tolerance_khz, uint32_t incarnation); int xc_domain_get_tsc_info(xc_interface *xch, @@ -1367,6 +1368,7 @@ int xc_domain_get_tsc_info(xc_interface *xch, uint32_t *tsc_mode, uint64_t *elapsed_nsec, uint32_t *gtsc_khz, + uint16_t *vtsc_tolerance_khz, uint32_t *incarnation); int xc_domain_disable_migrate(xc_interface *xch, uint32_t domid); diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c index 57e18ee227..ec111989ee 100644 --- a/tools/libxc/xc_domain.c +++ b/tools/libxc/xc_domain.c @@ -852,6 +852,7 @@ int xc_domain_set_tsc_info(xc_interface *xch, uint32_t tsc_mode, uint64_t elapsed_nsec, uint32_t gtsc_khz, + uint16_t vtsc_tolerance_khz, uint32_t incarnation) { DECLARE_DOMCTL; @@ -860,6 +861,7 @@ int xc_domain_set_tsc_info(xc_interface *xch, domctl.u.tsc_info.tsc_mode = tsc_mode; domctl.u.tsc_info.elapsed_nsec = elapsed_nsec; domctl.u.tsc_info.gtsc_khz = gtsc_khz; + domctl.u.tsc_info.vtsc_tolerance_khz = vtsc_tolerance_khz; domctl.u.tsc_info.incarnation = incarnation; return do_domctl(xch, &domctl); } @@ -869,6 +871,7 @@ int xc_domain_get_tsc_info(xc_interface *xch, uint32_t *tsc_mode, uint64_t *elapsed_nsec, uint32_t *gtsc_khz, + uint16_t *vtsc_tolerance_khz, uint32_t *incarnation) { int rc; @@ -882,6 +885,7 @@ int xc_domain_get_tsc_info(xc_interface *xch, *tsc_mode = domctl.u.tsc_info.tsc_mode; *elapsed_nsec = domctl.u.tsc_info.elapsed_nsec; *gtsc_khz = domctl.u.tsc_info.gtsc_khz; + *vtsc_tolerance_khz = domctl.u.tsc_info.vtsc_tolerance_khz; *incarnation = domctl.u.tsc_info.incarnation; } 
return rc; diff --git a/tools/libxc/xc_sr_common_x86.c b/tools/libxc/xc_sr_common_x86.c index 98f1cef30f..ea3e551a83 100644 --- a/tools/libxc/xc_sr_common_x86.c +++ b/tools/libxc/xc_sr_common_x86.c @@ -12,7 +12,8 @@ int write_tsc_info(struct xc_sr_context *ctx) }; if ( xc_domain_get_tsc_info(xch, ctx->domid, &tsc.mode, - &tsc.nsec, &tsc.khz, &tsc.incarnation) < 0 ) + &tsc.nsec, &tsc.khz, &tsc.vtsc_tolerance, + &tsc.incarnation) < 0 ) { PERROR("Unable to obtain TSC information"); return -1; @@ -34,7 +35,8 @@ int handle_tsc_info(struct xc_sr_context *ctx, struct xc_sr_record *rec) } if ( xc_domain_set_tsc_info(xch, ctx->domid, tsc->mode, - tsc->nsec, tsc->khz, tsc->incarnation) ) + tsc->nsec, tsc->khz, tsc->vtsc_tolerance, + tsc->incarnation) ) { PERROR("Unable to set TSC information"); return -1; diff --git a/tools/libxc/xc_sr_stream_format.h b/tools/libxc/xc_sr_stream_format.h index 15ff1c7efb..9b52f6ace6 100644 --- a/tools/libxc/xc_sr_stream_format.h +++ b/tools/libxc/xc_sr_stream_format.h @@ -121,7 +121,8 @@ struct xc_sr_rec_tsc_info uint32_t khz; uint64_t nsec; uint32_t incarnation; - uint32_t _res1; + uint16_t vtsc_tolerance; + uint16_t _res1; }; /* HVM_PARAMS */ diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h index a09d069358..2247f04648 100644 --- a/tools/libxl/libxl.h +++ b/tools/libxl/libxl.h @@ -354,6 +354,12 @@ #define LIBXL_HAVE_BUILDINFO_BOOTLOADER 1 #define LIBXL_HAVE_BUILDINFO_BOOTLOADER_ARGS 1 +/* + * LIBXL_HAVE_VTSC_TOLERANCE_KHZ indicates that libxl_domain_build_info + * has the vtsc_tolerance_khz field. 
+ */ +#define LIBXL_HAVE_VTSC_TOLERANCE_KHZ 1 + /* * libxl ABI compatibility * diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl index 01ec1d1afa..bb99776401 100644 --- a/tools/libxl/libxl_types.idl +++ b/tools/libxl/libxl_types.idl @@ -466,6 +466,7 @@ libxl_domain_build_info = Struct("domain_build_info",[ ("vcpu_soft_affinity", Array(libxl_bitmap, "num_vcpu_soft_affinity")), ("numa_placement", libxl_defbool), ("tsc_mode", libxl_tsc_mode), + ("vtsc_tolerance_khz", uint16), ("max_memkb", MemKB), ("target_memkb", MemKB), ("video_memkb", MemKB), diff --git a/tools/libxl/libxl_x86.c b/tools/libxl/libxl_x86.c index ab88562619..d9747cc45a 100644 --- a/tools/libxl/libxl_x86.c +++ b/tools/libxl/libxl_x86.c @@ -314,7 +314,8 @@ int libxl__arch_domain_create(libxl__gc *gc, libxl_domain_config *d_config, default: abort(); } - xc_domain_set_tsc_info(ctx->xch, domid, tsc_mode, 0, 0, 0); + xc_domain_set_tsc_info(ctx->xch, domid, tsc_mode, 0, 0, + d_config->b_info.vtsc_tolerance_khz, 0); if (libxl_defbool_val(d_config->b_info.disable_migrate)) xc_domain_disable_migrate(ctx->xch, domid); rtc_timeoffset = d_config->b_info.rtc_timeoffset; diff --git a/tools/python/xen/lowlevel/xc/xc.c b/tools/python/xen/lowlevel/xc/xc.c index 694bfa0642..d781589886 100644 --- a/tools/python/xen/lowlevel/xc/xc.c +++ b/tools/python/xen/lowlevel/xc/xc.c @@ -1522,7 +1522,7 @@ static PyObject *pyxc_domain_set_tsc_info(XcObject *self, PyObject *args) if (!PyArg_ParseTuple(args, "ii", &dom, &tsc_mode)) return NULL; - if (xc_domain_set_tsc_info(self->xc_handle, dom, tsc_mode, 0, 0, 0) != 0) + if (xc_domain_set_tsc_info(self->xc_handle, dom, tsc_mode, 0, 0, 0, 0) != 0) return pyxc_error_to_exception(self->xc_handle); Py_INCREF(zero); diff --git a/tools/python/xen/migration/libxc.py b/tools/python/xen/migration/libxc.py index f24448a9ef..abcda617e4 100644 --- a/tools/python/xen/migration/libxc.py +++ b/tools/python/xen/migration/libxc.py @@ -114,7 +114,7 @@ X86_PV_P2M_FRAMES_FORMAT = "II" 
X86_PV_VCPU_HDR_FORMAT = "II" # tsc_info -TSC_INFO_FORMAT = "IIQII" +TSC_INFO_FORMAT = "IIQIHH" # hvm_params HVM_PARAMS_ENTRY_FORMAT = "QQ" @@ -363,14 +363,14 @@ class VerifyLibxc(VerifyBase): if len(content) != sz: raise RecordError("Length should be %u bytes" % (sz, )) - mode, khz, nsec, incarn, res1 = unpack(TSC_INFO_FORMAT, content) + mode, khz, nsec, incarn, tolerance, res1 = unpack(TSC_INFO_FORMAT, content) if res1 != 0: raise StreamError("Reserved bits set in TSC_INFO: 0x%08x" % (res1, )) - self.info(" Mode %u, %u kHz, %u ns, incarnation %d" - % (mode, khz, nsec, incarn)) + self.info(" Mode %u, %u kHz, %u ns, incarnation %d, tolerance %u kHz" + % (mode, khz, nsec, incarn, tolerance)) def verify_record_hvm_context(self, content): diff --git a/tools/xl/xl_parse.c b/tools/xl/xl_parse.c index e6c54483e0..1915640d64 100644 --- a/tools/xl/xl_parse.c +++ b/tools/xl/xl_parse.c @@ -1126,6 +1126,9 @@ void parse_config_data(const char *config_source, } } + if (!xlu_cfg_get_long(config, "vtsc_tolerance_khz", &l, 0)) + b_info->vtsc_tolerance_khz = l < 0 || l > UINT16_MAX ? UINT16_MAX : l; + if (!xlu_cfg_get_long(config, "rtc_timeoffset", &l, 0)) b_info->rtc_timeoffset = l; diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c index 0ca820a00a..3ae13b8f78 100644 --- a/xen/arch/x86/domain.c +++ b/xen/arch/x86/domain.c @@ -557,7 +557,7 @@ int arch_domain_create(struct domain *d, ASSERT_UNREACHABLE(); /* Not HVM and not PV? */ /* initialize default tsc behavior in case tools don't */ - tsc_set_info(d, TSC_MODE_DEFAULT, 0UL, 0, 0); + tsc_set_info(d, TSC_MODE_DEFAULT, 0UL, 0, 0, 0); /* PV/PVH guests get an emulated PIT too for video BIOSes to use. 
*/ pit_init(d, cpu_khz); diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c index 8fbbf3aeb3..d86ff58482 100644 --- a/xen/arch/x86/domctl.c +++ b/xen/arch/x86/domctl.c @@ -939,6 +939,7 @@ long arch_do_domctl( tsc_get_info(d, &domctl->u.tsc_info.tsc_mode, &domctl->u.tsc_info.elapsed_nsec, &domctl->u.tsc_info.gtsc_khz, + &domctl->u.tsc_info.vtsc_tolerance_khz, &domctl->u.tsc_info.incarnation); domain_unpause(d); copyback = true; @@ -954,6 +955,7 @@ long arch_do_domctl( tsc_set_info(d, domctl->u.tsc_info.tsc_mode, domctl->u.tsc_info.elapsed_nsec, domctl->u.tsc_info.gtsc_khz, + domctl->u.tsc_info.vtsc_tolerance_khz, domctl->u.tsc_info.incarnation); domain_unpause(d); } diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c index c342d00732..4a9c43b718 100644 --- a/xen/arch/x86/time.c +++ b/xen/arch/x86/time.c @@ -2063,7 +2063,7 @@ int host_tsc_is_safe(void) */ void tsc_get_info(struct domain *d, uint32_t *tsc_mode, uint64_t *elapsed_nsec, uint32_t *gtsc_khz, - uint32_t *incarnation) + uint16_t *vtsc_tolerance_khz, uint32_t *incarnation) { bool enable_tsc_scaling = is_hvm_domain(d) && hvm_tsc_scaling_supported && !d->arch.vtsc; @@ -2079,6 +2079,7 @@ void tsc_get_info(struct domain *d, uint32_t *tsc_mode, *elapsed_nsec = *gtsc_khz = 0; break; case TSC_MODE_DEFAULT: + *vtsc_tolerance_khz = d->arch.vtsc_tolerance_khz; if ( d->arch.vtsc ) { case TSC_MODE_ALWAYS_EMULATE: @@ -2121,7 +2122,8 @@ void tsc_get_info(struct domain *d, uint32_t *tsc_mode, */ void tsc_set_info(struct domain *d, uint32_t tsc_mode, uint64_t elapsed_nsec, - uint32_t gtsc_khz, uint32_t incarnation) + uint32_t gtsc_khz, uint16_t vtsc_tolerance_khz, + uint32_t incarnation) { ASSERT(!is_system_domain(d)); @@ -2133,9 +2135,12 @@ void tsc_set_info(struct domain *d, switch ( d->arch.tsc_mode = tsc_mode ) { + bool disable_vtsc; bool enable_tsc_scaling; case TSC_MODE_DEFAULT: + d->arch.vtsc_tolerance_khz = vtsc_tolerance_khz; + /* Fallthrough. 
*/ case TSC_MODE_ALWAYS_EMULATE: d->arch.vtsc_offset = get_s_time() - elapsed_nsec; d->arch.tsc_khz = gtsc_khz ?: cpu_khz; @@ -2148,8 +2153,25 @@ void tsc_set_info(struct domain *d, * When a guest is created, gtsc_khz is passed in as zero, making * d->arch.tsc_khz == cpu_khz. Thus no need to check incarnation. */ + disable_vtsc = d->arch.tsc_khz == cpu_khz; + + if ( tsc_mode == TSC_MODE_DEFAULT && gtsc_khz && + d->arch.vtsc_tolerance_khz ) + { + long khz_diff; + + khz_diff = ABS((long)(cpu_khz - gtsc_khz)); + disable_vtsc = khz_diff <= d->arch.vtsc_tolerance_khz; + + printk(XENLOG_G_INFO "d%d: host has %lu kHz," + " domU expects %u kHz," + " difference of %ld is %s tolerance of %u\n", + d->domain_id, cpu_khz, gtsc_khz, khz_diff, + disable_vtsc ? "within" : "outside", + d->arch.vtsc_tolerance_khz); + } if ( tsc_mode == TSC_MODE_DEFAULT && host_tsc_is_safe() && - (d->arch.tsc_khz == cpu_khz || + (disable_vtsc || (is_hvm_domain(d) && hvm_get_tsc_scaling_ratio(d->arch.tsc_khz))) ) { @@ -2238,6 +2260,8 @@ static void dump_softtsc(unsigned char key) printk(",ofs=%#"PRIx64, d->arch.vtsc_offset); if ( d->arch.tsc_khz ) printk(",khz=%"PRIu32, d->arch.tsc_khz); + if ( d->arch.vtsc_tolerance_khz ) + printk(",tol=%"PRIu16, d->arch.vtsc_tolerance_khz); if ( d->arch.incarnation ) printk(",inc=%"PRIu32, d->arch.incarnation); #if !defined(NDEBUG) || defined(CONFIG_PERF_COUNTERS) diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h index 197f8d62be..be3265aa7f 100644 --- a/xen/include/asm-x86/domain.h +++ b/xen/include/asm-x86/domain.h @@ -379,6 +379,7 @@ struct arch_domain uint64_t vtsc_offset; /* adjustment for save/restore/migrate */ uint32_t tsc_khz; /* cached guest khz for certain emulated or hardware TSC scaling cases */ + uint32_t vtsc_tolerance_khz; /* domU handles that much jitter in cpu_khz */ struct time_scale vtsc_to_ns; /* scaling for certain emulated or hardware TSC scaling cases */ struct time_scale ns_to_vtsc; /* scaling for certain emulated or 
diff --git a/xen/include/asm-x86/time.h b/xen/include/asm-x86/time.h index b3ae832df4..ef9be7a701 100644 --- a/xen/include/asm-x86/time.h +++ b/xen/include/asm-x86/time.h @@ -61,10 +61,12 @@ u64 gtime_to_gtsc(struct domain *d, u64 time); u64 gtsc_to_gtime(struct domain *d, u64 tsc); void tsc_set_info(struct domain *d, uint32_t tsc_mode, uint64_t elapsed_nsec, - uint32_t gtsc_khz, uint32_t incarnation); + uint32_t gtsc_khz, uint16_t vtsc_tolerance_khz, + uint32_t incarnation); void tsc_get_info(struct domain *d, uint32_t *tsc_mode, uint64_t *elapsed_nsec, - uint32_t *gtsc_khz, uint32_t *incarnation); + uint32_t *gtsc_khz, uint16_t *vtsc_tolerance_khz, + uint32_t *incarnation); void force_update_vcpu_system_time(struct vcpu *v); diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h index 0535da81c6..b2a10ff04d 100644 --- a/xen/include/public/domctl.h +++ b/xen/include/public/domctl.h @@ -702,7 +702,8 @@ struct xen_domctl_tsc_info { uint32_t tsc_mode; uint32_t gtsc_khz; uint32_t incarnation; - uint32_t pad; + uint16_t vtsc_tolerance_khz; + uint16_t pad; uint64_aligned_t elapsed_nsec; }; _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxxxxxxxxx https://lists.xenproject.org/mailman/listinfo/xen-devel