[Xen-changelog] [xen staging] tools/docs: Remove PVRDTSCP support
commit 6b10a4c416c5fb598d5b960a2818f857d2ddd6ed
Author: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
AuthorDate: Thu Dec 13 15:51:41 2018 -0800
Commit: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
CommitDate: Tue Dec 18 17:13:51 2018 +0000
tools/docs: Remove PVRDTSCP support
PVRDTSCP is believed-unused, and its implementation has adverse consequences
on unrelated functionality in the hypervisor. As a result, support is being
removed.
For more historical context, see
c/s c17b36d5dc792cfdf59b6de0213b168bec0af8e8
c/s 04656384a1b9714e43db850c51431008e23450d8
Modify libxl to provide a slightly more helpful error message if it encounters
PVRDTSCP being selected. While adjusting TSC handling, make libxl check for
errors from the set_tsc hypercall.
Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
Acked-by: Jan Beulich <jbeulich@xxxxxxxx>
Acked-by: Ian Jackson <ian.jackson@xxxxxxxxxxxxx>
---
docs/man/xen-tscmode.pod.7 | 94 +-----------
docs/man/xl.cfg.pod.5.in | 9 +-
docs/misc/pvrdtscp.c | 307 --------------------------------------
tools/libxl/libxl_x86.c | 13 +-
tools/python/xen/lowlevel/xc/xc.c | 2 +-
5 files changed, 19 insertions(+), 406 deletions(-)
diff --git a/docs/man/xen-tscmode.pod.7 b/docs/man/xen-tscmode.pod.7
index 819c61dd05..1d81a3fe18 100644
--- a/docs/man/xen-tscmode.pod.7
+++ b/docs/man/xen-tscmode.pod.7
@@ -77,9 +77,7 @@ highest performance is required.
=item * B<tsc_mode=3> (PVRDTSCP).
-High-TSC-frequency apps may be paravirtualized (modified) to
-obtain both correctness and highest performance; any unmodified
-apps must be TSC-resilient.
+This mode has been removed.
=back
@@ -215,30 +213,6 @@ is emulated. Note that, though emulated, the "apparent" TSC frequency
will be the TSC frequency of the initial physical machine, even after
migration.
-For environments where both TSC-safeness AND highest performance
-even across migration is a requirement, application code can be specially
-modified to use an algorithm explicitly designed into Xen for this purpose.
-This mode (tsc_mode==3) is called PVRDTSCP, because it requires
-app paravirtualization (awareness by the app that it may be running
-on top of Xen), and utilizes a variation of the rdtsc instruction
-called rdtscp that is available on most recent generation processors.
-(The rdtscp instruction differs from the rdtsc instruction in that it
-reads not only the TSC but an additional register set by system software.)
-When a pvrdtscp-modified app is running on a processor that is both TSC-safe
-and supports the rdtscp instruction, information can be obtained
-about migration and TSC frequency/offset adjustment to allow the
-vast majority of timestamps to be obtained at top performance; when
-running on a TSC-unsafe processor or a processor that doesn't support
-the rdtscp instruction, rdtscp is emulated.
-
-PVRDTSCP (tsc_mode==3) has two limitations. First, it applies to
-all apps running in this virtual machine. This means that all
-apps must either be TSC-resilient or pvrdtscp-modified. Second,
-highest performance is only obtained on TSC-safe machines that
-support the rdtscp instruction; when running on older machines,
-rdtscp is emulated and thus slower. For more information on PVRDTSCP,
-see below.
-
Finally, tsc_mode==1 always enables TSC emulation, regardless of
the underlying physical hardware. The "apparent" TSC frequency will
be the TSC frequency of the initial physical machine, even after migration.
@@ -287,56 +261,7 @@ have been replaced by a paravirtualized equivalent of the cpuid instruction
("pvcpuid") and also trap to Xen. But apps in a PV guest that use a
cpuid instruction execute it directly, without a trap to Xen. As a result,
an app may directly examine the physical TSC Invariant cpuid bit and make
-decisions based on that bit. This is still an unsolved problem, though
-a workaround exists as part of the PVRDTSCP tsc_mode for apps that
-can be modified.
-
-=head1 MORE ON PVRDTSCP
-
-Paravirtualized OS's use the "pvclock" algorithm to manage the passing
-of time. This sophisticated algorithm obtains information from a memory
-page shared between Xen and the OS and selects information from this
-page based on the current virtual CPU (vcpu) in order to properly adapt to
-TSC-unsafe systems and changes that occur across migration. Neither
-this shared page nor the vcpu information is available to a userland
-app so the pvclock algorithm cannot be directly used by an app, at least
-without performance degradation roughly equal to the cost of just
-emulating an rdtsc.
-
-As a result, as of 4.0, Xen provides capabilities for a userland app
-to obtain key time values similar to the information accessible
-to the PV OS pvclock algorithm. The app uses the rdtscp instruction
-which is defined in recent processors to obtain both the TSC and an
-auxiliary value called TSC_AUX. Xen is responsible for setting TSC_AUX
-to the same value on all vcpus running any domain with tsc_mode==3;
-further, Xen tools are responsible for monotonically incrementing TSC_AUX
-anytime the domain is restored/migrated (thus changing key time values);
-and, when the domain is running on a physical machine that either
-is not TSC-safe or does not support the rdtscp instruction, Xen
-is responsible for emulating the rdtscp instruction and for setting
-TSC_AUX to zero on all processors.
-
-Xen also provides pvclock information via a "pvcpuid" instruction.
-While this results in a slow trap, the information changes
-(and thus must be reobtained via pvcpuid) ONLY when TSC_AUX
-has changed, which should be very rare relative to a high
-frequency of rdtscp instructions.
-
-Finally, Xen provides additional time-related information via
-other pvcpuid instructions. First, an app is capable of
-determining if it is currently running on Xen, next whether
-the tsc_mode setting of the domain in which it is running,
-and finally whether the underlying hardware is TSC-safe and
-supports the rdtscp instruction.
-
-As a result, a pvrdtscp-modified app has sufficient information
-to compute the pvclock "elapsed nanoseconds" which can
-be used as a timestamp. And this can be done nearly as
-fast as a native rdtsc instruction, much faster than emulation,
-and also much faster than nearly all OS-provided time mechanisms.
-While pvrtscp is too complex for most apps, certain enterprise
-TSC-sensitive high-TSC-frequency apps may find it useful to
-obtain a significant performance gain.
+decisions based on that bit.
=head1 HARDWARE TSC SCALING
@@ -344,21 +269,16 @@ Intel VMX TSC scaling and AMD SVM TSC ratio allow the guest TSC read
by guest rdtsc/p increasing in a different frequency than the host
TSC frequency.
-If a HVM container in default TSC mode (tsc_mode=0) or PVRDTSCP mode
-(tsc_mode=3) is created on a host that provides constant TSC, its
-guest TSC frequency will be the same as the host. If it is later
-migrated to another host that provides constant TSC and supports Intel
-VMX TSC scaling/AMD SVM TSC ratio, its guest TSC frequency will be the
-same before and after migration.
+If a HVM container in default TSC mode (tsc_mode=0) is created on a host
+that provides constant TSC, its guest TSC frequency will be the same as
+the host. If it is later migrated to another host that provides constant
+TSC and supports Intel VMX TSC scaling/AMD SVM TSC ratio, its guest TSC
+frequency will be the same before and after migration.
For above HVM container in default TSC mode (tsc_mode=0), if above
hosts support rdtscp, both guest rdtsc and rdtscp instructions will be
executed natively before and after migration.
-For above HVM container in PVRDTSCP mode (tsc_mode=3), if the
-destination host does not support rdtscp, the guest rdtscp instruction
-will be emulated with the guest TSC frequency.
-
=head1 AUTHORS
Dan Magenheimer <dan.magenheimer@xxxxxxxxxx>
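The TSC_AUX contract described in the removed MORE ON PVRDTSCP section (re-obtain the slow pvcpuid values only when TSC_AUX changes, and never let timestamps go backwards) can be sketched as below. This is a minimal illustration under stated assumptions, not Xen's ABI: struct time_params, struct pv_clock_cache, pv_clock_sync() and pv_clock_monotonic() are hypothetical names, the src pointer stands in for the pvcpuid trap, and unlike the deleted sample it does not re-read the TSC in a retry loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-incarnation scaling values; per the removed text these
 * only change when TSC_AUX changes. Not a Xen ABI layout. */
struct time_params {
    uint64_t ns_offset;
    uint32_t mul_frac;
    int shift;
};

struct pv_clock_cache {
    int valid;             /* params have been fetched at least once */
    uint32_t aux;          /* TSC_AUX value the cached params belong to */
    struct time_params p;  /* cached copy of the slow-path values */
    unsigned refetches;    /* number of slow-path fetches performed */
    int have_ts;           /* last_ts holds a previously returned value */
    uint64_t last_ts;      /* last timestamp handed out */
};

/* Refresh the cached params only when TSC_AUX differs from the cached one.
 * 'src' stands in for the (slow) pvcpuid query. Returns 1 when a change
 * after the first fill is seen, i.e. a restore or migration happened. */
static int pv_clock_sync(struct pv_clock_cache *c, uint32_t aux,
                         const struct time_params *src)
{
    int discontinuity;

    assert(src != NULL);
    if (c->valid && c->aux == aux)
        return 0;               /* fast path: nothing to re-fetch */

    discontinuity = c->valid;   /* the first fill is not a discontinuity */
    c->p = *src;
    c->aux = aux;
    c->valid = 1;
    c->refetches++;
    return discontinuity;
}

/* Never return a timestamp that repeats or precedes the previous one,
 * mirroring the "enforce monotonicity" step of the deleted sample. */
static uint64_t pv_clock_monotonic(struct pv_clock_cache *c, uint64_t ts)
{
    if (!c->have_ts || ts > c->last_ts)
        c->last_ts = ts;
    else
        ts = ++c->last_ts;
    c->have_ts = 1;
    return ts;
}
```

Keying the cache on TSC_AUX is what made the design cheap: the expensive trap runs once per save/restore or migration rather than once per timestamp.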
diff --git a/docs/man/xl.cfg.pod.5.in b/docs/man/xl.cfg.pod.5.in
index b1c0be14cd..3b92f39d8d 100644
--- a/docs/man/xl.cfg.pod.5.in
+++ b/docs/man/xl.cfg.pod.5.in
@@ -2099,14 +2099,7 @@ by h/w, else executed natively.
=item B<native_paravirt>
-Same as B<native>, except Xen manages the TSC_AUX register so the guest can
-determine when a restore/migration has occurred and assumes guest
-obtains/uses a pvclock-like mechanism to adjust for monotonicity and
-frequency changes.
-
-If a HVM container in B<native_paravirt> TSC mode can execute both guest
-rdtsc and guest rdtscp natively, then the guest TSC frequency will be
-determined in a similar way to that of B<default> TSC mode.
+This mode has been removed.
=back
diff --git a/docs/misc/pvrdtscp.c b/docs/misc/pvrdtscp.c
deleted file mode 100644
index 8d25843532..0000000000
--- a/docs/misc/pvrdtscp.c
+++ /dev/null
@@ -1,307 +0,0 @@
-/* pvrdtscp algorithm
- *
- * This sample code demonstrates the use of the paravirtualized rdtscp
- * algorithm. Using this algorithm, an application may communicate with
- * the Xen hypervisor (version 4.0+) to obtain timestamp information which
- * is both monotonically increasing and has a fixed 1 GHz rate, even across
- * migrations between machines with different TSC rates and offsets.
- * Further,the algorithm provides performance near the performance of a
- * native rdtsc/rdtscp instruction -- much faster than emulation PROVIDED
- * the application is running on a machine on which the rdtscp instruction
- * is supported and TSC is "safe". The application must also be running in a
- * PV domain. (HVM domains may be supported at a later time.) On machines
- * where TSC is unsafe or the rdtscp instruction is not supported, Xen
- * (v4.0+) provides emulation which is slower but consistent with the pvrdtscp
- * algorithm, thus providing support for the algorithm for live migration
- * across all machines.
- *
- * More information can be found within the Xen (4.0+) source tree at
- * docs/misc/tscmode.txt
- *
- * Copyright (c) 2009 Oracle Corporation and/or its affiliates.
- * All rights reserved
- * Written by: Dan Magenheimer <dan.magenheimer@xxxxxxxxxx>
- *
- * This code is derived from code licensed under the GNU
- * General Public License ("GPL") version 2 and is therefore itself
- * also licensed under the GPL version 2.
- *
- * This code is known to compile and run on Oracle Enterprise Linux 5 Update 2
- * using gcc version 4.1.2, but its purpose is to describe the pvrdtscp
- * algorithm and its ABI to Xen version 4.0+
- */
-
-#include <stdio.h>
-#include <stdlib.h>
-#include <string.h>
-#include <sys/wait.h>
-
-#ifdef __LP64__
-#define __X86_64__
-typedef unsigned short u16;
-typedef unsigned int u32;
-typedef unsigned long u64;
-typedef int i32;
-typedef long i64;
-#define NSEC_PER_SEC 1000000000
-#else
-#define __X86_32__
-typedef unsigned int u16;
-typedef unsigned long u32;
-typedef unsigned long long u64;
-typedef long i32;
-typedef long long i64;
-#define NSEC_PER_SEC 1000000000L
-#endif
-
-static inline void hvm_cpuid(u32 idx, u32 sub,
- u32 *eax, u32 *ebx, u32 *ecx, u32 *edx)
-{
- *eax = idx, *ecx = sub;
- asm("cpuid" : "=a" (*eax), "=b" (*ebx), "=c" (*ecx), "=d" (*edx)
- : "0" (*eax), "2" (*ecx));
-}
-
-static inline void pv_cpuid(u32 idx, u32 sub,
- u32 *eax, u32 *ebx, u32 *ecx, u32 *edx)
-{
- *eax = idx, *ecx = sub;
- asm volatile ( "ud2a ; .ascii \"xen\"; cpuid" : "=a" (*eax),
- "=b" (*ebx), "=c" (*ecx), "=d" (*edx) : "0" (*eax), "2" (*ecx));
-}
-
-static inline u64 do_rdtscp(u32 *aux)
-{
-static u64 last = 0;
- u32 lo32, hi32;
- u64 val;
-
- asm volatile(".byte 0x0f,0x01,0xf9":"=a"(lo32),"=d"(hi32),"=c" (*aux));
- val = lo32 | ((u64)hi32 << 32);
- return val;
-}
-
-static inline int get_xen_tsc_mode(void)
-{
- u32 val, dummy1, dummy2, dummy3;
- pv_cpuid(0x40000003,0,&dummy1,&val,&dummy2,&dummy3);
- return val;
-}
-
-static inline int get_xen_vtsc(void)
-{
- u32 val, dummy1, dummy2, dummy3;
- pv_cpuid(0x40000003,0,&val,&dummy1,&dummy2,&dummy3);
- return val & 1;
-}
-
-static inline int get_xen_vtsc_khz(void)
-{
- u32 val, dummy1, dummy2, dummy3;
- pv_cpuid(0x40000003,0,&dummy1,&dummy2,&val,&dummy3);
- return val;
-}
-
-static inline u32 get_xen_cpu_khz(void)
-{
- u32 cpu_khz, dummy1, dummy2, dummy3;
- pv_cpuid(0x40000003,2,&cpu_khz,&dummy1,&dummy2,&dummy3);
- return cpu_khz;
-}
-
-static inline u32 get_xen_incarnation(void)
-{
- u32 incarn, dummy1, dummy2, dummy3;
- pv_cpuid(0x40000003,0,&dummy1,&dummy2,&dummy3,&incarn);
- return incarn;
-}
-
-static inline void get_xen_time_values(u64 *offset, u32 *mul_frac, u32 *shift)
-{
- u32 off_lo, off_hi, sys_lo, sys_hi, dummy;
-
- pv_cpuid(0x40000003,1,&off_lo,&off_hi,mul_frac,shift);
- *offset = off_lo | ((u64)off_hi << 32);
-}
-
-static inline u64 scale_delta(u64 delta, u32 tsc_mul_frac, i32 tsc_shift)
-{
- u64 product;
-#ifdef __X86_32__
- u32 tmp1, tmp2;
-#endif
-
- if ( tsc_shift < 0 )
- delta >>= -tsc_shift;
- else
- delta <<= tsc_shift;
-
-#ifdef __X86_32__
- asm (
- "mul %5 ; "
- "mov %4,%%eax ; "
- "mov %%edx,%4 ; "
- "mul %5 ; "
- "xor %5,%5 ; "
- "add %4,%%eax ; "
- "adc %5,%%edx ; "
- : "=A" (product), "=r" (tmp1), "=r" (tmp2)
- : "a" ((u32)delta), "1" ((u32)(delta >> 32)), "2" (tsc_mul_frac) );
-#else
- asm (
- "mul %%rdx ; shrd $32,%%rdx,%%rax"
- : "=a" (product) : "0" (delta), "d" ((u64)tsc_mul_frac) );
-#endif
-
- return product;
-}
-
-static inline u64 get_pvrdtscp_timestamp(int *discontinuity)
-{
- static int firsttime = 1;
- static u64 last_pvrdtscp_timestamp = 0;
- static u32 last_tsc_aux;
- static u64 xen_ns_offset;
- static u32 xen_tsc_to_ns_mul_frac, xen_tsc_to_ns_shift;
- u32 this_tsc_aux;
- u64 timestamp, cur_tsc, cur_ns;
-
- if (firsttime) {
- cur_tsc = do_rdtscp(&last_tsc_aux);
- get_xen_time_values(&xen_ns_offset, &xen_tsc_to_ns_mul_frac,
- &xen_tsc_to_ns_shift);
- cur_ns = scale_delta(cur_tsc, xen_tsc_to_ns_mul_frac,
- xen_tsc_to_ns_shift);
- timestamp = cur_ns - xen_ns_offset;
- last_pvrdtscp_timestamp = timestamp;
- firsttime = 0;
- }
- cur_tsc = do_rdtscp(&this_tsc_aux);
- *discontinuity = 0;
- while (this_tsc_aux != last_tsc_aux) {
- /* if tsc_aux changed, try again */
- last_tsc_aux = this_tsc_aux;
- get_xen_time_values(&xen_ns_offset, &xen_tsc_to_ns_mul_frac,
- &xen_tsc_to_ns_shift);
- cur_tsc = do_rdtscp(&this_tsc_aux);
- *discontinuity = 1;
- }
-
- /* compute nsec from TSC and Xen time values */
- cur_ns = scale_delta(cur_tsc, xen_tsc_to_ns_mul_frac,
- xen_tsc_to_ns_shift);
- timestamp = cur_ns - xen_ns_offset;
-
- /* enforce monotonicity just in case */
- if ((i64)(timestamp - last_pvrdtscp_timestamp) > 0)
- last_pvrdtscp_timestamp = timestamp;
- else {
- /* this should never happen but we'll check it anyway in
- * case of some strange combination of scaling errors
- * occurs across a very fast migration */
- printf("Time went backwards by %lluns\n",
- (unsigned long long)(last_pvrdtscp_timestamp-timestamp));
- timestamp = ++last_pvrdtscp_timestamp;
- }
- return timestamp;
-}
-
-#define HVM 1
-#define PVM 0
-
-static int running_on_xen(int hvm, u16 *version_major, u16 *version_minor)
-{
- u32 eax, ebx, ecx, edx, base;
- union { char csig[16]; u32 u[4]; } sig;
-
- for (base=0x40000000; base < 0x40010000; base += 0x100) {
- if (hvm==HVM)
- hvm_cpuid(base,0,&eax,&ebx,&ecx,&edx);
- else
- pv_cpuid(base,0,&eax,&ebx,&ecx,&edx);
- sig.u[0] = ebx; sig.u[1] = ecx; sig.u[2] = edx;
- sig.csig[12] = '\0';
- if (!strcmp("XenVMMXenVMM",&sig.csig[0]) && (eax >= (base+2))) {
- if (hvm==HVM)
- hvm_cpuid(base+1,0,&eax,&ebx,&ecx,&edx);
- else
- pv_cpuid(base+1,0,&eax,&ebx,&ecx,&edx);
- *version_major = (eax >> 16) & 0xffff;
- *version_minor = eax & 0xffff;
- return 1;
- }
- }
- return 0;
-}
-
-main(int ac, char **av)
-{
- u32 dummy;
- u16 version_hi, version_lo;
- u64 ts, last_ts;
- int status, discontinuity = 0;
- pid_t pid;
-
- if (running_on_xen(HVM,&version_hi,&version_lo)) {
- printf("running on Xen v%d.%d as an HVM domain, "
- "pvrdtsc not supported, exiting\n",
- (int)version_hi, (int)version_lo);
- exit(0);
- }
- pid = fork();
- if (pid == -1) {
- fprintf(stderr,"Huh? Fork failed\n");
- return 0;
- }
- else if (pid == 0) { /* child */
- pv_cpuid(0x40000000,0,&dummy,&dummy,&dummy,&dummy);
- exit(0);
- }
- waitpid(pid,&status,0);
- if (!WIFEXITED(status))
- exit(0);
- if (!running_on_xen(PVM,&version_hi,&version_lo)) {
- printf("not running on Xen, exiting\n");
- exit(0);
- }
- printf("running on Xen v%d.%d as a PV domain\n",
- (int)version_hi, (int)version_lo);
- if ( version_hi <= 3 ) {
- printf("pvrdtscp requires Xen version 4.0 or greater\n");
- /* exit(0); FIXME after xen-unstable is officially v4.0 */
- }
- if ( get_xen_tsc_mode() != 3 )
- printf("tsc_mode not pvrdtscp, set tsc_mode=3, exiting\n");
-
- /* OK, we are on Xen, now loop forever checking timestamps */
- ts = get_pvrdtscp_timestamp(&discontinuity);
- printf("Starting with ts=%lluns 0x%llx (%llusec)\n",ts,ts,ts/NSEC_PER_SEC);
- printf("incarn=%d: vtsc=%d, vtsc_khz=%lu, phys cpu_khz=%lu\n",
- (unsigned long)get_xen_incarnation(),
- (unsigned long)get_xen_vtsc(),
- (unsigned long)get_xen_vtsc_khz(),
- (unsigned long)get_xen_cpu_khz());
- ts = get_pvrdtscp_timestamp(&discontinuity);
- last_ts = ts;
- while (1) {
- ts = get_pvrdtscp_timestamp(&discontinuity);
- if (discontinuity)
- printf("migrated/restored, incarn=%d: "
- "vtsc now %d, vtsc_khz=%lu, phys cpu_khz=%lu\n",
- (unsigned long)get_xen_incarnation(),
- (unsigned long)get_xen_vtsc(),
- (unsigned long)get_xen_vtsc_khz(),
- (unsigned long)get_xen_cpu_khz());
- if (ts < last_ts)
- /* this should NEVER happen, especially since there
- * is a check for it in get_pvrdtscp_timestamp() */
- printf("Time went backwards: %lluns (%llusec)\n",
- last_ts-ts,(last_ts-ts)/NSEC_PER_SEC);
- if (ts > last_ts + 200000000LL)
- /* this is OK, usually about 2sec for save/restore
- * and a fraction of a second for live migrate */
- printf("Time jumped forward %lluns (%llusec)\n",
- ts-last_ts,(ts-last_ts)/NSEC_PER_SEC);
- last_ts = ts;
- }
-}
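The inline assembly in the deleted scale_delta() computes the pvclock fixed-point conversion ns = ((delta << shift) * mul_frac) >> 32, keeping the low 64 bits, where mul_frac is a 32-bit binary fraction. A portable C sketch of the same arithmetic, offered only as an illustration (scale_delta_portable is a hypothetical name, not part of Xen), splits the shifted delta into 32-bit halves so that no intermediate product overflows:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical portable equivalent of the removed scale_delta():
 * ns = ((delta << shift) * mul_frac) >> 32, truncated to 64 bits,
 * with mul_frac interpreted as a 0.32 fixed-point fraction. */
static uint64_t scale_delta_portable(uint64_t delta, uint32_t mul_frac,
                                     int shift)
{
    assert(shift > -64 && shift < 64);  /* avoid undefined shift counts */
    if (shift < 0)
        delta >>= -shift;
    else
        delta <<= shift;

    /* (hi * 2^32 + lo) * m >> 32 == hi * m + ((lo * m) >> 32);
     * each product fits in 64 bits because all factors are < 2^32. */
    uint64_t lo = (uint32_t)delta;
    uint64_t hi = delta >> 32;
    return hi * mul_frac + ((lo * mul_frac) >> 32);
}
```

For example, a 2 GHz TSC corresponds to mul_frac = 1u << 31 with shift = 0, so 2000000000 ticks scale to 1000000000 ns; a 1 GHz TSC needs shift = 1 with the same mul_frac, because a 0.32 fraction cannot represent 1.0.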
diff --git a/tools/libxl/libxl_x86.c b/tools/libxl/libxl_x86.c
index c04fd75a64..c0f88a7eaa 100644
--- a/tools/libxl/libxl_x86.c
+++ b/tools/libxl/libxl_x86.c
@@ -309,12 +309,19 @@ int libxl__arch_domain_create(libxl__gc *gc, libxl_domain_config *d_config,
tsc_mode = 2;
break;
case LIBXL_TSC_MODE_NATIVE_PARAVIRT:
- tsc_mode = 3;
- break;
+ LOGD(ERROR, domid, "TSC Mode native_paravirt (a.k.a PVRDTSCP) has been removed");
+ ret = ERROR_FEATURE_REMOVED;
+ goto out;
default:
abort();
}
- xc_domain_set_tsc_info(ctx->xch, domid, tsc_mode, 0, 0, 0);
+
+ if (xc_domain_set_tsc_info(ctx->xch, domid, tsc_mode, 0, 0, 0)) {
+ LOGE(ERROR, "xc_domain_set_tsc_info() failed");
+ ret = ERROR_FAIL;
+ goto out;
+ }
+
if (libxl_defbool_val(d_config->b_info.disable_migrate))
xc_domain_disable_migrate(ctx->xch, domid);
rtc_timeoffset = d_config->b_info.rtc_timeoffset;
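The control flow of the libxl change above (reject the removed mode with a specific error before any hypercall is made, and stop ignoring the return value of xc_domain_set_tsc_info()) can be illustrated with a self-contained sketch. Everything here is hypothetical: the CFG_TSC_* enum, the TSC_* return codes and the set_tsc callback stand in for LIBXL_TSC_MODE_*, ERROR_FEATURE_REMOVED/ERROR_FAIL and the real libxc call, and the two stub functions exist only to exercise the sketch.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins; real libxl names and values differ. */
enum cfg_tsc_mode {
    CFG_TSC_DEFAULT,
    CFG_TSC_ALWAYS_EMULATE,
    CFG_TSC_NATIVE,
    CFG_TSC_NATIVE_PARAVIRT, /* a.k.a. PVRDTSCP, now removed */
};

enum { TSC_OK = 0, TSC_ERR_FEATURE_REMOVED = -1, TSC_ERR_FAIL = -2 };

/* Stand-in for the set_tsc hypercall wrapper: non-zero means failure. */
typedef int (*set_tsc_fn)(unsigned int domid, unsigned int tsc_mode);

static int apply_tsc_mode(unsigned int domid, enum cfg_tsc_mode mode,
                          set_tsc_fn set_tsc)
{
    unsigned int tsc_mode;

    assert(set_tsc != NULL);

    switch (mode) {
    case CFG_TSC_DEFAULT:        tsc_mode = 0; break;
    case CFG_TSC_ALWAYS_EMULATE: tsc_mode = 1; break;
    case CFG_TSC_NATIVE:         tsc_mode = 2; break;
    case CFG_TSC_NATIVE_PARAVIRT:
        /* Fail early with a specific error; tsc_mode=3 is never sent. */
        fprintf(stderr, "d%u: TSC mode native_paravirt (a.k.a. PVRDTSCP) "
                "has been removed\n", domid);
        return TSC_ERR_FEATURE_REMOVED;
    default:
        return TSC_ERR_FAIL;  /* libxl abort()s on unknown modes */
    }

    /* The result is now checked instead of silently ignored. */
    if (set_tsc(domid, tsc_mode)) {
        fprintf(stderr, "d%u: setting TSC mode %u failed\n", domid, tsc_mode);
        return TSC_ERR_FAIL;
    }
    return TSC_OK;
}

/* Demo stubs (hypothetical) that count how often the "hypercall" runs. */
static unsigned int stub_calls;

static int stub_set_tsc_ok(unsigned int domid, unsigned int tsc_mode)
{
    (void)domid; (void)tsc_mode;
    stub_calls++;
    return 0;
}

static int stub_set_tsc_fail(unsigned int domid, unsigned int tsc_mode)
{
    (void)domid; (void)tsc_mode;
    stub_calls++;
    return -1;
}
```

With these stubs, selecting CFG_TSC_NATIVE_PARAVIRT returns TSC_ERR_FEATURE_REMOVED without touching the stub, while a failing stub turns a previously ignored error into TSC_ERR_FAIL.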
diff --git a/tools/python/xen/lowlevel/xc/xc.c b/tools/python/xen/lowlevel/xc/xc.c
index 484b790c75..cc8175a11e 100644
--- a/tools/python/xen/lowlevel/xc/xc.c
+++ b/tools/python/xen/lowlevel/xc/xc.c
@@ -2439,7 +2439,7 @@ static PyMethodDef pyxc_methods[] = {
"Set a domain's TSC mode\n"
" dom [int]: Domain whose TSC mode is being set.\n"
" tsc_mode [int]: 0=default (monotonic, but native where possible)\n"
- " 1=always emulate 2=never emulate 3=pvrdtscp\n"
+ " 1=always emulate 2=never emulate\n"
"Returns: [int] 0 on success; -1 on error.\n" },
{ "domain_disable_migrate",
--
generated by git-patchbot for /home/xen/git/xen.git#staging
_______________________________________________
Xen-changelog mailing list
Xen-changelog@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/xen-changelog