[Xen-devel] [RESEND PATCH v3] Introduce runstate area registration with phys address
From: Andrii Anisov <andrii_anisov@xxxxxxxx>
---
Resending the series, because the previous attempt went out with mangled threading.
---
Following the discussion [1], this series introduces and implements a runstate
registration interface which uses a guest physical address instead of a virtual one.
The new hypercall employs the same data structures as its predecessor, but
expects the vcpu_runstate_info structure to not cross a page boundary.
The interface is implemented so that the vcpu_runstate_info structure is mapped
into the hypervisor while the hypercall is processed and is accessed directly
during runstate updates. This runstate area mapping follows the approach used
for vcpu_info structure registration.
A permanent mapping of the runstate area consumes vmap area on arm32, which is
limited to 1G. This might still be acceptable, because the arm32 virtual address
area could be enlarged by reworking the address space.
The series is tested on ARM64 and build-tested for x86. I'd appreciate it if
someone could check it on x86.
The corresponding Linux kernel patch is available at [2], though it is for 4.14.
I am not yet fully convinced of that patch's correctness; it still needs to be
better aligned with the Linux community.
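For illustration, below is a minimal guest-side sketch (not the actual kernel
patch [2]) of how a Linux guest could register its runstate area by guest
physical address under the proposed interface. It assumes, per the description
above, that the existing struct vcpu_register_runstate_memory_area is reused
with its physical-address union member; the hypercall name comes from the patch
below, while register_runstate_phys() and the single static buffer are
illustrative simplifications (a real kernel would use one buffer per vCPU).

    #include <linux/kernel.h>
    #include <linux/mm.h>
    #include <asm/xen/hypercall.h>
    #include <xen/interface/vcpu.h>

    /*
     * Page-aligned buffer: guarantees that vcpu_runstate_info never crosses
     * a page boundary, as the new interface requires.  One buffer per vCPU
     * would be used in practice; a single one is shown for brevity.
     */
    static struct vcpu_runstate_info xen_runstate __aligned(PAGE_SIZE);

    /* Illustrative helper, not part of the actual patch. */
    static int register_runstate_phys(unsigned int vcpu)
    {
        struct vcpu_register_runstate_memory_area area = {};

        /* Same data structure as the old interface, but the union member
         * carrying a guest physical address is filled in. */
        area.addr.p = __pa(&xen_runstate);

        return HYPERVISOR_vcpu_op(VCPUOP_register_runstate_phys_memory_area,
                                  vcpu, &area);
    }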
Changes in:
v3: This version again implements the mapping-on-init approach.
Patches are squashed and refactored so that the virt and phys interfaces
do not function simultaneously but replace one another on init.
In order to measure the performance impact of permanent mapping vs mapping
on access, two RFC patches were written which follow the mapping-on-access
approach, with a small difference between them (a hedged sketch of the
update paths follows this changelog entry):
- RFC 1 - uses copy_to_guest_phys_flush_dcache() for each access to the
runstate area.
- RFC 2 - maps the runstate area before all update manipulations and
unmaps it afterwards.
The RFC patches were implemented for ARM only, because the performance
measurements were done on an ARM64 machine.
Performance measurements of the approaches (runstate mapped on access vs
mapped on registration) were made. The test setup is as follows:
Thin Dom0 (Linux with initramfs) with DomD running a rich Yocto Linux. 3D
benchmark numbers in DomD are compared. The benchmark is GLMark2. GLMark2
is run at different resolutions in order to produce different IRQ loads,
where 320x240 produces a high IRQ load and 1920x1080 produces a low IRQ load.
Separately, the DomD benchmark run was tested with a primitive Dom0 CPU
burner (dd) running alongside, in order to stimulate VCPU(dX)->VCPU(dY)
switches rather than VCPU(dX)->idle->VCPU(dX).
The results are as follows:
                      mapped     mapped       mapped
                      on init    on access    on update
                                 RFC 1        RFC 2
GLMark2 320x240       2906       2856 (-2%)   2903 (0%)
 +Dom0 CPUBurn        2166       2106 (-3%)   2134 (-1%)
GLMark2 800x600       2396       2367 (-1%)   2393 (0%)
 +Dom0 CPUBurn        1958       1911 (-2%)   1942 (-1%)
GLMark2 1920x1080      939        936 (0%)     935 (0%)
 +Dom0 CPUBurn         909        901 (-1%)    907 (0%)
The IRQ latency difference was also checked using TBM in a setup similar to
[5]. Please note that the IRQ rate is one per 30 seconds, and only the
VCPU->idle->VCPU use-case is considered. The results are as follows (in ns,
timer granularity 120 ns):
mapped on init:
max=10080 warm_max=8760 min=6600 avg=6699
mapped on update (RFC1):
max=10440 warm_max=7560 min=7320 avg=7419
mapped on access (RFC2):
max=11520 warm_max=7920 min=7200 avg=7299
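To make the compared variants more concrete, here is a hedged, ARM-oriented
sketch (not the series' or the RFCs' actual code) of what the runstate update
could look like in each case. The function names and the
runstate_hyp_va/runstate_gpa parameters are illustrative placeholders;
copy_to_guest_phys_flush_dcache(), __map_domain_page()/unmap_domain_page(),
get_page_from_gfn() and put_page() are existing Xen primitives. Error handling
and the XEN_RUNSTATE_UPDATE protocol are omitted for brevity.

    #include <xen/domain_page.h>
    #include <xen/lib.h>
    #include <xen/mm.h>
    #include <xen/sched.h>
    #include <asm/guest_access.h>
    #include <asm/p2m.h>

    /* "Mapped on init": the guest page was mapped at registration time and
     * the mapping (runstate_hyp_va, a placeholder for a per-vCPU field) is
     * written directly on every update. */
    static void update_runstate_mapped_on_init(struct vcpu *v,
                                               void *runstate_hyp_va)
    {
        memcpy(runstate_hyp_va, &v->runstate, sizeof(v->runstate));
    }

    /* RFC 1: copy by guest physical address on every update; this ARM-only
     * helper maps, copies and cache-flushes internally. */
    static void update_runstate_rfc1(struct vcpu *v, paddr_t runstate_gpa)
    {
        copy_to_guest_phys_flush_dcache(v->domain, runstate_gpa,
                                        &v->runstate, sizeof(v->runstate));
    }

    /* RFC 2: map the guest page around the update and unmap it afterwards. */
    static void update_runstate_rfc2(struct vcpu *v, paddr_t runstate_gpa)
    {
        p2m_type_t t;
        struct page_info *page = get_page_from_gfn(v->domain,
                                                    runstate_gpa >> PAGE_SHIFT,
                                                    &t, P2M_ALLOC);
        void *va;

        if ( !page )
            return;

        va = __map_domain_page(page);
        memcpy((char *)va + (runstate_gpa & ~PAGE_MASK),
               &v->runstate, sizeof(v->runstate));
        unmap_domain_page(va);
        put_page(page);
    }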
v2: The new runstate interface implementation was reconsidered. The new
interface is made independent of the old one: it does not share the
runstate_area field, and consequently avoids excessive concurrency with
usage of the old runstate interface.
Locks were introduced in order to resolve possible concurrency between
runstate area registration and usage.
Comments from Jan Beulich [3][4] about coding style nits were addressed,
though some of them became obsolete with the refactoring, and a few are
picked into this thread for further discussion.
Performance measurements of the approaches (runstate mapped on access vs
mapped on registration) were made, with the same test setup as described
above for v3 (GLMark2 in DomD at different resolutions, optionally with a
Dom0 CPU burner), with the following results:
                      mapped      mapped
                      on access   on init
GLMark2 320x240       2852        2877 (+0.8%)
 +Dom0 CPUBurn        2088        2094 (+0.2%)
GLMark2 800x600       2368        2375 (+0.3%)
 +Dom0 CPUBurn        1868        1921 (+2.8%)
GLMark2 1920x1080      931         931 (0%)
 +Dom0 CPUBurn         892         894 (+0.2%)
Please note that "mapped on access" means using the old runstate
registering interface. And runstate update in this case still often fails
to map runstate area like [5], despite the fact that our Linux kernel
does not have KPTI enabled. So runstate area update, in this case, is
really shortened.
The IRQ latency difference was also checked using TBM in a setup similar to
[5], under the same conditions as described above (IRQ rate of one per 30
seconds, only the VCPU->idle->VCPU use-case considered). The results are as
follows (in ns, timer granularity 120 ns):
mapped on access:
max=9960 warm_max=8640 min=7200 avg=7626
mapped on init:
max=9480 warm_max=8400 min=7080 avg=7341
Unfortunately, there are no consistent results yet from profiling with
Lauterbach PowerTrace. We are still in communication with the tracer vendor
in order to set up the proper configuration.
[1] https://lists.xenproject.org/archives/html/xen-devel/2019-02/msg00416.html
[2] https://github.com/aanisov/linux/commit/ba34d2780f57ea43f81810cd695aace7b55c0f29
[3] https://lists.xenproject.org/archives/html/xen-devel/2019-03/msg00936.html
[4] https://lists.xenproject.org/archives/html/xen-devel/2019-03/msg00934.html
[5] https://lists.xenproject.org/archives/html/xen-devel/2019-01/msg02369.html
[6] https://lists.xenproject.org/archives/html/xen-devel/2018-12/msg02297.html
Andrii Anisov (1):
xen: introduce VCPUOP_register_runstate_phys_memory_area hypercall
xen/arch/arm/domain.c | 58 ++++++++++++++++++---
xen/arch/x86/domain.c | 99 ++++++++++++++++++++++++++++++++---
xen/arch/x86/x86_64/domain.c | 16 +++++-
xen/common/domain.c | 121 +++++++++++++++++++++++++++++++++++++++----
xen/include/public/vcpu.h | 15 ++++++
xen/include/xen/sched.h | 28 +++++++---
6 files changed, 306 insertions(+), 31 deletions(-)
--
2.7.4