[RFC PATCH] x86/p2m-pt: do type recalculations with p2m read lock
Global p2m type recalculations (as triggered by logdirty) can create
so much contention on the p2m lock that simple guest operations like
VCPUOP_set_singleshot_timer on guests with a high number of vCPUs (32)
will cease to work in a timely manner, up to the point that Linux
kernel versions that still use the VCPU_SSHOTTMR_future flag with the
singleshot timer will cease to work:
[ 82.779470] CE: xen increased min_delta_ns to 1000000 nsec
[ 82.793075] CE: Reprogramming failure. Giving up
[ 82.779470] CE: Reprogramming failure. Giving up
[ 82.821864] CE: xen increased min_delta_ns to 506250 nsec
[ 82.821864] CE: xen increased min_delta_ns to 759375 nsec
[ 82.821864] CE: xen increased min_delta_ns to 1000000 nsec
[ 82.821864] CE: Reprogramming failure. Giving up
[ 82.856256] CE: Reprogramming failure. Giving up
[ 84.566279] CE: Reprogramming failure. Giving up
[ 84.649493] Freezing user space processes ...
[ 130.604032] INFO: rcu_sched detected stalls on CPUs/tasks: { 14} (detected
by 10, t=60002 jiffies, g=4006, c=4005, q=14130)
[ 130.604032] Task dump for CPU 14:
[ 130.604032] swapper/14 R running task 0 0 1 0x00000000
[ 130.604032] Call Trace:
[ 130.604032] [<ffffffff90160f5d>] ? rcu_eqs_enter_common.isra.30+0x3d/0xf0
[ 130.604032] [<ffffffff907b9bde>] ? default_idle+0x1e/0xd0
[ 130.604032] [<ffffffff90039570>] ? arch_cpu_idle+0x20/0xc0
[ 130.604032] [<ffffffff9010820a>] ? cpu_startup_entry+0x14a/0x1e0
[ 130.604032] [<ffffffff9005d3a7>] ? start_secondary+0x1f7/0x270
[ 130.604032] [<ffffffff900000d5>] ? start_cpu+0x5/0x14
[ 549.654536] INFO: rcu_sched detected stalls on CPUs/tasks: { 26} (detected
by 24, t=60002 jiffies, g=6922, c=6921, q=7013)
[ 549.655463] Task dump for CPU 26:
[ 549.655463] swapper/26 R running task 0 0 1 0x00000000
[ 549.655463] Call Trace:
[ 549.655463] [<ffffffff90160f5d>] ? rcu_eqs_enter_common.isra.30+0x3d/0xf0
[ 549.655463] [<ffffffff907b9bde>] ? default_idle+0x1e/0xd0
[ 549.655463] [<ffffffff90039570>] ? arch_cpu_idle+0x20/0xc0
[ 549.655463] [<ffffffff9010820a>] ? cpu_startup_entry+0x14a/0x1e0
[ 549.655463] [<ffffffff9005d3a7>] ? start_secondary+0x1f7/0x270
[ 549.655463] [<ffffffff900000d5>] ? start_cpu+0x5/0x14
[ 821.888478] INFO: rcu_sched detected stalls on CPUs/tasks: { 26} (detected
by 24, t=60002 jiffies, g=8499, c=8498, q=7664)
[ 821.888596] Task dump for CPU 26:
[ 821.888622] swapper/26 R running task 0 0 1 0x00000000
[ 821.888677] Call Trace:
[ 821.888712] [<ffffffff90160f5d>] ? rcu_eqs_enter_common.isra.30+0x3d/0xf0
[ 821.888771] [<ffffffff907b9bde>] ? default_idle+0x1e/0xd0
[ 821.888818] [<ffffffff90039570>] ? arch_cpu_idle+0x20/0xc0
[ 821.888865] [<ffffffff9010820a>] ? cpu_startup_entry+0x14a/0x1e0
[ 821.888917] [<ffffffff9005d3a7>] ? start_secondary+0x1f7/0x270
[ 821.888966] [<ffffffff900000d5>] ? start_cpu+0x5/0x14
This is obviously undesirable. One way to bodge the issue would be to
ignore VCPU_SSHOTTMR_future, but that's a deliberate breakage of the
hypercall ABI.
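For readers puzzling over the "CE:" lines in the log, the following is a minimal sketch of the back-off the guest's clockevents code appears to apply when programming the timer fails with the deadline already in the past. It is modelled on Linux's clockevents_increase_min_delta() but is not a copy of it: the function name, the DEMO_ prefix and the 1000000 ns cap (inferred from the log above) are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: cap inferred from the "increased min_delta_ns to 1000000" lines. */
#define DEMO_MIN_DELTA_LIMIT_NS 1000000ULL

/*
 * Hypothetical helper: grow the minimum programmable delta by 50% after a
 * failed timer reprogram. Returns 0 if the delta was enlarged, -1 once the
 * cap has already been reached (the "Reprogramming failure. Giving up" case).
 */
int demo_increase_min_delta(uint64_t *min_delta_ns)
{
    if ( *min_delta_ns >= DEMO_MIN_DELTA_LIMIT_NS )
        return -1;

    if ( *min_delta_ns < 5000 )
        *min_delta_ns = 5000;
    else
        *min_delta_ns += *min_delta_ns >> 1;

    if ( *min_delta_ns > DEMO_MIN_DELTA_LIMIT_NS )
        *min_delta_ns = DEMO_MIN_DELTA_LIMIT_NS;

    return 0;
}
```

Starting from 506250 ns, two calls yield 759375 ns and then the 1000000 ns cap, matching the sequence in the log. A third call fails, which is the point where the guest gives up on the timer.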
Instead lower the contention on the lock by doing the recalculation
with the lock in read mode. This is safe because only the flags/type
are changed: there's no PTE mfn change in the AMD recalculation logic.
The Intel (EPT) case is likely more complicated, as superpage
splitting for diverging EMT values must be done with the p2m lock
taken in write mode.
Signed-off-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>
---
I'm unsure whether such a modification is fully safe. I think changing
the flags/type should be fine: the PTE write is performed using
safwrite_p2m_entry() which must be atomic (as the guest is still
running and accessing the page tables). I'm slightly worried about
PTE readers not all using atomic accesses (i.e. the pointer
returned by p2m_find_entry() should be read atomically), and about code
assuming that a gfn type cannot change while holding the p2m lock in
read mode.
Wanted to post early in case someone knows any showstoppers with this
approach that make it a no-go, before I try to further evaluate
users.
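The atomicity concern above can be sketched with C11 atomics. This is only an illustration under stated assumptions, not Xen code: demo_pte_t, DEMO_FLAGS_MASK (low 12 bits as flags/type) and both helpers are hypothetical. The point is that a writer changing only the flag bits and a reader taking a single atomic snapshot never observe a half-updated entry, even when several holders of the read lock touch the same entry.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical layout: low 12 bits carry flags/type, the rest the frame. */
#define DEMO_FLAGS_MASK 0xfffULL

typedef _Atomic uint64_t demo_pte_t;

/* Reader: one atomic load, so flags and frame come from the same write. */
uint64_t demo_pte_read(demo_pte_t *pte)
{
    return atomic_load(pte);
}

/*
 * Writer: replace only the flag bits and keep the frame bits. The
 * compare-exchange loop retries if another updater changed the entry
 * between the load and the store.
 */
uint64_t demo_pte_set_flags(demo_pte_t *pte, uint64_t new_flags)
{
    uint64_t old = atomic_load(pte);
    uint64_t updated;

    do {
        updated = (old & ~DEMO_FLAGS_MASK) | (new_flags & DEMO_FLAGS_MASK);
    } while ( !atomic_compare_exchange_weak(pte, &old, updated) );

    return updated;
}
```

A reader that instead dereferenced the entry with plain, possibly split, loads could in principle see a mix of old and new bits, which is the case the note above is worried about.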
---
xen/arch/x86/mm/p2m-pt.c | 7 ++-----
1 file changed, 2 insertions(+), 5 deletions(-)
diff --git a/xen/arch/x86/mm/p2m-pt.c b/xen/arch/x86/mm/p2m-pt.c
index cd1af33b67..f145647f01 100644
--- a/xen/arch/x86/mm/p2m-pt.c
+++ b/xen/arch/x86/mm/p2m-pt.c
@@ -486,9 +486,6 @@ static int cf_check do_recalc(struct p2m_domain *p2m, unsigned long gfn)
         p2m_type_t ot, nt;
         unsigned long mask = ~0UL << (level * PAGETABLE_ORDER);
 
-        if ( !valid_recalc(l1, e) )
-            P2M_DEBUG("bogus recalc leaf at d%d:%lx:%u\n",
-                      p2m->domain->domain_id, gfn, level);
         ot = p2m_flags_to_type(l1e_get_flags(e));
         nt = p2m_recalc_type_range(true, ot, p2m, gfn & mask, gfn | ~mask);
         if ( nt != ot )
@@ -538,9 +535,9 @@ int p2m_pt_handle_deferred_changes(uint64_t gpa)
      */
     ASSERT(!altp2m_active(current->domain));
 
-    p2m_lock(p2m);
+    p2m_read_lock(p2m);
     rc = do_recalc(p2m, PFN_DOWN(gpa));
-    p2m_unlock(p2m);
+    p2m_read_unlock(p2m);
 
     return rc;
 }
--
2.40.0