[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH 13/22] x86/hvm: use a per-pCPU monitor table in HAP mode
On Fri Aug 16, 2024 at 7:02 PM BST, Alejandro Vallejo wrote: > On Fri Jul 26, 2024 at 4:21 PM BST, Roger Pau Monne wrote: > > Instead of allocating a monitor table for each vCPU when running in HVM HAP > > mode, use a per-pCPU monitor table, which gets the per-domain slot updated > > on > > guest context switch. > > > > This limits the amount of memory used for HVM HAP monitor tables to the > > amount > > of active pCPUs, rather than to the number of vCPUs. It also simplifies > > vCPU > > allocation and teardown, since the monitor table handling is removed from > > there. > > > > Note the switch to using a per-CPU monitor table is done regardless of > > whether > > s/per-CPU/per-pCPU/ > > > Address Space Isolation is enabled or not. Partly for the memory usage > > reduction, and also because it allows to simplify the VM tear down path by > > not > > having to cleanup the per-vCPU monitor tables. > > > > Signed-off-by: Roger Pau Monné <roger.pau@xxxxxxxxxx> > > --- > > Note the monitor table is not made static because uses outside of the file > > where it's defined will be added by further patches. > > --- > > xen/arch/x86/hvm/hvm.c | 60 ++++++++++++++++++++++++ > > xen/arch/x86/hvm/svm/svm.c | 5 ++ > > xen/arch/x86/hvm/vmx/vmcs.c | 1 + > > xen/arch/x86/hvm/vmx/vmx.c | 4 ++ > > xen/arch/x86/include/asm/hap.h | 1 - > > xen/arch/x86/include/asm/hvm/hvm.h | 8 ++++ > > xen/arch/x86/mm.c | 8 ++++ > > xen/arch/x86/mm/hap/hap.c | 75 ------------------------------ > > xen/arch/x86/mm/paging.c | 4 +- > > 9 files changed, 87 insertions(+), 79 deletions(-) > > > > diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c > > index 7f4b627b1f5f..3f771bc65677 100644 > > --- a/xen/arch/x86/hvm/hvm.c > > +++ b/xen/arch/x86/hvm/hvm.c > > @@ -104,6 +104,54 @@ static const char __initconst warning_hvm_fep[] = > > static bool __initdata opt_altp2m_enabled; > > boolean_param("altp2m", opt_altp2m_enabled); > > > > +DEFINE_PER_CPU(root_pgentry_t *, monitor_pgt); > > + > > +static int allocate_cpu_monitor_table(unsigned int cpu) > > To avoid ambiguity, could we call these *_pcpu_*() instead? > > > +{ > > + root_pgentry_t *pgt = alloc_xenheap_page(); > > + > > + if ( !pgt ) > > + return -ENOMEM; > > + > > + clear_page(pgt); > > + > > + init_xen_l4_slots(pgt, _mfn(virt_to_mfn(pgt)), INVALID_MFN, NULL, > > + false, true, false); > > + > > + ASSERT(!per_cpu(monitor_pgt, cpu)); > > + per_cpu(monitor_pgt, cpu) = pgt; > > + > > + return 0; > > +} > > + > > +static void free_cpu_monitor_table(unsigned int cpu) > > +{ > > + root_pgentry_t *pgt = per_cpu(monitor_pgt, cpu); > > + > > + if ( !pgt ) > > + return; > > + > > + per_cpu(monitor_pgt, cpu) = NULL; > > + free_xenheap_page(pgt); > > +} > > + > > +void hvm_set_cpu_monitor_table(struct vcpu *v) > > +{ > > + root_pgentry_t *pgt = this_cpu(monitor_pgt); > > + > > + ASSERT(pgt); > > + > > + setup_perdomain_slot(v, pgt); > > Why not modify them as part of write_ptbase() instead? As it stands, it > appears > to be modifying the PTEs of what may very well be our current PT, which makes > the perdomain slot be in a $DEITY-knows-what state until the next flush > (presumably the write to cr3 in write_ptbase()?; assuming no PCIDs). > > Setting the slot up right before the cr3 change should reduce the potential > for > misuse. > > > + > > + make_cr3(v, _mfn(virt_to_mfn(pgt))); > > +} > > + > > +void hvm_clear_cpu_monitor_table(struct vcpu *v) > > +{ > > + /* Poison %cr3, it will be updated when the vCPU is scheduled. */ > > + make_cr3(v, INVALID_MFN); > > I think this would benefit from more exposition in the comment. If I'm getting > this right, after descheduling this vCPU we can't assume it'll be rescheduled > on the same pCPU, and if it's not it'll end up using a different monitor > table. > This poison value is meant to highlight forgetting to set cr3 in the > "ctxt_switch_to()" path. > > All of that can be deduced from what you wrote and sufficient headscratching > but seeing how this is invoked from the context switch path it's not > incredibly > clear wether you meant the perdomain slot would be updated by the next vCPU or > what I stated in the previous paragraph. > > Assuming it is as I mentioned, maybe hvm_forget_cpu_monitor_table() would > convey what it does better? i.e: the vCPU forgets/unbinds the monitor table > from its internal state. > > Cheers, > Alejandro After playing with the code for a while I'm becoming increasingly convinced that we don't want to tie hvm_clear_cpu_monitor_table() to the ctx_switch_to handlers at all. In __context_switch() we would ideally like to delay restoring the state until after said state is available in the page tables (i.e: after write_ptbase()). With that division we can do saves and restores with far less headaches as we can assume that the pcpu fixmap always contains the relevant data. Cheers, Alejandro
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |