Re: [PATCH v3 03/20] xen/riscv: introduce VMID allocation and management
On 31.07.2025 17:58, Oleksii Kurochko wrote:
> Current implementation is based on x86's way to allocate VMIDs:
> VMIDs partition the physical TLB. In the current implementation VMIDs are
> introduced to reduce the number of TLB flushes. Each time the guest's
> virtual address space changes,
virtual?
> instead of flushing the TLB, a new VMID is
> assigned. This reduces the number of TLB flushes to at most 1/#VMIDs.
> The biggest advantage is that hot parts of the hypervisor's code and data
> retain in the TLB.
>
> VMIDs are a hart-local resource. As preemption of VMIDs is not possible,
> VMIDs are assigned in a round-robin scheme. To minimize the overhead of
> VMID invalidation, at the time of a TLB flush, VMIDs are tagged with a
> 64-bit generation. Only on a generation overflow the code needs to
> invalidate all VMID information stored at the VCPUs with are run on the
> specific physical processor. This overflow appears after about 2^80
> host processor cycles,
Where's this number coming from? (If you provide numbers, I think they will
want to be "reproducible" by the reader. Which I fear isn't the case here.)
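(Presumably the arithmetic behind it: with at most 2^16 VMIDs per generation
and a 64-bit generation counter, and assuming a new VMID can't be consumed
faster than once per cycle, an overflow takes at least 2^16 * 2^64 = 2^80
cycles. If that's the intended derivation, spelling it out would make the
number reproducible.)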
> @@ -148,6 +149,8 @@ void __init noreturn start_xen(unsigned long bootcpu_id,
>
> console_init_postirq();
>
> + vmid_init();
This lives here only temporarily, I assume? Every hart will need to execute
it, and hence (like we have it on x86) this may want to be a central place
elsewhere.
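A sketch of one possible shape, using the generic CPU notifier machinery
(the initcall name is made up, and whether CPU_STARTING is the right phase
on RISC-V would need checking):

    static int cpu_callback(struct notifier_block *nfb, unsigned long action,
                            void *hcpu)
    {
        /* CPU_STARTING runs on the hart being brought up, as needed here. */
        if ( action == CPU_STARTING )
            vmid_init();

        return NOTIFY_DONE;
    }

    static struct notifier_block cpu_nfb = {
        .notifier_call = cpu_callback,
    };

    static int __init vmid_presmp_init(void)
    {
        vmid_init();                     /* Boot hart. */
        register_cpu_notifier(&cpu_nfb);

        return 0;
    }
    presmp_initcall(vmid_presmp_init);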
> --- /dev/null
> +++ b/xen/arch/riscv/vmid.c
> @@ -0,0 +1,165 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +
> +#include <xen/domain.h>
> +#include <xen/init.h>
> +#include <xen/sections.h>
> +#include <xen/lib.h>
> +#include <xen/param.h>
> +#include <xen/percpu.h>
> +
> +#include <asm/atomic.h>
> +#include <asm/csr.h>
> +#include <asm/flushtlb.h>
> +
> +/* Xen command-line option to enable VMIDs */
> +static bool __read_mostly opt_vmid_enabled = true;
__ro_after_init ?
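I.e. presumably just:

    static bool __ro_after_init opt_vmid_enabled = true;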
> +boolean_param("vmid", opt_vmid_enabled);
> +
> +/*
> + * VMIDs partition the physical TLB. In the current implementation VMIDs are
> + * introduced to reduce the number of TLB flushes. Each time the guest's
> + * virtual address space changes, instead of flushing the TLB, a new VMID is
The same odd "virtual" again? All the code here is about guest-physical, isn't
it?
> + * assigned. This reduces the number of TLB flushes to at most 1/#VMIDs.
> + * The biggest advantage is that hot parts of the hypervisor's code and data
> + * retain in the TLB.
> + *
> + * Sketch of the Implementation:
> + *
> + * VMIDs are a hart-local resource. As preemption of VMIDs is not possible,
> + * VMIDs are assigned in a round-robin scheme. To minimize the overhead of
> + * VMID invalidation, at the time of a TLB flush, VMIDs are tagged with a
> + * 64-bit generation. Only on a generation overflow the code needs to
> + * invalidate all VMID information stored at the VCPUs with are run on the
> + * specific physical processor. This overflow appears after about 2^80
And the same interesting number again.
> + * host processor cycles, so we do not optimize this case, but simply disable
> + * VMID useage to retain correctness.
> + */
> +
> +/* Per-Hart VMID management. */
> +struct vmid_data {
> + uint64_t hart_vmid_generation;
Any reason not to simply use "generation"?
> + uint16_t next_vmid;
> + uint16_t max_vmid;
> + bool disabled;
> +};
> +
> +static DEFINE_PER_CPU(struct vmid_data, vmid_data);
> +
> +static unsigned long vmidlen_detect(void)
__init ? Or wait, are you (deliberately) permitting different VMIDLEN
across harts?
> +{
> + unsigned long vmid_bits;
Why "long" (also for the function return type)?
> + unsigned long old;
> +
> + /* Figure-out number of VMID bits in HW */
> + old = csr_read(CSR_HGATP);
> +
> + csr_write(CSR_HGATP, old | HGATP_VMID_MASK);
> + vmid_bits = csr_read(CSR_HGATP);
> + vmid_bits = MASK_EXTR(vmid_bits, HGATP_VMID_MASK);
Nit: Stray blank.
> + vmid_bits = flsl(vmid_bits);
> + csr_write(CSR_HGATP, old);
> +
> + /*
> + * We polluted local TLB so flush all guest TLB as
> + * a speculative access can happen at any time.
> + */
> + local_hfence_gvma_all();
There's no guest running. If you wrote hgatp.MODE as zero, as per my
understanding no new TLB entries could even purely theoretically appear.
In fact, with no guest running (yet) I'm having a hard time seeing why
you shouldn't be able to simply write the register with just
HGATP_VMID_MASK, i.e. without OR-ing in "old". It's even questionable
whether "old" needs restoring; writing plain zero afterwards ought to
suffice. You're in charge of the register, after all.
> + return vmid_bits;
> +}
> +
> +void vmid_init(void)
> +{
> + static bool g_disabled = false;
> + unsigned long vmid_len = vmidlen_detect();
> + struct vmid_data *data = &this_cpu(vmid_data);
> + unsigned long max_availalbe_bits = sizeof(data->max_vmid) << 3;
Nit: Typo in "available". Also now that we have it, better use
BITS_PER_BYTE here?
> + if ( vmid_len > max_availalbe_bits )
> + panic("%s: VMIDLEN is bigger then a type which represent VMID:
> %lu(%lu)\n",
> + __func__, vmid_len, max_availalbe_bits);
This shouldn't be a runtime check imo. What you want to check (at build
time) is that the bits set in HGATP_VMID_MASK can be held in ->max_vmid.
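For instance (a sketch; this relies on MASK_EXTR() of constants folding at
compile time):

    void vmid_init(void)
    {
        /* The widest VMID the ISA permits must fit in ->max_vmid. */
        BUILD_BUG_ON(MASK_EXTR(HGATP_VMID_MASK, HGATP_VMID_MASK) >
                     (uint16_t)~0);
        ...
    }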
> + data->max_vmid = BIT(vmid_len, U) - 1;
> + data->disabled = !opt_vmid_enabled || (vmid_len <= 1);
Actually, what exactly does it mean that "VMIDs are disabled"? There's
no enable bit that I could find anywhere. Isn't it rather that in this
case you need to arrange to flush always on VM entry (or always after a
P2M change, depending how the TLB is split between guest and host use)?
If you look at vmx_vmenter_helper(), its flipping of
SECONDARY_EXEC_ENABLE_VPID tweaks CPU behavior, such that the flush
would be implicit (when the bit is off). I don't expect RISC-V has any
such "implicit" flushing behavior?
> + if ( g_disabled != data->disabled )
> + {
> + printk("%s: VMIDs %sabled.\n", __func__,
> + data->disabled ? "dis" : "en");
> + if ( !g_disabled )
> + g_disabled = data->disabled;
This doesn't match x86 code. g_disabled is a tristate there, which only
the boot CPU would ever write to.
A clear shortcoming of the x86 code (that you copied) is that the log
message doesn't identify the CPU in question. A sequence of "disabled"
and "enabled" could thus result, without the last one (or in fact any
one) making clear what the overall state is. I think you want to avoid
this from the beginning.
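Concretely, perhaps (mirroring x86's tristate and naming the hart;
hypothetical adaptation):

    static int8_t g_disabled = -1; /* Unknown. */

    ...
        if ( data->disabled != g_disabled )
        {
            printk("%s: CPU%u: VMIDs %sabled\n", __func__,
                   smp_processor_id(), data->disabled ? "dis" : "en");
            if ( g_disabled < 0 )
                g_disabled = data->disabled;
        }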
Jan