
Re: [PATCH v4 4/7] xen/riscv: introduce functionality to work with CPU info



On Wed, 2024-08-14 at 17:22 +0200, Jan Beulich wrote:
> On 14.08.2024 16:45, oleksii.kurochko@xxxxxxxxx wrote:
> > On Tue, 2024-08-13 at 10:54 +0200, Jan Beulich wrote:
> > > On 09.08.2024 18:19, Oleksii Kurochko wrote:
> > > > --- a/xen/arch/riscv/include/asm/smp.h
> > > > +++ b/xen/arch/riscv/include/asm/smp.h
> > > > @@ -5,6 +5,8 @@
> > > >  #include <xen/cpumask.h>
> > > >  #include <xen/percpu.h>
> > > >  
> > > > +#define INVALID_HARTID UINT_MAX
> > > > +
> > > >  DECLARE_PER_CPU(cpumask_var_t, cpu_sibling_mask);
> > > >  DECLARE_PER_CPU(cpumask_var_t, cpu_core_mask);
> > > >  
> > > > @@ -14,6 +16,14 @@ DECLARE_PER_CPU(cpumask_var_t, cpu_core_mask);
> > > >   */
> > > >  #define park_offline_cpus false
> > > >  
> > > > +void smp_set_bootcpu_id(unsigned long boot_cpu_hartid);
> > > > +
> > > > +/*
> > > > + * Mapping between linux logical cpu index and hartid.
> > > > + */
> > > > +extern unsigned long cpuid_to_hartid_map[NR_CPUS];
> > > > +#define cpuid_to_hartid(cpu) cpuid_to_hartid_map[cpu]
> > > 
> > > How wide can hart IDs be? Wider than 32 bits? If not, why
> > > unsigned long?
> > According to the spec:
> > ```
> > The mhartid CSR is an MXLEN-bit read-only register containing the
> > integer ID of the hardware thread running the code
> > ```
> > where MXLEN can be 32 or 64.
> 
> Hmm, okay. If the full MXLEN bits can be used, then using
> unsigned long is indeed the right thing here.
> 
> > > > @@ -40,6 +41,19 @@ void __init noreturn start_xen(unsigned long bootcpu_id,
> > > >  {
> > > >      remove_identity_mapping();
> > > >  
> > > > +    /*
> > > > +     * The tp register contains the address of the physical CPU
> > > > +     * information, so write the boot CPU's info address to tp.
> > > > +     * It will be used later by get_processor_id() ( look at
> > > > +     * <asm/processor.h> ):
> > > > +     *   #define get_processor_id()    (tp->processor_id)
> > > > +     */
> > > > +    asm volatile ( "mv tp, %0" : : "r"((unsigned long)&pcpu_info[0]) );
> > > 
> > > Perhaps be on the safe side and also add a memory barrier?
> > Do you mean a compiler barrier? ( asm volatile ( "..." :: ... :
> > "memory" )? )
> 
> Yes.
> 
> > > Perhaps be on the safe side and do this absolutely first in the
> > > function, or even in assembly (such that initializers of future
> > > variables declared at the top of the function end up safe, too)?
> > I am not sure I am following the part about putting this code in
> > assembler ( instructions in assembly code could still be re-ordered,
> > which can affect when tp is set to the correct value )
> 
> I'm afraid I don't understand this. Instructions can be re-ordered,
> sure, but later instructions are guaranteed to observe the effects
> on register state that earlier instructions caused.
> 
> > and what do you mean by "initializers of future variables declared
> > at the top of the function end up safe"
> 
> With
> 
> void start_xen()
> {
>     int var = func();
> 
>     asm volatile ( "mv tp, %0" :: ...);
>     ...
> 
> you end up with the requirement that func() must not use anything
> that depends on tp being set. In this simple example it may be easy
> to say "don't use an initializer and call the function afterwards".
> But that is going to be a risky game to play. Look at x86'es
> __start_xen(). An insertion into such a big set of declarations may
> not pay attention to tp not being set yet, when _all_ other C code
> may reasonably assume tp is set.
Thanks for the clarification. I missed the possibility that someone
might accidentally use tp before it is set. In this case, I agree that
it would be better to create a setup_tp() function and call it before
start_xen().
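
Roughly along these lines ( just a sketch; the setup_tp() name and the
exact call site are tentative, not part of the patch yet ):

```c
#include <xen/init.h>

extern struct pcpu_info pcpu_info[];

/*
 * Set tp for the given CPU. Meant to be called (e.g. from startup
 * assembly) before entering any C code that may use
 * get_processor_id(), i.e. before start_xen().
 */
void __init setup_tp(unsigned int cpuid)
{
    /*
     * The "memory" clobber acts as a compiler barrier, so accesses to
     * pcpu_info[] can't be reordered across the write to tp.
     */
    asm volatile ( "mv tp, %0" : : "r" (&pcpu_info[cpuid]) : "memory" );
}
```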

> 
> > > > --- /dev/null
> > > > +++ b/xen/arch/riscv/smp.c
> > > > @@ -0,0 +1,4 @@
> > > > +#include <xen/smp.h>
> > > > +
> > > > +/* tp points to one of these per cpu */
> > > > +struct pcpu_info pcpu_info[NR_CPUS];
> > > 
> > > And they all need setting up statically? Is there a plan to make
> > > this dynamic (which could be recorded in a "fixme" in the
> > > comment)?
> > I didn't plan to make allocation of this array dynamic. I don't
> > expect that NR_CPUS will be big.
> 
> What is this expectation of yours based on? Other architectures
> permit systems with hundreds or even thousands of CPUs; why would
> RISC-V be different there?
Based on available dev boards ( which isn't really a strong argument ).

I checked other architectures and they are using static allocation too:
   struct cpuinfo_x86 cpu_data[NR_CPUS];
   
   u32 x86_cpu_to_apicid[NR_CPUS] __read_mostly =
        { [0 ... NR_CPUS-1] = BAD_APICID };
   
   ... /* Arm */
   
   struct cpuinfo_arm cpu_data[NR_CPUS];

I wanted to check which API should be used to allocate this array
dynamically. xmalloc?

And I am curious how I could use xmalloc() at this stage, when the
page allocator (?) hasn't been initialized yet. Or should pcpu_info
be allocated statically only for the boot cpu, with xmalloc() used
for the others?
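
To make the question concrete, here is roughly what I would imagine
( a sketch only; pcpu_info_alloc() is a hypothetical helper, and it
assumes xmalloc() is usable by the time secondary CPUs are brought
up ):

```c
#include <xen/errno.h>
#include <xen/xmalloc.h>

/* Static slot for the boot CPU only; secondaries are allocated later. */
static struct pcpu_info boot_pcpu_info;
struct pcpu_info *pcpu_info[NR_CPUS] = { [0] = &boot_pcpu_info };

/* Hypothetical helper, called for each secondary CPU before bring-up. */
int pcpu_info_alloc(unsigned int cpu)
{
    struct pcpu_info *info = xmalloc(struct pcpu_info);

    if ( !info )
        return -ENOMEM;

    pcpu_info[cpu] = info;
    return 0;
}
```

Of course this would turn pcpu_info[] into an array of pointers, so
the tp initialization would become "pcpu_info[0]" instead of
"&pcpu_info[0]".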

> 
> > I can add a "fixme", but I don't really understand what the
> > advantages would be if pcpu_info[] were allocated dynamically.
> 
> Where possible it's better to avoid static allocations, of which on
> some systems only a very small part may be used. Even if you put
> yourself in the position that many take - memory being cheap - you
> then still waste cache and TLB bandwidth. Furthermore, as long as
> struct pcpu_info isn't big enough (and properly aligned) for two
> successive array entries to not share cache lines, you may end up
> playing cacheline ping-pong when a CPU writes to its own array slot.
Why wouldn't the mentioned issues apply to dynamic memory too? We
would still be allocating sizeof(struct pcpu_info) * NR_CPUS bytes;
accesses to that memory would still go through the cache, the CPU/MMU
would still use the TLB for address translation of that region, and
without proper alignment of the pcpu_info size there could still be
an issue with cache line sharing.
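
For the cache line sharing part specifically, I assume something like
the following would be needed either way, regardless of whether the
array is static or dynamic ( a sketch; the struct contents are just
what the patch already implies ):

```c
#include <xen/cache.h>

/*
 * __cacheline_aligned pads/aligns the struct to SMP_CACHE_BYTES, so
 * successive pcpu_info[] entries never share a cache line and a CPU
 * writing its own slot doesn't cause cacheline ping-pong.
 */
struct pcpu_info {
    unsigned int processor_id;  /* used by get_processor_id() */
    /* ... */
} __cacheline_aligned;
```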

~ Oleksii