[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v4 6/7] xen/riscv: page table handling



On Thu, 2024-08-15 at 14:16 +0200, Jan Beulich wrote:
> On 15.08.2024 13:21, oleksii.kurochko@xxxxxxxxx wrote:
> > On Thu, 2024-08-15 at 10:09 +0200, Jan Beulich wrote:
> > > On 14.08.2024 18:50, oleksii.kurochko@xxxxxxxxx wrote:
> > > > On Tue, 2024-08-13 at 12:31 +0200, Jan Beulich wrote:
> > > > > On 09.08.2024 18:19, Oleksii Kurochko wrote:
> > > > > > RISC-V detects superpages using `pte.x` and `pte.r`, as
> > > > > > there
> > > > > > is no specific bit in the PTE for this purpose. From the
> > > > > > RISC-V
> > > > > > spec:
> > > > > > ```
> > > > > >   ...
> > > > > >   4. Otherwise, the PTE is valid. If pte.r = 1 or pte.x =
> > > > > > 1, go
> > > > > > to
> > > > > > step 5.
> > > > > >      Otherwise, this PTE is a pointer to the next level of
> > > > > > the
> > > > > > page
> > > > > > table.
> > > > > >      ... .
> > > > > >   5. A leaf PTE has been found.
> > > > > >      ...
> > > > > >   ...
> > > > > > ```
> > > > > > 
> > > > > > The code doesn’t support super page shattering so 4KB pages
> > > > > > are
> > > > > > used as
> > > > > > default.
> > > > > 
> > > > > ... you have this. Yet still callers expecting re-mapping in
> > > > > the
> > > > > (large)
> > > > > range they map can request small-page mappings right away.
> > > > I am not sure that I fully understand what do you mean by
> > > > "expcting
> > > > re-
> > > > mapping".
> > > 
> > > Right now you have callers pass PTE_BLOCK when they know that no
> > > small
> > > page re-mappings are going to occur for an area. What I'm
> > > suggesting
> > > is
> > > that you invert this logic: Have callers pass PTE_SMALL when
> > > there is
> > > a possibility that re-mapping requests may be issued later. Then,
> > > later, by simply grep-ing for PTE_SMALL you'll be able to easily
> > > find
> > > all candidates that possibly can be relaxed when super-page
> > > shattering
> > > starts being supported. That's going to be easier than finding
> > > all
> > > instances where PTE_BLOCK is _not_used.
> > So if I understand correctly. Actually nothing will change in
> > algorithm
> > of pt_update() and only PTE_SMALL should be introduced instead of
> > PTE_BLOCK. And if I will know that something will be better to map
> > as
> > PTE_SMALL to not face shattering in case of unmap (for example) I
> > just
> > can map this memory as PTE_SMALL and that is it?
> 
> That is it.
> 
> > > > > > +    spin_unlock(&xen_pt_lock);
> > > > > 
> > > > > Does this really need to come after fence and flush?
> > > > I think yes, as page table should be updated only by 1 CPU at
> > > > the
> > > > same
> > > > time. And before give ability to other CPU to update page table
> > > > we
> > > > have
> > > > to finish a work on current CPU.
> > > 
> > > Can you then explain to me, perhaps by way of an example, what
> > > will
> > > go
> > > wrong if the unlock is ahead of the flush? (I'm less certain
> > > about
> > > the
> > > fence, and that's also less expensive.)
> > pt_update() will be called for interleaved region, for example, by
> > different CPUs:
> > 
> >                      pt_update():
> > CPU1:                                    CPU2:
> >  ...                                spin_lock(&xen_pt_lock);
> > RISCV_FENCE(rw, rw);                 ....
> > 
> > /* After this function will be
> >    executed the following thing
> >    can happen ------------------>  start to update page table
> > */                                 entries which was partially     
> > spin_unlock(&xen_pt_lock);         created during CPU1 but
> > CPU2       
> > ....                               doesn't know about them
> > yet        
> > ....                               because flush_tlb() ( sfence.vma
> > ) 
> > ....                               wasn't done      
> > ....                                                               
> >    
> > flush_tlb_range_va();
> 
> Not exactly: CPU2 knows about them as far as the memory used /
> modified
> goes, and that's all that matters for further page table
> modifications.
> CPU2 only doesn't know about the new page table entries yet when it
> comes
> to using them for a translation (by the hardware page walker). Yet
> this
> aspect is irrelevant here, if I'm not mistaken.
And it isn't an issue that CPU2 will add these new page table entries
again during execution of CPU2's pt_update()?

~ Oleksii



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.