Re: [PATCH v2 15/17] xen/riscv: Implement superpage splitting for p2m mappings
On 22.07.2025 16:57, Oleksii Kurochko wrote:
>
> On 7/21/25 3:34 PM, Jan Beulich wrote:
>> On 17.07.2025 18:37, Oleksii Kurochko wrote:
>>> On 7/2/25 11:25 AM, Jan Beulich wrote:
>>>> On 10.06.2025 15:05, Oleksii Kurochko wrote:
>>>>> Add support for breaking down large memory mappings ("superpages") in
>>>>> the RISC-V p2m mapping so that smaller, more precise mappings
>>>>> ("finer-grained entries") can be inserted into lower levels of the page
>>>>> table hierarchy.
>>>>>
>>>>> To implement this, the following is done:
>>>>> - Introduce p2m_split_superpage(): recursively shatters a superpage into
>>>>>   smaller page table entries down to the target level, preserving the
>>>>>   original permissions and attributes.
>>>>> - __p2m_set_entry() is updated to invoke superpage splitting when
>>>>>   inserting entries at lower levels within a superpage-mapped region.
>>>>>
>>>>> This implementation is based on the ARM code, with modifications to the
>>>>> part that follows the BBM (break-before-make) approach. Unlike ARM,
>>>>> RISC-V does not require BBM, so there is no need to invalidate the PTE
>>>>> and flush the TLB before updating it with the newly created, split page
>>>>> table.
>>>> But some flushing is going to be necessary. As long as you only ever do
>>>> global flushes, the one after the individual PTE modification (within the
>>>> split table) will do (if BBM isn't required, see below), but once you move
>>>> to more fine-grained flushing, that's not going to be enough anymore. Not
>>>> sure it's a good idea to leave such a pitfall.
>>> I think that I don't fully understand what is an issue.
>> Whether a flush is necessary after solely breaking up a superpage is arch-
>> defined. I don't know how many restrictions the spec places on possible
>> hardware behavior for RISC-V. However, the eventual change of (at least)
>> one entry to fulfill the original request will surely require a flush.
>> What I was trying to say is that this required flush would better not also
>> cover for the flushes that may or may not be required by the spec. IOW if
>> the spec leaves any room for flushes to possibly be needed, those flushes
>> would better be explicit.
>
> I think that I still don't understand why what I wrote above will work as
> long as global flushes are used, but will stop working when more
> fine-grained flushing is used.
>
> Inside p2m_split_superpage() we don't need any kind of TLB flush operation,
> as it allocates a new page for a table and works with that page; no one
> could be using it yet, so nothing could have cached it in the TLB.
>
> The only question is whether BBM is needed before starting to use the split
> entry:
> ....
> if ( !p2m_split_superpage(p2m, &split_pte, level, target, offsets) )
> {
> /* Free the allocated sub-tree */
> p2m_free_subtree(p2m, split_pte, level);
>
> rc = -ENOMEM;
> goto out;
> }
>
> ------> /* Should be BBM used here ? */
> p2m_write_pte(entry, split_pte, p2m->clean_pte);
>
> And I can't find anything in the spec that tells me to use BBM here like
> Arm does:
But what you need is a statement in the spec that you can get away without
it. Imo at least.
> /*
> * Follow the break-before-sequence to update the entry.
> * For more details see (D4.7.1 in ARM DDI 0487A.j).
> */
> p2m_remove_pte(entry, p2m->clean_pte);
> p2m_force_tlb_flush_sync(p2m);
>
> p2m_write_pte(entry, split_pte, p2m->clean_pte);
>
> I agree that even if BBM isn't mandated explicitly it could sometimes be
> useful, but here I am not really sure what the point of a TLB flush before
> p2m_write_pte() would be, as nothing serious will happen if an old mapping
> is used for some time, since we keep the same rights for the smaller
> (split) mappings.
> The reason why Arm does p2m_remove_pte() & p2m_force_tlb_flush() before
> updating an entry here is that the architecture doesn't guarantee that
> everything will be okay otherwise, and they explicitly say:
>     This situation can possibly break coherency and ordering guarantees,
>     leading to inconsistent unwanted behavior in our Processing Element
>     (PE).
>
>
>>>> As to (no need for) BBM: I couldn't find anything to that effect in the
>>>> privileged spec. Can you provide some pointer? What I found instead is e.g.
>>>> this sentence: "To ensure that implicit reads observe writes to the same
>>>> memory locations, an SFENCE.VMA instruction must be executed after the
>>>> writes to flush the relevant cached translations." And this: "Accessing the
>>>> same location using different cacheability attributes may cause loss of
>>>> coherence." (This may not only occur when the same physical address is
>>>> mapped twice at different VAs, but also after the shattering of a superpage
>>>> when the new entry differs in cacheability.)
>>> I also couldn't find that the RISC-V spec mandates BBM explicitly as the
>>> Arm spec does.
>>>
>>> What I meant is that on RISC-V one can do:
>>>   - Write new PTE
>>>   - Flush TLB
>>>
>>> While on Arm it is almost always needed to do:
>>>   - Write zero to PTE
>>>   - Flush TLB
>>>   - Write new PTE
>>>
>>> For example, the common CoW code path, where you copy from a read-only
>>> page to a new page and then map that new page as writable, just works on
>>> RISC-V without extra considerations, while on Arm it requires BBM.
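The two update sequences contrasted above can be modeled side by side. The sketch below is an assumption-laden toy (the struct, the names, and the single-slot "page table" are all made up for illustration): it shows that the direct-write sequence keeps the entry valid throughout, while break-before-make deliberately opens a window in which lookups fault, and that both end in the same final state.

```c
#include <assert.h>
#include <stdint.h>
#include <stdbool.h>

#define PTE_V 1UL

/* Minimal PTE holder; layout and names are illustrative, not Xen's. */
struct slot { uint64_t pte; };

static bool lookup_valid(const struct slot *s) { return s->pte & PTE_V; }

/*
 * RISC-V-style update as described above: overwrite the live entry with
 * the new one, then flush.  The entry stays valid throughout.
 */
static bool riscv_update_keeps_valid(struct slot *s, uint64_t new_pte)
{
    bool always_valid = lookup_valid(s);

    s->pte = new_pte;              /* write new PTE */
    always_valid &= lookup_valid(s);
    /* sfence.vma would go here       flush TLB     */
    return always_valid;
}

/*
 * Arm-style break-before-make: clear, flush, then write.  Between the
 * clear and the final write any lookup faults, which is the window the
 * Arm code accepts in exchange for never having two live translations.
 */
static bool bbm_update_has_invalid_window(struct slot *s, uint64_t new_pte)
{
    bool saw_invalid;

    s->pte = 0;                    /* write zero to PTE */
    saw_invalid = !lookup_valid(s);
    /* TLB flush would go here                          */
    s->pte = new_pte;              /* write new PTE     */
    return saw_invalid;
}
```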
>> CoW is a specific sub-case with increasing privilege.
>
> Could you please explain why increasing privilege matters in terms of TLB
> flushing and of clearing the PTE before writing a new one?
Some architectures automatically re-walk page tables when encountering a
permission violation based on TLB contents. Hence increasing privilege
can be a special case.
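The re-walk behavior Jan describes can be made concrete with a toy model. Everything here is hypothetical (the struct, field names, and permission bits are invented for the sketch): on hardware that re-walks the page tables upon a permission violation, a permission *increase* (the CoW case) is picked up without any explicit flush, whereas hardware without that behavior needs software intervention.

```c
#include <assert.h>
#include <stdint.h>
#include <stdbool.h>

#define PERM_R (1UL << 0)
#define PERM_W (1UL << 1)

/* Toy model: one cached TLB permission set plus the in-memory PTE's. */
struct toy_mmu {
    uint64_t tlb_perms;   /* possibly stale cached permissions        */
    uint64_t pte_perms;   /* current permissions in the page table    */
    bool rewalk_on_fault; /* re-walk the tables on a permission fault? */
};

/*
 * Attempt an access needing "need" rights.  Hardware that re-walks on a
 * permission violation refreshes the cached entry from the PTE and
 * retries, so a privilege increase can succeed without a flush; other
 * hardware reports a fault to software instead.
 */
static bool access_ok(struct toy_mmu *m, uint64_t need)
{
    if ( (m->tlb_perms & need) == need )
        return true;

    if ( m->rewalk_on_fault )
    {
        m->tlb_perms = m->pte_perms;  /* hardware refreshes the entry */
        return (m->tlb_perms & need) == need;
    }

    return false;                     /* fault reported to software */
}
```

This is why CoW "just working" on one architecture does not generalize to all PTE updates: decreasing permissions still leaves a stale, more-permissive entry that only a flush removes.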
>>> It seems to me that BBM is mandated for Arm only because the TLB can be
>>> shared among cores, so there is no guarantee that no prior entry for the
>>> same VA remains in the TLB. In RISC-V's case the TLB isn't shared, and
>>> after a flush it is guaranteed that no prior entry for the same VA
>>> remains in the TLB.
>> Not sure that's the sole reason. But again the question is: Is this written
>> down explicitly anywhere? Generally there can be multiple levels of TLBs, and
>> while some of them may be per-core, others may be shared.
>
> Spec. mentions that:
> If a hart employs an address-translation cache, that cache must appear to
> be
> private to that hart.
Hmm, okay, that indeed pretty much excludes shared TLBs.
Jan