|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v2 11/17] xen/riscv: implement p2m_set_entry() and __p2m_set_entry()
On 07.07.2025 13:46, Oleksii Kurochko wrote:
> On 7/7/25 9:20 AM, Jan Beulich wrote:
>> On 04.07.2025 17:01, Oleksii Kurochko wrote:
>>> On 7/1/25 3:49 PM, Jan Beulich wrote:
>>>> On 10.06.2025 15:05, Oleksii Kurochko wrote:
>>>>> This patch introduces p2m_set_entry() and its core helper
>>>>> __p2m_set_entry() for
>>>>> RISC-V, based loosely on the Arm implementation, with several
>>>>> RISC-V-specific
>>>>> modifications.
>>>>>
>>>>> Key differences include:
>>>>> - TLB Flushing: RISC-V allows caching of invalid PTEs and does not require
>>>>> break-before-make (BBM). As a result, the flushing logic is
>>>>> simplified.
>>>>> TLB invalidation can be deferred until p2m_write_unlock() is called.
>>>>> Consequently, the p2m->need_flush flag is always considered true and
>>>>> is
>>>>> removed.
>>>>> - Page Table Traversal: The order of walking the page tables differs from
>>>>> Arm,
>>>>> and this implementation reflects that reversed traversal.
>>>>> - Macro Adjustments: The macros P2M_ROOT_LEVEL, P2M_ROOT_ORDER, and
>>>>> P2M_ROOT_PAGES are updated to align with the new RISC-V
>>>>> implementation.
>>>>>
>>>>> The main functionality is in __p2m_set_entry(), which handles mappings
>>>>> aligned
>>>>> to page table block entries (e.g., 1GB, 2MB, or 4KB with 4KB granularity).
>>>>>
>>>>> p2m_set_entry() breaks a region down into block-aligned mappings and calls
>>>>> __p2m_set_entry() accordingly.
>>>>>
>>>>> Stub implementations (to be completed later) include:
>>>>> - p2m_free_entry()
>>>> What would a function of this name do?
>>> Recursively visiting all leaf PTE's for sub-tree behind an entry, then calls
>>> put_page() (which will free if there is no any reference to this page),
>>> freeing intermediate page table (after all entries were freed) by removing
>>> it from d->arch.paging.freelist, and removes correspondent page of
>>> intermediate page
>>> table from p2m->pages list.
>>>
>>>> You can clear entries, but you can't
>>>> free them, can you?
>>> Is is a question regarding terminology?
>> Yes. If one sees a call to a function, it should be possible to at least
>> roughly know what it does without needing to go look at the implementation.
>>
>>> I can't free entry itself, but a page table or
>>> a page (if it is a leaf entry) on which it points could free.
>> Then e.g. pte_free_subtree() or some such?
>
> It sounds fine to me. I'll use suggested name.
>
> Just want to notice that other arches also have the same function
> for the same purpose with the same name.
As to x86, it's not general P2M code which uses this odd (for the purpose)
name, but only p2m-pt.c.
> Does it make sense then to change a name for all arches?
I think so.
>>>>> +{
>>>>> + panic("%s: isn't implemented for now\n", __func__);
>>>>> +
>>>>> + return false;
>>>>> +}
>>>> For this function in particular, though: Besides the "p2me" in the name
>>>> being somewhat odd (supposedly page table entries here are simply pte_t),
>>>> how is this going to be different from pte_is_valid()?
>>> pte_is_valid() is checking a real bit of PTE, but p2me_is_valid() is
>>> checking
>>> what is a type stored in the radix tree (p2m->p2m_types):
>>> /*
>>> * In the case of the P2M, the valid bit is used for other purpose. Use
>>> * the type to check whether an entry is valid.
>>> */
>>> static inline bool p2me_is_valid(struct p2m_domain *p2m, pte_t pte)
>>> {
>>> return p2m_type_radix_get(p2m, pte) != p2m_invalid;
>>> }
>>>
>>> It is done to track which page was modified by a guest.
>> But then (again) the name doesn't convey what the function does.
>
> Then probably p2me_type_is_valid(struct p2m_domain *p2m, pte_t pte) would
> better.
For P2M type checks please don't invent new naming, but use what both x86
and Arm are already using. Note how we already have p2m_is_valid() in that
set. Just that it's not doing what you want here.
>> Plus
>> can't a guest also arrange for an entry's type to move to p2m_invalid?
>> That's then still an entry that was modified by the guest.
>
> I am not really sure that I fully understand the question.
> Do you ask if a guest can do something which will lead to a call of
> p2m_set_entry()
> with p2m_invalid argument?
That I'm not asking, but rather stating. I.e. I expect such is possible.
> If yes, then it seems like it will be done only in case of
> p2m_remove_mapping() what
> will mean that alongside with p2m_invalid INVALID_MFN will be also passed,
> what means
> this entry isn't expected to be used anymore.
Right. But such an entry would still have been "modified" by the guest.
>> Overall I think I'm lacking clarity what you mean to use this predicate
>> for.
>
> By using of "p2me_" predicate I wanted to express that not PTE's valid bit
> will be
> checked, but the type saved in radix tree will be used.
> As suggested above probably it will be better drop "e" too and just use
> p2m_type_is_valid().
See above regarding that name.
>>>>> + /*
>>>>> + * Don't take into account the MFN when removing mapping (i.e
>>>>> + * MFN_INVALID) to calculate the correct target order.
>>>>> + *
>>>>> + * XXX: Support superpage mappings if nr is not aligned to a
>>>>> + * superpage size.
>>>>> + */
>>>> Does this really need leaving as a to-do?
>>> I think so, yes. It won’t break the current workflow if|nr| isn’t aligned,
>>> a smaller order will simply be chosen.
>> Well, my question was more like "Isn't it simple enough to cover the case
>> right away?"
>>
>>>>> + mask = !mfn_eq(smfn, INVALID_MFN) ? mfn_x(smfn) : 0;
>>>>> + mask |= gfn_x(sgfn) | nr;
>>>>> +
>>>>> + for ( ; i != 0; i-- )
>>>>> + {
>>>>> + if ( !(mask & (BIT(XEN_PT_LEVEL_ORDER(i), UL) - 1)) )
>>>>> + {
>>>>> + order = XEN_PT_LEVEL_ORDER(i);
>>>>> + break;
>>>> Nit: Style.
>>>>
>>>>> + }
>>>>> + }
>>>>> +
>>>>> + rc = __p2m_set_entry(p2m, sgfn, order, smfn, t, a);
>>>>> + if ( rc )
>>>>> + break;
>>>>> +
>>>>> + sgfn = gfn_add(sgfn, (1 << order));
>>>>> + if ( !mfn_eq(smfn, INVALID_MFN) )
>>>>> + smfn = mfn_add(smfn, (1 << order));
>>>>> +
>>>>> + nr -= (1 << order);
>>>> Throughout maybe better be safe right away and use 1UL?
>>>>
>>>>> + }
>>>>> +
>>>>> + return rc;
>>>>> }
>>>> How's the caller going to know how much of the range was successfully
>>>> mapped?
>>> There is no such option. Do other arches do that? I mean returns somehow
>>> the number of successfully mapped (sgfn,smfn).
>> On x86 we had to introduce some not very nice code to cover for the absence
>> of proper handling there. For a new port I think it wants at least seriously
>> considering not to repeat such a potentially unhelpful pattern.
>>
>>>> That part may need undoing (if not here, then in the caller),
>>>> or a caller may want to retry.
>>> So the caller in the case if rc != 0, can just undoing the full range
>>> (by using the same sgfn, nr, smfn).
>> Can it? How would it know what the original state was?
>
> You're right — blindly unmapping the range assumes that no entries were valid
> beforehand and I missed that it could be that something valid was mapped
> before
> p2m_set_entry(sgfn,...,smfn) was called.
> But then I am not really understand why it won't be an issue if will know
> how many GFNs were successfully mapped.
The caller may know what that range's state was. But what I really wanted to
convey is: Updating multiple entries in one go is complicated in some of the
corner cases. You will want to think this through now, in order to avoid the
need to re-write everything later again.
Jan
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |