Re: [PATCH v2 13/17] xen/riscv: Implement p2m_entry_from_mfn() and support PBMT configuration
On 22.07.2025 13:34, Oleksii Kurochko wrote:
>
> On 7/22/25 12:41 PM, Oleksii Kurochko wrote:
>>
>>
>> On 7/21/25 2:18 PM, Jan Beulich wrote:
>>> On 18.07.2025 11:52, Oleksii Kurochko wrote:
>>>> On 7/17/25 12:25 PM, Jan Beulich wrote:
>>>>> On 17.07.2025 10:56, Oleksii Kurochko wrote:
>>>>>> On 7/16/25 6:18 PM, Jan Beulich wrote:
>>>>>>> On 16.07.2025 18:07, Oleksii Kurochko wrote:
>>>>>>>> On 7/16/25 1:31 PM, Jan Beulich wrote:
>>>>>>>>> On 15.07.2025 16:47, Oleksii Kurochko wrote:
>>>>>>>>>> On 7/1/25 5:08 PM, Jan Beulich wrote:
>>>>>>>>>>> On 10.06.2025 15:05, Oleksii Kurochko wrote:
>>>>>>>>>>>> --- a/xen/arch/riscv/p2m.c
>>>>>>>>>>>> +++ b/xen/arch/riscv/p2m.c
>>>>>>>>>>>> @@ -345,6 +345,26 @@ static pte_t *p2m_get_root_pointer(struct p2m_domain *p2m, gfn_t gfn)
>>>>>>>>>>>> return __map_domain_page(p2m->root + root_table_indx);
>>>>>>>>>>>> }
>>>>>>>>>>>>
>>>>>>>>>>>> +static int p2m_type_radix_set(struct p2m_domain *p2m, pte_t pte, p2m_type_t t)
>>>>>>>>>>> See comments on the earlier patch regarding naming.
>>>>>>>>>>>
>>>>>>>>>>>> +{
>>>>>>>>>>>> + int rc;
>>>>>>>>>>>> + gfn_t gfn = mfn_to_gfn(p2m->domain, mfn_from_pte(pte));
>>>>>>>>>>> How does this work, when you record GFNs only for Xenheap pages?
>>>>>>>>>> I don't think I understand what the issue is. Could you please provide
>>>>>>>>>> some extra details?
>>>>>>>>> Counter question: The mfn_to_gfn() you currently have is only a stub. It
>>>>>>>>> only works for 1:1 mapped domains. Can you show me the eventual final
>>>>>>>>> implementation of the function, making it possible to use it here?
>>>>>>>> At the moment, I plan to support only 1:1 mapped domains, so this is the
>>>>>>>> final implementation.
>>>>>>> Isn't that an overly severe limitation?
>>>>>> I wouldn't say that it's a severe limitation, as it's just a matter of how
>>>>>> mfn_to_gfn() is implemented. When non-1:1 mapped domains are supported,
>>>>>> mfn_to_gfn() can be implemented differently, while the code where it's
>>>>>> called will likely remain unchanged.
>>>>>>
>>>>>> What I meant in my reply is that, for the current state and current
>>>>>> limitations, this is the final implementation of mfn_to_gfn(). But that
>>>>>> doesn't mean I don't see the value in, or the need for, non-1:1 mapped
>>>>>> domains; it's just that this limitation simplifies development at the
>>>>>> current stage of the RISC-V port.
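For reference, the 1:1 variant being talked about is trivial; a minimal
sketch, assuming the usual typesafe mfn/gfn wrappers and a direct-map
predicate along the lines of Arm's is_domain_direct_mapped():

/* Sketch only: valid solely for 1:1 (direct) mapped domains, where a
 * guest frame number equals the machine frame number. */
static inline gfn_t mfn_to_gfn(const struct domain *d, mfn_t mfn)
{
    ASSERT(is_domain_direct_mapped(d));

    return _gfn(mfn_x(mfn));
}

A non-1:1 implementation would need a real reverse mapping, which is why
use of the function outside the special gnttab case is questioned below.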
>>>>> Simplification is fine in some cases, but not supporting the "normal" way
>>>>> of domain construction looks like a pretty odd restriction. I'm also curious
>>>>> how you envision implementing mfn_to_gfn() then, suitable for generic use
>>>>> like the one here. Imo, current limitation or not, you simply want to avoid
>>>>> use of that function outside of the special gnttab case.
>>>>>
>>>>>>>>>>> In this context (not sure if I asked before): With this use of a radix
>>>>>>>>>>> tree, how do you intend to bound the amount of memory that a domain can
>>>>>>>>>>> use, by making Xen insert very many entries?
>>>>>>>>>> I didn't think about that. I assumed it would be enough to set the amount
>>>>>>>>>> of memory a guest domain can use by specifying xen,domain-p2m-mem-mb in
>>>>>>>>>> the DTS, or using some predefined value if xen,domain-p2m-mem-mb isn't
>>>>>>>>>> explicitly set.
>>>>>>>>> Which would require these allocations to come from that pool.
>>>>>>>> Yes, and it is true only for non-hardware domains with the current
>>>>>>>> implementation.
>>>>>>> ???
>>>>>> I meant that the pool is currently used only for non-hardware domains.
>>>>> And how does this matter here? The memory required for the radix tree
>>>>> doesn't come from that pool anyway.
>>>> I thought it was possible to do that somehow, but looking at the code of
>>>> radix-tree.c it seems like the only way to allocate memory for the radix
>>>> tree is radix_tree_node_alloc() -> xzalloc(struct rcu_node).
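For illustration, the usage pattern in question, similar to what Arm's
mem-access code does (the p2m_type field name below is an assumption, not
the actual patch): every new key inserted this way may pull in interior
nodes via the xenheap allocation mentioned above, outside any per-domain
pool.

static int p2m_set_type(struct p2m_domain *p2m, gfn_t gfn, p2m_type_t t)
{
    /* May allocate radix tree nodes (radix_tree_node_alloc() ->
     * xzalloc()) from the xenheap; nothing is charged to the domain. */
    int rc = radix_tree_insert(&p2m->p2m_type, gfn_x(gfn),
                               radix_tree_int_to_ptr(t));

    if ( rc == -EEXIST )
    {
        /* Key already present: just update the stored type. */
        radix_tree_replace_slot(
            radix_tree_lookup_slot(&p2m->p2m_type, gfn_x(gfn)),
            radix_tree_int_to_ptr(t));
        rc = 0;
    }

    return rc;
}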
>>>>
>>>> Then radix_tree_node_allocate(domain) would need to be introduced,
>>> That would be a possibility, but you may have seen that less than half a
>>> year ago we got rid of something along these lines. So it would require
>>> some pretty good justification to re-introduce.
>>>
>>>> or the radix tree can't be used at all, for the security reason mentioned
>>>> in the previous replies, no?
>>> (Very) careful use may still be possible. But the downside of using this
>>> (potentially long lookup times) would always remain.
>> Could you please clarify what you mean here by "(Very) careful"?
>> I thought about introducing a budget of possible keys in the radix tree,
>> and stopping the domain once that budget drops to 0. It is also unclear
>> what the value of such a budget should be. Probably you have a better idea.
>>
>> But generally your idea below ...
>>>>>>>>>> Also, it seems this would just lead to the issue you mentioned earlier:
>>>>>>>>>> when the memory runs out, domain_crash() will be called or the PTE will
>>>>>>>>>> be zapped.
>>>>>>>>> Or one domain exhausting memory would cause another domain to fail. A
>>>>>>>>> domain impacting just itself may be tolerable. But a domain affecting
>>>>>>>>> other domains isn't.
>>>>>>>> But it seems like this issue could happen in any implementation. It would
>>>>>>>> only not happen if we had nothing but a pre-populated pool for every domain
>>>>>>>> type (hardware, control, guest domain), without the ability to extend it or
>>>>>>>> to allocate extra pages from the domheap at runtime. Otherwise, if extra
>>>>>>>> page allocation is allowed, we can't really do anything about this issue.
>>>>>>> But that's why I brought this up: You simply have to. Or, as indicated,
>>>>>>> the moment you mark Xen security-supported on RISC-V, there will be an
>>>>>>> XSA needed.
>>>>>> Why isn't this an XSA for other architectures? At least Arm should then
>>>>>> have such an XSA.
>>>>> Does Arm use a radix tree for storing types? It uses one for mem-access,
>>>>> but it's not clear to me whether that's actually a supported feature.
>>>>>
>>>>>> I don't understand why x86 wouldn't have the same issue. Memory is a
>>>>>> limited and shared resource, so if one domain uses too much memory, it
>>>>>> could happen that other domains won't have enough memory for their
>>>>>> purposes...
>>>>> The question is whether allocations are bounded. With this use of a radix
>>>>> tree, you give domains a way to have Xen allocate pretty much arbitrary
>>>>> amounts of memory to populate that tree. That unbounded-ness is the problem,
>>>>> not memory allocations in general.
>>>> Isn't the radix tree key bounded by the number of GFNs given to a domain?
>>>> We can't have more keys than the maximum GFN number for a domain. So the
>>>> potential amount of memory necessary for the radix tree is also bounded by
>>>> the number of GFNs.
>>> To some degree yes, hence why I said "pretty much arbitrary amounts".
>>> But recall that "amount of GFNs" is a fuzzy term; I think you mean to
>>> use it to describe the amount of memory pages given to the guest. GFNs
>>> can be used for other purposes, though. Guests could e.g. grant
>>> themselves access to their own memory, then map those grants at
>>> otherwise unused GFNs.
>>>
>>>> Anyway, IIUC I just can't use a radix tree for p2m types at all, right?
>>>> If so, does it make sense to borrow 2 bits from struct page_info->type_info,
>>>> as currently only 9 bits of it are used for a frame's count?
>>> struct page_info describes MFNs, when you want to describe GFNs. As you
>>> mentioned earlier, multiple GFNs can in principle map to the same MFN.
>>> You would force them to all have the same properties, which would be in
>>> direct conflict with e.g. the grant P2M types.
>>>
>>> Just to mention one possible alternative to using radix trees: You could
>>> maintain a 2nd set of intermediate "page tables", just that leaf entries
>>> would hold meta data for the respective GFN. The memory for those "page
>>> tables" could come from the normal P2M pool (and allocation would thus
>>> only consume domain-specific resources). Of course in any model like
>>> this the question of lookup times (as mentioned above) would remain.
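To make this a bit more concrete, a sketch of such a shadow walk
(everything here is illustrative, not existing code: meta_root,
P2M_ROOT_LEVEL, p2m_meta_index() and p2m_alloc_page() are assumed names,
and MFN 0 stands in for an empty slot):

/* Return a pointer to the metadata byte for @gfn, allocating missing
 * intermediate pages from the per-domain P2M pool when @alloc is set.
 * The caller unmaps the returned leaf page via unmap_domain_page(). */
static uint8_t *p2m_meta_slot(struct p2m_domain *p2m, gfn_t gfn, bool alloc)
{
    mfn_t mfn = p2m->meta_root;              /* assumed extra p2m field */
    unsigned int level;

    for ( level = P2M_ROOT_LEVEL; level > 0; level-- )
    {
        mfn_t *table = map_domain_page(mfn);
        mfn_t *entry = &table[p2m_meta_index(level, gfn)];

        if ( !mfn_x(*entry) )                /* empty slot in this sketch */
        {
            struct page_info *pg = alloc ? p2m_alloc_page(p2m->domain)
                                         : NULL;

            if ( !pg )
            {
                unmap_domain_page(table);
                return NULL;
            }

            clear_domain_page(page_to_mfn(pg));
            *entry = page_to_mfn(pg);
        }

        mfn = *entry;
        unmap_domain_page(table);
    }

    /* Leaf level: one metadata byte per 4k slot, same index as the PTE. */
    return (uint8_t *)map_domain_page(mfn) + p2m_meta_index(0, gfn);
}

Since all pages come from p2m_alloc_page(), the worst-case consumption is
bounded by, and accounted to, the domain's own P2M pool.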
>> ...looks like an optimal option.
>>
>> The only thing I worry about is that it will require some code duplication
>> (I will think about how to re-use the current code), as, for example, when
>> setting/getting metadata, TLB flushing isn't needed at all since we aren't
>> working with real P2M page tables.
>> I agree that lookup won't be the best, but nothing can be done about that
>> with such a model.
>
> Probably, instead of having a second set of intermediate "page tables",
> we could just allocate two consecutive pages within the real P2M page
> tables for the intermediate page table. The first page would serve as
> the actual page table to which the intermediate page table points,
> and the second page would store metadata for each entry of the page
> table that the intermediate page table references.
>
> As we support only 1 GiB, 2 MiB and 4 KiB mappings, we could do a little
> optimization and allocate these consecutive pages only for the PT levels
> which correspond to 1 GiB, 2 MiB and 4 KiB mappings.
>
> Does it make sense?
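Concretely, the access side of that pairing could look along these lines
(a sketch only; the order-1 allocation from the P2M pool and a contiguous
mapping of the two pages, e.g. via the directmap, are assumptions):

/* The metadata page immediately follows the page holding @table, which
 * is assumed to be the mapped, page-aligned PT page; metadata for
 * table[index] therefore lives at the same index one page further on. */
static p2m_type_t p2m_meta_get(const pte_t *table, unsigned int index)
{
    const uint8_t *meta = (const uint8_t *)table + PAGE_SIZE;

    return meta[index];
}

static void p2m_meta_set(pte_t *table, unsigned int index, p2m_type_t t)
{
    uint8_t *meta = (uint8_t *)table + PAGE_SIZE;

    meta[index] = t;
}

No extra lookup structure and no TLB flushing is involved; the cost is two
pages instead of one per page table, plus the stricter (order-1) allocation
constraint mentioned below.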
I was indeed entertaining this idea, but I couldn't conclude for myself if
that would indeed be without any rough edges. Hence I didn't want to
suggest such. For example, the need to have adjacent pairs of pages could
result in a higher rate of allocation failures (while populating or
re-sizing the P2M pool). This would be possible to avoid by still using
entirely separate pages, and then merely linking them together via some
unused struct page_info fields (the "normal" linking fields can't be used,
afaict).
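The separate-pages variant might then look along these lines (again just a
sketch; which struct page_info field could host the link is exactly the
open question, so hypothetical page_set_meta_mfn()/page_get_meta_mfn()
accessors stand in for it here):

/* PT page and metadata page are allocated independently from the P2M
 * pool and tied together via some spare struct page_info field. */
static void p2m_link_meta(struct page_info *pt_pg, struct page_info *meta_pg)
{
    page_set_meta_mfn(pt_pg, page_to_mfn(meta_pg));   /* hypothetical */
}

static p2m_type_t p2m_meta_get(const struct page_info *pt_pg,
                               unsigned int index)
{
    uint8_t *meta = map_domain_page(page_get_meta_mfn(pt_pg));
    p2m_type_t t = meta[index];

    unmap_domain_page(meta);

    return t;
}

This keeps the allocations order-0, at the price of one extra map/unmap per
metadata access.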
Jan