|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v2 11/17] xen/riscv: implement p2m_set_entry() and __p2m_set_entry()
On 08.07.2025 17:42, Oleksii Kurochko wrote:
>
> On 7/8/25 2:45 PM, Jan Beulich wrote:
>> On 08.07.2025 12:37, Oleksii Kurochko wrote:
>>> On 7/8/25 11:01 AM, Oleksii Kurochko wrote:
>>>>
>>>> On 7/8/25 9:10 AM, Jan Beulich wrote:
>>>>> On 07.07.2025 18:10, Oleksii Kurochko wrote:
>>>>>> On 7/7/25 5:15 PM, Jan Beulich wrote:
>>>>>>> On 07.07.2025 17:00, Oleksii Kurochko wrote:
>>>>>>>> On 7/7/25 2:53 PM, Jan Beulich wrote:
>>>>>>>>> On 07.07.2025 13:46, Oleksii Kurochko wrote:
>>>>>>>>>> On 7/7/25 9:20 AM, Jan Beulich wrote:
>>>>>>>>>>> On 04.07.2025 17:01, Oleksii Kurochko wrote:
>>>>>>>>>>>> On 7/1/25 3:49 PM, Jan Beulich wrote:
>>>>>>>>>>>>> On 10.06.2025 15:05, Oleksii Kurochko wrote:
>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>> + panic("%s: isn't implemented for now\n", __func__);
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> + return false;
>>>>>>>>>>>>>> +}
>>>>>>>>>>>>> For this function in particular, though: Besides the "p2me" in
>>>>>>>>>>>>> the name
>>>>>>>>>>>>> being somewhat odd (supposedly page table entries here are simply
>>>>>>>>>>>>> pte_t),
>>>>>>>>>>>>> how is this going to be different from pte_is_valid()?
>>>>>>>>>>>> pte_is_valid() is checking a real bit of PTE, but p2me_is_valid()
>>>>>>>>>>>> is checking
>>>>>>>>>>>> what is a type stored in the radix tree (p2m->p2m_types):
>>>>>>>>>>>> /*
>>>>>>>>>>>> * In the case of the P2M, the valid bit is used for other
>>>>>>>>>>>> purpose. Use
>>>>>>>>>>>> * the type to check whether an entry is valid.
>>>>>>>>>>>> */
>>>>>>>>>>>> static inline bool p2me_is_valid(struct p2m_domain *p2m,
>>>>>>>>>>>> pte_t pte)
>>>>>>>>>>>> {
>>>>>>>>>>>> return p2m_type_radix_get(p2m, pte) != p2m_invalid;
>>>>>>>>>>>> }
>>>>>>>>>>>>
>>>>>>>>>>>> It is done to track which page was modified by a guest.
>>>>>>>>>>> But then (again) the name doesn't convey what the function does.
>>>>>>>>>> Then probably p2me_type_is_valid(struct p2m_domain *p2m, pte_t pte)
>>>>>>>>>> would better.
>>>>>>>>> For P2M type checks please don't invent new naming, but use what both
>>>>>>>>> x86
>>>>>>>>> and Arm are already using. Note how we already have p2m_is_valid() in
>>>>>>>>> that
>>>>>>>>> set. Just that it's not doing what you want here.
>>>>>>>> Hm, why not doing what I want? p2m_is_valid() verifies if P2M entry is
>>>>>>>> valid.
>>>>>>>> And in here it is checked if P2M pte is valid from P2M point of view
>>>>>>>> by checking
>>>>>>>> the type in radix tree and/or in reserved PTEs bits (just to remind we
>>>>>>>> have only 2
>>>>>>>> free bits for type).
>>>>>>> Because this is how it's defined on x86:
>>>>>>>
>>>>>>> #define p2m_is_valid(_t) (p2m_to_mask(_t) & \
>>>>>>> (P2M_RAM_TYPES |
>>>>>>> p2m_to_mask(p2m_mmio_direct)))
>>>>>>>
>>>>>>> I.e. more strict that simply "!= p2m_invalid". And I think such
>>>>>>> predicates
>>>>>>> would better be uniform across architectures, such that in principle
>>>>>>> they
>>>>>>> might also be usable in common code (as we already do with
>>>>>>> p2m_is_foreign()).
>>>>>> Yeah, Arm isn't so strict in definition of p2m_is_valid() and it seems
>>>>>> like
>>>>>> x86 and Arm have different understanding what is valid.
>>>>>>
>>>>>> Except what mentioned in the comment that grant types aren't considered
>>>>>> valid
>>>>>> for x86 (and shouldn't be the same then for Arm?), it isn't clear why
>>>>>> x86's
>>>>>> p2m_is_valid() is stricter then Arm's one and if other arches should be
>>>>>> also
>>>>>> so strict.
>>>>> Arm's p2m_is_valid() is entirely different (and imo misnamed, but
>>>>> arguably one
>>>>> could also consider x86'es to require a better name). It's a local
>>>>> helper, not
>>>>> a P2M type checking predicate. With that in mind, you may of course follow
>>>>> Arm's model, but in the longer run we may need to do something about the
>>>>> name
>>>>> collision then.
>>>>>
>>>>>>>> The only use case I can think of is that the caller
>>>>>>>> might try to map the remaining GFNs again. But that doesn’t seem very
>>>>>>>> useful,
>>>>>>>> if|p2m_set_entry()| wasn’t able to map the full range, it likely
>>>>>>>> indicates a serious
>>>>>>>> issue, and retrying would probably result in the same error.
>>>>>>>>
>>>>>>>> The same applies to rolling back the state. It wouldn’t be difficult
>>>>>>>> to add a local
>>>>>>>> array to track all modified PTEs and then use it to revert the state
>>>>>>>> if needed.
>>>>>>>> But again, what would the caller do after the rollback? At this point,
>>>>>>>> it still seems
>>>>>>>> like the best option is simply to|panic(). |
>>>>>>>>
>>>>>>>> Basically, I don’t see or understand the cases where knowing how many
>>>>>>>> GFNs were
>>>>>>>> successfully mapped, or whether a rollback was performed, would really
>>>>>>>> help — because
>>>>>>>> in most cases, I don’t have a better option than just calling|panic()|
>>>>>>>> at the end.
>>>>>>> panic()-ing is of course only a last resort. Anything related to domain
>>>>>>> handling
>>>>>>> would better crash only the domain in question. And even that only if
>>>>>>> suitable
>>>>>>> error handling isn't possible.
>>>>>> And if there is no still any runnable domain available, for example, we
>>>>>> are creating
>>>>>> domain and some p2m mapping is called? Will it be enough just ignore to
>>>>>> boot this domain?
>>>>>> If yes, then it is enough to return only error code without returning
>>>>>> how many GFNs were
>>>>>> mapped or rollbacking as domain won't be ran anyway.
>>>>> During domain creation all you need to do is return an error. But when
>>>>> you write a
>>>>> generic function that's also (going to be) used at domain runtime, you
>>>>> need to
>>>>> consider what to do there in case of partial success.
>>>>>
>>>>>>>> For example, if I call|map_regions_p2mt()| for an MMIO region
>>>>>>>> described in a device
>>>>>>>> tree node, and the mapping fails partway through, I’m left with two
>>>>>>>> options: either
>>>>>>>> ignore the device (if it's not essential for Xen or guest
>>>>>>>> functionality) and continue
>>>>>>>> booting; in which case I’d need to perform a rollback, and simply
>>>>>>>> knowing the number
>>>>>>>> of successfully mapped GFNs may not be enough or, more likely, just
>>>>>>>> panic.
>>>>>>> Well, no. For example, before even trying to map you could check that
>>>>>>> the range
>>>>>>> of P2M entries covered is all empty.
>>>>>> Could it be that they aren't all empty? Then it seems like we have
>>>>>> overlapping and we can't
>>>>>> just do a mapping, right?
>>>>> Possibly that would simply mean to return an error, yes.
>>>>>
>>>>>> Won't be this procedure consume a lot of time as it is needed to go
>>>>>> through each page
>>>>>> tables for each entry.
>>>>> Well, you're free to suggest a clean alternative without doing so.
>>>> I thought about dynamically allocating an array in p2m_set_entry(), where
>>>> to save all changed PTEs,
>>>> and then use it to roll back if __p2m_set_entry() returns rc != 0 ...
>> That's another possible source for failure, and such an allocation may end
>> up being a rather big one.
>>
>>>>>>> _Then_ you know how to correctly roll back.
>>>>>>> And yes, doing so may not even require passing back information on how
>>>>>>> much of
>>>>>>> a region was successfully mapped.
>>>>>> If P2M entries were empty before start of the mapping then it is enough
>>>>>> to just go
>>>>>> through the same range (sgfn,nr,smfn) and just clean them, right?
>>>>> Yes, what else would "roll back" mean in that case?
>>>> ... If we know that the P2M entries were empty, then there's nothing else
>>>> to be done, just
>>>> clean PTE is needed to be done.
>>>> However, if the P2M entries weren’t empty (and I’m still not sure whether
>>>> that’s a legal
>>>> case), then rolling back would mean restoring their original state, the
>>>> state they
>>>> had before the P2M mapping procedure started.
>>> Possible roll back is harder to implement as expected because there is a
>>> case where subtree
>>> could be freed:
>>> /*
>>> * Free the entry only if the original pte was valid and the base
>>> * is different (to avoid freeing when permission is changed).
>>> */
>>> if ( p2me_is_valid(p2m, orig_pte) &&
>>> !mfn_eq(pte_get_mfn(*entry), pte_get_mfn(orig_pte)) )
>>> p2m_free_subtree(p2m, orig_pte, level);
>>> In this case then it will be needed to store the full subtree.
>> Right, which is why it may be desirable to limit the ability to update
>> multiple
>> entries in one go. Or work from certain assumptions, violation of which would
>> cause the domain to be crashed.
>
> It seems to me that the main issue with updating multiple entries in one go
> is the rollback
> mechanism in case of a partial mapping failure. (other issues? mapping could
> consume a lot
> of time so something should wait while allocation will end?) In my opinion,
> the rollback
> mechanism is quite complex to implement and could become a source of further
> failures.
> For example, most of the cases where p2m_set_entry() could fail are due to
> failure in
> mapping the page table (to allow Xen to walk through it) or failure in
> creating a new page
> table due to memory exhaustion. Then, during rollback, which might also
> require memory
> allocation, we could face the same memory shortage issue.
> And what should be done in that case?
>
> In my opinion, the best option is to simply return from p2m_set_entry() the
> number of
> successfully mapped GFNs (stored in rc which is returned by p2m_set_entry())
> and let
> the caller decide how to handle the partial mapping:
> 1. If a partial mapping occurs during domain creation, we could just report
> that this
> domain can't be created and continue without it if there are other
> domains to start;
> otherwise, panic.
I don't see how panic()-ing is relevant here. That's to be decided (far) up
the call stack.
> 2. If a partial mapping occurs during the lifetime of a domain, for example,
> if the domain
> requests to map some memory, we return the number of successfully mapped
> GFNs and let the
> domain decide what to do: either remove the mappings or retry mapping the
> remaining part.
> However, I think there's not much value in retrying, since
> p2m_set_entry() is likely to
> fail again. So, perhaps the best course of action is to stop the domain
> altogether.
> Does that make sense?
Sure, why not. Provided you actually have a way to communicate back how much
was mapped.
Jan
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |