|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v2 11/17] xen/riscv: implement p2m_set_entry() and __p2m_set_entry()
On 09.07.2025 10:24, Oleksii Kurochko wrote:
>
> On 7/8/25 6:04 PM, Jan Beulich wrote:
>> On 08.07.2025 17:42, Oleksii Kurochko wrote:
>>> On 7/8/25 2:45 PM, Jan Beulich wrote:
>>>> On 08.07.2025 12:37, Oleksii Kurochko wrote:
>>>>> On 7/8/25 11:01 AM, Oleksii Kurochko wrote:
>>>>>> On 7/8/25 9:10 AM, Jan Beulich wrote:
>>>>>>> On 07.07.2025 18:10, Oleksii Kurochko wrote:
>>>>>>>> On 7/7/25 5:15 PM, Jan Beulich wrote:
>>>>>>>>> On 07.07.2025 17:00, Oleksii Kurochko wrote:
>>>>>>>>>> On 7/7/25 2:53 PM, Jan Beulich wrote:
>>>>>>>>>>> On 07.07.2025 13:46, Oleksii Kurochko wrote:
>>>>>>>>>>>> On 7/7/25 9:20 AM, Jan Beulich wrote:
>>>>>>>>>>>>> On 04.07.2025 17:01, Oleksii Kurochko wrote:
>>>>>>>>>>>>>> On 7/1/25 3:49 PM, Jan Beulich wrote:
>>>>>>>>>>>>>>> On 10.06.2025 15:05, Oleksii Kurochko wrote:
>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>> + panic("%s: isn't implemented for now\n", __func__);
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>> + return false;
>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>> For this function in particular, though: Besides the "p2me" in
>>>>>>>>>>>>>>> the name
>>>>>>>>>>>>>>> being somewhat odd (supposedly page table entries here are
>>>>>>>>>>>>>>> simply pte_t),
>>>>>>>>>>>>>>> how is this going to be different from pte_is_valid()?
>>>>>>>>>>>>>> pte_is_valid() is checking a real bit of PTE, but
>>>>>>>>>>>>>> p2me_is_valid() is checking
>>>>>>>>>>>>>> what is a type stored in the radix tree (p2m->p2m_types):
>>>>>>>>>>>>>> /*
>>>>>>>>>>>>>> * In the case of the P2M, the valid bit is used for
>>>>>>>>>>>>>> other purpose. Use
>>>>>>>>>>>>>> * the type to check whether an entry is valid.
>>>>>>>>>>>>>> */
>>>>>>>>>>>>>> static inline bool p2me_is_valid(struct p2m_domain
>>>>>>>>>>>>>> *p2m, pte_t pte)
>>>>>>>>>>>>>> {
>>>>>>>>>>>>>> return p2m_type_radix_get(p2m, pte) != p2m_invalid;
>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> It is done to track which page was modified by a guest.
>>>>>>>>>>>>> But then (again) the name doesn't convey what the function does.
>>>>>>>>>>>> Then probably p2me_type_is_valid(struct p2m_domain *p2m, pte_t
>>>>>>>>>>>> pte) would better.
>>>>>>>>>>> For P2M type checks please don't invent new naming, but use what
>>>>>>>>>>> both x86
>>>>>>>>>>> and Arm are already using. Note how we already have p2m_is_valid()
>>>>>>>>>>> in that
>>>>>>>>>>> set. Just that it's not doing what you want here.
>>>>>>>>>> Hm, why not doing what I want? p2m_is_valid() verifies if P2M entry
>>>>>>>>>> is valid.
>>>>>>>>>> And in here it is checked if P2M pte is valid from P2M point of view
>>>>>>>>>> by checking
>>>>>>>>>> the type in radix tree and/or in reserved PTEs bits (just to remind
>>>>>>>>>> we have only 2
>>>>>>>>>> free bits for type).
>>>>>>>>> Because this is how it's defined on x86:
>>>>>>>>>
>>>>>>>>> #define p2m_is_valid(_t) (p2m_to_mask(_t) & \
>>>>>>>>> (P2M_RAM_TYPES |
>>>>>>>>> p2m_to_mask(p2m_mmio_direct)))
>>>>>>>>>
>>>>>>>>> I.e. more strict that simply "!= p2m_invalid". And I think such
>>>>>>>>> predicates
>>>>>>>>> would better be uniform across architectures, such that in principle
>>>>>>>>> they
>>>>>>>>> might also be usable in common code (as we already do with
>>>>>>>>> p2m_is_foreign()).
>>>>>>>> Yeah, Arm isn't so strict in definition of p2m_is_valid() and it seems
>>>>>>>> like
>>>>>>>> x86 and Arm have different understanding what is valid.
>>>>>>>>
>>>>>>>> Except what mentioned in the comment that grant types aren't
>>>>>>>> considered valid
>>>>>>>> for x86 (and shouldn't be the same then for Arm?), it isn't clear why
>>>>>>>> x86's
>>>>>>>> p2m_is_valid() is stricter then Arm's one and if other arches should
>>>>>>>> be also
>>>>>>>> so strict.
>>>>>>> Arm's p2m_is_valid() is entirely different (and imo misnamed, but
>>>>>>> arguably one
>>>>>>> could also consider x86'es to require a better name). It's a local
>>>>>>> helper, not
>>>>>>> a P2M type checking predicate. With that in mind, you may of course
>>>>>>> follow
>>>>>>> Arm's model, but in the longer run we may need to do something about
>>>>>>> the name
>>>>>>> collision then.
>>>>>>>
>>>>>>>>>> The only use case I can think of is that the caller
>>>>>>>>>> might try to map the remaining GFNs again. But that doesn’t seem
>>>>>>>>>> very useful,
>>>>>>>>>> if|p2m_set_entry()| wasn’t able to map the full range, it likely
>>>>>>>>>> indicates a serious
>>>>>>>>>> issue, and retrying would probably result in the same error.
>>>>>>>>>>
>>>>>>>>>> The same applies to rolling back the state. It wouldn’t be difficult
>>>>>>>>>> to add a local
>>>>>>>>>> array to track all modified PTEs and then use it to revert the state
>>>>>>>>>> if needed.
>>>>>>>>>> But again, what would the caller do after the rollback? At this
>>>>>>>>>> point, it still seems
>>>>>>>>>> like the best option is simply to|panic(). |
>>>>>>>>>>
>>>>>>>>>> Basically, I don’t see or understand the cases where knowing how
>>>>>>>>>> many GFNs were
>>>>>>>>>> successfully mapped, or whether a rollback was performed, would
>>>>>>>>>> really help — because
>>>>>>>>>> in most cases, I don’t have a better option than just
>>>>>>>>>> calling|panic()| at the end.
>>>>>>>>> panic()-ing is of course only a last resort. Anything related to
>>>>>>>>> domain handling
>>>>>>>>> would better crash only the domain in question. And even that only if
>>>>>>>>> suitable
>>>>>>>>> error handling isn't possible.
>>>>>>>> And if there is no still any runnable domain available, for example,
>>>>>>>> we are creating
>>>>>>>> domain and some p2m mapping is called? Will it be enough just ignore
>>>>>>>> to boot this domain?
>>>>>>>> If yes, then it is enough to return only error code without returning
>>>>>>>> how many GFNs were
>>>>>>>> mapped or rollbacking as domain won't be ran anyway.
>>>>>>> During domain creation all you need to do is return an error. But when
>>>>>>> you write a
>>>>>>> generic function that's also (going to be) used at domain runtime, you
>>>>>>> need to
>>>>>>> consider what to do there in case of partial success.
>>>>>>>
>>>>>>>>>> For example, if I call|map_regions_p2mt()| for an MMIO region
>>>>>>>>>> described in a device
>>>>>>>>>> tree node, and the mapping fails partway through, I’m left with two
>>>>>>>>>> options: either
>>>>>>>>>> ignore the device (if it's not essential for Xen or guest
>>>>>>>>>> functionality) and continue
>>>>>>>>>> booting; in which case I’d need to perform a rollback, and
>>>>>>>>>> simply knowing the number
>>>>>>>>>> of successfully mapped GFNs may not be enough or, more likely, just
>>>>>>>>>> panic.
>>>>>>>>> Well, no. For example, before even trying to map you could check that
>>>>>>>>> the range
>>>>>>>>> of P2M entries covered is all empty.
>>>>>>>> Could it be that they aren't all empty? Then it seems like we have
>>>>>>>> overlapping and we can't
>>>>>>>> just do a mapping, right?
>>>>>>> Possibly that would simply mean to return an error, yes.
>>>>>>>
>>>>>>>> Won't be this procedure consume a lot of time as it is needed to go
>>>>>>>> through each page
>>>>>>>> tables for each entry.
>>>>>>> Well, you're free to suggest a clean alternative without doing so.
>>>>>> I thought about dynamically allocating an array in p2m_set_entry(),
>>>>>> where to save all changed PTEs,
>>>>>> and then use it to roll back if __p2m_set_entry() returns rc != 0 ...
>>>> That's another possible source for failure, and such an allocation may end
>>>> up being a rather big one.
>>>>
>>>>>>>>> _Then_ you know how to correctly roll back.
>>>>>>>>> And yes, doing so may not even require passing back information on
>>>>>>>>> how much of
>>>>>>>>> a region was successfully mapped.
>>>>>>>> If P2M entries were empty before start of the mapping then it is
>>>>>>>> enough to just go
>>>>>>>> through the same range (sgfn,nr,smfn) and just clean them, right?
>>>>>>> Yes, what else would "roll back" mean in that case?
>>>>>> ... If we know that the P2M entries were empty, then there's nothing
>>>>>> else to be done, just
>>>>>> clean PTE is needed to be done.
>>>>>> However, if the P2M entries weren’t empty (and I’m still not sure
>>>>>> whether that’s a legal
>>>>>> case), then rolling back would mean restoring their original state, the
>>>>>> state they
>>>>>> had before the P2M mapping procedure started.
>>>>> Possible roll back is harder to implement as expected because there is a
>>>>> case where subtree
>>>>> could be freed:
>>>>> /*
>>>>> * Free the entry only if the original pte was valid and the base
>>>>> * is different (to avoid freeing when permission is changed).
>>>>> */
>>>>> if ( p2me_is_valid(p2m, orig_pte) &&
>>>>> !mfn_eq(pte_get_mfn(*entry), pte_get_mfn(orig_pte)) )
>>>>> p2m_free_subtree(p2m, orig_pte, level);
>>>>> In this case then it will be needed to store the full subtree.
>>>> Right, which is why it may be desirable to limit the ability to update
>>>> multiple
>>>> entries in one go. Or work from certain assumptions, violation of which
>>>> would
>>>> cause the domain to be crashed.
>>> It seems to me that the main issue with updating multiple entries in one go
>>> is the rollback
>>> mechanism in case of a partial mapping failure. (other issues? mapping
>>> could consume a lot
>>> of time so something should wait while allocation will end?) In my opinion,
>>> the rollback
>>> mechanism is quite complex to implement and could become a source of
>>> further failures.
>>> For example, most of the cases where p2m_set_entry() could fail are due to
>>> failure in
>>> mapping the page table (to allow Xen to walk through it) or failure in
>>> creating a new page
>>> table due to memory exhaustion. Then, during rollback, which might also
>>> require memory
>>> allocation, we could face the same memory shortage issue.
>>> And what should be done in that case?
>>>
>>> In my opinion, the best option is to simply return from p2m_set_entry() the
>>> number of
>>> successfully mapped GFNs (stored in rc which is returned by
>>> p2m_set_entry()) and let
>>> the caller decide how to handle the partial mapping:
>>> 1. If a partial mapping occurs during domain creation, we could just report
>>> that this
>>> domain can't be created and continue without it if there are other
>>> domains to start;
>>> otherwise, panic.
>> I don't see how panic()-ing is relevant here. That's to be decided (far) up
>> the call stack.
>
> So it's just a question of whether the caller should panic() or propagate the
> return
> value (error code) up the call stack.
>
> For example, in case of domain construction return value is propogate almost
> to the top
> of the stack:
> p2m_set_entry(p2m_access_t a, p2m_type_t t, mfn_t smfn, unsigned long nr,
> gfn_t sgfn, struct p2m_domain * p2m)
> (/run/media/ok/blue_disk//xen/xen/arch/riscv/p2m.c:1005)
> p2m_insert_mapping(struct domain * d, gfn_t start_gfn, unsigned long nr,
> mfn_t mfn, p2m_type_t t)
> (/run/media/ok/blue_disk//xen/xen/arch/riscv/p2m.c:1055)
> guest_physmap_add_entry(struct domain * d, gfn_t gfn, mfn_t mfn, unsigned
> long page_order, p2m_type_t t)
> (/run/media/ok/blue_disk//xen/xen/arch/riscv/p2m.c:1076)
> guest_physmap_add_page(unsigned int page_order, struct domain * d)
> (/run/media/ok/blue_disk//xen/xen/arch/riscv/include/asm/p2m.h:152)
> guest_map_pages(struct domain * d, struct page_info * pg, unsigned int
> order, void * extra)
> (/run/media/ok/blue_disk//xen/xen/common/device-tree/domain-build.c:63)
> allocate_domheap_memory(struct domain * d, paddr_t tot_size,
> alloc_domheap_mem_cb cb, void * extra)
> (/run/media/ok/blue_disk//xen/xen/common/device-tree/domain-build.c:47)
> allocate_bank_memory(struct kernel_info * kinfo, gfn_t sgfn, paddr_t
> tot_size)
> (/run/media/ok/blue_disk//xen/xen/common/device-tree/domain-build.c:99)
> allocate_memory(struct domain * d, struct kernel_info * kinfo)
> (/run/media/ok/blue_disk//xen/xen/include/xen/mm-frame.h:43)
> construct_domU(struct domain * d, const struct dt_device_node * node)
> (/run/media/ok/blue_disk//xen/xen/common/device-tree/dom0less-build.c:835)
> create_domUs()
> (/run/media/ok/blue_disk//xen/xen/common/device-tree/dom0less-build.c:1019)
> start_xen(unsigned long bootcpu_id, paddr_t dtb_addr)
> (/run/media/ok/blue_disk//xen/xen/arch/riscv/setup.c:296)
> start() (/run/media/ok/blue_disk//xen/xen/arch/riscv/riscv64/head.S:61)
>
> And panic() almost at the end:
> rc = construct_domU(d, node);
> if ( rc )
> panic("Could not set up domain %s (rc = %d)\n",
> dt_node_name(node), rc);
Which is what is wanted, imo.
Jan
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |