Re: [PATCH v1 2/6] xen/riscv: introduce things necessary for p2m initialization
On 05.06.2025 16:10, Oleksii Kurochko wrote:
>
> On 6/2/25 1:04 PM, Jan Beulich wrote:
>> On 23.05.2025 11:44, Oleksii Kurochko wrote:
>>> On 5/22/25 6:09 PM, Jan Beulich wrote:
>>>> On 22.05.2025 17:53, Oleksii Kurochko wrote:
>>>>> On 5/20/25 3:37 PM, Jan Beulich wrote:
>>>>>> On 09.05.2025 17:57, Oleksii Kurochko wrote:
>>>>>>> +static struct page_info *p2m_get_clean_page(struct domain *d)
>>>>>>> +{
>>>>>>> +    struct page_info *page;
>>>>>>> +
>>>>>>> +    /*
>>>>>>> +     * As explained in Section 18.5.1 of the Privileged Architecture
>>>>>>> +     * Spec (version 20240411), for the paged virtual-memory schemes
>>>>>>> +     * (Sv32x4, Sv39x4, Sv48x4, and Sv57x4), the root page table is
>>>>>>> +     * 16 KiB and must be aligned to a 16-KiB boundary.
>>>>>>> +     */
>>>>>>> +    page = alloc_domheap_pages(NULL, 2, 0);
>>>>>> Shouldn't this allocation come from the domain's P2M pool (which is yet
>>>>>> to be introduced)?
>>>>> First, I will drop p2m_get_clean_page() as it will be used only for p2m
>>>>> root page table allocation.
>>>>>
>>>>> p2m_init() is called by domain_create()
>>>>> [->arch_domain_create()->p2m_init()] from create_domUs():
>>>>> [https://gitlab.com/xen-project/xen/-/blob/staging/xen/common/device-tree/dom0less-build.c?ref_type=heads#L984].
>>>>>
>>>>> When p2m_init() is called, the p2m pool isn't ready and the domain isn't
>>>>> created yet. The latter is also crucial for usage of the p2m pool, as
>>>>> the pool belongs to the domain and thereby uses alloc_domheap_page(d,
>>>>> ...) (not NULL as for the allocation of the p2m root table above), so
>>>>> the domain should be created first.
>>>> Yet that is part of my point: This allocation should be against the
>>>> domain, so it is properly accounted. What's the problem with allocating
>>>> the root table when the pool is being created / filled?
>>> I can't use pages from the pool for the root table as they aren't
>>> properly aligned.
>>>
>>> At the moment, creation of the p2m pool looks like:
>>> int p2m_set_allocation(struct domain *d, unsigned long pages, bool *preempted)
>>> {
>>>     struct page_info *pg;
>>>
>>>     ASSERT(spin_is_locked(&d->arch.paging.lock));
>>>
>>>     for ( ; ; )
>>>     {
>>>         if ( d->arch.paging.p2m_total_pages < pages )
>>>         {
>>>             /* Need to allocate more memory from domheap */
>>>             pg = alloc_domheap_page(d, MEMF_no_owner);
>>>             if ( pg == NULL )
>>>             {
>>>                 printk(XENLOG_ERR "Failed to allocate P2M pages.\n");
>>>                 return -ENOMEM;
>>>             }
>>>             ACCESS_ONCE(d->arch.paging.p2m_total_pages) =
>>>                 d->arch.paging.p2m_total_pages + 1;
>>>             page_list_add_tail(pg, &d->arch.paging.p2m_freelist);
>>>         }
>>>         ...
>>>     }
>>>
>>>     return 0;
>>> }
>>> alloc_domheap_page(d, MEMF_no_owner) allocates a single page of order 0,
>>> i.e. only 4k-aligned. But as I need 16k for the root table, aligned to a
>>> 16k boundary, I still have to use alloc_domheap_pages(NULL, 2, 0);
>>>
>>> Or do you mean that I have to do something like:
>>> int p2m_set_allocation(struct domain *d, unsigned long pages, bool *preempted)
>>> {
>>>     struct page_info *pg;
>>>
>>>     ASSERT(spin_is_locked(&d->arch.paging.lock));
>>>
>>> +   if ( !d->arch.p2m.root )
>>> +   {
>>> +       unsigned int order = get_order_from_bytes(KB(16));
>>> +       unsigned int nr_pages = _AC(1,U) << order;
>>> +
>>> +       /*
>>> +        * As explained in Section 18.5.1 of the Privileged Architecture
>>> +        * Spec (version 20240411), for the paged virtual-memory schemes
>>> +        * (Sv32x4, Sv39x4, Sv48x4, and Sv57x4), the root page table is
>>> +        * 16 KiB and must be aligned to a 16-KiB boundary.
>>> +        */
>>> +       d->arch.p2m.root = alloc_domheap_pages(d, order, MEMF_no_owner);
>>> +       if ( d->arch.p2m.root == NULL )
>>> +           panic("root page table hasn't been allocated\n");
>>> +
>>> +       clear_and_clean_page(d->arch.p2m.root);
>>> +
>>> +       /* TODO: do I need a TLB flush here? */
>>> +
>>> +       ACCESS_ONCE(d->arch.paging.p2m_total_pages) =
>>> +           d->arch.paging.p2m_total_pages + nr_pages;
>>> +   }
>>> +
>>>     ...
>>> }
>> Neither. I was thinking of you taking 4 pages off the pool in exchange for
>> the order-2 allocation. Primarily to get the memory accounting right (or
>> at least closer to reality).
>
> Do you mean that I have to call page_list_remove_head() 4 times, something
> like that:
> --- a/xen/arch/riscv/p2m.c
> +++ b/xen/arch/riscv/p2m.c
> @@ -215,6 +215,44 @@ int p2m_set_allocation(struct domain *d, unsigned long pages, bool *preempted)
>          }
>      }
>
> +    if ( !d->arch.p2m.root )
> +    {
> +        unsigned int order = get_order_from_bytes(KB(16));
> +        unsigned int nr_pages = _AC(1,U) << order;
> +
> +        if ( ACCESS_ONCE(d->arch.paging.p2m_total_pages) < nr_pages )
> +            panic("Specify more xen,domain-p2m-mem-mb\n");
> +
> +        /*
> +         * As explained in Section 18.5.1 of the Privileged Architecture
> +         * Spec (version 20240411), for the paged virtual-memory schemes
> +         * (Sv32x4, Sv39x4, Sv48x4, and Sv57x4), the root page table is
> +         * 16 KiB and must be aligned to a 16-KiB boundary.
> +         */
> +        d->arch.p2m.root = alloc_domheap_pages(NULL, order, 0);
Imo you'd better not use NULL here, but instead pass MEMF_no_owner. See
respective x86 code. I also think you want to do the freeing first, and
only then do this allocation, such that ...
> +        if ( d->arch.p2m.root == NULL )
> +            panic("failed to allocate p2m root page table\n");
> +
> +        for ( unsigned int i = 0; i < nr_pages; i++ )
> +        {
> +            clear_and_clean_page(d->arch.p2m.root + i);
> +
> +            /* Return memory to domheap */
> +            pg = page_list_remove_head(&d->arch.paging.p2m_freelist);
> +            if ( pg )
> +            {
> +                ACCESS_ONCE(d->arch.paging.p2m_total_pages)--;
> +                free_domheap_page(pg);
> +            }
> +            else
> +            {
> +                printk(XENLOG_ERR
> +                       "Failed to free P2M pages, P2M freelist is empty.\n");
> +                return -ENOMEM;
... this path will not have eaten more memory than was given back.
>>>>>>> +static int p2m_alloc_table(struct domain *d)
>>>>>>> +{
>>>>>>> +    struct p2m_domain *p2m = p2m_get_hostp2m(d);
>>>>>>> +
>>>>>>> +    p2m->root = p2m_allocate_root(d);
>>>>>>> +    if ( !p2m->root )
>>>>>>> +        return -ENOMEM;
>>>>>>> +
>>>>>>> +    p2m->hgatp = hgatp_from_page_info(p2m->root);
>>>>>>> +
>>>>>>> +    /*
>>>>>>> +     * Make sure that all TLBs corresponding to the new VMID are
>>>>>>> +     * flushed before using it.
>>>>>>> +     */
>>>>>>> +    p2m_write_lock(p2m);
>>>>>>> +    p2m_force_tlb_flush_sync(p2m);
>>>>>>> +    p2m_write_unlock(p2m);
>>>>>> While Andrew directed you towards a better model in general, it won't be
>>>>>> usable here then, as the guest didn't run on any pCPU(s) yet. Imo you
>>>>>> want to do a single global flush e.g. when VMIDs wrap around. That'll be
>>>>>> fewer global flushes than one per VM creation.
>>>>> I am not sure that I get the phrase 'VMIDs wrap around'.
>>>> You have to allocate them somehow. Typically you'll use the next one
>>>> available. At some point you will need to start over, searching from the
>>>> beginning. Prior to that, no allocation of a new one will require any
>>>> flush, as none of them had been in use before (after boot or the last
>>>> such flush).
>>> Thanks. Now I get your point.
>>>
>>> Won't it be better to do the TLB flushing when destroying a domain? Then
>>> we will be sure that TLB entries connected to the freed VMID aren't
>>> present in the TLB anymore.
>> That's an option, but will result in more flushes. Furthermore there may be
>> reasons to change the VMID for a domain while it's running.
>>
>>> IIUC, it will work only if VMID is used, right?
>> Well, anything VMID related is of course only relevant when VMIDs are in use.
>>
>>> In case VMID isn't used, probably we can drop the flushing here and do a
>>> flush during booting, right?
>> That'll be too little flushing?
>
> I meant that instead of having the TLB flush in p2m_alloc_table() we could
> have one flush during booting. And of course, we still should have a flush
> on each context switch.
Yet as said - context switches are likely too frequent for having an
unconditional flush there (if it can be avoided).
>>> Won't flushing of the guest TLB only during a context switch be enough?
>> "only" is interesting here. Context switches are a relatively frequent
>> operation, which in addition you want to be fast. If a flush is necessary
>> there for correctness (e.g. when VMIDs aren't in use), you have to do it
>> there. But if you can flush less frequently without violating correctness,
>> you'd almost always want to use such an opportunity.
>
> Then it is better to introduce VMIDs now; it seems there's only one place
> where the VMID should be set, when hgatp is initialized.
>
> Does Xen have some framework to work with VMIDs?
That's all arch-specific, I think.
>>>>> I am going to implement p2m_force_tlb_flush_sync() as:
>>>>> static void p2m_force_tlb_flush_sync(struct p2m_domain *p2m)
>>>>> {
>>>>>     ...
>>>>>     sbi_remote_hfence_gvma(d->dirty_cpumask, 0, 0);
>>>>>     ...
>>>>> }
>>>>>
>>>>> With such an implementation, if the guest didn't run on any pCPU(s)
>>>>> yet, then d->dirty_cpumask is empty and sbi_remote_hfence_gvma() will
>>>>> do nothing, as hmask will be NULL
>>>>> (https://gitlab.com/xen-project/people/olkur/xen/-/blob/staging/xen/arch/riscv/sbi.c?ref_type=heads#L238).
>>>>> I am not sure that it is a good idea, as I can't find a guarantee in
>>>>> the spec that the TLB will be empty during boot time.
>>>> If in doubt, do one global flush while booting.
>>> By booting you mean somewhere in continue_new_vcpu()?
>> I don't particularly mean any specific place. However, continue_new_vcpu()
>> (by its name) isn't involved in bringing up Xen, is it?
>>
> No, it isn't. By booting here I meant a boot of a guest domain, not Xen
> itself.
Please don't call this "booting", but "starting of a guest" (or "launching" or
some such). When you originally said "booting" I thought RISC-V wouldn't
guarantee clean TLBs when being booted, and hence suggested to cover for this
by doing a single flush during (Xen) boot. Looks like this may not be needed
then, simply because of the misunderstanding.
Jan