Re: [PATCH v1 2/6] xen/riscv: introduce things necessary for p2m initialization
On 05.06.2025 16:10, Oleksii Kurochko wrote:
>
> On 6/2/25 1:04 PM, Jan Beulich wrote:
>> On 23.05.2025 11:44, Oleksii Kurochko wrote:
>>> On 5/22/25 6:09 PM, Jan Beulich wrote:
>>>> On 22.05.2025 17:53, Oleksii Kurochko wrote:
>>>>> On 5/20/25 3:37 PM, Jan Beulich wrote:
>>>>>> On 09.05.2025 17:57, Oleksii Kurochko wrote:
>>>>>>> +static struct page_info *p2m_get_clean_page(struct domain *d)
>>>>>>> +{
>>>>>>> +    struct page_info *page;
>>>>>>> +
>>>>>>> +    /*
>>>>>>> +     * As explained in Section 18.5.1 of the Privileged Architecture
>>>>>>> +     * Spec (version 20240411), for the paged virtual-memory schemes
>>>>>>> +     * (Sv32x4, Sv39x4, Sv48x4, and Sv57x4), the root page table is
>>>>>>> +     * 16 KiB and must be aligned to a 16-KiB boundary.
>>>>>>> +     */
>>>>>>> +    page = alloc_domheap_pages(NULL, 2, 0);
>>>>>> Shouldn't this allocation come from the domain's P2M pool (which is yet
>>>>>> to be introduced)?
>>>>> First, I will drop p2m_get_clean_page() as it will be used only for p2m
>>>>> root page table allocation.
>>>>>
>>>>> p2m_init() is called by domain_create()
>>>>> [->arch_domain_create()->p2m_init()] from create_domUs():
>>>>> [https://gitlab.com/xen-project/xen/-/blob/staging/xen/common/device-tree/dom0less-build.c?ref_type=heads#L984].
>>>>>
>>>>> When p2m_init() is called, the p2m pool isn't ready and the domain isn't
>>>>> created yet. The latter is also crucial for usage of the p2m pool, as
>>>>> the pool belongs to the domain and thereby uses alloc_domheap_page(d,
>>>>> ...) (not NULL as for the allocation of the p2m root table above), so
>>>>> the domain should be created first.
>>>> Yet that is part of my point: This allocation should be against the
>>>> domain, so it is properly accounted. What's the problem with allocating
>>>> the root table when the pool is being created / filled?
>>> I can't use pages from the pool for the root table as they aren't
>>> properly aligned.
>>>
>>> At the moment, creation of the p2m pool looks like:
>>> int p2m_set_allocation(struct domain *d, unsigned long pages, bool *preempted)
>>> {
>>>     struct page_info *pg;
>>>
>>>     ASSERT(spin_is_locked(&d->arch.paging.lock));
>>>
>>>     for ( ; ; )
>>>     {
>>>         if ( d->arch.paging.p2m_total_pages < pages )
>>>         {
>>>             /* Need to allocate more memory from domheap */
>>>             pg = alloc_domheap_page(d, MEMF_no_owner);
>>>             if ( pg == NULL )
>>>             {
>>>                 printk(XENLOG_ERR "Failed to allocate P2M pages.\n");
>>>                 return -ENOMEM;
>>>             }
>>>             ACCESS_ONCE(d->arch.paging.p2m_total_pages) =
>>>                 d->arch.paging.p2m_total_pages + 1;
>>>             page_list_add_tail(pg, &d->arch.paging.p2m_freelist);
>>>         }
>>>         ...
>>>     }
>>>
>>>     return 0;
>>> }
>>> alloc_domheap_page(d, MEMF_no_owner) allocates a single page of order 0,
>>> i.e. only 4k-aligned. But as I need 16k for the root table, aligned to a
>>> 16k boundary, I still have to use alloc_domheap_pages(NULL, 2, 0);
>>>
>>> Or do you mean that I have to do something like:
>>> int p2m_set_allocation(struct domain *d, unsigned long pages, bool *preempted)
>>> {
>>>     struct page_info *pg;
>>>
>>>     ASSERT(spin_is_locked(&d->arch.paging.lock));
>>>
>>> +   if ( !d->arch.p2m.root )
>>> +   {
>>> +       unsigned int order = get_order_from_bytes(KB(16));
>>> +       unsigned int nr_pages = _AC(1,U) << order;
>>> +
>>> +       /*
>>> +        * As explained in Section 18.5.1 of the Privileged Architecture
>>> +        * Spec (version 20240411), for the paged virtual-memory schemes
>>> +        * (Sv32x4, Sv39x4, Sv48x4, and Sv57x4), the root page table is
>>> +        * 16 KiB and must be aligned to a 16-KiB boundary.
>>> +        */
>>> +       d->arch.p2m.root = alloc_domheap_pages(d, order, MEMF_no_owner);
>>> +       if ( d->arch.p2m.root == NULL )
>>> +           panic("root page table hasn't been allocated\n");
>>> +
>>> +       clear_and_clean_page(d->arch.p2m.root);
>>> +
>>> +       /* TODO: do I need a TLB flush here? */
>>> +
>>> +       ACCESS_ONCE(d->arch.paging.p2m_total_pages) =
>>> +           d->arch.paging.p2m_total_pages + nr_pages;
>>> +   }
>>> +
>>>     ...
>>> }
>> Neither. I was thinking of you taking 4 pages off the pool in exchange for
>> the order-2 allocation. Primarily to get the memory accounting right (or
>> at least closer to reality).
>
> Do you mean that I have to call page_list_remove_head() 4 times, something
> like that:
> --- a/xen/arch/riscv/p2m.c
> +++ b/xen/arch/riscv/p2m.c
> @@ -215,6 +215,44 @@ int p2m_set_allocation(struct domain *d, unsigned long pages, bool *preempted)
>          }
>      }
>
> +    if ( !d->arch.p2m.root )
> +    {
> +        unsigned int order = get_order_from_bytes(KB(16));
> +        unsigned int nr_pages = _AC(1,U) << order;
> +
> +        if ( ACCESS_ONCE(d->arch.paging.p2m_total_pages) < nr_pages )
> +            panic("Specify more xen,domain-p2m-mem-mb\n");
> +
> +        /*
> +         * As explained in Section 18.5.1 of the Privileged Architecture
> +         * Spec (version 20240411), for the paged virtual-memory schemes
> +         * (Sv32x4, Sv39x4, Sv48x4, and Sv57x4), the root page table is
> +         * 16 KiB and must be aligned to a 16-KiB boundary.
> +         */
> +        d->arch.p2m.root = alloc_domheap_pages(NULL, order, 0);
Imo you'd better not use NULL here, but instead pass MEMF_no_owner. See
respective x86 code. I also think you want to do the freeing first, and
only then do this allocation, such that ...
> +        if ( d->arch.p2m.root == NULL )
> +            panic("failed to allocate p2m root page table\n");
> +
> +        for ( unsigned int i = 0; i < nr_pages; i++ )
> +        {
> +            clear_and_clean_page(d->arch.p2m.root + i);
> +
> +            /* Return memory to domheap */
> +            pg = page_list_remove_head(&d->arch.paging.p2m_freelist);
> +            if ( pg )
> +            {
> +                ACCESS_ONCE(d->arch.paging.p2m_total_pages)--;
> +                free_domheap_page(pg);
> +            }
> +            else
> +            {
> +                printk(XENLOG_ERR
> +                       "Failed to free P2M pages, P2M freelist is empty.\n");
> +                return -ENOMEM;
... this path will not have eaten more memory than was given back.
>>>>>>> +static int p2m_alloc_table(struct domain *d)
>>>>>>> +{
>>>>>>> +    struct p2m_domain *p2m = p2m_get_hostp2m(d);
>>>>>>> +
>>>>>>> +    p2m->root = p2m_allocate_root(d);
>>>>>>> +    if ( !p2m->root )
>>>>>>> +        return -ENOMEM;
>>>>>>> +
>>>>>>> +    p2m->hgatp = hgatp_from_page_info(p2m->root);
>>>>>>> +
>>>>>>> +    /*
>>>>>>> +     * Make sure that all TLBs corresponding to the new VMID are
>>>>>>> +     * flushed before using it.
>>>>>>> +     */
>>>>>>> +    p2m_write_lock(p2m);
>>>>>>> +    p2m_force_tlb_flush_sync(p2m);
>>>>>>> +    p2m_write_unlock(p2m);
>>>>>> While Andrew directed you towards a better model in general, it won't be
>>>>>> usable here then, as the guest didn't run on any pCPU(s) yet. Imo you
>>>>>> want to do a single global flush e.g. when VMIDs wrap around. That'll be
>>>>>> fewer global flushes than one per VM creation.
>>>>> I am not sure that I get the phrase 'VMIDs wrap around'.
>>>> You have to allocate them somehow. Typically you'll use the next one
>>>> available. At some point you will need to start over, searching from the
>>>> beginning. Prior to that, no allocation of a new one will require any
>>>> flush, as none of them had been in use before (after boot or the last
>>>> such flush).
>>> Thanks. Now I get your point.
>>>
>>> Won't it be better to do the TLB flushing when destroying a domain? Then
>>> we will be sure that TLB entries connected to the freed VMID aren't
>>> present in the TLB anymore.
>> That's an option, but will result in more flushes. Furthermore there may be
>> reasons to change the VMID for a domain while it's running.
>>
>>> IIUC, it will work only if VMID is used, right?
>> Well, anything VMID related is of course only relevant when VMIDs are in use.
>>
>>> In case VMID isn't used, probably we can drop the flushing here and do a
>>> flush during booting, right?
>> That'll be too little flushing?
>
> I meant that instead of having the TLB flush in p2m_alloc_table() we could
> have one flush during booting. And of course, we still should have a flush
> on each context switch.
Yet as said - context switches are likely too frequent for having an
unconditional flush there (if it can be avoided).
>>> Won't flushing of the guest TLB only during a context switch be enough?
>> "only" is interesting here. Context switches are a relatively frequent
>> operation, which in addition you want to be fast. If a flush is necessary
>> there for correctness (e.g. when VMIDs aren't in use), you have to do it
>> there. But if you can flush less frequently without violating correctness,
>> you'd almost always want to use such an opportunity.
>
> Then it is better to introduce VMIDs now; it seems there's only one place
> where the VMID should be set, when hgatp is initialized.
>
> Does Xen have some framework to work with VMIDs?
That's all arch-specific, I think.
>>>>> I am going to implement p2m_force_tlb_flush_sync() as:
>>>>> static void p2m_force_tlb_flush_sync(struct p2m_domain *p2m)
>>>>> {
>>>>>     ...
>>>>>     sbi_remote_hfence_gvma(d->dirty_cpumask, 0, 0);
>>>>>     ...
>>>>> }
>>>>>
>>>>> With such an implementation, if the guest didn't run on any pCPU(s)
>>>>> yet, then d->dirty_cpumask is empty and sbi_remote_hfence_gvma() will
>>>>> do nothing, as hmask will be NULL
>>>>> (https://gitlab.com/xen-project/people/olkur/xen/-/blob/staging/xen/arch/riscv/sbi.c?ref_type=heads#L238).
>>>>> I am not sure that it is a good idea, as I can't find a guarantee in
>>>>> the spec that the TLB will be empty during boot time.
>>>> If in doubt, do one global flush while booting.
>>> By booting you mean somewhere in continue_new_vcpu()?
>> I don't particularly mean any specific place. However, continue_new_vcpu()
>> (by its name) isn't involved in bringing up Xen, is it?
>>
> No, it isn't. By booting here I meant a boot of a guest domain, not Xen
> itself.
Please don't call this "booting", but "starting of a guest" (or "launching" or
some such). When you originally said "booting" I thought RISC-V wouldn't
guarantee clean TLBs when being booted, and hence suggested to cover for this
by doing a single flush during (Xen) boot. Looks like this may not be needed
then, simply because of the misunderstanding.
Jan