Re: [PATCH 06/22] x86: map/unmap pages in restore_all_guests
On 22.06.2023 12:44, Julien Grall wrote:
> On 13/01/2023 09:22, Jan Beulich wrote:
>> On 13.01.2023 00:20, Julien Grall wrote:
>>> On 04/01/2023 10:27, Jan Beulich wrote:
>>>> On 23.12.2022 13:22, Julien Grall wrote:
>>>>> On 22/12/2022 11:12, Jan Beulich wrote:
>>>>>> On 16.12.2022 12:48, Julien Grall wrote:
>>>>>>> --- a/xen/arch/x86/x86_64/entry.S
>>>>>>> +++ b/xen/arch/x86/x86_64/entry.S
>>>>>>> @@ -165,7 +165,24 @@ restore_all_guest:
>>>>>>>          and   %rsi, %rdi
>>>>>>>          and   %r9, %rsi
>>>>>>>          add   %rcx, %rdi
>>>>>>> -        add   %rcx, %rsi
>>>>>>> +
>>>>>>> +        /*
>>>>>>> +         * Without a direct map, we have to map first before copying.
>>>>>>> +         * We only need to map the guest root table but not the
>>>>>>> +         * per-CPU root_pgt, because the latter is still a xenheap
>>>>>>> +         * page.
>>>>>>> +         */
>>>>>>> +        pushq %r9
>>>>>>> +        pushq %rdx
>>>>>>> +        pushq %rax
>>>>>>> +        pushq %rdi
>>>>>>> +        mov   %rsi, %rdi
>>>>>>> +        shr   $PAGE_SHIFT, %rdi
>>>>>>> +        callq map_domain_page
>>>>>>> +        mov   %rax, %rsi
>>>>>>> +        popq  %rdi
>>>>>>> +        /* Stash the pointer for unmapping later. */
>>>>>>> +        pushq %rax
>>>>>>> +
>>>>>>>          mov   $ROOT_PAGETABLE_FIRST_XEN_SLOT, %ecx
>>>>>>>          mov   root_table_offset(SH_LINEAR_PT_VIRT_START)*8(%rsi), %r8
>>>>>>>          mov   %r8, root_table_offset(SH_LINEAR_PT_VIRT_START)*8(%rdi)
>>>>>>> @@ -177,6 +194,14 @@ restore_all_guest:
>>>>>>>          sub   $(ROOT_PAGETABLE_FIRST_XEN_SLOT - \
>>>>>>>                  ROOT_PAGETABLE_LAST_XEN_SLOT - 1) * 8, %rdi
>>>>>>>          rep movsq
>>>>>>> +
>>>>>>> +        /* Unmap the page. */
>>>>>>> +        popq  %rdi
>>>>>>> +        callq unmap_domain_page
>>>>>>> +        popq  %rax
>>>>>>> +        popq  %rdx
>>>>>>> +        popq  %r9
>>>>>>
>>>>>> While the PUSH/POP are part of what I dislike here, I think this wants
>>>>>> doing differently: Establish a mapping when putting in place a new
>>>>>> guest page table, and use the pointer here. This could be a new
>>>>>> per-domain mapping, to limit its visibility.
>>>>>
>>>>> I have looked at a per-domain approach and this looks way more complex
>>>>> than the few concise lines here (not to mention the extra amount of
>>>>> memory).
>>>>
>>>> Yes, I do understand that would be a more intrusive change.
>>>
>>> I could be persuaded to look at a more intrusive change if there is a
>>> good reason to do it. To me, at the moment, it mostly seems a matter
>>> of taste.
>>>
>>> So what would we gain from a perdomain mapping?
>>
>> Rather than mapping/unmapping once per hypervisor entry/exit, we'd
>> map just once per context switch. Plus we'd save ugly/fragile assembly
>> code (apart from the push/pop I also dislike C functions being called
>> from assembly which aren't really meant to be called this way: While
>> these two may indeed be unlikely to ever change, any such change comes
>> with the risk of the assembly callers being missed - the compiler
>> won't tell you that e.g. argument types/count don't match parameters
>> anymore).
>
> I think I have managed to write what you suggested. I would like to
> share it to get early feedback before resending the series.
>
> There are also a couple of TODOs (XXX) in place where I am not sure if
> this is correct.

Sure, some comments below. But note that this isn't a full review.

One remark up front: The CR3 part of the names isn't matching what you
map, as it's not the register but the page that it points to. I'd
suggest "rootpt" (or "root_pt") as the respective part of the names
instead.
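(For orientation: the scheme in the hunks below gives each vCPU one L1
slot, i.e. one page of VA space, in a per-domain area, into which
make_cr3() maps the guest's root page table; the exit path can then
derive the mapping address from the vCPU id alone, with no
map_domain_page() call from assembly. A minimal sketch of the address
computation, where SHADOW_CR3_VIRT_START and guest_root_pt() are
illustrative stand-in names, not part of the quoted patch:

    /*
     * Sketch only: one L1 entry (one page of VA space) per vCPU,
     * starting at some hypothetical per-domain base address
     * SHADOW_CR3_VIRT_START.
     */
    static inline root_pgentry_t *guest_root_pt(const struct vcpu *v)
    {
        return (root_pgentry_t *)(SHADOW_CR3_VIRT_START +
                                  (unsigned long)v->vcpu_id * PAGE_SIZE);
    }

This layout is also why the assembly further down only needs to compute
"vcpu_id * PAGE_SIZE".)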
> --- a/xen/arch/x86/mm.c
> +++ b/xen/arch/x86/mm.c
> @@ -509,6 +509,13 @@ void share_xen_page_with_guest(struct page_info *page, struct domain *d,
>      spin_unlock(&d->page_alloc_lock);
>  }
>
> +#define shadow_cr3_idx(v) \
> +    ((v)->vcpu_id >> PAGETABLE_ORDER)
> +
> +#define pv_shadow_cr3_pte(v) \
> +    ((v)->domain->arch.pv.shadow_cr3_l1tab[shadow_cr3_idx(v)] + \
> +     ((v)->vcpu_id & (L1_PAGETABLE_ENTRIES - 1)))
> +
>  void make_cr3(struct vcpu *v, mfn_t mfn)
>  {
>      struct domain *d = v->domain;
> @@ -516,6 +523,18 @@ void make_cr3(struct vcpu *v, mfn_t mfn)
>      v->arch.cr3 = mfn_x(mfn) << PAGE_SHIFT;
>      if ( is_pv_domain(d) && d->arch.pv.pcid )
>          v->arch.cr3 |= get_pcid_bits(v, false);
> +
> +    /* Update the CR3 mapping */
> +    if ( is_pv_domain(d) )
> +    {
> +        l1_pgentry_t *pte = pv_shadow_cr3_pte(v);
> +
> +        /* XXX Do we need to call get page first? */

I don't think so. You piggy-back on the reference obtained when the
page address is stored in v->arch.cr3. What you need to be sure of
though is that there can't be a stale mapping left once that value is
replaced. I think the place here is the one central one, but this will
want double checking.

> +        l1e_write(pte, l1e_from_mfn(mfn, __PAGE_HYPERVISOR_RW));
> +        /* XXX Can the flush be reduced to the page? */

I think so; any reason you think more needs flushing? I'd rather raise
the question whether any flushing is needed at all. Before this mapping
can come into use, there necessarily is a CR3 write. See also below.

> +        /* XXX Do we always call with current? */

I don't think we do. See e.g. arch_set_info_guest() or some of the
calls here from shadow code. However, I think when v != current, it is
always the case that v is paused. In which case no flushing would be
needed at all then, only when v == current.

Another question is whether this is the best place to make the mapping.
On one hand it is true that the way you do it, the mapping isn't even
re-written on each context switch. Otoh having it in write_ptbase() may
be the more natural (easier to prove as correct, and that no dangling
mappings can be left) place. For example then you'll know that
v == current in all cases (avoiding the other code paths, examples of
which I gave above). Plus explicit flushing can be omitted, as
switch_cr3_cr4() will always flush all non-global mappings.

> +        flush_tlb_local();
> +    }
>  }
>
>  void write_ptbase(struct vcpu *v)
> [...]
> --- a/xen/arch/x86/x86_64/entry.S
> +++ b/xen/arch/x86/x86_64/entry.S
> @@ -165,7 +165,16 @@ restore_all_guest:
>          and   %rsi, %rdi
>          and   %r9, %rsi
>          add   %rcx, %rdi
> +
> +        /*
> +         * The address in the vCPU cr3 is always mapped in the shadow
> +         * cr3 virt area.
> +         */
> +        mov   VCPU_id(%rbx), %rsi

The field is 32 bits, so you need to use %esi here.

> +        shl   $PAGE_SHIFT, %rsi

I wonder whether these two wouldn't sensibly be combined to

    imul $PAGE_SIZE, VCPU_id(%rbx), %esi

as the result is guaranteed to fit in 32 bits.

A final remark, with no good place to attach it to: The code path above
is bypassed when xpti is off for the domain. You may want to avoid all
of the setup (and mapping) in that case. This, btw, could be done quite
naturally if - as outlined above as an alternative - the mapping
occurred in write_ptbase(): The function already distinguishes the two
cases.

Jan
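To make the write_ptbase() alternative concrete, here is a rough,
untested sketch (not the actual patch): the body is modelled from
memory on the current function and abbreviated; pv_shadow_cr3_pte() is
the helper from the quoted hunk, while cr3_pa() and l1e_from_paddr()
are existing Xen helpers used to strip the PCID bits from v->arch.cr3
and to build the L1 entry.

    void write_ptbase(struct vcpu *v)
    {
        struct cpu_info *cpu_info = get_cpu_info();
        unsigned long new_cr4;

        new_cr4 = (is_pv_vcpu(v) && !is_idle_vcpu(v))
                  ? pv_make_cr4(v) : mmu_cr4_features;

        if ( is_pv_vcpu(v) && v->domain->arch.pv.xpti )
        {
            /*
             * Refresh the per-vCPU mapping of the guest root page table.
             * v == current is guaranteed here, and switch_cr3_cr4() below
             * flushes all non-global mappings, so no explicit TLB flush
             * is needed.  Domains with XPTI off never take this branch,
             * hence they also never pay for the mapping.
             */
            l1e_write(pv_shadow_cr3_pte(v),
                      l1e_from_paddr(cr3_pa(v->arch.cr3),
                                     __PAGE_HYPERVISOR_RW));

            cpu_info->root_pgt_changed = true;
            cpu_info->pv_cr3 = __pa(this_cpu(root_pgt));
            if ( new_cr4 & X86_CR4_PCIDE )
                cpu_info->pv_cr3 |= get_pcid_bits(v, true);
            switch_cr3_cr4(v->arch.cr3, new_cr4);
        }
        else
        {
            /* Non-XPTI path as today, unchanged. */
            cpu_info->use_pv_cr3 = false;
            cpu_info->xen_cr3 = 0;
            switch_cr3_cr4(v->arch.cr3, new_cr4);
            cpu_info->pv_cr3 = 0;
        }
    }

Compared with updating the entry in make_cr3(), this rewrites it once
per context switch rather than once per new page table, but it makes
the v == current and no-explicit-flush arguments above immediate.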