Re: [Xen-devel] [PATCH v3] xen: introduce VCPUOP_register_runstate_phys_memory_area hypercall
Hello Jan,

On 07.06.19 17:23, Jan Beulich wrote:
>>>> On 24.05.19 at 20:12, <andrii.anisov@xxxxxxxxx> wrote:
>> From: Andrii Anisov <andrii_anisov@xxxxxxxx>
>>
>> The existing interface to register the runstate area with its virtual
>> address is prone to issues which became more obvious with KPTI
>> enablement in guests. The nature of those issues is the fact that the
>> guest could be interrupted by the hypervisor at any time, and there is
>> no guarantee that the registered virtual address can be translated
>> with the currently available guest page tables. Before KPTI such a
>> situation was only possible if the guest was caught in the middle of
>> PT processing (e.g. superpage shattering). With KPTI it also happens
>> whenever the guest runs userspace, so it has a pretty high
>> probability.
>
> Except when there's no need for KPTI in the guest in the first place,
> as is the case for x86-64 PV guests. I think this is worthwhile
> clarifying.
>
>> So it was agreed to register the runstate area with the guest's
>> physical address, so that its mapping is permanent from the
>> hypervisor's point of view. [1]
>>
>> The hypercall employs the same vcpu_register_runstate_memory_area
>> structure for the interface, but requires the registered area to not
>> cross a page boundary.
>>
>> [1] https://lists.xenproject.org/archives/html/xen-devel/2019-02/msg00416.html
>>
>> Signed-off-by: Andrii Anisov <andrii_anisov@xxxxxxxx>
>
> I would have really hoped that you would outline the intended
> interaction between both interfaces,

I'd suppose there is no specific interaction between the interfaces. I can
hardly imagine realistic use-cases where mixing them might be needed.

> such that while reviewing one can judge whether you're actually
> matching the goal. I think it is actually mandatory to make explicit in
> the public header what level of mixing is permitted, what is not
> permitted, and what mix of requests is simply undefined.
>
> In particular, while we did work out during prior discussions that some
> level of mixing has to be permitted, I'm unconvinced that arbitrary
> mixing has to be allowed. For example, switching from one model to
> another could be permitted only with just a single active vCPU. This
> might allow to do away again with the runstate_in_use field you add.

Well, my point here is to leave it as is, maybe adding more documentation.
If one likes shooting their own leg, we should only care that the ricochet
does not harm the hypervisor or other guests.
If you disagree, please suggest your interaction model, and I'll be happy
to implement it.

>> --- a/xen/arch/arm/domain.c
>> +++ b/xen/arch/arm/domain.c
>> @@ -274,17 +274,15 @@ static void ctxt_switch_to(struct vcpu *n)
>>      virt_timer_restore(n);
>>  }
>>
>> -/* Update per-VCPU guest runstate shared memory area (if registered). */
>> -static void update_runstate_area(struct vcpu *v)
>> +static void update_runstate_by_gvaddr(struct vcpu *v)
>>  {
>>      void __user *guest_handle = NULL;
>>
>> -    if ( guest_handle_is_null(runstate_guest(v)) )
>> -        return;
>> +    ASSERT(!guest_handle_is_null(runstate_guest_virt(v)));
>>
>>      if ( VM_ASSIST(v->domain, runstate_update_flag) )
>>      {
>> -        guest_handle = &v->runstate_guest.p->state_entry_time + 1;
>> +        guest_handle = &v->runstate_guest.virt.p->state_entry_time + 1;
>>          guest_handle--;
>>          v->runstate.state_entry_time |= XEN_RUNSTATE_UPDATE;
>>          __raw_copy_to_guest(guest_handle,
>> @@ -292,7 +290,7 @@ static void update_runstate_area(struct vcpu *v)
>>          smp_wmb();
>>      }
>>
>> -    __copy_to_guest(runstate_guest(v), &v->runstate, 1);
>> +    __copy_to_guest(runstate_guest_virt(v), &v->runstate, 1);
>
> In a prior version you did the mechanical part of adjusting the
> VA-based code in a prereq patch, aiding review. Is there a particular
> reason you folded everything into one patch now?

I silently followed the suggestion from George [1]. Any objections?

>> @@ -303,6 +301,53 @@ static void update_runstate_area(struct vcpu *v)
>>      }
>>  }
>>
>> +static void update_runstate_by_gpaddr(struct vcpu *v)
>> +{
>> +    struct vcpu_runstate_info *runstate =
>> +        (struct vcpu_runstate_info *)v->runstate_guest.phys;
>
> What's the cast for here?
>
>> @@ -1614,6 +1613,92 @@ bool update_runstate_area(struct vcpu *v)
>>      return rc;
>>  }
>>
>> +static bool update_runstate_by_gpaddr_native(struct vcpu *v)
>> +{
>> +    struct vcpu_runstate_info *runstate =
>> +        (struct vcpu_runstate_info *)v->runstate_guest.phys;
>> +
>> +    ASSERT(runstate != NULL);
>> +
>> +    if ( VM_ASSIST(v->domain, runstate_update_flag) )
>> +    {
>> +        runstate->state_entry_time |= XEN_RUNSTATE_UPDATE;
>> +        smp_wmb();
>> +        v->runstate.state_entry_time |= XEN_RUNSTATE_UPDATE;
>> +    }
>> +
>> +    memcpy(v->runstate_guest.phys, &v->runstate, sizeof(v->runstate));
>> +
>> +    if ( VM_ASSIST(v->domain, runstate_update_flag) )
>> +    {
>> +        runstate->state_entry_time &= ~XEN_RUNSTATE_UPDATE;
>> +        smp_wmb();
>> +        v->runstate.state_entry_time &= ~XEN_RUNSTATE_UPDATE;
>> +    }
>> +
>> +    return true;
>> +}
>
> I can't help thinking that this matches the Arm code. Can common things
> please be put in common code? I may be asking too much, but if the
> pre-existing implementations are similar enough (I didn't check)
> perhaps they could be folded in a first step, too?

The problem was thought to be the interface: on x86 update_runstate_area()
returns bool, but on ARM it is void. But for the common function
update_runstate_by_gpaddr_~native~() it would not be a problem, because we
will return a proper bool from the refactored update_runstate_area().
Thank you for the point.
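As a rough sketch only (assuming the ARM and x86 native layouts really are
identical, and keeping the existing name), the common helper in
xen/common/domain.c could look like:

static bool update_runstate_by_gpaddr(struct vcpu *v)
{
    /* No cast needed: runstate_guest.phys is a plain void pointer. */
    struct vcpu_runstate_info *runstate = v->runstate_guest.phys;

    ASSERT(runstate != NULL);

    if ( VM_ASSIST(v->domain, runstate_update_flag) )
    {
        runstate->state_entry_time |= XEN_RUNSTATE_UPDATE;
        smp_wmb();
        v->runstate.state_entry_time |= XEN_RUNSTATE_UPDATE;
    }

    /* Copy the whole runstate into the permanently mapped guest area. */
    memcpy(runstate, &v->runstate, sizeof(v->runstate));

    if ( VM_ASSIST(v->domain, runstate_update_flag) )
    {
        runstate->state_entry_time &= ~XEN_RUNSTATE_UPDATE;
        smp_wmb();
        v->runstate.state_entry_time &= ~XEN_RUNSTATE_UPDATE;
    }

    return true;
}

with both architectures' update_runstate_area() calling it for the
physical-address case.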
>> +static bool update_runstate_by_gpaddr_compat(struct vcpu *v)
>> +{
>> +    struct compat_vcpu_runstate_info *runstate =
>> +        (struct compat_vcpu_runstate_info *)v->runstate_guest.phys;
>> +
>> +    ASSERT(runstate != NULL);
>> +
>> +    if ( VM_ASSIST(v->domain, runstate_update_flag) )
>> +    {
>> +        runstate->state_entry_time |= XEN_RUNSTATE_UPDATE;
>> +        smp_wmb();
>> +        v->runstate.state_entry_time |= XEN_RUNSTATE_UPDATE;
>> +    }
>> +
>> +    {
>> +        struct compat_vcpu_runstate_info info;
>> +        XLAT_vcpu_runstate_info(&info, &v->runstate);
>> +        memcpy(v->runstate_guest.phys, &info, sizeof(info));
>> +    }
>> +    else
>> +        memcpy(v->runstate_guest.phys, &v->runstate, sizeof(v->runstate));
>
> This "else" does not seem to be paired with an if(). Does this build at
> all?

This particular one - no, it does not! And it is really strange, I remember
I checked patch compilation for x86. I'm looking through the git reflog to
figure out at which amend it became broken.
But I also do not completely understand the meaning of "_compat" and
whether it should be supported here.

>> --- a/xen/arch/x86/x86_64/domain.c
>> +++ b/xen/arch/x86/x86_64/domain.c
>> @@ -12,6 +12,8 @@ CHECK_vcpu_get_physid;
>>  #undef xen_vcpu_get_physid
>>
>> +extern void discard_runstate_area(struct vcpu *v);
>
> No, this is not allowed. The declaration must be visible to both
> consumer and producer, so that when either side is changed things won't
> build until the other side gets changed too.
>
>> @@ -35,8 +37,16 @@ arch_compat_vcpu_op(
>>               !compat_handle_okay(area.addr.h, 1) )
>>              break;
>>
>> +        while( xchg(&v->runstate_in_use, 1) == 0);
>
> At the very least such loops want a cpu_relax() in their bodies. But
> this being on a hypercall path - are there theoretical guarantees that
> a guest can't abuse this to lock up a CPU?
>
> Furthermore I don't understand how this is supposed to work in the
> first place. The first xchg() will store 1, no matter what. Thus on the
> second iteration you'll find the wanted value unless the other side
> stored 0. Yet the other side is a simple xchg() too. Hence its first
> attempt would fail, but its second attempt (while we didn't exit the
> critical region here yet) would succeed.
>
> Also - style.

I will leave this part for the locking issue discussion.
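Just to note the direction, as a sketch only and assuming we keep a simple
busy flag at all rather than a proper lock: the polarity of the test needs
to flip and the body wants a cpu_relax(), e.g.

    /* Acquire: spin while somebody else holds the area (old value was 1). */
    while ( xchg(&v->runstate_in_use, 1) != 0 )
        cpu_relax();

    /* ... (un)register or update the runstate area ... */

    /* Release: make the updates visible before dropping the flag. */
    smp_wmb();
    v->runstate_in_use = 0;

This of course still does not answer your concern about a guest keeping the
flag held from a hypercall path.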
>> --- a/xen/common/domain.c
>> +++ b/xen/common/domain.c
>> @@ -698,6 +698,74 @@ int rcu_lock_live_remote_domain_by_id(domid_t dom, struct domain **d)
>>      return 0;
>>  }
>>
>> +static void unmap_runstate_area(struct vcpu *v)
>> +{
>> +    mfn_t mfn;
>> +
>> +    if ( ! v->runstate_guest.phys )
>
> Stray blank after ! .

OK.

>> +        return;
>> +
>> +    mfn = domain_page_map_to_mfn(v->runstate_guest.phys);
>> +
>> +    unmap_domain_page_global((void *)
>> +                             ((unsigned long)v->runstate_guest.phys &
>> +                              PAGE_MASK));
>> +
>> +    v->runstate_guest.phys = NULL;
>
> I think you would better store NULL before unmapping.

Do you mean using a local variable to pass the address to
unmap_domain_page_global(), and setting v->runstate_guest.phys to NULL
prior to unmapping?
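I.e. something along these lines (a sketch only, with the page reference
handling left out):

static void unmap_runstate_area(struct vcpu *v)
{
    void *va = v->runstate_guest.phys;

    if ( !va )
        return;

    /* Clear the registered pointer before tearing the mapping down. */
    v->runstate_guest.phys = NULL;
    smp_wmb();

    unmap_domain_page_global((void *)((unsigned long)va & PAGE_MASK));

    /* Dropping the page reference taken at map time would follow here. */
}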
>> @@ -734,7 +802,10 @@ int domain_kill(struct domain *d)
>>          if ( cpupool_move_domain(d, cpupool0) )
>>              return -ERESTART;
>>          for_each_vcpu ( d, v )
>> +        {
>> +            discard_runstate_area_locked(v);
>>              unmap_vcpu_info(v);
>> +        }
>
> What is the "locked" aspect here about? The guest itself is dead (i.e.
> fully paused) at this point, isn't it? And it won't ever be unpaused
> again.

You are right. All vcpus here are stopped. We can discard the runstate area
without locking.

>> --- a/xen/include/xen/sched.h
>> +++ b/xen/include/xen/sched.h
>> @@ -163,17 +163,31 @@ struct vcpu
>>      void            *sched_priv;    /* scheduler-specific data */
>>
>>      struct vcpu_runstate_info runstate;
>> +
>> +    enum {
>> +        RUNSTATE_NONE = 0,
>> +        RUNSTATE_PADDR = 1,
>> +        RUNSTATE_VADDR = 2,
>> +    } runstate_guest_type;
>> +
>> +    unsigned long runstate_in_use;
>
> Why "unsigned long"? Isn't a bool all you need?

Bool should be enough. But it seems we will have a lock here.

> Also these would now all want to be grouped in a sub-structure named
> "runstate", rather than having "runstate_" prefixes.

Member `runstate` already has the type `struct vcpu_runstate_info`, which
is an interface type, and `runstate_guest` is a union. I'd not like to move
the `runstate_guest` union into another sub-structure, because we would get
long lines like `v->struct.runstate_guest.virt.p->state_entry_time`.
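For comparison, grouping only the new members (a hypothetical layout, just
to illustrate - the names below are made up) would look like:

    struct {
        enum {
            RUNSTATE_NONE = 0,
            RUNSTATE_PADDR = 1,
            RUNSTATE_VADDR = 2,
        } guest_type;
        bool in_use;        /* or a proper lock, per the discussion above */
    } runstate_reg;         /* hypothetical name */

but the existing `runstate` and `runstate_guest` members would still sit
outside of it, so I'm not sure how much it buys us.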
>> +        void* phys;
>
> Bad ordering between * and the blanks. There ought to be a blank ahead
> of the *, and I don't see why you need any after it.

OK.

[1] https://lists.xenproject.org/archives/html/xen-devel/2019-05/msg00599.html

--
Sincerely,
Andrii Anisov.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel