[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH 13/13] x86/domctl: Implement XEN_DOMCTL_set_cpumsr_policy
On 04/07/18 11:16, Jan Beulich wrote: >>>> On 03.07.18 at 22:55, <andrew.cooper3@xxxxxxxxxx> wrote: >> From: Sergey Dyasli <sergey.dyasli@xxxxxxxxxx> >> >> This hypercall allows the toolstack to present one combined CPUID and MSR >> policy for a domain, which can be audited in one go by Xen, which is >> necessary >> for correctness of the auditing. >> >> A stub x86_policies_are_compatible() function is introduced, although at >> present it will always fail the hypercall. >> >> The hypercall ABI allows for update of individual CPUID or MSR entries, so >> begins by duplicating the existing policy (for which a helper is introduced), >> merging the toolstack data, then checking compatibility of the result. > This reads to me as if it was fine for the tool stack to supply only partial > data (or else there would be no need to merge anything). What's the > thinking behind this, rather than requiring complete sets of data to be > supplied? Unless you have an identical build of Xen and libxc, the toolstack won't always provide the same leaves as Xen expects. Such a restriction would massively hamper development. More importantly, with the max_extd_leaf being vendor dependent (which is data earlier in the structure), the toolstack can't reasonably create a dataset which matches those expectations. Also you'll get into problems when migrating from older hosts, and when levelling a policy down with a lower max_leaf. The reason for having an architectural representation was deliberately to prohibit having a hypercall which was a memcpy of an unstable structure in the background, but with this design comes the fact that Xen must not expect to find an exact match of data. > >> One awkard corner case is re-deserialising of the vcpu msrs. The correct fix >> would be to allocate a buffer, copy the MSRs list, then deserialise from >> that, >> but trips the bounds checks in the copy_from_guest() helpers. The compat >> XLAT >> are would work, but would require that we allocate it even for 64bit PV >> guests. > > >> --- a/xen/arch/x86/domctl.c >> +++ b/xen/arch/x86/domctl.c >> @@ -330,6 +330,71 @@ static int update_domain_cpuid_info(struct domain *d, >> return 0; >> } >> >> +static int update_domain_cpumsr_policy(struct domain *d, >> + xen_domctl_cpumsr_policy_t *xdpc) >> +{ >> + struct policy_group new = {}; >> + const struct policy_group *sys = is_pv_domain(d) >> + ? &system_policies[XEN_SYSCTL_cpumsr_policy_pv_max] >> + : &system_policies[XEN_SYSCTL_cpumsr_policy_hvm_max]; >> + struct vcpu *v = d->vcpu[0]; >> + int ret = -ENOMEM; >> + >> + /* Initialise some help identifying auditing errors. */ >> + xdpc->err_leaf = xdpc->err_subleaf = XEN_CPUID_NO_SUBLEAF; >> + xdpc->err_msr_idx = ~0; > I'm having trouble extracting information from the comment. So am I... These are the fields inside the domctl which try to provide some hint as to which piece of data is problematic, rather than leaving the user with simply "somthing is wrong". > >> + /* Start with existing domain's policies */ >> + if ( !(new.cp = xmemdup(d->arch.cpuid)) || >> + !(new.dp = xmemdup(d->arch.msr)) || >> + !(new.vp = xmemdup(v->arch.msr)) ) >> + goto out; >> + >> + /* Merge the toolstack provided data. */ >> + if ( (ret = x86_cpuid_copy_from_buffer( >> + new.cp, xdpc->cpuid_policy, xdpc->nr_leaves, >> + &xdpc->err_leaf, &xdpc->err_subleaf)) ) >> + goto out; >> + >> + if ( (ret = x86_msr_copy_from_buffer( >> + new.dp, new.vp, >> + xdpc->msr_policy, xdpc->nr_msrs, &xdpc->err_msr_idx)) ) >> + goto out; >> + >> + /* Audit the combined dataset. */ >> + ret = x86_policies_are_compatible(sys, &new); >> + if ( ret ) >> + goto out; > I'm afraid I don't follow - where's the merging? All you do is copy the > first so many entries coming from libxc, and using the later so many > entries from the previous policies. How's that going to provide a > complete set, rather than e.g. some duplicate entries and some > missing ones? Consider the case where max_leaf is lower than Xen's maximum. The data provided by the toolstack can be self consistent and complete without matching Xen's exact size of structure. > >> + /* >> + * Audit was successful. Replace existing policies, leaving the old >> + * policies to be freed. >> + */ >> + SWAP(new.cp, d->arch.cpuid); >> + SWAP(new.dp, d->arch.msr); >> + SWAP(new.vp, v->arch.msr); >> + >> + /* Merge the (now audited) vCPU MSRs into every other msr_vcpu_policy. >> */ >> + for ( ; v; v = v->next_in_list ) > This open-coded almost-for_each_domain() doesn't look very nice. ITYM for_each_vcpu() And yes, but for_each_vcpu() is wrong to use here, and we don't have a for_each_vcpu_other_than_0() helper. > >> + { >> + /* XXX - Figure out how to avoid a TOCTOU race here. XLAT area? */ >> + if ( (ret = x86_msr_copy_from_buffer( >> + NULL, v->arch.msr, xdpc->msr_policy, xdpc->nr_msrs, >> NULL)) ) > Why can't you go from vCPU 0's v->arch.msr here, which is the copied-in > (and sanitized) representation already? Also, is it really a good idea to > assume all vCPU-s have the same policies? There are multiple colliding issues which lead to this code, but as several people have pointed out, its probably over the top. First, as to the same policy. This hypercall can currently only be used before the vcpu has started executing. As such, it is setting the init state of the MSRs from the guests point of view, and there is exactly one MSR I'm aware of which has an init value which depends on the core (that being APIC_BASE.BSP which can trivially be handled in Xen). All other MSRs have identical init state AFAICT, and I don't want to create an interface which makes it easy to accidentally end up with wrong values. The re-de-serialising actually came from your suggested usecase to be able to increase the policy at runtime, .e.g. after a microcode update and hypervisor livepatch. In that case, a partial set doesn't want to copy vCPU0's generally R/W MSRs over other vcpus, because that would be a bad thing. However, these two points aren't compatible, so if we accept the "only before the domain has started running" aspect, then it should be safe to copy v0's policy over all other vcpus. Really, the problem here comes from (necessarily) how we will have to implement SGX_LC and the fact that the LC Hash MSRs may be read-only domain-wide state, or read-write per-vcpu state depending on a different MSR setting (MSR_FEAT_CTRL.SGX_LC). I was initially hoping to have only msr_domain_policy poked via this interface, and msr_vcpu_policy poked exclusively via per-vcpu hypercalls. Then again, at least the complexity is visible at this point, rather than at some point in the future when we need to try and retrofit it. > >> @@ -1570,6 +1635,28 @@ long arch_do_domctl( >> domain_unpause(d); >> break; >> >> + case XEN_DOMCTL_set_cpumsr_policy: >> + if ( d == currd || /* no domain_pause() */ >> + d->max_vcpus == 0 ) /* No vcpus yet. */ >> + { >> + ret = -EINVAL; >> + break; >> + } >> + >> + domain_pause(d); >> + >> + if ( d->creation_finished ) >> + ret = -EEXIST; /* No changing once the domain is running. */ >> + else >> + { >> + ret = update_domain_cpumsr_policy(d, &domctl->u.cpumsr_policy); >> + if ( !ret ) /* Copy domctl->u.cpumsr_policy.err_* to guest. */ >> + copyback = true; > x86_cpuid_copy_from_buffer(), for example, sets the err_ fields > only when returning -ERANGE. Is the if() condition inverted? Oops yes. This was part of a blanket "ret > 0" correction I needed to do because the privcmd() ioctl interface isn't sufficiently expressive. > >> --- a/xen/include/public/domctl.h >> +++ b/xen/include/public/domctl.h >> @@ -648,6 +648,12 @@ struct xen_domctl_cpumsr_policy { >> * 'msr_domain_policy' */ >> XEN_GUEST_HANDLE_64(xen_cpuid_leaf_t) cpuid_policy; /* IN/OUT: */ >> XEN_GUEST_HANDLE_64(xen_msr_entry_t) msr_policy; /* IN/OUT: */ >> + uint32_t err_leaf, err_subleaf; /* OUT, set_policy only. If not ~0, >> + * indicates the leaf/subleaf which >> + * auditing objected to. */ >> + uint32_t err_msr_idx; /* OUT, set_policy only. If not ~0, >> + * indicates the MSR idx which >> + * auditing objected to. */ > Explicit padding again please, with the handler checking it to be > zero. > >> --- a/xen/include/xen/xmalloc.h >> +++ b/xen/include/xen/xmalloc.h >> @@ -13,6 +13,13 @@ >> #define xmalloc(_type) ((_type *)_xmalloc(sizeof(_type), >> __alignof__(_type))) >> #define xzalloc(_type) ((_type *)_xzalloc(sizeof(_type), >> __alignof__(_type))) >> >> +/* Allocate space for a typed object and copy an existing instance. */ >> +#define xmemdup(ptr) \ >> + ({ typeof(*ptr) *n_ = xmalloc(typeof(*ptr)); \ >> + if ( n_ ) \ >> + memcpy(n_, ptr, sizeof(*ptr)); \ >> + n_; }) > Would be nice if this could handle input pointers to const-qualified types. > I vaguely recall having seen a solution to this recently, but I don't recall > where that was or how it looked like. Until then, may I suggest to use > void * instead, despite this opening the risk of type incompatibilities? The only way I'm aware of which might fix the const qualified aspect is _Generic(), but ISTR you can't use typeof() inside. As for xmemdup() specific, no - this will never want const qualified types. The caller is always going to mutate the structure, or they wouldn't have dup'd the object to begin with. ~Andrew _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxxxxxxxxx https://lists.xenproject.org/mailman/listinfo/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |