
Re: [Xen-devel] [PATCH v2 for Xen 4.6 1/4] xen: enabling XL to set per-VCPU parameters of a domain for RTDS scheduler



On Wed, 2015-05-27 at 12:41 +0100, Jan Beulich wrote:
> >>> On 26.05.15 at 19:18, <lichong659@xxxxxxxxx> wrote:
> > On Tue, May 26, 2015 at 4:08 AM, Jan Beulich <JBeulich@xxxxxxxx> wrote:
> >>>>> On 26.05.15 at 02:05, <lichong659@xxxxxxxxx> wrote:
> >>> Add two hypercalls (XEN_DOMCTL_SCHEDOP_getvcpuinfo/putvcpuinfo) to
> >>> get/set a domain's per-VCPU parameters. Hypercalls are handled by a
> >>> newly added hook (.adjust_vcpu) in the scheduler interface.
> >>>
> >>> Add a new data structure (struct xen_domctl_scheduler_vcpu_op) for
> >>> transferring data between tool and hypervisor.
> >>>
> >>> Signed-off-by: Chong Li <chong.li@xxxxxxxxx>
> >>> Signed-off-by: Meng Xu <mengxu@xxxxxxxxxxxxx>
> >>> Signed-off-by: Sisu Xi <xisisu@xxxxxxxxx>
> >>> ---
> >>
> >> Missing brief description of changes in v2 here.
> > 
> > Based on Dario's suggestion in PATCH v1, we think it would be better
> > to make the per-vcpu get/set functionality more general, rather than
> > have it serve just the RTDS scheduler. So, the changes in v2 are:
> > 
> > 1) Add a new hypercall, XEN_DOMCTL_scheduler_vcpu_op. Any scheduler
> > can use it for per-vcpu get/set. There is a new data structure (struct
> > xen_domctl_scheduler_vcpu_op), which helps transfer data (per-vcpu
> > params) between the toolstack and the hypervisor when the hypercall
> > is invoked.
> > 
> > 2) The new hypercall is handled by function sched_adjust_vcpu (in
> > schedule.c), which would call a newly added hook (named as
> > .adjust_vcpu, added to the scheduler interface (struct scheduler)).
> > 
> > 3) In the RTDS scheduler, .adjust_vcpu hooks up a function defined in
> > sched_rt.c, named rt_vcpu_cntl. This function gets/sets the
> > per-vcpu params.
> 
> No, that looks to be a description of the patch as a whole, not of
> the delta between v1 and v2 (which is what I was asking for).
> 
> >>> +
> >>> +    switch ( op->cmd )
> >>> +    {
> >>> +    case XEN_DOMCTL_SCHEDOP_getvcpuinfo:
> >>> +        spin_lock_irqsave(&prv->lock, flags);
> >>> +        list_for_each( iter, &sdom->vcpu )
> >>> +        {
> >>> +            svc = list_entry(iter, struct rt_vcpu, sdom_elem);
> >>> +            vcpuid = svc->vcpu->vcpu_id;
> >>> +
> >>> +            local_sched.budget = svc->budget / MICROSECS(1);
> >>> +            local_sched.period = svc->period / MICROSECS(1);
> >>> +            if ( copy_to_guest_offset(op->u.rtds.vcpus, vcpuid,
> >>
> >> What makes you think that the caller has provided enough slots?
> > 
> > This code works in a similar way to the one for XEN_SYSCTL_numainfo.
> > So if there are not enough slots in guest space, an error code is
> > returned.
> 
> I don't think I saw any place you returned an error here. The
> only error being returned is -EFAULT when the copy-out fails.
> 
Look again at XEN_SYSCTL_numainfo, in particular, at this part:

    if ( ni->num_nodes < num_nodes )
    {
        ret = -ENOBUFS;
        i = num_nodes;
    }
    else
        i = 0;

Which, as you'll appreciate if you check where ni->num_nodes and
num_nodes come from, is where the boundary checking we're asking for
happens.

Note how the 'i' variable is used in the rest of the block, and you'll
see what we mean, and can do something similar.
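
To make it more concrete, here is a rough, untested sketch of how that
kind of check could look in your getvcpuinfo path. I'm making up the
nr_vcpus field name (i.e., a count of the elements the caller's array
provides), so adjust to whatever the interface actually ends up with:

    case XEN_DOMCTL_SCHEDOP_getvcpuinfo:
        /*
         * Boundary check, numainfo-style: bail out if the caller did
         * not provide enough slots (nr_vcpus is a made-up field name).
         */
        if ( op->u.rtds.nr_vcpus < d->max_vcpus )
            return -ENOBUFS;

        spin_lock_irqsave(&prv->lock, flags);
        list_for_each( iter, &sdom->vcpu )
        {
            svc = list_entry(iter, struct rt_vcpu, sdom_elem);
            vcpuid = svc->vcpu->vcpu_id;

            /*
             * Clear the whole element (padding included) so no
             * hypervisor stack contents can leak to the guest.
             */
            memset(&local_sched, 0, sizeof(local_sched));
            local_sched.vcpuid = vcpuid;
            local_sched.budget = svc->budget / MICROSECS(1);
            local_sched.period = svc->period / MICROSECS(1);
            if ( copy_to_guest_offset(op->u.rtds.vcpus, vcpuid,
                                      &local_sched, 1) )
            {
                spin_unlock_irqrestore(&prv->lock, flags);
                return -EFAULT;
            }
        }
        spin_unlock_irqrestore(&prv->lock, flags);
        break;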

> >>> +                    &local_sched, 1) )
> >>
> >> You're leaking hypervisor stack contents here, but by reading
> >> the structure from guest memory first to honor its vcpuid field
> >> (thus making "get" match "put") you'd avoid this anyway.
> >>
> > 
> > The structure in guest memory has nothing, and it will hold per-vcpu
> > params after this XEN_DOMCTL_SCHEDOP_getvcpuinfo hypercall is well
> > served.
> > Does "leaking hypervisor stack" mean that I need to call
> > "local_sched.vcpuid = vcpuid" before copying this local_sched to the
> > guest memory?
> 
> Not just that. You'd also need to make sure padding space is
> cleared. 
>
Indeed.

> But as said, I suppose you want to copy in what the
> guest provided anyway, in which case the leak is implicitly
> gone.
> 
Chong, you're saying above "The structure in guest memory has nothing".
AFAICT, what Jan means when he says "make get match put" is: why don't
you, for XEN_DOMCTL_SCHEDOP_getvcpuinfo, allow the caller to provide an
array with fewer elements than the number of vcpus, and fill in the
vcpuid field of those elements with the IDs of the vcpus she wants to
receive information about?

I was also about to make a similar request, as it makes things more
consistent between get and put. After all, if we think that it is
valuable to allow batching (i.e., having a hypercall that returns the
parameters of _a_bunch_ of vcpus, not just of one), and that it makes
sense to allow that bunch of vcpus to be incomplete and sparse, then
that is true in both directions, isn't it?
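
To illustrate (just a sketch, with made-up field names such as
nr_vcpus, and with locking and a few checks omitted for brevity), the
get path could then look something like this:

    /*
     * "Get" matching "put": read each element the caller provided,
     * honour its vcpuid field, fill in the parameters and copy the
     * element back.  This also sidesteps the stack-leak issue, since
     * the whole element comes from the guest in the first place.
     */
    for ( index = 0; index < op->u.rtds.nr_vcpus; index++ )
    {
        if ( copy_from_guest_offset(&local_sched, op->u.rtds.vcpus,
                                    index, 1) )
            return -EFAULT;

        if ( local_sched.vcpuid >= d->max_vcpus ||
             d->vcpu[local_sched.vcpuid] == NULL )
            return -EINVAL;

        svc = rt_vcpu(d->vcpu[local_sched.vcpuid]);
        local_sched.budget = svc->budget / MICROSECS(1);
        local_sched.period = svc->period / MICROSECS(1);

        if ( copy_to_guest_offset(op->u.rtds.vcpus, index,
                                  &local_sched, 1) )
            return -EFAULT;
    }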

> >>> +            }
> >>> +            hypercall_preempt_check();
> >>
> >> ???
> > 
> > We've talked about this in patch v1
> > (http://lists.xenproject.org/archives/html/xen-devel/2015-05/msg01152.html).
> > When a domain has too many VCPUs, we need to make sure the spin lock
> > is not held for too long.
> 
> Right, but the call above does nothing like that. Please look at other
> uses of it to understand what needs to be done here.
> 
This being said, would it make sense to set a threshold below which we
don't need to check for preemption (nor ask for the hypercall to be
reissued)?

Something like: if we're dealing with a request for the parameters of N
(== 16? 32?) vcpus, it's fine to do them all at once. Above that
limit, we do bunches of N vcpus, and then a preempt check. What do you
think, Jan? And what would a suitable limit be?

In fact, apart from the fact that it's used incorrectly, I don't think
that checking for preemption after _each_ step of the loop makes much
sense...
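
I.e., something along the lines of the following (completely untested,
and both the constant and its value are just straw men):

    /* Straw-man batch size, to be tuned. */
    #define RTDS_VCPUS_PER_BATCH 32

    unsigned int done = 0;

    list_for_each( iter, &sdom->vcpu )
    {
        /* ... copy this vcpu's parameters out, as in the patch ... */

        /*
         * Only check for preemption once every RTDS_VCPUS_PER_BATCH
         * vcpus.  On preemption, drop the lock and have the hypercall
         * restarted (e.g., by returning -ERESTART and turning that
         * into a continuation up in the domctl path).
         */
        if ( !(++done % RTDS_VCPUS_PER_BATCH) &&
             hypercall_preempt_check() )
        {
            spin_unlock_irqrestore(&prv->lock, flags);
            return -ERESTART;
        }
    }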

> >>> --- a/xen/common/schedule.c
> >>> +++ b/xen/common/schedule.c
> >>> @@ -1104,6 +1104,30 @@ long sched_adjust(struct domain *d, struct xen_domctl_scheduler_op *op)
> >>>      return ret;
> >>>  }
> >>>
> >>> +/* Adjust scheduling parameter for the vcpus of a given domain. */
> >>> +long sched_adjust_vcpu(
> >>> +    struct domain *d,
> >>> +    struct xen_domctl_scheduler_vcpu_op *op)
> >>> +{
> >>> +    long ret;
> >>
> >> I see no reason for this and the function return type to be "long".
> > 
> > The reason is stated in the beginning.
> 
> In the beginning of what? I don't see any such, and it would also be
> odd for there to be an explanation of why a particular type was chosen.
>
As Chong says elsewhere, he's mostly following suit. Should we
consider a pre-patch making both sched_adjust() and
sched_adjust_global() return int?
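
I.e., just the prototypes, to show what I mean:

    /* Pre-patch idea: have the existing adjusters return int ... */
    int sched_adjust(struct domain *d, struct xen_domctl_scheduler_op *op);
    int sched_adjust_global(struct xen_sysctl_scheduler_op *op);

    /* ... so that the new per-vcpu one can simply follow suit. */
    int sched_adjust_vcpu(struct domain *d,
                          struct xen_domctl_scheduler_vcpu_op *op);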

Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)



 

