Re: [Xen-devel] [PATCH] xen: Implement domain runstates
On 23/11/2010 14:23, "George Dunlap" <George.Dunlap@xxxxxxxxxxxxx> wrote:

> There is a difference: how the various runstates of the vcpus were
> correlated.
>
> So for example, suppose you have 2 vcpus with the following runstates:
> v0: running 50%, runnable 25%, blocked 25%
> v1: running 50%, runnable 25%, blocked 25%
>
> That could break down like this:
>  full run: 25%
>  concurrency hazard: 50%
>  blocked: 25%
>
> Or like this:
>  full run: 5%
>  concurrency hazard: 50%
>  partial run: 40%
>  blocked: 5%
>
> Or like this:
>  full contention: 20%
>  full run: 50%
>  concurrency hazard: 5%
>  blocked: 25%
>
> To me those tell very different stories about what the VM's workload is
> like and what it's getting cpu-wise from Xen.

The concurrency hazard %age might be useful for telling you how much Xen
scheduling is screwing the VM, I suppose. Still, I wonder how much of this
could typically be deduced from per-vcpu stats, and how much it improves
load-balancer performance beyond very basic per-domain stats. Overall you
can count me as not 100% convinced. But my own personal opinion on the
pros/cons of this particular toolstack interface is only part of my
argument...

>> Also, I would never take a patch for a new control interface
>> without the thing that uses it being supplied at the same time; otherwise
>> you can just as well maintain it out of tree.
>
> Would "the thing that uses it" be a libxc call, or an actual program
> using the data (like xentop)?
>
> FWIW, xapi uses this call in XenServer (just for customer reporting).
> We could carry it out-of-tree, but we're trying to minimize the
> patchqueue. But if modifying xentop to be able to report these was a
> requisite for acceptance, I'd make an effort. :-)

Well, actually what we care about is #users-who-care. Plumbing into xentop
to satisfy an acceptance criterion is probably misguided. Reducing
patch-queue length just for the sake of a short patch queue is also probably
misguided. I think this patch can live in the XCP patchqueue quite happily.
If the feature proves itself and others want it as a more general facility,
for other toolstacks or whatever, it can be considered for upstream in due
course. There's no rush.

 -- Keir

> -George
>
>>
>> -- Keir
>>
>> On 23/11/2010 11:26, "George Dunlap" <george.dunlap@xxxxxxxxxxxxx> wrote:
>>
>>> To help simplify the load configuration of servers, I developed the
>>> idea of a "domain runstate". The "runstate" of a domain is defined
>>> by the runstates of its constituent vcpus. The "states" are defined
>>> as follows:
>>>
>>>  Blocked: All vcpus blocked
>>>  Partial run: Some vcpus running, some blocked
>>>  Partial contention: Some vcpus waiting for the cpu, some blocked
>>>  Full run: All vcpus running
>>>  Concurrency hazard: Some vcpus running, some waiting for the cpu
>>>  Full contention: All vcpus waiting for the cpu
>>>
>>> The idea is that by looking at the amount of time in each state
>>> over the last unit of time, an administrator (or an automated
>>> load-balancing program) can determine whether their vcpu and/or
>>> domain configuration needs to be tweaked. For example:
>>>
>>> If a VM spends the majority of its time in "full run", it may
>>> not have enough vcpus to do its work.
>>>
>>> If a VM spends a large amount of time in full contention, the system
>>> is probably too busy, and some workloads should be moved onto other
>>> servers.
>>>
>>> If a VM spends a large amount of time in concurrency hazard, it means
>>> that the VM may be wasting a lot of time in spinlocks and other
>>> synchronization (like cache misses). Either work should be moved off
>>> the machine, or the number of vcpus assigned to the VM should be
>>> reduced.
>>>
>>> The state is protected by a lock, but to avoid tricky deadlock
>>> issues, if the lock cannot be acquired, the state is simply not
>>> updated. The number of lost state updates is recorded.
>>>
>>> Signed-off-by: George Dunlap <george.dunlap@xxxxxxxxxxxxx>
>>>
>>> diff -r c0c1f5f0745e -r 692200e7ce1e xen/common/domain.c
>>> --- a/xen/common/domain.c   Mon Nov 22 19:16:34 2010 +0000
>>> +++ b/xen/common/domain.c   Tue Nov 23 11:25:50 2010 +0000
>>> @@ -238,6 +238,7 @@
>>>      spin_lock_init_prof(d, domain_lock);
>>>      spin_lock_init_prof(d, page_alloc_lock);
>>>      spin_lock_init(&d->hypercall_deadlock_mutex);
>>> +    spin_lock_init(&d->runstate_lock);
>>>      INIT_PAGE_LIST_HEAD(&d->page_list);
>>>      INIT_PAGE_LIST_HEAD(&d->xenpage_list);
>>>
>>> diff -r c0c1f5f0745e -r 692200e7ce1e xen/common/domctl.c
>>> --- a/xen/common/domctl.c   Mon Nov 22 19:16:34 2010 +0000
>>> +++ b/xen/common/domctl.c   Tue Nov 23 11:25:50 2010 +0000
>>> @@ -970,6 +970,24 @@
>>>      }
>>>      break;
>>>
>>> +    case XEN_DOMCTL_get_runstate_info:
>>> +    {
>>> +        struct domain *d;
>>> +
>>> +        ret = -ESRCH;
>>> +        d = rcu_lock_domain_by_id(op->domain);
>>> +        if ( d != NULL )
>>> +        {
>>> +            domain_runstate_get(d, &op->u.domain_runstate);
>>> +            ret = 0;
>>> +
>>> +            rcu_unlock_domain(d);
>>> +
>>> +            if ( copy_to_guest(u_domctl, op, 1) )
>>> +                ret = -EFAULT;
>>> +        }
>>> +        break;
>>> +    }
>>>      default:
>>>          ret = arch_do_domctl(op, u_domctl);
>>>          break;
>>>
>>> diff -r c0c1f5f0745e -r 692200e7ce1e xen/common/schedule.c
>>> --- a/xen/common/schedule.c Mon Nov 22 19:16:34 2010 +0000
>>> +++ b/xen/common/schedule.c Tue Nov 23 11:25:50 2010 +0000
>>> @@ -135,10 +135,31 @@
>>>      }
>>>  }
>>>
>>> +/* Used to quickly map the vcpu runstate mask to a domain runstate */
>>> +static int mask_to_state[] = {
>>> +    /* 000: Nothing in any runstate.  Should never happen. */
>>> +    -1,
>>> +    /* 001: All running */
>>> +    DOMAIN_RUNSTATE_full_run,
>>> +    /* 010: All runnable */
>>> +    DOMAIN_RUNSTATE_full_contention,
>>> +    /* 011: Some running, some runnable */
>>> +    DOMAIN_RUNSTATE_concurrency_hazard,
>>> +    /* 100: All blocked / offline */
>>> +    DOMAIN_RUNSTATE_blocked,
>>> +    /* 101: Some running, some blocked / offline */
>>> +    DOMAIN_RUNSTATE_partial_run,
>>> +    /* 110: Some blocked / offline, some runnable */
>>> +    DOMAIN_RUNSTATE_partial_contention,
>>> +    /* 111: Some in every state.  Mixed running + runnable is most important. */
>>> +    DOMAIN_RUNSTATE_concurrency_hazard
>>> +};
>>> +
>>>  static inline void vcpu_runstate_change(
>>>      struct vcpu *v, int new_state, s_time_t new_entry_time)
>>>  {
>>>      s_time_t delta;
>>> +    struct domain *d = v->domain;
>>>
>>>      ASSERT(v->runstate.state != new_state);
>>>      ASSERT(spin_is_locked(per_cpu(schedule_data,v->processor).schedule_lock));
>>> @@ -155,6 +176,37 @@
>>>      }
>>>
>>>      v->runstate.state = new_state;
>>> +
>>> +    /* Update domain runstate */
>>> +    if(spin_trylock(&d->runstate_lock)) {
>>> +        unsigned mask=0;
>>> +        struct vcpu *ov;
>>> +
>>> +        BUG_ON(d->runstate.state > DOMAIN_RUNSTATE_partial_contention);
>>> +
>>> +        d->runstate.time[d->runstate.state] +=
>>> +            (new_entry_time - d->runstate.state_entry_time);
>>> +        d->runstate.state_entry_time = new_entry_time;
>>> +
>>> +        /* Determine new runstate.  First, see what states we have */
>>> +        for_each_vcpu(d, ov) {
>>> +            /* Don't count vcpus that have been taken offline by the guest */
>>> +            if(! (ov->runstate.state == RUNSTATE_offline
>>> +                  && test_bit(_VPF_down, &ov->pause_flags)) )
>>> +                mask |= (1 << ov->runstate.state);
>>> +        }
>>> +
>>> +        BUG_ON(mask == 0);
>>> +
>>> +        /* Offline & blocked are the same */
>>> +        mask |= ((1 << RUNSTATE_offline) & mask) >> 1;
>>> +
>>> +        d->runstate.state = mask_to_state[mask & 0x7];
>>> +
>>> +        spin_unlock(&d->runstate_lock);
>>> +    } else {
>>> +        atomic_inc(&d->runstate_missed_changes);
>>> +    }
>>>  }
>>>
>>>  void vcpu_runstate_get(struct vcpu *v, struct vcpu_runstate_info *runstate)
>>> @@ -173,6 +225,18 @@
>>>      vcpu_schedule_unlock_irq(v);
>>>  }
>>>
>>> +void domain_runstate_get(struct domain *d,
>>> +                         domain_runstate_info_t *runstate)
>>> +{
>>> +    spin_lock(&d->runstate_lock);
>>> +
>>> +    memcpy(runstate, &d->runstate, sizeof(*runstate));
>>> +    runstate->time[d->runstate.state] += NOW() - runstate->state_entry_time;
>>> +    runstate->missed_changes = atomic_read(&d->runstate_missed_changes);
>>> +
>>> +    spin_unlock(&d->runstate_lock);
>>> +}
>>> +
>>>  uint64_t get_cpu_idle_time(unsigned int cpu)
>>>  {
>>>      struct vcpu_runstate_info state;
>>> diff -r c0c1f5f0745e -r 692200e7ce1e xen/include/public/domctl.h
>>> --- a/xen/include/public/domctl.h   Mon Nov 22 19:16:34 2010 +0000
>>> +++ b/xen/include/public/domctl.h   Tue Nov 23 11:25:50 2010 +0000
>>> @@ -806,6 +806,46 @@
>>>  DEFINE_XEN_GUEST_HANDLE(xen_domctl_vcpuextstate_t);
>>>  #endif
>>>
>>> +/*
>>> + * Return information about the state and running time of a domain.
>>> + * The "domain runstate" is based on the runstates of all the vcpus of the
>>> + * domain (see below).
>>> + * @extra_arg == pointer to domain_runstate_info structure.
>>> + */
>>> +struct xen_domctl_runstate_info {
>>> +    /* Domain's current state (DOMAIN_RUNSTATE_*). */
>>> +    uint32_t state;
>>> +    /* Number of times we missed an update due to contention */
>>> +    uint32_t missed_changes;
>>> +    /* When was current state entered (system time, ns)? */
>>> +    uint64_t state_entry_time;
>>> +    /*
>>> +     * Time spent in each RUNSTATE_* (ns). The sum of these times is
>>> +     * NOT guaranteed not to drift from system time.
>>> +     */
>>> +    uint64_t time[6];
>>> +};
>>> +typedef struct xen_domctl_runstate_info xen_domctl_runstate_info_t;
>>> +DEFINE_XEN_GUEST_HANDLE(xen_domctl_runstate_info_t);
>>> +
>>> +/* All vcpus are running */
>>> +#define DOMAIN_RUNSTATE_full_run            0
>>> +
>>> +/* All vcpus are runnable (i.e., waiting for cpu) */
>>> +#define DOMAIN_RUNSTATE_full_contention     1
>>> +
>>> +/* Some vcpus are running, some are runnable */
>>> +#define DOMAIN_RUNSTATE_concurrency_hazard  2
>>> +
>>> +/* All vcpus are blocked / offline */
>>> +#define DOMAIN_RUNSTATE_blocked             3
>>> +
>>> +/* Some vcpus are running, some are blocked */
>>> +#define DOMAIN_RUNSTATE_partial_run         4
>>> +
>>> +/* Some vcpus are runnable, some are blocked */
>>> +#define DOMAIN_RUNSTATE_partial_contention  5
>>> +
>>>  struct xen_domctl {
>>>      uint32_t cmd;
>>>  #define XEN_DOMCTL_createdomain                   1
>>> @@ -868,6 +908,7 @@
>>>  #define XEN_DOMCTL_getpageframeinfo3             61
>>>  #define XEN_DOMCTL_setvcpuextstate               62
>>>  #define XEN_DOMCTL_getvcpuextstate               63
>>> +#define XEN_DOMCTL_get_runstate_info             64
>>>  #define XEN_DOMCTL_gdbsx_guestmemio            1000
>>>  #define XEN_DOMCTL_gdbsx_pausevcpu             1001
>>>  #define XEN_DOMCTL_gdbsx_unpausevcpu           1002
>>> @@ -923,6 +964,7 @@
>>>          struct xen_domctl_gdbsx_memio       gdbsx_guest_memio;
>>>          struct xen_domctl_gdbsx_pauseunp_vcpu gdbsx_pauseunp_vcpu;
>>>          struct xen_domctl_gdbsx_domstatus   gdbsx_domstatus;
>>> +        struct xen_domctl_runstate_info     domain_runstate;
>>>          uint8_t                             pad[128];
>>>      } u;
>>>  };
>>> diff -r c0c1f5f0745e -r 692200e7ce1e xen/include/xen/sched.h
>>> --- a/xen/include/xen/sched.h   Mon Nov 22 19:16:34 2010 +0000
>>> +++ b/xen/include/xen/sched.h   Tue Nov 23 11:25:50 2010 +0000
>>> @@ -200,6 +200,8 @@
>>>      int xen_port;
>>>  };
>>>
>>> +typedef struct xen_domctl_runstate_info domain_runstate_info_t;
>>> +
>>>  struct domain
>>>  {
>>>      domid_t domain_id;
>>> @@ -332,6 +334,11 @@
>>>      nodemask_t node_affinity;
>>>      unsigned int last_alloc_node;
>>>      spinlock_t node_affinity_lock;
>>> +
>>> +    /* Domain runstate tracking */
>>> +    spinlock_t runstate_lock;
>>> +    atomic_t runstate_missed_changes;
>>> +    domain_runstate_info_t runstate;
>>>  };
>>>
>>>  struct domain_setup_info
>>> @@ -614,6 +621,7 @@
>>>  int vcpu_set_affinity(struct vcpu *v, cpumask_t *affinity);
>>>
>>>  void vcpu_runstate_get(struct vcpu *v, struct vcpu_runstate_info *runstate);
>>> +void domain_runstate_get(struct domain *d, domain_runstate_info_t *runstate);
>>>  uint64_t get_cpu_idle_time(unsigned int cpu);
>>>
>>>  /*
>>>
>>> _______________________________________________
>>> Xen-devel mailing list
>>> Xen-devel@xxxxxxxxxxxxxxxxxxx
>>> http://lists.xensource.com/xen-devel
>>
>>
>

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
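For illustration, here is a minimal sketch of the sort of consumer George and Keir discuss above: a monitoring or load-balancing tool that samples the domain runstate info twice, some interval apart, and reports the share of that interval spent in each state. This is only a sketch under assumptions: the posted patch adds no libxc wrapper, so how the two snapshots are obtained (for example, a hypothetical xc_domain_get_runstate_info() helper issuing XEN_DOMCTL_get_runstate_info) is left open, and the structure is redeclared here purely so the fragment stands alone.

    #include <stdint.h>
    #include <stdio.h>

    /* Mirrors the layout proposed for xen/include/public/domctl.h above. */
    struct xen_domctl_runstate_info {
        uint32_t state;
        uint32_t missed_changes;
        uint64_t state_entry_time;
        uint64_t time[6];
    };

    /* Index order matches the DOMAIN_RUNSTATE_* values in the patch. */
    static const char *domain_runstate_names[6] = {
        "full run", "full contention", "concurrency hazard",
        "blocked", "partial run", "partial contention",
    };

    /*
     * Given two snapshots taken some interval apart (how they are fetched is
     * assumed, see above), print the percentage of that interval the domain
     * spent in each domain runstate.
     */
    static void report_runstate_delta(const struct xen_domctl_runstate_info *prev,
                                      const struct xen_domctl_runstate_info *cur)
    {
        uint64_t delta[6], total = 0;
        int i;

        for ( i = 0; i < 6; i++ )
        {
            delta[i] = cur->time[i] - prev->time[i];
            total += delta[i];
        }

        if ( total == 0 )
            return; /* No accounted time elapsed between the samples. */

        for ( i = 0; i < 6; i++ )
            printf("%-20s %5.1f%%\n", domain_runstate_names[i],
                   100.0 * (double)delta[i] / (double)total);

        /* Many missed changes mean the percentages above are only approximate. */
        printf("missed state changes: %u\n",
               cur->missed_changes - prev->missed_changes);
    }

Per George's description, a persistently high share in full contention over successive intervals would suggest moving work off the host, while a high concurrency-hazard share would suggest reducing the domain's vcpu count.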