
Re: [Xen-devel] [PATCH] xen: add hypercall option to temporarily pin a vcpu



On 26/02/16 11:39, Jan Beulich wrote:
>>>> On 25.02.16 at 17:50, <JGross@xxxxxxxx> wrote:
>> @@ -670,7 +676,13 @@ int cpu_disable_scheduler(unsigned int cpu)
>>              if ( cpumask_empty(&online_affinity) &&
>>                   cpumask_test_cpu(cpu, v->cpu_hard_affinity) )
>>              {
>> -                printk(XENLOG_DEBUG "Breaking affinity for %pv\n", v);
>> +                if ( v->affinity_broken )
>> +                {
>> +                    /* The vcpu is temporarily pinned, can't move it. */
>> +                    vcpu_schedule_unlock_irqrestore(lock, flags, v);
>> +                    ret = -EBUSY;
>> +                    continue;
>> +                }
> 
> So far the function can only return 0 or -EAGAIN. By using "continue"
> here you will make it impossible for the caller to reliably determine
> whether possibly both things failed. Despite -EBUSY being a logical
> choice here, I think you'd better use -EAGAIN here too. And it needs
> to be determined whether continuing the loop in this as well as the
> pre-existing cases is actually the right thing to do.

EBUSY vs. EAGAIN: by returning EAGAIN I would signal to the Xen tools that
the hypervisor is currently not able to do the desired operation
(especially removing a cpu from a cpupool), but that the situation will
change automatically via scheduling. EBUSY will stop retries in the Xen
tools, and this is what I want here: I can't be sure the situation
will change soon.
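
A rough illustration of the retry policy described above, as a sketch only.
This is not the actual Xen tools code: remove_cpu_with_retry(), the callback
type and the fake_remove() stub are hypothetical names made up for this
example. The only assumption taken from the text is that -EAGAIN means
"try again" and any other error, including -EBUSY, ends the retries.

```c
#include <errno.h>

/* Hypothetical stand-in for the call that asks Xen to remove a cpu. */
typedef int (*remove_cpu_fn)(void *ctx);

/*
 * Retry only while the operation reports -EAGAIN. Any other result,
 * including -EBUSY, is handed back to the caller immediately.
 */
static int remove_cpu_with_retry(remove_cpu_fn op, void *ctx, int max_tries)
{
    int ret = -EAGAIN;

    for ( int i = 0; i < max_tries && ret == -EAGAIN; i++ )
        ret = op(ctx);

    return ret;
}

/* Stub for experimenting: fails with -EAGAIN a few times, then settles. */
struct fake_op {
    int eagain_left;   /* how many -EAGAIN results to produce first */
    int final;         /* result returned once eagain_left reaches 0 */
    int calls;         /* number of invocations so far */
};

static int fake_remove(void *ctx)
{
    struct fake_op *f = ctx;

    f->calls++;
    if ( f->eagain_left > 0 )
    {
        f->eagain_left--;
        return -EAGAIN;
    }
    return f->final;
}
```

With this shape a temporarily pinned vcpu (-EBUSY) ends the operation after
a single attempt, while a transient scheduling condition (-EAGAIN) is
retried up to max_tries times.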

Regarding continuation of the loop: I think you are right in the
EBUSY case: I should break out of the loop. I should not do so in the
EAGAIN case as I want to remove as many vcpus from the physical cpu as
possible without returning to the Xen tools in between.
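
The intended loop control can be modelled roughly as follows. This is a
simplified sketch, not the code in cpu_disable_scheduler(): the enum, the
function name and the "visited" counter are assumptions introduced for
illustration, and locking and the actual migration are left out.

```c
#include <errno.h>
#include <stddef.h>

/* Simplified per-vcpu outcome; a model, not Xen's struct vcpu. */
enum vcpu_state {
    VCPU_MOVABLE,      /* can be moved off the cpu right away */
    VCPU_NEEDS_RETRY,  /* cannot be moved right now, may work later */
    VCPU_TEMP_PINNED   /* temporarily pinned, must not be moved */
};

/*
 * Walk all vcpus. A vcpu needing a retry records -EAGAIN but the walk
 * goes on, so as many vcpus as possible are handled in one pass.
 * A temporarily pinned vcpu stops the walk at once with -EBUSY.
 */
static int disable_scheduler_model(const enum vcpu_state *v, size_t n,
                                   size_t *visited)
{
    int ret = 0;

    *visited = 0;
    for ( size_t i = 0; i < n; i++ )
    {
        ++*visited;

        if ( v[i] == VCPU_TEMP_PINNED )
            return -EBUSY;

        if ( v[i] == VCPU_NEEDS_RETRY )
            ret = -EAGAIN;
    }

    return ret;
}
```

Because -EBUSY leaves the loop immediately, the caller can tell the two
failure modes apart: -EAGAIN means every vcpu was looked at, -EBUSY means
the walk stopped at the first temporarily pinned vcpu.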

> 
>> @@ -679,6 +691,8 @@ int cpu_disable_scheduler(unsigned int cpu)
>>                      v->affinity_broken = 1;
>>                  }
>>  
>> +                printk(XENLOG_DEBUG "Breaking affinity for %pv\n", v);
> 
> Wouldn't it be even better to make this the "else" to the
> preceding if(), since in the suspend case this is otherwise going
> to be printed for every vCPU not currently running on pCPU0?

Yes, I'll change it.

> 
>> @@ -753,14 +767,22 @@ static int vcpu_set_affinity(
>>      struct vcpu *v, const cpumask_t *affinity, cpumask_t *which)
>>  {
>>      spinlock_t *lock;
>> +    int ret = 0;
>>  
>>      lock = vcpu_schedule_lock_irq(v);
>>  
>> -    cpumask_copy(which, affinity);
>> +    if ( v->affinity_broken )
>> +    {
>> +        ret = -EBUSY;
>> +    }
> 
> Unnecessary braces.

Will remove.

> 
>> @@ -979,6 +1001,53 @@ void watchdog_domain_destroy(struct domain *d)
>>          kill_timer(&d->watchdog_timer[i]);
>>  }
>>  
>> +static long do_pin_temp(int cpu)
>> +{
>> +    struct vcpu *v = current;
>> +    spinlock_t *lock;
>> +    long ret = -EINVAL;
>> +
>> +    lock = vcpu_schedule_lock_irq(v);
>> +
>> +    if ( cpu == -1 )
>> +    {
>> +        if ( v->affinity_broken )
>> +        {
>> +            cpumask_copy(v->cpu_hard_affinity, v->cpu_hard_affinity_saved);
>> +            v->affinity_broken = 0;
>> +            set_bit(_VPF_migrating, &v->pause_flags);
>> +            ret = 0;
>> +        }
>> +    }
>> +    else if ( cpu < nr_cpu_ids && cpu >= 0 )
> 
> Perhaps easier to simply use "cpu < 0" in the first if()?

Okay.

> 
>> +    {
>> +        if ( v->affinity_broken )
>> +        {
>> +            ret = -EBUSY;
>> +        }
>> +        else if ( cpumask_test_cpu(cpu, VCPU2ONLINE(v)) )
>> +        {
> 
> This is a rather ugly restriction: How would a caller fulfill its job
> when this is not the case?

It can't. We should document that, at least on hardware requiring this
functionality, it is a bad idea to remove cpu 0 from the cpupool with the
hardware domain.

> 
>> @@ -1088,6 +1157,23 @@ ret_t do_sched_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>          break;
>>      }
>>  
>> +    case SCHEDOP_pin_temp:
>> +    {
>> +        struct sched_pin_temp sched_pin_temp;
>> +
>> +        ret = -EFAULT;
>> +        if ( copy_from_guest(&sched_pin_temp, arg, 1) )
>> +            break;
>> +
>> +        ret = xsm_schedop_pin_temp(XSM_PRIV);
>> +        if ( ret )
>> +            break;
>> +
>> +        ret = do_pin_temp(sched_pin_temp.pcpu);
>> +
>> +        break;
>> +    }
> 
> So having come here I still don't see why this is called "temp":
> Nothing enforces this to be a temporary state, and hence the
> sub-op name currently is actively misleading.

I've chosen this name as the old affinity is saved and can (and should)
be recovered later. So it is intended to be temporary.

>> --- a/xen/include/public/sched.h
>> +++ b/xen/include/public/sched.h
>> @@ -118,6 +118,15 @@
>>   * With id != 0 and timeout != 0, poke watchdog timer and set new timeout.
>>   */
>>  #define SCHEDOP_watchdog    6
>> +
>> +/*
>> + * Temporarily pin the current vcpu to one physical cpu or undo that pinning.
>> + * @arg == pointer to sched_pin_temp_t structure.
>> + *
>> + * Setting pcpu to -1 will undo a previous temporary pinning.
>> + * This call is allowed for domains with domain control privilege only.
>> + */
> 
> Why domain control privilege? I'd actually suggest limiting the
> ability to the hardware domain, at once eliminating the need
> for the XSM check.

Sure, I'd be happy to simplify the patch.

> 
>> +struct sched_pin_temp {
>> +    int pcpu;
> 
> Fixed width types only please in the public interface. Also this needs
> an entry in xen/include/xlat.lst, and a consumer of the resulting
> check macro.

Aah, okay.
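
One possible shape for the fixed-width variant, as a sketch only. The choice
of int32_t is an assumption made here; the review only asks for a fixed
width type and does not name one. The xlat.lst entry and the consumer of the
check macro are not shown.

```c
#include <stdint.h>

/* Sketch: same field as in the patch, but with an ABI-independent width. */
struct sched_pin_temp {
    int32_t pcpu;   /* physical cpu to pin to; -1 undoes a previous pinning */
};

/* The layout must be identical for 32-bit and 64-bit guests. */
_Static_assert(sizeof(struct sched_pin_temp) == 4,
               "sched_pin_temp must be exactly 4 bytes");
```

A signed type is kept so that -1 can still be passed to undo the pinning.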

Thanks for the review,

Juergen

