[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v5 8/9] x86/intel_pstate: support the use of intel_pstate in pmstat.c



On 08/10/2015 12:11,  Jan Beulich wrote:
> >>> On 14.09.15 at 04:32, <wei.w.wang@xxxxxxxxx> wrote:
> > --- a/tools/libxc/xc_pm.c
> > +++ b/tools/libxc/xc_pm.c
> > @@ -260,13 +260,14 @@ int xc_get_cpufreq_para(xc_interface *xch, int
> cpuid,
> >      }
> >      else
> >      {
> > -        user_para->cpuinfo_cur_freq = sys_para->cpuinfo_cur_freq;
> > -        user_para->cpuinfo_max_freq = sys_para->cpuinfo_max_freq;
> > -        user_para->cpuinfo_min_freq = sys_para->cpuinfo_min_freq;
> > -        user_para->scaling_cur_freq = sys_para->scaling_cur_freq;
> > -        user_para->scaling_max_freq = sys_para->scaling_max_freq;
> > -        user_para->scaling_min_freq = sys_para->scaling_min_freq;
> > -        user_para->turbo_enabled    = sys_para->turbo_enabled;
> > +        user_para->cpuinfo_cur_freq  = sys_para->cpuinfo_cur_freq;
> > +        user_para->cpuinfo_max_freq  = sys_para->cpuinfo_max_freq;
> > +        user_para->cpuinfo_min_freq  = sys_para->cpuinfo_min_freq;
> > +        user_para->scaling_cur_freq  = sys_para->scaling_cur_freq;
> > +        user_para->scaling_max.pct   = sys_para->scaling_max_perf;
> > +        user_para->scaling_min.pct   = sys_para->scaling_min_perf;
> > +        user_para->scaling_turbo_pct = sys_para->scaling_turbo_pct;
> > +        user_para->turbo_enabled     = sys_para->turbo_enabled;
> 
> You fail to communicate perf_alias here. How will the caller know?

"user_para->perf_alias = sys_para->perf_alias" is added in the next patch 
"tools:...", because the the "perf_alias " field is added to 
xc_get_cpufreq_para there.

> >      op->u.get_para.cpuinfo_cur_freq =
> >          cpufreq_driver->get ? cpufreq_driver->get(op->cpuid) : policy->cur;
> >      op->u.get_para.cpuinfo_max_freq = policy->cpuinfo.max_freq;
> >      op->u.get_para.cpuinfo_min_freq = policy->cpuinfo.min_freq;
> >      op->u.get_para.scaling_cur_freq = policy->cur;
> > -    op->u.get_para.scaling_max_freq = policy->max;
> > -    op->u.get_para.scaling_min_freq = policy->min;
> > +    if ( internal_gov )
> > +    {
> > +        op->u.get_para.scaling_max_perf = limits->max_perf_pct;
> > +        op->u.get_para.scaling_min_perf = limits->min_perf_pct;
> > +        op->u.get_para.scaling_turbo_pct = limits->turbo_pct;
> > +    }
> > +    else
> > +    {
> > +        op->u.get_para.scaling_max_perf = policy->max;
> > +        op->u.get_para.scaling_min_perf = policy->min;
> > +    }
> >
> >      if ( cpufreq_driver->name[0] )
> > +    {
> >          strlcpy(op->u.get_para.scaling_driver,
> >              cpufreq_driver->name, CPUFREQ_NAME_LEN);
> > +        if ( !strncmp(cpufreq_driver->name,
> > +                     "intel_pstate", CPUFREQ_NAME_LEN) )
> > +            op->u.get_para.perf_alias = PERCENTAGE;
> > +        else
> > +            op->u.get_para.perf_alias = FREQUENCY;
> 
> This should be done together with the other things in the if/else right above.

The code here is in a different file from the if/else above. Here in pmstat.c , 
we set the perf_alias ("meaning of data"). In the above if/else(xc_pm.c), we 
pass the value from one end to another.
 
> > @@ -299,16 +357,36 @@ static int get_cpufreq_para(struct
> > xen_sysctl_pm_op *op)  static int set_cpufreq_gov(struct
> > xen_sysctl_pm_op *op)  {
> >      struct cpufreq_policy new_policy, *old_policy;
> > +    struct internal_governor *internal_gov;
> >
> >      old_policy = per_cpu(cpufreq_cpu_policy, op->cpuid);
> >      if ( !old_policy )
> >          return -EINVAL;
> > +    internal_gov = old_policy->internal_gov;
> >
> >      memcpy(&new_policy, old_policy, sizeof(struct cpufreq_policy));
> >
> > -    new_policy.governor = __find_governor(op->u.set_gov.scaling_governor);
> > -    if (new_policy.governor == NULL)
> > -        return -EINVAL;
> > +    if ( internal_gov && internal_gov->cur_gov )
> > +    {
> > +        if ( !strnicmp(op->u.set_gov.scaling_governor,
> > +                       "performance", CPUFREQ_NAME_LEN) )
> > +            internal_gov->cur_gov = INTERNAL_GOV_PERFORMANCE;
> > +        else if ( !strnicmp(op->u.set_gov.scaling_governor,
> > +                       "powersave", CPUFREQ_NAME_LEN) )
> > +            internal_gov->cur_gov = INTERNAL_GOV_POWERSAVE;
> > +        else if ( !strnicmp(op->u.set_gov.scaling_governor,
> > +                       "userspace", CPUFREQ_NAME_LEN) )
> > +            internal_gov->cur_gov = INTERNAL_GOV_USERSPACE;
> > +        else if ( !strnicmp(op->u.set_gov.scaling_governor,
> > +                       "ondemand", CPUFREQ_NAME_LEN) )
> > +            internal_gov->cur_gov = INTERNAL_GOV_ONDEMAND;
> 
> Wouldn't this better be done by the internal governor's code, so it can also 
> for
> itself decide which of the kinds it may not want to support?

I think it should be pmstat.c-'s job here to set " internal_gov->cur_gov", 
which is later passed to the internal governor's implementation code, who then 
decides how to deal with the requested governor in "internal_gov->cur_gov".

> > --- a/xen/include/public/sysctl.h
> > +++ b/xen/include/public/sysctl.h
> > @@ -297,11 +297,28 @@ typedef struct xen_ondemand xen_ondemand_t;
> >   * same as sysfs file name of native linux
> >   */
> >  #define CPUFREQ_NAME_LEN 16
> > +
> > +enum perf_alias {
> 
> xen_ prefix missing.

This enum will also be used in xc_get_cpufreq_para, should we still add xen_ 
prefix here?

> > +    FREQUENCY  = 0,
> > +    PERCENTAGE = 1
> > +};
> > +
> >  struct xen_get_cpufreq_para {
> >      /* IN/OUT variable */
> >      uint32_t cpu_num;
> >      uint32_t freq_num;
> >      uint32_t gov_num;
> > +    int32_t turbo_enabled;
> > +
> > +    uint32_t cpuinfo_cur_freq;
> > +    uint32_t cpuinfo_max_freq;
> > +    uint32_t cpuinfo_min_freq;
> > +    uint32_t scaling_cur_freq;
> > +
> > +    uint32_t scaling_turbo_pct;
> > +    uint32_t scaling_max_perf;
> > +    uint32_t scaling_min_perf;
> > +    enum perf_alias perf_alias;
> >
> >      /* for all governors */
> >      /* OUT variable */
> > @@ -309,23 +326,13 @@ struct xen_get_cpufreq_para {
> >      XEN_GUEST_HANDLE_64(uint32) scaling_available_frequencies;
> >      XEN_GUEST_HANDLE_64(char)   scaling_available_governors;
> >      char scaling_driver[CPUFREQ_NAME_LEN];
> > -
> > -    uint32_t cpuinfo_cur_freq;
> > -    uint32_t cpuinfo_max_freq;
> > -    uint32_t cpuinfo_min_freq;
> > -    uint32_t scaling_cur_freq;
> > -
> >      char scaling_governor[CPUFREQ_NAME_LEN];
> > -    uint32_t scaling_max_freq;
> > -    uint32_t scaling_min_freq;
> >
> >      /* for specific governor */
> >      union {
> >          struct  xen_userspace userspace;
> >          struct  xen_ondemand ondemand;
> >      } u;
> > -
> > -    int32_t turbo_enabled;
> >  };
> 
> Is all of this re-arrangement really needed? Also can't turbo_enabled and
> scaling_turbo_pct be combined into a single field?

Personally, we should not combine the two. 
turbo_enabled is used by both the old pstate driver and intel_pstate, but 
scaling_turbo_pct is only used in intel_pstate. If we use "scaling_turbo_pct=0" 
to represent "turbo_enabled=0", and "scaling_turbo_pct>0" to represent " 
turbo_enabled=1", then we will need to modify the old driver to use 
scaling_turbo_pct, i.e. changing the old driver to be aware of the "percentage" 
concept, which is proposed in intel_pstate. On the other side, I think keeping 
turbo_enabled and scaling_turbo_pct separated makes the code easier to read.
After re-arranging the layout, we got enough space to keep the two. I think the 
re-arranged layout doesn't look bad.

Best,
Wei

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.