
Re: [Xen-devel] [PATCH] x86/S3: Fix cpu pool scheduling after suspend/resume



On Tue, Apr 9, 2013 at 10:52 AM, Keir Fraser <keir.xen@xxxxxxxxx> wrote:
> On 09/04/2013 14:17, "Ben Guthro" <benjamin.guthro@xxxxxxxxxx> wrote:
>
>> On Tue, Apr 9, 2013 at 9:03 AM, Jan Beulich <JBeulich@xxxxxxxx> wrote:
>>>>>> On 09.04.13 at 14:46, Ben Guthro <benjamin.guthro@xxxxxxxxxx> wrote:
>>>> This patch addresses another S3 scheduler problem involving the
>>>> system_state variable introduced by the following changeset:
>>>> http://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=269f543ea750ed567d18f2e819e5d5ce58eda5c5
>>>>
>>>> Specifically, it affects the cpu_callback function that takes CPUs down
>>>> during suspend and brings them back up during resume.
>>>> We were seeing situations where, after S3, only CPU0 remained in
>>>> cpupool0. Guest performance suffered greatly, since all vcpus were
>>>> running on a single pcpu. Guests under high CPU load showed the
>>>> problem much more quickly than an idle guest.
>>>>
>>>> Removing this if condition forces the CPUs to go through the expected
>>>> online/offline state transitions, so they are properly scheduled after S3.
>>>
>>> But this doesn't explain _why_ the code block you remove was
>>> wrong. And that would be vital to understand, so we can be
>>> reasonably sure this change won't lead to regressions elsewhere
>>> again.
>>
>> I would argue that there have been so many problems with the original
>> changeset that the argument should run in the other direction - since
>> the changeset that introduced the system_state variable, nobody has
>> been able to suspend successfully, as has been discussed in multiple
>> threads over the past year.
>
> We could revert all the system_state patches and try Juergen's original
> patch?

Apologies for my delay in responding to this - I have been traveling.

I think that reverting the original patch should be considered, if we
want S3 to work.

IMO, a severe regression of a long-standing platform feature (S3) for
the introduction of a new one (cpupools) is a problem that should be
addressed by the author of the new feature, or that new feature should
be reverted.

Unfortunately, since Konrad's acpi-s3 patches are not upstream, it is
difficult to add automated tests that would catch these regressions early.


Ben


>
>  -- Keir
>
>> Why does this particular callback get bailed out of, but not others?
>> Previously, the code worked and went through this code path.
>> Why this one in particular?
>>
>> We have been systematically removing parts of the system_state
>> changeset as they affect the S3 path. This is just another change that
>> puts things back the way they were prior to that changeset (at least
>> the second hunk does).
>>
>> I'm open to other suggestions, but this was the only path that
>> explained the fact that all of the vcpus would end up on cpu0.
>>
>> Ben
>>
>>>
>>> Jan
>>>
>>>> This also includes a necessary partial change proposed earlier by Tomasz
>>>> Wroblewski here:
>>>> http://lists.xen.org/archives/html/xen-devel/2013-01/msg02206.html
>>>>
>>>> It should also resolve the issues discussed in this thread:
>>>> http://lists.xen.org/archives/html/xen-devel/2012-11/msg01801.html
>>>>
>>>> Signed-off-by: Ben Guthro <benjamin.guthro@xxxxxxxxxx>
>>>> ---
>>>>  xen/common/cpu.c     |    3 +++
>>>>  xen/common/cpupool.c |    5 -----
>>>>  2 files changed, 3 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/xen/common/cpu.c b/xen/common/cpu.c
>>>> index 630881e..e20868c 100644
>>>> --- a/xen/common/cpu.c
>>>> +++ b/xen/common/cpu.c
>>>> @@ -5,6 +5,7 @@
>>>>  #include <xen/init.h>
>>>>  #include <xen/sched.h>
>>>>  #include <xen/stop_machine.h>
>>>> +#include <xen/sched-if.h>
>>>>
>>>>  unsigned int __read_mostly nr_cpu_ids = NR_CPUS;
>>>>  #ifndef nr_cpumask_bits
>>>> @@ -212,6 +213,8 @@ void enable_nonboot_cpus(void)
>>>>              BUG_ON(error == -EBUSY);
>>>>              printk("Error taking CPU%d up: %d\n", cpu, error);
>>>>          }
>>>> +        if (system_state == SYS_STATE_resume)
>>>> +            cpumask_set_cpu(cpu, cpupool0->cpu_valid);
>>>>      }
>>>>
>>>>      cpumask_clear(&frozen_cpus);
>>>> diff --git a/xen/common/cpupool.c b/xen/common/cpupool.c
>>>> index 10b10f8..a9653a8 100644
>>>> --- a/xen/common/cpupool.c
>>>> +++ b/xen/common/cpupool.c
>>>> @@ -633,10 +633,6 @@ static int cpu_callback(
>>>>      unsigned int cpu = (unsigned long)hcpu;
>>>>      int rc = 0;
>>>>
>>>> -    if ( (system_state == SYS_STATE_suspend) ||
>>>> -         (system_state == SYS_STATE_resume) )
>>>> -        goto out;
>>>> -
>>>>      switch ( action )
>>>>      {
>>>>      case CPU_DOWN_FAILED:
>>>> @@ -650,7 +646,6 @@ static int cpu_callback(
>>>>          break;
>>>>      }
>>>>
>>>> -out:
>>>>      return !rc ? NOTIFY_DONE : notifier_from_errno(rc);
>>>>  }
>>>>
>>>> --
>>>> 1.7.9.5
>>>>
>>>>
>>>> _______________________________________________
>>>> Xen-devel mailing list
>>>> Xen-devel@xxxxxxxxxxxxx
>>>> http://lists.xen.org/xen-devel
>>>
>>
>
