Re: [Xen-devel] High CPU temp, suspend problem - xen 4.1.5-pre, linux 3.7.x



On 01.04.2013 15:53, Ben Guthro wrote:
> On Thu, Mar 28, 2013 at 3:03 PM, Marek Marczykowski
> <marmarek@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>> (XEN) Restoring affinity for d2v3
>> (XEN) Assertion '!cpus_empty(cpus) && cpu_isset(cpu, cpus)' failed at
>> sched_credit.c:481
> 
> 
> I think the "fix-suspend-scheduler-*" patches posted here are applicable here:
> http://markmail.org/message/llj3oyhgjzvw3t23
> 
> 
> Specifically, I think you need this bit:
> 
> diff --git a/xen/common/cpu.c b/xen/common/cpu.c
> index 630881e..e20868c 100644
> --- a/xen/common/cpu.c
> +++ b/xen/common/cpu.c
> @@ -5,6 +5,7 @@
>  #include <xen/init.h>
>  #include <xen/sched.h>
>  #include <xen/stop_machine.h>
> +#include <xen/sched-if.h>
> 
>  unsigned int __read_mostly nr_cpu_ids = NR_CPUS;
>  #ifndef nr_cpumask_bits
> @@ -212,6 +213,8 @@ void enable_nonboot_cpus(void)
>              BUG_ON(error == -EBUSY);
>              printk("Error taking CPU%d up: %d\n", cpu, error);
>          }
> +        if (system_state == SYS_STATE_resume)
> +            cpumask_set_cpu(cpu, cpupool0->cpu_valid);
>      }
> 
>      cpumask_clear(&frozen_cpus);
> 

Indeed, this makes things better, but it is still not ideal.
After resume all CPUs are now in Pool-0, which is good. But CPU0 is strongly
preferred over the others (see xl vcpu-list). For example, if I start 4 busy
loops in dom0, I get (even after some time):
[user@dom0 ~]$ xl vcpu-list
Name                                ID  VCPU   CPU State   Time(s) CPU Affinity
dom0                                 0     0    0   r--      98.5  any cpu
dom0                                 0     1    0   ---     181.3  any cpu
dom0                                 0     2    2   r--     262.4  any cpu
dom0                                 0     3    3   r--     230.8  any cpu
netvm                                1     0    0   -b-      18.4  any cpu
netvm                                1     1    0   -b-       9.1  any cpu
netvm                                1     2    0   -b-       7.1  any cpu
netvm                                1     3    0   -b-       5.4  any cpu
firewallvm                           2     0    0   -b-      10.7  any cpu
firewallvm                           2     1    0   -b-       3.0  any cpu
firewallvm                           2     2    0   -b-       2.5  any cpu
firewallvm                           2     3    3   -b-       3.6  any cpu
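
To put a number on the skew, here is a minimal Python sketch that counts how many vCPUs sit on each physical CPU in `xl vcpu-list` output. It assumes the column layout shown above (CPU is the fourth field); the helper name `vcpus_per_cpu` is hypothetical, and the sample text is simply the listing from this mail.

```python
from collections import Counter

# Listing copied from the xl vcpu-list output above.
SAMPLE = """\
Name                                ID  VCPU   CPU State   Time(s) CPU Affinity
dom0                                 0     0    0   r--      98.5  any cpu
dom0                                 0     1    0   ---     181.3  any cpu
dom0                                 0     2    2   r--     262.4  any cpu
dom0                                 0     3    3   r--     230.8  any cpu
netvm                                1     0    0   -b-      18.4  any cpu
netvm                                1     1    0   -b-       9.1  any cpu
netvm                                1     2    0   -b-       7.1  any cpu
netvm                                1     3    0   -b-       5.4  any cpu
firewallvm                           2     0    0   -b-      10.7  any cpu
firewallvm                           2     1    0   -b-       3.0  any cpu
firewallvm                           2     2    0   -b-       2.5  any cpu
firewallvm                           2     3    3   -b-       3.6  any cpu
"""


def vcpus_per_cpu(text):
    """Count vCPUs per physical CPU (hypothetical helper, assumed column layout)."""
    counts = Counter()
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, short lines and the header row.
        if len(fields) < 6 or fields[0] == "Name":
            continue
        cpu = fields[3]
        # Rows whose CPU field is not a number are ignored.
        if cpu.isdigit():
            counts[int(cpu)] += 1
    return counts


if __name__ == "__main__":
    for cpu, n in sorted(vcpus_per_cpu(SAMPLE).items()):
        print(f"CPU{cpu}: {n} vCPUs")
```

On the listing above this gives 9 of the 12 vCPUs on CPU0 and none on CPU1.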

If I remove a CPU from Pool-0 and re-add it, things go back to normal for that
particular CPU (so I get two equally used CPUs). To fully restore the system I
must remove every CPU except CPU0 from Pool-0 and add each one back.
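
The workaround can be sketched as a small Python helper that only builds the `xl cpupool-cpu-remove` / `xl cpupool-cpu-add` command lines and prints them, without running anything. The helper name `pool_cycle_commands`, the CPU list 1-3 (inferred from the four-vCPU listing) and the choice to cycle one CPU at a time are assumptions for illustration, not something verified beyond the description above.

```python
def pool_cycle_commands(cpus, pool="Pool-0"):
    """Return the xl commands that remove each CPU from `pool` and re-add it.

    Hypothetical helper: it only builds argument lists, it does not run them.
    """
    cmds = []
    for cpu in cpus:
        # Remove the CPU from the pool, then add it straight back.
        cmds.append(["xl", "cpupool-cpu-remove", pool, str(cpu)])
        cmds.append(["xl", "cpupool-cpu-add", pool, str(cpu)])
    return cmds


if __name__ == "__main__":
    # Assumed CPU numbering: CPU0 stays in the pool, CPUs 1-3 are cycled.
    for cmd in pool_cycle_commands([1, 2, 3]):
        print(" ".join(cmd))
```

Printing the commands first makes it easy to review them before pasting them into a dom0 shell.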

Also, still only CPU0 has all C-states (C0-C3); all the others have only C0-C1.
This could probably be fixed by your "xen: Re-upload processor PM data to
hypervisor after S3 resume" patch (reloading the xen-acpi-processor module
helps here). But I don't think that is the right way: it isn't necessary on
other systems (with somewhat older hardware). Something must be missing on the
resume path. The question is what...

Perhaps someone needs to go through enable_nonboot_cpus() (__cpu_up?) and check
whether it restores everything disabled in disable_nonboot_cpus()
(__cpu_disable?). Unfortunately I don't know the x86 details well enough to
follow that code...

-- 
Best Regards / Pozdrawiam,
Marek Marczykowski
Invisible Things Lab

