Re: [Xen-devel] High CPU temp, suspend problem - xen 4.1.5-pre, linux 3.7.x
On Fri, Mar 22, 2013 at 04:34:11PM +0100, Marek Marczykowski wrote:
> On 15.03.2013 14:02, Konrad Rzeszutek Wilk wrote:
> > On Wed, Mar 13, 2013 at 09:50:39PM +0100, Marek Marczykowski wrote:
> >> Hi,
> >>
> >> I still have problems with ACPI(?) on Xen. After some system startups or
> >> resumes the CPU temperature goes high although all domUs (and dom0) are
> >> idle. On a "good" system startup it is about 50-55C, on a "bad" one above
> >> 67C (most of the time above 70C). I've noticed a difference in the
> >> C-states reported by Xen (attached files). On "bad" startups, in
> >> addition, suspend doesn't work - the system restarts during suspend (I
> >> still haven't managed to get console messages - I don't have a serial
> >> port on this system). Note that sometimes the system boots fine ("good"
> >> state), but the problem occurs after some suspend/resume cycles. Some
> >> time ago I got other symptoms: only CPU0 was used - for all VCPUs
> >> (according to xl vcpu-list). Maybe it is related?
> >>
> >> Hardware: Dell Latitude E6420
> >> CPU: Intel i5-2520M
> >>
> >> Software:
> >> xen stable-4.1 as of 15.02 (last commit: "xen: sched_credit: improve
> >> picking up the idle CPU for a VCPU"), with reverted commit "Introduce
> >> system_state variable."
> >> But the same problem occurs on vanilla xen 4.1.2.
> >>
> >> Linux 3.7.6 - happens almost every boot. On Linux 3.7.4 it happens much
> >> more rarely (but still occurs).
> >> Kernel config:
> >> http://git.qubes-os.org/gitweb/?p=marmarek/kernel.git;a=blob;f=config-pvops;h=a6e953f71cdc84556571b592b8af87a5a4f9a8d0;hb=HEAD
> >> I've tried some bisecting from 3.7.4 to 3.7.6, but without success
> >> because the problem isn't 100% reproducible.
> >>
> >> Any ideas?
> >
> > That C-states difference is important. The SYSIO part on your box means
> > that the CPU ends up doing an MWAIT. A HALT on the other hand is not so
> > power-saving friendly.
> >
> > Looking at this:
> >> (XEN) no cpu_id for acpi_id 5
> >> (XEN) no cpu_id for acpi_id 6
> >> (XEN) no cpu_id for acpi_id 7
> >> (XEN) no cpu_id for acpi_id 8
> >
> > .. means that xen-acpi-processor was trying to probe for the ACPI IDs of
> > the other CPUs that the machine theoretically can support. That means it
> > got the ACPI information for the first four CPUs (which is good).
> >
> > As the first step in trying to figure this out, you can add
> > #define DEBUG 1 in xen-acpi-processor.c right before any of the
> > #includes. And also boot Xen with 'cpufreq=verbose'. That should tell you
> > what kind of C-states xen-acpi-processor uploaded (and if it did it for
> > all of the vCPUs).
> >
> > If both bootups show that we do upload the C-states for all the CPUs but
> > they vary, that means digging a bit deeper in the ACPI code. Specifically
> > in acpi_processor_get_power_info_cst and seeing if it hits any of the
> > 'continue'.
> >
> > Then I would say take also the DSDT for both bootups and compare them.
> > It might be that the BIOS is using a scratch register at reboot to
> > construct the C-states and somehow it ends up being corrupted. Which
> > means that on the next warm reboot the C-states have bogus data. This
> > does show up in the field :-(
>
> Finally I've found some time for further debugging of this. And it looks
> like some deeper ACPI code problem...
>
> I've switched to 3.8.4, on which the problem is much easier to reproduce
> (almost every startup).
>
> On a bad bootup, xen-acpi-processor didn't find any C-state: for each CPU
> _pr.flags.power and _pr->power.count was 0 (but
> flags.power_setup_done=1). In this case suspend (or shutdown) always ends
> up with a reset.

Is this with you booting the machine from a cold state or a warm one? There
are some BIOSes out there that I know use the scratchpad registers in the
IOH (depending on the platform that can be 0:0e.1, Reg 0x84).
If Xen or Linux touch it then the P-states and C-states that the BIOS
generates are buggy. But that is not the case here - you are saying that
after disassembling the DSDT (so cat /sys/firmware/acpi/tables/DSDT, or
SSDT*, and then iasl -d on them), the _PSD, _PSS, and _PCT look the same?
You could also look at the FACP table and see if it is different.

> On a good one xen-acpi-processor got C1-C3 states for each CPU, then
> suspend succeeded, but after resume CPU0 had C1-C3, but the others only
> C1. Reloading xen-acpi-processor (rmmod -f...) fixes this (according to
> xl debug-key c), but the temperature still stays high. Regardless of
> xen-acpi-processor reloading, the next suspend always fails.

If you reload, and look at the runqueues, are all of them using the ACPI
idler or the default one?

> Not sure how C-states can be related to S3 suspend, but perhaps something
> more general with ACPI is wrong?

This reminds me of something. I recall a long long time ago seeing
something like this.... Completely forgot about this until now. The
difference was whether Xen's cpu_idle was running a) the acpi_idle (so
using the different C-states), or b) the default one (so just using HLT).

With b), during resume it would get half-way through
(http://darnok.org/xen/devel.acpi-s3.v1.serial.log) while with a) it would
actually continue on - http://darnok.org/xen/devel.acpi-s3.v0.serial.log

This was on some MSI MS-7680/H61M-P23 (MS-7680) motherboard. Oh look:
http://lists.xen.org/archives/html/xen-devel/2011-06/msg02059.html

And it looks like Kevin's recommendation was to use the a) case with
max_cstate=1 to narrow it down.

> Each time the DSDT (taken from /sys/firmware/acpi/tables) is exactly the
> same.
>
> --
> Best Regards / Pozdrawiam,
> Marek Marczykowski
> Invisible Things Lab
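
For comparing the raw ACPI tables (DSDT, SSDT*, FACP) between a good and a
bad boot, something along these lines might help. It is only a sketch. It
assumes that on each boot the files from /sys/firmware/acpi/tables were
copied into a separate directory; the directory names and the script name
below are made up for illustration. It only tells you which tables differ
byte-for-byte; the differing ones still need iasl -d and a manual look at
_PSD, _PSS and _PCT.

```python
import hashlib
import sys
from pathlib import Path


def hash_tables(dump_dir):
    """Map each regular file in dump_dir to the SHA-256 of its contents.

    Subdirectories are skipped; only top-level table files are hashed.
    """
    digests = {}
    for path in sorted(Path(dump_dir).iterdir()):
        if path.is_file():
            digests[path.name] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def differing_tables(good_dir, bad_dir):
    """Return sorted names of tables that differ or exist in only one dump."""
    good = hash_tables(good_dir)
    bad = hash_tables(bad_dir)
    return sorted(
        name for name in good.keys() | bad.keys()
        if good.get(name) != bad.get(name)
    )


if __name__ == "__main__":
    # Hypothetical usage: compare_tables.py tables-good tables-bad
    if len(sys.argv) != 3:
        sys.exit("usage: compare_tables.py GOOD_DIR BAD_DIR")
    changed = differing_tables(sys.argv[1], sys.argv[2])
    for name in changed:
        print("differs:", name)
    if not changed:
        print("all tables identical")
```

Hashing the binary tables rather than the disassembled text avoids noise
from iasl version differences, at the cost of not showing what changed.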