Re: [Xen-devel] [BUG] EDAC infomation partially missing
On 21.01.16 at 17:41, Jan Beulich wrote:
>>>> On 20.01.16 at 16:01, <andreas.pflug@xxxxxx> wrote:
>> Initially reported to debian
>> (http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=810964), redirected here:
>>
>> With AMD Opteron 6xxx processors, half of the memory controllers are
>> missing from /sys/devices/system/edac/mc
>> Checked with single 6120 (dual memory controller) and twin 6344 (2x dual
>> MC), other dual-module CPUs might be affected too.
>>
>> Booting plain Linux (3.2, 3.16, 4.1, 4.3), all memory controllers are
>> listed under /sys/devices/system/edac/mc as expected. Same happens, when
>> Xen 4.1 is used: all MCs present.
>>
>> Starting with Xen 4.4 (Debian Jessie), only mc1 (on the single CPU
>> machine) or mc2/mc3 (dual CPU machine) are present, although the full
>> system memory is accessible. Checked versions were 4.1.4 (Debian
>> Wheezy), 4.4.1 (Jessie) and 4.6.0 (Sid)
> As already indicated by Ian in that bug, you should supply us with
> full kernel and hypervisor logs for both the good and bad cases
> (ideally with the same kernel version use in both runs, so that we
> can exclude kernel behavior differences).

Here are some dmesg excerpts, all taken with Linux 4.1.3.

When booting with Xen 4.1.4:

AMD64 EDAC driver v3.4.0
EDAC amd64: DRAM ECC enabled.
EDAC amd64: F10h detected (node 0).
EDAC MC: DCT0 chip selects:
EDAC amd64: MC: 0: 0MB 1: 0MB
EDAC amd64: MC: 2: 2048MB 3: 2048MB
EDAC amd64: MC: 4: 0MB 5: 0MB
EDAC amd64: MC: 6: 0MB 7: 0MB
EDAC MC: DCT1 chip selects:
EDAC amd64: MC: 0: 0MB 1: 0MB
EDAC amd64: MC: 2: 2048MB 3: 2048MB
EDAC amd64: MC: 4: 0MB 5: 0MB
EDAC amd64: MC: 6: 0MB 7: 0MB
EDAC amd64: using x8 syndromes.
EDAC amd64: MCT channel count: 2
EDAC MC0: Giving out device to module amd64_edac controller F10h: DEV 0000:00:18.2 (INTERRUPT)
EDAC amd64: DRAM ECC enabled.
EDAC amd64: F10h detected (node 1).
EDAC MC: DCT0 chip selects:
EDAC amd64: MC: 0: 0MB 1: 0MB
EDAC amd64: MC: 2: 2048MB 3: 2048MB
EDAC amd64: MC: 4: 0MB 5: 0MB
EDAC amd64: MC: 6: 0MB 7: 0MB
EDAC MC: DCT1 chip selects:
EDAC amd64: MC: 0: 0MB 1: 0MB
EDAC amd64: MC: 2: 2048MB 3: 2048MB
EDAC amd64: MC: 4: 0MB 5: 0MB
EDAC amd64: MC: 6: 0MB 7: 0MB
EDAC amd64: using x8 syndromes.
EDAC amd64: MCT channel count: 2
EDAC MC1: Giving out device to module amd64_edac controller F10h: DEV 0000:00:19.2 (INTERRUPT)

When booting with Xen 4.4.1:

AMD64 EDAC driver v3.4.0
EDAC amd64: DRAM ECC enabled.
EDAC amd64: NB MCE bank disabled, set MSR 0x0000017b[4] on node 0 to enable.
EDAC amd64: ECC disabled in the BIOS or no ECC capability, module will not load.
Either enable ECC checking or force module loading by setting 'ecc_enable_override'.
(Note that use of the override may cause unknown side effects.)
EDAC amd64: DRAM ECC enabled.
EDAC amd64: F10h detected (node 1).
EDAC MC: DCT0 chip selects:
EDAC amd64: MC: 0: 0MB 1: 0MB
EDAC amd64: MC: 2: 2048MB 3: 2048MB
EDAC amd64: MC: 4: 0MB 5: 0MB
EDAC amd64: MC: 6: 0MB 7: 0MB
EDAC MC: DCT1 chip selects:
EDAC amd64: MC: 0: 0MB 1: 0MB
EDAC amd64: MC: 2: 2048MB 3: 2048MB
EDAC amd64: MC: 4: 0MB 5: 0MB
EDAC amd64: MC: 6: 0MB 7: 0MB
EDAC amd64: using x8 syndromes.
EDAC amd64: MCT channel count: 2
EDAC MC1: Giving out device to module amd64_edac controller F10h: DEV 0000:00:19.2 (INTERRUPT)

Apparently Xen 4.4 doesn't report the BIOS flag correctly for node 0. I
added ecc_enable_override=1 to amd64_edac_mod, and then I get:

EDAC MC: Ver: 3.0.0
AMD64 EDAC driver v3.4.0
EDAC amd64: DRAM ECC enabled.
EDAC amd64: NB MCE bank disabled, set MSR 0x0000017b[4] on node 0 to enable.
EDAC amd64: ECC disabled in the BIOS or no ECC capability, module will not load.
EDAC amd64: Forcing ECC on!
EDAC amd64: F10h detected (node 0).
EDAC MC: DCT0 chip selects:
EDAC amd64: MC: 0: 0MB 1: 0MB
EDAC amd64: MC: 2: 2048MB 3: 2048MB
EDAC amd64: MC: 4: 0MB 5: 0MB
EDAC amd64: MC: 6: 0MB 7: 0MB
EDAC MC: DCT1 chip selects:
EDAC amd64: MC: 0: 0MB 1: 0MB
EDAC amd64: MC: 2: 2048MB 3: 2048MB
EDAC amd64: MC: 4: 0MB 5: 0MB
EDAC amd64: MC: 6: 0MB 7: 0MB
EDAC amd64: using x8 syndromes.
EDAC amd64: MCT channel count: 2
EDAC MC0: Giving out device to module amd64_edac controller F10h: DEV 0000:00:18.2 (INTERRUPT)
EDAC amd64: DRAM ECC enabled.
EDAC amd64: F10h detected (node 1).
EDAC MC: DCT0 chip selects:
EDAC amd64: MC: 0: 0MB 1: 0MB
EDAC amd64: MC: 2: 2048MB 3: 2048MB
EDAC amd64: MC: 4: 0MB 5: 0MB
EDAC amd64: MC: 6: 0MB 7: 0MB
EDAC MC: DCT1 chip selects:
EDAC amd64: MC: 0: 0MB 1: 0MB
EDAC amd64: MC: 2: 2048MB 3: 2048MB
EDAC amd64: MC: 4: 0MB 5: 0MB
EDAC amd64: MC: 6: 0MB 7: 0MB
EDAC amd64: using x8 syndromes.
EDAC amd64: MCT channel count: 2
EDAC MC1: Giving out device to module amd64_edac controller F10h: DEV 0000:00:19.2 (INTERRUPT)

This restored both MCs, so the BIOS flag seems to be the culprit.

Regards,
Andreas
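P.S. For anyone who wants to compare the good and bad cases quickly, here is a rough sketch of how such dmesg excerpts could be summarized. It only pattern-matches the message wording seen in the logs above; the function name summarize_edac and the sample constant are made up for illustration, and other kernel versions may word these messages differently.

```python
import re

# Patterns follow the wording of the dmesg lines quoted in this mail
# (assumption: other kernel versions may phrase these differently).
MC_RE = re.compile(r"EDAC MC(\d+): Giving out device")
NB_RE = re.compile(
    r"NB MCE bank disabled, set MSR 0x[0-9a-fA-F]+\[\d+\] on node (\d+)"
)

# Shortened excerpt of the Xen 4.4.1 boot shown above.
XEN_441_EXCERPT = """\
EDAC amd64: DRAM ECC enabled.
EDAC amd64: NB MCE bank disabled, set MSR 0x0000017b[4] on node 0 to enable.
EDAC amd64: ECC disabled in the BIOS or no ECC capability, module will not load.
EDAC amd64: DRAM ECC enabled.
EDAC amd64: F10h detected (node 1).
EDAC MC: DCT0 chip selects:
EDAC MC1: Giving out device to module amd64_edac controller F10h: DEV 0000:00:19.2 (INTERRUPT)
"""


def summarize_edac(log_text):
    """Return which MCs were registered and which nodes report the NB MCE bank as disabled."""
    registered = sorted({int(n) for n in MC_RE.findall(log_text)})
    nb_disabled = sorted({int(n) for n in NB_RE.findall(log_text)})
    return {"registered_mcs": registered, "nb_mce_disabled_nodes": nb_disabled}


if __name__ == "__main__":
    print(summarize_edac(XEN_441_EXCERPT))
```

Feeding it the full output of dmesg from each boot should show at a glance which node lost its memory controller.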