
Re: [Xen-devel] [BUG] EDAC information partially missing



On 16/05/17 10:54, Jan Beulich wrote:
>>>> On 16.05.17 at 05:47, <ehem+debian@xxxxxxx> wrote:
>> On Mon, May 15, 2017 at 02:02:53AM -0600, Jan Beulich wrote:
>>>>>> On 14.05.17 at 00:36, <ehem+debian@xxxxxxx> wrote:
>>>> I haven't yet done as much experimentation as Andreas Pflug has, but I
>>>> can confirm I'm also running into this bug with Xen 4.4.1.
>>>>
>>>> I've only tried Linux kernel 3.16.43, but as Dom0:
>>>>
>>>> EDAC MC: Ver: 3.0.0
>>>> AMD64 EDAC driver v3.4.0
>>>> EDAC amd64: DRAM ECC enabled.
>>>> EDAC amd64: NB MCE bank disabled, set MSR 0x0000017b[4] on node 0 to enable.
>>>> EDAC amd64: ECC disabled in the BIOS or no ECC capability, module will not load.
>>>> AMD64 EDAC driver v3.4.0
>>>> EDAC amd64: DRAM ECC enabled.
>>>> EDAC amd64: NB MCE bank disabled, set MSR 0x0000017b[4] on node 0 to enable.
>>>> EDAC amd64: ECC disabled in the BIOS or no ECC capability, module will not load.
>>> Afaict the driver as is simply can't work in a Xen Dom0; it needs
>>> enabling (read: para-virtualizing). I'm actually glad to see it doesn't
>>> load (the worse alternative would be for it to load and then do the
>>> wrong thing or give you a false sense of safety of your data).
>> I'm unsure of how to evaluate the situation.  Since ECC is enabled in the
>> BIOS, data should be safe whether or not the EDAC driver loads.  I
>> /suspect/ the EDAC driver failing to load merely means reporting of ECC
>> errors won't happen.
> "Merely" being relative here: The missing reports mean a false feeling
> of safety, as they may be early indications of later double-bit errors.
>
>>  I suspect the only paravirtualization needed is to
>> map the physical address of the soft|hard errors to which VM's memory
>> range was affected.  What this affects is which VM should panic in case
>> of hard errors.
> Which in turn obviously requires hypervisor interaction. It's not really
> clear to me whether perhaps the driver would better live in the
> hypervisor in the first place for that reason.

The driver should probably live directly in Xen; it needs to program a
number of northbridge and CPU registers including interrupt information.

For the reporting side of things, it looks like it would require vMCE to
pass on fault information to guests.
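
As a rough illustration of the mapping step discussed above (deciding
which guest an error at a given physical address belongs to), here is a
minimal sketch. It is not Xen code: the Region type, build_index(),
owner_of() and the domain labels are all hypothetical, and it assumes
the ownership ranges do not overlap.

```python
from bisect import bisect_right
from typing import List, NamedTuple, Optional, Tuple


class Region(NamedTuple):
    """Hypothetical record: an address range and the domain that owns it."""
    start: int   # first address of the range (inclusive)
    end: int     # one past the last address (exclusive)
    domain: str  # illustrative owner label, e.g. "dom0"


Index = Tuple[List[int], List[Region]]


def build_index(regions: List[Region]) -> Index:
    """Sort the regions by start address so lookups can use binary search.

    Assumes the regions do not overlap.
    """
    ordered = sorted(regions)
    starts = [r.start for r in ordered]
    return starts, ordered


def owner_of(index: Index, addr: int) -> Optional[str]:
    """Return the owning domain of addr, or None if no region covers it."""
    starts, ordered = index
    # Find the last region whose start is <= addr.
    i = bisect_right(starts, addr) - 1
    if i < 0:
        return None
    region = ordered[i]
    return region.domain if addr < region.end else None
```

With such a lookup, an error address reported by the hardware could be
attributed to a single guest, which is the information a vMCE-style
notification would need to carry; addresses that fall in no guest's
range would stay with the hypervisor.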

~Andrew
