Re: Re: [Xen-devel] RFC: MCA/MCE concept
Hi,

Apologies for the screwy quoting below - I did not receive the first half of this thread so it's been forwarded to me.
The greatest rewards here are in syndrome/row/column/bank analysis of the error stream. Where something like a bad pin produces tonnes of correctable errors (CEs), they are always on the same bit, and your chance of an uncorrectable error (UE) is that of a random radiation-type CE landing within the set of ECC checkwords already being undermined by that pin - not very high. On the other hand, if we're seeing repeated distinct syndromes from the same chip-select (or chip-select in a pair), then there is a good chance they could collide "soon" - our data is that this combination predicts a UE within hours to a few days.

If you have row/column/bank decoding you can also perform further analysis of the error source and assess the chances of a collision that would produce a UE. That example has DIMM memory in mind, but similar approaches apply elsewhere, for example to cache memory where it is ECC protected.
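To make the syndrome heuristic above concrete, here is a minimal sketch of how a CE stream might be grouped by chip-select and checked for distinct syndromes. The record layout, the function name `flag_risky_chip_selects` and the `min_distinct` threshold are all assumptions for illustration, not anything Xen or a particular platform provides, and the sketch does not handle paired chip-selects.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CorrectableError:
    # Hypothetical decoded fields; real hardware reports differ by platform.
    chip_select: int
    syndrome: int


def flag_risky_chip_selects(errors, min_distinct=2):
    """Return the chip-selects that reported at least `min_distinct`
    different syndromes. `min_distinct` is an illustrative threshold,
    not a value taken from field data."""
    syndromes = defaultdict(set)
    for err in errors:
        syndromes[err.chip_select].add(err.syndrome)
    return {cs for cs, seen in syndromes.items() if len(seen) >= min_distinct}


# Chip-select 0: many CEs, all with one syndrome (the "bad pin" pattern).
# Chip-select 1: few CEs, but with distinct syndromes (the worrying pattern).
errors = [CorrectableError(0, 0x1F)] * 1000 + [
    CorrectableError(1, 0x1F),
    CorrectableError(1, 0x2A),
    CorrectableError(1, 0x73),
]
risky = flag_risky_chip_selects(errors)
print(sorted(risky))
```

Note that the raw CE count plays no part in the decision: a thousand errors with one syndrome are not flagged, while three errors with three syndromes are.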
As above, some predictors can give you hours to a few days' warning of a UE.

Gavin

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel