Re: [Xen-devel] RFC: MCA/MCE concept
On Wednesday 30 May 2007 10:49:40 Jan Beulich wrote:
> >>> "Christoph Egger" <Christoph.Egger@xxxxxxx> 30.05.07 09:45 >>>
> >
> >On Wednesday 30 May 2007 09:19:12 Jan Beulich wrote:
> >> >case I) - Xen receives a MCE from the CPU
> >> >
> >> >1) Xen MCE handler figures out if error is a correctable error (CE)
> >> >   or uncorrectable error (UE)
> >> >2a) error == CE:
> >> >    Xen notifies Dom0 if Dom0 installed an MCA event handler
> >> >    for statistical purpose
> >> >2b) error == UE and UE impacts Xen or Dom0:
> >>
> >> A very important aspect here is how you want to classify what impact an
> >> uncorrectable has - generally, I can see very few situations where you
> >> could confine the impact to a sub-portion of the system (i.e. a single
> >> domU, dom0, or Xen). The general rule in my opinion must be to halt the
> >> system, the question just is how likely it is that you can get a
> >> meaningful message out (to screen, serial, or logs) that can help
> >> analyze the problem afterwards. If it is somewhat likely, then dom0
> >> should be involved, otherwise Xen should just shut down the system.
> >
> >Here you can best help out using HW features to handle errors.
> >AMD CPUs feature online-spare RAM and Chipkill since K8 RevF.
> >
> >CPUs such as the SPARC feature Data Poisoning. That would be the
> >most handy technique that can be used here.
>
> But that assumes the error is recoverable (i.e. no other data got
> corrupted). You still didn't clarify how you intend to determine the
> impact an uncorrectable error had.

I know. I am lacking a sudden inspiration here. That's why I discuss this
here before writing code that goes nowhere.
Anyone here with a flash of genius?
:-)

> >> >3a) DomU is a PV guest:
> >> >    if DomU installed MCA event handler, it gets notified to perform
> >> >    self-healing
> >> >    if DomU did not install MCA event handler, notify Dom0 to do
> >> >    some operations on DomU (case II)
> >> >    if neither DomU nor Dom0 installed MCA event handlers,
> >> >    then Xen kills DomU
> >> >3b) DomU is a HVM guest:
> >> >    if DomU features a PV driver then behave as in 3a)
> >>
> >> What significance do pv drivers have here? Or do you mean a pv MCA
> >> driver?
> >
> >Yes, I mean the pv MCA driver.
> >
> >> >    if DomU enabled MCA/MCE via MSR, inject MCE into guest
> >> >    if DomU did not enable MCA/MCE via MSR, notify Dom0
> >> >    to do some operations on DomU (case II)
> >> >    if neither DomU enabled MCA/MCE nor Dom0 installed
> >> >    MCA event handler, Xen kills DomU
> >>
> >> Injecting an MCE to a hvm guest seems at least questionable. It can't
> >> really do anything about it (it doesn't even know the real topology of
> >> the system it's running on, so addresses stored in MSRs are meaningless
> >> - either you allow them to be read untranslated [in which case the guest
> >> cannot make sense of them] or you do translation for the guest [in which
> >> case it might make assumptions about co-locality of other nearby pages
> >> which will be wrong]).
> >
> >Yes, Xen should do the translation for the guest. The assumptions must
> >be fixed then. I know that's easier said than done.
>
> Exactly - you are proposing to fix all possible OSes, including
> sufficiently old ones. That's impossible. And I can't even see why an OS
> intended to run on native hardware would care to try to deal with
> virtualization aspects like this.

I think it was not obvious that Xen should not inject failures into
DomUs that don't feature fault management. In this case, either Dom0
tells Xen what to do with the DomU or Xen just kills the DomU.

<snippet from above>
> >> >3a) DomU is a PV guest: ....
> >> >    if DomU did not install MCA event handler, notify Dom0 to do
> >> >    some operations on DomU (case II)
> >> >    if neither DomU nor Dom0 installed MCA event handlers,
> >> >    then Xen kills DomU
> >> >3b) DomU is a HVM guest: ....
> >> >    if DomU did not enable MCA/MCE via MSR, notify Dom0
> >> >    to do some operations on DomU (case II)
> >> >    if neither DomU enabled MCA/MCE nor Dom0 installed
> >> >    MCA event handler, Xen kills DomU
</snippet>

Christoph

-- 
AMD Saxony, Dresden Germany
Operating System Research Center

Legal Information:
AMD Saxony Limited Liability Company & Co. KG
Registered office (business address):
   Wilschdorfer Landstr. 101, 01109 Dresden, Germany
Register court Dresden: HRA 4896
General partner authorized to represent:
   AMD Saxony LLC (registered office Wilmington, Delaware, USA)
Managing directors of AMD Saxony LLC:
   Dr. Hans-R. Deppe, Thomas McCoy

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
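
The DomU handling proposed in 3a) and 3b) of the quoted snippet can be read as a small decision function. Below is a minimal sketch in Python of one possible reading, not Xen code: the names `DomU`, `Action` and `dispatch_ue` are hypothetical, and it assumes that "behave as in 3a)" means an HVM guest with a pv MCA driver follows the PV rules entirely.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Action(Enum):
    NOTIFY_DOMU = auto()  # guest's MCA event handler performs self-healing
    INJECT_MCE = auto()   # HVM guest that enabled MCA/MCE via MSR
    NOTIFY_DOM0 = auto()  # Dom0 decides what to do with the DomU (case II)
    KILL_DOMU = auto()    # nobody registered to handle the error


@dataclass
class DomU:
    # Hypothetical model of the guest state the proposal looks at.
    is_hvm: bool
    has_pv_mca_driver: bool = False      # only meaningful for HVM guests
    installed_mca_handler: bool = False  # PV-style MCA event handler
    enabled_mce_via_msr: bool = False    # HVM guest turned on MCA/MCE itself


def dispatch_ue(domu: DomU, dom0_has_handler: bool) -> Action:
    """Pick an action for an uncorrectable error that impacts one DomU."""
    # Assumption: an HVM guest with a pv MCA driver is treated like a PV guest.
    pv_path = (not domu.is_hvm) or domu.has_pv_mca_driver
    if pv_path:
        if domu.installed_mca_handler:
            return Action.NOTIFY_DOMU
    elif domu.enabled_mce_via_msr:
        return Action.INJECT_MCE
    # The guest cannot handle the error itself: ask Dom0, else kill the DomU.
    if dom0_has_handler:
        return Action.NOTIFY_DOM0
    return Action.KILL_DOMU
```

Under this reading an HVM guest that never enabled MCA/MCE is not sent an injected MCE; the decision falls through to Dom0 or, failing that, to killing the DomU.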