Re: [Xen-devel] RFC: MCA/MCE concept
On Wednesday 30 May 2007 17:03:55 Petersson, Mats wrote:
> [snip]
>
> > My feeling is that the hypervisor and dom0 own the hardware and as such
> > all hardware fault management should reside there. So we should never
> > deliver any form of #MC to a domU, nor should a poll of MCA state from
> > a domU ever observe valid state (e.g., make the RDMSR return 0).
> > So all handling, logging and diagnosis as well as hardware response
> > actions (such as to deploy an online spare chip-select) are controlled
> > in the hypervisor/dom0 combination. That seems a consistent model - e.g.,
> > if a domU is migrated to another system it should not carry the
> > diagnosis state of the original system across etc, since that belongs
> > with the one domain that cannot migrate.
>
> I agree entirely with this.
>
> > But that is not to say that (I think at a future phase) domU should not
> > participate in a higher-level fault management function, at the
> > direction of the hypervisor/dom0 combo. For example if/when we can
> > isolate an uncorrectable error to a single domU we could forward such an
> > event to the affected domU if it has registered its ability/interest in
> > such events. These won't be in the form of a faked #MC or anything,
> > instead they'd be some form of synchronous trap experienced when next
> > the affected domU context resumes on CPU. The intelligent domU handler
> > can then decide whether the domU must panic, whether it could simply
> > kill the affected process etc. Those details are clearly sketchy, but
> > the idea is to up-level the communication to a domU to be more like
> > "you're broken" rather than "here's a machine-level hardware error for
> > you to interpret and decide what to do with".
>
> Yes, this makes much more sense than forwarding #MC, as the guest would
> have a hard time to actually do anything really useful with this. As far
> as I know, most uncorrectable errors are near enough entirely fatal in
> most commercial non-Enterprise OS's anyways - e.g. in Windows XP or
> Server 2K3, it always ends in a blue-screen - which is hardly any better
> than the guest being "humanely euthanized" by Dom0.
>
> I take it this would be some sort of hypercall (available through the
> regular PV-driver interface for HVM guests) to say "Let me know if I'm
> broken - trap on vector X".

In short, guests with a PV MCA driver will see a certain event
(assuming the event mechanism will be used for the notification), and
guests without a PV MCA driver will see a "General Protection Fault".
Is that right?

> --
> Mats
>
> > Gavin
> >

-- 
AMD Saxony, Dresden, Germany
Operating System Research Center
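
The policy discussed in this thread can be sketched in C as below. This is a minimal illustration under stated assumptions, not Xen code: `struct guest`, `enum mca_notify_kind`, `guest_rdmsr_mca` and `choose_notification` are hypothetical names. The fallback to a General Protection Fault for guests without a PV MCA driver is only what the question above proposes, not a confirmed design. Passing the raw MSR value through to dom0 is likewise an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-guest state; this is not a real Xen structure. */
struct guest {
    bool is_dom0;           /* dom0 shares hardware fault management with the hypervisor */
    bool has_pv_mca_driver; /* guest registered interest in "you're broken" events */
};

/* Hypothetical ways of telling a guest that an error affects it. */
enum mca_notify_kind {
    NOTIFY_NONE,     /* nothing is delivered to this guest */
    NOTIFY_PV_EVENT, /* high-level event for guests with a PV MCA driver */
    NOTIFY_GP_FAULT  /* fallback asked about in the reply (unconfirmed) */
};

/*
 * A domU polling MCA state never observes valid state: the read yields 0.
 * Returning the hardware value to dom0 is an assumption of this sketch.
 */
uint64_t guest_rdmsr_mca(const struct guest *g, uint64_t hw_value)
{
    return g->is_dom0 ? hw_value : 0;
}

/*
 * Decide how to inform a guest about an uncorrectable error.
 * Only a domU to which the error has been isolated is notified,
 * and it is never sent a raw or faked #MC.
 */
enum mca_notify_kind choose_notification(const struct guest *g,
                                         bool error_isolated_to_guest)
{
    if (g->is_dom0 || !error_isolated_to_guest)
        return NOTIFY_NONE;
    return g->has_pv_mca_driver ? NOTIFY_PV_EVENT : NOTIFY_GP_FAULT;
}
```

Keeping the decision in one function mirrors the idea that the hypervisor/dom0 combination interprets the hardware error and the domU only learns the outcome.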