Xen project Mailing List

Re: [Xen-devel] RFC: MCA/MCE concept

From: "Christoph Egger" <Christoph.Egger@xxxxxxx>

Date: Wed, 6 Jun 2007 15:24:33 +0200

Cc: Gavin Maltby <Gavin.Maltby@xxxxxxx>, Keir Fraser <keir@xxxxxxxxxxxxx>

Delivery-date: Wed, 06 Jun 2007 06:32:04 -0700

List-id: Xen developer discussion <xen-devel.lists.xensource.com>

On Wednesday 06 June 2007 14:25:26 Gavin Maltby wrote:
> Hi,
>
> On 06/06/07 12:57, Christoph Egger wrote:
> >>>> For the first I've assumed so far that an event channel notification
> >>>> of the MCA event will suffice; as long as the hypervisor only polls
> >>>> for correctable MCA errors at a low-frequency rate (currently 15s
> >>>> interval) there is no danger of spamming that single notification.
> >>>
> >>> Why polling?
> >>
> >> Polling for correctable errors, but #MC as usual for others. Setting
> >> MCi_CTL bits for correctable errors does not produce a machine check,
> >> so polling is the only approach unless one sets additional (and
> >> undocumented, certainly for AMD chips) config bits. What I was getting
> >> at here is that polling at largish intervals for correctables is
> >> the correct approach - trapping for them or polling at a high frequency
> >> is bad because in cases where you have some form of solid correctable
> >> error (say a single bad pin in a DIMM socket affecting one or two ranks
> >> of that DIMM but never able to produce a UE) the trap handling and
> >> diagnosis software consume the machine and things make little useful
> >> forward progress.
> >
> > I still don't see why #MC for all kinds of errors is bad.
>
> I'm talking about whether the hypervisor takes a machine check
> for an error or polls for it. We do not want #MC for correctable
> errors stopping the hypervisor from making progress. And if the
> hypervisor poll interval was too small a solid error would again
> keep the hypervisor busy producing (mostly/all duplicate)
> error telemetry and the diagnosis code in dom0 would burn
> cpu cycles, too.
>
> How errors observed by the hypervisor, be they from #MC or from
> a poll, are propagated to the domains is unimportant from this
> point of view - e.g., if we decide to take error telemetry
> discovered via a poll in the hypervisor and propagate it
> to the domain pretending it is indistinguishable from a machine
> check that will not hurt or limit the domain processing.
>
> An untested design I had in mind, unashamedly influenced by what
> we do in Solaris, was to have some common memory shared between
> hypervisor and domain into which the hypervisor produces
> error telemetry and the domain consumes that telemetry.

That is the struct vcpu_info in the public xen.h header. It is
accessible in the hypervisor as well as in the guest.

> Producing and consuming is lockless using compare-and-swap.
> There are two queues in this shared memory - one for uncorrectable
> error telemetry and one for correctable error telemetry. When the
> domain gets whatever event to notify it of telemetry for processing
> it processes the queues; the event would be synchronous for
> uncorrectable errors (i.e., domain must process the telemetry
> right now) or asynchronous in the case of correctable errors
> (process when convenient). The separation of CE and UE queues
> stops CEs from flooding the more important UE events (you can
> always drop CEs if there is no more space, but you can never
> drop UEs).

So we use the asynchronous event mechanism VIRQ_DOM_EXC to report
correctable errors to Dom0, and the NMI mechanism to report
uncorrectable errors to Dom0 and DomU, right?
The fact that VIRQ_DOM_EXC is for Dom0 only doesn't hurt here,
since we never report CEs to DomUs.

> [cut]
>
> > After some code reading I found a nmi_pending, nmi_masked and nmi_addr in
> > [cut]
>
> Still chewing on that ...

Christoph

--
AMD Saxony, Dresden, Germany
Operating System Research Center

Legal Information:
AMD Saxony Limited Liability Company & Co. KG
Registered office (business address): Wilschdorfer Landstr. 101, 01109 Dresden, Germany
Register court Dresden: HRA 4896
General partner authorized to represent: AMD Saxony LLC (registered office Wilmington, Delaware, USA)
Managing directors of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy
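
The two-queue scheme discussed in the thread (lockless production via compare-and-swap, correctable errors dropped when their queue is full, uncorrectable errors never silently dropped) might look roughly like the C11 sketch below. It is an illustration only, not Xen or Solaris code. Every name in it (mc_record, mc_ring, mc_telemetry, report_ce, report_ue, RING_SLOTS) is hypothetical, as is the record layout. It assumes several producers and exactly one consumer per queue, and it leaves open what happens when the UE queue itself is full, since the thread does not say.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SLOTS 8u /* hypothetical queue depth */
static_assert((RING_SLOTS & (RING_SLOTS - 1u)) == 0,
              "RING_SLOTS must be a power of two");

/* Hypothetical telemetry record: a snapshot of one machine-check bank. */
struct mc_record {
    uint32_t bank;
    uint64_t status; /* value read from MCi_STATUS */
    uint64_t addr;   /* value read from MCi_ADDR */
};

struct mc_slot {
    atomic_bool ready; /* true once the record is completely written */
    struct mc_record rec;
};

struct mc_ring {
    atomic_uint head;    /* next slot the consumer reads */
    atomic_uint tail;    /* next slot a producer reserves */
    atomic_uint dropped; /* records discarded because the ring was full */
    struct mc_slot slot[RING_SLOTS];
};

/* Assumed layout of the shared area: one ring per error class. */
struct mc_telemetry {
    struct mc_ring ue; /* uncorrectable errors */
    struct mc_ring ce; /* correctable errors */
};

/* Producer side: reserve a slot with compare-and-swap, then fill it.
 * Returns false without writing anything if the ring is full. */
bool mc_ring_put(struct mc_ring *r, const struct mc_record *rec)
{
    unsigned tail = atomic_load(&r->tail);
    for (;;) {
        if (tail - atomic_load(&r->head) >= RING_SLOTS)
            return false;
        /* On failure, tail is reloaded with the current value. */
        if (atomic_compare_exchange_weak(&r->tail, &tail, tail + 1u))
            break;
    }
    struct mc_slot *s = &r->slot[tail & (RING_SLOTS - 1u)];
    s->rec = *rec;
    atomic_store(&s->ready, true); /* publish the record */
    return true;
}

/* Consumer side (single consumer assumed): take the oldest record. */
bool mc_ring_get(struct mc_ring *r, struct mc_record *out)
{
    unsigned head = atomic_load(&r->head);
    struct mc_slot *s = &r->slot[head & (RING_SLOTS - 1u)];
    if (head == atomic_load(&r->tail) || !atomic_load(&s->ready))
        return false; /* empty, or the producer is still writing */
    *out = s->rec;
    atomic_store(&s->ready, false);    /* release the slot for reuse */
    atomic_store(&r->head, head + 1u); /* then make it reservable */
    return true;
}

/* Correctable errors may be dropped when the CE ring is full. */
void report_ce(struct mc_telemetry *t, const struct mc_record *rec)
{
    if (!mc_ring_put(&t->ce, rec))
        atomic_fetch_add(&t->ce.dropped, 1u);
}

/* Uncorrectable errors must not be dropped silently: a false return
 * tells the caller to escalate. How to escalate is not covered here. */
bool report_ue(struct mc_telemetry *t, const struct mc_record *rec)
{
    return mc_ring_put(&t->ue, rec);
}
```

Because the rings are separate, a flood of correctable errors can only fill the CE ring and increment its drop counter. The UE ring keeps its full capacity for the events the domain has to process immediately.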
