Re: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN
On Monday 02 March 2009 06:51:22 Jiang, Yunhong wrote:
> Frank/Christopher, can you please give more comments for it, or you are OK

Sorry for the delay. I'm also busy with other tasks.

> with this? For the action reporting mechanism, we will send out a proposal
> for review soon.

I would like to see the interface definition first, which covers all
aspects we discussed.

>
> Thanks
> Yunhong Jiang
>
> Jiang, Yunhong <> wrote:
> > Christopher/Frank, thanks for reply very much, see comments below.
> >
> >> -----Original Message-----
> >> From: Frank.Vanderlinden@xxxxxxx [mailto:Frank.Vanderlinden@xxxxxxx]
> >> Sent: 26 February 2009 1:33
> >> To: Christoph Egger
> >> Cc: Jiang, Yunhong; Kleen, Andi;
> >> xen-devel@xxxxxxxxxxxxxxxxxxx; Keir Fraser; Ke, Liping; Gavin Maltby
> >> Subject: Re: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN
> >>
> >> Christoph Egger wrote:
> >>> On Wednesday 25 February 2009 03:25:12 Jiang, Yunhong wrote:
> >>>> So, Frank/Egger, can I assume the following is the current consensus?
> >>>>
> >>>> 1) MCE is handled by Xen HV totally, while guest's vMCE handler will
> >>>>    only work for itself.
> >>>> 2) Xen presents a virtual #MC to guest through MSR access
> >>>>    emulation. (Xen will do the translation if needed.)
> >>>> 3) Guest's unmodified MCE handler will handle the vMCE injected.
> >>>> 4) Dom0 will get all log/telemetry through hypercall.
> >>>> 5) The action taken by Xen will be passed to dom0 through the
> >>>>    telemetry mechanism.
> >>>
> >>> Mostly. Regarding 2) I would like to discuss first how to handle errors
> >>> impacting multiple contiguous physical pages which are non-contiguous
> >>> in guest physical space.
> >>>
> >>> And I also want to discuss how to do recovery actions requiring
> >>> PCI access. One example for this is
> >>> Shanghai's "L3 Cache Index Disable" feature.
> >>> Xen delegates PCI config space to Dom0 and
> >>> via PCI passthrough partly to DomU.
> >>> That means, if registers in PCI config space are independently
> >>> accessible by Xen, Dom0 and/or DomU, they can interfere with each
> >>> other. Therefore, we need to
> >>> a) clearly define who handles what and
> >>> b) define some rules based on a)
> >>> c) discuss how to handle Dom0/DomU going wild
> >>>    and breaking the rules defined in b)
> >>
> >> I also agree on the approach in principle, but would like to see these
> >> points addressed. For non-contiguous pages, I suppose Xen
> >> could deliver
> >> multiple #vMCEs to the guest, split into contiguous parts. The
> >> vmce code
> >> seems to be set up to be able to do this.
> >
> > For the contiguous pages, I agree with Gavin that such
> > contiguous page error should be triggered as multiple #MC and so is ok.
> >
> > For the PCI config space issue, Christoph, can you please share
> > more information on it (or provide some document as Frank
> > suggested), like is it for CE (Correctable error) or
> > UC (UnCorrectable error), is it in PCI range or PCI-E range
> > (i.e. through 0xCF8/CFC or through MMCONFIG), how the device's
> > BDF is calculated etc. What follows is some of my understanding.
> >
> > Firstly, if it is CE, Xen will do nothing and dom0 will take
> > recovery action. If it is UC, Xen will take action when all
> > CPUs are in SoftIRQ context, and dom0 will not take action, so
> > it should be ok.
> >
> > Secondly, in Xen environment, per my understanding, CPU is
> > owned by Xen HV, so I'm not sure, when dom0 disables L3 cache
> > (if it is CE), whether Xen should be aware or not. That is, should
> > dom0 disable the cache directly, or should it use a hypercall
> > to ask Xen to do that. Keir can give us more suggestions.
> >
> > For item C, currently Xen/dom0 can both access configuration
> > space, while domU will do that through PCI_frontend/backend.
> > Because PCI backend only covers devices assigned to domU, we
> > don't need to worry about domU, and dom0 should be trusted.
> > However, one thing left is, if this range is beyond 0x100
> > (i.e. in pci-e range), we need to add mmconfig support in Xen,
> > although it can be added simply.
> >
> > Thanks
> > -- Yunhong Jiang
> >
> >> As for the Shanghai feature: Christoph, are there any documents
> >> available on that feature?

Yes, our BKDG.

> >> What kind of errors are delivered (corrected/correctable)?

The error type can be either, depending on whether correction via ECC
was successful or not.

-- 
---to satisfy European Law for business letters:
Advanced Micro Devices GmbH
Karl-Hammerschmidt-Str. 34, 85609 Dornach b. Muenchen
Geschaeftsfuehrer: Jochen Polster, Thomas M. McCoy, Giuliano Meroni
Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
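
To make the 0xCF8/0xCFC versus MMCONFIG question from the thread concrete, here is a minimal sketch of how a bus/device/function (BDF) plus register offset maps onto each access method. It only does the address arithmetic and touches no hardware. The function names are made up for illustration, and the BDF and register values in the usage note are arbitrary examples, not the actual location of the L3 Cache Index Disable register (the BKDG is the reference for that).

```python
# Illustrative only: address arithmetic for the two standard PCI
# configuration access methods. Function names are hypothetical.

LEGACY_CFG_LIMIT = 0x100   # 0xCF8/0xCFC reaches registers 0x00-0xFF only
EXT_CFG_LIMIT = 0x1000     # MMCONFIG reaches registers 0x000-0xFFF


def _check_bdf(bus, dev, func):
    """Reject BDF values outside the 8/5/3-bit fields."""
    if not (0 <= bus < 256 and 0 <= dev < 32 and 0 <= func < 8):
        raise ValueError("bus/device/function out of range")


def cf8_address(bus, dev, func, reg):
    """Value written to I/O port 0xCF8 before accessing data at 0xCFC."""
    _check_bdf(bus, dev, func)
    if not 0 <= reg < LEGACY_CFG_LIMIT:
        raise ValueError("register is beyond 0xFF; MMCONFIG is needed")
    # Bit 31 is the enable bit; the register is dword-aligned.
    return 0x80000000 | (bus << 16) | (dev << 11) | (func << 8) | (reg & 0xFC)


def mmconfig_offset(bus, dev, func, reg):
    """Byte offset from the MMCONFIG base for the given register."""
    _check_bdf(bus, dev, func)
    if not 0 <= reg < EXT_CFG_LIMIT:
        raise ValueError("register is outside the 4 KiB config space")
    # Each function gets a 4 KiB window: bus[27:20] dev[19:15] func[14:12].
    return (bus << 20) | (dev << 15) | (func << 12) | reg


def needs_mmconfig(reg):
    """True if the register cannot be reached through 0xCF8/0xCFC."""
    return reg >= LEGACY_CFG_LIMIT
```

For example, with an arbitrary device 00:18.3, `hex(cf8_address(0, 0x18, 3, 0x40))` gives `0x8000c340`, while a register at offset 0x1C4 makes `needs_mmconfig(0x1C4)` true and would be reached at `hex(mmconfig_offset(0, 0x18, 3, 0x1C4))`, i.e. `0xc31c4` from the MMCONFIG base. That 0x100 boundary is the case Yunhong raises above as requiring mmconfig support in Xen.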