
Re: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN



On Monday 02 March 2009 06:51:22 Jiang, Yunhong wrote:
> Frank/Christopher, could you please give more comments on it, or are you OK

Sorry for the delay. I'm also busy with other tasks.

> with this? For the action reporting mechanism, we will send out a proposal
> for review soon.

I would like to see the interface definition first, covering all the
aspects we discussed.



>
> Thanks
> Yunhong Jiang
>
> Jiang, Yunhong <> wrote:
> > Christopher/Frank, thanks for reply very much, see comments below.
> >
> >> -----Original Message-----
> >> From: Frank.Vanderlinden@xxxxxxx [mailto:Frank.Vanderlinden@xxxxxxx]
> >> Sent: 26 February 2009 1:33
> >> To: Christoph Egger
> >> Cc: Jiang, Yunhong; Kleen, Andi;
> >> xen-devel@xxxxxxxxxxxxxxxxxxx; Keir Fraser; Ke, Liping; Gavin Maltby
> >> Subject: Re: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN
> >>
> >> Christoph Egger wrote:
> >>> On Wednesday 25 February 2009 03:25:12 Jiang, Yunhong wrote:
> >>>> So, Frank/Egger, can I assume the following is the current consensus?
> >>>>
> >>>> 1) MCE is handled entirely by the Xen HV, while a guest's vMCE
> >>>>    handler works only for the guest itself.
> >>>> 2) Xen presents a virtual #MC to the guest through MSR access
> >>>>    emulation (Xen will do the translation if needed).
> >>>> 3) The guest's unmodified MCE handler handles the injected vMCE.
> >>>> 4) Dom0 gets all log/telemetry through a hypercall.
> >>>> 5) The action taken by Xen is passed to dom0 through the
> >>>>    telemetry mechanism.
> >>>
> >>> Mostly. Regarding 2), I would first like to discuss how to handle errors
> >>> impacting multiple contiguous physical pages which are non-contiguous
> >>> in guest physical space.
> >>>
> >>>
> >>>
> >>> I also want to discuss how to do recovery actions requiring
> >>> PCI access. One example of this is Shanghai's
> >>> "L3 Cache Index Disable" feature.
> >>> Xen delegates PCI config space to Dom0 and,
> >>> via PCI passthrough, partly to DomU.
> >>> That means that if registers in PCI config space are independently
> >>> accessible by Xen, Dom0 and/or DomU, they can interfere with each
> >>> other. Therefore, we need to
> >>> a) clearly define who handles what,
> >>> b) define some rules based on a), and
> >>> c) discuss how to handle Dom0/DomU going wild
> >>>    and breaking the rules defined in b).
> >>
> >> I also agree on the approach in principle, but would like to see these
> >> points addressed. For non-contiguous pages, I suppose Xen could deliver
> >> multiple #vMCEs to the guest, split into contiguous parts. The vmce code
> >> seems to be set up to be able to do this.
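
To make the "split into contiguous parts" idea concrete, here is a minimal sketch, not taken from the Xen vmce code. It assumes the affected machine pages have already been translated to guest frame numbers and sorted in ascending order; the names `gfn_run` and `split_gfn_runs` are hypothetical. Each resulting run would correspond to one injected #vMCE.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type: one contiguous run of guest frame numbers. */
struct gfn_run {
    uint64_t start;  /* first guest frame number in the run */
    size_t   count;  /* number of consecutive frames in the run */
};

/*
 * Split gfns[0..n), assumed sorted ascending without duplicates, into
 * maximal contiguous runs. At most max_runs entries are written to out.
 * Returns the number of runs found; a value larger than max_runs means
 * the output array was too small and the result is truncated.
 */
size_t split_gfn_runs(const uint64_t *gfns, size_t n,
                      struct gfn_run *out, size_t max_runs)
{
    size_t runs = 0;

    for (size_t i = 0; i < n; i++) {
        if (i > 0 && gfns[i] == gfns[i - 1] + 1) {
            /* Same run as the previous frame: extend it if it was stored. */
            if (runs <= max_runs)
                out[runs - 1].count++;
            continue;
        }
        /* Gap in guest physical space: start a new run. */
        if (runs < max_runs) {
            out[runs].start = gfns[i];
            out[runs].count = 1;
        }
        runs++;
    }
    return runs;
}
```

With this shape, an error covering machine pages that map to guest frames 0x100-0x102, 0x200 and 0x300-0x301 would yield three runs, and therefore three virtual #MCs.
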
> >
> > For the contiguous pages, I agree with Gavin that such a
> > contiguous-page error should be triggered as multiple #MCs, so it is OK.
> >
> > For the PCI config space issue, Christoph, can you please share
> > more information on it (or provide some document, as Frank
> > suggested)? For example: is it for a CE (correctable error) or a
> > UC (uncorrectable error), is it in the PCI range or the PCI-E range
> > (i.e. through 0xCF8/0xCFC or through MMCONFIG), how is the device's
> > BDF calculated, etc. The following is my understanding.
> >
> > Firstly, if it is a CE, Xen will do nothing and dom0 will take the
> > recovery action. If it is a UC, Xen will take action when all
> > CPUs are in softirq context, and dom0 will not take action, so
> > it should be OK.
> >
> > Secondly, in a Xen environment, per my understanding, the CPU is
> > owned by the Xen HV, so I'm not sure whether Xen should be aware
> > when dom0 disables the L3 cache (if it is a CE). That is, should
> > dom0 disable the cache directly, or should it use a hypercall
> > to ask Xen to do that? Keir may be able to advise here.
> >
> > For item C, currently Xen and dom0 can both access configuration
> > space, while domU does so through PCI frontend/backend.
> > Because the PCI backend only covers devices assigned to domU, we
> > don't need to worry about domU, and dom0 should be trusted.
> > However, one thing left is that if this range is beyond 0x100
> > (i.e. in the PCI-E range), we need to add MMCONFIG support in Xen,
> > although that can be added easily.
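
For reference, the 0x100 boundary mentioned above comes from the address formats of the two standard access mechanisms. The sketch below is only an illustration under that assumption; the function names are hypothetical and are not existing Xen interfaces. The standard 0xCF8/0xCFC mechanism encodes an 8-bit register offset, while MMCONFIG encodes a 12-bit one.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Standard legacy mechanism: compute the dword written to port 0xCF8
 * before the data access on port 0xCFC. Only register offsets
 * 0x00-0xFF can be expressed. Returns false if the arguments do not fit.
 */
bool pci_cf8_address(uint8_t bus, uint8_t dev, uint8_t fn,
                     uint16_t reg, uint32_t *addr)
{
    if (dev > 31 || fn > 7 || reg > 0xFF)
        return false;
    *addr = 0x80000000u               /* enable bit */
          | ((uint32_t)bus << 16)
          | ((uint32_t)dev << 11)
          | ((uint32_t)fn  << 8)
          | (reg & 0xFCu);            /* dword-aligned register offset */
    return true;
}

/*
 * MMCONFIG: compute the byte offset from the MMCONFIG base address of
 * the segment. Register offsets 0x000-0xFFF can be expressed.
 */
bool pci_mmcfg_offset(uint8_t bus, uint8_t dev, uint8_t fn,
                      uint16_t reg, uint64_t *off)
{
    if (dev > 31 || fn > 7 || reg > 0xFFF)
        return false;
    *off = ((uint64_t)bus << 20)
         | ((uint64_t)dev << 15)
         | ((uint64_t)fn  << 12)
         | reg;
    return true;
}
```

A register at offset 0x100 or above makes `pci_cf8_address` fail in this sketch, which is the case where MMCONFIG support would be needed.
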
> >
> > Thanks
> > -- Yunhong Jiang
> >
> >> As for the Shanghai feature: Christoph, are there any documents
> >> available on that feature?

Yes, our BKDG.

> >> What kind of errors are delivered (corrected/correctable)?

It can be either, depending on whether correction
via ECC was successful or not.


-- 
---to satisfy European Law for business letters:
Advanced Micro Devices GmbH
Karl-Hammerschmidt-Str. 34, 85609 Dornach b. Muenchen
Geschaeftsfuehrer: Jochen Polster, Thomas M. McCoy, Giuliano Meroni
Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

