Re: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN
MC_ACT_CACHE_SHIRNK <-- typo, should be MC_ACT_CACHE_SHRINK
The L3 cache index disable feature works like this:
You read bits 17:6 from MSR 0xC0000408 (MC4_MISC1)
and write them into the index field. This MSR does not belong to the standard
MC bank data and is therefore provided by mcinfo_extended.
The index field is bits 11:0 of the PCI function 3 register
"L3 Cache Index Disable".
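A minimal sketch of the bit manipulation described above, assuming the raw MC4_MISC1 value has already been read and the register value will be written back by other code. The helper names are hypothetical; the actual MSR read and PCI config write are not shown, and any other fields of the register are simply preserved.

```c
#include <stdint.h>

/* Hypothetical helper: extract bits 17:6 of a raw MC4_MISC1 value
 * (MSR 0xC0000408). These 12 bits are the L3 cache index. */
static uint32_t l3_index_from_mc4_misc1(uint64_t mc4_misc1)
{
    return (uint32_t)((mc4_misc1 >> 6) & 0xFFFu);
}

/* Hypothetical helper: place a 12-bit index into bits 11:0 of an
 * "L3 Cache Index Disable" register value, keeping the other bits. */
static uint32_t l3_disable_set_index(uint32_t reg, uint32_t index)
{
    return (reg & ~0xFFFu) | (index & 0xFFFu);
}
```

Both fields are 12 bits wide, so the index moves over unchanged; only the shift differs.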
Why is the recovery action bound to the bank?
I would like to see a separate struct mcinfo_recover rather than an extension
of struct mcinfo_bank. That gives us more flexibility.
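One way such a standalone record could look, as a sketch only: it assumes the layout of the other mcinfo entries (a struct mcinfo_common header carrying a record type and size) and refers to a bank by number instead of being embedded in struct mcinfo_bank. The type value, MCINFO_NO_BANK and the fixed 32-byte payload are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Common header as used by the other mcinfo records: type and size. */
struct mcinfo_common {
    uint16_t type;  /* record type */
    uint16_t size;  /* size of the whole record in bytes */
};

/* Hypothetical values, not part of any existing interface. */
#define MC_TYPE_RECOVER_SKETCH 3
#define MCINFO_NO_BANK         0xFFFFu
#define RECOVER_PAYLOAD_SIZE   32

/* Hypothetical standalone recovery record. Because it only references
 * a bank number, an action not tied to a single bank (for example a
 * CPU offline) can still be reported. */
struct mcinfo_recover {
    struct mcinfo_common common;
    uint16_t mc_bank;      /* related bank, or MCINFO_NO_BANK */
    uint8_t  flags;        /* REC_ACT_* flags */
    uint8_t  action_type;  /* MC_ACT_* type, valid only if recovered */
    uint8_t  action_info[RECOVER_PAYLOAD_SIZE]; /* action-specific data */
};

/* Zero a recovery record and fill in its header. */
static void mcinfo_recover_init(struct mcinfo_recover *rec, uint16_t bank)
{
    memset(rec, 0, sizeof(*rec));
    rec->common.type = MC_TYPE_RECOVER_SKETCH;
    rec->common.size = (uint16_t)sizeof(*rec);
    rec->mc_bank = bank;
}
```

Since each record carries its own type and size, a consumer can skip records it does not understand, which is where the extra flexibility would come from.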
Christoph
On Thursday 05 March 2009 09:31:27 Jiang, Yunhong wrote:
> Christoph/Frank, the interface definition follows; please have a look.
>
> Thanks
> Yunhong Jiang
>
> 1) Interface between Xen and dom0 for passing Xen's recovery action
> information to dom0. Usage model: after offlining a broken page, Xen may pass
> the result of its page-offline recovery action to dom0. Dom0 saves the
> information in non-volatile memory for later proactive actions, such as
> offlining the error-prone page early during the next reboot.
>
>
> struct page_offline_action
> {
> /* Params for passing the offlined page number to DOM0 */
> uint64_t mfn;
> uint64_t status; /* Similar to page offline hypercall */
> };
>
> struct cpu_offline_action
> {
> /* Params for passing the identity of the offlined CPU to DOM0 */
> uint32_t mc_socketid;
> uint16_t mc_coreid;
> uint16_t mc_core_threadid;
> };
>
> struct cache_shrink_action
> {
> /* TBD; Christoph, please fill this in */
> };
>
> /* Recovery action flags, giving recovery result information to the guest */
> /* Recovered successfully after taking one of the recovery actions below */
> #define REC_ACT_RECOVERED (0x1 << 0)
> /* For Solaris usage: dom0 takes ownership on a crash */
> #define REC_ACT_RESET (0x1 << 2)
> /* No action is performed by Xen */
> #define REC_ACT_INFO (0x1 << 3)
>
> /* Recovery action type definition, valid only when flags &
> REC_ACT_RECOVERED */
> #define MC_ACT_PAGE_OFFLINE 1
> #define MC_ACT_CPU_OFFLINE 2
> #define MC_ACT_CACHE_SHIRNK 3
>
> struct recovery_action
> {
> uint8_t flags;
> uint8_t action_type;
> union
> {
> struct page_offline_action page_retire;
> struct cpu_offline_action cpu_offline;
> struct cache_shrink_action cache_shrink;
> uint8_t pad[MAX_ACTION_SIZE];
> } action_info;
> };
>
> struct mcinfo_bank {
> struct mcinfo_common common;
>
> uint16_t mc_bank; /* bank nr */
> uint16_t mc_domid; /* Usecase 5: domain referenced by mc_addr on dom0
> * and if mc_addr is valid. Never valid on DomU. */
> uint64_t mc_status; /* bank status */
> uint64_t mc_addr; /* bank address, only valid
> * if addr bit is set in mc_status */
> uint64_t mc_misc;
> uint64_t mc_ctrl2;
> uint64_t mc_tsc;
> /* Recovery action is performed per bank */
> struct recovery_action action;
> };
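For the usage model in 1), a dom0 consumer might interpret a struct recovery_action roughly as follows. This is a sketch: it copies the proposed definitions, assumes a value of 32 for MAX_ACTION_SIZE (the proposal does not give one), spells the cache action MC_ACT_CACHE_SHRINK, leaves out the still-empty cache_shrink_action, and uses a hypothetical helper name. Storing the MFN in non-volatile memory is not shown.

```c
#include <stdint.h>

#define MAX_ACTION_SIZE 32  /* assumption: value not given in the proposal */

#define REC_ACT_RECOVERED (0x1 << 0)
#define REC_ACT_RESET     (0x1 << 2)
#define REC_ACT_INFO      (0x1 << 3)

#define MC_ACT_PAGE_OFFLINE 1
#define MC_ACT_CPU_OFFLINE  2
#define MC_ACT_CACHE_SHRINK 3

struct page_offline_action {
    uint64_t mfn;
    uint64_t status;
};

struct cpu_offline_action {
    uint32_t mc_socketid;
    uint16_t mc_coreid;
    uint16_t mc_core_threadid;
};

struct recovery_action {
    uint8_t flags;
    uint8_t action_type;
    union {
        struct page_offline_action page_retire;
        struct cpu_offline_action cpu_offline;
        /* cache_shrink_action omitted: still TBD */
        uint8_t pad[MAX_ACTION_SIZE];
    } action_info;
};

/* Hypothetical helper: returns 1 and stores the MFN if Xen reports a
 * successful page offline that dom0 should remember for the next boot,
 * 0 otherwise. */
static int page_to_remember(const struct recovery_action *act, uint64_t *mfn)
{
    /* action_type is only meaningful when REC_ACT_RECOVERED is set. */
    if (!(act->flags & REC_ACT_RECOVERED))
        return 0;
    if (act->action_type != MC_ACT_PAGE_OFFLINE)
        return 0;
    *mfn = act->action_info.page_retire.mfn;
    return 1;
}
```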
>
> 2) The two interfaces below are for internal use in MCA processing.
> a. pre_handler is called early, in MCA ISR context, mainly to detect early
> whether a reset is needed (flag MCA_RESET), so that no log is lost.
> pre_handler may also identify the impacted domain, if possible.
> b. mca_error_handler is a (error_action_index, recovery_handler pointer)
> pair. The recovery_handler function performs the actual recovery in softIRQ
> context once the per-bank MCA error matches the corresponding mca_code
> index. If pre_handler can't determine the impacted domain, recovery_handler
> must do so.
>
> /* Error has been recovered successfully */
> #define MCA_RECOVERED (0x1 << 0)
> /* Error impacts one guest, as stated in the owner field */
> #define MCA_OWNER (0x1 << 1)
> /* Error can't be recovered; the system needs a reboot */
> #define MCA_RESET (0x1 << 2)
> /* Error should be handled in softIRQ context */
> #define MCA_MORE_ACTION (0x1 << 3)
>
> struct mca_handle_result
> {
> uint32_t flags;
> /* Valid only when flags & MCA_OWNER */
> domid_t owner;
> /* Valid only when flags & MCA_RECOVERED */
> struct recovery_action *action;
> };
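A sketch of how the MCA ISR might act on the result filled in by pre_handler, following the descriptions in 2a and of MCA_MORE_ACTION. It assumes the result flags are bit masks (the field comments test them with `flags & ...`), treats struct recovery_action as opaque, and the enum and function names are hypothetical.

```c
#include <stdint.h>

typedef uint16_t domid_t;

/* Assumption: result flags are bit masks. */
#define MCA_RECOVERED   (0x1 << 0)
#define MCA_OWNER       (0x1 << 1)
#define MCA_RESET       (0x1 << 2)
#define MCA_MORE_ACTION (0x1 << 3)

struct recovery_action;  /* opaque in this sketch */

struct mca_handle_result {
    uint32_t flags;
    domid_t owner;                  /* valid only when flags & MCA_OWNER */
    struct recovery_action *action; /* valid only when flags & MCA_RECOVERED */
};

/* Hypothetical next steps for the ISR. */
enum mca_next_step { MCA_STEP_DONE, MCA_STEP_SOFTIRQ, MCA_STEP_RESET };

static enum mca_next_step mca_isr_next_step(const struct mca_handle_result *r)
{
    /* Checked first: an unrecoverable error must reset (after logging)
     * even if further recovery work was also requested. */
    if (r->flags & MCA_RESET)
        return MCA_STEP_RESET;
    /* Defer the remaining recovery work to softIRQ context. */
    if (r->flags & MCA_MORE_ACTION)
        return MCA_STEP_SOFTIRQ;
    return MCA_STEP_DONE;
}
```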
>
> struct mca_error_handler
> {
> /*
> * Assume we will need only architecture-defined codes. If the index
> * can't be set up by mca_code, we will add a function to do the
> * (index, recovery_handler) mapping check.
> * This mca_code represents the recovery handler pointer index for
> * identifying this particular error's corresponding recovery action.
> */
> uint16_t mca_code;
>
> /* Handler to be called in softIRQ handler context */
> int (*recovery_handler)(struct mcinfo_bank *bank,
> struct mcinfo_global *global,
> struct mcinfo_extended *extension,
> struct mca_handle_result *result);
>
> };
>
> struct mca_error_handler intel_mca_handler[] =
> {
> ....
> };
>
> struct mca_error_handler amd_mca_handler[] =
> {
> ....
> };
>
>
> /* Handler to be called in the MCA ISR, in MCA context */
> int intel_mca_pre_handler(struct cpu_user_regs *regs,
> struct mca_handle_result *result);
>
> int amd_mca_pre_handler(struct cpu_user_regs *regs,
> struct mca_handle_result *result);
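The (mca_code, recovery_handler) table lookup described in 2b could be sketched as follows. The sketch makes several assumptions: mca_code is compared for equality against the architectural MCA error code in bits 15:0 of MCi_STATUS, the handler signature is reduced to a stand-in bank struct, and the codes and handlers in the table are placeholders, not real error codes.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct mcinfo_bank; the proposed handlers also take
 * mcinfo_global, mcinfo_extended and mca_handle_result. */
struct mcinfo_bank_stub {
    uint16_t mc_bank;
    uint64_t mc_status;
};

typedef int (*recovery_handler_t)(const struct mcinfo_bank_stub *bank);

struct mca_error_handler {
    uint16_t mca_code;
    recovery_handler_t recovery_handler;
};

/* Assumption: the architectural MCA error code is MCi_STATUS[15:0]. */
static uint16_t mca_code_of(uint64_t mc_status)
{
    return (uint16_t)(mc_status & 0xFFFFu);
}

/* Call the first handler whose mca_code matches this bank's error code.
 * Returns the handler's result, or -1 if no entry matches. */
static int mca_dispatch(const struct mca_error_handler *table, size_t n,
                        const struct mcinfo_bank_stub *bank)
{
    uint16_t code = mca_code_of(bank->mc_status);

    for (size_t i = 0; i < n; i++)
        if (table[i].mca_code == code)
            return table[i].recovery_handler(bank);
    return -1;
}

/* Placeholder handlers and codes, for illustration only. */
static int handle_a(const struct mcinfo_bank_stub *bank) { (void)bank; return 1; }
static int handle_b(const struct mcinfo_bank_stub *bank) { return bank->mc_bank; }

static const struct mca_error_handler sketch_handlers[] = {
    { 0x0010, handle_a },
    { 0x0020, handle_b },
};
```

Exact equality is the simplest matching rule; real MCA error codes contain sub-fields, so a mask-based match (the "mapping check" function mentioned in the comment) may well be needed.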
>
> Frank.Vanderlinden@xxxxxxx wrote:
> > Jiang, Yunhong wrote:
> >> Frank/Christopher, can you please comment further on it, or are you
> >> OK with this? For the action reporting mechanism, we will send out a
> >> proposal for review soon.
> >
> > I'm ok with this. We need a little more information on the AMD
> > mechanism, but it seems to me that we can fit this in.
> >
> > Sometime this week, I'll also send out the last of our changes that
> > haven't been sent upstream to xen-unstable yet. Maybe we can combine
> > some things into one patch, like the telemetry handling changes that
> > Gavin did. The other changes are error injection (for debugging) and
> > panic crash dump support for our FMA tools, but those are probably only
> > interesting to us.
> >
> > - Frank
--
---to satisfy European Law for business letters:
Advanced Micro Devices GmbH
Karl-Hammerschmidt-Str. 34, 85609 Dornach b. Muenchen
Geschaeftsfuehrer: Jochen Polster, Thomas M. McCoy, Giuliano Meroni
Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel