RE: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN
Christoph/Frank,

The interface definition follows; please have a look.

Thanks
Yunhong Jiang

1) Interface between Xen and dom0 for passing Xen's recovery action
   information to dom0.

Usage model: After offlining a broken page, Xen may pass the result of its
page-offline recovery action to dom0. Dom0 saves this information in
non-volatile memory for later proactive actions, such as offlining the
error-prone page early on the next reboot.

    struct page_offline_action {
        /* Params for passing the offlined page number to DOM0 */
        uint64_t mfn;
        uint64_t status; /* Similar to page offline hypercall */
    };

    struct cpu_offline_action {
        /* Params for passing the identity of the offlined CPU to DOM0 */
        uint32_t mc_socketid;
        uint16_t mc_coreid;
        uint16_t mc_core_threadid;
    };

    struct cache_shrink_action {
        /* TBD, Christoph, please fill it */
    };

    /* Recover action flags, giving recovery result information to guest */
    /* Recovered successfully after taking certain recovery actions below */
    #define REC_ACT_RECOVERED (0x1 << 0)
    /* For Solaris's usage, where dom0 will take ownership on a crash */
    #define REC_ACT_RESET (0x1 << 2)
    /* No action is performed by XEN */
    #define REC_ACT_INFO (0x1 << 3)

    /* Recover action type definition, valid only when flags & REC_ACT_RECOVERED */
    #define MC_ACT_PAGE_OFFLINE 1
    #define MC_ACT_CPU_OFFLINE 2
    #define MC_ACT_CACHE_SHRINK 3

    struct recovery_action {
        uint8_t flags;
        uint8_t action_type;
        union {
            struct page_offline_action page_retire;
            struct cpu_offline_action cpu_offline;
            struct cache_shrink_action cache_shrink;
            uint8_t pad[MAX_ACTION_SIZE];
        } action_info;
    };

    struct mcinfo_bank {
        struct mcinfo_common common;
        uint16_t mc_bank;   /* bank nr */
        uint16_t mc_domid;  /* Usecase 5: domain referenced by mc_addr on dom0
                             * and if mc_addr is valid. Never valid on DomU.
                             */
        uint64_t mc_status; /* bank status */
        uint64_t mc_addr;   /* bank address, only valid
                             * if addr bit is set in mc_status */
        uint64_t mc_misc;
        uint64_t mc_ctrl2;
        uint64_t mc_tsc;
        /* Recovery action is performed per bank */
        struct recovery_action action;
    };

2) The two interfaces below are for internal use by MCA processing.

a. pre_handler is called early, in MCA ISR context, mainly to detect
   need_reset early so that logs are not lost (flag MCA_RESET). pre_handler
   may also identify the impacted domain where possible.

b. mca_error_handler is actually an (error_action_index, recovery_handler
   pointer) pair. The recovery_handler function performs the actual recovery
   operations in softIRQ context once the per-bank MCA error matches the
   corresponding mca_code index. If pre_handler cannot determine the impacted
   domain, recovery_handler must figure it out.

    /* Error has been recovered successfully */
    #define MCA_RECOVERD 0
    /* Error impacts one guest, as stated in the owner field */
    #define MCA_OWNER 1
    /* Error can't be recovered and the system needs a reboot */
    #define MCA_RESET 2
    /* Error should be handled in softIRQ context */
    #define MCA_MORE_ACTION 3

    struct mca_handle_result {
        uint32_t flags;
        /* Valid only when flags & MCA_OWNER */
        domid_t owner;
        /* Valid only when flags & MCA_RECOVERD */
        struct recovery_action *action;
    };

    struct mca_error_handler {
        /*
         * Assume we will need only architecture defined code. If the index
         * can't be set up by mca_code, we will add a function to do the
         * (index, recovery_handler) mapping check.
         * This mca_code represents the recovery handler pointer index for
         * identifying this particular error's corresponding recovery action.
         */
        uint16_t mca_code;

        /* Handler to be called in softIRQ handler context */
        int (*recovery_handler)(struct mcinfo_bank *bank,
                                struct mcinfo_global *global,
                                struct mcinfo_extended *extension,
                                struct mca_handle_result *result);
    };

    struct mca_error_handler intel_mca_handler[] = {
        ....
    };

    struct mca_error_handler amd_mca_handler[] = {
        ....
    };

    /* Handler to be called in MCA ISR in MCA context */
    int intel_mca_pre_handler(struct cpu_user_regs *regs,
                              struct mca_handle_result *result);

    int amd_mca_pre_handler(struct cpu_user_regs *regs,
                            struct mca_handle_result *result);

Frank.Vanderlinden@xxxxxxx wrote:
> Jiang, Yunhong wrote:
>> Frank/Christopher, can you please give more comments for it, or you are OK
>> with this? For the action reporting mechanism, we will send out a proposal
>> for review soon.
>
> I'm ok with this. We need a little more information on the AMD
> mechanism, but it seems to me that we can fit this in.
>
> Sometime this week, I'll also send out the last of our changes that
> haven't been sent upstream to xen-unstable yet. Maybe we can combine
> some things into one patch, like the telemetry handling changes that
> Gavin did. The other changes are error injection (for debugging) and
> panic crash dump support for our FMA tools, but those are probably only
> interesting to us.
>
> - Frank

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
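
As an illustration of item 2b, the (mca_code, recovery_handler) table might be walked along the lines below. This is only a minimal sketch under stated assumptions, not the proposed implementation: the structures are cut-down stand-ins for the ones defined above, the handler takes fewer arguments than in the proposal, and mca_dispatch, demo_page_offline_handler, demo_handlers, DEMO_MCA_CODE and DEMO_PAGE_SHIFT are hypothetical names introduced for the example. It assumes the index is the architectural MCA error code held in the low 16 bits of the bank status.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Cut-down stand-ins for the structures in the proposal (assumption). */
struct mcinfo_bank {
    uint16_t mc_bank;
    uint64_t mc_status;
    uint64_t mc_addr;
};

#define REC_ACT_RECOVERED   (0x1 << 0)
#define MC_ACT_PAGE_OFFLINE 1

struct page_offline_action {
    uint64_t mfn;
    uint64_t status;
};

struct recovery_action {
    uint8_t flags;
    uint8_t action_type;
    struct page_offline_action page_retire;
};

struct mca_handle_result {
    uint32_t flags;
    struct recovery_action *action;
};

/* Simplified handler signature: the proposal also passes global/extended info. */
struct mca_error_handler {
    uint16_t mca_code;
    int (*recovery_handler)(struct mcinfo_bank *bank,
                            struct mca_handle_result *result);
};

#define DEMO_MCA_CODE   0x009f /* arbitrary value chosen for the demo */
#define DEMO_PAGE_SHIFT 12     /* assumes 4 KiB pages */

/* Hypothetical handler: records a page-offline action for the faulting page. */
int demo_page_offline_handler(struct mcinfo_bank *bank,
                              struct mca_handle_result *result)
{
    static struct recovery_action action;

    action.flags = REC_ACT_RECOVERED;
    action.action_type = MC_ACT_PAGE_OFFLINE;
    action.page_retire.mfn = bank->mc_addr >> DEMO_PAGE_SHIFT;
    action.page_retire.status = 0;
    result->action = &action;
    return 0;
}

const struct mca_error_handler demo_handlers[] = {
    { DEMO_MCA_CODE, demo_page_offline_handler },
};

/*
 * Look up the handler whose mca_code matches the low 16 bits of the bank
 * status and call it. Returns the handler's result, or -1 if no entry matches.
 */
int mca_dispatch(const struct mca_error_handler *table, size_t count,
                 struct mcinfo_bank *bank, struct mca_handle_result *result)
{
    assert(table != NULL && bank != NULL && result != NULL);

    uint16_t code = (uint16_t)(bank->mc_status & 0xffff);

    for (size_t i = 0; i < count; i++) {
        if (table[i].mca_code == code)
            return table[i].recovery_handler(bank, result);
    }
    return -1;
}
```

A caller in softIRQ context would run mca_dispatch once per logged bank and, when it returns -1, fall back to whatever default policy applies (for example, treating the error as unrecoverable). A linear scan is used here because the tables are expected to be short; if the index cannot be derived from mca_code alone, the comparison would be replaced by the mapping-check function mentioned in the proposal.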