Re: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN

MC_ACT_CACHE_SHIRNK <-- typo. should be MC_ACT_CACHE_SHRINK

The L3 cache index disable feature works like this:
You read bits 17:6 from the MSR 0xC0000408 (which is MC4_MISC1)
and write them into the index field. This MSR does not belong to the
standard mc bank data and is therefore provided by mcinfo_extended.
The index field is bits 11:0 of the PCI function 3 register
"L3 Cache Index Disable".

Why is the recover action bound to the bank?
I would like to see a struct mcinfo_recover rather than extending
struct mcinfo_bank. That gives us flexibility.

Christoph


On Thursday 05 March 2009 09:31:27 Jiang, Yunhong wrote:
> Christoph/Frank, Followed is the interface definition, please have a look.
>
> Thanks
> Yunhong Jiang
>
> 1) Interface between Xen/dom0 for passing xen's recovery action information
> to dom0. Usage model: After offlining broken page, Xen might pass its
> page-offline recovery action result information to dom0. Dom0 will save the
> information in non-volatile memory for further proactive actions, such as
> offlining the easy-broken page early when doing next reboot.
>
>
> struct page_offline_action
> {
>     /* Params for passing the offlined page number to DOM0 */
>     uint64_t mfn;
>     uint64_t status; /* Similar to page offline hypercall */
> };
>
> struct cpu_offline_action
> {
>     /* Params for passing the identity of the offlined CPU to DOM0 */
>     uint32_t mc_socketid;
>     uint16_t mc_coreid;
>     uint16_t mc_core_threadid;
> };
>
> struct cache_shrink_action
> {
>     /* TBD, Christoph, please fill it */
> };
>
> /* Recover action flags, giving recovery result information to guest */
> /* Recovery successfully after taking certain recovery actions below */
> #define REC_ACT_RECOVERED      (0x1 << 0)
> /* For solaris's usage that dom0 will take ownership when crash */
> #define REC_ACT_RESET          (0x1 << 2)
> /* No action is performed by XEN */
> #define REC_ACT_INFO           (0x1 << 3)
>
> /* Recover action type definition, valid only when flags & REC_ACT_RECOVERED */
> #define MC_ACT_PAGE_OFFLINE 1
> #define MC_ACT_CPU_OFFLINE  2
> #define MC_ACT_CACHE_SHIRNK 3
>
> struct recovery_action
> {
>     uint8_t flags;
>     uint8_t action_type;
>     union
>     {
>         struct page_offline_action page_retire;
>         struct cpu_offline_action cpu_offline;
>         struct cache_shrink_action cache_shrink;
>         uint8_t pad[MAX_ACTION_SIZE];
>     } action_info;
> }
>
> struct mcinfo_bank {
>     struct mcinfo_common common;
>
>     uint16_t mc_bank;   /* bank nr */
>     uint16_t mc_domid;  /* Usecase 5: domain referenced by mc_addr on dom0
>                          * and if mc_addr is valid. Never valid on DomU. */
>     uint64_t mc_status; /* bank status */
>     uint64_t mc_addr;   /* bank address, only valid
>                          * if addr bit is set in mc_status */
>     uint64_t mc_misc;
>     uint64_t mc_ctrl2;
>     uint64_t mc_tsc;
>     /* Recovery action is performed per bank */
>     struct recovery_action action;
> };
>
> 2) Below two interfaces are for MCA processing internal use.
>     a. pre_handler will be called earlier in MCA ISR context, mainly for
>     early need_reset detection for avoiding log missing (flag MCA_RESET).
>     Also, pre_handler might be able to find the impacted domain if possible.
>     b. mca_error_handler is actually a (error_action_index,
>     recovery_handler pointer) pair. The defined recovery_handler function
>     performs the actual recovery operations in softIrq context after the
>     per_bank MCA error matching the corresponding mca_code index. If
>     pre_handler can't judge the impacted domain, recovery_handler must figure
>     it out.
>
> /* Error has been recovered successfully */
> #define MCA_RECOVERD    0
> /* Error impact one guest as stated in owner field */
> #define MCA_OWNER       1
> /* Error can't be recovered and need reboot system */
> #define MCA_RESET       2
> /* Error should be handled in softIRQ context */
> #define MCA_MORE_ACTION 3
>
> struct mca_handle_result
> {
>     uint32_t flags;
>     /* Valid only when flags & MCA_OWNER */
>     domid_d owner;
>     /* valid only when flags & MCA_RECOVERD */
>     struct recovery_action *action;
> };
>
> struct mca_error_handler
> {
>     /*
>      * Assume we will need only architecture defined code. If the index
>      * can't be setup by mca_code, we will add a function to do the
>      * (index, recovery_handler) mapping check.
>      * This mca_code represents the recovery handler pointer index for
>      * identifying this particular error's corresponding recover action
>      */
>     uint16_t mca_code;
>
>     /* Handler to be called in softIRQ handler context */
>     int recovery_handler(struct mcinfo_bank *bank,
>                          struct mcinfo_global *global,
>                          struct mcinfo_extended *extention,
>                          struct mca_handle_result *result);
>
> };
>
> struct mca_error_handler intel_mca_handler[] =
> {
>     ....
> };
>
> struct mca_error_handler amd_mca_handler[] =
> {
>     ....
> };
>
>
> /* Handler to be called in MCA ISR in MCA context */
> int intel_mca_pre_handler(struct cpu_user_regs *regs,
>                           struct mca_handle_result *result);
>
> int amd_mca_pre_handler(struct cpu_user_regs *regs,
>                         struct mca_handle_result *result);
>
> Frank.Vanderlinden@xxxxxxx wrote:
> > Jiang, Yunhong wrote:
> >> Frank/Christopher, can you please give more comments for it, or you are
> >> OK with this? For the action reporting mechanism, we will send out a
> >> proposal for review soon.
> >
> > I'm ok with this. We need a little more information on the AMD
> > mechanism, but it seems to me that we can fit this in.
> >
> > Sometime this week, I'll also send out the last of our changes that
> > haven't been sent upstream to xen-unstable yet. Maybe we can combine
> > some things in to one patch, like the telemetry handling changes that
> > Gavin did. The other changes are error injection (for debugging) and
> > panic crash dump support for our FMA tools, but those are probably only
> > interesting to us.
> >
> > - Frank

--
---to satisfy European Law for business letters:
Advanced Micro Devices GmbH
Karl-Hammerschmidt-Str. 34, 85609 Dornach b. Muenchen
Geschaeftsfuehrer: Jochen Polster, Thomas M. McCoy, Giuliano Meroni
Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
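
The L3 cache index disable step described at the top of this message (take bits 17:6 of MC4_MISC1 and put them into bits 11:0 of the "L3 Cache Index Disable" register) comes down to a shift and a mask. A minimal sketch of just that bit manipulation follows. The function names are hypothetical, the MSR read and the PCI config write are left out, and keeping the register's remaining bits unchanged is an assumption that the thread does not spell out.

```c
#include <stdint.h>

/* Width of the index field: 12 bits (17:6 in the MSR, 11:0 in the register). */
#define L3_INDEX_MASK 0xFFFu

/*
 * Hypothetical helper: extract bits 17:6 from a raw MC4_MISC1 value
 * (MSR 0xC0000408). The caller is assumed to have obtained the value
 * from mcinfo_extended or an equivalent MSR read.
 */
static uint32_t l3_index_from_misc1(uint64_t mc4_misc1)
{
    return (uint32_t)((mc4_misc1 >> 6) & L3_INDEX_MASK);
}

/*
 * Hypothetical helper: place the index into bits 11:0 of a value read
 * from the "L3 Cache Index Disable" register. All other bits are kept
 * as they were (an assumption, see above).
 */
static uint32_t l3_index_disable_set_index(uint32_t reg, uint32_t index)
{
    return (reg & ~L3_INDEX_MASK) | (index & L3_INDEX_MASK);
}
```

A caller would pass the MC4_MISC1 value to l3_index_from_misc1(), merge the result into the current register contents with l3_index_disable_set_index(), and then write the merged value back through whatever PCI config accessor the platform provides.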