[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH 2/2] Xen/vMCE: bugfix to remove problematic is_vmce_ready check
On 06.05.13 17:14, Liu, Jinsong wrote: > Egger, Christoph wrote: >> On 06.05.13 17:00, Liu, Jinsong wrote: >>> Christoph Egger wrote: >>>> On 06.05.13 11:50, Liu, Jinsong wrote: >>>>> Christoph Egger wrote: >>>>>> On 06.05.13 11:24, Liu, Jinsong wrote: >>>>>>> Jan Beulich wrote: >>>>>>>>>>> On 06.05.13 at 10:54, Christoph Egger <chegger@xxxxxxxxx> >>>>>>>>>>> wrote: >>>>>>>>> On 03.05.13 17:51, Liu, Jinsong wrote: >>>>>>>>>> Jan Beulich wrote: >>>>>>>>>>>>>> On 03.05.13 at 16:16, "Liu, Jinsong" >>>>>>>>>>>>>> <jinsong.liu@xxxxxxxxx> wrote: >>>>>>>>>>>> Jan Beulich wrote: >>>>>>>>>>>>>>>> On 03.05.13 at 10:41, "Liu, Jinsong" >>>>>>>>>>>>>>>> <jinsong.liu@xxxxxxxxx> wrote: >>>>>>>>>>>>>> Jan Beulich wrote: >>>>>>>>>>>>>>>>>> On 27.04.13 at 10:38, "Liu, Jinsong" >>>>>>>>>>>>>>>>>> <jinsong.liu@xxxxxxxxx> wrote: >>>>>>>>>>>>>>>> From 9098666db640183f894b9aec09599dd32dddb7fa Mon Sep 17 >>>>>>>>>>>>>>>> 00:00:00 2001 From: Liu Jinsong <jinsong.liu@xxxxxxxxx> >>>>>>>>>>>>>>>> Date: Sat, 27 Apr 2013 22:37:35 +0800 >>>>>>>>>>>>>>>> Subject: [PATCH 2/2] Xen/vMCE: bugfix to remove >>>>>>>>>>>>>>>> problematic is_vmce_ready check >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> is_vmce_ready() is problematic: >>>>>>>>>>>>>>>> * For dom0, it checks if virq bind to dom0 mcelog >>>>>>>>>>>>>>>> driver. If not, it results dom0 crash. However, it's >>>>>>>>>>>>>>>> problematic and overkilled since mcelog as a dom0 >>>>>>>>>>>>>>>> feature could be enabled/disabled per dom0 option: >>>>>>>>>>>>>>>> (XEN) MCE: This error page is ownded by DOM 0 (XEN) >>>>>>>>>>>>>>>> DOM0 not ready for vMCE (XEN) domain_crash called from >>>>>>>>>>>>>>>> mcaction.c:133 (XEN) Domain 0 reported crashed by >>>>>>>>>>>>>>>> domain 32767 on cpu#31: (XEN) Domain 0 crashed: >>>>>>>>>>>>>>>> rebooting machine in 5 seconds. (XEN) Resetting with >>>>>>>>>>>>>>>> ACPI MEMORY or I/O RESET_REG. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> * For dom0, if really need check, it should check >>>>>>>>>>>>>>>> whether vMCE injection for dom0 ready (say, exception >>>>>>>>>>>>>>>> trap bounce check, which has been done at >>>>>>>>>>>>>>>> inject_vmce()), not check dom0 mcelog ready (which has >>>>>>>>>>>>>>>> been done at mce_softirq() before send global virq to >>>>>>>>>>>>>>>> dom0). >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Following the argumentation above, I wonder which of the >>>>>>>>>>>>>>> other "goto vmce_failed" are really appropriate, i.e. >>>>>>>>>>>>>>> whether the patch shouldn't be extended (at least for the >>>>>>>>>>>>>>> Dom0 case). >>>>>>>>>>>>>> >>>>>>>>>>>>>> You mean other 'goto vmce_failed' are also not appropriate >>>>>>>>>>>>>> (I'm not quite clear your point)? >>>>>>>>>>>>> >>>>>>>>>>>>> Yes. >>>>>>>>>>>>> >>>>>>>>>>>>>> Would you please point out which point you think not >>>>>>>>>>>>>> appropriate? >>>>>>>>>>>>> >>>>>>>>>>>>> I question whether it is correct/necessary to crash the >>>>>>>>>>>>> domain in any of those failure cases. Perhaps when we fail >>>>>>>>>>>>> to unmap the page it is, but failure of fill_vmsr_data() >>>>>>>>>>>>> and inject_vmce() don't appear to be valid reasons once the >>>>>>>>>>>>> is_vmce_ready() path is being dropped. >>>>>>>>>>>> >>>>>>>>>>>> For fill_vmsr_data(), it failed only when MCG_STATUS_MCIP >>>>>>>>>>>> bit still set when next vMCE# occur, means the 2nd vMCE# >>>>>>>>>>>> occur when the 1st vMCE# not handled yet. Per SDM it should >>>>>>>>>>>> shutdown. >>>>>>>>>>>> >>>>>>>>>>>> For inject_vmce(), it failed when >>>>>>>>>>>> 1). vcpu is still mce_pending, or >>>>>>>>>>>> 2). pv not register trap callback >>>>>>>>>>>> Maybe it's some overkilled for dom0 (for other guest, it's >>>>>>>>>>>> ok to kill them), but any graceful way to quit? >>>>>>>>>>> >>>>>>>>>>> Just exit and do nothing (except perhaps log a rate limited >>>>>>>>>>> message)? >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> One concern of quiet exit is, the error will be totally >>>>>>>>>> ignored by guest --> it >>>>>>>>> didn't get preperly handled, and may recursively occur to make >>>>>>>>> worse error --> it's better to kill guest under such case. >>>>>>>>>> >>>>>>>>>>>> or, considering it rarely happens, how about keep current >>>>>>>>>>>> way (kill guest no matter dom0 or not)? >>>>>>>>>>> >>>>>>>>>>> Possibly - I was merely asking why this one condition was >>>>>>>>>>> found to be too strict, while the others are being left as >>>>>>>>>>> is. >>>>>>>>>>> >>>>>>>>>>> Jan >>>>>>>>>> >>>>>>>>>> Ah, the reason of removing is_vmce_ready check is, it's >>>>>>>>>> problematic (check mcelog driver, not vmce tap callback), >>>>>>>>>> and overkilled (since defaultly dom0 will not start mcelog >>>>>>>>>> driver, under which case system will crash whenever vmce >>>>>>>>>> inject to dom0) >>>>>>>>>> >>>>>>>>>> --> So patch 2/2 is not too strict for dom0. >>>>>>>>>> >>>>>>>>> >>>>>>>>> Please keep in mind the mcelog userland/kernel interface is not >>>>>>>>> designed with xen in mind. mcelog cannot report which guest is >>>>>>>>> impacted for example, although xen reports that to dom0. >>>>>>>>> I object 'fixing' the hypervisor to come over with mcelog >>>>>>>>> drawbacks. I prefer fixing Dom0 instead. >>>>>>>>> >>>>>>> >>>>>>> Sure, xen mcelog driver in linux is implemented by me :-) >>>>>>> This patch does not intend to 'fix' hypervisor but just avoid >>>>>>> overkilled system (when xen mcelog driver in dom0 not loaded as >>>>>>> default). >>>>>> >>>>>> I assume dom0 w/o xen mcelog driver active means dom0 is not >>>>>> capable to deal with machine check errors. Is this correct? >>>>>> >>>>> >>>>> No, w/o xen mcelog driver active, only user daemon 'mcelog' was >>>>> affected. Dom0 is still capable of handling vmce as long as it >>>>> registered trap callback (which is checked at hypervisor >>>>> inject_vmce()). >>>> >>>> Oh, I thought the xen mcelog *driver* registers the trap callback >>>> and in a Dom0 it additionally registers the logging callback. >>>> What registers the logging callback and what registers >>>> the trap callback? >>>> >>>> Christoph >>>> >>> [...] >> >> Ok, that's how I understand the code, too. But you say above that w/o >> this driver Dom0 is still capable to handle vmce. That puzzle's me. >> >> Christoph > > Ah, what I meant is, w/o mcelog dirver, kernel still successfully > register its mce handler as trap callback to hypervisor, hence the > injection of vmce from hypervisor to dom0 is OK, and surely got properly > handled at dom0. > mcelog driver is just for error logging, getting error info from hypervisor > and provide interface for user daemon tools 'mcelog'. Ah, so the trap handler is *always* installed in a Linux Dom0? NetBSD Dom0/DomU still has no machine check support at all, so in that case Dom0/DomU is not vmce ready. That means the vmce ready check is needed. Christoph _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |