[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH 2/2] Xen/vMCE: bugfix to remove problematic is_vmce_ready check
On 06.05.13 11:50, Liu, Jinsong wrote: > Christoph Egger wrote: >> On 06.05.13 11:24, Liu, Jinsong wrote: >>> Jan Beulich wrote: >>>>>>> On 06.05.13 at 10:54, Christoph Egger <chegger@xxxxxxxxx> wrote: >>>>> On 03.05.13 17:51, Liu, Jinsong wrote: >>>>>> Jan Beulich wrote: >>>>>>>>>> On 03.05.13 at 16:16, "Liu, Jinsong" <jinsong.liu@xxxxxxxxx> >>>>>>>>>> wrote: >>>>>>>> Jan Beulich wrote: >>>>>>>>>>>> On 03.05.13 at 10:41, "Liu, Jinsong" <jinsong.liu@xxxxxxxxx> >>>>>>>>>>>> wrote: >>>>>>>>>> Jan Beulich wrote: >>>>>>>>>>>>>> On 27.04.13 at 10:38, "Liu, Jinsong" >>>>>>>>>>>>>> <jinsong.liu@xxxxxxxxx> wrote: >>>>>>>>>>>> From 9098666db640183f894b9aec09599dd32dddb7fa Mon Sep 17 >>>>>>>>>>>> 00:00:00 2001 From: Liu Jinsong <jinsong.liu@xxxxxxxxx> >>>>>>>>>>>> Date: Sat, 27 Apr 2013 22:37:35 +0800 >>>>>>>>>>>> Subject: [PATCH 2/2] Xen/vMCE: bugfix to remove problematic >>>>>>>>>>>> is_vmce_ready check >>>>>>>>>>>> >>>>>>>>>>>> is_vmce_ready() is problematic: >>>>>>>>>>>> * For dom0, it checks if virq bind to dom0 mcelog driver. If >>>>>>>>>>>> not, it results dom0 crash. However, it's problematic and >>>>>>>>>>>> overkilled since mcelog as a dom0 feature could be >>>>>>>>>>>> enabled/disabled per dom0 option: (XEN) MCE: This error page >>>>>>>>>>>> is ownded by DOM 0 (XEN) DOM0 not ready for vMCE (XEN) >>>>>>>>>>>> domain_crash called from mcaction.c:133 (XEN) Domain 0 >>>>>>>>>>>> reported crashed by domain 32767 on cpu#31: (XEN) Domain 0 >>>>>>>>>>>> crashed: rebooting machine in 5 seconds. (XEN) Resetting >>>>>>>>>>>> with ACPI MEMORY or I/O RESET_REG. >>>>>>>>>>>> >>>>>>>>>>>> * For dom0, if really need check, it should check whether >>>>>>>>>>>> vMCE injection for dom0 ready (say, exception trap bounce >>>>>>>>>>>> check, which has been done at inject_vmce()), not check >>>>>>>>>>>> dom0 mcelog ready (which has been done at mce_softirq() >>>>>>>>>>>> before send global virq to dom0). >>>>>>>>>>> >>>>>>>>>>> Following the argumentation above, I wonder which of the >>>>>>>>>>> other "goto vmce_failed" are really appropriate, i.e. >>>>>>>>>>> whether the patch shouldn't be extended (at least for the >>>>>>>>>>> Dom0 case). >>>>>>>>>> >>>>>>>>>> You mean other 'goto vmce_failed' are also not appropriate >>>>>>>>>> (I'm not quite clear your point)? >>>>>>>>> >>>>>>>>> Yes. >>>>>>>>> >>>>>>>>>> Would you please point out which point you think not >>>>>>>>>> appropriate? >>>>>>>>> >>>>>>>>> I question whether it is correct/necessary to crash the domain >>>>>>>>> in any of those failure cases. Perhaps when we fail to unmap >>>>>>>>> the page it is, but failure of fill_vmsr_data() and >>>>>>>>> inject_vmce() don't appear to be valid reasons once the >>>>>>>>> is_vmce_ready() path is being dropped. >>>>>>>> >>>>>>>> For fill_vmsr_data(), it failed only when MCG_STATUS_MCIP bit >>>>>>>> still set when next vMCE# occur, means the 2nd vMCE# occur when >>>>>>>> the 1st vMCE# not handled yet. Per SDM it should shutdown. >>>>>>>> >>>>>>>> For inject_vmce(), it failed when >>>>>>>> 1). vcpu is still mce_pending, or >>>>>>>> 2). pv not register trap callback >>>>>>>> Maybe it's some overkilled for dom0 (for other guest, it's ok to >>>>>>>> kill them), but any graceful way to quit? >>>>>>> >>>>>>> Just exit and do nothing (except perhaps log a rate limited >>>>>>> message)? >>>>>>> >>>>>> >>>>>> One concern of quiet exit is, the error will be totally ignored by >>>>>> guest --> it >>>>> didn't get preperly handled, and may recursively occur to make >>>>> worse error --> it's better to kill guest under such case. >>>>>> >>>>>>>> or, considering it rarely happens, how about keep current way >>>>>>>> (kill guest no matter dom0 or not)? >>>>>>> >>>>>>> Possibly - I was merely asking why this one condition was found >>>>>>> to be too strict, while the others are being left as is. >>>>>>> >>>>>>> Jan >>>>>> >>>>>> Ah, the reason of removing is_vmce_ready check is, it's >>>>>> problematic (check mcelog driver, not vmce tap callback), >>>>>> and overkilled (since defaultly dom0 will not start mcelog driver, >>>>>> under which case system will crash whenever vmce inject to dom0) >>>>>> >>>>>> --> So patch 2/2 is not too strict for dom0. >>>>>> >>>>> >>>>> Please keep in mind the mcelog userland/kernel interface is not >>>>> designed with xen in mind. mcelog cannot report which guest is >>>>> impacted for example, although xen reports that to dom0. >>>>> I object 'fixing' the hypervisor to come over with mcelog >>>>> drawbacks. I prefer fixing Dom0 instead. >>>>> >>> >>> Sure, xen mcelog driver in linux is implemented by me :-) >>> This patch does not intend to 'fix' hypervisor but just avoid >>> overkilled system (when xen mcelog driver in dom0 not loaded as >>> default). >> >> I assume dom0 w/o xen mcelog driver active means dom0 is not capable >> to deal with machine check errors. Is this correct? >> > > No, w/o xen mcelog driver active, only user daemon 'mcelog' was affected. > Dom0 is still capable of handling vmce as long as it registered trap callback > (which is checked at hypervisor inject_vmce()). Oh, I thought the xen mcelog *driver* registers the trap callback and in a Dom0 it additionally registers the logging callback. What registers the logging callback and what registers the trap callback? Christoph >>>>> From the design perspective, the virq for Dom0 is for logging >>>>> purpose only and the trap handler has equal purpose for both Dom0 >>>>> and DomU. >>> >>> Sure, that's what I meant 'problematic' check. >> >> What do you want to do when Dom0 is not capable to deal with >> machine check errors and Dom0 is impacted? >> >> Christoph > > As above comments :) > > Thanks, > Jinsong > >> >>> Thanks, >>> Jinsong >>> >>>> So as this doesn't read like "don't care" - is this an ack, nak, or >>>> a request to Jinsong to change something for the patch to be >>>> acceptable? >>>> >>>> Jan > _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |