[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH 2/2] Xen/vMCE: bugfix to remove problematic is_vmce_ready check
Jan Beulich wrote: >>>> On 06.05.13 at 10:54, Christoph Egger <chegger@xxxxxxxxx> wrote: >> On 03.05.13 17:51, Liu, Jinsong wrote: >>> Jan Beulich wrote: >>>>>>> On 03.05.13 at 16:16, "Liu, Jinsong" <jinsong.liu@xxxxxxxxx> >>>>>>> wrote: >>>>> Jan Beulich wrote: >>>>>>>>> On 03.05.13 at 10:41, "Liu, Jinsong" <jinsong.liu@xxxxxxxxx> >>>>>>>>> wrote: >>>>>>> Jan Beulich wrote: >>>>>>>>>>> On 27.04.13 at 10:38, "Liu, Jinsong" <jinsong.liu@xxxxxxxxx> >>>>>>>>>>> wrote: >>>>>>>>> From 9098666db640183f894b9aec09599dd32dddb7fa Mon Sep 17 >>>>>>>>> 00:00:00 2001 From: Liu Jinsong <jinsong.liu@xxxxxxxxx> >>>>>>>>> Date: Sat, 27 Apr 2013 22:37:35 +0800 >>>>>>>>> Subject: [PATCH 2/2] Xen/vMCE: bugfix to remove problematic >>>>>>>>> is_vmce_ready check >>>>>>>>> >>>>>>>>> is_vmce_ready() is problematic: >>>>>>>>> * For dom0, it checks if virq bind to dom0 mcelog driver. If >>>>>>>>> not, it results dom0 crash. However, it's problematic and >>>>>>>>> overkilled since mcelog as a dom0 feature could be >>>>>>>>> enabled/disabled per dom0 option: (XEN) MCE: This error page >>>>>>>>> is ownded by DOM 0 (XEN) DOM0 not ready for vMCE (XEN) >>>>>>>>> domain_crash called from mcaction.c:133 (XEN) Domain 0 >>>>>>>>> reported crashed by domain 32767 on cpu#31: (XEN) Domain 0 >>>>>>>>> crashed: rebooting machine in 5 seconds. (XEN) Resetting with >>>>>>>>> ACPI MEMORY or I/O RESET_REG. >>>>>>>>> >>>>>>>>> * For dom0, if really need check, it should check whether vMCE >>>>>>>>> injection for dom0 ready (say, exception trap bounce check, >>>>>>>>> which has been done at inject_vmce()), not check dom0 mcelog >>>>>>>>> ready (which has been done at mce_softirq() before send >>>>>>>>> global virq to dom0). >>>>>>>> >>>>>>>> Following the argumentation above, I wonder which of the other >>>>>>>> "goto vmce_failed" are really appropriate, i.e. whether the >>>>>>>> patch shouldn't be extended (at least for the Dom0 case). >>>>>>> >>>>>>> You mean other 'goto vmce_failed' are also not appropriate (I'm >>>>>>> not quite clear your point)? >>>>>> >>>>>> Yes. >>>>>> >>>>>>> Would you please point out which point you think not >>>>>>> appropriate? >>>>>> >>>>>> I question whether it is correct/necessary to crash the domain in >>>>>> any of those failure cases. Perhaps when we fail to unmap the >>>>>> page it is, but failure of fill_vmsr_data() and inject_vmce() >>>>>> don't appear to be valid reasons once the is_vmce_ready() path >>>>>> is being dropped. >>>>> >>>>> For fill_vmsr_data(), it failed only when MCG_STATUS_MCIP bit >>>>> still set when next vMCE# occur, means the 2nd vMCE# occur when >>>>> the 1st vMCE# not handled yet. Per SDM it should shutdown. >>>>> >>>>> For inject_vmce(), it failed when >>>>> 1). vcpu is still mce_pending, or >>>>> 2). pv not register trap callback >>>>> Maybe it's some overkilled for dom0 (for other guest, it's ok to >>>>> kill them), but any graceful way to quit? >>>> >>>> Just exit and do nothing (except perhaps log a rate limited >>>> message)? >>>> >>> >>> One concern of quiet exit is, the error will be totally ignored by >>> guest --> it >> didn't get preperly handled, and may recursively occur to make worse >> error --> it's better to kill guest under such case. >>> >>>>> or, considering it rarely happens, how about keep current way >>>>> (kill guest no matter dom0 or not)? >>>> >>>> Possibly - I was merely asking why this one condition was found to >>>> be too strict, while the others are being left as is. >>>> >>>> Jan >>> >>> Ah, the reason of removing is_vmce_ready check is, it's >>> problematic (check mcelog driver, not vmce tap callback), >>> and overkilled (since defaultly dom0 will not start mcelog driver, >>> under which case system will crash whenever vmce inject to dom0) >>> >>> --> So patch 2/2 is not too strict for dom0. >>> >> >> Please keep in mind the mcelog userland/kernel interface is not >> designed >> with xen in mind. mcelog cannot report which guest is impacted for >> example, although xen reports that to dom0. >> I object 'fixing' the hypervisor to come over with mcelog drawbacks. >> I prefer fixing Dom0 instead. >> Sure, xen mcelog driver in linux is implemented by me :-) This patch does not intend to 'fix' hypervisor but just avoid overkilled system (when xen mcelog driver in dom0 not loaded as default). >> From the design perspective, the virq for Dom0 is for logging purpose >> only and the trap handler has equal purpose for both Dom0 and DomU. > Sure, that's what I meant 'problematic' check. Thanks, Jinsong > So as this doesn't read like "don't care" - is this an ack, nak, or > a request to Jinsong to change something for the patch to be > acceptable? > > Jan _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |