[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH 2/2] Xen/vMCE: bugfix to remove problematic is_vmce_ready check
On 06.05.13 17:00, Liu, Jinsong wrote: > Christoph Egger wrote: >> On 06.05.13 11:50, Liu, Jinsong wrote: >>> Christoph Egger wrote: >>>> On 06.05.13 11:24, Liu, Jinsong wrote: >>>>> Jan Beulich wrote: >>>>>>>>> On 06.05.13 at 10:54, Christoph Egger <chegger@xxxxxxxxx> >>>>>>>>> wrote: >>>>>>> On 03.05.13 17:51, Liu, Jinsong wrote: >>>>>>>> Jan Beulich wrote: >>>>>>>>>>>> On 03.05.13 at 16:16, "Liu, Jinsong" <jinsong.liu@xxxxxxxxx> >>>>>>>>>>>> wrote: >>>>>>>>>> Jan Beulich wrote: >>>>>>>>>>>>>> On 03.05.13 at 10:41, "Liu, Jinsong" >>>>>>>>>>>>>> <jinsong.liu@xxxxxxxxx> wrote: >>>>>>>>>>>> Jan Beulich wrote: >>>>>>>>>>>>>>>> On 27.04.13 at 10:38, "Liu, Jinsong" >>>>>>>>>>>>>>>> <jinsong.liu@xxxxxxxxx> wrote: >>>>>>>>>>>>>> From 9098666db640183f894b9aec09599dd32dddb7fa Mon Sep 17 >>>>>>>>>>>>>> 00:00:00 2001 From: Liu Jinsong <jinsong.liu@xxxxxxxxx> >>>>>>>>>>>>>> Date: Sat, 27 Apr 2013 22:37:35 +0800 >>>>>>>>>>>>>> Subject: [PATCH 2/2] Xen/vMCE: bugfix to remove >>>>>>>>>>>>>> problematic is_vmce_ready check >>>>>>>>>>>>>> >>>>>>>>>>>>>> is_vmce_ready() is problematic: >>>>>>>>>>>>>> * For dom0, it checks if virq bind to dom0 mcelog driver. >>>>>>>>>>>>>> If not, it results dom0 crash. However, it's problematic >>>>>>>>>>>>>> and overkilled since mcelog as a dom0 feature could be >>>>>>>>>>>>>> enabled/disabled per dom0 option: (XEN) MCE: This error >>>>>>>>>>>>>> page is ownded by DOM 0 (XEN) DOM0 not ready for vMCE >>>>>>>>>>>>>> (XEN) domain_crash called from mcaction.c:133 (XEN) >>>>>>>>>>>>>> Domain 0 reported crashed by domain 32767 on cpu#31: >>>>>>>>>>>>>> (XEN) Domain 0 crashed: rebooting machine in 5 seconds. >>>>>>>>>>>>>> (XEN) Resetting with ACPI MEMORY or I/O RESET_REG. >>>>>>>>>>>>>> >>>>>>>>>>>>>> * For dom0, if really need check, it should check whether >>>>>>>>>>>>>> vMCE injection for dom0 ready (say, exception trap bounce >>>>>>>>>>>>>> check, which has been done at inject_vmce()), not check >>>>>>>>>>>>>> dom0 mcelog ready (which has been done at mce_softirq() >>>>>>>>>>>>>> before send global virq to dom0). >>>>>>>>>>>>> >>>>>>>>>>>>> Following the argumentation above, I wonder which of the >>>>>>>>>>>>> other "goto vmce_failed" are really appropriate, i.e. >>>>>>>>>>>>> whether the patch shouldn't be extended (at least for the >>>>>>>>>>>>> Dom0 case). >>>>>>>>>>>> >>>>>>>>>>>> You mean other 'goto vmce_failed' are also not appropriate >>>>>>>>>>>> (I'm not quite clear your point)? >>>>>>>>>>> >>>>>>>>>>> Yes. >>>>>>>>>>> >>>>>>>>>>>> Would you please point out which point you think not >>>>>>>>>>>> appropriate? >>>>>>>>>>> >>>>>>>>>>> I question whether it is correct/necessary to crash the >>>>>>>>>>> domain in any of those failure cases. Perhaps when we fail >>>>>>>>>>> to unmap the page it is, but failure of fill_vmsr_data() and >>>>>>>>>>> inject_vmce() don't appear to be valid reasons once the >>>>>>>>>>> is_vmce_ready() path is being dropped. >>>>>>>>>> >>>>>>>>>> For fill_vmsr_data(), it failed only when MCG_STATUS_MCIP bit >>>>>>>>>> still set when next vMCE# occur, means the 2nd vMCE# occur >>>>>>>>>> when the 1st vMCE# not handled yet. Per SDM it should >>>>>>>>>> shutdown. >>>>>>>>>> >>>>>>>>>> For inject_vmce(), it failed when >>>>>>>>>> 1). vcpu is still mce_pending, or >>>>>>>>>> 2). pv not register trap callback >>>>>>>>>> Maybe it's some overkilled for dom0 (for other guest, it's ok >>>>>>>>>> to kill them), but any graceful way to quit? >>>>>>>>> >>>>>>>>> Just exit and do nothing (except perhaps log a rate limited >>>>>>>>> message)? >>>>>>>>> >>>>>>>> >>>>>>>> One concern of quiet exit is, the error will be totally ignored >>>>>>>> by guest --> it >>>>>>> didn't get preperly handled, and may recursively occur to make >>>>>>> worse error --> it's better to kill guest under such case. >>>>>>>> >>>>>>>>>> or, considering it rarely happens, how about keep current way >>>>>>>>>> (kill guest no matter dom0 or not)? >>>>>>>>> >>>>>>>>> Possibly - I was merely asking why this one condition was found >>>>>>>>> to be too strict, while the others are being left as is. >>>>>>>>> >>>>>>>>> Jan >>>>>>>> >>>>>>>> Ah, the reason of removing is_vmce_ready check is, it's >>>>>>>> problematic (check mcelog driver, not vmce tap callback), >>>>>>>> and overkilled (since defaultly dom0 will not start mcelog >>>>>>>> driver, under which case system will crash whenever vmce inject >>>>>>>> to dom0) >>>>>>>> >>>>>>>> --> So patch 2/2 is not too strict for dom0. >>>>>>>> >>>>>>> >>>>>>> Please keep in mind the mcelog userland/kernel interface is not >>>>>>> designed with xen in mind. mcelog cannot report which guest is >>>>>>> impacted for example, although xen reports that to dom0. >>>>>>> I object 'fixing' the hypervisor to come over with mcelog >>>>>>> drawbacks. I prefer fixing Dom0 instead. >>>>>>> >>>>> >>>>> Sure, xen mcelog driver in linux is implemented by me :-) >>>>> This patch does not intend to 'fix' hypervisor but just avoid >>>>> overkilled system (when xen mcelog driver in dom0 not loaded as >>>>> default). >>>> >>>> I assume dom0 w/o xen mcelog driver active means dom0 is not capable >>>> to deal with machine check errors. Is this correct? >>>> >>> >>> No, w/o xen mcelog driver active, only user daemon 'mcelog' was >>> affected. Dom0 is still capable of handling vmce as long as it >>> registered trap callback (which is checked at hypervisor >>> inject_vmce()). >> >> Oh, I thought the xen mcelog *driver* registers the trap callback >> and in a Dom0 it additionally registers the logging callback. >> What registers the logging callback and what registers >> the trap callback? >> >> Christoph >> > > They are 2 paths (refer latest kernl): > 1. for xen mcelog driver, dom0 registers irq handler at driver init via > drivers/xen/mcelog.c > --> bind_virq_for_mce() > --> bind_virq_to_irqhandler() > which got VIRQ_MCA via eventchannel binding and handle mcelog error > logging accordingly; > > 2. for vmce injection, dom0 registers trap callback (which re-use kernel > machine check handler) via normal kernel traps init: > arch/x86/kernel/traps.c > --> set_intr_gate_ist() > --> write_idt_entry() > --> xen_write_idt_entry() > --> cvt_gate_to_trap() & HYPERVISOR_set_trap_table() > hypervisor entry.S use the registerred trap callback to construct > bounce back stack for TRAP_machine_check, bounce back to dom0 mce handler. Ok, that's how I understand the code, too. But you say above that w/o this driver Dom0 is still capable to handle vmce. That puzzle's me. Christoph _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |