Re: [Xen-devel] Woes of NMIs and MCEs, and possibly how to fix
At 17:34 +0000 on 30 Nov (1354296851), Andrew Cooper wrote:
> Hello,
>
> Yesterday, Tim and myself spent a very long time in front of a
> whiteboard trying to develop a fix which covered all the problems, and
> sadly it is very hard.  We managed to possibly come up with a long
> solution which we think has no race conditions, but relies on very
> large sections of reentrant code which can't use the stack or trash
> registers.  As such, it is not practical at all (assuming that any of
> us could actually code it)

For the record, we also came up with a much simpler solution, which I
prefer:
 - The MCE handler should never return to Xen with IRET.
 - The NMI handler should always return with IRET.
 - There should be no faulting code in the NMI or MCE handlers.

That covers all the interesting cases except (3), (4) and (7) below,
and a simple per-cpu {nmi,mce}-in-progress flag will be enough to
detect (and crash) on _almost_ all cases where that bites us (the
other cases will crash less politely from their stacks being smashed).

Even if we go on to build some more bulletproof solution, I think we
should consider implementing that now, as the baseline.
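A minimal sketch of the sort of per-cpu flag I mean, for the NMI side
(untested; it assumes Xen-style helpers such as smp_processor_id(),
NR_CPUS and panic(), and the names are placeholders rather than a real
patch):

    /* One in-progress flag per CPU; a real patch would use the per-cpu
     * infrastructure rather than a flat array. */
    static bool nmi_in_progress[NR_CPUS];

    void do_nmi_sketch(struct cpu_user_regs *regs)
    {
        unsigned int cpu = smp_processor_id();

        if ( nmi_in_progress[cpu] )
            /* Something (a fault, an MCE's iret, SMM) re-enabled NMIs
             * while we were still handling one, so our IST exception
             * frame may already be smashed.  Crash politely. */
            panic("Re-entrant NMI on CPU%u\n", cpu);

        nmi_in_progress[cpu] = true;

        /* ... the actual NMI work, which must not fault ... */

        nmi_in_progress[cpu] = false;
        /* Return with IRET (never sysret) so the NMI latch is cleared. */
    }

The MCE side would look the same, with its own flag.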
Tim.

> As a result, I thought instead that I would outline all the issues we
> currently face.  We can then:
>  * Decide which issues need fixing
>  * Decide which issues need to at least be detected and crash
>    gracefully
>  * Decide which issues we are happy (or perhaps at least willing, if
>    not happy) to ignore
>
> So, the issues are as follows.  (I have tried to list them in a
> logical order, with 1 individual problem per number, but please do
> point out if I have missed/misattributed entries)
>
> 1) Faults on the NMI path will re-enable NMIs before the handler
> returns, leading to reentrant behaviour.  We should audit the NMI path
> to try and remove any needless cases which might fault, but getting a
> fault-free path will be hard (and is not going to solve the reentrant
> behaviour itself).
>
> 2) Faults on the MCE path will re-enable NMIs, as will the iret of the
> MCE itself if an MCE interrupts an NMI.
>
> 3) SMM mode executing an iret will re-enable NMIs.  There is nothing
> we can do to prevent this, and as an SMI can interrupt NMIs and MCEs,
> there is no way to predict if/when it may happen.  The best we can do
> is accept that it might happen, and try to deal with the
> after-effects.
>
> 4) "Fake NMIs" can be caused by hardware with access to the INTR pin
> (very unlikely in modern systems with the LAPIC supporting virtual
> wire mode), or by software executing an `int $0x2`.  This can cause
> the NMI handler to run on the NMI stack, but without the normal
> hardware NMI cessation logic being triggered.
>
> 5) "Fake MCEs" can be caused by software executing `int $0x18`, and by
> any MSI/IOMMU/IOAPIC programmed to deliver vector 0x18.  Normally,
> this could only be caused by a bug in Xen, although it is also
> possible on a system without interrupt remapping.  (Where the host
> administrator has accepted the documented security issue, and decided
> still to pass through a device to a trusted VM, and the VM in question
> has a buggy driver for the passed-through hardware)
>
> 6) Because of interrupt stack tables, real NMIs/MCEs can race with
> their fake alternatives, where the real interrupt interrupts the fake
> one and corrupts the exception frame of the fake one, losing the
> original context to return to.  (This is one of the two core problems
> of reentrancy with NMIs and MCEs)
>
> 7) Real MCEs can race with each other.  If two real MCEs occur too
> close together, the processor shuts down (we can't avoid this).
> However, there is a large race condition between the MCE handler
> clearing the MCIP bit of IA32_MCG_STATUS and the handler returning,
> during which a new MCE can occur and the exception frame will be
> corrupted.
>
>
> In addition to the above issues, we have two NMI-related bugs in Xen
> which need fixing (which shall be part of the series which fixes the
> above)
>
> 8) VMEXIT reason NMI on Intel calls self_nmi() while NMIs are latched,
> causing the PCPU to fall into a loop of VMEXITs until the VCPU
> timeslice has expired, at which point the return-to-guest path decides
> to schedule instead of resuming the guest.
>
> 9) The NMI handler, when returning to ring3, will leave NMIs latched,
> as it uses the sysret path.
>
>
> As for 1 possible solution which we can't use:
>
> If it were not for the sysret stupidness[1] of requiring the
> hypervisor to move to the guest stack before executing the `sysret`
> instruction, we could do away with the stack tables for NMIs and MCEs
> altogether, and the above craziness would be easy to fix.  However,
> the overhead of always using iret to return to ring3 is not likely to
> be acceptable, meaning that we cannot "fix" the problem by discarding
> interrupt stacks and doing everything properly on the main hypervisor
> stack.
>
>
> Looking at the above problems, I believe there is a solution if we are
> willing to ignore the problem to do with SMM re-enabling NMIs, and if
> we are happy to crash gracefully when mixes of NMIs and MCEs interrupt
> each other and trash their exception frames (in situations where we
> could technically fix up correctly), which is based on the Linux NMI
> solution.
>
> As questions to the community - have I missed, or misrepresented any
> points above which might perhaps influence the design of the solution?
> I think the list is complete, but would not be surprised if there is a
> case still not considered yet.
>
> ~Andrew
>
>
> [1] In an effort to prevent a flamewar with my comment, the situation
> we find ourselves in now is almost certainly the result of unforeseen
> interactions of individual features, but we are left to pick up the
> many pieces in a way which can't completely be solved.
>
> --
> Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
> T: +44 (0)1223 225 900, http://www.citrix.com
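P.S. To make the window in point (7) concrete: IA32_MCG_STATUS is MSR
0x17a and MCIP is bit 2, so the tail of an MCE handler ends up looking
roughly like the fragment below (purely illustrative; the names and
the wrmsr wrapper are placeholders, not the real Xen handler):

    #include <stdint.h>

    #define MSR_IA32_MCG_STATUS 0x17a    /* MCIP is bit 2 */

    static inline void wrmsr(uint32_t msr, uint64_t val)
    {
        asm volatile ( "wrmsr" :: "c" (msr), "a" ((uint32_t)val),
                                  "d" ((uint32_t)(val >> 32)) );
    }

    /* Tail of a hypothetical MCE handler. */
    static void mce_handler_tail(void)
    {
        /* Clearing MCIP tells the CPU it may deliver another #MC. */
        wrmsr(MSR_IA32_MCG_STATUS, 0);

        /*
         * The race: a second #MC arriving anywhere between here and the
         * eventual iret is delivered onto the same IST stack and
         * overwrites the exception frame we are about to return through.
         */

        /* ... restore registers and iret ... */
    }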