Re: x86/vmx: Don't spuriously crash the domain when INIT is received
On 25.02.2022 14:51, Marek Marczykowski-Górecki wrote:
> On Fri, Feb 25, 2022 at 02:19:39PM +0100, Jan Beulich wrote:
>> On 25.02.2022 13:28, Andrew Cooper wrote:
>>> On 25/02/2022 08:44, Jan Beulich wrote:
>>>> On 24.02.2022 20:48, Andrew Cooper wrote:
>>>>> In VMX operation, the handling of INIT IPIs is changed.
>>>>> EXIT_REASON_INIT has nothing to do with the guest in question; it
>>>>> simply signals that an INIT was received.
>>>>>
>>>>> Ignoring the INIT is probably the wrong thing to do, but is
>>>>> helpful for debugging.  Crashing the domain which happens to be
>>>>> in context is definitely wrong.  Print an error message and
>>>>> continue.
>>>>>
>>>>> Discovered as collateral damage from when an AP triple faults on
>>>>> S3 resume on Intel TigerLake platforms.
>>>> I'm afraid I don't follow the scenario, which was (only) outlined
>>>> in patch 1: Why would the BSP receive INIT in this case?
>>>
>>> SHUTDOWN is a signal emitted by a core when it can't continue.
>>> Triple fault is one cause, but other sources include a double #MC,
>>> etc.
>>>
>>> Some external component, in the PCH I expect, needs to turn this
>>> into a platform reset, because one malfunctioning core can't.  It
>>> is why a triple fault on any logical processor brings the whole
>>> system down.
>>
>> I'm afraid this doesn't answer my question.  Clearly the system
>> didn't shut down.  Hence I still don't see why the BSP would see
>> INIT in the first place.
>>
>>>> And it also cannot be that the INIT was received by the vCPU
>>>> while running on another CPU:
>>>
>>> It's nothing (really) to do with the vCPU.  INIT is an external
>>> signal to the (real) APIC, just like NMI/etc.
>>>
>>> It is the next VMEntry on a CPU which received INIT that suffers a
>>> VMEntry failure, and the VMEntry failure has nothing to do with the
>>> contents of the VMCS.
>>>
>>> Importantly for Xen however, this isn't applicable when scheduling
>>> PV vCPUs, which is why dom0 wasn't the one that crashed.  This
>>> actually meant that dom0 was alive and usable, albeit sharing all
>>> vCPUs on a single CPU.
>>>
>>> The change in INIT behaviour exists for TXT, where it is critical
>>> that software can clear secrets from RAM before resetting.  I'm
>>> not wanting to get into any of that because it's far more
>>> complicated than I have time to fix.
>>
>> I guess there's something hidden behind what you say here, like
>> INIT only being latched, but this latched state then causing the VM
>> entry failure.  That would mean the INIT was really a signal for
>> the system to shut down / that it is shutting down.  In which case
>> arranging to continue by ignoring the event in VMX looks wrong.
>> Simply crashing the guest would then be wrong as well, of course.
>> We should shut down instead.
>
> A shutdown could be an alternative here, with the remark that it
> would make debugging such issues significantly harder.  Note the
> INIT is delivered to the BSP, but the actual reason (in this case)
> is on some AP.  A shutdown (crash) in this case would prevent the
> (still functioning) BSP from showing you the message (unless you
> have a serial console, which is rather rare in laptops - which are a
> significant target for Qubes OS).

Well, I didn't necessarily mean shutting down silently.  I fully
appreciate the usefulness of getting state dumped out for debugging
of an issue.

>> But I don't think I see the full picture here yet, unless your
>> mentioning of TXT was actually implying that TXT was active at the
>> point of the crash (which I don't think was said anywhere).
>
> No, TXT wasn't (intentionally) active.  I think Andrew mentioned it
> as an explanation of why VMX behaves this way: to let the OS do
> something _if_ TXT is active (checking for TXT is the OS's
> responsibility here).  But that's my guess only...

One part here that I don't understand: How would the OS become aware
of the INIT if it didn't try to enter a VMX guest (i.e. non-root
mode)?

Jan
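For context, the behaviour Andrew's patch description calls for - log
the INIT and keep going, rather than crashing whichever domain happens
to be in context - amounts to something like the sketch below in Xen's
VM-exit dispatcher.  This is a sketch reconstructed from the commit
message only: the exact placement inside vmx_vmexit_handler() and the
message wording are assumptions, not the committed patch.

    /* xen/arch/x86/hvm/vmx/vmx.c, in vmx_vmexit_handler() (sketch) */
    switch ( exit_reason )
    {
        /* ... other exit reasons ... */

    case EXIT_REASON_INIT:
        /*
         * An INIT arrived on this physical CPU.  It has nothing to do
         * with the guest currently in context, so crashing that
         * domain would be wrong.  Ignoring it is probably wrong too
         * (see the TXT discussion above), but is helpful for
         * debugging.
         */
        printk(XENLOG_ERR "Error: INIT received - ignoring\n");
        break;

        /* ... */
    }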