 Re: x86/vmx: Don't spuriously crash the domain when INIT is received
 
To: Jan Beulich <jbeulich@xxxxxxxx>
From: Andrew Cooper <Andrew.Cooper3@xxxxxxxxxx>
Date: Fri, 25 Feb 2022 12:28:32 +0000
Cc: Roger Pau Monne <roger.pau@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, Jun Nakajima <jun.nakajima@xxxxxxxxx>, Kevin Tian <kevin.tian@xxxxxxxxx>, Thiner Logoer <logoerthiner1@xxxxxxx>, Marek Marczykowski-Górecki <marmarek@xxxxxxxxxxxxxxxxxxxxxx>, Xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>
List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

 On 25/02/2022 08:44, Jan Beulich wrote:
> On 24.02.2022 20:48, Andrew Cooper wrote:
>> In VMX operation, the handling of INIT IPIs is changed.  EXIT_REASON_INIT has
>> nothing to do with the guest in question, simply signals that an INIT was
>> received.
>>
>> Ignoring the INIT is probably the wrong thing to do, but is helpful for
>> debugging.  Crashing the domain which happens to be in context is definitely
>> wrong.  Print an error message and continue.
>>
>> Discovered as collateral damage from when an AP triple faults on S3 resume on
>> Intel TigerLake platforms.
> I'm afraid I don't follow the scenario, which was (only) outlined in
> patch 1: Why would the BSP receive INIT in this case?

SHUTDOWN is a signal emitted by a core when it can't continue.  Triple
fault is one cause, but other sources include a double #MC, etc.

Some external component, in the PCH I expect, needs to turn this into a
platform reset, because one malfunctioning core can't.  It is why a
triple fault on any logical processor brings the whole system down.

> And it also cannot be that the INIT was received by the vCPU while running on
> another CPU:

It's nothing (really) to do with the vCPU.  INIT is an external signal to
the (real) APIC, just like NMI/etc.

It is the next VMEntry on a CPU which received INIT that suffers a
VMEntry failure, and the VMEntry failure has nothing to do with the
contents of the VMCS.

Importantly for Xen however, this isn't applicable for scheduling PV
vCPUs, which is why dom0 wasn't the one that crashed.  This actually
meant that dom0 was alive and usable, albeit with all of its vCPUs
sharing a single CPU.

The change in INIT behaviour exists for TXT, where it is critical that
software can clear secrets from RAM before resetting.  I'm not wanting
to get into any of that because it's far more complicated than I have
time to fix.

Xen still ignores the INIT, but now doesn't crash an entirely innocent
domain as a side effect of the platform sending an INIT IPI.
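
For illustration only, the resulting behaviour can be sketched as a toy
model in C.  This is not the patch and not Xen code: struct toy_domain and
toy_handle_vmexit() are hypothetical names, and treating every other exit
as fatal is a simplification.  The only real values are the basic exit
reason numbers, which follow the Intel SDM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Basic VM exit reason numbers, as listed in the Intel SDM. */
#define EXIT_REASON_TRIPLE_FAULT 2
#define EXIT_REASON_INIT         3

/* Hypothetical stand-in for a domain; not Xen's struct domain. */
struct toy_domain {
    int id;
    bool crashed;
};

/*
 * Hypothetical exit handler.  Returns true if the exit was attributed to
 * the guest in context, false if it was unrelated to that guest.
 */
static bool toy_handle_vmexit(struct toy_domain *d, unsigned int reason)
{
    assert(d != NULL);

    switch (reason) {
    case EXIT_REASON_INIT:
        /*
         * INIT arrived at the physical CPU.  The domain which happens to
         * be in context is innocent: complain and carry on.
         */
        fprintf(stderr, "d%d in context: INIT received - ignoring\n", d->id);
        return false;

    default:
        /* Toy simplification: anything else is blamed on the guest. */
        d->crashed = true;
        return true;
    }
}
```

With this model, passing EXIT_REASON_INIT leaves the domain's crashed flag
untouched, whereas an exit which really is the guest's doing (for example
EXIT_REASON_TRIPLE_FAULT) still marks it as crashed.
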
~Andrew
P.S. This is also fun without interrupt remapping.  XSA-3 didn't imagine
the full scope of problems which could occur.
 