Xen project Mailing List

Re: [PATCH] x86/HVM: don't mark evtchn upcall vector as pending when vLAPIC is disabled

From: Andrew Cooper <Andrew.Cooper3@xxxxxxxxxx>

Date: Mon, 21 Nov 2022 12:23:52 +0000

Accept-language: en-GB, en-US

Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=citrix.com; dmarc=pass action=none header.from=citrix.com; dkim=pass header.d=citrix.com; arc=none

Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=oVL7tfPvhqZHEWr4k0UwNBS6LYjPQVIpkkPufNfNWCQ=; b=NMcuS8tKLWo97lFEdCXpeGDYaA6lTIUgnN2GiUTiKpRF7HGwNdrcswlVS2XqwnAevoDOHPzK3Vj5gWc/0nOz6mgzQ+T1K5emPWaJRYaEmEo4Qk8+hMXFUoVsp/JtGICpbgfjgUeVwAsBjIR1IJxV8eeBohOLo7YSHIIdHisbA3xPuceoQGdEFs3dUHhTEckjRkZeu68PGrfI7AQYiWUglB422AkOBZHoIrFlqXzvttRVkJU0ThMHXqf5vdobXAtsjrelRASdjOti6ET1jKpEza0RbwdN+w3ux+T5NPHduf5WWg+Ty7K/GTIVtFsq0gwhJ9wIzwL5YoRQfivsg57MDA==

Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=MLZ1T6aU5w7jxtBojLBVo0tKfBIM6ZWEKkvjVlPulZecYgMSMu/T/yfFtjnLErf69FBjQbCIKjZLwRXGBTDCcnXnrxp76AD3utEUKCMAJWV4f0rdWJHhj2IP/KszkhWWRr5V0K32UMtF0g1Aukn7Xm0/nqHUzir6h1R9ALvChgmSvrfiIlSFGANNm68jrMOHAutveXhDD5OctOhXUdAx9NmYJjQk6xoeDurghBFhmVN7koCzTNObyT5K3yG3YxGQKFCt74TZOzE56IaAe+cZT/i1oxz5iZT17LdqXwVE8jNwrU2FLe7mZ3HABqqPdxZt/VaBgJsRvjKxjEt/T10emQ==

Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=citrix.com;

Cc: Wei Liu <wl@xxxxxxx>, Roger Pau Monne <roger.pau@xxxxxxxxxx>, Paul Durrant <paul@xxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>

Delivery-date: Mon, 21 Nov 2022 12:24:07 +0000

Ironport-data: A9a23:CpfDY6u5bWo5utqUaNebnlR0rOfnVGdfMUV32f8akzHdYApBsoF/q tZmKWDTa6yKMTHxetx2PNmw9EkH6J7Qx4BhTQprrX8wRnsR+JbJXdiXEBz9bniYRiHhoOCLz O1FM4Wdc5pkJpP4jk3wWlQ0hSAkjclkfpKlVKiffHg0HVU/IMsYoUoLs/YjhYJ1isSODQqIu Nfjy+XSI1bg0DNvWo4uw/vrRChH4bKj5lv0gnRkPaoR5QaEziFMZH4iDfrZw0XQE9E88tGSH 44v/JnhlkvF8hEkDM+Sk7qTWiXmlZaLYGBiIlIPM0STqkAqSh4ai87XB9JFAatjsB2bnsgZ9 Tl4ncfYpTHFnEH7sL91vxFwS0mSNEDdkVPNCSDXXce7lyUqf5ZwqhnH4Y5f0YAwo45K7W9yG fMwNm8yNR2nl6GK64mRYO5Lids7M+TABdZK0p1g5Wmx4fcOZ7nmG/+Pz/kBmTA6i4ZJAOrUY NcfZXx3dhPcbhZTO1ARTpUjgOOvgXq5eDpdwL6XjfNvvy6Pk0osif6xa7I5efTTLSlRtm+eq njL4CLSBRYCOcbE4TGE7mitlqnEmiaTtIc6RODmqKMy2wz7Kmo7VjgKTgGBsfqCj22lSf1iB 3U4xDEVsv1nnKCsZpynN/Gim1aGtBMBX9tbE8Uh9RqAjKHT5m6xGWwsXjNHLts8u6ceXTU30 neZktWvAiZg2JWFRHTY+rqKoDeaPSkOMXREdSICVREC4dTovMc0lB2nczp4OKu8j9mwEzepx TmP9HI6n+9L0ZVN0Lin91fahT7qvoLOUgM++gTQWCSi8x99Y4mmIYev7DA38Mp9EWpQdXHZ1 FBspiRUxLlm4U2l/MBVfNgwIQ==

Ironport-hdrordr: A9a23:stulDKDbxes4vTDlHeiEsseALOsnbusQ8zAXPh9KJCC9I/bzqy nxpp8mPEfP+U0ssHFJo6HiBEEZKUmsuKKdkrNhR4tKOzOW9FdATbsSp7cKpgeNJ8SQzJ876U 4NSclD4ZjLfCBHZKXBkUaF+rQbsb+6GcmT7I+woUuFDzsaEp2IhD0JaDpzZ3cGIDWucqBJca Z0iPAmmxOQPVAsKuirDHgMWObO4/XNiZLdeBYDQzI39QWUijusybjiVzyVxA0XXT9jyaortT GtqX252oyT99WAjjPM3W7a6Jpb3PPn19t4HcSJzuQFNzn2jQ6sRYJ5H5mPpio8ru2D4Esj1P PMvxAjFcJu7G65RBD6nTLdny3blBo+4X7rzlGVxVH5p9bieT48A81dwapEbxrw8SMbzZJB+Z MO+1jcm4tcDBvGkii4zcPPTQtWmk29pmdnufIPjkZYTZAVZNZq3M4iFQJuYdI99RDBmcca+d pVfYfhDTFtAAqnhkXizy1SKRqXLywO91m9MxM/U4euokVrdThCvjclLYok7zc9HdsGOud5D6 6vCNUWqJheCsARdq5zH+EHXI++DXHMWwvFNCaILU3gD7xvAQOFl3fb2sRD2AiRQu1/8LIi3J DaFF9Iv287fEzjTcWIwZ1Q6xjIBGGwRy7kxM1S74Vw/uSUfsuhDQSTDFQ118ewqfQWBcPWH/ 61JZJNGvfmaW/jA5xA0QHyU4RbbXMeTMoWsNAmXE/mmLOCFqT68ujANPrDLrvkFjgpHmv5H3 sYRTD2YN5N60i6M0WI9CQ5m0mdD3AX0agAY5QypdJjubTlHrc8wjQ9mBC++tyBLyFEv+g/YF Z+SYmX4J+GmQ==

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

Thread-index: AQHY+zjv47cM7GZ23kWVhiI/91aGOa5EnVaAgAAGB4CAABnfgIAEWpmAgAA59AA=

Thread-topic: [PATCH] x86/HVM: don't mark evtchn upcall vector as pending when vLAPIC is disabled

On 21/11/2022 08:56, Jan Beulich wrote: > On 18.11.2022 15:27, Andrew Cooper wrote: >> On 18/11/2022 12:54, Jan Beulich wrote: >>> On 18.11.2022 13:33, Andrew Cooper wrote: >>>> On 18/11/2022 10:31, Jan Beulich wrote: >>>>> Linux'es relatively new use of HVMOP_set_evtchn_upcall_vector has >>>>> exposed a problem with the marking of the respective vector as >>>>> pending: For quite some time Linux has been checking whether any stale >>>>> ISR or IRR bits would still be set while preparing the LAPIC for use. >>>>> This check is now triggering on the upcall vector, as the registration, >>>>> at least for APs, happens before the LAPIC is actually enabled. >>>>> >>>>> In software-disabled state an LAPIC would not accept any interrupt >>>>> requests and hence no IRR bit would newly become set while in this >>>>> state. As a result it is also wrong for us to mark the upcall vector as >>>>> having a pending request when the vLAPIC is in this state. >>>> I agree with this. >>>> >>>>> To compensate for the "enabled" check added to the assertion logic, add >>>>> logic to (conditionally) mark the upcall vector as having a request >>>>> pending at the time the LAPIC is being software-enabled by the guest. >>>> But this, I don't think is appropriate. >>>> >>>> The point of raising on enable is allegedly to work around setup race >>>> conditions. I'm unconvinced by this reasoning, but it is what it is, >>>> and the stated behaviour is to raise there and then. >>>> >>>> If a guest enables evtchn while the LAPIC is disabled, then the >>>> interrupt is lost. Like every other interrupt in an x86 system. >>> Edge triggered ones you mean, I suppose, but yes. >> For IO-APIC systems, you mostly lose line interrupts too, don't you? >> >> The line will remain pending at the IO-APIC, but nothing in the system >> will unwedge until someone polls the IO-APIC. >> >> Either way... >> >>>> I don't think there is any credible way a guest kernel author can expect >>>> the weird evtchn edgecase to wait for an arbitrary point in the future, >>>> and it's a corner case that I think is worth not keeping. >>> Well - did you look at 7b5b8ca7dffd ("x86/upcall: inject a spurious event >>> after setting upcall vector"), referenced by the Fixes: tag? The issue is >>> that with evtchn_upcall_pending once set, there would never again be a >>> notification. >> Ok, so we do need to do something. >> >>> So if what you say is to be the model we follow, then that >>> earlier change was perhaps wrong as well. Instead it should then have >>> been a guest change (as also implicit from your reply) to clear >>> evtchn_upcall_pending after vCPU info registration (there) or LAPIC >>> enabling (here), perhaps by way of "manually" invoking the handling of >>> that pending event, or by issuing a self-IPI with that vector. >>> Especially the LAPIC enabling case would then be yet another Xen-specific >>> on a guest code path which better wouldn't have to be aware of Xen. >> Without trying to prescribe how to fix this specific issue, wherever >> possible we should be trying to limit the Xen-isms from non-Xen areas. >> There's a whole lot of poorly described and surprising behaviours which >> have not stood the test of time. >> >> In this case, it seems that we have yet another x86 PV-ism which hasn't >> translated well x86 HVM. Specifically, we're trying to overlay an >> entirely shared-memory (and delayed return-to-guest) interrupt >> controller onto one which is properly constructed to handle events in >> realtime. >> >> >> I even got as far as writing that maybe leaving it as-is was the best >> option (principle of least surprise for Xen developers), but our >> "friend" apic acceleration strikes again. >> >> Xen doesn't always get a VMExit when the guest clears SW_DISABLE, >> because microcode may have accelerated it. > But as per "APIC-Write Emulation" in the SDM we'd still get an APIC-write > VM exit. Intel isn't the only accelerated implementation, and there future details not in the public docs. There will be an implementation we will want to support where Xen doesn't get a vmexit for a write to SPIV. > If we didn't, how would our internal accounting of APIC enabled > state (VLAPIC_SW_DISABLED) work? It doesn't. One of many problems on the "known errors" list from an incomplete original attempt to get acceleration working. There's no good reason to cache those disables in the first place (both are both trivially available from other positions in memory), and correctness reasons not to. > And the neighboring (to where I'm adding > the new code) pt_may_unmask_irq() call then also wouldn't occur. > > I'm actually pretty sure we do too much in this case - in particular none > of the vlapic_set_reg() should be necessary. But we certainly can't get > away with doing nothing, and hence we depend on that VM exit to actually > occur. We must do exactly and only what real hardware does, so that the state changes emulated by Xen are identical to those accelerated by microcode. Other than that, I really wouldn't make any presumptions about the existing vLAPIC logic being correct. It is, at best, enough of an approximation to the spec for major OSes to function, with multiple known bugs and no coherent testing. ~Andrew

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.