[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH] x86/HVM: don't mark evtchn upcall vector as pending when vLAPIC is disabled
On 18.11.2022 15:27, Andrew Cooper wrote: > On 18/11/2022 12:54, Jan Beulich wrote: >> On 18.11.2022 13:33, Andrew Cooper wrote: >>> On 18/11/2022 10:31, Jan Beulich wrote: >>>> Linux'es relatively new use of HVMOP_set_evtchn_upcall_vector has >>>> exposed a problem with the marking of the respective vector as >>>> pending: For quite some time Linux has been checking whether any stale >>>> ISR or IRR bits would still be set while preparing the LAPIC for use. >>>> This check is now triggering on the upcall vector, as the registration, >>>> at least for APs, happens before the LAPIC is actually enabled. >>>> >>>> In software-disabled state an LAPIC would not accept any interrupt >>>> requests and hence no IRR bit would newly become set while in this >>>> state. As a result it is also wrong for us to mark the upcall vector as >>>> having a pending request when the vLAPIC is in this state. >>> I agree with this. >>> >>>> To compensate for the "enabled" check added to the assertion logic, add >>>> logic to (conditionally) mark the upcall vector as having a request >>>> pending at the time the LAPIC is being software-enabled by the guest. >>> But this, I don't think is appropriate. >>> >>> The point of raising on enable is allegedly to work around setup race >>> conditions. I'm unconvinced by this reasoning, but it is what it is, >>> and the stated behaviour is to raise there and then. >>> >>> If a guest enables evtchn while the LAPIC is disabled, then the >>> interrupt is lost. Like every other interrupt in an x86 system. >> Edge triggered ones you mean, I suppose, but yes. > > For IO-APIC systems, you mostly lose line interrupts too, don't you? > > The line will remain pending at the IO-APIC, but nothing in the system > will unwedge until someone polls the IO-APIC. > > Either way... > >> >>> I don't think there is any credible way a guest kernel author can expect >>> the weird evtchn edgecase to wait for an arbitrary point in the future, >>> and it's a corner case that I think is worth not keeping. >> Well - did you look at 7b5b8ca7dffd ("x86/upcall: inject a spurious event >> after setting upcall vector"), referenced by the Fixes: tag? The issue is >> that with evtchn_upcall_pending once set, there would never again be a >> notification. > > Ok, so we do need to do something. > >> So if what you say is to be the model we follow, then that >> earlier change was perhaps wrong as well. Instead it should then have >> been a guest change (as also implicit from your reply) to clear >> evtchn_upcall_pending after vCPU info registration (there) or LAPIC >> enabling (here), perhaps by way of "manually" invoking the handling of >> that pending event, or by issuing a self-IPI with that vector. >> Especially the LAPIC enabling case would then be yet another Xen-specific >> on a guest code path which better wouldn't have to be aware of Xen. > > Without trying to prescribe how to fix this specific issue, wherever > possible we should be trying to limit the Xen-isms from non-Xen areas. > There's a whole lot of poorly described and surprising behaviours which > have not stood the test of time. > > In this case, it seems that we have yet another x86 PV-ism which hasn't > translated well x86 HVM. Specifically, we're trying to overlay an > entirely shared-memory (and delayed return-to-guest) interrupt > controller onto one which is properly constructed to handle events in > realtime. > > > I even got as far as writing that maybe leaving it as-is was the best > option (principle of least surprise for Xen developers), but our > "friend" apic acceleration strikes again. > > Xen doesn't always get a VMExit when the guest clears SW_DISABLE, > because microcode may have accelerated it. But as per "APIC-Write Emulation" in the SDM we'd still get an APIC-write VM exit. If we didn't, how would our internal accounting of APIC enabled state (VLAPIC_SW_DISABLED) work? And the neighboring (to where I'm adding the new code) pt_may_unmask_irq() call then also wouldn't occur. I'm actually pretty sure we do too much in this case - in particular none of the vlapic_set_reg() should be necessary. But we certainly can't get away with doing nothing, and hence we depend on that VM exit to actually occur. Plus simply making the vlapic_set_reg() conditional also likely wouldn't do any good, so if anything we may want to split vlapic_reg_write() and invoke only the "2nd half" from vlapic_apicv_write(). Jan > A consequence of this observation is that Xen cannot have > non-LAPIC-archtiectural behaviour in the vlapic emulation. So I think > we need to find a solution to this problem that doesn't hook APIC_SPIV. > > ~Andrew
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |