
Re: [PATCH] x86/HVM: don't mark evtchn upcall vector as pending when vLAPIC is disabled


  • To: Jan Beulich <jbeulich@xxxxxxxx>
  • From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Date: Wed, 23 Nov 2022 13:03:08 +0100
  • Cc: Andrew Cooper <Andrew.Cooper3@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, Paul Durrant <paul@xxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Wed, 23 Nov 2022 12:03:30 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Mon, Nov 21, 2022 at 01:34:38PM +0100, Jan Beulich wrote:
> On 21.11.2022 13:23, Andrew Cooper wrote:
> > On 21/11/2022 08:56, Jan Beulich wrote:
> >> On 18.11.2022 15:27, Andrew Cooper wrote:
> >>> On 18/11/2022 12:54, Jan Beulich wrote:
> >>>> On 18.11.2022 13:33, Andrew Cooper wrote:
> >>>>> On 18/11/2022 10:31, Jan Beulich wrote:
> >>>>>> Linux's relatively new use of HVMOP_set_evtchn_upcall_vector has
> >>>>>> exposed a problem with the marking of the respective vector as
> >>>>>> pending: For quite some time Linux has been checking whether any stale
> >>>>>> ISR or IRR bits would still be set while preparing the LAPIC for use.
> >>>>>> This check is now triggering on the upcall vector, as the registration,
> >>>>>> at least for APs, happens before the LAPIC is actually enabled.
> >>>>>>
> >>>>>> In software-disabled state an LAPIC would not accept any interrupt
> >>>>>> requests and hence no IRR bit would newly become set while in this
> >>>>>> state. As a result it is also wrong for us to mark the upcall vector as
> >>>>>> having a pending request when the vLAPIC is in this state.
> >>>>> I agree with this.
> >>>>>
> >>>>>> To compensate for the "enabled" check added to the assertion logic, add
> >>>>>> logic to (conditionally) mark the upcall vector as having a request
> >>>>>> pending at the time the LAPIC is being software-enabled by the guest.
> >>>>> But this, I don't think is appropriate.
> >>>>>
> >>>>> The point of raising on enable is allegedly to work around setup race
> >>>>> conditions.  I'm unconvinced by this reasoning, but it is what it is,
> >>>>> and the stated behaviour is to raise there and then.
> >>>>>
> >>>>> If a guest enables evtchn while the LAPIC is disabled, then the
> >>>>> interrupt is lost.  Like every other interrupt in an x86 system.
> >>>> Edge triggered ones you mean, I suppose, but yes.
> >>> For IO-APIC systems, you mostly lose line interrupts too, don't you?
> >>>
> >>> The line will remain pending at the IO-APIC, but nothing in the system
> >>> will unwedge until someone polls the IO-APIC.
> >>>
> >>> Either way...
> >>>
> >>>>> I don't think there is any credible way a guest kernel author can expect
> >>>>> the weird evtchn edge case to wait for an arbitrary point in the future,
> >>>>> and it's a corner case that I think is worth not keeping.
> >>>> Well - did you look at 7b5b8ca7dffd ("x86/upcall: inject a spurious event
> >>>> after setting upcall vector"), referenced by the Fixes: tag? The issue is
> >>>> that with evtchn_upcall_pending once set, there would never again be a
> >>>> notification.
> >>> Ok, so we do need to do something.
> >>>
> >>>>  So if what you say is to be the model we follow, then that
> >>>> earlier change was perhaps wrong as well. Instead it should then have
> >>>> been a guest change (as also implicit from your reply) to clear
> >>>> evtchn_upcall_pending after vCPU info registration (there) or LAPIC
> >>>> enabling (here), perhaps by way of "manually" invoking the handling of
> >>>> that pending event, or by issuing a self-IPI with that vector.
> >>>> Especially the LAPIC enabling case would then be yet another Xen-specific
> >>>> wrinkle on a guest code path which ideally wouldn't have to be aware of Xen.
> >>> Without trying to prescribe how to fix this specific issue, wherever
> >>> possible we should be trying to keep Xen-isms out of non-Xen code paths.
> >>> There's a whole lot of poorly described and surprising behaviours which
> >>> have not stood the test of time.
> >>>
> >>> In this case, it seems that we have yet another x86 PV-ism which hasn't
> >>> translated well to x86 HVM.  Specifically, we're trying to overlay an
> >>> entirely shared-memory (and delayed return-to-guest) interrupt
> >>> controller onto one which is properly constructed to handle events in
> >>> realtime.
> >>>
> >>>
> >>> I even got as far as writing that maybe leaving it as-is was the best
> >>> option (principle of least surprise for Xen developers), but our
> >>> "friend" apic acceleration strikes again.
> >>>
> >>> Xen doesn't always get a VMExit when the guest clears SW_DISABLE,
> >>> because microcode may have accelerated it.
> >> But as per "APIC-Write Emulation" in the SDM we'd still get an APIC-write
> >> VM exit.
> > 
> > Intel isn't the only accelerated implementation, and there are future
> > details not in the public docs.
> > 
> > There will be an implementation we will want to support where Xen
> > doesn't get a vmexit for a write to SPIV.
> 
> I see.
> 
> >> If we didn't, how would our internal accounting of APIC enabled
> >> state (VLAPIC_SW_DISABLED) work?
> > 
> > It doesn't.
> > 
> > One of many problems on the "known errors" list from an incomplete
> > original attempt to get acceleration working.
> > 
> > There's no good reason to cache those disables in the first place (both
> > are trivially available from other positions in memory), and
> > correctness reasons not to.
> > 
> >>  And the neighboring (to where I'm adding
> >> the new code) pt_may_unmask_irq() call then also wouldn't occur.
> >>
> >> I'm actually pretty sure we do too much in this case - in particular none
> >> of the vlapic_set_reg() should be necessary. But we certainly can't get
> >> away with doing nothing, and hence we depend on that VM exit to actually
> >> occur.
> > 
> > We must do exactly and only what real hardware does, so that the state
> > changes emulated by Xen are identical to those accelerated by microcode.
> > 
> > Other than that, I really wouldn't make any presumptions about the
> > existing vLAPIC logic being correct.
> > 
> > It is, at best, enough of an approximation to the spec for major OSes to
> > function, with multiple known bugs and no coherent testing.
> 
> But can we then leave resolving the wider issue separate, and keep
> the change here as it presently is? Yes, mimicking the same behavior
> later may be "interesting", but if we can't achieve consistent behavior
> with yet more advanced acceleration, maybe we simply can't use that
> (perhaps until a complete overhaul of everything involved in LAPIC
> handling, possibly including a guest side indicator that they're happy
> without the extra signaling, at which point that yet-more-advanced
> acceleration could then be enabled for that guest).
> 
> Otherwise - do you have any suggestion as to alternative logic which I
> might use in this patch?

Maybe the underlying issue is that we allow executing
HVMOP_set_evtchn_upcall_vector against a remote vCPU.  This could be
solved by only allowing HVMOP_set_evtchn_upcall_vector against the
current vCPU, and only with the LAPIC enabled, but I guess we are too
late for that.
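
A minimal sketch of that stricter policy, using hypothetical simplified
types (the real Xen structures and error paths are richer than this;
the names here are illustrative only):

```c
#include <errno.h>
#include <stdbool.h>

/* Toy stand-in for the relevant vCPU state. */
struct vcpu {
    unsigned int id;
    bool vlapic_sw_enabled;     /* LAPIC not software-disabled via SPIV */
    unsigned int upcall_vector;
};

/*
 * Sketch of the stricter registration suggested above: refuse the
 * hypercall unless it targets the current vCPU and that vCPU's LAPIC
 * is already software-enabled.
 */
static int set_upcall_vector_strict(const struct vcpu *curr,
                                    struct vcpu *target, unsigned int vector)
{
    if ( target->id != curr->id )
        return -EPERM;          /* no remote-vCPU registration */

    if ( !target->vlapic_sw_enabled )
        return -EINVAL;         /* LAPIC must be enabled first */

    target->upcall_vector = vector;
    return 0;
}
```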

Actually, what about only injecting the spurious event if the vCPU is
online, i.e.:

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index ae4368ec4b..4b84706579 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -4105,7 +4105,8 @@ static int hvmop_set_evtchn_upcall_vector(
     printk(XENLOG_G_INFO "%pv: upcall vector %02x\n", v, op.vector);
 
     v->arch.hvm.evtchn_upcall_vector = op.vector;
-    hvm_assert_evtchn_irq(v);
+    if ( is_vcpu_online(v) )
+        hvm_assert_evtchn_irq(v);
     return 0;
 }
 

I think that should fix the issue seen on Linux.  We would
additionally need to fix hvm_assert_evtchn_irq() to only set the
vector if the vLAPIC is enabled.
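
A self-contained model of the combined behaviour, again with
hypothetical simplified types (the real struct vcpu, vlapic state and
IRR handling live elsewhere in Xen):

```c
#include <stdbool.h>

/* Toy stand-in for the relevant vCPU/vLAPIC state. */
struct vcpu {
    bool online;
    bool vlapic_sw_enabled;     /* LAPIC not software-disabled via SPIV */
    unsigned int upcall_vector;
    bool irr_pending;           /* stands in for the vector's IRR bit */
};

/*
 * hvm_assert_evtchn_irq() sketch: only latch the upcall vector into
 * IRR when the vLAPIC is enabled; otherwise the event is dropped,
 * matching how a software-disabled LAPIC refuses interrupt requests.
 */
static void assert_evtchn_irq(struct vcpu *v)
{
    if ( v->vlapic_sw_enabled )
        v->irr_pending = true;
}

/*
 * hvmop_set_evtchn_upcall_vector() sketch with the proposed change:
 * record the vector, but skip the spurious injection for offline vCPUs.
 */
static int set_evtchn_upcall_vector(struct vcpu *v, unsigned int vector)
{
    v->upcall_vector = vector;
    if ( v->online )
        assert_evtchn_irq(v);
    return 0;
}
```

With both checks in place, an AP registering its vector before being
brought online (the Linux case above) no longer ends up with a stale
IRR bit, while an online vCPU with an enabled vLAPIC still receives
the spurious event.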

Thanks, Roger.
