
RE: [Xen-devel] [PATCH v2] x86/hvm: re-work viridian APIC assist code



> -----Original Message-----
> From: Xen-devel <xen-devel-bounces@xxxxxxxxxxxxxxxxxxxx> On Behalf Of David 
> Woodhouse
> Sent: 11 August 2020 14:25
> To: Paul Durrant <paul.durrant@xxxxxxxxxx>; xen-devel@xxxxxxxxxxxxxxxxxxxx; 
> Roger Pau Monne
> <roger.pau@xxxxxxxxxx>
> Cc: Eslam Elnikety <elnikety@xxxxxxxxx>; Andrew Cooper 
> <andrew.cooper3@xxxxxxxxxx>; Shan Haitao
> <haitao.shan@xxxxxxxxx>; Jan Beulich <jbeulich@xxxxxxxx>
> Subject: Re: [Xen-devel] [PATCH v2] x86/hvm: re-work viridian APIC assist code
> 
> Resending this straw man patch at Roger's request, to restart discussion.
> 
> Redux: In order to cope with the relatively rare case of unmaskable
> legacy MSIs, each vlapic EOI takes a domain-global spinlock just to
> iterate over all IRQs and determine that there's actually nothing to
> do.
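
For reference, the path in question looks roughly like this (paraphrased from
the code the patch below touches, so the shape is right even if it isn't
verbatim): every vlapic EOI in a domain with passed-through devices lands
here, takes the per-domain event_lock and walks every bound pirq, usually
only to discover there is nothing to do.

    void hvm_dpci_msi_eoi(struct domain *d, int vector)
    {
        if ( !iommu_enabled || !hvm_domain_irq(d)->dpci )
            return;

        /* Domain-global lock, bounced between vCPUs on every EOI. */
        spin_lock(&d->event_lock);
        /* Full walk of all bound pirqs, one indirect call per entry. */
        pt_pirq_iterate(d, _hvm_dpci_msi_eoi, (void *)(long)vector);
        spin_unlock(&d->event_lock);
    }
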
> 
> In my testing, I observe that this drops Windows performance on passed-
> through NVMe from 2.2M IOPS down to about 1.0M IOPS.
> 
> I have a variant of this patch which just has a single per-domain "I
> attached legacy unmaskable MSIs" flag, which is never cleared. The
> patch below is per-vector (but Roger points out it should be per-vCPU
> per-vector). I don't know that we really care enough to do more than
> the single per-domain flag, which in real life would never get set
> anyway unless you have crappy hardware, at which point you get back to
> today's status quo.
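
A minimal sketch of that single-flag variant, for the record; the field name
below is made up, everything else mirrors the per-vector patch further down:

    /* pt_irq_create_bind(): remember that a non-maskable MSI was ever bound. */
    if ( desc && desc->msi_desc && !msi_maskable_irq(desc->msi_desc) )
        hvm_domain_irq(d)->has_unmaskable_msi = true;   /* hypothetical field */

    /* hvm_dpci_msi_eoi(): skip the lock and the walk unless that ever happened. */
    if ( !iommu_enabled || !hvm_domain_irq(d)->dpci ||
         !hvm_domain_irq(d)->has_unmaskable_msi )
        return;
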
> 
> My main concern is that this code is fairly sparsely documented and I'm
> only 99% sure that this code path really *is* only for unmaskable MSIs,
> and doesn't have some other esoteric use. A second opinion on that
> would be particularly welcome.
> 

The loop appears to be there to handle the case where multiple devices assigned 
to a domain have MSIs programmed with the same dest/vector... which seems like 
an odd thing for a guest to do but I guess it is at liberty to do it. Does it 
matter whether they are maskable or not? 
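
For what it's worth, the per-pirq callback has roughly the following shape (a
sketch to illustrate the point, with the actual EOI of the machine interrupt
abstracted behind a hypothetical eoi_machine_msi() stand-in):

    static int illustrate_msi_eoi(struct domain *d,
                                  struct hvm_pirq_dpci *pirq_dpci, void *arg)
    {
        int vector = (long)arg;

        /* Ignore bindings that are not MSI or whose guest vector differs. */
        if ( !(pirq_dpci->flags & HVM_IRQ_DPCI_MACH_MSI) ||
             pirq_dpci->gmsi.gvec != vector )
            return 0;

        /*
         * The walk has to visit every binding because several assigned
         * devices may have MSIs programmed with the same dest/vector;
         * there is no direct vector-to-binding lookup.
         */
        eoi_machine_msi(d, pirq_dpci);   /* hypothetical stand-in */
        return 0;
    }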

  Paul

> 
> (NB: APIC assist was a red herring here, except that it had a bug which
> stopped the gratuitous EOI work from ever happening at all — which
> nobody ever cared about since it doesn't matter on sane hardware, but
> then when Paul *fixed* it, we saw it as a performance regression.)
> 
> 
> On Sat, 2018-08-25 at 00:38 +0100, David Woodhouse wrote:
> > On Thu, 2018-01-18 at 10:10 -0500, Paul Durrant wrote:
> > > Lastly the previous code did not properly emulate an EOI if a missed EOI
> > > was discovered in vlapic_has_pending_irq(); it merely cleared the bit in
> > > the ISR. The new code instead calls vlapic_EOI_set().
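
(The practical difference, roughly sketched rather than quoted from the vlapic
code: the old path just dropped the stale ISR bit, whereas a completed EOI
also runs the EOI side effects, which for a passed-through MSI include the
hvm_dpci_msi_eoi() call discussed below.)

    /* Old behaviour, roughly: clear the stale bit in the ISR and stop. */
    vlapic_clear_vector(vector, &vlapic->regs->data[APIC_ISR]);

    /* New behaviour: complete the EOI, side effects and all. */
    vlapic_EOI_set(vlapic);
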
> >
> > Hm, this *halves* my observed performance running a 32-thread
> > 'diskspd.exe' on a Windows box with attached NVME devices, which makes
> > me sad.
> >
> > It's the call to hvm_dpci_msi_eoi() that does it.
> >
> > Commenting out the call to pt_pirq_iterate() and leaving *just* the
> > domain-global spinlock bouncing cache lines between all my CPUs, it's
> > already down to 1.6M IOPS from 2.2M on my test box before it does
> > *anything* at all.
> >
> > Calling an *inline* version of pt_pirq_iterate so no retpoline for the
> > indirect calls, and I'm down to 1.1M even when I've nopped out the
> > whole of the _hvm_dpci_msi_eoi function that it's calling. Put it all
> > back, and I'm down to about 1.0M. So it's worse than halved.
> >
> > And what's all this for? The code here is making my eyes bleed but I
> > believe it's for unmaskable MSIs, and these aren't unmaskable.
> >
> > Tempted to make it all go away by having a per-domain bitmap of vectors
> > for which all this work is actually required, and bypassing the whole
> > bloody lot in hvm_dpci_msi_eoi() if the corresponding bit in that
> > bitmap isn't set.
> >
> > The hackish version of that (which seems to work, but would probably
> > want testing with an actual unmaskable MSI in the system, and I have
> > absolutely no confidence I understand what's going on here) looks
> > something like this:
> >
> > diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
> > index bab3aa3..24df008 100644
> > --- a/xen/drivers/passthrough/io.c
> > +++ b/xen/drivers/passthrough/io.c
> > @@ -24,6 +24,7 @@
> >  #include <asm/hvm/irq.h>
> >  #include <asm/hvm/support.h>
> >  #include <asm/io_apic.h>
> > +#include <asm/msi.h>
> >
> >  static DEFINE_PER_CPU(struct list_head, dpci_list);
> >
> > @@ -282,6 +283,7 @@ int pt_irq_create_bind(
> >      struct hvm_pirq_dpci *pirq_dpci;
> >      struct pirq *info;
> >      int rc, pirq = pt_irq_bind->machine_irq;
> > +    irq_desc_t *desc;
> >
> >      if ( pirq < 0 || pirq >= d->nr_pirqs )
> >          return -EINVAL;
> > @@ -422,6 +425,13 @@ int pt_irq_create_bind(
> >
> >          dest_vcpu_id = hvm_girq_dest_2_vcpu_id(d, dest, dest_mode);
> >          pirq_dpci->gmsi.dest_vcpu_id = dest_vcpu_id;
> > +        BUG_ON(!local_irq_is_enabled());
> > +        desc = pirq_spin_lock_irq_desc(info, NULL);
> > +        if ( desc && desc->msi_desc && !msi_maskable_irq(desc->msi_desc) )
> > +            set_bit(pirq_dpci->gmsi.gvec,
> > +                    hvm_domain_irq(d)->unmaskable_msi_vecs);
> > +        spin_unlock_irq(&desc->lock);
> > +
> >          spin_unlock(&d->event_lock);
> >
> >          pirq_dpci->gmsi.posted = false;
> > @@ -869,7 +874,8 @@ static int _hvm_dpci_msi_eoi(struct domain *d,
> >
> >  void hvm_dpci_msi_eoi(struct domain *d, int vector)
> >  {
> > -    if ( !iommu_enabled || !hvm_domain_irq(d)->dpci )
> > +    if ( !iommu_enabled || !hvm_domain_irq(d)->dpci ||
> > +         !test_bit(vector, hvm_domain_irq(d)->unmaskable_msi_vecs) )
> >         return;
> >
> >      spin_lock(&d->event_lock);
> > diff --git a/xen/include/asm-x86/hvm/irq.h b/xen/include/asm-x86/hvm/irq.h
> > index 8a43cb9..d9d4652 100644
> > --- a/xen/include/asm-x86/hvm/irq.h
> > +++ b/xen/include/asm-x86/hvm/irq.h
> > @@ -78,6 +78,7 @@ struct hvm_irq {
> >      u8 round_robin_prev_vcpu;
> >
> >      struct hvm_irq_dpci *dpci;
> > +    DECLARE_BITMAP(unmaskable_msi_vecs, 256);
> >
> >      /*
> >       * Number of wires asserting each GSI.
> > _______________________________________________
> > Xen-devel mailing list
> > Xen-devel@xxxxxxxxxxxxxxxxxxxx
> > https://lists.xenproject.org/mailman/listinfo/xen-devel