RE: [Xen-devel] [PATCH v2] x86/hvm: re-work viridian APIC assist code
> -----Original Message-----
> From: Xen-devel <xen-devel-bounces@xxxxxxxxxxxxxxxxxxxx> On Behalf Of David Woodhouse
> Sent: 11 August 2020 14:25
> To: Paul Durrant <paul.durrant@xxxxxxxxxx>; xen-devel@xxxxxxxxxxxxxxxxxxxx; Roger Pau Monne <roger.pau@xxxxxxxxxx>
> Cc: Eslam Elnikety <elnikety@xxxxxxxxx>; Andrew Cooper <andrew.cooper3@xxxxxxxxxx>; Shan Haitao <haitao.shan@xxxxxxxxx>; Jan Beulich <jbeulich@xxxxxxxx>
> Subject: Re: [Xen-devel] [PATCH v2] x86/hvm: re-work viridian APIC assist code
>
> Resending this straw man patch at Roger's request, to restart discussion.
>
> Redux: In order to cope with the relatively rare case of unmaskable
> legacy MSIs, each vlapic EOI takes a domain-global spinlock just to
> iterate over all IRQs and determine that there's actually nothing to
> do.
>
> In my testing, I observe that this drops Windows performance on passed-
> through NVMe from 2.2M IOPS down to about 1.0M IOPS.
>
> I have a variant of this patch which just has a single per-domain "I
> attached legacy unmaskable MSIs" flag, which is never cleared. The
> patch below is per-vector (but Roger points out it should be per-vCPU
> per-vector). I don't know that we really care enough to do more than
> the single per-domain flag, which in real life would never happen
> anyway unless you have crappy hardware, at which point you get back to
> today's status quo.
>
> My main concern is that this code is fairly sparsely documented and I'm
> only 99% sure that this code path really *is* only for unmaskable MSIs,
> and doesn't have some other esoteric use. A second opinion on that
> would be particularly welcome.
>

The loop appears to be there to handle the case where multiple devices
assigned to a domain have MSIs programmed with the same dest/vector...
which seems like an odd thing for a guest to do, but I guess it is at
liberty to do it. Does it matter whether they are maskable or not?

  Paul

> (NB: APIC assist was a red herring here, except that it had a bug which
> stopped the gratuitous EOI work from ever happening at all — which
> nobody ever cared about since it doesn't matter on sane hardware, but
> then when Paul *fixed* it, we saw it as a performance regression.)
>
>
> On Sat, 2018-08-25 at 00:38 +0100, David Woodhouse wrote:
> > On Thu, 2018-01-18 at 10:10 -0500, Paul Durrant wrote:
> > > Lastly the previous code did not properly emulate an EOI if a missed EOI
> > > was discovered in vlapic_has_pending_irq(); it merely cleared the bit in
> > > the ISR. The new code instead calls vlapic_EOI_set().
> >
> > Hm, this *halves* my observed performance running a 32-thread
> > 'diskspd.exe' on a Windows box with attached NVMe devices, which makes
> > me sad.
> >
> > It's the call to hvm_dpci_msi_eoi() that does it.
> >
> > Commenting out the call to pt_pirq_iterate() and leaving *just* the
> > domain-global spinlock bouncing cache lines between all my CPUs, it's
> > already down to 1.6M IOPS from 2.2M on my test box before it does
> > *anything* at all.
> >
> > Calling an *inline* version of pt_pirq_iterate so no retpoline for the
> > indirect calls, and I'm down to 1.1M even when I've nopped out the
> > whole of the _hvm_dpci_msi_eoi function that it's calling. Put it all
> > back, and I'm down to about 1.0M. So it's worse than halved.
> >
> > And what's all this for? The code here is making my eyes bleed but I
> > believe it's for unmaskable MSIs, and these aren't unmaskable.
> >
> > Tempted to make it all go away by having a per-domain bitmap of vectors
> > for which all this work is actually required, and bypassing the whole
> > bloody lot in hvm_dpci_msi_eoi() if the corresponding bit in that
> > bitmap isn't set.
> >
> > The hackish version of that (which seems to work, but would probably
> > want testing with an actual unmaskable MSI in the system, and I have
> > absolutely no confidence I understand what's going on here) looks
> > something like this:
> >
> > diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
> > index bab3aa3..24df008 100644
> > --- a/xen/drivers/passthrough/io.c
> > +++ b/xen/drivers/passthrough/io.c
> > @@ -24,6 +24,7 @@
> >  #include <asm/hvm/irq.h>
> >  #include <asm/hvm/support.h>
> >  #include <asm/io_apic.h>
> > +#include <asm/msi.h>
> >
> >  static DEFINE_PER_CPU(struct list_head, dpci_list);
> >
> > @@ -282,6 +283,7 @@ int pt_irq_create_bind(
> >      struct hvm_pirq_dpci *pirq_dpci;
> >      struct pirq *info;
> >      int rc, pirq = pt_irq_bind->machine_irq;
> > +    irq_desc_t *desc;
> >
> >      if ( pirq < 0 || pirq >= d->nr_pirqs )
> >          return -EINVAL;
> > @@ -422,6 +425,13 @@ int pt_irq_create_bind(
> >
> >          dest_vcpu_id = hvm_girq_dest_2_vcpu_id(d, dest, dest_mode);
> >          pirq_dpci->gmsi.dest_vcpu_id = dest_vcpu_id;
> > +        BUG_ON(!local_irq_is_enabled());
> > +        desc = pirq_spin_lock_irq_desc(info, NULL);
> > +        if ( desc && desc->msi_desc && !msi_maskable_irq(desc->msi_desc) )
> > +            set_bit(pirq_dpci->gmsi.gvec,
> > +                    hvm_domain_irq(d)->unmaskable_msi_vecs);
> > +        spin_unlock_irq(&desc->lock);
> > +
> >          spin_unlock(&d->event_lock);
> >
> >          pirq_dpci->gmsi.posted = false;
> > @@ -869,7 +874,8 @@ static int _hvm_dpci_msi_eoi(struct domain *d,
> >
> >  void hvm_dpci_msi_eoi(struct domain *d, int vector)
> >  {
> > -    if ( !iommu_enabled || !hvm_domain_irq(d)->dpci )
> > +    if ( !iommu_enabled || !hvm_domain_irq(d)->dpci ||
> > +         !test_bit(vector, hvm_domain_irq(d)->unmaskable_msi_vecs) )
> >          return;
> >
> >      spin_lock(&d->event_lock);
> > diff --git a/xen/include/asm-x86/hvm/irq.h b/xen/include/asm-x86/hvm/irq.h
> > index 8a43cb9..d9d4652 100644
> > --- a/xen/include/asm-x86/hvm/irq.h
> > +++ b/xen/include/asm-x86/hvm/irq.h
> > @@ -78,6 +78,7 @@ struct hvm_irq {
> >      u8 round_robin_prev_vcpu;
> >
> >      struct hvm_irq_dpci *dpci;
> > +    DECLARE_BITMAP(unmaskable_msi_vecs, 256);
> >
> >      /*
> >       * Number of wires asserting each GSI.
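For comparison, the simpler "single per-domain flag" variant David describes above (set once when a non-maskable MSI is bound, never cleared) would only need something along the lines of the sketch below. The field name has_unmaskable_msi is assumed here for illustration, and the hook points simply mirror the bitmap version in the patch:

    /* In struct hvm_irq (xen/include/asm-x86/hvm/irq.h), instead of the
     * 256-entry bitmap, a single flag (name assumed for illustration): */
    bool has_unmaskable_msi;

    /* In pt_irq_create_bind() (xen/drivers/passthrough/io.c), at the point
     * where the bitmap version calls set_bit() under the irq_desc lock: */
    if ( desc && desc->msi_desc && !msi_maskable_irq(desc->msi_desc) )
        hvm_domain_irq(d)->has_unmaskable_msi = true;

    /* In hvm_dpci_msi_eoi(), the early exit then tests the flag rather
     * than a per-vector bit: */
    if ( !iommu_enabled || !hvm_domain_irq(d)->dpci ||
         !hvm_domain_irq(d)->has_unmaskable_msi )
        return;
    /* ... existing pt_pirq_iterate() walk under d->event_lock ... */

The trade-off, as the mail notes, is that once any unmaskable MSI has ever been bound the domain permanently falls back to today's behaviour, whereas the per-vector (or per-vCPU per-vector) bitmap keeps the fast path for every other vector.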