[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v3 2/2] x86/iommu: avoid MSI address and data writes if IRT index hasn't changed



On Mon, Mar 10, 2025 at 11:51:09AM +0100, Jan Beulich wrote:
> On 10.03.2025 10:55, Roger Pau Monne wrote:
> > Attempt to reduce the MSI entry writes, and the associated checking whether
> > memory decoding and MSI-X is enabled for the PCI device, when the MSI data
> > hasn't changed.
> > 
> > When using Interrupt Remapping the MSI entry will contain an index into
> > the remapping table, and it's in such remapping table where the MSI vector
> > and destination CPU is stored.  As such, when using interrupt remapping,
> > changes to the interrupt affinity shouldn't result in changes to the MSI
> > entry, and the MSI entry update can be avoided.
> > 
> > Signal from the IOMMU update_ire_from_msi hook whether the MSI data or
> > address fields have changed, and thus need writing to the device registers.
> > Such signaling is done by returning 1 from the function.  Otherwise
> > returning 0 means no update of the MSI fields, and thus no write
> > required.
> > 
> > Signed-off-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>
> 
> Reviewed-by: Jan Beulich <jbeulich@xxxxxxxx>
> with two purely cosmetic suggestions and an only loosely related question 
> below.
> 
> > --- a/xen/arch/x86/hvm/vmx/vmx.c
> > +++ b/xen/arch/x86/hvm/vmx/vmx.c
> > @@ -415,7 +415,9 @@ static int cf_check vmx_pi_update_irte(const struct 
> > vcpu *v,
> >  
> >      ASSERT_PDEV_LIST_IS_READ_LOCKED(msi_desc->dev->domain);
> >  
> > -    return iommu_update_ire_from_msi(msi_desc, &msi_desc->msg);
> > +    rc = iommu_update_ire_from_msi(msi_desc, &msi_desc->msg);
> > +
> > +    return rc < 0 ? rc : 0;
> 
> Only tangential here, but: Why does this function have a return type of
> non-void, when neither caller cares?

I'm afraid there's more wrong in vmx_pi_update_irte() that I've just
spotted afterwards.

vmx_pi_update_irte() passes to iommu_update_ire_from_msi() the
msi_desc->msg field, but that field is supposed to always contain the
non-translated MSI data, as you correctly pointed out in v1 it's
consumed by dump_msi().  vmx_pi_update_irte() using msi_desc->msg to
store the translated MSI effectively breaks dump_msi().

Also vmx_pi_update_irte() relies on the IRT index never changing, as
otherwise it's missing any logic to update the MSI registers.

I will fix that in a pre-patch.

> 
> > --- a/xen/drivers/passthrough/amd/iommu_intr.c
> > +++ b/xen/drivers/passthrough/amd/iommu_intr.c
> > @@ -492,7 +492,7 @@ static int update_intremap_entry_from_msi_msg(
> >                 get_ivrs_mappings(iommu->seg)[alias_id].intremap_table);
> >      }
> >  
> > -    return 0;
> > +    return !fresh ? 0 : 1;
> >  }
> 
> Simply
> 
>     return fresh;
> 
> ?
> 
> > @@ -546,7 +546,7 @@ int cf_check amd_iommu_msi_msg_update_ire(
> >      rc = update_intremap_entry_from_msi_msg(iommu, bdf, nr,
> >                                              &msi_desc->remap_index,
> >                                              msg, &data);
> > -    if ( !rc )
> > +    if ( rc > 0 )
> >      {
> >          for ( i = 1; i < nr; ++i )
> >              msi_desc[i].remap_index = msi_desc->remap_index + i;
> > --- a/xen/drivers/passthrough/vtd/intremap.c
> > +++ b/xen/drivers/passthrough/vtd/intremap.c
> > @@ -506,6 +506,7 @@ static int msi_msg_to_remap_entry(
> >      unsigned int index, i, nr = 1;
> >      unsigned long flags;
> >      const struct pi_desc *pi_desc = msi_desc->pi_desc;
> > +    bool alloc = false;
> >  
> >      if ( msi_desc->msi_attrib.type == PCI_CAP_ID_MSI )
> >          nr = msi_desc->msi.nvec;
> > @@ -529,6 +530,7 @@ static int msi_msg_to_remap_entry(
> >          index = alloc_remap_entry(iommu, nr);
> >          for ( i = 0; i < nr; ++i )
> >              msi_desc[i].remap_index = index + i;
> > +        alloc = true;
> >      }
> >      else
> >          index = msi_desc->remap_index;
> > @@ -601,7 +603,7 @@ static int msi_msg_to_remap_entry(
> >      unmap_vtd_domain_page(iremap_entries);
> >      spin_unlock_irqrestore(&iommu->intremap.lock, flags);
> >  
> > -    return 0;
> > +    return alloc ? 1 : 0;
> >  }
> 
> Like above, simply
> 
>     return alloc;
> 
> ?

I wasn't sure whether this was overloading the boolean type and
possibly breaking some MISRA rule.  I can adjust.

Thanks, Roger.



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.