[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v1] drivers: Change amd_iommu struct to contain pci_sbdf_t, simplify code



On Thu, 13 Mar 2025 at 20:16, Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote:
>
> On 13/03/2025 6:57 pm, Andrii Sultanov wrote:
> > Following on from 250d87dc, struct amd_iommu has its seg and bdf fields
> > backwards with relation to pci_sbdf_t. Swap them around, and simplify the
> > expressions regenerating an sbdf_t from seg+bdf.
> >
> > Simplify ioapic_sbdf and bpet_sbdf along the way. Adjust functions
> > taking seg and bdf fields of these structs to take pci_sbdf_t instead.
> > Simplify comparisons similarly.
> >
> > Bloat-o-meter reports:
> >
> > ```
> > add/remove: 0/0 grow/shrink: 13/21 up/down: 352/-576 (-224)
> > Function                                     old     new   delta
> > _einittext                                 22028   22220    +192
> > amd_iommu_prepare                            853     897     +44
> > _invalidate_all_devices                      133     154     +21
> > amd_iommu_domain_destroy                      46      63     +17
> > disable_fmt                                12336   12352     +16
> > _acpi_module_name                            416     432     +16
> > amd_iommu_get_reserved_device_memory         521     536     +15
> > parse_ivrs_table                            3955    3966     +11
> > amd_iommu_assign_device                      271     282     +11
> > find_iommu_for_device                        242     246      +4
> > get_intremap_requestor_id                     49      52      +3
> > amd_iommu_free_intremap_table                360     361      +1
> > allocate_domain_resources                     82      83      +1
> > reassign_device                              843     838      -5
> > amd_iommu_remove_device                      352     347      -5
> > amd_iommu_flush_iotlb                        359     354      -5
> > iov_supports_xt                              270     264      -6
> > amd_iommu_setup_domain_device               1478    1472      -6
> > amd_setup_hpet_msi                           232     224      -8
> > amd_iommu_ioapic_update_ire                  572     564      -8
> > _hvm_dpci_msi_eoi                            157     149      -8
> > amd_iommu_msi_enable                          33      20     -13
> > register_range_for_device                    297     281     -16
> > amd_iommu_add_device                         856     839     -17
> > update_intremap_entry_from_msi_msg           879     861     -18
> > amd_iommu_read_ioapic_from_ire               347     323     -24
> > amd_iommu_msi_msg_update_ire                 472     431     -41
> > flush_command_buffer                         460     417     -43
> > set_iommu_interrupt_handler                  421     377     -44
> > amd_iommu_detect_one_acpi                    918     868     -50
> > amd_iommu_get_supported_ivhd_type             86      31     -55
> > iterate_ivrs_mappings                        169     113     -56
> > parse_ivmd_block                            1339    1271     -68
> > enable_iommu                                1745    1665     -80
> > ```
> >
> > Resolves: https://gitlab.com/xen-project/xen/-/issues/198
> >
> > Reported-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
> > Signed-off-by: Andrii Sultanov <sultanovandriy@xxxxxxxxx>
>
> Well, this is awkward.  This is the task I'd put together for Cody to
> try.  I guess I have to find another one.

My apologies! Just noticed it wasn't claimed on Gitlab, probably should
have pinged you first.

> In commit messages, we always want the subject alongside a hash.  I have
> this local alias to help:
>
> > xen.git/xen$ git commit-str 250d87dc
> > commit 250d87dc3ff9 ("x86/msi: Change __msi_set_enable() to take
> > pci_sbdf_t")
> > xen.git/xen$ git help commit-str
> > 'commit-str' is aliased to 'log -1 --abbrev=12 --pretty=format:'commit
> > %h ("%s")''
>
> (The name is not imaginative, and could probably be better.)

Thanks, will amend.

> > @@ -239,17 +239,17 @@ static int __init register_range_for_device(
> >      unsigned int bdf, paddr_t base, paddr_t limit,
> >      bool iw, bool ir, bool exclusion)
> >  {
> > -    int seg = 0; /* XXX */
> > -    struct ivrs_mappings *ivrs_mappings = get_ivrs_mappings(seg);
> > +    pci_sbdf_t sbdf = { .seg = 0, .bdf = bdf };
>
> The /* XXX */ wants to stay.  It's highlighting that this code isn't
> muti-segment aware (yet).

Will do.

> > +    struct ivrs_mappings *ivrs_mappings = get_ivrs_mappings(sbdf.seg);
> >      struct amd_iommu *iommu;
> >      u16 req;
> >      int rc = 0;
> >
> > -    iommu = find_iommu_for_device(seg, bdf);
> > +    iommu = find_iommu_for_device(sbdf);
> >      if ( !iommu )
> >      {
> >          AMD_IOMMU_WARN("IVMD: no IOMMU for device %pp - ignoring 
> > constrain\n",
> > -                       &PCI_SBDF(seg, bdf));
> > +                       &(sbdf));
>
> The brackets should be dropped now.  This should be just &sbdf.

Will do

> > @@ -1543,14 +1540,14 @@ static void invalidate_all_domain_pages(void)
> >  static int cf_check _invalidate_all_devices(
> >      u16 seg, struct ivrs_mappings *ivrs_mappings)
> >  {
> > -    unsigned int bdf;
> > +    pci_sbdf_t sbdf = { .seg = seg, .bdf = 0 };
> >      u16 req_id;
> >      struct amd_iommu *iommu;
> >
> > -    for ( bdf = 0; bdf < ivrs_bdf_entries; bdf++ )
> > +    for ( /* sbdf.bdf = 0 */ ; sbdf.bdf < ivrs_bdf_entries; sbdf.bdf++ )
> >      {
> > -        iommu = find_iommu_for_device(seg, bdf);
> > -        req_id = ivrs_mappings[bdf].dte_requestor_id;
> > +        iommu = find_iommu_for_device(sbdf);
> > +        req_id = ivrs_mappings[sbdf.bdf].dte_requestor_id;
>
> See how bloat-o-meter reports this as the 3rd most increased function.
> This is an example where merging to a pci_sbdf_t local variable is
> making things worse.
>
> Keep the bdf local variable, and use PCI_SBDF() for the call to
> find_iommu_for_device().
>
> The reason is that you're now modifying the low uint16_t half of a
> uint32_t.  This requires emitting 16-bit logic (requires the Operand
> Size Override prefix, contributing to your code size), it also suffers
> register merge penalty

This particular example would be equivalent:
add/remove: 0/0 grow/shrink: 2/2 up/down: 24/-24 (0)
Function                                     old     new   delta
_invalidate_all_devices                      138     154     +16
build_info                                   744     752      +8
__mon_lengths                               2936    2928      -8
iterate_ivrs_mappings                        129     113     -16

Will drop the hunk anyhow to reduce the size of diff. Thanks!


> > diff --git a/xen/drivers/passthrough/amd/iommu_intr.c 
> > b/xen/drivers/passthrough/amd/iommu_intr.c
> > index 9abdc38053..0c91125ec0 100644
> > --- a/xen/drivers/passthrough/amd/iommu_intr.c
> > +++ b/xen/drivers/passthrough/amd/iommu_intr.c
> > @@ -123,10 +123,10 @@ static spinlock_t* get_intremap_lock(int seg, int 
> > req_id)
> >             &shared_intremap_lock);
> >  }
> >
> > -static int get_intremap_requestor_id(int seg, int bdf)
> > +static int get_intremap_requestor_id(pci_sbdf_t sbdf)
> >  {
> > -    ASSERT( bdf < ivrs_bdf_entries );
> > -    return get_ivrs_mappings(seg)[bdf].dte_requestor_id;
> > +    ASSERT( sbdf.bdf < ivrs_bdf_entries );
> > +    return get_ivrs_mappings(sbdf.seg)[sbdf.bdf].dte_requestor_id;
>
> This is also an example where merging the parameter is not necessarily
> wise.  The segment and bdf parts are used differently, so this function
> now has to split the one parameter in two, and shift segment by 16 just
> to use it.

Will do:
add/remove: 0/0 grow/shrink: 4/6 up/down: 32/-64 (-32)

> > @@ -335,20 +336,19 @@ void cf_check amd_iommu_ioapic_update_ire(
> >      new_rte.raw = rte;
> >
> >      /* get device id of ioapic devices */
> > -    bdf = ioapic_sbdf[idx].bdf;
> > -    seg = ioapic_sbdf[idx].seg;
> > -    iommu = find_iommu_for_device(seg, bdf);
> > +    sbdf.sbdf = ioapic_sbdf[idx].sbdf.sbdf;
>
> sbdf = ioapic_sbdf[idx].sbdf;
>
> > +    iommu = find_iommu_for_device(sbdf);
> >      if ( !iommu )
> >      {
> >          AMD_IOMMU_WARN("failed to find IOMMU for IO-APIC @ %04x:%04x\n",
> > -                       seg, bdf);
> > +                       sbdf.seg, sbdf.bdf);
>
> This should be converted to %pp, which has a side effect of correcting
> the rendering of bdf.
>
> > @@ -515,15 +515,15 @@ int cf_check amd_iommu_msi_msg_update_ire(
> >      struct msi_desc *msi_desc, struct msi_msg *msg)
> >  {
> >      struct pci_dev *pdev = msi_desc->dev;
> > -    int bdf, seg, rc;
> > +    pci_sbdf_t sbdf;
> > +    int rc;
> >      struct amd_iommu *iommu;
> >      unsigned int i, nr = 1;
> >      u32 data;
> >
> > -    bdf = pdev ? pdev->sbdf.bdf : hpet_sbdf.bdf;
> > -    seg = pdev ? pdev->seg : hpet_sbdf.seg;
> > +    sbdf.sbdf = pdev ? pdev->sbdf.sbdf : hpet_sbdf.sbdf.sbdf;
>
> This is a better example where
>
> sbdf = pdev ? pdev->sbdf : hpet_sbdf.sbdf;
>
> is equivalent.
>
> > diff --git a/xen/drivers/passthrough/amd/iommu_map.c 
> > b/xen/drivers/passthrough/amd/iommu_map.c
> > index dde393645a..17070904fa 100644
> > --- a/xen/drivers/passthrough/amd/iommu_map.c
> > +++ b/xen/drivers/passthrough/amd/iommu_map.c
> > @@ -694,17 +694,16 @@ int amd_iommu_reserve_domain_unity_unmap(struct 
> > domain *d,
> >  int cf_check amd_iommu_get_reserved_device_memory(
> >      iommu_grdm_t *func, void *ctxt)
> >  {
> > -    unsigned int seg = 0 /* XXX */, bdf;
> > -    const struct ivrs_mappings *ivrs_mappings = get_ivrs_mappings(seg);
> > +    pci_sbdf_t sbdf = {0};
>
> "= {}" please.
>
> GCC has just introduced a nasty bug (they claim its a feature) where {0}
> on unions now zeros less than it used to do.  pci_sbdf_t doesn't tickle
> this corner case, but we need to be proactive about removing examples of
> "= {0}".

Will amend with all of these.

> > +    const struct ivrs_mappings *ivrs_mappings = 
> > get_ivrs_mappings(sbdf.seg);
> >      /* At least for global entries, avoid reporting them multiple times. */
> >      enum { pending, processing, done } global = pending;
> >
> > -    for ( bdf = 0; bdf < ivrs_bdf_entries; ++bdf )
> > +    for ( /* sbdf.bdf = 0 */ ; sbdf.bdf < ivrs_bdf_entries; ++sbdf.bdf )
> >      {
> > -        pci_sbdf_t sbdf = PCI_SBDF(seg, bdf);
> > -        const struct ivrs_unity_map *um = ivrs_mappings[bdf].unity_map;
> > -        unsigned int req = ivrs_mappings[bdf].dte_requestor_id;
> > -        const struct amd_iommu *iommu = ivrs_mappings[bdf].iommu;
> > +        const struct ivrs_unity_map *um = 
> > ivrs_mappings[sbdf.bdf].unity_map;
> > +        unsigned int req = ivrs_mappings[sbdf.bdf].dte_requestor_id;
> > +        const struct amd_iommu *iommu = ivrs_mappings[sbdf.bdf].iommu;
>
> Again, this will be better staying as two split variables.

Indeed (on top of the previous similar suggestion):
add/remove: 0/0 grow/shrink: 4/9 up/down: 32/-128 (-96)

> ~Andrew

Thank you!



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.