|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v6 2/3] xen/pci: introduce PF<->VF links
On 10/28/24 14:41, Roger Pau Monné wrote:
> On Fri, Oct 18, 2024 at 04:39:09PM -0400, Stewart Hildebrand wrote:
>> Add links between a VF's struct pci_dev and its associated PF struct
>> pci_dev. Move the calls to pci_get_pdev()/pci_add_device() down to avoid
>> dropping and re-acquiring the pcidevs_lock().
>>
>> During PF removal, unlink VF from PF and mark the VF broken. As before,
>> VFs may exist without a corresponding PF, although now only with
>> pdev->broken = true.
>>
>> The hardware domain is expected to remove the associated VFs before
>> removing the PF. Print a warning in case a PF is removed with associated
>> VFs still present.
>>
>> Signed-off-by: Stewart Hildebrand <stewart.hildebrand@xxxxxxx>
>> ---
>> Candidate for backport to 4.19 (the next patch depends on this one)
>>
>> v5->v6:
>> * move printk() before ASSERT_UNREACHABLE()
>> * warn about PF removal with VFs still present
>> * clarify commit message
>>
>> v4->v5:
>> * new patch, split from ("x86/msi: fix locking for SR-IOV devices")
>> * move INIT_LIST_HEAD(&pdev->vf_list); earlier
>> * collapse struct list_head instances
>> * retain error code from pci_add_device()
>> * unlink (and mark broken) VFs instead of removing them
>> * const-ify VF->PF link
>> ---
>> xen/drivers/passthrough/pci.c | 76 ++++++++++++++++++++++++++++-------
>> xen/include/xen/pci.h | 10 +++++
>> 2 files changed, 72 insertions(+), 14 deletions(-)
>>
>> diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
>> index 74d3895e1ef6..fe31255b1207 100644
>> --- a/xen/drivers/passthrough/pci.c
>> +++ b/xen/drivers/passthrough/pci.c
>> @@ -333,6 +333,8 @@ static struct pci_dev *alloc_pdev(struct pci_seg *pseg,
>> u8 bus, u8 devfn)
>> *((u8*) &pdev->devfn) = devfn;
>> pdev->domain = NULL;
>>
>> + INIT_LIST_HEAD(&pdev->vf_list);
>> +
>> arch_pci_init_pdev(pdev);
>>
>> rc = pdev_msi_init(pdev);
>> @@ -449,6 +451,10 @@ static void free_pdev(struct pci_seg *pseg, struct
>> pci_dev *pdev)
>>
>> list_del(&pdev->alldevs_list);
>> pdev_msi_deinit(pdev);
>> +
>> + if ( pdev->info.is_virtfn && pdev->virtfn.pf_pdev )
>
> Shouldn't having pdev->info.is_virtfn set already ensure that
> pdev->virtfn.pf_pdev != NULL?
In the current rev, the possibility exists, however unlikely, that a
*buggy* dom0 may remove the PF before removing the VFs. In this case a
VF would exist without a corresponding PF, and thus pdev->virtfn.pf_pdev
is NULL.
For the next rev, you're right, it'll be back to the situation where a
VF can only exist with an associated PF.
>> + list_del(&pdev->vf_list);
>> +
>> xfree(pdev);
>> }
>>
>> @@ -656,24 +662,11 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
>> unsigned int slot = PCI_SLOT(devfn), func = PCI_FUNC(devfn);
>> const char *type;
>> int ret;
>> - bool pf_is_extfn = false;
>>
>> if ( !info )
>> type = "device";
>> else if ( info->is_virtfn )
>> - {
>> - pcidevs_lock();
>> - pdev = pci_get_pdev(NULL,
>> - PCI_SBDF(seg, info->physfn.bus,
>> - info->physfn.devfn));
>> - if ( pdev )
>> - pf_is_extfn = pdev->info.is_extfn;
>> - pcidevs_unlock();
>> - if ( !pdev )
>> - pci_add_device(seg, info->physfn.bus, info->physfn.devfn,
>> - NULL, node);
>> type = "virtual function";
>> - }
>> else if ( info->is_extfn )
>> type = "extended function";
>> else
>> @@ -703,7 +696,44 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
>> * extended function.
>> */
>> if ( pdev->info.is_virtfn )
>> - pdev->info.is_extfn = pf_is_extfn;
>> + {
>> + struct pci_dev *pf_pdev;
>
> This could be const?
No, as we are doing this below:
list_add(&pdev->vf_list, &pf_pdev->vf_list);
>> +
>> + pf_pdev = pci_get_pdev(NULL,
>> + PCI_SBDF(seg, info->physfn.bus,
>> + info->physfn.devfn));
>
> You can probably initialize at declaration?
OK
>> +
>> + if ( !pf_pdev )
>
> Is this even feasible during correct operation?
No, I don't think so.
> IOW: shouldn't the PF
> always be added first, so that SR-IOV can be enabled and the VFs added
> afterwards?
Yes, I think you're right.
> I see previous code also catered for VFs being added without the PF
> being present, so I assume there was some need for this.
This is exactly the source of the confusion on my part. In the removal
path, the consensus seems to be that removing a PF with VFs still
present indicates an error. Then shouldn't the opposite also be true?
Adding a VF without a PF should also indicate an error.
I see the PF-adding logic was added in 942a6f1376d8 ("x86/PCI-MSI:
properly determine VF BAR values"). Searching the mailing list archives
didn't reveal much about it [0].
[0]
https://lore.kernel.org/xen-devel/4E3FC6E102000078000501CA@xxxxxxxxxxxxxxxxxxxx/
The only time I've observed this path being taken is by manually
calling PHYSDEVOP_pci_device_add. I've resorted to calling
PHYSDEVOP_pci_device_{remove,add} from userspace in order to test this,
because the Linux kernel doesn't behave this way.
I can't think of a good rationale for catering to VFs being added
without a PF, so I'll turn it into an error for the next rev.
>> + {
>> + ret = pci_add_device(seg, info->physfn.bus,
>> info->physfn.devfn,
>> + NULL, node);
>> + if ( ret )
>> + {
>> + printk(XENLOG_WARNING "Failed to add SR-IOV device PF
>> %pp for VF %pp\n",
>
> Could you split this to make the line a bit shorter?
>
> printk(XENLOG_WARNING
> "Failed to add SR-IOV device PF %pp for VF %pp\n",
>
> Same below.
>
>> + &PCI_SBDF(seg, info->physfn.bus,
>> info->physfn.devfn),
>> + &pdev->sbdf);
>> + free_pdev(pseg, pdev);
>> + goto out;
>> + }
>> + pf_pdev = pci_get_pdev(NULL,
>> + PCI_SBDF(seg, info->physfn.bus,
>> + info->physfn.devfn));
>> + if ( !pf_pdev )
>> + {
>> + printk(XENLOG_ERR "Failed to find SR-IOV device PF %pp
>> for VF %pp\n",
>> + &PCI_SBDF(seg, info->physfn.bus,
>> info->physfn.devfn),
>> + &pdev->sbdf);
>
> If you want to add an error message here, I think it should mention
> the fact this state is not expected:
>
> "Inconsistent PCI state: failed to find newly added PF %pp for VF %pp\n"
>
>> + ASSERT_UNREACHABLE();
>> + free_pdev(pseg, pdev);
>> + ret = -EILSEQ;
>> + goto out;
>> + }
>> + }
>> +
>> + pdev->info.is_extfn = pf_pdev->info.is_extfn;
>> + pdev->virtfn.pf_pdev = pf_pdev;
>> + list_add(&pdev->vf_list, &pf_pdev->vf_list);
>> + }
>> }
>>
>> if ( !pdev->info.is_virtfn && !pdev->vf_rlen[0] )
>> @@ -821,6 +851,24 @@ int pci_remove_device(u16 seg, u8 bus, u8 devfn)
>> list_for_each_entry ( pdev, &pseg->alldevs_list, alldevs_list )
>> if ( pdev->bus == bus && pdev->devfn == devfn )
>> {
>> + if ( !pdev->info.is_virtfn )
>
> Given we have no field to mark a device as a PF, we could check that
> pdev->vf_list is not empty, and by doing so the warn_stale_vfs
> variable could be dropped?
>
> if ( !pdev->info.is_virtfn && !list_empty(&pdev->vf_list) )
> {
> struct pci_dev *vf_pdev;
>
> while ( (vf_pdev = list_first_entry_or_null(&pdev->vf_list,
> struct pci_dev,
> vf_list)) != NULL )
> {
> list_del(&vf_pdev->vf_list);
> vf_pdev->virtfn.pf_pdev = NULL;
> vf_pdev->broken = true;
> }
>
> printk(XENLOG_WARNING "PCI SR-IOV PF %pp removed with VFs still
> present\n",
> &pdev->sbdf);
> }
Yeah. Given that the consensus is leaning toward keeping the PF and
returning an error, here's my suggestion:
if ( !pdev->info.is_virtfn && !list_empty(&pdev->vf_list) )
{
struct pci_dev *vf_pdev;
list_for_each_entry(vf_pdev, &pdev->vf_list, vf_list)
vf_pdev->broken = true;
pdev->broken = true;
printk(XENLOG_WARNING
"Attempted to remove PCI SR-IOV PF %pp with VFs still present\n",
&pdev->sbdf);
ret = -EBUSY;
break;
}
>> + {
>> + struct pci_dev *vf_pdev, *tmp;
>> + bool warn_stale_vfs = false;
>> +
>> + list_for_each_entry_safe(vf_pdev, tmp, &pdev->vf_list,
>> vf_list)
>> + {
>> + list_del(&vf_pdev->vf_list);
>> + vf_pdev->virtfn.pf_pdev = NULL;
>> + vf_pdev->broken = true;
>> + warn_stale_vfs = true;
>> + }
>> +
>> + if ( warn_stale_vfs )
>> + printk(XENLOG_WARNING "PCI SR-IOV PF %pp removed with
>> VFs still present\n",
>> + &pdev->sbdf);
>> + }
>> +
>> if ( pdev->domain )
>> {
>> write_lock(&pdev->domain->pci_lock);
>> diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
>> index ef56e80651d6..2ea168d5f914 100644
>> --- a/xen/include/xen/pci.h
>> +++ b/xen/include/xen/pci.h
>> @@ -153,7 +153,17 @@ struct pci_dev {
>> unsigned int count;
>> #define PT_FAULT_THRESHOLD 10
>> } fault;
>> +
>> + /*
>> + * List head if info.is_virtfn == false
>> + * List entry if info.is_virtfn == true
>> + */
>> + struct list_head vf_list;
>> u64 vf_rlen[6];
>> + struct {
>> + /* Only populated for VFs (info.is_virtfn == true) */
>
> All comments here (specially the first ones) would better use PF and
> VF consistently, rather than referring to other fields in the struct.
> Specially because the fields can change names and the comments would
> then become stale.
OK
>> + const struct pci_dev *pf_pdev; /* Link from VF to PF */
>> + } virtfn;
>
> I'm unsure you need an outer virtfn struct, as it's only one field in
> this patch? Maybe more fields gets added by further patches?
Right. There are no more fields to be added, so there's no need. It was
leftover from a previous rev when vf_list was split.
>
> Thanks, Roger.
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |