Re: [PATCH v3 5/7] vpci: add SR-IOV support for PVH Dom0


  • To: Jan Beulich <jbeulich@xxxxxxxx>
  • From: Mykyta Poturai <Mykyta_Poturai@xxxxxxxx>
  • Date: Wed, 6 May 2026 09:39:57 +0000
  • Cc: Roger Pau Monné <roger.pau@xxxxxxxxxx>, "Daniel P. Smith" <dpsmith@xxxxxxxxxxxxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Stewart Hildebrand <stewart.hildebrand@xxxxxxx>
  • Delivery-date: Wed, 06 May 2026 09:40:03 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 5/4/26 08:37, Jan Beulich wrote:
> On 23.04.2026 12:12, Mykyta Poturai wrote:
>> On 4/21/26 17:43, Jan Beulich wrote:
>>> On 09.04.2026 16:01, Mykyta Poturai wrote:
>>>> From: Stewart Hildebrand <stewart.hildebrand@xxxxxxx>
>>>>
>>>> This code is expected to only be used by privileged domains,
>>>> unprivileged domains should not get access to the SR-IOV capability.
>>>>
>>>> Implement RW handlers for PCI_SRIOV_CTRL register to dynamically
>>>> map/unmap VF BARS. Recalculate BAR sizes before mapping VFs to account
>>>> for possible changes in the system page size register. Also force VFs to
>>>> always use emulated reads for command register, this is needed to
>>>> prevent some drivers accidentally unmapping BARs.
>>>
>>> This apparently refers to the change to vpci_init_header(). Writes are
>>> already intercepted. How would a read lead to accidental BAR unmap? Even
>>> for writes I don't see how a VF driver could accidentally unmap BARs, as
>>> the memory decode bit there is hardwired to 0.
>>>
>>>> Discovery of VFs is
>>>> done by Dom0, which must register them with Xen.
>>>
>>> If we intercept control register writes, why would we still require
>>> Dom0 to report the VFs that appear?
>>>
>>
>> Sorry, I don't understand this question. You specifically requested this
>> to be done this way in V2. Quoting your reply from V2 below.
>>
>>   > Aren't you effectively busy-waiting for these 100ms, by simply
>>   > returning "true" from vpci_process_pending() until the time has
>>   > passed? This imo is a no-go. You want to set a timer and put the
>>   > vCPU to sleep, to wake it up again when the timer has expired.
>>   > That'll then eliminate the need for the not-so-nice patch 4.
>>
>>   > Question is whether we need to actually go this far (right away). I
>>   > expect you don't mean to hand PFs to DomU-s. As long as we keep them
>>   > in the hardware domain, can't we trust it to set things up correctly,
>>   > just like we trust it in a number of other aspects?
> 
> How's any of this related to the question I raised here, or your reply
> thereto? If we intercept PCI_SRIOV_CTRL, we know when VFs are created.
> Why still demand Dom0 to report them then?
> 

The spec states that VFs can take up to 100ms to become alive after the
VF_ENABLE bit is set. We discussed in V2 that it is not acceptable to
do the required 100ms wait in Xen while blocking a domain. Avoiding
that blocking would require some mechanism that only allows the domain
to run for precisely 99 (or more?) ms. You yourself suggested that we
can trust the hardware domain with registering VFs, since we already
trust it with other PCI-related stuff. Did you change your mind, or am
I completely misunderstanding this question?
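
To make the trade-off concrete, here is a minimal sketch of the
non-blocking alternative: record a deadline when VF Enable is set and
only act on the VFs once it has passed, instead of spinning in Xen.
All names here (vf_enable_state, vf_enable_start, vf_enable_poll,
VF_SETTLE_MS) are hypothetical and not part of the patch, and a real
implementation would presumably be driven by a timer rather than by a
polled millisecond counter.

```c
#include <stdbool.h>
#include <stdint.h>

/* Settle time after VF Enable is set, as discussed above (assumption: ms). */
#define VF_SETTLE_MS 100u

/* Hypothetical per-PF bookkeeping; not a structure from the patch. */
struct vf_enable_state {
    bool     pending;      /* VF Enable was set, VFs not yet usable */
    uint64_t deadline_ms;  /* earliest time the VFs may be touched */
};

/* Called when the guest sets VF Enable: remember the deadline, do not wait. */
static void vf_enable_start(struct vf_enable_state *s, uint64_t now_ms)
{
    s->pending = true;
    s->deadline_ms = now_ms + VF_SETTLE_MS;
}

/* Returns true once the VFs may be accessed; never blocks the caller. */
static bool vf_enable_poll(struct vf_enable_state *s, uint64_t now_ms)
{
    if ( s->pending && now_ms >= s->deadline_ms )
        s->pending = false;

    return !s->pending;
}
```

Even with something like this, Xen would still need to keep the domain
away from the VFs until the deadline, which is the part that having
Dom0 register the VFs avoids.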

>>>> +static int map_vfs(const struct pci_dev *pf_pdev, uint16_t cmd)
>>>> +{
>>>> +    struct pci_dev *vf_pdev;
>>>> +    int rc;
>>>> +
>>>> +    ASSERT(rw_is_write_locked(&pf_pdev->domain->pci_lock));
>>>> +
>>>> +    list_for_each_entry(vf_pdev, &pf_pdev->vf_list, vf_list)
>>>> +    {
>>>> +        rc = vpci_modify_bars(vf_pdev, cmd, false);
>>>> +        if ( rc )
>>>> +        {
>>>> +            gprintk(XENLOG_ERR, "failed to %s VF %pp: %d\n",
>>>> +                    (cmd & PCI_COMMAND_MEMORY) ? "map" : "unmap",
>>>> +                    &vf_pdev->sbdf, rc);
>>>> +            return rc;
>>>> +        }
>>>> +
>>>> +        vf_pdev->vpci->header.guest_cmd &= ~PCI_COMMAND_MEMORY;
>>>> +        vf_pdev->vpci->header.guest_cmd |= (cmd & PCI_COMMAND_MEMORY);
>>>
>>> As mentioned elsewhere as well, this bit is supposed to be 0 for VFs.
>>
>> There are some devices that expose VFs with the same VID/DID as the
>> PF, causing Linux to use the normal driver for them and treat them like
>> normal devices. At some point, those normal drivers try to do a
>> read-modify-write of the command register and end up writing 0 to
>> PCI_COMMAND_MEMORY, causing cmd_write to unmap the BARs of that device.
>> I am not sure, maybe it would be better to just ignore cmd writes for VFs?
> 
> No. We should treat r/o bits as r/o (which for this bit implies it not
> controlling BAR mapping).
> 
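As I read this suggestion, a minimal sketch of it would be to drop the
bit from whatever the guest writes to a VF's command register, so that
the write can never reach the BAR map/unmap path. The helper name
vf_cmd_filter_write is hypothetical, and PCI_COMMAND_MEMORY is redefined
locally only to keep the snippet self-contained.

```c
#include <stdint.h>

/* Memory Space Enable bit of the PCI command register. */
#define PCI_COMMAND_MEMORY 0x2

/*
 * Hypothetical filter for guest writes to a VF command register:
 * the memory decode bit is hardwired to 0 on VFs, so it is masked
 * out and the remaining bits are passed through unchanged.
 */
static uint16_t vf_cmd_filter_write(uint16_t val)
{
    return (uint16_t)(val & ~PCI_COMMAND_MEMORY);
}
```

With the bit filtered like this, a read-modify-write from a generic
driver would no longer be able to trigger an unmap.
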
>>>> +    sriov_pos = pci_find_ext_capability(pf_pdev, PCI_EXT_CAP_ID_SRIOV);
>>>> +    ctrl = pci_conf_read16(pf_pdev->sbdf, sriov_pos + PCI_SRIOV_CTRL);
>>>> +
>>>> +    if ( (pf_pdev->domain == vf_pdev->domain) && (ctrl & PCI_SRIOV_CTRL_MSE) )
>>>> +    {
>>>> +        rc = vpci_modify_bars(vf_pdev, PCI_COMMAND_MEMORY, false);
>>>
>>> Doesn't VF enable also need to be set for the BARs to be mapped?
>>
>> I don't think so. Enabling memory space logically maps very well to
>> mapping memory to the guest. I don't see any benefit in also requiring
>> the VFE bit here.
> 
> Iirc the spec is quite explicit in this regard.
> 
> Jan

I will recheck the spec regarding this question.
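
If the spec does require both bits, the check would look roughly like
the sketch below. The helper name vf_bars_should_be_mapped is
hypothetical, and the bit values are my understanding of the SR-IOV
Control register layout (VF Enable in bit 0, VF MSE in bit 3), repeated
here only so the snippet stands alone.

```c
#include <stdbool.h>
#include <stdint.h>

/* SR-IOV Control register bits (assumed values, see note above). */
#define PCI_SRIOV_CTRL_VFE 0x0001  /* VF Enable */
#define PCI_SRIOV_CTRL_MSE 0x0008  /* VF Memory Space Enable */

/* Hypothetical helper: map VF BARs only when both VFE and MSE are set. */
static bool vf_bars_should_be_mapped(uint16_t ctrl)
{
    const uint16_t need = PCI_SRIOV_CTRL_VFE | PCI_SRIOV_CTRL_MSE;

    return (ctrl & need) == need;
}
```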

-- 
Mykyta

 

