Re: [PATCH] [RFC] vpci: allow BAR write while mapped


  • To: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • From: Stewart Hildebrand <stewart.hildebrand@xxxxxxx>
  • Date: Thu, 13 Mar 2025 14:28:06 -0400
  • Cc: <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Thu, 13 Mar 2025 18:28:32 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 3/13/25 13:48, Roger Pau Monné wrote:
> On Wed, Mar 12, 2025 at 03:50:17PM -0400, Stewart Hildebrand wrote:
>> Xen vPCI refuses BAR writes if the BAR is mapped in p2m. If firmware
>> initialized the BAR to a bad address, Linux will try to write a new
>> address to the BAR without disabling memory decoding. Allow the write
>> by updating p2m right away in the vPCI BAR write handler.
> 
> IIRC it's only 32bit BARs that Linux will attempt to reposition
> without toggling memory decoding off.

This matches my observations as well.

> For 64bit BARs it will in
> general (unless pci_dev->mmio_always_on is set) toggle memory decoding
> off and then update the BAR registers.
> 
>>
>> Resolves: https://gitlab.com/xen-project/xen/-/issues/197
>> Signed-off-by: Stewart Hildebrand <stewart.hildebrand@xxxxxxx>
>> ---
>> RFC: Currently the deferred mapping machinery supports only map or
>>      unmap, not both. It might be better to rework the mapping machinery
>>      to support unmap-then-map operations, but please let me know your
>>      thoughts.
>> RFC: This patch has not yet made an attempt to distinguish between
>>      32-bit and 64-bit writes. It probably should.
>> ---
>>  xen/drivers/vpci/header.c | 65 +++++++++++++++++++++++++++++++--------
>>  1 file changed, 53 insertions(+), 12 deletions(-)
>>
>> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
>> index ef6c965c081c..66adb2183cfe 100644
>> --- a/xen/drivers/vpci/header.c
>> +++ b/xen/drivers/vpci/header.c
>> @@ -173,7 +173,7 @@ static void modify_decoding(const struct pci_dev *pdev, uint16_t cmd,
>>          ASSERT_UNREACHABLE();
>>  }
>>  
>> -bool vpci_process_pending(struct vcpu *v)
>> +static bool process_pending(struct vcpu *v, bool need_lock)
>>  {
>>      struct pci_dev *pdev = v->vpci.pdev;
>>      struct vpci_header *header = NULL;
>> @@ -182,12 +182,14 @@ bool vpci_process_pending(struct vcpu *v)
>>      if ( !pdev )
>>          return false;
>>  
>> -    read_lock(&v->domain->pci_lock);
>> +    if ( need_lock )
>> +        read_lock(&v->domain->pci_lock);
> 
> The addition of need_lock would better be done in a pre-patch.
> 
>>  
>>      if ( !pdev->vpci || (v->domain != pdev->domain) )
>>      {
>>          v->vpci.pdev = NULL;
>> -        read_unlock(&v->domain->pci_lock);
>> +        if ( need_lock )
>> +            read_unlock(&v->domain->pci_lock);
>>          return false;
>>      }
>>  
>> @@ -209,17 +211,20 @@ bool vpci_process_pending(struct vcpu *v)
>>  
>>          if ( rc == -ERESTART )
>>          {
>> -            read_unlock(&v->domain->pci_lock);
>> +            if ( need_lock )
>> +                read_unlock(&v->domain->pci_lock);
>>              return true;
>>          }
>>  
>>          if ( rc )
>>          {
>> -            spin_lock(&pdev->vpci->lock);
>> +            if ( need_lock )
>> +                spin_lock(&pdev->vpci->lock);
>>              /* Disable memory decoding unconditionally on failure. */
>>              modify_decoding(pdev, v->vpci.cmd & ~PCI_COMMAND_MEMORY,
>>                              false);
>> -            spin_unlock(&pdev->vpci->lock);
>> +            if ( need_lock )
>> +                spin_unlock(&pdev->vpci->lock);
>>  
>>              /* Clean all the rangesets */
>>              for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
>> @@ -228,7 +233,8 @@ bool vpci_process_pending(struct vcpu *v)
>>  
>>              v->vpci.pdev = NULL;
>>  
>> -            read_unlock(&v->domain->pci_lock);
>> +            if ( need_lock )
>> +                read_unlock(&v->domain->pci_lock);
>>  
>>              if ( !is_hardware_domain(v->domain) )
>>                  domain_crash(v->domain);
>> @@ -238,15 +244,23 @@ bool vpci_process_pending(struct vcpu *v)
>>      }
>>      v->vpci.pdev = NULL;
>>  
>> -    spin_lock(&pdev->vpci->lock);
>> +    if ( need_lock )
>> +        spin_lock(&pdev->vpci->lock);
>>      modify_decoding(pdev, v->vpci.cmd, v->vpci.rom_only);
>> -    spin_unlock(&pdev->vpci->lock);
>> +    if ( need_lock )
>> +        spin_unlock(&pdev->vpci->lock);
>>  
>> -    read_unlock(&v->domain->pci_lock);
>> +    if ( need_lock )
>> +        read_unlock(&v->domain->pci_lock);
>>  
>>      return false;
>>  }
>>  
>> +bool vpci_process_pending(struct vcpu *v)
>> +{
>> +    return process_pending(v, true);
>> +}
>> +
>>  static int __init apply_map(struct domain *d, const struct pci_dev *pdev,
>>                              uint16_t cmd)
>>  {
>> @@ -565,6 +579,8 @@ static void cf_check bar_write(
>>  {
>>      struct vpci_bar *bar = data;
>>      bool hi = false;
>> +    bool reenable = false;
>> +    uint32_t cmd = 0;
>>  
>>      ASSERT(is_hardware_domain(pdev->domain));
>>  
>> @@ -585,10 +601,31 @@ static void cf_check bar_write(
>>      {
>>          /* If the value written is the current one avoid printing a warning. */
>>          if ( val != (uint32_t)(bar->addr >> (hi ? 32 : 0)) )
>> +        {
>>              gprintk(XENLOG_WARNING,
>> -                    "%pp: ignored BAR %zu write while mapped\n",
>> +                    "%pp: allowing BAR %zu write while mapped\n",
>>                      &pdev->sbdf, bar - pdev->vpci->header.bars + hi);
> 
> If Xen now handles BARs writes with memory decoding enabled the
> message can be removed.  Its only purpose was to signal this missing
> handling.

OK

>> -        return;
>> +            ASSERT(rw_is_write_locked(&pdev->domain->pci_lock));
>> +            ASSERT(spin_is_locked(&pdev->vpci->lock));
>> +            reenable = true;
>> +            cmd = pci_conf_read16(pdev->sbdf, PCI_COMMAND);
>> +            /*
>> +             * Write-while-mapped: unmap the old BAR in p2m. We want this to
>> +             * finish right away since the deferral machinery only supports
>> +             * unmap OR map, not unmap-then-remap. Ultimately, it probably would
>> +             * be better to defer the write-while-mapped case just like regular
>> +             * BAR writes (but still only allow it for 32-bit BAR writes).
>> +             */
>> +            /* Disable memory decoding */
>> +            modify_bars(pdev, cmd & ~PCI_COMMAND_MEMORY, false);
> 
> I think if the guest explicitly avoids toggling memory decoding Xen
> should also do the same, and not touch the bit.

OK

>> +            /* Call process pending here to ensure P2M operations are done */
>> +            while ( process_pending(current, false) )
>> +            {
>> +                /* Pre-empted, try again */
>> +            }
> 
> I'm afraid this is not how I would expect this to be done.  We
> explicitly do the p2m modifications in a deferred context to avoid
> long running operations.  We should continue to do so to perform this
> unmap + map operation.
> 
> I think you need to introduce a way to queue an operation that will do
> a map + unmap in the deferred context processing, or signal that after
> the currently queued operation finishes a new call to modify_bars()
> should be issued.

Yep, this makes sense. Will do.

> 
> It would be nice if we had a more generic way to queue guest vCPU p2m
> (map and unmap) operations, but that's likely to require a much better
> interface than what we currently have.
> 
> Thanks, Roger.
