[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] x86/pci: Correct ECS handling with CF8/CFC emulation


  • To: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Mon, 3 Apr 2023 15:24:58 +0200
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=suse.com; dmarc=pass action=none header.from=suse.com; dkim=pass header.d=suse.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=EeO9WlffdyqbjvDDnpxR+pBUtbwT2GBSWRHSi5bAjnM=; b=HpvVtyGveNfjn83dfVWcJzzLN0mhOjzzeOs7XZLS2xhwvAE/14y6IR3cp1UKbFRAKxAX6vOg6Oz7NtTWbun9X3uXsAYgKk2kP4UjUAjPf6K0VK2bf3amimhcrmwfSKTtBDR4IJiyYJH3fJPgfJbypJr60vXdbtDLmWG1sL0Axn1bhyBZCALdoiXIhCniol6iiHABUyt8OFfXmRPYTf2jA9h6FLZp1VULNmkpIAiHE12VDOJw2r9qygftcUK2UnTwz70vPRVgeEyiPyQjvc+XfASH5AaX3tLrvzhCWjvM4AW38aDMyxJhejrzTfCF04v07NOxRIuqw8TD3KhVep14jA==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=BkLheGfPYrqgJ+vKyithgeLT5qTRmEiqg82FBguVCvfHKpQROcZikyMZfZvXK9aYcuvcCLd0eoBx98mdCMzCY/Mw5CwnCY8qLpeWDj4CVWK95kUDsLw10dFtvkzpOMj5m5cuzhIX+4UFMX29Lqvy3uL/A9DcXU+WvJ/b3K6XLPew9ClsGcw7vH9I4KfcwWJouUNPOKat1O0sE05xrP4Wliie7iF4snl9IcEzNuxPIzsDm68b/T9yi3Ui9i2/AXt36ewRQa0bU4WFCyVQPJV6etVCADUIhEcScTm00o4njaKc1TaB7qaAo4FZ6WFJK8mu5S9G0sUHCilE9hsDFLvSzQ==
  • Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=suse.com;
  • Cc: Roger Pau Monné <roger.pau@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, Xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Mon, 03 Apr 2023 13:25:17 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 31.03.2023 19:57, Andrew Cooper wrote:
> This started by noticing the dubious Fam17h check in
> arch_ioreq_server_get_type_addr(), and then realising that checking the host
> CF8_EXT setting is utterly bogus for the guest <-> qemu emulation path.
> 
> What should be consulted here is the guest's MSR_AMD64_NB_CFG setting, but a)
> there isn't one, and b) the vestigial remnants of the cross-vendor migration
> logic cause MSR_AMD64_NB_CFG to be unconditionally read-as-zero, making the
> CF8_EXT path unused by any suitably-written OS in the first place.
> 
> MSR_AMD64_NB_CFG really has been removed on Fam17h (it's now just a read-zero,
> write-discard stub), and the ECS extension is unconditionally active, meaning
> it is not correct for Xen to ignore the ExtRegNo field on newer AMD CPUs.

I don't buy "unconditionally active". Whether to enable this by default is
up to firmware; I view it as not unlikely that some firmware actually have
a setup control for it. Fam17 controls it through D1[8-F]F4x044:0, while
Fam19 has D1[8-F]F0xC00:0 for this purpose. (I've had a work item for quite
some time to actually make use of this bit to enable ECS.)

> It turns out that Xen did even have this behaviour in 4.5 and earlier, with
> this problematic CF8_EXT checking being added in 4.6.  Therefore, revert back
> to Xen's older behaviour - it is objectively less wrong than the current
> logic.
> 
> While fixing this, get rid of hvm_pci_decode_addr() - it is more complicated
> to follow (and to call) than using the CF8* macros in the calling context.
> Rename CF8_ADDR() to CF8_REG() to better describe what it does, and write a
> comment explaining all about CF8/CFC accesses.
> 
> There's one rare case when CF8_EXT is visible to guests, and that is for a
> pinned hwdom.  Right now, we permit such a dom0 to modify the CF8_EXT bit, but
> this seems like a very unwise idea.  Leave a TODO for people to consider.
> 
> Fixes: e0fbf3bf9871 ("x86/AMD: correct certain Fam17 checks")

Therefore I'm not convinced this Fixes: tag is warranted.

> Fixes: 2d67a7a4d37a ("x86: synchronize PCI config space access decoding")

I'm also curious which particular aspect of that change you are considering
to be fixed here.

Your claim about behavior being reserved is perhaps okay as far as hardware
is concerned, but by removing the checks you e.g. misguide
xsm_pci_config_permission() in case it handles certain register ranges
specially. Dealing with that was, according to the description, one of the
purposes of the commit above (it's been long enough, so I can only go from
the description in git).

> --- a/xen/arch/x86/hvm/io.c
> +++ b/xen/arch/x86/hvm/io.c
> @@ -248,20 +248,6 @@ void register_g2m_portio_handler(struct domain *d)
>      handler->ops = &g2m_portio_ops;
>  }
>  
> -unsigned int hvm_pci_decode_addr(unsigned int cf8, unsigned int addr,
> -                                 pci_sbdf_t *sbdf)
> -{
> -    ASSERT(CF8_ENABLED(cf8));
> -
> -    sbdf->bdf = CF8_BDF(cf8);
> -    sbdf->seg = 0;
> -    /*
> -     * NB: the lower 2 bits of the register address are fetched from the
> -     * offset into the 0xcfc register when reading/writing to it.
> -     */
> -    return CF8_ADDR_LO(cf8) | (addr & 3);
> -}

I have to admit that I'm surprised that you fold replacing of this
function (purely mechanical afaict) into a change with a significant
functional change. Wouldn't you agree that this may better be split off?

> @@ -1104,6 +1092,11 @@ static int cf_check write_msr(
>          if ( !is_hwdom_pinned_vcpu(curr) )
>              return X86EMUL_OKAY;
>          if ( (rdmsr_safe(MSR_AMD64_NB_CFG, temp) != 0) ||
> +             /*
> +              * TODO: this is broken.  What happens when dom0 is pinned but
> +              * can't see the full system?  CF8_EXT probably ought to be a
> +              * Xen-owned setting, and made symmetric across the system.
> +              */
>               ((val ^ temp) & ~(1ULL << AMD64_NB_CFG_CF8_EXT_ENABLE_BIT)) )
>              goto invalid;
>          if ( wrmsr_safe(MSR_AMD64_NB_CFG, val) == 0 )

What do you mean by "can't see the full system"? Originally Linux used
only the MSR view of the bit to enable ECS, but this was specifically
extended to prefer the PCI config space view instead (24d9b70b8c67
"x86: Use PCI method for enabling AMD extended config space before MSR
method").

Since here we're dealing with the MSR flavor, and since Linux shouldn't
be using this anymore (due to checking whether the bit is clear before
trying to set it), we may be okay with simply purging this code if we
don't care about very old Linux Dom0 anymore (or if we use your
argument of it not reliably affecting the entire system).

As to the setting becoming Xen-owned: Dom0 being responsible for
(almost) all of PCI, it would be somewhat odd to take away from it this
level of control.

Jan



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.