[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [edk2-devel] [PATCH] OvmfPkg/XenPlatformPei: Use CPUID to get physical address width on Xen



On 01/19/21 10:37, Julien Grall wrote:
> Hi Igor,
> 
> On 13/01/2021 03:42, Igor Druzhinin wrote:
>> We faced a problem with passing through a PCI device with 64GB BAR to
>> UEFI guest. The BAR is expectedly programmed into 64-bit PCI aperture at
>> 64G address which pushes physical address space to 37 bits. That is above
>> 36-bit width that OVMF exposes currently to a guest without tweaking
>> PcdPciMmio64Size knob.
>>
>> The reverse calculation using this knob was inhereted from QEMU-KVM
>> platform
>> code where it serves the purpose of finding max accessible physical
>> address without necessary trusting emulated CPUID physbits value (that
>> could
>> be different from host physbits). On Xen we expect to use CPUID policy
>> to level the data correctly to prevent situations with guest physbits >
>> host physbits e.g. across migrations.
>>
>> The next aspect raising concern - resource consumption for DXE IPL
>> page tables
>> and time required to map the whole address space in case of using CPUID
>> bits directly. That could be mitigated by enabling support for 1G pages
>> in DXE IPL configuration. 1G pages are available on most CPUs produced in
>> the last 10 years and those without don't have many phys bits.
>>
>> Remove all the redundant code now (including PcdPciMmio64.. handling
>> that's
>> not used on Xen anyway) and grab physbits directly from CPUID that should
>> be what baremetal UEFI systems do.
>>
>> Signed-off-by: Igor Druzhinin <igor.druzhinin@xxxxxxxxxx>
> 
> Reviewed-by: Julien Grall <julien@xxxxxxx>

I'm going to hold this patch until Anthony responds in this thread as well:

http://mid.mail-archive.com/63cf6053-9787-d8cf-db18-982ebcab1780@xxxxxxxxxx
https://edk2.groups.io/g/devel/message/70541

Thanks
Laszlo

> 
> Cheers,
> 
>> ---
>>   OvmfPkg/OvmfXen.dsc                |   3 +
>>   OvmfPkg/XenPlatformPei/MemDetect.c | 166
>> +++----------------------------------
>>   2 files changed, 15 insertions(+), 154 deletions(-)
>>
>> diff --git a/OvmfPkg/OvmfXen.dsc b/OvmfPkg/OvmfXen.dsc
>> index 7d31e88..8ae6ed0 100644
>> --- a/OvmfPkg/OvmfXen.dsc
>> +++ b/OvmfPkg/OvmfXen.dsc
>> @@ -444,6 +444,9 @@
>>     ## Xen vlapic's frequence is 100 MHz
>>     gEfiMdePkgTokenSpaceGuid.PcdFSBClock|100000000
>>   +  # We populate DXE IPL tables with 1G pages preferably on Xen
>> +  gEfiMdeModulePkgTokenSpaceGuid.PcdUse1GPageTable|TRUE
>> +
>>  
>> ################################################################################
>>
>>   #
>>   # Pcd Dynamic Section - list of all EDK II PCD Entries defined by
>> this Platform
>> diff --git a/OvmfPkg/XenPlatformPei/MemDetect.c
>> b/OvmfPkg/XenPlatformPei/MemDetect.c
>> index 1f81eee..1970b63 100644
>> --- a/OvmfPkg/XenPlatformPei/MemDetect.c
>> +++ b/OvmfPkg/XenPlatformPei/MemDetect.c
>> @@ -172,175 +172,33 @@ GetSystemMemorySizeBelow4gb (
>>     return (UINT32) (((UINTN)((Cmos0x35 << 8) + Cmos0x34) << 16) +
>> SIZE_16MB);
>>   }
>>   -
>> -STATIC
>> -UINT64
>> -GetSystemMemorySizeAbove4gb (
>> -  )
>> -{
>> -  UINT32 Size;
>> -  UINTN  CmosIndex;
>> -
>> -  //
>> -  // In PVH case, there is no CMOS, we have to calculate the memory size
>> -  // from parsing the E820
>> -  //
>> -  if (XenPvhDetected ()) {
>> -    UINT64  HighestAddress;
>> -
>> -    HighestAddress = GetHighestSystemMemoryAddress (FALSE);
>> -    ASSERT (HighestAddress == 0 || HighestAddress >= BASE_4GB);
>> -
>> -    if (HighestAddress >= BASE_4GB) {
>> -      HighestAddress -= BASE_4GB;
>> -    }
>> -
>> -    return HighestAddress;
>> -  }
>> -
>> -  //
>> -  // CMOS 0x5b-0x5d specifies the system memory above 4GB MB.
>> -  // * CMOS(0x5d) is the most significant size byte
>> -  // * CMOS(0x5c) is the middle size byte
>> -  // * CMOS(0x5b) is the least significant size byte
>> -  // * The size is specified in 64kb chunks
>> -  //
>> -
>> -  Size = 0;
>> -  for (CmosIndex = 0x5d; CmosIndex >= 0x5b; CmosIndex--) {
>> -    Size = (UINT32) (Size << 8) + (UINT32) CmosRead8 (CmosIndex);
>> -  }
>> -
>> -  return LShiftU64 (Size, 16);
>> -}
>> -
>> -
>> -/**
>> -  Return the highest address that DXE could possibly use, plus one.
>> -**/
>> -STATIC
>> -UINT64
>> -GetFirstNonAddress (
>> -  VOID
>> -  )
>> -{
>> -  UINT64               FirstNonAddress;
>> -  UINT64               Pci64Base, Pci64Size;
>> -  RETURN_STATUS        PcdStatus;
>> -
>> -  FirstNonAddress = BASE_4GB + GetSystemMemorySizeAbove4gb ();
>> -
>> -  //
>> -  // If DXE is 32-bit, then we're done; PciBusDxe will degrade 64-bit
>> MMIO
>> -  // resources to 32-bit anyway. See DegradeResource() in
>> -  // "PciResourceSupport.c".
>> -  //
>> -#ifdef MDE_CPU_IA32
>> -  if (!FeaturePcdGet (PcdDxeIplSwitchToLongMode)) {
>> -    return FirstNonAddress;
>> -  }
>> -#endif
>> -
>> -  //
>> -  // Otherwise, in order to calculate the highest address plus one,
>> we must
>> -  // consider the 64-bit PCI host aperture too. Fetch the default size.
>> -  //
>> -  Pci64Size = PcdGet64 (PcdPciMmio64Size);
>> -
>> -  if (Pci64Size == 0) {
>> -    if (mBootMode != BOOT_ON_S3_RESUME) {
>> -      DEBUG ((DEBUG_INFO, "%a: disabling 64-bit PCI host aperture\n",
>> -        __FUNCTION__));
>> -      PcdStatus = PcdSet64S (PcdPciMmio64Size, 0);
>> -      ASSERT_RETURN_ERROR (PcdStatus);
>> -    }
>> -
>> -    //
>> -    // There's nothing more to do; the amount of memory above 4GB fully
>> -    // determines the highest address plus one. The memory hotplug
>> area (see
>> -    // below) plays no role for the firmware in this case.
>> -    //
>> -    return FirstNonAddress;
>> -  }
>> -
>> -  //
>> -  // SeaBIOS aligns both boundaries of the 64-bit PCI host aperture
>> to 1GB, so
>> -  // that the host can map it with 1GB hugepages. Follow suit.
>> -  //
>> -  Pci64Base = ALIGN_VALUE (FirstNonAddress, (UINT64)SIZE_1GB);
>> -  Pci64Size = ALIGN_VALUE (Pci64Size, (UINT64)SIZE_1GB);
>> -
>> -  //
>> -  // The 64-bit PCI host aperture should also be "naturally" aligned.
>> The
>> -  // alignment is determined by rounding the size of the aperture
>> down to the
>> -  // next smaller or equal power of two. That is, align the aperture
>> by the
>> -  // largest BAR size that can fit into it.
>> -  //
>> -  Pci64Base = ALIGN_VALUE (Pci64Base, GetPowerOfTwo64 (Pci64Size));
>> -
>> -  if (mBootMode != BOOT_ON_S3_RESUME) {
>> -    //
>> -    // The core PciHostBridgeDxe driver will automatically add this
>> range to
>> -    // the GCD memory space map through our PciHostBridgeLib
>> instance; here we
>> -    // only need to set the PCDs.
>> -    //
>> -    PcdStatus = PcdSet64S (PcdPciMmio64Base, Pci64Base);
>> -    ASSERT_RETURN_ERROR (PcdStatus);
>> -    PcdStatus = PcdSet64S (PcdPciMmio64Size, Pci64Size);
>> -    ASSERT_RETURN_ERROR (PcdStatus);
>> -
>> -    DEBUG ((DEBUG_INFO, "%a: Pci64Base=0x%Lx Pci64Size=0x%Lx\n",
>> -      __FUNCTION__, Pci64Base, Pci64Size));
>> -  }
>> -
>> -  //
>> -  // The useful address space ends with the 64-bit PCI host aperture.
>> -  //
>> -  FirstNonAddress = Pci64Base + Pci64Size;
>> -  return FirstNonAddress;
>> -}
>> -
>> -
>>   /**
>> -  Initialize the mPhysMemAddressWidth variable, based on guest RAM size.
>> +  Initialize the mPhysMemAddressWidth variable, based on CPUID data.
>>   **/
>>   VOID
>>   AddressWidthInitialization (
>>     VOID
>>     )
>>   {
>> -  UINT64 FirstNonAddress;
>> +  UINT32 RegEax;
>>   -  //
>> -  // As guest-physical memory size grows, the permanent PEI RAM
>> requirements
>> -  // are dominated by the identity-mapping page tables built by the
>> DXE IPL.
>> -  // The DXL IPL keys off of the physical address bits advertized in
>> the CPU
>> -  // HOB. To conserve memory, we calculate the minimum address width
>> here.
>> -  //
>> -  FirstNonAddress      = GetFirstNonAddress ();
>> -  mPhysMemAddressWidth = (UINT8)HighBitSet64 (FirstNonAddress);
>> -
>> -  //
>> -  // If FirstNonAddress is not an integral power of two, then we need an
>> -  // additional bit.
>> -  //
>> -  if ((FirstNonAddress & (FirstNonAddress - 1)) != 0) {
>> -    ++mPhysMemAddressWidth;
>> +  AsmCpuid (0x80000000, &RegEax, NULL, NULL, NULL);
>> +  if (RegEax >= 0x80000008) {
>> +    AsmCpuid (0x80000008, &RegEax, NULL, NULL, NULL);
>> +    mPhysMemAddressWidth = (UINT8) RegEax;
>> +  } else {
>> +    mPhysMemAddressWidth = 36;
>>     }
>>       //
>> -  // The minimum address width is 36 (covers up to and excluding 64
>> GB, which
>> -  // is the maximum for Ia32 + PAE). The theoretical architecture
>> maximum for
>> -  // X64 long mode is 52 bits, but the DXE IPL clamps that down to 48
>> bits. We
>> -  // can simply assert that here, since 48 bits are good enough for
>> 256 TB.
>> +  // IA-32e paging translates 48-bit linear addresses to 52-bit
>> physical addresses.
>>     //
>> -  if (mPhysMemAddressWidth <= 36) {
>> -    mPhysMemAddressWidth = 36;
>> +  ASSERT (mPhysMemAddressWidth <= 52);
>> +  if (mPhysMemAddressWidth > 48) {
>> +    mPhysMemAddressWidth = 48;
>>     }
>> -  ASSERT (mPhysMemAddressWidth <= 48);
>>   }
>>   -
>>   /**
>>     Calculate the cap for the permanent PEI memory.
>>   **/
>>
> 




 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.