
Re: [PATCH] hvmloader: pass PCI MMIO layout to OVMF as an info table

On 01/11/21 17:31, Igor Druzhinin wrote:
> On 11/01/2021 15:35, Laszlo Ersek wrote:
>> On 01/11/21 16:26, Igor Druzhinin wrote:
>>> On 11/01/2021 15:21, Jan Beulich wrote:
>>>> On 11.01.2021 15:49, Laszlo Ersek wrote:
>>>>> On 01/11/21 15:00, Igor Druzhinin wrote:
>>>>>> On 11/01/2021 09:27, Jan Beulich wrote:
>>>>>>> On 11.01.2021 05:53, Igor Druzhinin wrote:
>>>>>>>> We faced a problem with passing through a PCI device with a 64GB BAR to
>>>>>>>> a UEFI guest. As expected, the BAR is programmed into the 64-bit PCI
>>>>>>>> aperture at the 64G address, which pushes the physical address space to
>>>>>>>> 37 bits. OVMF uses the address width early in the PEI phase to build DXE
>>>>>>>> identity-mapped page tables covering the whole addressable space, so it
>>>>>>>> needs to know the last address it has to cover, but at the same time it
>>>>>>>> must not overdo the mappings.
>>>>>>>> As there is seemingly no other way to pass or get this information in
>>>>>>>> OVMF at this early phase (ACPI is not yet available, PCI is not yet
>>>>>>>> enumerated, xenstore is not yet initialized), extend the info structure
>>>>>>>> with a new table. Since the structure was designed to be extensible,
>>>>>>>> the change is backward compatible.
>>>>>>> How does UEFI handle the same situation on bare metal? I'd guess it is
>>>>>>> in even more trouble there, as it couldn't even read addresses from
>>>>>>> BARs, but would first need to assign them (or at least calculate
>>>>>>> their intended positions).
>>>>>> Maybe Laszlo or Anthony could answer this question quickly while I'm 
>>>>>> investigating?
>>>>> On bare metal, the phys address width of the processor is known.
>>>> From CPUID I suppose.
>>>>> OVMF does the whole calculation in reverse because there's no way for it
>>>>> to know the physical address width of the physical (= host) CPU.
>>>>> "Overdoing" the mappings doesn't only waste resources, it breaks hard
>>>>> with EPT -- access to a GPA that is inexpressible with the phys address
>>>>> width of the host CPU (= not mappable successfully with the nested page
>>>>> tables) will behave very badly. I don't recall the exact symptoms, but it
>>>>> prevents booting the guest OS.
>>>>> This is why the most conservative 36-bit width is assumed by default.
>>>> IOW you don't trust virtualized CPUID output?
>>> I'm discussing this with Andrew, and it appears that, compared to KVM, we're
>>> certainly more lax about wiring the physical address width from the hardware
>>> directly into the guest.
>>> Another problem that I faced while experimenting is that creating page
>>> tables covering 46 bits of address space (the width CPUID returned in my
>>> case) takes about a minute on a modern CPU.
>> Even if you enable 1GiB pages?
>> (In the libvirt domain XML, it's expressed as
>>     <feature policy='require' name='pdpe1gb'/>
>> )
>> ... I'm not doubtful, just curious. I guess that, when the physical
>> address width is so large, a physical UEFI platform firmware will limit
>> itself to a lesser width -- it could even offer some knobs in the setup TUI.
> So it wasn't the feature bit, which we expose by default in Xen, but the OVMF
> configuration, which had 1G pages disabled for that use. I enabled it, and the
> guest now boots in reasonable time even with 46 bits.
> Given that in Xen we're not that sensitive to the physical address width being
> different, and prefer to control it at a different level, I'd like to abandon
> the ABI change approach (does anyone have any objections?) and instead take
> the physical address width directly from CPUID, which we already do in
> hvmloader. The change would be local to the Xen platform.

Yes, as long as you limit the approach to "OvmfPkg/XenPlatformPei" (or,
more generally, to the "OvmfPkg/OvmfXen.dsc" platform), it makes perfect



