
Re: [PATCH] hvmloader: pass PCI MMIO layout to OVMF as an info table

On 01/11/21 16:26, Igor Druzhinin wrote:
> On 11/01/2021 15:21, Jan Beulich wrote:
>> On 11.01.2021 15:49, Laszlo Ersek wrote:
>>> On 01/11/21 15:00, Igor Druzhinin wrote:
>>>> On 11/01/2021 09:27, Jan Beulich wrote:
>>>>> On 11.01.2021 05:53, Igor Druzhinin wrote:
>>>>>> We faced a problem with passing through a PCI device with a 64GB BAR
>>>>>> to a UEFI guest. As expected, the BAR is programmed into the 64-bit PCI
>>>>>> aperture at the 64G address, which pushes the physical address space to
>>>>>> 37 bits. OVMF uses the address width early in the PEI phase to build DXE
>>>>>> identity page tables covering the whole addressable space, so it needs
>>>>>> to know the last address it must cover without overdoing the mappings.
>>>>>> As there is seemingly no other way to pass or get this information in
>>>>>> OVMF at this early phase (ACPI is not yet available, PCI is not yet
>>>>>> enumerated, xenstore is not yet initialized), extend the info structure
>>>>>> with a new table. Since the structure was initially created to be
>>>>>> extendable, the change is backward compatible.
>>>>> How does UEFI handle the same situation on baremetal? I'd guess it is
>>>>> in even more trouble there, as it couldn't even read addresses from
>>>>> BARs, but would first need to assign them (or at least calculate
>>>>> their intended positions).
>>>> Maybe Laszlo or Anthony could answer this question quickly while I'm
>>>> investigating?
>>> On the bare metal, the phys address width of the processor is known.
>> From CPUID I suppose.
>>> OVMF does the whole calculation in reverse because there's no way for it
>>> to know the physical address width of the physical (= host) CPU.
>>> "Overdoing" the mappings doesn't only waste resources, it breaks hard
>>> with EPT -- access to a GPA that is inexpressible with the phys address
>>> width of the host CPU (= not mappable successfully with the nested page
>>> tables) will behave super bad. I don't recall the exact symptoms, but it
>>> prevents booting the guest OS.
>>> This is why the most conservative 36-bit width is assumed by default.
>> IOW you don't trust virtualized CPUID output?
> I'm discussing this with Andrew and it appears we're certainly more lax
> than KVM in wiring the physical address width from hardware directly into
> the guest.
> Another problem that I faced while experimenting is that creating page
> tables for 46 bits (that CPUID returned in my case) of address space takes
> about a minute on a modern CPU.

Even if you enable 1GiB pages?
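
To put rough numbers on that question, here is a minimal sketch that counts
the 4 KiB page-table pages an identity map needs for a given address width.
It assumes 4-level paging and widths between 30 and 48 bits; the function
name is hypothetical and this is not the OVMF code itself.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: number of 4 KiB page-table pages needed to
 * identity-map 2^phys_bits bytes with 4-level paging.
 * Assumption: 30 <= phys_bits <= 48.
 */
static uint64_t identity_map_table_pages(unsigned phys_bits, int use_1g_pages)
{
    assert(phys_bits >= 30 && phys_bits <= 48);

    /* Each PML4 entry (one PDPT) covers 2^39 bytes (512 GiB). */
    uint64_t pml4_entries = (phys_bits > 39) ? (1ULL << (phys_bits - 39)) : 1;

    /* Each PDPT entry covers 2^30 bytes (1 GiB). */
    uint64_t pdpt_entries = (phys_bits > 39) ? 512 : (1ULL << (phys_bits - 30));

    if (use_1g_pages) {
        /* One PML4 page plus one PDPT page per PML4 entry. */
        return 1 + pml4_entries;
    }

    /* With 2 MiB pages, every PDPT entry also needs a page-directory page. */
    return 1 + pml4_entries * (1 + pdpt_entries);
}
```

Under these assumptions, 46 bits with 2 MiB pages needs 65665 table pages
(about 256 MiB of page tables), versus 129 pages with 1 GiB pages. At 36 bits
the figures are 66 and 2 pages respectively.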

In the libvirt domain XML, that is expressed as:

    <feature policy='require' name='pdpe1gb'/>

I'm not doubtful, just curious. I guess that, when the physical
address width is that large, physical UEFI platform firmware will limit
itself to a smaller width; it could even offer some knobs in the setup TUI.
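
As an illustration of limiting the mapped width, the sketch below decodes the
physical address width from CPUID leaf 0x80000008 (EAX bits 7:0) and then
picks the smallest width that still covers a given last address, never going
below a conservative floor or above what CPUID reports. The policy and all
function names are assumptions for illustration, not what OVMF or hvmloader
actually implements.

```c
#include <assert.h>
#include <stdint.h>

/* CPUID leaf 0x80000008 reports the physical address width in EAX[7:0]. */
static unsigned phys_bits_from_cpuid_eax(uint32_t eax)
{
    return eax & 0xffu;
}

/* Number of address bits needed to express last_addr (inclusive). */
static unsigned bits_to_cover(uint64_t last_addr)
{
    unsigned bits = 0;
    while (bits < 64 && (last_addr >> bits) != 0)
        bits++;
    return bits;
}

/*
 * Hypothetical policy: map only as much as the layout requires, at least
 * floor_bits (e.g. the 36-bit default), and never more than cpuid_bits.
 */
static unsigned choose_map_bits(unsigned cpuid_bits, uint64_t last_addr,
                                unsigned floor_bits)
{
    assert(cpuid_bits >= floor_bits);

    unsigned needed = bits_to_cover(last_addr);
    if (needed < floor_bits)
        needed = floor_bits;
    return (needed > cpuid_bits) ? cpuid_bits : needed;
}
```

For example, with a 64 GiB BAR placed at 64 GiB the last address is
2^37 - 1, so with a 46-bit CPUID width and a 36-bit floor this sketch would
choose 37 bits.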




