Re: [PATCH] hvmloader: pass PCI MMIO layout to OVMF as an info table
On 01/11/21 16:26, Igor Druzhinin wrote:
> On 11/01/2021 15:21, Jan Beulich wrote:
>> On 11.01.2021 15:49, Laszlo Ersek wrote:
>>> On 01/11/21 15:00, Igor Druzhinin wrote:
>>>> On 11/01/2021 09:27, Jan Beulich wrote:
>>>>> On 11.01.2021 05:53, Igor Druzhinin wrote:
>>>>>> We faced a problem with passing through a PCI device with a 64GB BAR
>>>>>> to a UEFI guest. The BAR is, as expected, programmed into the 64-bit
>>>>>> PCI aperture at a 64G address, which pushes the physical address
>>>>>> space to 37 bits. OVMF uses the address width early in the PEI phase
>>>>>> to build DXE identity-mapped page tables covering the whole
>>>>>> addressable space, so it needs to know the last address it has to
>>>>>> cover, while at the same time not overdoing the mappings.
>>>>>>
>>>>>> As there is seemingly no other way to pass or get this information in
>>>>>> OVMF at this early phase (ACPI is not yet available, PCI is not yet
>>>>>> enumerated, xenstore is not yet initialized), extend the info
>>>>>> structure with a new table. Since the structure was initially created
>>>>>> to be extendable, the change is backward compatible.
>>>>>
>>>>> How does UEFI handle the same situation on bare metal? I'd guess it is
>>>>> in even more trouble there, as it couldn't even read addresses from
>>>>> BARs, but would first need to assign them (or at least calculate
>>>>> their intended positions).
>>>>
>>>> Maybe Laszlo or Anthony could answer this question quickly while I'm
>>>> investigating?
>>>
>>> On bare metal, the physical address width of the processor is known.
>>
>> From CPUID, I suppose.
>>
>>> OVMF does the whole calculation in reverse because there's no way for it
>>> to know the physical address width of the physical (= host) CPU.
>>> "Overdoing" the mappings doesn't only waste resources, it breaks hard
>>> with EPT -- access to a GPA that is inexpressible with the physical
>>> address width of the host CPU (= not mappable successfully with the
>>> nested page tables) behaves very badly. I don't recall the exact
>>> symptoms, but it prevents booting the guest OS.
>>>
>>> This is why the most conservative 36-bit width is assumed by default.
>>
>> IOW you don't trust virtualized CPUID output?
>
> I'm discussing this with Andrew, and it appears we're certainly more lax
> than KVM in wiring the physical address width from the hardware directly
> into the guest.
>
> Another problem that I faced while experimenting is that creating page
> tables for 46 bits of address space (which is what CPUID returned in my
> case) takes about a minute on a modern CPU.

Even if you enable 1GiB pages? (In the libvirt domain XML, it's
expressed as

  <feature policy='require' name='pdpe1gb'/>

) ... I'm not doubtful, just curious.

I guess that, when the physical address width is so large, a physical
UEFI platform firmware will limit itself to a lesser width -- it could
even offer some knobs in the setup TUI.

Thanks,
Laszlo
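
For a concrete sense of the arithmetic behind the 1GiB-pages question:
with x86-64 4-level paging, identity-mapping a 46-bit space using 2MiB
leaf pages needs one page directory per GiB mapped, while 1GiB leaf
pages eliminate the page-directory level entirely. A minimal standalone
sketch of the numbers (plain C, not OVMF code; the 46-bit width is
taken from Igor's measurement above):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    unsigned phys_bits = 46;              /* width CPUID reported above */
    uint64_t space = 1ULL << phys_bits;

    /* 2MiB leaves: 1 PML4 page, 1 PDPT per 512GiB, 1 PD per 1GiB. */
    uint64_t pdpts = space >> 39;
    uint64_t pds   = space >> 30;
    uint64_t pages_2m = 1 + pdpts + pds;

    /* 1GiB leaves: the whole page-directory level disappears. */
    uint64_t pages_1g = 1 + pdpts;

    printf("2MiB leaves: %llu table pages (~%llu MiB to allocate/fill)\n",
           (unsigned long long)pages_2m,
           (unsigned long long)(pages_2m * 4096 >> 20));
    printf("1GiB leaves: %llu table pages (~%llu KiB)\n",
           (unsigned long long)pages_1g,
           (unsigned long long)(pages_1g * 4096 >> 10));
    return 0;
}

For 46 bits this comes out to 65,665 table pages (about 256 MiB to
allocate, zero, and fill) with 2MiB leaves versus 129 pages (516 KiB)
with 1GiB leaves, which is why pdpe1gb could make such a difference to
the setup time.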
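On the mechanism the patch relies on: a length-prefixed info structure
can gain a new table while staying backward compatible, because an old
consumer only reads up to the length it knows about. A hypothetical
sketch of the idea in C (names and fields are invented for
illustration; this is not the actual hvmloader/OVMF layout from the
patch):

#include <stddef.h>
#include <stdint.h>

/* Invented example of a length-prefixed info structure.  The new field
 * is appended after the original ones, so an old consumer, which
 * expects a shorter 'length', never looks at it. */
struct ovmf_info_example {
    char     signature[14];   /* identifies the structure              */
    uint8_t  length;          /* number of valid bytes in the struct   */
    uint8_t  checksum;
    /* --- fields above existed before the extension --- */
    uint64_t pci_mmio_table;  /* new: phys addr of an MMIO layout
                                 table; tells early firmware the last
                                 address it must identity-map          */
};

/* Consumer side: trust the new field only if 'length' covers it. */
static uint64_t get_pci_mmio_table(const struct ovmf_info_example *info)
{
    size_t need = offsetof(struct ovmf_info_example, pci_mmio_table)
                  + sizeof(info->pci_mmio_table);
    return info->length >= need ? info->pci_mmio_table : 0;
}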