Re: [PATCH] hvmloader: pass PCI MMIO layout to OVMF as an info table
On 01/11/21 17:31, Igor Druzhinin wrote:
> On 11/01/2021 15:35, Laszlo Ersek wrote:
>> On 01/11/21 16:26, Igor Druzhinin wrote:
>>> On 11/01/2021 15:21, Jan Beulich wrote:
>>>> On 11.01.2021 15:49, Laszlo Ersek wrote:
>>>>> On 01/11/21 15:00, Igor Druzhinin wrote:
>>>>>> On 11/01/2021 09:27, Jan Beulich wrote:
>>>>>>> On 11.01.2021 05:53, Igor Druzhinin wrote:
>>>>>>>> We faced a problem with passing through a PCI device with a 64GB BAR
>>>>>>>> to a UEFI guest. The BAR is, as expected, programmed into the 64-bit
>>>>>>>> PCI aperture at the 64G address, which pushes the physical address
>>>>>>>> space to 37 bits. OVMF uses the address width early in the PEI phase
>>>>>>>> to build DXE identity page tables covering the whole addressable
>>>>>>>> space, so it needs to know the last address it has to cover while at
>>>>>>>> the same time not overdoing the mappings.
>>>>>>>>
>>>>>>>> As there is seemingly no other way to pass or get this information in
>>>>>>>> OVMF at this early phase (ACPI is not yet available, PCI is not yet
>>>>>>>> enumerated, xenstore is not yet initialized) - extend the info
>>>>>>>> structure with a new table. Since the structure was initially created
>>>>>>>> to be extendable, the change is backward compatible.
>>>>>>>
>>>>>>> How does UEFI handle the same situation on bare metal? I'd guess it is
>>>>>>> in even more trouble there, as it couldn't even read addresses from
>>>>>>> BARs, but would first need to assign them (or at least calculate
>>>>>>> their intended positions).
>>>>>>
>>>>>> Maybe Laszlo or Anthony could answer this question quickly while I'm
>>>>>> investigating?
>>>>>
>>>>> On bare metal, the physical address width of the processor is known.
>>>>
>>>> From CPUID I suppose.
>>>>
>>>>> OVMF does the whole calculation in reverse because there's no way for it
>>>>> to know the physical address width of the physical (= host) CPU.
>>>>> "Overdoing" the mappings doesn't only waste resources, it breaks hard
>>>>> with EPT -- access to a GPA that is inexpressible with the physical
>>>>> address width of the host CPU (= not mappable successfully with the
>>>>> nested page tables) behaves very badly. I don't recall the exact
>>>>> symptoms, but it prevents booting the guest OS.
>>>>>
>>>>> This is why the most conservative 36-bit width is assumed by default.
>>>>
>>>> IOW you don't trust virtualized CPUID output?
>>>
>>> I'm discussing this with Andrew, and it appears we're certainly more lax
>>> about wiring the physical address width from hardware directly into the
>>> guest than KVM is.
>>>
>>> Another problem that I faced while experimenting is that creating page
>>> tables for 46 bits of address space (which CPUID returned in my case)
>>> takes about a minute on a modern CPU.
>>
>> Even if you enable 1GiB pages?
>>
>> (In the libvirt domain XML, it's expressed as
>>
>>   <feature policy='require' name='pdpe1gb'/>
>> )
>>
>> ... I'm not doubtful, just curious. I guess that, when the physical
>> address width is so large, a physical UEFI platform firmware will limit
>> itself to a lesser width -- it could even offer some knobs in the setup
>> TUI.
>
> So it wasn't the feature bit that we expose by default in Xen, but the OVMF
> configuration with 1G pages disabled for that use case. I enabled it, and it
> now boots in reasonable time even with 46 bits.
> Given that in Xen we're not that sensitive to the physical address width
> being different, and prefer to control that at a different level, I'd like
> to abandon that ABI change approach (does anyone have any objections?) and
> instead take the physical address width directly from CPUID, which we
> already do in hvmloader. The change would be local to the Xen platform.

Yes, as long as you limit the approach to "OvmfPkg/XenPlatformPei" (or, more
generally, to the "OvmfPkg/OvmfXen.dsc" platform), it makes perfect sense.

Thanks!
Laszlo
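For reference, the arithmetic behind the 1GiB-page observation: identity-mapping
a 46-bit address space with 2 MiB pages takes 2^(46-21) = 2^25 (about 33.5
million) leaf entries, roughly 256 MiB of page tables at 8 bytes per entry,
while 1 GiB pages cut that to 2^(46-30) = 65,536 entries, which is why enabling
1G pages brings the mapping time back to something reasonable. Below is a
minimal C sketch of the CPUID-based approach discussed above, assuming a
GCC/Clang toolchain with <cpuid.h>; the function names and the conservative
36-bit fallback are illustrative assumptions and do not come from the actual
hvmloader or OvmfPkg/XenPlatformPei code.

/*
 * Illustrative sketch only, not the actual hvmloader or OVMF code.
 * CPUID leaf 0x80000008 reports the physical address width in EAX[7:0];
 * if the leaf is unavailable, fall back to a conservative 36 bits.
 */
#include <stdint.h>
#include <stdio.h>
#include <cpuid.h>   /* __get_cpuid(), provided by GCC/Clang */

#define DEFAULT_PHYS_ADDR_BITS 36u   /* illustrative fallback */

static unsigned int get_phys_addr_bits(void)
{
    unsigned int eax, ebx, ecx, edx;

    if (__get_cpuid(0x80000008, &eax, &ebx, &ecx, &edx))
        return eax & 0xff;           /* EAX[7:0] = physical address bits */

    return DEFAULT_PHYS_ADDR_BITS;
}

/*
 * Leaf page-table entries needed to identity-map 2^phys_bits bytes with
 * pages of size 2^page_shift (21 for 2 MiB, 30 for 1 GiB).  For 46 bits:
 * 2 MiB pages need 2^25 entries, 1 GiB pages only 2^16.
 */
static uint64_t identity_map_entries(unsigned int phys_bits,
                                     unsigned int page_shift)
{
    return 1ULL << (phys_bits - page_shift);
}

int main(void)
{
    unsigned int bits = get_phys_addr_bits();

    printf("physical address bits: %u\n", bits);
    printf("2MiB-page entries: %llu, 1GiB-page entries: %llu\n",
           (unsigned long long)identity_map_entries(bits, 21),
           (unsigned long long)identity_map_entries(bits, 30));
    return 0;
}

Note that on a guest this reads the virtualized CPUID, which is exactly the
trust question raised in the thread; the clamp to a smaller default is only a
sketch of one way to stay conservative when the leaf is missing.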