Xen project Mailing List

Re: [Xen-devel] PV guest with PCI passthrough crash on Xen 4.8.3 inside KVM when booted through OVMF

From: Juergen Gross <jgross@xxxxxxxx>

Date: Mon, 27 Nov 2023 17:05:52 +0100

Delivery-date: Mon, 27 Nov 2023 16:06:05 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 27.11.23 16:56, Jason Andryuk wrote:

On Mon, Nov 27, 2023 at 6:27 AM Marek Marczykowski-Górecki
<marmarek@xxxxxxxxxxxxxxxxxxxxxx> wrote:


On Mon, Nov 27, 2023 at 11:20:36AM +0000, Frediano Ziglio wrote:

On Sun, Nov 26, 2023 at 2:51 PM Marek Marczykowski-Górecki
<marmarek@xxxxxxxxxxxxxxxxxxxxxx> wrote:


On Mon, Feb 19, 2018 at 06:30:14PM +0100, Juergen Gross wrote:

On 16/02/18 20:02, Andrew Cooper wrote:

On 16/02/18 18:51, Marek Marczykowski-Górecki wrote:

On Fri, Feb 16, 2018 at 05:52:50PM +0000, Andrew Cooper wrote:

On 16/02/18 17:48, Marek Marczykowski-Górecki wrote:

Hi,

As in the subject, the guest crashes on boot, before the kernel outputs
anything. I've isolated this to the conditions below:
  - PV guest has a PCI device assigned (e1000e emulated by QEMU in this case);
    without a PCI device it works
  - Xen (in KVM) is started through OVMF; with SeaBIOS it works
  - nested HVM is disabled in KVM
  - AMD IOMMU emulation is disabled in KVM; when enabled, QEMU crashes on
    boot (looks like a QEMU bug, unrelated to this one)

Version info:
  - KVM host: OpenSUSE 42.3, qemu 2.9.1, ovmf-2017+git1492060560.b6d11d7c46-4.1, AMD
  - Xen host: Xen 4.8.3, dom0: Linux 4.14.13
  - Xen domU: Linux 4.14.13, direct boot

Not sure if relevant, but initially I tried booting xen.efi with /mapbs
/noexitboot, and then the dom0 kernel crashed saying something about a
conflict between the e820 and kernel mappings. Those options are now disabled.

The crash message:
(XEN) d1v0 Unhandled invalid opcode fault/trap [#6, ec=0000]
(XEN) domain_crash_sync called from entry.S: fault at ffff82d080218720 entry.o#create_bounce_frame+0x137/0x146
(XEN) Domain 1 (vcpu#0) crashed on cpu#1:
(XEN) ----[ Xen-4.8.3  x86_64  debug=n   Not tainted ]----
(XEN) CPU:    1
(XEN) RIP:    e033:[<ffffffff826d9156>]

This is #UD, which is most probably hitting a BUG().  addr2line this ^
to find some code to look at.

addr2line failed me


By default, vmlinux is stripped and compressed.  Ideally you want to
addr2line the vmlinux artefact in the root of your kernel build, which
is the plain elf with debugging symbols.

Alternatively, use scripts/extract-vmlinux on the binary you actually
booted, which might get you somewhere.

, but System.map says it's xen_memory_setup. And it
looks like the BUG() is the same one I hit in dom0 before:
"Xen hypervisor allocated kernel memory conflicts with E820 map".


Juergen: Is there anything we can do to try and insert some dummy
exception handlers right at PV start, so we could at least print out a
one-liner to the host console which is a little more helpful than Xen
saying "something unknown went wrong" ?


You mean something like commit 42b3a4cb5609de757f5445fcad18945ba9239a07
added to kernel 4.15?


Disabling e820_host in guest config solved the problem. Thanks!

Is this some bug in Xen or OVMF, or is it expected behavior and e820_host
should be avoided?


I don't really know.  e820_host is a gross hack which shouldn't really
be present.  The actual problem is that Linux can't cope with the
memory layout it was given (and I can't recall if there is anything
Linux could potentially do to cope).  OTOH, the toolstack, which knew
about e820_host and chose to lay the guest out in an overlapping way, is
probably also at fault.


The kernel can cope with lots of E820 scenarios (e.g. by relocating
initrd or the p2m map), but moving itself out of the way is not
possible.


I'm afraid I need to resurrect this thread...

With recent kernels (6.6+), the e820_host=0 workaround is not an option
anymore. It makes Linux not initialize xen-swiotlb (due to
f9a38ea5172a3365f4594335ed5d63e15af2fd18), so PCI passthrough doesn't
work at all. While I can add yet another layer of workaround (force
xen-swiotlb with iommu=soft), that's getting unwieldy.

Furthermore, I don't get the crash message anymore, even with a debug
hypervisor and guest_loglvl=all. Not even "Domain X crashed" in `xl
dmesg`. It looks like the "crash" shutdown reason doesn't reach Xen, and
it's considered a clean shutdown (I confirmed this by changing various
`on_*` settings (via libvirt) and observing which one gets applied).

Most of my tests were done with 6.7-rc1, but I already observed the issue
on 6.6.1.

This is on Xen 4.17.2. The L0 is running Linux 6.6.1 and uses
QEMU 8.1.2 + OVMF 202308 to run Xen as L1.


So basically you start the domain and, according to the logs, it looks
like it's shutting down cleanly.
Can you see anything from the guest? Can you turn on some more
debugging at guest level?


No, it crashes before printing anything to the console, also with
earlyprintk=xen.

I tried to get some more information from the initial crash but I
could not understand which guest code triggered the bug.


I'm not sure which one it is this time (because Xen isn't reporting
the guest crash...), but last time it was here:
https://github.com/torvalds/linux/blob/master/arch/x86/xen/setup.c#L873-L874


Hi Marek,

I too have run into this "Xen hypervisor allocated kernel memory
conflicts with E820 map" error when running Xen under KVM & OVMF with
SecureBoot.  OVMF built without SecureBoot did not trip over the
issue.  It was a little while back - I have some notes though.

Non-SecureBoot
(XEN)  [0000000000810000, 00000000008fffff] (ACPI NVS)
(XEN)  [0000000000900000, 000000007f8eefff] (usable)

SecureBoot
(XEN)  [0000000000810000, 000000000170ffff] (ACPI NVS)
(XEN)  [0000000001710000, 000000007f0edfff] (usable)

Linux (under Xen) is checking that _pa(_text) (= 0x1000000) is RAM,
but it is not.  Looking at the E820 map, there is a type 4 (ACPI NVS)
region defined:
[0000000000810000, 000000000170ffff] (ACPI NVS)

When OVMF is built with SMM (for SecureBoot) and S3Supported is true,
the memory range 0x900000-0x170ffff is additionally marked ACPI NVS,
and Linux trips over this.  Without SecureBoot, that range is usable
RAM, so Linux boots fine.

What I don't understand is why there is even a check that _pa(_text)
is RAM.  Xen logs that it places dom0 way up high in memory, so the
physical addresses of the kernel pages are much higher than 0x1000000.
The value 0x1000000 for _pa(_text) doesn't match reality.  Maybe there
are some expectations for the ACPI NVS and other reserved regions to
be 1-1 mapped?  I tried removing the BUG mentioned above, but it still
failed to boot.  I think I also removed a second BUG, but
unfortunately I don't have notes on either.

The _guest_ physical address is what matters here. When using the host E820 map, the PV kernel tries to rearrange its guest physical memory layout to match the E820 map. A non-RAM GPA at the location where the kernel is located then triggers the BUG.

Juergen
