[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 2/2] xen/x86/pvh: copy ACPI tables to Dom0 instead of mapping


  • To: Stefano Stabellini <sstabellini@xxxxxxxxxx>
  • From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Date: Wed, 17 May 2023 10:40:56 +0200
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=citrix.com; dmarc=pass action=none header.from=citrix.com; dkim=pass header.d=citrix.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=eAq9fnKH21Pz3T30qI0X/bMwHGnzIAvBXdpHc/TAM1E=; b=KDe6nIQG/1VasmtqESuu8wELR0bfkJtbyRQbMYRPmqt22gGGgW6AmDnAm5wjmh7mahw7tiAXhNGaO3U2cArMvZwtDMcsB/6Py7+hEbcR8r2q0p2YBTr/i6c4Ahjn2iJ3LZbay4cDZdIwp+yPgYouI5MjkuXdrcFZ7+BZZYIjKFAPv/Waqad+Lt8SC3EY3Af/MXyeDYQISURGKZybIqX94xIzsxRAC6orluKFAK4qEEMBvKRLy/yHC2/TKsXYAR44tmwL9JKb9JTuGXQKZhyxxuG4C8l+SqOsaxmezpOJDSF4Ez7TOfgOBTcopTARzWjWbzL2TyjRDXRh5arlrIlaQg==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=kpVbJ9XDUXwTk1ErFUhVvLy/X60tfFhei1V8ppc7paEAc4Evk9dbUPUzI9LqRyDxTVlGQGZRA+Al3KYcVfDur9HCDUnr+DOFsWrNfWZNEdDQXBvFZZg99E2pg3evP8T0Wvc4LKmN9IEqYzK7RZcJb5BHLY5QAtfl8K30Uko1Cy7w3xG26aIYyKTjySTLvjBO0MWAmEadh8//BFPhB2YtoXDe2R7D4W4FqOum4/SbXgFlMzevFs+qR6aY6W9ohCHfGRjJnJifI57J8RlvHkkN5/ckHm4XaabCv9bhX/uz/PdK8e2697Xd0E6WEMIhny+3gXuvxiebSslDK/LLG8rrCA==
  • Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=citrix.com;
  • Cc: andrew.cooper3@xxxxxxxxxx, jbeulich@xxxxxxxx, xen-devel@xxxxxxxxxxxxxxxxxxxx, Xenia.Ragiadakou@xxxxxxx, Stefano Stabellini <stefano.stabellini@xxxxxxx>
  • Delivery-date: Wed, 17 May 2023 08:41:30 +0000
  • Ironport-data: A9a23:EkKTmqBKzfKC8hVW/8niw5YqxClBgxIJ4kV8jS/XYbTApDl312NRz WtLXT+AOK2Na2L1eNp2OtuyphsDuJ7Sy9ExQQY4rX1jcSlH+JHPbTi7wuUcHAvJd5GeExg3h yk6QoOdRCzhZiaE/n9BCpC48T8nk/nOHuGmYAL9EngZbRd+Tys8gg5Ulec8g4p56fC0GArIs t7pyyHlEAbNNwVcbyRFuspvlDs15K6p4G5B4QRnDRx2lAS2e0c9Xcp3yZ6ZdxMUcqEMdsamS uDKyq2O/2+x13/B3fv8z94X2mVTKlLjFVDmZkh+AsBOsTAbzsAG6Y4pNeJ0VKtio27hc+ada jl6ncfYpQ8BZsUgkQmGOvVSO3kW0aZuoNcrLZUj2CA6IoKvn3bEmp1T4E8K0YIw6spnPGZ37 NoiGGpWMCyGoMeVwIqXVbw57igjBJGD0II3nFhFlGucIdN4BJfJTuPN+MNS2yo2ioZWB/HCa sEFaD1pKhPdfxlIPVRRA5U79AuqriCnL3sE9xTI9OxuvDe7IA9ZidABNPLPfdOHX4NNl1uwr WPa5WXpRBodMbRzzBLcqiL017SUx3mTtIQ6HbKH0MRLrQyvhXFLMxYWTkOaq+uysxvrMz5YA wlOksY0loAp71CiRNT5Wxy+oVaHswQaVt4WFPc1gCmPwKfJ5weSBkAfUyVMLtchsaceVTEsk 1OEgd7tLThuq6GOD2KQ8K+OqjG/MjRTKnUNDQc0QA0E6p/ZqY4yhx7GTdF+OKewgpv+HjSY6 yuWoSY3gbJVltIC3ai/+VHBghqlo5SPRQkwjjg7RUqg5wJ9IYKgOYqh7AGB6e4addnGCF6co HIDhs6SqvgUCo2AnzCMR+NLG6y14/GCM3vXhlsH84QdyglBMkWLJeh4iAyS7m8wWirYUVcFu HPuhD4=
  • Ironport-hdrordr: A9a23:qL9xUq0wbrtXwAYMdwEKFAqjBHYkLtp133Aq2lEZdPU0SKGlfq GV7ZEmPHrP4gr5N0tOpTntAse9qBDnhPxICOsqXYtKNTOO0AeVxelZhrcKqAeQeBEWmNQ96U 9hGZIOcuEZDzJB/LvHCN/TKadd/DGFmprY+ts31x1WPGVXgzkL1XYANu6ceHcGIzVuNN4CO7 e3wNFInDakcWR/VLXBOpFUN9KzweEijfjdEGc7OyI=
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Tue, May 16, 2023 at 04:34:09PM -0700, Stefano Stabellini wrote:
> On Tue, 16 May 2023, Roger Pau Monné wrote:
> > On Mon, May 15, 2023 at 05:11:25PM -0700, Stefano Stabellini wrote:
> > > On Mon, 15 May 2023, Roger Pau Monné wrote:
> > > > On Fri, May 12, 2023 at 06:17:20PM -0700, Stefano Stabellini wrote:
> > > > > From: Stefano Stabellini <stefano.stabellini@xxxxxxx>
> > > > > 
> > > > > Mapping the ACPI tables to Dom0 PVH 1:1 leads to memory corruptions of
> > > > > the tables in the guest. Instead, copy the tables to Dom0.
> > > > > 
> > > > > This is a workaround.
> > > > > 
> > > > > Signed-off-by: Stefano Stabellini <stefano.stabellini@xxxxxxx>
> > > > > ---
> > > > > As mentioned in the cover letter, this is a RFC workaround as I don't
> > > > > know the cause of the underlying problem. I do know that this patch
> > > > > solves what would be otherwise a hang at boot when Dom0 PVH attempts 
> > > > > to
> > > > > parse ACPI tables.
> > > > 
> > > > I'm unsure how safe this is for native systems, as it's possible for
> > > > firmware to modify the data in the tables, so copying them would
> > > > break that functionality.
> > > > 
> > > > I think we need to get to the root cause that triggers this behavior
> > > > on QEMU.  Is it the table checksum that fail, or something else?  Is
> > > > there an error from Linux you could reference?
> > > 
> > > I agree with you but so far I haven't managed to find a way to the root
> > > of the issue. Here is what I know. These are the logs of a successful
> > > boot using this patch:
> > > 
> > > [   10.437488] ACPI: Early table checksum verification disabled
> > > [   10.439345] ACPI: RSDP 0x000000004005F955 000024 (v02 BOCHS )
> > > [   10.441033] ACPI: RSDT 0x000000004005F979 000034 (v01 BOCHS  BXPCRSDT 
> > > 00000001 BXPC 00000001)
> > > [   10.444045] ACPI: APIC 0x0000000040060F76 00008A (v01 BOCHS  BXPCAPIC 
> > > 00000001 BXPC 00000001)
> > > [   10.445984] ACPI: FACP 0x000000004005FA65 000074 (v01 BOCHS  BXPCFACP 
> > > 00000001 BXPC 00000001)
> > > [   10.447170] ACPI BIOS Warning (bug): Incorrect checksum in table 
> > > [FACP] - 0x67, should be 0x30 (20220331/tbprint-174)
> > > [   10.449522] ACPI: DSDT 0x000000004005FB19 00145D (v01 BOCHS  BXPCDSDT 
> > > 00000001 BXPC 00000001)
> > > [   10.451258] ACPI: FACS 0x000000004005FAD9 000040
> > > [   10.452245] ACPI: Reserving APIC table memory at [mem 
> > > 0x40060f76-0x40060fff]
> > > [   10.452389] ACPI: Reserving FACP table memory at [mem 
> > > 0x4005fa65-0x4005fad8]
> > > [   10.452497] ACPI: Reserving DSDT table memory at [mem 
> > > 0x4005fb19-0x40060f75]
> > > [   10.452602] ACPI: Reserving FACS table memory at [mem 
> > > 0x4005fad9-0x4005fb18]
> > > 
> > > 
> > > And these are the logs of the same boot (unsuccessful) without this
> > > patch:
> > > 
> > > [   10.516015] ACPI: Early table checksum verification disabled
> > > [   10.517732] ACPI: RSDP 0x0000000040060F1E 000024 (v02 BOCHS )
> > > [   10.519535] ACPI: RSDT 0x0000000040060F42 000034 (v01 BOCHS  BXPCRSDT 
> > > 00000001 BXPC 00000001)
> > > [   10.522523] ACPI: APIC 0x0000000040060F76 00008A (v01 BOCHS  BXPCAPIC 
> > > 00000001 BXPC 00000001)
> > > [   10.527453] ACPI: ���� 0x000000007FFE149D FFFFFFFF (v255 ������ 
> > > �������� FFFFFFFF ���� FFFFFFFF)
> > > [   10.528362] ACPI: Reserving APIC table memory at [mem 
> > > 0x40060f76-0x40060fff]
> > > [   10.528491] ACPI: Reserving ���� table memory at [mem 
> > > 0x7ffe149d-0x17ffe149b]
> > > 
> > > It is clearly a memory corruption around FACS but I couldn't find the
> > > reason for it. The mapping code looks correct. I hope you can suggest a
> > > way to narrow down the problem. If I could, I would suggest to apply
> > > this patch just for the QEMU PVH tests but we don't have the
> > > infrastructure for that in gitlab-ci as there is a single Xen build for
> > > all tests.
> > 
> > Would be helpful to see the memory map provided to Linux, just in case
> > we messed up and there's some overlap.
> 
> Everything looks correct. Here are some more logs:
> 
> (XEN) Xen-e820 RAM map:
> (XEN)  [0000000000000000, 000000000009fbff] (usable)
> (XEN)  [000000000009fc00, 000000000009ffff] (reserved)
> (XEN)  [00000000000f0000, 00000000000fffff] (reserved)
> (XEN)  [0000000000100000, 000000007ffdffff] (usable)
> (XEN)  [000000007ffe0000, 000000007fffffff] (reserved)
> (XEN)  [00000000fffc0000, 00000000ffffffff] (reserved)
> (XEN)  [000000fd00000000, 000000ffffffffff] (reserved)
> (XEN) Microcode loading not available
> (XEN) New Xen image base address: 0x7f600000
> (XEN) System RAM: 2047MB (2096636kB)
> (XEN) ACPI: RSDP 000F58D0, 0014 (r0 BOCHS )
> (XEN) ACPI: RSDT 7FFE1B21, 0034 (r1 BOCHS  BXPC            1 BXPC        1)
> (XEN) ACPI: FACP 7FFE19CD, 0074 (r1 BOCHS  BXPC            1 BXPC        1)
> (XEN) ACPI: DSDT 7FFE0040, 198D (r1 BOCHS  BXPC            1 BXPC        1)
> (XEN) ACPI: FACS 7FFE0000, 0040
> (XEN) ACPI: APIC 7FFE1A41, 0080 (r1 BOCHS  BXPC            1 BXPC        1)
> (XEN) ACPI: HPET 7FFE1AC1, 0038 (r1 BOCHS  BXPC            1 BXPC        1)
> (XEN) ACPI: WAET 7FFE1AF9, 0028 (r1 BOCHS  BXPC            1 BXPC        1)
> [...]
> (XEN) Dom0 memory map:
> (XEN)  [0000000000000000, 000000000009efff] (usable)
> (XEN)  [000000000009fc00, 000000000009ffff] (reserved)
> (XEN)  [00000000000f0000, 00000000000fffff] (reserved)
> (XEN)  [0000000000100000, 0000000040060f1d] (usable)
> (XEN)  [0000000040060f1e, 0000000040060fa7] (ACPI data)
> (XEN)  [0000000040061000, 000000007ffdffff] (unusable)
> (XEN)  [000000007ffe0000, 000000007fffffff] (reserved)
> (XEN)  [00000000fffc0000, 00000000ffffffff] (reserved)
> (XEN)  [000000fd00000000, 000000ffffffffff] (reserved)
> [...]
> [    0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009efff] usable
> [    0.000000] BIOS-e820: [mem 0x000000000009fc00-0x00000000000fffff] reserved
> [    0.000000] BIOS-e820: [mem 0x0000000000100000-0x0000000040060f1d] usable
> [    0.000000] BIOS-e820: [mem 0x0000000040060f1e-0x0000000040060fa7] ACPI 
> data
> [    0.000000] BIOS-e820: [mem 0x0000000040061000-0x000000007ffdffff] unusable
> [    0.000000] BIOS-e820: [mem 0x000000007ffe0000-0x000000007fffffff] reserved
> [    0.000000] BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved
> [    0.000000] BIOS-e820: [mem 0x000000fd00000000-0x000000ffffffffff] reserved
> [...]
> [   10.102427] ACPI: Early table checksum verification disabled
> [   10.104455] ACPI: RSDP 0x0000000040060F1E 000024 (v02 BOCHS )
> [   10.106250] ACPI: RSDT 0x0000000040060F42 000034 (v01 BOCHS  BXPC     
> 00000001 BXPC 00000001)
> [   10.109549] ACPI: APIC 0x0000000040060F76 00008A (v01 BOCHS  BXPC     
> 00000001 BXPC 00000001)
> [   10.115173] ACPI: ���� 0x000000007FFE19CD FFFFFFFF (v255 ������ �������� 
> FFFFFFFF ���� FFFFFFFF)
> [   10.116054] ACPI: Reserving APIC table memory at [mem 
> 0x40060f76-0x40060fff]
> [   10.116182] ACPI: Reserving ���� table memory at [mem 
> 0x7ffe19cd-0x17ffe19cb]
> 
> 
> 
> > It seems like some of the XSDT entries (the FADT one) is corrupt?
> > 
> > Could you maybe add some debug to the Xen-crafted XSDT placement.
> 
> I added a printk just after:
> 
>   xsdt->table_offset_entry[j++] = tables[i].address;
> 
> And it printed only once:
> 
>   (XEN) DEBUG pvh_setup_acpi_xsdt 1000 name=FACP address=7ffe19cd
> 
> That actually matches the address read by Linux:
> 
>   [   10.175448] ACPI: ���� 0x000000007FFE19CD FFFFFFFF (v255 ������ �������� 
> FFFFFFFF ���� FFFFFFFF)
> 
> So the address seems correct. It is the content of the FADT/FACP table
> that is corrupted.
> 
> I wrote the following function in Xen:
> 
> static void check(void)
> {
>     unsigned long addr = 0x7ffe19cd;
>     struct acpi_table_fadt *fadt;
>     fadt = acpi_os_map_memory(addr, sizeof(*fadt));
>     printk("DEBUG %s %d s=%s\n",__func__,__LINE__,fadt->header.signature);
>     acpi_os_unmap_memory(fadt, sizeof(*fadt));
> }
> 
> It prints the right table signature at the end of pvh_setup_acpi.
> I also added a call at the top of xenmem_add_to_physmap_one, and the
> signature is still correct. Then I added a call at the beginning of
> __update_vcpu_system_time. Here is the surprise: from Xen point of view
> the table never gets corrupted. Here are the logs:
> 
> [...]
> (XEN) DEBUG fadt_check 1551 s=FACPt
> (XEN) DEBUG fadt_check 1551 s=FACPt
> (XEN) DEBUG fadt_check 1551 s=FACPt
> (XEN) DEBUG fadt_check 1551 s=FACPt
> (XEN) d0v0: upcall vector f3
> [    0.000000] Linux version 6.1.19 (root@124de7fbba7f) (gcc (Debian 
> 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #1 SMP PREEMPT_3
> [    0.000000] Command line: console=hvc0
> [...]
> [   10.371610] ACPI: Early table checksum verification disabled
> [   10.373633] ACPI: RSDP 0x0000000040060F1E 000024 (v02 BOCHS )
> [   10.375548] ACPI: RSDT 0x0000000040060F42 000034 (v01 BOCHS  BXPC     
> 00000001 BXPC 00000001)
> [   10.378732] ACPI: APIC 0x0000000040060F76 00008A (v01 BOCHS  BXPC     
> 00000001 BXPC 00000001)
> [   10.384188] ACPI: ���� 0x000000007FFE19CD FFFFFFFF (v255 ������ �������� 
> FFFFFFFF ���� FFFFFFFF)
> [   10.385374] ACPI: Reserving APIC table memory at [mem 
> 0x40060f76-0x40060fff]
> [   10.385519] ACPI: Reserving ���� table memory at [mem 
> 0x7ffe19cd-0x17ffe19cb]
> [...]
> (XEN) DEBUG fadt_check 1551 s=FACPt
> (XEN) DEBUG fadt_check 1551 s=FACPt
> (XEN) DEBUG fadt_check 1551 s=FACPt
> (XEN) DEBUG fadt_check 1551 s=FACPt
> 
> 
> So it looks like it is a problem with the mapping itself? Xen sees the
> data correctly and Linux sees it corrupted?

It seems to me like the page is not correctly mapped, and so 1s are
returned? (same behavior as a hole).  IOW: would seem to me like MMIO
areas are not correctly handled by nested NPT? (I assume you are
running this on AMD).

Does it make a difference if you try to boot with dom0=pvh,shadow?

A couple of wild ideas.  Maybe the nested virt support that you are
using doesn't handle the UC bit in second stage page table entries?
You could to remove this in p2m_type_to_flags() (see the
p2m_mmio_direct case).

Another wild idea I have is that the emulated NPT code doesn't like
having the bits 63:52 from the PTE set to anything different than 0,
and hence only p2m_ram_rw works (p2m_mmio_direct is 5).

> 
> 
> > > If it helps to repro on your side, you can just do the following,
> > > assuming your Xen repo is in /local/repos/xen:
> > > 
> > > 
> > > cd /local/repos/xen
> > > mkdir binaries
> > > cd binaries
> > > mkdir -p dist/install/
> > > 
> > > docker run -it -v `pwd`:`pwd` 
> > > registry.gitlab.com/xen-project/xen/tests-artifacts/alpine:3.12
> > > cp /initrd* /local/repos/xen/binaries
> > > exit
> > > 
> > > docker run -it -v `pwd`:`pwd` 
> > > registry.gitlab.com/xen-project/xen/tests-artifacts/kernel:6.1.19
> > > cp /bzImage /local/repos/xen/binaries
> > > exit
> > > 
> > > That's it. Now you have enough pre-built binaries to repro the issue.
> > > Next you can edit automation/scripts/qemu-alpine-x86_64.sh to add
> > > 
> > >   dom0=pvh dom0_mem=1G dom0-iommu=none
> > 
> > Do you get to boot with dom0-iommu=none?  Is there also some trick
> > here in order to identity map dom0? I would expect things to not work
> > because addresses used for IO with QEMU emulated devices won't be
> > correct.
> 
> That's easy: just don't use any devices to boot. Put everything needed
> in the dom0 ramdisk. That's the configuration provided in the gitlab-ci
> script I pointed you in the previous email which uses an Alpine Linux
> ramdisk.

Doh, yes :).

Thanks, Roger.



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.