[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH 2/2] xen/x86/pvh: copy ACPI tables to Dom0 instead of mapping
On Wed, 17 May 2023, Roger Pau Monné wrote: > On Tue, May 16, 2023 at 04:34:09PM -0700, Stefano Stabellini wrote: > > On Tue, 16 May 2023, Roger Pau Monné wrote: > > > On Mon, May 15, 2023 at 05:11:25PM -0700, Stefano Stabellini wrote: > > > > On Mon, 15 May 2023, Roger Pau Monné wrote: > > > > > On Fri, May 12, 2023 at 06:17:20PM -0700, Stefano Stabellini wrote: > > > > > > From: Stefano Stabellini <stefano.stabellini@xxxxxxx> > > > > > > > > > > > > Mapping the ACPI tables to Dom0 PVH 1:1 leads to memory corruptions > > > > > > of > > > > > > the tables in the guest. Instead, copy the tables to Dom0. > > > > > > > > > > > > This is a workaround. > > > > > > > > > > > > Signed-off-by: Stefano Stabellini <stefano.stabellini@xxxxxxx> > > > > > > --- > > > > > > As mentioned in the cover letter, this is a RFC workaround as I > > > > > > don't > > > > > > know the cause of the underlying problem. I do know that this patch > > > > > > solves what would be otherwise a hang at boot when Dom0 PVH > > > > > > attempts to > > > > > > parse ACPI tables. > > > > > > > > > > I'm unsure how safe this is for native systems, as it's possible for > > > > > firmware to modify the data in the tables, so copying them would > > > > > break that functionality. > > > > > > > > > > I think we need to get to the root cause that triggers this behavior > > > > > on QEMU. Is it the table checksum that fail, or something else? Is > > > > > there an error from Linux you could reference? > > > > > > > > I agree with you but so far I haven't managed to find a way to the root > > > > of the issue. Here is what I know. These are the logs of a successful > > > > boot using this patch: > > > > > > > > [ 10.437488] ACPI: Early table checksum verification disabled > > > > [ 10.439345] ACPI: RSDP 0x000000004005F955 000024 (v02 BOCHS ) > > > > [ 10.441033] ACPI: RSDT 0x000000004005F979 000034 (v01 BOCHS > > > > BXPCRSDT 00000001 BXPC 00000001) > > > > [ 10.444045] ACPI: APIC 0x0000000040060F76 00008A (v01 BOCHS > > > > BXPCAPIC 00000001 BXPC 00000001) > > > > [ 10.445984] ACPI: FACP 0x000000004005FA65 000074 (v01 BOCHS > > > > BXPCFACP 00000001 BXPC 00000001) > > > > [ 10.447170] ACPI BIOS Warning (bug): Incorrect checksum in table > > > > [FACP] - 0x67, should be 0x30 (20220331/tbprint-174) > > > > [ 10.449522] ACPI: DSDT 0x000000004005FB19 00145D (v01 BOCHS > > > > BXPCDSDT 00000001 BXPC 00000001) > > > > [ 10.451258] ACPI: FACS 0x000000004005FAD9 000040 > > > > [ 10.452245] ACPI: Reserving APIC table memory at [mem > > > > 0x40060f76-0x40060fff] > > > > [ 10.452389] ACPI: Reserving FACP table memory at [mem > > > > 0x4005fa65-0x4005fad8] > > > > [ 10.452497] ACPI: Reserving DSDT table memory at [mem > > > > 0x4005fb19-0x40060f75] > > > > [ 10.452602] ACPI: Reserving FACS table memory at [mem > > > > 0x4005fad9-0x4005fb18] > > > > > > > > > > > > And these are the logs of the same boot (unsuccessful) without this > > > > patch: > > > > > > > > [ 10.516015] ACPI: Early table checksum verification disabled > > > > [ 10.517732] ACPI: RSDP 0x0000000040060F1E 000024 (v02 BOCHS ) > > > > [ 10.519535] ACPI: RSDT 0x0000000040060F42 000034 (v01 BOCHS > > > > BXPCRSDT 00000001 BXPC 00000001) > > > > [ 10.522523] ACPI: APIC 0x0000000040060F76 00008A (v01 BOCHS > > > > BXPCAPIC 00000001 BXPC 00000001) > > > > [ 10.527453] ACPI: ���� 0x000000007FFE149D FFFFFFFF (v255 ������ > > > > �������� FFFFFFFF ���� FFFFFFFF) > > > > [ 10.528362] ACPI: Reserving APIC table memory at [mem > > > > 0x40060f76-0x40060fff] > > > > [ 10.528491] ACPI: Reserving ���� table memory at [mem > > > > 0x7ffe149d-0x17ffe149b] > > > > > > > > It is clearly a memory corruption around FACS but I couldn't find the > > > > reason for it. The mapping code looks correct. I hope you can suggest a > > > > way to narrow down the problem. If I could, I would suggest to apply > > > > this patch just for the QEMU PVH tests but we don't have the > > > > infrastructure for that in gitlab-ci as there is a single Xen build for > > > > all tests. > > > > > > Would be helpful to see the memory map provided to Linux, just in case > > > we messed up and there's some overlap. > > > > Everything looks correct. Here are some more logs: > > > > (XEN) Xen-e820 RAM map: > > (XEN) [0000000000000000, 000000000009fbff] (usable) > > (XEN) [000000000009fc00, 000000000009ffff] (reserved) > > (XEN) [00000000000f0000, 00000000000fffff] (reserved) > > (XEN) [0000000000100000, 000000007ffdffff] (usable) > > (XEN) [000000007ffe0000, 000000007fffffff] (reserved) > > (XEN) [00000000fffc0000, 00000000ffffffff] (reserved) > > (XEN) [000000fd00000000, 000000ffffffffff] (reserved) > > (XEN) Microcode loading not available > > (XEN) New Xen image base address: 0x7f600000 > > (XEN) System RAM: 2047MB (2096636kB) > > (XEN) ACPI: RSDP 000F58D0, 0014 (r0 BOCHS ) > > (XEN) ACPI: RSDT 7FFE1B21, 0034 (r1 BOCHS BXPC 1 BXPC 1) > > (XEN) ACPI: FACP 7FFE19CD, 0074 (r1 BOCHS BXPC 1 BXPC 1) > > (XEN) ACPI: DSDT 7FFE0040, 198D (r1 BOCHS BXPC 1 BXPC 1) > > (XEN) ACPI: FACS 7FFE0000, 0040 > > (XEN) ACPI: APIC 7FFE1A41, 0080 (r1 BOCHS BXPC 1 BXPC 1) > > (XEN) ACPI: HPET 7FFE1AC1, 0038 (r1 BOCHS BXPC 1 BXPC 1) > > (XEN) ACPI: WAET 7FFE1AF9, 0028 (r1 BOCHS BXPC 1 BXPC 1) > > [...] > > (XEN) Dom0 memory map: > > (XEN) [0000000000000000, 000000000009efff] (usable) > > (XEN) [000000000009fc00, 000000000009ffff] (reserved) > > (XEN) [00000000000f0000, 00000000000fffff] (reserved) > > (XEN) [0000000000100000, 0000000040060f1d] (usable) > > (XEN) [0000000040060f1e, 0000000040060fa7] (ACPI data) > > (XEN) [0000000040061000, 000000007ffdffff] (unusable) > > (XEN) [000000007ffe0000, 000000007fffffff] (reserved) > > (XEN) [00000000fffc0000, 00000000ffffffff] (reserved) > > (XEN) [000000fd00000000, 000000ffffffffff] (reserved) > > [...] > > [ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009efff] usable > > [ 0.000000] BIOS-e820: [mem 0x000000000009fc00-0x00000000000fffff] > > reserved > > [ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x0000000040060f1d] usable > > [ 0.000000] BIOS-e820: [mem 0x0000000040060f1e-0x0000000040060fa7] ACPI > > data > > [ 0.000000] BIOS-e820: [mem 0x0000000040061000-0x000000007ffdffff] > > unusable > > [ 0.000000] BIOS-e820: [mem 0x000000007ffe0000-0x000000007fffffff] > > reserved > > [ 0.000000] BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] > > reserved > > [ 0.000000] BIOS-e820: [mem 0x000000fd00000000-0x000000ffffffffff] > > reserved > > [...] > > [ 10.102427] ACPI: Early table checksum verification disabled > > [ 10.104455] ACPI: RSDP 0x0000000040060F1E 000024 (v02 BOCHS ) > > [ 10.106250] ACPI: RSDT 0x0000000040060F42 000034 (v01 BOCHS BXPC > > 00000001 BXPC 00000001) > > [ 10.109549] ACPI: APIC 0x0000000040060F76 00008A (v01 BOCHS BXPC > > 00000001 BXPC 00000001) > > [ 10.115173] ACPI: ���� 0x000000007FFE19CD FFFFFFFF (v255 ������ �������� > > FFFFFFFF ���� FFFFFFFF) > > [ 10.116054] ACPI: Reserving APIC table memory at [mem > > 0x40060f76-0x40060fff] > > [ 10.116182] ACPI: Reserving ���� table memory at [mem > > 0x7ffe19cd-0x17ffe19cb] > > > > > > > > > It seems like some of the XSDT entries (the FADT one) is corrupt? > > > > > > Could you maybe add some debug to the Xen-crafted XSDT placement. > > > > I added a printk just after: > > > > xsdt->table_offset_entry[j++] = tables[i].address; > > > > And it printed only once: > > > > (XEN) DEBUG pvh_setup_acpi_xsdt 1000 name=FACP address=7ffe19cd > > > > That actually matches the address read by Linux: > > > > [ 10.175448] ACPI: ���� 0x000000007FFE19CD FFFFFFFF (v255 ������ > > �������� FFFFFFFF ���� FFFFFFFF) > > > > So the address seems correct. It is the content of the FADT/FACP table > > that is corrupted. > > > > I wrote the following function in Xen: > > > > static void check(void) > > { > > unsigned long addr = 0x7ffe19cd; > > struct acpi_table_fadt *fadt; > > fadt = acpi_os_map_memory(addr, sizeof(*fadt)); > > printk("DEBUG %s %d s=%s\n",__func__,__LINE__,fadt->header.signature); > > acpi_os_unmap_memory(fadt, sizeof(*fadt)); > > } > > > > It prints the right table signature at the end of pvh_setup_acpi. > > I also added a call at the top of xenmem_add_to_physmap_one, and the > > signature is still correct. Then I added a call at the beginning of > > __update_vcpu_system_time. Here is the surprise: from Xen point of view > > the table never gets corrupted. Here are the logs: > > > > [...] > > (XEN) DEBUG fadt_check 1551 s=FACPt > > (XEN) DEBUG fadt_check 1551 s=FACPt > > (XEN) DEBUG fadt_check 1551 s=FACPt > > (XEN) DEBUG fadt_check 1551 s=FACPt > > (XEN) d0v0: upcall vector f3 > > [ 0.000000] Linux version 6.1.19 (root@124de7fbba7f) (gcc (Debian > > 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #1 SMP PREEMPT_3 > > [ 0.000000] Command line: console=hvc0 > > [...] > > [ 10.371610] ACPI: Early table checksum verification disabled > > [ 10.373633] ACPI: RSDP 0x0000000040060F1E 000024 (v02 BOCHS ) > > [ 10.375548] ACPI: RSDT 0x0000000040060F42 000034 (v01 BOCHS BXPC > > 00000001 BXPC 00000001) > > [ 10.378732] ACPI: APIC 0x0000000040060F76 00008A (v01 BOCHS BXPC > > 00000001 BXPC 00000001) > > [ 10.384188] ACPI: ���� 0x000000007FFE19CD FFFFFFFF (v255 ������ �������� > > FFFFFFFF ���� FFFFFFFF) > > [ 10.385374] ACPI: Reserving APIC table memory at [mem > > 0x40060f76-0x40060fff] > > [ 10.385519] ACPI: Reserving ���� table memory at [mem > > 0x7ffe19cd-0x17ffe19cb] > > [...] > > (XEN) DEBUG fadt_check 1551 s=FACPt > > (XEN) DEBUG fadt_check 1551 s=FACPt > > (XEN) DEBUG fadt_check 1551 s=FACPt > > (XEN) DEBUG fadt_check 1551 s=FACPt > > > > > > So it looks like it is a problem with the mapping itself? Xen sees the > > data correctly and Linux sees it corrupted? > > It seems to me like the page is not correctly mapped, and so 1s are > returned? (same behavior as a hole). IOW: would seem to me like MMIO > areas are not correctly handled by nested NPT? (I assume you are > running this on AMD). > > Does it make a difference if you try to boot with dom0=pvh,shadow? > > A couple of wild ideas. Maybe the nested virt support that you are > using doesn't handle the UC bit in second stage page table entries? > You could to remove this in p2m_type_to_flags() (see the > p2m_mmio_direct case). > > Another wild idea I have is that the emulated NPT code doesn't like > having the bits 63:52 from the PTE set to anything different than 0, > and hence only p2m_ram_rw works (p2m_mmio_direct is 5). Many thanks to Xenia for figuring out the root cause of the bug. The underlying memory region is already added as E820_RESERVED to the guest (instead of E820_ACPI). When pvh_add_mem_range is called with E820_ACPI as parameter for the interesting table, pvh_add_mem_range returns with -EEXIST without doing anything. The original fix by Xenia was to carve out the relevant subset of the reserved region and mark it as E820_ACPI. Instead, I rewrote it as changing the type of the entire region to E820_ACPI because it is simpler and doesn't have to deal with the edge cases (partially overlapping, overlapping 2 existing regions, etc.) What do you think? diff --git a/xen/arch/x86/hvm/dom0_build.c b/xen/arch/x86/hvm/dom0_build.c index e1043e40d2..6c1c73d853 100644 --- a/xen/arch/x86/hvm/dom0_build.c +++ b/xen/arch/x86/hvm/dom0_build.c @@ -241,6 +241,20 @@ static int __init pvh_add_mem_range(struct domain *d, uint64_t s, uint64_t e, if ( rs >= e ) break; + if ( re >= e && rs <= s ) + { + /* + * An existing overlapping memory range exists and it is + * marked as reserved. This happens on QEMU. Change the type + * to E820_ACPI. + */ + if ( d->arch.e820[i].type == E820_RESERVED && type == E820_ACPI ) + { + d->arch.e820[i].type = E820_ACPI; + break; + } + } + if ( re > s ) return -EEXIST; }
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |