Re: [PATCH 07/17] IOMMU/x86: restrict IO-APIC mappings for PV Dom0
On 07.09.2021 19:13, Andrew Cooper wrote:
> On 26/08/2021 13:55, Jan Beulich wrote:
>> On 26.08.2021 13:57, Andrew Cooper wrote:
>>> On 24/08/2021 15:21, Jan Beulich wrote:
>>>> While already the case for PVH, there's no reason to treat PV
>>>> differently here, though of course the addresses get taken from another
>>>> source in this case. Except that, to match CPU side mappings, by default
>>>> we permit r/o ones. This then also means we now deal consistently with
>>>> IO-APICs whose MMIO is or is not covered by E820 reserved regions.
>>>>
>>>> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
>>> Why do we give PV dom0 a mapping of the IO-APIC? Having thought about
>>> it, it cannot possibly be usable.
>>>
>>> IO-APICs use an indirect window, and Xen doesn't perform any
>>> write-emulation (that I can see), so the guest can't even read the data
>>> register and work out which register it represents. It also can't do an
>>> atomic 64-bit read across the index and data registers, as that is
>>> explicitly undefined behaviour in the IO-APIC spec.
>>>
>>> On the other hand, we do have PHYSDEVOP_apic_{read,write} which, despite
>>> the name, is for the IO-APIC, not the LAPIC, and I bet these hypercalls
>>> were introduced upon discovering that a read-only mapping of the
>>> IO-APIC is totally useless.
>>>
>>> I think we can safely not expose the IO-APICs to PV dom0 at all, which
>>> simplifies the memory handling further.
>> The reason we do expose it r/o is that some ACPI implementations access
>> (read and write) some RTEs from AML. If we don't allow Dom0 to map the
>> IO-APIC, Dom0 will typically fail to boot on such systems.
>
> I think more details are needed.

How do you expect to collect the necessary info without having an
affected system to test? I see close to zero chance of locating the old
reports (and possible discussion) via web search.

> If AML is reading the RTEs, it's also writing to the index register.

Quite likely, yes.
Albeit this being broken to a fair degree in the first place, ...

> Irrespective of Xen, doing this behind the back of the OS cannot work
> safely, because at a minimum the ACPI interpreter would need to take the
> ioapic lock, and I see no evidence of workarounds like this in Linux.

... as you indicate you think (as much as I do), leaves room for the
actual accesses to also be flawed (and hence meaningless in the first
place). I do recall looking at the disassembled AML back at the time,
but I don't recall any details for sure. What I seem to vaguely recall
is that their whole purpose was to set the mask bit in an RTE (I think
to work around the dual routing issue, and I assume in turn to work
around missing workarounds in certain OSes). For that, the current
approach as well as the alternative one you suggest below would be
equally "good enough".

> In Xen, we appear to swallow writes to mmio_ro ranges, which is rude and
> causes the associated reads to read garbage. This is Xen-induced memory
> corruption inside dom0.
>
> I don't think any of this is legitimate behaviour. ACPI needs disabling
> on such systems, or interlocking suitably, and it's not just IO-APICs
> which are problematic - any windowed I/O (e.g. RTC) as well as any other
> device with read side effects.

I don't think disabling ACPI on such systems would be a viable option.
Things tend to not work very well that way ... Plus iirc these issues
weren't exclusively on some random no-name systems, but in at least one
of the cases on ones from a pretty large vendor.

> I think we should seriously consider not mapping the IO-APIC at all.

We can easily do so on the IOMMU side, if you agree to have CPU and
IOMMU mappings diverge for this case. Since the behavior is PV-specific
anyway, there are also no concerns as to differing behavior with vs
without shared page tables (on PVH).
> That said, I would be surprised if Linux is cleanly avoiding the
> IO-APIC, so a slightly less bad alternative is to redirect to an "MMIO"
> frame of ~0's which we still write-discard on top of.
>
> That at least makes the Xen-induced memory corruption behave
> deterministically.

It would work more deterministically, yes, but without such a system
available to test we wouldn't know whether that approach would actually
make things work (sufficiently). Whereas for the current approach we do
know (from the testing done back at the time).

Jan