[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH v4 4/4] x86/iommu: add reserved dom0-iommu option to map reserved memory ranges
On Wed, Aug 08, 2018 at 07:15:54AM -0600, Jan Beulich wrote: > >>> On 08.08.18 at 12:07, <roger.pau@xxxxxxxxxx> wrote: > > Several people have reported hardware issues (malfunctioning USB > > controllers) due to iommu page faults on Intel hardware. Those faults > > are caused by missing RMRR (VTd) entries in the ACPI tables. Those can > > be worked around on VTd hardware by manually adding RMRR entries on > > the command line, this is however limited to Intel hardware and quite > > cumbersome to do. > > > > In order to solve those issues add a new dom0-iommu=reserved option > > that identity maps all regions marked as reserved in the memory map. > > Considering that report we've had (yesterday? earlier today?), don't > we need to go further and use the union of RMRRs and reserved > regions? Iirc they had a case where an RMRR was not in a reserved > region ... AFAICT (and I could be reading the code wrong) RMRR regions not on reserved regions are still correctly mapped to the guest. > > --- a/docs/misc/xen-command-line.markdown > > +++ b/docs/misc/xen-command-line.markdown > > @@ -1205,7 +1205,7 @@ detection of systems known to misbehave upon accesses > > to that port. > > >> Enable IOMMU debugging code (implies `verbose`). > > > > ### dom0-iommu > > -> `= List of [ none | strict | relaxed | inclusive ]` > > +> `= List of [ none | strict | relaxed | inclusive | reserved ]` > > > > * `none`: disables DMA remapping for Dom0. > > > > @@ -1233,6 +1233,15 @@ meaning: > > option is only applicable to a PV Dom0 and is enabled by default on Intel > > hardware. > > > > +* `reserved`: sets up DMA remapping for all the reserved regions in the > > memory > > + map for Dom0. Use this to work around firmware issues providing incorrect > > + RMRR/IVMD entries. Rather than only mapping RAM pages for IOMMU accesses > > + for Dom0, all memory regions marked as reserved in the memory map that > > don't > > + overlap with any MMIO region from emulated devices will be identity > > mapped. > > + This option maps a subset of the memory that would be mapped when using > > the > > + `inclusive` option. This option is available to a PVH Dom0 and is > > enabled by > > + default on Intel hardware. > > The sub-options so far were quite clear in their meanings, but > "dom0-iommu=reserved" might mean all sorts of things to me, but quite > certainly not "map all reserved regions". "map-reserved" perhaps? Then we should also have 'map-inclusive' for symmetry IMO. > > --- a/xen/arch/x86/hvm/io.c > > +++ b/xen/arch/x86/hvm/io.c > > @@ -404,6 +404,11 @@ static const struct hvm_mmcfg *vpci_mmcfg_find(const > > struct domain *d, > > return NULL; > > } > > > > +bool vpci_mmcfg_address(const struct domain *d, paddr_t addr) > > +{ > > + return vpci_mmcfg_find(d, addr); > > +} > > I think the function name should have an "is_" somewhere, to make > clear address is a noun here and not a verb. > > > --- a/xen/drivers/passthrough/amd/pci_amd_iommu.c > > +++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c > > @@ -256,6 +256,9 @@ static void __hwdom_init amd_iommu_hwdom_init(struct > > domain *d) > > /* Inclusive IOMMU mappings are disabled by default on AMD hardware. */ > > iommu_dom0_inclusive = iommu_dom0_inclusive == -1 ? false > > : > > iommu_dom0_inclusive; > > + /* Reserved IOMMU mappings are disabled by default on AMD hardware. */ > > + iommu_dom0_reserved = iommu_dom0_reserved == -1 ? false > > + : iommu_dom0_reserved; > > Especially seeing these two together now, I think an if() each would > be easier to read (not the least because of being shorter). To me, > the main purpose of the conditional operator is to allow shortening > simple if/else pairs, rather than lengthening simple if()-s. > > > @@ -134,13 +135,67 @@ void arch_iommu_domain_destroy(struct domain *d) > > { > > } > > > > +static bool __hwdom_init hwdom_iommu_map(const struct domain *d, unsigned > > long pfn, > > + unsigned long max_pfn) > > +{ > > + unsigned int i; > > + > > + /* > > + * Ignore any address below 1MB, that's already identity mapped by the > > + * domain builder for HVM. > > + */ > > + if ( (is_hvm_domain(d) && pfn < PFN_DOWN(MB(1))) || > > Careful again here with the distinction between Dom0 and hwdom: > The domain builder you refer to is - aiui - the in-Xen one, i.e. the > one _only_ dealing with Dom0. So this should instead be: if ( (is_control_domain(d) && is_hvm_domain(d) && pfn < PFN_DOWN(MB(1))) || > > + /* > > + * If dom0-strict mode is enabled or the guest type is PVH/HVM then > > exclude > > + * conventional RAM and let the common code map dom0's pages. > > + */ > > + if ( page_is_ram_type(pfn, RAM_TYPE_CONVENTIONAL) && > > + (iommu_dom0_strict || is_hvm_domain(d)) ) > > + return false; > > + if ( page_is_ram_type(pfn, RAM_TYPE_RESERVED) && > > + !iommu_dom0_reserved && !iommu_dom0_inclusive ) > > + return false; > > + if ( !page_is_ram_type(pfn, RAM_TYPE_UNUSABLE) && > > + !page_is_ram_type(pfn, RAM_TYPE_RESERVED) && > > + !page_is_ram_type(pfn, RAM_TYPE_CONVENTIONAL) && > > + (!iommu_dom0_inclusive || pfn > max_pfn) ) > > + return false; > > As page_is_ram_type() is, especially on systems with many E820 > entries, not really cheap, I think at least a minimum amount of > optimization is on order here - after all you do this for every > single page of the system. Hence minimally the result of the first > CONVENTIONAL and RESERVED queries should be latched and > re-used (or "else" be used suitably). Ideally an approach would > be used which involved just a single iteration through the E820 > map, but I realize this may be more than is feasible here. This would be quite better if I could just fetch the type, then I could add a switch (type) { ... and it would be better IMO. > Furthermore I'm unconvinced the !page_is_ram_type() uses > are really what you want: The function returning false means > "don't know", not "no". Perhaps the function needs to be made > return a tristate (yes, no, or part of it). Right, that's why I have three different !page_is_ram_type checks in the last branch of the if, so that I can make sure it's not one of the previous types and also account for holes. > > + /* Check that it doesn't overlap with the LAPIC */ > > + if ( has_vlapic(d) ) > > + { > > + const struct vcpu *v; > > + > > + for_each_vcpu(d, v) > > + if ( pfn == PFN_DOWN(vlapic_base_address(vcpu_vlapic(v))) ) > > + return false; > > + } > > I don't, btw, recall any code adjusting the IOMMU mappings in > case the domain relocates its LAPIC. If you do the check above, > wouldn't that other side too need taking care of? Yes. I can add something later but this is already an issue if the guest for example relocates the LAPIC over a RAM region. Thanks, Roger. _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxxxxxxxxx https://lists.xenproject.org/mailman/listinfo/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |