|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH] x86/IOMMU: mark IOMMU / intremap not in use when ACPI tables are missing
On 20.10.2021 22:01, Andrew Cooper wrote:
> On 20/10/2021 11:36, Jan Beulich wrote:
>> x2apic_bsp_setup() gets called ahead of iommu_setup(), and since x2APIC
>> mode (physical vs clustered) depends on iommu_intremap, that variable
>> needs to be set to off as soon as we know we can't / won't enable the
>> IOMMU, i.e. in particular when
>> - parsing of the respective ACPI tables failed,
>> - "iommu=off" is in effect, but not "iommu=no-intremap".
>> Move the turning off of iommu_intremap from AMD specific code into
>> acpi_iommu_init(), accompanying it by clearing of iommu_enable.
>>
>> Take the opportunity and also skip ACPI table parsing altogether when
>> "iommu=off" is in effect anyway.
>>
>> Reported-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
>> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
>> ---
>> I've deliberately not added a Fixes: tag here, as I'm of the opinion
>> that d8bd82327b0f ("AMD/IOMMU: obtain IVHD type to use earlier") only
>> uncovered a pre-existing anomaly.
>
> I agree it uncovered a pre-existing issue, but that doesn't mean a Fixes
> tag should be omitted. That change very concretely regressed booting on
> some systems in their pre-existing configuration.
>
> The commit message needs to spell out a link to d8bd82327b0f, but it's
> fine to say "that commit broke it by violating an unexpected ordering
> dependency, but isn't really the source of the bug".
>
>> This particularly applies to the "iommu=off" aspect.
>
> There should be at least two Fixes tags, but I suspect trying to trace
> the history of this mess is not a productive use of time.
>
>> (This now allows me to remove an item from my TODO
>> list: I was meaning to figure out why one of my systems wouldn't come
>> up properly with "iommu=off", and I had never thought of combining this
>> with "no-intremap".)
>>
>> Arguably "iommu=off" should turn off subordinate features in common
>> IOMMU code, but doing so in parse_iommu_param() would be wrong (as
>> there might be multiple "iommu=" to parse). This could be placed in
>> iommu_supports_x2apic(), but see the next item.
>
> I don't think we make any claim or implication that passing the same
> option twice works. The problem here is the nested structure of
> options, and the variable doing double duty.
>
>>
>> While the change here deals with apic_x2apic_probe() as called from
>> x2apic_bsp_setup(), the check_x2apic_preenabled() path looks to be
>> similarly affected. That call occurs before acpi_boot_init(), which is
>> what calls acpi_iommu_init(). The ordering in setup.c is in part
>> relatively fragile, which is why for the moment I'm still hesitant to
>> move the generic_apic_probe() call down. Plus I don't have easy access
>> to a suitable system to test this case. Thoughts?
>
> I've written these thoughts before, but IOMMU handling it a catastrophic
> mess. It needs burning to the ground and redoing from scratch.
>
> We currently have two ways of turning on the IOMMU, depending on what
> firmware does, and plenty ways of crashing Xen with cmdline options
> which should work, and the legacy xAPIC startup routine is after
> interrupts have been enabled, leading to an attempt to rewrite
> interrupts in place to remap them. This in particular has lead to XSAs
> due to trusting registers which can't be trusted, and the rewrite is
> impossible to do safely.
>
> The correct order is:
> 1) Parse DMAR/IVRS (but do not configure anything), MADT, current APIC
> setting and cmdline arguments
> 2) Figure out whether we want to be in xAPIC or x2APIC mode, and whether
> we need intremap. Change the LAPIC setting
> 3) Configure the IOMMUs. In particular, their interrupt needs to be
> after step 2
Leaving aside check_x2apic_preenabled(), all of this is already the
case afaict, almost at least: We do acpi_boot_init(), later
x2apic_bsp_setup(), and yet later iommu_setup(). The only issue might
be inside x2apic_bsp_setup(), where we call iommu_enable_x2apic()
before switching to x2APIC mode. Yet we avoid setting up IOMMU
interrupts during this early stage.
Hence I think, as expressed, that the question really is whether we
can safely defer check_x2apic_preenabled() by a little bit.
> 4) Start up Xen's general IRQ infrastructure.
>
> It's a fair chunk of work, but it will vastly simplify the boot logic
> and let us delete the infinite flushing loops out of the IOMMU logic,
> and we don't need any logic which has to second guess itself based on
> what happened earlier on boot.
>
>> --- a/xen/drivers/passthrough/x86/iommu.c
>> +++ b/xen/drivers/passthrough/x86/iommu.c
>> @@ -41,6 +41,23 @@ enum iommu_intremap __read_mostly iommu_
>> bool __read_mostly iommu_intpost;
>> #endif
>>
>> +void __init acpi_iommu_init(void)
>> +{
>> + if ( iommu_enable )
>> + {
>> + int ret = acpi_dmar_init();
>> +
>> + if ( ret == -ENODEV )
>> + ret = acpi_ivrs_init();
>> +
>> + if ( ret )
>> + iommu_enable = false;
>> + }
>> +
>> + if ( !iommu_enable )
>> + iommu_intremap = iommu_intremap_off;
>> +}
>
> This does fix my issue, so preferably with the Fixes tag reinstated,
>
> Acked-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
> Tested-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
Thanks, but I think there will need to be a v2, as per below (plus
possibly the dealing with check_x2apic_preenabled()).
> However, I don't think skipping parsing is a sensible move. Intremap is
> utterly mandatory if during boot, we discover that our APIC ID is >254,
> and iommu=no-intremap must be ignored in this case, or if the MADT says
> we have CPUs beyond that limit and the user hasn't specified nr_cpus=1
> or equivalent.
Reading this made me realize that the change breaks other behavior.
The conditional really needs to be iommu_enable || iommu_intremap -
at least AMD code added in support for x2APIC already treats the
latter to not be a sub-option of the former (iov_supports_xt(),
acpi_ivrs_init()), and e.g. intel_iommu_supports_eim() also checks
the latter alone.
Overriding "iommu=no-intremap" in case it's unavoidable could then
be a later change, not further affecting the function here.
Jan
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |