|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v8 1/6] AMD/IOMMU: obtain IVHD type to use earlier
On 20.10.2021 01:34, Andrew Cooper wrote:
> On 22/09/2021 15:36, Jan Beulich wrote:
>> Doing this in amd_iommu_prepare() is too late for it, in particular, to
>> be used in amd_iommu_detect_one_acpi(), as a subsequent change will want
>> to do. Moving it immediately ahead of amd_iommu_detect_acpi() is
>> (luckily) pretty simple, (pretty importantly) without breaking
>> amd_iommu_prepare()'s logic to prevent multiple processing.
>>
>> This involves moving table checksumming, as
>> amd_iommu_get_supported_ivhd_type() -> get_supported_ivhd_type() will
>> now be invoked before amd_iommu_detect_acpi() -> detect_iommu_acpi(). In
>> the course of doing so stop open-coding acpi_tb_checksum(), seeing that
>> we have other uses of this originally ACPI-private function elsewhere in
>> the tree.
>>
>> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
>
> I'm afraid this breaks booting on Skylake Server. Yes, really - I
> didn't believe the bisection at first either.
>
> From a bit of debugging, I've found:
>
> (XEN) *** acpi_dmar_init() => -19
> (XEN) *** amd_iommu_get_supported_ivhd_type() => -19
>
> So VT-d is disabled in firmware. Oops, but something we should cope with.
I wanted to say that I definitely did test this (for a long, long
time) on Intel systems, but clearly not on one like this. I'm sure
though that I did test on IOMMU-less Intel systems, so I'm still a
bit puzzled.
> Then we fall into acpi_ivrs_init(), and take the new-in-this-patch early
> exit with -ENOENT too.
>
> It turns out ...
>
>> --- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
>> +++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
>> @@ -179,9 +179,17 @@ static int __must_check amd_iommu_setup_
>>
>> int __init acpi_ivrs_init(void)
>> {
>> + int rc;
>> +
>> if ( !iommu_enable && !iommu_intremap )
>> return 0;
>>
>> + rc = amd_iommu_get_supported_ivhd_type();
>> + if ( rc < 0 )
>> + return rc;
>> + BUG_ON(!rc);
>> + ivhd_type = rc;
>> +
>> if ( (amd_iommu_detect_acpi() !=0) || (iommu_found() == 0) )
>> {
>> iommu_intremap = iommu_intremap_off;
>>
>
> ... we're relying on this path (now skipped) to set iommu_intremap away
> from iommu_intremap_full in the "no IOMMU anywhere to be found" case.
>
> This explains why I occasionally during failure get spew about:
>
> (XEN) CPU0: No irq handler for vector 7a (IRQ -2147483648, LAPIC)
> [ 17.117518] xhci_hcd 0000:00:14.0: Error while assigning device slot ID
> [ 17.121114] xhci_hcd 0000:00:14.0: Max number of devices this xHCI
> host supports is 64.
> [ 17.125198] usb usb1-port2: couldn't allocate usb_device
> [ 248.317462] INFO: task kworker/u32:0:7 blocked for more than 120 seconds.
>
> and eventually (gone 400s) get dumped in a dracut shell.
>
> Booting with an explicit iommu=no-intremap, which clobbers
> iommu_intremap during cmdline parsing, recovers the system.
>
> This variable controls a whole lot of magic with interrupt handling. It
> should default to 0, not 2, and only become nonzero when an IOMMU is
> properly established. It also shouldn't be serving double duty as "what
> the user wants" ahead of determining the system capabilities.
This would probably be too large a change at this point in time;
I'll see whether I can find something less intrusive. Unless of
course there's a patch already on xen-devel, which I didn't get
to read yet.
> And not to open another can of worms, but our entire way of working
> explodes if there are devices on the system not covered by an IOMMU.
I wouldn't be surprised, but is this something we have to expect
on non-broken systems? (I do know of broken systems giving the
appearance of uncovered devices by lacking suitable include-all
DRHD entries.)
Jan
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |