
Re: [BUG] x2apic broken with current AMD hardware


  • To: Elliott Mitchell <ehem+xen@xxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Tue, 21 Mar 2023 08:13:15 +0100
  • Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx, Neowutran <xen@xxxxxxxxxxxxx>
  • Delivery-date: Tue, 21 Mar 2023 07:13:38 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 21.03.2023 05:19, Elliott Mitchell wrote:
> On Mon, Mar 20, 2023 at 09:28:20AM +0100, Jan Beulich wrote:
>> AMD/IOMMU: without XT, x2APIC needs to be forced into physical mode
>>
>> An earlier change with the same title (commit 1ba66a870eba) altered only
>> the path where x2apic_phys was already set to false (perhaps from the
>> command line). The same of course needs applying when the variable
>> wasn't modified yet from its initial value.
>>
>> Reported-by: Elliott Mitchell <ehem+xen@xxxxxxx>
>> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
> 
> This does appear to be an improvement.  With this the system boots if
> the "Local APIC Mode" setting is "auto".  As you may have guessed,
> "(XEN) Switched to APIC driver x2apic_phys".
> 
> 
> 
> When I tried setting "Local APIC Mode" to "x2APIC" though things didn't
> go so well.  Sometime >15 seconds after Domain 0 boots, first:
> 
> "(XEN) APIC error on CPU#: 00(08), Receive accept error" (looks to be
> for every core)
> 
> Then:
> "(XEN) APIC error on CPU#: 08(08), Receive accept error" (again for
> every core, but *after* the above has appeared for all cores)

Receive accept errors generally mean a bad vector was received, yet the
sending side deemed it fine. That could be a bad I/O APIC RTE, a bad MSI
message data value, or a bad translation thereof into an IRTE (albeit
iirc we never alter the vector).
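For reading the quoted log lines, the following is a minimal sketch of how the low eight bits of the local APIC Error Status Register map to names. The bit layout follows the Intel SDM; the function and table names (`esr_describe`, `esr_bit_names`) are hypothetical and are not taken from the Xen sources. The value 08 in the quoted messages has only bit 3 set, which is the receive accept error.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Names for ESR bits 0..7, per the Intel SDM layout (table name is hypothetical). */
static const char *const esr_bit_names[8] = {
    "Send CS error",           /* bit 0 */
    "Receive CS error",        /* bit 1 */
    "Send accept error",       /* bit 2 */
    "Receive accept error",    /* bit 3 */
    "Redirectable IPI",        /* bit 4 */
    "Send illegal vector",     /* bit 5 */
    "Received illegal vector", /* bit 6 */
    "Illegal register address" /* bit 7 */
};

/*
 * Write a comma-separated list of the error names set in 'esr' into 'buf'.
 * Returns the number of names written. If 'buf' is too small, the text is
 * cut short and decoding stops at that point.
 */
static int esr_describe(uint32_t esr, char *buf, size_t len)
{
    int count = 0;
    size_t used = 0;

    assert(buf != NULL && len > 0);
    buf[0] = '\0';

    for (int bit = 0; bit < 8; bit++) {
        if (!(esr & (1u << bit)))
            continue;

        int n = snprintf(buf + used, len - used, "%s%s",
                         count ? ", " : "", esr_bit_names[bit]);
        if (n < 0 || (size_t)n >= len - used)
            break; /* out of space: stop decoding */

        used += (size_t)n;
        count++;
    }
    return count;
}
```

Calling `esr_describe(0x08, buf, sizeof(buf))` leaves "Receive accept error" in `buf`, matching the text Xen printed next to the raw value.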

> The above appears about twice for each core, then I start seeing
> "(XEN) CPU#: No irq handler for vector ?? (IRQ -2147483648, LAPIC)"
> 
> The core doesn't vary too much with this, but the vector varies some.
> 
> Upon looking "(XEN) Using APIC driver x2apic_cluster".  Unfortunately
> I didn't try booting with x2apic_phys forced with this setting.

My guess is that this would also help. But the system should still work
correctly in clustered mode. As a first step, I guess the output of debug
keys 'i', 'z', and 'M' may provide some insight. But the request for a full
log at maximum verbosity also remains (ideally with a debug hypervisor).
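To make the intent of the quoted patch description concrete, here is a rough sketch of the mode selection it implies. This is not the Xen code: the enum, the function name `choose_x2apic_mode`, and the `iommu_has_xt` parameter are all hypothetical, and the default chosen when nothing was requested and XT is available is an assumption made only for illustration.

```c
#include <stdbool.h>

/* Tri-state request, mirroring "not modified yet from its initial value". */
enum x2apic_mode {
    MODE_UNSET    = -1, /* nothing requested, e.g. no command line option */
    MODE_CLUSTER  = 0,  /* clustered mode requested */
    MODE_PHYSICAL = 1   /* physical mode requested */
};

/*
 * Pick the x2APIC destination mode (hypothetical helper).
 *
 * Without XT the IOMMU side cannot be relied on for clustered mode, so
 * physical mode is forced both when cluster mode was requested explicitly
 * (what the earlier commit covered) and when nothing was requested
 * (the case the newer patch adds).
 */
static enum x2apic_mode choose_x2apic_mode(enum x2apic_mode requested,
                                           bool iommu_has_xt)
{
    if (!iommu_has_xt)
        return MODE_PHYSICAL;

    if (requested == MODE_UNSET)
        return MODE_CLUSTER; /* assumption: illustrative default only */

    return requested;
}
```

With this sketch, `choose_x2apic_mode(MODE_UNSET, false)` yields `MODE_PHYSICAL`, which corresponds to the "Switched to APIC driver x2apic_phys" message reported for the "auto" firmware setting.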

> So x2apic_cluster is looking like a <ahem> on recent AMD processors.
> 
> 
> I'm unsure this qualifies as "Tested-by".  Certainly it IS an
> improvement, but the problem certainly isn't 100% solved.

There simply are multiple problems; one looks to be solved now.

Jan



 

