[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] x86/x2apic: introduce a mixed physical/cluster mode



Content-D
isposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <81f6bbd5-0487-461a-af1a-dbb6ead47cab@xxxxxxxxxx>

On 2023-11-18 11:11, Andrew Cooper wrote:
> On 18/11/2023 3:04 am, Elliott Mitchell wrote:
> > On Fri, Nov 17, 2023 at 11:12:37AM +0100, Neowutran wrote:
> >> On 2023-11-07 11:11, Elliott Mitchell wrote:
> >>> On Mon, Oct 30, 2023 at 04:27:22PM +01
> >>>> On Mon, Oct 30, 2023 at 07:50:27AM -0700, Elliott Mitchell wrote:
> >>>>> On Tue, Oct 24, 2023 at 03:51:50PM +0200, Roger Pau Monne wrote:
> >>>>>> diff --git a/xen/arch/x86/genapic/x2apic.c 
> >>>>>> b/xen/arch/x86/genapic/x2apic.c
> >>>>>> index 707deef98c27..15632cc7332e 100644
> >>>>>> --- a/xen/arch/x86/genapic/x2apic.c
> >>>>>> +++ b/xen/arch/x86/genapic/x2apic.c
> >>>>>> @@ -220,38 +239,56 @@ static struct notifier_block x2apic_cpu_nfb = {
> >>>>>>  static int8_t __initdata x2apic_phys = -1;
> >>>>>>  boolean_param("x2apic_phys", x2apic_phys);
> >>>>>>  
> >>>>>> +enum {
> >>>>>> +   unset, physical, cluster, mixed
> >>>>>> +} static __initdata x2
apic_mode = unset;
> >>>>>> +
> >>>>>> +static int __init parse_x2apic_mode(const char *s)
> >>>>>> +{
> >>>>>> +    if ( !cmdline_strcmp(s, "physical") )
> >>>>>> +        x2apic_mode = physical;
> >>>>>> +    else if ( !cmdline_strcmp(s, "cluster") )
> >>>>>> +        x2apic_mode = cluster;
> >>>>>> +    else if ( !cmdline_strcmp(s, "mixed") )
> >>>>>> +   
> >>>>>> +    else
> >>>>>> +        return EINVAL;
> >>>>>> +
> >>>>>> +    return 0;
> >>>>>> +}
> >>>>>> +custom_param("x2apic-mode", parse_x2apic_mode);
> >>>>>> +
> >>>>>>  const struct genapic *__init apic_x2apic_probe(void)
> >>>>>>  {
> >>>>>> -    if ( x2apic_phys < 0 )
> >>>>>> +    /* x2apic-mode option has preference over x2apic_phys. */
> >>>>>> +    if ( x2apic_phys >= 0 && x2apic_mode == unset )
> >>>>>> +        x2apic_mode = x2apic_phys ? physical : cluster;
> >>>>>> +
> >>>>>> +    if ( x2apic_mode == unset )
> >>>>>>      {
> >>>>>> -        /*
> >>>>>> -         * Force physical mode if there's no (full) interrupt 
> >>>>>> remapping support:

> >>>>>> -         * The ID in clustered mode requires a 32 bit destination 
> >>>>>> field due to
> >>>>>> -         * the usage of the high 16 bits to hold the cluster ID.
> >>>>>> -         */
> >>>>>> -        x2apic_phys = iommu_intremap != iommu_intremap_full ||
> >>>>>> -                      (acpi_gbl_FADT.flags & ACPI_FADT_APIC_PHYSICAL) 
> >>>>>> ||
> >>>>>> -        
> >>>>>> -    }
> >>>>>> -    else if ( !x2apic_phys )
> >>>>>> -        switch ( iommu_intremap )
> >>>>>> +        if ( acpi_gbl_FADT.flags & ACPI_FADT_APIC_PHYSICAL )
> >>>>>>          {
> >>>>> Could this explain the issues with recent AMD processors/motherboards?
> >>>>>
> >>>>> Mainly the firmware had been setting this flag, but Xen was previously
> >>>>> ignoring it?
> >>>> No, not unless you pass {no-}x2apic_phys={false,0} on the Xen command
> >>>> line to force logical (clustered) destination mode.
> >>>>
> >>>>> As such Xen had been attempting to use cluster mode on an
> >>>>> x2APIC where that mode was broken for physical interrupts?
> >>>> No
, not realy, x2apic_phys was already forced to true if
> >>>> acpi_gbl_FADT.flags & ACPI_FADT_APIC_PHYSICAL is set on the FADT (I
> >>>> just delete that line in this same chunk and move it here).
> >>> Okay, that was from a quick look at the patch.  Given the symptoms and
> >>> workaround with recent AMD motherboards, this looked
> >>>
> >>> In that case it might be a bug in what AMD is providing to motherboard
> >>> manufacturers.  Mainly this bit MUST be set, but AMD's implementation
> >>> leaves it unset.
> >>>
> >>> Could also be if the setup is done correctly the bit can be cleared, but
> >>> multiple motherboard manufacturers are breaking this.  Perhaps the steps
> >>> are fragile and AMD needed to provide better guidance.
> >>>
> >>>
> >>> Neowutran, are you still setup to and interested in doing
> >>> experimentation/testing with Xen on your AMD computer?  Would you be up
> >>> for trying the patch here:
> >>>
> >>> https://lore.kernel.org/xen-devel/20231106142739.19650-1-roger.pau@xxxxxxxxxx/raw
> 
>>>
> >>> I have a suspicion this *might* fix the x2APIC issue everyone has been
> >>> seeing.
> >>>
> >>> How plausible would it be to release this as a bugfix/workaround on 4.17?
> >>> I'm expecting a "no", but figured I should ask given how widespread the
> >>> issue is.
> >> I just applied the patch on my setup ( 
> >> https://lore.kernel.org/xen-devel/20231106142739.19650-1-roger.pau@xxxxxxxxxx/raw
> >>  ) 
> >> It seems to fix the x2APIC issue I was having. 
> >>
> >> I only did some quick tests: 
> >>
> >> I tried all the differents values in my bios for the X2APIC settings. 
> >> Now the system successfully boot in all the cases, without needing
> >> manual override of the x2apic_phys/x2apic_mode parameter in boot 
> >> commandline .
> > In light of this issue effecting a large number of people with recent
> > hardware, I formally request the patch
> > "x86/x2apic: introduce a mixed physical/cluster mode" be considered for
> > backport release on the 4.17 and 4.18 branches.
> >
> > (I'm unsure whether it is realistic 
for a 4.17 update, but I figure I
> > should ask)
> 
> This is an unreasonable ask.
> 
> I believe you when you say there is (or at least was) an x2apic bug (or
> bugs), but not once did you provide the logging requested, nor engage
> usefully with us in debugging.
> 
> And despite this, we (Roger, Jan and myself) found, fixed and backported
> 3 x2apic bugs.
> 
> Now you come along guessing alone at x2apic in a patch name that it
> fixes your problem, on a patch which is not a bugfix - it's a
> performance optimisation.


------

> Neowutran, thankyou for looking into the patch, but I'm afraid that
> doesn't confirm that this patch fixed an issue either.  If it really did
> make a difference, then you'll see a difference in behaviour using each
> of the 3 new x2apic-mode= options.
> 
> Please could you take your single up-to-date build of Xen, put the BIOS
> settings back to whatever was causing you problems originally, and
> describe what happens when booting each of
> x2apic-mode={physical,cluster,mixed}?


Hi, 
I did some more tests and research, indeed this patch improved/solved my 
specific case. 

Starting point: 

I am using Xen version 4.17.2 (exactly this source 
https://github.com/QubesOS/qubes-vmm-xen).
In the bios (a Asus motherboard), I configured the "local apic" parameter to 
"X2APIC".
For Xen, I did not set the parameter "x2apic-mode" nor the parameter 
"x2apic_phys". 

Case 1:
I tryied to boot just like that, result: system is unusuably slow

Case 2:
Then, I applied a backport of the patch  
https://lore.kernel.org/xen-devel/20231106142739.19650-1-roger.pau@xxxxxxxxxx/raw
 
to the original Xen version of QubesOS and I recompiled. 
(https://github.com/neowutran/qubes-vmm-xen/blob/x2apic3/X2APIC.patch)
Result: it work, the system is usable. 

Case 3:
Then, I applied the patch 
https://github.com/xen-project/xen/commit/26a449ce32cef33f2cb50602be19fcc0c4223ba9
to the original Xen version of QubesOS and I recompiled.
(https://github.com/neowutran/qubes-vmm-xen/blob/x2apic4/X2APIC.patch)
Result: system is 
unusuably slow. 


In "Case 2", the value returned by the function "apic_x2apic_probe" is 
"&apic_x2apic_mixed". 
In "Case 3", the value returned by the function "apic_x2apic_probe" is 
"&apic_x2apic_cluster". 


-------------------
If you want / need, details for the function "apic_x2apic_probe":

Known "input" value:

"CONFIG_X2APIC_PHYSICAL" is not defined
"iommu_intremap == iommu_intremap_off" = false
"acpi_gbl_FADT.flags & ACPI_FADT_APIC_PHYSICAL" -> 0
"acpi_gbl_FADT.flags" = 247205 (in decimal)
"CONFIG_X2APIC_PHYSICAL" is not defined
"CONFIG_X2APIC_MIXED" is defined, because it is the default choice
"x2apic_mode" = 0
"x2apic_phys" = -1



Trace log (I did some call "printk" to trace what was going on)
Case 2:
(XEN) NEOWUTRAN: X2APIC_MODE: 0 
(XEN) NEOWUTRAN: X2APIC_PHYS: -1 
(XEN) NEOWUTRAN: acpi_gbl_FADT.flags: 247205 
(XEN) NEOWUTRAN IOMMU_INTREMAP: different 
(XEN) Neowutran: PASSE 2 
(XEN) Neowutran: PASSE 4 
(XEN) NEOWUTRAN: X2APIC_MODE: 3 
(XEN) Neowutran: PASSE 7 
(XEN) NEOWUTRAN: X2APIC_MODE: 3 

(XEN) NEOWUTRAN: X2APIC_PHYS: -1 
(XEN) NEOWUTRAN: acpi_gbl_FADT.flags: 247205 
(XEN) NEOWUTRAN IOMMU_INTREMAP: different 

Case 3:
(XEN) NEOWUTRAN2: X2APIC_PHYS: -1 
(XEN) NEOWUTRAN2: acpi_gbl_FADT.flags: 247205 
(XEN) NEOWUTRAN2 IOMMU_INTREMAP: different 
(XEN) Neowutran2: Passe 1 
(XEN) NEOWUTRAN2: X2APIC_PHYS: 0 
(XEN) Neowutran2: Passe 6 
(XEN) Neowutran2: Passe 7 
(XEN) NEOWUTRAN2: X2APIC_PHYS: 0 
(XEN) NEOWUTRAN2: acpi_gbl_FADT.flags: 247205 
(XEN) NEOWUTRAN2 IOMMU_INTREMAP: different 
(XEN) Neowutran2: Passe 2 
(XEN) Neowutran2: Passe 4 
(XEN) Neowutran2: Passe 7



If you require the full logs, I could publish the full logs somewhere.
----------------------

( However I do not understand if the root issue is a buggy motherboard, a
bug in xen, or if the parameter "X2APIC_PHYSICAL" should have been set
by the QubesOS project, or something else)
 
> Thankyou,
> 
> ~Andrew

Thanks you, 
Neowutran




 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.