
Re: [PATCH] x86/x2apic: introduce a mixed physical/cluster mode





Hello,
Thanks a lot for all the details and explanations! :)

On 2023-11-27 11:11, Andrew Cooper wrote:
> On 24/11/2023 7:54 pm, Neowutran wrote:
> > Hi, 
> > I did some more tests and research, indeed this patch improved/solved my 
> > specific case. 
> >
> > Starting point: 
> >
> > I am using Xen version 4.17.2 (exactly this source 
> > https://github.com/QubesOS/qubes-vmm-xen).
> > In the BIOS (an Asus motherboard), I configured the "local apic" parameter 
> > to "X2APIC".
> > For Xen, I did not set the parameter "x2apic-mode" nor the parameter 
> > "x2apic_phys". 
> >
> > Case 1:
> > I tried to boot just like that; result: the system is unusably slow.
> >
> > Case 2:
> > Then, I applied a backport of the patch
> > https://lore.kernel.org/xen-devel/20231106142739.19650-1-roger.pau@xxxxxxxxxx/raw
> > to the original Xen version of QubesOS and I recompiled. 
> > (https://github.com/neowutran/qubes-vmm-xen/blob/x2apic3/X2APIC.patch)
> > Result: it works, the system is usable. 
> >
> > Case 3:
> > Then, I applied the patch 
> > https://github.com/xen-project/xen/commit/26a449ce32cef33f2cb50602be19fcc0c4223ba9
> > to the original Xen version of QubesOS and I recompiled.
> > (https://github.com/neowutran/qubes-vmm-xen/blob/x2apic4/X2APIC.patch)
> > Result: the system is unusably slow. 
> >
> >
> > In "Case 2", the value returned by the function "apic_x2apic_probe" is 
> > "&apic_x2apic_mixed". 
> > In "Case 3", the value returned by the function "apic_x2apic_probe" is 
> > "&apic_x2apic_cluster". 
> >
> >
> > -------------------
> > If you want / need, details for the function "apic_x2apic_probe":
> >
> > Known "input" value:
> >
> > "CONFIG_X2APIC_PHYSICAL" is not defined
> > "iommu_intremap == iommu_intremap_off" = false
> > "acpi_gbl_FADT.flags & ACPI_FADT_APIC_PHYSICAL" -> 0
> > "acpi_gbl_FADT.flags" = 247205 (in decimal)
> > "CONFIG_X2APIC_PHYSICAL" is not defined
> > "CONFIG_X2APIC_MIXED" is defined, because it is the default choice
> > "x2apic_mode" = 0
> > "x2apic_phys" = -1
> >
> >
> >
> > Trace log (I added some "printk" calls to trace what was going on)
> > Case 2:
> > (XEN) NEOWUTRAN: X2APIC_MODE: 0 
> > (XEN) NEOWUTRAN: X2APIC_PHYS: -1 
> > (XEN) NEOWUTRAN: acpi_gbl_FADT.flags: 247205 
> > (XEN) NEOWUTRAN IOMMU_INTREMAP: different 
> > (XEN) Neowutran: PASSE 2 
> > (XEN) Neowutran: PASSE 4 
> > (XEN) NEOWUTRAN: X2APIC_MODE: 3 
> > (XEN) Neowutran: PASSE 7 
> > (XEN) NEOWUTRAN: X2APIC_MODE: 3 
> >  
> > (XEN) NEOWUTRAN: X2APIC_PHYS: -1 
> > (XEN) NEOWUTRAN: acpi_gbl_FADT.flags: 247205 
> > (XEN) NEOWUTRAN IOMMU_INTREMAP: different 
> >
> > Case 3:
> > (XEN) NEOWUTRAN2: X2APIC_PHYS: -1 
> > (XEN) NEOWUTRAN2: acpi_gbl_FADT.flags: 247205 
> > (XEN) NEOWUTRAN2 IOMMU_INTREMAP: different 
> > (XEN) Neowutran2: Passe 1 
> > (XEN) NEOWUTRAN2: X2APIC_PHYS: 0 
> > (XEN) Neowutran2: Passe 6 
> > (XEN) Neowutran2: Passe 7 
> > (XEN) NEOWUTRAN2: X2APIC_PHYS: 0 
> > (XEN) NEOWUTRAN2: acpi_gbl_FADT.flags: 247205 
> > (XEN) NEOWUTRAN2 IOMMU_INTREMAP: different 
> > (XEN) Neowutran2: Passe 2 
> > (XEN) Neowutran2: Passe 4 
> > (XEN) Neowutran2: Passe 7
> >
> >
> >
> > If you require the full logs, I could publish the full logs somewhere.
> > ----------------------
> >
> > (However, I do not understand whether the root issue is a buggy motherboard,
> > a bug in Xen, whether the parameter "X2APIC_PHYSICAL" should have been set
> > by the QubesOS project, or something else.)
> 
> Hello,
> 
> Thank you for the analysis.
> 
> For your base version of QubesOS Xen, was that 4.13.2-5 ?   I can't see
> any APIC changes in the patchqueue, and I believe all relevant bugfixes
> are in 4.17.2, but I'd just like to confirm.

I was using the qubes-vmm-xen release "4.17.2-5", which uses Xen version
"4.17.2". I don't see any custom APIC modifications in the patches
applied to Xen by QubesOS.

> 
> First, by "unusably slow", other than the speed, did everything else
> appear to operate adequately?  Any chance you could guess the slowdown?
> i.e. was it half the speed, or "seconds per log console line during
> boot" levels of slow?

Once I was logged in, it took me around 10 minutes to type the command
"sudo dmesg > log".

There were also graphical instabilities (the screen displays something, then
goes black, and a few seconds later it displays things again. Sometimes it
crashes completely and I need to reboot to try to finish the boot+login
process), and I was unable to start guests because the system was too slow.

Some of the logs gathered from "sudo dmesg" that only appear for case 1 and
case 3: 

"
 nvme nvme1: I/O 998 QID 1 timeout, completion polled
 nvme nvme1: I/O 854 QID 5 timeout, completion polled
 ...
 [drm] Fence fallback timer expired on ring sdma0
 [drm] Fence fallback timer expired on ring sdma0
 ...
 [drm] Fence fallback timer expired on ring sdma0
 [drm] Fence fallback timer expired on ring gfx_0.0.0
 [drm] Fence fallback timer expired on ring gfx_0.0.0
 [drm] Fence fallback timer expired on ring sdma0
 ...
" 
things like that repeated hundreds of times. 

> 
> Having re-reviewed 26a449ce32, the patch is correct but the reasoning is
> wrong.
> 
> ACPI_FADT_APIC_CLUSTER predates x2APIC by almost a decade (it appeared
> in ACPI 3.0), and is not relevant outside of xAPIC mode.  xAPIC has 2
> different logical destination modes, cluster and flat, and their
> applicability is dependent on whether you have fewer or more than 8
> local APICs, hence that property being called out in the ACPI spec.
> 
> x2APIC does not have this property.  DFR was removed from the
> architecture, and logical mode is strictly cluster.  So the bit should
> never have been interpreted on an x2APIC code path.
> 
> Not that it matters in your case - the bit isn't set in your FADT, hence
> why case 1 and 3 have the same behaviour.
> 
> 
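
The FADT value from the trace (247205, i.e. 0x3C5A5) can be checked against
the two bits discussed above. This is only a minimal sketch, assuming the bit
positions given in the ACPI specification for the FADT "Flags" field (bit 18
for the cluster model, bit 19 for physical destination mode, corresponding to
ACPI_FADT_APIC_CLUSTER and ACPI_FADT_APIC_PHYSICAL). The helper
"fadt_flag_set" and the two *_BIT macros are hypothetical names for
illustration, not Xen code:

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions in the ACPI FADT "Flags" field (assumed from the ACPI spec). */
#define FADT_APIC_CLUSTER_BIT  18u  /* ACPI_FADT_APIC_CLUSTER  == 1 << 18 */
#define FADT_APIC_PHYSICAL_BIT 19u  /* ACPI_FADT_APIC_PHYSICAL == 1 << 19 */

/* Hypothetical helper (not a Xen function): 1 if `bit` is set in `flags`. */
static int fadt_flag_set(uint32_t flags, unsigned int bit)
{
    assert(bit < 32u);
    return (int)((flags >> bit) & 1u);
}
```

With flags = 247205, both fadt_flag_set(flags, FADT_APIC_CLUSTER_BIT) and
fadt_flag_set(flags, FADT_APIC_PHYSICAL_BIT) return 0, which agrees with the
"-> 0" result listed in the known input values.
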
> This brings us to case 2, where mixed mode does seem to resolve the perf
> problem.
> 
> Since that patch was written, I've learnt how cluster delivery mode
> works for external interrupts, and Xen should never ever have been using
> it (Xen appears to be alone in OS software here).  For an external
> interrupt in Logical cluster mode, it always sends to the lowest ID in
> the cluster.  If that APIC decides that the local processor is too busy
> to handle the interrupt now, it forwards the interrupt to the next APIC
> in the cluster, and this cycle continues until one APIC accepts the message.
> 
> You get most interrupts hitting the lowest APIC in the cluster, but the
> interrupt can be forwarded between APICs for an unbounded quantity of
> time depending on system utilisation.
> 
> 
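
The forwarding behaviour described above can be pictured with a toy model.
This is only an illustrative sketch of that description, not how the hardware
or Xen implements it: the function name "deliver_logical_cluster" and the
"busy" array are hypothetical, and the model stops after one pass over the
cluster, whereas the description above says the cycle continues until some
APIC accepts.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model: busy[i] != 0 means APIC i in the cluster declines the interrupt.
 * Delivery starts at the lowest ID (index 0) and is forwarded to the next
 * APIC until one accepts. Returns the index of the accepting APIC and stores
 * the number of forwards in *hops. Returns -1 if every APIC declined during
 * this single pass.
 */
static int deliver_logical_cluster(const int *busy, size_t n,
                                   unsigned int *hops)
{
    assert(busy != NULL && hops != NULL && n > 0);
    *hops = 0;
    for (size_t i = 0; i < n; i++) {
        if (!busy[i])
            return (int)i;   /* this APIC accepts the interrupt */
        (*hops)++;           /* too busy: forward to the next APIC */
    }
    return -1;
}
```

On an idle cluster the lowest ID accepts immediately with zero forwards; the
more APICs at the low end of the cluster are busy, the more forwards each
interrupt needs before it is serviced.
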
> Could you please take case 2 and confirm what happens when booting with
> x2apic-mode={physical,cluster}?  If the pattern holds, the physical
> should be fine, and cluster should see the same problems as case 1 and 3.

I confirm that the pattern holds: "physical" is fine and "cluster"
has the same issue as case 1 and case 3.


> Thanks,
> 
> ~Andrew

Thanks, 
Neowutran




 

