Xen project Mailing List

Re: S3 regression on AMD in 4.20

To: Roger Pau Monné <roger.pau@xxxxxxxxxx>

From: Marek Marczykowski-Górecki <marmarek@xxxxxxxxxxxxxxxxxxxxxx>

Date: Mon, 17 Mar 2025 17:35:51 +0100

Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx, Jan Beulich <jbeulich@xxxxxxxx>, Doug Goldstein <cardoe@xxxxxxxxxx>, Oleksii Kurochko <oleksii.kurochko@xxxxxxxxx>, Mykola Kvach <xakep.amatop@xxxxxxxxx>, Mykyta Poturai <mykyta_poturai@xxxxxxxx>

Delivery-date: Mon, 17 Mar 2025 16:36:01 +0000

Feedback-id: i1568416f:Fastmail

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Mon, Mar 17, 2025 at 04:56:15PM +0100, Roger Pau Monné wrote: > On Sat, Mar 15, 2025 at 12:02:50AM +0000, Andrew Cooper wrote: > > On 14/03/2025 11:53 pm, Marek Marczykowski-Górecki wrote: > > > On Fri, Mar 14, 2025 at 11:23:28PM +0100, Marek Marczykowski-Górecki > > > wrote: > > >> On Fri, Mar 14, 2025 at 02:19:19PM -0700, Stefano Stabellini wrote: > > >>> On Fri, 14 Mar 2025, Marek Marczykowski-Górecki wrote: > > >>>> This is AMD Zen2 (Ryzen 5 4500U specifically), in a HP Probook 445 G7. > > >>>> > > >>>> This one has working S3, so add a test for it here. > > >>>> > > >>>> Signed-off-by: Marek Marczykowski-Górecki > > >>>> <marmarek@xxxxxxxxxxxxxxxxxxxxxx> > > >>>> --- > > >>>> Cc: Jan Beulich <jbeulich@xxxxxxxx> > > >>>> Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx> > > >>>> > > >>>> The suspend test added here currently fails on staging[1], but passes > > >>>> on > > >>>> staging-4.19[2]. So the regression wants fixing before committing this > > >>>> patch. > > >>>> > > >>>> [1] > > >>>> https://gitlab.com/xen-project/people/marmarek/xen/-/jobs/9408437140 > > >>>> [2] > > >>>> https://gitlab.com/xen-project/people/marmarek/xen/-/jobs/9408943441 > > >>> We could commit the patch now without the s3 test. > > >>> > > >>> I don't know what the x86 maintainers think about fixing the suspend > > >>> bug, but one idea would be to run a bisection between 4.20 and 4.19. > > >> I'm on it already, but it's annoying. Lets convert this thread to > > >> discussion about the issue: > > >> > > >> So, I bisected it between staging-4.19 and master. The breakage is > > >> somewhere between (inclusive): > > >> eb21ce14d709 x86/boot: Rewrite EFI/MBI2 code partly in C > > >> and > > >> 47990ecef286 x86/boot: Improve MBI2 structure check > > >> > > >> But, the first one breaks booting on this system and it remains broken > > >> until the second commit (or its parent) - at which point S3 is already > > >> broken. So, there is a range of 71 commits that may be responsible... > > >> > > >> But then, based on a matrix chat and Jan's observation I've tried > > >> reverting f75780d26b2f "xen: move per-cpu area management into common > > >> code" just on top of 47990ecef286, and that fixed suspend. > > >> Applying "xen/percpu: don't initialize percpu on resume" on top of > > >> 47990ecef286 fixes suspend too. > > >> But applying it on top of master > > >> (91772f8420dfa2fcfe4db68480c216db5b79c512 specifically) does not fix it, > > >> but the failure mode is different than without the patch - system resets > > >> on S3 resume, with no crash message on the serial console (even with > > >> sync_console), instead of hanging. > > >> And one more data point: reverting f75780d26b2f on top of master is the > > >> same as applying "xen/percpu: don't initialize percpu on resume" on > > >> master - system reset on S3 resume. > > >> So, it looks like there are more issues... > > > Another bisection round and I have the second culprit: > > > > > > 8e60d47cf011 x86/iommu: avoid MSI address and data writes if IRT > > > index hasn't changed > > > > > > With master+"xen/percpu: don't initialize percpu on resume"+revert of > > > 8e60d47cf011 suspend works again on this AMD system. > > > > That's not surprising in the slightest. > > > > Caching hardware values in Xen isn't safe across S3, which QubesOS has > > found time and time again, and for which we still have outstanding bugs. > > > > S3 turns most of the system off. RAM gets preserved, but devices and > > plenty of internal registers don't. > > I think I've spotted the issue. enable_iommu() called on resume > (ab)uses set_msi_affinity() to force an MSI register write, as it's > previous behavior was to unconditionally propagate the values. With > my change it would no longer perform such writes on resume. > > I think the patch below should help. It doesn't, I still got reboot on resume, with no crash message on serial (even with sync_console). And the other patch (setting force=true in all calls) gets me panic on boot: (XEN) [ 11.890757] Assertion '(msg->data & (entry[-nr].msi.nvec - 1)) == nr' failed at arch/x86/msi.c:210 (XEN) [ 11.900440] ----[ Xen-4.21-unstable x86_64 debug=y Tainted: C ]---- (XEN) [ 11.908082] CPU: 3 (XEN) [ 11.910923] RIP: e008:[<ffff82d040304bdc>] msi.c#write_msi_msg+0xf5/0x16c (XEN) [ 11.918652] RFLAGS: 0000000000010002 CONTEXT: hypervisor (d0v3) (XEN) [ 11.925404] rax: 0000000000000001 rbx: ffff830401ded840 rcx: ffff8303f5367d38 (XEN) [ 11.933576] rdx: 0000000000000000 rsi: ffff8303f536e1d8 rdi: 8000000000000000 (XEN) [ 11.941750] rbp: ffff8303f5367d90 rsp: ffff8303f5367d68 r8: 0000000000000000 (XEN) [ 11.949924] r9: 0000000000000000 r10: ffff8303f5367d58 r11: 0000000000000000 (XEN) [ 11.958097] r12: ffff8303f5367da0 r13: 0000000000000000 r14: 0000000000000001 (XEN) [ 11.966269] r15: 0000000000000000 cr0: 0000000080050033 cr4: 00000000003506e0 (XEN) [ 11.974442] cr3: 00000003f2a0c000 cr2: ffff888100109890 (XEN) [ 11.980483] fsb: 0000000000000000 gsb: ffff8881352c0000 gss: 0000000000000000 (XEN) [ 11.988656] ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e010 cs: e008 (XEN) [ 11.996301] Xen code around <ffff82d040304bdc> (msi.c#write_msi_msg+0xf5/0x16c): (XEN) [ 12.004382] 41 bd 00 00 00 00 eb c2 <0f> 0b 4c 8b 73 28 44 0f b6 7b 01 41 8b 14 24 41 (XEN) [ 12.012998] Xen stack trace from rsp=ffff8303f5367d68: (XEN) [ 12.018772] ffff8303f5582000 0000000000000000 ffff830401ded840 ffff8303f555e000 (XEN) [ 12.027036] 0000000000000000 ffff8303f5367dc8 ffff82d040304cd9 00000000fee00000 (XEN) [ 12.035299] 0000000000004061 ffff8303f5582000 ffff830401ded380 ffff82d0403c6340 (XEN) [ 12.043560] ffff830401ded9a0 ffff82d04030b00a 000000010000002a ffff82d0403a5aaf (XEN) [ 12.051821] ffff8303f5367ef8 ffffc90040087afc ffff8303f556f000 ffff8303f5569a80 (XEN) [ 12.060083] 000000000000002a 00000000000004d8 ffff82d04020ca6f 0000002aaaaaaaaa (XEN) [ 12.068344] ffff830401ded380 ffff8303f555e000 00000000000004d8 aaaaaaaa00000000 (XEN) [ 12.076606] aaaaaaaaaaaaaaaa ffff8303f5367ef8 0000000000000020 ffff8303f554f000 (XEN) [ 12.084869] ffffc90040087afc 0000000000000000 0000000000000002 ffff82d0402eb7fc (XEN) [ 12.093130] ffff88810ff6a228 ffff888100448c90 ffffffffffffffff 0000000000000000 (XEN) [ 12.101393] 0000000000000000 ffff8303f5367ef8 0000000000000000 ffff8303f554f000 (XEN) [ 12.109654] 0000000000000000 0000000000000000 0000000000000000 ffff8303f5367fff (XEN) [ 12.117916] 0000000000000000 ffff82d0402012cd 0000000000000000 ffff88810ff6a2a4 (XEN) [ 12.126179] ffff88810ff6a360 0000000000000000 ffff88810236fde0 0000000000000060 (XEN) [ 12.134442] 0000000000000246 0000000000000000 ffffffff82b3cce0 ffff888100448c90 (XEN) [ 12.142703] 0000000000000020 ffffffff81df540a ffff88810ff6a228 ffffc90040087afc (XEN) [ 12.150966] 0000000000000002 0000010000000000 ffffffff81df540a 000000000000e033 (XEN) [ 12.159227] 0000000000000246 ffffc90040087ae0 000000000000e02b c2c2c2c2c2c2c2c2 (XEN) [ 12.167489] c2c2c2c2c2c2c2c2 c2c2c2c2c2c2c2c2 c2c2c2c2c2c2c2c2 0000e01000000003 (XEN) [ 12.175752] ffff8303f554f000 00000033b4d9f000 00000000003506e0 0000000000000000 (XEN) [ 12.184014] Xen call trace: (XEN) [ 12.187388] [<ffff82d040304bdc>] R msi.c#write_msi_msg+0xf5/0x16c (XEN) [ 12.194407] [<ffff82d040304cd9>] S set_msi_affinity+0x86/0x93 (XEN) [ 12.201071] [<ffff82d04030b00a>] S pirq_guest_bind+0x2c0/0x484 (XEN) [ 12.207823] [<ffff82d04020ca6f>] S do_event_channel_op+0xad1/0x11b0 (XEN) [ 12.215019] [<ffff82d0402eb7fc>] S pv_hypercall+0x55f/0x5dd (XEN) [ 12.221504] [<ffff82d0402012cd>] S lstar_enter+0x13d/0x150 (XEN) [ 12.227899] (XEN) [ 12.229943] (XEN) [ 12.231987] **************************************** (XEN) [ 12.237584] Panic on CPU 3: (XEN) [ 12.240960] Assertion '(msg->data & (entry[-nr].msi.nvec - 1)) == nr' failed at arch/x86/msi.c:210 (XEN) [ 12.250644] **************************************** I wonder if similar crash happens on resume with this patch. BTW, the -nr in the above assert looks suspicious, is it trying to fine the first entry? > I wonder if we should unconditionally propagate the write from > __setup_msi_irq(), as it's also unlikely to make any difference to > skip that write, and would further keep the previous behavior. > > Thanks, Roger. > --- > commit 1d9bfd0d45f6b547b19f0d2f752fc3bd10103971 > Author: Roger Pau Monne <roger.pau@xxxxxxxxxx> > Date: Mon Mar 17 15:40:11 2025 +0100 > > x86/msi: always propagate MSI writes when not in active system mode > > Relax the limitation on MSI register writes, and only apply it when the > system is in active state. For example AMD IOMMU drivers rely on using > set_msi_affinity() to force an MSI register write on resume from > suspension. > > The original patch intention was to reduce the number of MSI register > writes when the system is in active state. Leave the other states to > always perform the writes, as it's safer given the existing code, and it's > expected to not make a difference performance wise. > > Reported-by: Marek Marczykowski-Górecki <marmarek@xxxxxxxxxxxxxxxxxxxxxx> > Fixes: ('8e60d47cf011 x86/iommu: avoid MSI address and data writes if IRT > index hasn't changed') > Signed-off-by: Roger Pau Monné <roger.pau@xxxxxxxxxx> > > diff --git a/xen/arch/x86/msi.c b/xen/arch/x86/msi.c > index 163ccf874720..8bb3bb18af61 100644 > --- a/xen/arch/x86/msi.c > +++ b/xen/arch/x86/msi.c > @@ -189,6 +189,15 @@ static int write_msi_msg(struct msi_desc *entry, struct > msi_msg *msg, > { > entry->msg = *msg; > > + if ( unlikely(system_state != SYS_STATE_active) ) > + /* > + * Always propagate writes when not in the 'active' state. The > + * optimization to avoid the MSI address and data registers write is > + * only relevant for runtime state, and drivers on resume (at least) > + * rely on set_msi_affinity() to update the hardware state. > + */ > + force = true; > + > if ( iommu_intremap != iommu_intremap_off ) > { > int rc; -- Best Regards, Marek Marczykowski-Górecki Invisible Things Lab

Attachment: signature.asc
Description: PGP signature

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.