[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: S3 regression on AMD in 4.20



On Mon, Mar 17, 2025 at 04:56:15PM +0100, Roger Pau Monné wrote:
> On Sat, Mar 15, 2025 at 12:02:50AM +0000, Andrew Cooper wrote:
> > On 14/03/2025 11:53 pm, Marek Marczykowski-Górecki wrote:
> > > On Fri, Mar 14, 2025 at 11:23:28PM +0100, Marek Marczykowski-Górecki 
> > > wrote:
> > >> On Fri, Mar 14, 2025 at 02:19:19PM -0700, Stefano Stabellini wrote:
> > >>> On Fri, 14 Mar 2025, Marek Marczykowski-Górecki wrote:
> > >>>> This is AMD Zen2 (Ryzen 5 4500U specifically), in a HP Probook 445 G7.
> > >>>>
> > >>>> This one has working S3, so add a test for it here.
> > >>>>
> > >>>> Signed-off-by: Marek Marczykowski-Górecki 
> > >>>> <marmarek@xxxxxxxxxxxxxxxxxxxxxx>
> > >>>> ---
> > >>>> Cc: Jan Beulich <jbeulich@xxxxxxxx>
> > >>>> Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
> > >>>>
> > >>>> The suspend test added here currently fails on staging[1], but passes 
> > >>>> on
> > >>>> staging-4.19[2]. So the regression wants fixing before committing this
> > >>>> patch.
> > >>>>
> > >>>> [1] 
> > >>>> https://gitlab.com/xen-project/people/marmarek/xen/-/jobs/9408437140
> > >>>> [2] 
> > >>>> https://gitlab.com/xen-project/people/marmarek/xen/-/jobs/9408943441
> > >>> We could commit the patch now without the s3 test.
> > >>>
> > >>> I don't know what the x86 maintainers think about fixing the suspend
> > >>> bug, but one idea would be to run a bisection between 4.20 and 4.19.
> > >> I'm on it already, but it's annoying. Lets convert this thread to
> > >> discussion about the issue:
> > >>
> > >> So, I bisected it between staging-4.19 and master. The breakage is
> > >> somewhere between (inclusive):
> > >> eb21ce14d709 x86/boot: Rewrite EFI/MBI2 code partly in C
> > >> and
> > >> 47990ecef286 x86/boot: Improve MBI2 structure check
> > >>
> > >> But, the first one breaks booting on this system and it remains broken
> > >> until the second commit (or its parent) - at which point S3 is already
> > >> broken. So, there is a range of 71 commits that may be responsible...
> > >>
> > >> But then, based on a matrix chat and Jan's observation I've tried
> > >> reverting f75780d26b2f "xen: move per-cpu area management into common
> > >> code" just on top of 47990ecef286, and that fixed suspend.
> > >> Applying "xen/percpu: don't initialize percpu on resume" on top of
> > >> 47990ecef286 fixes suspend too.
> > >> But applying it on top of master
> > >> (91772f8420dfa2fcfe4db68480c216db5b79c512 specifically) does not fix it,
> > >> but the failure mode is different than without the patch - system resets
> > >> on S3 resume, with no crash message on the serial console (even with
> > >> sync_console), instead of hanging.
> > >> And one more data point: reverting f75780d26b2f on top of master is the
> > >> same as applying "xen/percpu: don't initialize percpu on resume" on
> > >> master - system reset on S3 resume.
> > >> So, it looks like there are more issues...
> > > Another bisection round and I have the second culprit:
> > >
> > >     8e60d47cf011 x86/iommu: avoid MSI address and data writes if IRT 
> > > index hasn't changed
> > >
> > > With master+"xen/percpu: don't initialize percpu on resume"+revert of
> > > 8e60d47cf011 suspend works again on this AMD system.
> > 
> > That's not surprising in the slightest.
> > 
> > Caching hardware values in Xen isn't safe across S3, which QubesOS has
> > found time and time again, and for which we still have outstanding bugs.
> > 
> > S3 turns most of the system off.  RAM gets preserved, but devices and
> > plenty of internal registers don't.
> 
> I think I've spotted the issue.  enable_iommu() called on resume
> (ab)uses set_msi_affinity() to force an MSI register write, as it's
> previous behavior was to unconditionally propagate the values.  With
> my change it would no longer perform such writes on resume.
> 
> I think the patch below should help.

It doesn't, I still got reboot on resume, with no crash message on
serial (even with sync_console).

And the other patch (setting force=true in all calls) gets me panic
on boot:

(XEN) [   11.890757] Assertion '(msg->data & (entry[-nr].msi.nvec - 1)) == nr' 
failed at arch/x86/msi.c:210
(XEN) [   11.900440] ----[ Xen-4.21-unstable  x86_64  debug=y  Tainted:   C    
]----
(XEN) [   11.908082] CPU:    3
(XEN) [   11.910923] RIP:    e008:[<ffff82d040304bdc>] 
msi.c#write_msi_msg+0xf5/0x16c
(XEN) [   11.918652] RFLAGS: 0000000000010002   CONTEXT: hypervisor (d0v3)
(XEN) [   11.925404] rax: 0000000000000001   rbx: ffff830401ded840   rcx: 
ffff8303f5367d38
(XEN) [   11.933576] rdx: 0000000000000000   rsi: ffff8303f536e1d8   rdi: 
8000000000000000
(XEN) [   11.941750] rbp: ffff8303f5367d90   rsp: ffff8303f5367d68   r8:  
0000000000000000
(XEN) [   11.949924] r9:  0000000000000000   r10: ffff8303f5367d58   r11: 
0000000000000000
(XEN) [   11.958097] r12: ffff8303f5367da0   r13: 0000000000000000   r14: 
0000000000000001
(XEN) [   11.966269] r15: 0000000000000000   cr0: 0000000080050033   cr4: 
00000000003506e0
(XEN) [   11.974442] cr3: 00000003f2a0c000   cr2: ffff888100109890
(XEN) [   11.980483] fsb: 0000000000000000   gsb: ffff8881352c0000   gss: 
0000000000000000
(XEN) [   11.988656] ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: 
e008
(XEN) [   11.996301] Xen code around <ffff82d040304bdc> 
(msi.c#write_msi_msg+0xf5/0x16c):
(XEN) [   12.004382]  41 bd 00 00 00 00 eb c2 <0f> 0b 4c 8b 73 28 44 0f b6 7b 
01 41 8b 14 24 41
(XEN) [   12.012998] Xen stack trace from rsp=ffff8303f5367d68:
(XEN) [   12.018772]    ffff8303f5582000 0000000000000000 ffff830401ded840 
ffff8303f555e000
(XEN) [   12.027036]    0000000000000000 ffff8303f5367dc8 ffff82d040304cd9 
00000000fee00000
(XEN) [   12.035299]    0000000000004061 ffff8303f5582000 ffff830401ded380 
ffff82d0403c6340
(XEN) [   12.043560]    ffff830401ded9a0 ffff82d04030b00a 000000010000002a 
ffff82d0403a5aaf
(XEN) [   12.051821]    ffff8303f5367ef8 ffffc90040087afc ffff8303f556f000 
ffff8303f5569a80
(XEN) [   12.060083]    000000000000002a 00000000000004d8 ffff82d04020ca6f 
0000002aaaaaaaaa
(XEN) [   12.068344]    ffff830401ded380 ffff8303f555e000 00000000000004d8 
aaaaaaaa00000000
(XEN) [   12.076606]    aaaaaaaaaaaaaaaa ffff8303f5367ef8 0000000000000020 
ffff8303f554f000
(XEN) [   12.084869]    ffffc90040087afc 0000000000000000 0000000000000002 
ffff82d0402eb7fc
(XEN) [   12.093130]    ffff88810ff6a228 ffff888100448c90 ffffffffffffffff 
0000000000000000
(XEN) [   12.101393]    0000000000000000 ffff8303f5367ef8 0000000000000000 
ffff8303f554f000
(XEN) [   12.109654]    0000000000000000 0000000000000000 0000000000000000 
ffff8303f5367fff
(XEN) [   12.117916]    0000000000000000 ffff82d0402012cd 0000000000000000 
ffff88810ff6a2a4
(XEN) [   12.126179]    ffff88810ff6a360 0000000000000000 ffff88810236fde0 
0000000000000060
(XEN) [   12.134442]    0000000000000246 0000000000000000 ffffffff82b3cce0 
ffff888100448c90
(XEN) [   12.142703]    0000000000000020 ffffffff81df540a ffff88810ff6a228 
ffffc90040087afc
(XEN) [   12.150966]    0000000000000002 0000010000000000 ffffffff81df540a 
000000000000e033
(XEN) [   12.159227]    0000000000000246 ffffc90040087ae0 000000000000e02b 
c2c2c2c2c2c2c2c2
(XEN) [   12.167489]    c2c2c2c2c2c2c2c2 c2c2c2c2c2c2c2c2 c2c2c2c2c2c2c2c2 
0000e01000000003
(XEN) [   12.175752]    ffff8303f554f000 00000033b4d9f000 00000000003506e0 
0000000000000000
(XEN) [   12.184014] Xen call trace:
(XEN) [   12.187388]    [<ffff82d040304bdc>] R msi.c#write_msi_msg+0xf5/0x16c
(XEN) [   12.194407]    [<ffff82d040304cd9>] S set_msi_affinity+0x86/0x93
(XEN) [   12.201071]    [<ffff82d04030b00a>] S pirq_guest_bind+0x2c0/0x484
(XEN) [   12.207823]    [<ffff82d04020ca6f>] S do_event_channel_op+0xad1/0x11b0
(XEN) [   12.215019]    [<ffff82d0402eb7fc>] S pv_hypercall+0x55f/0x5dd
(XEN) [   12.221504]    [<ffff82d0402012cd>] S lstar_enter+0x13d/0x150
(XEN) [   12.227899] 
(XEN) [   12.229943] 
(XEN) [   12.231987] ****************************************
(XEN) [   12.237584] Panic on CPU 3:
(XEN) [   12.240960] Assertion '(msg->data & (entry[-nr].msi.nvec - 1)) == nr' 
failed at arch/x86/msi.c:210
(XEN) [   12.250644] ****************************************

I wonder if similar crash happens on resume with this patch.

BTW, the -nr in the above assert looks suspicious, is it trying to fine
the first entry?

> I wonder if we should unconditionally propagate the write from
> __setup_msi_irq(), as it's also unlikely to make any difference to
> skip that write, and would further keep the previous behavior.
> 
> Thanks, Roger.
> ---
> commit 1d9bfd0d45f6b547b19f0d2f752fc3bd10103971
> Author: Roger Pau Monne <roger.pau@xxxxxxxxxx>
> Date:   Mon Mar 17 15:40:11 2025 +0100
> 
>     x86/msi: always propagate MSI writes when not in active system mode
>     
>     Relax the limitation on MSI register writes, and only apply it when the
>     system is in active state.  For example AMD IOMMU drivers rely on using
>     set_msi_affinity() to force an MSI register write on resume from
>     suspension.
>     
>     The original patch intention was to reduce the number of MSI register
>     writes when the system is in active state.  Leave the other states to
>     always perform the writes, as it's safer given the existing code, and it's
>     expected to not make a difference performance wise.
>     
>     Reported-by: Marek Marczykowski-Górecki <marmarek@xxxxxxxxxxxxxxxxxxxxxx>
>     Fixes: ('8e60d47cf011 x86/iommu: avoid MSI address and data writes if IRT 
> index hasn't changed')
>     Signed-off-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>
> 
> diff --git a/xen/arch/x86/msi.c b/xen/arch/x86/msi.c
> index 163ccf874720..8bb3bb18af61 100644
> --- a/xen/arch/x86/msi.c
> +++ b/xen/arch/x86/msi.c
> @@ -189,6 +189,15 @@ static int write_msi_msg(struct msi_desc *entry, struct 
> msi_msg *msg,
>  {
>      entry->msg = *msg;
>  
> +    if ( unlikely(system_state != SYS_STATE_active) )
> +        /*
> +         * Always propagate writes when not in the 'active' state.  The
> +         * optimization to avoid the MSI address and data registers write is
> +         * only relevant for runtime state, and drivers on resume (at least)
> +         * rely on set_msi_affinity() to update the hardware state.
> +         */
> +        force = true;
> +
>      if ( iommu_intremap != iommu_intremap_off )
>      {
>          int rc;

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab

Attachment: signature.asc
Description: PGP signature


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.