
Re: S3 regression on AMD in 4.20 (was: Re: [PATCH] ci: add yet another HW runner)



On Fri, Mar 14, 2025 at 11:23:28PM +0100, Marek Marczykowski-Górecki wrote:
> On Fri, Mar 14, 2025 at 02:19:19PM -0700, Stefano Stabellini wrote:
> > On Fri, 14 Mar 2025, Marek Marczykowski-Górecki wrote:
> > > This is AMD Zen2 (Ryzen 5 4500U specifically), in a HP Probook 445 G7.
> > > 
> > > This one has working S3, so add a test for it here.
> > > 
> > > Signed-off-by: Marek Marczykowski-Górecki <marmarek@xxxxxxxxxxxxxxxxxxxxxx>
> > > ---
> > > Cc: Jan Beulich <jbeulich@xxxxxxxx>
> > > Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
> > > 
> > > The suspend test added here currently fails on staging[1], but passes on
> > > staging-4.19[2]. So the regression wants fixing before committing this
> > > patch.
> > >
> > > [1] https://gitlab.com/xen-project/people/marmarek/xen/-/jobs/9408437140
> > > [2] https://gitlab.com/xen-project/people/marmarek/xen/-/jobs/9408943441
> > 
> > We could commit the patch now without the s3 test.
> > 
> > I don't know what the x86 maintainers think about fixing the suspend
> > bug, but one idea would be to run a bisection between 4.20 and 4.19.
> 
> I'm on it already, but it's annoying. Let's convert this thread to a
> discussion about the issue:
> 
> So, I bisected it between staging-4.19 and master. The breakage is
> somewhere between (inclusive):
> eb21ce14d709 x86/boot: Rewrite EFI/MBI2 code partly in C
> and
> 47990ecef286 x86/boot: Improve MBI2 structure check
> 
> But, the first one breaks booting on this system and it remains broken
> until the second commit (or its parent) - at which point S3 is already
> broken. So, there is a range of 71 commits that may be responsible...
> 
> But then, based on a Matrix chat and Jan's observation, I've tried
> reverting f75780d26b2f "xen: move per-cpu area management into common
> code" just on top of 47990ecef286, and that fixed suspend.
> Applying "xen/percpu: don't initialize percpu on resume" on top of
> 47990ecef286 fixes suspend too.
> But applying it on top of master
> (91772f8420dfa2fcfe4db68480c216db5b79c512 specifically) does not fix it,
> but the failure mode is different than without the patch - system resets
> on S3 resume, with no crash message on the serial console (even with
> sync_console), instead of hanging.
> And one more data point: reverting f75780d26b2f on top of master is the
> same as applying "xen/percpu: don't initialize percpu on resume" on
> master - system reset on S3 resume.
> So, it looks like there are more issues...

Another bisection round and I have the second culprit:

    8e60d47cf011 x86/iommu: avoid MSI address and data writes if IRT index hasn't changed

With master + "xen/percpu: don't initialize percpu on resume" + revert of
8e60d47cf011, suspend works again on this AMD system.
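
As a side note on why the first bisection quoted above ended in a range
rather than a single commit: commits that don't boot can't be judged for
S3 at all, so they all stay candidates. A rough Python model of that,
not what git bisect does internally. The commit names and verdicts below
are made up for illustration, and it assumes the breakage is monotonic
(once S3 is bad, it stays bad):

```python
from typing import List

GOOD, BAD, SKIP = "good", "bad", "skip"


def candidate_range(commits: List[str], verdicts: List[str]) -> List[str]:
    """Return the commits that may have introduced the breakage.

    commits are in history order (oldest first); verdicts[i] is the S3
    test result for commits[i]: GOOD, BAD, or SKIP (untestable, e.g.
    the build does not boot).
    """
    # Newest commit known to suspend/resume correctly.
    last_good = max(i for i, v in enumerate(verdicts) if v == GOOD)
    # Oldest commit after it that is known to be broken.
    first_bad = min(
        i for i, v in enumerate(verdicts) if v == BAD and i > last_good
    )
    # Everything after the last good commit, up to and including the
    # first known-bad one, could be the culprit.
    return commits[last_good + 1 : first_bad + 1]


# Hypothetical history: c2..c4 do not boot, so they cannot be tested.
commits = ["c0", "c1", "c2", "c3", "c4", "c5", "c6"]
verdicts = [GOOD, GOOD, SKIP, SKIP, SKIP, BAD, BAD]
suspects = candidate_range(commits, verdicts)
print(suspects)
```

Here the untestable commits plus the first testable bad one all remain
suspects, which is the same shape as the eb21ce14d709..47990ecef286
range above.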

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab

