[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: S3 regression on AMD in 4.20



On Tue, Mar 18, 2025 at 07:24:31AM +0100, Jan Beulich wrote:
> On 17.03.2025 18:38, Roger Pau Monné wrote:
> > On Mon, Mar 17, 2025 at 05:11:56PM +0100, Jan Beulich wrote:
> >> On 17.03.2025 16:56, Roger Pau Monné wrote:
> >>> On Sat, Mar 15, 2025 at 12:02:50AM +0000, Andrew Cooper wrote:
> >>>> On 14/03/2025 11:53 pm, Marek Marczykowski-Górecki wrote:
> >>>>> On Fri, Mar 14, 2025 at 11:23:28PM +0100, Marek Marczykowski-Górecki 
> >>>>> wrote:
> >>>>>> On Fri, Mar 14, 2025 at 02:19:19PM -0700, Stefano Stabellini wrote:
> >>>>>>> On Fri, 14 Mar 2025, Marek Marczykowski-Górecki wrote:
> >>>>>>>> This is AMD Zen2 (Ryzen 5 4500U specifically), in a HP Probook 445 
> >>>>>>>> G7.
> >>>>>>>>
> >>>>>>>> This one has working S3, so add a test for it here.
> >>>>>>>>
> >>>>>>>> Signed-off-by: Marek Marczykowski-Górecki 
> >>>>>>>> <marmarek@xxxxxxxxxxxxxxxxxxxxxx>
> >>>>>>>> ---
> >>>>>>>> Cc: Jan Beulich <jbeulich@xxxxxxxx>
> >>>>>>>> Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
> >>>>>>>>
> >>>>>>>> The suspend test added here currently fails on staging[1], but 
> >>>>>>>> passes on
> >>>>>>>> staging-4.19[2]. So the regression wants fixing before committing 
> >>>>>>>> this
> >>>>>>>> patch.
> >>>>>>>>
> >>>>>>>> [1] 
> >>>>>>>> https://gitlab.com/xen-project/people/marmarek/xen/-/jobs/9408437140
> >>>>>>>> [2] 
> >>>>>>>> https://gitlab.com/xen-project/people/marmarek/xen/-/jobs/9408943441
> >>>>>>> We could commit the patch now without the s3 test.
> >>>>>>>
> >>>>>>> I don't know what the x86 maintainers think about fixing the suspend
> >>>>>>> bug, but one idea would be to run a bisection between 4.20 and 4.19.
> >>>>>> I'm on it already, but it's annoying. Lets convert this thread to
> >>>>>> discussion about the issue:
> >>>>>>
> >>>>>> So, I bisected it between staging-4.19 and master. The breakage is
> >>>>>> somewhere between (inclusive):
> >>>>>> eb21ce14d709 x86/boot: Rewrite EFI/MBI2 code partly in C
> >>>>>> and
> >>>>>> 47990ecef286 x86/boot: Improve MBI2 structure check
> >>>>>>
> >>>>>> But, the first one breaks booting on this system and it remains broken
> >>>>>> until the second commit (or its parent) - at which point S3 is already
> >>>>>> broken. So, there is a range of 71 commits that may be responsible...
> >>>>>>
> >>>>>> But then, based on a matrix chat and Jan's observation I've tried
> >>>>>> reverting f75780d26b2f "xen: move per-cpu area management into common
> >>>>>> code" just on top of 47990ecef286, and that fixed suspend.
> >>>>>> Applying "xen/percpu: don't initialize percpu on resume" on top of
> >>>>>> 47990ecef286 fixes suspend too.
> >>>>>> But applying it on top of master
> >>>>>> (91772f8420dfa2fcfe4db68480c216db5b79c512 specifically) does not fix 
> >>>>>> it,
> >>>>>> but the failure mode is different than without the patch - system 
> >>>>>> resets
> >>>>>> on S3 resume, with no crash message on the serial console (even with
> >>>>>> sync_console), instead of hanging.
> >>>>>> And one more data point: reverting f75780d26b2f on top of master is the
> >>>>>> same as applying "xen/percpu: don't initialize percpu on resume" on
> >>>>>> master - system reset on S3 resume.
> >>>>>> So, it looks like there are more issues...
> >>>>> Another bisection round and I have the second culprit:
> >>>>>
> >>>>>     8e60d47cf011 x86/iommu: avoid MSI address and data writes if IRT 
> >>>>> index hasn't changed
> >>>>>
> >>>>> With master+"xen/percpu: don't initialize percpu on resume"+revert of
> >>>>> 8e60d47cf011 suspend works again on this AMD system.
> >>>>
> >>>> That's not surprising in the slightest.
> >>>>
> >>>> Caching hardware values in Xen isn't safe across S3, which QubesOS has
> >>>> found time and time again, and for which we still have outstanding bugs.
> >>>>
> >>>> S3 turns most of the system off.  RAM gets preserved, but devices and
> >>>> plenty of internal registers don't.
> >>>
> >>> I think I've spotted the issue.  enable_iommu() called on resume
> >>> (ab)uses set_msi_affinity() to force an MSI register write, as it's
> >>> previous behavior was to unconditionally propagate the values.  With
> >>> my change it would no longer perform such writes on resume.
> >>>
> >>> I think the patch below should help.
> >>>
> >>> I wonder if we should unconditionally propagate the write from
> >>> __setup_msi_irq(), as it's also unlikely to make any difference to
> >>> skip that write, and would further keep the previous behavior.
> >>
> >> That might be better. In fact I wonder whether it wouldn't better be
> >> callers of write_msi_msg() to use ...
> >>
> >>> ---
> >>> commit 1d9bfd0d45f6b547b19f0d2f752fc3bd10103971
> >>> Author: Roger Pau Monne <roger.pau@xxxxxxxxxx>
> >>> Date:   Mon Mar 17 15:40:11 2025 +0100
> >>>
> >>>     x86/msi: always propagate MSI writes when not in active system mode
> >>>     
> >>>     Relax the limitation on MSI register writes, and only apply it when 
> >>> the
> >>>     system is in active state.  For example AMD IOMMU drivers rely on 
> >>> using
> >>>     set_msi_affinity() to force an MSI register write on resume from
> >>>     suspension.
> >>>     
> >>>     The original patch intention was to reduce the number of MSI register
> >>>     writes when the system is in active state.  Leave the other states to
> >>>     always perform the writes, as it's safer given the existing code, and 
> >>> it's
> >>>     expected to not make a difference performance wise.
> >>>     
> >>>     Reported-by: Marek Marczykowski-Górecki 
> >>> <marmarek@xxxxxxxxxxxxxxxxxxxxxx>
> >>>     Fixes: ('8e60d47cf011 x86/iommu: avoid MSI address and data writes if 
> >>> IRT index hasn't changed')
> >>>     Signed-off-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>
> >>>
> >>> --- a/xen/arch/x86/msi.c
> >>> +++ b/xen/arch/x86/msi.c
> >>> @@ -189,6 +189,15 @@ static int write_msi_msg(struct msi_desc *entry, 
> >>> struct msi_msg *msg,
> >>>  {
> >>>      entry->msg = *msg;
> >>>  
> >>> +    if ( unlikely(system_state != SYS_STATE_active) )
> >>
> >> ... this condition as needed. Conceivably there might be cases where even
> >> during resume writes aren't always necessary to propagate to hardware.
> > 
> > Isn't this going to be quite more cumbersome, and I don't think the
> > cases outside of active mode are very interesting or relevant
> > performance wise?
> 
> Certainly. As to cumbersome - having the conditions at the call sites
> would also serve as kind of documentation. That said, this was just a
> suggestion; I'm not going to insist. I see from other traffic on this
> thread the updated patch is now fully addressing the regression. The
> secondary change, if not broken out to a separate patch, would want
> mentioning in the description, though (which may have been the plan
> anyway).

Indeed, it's already mentioned in the description.  Let me formally
send the patch.

Thanks, Roger.



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.