[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: S3 regression on AMD in 4.20


  • To: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Tue, 18 Mar 2025 07:24:31 +0100
  • Autocrypt: addr=jbeulich@xxxxxxxx; keydata= xsDiBFk3nEQRBADAEaSw6zC/EJkiwGPXbWtPxl2xCdSoeepS07jW8UgcHNurfHvUzogEq5xk hu507c3BarVjyWCJOylMNR98Yd8VqD9UfmX0Hb8/BrA+Hl6/DB/eqGptrf4BSRwcZQM32aZK 7Pj2XbGWIUrZrd70x1eAP9QE3P79Y2oLrsCgbZJfEwCgvz9JjGmQqQkRiTVzlZVCJYcyGGsD /0tbFCzD2h20ahe8rC1gbb3K3qk+LpBtvjBu1RY9drYk0NymiGbJWZgab6t1jM7sk2vuf0Py O9Hf9XBmK0uE9IgMaiCpc32XV9oASz6UJebwkX+zF2jG5I1BfnO9g7KlotcA/v5ClMjgo6Gl MDY4HxoSRu3i1cqqSDtVlt+AOVBJBACrZcnHAUSuCXBPy0jOlBhxPqRWv6ND4c9PH1xjQ3NP nxJuMBS8rnNg22uyfAgmBKNLpLgAGVRMZGaGoJObGf72s6TeIqKJo/LtggAS9qAUiuKVnygo 3wjfkS9A3DRO+SpU7JqWdsveeIQyeyEJ/8PTowmSQLakF+3fote9ybzd880fSmFuIEJldWxp Y2ggPGpiZXVsaWNoQHN1c2UuY29tPsJgBBMRAgAgBQJZN5xEAhsDBgsJCAcDAgQVAggDBBYC AwECHgECF4AACgkQoDSui/t3IH4J+wCfQ5jHdEjCRHj23O/5ttg9r9OIruwAn3103WUITZee e7Sbg12UgcQ5lv7SzsFNBFk3nEQQCACCuTjCjFOUdi5Nm244F+78kLghRcin/awv+IrTcIWF hUpSs1Y91iQQ7KItirz5uwCPlwejSJDQJLIS+QtJHaXDXeV6NI0Uef1hP20+y8qydDiVkv6l IreXjTb7DvksRgJNvCkWtYnlS3mYvQ9NzS9PhyALWbXnH6sIJd2O9lKS1Mrfq+y0IXCP10eS FFGg+Av3IQeFatkJAyju0PPthyTqxSI4lZYuJVPknzgaeuJv/2NccrPvmeDg6Coe7ZIeQ8Yj t0ARxu2xytAkkLCel1Lz1WLmwLstV30g80nkgZf/wr+/BXJW/oIvRlonUkxv+IbBM3dX2OV8 AmRv1ySWPTP7AAMFB/9PQK/VtlNUJvg8GXj9ootzrteGfVZVVT4XBJkfwBcpC/XcPzldjv+3 HYudvpdNK3lLujXeA5fLOH+Z/G9WBc5pFVSMocI71I8bT8lIAzreg0WvkWg5V2WZsUMlnDL9 mpwIGFhlbM3gfDMs7MPMu8YQRFVdUvtSpaAs8OFfGQ0ia3LGZcjA6Ik2+xcqscEJzNH+qh8V m5jjp28yZgaqTaRbg3M/+MTbMpicpZuqF4rnB0AQD12/3BNWDR6bmh+EkYSMcEIpQmBM51qM EKYTQGybRCjpnKHGOxG0rfFY1085mBDZCH5Kx0cl0HVJuQKC+dV2ZY5AqjcKwAxpE75MLFkr wkkEGBECAAkFAlk3nEQCGwwACgkQoDSui/t3IH7nnwCfcJWUDUFKdCsBH/E5d+0ZnMQi+G0A nAuWpQkjM1ASeQwSHEeAWPgskBQL
  • Cc: Stefano Stabellini <sstabellini@xxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx, Doug Goldstein <cardoe@xxxxxxxxxx>, Oleksii Kurochko <oleksii.kurochko@xxxxxxxxx>, Mykola Kvach <xakep.amatop@xxxxxxxxx>, Mykyta Poturai <mykyta_poturai@xxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Marek Marczykowski-Górecki <marmarek@xxxxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Tue, 18 Mar 2025 06:24:53 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 17.03.2025 18:38, Roger Pau Monné wrote:
> On Mon, Mar 17, 2025 at 05:11:56PM +0100, Jan Beulich wrote:
>> On 17.03.2025 16:56, Roger Pau Monné wrote:
>>> On Sat, Mar 15, 2025 at 12:02:50AM +0000, Andrew Cooper wrote:
>>>> On 14/03/2025 11:53 pm, Marek Marczykowski-Górecki wrote:
>>>>> On Fri, Mar 14, 2025 at 11:23:28PM +0100, Marek Marczykowski-Górecki 
>>>>> wrote:
>>>>>> On Fri, Mar 14, 2025 at 02:19:19PM -0700, Stefano Stabellini wrote:
>>>>>>> On Fri, 14 Mar 2025, Marek Marczykowski-Górecki wrote:
>>>>>>>> This is AMD Zen2 (Ryzen 5 4500U specifically), in a HP Probook 445 G7.
>>>>>>>>
>>>>>>>> This one has working S3, so add a test for it here.
>>>>>>>>
>>>>>>>> Signed-off-by: Marek Marczykowski-Górecki 
>>>>>>>> <marmarek@xxxxxxxxxxxxxxxxxxxxxx>
>>>>>>>> ---
>>>>>>>> Cc: Jan Beulich <jbeulich@xxxxxxxx>
>>>>>>>> Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
>>>>>>>>
>>>>>>>> The suspend test added here currently fails on staging[1], but passes 
>>>>>>>> on
>>>>>>>> staging-4.19[2]. So the regression wants fixing before committing this
>>>>>>>> patch.
>>>>>>>>
>>>>>>>> [1] 
>>>>>>>> https://gitlab.com/xen-project/people/marmarek/xen/-/jobs/9408437140
>>>>>>>> [2] 
>>>>>>>> https://gitlab.com/xen-project/people/marmarek/xen/-/jobs/9408943441
>>>>>>> We could commit the patch now without the s3 test.
>>>>>>>
>>>>>>> I don't know what the x86 maintainers think about fixing the suspend
>>>>>>> bug, but one idea would be to run a bisection between 4.20 and 4.19.
>>>>>> I'm on it already, but it's annoying. Lets convert this thread to
>>>>>> discussion about the issue:
>>>>>>
>>>>>> So, I bisected it between staging-4.19 and master. The breakage is
>>>>>> somewhere between (inclusive):
>>>>>> eb21ce14d709 x86/boot: Rewrite EFI/MBI2 code partly in C
>>>>>> and
>>>>>> 47990ecef286 x86/boot: Improve MBI2 structure check
>>>>>>
>>>>>> But, the first one breaks booting on this system and it remains broken
>>>>>> until the second commit (or its parent) - at which point S3 is already
>>>>>> broken. So, there is a range of 71 commits that may be responsible...
>>>>>>
>>>>>> But then, based on a matrix chat and Jan's observation I've tried
>>>>>> reverting f75780d26b2f "xen: move per-cpu area management into common
>>>>>> code" just on top of 47990ecef286, and that fixed suspend.
>>>>>> Applying "xen/percpu: don't initialize percpu on resume" on top of
>>>>>> 47990ecef286 fixes suspend too.
>>>>>> But applying it on top of master
>>>>>> (91772f8420dfa2fcfe4db68480c216db5b79c512 specifically) does not fix it,
>>>>>> but the failure mode is different than without the patch - system resets
>>>>>> on S3 resume, with no crash message on the serial console (even with
>>>>>> sync_console), instead of hanging.
>>>>>> And one more data point: reverting f75780d26b2f on top of master is the
>>>>>> same as applying "xen/percpu: don't initialize percpu on resume" on
>>>>>> master - system reset on S3 resume.
>>>>>> So, it looks like there are more issues...
>>>>> Another bisection round and I have the second culprit:
>>>>>
>>>>>     8e60d47cf011 x86/iommu: avoid MSI address and data writes if IRT 
>>>>> index hasn't changed
>>>>>
>>>>> With master+"xen/percpu: don't initialize percpu on resume"+revert of
>>>>> 8e60d47cf011 suspend works again on this AMD system.
>>>>
>>>> That's not surprising in the slightest.
>>>>
>>>> Caching hardware values in Xen isn't safe across S3, which QubesOS has
>>>> found time and time again, and for which we still have outstanding bugs.
>>>>
>>>> S3 turns most of the system off.  RAM gets preserved, but devices and
>>>> plenty of internal registers don't.
>>>
>>> I think I've spotted the issue.  enable_iommu() called on resume
>>> (ab)uses set_msi_affinity() to force an MSI register write, as it's
>>> previous behavior was to unconditionally propagate the values.  With
>>> my change it would no longer perform such writes on resume.
>>>
>>> I think the patch below should help.
>>>
>>> I wonder if we should unconditionally propagate the write from
>>> __setup_msi_irq(), as it's also unlikely to make any difference to
>>> skip that write, and would further keep the previous behavior.
>>
>> That might be better. In fact I wonder whether it wouldn't better be
>> callers of write_msi_msg() to use ...
>>
>>> ---
>>> commit 1d9bfd0d45f6b547b19f0d2f752fc3bd10103971
>>> Author: Roger Pau Monne <roger.pau@xxxxxxxxxx>
>>> Date:   Mon Mar 17 15:40:11 2025 +0100
>>>
>>>     x86/msi: always propagate MSI writes when not in active system mode
>>>     
>>>     Relax the limitation on MSI register writes, and only apply it when the
>>>     system is in active state.  For example AMD IOMMU drivers rely on using
>>>     set_msi_affinity() to force an MSI register write on resume from
>>>     suspension.
>>>     
>>>     The original patch intention was to reduce the number of MSI register
>>>     writes when the system is in active state.  Leave the other states to
>>>     always perform the writes, as it's safer given the existing code, and 
>>> it's
>>>     expected to not make a difference performance wise.
>>>     
>>>     Reported-by: Marek Marczykowski-Górecki 
>>> <marmarek@xxxxxxxxxxxxxxxxxxxxxx>
>>>     Fixes: ('8e60d47cf011 x86/iommu: avoid MSI address and data writes if 
>>> IRT index hasn't changed')
>>>     Signed-off-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>
>>>
>>> --- a/xen/arch/x86/msi.c
>>> +++ b/xen/arch/x86/msi.c
>>> @@ -189,6 +189,15 @@ static int write_msi_msg(struct msi_desc *entry, 
>>> struct msi_msg *msg,
>>>  {
>>>      entry->msg = *msg;
>>>  
>>> +    if ( unlikely(system_state != SYS_STATE_active) )
>>
>> ... this condition as needed. Conceivably there might be cases where even
>> during resume writes aren't always necessary to propagate to hardware.
> 
> Isn't this going to be quite more cumbersome, and I don't think the
> cases outside of active mode are very interesting or relevant
> performance wise?

Certainly. As to cumbersome - having the conditions at the call sites
would also serve as kind of documentation. That said, this was just a
suggestion; I'm not going to insist. I see from other traffic on this
thread the updated patch is now fully addressing the regression. The
secondary change, if not broken out to a separate patch, would want
mentioning in the description, though (which may have been the plan
anyway).

Jan

> The main purpose of the original change was the reduce the number of
> PCI accesses during 'active' state, as every time an interrupt is
> migrated to a different CPU such a (possibly redundant) PCI config
> write would happen.
> 
> Thanks, Roger.




 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.