Xen project Mailing List

Re: S3 regression on AMD in 4.20

To: Roger Pau Monné <roger.pau@xxxxxxxxxx>

Date: Tue, 18 Mar 2025 07:24:31 +0100

Autocrypt: addr=jbeulich@xxxxxxxx; keydata= xsDiBFk3nEQRBADAEaSw6zC/EJkiwGPXbWtPxl2xCdSoeepS07jW8UgcHNurfHvUzogEq5xk hu507c3BarVjyWCJOylMNR98Yd8VqD9UfmX0Hb8/BrA+Hl6/DB/eqGptrf4BSRwcZQM32aZK 7Pj2XbGWIUrZrd70x1eAP9QE3P79Y2oLrsCgbZJfEwCgvz9JjGmQqQkRiTVzlZVCJYcyGGsD /0tbFCzD2h20ahe8rC1gbb3K3qk+LpBtvjBu1RY9drYk0NymiGbJWZgab6t1jM7sk2vuf0Py O9Hf9XBmK0uE9IgMaiCpc32XV9oASz6UJebwkX+zF2jG5I1BfnO9g7KlotcA/v5ClMjgo6Gl MDY4HxoSRu3i1cqqSDtVlt+AOVBJBACrZcnHAUSuCXBPy0jOlBhxPqRWv6ND4c9PH1xjQ3NP nxJuMBS8rnNg22uyfAgmBKNLpLgAGVRMZGaGoJObGf72s6TeIqKJo/LtggAS9qAUiuKVnygo 3wjfkS9A3DRO+SpU7JqWdsveeIQyeyEJ/8PTowmSQLakF+3fote9ybzd880fSmFuIEJldWxp Y2ggPGpiZXVsaWNoQHN1c2UuY29tPsJgBBMRAgAgBQJZN5xEAhsDBgsJCAcDAgQVAggDBBYC AwECHgECF4AACgkQoDSui/t3IH4J+wCfQ5jHdEjCRHj23O/5ttg9r9OIruwAn3103WUITZee e7Sbg12UgcQ5lv7SzsFNBFk3nEQQCACCuTjCjFOUdi5Nm244F+78kLghRcin/awv+IrTcIWF hUpSs1Y91iQQ7KItirz5uwCPlwejSJDQJLIS+QtJHaXDXeV6NI0Uef1hP20+y8qydDiVkv6l IreXjTb7DvksRgJNvCkWtYnlS3mYvQ9NzS9PhyALWbXnH6sIJd2O9lKS1Mrfq+y0IXCP10eS FFGg+Av3IQeFatkJAyju0PPthyTqxSI4lZYuJVPknzgaeuJv/2NccrPvmeDg6Coe7ZIeQ8Yj t0ARxu2xytAkkLCel1Lz1WLmwLstV30g80nkgZf/wr+/BXJW/oIvRlonUkxv+IbBM3dX2OV8 AmRv1ySWPTP7AAMFB/9PQK/VtlNUJvg8GXj9ootzrteGfVZVVT4XBJkfwBcpC/XcPzldjv+3 HYudvpdNK3lLujXeA5fLOH+Z/G9WBc5pFVSMocI71I8bT8lIAzreg0WvkWg5V2WZsUMlnDL9 mpwIGFhlbM3gfDMs7MPMu8YQRFVdUvtSpaAs8OFfGQ0ia3LGZcjA6Ik2+xcqscEJzNH+qh8V m5jjp28yZgaqTaRbg3M/+MTbMpicpZuqF4rnB0AQD12/3BNWDR6bmh+EkYSMcEIpQmBM51qM EKYTQGybRCjpnKHGOxG0rfFY1085mBDZCH5Kx0cl0HVJuQKC+dV2ZY5AqjcKwAxpE75MLFkr wkkEGBECAAkFAlk3nEQCGwwACgkQoDSui/t3IH7nnwCfcJWUDUFKdCsBH/E5d+0ZnMQi+G0A nAuWpQkjM1ASeQwSHEeAWPgskBQL

Cc: Stefano Stabellini <sstabellini@xxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx, Doug Goldstein <cardoe@xxxxxxxxxx>, Oleksii Kurochko <oleksii.kurochko@xxxxxxxxx>, Mykola Kvach <xakep.amatop@xxxxxxxxx>, Mykyta Poturai <mykyta_poturai@xxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Marek Marczykowski-Górecki <marmarek@xxxxxxxxxxxxxxxxxxxxxx>

Delivery-date: Tue, 18 Mar 2025 06:24:53 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 17.03.2025 18:38, Roger Pau Monné wrote: > On Mon, Mar 17, 2025 at 05:11:56PM +0100, Jan Beulich wrote: >> On 17.03.2025 16:56, Roger Pau Monné wrote: >>> On Sat, Mar 15, 2025 at 12:02:50AM +0000, Andrew Cooper wrote: >>>> On 14/03/2025 11:53 pm, Marek Marczykowski-Górecki wrote: >>>>> On Fri, Mar 14, 2025 at 11:23:28PM +0100, Marek Marczykowski-Górecki >>>>> wrote: >>>>>> On Fri, Mar 14, 2025 at 02:19:19PM -0700, Stefano Stabellini wrote: >>>>>>> On Fri, 14 Mar 2025, Marek Marczykowski-Górecki wrote: >>>>>>>> This is AMD Zen2 (Ryzen 5 4500U specifically), in a HP Probook 445 G7. >>>>>>>> >>>>>>>> This one has working S3, so add a test for it here. >>>>>>>> >>>>>>>> Signed-off-by: Marek Marczykowski-Górecki >>>>>>>> <marmarek@xxxxxxxxxxxxxxxxxxxxxx> >>>>>>>> --- >>>>>>>> Cc: Jan Beulich <jbeulich@xxxxxxxx> >>>>>>>> Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx> >>>>>>>> >>>>>>>> The suspend test added here currently fails on staging[1], but passes >>>>>>>> on >>>>>>>> staging-4.19[2]. So the regression wants fixing before committing this >>>>>>>> patch. >>>>>>>> >>>>>>>> [1] >>>>>>>> https://gitlab.com/xen-project/people/marmarek/xen/-/jobs/9408437140 >>>>>>>> [2] >>>>>>>> https://gitlab.com/xen-project/people/marmarek/xen/-/jobs/9408943441 >>>>>>> We could commit the patch now without the s3 test. >>>>>>> >>>>>>> I don't know what the x86 maintainers think about fixing the suspend >>>>>>> bug, but one idea would be to run a bisection between 4.20 and 4.19. >>>>>> I'm on it already, but it's annoying. Lets convert this thread to >>>>>> discussion about the issue: >>>>>> >>>>>> So, I bisected it between staging-4.19 and master. The breakage is >>>>>> somewhere between (inclusive): >>>>>> eb21ce14d709 x86/boot: Rewrite EFI/MBI2 code partly in C >>>>>> and >>>>>> 47990ecef286 x86/boot: Improve MBI2 structure check >>>>>> >>>>>> But, the first one breaks booting on this system and it remains broken >>>>>> until the second commit (or its parent) - at which point S3 is already >>>>>> broken. So, there is a range of 71 commits that may be responsible... >>>>>> >>>>>> But then, based on a matrix chat and Jan's observation I've tried >>>>>> reverting f75780d26b2f "xen: move per-cpu area management into common >>>>>> code" just on top of 47990ecef286, and that fixed suspend. >>>>>> Applying "xen/percpu: don't initialize percpu on resume" on top of >>>>>> 47990ecef286 fixes suspend too. >>>>>> But applying it on top of master >>>>>> (91772f8420dfa2fcfe4db68480c216db5b79c512 specifically) does not fix it, >>>>>> but the failure mode is different than without the patch - system resets >>>>>> on S3 resume, with no crash message on the serial console (even with >>>>>> sync_console), instead of hanging. >>>>>> And one more data point: reverting f75780d26b2f on top of master is the >>>>>> same as applying "xen/percpu: don't initialize percpu on resume" on >>>>>> master - system reset on S3 resume. >>>>>> So, it looks like there are more issues... >>>>> Another bisection round and I have the second culprit: >>>>> >>>>> 8e60d47cf011 x86/iommu: avoid MSI address and data writes if IRT >>>>> index hasn't changed >>>>> >>>>> With master+"xen/percpu: don't initialize percpu on resume"+revert of >>>>> 8e60d47cf011 suspend works again on this AMD system. >>>> >>>> That's not surprising in the slightest. >>>> >>>> Caching hardware values in Xen isn't safe across S3, which QubesOS has >>>> found time and time again, and for which we still have outstanding bugs. >>>> >>>> S3 turns most of the system off. RAM gets preserved, but devices and >>>> plenty of internal registers don't. >>> >>> I think I've spotted the issue. enable_iommu() called on resume >>> (ab)uses set_msi_affinity() to force an MSI register write, as it's >>> previous behavior was to unconditionally propagate the values. With >>> my change it would no longer perform such writes on resume. >>> >>> I think the patch below should help. >>> >>> I wonder if we should unconditionally propagate the write from >>> __setup_msi_irq(), as it's also unlikely to make any difference to >>> skip that write, and would further keep the previous behavior. >> >> That might be better. In fact I wonder whether it wouldn't better be >> callers of write_msi_msg() to use ... >> >>> --- >>> commit 1d9bfd0d45f6b547b19f0d2f752fc3bd10103971 >>> Author: Roger Pau Monne <roger.pau@xxxxxxxxxx> >>> Date: Mon Mar 17 15:40:11 2025 +0100 >>> >>> x86/msi: always propagate MSI writes when not in active system mode >>> >>> Relax the limitation on MSI register writes, and only apply it when the >>> system is in active state. For example AMD IOMMU drivers rely on using >>> set_msi_affinity() to force an MSI register write on resume from >>> suspension. >>> >>> The original patch intention was to reduce the number of MSI register >>> writes when the system is in active state. Leave the other states to >>> always perform the writes, as it's safer given the existing code, and >>> it's >>> expected to not make a difference performance wise. >>> >>> Reported-by: Marek Marczykowski-Górecki >>> <marmarek@xxxxxxxxxxxxxxxxxxxxxx> >>> Fixes: ('8e60d47cf011 x86/iommu: avoid MSI address and data writes if >>> IRT index hasn't changed') >>> Signed-off-by: Roger Pau Monné <roger.pau@xxxxxxxxxx> >>> >>> --- a/xen/arch/x86/msi.c >>> +++ b/xen/arch/x86/msi.c >>> @@ -189,6 +189,15 @@ static int write_msi_msg(struct msi_desc *entry, >>> struct msi_msg *msg, >>> { >>> entry->msg = *msg; >>> >>> + if ( unlikely(system_state != SYS_STATE_active) ) >> >> ... this condition as needed. Conceivably there might be cases where even >> during resume writes aren't always necessary to propagate to hardware. > > Isn't this going to be quite more cumbersome, and I don't think the > cases outside of active mode are very interesting or relevant > performance wise? Certainly. As to cumbersome - having the conditions at the call sites would also serve as kind of documentation. That said, this was just a suggestion; I'm not going to insist. I see from other traffic on this thread the updated patch is now fully addressing the regression. The secondary change, if not broken out to a separate patch, would want mentioning in the description, though (which may have been the plan anyway). Jan > The main purpose of the original change was the reduce the number of > PCI accesses during 'active' state, as every time an interrupt is > migrated to a different CPU such a (possibly redundant) PCI config > write would happen. > > Thanks, Roger.

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.