Re: domU suspend issue - freeze processes failed - Linux 6.16
- To: Marek Marczykowski-Górecki <marmarek@xxxxxxxxxxxxxxxxxxxxxx>, xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>
- From: Jürgen Groß <jgross@xxxxxxxx>
- Date: Fri, 22 Aug 2025 17:27:20 +0200
- Cc: Stefano Stabellini <sstabellini@xxxxxxxxxx>, Oleksandr Tyshchenko <oleksandr_tyshchenko@xxxxxxxx>, Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>
- Delivery-date: Fri, 22 Aug 2025 15:27:36 +0000
- List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
On 22.08.25 16:42, Marek Marczykowski-Górecki wrote:
> On Fri, Aug 22, 2025 at 04:39:33PM +0200, Marek Marczykowski-Górecki wrote:
> > Hi,
> >
> > When suspending a domU, I get the following issue:
> > Freezing user space processes
> > Freezing user space processes failed after 20.004 seconds (1 tasks
> > refusing to freeze, wq_busy=0):
> > task:xl state:D stack:0 pid:466 tgid:466 ppid:1
> > task_flags:0x400040 flags:0x00004006
> > Call Trace:
> > <TASK>
> > __schedule+0x2f3/0x780
> > schedule+0x27/0x80
> > schedule_preempt_disabled+0x15/0x30
> > __mutex_lock.constprop.0+0x49f/0x880
> > unregister_xenbus_watch+0x216/0x230
> > xenbus_write_watch+0xb9/0x220
> > xenbus_file_write+0x131/0x1b0
> > vfs_writev+0x26c/0x3d0
> > ? do_writev+0xeb/0x110
> > do_writev+0xeb/0x110
> > do_syscall_64+0x84/0x2c0
> > ? do_syscall_64+0x200/0x2c0
> > ? generic_handle_irq+0x3f/0x60
> > ? syscall_exit_work+0x108/0x140
> > ? do_syscall_64+0x200/0x2c0
> > ? __irq_exit_rcu+0x4c/0xe0
> > entry_SYSCALL_64_after_hwframe+0x76/0x7e
> > RIP: 0033:0x79b618138642
> > RSP: 002b:00007fff9a192fc8 EFLAGS: 00000246 ORIG_RAX: 0000000000000014
> > RAX: ffffffffffffffda RBX: 00000000024fd490 RCX: 000079b618138642
> > RDX: 0000000000000003 RSI: 00007fff9a193120 RDI: 0000000000000014
> > RBP: 00007fff9a193000 R08: 0000000000000000 R09: 0000000000000000
> > R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000014
> > R13: 00007fff9a193120 R14: 0000000000000003 R15: 0000000000000000
> > </TASK>
> > OOM killer enabled.
> > Restarting tasks: Starting
> > Restarting tasks: Done
> > xen:manage: do_suspend: freeze processes failed -16
> >
> > The process in question is the `xl devd` daemon. This domU is serving a
> > xenvif backend.
> >
> > I noticed it on 6.16.1, but earlier test logs show it already with
> > 6.16-rc6 (interestingly, not yet with 6.16-rc2, which feels odd given that
> > there seem to be no relevant changes between rc2 and rc6).
>
> I forgot to include a link with (a little) more details:
> https://github.com/QubesOS/qubes-linux-kernel/pull/1157
> In particular, there is another call trace, taken with panic_on_warn
> enabled; it is slightly different, but looks related.
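The `-16` reported by `do_suspend` is `-EBUSY`: the freezer gives up once a task has not frozen within the freeze timeout (20 seconds by default, which matches "failed after 20.004 seconds"), and the trace shows `xl` sleeping in state D on a mutex inside `unregister_xenbus_watch()`. The Python snippet below is only a toy model of that outcome, not kernel code. `Task`, `freeze_processes_model()`, `thaw_model()`, the 8 ms polling step and the `sshd` task are hypothetical, and the freezing rule is deliberately simplified to "a task blocked uninterruptibly on a mutex cannot freeze".

```python
import errno
from dataclasses import dataclass

# Toy model only: these names and rules are hypothetical simplifications,
# not the kernel freezer's implementation.
FREEZE_TIMEOUT_MS = 20_000  # assumed to mirror the default 20 s freeze timeout
POLL_STEP_MS = 8            # hypothetical polling interval for the model


@dataclass
class Task:
    name: str
    # True models a task sleeping uninterruptibly (state D) on a mutex;
    # in this model such a task can never be frozen.
    blocked_on_mutex: bool = False
    frozen: bool = False


def freeze_processes_model(tasks, timeout_ms=FREEZE_TIMEOUT_MS):
    """Return (0, elapsed_ms) on success or (-EBUSY, elapsed_ms) on timeout."""
    elapsed = 0
    while True:
        for t in tasks:
            # Tasks that are not stuck freeze as soon as they are asked to.
            if not t.blocked_on_mutex:
                t.frozen = True
        if all(t.frozen for t in tasks):
            return 0, elapsed
        if elapsed >= timeout_ms:
            return -errno.EBUSY, elapsed
        elapsed += POLL_STEP_MS


def thaw_model(tasks):
    # On failure, tasks that did freeze are thawed again ("Restarting tasks").
    for t in tasks:
        t.frozen = False


tasks = [Task("sshd"), Task("xl", blocked_on_mutex=True)]
ret, elapsed = freeze_processes_model(tasks)
if ret:
    stuck = [t.name for t in tasks if not t.frozen]
    print(f"freeze processes failed {ret} after {elapsed} ms, stuck: {stuck}")
    thaw_model(tasks)
```

In this model, waiting longer cannot help: the stuck task only becomes freezable once whoever holds the mutex it waits on releases it.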
I'm pretty sure the PV variant for suspending is just wrong: it is calling
dpm_suspend_start() from do_suspend() without taking the required
system_transition_mutex, resulting in the WARN() in pm_restrict_gfp_mask().

It might be as easy as taking system_transition_mutex in do_suspend(), but
I'm really not sure that would be a proper fix.
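The diagnosis above can be illustrated with a small Python model, assuming (as described) that the PV do_suspend() path reaches pm_restrict_gfp_mask() via dpm_suspend_start() without holding system_transition_mutex, and that the WARN() is a "mutex is locked" check. `threading.Lock.locked()` stands in for the kernel's `mutex_is_locked()`; every function ending in `_model`, the `take_lock` parameter and `warn_log` are hypothetical.

```python
import threading

# Hypothetical user-space stand-ins; not the kernel implementation.
system_transition_mutex = threading.Lock()
warn_log = []


def pm_restrict_gfp_mask_model():
    # Models the WARN(): complain if the transition mutex is not held.
    # Like mutex_is_locked(), Lock.locked() does not check which thread owns it.
    if not system_transition_mutex.locked():
        warn_log.append("pm_restrict_gfp_mask: system_transition_mutex not held")


def dpm_suspend_start_model():
    # Call chain as described in the thread: dpm_suspend_start() ends up in
    # pm_restrict_gfp_mask().
    pm_restrict_gfp_mask_model()


def do_suspend_model(take_lock):
    if take_lock:
        # Candidate fix from the thread: hold the mutex around the call.
        with system_transition_mutex:
            dpm_suspend_start_model()
    else:
        # PV path as described: dpm_suspend_start() without the mutex.
        dpm_suspend_start_model()


do_suspend_model(take_lock=False)  # records one warning
do_suspend_model(take_lock=True)   # records nothing
print(len(warn_log))
```

The model only shows why holding the mutex silences the warning; it says nothing about lock ordering or whether taking the mutex in that context is safe, which is why the fix is only a suggestion.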
Juergen