Re: Linux: balloon_process() causing workqueue lockups?
On 27.08.2021 11:29, Juergen Gross wrote:
> On 27.08.21 11:01, Jan Beulich wrote:
>> ballooning down Dom0 by about 16G in one go once in a while causes:
>>
>> BUG: workqueue lockup - pool cpus=6 node=0 flags=0x0 nice=0 stuck for 64s!
>> Showing busy workqueues and worker pools:
>> workqueue events: flags=0x0
>>   pwq 12: cpus=6 node=0 flags=0x0 nice=0 active=2/256 refcnt=3
>>     in-flight: 229:balloon_process
>>     pending: cache_reap
>> workqueue events_freezable_power_: flags=0x84
>>   pwq 12: cpus=6 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
>>     pending: disk_events_workfn
>> workqueue mm_percpu_wq: flags=0x8
>>   pwq 12: cpus=6 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
>>     pending: vmstat_update
>> pool 12: cpus=6 node=0 flags=0x0 nice=0 hung=64s workers=3 idle: 2222 43
>>
>> I've tried to double check that this isn't related to my IOMMU work
>> in the hypervisor, and I'm pretty sure it isn't. Looking at the
>> function I see it has a cond_resched(), but aiui this won't help
>> with further items in the same workqueue.
>>
>> Thoughts?
>
> I'm seeing two possible solutions here:
>
> 1. After some time (1 second?) in balloon_process() set up a new
>    workqueue activity and return (similar to EAGAIN, but without
>    increasing the delay).
>
> 2. Don't use a workqueue for the ballooning activity, use a kernel
>    thread instead.
>
> I have a slight preference for 2, even if the resulting patch will
> be larger. 1 is only working around the issue and it is hard to
> find a really good timeout value.
>
> I'd be fine to write a patch, but would prefer some feedback which
> way to go.

Was there a particular reason that a workqueue was used in the first
place? Otherwise using a kernel thread would look like the way to go,
indeed. The presence of cond_resched() kind of indicates such an
intention already anyway.

Jan