Re: Recent upgrade of 4.13 -> 4.14 issue
On 15.12.2020 20:08, Liwei wrote:
> Hi list,
> This is a reply to the thread of the same title (linked here:
> https://www.mail-archive.com/xen-devel@xxxxxxxxxxxxxxxxxxxx/msg84916.html
> ) which I could not reply to because I receive this list by digest.
>
> I'm unclear if this is exactly the reason, but I experienced the
> same symptoms when upgrading to 4.14. The issue does not occur if I
> downgrade to 4.11 (the previous version that was provided by Debian).
> Kernel is 5.9.11 and unchanged between xen versions.
>
> One thing I noticed is that if I disable the monitor/mwait
> instructions on my CPU (Intel Xeon E5-2699 v4 ES), the stalls seem to
> occur later into the boot. With the instructions enabled, the system
> usually stalls less than a few minutes after boot; disabled, it can
> last for tens of minutes.
>
> Further disabling the HPET or forcing the kernel to use PIT causes
> it to be somewhat usable. The stalls still occur tens of minutes in
> but somehow everything seems to continue chugging along fine?

By "the kernel" do you really mean the kernel, or Xen?

> I've also verified that the stalls do not occur in all the above
> cases if I just boot into the kernel without xen.
>
> When the stalls happen, I get the "rcu: INFO: rcu_sched detected
> stalls on CPUs/tasks" backtraces printed on the console periodically,
> but keystrokes don't do anything on the console, and I can't spawn new
> SSH sessions even though pinging the system produces a reply. The last
> item in the call trace is usually "xen_safe_halt", but I've seen it
> occur for other functions related to btrfs and the network adapter as
> well.

The kernel log may not be the only relevant thing here - the hypervisor
log may also need looking at (with full verbosity enabled and preferably
a debug build in use).

> Do let me know if there's anything I can provide to help
> troubleshoot this. At the moment I've reverted to 4.11, but I can
> temporarily switch over to 4.14 to collect any necessary information.

In that earlier thread a number of things to try were suggested, iirc
(switching scheduler or disabling use of deep C states come to mind).
Did you experiment with those? If so, can you let us know of the
results, so we can see whether there's a pattern?

Jan
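For illustration only: assuming a Debian-style setup, where GRUB's Xen menu entries take hypervisor options from `GRUB_CMDLINE_XEN_DEFAULT` in `/etc/default/grub`, the logging, scheduler and C-state suggestions might be expressed as Xen command-line options roughly as follows. The variable name and file location are assumptions about the distribution; the option names should be checked against the Xen command-line documentation for the installed version, and the two experiments are best tried one at a time so that any pattern stays visible.

```shell
# /etc/default/grub (assumption: Debian's GRUB Xen integration reads this variable).
# These options go to the Xen hypervisor, not to the dom0 Linux kernel.

# Full hypervisor log verbosity, for Xen's own messages and guest-triggered ones:
GRUB_CMDLINE_XEN_DEFAULT="loglvl=all guest_loglvl=all"

# Experiment 1 (enable on its own): switch to the credit scheduler.
#GRUB_CMDLINE_XEN_DEFAULT="loglvl=all guest_loglvl=all sched=credit"

# Experiment 2 (enable on its own): keep CPUs out of deep C states.
#GRUB_CMDLINE_XEN_DEFAULT="loglvl=all guest_loglvl=all max_cstate=1"

# After editing: run update-grub, reboot, then read the hypervisor log with `xl dmesg`.
```

A debug build of Xen is a build-time choice and cannot be enabled from this file.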