Re: Problems with APIC on versions 4.9 and later (4.8 works)
On Wed, 20 Jan 2021 at 12:13, Jürgen Groß <jgross@xxxxxxxx> wrote:
>
> On 20.01.21 09:50, Jan Beulich wrote:
> > On 19.01.2021 20:36, Claudemir Todo Bom wrote:
> >> I do not have serial output on this setup, so I recorded a video with
> >> boot_delay=50 in order to be able to get all the kernel messages:
> >> https://youtu.be/y95h6vqoF7Y
> >
> > This doesn't show any badness afaics.
> >
> >> This is running 4.14 from debian bullseye (testing).
> >>
> >> I'm also attaching the dmesg output when booting xen 4.8 with the same
> >> kernel version and same parameters.
> >>
> >> I visually compared all the messages, and the only thing I noticed was that
> >> 4.14 used tsc as clocksource and 4.8 used xen. I tried to boot the kernel
> >> with "clocksource=xen" and the problem is happening with that also.
> >
> > There's some confusion here I suppose: The clock source you talk
> > about is the kernel's, not Xen's. I didn't think this would
> > change for the same kernel version with different Xen underneath,
> > but the Linux maintainers of the Xen code there may know better.
> > Cc-ing them.
>
> This might depend on CPUID bits given to dom0 by Xen, e.g. regarding
> TSC stability.

I will ignore this for now; I suppose it is not the cause of the problem.

> >> The "start" of the problem is that when the kernel gets to the "Freeing
> >> unused kernel image (initmem) memory: 2380K" it hangs and stays there for a
> >> while. After a few minutes it shows that a process (swapper) is blocked for
> >> some time (image attached)
> >
> > Now that's pretty unusual - the call trace seen in the screen
> > shot you had attached indicates the kernel didn't even make it
> > past its own initialization just yet. Just to have explored that
> > possibility - could you enable Xen's NMI watchdog (simply
> > "watchdog" on the Xen command line)? Among the boot messages
> > there ought to be one indicating whether it actually works on
> > your system. Without a serial console you wouldn't see anything
> > if it triggers, but the system would then never make it to the
> > kernel side issue.

The "watchdog" parameter changed nothing.

> > As far as making sure we at least see all kernel messages -
> > are you having "ignore_loglevel" in place? I don't think I've
> > been able to spot the kernel command line anywhere in the video.

I was using loglevel=7, which is the maximum level according to the
documentation, so it should be equivalent, but I also tested with
"ignore_loglevel" and the output looks pretty similar.

> > I'm afraid there's no real way around seeing the full Xen
> > messages, i.e. including possible ones while Dom0 already boots
> > (and allowing some debug keys to be issued, as the rcu_barrier
> > on the stack may suggest there's an issue with one of the
> > secondary CPUs). You could try whether "vga=keep" on the Xen
> > command line allows you to see more, but this option may have
> > quite severe an effect on the timing of Dom0's booting, which
> > may make an already bad situation worse.

I already used "vga=keep", with no new information. I will try to enable
serial output in order to debug further. Are there any parameters I could
give to Xen so that it writes more information to the serial line while
dom0 is booting on the screen?

> > Alternatively the kernel may need instrumenting to figure what
> > exactly it is that prevent forward progress.
> >
> > There's one other wild guess you may want to try: "cpuidle=no"
> > on the Xen command line.
>
> Other wild guesses are:
>
> - add "sched=credit" to the Xen command line
>
> or
>
> - add "xen.fifo_events=0" to the dom0 command line

All 3 suggestions changed nothing.

I noticed that Debian has a lot of distribution-managed patches, so I
think that if I want to find exactly where after 4.8.5 the problem
started, I will need to build Xen from source, bypassing the Debian
helpers.

Best regards,
Claudemir