
Re: Problems with APIC on versions 4.9 and later (4.8 works)



On Wed, Jan 20, 2021 at 12:13, Jürgen Groß <jgross@xxxxxxxx> wrote:
>
> On 20.01.21 09:50, Jan Beulich wrote:
> > On 19.01.2021 20:36, Claudemir Todo Bom wrote:
> >> I do not have serial output on this setup, so I recorded a video with
> >> boot_delay=50 in order to be able to get all the kernel messages:
> >> https://youtu.be/y95h6vqoF7Y
> >
> > This doesn't show any badness afaics.
> >
> >> This is running 4.14 from debian bullseye (testing).
> >>
> >> I'm also attaching the dmesg output when booting xen 4.8 with the same
> >> kernel version and same parameters.
> >>
> >> I visually compared all the messages, and the only thing I noticed was that
> >> 4.14 used tsc as clocksource and 4.8 used xen. I tried to boot the kernel
> >> with "clocksource=xen" and the problem is happening with that also.
> >
> > There's some confusion here I suppose: The clock source you talk
> > about is the kernel's, not Xen's. I didn't think this would
> > change for the same kernel version with different Xen underneath,
> > but the Linux maintainers of the Xen code there may know better.
> > Cc-ing them.
>
> This might depend on CPUID bits given to dom0 by Xen, e.g. regarding
> TSC stability.

I will ignore this for now; I suppose it is not the cause of the problem.

> >> The "start" of the problem is that when the kernel gets to the "Freeing
> >> unused kernel image (initmem) memory: 2380K" it hangs and stays there for a
> >> while. After a few minutes it shows that a process (swapper) has been
> >> blocked for some time (image attached)
> >
> > Now that's pretty unusual - the call trace seen in the screen
> > shot you had attached indicates the kernel didn't even make it
> > past its own initialization just yet. Just to have explored that
> > possibility - could you enable Xen's NMI watchdog (simply
> > "watchdog" on the Xen command line)? Among the boot messages
> > there ought to be one indicating whether it actually works on
> > your system. Without a serial console you wouldn't see anything
> > if it triggers, but the system would then never make it to the
> > kernel side issue.

The "watchdog" parameter changed nothing.

> > As far as making sure we at least see all kernel messages -
> > are you having "ignore_loglevel" in place? I don't think I've
> > been able to spot the kernel command line anywhere in the video.

I was using loglevel=7, which the documentation lists as the maximum
level, so it should behave the same. I also tested with
"ignore_loglevel" and the output looks pretty similar.
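
A side note on why the two settings are close but not identical: as far
as the kernel documentation describes it, a message reaches the console
only when its level is numerically lower than the console loglevel,
while "ignore_loglevel" prints everything. The sketch below only
illustrates that comparison; the function name is made up for this
example.

```shell
# Hypothetical helper: succeeds if a message of priority $1 would be
# printed when the console loglevel is $2 (rule: priority < loglevel).
reaches_console() {
    [ "$1" -lt "$2" ]
}

# Compare KERN_INFO (6) and KERN_DEBUG (7) under loglevel=7.
for level in 6 7; do
    if reaches_console "$level" 7; then
        echo "level $level: printed with loglevel=7"
    else
        echo "level $level: suppressed with loglevel=7"
    fi
done
```

Under that rule, loglevel=7 would still hide debug-level messages, which
"ignore_loglevel" would show.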

> > I'm afraid there's no real way around seeing the full Xen
> > messages, i.e. including possible ones while Dom0 already boots
> > (and allowing some debug keys to be issued, as the rcu_barrier
> > on the stack may suggest there's an issue with one of the
> > secondary CPUs). You could try whether "vga=keep" on the Xen
> > command line allows you to see more, but this option may have
> > quite severe an effect on the timing of Dom0's booting, which
> > may make an already bad situation worse.

I had already used "vga=keep", and it gave no new information. I will
try to set up serial output in order to debug further. Are there any
parameters I could give to Xen so that it writes more information to
the serial line while dom0 is booting on the screen?
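
For reference, a minimal sketch of what such a serial setup might look
like on Debian. It is not a tested configuration for this machine. The
assumptions are that the port is the legacy COM1 at 115200 baud, and
that the GRUB settings live in /etc/default/grub.d/xen.cfg and are
applied with update-grub.

```shell
# Sketch of /etc/default/grub.d/xen.cfg (file location is an assumption).

# Xen side: assumes COM1 at 115200 8n1; keep VGA output as well, raise
# Xen and guest log levels, and flush console output synchronously.
GRUB_CMDLINE_XEN_DEFAULT="com1=115200,8n1 console=com1,vga loglvl=all guest_loglvl=all sync_console"

# dom0 side: route kernel messages through the Xen console (hvc0) so
# they end up on the same serial line as the hypervisor messages.
GRUB_CMDLINE_LINUX_XEN_REPLACE_DEFAULT="console=hvc0 earlyprintk=xen ignore_loglevel"
```

With the Xen console on serial, debug keys can then be sent from the
serial terminal while dom0 is stuck.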

> > Alternatively the kernel may need instrumenting to figure out what
> > exactly it is that prevents forward progress.
> >
> > There's one other wild guess you may want to try: "cpuidle=no"
> > on the Xen command line.
> Other wild guesses are:
>
> - add "sched=credit" to the Xen command line
>
> or
>
> - add "xen.fifo_events=0" to the dom0 command line

All three suggestions changed nothing.

I noticed that Debian carries a lot of distribution-managed patches, so
I think that to find exactly where the problem started after 4.8.5, I
will need to build Xen from source without the Debian helpers.
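
A rough sketch of how that search could be driven with git bisect,
assuming an upstream xen.git checkout in the current directory and
release tags named RELEASE-x.y.z. The function names are made up for
this example, and only the hypervisor is built, not the tools. Since
4.8.5 sits on a stable branch, git may first ask for the merge base
with 4.9.0 to be tested.

```shell
# Assumed tag names; adjust if the checkout uses different ones.
GOOD_TAG="RELEASE-4.8.5"   # last version known to boot
BAD_TAG="RELEASE-4.9.0"    # first release where the hang was seen

# Hypothetical helper: start the bisection between the two tags.
start_xen_bisect() {
    git bisect start "$BAD_TAG" "$GOOD_TAG"
}

# Hypothetical helper: build only the hypervisor at the current commit.
# The result would need to be installed and booted by hand.
build_xen_only() {
    make -C xen -j"$(nproc)"
}

# After each boot test, report the result with one of:
#   git bisect good
#   git bisect bad
```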

Best regards,
Claudemir



 

