
Re: Problems with APIC on versions 4.9 and later (4.8 works)



On 23.01.2021 00:36, Claudemir Todo Bom wrote:
> On Wed, 20 Jan 2021 at 12:13, Jürgen Groß <jgross@xxxxxxxx> wrote:
>>
>> On 20.01.21 09:50, Jan Beulich wrote:
>>> On 19.01.2021 20:36, Claudemir Todo Bom wrote:
>>>> I do not have serial output on this setup, so I recorded a video with
>>>> boot_delay=50 in order to be able to get all the kernel messages:
>>>> https://youtu.be/y95h6vqoF7Y
>>>
>>> This doesn't show any badness afaics.
>>>
>>>> This is running 4.14 from debian bullseye (testing).
>>>>
>>>> I'm also attaching the dmesg output when booting xen 4.8 with the same
>>>> kernel version and same parameters.
>>>>
>>>> I visually compared all the messages, and the only thing I noticed was that
>>>> 4.14 used tsc as clocksource and 4.8 used xen. I tried to boot the kernel
>>>> with "clocksource=xen" and the problem is happening with that also.
>>>
>>> There's some confusion here I suppose: The clock source you talk
>>> about is the kernel's, not Xen's. I didn't think this would
>>> change for the same kernel version with different Xen underneath,
>>> but the Linux maintainers of the Xen code there may know better.
>>> Cc-ing them.
>>
>> This might depend on CPUID bits given to dom0 by Xen, e.g. regarding
>> TSC stability.
>>
> 
> It looks like the CPUID changes I observed and described in the other
> messages are a separate problem that I may still run into. I narrowed
> down the cause of dom0 failing to boot with more than 1 core to the
> following commit:
> 
> https://github.com/xen-project/xen/commit/63e1d01b8fd948b3e0fa3beea494e407668aa43b
> 
> Code built from this commit doesn't boot; code built from its parent boots.

Odd.

> Now, is there something I can do on the command line to make it boot?
> Or does it need to be fixed in the code?

That's too early to ask. We first need to understand what's going
on. There are two things I'd like you to try: One is to use
"clocksource=tsc" on the Xen (not the kernel) command line, and
the other (without that option) is to try the debugging patch
below. Of course that patch is only going to be useful when you
can somehow record Xen's log messages up to the point where the
system hangs. (Both ideally on as new a Xen as you can arrange
for.)
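
For reading the output of the debugging patch: it prints two flags,
"c" (X86_FEATURE_CONSTANT_TSC) and "r" (X86_FEATURE_TSC_RELIABLE).
Going only by the condition visible in the patch context, the TSC
rendezvous function is selected when c=1 and r=0. The following is a
minimal standalone sketch of that decision, not Xen code; the enum and
the function name choose_rendezvous are made up for illustration, and
any surrounding conditions in the real function are not modelled.

```c
#include <stdbool.h>

/* Hypothetical labels for the two outcomes; these are not Xen identifiers. */
enum rendezvous_choice {
    RENDEZVOUS_UNCHANGED,  /* function pointer keeps its earlier value */
    RENDEZVOUS_TSC_RESYNC  /* time_calibration_tsc_rendezvous is selected */
};

/*
 * Mirrors the condition shown in the patch context:
 *   boot_cpu_has(X86_FEATURE_CONSTANT_TSC) &&
 *   !boot_cpu_has(X86_FEATURE_TSC_RELIABLE)
 * c and r correspond to the two values printed as "TSC: c=%d r=%d".
 */
enum rendezvous_choice choose_rendezvous(bool c, bool r)
{
    if (c && !r)
        return RENDEZVOUS_TSC_RESYNC;
    return RENDEZVOUS_UNCHANGED;
}
```

So a log line of "TSC: c=1 r=0" would mean the re-syncing rendezvous
is in use, while any other combination leaves the earlier choice
alone.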

Jan

--- unstable.orig/xen/arch/x86/time.c
+++ unstable/xen/arch/x86/time.c
@@ -1799,9 +1799,11 @@ static void time_calibration(void *unuse
     cpumask_copy(&r.cpu_calibration_map, &cpu_online_map);
 
     /* @wait=1 because we must wait for all cpus before freeing @r. */
+printk("TSC: %ps\n", time_calibration_rendezvous_fn);//temp
     on_selected_cpus(&r.cpu_calibration_map,
                      time_calibration_rendezvous_fn,
                      &r, 1);
+printk("TSC: end rendezvous\n");//temp
 }
 
 static struct cpu_time_stamp ap_bringup_ref;
@@ -2043,6 +2045,7 @@ static int __init verify_tsc_reliability
      * While with constant-rate TSCs the scale factor can be shared, when TSCs
      * are not marked as 'reliable', re-sync during rendezvous.
      */
+printk("TSC: c=%d r=%d\n", !!boot_cpu_has(X86_FEATURE_CONSTANT_TSC), !!boot_cpu_has(X86_FEATURE_TSC_RELIABLE));//temp
     if ( boot_cpu_has(X86_FEATURE_CONSTANT_TSC) &&
          !boot_cpu_has(X86_FEATURE_TSC_RELIABLE) )
         time_calibration_rendezvous_fn = time_calibration_tsc_rendezvous;
@@ -2056,6 +2059,7 @@ int __init init_xen_time(void)
 {
     tsc_check_writability();
 
+printk("TSC: c=%d r=%d\n", !!boot_cpu_has(X86_FEATURE_CONSTANT_TSC), !!boot_cpu_has(X86_FEATURE_TSC_RELIABLE));//temp
     open_softirq(TIME_CALIBRATE_SOFTIRQ, local_time_calibration);
 
     /* NB. get_wallclock_time() can take over one second to execute. */




 

