[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v2 1/2] xen/x86: don't send IPI to sync TSC when it is reliable


  • To: Stefano Stabellini <sstabellini@xxxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Fri, 11 Jul 2025 08:10:50 +0200
  • Autocrypt: addr=jbeulich@xxxxxxxx; keydata= xsDiBFk3nEQRBADAEaSw6zC/EJkiwGPXbWtPxl2xCdSoeepS07jW8UgcHNurfHvUzogEq5xk hu507c3BarVjyWCJOylMNR98Yd8VqD9UfmX0Hb8/BrA+Hl6/DB/eqGptrf4BSRwcZQM32aZK 7Pj2XbGWIUrZrd70x1eAP9QE3P79Y2oLrsCgbZJfEwCgvz9JjGmQqQkRiTVzlZVCJYcyGGsD /0tbFCzD2h20ahe8rC1gbb3K3qk+LpBtvjBu1RY9drYk0NymiGbJWZgab6t1jM7sk2vuf0Py O9Hf9XBmK0uE9IgMaiCpc32XV9oASz6UJebwkX+zF2jG5I1BfnO9g7KlotcA/v5ClMjgo6Gl MDY4HxoSRu3i1cqqSDtVlt+AOVBJBACrZcnHAUSuCXBPy0jOlBhxPqRWv6ND4c9PH1xjQ3NP nxJuMBS8rnNg22uyfAgmBKNLpLgAGVRMZGaGoJObGf72s6TeIqKJo/LtggAS9qAUiuKVnygo 3wjfkS9A3DRO+SpU7JqWdsveeIQyeyEJ/8PTowmSQLakF+3fote9ybzd880fSmFuIEJldWxp Y2ggPGpiZXVsaWNoQHN1c2UuY29tPsJgBBMRAgAgBQJZN5xEAhsDBgsJCAcDAgQVAggDBBYC AwECHgECF4AACgkQoDSui/t3IH4J+wCfQ5jHdEjCRHj23O/5ttg9r9OIruwAn3103WUITZee e7Sbg12UgcQ5lv7SzsFNBFk3nEQQCACCuTjCjFOUdi5Nm244F+78kLghRcin/awv+IrTcIWF hUpSs1Y91iQQ7KItirz5uwCPlwejSJDQJLIS+QtJHaXDXeV6NI0Uef1hP20+y8qydDiVkv6l IreXjTb7DvksRgJNvCkWtYnlS3mYvQ9NzS9PhyALWbXnH6sIJd2O9lKS1Mrfq+y0IXCP10eS FFGg+Av3IQeFatkJAyju0PPthyTqxSI4lZYuJVPknzgaeuJv/2NccrPvmeDg6Coe7ZIeQ8Yj t0ARxu2xytAkkLCel1Lz1WLmwLstV30g80nkgZf/wr+/BXJW/oIvRlonUkxv+IbBM3dX2OV8 AmRv1ySWPTP7AAMFB/9PQK/VtlNUJvg8GXj9ootzrteGfVZVVT4XBJkfwBcpC/XcPzldjv+3 HYudvpdNK3lLujXeA5fLOH+Z/G9WBc5pFVSMocI71I8bT8lIAzreg0WvkWg5V2WZsUMlnDL9 mpwIGFhlbM3gfDMs7MPMu8YQRFVdUvtSpaAs8OFfGQ0ia3LGZcjA6Ik2+xcqscEJzNH+qh8V m5jjp28yZgaqTaRbg3M/+MTbMpicpZuqF4rnB0AQD12/3BNWDR6bmh+EkYSMcEIpQmBM51qM EKYTQGybRCjpnKHGOxG0rfFY1085mBDZCH5Kx0cl0HVJuQKC+dV2ZY5AqjcKwAxpE75MLFkr wkkEGBECAAkFAlk3nEQCGwwACgkQoDSui/t3IH7nnwCfcJWUDUFKdCsBH/E5d+0ZnMQi+G0A nAuWpQkjM1ASeQwSHEeAWPgskBQL
  • Cc: Stefano Stabellini <stefano.stabellini@xxxxxxx>, andrew.cooper3@xxxxxxxxxx, roger.pau@xxxxxxxxxx, Xenia.Ragiadakou@xxxxxxx, alejandro.garciavallejo@xxxxxxx, Jason.Andryuk@xxxxxxx, xen-devel@xxxxxxxxxxxxxxxxxxxx
  • Delivery-date: Fri, 11 Jul 2025 06:11:09 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 11.07.2025 03:34, Stefano Stabellini wrote:
> On Thu, 10 Jul 2025, Jan Beulich wrote:
>> - the function being a timer handler, it would be preferable if the timer
>>   wasn't ever activated in this mode (at which point rather than returning
>>   early, the code above could simply be purged, maybe replaced by e.g. an
>>   assertion),
> 
> I see your point about the timer not being activated in the first place.
> 
> But if we want to make the code more reliable we should keep the if
> (clocksource_is_tsc()) return; in time_calibration. That way, in case of
> mistakes elsewhere, still the desired behavior is obtained.
> 
> I'll add the changes to cpu_frequency_change and local_time_calibration.
> I'll append an incremental patch to clarify my intent.
> 
> 
>> - the above in particular requires dealing with cpu_frequency_change() (the
>>   other of the two places where the timer is actually activated).
>>
>> Some care may be needed in all of this taking into consideration that the
>> platform timer change to TSC happens late. Albeit commit f954a1bf5f74
>> ("x86/time: change initiation of the calibration timer") has imo eliminated
>> the main concern here.
>>
>> As to skipping the rendezvous: Besides invoking the calibration softirq,
>> time_calibration_nop_rendezvous() also updates the per-CPU cpu_calibration
>> fields. There would thus need to be a pretty formal proof that calculations
>> involving ->local_stime or ->local_tsc can't possibly degrade or even
>> degenerate when they remain at their boot-time values. (As to
>> ->master_stime, afaict the field simply isn't used at all in that mode,
>> which is a fair part of the reason why the code change above is okay _if_
>> the rendezvous itself can be eliminated. The justification for that could
>> also do with extending some, considering that much of the involved code is
>> pretty subtle.) Alternatively, if such a proof turned out impossible,
>> another way of updating the fields every once in a while would need adding.
> 
> Do you mean a formal proof that the TSC is actually stable from a
> hardware perspective? The software algorithm is the same no matter the
> number of updates.

No, I really mean what I said - as the deltas are going to get larger that
are used as inputs to the calculations, it is (at least to me) not entirely
obvious that the calculations using those deltas can't degrade.

>> Finally, what you do here isn't entirely reliable as to your apparent end
>> goal: "clocksource=tsc" is respected only when tsc_check_reliability()
>> completes with an acceptable outcome. There's certainly some variability in
>> this across multiple runs, i.e. if things went extremely bad, once in blue
>> moon you may end up with the TSC being rejected for use as platform timer.
>  
> That is interesting! One option is to change the code so that
> clocksource=tsc is always respected. I have appended the change on top
> of this patch. Please let me know if you have other suggestions.
> 
> 
> diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
> index d72e640f72..d29266086d 100644
> --- a/xen/arch/x86/time.c
> +++ b/xen/arch/x86/time.c
> @@ -1877,7 +1877,7 @@ int cpu_frequency_change(u64 freq)
>      update_vcpu_system_time(current);
>  
>      /* A full epoch should pass before we check for deviation. */
> -    if ( smp_processor_id() == 0 )
> +    if ( smp_processor_id() == 0 && !clocksource_is_tsc() )
>      {
>          set_timer(&calibration_timer, NOW() + EPOCH);
>          platform_time_calibration();
> @@ -2024,7 +2024,7 @@ static void cf_check local_time_calibration(void)
>      update_vcpu_system_time(current);
>  
>   out:
> -    if ( smp_processor_id() == 0 )
> +    if ( smp_processor_id() == 0 && !clocksource_is_tsc() )
>      {
>          set_timer(&calibration_timer, NOW() + EPOCH);
>          platform_time_calibration();

Is this necessary? In this mode we won't make it into this function anymore,
will we? Hence if anything an early-out would be applicable.

> @@ -2271,22 +2271,6 @@ static void cf_check 
> time_calibration_std_rendezvous(void *_r)
>      time_calibration_rendezvous_tail(r, 0, rdtsc_ordered());
>  }
>  
> -/*
> - * Rendezvous function used when clocksource is TSC and
> - * no CPU hotplug will be performed.
> - */
> -static void cf_check time_calibration_nop_rendezvous(void *rv)
> -{
> -    const struct calibration_rendezvous *r = rv;
> -    struct cpu_time_stamp *c = &this_cpu(cpu_calibration);
> -
> -    c->local_tsc    = r->master_tsc_stamp;
> -    c->local_stime  = r->master_stime;
> -    c->master_stime = r->master_stime;
> -
> -    raise_softirq(TIME_CALIBRATE_SOFTIRQ);
> -}
> -
>  static void (*time_calibration_rendezvous_fn)(void *) =
>      time_calibration_std_rendezvous;
>  
> @@ -2488,7 +2472,7 @@ static int __init cf_check verify_tsc_reliability(void)
>           * CPUs are booted.
>           */
>          tsc_check_reliability();
> -        if ( tsc_max_warp )
> +        if ( tsc_max_warp && strcmp(opt_clocksource, "tsc") )
>          {
>              printk("TSC warp detected, disabling TSC_RELIABLE\n");
>              setup_clear_cpu_cap(X86_FEATURE_TSC_RELIABLE);
> @@ -2506,21 +2490,12 @@ static int __init cf_check 
> verify_tsc_reliability(void)
>               */
>              on_selected_cpus(&cpu_online_map, reset_percpu_time, NULL, 1);
>  
> -            /*
> -             * We won't do CPU Hotplug and TSC clocksource is being used 
> which
> -             * means we have a reliable TSC, plus we don't sync with any 
> other
> -             * clocksource so no need for rendezvous.
> -             */
> -            time_calibration_rendezvous_fn = time_calibration_nop_rendezvous;

Much like you prefer to leave a safeguard in time_calibration(), I think you
want to either leave a safeguard in the rendezvous handler now "used" instead,
or you want to poison this pointer. Any of such safeguards then imo want to
include ASSERT_UNREACHABLE().

Plus of course I hope it goes without saying that much also depends on the
(to be extended) patch description.

Jan



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.