|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH, RFC] x86/HVM: batch vCPU wakeups
At 09:33 +0100 on 09 Sep (1410251617), Jan Beulich wrote:
> Mass wakeups (via vlapic_ipi()) can take enormous amounts of time,
> especially when many of the remote pCPU-s are in deep C-states. For
> 64-vCPU Windows Server 2012 R2 guests on Ivybridge hardware,
> accumulated times of over 2ms were observed. Considering that Windows
> broadcasts IPIs from its timer interrupt
Bleargh. :(
> RFC for two reasons:
> 1) I started to consider elimination of the event-check IPIs in a more
> general way when MONITOR/MWAIT is in use: As long as the remote CPU
> is known to be MWAITing (or in the process of resuming after MWAIT),
> the write of softirq_pending(cpu) ought to be sufficient to wake it.
> This would yield the patch here pointless on MONITOR/MWAIT capable
> hardware.
That's a promising line. I guess the number of 64-way systems that
don't support MONITOR/MWAIT is pretty small. :)
> 2) The condition when to enable batching in vlapic_ipi() is already
> rather complex, but it is nevertheless unclear whether it would
> benefit from further tuning (as mentioned above).
The test you chose seems plausible enough to me. I suppose we care
enough about single-destination IPIs on idle hardware that we need
this check at all; but I doubt that IPIs to exactly two destinations,
for example, are worth worrying about.
The implementation looks OK in general; a few nits below.
> --- a/xen/arch/x86/hvm/vlapic.c
> +++ b/xen/arch/x86/hvm/vlapic.c
> @@ -447,12 +447,30 @@ void vlapic_ipi(
>
> default: {
> struct vcpu *v;
> - for_each_vcpu ( vlapic_domain(vlapic), v )
> + const struct domain *d = vlapic_domain(vlapic);
> + bool_t batch = d->max_vcpus > 2 &&
> + (short_hand
> + ? short_hand != APIC_DEST_SELF
> + : vlapic_x2apic_mode(vlapic)
> + ? dest_mode ? hweight16(dest) > 1
> + : dest == 0xffffffff
> + : dest_mode
> + ? hweight8(dest &
> + GET_xAPIC_DEST_FIELD(
> + vlapic_get_reg(vlapic,
> + APIC_DFR))) > 1
> + : dest == 0xff);
Yoiks. Could you extract some of this into a helper function, say
is_unicast_dest(), just for readability?
> void cpu_raise_softirq(unsigned int cpu, unsigned int nr)
> {
> - if ( !test_and_set_bit(nr, &softirq_pending(cpu))
> - && (cpu != smp_processor_id()) )
> + unsigned int this_cpu = smp_processor_id();
The name clashes with the this_cpu() macro (obviously OK for compiling
but harder to read) and can be dropped since...
> +
> + if ( test_and_set_bit(nr, &softirq_pending(cpu))
> + || (cpu == this_cpu) )
> + return;
> +
> + if ( !per_cpu(batching, this_cpu) || in_irq() )
this_cpu(batching) is the preferred way of addressing this on Xen
anyway - though I guess maybe it's _slightly_ less efficient to
recalculate get_cpu_info() here than to indirect through the
per_cpu_offsets[]? I haven't looked at the asm.
> smp_send_event_check_cpu(cpu);
> + else
> + set_bit(nr, &per_cpu(batch_mask, this_cpu));
> +}
> +
> +void cpu_raise_softirq_batch_begin(void)
> +{
> + ++per_cpu(batching, smp_processor_id());
This one should definitely be using this_cpu().
> +}
> +
> +void cpu_raise_softirq_batch_finish(void)
> +{
> + unsigned int cpu, this_cpu = smp_processor_id();
> + cpumask_t *mask = &per_cpu(batch_mask, this_cpu);
Again, this_cpu()?
> + ASSERT(per_cpu(batching, this_cpu));
> + for_each_cpu ( cpu, mask )
> + if ( !softirq_pending(cpu) )
> + cpumask_clear_cpu(cpu, mask);
> + smp_send_event_check_mask(mask);
> + cpumask_clear(mask);
> + --per_cpu(batching, this_cpu);
> }
One other thing that occurs to me is that we might do the batching in
the caller (by gathering a mask of pcpus in the vlapic code) but that
sounds messy. And it's not like softirq is so particularly
latency-sensitive that we'd want to avoid this much overhead in the
common case. So on second thoughts this way is better. :)
Cheers,
Tim.
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |