
Re: [Xen-devel] [PATCH] xen/arm: Trap and yield on WFE instructions



On Wed, 2014-07-16 at 15:47 +0530, Anup Patel wrote:
> On 16 July 2014 15:39, Ian Campbell <Ian.Campbell@xxxxxxxxxx> wrote:
> > On Wed, 2014-07-16 at 15:37 +0530, Anup Patel wrote:
> >> On 16 July 2014 15:06, Ian Campbell <Ian.Campbell@xxxxxxxxxx> wrote:
> >> > On Wed, 2014-07-16 at 14:34 +0530, Anup Patel wrote:
> >> >> If we have a Guest/DomU with two or more of its VCPUs running
> >> >> on the same host CPU, then it can quite likely happen that these
> >> >> VCPUs fight over the same spinlock and one of them wastes CPU
> >> >> cycles in the WFE instruction. This patch traps the WFE
> >> >> instruction and forces the VCPU to yield its timeslice.
> >> >>
> >> >> KVM ARM/ARM64 also does a similar thing when handling WFE
> >> >> instructions. (Please refer to
> >> >> https://lists.cs.columbia.edu/pipermail/kvmarm/2013-November/006259.html)
> >> >>
> >> >> In general, this patch is more of an optimization for an
> >> >> oversubscribed system with more VCPUs than underlying host CPUs.
> >> >
> >> > Sounds good. Do you have any numbers for the improvement?
> >>
> >> I did not try any benchmarks myself, but hackbench shows a good
> >> improvement for KVM, hence it is a worthy optimization for Xen too.
> >>
> >> I found this change missing for Xen hence this patch.
> >>
> >> >
> >> >> Signed-off-by: Anup Patel <anup.patel@xxxxxxxxxx>
> >> >> Signed-off-by: Pranavkumar Sawargaonkar <pranavkumar@xxxxxxxxxx>
> >> >> Tested-by: Pranavkumar Sawargaonkar <pranavkumar@xxxxxxxxxx>
> >> >> ---
> >> >>  xen/arch/arm/traps.c |   27 ++++++++++++++++-----------
> >> >>  1 file changed, 16 insertions(+), 11 deletions(-)
> >> >>
> >> >> diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
> >> >> index 686d8b7..bc9f67a 100644
> >> >> --- a/xen/arch/arm/traps.c
> >> >> +++ b/xen/arch/arm/traps.c
> >> >> @@ -90,7 +90,7 @@ void __cpuinit init_traps(void)
> >> >>
> >> >>      /* Setup hypervisor traps */
> >> >>      WRITE_SYSREG(HCR_PTW|HCR_BSU_INNER|HCR_AMO|HCR_IMO|HCR_FMO|HCR_VM|
> >> >> -                 HCR_TWI|HCR_TSC|HCR_TAC|HCR_SWIO|HCR_TIDCP, HCR_EL2);
> >> >> +                 HCR_TWE|HCR_TWI|HCR_TSC|HCR_TAC|HCR_SWIO|HCR_TIDCP, HCR_EL2);
> >> >>      isb();
> >> >>  }
> >> >>
> >> >> @@ -1803,16 +1803,21 @@ asmlinkage void do_trap_hypervisor(struct cpu_user_regs *regs)
> >> >>              advance_pc(regs, hsr);
> >> >>              return;
> >> >>          }
> >> >> -        /* at the moment we only trap WFI */
> >> >> -        vcpu_block();
> >> >> -        /* The ARM spec declares that even if local irqs are masked in
> >> >> -         * the CPSR register, an irq should wake up a cpu from WFI anyway.
> >> >> -         * For this reason we need to check for irqs that need delivery,
> >> >> -         * ignoring the CPSR register, *after* calling SCHEDOP_block to
> >> >> -         * avoid races with vgic_vcpu_inject_irq.
> >> >> -         */
> >> >> -        if ( local_events_need_delivery_nomask() )
> >> >> -            vcpu_unblock(current);
> >> >> +        if ( hsr.bits & 0x1 ) {
> >> >
> >> > Please can you add an appropriate struct hsr_wfe_wfi to union hsr and
> >> > use that instead of opencoding bit patterns in hsr.bits.
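> >> >
> >> > Something along these lines, perhaps (untested sketch following the
> >> > HSR ISS layout for WFI/WFE traps, EC 0x01; the field names are just
> >> > a suggestion):
> >> >
> >> >     struct hsr_wfi_wfe {
> >> >         unsigned long ti:1;      /* Trapped instruction: 0 == WFI, 1 == WFE */
> >> >         unsigned long res0:19;   /* Reserved */
> >> >         unsigned long cc:4;      /* Condition Code */
> >> >         unsigned long ccvalid:1; /* CC Valid */
> >> >         unsigned long len:1;     /* Instruction length */
> >> >         unsigned long ec:6;      /* Exception Class */
> >> >     } wfi_wfe;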
> >>
> >> Sure, I will update this and send it right away.
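> >>
> >> Roughly like this, I think (untested, assuming the new field ends up
> >> being named hsr.wfi_wfe.ti):
> >>
> >>     if ( hsr.wfi_wfe.ti )
> >>     {
> >>         /* WFE: yield, some other VCPU may execute the SEV we wait for */
> >>         vcpu_force_reschedule(current);
> >>     }
> >>     else
> >>     {
> >>         /* WFI: block until an interrupt needs delivery */
> >>         vcpu_block();
> >>         if ( local_events_need_delivery_nomask() )
> >>             vcpu_unblock(current);
> >>     }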
> >
> > Thanks.
> >
> >> >> +            /* Yield the VCPU for WFE */
> >> >> +            vcpu_force_reschedule(current);
> >> >
> >> > This is fine by me, but I'm wondering: does the HW have any support
> >> > for actually blocking until an event occurs on another vcpu (some sort
> >> > of trap on sev)? I don't think so, nor am I really sure how such a h/w
> >> > feature might work...
> >>
> >> Typically, on HW the WFE instruction puts a core into low-power
> >> mode. The core comes out of low-power mode on an interrupt or on
> >> an SEV executed by another core.
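> >>
> >> That is why guest spinlocks spin in WFE in the first place. Very
> >> roughly, the idiom looks like this (simplified sketch, not actual
> >> guest code):
> >>
> >>     static inline void guest_spin_lock(volatile int *lock)
> >>     {
> >>         /* Retry the lock each time an event (or interrupt) wakes us */
> >>         while ( __atomic_exchange_n(lock, 1, __ATOMIC_ACQUIRE) )
> >>             asm volatile("wfe" ::: "memory");
> >>     }
> >>
> >>     static inline void guest_spin_unlock(volatile int *lock)
> >>     {
> >>         __atomic_store_n(lock, 0, __ATOMIC_RELEASE);
> >>         /* Ensure the store is visible, then wake any WFE waiters */
> >>         asm volatile("dsb sy; sev" ::: "memory");
> >>     }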
> >
> > Right, but for virt, in order to put a wfe-ing vcpu to sleep we would
> > need to be able to get some sort of hypervisor event when another vcpu
> > in the same guest executes sev.
> >
> > Otherwise the best we can do is to yield as you've done here, but that
> > will still keep spinning if there are no other vcpus to run.
> 
> Yes, we cannot trap the SEV instruction.

Plus it would be unwieldy to have to enable that for all *other* vcpus
when one did a WFE.

> Yield seems to be the only option here, unless we modify the guest and
> replace the SEV instruction with a hypercall.

Perhaps one day we can add PARAVIRT_SPINLOCKS to arm. But not now ;-)
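
The rough shape would be the guest hypercalling out of its lock slow
path instead of spinning. A hypothetical guest-side sketch, reusing the
existing SCHEDOP_yield hypercall (the real pv spinlock code on x86 is
more involved and blocks on an event channel):

    /* pv slow path: donate the timeslice instead of executing wfe */
    while ( !arch_spin_trylock(lock) )
        HYPERVISOR_sched_op(SCHEDOP_yield, NULL);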

Ian.

