[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v4 13/16] xen/arm: Add support for GIC v3

To: Ian Campbell <Ian.Campbell@xxxxxxxxxx>
From: Stefano Stabellini <stefano.stabellini@xxxxxxxxxxxxx>
Date: Tue, 3 Jun 2014 11:43:50 +0100
Cc: vijay.kilari@xxxxxxxxx, Stefano Stabellini <stefano.stabellini@xxxxxxxxxxxxx>, Prasun.Kapoor@xxxxxxxxxxxxxxxxxx, Vijaya Kumar K <vijaya.kumar@xxxxxxxxxxxxxxxxxx>, Julien Grall <julien.grall@xxxxxxxxxx>, xen-devel@xxxxxxxxxxxxx, marc.zyngier@xxxxxxx, stefano.stabellini@xxxxxxxxxx
Delivery-date: Tue, 03 Jun 2014 10:44:14 +0000
List-id: Xen developer discussion <xen-devel.lists.xen.org>

On Tue, 3 Jun 2014, Ian Campbell wrote:
> On Tue, 2014-06-03 at 10:05 +0100, Julien Grall wrote:
> > 
> > On 03/06/14 09:54, Ian Campbell wrote:
> > > On Mon, 2014-06-02 at 18:33 +0100, Stefano Stabellini wrote:
> > >>> +static inline void gicv3_restore_lr(int nr_lrs, const struct vcpu *v)
> > >>> +{
> > >>> +    /* Fall through for all the cases */
> > >>> +    switch ( nr_lrs )
> > >>> +    {
> > >>> +    case 16:
> > >>> +        WRITE_SYSREG(v->arch.gic.v3.lr[15], ICH_LR15_EL2);
> > >>> +    case 15:
> > >>> +        WRITE_SYSREG(v->arch.gic.v3.lr[14], ICH_LR14_EL2);
> > >>> +    case 14:
> > >>> +        WRITE_SYSREG(v->arch.gic.v3.lr[13], ICH_LR13_EL2);
> > >>> +    case 13:
> > >>> +        WRITE_SYSREG(v->arch.gic.v3.lr[12], ICH_LR12_EL2);
> > >>> +    case 12:
> > >>> +        WRITE_SYSREG(v->arch.gic.v3.lr[11], ICH_LR11_EL2);
> > >>> +    case 11:
> > >>> +        WRITE_SYSREG(v->arch.gic.v3.lr[10], ICH_LR10_EL2);
> > >>> +    case 10:
> > >>> +        WRITE_SYSREG(v->arch.gic.v3.lr[9], ICH_LR9_EL2);
> > >>> +    case 9:
> > >>> +        WRITE_SYSREG(v->arch.gic.v3.lr[8], ICH_LR8_EL2);
> > >>> +    case 8:
> > >>> +        WRITE_SYSREG(v->arch.gic.v3.lr[7], ICH_LR7_EL2);
> > >>> +    case 7:
> > >>> +        WRITE_SYSREG(v->arch.gic.v3.lr[6], ICH_LR6_EL2);
> > >>> +    case 6:
> > >>> +        WRITE_SYSREG(v->arch.gic.v3.lr[5], ICH_LR5_EL2);
> > >>> +    case 5:
> > >>> +        WRITE_SYSREG(v->arch.gic.v3.lr[4], ICH_LR4_EL2);
> > >>> +    case 4:
> > >>> +        WRITE_SYSREG(v->arch.gic.v3.lr[3], ICH_LR3_EL2);
> > >>> +    case 3:
> > >>> +        WRITE_SYSREG(v->arch.gic.v3.lr[2], ICH_LR2_EL2);
> > >>> +    case 2:
> > >>> +        WRITE_SYSREG(v->arch.gic.v3.lr[1], ICH_LR1_EL2);
> > >>> +    case 1:
> > >>> +        WRITE_SYSREG(v->arch.gic.v3.lr[0], ICH_LR0_EL2);
> > >>> +         break;
> > >>> +    default:
> > >>> +         BUG();
> > >>> +    }
> > >>> +}
> > >>
> > >> Given that the number of LR registers has grown up quite a bit from
> > >> GICv2, we should optimize this code and only save and restore the
> > >> registers that are actually in used by checking on lr_mask.
> > >
> > > I'm not convinced that will be faster, the code above involves a
> > > straight line with no branches, but potentially some unnecessary stores.
> > > Looping or lr_mask involves a loop, a test_bit and most critically a
> > > switch over the lr number (to select the correct sysreg, which is a
> > > static register, not an opcode argument), but has no redundant stores. I
> > > suspect the straight line version will actually be quicker.
> > 
> > The solution suggested by Stefano doesn't work as it is. We only 
> > save/restore the LR register marked as used in the lr_mask (which is 
> > per-VCPU). We may end up with LR from the previous running VCPU because 
> > Xen will restore only the necessary LRs.
> > 
> > To fix the solution would be to clear the LRs when we save all LRs. But 
> > I'm not convince it will be faster than the current solution.
> 
> yes, that's a second nail in that idea's coffin I think.
 
I agree that we would need to run a few benchmarks to be sure and I
agree that this is OK in first instance.

However keep in mind that last time I checked on GICv2 reading or
writing to one LR costs about 120-140 nanoseconds. I dropped the idea of
using lr_mask on GICv2 because there are usually only very few LRs (4),
so it wouldn't make sense there but I bet it makes sense here.

16*130 ns = 2080 ns

That would be about 4000 cycles on a 2Ghz processor. We can certainly
loop through a bitmask in far less than that.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

References:
- Re: [Xen-devel] [PATCH v4 13/16] xen/arm: Add support for GIC v3
  - From: Stefano Stabellini
- Re: [Xen-devel] [PATCH v4 13/16] xen/arm: Add support for GIC v3
  - From: Ian Campbell
- Re: [Xen-devel] [PATCH v4 13/16] xen/arm: Add support for GIC v3
  - From: Julien Grall
- Re: [Xen-devel] [PATCH v4 13/16] xen/arm: Add support for GIC v3
  - From: Ian Campbell

Prev by Date: Re: [Xen-devel] [Hackathon] libvirt session notes
Next by Date: Re: [Xen-devel] [PATCH v4 13/16] xen/arm: Add support for GIC v3
Previous by thread: Re: [Xen-devel] [PATCH v4 13/16] xen/arm: Add support for GIC v3
Next by thread: Re: [Xen-devel] [PATCH v4 13/16] xen/arm: Add support for GIC v3
Index(es):
- Date
- Thread

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.