|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH v4 13/16] xen/arm: Add support for GIC v3
On Tue, 3 Jun 2014, Ian Campbell wrote:
> On Tue, 2014-06-03 at 10:05 +0100, Julien Grall wrote:
> >
> > On 03/06/14 09:54, Ian Campbell wrote:
> > > On Mon, 2014-06-02 at 18:33 +0100, Stefano Stabellini wrote:
> > >>> +static inline void gicv3_restore_lr(int nr_lrs, const struct vcpu *v)
> > >>> +{
> > >>> + /* Fall through for all the cases */
> > >>> + switch ( nr_lrs )
> > >>> + {
> > >>> + case 16:
> > >>> + WRITE_SYSREG(v->arch.gic.v3.lr[15], ICH_LR15_EL2);
> > >>> + case 15:
> > >>> + WRITE_SYSREG(v->arch.gic.v3.lr[14], ICH_LR14_EL2);
> > >>> + case 14:
> > >>> + WRITE_SYSREG(v->arch.gic.v3.lr[13], ICH_LR13_EL2);
> > >>> + case 13:
> > >>> + WRITE_SYSREG(v->arch.gic.v3.lr[12], ICH_LR12_EL2);
> > >>> + case 12:
> > >>> + WRITE_SYSREG(v->arch.gic.v3.lr[11], ICH_LR11_EL2);
> > >>> + case 11:
> > >>> + WRITE_SYSREG(v->arch.gic.v3.lr[10], ICH_LR10_EL2);
> > >>> + case 10:
> > >>> + WRITE_SYSREG(v->arch.gic.v3.lr[9], ICH_LR9_EL2);
> > >>> + case 9:
> > >>> + WRITE_SYSREG(v->arch.gic.v3.lr[8], ICH_LR8_EL2);
> > >>> + case 8:
> > >>> + WRITE_SYSREG(v->arch.gic.v3.lr[7], ICH_LR7_EL2);
> > >>> + case 7:
> > >>> + WRITE_SYSREG(v->arch.gic.v3.lr[6], ICH_LR6_EL2);
> > >>> + case 6:
> > >>> + WRITE_SYSREG(v->arch.gic.v3.lr[5], ICH_LR5_EL2);
> > >>> + case 5:
> > >>> + WRITE_SYSREG(v->arch.gic.v3.lr[4], ICH_LR4_EL2);
> > >>> + case 4:
> > >>> + WRITE_SYSREG(v->arch.gic.v3.lr[3], ICH_LR3_EL2);
> > >>> + case 3:
> > >>> + WRITE_SYSREG(v->arch.gic.v3.lr[2], ICH_LR2_EL2);
> > >>> + case 2:
> > >>> + WRITE_SYSREG(v->arch.gic.v3.lr[1], ICH_LR1_EL2);
> > >>> + case 1:
> > >>> + WRITE_SYSREG(v->arch.gic.v3.lr[0], ICH_LR0_EL2);
> > >>> + break;
> > >>> + default:
> > >>> + BUG();
> > >>> + }
> > >>> +}
> > >>
> > >> Given that the number of LR registers has grown up quite a bit from
> > >> GICv2, we should optimize this code and only save and restore the
> > >> registers that are actually in used by checking on lr_mask.
> > >
> > > I'm not convinced that will be faster, the code above involves a
> > > straight line with no branches, but potentially some unnecessary stores.
> > > Looping or lr_mask involves a loop, a test_bit and most critically a
> > > switch over the lr number (to select the correct sysreg, which is a
> > > static register, not an opcode argument), but has no redundant stores. I
> > > suspect the straight line version will actually be quicker.
> >
> > The solution suggested by Stefano doesn't work as it is. We only
> > save/restore the LR register marked as used in the lr_mask (which is
> > per-VCPU). We may end up with LR from the previous running VCPU because
> > Xen will restore only the necessary LRs.
> >
> > To fix the solution would be to clear the LRs when we save all LRs. But
> > I'm not convince it will be faster than the current solution.
>
> yes, that's a second nail in that idea's coffin I think.
I agree that we would need to run a few benchmarks to be sure and I
agree that this is OK in first instance.
However keep in mind that last time I checked on GICv2 reading or
writing to one LR costs about 120-140 nanoseconds. I dropped the idea of
using lr_mask on GICv2 because there are usually only very few LRs (4),
so it wouldn't make sense there but I bet it makes sense here.
16*130 ns = 2080 ns
That would be about 4000 cycles on a 2Ghz processor. We can certainly
loop through a bitmask in far less than that.
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |