|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH v5 4/6] xen/arm: inflight irqs during migration
On Wed, 18 Jun 2014, Ian Campbell wrote:
> On Wed, 2014-06-11 at 17:27 +0100, Stefano Stabellini wrote:
> > We need to take special care when migrating irqs that are already
> > inflight from one vcpu to another. See "The effect of changes to an
> > GICD_ITARGETSR", part of chapter 4.3.12 of the ARM Generic Interrupt
> > Controller Architecture Specification.
> >
> > The main issue from the Xen point of view is that the lr_pending and
> > inflight lists are per-vcpu. The lock we take to protect them is also
> > per-vcpu.
> >
> > In order to avoid issues, we set a new flag GIC_IRQ_GUEST_MIGRATING, so
> > that we can recognize when we receive an irq while the previous one is
> > still inflight (given that we are only dealing with hardware interrupts
> > here, it just means that its LR hasn't been cleared yet on the old vcpu).
> >
> > If GIC_IRQ_GUEST_MIGRATING is set, we only set GIC_IRQ_GUEST_QUEUED and
> > interrupt the old vcpu. When clearing the LR on the old vcpu, we take
> > special care of injecting the interrupt into the new vcpu. To do that we
> > need to release the old vcpu lock before taking the new vcpu lock.
> >
> > Using barriers and test_bit on GIC_IRQ_GUEST_MIGRATING, make sure that
> > vgic_migrate_irq can run at the same time as gic_update_one_lr on
> > different cpus with consistent results.
> >
> > Signed-off-by: Stefano Stabellini <stefano.stabellini@xxxxxxxxxxxxx>
> >
> > ---
> >
> > Changes in v5:
> > - pass unsigned long to find_next_bit for alignment on aarch64;
> > - call vgic_get_target_vcpu instead of gic_add_to_lr_pending to add the
> > irq in the right inflight queue;
> > - add barrier and bit tests to make sure that vgic_migrate_irq and
> > gic_update_one_lr can run simultaneously on different cpus without
> > issues;
> > - rework the loop to identify the new vcpu when ITARGETSR is written;
> > - use find_first_bit instead of find_next_bit.
> > ---
> > xen/arch/arm/gic.c | 25 ++++++++++++++--
> > xen/arch/arm/vgic.c | 66
> > +++++++++++++++++++++++++++++++++++++-----
> > xen/include/asm-arm/domain.h | 4 +++
> > 3 files changed, 85 insertions(+), 10 deletions(-)
> >
> > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> > index 5e502df..8ed242e 100644
> > --- a/xen/arch/arm/gic.c
> > +++ b/xen/arch/arm/gic.c
> > @@ -677,10 +677,31 @@ static void gic_update_one_lr(struct vcpu *v, int i)
> > clear_bit(GIC_IRQ_GUEST_ACTIVE, &p->status);
> > p->lr = GIC_INVALID_LR;
> > if ( test_bit(GIC_IRQ_GUEST_ENABLED, &p->status) &&
> > - test_bit(GIC_IRQ_GUEST_QUEUED, &p->status) )
> > + test_bit(GIC_IRQ_GUEST_QUEUED, &p->status) &&
> > + !test_bit(GIC_IRQ_GUEST_MIGRATING, &p->status) )
> > gic_raise_guest_irq(v, irq, p->priority);
> > - else
> > + else {
> > list_del_init(&p->inflight);
> > +
> > + /* inflight is cleared before clearing
> > + * GIC_IRQ_GUEST_MIGRATING */
> > + dsb(sy);
>
> Is sy really necessary? THis is all about state stored in RAM not the
> actual GIC, right? I think "ish" is probably sufficient. POssibly even a
> dmb of some sort.
Yeah, it is only about the state in RAM. It only needs to be visible
across all the cores, not the devices. So I guess the right function to
call would be smp_wmb() that translates to dmb(ishst)?
That makes me think that I should add smp_rmb() to the corresponding
critical section: vgic_migrate_irq.
> > diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
> > index 4d655af..e640de9 100644
> > --- a/xen/arch/arm/vgic.c
> > +++ b/xen/arch/arm/vgic.c
> > @@ -543,6 +562,8 @@ static int vgic_distr_mmio_write(struct vcpu *v,
> > mmio_info_t *info)
> > int offset = (int)(info->gpa - v->domain->arch.vgic.dbase);
> > int gicd_reg = REG(offset);
> > uint32_t tr;
> > + unsigned long trl;
> > + int i;
> >
> > switch ( gicd_reg )
> > {
> > @@ -626,22 +647,45 @@ static int vgic_distr_mmio_write(struct vcpu *v,
> > mmio_info_t *info)
> > rank = vgic_rank_offset(v, 8, gicd_reg - GICD_ITARGETSR);
> > if ( rank == NULL) goto write_ignore;
> > /* 8-bit vcpu mask for this domain */
> > - tr = (1 << v->domain->max_vcpus) - 1;
> > - tr = tr | (tr << 8) | (tr << 16) | (tr << 24);
> > - tr &= *r;
> > + trl = (1 << v->domain->max_vcpus) - 1;
> > + if ( dabt.size == 2 )
> > + trl = trl | (trl << 8) | (trl << 16) | (trl << 24);
> > + else
> > + trl = (trl << (8 * (offset & 0x3)));
> > + trl &= *r;
>
> What does trl stand for? Why is it an unsigned long when r and tr are
> uint32_t?
It is an unfortunate name. trl is exactly like tr but unsigned long
instead of uint32_t.
It is unsigned long so that it can be used as an argument of
find_next_bit below. uint32_t cannot be used because it doesn't have the
right alignment on armv8.
> > /* ignore zero writes */
> > - if ( !tr )
> > + if ( !trl )
> > goto write_ignore;
> > if ( dabt.size == 2 &&
> > - !((tr & 0xff) && (tr & (0xff << 8)) &&
> > - (tr & (0xff << 16)) && (tr & (0xff << 24))))
> > + !((trl & 0xff) && (trl & (0xff << 8)) &&
> > + (trl & (0xff << 16)) && (trl & (0xff << 24))))
> > goto write_ignore;
>
> I'm still not sure what this is ;-)
All the bytes in trl needs to be non-zero for the write to be valid
(considering a 0 write to any of the byte fields as invalid).
> > vgic_lock_rank(v, rank);
> > + i = 0;
> > + while ( (i = find_next_bit((const unsigned long *)&trl, 32, i)) <
> > 32 )
>
> Unnecessary cast (if trl really should be an unsigned long).
The argument to find_next_bit really needs to be unsigned long. I'll
remove the cast.
> > + {
> > + unsigned int irq, target, old_target;
> > + unsigned long old_target_mask;
> > + struct vcpu *v_target, *v_old;
> > +
> > + target = i % 8;
> > + old_target_mask = byte_read(rank->itargets[REG_RANK_INDEX(8,
> > gicd_reg - GICD_ITARGETSR)], 0, i/8);
> > + old_target = find_first_bit((const unsigned long *)
> > &old_target_mask, 8);
>
> Another unnecessary cast.
Right.
> > +
> > + if ( target != old_target )
>
> Given the implementation only supports a single target I'm wondering if
> maybe we should just store the configured target.
>
> If we weren't required to read back the actual value then we could even
> dispense with rank->itargets completely and just fabricate it.
Yeah, that's a good point, I am not sure.
The spec says that it is valid to statically configure GICD_ITARGETSR
and ignore any writes to it, but it doesn't say that we can ignore
/some/ writes to it.
> > + {
> > + irq = offset + (i / 8);
> > + v_target = v->domain->vcpu[target];
> > + v_old = v->domain->vcpu[old_target];
> > + vgic_migrate_irq(v_old, v_target, irq);
> > + }
> > + i += 8 - target;
>
> I think this whole loop would be less subtle as:
>
> for ( byte = 0 ; byte < 4 ; byte++ )
> target = find_first_bit((char *)(&tr)+byte, 8)
That would have alignment issues, wouldn't it?
> if ( target > 8 ) continue;
> old_target =...
> }
>
> The use of find_next_bit is not really semantically what is happening
> here.
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |