Hello Julien,
On 24.10.18 17:41, Julien Grall wrote:
> vGIC is not only about GICv2. It also covers GICv3 that is accessible
> via system registers.
Yes, I know. But, as you state below, for GICv2 based systems those
barriers are not needed.
>>> rely on a context synchronising
>>> operation on the CPU to ensure that the updated status register is
>>> visible to the CPU when handling the interrupt. This usually happens as
>>> a result of taking the IRQ exception in the first place, but there are
>>> two race scenarios where this isn't the case.
>>>
>>> For example, let's say we have two peripherals (X and Y), where Y
>>> uses a
>>> system register for its interrupt status.
>>>
>>> Case 1:
>>> 1. CPU takes an IRQ exception as a result of X raising an interrupt
>>> 2. Y then raises its interrupt line, but the update to its system
>>> register is not yet visible to the CPU
>>> 3. The GIC decides to expose Y's interrupt number first in the Ack
>>> register
>>> 4. The CPU runs the IRQ handler for Y, but the status register is stale
>> But this scenario somehow explains a strange thing I saw during IRQ
>> latency investigation (optimization attempts) during last weeks.
>> The strange sequence is as following:
>> 1. CPU takes an IRQ exception from the guest context
>> 2. In `enter_hypervisor_head()` function (i.e. in
>> `gic_clear_lrs()`
>> as for mainline) from some LR registers interrupt statuses are read as
>> PENDING
>> 3. Performing further code, without returning to the guest (i.e.
>> inside vgic_vcpu_inject_irq()), it happens that we read status INVALID
>> from the LR we read PENDING before, in step 2. >
>> Please note that we are tailoring xen based on RELEASE-4.10.0.
>
> As you tailor Xen, I need more details on your modification to be able
> to provide feedback on the oddness you encounter.
I guess I should make a dedicated patch applicable to mainline to reveal
the issue. I hope I'll do this nearest days.
> But you are using GICv2, right? If so, you are not using system
> registers for the vGIC. This is not covered by this patch.
Yep, our SoC has GICv2.
>>> Case 2:
>>> 1. CPU takes an IRQ exception as a result of X raising an interrupt
>>> 2. CPU reads the interrupt number for X from the Ack register and runs
>>> its IRQ handler
>>> 3. Y raises its interrupt line and the Ack register is updated, but
>>> again, the update to its system register is not yet visible to the
>>> CPU.
>>> 4. Since the GIC drivers poll the Ack register, we read Y's interrupt
>>> number and run its handler without a context synchronisation
>>> operation, therefore seeing the stale register value.
>>>
>>> In either case, we run the risk of missing an IRQ. This patch solves
>>> the
>>> problem by ensuring that we execute an ISB in the GIC drivers prior
>>> to invoking the interrupt handler.
>>>
>>> Based on Linux commit 39a06b67c2c1256bcf2361a1f67d2529f70ab206
>>> "irqchip/gic: Ensure we have an ISB between ack and ->handle_irq".
>>>
>>> Signed-off-by: Julien Grall <julien.grall@xxxxxxx>
>>>
>>> ---
>>> This patch is a candidate for backporting up to Xen 4.9.
>>> ---
>>> xen/arch/arm/gic.c | 2 ++
>>> 1 file changed, 2 insertions(+)
>>>
>>> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
>>> index 8d7e491060..305fbd66dd 100644
>>> --- a/xen/arch/arm/gic.c
>>> +++ b/xen/arch/arm/gic.c
>>> @@ -388,12 +388,14 @@ void gic_interrupt(struct cpu_user_regs *regs,
>>> int is_fiq)
>>> if ( likely(irq >= 16 && irq < 1020) )
>>> {
>>> local_irq_enable();
>>> + isb();
>> Taking in account that the first GICH accesses are from
>> `gic_clear_lrs()`, called from `enter_hypervisor_head`, I would suggest
>> moving isb() there.
Any comments here?
>>
>>> do_IRQ(regs, irq, is_fiq);
>>> local_irq_disable();
>>> }
>>> else if ( is_lpi(irq) )
>>> {
>>> local_irq_enable();
>>> + isb();
>>> gic_hw_ops->do_LPI(irq);
>>> local_irq_disable();
>>> }
>>
>
> Cheers,
>
--
*Andrii Anisov*
|