[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v11 01/34] ARM: vGIC: avoid rank lock when reading priority



Hi,

On 14/06/17 09:55, Julien Grall wrote:
> 
> 
> On 06/13/2017 11:19 PM, Stefano Stabellini wrote:
>> On Tue, 13 Jun 2017, Julien Grall wrote:
>>> On 12/06/2017 23:34, Stefano Stabellini wrote:
>>>> On Mon, 12 Jun 2017, Julien Grall wrote:
>>>>> Hi Andre,
>>>>>
>>>>> On 09/06/17 18:41, Andre Przywara wrote:
>>>>>> When reading the priority value of a virtual interrupt, we were
>>>>>> taking
>>>>>> the respective rank lock so far.
>>>>>> However for forwarded interrupts (Dom0 only so far) this may lead
>>>>>> to a
>>>>>> deadlock with the following call chain:
>>>>>> - MMIO access to change the IRQ affinity, calling the ITARGETSR
>>>>>> handler
>>>>>> - this handler takes the appropriate rank lock and calls
>>>>>> vgic_store_itargetsr()
>>>>>> - vgic_store_itargetsr() will eventually call vgic_migrate_irq()
>>>>>> - if this IRQ is already in-flight, it will remove it from the old
>>>>>>    VCPU and inject it into the new one, by calling
>>>>>> vgic_vcpu_inject_irq()
>>>>>> - vgic_vcpu_inject_irq will call vgic_get_virq_priority()
>>>>>> - vgic_get_virq_priority() tries to take the rank lock - again!
>>>>>> It seems like this code path has never been exercised before.
>>>>>>
>>>>>> Fix this by avoiding taking the lock in vgic_get_virq_priority()
>>>>>> (like
>>>>>> we
>>>>>> do in vgic_get_target_vcpu()).
>>>>>> Actually we are just reading one byte, and priority changes while
>>>>>> interrupts are handled are a benign race that can happen on real
>>>>>> hardware
>>>>>> too. So it is safe to just prevent the compiler from reading from the
>>>>>> struct more than once.
>>>>>>
>>>>>> Signed-off-by: Andre Przywara <andre.przywara@xxxxxxx>
>>>>>> ---
>>>>>>   xen/arch/arm/vgic-v2.c | 13 ++++++++-----
>>>>>>   xen/arch/arm/vgic-v3.c | 11 +++++++----
>>>>>>   xen/arch/arm/vgic.c    |  8 +-------
>>>>>>   3 files changed, 16 insertions(+), 16 deletions(-)
>>>>>>
>>>>>> diff --git a/xen/arch/arm/vgic-v2.c b/xen/arch/arm/vgic-v2.c
>>>>>> index dc9f95b..5370020 100644
>>>>>> --- a/xen/arch/arm/vgic-v2.c
>>>>>> +++ b/xen/arch/arm/vgic-v2.c
>>>>>> @@ -258,9 +258,9 @@ static int vgic_v2_distr_mmio_read(struct vcpu
>>>>>> *v,
>>>>>> mmio_info_t *info,
>>>>>>           if ( rank == NULL ) goto read_as_zero;
>>>>>>
>>>>>>           vgic_lock_rank(v, rank, flags);
>>>>>> -        ipriorityr = rank->ipriorityr[REG_RANK_INDEX(8,
>>>>>> -                                                     gicd_reg -
>>>>>> GICD_IPRIORITYR,
>>>>>> -                                                     DABT_WORD)];
>>>>>> +        ipriorityr = ACCESS_ONCE(rank->ipriorityr[REG_RANK_INDEX(8,
>>>>>> +                                     gicd_reg - GICD_IPRIORITYR,
>>>>>> +                                     DABT_WORD)]);
>>>>>
>>>>> The indentation is a bit odd. Can you introduce a temporary
>>>>> variable here?
>>>>>
>>>>>>           vgic_unlock_rank(v, rank, flags);
>>>>>>           *r = vgic_reg32_extract(ipriorityr, info);
>>>>>>
>>>>>> @@ -499,7 +499,7 @@ static int vgic_v2_distr_mmio_write(struct
>>>>>> vcpu *v,
>>>>>> mmio_info_t *info,
>>>>>>
>>>>>>       case VRANGE32(GICD_IPRIORITYR, GICD_IPRIORITYRN):
>>>>>>       {
>>>>>> -        uint32_t *ipriorityr;
>>>>>> +        uint32_t *ipriorityr, priority;
>>>>>>
>>>>>>           if ( dabt.size != DABT_BYTE && dabt.size != DABT_WORD )
>>>>>> goto
>>>>>> bad_width;
>>>>>>           rank = vgic_rank_offset(v, 8, gicd_reg - GICD_IPRIORITYR,
>>>>>> DABT_WORD);
>>>>>> @@ -508,7 +508,10 @@ static int vgic_v2_distr_mmio_write(struct
>>>>>> vcpu *v,
>>>>>> mmio_info_t *info,
>>>>>>           ipriorityr = &rank->ipriorityr[REG_RANK_INDEX(8,
>>>>>>                                                         gicd_reg -
>>>>>> GICD_IPRIORITYR,
>>>>>>                                                         DABT_WORD)];
>>>>>> -        vgic_reg32_update(ipriorityr, r, info);
>>>>>> +        priority = ACCESS_ONCE(*ipriorityr);
>>>>>> +        vgic_reg32_update(&priority, r, info);
>>>>>> +        ACCESS_ONCE(*ipriorityr) = priority;
>>>>>
>>>>> This is a bit odd to read because of the dereferencing. I admit that I
>>>>> would
>>>>> prefer if you use read_atomic/write_atomic which are easier to
>>>>> understand
>>>>> (though the naming is confusing).
>>>>>
>>>>> Let see what Stefano thinks here.
>>>>
>>>> I also prefer *_atomic, especially given what Jan wrote about
>>>> ACCESS_ONCE:
>>>>
>>>>    Plus ACCESS_ONCE() doesn't enforce a single instruction to be
>>>> used in
>>>>    the resulting assembly).
>>>
>>> I don't buy this argument. There are quite a few places we rely on
>>> assignment
>>> to be atomic (see PV protocols for instance).
>>
>> I don't understand your explanation. There are no PV protocols under
>> xen/, they are implemented in other repositories. I grepped for ACCESS
>> under xen/include/public, in case you referred to the PV protocol
>> headers, but couldn't find anything interesting.
> 
> Have a look at the pl011 emulation from Bhupinder. It will use plain '='
> for updating the PV drivers. So can you explain why it is fine there and
> not here?
> 
>>
>>
>>> Furthermore implementation of
>>> atomic_read/atomic_write in Linux (both ARM and x86) is based on
>>> WRITE_ONCE/READ_ONCE, on Xen it is a simple assignment.
>>
>> I don't follow why you are referring to Linux constructs in this
>> discussion about Xen atomic functions.
> 
> My point here is Xen and Linux are very similar. Actually a lot of the
> atomic code is been taken from Linux (have a look to our
> atomic_read/atomic_write).
> 
> As the atomic code was added the code by you (likely from Linux), I
> don't understand why you don't complain about the atomic implementation
> but ACCESS_ONCE.
> 
>>
>>
>>> In any case, all those macros does not prevent re-ordering at the
>>> processor
>>> level nor read/write atomicity if the variable is misaligned.
>>
>> My understanding is that the unwritten assumption in Xen is that
>> variables are always aligned. You are right about processor level
>> reordering, in fact when needed we have to have barriers
>>
>> I have read Andre's well written README.atomic, and he ends the
>> document stating the following:
>>
>>
>>> This makes read and write accesses to ints and longs (and their
>>> respective
>>> unsigned counter parts) naturally atomic.
>>> However it would be beneficial to use atomic primitives anyway to
>>> annotate
>>> the code as being concurrent and to prevent silent breakage when
>>> changing
>>> the code
>>
>> with which I completely agree
> 
> Which means you are happy to use either ACCESS_ONCE or
> read_atomic/write_atomic as they in-fine exactly the same on the
> compiler we support.

So I think in this case it's not the same.
read_atomic() would prevent the compiler from breaking up the assignment
into multiple reads, which indeed does not seem necessary here (as it is
aligned, especially due to being a member of an array).
But we have to prevent the compiler from reading the variable *multiple
times* (with the value possibly changing in-between), so ACCESS_ONCE()
is needed, IMHO. It is beyond me to see through how the
vgic_reg32_extract/vgic_reg32_update macro expansion actually ends up,
but I reckon the compiler can break this up and inline it and would be
*allowed* to dereference the pointer several times.
Having said this, in practise it may not be relevant though, as the
compiler probably will only read it once anyway, because a read is
deemed comparably expensive and there is hardly any register pressure on
arm64. But for documenting that this is a lockless read outside of the
lock I'd rather have some annotation in the code.

As for the readability argument of "ACCESS_ONCE(*ptr) = foo;":
I agree, but was wondering why we don't have Linux' READ_ONCE/WRITE_ONCE
pairs in the first place. I see that the implementation of them is
nicely merged into one macro, but maybe we should have wrappers to
improve readability? Not sure that's something for this patch, though.
So can we postpone this for the rework? This will be touched anyway
because it's dealing with a rank.

Cheers,
Andre.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.