[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH 2/4] xen/x86: Drop erronious barriers



On 07/12/2016 01:20, Stefano Stabellini wrote:
> On Wed, 7 Dec 2016, Andrew Cooper wrote:
>> On 06/12/2016 20:32, Stefano Stabellini wrote:
>>> On Tue, 6 Dec 2016, Stefano Stabellini wrote:
>>>> On Tue, 6 Dec 2016, Andrew Cooper wrote:
>>>>> On 05/12/2016 19:17, Stefano Stabellini wrote:
>>>>>> On Mon, 5 Dec 2016, Andrew Cooper wrote:
>>>>>>> None of these barriers serve any purpose, as they are not synchronising 
>>>>>>> with
>>>>>>> any anything on remote CPUs.
>>>>>>>
>>>>>>> Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
>>>>>>> ---
>>>>>>> CC: Jan Beulich <JBeulich@xxxxxxxx>
>>>>>>> CC: Stefano Stabellini <sstabellini@xxxxxxxxxx>
>>>>>>> CC: Julien Grall <julien.grall@xxxxxxx>
>>>>>>>
>>>>>>> Restricting to just the $ARCH maintainers, as this is a project-wide 
>>>>>>> sweep.
>>>>>>>
>>>>>>> Julien/Stefano: I think the ARM smpboot inhereted the erronious barrier 
>>>>>>> usage
>>>>>>> from x86, but I don't know whether further development has gained a 
>>>>>>> dependence
>>>>>>> on them.
>>>>>> We turned them into smp_wmb already (kudos to IanC).
>>>>> Right, but the entire point I am trying to argue is that they are not
>>>>> needed in the first place.
>>> Just to be clear, on ARM the barriers are unneeded only if it is
>>> unimportant that "init stuff" (which correspond to all the
>>> initialization done in start_secondary up to smp_wmb) below is completed
>>> before "write cpu_online_map". But it looks like we do want to complete
>>> mmu, irq, timer initializations and set the current vcpu before marking
>>> the cpu as online, right?
>> No.  I am sorry, but this question suggests that you still don't appear
>> to understand barriers.
>>
>>>
>>>> This is the current code:
>>>>
>>>>     CPU 1                                  CPU 0
>>>>     -----                                  -----
>>>>
>>>>     init stuff                             read cpu_online_map
>>>>
>>>>     write barrier                          
>>>>
>>>>     write cpu_online_map                   do more initialization
>>>>
>>>>     write barrier
>>>>
>>>>     init more stuff
>>>>
>>>>
>>>> I agree that it's wrong, because the second write barrier in
>>>> start_secondary is useless and in addition we are missing a read barrier
>>>> in __cpu_up, corresponding to the first write barrier in
>>>> start_secondary.
>>>>
>>>> I think it should look like:
>>>>
>>>>
>>>>     CPU 1                                  CPU 0
>>>>     -----                                  -----
>>>>
>>>>     init stuff                             read cpu_online_map
>>>>
>>>>     write barrier                          read barrier 
>>>>
>>>>     write cpu_online_map                   do more initialization
>>>>
>>>>     init more stuff
>> The barriers here serve no purpose, because you have missed an important
>> blocking while loop on CPU 0.
>>
>> Recall, that the read/write barrier example is:
>>
>> foo = a;
>> smp_rmb();
>> bar = b;
>>
>> and
>>
>> a = baz;
>> smp_wmb();
>> b = fromble;
>>
>> This is specifically relevant *only* to the shared variables a and b,
>> where for correctness an update to a must be observed remotely before
>> the update to b.
>>
>> If you do not have the explicitly same a and b on either side of the
>> smp_rmb/smp_wmb, then your code is wrong (and most likely, you shouldn't
>> be using the barriers).
>>
>>
>> The init sequence is a different scenario.
>>
>> Processor 0 spins waiting to observe an update to cpu_online_map.
>>
>> Processor 1 performs its init sequence, and mid way through, sets its
>> own bit in the cpu_online_map.  It then continues further init actions.
>>
>> It does not matter whether processor 0 observes the update to
>> cpu_online_map slightly early or late compared to the local-state
>> updates from the other parts of processor 1's init sequence (because
>> processor 0 had better not be looking at the local-state changes).
> In that case of course there is no need for barriers (I wrote something
> similar in the other follow-up email). The case I was worried about is
> exactly the one where processor 0 looks at one of the changes made by
> processor 1.

Looking at an isolated change doesn't involve any ordering.

Ordering (and therefore barriers) are only relevant when looking at
exactly two related changes which need observing in a particular order. 
If you can reduce the BSP and AP boot sequence down to the two 3-line
examples above (with specifically identified variables/aggregates for a
and b on both sides), then you can make an argument for barriers being
necessary.

I should point out that I don't know whether the barriers are necessary
or unnecessary on ARM.  All I am trying to do is to ensure that everyone
has the same understanding of how barriers work before conclusions are
drawn (because they really are tricky).

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.