[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH 2/4] xen/x86: Drop erronious barriers
On 07/12/2016 01:20, Stefano Stabellini wrote: > On Wed, 7 Dec 2016, Andrew Cooper wrote: >> On 06/12/2016 20:32, Stefano Stabellini wrote: >>> On Tue, 6 Dec 2016, Stefano Stabellini wrote: >>>> On Tue, 6 Dec 2016, Andrew Cooper wrote: >>>>> On 05/12/2016 19:17, Stefano Stabellini wrote: >>>>>> On Mon, 5 Dec 2016, Andrew Cooper wrote: >>>>>>> None of these barriers serve any purpose, as they are not synchronising >>>>>>> with >>>>>>> any anything on remote CPUs. >>>>>>> >>>>>>> Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx> >>>>>>> --- >>>>>>> CC: Jan Beulich <JBeulich@xxxxxxxx> >>>>>>> CC: Stefano Stabellini <sstabellini@xxxxxxxxxx> >>>>>>> CC: Julien Grall <julien.grall@xxxxxxx> >>>>>>> >>>>>>> Restricting to just the $ARCH maintainers, as this is a project-wide >>>>>>> sweep. >>>>>>> >>>>>>> Julien/Stefano: I think the ARM smpboot inhereted the erronious barrier >>>>>>> usage >>>>>>> from x86, but I don't know whether further development has gained a >>>>>>> dependence >>>>>>> on them. >>>>>> We turned them into smp_wmb already (kudos to IanC). >>>>> Right, but the entire point I am trying to argue is that they are not >>>>> needed in the first place. >>> Just to be clear, on ARM the barriers are unneeded only if it is >>> unimportant that "init stuff" (which correspond to all the >>> initialization done in start_secondary up to smp_wmb) below is completed >>> before "write cpu_online_map". But it looks like we do want to complete >>> mmu, irq, timer initializations and set the current vcpu before marking >>> the cpu as online, right? >> No. I am sorry, but this question suggests that you still don't appear >> to understand barriers. >> >>> >>>> This is the current code: >>>> >>>> CPU 1 CPU 0 >>>> ----- ----- >>>> >>>> init stuff read cpu_online_map >>>> >>>> write barrier >>>> >>>> write cpu_online_map do more initialization >>>> >>>> write barrier >>>> >>>> init more stuff >>>> >>>> >>>> I agree that it's wrong, because the second write barrier in >>>> start_secondary is useless and in addition we are missing a read barrier >>>> in __cpu_up, corresponding to the first write barrier in >>>> start_secondary. >>>> >>>> I think it should look like: >>>> >>>> >>>> CPU 1 CPU 0 >>>> ----- ----- >>>> >>>> init stuff read cpu_online_map >>>> >>>> write barrier read barrier >>>> >>>> write cpu_online_map do more initialization >>>> >>>> init more stuff >> The barriers here serve no purpose, because you have missed an important >> blocking while loop on CPU 0. >> >> Recall, that the read/write barrier example is: >> >> foo = a; >> smp_rmb(); >> bar = b; >> >> and >> >> a = baz; >> smp_wmb(); >> b = fromble; >> >> This is specifically relevant *only* to the shared variables a and b, >> where for correctness an update to a must be observed remotely before >> the update to b. >> >> If you do not have the explicitly same a and b on either side of the >> smp_rmb/smp_wmb, then your code is wrong (and most likely, you shouldn't >> be using the barriers). >> >> >> The init sequence is a different scenario. >> >> Processor 0 spins waiting to observe an update to cpu_online_map. >> >> Processor 1 performs its init sequence, and mid way through, sets its >> own bit in the cpu_online_map. It then continues further init actions. >> >> It does not matter whether processor 0 observes the update to >> cpu_online_map slightly early or late compared to the local-state >> updates from the other parts of processor 1's init sequence (because >> processor 0 had better not be looking at the local-state changes). > In that case of course there is no need for barriers (I wrote something > similar in the other follow-up email). The case I was worried about is > exactly the one where processor 0 looks at one of the changes made by > processor 1. Looking at an isolated change doesn't involve any ordering. Ordering (and therefore barriers) are only relevant when looking at exactly two related changes which need observing in a particular order. If you can reduce the BSP and AP boot sequence down to the two 3-line examples above (with specifically identified variables/aggregates for a and b on both sides), then you can make an argument for barriers being necessary. I should point out that I don't know whether the barriers are necessary or unnecessary on ARM. All I am trying to do is to ensure that everyone has the same understanding of how barriers work before conclusions are drawn (because they really are tricky). ~Andrew _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx https://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |