|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [RFC PATCH v2 00/15] xen/arm: port Linux LL/SC and LSE atomics helpers to Xen
On 17/12/2020 15:37, Ash Wilding wrote: Hi Julien, Hi, First of all, apologies for the very late reply. Thanks for taking a look at the patches and providing feedback. I've seen your other comments and will reply to those separately when I get a chance (maybe at the weekend or over the Christmas break). RE the differences in ordering semantics between Xen's and Linux's atomics helpers, please find my notes below. Thoughts? Thank you for the very detailed answer, it made a lot easier to understand the differences! I think it would be good to have some (if not all) the content in Documents for future reference. [...] If I am not mistaken, Linux is implementing atomic_cmpxchg() with the *_mb() version. So the semantic would be: ---/---/rel/mem/dmb From Linux:#define __XCHG_CASE(w, sfx, name, sz, mb, nop_lse, acq, acq_lse, rel, cl) \ [...]/* LL/SC */ \ " prfm pstl1strm, %2\n" \ "1: ld" #acq "xr" #sfx "\t%" #w "0, %2\n" \ " st" #rel "xr" #sfx "\t%w1, %" #w "3, %2\n" \ " cbnz %w1, 1b\n" \ " " #mb, \ [...] __XCHG_CASE(w, , mb_, 32, dmb ish, nop, , a, l, "memory") So I think the Linux semantic would be: ---/---/rel/mem/dmb Therefore there would be no inconsistency between Xen and Linux. I have done some digging. The original implementation of atomic_{sub, add}_return was actually using acq/rel semantics up until Linux 3.14. But this was reworked by 8e86f0b409a4 "arm64: atomics: fix use of acquire + release for full barrier semantics". The commit I pointed above contains a lot of details on the issue. For convenience, I copied the most relevant bits below:
"
On arm64, these operations have been incorrectly implemented as follows:
// A, B, C are independent memory locations
<Access [A]>
// atomic_op (B)
1: ldaxr x0, [B] // Exclusive load with acquire
<op(B)>
stlxr w1, x0, [B] // Exclusive store with release
cbnz w1, 1b
<Access [C]>
The assumption here being that two half barriers are equivalent to a
full barrier, so the only permitted ordering would be A -> B -> C
(where B is the atomic operation involving both a load and a store).
Unfortunately, this is not the case by the letter of the architecture
and, in fact, the accesses to A and C are permitted to pass their
nearest half barrier resulting in orderings such as Bl -> A -> C -> Bs
or Bl -> C -> A -> Bs (where Bl is the load-acquire on B and Bs is the
store-release on B). This is a clear violation of the full barrier
requirement.
"
I think the DMB is only on the success path and there is no barrier on the failure path. The commit 4e39715f4b5c "arm64: cmpxchg: avoid memory barrier on comparison failure" seems to corroborate that. The atomic helpers used in Xen were originally taken from Linux 3.14. Back then, atomic_cmpxchg() were using the two full barriers. This was introduced by the commit I pointed out in (1). This was then optimized with commit 4e39715f4b5c "arm64: cmpxchg: avoid memory barrier on comparison failure". If my understanding is correct the semantics would be (xen vs Linux): - failure: dmb/---/---/---/dmb ---/---/rel/mem/--- - success: dmb/---/---/---/dmb ---/---/rel/mem/dmbI think the success path is not going to be a problem. But we would need to check if all the callers expect a full barrier for the failure path (I would hope not). IIUC, the LSE implementation would be using a single instruction that
has both the acquire and release semantics. Therefore, it would act as a
full barrier.
On the other hand, in the LL/SC implementation, the acquire and release semantics is happening with two different instruction. Therefore, they don't act as a full barrier. So I think the memory ordering is going to be equivalent between Xen and the Linux LSE implementation. Are they really different? In both cases, the helper will act as a full
barrier. The main difference is how this ordering is achieved.
Linux will do the same when the fully ordered version is not implemented (see include/linux/atomic-fallback.h). That's good to know! So far, I haven't heard anyone using Xen 32-bit on Armv8. I would wait until there is a request before introducing a 3rd (4th if counting the ll/sc as 2) implementation for the atomics helpers. That said, the 32-bit port is meant to only be supported on Armv7. It should boot on Armv8, but there is no promise it will be fully functional nor stable. Overall, it looks like to me that re-syncing the atomic implementation with Linux should not be a major problem. We are in the middle of 4.15 freeze, I will try to go through the series ASAP. Cheers, -- Julien Grall
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |