Re: [PATCH] rwlock: remove unneeded subtraction
On 15.02.2022 15:16, Roger Pau Monné wrote:
> On Tue, Feb 15, 2022 at 02:22:02PM +0100, Jan Beulich wrote:
>> On 15.02.2022 14:13, Julien Grall wrote:
>>> On 15/02/2022 09:39, Roger Pau Monne wrote:
>>>> There's no need to subtract _QR_BIAS from the lock value for storing
>>>> in the local cnts variable in the read lock slow path: the users of
>>>> the value in cnts only care about the writer-related bits and use a
>>>> mask to get the value.
>>>>
>>>> Note that further setting of cnts in rspin_until_writer_unlock already
>>>> does not subtract _QR_BIAS.
>>>
>>> The rwlock is a copy of the Linux implementation. So I looked at the
>>> history to find out why _QR_BIAS was subtracted.
>>>
>>> It looks like this was done to get better assembly on x86:
>>>
>>> commit f9852b74bec0117b888da39d070c323ea1cb7f4c
>>> Author: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
>>> Date:   Mon Apr 18 01:27:03 2016 +0200
>>>
>>>     locking/atomic, arch/qrwlock: Employ atomic_fetch_add_acquire()
>>>
>>>     The only reason for the current code is to make GCC emit only the
>>>     "LOCK XADD" instruction on x86 (and not do a pointless extra ADD on
>>>     the result), do so nicer.
>>>
>>>     Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
>>>     Acked-by: Waiman Long <waiman.long@xxxxxxx>
>>>     Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
>>>     Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
>>>     Cc: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>
>>>     Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
>>>     Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
>>>     Cc: linux-arch@xxxxxxxxxxxxxxx
>>>     Cc: linux-kernel@xxxxxxxxxxxxxxx
>>>     Signed-off-by: Ingo Molnar <mingo@xxxxxxxxxx>
>>>
>>> diff --git a/kernel/locking/qrwlock.c b/kernel/locking/qrwlock.c
>>> index fec082338668..19248ddf37ce 100644
>>> --- a/kernel/locking/qrwlock.c
>>> +++ b/kernel/locking/qrwlock.c
>>> @@ -93,7 +93,7 @@ void queued_read_lock_slowpath(struct qrwlock *lock, u32 cnts)
>>>      * that accesses can't leak upwards out of our subsequent critical
>>>      * section in the case that the lock is currently held for write.
>>>      */
>>> -   cnts = atomic_add_return_acquire(_QR_BIAS, &lock->cnts) - _QR_BIAS;
>>> +   cnts = atomic_fetch_add_acquire(_QR_BIAS, &lock->cnts);
>>>     rspin_until_writer_unlock(lock, cnts);
>>>
>>>  /*
>>>
>>> This is a slowpath, so probably not a concern. But I thought I would
>>> double check whether the x86 folks are still happy to proceed with that
>>> in mind.
>>
>> Hmm, that's an interesting observation. Roger - did you inspect the
>> generated code? At the very least the description may want amending.
>
> It seems to always generate the same code for me when using gcc 8.3;
> I even tried using arch_fetch_and_add directly, and it always results
> in:
>
> ffff82d04022d983:  f0 0f c1 03        lock xadd %eax,(%rbx)
> ffff82d04022d987:  25 00 30 00 00     and  $0x3000,%eax
>
> Similarly, clang 13.0.0 seems to always generate:
>
> ffff82d0402085de:  f0 0f c1 03        lock xadd %eax,(%rbx)
> ffff82d0402085e2:  89 c1              mov  %eax,%ecx
> ffff82d0402085e4:  81 e1 00 30 00 00  and  $0x3000,%ecx
>
> Maybe I'm missing something, but I don't see a difference in the
> generated code.

I've looked myself in the meantime, and I can largely confirm this.
Clang 5 doesn't eliminate the "add" (or really "lea") though. But
nevertheless ...

> I could add to the commit message:
>
> "Originally _QR_BIAS was subtracted from the result of
> atomic_add_return_acquire in order to prevent GCC from emitting an
> unneeded ADD instruction. This being in the lock slow path, such
> optimizations don't seem likely to make any relevant performance
> difference. Also modern GCC and CLANG versions will already avoid
> emitting the ADD instruction."

... I'm fine with this as explanation; I'd also be fine adding this to
the description while committing.

Jan
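To make the mask argument concrete, here is a minimal standalone sketch.
The _QW_WMASK and _QR_SHIFT values below are assumptions inferred from the
"and $0x3000" in the disassembly quoted above, not copied from the Xen
tree. The point is purely arithmetic: _QR_BIAS is a multiple of 1 << 14,
so subtracting it can never borrow into the writer-flag bits, and masking
with _QW_WMASK yields the same result with or without the subtraction.

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Assumed bit layout, inferred from the thread's disassembly:
 * writer flags in bits 12-13, reader count starting at bit 14.
 */
#define _QW_WMASK  (3U << 12)       /* writer flag bits (0x3000) */
#define _QR_SHIFT  14               /* reader count shift */
#define _QR_BIAS   (1U << _QR_SHIFT)

int main(void)
{
    /* Arbitrary lock values: various reader counts and writer bits. */
    uint32_t samples[] = { 0x0, 0x1000, 0x3abc, 0x7f3abc, 0xffffffff };
    unsigned int i;

    for ( i = 0; i < sizeof(samples) / sizeof(samples[0]); i++ )
    {
        uint32_t cnts = samples[i];

        /*
         * Subtracting a multiple of 1 << 14 leaves bits 0-13 unchanged
         * (the value is unchanged modulo 1 << 14), so the writer bits
         * extracted with _QW_WMASK are identical either way.
         */
        assert(((cnts - _QR_BIAS) & _QW_WMASK) == (cnts & _QW_WMASK));
    }

    printf("writer bits unaffected by _QR_BIAS subtraction\n");
    return 0;
}

With cnts only ever consumed through that mask, a compiler is free to
drop the subtraction entirely, which matches the identical gcc and clang
output quoted in the thread.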