Re: [PATCH] rwlock: remove unneeded subtraction
On 15.02.2022 15:16, Roger Pau Monné wrote:
> On Tue, Feb 15, 2022 at 02:22:02PM +0100, Jan Beulich wrote:
>> On 15.02.2022 14:13, Julien Grall wrote:
>>> On 15/02/2022 09:39, Roger Pau Monne wrote:
>>>> There's no need to subtract _QR_BIAS from the lock value for storing
>>>> in the local cnts variable in the read lock slow path: the users of
>>>> the value in cnts only care about the writer-related bits and use a
>>>> mask to get the value.
>>>>
>>>> Note that further setting of cnts in rspin_until_writer_unlock already
>>>> does not subtract _QR_BIAS.
>>>
>>> The rwlock is a copy of the Linux implementation. So I looked at the
>>> history to find out why _QR_BIAS was subtracted.
>>>
>>> It looks like this was done to get better assembly on x86:
>>>
>>> commit f9852b74bec0117b888da39d070c323ea1cb7f4c
>>> Author: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
>>> Date:   Mon Apr 18 01:27:03 2016 +0200
>>>
>>>     locking/atomic, arch/qrwlock: Employ atomic_fetch_add_acquire()
>>>
>>>     The only reason for the current code is to make GCC emit only the
>>>     "LOCK XADD" instruction on x86 (and not do a pointless extra ADD on
>>>     the result), do so nicer.
>>>
>>>     Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
>>>     Acked-by: Waiman Long <waiman.long@xxxxxxx>
>>>     Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
>>>     Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
>>>     Cc: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>
>>>     Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
>>>     Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
>>>     Cc: linux-arch@xxxxxxxxxxxxxxx
>>>     Cc: linux-kernel@xxxxxxxxxxxxxxx
>>>     Signed-off-by: Ingo Molnar <mingo@xxxxxxxxxx>
>>>
>>> diff --git a/kernel/locking/qrwlock.c b/kernel/locking/qrwlock.c
>>> index fec082338668..19248ddf37ce 100644
>>> --- a/kernel/locking/qrwlock.c
>>> +++ b/kernel/locking/qrwlock.c
>>> @@ -93,7 +93,7 @@ void queued_read_lock_slowpath(struct qrwlock *lock, u32 cnts)
>>>      * that accesses can't leak upwards out of our subsequent critical
>>>      * section in the case that the lock is currently held for write.
>>>      */
>>> -   cnts = atomic_add_return_acquire(_QR_BIAS, &lock->cnts) - _QR_BIAS;
>>> +   cnts = atomic_fetch_add_acquire(_QR_BIAS, &lock->cnts);
>>>     rspin_until_writer_unlock(lock, cnts);
>>>
>>>  /*
>>>
>>> This is a slowpath, so probably not a concern. But I thought I would
>>> double check whether the x86 folks are still happy to proceed with that
>>> in mind.
>>
>> Hmm, that's an interesting observation. Roger - did you inspect the
>> generated code? At the very least the description may want amending.
>
> It seems to always generate the same code for me when using gcc 8.3;
> I even tried using arch_fetch_and_add directly, and it always results
> in:
>
> ffff82d04022d983:  f0 0f c1 03        lock xadd %eax,(%rbx)
> ffff82d04022d987:  25 00 30 00 00     and  $0x3000,%eax
>
> Similarly, clang 13.0.0 seems to always generate:
>
> ffff82d0402085de:  f0 0f c1 03        lock xadd %eax,(%rbx)
> ffff82d0402085e2:  89 c1              mov  %eax,%ecx
> ffff82d0402085e4:  81 e1 00 30 00 00  and  $0x3000,%ecx
>
> Maybe I'm missing something, but I don't see a difference in the
> generated code.

I've looked myself in the meantime, and I can largely confirm this.
Clang 5 doesn't eliminate the "add" (or really "lea") though. But
nevertheless ...

> I could add to the commit message:
>
> "Originally _QR_BIAS was subtracted from the result of
> atomic_add_return_acquire in order to prevent GCC from emitting an
> unneeded ADD instruction. This being in the lock slow path, such
> optimizations don't seem likely to make any relevant performance
> difference. Also modern GCC and CLANG versions will already avoid
> emitting the ADD instruction."

... I'm fine with this as explanation; I'd also be fine adding this to
the description while committing.

Jan
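To make the mask argument concrete, here is a minimal standalone sketch.
The _QW_WMASK and _QR_SHIFT values below are assumptions inferred from the
"and $0x3000" in the disassembly quoted above, not copied from the Xen
tree. The point is purely arithmetic: _QR_BIAS is a multiple of 1 << 14,
so subtracting it can never borrow into the writer-flag bits, and masking
with _QW_WMASK yields the same result with or without the subtraction.

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Assumed bit layout, inferred from the thread's disassembly:
 * writer flags in bits 12-13, reader count starting at bit 14.
 */
#define _QW_WMASK  (3U << 12)       /* writer flag bits (0x3000) */
#define _QR_SHIFT  14               /* reader count shift */
#define _QR_BIAS   (1U << _QR_SHIFT)

int main(void)
{
    /* Arbitrary lock values: various reader counts and writer bits. */
    uint32_t samples[] = { 0x0, 0x1000, 0x3abc, 0x7f3abc, 0xffffffff };
    unsigned int i;

    for ( i = 0; i < sizeof(samples) / sizeof(samples[0]); i++ )
    {
        uint32_t cnts = samples[i];

        /*
         * Subtracting a multiple of 1 << 14 leaves bits 0-13 unchanged
         * (the value is unchanged modulo 1 << 14), so the writer bits
         * extracted with _QW_WMASK are identical either way.
         */
        assert(((cnts - _QR_BIAS) & _QW_WMASK) == (cnts & _QW_WMASK));
    }

    printf("writer bits unaffected by _QR_BIAS subtraction\n");
    return 0;
}

With cnts only ever consumed through that mask, a compiler is free to
drop the subtraction entirely, which matches the identical gcc and clang
output quoted in the thread.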