Xen project Mailing List

Re: [Xen-devel] [PATCH] x86/atomic: Improvements and simplifications to assembly constraints

From: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>

Date: Thu, 22 Nov 2018 12:38:56 +0000


Cc: Xen-devel <xen-devel@xxxxxxxxxxxxx>, Wei Liu <wei.liu2@xxxxxxxxxx>, Roger Pau Monne <roger.pau@xxxxxxxxxx>

Delivery-date: Thu, 22 Nov 2018 12:39:16 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

Openpgp: preference=signencrypt

On 22/11/2018 08:57, Jan Beulich wrote:
> >>> On 21.11.18 at 20:37, <andrew.cooper3@xxxxxxxxxx> wrote:
>> * Some of the single-byte versions specify "=q" as the output. AFAICT, there
>>   was not a legitimate reason to restrict the use of %esi/%edi in the 32-bit
>>   build. Either way, in 64-bit, it is equivalent to "=r".
> I'm confused about the 32-bit part here: Of course it was necessary
> to restrict the compiler to the low 4 registers in that case. It's just
> not clear to me whether you've just written it down wrongly, or
> whether you indeed think the way it reads to me.

Wait - are you saying that the combination of "=r" and %b0 would
actually fail to build if the compiler happened to choose %edi/%esi?
Now that you point it out, I can see why %esi/%edi aren't actually
encodable in this circumstance, but surely the fact that the compiler
has to pick a byte register means that it wouldn't end up choosing these?

>
>> * Constraints in the form "=r" (x) : "0" (x) can be folded to just "+r" (x)
>> * Switch to using named parameters (mostly for legibility) which in
>>   particular helps with...
>> * __xchg(), __cmpxchg() and __cmpxchg_user() modify their memory operand, so
>>   must list it as an output operand. This only works because they each have
>>   a memory clobber to give the construct full compiler-barrier properties.
>> * Every memory operand has an explicit known size. Letting the compiler see
>>   the real size rather than obscuring it with __xg() allows for the removal
>>   of the instruction size suffixes without introducing ambiguity.
>>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
>> ---
>> CC: Jan Beulich <JBeulich@xxxxxxxx>
>> CC: Wei Liu <wei.liu2@xxxxxxxxxx>
>> CC: Roger Pau Monné <roger.pau@xxxxxxxxxx>
>>
>> Interestingly, switching to use output memory operands has the following
>> perturbance in the build:
>>
>>   add/remove: 0/0 grow/shrink: 3/5 up/down: 70/-124 (-54)
>>   Function                        old     new   delta
>>   do_mmu_update                  7041    7101     +60
>>   mctelem_process_deferred        234     242      +8
>>   cpufreq_governor_dbs            851     853      +2
>>   _set_status                     162     161      -1
>>   create_irq                      325     323      -2
>>   do_tmem_put                    2066    2062      -4
>>   task_switch_load_seg            892     884      -8
>>   _get_page_type                 6057    5948    -109
>>
>> but as far as I can tell, it is exclusively down to different register
>> scheduling choices.
>> ---
>>  xen/include/asm-x86/system.h        | 99 +++++++++++++++++-------------------
>>  xen/include/asm-x86/x86_64/system.h | 24 ++++-----
>>  2 files changed, 57 insertions(+), 66 deletions(-)
>>
>> diff --git a/xen/include/asm-x86/system.h b/xen/include/asm-x86/system.h
>> index 483cd20..8764e31 100644
>> --- a/xen/include/asm-x86/system.h
>> +++ b/xen/include/asm-x86/system.h
>> @@ -23,9 +23,6 @@
>>  #define xchg(ptr,v) \
>>      ((__typeof__(*(ptr)))__xchg((unsigned long)(v),(ptr),sizeof(*(ptr))))
>>
>> -struct __xchg_dummy { unsigned long a[100]; };
>> -#define __xg(x) ((volatile struct __xchg_dummy *)(x))
> I never fully understood why we have this, so I'm happy to see it
> go away. I see it has gone away in Linux back in 2.6.36.

I've been trying to get rid of it for several years now. I'm glad to
see it gone.
>
>> @@ -79,31 +72,27 @@ static always_inline unsigned long __cmpxchg(
>>      switch ( size )
>>      {
>>      case 1:
>> -        asm volatile ( "lock; cmpxchgb %b1,%2"
>> -                       : "=a" (prev)
>> -                       : "q" (new), "m" (*__xg(ptr)),
>> -                       "0" (old)
>> +        asm volatile ( "lock; cmpxchg %b[new], %[ptr]"
>> +                       : "=a" (prev), [ptr] "+m" (*(uint8_t *)ptr)
>> +                       : [new] "r" (new), "0" (old)
> Any reason you retain the reference by number in the input
> constraint here, rather than giving its corresponding output
> one a name?

Not specifically. I suppose this is doable because the constraint is
explicitly a register.

>
> Also since you're playing with this anyway - is there a need to
> retain the bogus ; after "lock"?

Ok.

>
>> --- a/xen/include/asm-x86/x86_64/system.h
>> +++ b/xen/include/asm-x86/x86_64/system.h
>> @@ -25,7 +25,7 @@ static always_inline __uint128_t __cmpxchg16b(
>>
>>      /* Don't use "=A" here - clang can't deal with that. */
>>      asm volatile ( "lock; cmpxchg16b %2"
> Any reason not to change this to named operands as well?

Simply code perturbance, and the fact that it isn't exactly ambiguous to
begin with. I can change it.

>
>> @@ -63,36 +63,38 @@ static always_inline __uint128_t cmpxchg16b_local_(
>>   * If no fault occurs then _o is updated to the value we saw at _p. If this
>>   * is the same as the initial value of _o then _n is written to location _p.
>>   */
>> -#define __cmpxchg_user(_p,_o,_n,_isuff,_oppre,_regtype) \
>> +#define __cmpxchg_user(_p, _o, _n, _oppre) \
>>      stac(); \
>>      asm volatile ( \
>> -        "1: lock; cmpxchg"_isuff" %"_oppre"2,%3\n" \
>> +        "1: lock; cmpxchg %"_oppre"[new], %[ptr]\n" \
>>          "2:\n" \
>>          ".section .fixup,\"ax\"\n" \
>>          "3: movl $1,%1\n" \
> And what about this?

I'm certain that I fixed that at one point. I must have lost it in a
rebase.
>
>>          " jmp 2b\n" \
>>          ".previous\n" \
>>          _ASM_EXTABLE(1b, 3b) \
>> -        : "=a" (_o), "=r" (_rc) \
>> -        : _regtype (_n), "m" (*__xg((volatile void *)_p)), "0" (_o), "1" (0) \
>> +        : "+a" (_o), "=r" (_rc), \
>> +          [ptr] "+m" (*(volatile typeof(*(_p)) *)(_p)) \
> Does casting to add "volatile" here really make any difference,
> considering the asm() itself is a volatile one and has a memory
> clobber?

Yes. mod_l1_entry() hits a BUG() without it.

Until I understand why, I purposefully didn't change the volatility of
any of these constructs in what is otherwise a cleanup patch.

~Andrew
