[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] x86/elf: Improve code generation in elf_core_save_regs()


  • To: Jan Beulich <jbeulich@xxxxxxxx>
  • From: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
  • Date: Tue, 11 Mar 2025 14:21:51 +0000
  • Autocrypt: addr=andrew.cooper3@xxxxxxxxxx; keydata= xsFNBFLhNn8BEADVhE+Hb8i0GV6mihnnr/uiQQdPF8kUoFzCOPXkf7jQ5sLYeJa0cQi6Penp VtiFYznTairnVsN5J+ujSTIb+OlMSJUWV4opS7WVNnxHbFTPYZVQ3erv7NKc2iVizCRZ2Kxn srM1oPXWRic8BIAdYOKOloF2300SL/bIpeD+x7h3w9B/qez7nOin5NzkxgFoaUeIal12pXSR Q354FKFoy6Vh96gc4VRqte3jw8mPuJQpfws+Pb+swvSf/i1q1+1I4jsRQQh2m6OTADHIqg2E ofTYAEh7R5HfPx0EXoEDMdRjOeKn8+vvkAwhviWXTHlG3R1QkbE5M/oywnZ83udJmi+lxjJ5 YhQ5IzomvJ16H0Bq+TLyVLO/VRksp1VR9HxCzItLNCS8PdpYYz5TC204ViycobYU65WMpzWe LFAGn8jSS25XIpqv0Y9k87dLbctKKA14Ifw2kq5OIVu2FuX+3i446JOa2vpCI9GcjCzi3oHV e00bzYiHMIl0FICrNJU0Kjho8pdo0m2uxkn6SYEpogAy9pnatUlO+erL4LqFUO7GXSdBRbw5 gNt25XTLdSFuZtMxkY3tq8MFss5QnjhehCVPEpE6y9ZjI4XB8ad1G4oBHVGK5LMsvg22PfMJ ISWFSHoF/B5+lHkCKWkFxZ0gZn33ju5n6/FOdEx4B8cMJt+cWwARAQABzSlBbmRyZXcgQ29v cGVyIDxhbmRyZXcuY29vcGVyM0BjaXRyaXguY29tPsLBegQTAQgAJAIbAwULCQgHAwUVCgkI CwUWAgMBAAIeAQIXgAUCWKD95wIZAQAKCRBlw/kGpdefoHbdD/9AIoR3k6fKl+RFiFpyAhvO 59ttDFI7nIAnlYngev2XUR3acFElJATHSDO0ju+hqWqAb8kVijXLops0gOfqt3VPZq9cuHlh IMDquatGLzAadfFx2eQYIYT+FYuMoPZy/aTUazmJIDVxP7L383grjIkn+7tAv+qeDfE+txL4 SAm1UHNvmdfgL2/lcmL3xRh7sub3nJilM93RWX1Pe5LBSDXO45uzCGEdst6uSlzYR/MEr+5Z JQQ32JV64zwvf/aKaagSQSQMYNX9JFgfZ3TKWC1KJQbX5ssoX/5hNLqxMcZV3TN7kU8I3kjK mPec9+1nECOjjJSO/h4P0sBZyIUGfguwzhEeGf4sMCuSEM4xjCnwiBwftR17sr0spYcOpqET ZGcAmyYcNjy6CYadNCnfR40vhhWuCfNCBzWnUW0lFoo12wb0YnzoOLjvfD6OL3JjIUJNOmJy RCsJ5IA/Iz33RhSVRmROu+TztwuThClw63g7+hoyewv7BemKyuU6FTVhjjW+XUWmS/FzknSi dAG+insr0746cTPpSkGl3KAXeWDGJzve7/SBBfyznWCMGaf8E2P1oOdIZRxHgWj0zNr1+ooF /PzgLPiCI4OMUttTlEKChgbUTQ+5o0P080JojqfXwbPAyumbaYcQNiH1/xYbJdOFSiBv9rpt TQTBLzDKXok86M7BTQRS4TZ/ARAAkgqudHsp+hd82UVkvgnlqZjzz2vyrYfz7bkPtXaGb9H4 Rfo7mQsEQavEBdWWjbga6eMnDqtu+FC+qeTGYebToxEyp2lKDSoAsvt8w82tIlP/EbmRbDVn 7bhjBlfRcFjVYw8uVDPptT0TV47vpoCVkTwcyb6OltJrvg/QzV9f07DJswuda1JH3/qvYu0p vjPnYvCq4NsqY2XSdAJ02HrdYPFtNyPEntu1n1KK+gJrstjtw7KsZ4ygXYrsm/oCBiVW/OgU g/XIlGErkrxe4vQvJyVwg6YH653YTX5hLLUEL1NS4TCo47RP+wi6y+TnuAL36UtK/uFyEuPy wwrDVcC4cIFhYSfsO0BumEI65yu7a8aHbGfq2lW251UcoU48Z27ZUUZd2Dr6O/n8poQHbaTd 6bJJSjzGGHZVbRP9UQ3lkmkmc0+XCHmj5WhwNNYjgbbmML7y0fsJT5RgvefAIFfHBg7fTY/i kBEimoUsTEQz+N4hbKwo1hULfVxDJStE4sbPhjbsPCrlXf6W9CxSyQ0qmZ2bXsLQYRj2xqd1 bpA+1o1j2N4/au1R/uSiUFjewJdT/LX1EklKDcQwpk06Af/N7VZtSfEJeRV04unbsKVXWZAk uAJyDDKN99ziC0Wz5kcPyVD1HNf8bgaqGDzrv3TfYjwqayRFcMf7xJaL9xXedMcAEQEAAcLB XwQYAQgACQUCUuE2fwIbDAAKCRBlw/kGpdefoG4XEACD1Qf/er8EA7g23HMxYWd3FXHThrVQ HgiGdk5Yh632vjOm9L4sd/GCEACVQKjsu98e8o3ysitFlznEns5EAAXEbITrgKWXDDUWGYxd pnjj2u+GkVdsOAGk0kxczX6s+VRBhpbBI2PWnOsRJgU2n10PZ3mZD4Xu9kU2IXYmuW+e5KCA vTArRUdCrAtIa1k01sPipPPw6dfxx2e5asy21YOytzxuWFfJTGnVxZZSCyLUO83sh6OZhJkk b9rxL9wPmpN/t2IPaEKoAc0FTQZS36wAMOXkBh24PQ9gaLJvfPKpNzGD8XWR5HHF0NLIJhgg 4ZlEXQ2fVp3XrtocHqhu4UZR4koCijgB8sB7Tb0GCpwK+C4UePdFLfhKyRdSXuvY3AHJd4CP 4JzW0Bzq/WXY3XMOzUTYApGQpnUpdOmuQSfpV9MQO+/jo7r6yPbxT7CwRS5dcQPzUiuHLK9i nvjREdh84qycnx0/6dDroYhp0DFv4udxuAvt1h4wGwTPRQZerSm4xaYegEFusyhbZrI0U9tJ B8WrhBLXDiYlyJT6zOV2yZFuW47VrLsjYnHwn27hmxTC/7tvG3euCklmkn9Sl9IAKFu29RSo d5bD8kMSCYsTqtTfT6W4A3qHGvIDta3ptLYpIAOD2sY3GYq2nf3Bbzx81wZK14JdDDHUX2Rs 6+ahAA==
  • Cc: Roger Pau Monné <roger.pau@xxxxxxxxxx>, Xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Tue, 11 Mar 2025 14:21:59 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 26/02/2025 8:44 am, Jan Beulich wrote:
> On 26.02.2025 08:44, Jan Beulich wrote:
>> On 25.02.2025 23:45, Andrew Cooper wrote:
>>> A CALL with 0 displacement is handled specially, and is why this logic
>>> functions even with CET Shadow Stacks active.  Nevertheless a rip-relative 
>>> LEA
>>> is the more normal way of doing this in 64bit code.
>>>
>>> The retrieval of flags modifies the stack pointer so needs to state a
>>> dependency on the stack pointer.  Despite it's name, ASM_CALL_CONSTRAINT is
>>> the way to do this.
>>>
>>> read_sreg() forces the answer through a register, causing code generation of
>>> the form:
>>>
>>>     mov    %gs, %eax
>>>     mov    %eax, %eax
>>>     mov    %rax, 0x140(%rsi)
>>>
>>> Encode the reads directly with a memory operand.  This results in a 16bit
>>> store instead of an 64bit store, but the backing memory is zeroed.
>> Raises the question whether we shouldn't change read_sreg(). At least the
>> emulator uses of it would also benefit from storing straight to memory. And
>> the remaining uses ought to be optimizable by the compiler, except that I
>> don't expect we'd be able to express the zero-extending nature when the
>> destination is a register. Or wait, maybe I can make up something (whether
>> that's going to be liked is a separate question).
> Here you go.
>
> Jan
>
> x86: make read_sreg() "bi-modal"
>
> Permit use sites to control whether to store directly to memory; right
> now both elf_core_save_regs() and the insn emulator's put_fpu()
> needlessly go through an intermediate GPR. Note that in both cases the
> apparent loss of zero-extension isn't a problem: The fields written to
> start out zero-initialized anyway.
>
> No change in generated code for the use sites not being touched.
>
> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
> ---
> Whether to make the change to put_fpu() is up for discussion: In my
> build it increases code size slightly, despite the reduction of number
> of insns emitted. An alternative (leaving the decision to the compiler)
> might be to drop the if() and use "=g" as constraint.
>
> I was considering to omit the assignment to sel_ on the if() branch,
> expecting the compiler to then flag uses of the return value (as
> consuming uninitialized data) when a 2nd argument is passed. However,
> gcc14 then already flags the "sel_;" at the end of the macro as
> consuming uninitialized data.
>
> --- a/xen/arch/x86/include/asm/regs.h
> +++ b/xen/arch/x86/include/asm/regs.h
> @@ -16,10 +16,20 @@
>      !diff || ((r)->cs != __HYPERVISOR_CS);                                   
>  \
>  })
>  
> -#define read_sreg(name) ({                           \
> -    unsigned int __sel;                              \
> -    asm ( "mov %%" STR(name) ",%0" : "=r" (__sel) ); \
> -    __sel;                                           \
> +#define read_sreg(name, dst...) ({                       \
> +    unsigned int sel_;                                   \
> +    BUILD_BUG_ON(count_args(dst) > 1);                   \
> +    if ( count_args(dst) )                               \
> +    {                                                    \
> +        typeof(LASTARG(&sel_, ## dst)) dst_ =            \
> +            LASTARG(&sel_, ## dst);                      \
> +        asm ( "mov %%" STR(name) ",%0" : "=m" (*dst_) ); \
> +        /* The compiler ought to optimize this out. */   \
> +        sel_ = *dst_;                                    \
> +    }                                                    \
> +    else                                                 \
> +        asm ( "mov %%" STR(name) ",%0" : "=r" (sel_) );  \
> +    sel_;                                                \
>  })

This doesn't fix the register promotion problem.  That can be fixed by
unsigned long rather than int, as you did for rdmsr. 
https://godbolt.org/z/K5hKz7KvM

But the fundamental problem is that the sreg instructions with mem16
encodings are weird.  They don't even follow normal x86 rules for
operand size.

By the end of the FRED series (for which this patch was misc cleanup),
I've almost removed read_sreg(), and was intending to purge it
completely.  Even in it's current form, it's not normal C semantics,
because it looks to take a variable which isn't a variable.

Clever as this trick is, I feel it's a backwards step in terms of
legibility, and that plain asm()'s are the lesser evil when it comes to
mem16 instructions.

~Andrew



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.