[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: lib32 followup (visibility issue and/or toolchain bug?)



On Wed, Oct 16, 2024 at 3:53 PM Frediano Ziglio
<frediano.ziglio@xxxxxxxxx> wrote:
>
> On Wed, Oct 16, 2024 at 2:30 PM Andrew Cooper <andrew.cooper3@xxxxxxxxxx> 
> wrote:
> >
> > Hello,
> >
> > Preempting some future work I'm expecting to arrive, I had a go at using
> > __builtin_*() in obj32.
> >
> > This is formed of 2 patches on top of this series:
> > https://xenbits.xen.org/gitweb/?p=people/andrewcoop/xen.git;a=shortlog;h=refs/heads/xen-lib32
> >
>
> You are confident we'll have a lot of shared code to need an
> additional "lib32" in the Makefile!
> I would personally stick with obj32.
> Note that file should be strlen.c, not strlen.32.c, otherwise you are
> possibly going to pick up the general rule and not the one in the
> Makefile (but maybe is what you wanted).
>
> > Patch 1 introduces lib32 beside obj32, with strlen() being the first
> > broken-out function, and patch 2 swaps to __builtin_strlen().
> >
> > Both compile, but the difference that patch 2 introduces was unexpected.
> >
> > With just lib32, and taking strsubcmp() as an example, we get:
> >
> > 00000000 <strsubcmp>:
> >    0:        83 ec 0c                 sub    $0xc,%esp
> >    3:        89 5c 24 04              mov    %ebx,0x4(%esp)
> >    7:        89 74 24 08              mov    %esi,0x8(%esp)
> >    b:        89 c6                    mov    %eax,%esi
> >    d:        89 d3                    mov    %edx,%ebx
> >    f:        89 d0                    mov    %edx,%eax
> >   11:    /-- e8 fc ff ff ff           call   12 <strsubcmp+0x12>
> >             12: R_386_PC32    strlen
> >   16:        89 c1                    mov    %eax,%ecx
> >   18:        89 da                    mov    %ebx,%edx
> >   1a:        89 f0                    mov    %esi,%eax
> >   1c:    /-- e8 fc ff ff ff           call   1d <strsubcmp+0x1d>
> >             1d: R_386_PC32    .text.strncmp
> >   21:        8b 5c 24 04              mov    0x4(%esp),%ebx
> >   25:        8b 74 24 08              mov    0x8(%esp),%esi
> >   29:        83 c4 0c                 add    $0xc,%esp
> >   2c:        c3                       ret
> >
> > which all seems fine.  We get a plain PC32 relocation to strlen (which
> > is now in the separate library).
> >
> > However, with patch 2 in place (simply swapping the plain extern for
> > __builtin_strlen(), we now get:
> >
> > 00000000 <strsubcmp>:
> >    0:        83 ec 0c                 sub    $0xc,%esp
> >    3:        89 1c 24                 mov    %ebx,(%esp)
> >    6:        89 74 24 04              mov    %esi,0x4(%esp)
> >    a:        89 7c 24 08              mov    %edi,0x8(%esp)
> >    e:    /-- e8 fc ff ff ff           call   f <strsubcmp+0xf>
> >             f: R_386_PC32    __x86.get_pc_thunk.bx
> >   13:        81 c3 02 00 00 00        add    $0x2,%ebx
> >             15: R_386_GOTPC    _GLOBAL_OFFSET_TABLE_
> >   19:        89 c7                    mov    %eax,%edi
> >   1b:        89 d6                    mov    %edx,%esi
> >   1d:        89 d0                    mov    %edx,%eax
> >   1f:    /-- e8 fc ff ff ff           call   20 <strsubcmp+0x20>
> >             20: R_386_PLT32    strlen
>
> PLT means it not declared hidden, otherwise it would have used the
> relative relocation.
> Maybe
>
> size_t strlen(const char *s);
> #define strlen(s) __builtin_strlen(s)
>
> xen/compiler.h is included, so all declaration should get the hidden
> by default ? Or add __attribute__((visibility("hidden"))) explicitly.
>
> >   24:        89 c1                    mov    %eax,%ecx
> >   26:        89 f2                    mov    %esi,%edx
> >   28:        89 f8                    mov    %edi,%eax
> >   2a:    /-- e8 fc ff ff ff           call   2b <strsubcmp+0x2b>
> >             2b: R_386_PC32    .text.strncmp
> >   2f:        8b 1c 24                 mov    (%esp),%ebx
> >   32:        8b 74 24 04              mov    0x4(%esp),%esi
> >   36:        8b 7c 24 08              mov    0x8(%esp),%edi
> >   3a:        83 c4 0c                 add    $0xc,%esp
> >   3d:        c3                       ret
> >
> >
> > The builtin hasn't managed to optimise away the call to strlen (that's
> > fine).  But, we've ended up spilling %ebx to the stack, calculating the
> > location of the GOT and not using it.
> >
>
> Maybe the ABI for PLT is to have %ebx set to the GOT ? Not sure about it.
>
> > So, as it stands, trying to use __builtin_strlen() results in worse code
> > generation.  One thing I noticed was that we're not passing
> > -fvisibility=hidden into CFLAGS_x86_32, but fixing that doesn't help
> > either.  We do have the pragma from compiler.h, so I'm out of visibility
> > ideas.
> >
>
> The -fvisibility=hidden should be overridden by the xen/compiler.h;
> but should be overridden with hidden!
> Maybe strlen is defined by default with another visibility?
> If you generate the assembly, you should see if the strlen symbol gets
> the .hidden bless or not.
>

I did some more experiments, but I didn't manage to fix the issue. It
looks like when __builtin_strlen fallback to calling strlen it uses a
private declaration of strlen. -mregparam argument is taken into
account, but not visibility. I tried to declare strlen as hidden
(checking also preprocessor output to see if other declaration were
present, none found), still the @PLT. I tried to add a "static inline
strlen"... and it was not used.

I found this workaround:

diff --git a/xen/arch/x86/boot/cmdline.c b/xen/arch/x86/boot/cmdline.c
index 80ffd0885e..ac2b0b0a4d 100644
--- a/xen/arch/x86/boot/cmdline.c
+++ b/xen/arch/x86/boot/cmdline.c
@@ -51,7 +51,12 @@ static const char delim_chars_comma[] = ", \n\r\t";

#define delim_chars    (delim_chars_comma + 1)

-#define strlen(s) __builtin_strlen(s)
+size_t strlen(const char *s);
+#define strlen(str) \
+  (__extension__ (__builtin_constant_p(str) \
+                 ? __builtin_strlen(str)    \
+                 : strlen(str)))
+

static int strncmp(const char *cs, const char *ct, size_t count)
{
diff --git a/xen/arch/x86/boot/strlen.32.c b/xen/arch/x86/boot/strlen.c
similarity index 100%
rename from xen/arch/x86/boot/strlen.32.c
rename to xen/arch/x86/boot/strlen.c

Frediano



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.