[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: lib32 followup (visibility issue and/or toolchain bug?)



On Wed, Oct 16, 2024 at 2:30 PM Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote:
>
> Hello,
>
> Preempting some future work I'm expecting to arrive, I had a go at using
> __builtin_*() in obj32.
>
> This is formed of 2 patches on top of this series:
> https://xenbits.xen.org/gitweb/?p=people/andrewcoop/xen.git;a=shortlog;h=refs/heads/xen-lib32
>

You are confident we'll have a lot of shared code to need an
additional "lib32" in the Makefile!
I would personally stick with obj32.
Note that file should be strlen.c, not strlen.32.c, otherwise you are
possibly going to pick up the general rule and not the one in the
Makefile (but maybe is what you wanted).

> Patch 1 introduces lib32 beside obj32, with strlen() being the first
> broken-out function, and patch 2 swaps to __builtin_strlen().
>
> Both compile, but the difference that patch 2 introduces was unexpected.
>
> With just lib32, and taking strsubcmp() as an example, we get:
>
> 00000000 <strsubcmp>:
>    0:        83 ec 0c                 sub    $0xc,%esp
>    3:        89 5c 24 04              mov    %ebx,0x4(%esp)
>    7:        89 74 24 08              mov    %esi,0x8(%esp)
>    b:        89 c6                    mov    %eax,%esi
>    d:        89 d3                    mov    %edx,%ebx
>    f:        89 d0                    mov    %edx,%eax
>   11:    /-- e8 fc ff ff ff           call   12 <strsubcmp+0x12>
>             12: R_386_PC32    strlen
>   16:        89 c1                    mov    %eax,%ecx
>   18:        89 da                    mov    %ebx,%edx
>   1a:        89 f0                    mov    %esi,%eax
>   1c:    /-- e8 fc ff ff ff           call   1d <strsubcmp+0x1d>
>             1d: R_386_PC32    .text.strncmp
>   21:        8b 5c 24 04              mov    0x4(%esp),%ebx
>   25:        8b 74 24 08              mov    0x8(%esp),%esi
>   29:        83 c4 0c                 add    $0xc,%esp
>   2c:        c3                       ret
>
> which all seems fine.  We get a plain PC32 relocation to strlen (which
> is now in the separate library).
>
> However, with patch 2 in place (simply swapping the plain extern for
> __builtin_strlen(), we now get:
>
> 00000000 <strsubcmp>:
>    0:        83 ec 0c                 sub    $0xc,%esp
>    3:        89 1c 24                 mov    %ebx,(%esp)
>    6:        89 74 24 04              mov    %esi,0x4(%esp)
>    a:        89 7c 24 08              mov    %edi,0x8(%esp)
>    e:    /-- e8 fc ff ff ff           call   f <strsubcmp+0xf>
>             f: R_386_PC32    __x86.get_pc_thunk.bx
>   13:        81 c3 02 00 00 00        add    $0x2,%ebx
>             15: R_386_GOTPC    _GLOBAL_OFFSET_TABLE_
>   19:        89 c7                    mov    %eax,%edi
>   1b:        89 d6                    mov    %edx,%esi
>   1d:        89 d0                    mov    %edx,%eax
>   1f:    /-- e8 fc ff ff ff           call   20 <strsubcmp+0x20>
>             20: R_386_PLT32    strlen

PLT means it not declared hidden, otherwise it would have used the
relative relocation.
Maybe

size_t strlen(const char *s);
#define strlen(s) __builtin_strlen(s)

xen/compiler.h is included, so all declaration should get the hidden
by default ? Or add __attribute__((visibility("hidden"))) explicitly.

>   24:        89 c1                    mov    %eax,%ecx
>   26:        89 f2                    mov    %esi,%edx
>   28:        89 f8                    mov    %edi,%eax
>   2a:    /-- e8 fc ff ff ff           call   2b <strsubcmp+0x2b>
>             2b: R_386_PC32    .text.strncmp
>   2f:        8b 1c 24                 mov    (%esp),%ebx
>   32:        8b 74 24 04              mov    0x4(%esp),%esi
>   36:        8b 7c 24 08              mov    0x8(%esp),%edi
>   3a:        83 c4 0c                 add    $0xc,%esp
>   3d:        c3                       ret
>
>
> The builtin hasn't managed to optimise away the call to strlen (that's
> fine).  But, we've ended up spilling %ebx to the stack, calculating the
> location of the GOT and not using it.
>

Maybe the ABI for PLT is to have %ebx set to the GOT ? Not sure about it.

> So, as it stands, trying to use __builtin_strlen() results in worse code
> generation.  One thing I noticed was that we're not passing
> -fvisibility=hidden into CFLAGS_x86_32, but fixing that doesn't help
> either.  We do have the pragma from compiler.h, so I'm out of visibility
> ideas.
>

The -fvisibility=hidden should be overridden by the xen/compiler.h;
but should be overridden with hidden!
Maybe strlen is defined by default with another visibility?
If you generate the assembly, you should see if the strlen symbol gets
the .hidden bless or not.

> Anything else I've missed?
>

Coffee :-)

Frediano



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.