[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: lib32 followup (visibility issue and/or toolchain bug?)
On 16.10.2024 16:53, Frediano Ziglio wrote: > On Wed, Oct 16, 2024 at 2:30 PM Andrew Cooper <andrew.cooper3@xxxxxxxxxx> > wrote: >> >> Hello, >> >> Preempting some future work I'm expecting to arrive, I had a go at using >> __builtin_*() in obj32. >> >> This is formed of 2 patches on top of this series: >> https://xenbits.xen.org/gitweb/?p=people/andrewcoop/xen.git;a=shortlog;h=refs/heads/xen-lib32 >> > > You are confident we'll have a lot of shared code to need an > additional "lib32" in the Makefile! > I would personally stick with obj32. > Note that file should be strlen.c, not strlen.32.c, otherwise you are > possibly going to pick up the general rule and not the one in the > Makefile (but maybe is what you wanted). > >> Patch 1 introduces lib32 beside obj32, with strlen() being the first >> broken-out function, and patch 2 swaps to __builtin_strlen(). >> >> Both compile, but the difference that patch 2 introduces was unexpected. >> >> With just lib32, and taking strsubcmp() as an example, we get: >> >> 00000000 <strsubcmp>: >> 0: 83 ec 0c sub $0xc,%esp >> 3: 89 5c 24 04 mov %ebx,0x4(%esp) >> 7: 89 74 24 08 mov %esi,0x8(%esp) >> b: 89 c6 mov %eax,%esi >> d: 89 d3 mov %edx,%ebx >> f: 89 d0 mov %edx,%eax >> 11: /-- e8 fc ff ff ff call 12 <strsubcmp+0x12> >> 12: R_386_PC32 strlen >> 16: 89 c1 mov %eax,%ecx >> 18: 89 da mov %ebx,%edx >> 1a: 89 f0 mov %esi,%eax >> 1c: /-- e8 fc ff ff ff call 1d <strsubcmp+0x1d> >> 1d: R_386_PC32 .text.strncmp >> 21: 8b 5c 24 04 mov 0x4(%esp),%ebx >> 25: 8b 74 24 08 mov 0x8(%esp),%esi >> 29: 83 c4 0c add $0xc,%esp >> 2c: c3 ret >> >> which all seems fine. We get a plain PC32 relocation to strlen (which >> is now in the separate library). >> >> However, with patch 2 in place (simply swapping the plain extern for >> __builtin_strlen(), we now get: >> >> 00000000 <strsubcmp>: >> 0: 83 ec 0c sub $0xc,%esp >> 3: 89 1c 24 mov %ebx,(%esp) >> 6: 89 74 24 04 mov %esi,0x4(%esp) >> a: 89 7c 24 08 mov %edi,0x8(%esp) >> e: /-- e8 fc ff ff ff call f <strsubcmp+0xf> >> f: R_386_PC32 __x86.get_pc_thunk.bx >> 13: 81 c3 02 00 00 00 add $0x2,%ebx >> 15: R_386_GOTPC _GLOBAL_OFFSET_TABLE_ >> 19: 89 c7 mov %eax,%edi >> 1b: 89 d6 mov %edx,%esi >> 1d: 89 d0 mov %edx,%eax >> 1f: /-- e8 fc ff ff ff call 20 <strsubcmp+0x20> >> 20: R_386_PLT32 strlen > > PLT means it not declared hidden, otherwise it would have used the > relative relocation. > Maybe > > size_t strlen(const char *s); > #define strlen(s) __builtin_strlen(s) > > xen/compiler.h is included, so all declaration should get the hidden > by default ? Or add __attribute__((visibility("hidden"))) explicitly. > >> 24: 89 c1 mov %eax,%ecx >> 26: 89 f2 mov %esi,%edx >> 28: 89 f8 mov %edi,%eax >> 2a: /-- e8 fc ff ff ff call 2b <strsubcmp+0x2b> >> 2b: R_386_PC32 .text.strncmp >> 2f: 8b 1c 24 mov (%esp),%ebx >> 32: 8b 74 24 04 mov 0x4(%esp),%esi >> 36: 8b 7c 24 08 mov 0x8(%esp),%edi >> 3a: 83 c4 0c add $0xc,%esp >> 3d: c3 ret >> >> >> The builtin hasn't managed to optimise away the call to strlen (that's >> fine). But, we've ended up spilling %ebx to the stack, calculating the >> location of the GOT and not using it. > > Maybe the ABI for PLT is to have %ebx set to the GOT ? Not sure about it. Yes, PIC PLT entries use %ebx. >> So, as it stands, trying to use __builtin_strlen() results in worse code >> generation. One thing I noticed was that we're not passing >> -fvisibility=hidden into CFLAGS_x86_32, but fixing that doesn't help >> either. We do have the pragma from compiler.h, so I'm out of visibility >> ideas. > > The -fvisibility=hidden should be overridden by the xen/compiler.h; > but should be overridden with hidden! > Maybe strlen is defined by default with another visibility? > If you generate the assembly, you should see if the strlen symbol gets > the .hidden bless or not. Only defined symbols have .hidden emitted for them, afaics. But that also doesn't matter much, as it's the emission of the @plt on the calls which requires the setting up of %ebx prior to those calls. (Arguably in 32-bit code, where all addresses are reachable, "hidden" could serve as a hint that calls don't need to go through the PLT. This maybe considered a missed optimization opportunity.) There's no difference there between calling strlen() or e.g. a custom (extern) StrLen(). And btw, reloc.c and cmdline.c currently also compile to code using __x86.get_pc_thunk.* (and @gotoff relocations for data references). Jan
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |