On Fri, 3 Mar 2023 at 17:47, Alexandre Oliva wrote:

> On Mar 3, 2023, Jonathan Wakely wrote:
>
> > On Fri, 3 Mar 2023 at 09:33, Jonathan Wakely wrote:
> >> Jakub previously suggested doing this for PR 61841, which was a similar
> >> problem with pthread_create:
> >>
> >> __asm ("" : : "r" (&pthread_create)); would not be optimized away.
> >>
> >>
> >> That would avoid the multiple copies.
>
> Not really.  There would be multiple copies of the code that loads
> pthread_create's address.  And we don't really need the address, a
> single never-executed call would do.  I've explored these possibilities
> a bit, and here's what I've come up with: a private static member
> function that we output in units that instantiate the thread template
> ctor, to pass its address to _M_start_thread.  Since it's never actually
> called, we don't really need the hacks in some of the alternatives I
> left in place, mainly for your enjoyment.
>
> They all work equally well, just as efficient per-instantiation at
> runtime, a little different space and loading overheads, but the last
> one, that is enabled, is my favorite: only PLT relocations, that we'd
> likely get anyway, no full-address resolution, and as-short-as-possible
> calls, enough to get a relocation with a strong reference to pull the
> symbol in when linking, but as short as possible call sequences, because
> of the type cast.
>

And those expressions aren't ever optimized away as unused?

> As a bonus, I put in (in the last minute, after my test runs) something
> to keep even LTO happy: the asm statements to prevent depend from being
> optimized out in _M_start_thread.  In non-LTO, its impact should be
> virtually zero.
>
> How does this look?  (minus the #if 0/#elif 0/.../#else)
>

Looks good, thanks for going the extra mile to check all the alternatives,
and the futureproofing it for LTO.

OK for trunk.
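
For anyone reading this in the archives, here is a minimal sketch of the two
approaches discussed above: the empty asm that takes pthread_create's address,
and a never-called static member function whose address is handed to the start
routine. This is not the patch under review. The names toy_thread, start,
depend_never_run and keep_pthread_create_via_asm are made up for illustration,
and the sketch assumes GCC or Clang extended asm and a POSIX <pthread.h>.

```cpp
#include <cassert>
#include <pthread.h>

// Approach 1 (the asm quoted above): an empty asm statement that takes
// &pthread_create as a register input. An asm with no outputs is implicitly
// volatile in GCC, so the address load (and the symbol reference) stays.
// Hypothetical helper name, for illustration only.
inline void keep_pthread_create_via_asm()
{
  __asm ("" : : "r" (&pthread_create));
}

// Approach 2: a hypothetical stand-in for a thread class. It does not
// create any thread; it only shows how the dependency is carried.
class toy_thread
{
  // Never executed. Its body contains a call to pthread_create, which
  // gives the object file a strong reference to that symbol. The cast to
  // void (*)() keeps the call sequence short: no arguments are set up.
  static void depend_never_run()
  {
    reinterpret_cast<void (*)()>(&pthread_create)();
  }

  // Stand-in for the out-of-line start routine. The empty asm makes the
  // pointer look used, so an optimizer (LTO included) cannot drop it.
  void start(void (*depend)())
  {
    __asm ("" : : "r" (depend));
    dep_seen_ = (depend != nullptr);
  }

  bool dep_seen_ = false;

public:
  // Template constructor: every translation unit that instantiates it
  // also takes the address of depend_never_run, forcing it to be emitted.
  template<typename Callable>
  explicit toy_thread(Callable&&)
  {
    start(&toy_thread::depend_never_run);
  }

  bool dependency_recorded() const { return dep_seen_; }
};
```

Constructing a toy_thread from any callable is enough to instantiate the
constructor. On an ELF target, inspecting the resulting object with nm should
then show an undefined reference to pthread_create, though I have only
reasoned about that for this sketch, not measured it.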