On Fri, 3 Mar 2023 at 18:12, Jonathan Wakely wrote:
>
> On Fri, 3 Mar 2023 at 17:47, Alexandre Oliva wrote:
>
>> On Mar 3, 2023, Jonathan Wakely wrote:
>>
>> > On Fri, 3 Mar 2023 at 09:33, Jonathan Wakely wrote:
>> >> Jakub previously suggested doing this for PR 61841, which was a similar
>> >> problem with pthread_create:
>> >>
>> >> __asm ("" : : "r" (&pthread_create)); would not be optimized away.
>> >>
>> >>
>> >> That would avoid the multiple copies.
>>
>> Not really. There would be multiple copies of the code that loads
>> pthread_create's address. And we don't really need the address, a
>> single never-executed call would do. I've explored these possibilities
>> a bit, and here's what I've come up with: a private static member
>> function that we output in units that instantiate the thread template
>> ctor, to pass its address to _M_start_thread. Since it's never actually
>> called, we don't really need the hacks in some of the alternatives I
>> left in place, mainly for your enjoyment.
>>
>> They all work equally well, just as efficient per-instantiation at
>> runtime, a little different space and loading overheads, but the last
>> one, that is enabled, is my favorite: only PLT relocations, that we'd
>> likely get anyway, no full-address resolution, and as-short-as-possible
>> calls, enough to get a relocation with a strong reference to pull the
>> symbol in when linking, but as short as possible call sequences, because
>> of the type cast.
>>
>
> And those expressions aren't ever optimized away as unused?
>

Oh, I missed that they're called after casting them, I didn't notice the
trailing (). That would be UB to call them through the wrong pointer
type, so the compiler could decide they're unreachable, but it seems to
work for now.

Thanks!
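
For anyone reading this in the archive, here is a rough sketch of the idiom as I understand it from the description above. It is not the actual patch. All the names (thread_deps_never_run, start_thread_stub, launch_stub, recorded_deps) are made up for illustration, and start_thread_stub merely stands in for _M_start_thread. The asm variant at the end is the one quoted from the PR 61841 discussion and relies on the GNU asm extension.

```cpp
#include <pthread.h>

// Hypothetical names throughout; this is not the libstdc++ code.
using dep_fn = void (*)();

// Never executed.  Casting to void(*)() means no arguments are set up, so
// each call sequence is short, yet it still leaves a strong reference to
// the symbol for the linker to resolve.  Actually executing either call
// would be undefined behaviour (call through the wrong pointer type).
static void
thread_deps_never_run()
{
  reinterpret_cast<void (*)()>(&pthread_create)();
  reinterpret_cast<void (*)()>(&pthread_join)();
}

// Stand-in for the function that receives the address.  It only records
// the pointer and never calls through it.
static volatile dep_fn recorded_deps = nullptr;

bool
start_thread_stub(dep_fn deps)
{
  recorded_deps = deps;
  return deps != nullptr;
}

// Roughly what each unit instantiating the template ctor would do: pass
// the address along, which forces thread_deps_never_run to be emitted.
bool
launch_stub()
{
  return start_thread_stub(&thread_deps_never_run);
}

// For comparison, the asm idiom: the address of pthread_create is loaded
// into a register wherever this is expanded.
inline void
reference_via_asm()
{
  __asm ("" : : "r" (&pthread_create));
}
```

Calling launch_stub() is safe because nothing ever calls through the recorded pointer. As noted above, the compiler would be within its rights to treat the body of thread_deps_never_run as unreachable, so this relies on current behaviour rather than on any guarantee.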