public inbox for glibc-bugs@sourceware.org
* [Bug dynamic-link/32008] New: aarch64: tlsdesc can be optimized by custom tlsdesc calls for common cases
@ 2024-07-22 17:28 nsz at gcc dot gnu.org
  2024-07-23 14:19 ` [Bug dynamic-link/32008] " nsz at gcc dot gnu.org
  2024-07-23 14:54 ` nsz at gcc dot gnu.org
  0 siblings, 2 replies; 3+ messages in thread
From: nsz at gcc dot gnu.org @ 2024-07-22 17:28 UTC
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=32008

            Bug ID: 32008
           Summary: aarch64: tlsdesc can be optimized by custom tlsdesc
                    calls for common cases
           Product: glibc
           Version: unspecified
            Status: NEW
          Severity: normal
          Priority: P2
         Component: dynamic-link
          Assignee: unassigned at sourceware dot org
          Reporter: nsz at gcc dot gnu.org
  Target Milestone: ---

the static tls case is essentially:

size_t _dl_tlsdesc_return(size_t *got)
{
  return got[1]; // offset
}

the dynamic tls fast path is

size_t _dl_tlsdesc_dynamic(size_t *got)
{
  void *tp = gettp();
  void **dtv = *(void ***)tp;
  size_t *data = (size_t *)got[1];
  size_t modid = data[0];
  size_t offset = data[1];
  size_t gencount = data[2];
  if (gencount <= (size_t)dtv[0] && dtv[modid] != UNALLOCATED)
    return (char *)dtv[modid] + offset - (char *)tp;
  //...
}

can be optimized under the assumption that certain modid or offset
values are common (e.g. modid < 8 or offset == 0): for such cases
use a custom _dl_tlsdesc_* when the tlsdesc relocation is processed.

e.g.

size_t _dl_tlsdesc_return42(size_t *got)
{
  return 42;
}

can be used for a tlsdesc reloc if the module uses static tls and
the tp offset to the variable ends up 42. this of course does not
save much and is rarely useful, but for the dynamic case it may be
possible to save many instructions if some common modid/offset/gen
cases have separate asm entry points: even if bug 31995 is not fixed,
an optimization similar to the proposed fix for bug 27404 can be
implemented.
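
the selection step at relocation time could look roughly like the
sketch below. this is only an illustration: the enum, the function
name pick_tlsdesc_entry and the two specialized variants are
hypothetical, and the thresholds (modid < 8, offset == 0) are just
the examples mentioned above, not measured choices.

```c
#include <assert.h>
#include <stddef.h>

/* hypothetical entry point kinds: only the first two correspond to
   functions named in this report, the other two are made up.  */
enum tlsdesc_entry
{
  ENTRY_RETURN,         /* _dl_tlsdesc_return: static tls */
  ENTRY_DYNAMIC,        /* _dl_tlsdesc_dynamic: generic dynamic tls */
  ENTRY_DYNAMIC_OFF0,   /* assumed variant specialized for offset == 0 */
  ENTRY_DYNAMIC_MODID8  /* assumed variant specialized for modid < 8 */
};

/* pick an entry point when a tlsdesc relocation is processed.  */
static enum tlsdesc_entry
pick_tlsdesc_entry (int static_tls, size_t modid, size_t offset)
{
  /* dtv[0] holds the gen count in the code above, so module ids
     are assumed to start at 1.  */
  assert (modid >= 1);
  if (static_tls)
    return ENTRY_RETURN;
  if (offset == 0)
    return ENTRY_DYNAMIC_OFF0;
  if (modid < 8)
    return ENTRY_DYNAMIC_MODID8;
  return ENTRY_DYNAMIC;
}
```

the relocation code would then store the address of the matching asm
entry point in got[0] instead of always using the generic one.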

-- 
You are receiving this mail because:
You are on the CC list for the bug.


* [Bug dynamic-link/32008] aarch64: tlsdesc can be optimized by custom tlsdesc calls for common cases
From: nsz at gcc dot gnu.org @ 2024-07-23 14:19 UTC
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=32008

--- Comment #1 from Szabolcs Nagy <nsz at gcc dot gnu.org> ---
if we compress modid/offset/gen into one got entry then
the dynamic fast path can be slightly improved with a
small amount of additional code (the slow path can be shared
between the current entry point and the optimized one)

e.g. the fast path can go from

  ...
  ldr  x1, [x0, 8]
  ldr  x3, [x1, 16]    // gen
  cmp  x3, x2
  b.hi slow_dtv_update
  ldp  x2, x3, [x1]    // modid, off
  ...

to

  ...
  ldp  w1, w3, [x0, 8] // 32bit gen
  cmp  x3, x2
  b.hi slow_dtv_update
  and  w2, w1, 1023    // 10bit modid
  lsr  w3, w1, 10      // 22bit off
  ...

which can be slightly faster (one load instead of 3).

such a micro optimization is not needed if we get rid of
the gen count, but it is easier to implement.
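
the packed layout implied by the asm above (little endian: low word
holds a 10bit modid and a 22bit offset, high word holds a 32bit gen)
can be sketched in c as follows. the helper names are hypothetical,
and a real implementation would need a fallback to the unpacked
descriptor when a value does not fit.

```c
#include <assert.h>
#include <stdint.h>

#define MODID_BITS 10
#define OFF_BITS 22

/* pack modid/offset/gen into one 64bit got entry.  the caller is
   assumed to have checked that the values fit.  */
static uint64_t
pack_tlsdesc (uint32_t modid, uint32_t off, uint32_t gen)
{
  assert (modid < (1u << MODID_BITS));
  assert (off < (1u << OFF_BITS));
  return (uint64_t) gen << 32 | (uint64_t) off << MODID_BITS | modid;
}

/* and w2, w1, 1023 */
static uint32_t
unpack_modid (uint64_t v)
{
  return (uint32_t) v & ((1u << MODID_BITS) - 1);
}

/* lsr on the low 32bit word */
static uint32_t
unpack_off (uint64_t v)
{
  return (uint32_t) v >> MODID_BITS;
}

/* high 32bit word, second register of the ldp */
static uint32_t
unpack_gen (uint64_t v)
{
  return (uint32_t) (v >> 32);
}
```

with this layout a module is limited to modid < 1024 and
offset < 4M, which is why the generic entry point has to stay
available.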


* [Bug dynamic-link/32008] aarch64: tlsdesc can be optimized by custom tlsdesc calls for common cases
From: nsz at gcc dot gnu.org @ 2024-07-23 14:54 UTC
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=32008

--- Comment #2 from Szabolcs Nagy <nsz at gcc dot gnu.org> ---
single threaded dlopen can also use a different tlsdesc
entry point.

in dlopen the dtv is updated in the current thread after
relocations are processed, and future threads will have an
updated dtv too, so the single thread tlsdesc entry does
not have to check the gen count:

size_t _dl_tlsdesc_single_thread(size_t *got)
{
  void *tp = gettp();
  void **dtv = *(void ***)tp;
  size_t modid = (uint32_t)got[1];
  size_t offset = got[1]>>32;
  if (dtv[modid] == UNALLOCATED)
    return slow_path(tp, dtv, modid, offset);
  return (char *)dtv[modid] + offset - (char *)tp;
}

(except it has to be in asm for PCS reasons)

(this is another micro optimization that may not be worth it)
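
a small c model of the lookup above, usable for testing the got[1]
encoding outside the dynamic linker. everything here is a stand-in:
the names encode_got1 and single_thread_lookup are hypothetical, the
UNALLOCATED value is an assumption of the model, and the slow path is
reduced to a status return.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* assumed marker for a dtv slot without an allocated block */
#define UNALLOCATED ((void *) -1)

/* modid in the low 32 bits, offset in the high 32 bits, as in the
   pseudocode above */
static uint64_t
encode_got1 (uint32_t modid, uint32_t offset)
{
  return (uint64_t) offset << 32 | modid;
}

/* returns 1 and stores the tp relative offset in *out on the fast
   path, returns 0 if the slow path would be needed.  no gen count
   check is done.  */
static int
single_thread_lookup (char *tp, void **dtv, uint64_t got1, ptrdiff_t *out)
{
  size_t modid = (uint32_t) got1;
  size_t offset = got1 >> 32;
  if (dtv[modid] == UNALLOCATED)
    return 0;
  /* integer arithmetic avoids subtracting unrelated pointers */
  *out = (ptrdiff_t) ((uintptr_t) dtv[modid] + offset - (uintptr_t) tp);
  return 1;
}
```

the model only covers the control flow; the real entry point still
has to follow the tlsdesc calling convention and so cannot be written
in c.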
