public inbox for glibc-bugs@sourceware.org
* [Bug dynamic-link/32008] New: aarch64: tlsdesc can be optimized by custom tlsdesc calls for common cases
@ 2024-07-22 17:28 nsz at gcc dot gnu.org
2024-07-23 14:19 ` [Bug dynamic-link/32008] " nsz at gcc dot gnu.org
2024-07-23 14:54 ` nsz at gcc dot gnu.org
0 siblings, 2 replies; 3+ messages in thread
From: nsz at gcc dot gnu.org @ 2024-07-22 17:28 UTC (permalink / raw)
To: glibc-bugs
https://sourceware.org/bugzilla/show_bug.cgi?id=32008
Bug ID: 32008
Summary: aarch64: tlsdesc can be optimized by custom tlsdesc
calls for common cases
Product: glibc
Version: unspecified
Status: NEW
Severity: normal
Priority: P2
Component: dynamic-link
Assignee: unassigned at sourceware dot org
Reporter: nsz at gcc dot gnu.org
Target Milestone: ---
the static tls case is essentially:
    size_t _dl_tlsdesc_return(size_t *got)
    {
        return got[1]; // offset
    }
the dynamic tls fast path is essentially:
    size_t _dl_tlsdesc_dynamic(size_t *got)
    {
        void *tp = gettp();
        void **dtv = *(void ***)tp;
        size_t *data = (size_t *)got[1];
        size_t modid = data[0];
        size_t offset = data[1];
        size_t gencount = data[2];
        // fast path: the dtv is recent enough and the module's block is allocated
        if (gencount <= (size_t)dtv[0] && dtv[modid] != UNALLOCATED)
            return (size_t)dtv[modid] + offset - (size_t)tp;
        //...
    }
both can be optimized under the assumption that certain modid or offset
values are common (e.g. modid < 8 or offset == 0): for such cases
a custom _dl_tlsdesc_* entry point can be selected when the tlsdesc
relocation is processed.
e.g.
    size_t _dl_tlsdesc_return42(size_t *got)
    {
        return 42;
    }
can be used for a tlsdesc reloc if the module uses static tls and
the tp offset of the variable ends up being 42. this of course does not
save much and is rarely useful, but for the dynamic case it may be
possible to save many instructions if some common modid/offset/gen
cases have separate asm entry points: even if bug 31995 is not fixed,
an optimization similar to the proposed fix for bug 27404 can be
implemented.
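
a minimal C sketch of the selection step described above, for the static case only. the struct tlsdesc model, the bind_static_tlsdesc helper and the choice of 0 and 16 as "common" offsets are assumptions made for illustration; they are not glibc code, and the real entry points have to be written in asm.

```c
#include <stddef.h>

/* Hypothetical model of a TLS descriptor: an entry point plus one argument
   word, mirroring the two GOT slots (got[0], got[1]) used in the report. */
struct tlsdesc {
    size_t (*entry)(const struct tlsdesc *);
    size_t arg;
};

/* Generic static-TLS entry: load the offset from the descriptor. */
static size_t tlsdesc_return(const struct tlsdesc *td) { return td->arg; }

/* Specialized entries for offsets assumed to be common: no load needed. */
static size_t tlsdesc_return_0(const struct tlsdesc *td) { (void)td; return 0; }
static size_t tlsdesc_return_16(const struct tlsdesc *td) { (void)td; return 16; }

/* Called once, when the relocation is processed: pick the cheapest entry
   point that is valid for this tp offset. */
static void bind_static_tlsdesc(struct tlsdesc *td, size_t tp_offset)
{
    td->arg = tp_offset;
    switch (tp_offset) {
    case 0:  td->entry = tlsdesc_return_0;  break;
    case 16: td->entry = tlsdesc_return_16; break;
    default: td->entry = tlsdesc_return;    break;
    }
}
```

the generic entry stays as the fallback, so specialization only changes which function pointer is stored in the descriptor and never the value returned.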
--
You are receiving this mail because:
You are on the CC list for the bug.
From: nsz at gcc dot gnu.org @ 2024-07-23 14:19 UTC (permalink / raw)
To: glibc-bugs
https://sourceware.org/bugzilla/show_bug.cgi?id=32008
--- Comment #1 from Szabolcs Nagy <nsz at gcc dot gnu.org> ---
if we compress modid/offset/gen into one got entry then
the dynamic fast path can be slightly improved with a
small amount of additional code (the slow path can be shared
between the current entry point and the optimized one).
e.g. the fast path can go from
    ...
    ldr x1, [x0, 8]
    ldr x3, [x1, 16]      // gen
    cmp x3, x2
    b.hi slow_dtv_update
    ldp x2, x3, [x1]      // modid, off
    ...
to
    ...
    ldp w1, w3, [x0, 8]   // 32bit gen
    cmp x3, x2
    b.hi slow_dtv_update
    and w2, w1, 1023      // 10bit modid
    lsr w3, w1, 10        // 22bit off
    ...
which can be slightly faster (one load instruction instead of 3).
such a micro optimization is not needed if we get rid of
the gen count, but it is easier to implement.
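
the 10bit modid / 22bit offset split used in the asm above can be sketched in C as follows. the helper names and the "return 0 when the fields do not fit" convention are assumptions for illustration; a relocation whose values do not fit would keep using the existing entry point.

```c
#include <stdint.h>

#define MODID_BITS  10u
#define MODID_MASK  ((1u << MODID_BITS) - 1u)   /* 1023 */
#define OFFSET_BITS 22u

/* Returns 1 and stores the packed word if both fields fit, else 0
   (the caller would then fall back to the generic entry point). */
static int pack_modid_offset(uint32_t modid, uint32_t offset, uint32_t *out)
{
    if (modid > MODID_MASK || offset >= (1u << OFFSET_BITS))
        return 0;
    *out = (offset << MODID_BITS) | modid;
    return 1;
}

/* Same operation as: and w2, w1, 1023 */
static uint32_t unpack_modid(uint32_t w)  { return w & MODID_MASK; }

/* Same operation as a 32-bit logical shift right by 10 */
static uint32_t unpack_offset(uint32_t w) { return w >> MODID_BITS; }
```

with this layout at most 1023 module ids and offsets below 4 MiB can use the compressed form.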
From: nsz at gcc dot gnu.org @ 2024-07-23 14:54 UTC (permalink / raw)
To: glibc-bugs
https://sourceware.org/bugzilla/show_bug.cgi?id=32008
--- Comment #2 from Szabolcs Nagy <nsz at gcc dot gnu.org> ---
a single threaded dlopen can also use a different tlsdesc
entry point.
in dlopen the dtv is updated in the current thread after
relocations are processed, and future threads will have an
updated dtv too, so the single thread tlsdesc entry does
not have to check the gen count:
    size_t _dl_tlsdesc_single_thread(size_t *got)
    {
        void *tp = gettp();
        void **dtv = *(void ***)tp;
        size_t modid = (uint32_t)got[1];   // low 32 bits
        size_t offset = got[1] >> 32;      // high 32 bits
        if (dtv[modid] == UNALLOCATED)
            return slow_path(tp, dtv, modid, offset);
        return (size_t)dtv[modid] + offset - (size_t)tp;
    }
(except it has to be in asm for PCS reasons)
(this is another micro optimization that may not be worth it)
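
a self-contained C model of the entry point above, so the packing and the fast path can be exercised outside the dynamic linker. tp and dtv are passed in explicitly, UNALLOCATED is a stand-in marker, and the slow path is reduced to a flag; all of these are assumptions for illustration, not glibc interfaces.

```c
#include <stddef.h>
#include <stdint.h>

#define UNALLOCATED ((void *)-1)   /* stand-in for the dtv "not allocated" marker */

/* Pack modid into the low 32 bits and offset into the high 32 bits. */
static uint64_t pack_arg(uint32_t modid, uint32_t offset)
{
    return ((uint64_t)offset << 32) | modid;
}

/* Model of the single thread entry: no generation count check.
   Sets *slow to 1 (and returns 0) where the real code would take the
   slow path; otherwise returns the tp-relative offset of the variable. */
static size_t tlsdesc_single_thread(const uint64_t *got, void *tp,
                                    void **dtv, int *slow)
{
    size_t modid = (uint32_t)got[1];
    size_t offset = (size_t)(got[1] >> 32);
    if (dtv[modid] == UNALLOCATED) {
        *slow = 1;
        return 0;
    }
    *slow = 0;
    return (size_t)((uintptr_t)dtv[modid] + offset - (uintptr_t)tp);
}
```

the address arithmetic goes through uintptr_t because the tls block and the thread pointer are in general not part of the same C object.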