public inbox for gcc-bugs@sourceware.org
* [Bug target/113686] New: [RISC-V] TLS (Local Exec) relaxation on structures (LE)
@ 2024-01-31 18:47 hpa at zytor dot com
  2024-01-31 19:05 ` [Bug target/113686] " palmer at gcc dot gnu.org
  2024-02-01 20:24 ` hpa at zytor dot com
  0 siblings, 2 replies; 3+ messages in thread
From: hpa at zytor dot com @ 2024-01-31 18:47 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113686

            Bug ID: 113686
           Summary: [RISC-V] TLS (Local Exec) relaxation on structures
                    (LE)
           Product: gcc
           Version: 13.2.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: hpa at zytor dot com
  Target Milestone: ---

When the Local Exec TLS model is in use, gcc generates inefficient code for
accessing the member of a structure:

struct foobar {
   int alpha;
   int beta;
};

_Thread_local struct foobar foo;

void func(int bar)
{
    foo.beta = bar;
}

    # Version 1
    lui    a1,%tprel_hi(foo)
    add    a1,a1,tp,%tprel_add(foo)
    addi   a1,a1,%tprel_lo(foo)
    sw     a0,4(a1)

However, in this case it could be generated as:

    # Version 2
    lui    a1,%tprel_hi(sym+4)
    add    a1,a1,tp,%tprel_add(sym+4)
    sw     a0,%tprel_lo(sym+4)(a1)

... and if %tprel_hi(sym+4) == 0, as it often is for small embedded
software, the linker can relax this to a simple (tp) reference:

    # Version 2a (post-relaxation with small .tbss)
    sw a0,%tprel_lo(sym+4)(tp)

The linker will *not* relax version 1 all the way, leaving an unnecessary mv:

    # Version 1a (post-relaxation with small .tbss)
    mv a1,tp
    sw a0,%tprel_lo(sym+4)(tp)

It is of course trickier for the case of multiple subsequent references to the
structure if the structure is not aligned, as gcc can't know a priori where the
4K breaks are[*]. The version 1 code is more efficient in that case (3
instructions + 1 instruction/field, as opposed to 3 instructions/field).
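
To make the "4K breaks" concrete, here is a minimal C sketch of the hi/lo split
that a lui + 12-bit signed immediate pair implies. The helper names (tprel_hi,
tprel_lo, same_hi_window) are hypothetical and only mirror the relocation
operators above; they are not part of gcc or binutils.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* High part of a tp-relative offset. The low part is a sign-extended
 * 12-bit immediate, so the split rounds to the nearest multiple of 4096
 * and the "breaks" fall at offsets congruent to 2048 mod 4096. */
static inline int32_t tprel_hi(int32_t off)
{
    int32_t biased = off + 0x800;
    int32_t q = biased / 4096;
    if (biased % 4096 < 0)
        q--;                    /* floor division for negative offsets */
    return q;
}

/* Low part, always in the range [-2048, 2047]. */
static inline int32_t tprel_lo(int32_t off)
{
    return off - tprel_hi(off) * 4096;
}

/* True if every field of an object at tp-relative offset `base` shares one
 * high part, i.e. a single lui/add pair could serve all of the fields. */
static inline bool same_hi_window(int32_t base, const int32_t *field_off,
                                  size_t n)
{
    assert(n > 0);
    int32_t hi = tprel_hi(base + field_off[0]);
    for (size_t i = 1; i < n; i++)
        if (tprel_hi(base + field_off[i]) != hi)
            return false;
    return true;
}
```

For struct foobar (field offsets 0 and 4), a base of 0 keeps both fields in one
window, while a base of 2044 puts alpha and beta on opposite sides of a break.
The compiler cannot see the base, which is why it cannot decide this by itself.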

However, if the structure *is* aligned, gcc will still not optimize version 1
into version 2.

There are at least a few options I see:

1. gcc option: gcc can generate version 2 code for a single field reference, or
if the alignment is such that all fields are guaranteed to fall inside the same
4K window.

2. gcc and optional ABI option: introduce a "TLS TE-tiny" model for deep
embedded use, where the combined size of the TLS area is limited to 4K,
equivalent to the way direct gp references [or zero, if the global pointer is
0] work. Thus, direct (tp) references can be used.

NOTE: With the current binutils, this will error unless .option norelax is in
effect. It might be desirable to instead have a new relocation type, which
would require binutils support. Alternatively, ld should recognize that the TLS
offset is within +/- 2K and suppress the warning in that case (since at that
point the address is available to the linker.)

The linker could be further optimized by allowing the TLS base to be offset,
presumably in the same way as the __global_pointer$ symbol.

3. binutils option: teach ld to relax these kinds of chained pointer
references.
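
As a rough illustration of the alignment condition in option 1, the sketch
below gives the example structure an explicit power-of-two alignment that is at
least its size and at most 2048. It assumes that the tp-relative offset of foo
is a multiple of its alignment; under that assumption no field can straddle a
break, because the breaks sit at odd multiples of 2048. FOO_ALIGN and
fits_one_hi_window are hypothetical names, and nothing here changes what
current gcc emits.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdbool.h>
#include <stddef.h>

struct foobar {
    int alpha;
    int beta;
};

/* Hypothetical alignment chosen for the example (assumes 4-byte int). */
#define FOO_ALIGN 8

static_assert(sizeof(struct foobar) <= FOO_ALIGN,
              "object must fit inside one aligned block");
static_assert(FOO_ALIGN <= 2048 && (FOO_ALIGN & (FOO_ALIGN - 1)) == 0,
              "alignment must be a power of two no larger than 2048");

_Thread_local alignas(FOO_ALIGN) struct foobar foo;

void func(int bar)
{
    foo.beta = bar;
}

/* Run-time form of the same condition, for arbitrary size/alignment pairs. */
static inline bool fits_one_hi_window(size_t size, size_t align)
{
    bool pow2 = align != 0 && (align & (align - 1)) == 0;
    return pow2 && align <= 2048 && size <= align;
}
```

Note that an alignment of 4096 does not qualify: a 4K-aligned block still
contains the break at offset 2048 within it.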



[*] Rant: in my opinion, the lui/auipc instructions are fundamentally
misdesigned by not having an overlap bit to guarantee a sizable window.
