From: Jinyang He
Date: 2023-12-04 16:57:55
To: "常佳琛",binutils@sourceware.org
Cc: xuchenghua@loongson.cn,chenglulu@loongson.cn,liuzhensong@loongson.cn,xry111@xry111.site,i.swmail@xen0n.name,maskray@google.com,cailulu@loongson.cn,luweining@loongson.cn,wanglei@loongson.cn,Lazy_Linux@126.com,mengqinggang@loongson.cn
Subject: Re: [PATCH v2 0/5] LoongArch tls le model linker relaxation support.

>On 2023-12-04 11:39, 常佳琛 wrote:
>> The above is a simple explanation at the O0 optimization level.
>> The relaxation currently also works with O2 and O3 turned on.
>>
>> example:
>> test.c:
>> __thread int count1;
>> int main(){
>>     count1 = 1;
>> }
>>
>> (Enable O2 option and no relax)
>> 0000000120000480:
>> 120000480: 1400000c  lu12i.w $t0, 0
>> 120000484: 0280040d  li.w $t1, 1
>> 120000488: 0010898c  add.d $t0, $t0, $tp
>> 12000048c: 00150004  move $a0, $zero
>> 120000490: 2980018d  st.w $t1, $t0, 0
>> 120000494: 4c000020  ret
>>
>> (Enable O2 option and relax)
>> 0000000120000480:
>> 120000480: 0280040d  li.w $t1, 1
>> 120000484: 00150004  move $a0, $zero
>> 120000488: 2980004d  st.w $t1, $tp, 0
>> 12000048c: 4c000020  ret
>>
>> As you can see, with the O2 option turned on the order of
>> instructions changes, but the relax optimization is still not
>> affected, and the address calculation of the tls variable count1
>> is correct both before and after optimization. The situation with
>> O3 enabled is similar to that with O2 enabled.
>>
>
>How can I get your gcc (or patches)? I tried to compare access to a
>non-thread var with old gcc.

Reply:
There are still some issues with gcc that need to be worked out.
The gcc patch will be sent on Tuesday or Wednesday of this week,
so you may have to wait a little.

changjiachen

>
>Condition:
>__thread int a;
>int b;
>extern int foo(int *);
>
>Compare in old gcc:
>
>a = 1;                      b = 1;
>
>lu12i.w $r12,%le_hi20(a)    pcalau12i $r12,%pc_hi20(b)
>ori $r12,$r12,%le_lo12(a)
>addi.w $r13,$r0,1           addi.w $r13,$r0,1
>stx.w $r13,$r12,$r2         st.w $r13,$r12,%pc_lo12(b)
>
>
>a = 1; return foo(&a);      b = 1; return foo(&b);
>
>lu12i.w $r12,%le_hi20(a)    pcalau12i $r4,%pc_hi20(b)
>ori $r12,$r12,%le_lo12(a)   addi.d $r4,$r4,%pc_lo12(b)
>addi.w $r13,$r0,1           addi.w $r12,$r0,1
>add.d $r4,$r12,$r2
>stx.w $r13,$r12,$r2         stptr.w $r12,$r4,0
>b %plt(foo)                 b %plt(foo)
>
>I worry about the case where we need the address of the thread var
>after accessing it, which may cause a worse sequence with your gcc.
>For the non-thread var, gcc loads the address into a register first
>and then accesses the variable through that register. How does your
>gcc handle this case?
>
>
>>
>>
>> From: Jinyang He
>> Date: 2023-12-04 10:25:13
>> To: changjiachen,binutils@sourceware.org
>> Cc: xuchenghua@loongson.cn,chenglulu@loongson.cn,liuzhensong@loongson.cn,xry111@xry111.site,i.swmail@xen0n.name,maskray@google.com,cailulu@loongson.cn,luweining@loongson.cn,wanglei@loongson.cn,Lazy_Linux@126.com,mengqinggang@loongson.cn
>> Subject: Re: [PATCH v2 0/5] LoongArch tls le model linker relaxation support.
>>
>> >On 2023-12-02 14:53, changjiachen wrote:
>> >> This is v2 of the patches adding LoongArch linker relaxation
>> >> support for the tls le model.
>> >>
>> >> Changes from v1:
>> >>
>> >> * Modified part of the explanatory content in
>> >>   v1-0000-cover-letter.patch.
>> >>
>> >> Before modification:
>> >>
>> >> example: __thread int a = 1;
>> >>
>> >> old insn sequence:
>> >>
>> >> lu12i.w $r12,%le_hi20_r(a)
>> >> ori $r12,$r12,%le_lo12_r(a)
>> >> add.d $r12,$r12,$r2,%le_add_r(a)
>> >> li.w $r13,$r0,1
>> >> stptr.w $r13,$r12,0
>> >>
>> >> new insn sequence:
>> >>
>> >> lu12i.w $r12,%le_hi20_r(a)
>> >> add.d $r12,$r12,$r2,%le_add_r(a)
>> >> li.w $r13,$r0,1
>> >> st.w $r13,$r12,%le_lo12_r(a)
>> >>
>> >> After modification:
>> >>
>> >> example: __thread int a = 1;
>> >>
>> >> old insn sequence(at the O0 optimization level):
>> >
>> >If the sequence appears only at -O0, is it worth optimizing by
>> >relaxation?
>> >
>> >
>> >>
>> >> lu12i.w $r12,%le_hi20(a)
>> >> ori $r12,$r12,%le_lo12(a)
>> >> add.d $r12,$r12,$r2
>> >> addi.w $r13,$r0,1
>> >> stptr.w $r13,$r12,0
>> >>
>> >> new insn sequence(at the O0 optimization level):
>> >>
>> >> lu12i.w $r12,%le_hi20_r(a)
>> >> add.d $r12,$r12,$r2,%le_add_r(a)
>> >
>> >And here, if the sequence appears at other optimization levels,
>> >will the register value ($r12) differing between the old sequence
>> >and the new sequence cause other problems, e.g. a worse sequence?
>> >Have you tried this relaxation at other optimization levels?
>> >
>> >
>> >Thanks.
>> >
>> >> addi.w $r13,$r0,1
>> >> st.w $r13,$r12,%le_lo12_r(a)
>> >>
>> >> changjiachen (5):
>> >>   LoongArch: bfd: Add support for tls le relax.
>> >>   LoongArch: include: Add support for tls le relax.
>> >>   LoongArch: opcodes: Add support for tls le relax.
>> >>   LoongArch: gas: Add support for tls le relax.
>> >>   LoongArch: ld: Add support for tls le relax.
>> >>
>> >>  bfd/bfd-in2.h                                |   4 +
>> >>  bfd/elfnn-loongarch.c                        |  74 +++++++++
>> >>  bfd/elfxx-loongarch.c                        |  50 ++++++
>> >>  bfd/libbfd.h                                 |   3 +
>> >>  bfd/reloc.c                                  |   6 +
>> >>  gas/config/tc-loongarch.c                    |  12 +-
>> >>  gas/testsuite/gas/loongarch/reloc.d          |  18 +++
>> >>  gas/testsuite/gas/loongarch/reloc.s          |  11 ++
>> >>  include/elf/loongarch.h                      |  13 ++
>> >>  ld/testsuite/ld-loongarch-elf/old-tls-le.s   |  19 +++
>> >>  .../relax-bound-check-tls-le.s               |  48 ++++++
>> >>  .../ld-loongarch-elf/relax-check-tls-le.s    |  43 ++++++
>> >>  ld/testsuite/ld-loongarch-elf/relax-tls-le.s |  17 ++
>> >>  ld/testsuite/ld-loongarch-elf/relax.exp      | 146 +++++++++++++++++-
>> >>  .../tls-relax-compatible-check-old.s         |  39 +++++
>> >>  opcodes/loongarch-opc.c                      |   1 +
>> >>  16 files changed, 501 insertions(+), 3 deletions(-)
>> >>  create mode 100644 ld/testsuite/ld-loongarch-elf/old-tls-le.s
>> >>  create mode 100644 ld/testsuite/ld-loongarch-elf/relax-bound-check-tls-le.s
>> >>  create mode 100644 ld/testsuite/ld-loongarch-elf/relax-check-tls-le.s
>> >>  create mode 100644 ld/testsuite/ld-loongarch-elf/relax-tls-le.s
>> >>  create mode 100644 ld/testsuite/ld-loongarch-elf/tls-relax-compatible-check-old.s
>> >>
>> >
>> >
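
For reference, the arithmetic behind the %le_hi20_r / %le_lo12_r pair in the new sequence can be sketched in C. This is only an illustration, not code from the patches. It assumes that %le_hi20_r takes bits [31:12] of (offset + 0x800) and that %le_lo12_r takes bits [11:0], which st.w then sign-extends. It also assumes the linker may drop lu12i.w and add.d when the high part is zero. The function names are made up for this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers; "offset" is the TLS LE offset of the variable
   from $tp, assumed to fit in 32 bits. */

/* Assumed value for %le_hi20_r: bits [31:12] of (offset + 0x800).
   Adding 0x800 compensates for the sign extension of the low 12 bits. */
static uint32_t le_hi20_r(uint32_t offset)
{
    return ((offset + 0x800u) >> 12) & 0xfffffu;
}

/* Assumed value for %le_lo12_r: bits [11:0] of the offset. */
static uint32_t le_lo12_r(uint32_t offset)
{
    return offset & 0xfffu;
}

/* Sign-extend a 12-bit immediate, as st.w/ld.w do with their offset. */
static int32_t sext12(uint32_t imm12)
{
    return (int32_t)(imm12 ^ 0x800u) - 0x800;
}

/* Recombine both parts the way lu12i.w + st.w would compute the offset. */
static uint32_t le_recombine(uint32_t offset)
{
    int64_t hi = (int64_t)le_hi20_r(offset) << 12;
    return (uint32_t)(hi + sext12(le_lo12_r(offset)));
}

/* Assumed relax condition: the high part is zero, so the store can use
   $tp directly as its base register. */
static bool tls_le_relaxable(uint32_t offset)
{
    return offset < 0x800u;
}
```

For count1 in the dumps above the offset is 0, so the high part is zero and the store can address it as st.w $t1, $tp, 0, which matches the relaxed output. An offset of 0x800 or more needs a non-zero high part under these assumptions, so lu12i.w and add.d would have to stay.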