* PATCH: Properly generate X32 IE sequence
From: H.J. Lu @ 2012-03-09 22:26 UTC
To: Uros Bizjak; +Cc: gcc-patches, Richard Henderson

On Mon, Mar 5, 2012 at 9:25 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
> On Mon, Mar 5, 2012 at 6:03 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>
>>>> X86-64 linker optimizes TLS_MODEL_INITIAL_EXEC to TLS_MODEL_LOCAL_EXEC
>>>> by checking
>>>>
>>>>     movq foo@gottpoff(%rip), %reg
>>>>
>>>> and
>>>>
>>>>     addq foo@gottpoff(%rip), %reg
>>>>
>>>> It uses the REX prefix to avoid the last byte of the previous
>>>> instruction. With 32bit Pmode, we may not have the REX prefix and
>>>> the last byte of the previous instruction may be an offset, which
>>>> may look like a REX prefix. IE->LE optimization will generate corrupted
>>>> binary. This patch makes sure we always output a REX prefix for
>>>> UNSPEC_GOTNTPOFF. OK for trunk?
>>>
>>> Actually, linker has:
>>>
>>>       case R_X86_64_GOTTPOFF:
>>>         /* Check transition from IE access model:
>>>                 mov foo@gottpoff(%rip), %reg
>>>                 add foo@gottpoff(%rip), %reg
>>>          */
>>>
>>>         /* Check REX prefix first.  */
>>>         if (offset >= 3 && (offset + 4) <= sec->size)
>>>           {
>>>             val = bfd_get_8 (abfd, contents + offset - 3);
>>>             if (val != 0x48 && val != 0x4c)
>>>               {
>>>                 /* X32 may have 0x44 REX prefix or no REX prefix.  */
>>>                 if (ABI_64_P (abfd))
>>>                   return FALSE;
>>>               }
>>>           }
>>>         else
>>>           {
>>>             /* X32 may not have any REX prefix.  */
>>>             if (ABI_64_P (abfd))
>>>               return FALSE;
>>>             if (offset < 2 || (offset + 3) > sec->size)
>>>               return FALSE;
>>>           }
>>>
>>> So, it should handle the case without REX just OK. If it doesn't, then
>>> this is a bug in binutils.
>>>
>>
>> The last byte of the displacement in the previous instruction
>> may happen to look like a REX byte.
>> In that case, linker
>> will overwrite the last byte of the previous instruction and
>> generate the wrong instruction sequence.
>>
>> I need to update linker to enforce the REX byte check.
>
> One important observation: if we want to follow the x86_64 TLS spec
> strictly, we have to use existing DImode patterns only. This also
> means that we should NOT convert other TLS patterns to Pmode, since
> they explicitly state movq and addq. If this is not the case, then we
> need new TLS specification for X32.

Here is a patch to properly generate the X32 IE sequence.

This is a summary of the differences between x86-64 TLS and x32 TLS:

GD
  x86-64: .byte 0x66; leaq foo@tlsgd(%rip),%rdi;
          .word 0x6666; rex64; call __tls_get_addr@plt
  x32:    leaq foo@tlsgd(%rip),%rdi;
          .word 0x6666; rex64; call __tls_get_addr@plt

GD->IE optimization
  x86-64: movq %fs:0,%rax; addq x@gottpoff(%rip),%rax
  x32:    movl %fs:0,%eax; addq x@gottpoff(%rip),%rax

GD->LE optimization
  x86-64: movq %fs:0,%rax; leaq x@tpoff(%rax),%rax
  x32:    movl %fs:0,%eax; leaq x@tpoff(%rax),%rax

LD
  x86-64: leaq foo@tlsld(%rip),%rdi; call __tls_get_addr@plt
  x32:    leaq foo@tlsld(%rip),%rdi; call __tls_get_addr@plt

LD->LE optimization
  x86-64: .word 0x6666; .byte 0x66; movq %fs:0, %rax
  x32:    nopl 0x0(%rax); movl %fs:0, %eax

IE
  x86-64: movq %fs:0,%reg64; addq x@gottpoff(%rip),%reg64
  x32:    movl %fs:0,%reg32; addl x@gottpoff(%rip),%reg32
 or
  x86-64: movq x@gottpoff(%rip),%reg64; movl %fs:(%reg64),%reg32
  x32:    movq x@gottpoff(%rip),%reg64; movl %fs:(%reg64),%reg32
          (not supported if Pmode == SImode)

IE->LE optimization
  x86-64: movq %fs:0,%reg64; addq x@gottpoff(%rip),%reg64
      to: movq %fs:0,%reg64; addq $foo@tpoff,%reg64
      or: movq %fs:0,%reg64; leaq foo@tpoff(%reg64),%reg64
  x32:    movl %fs:0,%reg32; addl x@gottpoff(%rip),%reg32
      to: movl %fs:0,%reg32; addl $foo@tpoff,%reg32
      or: movl %fs:0,%reg32; leal foo@tpoff(%reg32),%reg32
 or
  x86-64: movq x@gottpoff(%rip),%reg64; movl %fs:(%reg64),%reg32
      to: movq $foo@tpoff,%reg64; movl %fs:(%reg64),%reg32
  x32:    movq x@gottpoff(%rip),%reg64; movl %fs:(%reg64),%reg32
      to: movq $foo@tpoff,%reg64; movl %fs:(%reg64),%reg32

LE
  x86-64: movq %fs:0,%reg64; leaq x@tpoff(%reg64),%reg64
  x32:    movl %fs:0,%reg32; leal x@tpoff(%reg32),%reg32
 or
  x86-64: movq %fs:0,%reg64; addq $x@tpoff,%reg64
  x32:    movl %fs:0,%reg32; addl $x@tpoff,%reg32
 or
  x86-64: movq %fs:0,%reg64; movl x@tpoff(%reg64),%reg32
  x32:    movl %fs:0,%reg32; movl x@tpoff(%reg32),%reg32
 or
  x86-64: movl %fs:x@tpoff,%reg32
  x32:    movl %fs:x@tpoff,%reg32

The x32 TLS implementation is straightforward, except for IE:

1. Since the address-size override applies only to the (reg32) part of
   fs:(reg32), fs:(reg32) can't be used as a memory operand. This patch
   changes ix86_decompose_address to disallow fs:(reg) if
   Pmode != word_mode.
2. When Pmode == SImode, there may be no REX prefix for ADD. Avoid any
   instructions between MOV and ADD, which may interfere with the
   linker's IE->LE optimization, since the last byte of the instruction
   preceding ADD may look like a REX prefix. This patch adds
   tls_initial_exec_x32 to make sure that we always have

       movl %fs:0, %reg32
       addl x@gottpoff(%rip), %reg32

   so that the last byte of the instruction before ADD will never be a
   REX byte.

Tested on Linux/x32.

--
H.J.
--
2012-03-09  H.J. Lu  <hongjiu.lu@intel.com>

	* config/i386/i386.c (ix86_decompose_address): Disallow fs:(reg)
	if Pmode != word_mode.
	(legitimize_tls_address): Call gen_tls_initial_exec_x32 if
	Pmode == SImode for x32.

	* config/i386/i386.md (UNSPEC_TLS_IE_X32): New.
	(tls_initial_exec_x32): Likewise.

[-- Attachment #2: gcc-x32-tls-1.patch --]

2012-03-09  H.J. Lu  <hongjiu.lu@intel.com>

	* config/i386/i386.c (ix86_decompose_address): Disallow fs:(reg)
	if Pmode != word_mode.
	(legitimize_tls_address): Call gen_tls_initial_exec_x32 if
	Pmode == SImode for x32.

	* config/i386/i386.md (UNSPEC_TLS_IE_X32): New.
	(tls_initial_exec_x32): Likewise.
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 15465c2..312b50c 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -11524,6 +11534,11 @@ ix86_decompose_address (rtx addr, struct ix86_address *out)
   else
     disp = addr;                        /* displacement */
 
+  /* Since address override works only on the (reg32) part in fs:(reg32),
+     we can't use it as memory operand.  */
+  if (Pmode != word_mode && seg == SEG_FS && (base || index))
+    return 0;
+
   if (index)
     {
       if (REG_P (index))
@@ -12618,6 +12643,17 @@ legitimize_tls_address (rtx x, enum tls_model model, bool for_mov)
           emit_insn (gen_tls_initial_exec_64_sun (dest, x));
           return dest;
         }
+      else if (Pmode == SImode)
+        {
+          /* Always generate
+                movl %fs:0, %reg32
+                addl xgottpoff(%rip), %reg32
+             to support linker IE->LE optimization and avoid
+             fs:(%reg32) as memory operand.  */
+          dest = gen_reg_rtx (Pmode);
+          emit_insn (gen_tls_initial_exec_x32 (dest, x));
+          return dest;
+        }
 
       pic = NULL;
       type = UNSPEC_GOTNTPOFF;
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 188c982..d1fa997 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -95,6 +95,7 @@
   UNSPEC_TLS_LD_BASE
   UNSPEC_TLSDESC
   UNSPEC_TLS_IE_SUN
+  UNSPEC_TLS_IE_X32
 
   ;; Other random patterns
   UNSPEC_SCAS
@@ -12775,6 +12776,28 @@
 }
   [(set_attr "type" "multi")])
 
+;; When Pmode == SImode, there may be no REX prefix for ADD.  Avoid
+;; any instructions between MOV and ADD, which may interfere linker
+;; IE->LE optimization, since the last byte of the previous instruction
+;; before ADD may look like a REX prefix.  This also avoids
+;;      movl x@gottpoff(%rip), %reg32
+;;      movl $fs:(%reg32), %reg32
+;; Since address override works only on the (reg32) part in fs:(reg32),
+;; we can't use it as memory operand.
+(define_insn "tls_initial_exec_x32"
+  [(set (match_operand:SI 0 "register_operand" "=r")
+        (unspec:SI
+         [(match_operand:SI 1 "tls_symbolic_operand" "")]
+         UNSPEC_TLS_IE_X32))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_X32"
+{
+  output_asm_insn
+    ("mov{l}\t{%%fs:0, %0|%0, DWORD PTR fs:0}", operands);
+  return "add{l}\t{%a1@gottpoff(%%rip), %0|%0, %a1@gottpoff[rip]}";
+}
+  [(set_attr "type" "multi")])
+
 ;; GNU2 TLS patterns can be split.
 
 (define_expand "tls_dynamic_gnu2_32"
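The failure described in point 2 can be reproduced with a few hand-assembled byte sequences. The C sketch below is not the BFD code; it only models the single-byte test at `contents + offset - 3` from the quoted `R_X86_64_GOTTPOFF` check. The helper name `rex_w_before_gottpoff`, the choice of `%eax` as the destination, and `movl 0x48(%rbx), %ecx` as an unrelated preceding instruction are illustrative assumptions, and the encodings were worked out by hand.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative model of the single-byte test in the quoted
   R_X86_64_GOTTPOFF check (not the BFD code itself).  `offset` is the
   position of the 4-byte GOTTPOFF field; for "addq x@gottpoff(%rip), %reg"
   the REX.W prefix sits three bytes before it.  */
static bool rex_w_before_gottpoff(const uint8_t *contents, size_t offset)
{
    assert(offset >= 3);
    uint8_t val = contents[offset - 3];
    return val == 0x48 || val == 0x4c;   /* REX.W or REX.WR */
}

/* x86-64: addq x@gottpoff(%rip), %rax = 48 03 05 <disp32>; field at 3. */
static const uint8_t lp64_add[] = {
    0x48, 0x03, 0x05, 0x00, 0x00, 0x00, 0x00
};

/* x32, not fused: movl 0x48(%rbx), %ecx = 8b 4b 48, then
   addl x@gottpoff(%rip), %eax = 03 05 <disp32>; field at 5.
   The movl's 8-bit displacement (0x48) sits where a REX.W would be.  */
static const uint8_t x32_unfused[] = {
    0x8b, 0x4b, 0x48,
    0x03, 0x05, 0x00, 0x00, 0x00, 0x00
};

/* x32, tls_initial_exec_x32: movl %fs:0, %eax = 64 8b 04 25 00 00 00 00,
   then addl x@gottpoff(%rip), %eax; field at 10.  The byte before the
   ADD opcode is the last byte of the zero displacement.  */
static const uint8_t x32_fused[] = {
    0x64, 0x8b, 0x04, 0x25, 0x00, 0x00, 0x00, 0x00,
    0x03, 0x05, 0x00, 0x00, 0x00, 0x00
};
```

For `x32_unfused` (field at offset 5) the check reads 0x48, the `movl` displacement byte, and would treat the pair as a REX.W `addq`, so an IE->LE rewrite would clobber the preceding instruction. For `x32_fused` (field at offset 10) it reads the 0x00 tail of the `%fs:0` displacement, which is never a REX byte.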
* Re: PATCH: Properly generate X32 IE sequence
From: Uros Bizjak @ 2012-03-10 13:10 UTC
To: H.J. Lu; +Cc: gcc-patches, Richard Henderson

On Fri, Mar 9, 2012 at 11:26 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> Here is a patch to properly generate X32 IE sequence.
> [...]
> 2012-03-09  H.J. Lu  <hongjiu.lu@intel.com>
>
> 	* config/i386/i386.c (ix86_decompose_address): Disallow fs:(reg)
> 	if Pmode != word_mode.
> 	(legitimize_tls_address): Call gen_tls_initial_exec_x32 if
> 	Pmode == SImode for x32.
>
> 	* config/i386/i386.md (UNSPEC_TLS_IE_X32): New.
> 	(tls_initial_exec_x32): Likewise.

Nice solution!

OK for mainline.

BTW: Did you investigate the issue with memory aliasing?

Thanks,
Uros.
* Re: PATCH: Properly generate X32 IE sequence
From: H.J. Lu @ 2012-03-10 18:50 UTC
To: Uros Bizjak; +Cc: gcc-patches, Richard Henderson

On Sat, Mar 10, 2012 at 5:09 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
> On Fri, Mar 9, 2012 at 11:26 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> Here is a patch to properly generate X32 IE sequence.
>> [...]
>
> Nice solution!
>
> OK for mainline.

Done.

> BTW: Did you investigate the issue with memory aliasing?
>

It isn't a problem since it is wrapped in UNSPEC_TLS_IE_X32
which loads address of the TLS symbol.

Thanks.

--
H.J.
* Re: PATCH: Properly generate X32 IE sequence
From: H.J. Lu @ 2012-03-11 17:12 UTC
To: Uros Bizjak; +Cc: gcc-patches, Richard Henderson

On Sat, Mar 10, 2012 at 10:49 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Sat, Mar 10, 2012 at 5:09 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
>> BTW: Did you investigate the issue with memory aliasing?
>>
>
> It isn't a problem since it is wrapped in UNSPEC_TLS_IE_X32
> which loads address of the TLS symbol.
>
> Thanks.
>

Since we must use reg64 in the %fs:(%reg) memory operand, as in

	movq x@gottpoff(%rip),%reg64;
	mov %fs:(%reg64),%reg

this patch optimizes x32 TLS IE load and store by wrapping %reg64
inside of UNSPEC when Pmode == SImode. OK for trunk?

Thanks.

--
H.J.
---
2012-03-11  H.J. Lu  <hongjiu.lu@intel.com>

	* config/i386/i386.md (*tls_initial_exec_x32_load): New.
	(*tls_initial_exec_x32_store): Likewise.

[-- Attachment #2: gcc-x32-tls-2.patch --]

2012-03-11  H.J. Lu  <hongjiu.lu@intel.com>

	* config/i386/i386.md (*tls_initial_exec_x32_load): New.
	(*tls_initial_exec_x32_store): Likewise.
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index ae1dd1c..67441cd 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -12806,6 +12806,41 @@
 }
   [(set_attr "type" "multi")])
 
+(define_insn "*tls_initial_exec_x32_load"
+  [(set (match_operand:SWI1248x 0 "register_operand" "=r")
+        (mem:SWI1248x
+          (unspec:SI
+            [(match_operand:SI 1 "tls_symbolic_operand" "")]
+            UNSPEC_TLS_IE_X32)))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_X32"
+{
+  output_asm_insn
+    ("mov{q}\t{%a1@gottpoff(%%rip), %q0|%q0, %a1@gottpoff[rip]}",
+     operands);
+  if (!TARGET_MOVX || <MODE>mode == DImode || <MODE>mode == SImode)
+    return "mov{<imodesuffix>}\t{%%fs:(%q0), %0|%0, <iptrsize> PTR fs:[%q0]}";
+  return "movz{<imodesuffix>l|x}\t{%%fs:(%q0), %k0|%k0, <iptrsize> PTR fs:[%q0]}";
+}
+  [(set_attr "type" "multi")])
+
+(define_insn "*tls_initial_exec_x32_store"
+  [(set (mem:SWI1248x
+          (unspec:SI
+            [(match_operand:SI 0 "tls_symbolic_operand" "")]
+            UNSPEC_TLS_IE_X32))
+        (match_operand:SWI1248x 1 "register_operand" "r"))
+   (clobber (match_scratch:DI 2 "=&r"))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_X32"
+{
+  output_asm_insn
+    ("mov{q}\t{%a0@gottpoff(%%rip), %q2|%q2, %a0@gottpoff[rip]}",
+     operands);
+  return "mov{<imodesuffix>}\t{%1, %%fs:(%q2)|<iptrsize> PTR fs:[%q2], %1}";
+}
+  [(set_attr "type" "multi")])
+
 ;; GNU2 TLS patterns can be split.
 
 (define_expand "tls_dynamic_gnu2_32"
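The second instruction of `*tls_initial_exec_x32_load` picks its mnemonic from the operand mode and the `TARGET_MOVX` tuning flag. The small C model below mirrors that choice for the AT&T spelling only; the enum and function names are hypothetical, and `target_movx` merely stands in for `TARGET_MOVX`. It is a reading aid for the output template, not GCC code.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins for the modes covered by the SWI1248x iterator. */
enum int_mode { QI_MODE, HI_MODE, SI_MODE, DI_MODE };

/* Mirrors the output code of *tls_initial_exec_x32_load for its second
   instruction, the load through %fs:(%q0).  `target_movx` stands in for
   TARGET_MOVX.  */
static const char *x32_ie_load_mnemonic(enum int_mode m, bool target_movx)
{
    if (!target_movx || m == DI_MODE || m == SI_MODE) {
        /* "mov{<imodesuffix>}": a plain load of the operand's own size. */
        switch (m) {
        case QI_MODE: return "movb";
        case HI_MODE: return "movw";
        case SI_MODE: return "movl";
        case DI_MODE: return "movq";
        }
        return NULL;
    }
    /* "movz{<imodesuffix>l|x}": zero-extend a QImode/HImode value into
       the 32-bit register named by %k0.  */
    assert(m == QI_MODE || m == HI_MODE);
    return m == QI_MODE ? "movzbl" : "movzwl";
}

/* True if the chosen mnemonic is one of the zero-extending loads. */
static bool is_zero_extending(const char *mnemonic)
{
    return strncmp(mnemonic, "movz", 4) == 0;
}
```

So with `TARGET_MOVX` set, a QImode load comes out as something like `movzbl %fs:(%rax), %eax`, writing the whole 32-bit register instead of a partial one; SImode and DImode loads, and all loads when the flag is clear, use a plain sized `mov`.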
* Re: PATCH: Properly generate X32 IE sequence 2012-03-11 17:12 ` H.J. Lu @ 2012-03-11 17:55 ` Uros Bizjak 2012-03-11 18:16 ` H.J. Lu 2012-03-17 18:10 ` Uros Bizjak 1 sibling, 1 reply; 43+ messages in thread From: Uros Bizjak @ 2012-03-11 17:55 UTC (permalink / raw) To: H.J. Lu; +Cc: gcc-patches, Richard Henderson On Sun, Mar 11, 2012 at 6:11 PM, H.J. Lu <hjl.tools@gmail.com> wrote: >>>>>>>> X86-64 linker optimizes TLS_MODEL_INITIAL_EXEC to TLS_MODEL_LOCAL_EXEC >>>>>>>> by checking >>>>>>>> >>>>>>>> movq foo@gottpoff(%rip), %reg >>>>>>>> >>>>>>>> and >>>>>>>> >>>>>>>> addq foo@gottpoff(%rip), %reg >>>>>>>> >>>>>>>> It uses the REX prefix to avoid the last byte of the previous >>>>>>>> instruction. With 32bit Pmode, we may not have the REX prefix and >>>>>>>> the last byte of the previous instruction may be an offset, which >>>>>>>> may look like a REX prefix. IE->LE optimization will generate corrupted >>>>>>>> binary. This patch makes sure we always output an REX pfrefix for >>>>>>>> UNSPEC_GOTNTPOFF. OK for trunk? >>>>>>> >>>>>>> Actually, linker has: >>>>>>> >>>>>>> case R_X86_64_GOTTPOFF: >>>>>>> /* Check transition from IE access model: >>>>>>> mov foo@gottpoff(%rip), %reg >>>>>>> add foo@gottpoff(%rip), %reg >>>>>>> */ >>>>>>> >>>>>>> /* Check REX prefix first. */ >>>>>>> if (offset >= 3 && (offset + 4) <= sec->size) >>>>>>> { >>>>>>> val = bfd_get_8 (abfd, contents + offset - 3); >>>>>>> if (val != 0x48 && val != 0x4c) >>>>>>> { >>>>>>> /* X32 may have 0x44 REX prefix or no REX prefix. */ >>>>>>> if (ABI_64_P (abfd)) >>>>>>> return FALSE; >>>>>>> } >>>>>>> } >>>>>>> else >>>>>>> { >>>>>>> /* X32 may not have any REX prefix. */ >>>>>>> if (ABI_64_P (abfd)) >>>>>>> return FALSE; >>>>>>> if (offset < 2 || (offset + 3) > sec->size) >>>>>>> return FALSE; >>>>>>> } >>>>>>> >>>>>>> So, it should handle the case without REX just OK. If it doesn't, then >>>>>>> this is a bug in binutils. 
>>>>>>> >>>>>> >>>>>> The last byte of the displacement in the previous instruction >>>>>> may happen to look like a REX byte. In that case, linker >>>>>> will overwrite the last byte of the previous instruction and >>>>>> generate the wrong instruction sequence. >>>>>> >>>>>> I need to update linker to enforce the REX byte check. >>>>> >>>>> One important observation: if we want to follow the x86_64 TLS spec >>>>> strictly, we have to use existing DImode patterns only. This also >>>>> means that we should NOT convert other TLS patterns to Pmode, since >>>>> they explicitly state movq and addq. If this is not the case, then we >>>>> need new TLS specification for X32. >>>> >>>> Here is a patch to properly generate X32 IE sequence. >>>> >>>> This is the summary of differences between x86-64 TLS and x32 TLS: >>>> >>>> x86-64 x32 >>>> GD >>>> byte 0x66; leaq foo@tlsgd(%rip),%rdi; leaq foo@tlsgd(%rip),%rdi; >>>> .word 0x6666; rex64; call __tls_get_addr@plt .word 0x6666; rex64; >>>> call __tls_get_addr@plt >>>> >>>> GD->IE optimization >>>> movq %fs:0,%rax; addq x@gottpoff(%rip),%rax movl %fs:0,%eax; >>>> addq x@gottpoff(%rip),%rax >>>> >>>> GD->LE optimization >>>> movq %fs:0,%rax; leaq x@tpoff(%rax),%rax movl %fs:0,%eax; >>>> leaq x@tpoff(%rax),%rax >>>> >>>> LD >>>> leaq foo@tlsld(%rip),%rdi; leaq foo@tlsld(%rip),%rdi; >>>> call __tls_get_addr@plt call __tls_get_addr@plt >>>> >>>> LD->LE optimization >>>> .word 0x6666; .byte 0x66; movq %fs:0, %rax nopl 0x0(%rax); movl >>>> %fs:0, %eax >>>> >>>> IE >>>> movq %fs:0,%reg64; movl %fs:0,%reg32; >>>> addq x@gottpoff(%rip),%reg64 addl x@gottpoff(%rip),%reg32 >>>> >>>> or >>>> Not supported if >>>> Pmode == SImode >>>> movq x@gottpoff(%rip),%reg64; movq x@gottpoff(%rip),%reg64; >>>> movq %fs:(%reg64),%reg32 movl %fs:(%reg64), %reg32 >>>> >>>> IE->LE optimization >>>> >>>> movq %fs:0,%reg64; movl %fs:0,%reg32; >>>> addq x@gottpoff(%rip),%reg64 addl x@gottpoff(%rip),%reg32 >>>> >>>> to >>>> >>>> movq %fs:0,%reg64; movl 
%fs:0,%reg32; >>>> addq foo@tpoff, %reg64 addl foo@tpoff, %reg32 >>>> >>>> movq %fs:0,%reg64; movl %fs:0,%reg32; >>>> leaq foo@tpoff(%reg64), %reg64 leal foo@tpoff(%reg32), %reg32 >>>> >>>> or >>>> >>>> movq x@gottpoff(%rip),%reg64 movq x@gottpoff(%rip),%reg64; >>>> movl %fs:(%reg64),%reg32 movl %fs:(%reg64), %reg32 >>>> >>>> to >>>> >>>> movq foo@tpoff, %reg64 movq foo@tpoff, %reg64 >>>> movl %fs:(%reeg64),%reg32 movl %fs:(%reg64), %reg32 >>>> >>>> LE >>>> movq %fs:0,%reg64; movl %fs:0,%reg32; >>>> leaq x@tpoff(%reg64),%reg32 leal x@tpoff(%reg32),%reg32 >>>> >>>> or >>>> >>>> movq %fs:0,%reg64; movl %fs:0,%reg32; >>>> addq $x@tpoff,%reg64 addl $x@tpoff,%reg32 >>>> >>>> or >>>> >>>> movq %fs:0,%reg64; movl %fs:0,%reg32; >>>> movl x@tpoff(%reg64),%reg32 movl x@tpoff(%reg32),%reg32 >>>> >>>> or >>>> >>>> movl %fs:x@tpoff,%reg32 movl %fs:x@tpoff,%reg32 >>>> >>>> >>>> X32 TLS implementation is straight forward, except for IE: >>>> >>>> 1. Since address override works only on the (reg32) part in fs:(reg32), >>>> we can't use it as memory operand. This patch changes ix86_decompose_address >>>> to disallow fs:(reg) if Pmode != word_mode. >>>> 2. When Pmode == SImode, there may be no REX prefix for ADD. Avoid >>>> any instructions between MOV and ADD, which may interfere linker >>>> IE->LE optimization, since the last byte of the previous instruction >>>> before ADD may look like a REX prefix. This patch adds tls_initial_exec_x32 >>>> to make sure that we always have >>>> >>>> movl %fs:0, %reg32 >>>> addl xgottpoff(%rip), %reg32 >>>> >>>> so that the last byte of the previous instruction before ADD will >>>> never be a REX byte. Tested on Linux/x32. >>>> >>>> 2012-03-09 H.J. Lu <hongjiu.lu@intel.com> >>>> >>>> * config/i386/i386.c (ix86_decompose_address): Disallow fs:(reg) >>>> if Pmode != word_mode. >>>> (legitimize_tls_address): Call gen_tls_initial_exec_x32 if >>>> Pmode == SImode for x32. >>>> >>>> * config/i386/i386.md (UNSPEC_TLS_IE_X32): New. 
>>>> (tls_initial_exec_x32): Likewise. >>> >>> Nice solution! >>> >>> OK for mainline. >> >> Done. >> >>> BTW: Did you investigate the issue with memory aliasing? >>> >> >> It isn't a problem since it is wrapped in UNSPEC_TLS_IE_X32 >> which loads address of the TLS symbol. >> >> Thanks. >> > > Since we must use reg64 in %fs:(%reg) memory operand like > > movq x@gottpoff(%rip),%reg64; > mov %fs:(%reg64),%reg > > this patch optimizes x32 TLS IE load and store by wrapping > %reg64 inside of UNSPEC when Pmode == SImode. OK for > trunk?

I think we should just scrap all these complications and go with the
idea of clearing MASK_TLS_DIRECT_SEG_REFS.

Uros.
* Re: PATCH: Properly generate X32 IE sequence
From: H.J. Lu @ 2012-03-11 18:16 UTC
To: Uros Bizjak; +Cc: gcc-patches, Richard Henderson

On Sun, Mar 11, 2012 at 10:55 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
> On Sun, Mar 11, 2012 at 6:11 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>
>> Since we must use reg64 in %fs:(%reg) memory operand like
>>
>> movq x@gottpoff(%rip),%reg64;
>> mov %fs:(%reg64),%reg
>>
>> this patch optimizes x32 TLS IE load and store by wrapping
>> %reg64 inside of UNSPEC when Pmode == SImode. OK for
>> trunk?
>
> I think we should just scrap all these complications and go with the
> idea of clearing MASK_TLS_DIRECT_SEG_REFS.
>

I will give it a try.

--
H.J.
* Re: PATCH: Properly generate X32 IE sequence
From: Uros Bizjak @ 2012-03-11 18:21 UTC
To: H.J. Lu; +Cc: gcc-patches, Richard Henderson

On Sun, Mar 11, 2012 at 7:16 PM, H.J. Lu <hjl.tools@gmail.com> wrote:

>> I think we should just scrap all these complications and go with the
>> idea of clearing MASK_TLS_DIRECT_SEG_REFS.
>>
>
> I will give it a try.

You can also revert:

>>>>>> * config/i386/i386.c (ix86_decompose_address): Disallow fs:(reg)
>>>>>> if Pmode != word_mode.

then, since this part is handled later in the function.

Uros.
* Re: PATCH: Properly generate X32 IE sequence
From: H.J. Lu @ 2012-03-11 21:25 UTC
To: Uros Bizjak; +Cc: gcc-patches, Richard Henderson

On Sun, Mar 11, 2012 at 11:21 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
> On Sun, Mar 11, 2012 at 7:16 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>
>> I will give it a try.
>
> You can also revert:
>
>>>>>>> * config/i386/i386.c (ix86_decompose_address): Disallow fs:(reg)
>>>>>>> if Pmode != word_mode.
>
> then, since this part is handled later in the function.
>

Here is the patch, which is equivalent to clearing MASK_TLS_DIRECT_SEG_REFS
when Pmode != word_mode. We need to keep

      else if (Pmode == SImode)
        {
          /* Always generate
                movl %fs:0, %reg32
                addl x@gottpoff(%rip), %reg32
             to support linker IE->LE optimization and avoid
             fs:(%reg32) as memory operand.  */
          dest = gen_reg_rtx (Pmode);
          emit_insn (gen_tls_initial_exec_x32 (dest, x));
          return dest;
        }

to support linker IE->LE optimization. TARGET_TLS_DIRECT_SEG_REFS only
affects TLS LE access, and fs:(%reg) is only generated by combine.

So the main impact of disabling TARGET_TLS_DIRECT_SEG_REFS is to disable
the fs:immediate memory operand for TLS LE access, which doesn't have any
problems to begin with.

I would prefer to keep TARGET_TLS_DIRECT_SEG_REFS and disable only
fs:(%reg), which is generated by combine.

--
H.J.
--
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index b101922..1ffcc85 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -11478,6 +11478,7 @@ ix86_decompose_address (rtx addr, struct ix86_address *out)
 
         case UNSPEC:
           if (XINT (op, 1) == UNSPEC_TP
+              && Pmode == word_mode
               && TARGET_TLS_DIRECT_SEG_REFS
               && seg == SEG_DEFAULT)
             seg = TARGET_64BIT ? SEG_FS : SEG_GS;
@@ -11534,11 +11535,6 @@ ix86_decompose_address (rtx addr, struct ix86_address *out)
   else
     disp = addr;                        /* displacement */
 
-  /* Since address override works only on the (reg32) part in fs:(reg32),
-     we can't use it as memory operand.  */
-  if (Pmode != word_mode && seg == SEG_FS && (base || index))
-    return 0;
-
   if (index)
     {
       if (REG_P (index))
@@ -12706,7 +12702,9 @@ legitimize_tls_address (rtx x, enum tls_model model, bool for_mov)
 
       if (TARGET_64BIT || TARGET_ANY_GNU_TLS)
         {
-          base = get_thread_pointer (for_mov || !TARGET_TLS_DIRECT_SEG_REFS);
+          base = get_thread_pointer (for_mov
+                                     || Pmode != word_mode
+                                     || !TARGET_TLS_DIRECT_SEG_REFS);
           return gen_rtx_PLUS (Pmode, base, off);
         }
       else
@@ -13239,7 +13237,7 @@ ix86_delegitimize_tls_address (rtx orig_x)
   rtx x = orig_x, unspec;
   struct ix86_address addr;
 
-  if (!TARGET_TLS_DIRECT_SEG_REFS)
+  if (Pmode != word_mode || !TARGET_TLS_DIRECT_SEG_REFS)
     return orig_x;
   if (MEM_P (x))
     x = XEXP (x, 0);
* Re: PATCH: Properly generate X32 IE sequence
From: Uros Bizjak @ 2012-03-12 19:39 UTC
To: H.J. Lu; +Cc: gcc-patches, Richard Henderson

[-- Attachment #1: Type: text/plain, Size: 1230 bytes --]

On Sun, Mar 11, 2012 at 10:24 PM, H.J. Lu <hjl.tools@gmail.com> wrote:

> Here is the patch which is equivalent to clearing MASK_TLS_DIRECT_SEG_REFS
> when Pmode != word_mode. We need to keep
>
>       else if (Pmode == SImode)
>         {
>           /* Always generate
>                 movl %fs:0, %reg32
>                 addl xgottpoff(%rip), %reg32
>              to support linker IE->LE optimization and avoid
>              fs:(%reg32) as memory operand.  */
>           dest = gen_reg_rtx (Pmode);
>           emit_insn (gen_tls_initial_exec_x32 (dest, x));
>           return dest;
>         }
>
> to support linker IE->LE optimization. TARGET_TLS_DIRECT_SEG_REFS only affects
> TLS LE access and fs:(%reg) is only generated by combine.
>
> So the main impact of disabling TARGET_TLS_DIRECT_SEG_REFS is to disable
> fs:immediate memory operand for TLS LE access, which doesn't have any problems
> to begin with.
>
> I would prefer to keep TARGET_TLS_DIRECT_SEG_REFS and disable only
> fs:(%reg), which is generated by combine.

Please try the attached patch. It introduces TARGET_TLS_INDIRECT_SEG_REFS
to block only indirect seg references.

Uros.

[-- Attachment #2: p.diff.txt --]
[-- Type: text/plain, Size: 2325 bytes --]

Index: i386.c
===================================================================
--- i386.c      (revision 185250)
+++ i386.c      (working copy)
@@ -11552,11 +11552,6 @@ ix86_decompose_address (rtx addr, struct ix86_addr
   else
     disp = addr;                        /* displacement */
 
-  /* Since address override works only on the (reg32) part in fs:(reg32),
-     we can't use it as memory operand.  */
-  if (Pmode != word_mode && seg == SEG_FS && (base || index))
-    return 0;
-
   if (index)
     {
       if (REG_P (index))
@@ -11568,6 +11563,10 @@ ix86_decompose_address (rtx addr, struct ix86_addr
         return 0;
     }
 
+  if (seg != SEG_DEFAULT && (base || index)
+      && !TARGET_TLS_INDIRECT_SEG_REFS)
+    return 0;
+
   /* Extract the integral value of scale.  */
   if (scale_rtx)
     {
@@ -12696,7 +12695,9 @@ legitimize_tls_address (rtx x, enum tls_model mode
 
       if (TARGET_64BIT || TARGET_ANY_GNU_TLS)
         {
-          base = get_thread_pointer (for_mov || !TARGET_TLS_DIRECT_SEG_REFS);
+          base = get_thread_pointer (for_mov
+                                     || !(TARGET_TLS_DIRECT_SEG_REFS
+                                          && TARGET_TLS_INDIRECT_SEG_REFS));
           off = force_reg (Pmode, off);
           return gen_rtx_PLUS (Pmode, base, off);
         }
@@ -12716,7 +12717,9 @@ legitimize_tls_address (rtx x, enum tls_model mode
 
       if (TARGET_64BIT || TARGET_ANY_GNU_TLS)
         {
-          base = get_thread_pointer (for_mov || !TARGET_TLS_DIRECT_SEG_REFS);
+          base = get_thread_pointer (for_mov
+                                     || !(TARGET_TLS_DIRECT_SEG_REFS
+                                          && TARGET_TLS_INDIRECT_SEG_REFS));
           return gen_rtx_PLUS (Pmode, base, off);
         }
       else
@@ -13249,7 +13252,8 @@ ix86_delegitimize_tls_address (rtx orig_x)
   rtx x = orig_x, unspec;
   struct ix86_address addr;
 
-  if (!TARGET_TLS_DIRECT_SEG_REFS)
+  if (!(TARGET_TLS_DIRECT_SEG_REFS
+        && TARGET_TLS_INDIRECT_SEG_REFS))
     return orig_x;
   if (MEM_P (x))
     x = XEXP (x, 0);
Index: i386.h
===================================================================
--- i386.h      (revision 185250)
+++ i386.h      (working copy)
@@ -467,6 +467,9 @@ extern int x86_prefetch_sse;
 #define TARGET_TLS_DIRECT_SEG_REFS_DEFAULT 0
 #endif
 
+/* Address override works only on the (%reg) part in %fs:(%reg).  */
+#define TARGET_TLS_INDIRECT_SEG_REFS (Pmode == word_mode)
+
 /* Fence to use after loop using storent.  */
 extern tree x86_mfence;
 
* Re: PATCH: Properly generate X32 IE sequence
From: H.J. Lu @ 2012-03-12 22:35 UTC
To: Uros Bizjak; +Cc: gcc-patches, Richard Henderson

On Mon, Mar 12, 2012 at 12:39 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
> On Sun, Mar 11, 2012 at 10:24 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>
>> I would prefer to keep TARGET_TLS_DIRECT_SEG_REFS and disable only
>> fs:(%reg), which is generated by combine.
>
> Please try attached patch. It introduces TARGET_TLS_INDIRECT_SEG_REFS
> to block only indirect seg references.
>
> Uros.

I am testing it.

--
H.J.
* Re: PATCH: Properly generate X32 IE sequence
From: H.J. Lu @ 2012-03-13 1:21 UTC
To: Uros Bizjak; +Cc: gcc-patches, Richard Henderson

On Mon, Mar 12, 2012 at 3:35 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Mon, Mar 12, 2012 at 12:39 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
>> Please try attached patch. It introduces TARGET_TLS_INDIRECT_SEG_REFS
>> to block only indirect seg references.
>>
>> Uros.
>
> I am testing it.
>

There is no regression.

BTW, this x32 TLS IE optimization:

http://gcc.gnu.org/ml/gcc-patches/2012-03/msg00714.html

is still useful. For

[hjl@gnu-6 tls]$ cat ie2.i
extern __thread long long int x;
extern long long int y;
void
ie2 (void)
{
  x = y;
}
[hjl@gnu-6 tls]$

my patch turns

ie2:
.LFB0:
	.cfi_startproc
	movq	y(%rip), %rdx	# 6	*movdi_internal_rex64/2	[length = 7]
	movl	%fs:0, %eax	# 5	tls_initial_exec_x32	[length = 16]
	addl	x@gottpoff(%rip), %eax
	movq	%rdx, (%eax)	# 7	*movdi_internal_rex64/4	[length = 3]
	ret	# 14	simple_return_internal	[length = 1]
	.cfi_endproc

into

ie2:
.LFB0:
	.cfi_startproc
	movq	y(%rip), %rax	# 6	*movdi_internal_rex64/2	[length = 7]
	movq	x@gottpoff(%rip), %rdx	# 7	*tls_initial_exec_x32_store	[length = 16]
	movq	%rax, %fs:(%rdx)
	ret	# 14	simple_return_internal	[length = 1]
	.cfi_endproc

--
H.J.
* Re: PATCH: Properly generate X32 IE sequence
From: Uros Bizjak @ 2012-03-13 7:11 UTC
To: H.J. Lu; +Cc: gcc-patches, Richard Henderson

On Tue, Mar 13, 2012 at 2:20 AM, H.J. Lu <hjl.tools@gmail.com> wrote:

>>> Please try attached patch. It introduces TARGET_TLS_INDIRECT_SEG_REFS
>>> to block only indirect seg references.
>
> There is no regression.

Thanks, committed to mainline SVN with the following ChangeLog:

2012-03-13  Uros Bizjak  <ubizjak@gmail.com>

	* config/i386/i386.h (TARGET_TLS_INDIRECT_SEG_REFS): New.
	* config/i386/i386.c (ix86_decompose_address): Use
	TARGET_TLS_INDIRECT_SEG_REFS to prevent %fs:(%reg) addresses.
	(legitimize_tls_address): Use TARGET_TLS_INDIRECT_SEG_REFS to load
	thread pointer to a register.

Tested on x86_64-pc-linux-gnu {,-m32}.

> BTW, this x32 TLS IE optimization:
>
> movq %rax, %fs:(%rdx)

This is just looking for trouble. If we said these addresses are
invalid, then we shouldn't generate them.

Uros.
* Re: PATCH: Properly generate X32 IE sequence
From: Uros Bizjak @ 2012-03-13 10:37 UTC
To: H.J. Lu; +Cc: gcc-patches, Richard Henderson

On Tue, Mar 13, 2012 at 8:11 AM, Uros Bizjak <ubizjak@gmail.com> wrote:

>> BTW, this x32 TLS IE optimization:
>
> > movq %rax, %fs:(%rdx)
>
> This is just looking for troubles. If we said these addresses are
> invalid, then we shouldn't generate them.

OTOH, we can improve the rejection test a bit to reject only non-word
mode registers.

2012-03-13  Uros Bizjak  <ubizjak@gmail.com>

	* config/i386/i386.c (ix86_decompose_address): Prevent %fs:(%reg)
	addresses only when %reg is not in word mode.

Tested on x86_64-pc-linux-gnu {,-m32}, committed.

Uros.

Index: i386.c
===================================================================
--- i386.c      (revision 185278)
+++ i386.c      (working copy)
@@ -11563,8 +11563,10 @@
         return 0;
     }
 
-  if (seg != SEG_DEFAULT && (base || index)
-      && !TARGET_TLS_INDIRECT_SEG_REFS)
+/* Address override works only on the (%reg) part of %fs:(%reg).  */
+  if (seg != SEG_DEFAULT
+      && ((base && GET_MODE (base) != word_mode)
+          || (index && GET_MODE (index) != word_mode)))
     return 0;
 
   /* Extract the integral value of scale.  */
* Re: PATCH: Properly generate X32 IE sequence
From: H.J. Lu @ 2012-03-13 15:47 UTC
To: Uros Bizjak; +Cc: gcc-patches, Richard Henderson

On Tue, Mar 13, 2012 at 3:37 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
> OTOH, we can improve rejection test a bit to reject only non-word
> mode registers.
>
> -  if (seg != SEG_DEFAULT && (base || index)
> -      && !TARGET_TLS_INDIRECT_SEG_REFS)
> +/* Address override works only on the (%reg) part of %fs:(%reg).  */
> +  if (seg != SEG_DEFAULT
> +      && ((base && GET_MODE (base) != word_mode)
> +          || (index && GET_MODE (index) != word_mode)))
>      return 0;

This works.

--
H.J.
* Re: PATCH: Properly generate X32 IE sequence
From: H.J. Lu @ 2012-03-17 17:53 UTC
To: Uros Bizjak; +Cc: gcc-patches, Richard Henderson

On Tue, Mar 13, 2012 at 3:37 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
> OTOH, we can improve rejection test a bit to reject only non-word
> mode registers.
>
> -  if (seg != SEG_DEFAULT && (base || index)
> -      && !TARGET_TLS_INDIRECT_SEG_REFS)
> +/* Address override works only on the (%reg) part of %fs:(%reg).  */
> +  if (seg != SEG_DEFAULT
> +      && ((base && GET_MODE (base) != word_mode)
> +          || (index && GET_MODE (index) != word_mode)))
>      return 0;

Is my x32 TLS IE optimization:

http://gcc.gnu.org/ml/gcc-patches/2012-03/msg00714.html

OK for trunk?

Thanks.

--
H.J.
* Re: PATCH: Properly generate X32 IE sequence
From: Uros Bizjak @ 2012-03-17 18:10 UTC
To: H.J. Lu; +Cc: gcc-patches, Richard Henderson

On Sun, Mar 11, 2012 at 6:11 PM, H.J. Lu <hjl.tools@gmail.com> wrote:

> Since we must use reg64 in %fs:(%reg) memory operand like
>
> movq x@gottpoff(%rip),%reg64;
> mov %fs:(%reg64),%reg
>
> this patch optimizes x32 TLS IE load and store by wrapping
> %reg64 inside of UNSPEC when Pmode == SImode. OK for
> trunk?
>
> Thanks.
>
> --
> H.J.
> ---
> 2012-03-11  H.J. Lu  <hongjiu.lu@intel.com>
>
>	* config/i386/i386.md (*tls_initial_exec_x32_load): New.
>	(*tls_initial_exec_x32_store): Likewise.

Can you implement this with define_insn_and_split, like
*tls_dynamic_gnu2_combine_32?

Uros.
* Re: PATCH: Properly generate X32 IE sequence
From: H.J. Lu @ 2012-03-17 18:19 UTC
To: Uros Bizjak; +Cc: gcc-patches, Richard Henderson

On Sat, Mar 17, 2012 at 11:10 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
> On Sun, Mar 11, 2012 at 6:11 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>
>> Since we must use reg64 in %fs:(%reg) memory operand like
>>
>>   movq x@gottpoff(%rip),%reg64;
>>   mov %fs:(%reg64),%reg
>>
>> this patch optimizes x32 TLS IE load and store by wrapping
>> %reg64 inside of UNSPEC when Pmode == SImode. OK for
>> trunk?
>>
>> Thanks.
>>
>> --
>> H.J.
>> ---
>> 2012-03-11 H.J. Lu <hongjiu.lu@intel.com>
>>
>>   * config/i386/i386.md (*tls_initial_exec_x32_load): New.
>>   (*tls_initial_exec_x32_store): Likewise.
>
> Can you implement this with define_insn_and_split, like i.e.
> *tls_dynamic_gnu2_combine_32 ?
>

I will give it a try again. Last time when I tried it, GCC didn't
like memory operand in DImode when Pmode == SImode.

--
H.J.
* Re: PATCH: Properly generate X32 IE sequence
From: Uros Bizjak @ 2012-03-17 18:21 UTC
To: H.J. Lu; +Cc: gcc-patches, Richard Henderson

On Sat, Mar 17, 2012 at 7:18 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>> Since we must use reg64 in %fs:(%reg) memory operand like
>>>
>>>   movq x@gottpoff(%rip),%reg64;
>>>   mov %fs:(%reg64),%reg
>>>
>>> this patch optimizes x32 TLS IE load and store by wrapping
>>> %reg64 inside of UNSPEC when Pmode == SImode. OK for
>>> trunk?
>>>
>>> Thanks.
>>>
>>> --
>>> H.J.
>>> ---
>>> 2012-03-11 H.J. Lu <hongjiu.lu@intel.com>
>>>
>>>   * config/i386/i386.md (*tls_initial_exec_x32_load): New.
>>>   (*tls_initial_exec_x32_store): Likewise.
>>
>> Can you implement this with define_insn_and_split, like i.e.
>> *tls_dynamic_gnu2_combine_32 ?
>>
>
> I will give it a try again. Last time when I tried it, GCC didn't
> like memory operand in DImode when Pmode == SImode.

You should remove mode for tls_symbolic_operand predicate.

Uros.
* Re: PATCH: Properly generate X32 IE sequence
From: H.J. Lu @ 2012-03-17 21:50 UTC
To: Uros Bizjak; +Cc: gcc-patches, Richard Henderson

On Sat, Mar 17, 2012 at 11:20 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
> On Sat, Mar 17, 2012 at 7:18 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>
>>>> Since we must use reg64 in %fs:(%reg) memory operand like
>>>>
>>>>   movq x@gottpoff(%rip),%reg64;
>>>>   mov %fs:(%reg64),%reg
>>>>
>>>> this patch optimizes x32 TLS IE load and store by wrapping
>>>> %reg64 inside of UNSPEC when Pmode == SImode. OK for
>>>> trunk?
>>>>
>>>> Thanks.
>>>>
>>>> --
>>>> H.J.
>>>> ---
>>>> 2012-03-11 H.J. Lu <hongjiu.lu@intel.com>
>>>>
>>>>   * config/i386/i386.md (*tls_initial_exec_x32_load): New.
>>>>   (*tls_initial_exec_x32_store): Likewise.
>>>
>>> Can you implement this with define_insn_and_split, like i.e.
>>> *tls_dynamic_gnu2_combine_32 ?
>>>
>>
>> I will give it a try again. Last time when I tried it, GCC didn't
>> like memory operand in DImode when Pmode == SImode.
>
> You should remove mode for tls_symbolic_operand predicate.
>

I am testing this patch. OK for trunk if it passes all tests?

Thanks.

--
H.J.

[-- Attachment #2: gcc-x32-tls-3.patch --]
[-- Type: text/plain, Size: 2938 bytes --]

2012-03-17 H.J. Lu <hongjiu.lu@intel.com>

	* config/i386/i386-protos.h (ix86_split_tls_initial_exec_x32): New.
	* config/i386/i386.c (ix86_split_tls_initial_exec_x32): Likewise.
	* config/i386/i386.md (*tls_initial_exec_x32_load): New.
	(*tls_initial_exec_x32_store): Likewise.
diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h index 630112f..2c4f1ed 100644 --- a/gcc/config/i386/i386-protos.h +++ b/gcc/config/i386/i386-protos.h @@ -213,6 +213,7 @@ extern unsigned int ix86_get_callcvt (const_tree); #endif extern rtx ix86_tls_module_base (void); +extern void ix86_split_tls_initial_exec_x32 (rtx [], enum machine_mode, bool); extern void ix86_expand_vector_init (bool, rtx, rtx); extern void ix86_expand_vector_set (bool, rtx, rtx, int); diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 78a366e..5a9c673 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -12754,6 +12754,28 @@ legitimize_tls_address (rtx x, enum tls_model model, bool for_mov) return dest; } +/* Split x32 TLS IE access in MODE. Split load if LOAD is TRUE, + otherwise split store. */ + +void +ix86_split_tls_initial_exec_x32 (rtx operands[], + enum machine_mode mode, bool load) +{ + rtx base, mem; + rtx off = load ? operands[1] : operands[0]; + off = gen_rtx_UNSPEC (DImode, gen_rtvec (1, off), UNSPEC_GOTNTPOFF); + off = gen_rtx_CONST (DImode, off); + off = gen_const_mem (DImode, off); + set_mem_alias_set (off, ix86_GOT_alias_set ()); + base = gen_rtx_UNSPEC (DImode, gen_rtvec (1, const0_rtx), UNSPEC_TP); + off = gen_rtx_PLUS (DImode, base, force_reg (DImode, off)); + mem = gen_rtx_MEM (mode, off); + if (load) + emit_move_insn (operands[0], mem); + else + emit_move_insn (mem, operands[1]); +} + /* Create or return the unique __imp_DECL dllimport symbol corresponding to symbol DECL. 
*/ diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index eae26ae..78faeec 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -12858,6 +12858,32 @@ } [(set_attr "type" "multi")]) +(define_insn_and_split "*tls_initial_exec_x32_load" + [(set (match_operand:SWI1248x 0 "register_operand" "=r") + (mem:SWI1248x + (unspec:SI + [(match_operand 1 "tls_symbolic_operand" "")] + UNSPEC_TLS_IE_X32))) + (clobber (reg:CC FLAGS_REG))] + "TARGET_X32" + "#" + "" + [(const_int 0)] + "ix86_split_tls_initial_exec_x32 (operands, <MODE>mode, TRUE); DONE;") + +(define_insn_and_split "*tls_initial_exec_x32_store" + [(set (mem:SWI1248x + (unspec:SI + [(match_operand 0 "tls_symbolic_operand" "")] + UNSPEC_TLS_IE_X32)) + (match_operand:SWI1248x 1 "register_operand" "r")) + (clobber (reg:CC FLAGS_REG))] + "TARGET_X32" + "#" + "" + [(const_int 0)] + "ix86_split_tls_initial_exec_x32 (operands, <MODE>mode, FALSE); DONE;") + ;; GNU2 TLS patterns can be split. (define_expand "tls_dynamic_gnu2_32" ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: PATCH: Properly generate X32 IE sequence
From: Uros Bizjak @ 2012-03-18 16:02 UTC
To: H.J. Lu; +Cc: gcc-patches, Richard Henderson

On Sat, Mar 17, 2012 at 10:50 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>>> Since we must use reg64 in %fs:(%reg) memory operand like
>>>>>
>>>>>   movq x@gottpoff(%rip),%reg64;
>>>>>   mov %fs:(%reg64),%reg
>>>>>
>>>>> this patch optimizes x32 TLS IE load and store by wrapping
>>>>> %reg64 inside of UNSPEC when Pmode == SImode. OK for
>>>>> trunk?
>>>>
>>>> Can you implement this with define_insn_and_split, like i.e.
>>>> *tls_dynamic_gnu2_combine_32 ?
>>>>
>>>
>>> I will give it a try again. Last time when I tried it, GCC didn't
>>> like memory operand in DImode when Pmode == SImode.
>>
>> You should remove mode for tls_symbolic_operand predicate.
>>
>
> I am testing this patch. OK for trunk if it passes all tests?

No, force_reg will generate a pseudo, so this conversion is valid only
for !can_create_pseudo ().

At least for *tls_initial_exec_x32_store, you will need a temporary to
split the pattern after reload.

Uros.
* Re: PATCH: Properly generate X32 IE sequence 2012-03-18 16:02 ` Uros Bizjak @ 2012-03-18 20:55 ` Uros Bizjak 2012-03-19 15:51 ` H.J. Lu 2012-03-20 8:52 ` Eric Botcazou 0 siblings, 2 replies; 43+ messages in thread From: Uros Bizjak @ 2012-03-18 20:55 UTC (permalink / raw) To: H.J. Lu; +Cc: gcc-patches, Richard Henderson [-- Attachment #1: Type: text/plain, Size: 1081 bytes --] On Sun, Mar 18, 2012 at 5:01 PM, Uros Bizjak <ubizjak@gmail.com> wrote: >> I am testing this patch. OK for trunk if it passes all tests? > > No, force_reg will generate a pseudo, so this conversion is valid only > for !can_create_pseudo (). > > At least for *tls_initial_exec_x32_store, you will need a temporary to > split the pattern after reload. Please try attached patch. It simply throws away all recent complications w.r.t. the thread pointer and always handles TP in DImode. The testcase: --cut here-- __thread int foo __attribute__ ((tls_model ("initial-exec"))); void bar (int x) { foo = x; } int baz (void) { return foo; } --cut here-- Now compiles to: bar: movq foo@gottpoff(%rip), %rax movl %edi, %fs:(%rax) ret baz: movq foo@gottpoff(%rip), %rax movl %fs:(%rax), %eax ret In effect, this always generates %fs:(%rDI) and emits a REX prefix before mov/add to satisfy brain-dead linkers. The patch is bootstrapping now on x86_64-pc-linux-gnu. Uros. [-- Attachment #2: p.diff.txt --] [-- Type: text/plain, Size: 6435 bytes --] Index: i386.md =================================================================== --- i386.md (revision 185505) +++ i386.md (working copy) @@ -12836,28 +12836,6 @@ } [(set_attr "type" "multi")]) -;; When Pmode == SImode, there may be no REX prefix for ADD. Avoid -;; any instructions between MOV and ADD, which may interfere linker -;; IE->LE optimization, since the last byte of the previous instruction -;; before ADD may look like a REX prefix.
This also avoids -;; movl x@gottpoff(%rip), %reg32 -;; movl $fs:(%reg32), %reg32 -;; Since address override works only on the (reg32) part in fs:(reg32), -;; we can't use it as memory operand. -(define_insn "tls_initial_exec_x32" - [(set (match_operand:SI 0 "register_operand" "=r") - (unspec:SI - [(match_operand 1 "tls_symbolic_operand")] - UNSPEC_TLS_IE_X32)) - (clobber (reg:CC FLAGS_REG))] - "TARGET_X32" -{ - output_asm_insn - ("mov{l}\t{%%fs:0, %0|%0, DWORD PTR fs:0}", operands); - return "add{l}\t{%a1@gottpoff(%%rip), %0|%0, %a1@gottpoff[rip]}"; -} - [(set_attr "type" "multi")]) - ;; GNU2 TLS patterns can be split. (define_expand "tls_dynamic_gnu2_32" Index: i386.c =================================================================== --- i386.c (revision 185504) +++ i386.c (working copy) @@ -11509,6 +11509,10 @@ ix86_decompose_address (rtx addr, struct ix86_addr scale = 1 << scale; break; + case ZERO_EXTEND: + op = XEXP (op, 0); + /* FALLTHRU */ + case UNSPEC: if (XINT (op, 1) == UNSPEC_TP && TARGET_TLS_DIRECT_SEG_REFS @@ -12478,15 +12482,15 @@ legitimize_pic_address (rtx orig, rtx reg) /* Load the thread pointer. If TO_REG is true, force it into a register. 
*/ static rtx -get_thread_pointer (bool to_reg) +get_thread_pointer (enum machine_mode tp_mode, bool to_reg) { rtx tp = gen_rtx_UNSPEC (ptr_mode, gen_rtvec (1, const0_rtx), UNSPEC_TP); - if (GET_MODE (tp) != Pmode) - tp = convert_to_mode (Pmode, tp, 1); + if (GET_MODE (tp) != tp_mode) + tp = convert_to_mode (tp_mode, tp, 1); if (to_reg) - tp = copy_addr_to_reg (tp); + tp = copy_to_mode_reg (tp_mode, tp); return tp; } @@ -12538,6 +12542,7 @@ legitimize_tls_address (rtx x, enum tls_model mode { rtx dest, base, off; rtx pic = NULL_RTX, tp = NULL_RTX; + enum machine_mode tp_mode = Pmode; int type; switch (model) @@ -12563,7 +12568,7 @@ legitimize_tls_address (rtx x, enum tls_model mode else emit_insn (gen_tls_dynamic_gnu2_32 (dest, x, pic)); - tp = get_thread_pointer (true); + tp = get_thread_pointer (Pmode, true); dest = force_reg (Pmode, gen_rtx_PLUS (Pmode, tp, dest)); set_unique_reg_note (get_last_insn (), REG_EQUAL, x); @@ -12613,7 +12618,7 @@ legitimize_tls_address (rtx x, enum tls_model mode else emit_insn (gen_tls_dynamic_gnu2_32 (base, tmp, pic)); - tp = get_thread_pointer (true); + tp = get_thread_pointer (Pmode, true); set_unique_reg_note (get_last_insn (), REG_EQUAL, gen_rtx_MINUS (Pmode, tmp, tp)); } @@ -12659,27 +12664,18 @@ legitimize_tls_address (rtx x, enum tls_model mode case TLS_MODEL_INITIAL_EXEC: if (TARGET_64BIT) { + tp_mode = DImode; + if (TARGET_SUN_TLS) { /* The Sun linker took the AMD64 TLS spec literally and can only handle %rax as destination of the initial executable code sequence. */ - dest = gen_reg_rtx (Pmode); + dest = gen_reg_rtx (tp_mode); emit_insn (gen_tls_initial_exec_64_sun (dest, x)); return dest; } - else if (Pmode == SImode) - { - /* Always generate - movl %fs:0, %reg32 - addl xgottpoff(%rip), %reg32 - to support linker IE->LE optimization and avoid - fs:(%reg32) as memory operand. 
*/ - dest = gen_reg_rtx (Pmode); - emit_insn (gen_tls_initial_exec_x32 (dest, x)); - return dest; - } pic = NULL; type = UNSPEC_GOTNTPOFF; @@ -12703,24 +12699,23 @@ legitimize_tls_address (rtx x, enum tls_model mode type = UNSPEC_INDNTPOFF; } - off = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, x), type); - off = gen_rtx_CONST (Pmode, off); + off = gen_rtx_UNSPEC (tp_mode, gen_rtvec (1, x), type); + off = gen_rtx_CONST (tp_mode, off); if (pic) - off = gen_rtx_PLUS (Pmode, pic, off); - off = gen_const_mem (Pmode, off); + off = gen_rtx_PLUS (tp_mode, pic, off); + off = gen_const_mem (tp_mode, off); set_mem_alias_set (off, ix86_GOT_alias_set ()); if (TARGET_64BIT || TARGET_ANY_GNU_TLS) { - base = get_thread_pointer (for_mov - || !(TARGET_TLS_DIRECT_SEG_REFS - && TARGET_TLS_INDIRECT_SEG_REFS)); - off = force_reg (Pmode, off); - return gen_rtx_PLUS (Pmode, base, off); + base = get_thread_pointer (tp_mode, + for_mov || !TARGET_TLS_DIRECT_SEG_REFS); + off = force_reg (tp_mode, off); + return gen_rtx_PLUS (tp_mode, base, off); } else { - base = get_thread_pointer (true); + base = get_thread_pointer (Pmode, true); dest = gen_reg_rtx (Pmode); emit_insn (gen_subsi3 (dest, base, off)); } @@ -12734,14 +12729,13 @@ legitimize_tls_address (rtx x, enum tls_model mode if (TARGET_64BIT || TARGET_ANY_GNU_TLS) { - base = get_thread_pointer (for_mov - || !(TARGET_TLS_DIRECT_SEG_REFS - && TARGET_TLS_INDIRECT_SEG_REFS)); + base = get_thread_pointer (Pmode, + for_mov || !TARGET_TLS_DIRECT_SEG_REFS); return gen_rtx_PLUS (Pmode, base, off); } else { - base = get_thread_pointer (true); + base = get_thread_pointer (Pmode, true); dest = gen_reg_rtx (Pmode); emit_insn (gen_subsi3 (dest, base, off)); } @@ -13269,8 +13263,7 @@ ix86_delegitimize_tls_address (rtx orig_x) rtx x = orig_x, unspec; struct ix86_address addr; - if (!(TARGET_TLS_DIRECT_SEG_REFS - && TARGET_TLS_INDIRECT_SEG_REFS)) + if (!TARGET_TLS_DIRECT_SEG_REFS) return orig_x; if (MEM_P (x)) x = XEXP (x, 0); Index: i386.h 
=================================================================== --- i386.h (revision 185504) +++ i386.h (working copy) @@ -467,9 +467,6 @@ extern int x86_prefetch_sse; #define TARGET_TLS_DIRECT_SEG_REFS_DEFAULT 0 #endif -/* Address override works only on the (%reg) part of %fs:(%reg). */ -#define TARGET_TLS_INDIRECT_SEG_REFS (Pmode == word_mode) - /* Fence to use after loop using storent. */ extern tree x86_mfence;
* Re: PATCH: Properly generate X32 IE sequence 2012-03-18 20:55 ` Uros Bizjak @ 2012-03-19 15:51 ` H.J. Lu 2012-03-19 15:54 ` H.J. Lu 2012-03-20 8:52 ` Eric Botcazou 1 sibling, 1 reply; 43+ messages in thread From: H.J. Lu @ 2012-03-19 15:51 UTC (permalink / raw) To: Uros Bizjak; +Cc: gcc-patches, Richard Henderson [-- Attachment #1: Type: text/plain, Size: 1714 bytes --] On Sun, Mar 18, 2012 at 1:55 PM, Uros Bizjak <ubizjak@gmail.com> wrote: > On Sun, Mar 18, 2012 at 5:01 PM, Uros Bizjak <ubizjak@gmail.com> wrote: > >>> I am testing this patch. OK for trunk if it passes all tests? >> >> No, force_reg will generate a pseudo, so this conversion is valid only >> for !can_create_pseudo (). >> >> At least for *tls_initial_exec_x32_store, you will need a temporary to >> split the pattern after reload. Here is the updated patch to add can_create_pseudo. I also changed tls_initial_exec_x32 to take an input register operand as thread pointer. > Please try attached patch. It simply throws away all recent > complications w.r.t. to thread pointer and always handles TP in > DImode. > > The testcase: > > --cut here-- > __thread int foo __attribute__ ((tls_model ("initial-exec"))); > > void bar (int x) > { > foo = x; > } > > int baz (void) > { > return foo; > } > --cut here-- > > Now compiles to: > > bar: > movq foo@gottpoff(%rip), %rax > movl %edi, %fs:(%rax) > ret > > baz: > movq foo@gottpoff(%rip), %rax > movl %fs:(%rax), %eax > ret > > In effect, this always generates %fs(%rDI) and emits REX prefix before > mov/add to satisfy brain-dead linkers. > > The patch is bootstrapping now on x86_64-pc-linux-gnu. > For -- extern __thread char c; extern char y; void ie (void) { y = c; } -- Your patch generates: movl %fs:0, %eax movq c@gottpoff(%rip), %rdx movzbl (%rax,%rdx), %edx movb %dl, y(%rip) ret It can be optimized to: movq c@gottpoff(%rip), %rax movzbl %fs:(%rax), %eax movb %al, y(%rip) ret H.J. 
[-- Attachment #2: gcc-x32-tls-4.patch --] [-- Type: text/x-patch, Size: 6055 bytes --] 2012-03-19 H.J. Lu <hongjiu.lu@intel.com> * config/i386/i386-protos.h (ix86_split_tls_initial_exec_x32): New. * config/i386/i386.c (legitimize_tls_address): Also pass thread pointer to gen_tls_initial_exec_x32. (ix86_split_tls_initial_exec_x32): New. * config/i386/i386.md (*load_tp_x32): Renamed to ... (*load_tp_x32_<mode>): This. Replace SI with SWI48x. (tls_initial_exec_x32): Add an input register operand as thread pointer. Generate a REX prefix if needed. (*tls_initial_exec_x32_load): New. (*tls_initial_exec_x32_store): Likewise. diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h index 630112f..528eeaa 100644 --- a/gcc/config/i386/i386-protos.h +++ b/gcc/config/i386/i386-protos.h @@ -142,6 +142,7 @@ extern void ix86_split_lshr (rtx *, rtx, enum machine_mode); extern rtx ix86_find_base_term (rtx); extern bool ix86_check_movabs (rtx, int); extern void ix86_split_idivmod (enum machine_mode, rtx[], bool); +extern void ix86_split_tls_initial_exec_x32 (rtx [], enum machine_mode, bool); extern rtx assign_386_stack_local (enum machine_mode, enum ix86_stack_slot); extern int ix86_attr_length_immediate_default (rtx, bool); diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 78a366e..fb802ee 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -12671,13 +12671,14 @@ legitimize_tls_address (rtx x, enum tls_model model, bool for_mov) } else if (Pmode == SImode) { - /* Always generate - movl %fs:0, %reg32 + /* Always generate a REX prefix for addl xgottpoff(%rip), %reg32 - to support linker IE->LE optimization and avoid - fs:(%reg32) as memory operand. */ + to support linker IE->LE optimization. 
*/ dest = gen_reg_rtx (Pmode); - emit_insn (gen_tls_initial_exec_x32 (dest, x)); + base = get_thread_pointer (for_mov + || !(TARGET_TLS_DIRECT_SEG_REFS + && TARGET_TLS_INDIRECT_SEG_REFS)); + emit_insn (gen_tls_initial_exec_x32 (dest, base, x)); return dest; } @@ -12754,6 +12755,28 @@ legitimize_tls_address (rtx x, enum tls_model model, bool for_mov) return dest; } +/* Split x32 TLS IE access in MODE. Split load if LOAD is TRUE, + otherwise split store. */ + +void +ix86_split_tls_initial_exec_x32 (rtx operands[], + enum machine_mode mode, bool load) +{ + rtx base, mem; + rtx off = load ? operands[1] : operands[0]; + off = gen_rtx_UNSPEC (DImode, gen_rtvec (1, off), UNSPEC_GOTNTPOFF); + off = gen_rtx_CONST (DImode, off); + off = gen_const_mem (DImode, off); + set_mem_alias_set (off, ix86_GOT_alias_set ()); + base = gen_rtx_UNSPEC (DImode, gen_rtvec (1, const0_rtx), UNSPEC_TP); + off = gen_rtx_PLUS (DImode, base, force_reg (DImode, off)); + mem = gen_rtx_MEM (mode, off); + if (load) + emit_move_insn (operands[0], mem); + else + emit_move_insn (mem, operands[1]); +} + /* Create or return the unique __imp_DECL dllimport symbol corresponding to symbol DECL. */ diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index eae26ae..1643792 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -12747,11 +12747,11 @@ (define_mode_attr tp_seg [(SI "gs") (DI "fs")]) ;; Load and add the thread base pointer from %<tp_seg>:0. 
-(define_insn "*load_tp_x32" - [(set (match_operand:SI 0 "register_operand" "=r") - (unspec:SI [(const_int 0)] UNSPEC_TP))] +(define_insn "*load_tp_x32_<mode>" + [(set (match_operand:SWI48x 0 "register_operand" "=r") + (unspec:SWI48x [(const_int 0)] UNSPEC_TP))] "TARGET_X32" - "mov{l}\t{%%fs:0, %0|%0, DWORD PTR fs:0}" + "mov{l}\t{%%fs:0, %k0|%k0, DWORD PTR fs:0}" [(set_attr "type" "imov") (set_attr "modrm" "0") (set_attr "length" "7") @@ -12836,27 +12836,54 @@ } [(set_attr "type" "multi")]) -;; When Pmode == SImode, there may be no REX prefix for ADD. Avoid -;; any instructions between MOV and ADD, which may interfere linker -;; IE->LE optimization, since the last byte of the previous instruction -;; before ADD may look like a REX prefix. This also avoids -;; movl x@gottpoff(%rip), %reg32 -;; movl $fs:(%reg32), %reg32 -;; Since address override works only on the (reg32) part in fs:(reg32), -;; we can't use it as memory operand. +;; When Pmode == SImode, there may be no REX prefix for ADD. Make sure +;; there is a REX prefix. 
(define_insn "tls_initial_exec_x32" [(set (match_operand:SI 0 "register_operand" "=r") (unspec:SI - [(match_operand 1 "tls_symbolic_operand" "")] + [(match_operand:SI 1 "register_operand" "0") + (match_operand 2 "tls_symbolic_operand" "")] UNSPEC_TLS_IE_X32)) (clobber (reg:CC FLAGS_REG))] "TARGET_X32" { - output_asm_insn - ("mov{l}\t{%%fs:0, %0|%0, DWORD PTR fs:0}", operands); - return "add{l}\t{%a1@gottpoff(%%rip), %0|%0, %a1@gottpoff[rip]}"; + if (!REX_INT_REG_P (operands[0])) + fputs ("\trex ", asm_out_file); + return "add{l}\t{%a2@gottpoff(%%rip), %0|%0, %a2@gottpoff[rip]}"; } - [(set_attr "type" "multi")]) + [(set_attr "type" "alu") + (set_attr "length" "7") + (set_attr "memory" "load")]) + +(define_insn_and_split "*tls_initial_exec_x32_load" + [(set (match_operand:SWI1248x 0 "register_operand" "=r") + (mem:SWI1248x + (unspec:SI + [(unspec:SI [(const_int 0)] UNSPEC_TP) + (match_operand 1 "tls_symbolic_operand" "")] + UNSPEC_TLS_IE_X32))) + (clobber (reg:CC FLAGS_REG))] + "TARGET_X32 + && can_create_pseudo_p ()" + "#" + "" + [(const_int 0)] + "ix86_split_tls_initial_exec_x32 (operands, <MODE>mode, TRUE); DONE;") + +(define_insn_and_split "*tls_initial_exec_x32_store" + [(set (mem:SWI1248x + (unspec:SI + [(unspec:SI [(const_int 0)] UNSPEC_TP) + (match_operand 0 "tls_symbolic_operand" "")] + UNSPEC_TLS_IE_X32)) + (match_operand:SWI1248x 1 "register_operand" "r")) + (clobber (reg:CC FLAGS_REG))] + "TARGET_X32 + && can_create_pseudo_p ()" + "#" + "" + [(const_int 0)] + "ix86_split_tls_initial_exec_x32 (operands, <MODE>mode, FALSE); DONE;") ;; GNU2 TLS patterns can be split. ^ permalink raw reply [flat|nested] 43+ messages in thread
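H.J.'s tls_initial_exec_x32 pattern prints a bare `rex` prefix when the destination is %eax-%edi. This guarantees that the byte three positions before the R_X86_64_GOTTPOFF displacement always belongs to the ADD itself. The C sketch below illustrates the failure this guards against. It hand-encodes `addl x@gottpoff(%rip), %reg32` (opcode 0x03, a RIP-relative ModRM byte, and a zero disp32 placeholder) immediately after `movl %eax, 0x48(%rbp)` (bytes 89 45 48), whose trailing displacement byte happens to equal REX.W. The helper names are hypothetical. looks_like_rex is a simplified stand-in for the linker's REX test, not the BFD code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encoder for "addl x@gottpoff(%rip), %reg32".
   REG is the hardware register number (0 = %eax ... 7 = %edi,
   8 = %r8d ... 15 = %r15d).  Returns the instruction length and
   stores the offset of the disp32 field in *DISP_OFF.  */
static size_t
encode_add_gottpoff32 (uint8_t *p, unsigned reg, int force_rex,
                       size_t *disp_off)
{
  size_t n = 0;

  if (reg >= 8)
    p[n++] = 0x44;                /* REX.R: required for %r8d-%r15d.  */
  else if (force_rex)
    p[n++] = 0x40;                /* Redundant REX, as printed by "rex".  */
  p[n++] = 0x03;                  /* ADD r32, r/m32.  */
  p[n++] = (uint8_t) (((reg & 7) << 3) | 0x05);  /* mod=00 rm=101: RIP-relative.  */
  *disp_off = n;
  for (int i = 0; i < 4; i++)
    p[n++] = 0x00;                /* disp32, filled in by the linker.  */
  return n;
}

/* Emit "movl %eax, 0x48(%rbp)" followed by the ADD.  *INSN_START gets
   the offset where the ADD begins; *RELOC gets the offset of its disp32.  */
static size_t
emit_sequence (uint8_t *code, unsigned reg, int force_rex,
               size_t *insn_start, size_t *reloc)
{
  size_t n = 0, disp_off;

  code[n++] = 0x89;
  code[n++] = 0x45;
  code[n++] = 0x48;               /* disp8 that happens to equal REX.W.  */
  *insn_start = n;
  n += encode_add_gottpoff32 (code + n, reg, force_rex, &disp_off);
  *reloc = *insn_start + disp_off;
  return n;
}

/* Simplified linker-style test: does the byte 3 before the
   relocation look like a REX prefix (0x40-0x4f)?  */
static int
looks_like_rex (const uint8_t *code, size_t reloc)
{
  return reloc >= 3 && (code[reloc - 3] & 0xf0) == 0x40;
}
```

Without the prefix, looks_like_rex still succeeds, but the byte it inspects is the previous instruction's displacement. An IE->LE rewrite of that byte would therefore corrupt the movl. With the forced 0x40 prefix, the inspected byte is the ADD's own prefix.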
* Re: PATCH: Properly generate X32 IE sequence 2012-03-19 15:51 ` H.J. Lu @ 2012-03-19 15:54 ` H.J. Lu 2012-03-19 16:20 ` H.J. Lu 0 siblings, 1 reply; 43+ messages in thread From: H.J. Lu @ 2012-03-19 15:54 UTC (permalink / raw) To: Uros Bizjak; +Cc: gcc-patches, Richard Henderson On Mon, Mar 19, 2012 at 8:51 AM, H.J. Lu <hjl.tools@gmail.com> wrote: > On Sun, Mar 18, 2012 at 1:55 PM, Uros Bizjak <ubizjak@gmail.com> wrote: >> On Sun, Mar 18, 2012 at 5:01 PM, Uros Bizjak <ubizjak@gmail.com> wrote: >> >>>> I am testing this patch. OK for trunk if it passes all tests? >>> >>> No, force_reg will generate a pseudo, so this conversion is valid only >>> for !can_create_pseudo (). >>> >>> At least for *tls_initial_exec_x32_store, you will need a temporary to >>> split the pattern after reload. > > Here is the updated patch to add can_create_pseudo. I also changed > tls_initial_exec_x32 to take an input register operand as thread pointer. > >> Please try attached patch. It simply throws away all recent >> complications w.r.t. to thread pointer and always handles TP in >> DImode. >> >> The testcase: >> >> --cut here-- >> __thread int foo __attribute__ ((tls_model ("initial-exec"))); >> >> void bar (int x) >> { >> foo = x; >> } >> >> int baz (void) >> { >> return foo; >> } >> --cut here-- >> >> Now compiles to: >> >> bar: >> movq foo@gottpoff(%rip), %rax >> movl %edi, %fs:(%rax) >> ret >> >> baz: >> movq foo@gottpoff(%rip), %rax >> movl %fs:(%rax), %eax >> ret >> >> In effect, this always generates %fs(%rDI) and emits REX prefix before >> mov/add to satisfy brain-dead linkers. >> >> The patch is bootstrapping now on x86_64-pc-linux-gnu. 
>> > > For > > -- > extern __thread char c; > extern char y; > void > ie (void) > { > y = c; > } > -- > > Your patch generates: > > movl %fs:0, %eax > movq c@gottpoff(%rip), %rdx > movzbl (%rax,%rdx), %edx > movb %dl, y(%rip) > ret > > It can be optimized to: > > movq c@gottpoff(%rip), %rax > movzbl %fs:(%rax), %eax > movb %al, y(%rip) > ret > Combine failed: (set (reg:QI 63 [ c ]) (mem/c:QI (plus:DI (zero_extend:DI (unspec:SI [ (const_int 0 [0]) ] UNSPEC_TP)) (mem/u/c:DI (const:DI (unspec:DI [ (symbol_ref:SI ("c") [flags 0x60] <var_decl 0x7ffff19b8140 c>) ] UNSPEC_GOTNTPOFF)) [2 S8 A8])) [0 c+0 S1 A8])) -- H.J. ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: PATCH: Properly generate X32 IE sequence 2012-03-19 15:54 ` H.J. Lu @ 2012-03-19 16:20 ` H.J. Lu 2012-03-19 16:35 ` H.J. Lu 2012-03-19 16:47 ` Uros Bizjak 0 siblings, 2 replies; 43+ messages in thread From: H.J. Lu @ 2012-03-19 16:20 UTC (permalink / raw) To: Uros Bizjak; +Cc: gcc-patches, Richard Henderson On Mon, Mar 19, 2012 at 8:54 AM, H.J. Lu <hjl.tools@gmail.com> wrote: > On Mon, Mar 19, 2012 at 8:51 AM, H.J. Lu <hjl.tools@gmail.com> wrote: >> On Sun, Mar 18, 2012 at 1:55 PM, Uros Bizjak <ubizjak@gmail.com> wrote: >>> On Sun, Mar 18, 2012 at 5:01 PM, Uros Bizjak <ubizjak@gmail.com> wrote: >>> >>>>> I am testing this patch. OK for trunk if it passes all tests? >>>> >>>> No, force_reg will generate a pseudo, so this conversion is valid only >>>> for !can_create_pseudo (). >>>> >>>> At least for *tls_initial_exec_x32_store, you will need a temporary to >>>> split the pattern after reload. >> >> Here is the updated patch to add can_create_pseudo. I also changed >> tls_initial_exec_x32 to take an input register operand as thread pointer. >> >>> Please try attached patch. It simply throws away all recent >>> complications w.r.t. to thread pointer and always handles TP in >>> DImode. >>> >>> The testcase: >>> >>> --cut here-- >>> __thread int foo __attribute__ ((tls_model ("initial-exec"))); >>> >>> void bar (int x) >>> { >>> foo = x; >>> } >>> >>> int baz (void) >>> { >>> return foo; >>> } >>> --cut here-- >>> >>> Now compiles to: >>> >>> bar: >>> movq foo@gottpoff(%rip), %rax >>> movl %edi, %fs:(%rax) >>> ret >>> >>> baz: >>> movq foo@gottpoff(%rip), %rax >>> movl %fs:(%rax), %eax >>> ret >>> >>> In effect, this always generates %fs(%rDI) and emits REX prefix before >>> mov/add to satisfy brain-dead linkers. >>> >>> The patch is bootstrapping now on x86_64-pc-linux-gnu. 
>>>
>>
>> For
>>
>> --
>> extern __thread char c;
>> extern char y;
>> void
>> ie (void)
>> {
>>   y = c;
>> }
>> --
>>
>> Your patch generates:
>>
>>         movl %fs:0, %eax
>>         movq c@gottpoff(%rip), %rdx
>>         movzbl (%rax,%rdx), %edx
>>         movb %dl, y(%rip)
>>         ret
>>
>> It can be optimized to:
>>
>>         movq c@gottpoff(%rip), %rax
>>         movzbl %fs:(%rax), %eax
>>         movb %al, y(%rip)
>>         ret
>>
>
> Combine failed:
>
> (set (reg:QI 63 [ c ])
>     (mem/c:QI (plus:DI (zero_extend:DI (unspec:SI [
>                         (const_int 0 [0])
>                     ] UNSPEC_TP))
>             (mem/u/c:DI (const:DI (unspec:DI [
>                             (symbol_ref:SI ("c") [flags 0x60]
>                                 <var_decl 0x7ffff19b8140 c>)
>                         ] UNSPEC_GOTNTPOFF)) [2 S8 A8])) [0 c+0 S1 A8]))
>
>

Wrong testcase. It should be

--
extern __thread char c;
extern __thread short w;
extern char y;
extern short i;
void
ie (void)
{
  y = c;
  i = w;
}
---

I got

        movl %fs:0, %eax
        movq c@gottpoff(%rip), %rdx
        movzbl (%rax,%rdx), %edx
        movb %dl, y(%rip)
        movq w@gottpoff(%rip), %rdx
        movzwl (%rax,%rdx), %eax
        movw %ax, i(%rip)
        ret

It can be

        movq c@gottpoff(%rip), %rax
        movzbl %fs:(%rax), %eax
        movb %al, y(%rip)
        movq w@gottpoff(%rip), %rax
        movzwl %fs:(%rax), %eax
        movw %ax, i(%rip)
        ret

--
H.J.
* Re: PATCH: Properly generate X32 IE sequence 2012-03-19 16:20 ` H.J. Lu @ 2012-03-19 16:35 ` H.J. Lu 2012-03-19 16:38 ` Uros Bizjak 2012-03-19 16:47 ` Uros Bizjak 1 sibling, 1 reply; 43+ messages in thread From: H.J. Lu @ 2012-03-19 16:35 UTC (permalink / raw) To: Uros Bizjak; +Cc: gcc-patches, Richard Henderson [-- Attachment #1: Type: text/plain, Size: 3852 bytes --] On Mon, Mar 19, 2012 at 9:19 AM, H.J. Lu <hjl.tools@gmail.com> wrote: > On Mon, Mar 19, 2012 at 8:54 AM, H.J. Lu <hjl.tools@gmail.com> wrote: >> On Mon, Mar 19, 2012 at 8:51 AM, H.J. Lu <hjl.tools@gmail.com> wrote: >>> On Sun, Mar 18, 2012 at 1:55 PM, Uros Bizjak <ubizjak@gmail.com> wrote: >>>> On Sun, Mar 18, 2012 at 5:01 PM, Uros Bizjak <ubizjak@gmail.com> wrote: >>>> >>>>>> I am testing this patch. OK for trunk if it passes all tests? >>>>> >>>>> No, force_reg will generate a pseudo, so this conversion is valid only >>>>> for !can_create_pseudo (). >>>>> >>>>> At least for *tls_initial_exec_x32_store, you will need a temporary to >>>>> split the pattern after reload. >>> >>> Here is the updated patch to add can_create_pseudo. I also changed >>> tls_initial_exec_x32 to take an input register operand as thread pointer. >>> >>>> Please try attached patch. It simply throws away all recent >>>> complications w.r.t. to thread pointer and always handles TP in >>>> DImode. >>>> >>>> The testcase: >>>> >>>> --cut here-- >>>> __thread int foo __attribute__ ((tls_model ("initial-exec"))); >>>> >>>> void bar (int x) >>>> { >>>> foo = x; >>>> } >>>> >>>> int baz (void) >>>> { >>>> return foo; >>>> } >>>> --cut here-- >>>> >>>> Now compiles to: >>>> >>>> bar: >>>> movq foo@gottpoff(%rip), %rax >>>> movl %edi, %fs:(%rax) >>>> ret >>>> >>>> baz: >>>> movq foo@gottpoff(%rip), %rax >>>> movl %fs:(%rax), %eax >>>> ret >>>> >>>> In effect, this always generates %fs(%rDI) and emits REX prefix before >>>> mov/add to satisfy brain-dead linkers. >>>> >>>> The patch is bootstrapping now on x86_64-pc-linux-gnu. 
>>>> >>> >>> For >>> >>> -- >>> extern __thread char c; >>> extern char y; >>> void >>> ie (void) >>> { >>> y = c; >>> } >>> -- >>> >>> Your patch generates: >>> >>> movl %fs:0, %eax >>> movq c@gottpoff(%rip), %rdx >>> movzbl (%rax,%rdx), %edx >>> movb %dl, y(%rip) >>> ret >>> >>> It can be optimized to: >>> >>> movq c@gottpoff(%rip), %rax >>> movzbl %fs:(%rax), %eax >>> movb %al, y(%rip) >>> ret >>> >> >> Combine failed: >> >> (set (reg:QI 63 [ c ]) >> (mem/c:QI (plus:DI (zero_extend:DI (unspec:SI [ >> (const_int 0 [0]) >> ] UNSPEC_TP)) >> (mem/u/c:DI (const:DI (unspec:DI [ >> (symbol_ref:SI ("c") [flags 0x60] >> <var_decl 0x7ffff19b8140 c>) >> ] UNSPEC_GOTNTPOFF)) [2 S8 A8])) [0 c+0 S1 A8])) >> >> > > Wrong testcase. IT should be > > -- > extern __thread char c; > extern __thread short w; > extern char y; > extern short i; > void > ie (void) > { > y = c; > i = w; > } > --- > > I got > > movl %fs:0, %eax > movq c@gottpoff(%rip), %rdx > movzbl (%rax,%rdx), %edx > movb %dl, y(%rip) > movq w@gottpoff(%rip), %rdx > movzwl (%rax,%rdx), %eax > movw %ax, i(%rip) > ret > > It can be > > movq c@gottpoff(%rip), %rax > movzbl %fs:(%rax), %eax > movb %al, y(%rip) > movq w@gottpoff(%rip), %rax > movzwl %fs:(%rax), %eax > movw %ax, i(%rip) > ret > > How about this patch? I changed 32 TP load to (define_insn "*load_tp_x32_<mode>" [(set (match_operand:SWI48x 0 "register_operand" "=r") (unspec:SWI48x [(const_int 0)] UNSPEC_TP))] "TARGET_X32" "mov{l}\t{%%fs:0, %k0|%k0, DWORD PTR fs:0}" [(set_attr "type" "imov") (set_attr "modrm" "0") (set_attr "length" "7") (set_attr "memory" "load") (set_attr "imm_disp" "false")]) and removed *load_tp_x32_zext. -- H.J. [-- Attachment #2: gcc-x32-tls-5.patch --] [-- Type: text/x-patch, Size: 7130 bytes --] diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 9aa5ee7..66221e4 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -12483,15 +12483,12 @@ legitimize_pic_address (rtx orig, rtx reg) /* Load the thread pointer. 
If TO_REG is true, force it into a register. */ static rtx -get_thread_pointer (bool to_reg) +get_thread_pointer (enum machine_mode tp_mode, bool to_reg) { - rtx tp = gen_rtx_UNSPEC (ptr_mode, gen_rtvec (1, const0_rtx), UNSPEC_TP); - - if (GET_MODE (tp) != Pmode) - tp = convert_to_mode (Pmode, tp, 1); + rtx tp = gen_rtx_UNSPEC (tp_mode, gen_rtvec (1, const0_rtx), UNSPEC_TP); if (to_reg) - tp = copy_addr_to_reg (tp); + tp = copy_to_mode_reg (tp_mode, tp); return tp; } @@ -12543,6 +12540,7 @@ legitimize_tls_address (rtx x, enum tls_model model, bool for_mov) { rtx dest, base, off; rtx pic = NULL_RTX, tp = NULL_RTX; + enum machine_mode tp_mode = Pmode; int type; switch (model) @@ -12568,7 +12566,7 @@ legitimize_tls_address (rtx x, enum tls_model model, bool for_mov) else emit_insn (gen_tls_dynamic_gnu2_32 (dest, x, pic)); - tp = get_thread_pointer (true); + tp = get_thread_pointer (Pmode, true); dest = force_reg (Pmode, gen_rtx_PLUS (Pmode, tp, dest)); set_unique_reg_note (get_last_insn (), REG_EQUAL, x); @@ -12618,7 +12616,7 @@ legitimize_tls_address (rtx x, enum tls_model model, bool for_mov) else emit_insn (gen_tls_dynamic_gnu2_32 (base, tmp, pic)); - tp = get_thread_pointer (true); + tp = get_thread_pointer (Pmode, true); set_unique_reg_note (get_last_insn (), REG_EQUAL, gen_rtx_MINUS (Pmode, tmp, tp)); } @@ -12664,27 +12662,18 @@ legitimize_tls_address (rtx x, enum tls_model model, bool for_mov) case TLS_MODEL_INITIAL_EXEC: if (TARGET_64BIT) { + tp_mode = DImode; + if (TARGET_SUN_TLS) { /* The Sun linker took the AMD64 TLS spec literally and can only handle %rax as destination of the initial executable code sequence. */ - dest = gen_reg_rtx (Pmode); + dest = gen_reg_rtx (tp_mode); emit_insn (gen_tls_initial_exec_64_sun (dest, x)); return dest; } - else if (Pmode == SImode) - { - /* Always generate - movl %fs:0, %reg32 - addl xgottpoff(%rip), %reg32 - to support linker IE->LE optimization and avoid - fs:(%reg32) as memory operand. 
*/ - dest = gen_reg_rtx (Pmode); - emit_insn (gen_tls_initial_exec_x32 (dest, x)); - return dest; - } pic = NULL; type = UNSPEC_GOTNTPOFF; @@ -12708,24 +12697,23 @@ legitimize_tls_address (rtx x, enum tls_model model, bool for_mov) type = UNSPEC_INDNTPOFF; } - off = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, x), type); - off = gen_rtx_CONST (Pmode, off); + off = gen_rtx_UNSPEC (tp_mode, gen_rtvec (1, x), type); + off = gen_rtx_CONST (tp_mode, off); if (pic) - off = gen_rtx_PLUS (Pmode, pic, off); - off = gen_const_mem (Pmode, off); + off = gen_rtx_PLUS (tp_mode, pic, off); + off = gen_const_mem (tp_mode, off); set_mem_alias_set (off, ix86_GOT_alias_set ()); if (TARGET_64BIT || TARGET_ANY_GNU_TLS) { - base = get_thread_pointer (for_mov - || !(TARGET_TLS_DIRECT_SEG_REFS - && TARGET_TLS_INDIRECT_SEG_REFS)); - off = force_reg (Pmode, off); - return gen_rtx_PLUS (Pmode, base, off); + base = get_thread_pointer (tp_mode, + for_mov || !TARGET_TLS_DIRECT_SEG_REFS); + off = force_reg (tp_mode, off); + return gen_rtx_PLUS (tp_mode, base, off); } else { - base = get_thread_pointer (true); + base = get_thread_pointer (Pmode, true); dest = gen_reg_rtx (Pmode); emit_insn (ix86_gen_sub3 (dest, base, off)); } @@ -12739,14 +12727,13 @@ legitimize_tls_address (rtx x, enum tls_model model, bool for_mov) if (TARGET_64BIT || TARGET_ANY_GNU_TLS) { - base = get_thread_pointer (for_mov - || !(TARGET_TLS_DIRECT_SEG_REFS - && TARGET_TLS_INDIRECT_SEG_REFS)); + base = get_thread_pointer (Pmode, + for_mov || !TARGET_TLS_DIRECT_SEG_REFS); return gen_rtx_PLUS (Pmode, base, off); } else { - base = get_thread_pointer (true); + base = get_thread_pointer (Pmode, true); dest = gen_reg_rtx (Pmode); emit_insn (ix86_gen_sub3 (dest, base, off)); } @@ -13274,8 +13261,7 @@ ix86_delegitimize_tls_address (rtx orig_x) rtx x = orig_x, unspec; struct ix86_address addr; - if (!(TARGET_TLS_DIRECT_SEG_REFS - && TARGET_TLS_INDIRECT_SEG_REFS)) + if (!TARGET_TLS_DIRECT_SEG_REFS) return orig_x; if (MEM_P (x)) x = XEXP (x, 
0); diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h index 9e5ac00..3fcd209 100644 --- a/gcc/config/i386/i386.h +++ b/gcc/config/i386/i386.h @@ -467,9 +467,6 @@ extern int x86_prefetch_sse; #define TARGET_TLS_DIRECT_SEG_REFS_DEFAULT 0 #endif -/* Address override works only on the (%reg) part of %fs:(%reg). */ -#define TARGET_TLS_INDIRECT_SEG_REFS (Pmode == word_mode) - /* Fence to use after loop using storent. */ extern tree x86_mfence; diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index d23c67b..e167ceb 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -12747,20 +12747,9 @@ (define_mode_attr tp_seg [(SI "gs") (DI "fs")]) ;; Load and add the thread base pointer from %<tp_seg>:0. -(define_insn "*load_tp_x32" - [(set (match_operand:SI 0 "register_operand" "=r") - (unspec:SI [(const_int 0)] UNSPEC_TP))] - "TARGET_X32" - "mov{l}\t{%%fs:0, %0|%0, DWORD PTR fs:0}" - [(set_attr "type" "imov") - (set_attr "modrm" "0") - (set_attr "length" "7") - (set_attr "memory" "load") - (set_attr "imm_disp" "false")]) - -(define_insn "*load_tp_x32_zext" - [(set (match_operand:DI 0 "register_operand" "=r") - (zero_extend:DI (unspec:SI [(const_int 0)] UNSPEC_TP)))] +(define_insn "*load_tp_x32_<mode>" + [(set (match_operand:SWI48x 0 "register_operand" "=r") + (unspec:SWI48x [(const_int 0)] UNSPEC_TP))] "TARGET_X32" "mov{l}\t{%%fs:0, %k0|%k0, DWORD PTR fs:0}" [(set_attr "type" "imov") @@ -12836,28 +12825,6 @@ } [(set_attr "type" "multi")]) -;; When Pmode == SImode, there may be no REX prefix for ADD. Avoid -;; any instructions between MOV and ADD, which may interfere linker -;; IE->LE optimization, since the last byte of the previous instruction -;; before ADD may look like a REX prefix. This also avoids -;; movl x@gottpoff(%rip), %reg32 -;; movl $fs:(%reg32), %reg32 -;; Since address override works only on the (reg32) part in fs:(reg32), -;; we can't use it as memory operand. 
-(define_insn "tls_initial_exec_x32" - [(set (match_operand:SI 0 "register_operand" "=r") - (unspec:SI - [(match_operand 1 "tls_symbolic_operand")] - UNSPEC_TLS_IE_X32)) - (clobber (reg:CC FLAGS_REG))] - "TARGET_X32" -{ - output_asm_insn - ("mov{l}\t{%%fs:0, %0|%0, DWORD PTR fs:0}", operands); - return "add{l}\t{%a1@gottpoff(%%rip), %0|%0, %a1@gottpoff[rip]}"; -} - [(set_attr "type" "multi")]) - ;; GNU2 TLS patterns can be split. (define_expand "tls_dynamic_gnu2_32" ^ permalink raw reply [flat|nested] 43+ messages in thread
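The removed comment above ("the last byte of the previous instruction before ADD may look like a REX prefix") can be made concrete with a small C model. The helper below is hypothetical and only loosely follows the linker excerpt quoted earlier in the thread. It looks at the byte three positions before the @gottpoff displacement and reports whether that byte is 0x48 or 0x4c. The byte arrays are hand-assembled encodings chosen for illustration, not output captured from a real build.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, loosely modelled on the linker check quoted
   earlier in the thread.  OFFSET is the position of the 4-byte
   @gottpoff displacement.  For "addq foo@gottpoff(%rip), %reg" the
   bytes are REX.W (0x48 or 0x4c), opcode 0x03, ModRM, disp32, so the
   byte that should hold REX.W sits at OFFSET - 3. */
static bool looks_like_rex_w(const uint8_t *contents, size_t offset)
{
    assert(offset >= 3);            /* caller checks bounds, as the linker does */
    uint8_t val = contents[offset - 3];
    return val == 0x48 || val == 0x4c;
}

/* x86-64: addq foo@gottpoff(%rip), %rax  ->  48 03 05 <disp32> */
static const uint8_t rex_add[] = { 0x48, 0x03, 0x05, 0, 0, 0, 0 };

/* x32 without a REX prefix:
     mov  0x48(%rbx), %ecx            ->  8b 4b 48
     addl foo@gottpoff(%rip), %eax    ->  03 05 <disp32>
   Here 0x48 is the disp8 of the preceding mov, not a prefix. */
static const uint8_t x32_add[] = { 0x8b, 0x4b, 0x48, 0x03, 0x05, 0, 0, 0, 0 };
```

Both `looks_like_rex_w(rex_add, 3)` and `looks_like_rex_w(x32_add, 5)` return true. A test on that single byte therefore cannot tell a real `addq` from a REX-less x32 `addl` that follows an instruction ending in 0x48, and rewriting that byte during IE->LE relaxation would corrupt the preceding `mov`.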
* Re: PATCH: Properly generate X32 IE sequence 2012-03-19 16:35 ` H.J. Lu @ 2012-03-19 16:38 ` Uros Bizjak 2012-03-19 16:47 ` H.J. Lu 0 siblings, 1 reply; 43+ messages in thread From: Uros Bizjak @ 2012-03-19 16:38 UTC (permalink / raw) To: H.J. Lu; +Cc: gcc-patches, Richard Henderson On Mon, Mar 19, 2012 at 5:34 PM, H.J. Lu <hjl.tools@gmail.com> wrote: >>> Combine failed: >>> >>> (set (reg:QI 63 [ c ]) >>> (mem/c:QI (plus:DI (zero_extend:DI (unspec:SI [ >>> (const_int 0 [0]) >>> ] UNSPEC_TP)) >>> (mem/u/c:DI (const:DI (unspec:DI [ >>> (symbol_ref:SI ("c") [flags 0x60] >>> <var_decl 0x7ffff19b8140 c>) >>> ] UNSPEC_GOTNTPOFF)) [2 S8 A8])) [0 c+0 S1 A8])) >>> >>> >> >> Wrong testcase. IT should be >> >> -- >> extern __thread char c; >> extern __thread short w; >> extern char y; >> extern short i; >> void >> ie (void) >> { >> y = c; >> i = w; >> } >> --- >> >> I got >> >> movl %fs:0, %eax >> movq c@gottpoff(%rip), %rdx >> movzbl (%rax,%rdx), %edx >> movb %dl, y(%rip) >> movq w@gottpoff(%rip), %rdx >> movzwl (%rax,%rdx), %eax >> movw %ax, i(%rip) >> ret >> >> It can be >> >> movq c@gottpoff(%rip), %rax >> movzbl %fs:(%rax), %eax >> movb %al, y(%rip) >> movq w@gottpoff(%rip), %rax >> movzwl %fs:(%rax), %eax >> movw %ax, i(%rip) >> ret >> >> > > How about this patch? I changed 32 TP load to > > (define_insn "*load_tp_x32_<mode>" > [(set (match_operand:SWI48x 0 "register_operand" "=r") > (unspec:SWI48x [(const_int 0)] UNSPEC_TP))] > "TARGET_X32" > "mov{l}\t{%%fs:0, %k0|%k0, DWORD PTR fs:0}" > [(set_attr "type" "imov") > (set_attr "modrm" "0") > (set_attr "length" "7") > (set_attr "memory" "load") > (set_attr "imm_disp" "false")]) > > and removed *load_tp_x32_zext. No, your whole approach with splitters is wrong. @@ -12747,11 +12747,11 @@ (define_mode_attr tp_seg [(SI "gs") (DI "fs")]) ;; Load and add the thread base pointer from %<tp_seg>:0. 
-(define_insn "*load_tp_x32" - [(set (match_operand:SI 0 "register_operand" "=r") - (unspec:SI [(const_int 0)] UNSPEC_TP))] +(define_insn "*load_tp_x32_<mode>" + [(set (match_operand:SWI48x 0 "register_operand" "=r") + (unspec:SWI48x [(const_int 0)] UNSPEC_TP))] "TARGET_X32" - "mov{l}\t{%%fs:0, %0|%0, DWORD PTR fs:0}" + "mov{l}\t{%%fs:0, %k0|%k0, DWORD PTR fs:0}" The result is zero_extended SImode register, not fake SImode register in DImode. But as said, you should generate correct sequence from the beginning. Uros.
* Re: PATCH: Properly generate X32 IE sequence 2012-03-19 16:38 ` Uros Bizjak @ 2012-03-19 16:47 ` H.J. Lu 2012-03-19 16:49 ` Uros Bizjak 0 siblings, 1 reply; 43+ messages in thread From: H.J. Lu @ 2012-03-19 16:47 UTC (permalink / raw) To: Uros Bizjak; +Cc: gcc-patches, Richard Henderson On Mon, Mar 19, 2012 at 9:37 AM, Uros Bizjak <ubizjak@gmail.com> wrote: > On Mon, Mar 19, 2012 at 5:34 PM, H.J. Lu <hjl.tools@gmail.com> wrote: > >>>> Combine failed: >>>> >>>> (set (reg:QI 63 [ c ]) >>>> (mem/c:QI (plus:DI (zero_extend:DI (unspec:SI [ >>>> (const_int 0 [0]) >>>> ] UNSPEC_TP)) >>>> (mem/u/c:DI (const:DI (unspec:DI [ >>>> (symbol_ref:SI ("c") [flags 0x60] >>>> <var_decl 0x7ffff19b8140 c>) >>>> ] UNSPEC_GOTNTPOFF)) [2 S8 A8])) [0 c+0 S1 A8])) >>>> >>>> >>> >>> Wrong testcase. IT should be >>> >>> -- >>> extern __thread char c; >>> extern __thread short w; >>> extern char y; >>> extern short i; >>> void >>> ie (void) >>> { >>> y = c; >>> i = w; >>> } >>> --- >>> >>> I got >>> >>> movl %fs:0, %eax >>> movq c@gottpoff(%rip), %rdx >>> movzbl (%rax,%rdx), %edx >>> movb %dl, y(%rip) >>> movq w@gottpoff(%rip), %rdx >>> movzwl (%rax,%rdx), %eax >>> movw %ax, i(%rip) >>> ret >>> >>> It can be >>> >>> movq c@gottpoff(%rip), %rax >>> movzbl %fs:(%rax), %eax >>> movb %al, y(%rip) >>> movq w@gottpoff(%rip), %rax >>> movzwl %fs:(%rax), %eax >>> movw %ax, i(%rip) >>> ret >>> >>> >> >> How about this patch? I changed 32 TP load to >> >> (define_insn "*load_tp_x32_<mode>" >> [(set (match_operand:SWI48x 0 "register_operand" "=r") >> (unspec:SWI48x [(const_int 0)] UNSPEC_TP))] >> "TARGET_X32" >> "mov{l}\t{%%fs:0, %k0|%k0, DWORD PTR fs:0}" >> [(set_attr "type" "imov") >> (set_attr "modrm" "0") >> (set_attr "length" "7") >> (set_attr "memory" "load") >> (set_attr "imm_disp" "false")]) >> >> and removed *load_tp_x32_zext. > > No, your whole approach with splitters is wrong. 
> > @@ -12747,11 +12747,11 @@ > (define_mode_attr tp_seg [(SI "gs") (DI "fs")]) > > ;; Load and add the thread base pointer from %<tp_seg>:0. > -(define_insn "*load_tp_x32" > - [(set (match_operand:SI 0 "register_operand" "=r") > - (unspec:SI [(const_int 0)] UNSPEC_TP))] > +(define_insn "*load_tp_x32_<mode>" > + [(set (match_operand:SWI48x 0 "register_operand" "=r") > + (unspec:SWI48x [(const_int 0)] UNSPEC_TP))] > "TARGET_X32" > - "mov{l}\t{%%fs:0, %0|%0, DWORD PTR fs:0}" > + "mov{l}\t{%%fs:0, %k0|%k0, DWORD PTR fs:0}" > > The result is zero_extended SImode register, not fake SImode register in DImode. > > But as said, you should generate correct sequence from the beginning. > For x32, thread pointer is an unsigned 32bit value. movl %fs:0, %eax is the correct instruction to load thread pointer into EAX and RAX. -- H.J.
* Re: PATCH: Properly generate X32 IE sequence 2012-03-19 16:47 ` H.J. Lu @ 2012-03-19 16:49 ` Uros Bizjak 2012-03-19 16:56 ` H.J. Lu 0 siblings, 1 reply; 43+ messages in thread From: Uros Bizjak @ 2012-03-19 16:49 UTC To: H.J. Lu; +Cc: gcc-patches, Richard Henderson On Mon, Mar 19, 2012 at 5:47 PM, H.J. Lu <hjl.tools@gmail.com> wrote: > For x32, thread pointer is an unsigned 32bit value. > > movl %fs:0, %eax > > is the correct instruction to load thread pointer into EAX and RAX. So, where is ZERO_EXTEND RTX then? Uros.
* Re: PATCH: Properly generate X32 IE sequence 2012-03-19 16:49 ` Uros Bizjak @ 2012-03-19 16:56 ` H.J. Lu 2012-03-19 17:02 ` Uros Bizjak 0 siblings, 1 reply; 43+ messages in thread From: H.J. Lu @ 2012-03-19 16:56 UTC To: Uros Bizjak; +Cc: gcc-patches, Richard Henderson On Mon, Mar 19, 2012 at 9:49 AM, Uros Bizjak <ubizjak@gmail.com> wrote: > On Mon, Mar 19, 2012 at 5:47 PM, H.J. Lu <hjl.tools@gmail.com> wrote: > >> For x32, thread pointer is an unsigned 32bit value. >> >> movl %fs:0, %eax >> >> is the correct instruction to load thread pointer into EAX and RAX. > > So, where is ZERO_EXTEND RTX then? > Thread pointer (TP) is an opaque value to GCC. GCC needs to load TP into a SImode or DImode register. ZERO_EXTEND isn't needed when there is a single instruction to load TP into a DImode register. -- H.J.
* Re: PATCH: Properly generate X32 IE sequence 2012-03-19 16:56 ` H.J. Lu @ 2012-03-19 17:02 ` Uros Bizjak 2012-03-19 17:30 ` Uros Bizjak 0 siblings, 1 reply; 43+ messages in thread From: Uros Bizjak @ 2012-03-19 17:02 UTC To: H.J. Lu; +Cc: gcc-patches, Richard Henderson On Mon, Mar 19, 2012 at 5:55 PM, H.J. Lu <hjl.tools@gmail.com> wrote: >>> For x32, thread pointer is an unsigned 32bit value. >>> >>> movl %fs:0, %eax >>> >>> is the correct instruction to load thread pointer into EAX and RAX. >> >> So, where is ZERO_EXTEND RTX then? >> > > Thread pointer (TP) is an opaque value to GCC. GCC needs to load > TP into a SImode or DImode register. ZERO_EXTEND isn't needed > when there is a single instruction to load TP into a DImode register. I don't agree with this explanation. The mode can't be SImode and DImode. TP is either SImode or ZERO_EXTENDed to DImode, this is the reason we went for all that TARGET_X32 stuff in TP load RTX. Please test my proposed patch. If it works OK, I will commit it to SVN. Thanks, Uros.
* Re: PATCH: Properly generate X32 IE sequence 2012-03-19 17:02 ` Uros Bizjak @ 2012-03-19 17:30 ` Uros Bizjak 2012-03-19 17:50 ` H.J. Lu 0 siblings, 1 reply; 43+ messages in thread From: Uros Bizjak @ 2012-03-19 17:30 UTC To: H.J. Lu; +Cc: gcc-patches, Richard Henderson On Mon, Mar 19, 2012 at 6:01 PM, Uros Bizjak <ubizjak@gmail.com> wrote: >>>> For x32, thread pointer is an unsigned 32bit value. >>>> >>>> movl %fs:0, %eax >>>> >>>> is the correct instruction to load thread pointer into EAX and RAX. >>> >>> So, where is ZERO_EXTEND RTX then? >>> >> >> Thread pointer (TP) is an opaque value to GCC. GCC needs to load >> TP into a SImode or DImode register. ZERO_EXTEND isn't needed >> when there is a single instruction to load TP into a DImode register. > > I don't agree with this explanation. The mode can't be SImode and > DImode. TP is either SImode or ZERO_EXTENDed to DImode, this is the > reason we went for all that TARGET_X32 stuff in TP load RTX. > > Please test my proposed patch. If it works OK, I will commit it to SVN. The only acceptable way is to generate ZERO_EXTEND in place, so: --cut here-- static rtx get_thread_pointer (enum machine_mode tp_mode, bool to_reg) { rtx tp = gen_rtx_UNSPEC (ptr_mode, gen_rtvec (1, const0_rtx), UNSPEC_TP); if (GET_MODE (tp) != tp_mode) { gcc_assert (GET_MODE (tp) == SImode); gcc_assert (tp_mode == DImode); tp = gen_rtx_ZERO_EXTEND (tp_mode, tp); } if (to_reg) tp = copy_to_mode_reg (tp_mode, tp); return tp; } --cut here-- This will generate: movq c@gottpoff(%rip), %rax movzbl %fs:(%rax), %eax movb %al, y(%rip) movq w@gottpoff(%rip), %rax movzwl %fs:(%rax), %eax movw %ax, i(%rip) ret Uros.
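The disagreement is about what the DImode value of the thread pointer means. On x86-64 any write to a 32-bit register clears bits 63:32, so `movl %fs:0, %eax` does leave the zero-extended thread pointer in %rax; Uros's point is that the RTL must spell that out as a ZERO_EXTEND. Below is a toy C model of the value his `get_thread_pointer` describes, assuming x32's 32-bit `ptr_mode`. The enum and function names are made up for illustration and are not GCC interfaces.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-ins for GCC machine modes; not GCC's real types. */
enum toy_mode { TOY_SImode, TOY_DImode };

/* Hypothetical model of the value get_thread_pointer describes on x32,
   where ptr_mode is SImode: the thread pointer is a 32-bit value, and a
   DImode request gets an explicit ZERO_EXTEND rather than a "fake"
   SImode register reused in DImode. */
static uint64_t thread_pointer_value(uint32_t tp, enum toy_mode tp_mode)
{
    if (tp_mode != TOY_SImode) {
        assert(tp_mode == TOY_DImode);   /* mirrors the gcc_assert calls */
        return (uint64_t)tp;             /* (zero_extend:DI (unspec:SI TP)) */
    }
    return tp;                           /* plain (unspec:SI TP) */
}

/* For contrast: a sign extension, which would be wrong for a pointer. */
static uint64_t sign_extend_32(uint32_t x)
{
    return (x & 0x80000000u) ? ((uint64_t)x | 0xffffffff00000000ull) : x;
}
```

With an illustrative pointer of 0xf7ff8700, the DImode value stays 0x00000000f7ff8700, whereas a sign extension would yield 0xfffffffff7ff8700, an address outside the 4 GiB range that x32 pointers can reach.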
* Re: PATCH: Properly generate X32 IE sequence 2012-03-19 17:30 ` Uros Bizjak @ 2012-03-19 17:50 ` H.J. Lu 2012-03-19 19:14 ` Uros Bizjak 0 siblings, 1 reply; 43+ messages in thread From: H.J. Lu @ 2012-03-19 17:50 UTC To: Uros Bizjak; +Cc: gcc-patches, Richard Henderson On Mon, Mar 19, 2012 at 10:29 AM, Uros Bizjak <ubizjak@gmail.com> wrote: > On Mon, Mar 19, 2012 at 6:01 PM, Uros Bizjak <ubizjak@gmail.com> wrote: >>>>> For x32, thread pointer is an unsigned 32bit value. >>>>> >>>>> movl %fs:0, %eax >>>>> >>>>> is the correct instruction to load thread pointer into EAX and RAX. >>>> >>>> So, where is ZERO_EXTEND RTX then? >>>> >>> >>> Thread pointer (TP) is an opaque value to GCC. GCC needs to load >>> TP into a SImode or DImode register. ZERO_EXTEND isn't needed >>> when there is a single instruction to load TP into a DImode register. >> >> I don't agree with this explanation. The mode can't be SImode and >> DImode. TP is either SImode or ZERO_EXTENDed to DImode, this is the >> reason we went for all that TARGET_X32 stuff in TP load RTX. FWIW, the TP maintained by the OS is opaque to GCC, and GCC modes don't apply to the TP value maintained by the OS. The instruction pattern to load TP into a register is provided by the OS and is also opaque to GCC. The x32 OS provides single instructions to load TP into SImode and DImode registers. We can load the x32 TP into an SImode register and ZERO_EXTEND it to DImode, or we can use the OS-provided instruction to load TP into a DImode register directly. >> Please test my proposed patch. If it works OK, I will commit it to SVN.
> > The only acceptable way is to generate ZERO_EXTEND in place, so: > > --cut here-- > static rtx > get_thread_pointer (enum machine_mode tp_mode, bool to_reg) > { > rtx tp = gen_rtx_UNSPEC (ptr_mode, gen_rtvec (1, const0_rtx), UNSPEC_TP); > > if (GET_MODE (tp) != tp_mode) > { > gcc_assert (GET_MODE (tp) == SImode); > gcc_assert (tp_mode == DImode); > > tp = gen_rtx_ZERO_EXTEND (tp_mode, tp); > } > > if (to_reg) > tp = copy_to_mode_reg (tp_mode, tp); > > return tp; > } > --cut here-- This version works fine. Thanks. -- H.J.
* Re: PATCH: Properly generate X32 IE sequence 2012-03-19 17:50 ` H.J. Lu @ 2012-03-19 19:14 ` Uros Bizjak 2012-03-20 9:35 ` Paolo Bonzini 0 siblings, 1 reply; 43+ messages in thread From: Uros Bizjak @ 2012-03-19 19:14 UTC To: H.J. Lu; +Cc: gcc-patches, Richard Henderson [-- Attachment #1: Type: text/plain, Size: 1924 bytes --] On Mon, Mar 19, 2012 at 6:50 PM, H.J. Lu <hjl.tools@gmail.com> wrote: >>> Please test my proposed patch. If it works OK, I will commit it to SVN. >> >> The only acceptable way is to generate ZERO_EXTEND in place, so: >> >> --cut here-- >> static rtx >> get_thread_pointer (enum machine_mode tp_mode, bool to_reg) >> { >> rtx tp = gen_rtx_UNSPEC (ptr_mode, gen_rtvec (1, const0_rtx), UNSPEC_TP); >> >> if (GET_MODE (tp) != tp_mode) >> { >> gcc_assert (GET_MODE (tp) == SImode); >> gcc_assert (tp_mode == DImode); >> >> tp = gen_rtx_ZERO_EXTEND (tp_mode, tp); >> } >> >> if (to_reg) >> tp = copy_to_mode_reg (tp_mode, tp); >> >> return tp; >> } >> --cut here-- > > This version works fine. Attached patch was committed to mainline SVN with following ChangeLog: 2012-03-19 Uros Bizjak <ubizjak@gmail.com> * config/i386/i386.c (get_thread_pointer): Add tp_mode argument. Generate ZERO_EXTEND in place if GET_MODE (tp) != tp_mode. (legitimize_tls_address) <TLS_MODEL_INITIAL_EXEC>: Always generate DImode UNSPEC_GOTNTPOFF references on TARGET_64BIT. (ix86_decompose_address): Allow zero extended UNSPEC_TP references. Revert: 2012-03-13 Uros Bizjak <ubizjak@gmail.com> * config/i386/i386.h (TARGET_TLS_INDIRECT_SEG_REFS): New. * config/i386/i386.c (ix86_decompose_address): Use TARGET_TLS_INDIRECT_SEG_REFS to prevent %fs:(%reg) addresses. (legitimize_tls_address): Use TARGET_TLS_INDIRECT_SEG_REFS to load thread pointer to a register. Revert: 2012-03-10 H.J. Lu <hongjiu.lu@intel.com> * config/i386/i386.c (ix86_decompose_address): Disallow fs:(reg) if Pmode != word_mode.
(legitimize_tls_address): Call gen_tls_initial_exec_x32 if Pmode == SImode for TARGET_X32. * config/i386/i386.md (UNSPEC_TLS_IE_X32): New. (tls_initial_exec_x32): Likewise. Tested on x86_64-pc-linux-gnu {,-m32}. Thanks, Uros. [-- Attachment #2: p.diff.txt --] [-- Type: text/plain, Size: 6477 bytes --] Index: i386.md =================================================================== --- i386.md (revision 185524) +++ i386.md (working copy) @@ -96,7 +96,6 @@ UNSPEC_TLS_LD_BASE UNSPEC_TLSDESC UNSPEC_TLS_IE_SUN - UNSPEC_TLS_IE_X32 ;; Other random patterns UNSPEC_SCAS @@ -12836,28 +12835,6 @@ } [(set_attr "type" "multi")]) -;; When Pmode == SImode, there may be no REX prefix for ADD. Avoid -;; any instructions between MOV and ADD, which may interfere linker -;; IE->LE optimization, since the last byte of the previous instruction -;; before ADD may look like a REX prefix. This also avoids -;; movl x@gottpoff(%rip), %reg32 -;; movl $fs:(%reg32), %reg32 -;; Since address override works only on the (reg32) part in fs:(reg32), -;; we can't use it as memory operand. -(define_insn "tls_initial_exec_x32" - [(set (match_operand:SI 0 "register_operand" "=r") - (unspec:SI - [(match_operand 1 "tls_symbolic_operand")] - UNSPEC_TLS_IE_X32)) - (clobber (reg:CC FLAGS_REG))] - "TARGET_X32" -{ - output_asm_insn - ("mov{l}\t{%%fs:0, %0|%0, DWORD PTR fs:0}", operands); - return "add{l}\t{%a1@gottpoff(%%rip), %0|%0, %a1@gottpoff[rip]}"; -} - [(set_attr "type" "multi")]) - ;; GNU2 TLS patterns can be split. 
(define_expand "tls_dynamic_gnu2_32" Index: i386.c =================================================================== --- i386.c (revision 185524) +++ i386.c (working copy) @@ -11514,6 +11514,10 @@ ix86_decompose_address (rtx addr, struct ix86_addr scale = 1 << scale; break; + case ZERO_EXTEND: + op = XEXP (op, 0); + /* FALLTHRU */ + case UNSPEC: if (XINT (op, 1) == UNSPEC_TP && TARGET_TLS_DIRECT_SEG_REFS @@ -12483,15 +12487,20 @@ legitimize_pic_address (rtx orig, rtx reg) /* Load the thread pointer. If TO_REG is true, force it into a register. */ static rtx -get_thread_pointer (bool to_reg) +get_thread_pointer (enum machine_mode tp_mode, bool to_reg) { rtx tp = gen_rtx_UNSPEC (ptr_mode, gen_rtvec (1, const0_rtx), UNSPEC_TP); - if (GET_MODE (tp) != Pmode) - tp = convert_to_mode (Pmode, tp, 1); + if (GET_MODE (tp) != tp_mode) + { + gcc_assert (GET_MODE (tp) == SImode); + gcc_assert (tp_mode == DImode); + tp = gen_rtx_ZERO_EXTEND (tp_mode, tp); + } + if (to_reg) - tp = copy_addr_to_reg (tp); + tp = copy_to_mode_reg (tp_mode, tp); return tp; } @@ -12543,6 +12552,7 @@ legitimize_tls_address (rtx x, enum tls_model mode { rtx dest, base, off; rtx pic = NULL_RTX, tp = NULL_RTX; + enum machine_mode tp_mode = Pmode; int type; switch (model) @@ -12568,7 +12578,7 @@ legitimize_tls_address (rtx x, enum tls_model mode else emit_insn (gen_tls_dynamic_gnu2_32 (dest, x, pic)); - tp = get_thread_pointer (true); + tp = get_thread_pointer (Pmode, true); dest = force_reg (Pmode, gen_rtx_PLUS (Pmode, tp, dest)); set_unique_reg_note (get_last_insn (), REG_EQUAL, x); @@ -12618,7 +12628,7 @@ legitimize_tls_address (rtx x, enum tls_model mode else emit_insn (gen_tls_dynamic_gnu2_32 (base, tmp, pic)); - tp = get_thread_pointer (true); + tp = get_thread_pointer (Pmode, true); set_unique_reg_note (get_last_insn (), REG_EQUAL, gen_rtx_MINUS (Pmode, tmp, tp)); } @@ -12674,18 +12684,10 @@ legitimize_tls_address (rtx x, enum tls_model mode emit_insn (gen_tls_initial_exec_64_sun (dest, x)); 
return dest; } - else if (Pmode == SImode) - { - /* Always generate - movl %fs:0, %reg32 - addl xgottpoff(%rip), %reg32 - to support linker IE->LE optimization and avoid - fs:(%reg32) as memory operand. */ - dest = gen_reg_rtx (Pmode); - emit_insn (gen_tls_initial_exec_x32 (dest, x)); - return dest; - } + /* Generate DImode references to avoid %fs:(%reg32) + problems and linker IE->LE relaxation bug. */ + tp_mode = DImode; pic = NULL; type = UNSPEC_GOTNTPOFF; } @@ -12708,24 +12710,23 @@ legitimize_tls_address (rtx x, enum tls_model mode type = UNSPEC_INDNTPOFF; } - off = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, x), type); - off = gen_rtx_CONST (Pmode, off); + off = gen_rtx_UNSPEC (tp_mode, gen_rtvec (1, x), type); + off = gen_rtx_CONST (tp_mode, off); if (pic) - off = gen_rtx_PLUS (Pmode, pic, off); - off = gen_const_mem (Pmode, off); + off = gen_rtx_PLUS (tp_mode, pic, off); + off = gen_const_mem (tp_mode, off); set_mem_alias_set (off, ix86_GOT_alias_set ()); if (TARGET_64BIT || TARGET_ANY_GNU_TLS) { - base = get_thread_pointer (for_mov - || !(TARGET_TLS_DIRECT_SEG_REFS - && TARGET_TLS_INDIRECT_SEG_REFS)); - off = force_reg (Pmode, off); - return gen_rtx_PLUS (Pmode, base, off); + base = get_thread_pointer (tp_mode, + for_mov || !TARGET_TLS_DIRECT_SEG_REFS); + off = force_reg (tp_mode, off); + return gen_rtx_PLUS (tp_mode, base, off); } else { - base = get_thread_pointer (true); + base = get_thread_pointer (Pmode, true); dest = gen_reg_rtx (Pmode); emit_insn (ix86_gen_sub3 (dest, base, off)); } @@ -12739,14 +12740,13 @@ legitimize_tls_address (rtx x, enum tls_model mode if (TARGET_64BIT || TARGET_ANY_GNU_TLS) { - base = get_thread_pointer (for_mov - || !(TARGET_TLS_DIRECT_SEG_REFS - && TARGET_TLS_INDIRECT_SEG_REFS)); + base = get_thread_pointer (Pmode, + for_mov || !TARGET_TLS_DIRECT_SEG_REFS); return gen_rtx_PLUS (Pmode, base, off); } else { - base = get_thread_pointer (true); + base = get_thread_pointer (Pmode, true); dest = gen_reg_rtx (Pmode); emit_insn 
(ix86_gen_sub3 (dest, base, off)); } @@ -13274,8 +13274,7 @@ ix86_delegitimize_tls_address (rtx orig_x) rtx x = orig_x, unspec; struct ix86_address addr; - if (!(TARGET_TLS_DIRECT_SEG_REFS - && TARGET_TLS_INDIRECT_SEG_REFS)) + if (!TARGET_TLS_DIRECT_SEG_REFS) return orig_x; if (MEM_P (x)) x = XEXP (x, 0); Index: i386.h =================================================================== --- i386.h (revision 185524) +++ i386.h (working copy) @@ -467,9 +467,6 @@ extern int x86_prefetch_sse; #define TARGET_TLS_DIRECT_SEG_REFS_DEFAULT 0 #endif -/* Address override works only on the (%reg) part of %fs:(%reg). */ -#define TARGET_TLS_INDIRECT_SEG_REFS (Pmode == word_mode) - /* Fence to use after loop using storent. */ extern tree x86_mfence; ^ permalink raw reply [flat|nested] 43+ messages in thread
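The committed comment says DImode references avoid "%fs:(%reg32) problems". According to the `TARGET_TLS_INDIRECT_SEG_REFS` comment removed in this patch, an address-size override applies only to the (%reg) part of %fs:(%reg): the 64-bit FS base is still added after the 32-bit effective address has been formed. Below is a simplified C model of the two address forms, assuming a negative initial-exec offset. The function names and numbers are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified model of the linear address of a %fs-relative TLS access in
   64-bit mode.  FS_BASE is the x32 thread pointer (below 4 GiB) and TPOFF
   is the initial-exec offset loaded from the GOT, assumed negative here
   (variable below the thread pointer). */

/* %fs:(%rax): 64-bit index register, as with the DImode GOT load. */
static uint64_t tls_addr_index64(uint64_t fs_base, int64_t tpoff)
{
    return fs_base + (uint64_t)tpoff;    /* wraps modulo 2^64 */
}

/* addr32 %fs:(%eax): the address-size override truncates the effective
   address to 32 bits, but the 64-bit FS base is added afterwards. */
static uint64_t tls_addr_index32(uint64_t fs_base, int64_t tpoff)
{
    uint32_t ea = (uint32_t)tpoff;       /* 32-bit effective address */
    return fs_base + ea;                 /* no wrap at 4 GiB */
}
```

For fs_base = 0x40001000 and tpoff = -16, `tls_addr_index64` gives 0x40000ff0 as intended, while `tls_addr_index32` gives 0x140000ff0, because the offset was truncated to 0xfffffff0 before the base was added. Loading the GOT entry as a 64-bit value (`movq x@gottpoff(%rip), %rax`) and addressing through `%fs:(%rax)` avoids this.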
* Re: PATCH: Properly generate X32 IE sequence 2012-03-19 19:14 ` Uros Bizjak @ 2012-03-20 9:35 ` Paolo Bonzini 0 siblings, 0 replies; 43+ messages in thread From: Paolo Bonzini @ 2012-03-20 9:35 UTC To: gcc-patches On 19/03/2012 20:13, Uros Bizjak wrote: > 2012-03-19 Uros Bizjak <ubizjak@gmail.com> > > * config/i386/i386.c (get_thread_pointer): Add tp_mode argument. > Generate ZERO_EXTEND in place if GET_MODE (tp) != tp_mode. > (legitimize_tls_address) <TLS_MODEL_INITIAL_EXEC>: Always generate > DImode UNSPEC_GOTNTPOFF references on TARGET_64BIT. > (ix86_decompose_address): Allow zero extended UNSPEC_TP references. > > Revert: > 2012-03-13 Uros Bizjak <ubizjak@gmail.com> > > * config/i386/i386.h (TARGET_TLS_INDIRECT_SEG_REFS): New. > * config/i386/i386.c (ix86_decompose_address): Use > TARGET_TLS_INDIRECT_SEG_REFS to prevent %fs:(%reg) addresses. > (legitimize_tls_address): Use TARGET_TLS_INDIRECT_SEG_REFS to load > thread pointer to a register. > > Revert: > 2012-03-10 H.J. Lu <hongjiu.lu@intel.com> > > * config/i386/i386.c (ix86_decompose_address): Disallow fs:(reg) > if Pmode != word_mode. > (legitimize_tls_address): Call gen_tls_initial_exec_x32 if > Pmode == SImode for TARGET_X32. > > * config/i386/i386.md (UNSPEC_TLS_IE_X32): New. > (tls_initial_exec_x32): Likewise. > > Tested on x86_64-pc-linux-gnu {,-m32}. No testcases? Paolo
* Re: PATCH: Properly generate X32 IE sequence 2012-03-19 16:20 ` H.J. Lu 2012-03-19 16:35 ` H.J. Lu @ 2012-03-19 16:47 ` Uros Bizjak 1 sibling, 0 replies; 43+ messages in thread From: Uros Bizjak @ 2012-03-19 16:47 UTC To: H.J. Lu; +Cc: gcc-patches, Richard Henderson On Mon, Mar 19, 2012 at 5:19 PM, H.J. Lu <hjl.tools@gmail.com> wrote: > movl %fs:0, %eax > movq c@gottpoff(%rip), %rdx > movzbl (%rax,%rdx), %edx > movb %dl, y(%rip) > movq w@gottpoff(%rip), %rdx > movzwl (%rax,%rdx), %eax > movw %ax, i(%rip) > ret > > It can be > > movq c@gottpoff(%rip), %rax > movzbl %fs:(%rax), %eax > movb %al, y(%rip) > movq w@gottpoff(%rip), %rax > movzwl %fs:(%rax), %eax > movw %ax, i(%rip) > ret This is just CSE in action. It CSEd movl %fs:0, %eax, since it has to be zero extended before going into address. Uros.
* Re: PATCH: Properly generate X32 IE sequence 2012-03-18 20:55 ` Uros Bizjak 2012-03-19 15:51 ` H.J. Lu @ 2012-03-20 8:52 ` Eric Botcazou 2012-03-20 8:59 ` Jakub Jelinek 1 sibling, 1 reply; 43+ messages in thread From: Eric Botcazou @ 2012-03-20 8:52 UTC (permalink / raw) To: Uros Bizjak; +Cc: gcc-patches, H.J. Lu, Richard Henderson > The patch is bootstrapping now on x86_64-pc-linux-gnu. It very likely breaks bootstrap with RTL checking enabled: /sil.a/gnatmail/gnatmail-x/build-sil/x86-linux/gnat/obj/./gcc/xgcc -B/sil.a/gnatmail/gnatmail-x/build-sil/x86-linux/gnat/obj/./gcc/ -B/usr/gnat/i686-pc-linux-gnu/bin/ -B/usr/gnat/i686-pc-linux-gnu/lib/ -isystem /usr/gnat/i686-pc-linux-gnu/include -isystem /usr/gnat/i686-pc-linux-gnu/sys-include -g -O2 -O2 -g -O2 -DIN_GCC -W -Wall -Wwrite-strings -Wcast-qual -Wstrict-prototypes -Wmissing-prototypes -Wold-style-definition -isystem ./include -fpic -g -DIN_LIBGCC2 -fbuilding-libgcc -fno-stack-protector -fpic -I. -I. -I../.././gcc -I../../../src/libgcc -I../../../src/libgcc/. -I../../../src/libgcc/../gcc -I../../../src/libgcc/../include -I../../../src/libgcc/config/libbid -DENABLE_DECIMAL_BID_FORMAT -DHAVE_CC_TLS -DUSE_TLS -o _popcountsi2.o -MT _popcountsi2.o -MD -MP -MF _popcountsi2.dep -DL_popcountsi2 -c ../../../src/libgcc/libgcc2.c -fvisibility=hidden -DHIDE_EXPORTS ../../../src/libgcc/libgcc2.c: In function '__popcountsi2': ../../../src/libgcc/libgcc2.c:835:1: internal compiler error: RTL check: expected elt 1 type 'i' or 'n', have '0' (rtx mem) in ix86_decompose_address, at config/i386/i386.c:11522 Please submit a full bug report, with preprocessed source if appropriate. See <URL:mailto:report@adacore.com> for instructions. make[3]: *** [_popcountsi2.o] Error 1 -- Eric Botcazou ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: PATCH: Properly generate X32 IE sequence 2012-03-20 8:52 ` Eric Botcazou @ 2012-03-20 8:59 ` Jakub Jelinek 2012-03-20 11:20 ` Jakub Jelinek 0 siblings, 1 reply; 43+ messages in thread From: Jakub Jelinek @ 2012-03-20 8:59 UTC (permalink / raw) To: Eric Botcazou; +Cc: Uros Bizjak, gcc-patches, H.J. Lu, Richard Henderson On Tue, Mar 20, 2012 at 09:51:07AM +0100, Eric Botcazou wrote: > > The patch is bootstrapping now on x86_64-pc-linux-gnu. > > It very likely breaks bootstrap with RTL checking enabled: > > /sil.a/gnatmail/gnatmail-x/build-sil/x86-linux/gnat/obj/./gcc/xgcc -B/sil.a/gnatmail/gnatmail-x/build-sil/x86-linux/gnat/obj/./gcc/ -B/usr/gnat/i686-pc-linux-gnu/bin/ -B/usr/gnat/i686-pc-linux-gnu/lib/ -isystem /usr/gnat/i686-pc-linux-gnu/include -isystem /usr/gnat/i686-pc-linux-gnu/sys-include -g -O2 -O2 -g -O2 -DIN_GCC -W -Wall -Wwrite-strings -Wcast-qual -Wstrict-prototypes -Wmissing-prototypes -Wold-style-definition -isystem ./include -fpic -g -DIN_LIBGCC2 -fbuilding-libgcc -fno-stack-protector -fpic -I. -I. -I../.././gcc -I../../../src/libgcc -I../../../src/libgcc/. -I../../../src/libgcc/../gcc -I../../../src/libgcc/../include -I../../../src/libgcc/config/libbid -DENABLE_DECIMAL_BID_FORMAT -DHAVE_CC_TLS -DUSE_TLS -o > _popcountsi2.o -MT _popcountsi2.o -MD -MP -MF > _popcountsi2.dep -DL_popcountsi2 -c ../../../src/libgcc/libgcc2.c -fvisibility=hidden -DHIDE_EXPORTS > ../../../src/libgcc/libgcc2.c: In function '__popcountsi2': > ../../../src/libgcc/libgcc2.c:835:1: internal compiler error: RTL check: > expected elt 1 type 'i' or 'n', have '0' (rtx mem) in ix86_decompose_address, > at config/i386/i386.c:11522 > Please submit a full bug report, > with preprocessed source if appropriate. > See <URL:mailto:report@adacore.com> for instructions. > make[3]: *** [_popcountsi2.o] Error 1 Yeah, my bootstrap just failed the same. 
Will test:

2012-03-20  Jakub Jelinek  <jakub@redhat.com>

        * config/i386/i386.c (ix86_decompose_address) <case ZERO_EXTEND>:
        If operand isn't UNSPEC, return 0.

--- gcc/config/i386/i386.c.jj 2012-03-20 09:35:06.000000000 +0100
+++ gcc/config/i386/i386.c 2012-03-20 09:56:35.038835835 +0100
@@ -11516,6 +11516,8 @@ ix86_decompose_address (rtx addr, struct
 
        case ZERO_EXTEND:
          op = XEXP (op, 0);
+         if (GET_CODE (op) != UNSPEC)
+           return 0;
          /* FALLTHRU */
 
        case UNSPEC:

        Jakub
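A toy model may help show why the RTL-checking build failed and why the two added lines fix it. The sketch below is plain C, not GCC code: the rtx codes, the `toy_*` names and the unspec number 42 are all hypothetical. It assumes, from the error text, that the `UNSPEC` case reads the unspec number with `XINT (op, 1)`, that an `UNSPEC` has format "Ei" and a `MEM` has format "e0". Under those assumptions, a `ZERO_EXTEND` of a `MEM` that falls straight through reaches element 1 of the `MEM`, which is not an integer slot.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Toy model (hypothetical names): a tiny subset of rtx codes, each with a
   format string in the style of GCC's rtl.def.  'e' = sub-expression,
   'E' = vector, 'i' = integer, '0' = slot that is not an integer.  */
enum toy_code { TOY_UNSPEC, TOY_MEM, TOY_ZERO_EXTEND };

static const char *const toy_format[] = {
  [TOY_UNSPEC] = "Ei",
  [TOY_MEM] = "e0",
  [TOY_ZERO_EXTEND] = "e",
};

enum { TOY_UNSPEC_TP = 42 };  /* hypothetical unspec number for TP */

struct toy_rtx
{
  enum toy_code code;
  const struct toy_rtx *op0;  /* element 0 (sub-expression) */
  int int1;                   /* element 1 when it is an integer */
};

int toy_rtl_check_failed;  /* set instead of aborting the compiler */

/* Checked accessor in the spirit of XINT under RTL checking:
   element N must have format 'i' or 'n'.  */
int
toy_xint (const struct toy_rtx *x, int n)
{
  const char *fmt = toy_format[x->code];
  assert (n >= 0 && (size_t) n < strlen (fmt));
  if (fmt[n] != 'i' && fmt[n] != 'n')
    {
      fprintf (stderr, "RTL check: expected elt %d type 'i' or 'n', have '%c'\n",
               n, fmt[n]);
      toy_rtl_check_failed = 1;
      return -1;
    }
  return x->int1;
}

/* Before the fix: ZERO_EXTEND falls through to the UNSPEC handling no
   matter what it wraps.  Returns 1 for a (possibly zero-extended) TP
   reference, else 0.  */
int
toy_decompose_old (const struct toy_rtx *op)
{
  switch (op->code)
    {
    case TOY_ZERO_EXTEND:
      op = op->op0;
      /* FALLTHRU */
    case TOY_UNSPEC:
      return toy_xint (op, 1) == TOY_UNSPEC_TP;
    default:
      return 0;
    }
}

/* After the fix: reject a ZERO_EXTEND whose operand is not an UNSPEC
   before reading the unspec number.  */
int
toy_decompose_new (const struct toy_rtx *op)
{
  switch (op->code)
    {
    case TOY_ZERO_EXTEND:
      op = op->op0;
      if (op->code != TOY_UNSPEC)
        return 0;
      /* FALLTHRU */
    case TOY_UNSPEC:
      return toy_xint (op, 1) == TOY_UNSPEC_TP;
    default:
      return 0;
    }
}
```

With the guard, `toy_decompose_new` rejects a zero-extended `MEM` without touching its element 1, while a zero-extended TP `UNSPEC` is still accepted; `toy_decompose_old` on the same `MEM` input trips the simulated check.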
* Re: PATCH: Properly generate X32 IE sequence
  2012-03-20 8:59 ` Jakub Jelinek
@ 2012-03-20 11:20 ` Jakub Jelinek
  2012-03-20 15:52 ` H.J. Lu
  0 siblings, 1 reply; 43+ messages in thread
From: Jakub Jelinek @ 2012-03-20 11:20 UTC
To: Uros Bizjak; +Cc: Eric Botcazou, gcc-patches, H.J. Lu, Richard Henderson

On Tue, Mar 20, 2012 at 09:58:29AM +0100, Jakub Jelinek wrote:
> Yeah, my bootstrap just failed the same. Will test:
>
> 2012-03-20  Jakub Jelinek  <jakub@redhat.com>
>
>         * config/i386/i386.c (ix86_decompose_address) <case ZERO_EXTEND>:
>         If operand isn't UNSPEC, return 0.

Committed as obvious now that bootstrap/regtest finished on x86_64-linux
and i686-linux.

> --- gcc/config/i386/i386.c.jj 2012-03-20 09:35:06.000000000 +0100
> +++ gcc/config/i386/i386.c 2012-03-20 09:56:35.038835835 +0100
> @@ -11516,6 +11516,8 @@ ix86_decompose_address (rtx addr, struct
>
>         case ZERO_EXTEND:
>           op = XEXP (op, 0);
> +         if (GET_CODE (op) != UNSPEC)
> +           return 0;
>           /* FALLTHRU */
>
>         case UNSPEC:

        Jakub
* Re: PATCH: Properly generate X32 IE sequence
  2012-03-20 11:20 ` Jakub Jelinek
@ 2012-03-20 15:52 ` H.J. Lu
  2012-03-20 17:55 ` Uros Bizjak
  0 siblings, 1 reply; 43+ messages in thread
From: H.J. Lu @ 2012-03-20 15:52 UTC
To: Jakub Jelinek; +Cc: Uros Bizjak, Eric Botcazou, gcc-patches, Richard Henderson

On Tue, Mar 20, 2012 at 4:19 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Tue, Mar 20, 2012 at 09:58:29AM +0100, Jakub Jelinek wrote:
>> Yeah, my bootstrap just failed the same. Will test:
>>
>> 2012-03-20  Jakub Jelinek  <jakub@redhat.com>
>>
>>         * config/i386/i386.c (ix86_decompose_address) <case ZERO_EXTEND>:
>>         If operand isn't UNSPEC, return 0.
>
> Committed as obvious now that bootstrap/regtest finished on x86_64-linux
> and i686-linux.
>
>> --- gcc/config/i386/i386.c.jj 2012-03-20 09:35:06.000000000 +0100
>> +++ gcc/config/i386/i386.c 2012-03-20 09:56:35.038835835 +0100
>> @@ -11516,6 +11516,8 @@ ix86_decompose_address (rtx addr, struct
>>
>>         case ZERO_EXTEND:
>>           op = XEXP (op, 0);
>> +         if (GET_CODE (op) != UNSPEC)
>> +           return 0;
>>           /* FALLTHRU */
>>
>>         case UNSPEC:
>

Uros,

I think using the OS-provided instruction to load TP into a DImode register
could simplify the code.

--
H.J.
* Re: PATCH: Properly generate X32 IE sequence
  2012-03-20 15:52 ` H.J. Lu
@ 2012-03-20 17:55 ` Uros Bizjak
  2012-03-20 18:27 ` H.J. Lu
  0 siblings, 1 reply; 43+ messages in thread
From: Uros Bizjak @ 2012-03-20 17:55 UTC
To: H.J. Lu; +Cc: Jakub Jelinek, Eric Botcazou, gcc-patches, Richard Henderson

On Tue, Mar 20, 2012 at 4:52 PM, H.J. Lu <hjl.tools@gmail.com> wrote:

>>> Yeah, my bootstrap just failed the same. Will test:
>>>
>>> 2012-03-20  Jakub Jelinek  <jakub@redhat.com>
>>>
>>>         * config/i386/i386.c (ix86_decompose_address) <case ZERO_EXTEND>:
>>>         If operand isn't UNSPEC, return 0.
>>
>> Committed as obvious now that bootstrap/regtest finished on x86_64-linux
>> and i686-linux.
>>
>>> --- gcc/config/i386/i386.c.jj 2012-03-20 09:35:06.000000000 +0100
>>> +++ gcc/config/i386/i386.c 2012-03-20 09:56:35.038835835 +0100
>>> @@ -11516,6 +11516,8 @@ ix86_decompose_address (rtx addr, struct
>>>
>>>         case ZERO_EXTEND:
>>>           op = XEXP (op, 0);
>>> +         if (GET_CODE (op) != UNSPEC)
>>> +           return 0;
>>>           /* FALLTHRU */
>>>
>>>         case UNSPEC:
>>
>
> Uros,
>
> I think using the OS-provided instruction to load TP into a DImode register
> could simplify the code.

Which OS-provided instruction?

Please see how TP is defined in get_thread_pointer; it is in ptr_mode:

  rtx tp = gen_rtx_UNSPEC (ptr_mode, gen_rtvec (1, const0_rtx), UNSPEC_TP);

This says that TP is in SImode on X32.

Uros.
* Re: PATCH: Properly generate X32 IE sequence
  2012-03-20 17:55 ` Uros Bizjak
@ 2012-03-20 18:27 ` H.J. Lu
  2012-03-20 18:44 ` Uros Bizjak
  0 siblings, 1 reply; 43+ messages in thread
From: H.J. Lu @ 2012-03-20 18:27 UTC
To: Uros Bizjak; +Cc: Jakub Jelinek, Eric Botcazou, gcc-patches, Richard Henderson

On Tue, Mar 20, 2012 at 10:54 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
> On Tue, Mar 20, 2012 at 4:52 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>
>>>> Yeah, my bootstrap just failed the same. Will test:
>>>>
>>>> 2012-03-20  Jakub Jelinek  <jakub@redhat.com>
>>>>
>>>>         * config/i386/i386.c (ix86_decompose_address) <case ZERO_EXTEND>:
>>>>         If operand isn't UNSPEC, return 0.
>>>
>>> Committed as obvious now that bootstrap/regtest finished on x86_64-linux
>>> and i686-linux.
>>>
>>>> --- gcc/config/i386/i386.c.jj 2012-03-20 09:35:06.000000000 +0100
>>>> +++ gcc/config/i386/i386.c 2012-03-20 09:56:35.038835835 +0100
>>>> @@ -11516,6 +11516,8 @@ ix86_decompose_address (rtx addr, struct
>>>>
>>>>         case ZERO_EXTEND:
>>>>           op = XEXP (op, 0);
>>>> +         if (GET_CODE (op) != UNSPEC)
>>>> +           return 0;
>>>>           /* FALLTHRU */
>>>>
>>>>         case UNSPEC:
>>>
>>
>> Uros,
>>
>> I think using the OS-provided instruction to load TP into a DImode register
>> could simplify the code.
>
> Which OS-provided instruction?
>
> Please see how TP is defined in get_thread_pointer; it is in ptr_mode:
>
>   rtx tp = gen_rtx_UNSPEC (ptr_mode, gen_rtvec (1, const0_rtx), UNSPEC_TP);
>
> This says that TP is in SImode on X32.
>
> Uros.

TP is defined as (unspec:DI [(const_int 0)] UNSPEC_TP)
and provided by the OS. It is a CONST_INT, but its value is opaque
to GCC. MODE here has no impact on the value provided by the OS.
X32 OS provides instructions to load TP into SImode and
DImode registers.

--
H.J.
* Re: PATCH: Properly generate X32 IE sequence
  2012-03-20 18:27 ` H.J. Lu
@ 2012-03-20 18:44 ` Uros Bizjak
  2012-03-20 19:26 ` H.J. Lu
  0 siblings, 1 reply; 43+ messages in thread
From: Uros Bizjak @ 2012-03-20 18:44 UTC
To: H.J. Lu; +Cc: Jakub Jelinek, Eric Botcazou, gcc-patches, Richard Henderson

On Tue, Mar 20, 2012 at 7:27 PM, H.J. Lu <hjl.tools@gmail.com> wrote:

>>> I think using the OS-provided instruction to load TP into a DImode register
>>> could simplify the code.
>>
>> Which OS-provided instruction?
>>
>> Please see how TP is defined in get_thread_pointer; it is in ptr_mode:
>>
>>   rtx tp = gen_rtx_UNSPEC (ptr_mode, gen_rtvec (1, const0_rtx), UNSPEC_TP);
>>
>> This says that TP is in SImode on X32.

> TP is defined as (unspec:DI [(const_int 0)] UNSPEC_TP)
> and provided by the OS. It is a CONST_INT, but its value is opaque
> to GCC. MODE here has no impact on the value provided by the OS.
> X32 OS provides instructions to load TP into SImode and
> DImode registers.

You must be looking at some other GCC sources than I am.

(define_insn "*load_tp_x32"
  [(set (match_operand:SI 0 "register_operand" "=r")
        (unspec:SI [(const_int 0)] UNSPEC_TP))]
  "TARGET_X32"
  "mov{l}\t{%%fs:0, %0|%0, DWORD PTR fs:0}"
  [(set_attr "type" "imov")
   (set_attr "modrm" "0")
   (set_attr "length" "7")
   (set_attr "memory" "load")
   (set_attr "imm_disp" "false")])

(define_insn "*load_tp_x32_zext"
  [(set (match_operand:DI 0 "register_operand" "=r")
        (zero_extend:DI (unspec:SI [(const_int 0)] UNSPEC_TP)))]
  "TARGET_X32"
  "mov{l}\t{%%fs:0, %k0|%k0, DWORD PTR fs:0}"
  [(set_attr "type" "imov")
   (set_attr "modrm" "0")
   (set_attr "length" "7")
   (set_attr "memory" "load")
   (set_attr "imm_disp" "false")])

Uros.
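Both patterns above emit the same 32-bit `mov{l}` from `%fs:0`; the `_zext` variant only makes the destination a DImode register and prints it with `%k0` (its 32-bit name). That works because on x86-64 a write to a 32-bit register zero-extends into the full 64-bit register, whereas an 8- or 16-bit write merges with the old contents. A minimal C model of that register rule follows; the helper names are hypothetical and plain integers stand in for registers, so this illustrates the semantics rather than GCC's implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned widening in C never sets the upper bits, which is what the
   models below rely on to mirror RTL zero_extend.  */
static_assert ((uint64_t) (uint32_t) -1 == 0xffffffffull,
               "uint32_t -> uint64_t conversion zero-extends");

/* Hypothetical model of writing the 32-bit sub-register (%eax) of a
   64-bit register (%rax): the old upper half is discarded and
   bits 32..63 become zero.  */
uint64_t
write_reg32 (uint64_t old, uint32_t val)
{
  (void) old;
  return (uint64_t) val;
}

/* For contrast: a 16-bit write (%ax) keeps bits 16..63 of the old value.  */
uint64_t
write_reg16 (uint64_t old, uint16_t val)
{
  return (old & ~(uint64_t) 0xffff) | val;
}

/* "*load_tp_x32": the SImode result is just the 4 bytes at %fs:0.  */
uint32_t
model_load_tp_x32 (uint32_t fs0)
{
  return fs0;
}

/* "*load_tp_x32_zext": DImode destination, but the emitted insn is the
   same 32-bit mov into %k0, so the hardware supplies the zero_extend.  */
uint64_t
model_load_tp_x32_zext (uint64_t old_reg, uint32_t fs0)
{
  return write_reg32 (old_reg, fs0);
}
```

Whatever garbage the 64-bit register held before, `model_load_tp_x32_zext` yields exactly the 32-bit TP value, which is why no separate extension instruction is needed.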
* Re: PATCH: Properly generate X32 IE sequence
  2012-03-20 18:44 ` Uros Bizjak
@ 2012-03-20 19:26 ` H.J. Lu
  0 siblings, 0 replies; 43+ messages in thread
From: H.J. Lu @ 2012-03-20 19:26 UTC
To: Uros Bizjak; +Cc: Jakub Jelinek, Eric Botcazou, gcc-patches, Richard Henderson

On Tue, Mar 20, 2012 at 11:43 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
> On Tue, Mar 20, 2012 at 7:27 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>
>>>> I think using the OS-provided instruction to load TP into a DImode register
>>>> could simplify the code.
>>>
>>> Which OS-provided instruction?
>>>
>>> Please see how TP is defined in get_thread_pointer; it is in ptr_mode:
>>>
>>>   rtx tp = gen_rtx_UNSPEC (ptr_mode, gen_rtvec (1, const0_rtx), UNSPEC_TP);
>>>
>>> This says that TP is in SImode on X32.
>
>> TP is defined as (unspec:DI [(const_int 0)] UNSPEC_TP)
>> and provided by the OS. It is a CONST_INT, but its value is opaque
>> to GCC. MODE here has no impact on the value provided by the OS.
>> X32 OS provides instructions to load TP into SImode and
>> DImode registers.
>
> You must be looking at some other GCC sources than I am.
>
> (define_insn "*load_tp_x32"
>   [(set (match_operand:SI 0 "register_operand" "=r")
>         (unspec:SI [(const_int 0)] UNSPEC_TP))]
>   "TARGET_X32"
>   "mov{l}\t{%%fs:0, %0|%0, DWORD PTR fs:0}"
>   [(set_attr "type" "imov")
>    (set_attr "modrm" "0")
>    (set_attr "length" "7")
>    (set_attr "memory" "load")
>    (set_attr "imm_disp" "false")])
>
> (define_insn "*load_tp_x32_zext"
>   [(set (match_operand:DI 0 "register_operand" "=r")
>         (zero_extend:DI (unspec:SI [(const_int 0)] UNSPEC_TP)))]
>   "TARGET_X32"
>   "mov{l}\t{%%fs:0, %k0|%k0, DWORD PTR fs:0}"
>   [(set_attr "type" "imov")
>    (set_attr "modrm" "0")
>    (set_attr "length" "7")
>    (set_attr "memory" "load")
>    (set_attr "imm_disp" "false")])
>

The thread pointer (TP) points to the thread control block (TCB). The X32 TCB is

typedef struct
{
  void *tcb;            /* Pointer to the TCB.  Not necessarily the
                           thread descriptor used by libpthread.  */
  ...
}

It is a 32-bit address set up by the OS. That is where the 0 in "%fs:0"
comes from, since it is the first field of the struct that %fs points to.
X32 OS provides

        mov %fs:0, %eax

to load the address of TCB into EAX and

        mov %fs:0, %eax

to load the address of TCB into RAX, since the OS guarantees that the upper
32 bits of the address of TCB are all 0s. We added "*load_tp_x32_zext"
since we zero-extend SI TP to DI TP. Or we can use

        mov %fs:0, %eax

to directly load the value of the tcb field into RAX and remove
"*load_tp_x32_zext". It will simplify the code.

--
H.J.
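The layout argument in the last message can be modeled in C on any host. This is a sketch under stated assumptions: pointer-sized fields are written as `uint32_t` to stand for x32's 4-byte pointers, `%fs:0` is modeled as a 4-byte read at offset 0 from a base pointer, and only the `tcb` field quoted above is represented (the elided fields are not guessed). All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the start of the x32 TCB.  x32 pointers are
   4 bytes, so "void *tcb" is modeled as uint32_t.  Only the first
   field from the message is modeled; the elided fields are left out.  */
typedef struct
{
  uint32_t tcb;   /* models "void *tcb": the TCB's own address */
} x32_tcb_head_model;

/* The 0 in "%fs:0" is the offset of the tcb field.  */
enum { X32_TCB_FIELD_OFFSET = offsetof (x32_tcb_head_model, tcb) };
static_assert (X32_TCB_FIELD_OFFSET == 0, "tcb must be the first field");

/* Model of "mov %fs:0, %eax": a 4-byte load at offset 0 from the
   segment base (FS_BASE stands in for the %fs base address).  */
uint32_t
model_load_fs0 (const void *fs_base)
{
  uint32_t v;
  memcpy (&v, (const unsigned char *) fs_base + X32_TCB_FIELD_OFFSET,
          sizeof v);
  return v;
}

/* The same 4-byte load as seen in %rax: modeling a 32-bit destination,
   bits 32..63 of the result are zero.  */
uint64_t
model_load_fs0_rax (const void *fs_base)
{
  return (uint64_t) model_load_fs0 (fs_base);
}
```

Reading the first field therefore gives the TCB address both as a 32-bit value and, with the upper half cleared, as a 64-bit one; the model says nothing about the fields the message elides.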
end of thread, other threads:[~2012-03-20 19:26 UTC | newest]

Thread overview: 43+ messages
2012-03-09 22:26 PATCH: Properly generate X32 IE sequence H.J. Lu
2012-03-10 13:10 ` Uros Bizjak
2012-03-10 18:50 ` H.J. Lu
2012-03-11 17:12 ` H.J. Lu
2012-03-11 17:55 ` Uros Bizjak
2012-03-11 18:16 ` H.J. Lu
2012-03-11 18:21 ` Uros Bizjak
2012-03-11 21:25 ` H.J. Lu
2012-03-12 19:39 ` Uros Bizjak
2012-03-12 22:35 ` H.J. Lu
2012-03-13 1:21 ` H.J. Lu
2012-03-13 7:11 ` Uros Bizjak
2012-03-13 10:37 ` Uros Bizjak
2012-03-13 15:47 ` H.J. Lu
2012-03-17 17:53 ` H.J. Lu
2012-03-17 18:10 ` Uros Bizjak
2012-03-17 18:19 ` H.J. Lu
2012-03-17 18:21 ` Uros Bizjak
2012-03-17 21:50 ` H.J. Lu
2012-03-18 16:02 ` Uros Bizjak
2012-03-18 20:55 ` Uros Bizjak
2012-03-19 15:51 ` H.J. Lu
2012-03-19 15:54 ` H.J. Lu
2012-03-19 16:20 ` H.J. Lu
2012-03-19 16:35 ` H.J. Lu
2012-03-19 16:38 ` Uros Bizjak
2012-03-19 16:47 ` H.J. Lu
2012-03-19 16:49 ` Uros Bizjak
2012-03-19 16:56 ` H.J. Lu
2012-03-19 17:02 ` Uros Bizjak
2012-03-19 17:30 ` Uros Bizjak
2012-03-19 17:50 ` H.J. Lu
2012-03-19 19:14 ` Uros Bizjak
2012-03-20 9:35 ` Paolo Bonzini
2012-03-19 16:47 ` Uros Bizjak
2012-03-20 8:52 ` Eric Botcazou
2012-03-20 8:59 ` Jakub Jelinek
2012-03-20 11:20 ` Jakub Jelinek
2012-03-20 15:52 ` H.J. Lu
2012-03-20 17:55 ` Uros Bizjak
2012-03-20 18:27 ` H.J. Lu
2012-03-20 18:44 ` Uros Bizjak
2012-03-20 19:26 ` H.J. Lu