public inbox for gcc-patches@gcc.gnu.org
* PATCH: Properly generate X32 IE sequence
@ 2012-03-09 22:26 H.J. Lu
  2012-03-10 13:10 ` Uros Bizjak
  0 siblings, 1 reply; 43+ messages in thread
From: H.J. Lu @ 2012-03-09 22:26 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: gcc-patches, Richard Henderson

[-- Attachment #1: Type: text/plain, Size: 6145 bytes --]

On Mon, Mar 5, 2012 at 9:25 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
> On Mon, Mar 5, 2012 at 6:03 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>
>>>> X86-64 linker optimizes TLS_MODEL_INITIAL_EXEC to TLS_MODEL_LOCAL_EXEC
>>>> by checking
>>>>
>>>>        movq foo@gottpoff(%rip), %reg
>>>>
>>>> and
>>>>
>>>>        addq foo@gottpoff(%rip), %reg
>>>>
>>>> It uses the REX prefix to avoid the last byte of the previous
>>>> instruction.  With 32bit Pmode, we may not have the REX prefix and
>>>> the last byte of the previous instruction may be an offset, which
>>>> may look like a REX prefix.  IE->LE optimization will generate corrupted
>>>> binary.  This patch makes sure we always output a REX prefix for
>>>> UNSPEC_GOTNTPOFF.  OK for trunk?
>>>
>>> Actually, linker has:
>>>
>>>    case R_X86_64_GOTTPOFF:
>>>      /* Check transition from IE access model:
>>>                mov foo@gottpoff(%rip), %reg
>>>                add foo@gottpoff(%rip), %reg
>>>       */
>>>
>>>      /* Check REX prefix first.  */
>>>      if (offset >= 3 && (offset + 4) <= sec->size)
>>>        {
>>>          val = bfd_get_8 (abfd, contents + offset - 3);
>>>          if (val != 0x48 && val != 0x4c)
>>>            {
>>>              /* X32 may have 0x44 REX prefix or no REX prefix.  */
>>>              if (ABI_64_P (abfd))
>>>                return FALSE;
>>>            }
>>>        }
>>>      else
>>>        {
>>>          /* X32 may not have any REX prefix.  */
>>>          if (ABI_64_P (abfd))
>>>            return FALSE;
>>>          if (offset < 2 || (offset + 3) > sec->size)
>>>            return FALSE;
>>>        }
>>>
>>> So, it should handle the case without REX just OK. If it doesn't, then
>>> this is a bug in binutils.
>>>
>>
>> The last byte of the displacement in the previous instruction
>> may happen to look like a REX byte. In that case, linker
>> will overwrite the last byte of the previous instruction and
>> generate the wrong instruction sequence.
>>
>> I need to update linker to enforce the REX byte check.
>
> One important observation: if we want to follow the x86_64 TLS spec
> strictly, we have to use existing DImode patterns only. This also
> means that we should NOT convert other TLS patterns to Pmode, since
> they explicitly state movq and addq. If this is not the case, then we
> need new TLS specification for X32.
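
To make the false positive discussed above concrete, here is a minimal
sketch in C.  It is not the binutils code: looks_like_rex_w and the
false_positive buffer are hypothetical names made up for illustration,
and only the byte values 0x48/0x4c and the offset - 3 probe come from
the linker excerpt quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not the binutils code: returns 1 when the byte
   three positions before the relocated displacement has one of the
   REX values (0x48 or 0x4c) that the x86-64 IE check accepts.  */
int
looks_like_rex_w (const uint8_t *contents, size_t offset)
{
  assert (offset >= 3);
  uint8_t val = contents[offset - 3];
  return val == 0x48 || val == 0x4c;
}

/* movl 0x4c(%rdi), %eax        -> 8b 47 4c
   addl x@gottpoff(%rip), %eax  -> 03 05 <disp32>
   The ADD has no REX prefix, so the byte at offset - 3 is the disp8
   (0x4c) of the preceding MOV, not part of the ADD at all.  */
const uint8_t false_positive[] = {
  0x8b, 0x47, 0x4c,
  0x03, 0x05, 0x00, 0x00, 0x00, 0x00
};

/* Offset of the disp32 that R_X86_64_GOTTPOFF would apply to.  */
const size_t false_positive_offset = 5;
```

Here looks_like_rex_w (false_positive, false_positive_offset) returns 1
even though the ADD carries no REX prefix, which is the case where a
rewrite would clobber the tail of the previous instruction.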

Here is a patch to properly generate X32 IE sequence.

This is the summary of differences between x86-64 TLS and x32 TLS:

                     x86-64                               x32
GD
    .byte 0x66; leaq foo@tlsgd(%rip),%rdi;        leaq foo@tlsgd(%rip),%rdi;
    .word 0x6666; rex64; call __tls_get_addr@plt  .word 0x6666; rex64; call __tls_get_addr@plt

GD->IE optimization
   movq %fs:0,%rax; addq x@gottpoff(%rip),%rax    movl %fs:0,%eax; addq x@gottpoff(%rip),%rax

GD->LE optimization
   movq %fs:0,%rax; leaq x@tpoff(%rax),%rax       movl %fs:0,%eax; leaq x@tpoff(%rax),%rax

LD
  leaq foo@tlsld(%rip),%rdi;                      leaq foo@tlsld(%rip),%rdi;
  call __tls_get_addr@plt                         call __tls_get_addr@plt

LD->LE optimization
  .word 0x6666; .byte 0x66; movq %fs:0, %rax      nopl 0x0(%rax); movl %fs:0, %eax

IE
   movq %fs:0,%reg64;                             movl %fs:0,%reg32;
   addq x@gottpoff(%rip),%reg64                   addl x@gottpoff(%rip),%reg32

   or
                                                  Not supported if Pmode == SImode
   movq x@gottpoff(%rip),%reg64;                  movq x@gottpoff(%rip),%reg64;
   movl %fs:(%reg64),%reg32                       movl %fs:(%reg64), %reg32

IE->LE optimization

   movq %fs:0,%reg64;                             movl %fs:0,%reg32;
   addq x@gottpoff(%rip),%reg64                   addl x@gottpoff(%rip),%reg32

   to

   movq %fs:0,%reg64;                             movl %fs:0,%reg32;
   addq foo@tpoff, %reg64                         addl foo@tpoff, %reg32

   movq %fs:0,%reg64;                             movl %fs:0,%reg32;
   leaq foo@tpoff(%reg64), %reg64                 leal foo@tpoff(%reg32), %reg32

   or

   movq x@gottpoff(%rip),%reg64                   movq x@gottpoff(%rip),%reg64;
   movl %fs:(%reg64),%reg32                       movl %fs:(%reg64), %reg32

   to

   movq foo@tpoff, %reg64                         movq foo@tpoff, %reg64
   movl %fs:(%reg64),%reg32                       movl %fs:(%reg64), %reg32

LE
   movq %fs:0,%reg64;                             movl %fs:0,%reg32;
   leaq x@tpoff(%reg64),%reg64                    leal x@tpoff(%reg32),%reg32

   or

   movq %fs:0,%reg64;                             movl %fs:0,%reg32;
   addq $x@tpoff,%reg64                           addl $x@tpoff,%reg32

   or

   movq %fs:0,%reg64;                             movl %fs:0,%reg32;
   movl x@tpoff(%reg64),%reg32                    movl x@tpoff(%reg32),%reg32

   or

   movl %fs:x@tpoff,%reg32                        movl %fs:x@tpoff,%reg32


X32 TLS implementation is straightforward, except for IE:

1. Since the address size override applies only to the (reg32) part of
fs:(reg32), we can't use it as a memory operand.  This patch changes
ix86_decompose_address to disallow fs:(reg) if Pmode != word_mode.
2. When Pmode == SImode, there may be no REX prefix for ADD.  We must
avoid any instruction between MOV and ADD, since it may interfere with
the linker's IE->LE optimization: the last byte of the instruction
before ADD may look like a REX prefix.  This patch adds
tls_initial_exec_x32 to make sure that we always have

movl %fs:0, %reg32
addl x@gottpoff(%rip), %reg32

so that the last byte of the instruction before ADD will
never be a REX byte.  Tested on Linux/x32.
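
The claim about the fixed MOV can be checked with a short sketch in C.
This is only an illustration under my reading of the encoding:
encode_movl_fs0 and is_rex_byte are hypothetical helpers, not part of
GCC or binutils.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Any REX prefix lies in the range 0x40..0x4f.  */
int
is_rex_byte (uint8_t b)
{
  return (b & 0xf0) == 0x40;
}

/* Hypothetical encoder for "movl %fs:0, %reg32" in 64-bit mode,
   with reg in 0..15.  Writes at most 9 bytes to OUT and returns the
   number of bytes written.  */
size_t
encode_movl_fs0 (unsigned reg, uint8_t *out)
{
  size_t n = 0;
  assert (reg < 16);
  out[n++] = 0x64;                      /* FS segment override.  */
  if (reg >= 8)
    out[n++] = 0x44;                    /* REX.R for %r8d..%r15d.  */
  out[n++] = 0x8b;                      /* MOV r32, r/m32.  */
  out[n++] = 0x04 | ((reg & 7) << 3);   /* ModRM: mod=00, rm=100 (SIB).  */
  out[n++] = 0x25;                      /* SIB: no base, no index.  */
  for (int i = 0; i < 4; i++)
    out[n++] = 0x00;                    /* disp32 = 0.  */
  return n;
}
```

Whatever the destination register, the instruction ends with the zero
displacement, so the byte right before the ADD is 0x00 and cannot be
taken for a REX prefix.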


-- 
H.J.
--
2012-03-09  H.J. Lu  <hongjiu.lu@intel.com>

	* config/i386/i386.c (ix86_decompose_address): Disallow fs:(reg)
	if Pmode != word_mode.
	(legitimize_tls_address): Call gen_tls_initial_exec_x32 if
	Pmode == SImode for x32.

	* config/i386/i386.md (UNSPEC_TLS_IE_X32): New.
	(tls_initial_exec_x32): Likewise.

[-- Attachment #2: gcc-x32-tls-1.patch --]
[-- Type: text/plain, Size: 2698 bytes --]

2012-03-09  H.J. Lu  <hongjiu.lu@intel.com>

	* config/i386/i386.c (ix86_decompose_address): Disallow fs:(reg)
	if Pmode != word_mode.
	(legitimize_tls_address): Call gen_tls_initial_exec_x32 if
	Pmode == SImode for x32.

	* config/i386/i386.md (UNSPEC_TLS_IE_X32): New.
	(tls_initial_exec_x32): Likewise.

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 15465c2..312b50c 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -11524,6 +11534,11 @@ ix86_decompose_address (rtx addr, struct ix86_address *out)
   else
     disp = addr;			/* displacement */
 
+  /* Since address override works only on the (reg32) part in fs:(reg32),
+     we can't use it as memory operand.  */
+  if (Pmode != word_mode && seg == SEG_FS && (base || index))
+    return 0;
+
   if (index)
     {
       if (REG_P (index))
@@ -12618,6 +12643,17 @@ legitimize_tls_address (rtx x, enum tls_model model, bool for_mov)
 	      emit_insn (gen_tls_initial_exec_64_sun (dest, x));
 	      return dest;
 	    }
+	  else if (Pmode == SImode)
+	    {
+	      /* Always generate
+			movl %fs:0, %reg32
+			addl x@gottpoff(%rip), %reg32
+		 to support linker IE->LE optimization and avoid
+		 fs:(%reg32) as memory operand.  */
+	      dest = gen_reg_rtx (Pmode);
+	      emit_insn (gen_tls_initial_exec_x32 (dest, x));
+	      return dest;
+	    }
 
 	  pic = NULL;
 	  type = UNSPEC_GOTNTPOFF;
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 188c982..d1fa997 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -95,6 +95,7 @@
   UNSPEC_TLS_LD_BASE
   UNSPEC_TLSDESC
   UNSPEC_TLS_IE_SUN
+  UNSPEC_TLS_IE_X32
 
   ;; Other random patterns
   UNSPEC_SCAS
@@ -12775,6 +12776,28 @@
 }
   [(set_attr "type" "multi")])
 
+;; When Pmode == SImode, there may be no REX prefix for ADD.  Avoid
+;; any instructions between MOV and ADD, which may interfere with linker
+;; IE->LE optimization, since the last byte of the previous instruction
+;; before ADD may look like a REX prefix.  This also avoids
+;;	movl x@gottpoff(%rip), %reg32
+;;	movl %fs:(%reg32), %reg32
+;; Since address override works only on the (reg32) part in fs:(reg32),
+;; we can't use it as memory operand.
+(define_insn "tls_initial_exec_x32"
+  [(set (match_operand:SI 0 "register_operand" "=r")
+	(unspec:SI
+	 [(match_operand:SI 1 "tls_symbolic_operand" "")]
+	 UNSPEC_TLS_IE_X32))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_X32"
+{
+  output_asm_insn
+    ("mov{l}\t{%%fs:0, %0|%0, DWORD PTR fs:0}", operands);
+  return "add{l}\t{%a1@gottpoff(%%rip), %0|%0, %a1@gottpoff[rip]}";
+}
+  [(set_attr "type" "multi")])
+
 ;; GNU2 TLS patterns can be split.
 
 (define_expand "tls_dynamic_gnu2_32"


* Re: PATCH: Properly generate X32 IE sequence
  2012-03-09 22:26 PATCH: Properly generate X32 IE sequence H.J. Lu
@ 2012-03-10 13:10 ` Uros Bizjak
  2012-03-10 18:50   ` H.J. Lu
  0 siblings, 1 reply; 43+ messages in thread
From: Uros Bizjak @ 2012-03-10 13:10 UTC (permalink / raw)
  To: H.J. Lu; +Cc: gcc-patches, Richard Henderson

On Fri, Mar 9, 2012 at 11:26 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> Here is a patch to properly generate X32 IE sequence.
>
> 2012-03-09  H.J. Lu  <hongjiu.lu@intel.com>
>
>        * config/i386/i386.c (ix86_decompose_address): Disallow fs:(reg)
>        if Pmode != word_mode.
>        (legitimize_tls_address): Call gen_tls_initial_exec_x32 if
>        Pmode == SImode for x32.
>
>        * config/i386/i386.md (UNSPEC_TLS_IE_X32): New.
>        (tls_initial_exec_x32): Likewise.

Nice solution!

OK for mainline.

BTW: Did you investigate the issue with memory aliasing?

Thanks,
Uros.


* Re: PATCH: Properly generate X32 IE sequence
  2012-03-10 13:10 ` Uros Bizjak
@ 2012-03-10 18:50   ` H.J. Lu
  2012-03-11 17:12     ` H.J. Lu
  0 siblings, 1 reply; 43+ messages in thread
From: H.J. Lu @ 2012-03-10 18:50 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: gcc-patches, Richard Henderson

On Sat, Mar 10, 2012 at 5:09 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
> On Fri, Mar 9, 2012 at 11:26 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> Here is a patch to properly generate X32 IE sequence.
>>
>> 2012-03-09  H.J. Lu  <hongjiu.lu@intel.com>
>>
>>        * config/i386/i386.c (ix86_decompose_address): Disallow fs:(reg)
>>        if Pmode != word_mode.
>>        (legitimize_tls_address): Call gen_tls_initial_exec_x32 if
>>        Pmode == SImode for x32.
>>
>>        * config/i386/i386.md (UNSPEC_TLS_IE_X32): New.
>>        (tls_initial_exec_x32): Likewise.
>
> Nice solution!
>
> OK for mainline.

Done.

> BTW: Did you investigate the issue with memory aliasing?
>

It isn't a problem since it is wrapped in UNSPEC_TLS_IE_X32,
which loads the address of the TLS symbol.

Thanks.

-- 
H.J.


* Re: PATCH: Properly generate X32 IE sequence
  2012-03-10 18:50   ` H.J. Lu
@ 2012-03-11 17:12     ` H.J. Lu
  2012-03-11 17:55       ` Uros Bizjak
  2012-03-17 18:10       ` Uros Bizjak
  0 siblings, 2 replies; 43+ messages in thread
From: H.J. Lu @ 2012-03-11 17:12 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: gcc-patches, Richard Henderson

[-- Attachment #1: Type: text/plain, Size: 7598 bytes --]

On Sat, Mar 10, 2012 at 10:49 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Sat, Mar 10, 2012 at 5:09 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
>> On Fri, Mar 9, 2012 at 11:26 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>> Here is a patch to properly generate X32 IE sequence.
>>>
>>> 2012-03-09  H.J. Lu  <hongjiu.lu@intel.com>
>>>
>>>        * config/i386/i386.c (ix86_decompose_address): Disallow fs:(reg)
>>>        if Pmode != word_mode.
>>>        (legitimize_tls_address): Call gen_tls_initial_exec_x32 if
>>>        Pmode == SImode for x32.
>>>
>>>        * config/i386/i386.md (UNSPEC_TLS_IE_X32): New.
>>>        (tls_initial_exec_x32): Likewise.
>>
>> Nice solution!
>>
>> OK for mainline.
>
> Done.
>
>> BTW: Did you investigate the issue with memory aliasing?
>>
>
> It isn't a problem since it is wrapped in UNSPEC_TLS_IE_X32
> which loads address of the TLS symbol.
>
> Thanks.
>

Since we must use reg64 in the %fs:(%reg) memory operand, as in

movq x@gottpoff(%rip),%reg64;
mov %fs:(%reg64),%reg

this patch optimizes x32 TLS IE loads and stores by wrapping
%reg64 inside an UNSPEC when Pmode == SImode.  OK for
trunk?
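
As a rough illustration of why reg64 is needed here, the sketch below
models the address arithmetic in C.  It rests on my understanding that
the TP offset is negative and that a 32-bit effective address is
zero-extended before the FS base is added; the function names are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Address formed when the offset lives in a 32-bit register: the
   offset is zero-extended before the FS base is added.  */
uint64_t
fs_address_reg32 (uint64_t fs_base, int32_t tp_offset)
{
  return fs_base + (uint64_t) (uint32_t) tp_offset;
}

/* Address formed when the offset was loaded into a 64-bit register
   as a sign-extended value, as the movq from the GOT slot does.  */
uint64_t
fs_address_reg64 (uint64_t fs_base, int32_t tp_offset)
{
  return fs_base + (uint64_t) (int64_t) tp_offset;
}
```

With an FS base of 0x10000 and a TP offset of -16, the reg64 form
gives 0xfff0, while the reg32 form gives 0x10000fff0, which is outside
the thread's TLS block.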

Thanks.

-- 
H.J.
---
2012-03-11  H.J. Lu  <hongjiu.lu@intel.com>

	* config/i386/i386.md (*tls_initial_exec_x32_load): New.
	(*tls_initial_exec_x32_store): Likewise.

[-- Attachment #2: gcc-x32-tls-2.patch --]
[-- Type: text/plain, Size: 1614 bytes --]

2012-03-11  H.J. Lu  <hongjiu.lu@intel.com>

	* config/i386/i386.md (*tls_initial_exec_x32_load): New.
	(*tls_initial_exec_x32_store): Likewise.

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index ae1dd1c..67441cd 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -12806,6 +12806,41 @@
 }
   [(set_attr "type" "multi")])
 
+(define_insn "*tls_initial_exec_x32_load"
+  [(set (match_operand:SWI1248x 0 "register_operand" "=r")
+        (mem:SWI1248x
+	  (unspec:SI
+	   [(match_operand:SI 1 "tls_symbolic_operand" "")]
+	   UNSPEC_TLS_IE_X32)))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_X32"
+{
+  output_asm_insn
+    ("mov{q}\t{%a1@gottpoff(%%rip), %q0|%q0, %a1@gottpoff[rip]}",
+     operands);
+  if (!TARGET_MOVX || <MODE>mode == DImode || <MODE>mode == SImode)
+    return "mov{<imodesuffix>}\t{%%fs:(%q0), %0|%0, <iptrsize> PTR fs:[%q0]}";
+  return "movz{<imodesuffix>l|x}\t{%%fs:(%q0), %k0|%k0, <iptrsize> PTR fs:[%q0]}";
+}
+  [(set_attr "type" "multi")])
+
+(define_insn "*tls_initial_exec_x32_store"
+  [(set (mem:SWI1248x
+	  (unspec:SI
+	   [(match_operand:SI 0 "tls_symbolic_operand" "")]
+	   UNSPEC_TLS_IE_X32))
+  	(match_operand:SWI1248x 1 "register_operand" "r"))
+   (clobber (match_scratch:DI 2 "=&r"))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_X32"
+{
+  output_asm_insn
+    ("mov{q}\t{%a0@gottpoff(%%rip), %q2|%q2, %a0@gottpoff[rip]}",
+     operands);
+  return "mov{<imodesuffix>}\t{%1, %%fs:(%q2)|<iptrsize> PTR fs:[%q2], %1}";
+}
+  [(set_attr "type" "multi")])
+
 ;; GNU2 TLS patterns can be split.
 
 (define_expand "tls_dynamic_gnu2_32"

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: PATCH: Properly generate X32 IE sequence
  2012-03-11 17:12     ` H.J. Lu
@ 2012-03-11 17:55       ` Uros Bizjak
  2012-03-11 18:16         ` H.J. Lu
  2012-03-17 18:10       ` Uros Bizjak
  1 sibling, 1 reply; 43+ messages in thread
From: Uros Bizjak @ 2012-03-11 17:55 UTC (permalink / raw)
  To: H.J. Lu; +Cc: gcc-patches, Richard Henderson

On Sun, Mar 11, 2012 at 6:11 PM, H.J. Lu <hjl.tools@gmail.com> wrote:

>>>>>>>> X86-64 linker optimizes TLS_MODEL_INITIAL_EXEC to TLS_MODEL_LOCAL_EXEC
>>>>>>>> by checking
>>>>>>>>
>>>>>>>>        movq foo@gottpoff(%rip), %reg
>>>>>>>>
>>>>>>>> and
>>>>>>>>
>>>>>>>>        addq foo@gottpoff(%rip), %reg
>>>>>>>>
>>>>>>>> It uses the REX prefix to avoid the last byte of the previous
>>>>>>>> instruction.  With 32bit Pmode, we may not have the REX prefix and
>>>>>>>> the last byte of the previous instruction may be an offset, which
>>>>>>>> may look like a REX prefix.  IE->LE optimization will generate corrupted
>>>>>>>> binary.  This patch makes sure we always output a REX prefix for
>>>>>>>> UNSPEC_GOTNTPOFF.  OK for trunk?
>>>>>>>
>>>>>>> Actually, linker has:
>>>>>>>
>>>>>>>    case R_X86_64_GOTTPOFF:
>>>>>>>      /* Check transition from IE access model:
>>>>>>>                mov foo@gottpoff(%rip), %reg
>>>>>>>                add foo@gottpoff(%rip), %reg
>>>>>>>       */
>>>>>>>
>>>>>>>      /* Check REX prefix first.  */
>>>>>>>      if (offset >= 3 && (offset + 4) <= sec->size)
>>>>>>>        {
>>>>>>>          val = bfd_get_8 (abfd, contents + offset - 3);
>>>>>>>          if (val != 0x48 && val != 0x4c)
>>>>>>>            {
>>>>>>>              /* X32 may have 0x44 REX prefix or no REX prefix.  */
>>>>>>>              if (ABI_64_P (abfd))
>>>>>>>                return FALSE;
>>>>>>>            }
>>>>>>>        }
>>>>>>>      else
>>>>>>>        {
>>>>>>>          /* X32 may not have any REX prefix.  */
>>>>>>>          if (ABI_64_P (abfd))
>>>>>>>            return FALSE;
>>>>>>>          if (offset < 2 || (offset + 3) > sec->size)
>>>>>>>            return FALSE;
>>>>>>>        }
>>>>>>>
>>>>>>> So, it should handle the case without REX just OK. If it doesn't, then
>>>>>>> this is a bug in binutils.
>>>>>>>
>>>>>>
>>>>>> The last byte of the displacement in the previous instruction
>>>>>> may happen to look like a REX byte. In that case, linker
>>>>>> will overwrite the last byte of the previous instruction and
>>>>>> generate the wrong instruction sequence.
>>>>>>
>>>>>> I need to update linker to enforce the REX byte check.
>>>>>
>>>>> One important observation: if we want to follow the x86_64 TLS spec
>>>>> strictly, we have to use existing DImode patterns only. This also
>>>>> means that we should NOT convert other TLS patterns to Pmode, since
>>>>> they explicitly state movq and addq. If this is not the case, then we
>>>>> need new TLS specification for X32.
>>>>
>>>> Here is a patch to properly generate X32 IE sequence.
>>>>
>>>> This is the summary of differences between x86-64 TLS and x32 TLS:
>>>>
>>>>                     x86-64                               x32
>>>> GD
>>>>    .byte 0x66; leaq foo@tlsgd(%rip),%rdi;        leaq foo@tlsgd(%rip),%rdi;
>>>>    .word 0x6666; rex64; call __tls_get_addr@plt  .word 0x6666; rex64; call __tls_get_addr@plt
>>>>
>>>> GD->IE optimization
>>>>   movq %fs:0,%rax; addq x@gottpoff(%rip),%rax    movl %fs:0,%eax; addq x@gottpoff(%rip),%rax
>>>>
>>>> GD->LE optimization
>>>>   movq %fs:0,%rax; leaq x@tpoff(%rax),%rax       movl %fs:0,%eax; leaq x@tpoff(%rax),%rax
>>>>
>>>> LD
>>>>  leaq foo@tlsld(%rip),%rdi;                      leaq foo@tlsld(%rip),%rdi;
>>>>  call __tls_get_addr@plt                         call __tls_get_addr@plt
>>>>
>>>> LD->LE optimization
>>>>  .word 0x6666; .byte 0x66; movq %fs:0, %rax      nopl 0x0(%rax); movl %fs:0, %eax
>>>>
>>>> IE
>>>>   movq %fs:0,%reg64;                             movl %fs:0,%reg32;
>>>>   addq x@gottpoff(%rip),%reg64                   addl x@gottpoff(%rip),%reg32
>>>>
>>>>   or
>>>>                                                  Not supported if Pmode == SImode
>>>>   movq x@gottpoff(%rip),%reg64;                  movq x@gottpoff(%rip),%reg64;
>>>>   movl %fs:(%reg64),%reg32                       movl %fs:(%reg64), %reg32
>>>>
>>>> IE->LE optimization
>>>>
>>>>   movq %fs:0,%reg64;                             movl %fs:0,%reg32;
>>>>   addq x@gottpoff(%rip),%reg64                   addl x@gottpoff(%rip),%reg32
>>>>
>>>>   to
>>>>
>>>>   movq %fs:0,%reg64;                             movl %fs:0,%reg32;
>>>>   addq foo@tpoff, %reg64                         addl foo@tpoff, %reg32
>>>>
>>>>   movq %fs:0,%reg64;                             movl %fs:0,%reg32;
>>>>   leaq foo@tpoff(%reg64), %reg64                 leal foo@tpoff(%reg32), %reg32
>>>>
>>>>   or
>>>>
>>>>   movq x@gottpoff(%rip),%reg64                   movq x@gottpoff(%rip),%reg64;
>>>>   movl %fs:(%reg64),%reg32                       movl %fs:(%reg64), %reg32
>>>>
>>>>   to
>>>>
>>>>   movq foo@tpoff, %reg64                         movq foo@tpoff, %reg64
>>>>   movl %fs:(%reg64),%reg32                       movl %fs:(%reg64), %reg32
>>>>
>>>> LE
>>>>   movq %fs:0,%reg64;                             movl %fs:0,%reg32;
>>>>   leaq x@tpoff(%reg64),%reg32                    leal x@tpoff(%reg32),%reg32
>>>>
>>>>   or
>>>>
>>>>   movq %fs:0,%reg64;                             movl %fs:0,%reg32;
>>>>   addq $x@tpoff,%reg64                           addl $x@tpoff,%reg32
>>>>
>>>>   or
>>>>
>>>>   movq %fs:0,%reg64;                             movl %fs:0,%reg32;
>>>>   movl x@tpoff(%reg64),%reg32                    movl x@tpoff(%reg32),%reg32
>>>>
>>>>   or
>>>>
>>>>   movl %fs:x@tpoff,%reg32                        movl %fs:x@tpoff,%reg32
>>>>
>>>>
>>>> X32 TLS implementation is straightforward, except for IE:
>>>>
>>>> 1. Since address override works only on the (reg32) part in fs:(reg32),
>>>> we can't use it as a memory operand.  This patch changes ix86_decompose_address
>>>> to disallow fs:(reg) if Pmode != word_mode.
>>>> 2. When Pmode == SImode, there may be no REX prefix for ADD.  Avoid
>>>> any instructions between MOV and ADD, which may interfere with the linker
>>>> IE->LE optimization, since the last byte of the previous instruction
>>>> before ADD may look like a REX prefix.  This patch adds tls_initial_exec_x32
>>>> to make sure that we always have
>>>>
>>>> movl %fs:0, %reg32
>>>> addl x@gottpoff(%rip), %reg32
>>>>
>>>> so that the last byte of the previous instruction before ADD will
>>>> never be a REX byte.  Tested on Linux/x32.
>>>>
>>>> 2012-03-09  H.J. Lu  <hongjiu.lu@intel.com>
>>>>
>>>>        * config/i386/i386.c (ix86_decompose_address): Disallow fs:(reg)
>>>>        if Pmode != word_mode.
>>>>        (legitimize_tls_address): Call gen_tls_initial_exec_x32 if
>>>>        Pmode == SImode for x32.
>>>>
>>>>        * config/i386/i386.md (UNSPEC_TLS_IE_X32): New.
>>>>        (tls_initial_exec_x32): Likewise.
>>>
>>> Nice solution!
>>>
>>> OK for mainline.
>>
>> Done.
>>
>>> BTW: Did you investigate the issue with memory aliasing?
>>>
>>
>> It isn't a problem since it is wrapped in UNSPEC_TLS_IE_X32
>> which loads address of the TLS symbol.
>>
>> Thanks.
>>
>
> Since we must use reg64 in %fs:(%reg) memory operand like
>
> movq x@gottpoff(%rip),%reg64;
> mov %fs:(%reg64),%reg
>
> this patch optimizes x32 TLS IE load and store by wrapping
> %reg64 inside of UNSPEC when Pmode == SImode.  OK for
> trunk?

I think we should just scrap all these complications and go with the
idea of clearing MASK_TLS_DIRECT_SEG_REFS.

Uros.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: PATCH: Properly generate X32 IE sequence
  2012-03-11 17:55       ` Uros Bizjak
@ 2012-03-11 18:16         ` H.J. Lu
  2012-03-11 18:21           ` Uros Bizjak
  0 siblings, 1 reply; 43+ messages in thread
From: H.J. Lu @ 2012-03-11 18:16 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: gcc-patches, Richard Henderson

On Sun, Mar 11, 2012 at 10:55 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
> On Sun, Mar 11, 2012 at 6:11 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>
>>>>>>>>> X86-64 linker optimizes TLS_MODEL_INITIAL_EXEC to TLS_MODEL_LOCAL_EXEC
>>>>>>>>> by checking
>>>>>>>>>
>>>>>>>>>        movq foo@gottpoff(%rip), %reg
>>>>>>>>>
>>>>>>>>> and
>>>>>>>>>
>>>>>>>>>        addq foo@gottpoff(%rip), %reg
>>>>>>>>>
>>>>>>>>> It uses the REX prefix to avoid the last byte of the previous
>>>>>>>>> instruction.  With 32bit Pmode, we may not have the REX prefix and
>>>>>>>>> the last byte of the previous instruction may be an offset, which
>>>>>>>>> may look like a REX prefix.  IE->LE optimization will generate corrupted
>>>>>>>>> binary.  This patch makes sure we always output a REX prefix for
>>>>>>>>> UNSPEC_GOTNTPOFF.  OK for trunk?
>>>>>>>>
>>>>>>>> Actually, linker has:
>>>>>>>>
>>>>>>>>    case R_X86_64_GOTTPOFF:
>>>>>>>>      /* Check transition from IE access model:
>>>>>>>>                mov foo@gottpoff(%rip), %reg
>>>>>>>>                add foo@gottpoff(%rip), %reg
>>>>>>>>       */
>>>>>>>>
>>>>>>>>      /* Check REX prefix first.  */
>>>>>>>>      if (offset >= 3 && (offset + 4) <= sec->size)
>>>>>>>>        {
>>>>>>>>          val = bfd_get_8 (abfd, contents + offset - 3);
>>>>>>>>          if (val != 0x48 && val != 0x4c)
>>>>>>>>            {
>>>>>>>>              /* X32 may have 0x44 REX prefix or no REX prefix.  */
>>>>>>>>              if (ABI_64_P (abfd))
>>>>>>>>                return FALSE;
>>>>>>>>            }
>>>>>>>>        }
>>>>>>>>      else
>>>>>>>>        {
>>>>>>>>          /* X32 may not have any REX prefix.  */
>>>>>>>>          if (ABI_64_P (abfd))
>>>>>>>>            return FALSE;
>>>>>>>>          if (offset < 2 || (offset + 3) > sec->size)
>>>>>>>>            return FALSE;
>>>>>>>>        }
>>>>>>>>
>>>>>>>> So, it should handle the case without REX just OK. If it doesn't, then
>>>>>>>> this is a bug in binutils.
>>>>>>>>
>>>>>>>
>>>>>>> The last byte of the displacement in the previous instruction
>>>>>>> may happen to look like a REX byte. In that case, linker
>>>>>>> will overwrite the last byte of the previous instruction and
>>>>>>> generate the wrong instruction sequence.
>>>>>>>
>>>>>>> I need to update linker to enforce the REX byte check.
>>>>>>
>>>>>> One important observation: if we want to follow the x86_64 TLS spec
>>>>>> strictly, we have to use existing DImode patterns only. This also
>>>>>> means that we should NOT convert other TLS patterns to Pmode, since
>>>>>> they explicitly state movq and addq. If this is not the case, then we
>>>>>> need new TLS specification for X32.
>>>>>
>>>>> Here is a patch to properly generate X32 IE sequence.
>>>>>
>>>>> This is the summary of differences between x86-64 TLS and x32 TLS:
>>>>>
>>>>>                     x86-64                               x32
>>>>> GD
>>>>>    .byte 0x66; leaq foo@tlsgd(%rip),%rdi;        leaq foo@tlsgd(%rip),%rdi;
>>>>>    .word 0x6666; rex64; call __tls_get_addr@plt  .word 0x6666; rex64; call __tls_get_addr@plt
>>>>>
>>>>> GD->IE optimization
>>>>>   movq %fs:0,%rax; addq x@gottpoff(%rip),%rax    movl %fs:0,%eax; addq x@gottpoff(%rip),%rax
>>>>>
>>>>> GD->LE optimization
>>>>>   movq %fs:0,%rax; leaq x@tpoff(%rax),%rax       movl %fs:0,%eax; leaq x@tpoff(%rax),%rax
>>>>>
>>>>> LD
>>>>>  leaq foo@tlsld(%rip),%rdi;                      leaq foo@tlsld(%rip),%rdi;
>>>>>  call __tls_get_addr@plt                         call __tls_get_addr@plt
>>>>>
>>>>> LD->LE optimization
>>>>>  .word 0x6666; .byte 0x66; movq %fs:0, %rax      nopl 0x0(%rax); movl %fs:0, %eax
>>>>>
>>>>> IE
>>>>>   movq %fs:0,%reg64;                             movl %fs:0,%reg32;
>>>>>   addq x@gottpoff(%rip),%reg64                   addl x@gottpoff(%rip),%reg32
>>>>>
>>>>>   or
>>>>>                                                  Not supported if Pmode == SImode
>>>>>   movq x@gottpoff(%rip),%reg64;                  movq x@gottpoff(%rip),%reg64;
>>>>>   movl %fs:(%reg64),%reg32                       movl %fs:(%reg64), %reg32
>>>>>
>>>>> IE->LE optimization
>>>>>
>>>>>   movq %fs:0,%reg64;                             movl %fs:0,%reg32;
>>>>>   addq x@gottpoff(%rip),%reg64                   addl x@gottpoff(%rip),%reg32
>>>>>
>>>>>   to
>>>>>
>>>>>   movq %fs:0,%reg64;                             movl %fs:0,%reg32;
>>>>>   addq foo@tpoff, %reg64                         addl foo@tpoff, %reg32
>>>>>
>>>>>   movq %fs:0,%reg64;                             movl %fs:0,%reg32;
>>>>>   leaq foo@tpoff(%reg64), %reg64                 leal foo@tpoff(%reg32), %reg32
>>>>>
>>>>>   or
>>>>>
>>>>>   movq x@gottpoff(%rip),%reg64                   movq x@gottpoff(%rip),%reg64;
>>>>>   movl %fs:(%reg64),%reg32                       movl %fs:(%reg64), %reg32
>>>>>
>>>>>   to
>>>>>
>>>>>   movq foo@tpoff, %reg64                         movq foo@tpoff, %reg64
>>>>>   movl %fs:(%reg64),%reg32                       movl %fs:(%reg64), %reg32
>>>>>
>>>>> LE
>>>>>   movq %fs:0,%reg64;                             movl %fs:0,%reg32;
>>>>>   leaq x@tpoff(%reg64),%reg32                    leal x@tpoff(%reg32),%reg32
>>>>>
>>>>>   or
>>>>>
>>>>>   movq %fs:0,%reg64;                             movl %fs:0,%reg32;
>>>>>   addq $x@tpoff,%reg64                           addl $x@tpoff,%reg32
>>>>>
>>>>>   or
>>>>>
>>>>>   movq %fs:0,%reg64;                             movl %fs:0,%reg32;
>>>>>   movl x@tpoff(%reg64),%reg32                    movl x@tpoff(%reg32),%reg32
>>>>>
>>>>>   or
>>>>>
>>>>>   movl %fs:x@tpoff,%reg32                        movl %fs:x@tpoff,%reg32
>>>>>
>>>>>
>>>>> X32 TLS implementation is straightforward, except for IE:
>>>>>
>>>>> 1. Since address override works only on the (reg32) part in fs:(reg32),
>>>>> we can't use it as a memory operand.  This patch changes ix86_decompose_address
>>>>> to disallow fs:(reg) if Pmode != word_mode.
>>>>> 2. When Pmode == SImode, there may be no REX prefix for ADD.  Avoid
>>>>> any instructions between MOV and ADD, which may interfere with the linker
>>>>> IE->LE optimization, since the last byte of the previous instruction
>>>>> before ADD may look like a REX prefix.  This patch adds tls_initial_exec_x32
>>>>> to make sure that we always have
>>>>>
>>>>> movl %fs:0, %reg32
>>>>> addl x@gottpoff(%rip), %reg32
>>>>>
>>>>> so that the last byte of the previous instruction before ADD will
>>>>> never be a REX byte.  Tested on Linux/x32.
>>>>>
>>>>> 2012-03-09  H.J. Lu  <hongjiu.lu@intel.com>
>>>>>
>>>>>        * config/i386/i386.c (ix86_decompose_address): Disallow fs:(reg)
>>>>>        if Pmode != word_mode.
>>>>>        (legitimize_tls_address): Call gen_tls_initial_exec_x32 if
>>>>>        Pmode == SImode for x32.
>>>>>
>>>>>        * config/i386/i386.md (UNSPEC_TLS_IE_X32): New.
>>>>>        (tls_initial_exec_x32): Likewise.
>>>>
>>>> Nice solution!
>>>>
>>>> OK for mainline.
>>>
>>> Done.
>>>
>>>> BTW: Did you investigate the issue with memory aliasing?
>>>>
>>>
>>> It isn't a problem since it is wrapped in UNSPEC_TLS_IE_X32
>>> which loads address of the TLS symbol.
>>>
>>> Thanks.
>>>
>>
>> Since we must use reg64 in %fs:(%reg) memory operand like
>>
>> movq x@gottpoff(%rip),%reg64;
>> mov %fs:(%reg64),%reg
>>
>> this patch optimizes x32 TLS IE load and store by wrapping
>> %reg64 inside of UNSPEC when Pmode == SImode.  OK for
>> trunk?
>
> I think we should just scrap all these complications and go with the
> idea of clearing MASK_TLS_DIRECT_SEG_REFS.
>

I will give it a try.


-- 
H.J.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: PATCH: Properly generate X32 IE sequence
  2012-03-11 18:16         ` H.J. Lu
@ 2012-03-11 18:21           ` Uros Bizjak
  2012-03-11 21:25             ` H.J. Lu
  0 siblings, 1 reply; 43+ messages in thread
From: Uros Bizjak @ 2012-03-11 18:21 UTC (permalink / raw)
  To: H.J. Lu; +Cc: gcc-patches, Richard Henderson

On Sun, Mar 11, 2012 at 7:16 PM, H.J. Lu <hjl.tools@gmail.com> wrote:

>>>>>>        * config/i386/i386.c (ix86_decompose_address): Disallow fs:(reg)
>>>>>>        if Pmode != word_mode.
>>>>>>        (legitimize_tls_address): Call gen_tls_initial_exec_x32 if
>>>>>>        Pmode == SImode for x32.
>>>>>>
>>>>>>        * config/i386/i386.md (UNSPEC_TLS_IE_X32): New.
>>>>>>        (tls_initial_exec_x32): Likewise.
>>>>>
>>>>> Nice solution!
>>>>>
>>>>> OK for mainline.
>>>>
>>>> Done.
>>>>
>>>>> BTW: Did you investigate the issue with memory aliasing?
>>>>>
>>>>
>>>> It isn't a problem since it is wrapped in UNSPEC_TLS_IE_X32
>>>> which loads address of the TLS symbol.
>>>>
>>>> Thanks.
>>>>
>>>
>>> Since we must use reg64 in %fs:(%reg) memory operand like
>>>
>>> movq x@gottpoff(%rip),%reg64;
>>> mov %fs:(%reg64),%reg
>>>
>>> this patch optimizes x32 TLS IE load and store by wrapping
>>> %reg64 inside of UNSPEC when Pmode == SImode.  OK for
>>> trunk?
>>
>> I think we should just scrap all these complications and go with the
>> idea of clearing MASK_TLS_DIRECT_SEG_REFS.
>>
>
> I will give it a try.

You can also revert:

>>>>>>        * config/i386/i386.c (ix86_decompose_address): Disallow fs:(reg)
>>>>>>        if Pmode != word_mode.

then, since this part is handled later in the function.

Uros.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: PATCH: Properly generate X32 IE sequence
  2012-03-11 18:21           ` Uros Bizjak
@ 2012-03-11 21:25             ` H.J. Lu
  2012-03-12 19:39               ` Uros Bizjak
  0 siblings, 1 reply; 43+ messages in thread
From: H.J. Lu @ 2012-03-11 21:25 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: gcc-patches, Richard Henderson

On Sun, Mar 11, 2012 at 11:21 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
> On Sun, Mar 11, 2012 at 7:16 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>
>>>>>>>        * config/i386/i386.c (ix86_decompose_address): Disallow fs:(reg)
>>>>>>>        if Pmode != word_mode.
>>>>>>>        (legitimize_tls_address): Call gen_tls_initial_exec_x32 if
>>>>>>>        Pmode == SImode for x32.
>>>>>>>
>>>>>>>        * config/i386/i386.md (UNSPEC_TLS_IE_X32): New.
>>>>>>>        (tls_initial_exec_x32): Likewise.
>>>>>>
>>>>>> Nice solution!
>>>>>>
>>>>>> OK for mainline.
>>>>>
>>>>> Done.
>>>>>
>>>>>> BTW: Did you investigate the issue with memory aliasing?
>>>>>>
>>>>>
>>>>> It isn't a problem since it is wrapped in UNSPEC_TLS_IE_X32
>>>>> which loads address of the TLS symbol.
>>>>>
>>>>> Thanks.
>>>>>
>>>>
>>>> Since we must use reg64 in %fs:(%reg) memory operand like
>>>>
>>>> movq x@gottpoff(%rip),%reg64;
>>>> mov %fs:(%reg64),%reg
>>>>
>>>> this patch optimizes x32 TLS IE load and store by wrapping
>>>> %reg64 inside of UNSPEC when Pmode == SImode.  OK for
>>>> trunk?
>>>
>>> I think we should just scrap all these complications and go with the
>>> idea of clearing MASK_TLS_DIRECT_SEG_REFS.
>>>
>>
>> I will give it a try.
>
> You can also revert:
>
>>>>>>>        * config/i386/i386.c (ix86_decompose_address): Disallow fs:(reg)
>>>>>>>        if Pmode != word_mode.
>
> then, since this part is handled later in the function.
>

Here is the patch which is equivalent to clearing MASK_TLS_DIRECT_SEG_REFS
when Pmode != word_mode.  We need to keep

          else if (Pmode == SImode)
            {
              /* Always generate
                        movl %fs:0, %reg32
                        addl x@gottpoff(%rip), %reg32
                 to support linker IE->LE optimization and avoid
                 fs:(%reg32) as memory operand.  */
              dest = gen_reg_rtx (Pmode);
              emit_insn (gen_tls_initial_exec_x32 (dest, x));
              return dest;
            }

to support linker IE->LE optimization.  TARGET_TLS_DIRECT_SEG_REFS only affects
TLS LE access and fs:(%reg) is only generated by combine.

So the main impact of disabling TARGET_TLS_DIRECT_SEG_REFS is to disable
fs:immediate memory operand for TLS LE access, which doesn't have any problems
to begin with.

I would prefer to keep TARGET_TLS_DIRECT_SEG_REFS and disable only
fs:(%reg), which is generated by combine.
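
As a rough sketch of the condition the patch below passes to
get_thread_pointer, the decision can be modelled in C as follows.  The
struct and function names are hypothetical stand-ins for GCC's internal
state (Pmode, word_mode, TARGET_TLS_DIRECT_SEG_REFS), not real GCC
interfaces.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the relevant target state.  */
struct tls_cfg
{
  bool pmode_is_word_mode;	/* Models Pmode == word_mode.  */
  bool direct_seg_refs;		/* Models TARGET_TLS_DIRECT_SEG_REFS.  */
};

/* Return true if the thread pointer should be loaded into a register
   rather than left as a segment-relative reference.  */
static bool
tp_to_reg (const struct tls_cfg *cfg, bool for_mov)
{
  return for_mov || !cfg->pmode_is_word_mode || !cfg->direct_seg_refs;
}
```

Under this model, x32 with Pmode == SImode always loads the thread pointer
into a register, which is the behavior the patch is meant to produce.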

-- 
H.J.
--
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index b101922..1ffcc85 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -11478,6 +11478,7 @@ ix86_decompose_address (rtx addr, struct ix86_address *out)

 	    case UNSPEC:
 	      if (XINT (op, 1) == UNSPEC_TP
+		  && Pmode == word_mode
 	          && TARGET_TLS_DIRECT_SEG_REFS
 	          && seg == SEG_DEFAULT)
 		seg = TARGET_64BIT ? SEG_FS : SEG_GS;
@@ -11534,11 +11535,6 @@ ix86_decompose_address (rtx addr, struct ix86_address *out)
   else
     disp = addr;			/* displacement */

-  /* Since address override works only on the (reg32) part in fs:(reg32),
-     we can't use it as memory operand.  */
-  if (Pmode != word_mode && seg == SEG_FS && (base || index))
-    return 0;
-
   if (index)
     {
       if (REG_P (index))
@@ -12706,7 +12702,9 @@ legitimize_tls_address (rtx x, enum tls_model model, bool for_mov)

       if (TARGET_64BIT || TARGET_ANY_GNU_TLS)
 	{
-	  base = get_thread_pointer (for_mov || !TARGET_TLS_DIRECT_SEG_REFS);
+	  base = get_thread_pointer (for_mov
+				     || Pmode != word_mode
+				     || !TARGET_TLS_DIRECT_SEG_REFS);
 	  return gen_rtx_PLUS (Pmode, base, off);
 	}
       else
@@ -13239,7 +13237,7 @@ ix86_delegitimize_tls_address (rtx orig_x)
   rtx x = orig_x, unspec;
   struct ix86_address addr;

-  if (!TARGET_TLS_DIRECT_SEG_REFS)
+  if (Pmode != word_mode || !TARGET_TLS_DIRECT_SEG_REFS)
     return orig_x;
   if (MEM_P (x))
     x = XEXP (x, 0);

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: PATCH: Properly generate X32 IE sequence
  2012-03-11 21:25             ` H.J. Lu
@ 2012-03-12 19:39               ` Uros Bizjak
  2012-03-12 22:35                 ` H.J. Lu
  0 siblings, 1 reply; 43+ messages in thread
From: Uros Bizjak @ 2012-03-12 19:39 UTC (permalink / raw)
  To: H.J. Lu; +Cc: gcc-patches, Richard Henderson

[-- Attachment #1: Type: text/plain, Size: 1230 bytes --]

On Sun, Mar 11, 2012 at 10:24 PM, H.J. Lu <hjl.tools@gmail.com> wrote:

> Here is the patch which is equivalent to clearing MASK_TLS_DIRECT_SEG_REFS
> when Pmode != word_mode.  We need to keep
>
>          else if (Pmode == SImode)
>            {
>              /* Always generate
>                        movl %fs:0, %reg32
>                        addl x@gottpoff(%rip), %reg32
>                 to support linker IE->LE optimization and avoid
>                 fs:(%reg32) as memory operand.  */
>              dest = gen_reg_rtx (Pmode);
>              emit_insn (gen_tls_initial_exec_x32 (dest, x));
>              return dest;
>            }
>
> to support linker IE->LE optimization.  TARGET_TLS_DIRECT_SEG_REFS only affects
> TLS LE access and fs:(%reg) is only generated by combine.
>
> So the main impact of disabling TARGET_TLS_DIRECT_SEG_REFS is to disable
> fs:immediate memory operand for TLS LE access, which doesn't have any problems
> to begin with.
>
> I would prefer to keep TARGET_TLS_DIRECT_SEG_REFS and disable only
> fs:(%reg), which is generated by combine.

Please try attached patch.  It introduces TARGET_TLS_INDIRECT_SEG_REFS
to block only indirect seg references.
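
The check added to ix86_decompose_address in the attached patch can be
modelled roughly as below.  This is a simplified sketch: the struct is a
hypothetical reduction of struct ix86_address, and the boolean parameter
stands in for TARGET_TLS_INDIRECT_SEG_REFS (Pmode == word_mode).

```c
#include <stdbool.h>

enum seg_kind { SEG_DEFAULT, SEG_FS, SEG_GS };

/* Hypothetical reduction of a decomposed address.  */
struct simple_addr
{
  enum seg_kind seg;
  bool has_base;
  bool has_index;
};

/* Return false for segment-relative addresses that use a base or
   index register when indirect segment references are not allowed.  */
static bool
seg_addr_ok (const struct simple_addr *a, bool indirect_seg_refs)
{
  if (a->seg != SEG_DEFAULT
      && (a->has_base || a->has_index)
      && !indirect_seg_refs)
    return false;
  return true;
}
```

In this model %fs:x@tpoff (no base, no index) stays valid for x32, while
%fs:(%reg) is rejected.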

Uros.

[-- Attachment #2: p.diff.txt --]
[-- Type: text/plain, Size: 2325 bytes --]

Index: i386.c
===================================================================
--- i386.c	(revision 185250)
+++ i386.c	(working copy)
@@ -11552,11 +11552,6 @@ ix86_decompose_address (rtx addr, struct ix86_addr
   else
     disp = addr;			/* displacement */
 
-  /* Since address override works only on the (reg32) part in fs:(reg32),
-     we can't use it as memory operand.  */
-  if (Pmode != word_mode && seg == SEG_FS && (base || index))
-    return 0;
-
   if (index)
     {
       if (REG_P (index))
@@ -11568,6 +11563,10 @@ ix86_decompose_address (rtx addr, struct ix86_addr
 	return 0;
     }
 
+  if (seg != SEG_DEFAULT && (base || index)
+      && !TARGET_TLS_INDIRECT_SEG_REFS)
+    return 0;
+
   /* Extract the integral value of scale.  */
   if (scale_rtx)
     {
@@ -12696,7 +12695,9 @@ legitimize_tls_address (rtx x, enum tls_model mode
 
       if (TARGET_64BIT || TARGET_ANY_GNU_TLS)
 	{
-          base = get_thread_pointer (for_mov || !TARGET_TLS_DIRECT_SEG_REFS);
+          base = get_thread_pointer (for_mov
+				     || !(TARGET_TLS_DIRECT_SEG_REFS
+					  && TARGET_TLS_INDIRECT_SEG_REFS));
 	  off = force_reg (Pmode, off);
 	  return gen_rtx_PLUS (Pmode, base, off);
 	}
@@ -12716,7 +12717,9 @@ legitimize_tls_address (rtx x, enum tls_model mode
 
       if (TARGET_64BIT || TARGET_ANY_GNU_TLS)
 	{
-	  base = get_thread_pointer (for_mov || !TARGET_TLS_DIRECT_SEG_REFS);
+	  base = get_thread_pointer (for_mov
+				     || !(TARGET_TLS_DIRECT_SEG_REFS
+					  && TARGET_TLS_INDIRECT_SEG_REFS));
 	  return gen_rtx_PLUS (Pmode, base, off);
 	}
       else
@@ -13249,7 +13252,8 @@ ix86_delegitimize_tls_address (rtx orig_x)
   rtx x = orig_x, unspec;
   struct ix86_address addr;
 
-  if (!TARGET_TLS_DIRECT_SEG_REFS)
+  if (!(TARGET_TLS_DIRECT_SEG_REFS
+	&& TARGET_TLS_INDIRECT_SEG_REFS))
     return orig_x;
   if (MEM_P (x))
     x = XEXP (x, 0);
Index: i386.h
===================================================================
--- i386.h	(revision 185250)
+++ i386.h	(working copy)
@@ -467,6 +467,9 @@ extern int x86_prefetch_sse;
 #define TARGET_TLS_DIRECT_SEG_REFS_DEFAULT 0
 #endif
 
+/* Address override works only on the (%reg) part in %fs:(%reg).  */
+#define TARGET_TLS_INDIRECT_SEG_REFS (Pmode == word_mode)
+
 /* Fence to use after loop using storent.  */
 
 extern tree x86_mfence;

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: PATCH: Properly generate X32 IE sequence
  2012-03-12 19:39               ` Uros Bizjak
@ 2012-03-12 22:35                 ` H.J. Lu
  2012-03-13  1:21                   ` H.J. Lu
  0 siblings, 1 reply; 43+ messages in thread
From: H.J. Lu @ 2012-03-12 22:35 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: gcc-patches, Richard Henderson

On Mon, Mar 12, 2012 at 12:39 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
> On Sun, Mar 11, 2012 at 10:24 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>
>> Here is the patch which is equivalent to clearing MASK_TLS_DIRECT_SEG_REFS
>> when Pmode != word_mode.  We need to keep
>>
>>          else if (Pmode == SImode)
>>            {
>>              /* Always generate
>>                        movl %fs:0, %reg32
>>                        addl x@gottpoff(%rip), %reg32
>>                 to support linker IE->LE optimization and avoid
>>                 fs:(%reg32) as memory operand.  */
>>              dest = gen_reg_rtx (Pmode);
>>              emit_insn (gen_tls_initial_exec_x32 (dest, x));
>>              return dest;
>>            }
>>
>> to support linker IE->LE optimization.  TARGET_TLS_DIRECT_SEG_REFS only affects
>> TLS LE access and fs:(%reg) is only generated by combine.
>>
>> So the main impact of disabling TARGET_TLS_DIRECT_SEG_REFS is to disable
>> fs:immediate memory operand for TLS LE access, which doesn't have any problems
>> to begin with.
>>
>> I would prefer to keep TARGET_TLS_DIRECT_SEG_REFS and disable only
>> fs:(%reg), which is generated by combine.
>
> Please try attached patch.  It introduces TARGET_TLS_INDIRECT_SEG_REFS
> to block only indirect seg references.
>
> Uros.

I am testing it.

-- 
H.J.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: PATCH: Properly generate X32 IE sequence
  2012-03-12 22:35                 ` H.J. Lu
@ 2012-03-13  1:21                   ` H.J. Lu
  2012-03-13  7:11                     ` Uros Bizjak
  0 siblings, 1 reply; 43+ messages in thread
From: H.J. Lu @ 2012-03-13  1:21 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: gcc-patches, Richard Henderson

On Mon, Mar 12, 2012 at 3:35 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Mon, Mar 12, 2012 at 12:39 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
>> On Sun, Mar 11, 2012 at 10:24 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>
>>> Here is the patch which is equivalent to clearing MASK_TLS_DIRECT_SEG_REFS
>>> when Pmode != word_mode.  We need to keep
>>>
>>>          else if (Pmode == SImode)
>>>            {
>>>              /* Always generate
>>>                        movl %fs:0, %reg32
>>>                        addl x@gottpoff(%rip), %reg32
>>>                 to support linker IE->LE optimization and avoid
>>>                 fs:(%reg32) as memory operand.  */
>>>              dest = gen_reg_rtx (Pmode);
>>>              emit_insn (gen_tls_initial_exec_x32 (dest, x));
>>>              return dest;
>>>            }
>>>
>>> to support linker IE->LE optimization.  TARGET_TLS_DIRECT_SEG_REFS only affects
>>> TLS LE access and fs:(%reg) is only generated by combine.
>>>
>>> So the main impact of disabling TARGET_TLS_DIRECT_SEG_REFS is to disable
>>> fs:immediate memory operand for TLS LE access, which doesn't have any problems
>>> to begin with.
>>>
>>> I would prefer to keep TARGET_TLS_DIRECT_SEG_REFS and disable only
>>> fs:(%reg), which is generated by combine.
>>
>> Please try attached patch.  It introduces TARGET_TLS_INDIRECT_SEG_REFS
>> to block only indirect seg references.
>>
>> Uros.
>
> I am testing it.
>

There is no regression.

BTW, this x32 TLS IE optimization:

http://gcc.gnu.org/ml/gcc-patches/2012-03/msg00714.html

is still useful.  For

[hjl@gnu-6 tls]$ cat ie2.i
extern __thread long long int x;

extern long long int y;

void
ie2 (void)
{
  x = y;
}
[hjl@gnu-6 tls]$

my patch turns

ie2:
.LFB0:
	.cfi_startproc
	movq	y(%rip), %rdx	# 6	*movdi_internal_rex64/2	[length = 7]
	movl	%fs:0, %eax	# 5	tls_initial_exec_x32	[length = 16]
	addl	x@gottpoff(%rip), %eax
	movq	%rdx, (%eax)	# 7	*movdi_internal_rex64/4	[length = 3]
	ret	# 14	simple_return_internal	[length = 1]
	.cfi_endproc

into

ie2:
.LFB0:
	.cfi_startproc
	movq	y(%rip), %rax	# 6	*movdi_internal_rex64/2	[length = 7]
	movq	x@gottpoff(%rip), %rdx	# 7	*tls_initial_exec_x32_store	[length = 16]
	movq	%rax, %fs:(%rdx)
	ret	# 14	simple_return_internal	[length = 1]
	.cfi_endproc
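
For anyone who wants to reproduce a comparison like this, a
self-contained variant of the test case might look like the sketch
below.  It is an assumption-laden illustration, not the original test:
the variables are defined locally (ie2.i declares them extern) so the
file links on its own, the initial-exec model is requested explicitly
with the tls_model attribute, and the function names are hypothetical.

```c
/* Defined here so the example links by itself; forced to the
   initial-exec TLS model to approximate the extern case above.  */
__thread long long int x __attribute__ ((tls_model ("initial-exec")));
long long int y;

/* Store to the TLS variable, as in ie2 above.  */
void
ie2_store (void)
{
  x = y;
}

/* Load counterpart, which would exercise the load pattern.  */
long long int
ie2_load (void)
{
  return x;
}
```

Compiling with something like "gcc -O2 -mx32 -S" should show which
sequence is chosen; the exact output depends on the compiler version.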



-- 
H.J.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: PATCH: Properly generate X32 IE sequence
  2012-03-13  1:21                   ` H.J. Lu
@ 2012-03-13  7:11                     ` Uros Bizjak
  2012-03-13 10:37                       ` Uros Bizjak
  0 siblings, 1 reply; 43+ messages in thread
From: Uros Bizjak @ 2012-03-13  7:11 UTC (permalink / raw)
  To: H.J. Lu; +Cc: gcc-patches, Richard Henderson

On Tue, Mar 13, 2012 at 2:20 AM, H.J. Lu <hjl.tools@gmail.com> wrote:

>>>> Here is the patch which is equivalent to clearing MASK_TLS_DIRECT_SEG_REFS
>>>> when Pmode != word_mode.  We need to keep
>>>>
>>>>          else if (Pmode == SImode)
>>>>            {
>>>>              /* Always generate
>>>>                        movl %fs:0, %reg32
>>>>                        addl x@gottpoff(%rip), %reg32
>>>>                 to support linker IE->LE optimization and avoid
>>>>                 fs:(%reg32) as memory operand.  */
>>>>              dest = gen_reg_rtx (Pmode);
>>>>              emit_insn (gen_tls_initial_exec_x32 (dest, x));
>>>>              return dest;
>>>>            }
>>>>
>>>> to support linker IE->LE optimization.  TARGET_TLS_DIRECT_SEG_REFS only affects
>>>> TLS LE access and fs:(%reg) is only generated by combine.
>>>>
>>>> So the main impact of disabling TARGET_TLS_DIRECT_SEG_REFS is to disable
>>>> fs:immediate memory operand for TLS LE access, which doesn't have any problems
>>>> to begin with.
>>>>
>>>> I would prefer to keep TARGET_TLS_DIRECT_SEG_REFS and disable only
>>>> fs:(%reg), which is generated by combine.
>>>
>>> Please try attached patch.  It introduces TARGET_TLS_INDIRECT_SEG_REFS
>>> to block only indirect seg references.
>
> There is no regression.

Thanks, committed to mainline SVN with following ChangeLog:

2012-03-13  Uros Bizjak  <ubizjak@gmail.com>

	* config/i386/i386.h (TARGET_TLS_INDIRECT_SEG_REFS): New.
	* config/i386/i386.c (ix86_decompose_address): Use
	TARGET_TLS_INDIRECT_SEG_REFS to prevent %fs:(%reg) addresses.
	(legitimize_tls_address): Use TARGET_TLS_INDIRECT_SEG_REFS to load
	thread pointer to a register.

Tested on x86_64-pc-linux-gnu {,-m32}.

> BTW, this x32 TLS IE optimization:

 >    movq    %rax, %fs:(%rdx)

This is just looking for trouble. If we said these addresses are
invalid, then we shouldn't generate them.

Uros.

* Re: PATCH: Properly generate X32 IE sequence
  2012-03-13  7:11                     ` Uros Bizjak
@ 2012-03-13 10:37                       ` Uros Bizjak
  2012-03-13 15:47                         ` H.J. Lu
  2012-03-17 17:53                         ` H.J. Lu
  0 siblings, 2 replies; 43+ messages in thread
From: Uros Bizjak @ 2012-03-13 10:37 UTC (permalink / raw)
  To: H.J. Lu; +Cc: gcc-patches, Richard Henderson

On Tue, Mar 13, 2012 at 8:11 AM, Uros Bizjak <ubizjak@gmail.com> wrote:

>>>> Please try attached patch.  It introduces TARGET_TLS_INDIRECT_SEG_REFS
>>>> to block only indirect seg references.
>>
>> There is no regression.
>
> Thanks, committed to mainline SVN with following ChangeLog:
>
> 2012-03-13  Uros Bizjak  <ubizjak@gmail.com>
>
>        * config/i386/i386.h (TARGET_TLS_INDIRECT_SEG_REFS): New.
>        * config/i386/i386.c (ix86_decompose_address): Use
>        TARGET_TLS_INDIRECT_SEG_REFS to prevent %fs:(%reg) addresses.
>        (legitimize_tls_address): Use TARGET_TLS_INDIRECT_SEG_REFS to load
>        thread pointer to a register.
>
> Tested on x86_64-pc-linux-gnu {,-m32}.
>
>> BTW, this x32 TLS IE optimization:
>
>  >    movq    %rax, %fs:(%rdx)
>
> This is just looking for trouble. If we said these addresses are
> invalid, then we shouldn't generate them.

OTOH, we can improve the rejection test a bit to reject only registers
that are not in word mode.

2012-03-13  Uros Bizjak  <ubizjak@gmail.com>

	* config/i386/i386.c (ix86_decompose_address): Prevent %fs:(%reg)
	addresses only when %reg is not in word mode.

Tested on x86_64-pc-linux-gnu {,-m32}, committed.

Uros.

Index: i386.c
===================================================================
--- i386.c      (revision 185278)
+++ i386.c      (working copy)
@@ -11563,8 +11563,10 @@
        return 0;
     }

-  if (seg != SEG_DEFAULT && (base || index)
-      && !TARGET_TLS_INDIRECT_SEG_REFS)
+/* Address override works only on the (%reg) part of %fs:(%reg).  */
+  if (seg != SEG_DEFAULT
+      && ((base && GET_MODE (base) != word_mode)
+         || (index && GET_MODE (index) != word_mode)))
     return 0;

   /* Extract the integral value of scale.  */
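
As a rough illustration of what the new test guards against, here is a standalone C sketch. It is not GCC code: the enum and struct types are hypothetical stand-ins for machine modes and RTL registers. It also assumes the usual 64-bit-mode rule that an address-size override zero-extends the 32-bit effective address before the segment base is added.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for GCC's machine modes and registers.  */
enum mode { MODE_SI, MODE_DI };
enum seg { SEG_DEFAULT, SEG_FS, SEG_GS };
struct reg { enum mode mode; };

static const struct reg si_reg = { MODE_SI };
static const struct reg di_reg = { MODE_DI };

/* Same shape as the new check: with a segment override, reject any
   base or index register that is not in word mode.  */
static bool
seg_address_ok (enum seg seg, const struct reg *base,
                const struct reg *index, enum mode word_mode)
{
  if (seg != SEG_DEFAULT
      && ((base && base->mode != word_mode)
          || (index && index->mode != word_mode)))
    return false;
  return true;
}

/* %fs:(%reg32): the register is zero-extended, then the base is added.  */
static uint64_t
linear_addr32 (uint64_t fs_base, uint32_t reg32)
{
  return fs_base + (uint64_t) reg32;
}

/* %fs:(%reg64): the full register is added modulo 2^64.  */
static uint64_t
linear_addr64 (uint64_t fs_base, uint64_t reg64)
{
  return fs_base + reg64;
}
```

With an FS base of 0x10000000 and a TP offset of -16, linear_addr64 gives 0x0ffffff0, while linear_addr32 on the truncated offset gives 0x10ffffff0, which is 4 GiB too high. That mismatch is the reason a non-word-mode register cannot be used under a segment override.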

* Re: PATCH: Properly generate X32 IE sequence
  2012-03-13 10:37                       ` Uros Bizjak
@ 2012-03-13 15:47                         ` H.J. Lu
  2012-03-17 17:53                         ` H.J. Lu
  1 sibling, 0 replies; 43+ messages in thread
From: H.J. Lu @ 2012-03-13 15:47 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: gcc-patches, Richard Henderson

On Tue, Mar 13, 2012 at 3:37 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
> On Tue, Mar 13, 2012 at 8:11 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
>
>>>>> Please try attached patch.  It introduces TARGET_TLS_INDIRECT_SEG_REFS
>>>>> to block only indirect seg references.
>>>
>>> There is no regression.
>>
>> Thanks, committed to mainline SVN with following ChangeLog:
>>
>> 2012-03-13  Uros Bizjak  <ubizjak@gmail.com>
>>
>>        * config/i386/i386.h (TARGET_TLS_INDIRECT_SEG_REFS): New.
>>        * config/i386/i386.c (ix86_decompose_address): Use
>>        TARGET_TLS_INDIRECT_SEG_REFS to prevent %fs:(%reg) addresses.
>>        (legitimize_tls_address): Use TARGET_TLS_INDIRECT_SEG_REFS to load
>>        thread pointer to a register.
>>
>> Tested on x86_64-pc-linux-gnu {,-m32}.
>>
>>> BTW, this x32 TLS IE optimization:
>>
>>  >    movq    %rax, %fs:(%rdx)
>>
>> This is just looking for trouble. If we said these addresses are
>> invalid, then we shouldn't generate them.
>
> OTOH, we can improve the rejection test a bit to reject only registers
> that are not in word mode.
>
> 2012-03-13  Uros Bizjak  <ubizjak@gmail.com>
>
>        * config/i386/i386.c (ix86_decompose_address): Prevent %fs:(%reg)
>        addresses only when %reg is not in word mode.
>
> Tested on x86_64-pc-linux-gnu {,-m32}, committed.
>
> Uros.
>
> Index: i386.c
> ===================================================================
> --- i386.c      (revision 185278)
> +++ i386.c      (working copy)
> @@ -11563,8 +11563,10 @@
>        return 0;
>     }
>
> -  if (seg != SEG_DEFAULT && (base || index)
> -      && !TARGET_TLS_INDIRECT_SEG_REFS)
> +/* Address override works only on the (%reg) part of %fs:(%reg).  */
> +  if (seg != SEG_DEFAULT
> +      && ((base && GET_MODE (base) != word_mode)
> +         || (index && GET_MODE (index) != word_mode)))
>     return 0;
>
>   /* Extract the integral value of scale.  */

This works.

-- 
H.J.

* Re: PATCH: Properly generate X32 IE sequence
  2012-03-13 10:37                       ` Uros Bizjak
  2012-03-13 15:47                         ` H.J. Lu
@ 2012-03-17 17:53                         ` H.J. Lu
  1 sibling, 0 replies; 43+ messages in thread
From: H.J. Lu @ 2012-03-17 17:53 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: gcc-patches, Richard Henderson

On Tue, Mar 13, 2012 at 3:37 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
> On Tue, Mar 13, 2012 at 8:11 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
>
>>>>> Please try attached patch.  It introduces TARGET_TLS_INDIRECT_SEG_REFS
>>>>> to block only indirect seg references.
>>>
>>> There is no regression.
>>
>> Thanks, committed to mainline SVN with following ChangeLog:
>>
>> 2012-03-13  Uros Bizjak  <ubizjak@gmail.com>
>>
>>        * config/i386/i386.h (TARGET_TLS_INDIRECT_SEG_REFS): New.
>>        * config/i386/i386.c (ix86_decompose_address): Use
>>        TARGET_TLS_INDIRECT_SEG_REFS to prevent %fs:(%reg) addresses.
>>        (legitimize_tls_address): Use TARGET_TLS_INDIRECT_SEG_REFS to load
>>        thread pointer to a register.
>>
>> Tested on x86_64-pc-linux-gnu {,-m32}.
>>
>>> BTW, this x32 TLS IE optimization:
>>
>>  >    movq    %rax, %fs:(%rdx)
>>
>> This is just looking for trouble. If we said these addresses are
>> invalid, then we shouldn't generate them.
>
> OTOH, we can improve the rejection test a bit to reject only registers
> that are not in word mode.
>
> 2012-03-13  Uros Bizjak  <ubizjak@gmail.com>
>
>        * config/i386/i386.c (ix86_decompose_address): Prevent %fs:(%reg)
>        addresses only when %reg is not in word mode.
>
> Tested on x86_64-pc-linux-gnu {,-m32}, committed.
>
> Uros.
>
> Index: i386.c
> ===================================================================
> --- i386.c      (revision 185278)
> +++ i386.c      (working copy)
> @@ -11563,8 +11563,10 @@
>        return 0;
>     }
>
> -  if (seg != SEG_DEFAULT && (base || index)
> -      && !TARGET_TLS_INDIRECT_SEG_REFS)
> +/* Address override works only on the (%reg) part of %fs:(%reg).  */
> +  if (seg != SEG_DEFAULT
> +      && ((base && GET_MODE (base) != word_mode)
> +         || (index && GET_MODE (index) != word_mode)))
>     return 0;
>
>   /* Extract the integral value of scale.  */

Is my x32 TLS IE optimization:

http://gcc.gnu.org/ml/gcc-patches/2012-03/msg00714.html

OK for trunk?

Thanks.

-- 
H.J.

* Re: PATCH: Properly generate X32 IE sequence
  2012-03-11 17:12     ` H.J. Lu
  2012-03-11 17:55       ` Uros Bizjak
@ 2012-03-17 18:10       ` Uros Bizjak
  2012-03-17 18:19         ` H.J. Lu
  1 sibling, 1 reply; 43+ messages in thread
From: Uros Bizjak @ 2012-03-17 18:10 UTC (permalink / raw)
  To: H.J. Lu; +Cc: gcc-patches, Richard Henderson

On Sun, Mar 11, 2012 at 6:11 PM, H.J. Lu <hjl.tools@gmail.com> wrote:

> Since we must use reg64 in %fs:(%reg) memory operand like
>
> movq x@gottpoff(%rip),%reg64;
> mov %fs:(%reg64),%reg
>
> this patch optimizes x32 TLS IE load and store by wrapping
> %reg64 inside of UNSPEC when Pmode == SImode.  OK for
> trunk?
>
> Thanks.
>
> --
> H.J.
> ---
> 2012-03-11  H.J. Lu  <hongjiu.lu@intel.com>
>
>        * config/i386/i386.md (*tls_initial_exec_x32_load): New.
>        (*tls_initial_exec_x32_store): Likewise.

Can you implement this with define_insn_and_split, as is done for
*tls_dynamic_gnu2_combine_32?

Uros.

* Re: PATCH: Properly generate X32 IE sequence
  2012-03-17 18:10       ` Uros Bizjak
@ 2012-03-17 18:19         ` H.J. Lu
  2012-03-17 18:21           ` Uros Bizjak
  0 siblings, 1 reply; 43+ messages in thread
From: H.J. Lu @ 2012-03-17 18:19 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: gcc-patches, Richard Henderson

On Sat, Mar 17, 2012 at 11:10 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
> On Sun, Mar 11, 2012 at 6:11 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>
>> Since we must use reg64 in %fs:(%reg) memory operand like
>>
>> movq x@gottpoff(%rip),%reg64;
>> mov %fs:(%reg64),%reg
>>
>> this patch optimizes x32 TLS IE load and store by wrapping
>> %reg64 inside of UNSPEC when Pmode == SImode.  OK for
>> trunk?
>>
>> Thanks.
>>
>> --
>> H.J.
>> ---
>> 2012-03-11  H.J. Lu  <hongjiu.lu@intel.com>
>>
>>        * config/i386/i386.md (*tls_initial_exec_x32_load): New.
>>        (*tls_initial_exec_x32_store): Likewise.
>
> Can you implement this with define_insn_and_split, as is done for
> *tls_dynamic_gnu2_combine_32?
>

I will give it a try again.  Last time when I tried it, GCC didn't
like memory operand in DImode when Pmode == SImode.


-- 
H.J.

* Re: PATCH: Properly generate X32 IE sequence
  2012-03-17 18:19         ` H.J. Lu
@ 2012-03-17 18:21           ` Uros Bizjak
  2012-03-17 21:50             ` H.J. Lu
  0 siblings, 1 reply; 43+ messages in thread
From: Uros Bizjak @ 2012-03-17 18:21 UTC (permalink / raw)
  To: H.J. Lu; +Cc: gcc-patches, Richard Henderson

On Sat, Mar 17, 2012 at 7:18 PM, H.J. Lu <hjl.tools@gmail.com> wrote:

>>> Since we must use reg64 in %fs:(%reg) memory operand like
>>>
>>> movq x@gottpoff(%rip),%reg64;
>>> mov %fs:(%reg64),%reg
>>>
>>> this patch optimizes x32 TLS IE load and store by wrapping
>>> %reg64 inside of UNSPEC when Pmode == SImode.  OK for
>>> trunk?
>>>
>>> Thanks.
>>>
>>> --
>>> H.J.
>>> ---
>>> 2012-03-11  H.J. Lu  <hongjiu.lu@intel.com>
>>>
>>>        * config/i386/i386.md (*tls_initial_exec_x32_load): New.
>>>        (*tls_initial_exec_x32_store): Likewise.
>>
>> Can you implement this with define_insn_and_split, as is done for
>> *tls_dynamic_gnu2_combine_32?
>>
>
> I will give it a try again.  Last time when I tried it, GCC didn't
> like memory operand in DImode when Pmode == SImode.

You should remove mode for tls_symbolic_operand predicate.

Uros.

* Re: PATCH: Properly generate X32 IE sequence
  2012-03-17 18:21           ` Uros Bizjak
@ 2012-03-17 21:50             ` H.J. Lu
  2012-03-18 16:02               ` Uros Bizjak
  0 siblings, 1 reply; 43+ messages in thread
From: H.J. Lu @ 2012-03-17 21:50 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: gcc-patches, Richard Henderson

[-- Attachment #1: Type: text/plain, Size: 1065 bytes --]

On Sat, Mar 17, 2012 at 11:20 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
> On Sat, Mar 17, 2012 at 7:18 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>
>>>> Since we must use reg64 in %fs:(%reg) memory operand like
>>>>
>>>> movq x@gottpoff(%rip),%reg64;
>>>> mov %fs:(%reg64),%reg
>>>>
>>>> this patch optimizes x32 TLS IE load and store by wrapping
>>>> %reg64 inside of UNSPEC when Pmode == SImode.  OK for
>>>> trunk?
>>>>
>>>> Thanks.
>>>>
>>>> --
>>>> H.J.
>>>> ---
>>>> 2012-03-11  H.J. Lu  <hongjiu.lu@intel.com>
>>>>
>>>>        * config/i386/i386.md (*tls_initial_exec_x32_load): New.
>>>>        (*tls_initial_exec_x32_store): Likewise.
>>>
>>> Can you implement this with define_insn_and_split, as is done for
>>> *tls_dynamic_gnu2_combine_32?
>>>
>>
>> I will give it a try again.  Last time when I tried it, GCC didn't
>> like memory operand in DImode when Pmode == SImode.
>
> You should remove mode for tls_symbolic_operand predicate.
>

I am testing this patch.  OK for trunk if it passes all tests?

Thanks.

-- 
H.J.

[-- Attachment #2: gcc-x32-tls-3.patch --]
[-- Type: text/plain, Size: 2938 bytes --]

2012-03-17  H.J. Lu  <hongjiu.lu@intel.com>

	* config/i386/i386-protos.h (ix86_split_tls_initial_exec_x32): New.
	* config/i386/i386.c (ix86_split_tls_initial_exec_x32): Likewise.

	* config/i386/i386.md (*tls_initial_exec_x32_load): New.
	(*tls_initial_exec_x32_store): Likewise.

diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index 630112f..2c4f1ed 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -213,6 +213,7 @@ extern unsigned int ix86_get_callcvt (const_tree);
 #endif
 
 extern rtx ix86_tls_module_base (void);
+extern void ix86_split_tls_initial_exec_x32 (rtx [], enum machine_mode, bool);
 
 extern void ix86_expand_vector_init (bool, rtx, rtx);
 extern void ix86_expand_vector_set (bool, rtx, rtx, int);
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 78a366e..5a9c673 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -12754,6 +12754,28 @@ legitimize_tls_address (rtx x, enum tls_model model, bool for_mov)
   return dest;
 }
 
+/* Split x32 TLS IE access in MODE.  Split load if LOAD is TRUE,
+   otherwise split store.  */
+
+void
+ix86_split_tls_initial_exec_x32 (rtx operands[],
+				 enum machine_mode mode, bool load)
+{
+  rtx base, mem;
+  rtx off = load ? operands[1] : operands[0];
+  off = gen_rtx_UNSPEC (DImode, gen_rtvec (1, off), UNSPEC_GOTNTPOFF);
+  off = gen_rtx_CONST (DImode, off);
+  off = gen_const_mem (DImode, off);
+  set_mem_alias_set (off, ix86_GOT_alias_set ());
+  base = gen_rtx_UNSPEC (DImode, gen_rtvec (1, const0_rtx), UNSPEC_TP);
+  off = gen_rtx_PLUS (DImode, base, force_reg (DImode, off));
+  mem = gen_rtx_MEM (mode, off);
+  if (load)
+    emit_move_insn (operands[0], mem);
+  else
+    emit_move_insn (mem, operands[1]);
+}
+
 /* Create or return the unique __imp_DECL dllimport symbol corresponding
    to symbol DECL.  */
 
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index eae26ae..78faeec 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -12858,6 +12858,32 @@
 }
   [(set_attr "type" "multi")])
 
+(define_insn_and_split "*tls_initial_exec_x32_load"
+  [(set (match_operand:SWI1248x 0 "register_operand" "=r")
+        (mem:SWI1248x
+	  (unspec:SI
+	   [(match_operand 1 "tls_symbolic_operand" "")]
+	   UNSPEC_TLS_IE_X32)))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_X32"
+  "#"
+  ""
+  [(const_int 0)]
+  "ix86_split_tls_initial_exec_x32 (operands, <MODE>mode, TRUE); DONE;")
+
+(define_insn_and_split "*tls_initial_exec_x32_store"
+  [(set (mem:SWI1248x
+	  (unspec:SI
+	   [(match_operand 0 "tls_symbolic_operand" "")]
+	   UNSPEC_TLS_IE_X32))
+  	(match_operand:SWI1248x 1 "register_operand" "r"))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_X32"
+  "#"
+  ""
+  [(const_int 0)]
+  "ix86_split_tls_initial_exec_x32 (operands, <MODE>mode, FALSE); DONE;")
+
 ;; GNU2 TLS patterns can be split.
 
 (define_expand "tls_dynamic_gnu2_32"

* Re: PATCH: Properly generate X32 IE sequence
  2012-03-17 21:50             ` H.J. Lu
@ 2012-03-18 16:02               ` Uros Bizjak
  2012-03-18 20:55                 ` Uros Bizjak
  0 siblings, 1 reply; 43+ messages in thread
From: Uros Bizjak @ 2012-03-18 16:02 UTC (permalink / raw)
  To: H.J. Lu; +Cc: gcc-patches, Richard Henderson

On Sat, Mar 17, 2012 at 10:50 PM, H.J. Lu <hjl.tools@gmail.com> wrote:

>>>>> Since we must use reg64 in %fs:(%reg) memory operand like
>>>>>
>>>>> movq x@gottpoff(%rip),%reg64;
>>>>> mov %fs:(%reg64),%reg
>>>>>
>>>>> this patch optimizes x32 TLS IE load and store by wrapping
>>>>> %reg64 inside of UNSPEC when Pmode == SImode.  OK for
>>>>> trunk?
>>>>
>>>> Can you implement this with define_insn_and_split, as is done for
>>>> *tls_dynamic_gnu2_combine_32?
>>>>
>>>
>>> I will give it a try again.  Last time when I tried it, GCC didn't
>>> like memory operand in DImode when Pmode == SImode.
>>
>> You should remove mode for tls_symbolic_operand predicate.
>>
>
> I am testing this patch.  OK for trunk if it passes all tests?

No, force_reg will generate a pseudo, so this conversion is valid only
while can_create_pseudo_p () is true.

At least for *tls_initial_exec_x32_store, you will need a temporary to
split the pattern after reload.

Uros.

* Re: PATCH: Properly generate X32 IE sequence
  2012-03-18 16:02               ` Uros Bizjak
@ 2012-03-18 20:55                 ` Uros Bizjak
  2012-03-19 15:51                   ` H.J. Lu
  2012-03-20  8:52                   ` Eric Botcazou
  0 siblings, 2 replies; 43+ messages in thread
From: Uros Bizjak @ 2012-03-18 20:55 UTC (permalink / raw)
  To: H.J. Lu; +Cc: gcc-patches, Richard Henderson

[-- Attachment #1: Type: text/plain, Size: 1081 bytes --]

On Sun, Mar 18, 2012 at 5:01 PM, Uros Bizjak <ubizjak@gmail.com> wrote:

>> I am testing this patch.  OK for trunk if it passes all tests?
>
> No, force_reg will generate a pseudo, so this conversion is valid only
> while can_create_pseudo_p () is true.
>
> At least for *tls_initial_exec_x32_store, you will need a temporary to
> split the pattern after reload.

Please try the attached patch. It simply throws away all the recent
complications w.r.t. the thread pointer and always handles TP in
DImode.

The testcase:

--cut here--
__thread int foo __attribute__ ((tls_model ("initial-exec")));

void bar (int x)
{
  foo = x;
}

int baz (void)
{
  return foo;
}
--cut here--

Now compiles to:

bar:
        movq    foo@gottpoff(%rip), %rax
        movl    %edi, %fs:(%rax)
        ret

baz:
        movq    foo@gottpoff(%rip), %rax
        movl    %fs:(%rax), %eax
        ret

In effect, this always generates %fs:(%reg) with a DImode register and
emits a REX prefix before mov/add to satisfy brain-dead linkers.
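
For context on why the REX prefix matters, here is a simplified C sketch of the byte test the linker applies, modelled on the R_X86_64_GOTTPOFF check quoted earlier in the thread. It is not the actual BFD code, and the function name and byte arrays are made up for illustration. The real check handles x32 and the no-REX case separately.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Look at the byte three positions before the relocated 32-bit
   displacement and accept it only if it is 0x48 or 0x4c.  */
static bool
looks_like_rex_w (const unsigned char *contents, size_t size, size_t offset)
{
  if (offset < 3 || offset + 4 > size)
    return false;
  unsigned char val = contents[offset - 3];
  return val == 0x48 || val == 0x4c;
}

/* movq foo@gottpoff(%rip), %rax: REX.W, opcode 0x8b, ModRM 0x05.  */
static const unsigned char with_rex[] =
  { 0x48, 0x8b, 0x05, 0, 0, 0, 0 };

/* nop; movl foo@gottpoff(%rip), %eax: no REX prefix at all.  */
static const unsigned char no_rex[] =
  { 0x90, 0x8b, 0x05, 0, 0, 0, 0 };

/* movl $0x48000000, %ecx; movl foo@gottpoff(%rip), %eax.  The last
   immediate byte of the first instruction is 0x48, so a check that
   only inspects this one byte mistakes it for a REX prefix.  */
static const unsigned char ambiguous[] =
  { 0xb9, 0x00, 0x00, 0x00, 0x48, 0x8b, 0x05, 0, 0, 0, 0 };
```

Calling looks_like_rex_w on with_rex at offset 3 succeeds and on no_rex at offset 3 fails, as expected. On ambiguous at offset 7 it also succeeds, even though the instruction carries no REX prefix. Always using a 64-bit movq or addq avoids that case.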

The patch is bootstrapping now on x86_64-pc-linux-gnu.

Uros.

[-- Attachment #2: p.diff.txt --]
[-- Type: text/plain, Size: 6435 bytes --]

Index: i386.md
===================================================================
--- i386.md	(revision 185505)
+++ i386.md	(working copy)
@@ -12836,28 +12836,6 @@
 }
   [(set_attr "type" "multi")])
 
-;; When Pmode == SImode, there may be no REX prefix for ADD.  Avoid
-;; any instructions between MOV and ADD, which may interfere linker
-;; IE->LE optimization, since the last byte of the previous instruction
-;; before ADD may look like a REX prefix.  This also avoids
-;;	movl x@gottpoff(%rip), %reg32
-;;	movl $fs:(%reg32), %reg32
-;; Since address override works only on the (reg32) part in fs:(reg32),
-;; we can't use it as memory operand.
-(define_insn "tls_initial_exec_x32"
-  [(set (match_operand:SI 0 "register_operand" "=r")
-	(unspec:SI
-	 [(match_operand 1 "tls_symbolic_operand")]
-	 UNSPEC_TLS_IE_X32))
-   (clobber (reg:CC FLAGS_REG))]
-  "TARGET_X32"
-{
-  output_asm_insn
-    ("mov{l}\t{%%fs:0, %0|%0, DWORD PTR fs:0}", operands);
-  return "add{l}\t{%a1@gottpoff(%%rip), %0|%0, %a1@gottpoff[rip]}";
-}
-  [(set_attr "type" "multi")])
-
 ;; GNU2 TLS patterns can be split.
 
 (define_expand "tls_dynamic_gnu2_32"
Index: i386.c
===================================================================
--- i386.c	(revision 185504)
+++ i386.c	(working copy)
@@ -11509,6 +11509,10 @@ ix86_decompose_address (rtx addr, struct ix86_addr
 	      scale = 1 << scale;
 	      break;
 
+	    case ZERO_EXTEND:
+	      op = XEXP (op, 0);
+	      /* FALLTHRU */
+
 	    case UNSPEC:
 	      if (XINT (op, 1) == UNSPEC_TP
 	          && TARGET_TLS_DIRECT_SEG_REFS
@@ -12478,15 +12482,15 @@ legitimize_pic_address (rtx orig, rtx reg)
 /* Load the thread pointer.  If TO_REG is true, force it into a register.  */
 
 static rtx
-get_thread_pointer (bool to_reg)
+get_thread_pointer (enum machine_mode tp_mode, bool to_reg)
 {
   rtx tp = gen_rtx_UNSPEC (ptr_mode, gen_rtvec (1, const0_rtx), UNSPEC_TP);
 
-  if (GET_MODE (tp) != Pmode)
-    tp = convert_to_mode (Pmode, tp, 1);
+  if (GET_MODE (tp) != tp_mode)
+    tp = convert_to_mode (tp_mode, tp, 1);
 
   if (to_reg)
-    tp = copy_addr_to_reg (tp);
+    tp = copy_to_mode_reg (tp_mode, tp);
 
   return tp;
 }
@@ -12538,6 +12542,7 @@ legitimize_tls_address (rtx x, enum tls_model mode
 {
   rtx dest, base, off;
   rtx pic = NULL_RTX, tp = NULL_RTX;
+  enum machine_mode tp_mode = Pmode;
   int type;
 
   switch (model)
@@ -12563,7 +12568,7 @@ legitimize_tls_address (rtx x, enum tls_model mode
 	  else
 	    emit_insn (gen_tls_dynamic_gnu2_32 (dest, x, pic));
 
-	  tp = get_thread_pointer (true);
+	  tp = get_thread_pointer (Pmode, true);
 	  dest = force_reg (Pmode, gen_rtx_PLUS (Pmode, tp, dest));
 
 	  set_unique_reg_note (get_last_insn (), REG_EQUAL, x);
@@ -12613,7 +12618,7 @@ legitimize_tls_address (rtx x, enum tls_model mode
 	  else
 	    emit_insn (gen_tls_dynamic_gnu2_32 (base, tmp, pic));
 
-	  tp = get_thread_pointer (true);
+	  tp = get_thread_pointer (Pmode, true);
 	  set_unique_reg_note (get_last_insn (), REG_EQUAL,
 			       gen_rtx_MINUS (Pmode, tmp, tp));
 	}
@@ -12659,27 +12664,18 @@ legitimize_tls_address (rtx x, enum tls_model mode
     case TLS_MODEL_INITIAL_EXEC:
       if (TARGET_64BIT)
 	{
+	  tp_mode = DImode;
+
 	  if (TARGET_SUN_TLS)
 	    {
 	      /* The Sun linker took the AMD64 TLS spec literally
 		 and can only handle %rax as destination of the
 		 initial executable code sequence.  */
 
-	      dest = gen_reg_rtx (Pmode);
+	      dest = gen_reg_rtx (tp_mode);
 	      emit_insn (gen_tls_initial_exec_64_sun (dest, x));
 	      return dest;
 	    }
-	  else if (Pmode == SImode)
-	    {
-	      /* Always generate
-			movl %fs:0, %reg32
-			addl xgottpoff(%rip), %reg32
-		 to support linker IE->LE optimization and avoid
-		 fs:(%reg32) as memory operand.  */
-	      dest = gen_reg_rtx (Pmode);
-	      emit_insn (gen_tls_initial_exec_x32 (dest, x));
-	      return dest;
-	    }
 
 	  pic = NULL;
 	  type = UNSPEC_GOTNTPOFF;
@@ -12703,24 +12699,23 @@ legitimize_tls_address (rtx x, enum tls_model mode
 	  type = UNSPEC_INDNTPOFF;
 	}
 
-      off = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, x), type);
-      off = gen_rtx_CONST (Pmode, off);
+      off = gen_rtx_UNSPEC (tp_mode, gen_rtvec (1, x), type);
+      off = gen_rtx_CONST (tp_mode, off);
       if (pic)
-	off = gen_rtx_PLUS (Pmode, pic, off);
-      off = gen_const_mem (Pmode, off);
+	off = gen_rtx_PLUS (tp_mode, pic, off);
+      off = gen_const_mem (tp_mode, off);
       set_mem_alias_set (off, ix86_GOT_alias_set ());
 
       if (TARGET_64BIT || TARGET_ANY_GNU_TLS)
 	{
-          base = get_thread_pointer (for_mov
-				     || !(TARGET_TLS_DIRECT_SEG_REFS
-					  && TARGET_TLS_INDIRECT_SEG_REFS));
-	  off = force_reg (Pmode, off);
-	  return gen_rtx_PLUS (Pmode, base, off);
+	  base = get_thread_pointer (tp_mode,
+				     for_mov || !TARGET_TLS_DIRECT_SEG_REFS);
+	  off = force_reg (tp_mode, off);
+	  return gen_rtx_PLUS (tp_mode, base, off);
 	}
       else
 	{
-	  base = get_thread_pointer (true);
+	  base = get_thread_pointer (Pmode, true);
 	  dest = gen_reg_rtx (Pmode);
 	  emit_insn (gen_subsi3 (dest, base, off));
 	}
@@ -12734,14 +12729,13 @@ legitimize_tls_address (rtx x, enum tls_model mode
 
       if (TARGET_64BIT || TARGET_ANY_GNU_TLS)
 	{
-	  base = get_thread_pointer (for_mov
-				     || !(TARGET_TLS_DIRECT_SEG_REFS
-					  && TARGET_TLS_INDIRECT_SEG_REFS));
+	  base = get_thread_pointer (Pmode,
+				     for_mov || !TARGET_TLS_DIRECT_SEG_REFS);
 	  return gen_rtx_PLUS (Pmode, base, off);
 	}
       else
 	{
-	  base = get_thread_pointer (true);
+	  base = get_thread_pointer (Pmode, true);
 	  dest = gen_reg_rtx (Pmode);
 	  emit_insn (gen_subsi3 (dest, base, off));
 	}
@@ -13269,8 +13263,7 @@ ix86_delegitimize_tls_address (rtx orig_x)
   rtx x = orig_x, unspec;
   struct ix86_address addr;
 
-  if (!(TARGET_TLS_DIRECT_SEG_REFS
-	&& TARGET_TLS_INDIRECT_SEG_REFS))
+  if (!TARGET_TLS_DIRECT_SEG_REFS)
     return orig_x;
   if (MEM_P (x))
     x = XEXP (x, 0);
Index: i386.h
===================================================================
--- i386.h	(revision 185504)
+++ i386.h	(working copy)
@@ -467,9 +467,6 @@ extern int x86_prefetch_sse;
 #define TARGET_TLS_DIRECT_SEG_REFS_DEFAULT 0
 #endif
 
-/* Address override works only on the (%reg) part of %fs:(%reg).  */
-#define TARGET_TLS_INDIRECT_SEG_REFS (Pmode == word_mode)
-
 /* Fence to use after loop using storent.  */
 
 extern tree x86_mfence;

* Re: PATCH: Properly generate X32 IE sequence
  2012-03-18 20:55                 ` Uros Bizjak
@ 2012-03-19 15:51                   ` H.J. Lu
  2012-03-19 15:54                     ` H.J. Lu
  2012-03-20  8:52                   ` Eric Botcazou
  1 sibling, 1 reply; 43+ messages in thread
From: H.J. Lu @ 2012-03-19 15:51 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: gcc-patches, Richard Henderson

[-- Attachment #1: Type: text/plain, Size: 1714 bytes --]

On Sun, Mar 18, 2012 at 1:55 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
> On Sun, Mar 18, 2012 at 5:01 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
>
>>> I am testing this patch.  OK for trunk if it passes all tests?
>>
>> No, force_reg will generate a pseudo, so this conversion is valid only
>> while can_create_pseudo_p () is true.
>>
>> At least for *tls_initial_exec_x32_store, you will need a temporary to
>> split the pattern after reload.

Here is the updated patch, which adds can_create_pseudo_p.  I also changed
tls_initial_exec_x32 to take an input register operand as the thread pointer.

> Please try the attached patch. It simply throws away all the recent
> complications w.r.t. the thread pointer and always handles TP in
> DImode.
>
> The testcase:
>
> --cut here--
> __thread int foo __attribute__ ((tls_model ("initial-exec")));
>
> void bar (int x)
> {
>  foo = x;
> }
>
> int baz (void)
> {
>  return foo;
> }
> --cut here--
>
> Now compiles to:
>
> bar:
>        movq    foo@gottpoff(%rip), %rax
>        movl    %edi, %fs:(%rax)
>        ret
>
> baz:
>        movq    foo@gottpoff(%rip), %rax
>        movl    %fs:(%rax), %eax
>        ret
>
> In effect, this always generates %fs:(%reg) with a DImode register and
> emits a REX prefix before mov/add to satisfy brain-dead linkers.
>
> The patch is bootstrapping now on x86_64-pc-linux-gnu.
>

For

--
extern __thread char c;
extern char y;
void
ie (void)
{
  y = c;
}
--

Your patch generates:

	movl	%fs:0, %eax
	movq	c@gottpoff(%rip), %rdx
	movzbl	(%rax,%rdx), %edx
	movb	%dl, y(%rip)
	ret

It can be optimized to:

	movq	c@gottpoff(%rip), %rax
	movzbl	%fs:(%rax), %eax
	movb	%al, y(%rip)
	ret

H.J.

[-- Attachment #2: gcc-x32-tls-4.patch --]
[-- Type: text/x-patch, Size: 6055 bytes --]

2012-03-19  H.J. Lu  <hongjiu.lu@intel.com>

	* config/i386/i386-protos.h (ix86_split_tls_initial_exec_x32): New.

	* config/i386/i386.c (legitimize_tls_address): Also pass thread
	pointer to gen_tls_initial_exec_x32.
	(ix86_split_tls_initial_exec_x32): New.

	* config/i386/i386.md (*load_tp_x32): Renamed to ...
	(*load_tp_x32_<mode>): This. Replace SI with SWI48x.
	(tls_initial_exec_x32): Add an input register operand as thread
	pointer.  Generate a REX prefix if needed.
	(*tls_initial_exec_x32_load): New.
	(*tls_initial_exec_x32_store): Likewise.

diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index 630112f..528eeaa 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -142,6 +142,7 @@ extern void ix86_split_lshr (rtx *, rtx, enum machine_mode);
 extern rtx ix86_find_base_term (rtx);
 extern bool ix86_check_movabs (rtx, int);
 extern void ix86_split_idivmod (enum machine_mode, rtx[], bool);
+extern void ix86_split_tls_initial_exec_x32 (rtx [], enum machine_mode, bool);
 
 extern rtx assign_386_stack_local (enum machine_mode, enum ix86_stack_slot);
 extern int ix86_attr_length_immediate_default (rtx, bool);
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 78a366e..fb802ee 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -12671,13 +12671,14 @@ legitimize_tls_address (rtx x, enum tls_model model, bool for_mov)
 	    }
 	  else if (Pmode == SImode)
 	    {
-	      /* Always generate
-			movl %fs:0, %reg32
+	      /* Always generate a REX prefix for
 			addl xgottpoff(%rip), %reg32
-		 to support linker IE->LE optimization and avoid
-		 fs:(%reg32) as memory operand.  */
+		 to support linker IE->LE optimization.  */
 	      dest = gen_reg_rtx (Pmode);
-	      emit_insn (gen_tls_initial_exec_x32 (dest, x));
+	      base = get_thread_pointer (for_mov
+					 || !(TARGET_TLS_DIRECT_SEG_REFS
+					      && TARGET_TLS_INDIRECT_SEG_REFS));
+	      emit_insn (gen_tls_initial_exec_x32 (dest, base, x));
 	      return dest;
 	    }
 
@@ -12754,6 +12755,28 @@ legitimize_tls_address (rtx x, enum tls_model model, bool for_mov)
   return dest;
 }
 
+/* Split x32 TLS IE access in MODE.  Split load if LOAD is TRUE,
+   otherwise split store.  */
+
+void
+ix86_split_tls_initial_exec_x32 (rtx operands[],
+				 enum machine_mode mode, bool load)
+{
+  rtx base, mem;
+  rtx off = load ? operands[1] : operands[0];
+  off = gen_rtx_UNSPEC (DImode, gen_rtvec (1, off), UNSPEC_GOTNTPOFF);
+  off = gen_rtx_CONST (DImode, off);
+  off = gen_const_mem (DImode, off);
+  set_mem_alias_set (off, ix86_GOT_alias_set ());
+  base = gen_rtx_UNSPEC (DImode, gen_rtvec (1, const0_rtx), UNSPEC_TP);
+  off = gen_rtx_PLUS (DImode, base, force_reg (DImode, off));
+  mem = gen_rtx_MEM (mode, off);
+  if (load)
+    emit_move_insn (operands[0], mem);
+  else
+    emit_move_insn (mem, operands[1]);
+}
+
 /* Create or return the unique __imp_DECL dllimport symbol corresponding
    to symbol DECL.  */
 
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index eae26ae..1643792 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -12747,11 +12747,11 @@
 (define_mode_attr tp_seg [(SI "gs") (DI "fs")])
 
 ;; Load and add the thread base pointer from %<tp_seg>:0.
-(define_insn "*load_tp_x32"
-  [(set (match_operand:SI 0 "register_operand" "=r")
-	(unspec:SI [(const_int 0)] UNSPEC_TP))]
+(define_insn "*load_tp_x32_<mode>"
+  [(set (match_operand:SWI48x 0 "register_operand" "=r")
+	(unspec:SWI48x [(const_int 0)] UNSPEC_TP))]
   "TARGET_X32"
-  "mov{l}\t{%%fs:0, %0|%0, DWORD PTR fs:0}"
+  "mov{l}\t{%%fs:0, %k0|%k0, DWORD PTR fs:0}"
   [(set_attr "type" "imov")
    (set_attr "modrm" "0")
    (set_attr "length" "7")
@@ -12836,27 +12836,54 @@
 }
   [(set_attr "type" "multi")])
 
-;; When Pmode == SImode, there may be no REX prefix for ADD.  Avoid
-;; any instructions between MOV and ADD, which may interfere linker
-;; IE->LE optimization, since the last byte of the previous instruction
-;; before ADD may look like a REX prefix.  This also avoids
-;;	movl x@gottpoff(%rip), %reg32
-;;	movl $fs:(%reg32), %reg32
-;; Since address override works only on the (reg32) part in fs:(reg32),
-;; we can't use it as memory operand.
+;; When Pmode == SImode, there may be no REX prefix for ADD.  Make sure
+;; there is a REX prefix.
 (define_insn "tls_initial_exec_x32"
   [(set (match_operand:SI 0 "register_operand" "=r")
 	(unspec:SI
-	 [(match_operand 1 "tls_symbolic_operand" "")]
+	 [(match_operand:SI 1 "register_operand" "0")
+	  (match_operand 2 "tls_symbolic_operand" "")]
 	 UNSPEC_TLS_IE_X32))
    (clobber (reg:CC FLAGS_REG))]
   "TARGET_X32"
 {
-  output_asm_insn
-    ("mov{l}\t{%%fs:0, %0|%0, DWORD PTR fs:0}", operands);
-  return "add{l}\t{%a1@gottpoff(%%rip), %0|%0, %a1@gottpoff[rip]}";
+  if (!REX_INT_REG_P (operands[0]))
+    fputs ("\trex ", asm_out_file);
+  return "add{l}\t{%a2@gottpoff(%%rip), %0|%0, %a2@gottpoff[rip]}";
 }
-  [(set_attr "type" "multi")])
+  [(set_attr "type" "alu")
+   (set_attr "length" "7")
+   (set_attr "memory" "load")])
+
+(define_insn_and_split "*tls_initial_exec_x32_load"
+  [(set (match_operand:SWI1248x 0 "register_operand" "=r")
+        (mem:SWI1248x
+	  (unspec:SI
+	   [(unspec:SI [(const_int 0)] UNSPEC_TP)
+	    (match_operand 1 "tls_symbolic_operand" "")]
+	   UNSPEC_TLS_IE_X32)))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_X32
+   && can_create_pseudo_p ()"
+  "#"
+  ""
+  [(const_int 0)]
+  "ix86_split_tls_initial_exec_x32 (operands, <MODE>mode, TRUE); DONE;")
+
+(define_insn_and_split "*tls_initial_exec_x32_store"
+  [(set (mem:SWI1248x
+	  (unspec:SI
+	   [(unspec:SI [(const_int 0)] UNSPEC_TP)
+	    (match_operand 0 "tls_symbolic_operand" "")]
+	   UNSPEC_TLS_IE_X32))
+  	(match_operand:SWI1248x 1 "register_operand" "r"))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_X32
+   && can_create_pseudo_p ()"
+  "#"
+  ""
+  [(const_int 0)]
+  "ix86_split_tls_initial_exec_x32 (operands, <MODE>mode, FALSE); DONE;")
 
 ;; GNU2 TLS patterns can be split.
 

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: PATCH: Properly generate X32 IE sequence
  2012-03-19 15:51                   ` H.J. Lu
@ 2012-03-19 15:54                     ` H.J. Lu
  2012-03-19 16:20                       ` H.J. Lu
  0 siblings, 1 reply; 43+ messages in thread
From: H.J. Lu @ 2012-03-19 15:54 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: gcc-patches, Richard Henderson

On Mon, Mar 19, 2012 at 8:51 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Sun, Mar 18, 2012 at 1:55 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
>> On Sun, Mar 18, 2012 at 5:01 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
>>
>>>> I am testing this patch.  OK for trunk if it passes all tests?
>>>
>>> No, force_reg will generate a pseudo, so this conversion is valid only
>>> for !can_create_pseudo ().
>>>
>>> At least for *tls_initial_exec_x32_store, you will need a temporary to
>>> split the pattern after reload.
>
> Here is the updated patch to add can_create_pseudo.  I also changed
> tls_initial_exec_x32 to take an input register operand as thread pointer.
>
>> Please try attached patch. It simply throws away all recent
>> complications w.r.t. thread pointer and always handles TP in
>> DImode.
>>
>> The testcase:
>>
>> --cut here--
>> __thread int foo __attribute__ ((tls_model ("initial-exec")));
>>
>> void bar (int x)
>> {
>>  foo = x;
>> }
>>
>> int baz (void)
>> {
>>  return foo;
>> }
>> --cut here--
>>
>> Now compiles to:
>>
>> bar:
>>        movq    foo@gottpoff(%rip), %rax
>>        movl    %edi, %fs:(%rax)
>>        ret
>>
>> baz:
>>        movq    foo@gottpoff(%rip), %rax
>>        movl    %fs:(%rax), %eax
>>        ret
>>
>> In effect, this always generates %fs(%rDI) and emits REX prefix before
>> mov/add to satisfy brain-dead linkers.
>>
>> The patch is bootstrapping now on x86_64-pc-linux-gnu.
>>
>
> For
>
> --
> extern __thread char c;
> extern char y;
> void
> ie (void)
> {
>  y = c;
> }
> --
>
> Your patch generates:
>
>        movl    %fs:0, %eax
>        movq    c@gottpoff(%rip), %rdx
>        movzbl  (%rax,%rdx), %edx
>        movb    %dl, y(%rip)
>        ret
>
> It can be optimized to:
>
>        movq    c@gottpoff(%rip), %rax
>        movzbl  %fs:(%rax), %eax
>        movb    %al, y(%rip)
>        ret
>

Combine failed:

(set (reg:QI 63 [ c ])
    (mem/c:QI (plus:DI (zero_extend:DI (unspec:SI [
                        (const_int 0 [0])
                    ] UNSPEC_TP))
            (mem/u/c:DI (const:DI (unspec:DI [
                            (symbol_ref:SI ("c") [flags 0x60]
<var_decl 0x7ffff19b8140 c>)
                        ] UNSPEC_GOTNTPOFF)) [2 S8 A8])) [0 c+0 S1 A8]))



-- 
H.J.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: PATCH: Properly generate X32 IE sequence
  2012-03-19 15:54                     ` H.J. Lu
@ 2012-03-19 16:20                       ` H.J. Lu
  2012-03-19 16:35                         ` H.J. Lu
  2012-03-19 16:47                         ` Uros Bizjak
  0 siblings, 2 replies; 43+ messages in thread
From: H.J. Lu @ 2012-03-19 16:20 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: gcc-patches, Richard Henderson

On Mon, Mar 19, 2012 at 8:54 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Mon, Mar 19, 2012 at 8:51 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Sun, Mar 18, 2012 at 1:55 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
>>> On Sun, Mar 18, 2012 at 5:01 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
>>>
>>>>> I am testing this patch.  OK for trunk if it passes all tests?
>>>>
>>>> No, force_reg will generate a pseudo, so this conversion is valid only
>>>> for !can_create_pseudo ().
>>>>
>>>> At least for *tls_initial_exec_x32_store, you will need a temporary to
>>>> split the pattern after reload.
>>
>> Here is the updated patch to add can_create_pseudo.  I also changed
>> tls_initial_exec_x32 to take an input register operand as thread pointer.
>>
>>> Please try attached patch. It simply throws away all recent
>>> complications w.r.t. thread pointer and always handles TP in
>>> DImode.
>>>
>>> The testcase:
>>>
>>> --cut here--
>>> __thread int foo __attribute__ ((tls_model ("initial-exec")));
>>>
>>> void bar (int x)
>>> {
>>>  foo = x;
>>> }
>>>
>>> int baz (void)
>>> {
>>>  return foo;
>>> }
>>> --cut here--
>>>
>>> Now compiles to:
>>>
>>> bar:
>>>        movq    foo@gottpoff(%rip), %rax
>>>        movl    %edi, %fs:(%rax)
>>>        ret
>>>
>>> baz:
>>>        movq    foo@gottpoff(%rip), %rax
>>>        movl    %fs:(%rax), %eax
>>>        ret
>>>
>>> In effect, this always generates %fs(%rDI) and emits REX prefix before
>>> mov/add to satisfy brain-dead linkers.
>>>
>>> The patch is bootstrapping now on x86_64-pc-linux-gnu.
>>>
>>
>> For
>>
>> --
>> extern __thread char c;
>> extern char y;
>> void
>> ie (void)
>> {
>>  y = c;
>> }
>> --
>>
>> Your patch generates:
>>
>>        movl    %fs:0, %eax
>>        movq    c@gottpoff(%rip), %rdx
>>        movzbl  (%rax,%rdx), %edx
>>        movb    %dl, y(%rip)
>>        ret
>>
>> It can be optimized to:
>>
>>        movq    c@gottpoff(%rip), %rax
>>        movzbl  %fs:(%rax), %eax
>>        movb    %al, y(%rip)
>>        ret
>>
>
> Combine failed:
>
> (set (reg:QI 63 [ c ])
>    (mem/c:QI (plus:DI (zero_extend:DI (unspec:SI [
>                        (const_int 0 [0])
>                    ] UNSPEC_TP))
>            (mem/u/c:DI (const:DI (unspec:DI [
>                            (symbol_ref:SI ("c") [flags 0x60]
> <var_decl 0x7ffff19b8140 c>)
>                        ] UNSPEC_GOTNTPOFF)) [2 S8 A8])) [0 c+0 S1 A8]))
>
>

Wrong testcase.  It should be

--
extern __thread char c;
extern __thread short w;
extern char y;
extern short i;
void
ie (void)
{
  y = c;
  i = w;
}
---

I got

	movl	%fs:0, %eax	
	movq	c@gottpoff(%rip), %rdx	
	movzbl	(%rax,%rdx), %edx	
	movb	%dl, y(%rip)	
	movq	w@gottpoff(%rip), %rdx	
	movzwl	(%rax,%rdx), %eax	
	movw	%ax, i(%rip)	
	ret	

It can be

	movq	c@gottpoff(%rip), %rax	
	movzbl	%fs:(%rax), %eax	
	movb	%al, y(%rip)	
	movq	w@gottpoff(%rip), %rax	
	movzwl	%fs:(%rax), %eax	
	movw	%ax, i(%rip)	
	ret	



-- 
H.J.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: PATCH: Properly generate X32 IE sequence
  2012-03-19 16:20                       ` H.J. Lu
@ 2012-03-19 16:35                         ` H.J. Lu
  2012-03-19 16:38                           ` Uros Bizjak
  2012-03-19 16:47                         ` Uros Bizjak
  1 sibling, 1 reply; 43+ messages in thread
From: H.J. Lu @ 2012-03-19 16:35 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: gcc-patches, Richard Henderson

[-- Attachment #1: Type: text/plain, Size: 3852 bytes --]

On Mon, Mar 19, 2012 at 9:19 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Mon, Mar 19, 2012 at 8:54 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Mon, Mar 19, 2012 at 8:51 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>> On Sun, Mar 18, 2012 at 1:55 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
>>>> On Sun, Mar 18, 2012 at 5:01 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
>>>>
>>>>>> I am testing this patch.  OK for trunk if it passes all tests?
>>>>>
>>>>> No, force_reg will generate a pseudo, so this conversion is valid only
>>>>> for !can_create_pseudo ().
>>>>>
>>>>> At least for *tls_initial_exec_x32_store, you will need a temporary to
>>>>> split the pattern after reload.
>>>
>>> Here is the updated patch to add can_create_pseudo.  I also changed
>>> tls_initial_exec_x32 to take an input register operand as thread pointer.
>>>
>>>> Please try attached patch. It simply throws away all recent
>>>> complications w.r.t. thread pointer and always handles TP in
>>>> DImode.
>>>>
>>>> The testcase:
>>>>
>>>> --cut here--
>>>> __thread int foo __attribute__ ((tls_model ("initial-exec")));
>>>>
>>>> void bar (int x)
>>>> {
>>>>  foo = x;
>>>> }
>>>>
>>>> int baz (void)
>>>> {
>>>>  return foo;
>>>> }
>>>> --cut here--
>>>>
>>>> Now compiles to:
>>>>
>>>> bar:
>>>>        movq    foo@gottpoff(%rip), %rax
>>>>        movl    %edi, %fs:(%rax)
>>>>        ret
>>>>
>>>> baz:
>>>>        movq    foo@gottpoff(%rip), %rax
>>>>        movl    %fs:(%rax), %eax
>>>>        ret
>>>>
>>>> In effect, this always generates %fs(%rDI) and emits REX prefix before
>>>> mov/add to satisfy brain-dead linkers.
>>>>
>>>> The patch is bootstrapping now on x86_64-pc-linux-gnu.
>>>>
>>>
>>> For
>>>
>>> --
>>> extern __thread char c;
>>> extern char y;
>>> void
>>> ie (void)
>>> {
>>>  y = c;
>>> }
>>> --
>>>
>>> Your patch generates:
>>>
>>>        movl    %fs:0, %eax
>>>        movq    c@gottpoff(%rip), %rdx
>>>        movzbl  (%rax,%rdx), %edx
>>>        movb    %dl, y(%rip)
>>>        ret
>>>
>>> It can be optimized to:
>>>
>>>        movq    c@gottpoff(%rip), %rax
>>>        movzbl  %fs:(%rax), %eax
>>>        movb    %al, y(%rip)
>>>        ret
>>>
>>
>> Combine failed:
>>
>> (set (reg:QI 63 [ c ])
>>    (mem/c:QI (plus:DI (zero_extend:DI (unspec:SI [
>>                        (const_int 0 [0])
>>                    ] UNSPEC_TP))
>>            (mem/u/c:DI (const:DI (unspec:DI [
>>                            (symbol_ref:SI ("c") [flags 0x60]
>> <var_decl 0x7ffff19b8140 c>)
>>                        ] UNSPEC_GOTNTPOFF)) [2 S8 A8])) [0 c+0 S1 A8]))
>>
>>
>
> Wrong testcase.  It should be
>
> --
> extern __thread char c;
> extern __thread short w;
> extern char y;
> extern short i;
> void
> ie (void)
> {
>  y = c;
>  i = w;
> }
> ---
>
> I got
>
>        movl    %fs:0, %eax
>        movq    c@gottpoff(%rip), %rdx
>        movzbl  (%rax,%rdx), %edx
>        movb    %dl, y(%rip)
>        movq    w@gottpoff(%rip), %rdx
>        movzwl  (%rax,%rdx), %eax
>        movw    %ax, i(%rip)
>        ret
>
> It can be
>
>        movq    c@gottpoff(%rip), %rax
>        movzbl  %fs:(%rax), %eax
>        movb    %al, y(%rip)
>        movq    w@gottpoff(%rip), %rax
>        movzwl  %fs:(%rax), %eax
>        movw    %ax, i(%rip)
>        ret
>
>

How about this patch?  I changed the x32 TP load to

(define_insn "*load_tp_x32_<mode>"
  [(set (match_operand:SWI48x 0 "register_operand" "=r")
        (unspec:SWI48x [(const_int 0)] UNSPEC_TP))]
  "TARGET_X32"
  "mov{l}\t{%%fs:0, %k0|%k0, DWORD PTR fs:0}"
  [(set_attr "type" "imov")
   (set_attr "modrm" "0")
   (set_attr "length" "7")
   (set_attr "memory" "load")
   (set_attr "imm_disp" "false")])

and removed *load_tp_x32_zext.


-- 
H.J.

[-- Attachment #2: gcc-x32-tls-5.patch --]
[-- Type: text/x-patch, Size: 7130 bytes --]

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 9aa5ee7..66221e4 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -12483,15 +12483,12 @@ legitimize_pic_address (rtx orig, rtx reg)
 /* Load the thread pointer.  If TO_REG is true, force it into a register.  */
 
 static rtx
-get_thread_pointer (bool to_reg)
+get_thread_pointer (enum machine_mode tp_mode, bool to_reg)
 {
-  rtx tp = gen_rtx_UNSPEC (ptr_mode, gen_rtvec (1, const0_rtx), UNSPEC_TP);
-
-  if (GET_MODE (tp) != Pmode)
-    tp = convert_to_mode (Pmode, tp, 1);
+  rtx tp = gen_rtx_UNSPEC (tp_mode, gen_rtvec (1, const0_rtx), UNSPEC_TP);
 
   if (to_reg)
-    tp = copy_addr_to_reg (tp);
+    tp = copy_to_mode_reg (tp_mode, tp);
 
   return tp;
 }
@@ -12543,6 +12540,7 @@ legitimize_tls_address (rtx x, enum tls_model model, bool for_mov)
 {
   rtx dest, base, off;
   rtx pic = NULL_RTX, tp = NULL_RTX;
+  enum machine_mode tp_mode = Pmode;
   int type;
 
   switch (model)
@@ -12568,7 +12566,7 @@ legitimize_tls_address (rtx x, enum tls_model model, bool for_mov)
 	  else
 	    emit_insn (gen_tls_dynamic_gnu2_32 (dest, x, pic));
 
-	  tp = get_thread_pointer (true);
+	  tp = get_thread_pointer (Pmode, true);
 	  dest = force_reg (Pmode, gen_rtx_PLUS (Pmode, tp, dest));
 
 	  set_unique_reg_note (get_last_insn (), REG_EQUAL, x);
@@ -12618,7 +12616,7 @@ legitimize_tls_address (rtx x, enum tls_model model, bool for_mov)
 	  else
 	    emit_insn (gen_tls_dynamic_gnu2_32 (base, tmp, pic));
 
-	  tp = get_thread_pointer (true);
+	  tp = get_thread_pointer (Pmode, true);
 	  set_unique_reg_note (get_last_insn (), REG_EQUAL,
 			       gen_rtx_MINUS (Pmode, tmp, tp));
 	}
@@ -12664,27 +12662,18 @@ legitimize_tls_address (rtx x, enum tls_model model, bool for_mov)
     case TLS_MODEL_INITIAL_EXEC:
       if (TARGET_64BIT)
 	{
+	  tp_mode = DImode;
+
 	  if (TARGET_SUN_TLS)
 	    {
 	      /* The Sun linker took the AMD64 TLS spec literally
 		 and can only handle %rax as destination of the
 		 initial executable code sequence.  */
 
-	      dest = gen_reg_rtx (Pmode);
+	      dest = gen_reg_rtx (tp_mode);
 	      emit_insn (gen_tls_initial_exec_64_sun (dest, x));
 	      return dest;
 	    }
-	  else if (Pmode == SImode)
-	    {
-	      /* Always generate
-			movl %fs:0, %reg32
-			addl xgottpoff(%rip), %reg32
-		 to support linker IE->LE optimization and avoid
-		 fs:(%reg32) as memory operand.  */
-	      dest = gen_reg_rtx (Pmode);
-	      emit_insn (gen_tls_initial_exec_x32 (dest, x));
-	      return dest;
-	    }
 
 	  pic = NULL;
 	  type = UNSPEC_GOTNTPOFF;
@@ -12708,24 +12697,23 @@ legitimize_tls_address (rtx x, enum tls_model model, bool for_mov)
 	  type = UNSPEC_INDNTPOFF;
 	}
 
-      off = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, x), type);
-      off = gen_rtx_CONST (Pmode, off);
+      off = gen_rtx_UNSPEC (tp_mode, gen_rtvec (1, x), type);
+      off = gen_rtx_CONST (tp_mode, off);
       if (pic)
-	off = gen_rtx_PLUS (Pmode, pic, off);
-      off = gen_const_mem (Pmode, off);
+	off = gen_rtx_PLUS (tp_mode, pic, off);
+      off = gen_const_mem (tp_mode, off);
       set_mem_alias_set (off, ix86_GOT_alias_set ());
 
       if (TARGET_64BIT || TARGET_ANY_GNU_TLS)
 	{
-          base = get_thread_pointer (for_mov
-				     || !(TARGET_TLS_DIRECT_SEG_REFS
-					  && TARGET_TLS_INDIRECT_SEG_REFS));
-	  off = force_reg (Pmode, off);
-	  return gen_rtx_PLUS (Pmode, base, off);
+	  base = get_thread_pointer (tp_mode,
+				     for_mov || !TARGET_TLS_DIRECT_SEG_REFS);
+	  off = force_reg (tp_mode, off);
+	  return gen_rtx_PLUS (tp_mode, base, off);
 	}
       else
 	{
-	  base = get_thread_pointer (true);
+	  base = get_thread_pointer (Pmode, true);
 	  dest = gen_reg_rtx (Pmode);
 	  emit_insn (ix86_gen_sub3 (dest, base, off));
 	}
@@ -12739,14 +12727,13 @@ legitimize_tls_address (rtx x, enum tls_model model, bool for_mov)
 
       if (TARGET_64BIT || TARGET_ANY_GNU_TLS)
 	{
-	  base = get_thread_pointer (for_mov
-				     || !(TARGET_TLS_DIRECT_SEG_REFS
-					  && TARGET_TLS_INDIRECT_SEG_REFS));
+	  base = get_thread_pointer (Pmode,
+				     for_mov || !TARGET_TLS_DIRECT_SEG_REFS);
 	  return gen_rtx_PLUS (Pmode, base, off);
 	}
       else
 	{
-	  base = get_thread_pointer (true);
+	  base = get_thread_pointer (Pmode, true);
 	  dest = gen_reg_rtx (Pmode);
 	  emit_insn (ix86_gen_sub3 (dest, base, off));
 	}
@@ -13274,8 +13261,7 @@ ix86_delegitimize_tls_address (rtx orig_x)
   rtx x = orig_x, unspec;
   struct ix86_address addr;
 
-  if (!(TARGET_TLS_DIRECT_SEG_REFS
-	&& TARGET_TLS_INDIRECT_SEG_REFS))
+  if (!TARGET_TLS_DIRECT_SEG_REFS)
     return orig_x;
   if (MEM_P (x))
     x = XEXP (x, 0);
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 9e5ac00..3fcd209 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -467,9 +467,6 @@ extern int x86_prefetch_sse;
 #define TARGET_TLS_DIRECT_SEG_REFS_DEFAULT 0
 #endif
 
-/* Address override works only on the (%reg) part of %fs:(%reg).  */
-#define TARGET_TLS_INDIRECT_SEG_REFS (Pmode == word_mode)
-
 /* Fence to use after loop using storent.  */
 
 extern tree x86_mfence;
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index d23c67b..e167ceb 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -12747,20 +12747,9 @@
 (define_mode_attr tp_seg [(SI "gs") (DI "fs")])
 
 ;; Load and add the thread base pointer from %<tp_seg>:0.
-(define_insn "*load_tp_x32"
-  [(set (match_operand:SI 0 "register_operand" "=r")
-	(unspec:SI [(const_int 0)] UNSPEC_TP))]
-  "TARGET_X32"
-  "mov{l}\t{%%fs:0, %0|%0, DWORD PTR fs:0}"
-  [(set_attr "type" "imov")
-   (set_attr "modrm" "0")
-   (set_attr "length" "7")
-   (set_attr "memory" "load")
-   (set_attr "imm_disp" "false")])
-
-(define_insn "*load_tp_x32_zext"
-  [(set (match_operand:DI 0 "register_operand" "=r")
-	(zero_extend:DI (unspec:SI [(const_int 0)] UNSPEC_TP)))]
+(define_insn "*load_tp_x32_<mode>"
+  [(set (match_operand:SWI48x 0 "register_operand" "=r")
+	(unspec:SWI48x [(const_int 0)] UNSPEC_TP))]
   "TARGET_X32"
   "mov{l}\t{%%fs:0, %k0|%k0, DWORD PTR fs:0}"
   [(set_attr "type" "imov")
@@ -12836,28 +12825,6 @@
 }
   [(set_attr "type" "multi")])
 
-;; When Pmode == SImode, there may be no REX prefix for ADD.  Avoid
-;; any instructions between MOV and ADD, which may interfere linker
-;; IE->LE optimization, since the last byte of the previous instruction
-;; before ADD may look like a REX prefix.  This also avoids
-;;	movl x@gottpoff(%rip), %reg32
-;;	movl $fs:(%reg32), %reg32
-;; Since address override works only on the (reg32) part in fs:(reg32),
-;; we can't use it as memory operand.
-(define_insn "tls_initial_exec_x32"
-  [(set (match_operand:SI 0 "register_operand" "=r")
-	(unspec:SI
-	 [(match_operand 1 "tls_symbolic_operand")]
-	 UNSPEC_TLS_IE_X32))
-   (clobber (reg:CC FLAGS_REG))]
-  "TARGET_X32"
-{
-  output_asm_insn
-    ("mov{l}\t{%%fs:0, %0|%0, DWORD PTR fs:0}", operands);
-  return "add{l}\t{%a1@gottpoff(%%rip), %0|%0, %a1@gottpoff[rip]}";
-}
-  [(set_attr "type" "multi")])
-
 ;; GNU2 TLS patterns can be split.
 
 (define_expand "tls_dynamic_gnu2_32"

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: PATCH: Properly generate X32 IE sequence
  2012-03-19 16:35                         ` H.J. Lu
@ 2012-03-19 16:38                           ` Uros Bizjak
  2012-03-19 16:47                             ` H.J. Lu
  0 siblings, 1 reply; 43+ messages in thread
From: Uros Bizjak @ 2012-03-19 16:38 UTC (permalink / raw)
  To: H.J. Lu; +Cc: gcc-patches, Richard Henderson

On Mon, Mar 19, 2012 at 5:34 PM, H.J. Lu <hjl.tools@gmail.com> wrote:

>>> Combine failed:
>>>
>>> (set (reg:QI 63 [ c ])
>>>    (mem/c:QI (plus:DI (zero_extend:DI (unspec:SI [
>>>                        (const_int 0 [0])
>>>                    ] UNSPEC_TP))
>>>            (mem/u/c:DI (const:DI (unspec:DI [
>>>                            (symbol_ref:SI ("c") [flags 0x60]
>>> <var_decl 0x7ffff19b8140 c>)
>>>                        ] UNSPEC_GOTNTPOFF)) [2 S8 A8])) [0 c+0 S1 A8]))
>>>
>>>
>>
>> Wrong testcase.  It should be
>>
>> --
>> extern __thread char c;
>> extern __thread short w;
>> extern char y;
>> extern short i;
>> void
>> ie (void)
>> {
>>  y = c;
>>  i = w;
>> }
>> ---
>>
>> I got
>>
>>        movl    %fs:0, %eax
>>        movq    c@gottpoff(%rip), %rdx
>>        movzbl  (%rax,%rdx), %edx
>>        movb    %dl, y(%rip)
>>        movq    w@gottpoff(%rip), %rdx
>>        movzwl  (%rax,%rdx), %eax
>>        movw    %ax, i(%rip)
>>        ret
>>
>> It can be
>>
>>        movq    c@gottpoff(%rip), %rax
>>        movzbl  %fs:(%rax), %eax
>>        movb    %al, y(%rip)
>>        movq    w@gottpoff(%rip), %rax
>>        movzwl  %fs:(%rax), %eax
>>        movw    %ax, i(%rip)
>>        ret
>>
>>
>
> How about this patch?  I changed the x32 TP load to
>
> (define_insn "*load_tp_x32_<mode>"
>  [(set (match_operand:SWI48x 0 "register_operand" "=r")
>        (unspec:SWI48x [(const_int 0)] UNSPEC_TP))]
>  "TARGET_X32"
>  "mov{l}\t{%%fs:0, %k0|%k0, DWORD PTR fs:0}"
>  [(set_attr "type" "imov")
>   (set_attr "modrm" "0")
>   (set_attr "length" "7")
>   (set_attr "memory" "load")
>   (set_attr "imm_disp" "false")])
>
> and removed *load_tp_x32_zext.

No, your whole approach with splitters is wrong.

@@ -12747,11 +12747,11 @@
 (define_mode_attr tp_seg [(SI "gs") (DI "fs")])

 ;; Load and add the thread base pointer from %<tp_seg>:0.
-(define_insn "*load_tp_x32"
-  [(set (match_operand:SI 0 "register_operand" "=r")
-	(unspec:SI [(const_int 0)] UNSPEC_TP))]
+(define_insn "*load_tp_x32_<mode>"
+  [(set (match_operand:SWI48x 0 "register_operand" "=r")
+	(unspec:SWI48x [(const_int 0)] UNSPEC_TP))]
   "TARGET_X32"
-  "mov{l}\t{%%fs:0, %0|%0, DWORD PTR fs:0}"
+  "mov{l}\t{%%fs:0, %k0|%k0, DWORD PTR fs:0}"

The result is a zero-extended SImode register, not a fake SImode register in DImode.

But as I said, you should generate the correct sequence from the beginning.

Uros.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: PATCH: Properly generate X32 IE sequence
  2012-03-19 16:38                           ` Uros Bizjak
@ 2012-03-19 16:47                             ` H.J. Lu
  2012-03-19 16:49                               ` Uros Bizjak
  0 siblings, 1 reply; 43+ messages in thread
From: H.J. Lu @ 2012-03-19 16:47 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: gcc-patches, Richard Henderson

On Mon, Mar 19, 2012 at 9:37 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
> On Mon, Mar 19, 2012 at 5:34 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>
>>>> Combine failed:
>>>>
>>>> (set (reg:QI 63 [ c ])
>>>>    (mem/c:QI (plus:DI (zero_extend:DI (unspec:SI [
>>>>                        (const_int 0 [0])
>>>>                    ] UNSPEC_TP))
>>>>            (mem/u/c:DI (const:DI (unspec:DI [
>>>>                            (symbol_ref:SI ("c") [flags 0x60]
>>>> <var_decl 0x7ffff19b8140 c>)
>>>>                        ] UNSPEC_GOTNTPOFF)) [2 S8 A8])) [0 c+0 S1 A8]))
>>>>
>>>>
>>>
>>> Wrong testcase.  It should be
>>>
>>> --
>>> extern __thread char c;
>>> extern __thread short w;
>>> extern char y;
>>> extern short i;
>>> void
>>> ie (void)
>>> {
>>>  y = c;
>>>  i = w;
>>> }
>>> ---
>>>
>>> I got
>>>
>>>        movl    %fs:0, %eax
>>>        movq    c@gottpoff(%rip), %rdx
>>>        movzbl  (%rax,%rdx), %edx
>>>        movb    %dl, y(%rip)
>>>        movq    w@gottpoff(%rip), %rdx
>>>        movzwl  (%rax,%rdx), %eax
>>>        movw    %ax, i(%rip)
>>>        ret
>>>
>>> It can be
>>>
>>>        movq    c@gottpoff(%rip), %rax
>>>        movzbl  %fs:(%rax), %eax
>>>        movb    %al, y(%rip)
>>>        movq    w@gottpoff(%rip), %rax
>>>        movzwl  %fs:(%rax), %eax
>>>        movw    %ax, i(%rip)
>>>        ret
>>>
>>>
>>
>> How about this patch?  I changed the x32 TP load to
>>
>> (define_insn "*load_tp_x32_<mode>"
>>  [(set (match_operand:SWI48x 0 "register_operand" "=r")
>>        (unspec:SWI48x [(const_int 0)] UNSPEC_TP))]
>>  "TARGET_X32"
>>  "mov{l}\t{%%fs:0, %k0|%k0, DWORD PTR fs:0}"
>>  [(set_attr "type" "imov")
>>   (set_attr "modrm" "0")
>>   (set_attr "length" "7")
>>   (set_attr "memory" "load")
>>   (set_attr "imm_disp" "false")])
>>
>> and removed *load_tp_x32_zext.
>
> No, your whole approach with splitters is wrong.
>
> @@ -12747,11 +12747,11 @@
>  (define_mode_attr tp_seg [(SI "gs") (DI "fs")])
>
>  ;; Load and add the thread base pointer from %<tp_seg>:0.
> -(define_insn "*load_tp_x32"
> -  [(set (match_operand:SI 0 "register_operand" "=r")
> -       (unspec:SI [(const_int 0)] UNSPEC_TP))]
> +(define_insn "*load_tp_x32_<mode>"
> +  [(set (match_operand:SWI48x 0 "register_operand" "=r")
> +       (unspec:SWI48x [(const_int 0)] UNSPEC_TP))]
>   "TARGET_X32"
> -  "mov{l}\t{%%fs:0, %0|%0, DWORD PTR fs:0}"
> +  "mov{l}\t{%%fs:0, %k0|%k0, DWORD PTR fs:0}"
>
> The result is a zero-extended SImode register, not a fake SImode register in DImode.
>
> But as I said, you should generate the correct sequence from the beginning.
>

For x32,  thread pointer is an unsigned 32bit value.

movl %fs:0, %eax

is the correct instruction to load thread pointer into EAX and RAX.
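
As a rough illustration of the register semantics this relies on (a
sketch, not GCC code): on x86-64, a write to a 32-bit register clears
the upper 32 bits of the corresponding 64-bit register, so after the
load RAX holds the zero-extended value.  The helper name and the
sample values below are made up for the example.

```c
#include <stdint.h>

/* Hypothetical model of a 64-bit register after a 32-bit write such as
   "movl %fs:0, %eax": the low 32 bits take the new value and the upper
   32 bits are cleared, regardless of the old contents.  */
static uint64_t
reg_after_write32 (uint64_t old_reg, uint32_t value)
{
  (void) old_reg;		/* The old upper half does not survive.  */
  return (uint64_t) value;	/* Zero-extension to 64 bits.  */
}
```

With an old register value of 0xdeadbeefcafef00d and a loaded value of
0xf7ff0000, the model returns 0xf7ff0000, i.e. the same bits whether
the result is read as EAX or as RAX.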


-- 
H.J.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: PATCH: Properly generate X32 IE sequence
  2012-03-19 16:20                       ` H.J. Lu
  2012-03-19 16:35                         ` H.J. Lu
@ 2012-03-19 16:47                         ` Uros Bizjak
  1 sibling, 0 replies; 43+ messages in thread
From: Uros Bizjak @ 2012-03-19 16:47 UTC (permalink / raw)
  To: H.J. Lu; +Cc: gcc-patches, Richard Henderson

On Mon, Mar 19, 2012 at 5:19 PM, H.J. Lu <hjl.tools@gmail.com> wrote:

>        movl    %fs:0, %eax
>        movq    c@gottpoff(%rip), %rdx
>        movzbl  (%rax,%rdx), %edx
>        movb    %dl, y(%rip)
>        movq    w@gottpoff(%rip), %rdx
>        movzwl  (%rax,%rdx), %eax
>        movw    %ax, i(%rip)
>        ret
>
> It can be
>
>        movq    c@gottpoff(%rip), %rax
>        movzbl  %fs:(%rax), %eax
>        movb    %al, y(%rip)
>        movq    w@gottpoff(%rip), %rax
>        movzwl  %fs:(%rax), %eax
>        movw    %ax, i(%rip)
>        ret

This is just CSE in action. It CSEd movl %fs:0, %eax, since it has to
be zero extended before going into address.

Uros.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: PATCH: Properly generate X32 IE sequence
  2012-03-19 16:47                             ` H.J. Lu
@ 2012-03-19 16:49                               ` Uros Bizjak
  2012-03-19 16:56                                 ` H.J. Lu
  0 siblings, 1 reply; 43+ messages in thread
From: Uros Bizjak @ 2012-03-19 16:49 UTC (permalink / raw)
  To: H.J. Lu; +Cc: gcc-patches, Richard Henderson

On Mon, Mar 19, 2012 at 5:47 PM, H.J. Lu <hjl.tools@gmail.com> wrote:

> For x32,  thread pointer is an unsigned 32bit value.
>
> movl %fs:0, %eax
>
> is the correct instruction to load thread pointer into EAX and RAX.

So, where is ZERO_EXTEND RTX then?

Uros.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: PATCH: Properly generate X32 IE sequence
  2012-03-19 16:49                               ` Uros Bizjak
@ 2012-03-19 16:56                                 ` H.J. Lu
  2012-03-19 17:02                                   ` Uros Bizjak
  0 siblings, 1 reply; 43+ messages in thread
From: H.J. Lu @ 2012-03-19 16:56 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: gcc-patches, Richard Henderson

On Mon, Mar 19, 2012 at 9:49 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
> On Mon, Mar 19, 2012 at 5:47 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>
>> For x32,  thread pointer is an unsigned 32bit value.
>>
>> movl %fs:0, %eax
>>
>> is the correct instruction to load thread pointer into EAX and RAX.
>
> So, where is ZERO_EXTEND RTX then?
>

Thread pointer (TP) is an opaque value to GCC.  GCC needs to load
TP into a SImode or DImode register.  ZERO_EXTEND isn't needed
when there is a single instruction to load TP into a DImode register.

-- 
H.J.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: PATCH: Properly generate X32 IE sequence
  2012-03-19 16:56                                 ` H.J. Lu
@ 2012-03-19 17:02                                   ` Uros Bizjak
  2012-03-19 17:30                                     ` Uros Bizjak
  0 siblings, 1 reply; 43+ messages in thread
From: Uros Bizjak @ 2012-03-19 17:02 UTC (permalink / raw)
  To: H.J. Lu; +Cc: gcc-patches, Richard Henderson

On Mon, Mar 19, 2012 at 5:55 PM, H.J. Lu <hjl.tools@gmail.com> wrote:

>>> For x32,  thread pointer is an unsigned 32bit value.
>>>
>>> movl %fs:0, %eax
>>>
>>> is the correct instruction to load thread pointer into EAX and RAX.
>>
>> So, where is ZERO_EXTEND RTX then?
>>
>
> Thread pointer (TP) is an opaque value to GCC.  GCC needs to load
> TP into a SImode or DImode register.  ZERO_EXTEND isn't needed
> when there is a single instruction to load TP into a DImode register.

I don't agree with this explanation. The mode can't be both SImode and
DImode. TP is either SImode or ZERO_EXTENDed to DImode; this is the
reason we went for all that TARGET_X32 stuff in TP load RTX.

Please test my proposed patch. If it works OK, I will commit it to SVN.

Thanks,
Uros.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: PATCH: Properly generate X32 IE sequence
  2012-03-19 17:02                                   ` Uros Bizjak
@ 2012-03-19 17:30                                     ` Uros Bizjak
  2012-03-19 17:50                                       ` H.J. Lu
  0 siblings, 1 reply; 43+ messages in thread
From: Uros Bizjak @ 2012-03-19 17:30 UTC (permalink / raw)
  To: H.J. Lu; +Cc: gcc-patches, Richard Henderson

On Mon, Mar 19, 2012 at 6:01 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
>>>> For x32,  thread pointer is an unsigned 32bit value.
>>>>
>>>> movl %fs:0, %eax
>>>>
>>>> is the correct instruction to load thread pointer into EAX and RAX.
>>>
>>> So, where is ZERO_EXTEND RTX then?
>>>
>>
>> Thread pointer (TP) is an opaque value to GCC.  GCC needs to load
>> TP into a SImode or DImode register.  ZERO_EXTEND isn't needed
>> when there is a single instruction to load TP into a DImode register.
>
> I don't agree with this explanation. The mode can't be SImode and
> DImode. TP is either SImode or ZERO_EXTENDed to DImode, this is the
> reason we went for all that TARGET_X32 stuff in TP load RTX.
>
> Please test my proposed patch. If it works OK, I will commit it to SVN.

The only acceptable way is to generate ZERO_EXTEND in place, so:

--cut here--
static rtx
get_thread_pointer (enum machine_mode tp_mode, bool to_reg)
{
  rtx tp = gen_rtx_UNSPEC (ptr_mode, gen_rtvec (1, const0_rtx), UNSPEC_TP);

  if (GET_MODE (tp) != tp_mode)
    {
      gcc_assert (GET_MODE (tp) == SImode);
      gcc_assert (tp_mode == DImode);

      tp = gen_rtx_ZERO_EXTEND (tp_mode, tp);
    }

  if (to_reg)
    tp = copy_to_mode_reg (tp_mode, tp);

  return tp;
}
--cut here--

This will generate:

        movq    c@gottpoff(%rip), %rax
        movzbl  %fs:(%rax), %eax
        movb    %al, y(%rip)
        movq    w@gottpoff(%rip), %rax
        movzwl  %fs:(%rax), %eax
        movw    %ax, i(%rip)
        ret
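
A minimal C sketch of the address arithmetic behind a %fs:(%rax)
access, assuming the usual convention that the FS base equals the
thread pointer stored at %fs:0.  The function name and the sample
values are hypothetical; the point is only that the SImode TP enters
the 64-bit address zero-extended, which is what the explicit
ZERO_EXTEND expresses in RTL.

```c
#include <stdint.h>

/* Hypothetical model of the address used by "movzbl %fs:(%rax), %eax"
   on x32: the FS base is the 32-bit thread pointer zero-extended to
   64 bits, and %rax holds the 64-bit offset loaded from the GOT.  */
static uint64_t
ie_address (uint32_t tp, int64_t gottpoff)
{
  uint64_t base = (uint64_t) tp;	/* ZERO_EXTEND:DI of the SImode TP.  */
  return base + (uint64_t) gottpoff;	/* 64-bit add, wraps modulo 2^64.  */
}
```

For instance, with tp = 0xf7ff0000 and an offset of -8 the model gives
0xf7fefff8; the negative offset is applied to the zero-extended base
rather than to a 32-bit value.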

Uros.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: PATCH: Properly generate X32 IE sequence
  2012-03-19 17:30                                     ` Uros Bizjak
@ 2012-03-19 17:50                                       ` H.J. Lu
  2012-03-19 19:14                                         ` Uros Bizjak
  0 siblings, 1 reply; 43+ messages in thread
From: H.J. Lu @ 2012-03-19 17:50 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: gcc-patches, Richard Henderson

On Mon, Mar 19, 2012 at 10:29 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
> On Mon, Mar 19, 2012 at 6:01 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
>>>>> For x32,  thread pointer is an unsigned 32bit value.
>>>>>
>>>>> movl %fs:0, %eax
>>>>>
>>>>> is the correct instruction to load thread pointer into EAX and RAX.
>>>>
>>>> So, where is ZERO_EXTEND RTX then?
>>>>
>>>
>>> Thread pointer (TP) is an opaque value to GCC.  GCC needs to load
>>> TP into a SImode or DImode register.  ZERO_EXTEND isn't needed
>>> when there is a single instruction to load TP into a DImode register.
>>
>> I don't agree with this explanation. The mode can't be SImode and
>> DImode. TP is either SImode or ZERO_EXTENDed to DImode, this is the
>> reason we went for all that TARGET_X32 stuff in TP load RTX.

FWIW, the TP maintained by the OS is opaque to GCC, and a GCC mode doesn't
apply to the TP value maintained by the OS.  The instruction pattern to load
TP into a register is provided by the OS and is also opaque to GCC.  The x32
OS provides single instructions to load TP into SImode and DImode registers.
We can load the x32 TP into an SImode register and ZERO_EXTEND it to DImode.
Or we can use the OS-provided instruction to load TP into a DImode
register directly.
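
To illustrate the two views, here is a small C model of what
"load into SImode, then ZERO_EXTEND to DImode" means for the value.
It is a sketch only: the function names are made up for illustration
and no real thread pointer is read.  Sign extension is included for
contrast, and its int32_t conversion relies on the usual two's
complement wrap-around that GCC implements.

```c
#include <assert.h>
#include <stdint.h>

/* Model of ZERO_EXTEND from SImode to DImode: the upper 32 bits of
   the result are always zero.  */
uint64_t
zero_extend_si_to_di (uint32_t tp32)
{
  return (uint64_t) tp32;
}

/* For contrast, sign extension copies bit 31 into the upper half,
   which would turn a high 32-bit address into a different 64-bit
   value.  */
uint64_t
sign_extend_si_to_di (uint32_t tp32)
{
  return (uint64_t) (int64_t) (int32_t) tp32;
}
```

For a 32-bit value with the top bit set, such as 0xf7ffe000, the first
function returns 0x00000000f7ffe000 while the second returns
0xfffffffff7ffe000.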

>> Please test my proposed patch. If it works OK, I will commit it to SVN.
>
> The only acceptable way is to generate ZERO_EXTEND in place, so:
>
> --cut here--
> static rtx
> get_thread_pointer (enum machine_mode tp_mode, bool to_reg)
> {
>  rtx tp = gen_rtx_UNSPEC (ptr_mode, gen_rtvec (1, const0_rtx), UNSPEC_TP);
>
>  if (GET_MODE (tp) != tp_mode)
>    {
>      gcc_assert (GET_MODE (tp) == SImode);
>      gcc_assert (tp_mode == DImode);
>
>      tp = gen_rtx_ZERO_EXTEND (tp_mode, tp);
>    }
>
>  if (to_reg)
>    tp = copy_to_mode_reg (tp_mode, tp);
>
>  return tp;
> }
> --cut here--

This version works fine.

Thanks.


-- 
H.J.


* Re: PATCH: Properly generate X32 IE sequence
  2012-03-19 17:50                                       ` H.J. Lu
@ 2012-03-19 19:14                                         ` Uros Bizjak
  2012-03-20  9:35                                           ` Paolo Bonzini
  0 siblings, 1 reply; 43+ messages in thread
From: Uros Bizjak @ 2012-03-19 19:14 UTC (permalink / raw)
  To: H.J. Lu; +Cc: gcc-patches, Richard Henderson

[-- Attachment #1: Type: text/plain, Size: 1924 bytes --]

On Mon, Mar 19, 2012 at 6:50 PM, H.J. Lu <hjl.tools@gmail.com> wrote:

>>> Please test my proposed patch. If it works OK, I will commit it to SVN.
>>
>> The only acceptable way is to generate ZERO_EXTEND in place, so:
>>
>> --cut here--
>> static rtx
>> get_thread_pointer (enum machine_mode tp_mode, bool to_reg)
>> {
>>  rtx tp = gen_rtx_UNSPEC (ptr_mode, gen_rtvec (1, const0_rtx), UNSPEC_TP);
>>
>>  if (GET_MODE (tp) != tp_mode)
>>    {
>>      gcc_assert (GET_MODE (tp) == SImode);
>>      gcc_assert (tp_mode == DImode);
>>
>>      tp = gen_rtx_ZERO_EXTEND (tp_mode, tp);
>>    }
>>
>>  if (to_reg)
>>    tp = copy_to_mode_reg (tp_mode, tp);
>>
>>  return tp;
>> }
>> --cut here--
>
> This version works fine.

The attached patch was committed to mainline SVN with the following ChangeLog:

2012-03-19  Uros Bizjak  <ubizjak@gmail.com>

	* config/i386/i386.c (get_thread_pointer): Add tp_mode argument.
	Generate ZERO_EXTEND in place if GET_MODE (tp) != tp_mode.
	(legitimize_tls_address) <TLS_MODEL_INITIAL_EXEC>: Always generate
	DImode UNSPEC_GOTNTPOFF references on TARGET_64BIT.
	(ix86_decompose_address): Allow zero extended UNSPEC_TP references.

	Revert:
	2012-03-13  Uros Bizjak  <ubizjak@gmail.com>

	* config/i386/i386.h (TARGET_TLS_INDIRECT_SEG_REFS): New.
	* config/i386/i386.c (ix86_decompose_address): Use
	TARGET_TLS_INDIRECT_SEG_REFS to prevent %fs:(%reg) addresses.
	(legitimize_tls_address): Use TARGET_TLS_INDIRECT_SEG_REFS to load
	thread pointer to a register.

	Revert:
	2012-03-10  H.J. Lu  <hongjiu.lu@intel.com>

	* config/i386/i386.c (ix86_decompose_address): Disallow fs:(reg)
	if Pmode != word_mode.
	(legitimize_tls_address): Call gen_tls_initial_exec_x32 if
	Pmode == SImode for TARGET_X32.

	* config/i386/i386.md (UNSPEC_TLS_IE_X32): New.
	(tls_initial_exec_x32): Likewise.

Tested on x86_64-pc-linux-gnu {,-m32}.

Thanks,
Uros.

[-- Attachment #2: p.diff.txt --]
[-- Type: text/plain, Size: 6477 bytes --]

Index: i386.md
===================================================================
--- i386.md	(revision 185524)
+++ i386.md	(working copy)
@@ -96,7 +96,6 @@
   UNSPEC_TLS_LD_BASE
   UNSPEC_TLSDESC
   UNSPEC_TLS_IE_SUN
-  UNSPEC_TLS_IE_X32
 
   ;; Other random patterns
   UNSPEC_SCAS
@@ -12836,28 +12835,6 @@
 }
   [(set_attr "type" "multi")])
 
-;; When Pmode == SImode, there may be no REX prefix for ADD.  Avoid
-;; any instructions between MOV and ADD, which may interfere linker
-;; IE->LE optimization, since the last byte of the previous instruction
-;; before ADD may look like a REX prefix.  This also avoids
-;;	movl x@gottpoff(%rip), %reg32
-;;	movl $fs:(%reg32), %reg32
-;; Since address override works only on the (reg32) part in fs:(reg32),
-;; we can't use it as memory operand.
-(define_insn "tls_initial_exec_x32"
-  [(set (match_operand:SI 0 "register_operand" "=r")
-	(unspec:SI
-	 [(match_operand 1 "tls_symbolic_operand")]
-	 UNSPEC_TLS_IE_X32))
-   (clobber (reg:CC FLAGS_REG))]
-  "TARGET_X32"
-{
-  output_asm_insn
-    ("mov{l}\t{%%fs:0, %0|%0, DWORD PTR fs:0}", operands);
-  return "add{l}\t{%a1@gottpoff(%%rip), %0|%0, %a1@gottpoff[rip]}";
-}
-  [(set_attr "type" "multi")])
-
 ;; GNU2 TLS patterns can be split.
 
 (define_expand "tls_dynamic_gnu2_32"
Index: i386.c
===================================================================
--- i386.c	(revision 185524)
+++ i386.c	(working copy)
@@ -11514,6 +11514,10 @@ ix86_decompose_address (rtx addr, struct ix86_addr
 	      scale = 1 << scale;
 	      break;
 
+	    case ZERO_EXTEND:
+	      op = XEXP (op, 0);
+	      /* FALLTHRU */
+
 	    case UNSPEC:
 	      if (XINT (op, 1) == UNSPEC_TP
 	          && TARGET_TLS_DIRECT_SEG_REFS
@@ -12483,15 +12487,20 @@ legitimize_pic_address (rtx orig, rtx reg)
 /* Load the thread pointer.  If TO_REG is true, force it into a register.  */
 
 static rtx
-get_thread_pointer (bool to_reg)
+get_thread_pointer (enum machine_mode tp_mode, bool to_reg)
 {
   rtx tp = gen_rtx_UNSPEC (ptr_mode, gen_rtvec (1, const0_rtx), UNSPEC_TP);
 
-  if (GET_MODE (tp) != Pmode)
-    tp = convert_to_mode (Pmode, tp, 1);
+  if (GET_MODE (tp) != tp_mode)
+    {
+      gcc_assert (GET_MODE (tp) == SImode);
+      gcc_assert (tp_mode == DImode);
 
+      tp = gen_rtx_ZERO_EXTEND (tp_mode, tp);
+    }
+
   if (to_reg)
-    tp = copy_addr_to_reg (tp);
+    tp = copy_to_mode_reg (tp_mode, tp);
 
   return tp;
 }
@@ -12543,6 +12552,7 @@ legitimize_tls_address (rtx x, enum tls_model mode
 {
   rtx dest, base, off;
   rtx pic = NULL_RTX, tp = NULL_RTX;
+  enum machine_mode tp_mode = Pmode;
   int type;
 
   switch (model)
@@ -12568,7 +12578,7 @@ legitimize_tls_address (rtx x, enum tls_model mode
 	  else
 	    emit_insn (gen_tls_dynamic_gnu2_32 (dest, x, pic));
 
-	  tp = get_thread_pointer (true);
+	  tp = get_thread_pointer (Pmode, true);
 	  dest = force_reg (Pmode, gen_rtx_PLUS (Pmode, tp, dest));
 
 	  set_unique_reg_note (get_last_insn (), REG_EQUAL, x);
@@ -12618,7 +12628,7 @@ legitimize_tls_address (rtx x, enum tls_model mode
 	  else
 	    emit_insn (gen_tls_dynamic_gnu2_32 (base, tmp, pic));
 
-	  tp = get_thread_pointer (true);
+	  tp = get_thread_pointer (Pmode, true);
 	  set_unique_reg_note (get_last_insn (), REG_EQUAL,
 			       gen_rtx_MINUS (Pmode, tmp, tp));
 	}
@@ -12674,18 +12684,10 @@ legitimize_tls_address (rtx x, enum tls_model mode
 	      emit_insn (gen_tls_initial_exec_64_sun (dest, x));
 	      return dest;
 	    }
-	  else if (Pmode == SImode)
-	    {
-	      /* Always generate
-			movl %fs:0, %reg32
-			addl xgottpoff(%rip), %reg32
-		 to support linker IE->LE optimization and avoid
-		 fs:(%reg32) as memory operand.  */
-	      dest = gen_reg_rtx (Pmode);
-	      emit_insn (gen_tls_initial_exec_x32 (dest, x));
-	      return dest;
-	    }
 
+	  /* Generate DImode references to avoid %fs:(%reg32)
+	     problems and linker IE->LE relaxation bug.  */
+	  tp_mode = DImode;
 	  pic = NULL;
 	  type = UNSPEC_GOTNTPOFF;
 	}
@@ -12708,24 +12710,23 @@ legitimize_tls_address (rtx x, enum tls_model mode
 	  type = UNSPEC_INDNTPOFF;
 	}
 
-      off = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, x), type);
-      off = gen_rtx_CONST (Pmode, off);
+      off = gen_rtx_UNSPEC (tp_mode, gen_rtvec (1, x), type);
+      off = gen_rtx_CONST (tp_mode, off);
       if (pic)
-	off = gen_rtx_PLUS (Pmode, pic, off);
-      off = gen_const_mem (Pmode, off);
+	off = gen_rtx_PLUS (tp_mode, pic, off);
+      off = gen_const_mem (tp_mode, off);
       set_mem_alias_set (off, ix86_GOT_alias_set ());
 
       if (TARGET_64BIT || TARGET_ANY_GNU_TLS)
 	{
-          base = get_thread_pointer (for_mov
-				     || !(TARGET_TLS_DIRECT_SEG_REFS
-					  && TARGET_TLS_INDIRECT_SEG_REFS));
-	  off = force_reg (Pmode, off);
-	  return gen_rtx_PLUS (Pmode, base, off);
+	  base = get_thread_pointer (tp_mode,
+				     for_mov || !TARGET_TLS_DIRECT_SEG_REFS);
+	  off = force_reg (tp_mode, off);
+	  return gen_rtx_PLUS (tp_mode, base, off);
 	}
       else
 	{
-	  base = get_thread_pointer (true);
+	  base = get_thread_pointer (Pmode, true);
 	  dest = gen_reg_rtx (Pmode);
 	  emit_insn (ix86_gen_sub3 (dest, base, off));
 	}
@@ -12739,14 +12740,13 @@ legitimize_tls_address (rtx x, enum tls_model mode
 
       if (TARGET_64BIT || TARGET_ANY_GNU_TLS)
 	{
-	  base = get_thread_pointer (for_mov
-				     || !(TARGET_TLS_DIRECT_SEG_REFS
-					  && TARGET_TLS_INDIRECT_SEG_REFS));
+	  base = get_thread_pointer (Pmode,
+				     for_mov || !TARGET_TLS_DIRECT_SEG_REFS);
 	  return gen_rtx_PLUS (Pmode, base, off);
 	}
       else
 	{
-	  base = get_thread_pointer (true);
+	  base = get_thread_pointer (Pmode, true);
 	  dest = gen_reg_rtx (Pmode);
 	  emit_insn (ix86_gen_sub3 (dest, base, off));
 	}
@@ -13274,8 +13274,7 @@ ix86_delegitimize_tls_address (rtx orig_x)
   rtx x = orig_x, unspec;
   struct ix86_address addr;
 
-  if (!(TARGET_TLS_DIRECT_SEG_REFS
-	&& TARGET_TLS_INDIRECT_SEG_REFS))
+  if (!TARGET_TLS_DIRECT_SEG_REFS)
     return orig_x;
   if (MEM_P (x))
     x = XEXP (x, 0);
Index: i386.h
===================================================================
--- i386.h	(revision 185524)
+++ i386.h	(working copy)
@@ -467,9 +467,6 @@ extern int x86_prefetch_sse;
 #define TARGET_TLS_DIRECT_SEG_REFS_DEFAULT 0
 #endif
 
-/* Address override works only on the (%reg) part of %fs:(%reg).  */
-#define TARGET_TLS_INDIRECT_SEG_REFS (Pmode == word_mode)
-
 /* Fence to use after loop using storent.  */
 
 extern tree x86_mfence;


* Re: PATCH: Properly generate X32 IE sequence
  2012-03-18 20:55                 ` Uros Bizjak
  2012-03-19 15:51                   ` H.J. Lu
@ 2012-03-20  8:52                   ` Eric Botcazou
  2012-03-20  8:59                     ` Jakub Jelinek
  1 sibling, 1 reply; 43+ messages in thread
From: Eric Botcazou @ 2012-03-20  8:52 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: gcc-patches, H.J. Lu, Richard Henderson

> The patch is bootstrapping now on x86_64-pc-linux-gnu.

It very likely breaks bootstrap with RTL checking enabled:

/sil.a/gnatmail/gnatmail-x/build-sil/x86-linux/gnat/obj/./gcc/xgcc -B/sil.a/gnatmail/gnatmail-x/build-sil/x86-linux/gnat/obj/./gcc/ -B/usr/gnat/i686-pc-linux-gnu/bin/ -B/usr/gnat/i686-pc-linux-gnu/lib/ -isystem /usr/gnat/i686-pc-linux-gnu/include -isystem /usr/gnat/i686-pc-linux-gnu/sys-include    -g -O2 -O2  -g -O2 -DIN_GCC   -W -Wall -Wwrite-strings -Wcast-qual -Wstrict-prototypes -Wmissing-prototypes -Wold-style-definition  -isystem ./include   -fpic -g -DIN_LIBGCC2 -fbuilding-libgcc -fno-stack-protector   -fpic -I. -I. -I../.././gcc -I../../../src/libgcc -I../../../src/libgcc/. -I../../../src/libgcc/../gcc -I../../../src/libgcc/../include -I../../../src/libgcc/config/libbid -DENABLE_DECIMAL_BID_FORMAT -DHAVE_CC_TLS  -DUSE_TLS -o 
_popcountsi2.o -MT _popcountsi2.o -MD -MP -MF 
_popcountsi2.dep -DL_popcountsi2 -c ../../../src/libgcc/libgcc2.c -fvisibility=hidden -DHIDE_EXPORTS
../../../src/libgcc/libgcc2.c: In function '__popcountsi2':
../../../src/libgcc/libgcc2.c:835:1: internal compiler error: RTL check: 
expected elt 1 type 'i' or 'n', have '0' (rtx mem) in ix86_decompose_address, 
at config/i386/i386.c:11522
Please submit a full bug report,
with preprocessed source if appropriate.
See <URL:mailto:report@adacore.com> for instructions.
make[3]: *** [_popcountsi2.o] Error 1

-- 
Eric Botcazou


* Re: PATCH: Properly generate X32 IE sequence
  2012-03-20  8:52                   ` Eric Botcazou
@ 2012-03-20  8:59                     ` Jakub Jelinek
  2012-03-20 11:20                       ` Jakub Jelinek
  0 siblings, 1 reply; 43+ messages in thread
From: Jakub Jelinek @ 2012-03-20  8:59 UTC (permalink / raw)
  To: Eric Botcazou; +Cc: Uros Bizjak, gcc-patches, H.J. Lu, Richard Henderson

On Tue, Mar 20, 2012 at 09:51:07AM +0100, Eric Botcazou wrote:
> > The patch is bootstrapping now on x86_64-pc-linux-gnu.
> 
> It very likely breaks bootstrap with RTL checking enabled:
> 
> /sil.a/gnatmail/gnatmail-x/build-sil/x86-linux/gnat/obj/./gcc/xgcc -B/sil.a/gnatmail/gnatmail-x/build-sil/x86-linux/gnat/obj/./gcc/ -B/usr/gnat/i686-pc-linux-gnu/bin/ -B/usr/gnat/i686-pc-linux-gnu/lib/ -isystem /usr/gnat/i686-pc-linux-gnu/include -isystem /usr/gnat/i686-pc-linux-gnu/sys-include    -g -O2 -O2  -g -O2 -DIN_GCC   -W -Wall -Wwrite-strings -Wcast-qual -Wstrict-prototypes -Wmissing-prototypes -Wold-style-definition  -isystem ./include   -fpic -g -DIN_LIBGCC2 -fbuilding-libgcc -fno-stack-protector   -fpic -I. -I. -I../.././gcc -I../../../src/libgcc -I../../../src/libgcc/. -I../../../src/libgcc/../gcc -I../../../src/libgcc/../include -I../../../src/libgcc/config/libbid -DENABLE_DECIMAL_BID_FORMAT -DHAVE_CC_TLS  -DUSE_TLS -o 
> _popcountsi2.o -MT _popcountsi2.o -MD -MP -MF 
> _popcountsi2.dep -DL_popcountsi2 -c ../../../src/libgcc/libgcc2.c -fvisibility=hidden -DHIDE_EXPORTS
> ../../../src/libgcc/libgcc2.c: In function '__popcountsi2':
> ../../../src/libgcc/libgcc2.c:835:1: internal compiler error: RTL check: 
> expected elt 1 type 'i' or 'n', have '0' (rtx mem) in ix86_decompose_address, 
> at config/i386/i386.c:11522
> Please submit a full bug report,
> with preprocessed source if appropriate.
> See <URL:mailto:report@adacore.com> for instructions.
> make[3]: *** [_popcountsi2.o] Error 1

Yeah, my bootstrap just failed the same.  Will test:

2012-03-20  Jakub Jelinek  <jakub@redhat.com>

	* config/i386/i386.c (ix86_decompose_address) <case ZERO_EXTEND>:
	If operand isn't UNSPEC, return 0.

--- gcc/config/i386/i386.c.jj	2012-03-20 09:35:06.000000000 +0100
+++ gcc/config/i386/i386.c	2012-03-20 09:56:35.038835835 +0100
@@ -11516,6 +11516,8 @@ ix86_decompose_address (rtx addr, struct
 
 	    case ZERO_EXTEND:
 	      op = XEXP (op, 0);
+	      if (GET_CODE (op) != UNSPEC)
+		return 0;
 	      /* FALLTHRU */
 
 	    case UNSPEC:
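
The ICE above says that an integer element was requested from a MEM,
i.e. the fall-through reached the UNSPEC case with an operand that was
not an UNSPEC.  The following is a simplified, stand-alone C model of
that guard.  It is not GCC code: struct node, the enum values and
is_tp_reference are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for a few RTL codes.  */
enum code { MEM, UNSPEC, ZERO_EXTEND };

/* Hypothetical stand-in for UNSPEC_TP.  */
enum { ID_TP = 1 };

struct node
{
  enum code code;
  struct node *operand;	/* Wrapped node for ZERO_EXTEND, else NULL.  */
  int unspec_id;	/* Meaningful only when code == UNSPEC.  */
};

/* Return 1 if OP is a thread pointer reference, optionally wrapped in
   a ZERO_EXTEND, and 0 otherwise.  */
int
is_tp_reference (const struct node *op)
{
  if (op->code == ZERO_EXTEND)
    {
      op = op->operand;
      /* The guard: never read unspec_id from a node that is not an
	 UNSPEC.  */
      if (op->code != UNSPEC)
	return 0;
    }

  if (op->code != UNSPEC)
    return 0;

  return op->unspec_id == ID_TP;
}
```

With the guard in place, a ZERO_EXTEND of a MEM is simply rejected
instead of being inspected as if it were an UNSPEC.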

	Jakub


* Re: PATCH: Properly generate X32 IE sequence
  2012-03-19 19:14                                         ` Uros Bizjak
@ 2012-03-20  9:35                                           ` Paolo Bonzini
  0 siblings, 0 replies; 43+ messages in thread
From: Paolo Bonzini @ 2012-03-20  9:35 UTC (permalink / raw)
  To: gcc-patches

Il 19/03/2012 20:13, Uros Bizjak ha scritto:
> 2012-03-19  Uros Bizjak  <ubizjak@gmail.com>
> 
> 	* config/i386/i386.c (get_thread_pointer): Add tp_mode argument.
> 	Generate ZERO_EXTEND in place if GET_MODE (tp) != tp_mode.
> 	(legitimize_tls_address) <TLS_MODEL_INITIAL_EXEC>: Always generate
> 	DImode UNSPEC_GOTNTPOFF references on TARGET_64BIT.
> 	(ix86_decompose_address): Allow zero extended UNSPEC_TP references.
> 
> 	Revert:
> 	2012-03-13  Uros Bizjak  <ubizjak@gmail.com>
> 
> 	* config/i386/i386.h (TARGET_TLS_INDIRECT_SEG_REFS): New.
> 	* config/i386/i386.c (ix86_decompose_address): Use
> 	TARGET_TLS_INDIRECT_SEG_REFS to prevent %fs:(%reg) addresses.
> 	(legitimize_tls_address): Use TARGET_TLS_INDIRECT_SEG_REFS to load
> 	thread pointer to a register.
> 
> 	Revert:
> 	2012-03-10  H.J. Lu  <hongjiu.lu@intel.com>
> 
> 	* config/i386/i386.c (ix86_decompose_address): Disallow fs:(reg)
> 	if Pmode != word_mode.
> 	(legitimize_tls_address): Call gen_tls_initial_exec_x32 if
> 	Pmode == SImode for TARGET_X32.
> 
> 	* config/i386/i386.md (UNSPEC_TLS_IE_X32): New.
> 	(tls_initial_exec_x32): Likewise.
> 
> Tested on x86_64-pc-linux-gnu {,-m32}.

No testcases?

Paolo


* Re: PATCH: Properly generate X32 IE sequence
  2012-03-20  8:59                     ` Jakub Jelinek
@ 2012-03-20 11:20                       ` Jakub Jelinek
  2012-03-20 15:52                         ` H.J. Lu
  0 siblings, 1 reply; 43+ messages in thread
From: Jakub Jelinek @ 2012-03-20 11:20 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: Eric Botcazou, gcc-patches, H.J. Lu, Richard Henderson

On Tue, Mar 20, 2012 at 09:58:29AM +0100, Jakub Jelinek wrote:
> Yeah, my bootstrap just failed the same.  Will test:
> 
> 2012-03-20  Jakub Jelinek  <jakub@redhat.com>
> 
> 	* config/i386/i386.c (ix86_decompose_address) <case ZERO_EXTEND>:
> 	If operand isn't UNSPEC, return 0.

Committed as obvious now that bootstrap/regtest finished on x86_64-linux
and i686-linux.

> --- gcc/config/i386/i386.c.jj	2012-03-20 09:35:06.000000000 +0100
> +++ gcc/config/i386/i386.c	2012-03-20 09:56:35.038835835 +0100
> @@ -11516,6 +11516,8 @@ ix86_decompose_address (rtx addr, struct
>  
>  	    case ZERO_EXTEND:
>  	      op = XEXP (op, 0);
> +	      if (GET_CODE (op) != UNSPEC)
> +		return 0;
>  	      /* FALLTHRU */
>  
>  	    case UNSPEC:

	Jakub


* Re: PATCH: Properly generate X32 IE sequence
  2012-03-20 11:20                       ` Jakub Jelinek
@ 2012-03-20 15:52                         ` H.J. Lu
  2012-03-20 17:55                           ` Uros Bizjak
  0 siblings, 1 reply; 43+ messages in thread
From: H.J. Lu @ 2012-03-20 15:52 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Uros Bizjak, Eric Botcazou, gcc-patches, Richard Henderson

On Tue, Mar 20, 2012 at 4:19 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Tue, Mar 20, 2012 at 09:58:29AM +0100, Jakub Jelinek wrote:
>> Yeah, my bootstrap just failed the same.  Will test:
>>
>> 2012-03-20  Jakub Jelinek  <jakub@redhat.com>
>>
>>       * config/i386/i386.c (ix86_decompose_address) <case ZERO_EXTEND>:
>>       If operand isn't UNSPEC, return 0.
>
> Committed as obvious now that bootstrap/regtest finished on x86_64-linux
> and i686-linux.
>
>> --- gcc/config/i386/i386.c.jj 2012-03-20 09:35:06.000000000 +0100
>> +++ gcc/config/i386/i386.c    2012-03-20 09:56:35.038835835 +0100
>> @@ -11516,6 +11516,8 @@ ix86_decompose_address (rtx addr, struct
>>
>>           case ZERO_EXTEND:
>>             op = XEXP (op, 0);
>> +           if (GET_CODE (op) != UNSPEC)
>> +             return 0;
>>             /* FALLTHRU */
>>
>>           case UNSPEC:
>

Uros,

I think using the OS-provided instruction to load TP into a DImode register
could simplify the code.


-- 
H.J.


* Re: PATCH: Properly generate X32 IE sequence
  2012-03-20 15:52                         ` H.J. Lu
@ 2012-03-20 17:55                           ` Uros Bizjak
  2012-03-20 18:27                             ` H.J. Lu
  0 siblings, 1 reply; 43+ messages in thread
From: Uros Bizjak @ 2012-03-20 17:55 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Jakub Jelinek, Eric Botcazou, gcc-patches, Richard Henderson

On Tue, Mar 20, 2012 at 4:52 PM, H.J. Lu <hjl.tools@gmail.com> wrote:

>>> Yeah, my bootstrap just failed the same.  Will test:
>>>
>>> 2012-03-20  Jakub Jelinek  <jakub@redhat.com>
>>>
>>>       * config/i386/i386.c (ix86_decompose_address) <case ZERO_EXTEND>:
>>>       If operand isn't UNSPEC, return 0.
>>
>> Committed as obvious now that bootstrap/regtest finished on x86_64-linux
>> and i686-linux.
>>
>>> --- gcc/config/i386/i386.c.jj 2012-03-20 09:35:06.000000000 +0100
>>> +++ gcc/config/i386/i386.c    2012-03-20 09:56:35.038835835 +0100
>>> @@ -11516,6 +11516,8 @@ ix86_decompose_address (rtx addr, struct
>>>
>>>           case ZERO_EXTEND:
>>>             op = XEXP (op, 0);
>>> +           if (GET_CODE (op) != UNSPEC)
>>> +             return 0;
>>>             /* FALLTHRU */
>>>
>>>           case UNSPEC:
>>
>
> Uros,
>
> I think using the OS-provided instruction to load TP into a DImode register
> could simplify the code.

Which OS provided instruction?

Please see how TP is defined in get_thread_pointer, it is in ptr_mode:

  rtx tp = gen_rtx_UNSPEC (ptr_mode, gen_rtvec (1, const0_rtx), UNSPEC_TP);

This says that TP is in SImode on X32.

Uros.


* Re: PATCH: Properly generate X32 IE sequence
  2012-03-20 17:55                           ` Uros Bizjak
@ 2012-03-20 18:27                             ` H.J. Lu
  2012-03-20 18:44                               ` Uros Bizjak
  0 siblings, 1 reply; 43+ messages in thread
From: H.J. Lu @ 2012-03-20 18:27 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: Jakub Jelinek, Eric Botcazou, gcc-patches, Richard Henderson

On Tue, Mar 20, 2012 at 10:54 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
> On Tue, Mar 20, 2012 at 4:52 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>
>>>> Yeah, my bootstrap just failed the same.  Will test:
>>>>
>>>> 2012-03-20  Jakub Jelinek  <jakub@redhat.com>
>>>>
>>>>       * config/i386/i386.c (ix86_decompose_address) <case ZERO_EXTEND>:
>>>>       If operand isn't UNSPEC, return 0.
>>>
>>> Committed as obvious now that bootstrap/regtest finished on x86_64-linux
>>> and i686-linux.
>>>
>>>> --- gcc/config/i386/i386.c.jj 2012-03-20 09:35:06.000000000 +0100
>>>> +++ gcc/config/i386/i386.c    2012-03-20 09:56:35.038835835 +0100
>>>> @@ -11516,6 +11516,8 @@ ix86_decompose_address (rtx addr, struct
>>>>
>>>>           case ZERO_EXTEND:
>>>>             op = XEXP (op, 0);
>>>> +           if (GET_CODE (op) != UNSPEC)
>>>> +             return 0;
>>>>             /* FALLTHRU */
>>>>
>>>>           case UNSPEC:
>>>
>>
>> Uros,
>>
>> I think using the OS-provided instruction to load TP into a DImode register
>> could simplify the code.
>
> Which OS provided instruction?
>
> Please see how TP is defined in get_thread_pointer, it is in ptr_mode:
>
>  rtx tp = gen_rtx_UNSPEC (ptr_mode, gen_rtvec (1, const0_rtx), UNSPEC_TP);
>
> This says that TP is in SImode on X32.
>
> Uros.

TP is defined as (unspec:DI [(const_int 0)] UNSPEC_TP)
and provided by the OS.  It is a CONST_INT, but its value is opaque
to GCC.  MODE here has no impact on the value provided by the OS.
The x32 OS provides instructions to load TP into SImode and
DImode registers.


-- 
H.J.


* Re: PATCH: Properly generate X32 IE sequence
  2012-03-20 18:27                             ` H.J. Lu
@ 2012-03-20 18:44                               ` Uros Bizjak
  2012-03-20 19:26                                 ` H.J. Lu
  0 siblings, 1 reply; 43+ messages in thread
From: Uros Bizjak @ 2012-03-20 18:44 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Jakub Jelinek, Eric Botcazou, gcc-patches, Richard Henderson

On Tue, Mar 20, 2012 at 7:27 PM, H.J. Lu <hjl.tools@gmail.com> wrote:

>>> I think using the OS-provided instruction to load TP into a DImode register
>>> could simplify the code.
>>
>> Which OS provided instruction?
>>
>> Please see how TP is defined in get_thread_pointer, it is in ptr_mode:
>>
>>  rtx tp = gen_rtx_UNSPEC (ptr_mode, gen_rtvec (1, const0_rtx), UNSPEC_TP);
>>
>> This says that TP is in SImode on X32.

> TP is defined as (unspec:DI [(const_int 0)] UNSPEC_TP)
> and provided by the OS.  It is a CONST_INT, but its value is opaque
> to GCC.  MODE here has no impact on the value provided by the OS.
> The x32 OS provides instructions to load TP into SImode and
> DImode registers.

You must be looking at some other GCC sources than I am.

(define_insn "*load_tp_x32"
  [(set (match_operand:SI 0 "register_operand" "=r")
	(unspec:SI [(const_int 0)] UNSPEC_TP))]
  "TARGET_X32"
  "mov{l}\t{%%fs:0, %0|%0, DWORD PTR fs:0}"
  [(set_attr "type" "imov")
   (set_attr "modrm" "0")
   (set_attr "length" "7")
   (set_attr "memory" "load")
   (set_attr "imm_disp" "false")])

(define_insn "*load_tp_x32_zext"
  [(set (match_operand:DI 0 "register_operand" "=r")
	(zero_extend:DI (unspec:SI [(const_int 0)] UNSPEC_TP)))]
  "TARGET_X32"
  "mov{l}\t{%%fs:0, %k0|%k0, DWORD PTR fs:0}"
  [(set_attr "type" "imov")
   (set_attr "modrm" "0")
   (set_attr "length" "7")
   (set_attr "memory" "load")
   (set_attr "imm_disp" "false")])

Uros.


* Re: PATCH: Properly generate X32 IE sequence
  2012-03-20 18:44                               ` Uros Bizjak
@ 2012-03-20 19:26                                 ` H.J. Lu
  0 siblings, 0 replies; 43+ messages in thread
From: H.J. Lu @ 2012-03-20 19:26 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: Jakub Jelinek, Eric Botcazou, gcc-patches, Richard Henderson

On Tue, Mar 20, 2012 at 11:43 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
> On Tue, Mar 20, 2012 at 7:27 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>
>>>> I think using the OS-provided instruction to load TP into a DImode register
>>>> could simplify the code.
>>>
>>> Which OS provided instruction?
>>>
>>> Please see how TP is defined in get_thread_pointer, it is in ptr_mode:
>>>
>>>  rtx tp = gen_rtx_UNSPEC (ptr_mode, gen_rtvec (1, const0_rtx), UNSPEC_TP);
>>>
>>> This says that TP is in SImode on X32.
>
>> TP is defined as (unspec:DI [(const_int 0)] UNSPEC_TP)
>> and provided by the OS.  It is a CONST_INT, but its value is opaque
>> to GCC.  MODE here has no impact on the value provided by the OS.
>> The x32 OS provides instructions to load TP into SImode and
>> DImode registers.
>
> You must be looking at some other GCC sources than I am.
>
> (define_insn "*load_tp_x32"
>  [(set (match_operand:SI 0 "register_operand" "=r")
>        (unspec:SI [(const_int 0)] UNSPEC_TP))]
>  "TARGET_X32"
>  "mov{l}\t{%%fs:0, %0|%0, DWORD PTR fs:0}"
>  [(set_attr "type" "imov")
>   (set_attr "modrm" "0")
>   (set_attr "length" "7")
>   (set_attr "memory" "load")
>   (set_attr "imm_disp" "false")])
>
> (define_insn "*load_tp_x32_zext"
>  [(set (match_operand:DI 0 "register_operand" "=r")
>        (zero_extend:DI (unspec:SI [(const_int 0)] UNSPEC_TP)))]
>  "TARGET_X32"
>  "mov{l}\t{%%fs:0, %k0|%k0, DWORD PTR fs:0}"
>  [(set_attr "type" "imov")
>   (set_attr "modrm" "0")
>   (set_attr "length" "7")
>   (set_attr "memory" "load")
>   (set_attr "imm_disp" "false")])
>

Thread pointer (TP) points to thread control block (TCB).  X32 TCB is

typedef struct
{
  void *tcb;		/* Pointer to the TCB.  Not necessarily the
			   thread descriptor used by libpthread.  */
  ...
}

It is a 32bit address set up by OS.  That is where 0 in "%fs:0" comes
from since it is the first field of the struct %fs points to.  X32 OS provides

mov %fs:0, %eax

to load the address of TCB into EAX and

mov %fs:0, %eax

to load the address of TCB into RAX since OS guarantees that the upper
32bits of the address of TCB are all 0s. We added "*load_tp_x32_zext"
since we zero-extend SI TP to DI TP.   Or we can use

mov %fs:0, %eax

to directly load the value of the tcb field into RAX and remove
"*load_tp_x32_zext".  It will simplify the code.
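
As a rough illustration of why the displacement is 0, here is a
portable C model of a control block whose first field points back at
the block itself.  It is a sketch under assumptions: struct tcb_model
and both function names are hypothetical, the real TCB has many more
fields, and nothing here touches %fs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified model of a thread control block.  */
struct tcb_model
{
  void *tcb;	/* Self pointer; first field, so it sits at offset 0.  */
  int other;	/* Placeholder for the remaining fields.  */
};

/* Set up the self pointer, as the OS/runtime would for a new thread.  */
void
init_tcb_model (struct tcb_model *t)
{
  t->tcb = t;
}

/* Model of "mov %fs:0, %reg": read the word at offset 0 from the
   segment base, which yields the address of the block itself.  */
void *
load_tp_model (const struct tcb_model *fs_base)
{
  return fs_base->tcb;
}
```

Reading the first field through the base therefore gives back the
address of the block, which is the value used as the thread pointer.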


-- 
H.J.



Thread overview: 43+ messages
2012-03-09 22:26 PATCH: Properly generate X32 IE sequence H.J. Lu
2012-03-10 13:10 ` Uros Bizjak
2012-03-10 18:50   ` H.J. Lu
2012-03-11 17:12     ` H.J. Lu
2012-03-11 17:55       ` Uros Bizjak
2012-03-11 18:16         ` H.J. Lu
2012-03-11 18:21           ` Uros Bizjak
2012-03-11 21:25             ` H.J. Lu
2012-03-12 19:39               ` Uros Bizjak
2012-03-12 22:35                 ` H.J. Lu
2012-03-13  1:21                   ` H.J. Lu
2012-03-13  7:11                     ` Uros Bizjak
2012-03-13 10:37                       ` Uros Bizjak
2012-03-13 15:47                         ` H.J. Lu
2012-03-17 17:53                         ` H.J. Lu
2012-03-17 18:10       ` Uros Bizjak
2012-03-17 18:19         ` H.J. Lu
2012-03-17 18:21           ` Uros Bizjak
2012-03-17 21:50             ` H.J. Lu
2012-03-18 16:02               ` Uros Bizjak
2012-03-18 20:55                 ` Uros Bizjak
2012-03-19 15:51                   ` H.J. Lu
2012-03-19 15:54                     ` H.J. Lu
2012-03-19 16:20                       ` H.J. Lu
2012-03-19 16:35                         ` H.J. Lu
2012-03-19 16:38                           ` Uros Bizjak
2012-03-19 16:47                             ` H.J. Lu
2012-03-19 16:49                               ` Uros Bizjak
2012-03-19 16:56                                 ` H.J. Lu
2012-03-19 17:02                                   ` Uros Bizjak
2012-03-19 17:30                                     ` Uros Bizjak
2012-03-19 17:50                                       ` H.J. Lu
2012-03-19 19:14                                         ` Uros Bizjak
2012-03-20  9:35                                           ` Paolo Bonzini
2012-03-19 16:47                         ` Uros Bizjak
2012-03-20  8:52                   ` Eric Botcazou
2012-03-20  8:59                     ` Jakub Jelinek
2012-03-20 11:20                       ` Jakub Jelinek
2012-03-20 15:52                         ` H.J. Lu
2012-03-20 17:55                           ` Uros Bizjak
2012-03-20 18:27                             ` H.J. Lu
2012-03-20 18:44                               ` Uros Bizjak
2012-03-20 19:26                                 ` H.J. Lu
