public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] Fix ix86_split_long_move collision handling with TLS (PR target/66470)
@ 2015-06-09 12:05 Jakub Jelinek
  2015-06-09 12:32 ` Uros Bizjak
  0 siblings, 1 reply; 16+ messages in thread
From: Jakub Jelinek @ 2015-06-09 12:05 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: gcc-patches

Hi!

As mentioned in the PR, when trying to split:
(insn 7 15 13 2 (set (reg:DI 0 ax [92])
        (mem:DI (plus:SI (plus:SI (mult:SI (reg/v:SI 1 dx [orig:89 b ] [89])
                        (const_int 8 [0x8]))
                    (unspec:SI [
                            (const_int 0 [0])
                        ] UNSPEC_TP))
                (reg:SI 0 ax [91])) [1 a S8 A64])) rh1212265.i:2 85 {*movdi_internal}
     (nil))
which has collisions == 2 (both ax and dx used on lhs and both
ax and dx used in the memory address), we generate invalid insn
- lea with %gs: or %fs: in it.  This patch fixes it by using
normal lea instead (so remove the unspec UNSPEC_TP from the address
for lea) and duplicating the unspec UNSPEC_TP to all the memory loads.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for
trunk/5/4.9/4.8?

2015-06-09  Jakub Jelinek  <jakub@redhat.com>

	PR target/66470
	* config/i386/i386.c (ix86_split_long_move): For collisions
	involving direct tls segment refs, move the UNSPEC_TP out of
	the address for lea, to each of the memory loads.

	* gcc.dg/tls/pr66470.c: New test.

--- gcc/config/i386/i386.c.jj	2015-06-08 15:41:19.000000000 +0200
+++ gcc/config/i386/i386.c	2015-06-09 11:50:29.960859723 +0200
@@ -22866,7 +22866,7 @@ ix86_split_long_move (rtx operands[])
 	 Do an lea to the last part and use only one colliding move.  */
       else if (collisions > 1)
 	{
-	  rtx base;
+	  rtx base, addr, tls_base = NULL_RTX;
 
 	  collisions = 1;
 
@@ -22877,10 +22877,45 @@ ix86_split_long_move (rtx operands[])
 	  if (GET_MODE (base) != Pmode)
 	    base = gen_rtx_REG (Pmode, REGNO (base));
 
-	  emit_insn (gen_rtx_SET (base, XEXP (part[1][0], 0)));
+	  addr = XEXP (part[1][0], 0);
+	  if (TARGET_TLS_DIRECT_SEG_REFS)
+	    {
+	      struct ix86_address parts;
+	      int ok = ix86_decompose_address (addr, &parts);
+	      gcc_assert (ok);
+	      if (parts.seg == DEFAULT_TLS_SEG_REG)
+		{
+		  /* It is not valid to use %gs: or %fs: in
+		     lea though, so we need to remove it from the
+		     address used for lea and add it to each individual
+		     memory loads instead.  */
+		  addr = copy_rtx (addr);
+		  rtx *x = &addr;
+		  while (GET_CODE (*x) == PLUS)
+		    {
+		      for (i = 0; i < 2; i++)
+			if (GET_CODE (XEXP (*x, i)) == UNSPEC
+			    && XINT (XEXP (*x, i), 1) == UNSPEC_TP)
+			  {
+			    tls_base = XEXP (*x, i);
+			    *x = XEXP (*x, 1 - i);
+			    break;
+			  }
+		      if (tls_base)
+			break;
+		      x = &XEXP (*x, 0);
+		    }
+		  gcc_assert (tls_base);
+		}
+	    }
+	  emit_insn (gen_rtx_SET (base, addr));
+	  if (tls_base)
+	    base = gen_rtx_PLUS (GET_MODE (base), base, tls_base);
 	  part[1][0] = replace_equiv_address (part[1][0], base);
 	  for (i = 1; i < nparts; i++)
 	    {
+	      if (tls_base)
+		base = copy_rtx (base);
 	      tmp = plus_constant (Pmode, base, UNITS_PER_WORD * i);
 	      part[1][i] = replace_equiv_address (part[1][i], tmp);
 	    }
--- gcc/testsuite/gcc.dg/tls/pr66470.c.jj	2015-06-09 11:59:05.543954781 +0200
+++ gcc/testsuite/gcc.dg/tls/pr66470.c	2015-06-09 11:58:43.000000000 +0200
@@ -0,0 +1,29 @@
+/* PR target/66470 */
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-require-effective-target tls } */
+
+extern __thread unsigned long long a[10];
+extern __thread struct S { int a, b; } b[10];
+
+unsigned long long
+foo (long x)
+{
+  return a[x];
+}
+
+struct S
+bar (long x)
+{
+  return b[x];
+}
+
+#ifdef __SIZEOF_INT128__
+extern __thread unsigned __int128 c[10];
+
+unsigned __int128
+baz (long x)
+{
+  return c[x];
+}
+#endif

	Jakub

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] Fix ix86_split_long_move collision handling with TLS (PR target/66470)
  2015-06-09 12:05 [PATCH] Fix ix86_split_long_move collision handling with TLS (PR target/66470) Jakub Jelinek
@ 2015-06-09 12:32 ` Uros Bizjak
  2015-06-09 12:57   ` Jakub Jelinek
  0 siblings, 1 reply; 16+ messages in thread
From: Uros Bizjak @ 2015-06-09 12:32 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: gcc-patches

On Tue, Jun 9, 2015 at 1:57 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> Hi!
>
> As mentioned in the PR, when trying to split:
> (insn 7 15 13 2 (set (reg:DI 0 ax [92])
>         (mem:DI (plus:SI (plus:SI (mult:SI (reg/v:SI 1 dx [orig:89 b ] [89])
>                         (const_int 8 [0x8]))
>                     (unspec:SI [
>                             (const_int 0 [0])
>                         ] UNSPEC_TP))
>                 (reg:SI 0 ax [91])) [1 a S8 A64])) rh1212265.i:2 85 {*movdi_internal}
>      (nil))
> which has collisions == 2 (both ax and dx used on lhs and both
> ax and dx used in the memory address), we generate invalid insn
> - lea with %gs: or %fs: in it.  This patch fixes it by using
> normal lea instead (so remove the unspec UNSPEC_TP from the address
> for lea) and duplicating the unspec UNSPEC_TP to all the memory loads.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for
> trunk/5/4.9/4.8?
>
> 2015-06-09  Jakub Jelinek  <jakub@redhat.com>
>
>         PR target/66470
>         * config/i386/i386.c (ix86_split_long_move): For collisions
>         involving direct tls segment refs, move the UNSPEC_TP out of
>         the address for lea, to each of the memory loads.
>
>         * gcc.dg/tls/pr66470.c: New test.
>
> --- gcc/config/i386/i386.c.jj   2015-06-08 15:41:19.000000000 +0200
> +++ gcc/config/i386/i386.c      2015-06-09 11:50:29.960859723 +0200
> @@ -22866,7 +22866,7 @@ ix86_split_long_move (rtx operands[])
>          Do an lea to the last part and use only one colliding move.  */
>        else if (collisions > 1)
>         {
> -         rtx base;
> +         rtx base, addr, tls_base = NULL_RTX;
>
>           collisions = 1;
>
> @@ -22877,10 +22877,45 @@ ix86_split_long_move (rtx operands[])
>           if (GET_MODE (base) != Pmode)
>             base = gen_rtx_REG (Pmode, REGNO (base));
>
> -         emit_insn (gen_rtx_SET (base, XEXP (part[1][0], 0)));
> +         addr = XEXP (part[1][0], 0);
> +         if (TARGET_TLS_DIRECT_SEG_REFS)
> +           {
> +             struct ix86_address parts;
> +             int ok = ix86_decompose_address (addr, &parts);
> +             gcc_assert (ok);
> +             if (parts.seg == DEFAULT_TLS_SEG_REG)
> +               {
> +                 /* It is not valid to use %gs: or %fs: in
> +                    lea though, so we need to remove it from the
> +                    address used for lea and add it to each individual
> +                    memory loads instead.  */
> +                 addr = copy_rtx (addr);
> +                 rtx *x = &addr;
> +                 while (GET_CODE (*x) == PLUS)

Why not use RTX iterators here? IMO, it would be much more readable.

Uros.

> +                   {
> +                     for (i = 0; i < 2; i++)
> +                       if (GET_CODE (XEXP (*x, i)) == UNSPEC
> +                           && XINT (XEXP (*x, i), 1) == UNSPEC_TP)
> +                         {
> +                           tls_base = XEXP (*x, i);
> +                           *x = XEXP (*x, 1 - i);
> +                           break;
> +                         }
> +                     if (tls_base)
> +                       break;
> +                     x = &XEXP (*x, 0);
> +                   }
> +                 gcc_assert (tls_base);
> +               }
> +           }
> +         emit_insn (gen_rtx_SET (base, addr));
> +         if (tls_base)
> +           base = gen_rtx_PLUS (GET_MODE (base), base, tls_base);
>           part[1][0] = replace_equiv_address (part[1][0], base);
>           for (i = 1; i < nparts; i++)
>             {
> +             if (tls_base)
> +               base = copy_rtx (base);
>               tmp = plus_constant (Pmode, base, UNITS_PER_WORD * i);
>               part[1][i] = replace_equiv_address (part[1][i], tmp);
>             }
> --- gcc/testsuite/gcc.dg/tls/pr66470.c.jj       2015-06-09 11:59:05.543954781 +0200
> +++ gcc/testsuite/gcc.dg/tls/pr66470.c  2015-06-09 11:58:43.000000000 +0200
> @@ -0,0 +1,29 @@
> +/* PR target/66470 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +/* { dg-require-effective-target tls } */
> +
> +extern __thread unsigned long long a[10];
> +extern __thread struct S { int a, b; } b[10];
> +
> +unsigned long long
> +foo (long x)
> +{
> +  return a[x];
> +}
> +
> +struct S
> +bar (long x)
> +{
> +  return b[x];
> +}
> +
> +#ifdef __SIZEOF_INT128__
> +extern __thread unsigned __int128 c[10];
> +
> +unsigned __int128
> +baz (long x)
> +{
> +  return c[x];
> +}
> +#endif
>
>         Jakub

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] Fix ix86_split_long_move collision handling with TLS (PR target/66470)
  2015-06-09 12:32 ` Uros Bizjak
@ 2015-06-09 12:57   ` Jakub Jelinek
  2015-06-09 13:26     ` Uros Bizjak
  0 siblings, 1 reply; 16+ messages in thread
From: Jakub Jelinek @ 2015-06-09 12:57 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: gcc-patches

On Tue, Jun 09, 2015 at 02:32:07PM +0200, Uros Bizjak wrote:
> > -         emit_insn (gen_rtx_SET (base, XEXP (part[1][0], 0)));
> > +         addr = XEXP (part[1][0], 0);
> > +         if (TARGET_TLS_DIRECT_SEG_REFS)
> > +           {
> > +             struct ix86_address parts;
> > +             int ok = ix86_decompose_address (addr, &parts);
> > +             gcc_assert (ok);
> > +             if (parts.seg == DEFAULT_TLS_SEG_REG)
> > +               {
> > +                 /* It is not valid to use %gs: or %fs: in
> > +                    lea though, so we need to remove it from the
> > +                    address used for lea and add it to each individual
> > +                    memory loads instead.  */
> > +                 addr = copy_rtx (addr);
> > +                 rtx *x = &addr;
> > +                 while (GET_CODE (*x) == PLUS)
> 
> Why not use RTX iterators here? IMO, it would be much more readable.

Do you mean something like this?  It is larger and don't see readability
advantages there at all (plus the 4.8/4.9 backports can't use that anyway).
But if you prefer it, I can retest it.
I still need to look for a PLUS and scan the individual operands of it,
because the desired change is that the PLUS is replaced with its operand
(one that isn't UNSPEC_TP).  And getting rid of the ix86_decompose_address
would mean either unconditionally performing (wasteful) copy_rtx, or
two RTX iterators cycles.  The PLUS it is looking for has to be
a toplevel PLUS or PLUS in XEXP (x, 0) of that (recursively), otherwise
ix86_decompose_address wouldn't recognize it.

2015-06-09  Jakub Jelinek  <jakub@redhat.com>

	PR target/66470
	* config/i386/i386.c (ix86_split_long_move): For collisions
	involving direct tls segment refs, move the UNSPEC_TP out of
	the address for lea, to each of the memory loads.

	* gcc.dg/tls/pr66470.c: New test.

--- gcc/config/i386/i386.c.jj	2015-06-08 15:41:19.000000000 +0200
+++ gcc/config/i386/i386.c	2015-06-09 14:42:18.357849227 +0200
@@ -22866,7 +22866,7 @@ ix86_split_long_move (rtx operands[])
 	 Do an lea to the last part and use only one colliding move.  */
       else if (collisions > 1)
 	{
-	  rtx base;
+	  rtx base, addr, tls_base = NULL_RTX;
 
 	  collisions = 1;
 
@@ -22877,10 +22877,48 @@ ix86_split_long_move (rtx operands[])
 	  if (GET_MODE (base) != Pmode)
 	    base = gen_rtx_REG (Pmode, REGNO (base));
 
-	  emit_insn (gen_rtx_SET (base, XEXP (part[1][0], 0)));
+	  addr = XEXP (part[1][0], 0);
+	  if (TARGET_TLS_DIRECT_SEG_REFS)
+	    {
+	      struct ix86_address parts;
+	      int ok = ix86_decompose_address (addr, &parts);
+	      gcc_assert (ok);
+	      if (parts.seg == DEFAULT_TLS_SEG_REG)
+		{
+		  /* It is not valid to use %gs: or %fs: in
+		     lea though, so we need to remove it from the
+		     address used for lea and add it to each individual
+		     memory loads instead.  */
+		  addr = copy_rtx (addr);
+		  subrtx_ptr_iterator::array_type array;
+		  FOR_EACH_SUBRTX_PTR (iter, array, &addr, NONCONST)
+		    {
+		      rtx *x = *iter;
+		      if (GET_CODE (*x) == PLUS)
+			{
+			  for (i = 0; i < 2; i++)
+			    if (GET_CODE (XEXP (*x, i)) == UNSPEC
+				&& XINT (XEXP (*x, i), 1) == UNSPEC_TP)
+			      {
+				tls_base = XEXP (*x, i);
+				*x = XEXP (*x, 1 - i);
+				break;
+			      }
+			  if (tls_base)
+			    break;
+			}
+		    }
+		  gcc_assert (tls_base);
+		}
+	    }
+	  emit_insn (gen_rtx_SET (base, addr));
+	  if (tls_base)
+	    base = gen_rtx_PLUS (GET_MODE (base), base, tls_base);
 	  part[1][0] = replace_equiv_address (part[1][0], base);
 	  for (i = 1; i < nparts; i++)
 	    {
+	      if (tls_base)
+		base = copy_rtx (base);
 	      tmp = plus_constant (Pmode, base, UNITS_PER_WORD * i);
 	      part[1][i] = replace_equiv_address (part[1][i], tmp);
 	    }
--- gcc/testsuite/gcc.dg/tls/pr66470.c.jj	2015-06-09 11:59:05.543954781 +0200
+++ gcc/testsuite/gcc.dg/tls/pr66470.c	2015-06-09 11:58:43.000000000 +0200
@@ -0,0 +1,29 @@
+/* PR target/66470 */
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-require-effective-target tls } */
+
+extern __thread unsigned long long a[10];
+extern __thread struct S { int a, b; } b[10];
+
+unsigned long long
+foo (long x)
+{
+  return a[x];
+}
+
+struct S
+bar (long x)
+{
+  return b[x];
+}
+
+#ifdef __SIZEOF_INT128__
+extern __thread unsigned __int128 c[10];
+
+unsigned __int128
+baz (long x)
+{
+  return c[x];
+}
+#endif


	Jakub

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] Fix ix86_split_long_move collision handling with TLS (PR target/66470)
  2015-06-09 12:57   ` Jakub Jelinek
@ 2015-06-09 13:26     ` Uros Bizjak
  2015-06-09 13:44       ` Jakub Jelinek
  0 siblings, 1 reply; 16+ messages in thread
From: Uros Bizjak @ 2015-06-09 13:26 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: gcc-patches

On Tue, Jun 9, 2015 at 2:57 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Tue, Jun 09, 2015 at 02:32:07PM +0200, Uros Bizjak wrote:
>> > -         emit_insn (gen_rtx_SET (base, XEXP (part[1][0], 0)));
>> > +         addr = XEXP (part[1][0], 0);
>> > +         if (TARGET_TLS_DIRECT_SEG_REFS)
>> > +           {
>> > +             struct ix86_address parts;
>> > +             int ok = ix86_decompose_address (addr, &parts);
>> > +             gcc_assert (ok);
>> > +             if (parts.seg == DEFAULT_TLS_SEG_REG)
>> > +               {
>> > +                 /* It is not valid to use %gs: or %fs: in
>> > +                    lea though, so we need to remove it from the
>> > +                    address used for lea and add it to each individual
>> > +                    memory loads instead.  */
>> > +                 addr = copy_rtx (addr);
>> > +                 rtx *x = &addr;
>> > +                 while (GET_CODE (*x) == PLUS)
>>
>> Why not use RTX iterators here? IMO, it would be much more readable.
>
> Do you mean something like this?  It is larger and don't see readability
> advantages there at all (plus the 4.8/4.9 backports can't use that anyway).
> But if you prefer it, I can retest it.

I'm afraid that simple scan loop won't work correctly on x32. There
are some issues with UNSPEC_TP for this target, so we have to generate
zero_extend of SImode UNSPEC, e.g.:

(plus:DI (zero_extend:DI (unspec:SI [...] UNSPEC_TP) (reg:DI ...))

as can be seen in get_thread_pointer to construct the address. It
looks that your loop won't find the UNSPEC_TP tag in the above case.

Uros.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] Fix ix86_split_long_move collision handling with TLS (PR target/66470)
  2015-06-09 13:26     ` Uros Bizjak
@ 2015-06-09 13:44       ` Jakub Jelinek
  2015-06-09 14:06         ` Uros Bizjak
  2015-06-09 14:21         ` Uros Bizjak
  0 siblings, 2 replies; 16+ messages in thread
From: Jakub Jelinek @ 2015-06-09 13:44 UTC (permalink / raw)
  To: Uros Bizjak, H.J. Lu; +Cc: gcc-patches

On Tue, Jun 09, 2015 at 03:21:55PM +0200, Uros Bizjak wrote:
> I'm afraid that simple scan loop won't work correctly on x32. There
> are some issues with UNSPEC_TP for this target, so we have to generate
> zero_extend of SImode UNSPEC, e.g.:
> 
> (plus:DI (zero_extend:DI (unspec:SI [...] UNSPEC_TP) (reg:DI ...))
> 
> as can be seen in get_thread_pointer to construct the address. It
> looks that your loop won't find the UNSPEC_TP tag in the above case.

You're right, for -m32 it would need to start with
   rtx *x = &addr;
+  while (GET_CODE (*x) == ZERO_EXTEND
+	 || GET_CODE (*x) == AND
+	 || GET_CODE (*x) == SUBREG)
+    x = &XEXP (*x, 0);
to get at the PLUS.  Now, with either the original patch with the above
ammendment, or with the iterators, the question is what to do
with the UNSPEC_TP for -m32.  Is it ok to just use addr32 on the
lea and use normal Pmode (== DImode) addressing for the memory reads
(if I read the code well, that is what it does right now for the non-TLS
ones:
          if (GET_MODE (base) != Pmode)
            base = gen_rtx_REG (Pmode, REGNO (base));
)?  Then we'd need to change tls_base mode to Pmode.

	Jakub

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] Fix ix86_split_long_move collision handling with TLS (PR target/66470)
  2015-06-09 13:44       ` Jakub Jelinek
@ 2015-06-09 14:06         ` Uros Bizjak
  2015-06-09 14:21         ` Uros Bizjak
  1 sibling, 0 replies; 16+ messages in thread
From: Uros Bizjak @ 2015-06-09 14:06 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: H.J. Lu, gcc-patches

On Tue, Jun 9, 2015 at 3:39 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Tue, Jun 09, 2015 at 03:21:55PM +0200, Uros Bizjak wrote:
>> I'm afraid that simple scan loop won't work correctly on x32. There
>> are some issues with UNSPEC_TP for this target, so we have to generate
>> zero_extend of SImode UNSPEC, e.g.:
>>
>> (plus:DI (zero_extend:DI (unspec:SI [...] UNSPEC_TP) (reg:DI ...))
>>
>> as can be seen in get_thread_pointer to construct the address. It
>> looks that your loop won't find the UNSPEC_TP tag in the above case.
>
> You're right, for -m32 it would need to start with
>    rtx *x = &addr;
> +  while (GET_CODE (*x) == ZERO_EXTEND
> +        || GET_CODE (*x) == AND
> +        || GET_CODE (*x) == SUBREG)
> +    x = &XEXP (*x, 0);
> to get at the PLUS.  Now, with either the original patch with the above
> ammendment, or with the iterators, the question is what to do
> with the UNSPEC_TP for -m32.  Is it ok to just use addr32 on the
> lea and use normal Pmode (== DImode) addressing for the memory reads
> (if I read the code well, that is what it does right now for the non-TLS
> ones:
>           if (GET_MODE (base) != Pmode)
>             base = gen_rtx_REG (Pmode, REGNO (base));
> )?  Then we'd need to change tls_base mode to Pmode.

(I assume you referred to -mx32 above. Please also note that -mx32
uses Pmode == SImode by default).

Yes, please emit zero-extended address load to a DImode register, and
use this register with UNSPEC_TP to form final DImode address. The LEA
that will be generated from address load will use DImode inputs due to
%H operand modifier and will output to (implicitly zero-extended)
SImode register.

FYI, you just hit two special cases here:
- addresses that mention %fs and %gs are always in DImode
- LEA input operands are also always in DImode, but compiler already
takes care of it.

Uros.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] Fix ix86_split_long_move collision handling with TLS (PR target/66470)
  2015-06-09 13:44       ` Jakub Jelinek
  2015-06-09 14:06         ` Uros Bizjak
@ 2015-06-09 14:21         ` Uros Bizjak
  2015-06-09 14:44           ` Jakub Jelinek
  1 sibling, 1 reply; 16+ messages in thread
From: Uros Bizjak @ 2015-06-09 14:21 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: H.J. Lu, gcc-patches

On Tue, Jun 9, 2015 at 3:39 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Tue, Jun 09, 2015 at 03:21:55PM +0200, Uros Bizjak wrote:
>> I'm afraid that simple scan loop won't work correctly on x32. There
>> are some issues with UNSPEC_TP for this target, so we have to generate
>> zero_extend of SImode UNSPEC, e.g.:
>>
>> (plus:DI (zero_extend:DI (unspec:SI [...] UNSPEC_TP) (reg:DI ...))
>>
>> as can be seen in get_thread_pointer to construct the address. It
>> looks that your loop won't find the UNSPEC_TP tag in the above case.
>
> You're right, for -m32 it would need to start with
>    rtx *x = &addr;
> +  while (GET_CODE (*x) == ZERO_EXTEND
> +        || GET_CODE (*x) == AND
> +        || GET_CODE (*x) == SUBREG)
> +    x = &XEXP (*x, 0);

Oh, you can use SImode_address_operand predicate here.

Uros.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] Fix ix86_split_long_move collision handling with TLS (PR target/66470)
  2015-06-09 14:21         ` Uros Bizjak
@ 2015-06-09 14:44           ` Jakub Jelinek
  2015-06-09 16:19             ` Uros Bizjak
  0 siblings, 1 reply; 16+ messages in thread
From: Jakub Jelinek @ 2015-06-09 14:44 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: H.J. Lu, gcc-patches

On Tue, Jun 09, 2015 at 04:12:33PM +0200, Uros Bizjak wrote:
> On Tue, Jun 9, 2015 at 3:39 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> > On Tue, Jun 09, 2015 at 03:21:55PM +0200, Uros Bizjak wrote:
> >> I'm afraid that simple scan loop won't work correctly on x32. There
> >> are some issues with UNSPEC_TP for this target, so we have to generate
> >> zero_extend of SImode UNSPEC, e.g.:
> >>
> >> (plus:DI (zero_extend:DI (unspec:SI [...] UNSPEC_TP) (reg:DI ...))
> >>
> >> as can be seen in get_thread_pointer to construct the address. It
> >> looks that your loop won't find the UNSPEC_TP tag in the above case.
> >
> > You're right, for -m32 it would need to start with

Yeah, I meant -mx32 (which I have no experience with nor spare time for).

> >    rtx *x = &addr;
> > +  while (GET_CODE (*x) == ZERO_EXTEND
> > +        || GET_CODE (*x) == AND
> > +        || GET_CODE (*x) == SUBREG)
> > +    x = &XEXP (*x, 0);
> 
> Oh, you can use SImode_address_operand predicate here.

Do I need to loop, or can there be just one SImode_address_operand
code?  Do you want to use the iterators (as in the second patch) or not
(then is
  if (SImode_address_operand (addr, VOIDmode))
    x = &XEXP (addr, 0);
ok)?  Is Pmode always SImode for -mx32, or depending on some switch or
something?  Would it be acceptable to just guard the changes in the patch
with !TARGET_X32 and let H.J. deal with that target?  I'm afraid I'm lost
when to ZERO_EXTEND addr (if needed at all), etc.

	Jakub

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] Fix ix86_split_long_move collision handling with TLS (PR target/66470)
  2015-06-09 14:44           ` Jakub Jelinek
@ 2015-06-09 16:19             ` Uros Bizjak
  2015-06-09 16:57               ` Jakub Jelinek
  0 siblings, 1 reply; 16+ messages in thread
From: Uros Bizjak @ 2015-06-09 16:19 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: H.J. Lu, gcc-patches

On Tue, Jun 9, 2015 at 4:26 PM, Jakub Jelinek <jakub@redhat.com> wrote:

>> >> I'm afraid that simple scan loop won't work correctly on x32. There
>> >> are some issues with UNSPEC_TP for this target, so we have to generate
>> >> zero_extend of SImode UNSPEC, e.g.:
>> >>
>> >> (plus:DI (zero_extend:DI (unspec:SI [...] UNSPEC_TP) (reg:DI ...))
>> >>
>> >> as can be seen in get_thread_pointer to construct the address. It
>> >> looks that your loop won't find the UNSPEC_TP tag in the above case.
>> >
>> > You're right, for -m32 it would need to start with
>
> Yeah, I meant -mx32 (which I have no experience with nor spare time for).
>
>> >    rtx *x = &addr;
>> > +  while (GET_CODE (*x) == ZERO_EXTEND
>> > +        || GET_CODE (*x) == AND
>> > +        || GET_CODE (*x) == SUBREG)
>> > +    x = &XEXP (*x, 0);
>>
>> Oh, you can use SImode_address_operand predicate here.
>
> Do I need to loop, or can there be just one SImode_address_operand

IIRC, apart from the whole address, only UNSPEC_TP can be
zero_extended. It is a hardware "feature" (== HW bug) that addr32
doesn't apply to segment registers.

> code?  Do you want to use the iterators (as in the second patch) or not
> (then is
>   if (SImode_address_operand (addr, VOIDmode))
>     x = &XEXP (addr, 0);
> ok)?  Is Pmode always SImode for -mx32, or depending on some switch or

Nope, it depends on -maddress-mode switch, and can be SImode or DImode.

> something?  Would it be acceptable to just guard the changes in the patch
> with !TARGET_X32 and let H.J. deal with that target?  I'm afraid I'm lost
> when to ZERO_EXTEND addr (if needed at all), etc.

If you wish, I can take your patch and take if further. -mx32 is a
delicate beast...

Uros.
>
>         Jakub

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] Fix ix86_split_long_move collision handling with TLS (PR target/66470)
  2015-06-09 16:19             ` Uros Bizjak
@ 2015-06-09 16:57               ` Jakub Jelinek
  2015-06-09 19:17                 ` Uros Bizjak
  0 siblings, 1 reply; 16+ messages in thread
From: Jakub Jelinek @ 2015-06-09 16:57 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: H.J. Lu, gcc-patches

On Tue, Jun 09, 2015 at 06:16:32PM +0200, Uros Bizjak wrote:
> > something?  Would it be acceptable to just guard the changes in the patch
> > with !TARGET_X32 and let H.J. deal with that target?  I'm afraid I'm lost
> > when to ZERO_EXTEND addr (if needed at all), etc.
> 
> If you wish, I can take your patch and take if further. -mx32 is a
> delicate beast...

If you could, it would be appreciated, I'm quite busy with OpenMP 4.1 stuff
now.
Note that for -m64/-mx32 it will be much harder to create a reproducer,
because to trigger the bug one has to convince the register allocator
to allocate the lhs of the load in certain registers (not that hard),
but also the index register (to be scaled, also not that hard) and
also the register holding the tls symbol immediate.  Wonder if one has to
keep all but the two registers live across the load or something similar.

	Jakub

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] Fix ix86_split_long_move collision handling with TLS (PR target/66470)
  2015-06-09 16:57               ` Jakub Jelinek
@ 2015-06-09 19:17                 ` Uros Bizjak
  2015-06-09 20:11                   ` Jakub Jelinek
  0 siblings, 1 reply; 16+ messages in thread
From: Uros Bizjak @ 2015-06-09 19:17 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: H.J. Lu, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1284 bytes --]

On Tue, Jun 9, 2015 at 6:21 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Tue, Jun 09, 2015 at 06:16:32PM +0200, Uros Bizjak wrote:
>> > something?  Would it be acceptable to just guard the changes in the patch
>> > with !TARGET_X32 and let H.J. deal with that target?  I'm afraid I'm lost
>> > when to ZERO_EXTEND addr (if needed at all), etc.
>>
>> If you wish, I can take your patch and take if further. -mx32 is a
>> delicate beast...
>
> If you could, it would be appreciated, I'm quite busy with OpenMP 4.1 stuff
> now.
> Note that for -m64/-mx32 it will be much harder to create a reproducer,
> because to trigger the bug one has to convince the register allocator
> to allocate the lhs of the load in certain registers (not that hard),
> but also the index register (to be scaled, also not that hard) and
> also the register holding the tls symbol immediate.  Wonder if one has to
> keep all but the two registers live across the load or something similar.

Please find attach a patch that takes your idea slightly further. We
find  perhaps zero-extended UNSPEC_TP, and copy it for further use. At
its place, we simply slap const0_rtx. We know that address to
multi-word values has to be offsettable, which in case of x32 means
that it is NOT zero-extended address.

Uros.

[-- Attachment #2: r.diff.txt --]
[-- Type: text/plain, Size: 2063 bytes --]

Index: config/i386/i386.c
===================================================================
--- config/i386/i386.c	(revision 224292)
+++ config/i386/i386.c	(working copy)
@@ -22858,7 +22858,7 @@ ix86_split_long_move (rtx operands[])
 	 Do an lea to the last part and use only one colliding move.  */
       else if (collisions > 1)
 	{
-	  rtx base;
+	  rtx base, addr, tls_base = NULL_RTX;
 
 	  collisions = 1;
 
@@ -22869,10 +22869,52 @@ ix86_split_long_move (rtx operands[])
 	  if (GET_MODE (base) != Pmode)
 	    base = gen_rtx_REG (Pmode, REGNO (base));
 
-	  emit_insn (gen_rtx_SET (base, XEXP (part[1][0], 0)));
+	  addr = XEXP (part[1][0], 0);
+	  if (TARGET_TLS_DIRECT_SEG_REFS)
+	    {
+	      struct ix86_address parts;
+	      int ok = ix86_decompose_address (addr, &parts);
+	      gcc_assert (ok);
+	      if (parts.seg != SEG_DEFAULT)
+		{
+		  /* It is not valid to use %gs: or %fs: in
+		     lea though, so we need to remove it from the
+		     address used for lea and add it to each individual
+		     memory loads instead.  */
+		  rtx *x = &addr;
+                  while (GET_CODE (*x) == PLUS)
+                    {
+                      for (i = 0; i < 2; i++)
+			{
+			  rtx op = XEXP (*x, i);
+			  if ((GET_CODE (op) == UNSPEC
+			     && XINT (op, 1) == UNSPEC_TP)
+			    || (GET_CODE (op) == ZERO_EXTEND
+				&& GET_CODE (XEXP (op, 0)) == UNSPEC
+				&& (XINT (XEXP (op, 0), 1)
+				    == UNSPEC_TP)))
+			  {
+			    tls_base = XEXP (*x, i);
+			    XEXP (*x, i) = const0_rtx;
+			    break;
+			  }
+			}
+
+		      if (tls_base)
+			break;
+		      x = &XEXP (*x, 0);
+		    }
+		  gcc_assert (tls_base);
+		}
+	    }
+	  emit_insn (gen_rtx_SET (base, addr));
+	  if (tls_base)
+	    base = gen_rtx_PLUS (GET_MODE (base), base, tls_base);
 	  part[1][0] = replace_equiv_address (part[1][0], base);
 	  for (i = 1; i < nparts; i++)
 	    {
+	      if (tls_base)
+		base = copy_rtx (base);
 	      tmp = plus_constant (Pmode, base, UNITS_PER_WORD * i);
 	      part[1][i] = replace_equiv_address (part[1][i], tmp);
 	    }

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] Fix ix86_split_long_move collision handling with TLS (PR target/66470)
  2015-06-09 19:17                 ` Uros Bizjak
@ 2015-06-09 20:11                   ` Jakub Jelinek
  2015-06-10  6:38                     ` Uros Bizjak
  0 siblings, 1 reply; 16+ messages in thread
From: Jakub Jelinek @ 2015-06-09 20:11 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: H.J. Lu, gcc-patches

On Tue, Jun 09, 2015 at 08:09:28PM +0200, Uros Bizjak wrote:
> Please find attach a patch that takes your idea slightly further. We
> find  perhaps zero-extended UNSPEC_TP, and copy it for further use. At
> its place, we simply slap const0_rtx. We know that address to

Is that safe?  I mean, the address, even if offsetable, can have some
immediate already (seems e.g. the offsettable_memref_p predicate just checks
you can plus_constant some small integer and be recognized again) and if you
turn the %gs: into a const0_rtx, it would fail next decompose.
And when you already have the PLUS which has UNSPEC_TP as one of its
arguments, replacing that PLUS with the other argument is IMHO very easy.
Perhaps you are right that there is no need to copy_rtx, supposedly
the rtx shouldn't be shared with anything and thus can be modified in place.

If -mx32 is a non-issue here, then perhaps my initial patch is good enough?

> Index: config/i386/i386.c
> ===================================================================
> --- config/i386/i386.c	(revision 224292)
> +++ config/i386/i386.c	(working copy)
> @@ -22858,7 +22858,7 @@ ix86_split_long_move (rtx operands[])
>  	 Do an lea to the last part and use only one colliding move.  */
>        else if (collisions > 1)
>  	{
> -	  rtx base;
> +	  rtx base, addr, tls_base = NULL_RTX;
>  
>  	  collisions = 1;
>  
> @@ -22869,10 +22869,52 @@ ix86_split_long_move (rtx operands[])
>  	  if (GET_MODE (base) != Pmode)
>  	    base = gen_rtx_REG (Pmode, REGNO (base));
>  
> -	  emit_insn (gen_rtx_SET (base, XEXP (part[1][0], 0)));
> +	  addr = XEXP (part[1][0], 0);
> +	  if (TARGET_TLS_DIRECT_SEG_REFS)
> +	    {
> +	      struct ix86_address parts;
> +	      int ok = ix86_decompose_address (addr, &parts);
> +	      gcc_assert (ok);
> +	      if (parts.seg != SEG_DEFAULT)
> +		{
> +		  /* It is not valid to use %gs: or %fs: in
> +		     lea though, so we need to remove it from the
> +		     address used for lea and add it to each individual
> +		     memory loads instead.  */
> +		  rtx *x = &addr;
> +                  while (GET_CODE (*x) == PLUS)
> +                    {
> +                      for (i = 0; i < 2; i++)
> +			{
> +			  rtx op = XEXP (*x, i);
> +			  if ((GET_CODE (op) == UNSPEC
> +			     && XINT (op, 1) == UNSPEC_TP)
> +			    || (GET_CODE (op) == ZERO_EXTEND
> +				&& GET_CODE (XEXP (op, 0)) == UNSPEC
> +				&& (XINT (XEXP (op, 0), 1)
> +				    == UNSPEC_TP)))
> +			  {
> +			    tls_base = XEXP (*x, i);
> +			    XEXP (*x, i) = const0_rtx;
> +			    break;
> +			  }
> +			}
> +
> +		      if (tls_base)
> +			break;
> +		      x = &XEXP (*x, 0);
> +		    }
> +		  gcc_assert (tls_base);
> +		}
> +	    }
> +	  emit_insn (gen_rtx_SET (base, addr));
> +	  if (tls_base)
> +	    base = gen_rtx_PLUS (GET_MODE (base), base, tls_base);
>  	  part[1][0] = replace_equiv_address (part[1][0], base);
>  	  for (i = 1; i < nparts; i++)
>  	    {
> +	      if (tls_base)
> +		base = copy_rtx (base);
>  	      tmp = plus_constant (Pmode, base, UNITS_PER_WORD * i);
>  	      part[1][i] = replace_equiv_address (part[1][i], tmp);
>  	    }


	Jakub

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] Fix ix86_split_long_move collision handling with TLS (PR target/66470)
  2015-06-09 20:11                   ` Jakub Jelinek
@ 2015-06-10  6:38                     ` Uros Bizjak
  2015-06-10  6:43                       ` Richard Sandiford
  2015-06-10  7:07                       ` Jakub Jelinek
  0 siblings, 2 replies; 16+ messages in thread
From: Uros Bizjak @ 2015-06-10  6:38 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: H.J. Lu, gcc-patches

On Tue, Jun 9, 2015 at 9:30 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Tue, Jun 09, 2015 at 08:09:28PM +0200, Uros Bizjak wrote:
>> Please find attach a patch that takes your idea slightly further. We
>> find  perhaps zero-extended UNSPEC_TP, and copy it for further use. At
>> its place, we simply slap const0_rtx. We know that address to
>
> Is that safe?  I mean, the address, even if offsetable, can have some
> immediate already (seems e.g. the offsettable_memref_p predicate just checks
> you can plus_constant some small integer and be recognized again) and if you
> turn the %gs: into a const0_rtx, it would fail next decompose.
> And when you already have the PLUS which has UNSPEC_TP as one of its
> arguments, replacing that PLUS with the other argument is IMHO very easy.
> Perhaps you are right that there is no need to copy_rtx, supposedly
> the rtx shouldn't be shared with anything and thus can be modified in place.

Hm, you are right. I was under impression that decompose_address can
handle multiple CONST_INT addends, which is unfortunatelly not the
case.

> If -mx32 is a non-issue here, then perhaps my initial patch is good enough?

It looks to me, that if you detect and record zero-extended UNSPEC_TP,
your original patch would also handle -mx32.

Can you please repost your original patch with the above addition?

Thanks,
Uros.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] Fix ix86_split_long_move collision handling with TLS (PR target/66470)
  2015-06-10  6:38                     ` Uros Bizjak
@ 2015-06-10  6:43                       ` Richard Sandiford
  2015-06-10  7:07                       ` Jakub Jelinek
  1 sibling, 0 replies; 16+ messages in thread
From: Richard Sandiford @ 2015-06-10  6:43 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: Jakub Jelinek, H.J. Lu, gcc-patches

Uros Bizjak <ubizjak@gmail.com> writes:
> On Tue, Jun 9, 2015 at 9:30 PM, Jakub Jelinek <jakub@redhat.com> wrote:
>> On Tue, Jun 09, 2015 at 08:09:28PM +0200, Uros Bizjak wrote:
>>> Please find attach a patch that takes your idea slightly further. We
>>> find  perhaps zero-extended UNSPEC_TP, and copy it for further use. At
>>> its place, we simply slap const0_rtx. We know that address to
>>
>> Is that safe?  I mean, the address, even if offsetable, can have some
>> immediate already (seems e.g. the offsettable_memref_p predicate just checks
>> you can plus_constant some small integer and be recognized again) and if you
>> turn the %gs: into a const0_rtx, it would fail next decompose.
>> And when you already have the PLUS which has UNSPEC_TP as one of its
>> arguments, replacing that PLUS with the other argument is IMHO very easy.
>> Perhaps you are right that there is no need to copy_rtx, supposedly
>> the rtx shouldn't be shared with anything and thus can be modified in place.
>
> Hm, you are right. I was under impression that decompose_address can
> handle multiple CONST_INT addends, which is unfortunatelly not the
> case.

That's in some ways a feature though.  I don't think we want to support
multiple offsets, since that implies having more than one representation
for the same address.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] Fix ix86_split_long_move collision handling with TLS (PR target/66470)
  2015-06-10  6:38                     ` Uros Bizjak
  2015-06-10  6:43                       ` Richard Sandiford
@ 2015-06-10  7:07                       ` Jakub Jelinek
  2015-06-10  7:13                         ` Uros Bizjak
  1 sibling, 1 reply; 16+ messages in thread
From: Jakub Jelinek @ 2015-06-10  7:07 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: H.J. Lu, gcc-patches

On Wed, Jun 10, 2015 at 08:06:08AM +0200, Uros Bizjak wrote:
> > If -mx32 is a non-issue here, then perhaps my initial patch is good enough?
> 
> It looks to me, that if you detect and record zero-extended UNSPEC_TP,
> your original patch would also handle -mx32.
> 
> Can you please repost your original patch with the above addition?

I've managed to come up with a testcase that ICEs on -mx32 (with
-maddress-mode=long, with the default UNSPEC_TP seems to be always loaded
separately), and this version fixes even that, ok for trunk/5/4.9/4.8?

2015-06-10  Jakub Jelinek  <jakub@redhat.com>

	PR target/66470
	* config/i386/i386.c (ix86_split_long_move): For collisions
	involving direct tls segment refs, move the UNSPEC_TP possibly
	wrapped in ZERO_EXTEND out of the address for lea, to each of
	the memory loads.

	* gcc.dg/tls/pr66470.c: New test.
	* gcc.target/i386/pr66470.c: New test.

--- gcc/config/i386/i386.c.jj	2015-06-10 08:18:31.170053193 +0200
+++ gcc/config/i386/i386.c	2015-06-10 08:51:21.315434960 +0200
@@ -22858,7 +22858,7 @@ ix86_split_long_move (rtx operands[])
 	 Do an lea to the last part and use only one colliding move.  */
       else if (collisions > 1)
 	{
-	  rtx base;
+	  rtx base, addr, tls_base = NULL_RTX;
 
 	  collisions = 1;
 
@@ -22869,10 +22869,50 @@ ix86_split_long_move (rtx operands[])
 	  if (GET_MODE (base) != Pmode)
 	    base = gen_rtx_REG (Pmode, REGNO (base));
 
-	  emit_insn (gen_rtx_SET (base, XEXP (part[1][0], 0)));
+	  addr = XEXP (part[1][0], 0);
+	  if (TARGET_TLS_DIRECT_SEG_REFS)
+	    {
+	      struct ix86_address parts;
+	      int ok = ix86_decompose_address (addr, &parts);
+	      gcc_assert (ok);
+	      if (parts.seg == DEFAULT_TLS_SEG_REG)
+		{
+		  /* It is not valid to use %gs: or %fs: in
+		     lea though, so we need to remove it from the
+		     address used for lea and add it to each individual
+		     memory loads instead.  */
+		  addr = copy_rtx (addr);
+		  rtx *x = &addr;
+		  while (GET_CODE (*x) == PLUS)
+		    {
+		      for (i = 0; i < 2; i++)
+			{
+			  rtx u = XEXP (*x, i);
+			  if (GET_CODE (u) == ZERO_EXTEND)
+			    u = XEXP (u, 0);
+			  if (GET_CODE (u) == UNSPEC
+			      && XINT (u, 1) == UNSPEC_TP)
+			    {
+			      tls_base = XEXP (*x, i);
+			      *x = XEXP (*x, 1 - i);
+			      break;
+			    }
+			}
+		      if (tls_base)
+			break;
+		      x = &XEXP (*x, 0);
+		    }
+		  gcc_assert (tls_base);
+		}
+	    }
+	  emit_insn (gen_rtx_SET (base, addr));
+	  if (tls_base)
+	    base = gen_rtx_PLUS (GET_MODE (base), base, tls_base);
 	  part[1][0] = replace_equiv_address (part[1][0], base);
 	  for (i = 1; i < nparts; i++)
 	    {
+	      if (tls_base)
+		base = copy_rtx (base);
 	      tmp = plus_constant (Pmode, base, UNITS_PER_WORD * i);
 	      part[1][i] = replace_equiv_address (part[1][i], tmp);
 	    }
--- gcc/testsuite/gcc.dg/tls/pr66470.c.jj	2015-06-10 08:43:27.719773302 +0200
+++ gcc/testsuite/gcc.dg/tls/pr66470.c	2015-06-10 08:43:27.719773302 +0200
@@ -0,0 +1,29 @@
+/* PR target/66470 */
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-require-effective-target tls } */
+
+extern __thread unsigned long long a[10];
+extern __thread struct S { int a, b; } b[10];
+
+unsigned long long
+foo (long x)
+{
+  return a[x];
+}
+
+struct S
+bar (long x)
+{
+  return b[x];
+}
+
+#ifdef __SIZEOF_INT128__
+extern __thread unsigned __int128 c[10];
+
+unsigned __int128
+baz (long x)
+{
+  return c[x];
+}
+#endif
--- gcc/testsuite/gcc.target/i386/pr66470.c.jj	2015-06-10 08:45:10.111186752 +0200
+++ gcc/testsuite/gcc.target/i386/pr66470.c	2015-06-10 08:47:30.914005019 +0200
@@ -0,0 +1,13 @@
+/* PR target/66470 */
+/* { dg-do compile { target { ! { ia32 } } } } */
+/* { dg-options "-O2 -mx32 -maddress-mode=long" } */
+/* { dg-require-effective-target tls } */
+
+extern __thread unsigned __int128 c[10];
+int d;
+
+unsigned __int128
+foo (void)
+{
+  return c[d];
+}


	Jakub

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] Fix ix86_split_long_move collision handling with TLS (PR target/66470)
  2015-06-10  7:07                       ` Jakub Jelinek
@ 2015-06-10  7:13                         ` Uros Bizjak
  0 siblings, 0 replies; 16+ messages in thread
From: Uros Bizjak @ 2015-06-10  7:13 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: H.J. Lu, gcc-patches

On Wed, Jun 10, 2015 at 9:06 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Wed, Jun 10, 2015 at 08:06:08AM +0200, Uros Bizjak wrote:
>> > If -mx32 is a non-issue here, then perhaps my initial patch is good enough?
>>
>> It looks to me, that if you detect and record zero-extended UNSPEC_TP,
>> your original patch would also handle -mx32.
>>
>> Can you please repost your original patch with the above addition?
>
> I've managed to come up with a testcase that ICEs on -mx32 (with
> -maddress-mode=long, with the default UNSPEC_TP seems to be always loaded
> separately), and this version fixes even that, ok for trunk/5/4.9/4.8?
>
> 2015-06-10  Jakub Jelinek  <jakub@redhat.com>
>
>         PR target/66470
>         * config/i386/i386.c (ix86_split_long_move): For collisions
>         involving direct tls segment refs, move the UNSPEC_TP possibly
>         wrapped in ZERO_EXTEND out of the address for lea, to each of
>         the memory loads.
>
>         * gcc.dg/tls/pr66470.c: New test.
>         * gcc.target/i386/pr66470.c: New test.

Yes, this patch is OK for mainline and release branches.

Thanks,
Uros.

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2015-06-10  7:12 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-06-09 12:05 [PATCH] Fix ix86_split_long_move collision handling with TLS (PR target/66470) Jakub Jelinek
2015-06-09 12:32 ` Uros Bizjak
2015-06-09 12:57   ` Jakub Jelinek
2015-06-09 13:26     ` Uros Bizjak
2015-06-09 13:44       ` Jakub Jelinek
2015-06-09 14:06         ` Uros Bizjak
2015-06-09 14:21         ` Uros Bizjak
2015-06-09 14:44           ` Jakub Jelinek
2015-06-09 16:19             ` Uros Bizjak
2015-06-09 16:57               ` Jakub Jelinek
2015-06-09 19:17                 ` Uros Bizjak
2015-06-09 20:11                   ` Jakub Jelinek
2015-06-10  6:38                     ` Uros Bizjak
2015-06-10  6:43                       ` Richard Sandiford
2015-06-10  7:07                       ` Jakub Jelinek
2015-06-10  7:13                         ` Uros Bizjak

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).