public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH][3/3][RTL ifcvt] PR middle-end/37780: Conditional expression with __builtin_clz() should be optimized out
@ 2016-05-26 13:55 Kyrill Tkachov
  2016-06-06 14:36 ` Kyrill Tkachov
                   ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Kyrill Tkachov @ 2016-05-26 13:55 UTC (permalink / raw)
  To: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 4407 bytes --]

Hi all,

In this PR we want to optimise:
int foo (int i)
{
   return (i == 0) ? N : __builtin_clz (i);
}

on targets where CLZ is defined at zero to the constant 'N'.
This is determined at the RTL level through the CLZ_DEFINED_VALUE_AT_ZERO macro.
The obvious place to implement this would be in combine through simplify-rtx where we'd
recognise an IF_THEN_ELSE of the form:
(set (reg:SI r1)
      (if_then_else:SI (ne (reg:SI r2)
                           (const_int 0 [0]))
        (clz:SI (reg:SI r2))
        (const_int 32)))

and if CLZ_DEFINED_VALUE_AT_ZERO is defined to 32 for SImode we'd simplify it into
just (clz:SI (reg:SI r2)).
However, I found this doesn't quite happen for a couple of reasons:
1) This depends on ifcvt or some other pass to have created a conditional move of the
two branches that provide the IF_THEN_ELSE to propagate the const_int and clz operation into.

2) Combine will refuse to propagate r2 from the above example into both the condition and the
CLZ at the same time, so the most we see is:
(set (reg:SI r1)
      (if_then_else:SI (ne (reg:CC cc)
             (const_int 0))
        (clz:SI (reg:SI r2))
        (const_int 32)))

which is not enough information to perform the simplification.

This patch implements the optimisation in ce1 using the noce ifcvt framework.
During ifcvt noce_process_if_block can see that we're trying to optimise something
of the form (x == 0 ? const_int : CLZ (x)) and so it has visibility of all the information
needed to perform the transformation.

The transformation is performed by adding a new noce_try* function that tries to put the
condition and the 'then' and 'else' arms into an IF_THEN_ELSE rtx and try to simplify that
using the simplify-rtx machinery. That way, we can implement the simplification logic in
simplify-rtx.c where it belongs.

A similar transformation for CTZ is implemented as well.
So for code:
int foo (int i)
{
   return (i == 0) ? 32 : __builtin_clz (i);
}

On aarch64 we now emit:
foo:
         clz     w0, w0
         ret

instead of:
foo:
         mov     w1, 32
         clz     w2, w0
         cmp     w0, 0
         csel    w0, w2, w1, ne
         ret

and for arm similarly we generate:
foo:
         clz     r0, r0
         bx      lr

instead of:
foo:
         cmp     r0, #0
         clzne   r0, r0
         moveq   r0, #32
         bx      lr


and for x86_64 with -O2 -mlzcnt we generate:
foo:
         xorl    %eax, %eax
         lzcntl  %edi, %eax
         ret

instead of:
foo:
         xorl    %eax, %eax
         movl    $32, %edx
         lzcntl  %edi, %eax
         testl   %edi, %edi
         cmove   %edx, %eax
         ret


I tried getting this to work on other targets as well, but encountered difficulties.
For example on powerpc the two arms of the condition seen during ifcvt are:

(insn 4 22 11 4 (set (reg:DI 156 [ <retval> ])
         (const_int 32 [0x20])) clz.c:3 434 {*movdi_internal64}
      (nil))
and
(insn 10 9 23 3 (set (subreg/s/u:SI (reg:DI 156 [ <retval> ]) 0)
         (clz:SI (subreg/u:SI (reg/v:DI 157 [ i ]) 0))) clz.c:3 132 {clzsi2}
      (expr_list:REG_DEAD (reg/v:DI 157 [ i ])
         (nil)))

So the setup code in noce_process_if_block sees that the set destination is not the same
((reg:DI 156 [ <retval> ]) and (subreg/s/u:SI (reg:DI 156 [ <retval> ]) 0))
so it bails out on the rtx_interchangeable_p (x, SET_DEST (set_b)) check.
I suppose that's a consequence of how SImode operations are represented in early RTL
on powerpc, I don't know what to do there. Perhaps that part of ivcvt can be taught to handle
destinations that are subregs of one another, but that would be a separate patch.

Anyway, is this patch ok for trunk?

Bootstrapped and tested on arm-none-linux-gnueabihf, aarch64-none-linux-gnu, x86_64-pc-linux-gnu.

Thanks,
Kyrill

2016-05-26  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     PR middle-end/37780
     * ifcvt.c (noce_try_ifelse_collapse): New function.
     Declare prototype.
     (noce_process_if_block): Call noce_try_ifelse_collapse.
     * simplify-rtx.c (simplify_cond_clz_ctz): New function.
     (simplify_ternary_operation): Use the above to simplify
     conditional CLZ/CTZ expressions.

2016-05-26  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     PR middle-end/37780
     * gcc.c-torture/execute/pr37780.c: New test.
     * gcc.target/aarch64/pr37780_1.c: Likewise.
     * gcc.target/arm/pr37780_1.c: Likewise.

[-- Attachment #2: ifcvt-clz.patch --]
[-- Type: text/x-patch, Size: 7829 bytes --]

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index 4949965c9dc771bbd2f219fa72bdace3d40424da..80af4a84363192879cc49ea45f777fc987fda555 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -817,6 +817,7 @@ struct noce_if_info
 
 static rtx noce_emit_store_flag (struct noce_if_info *, rtx, int, int);
 static int noce_try_move (struct noce_if_info *);
+static int noce_try_ifelse_collapse (struct noce_if_info *);
 static int noce_try_store_flag (struct noce_if_info *);
 static int noce_try_addcc (struct noce_if_info *);
 static int noce_try_store_flag_constants (struct noce_if_info *);
@@ -1120,6 +1121,37 @@ noce_try_move (struct noce_if_info *if_info)
   return FALSE;
 }
 
+/* Try forming an IF_THEN_ELSE (cond, b, a) and collapsing that
+   through simplify_rtx.  Sometimes that can eliminate the IF_THEN_ELSE.
+   If that is the case, emit the result into x.  */
+
+static int
+noce_try_ifelse_collapse (struct noce_if_info * if_info)
+{
+  if (!noce_simple_bbs (if_info))
+    return FALSE;
+
+  machine_mode mode = GET_MODE (if_info->x);
+  rtx if_then_else = simplify_gen_ternary (IF_THEN_ELSE, mode, mode,
+					    if_info->cond, if_info->b,
+					    if_info->a);
+
+  if (GET_CODE (if_then_else) == IF_THEN_ELSE)
+    return FALSE;
+
+  rtx_insn *seq;
+  start_sequence ();
+  noce_emit_move_insn (if_info->x, if_then_else);
+  seq = end_ifcvt_sequence (if_info);
+  if (!seq)
+    return FALSE;
+
+  emit_insn_before_setloc (seq, if_info->jump,
+			  INSN_LOCATION (if_info->insn_a));
+  return TRUE;
+}
+
+
 /* Convert "if (test) x = 1; else x = 0".
 
    Only try 0 and STORE_FLAG_VALUE here.  Other combinations will be
@@ -3493,6 +3525,8 @@ noce_process_if_block (struct noce_if_info *if_info)
 
   if (noce_try_move (if_info))
     goto success;
+  if (noce_try_ifelse_collapse (if_info))
+    goto success;
   if (noce_try_store_flag (if_info))
     goto success;
   if (noce_try_bitop (if_info))
diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index eb73b9def1541b537e98ff2251e7a1d4ecf175be..0f750bb08e817ba630d92566c6a991f81e095304 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -5280,6 +5280,49 @@ simplify_const_relational_operation (enum rtx_code code,
 
   return 0;
 }
+
+/* Recognize expressions of the form (X CMP 0) ? VAL : OP (X)
+   where OP is CLZ or CTZ and VAL is the value from CLZ_DEFINED_VALUE_AT_ZERO
+   or CTZ_DEFINED_VALUE_AT_ZERO respectively and return OP (X) if the expression
+   can be simplified to that or NULL_RTX if not.
+   Assume X is compared against zero with CMP_CODE and the true
+   arm is TRUE_VAL and the false arm is FALSE_VAL.  */
+
+static rtx
+simplify_cond_clz_ctz (rtx x, rtx_code cmp_code, rtx true_val, rtx false_val)
+{
+  if (cmp_code != EQ && cmp_code != NE)
+    return NULL_RTX;
+
+  /* Result on X == 0 and X !=0 respectively.  */
+  rtx on_zero, on_nonzero;
+  if (cmp_code == EQ)
+    {
+      on_zero = true_val;
+      on_nonzero = false_val;
+    }
+  else
+    {
+      on_zero = false_val;
+      on_nonzero = true_val;
+    }
+
+  rtx_code op_code = GET_CODE (on_nonzero);
+  if ((op_code != CLZ && op_code != CTZ)
+      || !rtx_equal_p (XEXP (on_nonzero, 0), x)
+      || !CONST_INT_P (on_zero))
+    return NULL_RTX;
+
+  HOST_WIDE_INT op_val;
+  machine_mode mode = GET_MODE (on_nonzero);
+  if (((op_code == CLZ && CLZ_DEFINED_VALUE_AT_ZERO (mode, op_val))
+	|| (op_code == CTZ && CTZ_DEFINED_VALUE_AT_ZERO (mode, op_val)))
+      && op_val == INTVAL (on_zero))
+    return on_nonzero;
+
+  return NULL_RTX;
+}
+
 \f
 /* Simplify CODE, an operation with result mode MODE and three operands,
    OP0, OP1, and OP2.  OP0_MODE was the mode of OP0 before it became
@@ -5413,6 +5456,19 @@ simplify_ternary_operation (enum rtx_code code, machine_mode mode,
 	    }
 	}
 
+      /* Convert x == 0 ? N : clz (x) into clz (x) when
+	 CLZ_DEFINED_VALUE_AT_ZERO is defined to N for the mode of x.
+	 Similarly for ctz (x).  */
+      if (COMPARISON_P (op0) && !side_effects_p (op0)
+	  && XEXP (op0, 1) == const0_rtx)
+	{
+	  rtx simplified
+	    = simplify_cond_clz_ctz (XEXP (op0, 0), GET_CODE (op0),
+				     op1, op2);
+	  if (simplified)
+	    return simplified;
+	}
+
       if (COMPARISON_P (op0) && ! side_effects_p (op0))
 	{
 	  machine_mode cmp_mode = (GET_MODE (XEXP (op0, 0)) == VOIDmode
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr37780.c b/gcc/testsuite/gcc.c-torture/execute/pr37780.c
new file mode 100644
index 0000000000000000000000000000000000000000..a9eca68786ec8bbf98f010da22302d7c0297a766
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr37780.c
@@ -0,0 +1,49 @@
+/* PR middle-end/37780.  */
+
+#define VAL (8 * sizeof (int))
+
+int __attribute__ ((noinline, noclone))
+fooctz (int i)
+{
+  return (i == 0) ? VAL : __builtin_ctz (i);
+}
+
+int __attribute__ ((noinline, noclone))
+fooctz2 (int i)
+{
+  return (i != 0) ? __builtin_ctz (i) : VAL;
+}
+
+unsigned int __attribute__ ((noinline, noclone))
+fooctz3 (unsigned int i)
+{
+  return (i > 0) ?  __builtin_ctz (i) : VAL;
+}
+
+int __attribute__ ((noinline, noclone))
+fooclz (int i)
+{
+  return (i == 0) ? VAL : __builtin_clz (i);
+}
+
+int __attribute__ ((noinline, noclone))
+fooclz2 (int i)
+{
+  return (i != 0) ? __builtin_clz (i) : VAL;
+}
+
+unsigned int __attribute__ ((noinline, noclone))
+fooclz3 (unsigned int i)
+{
+  return (i > 0) ? __builtin_clz (i) : VAL;
+}
+
+int
+main (void)
+{
+  if (fooctz (0) != VAL || fooctz2 (0) != VAL || fooctz3 (0) != VAL
+      || fooclz (0) != VAL || fooclz2 (0) != VAL || fooclz3 (0) != VAL)
+    __builtin_abort ();
+
+  return 0;
+}
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/aarch64/pr37780_1.c b/gcc/testsuite/gcc.target/aarch64/pr37780_1.c
new file mode 100644
index 0000000000000000000000000000000000000000..97027e7479cfe284322249f8a4edc7d3da8ff644
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr37780_1.c
@@ -0,0 +1,46 @@
+/* Test that we can remove the conditional move due to CLZ
+   and CTZ being defined at zero.  */
+
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+int
+fooctz (int i)
+{
+  return (i == 0) ? 32 : __builtin_ctz (i);
+}
+
+int
+fooctz2 (int i)
+{
+  return (i != 0) ? __builtin_ctz (i) : 32;
+}
+
+unsigned int
+fooctz3 (unsigned int i)
+{
+  return (i > 0) ?  __builtin_ctz (i) : 32;
+}
+
+/* { dg-final { scan-assembler-times "rbit\t*" 3 } } */
+
+int
+fooclz (int i)
+{
+  return (i == 0) ? 32 : __builtin_clz (i);
+}
+
+int
+fooclz2 (int i)
+{
+  return (i != 0) ? __builtin_clz (i) : 32;
+}
+
+unsigned int
+fooclz3 (unsigned int i)
+{
+  return (i > 0) ? __builtin_clz (i) : 32;
+}
+
+/* { dg-final { scan-assembler-times "clz\t" 6 } } */
+/* { dg-final { scan-assembler-not "cmp\t.*0" } } */
diff --git a/gcc/testsuite/gcc.target/arm/pr37780_1.c b/gcc/testsuite/gcc.target/arm/pr37780_1.c
new file mode 100644
index 0000000000000000000000000000000000000000..b954fe5ceb89690feaa1098d0471b518bcbe0631
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr37780_1.c
@@ -0,0 +1,47 @@
+/* Test that we can remove the conditional move due to CLZ
+   being defined at zero.  */
+
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_arch_v5_ok } */
+/* { dg-options "-O2" } */
+
+int
+fooctz (int i)
+{
+  return (i == 0) ? 32 : __builtin_ctz (i);
+}
+
+int
+fooctz2 (int i)
+{
+  return (i != 0) ? __builtin_ctz (i) : 32;
+}
+
+unsigned int
+fooctz3 (unsigned int i)
+{
+  return (i > 0) ?  __builtin_ctz (i) : 32;
+}
+
+/* { dg-final { scan-assembler-times "rbit\t*" 3 } } */
+
+int
+fooclz (int i)
+{
+  return (i == 0) ? 32 : __builtin_clz (i);
+}
+
+int
+fooclz2 (int i)
+{
+  return (i != 0) ? __builtin_clz (i) : 32;
+}
+
+unsigned int
+fooclz3 (unsigned int i)
+{
+  return (i > 0) ? __builtin_clz (i) : 32;
+}
+
+/* { dg-final { scan-assembler-times "clz\t" 6 } } */
+/* { dg-final { scan-assembler-not "cmp\t.*0" } } */

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH][3/3][RTL ifcvt] PR middle-end/37780: Conditional expression with __builtin_clz() should be optimized out
  2016-05-26 13:55 [PATCH][3/3][RTL ifcvt] PR middle-end/37780: Conditional expression with __builtin_clz() should be optimized out Kyrill Tkachov
@ 2016-06-06 14:36 ` Kyrill Tkachov
  2016-06-06 14:44 ` Bernd Schmidt
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 12+ messages in thread
From: Kyrill Tkachov @ 2016-06-06 14:36 UTC (permalink / raw)
  To: GCC Patches

Ping.
https://gcc.gnu.org/ml/gcc-patches/2016-05/msg02080.html

Thanks,
Kyrill

On 26/05/16 10:53, Kyrill Tkachov wrote:
> Hi all,
>
> In this PR we want to optimise:
> int foo (int i)
> {
>   return (i == 0) ? N : __builtin_clz (i);
> }
>
> on targets where CLZ is defined at zero to the constant 'N'.
> This is determined at the RTL level through the CLZ_DEFINED_VALUE_AT_ZERO macro.
> The obvious place to implement this would be in combine through simplify-rtx where we'd
> recognise an IF_THEN_ELSE of the form:
> (set (reg:SI r1)
>      (if_then_else:SI (ne (reg:SI r2)
>                           (const_int 0 [0]))
>        (clz:SI (reg:SI r2))
>        (const_int 32)))
>
> and if CLZ_DEFINED_VALUE_AT_ZERO is defined to 32 for SImode we'd simplify it into
> just (clz:SI (reg:SI r2)).
> However, I found this doesn't quite happen for a couple of reasons:
> 1) This depends on ifcvt or some other pass to have created a conditional move of the
> two branches that provide the IF_THEN_ELSE to propagate the const_int and clz operation into.
>
> 2) Combine will refuse to propagate r2 from the above example into both the condition and the
> CLZ at the same time, so the most we see is:
> (set (reg:SI r1)
>      (if_then_else:SI (ne (reg:CC cc)
>             (const_int 0))
>        (clz:SI (reg:SI r2))
>        (const_int 32)))
>
> which is not enough information to perform the simplification.
>
> This patch implements the optimisation in ce1 using the noce ifcvt framework.
> During ifcvt noce_process_if_block can see that we're trying to optimise something
> of the form (x == 0 ? const_int : CLZ (x)) and so it has visibility of all the information
> needed to perform the transformation.
>
> The transformation is performed by adding a new noce_try* function that tries to put the
> condition and the 'then' and 'else' arms into an IF_THEN_ELSE rtx and try to simplify that
> using the simplify-rtx machinery. That way, we can implement the simplification logic in
> simplify-rtx.c where it belongs.
>
> A similar transformation for CTZ is implemented as well.
> So for code:
> int foo (int i)
> {
>   return (i == 0) ? 32 : __builtin_clz (i);
> }
>
> On aarch64 we now emit:
> foo:
>         clz     w0, w0
>         ret
>
> instead of:
> foo:
>         mov     w1, 32
>         clz     w2, w0
>         cmp     w0, 0
>         csel    w0, w2, w1, ne
>         ret
>
> and for arm similarly we generate:
> foo:
>         clz     r0, r0
>         bx      lr
>
> instead of:
> foo:
>         cmp     r0, #0
>         clzne   r0, r0
>         moveq   r0, #32
>         bx      lr
>
>
> and for x86_64 with -O2 -mlzcnt we generate:
> foo:
>         xorl    %eax, %eax
>         lzcntl  %edi, %eax
>         ret
>
> instead of:
> foo:
>         xorl    %eax, %eax
>         movl    $32, %edx
>         lzcntl  %edi, %eax
>         testl   %edi, %edi
>         cmove   %edx, %eax
>         ret
>
>
> I tried getting this to work on other targets as well, but encountered difficulties.
> For example on powerpc the two arms of the condition seen during ifcvt are:
>
> (insn 4 22 11 4 (set (reg:DI 156 [ <retval> ])
>         (const_int 32 [0x20])) clz.c:3 434 {*movdi_internal64}
>      (nil))
> and
> (insn 10 9 23 3 (set (subreg/s/u:SI (reg:DI 156 [ <retval> ]) 0)
>         (clz:SI (subreg/u:SI (reg/v:DI 157 [ i ]) 0))) clz.c:3 132 {clzsi2}
>      (expr_list:REG_DEAD (reg/v:DI 157 [ i ])
>         (nil)))
>
> So the setup code in noce_process_if_block sees that the set destination is not the same
> ((reg:DI 156 [ <retval> ]) and (subreg/s/u:SI (reg:DI 156 [ <retval> ]) 0))
> so it bails out on the rtx_interchangeable_p (x, SET_DEST (set_b)) check.
> I suppose that's a consequence of how SImode operations are represented in early RTL
> on powerpc, I don't know what to do there. Perhaps that part of ivcvt can be taught to handle
> destinations that are subregs of one another, but that would be a separate patch.
>
> Anyway, is this patch ok for trunk?
>
> Bootstrapped and tested on arm-none-linux-gnueabihf, aarch64-none-linux-gnu, x86_64-pc-linux-gnu.
>
> Thanks,
> Kyrill
>
> 2016-05-26  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>
>     PR middle-end/37780
>     * ifcvt.c (noce_try_ifelse_collapse): New function.
>     Declare prototype.
>     (noce_process_if_block): Call noce_try_ifelse_collapse.
>     * simplify-rtx.c (simplify_cond_clz_ctz): New function.
>     (simplify_ternary_operation): Use the above to simplify
>     conditional CLZ/CTZ expressions.
>
> 2016-05-26  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>
>     PR middle-end/37780
>     * gcc.c-torture/execute/pr37780.c: New test.
>     * gcc.target/aarch64/pr37780_1.c: Likewise.
>     * gcc.target/arm/pr37780_1.c: Likewise.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH][3/3][RTL ifcvt] PR middle-end/37780: Conditional expression with __builtin_clz() should be optimized out
  2016-05-26 13:55 [PATCH][3/3][RTL ifcvt] PR middle-end/37780: Conditional expression with __builtin_clz() should be optimized out Kyrill Tkachov
  2016-06-06 14:36 ` Kyrill Tkachov
@ 2016-06-06 14:44 ` Bernd Schmidt
  2016-06-07  7:37 ` Andreas Schwab
  2016-06-07 19:35 ` Christophe Lyon
  3 siblings, 0 replies; 12+ messages in thread
From: Bernd Schmidt @ 2016-06-06 14:44 UTC (permalink / raw)
  To: Kyrill Tkachov, GCC Patches

On 05/26/2016 11:53 AM, Kyrill Tkachov wrote:
>
> 2016-05-26  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>
>     PR middle-end/37780
>     * ifcvt.c (noce_try_ifelse_collapse): New function.
>     Declare prototype.
>     (noce_process_if_block): Call noce_try_ifelse_collapse.
>     * simplify-rtx.c (simplify_cond_clz_ctz): New function.
>     (simplify_ternary_operation): Use the above to simplify
>     conditional CLZ/CTZ expressions.
>
> 2016-05-26  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>
>     PR middle-end/37780
>     * gcc.c-torture/execute/pr37780.c: New test.
>     * gcc.target/aarch64/pr37780_1.c: Likewise.
>     * gcc.target/arm/pr37780_1.c: Likewise.

Nice. Ok.


Bernd

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH][3/3][RTL ifcvt] PR middle-end/37780: Conditional expression with __builtin_clz() should be optimized out
  2016-05-26 13:55 [PATCH][3/3][RTL ifcvt] PR middle-end/37780: Conditional expression with __builtin_clz() should be optimized out Kyrill Tkachov
  2016-06-06 14:36 ` Kyrill Tkachov
  2016-06-06 14:44 ` Bernd Schmidt
@ 2016-06-07  7:37 ` Andreas Schwab
  2016-06-07 16:47   ` Kyrill Tkachov
  2016-06-07 19:35 ` Christophe Lyon
  3 siblings, 1 reply; 12+ messages in thread
From: Andreas Schwab @ 2016-06-07  7:37 UTC (permalink / raw)
  To: Kyrill Tkachov; +Cc: GCC Patches

Kyrill Tkachov <kyrylo.tkachov@foss.arm.com> writes:

> +static rtx
> +simplify_cond_clz_ctz (rtx x, rtx_code cmp_code, rtx true_val, rtx false_val)
> +{
> +  if (cmp_code != EQ && cmp_code != NE)
> +    return NULL_RTX;
> +
> +  /* Result on X == 0 and X !=0 respectively.  */
> +  rtx on_zero, on_nonzero;
> +  if (cmp_code == EQ)
> +    {
> +      on_zero = true_val;
> +      on_nonzero = false_val;
> +    }
> +  else
> +    {
> +      on_zero = false_val;
> +      on_nonzero = true_val;
> +    }
> +
> +  rtx_code op_code = GET_CODE (on_nonzero);
> +  if ((op_code != CLZ && op_code != CTZ)
> +      || !rtx_equal_p (XEXP (on_nonzero, 0), x)
> +      || !CONST_INT_P (on_zero))
> +    return NULL_RTX;
> +
> +  HOST_WIDE_INT op_val;
> +  machine_mode mode = GET_MODE (on_nonzero);
> +  if (((op_code == CLZ && CLZ_DEFINED_VALUE_AT_ZERO (mode, op_val))
> +	|| (op_code == CTZ && CTZ_DEFINED_VALUE_AT_ZERO (mode, op_val)))
> +      && op_val == INTVAL (on_zero))
> +    return on_nonzero;
> +
> +  return NULL_RTX;
> +}

../../gcc/simplify-rtx.c: In function 'rtx_def* simplify_cond_clz_ctz(rtx, rtx_code, rtx, rtx)':
../../gcc/simplify-rtx.c:5304:16: error: unused variable 'mode' [-Werror=unused-variable]
   machine_mode mode = GET_MODE (on_nonzero);
                ^~~~

Andreas.

-- 
Andreas Schwab, SUSE Labs, schwab@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH][3/3][RTL ifcvt] PR middle-end/37780: Conditional expression with __builtin_clz() should be optimized out
  2016-06-07  7:37 ` Andreas Schwab
@ 2016-06-07 16:47   ` Kyrill Tkachov
  2016-06-07 16:51     ` Kyrill Tkachov
  0 siblings, 1 reply; 12+ messages in thread
From: Kyrill Tkachov @ 2016-06-07 16:47 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 2002 bytes --]


On 07/06/16 08:37, Andreas Schwab wrote:
> Kyrill Tkachov <kyrylo.tkachov@foss.arm.com> writes:
>
>> +static rtx
>> +simplify_cond_clz_ctz (rtx x, rtx_code cmp_code, rtx true_val, rtx false_val)
>> +{
>> +  if (cmp_code != EQ && cmp_code != NE)
>> +    return NULL_RTX;
>> +
>> +  /* Result on X == 0 and X !=0 respectively.  */
>> +  rtx on_zero, on_nonzero;
>> +  if (cmp_code == EQ)
>> +    {
>> +      on_zero = true_val;
>> +      on_nonzero = false_val;
>> +    }
>> +  else
>> +    {
>> +      on_zero = false_val;
>> +      on_nonzero = true_val;
>> +    }
>> +
>> +  rtx_code op_code = GET_CODE (on_nonzero);
>> +  if ((op_code != CLZ && op_code != CTZ)
>> +      || !rtx_equal_p (XEXP (on_nonzero, 0), x)
>> +      || !CONST_INT_P (on_zero))
>> +    return NULL_RTX;
>> +
>> +  HOST_WIDE_INT op_val;
>> +  machine_mode mode = GET_MODE (on_nonzero);
>> +  if (((op_code == CLZ && CLZ_DEFINED_VALUE_AT_ZERO (mode, op_val))
>> +	|| (op_code == CTZ && CTZ_DEFINED_VALUE_AT_ZERO (mode, op_val)))
>> +      && op_val == INTVAL (on_zero))
>> +    return on_nonzero;
>> +
>> +  return NULL_RTX;
>> +}
> ../../gcc/simplify-rtx.c: In function 'rtx_def* simplify_cond_clz_ctz(rtx, rtx_code, rtx, rtx)':
> ../../gcc/simplify-rtx.c:5304:16: error: unused variable 'mode' [-Werror=unused-variable]
>     machine_mode mode = GET_MODE (on_nonzero);
>                  ^~~~
>
> Andreas.

That must be on targets that don't define CLZ_DEFINED_VALUE_AT_ZERO or CTZ_DEFINED_VALUE_AT_ZERO.
defaults.h then defines them to just "0".
I think GCC shouldn't warn in these cases as 'mode' is used in the function
it's defined, it's just the macro where it's used as an argument discards it.
Anyway, this patch removes that variable and fixes the warning.

Tested on arm-none-eabi after manually undefining CLZ_DEFINED_VALUE_AT_ZERO and CTZ_DEFINED_VALUE_AT_ZERO to reproduce
the warning and checking that it's fixed.
Committing as obvious in the interest of fixing bootstrap on the relevant targets.

Thanks,
Kyrill


[-- Attachment #2: simplify-unused.patch --]
[-- Type: text/x-patch, Size: 776 bytes --]

diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index 0f750bb08e817ba630d92566c6a991f81e095304..21adfcb531f3f3379653c67145ab0023c978fd82 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -5314,9 +5314,10 @@ simplify_cond_clz_ctz (rtx x, rtx_code cmp_code, rtx true_val, rtx false_val)
     return NULL_RTX;
 
   HOST_WIDE_INT op_val;
-  machine_mode mode = GET_MODE (on_nonzero);
-  if (((op_code == CLZ && CLZ_DEFINED_VALUE_AT_ZERO (mode, op_val))
-	|| (op_code == CTZ && CTZ_DEFINED_VALUE_AT_ZERO (mode, op_val)))
+  if (((op_code == CLZ
+	&& CLZ_DEFINED_VALUE_AT_ZERO (GET_MODE (on_nonzero), op_val))
+      || (op_code == CTZ
+	  && CTZ_DEFINED_VALUE_AT_ZERO (GET_MODE (on_nonzero), op_val)))
       && op_val == INTVAL (on_zero))
     return on_nonzero;
 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH][3/3][RTL ifcvt] PR middle-end/37780: Conditional expression with __builtin_clz() should be optimized out
  2016-06-07 16:47   ` Kyrill Tkachov
@ 2016-06-07 16:51     ` Kyrill Tkachov
  0 siblings, 0 replies; 12+ messages in thread
From: Kyrill Tkachov @ 2016-06-07 16:51 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: GCC Patches


On 07/06/16 17:47, Kyrill Tkachov wrote:
>
> On 07/06/16 08:37, Andreas Schwab wrote:
>> Kyrill Tkachov <kyrylo.tkachov@foss.arm.com> writes:
>>
>>> +static rtx
>>> +simplify_cond_clz_ctz (rtx x, rtx_code cmp_code, rtx true_val, rtx false_val)
>>> +{
>>> +  if (cmp_code != EQ && cmp_code != NE)
>>> +    return NULL_RTX;
>>> +
>>> +  /* Result on X == 0 and X !=0 respectively.  */
>>> +  rtx on_zero, on_nonzero;
>>> +  if (cmp_code == EQ)
>>> +    {
>>> +      on_zero = true_val;
>>> +      on_nonzero = false_val;
>>> +    }
>>> +  else
>>> +    {
>>> +      on_zero = false_val;
>>> +      on_nonzero = true_val;
>>> +    }
>>> +
>>> +  rtx_code op_code = GET_CODE (on_nonzero);
>>> +  if ((op_code != CLZ && op_code != CTZ)
>>> +      || !rtx_equal_p (XEXP (on_nonzero, 0), x)
>>> +      || !CONST_INT_P (on_zero))
>>> +    return NULL_RTX;
>>> +
>>> +  HOST_WIDE_INT op_val;
>>> +  machine_mode mode = GET_MODE (on_nonzero);
>>> +  if (((op_code == CLZ && CLZ_DEFINED_VALUE_AT_ZERO (mode, op_val))
>>> +    || (op_code == CTZ && CTZ_DEFINED_VALUE_AT_ZERO (mode, op_val)))
>>> +      && op_val == INTVAL (on_zero))
>>> +    return on_nonzero;
>>> +
>>> +  return NULL_RTX;
>>> +}
>> ../../gcc/simplify-rtx.c: In function 'rtx_def* simplify_cond_clz_ctz(rtx, rtx_code, rtx, rtx)':
>> ../../gcc/simplify-rtx.c:5304:16: error: unused variable 'mode' [-Werror=unused-variable]
>>     machine_mode mode = GET_MODE (on_nonzero);
>>                  ^~~~
>>
>> Andreas.
>
> That must be on targets that don't define CLZ_DEFINED_VALUE_AT_ZERO or CTZ_DEFINED_VALUE_AT_ZERO.
> defaults.h then defines them to just "0".
> I think GCC shouldn't warn in these cases as 'mode' is used in the function
> it's defined, it's just the macro where it's used as an argument discards it.
> Anyway, this patch removes that variable and fixes the warning.
>
> Tested on arm-none-eabi after manually undefining CLZ_DEFINED_VALUE_AT_ZERO and CTZ_DEFINED_VALUE_AT_ZERO to reproduce
> the warning and checking that it's fixed.
> Committing as obvious in the interest of fixing bootstrap on the relevant targets.
>

With the ChangeLog entry:

2016-06-07  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     * simplify-rtx.c (simplify_cond_clz_ctz): Delete 'mode' local
     variable.

> Thanks,
> Kyrill
>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH][3/3][RTL ifcvt] PR middle-end/37780: Conditional expression with __builtin_clz() should be optimized out
  2016-05-26 13:55 [PATCH][3/3][RTL ifcvt] PR middle-end/37780: Conditional expression with __builtin_clz() should be optimized out Kyrill Tkachov
                   ` (2 preceding siblings ...)
  2016-06-07  7:37 ` Andreas Schwab
@ 2016-06-07 19:35 ` Christophe Lyon
  2016-06-08 16:40   ` Kyrill Tkachov
  3 siblings, 1 reply; 12+ messages in thread
From: Christophe Lyon @ 2016-06-07 19:35 UTC (permalink / raw)
  To: Kyrill Tkachov; +Cc: GCC Patches

On 26 May 2016 at 11:53, Kyrill Tkachov <kyrylo.tkachov@foss.arm.com> wrote:
> Hi all,
>
> In this PR we want to optimise:
> int foo (int i)
> {
>   return (i == 0) ? N : __builtin_clz (i);
> }
>
> on targets where CLZ is defined at zero to the constant 'N'.
> This is determined at the RTL level through the CLZ_DEFINED_VALUE_AT_ZERO
> macro.
> The obvious place to implement this would be in combine through simplify-rtx
> where we'd
> recognise an IF_THEN_ELSE of the form:
> (set (reg:SI r1)
>      (if_then_else:SI (ne (reg:SI r2)
>                           (const_int 0 [0]))
>        (clz:SI (reg:SI r2))
>        (const_int 32)))
>
> and if CLZ_DEFINED_VALUE_AT_ZERO is defined to 32 for SImode we'd simplify
> it into
> just (clz:SI (reg:SI r2)).
> However, I found this doesn't quite happen for a couple of reasons:
> 1) This depends on ifcvt or some other pass to have created a conditional
> move of the
> two branches that provide the IF_THEN_ELSE to propagate the const_int and
> clz operation into.
>
> 2) Combine will refuse to propagate r2 from the above example into both the
> condition and the
> CLZ at the same time, so the most we see is:
> (set (reg:SI r1)
>      (if_then_else:SI (ne (reg:CC cc)
>             (const_int 0))
>        (clz:SI (reg:SI r2))
>        (const_int 32)))
>
> which is not enough information to perform the simplification.
>
> This patch implements the optimisation in ce1 using the noce ifcvt
> framework.
> During ifcvt noce_process_if_block can see that we're trying to optimise
> something
> of the form (x == 0 ? const_int : CLZ (x)) and so it has visibility of all
> the information
> needed to perform the transformation.
>
> The transformation is performed by adding a new noce_try* function that
> tries to put the
> condition and the 'then' and 'else' arms into an IF_THEN_ELSE rtx and try to
> simplify that
> using the simplify-rtx machinery. That way, we can implement the
> simplification logic in
> simplify-rtx.c where it belongs.
>
> A similar transformation for CTZ is implemented as well.
> So for code:
> int foo (int i)
> {
>   return (i == 0) ? 32 : __builtin_clz (i);
> }
>
> On aarch64 we now emit:
> foo:
>         clz     w0, w0
>         ret
>
> instead of:
> foo:
>         mov     w1, 32
>         clz     w2, w0
>         cmp     w0, 0
>         csel    w0, w2, w1, ne
>         ret
>
> and for arm similarly we generate:
> foo:
>         clz     r0, r0
>         bx      lr
>
> instead of:
> foo:
>         cmp     r0, #0
>         clzne   r0, r0
>         moveq   r0, #32
>         bx      lr
>
>
> and for x86_64 with -O2 -mlzcnt we generate:
> foo:
>         xorl    %eax, %eax
>         lzcntl  %edi, %eax
>         ret
>
> instead of:
> foo:
>         xorl    %eax, %eax
>         movl    $32, %edx
>         lzcntl  %edi, %eax
>         testl   %edi, %edi
>         cmove   %edx, %eax
>         ret
>
>
> I tried getting this to work on other targets as well, but encountered
> difficulties.
> For example on powerpc the two arms of the condition seen during ifcvt are:
>
> (insn 4 22 11 4 (set (reg:DI 156 [ <retval> ])
>         (const_int 32 [0x20])) clz.c:3 434 {*movdi_internal64}
>      (nil))
> and
> (insn 10 9 23 3 (set (subreg/s/u:SI (reg:DI 156 [ <retval> ]) 0)
>         (clz:SI (subreg/u:SI (reg/v:DI 157 [ i ]) 0))) clz.c:3 132 {clzsi2}
>      (expr_list:REG_DEAD (reg/v:DI 157 [ i ])
>         (nil)))
>
> So the setup code in noce_process_if_block sees that the set destination is
> not the same
> ((reg:DI 156 [ <retval> ]) and (subreg/s/u:SI (reg:DI 156 [ <retval> ]) 0))
> so it bails out on the rtx_interchangeable_p (x, SET_DEST (set_b)) check.
> I suppose that's a consequence of how SImode operations are represented in
> early RTL
> on powerpc, I don't know what to do there. Perhaps that part of ivcvt can be
> taught to handle
> destinations that are subregs of one another, but that would be a separate
> patch.
>
> Anyway, is this patch ok for trunk?
>
> Bootstrapped and tested on arm-none-linux-gnueabihf, aarch64-none-linux-gnu,
> x86_64-pc-linux-gnu.
>
> Thanks,
> Kyrill
>
> 2016-05-26  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>
>     PR middle-end/37780
>     * ifcvt.c (noce_try_ifelse_collapse): New function.
>     Declare prototype.
>     (noce_process_if_block): Call noce_try_ifelse_collapse.
>     * simplify-rtx.c (simplify_cond_clz_ctz): New function.
>     (simplify_ternary_operation): Use the above to simplify
>     conditional CLZ/CTZ expressions.
>
> 2016-05-26  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>
>     PR middle-end/37780
>     * gcc.c-torture/execute/pr37780.c: New test.
>     * gcc.target/aarch64/pr37780_1.c: Likewise.
>     * gcc.target/arm/pr37780_1.c: Likewise.

Hi Kyrylo,

I've noticed that gcc.target/arm/pr37780_1.c fails on
arm if arch < v6.
I first tried to fix the effective-target guard (IIRC, the doc
says clz is available starting with v5t), but that isn't sufficient.

When compiling for armv5-t, the scan-assembler directives
fail. It seems to work with v6t2, so I am wondering whether
it's just a matter of increasing the effective-target arch version,
or if you really intended to make the test pass on these old
architectures?

Thanks,

Christophe.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH][3/3][RTL ifcvt] PR middle-end/37780: Conditional expression with __builtin_clz() should be optimized out
  2016-06-07 19:35 ` Christophe Lyon
@ 2016-06-08 16:40   ` Kyrill Tkachov
  2016-06-09 12:15     ` Christophe Lyon
  0 siblings, 1 reply; 12+ messages in thread
From: Kyrill Tkachov @ 2016-06-08 16:40 UTC (permalink / raw)
  To: Christophe Lyon; +Cc: GCC Patches


On 07/06/16 20:34, Christophe Lyon wrote:
> On 26 May 2016 at 11:53, Kyrill Tkachov <kyrylo.tkachov@foss.arm.com> wrote:
>> Hi all,
>>
>> In this PR we want to optimise:
>> int foo (int i)
>> {
>>    return (i == 0) ? N : __builtin_clz (i);
>> }
>>
>> on targets where CLZ is defined at zero to the constant 'N'.
>> This is determined at the RTL level through the CLZ_DEFINED_VALUE_AT_ZERO
>> macro.
>> The obvious place to implement this would be in combine through simplify-rtx
>> where we'd
>> recognise an IF_THEN_ELSE of the form:
>> (set (reg:SI r1)
>>       (if_then_else:SI (ne (reg:SI r2)
>>                            (const_int 0 [0]))
>>         (clz:SI (reg:SI r2))
>>         (const_int 32)))
>>
>> and if CLZ_DEFINED_VALUE_AT_ZERO is defined to 32 for SImode we'd simplify
>> it into
>> just (clz:SI (reg:SI r2)).
>> However, I found this doesn't quite happen for a couple of reasons:
>> 1) This depends on ifcvt or some other pass to have created a conditional
>> move of the
>> two branches that provide the IF_THEN_ELSE to propagate the const_int and
>> clz operation into.
>>
>> 2) Combine will refuse to propagate r2 from the above example into both the
>> condition and the
>> CLZ at the same time, so the most we see is:
>> (set (reg:SI r1)
>>       (if_then_else:SI (ne (reg:CC cc)
>>              (const_int 0))
>>         (clz:SI (reg:SI r2))
>>         (const_int 32)))
>>
>> which is not enough information to perform the simplification.
>>
>> This patch implements the optimisation in ce1 using the noce ifcvt
>> framework.
>> During ifcvt noce_process_if_block can see that we're trying to optimise
>> something
>> of the form (x == 0 ? const_int : CLZ (x)) and so it has visibility of all
>> the information
>> needed to perform the transformation.
>>
>> The transformation is performed by adding a new noce_try* function that
>> tries to put the
>> condition and the 'then' and 'else' arms into an IF_THEN_ELSE rtx and try to
>> simplify that
>> using the simplify-rtx machinery. That way, we can implement the
>> simplification logic in
>> simplify-rtx.c where it belongs.
>>
>> A similar transformation for CTZ is implemented as well.
>> So for code:
>> int foo (int i)
>> {
>>    return (i == 0) ? 32 : __builtin_clz (i);
>> }
>>
>> On aarch64 we now emit:
>> foo:
>>          clz     w0, w0
>>          ret
>>
>> instead of:
>> foo:
>>          mov     w1, 32
>>          clz     w2, w0
>>          cmp     w0, 0
>>          csel    w0, w2, w1, ne
>>          ret
>>
>> and for arm similarly we generate:
>> foo:
>>          clz     r0, r0
>>          bx      lr
>>
>> instead of:
>> foo:
>>          cmp     r0, #0
>>          clzne   r0, r0
>>          moveq   r0, #32
>>          bx      lr
>>
>>
>> and for x86_64 with -O2 -mlzcnt we generate:
>> foo:
>>          xorl    %eax, %eax
>>          lzcntl  %edi, %eax
>>          ret
>>
>> instead of:
>> foo:
>>          xorl    %eax, %eax
>>          movl    $32, %edx
>>          lzcntl  %edi, %eax
>>          testl   %edi, %edi
>>          cmove   %edx, %eax
>>          ret
>>
>>
>> I tried getting this to work on other targets as well, but encountered
>> difficulties.
>> For example on powerpc the two arms of the condition seen during ifcvt are:
>>
>> (insn 4 22 11 4 (set (reg:DI 156 [ <retval> ])
>>          (const_int 32 [0x20])) clz.c:3 434 {*movdi_internal64}
>>       (nil))
>> and
>> (insn 10 9 23 3 (set (subreg/s/u:SI (reg:DI 156 [ <retval> ]) 0)
>>          (clz:SI (subreg/u:SI (reg/v:DI 157 [ i ]) 0))) clz.c:3 132 {clzsi2}
>>       (expr_list:REG_DEAD (reg/v:DI 157 [ i ])
>>          (nil)))
>>
>> So the setup code in noce_process_if_block sees that the set destination is
>> not the same
>> ((reg:DI 156 [ <retval> ]) and (subreg/s/u:SI (reg:DI 156 [ <retval> ]) 0))
>> so it bails out on the rtx_interchangeable_p (x, SET_DEST (set_b)) check.
>> I suppose that's a consequence of how SImode operations are represented in
>> early RTL
>> on powerpc, I don't know what to do there. Perhaps that part of ivcvt can be
>> taught to handle
>> destinations that are subregs of one another, but that would be a separate
>> patch.
>>
>> Anyway, is this patch ok for trunk?
>>
>> Bootstrapped and tested on arm-none-linux-gnueabihf, aarch64-none-linux-gnu,
>> x86_64-pc-linux-gnu.
>>
>> Thanks,
>> Kyrill
>>
>> 2016-05-26  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>>
>>      PR middle-end/37780
>>      * ifcvt.c (noce_try_ifelse_collapse): New function.
>>      Declare prototype.
>>      (noce_process_if_block): Call noce_try_ifelse_collapse.
>>      * simplify-rtx.c (simplify_cond_clz_ctz): New function.
>>      (simplify_ternary_operation): Use the above to simplify
>>      conditional CLZ/CTZ expressions.
>>
>> 2016-05-26  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>>
>>      PR middle-end/37780
>>      * gcc.c-torture/execute/pr37780.c: New test.
>>      * gcc.target/aarch64/pr37780_1.c: Likewise.
>>      * gcc.target/arm/pr37780_1.c: Likewise.
> Hi Kyrylo,

Hi Christophe,

> I've noticed that gcc.target/arm/pr37780_1.c fails on
> arm if arch < v6.
> I first tried to fix the effective-target guard (IIRC, the doc
> says clz is available starting with v5t), but that isn't sufficient.
>
> When compiling for armv5-t, the scan-assembler directives
> fail. It seems to work with v6t2, so I am wondering whether
> it's just a matter of increasing the effective-target arch version,
> or if you really intended to make the test pass on these old
> architectures?

I've dug into it a bit.
I think the problem is that CLZ is available with ARMv5T but only in arm mode.
In Thumb mode it's only available with ARMv6T2.
So if your armv5t gcc compiles for Thumb by default you'll get the failure.
I think just bumping the effective to armv6t2 here is appropriate as I intended
the test to just check the ifcvt transformation when the instruction is available
rather than test that the CLZ instruction is available at one architecture level
or another.

A patch to bump the effective target check is pre-approved if you want to do it.
Otherwise I'll get to it in a few days.

Thanks for spotting this,
Kyrill
> Thanks,
>
> Christophe.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH][3/3][RTL ifcvt] PR middle-end/37780: Conditional expression with __builtin_clz() should be optimized out
  2016-06-08 16:40   ` Kyrill Tkachov
@ 2016-06-09 12:15     ` Christophe Lyon
  2016-06-10  8:38       ` Kyrill Tkachov
  0 siblings, 1 reply; 12+ messages in thread
From: Christophe Lyon @ 2016-06-09 12:15 UTC (permalink / raw)
  To: Kyrill Tkachov; +Cc: GCC Patches

On 8 June 2016 at 18:40, Kyrill Tkachov <kyrylo.tkachov@foss.arm.com> wrote:
>
> On 07/06/16 20:34, Christophe Lyon wrote:
>>
>> On 26 May 2016 at 11:53, Kyrill Tkachov <kyrylo.tkachov@foss.arm.com>
>> wrote:
>>>
>>> Hi all,
>>>
>>> In this PR we want to optimise:
>>> int foo (int i)
>>> {
>>>    return (i == 0) ? N : __builtin_clz (i);
>>> }
>>>
>>> on targets where CLZ is defined at zero to the constant 'N'.
>>> This is determined at the RTL level through the CLZ_DEFINED_VALUE_AT_ZERO
>>> macro.
>>> The obvious place to implement this would be in combine through
>>> simplify-rtx
>>> where we'd
>>> recognise an IF_THEN_ELSE of the form:
>>> (set (reg:SI r1)
>>>       (if_then_else:SI (ne (reg:SI r2)
>>>                            (const_int 0 [0]))
>>>         (clz:SI (reg:SI r2))
>>>         (const_int 32)))
>>>
>>> and if CLZ_DEFINED_VALUE_AT_ZERO is defined to 32 for SImode we'd
>>> simplify
>>> it into
>>> just (clz:SI (reg:SI r2)).
>>> However, I found this doesn't quite happen for a couple of reasons:
>>> 1) This depends on ifcvt or some other pass to have created a conditional
>>> move of the
>>> two branches that provide the IF_THEN_ELSE to propagate the const_int and
>>> clz operation into.
>>>
>>> 2) Combine will refuse to propagate r2 from the above example into both
>>> the
>>> condition and the
>>> CLZ at the same time, so the most we see is:
>>> (set (reg:SI r1)
>>>       (if_then_else:SI (ne (reg:CC cc)
>>>              (const_int 0))
>>>         (clz:SI (reg:SI r2))
>>>         (const_int 32)))
>>>
>>> which is not enough information to perform the simplification.
>>>
>>> This patch implements the optimisation in ce1 using the noce ifcvt
>>> framework.
>>> During ifcvt noce_process_if_block can see that we're trying to optimise
>>> something
>>> of the form (x == 0 ? const_int : CLZ (x)) and so it has visibility of
>>> all
>>> the information
>>> needed to perform the transformation.
>>>
>>> The transformation is performed by adding a new noce_try* function that
>>> tries to put the
>>> condition and the 'then' and 'else' arms into an IF_THEN_ELSE rtx and try
>>> to
>>> simplify that
>>> using the simplify-rtx machinery. That way, we can implement the
>>> simplification logic in
>>> simplify-rtx.c where it belongs.
>>>
>>> A similar transformation for CTZ is implemented as well.
>>> So for code:
>>> int foo (int i)
>>> {
>>>    return (i == 0) ? 32 : __builtin_clz (i);
>>> }
>>>
>>> On aarch64 we now emit:
>>> foo:
>>>          clz     w0, w0
>>>          ret
>>>
>>> instead of:
>>> foo:
>>>          mov     w1, 32
>>>          clz     w2, w0
>>>          cmp     w0, 0
>>>          csel    w0, w2, w1, ne
>>>          ret
>>>
>>> and for arm similarly we generate:
>>> foo:
>>>          clz     r0, r0
>>>          bx      lr
>>>
>>> instead of:
>>> foo:
>>>          cmp     r0, #0
>>>          clzne   r0, r0
>>>          moveq   r0, #32
>>>          bx      lr
>>>
>>>
>>> and for x86_64 with -O2 -mlzcnt we generate:
>>> foo:
>>>          xorl    %eax, %eax
>>>          lzcntl  %edi, %eax
>>>          ret
>>>
>>> instead of:
>>> foo:
>>>          xorl    %eax, %eax
>>>          movl    $32, %edx
>>>          lzcntl  %edi, %eax
>>>          testl   %edi, %edi
>>>          cmove   %edx, %eax
>>>          ret
>>>
>>>
>>> I tried getting this to work on other targets as well, but encountered
>>> difficulties.
>>> For example on powerpc the two arms of the condition seen during ifcvt
>>> are:
>>>
>>> (insn 4 22 11 4 (set (reg:DI 156 [ <retval> ])
>>>          (const_int 32 [0x20])) clz.c:3 434 {*movdi_internal64}
>>>       (nil))
>>> and
>>> (insn 10 9 23 3 (set (subreg/s/u:SI (reg:DI 156 [ <retval> ]) 0)
>>>          (clz:SI (subreg/u:SI (reg/v:DI 157 [ i ]) 0))) clz.c:3 132
>>> {clzsi2}
>>>       (expr_list:REG_DEAD (reg/v:DI 157 [ i ])
>>>          (nil)))
>>>
>>> So the setup code in noce_process_if_block sees that the set destination
>>> is
>>> not the same
>>> ((reg:DI 156 [ <retval> ]) and (subreg/s/u:SI (reg:DI 156 [ <retval> ])
>>> 0))
>>> so it bails out on the rtx_interchangeable_p (x, SET_DEST (set_b)) check.
>>> I suppose that's a consequence of how SImode operations are represented
>>> in
>>> early RTL
>>> on powerpc, I don't know what to do there. Perhaps that part of ivcvt can
>>> be
>>> taught to handle
>>> destinations that are subregs of one another, but that would be a
>>> separate
>>> patch.
>>>
>>> Anyway, is this patch ok for trunk?
>>>
>>> Bootstrapped and tested on arm-none-linux-gnueabihf,
>>> aarch64-none-linux-gnu,
>>> x86_64-pc-linux-gnu.
>>>
>>> Thanks,
>>> Kyrill
>>>
>>> 2016-05-26  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>>>
>>>      PR middle-end/37780
>>>      * ifcvt.c (noce_try_ifelse_collapse): New function.
>>>      Declare prototype.
>>>      (noce_process_if_block): Call noce_try_ifelse_collapse.
>>>      * simplify-rtx.c (simplify_cond_clz_ctz): New function.
>>>      (simplify_ternary_operation): Use the above to simplify
>>>      conditional CLZ/CTZ expressions.
>>>
>>> 2016-05-26  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>>>
>>>      PR middle-end/37780
>>>      * gcc.c-torture/execute/pr37780.c: New test.
>>>      * gcc.target/aarch64/pr37780_1.c: Likewise.
>>>      * gcc.target/arm/pr37780_1.c: Likewise.
>>
>> Hi Kyrylo,
>
>
> Hi Christophe,
>
>> I've noticed that gcc.target/arm/pr37780_1.c fails on
>> arm if arch < v6.
>> I first tried to fix the effective-target guard (IIRC, the doc
>> says clz is available starting with v5t), but that isn't sufficient.
>>
>> When compiling for armv5-t, the scan-assembler directives
>> fail. It seems to work with v6t2, so I am wondering whether
>> it's just a matter of increasing the effective-target arch version,
>> or if you really intended to make the test pass on these old
>> architectures?
>
>
> I've dug into it a bit.
> I think the problem is that CLZ is available with ARMv5T but only in arm
> mode.
> In Thumb mode it's only available with ARMv6T2.
> So if your armv5t gcc compiles for Thumb by default you'll get the failure.

Actually this gcc is configured with default cpu & mode, target=arm-none-eabi.

So I think it's arm7tdmi, arm mode.

> I think just bumping the effective to armv6t2 here is appropriate as I
> intended
> the test to just check the ifcvt transformation when the instruction is
> available
> rather than test that the CLZ instruction is available at one architecture
> level
> or another.
>
> A patch to bump the effective target check is pre-approved if you want to do
> it.
> Otherwise I'll get to it in a few days.
>
I was a about to commit the arm_arch_v5_ok -> arm_arch_v6t2_ok, but
this would not be sufficient without adding
dg-add-options arm_arch_v6t2
but then if gcc is configured with a higher default architecture, we'll end
up testing if works for v6t2 only.

> Thanks for spotting this,
> Kyrill
>>
>> Thanks,
>>
>> Christophe.
>
>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH][3/3][RTL ifcvt] PR middle-end/37780: Conditional expression with __builtin_clz() should be optimized out
  2016-06-09 12:15     ` Christophe Lyon
@ 2016-06-10  8:38       ` Kyrill Tkachov
  2016-06-10 13:39         ` Christophe Lyon
  0 siblings, 1 reply; 12+ messages in thread
From: Kyrill Tkachov @ 2016-06-10  8:38 UTC (permalink / raw)
  To: Christophe Lyon; +Cc: GCC Patches


On 09/06/16 13:14, Christophe Lyon wrote:
> On 8 June 2016 at 18:40, Kyrill Tkachov <kyrylo.tkachov@foss.arm.com> wrote:
>> On 07/06/16 20:34, Christophe Lyon wrote:
>>> On 26 May 2016 at 11:53, Kyrill Tkachov <kyrylo.tkachov@foss.arm.com>
>>> wrote:
>>>> Hi all,
>>>>
>>>> In this PR we want to optimise:
>>>> int foo (int i)
>>>> {
>>>>     return (i == 0) ? N : __builtin_clz (i);
>>>> }
>>>>
>>>> on targets where CLZ is defined at zero to the constant 'N'.
>>>> This is determined at the RTL level through the CLZ_DEFINED_VALUE_AT_ZERO
>>>> macro.
>>>> The obvious place to implement this would be in combine through
>>>> simplify-rtx
>>>> where we'd
>>>> recognise an IF_THEN_ELSE of the form:
>>>> (set (reg:SI r1)
>>>>        (if_then_else:SI (ne (reg:SI r2)
>>>>                             (const_int 0 [0]))
>>>>          (clz:SI (reg:SI r2))
>>>>          (const_int 32)))
>>>>
>>>> and if CLZ_DEFINED_VALUE_AT_ZERO is defined to 32 for SImode we'd
>>>> simplify
>>>> it into
>>>> just (clz:SI (reg:SI r2)).
>>>> However, I found this doesn't quite happen for a couple of reasons:
>>>> 1) This depends on ifcvt or some other pass to have created a conditional
>>>> move of the
>>>> two branches that provide the IF_THEN_ELSE to propagate the const_int and
>>>> clz operation into.
>>>>
>>>> 2) Combine will refuse to propagate r2 from the above example into both
>>>> the
>>>> condition and the
>>>> CLZ at the same time, so the most we see is:
>>>> (set (reg:SI r1)
>>>>        (if_then_else:SI (ne (reg:CC cc)
>>>>               (const_int 0))
>>>>          (clz:SI (reg:SI r2))
>>>>          (const_int 32)))
>>>>
>>>> which is not enough information to perform the simplification.
>>>>
>>>> This patch implements the optimisation in ce1 using the noce ifcvt
>>>> framework.
>>>> During ifcvt noce_process_if_block can see that we're trying to optimise
>>>> something
>>>> of the form (x == 0 ? const_int : CLZ (x)) and so it has visibility of
>>>> all
>>>> the information
>>>> needed to perform the transformation.
>>>>
>>>> The transformation is performed by adding a new noce_try* function that
>>>> tries to put the
>>>> condition and the 'then' and 'else' arms into an IF_THEN_ELSE rtx and try
>>>> to
>>>> simplify that
>>>> using the simplify-rtx machinery. That way, we can implement the
>>>> simplification logic in
>>>> simplify-rtx.c where it belongs.
>>>>
>>>> A similar transformation for CTZ is implemented as well.
>>>> So for code:
>>>> int foo (int i)
>>>> {
>>>>     return (i == 0) ? 32 : __builtin_clz (i);
>>>> }
>>>>
>>>> On aarch64 we now emit:
>>>> foo:
>>>>           clz     w0, w0
>>>>           ret
>>>>
>>>> instead of:
>>>> foo:
>>>>           mov     w1, 32
>>>>           clz     w2, w0
>>>>           cmp     w0, 0
>>>>           csel    w0, w2, w1, ne
>>>>           ret
>>>>
>>>> and for arm similarly we generate:
>>>> foo:
>>>>           clz     r0, r0
>>>>           bx      lr
>>>>
>>>> instead of:
>>>> foo:
>>>>           cmp     r0, #0
>>>>           clzne   r0, r0
>>>>           moveq   r0, #32
>>>>           bx      lr
>>>>
>>>>
>>>> and for x86_64 with -O2 -mlzcnt we generate:
>>>> foo:
>>>>           xorl    %eax, %eax
>>>>           lzcntl  %edi, %eax
>>>>           ret
>>>>
>>>> instead of:
>>>> foo:
>>>>           xorl    %eax, %eax
>>>>           movl    $32, %edx
>>>>           lzcntl  %edi, %eax
>>>>           testl   %edi, %edi
>>>>           cmove   %edx, %eax
>>>>           ret
>>>>
>>>>
>>>> I tried getting this to work on other targets as well, but encountered
>>>> difficulties.
>>>> For example on powerpc the two arms of the condition seen during ifcvt
>>>> are:
>>>>
>>>> (insn 4 22 11 4 (set (reg:DI 156 [ <retval> ])
>>>>           (const_int 32 [0x20])) clz.c:3 434 {*movdi_internal64}
>>>>        (nil))
>>>> and
>>>> (insn 10 9 23 3 (set (subreg/s/u:SI (reg:DI 156 [ <retval> ]) 0)
>>>>           (clz:SI (subreg/u:SI (reg/v:DI 157 [ i ]) 0))) clz.c:3 132
>>>> {clzsi2}
>>>>        (expr_list:REG_DEAD (reg/v:DI 157 [ i ])
>>>>           (nil)))
>>>>
>>>> So the setup code in noce_process_if_block sees that the set destination
>>>> is
>>>> not the same
>>>> ((reg:DI 156 [ <retval> ]) and (subreg/s/u:SI (reg:DI 156 [ <retval> ])
>>>> 0))
>>>> so it bails out on the rtx_interchangeable_p (x, SET_DEST (set_b)) check.
>>>> I suppose that's a consequence of how SImode operations are represented
>>>> in
>>>> early RTL
>>>> on powerpc, I don't know what to do there. Perhaps that part of ivcvt can
>>>> be
>>>> taught to handle
>>>> destinations that are subregs of one another, but that would be a
>>>> separate
>>>> patch.
>>>>
>>>> Anyway, is this patch ok for trunk?
>>>>
>>>> Bootstrapped and tested on arm-none-linux-gnueabihf,
>>>> aarch64-none-linux-gnu,
>>>> x86_64-pc-linux-gnu.
>>>>
>>>> Thanks,
>>>> Kyrill
>>>>
>>>> 2016-05-26  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>>>>
>>>>       PR middle-end/37780
>>>>       * ifcvt.c (noce_try_ifelse_collapse): New function.
>>>>       Declare prototype.
>>>>       (noce_process_if_block): Call noce_try_ifelse_collapse.
>>>>       * simplify-rtx.c (simplify_cond_clz_ctz): New function.
>>>>       (simplify_ternary_operation): Use the above to simplify
>>>>       conditional CLZ/CTZ expressions.
>>>>
>>>> 2016-05-26  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>>>>
>>>>       PR middle-end/37780
>>>>       * gcc.c-torture/execute/pr37780.c: New test.
>>>>       * gcc.target/aarch64/pr37780_1.c: Likewise.
>>>>       * gcc.target/arm/pr37780_1.c: Likewise.
>>> Hi Kyrylo,
>>
>> Hi Christophe,
>>
>>> I've noticed that gcc.target/arm/pr37780_1.c fails on
>>> arm if arch < v6.
>>> I first tried to fix the effective-target guard (IIRC, the doc
>>> says clz is available starting with v5t), but that isn't sufficient.
>>>
>>> When compiling for armv5-t, the scan-assembler directives
>>> fail. It seems to work with v6t2, so I am wondering whether
>>> it's just a matter of increasing the effective-target arch version,
>>> or if you really intended to make the test pass on these old
>>> architectures?
>>
>> I've dug into it a bit.
>> I think the problem is that CLZ is available with ARMv5T but only in arm
>> mode.
>> In Thumb mode it's only available with ARMv6T2.
>> So if your armv5t gcc compiles for Thumb by default you'll get the failure.
> Actually this gcc is configured with default cpu & mode, target=arm-none-eabi.
>
> So I think it's arm7tdmi, arm mode.
>
>> I think just bumping the effective to armv6t2 here is appropriate as I
>> intended
>> the test to just check the ifcvt transformation when the instruction is
>> available
>> rather than test that the CLZ instruction is available at one architecture
>> level
>> or another.
>>
>> A patch to bump the effective target check is pre-approved if you want to do
>> it.
>> Otherwise I'll get to it in a few days.
>>
> I was a about to commit the arm_arch_v5_ok -> arm_arch_v6t2_ok, but
> this would not be sufficient without adding
> dg-add-options arm_arch_v6t2
> but then if gcc is configured with a higher default architecture, we'll end
> up testing if works for v6t2 only.

I'm okay with adding "dg-add-options arm_arch_v6t2".
We're not testing the availability of the instruction here, just the ifcvt
transformation when it *is* available, so I think we should just add whatever
option guarantees that instruction is available.

Kyrill

>> Thanks for spotting this,
>> Kyrill
>>> Thanks,
>>>
>>> Christophe.
>>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH][3/3][RTL ifcvt] PR middle-end/37780: Conditional expression with __builtin_clz() should be optimized out
  2016-06-10  8:38       ` Kyrill Tkachov
@ 2016-06-10 13:39         ` Christophe Lyon
  2016-06-10 13:41           ` Kyrill Tkachov
  0 siblings, 1 reply; 12+ messages in thread
From: Christophe Lyon @ 2016-06-10 13:39 UTC (permalink / raw)
  To: Kyrill Tkachov; +Cc: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 8063 bytes --]

On 10 June 2016 at 10:38, Kyrill Tkachov <kyrylo.tkachov@foss.arm.com> wrote:
>
> On 09/06/16 13:14, Christophe Lyon wrote:
>>
>> On 8 June 2016 at 18:40, Kyrill Tkachov <kyrylo.tkachov@foss.arm.com>
>> wrote:
>>>
>>> On 07/06/16 20:34, Christophe Lyon wrote:
>>>>
>>>> On 26 May 2016 at 11:53, Kyrill Tkachov <kyrylo.tkachov@foss.arm.com>
>>>> wrote:
>>>>>
>>>>> Hi all,
>>>>>
>>>>> In this PR we want to optimise:
>>>>> int foo (int i)
>>>>> {
>>>>>     return (i == 0) ? N : __builtin_clz (i);
>>>>> }
>>>>>
>>>>> on targets where CLZ is defined at zero to the constant 'N'.
>>>>> This is determined at the RTL level through the
>>>>> CLZ_DEFINED_VALUE_AT_ZERO
>>>>> macro.
>>>>> The obvious place to implement this would be in combine through
>>>>> simplify-rtx
>>>>> where we'd
>>>>> recognise an IF_THEN_ELSE of the form:
>>>>> (set (reg:SI r1)
>>>>>        (if_then_else:SI (ne (reg:SI r2)
>>>>>                             (const_int 0 [0]))
>>>>>          (clz:SI (reg:SI r2))
>>>>>          (const_int 32)))
>>>>>
>>>>> and if CLZ_DEFINED_VALUE_AT_ZERO is defined to 32 for SImode we'd
>>>>> simplify
>>>>> it into
>>>>> just (clz:SI (reg:SI r2)).
>>>>> However, I found this doesn't quite happen for a couple of reasons:
>>>>> 1) This depends on ifcvt or some other pass to have created a
>>>>> conditional
>>>>> move of the
>>>>> two branches that provide the IF_THEN_ELSE to propagate the const_int
>>>>> and
>>>>> clz operation into.
>>>>>
>>>>> 2) Combine will refuse to propagate r2 from the above example into both
>>>>> the
>>>>> condition and the
>>>>> CLZ at the same time, so the most we see is:
>>>>> (set (reg:SI r1)
>>>>>        (if_then_else:SI (ne (reg:CC cc)
>>>>>               (const_int 0))
>>>>>          (clz:SI (reg:SI r2))
>>>>>          (const_int 32)))
>>>>>
>>>>> which is not enough information to perform the simplification.
>>>>>
>>>>> This patch implements the optimisation in ce1 using the noce ifcvt
>>>>> framework.
>>>>> During ifcvt noce_process_if_block can see that we're trying to
>>>>> optimise
>>>>> something
>>>>> of the form (x == 0 ? const_int : CLZ (x)) and so it has visibility of
>>>>> all
>>>>> the information
>>>>> needed to perform the transformation.
>>>>>
>>>>> The transformation is performed by adding a new noce_try* function that
>>>>> tries to put the
>>>>> condition and the 'then' and 'else' arms into an IF_THEN_ELSE rtx and
>>>>> try
>>>>> to
>>>>> simplify that
>>>>> using the simplify-rtx machinery. That way, we can implement the
>>>>> simplification logic in
>>>>> simplify-rtx.c where it belongs.
>>>>>
>>>>> A similar transformation for CTZ is implemented as well.
>>>>> So for code:
>>>>> int foo (int i)
>>>>> {
>>>>>     return (i == 0) ? 32 : __builtin_clz (i);
>>>>> }
>>>>>
>>>>> On aarch64 we now emit:
>>>>> foo:
>>>>>           clz     w0, w0
>>>>>           ret
>>>>>
>>>>> instead of:
>>>>> foo:
>>>>>           mov     w1, 32
>>>>>           clz     w2, w0
>>>>>           cmp     w0, 0
>>>>>           csel    w0, w2, w1, ne
>>>>>           ret
>>>>>
>>>>> and for arm similarly we generate:
>>>>> foo:
>>>>>           clz     r0, r0
>>>>>           bx      lr
>>>>>
>>>>> instead of:
>>>>> foo:
>>>>>           cmp     r0, #0
>>>>>           clzne   r0, r0
>>>>>           moveq   r0, #32
>>>>>           bx      lr
>>>>>
>>>>>
>>>>> and for x86_64 with -O2 -mlzcnt we generate:
>>>>> foo:
>>>>>           xorl    %eax, %eax
>>>>>           lzcntl  %edi, %eax
>>>>>           ret
>>>>>
>>>>> instead of:
>>>>> foo:
>>>>>           xorl    %eax, %eax
>>>>>           movl    $32, %edx
>>>>>           lzcntl  %edi, %eax
>>>>>           testl   %edi, %edi
>>>>>           cmove   %edx, %eax
>>>>>           ret
>>>>>
>>>>>
>>>>> I tried getting this to work on other targets as well, but encountered
>>>>> difficulties.
>>>>> For example on powerpc the two arms of the condition seen during ifcvt
>>>>> are:
>>>>>
>>>>> (insn 4 22 11 4 (set (reg:DI 156 [ <retval> ])
>>>>>           (const_int 32 [0x20])) clz.c:3 434 {*movdi_internal64}
>>>>>        (nil))
>>>>> and
>>>>> (insn 10 9 23 3 (set (subreg/s/u:SI (reg:DI 156 [ <retval> ]) 0)
>>>>>           (clz:SI (subreg/u:SI (reg/v:DI 157 [ i ]) 0))) clz.c:3 132
>>>>> {clzsi2}
>>>>>        (expr_list:REG_DEAD (reg/v:DI 157 [ i ])
>>>>>           (nil)))
>>>>>
>>>>> So the setup code in noce_process_if_block sees that the set
>>>>> destination
>>>>> is
>>>>> not the same
>>>>> ((reg:DI 156 [ <retval> ]) and (subreg/s/u:SI (reg:DI 156 [ <retval> ])
>>>>> 0))
>>>>> so it bails out on the rtx_interchangeable_p (x, SET_DEST (set_b))
>>>>> check.
>>>>> I suppose that's a consequence of how SImode operations are represented
>>>>> in
>>>>> early RTL
>>>>> on powerpc, I don't know what to do there. Perhaps that part of ivcvt
>>>>> can
>>>>> be
>>>>> taught to handle
>>>>> destinations that are subregs of one another, but that would be a
>>>>> separate
>>>>> patch.
>>>>>
>>>>> Anyway, is this patch ok for trunk?
>>>>>
>>>>> Bootstrapped and tested on arm-none-linux-gnueabihf,
>>>>> aarch64-none-linux-gnu,
>>>>> x86_64-pc-linux-gnu.
>>>>>
>>>>> Thanks,
>>>>> Kyrill
>>>>>
>>>>> 2016-05-26  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>>>>>
>>>>>       PR middle-end/37780
>>>>>       * ifcvt.c (noce_try_ifelse_collapse): New function.
>>>>>       Declare prototype.
>>>>>       (noce_process_if_block): Call noce_try_ifelse_collapse.
>>>>>       * simplify-rtx.c (simplify_cond_clz_ctz): New function.
>>>>>       (simplify_ternary_operation): Use the above to simplify
>>>>>       conditional CLZ/CTZ expressions.
>>>>>
>>>>> 2016-05-26  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>>>>>
>>>>>       PR middle-end/37780
>>>>>       * gcc.c-torture/execute/pr37780.c: New test.
>>>>>       * gcc.target/aarch64/pr37780_1.c: Likewise.
>>>>>       * gcc.target/arm/pr37780_1.c: Likewise.
>>>>
>>>> Hi Kyrylo,
>>>
>>>
>>> Hi Christophe,
>>>
>>>> I've noticed that gcc.target/arm/pr37780_1.c fails on
>>>> arm if arch < v6.
>>>> I first tried to fix the effective-target guard (IIRC, the doc
>>>> says clz is available starting with v5t), but that isn't sufficient.
>>>>
>>>> When compiling for armv5-t, the scan-assembler directives
>>>> fail. It seems to work with v6t2, so I am wondering whether
>>>> it's just a matter of increasing the effective-target arch version,
>>>> or if you really intended to make the test pass on these old
>>>> architectures?
>>>
>>>
>>> I've dug into it a bit.
>>> I think the problem is that CLZ is available with ARMv5T but only in arm
>>> mode.
>>> In Thumb mode it's only available with ARMv6T2.
>>> So if your armv5t gcc compiles for Thumb by default you'll get the
>>> failure.
>>
>> Actually this gcc is configured with default cpu & mode,
>> target=arm-none-eabi.
>>
>> So I think it's arm7tdmi, arm mode.
>>
>>> I think just bumping the effective to armv6t2 here is appropriate as I
>>> intended
>>> the test to just check the ifcvt transformation when the instruction is
>>> available
>>> rather than test that the CLZ instruction is available at one
>>> architecture
>>> level
>>> or another.
>>>
>>> A patch to bump the effective target check is pre-approved if you want to
>>> do
>>> it.
>>> Otherwise I'll get to it in a few days.
>>>
>> I was a about to commit the arm_arch_v5_ok -> arm_arch_v6t2_ok, but
>> this would not be sufficient without adding
>> dg-add-options arm_arch_v6t2
>> but then if gcc is configured with a higher default architecture, we'll
>> end
>> up testing if works for v6t2 only.
>
>
> I'm okay with adding "dg-add-options arm_arch_v6t2".
> We're not testing the availability of the instruction here, just the ifcvt
> transformation when it *is* available, so I think we should just add
> whatever
> option guarantees that instruction is available.
>
> Kyrill
>

OK, I've committed the attached patch.

2016-06-10  Christophe Lyon  <christophe.lyon@linaro.org>

        * gcc.target/arm/pr37780_1.c: Use arm_arch_v6t2 effective target
        and options.

>
>>> Thanks for spotting this,
>>> Kyrill
>>>>
>>>> Thanks,
>>>>
>>>> Christophe.
>>>
>>>
>

[-- Attachment #2: commit.txt --]
[-- Type: text/plain, Size: 496 bytes --]

diff --git a/gcc/testsuite/gcc.target/arm/pr37780_1.c b/gcc/testsuite/gcc.target/arm/pr37780_1.c
index b954fe5..8e06920 100644
--- a/gcc/testsuite/gcc.target/arm/pr37780_1.c
+++ b/gcc/testsuite/gcc.target/arm/pr37780_1.c
@@ -2,8 +2,9 @@
    being defined at zero.  */
 
 /* { dg-do compile } */
-/* { dg-require-effective-target arm_arch_v5_ok } */
+/* { dg-require-effective-target arm_arch_v6t2_ok } */
 /* { dg-options "-O2" } */
+/* { dg-add-options arm_arch_v6t2 } */
 
 int
 fooctz (int i)

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH][3/3][RTL ifcvt] PR middle-end/37780: Conditional expression with __builtin_clz() should be optimized out
  2016-06-10 13:39         ` Christophe Lyon
@ 2016-06-10 13:41           ` Kyrill Tkachov
  0 siblings, 0 replies; 12+ messages in thread
From: Kyrill Tkachov @ 2016-06-10 13:41 UTC (permalink / raw)
  To: Christophe Lyon; +Cc: GCC Patches


On 10/06/16 14:39, Christophe Lyon wrote:
> On 10 June 2016 at 10:38, Kyrill Tkachov <kyrylo.tkachov@foss.arm.com> wrote:
>> On 09/06/16 13:14, Christophe Lyon wrote:
>>> On 8 June 2016 at 18:40, Kyrill Tkachov <kyrylo.tkachov@foss.arm.com>
>>> wrote:
>>>> On 07/06/16 20:34, Christophe Lyon wrote:
>>>>> On 26 May 2016 at 11:53, Kyrill Tkachov <kyrylo.tkachov@foss.arm.com>
>>>>> wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> In this PR we want to optimise:
>>>>>> int foo (int i)
>>>>>> {
>>>>>>      return (i == 0) ? N : __builtin_clz (i);
>>>>>> }
>>>>>>
>>>>>> on targets where CLZ is defined at zero to the constant 'N'.
>>>>>> This is determined at the RTL level through the
>>>>>> CLZ_DEFINED_VALUE_AT_ZERO
>>>>>> macro.
>>>>>> The obvious place to implement this would be in combine through
>>>>>> simplify-rtx
>>>>>> where we'd
>>>>>> recognise an IF_THEN_ELSE of the form:
>>>>>> (set (reg:SI r1)
>>>>>>         (if_then_else:SI (ne (reg:SI r2)
>>>>>>                              (const_int 0 [0]))
>>>>>>           (clz:SI (reg:SI r2))
>>>>>>           (const_int 32)))
>>>>>>
>>>>>> and if CLZ_DEFINED_VALUE_AT_ZERO is defined to 32 for SImode we'd
>>>>>> simplify
>>>>>> it into
>>>>>> just (clz:SI (reg:SI r2)).
>>>>>> However, I found this doesn't quite happen for a couple of reasons:
>>>>>> 1) This depends on ifcvt or some other pass to have created a
>>>>>> conditional
>>>>>> move of the
>>>>>> two branches that provide the IF_THEN_ELSE to propagate the const_int
>>>>>> and
>>>>>> clz operation into.
>>>>>>
>>>>>> 2) Combine will refuse to propagate r2 from the above example into both
>>>>>> the
>>>>>> condition and the
>>>>>> CLZ at the same time, so the most we see is:
>>>>>> (set (reg:SI r1)
>>>>>>         (if_then_else:SI (ne (reg:CC cc)
>>>>>>                (const_int 0))
>>>>>>           (clz:SI (reg:SI r2))
>>>>>>           (const_int 32)))
>>>>>>
>>>>>> which is not enough information to perform the simplification.
>>>>>>
>>>>>> This patch implements the optimisation in ce1 using the noce ifcvt
>>>>>> framework.
>>>>>> During ifcvt noce_process_if_block can see that we're trying to
>>>>>> optimise
>>>>>> something
>>>>>> of the form (x == 0 ? const_int : CLZ (x)) and so it has visibility of
>>>>>> all
>>>>>> the information
>>>>>> needed to perform the transformation.
>>>>>>
>>>>>> The transformation is performed by adding a new noce_try* function that
>>>>>> tries to put the
>>>>>> condition and the 'then' and 'else' arms into an IF_THEN_ELSE rtx and
>>>>>> try
>>>>>> to
>>>>>> simplify that
>>>>>> using the simplify-rtx machinery. That way, we can implement the
>>>>>> simplification logic in
>>>>>> simplify-rtx.c where it belongs.
>>>>>>
>>>>>> A similar transformation for CTZ is implemented as well.
>>>>>> So for code:
>>>>>> int foo (int i)
>>>>>> {
>>>>>>      return (i == 0) ? 32 : __builtin_clz (i);
>>>>>> }
>>>>>>
>>>>>> On aarch64 we now emit:
>>>>>> foo:
>>>>>>            clz     w0, w0
>>>>>>            ret
>>>>>>
>>>>>> instead of:
>>>>>> foo:
>>>>>>            mov     w1, 32
>>>>>>            clz     w2, w0
>>>>>>            cmp     w0, 0
>>>>>>            csel    w0, w2, w1, ne
>>>>>>            ret
>>>>>>
>>>>>> and for arm similarly we generate:
>>>>>> foo:
>>>>>>            clz     r0, r0
>>>>>>            bx      lr
>>>>>>
>>>>>> instead of:
>>>>>> foo:
>>>>>>            cmp     r0, #0
>>>>>>            clzne   r0, r0
>>>>>>            moveq   r0, #32
>>>>>>            bx      lr
>>>>>>
>>>>>>
>>>>>> and for x86_64 with -O2 -mlzcnt we generate:
>>>>>> foo:
>>>>>>            xorl    %eax, %eax
>>>>>>            lzcntl  %edi, %eax
>>>>>>            ret
>>>>>>
>>>>>> instead of:
>>>>>> foo:
>>>>>>            xorl    %eax, %eax
>>>>>>            movl    $32, %edx
>>>>>>            lzcntl  %edi, %eax
>>>>>>            testl   %edi, %edi
>>>>>>            cmove   %edx, %eax
>>>>>>            ret
>>>>>>
>>>>>>
>>>>>> I tried getting this to work on other targets as well, but encountered
>>>>>> difficulties.
>>>>>> For example on powerpc the two arms of the condition seen during ifcvt
>>>>>> are:
>>>>>>
>>>>>> (insn 4 22 11 4 (set (reg:DI 156 [ <retval> ])
>>>>>>            (const_int 32 [0x20])) clz.c:3 434 {*movdi_internal64}
>>>>>>         (nil))
>>>>>> and
>>>>>> (insn 10 9 23 3 (set (subreg/s/u:SI (reg:DI 156 [ <retval> ]) 0)
>>>>>>            (clz:SI (subreg/u:SI (reg/v:DI 157 [ i ]) 0))) clz.c:3 132
>>>>>> {clzsi2}
>>>>>>         (expr_list:REG_DEAD (reg/v:DI 157 [ i ])
>>>>>>            (nil)))
>>>>>>
>>>>>> So the setup code in noce_process_if_block sees that the set
>>>>>> destination
>>>>>> is
>>>>>> not the same
>>>>>> ((reg:DI 156 [ <retval> ]) and (subreg/s/u:SI (reg:DI 156 [ <retval> ])
>>>>>> 0))
>>>>>> so it bails out on the rtx_interchangeable_p (x, SET_DEST (set_b))
>>>>>> check.
>>>>>> I suppose that's a consequence of how SImode operations are represented
>>>>>> in
>>>>>> early RTL
>>>>>> on powerpc, I don't know what to do there. Perhaps that part of ivcvt
>>>>>> can
>>>>>> be
>>>>>> taught to handle
>>>>>> destinations that are subregs of one another, but that would be a
>>>>>> separate
>>>>>> patch.
>>>>>>
>>>>>> Anyway, is this patch ok for trunk?
>>>>>>
>>>>>> Bootstrapped and tested on arm-none-linux-gnueabihf,
>>>>>> aarch64-none-linux-gnu,
>>>>>> x86_64-pc-linux-gnu.
>>>>>>
>>>>>> Thanks,
>>>>>> Kyrill
>>>>>>
>>>>>> 2016-05-26  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>>>>>>
>>>>>>        PR middle-end/37780
>>>>>>        * ifcvt.c (noce_try_ifelse_collapse): New function.
>>>>>>        Declare prototype.
>>>>>>        (noce_process_if_block): Call noce_try_ifelse_collapse.
>>>>>>        * simplify-rtx.c (simplify_cond_clz_ctz): New function.
>>>>>>        (simplify_ternary_operation): Use the above to simplify
>>>>>>        conditional CLZ/CTZ expressions.
>>>>>>
>>>>>> 2016-05-26  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>>>>>>
>>>>>>        PR middle-end/37780
>>>>>>        * gcc.c-torture/execute/pr37780.c: New test.
>>>>>>        * gcc.target/aarch64/pr37780_1.c: Likewise.
>>>>>>        * gcc.target/arm/pr37780_1.c: Likewise.
>>>>> Hi Kyrylo,
>>>>
>>>> Hi Christophe,
>>>>
>>>>> I've noticed that gcc.target/arm/pr37780_1.c fails on
>>>>> arm if arch < v6.
>>>>> I first tried to fix the effective-target guard (IIRC, the doc
>>>>> says clz is available starting with v5t), but that isn't sufficient.
>>>>>
>>>>> When compiling for armv5-t, the scan-assembler directives
>>>>> fail. It seems to work with v6t2, so I am wondering whether
>>>>> it's just a matter of increasing the effective-target arch version,
>>>>> or if you really intended to make the test pass on these old
>>>>> architectures?
>>>>
>>>> I've dug into it a bit.
>>>> I think the problem is that CLZ is available with ARMv5T but only in arm
>>>> mode.
>>>> In Thumb mode it's only available with ARMv6T2.
>>>> So if your armv5t gcc compiles for Thumb by default you'll get the
>>>> failure.
>>> Actually this gcc is configured with default cpu & mode,
>>> target=arm-none-eabi.
>>>
>>> So I think it's arm7tdmi, arm mode.
>>>
>>>> I think just bumping the effective to armv6t2 here is appropriate as I
>>>> intended
>>>> the test to just check the ifcvt transformation when the instruction is
>>>> available
>>>> rather than test that the CLZ instruction is available at one
>>>> architecture
>>>> level
>>>> or another.
>>>>
>>>> A patch to bump the effective target check is pre-approved if you want to
>>>> do
>>>> it.
>>>> Otherwise I'll get to it in a few days.
>>>>
>>> I was a about to commit the arm_arch_v5_ok -> arm_arch_v6t2_ok, but
>>> this would not be sufficient without adding
>>> dg-add-options arm_arch_v6t2
>>> but then if gcc is configured with a higher default architecture, we'll
>>> end
>>> up testing if works for v6t2 only.
>>
>> I'm okay with adding "dg-add-options arm_arch_v6t2".
>> We're not testing the availability of the instruction here, just the ifcvt
>> transformation when it *is* available, so I think we should just add
>> whatever
>> option guarantees that instruction is available.
>>
>> Kyrill
>>
> OK, I've committed the attached patch.
>
> 2016-06-10  Christophe Lyon  <christophe.lyon@linaro.org>
>
>          * gcc.target/arm/pr37780_1.c: Use arm_arch_v6t2 effective target
>          and options.
>

Thanks for taking care of this.

Kyrill

>>>> Thanks for spotting this,
>>>> Kyrill
>>>>> Thanks,
>>>>>
>>>>> Christophe.
>>>>

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2016-06-10 13:41 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-05-26 13:55 [PATCH][3/3][RTL ifcvt] PR middle-end/37780: Conditional expression with __builtin_clz() should be optimized out Kyrill Tkachov
2016-06-06 14:36 ` Kyrill Tkachov
2016-06-06 14:44 ` Bernd Schmidt
2016-06-07  7:37 ` Andreas Schwab
2016-06-07 16:47   ` Kyrill Tkachov
2016-06-07 16:51     ` Kyrill Tkachov
2016-06-07 19:35 ` Christophe Lyon
2016-06-08 16:40   ` Kyrill Tkachov
2016-06-09 12:15     ` Christophe Lyon
2016-06-10  8:38       ` Kyrill Tkachov
2016-06-10 13:39         ` Christophe Lyon
2016-06-10 13:41           ` Kyrill Tkachov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).