public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH][3/3][RTL ifcvt] PR middle-end/37780: Conditional expression with __builtin_clz() should be optimized out
@ 2016-05-26 13:55 Kyrill Tkachov
  2016-06-06 14:36 ` Kyrill Tkachov
                   ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Kyrill Tkachov @ 2016-05-26 13:55 UTC (permalink / raw)
  To: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 4407 bytes --]

Hi all,

In this PR we want to optimise:
int foo (int i)
{
   return (i == 0) ? N : __builtin_clz (i);
}

on targets where CLZ is defined at zero to the constant 'N'.
This is determined at the RTL level through the CLZ_DEFINED_VALUE_AT_ZERO macro.
The obvious place to implement this would be in combine through simplify-rtx where we'd
recognise an IF_THEN_ELSE of the form:
(set (reg:SI r1)
      (if_then_else:SI (ne (reg:SI r2)
                           (const_int 0 [0]))
        (clz:SI (reg:SI r2))
        (const_int 32)))

and if CLZ_DEFINED_VALUE_AT_ZERO is defined to 32 for SImode we'd simplify it into
just (clz:SI (reg:SI r2)).
However, I found this doesn't quite happen for a couple of reasons:
1) This depends on ifcvt or some other pass to have created a conditional move of the
two branches that provide the IF_THEN_ELSE to propagate the const_int and clz operation into.

2) Combine will refuse to propagate r2 from the above example into both the condition and the
CLZ at the same time, so the most we see is:
(set (reg:SI r1)
      (if_then_else:SI (ne (reg:CC cc)
             (const_int 0))
        (clz:SI (reg:SI r2))
        (const_int 32)))

which is not enough information to perform the simplification.

This patch implements the optimisation in ce1 using the noce ifcvt framework.
During ifcvt noce_process_if_block can see that we're trying to optimise something
of the form (x == 0 ? const_int : CLZ (x)) and so it has visibility of all the information
needed to perform the transformation.

The transformation is performed by adding a new noce_try* function that tries to put the
condition and the 'then' and 'else' arms into an IF_THEN_ELSE rtx and try to simplify that
using the simplify-rtx machinery. That way, we can implement the simplification logic in
simplify-rtx.c where it belongs.

A similar transformation for CTZ is implemented as well.
So for code:
int foo (int i)
{
   return (i == 0) ? 32 : __builtin_clz (i);
}

On aarch64 we now emit:
foo:
         clz     w0, w0
         ret

instead of:
foo:
         mov     w1, 32
         clz     w2, w0
         cmp     w0, 0
         csel    w0, w2, w1, ne
         ret

and for arm similarly we generate:
foo:
         clz     r0, r0
         bx      lr

instead of:
foo:
         cmp     r0, #0
         clzne   r0, r0
         moveq   r0, #32
         bx      lr


and for x86_64 with -O2 -mlzcnt we generate:
foo:
         xorl    %eax, %eax
         lzcntl  %edi, %eax
         ret

instead of:
foo:
         xorl    %eax, %eax
         movl    $32, %edx
         lzcntl  %edi, %eax
         testl   %edi, %edi
         cmove   %edx, %eax
         ret


I tried getting this to work on other targets as well, but encountered difficulties.
For example on powerpc the two arms of the condition seen during ifcvt are:

(insn 4 22 11 4 (set (reg:DI 156 [ <retval> ])
         (const_int 32 [0x20])) clz.c:3 434 {*movdi_internal64}
      (nil))
and
(insn 10 9 23 3 (set (subreg/s/u:SI (reg:DI 156 [ <retval> ]) 0)
         (clz:SI (subreg/u:SI (reg/v:DI 157 [ i ]) 0))) clz.c:3 132 {clzsi2}
      (expr_list:REG_DEAD (reg/v:DI 157 [ i ])
         (nil)))

So the setup code in noce_process_if_block sees that the set destination is not the same
((reg:DI 156 [ <retval> ]) and (subreg/s/u:SI (reg:DI 156 [ <retval> ]) 0))
so it bails out on the rtx_interchangeable_p (x, SET_DEST (set_b)) check.
I suppose that's a consequence of how SImode operations are represented in early RTL
on powerpc, I don't know what to do there. Perhaps that part of ivcvt can be taught to handle
destinations that are subregs of one another, but that would be a separate patch.

Anyway, is this patch ok for trunk?

Bootstrapped and tested on arm-none-linux-gnueabihf, aarch64-none-linux-gnu, x86_64-pc-linux-gnu.

Thanks,
Kyrill

2016-05-26  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     PR middle-end/37780
     * ifcvt.c (noce_try_ifelse_collapse): New function.
     Declare prototype.
     (noce_process_if_block): Call noce_try_ifelse_collapse.
     * simplify-rtx.c (simplify_cond_clz_ctz): New function.
     (simplify_ternary_operation): Use the above to simplify
     conditional CLZ/CTZ expressions.

2016-05-26  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     PR middle-end/37780
     * gcc.c-torture/execute/pr37780.c: New test.
     * gcc.target/aarch64/pr37780_1.c: Likewise.
     * gcc.target/arm/pr37780_1.c: Likewise.

[-- Attachment #2: ifcvt-clz.patch --]
[-- Type: text/x-patch, Size: 7829 bytes --]

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index 4949965c9dc771bbd2f219fa72bdace3d40424da..80af4a84363192879cc49ea45f777fc987fda555 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -817,6 +817,7 @@ struct noce_if_info
 
 static rtx noce_emit_store_flag (struct noce_if_info *, rtx, int, int);
 static int noce_try_move (struct noce_if_info *);
+static int noce_try_ifelse_collapse (struct noce_if_info *);
 static int noce_try_store_flag (struct noce_if_info *);
 static int noce_try_addcc (struct noce_if_info *);
 static int noce_try_store_flag_constants (struct noce_if_info *);
@@ -1120,6 +1121,37 @@ noce_try_move (struct noce_if_info *if_info)
   return FALSE;
 }
 
+/* Try forming an IF_THEN_ELSE (cond, b, a) and collapsing that
+   through simplify_rtx.  Sometimes that can eliminate the IF_THEN_ELSE.
+   If that is the case, emit the result into x.  */
+
+static int
+noce_try_ifelse_collapse (struct noce_if_info * if_info)
+{
+  if (!noce_simple_bbs (if_info))
+    return FALSE;
+
+  machine_mode mode = GET_MODE (if_info->x);
+  rtx if_then_else = simplify_gen_ternary (IF_THEN_ELSE, mode, mode,
+					    if_info->cond, if_info->b,
+					    if_info->a);
+
+  if (GET_CODE (if_then_else) == IF_THEN_ELSE)
+    return FALSE;
+
+  rtx_insn *seq;
+  start_sequence ();
+  noce_emit_move_insn (if_info->x, if_then_else);
+  seq = end_ifcvt_sequence (if_info);
+  if (!seq)
+    return FALSE;
+
+  emit_insn_before_setloc (seq, if_info->jump,
+			  INSN_LOCATION (if_info->insn_a));
+  return TRUE;
+}
+
+
 /* Convert "if (test) x = 1; else x = 0".
 
    Only try 0 and STORE_FLAG_VALUE here.  Other combinations will be
@@ -3493,6 +3525,8 @@ noce_process_if_block (struct noce_if_info *if_info)
 
   if (noce_try_move (if_info))
     goto success;
+  if (noce_try_ifelse_collapse (if_info))
+    goto success;
   if (noce_try_store_flag (if_info))
     goto success;
   if (noce_try_bitop (if_info))
diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index eb73b9def1541b537e98ff2251e7a1d4ecf175be..0f750bb08e817ba630d92566c6a991f81e095304 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -5280,6 +5280,49 @@ simplify_const_relational_operation (enum rtx_code code,
 
   return 0;
 }
+
+/* Recognize expressions of the form (X CMP 0) ? VAL : OP (X)
+   where OP is CLZ or CTZ and VAL is the value from CLZ_DEFINED_VALUE_AT_ZERO
+   or CTZ_DEFINED_VALUE_AT_ZERO respectively and return OP (X) if the expression
+   can be simplified to that or NULL_RTX if not.
+   Assume X is compared against zero with CMP_CODE and the true
+   arm is TRUE_VAL and the false arm is FALSE_VAL.  */
+
+static rtx
+simplify_cond_clz_ctz (rtx x, rtx_code cmp_code, rtx true_val, rtx false_val)
+{
+  if (cmp_code != EQ && cmp_code != NE)
+    return NULL_RTX;
+
+  /* Result on X == 0 and X !=0 respectively.  */
+  rtx on_zero, on_nonzero;
+  if (cmp_code == EQ)
+    {
+      on_zero = true_val;
+      on_nonzero = false_val;
+    }
+  else
+    {
+      on_zero = false_val;
+      on_nonzero = true_val;
+    }
+
+  rtx_code op_code = GET_CODE (on_nonzero);
+  if ((op_code != CLZ && op_code != CTZ)
+      || !rtx_equal_p (XEXP (on_nonzero, 0), x)
+      || !CONST_INT_P (on_zero))
+    return NULL_RTX;
+
+  HOST_WIDE_INT op_val;
+  machine_mode mode = GET_MODE (on_nonzero);
+  if (((op_code == CLZ && CLZ_DEFINED_VALUE_AT_ZERO (mode, op_val))
+	|| (op_code == CTZ && CTZ_DEFINED_VALUE_AT_ZERO (mode, op_val)))
+      && op_val == INTVAL (on_zero))
+    return on_nonzero;
+
+  return NULL_RTX;
+}
+
 \f
 /* Simplify CODE, an operation with result mode MODE and three operands,
    OP0, OP1, and OP2.  OP0_MODE was the mode of OP0 before it became
@@ -5413,6 +5456,19 @@ simplify_ternary_operation (enum rtx_code code, machine_mode mode,
 	    }
 	}
 
+      /* Convert x == 0 ? N : clz (x) into clz (x) when
+	 CLZ_DEFINED_VALUE_AT_ZERO is defined to N for the mode of x.
+	 Similarly for ctz (x).  */
+      if (COMPARISON_P (op0) && !side_effects_p (op0)
+	  && XEXP (op0, 1) == const0_rtx)
+	{
+	  rtx simplified
+	    = simplify_cond_clz_ctz (XEXP (op0, 0), GET_CODE (op0),
+				     op1, op2);
+	  if (simplified)
+	    return simplified;
+	}
+
       if (COMPARISON_P (op0) && ! side_effects_p (op0))
 	{
 	  machine_mode cmp_mode = (GET_MODE (XEXP (op0, 0)) == VOIDmode
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr37780.c b/gcc/testsuite/gcc.c-torture/execute/pr37780.c
new file mode 100644
index 0000000000000000000000000000000000000000..a9eca68786ec8bbf98f010da22302d7c0297a766
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr37780.c
@@ -0,0 +1,49 @@
+/* PR middle-end/37780.  */
+
+#define VAL (8 * sizeof (int))
+
+int __attribute__ ((noinline, noclone))
+fooctz (int i)
+{
+  return (i == 0) ? VAL : __builtin_ctz (i);
+}
+
+int __attribute__ ((noinline, noclone))
+fooctz2 (int i)
+{
+  return (i != 0) ? __builtin_ctz (i) : VAL;
+}
+
+unsigned int __attribute__ ((noinline, noclone))
+fooctz3 (unsigned int i)
+{
+  return (i > 0) ?  __builtin_ctz (i) : VAL;
+}
+
+int __attribute__ ((noinline, noclone))
+fooclz (int i)
+{
+  return (i == 0) ? VAL : __builtin_clz (i);
+}
+
+int __attribute__ ((noinline, noclone))
+fooclz2 (int i)
+{
+  return (i != 0) ? __builtin_clz (i) : VAL;
+}
+
+unsigned int __attribute__ ((noinline, noclone))
+fooclz3 (unsigned int i)
+{
+  return (i > 0) ? __builtin_clz (i) : VAL;
+}
+
+int
+main (void)
+{
+  if (fooctz (0) != VAL || fooctz2 (0) != VAL || fooctz3 (0) != VAL
+      || fooclz (0) != VAL || fooclz2 (0) != VAL || fooclz3 (0) != VAL)
+    __builtin_abort ();
+
+  return 0;
+}
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/aarch64/pr37780_1.c b/gcc/testsuite/gcc.target/aarch64/pr37780_1.c
new file mode 100644
index 0000000000000000000000000000000000000000..97027e7479cfe284322249f8a4edc7d3da8ff644
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr37780_1.c
@@ -0,0 +1,46 @@
+/* Test that we can remove the conditional move due to CLZ
+   and CTZ being defined at zero.  */
+
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+int
+fooctz (int i)
+{
+  return (i == 0) ? 32 : __builtin_ctz (i);
+}
+
+int
+fooctz2 (int i)
+{
+  return (i != 0) ? __builtin_ctz (i) : 32;
+}
+
+unsigned int
+fooctz3 (unsigned int i)
+{
+  return (i > 0) ?  __builtin_ctz (i) : 32;
+}
+
+/* { dg-final { scan-assembler-times "rbit\t*" 3 } } */
+
+int
+fooclz (int i)
+{
+  return (i == 0) ? 32 : __builtin_clz (i);
+}
+
+int
+fooclz2 (int i)
+{
+  return (i != 0) ? __builtin_clz (i) : 32;
+}
+
+unsigned int
+fooclz3 (unsigned int i)
+{
+  return (i > 0) ? __builtin_clz (i) : 32;
+}
+
+/* { dg-final { scan-assembler-times "clz\t" 6 } } */
+/* { dg-final { scan-assembler-not "cmp\t.*0" } } */
diff --git a/gcc/testsuite/gcc.target/arm/pr37780_1.c b/gcc/testsuite/gcc.target/arm/pr37780_1.c
new file mode 100644
index 0000000000000000000000000000000000000000..b954fe5ceb89690feaa1098d0471b518bcbe0631
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr37780_1.c
@@ -0,0 +1,47 @@
+/* Test that we can remove the conditional move due to CLZ
+   being defined at zero.  */
+
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_arch_v5_ok } */
+/* { dg-options "-O2" } */
+
+int
+fooctz (int i)
+{
+  return (i == 0) ? 32 : __builtin_ctz (i);
+}
+
+int
+fooctz2 (int i)
+{
+  return (i != 0) ? __builtin_ctz (i) : 32;
+}
+
+unsigned int
+fooctz3 (unsigned int i)
+{
+  return (i > 0) ?  __builtin_ctz (i) : 32;
+}
+
+/* { dg-final { scan-assembler-times "rbit\t*" 3 } } */
+
+int
+fooclz (int i)
+{
+  return (i == 0) ? 32 : __builtin_clz (i);
+}
+
+int
+fooclz2 (int i)
+{
+  return (i != 0) ? __builtin_clz (i) : 32;
+}
+
+unsigned int
+fooclz3 (unsigned int i)
+{
+  return (i > 0) ? __builtin_clz (i) : 32;
+}
+
+/* { dg-final { scan-assembler-times "clz\t" 6 } } */
+/* { dg-final { scan-assembler-not "cmp\t.*0" } } */

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2016-06-10 13:41 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-05-26 13:55 [PATCH][3/3][RTL ifcvt] PR middle-end/37780: Conditional expression with __builtin_clz() should be optimized out Kyrill Tkachov
2016-06-06 14:36 ` Kyrill Tkachov
2016-06-06 14:44 ` Bernd Schmidt
2016-06-07  7:37 ` Andreas Schwab
2016-06-07 16:47   ` Kyrill Tkachov
2016-06-07 16:51     ` Kyrill Tkachov
2016-06-07 19:35 ` Christophe Lyon
2016-06-08 16:40   ` Kyrill Tkachov
2016-06-09 12:15     ` Christophe Lyon
2016-06-10  8:38       ` Kyrill Tkachov
2016-06-10 13:39         ` Christophe Lyon
2016-06-10 13:41           ` Kyrill Tkachov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).