public inbox for gcc-patches@gcc.gnu.org
* [patch tree-optimization]: Fix for PR 45397 part 1 of 2
@ 2012-03-15 13:09 Kai Tietz
  2012-03-15 13:32 ` Richard Guenther
  0 siblings, 1 reply; 8+ messages in thread
From: Kai Tietz @ 2012-03-15 13:09 UTC (permalink / raw)
  To: GCC Patches

Hi,

The fix for this PR is a mix of different issues.  First is of course
the type-hoisting, but the PR also exposes some missing simplifications
on integer values and on equality and inequality comparisons.
The first patch adds to forward propagation the ability to do
type-hoisting for some conversion operations and to simplify the inner
binary operations involved.  The most important part here is the
adjustment of integer constants in statement lists for a truncation.
I limited this patch so that compare_equal_optimize_1 handles only
bitwise-and operations inside a truncation cast.  Of course some more
simplifications are possible for bitwise-xor/or operations.
This patch does just the type-hoisting part.  A second patch adds to
compare_equal_optimize_1 the further simplifications required to fix
this problem.
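
As a plain-C illustration of the folds compare_equal_optimize_1
performs (independent of GCC internals, just checking the arithmetic):

#include <assert.h>

int
main (void)
{
  unsigned int x;
  for (x = 0; x < 0x200u; x++)
    {
      /* (unsigned char) X == CST with CST out of range of the narrow
         type folds to false.  */
      assert (((unsigned char) x == 0x180u) == 0);
      /* (unsigned char) X == CST in range only compares the low bits,
         which is what the demoted comparison does.  */
      assert (((unsigned char) x == 0x80u) == ((x & 0xffu) == 0x80u));
    }
  return 0;
}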

ChangeLog

2012-03-15  Kai Tietz  <ktietz@redhat.com>

	PR tree-optimization/45397
	* tree-ssa-forwprop.c (compare_equal_optimize_1): New
	function.
	(combine_cond_expr_cond): Use compare_equal_optimize_1
	function.
	(truncate_integers): New function.
	(combine_inner_conversion): Likewise.
	(ssa_forward_propagate_and_combine): Use for conversions
	combine_inner_conversion function.

2012-03-15  Kai Tietz  <ktietz@redhat.com>

	* gcc.dg/tree-ssa/pr45397-1.c: New testcase.

Regression tested for all languages (including Ada and Obj-C) on
x86_64-unknown-linux-gnu.  OK to apply?

Regards,
Kai

Index: gcc-trunk/gcc/tree-ssa-forwprop.c
===================================================================
--- gcc-trunk.orig/gcc/tree-ssa-forwprop.c
+++ gcc-trunk/gcc/tree-ssa-forwprop.c
@@ -362,6 +362,150 @@ rhs_to_tree (tree type, gimple stmt)
     gcc_unreachable ();
 }

+/* Simplify the comparison OP0 ==/!= OP1 of integral-typed operands.
+   On success the new statement's tree is returned, otherwise NULL_TREE.  */
+
+static tree
+compare_equal_optimize_1 (gimple stmt, enum tree_code code, tree type,
+			  tree op0, tree op1)
+{
+  gimple_stmt_iterator gsi;
+  tree type_outer;
+  tree type_inner, op_inner;
+  tree op1_l, op1_r, tem;
+  gimple newop;
+
+  type_outer = TREE_TYPE (op0);
+  if ((code != EQ_EXPR && code != NE_EXPR)
+      || !INTEGRAL_TYPE_P (type_outer))
+    return NULL_TREE;
+
+  /* If OP0 isn't a conversion, then check if OP1 might be one.  If so
+     swap arguments, otherwise return NULL_TREE.  */
+  if (!CONVERT_EXPR_P (op0))
+    {
+      if (!CONVERT_EXPR_P (op1))
+        return NULL_TREE;
+      tem = op0;
+      op0 = op1;
+      op1 = tem;
+    }
+
+  op_inner = TREE_OPERAND (op0, 0);
+  type_inner = TREE_TYPE (op_inner);
+
+  /* Operations only apply to integral typed operands of cast.  */
+  if (!INTEGRAL_TYPE_P (type_inner))
+    return NULL_TREE;
+
+  /* If second operand is also a type-conversion, check that underlying operand
+     is integral typed.  */
+  if (CONVERT_EXPR_P (op1)
+      && !INTEGRAL_TYPE_P (TREE_TYPE (TREE_OPERAND (op1, 0))))
+    return NULL_TREE;
+
+  /* Simplify ((type) X ==/!= (type) X) to true/false, if X has no side-effects
+     and is integral typed.  */
+  if (CONVERT_EXPR_P (op1)
+      && TREE_OPERAND (op0, 0) == TREE_OPERAND (op1, 0)
+      && !TREE_SIDE_EFFECTS (TREE_OPERAND (op0, 0)))
+    return fold_convert (type, (code == EQ_EXPR ? boolean_true_node
+   						: boolean_false_node));
+
+  /* Simplify (type) X ==/!= (type) Y to X ==/!= Y, if the types of X and Y
+     are compatible and the precision of X's type is smaller than or equal
+     to the outer type's.  */
+  if (CONVERT_EXPR_P (op1)
+      && TYPE_PRECISION (type_inner) <= TYPE_PRECISION (type_outer))
+    {
+      tem = TREE_OPERAND (op1, 0);
+      if (!useless_type_conversion_p (type_inner, TREE_TYPE (tem)))
+	return NULL_TREE;
+      return fold_build2_loc (gimple_location (stmt), code, type,
+			      op_inner, tem);
+    }
+
+  /* Verify that for the pattern 'OP0 = (type) X' the type of X is of
+     integral kind, is unsigned, and has precision smaller than or equal
+     to that of type TYPE.  */
+  if (TYPE_PRECISION (type_inner) > TYPE_PRECISION (type_outer)
+      || !TYPE_UNSIGNED (type_inner))
+    return NULL_TREE;
+
+  /* Simplify (type) X ==/!= CST to X ==/!= CST' with CST'=(type-of-X) CST.  */
+  if (TREE_CODE (op1) == INTEGER_CST)
+    {
+      tree new_cst = fold_convert (type_inner, op1);
+      /* If constant is out of range for (type) X, then return
+         constant result of comparison.  */
+      if (!operand_equal_p (op1, fold_convert (type_outer, new_cst), 0))
+	return fold_convert (type, (code == EQ_EXPR ? boolean_false_node
+						    : boolean_true_node));
+      return fold_build2_loc (gimple_location (stmt), code, type,
+			      op_inner, new_cst);
+    }
+
+  /* Simplify (type) X ==/!= Y & CST to X ==/!= (type-of-X) (Y & CST).  If CST
+     is equal to ~((type-of-X) 0), then simplify to X ==/!= (type-of-X) Y.  */
+  if (TREE_CODE (op1) != BIT_AND_EXPR)
+    return NULL_TREE;
+
+  gsi = gsi_for_stmt (stmt);
+
+  op1_l = TREE_OPERAND (op1, 0);
+  op1_r = TREE_OPERAND (op1, 1);
+
+  /* Make sure OP1_R holds an integer-constant.  */
+  if (TREE_CODE (op1_l) == INTEGER_CST)
+    {
+      op1_l = op1_r;
+      op1_r = TREE_OPERAND (op1, 0);
+    }
+  if (TREE_CODE (op1_r) != INTEGER_CST)
+    return NULL_TREE;
+
+  tem = fold_convert (type_inner, op1_r);
+
+  /* If CST doesn't fit completely into the precision of the type of X,
+     then we can't do anything here.
+     Remark: the type's precision is of interest here, not the
+     signed/unsigned value-range.  */
+  if (!operand_equal_p (op1_r, fold_convert (TREE_TYPE (op1_r), tem), 0))
+    return NULL_TREE;
+  op0 = op_inner;
+
+  /* If CST has the value zero, then we can special-case
+     to X ==/!= 0.  */
+  if (integer_zerop (tem))
+    return fold_build2_loc (gimple_location (stmt), code, type, op0, tem);
+
+  /* If CST has the value ~((type-of-X) 0), then we can special-case to
+     X ==/!= (type-of-X) Y and drop the BIT_AND; otherwise materialize
+     Y & CST in a new statement first.  */
+  if (!integer_all_onesp (tem))
+    {
+      tem = create_tmp_reg (TREE_TYPE (op1), NULL);
+      newop = gimple_build_assign_with_ops (TREE_CODE (op1),
+					    tem, TREE_OPERAND (op1, 0),
+					    TREE_OPERAND (op1, 1));
+      tem = make_ssa_name (tem, newop);
+      gimple_assign_set_lhs (newop, tem);
+      gimple_set_location (newop, gimple_location (stmt));
+      gsi_insert_before (&gsi, newop, GSI_SAME_STMT);
+      update_stmt (newop);
+      op1 = tem;
+    }
+  else
+    op1 = op1_l;
+  tem = create_tmp_reg (type_inner, NULL);
+  newop = gimple_build_assign_with_ops (NOP_EXPR,
+					tem, op1, NULL_TREE);
+  tem = make_ssa_name (tem, newop);
+  gimple_assign_set_lhs (newop, tem);
+  gimple_set_location (newop, gimple_location (stmt));
+  gsi_insert_before (&gsi, newop, GSI_SAME_STMT);
+  update_stmt (newop);
+  op1 = tem;
+  return fold_build2_loc (gimple_location (stmt), code, type, op0, op1);
+}
+
 /* Combine OP0 CODE OP1 in the context of a COND_EXPR.  Returns
    the folded result in a form suitable for COND_EXPR_COND or
    NULL_TREE, if there is no suitable simplified form.  If
@@ -378,6 +522,10 @@ combine_cond_expr_cond (gimple stmt, enu

   fold_defer_overflow_warnings ();
   t = fold_binary_loc (gimple_location (stmt), code, type, op0, op1);
+
+  if (!t && !invariant_only)
+    t = compare_equal_optimize_1 (stmt, code, type, op0, op1);
+
   if (!t)
     {
       fold_undefer_overflow_warnings (false, NULL, 0);
@@ -2191,6 +2339,253 @@ out:
   return false;
 }

+/* Truncate all integer constants to the precision of type TCAST within
+   the expression chain rooted at STMT, built from +, -, ^, |, and &.
+   STMT has to be a valid gimple assignment statement.  */
+
+static void
+truncate_integers (gimple stmt, tree tcast)
+{
+  gimple_stmt_iterator gsi2;
+  gimple s;
+  enum tree_code code;
+  tree op1, op2, tem;
+  int need_update = 0;
+
+  do
+    {
+      code = gimple_assign_rhs_code (stmt);
+      if (code != PLUS_EXPR && code != MINUS_EXPR
+	  && code != BIT_AND_EXPR
+	  && code != BIT_IOR_EXPR
+	  && code != BIT_XOR_EXPR)
+	return;
+
+      op1 = gimple_assign_rhs1 (stmt);
+      op2 = gimple_assign_rhs2 (stmt);
+
+      if (code != MINUS_EXPR
+	  && TREE_CODE (op1) == INTEGER_CST
+	  && TREE_CODE (op2) != INTEGER_CST)
+	{
+	  need_update = 1;
+	  tem = op1;
+	  op1 = op2;
+	  op2 = tem;
+	}
+
+      if (TREE_CODE (op1) == INTEGER_CST)
+	{
+	  tem = fold_convert (tcast, op1);
+	  tem = fold_convert (TREE_TYPE (op1), tem);
+	  if (!operand_equal_p (op1, tem, 0))
+	    {
+	      op1 = tem;
+	      need_update = 1;
+	    }
+	}
+
+      if (TREE_CODE (op2) == INTEGER_CST)
+	{
+	  tem = fold_convert (tcast, op2);
+	  tem = fold_convert (TREE_TYPE (op2), tem);
+	  if (!operand_equal_p (op2, tem, 0))
+	    {
+	      op2 = tem;
+	      need_update = 1;
+	    }
+	}
+
+      if (need_update)
+	{
+	  gsi2 = gsi_for_stmt (stmt);
+	  gimple_assign_set_rhs_with_ops (&gsi2, code, op1, op2);
+	  update_stmt (stmt);
+	}
+
+      if (TREE_CODE (op2) == SSA_NAME
+	  && (s = SSA_NAME_DEF_STMT (op2)) != NULL
+	  && has_single_use (op2)
+	  && is_gimple_assign (s))
+	truncate_integers (s, tcast);
+    }
+  while (TREE_CODE (op1) == SSA_NAME
+	 && (stmt = SSA_NAME_DEF_STMT (op1)) != NULL
+	 && has_single_use (op1)
+	 && is_gimple_assign (stmt));
+}
+
+/* Convert (type1) ((type2) X [PLUS|MINUS|AND|IOR|XOR]_EXPR (type2) Y) to
+   X' [PLUS|MINUS|AND|IOR|XOR]_EXPR Y', if X or Y has a type compatible
+   with TYPE1, the types TYPE1, TYPE2, and the types of X and Y are all
+   of integral kind, and TYPE2 has precision smaller than or equal to
+   TYPE1's.
+   Returns 1 if there were any changes made, 2 if cfg-cleanup needs to
+   run.  Else it returns 0.  */
+
+static int
+combine_inner_conversion (gimple_stmt_iterator *gsi)
+{
+  gimple stmt = gsi_stmt (*gsi);
+  tree op0, lhs, inner_op0, inner_op1;
+  tree new_op0, new_op1, tem;
+  gimple s, newop;
+  enum tree_code inner_code, code;
+  int sunken_cast = 0, require_cast = 0, has_cst = 0;
+
+  if (!is_gimple_assign (stmt))
+    return 0;
+  code = gimple_assign_rhs_code (stmt);
+
+  if (!CONVERT_EXPR_CODE_P (code))
+    return 0;
+
+  new_op0 = new_op1 = NULL_TREE;
+  lhs = gimple_assign_lhs (stmt);
+  op0 = gimple_assign_rhs1 (stmt);
+
+  /* Check that the inner and outer types of the conversion are of
+     integral kind, and that the outer type has precision smaller than
+     or equal to the inner type's.  */
+  if (!INTEGRAL_TYPE_P (TREE_TYPE (lhs))
+      || !INTEGRAL_TYPE_P (TREE_TYPE (op0))
+      || TYPE_PRECISION (TREE_TYPE (lhs)) > TYPE_PRECISION (TREE_TYPE (op0)))
+    return 0;
+
+  if (TREE_CODE (op0) != SSA_NAME
+      || (s = SSA_NAME_DEF_STMT (op0)) == NULL
+      || !has_single_use (op0)
+      || !is_gimple_assign (s))
+    return 0;
+
+  inner_code = gimple_assign_rhs_code (s);
+  if (inner_code != PLUS_EXPR && inner_code != MINUS_EXPR
+      && inner_code != BIT_AND_EXPR
+      && inner_code != BIT_IOR_EXPR
+      && inner_code != BIT_XOR_EXPR)
+    return 0;
+
+  truncate_integers (s, TREE_TYPE (lhs));
+
+  inner_op0 = gimple_assign_rhs1 (s);
+  inner_op1 = gimple_assign_rhs2 (s);
+
+  if (TREE_CODE (inner_op0) == INTEGER_CST)
+    {
+      new_op0 = fold_convert (TREE_TYPE (lhs), inner_op0);
+      has_cst++;
+    }
+
+  if (TREE_CODE (inner_op1) == INTEGER_CST)
+    {
+      new_op1 = fold_convert (TREE_TYPE (lhs), inner_op1);
+      has_cst++;
+    }
+
+  if (has_cst == 2)
+    {
+      tem = fold_build2 (inner_code, TREE_TYPE (lhs), new_op0, new_op1);
+      gimple_assign_set_rhs_with_ops (gsi, TREE_CODE (tem), tem, NULL_TREE);
+      update_stmt (gsi_stmt (*gsi));
+      return 2;
+    }
+
+  if ((inner_code == PLUS_EXPR || inner_code == MINUS_EXPR)
+      && !TYPE_UNSIGNED (TREE_TYPE (lhs)))
+    return 0;
+
+  if (TREE_CODE (inner_op0) == SSA_NAME
+      && has_single_use (inner_op0)
+      && (s = SSA_NAME_DEF_STMT (inner_op0)) != NULL
+      && is_gimple_assign (s)
+      && CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (s))
+      && INTEGRAL_TYPE_P (TREE_TYPE (gimple_assign_rhs1 (s)))
+      && TYPE_PRECISION (TREE_TYPE (lhs))
+	 <= TYPE_PRECISION (TREE_TYPE (gimple_assign_rhs1 (s))))
+    {
+      new_op0 = gimple_assign_rhs1 (s);
+      sunken_cast++;
+    }
+  else if (TREE_CODE (inner_op0) == SSA_NAME)
+    {
+      new_op0 = inner_op0;
+      require_cast++;
+    }
+
+  if (TREE_CODE (inner_op1) == SSA_NAME
+      && has_single_use (inner_op1)
+      && (s = SSA_NAME_DEF_STMT (inner_op1)) != NULL
+      && is_gimple_assign (s)
+      && CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (s))
+      && INTEGRAL_TYPE_P (TREE_TYPE (gimple_assign_rhs1 (s)))
+      && TYPE_PRECISION (TREE_TYPE (lhs))
+	 <= TYPE_PRECISION (TREE_TYPE (gimple_assign_rhs1 (s))))
+    {
+      new_op1 = gimple_assign_rhs1 (s);
+      sunken_cast++;
+    }
+  else if (TREE_CODE (inner_op1) == SSA_NAME)
+    {
+      new_op1 = inner_op1;
+      require_cast++;
+    }
+
+  if (!new_op0 || !new_op1)
+    return 0;
+
+  if (require_cast == 2)
+    return 0;
+
+  if (require_cast && sunken_cast
+      && !useless_type_conversion_p (TREE_TYPE (new_op0), TREE_TYPE (lhs))
+      && !useless_type_conversion_p (TREE_TYPE (new_op1), TREE_TYPE (lhs)))
+    return 0;
+
+  if (require_cast && has_cst)
+    {
+      if (TREE_CODE (new_op0) == INTEGER_CST)
+	new_op0 = fold_convert (TREE_TYPE (new_op1), new_op0);
+      if (TREE_CODE (new_op1) == INTEGER_CST)
+	new_op1 = fold_convert (TREE_TYPE (new_op0), new_op1);
+
+      /* Can we simplify to (X op CST)? */
+      if (useless_type_conversion_p (TREE_TYPE (lhs), TREE_TYPE (new_op0))
+	  && ((inner_code != PLUS_EXPR && inner_code != MINUS_EXPR)
+	       || TYPE_UNSIGNED (TREE_TYPE (new_op0))))
+        {
+	  gimple_assign_set_rhs_with_ops (gsi, inner_code, new_op0, new_op1);
+	  update_stmt (gsi_stmt (*gsi));
+          return 2;
+        }
+      return 0;
+    }
+
+  if (!useless_type_conversion_p (TREE_TYPE (new_op0), TREE_TYPE (lhs)))
+    {
+      tem = create_tmp_reg (TREE_TYPE (lhs), NULL);
+      newop = gimple_build_assign_with_ops (NOP_EXPR, tem, new_op0, NULL_TREE);
+      tem = make_ssa_name (tem, newop);
+      gimple_assign_set_lhs (newop, tem);
+      gimple_set_location (newop, gimple_location (stmt));
+      gsi_insert_before (gsi, newop, GSI_SAME_STMT);
+      new_op0 = tem;
+    }
+
+  if (!useless_type_conversion_p (TREE_TYPE (new_op1), TREE_TYPE (lhs)))
+    {
+      tem = create_tmp_reg (TREE_TYPE (lhs), NULL);
+      newop = gimple_build_assign_with_ops (NOP_EXPR, tem, new_op1, NULL_TREE);
+      tem = make_ssa_name (tem, newop);
+      gimple_assign_set_lhs (newop, tem);
+      gimple_set_location (newop, gimple_location (stmt));
+      gsi_insert_before (gsi, newop, GSI_SAME_STMT);
+      new_op1 = tem;
+    }
+
+  gimple_assign_set_rhs_with_ops (gsi, inner_code, new_op0, new_op1);
+  update_stmt (gsi_stmt (*gsi));
+  return 2;
+}
+
 /* Combine two conversions in a row for the second conversion at *GSI.
    Returns 1 if there were any changes made, 2 if cfg-cleanup needs to
    run.  Else it returns 0.  */
@@ -2506,6 +2901,8 @@ ssa_forward_propagate_and_combine (void)
 			 || code == FIX_TRUNC_EXPR)
 		  {
 		    int did_something = combine_conversions (&gsi);
+		    if (!did_something)
+		      did_something = combine_inner_conversion (&gsi);
 		    if (did_something == 2)
 		      cfg_changed = true;
 		    changed = did_something != 0;
Index: gcc-trunk/gcc/testsuite/gcc.dg/tree-ssa/pr45397-1.c
===================================================================
--- /dev/null
+++ gcc-trunk/gcc/testsuite/gcc.dg/tree-ssa/pr45397-1.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+signed char a[1024], b[1024];
+
+void
+baz (void)
+{
+  int i, s, t;
+  for (i = 0; i < 1024; i++)
+    { s = a[i]; t = b[i]; s += t + 0x12345600; a[i] = s; }
+}
+
+/* { dg-final { scan-tree-dump-times "305419776" 0 "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
+


* Re: [patch tree-optimization]: Fix for PR 45397 part 1 of 2
  2012-03-15 13:09 [patch tree-optimization]: Fix for PR 45397 part 1 of 2 Kai Tietz
@ 2012-03-15 13:32 ` Richard Guenther
  2012-03-15 13:53   ` Kai Tietz
  0 siblings, 1 reply; 8+ messages in thread
From: Richard Guenther @ 2012-03-15 13:32 UTC (permalink / raw)
  To: Kai Tietz; +Cc: GCC Patches

On Thu, Mar 15, 2012 at 2:08 PM, Kai Tietz <ktietz70@googlemail.com> wrote:
> Hi,
>
> The fix for this PR is a mix of different issues.  First is of course
> the type-hoisting, but the PR also exposes some missing simplifications
> on integer values and on equality and inequality comparisons.
> The first patch adds to forward propagation the ability to do
> type-hoisting for some conversion operations and to simplify the inner
> binary operations involved.  The most important part here is the
> adjustment of integer constants in statement lists for a truncation.
> I limited this patch so that compare_equal_optimize_1 handles only
> bitwise-and operations inside a truncation cast.  Of course some more
> simplifications are possible for bitwise-xor/or operations.
> This patch does just the type-hoisting part.  A second patch adds to
> compare_equal_optimize_1 the further simplifications required to fix
> this problem.

This looks like it matches unbounded pattern sizes and thus does not fit
into the forwprop machinery.  Instead it was suggested elsewhere
that promoting / demoting registers should be done in a separate pass
where you can compute a lattice of used bits and apply a transform
based on that lattice and target information (according to PROMOTE_MODE
for example).
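
Roughly, with a lattice of used bits I mean propagating demanded bits
from uses to defs; a toy sketch in plain C (invented helper names, not
the actual pass):

#include <stdio.h>

/* Only the low NBITS bits survive a truncating cast to an NBITS-wide
   type, so only those are demanded from the cast's operand.  */
unsigned
demanded_through_trunc (unsigned nbits)
{
  return nbits >= 32 ? ~0u : (1u << nbits) - 1;
}

/* For X & CST, bits of X that CST masks off are never demanded.  */
unsigned
demanded_through_and (unsigned demanded, unsigned cst)
{
  return demanded & cst;
}

int
main (void)
{
  /* (unsigned char)(y & 0x1ff): the cast demands 8 bits, the AND
     narrows that to 0xff demanded bits of y, so y's computation can
     be demoted to 8 bits.  */
  printf ("%#x\n", demanded_through_and (demanded_through_trunc (8), 0x1ff));
  return 0;
}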

Richard.

> [ChangeLog and full patch snipped]


* Re: [patch tree-optimization]: Fix for PR 45397 part 1 of 2
  2012-03-15 13:32 ` Richard Guenther
@ 2012-03-15 13:53   ` Kai Tietz
  2012-03-15 14:00     ` Jakub Jelinek
  0 siblings, 1 reply; 8+ messages in thread
From: Kai Tietz @ 2012-03-15 13:53 UTC (permalink / raw)
  To: Richard Guenther; +Cc: GCC Patches

2012/3/15 Richard Guenther <richard.guenther@gmail.com>:
> On Thu, Mar 15, 2012 at 2:08 PM, Kai Tietz <ktietz70@googlemail.com> wrote:
>> Hi,
>>
>> The fix for this PR is a mix of different issues.  First is of course
>> the type-hoisting, but the PR also exposes some missing simplifications
>> on integer values and on equality and inequality comparisons.
>> The first patch adds to forward propagation the ability to do
>> type-hoisting for some conversion operations and to simplify the inner
>> binary operations involved.  The most important part here is the
>> adjustment of integer constants in statement lists for a truncation.
>> I limited this patch so that compare_equal_optimize_1 handles only
>> bitwise-and operations inside a truncation cast.  Of course some more
>> simplifications are possible for bitwise-xor/or operations.
>> This patch does just the type-hoisting part.  A second patch adds to
>> compare_equal_optimize_1 the further simplifications required to fix
>> this problem.
>
> This looks like it matches unbounded pattern sizes and thus does not fit
> into the forwprop machinery.  Instead it was suggested elsewhere
> that promoting / demoting registers should be done in a separate pass
> where you can compute a lattice of used bits and apply a transform
> based on that lattice and target information (according to PROMOTE_MODE
> for example).
>
> Richard.

Well, the integer truncation part might be something for a separate
pass.  It could then also take care that within single-use
gimple statements the integer constant is always on the right-hand side
of the first statement of a +, -, |, ^, &, or mul.

But the cast-hoisting code itself is not unbounded AFAICS and has a
fixed pattern size.
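
To make the constant-truncation part concrete: in the new testcase only
the low 8 bits of the addend survive the final truncation, and
0x12345600 has none set there, so the constant dies.  A plain-C sanity
check of that arithmetic (GCC defines the out-of-range conversion to
signed char as modulo 2^8):

#include <assert.h>

int
main (void)
{
  int s = -5, t = 7;
  signed char a = (signed char) (s + t + 0x12345600);
  signed char b = (signed char) (s + t);
  assert (a == b);   /* the constant is dead after the truncation */
  return 0;
}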

Regards,
Kai


* Re: [patch tree-optimization]: Fix for PR 45397 part 1 of 2
  2012-03-15 13:53   ` Kai Tietz
@ 2012-03-15 14:00     ` Jakub Jelinek
  2012-03-15 14:07       ` Richard Guenther
  0 siblings, 1 reply; 8+ messages in thread
From: Jakub Jelinek @ 2012-03-15 14:00 UTC (permalink / raw)
  To: Kai Tietz; +Cc: Richard Guenther, GCC Patches

On Thu, Mar 15, 2012 at 02:53:10PM +0100, Kai Tietz wrote:
> > This looks like it matches unbounded pattern sizes and thus does not fit
> > into the forwprop machinery.  Instead it was suggested elsewhere
> > that promoting / demoting registers should be done in a separate pass
> > where you can compute a lattice of used bits and apply a transform
> > based on that lattice and target information (according to PROMOTE_MODE
> > for example).
> 
> Well, the integer truncation part might be something for a separate
> pass.  It could then also take care that within single-use
> gimple statements the integer constant is always on the right-hand side
> of the first statement of a +, -, |, ^, &, or mul.
>
> But the cast-hoisting code itself is not unbounded AFAICS and has a
> fixed pattern size.

The type demotion is PR45397/PR47477 among other PRs.
I'd just walk from the narrowing integer conversion stmts recursively
through the def stmts, see if they can be narrowed, note it, and finally if
everything or a significant portion of the stmts can be demoted (if not all,
with some narrowing integer conversion stmt inserted), do it all together.
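
In shape, something like this toy walk over a made-up mini-IR (not
GCC's gimple API, just to show the recursion):

#include <stdbool.h>
#include <stdio.h>

enum op { LEAF, CST, PLUS, MINUS, AND, IOR, XOR };
struct node { enum op op; unsigned cst; struct node *l, *r; };

/* Can this expression be computed in a narrow unsigned type with the
   same low bits as the wide computation?  +, -, &, |, ^ only feed
   bits upward, so yes, provided each leaf gets a narrowing conversion
   inserted and every constant is rewritten as CST & MASK.  */
static bool
can_demote (const struct node *n, int *convs)
{
  switch (n->op)
    {
    case LEAF: (*convs)++; return true;   /* needs one narrowing cast */
    case CST:  return true;               /* truncate the constant */
    default:   return can_demote (n->l, convs)
		      && can_demote (n->r, convs);
    }
}

int
main (void)
{
  /* s = x + (y + 0x12345600), as in the PR 45397 testcase.  */
  struct node x = { LEAF, 0, 0, 0 }, y = { LEAF, 0, 0, 0 };
  struct node c = { CST, 0x12345600, 0, 0 };
  struct node t = { PLUS, 0, &y, &c };
  struct node s = { PLUS, 0, &x, &t };
  int convs = 0;
  if (can_demote (&s, &convs))
    printf ("demotable with %d inserted conversion(s)\n", convs);
  return 0;
}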

	Jakub


* Re: [patch tree-optimization]: Fix for PR 45397 part 1 of 2
  2012-03-15 14:00     ` Jakub Jelinek
@ 2012-03-15 14:07       ` Richard Guenther
  2012-03-15 14:34         ` Kai Tietz
  2012-03-15 16:25         ` Michael Matz
  0 siblings, 2 replies; 8+ messages in thread
From: Richard Guenther @ 2012-03-15 14:07 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Kai Tietz, GCC Patches

On Thu, Mar 15, 2012 at 3:00 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Thu, Mar 15, 2012 at 02:53:10PM +0100, Kai Tietz wrote:
>> > This looks like it matches unbounded pattern sizes and thus does not fit
>> > into the forwprop machinery.  Instead it was suggested elsewhere
>> > that promoting / demoting registers should be done in a separate pass
>> > where you can compute a lattice of used bits and apply a transform
>> > based on that lattice and target information (according to PROMOTE_MODE
>> > for example).
>>
>> Well, the integer truncation part might be something for a separate
>> pass.  It could then also take care that within single-use
>> gimple statements the integer constant is always on the right-hand side
>> of the first statement of a +, -, |, ^, &, or mul.
>>
>> But the cast-hoisting code itself is not unbounded AFAICS and has a
>> fixed pattern size.

Can you split that part out then please?

> The type demotion is PR45397/PR47477 among other PRs.
> I'd just walk from the narrowing integer conversion stmts recursively
> through the def stmts, see if they can be narrowed, note it, and finally if
> everything or a significant portion of the stmts can be demoted (if not all,
> with some narrowing integer conversion stmt inserted), do it all together.

For PROMOTE_MODE targets you'd promote but properly mask out
constants (to make them cheaper to generate, for example).  You'd
also take advantage of targets that can do zero/sign-extending loads
without extra cost (ISTR that's quite important for some SPEC 2k6
benchmark on x86_64).

Richard.

>        Jakub


* Re: [patch tree-optimization]: Fix for PR 45397 part 1 of 2
  2012-03-15 14:07       ` Richard Guenther
@ 2012-03-15 14:34         ` Kai Tietz
  2012-03-16 11:35           ` Richard Guenther
  2012-03-15 16:25         ` Michael Matz
  1 sibling, 1 reply; 8+ messages in thread
From: Kai Tietz @ 2012-03-15 14:34 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Jakub Jelinek, GCC Patches

2012/3/15 Richard Guenther <richard.guenther@gmail.com>:
> On Thu, Mar 15, 2012 at 3:00 PM, Jakub Jelinek <jakub@redhat.com> wrote:
>> On Thu, Mar 15, 2012 at 02:53:10PM +0100, Kai Tietz wrote:
>>> > This looks like it matches unbounded pattern sizes and thus does not fit
>>> > into the forwprop machinery.  Instead it was suggested elsewhere
>>> > that promoting / demoting registers should be done in a separate pass
>>> > where you can compute a lattice of used bits and apply a transform
>>> > based on that lattice and target information (according to PROMOTE_MODE
>>> > for example).
>>>
>>> Well, the integer truncation part might be something for a separate
>>> pass.  It could then also take care that within single-use
>>> gimple statements the integer constant is always on the right-hand side
>>> of the first statement of a +, -, |, ^, &, or mul.
>>>
>>> But the cast-hoisting code itself is not unbounded AFAICS and has a
>>> fixed pattern size.
>
> Can you split that part out then please?

I can do that.  It would just be the removal of the function
truncate_integers and its call.

>> The type demotion is PR45397/PR47477 among other PRs.
>> I'd just walk from the narrowing integer conversion stmts recursively
>> through the def stmts, see if they can be narrowed, note it, and finally if
>> everything or a significant portion of the stmts can be demoted (if not all,
>> with some narrowing integer conversion stmt inserted), do it all together.

Jakub, this might be something good to have in a separate pass.
Right now I need to avoid some type-hoisting in forward propagation,
as otherwise it would loop endlessly due to the type-sinking code in
forward propagation.  The only question would be where such a pass would
best be placed.  After or before the forward-propagation pass?

> For PROMOTE_MODE targets you'd promote but properly mask out
> constants (to make them cheaper to generate, for example).  You'd
> also take advantage of targets that can do zero/sign-extending loads
> without extra cost (ISTR that's quite important for some SPEC 2k6
> benchmark on x86_64).

Hmm, as we are talking about truncation-casts here, what is the reason
for PROMOTE_MODE here?
You mean to generate for a PROMOTE_MODE target explicitly something like:
D1 = 0x80
D2 = (int) D1

instead of having D1 = 0xffffff80.  Isn't that a decision done at the RTL level?

Another interesting type-hoisting part is to check if a type-sign
change might be profitable.

Eg:
signed char D0, D1, D5;
int D2,D3,D4
...
D2 = (int) D0
D3 = (int) D1
D4 = D2 + D3
D5 = (signed char) D4
D6 = D5 == 0x8f

to

signed char D0, D1, D5;
unsigned char D2,D3,D4
...
D2 = (unsigned char) D0
D3 = (unsigned char) D1
D4 = D2 + D3
D5 = D4 == 0x7fu

> Richard.
>
>>        Jakub

Regards,
Kai


* Re: [patch tree-optimization]: Fix for PR 45397 part 1 of 2
  2012-03-15 14:07       ` Richard Guenther
  2012-03-15 14:34         ` Kai Tietz
@ 2012-03-15 16:25         ` Michael Matz
  1 sibling, 0 replies; 8+ messages in thread
From: Michael Matz @ 2012-03-15 16:25 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Jakub Jelinek, Kai Tietz, GCC Patches

Hi,

On Thu, 15 Mar 2012, Richard Guenther wrote:

> > The type demotion is PR45397/PR47477 among other PRs. I'd just walk 
> > from the narrowing integer conversion stmts recursively through the 
> > def stmts, see if they can be narrowed, note it, and finally if 
> > everything or a significant portion of the stmts can be demoted (if not 
> > all, with some narrowing integer conversion stmt inserted), do it all 
> > together.
> 
> For PROMOTE_MODE targets you'd promote but properly mask out
> constants (to make them cheaper to generate, for example).  You'd
> also take advantate of targets that can do zero/sign-extending loads
> without extra cost (ISTR that's quite important for some SPEC 2k6
> benchmark on x86_64).

gamess.  I still have an old proof-of-concept patch doing type promotion.
It probably doesn't apply anymore, and it's using too broad predicates (it
simple-mindedly extends to the largest type seen in an expression tree).
But I think the basic idea of it is sound.


Ciao,
Michael.

Index: passes.c
===================================================================
--- passes.c	(revision 159226)
+++ passes.c	(working copy)
@@ -831,6 +831,7 @@ init_optimization_passes (void)
   NEXT_PASS (pass_all_optimizations);
     {
       struct opt_pass **p = &pass_all_optimizations.pass.sub;
+      extern struct gimple_opt_pass pass_bprop_extends;
       NEXT_PASS (pass_remove_cgraph_callee_edges);
       /* Initial scalar cleanups before alias computation.
 	 They ensure memory accesses are not indirect wherever possible.  */
@@ -838,6 +839,7 @@ init_optimization_passes (void)
       NEXT_PASS (pass_update_address_taken);
       NEXT_PASS (pass_rename_ssa_copies);
       NEXT_PASS (pass_complete_unrolli);
+      NEXT_PASS (pass_bprop_extends);
       NEXT_PASS (pass_ccp);
       NEXT_PASS (pass_forwprop);
       NEXT_PASS (pass_call_cdce);
Index: tree-ssa-ccp.c
===================================================================
--- tree-ssa-ccp.c	(revision 159226)
+++ tree-ssa-ccp.c	(working copy)
@@ -1999,3 +1999,263 @@ struct gimple_opt_pass pass_fold_builtin
     | TODO_update_ssa			/* todo_flags_finish */
  }
 };
+
+#if 1
+static bool
+promote_through_insn_p (gimple def, tree *prhs1, tree *prhs2)
+{
+  tree lhs, rhs1, rhs2;
+  if (!is_gimple_assign (def))
+    return false;
+  lhs = gimple_assign_lhs (def);
+  rhs1 = rhs2 = NULL;
+  switch (gimple_assign_rhs_class (def))
+    {
+      case GIMPLE_SINGLE_RHS:
+	rhs1 = gimple_assign_rhs1 (def);
+	if (TREE_CODE (rhs1) != SSA_NAME)
+	  return false;
+	break;
+      case GIMPLE_UNARY_RHS:
+	rhs1 = gimple_assign_rhs1 (def);
+	if (TREE_TYPE (gimple_expr_type (def)) != TREE_TYPE (rhs1))
+	  return false;
+	break;
+      case GIMPLE_BINARY_RHS:
+	rhs1 = gimple_assign_rhs1 (def);
+	rhs2 = gimple_assign_rhs2 (def);
+
+	switch (gimple_assign_rhs_code (def))
+	  {
+	    case LSHIFT_EXPR: case RSHIFT_EXPR:
+	    case LROTATE_EXPR: case RROTATE_EXPR:
+	      rhs2 = NULL;
+	      if (TREE_TYPE (lhs) != TREE_TYPE (rhs1))
+		return false;
+	      break;
+	    case PLUS_EXPR: case MINUS_EXPR:
+	    case MULT_EXPR: case EXACT_DIV_EXPR:
+	    case TRUNC_DIV_EXPR: case CEIL_DIV_EXPR:
+	    case FLOOR_DIV_EXPR: case ROUND_DIV_EXPR:
+	    case TRUNC_MOD_EXPR: case CEIL_MOD_EXPR:
+	    case FLOOR_MOD_EXPR: case ROUND_MOD_EXPR:
+	    case RDIV_EXPR:
+	    case MIN_EXPR: case MAX_EXPR:
+	    case BIT_IOR_EXPR: case BIT_XOR_EXPR: case BIT_AND_EXPR:
+	      if (TREE_TYPE (lhs) != TREE_TYPE (rhs1)
+		  || TREE_TYPE (lhs) != TREE_TYPE (rhs2))
+		return false;
+	      break;
+	    default:
+	      return false;
+	  }
+	break;
+      default:
+	return false;
+    }
+  if (rhs1 && TREE_CODE (rhs1) != SSA_NAME)
+    rhs1 = NULL;
+  if (prhs1)
+    *prhs1 = rhs1;
+  if (rhs2 && TREE_CODE (rhs2) != SSA_NAME)
+    rhs2 = NULL;
+  if (prhs2)
+    *prhs2 = rhs2;
+  return true;
+}
+
+static tree
+get_extended_version (tree newtype, tree name, bool force)
+{
+  tree ret = TREE_CHAIN (name);
+  tree rhs1, rhs2;
+  gimple def;
+  /* If we already have a version of NAME, try to use it.  If it
+     doesn't match in type, fail.  */
+  if (ret)
+    {
+      if (TREE_TYPE (ret) == newtype)
+	return ret;
+      else
+	return NULL_TREE;
+    }
+  def = SSA_NAME_DEF_STMT (name);
+  /* If we can propagate through our defining insn, try to do that.  */
+  if (promote_through_insn_p (def, &rhs1, &rhs2))
+    {
+      gimple stmt;
+      tree extrhs1, extrhs2;
+      gimple_stmt_iterator gsi;
+      enum tree_code code;
+      if (rhs1)
+	{
+	  extrhs1 = get_extended_version (newtype, rhs1, true);
+	  if (!extrhs1)
+	    /* ??? We could force here.  */
+	    return NULL_TREE;
+	}
+      else
+	extrhs1 = gimple_assign_rhs1 (def);
+      if (extrhs1 && !useless_type_conversion_p (newtype, TREE_TYPE (extrhs1)))
+	{
+	  gcc_assert (is_gimple_min_invariant (extrhs1));
+	  extrhs1 = fold_convert (newtype, extrhs1);
+	}
+      if (rhs2)
+	{
+	  extrhs2 = get_extended_version (newtype, rhs2, true);
+	  if (!extrhs2)
+	    return NULL_TREE;
+	}
+      else
+	extrhs2 = gimple_assign_rhs2 (def);
+      code = gimple_assign_rhs_code (def);
+      if (extrhs2
+	  && code != LSHIFT_EXPR && code != RSHIFT_EXPR
+	  && code != LROTATE_EXPR && code != RROTATE_EXPR
+	  && !useless_type_conversion_p (newtype, TREE_TYPE (extrhs2)))
+	{
+	  gcc_assert (is_gimple_min_invariant (extrhs2));
+	  extrhs2 = fold_convert (newtype, extrhs2);
+	}
+      ret = create_tmp_reg (newtype, NULL);
+      add_referenced_var (ret);
+      ret = make_ssa_name (ret, NULL);
+      stmt = gimple_build_assign_with_ops (gimple_assign_rhs_code (def),
+					   ret, extrhs1, extrhs2);
+      gsi = gsi_for_stmt (def);
+      gsi_insert_after (&gsi, stmt, GSI_NEW_STMT);
+    }
+  else if (force)
+    {
+      gimple stmt;
+      gimple_stmt_iterator gsi;
+      /* We can't propagate through our defining statement, so emit
+         a conversion explicitly.  */
+      ret = create_tmp_reg (newtype, NULL);
+      add_referenced_var (ret);
+      ret = make_ssa_name (ret, NULL);
+      stmt = gimple_build_assign_with_ops (NOP_EXPR, ret, name, NULL);
+      if (gimple_nop_p (def))
+	{
+	  gsi = gsi_after_labels (single_succ (ENTRY_BLOCK_PTR));
+	  gsi_insert_before (&gsi, stmt, GSI_NEW_STMT);
+	}
+      else if (gimple_code (def) == GIMPLE_PHI)
+	{
+	  gsi = gsi_after_labels (gimple_bb (def));
+	  gsi_insert_before (&gsi, stmt, GSI_NEW_STMT);
+	}
+      else if (!stmt_ends_bb_p (def))
+	{
+	  gsi = gsi_for_stmt (def);
+	  gsi_insert_after (&gsi, stmt, GSI_NEW_STMT);
+	}
+      else
+	{
+	  /* XXX We could search the fallthru edge, and insert there.
+	     That's correct because if this statement ends the BB it
+	     must be throwing (it can't be a conditional), and hence
+	     the result (which we want to extend) actually is only
+	     available on the fallthru edge.  But inserting on edges
+	     might create new basic blocks, and that doesn't seem 
+	     worth it.  We could do that if the fallthru successor
+	     has only one incoming edge.  Actually we can be sure that
+	     this is the case, as NAME can't be used in a PHI node
+	     (we don't call ourself on them), hence it must be used
+	     in something we dominate and therefore our fallthru successor
+	     must have only one incoming edge.  */
+#if 0
+	  edge e;
+	  edge_iterator ei;
+
+	  FOR_EACH_EDGE (e, ei, gimple_bb (def)->succs)
+	    if (e->flags & EDGE_FALLTHRU)
+	      gsi_insert_on_edge_immediate (e, sum);
+#endif
+	  return NULL_TREE;
+	}
+    }
+  TREE_CHAIN (name) = ret;
+  return ret;
+}
+
+static unsigned int
+execute_bprop_extends (void)
+{
+  unsigned int todoflags = 0;
+  size_t i;
+  /*bitmap_iterator bi;*/
+  bitmap new_extend = BITMAP_ALLOC (NULL);
+
+  /* Collect SSA names that we'd like to have in a wider type.  */
+  for (i = 0; i < num_ssa_names; i++)
+    {
+      tree lhs = ssa_name (i);
+      if (lhs)
+	{
+	  gimple def = SSA_NAME_DEF_STMT (lhs);
+	  tree rhs;
+	  TREE_CHAIN (lhs) = NULL;
+	  if (!is_gimple_assign (def)
+	      || !CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (def)))
+	    continue;
+	  rhs = gimple_assign_rhs1 (def);
+	  /* We could handle int->pointer conversions with some care
+	     through which insns we look through.  Don't bother for now.  */
+	  if (POINTER_TYPE_P (TREE_TYPE (lhs))
+	      || POINTER_TYPE_P (TREE_TYPE (rhs)))
+	    continue;
+	  if (TYPE_PRECISION (TREE_TYPE (lhs))
+	      <= TYPE_PRECISION (TREE_TYPE (rhs)))
+	    continue;
+	  if (!nowrap_type_p (TREE_TYPE (rhs)))
+	    continue;
+	  bitmap_set_bit (new_extend, SSA_NAME_VERSION (lhs));
+	}
+    }
+
+  while (!bitmap_empty_p (new_extend))
+    {
+      tree lhs, rhs, extlhs;
+      gimple def;
+      i = bitmap_first_set_bit (new_extend);
+      bitmap_clear_bit (new_extend, i);
+      lhs = ssa_name (i);
+      gcc_assert (lhs);
+      def = SSA_NAME_DEF_STMT (lhs);
+      rhs = gimple_assign_rhs1 (def);
+      extlhs = get_extended_version (TREE_TYPE (lhs), rhs, false);
+      if (extlhs)
+	{
+	  gimple_assign_set_rhs1 (def, extlhs);
+	  gimple_assign_set_rhs_code (def, SSA_NAME);
+	  update_stmt (def);
+	}
+    }
+
+  BITMAP_FREE (new_extend);
+  return todoflags;
+}
+
+struct gimple_opt_pass pass_bprop_extends =
+{
+ {
+  GIMPLE_PASS,
+  "bprop",				/* name */
+  NULL,					/* gate */
+  execute_bprop_extends,		/* execute */
+  NULL,					/* sub */
+  NULL,					/* next */
+  0,					/* static_pass_number */
+  TV_NONE,				/* tv_id */
+  PROP_cfg | PROP_ssa,			/* properties_required */
+  0,					/* properties_provided */
+  0,					/* properties_destroyed */
+  0,					/* todo_flags_start */
+  TODO_dump_func
+    | TODO_verify_ssa
+    | TODO_update_ssa			/* todo_flags_finish */
+ }
+};
+#endif


* Re: [patch tree-optimization]: Fix for PR 45397 part 1 of 2
  2012-03-15 14:34         ` Kai Tietz
@ 2012-03-16 11:35           ` Richard Guenther
  0 siblings, 0 replies; 8+ messages in thread
From: Richard Guenther @ 2012-03-16 11:35 UTC (permalink / raw)
  To: Kai Tietz; +Cc: Jakub Jelinek, GCC Patches

On Thu, Mar 15, 2012 at 3:34 PM, Kai Tietz <ktietz70@googlemail.com> wrote:
> 2012/3/15 Richard Guenther <richard.guenther@gmail.com>:
>> On Thu, Mar 15, 2012 at 3:00 PM, Jakub Jelinek <jakub@redhat.com> wrote:
>>> On Thu, Mar 15, 2012 at 02:53:10PM +0100, Kai Tietz wrote:
>>>> > This looks like it matches unbounded pattern sizes and thus does not fit
>>>> > into the forwprop machinery.  Instead it was suggested elsewhere
>>>> > that promoting / demoting registers should be done in a separate pass
>>>> > where you can compute a lattice of used bits and apply a transform
>>>> > based on that lattice and target information (according to PROMOTE_MODE
>>>> > for example).
>>>>
>>>> Well, the integer truncation part might be something for a separate
>>>> pass.  It could then also take care that within single-use
>>>> gimple statements the integer constant is always on the right-hand side
>>>> of the first statement of a +, -, |, ^, &, or mul.
>>>>
>>>> But the cast-hoisting code itself is not unbounded AFAICS and has a
>>>> fixed pattern size.
>>
>> Can you split that part out then please?
>
> I can do that.  It would just be the removal of the function
> truncate_integers and its call.
>
>>> The type demotion is PR45397/PR47477 among other PRs.
>>> I'd just walk from the narrowing integer conversion stmts recursively
>>> through the def stmts, see if they can be narrowed, note it, and finally if
>>> everything or a significant portion of the stmts can be demoted (if not all,
>>> with some narrowing integer conversion stmt inserted), do it all together.
>
> Jakub, this might be something good to have in a separate pass.
> Right now I need to avoid some type-hoisting in forward propagation,
> as otherwise it would loop endlessly due to the type-sinking code in
> forward propagation.

Well, that sounds like one of the two is not applying costs properly.  We
should have a total cost-wise ordering on both forms.  Which forms do we
iterate on?

> The only question would be where such a pass would
> best be placed.  After or before the forward-propagation pass?

Somewhere before loop optimizations (especially IVOPTs).  Generally
it might help SCEV analysis, so I'd do it before PRE, after the CCP/copy-prop
passes.

>> For PROMOTE_MODE targets you'd promote but properly mask out
>> constants (to make them cheaper to generate, for example).  You'd
>> also take advantage of targets that can do zero/sign-extending loads
>> without extra cost (ISTR that's quite important for some SPEC 2k6
>> benchmark on x86_64).
>
> Hmm, as we are talking about truncation-casts here, what is the reason
> for PROMOTE_MODE here?
> You mean to generate for a PROMOTE_MODE target explicitly something like:
> D1 = 0x80
> D2 = (int) D1
>
> instead of having D1 = 0xffffff80.  Isn't that a decision done at the RTL level?

PROMOTE_MODE is about targets only having basic operations in
the mode PROMOTE_MODE promotes to.  For example (IIRC) powerpc
can only do arithmetic at the full register width, not on arbitrary
sub-regs as i386 can.  Which means that if you need sub-reg precision
values we have to insert truncations after every operation on RTL.
Thus, if you artificially lower the precision of computations on the
tree level this will pessimize things on RTL.  So, you should never
narrow operations below what PROMOTE_MODE would do.  In fact, you
might as well expose the PROMOTE_MODE fact on the tree level.
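
Illustratively, at the source level (a made-up example):

/* On a PROMOTE_MODE target, keep the arithmetic at word width and make
   the narrowing explicit where the value is consumed; the masked
   constant 0x80 is cheaper to materialize than the sign-extended
   0xffffff80.  */
unsigned char
f (unsigned int x, unsigned int y)
{
  unsigned int wide = x + y + 0x80u;       /* full-width add */
  return (unsigned char) (wide & 0xffu);   /* one truncation at the use */
}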

> Another interesting type-hoisting part is to check if a type-sign
> change might be profitable.
>
> Eg:
> signed char D0, D1, D5;
> int D2,D3,D4
> ...
> D2 = (int) D0
> D3 = (int) D1
> D4 = D2 + D3
> D5 = (signed char) D4
> D6 = D5 == 0x8f
>
> to
>
> signed char D0, D1, D5;
> unsigned char D2,D3,D4
> ...
> D2 = (unsigned char) D0
> D3 = (unsigned char) D1
> D4 = D2 + D3
> D5 = D4 == 0x7fu

You need to watch out for undefined overflow issues on the narrowed
expressions anyway (thus, you have to resort to unsigned arithmetic in
nearly all cases).
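
A minimal example (relying on GCC defining the out-of-range conversion
to signed char as modulo 2^8):

#include <assert.h>

int
main (void)
{
  int x = 100, y = 100;
  /* A narrowed signed add would overflow signed char here, which the
     optimizers may not introduce; the unsigned form wraps and keeps
     the same low bits as the wide computation.  */
  signed char wide = (signed char) (x + y);
  unsigned char narrow = (unsigned char) x + (unsigned char) y;
  assert ((signed char) narrow == wide);
  return 0;
}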

Richard.

>> Richard.
>>
>>>        Jakub
>
> Regards,
> Kai

