* [PATCH] Fold more aggressively for trip counts
@ 2007-01-03 2:40 Robert Kennedy
2007-01-03 2:50 ` Andrew Pinski
2007-01-03 13:57 ` Richard Guenther
0 siblings, 2 replies; 15+ messages in thread
From: Robert Kennedy @ 2007-01-03 2:40 UTC (permalink / raw)
To: gcc-patches
Attached is a patch to do a little bit of extra work in analyzing trip
counts for loops. As a side benefit, some other phases will perhaps
get slightly better folding results.
The motivation for this patch came from some experiments I did with
doing strength reduction and linear function test replacement rather
early in optimization, before, e.g., vectorization. At first, code
introduced by SR caused some failures by keeping loops from being
analyzed well, hence the genesis of this patch.
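To sketch the kind of code involved (illustration only; t_i and t_n are
made-up names for SR-introduced temporaries): a counted loop guarded on
its bound, where SR/LFTR has rewritten the exit test so that it no
longer mentions the variable the guard tests:

    /* Illustration only; t_i and t_n stand for SR-introduced temporaries.  */
    if (n > 0)
      {
        int t_i = 0;
        int t_n = n * 4;        /* introduced by strength reduction */
        do
          {
            body (t_i);
            t_i += 4;
          }
        while (t_i < t_n);      /* exit test no longer mentions n directly */
      }

Without looking through the definitions, the trip count analysis can no
longer use the (n > 0) guard when reasoning about the rewritten exit
test.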
Bootstrapped and tested on i686-pc-linux-gnu.
-- Robert Kennedy
------------------------------
2007-01-02 Robert Kennedy <jimbob@google.com>
* tree-ssa-loop-niter.c (tree_simplify_using_condition_1):
Simplify using simply related variables, not just identical
ones.
* fold-const.c (fold_comparison): Fold comparisons like (x *
1000 < 0) to (x < 0).
==== //depot2/gcctools/google_vendor_src_branch/gcc/trunk/gcc/fold-const.c#16 - /home/jimbob/clients/jimbob-perforce2-test/gcctools/google_vendor_src_branch/gcc/trunk/gcc/fold-const.c ====
# action=edit type=text
--- gcctools/google_vendor_src_branch/gcc/trunk/gcc/fold-const.c 2006-12-21 16:55:26.000000000 -0800
+++ gcctools/google_vendor_src_branch/gcc/trunk/gcc/fold-const.c 2007-01-02 11:06:35.000000000 -0800
@@ -42,7 +42,7 @@
force_fit_type takes a constant, an overflowable flag and prior
overflow indicators. It forces the value to fit the type and sets
TREE_OVERFLOW and TREE_CONSTANT_OVERFLOW as appropriate.
-
+
Note: Since the folders get called on non-gimple code as well as
gimple code, we need to handle GIMPLE tuples as well as their
corresponding tree equivalents. */
@@ -8065,6 +8065,29 @@
variable2);
}
+ /* Transform comparisons of the form X * C1 CMP 0 to X CMP 0 in the
+ signed arithmetic case. That form is created by the compiler
+ often enough for folding it to be of value. One example is in
+ computing loop trip counts after Operator Strength Reduction. */
+ if (!(flag_wrapv || flag_trapv)
+ && !TYPE_UNSIGNED (TREE_TYPE (arg0))
+ && TREE_CODE (arg0) == MULT_EXPR
+ && (TREE_CODE (TREE_OPERAND (arg0, 1)) == INTEGER_CST
+ && !TREE_OVERFLOW (TREE_OPERAND (arg0, 1)))
+ && integer_zerop (arg1))
+ {
+ tree const1 = TREE_OPERAND (arg0, 1);
+ tree const2 = arg1; /* zero */
+ tree variable1 = TREE_OPERAND (arg0, 0);
+ enum tree_code cmp_code = code;
+
+ /* If const1 is negative we swap the sense of the comparison. */
+ if (tree_int_cst_sgn (const1) < 0)
+ cmp_code = swap_tree_comparison (cmp_code);
+
+ return fold_build2 (cmp_code, type, variable1, const2);
+ }
+
tem = maybe_canonicalize_comparison (code, type, arg0, arg1);
if (tem)
return tem;
==== //depot2/gcctools/google_vendor_src_branch/gcc/trunk/gcc/tree-ssa-loop-niter.c#10 - /home/jimbob/clients/jimbob-perforce2-test/gcctools/google_vendor_src_branch/gcc/trunk/gcc/tree-ssa-loop-niter.c ====
# action=edit type=text
--- gcctools/google_vendor_src_branch/gcc/trunk/gcc/tree-ssa-loop-niter.c 2006-12-29 11:18:58.000000000 -0800
+++ gcctools/google_vendor_src_branch/gcc/trunk/gcc/tree-ssa-loop-niter.c 2007-01-02 11:11:41.000000000 -0800
@@ -837,6 +837,55 @@
if (e && integer_zerop (e))
return e;
+ /* Try something a little aggressive: If te is easily expressible in
+ terms of a variable in cond (resp. notcond), we can give a
+ stronger answer. Get the definitions of te's variable(s) and try
+ checks based on equivalent (or weaker) conditions in terms of
+ them. The original motivation was to keep Operator Strength
+ Reduction from hurting subsequent loop optimizations by making
+ trip counts harder to compute.
+
+ TODO: Consider iterating here through copies instead of checking
+ just one level up. */
+
+ if (COMPARISON_CLASS_P (te))
+ {
+ tree def0;
+ tree def1;
+
+ def0 = e0 = TREE_OPERAND (te, 0);
+ def1 = e1 = TREE_OPERAND (te, 1);
+
+ /* Did somebody make sure te is canonicalized somehow? What are the
+ rules for its canonicalization? */
+ if (TREE_CODE (e0) == SSA_NAME)
+ {
+ tree def = SSA_NAME_DEF_STMT (e0);
+ if (TREE_CODE (def) == GIMPLE_MODIFY_STMT)
+ def0 = GIMPLE_STMT_OPERAND (def, 1);
+ }
+ if (TREE_CODE (e1) == SSA_NAME)
+ {
+ tree def = SSA_NAME_DEF_STMT (e1);
+ if (TREE_CODE (def) == GIMPLE_MODIFY_STMT)
+ def1 = GIMPLE_STMT_OPERAND (def, 1);
+ }
+
+ if (def0 != e0
+ || def1 != e1)
+ te = fold_build2 (TREE_CODE (te), boolean_type_node, def0, def1);
+
+ /* Check whether COND ==> EXPR. */
+ e = fold_binary (TRUTH_OR_EXPR, boolean_type_node, notcond, te);
+ if (e && integer_nonzerop (e))
+ return e;
+
+ /* Check whether COND ==> not EXPR. */
+ e = fold_binary (TRUTH_AND_EXPR, boolean_type_node, cond, te);
+ if (e && integer_zerop (e))
+ return e;
+ }
+
return expr;
}
* Re: [PATCH] Fold more aggressively for trip counts
2007-01-03 2:40 [PATCH] Fold more aggressively for trip counts Robert Kennedy
@ 2007-01-03 2:50 ` Andrew Pinski
2007-01-03 8:49 ` Paolo Bonzini
2007-01-03 13:57 ` Richard Guenther
1 sibling, 1 reply; 15+ messages in thread
From: Andrew Pinski @ 2007-01-03 2:50 UTC (permalink / raw)
To: jimbob; +Cc: gcc-patches
>
> Attached is a patch to do a little bit of extra work in analyzing trip
> counts for loops. As a side benefit, some other phases will perhaps
> get slightly better folding results.
>
> The motivation for this patch came from some experiments I did with
> doing strength reduction and linear function test replacement rather
> early in optimization, before, e.g., vectorization. At first, code
> introduced by SR caused some failures by keeping loops from being
> analyzed well, hence the genesis of this patch.
>
> Bootstrapped and tested on i686-pc-linux-gnu.
>
> -- Robert Kennedy
>
> ------------------------------
>
> + /* Did somebody make sure te is canonicalized somehow? What are the
^
Add h there.
> + rules for its canonicalization? */
The rules for canonicalization should be documented in c-tree.texi but they
are not :(. Right now maybe_canonicalize_comparison documents them.
Thanks,
Andrew Pinski
* Re: [PATCH] Fold more aggressively for trip counts
2007-01-03 2:50 ` Andrew Pinski
@ 2007-01-03 8:49 ` Paolo Bonzini
0 siblings, 0 replies; 15+ messages in thread
From: Paolo Bonzini @ 2007-01-03 8:49 UTC (permalink / raw)
To: Andrew Pinski, jimbob; +Cc: gcc-patches
>> + /* Did somebody make sure te is canonicalized somehow? What are the
> ^
> Add h there.
No, rather, write all variable names (including TE) in uppercase.
Paolo
* Re: [PATCH] Fold more aggressively for trip counts
2007-01-03 2:40 [PATCH] Fold more aggressively for trip counts Robert Kennedy
2007-01-03 2:50 ` Andrew Pinski
@ 2007-01-03 13:57 ` Richard Guenther
2007-01-03 15:14 ` Daniel Berlin
2007-01-04 19:35 ` Robert Kennedy
1 sibling, 2 replies; 15+ messages in thread
From: Richard Guenther @ 2007-01-03 13:57 UTC (permalink / raw)
To: Robert Kennedy; +Cc: gcc-patches
On 1/3/07, Robert Kennedy <jimbob@google.com> wrote:
> Attached is a patch to do a little bit of extra work in analyzing trip
> counts for loops. As a side benefit, some other phases will perhaps
> get slightly better folding results.
>
> The motivation for this patch came from some experiments I did with
> doing strength reduction and linear function test replacement rather
> early in optimization, before, e.g., vectorization. At first, code
> introduced by SR caused some failures by keeping loops from being
> analyzed well, hence the genesis of this patch.
>
> Bootstrapped and tested on i686-pc-linux-gnu.
The fold-const change is ok if you add a testcase.
tree_simplify_using_condition_1 is already complex, ugly and slow - we
shouldn't add to that more. Can you provide a testcase and see where we
create a comparison that makes this extra testing worthwhile? Can you
fix this place instead?
Thanks,
Richard.
>
> -- Robert Kennedy
>
> ------------------------------
>
> 2007-01-02 Robert Kennedy <jimbob@google.com>
>
> * tree-ssa-loop-niter.c (tree_simplify_using_condition_1):
> Simplify using simply related variables, not just identical
> ones.
> * fold-const.c (fold_comparison): Fold comparisons like (x *
> 1000 < 0) to (x < 0).
>
> ==== //depot2/gcctools/google_vendor_src_branch/gcc/trunk/gcc/fold-const.c#16 - /home/jimbob/clients/jimbob-perforce2-test/gcctools/google_vendor_src_branch/gcc/trunk/gcc/fold-const.c ====
> # action=edit type=text
> --- gcctools/google_vendor_src_branch/gcc/trunk/gcc/fold-const.c 2006-12-21 16:55:26.000000000 -0800
> +++ gcctools/google_vendor_src_branch/gcc/trunk/gcc/fold-const.c 2007-01-02 11:06:35.000000000 -0800
> @@ -42,7 +42,7 @@
> force_fit_type takes a constant, an overflowable flag and prior
> overflow indicators. It forces the value to fit the type and sets
> TREE_OVERFLOW and TREE_CONSTANT_OVERFLOW as appropriate.
> -
> +
> Note: Since the folders get called on non-gimple code as well as
> gimple code, we need to handle GIMPLE tuples as well as their
> corresponding tree equivalents. */
> @@ -8065,6 +8065,29 @@
> variable2);
> }
>
> + /* Transform comparisons of the form X * C1 CMP 0 to X CMP 0 in the
> + signed arithmetic case. That form is created by the compiler
> + often enough for folding it to be of value. One example is in
> + computing loop trip counts after Operator Strength Reduction. */
> + if (!(flag_wrapv || flag_trapv)
> + && !TYPE_UNSIGNED (TREE_TYPE (arg0))
> + && TREE_CODE (arg0) == MULT_EXPR
> + && (TREE_CODE (TREE_OPERAND (arg0, 1)) == INTEGER_CST
> + && !TREE_OVERFLOW (TREE_OPERAND (arg0, 1)))
> + && integer_zerop (arg1))
> + {
> + tree const1 = TREE_OPERAND (arg0, 1);
> + tree const2 = arg1; /* zero */
> + tree variable1 = TREE_OPERAND (arg0, 0);
> + enum tree_code cmp_code = code;
> +
> + /* If const1 is negative we swap the sense of the comparison. */
> + if (tree_int_cst_sgn (const1) < 0)
> + cmp_code = swap_tree_comparison (cmp_code);
> +
> + return fold_build2 (cmp_code, type, variable1, const2);
> + }
> +
> tem = maybe_canonicalize_comparison (code, type, arg0, arg1);
> if (tem)
> return tem;
> ==== //depot2/gcctools/google_vendor_src_branch/gcc/trunk/gcc/tree-ssa-loop-niter.c#10 - /home/jimbob/clients/jimbob-perforce2-test/gcctools/google_vendor_src_branch/gcc/trunk/gcc/tree-ssa-loop-niter.c ====
> # action=edit type=text
> --- gcctools/google_vendor_src_branch/gcc/trunk/gcc/tree-ssa-loop-niter.c 2006-12-29 11:18:58.000000000 -0800
> +++ gcctools/google_vendor_src_branch/gcc/trunk/gcc/tree-ssa-loop-niter.c 2007-01-02 11:11:41.000000000 -0800
> @@ -837,6 +837,55 @@
> if (e && integer_zerop (e))
> return e;
>
> + /* Try something a little aggressive: If te is easily expressible in
> + terms of a variable in cond (resp. notcond), we can give a
> + stronger answer. Get the definitions of te's variable(s) and try
> + checks based on equivalent (or weaker) conditions in terms of
> + them. The original motivation was to keep Operator Strength
> + Reduction from hurting subsequent loop optimizations by making
> + trip counts harder to compute.
> +
> + TODO: Consider iterating here through copies instead of checking
> + just one level up. */
> +
> + if (COMPARISON_CLASS_P (te))
> + {
> + tree def0;
> + tree def1;
> +
> + def0 = e0 = TREE_OPERAND (te, 0);
> + def1 = e1 = TREE_OPERAND (te, 1);
> +
> + /* Did somebody make sure te is canonicalized somehow? What are the
> + rules for its canonicalization? */
> + if (TREE_CODE (e0) == SSA_NAME)
> + {
> + tree def = SSA_NAME_DEF_STMT (e0);
> + if (TREE_CODE (def) == GIMPLE_MODIFY_STMT)
> + def0 = GIMPLE_STMT_OPERAND (def, 1);
> + }
> + if (TREE_CODE (e1) == SSA_NAME)
> + {
> + tree def = SSA_NAME_DEF_STMT (e1);
> + if (TREE_CODE (def) == GIMPLE_MODIFY_STMT)
> + def1 = GIMPLE_STMT_OPERAND (def, 1);
> + }
> +
> + if (def0 != e0
> + || def1 != e1)
> + te = fold_build2 (TREE_CODE (te), boolean_type_node, def0, def1);
> +
> + /* Check whether COND ==> EXPR. */
> + e = fold_binary (TRUTH_OR_EXPR, boolean_type_node, notcond, te);
> + if (e && integer_nonzerop (e))
> + return e;
> +
> + /* Check whether COND ==> not EXPR. */
> + e = fold_binary (TRUTH_AND_EXPR, boolean_type_node, cond, te);
> + if (e && integer_zerop (e))
> + return e;
> + }
> +
> return expr;
> }
>
* Re: [PATCH] Fold more aggressively for trip counts
2007-01-03 13:57 ` Richard Guenther
@ 2007-01-03 15:14 ` Daniel Berlin
2007-01-04 19:35 ` Robert Kennedy
1 sibling, 0 replies; 15+ messages in thread
From: Daniel Berlin @ 2007-01-03 15:14 UTC (permalink / raw)
To: Richard Guenther; +Cc: Robert Kennedy, gcc-patches
On 1/3/07, Richard Guenther <richard.guenther@gmail.com> wrote:
> On 1/3/07, Robert Kennedy <jimbob@google.com> wrote:
> > Attached is a patch to do a little bit of extra work in analyzing trip
> > counts for loops. As a side benefit, some other phases will perhaps
> > get slightly better folding results.
> >
> > The motivation for this patch came from some experiments I did with
> > doing strength reduction and linear function test replacement rather
> > early in optimization, before, e.g., vectorization. At first, code
> > introduced by SR caused some failures by keeping loops from being
> > analyzed well, hence the genesis of this patch.
> >
> > Bootstrapped and tested on i686-pc-linux-gnu.
>
> The fold-const change is ok if you add a testcase.
> tree_simplify_using_condition_1 is already complex, ugly and slow - we
> shouldn't add to that more. Can you provide a testcase and see where we
> create a comparison that makes this extra testing worthwhile? Can you
> fix this place instead?
>
The main place that exposes this is the Operator Strength Reduction
pass, which is not yet committed or submitted. These comparisons are
mainly generated by the exit test replacement it performs.
Theoretically you could work around this by moving OSR much earlier
than where we have it now, and then some other pass would probably
simplify these (VRP is my guess), but doing so is not really feasible:
you don't want to run strength reduction too early, since it makes
things harder to analyze. We currently have it right before ivopts,
which is somewhere that both Zdenek and I agree is the right place for
such a pass.
* Re: [PATCH] Fold more aggressively for trip counts
2007-01-03 13:57 ` Richard Guenther
2007-01-03 15:14 ` Daniel Berlin
@ 2007-01-04 19:35 ` Robert Kennedy
2007-01-04 19:46 ` Richard Guenther
2007-01-04 22:59 ` Zdenek Dvorak
1 sibling, 2 replies; 15+ messages in thread
From: Robert Kennedy @ 2007-01-04 19:35 UTC (permalink / raw)
To: Richard Guenther; +Cc: Robert Kennedy, gcc-patches
> The fold-const change is ok if you add a testcase.
A testcase is difficult because the fold-const change only works on
non-GIMPLE comparisons that are generated during analysis and then
thrown away. Because it only works on non-GIMPLE comparisons, it will
not directly affect comparisons in the code. Its point is to make
analysis more precise.
In order to see the effects, you have to:
1. have something that makes the analysis more difficult, like
Operator Strength Reduction, which isn't checked in yet; and
2. have it in a carefully chosen place so that it happens "too early,"
i.e., before some phase that does analysis based on comparisons
(vectorization is an example).
The place Dan, Zdenek, and I have settled on for OSR is just before
IVOPTS, which is too late to cause much confusion. It is possible that
with OSR that late, my fold-const change is a no-op. But I'm in favor
of the change anyway because the cost is very low and it addresses a
general problem that could arise with new or existing optimizations as
things change or get moved around. For example, if we decide in the
future to put OSR earlier or vectorization later, optimizations will
break without this change.
-- Robert
* Re: [PATCH] Fold more aggressively for trip counts
2007-01-04 19:35 ` Robert Kennedy
@ 2007-01-04 19:46 ` Richard Guenther
2007-01-04 19:49 ` Robert Kennedy
2007-01-04 22:59 ` Zdenek Dvorak
1 sibling, 1 reply; 15+ messages in thread
From: Richard Guenther @ 2007-01-04 19:46 UTC (permalink / raw)
To: Robert Kennedy; +Cc: gcc-patches
On 1/4/07, Robert Kennedy <jimbob@google.com> wrote:
> > The fold-const change is ok if you add a testcase.
>
> A testcase is difficult because the fold-const change only works on
> non-GIMPLE comparisons that are generated during analysis and then
> thrown away. Because it only works on non-GIMPLE comparisons, it will
> not directly affect comparisons in the code. Its point is to make
> analysis more precise.
Well, a testcase looks like
int foo(int a)
{
return a * 1000 > 0;
}
and then check the gimple tree dump. The full expressions are
fed to fold from the frontend.
Richard.
>
> In order to see the effects, you have to:
> 1. have something that makes the analysis more difficult, like
> Operator Strength Reduction, which isn't checked in yet; and
> 2. have it in a carefully chosen place so that it happens "too early,"
> i.e., before some phase that does analysis based on comparisons
> (vectorization is an example).
>
> The place Dan, Zdenek, and I have settled on for OSR is just before
> IVOPTS, which is too late to cause much confusion. It is possible that
> with OSR that late, my fold-const change is a no-op. But I'm in favor
> of the change anyway because the cost is very low and it addresses a
> general problem that could arise with new or existing optimizations as
> things change or get moved around. For example, if we decide in the
> future to put OSR earlier or vectorization later, optimizations will
> break without this change.
>
> -- Robert
>
* Re: [PATCH] Fold more aggressively for trip counts
2007-01-04 19:46 ` Richard Guenther
@ 2007-01-04 19:49 ` Robert Kennedy
2007-01-04 19:54 ` Richard Guenther
0 siblings, 1 reply; 15+ messages in thread
From: Robert Kennedy @ 2007-01-04 19:49 UTC (permalink / raw)
To: Richard Guenther; +Cc: Robert Kennedy, gcc-patches
> Well, a testcase looks like
>
> int foo(int a)
> {
> return a * 1000 > 0;
> }
>
> and then check the gimple tree dump.
I tried that. Somehow (I'm not sure how) it gets folded anyway without
my change. Whatever is doing it in that context does not apply during
the loop optimizations.
-- Robert
* Re: [PATCH] Fold more aggressively for trip counts
2007-01-04 19:49 ` Robert Kennedy
@ 2007-01-04 19:54 ` Richard Guenther
2007-01-04 19:58 ` Robert Kennedy
0 siblings, 1 reply; 15+ messages in thread
From: Richard Guenther @ 2007-01-04 19:54 UTC (permalink / raw)
To: Robert Kennedy; +Cc: gcc-patches
On 1/4/07, Robert Kennedy <jimbob@google.com> wrote:
> > Well, a testcase looks like
> >
> > int foo(int a)
> > {
> > return a * 1000 > 0;
> > }
> >
> > and then check the gimple tree dump.
>
> I tried that. Somehow (I'm not sure how) it gets folded anyway without
> my change. Whatever is doing it in that context does not apply during
> the loop optimizations.
I see the condition survive until asm even with -O2 on i686.
Richard.
* Re: [PATCH] Fold more aggressively for trip counts
2007-01-04 19:54 ` Richard Guenther
@ 2007-01-04 19:58 ` Robert Kennedy
0 siblings, 0 replies; 15+ messages in thread
From: Robert Kennedy @ 2007-01-04 19:58 UTC (permalink / raw)
To: Richard Guenther; +Cc: Robert Kennedy, gcc-patches
> > I tried that. Somehow (I'm not sure how) it gets folded anyway without
> > my change. Whatever is doing it in that context does not apply during
> > the loop optimizations.
>
> I see the condition survive until asm even with -O2 on i686.
Thanks. I was looking in the wrong subdirectory... I take it all
back. :-( Grr.
Sorry for the distraction. You are right. I'll add the testcase.
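Presumably something along these lines (just a sketch; the exact dump to
scan and the pattern still need to be checked against the actual dump
output):

    /* { dg-do compile } */
    /* { dg-options "-fdump-tree-gimple" } */

    int foo (int a)
    {
      return a * 1000 > 0;
    }

    /* The multiplication should be folded away from the comparison.  */
    /* { dg-final { scan-tree-dump-not "\\* 1000" "gimple" } } */
    /* { dg-final { cleanup-tree-dump "gimple" } } */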
-- Robert
* Re: [PATCH] Fold more aggressively for trip counts
2007-01-04 19:35 ` Robert Kennedy
2007-01-04 19:46 ` Richard Guenther
@ 2007-01-04 22:59 ` Zdenek Dvorak
2007-01-05 1:18 ` Robert Kennedy
2007-01-10 0:05 ` Robert Kennedy
1 sibling, 2 replies; 15+ messages in thread
From: Zdenek Dvorak @ 2007-01-04 22:59 UTC (permalink / raw)
To: Robert Kennedy; +Cc: Richard Guenther, gcc-patches
Hello,
> + /* Try something a little aggressive: If te is easily expressible in
> + terms of a variable in cond (resp. notcond), we can give a
> + stronger answer. Get the definitions of te's variable(s) and try
> + checks based on equivalent (or weaker) conditions in terms of
> + them. The original motivation was to keep Operator Strength
> + Reduction from hurting subsequent loop optimizations by making
> + trip counts harder to compute.
> +
> + TODO: Consider iterating here through copies instead of checking
> + just one level up. */
implementing the TODO you mention here looks quite dangerous, asking for
serious performance problems. I am not sure making tree_simplify_using_condition
more complex is a good idea -- for what kind of expressions do you need this change?
Also, I think you should provide some measurements supporting the need for this change -- i.e.,
verify that it does not increase compile time, and measure how many times this change is useful
(for example, during gcc bootstrap).
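(For the latter, a throwaway counter around the new simplification would
be enough -- e.g. something like the sketch below, counter name made up,
not meant for the final patch:

    /* Temporary instrumentation only: count how often the new code in
       tree_simplify_using_condition_1 actually simplifies something.  */
    static unsigned long niter_extra_simplifications;

    ...
      if (e && integer_nonzerop (e))
        {
          niter_extra_simplifications++;
          return e;
        }

and print the total somewhere at the end of the compilation.)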
> + if (COMPARISON_CLASS_P (te))
> + {
> + tree def0;
> + tree def1;
> +
> + def0 = e0 = TREE_OPERAND (te, 0);
> + def1 = e1 = TREE_OPERAND (te, 1);
> +
> + /* Did somebody make sure te is canonicalized somehow? What are the
> + rules for its canonicalization? */
I do not think it is useful to introduce this kind of comment into the code.
Anyway, how would you want it to be canonicalized? I think the answer is no:
the # of iterations analysis will pass all sorts of expressions to
tree_simplify_using_condition.
Zdenek
> + if (TREE_CODE (e0) == SSA_NAME)
> + {
> + tree def = SSA_NAME_DEF_STMT (e0);
> + if (TREE_CODE (def) == GIMPLE_MODIFY_STMT)
> + def0 = GIMPLE_STMT_OPERAND (def, 1);
> + }
> + if (TREE_CODE (e1) == SSA_NAME)
> + {
> + tree def = SSA_NAME_DEF_STMT (e1);
> + if (TREE_CODE (def) == GIMPLE_MODIFY_STMT)
> + def1 = GIMPLE_STMT_OPERAND (def, 1);
> + }
* Re: [PATCH] Fold more aggressively for trip counts
2007-01-04 22:59 ` Zdenek Dvorak
@ 2007-01-05 1:18 ` Robert Kennedy
2007-01-05 7:46 ` Zdenek Dvorak
2007-01-10 0:05 ` Robert Kennedy
1 sibling, 1 reply; 15+ messages in thread
From: Robert Kennedy @ 2007-01-05 1:18 UTC (permalink / raw)
To: Zdenek Dvorak; +Cc: Robert Kennedy, Richard Guenther, gcc-patches
> > + TODO: Consider iterating here through copies instead of checking
> > + just one level up. */
> implementing the TODO you mention here looks quite dangerous, asking for
> serious performance problems.
I assume you mean that you are concerned about an increase in
compilation time. That's why I said "consider." It would need to be
considered, implemented, and tested to make sure it didn't cause
trouble.
But I would be willing to simply remove that part of the comment if
you prefer.
> I am not sure making tree_simplify_using_condition
> more complex is a good idea -- for what kind of expressions do you
> need this change?
We discussed this a little bit privately in the past, but I don't
think I made clear the connection between that discussion and this
change. The situation is that you have a trip-countable loop guarded
by an if condition, and after SR/LFTR, the loop termination test is
replaced but the guard is not. As a result, slightly more
sophistication is needed to compute the trip count for the loop while
taking into account the guard.
Example:
for (i = 0; i < n; i++)
body(i * 4);
becomes:
if (n > 0)
{
i = 0;
do
{
body(i * 4);
i++;
}
while (i < n);
}
In that form, the do {} while loop is trip-countable without my change
because the (n > 0) guard proves easily that n cannot be negative
inside the "then" block.
But now with SR/LFTR, the code becomes:
if (n > 0)
{
osrtemp.i = 0;
osrtemp.n = n * 4;
do
{
body(osrtemp.i);
osrtemp.i += 4;
}
while (osrtemp.i < osrtemp.n);
}
If the trip count needs to be computed after such a transformation, my
change is needed because now the loop termination test involves
variables different from the ones in the guarding "if" condition.
> Also, I think you should provide some measurements supporting the
> need for this change -- i.e., verify that it does not increase
> compile time...
What is the accepted way of measuring compile time?
Thanks for your comments!
-- Robert
* Re: [PATCH] Fold more aggressively for trip counts
2007-01-05 1:18 ` Robert Kennedy
@ 2007-01-05 7:46 ` Zdenek Dvorak
2007-01-05 19:08 ` Robert Kennedy
0 siblings, 1 reply; 15+ messages in thread
From: Zdenek Dvorak @ 2007-01-05 7:46 UTC (permalink / raw)
To: Robert Kennedy; +Cc: Richard Guenther, gcc-patches
Hello,
> > I am not sure making tree_simplify_using_condition
> > more complex is a good idea -- for what kind of expressions do you
> > need this change?
>
> We discussed this a little bit privately in the past, but I don't
> think I made clear the connection between that discussion and this
> change. The situation is that you have a trip-countable loop guarded
> by an if condition, and after SR/LFTR, the loop termination test is
> replaced but the guard is not. As a result, slightly more
> sophistication is needed to compute the trip count for the loop while
> taking into account the guard.
>
> Example:
>
> for (i = 0; i < n; i++)
> body(i * 4);
>
> becomes:
>
> if (n > 0)
> {
> i = 0;
> do
> {
> body(i * 4);
> i++;
> }
> while (i < n);
> }
>
> In that form, the do {} while loop is trip-countable without my change
> because the (n > 0) guard proves easily that n cannot be negative
> inside the "then" block.
>
> But now with SR/LFTR, the code becomes:
>
> if (n > 0)
> {
> osrtemp.i = 0;
> osrtemp.n = n * 4;
> do
> {
> body(osrtemp.i);
> osrtemp.i += 4;
> }
> while (osrtemp.i < osrtemp.n);
> }
>
> If the trip count needs to be computed after such a transformation, my
> change is needed because now the loop termination test involves
> variables different from the ones in the guarding "if" condition.
perhaps making expand_simple_operations consider multiplication by a
constant to be simple would work as well, at least for this testcase?
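Roughly something like this inside expand_simple_operations (only a
sketch, overflow and type issues ignored; the variable names here do not
match the actual function body, E stands for the defining expression it
is looking through):

    /* Also treat multiplication by a constant as a "simple" operation
       and expand through it.  */
    if (TREE_CODE (e) == MULT_EXPR
        && TREE_CODE (TREE_OPERAND (e, 1)) == INTEGER_CST)
      return fold_build2 (MULT_EXPR, TREE_TYPE (e),
                          expand_simple_operations (TREE_OPERAND (e, 0)),
                          TREE_OPERAND (e, 1));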
> > Also, I think you should provide some measurements supporting the
> > need for this change -- i.e., verify that it does not increase
> > compile time...
>
> What is the accepted way of measuring compile time?
I usually measure how long it takes to compile preprocessed gcc sources
(although I would hardly claim that this is "the accepted way of
measuring compile time"),
Zdenek
* Re: [PATCH] Fold more aggressively for trip counts
2007-01-05 7:46 ` Zdenek Dvorak
@ 2007-01-05 19:08 ` Robert Kennedy
0 siblings, 0 replies; 15+ messages in thread
From: Robert Kennedy @ 2007-01-05 19:08 UTC (permalink / raw)
To: Zdenek Dvorak; +Cc: Robert Kennedy, Richard Guenther, gcc-patches
> > What is the accepted way of measuring compile time?
>
> I usually measure how long it takes to compile preprocessed gcc sources
> (although I would hardly claim that this is "the accepted way of
> measuring compile time"),
Any particular application you like to use the sources for? I suspect
you might say GCC, but I want to ask. Thanks.
-- Robert
* Re: [PATCH] Fold more aggressively for trip counts
2007-01-04 22:59 ` Zdenek Dvorak
2007-01-05 1:18 ` Robert Kennedy
@ 2007-01-10 0:05 ` Robert Kennedy
1 sibling, 0 replies; 15+ messages in thread
From: Robert Kennedy @ 2007-01-10 0:05 UTC (permalink / raw)
To: Zdenek Dvorak; +Cc: Robert Kennedy, Richard Guenther, gcc-patches, dannyb
Talking about the change I proposed to
tree_simplify_using_condition_1, Zdenek said:
> Also, I think you should provide some measurements supporting the
> need for this change -- i.e., verify that it does not increase
> compile time, and measure how many times this change is useful (for
> example, during gcc bootstrap).
I have collected this data. The difference in compile time on
preprocessed GCC sources is in the noise (measured delta in user CPU
time on an otherwise unloaded machine is ~0.3%):
whippen:/usr/jimbob/gcc-ppsrc> time ./timeit ~/downloads/prefix_with_change/bin/gcc
../../gcc/varasm.c: In function ‘const_rtx_hash_1’:
../../gcc/varasm.c:3083: warning: right shift count >= width of type
846.412u 9.956s 14:18.02 99.8% 0+0k 0+0io 0pf+0w
whippen:/usr/jimbob/gcc-ppsrc> time ./timeit ~/downloads/prefix_without_change/bin/gcc
../../gcc/varasm.c: In function ‘const_rtx_hash_1’:
../../gcc/varasm.c:3083: warning: right shift count >= width of type
843.652u 9.864s 14:16.05 99.7% 0+0k 0+0io 0pf+0w
Without OSR placed before loop optimizations like vectorization, etc.,
the number of times the change is useful during GCC bootstrap turns
out to be zero.
In light of that fact, maybe it's best if I hold onto the patch and
resubmit later if it ever becomes useful. Currently we are proposing
to do OSR late enough that this change is not needed. Anyone disagree
with NOT putting this change in now?
-- Robert