public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH, PR45098]
@ 2011-05-17  7:23 Tom de Vries
  2011-05-17  7:25 ` [PATCH, PR45098, 1/10] Tom de Vries
                   ` (9 more replies)
  0 siblings, 10 replies; 46+ messages in thread
From: Tom de Vries @ 2011-05-17  7:23 UTC (permalink / raw)
  To: Zdenek Dvorak; +Cc: gcc-patches

Hi Zdenek,

I have a patch set for for PR45098.

01_object-size-target.patch
02_pr45098-rtx-cost-set.patch
03_pr45098-computation-cost.patch
04_pr45098-iv-init-cost.patch
05_pr45098-bound-cost.patch
06_pr45098-bound-cost.test.patch
07_pr45098-nowrap-limits-iterations.patch
08_pr45098-nowrap-limits-iterations.test.patch
09_pr45098-shift-add-cost.patch
10_pr45098-shift-add-cost.test.patch

I will sent out the patches individually.

The patch set has been bootstrapped and reg-tested on x86_64, and
reg-tested on ARM.

The effect of the patch set on examples is the removal of 1 iterator,
demonstrated below for '-Os -mthumb -march=armv7-a' on example tr4.

tr4.c:
...
extern void foo2 (short*);
void tr4 (short array[], int n)
{
  int i;
  if (n > 0)
    for (i = 0; i < n; i++)
      foo2 (&array[i]);
}
...

tr4.s diff (left without, right with patch):
...
push    {r4, r5, r6, lr}          |     cmp     r1, #0
subs    r6, r1, #0                |     push    {r3, r4, r5, lr}
ble     .L1                             ble     .L1
mov     r5, r0                    |     mov     r4, r0
movs    r4, #0                    |     add     r5, r0, r1, lsl #1
.L3:                                    .L3:
mov     r0, r5                    |     mov     r0, r4
adds    r4, r4, #1                |     adds    r4, r4, #2
bl      foo2                            bl      foo2
adds    r5, r5, #2                |     cmp     r4, r5
cmp     r4, r6                    <
bne     .L3                             bne     .L3
.L1:                                    .L1:
pop     {r4, r5, r6, pc}          |     pop     {r3, r4, r5, pc}
...


The effect of the patch set on the test cases in terms of size is listed
in the following 2 tables.

---------------------------
-Os -thumb -mmarch=armv7-a
---------------------------
    without    with   delta
---------------------------
tr1      32      30      -2
tr2      36      36       0
tr3      32      30      -2
tr4      26      26       0
tr5      20      20       0
---------------------------

---------------------------
-Os -mmarch=armv7-a
---------------------------
    without    with   delta
---------------------------
tr1      60      52      -8
tr2      64      60      -4
tr3      60      52      -8
tr4      48      44      -4
tr5      36      32      -4
---------------------------


The size impact on several benchmarks is shown in the following table
(%, lower is better).

                     none            pic
                thumb1  thumb2  thumb1 thumb2
spec2000          99.9    99.9    99.9   99.9
eembc             99.9   100.0    99.9  100.1
dhrystone        100.0   100.0   100.0  100.0
coremark	  99.3    99.9    99.3  100.0

Thanks,
- Tom

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH, PR45098, 1/10]
  2011-05-17  7:23 [PATCH, PR45098] Tom de Vries
@ 2011-05-17  7:25 ` Tom de Vries
  2011-05-19 11:17   ` [PATCH PR45098, 1/10] Proc object-size fix Tom de Vries
  2011-05-17  8:12 ` [PATCH, PR45098, 2/10] Tom de Vries
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 46+ messages in thread
From: Tom de Vries @ 2011-05-17  7:25 UTC (permalink / raw)
  To: Zdenek Dvorak; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 539 bytes --]

On 05/17/2011 09:10 AM, Tom de Vries wrote:
> Hi Zdenek,
> 
> I have a patch set for for PR45098.
> 
> 01_object-size-target.patch
> 02_pr45098-rtx-cost-set.patch
> 03_pr45098-computation-cost.patch
> 04_pr45098-iv-init-cost.patch
> 05_pr45098-bound-cost.patch
> 06_pr45098-bound-cost.test.patch
> 07_pr45098-nowrap-limits-iterations.patch
> 08_pr45098-nowrap-limits-iterations.test.patch
> 09_pr45098-shift-add-cost.patch
> 10_pr45098-shift-add-cost.test.patch
> 
> I will sent out the patches individually.

OK for trunk?

Thanks,
- Tom

[-- Attachment #2: 01_object-size-target.patch --]
[-- Type: text/x-patch, Size: 589 bytes --]

2011-05-05  Tom de Vries  <tom@codesourcery.com>

	* lib/lib/scanasm.exp (object-size): Fix target selector handling.

Index: gcc/testsuite/lib/scanasm.exp
===================================================================
--- gcc/testsuite/lib/scanasm.exp (revision 173734)
+++ gcc/testsuite/lib/scanasm.exp (working copy)
@@ -330,7 +330,7 @@ proc object-size { args } {
 	return
     }
     if { [llength $args] >= 4 } {
-	switch [dg-process-target [lindex $args 1]] {
+	switch [dg-process-target [lindex $args 3]] {
 	    "S" { }
 	    "N" { return }
 	    "F" { setup_xfail "*-*-*" }

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH, PR45098, 2/10]
  2011-05-17  7:23 [PATCH, PR45098] Tom de Vries
  2011-05-17  7:25 ` [PATCH, PR45098, 1/10] Tom de Vries
@ 2011-05-17  8:12 ` Tom de Vries
  2011-05-18  9:39   ` Zdenek Dvorak
  2011-05-17  8:30 ` [PATCH, PR45098, 3/10] Tom de Vries
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 46+ messages in thread
From: Tom de Vries @ 2011-05-17  8:12 UTC (permalink / raw)
  To: Zdenek Dvorak; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 495 bytes --]

On 05/17/2011 09:10 AM, Tom de Vries wrote:
> Hi Zdenek,
> 
> I have a patch set for for PR45098.
> 
> 01_object-size-target.patch
> 02_pr45098-rtx-cost-set.patch
> 03_pr45098-computation-cost.patch
> 04_pr45098-iv-init-cost.patch
> 05_pr45098-bound-cost.patch
> 06_pr45098-bound-cost.test.patch
> 07_pr45098-nowrap-limits-iterations.patch
> 08_pr45098-nowrap-limits-iterations.test.patch
> 09_pr45098-shift-add-cost.patch
> 10_pr45098-shift-add-cost.test.patch
> 

OK for trunk?

Thanks,
- Tom

[-- Attachment #2: 02_pr45098-rtx-cost-set.patch --]
[-- Type: text/x-patch, Size: 546 bytes --]

2011-05-05  Tom de Vries  <tom@codesourcery.com>

	PR target/45098
	* tree-ssa-loop-ivopts.c (seq_cost): Fix call to rtx_cost.

Index: gcc/tree-ssa-loop-ivopts.c
===================================================================
--- gcc/tree-ssa-loop-ivopts.c	(revision 173380)
+++ gcc/tree-ssa-loop-ivopts.c	(working copy)
@@ -2745,7 +2745,7 @@ seq_cost (rtx seq, bool speed)
     {
       set = single_set (seq);
       if (set)
-	cost += rtx_cost (set, SET,speed);
+	cost += rtx_cost (SET_SRC (set), SET, speed);
       else
 	cost++;
     }

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH, PR45098, 3/10]
  2011-05-17  7:23 [PATCH, PR45098] Tom de Vries
  2011-05-17  7:25 ` [PATCH, PR45098, 1/10] Tom de Vries
  2011-05-17  8:12 ` [PATCH, PR45098, 2/10] Tom de Vries
@ 2011-05-17  8:30 ` Tom de Vries
  2011-05-18 10:10   ` Zdenek Dvorak
  2011-05-17  8:32 ` [PATCH, PR45098, 4/10] Tom de Vries
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 46+ messages in thread
From: Tom de Vries @ 2011-05-17  8:30 UTC (permalink / raw)
  To: Zdenek Dvorak; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 542 bytes --]

On 05/17/2011 09:10 AM, Tom de Vries wrote:
> Hi Zdenek,
> 
> I have a patch set for for PR45098.
> 
> 01_object-size-target.patch
> 02_pr45098-rtx-cost-set.patch
> 03_pr45098-computation-cost.patch
> 04_pr45098-iv-init-cost.patch
> 05_pr45098-bound-cost.patch
> 06_pr45098-bound-cost.test.patch
> 07_pr45098-nowrap-limits-iterations.patch
> 08_pr45098-nowrap-limits-iterations.test.patch
> 09_pr45098-shift-add-cost.patch
> 10_pr45098-shift-add-cost.test.patch
> 
> I will sent out the patches individually.
> 

OK for trunk?

Thanks,
- Tom

[-- Attachment #2: 03_pr45098-computation-cost.patch --]
[-- Type: text/x-patch, Size: 727 bytes --]

2011-05-05  Tom de Vries  <tom@codesourcery.com>

	PR target/45098
	* tree-ssa-loop-ivopts.c (computation_cost): Prevent cost of 0.

Index: gcc/tree-ssa-loop-ivopts.c
===================================================================
--- gcc/tree-ssa-loop-ivopts.c	(revision 173380)
+++ gcc/tree-ssa-loop-ivopts.c	(working copy)
@@ -2862,7 +2862,9 @@ computation_cost (tree expr, bool speed)
   default_rtl_profile ();
   node->frequency = real_frequency;
 
-  cost = seq_cost (seq, speed);
+  cost = (seq != NULL_RTX
+          ? seq_cost (seq, speed)
+          : (unsigned)rtx_cost (rslt, SET, speed));
   if (MEM_P (rslt))
     cost += address_cost (XEXP (rslt, 0), TYPE_MODE (type),
 			  TYPE_ADDR_SPACE (type), speed);

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH, PR45098, 4/10]
  2011-05-17  7:23 [PATCH, PR45098] Tom de Vries
                   ` (2 preceding siblings ...)
  2011-05-17  8:30 ` [PATCH, PR45098, 3/10] Tom de Vries
@ 2011-05-17  8:32 ` Tom de Vries
  2011-05-18 17:46   ` [PATCH PR45098, 4/10] Iv init cost Tom de Vries
  2011-05-17  8:37 ` [PATCH, PR45098, 5/10] Tom de Vries
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 46+ messages in thread
From: Tom de Vries @ 2011-05-17  8:32 UTC (permalink / raw)
  To: Zdenek Dvorak; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 542 bytes --]

On 05/17/2011 09:10 AM, Tom de Vries wrote:
> Hi Zdenek,
> 
> I have a patch set for for PR45098.
> 
> 01_object-size-target.patch
> 02_pr45098-rtx-cost-set.patch
> 03_pr45098-computation-cost.patch
> 04_pr45098-iv-init-cost.patch
> 05_pr45098-bound-cost.patch
> 06_pr45098-bound-cost.test.patch
> 07_pr45098-nowrap-limits-iterations.patch
> 08_pr45098-nowrap-limits-iterations.test.patch
> 09_pr45098-shift-add-cost.patch
> 10_pr45098-shift-add-cost.test.patch
> 
> I will sent out the patches individually.
> 

OK for trunk?

Thanks,
- Tom

[-- Attachment #2: 04_pr45098-iv-init-cost.patch --]
[-- Type: text/x-patch, Size: 672 bytes --]

2011-05-05  Tom de Vries  <tom@codesourcery.com>

	* tree-ssa-loop-ivopts.c (determine_iv_cost): Prevent
	cost_base.cost == 0.

Index: gcc/tree-ssa-loop-ivopts.c
===================================================================
--- gcc/tree-ssa-loop-ivopts.c	(revision 173380)
+++ gcc/tree-ssa-loop-ivopts.c	(working copy)
@@ -4688,6 +4688,8 @@ determine_iv_cost (struct ivopts_data *d
 
   base = cand->iv->base;
   cost_base = force_var_cost (data, base, NULL);
+  if (cost_base.cost == 0)
+      cost_base.cost = COSTS_N_INSNS (1);
   cost_step = add_cost (TYPE_MODE (TREE_TYPE (base)), data->speed);
 
   cost = cost_step + adjust_setup_cost (data, cost_base.cost);

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH, PR45098, 5/10]
  2011-05-17  7:23 [PATCH, PR45098] Tom de Vries
                   ` (3 preceding siblings ...)
  2011-05-17  8:32 ` [PATCH, PR45098, 4/10] Tom de Vries
@ 2011-05-17  8:37 ` Tom de Vries
  2011-05-18 17:48   ` [PATCH PR45098, 5/10] Bound cost Tom de Vries
  2011-05-17  8:42 ` [PATCH, PR45098, 6/10] Tom de Vries
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 46+ messages in thread
From: Tom de Vries @ 2011-05-17  8:37 UTC (permalink / raw)
  To: Zdenek Dvorak; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 542 bytes --]

On 05/17/2011 09:10 AM, Tom de Vries wrote:
> Hi Zdenek,
> 
> I have a patch set for for PR45098.
> 
> 01_object-size-target.patch
> 02_pr45098-rtx-cost-set.patch
> 03_pr45098-computation-cost.patch
> 04_pr45098-iv-init-cost.patch
> 05_pr45098-bound-cost.patch
> 06_pr45098-bound-cost.test.patch
> 07_pr45098-nowrap-limits-iterations.patch
> 08_pr45098-nowrap-limits-iterations.test.patch
> 09_pr45098-shift-add-cost.patch
> 10_pr45098-shift-add-cost.test.patch
> 
> I will sent out the patches individually.
> 

OK for trunk?

Thanks,
- Tom

[-- Attachment #2: 05_pr45098-bound-cost.patch --]
[-- Type: text/x-patch, Size: 5912 bytes --]

2011-05-05  Tom de Vries  <tom@codesourcery.com>

	PR target/45098
	* tree-ssa-loop-ivopts.c (get_expr_id): Factored new function out of
	get_loop_invariant_expr_id.
	(get_loop_invariant_expr_id): Use get_expr_id.
	(parm_decl_cost): New function.
	(determine_use_iv_cost_condition): Use get_expr_id and parm_decl_cost.
	Improve bound cost estimation.  Use different inv_expr_id for elim and
	express cases.

Index: gcc/tree-ssa-loop-ivopts.c
===================================================================
--- gcc/tree-ssa-loop-ivopts.c	(revision 173380)
+++ gcc/tree-ssa-loop-ivopts.c	(working copy)
@@ -3835,6 +3835,28 @@ compare_aff_trees (aff_tree *aff1, aff_t
   return true;
 }
 
+/* Stores EXPR in DATA->inv_expr_tab, and assigns it an inv_expr_id.  */
+
+static int
+get_expr_id (struct ivopts_data *data, tree expr)
+{
+  struct iv_inv_expr_ent ent;
+  struct iv_inv_expr_ent **slot;
+
+  ent.expr = expr;
+  ent.hash = iterative_hash_expr (expr, 0);
+  slot = (struct iv_inv_expr_ent **) htab_find_slot (data->inv_expr_tab,
+                                                     &ent, INSERT);
+  if (*slot)
+    return (*slot)->id;
+
+  *slot = XNEW (struct iv_inv_expr_ent);
+  (*slot)->expr = expr;
+  (*slot)->hash = ent.hash;
+  (*slot)->id = data->inv_expr_id++;
+  return (*slot)->id;
+}
+
 /* Returns the pseudo expr id if expression UBASE - RATIO * CBASE
    requires a new compiler generated temporary.  Returns -1 otherwise.
    ADDRESS_P is a flag indicating if the expression is for address
@@ -3847,8 +3869,6 @@ get_loop_invariant_expr_id (struct ivopt
 {
   aff_tree ubase_aff, cbase_aff;
   tree expr, ub, cb;
-  struct iv_inv_expr_ent ent;
-  struct iv_inv_expr_ent **slot;
 
   STRIP_NOPS (ubase);
   STRIP_NOPS (cbase);
@@ -3936,18 +3956,7 @@ get_loop_invariant_expr_id (struct ivopt
   aff_combination_scale (&cbase_aff, shwi_to_double_int (-1 * ratio));
   aff_combination_add (&ubase_aff, &cbase_aff);
   expr = aff_combination_to_tree (&ubase_aff);
-  ent.expr = expr;
-  ent.hash = iterative_hash_expr (expr, 0);
-  slot = (struct iv_inv_expr_ent **) htab_find_slot (data->inv_expr_tab,
-                                                     &ent, INSERT);
-  if (*slot)
-    return (*slot)->id;
-
-  *slot = XNEW (struct iv_inv_expr_ent);
-  (*slot)->expr = expr;
-  (*slot)->hash = ent.hash;
-  (*slot)->id = data->inv_expr_id++;
-  return  (*slot)->id;
+  return get_expr_id (data, expr);
 }
 
 
@@ -4412,6 +4421,23 @@ may_eliminate_iv (struct ivopts_data *da
   return true;
 }
 
+ /* Calculates the cost of BOUND, if it is a PARM_DECL.  A PARM_DECL must
+    be copied, if is is used in the loop body and DATA->body_includes_call.  */
+
+static int
+parm_decl_cost (struct ivopts_data *data, tree bound)
+{
+  tree sbound = bound;
+  STRIP_NOPS (sbound);
+
+  if (TREE_CODE (sbound) == SSA_NAME
+      && TREE_CODE (SSA_NAME_VAR (sbound)) == PARM_DECL
+      && gimple_nop_p (SSA_NAME_DEF_STMT (sbound))
+      && data->body_includes_call)
+    return COSTS_N_INSNS (1);
+
+  return 0;
+}
 
 /* Determines cost of basing replacement of USE on CAND in a condition.  */
 
@@ -4422,9 +4448,9 @@ determine_use_iv_cost_condition (struct 
   tree bound = NULL_TREE;
   struct iv *cmp_iv;
   bitmap depends_on_elim = NULL, depends_on_express = NULL, depends_on;
-  comp_cost elim_cost, express_cost, cost;
+  comp_cost elim_cost, express_cost, cost, bound_cost;
   bool ok;
-  int inv_expr_id = -1;
+  int elim_inv_expr_id = -1, express_inv_expr_id = -1, inv_expr_id;
   tree *control_var, *bound_cst;
 
   /* Only consider real candidates.  */
@@ -4438,6 +4464,21 @@ determine_use_iv_cost_condition (struct 
   if (may_eliminate_iv (data, use, cand, &bound))
     {
       elim_cost = force_var_cost (data, bound, &depends_on_elim);
+      if (elim_cost.cost == 0)
+        elim_cost.cost = parm_decl_cost (data, bound);
+      else if (TREE_CODE (bound) == INTEGER_CST)
+        elim_cost.cost = 0;
+      /* If we replace a loop condition 'i < n' with 'p < base + n',
+	 depends_on_elim will have 'base' and 'n' set, which implies
+	 that both 'base' and 'n' will be live during the loop.	 More likely,
+	 'base + n' will be loop invariant, resulting in only one live value
+	 during the loop.  So in that case we clear depends_on_elim and set
+        elim_inv_expr_id instead.  */
+      if (depends_on_elim && bitmap_count_bits (depends_on_elim) > 1)
+	{
+	  elim_inv_expr_id = get_expr_id (data, bound);
+	  bitmap_clear (depends_on_elim);
+	}
       /* The bound is a loop invariant, so it will be only computed
 	 once.  */
       elim_cost.cost = adjust_setup_cost (data, elim_cost.cost);
@@ -4465,16 +4506,25 @@ determine_use_iv_cost_condition (struct 
 
   express_cost = get_computation_cost (data, use, cand, false,
 				       &depends_on_express, NULL,
-                                       &inv_expr_id);
+                                       &express_inv_expr_id);
   fd_ivopts_data = data;
   walk_tree (&cmp_iv->base, find_depends, &depends_on_express, NULL);
 
+  /* Count the cost of the original bound as well.  */
+  bound_cost = force_var_cost (data, *bound_cst, NULL);
+  if (bound_cost.cost == 0)
+    bound_cost.cost = parm_decl_cost (data, *bound_cst);
+  else if (TREE_CODE (*bound_cst) == INTEGER_CST)
+    bound_cost.cost = 0;
+  express_cost.cost += bound_cost.cost;
+
   /* Choose the better approach, preferring the eliminated IV. */
   if (compare_costs (elim_cost, express_cost) <= 0)
     {
       cost = elim_cost;
       depends_on = depends_on_elim;
       depends_on_elim = NULL;
+      inv_expr_id = elim_inv_expr_id;
     }
   else
     {
@@ -4482,6 +4532,7 @@ determine_use_iv_cost_condition (struct 
       depends_on = depends_on_express;
       depends_on_express = NULL;
       bound = NULL_TREE;
+      inv_expr_id = express_inv_expr_id;
     }
 
   set_use_iv_cost (data, use, cand, cost, depends_on, bound, inv_expr_id);


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH, PR45098, 6/10]
  2011-05-17  7:23 [PATCH, PR45098] Tom de Vries
                   ` (4 preceding siblings ...)
  2011-05-17  8:37 ` [PATCH, PR45098, 5/10] Tom de Vries
@ 2011-05-17  8:42 ` Tom de Vries
  2011-05-18 17:48   ` [PATCH PR45098, 6/10] Bound cost - test cases Tom de Vries
  2011-05-17  8:58 ` [PATCH, PR45098, 7/10] Tom de Vries
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 46+ messages in thread
From: Tom de Vries @ 2011-05-17  8:42 UTC (permalink / raw)
  To: Zdenek Dvorak; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 542 bytes --]

On 05/17/2011 09:10 AM, Tom de Vries wrote:
> Hi Zdenek,
> 
> I have a patch set for for PR45098.
> 
> 01_object-size-target.patch
> 02_pr45098-rtx-cost-set.patch
> 03_pr45098-computation-cost.patch
> 04_pr45098-iv-init-cost.patch
> 05_pr45098-bound-cost.patch
> 06_pr45098-bound-cost.test.patch
> 07_pr45098-nowrap-limits-iterations.patch
> 08_pr45098-nowrap-limits-iterations.test.patch
> 09_pr45098-shift-add-cost.patch
> 10_pr45098-shift-add-cost.test.patch
> 
> I will sent out the patches individually.
> 

OK for trunk?

Thanks,
- Tom

[-- Attachment #2: 06_pr45098-bound-cost.test.patch --]
[-- Type: text/x-patch, Size: 1466 bytes --]

2011-05-05  Tom de Vries  <tom@codesourcery.com>

	PR target/45098
	* gcc.target/arm/ivopts.c: New test.
	* gcc.target/arm/ivopts-2.c: New test.

Index: gcc/testsuite/gcc.target/arm/ivopts-2.c
===================================================================
--- /dev/null (new file)
+++ gcc/testsuite/gcc.target/arm/ivopts-2.c (revision 0)
@@ -0,0 +1,18 @@
+/* { dg-do assemble } */
+/* { dg-options "-Os -mthumb -fdump-tree-ivopts -save-temps" } */
+
+extern void foo2 (short*);
+
+void
+tr4 (short array[], int n)
+{
+  int x;
+  if (n > 0)
+    for (x = 0; x < n; x++)
+      foo2 (&array[x]);
+}
+
+/* { dg-final { scan-tree-dump-times "PHI <ivtmp" 1 "ivopts"} } */
+/* { dg-final { scan-tree-dump-times "PHI <" 1 "ivopts"} } */
+/* { dg-final { object-size text <= 26 { target arm_thumb2_ok } } } */
+/* { dg-final { cleanup-tree-dump "ivopts" } } */
Index: gcc/testsuite/gcc.target/arm/ivopts.c
===================================================================
--- /dev/null (new file)
+++ gcc/testsuite/gcc.target/arm/ivopts.c (revision 0)
@@ -0,0 +1,15 @@
+/* { dg-do assemble } */
+/* { dg-options "-Os -mthumb -fdump-tree-ivopts -save-temps" } */
+
+void
+tr5 (short array[], int n)
+{
+  int x;
+  if (n > 0)
+    for (x = 0; x < n; x++)
+      array[x] = 0;
+}
+
+/* { dg-final { scan-tree-dump-times "PHI <" 1 "ivopts"} } */
+/* { dg-final { object-size text <= 20 { target arm_thumb2_ok } } } */
+/* { dg-final { cleanup-tree-dump "ivopts" } } */

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH, PR45098, 7/10]
  2011-05-17  7:23 [PATCH, PR45098] Tom de Vries
                   ` (5 preceding siblings ...)
  2011-05-17  8:42 ` [PATCH, PR45098, 6/10] Tom de Vries
@ 2011-05-17  8:58 ` Tom de Vries
  2011-05-18 17:52   ` [PATCH PR45098, 7/10] Nowrap limits iterations Tom de Vries
  2011-05-17  9:03 ` [PATCH, PR45098, 8/10] Tom de Vries
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 46+ messages in thread
From: Tom de Vries @ 2011-05-17  8:58 UTC (permalink / raw)
  To: Zdenek Dvorak; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 542 bytes --]

On 05/17/2011 09:10 AM, Tom de Vries wrote:
> Hi Zdenek,
> 
> I have a patch set for for PR45098.
> 
> 01_object-size-target.patch
> 02_pr45098-rtx-cost-set.patch
> 03_pr45098-computation-cost.patch
> 04_pr45098-iv-init-cost.patch
> 05_pr45098-bound-cost.patch
> 06_pr45098-bound-cost.test.patch
> 07_pr45098-nowrap-limits-iterations.patch
> 08_pr45098-nowrap-limits-iterations.test.patch
> 09_pr45098-shift-add-cost.patch
> 10_pr45098-shift-add-cost.test.patch
> 
> I will sent out the patches individually.
> 

OK for trunk?

Thanks,
- Tom

[-- Attachment #2: 07_pr45098-nowrap-limits-iterations.patch --]
[-- Type: text/x-patch, Size: 4827 bytes --]

2011-05-05  Tom de Vries  <tom@codesourcery.com>

	PR target/45098
	* tree-ssa-loop-ivopts.c (struct ivopts_data): Add fields
	max_iterations_p and max_iterations.
	(is_nonwrap_use, max_loop_iterations, set_max_iterations): New function.
	(may_eliminate_iv): Use max_iterations_p and max_iterations.
	(tree_ssa_iv_optimize_loop): Use set_max_iterations.

Index: gcc/tree-ssa-loop-ivopts.c
===================================================================
--- gcc/tree-ssa-loop-ivopts.c (revision 173355)
+++ gcc/tree-ssa-loop-ivopts.c (working copy)
@@ -291,6 +291,12 @@ struct ivopts_data
 
   /* Whether the loop body includes any function calls.  */
   bool body_includes_call;
+
+  /* Whether max_iterations is valid.  */
+  bool max_iterations_p;
+
+  /* Maximum number of iterations of current_loop.  */
+  double_int max_iterations;
 };
 
 /* An assignment of iv candidates to uses.  */
@@ -4319,6 +4325,108 @@ iv_elimination_compare (struct ivopts_da
   return (exit->flags & EDGE_TRUE_VALUE ? EQ_EXPR : NE_EXPR);
 }
 
+/* Determine if USE contains non-wrapping arithmetic.  */
+
+static bool
+is_nonwrap_use (struct ivopts_data *data, struct iv_use *use)
+{
+  gimple stmt = use->stmt;
+  tree var, ptr, ptr_type;
+
+  if (!is_gimple_assign (stmt))
+    return false;
+
+  switch (gimple_assign_rhs_code (stmt))
+    {
+    case POINTER_PLUS_EXPR:
+      ptr = gimple_assign_rhs1 (stmt);
+      ptr_type = TREE_TYPE (ptr);
+      var = gimple_assign_rhs2 (stmt);
+      if (!expr_invariant_in_loop_p (data->current_loop, ptr))
+        return false;
+      break;
+    case ARRAY_REF:
+      ptr = TREE_OPERAND ((gimple_assign_rhs1 (stmt)), 0);
+      ptr_type = build_pointer_type (TREE_TYPE (gimple_assign_rhs1 (stmt)));
+      var = TREE_OPERAND ((gimple_assign_rhs1 (stmt)), 1);
+      break;
+    default:
+      return false;
+    }
+
+  if (!nowrap_type_p (ptr_type))
+    return false;
+
+  if (TYPE_PRECISION (ptr_type) != TYPE_PRECISION (TREE_TYPE (var)))
+    return false;
+
+  return true;
+}
+
+/* Attempt to infer maximum number of loop iterations of DATA->current_loop
+   from uses in loop containing non-wrapping arithmetic.  If successful,
+   return true, and return maximum iterations in MAX_NITER.  */
+
+static bool
+max_loop_iterations (struct ivopts_data *data, double_int *max_niter)
+{
+  struct iv_use *use;
+  struct iv *iv;
+  bool found = false;
+  double_int period;
+  gimple stmt;
+  unsigned i;
+
+  for (i = 0; i < n_iv_uses (data); i++)
+    {
+      use = iv_use (data, i);
+
+      stmt = use->stmt;
+      if (!just_once_each_iteration_p (data->current_loop, gimple_bb (stmt)))
+	continue;
+
+      if (!is_nonwrap_use (data, use))
+        continue;
+
+      iv = use->iv;
+      if (iv->step == NULL_TREE || TREE_CODE (iv->step) != INTEGER_CST)
+	continue;
+      period = tree_to_double_int (iv_period (iv));
+
+      if (found)
+        *max_niter = double_int_umin (*max_niter, period);
+      else
+        {
+          found = true;
+          *max_niter = period;
+        }
+    }
+
+  return found;
+}
+
+/* Initializes DATA->max_iterations and DATA->max_iterations_p.  */
+
+static void
+set_max_iterations (struct ivopts_data *data)
+{
+  double_int max_niter, max_niter2;
+  bool estimate1, estimate2;
+
+  data->max_iterations_p = false;
+  estimate1 = estimated_loop_iterations (data->current_loop, true, &max_niter);
+  estimate2 = max_loop_iterations (data, &max_niter2);
+  if (!(estimate1 || estimate2))
+    return;
+  if (estimate1 && estimate2)
+    data->max_iterations = double_int_umin (max_niter, max_niter2);
+  else if (estimate1)
+    data->max_iterations = max_niter;
+  else
+    data->max_iterations = max_niter2;
+  data->max_iterations_p = true;
+}
+
 /* Check whether it is possible to express the condition in USE by comparison
    of candidate CAND.  If so, store the value compared with to BOUND.  */
 
@@ -4391,10 +4499,10 @@ may_eliminate_iv (struct ivopts_data *da
           /* See if we can take advantage of infered loop bound information.  */
           if (loop_only_exit_p (loop, exit))
             {
-              if (!estimated_loop_iterations (loop, true, &max_niter))
+              if (!data->max_iterations_p)
                 return false;
               /* The loop bound is already adjusted by adding 1.  */
-              if (double_int_ucmp (max_niter, period_value) > 0)
+              if (double_int_ucmp (data->max_iterations, period_value) > 0)
                 return false;
             }
           else
@@ -6390,6 +6498,8 @@ tree_ssa_iv_optimize_loop (struct ivopts
   /* Finds candidates for the induction variables (item 2).  */
   find_iv_candidates (data);
 
+  set_max_iterations (data);
+
   /* Calculates the costs (item 3, part 1).  */
   determine_iv_costs (data);
   determine_use_iv_costs (data);

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH, PR45098,  8/10]
  2011-05-17  7:23 [PATCH, PR45098] Tom de Vries
                   ` (6 preceding siblings ...)
  2011-05-17  8:58 ` [PATCH, PR45098, 7/10] Tom de Vries
@ 2011-05-17  9:03 ` Tom de Vries
  2011-05-18 18:23   ` [PATCH PR45098, 8/10] Nowrap limits iterations - test cases Tom de Vries
  2011-05-18 18:27   ` [PATCH PR45098, 9/10] Cheap shift-add Tom de Vries
  2011-05-17 10:03 ` [PATCH, PR45098, 9/10] Tom de Vries
  2011-05-17 10:30 ` [PATCH, PR45098, 10/10] Tom de Vries
  9 siblings, 2 replies; 46+ messages in thread
From: Tom de Vries @ 2011-05-17  9:03 UTC (permalink / raw)
  To: Zdenek Dvorak; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 542 bytes --]

On 05/17/2011 09:10 AM, Tom de Vries wrote:
> Hi Zdenek,
> 
> I have a patch set for for PR45098.
> 
> 01_object-size-target.patch
> 02_pr45098-rtx-cost-set.patch
> 03_pr45098-computation-cost.patch
> 04_pr45098-iv-init-cost.patch
> 05_pr45098-bound-cost.patch
> 06_pr45098-bound-cost.test.patch
> 07_pr45098-nowrap-limits-iterations.patch
> 08_pr45098-nowrap-limits-iterations.test.patch
> 09_pr45098-shift-add-cost.patch
> 10_pr45098-shift-add-cost.test.patch
> 
> I will sent out the patches individually.
> 

OK for trunk?

Thanks,
- Tom

[-- Attachment #2: 08_pr45098-nowrap-limits-iterations.test.patch --]
[-- Type: text/x-patch, Size: 3625 bytes --]

2011-05-05  Tom de Vries  <tom@codesourcery.com>

	PR target/45098
	* gcc.target/arm/ivopts-3.c: New test.
	* gcc.target/arm/ivopts-4.c: New test.
	* gcc.target/arm/ivopts-5.c: New test.
	* gcc.dg/tree-ssa/ivopt_infer_2.c: Adapt test.

Index: gcc/testsuite/gcc.target/arm/ivopts-3.c
===================================================================
--- /dev/null (new file)
+++ gcc/testsuite/gcc.target/arm/ivopts-3.c (revision 0)
@@ -0,0 +1,20 @@
+/* { dg-do assemble } */
+/* { dg-options "-Os -mthumb -fdump-tree-ivopts -save-temps" } */
+
+extern unsigned int foo2 (short*) __attribute__((pure));
+
+unsigned int
+tr3 (short array[], unsigned int n)
+{
+  unsigned sum = 0;
+  unsigned int x;
+  for (x = 0; x < n; x++)
+    sum += foo2 (&array[x]);
+  return sum;
+}
+
+/* { dg-final { scan-tree-dump-times "PHI <ivtmp" 1 "ivopts"} } */
+/* { dg-final { scan-tree-dump-times "PHI <x" 0 "ivopts"} } */
+/* { dg-final { scan-tree-dump-times ", x" 0 "ivopts"} } */
+/* { dg-final { object-size text <= 30 { target arm_thumb2_ok } } } */
+/* { dg-final { cleanup-tree-dump "ivopts" } } */
Index: gcc/testsuite/gcc.target/arm/ivopts-4.c
===================================================================
--- /dev/null (new file)
+++ gcc/testsuite/gcc.target/arm/ivopts-4.c (revision 0)
@@ -0,0 +1,21 @@
+/* { dg-do assemble } */
+/* { dg-options "-mthumb -Os -fdump-tree-ivopts -save-temps" } */
+
+extern unsigned int foo (int*) __attribute__((pure));
+
+unsigned int
+tr2 (int array[], int n)
+{
+  unsigned int sum = 0;
+  int x;
+  if (n > 0)
+    for (x = 0; x < n; x++)
+      sum += foo (&array[x]);
+  return sum;
+}
+
+/* { dg-final { scan-tree-dump-times "PHI <ivtmp" 1 "ivopts"} } */
+/* { dg-final { scan-tree-dump-times "PHI <x" 0 "ivopts"} } */
+/* { dg-final { scan-tree-dump-times ", x" 0 "ivopts"} } */
+/* { dg-final { object-size text <= 36 { target arm_thumb2_ok } } } */
+/* { dg-final { cleanup-tree-dump "ivopts" } } */
Index: gcc/testsuite/gcc.target/arm/ivopts-5.c
===================================================================
--- /dev/null (new file)
+++ gcc/testsuite/gcc.target/arm/ivopts-5.c (revision 0)
@@ -0,0 +1,20 @@
+/* { dg-do assemble } */
+/* { dg-options "-Os -mthumb -fdump-tree-ivopts -save-temps" } */
+
+extern unsigned int foo (int*) __attribute__((pure));
+
+unsigned int
+tr1 (int array[], unsigned int n)
+{
+  unsigned int sum = 0;
+  unsigned int x;
+  for (x = 0; x < n; x++)
+    sum += foo (&array[x]);
+  return sum;
+}
+
+/* { dg-final { scan-tree-dump-times "PHI <ivtmp" 1 "ivopts"} } */
+/* { dg-final { scan-tree-dump-times "PHI <x" 0 "ivopts"} } */
+/* { dg-final { scan-tree-dump-times ", x" 0 "ivopts"} } */
+/* { dg-final { object-size text <= 30 { target arm_thumb2_ok } } } */
+/* { dg-final { cleanup-tree-dump "ivopts" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/ivopt_infer_2.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/ivopt_infer_2.c	(revision 173380)
+++ gcc/testsuite/gcc.dg/tree-ssa/ivopt_infer_2.c	(working copy)
@@ -7,7 +7,8 @@
 
 extern int a[];
 
-/* Can not infer loop iteration from array -- exit test can not be replaced.  */
+/* Can infer loop iteration from nonwrapping pointer arithmetic.
+   exit test can be replaced.  */
 void foo (int i_width, TYPE dst, TYPE src1, TYPE src2)
 {
       TYPE dstn= dst + i_width;
@@ -21,5 +22,5 @@ void foo (int i_width, TYPE dst, TYPE sr
        }
 }
 
-/* { dg-final { scan-tree-dump-times "Replacing" 0 "ivopts"} } */
+/* { dg-final { scan-tree-dump-times "Replacing" 1 "ivopts"} } */
 /* { dg-final { cleanup-tree-dump "ivopts" } } */

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH, PR45098, 9/10]
  2011-05-17  7:23 [PATCH, PR45098] Tom de Vries
                   ` (7 preceding siblings ...)
  2011-05-17  9:03 ` [PATCH, PR45098, 8/10] Tom de Vries
@ 2011-05-17 10:03 ` Tom de Vries
  2011-05-17 10:30 ` [PATCH, PR45098, 10/10] Tom de Vries
  9 siblings, 0 replies; 46+ messages in thread
From: Tom de Vries @ 2011-05-17 10:03 UTC (permalink / raw)
  To: Zdenek Dvorak; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 542 bytes --]

On 05/17/2011 09:10 AM, Tom de Vries wrote:
> Hi Zdenek,
> 
> I have a patch set for for PR45098.
> 
> 01_object-size-target.patch
> 02_pr45098-rtx-cost-set.patch
> 03_pr45098-computation-cost.patch
> 04_pr45098-iv-init-cost.patch
> 05_pr45098-bound-cost.patch
> 06_pr45098-bound-cost.test.patch
> 07_pr45098-nowrap-limits-iterations.patch
> 08_pr45098-nowrap-limits-iterations.test.patch
> 09_pr45098-shift-add-cost.patch
> 10_pr45098-shift-add-cost.test.patch
> 
> I will sent out the patches individually.
> 

OK for trunk?

Thanks,
- Tom

[-- Attachment #2: 09_pr45098-shift-add-cost.patch --]
[-- Type: text/x-patch, Size: 2834 bytes --]

2011-05-05  Tom de Vries  <tom@codesourcery.com>

	PR target/45098
	* tree-ssa-loop-ivopts.c: Include expmed.h.
	(get_shiftadd_cost): New function.
	(force_expr_to_var_cost): Use get_shiftadd_cost.

Index: gcc/tree-ssa-loop-ivopts.c
===================================================================
--- gcc/tree-ssa-loop-ivopts.c	(revision 173380)
+++ gcc/tree-ssa-loop-ivopts.c	(working copy)
@@ -92,6 +92,12 @@ along with GCC; see the file COPYING3.  
 #include "tree-inline.h"
 #include "tree-ssa-propagate.h"
 
+/* FIXME: add_cost and zero_cost defined in exprmed.h conflict with local uses.
+ */
+#include "expmed.h"
+#undef add_cost
+#undef zero_cost
+
 /* FIXME: Expressions are expanded to RTL in this pass to determine the
    cost of different addressing modes.  This should be moved to a TBD
    interface between the GIMPLE and RTL worlds.  */
@@ -3504,6 +3510,37 @@ get_address_cost (bool symbol_present, b
   return new_cost (cost + acost, complexity);
 }
 
+ /* Calculate the SPEED or size cost of shiftadd EXPR in MODE.  MULT is the
+    the EXPR operand holding the shift.  COST0 and COST1 are the costs for
+    calculating the operands of EXPR.  Returns true if successful, and returns
+    the cost in COST.  */
+
+static bool
+get_shiftadd_cost (tree expr, enum machine_mode mode, comp_cost cost0,
+                   comp_cost cost1, tree mult, bool speed, comp_cost *cost)
+{
+  comp_cost res;
+  tree op1 = TREE_OPERAND (expr, 1);
+  tree cst = TREE_OPERAND (mult, 1);
+  int m = exact_log2 (int_cst_value (cst));
+  int maxm = MIN (BITS_PER_WORD, GET_MODE_BITSIZE (mode));
+  int sa_cost;
+
+  if (!(m >= 0 && m < maxm))
+    return false;
+
+  sa_cost = (TREE_CODE (expr) != MINUS_EXPR
+             ? shiftadd_cost[speed][mode][m]
+             : (mult == op1
+                ? shiftsub1_cost[speed][mode][m]
+                : shiftsub0_cost[speed][mode][m]));
+  res = new_cost (sa_cost, 0);
+  res = add_costs (res, mult == op1 ? cost0 : cost1);
+
+  *cost = res;
+  return true;
+}
+
 /* Estimates cost of forcing expression EXPR into a variable.  */
 
 static comp_cost
@@ -3629,6 +3666,21 @@ force_expr_to_var_cost (tree expr, bool 
     case MINUS_EXPR:
     case NEGATE_EXPR:
       cost = new_cost (add_cost (mode, speed), 0);
+      if (TREE_CODE (expr) != NEGATE_EXPR)
+        {
+          tree mult = NULL_TREE;
+          comp_cost sa_cost;
+          if (TREE_CODE (op1) == MULT_EXPR)
+            mult = op1;
+          else if (TREE_CODE (op0) == MULT_EXPR)
+            mult = op0;
+
+          if (mult != NULL_TREE
+              && TREE_CODE (TREE_OPERAND (mult, 1)) == INTEGER_CST
+              && get_shiftadd_cost (expr, mode, cost0, cost1, mult, speed,
+                                    &sa_cost))
+            return sa_cost;
+        }
       break;
 
     case MULT_EXPR:

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH, PR45098, 10/10]
  2011-05-17  7:23 [PATCH, PR45098] Tom de Vries
                   ` (8 preceding siblings ...)
  2011-05-17 10:03 ` [PATCH, PR45098, 9/10] Tom de Vries
@ 2011-05-17 10:30 ` Tom de Vries
  2011-05-18 18:30   ` [PATCH PR45098, 10/10] Cheap shift-add - test case Tom de Vries
  9 siblings, 1 reply; 46+ messages in thread
From: Tom de Vries @ 2011-05-17 10:30 UTC (permalink / raw)
  To: Zdenek Dvorak; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 542 bytes --]

On 05/17/2011 09:10 AM, Tom de Vries wrote:
> Hi Zdenek,
> 
> I have a patch set for for PR45098.
> 
> 01_object-size-target.patch
> 02_pr45098-rtx-cost-set.patch
> 03_pr45098-computation-cost.patch
> 04_pr45098-iv-init-cost.patch
> 05_pr45098-bound-cost.patch
> 06_pr45098-bound-cost.test.patch
> 07_pr45098-nowrap-limits-iterations.patch
> 08_pr45098-nowrap-limits-iterations.test.patch
> 09_pr45098-shift-add-cost.patch
> 10_pr45098-shift-add-cost.test.patch
> 
> I will sent out the patches individually.
> 

OK for trunk?

Thanks,
- Tom

[-- Attachment #2: 10_pr45098-shift-add-cost.test.patch --]
[-- Type: text/x-patch, Size: 692 bytes --]

2011-05-05  Tom de Vries  <tom@codesourcery.com>

	PR target/45098
	* gcc.target/arm/ivopts-6.c: New test.

Index: gcc/testsuite/gcc.target/arm/ivopts-6.c
===================================================================
--- /dev/null (new file)
+++ gcc/testsuite/gcc.target/arm/ivopts-6.c (revision 0)
@@ -0,0 +1,15 @@
+/* { dg-do assemble } */
+/* { dg-options "-Os -fdump-tree-ivopts -save-temps -marm" } */
+
+void
+tr5 (short array[], int n)
+{
+  int x;
+  if (n > 0)
+    for (x = 0; x < n; x++)
+      array[x] = 0;
+}
+
+/* { dg-final { scan-tree-dump-times "PHI <" 1 "ivopts"} } */
+/* { dg-final { object-size text <= 32 } } */
+/* { dg-final { cleanup-tree-dump "ivopts" } } */

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH, PR45098, 2/10]
  2011-05-17  8:12 ` [PATCH, PR45098, 2/10] Tom de Vries
@ 2011-05-18  9:39   ` Zdenek Dvorak
  0 siblings, 0 replies; 46+ messages in thread
From: Zdenek Dvorak @ 2011-05-18  9:39 UTC (permalink / raw)
  To: Tom de Vries; +Cc: gcc-patches

Hi,

> 2011-05-05  Tom de Vries  <tom@codesourcery.com>
> 
> 	PR target/45098
> 	* tree-ssa-loop-ivopts.c (seq_cost): Fix call to rtx_cost.

OK,

Zdenek

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH, PR45098, 3/10]
  2011-05-17  8:30 ` [PATCH, PR45098, 3/10] Tom de Vries
@ 2011-05-18 10:10   ` Zdenek Dvorak
  2011-05-18 13:00     ` Tom de Vries
  0 siblings, 1 reply; 46+ messages in thread
From: Zdenek Dvorak @ 2011-05-18 10:10 UTC (permalink / raw)
  To: Tom de Vries; +Cc: gcc-patches

Hi,

> 2011-05-05  Tom de Vries  <tom@codesourcery.com>
> 
> 	PR target/45098
> 	* tree-ssa-loop-ivopts.c (computation_cost): Prevent cost of 0.

this looks strange.  Something like

  cost = seq_cost (seq, speed);
  if (MEM_P (rslt))
    the current code;
  else
    cost += rtx_cost (rslt, SET, speed));

would make more sense to me (if I understand correctly what you are
trying to achieve).

Zdenek

> Index: gcc/tree-ssa-loop-ivopts.c
> ===================================================================
> --- gcc/tree-ssa-loop-ivopts.c	(revision 173380)
> +++ gcc/tree-ssa-loop-ivopts.c	(working copy)
> @@ -2862,7 +2862,9 @@ computation_cost (tree expr, bool speed)
>    default_rtl_profile ();
>    node->frequency = real_frequency;
>  
> -  cost = seq_cost (seq, speed);
> +  cost = (seq != NULL_RTX
> +          ? seq_cost (seq, speed)
> +          : (unsigned)rtx_cost (rslt, SET, speed));
>    if (MEM_P (rslt))
>      cost += address_cost (XEXP (rslt, 0), TYPE_MODE (type),
>  			  TYPE_ADDR_SPACE (type), speed);

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH, PR45098, 3/10]
  2011-05-18 10:10   ` Zdenek Dvorak
@ 2011-05-18 13:00     ` Tom de Vries
       [not found]       ` <20110518152457.GA13360@kam.mff.cuni.cz>
  0 siblings, 1 reply; 46+ messages in thread
From: Tom de Vries @ 2011-05-18 13:00 UTC (permalink / raw)
  To: Zdenek Dvorak; +Cc: gcc-patches

Hi Zdenek,

thanks for the review.

On 05/18/2011 09:26 AM, Zdenek Dvorak wrote:
>> 2011-05-05  Tom de Vries  <tom@codesourcery.com>
>>
>> 	PR target/45098
>> 	* tree-ssa-loop-ivopts.c (computation_cost): Prevent cost of 0.
> 
> this looks strange.  Something like
> 
>   cost = seq_cost (seq, speed);
>   if (MEM_P (rslt))
>     the current code;
>   else
>     cost += rtx_cost (rslt, SET, speed));
> 
> would make more sense to me (if I understand correctly what you are
> trying to achieve).
> 

Sorry for not putting an explanation in the first place.

I'm trying to achieve that the cost of forcing an int into a reg is not
0. Currently, if the expansion results in an rtx expression rather than
an insn, seq is NULL and seq_cost returns 0, after which
computation_cost returns 0:
...
(gdb) call debug_generic_expr (expr)
2000
(gdb) call debug_rtx (rslt)
(const_int 2000 [0x7d0])
(gdb) p seq
$5 = (rtx) 0x0
...

Using an assert on an x86_64 build, I found this case where seq != NULL_RTX:
...
gdb) call debug_generic_expr (expr)
(int) ((unsigned int) ivtmp.11 * 8)
(gdb) call debug_rtx (rslt)
(reg:SI 62)
(gdb) call debug_rtx (seq)
(insn 4 0 0 (parallel [
            (set (reg:SI 62)
                (ashift:SI (subreg:SI (reg:DI 59 [ ivtmp.11 ]) 0)
                    (const_int 3 [0x3])))
            (clobber (reg:CC 17 flags))
        ]) -1
     (nil))
...
we don't need to additionally count a regcopy of r62 on top of seq_cost
in this case.

How about:
...
@@ -2866,6 +2878,8 @@ computation_cost (tree expr, bool speed)
   if (MEM_P (rslt))
     cost += address_cost (XEXP (rslt, 0), TYPE_MODE (type),
 			  TYPE_ADDR_SPACE (type), speed);
+  else if (!REG_P (rslt))
+    cost += (unsigned)rtx_cost (rslt, SET, speed);

   return cost;
 }
...
?

Thanks
- Tom

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH PR45098,  4/10] Iv init cost.
  2011-05-17  8:32 ` [PATCH, PR45098, 4/10] Tom de Vries
@ 2011-05-18 17:46   ` Tom de Vries
  2011-05-18 22:59     ` Zdenek Dvorak
  2011-05-25 14:20     ` Richard Sandiford
  0 siblings, 2 replies; 46+ messages in thread
From: Tom de Vries @ 2011-05-18 17:46 UTC (permalink / raw)
  To: Zdenek Dvorak; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1011 bytes --]

On 05/17/2011 09:17 AM, Tom de Vries wrote:
> On 05/17/2011 09:10 AM, Tom de Vries wrote:
>> Hi Zdenek,
>>
>> I have a patch set for for PR45098.
>>
>> 01_object-size-target.patch
>> 02_pr45098-rtx-cost-set.patch
>> 03_pr45098-computation-cost.patch
>> 04_pr45098-iv-init-cost.patch
>> 05_pr45098-bound-cost.patch
>> 06_pr45098-bound-cost.test.patch
>> 07_pr45098-nowrap-limits-iterations.patch
>> 08_pr45098-nowrap-limits-iterations.test.patch
>> 09_pr45098-shift-add-cost.patch
>> 10_pr45098-shift-add-cost.test.patch
>>
>> I will sent out the patches individually.
>>
> 
> OK for trunk?
> 
> Thanks,
> - Tom

Resubmitting with comment.

The init cost of an iv will in general not be zero. It will be
exceptional that the iv register happens to be initialized with the
proper value at no cost. In general, there will at the very least be a
regcopy or a const set.

2011-05-05  Tom de Vries  <tom@codesourcery.com>

	PR target/45098
	* tree-ssa-loop-ivopts.c (determine_iv_cost): Prevent
	cost_base.cost == 0.

[-- Attachment #2: 04_pr45098-iv-init-cost.patch --]
[-- Type: text/x-patch, Size: 544 bytes --]

Index: gcc/tree-ssa-loop-ivopts.c
===================================================================
--- gcc/tree-ssa-loop-ivopts.c	(revision 173380)
+++ gcc/tree-ssa-loop-ivopts.c	(working copy)
@@ -4688,6 +4688,8 @@ determine_iv_cost (struct ivopts_data *d
 
   base = cand->iv->base;
   cost_base = force_var_cost (data, base, NULL);
+  if (cost_base.cost == 0)
+      cost_base.cost = COSTS_N_INSNS (1);
   cost_step = add_cost (TYPE_MODE (TREE_TYPE (base)), data->speed);
 
   cost = cost_step + adjust_setup_cost (data, cost_base.cost);

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH PR45098,  5/10] Bound cost
  2011-05-17  8:37 ` [PATCH, PR45098, 5/10] Tom de Vries
@ 2011-05-18 17:48   ` Tom de Vries
  2011-05-19  4:45     ` Zdenek Dvorak
  0 siblings, 1 reply; 46+ messages in thread
From: Tom de Vries @ 2011-05-18 17:48 UTC (permalink / raw)
  To: Zdenek Dvorak; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1494 bytes --]

On 05/17/2011 09:18 AM, Tom de Vries wrote:
> On 05/17/2011 09:10 AM, Tom de Vries wrote:
>> Hi Zdenek,
>>
>> I have a patch set for for PR45098.
>>
>> 01_object-size-target.patch
>> 02_pr45098-rtx-cost-set.patch
>> 03_pr45098-computation-cost.patch
>> 04_pr45098-iv-init-cost.patch
>> 05_pr45098-bound-cost.patch
>> 06_pr45098-bound-cost.test.patch
>> 07_pr45098-nowrap-limits-iterations.patch
>> 08_pr45098-nowrap-limits-iterations.test.patch
>> 09_pr45098-shift-add-cost.patch
>> 10_pr45098-shift-add-cost.test.patch
>>
>> I will sent out the patches individually.
>>
> 
> OK for trunk?
> 
> Thanks,
> - Tom

Resubmitting with comment.

This improves the estimation of cost of bound calculation:
- It tries to estimate whether an ssa_name can be used in the loop
  at zero cost, or whether a regcopy is needed to keep the ssa_name
  alive during the loop, and counts the cost of the regcopy.
- It adjusts the cost of a complex loop bound: if a complex loop bound
  uses more that 1 invariant, it is counted as a new single invariant.
- It estimates the cost of an int bound at zero.

2011-05-05  Tom de Vries  <tom@codesourcery.com>

	PR target/45098
	* tree-ssa-loop-ivopts.c (get_expr_id): Factored new function out of
	get_loop_invariant_expr_id.
	(get_loop_invariant_expr_id): Use get_expr_id.
	(parm_decl_cost): New function.
	(determine_use_iv_cost_condition): Use get_expr_id and parm_decl_cost.
	Improve bound cost estimation.  Use different inv_expr_id for elim and
	express cases.

[-- Attachment #2: 05_pr45098-bound-cost.patch --]
[-- Type: text/x-patch, Size: 5504 bytes --]

Index: gcc/tree-ssa-loop-ivopts.c
===================================================================
--- gcc/tree-ssa-loop-ivopts.c	(revision 173380)
+++ gcc/tree-ssa-loop-ivopts.c	(working copy)
@@ -3835,6 +3835,28 @@ compare_aff_trees (aff_tree *aff1, aff_t
   return true;
 }
 
+/* Stores EXPR in DATA->inv_expr_tab, and assigns it an inv_expr_id.  */
+
+static int
+get_expr_id (struct ivopts_data *data, tree expr)
+{
+  struct iv_inv_expr_ent ent;
+  struct iv_inv_expr_ent **slot;
+
+  ent.expr = expr;
+  ent.hash = iterative_hash_expr (expr, 0);
+  slot = (struct iv_inv_expr_ent **) htab_find_slot (data->inv_expr_tab,
+                                                     &ent, INSERT);
+  if (*slot)
+    return (*slot)->id;
+
+  *slot = XNEW (struct iv_inv_expr_ent);
+  (*slot)->expr = expr;
+  (*slot)->hash = ent.hash;
+  (*slot)->id = data->inv_expr_id++;
+  return (*slot)->id;
+}
+
 /* Returns the pseudo expr id if expression UBASE - RATIO * CBASE
    requires a new compiler generated temporary.  Returns -1 otherwise.
    ADDRESS_P is a flag indicating if the expression is for address
@@ -3847,8 +3869,6 @@ get_loop_invariant_expr_id (struct ivopt
 {
   aff_tree ubase_aff, cbase_aff;
   tree expr, ub, cb;
-  struct iv_inv_expr_ent ent;
-  struct iv_inv_expr_ent **slot;
 
   STRIP_NOPS (ubase);
   STRIP_NOPS (cbase);
@@ -3936,18 +3956,7 @@ get_loop_invariant_expr_id (struct ivopt
   aff_combination_scale (&cbase_aff, shwi_to_double_int (-1 * ratio));
   aff_combination_add (&ubase_aff, &cbase_aff);
   expr = aff_combination_to_tree (&ubase_aff);
-  ent.expr = expr;
-  ent.hash = iterative_hash_expr (expr, 0);
-  slot = (struct iv_inv_expr_ent **) htab_find_slot (data->inv_expr_tab,
-                                                     &ent, INSERT);
-  if (*slot)
-    return (*slot)->id;
-
-  *slot = XNEW (struct iv_inv_expr_ent);
-  (*slot)->expr = expr;
-  (*slot)->hash = ent.hash;
-  (*slot)->id = data->inv_expr_id++;
-  return  (*slot)->id;
+  return get_expr_id (data, expr);
 }
 
 
@@ -4412,6 +4421,23 @@ may_eliminate_iv (struct ivopts_data *da
   return true;
 }
 
+ /* Calculates the cost of BOUND, if it is a PARM_DECL.  A PARM_DECL must
+    be copied, if is is used in the loop body and DATA->body_includes_call.  */
+
+static int
+parm_decl_cost (struct ivopts_data *data, tree bound)
+{
+  tree sbound = bound;
+  STRIP_NOPS (sbound);
+
+  if (TREE_CODE (sbound) == SSA_NAME
+      && TREE_CODE (SSA_NAME_VAR (sbound)) == PARM_DECL
+      && gimple_nop_p (SSA_NAME_DEF_STMT (sbound))
+      && data->body_includes_call)
+    return COSTS_N_INSNS (1);
+
+  return 0;
+}
 
 /* Determines cost of basing replacement of USE on CAND in a condition.  */
 
@@ -4422,9 +4448,9 @@ determine_use_iv_cost_condition (struct 
   tree bound = NULL_TREE;
   struct iv *cmp_iv;
   bitmap depends_on_elim = NULL, depends_on_express = NULL, depends_on;
-  comp_cost elim_cost, express_cost, cost;
+  comp_cost elim_cost, express_cost, cost, bound_cost;
   bool ok;
-  int inv_expr_id = -1;
+  int elim_inv_expr_id = -1, express_inv_expr_id = -1, inv_expr_id;
   tree *control_var, *bound_cst;
 
   /* Only consider real candidates.  */
@@ -4438,6 +4464,21 @@ determine_use_iv_cost_condition (struct 
   if (may_eliminate_iv (data, use, cand, &bound))
     {
       elim_cost = force_var_cost (data, bound, &depends_on_elim);
+      if (elim_cost.cost == 0)
+        elim_cost.cost = parm_decl_cost (data, bound);
+      else if (TREE_CODE (bound) == INTEGER_CST)
+        elim_cost.cost = 0;
+      /* If we replace a loop condition 'i < n' with 'p < base + n',
+	 depends_on_elim will have 'base' and 'n' set, which implies
+	 that both 'base' and 'n' will be live during the loop.	 More likely,
+	 'base + n' will be loop invariant, resulting in only one live value
+	 during the loop.  So in that case we clear depends_on_elim and set
+        elim_inv_expr_id instead.  */
+      if (depends_on_elim && bitmap_count_bits (depends_on_elim) > 1)
+	{
+	  elim_inv_expr_id = get_expr_id (data, bound);
+	  bitmap_clear (depends_on_elim);
+	}
       /* The bound is a loop invariant, so it will be only computed
 	 once.  */
       elim_cost.cost = adjust_setup_cost (data, elim_cost.cost);
@@ -4465,16 +4506,25 @@ determine_use_iv_cost_condition (struct 
 
   express_cost = get_computation_cost (data, use, cand, false,
 				       &depends_on_express, NULL,
-                                       &inv_expr_id);
+                                       &express_inv_expr_id);
   fd_ivopts_data = data;
   walk_tree (&cmp_iv->base, find_depends, &depends_on_express, NULL);
 
+  /* Count the cost of the original bound as well.  */
+  bound_cost = force_var_cost (data, *bound_cst, NULL);
+  if (bound_cost.cost == 0)
+    bound_cost.cost = parm_decl_cost (data, *bound_cst);
+  else if (TREE_CODE (*bound_cst) == INTEGER_CST)
+    bound_cost.cost = 0;
+  express_cost.cost += bound_cost.cost;
+
   /* Choose the better approach, preferring the eliminated IV. */
   if (compare_costs (elim_cost, express_cost) <= 0)
     {
       cost = elim_cost;
       depends_on = depends_on_elim;
       depends_on_elim = NULL;
+      inv_expr_id = elim_inv_expr_id;
     }
   else
     {
@@ -4482,6 +4532,7 @@ determine_use_iv_cost_condition (struct 
       depends_on = depends_on_express;
       depends_on_express = NULL;
       bound = NULL_TREE;
+      inv_expr_id = express_inv_expr_id;
     }
 
   set_use_iv_cost (data, use, cand, cost, depends_on, bound, inv_expr_id);


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH PR45098,  6/10] Bound cost - test cases.
  2011-05-17  8:42 ` [PATCH, PR45098, 6/10] Tom de Vries
@ 2011-05-18 17:48   ` Tom de Vries
  0 siblings, 0 replies; 46+ messages in thread
From: Tom de Vries @ 2011-05-18 17:48 UTC (permalink / raw)
  To: Zdenek Dvorak; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 689 bytes --]

On 05/17/2011 09:19 AM, Tom de Vries wrote:
> On 05/17/2011 09:10 AM, Tom de Vries wrote:
>> Hi Zdenek,
>>
>> I have a patch set for for PR45098.
>>
>> 01_object-size-target.patch
>> 02_pr45098-rtx-cost-set.patch
>> 03_pr45098-computation-cost.patch
>> 04_pr45098-iv-init-cost.patch
>> 05_pr45098-bound-cost.patch
>> 06_pr45098-bound-cost.test.patch
>> 07_pr45098-nowrap-limits-iterations.patch
>> 08_pr45098-nowrap-limits-iterations.test.patch
>> 09_pr45098-shift-add-cost.patch
>> 10_pr45098-shift-add-cost.test.patch
>>
>> I will sent out the patches individually.
>>
> 
> OK for trunk?
> 
> Thanks,
> - Tom

These patch adds 2 new test cases. These need the preceding patches to pass.

[-- Attachment #2: 06_pr45098-bound-cost.test.patch --]
[-- Type: text/x-patch, Size: 1320 bytes --]

Index: gcc/testsuite/gcc.target/arm/ivopts-2.c
===================================================================
--- /dev/null (new file)
+++ gcc/testsuite/gcc.target/arm/ivopts-2.c (revision 0)
@@ -0,0 +1,18 @@
+/* { dg-do assemble } */
+/* { dg-options "-Os -mthumb -fdump-tree-ivopts -save-temps" } */
+
+extern void foo2 (short*);
+
+void
+tr4 (short array[], int n)
+{
+  int x;
+  if (n > 0)
+    for (x = 0; x < n; x++)
+      foo2 (&array[x]);
+}
+
+/* { dg-final { scan-tree-dump-times "PHI <ivtmp" 1 "ivopts"} } */
+/* { dg-final { scan-tree-dump-times "PHI <" 1 "ivopts"} } */
+/* { dg-final { object-size text <= 26 { target arm_thumb2_ok } } } */
+/* { dg-final { cleanup-tree-dump "ivopts" } } */
Index: gcc/testsuite/gcc.target/arm/ivopts.c
===================================================================
--- /dev/null (new file)
+++ gcc/testsuite/gcc.target/arm/ivopts.c (revision 0)
@@ -0,0 +1,15 @@
+/* { dg-do assemble } */
+/* { dg-options "-Os -mthumb -fdump-tree-ivopts -save-temps" } */
+
+void
+tr5 (short array[], int n)
+{
+  int x;
+  if (n > 0)
+    for (x = 0; x < n; x++)
+      array[x] = 0;
+}
+
+/* { dg-final { scan-tree-dump-times "PHI <" 1 "ivopts"} } */
+/* { dg-final { object-size text <= 20 { target arm_thumb2_ok } } } */
+/* { dg-final { cleanup-tree-dump "ivopts" } } */

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH PR45098,  7/10] Nowrap limits iterations
  2011-05-17  8:58 ` [PATCH, PR45098, 7/10] Tom de Vries
@ 2011-05-18 17:52   ` Tom de Vries
  2011-05-19  4:45     ` Zdenek Dvorak
  0 siblings, 1 reply; 46+ messages in thread
From: Tom de Vries @ 2011-05-18 17:52 UTC (permalink / raw)
  To: Zdenek Dvorak; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1113 bytes --]

On 05/17/2011 09:20 AM, Tom de Vries wrote:
> On 05/17/2011 09:10 AM, Tom de Vries wrote:
>> Hi Zdenek,
>>
>> I have a patch set for for PR45098.
>>
>> 01_object-size-target.patch
>> 02_pr45098-rtx-cost-set.patch
>> 03_pr45098-computation-cost.patch
>> 04_pr45098-iv-init-cost.patch
>> 05_pr45098-bound-cost.patch
>> 06_pr45098-bound-cost.test.patch
>> 07_pr45098-nowrap-limits-iterations.patch
>> 08_pr45098-nowrap-limits-iterations.test.patch
>> 09_pr45098-shift-add-cost.patch
>> 10_pr45098-shift-add-cost.test.patch
>>
>> I will sent out the patches individually.
>>
> 
> OK for trunk?
> 
> Thanks,
> - Tom

Resubmitting with comment.

This patch attemps to estimate the number of iterations of the loop based on
nonwrapping arithmetic in the loop body.

2011-05-05  Tom de Vries  <tom@codesourcery.com>

	PR target/45098
	* tree-ssa-loop-ivopts.c (struct ivopts_data): Add fields
	max_iterations_p and max_iterations.
	(is_nonwrap_use, max_loop_iterations, set_max_iterations): New function.
	(may_eliminate_iv): Use max_iterations_p and max_iterations.
	(tree_ssa_iv_optimize_loop): Use set_max_iterations.

[-- Attachment #2: 07_pr45098-nowrap-limits-iterations.patch --]
[-- Type: text/x-patch, Size: 4472 bytes --]

Index: gcc/tree-ssa-loop-ivopts.c
===================================================================
--- gcc/tree-ssa-loop-ivopts.c (revision 173355)
+++ gcc/tree-ssa-loop-ivopts.c (working copy)
@@ -291,6 +291,12 @@ struct ivopts_data
 
   /* Whether the loop body includes any function calls.  */
   bool body_includes_call;
+
+  /* Whether max_iterations is valid.  */
+  bool max_iterations_p;
+
+  /* Maximum number of iterations of current_loop.  */
+  double_int max_iterations;
 };
 
 /* An assignment of iv candidates to uses.  */
@@ -4319,6 +4325,108 @@ iv_elimination_compare (struct ivopts_da
   return (exit->flags & EDGE_TRUE_VALUE ? EQ_EXPR : NE_EXPR);
 }
 
+/* Determine if USE contains non-wrapping arithmetic.  */
+
+static bool
+is_nonwrap_use (struct ivopts_data *data, struct iv_use *use)
+{
+  gimple stmt = use->stmt;
+  tree var, ptr, ptr_type;
+
+  if (!is_gimple_assign (stmt))
+    return false;
+
+  switch (gimple_assign_rhs_code (stmt))
+    {
+    case POINTER_PLUS_EXPR:
+      ptr = gimple_assign_rhs1 (stmt);
+      ptr_type = TREE_TYPE (ptr);
+      var = gimple_assign_rhs2 (stmt);
+      if (!expr_invariant_in_loop_p (data->current_loop, ptr))
+        return false;
+      break;
+    case ARRAY_REF:
+      ptr = TREE_OPERAND ((gimple_assign_rhs1 (stmt)), 0);
+      ptr_type = build_pointer_type (TREE_TYPE (gimple_assign_rhs1 (stmt)));
+      var = TREE_OPERAND ((gimple_assign_rhs1 (stmt)), 1);
+      break;
+    default:
+      return false;
+    }
+
+  if (!nowrap_type_p (ptr_type))
+    return false;
+
+  if (TYPE_PRECISION (ptr_type) != TYPE_PRECISION (TREE_TYPE (var)))
+    return false;
+
+  return true;
+}
+
+/* Attempt to infer maximum number of loop iterations of DATA->current_loop
+   from uses in loop containing non-wrapping arithmetic.  If successful,
+   return true, and return maximum iterations in MAX_NITER.  */
+
+static bool
+max_loop_iterations (struct ivopts_data *data, double_int *max_niter)
+{
+  struct iv_use *use;
+  struct iv *iv;
+  bool found = false;
+  double_int period;
+  gimple stmt;
+  unsigned i;
+
+  for (i = 0; i < n_iv_uses (data); i++)
+    {
+      use = iv_use (data, i);
+
+      stmt = use->stmt;
+      if (!just_once_each_iteration_p (data->current_loop, gimple_bb (stmt)))
+	continue;
+
+      if (!is_nonwrap_use (data, use))
+        continue;
+
+      iv = use->iv;
+      if (iv->step == NULL_TREE || TREE_CODE (iv->step) != INTEGER_CST)
+	continue;
+      period = tree_to_double_int (iv_period (iv));
+
+      if (found)
+        *max_niter = double_int_umin (*max_niter, period);
+      else
+        {
+          found = true;
+          *max_niter = period;
+        }
+    }
+
+  return found;
+}
+
+/* Initializes DATA->max_iterations and DATA->max_iterations_p.  */
+
+static void
+set_max_iterations (struct ivopts_data *data)
+{
+  double_int max_niter, max_niter2;
+  bool estimate1, estimate2;
+
+  data->max_iterations_p = false;
+  estimate1 = estimated_loop_iterations (data->current_loop, true, &max_niter);
+  estimate2 = max_loop_iterations (data, &max_niter2);
+  if (!(estimate1 || estimate2))
+    return;
+  if (estimate1 && estimate2)
+    data->max_iterations = double_int_umin (max_niter, max_niter2);
+  else if (estimate1)
+    data->max_iterations = max_niter;
+  else
+    data->max_iterations = max_niter2;
+  data->max_iterations_p = true;
+}
+
 /* Check whether it is possible to express the condition in USE by comparison
    of candidate CAND.  If so, store the value compared with to BOUND.  */
 
@@ -4391,10 +4499,10 @@ may_eliminate_iv (struct ivopts_data *da
           /* See if we can take advantage of infered loop bound information.  */
           if (loop_only_exit_p (loop, exit))
             {
-              if (!estimated_loop_iterations (loop, true, &max_niter))
+              if (!data->max_iterations_p)
                 return false;
               /* The loop bound is already adjusted by adding 1.  */
-              if (double_int_ucmp (max_niter, period_value) > 0)
+              if (double_int_ucmp (data->max_iterations, period_value) > 0)
                 return false;
             }
           else
@@ -6390,6 +6498,8 @@ tree_ssa_iv_optimize_loop (struct ivopts
   /* Finds candidates for the induction variables (item 2).  */
   find_iv_candidates (data);
 
+  set_max_iterations (data);
+
   /* Calculates the costs (item 3, part 1).  */
   determine_iv_costs (data);
   determine_use_iv_costs (data);

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH PR45098,  8/10] Nowrap limits iterations - test cases.
  2011-05-17  9:03 ` [PATCH, PR45098, 8/10] Tom de Vries
@ 2011-05-18 18:23   ` Tom de Vries
  2011-05-18 18:27   ` [PATCH PR45098, 9/10] Cheap shift-add Tom de Vries
  1 sibling, 0 replies; 46+ messages in thread
From: Tom de Vries @ 2011-05-18 18:23 UTC (permalink / raw)
  To: Zdenek Dvorak; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1839 bytes --]

On 05/17/2011 09:21 AM, Tom de Vries wrote:
> On 05/17/2011 09:10 AM, Tom de Vries wrote:
>> Hi Zdenek,
>>
>> I have a patch set for for PR45098.
>>
>> 01_object-size-target.patch
>> 02_pr45098-rtx-cost-set.patch
>> 03_pr45098-computation-cost.patch
>> 04_pr45098-iv-init-cost.patch
>> 05_pr45098-bound-cost.patch
>> 06_pr45098-bound-cost.test.patch
>> 07_pr45098-nowrap-limits-iterations.patch
>> 08_pr45098-nowrap-limits-iterations.test.patch
>> 09_pr45098-shift-add-cost.patch
>> 10_pr45098-shift-add-cost.test.patch
>>
>> I will sent out the patches individually.
>>
> 
> OK for trunk?
> 
> Thanks,
> - Tom

Resubmitting with comment.

This patch introduces 3 new testcases, and modifies an existing test case.
The 3 new testcases need the preceding patches to pass.

The modified test case is ivopt_infer_2.c.

#ifndef TYPE
#define TYPE char*
#endif

extern int a[];

/* Can not infer loop iteration from array -- exit test can not be replaced.  */
void foo (int i_width, TYPE dst, TYPE src1, TYPE src2)
{
  TYPE dstn= dst + i_width;
  TYPE dst0 = dst;
  unsigned long long i = 0;
  for( ; dst <= dstn; )
    {
      dst0[i] = ( src1[i] + src2[i] + 1 +a[i]) >> 1;
      dst++;
      i += 16;
    }
}

The estimates in set_max_iterations for this testcase are:

(gdb) p /x  max_niter
$3 = {low = 0x0, high = 0x1}
(gdb) p /x  max_niter2
$4 = {low = 0x3ffffffffffffff, high = 0x0}

The second estimate is based on a[i], which contains the non-wrapping pointer
arithmetic a+i. Var i is incremented with 16 each iterations, an a is an int
pointer, which explains the factor 64 difference between the 2.

2011-05-05  Tom de Vries  <tom@codesourcery.com>

	PR target/45098
	* gcc.target/arm/ivopts-3.c: New test.
	* gcc.target/arm/ivopts-4.c: New test.
	* gcc.target/arm/ivopts-5.c: New test.
	* gcc.dg/tree-ssa/ivopt_infer_2.c: Adapt test.

[-- Attachment #2: 08_pr45098-nowrap-limits-iterations.test.patch --]
[-- Type: text/x-patch, Size: 3389 bytes --]

Index: gcc/testsuite/gcc.target/arm/ivopts-3.c
===================================================================
--- /dev/null (new file)
+++ gcc/testsuite/gcc.target/arm/ivopts-3.c (revision 0)
@@ -0,0 +1,20 @@
+/* { dg-do assemble } */
+/* { dg-options "-Os -mthumb -fdump-tree-ivopts -save-temps" } */
+
+extern unsigned int foo2 (short*) __attribute__((pure));
+
+unsigned int
+tr3 (short array[], unsigned int n)
+{
+  unsigned sum = 0;
+  unsigned int x;
+  for (x = 0; x < n; x++)
+    sum += foo2 (&array[x]);
+  return sum;
+}
+
+/* { dg-final { scan-tree-dump-times "PHI <ivtmp" 1 "ivopts"} } */
+/* { dg-final { scan-tree-dump-times "PHI <x" 0 "ivopts"} } */
+/* { dg-final { scan-tree-dump-times ", x" 0 "ivopts"} } */
+/* { dg-final { object-size text <= 30 { target arm_thumb2_ok } } } */
+/* { dg-final { cleanup-tree-dump "ivopts" } } */
Index: gcc/testsuite/gcc.target/arm/ivopts-4.c
===================================================================
--- /dev/null (new file)
+++ gcc/testsuite/gcc.target/arm/ivopts-4.c (revision 0)
@@ -0,0 +1,21 @@
+/* { dg-do assemble } */
+/* { dg-options "-mthumb -Os -fdump-tree-ivopts -save-temps" } */
+
+extern unsigned int foo (int*) __attribute__((pure));
+
+unsigned int
+tr2 (int array[], int n)
+{
+  unsigned int sum = 0;
+  int x;
+  if (n > 0)
+    for (x = 0; x < n; x++)
+      sum += foo (&array[x]);
+  return sum;
+}
+
+/* { dg-final { scan-tree-dump-times "PHI <ivtmp" 1 "ivopts"} } */
+/* { dg-final { scan-tree-dump-times "PHI <x" 0 "ivopts"} } */
+/* { dg-final { scan-tree-dump-times ", x" 0 "ivopts"} } */
+/* { dg-final { object-size text <= 36 { target arm_thumb2_ok } } } */
+/* { dg-final { cleanup-tree-dump "ivopts" } } */
Index: gcc/testsuite/gcc.target/arm/ivopts-5.c
===================================================================
--- /dev/null (new file)
+++ gcc/testsuite/gcc.target/arm/ivopts-5.c (revision 0)
@@ -0,0 +1,20 @@
+/* { dg-do assemble } */
+/* { dg-options "-Os -mthumb -fdump-tree-ivopts -save-temps" } */
+
+extern unsigned int foo (int*) __attribute__((pure));
+
+unsigned int
+tr1 (int array[], unsigned int n)
+{
+  unsigned int sum = 0;
+  unsigned int x;
+  for (x = 0; x < n; x++)
+    sum += foo (&array[x]);
+  return sum;
+}
+
+/* { dg-final { scan-tree-dump-times "PHI <ivtmp" 1 "ivopts"} } */
+/* { dg-final { scan-tree-dump-times "PHI <x" 0 "ivopts"} } */
+/* { dg-final { scan-tree-dump-times ", x" 0 "ivopts"} } */
+/* { dg-final { object-size text <= 30 { target arm_thumb2_ok } } } */
+/* { dg-final { cleanup-tree-dump "ivopts" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/ivopt_infer_2.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/ivopt_infer_2.c	(revision 173380)
+++ gcc/testsuite/gcc.dg/tree-ssa/ivopt_infer_2.c	(working copy)
@@ -7,7 +7,8 @@
 
 extern int a[];
 
-/* Can not infer loop iteration from array -- exit test can not be replaced.  */
+/* Can infer loop iteration from nonwrapping pointer arithmetic.
+   exit test can be replaced.  */
 void foo (int i_width, TYPE dst, TYPE src1, TYPE src2)
 {
       TYPE dstn= dst + i_width;
@@ -21,5 +22,5 @@ void foo (int i_width, TYPE dst, TYPE sr
        }
 }
 
-/* { dg-final { scan-tree-dump-times "Replacing" 0 "ivopts"} } */
+/* { dg-final { scan-tree-dump-times "Replacing" 1 "ivopts"} } */
 /* { dg-final { cleanup-tree-dump "ivopts" } } */

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH PR45098,  9/10] Cheap shift-add.
  2011-05-17  9:03 ` [PATCH, PR45098, 8/10] Tom de Vries
  2011-05-18 18:23   ` [PATCH PR45098, 8/10] Nowrap limits iterations - test cases Tom de Vries
@ 2011-05-18 18:27   ` Tom de Vries
  2011-05-19  5:33     ` Zdenek Dvorak
  1 sibling, 1 reply; 46+ messages in thread
From: Tom de Vries @ 2011-05-18 18:27 UTC (permalink / raw)
  To: Zdenek Dvorak; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 927 bytes --]

On 05/17/2011 09:21 AM, Tom de Vries wrote:
> On 05/17/2011 09:10 AM, Tom de Vries wrote:
>> Hi Zdenek,
>>
>> I have a patch set for for PR45098.
>>
>> 01_object-size-target.patch
>> 02_pr45098-rtx-cost-set.patch
>> 03_pr45098-computation-cost.patch
>> 04_pr45098-iv-init-cost.patch
>> 05_pr45098-bound-cost.patch
>> 06_pr45098-bound-cost.test.patch
>> 07_pr45098-nowrap-limits-iterations.patch
>> 08_pr45098-nowrap-limits-iterations.test.patch
>> 09_pr45098-shift-add-cost.patch
>> 10_pr45098-shift-add-cost.test.patch
>>
>> I will sent out the patches individually.
>>
> 
> OK for trunk?
> 
> Thanks,
> - Tom

Resubmitting with comment.

ARM has cheap shift-add instructions. Take that into account in
force_expr_to_var_cost.

2011-05-05  Tom de Vries  <tom@codesourcery.com>

	PR target/45098
	* tree-ssa-loop-ivopts.c: Include expmed.h.
	(get_shiftadd_cost): New function.
	(force_expr_to_var_cost): Use get_shiftadd_cost.

[-- Attachment #2: 09_pr45098-shift-add-cost.patch --]
[-- Type: text/x-patch, Size: 2635 bytes --]

Index: gcc/tree-ssa-loop-ivopts.c
===================================================================
--- gcc/tree-ssa-loop-ivopts.c	(revision 173380)
+++ gcc/tree-ssa-loop-ivopts.c	(working copy)
@@ -92,6 +92,12 @@ along with GCC; see the file COPYING3.  
 #include "tree-inline.h"
 #include "tree-ssa-propagate.h"
 
+/* FIXME: add_cost and zero_cost defined in exprmed.h conflict with local uses.
+ */
+#include "expmed.h"
+#undef add_cost
+#undef zero_cost
+
 /* FIXME: Expressions are expanded to RTL in this pass to determine the
    cost of different addressing modes.  This should be moved to a TBD
    interface between the GIMPLE and RTL worlds.  */
@@ -3504,6 +3510,37 @@ get_address_cost (bool symbol_present, b
   return new_cost (cost + acost, complexity);
 }
 
+ /* Calculate the SPEED or size cost of shiftadd EXPR in MODE.  MULT is the
+    the EXPR operand holding the shift.  COST0 and COST1 are the costs for
+    calculating the operands of EXPR.  Returns true if successful, and returns
+    the cost in COST.  */
+
+static bool
+get_shiftadd_cost (tree expr, enum machine_mode mode, comp_cost cost0,
+                   comp_cost cost1, tree mult, bool speed, comp_cost *cost)
+{
+  comp_cost res;
+  tree op1 = TREE_OPERAND (expr, 1);
+  tree cst = TREE_OPERAND (mult, 1);
+  int m = exact_log2 (int_cst_value (cst));
+  int maxm = MIN (BITS_PER_WORD, GET_MODE_BITSIZE (mode));
+  int sa_cost;
+
+  if (!(m >= 0 && m < maxm))
+    return false;
+
+  sa_cost = (TREE_CODE (expr) != MINUS_EXPR
+             ? shiftadd_cost[speed][mode][m]
+             : (mult == op1
+                ? shiftsub1_cost[speed][mode][m]
+                : shiftsub0_cost[speed][mode][m]));
+  res = new_cost (sa_cost, 0);
+  res = add_costs (res, mult == op1 ? cost0 : cost1);
+
+  *cost = res;
+  return true;
+}
+
 /* Estimates cost of forcing expression EXPR into a variable.  */
 
 static comp_cost
@@ -3629,6 +3666,21 @@ force_expr_to_var_cost (tree expr, bool 
     case MINUS_EXPR:
     case NEGATE_EXPR:
       cost = new_cost (add_cost (mode, speed), 0);
+      if (TREE_CODE (expr) != NEGATE_EXPR)
+        {
+          tree mult = NULL_TREE;
+          comp_cost sa_cost;
+          if (TREE_CODE (op1) == MULT_EXPR)
+            mult = op1;
+          else if (TREE_CODE (op0) == MULT_EXPR)
+            mult = op0;
+
+          if (mult != NULL_TREE
+              && TREE_CODE (TREE_OPERAND (mult, 1)) == INTEGER_CST
+              && get_shiftadd_cost (expr, mode, cost0, cost1, mult, speed,
+                                    &sa_cost))
+            return sa_cost;
+        }
       break;
 
     case MULT_EXPR:

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH PR45098, 10/10] Cheap shift-add - test case.
  2011-05-17 10:30 ` [PATCH, PR45098, 10/10] Tom de Vries
@ 2011-05-18 18:30   ` Tom de Vries
  0 siblings, 0 replies; 46+ messages in thread
From: Tom de Vries @ 2011-05-18 18:30 UTC (permalink / raw)
  To: Zdenek Dvorak; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 804 bytes --]

On 05/17/2011 09:24 AM, Tom de Vries wrote:
> On 05/17/2011 09:10 AM, Tom de Vries wrote:
>> Hi Zdenek,
>>
>> I have a patch set for for PR45098.
>>
>> 01_object-size-target.patch
>> 02_pr45098-rtx-cost-set.patch
>> 03_pr45098-computation-cost.patch
>> 04_pr45098-iv-init-cost.patch
>> 05_pr45098-bound-cost.patch
>> 06_pr45098-bound-cost.test.patch
>> 07_pr45098-nowrap-limits-iterations.patch
>> 08_pr45098-nowrap-limits-iterations.test.patch
>> 09_pr45098-shift-add-cost.patch
>> 10_pr45098-shift-add-cost.test.patch
>>
>> I will sent out the patches individually.
>>
> 
> OK for trunk?
> 
> Thanks,
> - Tom

Resubmitting with comment.

This new test case needs all preceding patches to pass.

2011-05-05  Tom de Vries  <tom@codesourcery.com>

	PR target/45098
	* gcc.target/arm/ivopts-6.c: New test.

[-- Attachment #2: 10_pr45098-shift-add-cost.test.patch --]
[-- Type: text/x-patch, Size: 584 bytes --]

Index: gcc/testsuite/gcc.target/arm/ivopts-6.c
===================================================================
--- /dev/null (new file)
+++ gcc/testsuite/gcc.target/arm/ivopts-6.c (revision 0)
@@ -0,0 +1,15 @@
+/* { dg-do assemble } */
+/* { dg-options "-Os -fdump-tree-ivopts -save-temps -marm" } */
+
+void
+tr5 (short array[], int n)
+{
+  int x;
+  if (n > 0)
+    for (x = 0; x < n; x++)
+      array[x] = 0;
+}
+
+/* { dg-final { scan-tree-dump-times "PHI <" 1 "ivopts"} } */
+/* { dg-final { object-size text <= 32 } } */
+/* { dg-final { cleanup-tree-dump "ivopts" } } */

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH, PR45098, 3/10]
       [not found]       ` <20110518152457.GA13360@kam.mff.cuni.cz>
@ 2011-05-18 19:30         ` Tom de Vries
  0 siblings, 0 replies; 46+ messages in thread
From: Tom de Vries @ 2011-05-18 19:30 UTC (permalink / raw)
  To: Zdenek Dvorak; +Cc: gcc-patches

Hi Zdenek,

On 05/18/2011 05:24 PM, Zdenek Dvorak wrote:
> Hi,
> 
>> How about:
>> ...
>> @@ -2866,6 +2878,8 @@ computation_cost (tree expr, bool speed)
>>    if (MEM_P (rslt))
>>      cost += address_cost (XEXP (rslt, 0), TYPE_MODE (type),
>>  			  TYPE_ADDR_SPACE (type), speed);
>> +  else if (!REG_P (rslt))
>> +    cost += (unsigned)rtx_cost (rslt, SET, speed);
>>
>>    return cost;
>>  }
>> ...
>> ?
> 
> this looks ok to me 
> 

thanks for the review.

> (the cast to unsigned is not necessary, though?)

You're right, it's not, that was only necessary to prevent a warning in the
conditional expression originally proposed.

Checked in without cast.

Thanks,
- Tom

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH PR45098,  4/10] Iv init cost.
  2011-05-18 17:46   ` [PATCH PR45098, 4/10] Iv init cost Tom de Vries
@ 2011-05-18 22:59     ` Zdenek Dvorak
  2011-05-25 14:20     ` Richard Sandiford
  1 sibling, 0 replies; 46+ messages in thread
From: Zdenek Dvorak @ 2011-05-18 22:59 UTC (permalink / raw)
  To: Tom de Vries; +Cc: gcc-patches

Hi,

> Resubmitting with comment.
> 
> The init cost of an iv will in general not be zero. It will be
> exceptional that the iv register happens to be initialized with the
> proper value at no cost. In general, there will at the very least be a
> regcopy or a const set.

OK.  Please add a comment explaining this to the code,

Zdenek

> 2011-05-05  Tom de Vries  <tom@codesourcery.com>
> 
> 	PR target/45098
> 	* tree-ssa-loop-ivopts.c (determine_iv_cost): Prevent
> 	cost_base.cost == 0.

> Index: gcc/tree-ssa-loop-ivopts.c
> ===================================================================
> --- gcc/tree-ssa-loop-ivopts.c	(revision 173380)
> +++ gcc/tree-ssa-loop-ivopts.c	(working copy)
> @@ -4688,6 +4688,8 @@ determine_iv_cost (struct ivopts_data *d
>  
>    base = cand->iv->base;
>    cost_base = force_var_cost (data, base, NULL);
> +  if (cost_base.cost == 0)
> +      cost_base.cost = COSTS_N_INSNS (1);
>    cost_step = add_cost (TYPE_MODE (TREE_TYPE (base)), data->speed);
>  
>    cost = cost_step + adjust_setup_cost (data, cost_base.cost);

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH PR45098,  7/10] Nowrap limits iterations
  2011-05-18 17:52   ` [PATCH PR45098, 7/10] Nowrap limits iterations Tom de Vries
@ 2011-05-19  4:45     ` Zdenek Dvorak
  2011-05-20 12:22       ` Tom de Vries
  0 siblings, 1 reply; 46+ messages in thread
From: Zdenek Dvorak @ 2011-05-19  4:45 UTC (permalink / raw)
  To: Tom de Vries; +Cc: gcc-patches

Hi,

> This patch attemps to estimate the number of iterations of the loop based on
> nonwrapping arithmetic in the loop body.
> 
> 2011-05-05  Tom de Vries  <tom@codesourcery.com>
> 
> 	PR target/45098
> 	* tree-ssa-loop-ivopts.c (struct ivopts_data): Add fields
> 	max_iterations_p and max_iterations.
> 	(is_nonwrap_use, max_loop_iterations, set_max_iterations): New function.
> 	(may_eliminate_iv): Use max_iterations_p and max_iterations.
> 	(tree_ssa_iv_optimize_loop): Use set_max_iterations.

what are the cases this handles better than the existing analysis of maximum numbers of iterations
(estimate_numbers_of_iterations)?  Would it be possible to add the handling of these cases
to estimate_numbers_of_iterations, rather than doing it just for ivopts?

Zdenek

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH PR45098,  5/10] Bound cost
  2011-05-18 17:48   ` [PATCH PR45098, 5/10] Bound cost Tom de Vries
@ 2011-05-19  4:45     ` Zdenek Dvorak
  0 siblings, 0 replies; 46+ messages in thread
From: Zdenek Dvorak @ 2011-05-19  4:45 UTC (permalink / raw)
  To: Tom de Vries; +Cc: gcc-patches

Hi,

> Resubmitting with comment.
> 
> This improves the estimation of cost of bound calculation:
> - It tries to estimate whether an ssa_name can be used in the loop
>   at zero cost, or whether a regcopy is needed to keep the ssa_name
>   alive during the loop, and counts the cost of the regcopy.
> - It adjusts the cost of a complex loop bound: if a complex loop bound
>   uses more that 1 invariant, it is counted as a new single invariant.
> - It estimates the cost of an int bound at zero.
> 
> 2011-05-05  Tom de Vries  <tom@codesourcery.com>
> 
> 	PR target/45098
> 	* tree-ssa-loop-ivopts.c (get_expr_id): Factored new function out of
> 	get_loop_invariant_expr_id.
> 	(get_loop_invariant_expr_id): Use get_expr_id.
> 	(parm_decl_cost): New function.
> 	(determine_use_iv_cost_condition): Use get_expr_id and parm_decl_cost.
> 	Improve bound cost estimation.  Use different inv_expr_id for elim and
> 	express cases.

OK,

Zdenek

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH PR45098,  9/10] Cheap shift-add.
  2011-05-18 18:27   ` [PATCH PR45098, 9/10] Cheap shift-add Tom de Vries
@ 2011-05-19  5:33     ` Zdenek Dvorak
  2011-05-20 11:32       ` Tom de Vries
  0 siblings, 1 reply; 46+ messages in thread
From: Zdenek Dvorak @ 2011-05-19  5:33 UTC (permalink / raw)
  To: Tom de Vries; +Cc: gcc-patches

Hi,

> +  sa_cost = (TREE_CODE (expr) != MINUS_EXPR
> +             ? shiftadd_cost[speed][mode][m]
> +             : (mult == op1
> +                ? shiftsub1_cost[speed][mode][m]
> +                : shiftsub0_cost[speed][mode][m]));
> +  res = new_cost (sa_cost, 0);
> +  res = add_costs (res, mult == op1 ? cost0 : cost1);

just forgetting the cost of the other operand does not seem correct -- what
if it contains some more complicated subexpression?

Zdenek

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH PR45098,  1/10] Proc object-size fix
  2011-05-17  7:25 ` [PATCH, PR45098, 1/10] Tom de Vries
@ 2011-05-19 11:17   ` Tom de Vries
  0 siblings, 0 replies; 46+ messages in thread
From: Tom de Vries @ 2011-05-19 11:17 UTC (permalink / raw)
  To: Rainer Orth, mikestump; +Cc: Zdenek Dvorak, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 810 bytes --]

On 05/17/2011 09:14 AM, Tom de Vries wrote:
> On 05/17/2011 09:10 AM, Tom de Vries wrote:
>> Hi Zdenek,
>>
>> I have a patch set for for PR45098.
>>
>> 01_object-size-target.patch
>> 02_pr45098-rtx-cost-set.patch
>> 03_pr45098-computation-cost.patch
>> 04_pr45098-iv-init-cost.patch
>> 05_pr45098-bound-cost.patch
>> 06_pr45098-bound-cost.test.patch
>> 07_pr45098-nowrap-limits-iterations.patch
>> 08_pr45098-nowrap-limits-iterations.test.patch
>> 09_pr45098-shift-add-cost.patch
>> 10_pr45098-shift-add-cost.test.patch
>>
>> I will sent out the patches individually.
> 
> OK for trunk?
> 
> Thanks,
> - Tom

Fixes bug in target selector handling of proc object-size.

Committed as obvious.

2011-05-05  Tom de Vries  <tom@codesourcery.com>

	* lib/lib/scanasm.exp (object-size): Fix target selector handling.

[-- Attachment #2: 01_object-size-target.patch --]
[-- Type: text/x-patch, Size: 470 bytes --]

Index: gcc/testsuite/lib/scanasm.exp
===================================================================
--- gcc/testsuite/lib/scanasm.exp (revision 173734)
+++ gcc/testsuite/lib/scanasm.exp (working copy)
@@ -330,7 +330,7 @@ proc object-size { args } {
 	return
     }
     if { [llength $args] >= 4 } {
-	switch [dg-process-target [lindex $args 1]] {
+	switch [dg-process-target [lindex $args 3]] {
 	    "S" { }
 	    "N" { return }
 	    "F" { setup_xfail "*-*-*" }

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH PR45098,  9/10] Cheap shift-add.
  2011-05-19  5:33     ` Zdenek Dvorak
@ 2011-05-20 11:32       ` Tom de Vries
  2011-05-20 20:09         ` Zdenek Dvorak
  2011-05-21 15:05         ` Eric Botcazou
  0 siblings, 2 replies; 46+ messages in thread
From: Tom de Vries @ 2011-05-20 11:32 UTC (permalink / raw)
  To: Zdenek Dvorak; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 809 bytes --]

Hi,

On 05/18/2011 11:20 PM, Zdenek Dvorak wrote:
>> +  sa_cost = (TREE_CODE (expr) != MINUS_EXPR
>> +             ? shiftadd_cost[speed][mode][m]
>> +             : (mult == op1
>> +                ? shiftsub1_cost[speed][mode][m]
>> +                : shiftsub0_cost[speed][mode][m]));
>> +  res = new_cost (sa_cost, 0);
>> +  res = add_costs (res, mult == op1 ? cost0 : cost1);
> 
> just forgetting the cost of the other operand does not seem correct -- what
> if it contains some more complicated subexpression?
> 

True.  I now added the cost of TREE_OPERAND (mult, 0).

Thanks,
- Tom


2011-05-05  Tom de Vries  <tom@codesourcery.com>

	PR target/45098
	* tree-ssa-loop-ivopts.c: Include expmed.h.
	(get_shiftadd_cost): New function.
	(force_expr_to_var_cost): Declare forward.  Use get_shiftadd_cost.


[-- Attachment #2: 09_pr45098-shift-add-cost.patch --]
[-- Type: text/x-patch, Size: 3012 bytes --]

Index: gcc/tree-ssa-loop-ivopts.c
===================================================================
--- gcc/tree-ssa-loop-ivopts.c	(revision 173380)
+++ gcc/tree-ssa-loop-ivopts.c	(working copy)
@@ -92,6 +92,12 @@ along with GCC; see the file COPYING3.  
 #include "tree-inline.h"
 #include "tree-ssa-propagate.h"
 
+/* FIXME: add_cost and zero_cost defined in exprmed.h conflict with local uses.
+ */
+#include "expmed.h"
+#undef add_cost
+#undef zero_cost
+
 /* FIXME: Expressions are expanded to RTL in this pass to determine the
    cost of different addressing modes.  This should be moved to a TBD
    interface between the GIMPLE and RTL worlds.  */
@@ -377,6 +383,8 @@ struct iv_ca_delta
 
 static VEC(tree,heap) *decl_rtl_to_reset;
 
+static comp_cost force_expr_to_var_cost (tree, bool);
+
 /* Number of uses recorded in DATA.  */
 
 static inline unsigned
@@ -3504,6 +3512,42 @@ get_address_cost (bool symbol_present, b
   return new_cost (cost + acost, complexity);
 }
 
+ /* Calculate the SPEED or size cost of shiftadd EXPR in MODE.  MULT is the
+    the EXPR operand holding the shift.  COST0 and COST1 are the costs for
+    calculating the operands of EXPR.  Returns true if successful, and returns
+    the cost in COST.  */
+
+static bool
+get_shiftadd_cost (tree expr, enum machine_mode mode, comp_cost cost0,
+                   comp_cost cost1, tree mult, bool speed, comp_cost *cost)
+{
+  comp_cost res;
+  tree op1 = TREE_OPERAND (expr, 1);
+  tree cst = TREE_OPERAND (mult, 1);
+  tree multop = TREE_OPERAND (mult, 0);
+  int m = exact_log2 (int_cst_value (cst));
+  int maxm = MIN (BITS_PER_WORD, GET_MODE_BITSIZE (mode));
+  int sa_cost;
+
+  if (!(m >= 0 && m < maxm))
+    return false;
+
+  sa_cost = (TREE_CODE (expr) != MINUS_EXPR
+             ? shiftadd_cost[speed][mode][m]
+             : (mult == op1
+                ? shiftsub1_cost[speed][mode][m]
+                : shiftsub0_cost[speed][mode][m]));
+  res = new_cost (sa_cost, 0);
+  res = add_costs (res, mult == op1 ? cost0 : cost1);
+
+  STRIP_NOPS (multop);
+  if (!is_gimple_val (multop))
+    res = add_costs (res, force_expr_to_var_cost (multop, speed));
+
+  *cost = res;
+  return true;
+}
+
 /* Estimates cost of forcing expression EXPR into a variable.  */
 
 static comp_cost
@@ -3629,6 +3673,21 @@ force_expr_to_var_cost (tree expr, bool 
     case MINUS_EXPR:
     case NEGATE_EXPR:
       cost = new_cost (add_cost (mode, speed), 0);
+      if (TREE_CODE (expr) != NEGATE_EXPR)
+        {
+          tree mult = NULL_TREE;
+          comp_cost sa_cost;
+          if (TREE_CODE (op1) == MULT_EXPR)
+            mult = op1;
+          else if (TREE_CODE (op0) == MULT_EXPR)
+            mult = op0;
+
+          if (mult != NULL_TREE
+              && TREE_CODE (TREE_OPERAND (mult, 1)) == INTEGER_CST
+              && get_shiftadd_cost (expr, mode, cost0, cost1, mult, speed,
+                                    &sa_cost))
+            return sa_cost;
+        }
       break;
 
     case MULT_EXPR:

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH PR45098,  7/10] Nowrap limits iterations
  2011-05-19  4:45     ` Zdenek Dvorak
@ 2011-05-20 12:22       ` Tom de Vries
  2011-05-21 18:54         ` Zdenek Dvorak
  0 siblings, 1 reply; 46+ messages in thread
From: Tom de Vries @ 2011-05-20 12:22 UTC (permalink / raw)
  To: Zdenek Dvorak; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 2525 bytes --]

On 05/18/2011 11:11 PM, Zdenek Dvorak wrote:
> Hi,
> 
>> This patch attemps to estimate the number of iterations of the loop based on
>> nonwrapping arithmetic in the loop body.
>>
>> 2011-05-05  Tom de Vries  <tom@codesourcery.com>
>>
>> 	PR target/45098
>> 	* tree-ssa-loop-ivopts.c (struct ivopts_data): Add fields
>> 	max_iterations_p and max_iterations.
>> 	(is_nonwrap_use, max_loop_iterations, set_max_iterations): New function.
>> 	(may_eliminate_iv): Use max_iterations_p and max_iterations.
>> 	(tree_ssa_iv_optimize_loop): Use set_max_iterations.
> 
> what are the cases this handles better than the existing analysis of maximum numbers of iterations
> (estimate_numbers_of_iterations)?

Consider tr1:

extern int pfoo (int*)  __attribute__((pure));

int tr1 (int array[], unsigned int n)
{
  int sum = 0;
  unsigned int i;
  i = 0;
  while (1)
    {
      sum += pfoo (&array[i]);
      if (!(i < n))
        break;
      i++;
    }
  return sum;
}

The number of iterations is limited by &array[i]. &array[0x3fffffff] is still
ok, but &array[0x40000000] wraps. So i runs maximally from 0 to 0x3fffffff,
which is 0x40000000 loop iterations. Since &array[i] dominates the single
loop exit, any statement in the loop is executed at most 0x40000000 times.
That's also the amount registered in nb_iterations_upper_bound.
Since int *p has 0x40000000 distinct values, it's ok to replace the
exit test !(i < n) with p == &array[n].


On the other hand, consider tr6:

int tr6 (int array[], unsigned int n)
{
  int sum = 0;
  unsigned int i;
  for (i = 0; i < n; ++i)
      sum += pfoo (&array[i]);
  return sum;
}

The same holds as before, but &array[i] does not dominate the single
loop exit, so any statement in the loop is executed at most 0x40000001 times.
Note that the loop body is executed at most 0x40000000 times, and that only
the exit test is executed at most 0x40000001 times.
That's also the amount registered in nb_iterations_upper_bound.
Since int *p has only 0x40000000 distinct values, it's not ok to replace
the exit test in terms of p.


> Would it be possible to add the handling of these cases
> to estimate_numbers_of_iterations, rather than doing it just for ivopts?

Yes, I did that now.

Thanks,
- Tom

2011-05-05  Tom de Vries  <tom@codesourcery.com>

	PR target/45098
	* tree-ssa-loop-niter.c (infer_loop_bounds_from_pointer_arith): New
	function.
	(infer_loop_bounds_from_undefined): Use new function.
	* tree-ssa-loop-ivopts.c (may_eliminate_iv): Fix
	estimated_loop_iterations comparison.

[-- Attachment #2: 07_pr45098-nowrap-limits-iterations.patch --]
[-- Type: text/x-patch, Size: 3227 bytes --]

Index: gcc/tree-ssa-loop-niter.c
===================================================================
--- gcc/tree-ssa-loop-niter.c (revision 173734)
+++ gcc/tree-ssa-loop-niter.c (working copy)
@@ -2835,6 +2835,54 @@ infer_loop_bounds_from_array (struct loo
    that signed arithmetics in STMT does not overflow.  */
 
 static void
+infer_loop_bounds_from_pointer_arith (struct loop *loop, gimple stmt)
+{
+  tree def, base, step, scev, type, low, high;
+  tree var, ptr;
+
+  if (!is_gimple_assign (stmt)
+      || gimple_assign_rhs_code (stmt) != POINTER_PLUS_EXPR)
+    return;
+
+  def = gimple_assign_lhs (stmt);
+  if (TREE_CODE (def) != SSA_NAME)
+    return;
+
+  type = TREE_TYPE (def);
+  if (!nowrap_type_p (type))
+    return;
+
+  ptr = gimple_assign_rhs1 (stmt);
+  if (!expr_invariant_in_loop_p (loop, ptr))
+    return;
+
+  var = gimple_assign_rhs2 (stmt);
+  if (TYPE_PRECISION (type) != TYPE_PRECISION (TREE_TYPE (var)))
+    return;
+
+  scev = instantiate_parameters (loop, analyze_scalar_evolution (loop, def));
+  if (chrec_contains_undetermined (scev))
+    return;
+
+  base = initial_condition_in_loop_num (scev, loop->num);
+  step = evolution_part_in_loop_num (scev, loop->num);
+
+  if (!base || !step
+      || TREE_CODE (step) != INTEGER_CST
+      || tree_contains_chrecs (base, NULL)
+      || chrec_contains_symbols_defined_in_loop (base, loop->num))
+    return;
+
+  low = lower_bound_in_type (type, type);
+  high = upper_bound_in_type (type, type);
+
+  record_nonwrapping_iv (loop, base, step, stmt, low, high, false, true);
+}
+
+/* Determine information about number of iterations of a LOOP from the fact
+   that signed arithmetics in STMT does not overflow.  */
+
+static void
 infer_loop_bounds_from_signedness (struct loop *loop, gimple stmt)
 {
   tree def, base, step, scev, type, low, high;
@@ -2907,7 +2955,10 @@ infer_loop_bounds_from_undefined (struct
 	  infer_loop_bounds_from_array (loop, stmt, reliable);
 
 	  if (reliable)
-	    infer_loop_bounds_from_signedness (loop, stmt);
+            {
+              infer_loop_bounds_from_signedness (loop, stmt);
+              infer_loop_bounds_from_pointer_arith (loop, stmt);
+            }
   	}
 
     }
Index: gcc/tree-ssa-loop-ivopts.c
===================================================================
--- gcc/tree-ssa-loop-ivopts.c (revision 173734)
+++ gcc/tree-ssa-loop-ivopts.c (working copy)
@@ -4391,8 +4391,13 @@ may_eliminate_iv (struct ivopts_data *da
             {
               if (!estimated_loop_iterations (loop, true, &max_niter))
                 return false;
-              /* The loop bound is already adjusted by adding 1.  */
-              if (double_int_ucmp (max_niter, period_value) > 0)
+              /* The max iterations applies also to the number of times the loop
+                 exit condition is executed.  The number of distinct values of
+                 the cand is period_value + 1.  So, test for
+                 'period_value + 1 >= max_iterations'.
+               */
+              period_value = double_int_add (period_value, double_int_one);
+              if (double_int_ucmp (max_niter, period_value) > 0)
                 return false;
             }
           else

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH PR45098,  9/10] Cheap shift-add.
  2011-05-20 11:32       ` Tom de Vries
@ 2011-05-20 20:09         ` Zdenek Dvorak
  2011-05-21 15:05         ` Eric Botcazou
  1 sibling, 0 replies; 46+ messages in thread
From: Zdenek Dvorak @ 2011-05-20 20:09 UTC (permalink / raw)
  To: Tom de Vries; +Cc: gcc-patches

Hi,

> On 05/18/2011 11:20 PM, Zdenek Dvorak wrote:
> >> +  sa_cost = (TREE_CODE (expr) != MINUS_EXPR
> >> +             ? shiftadd_cost[speed][mode][m]
> >> +             : (mult == op1
> >> +                ? shiftsub1_cost[speed][mode][m]
> >> +                : shiftsub0_cost[speed][mode][m]));
> >> +  res = new_cost (sa_cost, 0);
> >> +  res = add_costs (res, mult == op1 ? cost0 : cost1);
> > 
> > just forgetting the cost of the other operand does not seem correct -- what
> > if it contains some more complicated subexpression?
> > 
> 
> True.  I now added the cost of TREE_OPERAND (mult, 0).

OK,

Zdenek

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH PR45098,  9/10] Cheap shift-add.
  2011-05-20 11:32       ` Tom de Vries
  2011-05-20 20:09         ` Zdenek Dvorak
@ 2011-05-21 15:05         ` Eric Botcazou
  2011-05-22 19:33           ` Tom de Vries
  1 sibling, 1 reply; 46+ messages in thread
From: Eric Botcazou @ 2011-05-21 15:05 UTC (permalink / raw)
  To: Tom de Vries; +Cc: gcc-patches, Zdenek Dvorak

> 2011-05-05  Tom de Vries  <tom@codesourcery.com>
>
> 	PR target/45098
> 	* tree-ssa-loop-ivopts.c: Include expmed.h.
> 	(get_shiftadd_cost): New function.
> 	(force_expr_to_var_cost): Declare forward.  Use get_shiftadd_cost.

This breaks the Ada compiler on x86:

/home/eric/build/gcc/native32/./gcc/xgcc -B/home/eric/build/gcc/native32/./gcc/ -B/home/eric/install/gcc/i586-suse-linux/bin/ -B/home/eric/install/gcc/i586-suse-linux/lib/ -isystem /home/eric/install/gcc/i586-suse-linux/include -isystem /home/eric/install/gcc/i586-suse-linux/sys-include    -c -g -O2  -fPIC  -W -Wall -gnatpg   
a-calend.adb -o a-calend.o
+===========================GNAT BUG DETECTED==============================+
| 4.7.0 20110521 (experimental) [trunk revision 173887] (i586-suse-linux-gnu) 
GCC error:|
| in int_cst_value, at tree.c:9970                                         |
| Error detected around a-calend.adb:1254:7          

To reproduce, do:
  gcc/gnat1 gcc/ada/rts/a-calend.adb -gnatg -O -Igcc/ada/rts
in the build dir.

-- 
Eric Botcazou

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH PR45098,  7/10] Nowrap limits iterations
  2011-05-20 12:22       ` Tom de Vries
@ 2011-05-21 18:54         ` Zdenek Dvorak
  2011-05-21 22:53           ` Tom de Vries
  2011-05-23 14:50           ` H.J. Lu
  0 siblings, 2 replies; 46+ messages in thread
From: Zdenek Dvorak @ 2011-05-21 18:54 UTC (permalink / raw)
  To: Tom de Vries; +Cc: gcc-patches

Hi,

> > Would it be possible to add the handling of these cases
> > to estimate_numbers_of_iterations, rather than doing it just for ivopts?
> 
> Yes, I did that now.
> 
> Thanks,
> - Tom
> 
> 2011-05-05  Tom de Vries  <tom@codesourcery.com>
> 
> 	PR target/45098
> 	* tree-ssa-loop-niter.c (infer_loop_bounds_from_pointer_arith): New
> 	function.
> 	(infer_loop_bounds_from_undefined): Use new function.

this is OK.

> 	* tree-ssa-loop-ivopts.c (may_eliminate_iv): Fix
> 	estimated_loop_iterations comparison.

I don't think this part is correct, though:

> Index: gcc/tree-ssa-loop-ivopts.c
> ===================================================================
> --- gcc/tree-ssa-loop-ivopts.c (revision 173734)
> +++ gcc/tree-ssa-loop-ivopts.c (working copy)
> @@ -4391,8 +4391,13 @@ may_eliminate_iv (struct ivopts_data *da
>              {
>                if (!estimated_loop_iterations (loop, true, &max_niter))
>                  return false;
> -              /* The loop bound is already adjusted by adding 1.  */
> -              if (double_int_ucmp (max_niter, period_value) > 0)
> +              /* The max iterations applies also to the number of times the loop
> +                 exit condition is executed.  The number of distinct values of
> +                 the cand is period_value + 1.  So, test for
> +                 'period_value + 1 >= max_iterations'.
> +               */
> +              period_value = double_int_add (period_value, double_int_one);
> +              if (double_int_ucmp (max_niter, period_value) > 0)
>                  return false;
>              }
>            else

max_niter is the upper bound on the number of iterations of the loop, i.e., the number
of executions of its latch edge.  Therefore, the control induction variable of the loop
will (at the exit statement) achieve at most max_niter + 1 different values.  Conversely,
the number of distinct values that the control iv can represent is period + 1 (naming of
the "period" variable is a bit missleading).  Thus, the correct test is
# of available values >= # of required values, equivalently
period + 1 >= max_niter + 1, equivalently
period >= max_niter, i.e.,
the current test.

Zdenek

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH PR45098,  7/10] Nowrap limits iterations
  2011-05-21 18:54         ` Zdenek Dvorak
@ 2011-05-21 22:53           ` Tom de Vries
  2011-05-28 17:58             ` Tom de Vries
  2011-05-23 14:50           ` H.J. Lu
  1 sibling, 1 reply; 46+ messages in thread
From: Tom de Vries @ 2011-05-21 22:53 UTC (permalink / raw)
  To: Zdenek Dvorak; +Cc: gcc-patches

On 05/21/2011 02:24 PM, Zdenek Dvorak wrote:
>> 	* tree-ssa-loop-ivopts.c (may_eliminate_iv): Fix
>> 	estimated_loop_iterations comparison.
> 
> I don't think this part is correct, though:
> 
>> Index: gcc/tree-ssa-loop-ivopts.c
>> ===================================================================
>> --- gcc/tree-ssa-loop-ivopts.c (revision 173734)
>> +++ gcc/tree-ssa-loop-ivopts.c (working copy)
>> @@ -4391,8 +4391,13 @@ may_eliminate_iv (struct ivopts_data *da
>>              {
>>                if (!estimated_loop_iterations (loop, true, &max_niter))
>>                  return false;
>> -              /* The loop bound is already adjusted by adding 1.  */
>> -              if (double_int_ucmp (max_niter, period_value) > 0)
>> +              /* The max iterations applies also to the number of times the loop
>> +                 exit condition is executed.  The number of distinct values of
>> +                 the cand is period_value + 1.  So, test for
>> +                 'period_value + 1 >= max_iterations'.
>> +               */
>> +              period_value = double_int_add (period_value, double_int_one);
>> +              if (double_int_ucmp (max_niter, period_value) > 0)
>>                  return false;
>>              }
>>            else
> 

> max_niter is the upper bound on the number of iterations of the loop, i.e., the number
> of executions of its latch edge.

max_niter is set from estimated_loop_iterations, meaning from
loop->nb_iterations_upper_bound.

consider:
...
void f(int *a)
{
  int i;

  for (i = 0; i < 10; ++i)
    a[i] = 0;
}
...

at ivopts, it looks like this (compiled with -Os -fno-tree-vrp
-fno-tree-dominator-opts -fno-tree-loop-ivcanon, to get a source-like
representation)
...
f (int * a)
{
  int i;
  int * D.2009;
  unsigned int D.2008;
  unsigned int i.0;

<bb 2>:
  goto <bb 4>;

<bb 3>:
  i.0_3 = (unsigned int) i_1;
  D.2008_4 = i.0_3 * 4;
  D.2009_6 = a_5(D) + D.2008_4;
  *D.2009_6 = 0;
  i_7 = i_1 + 1;

<bb 4>:
  # i_1 = PHI <0(2), i_7(3)>
  if (i_1 <= 9)
    goto <bb 3>;
  else
    goto <bb 5>;

<bb 5>:
  return;

}
...


The header block of the loop is bb 4, the latch block is bb 3:
...
(gdb) p loop.header.index
$4 = 4
(gdb) p loop.latch.index
$5 = 3
...

The number of times the latch edge is executed, is 10.

But loop->nb_iterations_upper_bound, or max_niter is 11:
...
(gdb) p *loop
$1 = {num = 1, ninsns = 0, header = 0xf7dc2440, latch = 0xf7dc2400, lpt_decision
= {decision = LPT_NONE, times = 0}, av_ninsns = 0, num_nodes = 2, superloops =
0xf7db6ee8, inner = 0x0, next = 0x0,
  aux = 0x0, nb_iterations = 0xf7d3d540, nb_iterations_upper_bound = {low = 11,
high = 0}, nb_iterations_estimate = {low = 11, high = 0}, any_upper_bound = 1
'\001', any_estimate = 1 '\001',
  can_be_parallel = 0 '\000', estimate_state = EST_AVAILABLE, bounds =
0xf7d3da2c, exits = 0xf7dc3d70}
...

> Therefore, the control induction variable of the loop
> will (at the exit statement) achieve at most max_niter + 1 different values.

Based on what I observe, I'd say the control induction variable of the loop will
achieve at most max_niter different values.

> Conversely,
> the number of distinct values that the control iv can represent is period + 1 (naming of
> the "period" variable is a bit missleading).

agree.

Thanks,
- Tom

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH PR45098,  9/10] Cheap shift-add.
  2011-05-21 15:05         ` Eric Botcazou
@ 2011-05-22 19:33           ` Tom de Vries
  2011-05-22 20:22             ` Richard Guenther
  2011-05-22 21:11             ` Eric Botcazou
  0 siblings, 2 replies; 46+ messages in thread
From: Tom de Vries @ 2011-05-22 19:33 UTC (permalink / raw)
  To: Eric Botcazou; +Cc: gcc-patches, Zdenek Dvorak

[-- Attachment #1: Type: text/plain, Size: 1467 bytes --]

On 05/21/2011 09:40 AM, Eric Botcazou wrote:
>> 2011-05-05  Tom de Vries  <tom@codesourcery.com>
>>
>> 	PR target/45098
>> 	* tree-ssa-loop-ivopts.c: Include expmed.h.
>> 	(get_shiftadd_cost): New function.
>> 	(force_expr_to_var_cost): Declare forward.  Use get_shiftadd_cost.
> 
> This breaks the Ada compiler on x86:
> 
> /home/eric/build/gcc/native32/./gcc/xgcc -B/home/eric/build/gcc/native32/./gcc/ -B/home/eric/install/gcc/i586-suse-linux/bin/ -B/home/eric/install/gcc/i586-suse-linux/lib/ -isystem /home/eric/install/gcc/i586-suse-linux/include -isystem /home/eric/install/gcc/i586-suse-linux/sys-include    -c -g -O2  -fPIC  -W -Wall -gnatpg   
> a-calend.adb -o a-calend.o
> +===========================GNAT BUG DETECTED==============================+
> | 4.7.0 20110521 (experimental) [trunk revision 173887] (i586-suse-linux-gnu) 
> GCC error:|
> | in int_cst_value, at tree.c:9970                                         |
> | Error detected around a-calend.adb:1254:7          
> 
> To reproduce, do:
>   gcc/gnat1 gcc/ada/rts/a-calend.adb -gnatg -O -Igcc/ada/rts
> in the build dir.
> 

I didn't manage to reproduce the breakage, but I think this patch will fix it.

The patch makes sure cst_and_fits_in_hwi is tested before using int_cst_value.

Regtested on x86_64.

Ok for trunk?

2011-05-22  Tom de Vries  <tom@codesourcery.com>

	PR target/45098
	* tree-ssa-loop-ivopts.c (force_expr_to_var_cost): Fixed const test
	for call to get_shiftadd_cost.

[-- Attachment #2: pr45098-fix.patch --]
[-- Type: text/x-patch, Size: 563 bytes --]

Index: gcc/tree-ssa-loop-ivopts.c
===================================================================
--- gcc/tree-ssa-loop-ivopts.c	(revision 173703)
+++ gcc/tree-ssa-loop-ivopts.c	(working copy)
@@ -3683,7 +3683,7 @@
             mult = op0;
 
           if (mult != NULL_TREE
-              && TREE_CODE (TREE_OPERAND (mult, 1)) == INTEGER_CST
+              && cst_and_fits_in_hwi (TREE_OPERAND (mult, 1))
               && get_shiftadd_cost (expr, mode, cost0, cost1, mult, speed,
                                     &sa_cost))
             return sa_cost;

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH PR45098, 9/10] Cheap shift-add.
  2011-05-22 19:33           ` Tom de Vries
@ 2011-05-22 20:22             ` Richard Guenther
  2011-05-22 21:11             ` Eric Botcazou
  1 sibling, 0 replies; 46+ messages in thread
From: Richard Guenther @ 2011-05-22 20:22 UTC (permalink / raw)
  To: Tom de Vries; +Cc: Eric Botcazou, gcc-patches, Zdenek Dvorak

On Sun, May 22, 2011 at 6:06 PM, Tom de Vries <vries@codesourcery.com> wrote:
> On 05/21/2011 09:40 AM, Eric Botcazou wrote:
>>> 2011-05-05  Tom de Vries  <tom@codesourcery.com>
>>>
>>>      PR target/45098
>>>      * tree-ssa-loop-ivopts.c: Include expmed.h.
>>>      (get_shiftadd_cost): New function.
>>>      (force_expr_to_var_cost): Declare forward.  Use get_shiftadd_cost.
>>
>> This breaks the Ada compiler on x86:
>>
>> /home/eric/build/gcc/native32/./gcc/xgcc -B/home/eric/build/gcc/native32/./gcc/ -B/home/eric/install/gcc/i586-suse-linux/bin/ -B/home/eric/install/gcc/i586-suse-linux/lib/ -isystem /home/eric/install/gcc/i586-suse-linux/include -isystem /home/eric/install/gcc/i586-suse-linux/sys-include    -c -g -O2  -fPIC  -W -Wall -gnatpg
>> a-calend.adb -o a-calend.o
>> +===========================GNAT BUG DETECTED==============================+
>> | 4.7.0 20110521 (experimental) [trunk revision 173887] (i586-suse-linux-gnu)
>> GCC error:|
>> | in int_cst_value, at tree.c:9970                                         |
>> | Error detected around a-calend.adb:1254:7
>>
>> To reproduce, do:
>>   gcc/gnat1 gcc/ada/rts/a-calend.adb -gnatg -O -Igcc/ada/rts
>> in the build dir.
>>
>
> I didn't manage to reproduce the breakage, but I think this patch will fix it.
>
> The patch makes sure cst_and_fits_in_hwi is tested before using int_cst_value.
>
> Regtested on x86_64.
>
> Ok for trunk?

Ok.

Thanks,
Richard.

> 2011-05-22  Tom de Vries  <tom@codesourcery.com>
>
>        PR target/45098
>        * tree-ssa-loop-ivopts.c (force_expr_to_var_cost): Fixed const test
>        for call to get_shiftadd_cost.
>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH PR45098,  9/10] Cheap shift-add.
  2011-05-22 19:33           ` Tom de Vries
  2011-05-22 20:22             ` Richard Guenther
@ 2011-05-22 21:11             ` Eric Botcazou
  1 sibling, 0 replies; 46+ messages in thread
From: Eric Botcazou @ 2011-05-22 21:11 UTC (permalink / raw)
  To: Tom de Vries; +Cc: gcc-patches, Zdenek Dvorak

> I didn't manage to reproduce the breakage, but I think this patch will fix
> it.

You need a 32-bit host.

> The patch makes sure cst_and_fits_in_hwi is tested before using
> int_cst_value.

This should be sufficient indeed, thanks.

> 2011-05-22  Tom de Vries  <tom@codesourcery.com>
>
> 	PR target/45098
> 	* tree-ssa-loop-ivopts.c (force_expr_to_var_cost): Fixed const test
> 	for call to get_shiftadd_cost.

Present tense in the ChangeLog.

-- 
Eric Botcazou

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH PR45098, 7/10] Nowrap limits iterations
  2011-05-21 18:54         ` Zdenek Dvorak
  2011-05-21 22:53           ` Tom de Vries
@ 2011-05-23 14:50           ` H.J. Lu
  1 sibling, 0 replies; 46+ messages in thread
From: H.J. Lu @ 2011-05-23 14:50 UTC (permalink / raw)
  To: Zdenek Dvorak; +Cc: Tom de Vries, gcc-patches

On Sat, May 21, 2011 at 5:24 AM, Zdenek Dvorak <rakdver@kam.mff.cuni.cz> wrote:
> Hi,
>
>> > Would it be possible to add the handling of these cases
>> > to estimate_numbers_of_iterations, rather than doing it just for ivopts?
>>
>> Yes, I did that now.
>>
>> Thanks,
>> - Tom
>>
>> 2011-05-05  Tom de Vries  <tom@codesourcery.com>
>>
>>       PR target/45098
>>       * tree-ssa-loop-niter.c (infer_loop_bounds_from_pointer_arith): New
>>       function.
>>       (infer_loop_bounds_from_undefined): Use new function.
>
> this is OK.
>

This caused:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49124

H.J.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH PR45098,  4/10] Iv init cost.
  2011-05-18 17:46   ` [PATCH PR45098, 4/10] Iv init cost Tom de Vries
  2011-05-18 22:59     ` Zdenek Dvorak
@ 2011-05-25 14:20     ` Richard Sandiford
  2011-05-26 12:24       ` Tom de Vries
  1 sibling, 1 reply; 46+ messages in thread
From: Richard Sandiford @ 2011-05-25 14:20 UTC (permalink / raw)
  To: Tom de Vries; +Cc: Zdenek Dvorak, gcc-patches

Sorry for being so late.  I was just curious...

Tom de Vries <vries@codesourcery.com> writes:
> The init cost of an iv will in general not be zero. It will be
> exceptional that the iv register happens to be initialized with the
> proper value at no cost. In general, there will at the very least be a
> regcopy or a const set.
>
> 2011-05-05  Tom de Vries  <tom@codesourcery.com>
>
> 	PR target/45098
> 	* tree-ssa-loop-ivopts.c (determine_iv_cost): Prevent
> 	cost_base.cost == 0.
> Index: gcc/tree-ssa-loop-ivopts.c
> ===================================================================
> --- gcc/tree-ssa-loop-ivopts.c	(revision 173380)
> +++ gcc/tree-ssa-loop-ivopts.c	(working copy)
> @@ -4688,6 +4688,8 @@ determine_iv_cost (struct ivopts_data *d
>  
>    base = cand->iv->base;
>    cost_base = force_var_cost (data, base, NULL);
> +  if (cost_base.cost == 0)
> +      cost_base.cost = COSTS_N_INSNS (1);
>    cost_step = add_cost (TYPE_MODE (TREE_TYPE (base)), data->speed);
>  
>    cost = cost_step + adjust_setup_cost (data, cost_base.cost);

...why does this reasoning apply only to this call to force_var_cost?

Richard

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH PR45098,  4/10] Iv init cost.
  2011-05-25 14:20     ` Richard Sandiford
@ 2011-05-26 12:24       ` Tom de Vries
  2011-05-31 15:22         ` Richard Sandiford
  0 siblings, 1 reply; 46+ messages in thread
From: Tom de Vries @ 2011-05-26 12:24 UTC (permalink / raw)
  To: richard.sandiford; +Cc: Zdenek Dvorak, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 2329 bytes --]

Hi Richard,

On 05/25/2011 03:44 PM, Richard Sandiford wrote:
> Sorry for being so late.  I was just curious...
> 
> Tom de Vries <vries@codesourcery.com> writes:
>> The init cost of an iv will in general not be zero. It will be
>> exceptional that the iv register happens to be initialized with the
>> proper value at no cost. In general, there will at the very least be a
>> regcopy or a const set.
>>
>> 2011-05-05  Tom de Vries  <tom@codesourcery.com>
>>
>> 	PR target/45098
>> 	* tree-ssa-loop-ivopts.c (determine_iv_cost): Prevent
>> 	cost_base.cost == 0.
>> Index: gcc/tree-ssa-loop-ivopts.c
>> ===================================================================
>> --- gcc/tree-ssa-loop-ivopts.c	(revision 173380)
>> +++ gcc/tree-ssa-loop-ivopts.c	(working copy)
>> @@ -4688,6 +4688,8 @@ determine_iv_cost (struct ivopts_data *d
>>  
>>    base = cand->iv->base;
>>    cost_base = force_var_cost (data, base, NULL);
>> +  if (cost_base.cost == 0)
>> +      cost_base.cost = COSTS_N_INSNS (1);
>>    cost_step = add_cost (TYPE_MODE (TREE_TYPE (base)), data->speed);
>>  
>>    cost = cost_step + adjust_setup_cost (data, cost_base.cost);
> 
> ...why does this reasoning apply only to this call to force_var_cost?
> 
> Richard

force_var_cost is described as estimating the cost of forcing expression expr
into a variable. If expr is already a var, this definition is ambiguous.
If we can use the var directly, the cost is zero, but if we need a regcopy, it
should be the cost of a regcopy.

What is special for an iv, is that we know that it is not only used but also
modified. If a var is used in or after the loop, we need a regcopy to init the
iv with that var. If that var is not used in or after the loop, we can use that
var as iv. The patch above is a heuristic that estimates that the latter
situation is the less frequent one.

In general, we don't have such specific information, and the the cost of zero is
a good choice then.

We could add a parameter to force_var_cost that indicates this choice, that
would perhaps be a better fix.

As for the reasoning related to the const set, that is something that indeed
holds more general, and could be implemented in force_var_cost, which is what
you're suggesting if I understand you correctly.

The tentative patch below explores these last 2 ideas.

Thanks,
-Tom


[-- Attachment #2: pr45098-rewrite.patch --]
[-- Type: text/x-patch, Size: 5498 bytes --]

diff -u gcc/tree-ssa-loop-ivopts.c (working copy) gcc/tree-ssa-loop-ivopts.c (working copy)
--- gcc/tree-ssa-loop-ivopts.c (working copy)
+++ gcc/tree-ssa-loop-ivopts.c (working copy)
@@ -3722,16 +3722,31 @@
    invariants the computation depends on.  */
 
 static comp_cost
-force_var_cost (struct ivopts_data *data,
-		tree expr, bitmap *depends_on)
+force_var_cost (struct ivopts_data *data, tree expr, bool var_costs_regcopy,
+                bitmap *depends_on)
 {
+  comp_cost cost;
+
   if (depends_on)
     {
       fd_ivopts_data = data;
       walk_tree (&expr, find_depends, depends_on, NULL);
     }
 
-  return force_expr_to_var_cost (expr, data->speed);
+  STRIP_NOPS (expr);
+  if (SSA_VAR_P (expr))
+    {
+      cost = zero_cost;
+      if (var_costs_regcopy)
+        cost.cost = COSTS_N_INSNS (1);
+      return cost;
+    }
+
+  cost = force_expr_to_var_cost (expr, data->speed);
+  if (cost.cost == 0)
+    cost.cost = COSTS_N_INSNS (1);
+
+  return cost;
 }
 
 /* Estimates cost of expressing address ADDR  as var + symbol + offset.  The
@@ -3817,7 +3832,8 @@
   aff_combination_scale (&aff_e2, double_int_minus_one);
   aff_combination_add (&aff_e1, &aff_e2);
 
-  return force_var_cost (data, aff_combination_to_tree (&aff_e1), depends_on);
+  return force_var_cost (data, aff_combination_to_tree (&aff_e1), false,
+                         depends_on);
 }
 
 /* Estimates cost of expressing difference E1 - E2 as
@@ -3857,11 +3873,11 @@
   *var_present = true;
 
   if (integer_zerop (e2))
-    return force_var_cost (data, e1, depends_on);
+    return force_var_cost (data, e1, false, depends_on);
 
   if (integer_zerop (e1))
     {
-      comp_cost cost = force_var_cost (data, e2, depends_on);
+      comp_cost cost = force_var_cost (data, e2, false, depends_on);
       cost.cost += multiply_by_cost (-1, mode, data->speed);
       return cost;
     }
@@ -3872,7 +3888,8 @@
   aff_combination_scale (&aff_e2, double_int_minus_one);
   aff_combination_add (&aff_e1, &aff_e2);
 
-  return force_var_cost (data, aff_combination_to_tree (&aff_e1), depends_on);
+  return force_var_cost (data, aff_combination_to_tree (&aff_e1), false,
+                         depends_on);
 }
 
 /* Returns true if AFF1 and AFF2 are identical.  */
@@ -4162,7 +4179,7 @@
     }
   else
     {
-      cost = force_var_cost (data, cbase, depends_on);
+      cost = force_var_cost (data, cbase, false, depends_on);
       cost = add_costs (cost,
 			difference_cost (data,
 					 ubase, build_int_cst (utype, 0),
@@ -4487,22 +4504,18 @@
   return true;
 }
 
- /* Calculates the cost of BOUND, if it is a PARM_DECL.  A PARM_DECL must
-    be copied, if is is used in the loop body and DATA->body_includes_call.  */
+/* Attempts to determine whether a var EXPR can be used in the loop body
+   of DATA->current_loop, or whether a regcopy is needed.  */
 
-static int
-parm_decl_cost (struct ivopts_data *data, tree bound)
+static bool
+use_in_loop_needs_copy (struct ivopts_data *data, tree expr)
 {
-  tree sbound = bound;
-  STRIP_NOPS (sbound);
-
-  if (TREE_CODE (sbound) == SSA_NAME
-      && TREE_CODE (SSA_NAME_VAR (sbound)) == PARM_DECL
-      && gimple_nop_p (SSA_NAME_DEF_STMT (sbound))
-      && data->body_includes_call)
-    return COSTS_N_INSNS (1);
+  STRIP_NOPS (expr);
 
-  return 0;
+  return (TREE_CODE (expr) == SSA_NAME
+          && TREE_CODE (SSA_NAME_VAR (expr)) == PARM_DECL
+          && gimple_nop_p (SSA_NAME_DEF_STMT (expr))
+          && data->body_includes_call);
 }
 
 /* Determines cost of basing replacement of USE on CAND in a condition.  */
@@ -4529,10 +4542,10 @@
   /* Try iv elimination.  */
   if (may_eliminate_iv (data, use, cand, &bound))
     {
-      elim_cost = force_var_cost (data, bound, &depends_on_elim);
-      if (elim_cost.cost == 0)
-        elim_cost.cost = parm_decl_cost (data, bound);
-      else if (TREE_CODE (bound) == INTEGER_CST)
+      elim_cost = force_var_cost (data, bound,
+                                  use_in_loop_needs_copy (data, bound),
+                                  &depends_on_elim);
+      if (TREE_CODE (bound) == INTEGER_CST)
         elim_cost.cost = 0;
       /* If we replace a loop condition 'i < n' with 'p < base + n',
 	 depends_on_elim will have 'base' and 'n' set, which implies
@@ -4577,10 +4590,9 @@
   walk_tree (&cmp_iv->base, find_depends, &depends_on_express, NULL);
 
   /* Count the cost of the original bound as well.  */
-  bound_cost = force_var_cost (data, *bound_cst, NULL);
-  if (bound_cost.cost == 0)
-    bound_cost.cost = parm_decl_cost (data, *bound_cst);
-  else if (TREE_CODE (*bound_cst) == INTEGER_CST)
+  bound_cost = force_var_cost (data, *bound_cst,
+                               use_in_loop_needs_copy (data, *bound_cst), NULL);
+  if (TREE_CODE (*bound_cst) == INTEGER_CST)
     bound_cost.cost = 0;
   express_cost.cost += bound_cost.cost;
 
@@ -4804,12 +4816,10 @@
      that rolls enough, so we take it just very little into account.  */
 
   base = cand->iv->base;
-  cost_base = force_var_cost (data, base, NULL);
   /* It will be exceptional that the iv register happens to be initialized with
      the proper value at no cost.  In general, there will at least be a regcopy
      or a const set.  */
-  if (cost_base.cost == 0)
-    cost_base.cost = COSTS_N_INSNS (1);
+  cost_base = force_var_cost (data, base, true, NULL);
   cost_step = add_cost (TYPE_MODE (TREE_TYPE (base)), data->speed);
 
   cost = cost_step + adjust_setup_cost (data, cost_base.cost);

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH PR45098,  7/10] Nowrap limits iterations
  2011-05-21 22:53           ` Tom de Vries
@ 2011-05-28 17:58             ` Tom de Vries
  2011-05-30 15:12               ` Zdenek Dvorak
  0 siblings, 1 reply; 46+ messages in thread
From: Tom de Vries @ 2011-05-28 17:58 UTC (permalink / raw)
  To: Zdenek Dvorak; +Cc: gcc-patches

Hi Zdenek,

On 05/21/2011 07:59 PM, Tom de Vries wrote:
> On 05/21/2011 02:24 PM, Zdenek Dvorak wrote:
>>> 	* tree-ssa-loop-ivopts.c (may_eliminate_iv): Fix
>>> 	estimated_loop_iterations comparison.
>>
>> I don't think this part is correct, though:
>>
>>> Index: gcc/tree-ssa-loop-ivopts.c
>>> ===================================================================
>>> --- gcc/tree-ssa-loop-ivopts.c (revision 173734)
>>> +++ gcc/tree-ssa-loop-ivopts.c (working copy)
>>> @@ -4391,8 +4391,13 @@ may_eliminate_iv (struct ivopts_data *da
>>>              {
>>>                if (!estimated_loop_iterations (loop, true, &max_niter))
>>>                  return false;
>>> -              /* The loop bound is already adjusted by adding 1.  */
>>> -              if (double_int_ucmp (max_niter, period_value) > 0)
>>> +              /* The max iterations applies also to the number of times the loop
>>> +                 exit condition is executed.  The number of distinct values of
>>> +                 the cand is period_value + 1.  So, test for
>>> +                 'period_value + 1 >= max_iterations'.
>>> +               */
>>> +              period_value = double_int_add (period_value, double_int_one);
>>> +              if (double_int_ucmp (max_niter, period_value) > 0)
>>>                  return false;
>>>              }
>>>            else
>>
> 
>> max_niter is the upper bound on the number of iterations of the loop, i.e., the number
>> of executions of its latch edge.
> 
> max_niter is set from estimated_loop_iterations, meaning from
> loop->nb_iterations_upper_bound.
> 
> consider:
> ...
> void f(int *a)
> {
>   int i;
> 
>   for (i = 0; i < 10; ++i)
>     a[i] = 0;
> }
> ...
> 
> at ivopts, it looks like this (compiled with -Os -fno-tree-vrp
> -fno-tree-dominator-opts -fno-tree-loop-ivcanon, to get a source-like
> representation)
> ...
> f (int * a)
> {
>   int i;
>   int * D.2009;
>   unsigned int D.2008;
>   unsigned int i.0;
> 
> <bb 2>:
>   goto <bb 4>;
> 
> <bb 3>:
>   i.0_3 = (unsigned int) i_1;
>   D.2008_4 = i.0_3 * 4;
>   D.2009_6 = a_5(D) + D.2008_4;
>   *D.2009_6 = 0;
>   i_7 = i_1 + 1;
> 
> <bb 4>:
>   # i_1 = PHI <0(2), i_7(3)>
>   if (i_1 <= 9)
>     goto <bb 3>;
>   else
>     goto <bb 5>;
> 
> <bb 5>:
>   return;
> 
> }
> ...
> 
> 
> The header block of the loop is bb 4, the latch block is bb 3:
> ...
> (gdb) p loop.header.index
> $4 = 4
> (gdb) p loop.latch.index
> $5 = 3
> ...
> 
> The number of times the latch edge is executed, is 10.
> 
> But loop->nb_iterations_upper_bound, or max_niter is 11:
> ...
> (gdb) p *loop
> $1 = {num = 1, ninsns = 0, header = 0xf7dc2440, latch = 0xf7dc2400, lpt_decision
> = {decision = LPT_NONE, times = 0}, av_ninsns = 0, num_nodes = 2, superloops =
> 0xf7db6ee8, inner = 0x0, next = 0x0,
>   aux = 0x0, nb_iterations = 0xf7d3d540, nb_iterations_upper_bound = {low = 11,
> high = 0}, nb_iterations_estimate = {low = 11, high = 0}, any_upper_bound = 1
> '\001', any_estimate = 1 '\001',
>   can_be_parallel = 0 '\000', estimate_state = EST_AVAILABLE, bounds =
> 0xf7d3da2c, exits = 0xf7dc3d70}
> ...
> 
>> Therefore, the control induction variable of the loop
>> will (at the exit statement) achieve at most max_niter + 1 different values.
> 
> Based on what I observe, I'd say the control induction variable of the loop will
> achieve at most max_niter different values.
> 

Any thoughts on my observations above?

Thanks,
- Tom

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH PR45098,  7/10] Nowrap limits iterations
  2011-05-28 17:58             ` Tom de Vries
@ 2011-05-30 15:12               ` Zdenek Dvorak
  2011-05-31  9:07                 ` Tom de Vries
  0 siblings, 1 reply; 46+ messages in thread
From: Zdenek Dvorak @ 2011-05-30 15:12 UTC (permalink / raw)
  To: Tom de Vries; +Cc: gcc-patches

Hi,

> > The header block of the loop is bb 4, the latch block is bb 3:
> > ...
> > (gdb) p loop.header.index
> > $4 = 4
> > (gdb) p loop.latch.index
> > $5 = 3
> > ...
> > 
> > The number of times the latch edge is executed, is 10.
> > 
> > But loop->nb_iterations_upper_bound, or max_niter is 11:

this is a bit strange, it looks like the # of iterations estimation is setting
nb_iterations_upper_bound too conservatively (or I gave nb_iterations_upper_bound
a different semantics than I remember -- but both my memory and the comment in cfgloop.h
suggest that nb_iterations_upper_bound >= nb_iterations, i.e., that it should be 10 in your
example),

Zdenek

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH PR45098,  7/10] Nowrap limits iterations
  2011-05-30 15:12               ` Zdenek Dvorak
@ 2011-05-31  9:07                 ` Tom de Vries
  2011-05-31  9:11                   ` Zdenek Dvorak
  0 siblings, 1 reply; 46+ messages in thread
From: Tom de Vries @ 2011-05-31  9:07 UTC (permalink / raw)
  To: Zdenek Dvorak; +Cc: gcc-patches

On 05/30/2011 02:38 PM, Zdenek Dvorak wrote:
> Hi,
> 
>>> The header block of the loop is bb 4, the latch block is bb 3:
>>> ...
>>> (gdb) p loop.header.index
>>> $4 = 4
>>> (gdb) p loop.latch.index
>>> $5 = 3
>>> ...
>>>
>>> The number of times the latch edge is executed, is 10.
>>>
>>> But loop->nb_iterations_upper_bound, or max_niter is 11:
> 
> this is a bit strange, it looks like the # of iterations estimation is setting
> nb_iterations_upper_bound too conservatively (or I gave nb_iterations_upper_bound
> a different semantics than I remember -- but both my memory and the comment in cfgloop.h
> suggest that nb_iterations_upper_bound >= nb_iterations, i.e., that it should be 10 in your
> example),
> 

The actual values of nb_iterations_upper_bound are determined by this code
fragment in record_estimate.

/* Records that AT_STMT is executed at most BOUND + 1 times in LOOP.  IS_EXIT
   is true if the loop is exited immediately after STMT, and this exit
   is taken at last when the STMT is executed BOUND + 1 times.
   REALISTIC is true if BOUND is expected to be close to the real number
   of iterations.  UPPER is true if we are sure the loop iterates at most
   BOUND times.  I_BOUND is an unsigned double_int upper estimate on BOUND.  */

static void
record_estimate (struct loop *loop, tree bound, double_int i_bound,
		 gimple at_stmt, bool is_exit, bool realistic, bool upper)
{
  ...

  /* Update the number of iteration estimates according to the bound.
     If at_stmt is an exit, then every statement in the loop is
     executed at most BOUND + 1 times.  If it is not an exit, then
     some of the statements before it could be executed BOUND + 2
     times, if an exit of LOOP is before stmt.  */
  exit = single_exit (loop);
  if (is_exit
      || (exit != NULL
	  && dominated_by_p (CDI_DOMINATORS,
			     exit->src, gimple_bb (at_stmt))))
    delta = double_int_one;
  else
    delta = double_int_two;
  i_bound = double_int_add (i_bound, delta);


As far as I can tell, what is current calculated in i_bound (and assigned to
nb_iterations_upper_bound), is the maximum amount of times any statement in the
loop is executed, where any includes exit tests. Differently put, the maximum
amount of times the loop header is executed.

This is confirmed by this comment in tree-vrp.c:

  /* Try to use estimated number of iterations for the loop to constrain the
     final value in the evolution.
     We are interested in the number of executions of the latch, while
     nb_iterations_upper_bound includes the last execution of the exit test.  */

I modified the patch to improved the comment.

Ok for trunk?

Thanks,
- Tom

Index: gcc/tree-ssa-loop-ivopts.c
===================================================================
--- gcc/tree-ssa-loop-ivopts.c (revision 173734)
+++ gcc/tree-ssa-loop-ivopts.c (working copy)
@@ -4391,8 +4391,14 @@ may_eliminate_iv (struct ivopts_data *da
             {
               if (!estimated_loop_iterations (loop, true, &max_niter))
                 return false;
-              /* The loop bound is already adjusted by adding 1.  */
-              if (double_int_ucmp (max_niter, period_value) > 0)
+              /* (a) loop->nb_iterations_upper_bound (assigned to max_niter)
+                     includes the last execution of the exit test.
+                 (b) The number of distinct values of the cand is
+                     period_value + 1.
+                 So, the transformation is allowed if
+                 max_niter <= period_value + 1.  */
+              period_value = double_int_add (period_value, double_int_one);
+              if (double_int_ucmp (max_niter, period_value) > 0)
                 return false;
             }
           else

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH PR45098,  7/10] Nowrap limits iterations
  2011-05-31  9:07                 ` Tom de Vries
@ 2011-05-31  9:11                   ` Zdenek Dvorak
  2011-06-11 10:13                     ` Tom de Vries
  0 siblings, 1 reply; 46+ messages in thread
From: Zdenek Dvorak @ 2011-05-31  9:11 UTC (permalink / raw)
  To: Tom de Vries; +Cc: gcc-patches

Hi,

> As far as I can tell, what is current calculated in i_bound (and assigned to
> nb_iterations_upper_bound), is the maximum amount of times any statement in the
> loop is executed, where any includes exit tests. Differently put, the maximum
> amount of times the loop header is executed.

hmm... this is rather confusing, I don't really recall why I gave
nb_iterations_upper_bound a different semantics from any other instance
of what # of iterations of a loop means.  

> This is confirmed by this comment in tree-vrp.c:
> 
>   /* Try to use estimated number of iterations for the loop to constrain the
>      final value in the evolution.
>      We are interested in the number of executions of the latch, while
>      nb_iterations_upper_bound includes the last execution of the exit test.  */
> 
> I modified the patch to improved the comment.

I think a better fix would be to make the nb_iterations_upper_bound semantics
consistent with that of nb_iterations.  Let me try to do it, hopefully this should
be mostly mechanical,

Zdenek

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH PR45098,  4/10] Iv init cost.
  2011-05-26 12:24       ` Tom de Vries
@ 2011-05-31 15:22         ` Richard Sandiford
  0 siblings, 0 replies; 46+ messages in thread
From: Richard Sandiford @ 2011-05-31 15:22 UTC (permalink / raw)
  To: Tom de Vries; +Cc: Zdenek Dvorak, gcc-patches

Hi Tom,

Thanks for the reply, and sorry for responding so slowly.

Tom de Vries <vries@codesourcery.com> writes:
> On 05/25/2011 03:44 PM, Richard Sandiford wrote:
>> Sorry for being so late.  I was just curious...
>> 
>> Tom de Vries <vries@codesourcery.com> writes:
>>> The init cost of an iv will in general not be zero. It will be
>>> exceptional that the iv register happens to be initialized with the
>>> proper value at no cost. In general, there will at the very least be a
>>> regcopy or a const set.
>>>
>>> 2011-05-05  Tom de Vries  <tom@codesourcery.com>
>>>
>>> 	PR target/45098
>>> 	* tree-ssa-loop-ivopts.c (determine_iv_cost): Prevent
>>> 	cost_base.cost == 0.
>>> Index: gcc/tree-ssa-loop-ivopts.c
>>> ===================================================================
>>> --- gcc/tree-ssa-loop-ivopts.c	(revision 173380)
>>> +++ gcc/tree-ssa-loop-ivopts.c	(working copy)
>>> @@ -4688,6 +4688,8 @@ determine_iv_cost (struct ivopts_data *d
>>>  
>>>    base = cand->iv->base;
>>>    cost_base = force_var_cost (data, base, NULL);
>>> +  if (cost_base.cost == 0)
>>> +      cost_base.cost = COSTS_N_INSNS (1);
>>>    cost_step = add_cost (TYPE_MODE (TREE_TYPE (base)), data->speed);
>>>  
>>>    cost = cost_step + adjust_setup_cost (data, cost_base.cost);
>> 
>> ...why does this reasoning apply only to this call to force_var_cost?
>> 
>> Richard
>
> force_var_cost is described as estimating the cost of forcing expression expr
> into a variable. If expr is already a var, this definition is ambiguous.
> If we can use the var directly, the cost is zero, but if we need a regcopy, it
> should be the cost of a regcopy.
>
> What is special for an iv, is that we know that it is not only used but also
> modified. If a var is used in or after the loop, we need a regcopy to init the
> iv with that var. If that var is not used in or after the loop, we can use that
> var as iv. The patch above is a heuristic that estimates that the latter
> situation is the less frequent one.
>
> In general, we don't have such specific information, and the the cost of zero is
> a good choice then.
>
> We could add a parameter to force_var_cost that indicates this choice, that
> would perhaps be a better fix.
>
> As for the reasoning related to the const set, that is something that indeed
> holds more general, and could be implemented in force_var_cost, which is what
> you're suggesting if I understand you correctly.

It was actually a genuine question.  I honestly wasn't sure whether
(and why) this was the only site at which a reg copy should be counted.
However...

> The tentative patch below explores these last 2 ideas.

...this makes things _much_ clearer to me FWIW.  Thanks.

Richard

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH PR45098,  7/10] Nowrap limits iterations
  2011-05-31  9:11                   ` Zdenek Dvorak
@ 2011-06-11 10:13                     ` Tom de Vries
  2011-06-12  1:17                       ` Zdenek Dvorak
  0 siblings, 1 reply; 46+ messages in thread
From: Tom de Vries @ 2011-06-11 10:13 UTC (permalink / raw)
  To: Zdenek Dvorak; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 2949 bytes --]

Hi Zdenek,

On 05/31/2011 10:04 AM, Zdenek Dvorak wrote:
> Hi,
> 
>> As far as I can tell, what is current calculated in i_bound (and assigned to
>> nb_iterations_upper_bound), is the maximum amount of times any statement in the
>> loop is executed, where any includes exit tests. Differently put, the maximum
>> amount of times the loop header is executed.
> 
> hmm... this is rather confusing, I don't really recall why I gave
> nb_iterations_upper_bound a different semantics from any other instance
> of what # of iterations of a loop means.  
> 
>> This is confirmed by this comment in tree-vrp.c:
>>
>>   /* Try to use estimated number of iterations for the loop to constrain the
>>      final value in the evolution.
>>      We are interested in the number of executions of the latch, while
>>      nb_iterations_upper_bound includes the last execution of the exit test.  */
>>
>> I modified the patch to improved the comment.
> 
> I think a better fix would be to make the nb_iterations_upper_bound semantics
> consistent with that of nb_iterations.  Let me try to do it, hopefully this should
> be mostly mechanical,
> 

This patch changes the semantics of nb_iterations_upper_bound and
nb_iterations_estimate, to mean: the amount of latch executions.

That change is countered at all use sites, except for
tree-ssa-loop-ivopts.c:may_eliminate_iv.

Passed x86_64 bootstrapping and reg-testing.

OK for trunk?

2011-06-10  Zdenek Dvorak  <ook@ucw.cz>
	    Tom de Vries  <tom@codesourcery.com>

	PR target/45098
	* cfgloop.h (nb_iterations_upper_bound, nb_iterations_estimate):
	Document changed semantics.
	(max_stmt_executions, max_stmt_executions_int): Declare.
	* tree-data-ref.c (estimated_loop_iterations)
	(estimated_loop_iterations_int): Move functions...
	* tree-ssa-loop-niter.c (estimated_loop_iterations)
	(estimated_loop_iterations_int): here.
	(record_estimate): Change nb_iterations_upper_bound and
	nb_iterations_estimate semantics.
	(max_stmt_executions, max_stmt_executions_int): New function.
	* tree-data-ref.c (estimated_loop_iterations_tree): Rename to ...
	(max_stmt_executions_tree): this.
	(analyze_miv_subscript): Use max_stmt_executions_tree instead of
	estimated_loop_iterations_tree.
	tree-ssa-loop-ivopts.c (avg_loop_niter): Use
	max_stmt_executions_int instead of estimated_loop_iterations_int.
	* predict.c (predict_loops): Idem.
	* tree-parloops.c (parallelize_loops): Idem.
	* tree-data-ref.c (analyze_siv_subscript_cst_affine)
	(compute_overlap_steps_for_affine_1_2, analyze_subscript_affine_affine)
	(init_omega_for_ddr_1): Idem.
	* tree-ssa-loop-prefetch.c (determine_loop_nest_reuse)
	(loop_prefetch_arrays): Idem
	* graphite-sese-to-poly.c (build_loop_iteration_domains): Use
	max_stmt_executions instead of estimated_loop_iterations.
	* tree-data-ref.c (estimated_loop_iterations_tree): Idem.
	* tree-vrp.c (adjust_range_with_scev): Use estimated_loop_iterations
	instead of nb_iterations_upper_bound.

[-- Attachment #2: pr45098-nb_iterations_upper_bound.7.patch --]
[-- Type: text/x-patch, Size: 17283 bytes --]

Index: gcc/tree-vrp.c
===================================================================
--- gcc/tree-vrp.c	(revision 174810)
+++ gcc/tree-vrp.c	(working copy)
@@ -3403,44 +3403,42 @@ adjust_range_with_scev (value_range_t *v
     tmax = TYPE_MAX_VALUE (type);
 
   /* Try to use estimated number of iterations for the loop to constrain the
-     final value in the evolution.
-     We are interested in the number of executions of the latch, while
-     nb_iterations_upper_bound includes the last execution of the exit test.  */
+     final value in the evolution.  */
   if (TREE_CODE (step) == INTEGER_CST
-      && loop->any_upper_bound
-      && !double_int_zero_p (loop->nb_iterations_upper_bound)
       && is_gimple_val (init)
       && (TREE_CODE (init) != SSA_NAME
 	  || get_value_range (init)->type == VR_RANGE))
     {
-      value_range_t maxvr = { VR_UNDEFINED, NULL_TREE, NULL_TREE, NULL };
-      double_int dtmp;
-      bool unsigned_p = TYPE_UNSIGNED (TREE_TYPE (step));
-      int overflow = 0;
+      double_int nit;
 
-      dtmp = double_int_mul_with_sign (tree_to_double_int (step),
-                                       double_int_sub (
-                                           loop->nb_iterations_upper_bound,
-                                           double_int_one),
-                                       unsigned_p, &overflow);
-      /* If the multiplication overflowed we can't do a meaningful
-	 adjustment.  Likewise if the result doesn't fit in the type
-	 of the induction variable.  For a signed type we have to
-	 check whether the result has the expected signedness which
-	 is that of the step as nb_iterations_upper_bound is unsigned.  */
-      if (!overflow
-	  && double_int_fits_to_tree_p (TREE_TYPE (init), dtmp)
-	  && (unsigned_p
-	      || ((dtmp.high ^ TREE_INT_CST_HIGH (step)) >= 0)))
+      if (estimated_loop_iterations (loop, true, &nit))
 	{
-	  tem = double_int_to_tree (TREE_TYPE (init), dtmp);
-	  extract_range_from_binary_expr (&maxvr, PLUS_EXPR,
-					  TREE_TYPE (init), init, tem);
-	  /* Likewise if the addition did.  */
-	  if (maxvr.type == VR_RANGE)
+	  value_range_t maxvr = { VR_UNDEFINED, NULL_TREE, NULL_TREE, NULL };
+	  double_int dtmp;
+	  bool unsigned_p = TYPE_UNSIGNED (TREE_TYPE (step));
+	  int overflow = 0;
+
+	  dtmp = double_int_mul_with_sign (tree_to_double_int (step), nit,
+					   unsigned_p, &overflow);
+	  /* If the multiplication overflowed we can't do a meaningful
+	     adjustment.  Likewise if the result doesn't fit in the type
+	     of the induction variable.  For a signed type we have to
+	     check whether the result has the expected signedness which
+	     is that of the step as number of iterations is unsigned.  */
+	  if (!overflow
+	      && double_int_fits_to_tree_p (TREE_TYPE (init), dtmp)
+	      && (unsigned_p
+		  || ((dtmp.high ^ TREE_INT_CST_HIGH (step)) >= 0)))
 	    {
-	      tmin = maxvr.min;
-	      tmax = maxvr.max;
+	      tem = double_int_to_tree (TREE_TYPE (init), dtmp);
+	      extract_range_from_binary_expr (&maxvr, PLUS_EXPR,
+					      TREE_TYPE (init), init, tem);
+	      /* Likewise if the addition did.  */
+	      if (maxvr.type == VR_RANGE)
+		{
+		  tmin = maxvr.min;
+		  tmax = maxvr.max;
+		}
 	    }
 	}
     }
Index: gcc/tree-ssa-loop-niter.c
===================================================================
--- gcc/tree-ssa-loop-niter.c	(revision 174810)
+++ gcc/tree-ssa-loop-niter.c	(working copy)
@@ -2568,18 +2568,17 @@ record_estimate (struct loop *loop, tree
     }
 
   /* Update the number of iteration estimates according to the bound.
-     If at_stmt is an exit, then every statement in the loop is
-     executed at most BOUND + 1 times.  If it is not an exit, then
-     some of the statements before it could be executed BOUND + 2
-     times, if an exit of LOOP is before stmt.  */
+     If at_stmt is an exit or dominates the single exit from the loop,
+     then the loop latch is executed at most BOUND times, otherwise
+     it can be executed BOUND + 1 times.  */
   exit = single_exit (loop);
   if (is_exit
       || (exit != NULL
 	  && dominated_by_p (CDI_DOMINATORS,
 			     exit->src, gimple_bb (at_stmt))))
-    delta = double_int_one;
+    delta = double_int_zero;
   else
-    delta = double_int_two;
+    delta = double_int_one;
   i_bound = double_int_add (i_bound, delta);
 
   /* If an overflow occurred, ignore the result.  */
@@ -3042,6 +3041,93 @@ estimate_numbers_of_iterations_loop (str
     loop->nb_iterations_estimate = loop->nb_iterations_upper_bound;
 }
 
+/* Sets NIT to the estimated number of executions of the latch of the
+   LOOP.  If CONSERVATIVE is true, we must be sure that NIT is at least as
+   large as the number of iterations.  If we have no reliable estimate,
+   the function returns false, otherwise returns true.  */
+
+bool
+estimated_loop_iterations (struct loop *loop, bool conservative,
+			   double_int *nit)
+{
+  estimate_numbers_of_iterations_loop (loop, true);
+  if (conservative)
+    {
+      if (!loop->any_upper_bound)
+	return false;
+
+      *nit = loop->nb_iterations_upper_bound;
+    }
+  else
+    {
+      if (!loop->any_estimate)
+	return false;
+
+      *nit = loop->nb_iterations_estimate;
+    }
+
+  return true;
+}
+
+/* Similar to estimated_loop_iterations, but returns the estimate only
+   if it fits to HOST_WIDE_INT.  If this is not the case, or the estimate
+   on the number of iterations of LOOP could not be derived, returns -1.  */
+
+HOST_WIDE_INT
+estimated_loop_iterations_int (struct loop *loop, bool conservative)
+{
+  double_int nit;
+  HOST_WIDE_INT hwi_nit;
+
+  if (!estimated_loop_iterations (loop, conservative, &nit))
+    return -1;
+
+  if (!double_int_fits_in_shwi_p (nit))
+    return -1;
+  hwi_nit = double_int_to_shwi (nit);
+
+  return hwi_nit < 0 ? -1 : hwi_nit;
+}
+
+/* Returns an upper bound on the number of executions of statements
+   in the LOOP.  For statements before the loop exit, this exceeds
+   the number of execution of the latch by one.  */
+
+HOST_WIDE_INT
+max_stmt_executions_int (struct loop *loop, bool conservative)
+{
+  HOST_WIDE_INT nit = estimated_loop_iterations_int (loop, conservative);
+  HOST_WIDE_INT snit;
+
+  if (nit == -1)
+    return -1;
+
+  snit = (HOST_WIDE_INT) ((unsigned HOST_WIDE_INT) nit + 1);
+
+  /* If the computation overflows, return -1.  */
+  return snit < 0 ? -1 : snit;
+}
+
+/* Sets NIT to the estimated number of executions of the latch of the
+   LOOP, plus one.  If CONSERVATIVE is true, we must be sure that NIT is at
+   least as large as the number of iterations.  If we have no reliable
+   estimate, the function returns false, otherwise returns true.  */
+
+bool
+max_stmt_executions (struct loop *loop, bool conservative, double_int *nit)
+{
+  double_int nit_minus_one;
+
+  if (!estimated_loop_iterations (loop, conservative, nit))
+    return false;
+
+  nit_minus_one = *nit;
+
+  *nit = double_int_add (*nit, double_int_one);
+
+  return double_int_ucmp (*nit, nit_minus_one) > 0;
+}
+
 /* Records estimates on numbers of iterations of loops.  */
 
 void
Index: gcc/tree-ssa-loop-ivopts.c
===================================================================
--- gcc/tree-ssa-loop-ivopts.c	(revision 174810)
+++ gcc/tree-ssa-loop-ivopts.c	(working copy)
@@ -115,7 +115,7 @@ along with GCC; see the file COPYING3.
 static inline HOST_WIDE_INT
 avg_loop_niter (struct loop *loop)
 {
-  HOST_WIDE_INT niter = estimated_loop_iterations_int (loop, false);
+  HOST_WIDE_INT niter = max_stmt_executions_int (loop, false);
   if (niter == -1)
     return AVG_LOOP_NITER (loop);
 
Index: gcc/predict.c
===================================================================
--- gcc/predict.c	(revision 174810)
+++ gcc/predict.c	(working copy)
@@ -994,7 +994,7 @@ predict_loops (void)
 	     the loop, use it to predict this exit.  */
 	  else if (n_exits == 1)
 	    {
-	      nitercst = estimated_loop_iterations_int (loop, false);
+	      nitercst = max_stmt_executions_int (loop, false);
 	      if (nitercst < 0)
 		continue;
 	      if (nitercst > max)
Index: gcc/tree-parloops.c
===================================================================
--- gcc/tree-parloops.c	(revision 174810)
+++ gcc/tree-parloops.c	(working copy)
@@ -2134,7 +2134,7 @@ parallelize_loops (void)
 	  /* FIXME: the check for vector phi nodes could be removed.  */
 	  || loop_has_vector_phi_nodes (loop))
 	continue;
-      estimated = estimated_loop_iterations_int (loop, false);
+      estimated = max_stmt_executions_int (loop, false);
       /* FIXME: Bypass this check as graphite doesn't update the
       count and frequency correctly now.  */
       if (!flag_loop_parallelize_all
Index: gcc/tree-data-ref.c
===================================================================
--- gcc/tree-data-ref.c	(revision 174810)
+++ gcc/tree-data-ref.c	(working copy)
@@ -1621,66 +1621,18 @@ analyze_ziv_subscript (tree chrec_a,
     fprintf (dump_file, ")\n");
 }
 
-/* Sets NIT to the estimated number of executions of the statements in
-   LOOP.  If CONSERVATIVE is true, we must be sure that NIT is at least as
-   large as the number of iterations.  If we have no reliable estimate,
-   the function returns false, otherwise returns true.  */
-
-bool
-estimated_loop_iterations (struct loop *loop, bool conservative,
-			   double_int *nit)
-{
-  estimate_numbers_of_iterations_loop (loop, true);
-  if (conservative)
-    {
-      if (!loop->any_upper_bound)
-	return false;
-
-      *nit = loop->nb_iterations_upper_bound;
-    }
-  else
-    {
-      if (!loop->any_estimate)
-	return false;
-
-      *nit = loop->nb_iterations_estimate;
-    }
-
-  return true;
-}
-
-/* Similar to estimated_loop_iterations, but returns the estimate only
-   if it fits to HOST_WIDE_INT.  If this is not the case, or the estimate
-   on the number of iterations of LOOP could not be derived, returns -1.  */
-
-HOST_WIDE_INT
-estimated_loop_iterations_int (struct loop *loop, bool conservative)
-{
-  double_int nit;
-  HOST_WIDE_INT hwi_nit;
-
-  if (!estimated_loop_iterations (loop, conservative, &nit))
-    return -1;
-
-  if (!double_int_fits_in_shwi_p (nit))
-    return -1;
-  hwi_nit = double_int_to_shwi (nit);
-
-  return hwi_nit < 0 ? -1 : hwi_nit;
-}
-
-/* Similar to estimated_loop_iterations, but returns the estimate as a tree,
+/* Similar to max_stmt_executions_int, but returns the bound as a tree,
    and only if it fits to the int type.  If this is not the case, or the
-   estimate on the number of iterations of LOOP could not be derived, returns
+   bound  on the number of iterations of LOOP could not be derived, returns
    chrec_dont_know.  */
 
 static tree
-estimated_loop_iterations_tree (struct loop *loop, bool conservative)
+max_stmt_executions_tree (struct loop *loop)
 {
   double_int nit;
   tree type;
 
-  if (!estimated_loop_iterations (loop, conservative, &nit))
+  if (!max_stmt_executions (loop, true, &nit))
     return chrec_dont_know;
 
   type = lang_hooks.types.type_for_size (INT_TYPE_SIZE, true);
@@ -1763,7 +1715,7 @@ analyze_siv_subscript_cst_affine (tree c
 
 		      /* Perform weak-zero siv test to see if overlap is
 			 outside the loop bounds.  */
-		      numiter = estimated_loop_iterations_int (loop, false);
+		      numiter = max_stmt_executions_int (loop, true);
 
 		      if (numiter >= 0
 			  && compare_tree_int (tmp, numiter) > 0)
@@ -1841,7 +1793,7 @@ analyze_siv_subscript_cst_affine (tree c
 
 		      /* Perform weak-zero siv test to see if overlap is
 			 outside the loop bounds.  */
-		      numiter = estimated_loop_iterations_int (loop, false);
+		      numiter = max_stmt_executions_int (loop, true);
 
 		      if (numiter >= 0
 			  && compare_tree_int (tmp, numiter) > 0)
@@ -2022,10 +1974,9 @@ compute_overlap_steps_for_affine_1_2 (tr
   step_z = int_cst_value (CHREC_RIGHT (chrec_b));
 
   niter_x =
-    estimated_loop_iterations_int (get_chrec_loop (CHREC_LEFT (chrec_a)),
-				   false);
-  niter_y = estimated_loop_iterations_int (get_chrec_loop (chrec_a), false);
-  niter_z = estimated_loop_iterations_int (get_chrec_loop (chrec_b), false);
+    max_stmt_executions_int (get_chrec_loop (CHREC_LEFT (chrec_a)), true);
+  niter_y = max_stmt_executions_int (get_chrec_loop (chrec_a), true);
+  niter_z = max_stmt_executions_int (get_chrec_loop (chrec_b), true);
 
   if (niter_x < 0 || niter_y < 0 || niter_z < 0)
     {
@@ -2350,10 +2301,8 @@ analyze_subscript_affine_affine (tree ch
 	  HOST_WIDE_INT niter, niter_a, niter_b;
 	  affine_fn ova, ovb;
 
-	  niter_a = estimated_loop_iterations_int (get_chrec_loop (chrec_a),
-						   false);
-	  niter_b = estimated_loop_iterations_int (get_chrec_loop (chrec_b),
-						   false);
+	  niter_a = max_stmt_executions_int (get_chrec_loop (chrec_a), true);
+	  niter_b = max_stmt_executions_int (get_chrec_loop (chrec_b), true);
 	  niter = MIN (niter_a, niter_b);
 	  step_a = int_cst_value (CHREC_RIGHT (chrec_a));
 	  step_b = int_cst_value (CHREC_RIGHT (chrec_b));
@@ -2460,10 +2409,10 @@ analyze_subscript_affine_affine (tree ch
 
 	  if (i1 > 0 && j1 > 0)
 	    {
-	      HOST_WIDE_INT niter_a = estimated_loop_iterations_int
-		(get_chrec_loop (chrec_a), false);
-	      HOST_WIDE_INT niter_b = estimated_loop_iterations_int
-		(get_chrec_loop (chrec_b), false);
+	      HOST_WIDE_INT niter_a = max_stmt_executions_int
+		(get_chrec_loop (chrec_a), true);
+	      HOST_WIDE_INT niter_b = max_stmt_executions_int
+		(get_chrec_loop (chrec_b), true);
 	      HOST_WIDE_INT niter = MIN (niter_a, niter_b);
 
 	      /* (X0, Y0) is a solution of the Diophantine equation:
@@ -2740,8 +2689,7 @@ analyze_miv_subscript (tree chrec_a,
 	 in the same order.  */
       *overlaps_a = conflict_fn (1, affine_fn_cst (integer_zero_node));
       *overlaps_b = conflict_fn (1, affine_fn_cst (integer_zero_node));
-      *last_conflicts = estimated_loop_iterations_tree
-				(get_chrec_loop (chrec_a), true);
+      *last_conflicts = max_stmt_executions_tree (get_chrec_loop (chrec_a));
       dependence_stats.num_miv_dependent++;
     }
 
@@ -3754,7 +3702,7 @@ init_omega_for_ddr_1 (struct data_refere
   for (i = 0; i <= DDR_INNER_LOOP (ddr)
 	 && VEC_iterate (loop_p, DDR_LOOP_NEST (ddr), i, loopi); i++)
     {
-      HOST_WIDE_INT nbi = estimated_loop_iterations_int (loopi, false);
+      HOST_WIDE_INT nbi = max_stmt_executions_int (loopi, true);
 
       /* 0 <= loop_x */
       ineq = omega_add_zero_geq (pb, omega_black);
Index: gcc/tree-ssa-loop-prefetch.c
===================================================================
--- gcc/tree-ssa-loop-prefetch.c	(revision 174810)
+++ gcc/tree-ssa-loop-prefetch.c	(working copy)
@@ -1549,7 +1549,7 @@ determine_loop_nest_reuse (struct loop *
 	continue;
 
       aloop = VEC_index (loop_p, vloops, i);
-      vol = estimated_loop_iterations_int (aloop, false);
+      vol = max_stmt_executions_int (aloop, false);
       if (vol < 0)
 	vol = expected_loop_iterations (aloop);
       volume *= vol;
@@ -1801,7 +1801,7 @@ loop_prefetch_arrays (struct loop *loop)
     return false;
 
   ahead = (PREFETCH_LATENCY + time - 1) / time;
-  est_niter = estimated_loop_iterations_int (loop, false);
+  est_niter = max_stmt_executions_int (loop, false);
 
   /* Prefetching is not likely to be profitable if the trip count to ahead
      ratio is too small.  */
Index: gcc/graphite-sese-to-poly.c
===================================================================
--- gcc/graphite-sese-to-poly.c	(revision 174810)
+++ gcc/graphite-sese-to-poly.c	(working copy)
@@ -1092,7 +1092,7 @@ build_loop_iteration_domains (scop_p sco
       scan_tree_for_params (SCOP_REGION (scop), nb_iters, ub_expr, one);
       mpz_clear (one);
 
-      if (estimated_loop_iterations (loop, true, &nit))
+      if (max_stmt_executions (loop, true, &nit))
 	add_upper_bounds_from_estimated_nit (scop, nit, dim, ub_expr);
 
       /* loop_i <= expr_nb_iters */
Index: gcc/cfgloop.h
===================================================================
--- gcc/cfgloop.h	(revision 174810)
+++ gcc/cfgloop.h	(working copy)
@@ -143,11 +143,13 @@ struct GTY ((chain_next ("%h.next"))) lo
      computes and caches the computed information in this field.  */
   tree nb_iterations;
 
-  /* An integer guaranteed to bound the number of iterations of the loop
-     from above.  */
+  /* An integer guaranteed to be greater or equal to nb_iterations.  Only
+     valid if any_upper_bound is true.  */ 
   double_int nb_iterations_upper_bound;
 
-  /* An integer giving the expected number of iterations of the loop.  */
+  /* An integer giving an estimate on nb_iterations.  Unlike
+     nb_iterations_upper_bound, there is no guarantee that it is at least
+     nb_iterations.  */
   double_int nb_iterations_estimate;
 
   bool any_upper_bound;
@@ -278,7 +280,9 @@ extern rtx doloop_condition_get (rtx);
 
 void estimate_numbers_of_iterations_loop (struct loop *, bool);
 HOST_WIDE_INT estimated_loop_iterations_int (struct loop *, bool);
+HOST_WIDE_INT max_stmt_executions_int (struct loop *, bool);
 bool estimated_loop_iterations (struct loop *, bool, double_int *);
+bool max_stmt_executions (struct loop *, bool, double_int *);
 
 /* Loop manipulation.  */
 extern bool can_duplicate_loop_p (const struct loop *loop);

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH PR45098,  7/10] Nowrap limits iterations
  2011-06-11 10:13                     ` Tom de Vries
@ 2011-06-12  1:17                       ` Zdenek Dvorak
  0 siblings, 0 replies; 46+ messages in thread
From: Zdenek Dvorak @ 2011-06-12  1:17 UTC (permalink / raw)
  To: Tom de Vries; +Cc: gcc-patches

Hi,

> > I think a better fix would be to make the nb_iterations_upper_bound semantics
> > consistent with that of nb_iterations.  Let me try to do it, hopefully this should
> > be mostly mechanical,
> > 
> 
> This patch changes the semantics of nb_iterations_upper_bound and
> nb_iterations_estimate, to mean: the amount of latch executions.
> 
> That change is countered at all use sites, except for
> tree-ssa-loop-ivopts.c:may_eliminate_iv.
> 
> Passed x86_64 bootstrapping and reg-testing.
> 
> OK for trunk?

yes,

Zdenek

^ permalink raw reply	[flat|nested] 46+ messages in thread

end of thread, other threads:[~2011-06-11 18:53 UTC | newest]

Thread overview: 46+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-05-17  7:23 [PATCH, PR45098] Tom de Vries
2011-05-17  7:25 ` [PATCH, PR45098, 1/10] Tom de Vries
2011-05-19 11:17   ` [PATCH PR45098, 1/10] Proc object-size fix Tom de Vries
2011-05-17  8:12 ` [PATCH, PR45098, 2/10] Tom de Vries
2011-05-18  9:39   ` Zdenek Dvorak
2011-05-17  8:30 ` [PATCH, PR45098, 3/10] Tom de Vries
2011-05-18 10:10   ` Zdenek Dvorak
2011-05-18 13:00     ` Tom de Vries
     [not found]       ` <20110518152457.GA13360@kam.mff.cuni.cz>
2011-05-18 19:30         ` Tom de Vries
2011-05-17  8:32 ` [PATCH, PR45098, 4/10] Tom de Vries
2011-05-18 17:46   ` [PATCH PR45098, 4/10] Iv init cost Tom de Vries
2011-05-18 22:59     ` Zdenek Dvorak
2011-05-25 14:20     ` Richard Sandiford
2011-05-26 12:24       ` Tom de Vries
2011-05-31 15:22         ` Richard Sandiford
2011-05-17  8:37 ` [PATCH, PR45098, 5/10] Tom de Vries
2011-05-18 17:48   ` [PATCH PR45098, 5/10] Bound cost Tom de Vries
2011-05-19  4:45     ` Zdenek Dvorak
2011-05-17  8:42 ` [PATCH, PR45098, 6/10] Tom de Vries
2011-05-18 17:48   ` [PATCH PR45098, 6/10] Bound cost - test cases Tom de Vries
2011-05-17  8:58 ` [PATCH, PR45098, 7/10] Tom de Vries
2011-05-18 17:52   ` [PATCH PR45098, 7/10] Nowrap limits iterations Tom de Vries
2011-05-19  4:45     ` Zdenek Dvorak
2011-05-20 12:22       ` Tom de Vries
2011-05-21 18:54         ` Zdenek Dvorak
2011-05-21 22:53           ` Tom de Vries
2011-05-28 17:58             ` Tom de Vries
2011-05-30 15:12               ` Zdenek Dvorak
2011-05-31  9:07                 ` Tom de Vries
2011-05-31  9:11                   ` Zdenek Dvorak
2011-06-11 10:13                     ` Tom de Vries
2011-06-12  1:17                       ` Zdenek Dvorak
2011-05-23 14:50           ` H.J. Lu
2011-05-17  9:03 ` [PATCH, PR45098, 8/10] Tom de Vries
2011-05-18 18:23   ` [PATCH PR45098, 8/10] Nowrap limits iterations - test cases Tom de Vries
2011-05-18 18:27   ` [PATCH PR45098, 9/10] Cheap shift-add Tom de Vries
2011-05-19  5:33     ` Zdenek Dvorak
2011-05-20 11:32       ` Tom de Vries
2011-05-20 20:09         ` Zdenek Dvorak
2011-05-21 15:05         ` Eric Botcazou
2011-05-22 19:33           ` Tom de Vries
2011-05-22 20:22             ` Richard Guenther
2011-05-22 21:11             ` Eric Botcazou
2011-05-17 10:03 ` [PATCH, PR45098, 9/10] Tom de Vries
2011-05-17 10:30 ` [PATCH, PR45098, 10/10] Tom de Vries
2011-05-18 18:30   ` [PATCH PR45098, 10/10] Cheap shift-add - test case Tom de Vries

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).