public inbox for gcc-patches@gcc.gnu.org
* [PATCH 0/3] Fix PR47654 and PR49649
@ 2011-07-07 18:07 Sebastian Pop
  2011-07-07 18:07 ` [PATCH 1/3] Start counting nesting level from 0 and use the standard "Polyhedral SCattering Transformed" psct_* interface Sebastian Pop
                   ` (4 more replies)
  0 siblings, 5 replies; 15+ messages in thread
From: Sebastian Pop @ 2011-07-07 18:07 UTC (permalink / raw)
  To: gcc-patches; +Cc: rguenther, tobias, Sebastian Pop

Hi,

First there are two cleanup patches independent of the fix:

  Start counting nesting level from 0.
  Do not compute twice type, lb, and ub.

Then the patch that fixes PR47654:

  Fix PR47654: Compute LB and UB of a CLAST expression.

One of the reasons we cannot determine the IV type from the
polyhedral representation alone is that, as in the testcase of
PR47654, we are asked to generate an induction variable going from 0
to 127.  That interval could be represented with a "char".  However,
the upper bound expression of the loop generated by CLooG is
min (127, 51*scat_1 + 50), and that expression would overflow if we
used a "char" type.  To find a type in which the expression
51*scat_1 + 50 does not overflow, we have to compute an upper and a
lower bound for the expression.

To fix the problem exposed by Tobias:

> for (i = 0 ; i < 2; i++)
>  for (j = i ; j < i + 1; j++)
>    for (k = j ; k < j + 1; k++)
>      for (m = k ; m < k + 1; m++)
>        for (n = m ; n < m + 1; n++)
>          A[0] += A[n];
> 
> I am a little bit afraid that we will increase the type size by an
> order of magnitude (or at least one bit) for each nesting level.

instead of computing the lb and ub of scat_1 in "51*scat_1 + 50"
based on the type of scat_1 (for which we already generated code when
building the outer loop), we use the polyhedral representation to get
an accurate lb and ub for scat_1.

When translating the substitutions of a user statement with this
precise method, for example S5 in vect-pr43423.c:

  for (scat_1=0;scat_1<=min(T_3-1,T_4-1);scat_1++) {
    S5(scat_1);

we get a type that is too precise: based on the interval [0,99] we
get the type "unsigned char", whereas the type of scat_1 is "int".
The spurious casts this inserts mislead the vectorizer:

#  Access function 0: (int) {(<unnamed-unsigned:8>) graphite_IV.7_56, +, 1}_3;
#)
affine dependence test not usable: access function not affine or constant.

So we have to keep the previous code, gcc_type_for_clast_*, that
computes the type of an expression as the maximal precision of the
components of that expression, and use it when computing the types of
substitution expressions.

Together, the patches passed a full bootstrap and test on amd64-linux.
Ok for trunk?

Thanks,
Sebastian

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 1/3] Start counting nesting level from 0 and use the standard "Polyhedral SCattering Transformed" psct_* interface.
  2011-07-07 18:07 [PATCH 0/3] Fix PR47654 and PR49649 Sebastian Pop
@ 2011-07-07 18:07 ` Sebastian Pop
  2011-07-07 18:08 ` [PATCH 2/3] Do not compute twice type, lb, and ub Sebastian Pop
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 15+ messages in thread
From: Sebastian Pop @ 2011-07-07 18:07 UTC (permalink / raw)
  To: gcc-patches; +Cc: rguenther, tobias, Sebastian Pop

2011-07-05  Sebastian Pop  <sebastian.pop@amd.com>

	* graphite-clast-to-gimple.c (compute_bounds_for_level): Call
	psct_dynamic_dim.
	(translate_clast_for_loop): Pass loop level to dependency_in_loop_p.
	(gcc_type_for_iv_of_clast_loop): Update use of level.
	(gloog): Start counting nesting level from 0.
	* graphite-clast-to-gimple.h (get_scattering_level): Removed.
	* graphite-dependences.c (graphite_carried_dependence_level_k): Call
	psct_dynamic_dim on level.
---
 gcc/ChangeLog                  |   11 +++++++++++
 gcc/graphite-clast-to-gimple.c |   11 +++++------
 gcc/graphite-clast-to-gimple.h |   12 ------------
 gcc/graphite-dependences.c     |    9 ++++++---
 4 files changed, 22 insertions(+), 21 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 418f5dc..9a7849d 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,5 +1,16 @@
 2011-07-05  Sebastian Pop  <sebastian.pop@amd.com>
 
+	* graphite-clast-to-gimple.c (compute_bounds_for_level): Call
+	psct_dynamic_dim.
+	(translate_clast_for_loop): Pass loop level to dependency_in_loop_p.
+	(gcc_type_for_iv_of_clast_loop): Update use of level.
+	(gloog): Start counting nesting level from 0.
+	* graphite-clast-to-gimple.h (get_scattering_level): Removed.
+	* graphite-dependences.c (graphite_carried_dependence_level_k): Call
+	psct_dynamic_dim on level.
+
+2011-07-05  Sebastian Pop  <sebastian.pop@amd.com>
+
 	PR tree-optimization/47654
 	* graphite-blocking.c (pbb_strip_mine_time_depth): Do not return bool.
 	(lst_do_strip_mine_loop): Return an int.
diff --git a/gcc/graphite-clast-to-gimple.c b/gcc/graphite-clast-to-gimple.c
index 6b17631..53af18e 100644
--- a/gcc/graphite-clast-to-gimple.c
+++ b/gcc/graphite-clast-to-gimple.c
@@ -622,7 +622,7 @@ compute_bounds_for_level (poly_bb_p pbb, int level, mpz_t low, mpz_t up)
       + pbb_dim_iter_domain (pbb) + pbb_nb_params (pbb);
 
     ppl_new_Linear_Expression_with_dimension (&le, dim);
-    ppl_set_coef (le, 2 * level + 1, 1);
+    ppl_set_coef (le, psct_dynamic_dim (pbb, level), 1);
   }
 
   ppl_max_for_le_pointset (ps, le, up);
@@ -687,7 +687,7 @@ gcc_type_for_iv_of_clast_loop (struct clast_for *stmt_for, int level,
 
   return max_signed_precision_type (lb_type, max_precision_type
 				    (ub_type, compute_type_for_level
-				     (pbb, level - 1)));
+				     (pbb, level)));
 }
 
 /* Creates a new LOOP corresponding to Cloog's STMT.  Inserts an
@@ -803,7 +803,7 @@ find_pbb_via_hash (htab_t bb_pbb_mapping, basic_block bb)
   return NULL;
 }
 
-/* Check data dependency in LOOP at scattering level LEVEL.
+/* Check data dependency in LOOP at level LEVEL.
    BB_PBB_MAPPING is a mapping from basic_block to its related
    poly_bb_p.  */
 
@@ -961,8 +961,7 @@ translate_clast_for_loop (sese region, loop_p context_loop,
   set_immediate_dominator (CDI_DOMINATORS, next_e->dest, next_e->src);
 
   if (flag_loop_parallelize_all
-      && !dependency_in_loop_p (loop, bb_pbb_mapping,
- 				get_scattering_level (level)))
+      && !dependency_in_loop_p (loop, bb_pbb_mapping, level))
     loop->can_be_parallel = true;
 
   return last_e;
@@ -1477,7 +1476,7 @@ gloog (scop_p scop, htab_t bb_pbb_mapping)
   translate_clast (region, context_loop, pc.stmt,
 		   if_region->true_region->entry,
 		   &newivs, newivs_index,
-		   bb_pbb_mapping, 1, params_index);
+		   bb_pbb_mapping, 0, params_index);
   graphite_verify ();
   scev_reset ();
   recompute_all_dominators ();
diff --git a/gcc/graphite-clast-to-gimple.h b/gcc/graphite-clast-to-gimple.h
index 9d599d6..b5affd9 100644
--- a/gcc/graphite-clast-to-gimple.h
+++ b/gcc/graphite-clast-to-gimple.h
@@ -63,16 +63,4 @@ eq_bb_pbb_map (const void *bb_pbb1, const void *bb_pbb2)
   return (bp1->bb->index == bp2->bb->index);
 }
 
-/* Returns the scattering dimension for STMTFOR.
-
-   The relationship between dimension in scattering matrix
-   and the DEPTH of the loop is:
-   DIMENSION = 2*DEPTH - 1
-*/
-
-static inline int get_scattering_level (int depth)
-{
-  return 2 * depth - 1;
-}
-
 #endif
diff --git a/gcc/graphite-dependences.c b/gcc/graphite-dependences.c
index b9b1d1b..21f97c4 100644
--- a/gcc/graphite-dependences.c
+++ b/gcc/graphite-dependences.c
@@ -748,11 +748,13 @@ graphite_carried_dependence_level_k (poly_dr_p pdr1, poly_dr_p pdr2,
 {
   ppl_Pointset_Powerset_C_Polyhedron_t po;
   ppl_Pointset_Powerset_C_Polyhedron_t eqpp;
-  graphite_dim_t tdim1 = pbb_nb_scattering_transform (PDR_PBB (pdr1));
-  graphite_dim_t ddim1 = pbb_dim_iter_domain (PDR_PBB (pdr1));
+  poly_bb_p pbb = PDR_PBB (pdr1);
+  graphite_dim_t tdim1 = pbb_nb_scattering_transform (pbb);
+  graphite_dim_t ddim1 = pbb_dim_iter_domain (pbb);
   ppl_dimension_type dim;
   bool empty_p;
   poly_ddr_p pddr = new_poly_ddr (pdr1, pdr2, 1, false);
+  graphite_dim_t pos;
 
   if (PDDR_KIND (pddr) == unknown_dependence)
     {
@@ -768,7 +770,8 @@ graphite_carried_dependence_level_k (poly_dr_p pdr1, poly_dr_p pdr2,
 
   po = PDDR_DDP (pddr);
   ppl_Pointset_Powerset_C_Polyhedron_space_dimension (po, &dim);
-  eqpp = build_pairwise_scheduling (dim, level, tdim1 + ddim1, 1);
+  pos = psct_dynamic_dim (pbb, level);
+  eqpp = build_pairwise_scheduling (dim, pos, tdim1 + ddim1, 1);
 
   ppl_Pointset_Powerset_C_Polyhedron_intersection_assign (eqpp, po);
   empty_p = ppl_powerset_is_empty (eqpp);
-- 
1.7.4.1


* [PATCH 2/3] Do not compute twice type, lb, and ub.
  2011-07-07 18:07 [PATCH 0/3] Fix PR47654 and PR49649 Sebastian Pop
  2011-07-07 18:07 ` [PATCH 1/3] Start counting nesting level from 0 and use the standard "Polyhedral SCattering Transformed" psct_* interface Sebastian Pop
@ 2011-07-07 18:08 ` Sebastian Pop
  2011-07-07 18:15 ` [PATCH 3/3] Fix PR47654: Compute LB and UB of a CLAST expression Sebastian Pop
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 15+ messages in thread
From: Sebastian Pop @ 2011-07-07 18:08 UTC (permalink / raw)
  To: gcc-patches; +Cc: rguenther, tobias, Sebastian Pop

2011-07-05  Sebastian Pop  <sebastian.pop@amd.com>

	* graphite-clast-to-gimple.c (graphite_create_new_loop): Do not
	recompute type, lb, and ub.  Get them from...
	(graphite_create_new_loop_guard): ...here.  Pass in parameter
	pointers to type, lb, and ub.
	(translate_clast_for_loop): Update function calls.
	(translate_clast_for): Same.
---
 gcc/ChangeLog                  |    9 ++++++
 gcc/graphite-clast-to-gimple.c |   61 +++++++++++++++++++--------------------
 2 files changed, 39 insertions(+), 31 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 9a7849d..77cf1f6 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,5 +1,14 @@
 2011-07-05  Sebastian Pop  <sebastian.pop@amd.com>
 
+	* graphite-clast-to-gimple.c (graphite_create_new_loop): Do not
+	recompute type, lb, and ub.  Get them from...
+	(graphite_create_new_loop_guard): ...here.  Pass in parameter
+	pointers to type, lb, and ub.
+	(translate_clast_for_loop): Update function calls.
+	(translate_clast_for): Same.
+
+2011-07-05  Sebastian Pop  <sebastian.pop@amd.com>
+
 	* graphite-clast-to-gimple.c (compute_bounds_for_level): Call
 	psct_dynamic_dim.
 	(translate_clast_for_loop): Pass loop level to dependency_in_loop_p.
diff --git a/gcc/graphite-clast-to-gimple.c b/gcc/graphite-clast-to-gimple.c
index 53af18e..a8ac9c6 100644
--- a/gcc/graphite-clast-to-gimple.c
+++ b/gcc/graphite-clast-to-gimple.c
@@ -696,23 +696,15 @@ gcc_type_for_iv_of_clast_loop (struct clast_for *stmt_for, int level,
    becomes the child loop of the OUTER_LOOP.  NEWIVS_INDEX binds
    CLooG's scattering name to the induction variable created for the
    loop of STMT.  The new induction variable is inserted in the NEWIVS
-   vector.  */
+   vector and is of type TYPE.  */
 
 static struct loop *
-graphite_create_new_loop (sese region, edge entry_edge,
+graphite_create_new_loop (edge entry_edge,
 			  struct clast_for *stmt,
 			  loop_p outer, VEC (tree, heap) **newivs,
-			  htab_t newivs_index, htab_t params_index, int level)
+			  htab_t newivs_index,
+			  tree type, tree lb, tree ub)
 {
-  tree lb_type = gcc_type_for_clast_expr (stmt->LB, region, *newivs,
-					  newivs_index, params_index);
-  tree ub_type = gcc_type_for_clast_expr (stmt->UB, region, *newivs,
-					  newivs_index, params_index);
-  tree type = gcc_type_for_iv_of_clast_loop (stmt, level, lb_type, ub_type);
-  tree lb = clast_to_gcc_expression (type, stmt->LB, region, *newivs,
-				     newivs_index, params_index);
-  tree ub = clast_to_gcc_expression (type, stmt->UB, region, *newivs,
-				     newivs_index, params_index);
   tree stride = gmp_cst_to_tree (type, stmt->stride);
   tree ivvar = create_tmp_var (type, "graphite_IV");
   tree iv, iv_after_increment;
@@ -887,7 +879,8 @@ static edge
 graphite_create_new_loop_guard (sese region, edge entry_edge,
 				struct clast_for *stmt,
 				VEC (tree, heap) *newivs,
-				htab_t newivs_index, htab_t params_index)
+				htab_t newivs_index, htab_t params_index,
+				int level, tree *type, tree *lb, tree *ub)
 {
   tree cond_expr;
   edge exit_edge;
@@ -895,28 +888,30 @@ graphite_create_new_loop_guard (sese region, edge entry_edge,
 					  newivs_index, params_index);
   tree ub_type = gcc_type_for_clast_expr (stmt->UB, region, newivs,
 					  newivs_index, params_index);
-  tree type = max_precision_type (lb_type, ub_type);
-  tree lb = clast_to_gcc_expression (type, stmt->LB, region, newivs,
-				     newivs_index, params_index);
-  tree ub = clast_to_gcc_expression (type, stmt->UB, region, newivs,
-				     newivs_index, params_index);
+
+  *type = gcc_type_for_iv_of_clast_loop (stmt, level, lb_type, ub_type);
+  *lb = clast_to_gcc_expression (*type, stmt->LB, region, newivs,
+				 newivs_index, params_index);
+  *ub = clast_to_gcc_expression (*type, stmt->UB, region, newivs,
+				 newivs_index, params_index);
+
   /* When ub is simply a constant or a parameter, use lb <= ub.  */
-  if (TREE_CODE (ub) == INTEGER_CST || TREE_CODE (ub) == SSA_NAME)
-    cond_expr = fold_build2 (LE_EXPR, boolean_type_node, lb, ub);
+  if (TREE_CODE (*ub) == INTEGER_CST || TREE_CODE (*ub) == SSA_NAME)
+    cond_expr = fold_build2 (LE_EXPR, boolean_type_node, *lb, *ub);
   else
     {
-      tree one = (POINTER_TYPE_P (type)
+      tree one = (POINTER_TYPE_P (*type)
 		  ? size_one_node
-		  : fold_convert (type, integer_one_node));
+		  : fold_convert (*type, integer_one_node));
       /* Adding +1 and using LT_EXPR helps with loop latches that have a
 	 loop iteration count of "PARAMETER - 1".  For PARAMETER == 0 this becomes
 	 2^k-1 due to integer overflow, and the condition lb <= ub is true,
 	 even if we do not want this.  However lb < ub + 1 is false, as
 	 expected.  */
-      tree ub_one = fold_build2 (POINTER_TYPE_P (type) ? POINTER_PLUS_EXPR
-				 : PLUS_EXPR, type, ub, one);
+      tree ub_one = fold_build2 (POINTER_TYPE_P (*type) ? POINTER_PLUS_EXPR
+				 : PLUS_EXPR, *type, *ub, one);
 
-      cond_expr = fold_build2 (LT_EXPR, boolean_type_node, lb, ub_one);
+      cond_expr = fold_build2 (LT_EXPR, boolean_type_node, *lb, ub_one);
     }
 
   exit_edge = create_empty_if_region_on_edge (entry_edge, cond_expr);
@@ -935,17 +930,19 @@ translate_clast (sese, loop_p, struct clast_stmt *, edge,
    - BB_PBB_MAPPING is a mapping from basic_block to its related poly_bb_p.
    - PARAMS_INDEX connects the cloog parameters with the gimple parameters in
      the sese region.  */
+
 static edge
 translate_clast_for_loop (sese region, loop_p context_loop,
 			  struct clast_for *stmt, edge next_e,
 			  VEC (tree, heap) **newivs,
 			  htab_t newivs_index, htab_t bb_pbb_mapping,
-			  int level, htab_t params_index)
+			  int level, htab_t params_index, tree type,
+			  tree lb, tree ub)
 {
-  struct loop *loop = graphite_create_new_loop (region, next_e, stmt,
+  struct loop *loop = graphite_create_new_loop (next_e, stmt,
  						context_loop, newivs,
- 						newivs_index, params_index,
-						level);
+						newivs_index,
+						type, lb, ub);
   edge last_e = single_exit (loop);
   edge to_body = single_succ_edge (loop->header);
   basic_block after = to_body->dest;
@@ -982,13 +979,15 @@ translate_clast_for (sese region, loop_p context_loop, struct clast_for *stmt,
 		     htab_t newivs_index, htab_t bb_pbb_mapping, int level,
 		     htab_t params_index)
 {
+  tree type, lb, ub;
   edge last_e = graphite_create_new_loop_guard (region, next_e, stmt, *newivs,
-						newivs_index, params_index);
+						newivs_index, params_index,
+						level, &type, &lb, &ub);
   edge true_e = get_true_edge_from_guard_bb (next_e->dest);
 
   translate_clast_for_loop (region, context_loop, stmt, true_e, newivs,
 			    newivs_index, bb_pbb_mapping, level,
-			    params_index);
+			    params_index, type, lb, ub);
   return last_e;
 }
 
-- 
1.7.4.1


* [PATCH 3/3] Fix PR47654: Compute LB and UB of a CLAST expression.
  2011-07-07 18:07 [PATCH 0/3] Fix PR47654 and PR49649 Sebastian Pop
  2011-07-07 18:07 ` [PATCH 1/3] Start counting nesting level from 0 and use the standard "Polyhedral SCattering Transformed" psct_* interface Sebastian Pop
  2011-07-07 18:08 ` [PATCH 2/3] Do not compute twice type, lb, and ub Sebastian Pop
@ 2011-07-07 18:15 ` Sebastian Pop
  2011-07-08  8:38 ` [PATCH 0/3] Fix PR47654 and PR49649 Richard Guenther
  2011-07-17  7:54 ` Tobias Grosser
  4 siblings, 0 replies; 15+ messages in thread
From: Sebastian Pop @ 2011-07-07 18:15 UTC (permalink / raw)
  To: gcc-patches; +Cc: rguenther, tobias, Sebastian Pop

2011-07-05  Sebastian Pop  <sebastian.pop@amd.com>

	PR tree-optimization/47654
	PR middle-end/49649
	* graphite-clast-to-gimple.c (struct clast_name_index): Add
	a level field.
	(new_clast_name_index): Add the level parameter.
	(clast_name_to_level): New.
	(save_clast_name_index): Add the level parameter.
	(save_clast_name_index): Adjust call to new_clast_name_index.
	(newivs_to_depth_to_newiv): Removed.
	(clast_name_to_gcc): Inline code of newivs_to_depth_to_newiv.
	(compute_bounds_for_level): Moved down.
	(compute_type_for_level): Renamed gcc_type_for_clast_expr_from_lb_ub.
	(clast_get_body_of_loop): Renamed clast_get_body_of.
	(gcc_type_for_iv_of_clast_loop): Removed.
	(graphite_create_new_loop): Add the level parameter.  Adjust call
	to save_clast_name_index.
	(compute_bounds_for_param): New.
	(lb_ub_for_name): New.
	(lb_ub_for_term): New.
	(lb_ub_for_red): New.
	(lb_ub_for_bin): New.
	(lb_ub_for_expr): New.
	(graphite_create_new_loop_guard): Do not pass in the level in
	parameter.  Compute type as the max_precision_type of lb_type and
	ub_type.  Call gcc_type_for_clast_expr_from_lb_ub.
	(translate_clast_for_loop): Adjust call to graphite_create_new_loop.
	(translate_clast_for): Adjust call to graphite_create_new_loop_guard.
	(create_params_index): Adjust call to save_clast_name_index.
	* graphite-ppl.h (value_min): New.

	* gcc.dg/graphite/run-id-pr47654.c
---
 gcc/ChangeLog                                  |   32 ++
 gcc/graphite-clast-to-gimple.c                 |  432 +++++++++++++++++-------
 gcc/graphite-ppl.h                             |   11 +
 gcc/testsuite/ChangeLog                        |    5 +
 gcc/testsuite/gcc.dg/graphite/run-id-pr47654.c |   24 ++
 5 files changed, 388 insertions(+), 116 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/graphite/run-id-pr47654.c

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 77cf1f6..b86c059 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,5 +1,37 @@
 2011-07-05  Sebastian Pop  <sebastian.pop@amd.com>
 
+	PR tree-optimization/47654
+	PR middle-end/49649
+	* graphite-clast-to-gimple.c (struct clast_name_index): Add
+	a level field.
+	(new_clast_name_index): Add the level parameter.
+	(clast_name_to_level): New.
+	(save_clast_name_index): Add the level parameter.
+	(save_clast_name_index): Adjust call to new_clast_name_index.
+	(newivs_to_depth_to_newiv): Removed.
+	(clast_name_to_gcc): Inline code of newivs_to_depth_to_newiv.
+	(compute_bounds_for_level): Moved down.
+	(compute_type_for_level): Renamed gcc_type_for_clast_expr_from_lb_ub.
+	(clast_get_body_of_loop): Renamed clast_get_body_of.
+	(gcc_type_for_iv_of_clast_loop): Removed.
+	(graphite_create_new_loop): Add the level parameter.  Adjust call
+	to save_clast_name_index.
+	(compute_bounds_for_param): New.
+	(lb_ub_for_name): New.
+	(lb_ub_for_term): New.
+	(lb_ub_for_red): New.
+	(lb_ub_for_bin): New.
+	(lb_ub_for_expr): New.
+	(graphite_create_new_loop_guard): Do not pass in the level in
+	parameter.  Compute type as the max_precision_type of lb_type and
+	ub_type.  Call gcc_type_for_clast_expr_from_lb_ub.
+	(translate_clast_for_loop): Adjust call to graphite_create_new_loop.
+	(translate_clast_for): Adjust call to graphite_create_new_loop_guard.
+	(create_params_index): Adjust call to save_clast_name_index.
+	* graphite-ppl.h (value_min): New.
+
+2011-07-05  Sebastian Pop  <sebastian.pop@amd.com>
+
 	* graphite-clast-to-gimple.c (graphite_create_new_loop): Do not
 	recompute type, lb, and ub.  Get them from...
 	(graphite_create_new_loop_guard): ...here.  Pass in parameter
diff --git a/gcc/graphite-clast-to-gimple.c b/gcc/graphite-clast-to-gimple.c
index a8ac9c6..9f66edb 100644
--- a/gcc/graphite-clast-to-gimple.c
+++ b/gcc/graphite-clast-to-gimple.c
@@ -56,26 +56,55 @@ graphite_verify (void)
 #endif
 }
 
-/* Stores the INDEX in a vector for a given clast NAME.  */
+/* Stores the INDEX in a vector and the loop nesting LEVEL for a given
+   clast NAME.  */
 
 typedef struct clast_name_index {
   int index;
+  int level;
   const char *name;
 } *clast_name_index_p;
 
 /* Returns a pointer to a new element of type clast_name_index_p built
-   from NAME and INDEX.  */
+   from NAME, LEVEL, and INDEX.  */
 
 static inline clast_name_index_p
-new_clast_name_index (const char *name, int index)
+new_clast_name_index (const char *name, int index, int level)
 {
   clast_name_index_p res = XNEW (struct clast_name_index);
 
   res->name = name;
+  res->level = level;
   res->index = index;
   return res;
 }
 
+/* For a given clast NAME, returns -1 if NAME is not in the
+   INDEX_TABLE, otherwise returns the loop level for the induction
+   variable NAME, or if it is a parameter, the parameter number in the
+   vector of parameters.  */
+
+static inline int
+clast_name_to_level (clast_name_p name, htab_t index_table)
+{
+  struct clast_name_index tmp;
+  PTR *slot;
+
+#ifdef CLOOG_ORG
+  gcc_assert (name->type == clast_expr_name);
+  tmp.name = ((const struct clast_name *) name)->name;
+#else
+  tmp.name = name;
+#endif
+
+  slot = htab_find_slot (index_table, &tmp, NO_INSERT);
+
+  if (slot && *slot)
+    return ((struct clast_name_index *) *slot)->level;
+
+  return -1;
+}
+
 /* For a given clast NAME, returns -1 if it does not correspond to any
    parameter, or otherwise, returns the index in the PARAMS or
    SCATTERING_DIMENSIONS vector.  */
@@ -101,10 +130,11 @@ clast_name_to_index (clast_name_p name, htab_t index_table)
   return -1;
 }
 
-/* Records in INDEX_TABLE the INDEX for NAME.  */
+/* Records in INDEX_TABLE the INDEX and LEVEL for NAME.  */
 
 static inline void
-save_clast_name_index (htab_t index_table, const char *name, int index)
+save_clast_name_index (htab_t index_table, const char *name,
+		       int index, int level)
 {
   struct clast_name_index tmp;
   PTR *slot;
@@ -116,7 +146,7 @@ save_clast_name_index (htab_t index_table, const char *name, int index)
     {
       free (*slot);
 
-      *slot = new_clast_name_index (name, index);
+      *slot = new_clast_name_index (name, index, level);
     }
 }
 
@@ -139,15 +169,6 @@ eq_clast_name_indexes (const void *e1, const void *e2)
   return (elt1->name == elt2->name);
 }
 
-/* For a given scattering dimension, return the new induction variable
-   associated to it.  */
-
-static inline tree
-newivs_to_depth_to_newiv (VEC (tree, heap) *newivs, int depth)
-{
-  return VEC_index (tree, newivs, depth);
-}
-
 \f
 
 /* Returns the tree variable from the name NAME that was given in
@@ -172,7 +193,7 @@ clast_name_to_gcc (clast_name_p name, sese region, VEC (tree, heap) *newivs,
   index = clast_name_to_index (name, newivs_index);
   gcc_assert (index >= 0);
 
-  return newivs_to_depth_to_newiv (newivs, index);
+  return VEC_index (tree, newivs, index);
 }
 
 /* Returns the signed maximal precision type for expressions TYPE1 and TYPE2.  */
@@ -602,94 +623,6 @@ graphite_create_new_guard (sese region, edge entry_edge,
   return exit_edge;
 }
 
-/* Compute the lower bound LOW and upper bound UP for the induction
-   variable at LEVEL for the statement PBB, based on the transformed
-   scattering of PBB: T|I|G|Cst, with T the scattering transform, I
-   the iteration domain, and G the context parameters.  */
-
-static void
-compute_bounds_for_level (poly_bb_p pbb, int level, mpz_t low, mpz_t up)
-{
-  ppl_Pointset_Powerset_C_Polyhedron_t ps;
-  ppl_Linear_Expression_t le;
-
-  combine_context_id_scat (&ps, pbb, false);
-
-  /* Prepare the linear expression corresponding to the level that we
-     want to maximize/minimize.  */
-  {
-    ppl_dimension_type dim = pbb_nb_scattering_transform (pbb)
-      + pbb_dim_iter_domain (pbb) + pbb_nb_params (pbb);
-
-    ppl_new_Linear_Expression_with_dimension (&le, dim);
-    ppl_set_coef (le, psct_dynamic_dim (pbb, level), 1);
-  }
-
-  ppl_max_for_le_pointset (ps, le, up);
-  ppl_min_for_le_pointset (ps, le, low);
-  ppl_delete_Linear_Expression (le);
-  ppl_delete_Pointset_Powerset_C_Polyhedron (ps);
-}
-
-/* Compute the type for the induction variable at LEVEL for the
-   statement PBB, based on the transformed schedule of PBB.  */
-
-static tree
-compute_type_for_level (poly_bb_p pbb, int level)
-{
-  mpz_t low, up;
-  tree type;
-
-  mpz_init (low);
-  mpz_init (up);
-
-  compute_bounds_for_level (pbb, level, low, up);
-  type = gcc_type_for_interval (low, up);
-
-  mpz_clear (low);
-  mpz_clear (up);
-  return type;
-}
-
-/* Walks a CLAST and returns the first statement in the body of a
-   loop.  */
-
-static struct clast_user_stmt *
-clast_get_body_of_loop (struct clast_stmt *stmt)
-{
-  if (!stmt
-      || CLAST_STMT_IS_A (stmt, stmt_user))
-    return (struct clast_user_stmt *) stmt;
-
-  if (CLAST_STMT_IS_A (stmt, stmt_for))
-    return clast_get_body_of_loop (((struct clast_for *) stmt)->body);
-
-  if (CLAST_STMT_IS_A (stmt, stmt_guard))
-    return clast_get_body_of_loop (((struct clast_guard *) stmt)->then);
-
-  if (CLAST_STMT_IS_A (stmt, stmt_block))
-    return clast_get_body_of_loop (((struct clast_block *) stmt)->body);
-
-  gcc_unreachable ();
-}
-
-/* Returns the type for the induction variable for the loop translated
-   from STMT_FOR.  */
-
-static tree
-gcc_type_for_iv_of_clast_loop (struct clast_for *stmt_for, int level,
-			       tree lb_type, tree ub_type)
-{
-  struct clast_stmt *stmt = (struct clast_stmt *) stmt_for;
-  struct clast_user_stmt *body = clast_get_body_of_loop (stmt);
-  CloogStatement *cs = body->statement;
-  poly_bb_p pbb = (poly_bb_p) cloog_statement_usr (cs);
-
-  return max_signed_precision_type (lb_type, max_precision_type
-				    (ub_type, compute_type_for_level
-				     (pbb, level)));
-}
-
 /* Creates a new LOOP corresponding to Cloog's STMT.  Inserts an
    induction variable for the new LOOP.  New LOOP is attached to CFG
    starting at ENTRY_EDGE.  LOOP is inserted into the loop tree and
@@ -703,7 +636,7 @@ graphite_create_new_loop (edge entry_edge,
 			  struct clast_for *stmt,
 			  loop_p outer, VEC (tree, heap) **newivs,
 			  htab_t newivs_index,
-			  tree type, tree lb, tree ub)
+			  tree type, tree lb, tree ub, int level)
 {
   tree stride = gmp_cst_to_tree (type, stmt->stride);
   tree ivvar = create_tmp_var (type, "graphite_IV");
@@ -715,7 +648,7 @@ graphite_create_new_loop (edge entry_edge,
   add_referenced_var (ivvar);
 
   save_clast_name_index (newivs_index, stmt->iterator,
-			 VEC_length (tree, *newivs));
+			 VEC_length (tree, *newivs), level);
   VEC_safe_push (tree, heap, *newivs, iv);
   return loop;
 }
@@ -872,6 +805,270 @@ translate_clast_user (sese region, struct clast_user_stmt *stmt, edge next_e,
   return next_e;
 }
 
+
+/* Compute the lower bound LOW and upper bound UP for the parameter
+   PARAM for the statement PBB, based on the transformed scattering of
+   PBB: T|I|G|Cst, with T the scattering transform, I the iteration
+   domain, and G the context parameters.  */
+
+static void
+compute_bounds_for_param (poly_bb_p pbb, int param, mpz_t low, mpz_t up)
+{
+  ppl_Pointset_Powerset_C_Polyhedron_t ps;
+  ppl_Linear_Expression_t le;
+
+  combine_context_id_scat (&ps, pbb, false);
+
+  /* Prepare the linear expression corresponding to the parameter that
+     we want to maximize/minimize.  */
+  {
+    ppl_dimension_type dim = pbb_nb_scattering_transform (pbb)
+      + pbb_dim_iter_domain (pbb) + pbb_nb_params (pbb);
+
+    ppl_new_Linear_Expression_with_dimension (&le, dim);
+    ppl_set_coef (le, psct_parameter_dim (pbb, param), 1);
+  }
+
+  ppl_max_for_le_pointset (ps, le, up);
+  ppl_min_for_le_pointset (ps, le, low);
+  ppl_delete_Linear_Expression (le);
+  ppl_delete_Pointset_Powerset_C_Polyhedron (ps);
+}
+
+/* Compute the lower bound LOW and upper bound UP for the induction
+   variable at LEVEL for the statement PBB, based on the transformed
+   scattering of PBB: T|I|G|Cst, with T the scattering transform, I
+   the iteration domain, and G the context parameters.  */
+
+static void
+compute_bounds_for_level (poly_bb_p pbb, int level, mpz_t low, mpz_t up)
+{
+  ppl_Pointset_Powerset_C_Polyhedron_t ps;
+  ppl_Linear_Expression_t le;
+
+  combine_context_id_scat (&ps, pbb, false);
+
+  /* Prepare the linear expression corresponding to the level that we
+     want to maximize/minimize.  */
+  {
+    ppl_dimension_type dim = pbb_nb_scattering_transform (pbb)
+      + pbb_dim_iter_domain (pbb) + pbb_nb_params (pbb);
+
+    ppl_new_Linear_Expression_with_dimension (&le, dim);
+    ppl_set_coef (le, psct_dynamic_dim (pbb, level), 1);
+  }
+
+  ppl_max_for_le_pointset (ps, le, up);
+  ppl_min_for_le_pointset (ps, le, low);
+  ppl_delete_Linear_Expression (le);
+  ppl_delete_Pointset_Powerset_C_Polyhedron (ps);
+}
+
+/* Return the lower bound LB and upper bound UB of the clast_name NAME.  */
+
+static void
+lb_ub_for_name (clast_name_p name, htab_t newivs_index, htab_t params_index,
+		mpz_t lb, mpz_t ub, poly_bb_p pbb)
+{
+  int level;
+
+  if (params_index)
+    {
+      int param = clast_name_to_level (name, params_index);
+
+      if (param >= 0)
+	{
+	  compute_bounds_for_param (pbb, param, lb, ub);
+	  return;
+	}
+    }
+
+  gcc_assert (newivs_index);
+  level = clast_name_to_level (name, newivs_index);
+  gcc_assert (level >= 0);
+  compute_bounds_for_level (pbb, level, lb, ub);
+}
+
+/* Return the lower bound LB and upper bound UB of the clast_term T.  */
+
+static void
+lb_ub_for_term (struct clast_term *t, htab_t newivs_index, htab_t params_index,
+		mpz_t lb, mpz_t ub, poly_bb_p pbb)
+{
+  gcc_assert (t->expr.type == clast_expr_term);
+
+  if (t->var)
+    {
+      mpz_t v;
+      lb_ub_for_name ((clast_name_p) (t->var),
+		      newivs_index, params_index, lb, ub, pbb);
+      mpz_init (v);
+      mpz_abs (v, t->val);
+      mpz_mul (lb, lb, v);
+      mpz_mul (ub, ub, v);
+      mpz_clear (v);
+    }
+  else
+    {
+      mpz_set (lb, t->val);
+      mpz_set (ub, t->val);
+    }
+}
+
+static void
+lb_ub_for_expr (struct clast_expr *, htab_t, htab_t, mpz_t, mpz_t, poly_bb_p);
+
+/* Return the lower bound LB and upper bound UB of the clast_reduction R.  */
+
+static void
+lb_ub_for_red (struct clast_reduction *r, htab_t newivs_index,
+	       htab_t params_index, mpz_t lb, mpz_t ub, poly_bb_p pbb)
+{
+  int i;
+  mpz_t l, u;
+
+  lb_ub_for_expr (r->elts[0], newivs_index, params_index, lb, ub, pbb);
+
+  if (r->n == 1)
+    return;
+
+  mpz_init (l);
+  mpz_init (u);
+
+  for (i = 1; i < r->n; i++)
+    {
+      lb_ub_for_expr (r->elts[i], newivs_index, params_index, l, u, pbb);
+
+      /* As the interval [LB, UB] is used to compute a type for the
+	 expression, it should include the bounds of each term of the
+	 reduction expression: so take the min of lower bounds and the
+	 max of upper bounds.  */
+      value_min (lb, lb, l);
+      value_max (ub, ub, u);
+
+      switch (r->type)
+	{
+	case clast_red_sum:
+	  /* The interval [LB, UB] should also include the result of
+	     the sum.  */
+	  mpz_add (l, lb, l);
+	  mpz_add (u, ub, u);
+	  value_min (lb, lb, l);
+	  value_max (ub, ub, u);
+	  break;
+
+	case clast_red_min:
+	case clast_red_max:
+	  break;
+
+	default:
+	  gcc_unreachable ();
+	}
+    }
+
+  mpz_clear (l);
+  mpz_clear (u);
+}
+
+/* Return the lower bound LB and upper bound UB of the clast_binary B.  */
+
+static void
+lb_ub_for_bin (struct clast_binary *b, htab_t newivs_index,
+	       htab_t params_index, mpz_t lb, mpz_t ub, poly_bb_p pbb)
+{
+  lb_ub_for_expr ((struct clast_expr *) b->LHS, newivs_index, params_index,
+		  lb, ub, pbb);
+
+  switch (b->type)
+    {
+    case clast_bin_cdiv:
+    case clast_bin_fdiv:
+    case clast_bin_div:
+    case clast_bin_mod:
+      value_min (lb, lb, b->RHS);
+      value_max (ub, ub, b->RHS);
+      break;
+
+    default:
+      gcc_unreachable ();
+    }
+}
+
+/* Return the lower bound LB and upper bound UB of the clast_expr E.  */
+
+static void
+lb_ub_for_expr (struct clast_expr *e, htab_t newivs_index, htab_t params_index,
+		mpz_t lb, mpz_t ub, poly_bb_p pbb)
+{
+  switch (e->type)
+    {
+    case clast_expr_term:
+      lb_ub_for_term ((struct clast_term *) e,
+		      newivs_index, params_index, lb, ub, pbb);
+      break;
+
+    case clast_expr_red:
+      lb_ub_for_red ((struct clast_reduction *) e,
+		     newivs_index, params_index, lb, ub, pbb);
+      break;
+
+    case clast_expr_bin:
+      lb_ub_for_bin ((struct clast_binary *) e,
+		     newivs_index, params_index, lb, ub, pbb);
+      break;
+
+    case clast_expr_name:
+      lb_ub_for_name ((clast_name_p) e, 
+		      newivs_index, params_index, lb, ub, pbb);
+      break;
+
+    default:
+      gcc_unreachable ();
+    }
+}
+
+/* Returns the type for the CLAST expression E in REGION.  */
+
+static tree
+gcc_type_for_clast_expr_from_lb_ub (struct clast_expr *e, htab_t newivs_index,
+				    htab_t params_index, poly_bb_p pbb)
+{
+  mpz_t lb, ub;
+  tree type;
+
+  mpz_init (lb);
+  mpz_init (ub);
+
+  lb_ub_for_expr (e, newivs_index, params_index, lb, ub, pbb);
+  type = gcc_type_for_interval (lb, ub);
+
+  mpz_clear (lb);
+  mpz_clear (ub);
+  return type;
+}
+
+/* Walks a CLAST and returns the first user statement found in the
+   body of a loop.  */
+
+static struct clast_user_stmt *
+clast_get_body_of (struct clast_stmt *stmt)
+{
+  if (!stmt
+      || CLAST_STMT_IS_A (stmt, stmt_user))
+    return (struct clast_user_stmt *) stmt;
+
+  if (CLAST_STMT_IS_A (stmt, stmt_for))
+    return clast_get_body_of (((struct clast_for *) stmt)->body);
+
+  if (CLAST_STMT_IS_A (stmt, stmt_guard))
+    return clast_get_body_of (((struct clast_guard *) stmt)->then);
+
+  if (CLAST_STMT_IS_A (stmt, stmt_block))
+    return clast_get_body_of (((struct clast_block *) stmt)->body);
+
+  gcc_unreachable ();
+}
+
 /* Creates a new if region protecting the loop to be executed, if the execution
    count is zero (lb > ub).  */
 
@@ -880,16 +1077,19 @@ graphite_create_new_loop_guard (sese region, edge entry_edge,
 				struct clast_for *stmt,
 				VEC (tree, heap) *newivs,
 				htab_t newivs_index, htab_t params_index,
-				int level, tree *type, tree *lb, tree *ub)
+				tree *type, tree *lb, tree *ub)
 {
   tree cond_expr;
   edge exit_edge;
-  tree lb_type = gcc_type_for_clast_expr (stmt->LB, region, newivs,
-					  newivs_index, params_index);
-  tree ub_type = gcc_type_for_clast_expr (stmt->UB, region, newivs,
-					  newivs_index, params_index);
-
-  *type = gcc_type_for_iv_of_clast_loop (stmt, level, lb_type, ub_type);
+  struct clast_stmt *cstmt = (struct clast_stmt *) stmt;
+  struct clast_user_stmt *body = clast_get_body_of (cstmt);
+  CloogStatement *cs = body->statement;
+  poly_bb_p pbb = (poly_bb_p) cloog_statement_usr (cs);
+  tree lb_type = gcc_type_for_clast_expr_from_lb_ub (stmt->LB, newivs_index,
+						     params_index, pbb);
+  tree ub_type = gcc_type_for_clast_expr_from_lb_ub (stmt->UB, newivs_index,
+						     params_index, pbb);
+  *type = max_precision_type (lb_type, ub_type);
   *lb = clast_to_gcc_expression (*type, stmt->LB, region, newivs,
 				 newivs_index, params_index);
   *ub = clast_to_gcc_expression (*type, stmt->UB, region, newivs,
@@ -942,7 +1142,7 @@ translate_clast_for_loop (sese region, loop_p context_loop,
   struct loop *loop = graphite_create_new_loop (next_e, stmt,
  						context_loop, newivs,
 						newivs_index,
-						type, lb, ub);
+						type, lb, ub, level);
   edge last_e = single_exit (loop);
   edge to_body = single_succ_edge (loop->header);
   basic_block after = to_body->dest;
@@ -982,7 +1182,7 @@ translate_clast_for (sese region, loop_p context_loop, struct clast_for *stmt,
   tree type, lb, ub;
   edge last_e = graphite_create_new_loop_guard (region, next_e, stmt, *newivs,
 						newivs_index, params_index,
-						level, &type, &lb, &ub);
+						&type, &lb, &ub);
   edge true_e = get_true_edge_from_guard_bb (next_e->dest);
 
   translate_clast_for_loop (region, context_loop, stmt, true_e, newivs,
@@ -1423,7 +1623,7 @@ create_params_index (htab_t index_table, CloogProgram *prog) {
   int i;
 
   for (i = 0; i < nb_parameters; i++)
-    save_clast_name_index (index_table, parameters[i], i);
+    save_clast_name_index (index_table, parameters[i], i, i);
 }
 
 /* GIMPLE Loop Generator: generates loops from STMT in GIMPLE form for
diff --git a/gcc/graphite-ppl.h b/gcc/graphite-ppl.h
index 49bde61..5820e19 100644
--- a/gcc/graphite-ppl.h
+++ b/gcc/graphite-ppl.h
@@ -124,6 +124,17 @@ ppl_set_coef_tree (ppl_Linear_Expression_t e, ppl_dimension_type i, tree x)
   mpz_clear (v);
 }
 
+/* Sets RES to the min of V1 and V2.  */
+
+static inline void
+value_min (mpz_t res, mpz_t v1, mpz_t v2)
+{
+  if (mpz_cmp (v1, v2) < 0)
+    mpz_set (res, v1);
+  else
+    mpz_set (res, v2);
+}
+
 /* Sets RES to the max of V1 and V2.  */
 
 static inline void
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 282990f..8e08779 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,6 +1,11 @@
 2011-07-05  Sebastian Pop  <sebastian.pop@amd.com>
 
 	PR tree-optimization/47654
	* gcc.dg/graphite/run-id-pr47654.c: New.
+
+2011-07-05  Sebastian Pop  <sebastian.pop@amd.com>
+
+	PR tree-optimization/47654
 	* gcc.dg/graphite/block-pr47654.c: New.
 
 2011-07-05  Jason Merrill  <jason@redhat.com>
diff --git a/gcc/testsuite/gcc.dg/graphite/run-id-pr47654.c b/gcc/testsuite/gcc.dg/graphite/run-id-pr47654.c
new file mode 100644
index 0000000..c257f58
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/graphite/run-id-pr47654.c
@@ -0,0 +1,24 @@
+/* { dg-options "-O -floop-block" } */
+
+int a[128][40];
+
+void __attribute__ ((noinline, noclone))
+foo (void)
+{
+  int i, j;
+  for (i = 0; i < 40; i++)
+    for (j = 0; j < 128; j++)
+      a[j][i] = 4;
+}
+
+int
+main ()
+{
+  int i, j;
+  foo ();
+  for (i = 0; i < 40; i++)
+    for (j = 0; j < 128; j++)
+      if (a[j][i] != 4)
+	__builtin_abort ();
+  return 0;
+}
-- 
1.7.4.1

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/3] Fix PR47654 and PR49649
  2011-07-07 18:07 [PATCH 0/3] Fix PR47654 and PR49649 Sebastian Pop
                   ` (2 preceding siblings ...)
  2011-07-07 18:15 ` [PATCH 3/3] Fix PR47654: Compute LB and UB of a CLAST expression Sebastian Pop
@ 2011-07-08  8:38 ` Richard Guenther
  2011-07-15 23:08   ` Sebastian Pop
  2011-07-17  7:54 ` Tobias Grosser
  4 siblings, 1 reply; 15+ messages in thread
From: Richard Guenther @ 2011-07-08  8:38 UTC (permalink / raw)
  To: Sebastian Pop; +Cc: gcc-patches, tobias

On Thu, 7 Jul 2011, Sebastian Pop wrote:

> Hi,
> 
> First there are two cleanup patches independent of the fix:
> 
>   Start counting nesting level from 0.
>   Do not compute twice type, lb, and ub.
> 
> Then the patch that fixes PR47654:
> 
>   Fix PR47654: Compute LB and UB of a CLAST expression.
> 
> One of the reasons we cannot determine the IV type only from the
> polyhedral representation is that as in the testcase of PR47654, we
> are asked to generate an induction variable going from 0 to 127.  That
> could be represented with a "char".  However the upper bound
> expression of the loop generated by CLOOG is min (127, 51*scat_1 + 50)
> and that would overflow if we use a "char" type.  To evaluate a type
> in which the expression 51*scat_1 + 50 does not overflow, we have to
> compute an upper and lower bound for the expression.
> 
> To fix the problem exposed by Tobias:
> 
> > for (i = 0 ; i < 2; i++)
> >  for (j = i ; j < i + 1; j++)
> >    for (k = j ; k < j + 1; k++)
> >      for (m = k ; m < k + 1; m++)
> >        for (n = m ; n < m + 1; n++)
> >          A[0] += A[n];
> > 
> > I am a little bit afraid that we will increase the type size by an
> > order of magnitude (or at least one bit) for each nesting level.
> 
> instead of computing the lb and ub of scat_1 in "51*scat_1 + 50" based
> on the type of scat_1 (that we already code generated when building
> the outer loop), we use the polyhedral representation to get an
> accurate lb and ub for scat_1.
> 
> When translating the substitutions of a user statement using this
> precise method, like for example S5 in vect-pr43423.c:
> 
>   for (scat_1=0;scat_1<=min(T_3-1,T_4-1);scat_1++) {
>     S5(scat_1);
> 
> we get a type that is too precise: based on the interval [0,99] we get
> the type "unsigned char" when the type of scat_1 is "int", misleading
> the vectorizer due to the insertion of spurious casts:
> 
> #  Access function 0: (int) {(<unnamed-unsigned:8>) graphite_IV.7_56, +, 1}_3;
> #)
> affine dependence test not usable: access function not affine or constant.
> 
> So we have to keep around the previous code gcc_type_for_clast_* that
> computes the type of an expression as the max precision of the
> components of that expression, and use that when computing the types
> of substitution expressions.
> 
> The patches passed together a full bootstrap and test on amd64-linux.
> Ok for trunk?

The idea sounds good to me and the middle-end-like looking pieces
look good.  I'd appreciate a 2nd look from Tobias.

Thanks,
Richard.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/3] Fix PR47654 and PR49649
  2011-07-08  8:38 ` [PATCH 0/3] Fix PR47654 and PR49649 Richard Guenther
@ 2011-07-15 23:08   ` Sebastian Pop
  0 siblings, 0 replies; 15+ messages in thread
From: Sebastian Pop @ 2011-07-15 23:08 UTC (permalink / raw)
  To: tobias; +Cc: gcc-patches, Richard Guenther

On Fri, Jul 8, 2011 at 03:32, Richard Guenther <rguenther@suse.de> wrote:
> On Thu, 7 Jul 2011, Sebastian Pop wrote:
>
>> Hi,
>>
>> First there are two cleanup patches independent of the fix:
>>
>>   Start counting nesting level from 0.
>>   Do not compute twice type, lb, and ub.
>>
>> Then the patch that fixes PR47654:
>>
>>   Fix PR47654: Compute LB and UB of a CLAST expression.
>>
>> One of the reasons we cannot determine the IV type only from the
>> polyhedral representation is that as in the testcase of PR47654, we
>> are asked to generate an induction variable going from 0 to 127.  That
>> could be represented with a "char".  However the upper bound
>> expression of the loop generated by CLOOG is min (127, 51*scat_1 + 50)
>> and that would overflow if we use a "char" type.  To evaluate a type
>> in which the expression 51*scat_1 + 50 does not overflow, we have to
>> compute an upper and lower bound for the expression.
>>
>> To fix the problem exposed by Tobias:
>>
>> > for (i = 0 ; i < 2; i++)
>> >  for (j = i ; j < i + 1; j++)
>> >    for (k = j ; k < j + 1; k++)
>> >      for (m = k ; m < k + 1; m++)
>> >        for (n = m ; n < m + 1; n++)
>> >          A[0] += A[n];
>> >
>> > I am a little bit afraid that we will increase the type size by an
>> > order of magnitude (or at least one bit) for each nesting level.
>>
>> instead of computing the lb and ub of scat_1 in "51*scat_1 + 50" based
>> on the type of scat_1 (that we already code generated when building
>> the outer loop), we use the polyhedral representation to get an
>> accurate lb and ub for scat_1.
>>
>> When translating the substitutions of a user statement using this
>> precise method, like for example S5 in vect-pr43423.c:
>>
>>   for (scat_1=0;scat_1<=min(T_3-1,T_4-1);scat_1++) {
>>     S5(scat_1);
>>
>> we get a type that is too precise: based on the interval [0,99] we get
>> the type "unsigned char" when the type of scat_1 is "int", misleading
>> the vectorizer due to the insertion of spurious casts:
>>
>> #  Access function 0: (int) {(<unnamed-unsigned:8>) graphite_IV.7_56, +, 1}_3;
>> #)
>> affine dependence test not usable: access function not affine or constant.
>>
>> So we have to keep around the previous code gcc_type_for_clast_* that
>> computes the type of an expression as the max precision of the
>> components of that expression, and use that when computing the types
>> of substitution expressions.
>>
>> The patches passed together a full bootstrap and test on amd64-linux.
>> Ok for trunk?
>
> The idea sounds good to me and the middle-end-like looking pieces
> look good.  I'd appreciate a 2nd look from Tobias.
>

Tobias, could you please have a look at these patches as well?

Thanks,
Sebastian

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/3] Fix PR47654 and PR49649
  2011-07-07 18:07 [PATCH 0/3] Fix PR47654 and PR49649 Sebastian Pop
                   ` (3 preceding siblings ...)
  2011-07-08  8:38 ` [PATCH 0/3] Fix PR47654 and PR49649 Richard Guenther
@ 2011-07-17  7:54 ` Tobias Grosser
  2011-07-17 11:31   ` Sebastian Pop
  4 siblings, 1 reply; 15+ messages in thread
From: Tobias Grosser @ 2011-07-17  7:54 UTC (permalink / raw)
  To: Sebastian Pop; +Cc: gcc-patches, rguenther

On 07/07/2011 08:07 PM, Sebastian Pop wrote:
> Hi,

Hi Sebastian,

sorry for taking so long. Here are some comments from me:

> First there are two cleanup patches independent of the fix:
>
>    Start counting nesting level from 0.
>    Do not compute twice type, lb, and ub.
>
> Then the patch that fixes PR47654:
>
>    Fix PR47654: Compute LB and UB of a CLAST expression.
>
> One of the reasons we cannot determine the IV type only from the
> polyhedral representation is that as in the testcase of PR47654, we
> are asked to generate an induction variable going from 0 to 127.  That
> could be represented with a "char".  However the upper bound
> expression of the loop generated by CLOOG is min (127, 51*scat_1 + 50)
> and that would overflow if we use a "char" type.  To evaluate a type
> in which the expression 51*scat_1 + 50 does not overflow, we have to
> compute an upper and lower bound for the expression.
>
> To fix the problem exposed by Tobias:
>
>> for (i = 0 ; i < 2; i++)
>>   for (j = i ; j < i + 1; j++)
>>     for (k = j ; k < j + 1; k++)
>>       for (m = k ; m < k + 1; m++)
>>         for (n = m ; n < m + 1; n++)
>>           A[0] += A[n];
>>
>> I am a little bit afraid that we will increase the type size by an
>> order of magnitude (or at least one bit) for each nesting level.
>
> instead of computing the lb and ub of scat_1 in "51*scat_1 + 50" based
> on the type of scat_1 (that we already code generated when building
> the outer loop), we use the polyhedral representation to get an
> accurate lb and ub for scat_1.

I think this part is OK.

With the new cloog-isl you may get even more information (and 
consequently smaller types), if you do not only use domain + scattering, 
but intersect it with the subset of the scattering space that the loop 
enumerates (this information is available in cloog-isl).

Another way to get smaller types is to only compute the maximal/minimal 
value each constraint can have.  Let the upper bound be 
ub = min(5+3*n-8*m+33*i, 5*i), which has two constraints: 
5+3*n-8*m+33*i and 5*i.  Here we do not need to ensure that every 
intermediate result during the calculation of 5+3*n-8*m+33*i fits into 
its type.  Overflows during addition and multiplication should not 
affect the final result, as long as the overflow follows modulo 
semantics and as long as the type is large enough to store the final 
value.  Only for calculations like min/max/div/mod do we need to ensure 
that every intermediate value fits in the range of the used type.
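The modulo-semantics point can be checked with a small C sketch using
8-bit unsigned arithmetic (an invented helper, not graphite code): the
intermediate steps wrap around, yet the final value is exact because it
fits the type.

```c
#include <assert.h>
#include <stdint.h>

/* Evaluate 5 + 3*n - 8*m step by step in a uint8_t.  The intermediate
   value 3*n may wrap, and the subtraction may wrap below zero, but
   with modulo semantics the final result is exact whenever the true
   value fits in [0, 255].  */
static uint8_t narrow_eval (uint8_t n, uint8_t m)
{
  uint8_t t = (uint8_t) (3 * n);          /* 3*90 = 270 wraps to 14 */
  t = (uint8_t) (t + 5);
  t = (uint8_t) (t - (uint8_t) (8 * m)); /* may wrap below zero */
  return t;
}
```

For n = 90, m = 30 the true value is 5 + 270 - 240 = 35, and the 8-bit
evaluation returns 35 despite two intermediate wraparounds.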

> When translating the substitutions of a user statement using this
> precise method, like for example S5 in:
>
>    for (scat_1=0;scat_1<=min(T_3-1,T_4-1);scat_1++) {
>      S5(scat_1);
>
> we get a type that is too precise: based on the interval [0,99] we get
> the type "unsigned char" when the type of scat_1 is "int", misleading
> the vectorizer due to the insertion of spurious casts:

Here I am a little lost. You write that using the precise method you get 
an interval [0,99] for the example in vect-pr43423.c. This is the example:

int a[100], b[100], c[100];

void foo(int n, int mid)
{
   int i;
   for(i=0; i<n; i++)
     {
       if (i < mid)
         a[i] = a[i] + b[i];
       else
         a[i] = a[i] + c[i];
     }
}

I do not see, how you would derive this small type, as the parameters of 
function foo() allow a range that is a lot larger.

So lets assume the size of n is known and limited to 100 in the context. 
Like this you may be able to get such a precise type.

> #  Access function 0: (int) {(<unnamed-unsigned:8>) graphite_IV.7_56, +, 1}_3;
> #)
> affine dependence test not usable: access function not affine or constant.
>
> So we have to keep around the previous code gcc_type_for_clast_* that
> computes the type of an expression as the max precision of the
> components of that expression, and use that when computing the types
> of substitution expressions.

Here I am lost again. I understand that very small types may introduce 
more casts (and casts are always difficult to analyze). However, I do 
not understand your solution.

First of all it seems the gcc_type_for_clast ... functions are always 
needed to calculate the types, as just using polyhedral methods would 
not get larger types, as they may be needed for intermediate results. 
Which functions in your patch (or in the current graphite code), are 
only needed to ensure that no problematic casts are inserted? This would 
probably help me to understand how they ensure no problematic casts are 
generated.

Cheers
Tobi

P.S.: I would love to see this upper/lower bound calculation being part 
of CLooG.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/3] Fix PR47654 and PR49649
  2011-07-17  7:54 ` Tobias Grosser
@ 2011-07-17 11:31   ` Sebastian Pop
  2011-07-18  7:04     ` Tobias Grosser
  0 siblings, 1 reply; 15+ messages in thread
From: Sebastian Pop @ 2011-07-17 11:31 UTC (permalink / raw)
  To: Tobias Grosser; +Cc: gcc-patches, rguenther

Hi Tobias,

On Sat, Jul 16, 2011 at 20:02, Tobias Grosser <tobias@grosser.es> wrote:
> Here I am a little lost. You write using the precise method you get an
> interval [0,99] for the example in vect-pr43423.c. This is the example:
>
> int a[100], b[100], c[100];
>
> void foo(int n, int mid)
> {
>  int i;
>  for(i=0; i<n; i++)
>    {
>      if (i < mid)
>        a[i] = a[i] + b[i];
>      else
>        a[i] = a[i] + c[i];
>    }
> }
>
> I do not see, how you would derive this small type, as the parameters of
> function foo() allow a range that is a lot larger.

Accessing a[100] is undefined in C, so i < 100, and so n <= 100.

>> #  Access function 0: (int) {(<unnamed-unsigned:8>) graphite_IV.7_56, +,
>> 1}_3;
>> #)
>> affine dependence test not usable: access function not affine or constant.
>>
>> So we have to keep around the previous code gcc_type_for_clast_* that
>> computes the type of an expression as the max precision of the
>> components of that expression, and use that when computing the types
>> of substitution expressions.
>
> Here I am lost again. I understand that very small types may introduce more
> casts (and casts are always difficult to analyze). However, I do not
> understand your solution.
>
> First of all it seems the gcc_type_for_clast ... functions are always needed
> to calculate the types, as just using polyhedral methods would not get
> larger types, as they may be needed for intermediate results. Which
> functions in your patch (or in the current graphite code), are only needed
> to ensure that no problematic casts are inserted? This would probably help
> me to understand how they ensure no problematic casts are generated.

The current graphite code contains the gcc_type_for_clast functions
that are computing wide enough types for expressions.  The types are
computed based on the types of parameters and based on the types of
the induction variables of the loops surrounding the statement using
that expression.  These functions are not touched by the current patch
set.

The functions lb_ub_for_* added by this patch are the ones that will
be very precise on the lower bound and upper bounds, as they are
computing based on the polyhedral representation, instead of asking
for the type of parameters, or the type of induction variables in an
expression.  The lb_ub_for_* functions are too precise to compute the
type of expressions other than the loop bounds.  This extra precision is
needed to address the problem you exposed of increasingly wider types
for induction variables.
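To make the difference concrete, here is a minimal C model of what the
lb_ub_for_* interval arithmetic does for the running example
51*scat_1 + 50 (using long instead of mpz_t; the helper names are
invented for this sketch):

```c
#include <assert.h>
#include <stdlib.h>

/* Bounds of c*x for x in [xl, xu], mirroring lb_ub_for_term: both
   bounds are scaled by |c|, which preserves the maximal magnitude
   needed for type selection.  */
static void term_bounds (long c, long xl, long xu, long *l, long *u)
{
  long a = labs (c);
  *l = xl * a;
  *u = xu * a;
}

/* Bounds of the sum of two terms, mirroring the clast_red_sum case of
   lb_ub_for_red: the interval covers both operands and their sum.  */
static void sum_bounds (long l1, long u1, long l2, long u2,
                        long *l, long *u)
{
  long lb = l1 < l2 ? l1 : l2;   /* min of lower bounds */
  long ub = u1 > u2 ? u1 : u2;   /* max of upper bounds */
  long sl = lb + l2, su = ub + u2;
  *l = lb < sl ? lb : sl;
  *u = ub > su ? ub : su;
}
```

With scat_1 in [0,1], term_bounds gives [0,51] for 51*scat_1, and
sum_bounds widens that to [0,101] for 51*scat_1 + 50; this mirrors how
the patch derives an interval for the whole expression before choosing
a type for it.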

> P.S.: I would love to see this upper/lower bound calculation being part of
> CLooG.
>

Yes, that would be nice.  Let's convince skimo that we would need types
on CLAST expressions ;-)

Sebastian

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/3] Fix PR47654 and PR49649
  2011-07-17 11:31   ` Sebastian Pop
@ 2011-07-18  7:04     ` Tobias Grosser
  2011-07-18 16:12       ` Sebastian Pop
  0 siblings, 1 reply; 15+ messages in thread
From: Tobias Grosser @ 2011-07-18  7:04 UTC (permalink / raw)
  To: Sebastian Pop; +Cc: gcc-patches, rguenther

On 07/17/2011 07:55 AM, Sebastian Pop wrote:
> Hi Tobias,
>
> On Sat, Jul 16, 2011 at 20:02, Tobias Grosser<tobias@grosser.es>  wrote:
>> Here I am a little lost. You write using the precise method you get an
>> interval [0,99] for the example in vect-pr43423.c. This is the example:
>>
>> int a[100], b[100], c[100];
>>
>> void foo(int n, int mid)
>> {
>>   int i;
>>   for(i=0; i<n; i++)
>>     {
>>       if (i < mid)
>>         a[i] = a[i] + b[i];
>>       else
>>         a[i] = a[i] + c[i];
>>     }
>> }
>>
>> I do not see, how you would derive this small type, as the parameters of
>> function foo() allow a range that is a lot larger.
>
> Accessing a[100] is undefined in C, so i < 100, and so n <= 100.

Sure. I am just surprised we already include this information in 
graphite. Do we add this to the context?

>>> #  Access function 0: (int) {(<unnamed-unsigned:8>) graphite_IV.7_56, +,
>>> 1}_3;
>>> #)
>>> affine dependence test not usable: access function not affine or constant.
>>>
>>> So we have to keep around the previous code gcc_type_for_clast_* that
>>> computes the type of an expression as the max precision of the
>>> components of that expression, and use that when computing the types
>>> of substitution expressions.
>>
>> Here I am lost again. I understand that very small types may introduce more
>> casts (and casts are always difficult to analyze). However, I do not
>> understand your solution.
>>
>> First of all it seems the gcc_type_for_clast ... functions are always needed
>> to calculate the types, as just using polyhedral methods would not get
>> larger types, as they may be needed for intermediate results. Which
>> functions in your patch (or in the current graphite code), are only needed
>> to ensure that no problematic casts are inserted? This would probably help
>> me to understand how they ensure no problematic casts are generated.
>
> The current graphite code contains the gcc_type_for_clast functions
> that are computing wide enough types for expressions.  The types are
> computed based on the types of parameters and based on the types of
> the induction variables of the loops surrounding the statement using
> that expression.  These functions are not touched by the current patch
> set.
>
> The functions lb_ub_for_* added by this patch are the ones that will
> be very precise on the lower bound and upper bounds, as they are
> computing based on the polyhedral representation, instead of asking
> for the type of parameters, or the type of induction variables in an
> expression.  The lb_ub_for_* functions are too precise to compute the
> type of expressions other than the loop bounds.  This extra precision is
> needed to address the problem you exposed of increasingly wider types
> for inductions variables.

OK.

I just looked again through the gcc_type_ ... types and wondered why it 
contains both calls to max_precision_type() and 
max_signed_precision_type(). Is there a reason for this?

I also wondered if the solution of reusing the gcc_type_for_clast 
functions is the right one. As far as I understood the reason we do not 
use the type calculated by the lb_ub functions is that it may be too 
small and that it may introduce too many casts. So where exactly do we 
introduce these casts? Can we find a solution that solves this problem 
more locally?  I mean, instead of using an imprecise algorithm that leads 
to larger types which (accidentally?) leads to fewer casts, we could 
understand the type calculated by lb_ub_... as a minimal type needed for 
correctness. For each instruction that we code generate, we derive the 
type by electing a type that is large enough to be correct and that at 
the same time is not smaller than the smallest type of each operand.

If CLoog one day provides the lb/ub information itself, the migration 
pass will be straightforward.

What do you think?

>> P.S.: I would love to see this upper/lower bound calculation being part of
>> CLooG.

> +static struct clast_user_stmt *
> +clast_get_body_of (struct clast_stmt *stmt)
> +{
> +  if (!stmt
> +      || CLAST_STMT_IS_A (stmt, stmt_user))
> +    return (struct clast_user_stmt *) stmt;
> +
> +  if (CLAST_STMT_IS_A (stmt, stmt_for))
> +    return clast_get_body_of (((struct clast_for *) stmt)->body);
> +
> +  if (CLAST_STMT_IS_A (stmt, stmt_guard))
> +    return clast_get_body_of (((struct clast_guard *) stmt)->then);
> +
> +  if (CLAST_STMT_IS_A (stmt, stmt_block))
> +    return clast_get_body_of (((struct clast_block *) stmt)->body);
> +
> +  gcc_unreachable ();
> +}

The use of this function seems incorrect to me. We seem to only get the
first statement in a loop and use its domain and scattering to derive
the size of the iv type. Let's assume we have this code:

for (s0 = 0; s0 <= n; s0++) {
	if (s0 == 0)
		S1(s0);
	S2(s0);
}

Here we would get a range [0,0] instead of [0,n]. This does not seem to
be correct. A solution would be to completely switch to cloog-isl and
just use the information that the clast provides about the enumerated
space in each loop.

Cheers
Tobi

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/3] Fix PR47654 and PR49649
  2011-07-18  7:04     ` Tobias Grosser
@ 2011-07-18 16:12       ` Sebastian Pop
  2011-07-18 16:48         ` Tobias Grosser
  0 siblings, 1 reply; 15+ messages in thread
From: Sebastian Pop @ 2011-07-18 16:12 UTC (permalink / raw)
  To: Tobias Grosser; +Cc: gcc-patches, rguenther

On Sun, Jul 17, 2011 at 19:41, Tobias Grosser <tobias@grosser.es> wrote:
>> Accessing a[100] is undefined in C, so i < 100, and so n <= 100.
>
> Sure. I am just surprised we already include this information in graphite.
> Do we add this to the context?

Yes, that's added to the context as a bound on the loop iteration domain.

> I just looked again through the gcc_type_ ... types and wondered why it
> contains both calls to max_precision_type() and max_signed_precision_type().
> Is there a reason for this?

Well, it's all the same: max_precision_type calls
max_signed_precision_type when one of the types is signed.

Also we are calling max_signed_precision_type to determine the type of
the induction variable, as we want to generate signed IV types as much
as possible.
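A rough model of that unification rule (a hypothetical struct and
function name, not GCC's tree types or the actual implementation): take
the larger precision, prefer signed, and widen when a same-precision
unsigned operand would not fit in the signed result.

```c
#include <assert.h>

struct itype { int precision; int is_signed; };

/* Sketch of max-precision type unification: the result is wide enough
   for both operands and is signed as soon as one operand is signed.
   A signed result needs one extra step of widening to cover an
   unsigned operand of the same precision.  */
static struct itype max_precision_model (struct itype a, struct itype b)
{
  struct itype r;
  r.is_signed = a.is_signed || b.is_signed;
  r.precision = a.precision > b.precision ? a.precision : b.precision;
  if (r.is_signed
      && ((a.precision == r.precision && !a.is_signed)
	  || (b.precision == r.precision && !b.is_signed)))
    r.precision *= 2;
  return r;
}
```

For example, a signed 32-bit operand and an unsigned 8-bit operand
unify to signed 32-bit, while signed 32-bit and unsigned 32-bit force a
wider signed type.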

> I also wondered if the solution of reusing the gcc_type_for_clast functions
> is the right one. As far as I understood the reason we do not use the type
> calculated by the lb_ub functions is that it may be too small and that it
> may introduce too many casts. So where exactly do we introduce these casts?

We introduce casts when we generate code for an expression with a type
smaller than the types of its constituents.

Here is an example: supposing that we already code generated an IV
graphite_IV with a signed short type, then we are asked to generate
code for the expression "42 * graphite_IV", and finally suppose that
lb_ub returns the interval [-100, 100] for this expression, then we
would have a signed char type for the expression, and so we would
introduce a cast of the graphite_IV to signed char before doing the
multiplication: so I guess the code generated for this expression
would be something like: "42 * (char) graphite_IV" of type char.

Without taking the max of the types of the sub-expressions, you would
have to generate casts for a shorter type.
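The value-preservation side of this can be sketched in C (graphite_IV
modeled as a short; this is an illustration, not generated code).  If
the lb/ub analysis proves 42 * graphite_IV is in [-100, 100], then the
IV is in [-2, 2] and the narrowing cast cannot change the result, even
though it obscures the affine access function:

```c
#include <assert.h>

/* Multiplication after a narrowing cast, as in "42 * (char) iv".
   Safe only because interval analysis bounds iv to [-2, 2].  */
static int mul_narrow (short iv)
{
  return 42 * (signed char) iv;
}

/* The same multiplication without the cast.  */
static int mul_wide (short iv)
{
  return 42 * iv;
}
```

Both functions agree on [-2, 2]; the problem is not correctness but
that the spurious (signed char) cast makes the dependence analyzer
classify the access function as non-affine.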

> Can we find a solution that solves this problem more locally. I mean instead
> of using an imprecise algorithm that leads to larger types which
> (accidentally?) leads to less casts, we could understand the type calculated
> by lb_ub_... as a minimal type needed for correctness. For each instruction
> that we code generate, we derive the type by electing a type that is large
> enough to be correct and that at the same time is not smaller than the
> smallest type of each operand.
>

You surely mean "not smaller than the *largest* type of each operand".

Yes, that's the intention of the gcc_type_for_clast_* functions, get a
max over the types of sub-expressions.

> If CLoog one day provides the lb/ub information itself, the migration pass
> will be straightforward.
>
> What do you think?

I agree with you that CLooG should provide this information for both
induction variables, and for each expression of the CLAST.  We should
not duplicate code doing the same thing in different compilers that
use CLooG.

>> +static struct clast_user_stmt *
>> +clast_get_body_of (struct clast_stmt *stmt)
>> +{
>> +  if (!stmt
>> +      || CLAST_STMT_IS_A (stmt, stmt_user))
>> +    return (struct clast_user_stmt *) stmt;
>> +
>> +  if (CLAST_STMT_IS_A (stmt, stmt_for))
>> +    return clast_get_body_of (((struct clast_for *) stmt)->body);
>> +
>> +  if (CLAST_STMT_IS_A (stmt, stmt_guard))
>> +    return clast_get_body_of (((struct clast_guard *) stmt)->then);
>> +
>> +  if (CLAST_STMT_IS_A (stmt, stmt_block))
>> +    return clast_get_body_of (((struct clast_block *) stmt)->body);
>> +
>> +  gcc_unreachable ();
>> +}
>
> The use of this function seems incorrect to me. We seem to only get the
> first statement in a loop and use its domain and scattering to derive
> the size of the iv type. Let's assume we have this code:
>
> for (s0 = 0; s0 <= n; s0++) {
>        if (s0 == 0)
>                S1(s0);
>        S2(s0);
> }
>
> Here we would get a range [0,0] instead of [0,n]. This does not seem to
> be correct. A solution would be to completely switch to cloog-isl and
> just use the information that the clast provides about the enumerated
> space in each loop.
>

You're right.  However this cleanup should be in a different patch,
probably after another patch set that removes support for CLooG-ppl.

Sebastian

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/3] Fix PR47654 and PR49649
  2011-07-18 16:12       ` Sebastian Pop
@ 2011-07-18 16:48         ` Tobias Grosser
  2011-07-18 17:48           ` Sebastian Pop
  0 siblings, 1 reply; 15+ messages in thread
From: Tobias Grosser @ 2011-07-18 16:48 UTC (permalink / raw)
  To: Sebastian Pop; +Cc: gcc-patches, rguenther

On 07/18/2011 05:03 PM, Sebastian Pop wrote:
> On Sun, Jul 17, 2011 at 19:41, Tobias Grosser<tobias@grosser.es>  wrote:
>>> Accessing a[100] is undefined in C, so i < 100, and so n <= 100.
>>
>> Sure. I am just surprised we already include this information in graphite.
>> Do we add this to the context?
>
> Yes, that's added to the context as a bound on the loop iteration domain.
>
>> I just looked again through the gcc_type_ ... types and wondered why it
>> contains both calls to max_precision_type() and max_signed_precision_type().
>> Is there a reason for this?
>
> Well, it's all the same: max_precision_type calls
> max_signed_precision_type when one of the types is signed.
>
> Also we are calling max_signed_precision_type to determine the type of
> the induction variable, as we want to generate signed IV types as much
> as possible.

OK.

>> I also wondered if the solution of reusing the gcc_type_for_clast functions
>> is the right one. As far as I understood the reason we do not use the type
>> calculated by the lb_ub functions is that it may be too small and that it
>> may introduce too many casts. So where exactly do we introduce these casts?
>
> We introduce casts when we generate code for an expression with a type
> smaller than the types of its constituents.

So the next question is: what is the type of the expression? As far as I 
understand, we decide ourselves which type the expression should have. 
Our correctness constraint is that the type must be large enough to 
contain [lb, ub] of the result as well as all operands. We can choose 
any larger type to reduce the number of casts that will be generated.

> OK.
> Here is an example: supposing that we already code generated an IV
> graphite_IV with a signed short type, then we are asked to generate
> code for the expression "42 * graphite_IV", and finally suppose that
> lb_ub returns the interval [-100, 100] for this expression, then we
> would have a signed char type for the expression
So the signed char type is the smallest type we can use for the expression.

> and so we would
> introduce a cast of the graphite_IV to signed char before doing the
> multiplication: so I guess the code generated for this expression
> would be something like: "42 * (char) graphite_IV" of type char.
> Without taking the max of the types of the sub-expressions, you would
> have to generate casts for a shorter type.

Your example is a little incomplete, as 42 is missing a type. At the 
moment gcc_type_for_value() returns the smallest type that contains the 
scalar value. So I assume the type is a signed char.

If we choose signed char as the type of the expression, we get this:
42 * (char) graphite_IV

On the other hand, if we choose signed short for the expression we get:
((short) 42) * graphite_IV

Both expressions contain a cast. Which one is better for the optimizer 
is something I do not yet understand. However, I believe this is a 
choice that can be taken locally for each expression, using either 
max(validity_type, min(op1_type, op2_type, ...)) to favor small 
types or max(validity_type, max(op1_type, op2_type, ...)) to favor large 
types.

>> Can we find a solution that solves this problem more locally. I mean instead
>> of using an imprecise algorithm that leads to larger types which
>> (accidentally?) leads to less casts, we could understand the type calculated
>> by lb_ub_... as a minimal type needed for correctness. For each instruction
>> that we code generate, we derive the type by electing a type that is large
>> enough to be correct and that at the same time is not smaller than the
>> smallest type of each operand.
>>
>
> You surely mean "not smaller than the *largest* type of each operand".

No. As said above, both seem to be valid choices and this seems to be 
basically an issue of generating code that can be optimized well by 
later transformations.

> Yes, that's the intention of the gcc_type_for_clast_* functions, get a
> max over the types of sub-expressions.

I honestly do not like this function chain. The problem is that we would 
calculate huge types if we derive the type of an expression by only 
taking into account the types of its sub(sub, ...) expressions. At the 
moment we do not get these huge types, as our calculations are not 100% 
correct.

Let's assume we have a CLAST consisting of a single addition of several 
integers.

127 + 127 + 127 + 127

Each integer would get the type signed char. Now the function 
gcc_type_for_clast_red() would calculate the necessary type as max(char, 
char, char, char) => char. However, the correct type is actually max 
(type([lb, ub]), min/max(char, char, char, char)), where type([lb, ub]) = 
type([508, 508]) = short, and the result is max(short, char) = short.

gcc_type_for_clast_term () seems also to just use the type of the 
variable instead of the type that would actually be needed to calculate 
<integer> * variable.

The types would become even larger, as they would increase while walking 
up the CLAST. This is because we never take the actual lb, ub values 
into account, but always derive the minimal type needed for correctness 
from the types of the operands.

I believe that, to derive correct types that are not overly large, doing 
those calculations locally would be better. For each expression we use 
lb/ub to calculate the minimal type needed for correctness, and then we 
choose a type equal to or larger than this type that minimizes the number 
of bad casts.

>> If CLoog one day provides the lb/ub information itself, the migration pass
>> will be straightforward.
>>
>> What do you think?
>
> I agree with you that CLooG should provide this information for both
> induction variables, and for each expression of the CLAST.  We should
> not duplicate code doing the same thing in different compilers that
> use CLooG.
Nice. Now we need to implement this in CLooG. ;-)

>>> +static struct clast_user_stmt *
>>> +clast_get_body_of (struct clast_stmt *stmt)
>>> +{
>>> +  if (!stmt
>>> +      || CLAST_STMT_IS_A (stmt, stmt_user))
>>> +    return (struct clast_user_stmt *) stmt;
>>> +
>>> +  if (CLAST_STMT_IS_A (stmt, stmt_for))
>>> +    return clast_get_body_of (((struct clast_for *) stmt)->body);
>>> +
>>> +  if (CLAST_STMT_IS_A (stmt, stmt_guard))
>>> +    return clast_get_body_of (((struct clast_guard *) stmt)->then);
>>> +
>>> +  if (CLAST_STMT_IS_A (stmt, stmt_block))
>>> +    return clast_get_body_of (((struct clast_block *) stmt)->body);
>>> +
>>> +  gcc_unreachable ();
>>> +}
>>
>> The use of this function seems incorrect to me. We seem to only get the
>> first statement in a loop and use its domain and scattering to derive
>> the size of the iv type. Let's assume we have this code:
>>
>> for (s0 = 0; s0 <= n; s0++) {
>>         if (s0 == 0)
>>                 S1(s0);
>>         S2(s0);
>> }
>>
>> Here we would get a range [0,0] instead of [0,n]. This does not seem to
>> be correct. A solution would be to completely switch to cloog-isl and
>> just use the information that the clast provides about the enumerated
>> space in each loop.
>>
>
> You're right.  However this cleanup should be in a different patch,
> probably after another patch set that removes support for CLooG-ppl.

Yes. It is not worth trying to get this correct based on CLooG-PPL. I 
just wanted to mention this problem. It will be easy to solve with 
cloog-isl.

Cheers
Tobi





* Re: [PATCH 0/3] Fix PR47654 and PR49649
  2011-07-18 16:48         ` Tobias Grosser
@ 2011-07-18 17:48           ` Sebastian Pop
  2011-07-18 22:43             ` Tobias Grosser
  0 siblings, 1 reply; 15+ messages in thread
From: Sebastian Pop @ 2011-07-18 17:48 UTC (permalink / raw)
  To: Tobias Grosser; +Cc: gcc-patches, rguenther

On Mon, Jul 18, 2011 at 11:11, Tobias Grosser <tobias@grosser.es> wrote:
>> We introduce casts when we generate code for an expression with a type
>> smaller than the types of its constituents.
>
> So the next question is: What is the type of the expression. As far as I
> understand we decide ourselves which type the expression should have. Our
> correctness constraint is that the type must be large enough to contain [lb,
> ub] of the result as well as all operands. We can choose any larger type to
> reduce the number of casts that will be generated.
>

Right, and so we cannot only rely on the lb_ub info, and we have to
walk the CLAST expression and determine the largest type used as a
sub-expression: that's what gcc_type_for_clast_expr tries to do.

>> OK.
>>
>> Here is an example: supposing that we already code generated an IV
>> graphite_IV with a signed short type, then we are asked to generate
>> code for the expression "42 * graphite_IV", and finally suppose that
>> lb_ub returns the interval [-100, 100] for this expression, then we
>> would have a signed char type for the expression
>
> So the signed char type is the smallest type we can use for the expression.
>
>> and so we would
>>
>> introduce a cast of the graphite_IV to signed char before doing the
>> multiplication: so I guess the code generated for this expression
>> would be something like: "42 * (char) graphite_IV" of type char.
>> Without taking the max of the types of the sub-expressions, you would
>> have to generate casts for a shorter type.
>
> Your example is a little incomplete, as 42 is missing a type. At the moment
> gcc_type_for_value() returns the smallest type that contains the scalar
> value. So I assume the type is a signed char.
>
> If we choose for as type of the expression signed char we get this:
> 42 * (char) graphite_IV
>
> On the other hand, if we choose signed short for the expression we get:
> ((short) 42) * graphite_IV
>

That would be just "42 * graphite_IV" of type short, as widening from
char to short is possible without extra cast.

> Both expressions contain a cast. Which one is better for the optimizer is
> something I do not yet understand. However, I believe this choice is
> basically a choice that can be taken locally for each expression. Using
>> either max(validity_type, min(op1_type, op2_type, ...)) to favor small types
>> or max(validity_type, max(op1_type, op2_type, ...)) to favor large types.
>
>>> Can we find a solution that solves this problem more locally. I mean
>>> instead
>>> of using an imprecise algorithm that leads to larger types which
>>> (accidentally?) leads to less casts, we could understand the type
>>> calculated
>>> by lb_ub_... as a minimal type needed for correctness. For each
>>> instruction
>>> that we code generate, we derive the type by electing a type that is
>>> large
>>> enough to be correct and that at the same time is not smaller than the
>>> smallest type of each operand.
>>>
>>
>> You surely mean "not smaller than the *largest* type of each operand".
>
> No. As said above, both seem to be valid choices and this seems to be

Here is a counter-example where taking the type "not smaller than the
smallest type of each operand" produces wrong code:

expression "a + b"
type of a is char
type of b is short
a goes from -100 to 100
b goes from 100 to 200

Computing the type of "a + b" as the type "not smaller than the
smallest type of each operand" leads to char.  Then, we would have to
generate the expression "(char) a + (char) b" of type char.  Casting
200 into a char overflows and is undefined.

> I honestly do not like this function chain. The problem is that we would
> calculate huge types, if we derive the type of an expression by only taking
> into account the types of its sub(sub, ...) expressions. At the moment we do
> not get these huge types, as our calculations are not 100% correct.
>
> Lets assume we have a clast consisting of a single addition from several
> integers.
>
> 127 + 127 + 127 + 127
>
> Each integer would get the type signed char. Now the function
> gcc_type_for_clast_red() would calculate the necessary type as max(char,
> char, char, char) => char. However the correct type is actually max
> (type([lb, ub]), min/max(char,char,char,char)). Where type([lb,ub]) =
> type([508,508]) = short and the result is max(short, char) = short.
>

Right.  We should be able to represent both the sub-expressions and
their results in the computed type.  The current implementation of
gcc_type_for_clast_* is wrong.

> gcc_type_for_clast_term () seems also to just use the type of the variable
> instead of the type that would actually be needed to calculate <integer> *
> variable.
>
> The types would become even larger as they would increase while walking up
> the ClAST. This is because, we never take the actual ub,lb values into
> account, but always derive the minimal type needed for correctness from the
> types of the operands.
>
> I believe, to derive correct types that are nor overly large, doing those
> calculations locally would be better. For each expression we use ub/lb to
> calculate the minimal type needed for correctness and then we choose a type
> equal or larger than this type that minimizes the number of bad casts.

You would also need to make sure that every sub-expression can be
expressed in that computed type: the example here is that of a min or
a max expression of a loop bound: even if we know from the iteration
domain that the IV belongs to [-128, 127] we cannot generate a char
for this loop:

for (i = -128; i < min (127, j + 200); i++)

we need a type large enough to be able to represent "j + 200", and
that would be short.  So we need to take again the "max of the types
of sub-expressions".

Sebastian


* Re: [PATCH 0/3] Fix PR47654 and PR49649
  2011-07-18 17:48           ` Sebastian Pop
@ 2011-07-18 22:43             ` Tobias Grosser
  2011-07-18 23:24               ` Sebastian Pop
  0 siblings, 1 reply; 15+ messages in thread
From: Tobias Grosser @ 2011-07-18 22:43 UTC (permalink / raw)
  To: Sebastian Pop; +Cc: gcc-patches, rguenther

On 07/18/2011 07:07 PM, Sebastian Pop wrote:
> On Mon, Jul 18, 2011 at 11:11, Tobias Grosser <tobias@grosser.es> wrote:
>>> We introduce casts when we generate code for an expression with a type
>>> smaller than the types of its constituents.
>>
>> So the next question is: What is the type of the expression. As far as I
>> understand we decide ourselves which type the expression should have. Our
>> correctness constraint is that the type must be large enough to contain [lb,
>> ub] of the result as well as all operands. We can choose any larger type to
>> reduce the number of casts that will be generated.
>>
>
> Right, and so we cannot only rely on the lb_ub info, and we have to
> walk the CLAST expression and determine the largest type used as a
> sub-expression: that's what gcc_type_for_clast_expr tries to do.
>>> OK
>>>
>>> Here is an example: supposing that we already code generated an IV
>>> graphite_IV with a signed short type, then we are asked to generate
>>> code for the expression "42 * graphite_IV", and finally suppose that
>>> lb_ub returns the interval [-100, 100] for this expression, then we
>>> would have a signed char type for the expression
>>
>> So the signed char type is the smallest type we can use for the expression.
>>
>>> and so we would
>>>
>>> introduce a cast of the graphite_IV to signed char before doing the
>>> multiplication: so I guess the code generated for this expression
>>> would be something like: "42 * (char) graphite_IV" of type char.
>>> Without taking the max of the types of the sub-expressions, you would
>>> have to generate casts for a shorter type.
>>
>> Your example is a little incomplete, as 42 is missing a type. At the moment
>> gcc_type_for_value() returns the smallest type that contains the scalar
>> value. So I assume the type is a signed char.
>>
>> If we choose for as type of the expression signed char we get this:
>> 42 * (char) graphite_IV
>>
>> On the other hand, if we choose signed short for the expression we get:
>> ((short) 42) * graphite_IV
>>
>
> That would be just "42 * graphite_IV" of type short, as widening from
> char to short is possible without extra cast.
OK.

>> Both expressions contain a cast. Which one is better for the optimizer is
>> something I do not yet understand. However, I believe this choice is
>> basically a choice that can be taken locally for each expression. Using
>> either max(validity_type, min(op1_type, op2_type, ...)) to favor small types
>> or max(validity_type, max(op1_type, op2_type, ...)) to favor large types.
>>
>>>> Can we find a solution that solves this problem more locally. I mean
>>>> instead
>>>> of using an imprecise algorithm that leads to larger types which
>>>> (accidentally?) leads to less casts, we could understand the type
>>>> calculated
>>>> by lb_ub_... as a minimal type needed for correctness. For each
>>>> instruction
>>>> that we code generate, we derive the type by electing a type that is
>>>> large
>>>> enough to be correct and that at the same time is not smaller than the
>>>> smallest type of each operand.
>>>>
>>>
>>> You surely mean "not smaller than the *largest* type of each operand".
>>
>> No. As said above, both seem to be valid choices and this seems to be
>
> Here is a counter-example where taking the type "not smaller than the
> smallest type of each operand" produces wrong code:
>
> expression "a + b"
> type of a is char
> type of b is short
> a goes from -100 to 100
> b goes from 100 to 200
>
> Computing the type of "a + b" as the type "not smaller than the
> smallest type of each operand" leads to char.  Then, we would have to
> generate the expression "(char) a + (char) b" of type char.  Casting
> 200 into a char overflows and is undefined.

You are right. In case we want to downcast b we would also need to prove 
that b itself fits into the smaller type. (This may happen if the 
larger type was introduced earlier to remove casts, but is not needed to 
store the result). In most cases widening is probably easier.

>> I honestly do not like this function chain. The problem is that we would
>> calculate huge types, if we derive the type of an expression by only taking
>> into account the types of its sub(sub, ...) expressions. At the moment we do
>> not get these huge types, as our calculations are not 100% correct.
>>
>> Lets assume we have a clast consisting of a single addition from several
>> integers.
>>
>> 127 + 127 + 127 + 127
>>
>> Each integer would get the type signed char. Now the function
>> gcc_type_for_clast_red() would calculate the necessary type as max(char,
>> char, char, char) =>  char. However the correct type is actually max
>> (type([lb, ub]), min/max(char,char,char,char)). Where type([lb,ub]) =
>> type([508,508]) = short and the result is max(short, char) = short.
>>
>
> Right.  We should be able to represent both the sub-expressions and
> their results in the computed type.  The current implementation of
> gcc_type_for_clast_* is wrong.
>
>> gcc_type_for_clast_term () seems also to just use the type of the variable
>> instead of the type that would actually be needed to calculate <integer> *
>> variable.
>>
>> The types would become even larger as they would increase while walking up
>> the ClAST. This is because, we never take the actual ub,lb values into
>> account, but always derive the minimal type needed for correctness from the
>> types of the operands.
>>
>> I believe, to derive correct types that are nor overly large, doing those
>> calculations locally would be better. For each expression we use ub/lb to
>> calculate the minimal type needed for correctness and then we choose a type
>> equal or larger than this type that minimizes the number of bad casts.
>
> You would also need to make sure that every sub-expression can be
> expressed in that computed type: the example here is that of a min or
> a max expression of a loop bound: even if we know from the iteration
> domain that the IV belongs to [-128, 127] we cannot generate a char
> for this loop:

Sure. Only taking the lb/ub of the loop induction variable is not enough 
to derive a correct type for the whole clast. We need to get the right 
type for each subexpression.

> for (i = -128; i < min (127, j + 200); i++)
>
> we need a type large enough to be able to represent "j + 200", and
> that would be short.  So we need to take again the "max of the types
> of sub-expressions".

What is the "max of the types of sub-expressions"? Just the largest type 
is in general not large enough: a+b may not fit into the type of a nor 
into the type of b. However, assuming a and b may have any value that the 
types of a and b allow would require us to assume a type that is 
definitely larger than the type of each argument (we do not do this; 
that's why some functions are incorrect). Implementing this approach 
would lead to an explosion in type size.

What I propose is that you take advantage of your newly built lb/ub 
infrastructure and that we calculate for every subexpression the type as 
follows.

minimal_correct_type = max(min_type(lb_result, ub_result),
                            min_type(lb_op1, ub_op1),
                            min_type(lb_op2, ub_op2))

And then we choose a type that is equal or larger than 
minimal_correct_type and that reduces the number of introduced downcasts.

Cheers
Tobi





* Re: [PATCH 0/3] Fix PR47654 and PR49649
  2011-07-18 22:43             ` Tobias Grosser
@ 2011-07-18 23:24               ` Sebastian Pop
  2011-07-18 23:36                 ` Tobias Grosser
  0 siblings, 1 reply; 15+ messages in thread
From: Sebastian Pop @ 2011-07-18 23:24 UTC (permalink / raw)
  To: Tobias Grosser; +Cc: gcc-patches, rguenther

On Mon, Jul 18, 2011 at 16:40, Tobias Grosser <tobias@grosser.es> wrote:
> You are right. In case we want to downcast b we would also need to prove
> that b itself fits into the smaller type. (This may happen, if the larger
> type was introduced earlier to remove casts, but is not needed to store the
> result).

[...]

>> for (i = -128; i<  min (127, j + 200); i++)
>>
>> we need a type large enough to be able to represent "j + 200", and
>> that would be short.  So we need to take again the "max of the types
>> of sub-expressions".
>
> What is the "max of the types of sub-expressions"? Just the largest type is
> in general not large enough: a+b may not fit into the type of a nor into the
> type of b. However, assuming a and b may have any value that the types of a
> and b allow would require us to assume a type that is definitely larger
> than the type of each argument (we do not do this; that's why some functions
> are incorrect). Implementing this approach would lead to an explosion in
> type size.
>
> What I propose is that you take advantage of your newly built lb/ub
> infrastructure and that we calculate for every subexpression the type as
> follows.
>
> minimal_correct_type = max(min_type(lb_result, ub_result),
>                           min_type(lb_op1, ub_op1),
>                           min_type(lb_op2, ub_op2))
>
> And then we choose a type that is equal or larger than minimal_correct_type
> and that reduces the number of introduced downcasts.

minimal_correct_type wouldn't work on this example:

Suppose we have a loop that we have already generated with an
induction variable of type short for the same reasons as in the
previous example, and such that the lb_ub of the IV is still in the
range of char:

for (i = -100; i < min (100, j + 200); i++)

Now suppose that we want to determine the type of expression "i + 1".
Based on the lb_ub information of i, we would compute that
minimal_correct_type is char, because the result of "i + 1" is in
[-99, 101], so that's char and the lb_ub of the op1 and op2 are char
as well, so max (char, char, char) = char.

Generating code for "i + 1" of type char would need a cast of i from
short to char, and then computing the plus expr in the char type:
"(char) i + 1", and here you cannot eliminate the cast.

So if CLooG's answer to "what is the type of i + 1" is char, then you
would not be able to remove that cast to make the loop vectorizable.
In this particular example, I really want CLooG to answer "the type of
i + 1 is short", such that I do not need any extra casts between the
main induction variable and the plus expression.

What about the following formula that integrates not only the lb_ub
info (minimal_correct_type as you defined it), but also the types of
the sub-expressions:

minimal_correct_type_1 = max(minimal_correct_type, type_op1, type_op2)

Can you think at an example where that would still lead to an
"explosion in type size"?

Sebastian


* Re: [PATCH 0/3] Fix PR47654 and PR49649
  2011-07-18 23:24               ` Sebastian Pop
@ 2011-07-18 23:36                 ` Tobias Grosser
  0 siblings, 0 replies; 15+ messages in thread
From: Tobias Grosser @ 2011-07-18 23:36 UTC (permalink / raw)
  To: Sebastian Pop; +Cc: gcc-patches, rguenther

On 07/19/2011 12:33 AM, Sebastian Pop wrote:
> On Mon, Jul 18, 2011 at 16:40, Tobias Grosser <tobias@grosser.es> wrote:
>> You are right. In case we want to downcast b we would also need to prove
>> that b itself fits into the smaller type. (This may happen, if the larger
>> type was introduced earlier to remove casts, but is not needed to store the
>> result).
>
> [...]
>
>>> for (i = -128; i < min (127, j + 200); i++)
>>>
>>> we need a type large enough to be able to represent "j + 200", and
>>> that would be short.  So we need to take again the "max of the types
>>> of sub-expressions".
>>
>> What is the "max of the types of sub-expressions"? Just the largest type is
>> in general not large enough: a+b may not fit into the type of a nor into the
>> type of b. However, assuming a and b may have any value that the types of a
>> and b allow would require us to assume a type that is definitely larger
>> than the type of each argument (we do not do this; that's why some functions
>> are incorrect). Implementing this approach would lead to an explosion in
>> type size.
>>
>> What I propose is that you take advantage of your newly built lb/ub
>> infrastructure and that we calculate for every subexpression the type as
>> follows.
>>
>> minimal_correct_type = max(min_type(lb_result, ub_result),
>>                            min_type(lb_op1, ub_op1),
>>                            min_type(lb_op2, ub_op2))
>>
>> And then we choose a type that is equal or larger than minimal_correct_type
>> and that reduces the number of introduced downcasts.
>
> minimal_correct_type wouldn't work on this example:
>
> Suppose we have a loop that we have already generated with an
> induction variable of type short for the same reasons as in the
> previous example, and such that the lb_ub of the IV is still in the
> range of char:
>
> for (i = -100; i < min (100, j + 200); i++)
>
> Now suppose that we want to determine the type of expression "i + 1".
> Based on the lb_ub information of i, we would compute that
> minimal_correct_type is char, because the result of "i + 1" is in
> [-99, 101], so that's char and the lb_ub of the op1 and op2 are char
> as well, so max (char, char, char) = char.
>
> Generating code for "i + 1" of type char would need a cast of i from
> short to char, and then computing the plus expr in the char type:
> "(char) i + 1", and here you cannot eliminate the cast.
>
> So if CLooG's answer to "what is the type of i + 1" is char, then you
> would not be able to remove that cast to make the loop vectorizable.
> In this particular example, I really want CLooG to answer "the type of
> i + 1 is short", such that I do not need any extra casts between the
> main induction variable and the plus expression.
>
> What about the following formula that integrates not only the lb_ub
> info (minimal_correct_type as you defined it), but also the types of
> the sub-expressions:
>
> minimal_correct_type_1 = max(minimal_correct_type, type_op1, type_op2)

This formula sounds reasonable and is one solution for what I meant with

"And then we choose a type that is equal or larger than 
minimal_correct_type and that reduces the number of introduced downcast"
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

So by adding the max with the op1 and op2 types, we try to reduce the 
number of introduced downcasts. I am not sure if your proposed 
formula is optimal in all cases; however, it seems to be a good start.

> Can you think at an example where that would still lead to an
> "explosion in type size"?

No. This formula is correct and I do not think it would lead to any 
explosion. We would not use minimal types, but I agree that minimizing 
the types should only be done if we know it will be beneficial, even 
though it introduces additional casts. We can think later about such an 
optimization.

With your formula we have a criterion that allows us to select the type 
for each subexpression locally. I like this a lot better.

Cheers
Tobi

P.S.: With respect to getting these changes in, it may be best to just 
commit the patches you already have and keep improving from there. The 
patches you already have do not introduce any regressions; they just 
do not fix all problems.


end of thread, other threads:[~2011-07-18 22:52 UTC | newest]

Thread overview: 15+ messages
2011-07-07 18:07 [PATCH 0/3] Fix PR47654 and PR49649 Sebastian Pop
2011-07-07 18:07 ` [PATCH 1/3] Start counting nesting level from 0 and use the standard "Polyhedral SCattering Transformed" psct_* interface Sebastian Pop
2011-07-07 18:08 ` [PATCH 2/3] Do not compute twice type, lb, and ub Sebastian Pop
2011-07-07 18:15 ` [PATCH 3/3] Fix PR47654: Compute LB and UB of a CLAST expression Sebastian Pop
2011-07-08  8:38 ` [PATCH 0/3] Fix PR47654 and PR49649 Richard Guenther
2011-07-15 23:08   ` Sebastian Pop
2011-07-17  7:54 ` Tobias Grosser
2011-07-17 11:31   ` Sebastian Pop
2011-07-18  7:04     ` Tobias Grosser
2011-07-18 16:12       ` Sebastian Pop
2011-07-18 16:48         ` Tobias Grosser
2011-07-18 17:48           ` Sebastian Pop
2011-07-18 22:43             ` Tobias Grosser
2011-07-18 23:24               ` Sebastian Pop
2011-07-18 23:36                 ` Tobias Grosser
