* [PATCH][GRAPHITE] More TLC
@ 2017-09-22 13:03 Richard Biener
2017-09-22 14:30 ` Sebastian Pop
2017-09-25 7:39 ` [PATCH][GRAPHITE] More TLC Richard Biener
0 siblings, 2 replies; 26+ messages in thread
From: Richard Biener @ 2017-09-22 13:03 UTC (permalink / raw)
To: gcc-patches
This simplifies canonicalize_loop_closed_ssa and does other minimal
TLC. It also adds a testcase I reduced from a stupid mistake I made
when reworking canonicalize_loop_closed_ssa.
Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.
SPEC CPU 2006 is happy with it, current statistics on x86_64 with
-Ofast -march=haswell -floop-nest-optimize are
61 loop nests "optimized"
45 loop nest transforms cancelled because of code generation issues
21 loop nest optimizations timed out the 350000 ISL "operations" we allow
I say "optimized" because the usual transform I've seen is static tiling
as enforced by GRAPHITE according to --param loop-block-tile-size.
There's no way to automagically figure what kind of transform ISL did
(usually none with the schedule identical check confused by FILTER
stuff positioning). This is also the issue with most GRAPHITE testcases.
We can't really verify easily whether we performed loop interchange
or not. We can probably tell whether we applied loop fusion or
splitting (by counting loops).
I'm not aware of any remaining ICEs / wrong-code issues with GRAPHITE.
I'm aware that the current "black-box" granularity hinders
scheduling freedom (each GIMPLE BB is mapped to a ISL stmt, this
is too coarse to schedule say two writes in a BB independently
from each other). Quick experiments could be done by simply
splitting gimple BBs at some points.
I'm aware that the SCOP detection algorithm assumes that it can
walk loop->next and find loops "in order" -- but while that's
true for the initial flow_loops_find result (DFS walk) it isn't
true for any later created / discovered loops. Sorting of
loop siblings in DFS order should be easy (and a general cfgloopanal
helper).
Richard.
2017-09-22 Richard Biener <rguenther@suse.de>
* graphite-isl-ast-to-gimple.c (graphite_verify): Inline into
single caller.
(graphite_regenerate_ast_isl): Do not reset SCEV. Move debug
print of no dependency loops ...
* graphite.c (graphite_transform_loops): ... here.
(canonicalize_loop_closed_ssa_form): Work from inner to outer
loops.
(same_close_phi_node, remove_duplicate_close_phi,
make_close_phi_nodes_unique, defined_in_loop_p): Fold into ...
(canonicalize_loop_closed_ssa): ... here and simplify.
* graphite-optimize-isl.c: Include tree-vectorizer.h.
(optimize_isl): Use dump_printf_loc to tell when we stopped
optimizing because of an ISL timeout.
* gcc.dg/graphite/scop-24.c: New testcase.
Index: gcc/graphite-isl-ast-to-gimple.c
===================================================================
--- gcc/graphite-isl-ast-to-gimple.c (revision 253091)
+++ gcc/graphite-isl-ast-to-gimple.c (working copy)
@@ -73,15 +73,6 @@ struct ast_build_info
bool is_parallelizable;
};
-/* Verifies properties that GRAPHITE should maintain during translation. */
-
-static inline void
-graphite_verify (void)
-{
- checking_verify_loop_structure ();
- checking_verify_loop_closed_ssa (true);
-}
-
/* IVS_PARAMS maps isl's scattering and parameter identifiers
to corresponding trees. */
@@ -2997,8 +2988,9 @@ graphite_regenerate_ast_isl (scop_p scop
delete_loop (loop);
}
- graphite_verify ();
- scev_reset ();
+ /* Verifies properties that GRAPHITE should maintain during translation. */
+ checking_verify_loop_structure ();
+ checking_verify_loop_closed_ssa (true);
free (if_region->true_region);
free (if_region->region);
@@ -3008,19 +3000,6 @@ graphite_regenerate_ast_isl (scop_p scop
isl_ast_node_free (root_node);
timevar_pop (TV_GRAPHITE_CODE_GEN);
- if (dump_file && (dump_flags & TDF_DETAILS))
- {
- loop_p loop;
- int num_no_dependency = 0;
-
- FOR_EACH_LOOP (loop, 0)
- if (loop->can_be_parallel)
- num_no_dependency++;
-
- fprintf (dump_file, "%d loops carried no dependency.\n",
- num_no_dependency);
- }
-
return !t.codegen_error_p ();
}
Index: gcc/graphite-optimize-isl.c
===================================================================
--- gcc/graphite-optimize-isl.c (revision 253091)
+++ gcc/graphite-optimize-isl.c (working copy)
@@ -37,6 +37,7 @@ along with GCC; see the file COPYING3.
#include "tree-data-ref.h"
#include "params.h"
#include "dumpfile.h"
+#include "tree-vectorizer.h"
#include "graphite.h"
@@ -156,9 +157,12 @@ optimize_isl (scop_p scop)
if (!scop->transformed_schedule
|| isl_ctx_last_error (scop->isl_context) == isl_error_quota)
{
- if (dump_file && dump_flags)
- fprintf (dump_file, "isl timed out --param max-isl-operations=%d\n",
- max_operations);
+ location_t loc = find_loop_location
+ (scop->scop_info->region.entry->dest->loop_father);
+ dump_printf_loc (MSG_MISSED_OPTIMIZATION, loc,
+ "loop nest not optimized, optimization timed out "
+ "after %d operations [--param max-isl-operations]\n",
+ max_operations);
return false;
}
Index: gcc/graphite.c
===================================================================
--- gcc/graphite.c (revision 253091)
+++ gcc/graphite.c (working copy)
@@ -293,86 +293,6 @@ free_scops (vec<scop_p> scops)
scops.release ();
}
-/* Returns true when P1 and P2 are close phis with the same
- argument. */
-
-static inline bool
-same_close_phi_node (gphi *p1, gphi *p2)
-{
- return (types_compatible_p (TREE_TYPE (gimple_phi_result (p1)),
- TREE_TYPE (gimple_phi_result (p2)))
- && operand_equal_p (gimple_phi_arg_def (p1, 0),
- gimple_phi_arg_def (p2, 0), 0));
-}
-
-static void make_close_phi_nodes_unique (basic_block bb);
-
-/* Remove the close phi node at GSI and replace its rhs with the rhs
- of PHI. */
-
-static void
-remove_duplicate_close_phi (gphi *phi, gphi_iterator *gsi)
-{
- gimple *use_stmt;
- use_operand_p use_p;
- imm_use_iterator imm_iter;
- tree res = gimple_phi_result (phi);
- tree def = gimple_phi_result (gsi->phi ());
-
- gcc_assert (same_close_phi_node (phi, gsi->phi ()));
-
- FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, def)
- {
- FOR_EACH_IMM_USE_ON_STMT (use_p, imm_iter)
- SET_USE (use_p, res);
-
- update_stmt (use_stmt);
-
- /* It is possible that we just created a duplicate close-phi
- for an already-processed containing loop. Check for this
- case and clean it up. */
- if (gimple_code (use_stmt) == GIMPLE_PHI
- && gimple_phi_num_args (use_stmt) == 1)
- make_close_phi_nodes_unique (gimple_bb (use_stmt));
- }
-
- remove_phi_node (gsi, true);
-}
-
-/* Removes all the close phi duplicates from BB. */
-
-static void
-make_close_phi_nodes_unique (basic_block bb)
-{
- gphi_iterator psi;
-
- for (psi = gsi_start_phis (bb); !gsi_end_p (psi); gsi_next (&psi))
- {
- gphi_iterator gsi = psi;
- gphi *phi = psi.phi ();
-
- /* At this point, PHI should be a close phi in normal form. */
- gcc_assert (gimple_phi_num_args (phi) == 1);
-
- /* Iterate over the next phis and remove duplicates. */
- gsi_next (&gsi);
- while (!gsi_end_p (gsi))
- if (same_close_phi_node (phi, gsi.phi ()))
- remove_duplicate_close_phi (phi, &gsi);
- else
- gsi_next (&gsi);
- }
-}
-
-/* Return true when NAME is defined in LOOP. */
-
-static bool
-defined_in_loop_p (tree name, loop_p loop)
-{
- gcc_assert (TREE_CODE (name) == SSA_NAME);
- return loop == loop_containing_stmt (SSA_NAME_DEF_STMT (name));
-}
-
/* Transforms LOOP to the canonical loop closed SSA form. */
static void
@@ -380,20 +300,22 @@ canonicalize_loop_closed_ssa (loop_p loo
{
edge e = single_exit (loop);
basic_block bb;
+ gphi_iterator psi;
if (!e || (e->flags & EDGE_COMPLEX))
return;
bb = e->dest;
+ /* Make the loop-close PHI node BB contain only PHIs and have a
+ single predecessor. */
if (single_pred_p (bb))
{
e = split_block_after_labels (bb);
- make_close_phi_nodes_unique (e->src);
+ bb = e->src;
}
else
{
- gphi_iterator psi;
basic_block close = split_edge (e);
e = single_succ_edge (close);
for (psi = gsi_start_phis (bb); !gsi_end_p (psi); gsi_next (&psi))
@@ -404,7 +326,7 @@ canonicalize_loop_closed_ssa (loop_p loo
/* Only add close phi nodes for SSA_NAMEs defined in LOOP. */
if (TREE_CODE (arg) != SSA_NAME
- || !defined_in_loop_p (arg, loop))
+ || loop_containing_stmt (SSA_NAME_DEF_STMT (arg)) != loop)
continue;
tree res = copy_ssa_name (arg);
@@ -413,8 +335,30 @@ canonicalize_loop_closed_ssa (loop_p loo
UNKNOWN_LOCATION);
SET_USE (use_p, res);
}
+ bb = close;
+ }
- make_close_phi_nodes_unique (close);
+ /* Eliminate duplicates. This relies on processing loops from
+ innermost to outer. */
+ for (psi = gsi_start_phis (bb); !gsi_end_p (psi); gsi_next (&psi))
+ {
+ gphi_iterator gsi = psi;
+ gphi *phi = psi.phi ();
+
+ /* At this point, PHI should be a close phi in normal form. */
+ gcc_assert (gimple_phi_num_args (phi) == 1);
+
+ /* Iterate over the next phis and remove duplicates. */
+ gsi_next (&gsi);
+ while (!gsi_end_p (gsi))
+ if (gimple_phi_arg_def (phi, 0) == gimple_phi_arg_def (gsi.phi (), 0))
+ {
+ replace_uses_by (gimple_phi_result (gsi.phi ()),
+ gimple_phi_result (phi));
+ remove_phi_node (&gsi, true);
+ }
+ else
+ gsi_next (&gsi);
}
}
@@ -443,7 +387,7 @@ static void
canonicalize_loop_closed_ssa_form (void)
{
loop_p loop;
- FOR_EACH_LOOP (loop, 0)
+ FOR_EACH_LOOP (loop, LI_FROM_INNERMOST)
canonicalize_loop_closed_ssa (loop);
checking_verify_loop_closed_ssa (true);
@@ -509,6 +453,19 @@ graphite_transform_loops (void)
"loop nest optimized\n");
}
+ if (dump_file && (dump_flags & TDF_DETAILS))
+ {
+ loop_p loop;
+ int num_no_dependency = 0;
+
+ FOR_EACH_LOOP (loop, 0)
+ if (loop->can_be_parallel)
+ num_no_dependency++;
+
+ fprintf (dump_file, "%d loops carried no dependency.\n",
+ num_no_dependency);
+ }
+
free_scops (scops);
graphite_finalize (need_cfg_cleanup_p);
the_isl_ctx = NULL;
Index: gcc/testsuite/gcc.dg/graphite/scop-24.c
===================================================================
--- gcc/testsuite/gcc.dg/graphite/scop-24.c (nonexistent)
+++ gcc/testsuite/gcc.dg/graphite/scop-24.c (working copy)
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast -floop-nest-optimize" } */
+
+typedef struct _IO_FILE FILE;
+extern struct _IO_FILE *stderr;
+typedef float real;
+typedef real rvec[3];
+int rgbset (int);
+void ps_box (int, int);
+void plot_phi(char *fn,rvec box,int natoms,rvec x[],real phi[])
+{
+ real phi_max,rr,gg,bb,fac,dx,x0,y0;
+ int i;
+ for(i=0; (i<natoms); i++)
+ phi_max=((phi_max > __builtin_fabs(phi[i]))
+ ? phi_max : __builtin_fabs(phi[i]));
+ if (__builtin_fabs(phi_max)<1.2e-38)
+ __builtin_fprintf(stderr, "X");
+ ps_box((real)(fac*box[0]-1),(real)(fac*box[1]-1));
+ for(i=0; (i<natoms); i++)
+ {
+ rr=gg=bb=1.0;
+ if (phi[i] < 0)
+ gg=bb=(1.0+(phi[i]/phi_max));
+ else
+ rr=gg=(1.0-(phi[i]/phi_max));
+ rr=rgbset(rr);
+ }
+}
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH][GRAPHITE] More TLC
2017-09-22 13:03 [PATCH][GRAPHITE] More TLC Richard Biener
@ 2017-09-22 14:30 ` Sebastian Pop
2017-09-25 13:12 ` Richard Biener
2017-09-25 7:39 ` [PATCH][GRAPHITE] More TLC Richard Biener
1 sibling, 1 reply; 26+ messages in thread
From: Sebastian Pop @ 2017-09-22 14:30 UTC (permalink / raw)
To: Richard Biener; +Cc: GCC Patches
[-- Attachment #1: Type: text/plain, Size: 3637 bytes --]
On Fri, Sep 22, 2017 at 8:03 AM, Richard Biener <rguenther@suse.de> wrote:
>
> This simplifies canonicalize_loop_closed_ssa and does other minimal
> TLC. It also adds a testcase I reduced from a stupid mistake I made
> when reworking canonicalize_loop_closed_ssa.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.
>
> SPEC CPU 2006 is happy with it, current statistics on x86_64 with
> -Ofast -march=haswell -floop-nest-optimize are
>
> 61 loop nests "optimized"
> 45 loop nest transforms cancelled because of code generation issues
> 21 loop nest optimizations timed out the 350000 ISL "operations" we allow
>
> I say "optimized" because the usual transform I've seen is static tiling
> as enforced by GRAPHITE according to --param loop-block-tile-size.
> There's no way to automagically figure what kind of transform ISL did
>
Here is how to automate (without magic) the detection
of the transform that isl did.
The problem solved by isl is the minimization of strides
in memory, and to do this, we need to tell the isl scheduler
the validity dependence graph, in graphite-optimize-isl.c
see the validity (RAW, WAR, WAW) and the proximity
(RAR + validity) maps. The proximity does include the
read after read, as the isl scheduler needs to minimize
strides between consecutive reads.
When you apply the schedule to the dependence graph,
one can tell from the result the strides in memory, a good
way to say whether a transform was beneficial is to sum up
all memory strides, and make sure that the sum of all strides
decreases after transform. We could add a printf with the
sum of strides before and after transforms, and have the
testcases check for that.
(usually none with the schedule identical check confused by FILTER
> stuff positioning). This is also the issue with most GRAPHITE testcases.
> We can't really verify easily whether we performed loop interchange
> or not. We can probably tell whether we applied loop fusion or
> splitting (by counting loops).
>
> I'm not aware of any remaining ICEs / wrong-code issues with GRAPHITE.
>
> I'm aware that the current "black-box" granularity hinders
> scheduling freedom (each GIMPLE BB is mapped to a ISL stmt, this
> is too coarse to schedule say two writes in a BB independently
> from each other). Quick experiments could be done by simply
> splitting gimple BBs at some points.
>
> I'm aware that the SCOP detection algorithm assumes that it can
> walk loop->next and find loops "in order" -- but while that's
> true for the initial flow_loops_find result (DFS walk) it isn't
> true for any later created / discovered loops. Sorting of
> loop siblings in DFS order should be easy (and a general cfgloopanal
> helper).
>
> Richard.
>
> 2017-09-22 Richard Biener <rguenther@suse.de>
>
> * graphite-isl-ast-to-gimple.c (graphite_verify): Inline into
> single caller.
> (graphite_regenerate_ast_isl): Do not reset SCEV. Move debug
> print of no dependency loops ...
> * graphite.c (graphite_transform_loops): ... here.
> (canonicalize_loop_closed_ssa_form): Work from inner to outer
> loops.
> (same_close_phi_node, remove_duplicate_close_phi,
> make_close_phi_nodes_unique, defined_in_loop_p): Fold into ...
> (canonicalize_loop_closed_ssa): ... here and simplify.
> * graphite-optimize-isl.c: Include tree-vectorizer.h.
> (optimize_isl): Use dump_printf_loc to tell when we stopped
> optimizing because of an ISL timeout.
>
> * gcc.dg/graphite/scop-24.c: New testcase.
>
>
The change looks good to me.
Thanks,
Sebastian
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH][GRAPHITE] More TLC
2017-09-22 13:03 [PATCH][GRAPHITE] More TLC Richard Biener
2017-09-22 14:30 ` Sebastian Pop
@ 2017-09-25 7:39 ` Richard Biener
2017-09-25 12:46 ` Richard Biener
1 sibling, 1 reply; 26+ messages in thread
From: Richard Biener @ 2017-09-25 7:39 UTC (permalink / raw)
To: gcc-patches
On Fri, 22 Sep 2017, Richard Biener wrote:
>
> This simplifies canonicalize_loop_closed_ssa and does other minimal
> TLC. It also adds a testcase I reduced from a stupid mistake I made
> when reworking canonicalize_loop_closed_ssa.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.
>
> SPEC CPU 2006 is happy with it, current statistics on x86_64 with
> -Ofast -march=haswell -floop-nest-optimize are
>
> 61 loop nests "optimized"
> 45 loop nest transforms cancelled because of code generation issues
> 21 loop nest optimizations timed out the 350000 ISL "operations" we allow
Overall compile time (with -j6) is 695 sec. w/o -floop-nest-optimize
and 709 sec. with (this was with release checking).
A single-run has 416.gamess (580s -> 618s),
436.cactusADM (206s -> 182s), 437.leslie3d (228s ->218s),
450.soplex (229s -> 226s), 465.tonto (428s -> 425s), 401.bzip2 (383s ->
379s), 462.libquantum (352s -> 343s), ignoring +-2s changes. Will
do a 3-run for those to confirm (it would be only a single regression
for 416.gamess).
Sofar I'm positively surprised given the limitations (and inefficiencies)
I know.
I'll add some more opt-info stuff to assess the number of SCOPs we
detect but discard during further analysis and the number of transforms
we cancel because they turn out as a no-op.
Richard.
> {
> - if (dump_file && dump_flags)
> - fprintf (dump_file, "isl timed out --param max-isl-operations=%d\n",
> - max_operations);
> + location_t loc = find_loop_location
> + (scop->scop_info->region.entry->dest->loop_father);
> + dump_printf_loc (MSG_MISSED_OPTIMIZATION, loc,
> + "loop nest not optimized, optimization timed out "
> + "after %d operations [--param max-isl-operations]\n",
> + max_operations);
> return false;
> }
>
> Index: gcc/graphite.c
> ===================================================================
> --- gcc/graphite.c (revision 253091)
> +++ gcc/graphite.c (working copy)
> @@ -293,86 +293,6 @@ free_scops (vec<scop_p> scops)
> scops.release ();
> }
>
> -/* Returns true when P1 and P2 are close phis with the same
> - argument. */
> -
> -static inline bool
> -same_close_phi_node (gphi *p1, gphi *p2)
> -{
> - return (types_compatible_p (TREE_TYPE (gimple_phi_result (p1)),
> - TREE_TYPE (gimple_phi_result (p2)))
> - && operand_equal_p (gimple_phi_arg_def (p1, 0),
> - gimple_phi_arg_def (p2, 0), 0));
> -}
> -
> -static void make_close_phi_nodes_unique (basic_block bb);
> -
> -/* Remove the close phi node at GSI and replace its rhs with the rhs
> - of PHI. */
> -
> -static void
> -remove_duplicate_close_phi (gphi *phi, gphi_iterator *gsi)
> -{
> - gimple *use_stmt;
> - use_operand_p use_p;
> - imm_use_iterator imm_iter;
> - tree res = gimple_phi_result (phi);
> - tree def = gimple_phi_result (gsi->phi ());
> -
> - gcc_assert (same_close_phi_node (phi, gsi->phi ()));
> -
> - FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, def)
> - {
> - FOR_EACH_IMM_USE_ON_STMT (use_p, imm_iter)
> - SET_USE (use_p, res);
> -
> - update_stmt (use_stmt);
> -
> - /* It is possible that we just created a duplicate close-phi
> - for an already-processed containing loop. Check for this
> - case and clean it up. */
> - if (gimple_code (use_stmt) == GIMPLE_PHI
> - && gimple_phi_num_args (use_stmt) == 1)
> - make_close_phi_nodes_unique (gimple_bb (use_stmt));
> - }
> -
> - remove_phi_node (gsi, true);
> -}
> -
> -/* Removes all the close phi duplicates from BB. */
> -
> -static void
> -make_close_phi_nodes_unique (basic_block bb)
> -{
> - gphi_iterator psi;
> -
> - for (psi = gsi_start_phis (bb); !gsi_end_p (psi); gsi_next (&psi))
> - {
> - gphi_iterator gsi = psi;
> - gphi *phi = psi.phi ();
> -
> - /* At this point, PHI should be a close phi in normal form. */
> - gcc_assert (gimple_phi_num_args (phi) == 1);
> -
> - /* Iterate over the next phis and remove duplicates. */
> - gsi_next (&gsi);
> - while (!gsi_end_p (gsi))
> - if (same_close_phi_node (phi, gsi.phi ()))
> - remove_duplicate_close_phi (phi, &gsi);
> - else
> - gsi_next (&gsi);
> - }
> -}
> -
> -/* Return true when NAME is defined in LOOP. */
> -
> -static bool
> -defined_in_loop_p (tree name, loop_p loop)
> -{
> - gcc_assert (TREE_CODE (name) == SSA_NAME);
> - return loop == loop_containing_stmt (SSA_NAME_DEF_STMT (name));
> -}
> -
> /* Transforms LOOP to the canonical loop closed SSA form. */
>
> static void
> @@ -380,20 +300,22 @@ canonicalize_loop_closed_ssa (loop_p loo
> {
> edge e = single_exit (loop);
> basic_block bb;
> + gphi_iterator psi;
>
> if (!e || (e->flags & EDGE_COMPLEX))
> return;
>
> bb = e->dest;
>
> + /* Make the loop-close PHI node BB contain only PHIs and have a
> + single predecessor. */
> if (single_pred_p (bb))
> {
> e = split_block_after_labels (bb);
> - make_close_phi_nodes_unique (e->src);
> + bb = e->src;
> }
> else
> {
> - gphi_iterator psi;
> basic_block close = split_edge (e);
> e = single_succ_edge (close);
> for (psi = gsi_start_phis (bb); !gsi_end_p (psi); gsi_next (&psi))
> @@ -404,7 +326,7 @@ canonicalize_loop_closed_ssa (loop_p loo
>
> /* Only add close phi nodes for SSA_NAMEs defined in LOOP. */
> if (TREE_CODE (arg) != SSA_NAME
> - || !defined_in_loop_p (arg, loop))
> + || loop_containing_stmt (SSA_NAME_DEF_STMT (arg)) != loop)
> continue;
>
> tree res = copy_ssa_name (arg);
> @@ -413,8 +335,30 @@ canonicalize_loop_closed_ssa (loop_p loo
> UNKNOWN_LOCATION);
> SET_USE (use_p, res);
> }
> + bb = close;
> + }
>
> - make_close_phi_nodes_unique (close);
> + /* Eliminate duplicates. This relies on processing loops from
> + innermost to outer. */
> + for (psi = gsi_start_phis (bb); !gsi_end_p (psi); gsi_next (&psi))
> + {
> + gphi_iterator gsi = psi;
> + gphi *phi = psi.phi ();
> +
> + /* At this point, PHI should be a close phi in normal form. */
> + gcc_assert (gimple_phi_num_args (phi) == 1);
> +
> + /* Iterate over the next phis and remove duplicates. */
> + gsi_next (&gsi);
> + while (!gsi_end_p (gsi))
> + if (gimple_phi_arg_def (phi, 0) == gimple_phi_arg_def (gsi.phi (), 0))
> + {
> + replace_uses_by (gimple_phi_result (gsi.phi ()),
> + gimple_phi_result (phi));
> + remove_phi_node (&gsi, true);
> + }
> + else
> + gsi_next (&gsi);
> }
> }
>
> @@ -443,7 +387,7 @@ static void
> canonicalize_loop_closed_ssa_form (void)
> {
> loop_p loop;
> - FOR_EACH_LOOP (loop, 0)
> + FOR_EACH_LOOP (loop, LI_FROM_INNERMOST)
> canonicalize_loop_closed_ssa (loop);
>
> checking_verify_loop_closed_ssa (true);
> @@ -509,6 +453,19 @@ graphite_transform_loops (void)
> "loop nest optimized\n");
> }
>
> + if (dump_file && (dump_flags & TDF_DETAILS))
> + {
> + loop_p loop;
> + int num_no_dependency = 0;
> +
> + FOR_EACH_LOOP (loop, 0)
> + if (loop->can_be_parallel)
> + num_no_dependency++;
> +
> + fprintf (dump_file, "%d loops carried no dependency.\n",
> + num_no_dependency);
> + }
> +
> free_scops (scops);
> graphite_finalize (need_cfg_cleanup_p);
> the_isl_ctx = NULL;
> Index: gcc/testsuite/gcc.dg/graphite/scop-24.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/graphite/scop-24.c (nonexistent)
> +++ gcc/testsuite/gcc.dg/graphite/scop-24.c (working copy)
> @@ -0,0 +1,29 @@
> +/* { dg-do compile } */
> +/* { dg-options "-Ofast -floop-nest-optimize" } */
> +
> +typedef struct _IO_FILE FILE;
> +extern struct _IO_FILE *stderr;
> +typedef float real;
> +typedef real rvec[3];
> +int rgbset (int);
> +void ps_box (int, int);
> +void plot_phi(char *fn,rvec box,int natoms,rvec x[],real phi[])
> +{
> + real phi_max,rr,gg,bb,fac,dx,x0,y0;
> + int i;
> + for(i=0; (i<natoms); i++)
> + phi_max=((phi_max > __builtin_fabs(phi[i]))
> + ? phi_max : __builtin_fabs(phi[i]));
> + if (__builtin_fabs(phi_max)<1.2e-38)
> + __builtin_fprintf(stderr, "X");
> + ps_box((real)(fac*box[0]-1),(real)(fac*box[1]-1));
> + for(i=0; (i<natoms); i++)
> + {
> + rr=gg=bb=1.0;
> + if (phi[i] < 0)
> + gg=bb=(1.0+(phi[i]/phi_max));
> + else
> + rr=gg=(1.0-(phi[i]/phi_max));
> + rr=rgbset(rr);
> + }
> +}
>
--
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH][GRAPHITE] More TLC
2017-09-25 7:39 ` [PATCH][GRAPHITE] More TLC Richard Biener
@ 2017-09-25 12:46 ` Richard Biener
2017-09-25 12:55 ` Bin.Cheng
0 siblings, 1 reply; 26+ messages in thread
From: Richard Biener @ 2017-09-25 12:46 UTC (permalink / raw)
To: gcc-patches
On Mon, 25 Sep 2017, Richard Biener wrote:
> On Fri, 22 Sep 2017, Richard Biener wrote:
>
> >
> > This simplifies canonicalize_loop_closed_ssa and does other minimal
> > TLC. It also adds a testcase I reduced from a stupid mistake I made
> > when reworking canonicalize_loop_closed_ssa.
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.
> >
> > SPEC CPU 2006 is happy with it, current statistics on x86_64 with
> > -Ofast -march=haswell -floop-nest-optimize are
> >
> > 61 loop nests "optimized"
> > 45 loop nest transforms cancelled because of code generation issues
> > 21 loop nest optimizations timed out the 350000 ISL "operations" we allow
>
> Overall compile time (with -j6) is 695 sec. w/o -floop-nest-optimize
> and 709 sec. with (this was with release checking).
>
> A single-run has 416.gamess (580s -> 618s),
> 436.cactusADM (206s -> 182s), 437.leslie3d (228s ->218s),
> 450.soplex (229s -> 226s), 465.tonto (428s -> 425s), 401.bzip2 (383s ->
> 379s), 462.libquantum (352s -> 343s), ignoring +-2s changes. Will
> do a 3-run for those to confirm (it would be only a single regression
> for 416.gamess).
416.gamess regression confirmed, 450.soplex improvement as well,
in the three-run 462.libquantum regresses (344s -> 351s) so I suppose
that's noise.
Richard.
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH][GRAPHITE] More TLC
2017-09-25 12:46 ` Richard Biener
@ 2017-09-25 12:55 ` Bin.Cheng
2017-09-25 13:10 ` Richard Biener
0 siblings, 1 reply; 26+ messages in thread
From: Bin.Cheng @ 2017-09-25 12:55 UTC (permalink / raw)
To: Richard Biener; +Cc: gcc-patches List
On Mon, Sep 25, 2017 at 1:46 PM, Richard Biener <rguenther@suse.de> wrote:
> On Mon, 25 Sep 2017, Richard Biener wrote:
>
>> On Fri, 22 Sep 2017, Richard Biener wrote:
>>
>> >
>> > This simplifies canonicalize_loop_closed_ssa and does other minimal
>> > TLC. It also adds a testcase I reduced from a stupid mistake I made
>> > when reworking canonicalize_loop_closed_ssa.
>> >
>> > Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.
>> >
>> > SPEC CPU 2006 is happy with it, current statistics on x86_64 with
>> > -Ofast -march=haswell -floop-nest-optimize are
>> >
>> > 61 loop nests "optimized"
>> > 45 loop nest transforms cancelled because of code generation issues
>> > 21 loop nest optimizations timed out the 350000 ISL "operations" we allow
>>
>> Overall compile time (with -j6) is 695 sec. w/o -floop-nest-optimize
>> and 709 sec. with (this was with release checking).
>>
>> A single-run has 416.gamess (580s -> 618s),
>> 436.cactusADM (206s -> 182s), 437.leslie3d (228s ->218s),
>> 450.soplex (229s -> 226s), 465.tonto (428s -> 425s), 401.bzip2 (383s ->
>> 379s), 462.libquantum (352s -> 343s), ignoring +-2s changes. Will
>> do a 3-run for those to confirm (it would be only a single regression
>> for 416.gamess).
>
> 416.gamess regression confirmed, 450.soplex improvement as well,
436/437 improvements? 450.soplex (229s -> 226s) loops like noise.
Thanks,
bin
> in the three-run 462.libquantum regresses (344s -> 351s) so I suppose
> that's noise.
>
> Richard.
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH][GRAPHITE] More TLC
2017-09-25 12:55 ` Bin.Cheng
@ 2017-09-25 13:10 ` Richard Biener
0 siblings, 0 replies; 26+ messages in thread
From: Richard Biener @ 2017-09-25 13:10 UTC (permalink / raw)
To: Bin.Cheng; +Cc: gcc-patches List
On Mon, 25 Sep 2017, Bin.Cheng wrote:
> On Mon, Sep 25, 2017 at 1:46 PM, Richard Biener <rguenther@suse.de> wrote:
> > On Mon, 25 Sep 2017, Richard Biener wrote:
> >
> >> On Fri, 22 Sep 2017, Richard Biener wrote:
> >>
> >> >
> >> > This simplifies canonicalize_loop_closed_ssa and does other minimal
> >> > TLC. It also adds a testcase I reduced from a stupid mistake I made
> >> > when reworking canonicalize_loop_closed_ssa.
> >> >
> >> > Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.
> >> >
> >> > SPEC CPU 2006 is happy with it, current statistics on x86_64 with
> >> > -Ofast -march=haswell -floop-nest-optimize are
> >> >
> >> > 61 loop nests "optimized"
> >> > 45 loop nest transforms cancelled because of code generation issues
> >> > 21 loop nest optimizations timed out the 350000 ISL "operations" we allow
> >>
> >> Overall compile time (with -j6) is 695 sec. w/o -floop-nest-optimize
> >> and 709 sec. with (this was with release checking).
> >>
> >> A single-run has 416.gamess (580s -> 618s),
> >> 436.cactusADM (206s -> 182s), 437.leslie3d (228s ->218s),
> >> 450.soplex (229s -> 226s), 465.tonto (428s -> 425s), 401.bzip2 (383s ->
> >> 379s), 462.libquantum (352s -> 343s), ignoring +-2s changes. Will
> >> do a 3-run for those to confirm (it would be only a single regression
> >> for 416.gamess).
> >
> > 416.gamess regression confirmed, 450.soplex improvement as well,
> 436/437 improvements? 450.soplex (229s -> 226s) loops like noise.
base is with -floop-nest-optimize, peak without.
416.gamess 19580 619 31.7 S 19580 576
34.0 *
416.gamess 19580 614 31.9 S 19580 577
33.9 S
416.gamess 19580 618 31.7 * 19580 576
34.0 S
436.cactusADM 11950 194 61.5 S 11950 204
58.5 S
436.cactusADM 11950 184 65.0 S 11950 187
63.8 *
436.cactusADM 11950 186 64.1 * 11950 186
64.1 S
437.leslie3d 9400 219 43.0 S 9400 218
43.1 S
437.leslie3d 9400 219 43.0 * 9400 223
42.1 S
437.leslie3d 9400 218 43.0 S 9400 223
42.2 *
450.soplex 8340 225 37.0 S 8340 231
36.1 S
450.soplex 8340 226 36.9 * 8340 230
36.3 *
450.soplex 8340 227 36.8 S 8340 229
36.4 S
465.tonto 9840 426 23.1 S 9840 427
23.0 *
465.tonto 9840 424 23.2 S 9840 430
22.9 S
465.tonto 9840 425 23.2 * 9840 425
23.2 S
401.bzip2 9650 379 25.5 S 9650 378
25.5 S
401.bzip2 9650 379 25.5 * 9650 380
25.4 *
401.bzip2 9650 379 25.5 S 9650 380
25.4 S
462.libquantum 20720 351 59.0 * 20720 349
59.4 S
462.libquantum 20720 351 59.0 S 20720 345
60.1 *
462.libquantum 20720 352 58.8 S 20720 344
60.2 S
> Thanks,
> bin
> > in the three-run 462.libquantum regresses (344s -> 351s) so I suppose
> > that's noise.
> >
> > Richard.
>
>
--
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH][GRAPHITE] More TLC
2017-09-22 14:30 ` Sebastian Pop
@ 2017-09-25 13:12 ` Richard Biener
2017-09-26 14:20 ` Sebastian Pop
0 siblings, 1 reply; 26+ messages in thread
From: Richard Biener @ 2017-09-25 13:12 UTC (permalink / raw)
To: Sebastian Pop; +Cc: GCC Patches
On Fri, 22 Sep 2017, Sebastian Pop wrote:
> On Fri, Sep 22, 2017 at 8:03 AM, Richard Biener <rguenther@suse.de> wrote:
>
> >
> > This simplifies canonicalize_loop_closed_ssa and does other minimal
> > TLC. It also adds a testcase I reduced from a stupid mistake I made
> > when reworking canonicalize_loop_closed_ssa.
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.
> >
> > SPEC CPU 2006 is happy with it, current statistics on x86_64 with
> > -Ofast -march=haswell -floop-nest-optimize are
> >
> > 61 loop nests "optimized"
> > 45 loop nest transforms cancelled because of code generation issues
> > 21 loop nest optimizations timed out the 350000 ISL "operations" we allow
> >
> > I say "optimized" because the usual transform I've seen is static tiling
> > as enforced by GRAPHITE according to --param loop-block-tile-size.
> > There's no way to automagically figure what kind of transform ISL did
> >
>
> Here is how to automate (without magic) the detection
> of the transform that isl did.
>
> The problem solved by isl is the minimization of strides
> in memory, and to do this, we need to tell the isl scheduler
> the validity dependence graph, in graphite-optimize-isl.c
> see the validity (RAW, WAR, WAW) and the proximity
> (RAR + validity) maps. The proximity does include the
> read after read, as the isl scheduler needs to minimize
> strides between consecutive reads.
>
> When you apply the schedule to the dependence graph,
> one can tell from the result the strides in memory, a good
> way to say whether a transform was beneficial is to sum up
> all memory strides, and make sure that the sum of all strides
> decreases after transform. We could add a printf with the
> sum of strides before and after transforms, and have the
> testcases check for that.
Interesting. Can you perhaps show me in code how to do that?
Thanks,
Richard.
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH][GRAPHITE] More TLC
2017-09-25 13:12 ` Richard Biener
@ 2017-09-26 14:20 ` Sebastian Pop
2017-09-26 15:15 ` Sven Verdoolaege
2017-09-27 12:18 ` Richard Biener
0 siblings, 2 replies; 26+ messages in thread
From: Sebastian Pop @ 2017-09-26 14:20 UTC (permalink / raw)
To: Richard Biener, Sven Verdoolaege; +Cc: GCC Patches
On Mon, Sep 25, 2017 at 8:12 AM, Richard Biener <rguenther@suse.de> wrote:
> On Fri, 22 Sep 2017, Sebastian Pop wrote:
>
> > On Fri, Sep 22, 2017 at 8:03 AM, Richard Biener <rguenther@suse.de>
> wrote:
> >
> > >
> > > This simplifies canonicalize_loop_closed_ssa and does other minimal
> > > TLC. It also adds a testcase I reduced from a stupid mistake I made
> > > when reworking canonicalize_loop_closed_ssa.
> > >
> > > Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.
> > >
> > > SPEC CPU 2006 is happy with it, current statistics on x86_64 with
> > > -Ofast -march=haswell -floop-nest-optimize are
> > >
> > > 61 loop nests "optimized"
> > > 45 loop nest transforms cancelled because of code generation issues
> > > 21 loop nest optimizations timed out the 350000 ISL "operations" we
> allow
> > >
> > > I say "optimized" because the usual transform I've seen is static
> tiling
> > > as enforced by GRAPHITE according to --param loop-block-tile-size.
> > > There's no way to automagically figure what kind of transform ISL did
> > >
> >
> > Here is how to automate (without magic) the detection
> > of the transform that isl did.
> >
> > The problem solved by isl is the minimization of strides
> > in memory, and to do this, we need to tell the isl scheduler
> > the validity dependence graph, in graphite-optimize-isl.c
> > see the validity (RAW, WAR, WAW) and the proximity
> > (RAR + validity) maps. The proximity does include the
> > read after read, as the isl scheduler needs to minimize
> > strides between consecutive reads.
> >
> > When you apply the schedule to the dependence graph,
> > one can tell from the result the strides in memory, a good
> > way to say whether a transform was beneficial is to sum up
> > all memory strides, and make sure that the sum of all strides
> > decreases after transform. We could add a printf with the
> > sum of strides before and after transforms, and have the
> > testcases check for that.
>
> Interesting. Can you perhaps show me in code how to do that?
>
>
Sven, is there already a function that computes the sum of all
strides in a proximity map? Maybe you have code that does
something similar in pet or ppcg?
Thanks,
Sebastian
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH][GRAPHITE] More TLC
2017-09-26 14:20 ` Sebastian Pop
@ 2017-09-26 15:15 ` Sven Verdoolaege
2017-09-28 18:29 ` Sebastian Pop
2017-09-27 12:18 ` Richard Biener
1 sibling, 1 reply; 26+ messages in thread
From: Sven Verdoolaege @ 2017-09-26 15:15 UTC (permalink / raw)
To: Sebastian Pop; +Cc: Richard Biener, GCC Patches
On Tue, Sep 26, 2017 at 09:19:50AM -0500, Sebastian Pop wrote:
> Sven, is there already a function that computes the sum of all
> strides in a proximity map? Maybe you have code that does
> something similar in pet or ppcg?
What exactly do you want to sum?
If this involves any counting, then it cannot currently
be done in pet or ppcg since isl does not support counting yet
and the public version of barvinok is GPL licensed.
Also, it's better to ask such questions on the isl mailing list
isl-development@googlegroups.com
skimo
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH][GRAPHITE] More TLC
2017-09-26 14:20 ` Sebastian Pop
2017-09-26 15:15 ` Sven Verdoolaege
@ 2017-09-27 12:18 ` Richard Biener
2017-09-27 13:04 ` Richard Biener
` (3 more replies)
1 sibling, 4 replies; 26+ messages in thread
From: Richard Biener @ 2017-09-27 12:18 UTC (permalink / raw)
To: Sebastian Pop; +Cc: Sven Verdoolaege, GCC Patches
On Tue, 26 Sep 2017, Sebastian Pop wrote:
> On Mon, Sep 25, 2017 at 8:12 AM, Richard Biener <rguenther@suse.de> wrote:
>
> > On Fri, 22 Sep 2017, Sebastian Pop wrote:
> >
> > > On Fri, Sep 22, 2017 at 8:03 AM, Richard Biener <rguenther@suse.de>
> > wrote:
> > >
> > > >
> > > > This simplifies canonicalize_loop_closed_ssa and does other minimal
> > > > TLC. It also adds a testcase I reduced from a stupid mistake I made
> > > > when reworking canonicalize_loop_closed_ssa.
> > > >
> > > > Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.
> > > >
> > > > SPEC CPU 2006 is happy with it, current statistics on x86_64 with
> > > > -Ofast -march=haswell -floop-nest-optimize are
> > > >
> > > > 61 loop nests "optimized"
> > > > 45 loop nest transforms cancelled because of code generation issues
> > > > 21 loop nest optimizations timed out the 350000 ISL "operations" we
> > allow
> > > >
> > > > I say "optimized" because the usual transform I've seen is static
> > tiling
> > > > as enforced by GRAPHITE according to --param loop-block-tile-size.
> > > > There's no way to automagically figure what kind of transform ISL did
> > > >
> > >
> > > Here is how to automate (without magic) the detection
> > > of the transform that isl did.
> > >
> > > The problem solved by isl is the minimization of strides
> > > in memory, and to do this, we need to tell the isl scheduler
> > > the validity dependence graph, in graphite-optimize-isl.c
> > > see the validity (RAW, WAR, WAW) and the proximity
> > > (RAR + validity) maps. The proximity does include the
> > > read after read, as the isl scheduler needs to minimize
> > > strides between consecutive reads.
Ah, so I now see why we do not perform interchange on trivial cases like
double A[1024][1024], B[1024][1024];
void foo(void)
{
for (int i = 0; i < 1024; ++i)
for (int j = 0; j < 1024; ++j)
A[j][i] = B[j][i];
}
which is probably because
/* FIXME: proximity should not be validity. */
isl_union_map *proximity = isl_union_map_copy (validity);
falls apart when there is _no_ dependence?
I can trick GRAPHITE into performing the interchange for
double A[1024][1024], B[1024][1024];
void foo(void)
{
for (int i = 1; i < 1023; ++i)
for (int j = 0; j < 1024; ++j)
A[j][i] = B[j][i-1] + A[j][i+1];
}
because now there is a dependence. Any idea on how to rewrite
scop_get_dependences to avoid "simplifying"? I suppose the
validity constraints _do_ also specify kind-of a proximity
we just may not prune / optimize them in the same way as
dependences?
Richard.
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH][GRAPHITE] More TLC
2017-09-27 12:18 ` Richard Biener
@ 2017-09-27 13:04 ` Richard Biener
2017-09-27 14:33 ` Richard Biener
2017-09-28 19:00 ` Sebastian Pop
2017-09-28 18:46 ` Sebastian Pop
` (2 subsequent siblings)
3 siblings, 2 replies; 26+ messages in thread
From: Richard Biener @ 2017-09-27 13:04 UTC (permalink / raw)
To: Sebastian Pop; +Cc: Sven Verdoolaege, GCC Patches
On Wed, 27 Sep 2017, Richard Biener wrote:
> On Tue, 26 Sep 2017, Sebastian Pop wrote:
>
> > On Mon, Sep 25, 2017 at 8:12 AM, Richard Biener <rguenther@suse.de> wrote:
> >
> > > On Fri, 22 Sep 2017, Sebastian Pop wrote:
> > >
> > > > On Fri, Sep 22, 2017 at 8:03 AM, Richard Biener <rguenther@suse.de>
> > > wrote:
> > > >
> > > > >
> > > > > This simplifies canonicalize_loop_closed_ssa and does other minimal
> > > > > TLC. It also adds a testcase I reduced from a stupid mistake I made
> > > > > when reworking canonicalize_loop_closed_ssa.
> > > > >
> > > > > Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.
> > > > >
> > > > > SPEC CPU 2006 is happy with it, current statistics on x86_64 with
> > > > > -Ofast -march=haswell -floop-nest-optimize are
> > > > >
> > > > > 61 loop nests "optimized"
> > > > > 45 loop nest transforms cancelled because of code generation issues
> > > > > 21 loop nest optimizations timed out the 350000 ISL "operations" we
> > > allow
> > > > >
> > > > > I say "optimized" because the usual transform I've seen is static
> > > tiling
> > > > > as enforced by GRAPHITE according to --param loop-block-tile-size.
> > > > > There's no way to automagically figure what kind of transform ISL did
> > > > >
> > > >
> > > > Here is how to automate (without magic) the detection
> > > > of the transform that isl did.
> > > >
> > > > The problem solved by isl is the minimization of strides
> > > > in memory, and to do this, we need to tell the isl scheduler
> > > > the validity dependence graph, in graphite-optimize-isl.c
> > > > see the validity (RAW, WAR, WAW) and the proximity
> > > > (RAR + validity) maps. The proximity does include the
> > > > read after read, as the isl scheduler needs to minimize
> > > > strides between consecutive reads.
>
> Ah, so I now see why we do not perform interchange on trivial cases like
>
> double A[1024][1024], B[1024][1024];
>
> void foo(void)
> {
> for (int i = 0; i < 1024; ++i)
> for (int j = 0; j < 1024; ++j)
> A[j][i] = B[j][i];
> }
>
> which is probably because
>
> /* FIXME: proximity should not be validity. */
> isl_union_map *proximity = isl_union_map_copy (validity);
>
> falls apart when there is _no_ dependence?
>
> I can trick GRAPHITE into performing the interchange for
>
> double A[1024][1024], B[1024][1024];
>
> void foo(void)
> {
> for (int i = 1; i < 1023; ++i)
> for (int j = 0; j < 1024; ++j)
> A[j][i] = B[j][i-1] + A[j][i+1];
> }
>
> because now there is a dependence. Any idea on how to rewrite
> scop_get_dependences to avoid "simplifying"? I suppose the
> validity constraints _do_ also specify kind-of a proximity
> we just may not prune / optimize them in the same way as
> dependences?
Another thing I notice is that we don't handle the multi-dimensional
accesses the fortran frontend produces:
(gdb) p debug_data_reference (dr)
#(Data Ref:
# bb: 18
# stmt: _43 = *a_141(D)[_42];
# ref: *a_141(D)[_42];
# base_object: *a_141(D);
# Access function 0: {{(_38 + stride.88_115) + 1, +, 1}_4, +,
stride.88_115}_5
ultimatively we fail here because we try to build a constraint for
{{(_38 + stride.88_115) + 1, +, 1}_4, +, stride.88_115}_5
which ends up computing isl_pw_aff_mul (A, stride.88_115) with
A being the non-constant constraint generated for
{(_38 + stride.88_115) + 1, +, 1}_4 and stride.88_115 being
a parameter. ISL doesn't like that multiplication as the result
isn't affine (well - it is, we just have parameters in there).
I suppose ISL doesn't handle this form of accesses given the
two "dimensions" in this scalarized form may overlap? So we'd
really need to turn those into references with different access
functions (even if that's not 100% a valid semantic transformation
as scalarization isn't reversible without extra information)?
Thanks,
Richard.
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH][GRAPHITE] More TLC
2017-09-27 13:04 ` Richard Biener
@ 2017-09-27 14:33 ` Richard Biener
2017-09-28 19:11 ` Sebastian Pop
2017-09-28 19:00 ` Sebastian Pop
1 sibling, 1 reply; 26+ messages in thread
From: Richard Biener @ 2017-09-27 14:33 UTC (permalink / raw)
To: Sebastian Pop; +Cc: Sven Verdoolaege, GCC Patches
On Wed, 27 Sep 2017, Richard Biener wrote:
> On Wed, 27 Sep 2017, Richard Biener wrote:
>
> > On Tue, 26 Sep 2017, Sebastian Pop wrote:
> >
> > > On Mon, Sep 25, 2017 at 8:12 AM, Richard Biener <rguenther@suse.de> wrote:
> > >
> > > > On Fri, 22 Sep 2017, Sebastian Pop wrote:
> > > >
> > > > > On Fri, Sep 22, 2017 at 8:03 AM, Richard Biener <rguenther@suse.de>
> > > > wrote:
> > > > >
> > > > > >
> > > > > > This simplifies canonicalize_loop_closed_ssa and does other minimal
> > > > > > TLC. It also adds a testcase I reduced from a stupid mistake I made
> > > > > > when reworking canonicalize_loop_closed_ssa.
> > > > > >
> > > > > > Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.
> > > > > >
> > > > > > SPEC CPU 2006 is happy with it, current statistics on x86_64 with
> > > > > > -Ofast -march=haswell -floop-nest-optimize are
> > > > > >
> > > > > > 61 loop nests "optimized"
> > > > > > 45 loop nest transforms cancelled because of code generation issues
> > > > > > 21 loop nest optimizations timed out the 350000 ISL "operations" we
> > > > allow
> > > > > >
> > > > > > I say "optimized" because the usual transform I've seen is static
> > > > tiling
> > > > > > as enforced by GRAPHITE according to --param loop-block-tile-size.
> > > > > > There's no way to automagically figure what kind of transform ISL did
> > > > > >
> > > > >
> > > > > Here is how to automate (without magic) the detection
> > > > > of the transform that isl did.
> > > > >
> > > > > The problem solved by isl is the minimization of strides
> > > > > in memory, and to do this, we need to tell the isl scheduler
> > > > > the validity dependence graph, in graphite-optimize-isl.c
> > > > > see the validity (RAW, WAR, WAW) and the proximity
> > > > > (RAR + validity) maps. The proximity does include the
> > > > > read after read, as the isl scheduler needs to minimize
> > > > > strides between consecutive reads.
> >
> > Ah, so I now see why we do not perform interchange on trivial cases like
> >
> > double A[1024][1024], B[1024][1024];
> >
> > void foo(void)
> > {
> > for (int i = 0; i < 1024; ++i)
> > for (int j = 0; j < 1024; ++j)
> > A[j][i] = B[j][i];
> > }
> >
> > which is probably because
> >
> > /* FIXME: proximity should not be validity. */
> > isl_union_map *proximity = isl_union_map_copy (validity);
> >
> > falls apart when there is _no_ dependence?
> >
> > I can trick GRAPHITE into performing the interchange for
> >
> > double A[1024][1024], B[1024][1024];
> >
> > void foo(void)
> > {
> > for (int i = 1; i < 1023; ++i)
> > for (int j = 0; j < 1024; ++j)
> > A[j][i] = B[j][i-1] + A[j][i+1];
> > }
> >
> > because now there is a dependence. Any idea on how to rewrite
> > scop_get_dependences to avoid "simplifying"? I suppose the
> > validity constraints _do_ also specify kind-of a proximity
> > we just may not prune / optimize them in the same way as
> > dependences?
>
> Another thing I notice is that we don't handle the multi-dimensional
> accesses the fortran frontend produces:
>
> (gdb) p debug_data_reference (dr)
> #(Data Ref:
> # bb: 18
> # stmt: _43 = *a_141(D)[_42];
> # ref: *a_141(D)[_42];
> # base_object: *a_141(D);
> # Access function 0: {{(_38 + stride.88_115) + 1, +, 1}_4, +,
> stride.88_115}_5
>
> ultimatively we fail here because we try to build a constraint for
>
> {{(_38 + stride.88_115) + 1, +, 1}_4, +, stride.88_115}_5
>
> which ends up computing isl_pw_aff_mul (A, stride.88_115) with
> A being the non-constant constraint generated for
> {(_38 + stride.88_115) + 1, +, 1}_4 and stride.88_115 being
> a parameter. ISL doesn't like that multiplication as the result
> isn't affine (well - it is, we just have parameters in there).
>
> I suppose ISL doesn't handle this form of accesses given the
> two "dimensions" in this scalarized form may overlap? So we'd
> really need to turn those into references with different access
> functions (even if that's not 100% a valid semantic transformation
> as scalarization isn't reversible without extra information)?
Looks like even when hacking the Fortran FE to produce nested
ARRAY_REFs we run into the same issue for
(gdb) p debug_data_reference (dr)
#(Data Ref:
# bb: 17
# stmt:
VIEW_CONVERT_EXPR<real(kind=8)[1:ubound.91][1:ubound.89][1:ubound.87][1:ubound.86]>(*y_117(D))[_24]{lb:
1 sz: _20 * 8}[_26]{lb: 1 sz: _21 * 8}[_28]{lb: 1 sz: _22 * 8}[_29]{lb: 1
sz: 8} = 0.0;
# ref:
VIEW_CONVERT_EXPR<real(kind=8)[1:ubound.91][1:ubound.89][1:ubound.87][1:ubound.86]>(*y_117(D))[_24]{lb:
1 sz: _20 * 8}[_26]{lb: 1 sz: _21 * 8}[_28]{lb: 1 sz: _22 * 8}[_29]{lb: 1
sz: 8};
# base_object:
VIEW_CONVERT_EXPR<real(kind=8)[1:ubound.91][1:ubound.89][1:ubound.87][1:ubound.86]>(*y_117(D));
# Access function 0: {1, +, 1}_4
# Access function 1: (integer(kind=8)) {(unsigned long) stride.88_92, +,
(unsigned long) stride.88_92}_3;
# Access function 2: (integer(kind=8)) {(unsigned long) stride.90_96, +,
(unsigned long) stride.90_96}_2;
# Access function 3: (integer(kind=8)) {(unsigned long) stride.92_100, +,
(unsigned long) stride.92_100}_1;
so it looks like simple strided (where stride is a parameter) access
is not handled either.
GCCs dependence analysis can at least compute distances of two
DRs when the difference of the access CHRECs is constant. Within
the polyhedral model those cases cannot be handled?
Richard.
> Thanks,
> Richard.
>
--
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH][GRAPHITE] More TLC
2017-09-26 15:15 ` Sven Verdoolaege
@ 2017-09-28 18:29 ` Sebastian Pop
0 siblings, 0 replies; 26+ messages in thread
From: Sebastian Pop @ 2017-09-28 18:29 UTC (permalink / raw)
To: Sven Verdoolaege, isl-development; +Cc: Richard Biener, GCC Patches
Hi skimo,
On Tue, Sep 26, 2017 at 10:15 AM, Sven Verdoolaege <
sven.verdoolaege@gmail.com> wrote:
> On Tue, Sep 26, 2017 at 09:19:50AM -0500, Sebastian Pop wrote:
> > Sven, is there already a function that computes the sum of all
> > strides in a proximity map? Maybe you have code that does
> > something similar in pet or ppcg?
>
> What exactly do you want to sum?
If this involves any counting, then it cannot currently
>
I think that it does involve counting: we need to know
the distance between all pairs of array accesses, that is the
number of points in the dependence polyhedron.
> be done in pet or ppcg since isl does not support counting yet
> and the public version of barvinok is GPL licensed.
>
> Also, it's better to ask such questions on the isl mailing list
> isl-development@googlegroups.com
>
>
We are trying to find a metric that shows that isl's scheduler
did a useful transform. Something like a diff tool that shows
before and after scheduling the strides of array accesses.
Could the isl scheduler output a description of what it did?
We would like to use that output to build testcases that match
the behavior of the compiler on different patterns.
Thanks,
Sebastian
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH][GRAPHITE] More TLC
2017-09-27 12:18 ` Richard Biener
2017-09-27 13:04 ` Richard Biener
@ 2017-09-28 18:46 ` Sebastian Pop
2017-09-29 19:34 ` isl scheduler and spatial locality (Re: [PATCH][GRAPHITE] More TLC) Sven Verdoolaege
2017-09-29 19:37 ` Sven Verdoolaege
3 siblings, 0 replies; 26+ messages in thread
From: Sebastian Pop @ 2017-09-28 18:46 UTC (permalink / raw)
To: Richard Biener; +Cc: GCC Patches
On Wed, Sep 27, 2017 at 7:18 AM, Richard Biener <rguenther@suse.de> wrote:
> On Tue, 26 Sep 2017, Sebastian Pop wrote:
>
> > On Mon, Sep 25, 2017 at 8:12 AM, Richard Biener <rguenther@suse.de>
> wrote:
> >
> > > On Fri, 22 Sep 2017, Sebastian Pop wrote:
> > >
> > > > On Fri, Sep 22, 2017 at 8:03 AM, Richard Biener <rguenther@suse.de>
> > > wrote:
> > > >
> > > > >
> > > > > This simplifies canonicalize_loop_closed_ssa and does other minimal
> > > > > TLC. It also adds a testcase I reduced from a stupid mistake I
> made
> > > > > when reworking canonicalize_loop_closed_ssa.
> > > > >
> > > > > Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to
> trunk.
> > > > >
> > > > > SPEC CPU 2006 is happy with it, current statistics on x86_64 with
> > > > > -Ofast -march=haswell -floop-nest-optimize are
> > > > >
> > > > > 61 loop nests "optimized"
> > > > > 45 loop nest transforms cancelled because of code generation
> issues
> > > > > 21 loop nest optimizations timed out the 350000 ISL "operations"
> we
> > > allow
> > > > >
> > > > > I say "optimized" because the usual transform I've seen is static
> > > tiling
> > > > > as enforced by GRAPHITE according to --param loop-block-tile-size.
> > > > > There's no way to automagically figure what kind of transform ISL
> did
> > > > >
> > > >
> > > > Here is how to automate (without magic) the detection
> > > > of the transform that isl did.
> > > >
> > > > The problem solved by isl is the minimization of strides
> > > > in memory, and to do this, we need to tell the isl scheduler
> > > > the validity dependence graph, in graphite-optimize-isl.c
> > > > see the validity (RAW, WAR, WAW) and the proximity
> > > > (RAR + validity) maps. The proximity does include the
> > > > read after read, as the isl scheduler needs to minimize
> > > > strides between consecutive reads.
>
> Ah, so I now see why we do not perform interchange on trivial cases like
>
> double A[1024][1024], B[1024][1024];
>
> void foo(void)
> {
> for (int i = 0; i < 1024; ++i)
> for (int j = 0; j < 1024; ++j)
> A[j][i] = B[j][i];
> }
>
> which is probably because
>
> /* FIXME: proximity should not be validity. */
> isl_union_map *proximity = isl_union_map_copy (validity);
>
> falls apart when there is _no_ dependence?
>
You are right. The proximity needs to account for spatial
locality as well if you want to interchange the loop.
To describe the spatial locality, I would recommend adding
to the proximity relation the array accesses from two
successive iterations of the innermost loop:
A[j][i] -> A[j][i+1] and B[j][i] -> B[j][i+1]
With these two extra relations in the proximity map,
isl should be able to interchange the above loop.
>
> I can trick GRAPHITE into performing the interchange for
>
> double A[1024][1024], B[1024][1024];
>
> void foo(void)
> {
> for (int i = 1; i < 1023; ++i)
> for (int j = 0; j < 1024; ++j)
> A[j][i] = B[j][i-1] + A[j][i+1];
> }
>
> because now there is a dependence. Any idea on how to rewrite
> scop_get_dependences to avoid "simplifying"? I suppose the
> validity constraints _do_ also specify kind-of a proximity
>
Correct: the validity map specifies a subset (it is missing
RAR dependences) of data reuse.
> we just may not prune / optimize them in the same way as
> dependences?
>
Validity constraints are there to "keep the wind blowing
in the same direction" after the transform (otherwise the
result of the transformed computation may be wrong.)
The proximity map should contain a description of
- reuse of memory (temporal locality)
- how close together the access elements are (spatial locality.)
isl will optimize for both if the proximity map has a description
of both.
For the moment the proximity map is initialized only with the
current validity constraints, as you quoted the FIXME comment,
which would only describe a subset of the temporal locality.
Sebastian
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH][GRAPHITE] More TLC
2017-09-27 13:04 ` Richard Biener
2017-09-27 14:33 ` Richard Biener
@ 2017-09-28 19:00 ` Sebastian Pop
1 sibling, 0 replies; 26+ messages in thread
From: Sebastian Pop @ 2017-09-28 19:00 UTC (permalink / raw)
To: Richard Biener; +Cc: GCC Patches
On Wed, Sep 27, 2017 at 8:04 AM, Richard Biener <rguenther@suse.de> wrote:
>
> Another thing I notice is that we don't handle the multi-dimensional
> accesses the fortran frontend produces:
>
> (gdb) p debug_data_reference (dr)
> #(Data Ref:
> # bb: 18
> # stmt: _43 = *a_141(D)[_42];
> # ref: *a_141(D)[_42];
> # base_object: *a_141(D);
> # Access function 0: {{(_38 + stride.88_115) + 1, +, 1}_4, +,
> stride.88_115}_5
>
> ultimatively we fail here because we try to build a constraint for
>
> {{(_38 + stride.88_115) + 1, +, 1}_4, +, stride.88_115}_5
>
> which ends up computing isl_pw_aff_mul (A, stride.88_115) with
> A being the non-constant constraint generated for
> {(_38 + stride.88_115) + 1, +, 1}_4 and stride.88_115 being
> a parameter. ISL doesn't like that multiplication as the result
> isn't affine (well - it is, we just have parameters in there).
>
> I suppose ISL doesn't handle this form of accesses given the
> two "dimensions" in this scalarized form may overlap? So we'd
> really need to turn those into references with different access
> functions (even if that's not 100% a valid semantic transformation
> as scalarization isn't reversible without extra information)?
You are right.
This multivariate memory access would be better handled in
delinearized form:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66981
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61000
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=14741
There are two ways to handle this issue:
- fix the FORTRAN front-end to emit multi dimensions ARRAY_REFs,
- implement an array delinearization pass, as I implemented in LLVM
http://llvm.org/doxygen/Delinearization_8cpp_source.html
"On Recovering Multi-Dimensional Arrays in Polly"
http://impact.gforge.inria.fr/impact2015/papers/impact2015-grosser.pdf
"Optimistic Delinearization of Parametrically Sized Arrays"
https://dl.acm.org/citation.cfm?id=2751248
LLVM does not have an equivalent for multi-dim ARRAY_REF description
it only reasons about linearized memory accesses like in GCC's RTL:
gep = Get Element Pointer, so we had no other option than to delinearize.
Sebastian
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH][GRAPHITE] More TLC
2017-09-27 14:33 ` Richard Biener
@ 2017-09-28 19:11 ` Sebastian Pop
2017-09-29 11:17 ` Richard Biener
0 siblings, 1 reply; 26+ messages in thread
From: Sebastian Pop @ 2017-09-28 19:11 UTC (permalink / raw)
To: Richard Biener; +Cc: Sven Verdoolaege, GCC Patches
On Wed, Sep 27, 2017 at 9:33 AM, Richard Biener <rguenther@suse.de> wrote:
> Looks like even when hacking the Fortran FE to produce nested
> ARRAY_REFs we run into the same issue for
>
> (gdb) p debug_data_reference (dr)
> #(Data Ref:
> # bb: 17
> # stmt:
> VIEW_CONVERT_EXPR<real(kind=8)[1:ubound.91][1:ubound.89][1:ubound.87][1:ubound.86]>(*y_117(D))[_24]{lb:
> 1 sz: _20 * 8}[_26]{lb: 1 sz: _21 * 8}[_28]{lb: 1 sz: _22 * 8}[_29]{lb: 1
> sz: 8} = 0.0;
> # ref:
> VIEW_CONVERT_EXPR<real(kind=8)[1:ubound.91][1:ubound.89][1:ubound.87][1:ubound.86]>(*y_117(D))[_24]{lb:
> 1 sz: _20 * 8}[_26]{lb: 1 sz: _21 * 8}[_28]{lb: 1 sz: _22 * 8}[_29]{lb: 1
> sz: 8};
> # base_object:
> VIEW_CONVERT_EXPR<real(kind=8)[1:ubound.91][1:ubound.89][1:ubound.87][1:ubound.86]>(*y_117(D));
> # Access function 0: {1, +, 1}_4
> # Access function 1: (integer(kind=8)) {(unsigned long) stride.88_92, +,
> (unsigned long) stride.88_92}_3;
> # Access function 2: (integer(kind=8)) {(unsigned long) stride.90_96, +,
> (unsigned long) stride.90_96}_2;
> # Access function 3: (integer(kind=8)) {(unsigned long) stride.92_100, +,
> (unsigned long) stride.92_100}_1;
>
> so it looks like simple strided (where stride is a parameter) access
> is not handled either.
Yes, this is the first option I was mentioning: it could work,
could you please make sure that you don't have a bug in the "hack patch"
where the outer dimension should not contain the parameter
(inner array dimension) times the access function.
Example in C:
int A[100][N];
A[i][j] is linearized as *(A + i * N * 4 + j * 4)
and you may have a bug if you delinearized it in the Fortran FE as A[i * N][j]
Could you please check that it would delinearize back to A[i][j]?
>
> GCCs dependence analysis can at least compute distances of two
> DRs when the difference of the access CHRECs is constant. Within
> the polyhedral model those cases cannot be handled?
The difficulty for the polyhedral model is in the representation
of a multiplication of parameter times loop index variable.
The delinearization removes these difficulties by creating linear expressions.
Think about multiplication as something introducing exponentiality
and you realize that any such expression would not fit in the
linear model of polyhedra.
A parameter is nothing else than an outer loop index to which we don't
have access to that loop level as it may be outside the current function
in which we get that parameter in.
Sebastian
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH][GRAPHITE] More TLC
2017-09-28 19:11 ` Sebastian Pop
@ 2017-09-29 11:17 ` Richard Biener
2017-09-29 14:59 ` Sebastian Pop
0 siblings, 1 reply; 26+ messages in thread
From: Richard Biener @ 2017-09-29 11:17 UTC (permalink / raw)
To: Sebastian Pop; +Cc: Sven Verdoolaege, GCC Patches
On Thu, 28 Sep 2017, Sebastian Pop wrote:
> On Wed, Sep 27, 2017 at 9:33 AM, Richard Biener <rguenther@suse.de> wrote:
> > Looks like even when hacking the Fortran FE to produce nested
> > ARRAY_REFs we run into the same issue for
> >
> > (gdb) p debug_data_reference (dr)
> > #(Data Ref:
> > # bb: 17
> > # stmt:
> > VIEW_CONVERT_EXPR<real(kind=8)[1:ubound.91][1:ubound.89][1:ubound.87][1:ubound.86]>(*y_117(D))[_24]{lb:
> > 1 sz: _20 * 8}[_26]{lb: 1 sz: _21 * 8}[_28]{lb: 1 sz: _22 * 8}[_29]{lb: 1
> > sz: 8} = 0.0;
> > # ref:
> > VIEW_CONVERT_EXPR<real(kind=8)[1:ubound.91][1:ubound.89][1:ubound.87][1:ubound.86]>(*y_117(D))[_24]{lb:
> > 1 sz: _20 * 8}[_26]{lb: 1 sz: _21 * 8}[_28]{lb: 1 sz: _22 * 8}[_29]{lb: 1
> > sz: 8};
> > # base_object:
> > VIEW_CONVERT_EXPR<real(kind=8)[1:ubound.91][1:ubound.89][1:ubound.87][1:ubound.86]>(*y_117(D));
> > # Access function 0: {1, +, 1}_4
> > # Access function 1: (integer(kind=8)) {(unsigned long) stride.88_92, +,
> > (unsigned long) stride.88_92}_3;
> > # Access function 2: (integer(kind=8)) {(unsigned long) stride.90_96, +,
> > (unsigned long) stride.90_96}_2;
> > # Access function 3: (integer(kind=8)) {(unsigned long) stride.92_100, +,
> > (unsigned long) stride.92_100}_1;
> >
> > so it looks like simple strided (where stride is a parameter) access
> > is not handled either.
>
> Yes, this is the first option I was mentioning: it could work,
> could you please make sure that you don't have a bug in the "hack patch"
> where the outer dimension should not contain the parameter
> (inner array dimension) times the access function.
>
> Example in C:
> int A[100][N];
> A[i][j] is linearized as *(A + i * N * 4 + j * 4)
> and you may have a bug if you delinearized it in the Fortran FE as A[i * N][j]
> Could you please check that it would delinearize back to A[i][j]?
I fixed the "hack patch" somewhat but realized it's not really possible
ATM to get back at this form because the array descriptor contains only
information to generate the linearized form. So while I get now correct
values they end up with runtime divisions which aren't handled by
SCEV.
I fear emitting delinearized code from the Fortran FE would be an
ABI breakage as we'd have to change the array descriptor contents.
> >
> > GCCs dependence analysis can at least compute distances of two
> > DRs when the difference of the access CHRECs is constant. Within
> > the polyhedral model those cases cannot be handled?
>
> The difficulty for the polyhedral model is in the representation
> of a multiplication of parameter times loop index variable.
> The delinearization removes these difficulties by creating linear expressions.
>
> Think about multiplication as something introducing exponentiality
> and you realize that any such expression would not fit in the
> linear model of polyhedra.
> A parameter is nothing else than an outer loop index to which we don't
> have access to that loop level as it may be outside the current function
> in which we get that parameter in.
Yeah, I see that now.
Richard.
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH][GRAPHITE] More TLC
2017-09-29 11:17 ` Richard Biener
@ 2017-09-29 14:59 ` Sebastian Pop
0 siblings, 0 replies; 26+ messages in thread
From: Sebastian Pop @ 2017-09-29 14:59 UTC (permalink / raw)
To: Richard Biener; +Cc: Sven Verdoolaege, GCC Patches
On Fri, Sep 29, 2017 at 6:17 AM, Richard Biener <rguenther@suse.de> wrote:
> I fixed the "hack patch" somewhat but realized it's not really possible
> ATM to get back at this form because the array descriptor contains only
> information to generate the linearized form. So while I get now correct
> values they end up with runtime divisions which aren't handled by
> SCEV.
You are right, SCEV has some limits on representing and folding
those division expressions.
There is a proposal in LLVM from Johannes Doerfert
https://reviews.llvm.org/D38255
to use isl as a representation and expression folder instead of the
chains of recurrences for the scalar evolution analysis. isl would
be able to handle some of the semantics of the div_exprs, and the
semantics of wrap-around variables, and of course it would have
some other limits to represent multiplications (as we spoke about
yesterday, i * N or M * N for example,) and thus that polyhedral
analysis would need to rely on the delinearization of array indices.
^ permalink raw reply [flat|nested] 26+ messages in thread
* isl scheduler and spatial locality (Re: [PATCH][GRAPHITE] More TLC)
2017-09-27 12:18 ` Richard Biener
2017-09-27 13:04 ` Richard Biener
2017-09-28 18:46 ` Sebastian Pop
@ 2017-09-29 19:34 ` Sven Verdoolaege
2017-09-29 19:37 ` Sven Verdoolaege
3 siblings, 0 replies; 26+ messages in thread
From: Sven Verdoolaege @ 2017-09-29 19:34 UTC (permalink / raw)
To: Richard Biener; +Cc: Sebastian Pop, GCC Patches, Oleksandr Zinenko
On Wed, Sep 27, 2017 at 02:18:51PM +0200, Richard Biener wrote:
> Ah, so I now see why we do not perform interchange on trivial cases like
>
> double A[1024][1024], B[1024][1024];
>
> void foo(void)
> {
> for (int i = 0; i < 1024; ++i)
> for (int j = 0; j < 1024; ++j)
> A[j][i] = B[j][i];
> }
I didn't see you mentioning _why_ you expect an interchange here.
Are you prehaps interested in spatial locality?
If so, then there are several approaches for taking
that into account.
- pluto performs an intra-tile loop interchange to
improve temporal and/or spatial locality. It shouldn't
be too hard to do something similar on an isl generated
schedule
- Alex (Oleksandr) has been working on an extension of
the isl scheduler that takes into account spatial locality.
I'm not sure if it's publicly available.
- I've been working on a special case of spatial locality
(consecutivity). The current version is available in
the consecutivity branch. Note that it may get rebased and
it may not necessarily get merged into master.
There are also other approaches, but they may not be that
easy to combine with the isl scheduler.
skimo
^ permalink raw reply [flat|nested] 26+ messages in thread
* isl scheduler and spatial locality (Re: [PATCH][GRAPHITE] More TLC)
2017-09-27 12:18 ` Richard Biener
` (2 preceding siblings ...)
2017-09-29 19:34 ` isl scheduler and spatial locality (Re: [PATCH][GRAPHITE] More TLC) Sven Verdoolaege
@ 2017-09-29 19:37 ` Sven Verdoolaege
2017-09-29 19:59 ` Sebastian Pop
3 siblings, 1 reply; 26+ messages in thread
From: Sven Verdoolaege @ 2017-09-29 19:37 UTC (permalink / raw)
To: Richard Biener; +Cc: Sebastian Pop, GCC Patches, Oleksandr Zinenko
[Sorry for the resend; I used the wrong email address to CC Alex]
On Wed, Sep 27, 2017 at 02:18:51PM +0200, Richard Biener wrote:
> Ah, so I now see why we do not perform interchange on trivial cases like
>
> double A[1024][1024], B[1024][1024];
>
> void foo(void)
> {
> for (int i = 0; i < 1024; ++i)
> for (int j = 0; j < 1024; ++j)
> A[j][i] = B[j][i];
> }
I didn't see you mentioning _why_ you expect an interchange here.
Are you prehaps interested in spatial locality?
If so, then there are several approaches for taking
that into account.
- pluto performs an intra-tile loop interchange to
improve temporal and/or spatial locality. It shouldn't
be too hard to do something similar on an isl generated
schedule
- Alex (Oleksandr) has been working on an extension of
the isl scheduler that takes into account spatial locality.
I'm not sure if it's publicly available.
- I've been working on a special case of spatial locality
(consecutivity). The current version is available in
the consecutivity branch. Note that it may get rebased and
it may not necessarily get merged into master.
There are also other approaches, but they may not be that
easy to combine with the isl scheduler.
skimo
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: isl scheduler and spatial locality (Re: [PATCH][GRAPHITE] More TLC)
2017-09-29 19:37 ` Sven Verdoolaege
@ 2017-09-29 19:59 ` Sebastian Pop
2017-09-29 21:28 ` Oleksandr Zinenko
2017-09-30 17:47 ` Richard Biener
0 siblings, 2 replies; 26+ messages in thread
From: Sebastian Pop @ 2017-09-29 19:59 UTC (permalink / raw)
To: Sven Verdoolaege; +Cc: Richard Biener, GCC Patches, Oleksandr Zinenko
On Fri, Sep 29, 2017 at 2:37 PM, Sven Verdoolaege
<sven.verdoolaege@gmail.com> wrote:
> [Sorry for the resend; I used the wrong email address to CC Alex]
>
> On Wed, Sep 27, 2017 at 02:18:51PM +0200, Richard Biener wrote:
>> Ah, so I now see why we do not perform interchange on trivial cases like
>>
>> double A[1024][1024], B[1024][1024];
>>
>> void foo(void)
>> {
>> for (int i = 0; i < 1024; ++i)
>> for (int j = 0; j < 1024; ++j)
>> A[j][i] = B[j][i];
>> }
>
> I didn't see you mentioning _why_ you expect an interchange here.
> Are you prehaps interested in spatial locality?
> If so, then there are several approaches for taking
> that into account.
> - pluto performs an intra-tile loop interchange to
> improve temporal and/or spatial locality. It shouldn't
> be too hard to do something similar on an isl generated
> schedule
> - Alex (Oleksandr) has been working on an extension of
> the isl scheduler that takes into account spatial locality.
> I'm not sure if it's publicly available.
> - I've been working on a special case of spatial locality
> (consecutivity). The current version is available in
> the consecutivity branch. Note that it may get rebased and
> it may not necessarily get merged into master.
>
> There are also other approaches, but they may not be that
> easy to combine with the isl scheduler.
Would the following work?
Add to the proximity relation the array accesses from two
successive iterations of the innermost loop:
A[j][i] -> A[j][i+1] and B[j][i] -> B[j][i+1]
With these two extra relations in the proximity map,
isl should be able to interchange the above loop.
Sebastian
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: isl scheduler and spatial locality (Re: [PATCH][GRAPHITE] More TLC)
2017-09-29 19:59 ` Sebastian Pop
@ 2017-09-29 21:28 ` Oleksandr Zinenko
2017-09-30 17:47 ` Richard Biener
1 sibling, 0 replies; 26+ messages in thread
From: Oleksandr Zinenko @ 2017-09-29 21:28 UTC (permalink / raw)
To: Sebastian Pop, Sven Verdoolaege; +Cc: Richard Biener, GCC Patches
On 29/09/17 21:58, Sebastian Pop wrote:
> On Fri, Sep 29, 2017 at 2:37 PM, Sven Verdoolaege
> <sven.verdoolaege@gmail.com> wrote:
>> [Sorry for the resend; I used the wrong email address to CC Alex]
>>
>> On Wed, Sep 27, 2017 at 02:18:51PM +0200, Richard Biener wrote:
>>> Ah, so I now see why we do not perform interchange on trivial cases like
>>>
>>> double A[1024][1024], B[1024][1024];
>>>
>>> void foo(void)
>>> {
>>> for (int i = 0; i < 1024; ++i)
>>> for (int j = 0; j < 1024; ++j)
>>> A[j][i] = B[j][i];
>>> }
>> I didn't see you mentioning _why_ you expect an interchange here.
>> Are you prehaps interested in spatial locality?
>> If so, then there are several approaches for taking
>> that into account.
>> - pluto performs an intra-tile loop interchange to
>> improve temporal and/or spatial locality. It shouldn't
>> be too hard to do something similar on an isl generated
>> schedule
>> - Alex (Oleksandr) has been working on an extension of
>> the isl scheduler that takes into account spatial locality.
>> I'm not sure if it's publicly available.
>> - I've been working on a special case of spatial locality
>> (consecutivity). The current version is available in
>> the consecutivity branch. Note that it may get rebased and
>> it may not necessarily get merged into master.
>>
>> There are also other approaches, but they may not be that
>> easy to combine with the isl scheduler.
> Would the following work?
> Add to the proximity relation the array accesses from two
> successive iterations of the innermost loop:
> A[j][i] -> A[j][i+1] and B[j][i] -> B[j][i+1]
> With these two extra relations in the proximity map,
> isl should be able to interchange the above loop.
>
> Sebastian
Hi,
this looks very close to what we do for spatial locality in the
scheduler, except that we separate proximity and "spatial proximity"
maps. There is a couple of caveats in just plugging those into
proximity, in particular resolving conflicts between spatial and
temporal locality and unnecessary skewing.
Cheers,
Alex
--
Oleksandr Zinenko,
Inria / Ãcole Normale Supérieure,
contact@ozinenko.com, oleksandr.zinenko@inria.fr
https://www.ozinenko.com
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: isl scheduler and spatial locality (Re: [PATCH][GRAPHITE] More TLC)
2017-09-29 19:59 ` Sebastian Pop
2017-09-29 21:28 ` Oleksandr Zinenko
@ 2017-09-30 17:47 ` Richard Biener
2017-10-01 9:58 ` Sven Verdoolaege
1 sibling, 1 reply; 26+ messages in thread
From: Richard Biener @ 2017-09-30 17:47 UTC (permalink / raw)
To: Sebastian Pop, Sven Verdoolaege; +Cc: GCC Patches, Oleksandr Zinenko
On September 29, 2017 9:58:41 PM GMT+02:00, Sebastian Pop <sebpop@gmail.com> wrote:
>On Fri, Sep 29, 2017 at 2:37 PM, Sven Verdoolaege
><sven.verdoolaege@gmail.com> wrote:
>> [Sorry for the resend; I used the wrong email address to CC Alex]
>>
>> On Wed, Sep 27, 2017 at 02:18:51PM +0200, Richard Biener wrote:
>>> Ah, so I now see why we do not perform interchange on trivial cases
>like
>>>
>>> double A[1024][1024], B[1024][1024];
>>>
>>> void foo(void)
>>> {
>>> for (int i = 0; i < 1024; ++i)
>>> for (int j = 0; j < 1024; ++j)
>>> A[j][i] = B[j][i];
>>> }
>>
>> I didn't see you mentioning _why_ you expect an interchange here.
>> Are you prehaps interested in spatial locality?
>> If so, then there are several approaches for taking
>> that into account.
>> - pluto performs an intra-tile loop interchange to
>> improve temporal and/or spatial locality. It shouldn't
>> be too hard to do something similar on an isl generated
>> schedule
>> - Alex (Oleksandr) has been working on an extension of
>> the isl scheduler that takes into account spatial locality.
>> I'm not sure if it's publicly available.
>> - I've been working on a special case of spatial locality
>> (consecutivity). The current version is available in
>> the consecutivity branch. Note that it may get rebased and
>> it may not necessarily get merged into master.
>>
>> There are also other approaches, but they may not be that
>> easy to combine with the isl scheduler.
>
>Would the following work?
>Add to the proximity relation the array accesses from two
>successive iterations of the innermost loop:
>A[j][i] -> A[j][i+1] and B[j][i] -> B[j][i+1]
>With these two extra relations in the proximity map,
>isl should be able to interchange the above loop.
Can anyone give a hint on how to do that in ISL terms?
Richard.
>Sebastian
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: isl scheduler and spatial locality (Re: [PATCH][GRAPHITE] More TLC)
2017-09-30 17:47 ` Richard Biener
@ 2017-10-01 9:58 ` Sven Verdoolaege
2017-11-11 18:02 ` Sven Verdoolaege
0 siblings, 1 reply; 26+ messages in thread
From: Sven Verdoolaege @ 2017-10-01 9:58 UTC (permalink / raw)
To: Richard Biener; +Cc: Sebastian Pop, GCC Patches, Oleksandr Zinenko
On Sat, Sep 30, 2017 at 07:47:43PM +0200, Richard Biener wrote:
> On September 29, 2017 9:58:41 PM GMT+02:00, Sebastian Pop <sebpop@gmail.com> wrote:
> >On Fri, Sep 29, 2017 at 2:37 PM, Sven Verdoolaege
> ><sven.verdoolaege@gmail.com> wrote:
> >> [Sorry for the resend; I used the wrong email address to CC Alex]
> >>
> >> On Wed, Sep 27, 2017 at 02:18:51PM +0200, Richard Biener wrote:
> >>> Ah, so I now see why we do not perform interchange on trivial cases
> >like
> >>>
> >>> double A[1024][1024], B[1024][1024];
> >>>
> >>> void foo(void)
> >>> {
> >>> for (int i = 0; i < 1024; ++i)
> >>> for (int j = 0; j < 1024; ++j)
> >>> A[j][i] = B[j][i];
> >>> }
> >>
> >> I didn't see you mentioning _why_ you expect an interchange here.
> >> Are you prehaps interested in spatial locality?
> >> If so, then there are several approaches for taking
> >> that into account.
> >> - pluto performs an intra-tile loop interchange to
> >> improve temporal and/or spatial locality. It shouldn't
> >> be too hard to do something similar on an isl generated
> >> schedule
> >> - Alex (Oleksandr) has been working on an extension of
> >> the isl scheduler that takes into account spatial locality.
> >> I'm not sure if it's publicly available.
> >> - I've been working on a special case of spatial locality
> >> (consecutivity). The current version is available in
> >> the consecutivity branch. Note that it may get rebased and
> >> it may not necessarily get merged into master.
> >>
> >> There are also other approaches, but they may not be that
> >> easy to combine with the isl scheduler.
> >
> >Would the following work?
> >Add to the proximity relation the array accesses from two
> >successive iterations of the innermost loop:
> >A[j][i] -> A[j][i+1] and B[j][i] -> B[j][i+1]
> >With these two extra relations in the proximity map,
> >isl should be able to interchange the above loop.
>
> Can anyone give a hint on how to do that in ISL terms?
For the approach pluto is taking, you'll have to look at the source
code, see pluto_intra_tile_optimize_band.
For the other two approaches I mentioned above, reports will
be made available within the next couple of weeks.
For the last one, there is a sample implementation in the
consecutivity branch of PPCG.
skimo
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: isl scheduler and spatial locality (Re: [PATCH][GRAPHITE] More TLC)
2017-10-01 9:58 ` Sven Verdoolaege
@ 2017-11-11 18:02 ` Sven Verdoolaege
0 siblings, 0 replies; 26+ messages in thread
From: Sven Verdoolaege @ 2017-11-11 18:02 UTC (permalink / raw)
To: Richard Biener; +Cc: Sebastian Pop, GCC Patches, Oleksandr Zinenko
On Sun, Oct 01, 2017 at 11:58:30AM +0200, Sven Verdoolaege wrote:
> For the approach pluto is taking, you'll have to look at the source
> code, see pluto_intra_tile_optimize_band.
> For the other two approaches I mentioned above, reports will
> be made available within the next couple of weeks.
https://hal.inria.fr/hal-01628798
http://www.cs.kuleuven.be/publicaties/rapporten/cw/CW709.abs.html
skimo
^ permalink raw reply [flat|nested] 26+ messages in thread
* [PATCH][GRAPHITE] More TLC
@ 2017-10-18 13:47 Richard Biener
0 siblings, 0 replies; 26+ messages in thread
From: Richard Biener @ 2017-10-18 13:47 UTC (permalink / raw)
To: gcc-patches
And using range-info to constain parameters.
Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.
Richard.
2017-10-18 Richard Biener <rguenther@suse.de>
* graphite-isl-ast-to-gimple.c
(translate_isl_ast_to_gimple::set_rename): Simplify.
(translate_isl_ast_to_gimple::set_rename_for_each_def): Inline...
(graphite_copy_stmts_from_block): ... here.
(copy_bb_and_scalar_dependences): Simplify.
(add_parameters_to_ivs_params): Canonicalize.
(generate_entry_out_of_ssa_copies): Simplify.
* graphite-sese-to-poly.c (extract_affine_name): Simplify
by passing in ISL dimension.
(parameter_index_in_region_1): Rename to ...
(parameter_index_in_region): ... this.
(extract_affine): Adjust assert, pass down parameter index.
(add_param_constraints): Use range-info when available.
(build_scop_context): Adjust.
* sese.c (new_sese_info): Adjust.
(free_sese_info): Likewise.
* sese.h (bb_map_t, rename_map_t, phi_rename, init_back_edge_pair_t):
Remove unused typedefs.
(struct sese_info_t): Simplify rename_map, remove incomplete_phis.
Index: gcc/graphite-isl-ast-to-gimple.c
===================================================================
--- gcc/graphite-isl-ast-to-gimple.c (revision 253848)
+++ gcc/graphite-isl-ast-to-gimple.c (working copy)
@@ -195,7 +195,6 @@ class translate_isl_ast_to_gimple
edge copy_bb_and_scalar_dependences (basic_block bb, edge next_e,
vec<tree> iv_map);
void set_rename (tree old_name, tree expr);
- void set_rename_for_each_def (gimple *stmt);
void gsi_insert_earliest (gimple_seq seq);
bool codegen_error_p () const { return codegen_error; }
@@ -932,25 +931,12 @@ set_rename (tree old_name, tree expr)
{
fprintf (dump_file, "[codegen] setting rename: old_name = ");
print_generic_expr (dump_file, old_name);
- fprintf (dump_file, ", new_name = ");
+ fprintf (dump_file, ", new decl = ");
print_generic_expr (dump_file, expr);
fprintf (dump_file, "\n");
}
-
- if (old_name == expr)
- return;
-
- vec <tree> *renames = region->rename_map->get (old_name);
-
- if (renames)
- renames->safe_push (expr);
- else
- {
- vec<tree> r;
- r.create (2);
- r.safe_push (expr);
- region->rename_map->put (old_name, r);
- }
+ bool res = region->rename_map->put (old_name, expr);
+ gcc_assert (! res);
}
/* Return an iterator to the instructions comes last in the execution order.
@@ -1132,21 +1118,6 @@ should_copy_to_new_region (gimple *stmt,
return true;
}
-/* Create new names for all the definitions created by COPY and add replacement
- mappings for each new name. */
-
-void translate_isl_ast_to_gimple::
-set_rename_for_each_def (gimple *stmt)
-{
- def_operand_p def_p;
- ssa_op_iter op_iter;
- FOR_EACH_SSA_DEF_OPERAND (def_p, stmt, op_iter, SSA_OP_ALL_DEFS)
- {
- tree old_name = DEF_FROM_PTR (def_p);
- create_new_def_for (old_name, stmt, def_p);
- }
-}
-
/* Duplicates the statements of basic block BB into basic block NEW_BB
and compute the new induction variables according to the IV_MAP. */
@@ -1192,7 +1163,13 @@ graphite_copy_stmts_from_block (basic_bl
gimple_duplicate_stmt_histograms (cfun, copy, cfun, stmt);
/* Crete new names for each def in the copied stmt. */
- set_rename_for_each_def (copy);
+ def_operand_p def_p;
+ ssa_op_iter op_iter;
+ FOR_EACH_SSA_DEF_OPERAND (def_p, copy, op_iter, SSA_OP_ALL_DEFS)
+ {
+ tree old_name = DEF_FROM_PTR (def_p);
+ create_new_def_for (old_name, copy, def_p);
+ }
if (codegen_error_p ())
return false;
@@ -1244,17 +1221,14 @@ copy_bb_and_scalar_dependences (basic_bl
continue;
tree new_phi_def;
- vec <tree> *renames = region->rename_map->get (res);
- if (! renames || renames->is_empty ())
+ tree *rename = region->rename_map->get (res);
+ if (! rename)
{
new_phi_def = create_tmp_reg (TREE_TYPE (res));
set_rename (res, new_phi_def);
}
else
- {
- gcc_assert (renames->length () == 1);
- new_phi_def = (*renames)[0];
- }
+ new_phi_def = *rename;
gassign *ass = gimple_build_assign (NULL_TREE, new_phi_def);
create_new_def_for (res, ass, NULL);
@@ -1291,17 +1265,14 @@ copy_bb_and_scalar_dependences (basic_bl
continue;
tree new_phi_def;
- vec <tree> *renames = region->rename_map->get (res);
- if (! renames || renames->is_empty ())
+ tree *rename = region->rename_map->get (res);
+ if (! rename)
{
new_phi_def = create_tmp_reg (TREE_TYPE (res));
set_rename (res, new_phi_def);
}
else
- {
- gcc_assert (renames->length () == 1);
- new_phi_def = (*renames)[0];
- }
+ new_phi_def = *rename;
tree arg = PHI_ARG_DEF_FROM_EDGE (phi, e);
if (TREE_CODE (arg) == SSA_NAME
@@ -1336,13 +1307,14 @@ add_parameters_to_ivs_params (scop_p sco
{
sese_info_p region = scop->scop_info;
unsigned nb_parameters = isl_set_dim (scop->param_context, isl_dim_param);
- gcc_assert (nb_parameters == region->params.length ());
+ gcc_assert (nb_parameters == sese_nb_params (region));
unsigned i;
- for (i = 0; i < nb_parameters; i++)
+ tree param;
+ FOR_EACH_VEC_ELT (region->params, i, param)
{
isl_id *tmp_id = isl_set_get_dim_id (scop->param_context,
isl_dim_param, i);
- ip[tmp_id] = region->params[i];
+ ip[tmp_id] = param;
}
}
@@ -1417,10 +1389,10 @@ generate_entry_out_of_ssa_copies (edge f
continue;
/* When there's no out-of-SSA var registered do not bother
to create one. */
- vec <tree> *renames = region->rename_map->get (res);
- if (! renames || renames->is_empty ())
+ tree *rename = region->rename_map->get (res);
+ if (! rename)
continue;
- tree new_phi_def = (*renames)[0];
+ tree new_phi_def = *rename;
gassign *ass = gimple_build_assign (new_phi_def,
PHI_ARG_DEF_FROM_EDGE (phi,
false_entry));
Index: gcc/graphite-sese-to-poly.c
===================================================================
--- gcc/graphite-sese-to-poly.c (revision 253848)
+++ gcc/graphite-sese-to-poly.c (working copy)
@@ -142,11 +142,8 @@ isl_id_for_dr (scop_p s)
/* Extract an affine expression from the ssa_name E. */
static isl_pw_aff *
-extract_affine_name (scop_p s, tree e, __isl_take isl_space *space)
+extract_affine_name (int dimension, __isl_take isl_space *space)
{
- isl_id *id = isl_id_for_ssa_name (s, e);
- int dimension = isl_space_find_dim_by_id (space, isl_dim_param, id);
- isl_id_free (id);
isl_set *dom = isl_set_universe (isl_space_copy (space));
isl_aff *aff = isl_aff_zero_on_domain (isl_local_space_from_space (space));
aff = isl_aff_add_coefficient_si (aff, isl_dim_param, dimension, 1);
@@ -211,17 +208,13 @@ wrap (isl_pw_aff *pwaff, unsigned width)
Otherwise returns -1. */
static inline int
-parameter_index_in_region_1 (tree name, sese_info_p region)
+parameter_index_in_region (tree name, sese_info_p region)
{
int i;
tree p;
-
- gcc_assert (TREE_CODE (name) == SSA_NAME);
-
FOR_EACH_VEC_ELT (region->params, i, p)
if (p == name)
return i;
-
return -1;
}
@@ -288,10 +281,13 @@ extract_affine (scop_p s, tree e, __isl_
break;
case SSA_NAME:
- gcc_assert (-1 != parameter_index_in_region_1 (e, s->scop_info)
- || defined_in_sese_p (e, s->scop_info->region));
- res = extract_affine_name (s, e, space);
- break;
+ {
+ gcc_assert (! defined_in_sese_p (e, s->scop_info->region));
+ int dim = parameter_index_in_region (e, s->scop_info);
+ gcc_assert (dim != -1);
+ res = extract_affine_name (dim, space);
+ break;
+ }
case INTEGER_CST:
res = extract_affine_int (e, space);
@@ -431,54 +427,40 @@ add_conditions_to_domain (poly_bb_p pbb)
of P. */
static void
-add_param_constraints (scop_p scop, graphite_dim_t p)
+add_param_constraints (scop_p scop, graphite_dim_t p, tree parameter)
{
- tree parameter = scop->scop_info->params[p];
tree type = TREE_TYPE (parameter);
- tree lb = NULL_TREE;
- tree ub = NULL_TREE;
+ wide_int min, max;
- if (POINTER_TYPE_P (type) || !TYPE_MIN_VALUE (type))
- lb = lower_bound_in_type (type, type);
- else
- lb = TYPE_MIN_VALUE (type);
+ gcc_assert (INTEGRAL_TYPE_P (type) || POINTER_TYPE_P (type));
- if (POINTER_TYPE_P (type) || !TYPE_MAX_VALUE (type))
- ub = upper_bound_in_type (type, type);
+ if (INTEGRAL_TYPE_P (type)
+ && get_range_info (parameter, &min, &max) == VR_RANGE)
+ ;
else
- ub = TYPE_MAX_VALUE (type);
-
- if (lb)
{
- isl_space *space = isl_set_get_space (scop->param_context);
- isl_constraint *c;
- isl_val *v;
-
- c = isl_inequality_alloc (isl_local_space_from_space (space));
- v = isl_val_int_from_wi (scop->isl_context, wi::to_widest (lb));
- v = isl_val_neg (v);
- c = isl_constraint_set_constant_val (c, v);
- c = isl_constraint_set_coefficient_si (c, isl_dim_param, p, 1);
-
- scop->param_context = isl_set_coalesce
- (isl_set_add_constraint (scop->param_context, c));
+ min = wi::min_value (TYPE_PRECISION (type), TYPE_SIGN (type));
+ max = wi::max_value (TYPE_PRECISION (type), TYPE_SIGN (type));
}
- if (ub)
- {
- isl_space *space = isl_set_get_space (scop->param_context);
- isl_constraint *c;
- isl_val *v;
-
- c = isl_inequality_alloc (isl_local_space_from_space (space));
-
- v = isl_val_int_from_wi (scop->isl_context, wi::to_widest (ub));
- c = isl_constraint_set_constant_val (c, v);
- c = isl_constraint_set_coefficient_si (c, isl_dim_param, p, -1);
-
- scop->param_context = isl_set_coalesce
- (isl_set_add_constraint (scop->param_context, c));
- }
+ isl_space *space = isl_set_get_space (scop->param_context);
+ isl_constraint *c = isl_inequality_alloc (isl_local_space_from_space (space));
+ isl_val *v = isl_val_int_from_wi (scop->isl_context,
+ widest_int::from (min, TYPE_SIGN (type)));
+ v = isl_val_neg (v);
+ c = isl_constraint_set_constant_val (c, v);
+ c = isl_constraint_set_coefficient_si (c, isl_dim_param, p, 1);
+ scop->param_context = isl_set_coalesce
+ (isl_set_add_constraint (scop->param_context, c));
+
+ space = isl_set_get_space (scop->param_context);
+ c = isl_inequality_alloc (isl_local_space_from_space (space));
+ v = isl_val_int_from_wi (scop->isl_context,
+ widest_int::from (max, TYPE_SIGN (type)));
+ c = isl_constraint_set_constant_val (c, v);
+ c = isl_constraint_set_coefficient_si (c, isl_dim_param, p, -1);
+ scop->param_context = isl_set_coalesce
+ (isl_set_add_constraint (scop->param_context, c));
}
/* Add a constrain to the ACCESSES polyhedron for the alias set of
@@ -930,9 +912,8 @@ build_scop_context (scop_p scop)
scop->param_context = isl_set_universe (space);
- graphite_dim_t p;
- for (p = 0; p < nbp; p++)
- add_param_constraints (scop, p);
+ FOR_EACH_VEC_ELT (region->params, i, e)
+ add_param_constraints (scop, i, e);
}
/* Return true when loop A is nested in loop B. */
@@ -1224,7 +1205,7 @@ bool
build_poly_scop (scop_p scop)
{
int old_err = isl_options_get_on_error (scop->isl_context);
- isl_options_set_on_error (scop->isl_context, ISL_ON_ERROR_CONTINUE);
+ isl_options_set_on_error (scop->isl_context, ISL_ON_ERROR_ABORT);
build_scop_context (scop);
Index: gcc/sese.c
===================================================================
--- gcc/sese.c (revision 253848)
+++ gcc/sese.c (working copy)
@@ -156,10 +156,8 @@ new_sese_info (edge entry, edge exit)
region->liveout = NULL;
region->debug_liveout = NULL;
region->params.create (3);
- region->rename_map = new rename_map_t;
+ region->rename_map = new hash_map <tree, tree>;
region->bbs.create (3);
- region->incomplete_phis.create (3);
-
return region;
}
@@ -173,14 +171,9 @@ free_sese_info (sese_info_p region)
BITMAP_FREE (region->liveout);
BITMAP_FREE (region->debug_liveout);
- for (rename_map_t::iterator it = region->rename_map->begin ();
- it != region->rename_map->end (); ++it)
- (*it).second.release ();
-
delete region->rename_map;
region->rename_map = NULL;
region->bbs.release ();
- region->incomplete_phis.release ();
XDELETE (region);
}
Index: gcc/sese.h
===================================================================
--- gcc/sese.h (revision 253848)
+++ gcc/sese.h (working copy)
@@ -22,13 +22,7 @@ along with GCC; see the file COPYING3.
#ifndef GCC_SESE_H
#define GCC_SESE_H
-typedef hash_map<basic_block, vec<basic_block> > bb_map_t;
-typedef hash_map<tree, vec<tree> > rename_map_t;
typedef struct ifsese_s *ifsese;
-/* First phi is the new codegenerated phi second one is original phi. */
-typedef std::pair <gphi *, gphi *> phi_rename;
-/* First edge is the init edge and second is the back edge w.r.t. a loop. */
-typedef std::pair<edge, edge> init_back_edge_pair_t;
/* A Single Entry, Single Exit region is a part of the CFG delimited
by two edges. */
@@ -91,18 +85,12 @@ typedef struct sese_info_t
/* Parameters used within the SCOP. */
vec<tree> params;
- /* Maps an old name to one or more new names. When there are several new
- names, one has to select the definition corresponding to the immediate
- dominator. */
- rename_map_t *rename_map;
+ /* Maps an old name to a new decl. */
+ hash_map<tree, tree> *rename_map;
/* Basic blocks contained in this SESE. */
vec<basic_block> bbs;
- /* A vector of phi nodes to be updated when all arguments are available. The
- pair contains first the old_phi and second the new_phi. */
- vec<phi_rename> incomplete_phis;
-
/* The condition region generated for this sese. */
ifsese if_region;
^ permalink raw reply [flat|nested] 26+ messages in thread
end of thread, other threads:[~2017-11-11 16:57 UTC | newest]
Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-09-22 13:03 [PATCH][GRAPHITE] More TLC Richard Biener
2017-09-22 14:30 ` Sebastian Pop
2017-09-25 13:12 ` Richard Biener
2017-09-26 14:20 ` Sebastian Pop
2017-09-26 15:15 ` Sven Verdoolaege
2017-09-28 18:29 ` Sebastian Pop
2017-09-27 12:18 ` Richard Biener
2017-09-27 13:04 ` Richard Biener
2017-09-27 14:33 ` Richard Biener
2017-09-28 19:11 ` Sebastian Pop
2017-09-29 11:17 ` Richard Biener
2017-09-29 14:59 ` Sebastian Pop
2017-09-28 19:00 ` Sebastian Pop
2017-09-28 18:46 ` Sebastian Pop
2017-09-29 19:34 ` isl scheduler and spatial locality (Re: [PATCH][GRAPHITE] More TLC) Sven Verdoolaege
2017-09-29 19:37 ` Sven Verdoolaege
2017-09-29 19:59 ` Sebastian Pop
2017-09-29 21:28 ` Oleksandr Zinenko
2017-09-30 17:47 ` Richard Biener
2017-10-01 9:58 ` Sven Verdoolaege
2017-11-11 18:02 ` Sven Verdoolaege
2017-09-25 7:39 ` [PATCH][GRAPHITE] More TLC Richard Biener
2017-09-25 12:46 ` Richard Biener
2017-09-25 12:55 ` Bin.Cheng
2017-09-25 13:10 ` Richard Biener
2017-10-18 13:47 Richard Biener
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).