* [PATCH] tree-optimization/101120 - fix compile-time issue with SLP groups
@ 2021-06-18 12:22 Richard Biener
From: Richard Biener @ 2021-06-18 12:22 UTC
To: gcc-patches; +Cc: richard.sandiford
This adds two hacks to work around a long-standing compile-time issue
when vectorizing large permuted SLP groups with gaps: we end up
emitting loads and IV adjustments for the gaps as well, and those
are quite costly until they are eventually cleaned up.
The first hack folds the auto-inc style IV updates early, in the
vectorizer itself, rather than leaving that to the next forwprop pass.
This shortens the SSA use-def chains of the IV that is actually used.
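To illustrate why folding the bumps early helps, here is a hypothetical toy model in C++. It is not GCC code: `Incr`, `emit_bumps` and `chain_depth` are made-up names. It assumes each bump has the form `p_new = p_prev + STEP` with a constant STEP, and that folding reassociates `(q + a) + b` into `q + (a + b)`, roughly what forwprop would otherwise do later. Without folding, the Nth pointer sits at the end of an N-long use-def chain; with folding, every pointer is based directly on the initial one.

```cpp
#include <cassert>
#include <vector>

// Hypothetical toy model (not GCC's API): statement i defines
//   ptr_i = (base == -1 ? p0 : ptr_base) + offset
struct Incr {
  int base;     // index of the defining increment, or -1 for the initial pointer p0
  long offset;  // constant byte offset that is added
};

// Emit N pointer bumps of STEP bytes, each based on the previous one.
// If FOLD is true, combine (q + a) + b into q + (a + b) right away,
// looking through the definition of the base (like folding with
// SSA edges followed).
std::vector<Incr> emit_bumps(int n, long step, bool fold) {
  std::vector<Incr> stmts;
  int prev = -1;
  for (int i = 0; i < n; ++i) {
    Incr s{prev, step};
    if (fold && s.base != -1) {
      const Incr &def = stmts[s.base];
      s = Incr{def.base, def.offset + s.offset};  // reassociate constants
    }
    stmts.push_back(s);
    prev = i;
  }
  return stmts;
}

// Number of increment statements on the use-def chain from statement I
// back to the initial pointer.
int chain_depth(const std::vector<Incr> &stmts, int i) {
  assert(i >= 0 && static_cast<std::size_t>(i) < stmts.size());
  int depth = 0;
  for (int cur = i; cur != -1; cur = stmts[cur].base)
    ++depth;
  return depth;
}
```

For example, `emit_bumps (8, 16, true)` yields a last statement with base -1 (the initial pointer) and offset 128, so its `chain_depth` is 1, whereas the unfolded sequence gives a depth of 8.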
The second hack removes the unused loads once we have picked all
the ones we can possibly use.
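A minimal sketch of the idea behind the second hack, again as a hypothetical C++ toy model rather than GCC code: record which entries of the DR chain the generated permutes actually read (two inputs for a real permute, one for an identity), then drop every load that was never marked. `Perm` and `dce_chain` are illustrative names; the real patch works on SSA names and removes the defining statements with `gsi_remove` and `release_defs` (the revised patch later in this thread tracks the used entries in a bitmap in a similar way).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy model (not GCC's API): each permute reads one or two
// entries of the DR chain by index.
struct Perm {
  int first;   // index of the first input vector
  int second;  // index of the second input vector, or -1 for an identity
};

// Mark every chain entry some permute reads, then keep only the marked
// entries, preserving their original order.
std::vector<std::string> dce_chain(const std::vector<std::string> &chain,
                                   const std::vector<Perm> &perms) {
  std::vector<bool> used(chain.size(), false);
  for (const Perm &p : perms) {
    assert(p.first >= 0 && static_cast<std::size_t>(p.first) < chain.size());
    used[p.first] = true;
    if (p.second != -1) {
      assert(p.second >= 0 &&
             static_cast<std::size_t>(p.second) < chain.size());
      used[p.second] = true;
    }
  }
  std::vector<std::string> kept;
  for (std::size_t i = 0; i < chain.size(); ++i)
    if (used[i])
      kept.push_back(chain[i]);  // unmarked loads are dropped
  return kept;
}
```

With a four-entry chain and permutes reading entries {0, 2} and {0} (identity), only entries 0 and 2 survive.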
Bootstrap / regtest running on x86_64-unknown-linux-gnu.
I wonder whether this is too gross (and I still have to check the one
or two duplicate bugs), but it should at least be easy to backport ...
Thanks,
Richard.
2021-06-18 Richard Biener <rguenther@suse.de>
PR tree-optimization/101120
* tree-vect-data-refs.c (bump_vector_ptr): Fold the
built increment.
* tree-vect-stmts.c (vectorizable_load): Remove unused
loads in the DR chain for SLP.
---
gcc/tree-vect-data-refs.c | 12 +++++++++++-
gcc/tree-vect-stmts.c | 12 ++++++++++++
2 files changed, 23 insertions(+), 1 deletion(-)
diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
index bb086c6ac1c..be067c8923b 100644
--- a/gcc/tree-vect-data-refs.c
+++ b/gcc/tree-vect-data-refs.c
@@ -53,6 +53,7 @@ along with GCC; see the file COPYING3. If not see
#include "tree-hash-traits.h"
#include "vec-perm-indices.h"
#include "internal-fn.h"
+#include "gimple-fold.h"
/* Return true if load- or store-lanes optab OPTAB is implemented for
COUNT vectors of type VECTYPE. NAME is the name of OPTAB. */
@@ -5026,7 +5027,7 @@ bump_vector_ptr (vec_info *vinfo,
struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info);
tree vectype = STMT_VINFO_VECTYPE (stmt_info);
tree update = TYPE_SIZE_UNIT (vectype);
- gassign *incr_stmt;
+ gimple *incr_stmt;
ssa_op_iter iter;
use_operand_p use_p;
tree new_dataref_ptr;
@@ -5041,6 +5042,15 @@ bump_vector_ptr (vec_info *vinfo,
incr_stmt = gimple_build_assign (new_dataref_ptr, POINTER_PLUS_EXPR,
dataref_ptr, update);
vect_finish_stmt_generation (vinfo, stmt_info, incr_stmt, gsi);
+ /* Fold the increment, avoiding excessively long use-def chains of
+ these increments, which lead to compile-time issues for passes
+ running before the next forwprop pass (which would fold them too). */
+ gimple_stmt_iterator fold_gsi = gsi_for_stmt (incr_stmt);
+ if (fold_stmt (&fold_gsi, follow_all_ssa_edges))
+ {
+ incr_stmt = gsi_stmt (fold_gsi);
+ update_stmt (incr_stmt);
+ }
/* Copy the points-to information if it exists. */
if (DR_PTR_INFO (dr))
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index eeef96a2eb6..1636e6716df 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -9765,6 +9765,18 @@ vectorizable_load (vec_info *vinfo,
bool ok = vect_transform_slp_perm_load (vinfo, slp_node, dr_chain,
gsi, vf, false, &n_perms);
gcc_assert (ok);
+ /* For SLP we know we've seen all possible uses of dr_chain.
+ Try to remove the stmts we didn't need.
+ ??? This is a hack to prevent compile-time issues as seen
+ in PR101120 and friends. */
+ for (tree op : dr_chain)
+ if (has_zero_uses (op))
+ {
+ gimple *stmt = SSA_NAME_DEF_STMT (op);
+ gimple_stmt_iterator rgsi = gsi_for_stmt (stmt);
+ gsi_remove (&rgsi, true);
+ release_defs (stmt);
+ }
}
else
{
--
2.26.2
* Re: [PATCH] tree-optimization/101120 - fix compile-time issue with SLP groups
@ 2021-06-18 14:24 ` Richard Biener
From: Richard Biener @ 2021-06-18 14:24 UTC
To: Richard Biener; +Cc: GCC Patches, Richard Sandiford
[-- Attachment #1: Type: text/plain, Size: 892 bytes --]
On Fri, Jun 18, 2021 at 2:23 PM Richard Biener <rguenther@suse.de> wrote:
>
> This places two hacks to avoid an old compile-time issue when
> vectorizing large permuted SLP groups with gaps where we end up
> emitting loads and IV adjustments for the gap as well and those
> have quite a high cost until they are eventually cleaned up.
>
> The first hack is to fold the auto-inc style IV updates early
> in the vectorizer rather than in the next forwprop pass which
> shortens the SSA use-def chains of the used IV.
>
> The second hack is to remove the unused loads after we've picked
> all that we possibly use.
>
> Bootstrap / regtest running on x86_64-unknown-linux-gnu.
>
> I wonder if this is too gross (and I have to check the one or two
> bug duplicates), but it should be at least easy to backport ...
Was apparently too simple - the following passes bootstrap and
regtest.
Richard.
[-- Attachment #2: p --]
[-- Type: application/octet-stream, Size: 6678 bytes --]
From 14d83407b265f1b5026c4d2db327d5cc58bda074 Mon Sep 17 00:00:00 2001
From: Richard Biener <rguenther@suse.de>
Date: Fri, 18 Jun 2021 14:07:00 +0200
Subject: [PATCH] tree-optimization/101120 - fix compile-time issue with SLP
groups
To: gcc-patches@gcc.gnu.org
This adds two hacks to work around a long-standing compile-time issue
when vectorizing large permuted SLP groups with gaps: we end up
emitting loads and IV adjustments for the gaps as well, and those
are quite costly until they are eventually cleaned up.
The first hack folds the auto-inc style IV updates early, in the
vectorizer itself, rather than leaving that to the next forwprop pass.
This shortens the SSA use-def chains of the IV that is actually used.
The second hack removes the unused loads once we have picked all
the ones we can possibly use.
2021-06-18 Richard Biener <rguenther@suse.de>
PR tree-optimization/101120
* tree-vect-data-refs.c (bump_vector_ptr): Fold the
built increment.
* tree-vect-slp.c (vect_transform_slp_perm_load): Add
DR chain DCE capability.
* tree-vectorizer.h (vect_transform_slp_perm_load): Adjust.
* tree-vect-stmts.c (vectorizable_load): Remove unused
loads in the DR chain for SLP.
---
gcc/tree-vect-data-refs.c | 12 +++++++++++-
gcc/tree-vect-slp.c | 31 ++++++++++++++++++++++++++-----
gcc/tree-vect-stmts.c | 7 ++++++-
gcc/tree-vectorizer.h | 2 +-
4 files changed, 44 insertions(+), 8 deletions(-)
diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
index bb086c6ac1c..be067c8923b 100644
--- a/gcc/tree-vect-data-refs.c
+++ b/gcc/tree-vect-data-refs.c
@@ -53,6 +53,7 @@ along with GCC; see the file COPYING3. If not see
#include "tree-hash-traits.h"
#include "vec-perm-indices.h"
#include "internal-fn.h"
+#include "gimple-fold.h"
/* Return true if load- or store-lanes optab OPTAB is implemented for
COUNT vectors of type VECTYPE. NAME is the name of OPTAB. */
@@ -5026,7 +5027,7 @@ bump_vector_ptr (vec_info *vinfo,
struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info);
tree vectype = STMT_VINFO_VECTYPE (stmt_info);
tree update = TYPE_SIZE_UNIT (vectype);
- gassign *incr_stmt;
+ gimple *incr_stmt;
ssa_op_iter iter;
use_operand_p use_p;
tree new_dataref_ptr;
@@ -5041,6 +5042,15 @@ bump_vector_ptr (vec_info *vinfo,
incr_stmt = gimple_build_assign (new_dataref_ptr, POINTER_PLUS_EXPR,
dataref_ptr, update);
vect_finish_stmt_generation (vinfo, stmt_info, incr_stmt, gsi);
+ /* Fold the increment, avoiding excessively long use-def chains of
+ these increments, which lead to compile-time issues for passes
+ running before the next forwprop pass (which would fold them too). */
+ gimple_stmt_iterator fold_gsi = gsi_for_stmt (incr_stmt);
+ if (fold_stmt (&fold_gsi, follow_all_ssa_edges))
+ {
+ incr_stmt = gsi_stmt (fold_gsi);
+ update_stmt (incr_stmt);
+ }
/* Copy the points-to information if it exists. */
if (DR_PTR_INFO (dr))
diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index e2e5a54a87b..396b2bee4d8 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -6317,14 +6317,15 @@ vect_get_slp_defs (vec_info *,
If ANALYZE_ONLY is TRUE, only check that it is possible to create valid
permute statements for the SLP node NODE. Store the number of vector
permute instructions in *N_PERMS and the number of vector load
- instructions in *N_LOADS. */
+ instructions in *N_LOADS. If DCE_CHAIN is true, remove all definitions
+ in DR_CHAIN that are not used by the generated permutes. */
bool
vect_transform_slp_perm_load (vec_info *vinfo,
slp_tree node, vec<tree> dr_chain,
gimple_stmt_iterator *gsi, poly_uint64 vf,
bool analyze_only, unsigned *n_perms,
- unsigned int *n_loads)
+ unsigned int *n_loads, bool dce_chain)
{
stmt_vec_info stmt_info = SLP_TREE_SCALAR_STMTS (node)[0];
int vec_index = 0;
@@ -6403,6 +6404,7 @@ vect_transform_slp_perm_load (vec_info *vinfo,
}
auto_sbitmap used_in_lanes (in_nlanes);
bitmap_clear (used_in_lanes);
+ auto_bitmap used_defs;
unsigned int count = mask.encoded_nelts ();
mask.quick_grow (count);
@@ -6510,11 +6512,20 @@ vect_transform_slp_perm_load (vec_info *vinfo,
mask_vec);
vect_finish_stmt_generation (vinfo, stmt_info, perm_stmt,
gsi);
+ if (dce_chain)
+ {
+ bitmap_set_bit (used_defs, first_vec_index + ri);
+ bitmap_set_bit (used_defs, second_vec_index + ri);
+ }
}
else
- /* If mask was NULL_TREE generate the requested
- identity transform. */
- perm_stmt = SSA_NAME_DEF_STMT (first_vec);
+ {
+ /* If mask was NULL_TREE generate the requested
+ identity transform. */
+ perm_stmt = SSA_NAME_DEF_STMT (first_vec);
+ if (dce_chain)
+ bitmap_set_bit (used_defs, first_vec_index + ri);
+ }
/* Store the vector statement in NODE. */
SLP_TREE_VEC_STMTS (node)[vect_stmts_counter++] = perm_stmt;
@@ -6554,6 +6565,16 @@ vect_transform_slp_perm_load (vec_info *vinfo,
}
}
+ if (dce_chain)
+ for (unsigned i = 0; i < dr_chain.length (); ++i)
+ if (!bitmap_bit_p (used_defs, i))
+ {
+ gimple *stmt = SSA_NAME_DEF_STMT (dr_chain[i]);
+ gimple_stmt_iterator rgsi = gsi_for_stmt (stmt);
+ gsi_remove (&rgsi, true);
+ release_defs (stmt);
+ }
+
return true;
}
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 28d9984318f..d95e359daae 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -9762,8 +9762,13 @@ vectorizable_load (vec_info *vinfo,
if (slp_perm)
{
unsigned n_perms;
+ /* For SLP we know we've seen all possible uses of dr_chain so
+ direct vect_transform_slp_perm_load to DCE the unused parts.
+ ??? This is a hack to prevent compile-time issues as seen
+ in PR101120 and friends. */
bool ok = vect_transform_slp_perm_load (vinfo, slp_node, dr_chain,
- gsi, vf, false, &n_perms);
+ gsi, vf, false, &n_perms,
+ nullptr, true);
gcc_assert (ok);
}
else
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index b9824623ad9..fa28336d429 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -2012,7 +2012,7 @@ extern void vect_free_slp_instance (slp_instance);
extern bool vect_transform_slp_perm_load (vec_info *, slp_tree, vec<tree>,
gimple_stmt_iterator *, poly_uint64,
bool, unsigned *,
- unsigned * = nullptr);
+ unsigned * = nullptr, bool = false);
extern bool vect_slp_analyze_operations (vec_info *);
extern void vect_schedule_slp (vec_info *, vec<slp_instance>);
extern opt_result vect_analyze_slp (vec_info *, unsigned);
--
2.26.2
* Re: [PATCH] tree-optimization/101120 - fix compile-time issue with SLP groups
@ 2021-06-21 13:02 ` Richard Biener
From: Richard Biener @ 2021-06-21 13:02 UTC
To: Richard Biener; +Cc: GCC Patches, Richard Sandiford
On Fri, Jun 18, 2021 at 4:24 PM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> Was apparently too simple - the following passes bootstrap and
> regtest.
I've pushed this now after thinking about better solutions.
Richard.