* [PATCH][RFC] enfoce SLP_TREE_VECTYPE on invariants
@ 2020-05-18 14:27 Richard Biener
2020-05-18 15:33 ` Richard Sandiford
0 siblings, 1 reply; 3+ messages in thread
From: Richard Biener @ 2020-05-18 14:27 UTC (permalink / raw)
To: gcc-patches, richard.sandiford
Hi,
I'm trying to enforce SLP_TREE_VECTYPE being set (correctly) on
external and invariant SLP nodes to avoid (re-)computing that
in other places. The responsible entity for specifying the
desired vector type is vectorizable_* - vectorization analysis
of the user (when we start to share those nodes between multiple
users a user can then also unshare such a node if an existing
vector type conflicts with its own requirements).
The previous patch
https://gcc.gnu.org/pipermail/gcc-patches/2020-May/545938.html
set up things so that vect_slp_analyze_node_operations would
use such type information during costing and also pick up
unshared nodes.
Now as you can see below where I started to enforce SLP_TREE_VECTYPE
for the special boolean case in vect_get_constant_vectors I seemingly
only had to fixup two places, vectorizable_operation (saw bool & bool)
and vectorizable_comparison doing testing on x86_64 and aarch64
(cross cc1 on aarch64-sve.exp and vect.exp). There obviously will
be more places and I was thinking of less ugly (and more correct?)
ways than doing
+ FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (slp_node), i, child)
+ if (SLP_TREE_DEF_TYPE (child) != vect_internal_def)
+ SLP_TREE_VECTYPE (child) = vectype;
in each vectorizable_*. In principle it is the vect_is_simple_use
that would need adjustment to associate the correct SLP child with
a vectype so I can't think of sth better than refactoring things
to sth like
vect_simple_uses (vinfo, stmt_info, slp_node, &dts, &vectypes)
and process all operans at once transparently for stmt_info/slp_node
(where the actual vect_is_simple_use check is redundant for SLP
nodes anyway). The index into the array would then match up
with the SLP child (if SLP) or the stmt operand (if not SLP - with
SLP the order can be different on the stmt).
Any better ideas or comments on the general direction? Richard,
which vectorizable_* do you remember being the "worst" with respect
to vector type determining for invariants?
Thanks,
Richard.
---
gcc/tree-vect-slp.c | 17 +++++++++++++++--
gcc/tree-vect-stmts.c | 22 ++++++++++++++++++++++
2 files changed, 37 insertions(+), 2 deletions(-)
diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index 69a2002717f..32e0b6677b3 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -3640,9 +3640,22 @@ vect_get_constant_vectors (vec_info *vinfo,
tree stmt_vectype = STMT_VINFO_VECTYPE (stmt_vinfo);
if (VECT_SCALAR_BOOLEAN_TYPE_P (TREE_TYPE (op))
&& vect_mask_constant_operand_p (vinfo, stmt_vinfo, op_num))
- vector_type = truth_type_for (stmt_vectype);
+ {
+ /* We always want SLP_TREE_VECTYPE (op_node) here correctly set
+ as first step. */
+ vector_type = SLP_TREE_VECTYPE (op_node);
+ gcc_assert (vector_type
+ && types_compatible_p (vector_type,
+ truth_type_for (stmt_vectype)));
+ }
else
- vector_type = get_vectype_for_scalar_type (vinfo, TREE_TYPE (op), op_node);
+ {
+ vector_type = get_vectype_for_scalar_type (vinfo,
+ TREE_TYPE (op), op_node);
+ gcc_assert (!SLP_TREE_VECTYPE (op_node)
+ || types_compatible_p (SLP_TREE_VECTYPE (op_node),
+ vector_type));
+ }
poly_uint64 vf = 1;
if (loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo))
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 94f233e2e27..c21fba84ec1 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -6169,6 +6169,15 @@ vectorizable_operation (vec_info *vinfo,
vectype, NULL);
}
+ /* Put types on constant and invariant SLP children. */
+ if (slp_node)
+ {
+ slp_tree child;
+ FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (slp_node), i, child)
+ if (SLP_TREE_DEF_TYPE (child) != vect_internal_def)
+ SLP_TREE_VECTYPE (child) = vectype;
+ }
+
STMT_VINFO_TYPE (stmt_info) = op_vec_info_type;
DUMP_VECT_SCOPE ("vectorizable_operation");
vect_model_simple_cost (vinfo, stmt_info,
@@ -10654,6 +10663,19 @@ vectorizable_comparison (vec_info *vinfo,
}
}
+ /* Ideally vect_is_simple_use would return the corresponding SLP
+ nodes as well. But a reorg of that is for when we make SLP
+ only reality. */
+ if ((!vectype1 || !vectype2) && slp_node)
+ for (unsigned i = 0; i < 2; ++i)
+ {
+ slp_tree child = SLP_TREE_CHILDREN (slp_node)[i];
+ if (SLP_TREE_DEF_TYPE (child) != vect_internal_def)
+ /* ??? If we share invariant nodes look for existing
+ vectype and on mismatch unshare the child node. */
+ SLP_TREE_VECTYPE (child) = vectype;
+ }
+
STMT_VINFO_TYPE (stmt_info) = comparison_vec_info_type;
vect_model_simple_cost (vinfo, stmt_info,
ncopies * (1 + (bitop2 != NOP_EXPR)),
--
2.26.1
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH][RFC] enfoce SLP_TREE_VECTYPE on invariants
2020-05-18 14:27 [PATCH][RFC] enfoce SLP_TREE_VECTYPE on invariants Richard Biener
@ 2020-05-18 15:33 ` Richard Sandiford
2020-05-19 7:32 ` Richard Biener
0 siblings, 1 reply; 3+ messages in thread
From: Richard Sandiford @ 2020-05-18 15:33 UTC (permalink / raw)
To: Richard Biener; +Cc: gcc-patches
Richard Biener <rguenther@suse.de> writes:
> Hi,
>
> I'm trying to enforce SLP_TREE_VECTYPE being set (correctly) on
> external and invariant SLP nodes to avoid (re-)computing that
> in other places.
Nice.
> The responsible entity for specifying the
> desired vector type is vectorizable_* - vectorization analysis
> of the user (when we start to share those nodes between multiple
> users a user can then also unshare such a node if an existing
> vector type conflicts with its own requirements).
>
> The previous patch
> https://gcc.gnu.org/pipermail/gcc-patches/2020-May/545938.html
> set up things so that vect_slp_analyze_node_operations would
> use such type information during costing and also pick up
> unshared nodes.
>
> Now as you can see below where I started to enforce SLP_TREE_VECTYPE
> for the special boolean case in vect_get_constant_vectors I seemingly
> only had to fixup two places, vectorizable_operation (saw bool & bool)
> and vectorizable_comparison doing testing on x86_64 and aarch64
> (cross cc1 on aarch64-sve.exp and vect.exp). There obviously will
> be more places and I was thinking of less ugly (and more correct?)
> ways than doing
>
> + FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (slp_node), i, child)
> + if (SLP_TREE_DEF_TYPE (child) != vect_internal_def)
> + SLP_TREE_VECTYPE (child) = vectype;
>
> in each vectorizable_*. In principle it is the vect_is_simple_use
> that would need adjustment to associate the correct SLP child with
> a vectype so I can't think of sth better than refactoring things
> to sth like
>
> vect_simple_uses (vinfo, stmt_info, slp_node, &dts, &vectypes)
>
> and process all operans at once transparently for stmt_info/slp_node
> (where the actual vect_is_simple_use check is redundant for SLP
> nodes anyway). The index into the array would then match up
> with the SLP child (if SLP) or the stmt operand (if not SLP - with
> SLP the order can be different on the stmt).
>
> Any better ideas or comments on the general direction? Richard,
> which vectorizable_* do you remember being the "worst" with respect
> to vector type determining for invariants?
I think most of the struggles I had were from different code ending
up with different results when recomputing the vectype. IIRC the
vectorizable_* routines usually got it right and it was the other
code that got it wrong. So yeah, can imagine that we don't need
many changes if the vectoriable_* routines get the final say.
Not related to invariants specifically, but the routines I remember
being most awkward for vector type choices were vectorizable_condition
(because of the embedded comparison) and vectorizable_conversion (because
there could be multiple stages, with the intermediate vector types not
being recorded).
OTOH, I think calculating the ideal vector types requires
propagating the vector type in both directions, rather than
just upwards from the roots. E.g. for:
double d1, d2, d3, d4;
float f1, f2, f3;
f1 = d1 == d2 & d3 == d4 ? f2 : f3;
when operating on full vectors, there needs to be a conversion
somewhere between "d-booleans" and "f-booleans". For this case
it probably should happen as early as possible, e.g.:
b1_lo = d1_lo == d2_lo
b1_hi = d1_hi == d2_hi
b2_lo = d3_lo == d4_lo
b2_hi = d3_hi == d4_hi
b1 = pack (b1_lo, b1_hi)
b2 = pack (b2_lo, b2_hi)
b3 = b1 & b2
f1 = b3 ? f2 : f3;
But if the roles are reversed:
float f1, f2, f3, f4;
double d1, d2, d3;
d1 = f1 == f2 & f3 == f4 ? d2 : d3;
it should happen after the "&":
b1 = f1 == f2
b2 = f3 == f4
b3 = b1 & b2
b3_lo = unpack_lo (b3)
b3_hi = unpack_hi (b3)
d1_lo = b3_lo ? d2_lo : d3_lo;
d1_hi = b3_hi ? d2_hi : d3_hi;
At the moment we rely on pattern matching for that. But it basically
means that the pattern matcher has to "predict" what vector types the
vectorizable_* routines would calculate, so that it knows whether to
create pattern statements that enforce a different choice.
IMO it'd be really good to get rid of that and unify the vector
type calculation. But that probably means not doing it during
vectorizable_*, or at least changing the way that the vectorizable_*
functions are called during analysis, so that information can flow in
both directions.
Thanks,
Richard
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH][RFC] enfoce SLP_TREE_VECTYPE on invariants
2020-05-18 15:33 ` Richard Sandiford
@ 2020-05-19 7:32 ` Richard Biener
0 siblings, 0 replies; 3+ messages in thread
From: Richard Biener @ 2020-05-19 7:32 UTC (permalink / raw)
To: Richard Sandiford; +Cc: gcc-patches
On Mon, 18 May 2020, Richard Sandiford wrote:
> Richard Biener <rguenther@suse.de> writes:
> > Hi,
> >
> > I'm trying to enforce SLP_TREE_VECTYPE being set (correctly) on
> > external and invariant SLP nodes to avoid (re-)computing that
> > in other places.
>
> Nice.
>
> > The responsible entity for specifying the
> > desired vector type is vectorizable_* - vectorization analysis
> > of the user (when we start to share those nodes between multiple
> > users a user can then also unshare such a node if an existing
> > vector type conflicts with its own requirements).
> >
> > The previous patch
> > https://gcc.gnu.org/pipermail/gcc-patches/2020-May/545938.html
> > set up things so that vect_slp_analyze_node_operations would
> > use such type information during costing and also pick up
> > unshared nodes.
> >
> > Now as you can see below where I started to enforce SLP_TREE_VECTYPE
> > for the special boolean case in vect_get_constant_vectors I seemingly
> > only had to fixup two places, vectorizable_operation (saw bool & bool)
> > and vectorizable_comparison doing testing on x86_64 and aarch64
> > (cross cc1 on aarch64-sve.exp and vect.exp). There obviously will
> > be more places and I was thinking of less ugly (and more correct?)
> > ways than doing
> >
> > + FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (slp_node), i, child)
> > + if (SLP_TREE_DEF_TYPE (child) != vect_internal_def)
> > + SLP_TREE_VECTYPE (child) = vectype;
> >
> > in each vectorizable_*. In principle it is the vect_is_simple_use
> > that would need adjustment to associate the correct SLP child with
> > a vectype so I can't think of sth better than refactoring things
> > to sth like
> >
> > vect_simple_uses (vinfo, stmt_info, slp_node, &dts, &vectypes)
> >
> > and process all operans at once transparently for stmt_info/slp_node
> > (where the actual vect_is_simple_use check is redundant for SLP
> > nodes anyway). The index into the array would then match up
> > with the SLP child (if SLP) or the stmt operand (if not SLP - with
> > SLP the order can be different on the stmt).
> >
> > Any better ideas or comments on the general direction? Richard,
> > which vectorizable_* do you remember being the "worst" with respect
> > to vector type determining for invariants?
>
> I think most of the struggles I had were from different code ending
> up with different results when recomputing the vectype. IIRC the
> vectorizable_* routines usually got it right and it was the other
> code that got it wrong. So yeah, can imagine that we don't need
> many changes if the vectoriable_* routines get the final say.
>
> Not related to invariants specifically, but the routines I remember
> being most awkward for vector type choices were vectorizable_condition
> (because of the embedded comparison) and vectorizable_conversion (because
> there could be multiple stages, with the intermediate vector types not
> being recorded).
>
> OTOH, I think calculating the ideal vector types requires
> propagating the vector type in both directions, rather than
> just upwards from the roots. E.g. for:
>
> double d1, d2, d3, d4;
> float f1, f2, f3;
>
> f1 = d1 == d2 & d3 == d4 ? f2 : f3;
>
> when operating on full vectors, there needs to be a conversion
> somewhere between "d-booleans" and "f-booleans". For this case
> it probably should happen as early as possible, e.g.:
>
> b1_lo = d1_lo == d2_lo
> b1_hi = d1_hi == d2_hi
>
> b2_lo = d3_lo == d4_lo
> b2_hi = d3_hi == d4_hi
>
> b1 = pack (b1_lo, b1_hi)
>
> b2 = pack (b2_lo, b2_hi)
>
> b3 = b1 & b2
>
> f1 = b3 ? f2 : f3;
>
> But if the roles are reversed:
>
> float f1, f2, f3, f4;
> double d1, d2, d3;
>
> d1 = f1 == f2 & f3 == f4 ? d2 : d3;
>
> it should happen after the "&":
>
> b1 = f1 == f2
>
> b2 = f3 == f4
>
> b3 = b1 & b2
>
> b3_lo = unpack_lo (b3)
> b3_hi = unpack_hi (b3)
>
> d1_lo = b3_lo ? d2_lo : d3_lo;
> d1_hi = b3_hi ? d2_hi : d3_hi;
>
> At the moment we rely on pattern matching for that. But it basically
> means that the pattern matcher has to "predict" what vector types the
> vectorizable_* routines would calculate, so that it knows whether to
> create pattern statements that enforce a different choice.
Yeah, there's cases where it's difficult...
> IMO it'd be really good to get rid of that and unify the vector
> type calculation. But that probably means not doing it during
> vectorizable_*, or at least changing the way that the vectorizable_*
> functions are called during analysis, so that information can flow in
> both directions.
Note currently we're committing to vector types even earlier than
vectorizable_* (SLP analysis, dataref analysis and
mark_stmts_to_be_vectorized). We also can't delay endlessly,
at the moment vectorizable_* rely on a fixed vectorization factor
and since those functions eventually check optabs for support they
also really need to have fixed vector types for an operations
input and output vectors.
Now - we could move towards vectorizable_* only computing
constraints for vector input/output types (say, a plus needs
compatible intput/output and the machine can do v2si and v4si)
and then some "magic" afterwards computes the optimal set of
types. That would then require to dissect costing from
vectorizable_* until after the types are fixed. And I fear
it wouldn't handle the case you cite above where we also
change the overall pattern - and thus may affect the
compatibility matrix of operations.
But yes, I'm currently facing the issue that SLP discovery
commits to a vectorization factor too early. I'm going to
need to change the max_nunits thing to a min_vf which then
ends me at an iteration over possible VFs again (compared
to iteration over vector modes). Not completely sure how
things will play out though.
Thanks,
Richard.
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2020-05-19 7:32 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-18 14:27 [PATCH][RFC] enfoce SLP_TREE_VECTYPE on invariants Richard Biener
2020-05-18 15:33 ` Richard Sandiford
2020-05-19 7:32 ` Richard Biener
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).