RE: [PATCH v2 3/16]middle-end Add basic SLP pattern matching scaffolding.

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: Richard Biener <rguenther@suse.de>
To: Tamar Christina <Tamar.Christina@arm.com>
Cc: Richard Sandiford <Richard.Sandiford@arm.com>, nd <nd@arm.com>,
	 "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
Subject: RE: [PATCH v2 3/16]middle-end Add basic SLP pattern matching scaffolding.
Date: Wed, 4 Nov 2020 13:40:59 +0100 (CET)	[thread overview]
Message-ID: <nycvar.YFH.7.76.2011041209570.10073@p653.nepu.fhfr.qr> (raw)
In-Reply-To: <VI1PR08MB53250748BBC032014BB94CF0FF110@VI1PR08MB5325.eurprd08.prod.outlook.com>

On Tue, 3 Nov 2020, Tamar Christina wrote:

> Hi Richi,
> 
> This is a respin which includes the changes you requested.

Comments randomly ordered, I'm pasting in pieces of the patch -
sending it inline would help to get pieces properly quoted and in-order.

diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 
4bd454cfb185d7036843fc7140b073f525b2ec6a..b813508d3ceaf4c54f612bc10f9aa42ffe0ce0dd 
100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
...

I miss comments in this file, see tree-vectorizer.h where we try
to document purpose of classes and fields.

Things that sticks out to me:

+    uint8_t m_arity;
+    uint8_t m_num_args;

why uint8_t and not simply unsigned int?  Not knowing what
arity / num_args should be here ;)

+    vec_info *m_vinfo;
...
+    vect_pattern (slp_tree *node, vec_info *vinfo)

so this looks like something I freed stmt_vec_info of - back-pointers
in the "wrong" direction of the logical hierarchy.  I suppose it's
just to avoid passing down vinfo where we need it?  Please do that
instead - pass down vinfo as everything else does.

The class seems to expose both very high-level (build () it!)
and very low level details (get_ifn).  The high-level one suggests
that a pattern _not_ being represented by an ifn is possible
but there's too much implementation detail already in the
vect_pattern class to make that impossible.  I guess the IFN
details could be pushed down to the simple matching class
(and that be called vect_ifn_pattern or so).

+static bool
+vect_match_slp_patterns (slp_tree *ref_node, vec_info *vinfo)
+{
+  DUMP_VECT_SCOPE ("vect_match_slp_patterns");
+  bool found_p = false;
+
+  if (dump_enabled_p ())
+    {
+      dump_printf_loc (MSG_NOTE, vect_location, "-- before patt match 
--\n");
+      vect_print_slp_graph (MSG_NOTE, vect_location, *ref_node);
+      dump_printf_loc (MSG_NOTE, vect_location, "-- end patt --\n");
+    }

we dumped all instances after their analysis.  Maybe just
refer to the instance with its address (dump_print %p) so
lookup in the (already large) dump file is easy.

+  hash_set<slp_tree> *visited = new hash_set<slp_tree> ();
+  for (unsigned x = 0; x < num__slp_patterns; x++)
+    {
+      visited->empty ();
+      found_p |= vect_match_slp_patterns_2 (ref_node, vinfo, 
slp_patterns[x],
+                                           visited);
+    }
+
+  delete visited;

no need to new / delete, just do

  has_set<slp_tree> visited;

like everyone else.  Btw, do you really want to scan
pieces of the SLP graph (with instances being graph entries)
multiple times?  If not then you should move the visited
set to the caller instead.

+  /* TODO: Remove in final version, only here for generating debug dot 
graphs
+          from SLP tree.  */
+
+  if (dump_enabled_p ())
+    {
+      dump_printf_loc (MSG_NOTE, vect_location, "-- start dot --\n");
+      vect_print_slp_graph (MSG_NOTE, vect_location, *ref_node);
+      dump_printf_loc (MSG_NOTE, vect_location, "-- end dot --\n");
+    }

now, if there was some pattern matched it is probably useful
to dump the graph (entry) again.  But only conditional on that
I think.  So can you instead make the dump conditional on
found_p and remove the start dot/end dot markers as said in the comment?

+         if (dump_enabled_p ())
+           dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+                            "transformation for %s not valid due to post 
"
+                            "condition\n",

not really a MSG_MISSED_OPTIMIZATION, use MSG_NOTE.
MSG_MISSED_OPTIMIZATION should be used for things (likely) making
vectorization fail.

+  /* Perform recursive matching, it's important to do this after matching 
things

before matching things?

+     in the current node as the matches here may re-order the nodes below 
it.
+     As such the pattern that needs to be subsequently match may change.  

and this is no longer true?

*/
+
+  if (SLP_TREE_CHILDREN (node).exists ()) {

elide this check, the loop will simply not run if empty

+    slp_tree child;
+    FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child)

I think you want to perform the recursion in the caller so you
do it only once and not once for each pattern kind now that you
do post-order processing rather than pre-order.

+  vect_pattern *pattern = patt_fn (ref_node, vinfo);
+
+  if (pattern->matches ())

this suggests you get number of SLP nodes times number of pattern
matchers allocations & inits of vect_pattern.  If you move

+      if (pattern->is_optab_supported_p (vectype, OPTIMIZE_FOR_SPEED))
+       {

into ->matches () then whether this is a IFN or multi-node pattern
becomes an implementation detail which would be IMHO better.

+  FOR_EACH_VEC_ELT (*this->m_nodes, ix, node)
+    {
+      /* Calculate the location of the statement in NODE to replace.  */
+      stmt_info = SLP_TREE_REPRESENTATIVE (node);
+      gimple* old_stmt = STMT_VINFO_STMT (stmt_info);
+      tree type = gimple_expr_type (old_stmt);
+
+      /* Create the argument set for use by 
gimple_build_call_internal_vec.  */
+      for (int i = 0; i < this->m_num_args; i++)

the scalar argument type is the type of the definition so instead
of gimple_expr_type you want simply TREE_TYPE (gimple_get_lhs (old_stmt))

But then you seem to mix the result and the argument types?  I
think you need to walk over SLP_TREE_CHILDREN and look at their
representatives scalar def.  Which would then assume the pattern
consumes all direct children.  Somehow this "generic" 
vect_simple_pattern_match::build () has quite some restrictions.
I guess they are documented in the files toplevel comment
which also has general parts applying to all possible pattern
matchers (not deriving from vect_simple_pattern_match).  Can you
split up the comment and more explicitely mark what applies to
patterns in general (postorder traversal) and what applies
to vect_simple_pattern_match only?

+      /* We have to explicitly mark the old statement as unused because 
during
+        statement analysis the original and new pattern statement may 
require

comment needs updating.

+        different level of unrolling.  As an example add/sub when 
vectorized
+        without a pattern requires 4 copies, whereas with a COMPLEX_ADD 
pattern
+        this only requires 2 copies and the two statement will be treated 
as
+        hand unrolled.  That means that the analysis won't happen as 
it'll find
+        a mismatch.  So we don't analyze the old statement and if we end 
up
+        needing it, e.g. SLP fails then we have to quickly re-analyze it.  
*/
+      STMT_VINFO_RELEVANT (call_stmt_info) = vect_used_in_scope;

I guess relevance should be copied from the original stmt (like if
it was used_by_reduction).  Can't that be done in
vect_mark_pattern_stmts?  Likewise the ->add_pattern_stmt part
could be done there, too.  So you'd be left with

+      STMT_SLP_TYPE (call_stmt_info) = pure_slp;

and

+      STMT_VINFO_SLP_VECT_ONLY (call_stmt_info) = true;

where the pure_slp setting should in theory be not needed (it
should be automagically detected so) but setting it is not wrong
(it would be an optimization).

--

I'm now looking down the series to see how this is all used,
so quoting different pieces from other patches.

+/* Checks to see of the expression EXPR is a gimple assign with code CODE
+   and if this is the case the two operands of EXPR is returned in OP1 
and
+   OP2.
+
+   If the matching and extraction is successful TRUE is returned 
otherwise
+   FALSE in which case the value of OP1 and OP2 will not have been 
touched.
+*/
+
+static bool
+vect_match_expression_p (slp_tree node, tree_code code)
+{
+  if (!SLP_TREE_REPRESENTATIVE (node))

I see that many overall function comments do not actually match
anymore, if at least for the parameters refered to.  Please go
over the patch and update them.

+  gimple* expr = STMT_VINFO_STMT (SLP_TREE_REPRESENTATIVE (node));
+  if (!is_gimple_assign (expr)
+      || gimple_expr_code (expr) != code)

gimple_assign_rhs_code (expr)

+/* Check if the given lane permute in PERMUTES matches an alternating 
sequence
+   of {P0 P1 P0 P1 ...}.  This to account for unrolled loops.   */
+static bool
+vect_check_lane_permute (lane_permutation_t &permutes,
+                        unsigned p0, unsigned p1)
+{
+  unsigned val[2] = {p0, p1};
+  for (unsigned i = 0; i < permutes.length (); i++)
+    if (permutes[i].first != val[i % 2])
+      return false;

so this doesn't put any constraint on permutes[].second.  So it
matches an arbitrary interleaving of two vectors.

+
+   will return PLUS_MINUS along with OPS containing {_37, _12, _38, _36}.
+*/
+
+static complex_operation_t
+vect_detect_pair_op (slp_tree node1, slp_tree node2, lane_permutation_t 
&lanes,
+                    bool two_operands = true, vec<slp_tree> *ops = NULL)

 { {_37, _38}, {_12, _36} }

there's a lot of vect_match_expression_p calls and I wonder if
inlining them and CSEing the 'code' compute of node1 and node2
would simplify things and elide the macro.  Also we then know
this is all CSEd in the end ...

I wonder what LANES specifies, TWO_OPERANDS specifies whether
two-operand variants are allowed unless .. (guess LANES causes
only same operands to match up in the end).

+static void
+vect_mark_stmts_as_in_pattern (hash_set<slp_tree> *cache, slp_tree node)
+{

need to find a use for this still, but don't you set pure_slp
"above"?

+bool
+complex_pattern::validate_p ()
+{
+      unsigned i;
+      FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, tmp)
+       {
+         slp_tree vnode = NULL;
+         if (vect_slp_make_linear (node, tmp, &vnode))
+           nodes.quick_push (vnode);
+         else
+           {
+             nodes.release();
+             return false;
+           }

hum.  vect_slp_make_linear better not fail?  At least we've
created half-way stuff when it does.  Can't spot the implementation
right now (guess splitting the patch up doesn't really help much ...)

Noting elsewhere:

+static void
+vect_dissolve_slp_only_patterns (loop_vec_info loop_vinfo,
+                                hash_set<slp_tree> *visited, slp_tree 
root)
+{
+  if (!root || visited->contains (root))
+    return;
+
+  unsigned int i;
+  slp_tree node;
+  stmt_vec_info related_stmt_info;
+  stmt_vec_info stmt_info = SLP_TREE_REPRESENTATIVE (root);
+
+  visited->add (root);

visited->add () returns whether the key was already there so
please combine the contains and the add calls here and elsewhere.

    if (stmt_info && STMT_VINFO_SLP_VECT_ONLY (stmt_info)
+        && (related_stmt_info = STMT_VINFO_RELATED_STMT (stmt_info)) != 
NULL)
+      {
+       if (dump_enabled_p ())

I think STMT_VINFO_SLP_VECT_ONLY is a thing already, so I wonder
if the above is a sufficient test.  There's is_pattern_stmt_p ()
(which obviously also applies to non-SLP only patterns).

Note that I also see another issue, namely with hybrid SLP the
original non-pattern stmt would need to stay relevant.  That
probably asks for hybrid SLP discovery and relevance marking
to be combined somehow.  You can probably create a simple
testcase by storing a lane of a complex op via a non-grouped
store.

+       STMT_VINFO_RELEVANT (stmt_info) = vect_unused_in_scope;
+       STMT_VINFO_RELEVANT (related_stmt_info) = vect_used_in_scope;

as said previously the relevance should be copied, in this case
back to the original stmt info from the pattern stmt info.

+       STMT_VINFO_IN_PATTERN_P (related_stmt_info) = false;
+       STMT_SLP_TYPE (related_stmt_info) = loop_vect;

one option might be to delay pattern recognition (for the loop case)
to after vect_detect_hybrid_slp and simply not make any hybrid
stmt containing nodes part of a SLP pattern.  It would be a workaround
of course, not the best solution.  Another option is to
add a new field to stmt_info to put a "saved" relevance to and
simply go over all stmts restoring relevance - we already restore
SLP_TYPE to loop_vect that way at

  /* Reset SLP type to loop_vect on all stmts.  */
  for (i = 0; i < LOOP_VINFO_LOOP (loop_vinfo)->num_nodes; ++i)
    {

so simply restore the saved relevance would work there (guess I
like that more than what vect_dissolve_slp_only_patterns does,
in any case do what vect_dissolve_slp_only_patterns does in the
above loop please).

Thanks,
Richard.

> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
> 	* Makefile.in (tree-vect-slp-patterns.o): New.
> 	* doc/passes.texi: Update documentation.
> 	* tree-vect-slp.c (vect_print_slp_tree): Add new state.
> 	(vect_match_slp_patterns_2, vect_match_slp_patterns): New.
> 	(vect_analyze_slp): Call pattern matcher.
> 	* tree-vectorizer.h (enum _complex_operation):
> 	(class vect_pattern_match, class vect_pattern): New.
> 	* tree-vect-slp-patterns.c: New file.
> 
> > -----Original Message-----
> > From: rguenther@c653.arch.suse.de <rguenther@c653.arch.suse.de> On
> > Behalf Of Richard Biener
> > Sent: Tuesday, September 29, 2020 10:42 AM
> > To: Richard Sandiford <Richard.Sandiford@arm.com>
> > Cc: Tamar Christina <Tamar.Christina@arm.com>; nd <nd@arm.com>; gcc-
> > patches@gcc.gnu.org
> > Subject: Re: [PATCH v2 3/16]middle-end Add basic SLP pattern matching
> > scaffolding.
> > 
> > On Tue, 29 Sep 2020, Richard Sandiford wrote:
> > 
> > > Richard Biener <rguenther@suse.de> writes:
> > > >> > > @@ -2192,6 +2378,17 @@ vect_analyze_slp_instance (vec_info
> > *vinfo,
> > > >> > >                               &tree_size, bst_map);
> > > >> > >    if (node != NULL)
> > > >> > >      {
> > > >> > > +      /* Temporarily allow add_stmt calls again.  */
> > > >> > > +      vinfo->stmt_vec_info_ro = false;
> > > >> > > +
> > > >> > > +      /* See if any patterns can be found in the constructed SLP tree
> > > >> > > +        before we do any analysis on it.  */
> > > >> > > +      vect_match_slp_patterns (node, vinfo, group_size,
> > &max_nunits,
> > > >> > > +                              matches, &npermutes, &tree_size,
> > > >> > > + bst_map);
> > > >> > > +
> > > >> > > +      /* After this no more add_stmt calls are allowed.  */
> > > >> > > +      vinfo->stmt_vec_info_ro = true;
> > > >> > > +
> > > >> > >
> > > >> > > I think this is a bit early to match patterns - I'd defer it to
> > > >> > > the point where all entries into the same SLP subgraph are
> > > >> > > analyzed, thus somewhere at the end of vect_analyze_slp loop
> > > >> > > over all instances and match patterns?  That way phases are more
> > clearly separated.
> > > >> >
> > > >> > That would probably work, my only worry is that the SLP analysis
> > > >> > itself may fail and bail out at
> > > >> >
> > > >> > 	  /* If the loads and stores can be handled with load/store-lane
> > > >> > 	     instructions do not generate this SLP instance.  */
> > > >> > 	  if (is_a <loop_vec_info> (vinfo)
> > > >> > 	      && loads_permuted
> > > >> > 	      && dr && vect_store_lanes_supported (vectype, group_size,
> > > >> > false))
> > > >
> > > > Ah, that piece of code.  Yeah, I'm repeatedly running into it as
> > > > well - it's a bad hack that stands in the way all the time :/
> > >
> > > At one point I was wondering about trying to drop the above, vectorise
> > > with and without SLP, and then compare their costs, like for
> > VECT_COMPARE_COSTS.
> > > But that seemed like a dead end with the move to doing everything on
> > > the SLP representation.
> > 
> > Yeah ... though even moving everything to the SLP representation will retain
> > the issue since there it will be N group-size 1 SLP instances vs. 1 group-size N
> > SLP instance.
> > 
> > > > I guess we should try moving this upward like to
> > > > vect_analyze_loop_2 right before
> > > >
> > > >   /* Check the SLP opportunities in the loop, analyze and build SLP trees.
> > > > */
> > > >   ok = vect_analyze_slp (loop_vinfo, *n_stmts);
> > > >   if (!ok)
> > > >     return ok;
> > > >
> > > > and there check whether all grouped loads and stores can be handled
> > > > with store-/load-lanes (and there are no SLP reduction chains?) in
> > > > which case do not try to attempt SLP at all.  Because the testcases
> > > > this check was supposed to change were all-load/store-lane or all
> > > > SLP so the mixed case is probably not worth special casing.
> > > >
> > > > Since load-/store-lanes is an arm speciality I tried to only touch
> > > > this fragile part with a ten-foot pole ;)  CCing Richard, if he acks
> > > > the above I can produce a patch.
> > >
> > > Yeah, sounds good to me.  Probably also sorth checking whether the
> > > likely_max iteration count is high enough to support group_size
> > > vectors, if we have enough information to guess that.
> > >
> > > We could also get the gen* machinery to emit a macro that is true if
> > > at least one load/store-lane pattern is defined, so that we can skip
> > > the code for non-Arm targets.  I can do that as a follow-up.
> > 
> > I've had a second look and one complication is that we only elide the SLP
> > node if any of the loads are permuted.  So if all loads/stores are unpermuted
> > but load/store-lanes would work we'd keep the SLP node.
> > 
> > Of course without actually building the SLP node we don't know whether the
> > loads will be permuted or not ...
> > 
> > But surely the current place for the check will cause some testcases to
> > become hybrid vectorizations which is likely undesirable.
> > 
> > So we could move the check after all SLP discovery is completed and throw it
> > all away if we can and should use load/store-lanes?
> > But that might then not solve Tamars issue.
> > 
> > Richard.
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imend

next prev parent reply	other threads:[~2020-11-04 12:41 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-09-25 14:27 Tamar Christina
2020-09-28 12:36 ` Richard Biener
2020-09-28 14:56   ` Tamar Christina
2020-09-28 17:24     ` Tamar Christina
2020-09-29  6:45       ` Richard Biener
2020-09-29  9:25         ` Richard Sandiford
2020-09-29  9:42           ` Richard Biener
2020-11-03 15:06             ` Tamar Christina
2020-11-04 12:40               ` Richard Biener [this message]
2020-11-04 13:37                 ` Tamar Christina
2020-11-04 14:04                   ` Richard Biener
2020-11-04 14:36                     ` Tamar Christina
2020-11-04 15:14                       ` Richard Biener

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=nycvar.YFH.7.76.2011041209570.10073@p653.nepu.fhfr.qr \
    --to=rguenther@suse.de \
    --cc=Richard.Sandiford@arm.com \
    --cc=Tamar.Christina@arm.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=nd@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).