From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <rguenther@suse.de>
Received: from mx2.suse.de (mx2.suse.de [195.135.220.15])
 by sourceware.org (Postfix) with ESMTPS id C54CC385700D
 for <gcc-patches@gcc.gnu.org>; Wed,  4 Nov 2020 15:14:55 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org C54CC385700D
Authentication-Results: sourceware.org;
 dmarc=none (p=none dis=none) header.from=suse.de
Authentication-Results: sourceware.org;
 spf=pass smtp.mailfrom=rguenther@suse.de
X-Virus-Scanned: by amavisd-new at test-mx.suse.de
Received: from relay2.suse.de (unknown [195.135.221.27])
 by mx2.suse.de (Postfix) with ESMTP id BC4C3AB5C;
 Wed,  4 Nov 2020 15:14:54 +0000 (UTC)
Date: Wed, 4 Nov 2020 16:14:54 +0100 (CET)
From: Richard Biener <rguenther@suse.de>
Sender: rguenther@c653.arch.suse.de
To: Tamar Christina <Tamar.Christina@arm.com>
cc: Richard Sandiford <Richard.Sandiford@arm.com>, nd <nd@arm.com>, 
 "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
Subject: RE: [PATCH v2 3/16]middle-end Add basic SLP pattern matching
 scaffolding.
In-Reply-To: <VI1PR08MB532558C058BF2A8C31ED2FA8FFEF0@VI1PR08MB5325.eurprd08.prod.outlook.com>
Message-ID: <nycvar.YFH.7.76.2011041613040.10073@p653.nepu.fhfr.qr>
References: <20200925142753.GA13692@arm.com>
 <nycvar.YFH.7.76.2009281405360.10073@p653.nepu.fhfr.qr>
 <VI1PR08MB532554C7BAF0AD8C02B793EDFF350@VI1PR08MB5325.eurprd08.prod.outlook.com>
 <VI1PR08MB5325F124EA9661D12C00AA57FF350@VI1PR08MB5325.eurprd08.prod.outlook.com>
 <nycvar.YFH.7.76.2009290834170.10073@p653.nepu.fhfr.qr>
 <mptsgb153zc.fsf@arm.com>
 <nycvar.YFH.7.76.2009291134300.10073@p653.nepu.fhfr.qr>
 <VI1PR08MB53250748BBC032014BB94CF0FF110@VI1PR08MB5325.eurprd08.prod.outlook.com>
 <nycvar.YFH.7.76.2011041209570.10073@p653.nepu.fhfr.qr>
 <VI1PR08MB5325F3496EF0CB5218BD87D6FFEF0@VI1PR08MB5325.eurprd08.prod.outlook.com>
 <nycvar.YFH.7.76.2011041449410.10073@p653.nepu.fhfr.qr>
 <VI1PR08MB532558C058BF2A8C31ED2FA8FFEF0@VI1PR08MB5325.eurprd08.prod.outlook.com>
User-Agent: Alpine 2.21 (LSU 202 2017-01-01)
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
X-Spam-Status: No, score=-6.7 required=5.0 tests=BAYES_00, GIT_PATCH_0,
 INDUSTRIAL_BODY, INDUSTRIAL_SUBJECT, KAM_DMARC_STATUS, RCVD_IN_MSPIKE_H3,
 RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_PASS,
 TXREP autolearn=ham autolearn_force=no version=3.4.2
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Wed, 04 Nov 2020 15:14:59 -0000

On Wed, 4 Nov 2020, Tamar Christina wrote:

> > -----Original Message-----
> > From: rguenther@c653.arch.suse.de <rguenther@c653.arch.suse.de> On
> > Behalf Of Richard Biener
> > Sent: Wednesday, November 4, 2020 2:04 PM
> > To: Tamar Christina <Tamar.Christina@arm.com>
> > Cc: Richard Sandiford <Richard.Sandiford@arm.com>; nd <nd@arm.com>;
> > gcc-patches@gcc.gnu.org
> > Subject: RE: [PATCH v2 3/16]middle-end Add basic SLP pattern matching
> > scaffolding.
> > 
> > On Wed, 4 Nov 2020, Tamar Christina wrote:
> > 
> > > > -----Original Message-----
> > > > From: rguenther@c653.arch.suse.de <rguenther@c653.arch.suse.de> On
> > > > Behalf Of Richard Biener
> > > > Sent: Wednesday, November 4, 2020 12:41 PM
> > > > To: Tamar Christina <Tamar.Christina@arm.com>
> > > > Cc: Richard Sandiford <Richard.Sandiford@arm.com>; nd <nd@arm.com>;
> > > > gcc-patches@gcc.gnu.org
> > > > Subject: RE: [PATCH v2 3/16]middle-end Add basic SLP pattern
> > > > matching scaffolding.
> > > >
> > > > On Tue, 3 Nov 2020, Tamar Christina wrote:
> > > >
> > > > > Hi Richi,
> > > > >
> > > > > This is a respin which includes the changes you requested.
> > > >
> > > > Comments randomly ordered, I'm pasting in pieces of the patch -
> > > > sending it inline would help to get pieces properly quoted and in-order.
> > > >
> > > > diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h index
> > > >
> > 4bd454cfb185d7036843fc7140b073f525b2ec6a..b813508d3ceaf4c54f612bc10f
> > > > 9
> > > > aa42ffe0ce0dd
> > > > 100644
> > > > --- a/gcc/tree-vectorizer.h
> > > > +++ b/gcc/tree-vectorizer.h
> > > > ...
> > > >
> > > > I miss comments in this file, see tree-vectorizer.h where we try to
> > > > document purpose of classes and fields.
> > > >
> > > > Things that sticks out to me:
> > > >
> > > > +    uint8_t m_arity;
> > > > +    uint8_t m_num_args;
> > > >
> > > > why uint8_t and not simply unsigned int?  Not knowing what arity /
> > > > num_args should be here ;)
> > >
> > > I think I can remove arity, but num_args is how many operands the
> > > created internal function call should take.  Since we can't vectorize
> > > calls with more than
> > > 4 arguments at the moment it seemed like 255 would be a safe limit :).
> > >
> > > >
> > > > +    vec_info *m_vinfo;
> > > > ...
> > > > +    vect_pattern (slp_tree *node, vec_info *vinfo)
> > > >
> > > > so this looks like something I freed stmt_vec_info of -
> > > > back-pointers in the "wrong" direction of the logical hierarchy.  I
> > > > suppose it's just to avoid passing down vinfo where we need it?
> > > > Please do that instead - pass down vinfo as everything else does.
> > > >
> > > > The class seems to expose both very high-level (build () it!) and
> > > > very low level details (get_ifn).  The high-level one suggests that
> > > > a pattern _not_ being represented by an ifn is possible but there's
> > > > too much implementation detail already in the vect_pattern class to
> > > > make that impossible.  I guess the IFN details could be pushed down
> > > > to the simple matching class (and that be called vect_ifn_pattern or so).
> > > >
> > > > +static bool
> > > > +vect_match_slp_patterns (slp_tree *ref_node, vec_info *vinfo) {
> > > > +  DUMP_VECT_SCOPE ("vect_match_slp_patterns");
> > > > +  bool found_p = false;
> > > > +
> > > > +  if (dump_enabled_p ())
> > > > +    {
> > > > +      dump_printf_loc (MSG_NOTE, vect_location, "-- before patt
> > > > + match
> > > > --\n");
> > > > +      vect_print_slp_graph (MSG_NOTE, vect_location, *ref_node);
> > > > +      dump_printf_loc (MSG_NOTE, vect_location, "-- end patt --\n");
> > > > +    }
> > > >
> > > > we dumped all instances after their analysis.  Maybe just refer to
> > > > the instance with its address (dump_print %p) so lookup in the
> > > > (already large) dump file is easy.
> > > >
> > > > +  hash_set<slp_tree> *visited = new hash_set<slp_tree> ();  for
> > > > + (unsigned x = 0; x < num__slp_patterns; x++)
> > > > +    {
> > > > +      visited->empty ();
> > > > +      found_p |= vect_match_slp_patterns_2 (ref_node, vinfo,
> > > > slp_patterns[x],
> > > > +                                           visited);
> > > > +    }
> > > > +
> > > > +  delete visited;
> > > >
> > > > no need to new / delete, just do
> > > >
> > > >   has_set<slp_tree> visited;
> > > >
> > > > like everyone else.  Btw, do you really want to scan pieces of the
> > > > SLP graph (with instances being graph entries) multiple times?  If
> > > > not then you should move the visited set to the caller instead.
> > > >
> > > > +  /* TODO: Remove in final version, only here for generating debug
> > > > + dot
> > > > graphs
> > > > +          from SLP tree.  */
> > > > +
> > > > +  if (dump_enabled_p ())
> > > > +    {
> > > > +      dump_printf_loc (MSG_NOTE, vect_location, "-- start dot --\n");
> > > > +      vect_print_slp_graph (MSG_NOTE, vect_location, *ref_node);
> > > > +      dump_printf_loc (MSG_NOTE, vect_location, "-- end dot --\n");
> > > > +    }
> > > >
> > > > now, if there was some pattern matched it is probably useful to dump
> > > > the graph (entry) again.  But only conditional on that I think.  So
> > > > can you instead make the dump conditional on found_p and remove the
> > > > start dot/end dot markers as said in the comment?
> > > >
> > > > +         if (dump_enabled_p ())
> > > > +           dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > > +                            "transformation for %s not valid due to
> > > > + post
> > > > "
> > > > +                            "condition\n",
> > > >
> > > > not really a MSG_MISSED_OPTIMIZATION, use MSG_NOTE.
> > > > MSG_MISSED_OPTIMIZATION should be used for things (likely) making
> > > > vectorization fail.
> > > >
> > > > +  /* Perform recursive matching, it's important to do this after
> > > > + matching
> > > > things
> > > >
> > > > before matching things?
> > > >
> > > > +     in the current node as the matches here may re-order the nodes
> > > > + below
> > > > it.
> > > > +     As such the pattern that needs to be subsequently match may
> > change.
> > > >
> > > > and this is no longer true?
> > > >
> > > > */
> > > > +
> > > > +  if (SLP_TREE_CHILDREN (node).exists ()) {
> > > >
> > > > elide this check, the loop will simply not run if empty
> > > >
> > > > +    slp_tree child;
> > > > +    FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child)
> > > >
> > > > I think you want to perform the recursion in the caller so you do it
> > > > only once and not once for each pattern kind now that you do
> > > > post-order processing rather than pre-order.
> > >
> > > That wouldn't work as that would make multiply match before FMA/FMS.
> > 
> > Hmm, I see.  Note that to avoid visiting multiple entries into the graph
> > multiple times you then need another visited set or compute
> > pre- or post-order of the entire graph into an array separately.
> > 
> > > I had tried having FM(A|S) match MUL instead, but MUL does a lot of
> > > work to create permute nodes, which would have to be undone.
> > 
> > Nevertheless I think that's what regular pattern matching does.  Is matching
> > FMA from "scalar" really that much simpler than matching first MUL and then
> > FMA with a MUL component?
> 
> No, the matching itself no. (in fact I already have a branch where I do this).
> The FMA case however does end up needing to drop half the load nodes
> (which I need to pick at random really).
> 
> Is it OK to do so? If so then I can use this variant. If I split FMA and FMS then
> I can do it if dropping the loads at this point doesn't cause any issues. 

Sure, dopping nodes is easy.

> Would you prefer I just elide the nodes or add a VEC_PERM that just selects
> from the first vector only?

Drop them.

> > 
> > > The FMS case is somewhat easy as that still has a two operators node
> > > in between, But FMA is just a simple add on top.  I have to then
> > > remove the merging nodes that MUL Created, and drop half the operands
> > to create the new node.
> > >
> > > Which is doable but effectively does the same pattern twice, whereas
> > > now the MUL would just skip all that work since FMA matched.
> > >
> > > The post order is essentially really problematic for overlapping
> > > patterns.  In pre-order I could for Instance detect all 3 patterns in one
> > traversal and share the matching information between all 3.
> > 
> > I do remember you doing this in some earlier variant but it looked equally
> > complicated.  What you've now done is do a post-order walk for each pattern
> > kind.
> > 
> > So I'd like to get the recursion up to fix the 'visited' thing.  We can either have
> > ->pre_match and ->post_match (ugh) or bite the bullet and do FM(A|S)
> > ontop of MUL (I believe for the earlier variant you said that would be
> > possible).
> > 
> > > Right now I can only do it between mul and add and FMA has to be
> > standalone.
> > >
> > > >
> > > > +  vect_pattern *pattern = patt_fn (ref_node, vinfo);
> > > > +
> > > > +  if (pattern->matches ())
> > > >
> > > > this suggests you get number of SLP nodes times number of pattern
> > > > matchers allocations & inits of vect_pattern.  If you move
> > > >
> > > > +      if (pattern->is_optab_supported_p (vectype,
> > OPTIMIZE_FOR_SPEED))
> > > > +       {
> > > >
> > > > into ->matches () then whether this is a IFN or multi-node pattern
> > > > becomes an implementation detail which would be IMHO better.
> > > >
> > > > +  FOR_EACH_VEC_ELT (*this->m_nodes, ix, node)
> > > > +    {
> > > > +      /* Calculate the location of the statement in NODE to replace.  */
> > > > +      stmt_info = SLP_TREE_REPRESENTATIVE (node);
> > > > +      gimple* old_stmt = STMT_VINFO_STMT (stmt_info);
> > > > +      tree type = gimple_expr_type (old_stmt);
> > > > +
> > > > +      /* Create the argument set for use by
> > > > gimple_build_call_internal_vec.  */
> > > > +      for (int i = 0; i < this->m_num_args; i++)
> > > >
> > > > the scalar argument type is the type of the definition so instead of
> > > > gimple_expr_type you want simply TREE_TYPE (gimple_get_lhs
> > > > (old_stmt))
> > > >
> > > > But then you seem to mix the result and the argument types?  I think
> > > > you need to walk over SLP_TREE_CHILDREN and look at their
> > > > representatives scalar def.  Which would then assume the pattern
> > > > consumes all direct children.  Somehow this "generic"
> > >
> > > Since I'm only consuming simple math ops would the argument and result
> > > types differ? Something like a widening sub I can see but not the normal
> > sub.
> > 
> > No.  Still gimple_expr_type is somewhat of an odd beast so better avoid it.
> > And document this restriction.
> > 
> > > > vect_simple_pattern_match::build () has quite some restrictions.
> > > > I guess they are documented in the files toplevel comment which also
> > > > has general parts applying to all possible pattern matchers (not
> > > > deriving from vect_simple_pattern_match).  Can you split up the
> > > > comment and more explicitely mark what applies to patterns in
> > > > general (postorder traversal) and what applies to
> > vect_simple_pattern_match only?
> > > >
> > > > +      /* We have to explicitly mark the old statement as unused
> > > > + because
> > > > during
> > > > +        statement analysis the original and new pattern statement
> > > > + may
> > > > require
> > > >
> > > > comment needs updating.
> > > >
> > > > +        different level of unrolling.  As an example add/sub when
> > > > vectorized
> > > > +        without a pattern requires 4 copies, whereas with a
> > > > + COMPLEX_ADD
> > > > pattern
> > > > +        this only requires 2 copies and the two statement will be
> > > > + treated
> > > > as
> > > > +        hand unrolled.  That means that the analysis won't happen
> > > > + as
> > > > it'll find
> > > > +        a mismatch.  So we don't analyze the old statement and if
> > > > + we end
> > > > up
> > > > +        needing it, e.g. SLP fails then we have to quickly re-analyze it.
> > > > */
> > > > +      STMT_VINFO_RELEVANT (call_stmt_info) = vect_used_in_scope;
> > > >
> > > > I guess relevance should be copied from the original stmt (like if
> > > > it was used_by_reduction).  Can't that be done in
> > > > vect_mark_pattern_stmts?  Likewise the ->add_pattern_stmt part could
> > > > be done there, too.  So you'd be left with
> > > >
> > > > +      STMT_SLP_TYPE (call_stmt_info) = pure_slp;
> > > >
> > > > and
> > > >
> > > > +      STMT_VINFO_SLP_VECT_ONLY (call_stmt_info) = true;
> > > >
> > > > where the pure_slp setting should in theory be not needed (it should
> > > > be automagically detected so) but setting it is not wrong (it would
> > > > be an optimization).
> > >
> > > I am using it as a marker for a node dissolve needs to look at when
> > > undoing SLP so it restores the relevancy, but given your comment below
> > > on that I may be able to get rid of it.
> > >
> > > >
> > > > --
> > > >
> > > > I'm now looking down the series to see how this is all used, so
> > > > quoting different pieces from other patches.
> > > >
> > > > +/* Checks to see of the expression EXPR is a gimple assign with code
> > CODE
> > > > +   and if this is the case the two operands of EXPR is returned in
> > > > +OP1
> > > > and
> > > > +   OP2.
> > > > +
> > > > +   If the matching and extraction is successful TRUE is returned
> > > > otherwise
> > > > +   FALSE in which case the value of OP1 and OP2 will not have been
> > > > touched.
> > > > +*/
> > > > +
> > > > +static bool
> > > > +vect_match_expression_p (slp_tree node, tree_code code) {
> > > > +  if (!SLP_TREE_REPRESENTATIVE (node))
> > > >
> > > > I see that many overall function comments do not actually match
> > > > anymore, if at least for the parameters refered to.  Please go over
> > > > the patch and update them.
> > > >
> > > > +  gimple* expr = STMT_VINFO_STMT (SLP_TREE_REPRESENTATIVE
> > (node));
> > > > + if (!is_gimple_assign (expr)
> > > > +      || gimple_expr_code (expr) != code)
> > > >
> > > > gimple_assign_rhs_code (expr)
> > > >
> > > > +/* Check if the given lane permute in PERMUTES matches an
> > > > +alternating
> > > > sequence
> > > > +   of {P0 P1 P0 P1 ...}.  This to account for unrolled loops.   */
> > > > +static bool
> > > > +vect_check_lane_permute (lane_permutation_t &permutes,
> > > > +                        unsigned p0, unsigned p1) {
> > > > +  unsigned val[2] = {p0, p1};
> > > > +  for (unsigned i = 0; i < permutes.length (); i++)
> > > > +    if (permutes[i].first != val[i % 2])
> > > > +      return false;
> > > >
> > > > so this doesn't put any constraint on permutes[].second.  So it
> > > > matches an arbitrary interleaving of two vectors.
> > > >
> > > > +
> > > > +   will return PLUS_MINUS along with OPS containing {_37, _12, _38, _36}.
> > > > +*/
> > > > +
> > > > +static complex_operation_t
> > > > +vect_detect_pair_op (slp_tree node1, slp_tree node2,
> > > > +lane_permutation_t
> > > > &lanes,
> > > > +                    bool two_operands = true, vec<slp_tree> *ops =
> > > > + NULL)
> > > >
> > > >  { {_37, _38}, {_12, _36} }
> > > >
> > > > there's a lot of vect_match_expression_p calls and I wonder if
> > > > inlining them and CSEing the 'code' compute of node1 and node2 would
> > > > simplify things and elide the macro.  Also we then know this is all
> > > > CSEd in the end ...
> > > >
> > > > I wonder what LANES specifies, TWO_OPERANDS specifies whether
> > > > two-operand variants are allowed unless .. (guess LANES causes only
> > > > same operands to match up in the end).
> > > >
> > >
> > > I'm guarding against that you could technically speaking have a +/-
> > > node that does reverses the order of the elements below it, in which
> > > case that's a different operation than what we want to guard against.
> > >
> > > I hadn't been able to prove to myself that SLP build would never do this.
> > >
> > > > +static void
> > > > +vect_mark_stmts_as_in_pattern (hash_set<slp_tree> *cache, slp_tree
> > > > node)
> > > > +{
> > > >
> > > > need to find a use for this still, but don't you set pure_slp
> > > > "above"?
> > >
> > > This is used to mark the elided nodes.  For instance in a complex
> > > multiply the
> > > * nodes in the SLP tree become unused.  While not having them in the
> > > SLP tree is simple, the detection of pure/hybrid is led from the BB
> > > statements so they would mark the elided nodes as being hybrid.
> > 
> > What does regular pattern matching do here?  Ah, the relevance compute is
> > after pattern matching.  But you don't know whether it's still relevant (for
> > hybrid operation).
> > 
> > > There is of course the problem of is this node used in any other SLP
> > > instance, it may force the analysis to think the SLP tree is pure, but
> > > if SLP fails it would try not SLP anyway as it would have done eventually if
> > Hybrid.
> > 
> > I still think the code will have some issues with relevance and pure vs.
> > hybrid ...
> > 
> > > >
> > > > +bool
> > > > +complex_pattern::validate_p ()
> > > > +{
> > > > +      unsigned i;
> > > > +      FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, tmp)
> > > > +       {
> > > > +         slp_tree vnode = NULL;
> > > > +         if (vect_slp_make_linear (node, tmp, &vnode))
> > > > +           nodes.quick_push (vnode);
> > > > +         else
> > > > +           {
> > > > +             nodes.release();
> > > > +             return false;
> > > > +           }
> > > >
> > > > hum.  vect_slp_make_linear better not fail?  At least we've created
> > > > half-way stuff when it does.  Can't spot the implementation right
> > > > now (guess splitting the patch up doesn't really help much ...)
> > > >
> > >
> > > It can fail, so I see I need to free some nodes otherwise I'll leak...
> > > But It's not a correctness issue. Nothing It changed on NODE until it
> > > all succeeds.
> > >
> > > > Noting elsewhere:
> > > >
> > > > +static void
> > > > +vect_dissolve_slp_only_patterns (loop_vec_info loop_vinfo,
> > > > +                                hash_set<slp_tree> *visited,
> > > > +slp_tree
> > > > root)
> > > > +{
> > > > +  if (!root || visited->contains (root))
> > > > +    return;
> > > > +
> > > > +  unsigned int i;
> > > > +  slp_tree node;
> > > > +  stmt_vec_info related_stmt_info;
> > > > +  stmt_vec_info stmt_info = SLP_TREE_REPRESENTATIVE (root);
> > > > +
> > > > +  visited->add (root);
> > > >
> > > > visited->add () returns whether the key was already there so
> > > > please combine the contains and the add calls here and elsewhere.
> > > >
> > > >     if (stmt_info && STMT_VINFO_SLP_VECT_ONLY (stmt_info)
> > > > +        && (related_stmt_info = STMT_VINFO_RELATED_STMT
> > > > + (stmt_info)) !=
> > > > NULL)
> > > > +      {
> > > > +       if (dump_enabled_p ())
> > > >
> > > > I think STMT_VINFO_SLP_VECT_ONLY is a thing already, so I wonder if
> > > > the above is a sufficient test.  There's is_pattern_stmt_p () (which
> > > > obviously also applies to non-SLP only patterns).
> > > >
> > > > Note that I also see another issue, namely with hybrid SLP the
> > > > original non-pattern stmt would need to stay relevant.  That
> > > > probably asks for hybrid SLP discovery and relevance marking to be
> > > > combined somehow.  You can probably create a simple testcase by
> > > > storing a lane of a complex op via a non-grouped store.
> > > >
> > > > +       STMT_VINFO_RELEVANT (stmt_info) = vect_unused_in_scope;
> > > > +       STMT_VINFO_RELEVANT (related_stmt_info) =
> > > > + vect_used_in_scope;
> > > >
> > > > as said previously the relevance should be copied, in this case back
> > > > to the original stmt info from the pattern stmt info.
> > > >
> > > > +       STMT_VINFO_IN_PATTERN_P (related_stmt_info) = false;
> > > > +       STMT_SLP_TYPE (related_stmt_info) = loop_vect;
> > > >
> > > > one option might be to delay pattern recognition (for the loop case)
> > > > to after vect_detect_hybrid_slp and simply not make any hybrid stmt
> > > > containing nodes part of a SLP pattern.  It would be a workaround of
> > > > course, not the best solution.  Another option is to add a new field
> > > > to stmt_info to put a "saved" relevance to and simply go over all
> > > > stmts restoring relevance - we already restore SLP_TYPE to loop_vect
> > > > that way at
> > >
> > > Yes.. ideally I would want something like telling it "don't worry
> > > about this if you end up being pure SLP".
> > >
> > > I haven't tried with this new approach to instead add it to the
> > PATTERN_SEQ
> > > for the statement.   Last time I tried however it looked like in order to get
> > this
> > > to work correctly I'd need a bigger hack..
> > 
> > I don't think PATTERN_SEQ would help us here, we'd need a "ORIG_SEQ"
> > though we should eventually have that by means of the pattern root scalar
> > stmts.
> 
> Shouldn't it? If I mark the unused nodes as part of the same pattern, vect_analyze_stmt
> would examine the pattern instead, but once it's relevancy is determined it won't analyze it
> again.  The unused statements then don't have any relevancy which will get
> vect_detect_hybrid_slp to ignore them which should get us what we want no?
> 
> Unless I'm misinterpreting something.

The area is complex so I might misremeber as well ... guess we need
to figure out the hard way.

Richard.

> Regards,
> Tamar
> 
> > 
> > Richard.
> > 
> > > Thanks,
> > > Tamar
> > >
> > > >
> > > >   /* Reset SLP type to loop_vect on all stmts.  */
> > > >   for (i = 0; i < LOOP_VINFO_LOOP (loop_vinfo)->num_nodes; ++i)
> > > >     {
> > > >
> > > > so simply restore the saved relevance would work there (guess I like
> > > > that more than what vect_dissolve_slp_only_patterns does, in any
> > > > case do what vect_dissolve_slp_only_patterns does in the above loop
> > > > please).
> > > >
> > > > Thanks,
> > > > Richard.
> > > >
> > > >
> > > >
> > > > > Thanks,
> > > > > Tamar
> > > > >
> > > > > gcc/ChangeLog:
> > > > >
> > > > > 	* Makefile.in (tree-vect-slp-patterns.o): New.
> > > > > 	* doc/passes.texi: Update documentation.
> > > > > 	* tree-vect-slp.c (vect_print_slp_tree): Add new state.
> > > > > 	(vect_match_slp_patterns_2, vect_match_slp_patterns): New.
> > > > > 	(vect_analyze_slp): Call pattern matcher.
> > > > > 	* tree-vectorizer.h (enum _complex_operation):
> > > > > 	(class vect_pattern_match, class vect_pattern): New.
> > > > > 	* tree-vect-slp-patterns.c: New file.
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: rguenther@c653.arch.suse.de <rguenther@c653.arch.suse.de>
> > > > > > On Behalf Of Richard Biener
> > > > > > Sent: Tuesday, September 29, 2020 10:42 AM
> > > > > > To: Richard Sandiford <Richard.Sandiford@arm.com>
> > > > > > Cc: Tamar Christina <Tamar.Christina@arm.com>; nd <nd@arm.com>;
> > > > gcc-
> > > > > > patches@gcc.gnu.org
> > > > > > Subject: Re: [PATCH v2 3/16]middle-end Add basic SLP pattern
> > > > > > matching scaffolding.
> > > > > >
> > > > > > On Tue, 29 Sep 2020, Richard Sandiford wrote:
> > > > > >
> > > > > > > Richard Biener <rguenther@suse.de> writes:
> > > > > > > >> > > @@ -2192,6 +2378,17 @@ vect_analyze_slp_instance
> > > > > > > >> > > (vec_info
> > > > > > *vinfo,
> > > > > > > >> > >                               &tree_size, bst_map);
> > > > > > > >> > >    if (node != NULL)
> > > > > > > >> > >      {
> > > > > > > >> > > +      /* Temporarily allow add_stmt calls again.  */
> > > > > > > >> > > +      vinfo->stmt_vec_info_ro = false;
> > > > > > > >> > > +
> > > > > > > >> > > +      /* See if any patterns can be found in the
> > > > > > > >> > > + constructed SLP
> > > > tree
> > > > > > > >> > > +        before we do any analysis on it.  */
> > > > > > > >> > > +      vect_match_slp_patterns (node, vinfo,
> > > > > > > >> > > + group_size,
> > > > > > &max_nunits,
> > > > > > > >> > > +                              matches, &npermutes,
> > > > > > > >> > > + &tree_size, bst_map);
> > > > > > > >> > > +
> > > > > > > >> > > +      /* After this no more add_stmt calls are allowed.  */
> > > > > > > >> > > +      vinfo->stmt_vec_info_ro = true;
> > > > > > > >> > > +
> > > > > > > >> > >
> > > > > > > >> > > I think this is a bit early to match patterns - I'd
> > > > > > > >> > > defer it to the point where all entries into the same
> > > > > > > >> > > SLP subgraph are analyzed, thus somewhere at the end of
> > > > > > > >> > > vect_analyze_slp loop over all instances and match
> > > > > > > >> > > patterns?  That way phases are
> > > > more
> > > > > > clearly separated.
> > > > > > > >> >
> > > > > > > >> > That would probably work, my only worry is that the SLP
> > > > > > > >> > analysis itself may fail and bail out at
> > > > > > > >> >
> > > > > > > >> > 	  /* If the loads and stores can be handled with
> > > > > > > >> > load/store-
> > > > lane
> > > > > > > >> > 	     instructions do not generate this SLP instance.  */
> > > > > > > >> > 	  if (is_a <loop_vec_info> (vinfo)
> > > > > > > >> > 	      && loads_permuted
> > > > > > > >> > 	      && dr && vect_store_lanes_supported (vectype,
> > > > group_size,
> > > > > > > >> > false))
> > > > > > > >
> > > > > > > > Ah, that piece of code.  Yeah, I'm repeatedly running into
> > > > > > > > it as well - it's a bad hack that stands in the way all the
> > > > > > > > time :/
> > > > > > >
> > > > > > > At one point I was wondering about trying to drop the above,
> > > > > > > vectorise with and without SLP, and then compare their costs,
> > > > > > > like for
> > > > > > VECT_COMPARE_COSTS.
> > > > > > > But that seemed like a dead end with the move to doing
> > > > > > > everything on the SLP representation.
> > > > > >
> > > > > > Yeah ... though even moving everything to the SLP representation
> > > > > > will
> > > > retain
> > > > > > the issue since there it will be N group-size 1 SLP instances
> > > > > > vs. 1 group-
> > > > size N
> > > > > > SLP instance.
> > > > > >
> > > > > > > > I guess we should try moving this upward like to
> > > > > > > > vect_analyze_loop_2 right before
> > > > > > > >
> > > > > > > >   /* Check the SLP opportunities in the loop, analyze and
> > > > > > > > build SLP
> > > > trees.
> > > > > > > > */
> > > > > > > >   ok = vect_analyze_slp (loop_vinfo, *n_stmts);
> > > > > > > >   if (!ok)
> > > > > > > >     return ok;
> > > > > > > >
> > > > > > > > and there check whether all grouped loads and stores can be
> > > > > > > > handled with store-/load-lanes (and there are no SLP
> > > > > > > > reduction chains?) in which case do not try to attempt SLP
> > > > > > > > at all.  Because the testcases this check was supposed to
> > > > > > > > change were all-load/store-lane or all SLP so the mixed case is
> > probably not worth special casing.
> > > > > > > >
> > > > > > > > Since load-/store-lanes is an arm speciality I tried to only
> > > > > > > > touch this fragile part with a ten-foot pole ;)  CCing
> > > > > > > > Richard, if he acks the above I can produce a patch.
> > > > > > >
> > > > > > > Yeah, sounds good to me.  Probably also sorth checking whether
> > > > > > > the likely_max iteration count is high enough to support
> > > > > > > group_size vectors, if we have enough information to guess that.
> > > > > > >
> > > > > > > We could also get the gen* machinery to emit a macro that is
> > > > > > > true if at least one load/store-lane pattern is defined, so
> > > > > > > that we can skip the code for non-Arm targets.  I can do that as a
> > follow-up.
> > > > > >
> > > > > > I've had a second look and one complication is that we only
> > > > > > elide the SLP node if any of the loads are permuted.  So if all
> > > > > > loads/stores are
> > > > unpermuted
> > > > > > but load/store-lanes would work we'd keep the SLP node.
> > > > > >
> > > > > > Of course without actually building the SLP node we don't know
> > > > > > whether
> > > > the
> > > > > > loads will be permuted or not ...
> > > > > >
> > > > > > But surely the current place for the check will cause some
> > > > > > testcases to become hybrid vectorizations which is likely undesirable.
> > > > > >
> > > > > > So we could move the check after all SLP discovery is completed
> > > > > > and
> > > > throw it
> > > > > > all away if we can and should use load/store-lanes?
> > > > > > But that might then not solve Tamars issue.
> > > > > >
> > > > > > Richard.
> > > > >
> > > >
> > > > --
> > > > Richard Biener <rguenther@suse.de>
> > > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409
> > > > Nuernberg, Germany; GF: Felix Imend
> > >
> > 
> > --
> > Richard Biener <rguenther@suse.de>
> > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409
> > Nuernberg, Germany; GF: Felix Imend
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imend