From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx2.suse.de (mx2.suse.de [195.135.220.15]) by sourceware.org (Postfix) with ESMTPS id C54CC385700D for ; Wed, 4 Nov 2020 15:14:55 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org C54CC385700D Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=suse.de Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=rguenther@suse.de X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id BC4C3AB5C; Wed, 4 Nov 2020 15:14:54 +0000 (UTC) Date: Wed, 4 Nov 2020 16:14:54 +0100 (CET) From: Richard Biener Sender: rguenther@c653.arch.suse.de To: Tamar Christina cc: Richard Sandiford , nd , "gcc-patches@gcc.gnu.org" Subject: RE: [PATCH v2 3/16]middle-end Add basic SLP pattern matching scaffolding. In-Reply-To: Message-ID: References: <20200925142753.GA13692@arm.com> User-Agent: Alpine 2.21 (LSU 202 2017-01-01) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII X-Spam-Status: No, score=-6.7 required=5.0 tests=BAYES_00, GIT_PATCH_0, INDUSTRIAL_BODY, INDUSTRIAL_SUBJECT, KAM_DMARC_STATUS, RCVD_IN_MSPIKE_H3, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 04 Nov 2020 15:14:59 -0000 On Wed, 4 Nov 2020, Tamar Christina wrote: > > -----Original Message----- > > From: rguenther@c653.arch.suse.de On > > Behalf Of Richard Biener > > Sent: Wednesday, November 4, 2020 2:04 PM > > To: Tamar Christina > > Cc: Richard Sandiford ; nd ; > > gcc-patches@gcc.gnu.org > > Subject: RE: [PATCH v2 3/16]middle-end Add basic SLP pattern matching > > scaffolding. > > > > On Wed, 4 Nov 2020, Tamar Christina wrote: > > > > > > -----Original Message----- > > > > From: rguenther@c653.arch.suse.de On > > > > Behalf Of Richard Biener > > > > Sent: Wednesday, November 4, 2020 12:41 PM > > > > To: Tamar Christina > > > > Cc: Richard Sandiford ; nd ; > > > > gcc-patches@gcc.gnu.org > > > > Subject: RE: [PATCH v2 3/16]middle-end Add basic SLP pattern > > > > matching scaffolding. > > > > > > > > On Tue, 3 Nov 2020, Tamar Christina wrote: > > > > > > > > > Hi Richi, > > > > > > > > > > This is a respin which includes the changes you requested. > > > > > > > > Comments randomly ordered, I'm pasting in pieces of the patch - > > > > sending it inline would help to get pieces properly quoted and in-order. > > > > > > > > diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h index > > > > > > 4bd454cfb185d7036843fc7140b073f525b2ec6a..b813508d3ceaf4c54f612bc10f > > > > 9 > > > > aa42ffe0ce0dd > > > > 100644 > > > > --- a/gcc/tree-vectorizer.h > > > > +++ b/gcc/tree-vectorizer.h > > > > ... > > > > > > > > I miss comments in this file, see tree-vectorizer.h where we try to > > > > document purpose of classes and fields. > > > > > > > > Things that sticks out to me: > > > > > > > > + uint8_t m_arity; > > > > + uint8_t m_num_args; > > > > > > > > why uint8_t and not simply unsigned int? Not knowing what arity / > > > > num_args should be here ;) > > > > > > I think I can remove arity, but num_args is how many operands the > > > created internal function call should take. Since we can't vectorize > > > calls with more than > > > 4 arguments at the moment it seemed like 255 would be a safe limit :). > > > > > > > > > > > + vec_info *m_vinfo; > > > > ... > > > > + vect_pattern (slp_tree *node, vec_info *vinfo) > > > > > > > > so this looks like something I freed stmt_vec_info of - > > > > back-pointers in the "wrong" direction of the logical hierarchy. I > > > > suppose it's just to avoid passing down vinfo where we need it? > > > > Please do that instead - pass down vinfo as everything else does. > > > > > > > > The class seems to expose both very high-level (build () it!) and > > > > very low level details (get_ifn). The high-level one suggests that > > > > a pattern _not_ being represented by an ifn is possible but there's > > > > too much implementation detail already in the vect_pattern class to > > > > make that impossible. I guess the IFN details could be pushed down > > > > to the simple matching class (and that be called vect_ifn_pattern or so). > > > > > > > > +static bool > > > > +vect_match_slp_patterns (slp_tree *ref_node, vec_info *vinfo) { > > > > + DUMP_VECT_SCOPE ("vect_match_slp_patterns"); > > > > + bool found_p = false; > > > > + > > > > + if (dump_enabled_p ()) > > > > + { > > > > + dump_printf_loc (MSG_NOTE, vect_location, "-- before patt > > > > + match > > > > --\n"); > > > > + vect_print_slp_graph (MSG_NOTE, vect_location, *ref_node); > > > > + dump_printf_loc (MSG_NOTE, vect_location, "-- end patt --\n"); > > > > + } > > > > > > > > we dumped all instances after their analysis. Maybe just refer to > > > > the instance with its address (dump_print %p) so lookup in the > > > > (already large) dump file is easy. > > > > > > > > + hash_set *visited = new hash_set (); for > > > > + (unsigned x = 0; x < num__slp_patterns; x++) > > > > + { > > > > + visited->empty (); > > > > + found_p |= vect_match_slp_patterns_2 (ref_node, vinfo, > > > > slp_patterns[x], > > > > + visited); > > > > + } > > > > + > > > > + delete visited; > > > > > > > > no need to new / delete, just do > > > > > > > > has_set visited; > > > > > > > > like everyone else. Btw, do you really want to scan pieces of the > > > > SLP graph (with instances being graph entries) multiple times? If > > > > not then you should move the visited set to the caller instead. > > > > > > > > + /* TODO: Remove in final version, only here for generating debug > > > > + dot > > > > graphs > > > > + from SLP tree. */ > > > > + > > > > + if (dump_enabled_p ()) > > > > + { > > > > + dump_printf_loc (MSG_NOTE, vect_location, "-- start dot --\n"); > > > > + vect_print_slp_graph (MSG_NOTE, vect_location, *ref_node); > > > > + dump_printf_loc (MSG_NOTE, vect_location, "-- end dot --\n"); > > > > + } > > > > > > > > now, if there was some pattern matched it is probably useful to dump > > > > the graph (entry) again. But only conditional on that I think. So > > > > can you instead make the dump conditional on found_p and remove the > > > > start dot/end dot markers as said in the comment? > > > > > > > > + if (dump_enabled_p ()) > > > > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, > > > > + "transformation for %s not valid due to > > > > + post > > > > " > > > > + "condition\n", > > > > > > > > not really a MSG_MISSED_OPTIMIZATION, use MSG_NOTE. > > > > MSG_MISSED_OPTIMIZATION should be used for things (likely) making > > > > vectorization fail. > > > > > > > > + /* Perform recursive matching, it's important to do this after > > > > + matching > > > > things > > > > > > > > before matching things? > > > > > > > > + in the current node as the matches here may re-order the nodes > > > > + below > > > > it. > > > > + As such the pattern that needs to be subsequently match may > > change. > > > > > > > > and this is no longer true? > > > > > > > > */ > > > > + > > > > + if (SLP_TREE_CHILDREN (node).exists ()) { > > > > > > > > elide this check, the loop will simply not run if empty > > > > > > > > + slp_tree child; > > > > + FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child) > > > > > > > > I think you want to perform the recursion in the caller so you do it > > > > only once and not once for each pattern kind now that you do > > > > post-order processing rather than pre-order. > > > > > > That wouldn't work as that would make multiply match before FMA/FMS. > > > > Hmm, I see. Note that to avoid visiting multiple entries into the graph > > multiple times you then need another visited set or compute > > pre- or post-order of the entire graph into an array separately. > > > > > I had tried having FM(A|S) match MUL instead, but MUL does a lot of > > > work to create permute nodes, which would have to be undone. > > > > Nevertheless I think that's what regular pattern matching does. Is matching > > FMA from "scalar" really that much simpler than matching first MUL and then > > FMA with a MUL component? > > No, the matching itself no. (in fact I already have a branch where I do this). > The FMA case however does end up needing to drop half the load nodes > (which I need to pick at random really). > > Is it OK to do so? If so then I can use this variant. If I split FMA and FMS then > I can do it if dropping the loads at this point doesn't cause any issues. Sure, dopping nodes is easy. > Would you prefer I just elide the nodes or add a VEC_PERM that just selects > from the first vector only? Drop them. > > > > > The FMS case is somewhat easy as that still has a two operators node > > > in between, But FMA is just a simple add on top. I have to then > > > remove the merging nodes that MUL Created, and drop half the operands > > to create the new node. > > > > > > Which is doable but effectively does the same pattern twice, whereas > > > now the MUL would just skip all that work since FMA matched. > > > > > > The post order is essentially really problematic for overlapping > > > patterns. In pre-order I could for Instance detect all 3 patterns in one > > traversal and share the matching information between all 3. > > > > I do remember you doing this in some earlier variant but it looked equally > > complicated. What you've now done is do a post-order walk for each pattern > > kind. > > > > So I'd like to get the recursion up to fix the 'visited' thing. We can either have > > ->pre_match and ->post_match (ugh) or bite the bullet and do FM(A|S) > > ontop of MUL (I believe for the earlier variant you said that would be > > possible). > > > > > Right now I can only do it between mul and add and FMA has to be > > standalone. > > > > > > > > > > > + vect_pattern *pattern = patt_fn (ref_node, vinfo); > > > > + > > > > + if (pattern->matches ()) > > > > > > > > this suggests you get number of SLP nodes times number of pattern > > > > matchers allocations & inits of vect_pattern. If you move > > > > > > > > + if (pattern->is_optab_supported_p (vectype, > > OPTIMIZE_FOR_SPEED)) > > > > + { > > > > > > > > into ->matches () then whether this is a IFN or multi-node pattern > > > > becomes an implementation detail which would be IMHO better. > > > > > > > > + FOR_EACH_VEC_ELT (*this->m_nodes, ix, node) > > > > + { > > > > + /* Calculate the location of the statement in NODE to replace. */ > > > > + stmt_info = SLP_TREE_REPRESENTATIVE (node); > > > > + gimple* old_stmt = STMT_VINFO_STMT (stmt_info); > > > > + tree type = gimple_expr_type (old_stmt); > > > > + > > > > + /* Create the argument set for use by > > > > gimple_build_call_internal_vec. */ > > > > + for (int i = 0; i < this->m_num_args; i++) > > > > > > > > the scalar argument type is the type of the definition so instead of > > > > gimple_expr_type you want simply TREE_TYPE (gimple_get_lhs > > > > (old_stmt)) > > > > > > > > But then you seem to mix the result and the argument types? I think > > > > you need to walk over SLP_TREE_CHILDREN and look at their > > > > representatives scalar def. Which would then assume the pattern > > > > consumes all direct children. Somehow this "generic" > > > > > > Since I'm only consuming simple math ops would the argument and result > > > types differ? Something like a widening sub I can see but not the normal > > sub. > > > > No. Still gimple_expr_type is somewhat of an odd beast so better avoid it. > > And document this restriction. > > > > > > vect_simple_pattern_match::build () has quite some restrictions. > > > > I guess they are documented in the files toplevel comment which also > > > > has general parts applying to all possible pattern matchers (not > > > > deriving from vect_simple_pattern_match). Can you split up the > > > > comment and more explicitely mark what applies to patterns in > > > > general (postorder traversal) and what applies to > > vect_simple_pattern_match only? > > > > > > > > + /* We have to explicitly mark the old statement as unused > > > > + because > > > > during > > > > + statement analysis the original and new pattern statement > > > > + may > > > > require > > > > > > > > comment needs updating. > > > > > > > > + different level of unrolling. As an example add/sub when > > > > vectorized > > > > + without a pattern requires 4 copies, whereas with a > > > > + COMPLEX_ADD > > > > pattern > > > > + this only requires 2 copies and the two statement will be > > > > + treated > > > > as > > > > + hand unrolled. That means that the analysis won't happen > > > > + as > > > > it'll find > > > > + a mismatch. So we don't analyze the old statement and if > > > > + we end > > > > up > > > > + needing it, e.g. SLP fails then we have to quickly re-analyze it. > > > > */ > > > > + STMT_VINFO_RELEVANT (call_stmt_info) = vect_used_in_scope; > > > > > > > > I guess relevance should be copied from the original stmt (like if > > > > it was used_by_reduction). Can't that be done in > > > > vect_mark_pattern_stmts? Likewise the ->add_pattern_stmt part could > > > > be done there, too. So you'd be left with > > > > > > > > + STMT_SLP_TYPE (call_stmt_info) = pure_slp; > > > > > > > > and > > > > > > > > + STMT_VINFO_SLP_VECT_ONLY (call_stmt_info) = true; > > > > > > > > where the pure_slp setting should in theory be not needed (it should > > > > be automagically detected so) but setting it is not wrong (it would > > > > be an optimization). > > > > > > I am using it as a marker for a node dissolve needs to look at when > > > undoing SLP so it restores the relevancy, but given your comment below > > > on that I may be able to get rid of it. > > > > > > > > > > > -- > > > > > > > > I'm now looking down the series to see how this is all used, so > > > > quoting different pieces from other patches. > > > > > > > > +/* Checks to see of the expression EXPR is a gimple assign with code > > CODE > > > > + and if this is the case the two operands of EXPR is returned in > > > > +OP1 > > > > and > > > > + OP2. > > > > + > > > > + If the matching and extraction is successful TRUE is returned > > > > otherwise > > > > + FALSE in which case the value of OP1 and OP2 will not have been > > > > touched. > > > > +*/ > > > > + > > > > +static bool > > > > +vect_match_expression_p (slp_tree node, tree_code code) { > > > > + if (!SLP_TREE_REPRESENTATIVE (node)) > > > > > > > > I see that many overall function comments do not actually match > > > > anymore, if at least for the parameters refered to. Please go over > > > > the patch and update them. > > > > > > > > + gimple* expr = STMT_VINFO_STMT (SLP_TREE_REPRESENTATIVE > > (node)); > > > > + if (!is_gimple_assign (expr) > > > > + || gimple_expr_code (expr) != code) > > > > > > > > gimple_assign_rhs_code (expr) > > > > > > > > +/* Check if the given lane permute in PERMUTES matches an > > > > +alternating > > > > sequence > > > > + of {P0 P1 P0 P1 ...}. This to account for unrolled loops. */ > > > > +static bool > > > > +vect_check_lane_permute (lane_permutation_t &permutes, > > > > + unsigned p0, unsigned p1) { > > > > + unsigned val[2] = {p0, p1}; > > > > + for (unsigned i = 0; i < permutes.length (); i++) > > > > + if (permutes[i].first != val[i % 2]) > > > > + return false; > > > > > > > > so this doesn't put any constraint on permutes[].second. So it > > > > matches an arbitrary interleaving of two vectors. > > > > > > > > + > > > > + will return PLUS_MINUS along with OPS containing {_37, _12, _38, _36}. > > > > +*/ > > > > + > > > > +static complex_operation_t > > > > +vect_detect_pair_op (slp_tree node1, slp_tree node2, > > > > +lane_permutation_t > > > > &lanes, > > > > + bool two_operands = true, vec *ops = > > > > + NULL) > > > > > > > > { {_37, _38}, {_12, _36} } > > > > > > > > there's a lot of vect_match_expression_p calls and I wonder if > > > > inlining them and CSEing the 'code' compute of node1 and node2 would > > > > simplify things and elide the macro. Also we then know this is all > > > > CSEd in the end ... > > > > > > > > I wonder what LANES specifies, TWO_OPERANDS specifies whether > > > > two-operand variants are allowed unless .. (guess LANES causes only > > > > same operands to match up in the end). > > > > > > > > > > I'm guarding against that you could technically speaking have a +/- > > > node that does reverses the order of the elements below it, in which > > > case that's a different operation than what we want to guard against. > > > > > > I hadn't been able to prove to myself that SLP build would never do this. > > > > > > > +static void > > > > +vect_mark_stmts_as_in_pattern (hash_set *cache, slp_tree > > > > node) > > > > +{ > > > > > > > > need to find a use for this still, but don't you set pure_slp > > > > "above"? > > > > > > This is used to mark the elided nodes. For instance in a complex > > > multiply the > > > * nodes in the SLP tree become unused. While not having them in the > > > SLP tree is simple, the detection of pure/hybrid is led from the BB > > > statements so they would mark the elided nodes as being hybrid. > > > > What does regular pattern matching do here? Ah, the relevance compute is > > after pattern matching. But you don't know whether it's still relevant (for > > hybrid operation). > > > > > There is of course the problem of is this node used in any other SLP > > > instance, it may force the analysis to think the SLP tree is pure, but > > > if SLP fails it would try not SLP anyway as it would have done eventually if > > Hybrid. > > > > I still think the code will have some issues with relevance and pure vs. > > hybrid ... > > > > > > > > > > +bool > > > > +complex_pattern::validate_p () > > > > +{ > > > > + unsigned i; > > > > + FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, tmp) > > > > + { > > > > + slp_tree vnode = NULL; > > > > + if (vect_slp_make_linear (node, tmp, &vnode)) > > > > + nodes.quick_push (vnode); > > > > + else > > > > + { > > > > + nodes.release(); > > > > + return false; > > > > + } > > > > > > > > hum. vect_slp_make_linear better not fail? At least we've created > > > > half-way stuff when it does. Can't spot the implementation right > > > > now (guess splitting the patch up doesn't really help much ...) > > > > > > > > > > It can fail, so I see I need to free some nodes otherwise I'll leak... > > > But It's not a correctness issue. Nothing It changed on NODE until it > > > all succeeds. > > > > > > > Noting elsewhere: > > > > > > > > +static void > > > > +vect_dissolve_slp_only_patterns (loop_vec_info loop_vinfo, > > > > + hash_set *visited, > > > > +slp_tree > > > > root) > > > > +{ > > > > + if (!root || visited->contains (root)) > > > > + return; > > > > + > > > > + unsigned int i; > > > > + slp_tree node; > > > > + stmt_vec_info related_stmt_info; > > > > + stmt_vec_info stmt_info = SLP_TREE_REPRESENTATIVE (root); > > > > + > > > > + visited->add (root); > > > > > > > > visited->add () returns whether the key was already there so > > > > please combine the contains and the add calls here and elsewhere. > > > > > > > > if (stmt_info && STMT_VINFO_SLP_VECT_ONLY (stmt_info) > > > > + && (related_stmt_info = STMT_VINFO_RELATED_STMT > > > > + (stmt_info)) != > > > > NULL) > > > > + { > > > > + if (dump_enabled_p ()) > > > > > > > > I think STMT_VINFO_SLP_VECT_ONLY is a thing already, so I wonder if > > > > the above is a sufficient test. There's is_pattern_stmt_p () (which > > > > obviously also applies to non-SLP only patterns). > > > > > > > > Note that I also see another issue, namely with hybrid SLP the > > > > original non-pattern stmt would need to stay relevant. That > > > > probably asks for hybrid SLP discovery and relevance marking to be > > > > combined somehow. You can probably create a simple testcase by > > > > storing a lane of a complex op via a non-grouped store. > > > > > > > > + STMT_VINFO_RELEVANT (stmt_info) = vect_unused_in_scope; > > > > + STMT_VINFO_RELEVANT (related_stmt_info) = > > > > + vect_used_in_scope; > > > > > > > > as said previously the relevance should be copied, in this case back > > > > to the original stmt info from the pattern stmt info. > > > > > > > > + STMT_VINFO_IN_PATTERN_P (related_stmt_info) = false; > > > > + STMT_SLP_TYPE (related_stmt_info) = loop_vect; > > > > > > > > one option might be to delay pattern recognition (for the loop case) > > > > to after vect_detect_hybrid_slp and simply not make any hybrid stmt > > > > containing nodes part of a SLP pattern. It would be a workaround of > > > > course, not the best solution. Another option is to add a new field > > > > to stmt_info to put a "saved" relevance to and simply go over all > > > > stmts restoring relevance - we already restore SLP_TYPE to loop_vect > > > > that way at > > > > > > Yes.. ideally I would want something like telling it "don't worry > > > about this if you end up being pure SLP". > > > > > > I haven't tried with this new approach to instead add it to the > > PATTERN_SEQ > > > for the statement. Last time I tried however it looked like in order to get > > this > > > to work correctly I'd need a bigger hack.. > > > > I don't think PATTERN_SEQ would help us here, we'd need a "ORIG_SEQ" > > though we should eventually have that by means of the pattern root scalar > > stmts. > > Shouldn't it? If I mark the unused nodes as part of the same pattern, vect_analyze_stmt > would examine the pattern instead, but once it's relevancy is determined it won't analyze it > again. The unused statements then don't have any relevancy which will get > vect_detect_hybrid_slp to ignore them which should get us what we want no? > > Unless I'm misinterpreting something. The area is complex so I might misremeber as well ... guess we need to figure out the hard way. Richard. > Regards, > Tamar > > > > > Richard. > > > > > Thanks, > > > Tamar > > > > > > > > > > > /* Reset SLP type to loop_vect on all stmts. */ > > > > for (i = 0; i < LOOP_VINFO_LOOP (loop_vinfo)->num_nodes; ++i) > > > > { > > > > > > > > so simply restore the saved relevance would work there (guess I like > > > > that more than what vect_dissolve_slp_only_patterns does, in any > > > > case do what vect_dissolve_slp_only_patterns does in the above loop > > > > please). > > > > > > > > Thanks, > > > > Richard. > > > > > > > > > > > > > > > > > Thanks, > > > > > Tamar > > > > > > > > > > gcc/ChangeLog: > > > > > > > > > > * Makefile.in (tree-vect-slp-patterns.o): New. > > > > > * doc/passes.texi: Update documentation. > > > > > * tree-vect-slp.c (vect_print_slp_tree): Add new state. > > > > > (vect_match_slp_patterns_2, vect_match_slp_patterns): New. > > > > > (vect_analyze_slp): Call pattern matcher. > > > > > * tree-vectorizer.h (enum _complex_operation): > > > > > (class vect_pattern_match, class vect_pattern): New. > > > > > * tree-vect-slp-patterns.c: New file. > > > > > > > > > > > -----Original Message----- > > > > > > From: rguenther@c653.arch.suse.de > > > > > > On Behalf Of Richard Biener > > > > > > Sent: Tuesday, September 29, 2020 10:42 AM > > > > > > To: Richard Sandiford > > > > > > Cc: Tamar Christina ; nd ; > > > > gcc- > > > > > > patches@gcc.gnu.org > > > > > > Subject: Re: [PATCH v2 3/16]middle-end Add basic SLP pattern > > > > > > matching scaffolding. > > > > > > > > > > > > On Tue, 29 Sep 2020, Richard Sandiford wrote: > > > > > > > > > > > > > Richard Biener writes: > > > > > > > >> > > @@ -2192,6 +2378,17 @@ vect_analyze_slp_instance > > > > > > > >> > > (vec_info > > > > > > *vinfo, > > > > > > > >> > > &tree_size, bst_map); > > > > > > > >> > > if (node != NULL) > > > > > > > >> > > { > > > > > > > >> > > + /* Temporarily allow add_stmt calls again. */ > > > > > > > >> > > + vinfo->stmt_vec_info_ro = false; > > > > > > > >> > > + > > > > > > > >> > > + /* See if any patterns can be found in the > > > > > > > >> > > + constructed SLP > > > > tree > > > > > > > >> > > + before we do any analysis on it. */ > > > > > > > >> > > + vect_match_slp_patterns (node, vinfo, > > > > > > > >> > > + group_size, > > > > > > &max_nunits, > > > > > > > >> > > + matches, &npermutes, > > > > > > > >> > > + &tree_size, bst_map); > > > > > > > >> > > + > > > > > > > >> > > + /* After this no more add_stmt calls are allowed. */ > > > > > > > >> > > + vinfo->stmt_vec_info_ro = true; > > > > > > > >> > > + > > > > > > > >> > > > > > > > > > >> > > I think this is a bit early to match patterns - I'd > > > > > > > >> > > defer it to the point where all entries into the same > > > > > > > >> > > SLP subgraph are analyzed, thus somewhere at the end of > > > > > > > >> > > vect_analyze_slp loop over all instances and match > > > > > > > >> > > patterns? That way phases are > > > > more > > > > > > clearly separated. > > > > > > > >> > > > > > > > > >> > That would probably work, my only worry is that the SLP > > > > > > > >> > analysis itself may fail and bail out at > > > > > > > >> > > > > > > > > >> > /* If the loads and stores can be handled with > > > > > > > >> > load/store- > > > > lane > > > > > > > >> > instructions do not generate this SLP instance. */ > > > > > > > >> > if (is_a (vinfo) > > > > > > > >> > && loads_permuted > > > > > > > >> > && dr && vect_store_lanes_supported (vectype, > > > > group_size, > > > > > > > >> > false)) > > > > > > > > > > > > > > > > Ah, that piece of code. Yeah, I'm repeatedly running into > > > > > > > > it as well - it's a bad hack that stands in the way all the > > > > > > > > time :/ > > > > > > > > > > > > > > At one point I was wondering about trying to drop the above, > > > > > > > vectorise with and without SLP, and then compare their costs, > > > > > > > like for > > > > > > VECT_COMPARE_COSTS. > > > > > > > But that seemed like a dead end with the move to doing > > > > > > > everything on the SLP representation. > > > > > > > > > > > > Yeah ... though even moving everything to the SLP representation > > > > > > will > > > > retain > > > > > > the issue since there it will be N group-size 1 SLP instances > > > > > > vs. 1 group- > > > > size N > > > > > > SLP instance. > > > > > > > > > > > > > > I guess we should try moving this upward like to > > > > > > > > vect_analyze_loop_2 right before > > > > > > > > > > > > > > > > /* Check the SLP opportunities in the loop, analyze and > > > > > > > > build SLP > > > > trees. > > > > > > > > */ > > > > > > > > ok = vect_analyze_slp (loop_vinfo, *n_stmts); > > > > > > > > if (!ok) > > > > > > > > return ok; > > > > > > > > > > > > > > > > and there check whether all grouped loads and stores can be > > > > > > > > handled with store-/load-lanes (and there are no SLP > > > > > > > > reduction chains?) in which case do not try to attempt SLP > > > > > > > > at all. Because the testcases this check was supposed to > > > > > > > > change were all-load/store-lane or all SLP so the mixed case is > > probably not worth special casing. > > > > > > > > > > > > > > > > Since load-/store-lanes is an arm speciality I tried to only > > > > > > > > touch this fragile part with a ten-foot pole ;) CCing > > > > > > > > Richard, if he acks the above I can produce a patch. > > > > > > > > > > > > > > Yeah, sounds good to me. Probably also sorth checking whether > > > > > > > the likely_max iteration count is high enough to support > > > > > > > group_size vectors, if we have enough information to guess that. > > > > > > > > > > > > > > We could also get the gen* machinery to emit a macro that is > > > > > > > true if at least one load/store-lane pattern is defined, so > > > > > > > that we can skip the code for non-Arm targets. I can do that as a > > follow-up. > > > > > > > > > > > > I've had a second look and one complication is that we only > > > > > > elide the SLP node if any of the loads are permuted. So if all > > > > > > loads/stores are > > > > unpermuted > > > > > > but load/store-lanes would work we'd keep the SLP node. > > > > > > > > > > > > Of course without actually building the SLP node we don't know > > > > > > whether > > > > the > > > > > > loads will be permuted or not ... > > > > > > > > > > > > But surely the current place for the check will cause some > > > > > > testcases to become hybrid vectorizations which is likely undesirable. > > > > > > > > > > > > So we could move the check after all SLP discovery is completed > > > > > > and > > > > throw it > > > > > > all away if we can and should use load/store-lanes? > > > > > > But that might then not solve Tamars issue. > > > > > > > > > > > > Richard. > > > > > > > > > > > > > -- > > > > Richard Biener > > > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 > > > > Nuernberg, Germany; GF: Felix Imend > > > > > > > -- > > Richard Biener > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 > > Nuernberg, Germany; GF: Felix Imend > -- Richard Biener SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany; GF: Felix Imend