From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Thu, 7 Jan 2021 14:36:12 +0100 (CET)
From: Richard Biener <rguenther@suse.de>
To: Tamar Christina
cc: gcc-patches@gcc.gnu.org, nd, ook@ucw.cz
Subject: RE: [PATCH 1/8 v9]middle-end slp: Support optimizing load
 distribution
List-Id: Gcc-patches mailing list

On Thu, 7 Jan 2021, Tamar Christina wrote:

> > -----Original Message-----
> > From: Richard Biener <rguenther@suse.de>
> > Sent: Thursday, January 7, 2021 1:21 PM
> > To: Tamar Christina
> > Cc: gcc-patches@gcc.gnu.org; nd; ook@ucw.cz
> > Subject: Re: [PATCH 1/8 v9]middle-end slp: Support optimizing load
> > distribution
> >
> > > From tamar.christina@arm.com Mon Dec 28 14:36:32 2020
> > > Date: Mon, 28 Dec 2020 13:35:56 +0000
> > > From: Tamar Christina <tamar.christina@arm.com>
> > > To: gcc-patches@gcc.gnu.org
> > > Cc: nd@arm.com, rguenther@suse.de, ook@ucw.cz
> > > Subject: [PATCH 1/8 v9]middle-end slp: Support optimizing load
> > > distribution
> > >
> > > Hi All,
> > >
> > > This introduces a post-processing step for the pattern matcher to
> > > flatten permutes introduced by the complex multiplication patterns.
> > >
> > > This performs a blend early such that SLP is not cancelled by the
> > > LOAD_LANES permute.  This is a temporary workaround to the fact that
> > > loads are not CSEd during building and is required to produce
> > > efficient code.
> > >
> > > Bootstrapped and regtested on aarch64-none-linux-gnu and
> > > x86_64-pc-linux-gnu with no issues.
> > >
> > > Ok for master?
> > >
> > > Thanks,
> > > Tamar
> > >
> > > gcc/ChangeLog:
> > >
> > > 	* tree-vect-slp.c (optimize_load_redistribution_1): New.
> > > 	(optimize_load_redistribution): New.
> > > 	(vect_match_slp_patterns): Use it.
> > >
> > > --- inline copy of patch --
> > > diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
> > > index 2a58e54fe51471df5f55ce4a524d0022744054b0..8360a59098f517498f3155f325cf8406466ac25c 100644
> > > --- a/gcc/tree-vect-slp.c
> > > +++ b/gcc/tree-vect-slp.c
> > > @@ -2228,6 +2228,115 @@ calculate_unrolling_factor (poly_uint64 nunits, unsigned int group_size)
> > >    return exact_div (common_multiple (nunits, group_size), group_size);
> > >  }
> > >
> > > +/* Helper function of optimize_load_redistribution that performs the operation
> > > +   recursively.  */
> > > +
> > > +static slp_tree
> > > +optimize_load_redistribution_1 (scalar_stmts_to_slp_tree_map_t *bst_map,
> > > +				hash_set<slp_tree> *visited, slp_tree root)
> > > +{
> > > +  if (visited->add (root))
> > > +    return NULL;
> > > +
> > > +  slp_tree node;
> > > +  unsigned i;
> > > +
> > > +  /* For now, we don't know anything about externals so do not do
> > > +     anything.  */
> > > +  if (SLP_TREE_DEF_TYPE (root) == vect_external_def
> > > +      || SLP_TREE_DEF_TYPE (root) == vect_constant_def)
> >
> > use a single != vect_internal_def test please
> >
> > > +    return NULL;
> > > +  else if (SLP_TREE_CODE (root) == VEC_PERM_EXPR
> > > +	   && SLP_TREE_LANE_PERMUTATION (root).exists ()
> > > +	   && !SLP_TREE_SCALAR_STMTS (root).exists ())
> >
> > I think both last tests are unnecessary
>
> It's there to prevent it from trying to optimize two_operands nodes
> which are a vec_perm but contain no scalar statements.  I didn't find
> a different way to distinguish between the two.  The SLP tree can
> contain a number of these that haven't been pattern matched away.

Well, that's because of the weak check for what you want to pattern
match below.  Certainly !SLP_TREE_SCALAR_STMTS (root).exists () isn't
a reliable way to catch these.

>
> >
> > > +    {
> > > +      /* First convert this node into a load node and add it to the leaves
> > > +         list and flatten the permute from a lane to a load one.  If it's
> > > +         unneeded it will be elided later.  */
> > > +      auto_vec<stmt_vec_info> stmts;
> > > +      stmts.create (SLP_TREE_LANES (root));
> > > +      load_permutation_t load_perm;
> > > +      load_perm.create (SLP_TREE_LANES (root));
> > > +      lane_permutation_t lane_perm = SLP_TREE_LANE_PERMUTATION (root);
> >
> > load_perm leaks when any of the below outs is taken
> >
> > > +      for (unsigned j = 0; j < lane_perm.length (); j++)
> > > +	{
> > > +	  std::pair<unsigned, unsigned> perm = lane_perm[j];
> > > +	  /* This isn't strictly needed, but this function is a temporary
> > > +	     one specifically for pattern matching, so don't want it to
> > > +	     optimize things the remainder of the pipeline will.  */
> > > +	  if (perm.first != j)
> > > +	    goto next;
> >
> > but please elide it nevertheless
> >
> > > +	  node = SLP_TREE_CHILDREN (root)[perm.first];
> > > +
> > > +	  if (!SLP_TREE_LOAD_PERMUTATION (node).exists ())
> > > +	    return NULL;
> >
> > so you want to check whether this is a load, I think more to the point
> > would be a vect_internal_def + zero SLP children check.  And a comment
> > on what we test (we do lack classification of SLP nodes, so a helper
> > like vect_is_slp_load_node or so would be OK as well)
> >
> > > +
> > > +	  stmts.quick_push (SLP_TREE_SCALAR_STMTS (node)[perm.second]);
> > > +	  load_perm.safe_push (SLP_TREE_LOAD_PERMUTATION (node)[perm.second]);
> >
> > What you're doing here lacks a check that we are actually loading from
> > the same DR group.  I think it might be easier to just collect scalar
> > stmts and throw them at vect_build_slp_tree?  That should perform the
> > necessary verification, build the appropriate lane permute and perform
> > the CSE.  Which leads to the question why the VEC_PERM node doesn't
> > have scalar stmts set while we are actually able to compute them here
> > ... that is, the CSE opportunity could have been noticed during
> > pattern matching itself?
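[Editor's note: the loop under discussion composes the VEC_PERM node's lane
permutation with the load permutations of its child load nodes.  A minimal
standalone sketch of just that composition, using simplified stand-in types
(std::vector instead of GCC's vec/auto_vec) and an invented helper name,
flatten_lane_perm; it also ignores the patch's perm.first == j restriction,
which the composition itself does not require:]

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Stand-ins for the SLP structures: each child load node carries a load
// permutation (indices into the data-reference group), and the VEC_PERM
// node selects (child, lane) pairs via its lane permutation.  Composing
// the two yields the load permutation of the single load node that can
// replace the blend of permuted loads.
static std::vector<unsigned>
flatten_lane_perm (const std::vector<std::pair<unsigned, unsigned> > &lane_perm,
		   const std::vector<std::vector<unsigned> > &child_load_perms)
{
  std::vector<unsigned> load_perm;
  load_perm.reserve (lane_perm.size ());
  for (const std::pair<unsigned, unsigned> &p : lane_perm)
    // p.first selects the child node, p.second the lane within it.
    load_perm.push_back (child_load_perms[p.first][p.second]);
  return load_perm;
}
```

[For example, blending a child loading the even group members {0, 2} with a
child loading the odd ones {1, 3} via the lane permutation
{(0,0), (1,0), (0,1), (1,1)} composes to the load permutation {0, 1, 2, 3},
i.e. a plain contiguous load, which is exactly the case where eliding the
VEC_PERM pays off.  Note this sketch also performs no check that both
children load from the same DR group, the missing check pointed out above.]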
> >
> > > +	}
> > > +
> > > +      if (dump_enabled_p ())
> > > +	dump_printf_loc (MSG_NOTE, vect_location,
> > > +			 "converting stmts on permute node %p\n", root);
> > > +
> > > +      slp_tree *value = bst_map->get (stmts);
> > > +      if (value)
> > > +	node = *value;
> > > +      else
> > > +	{
> > > +	  FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (root), i, node)
> > > +	    SLP_TREE_REF_COUNT (node)++;
> > > +
> > > +	  vec<stmt_vec_info> stmts_cpy = stmts.copy ();
> > > +	  node = vect_create_new_slp_node (stmts_cpy.copy (), 0);
> > > +	  SLP_TREE_VECTYPE (node) = SLP_TREE_VECTYPE (root);
> > > +	  SLP_TREE_LOAD_PERMUTATION (node) = load_perm;
> > > +	  bst_map->put (stmts_cpy, node);
> > > +	}
> > > +      SLP_TREE_REF_COUNT (node)++;
> >
> > Adjusting the refcount here but doing the replacement in the caller is
> > a bit awkward to follow - how about passing a reference so you can
> > adjust the edge here?
> >
> > > +
> > > +      return node;
> > > +    }
> > > +
> > > +next:
> > > +  FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (root), i, node)
> > > +    {
> > > +      slp_tree value = optimize_load_redistribution_1 (bst_map, visited, node);
> > > +      if (value)
> > > +	{
> > > +	  SLP_TREE_CHILDREN (root)[i] = value;
> > > +	  vect_free_slp_tree (node);
> > > +	}
> > > +    }
> > > +
> > > +  return NULL;
> > > +}
> > > +
> > > +/* Temporary workaround for loads not being CSEd during SLP build.  This
> > > +   function will traverse the SLP tree rooted in ROOT for INSTANCE and find
> > > +   VEC_PERM nodes that blend vectors from multiple nodes that all read from the
> > > +   same DR such that the final operation is equal to a permuted load.  Such
> > > +   NODES are then directly converted into LOADS themselves.  The nodes are
> > > +   CSEd using BST_MAP.  */
> > > +
> > > +static void
> > > +optimize_load_redistribution (scalar_stmts_to_slp_tree_map_t *bst_map,
> > > +			      slp_tree root)
> > > +{
> > > +  slp_tree node;
> > > +  unsigned i;
> > > +  hash_set<slp_tree> visited;
> > > +
> > > +  FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (root), i, node)
> > > +    {
> > > +      slp_tree value = optimize_load_redistribution_1 (bst_map, &visited, node);
> > > +      if (value)
> > > +	{
> > > +	  SLP_TREE_CHILDREN (root)[i] = value;
> > > +	  vect_free_slp_tree (node);
> > > +	}
> > > +    }
> > > +}
> > > +
> > >  /* Helper function of vect_match_slp_patterns.
> > >
> > >     Attempts to match patterns against the slp tree rooted in REF_NODE using
> > > @@ -2276,7 +2385,7 @@ static bool
> > >  vect_match_slp_patterns (slp_instance instance, vec_info *vinfo,
> > > 			 hash_set<slp_tree> *visited,
> > > 			 slp_tree_to_load_perm_map_t *perm_cache,
> > > -			 scalar_stmts_to_slp_tree_map_t * /* bst_map */)
> > > +			 scalar_stmts_to_slp_tree_map_t *bst_map)
> > >  {
> > >    DUMP_VECT_SCOPE ("vect_match_slp_patterns");
> > >    slp_tree *ref_node = &SLP_INSTANCE_TREE (instance);
> > > @@ -2291,6 +2400,8 @@ vect_match_slp_patterns (slp_instance instance, vec_info *vinfo,
> > >
> > >    if (found_p)
> > >      {
> > > +      optimize_load_redistribution (bst_map, *ref_node);
> > > +
> > >        if (dump_enabled_p ())
> > > 	 {
> > > 	   dump_printf_loc (MSG_NOTE, vect_location,
> > >
> > >
> > > --
> > >
> > >
> > > [ Part 2, Text/X-DIFF 140 lines. ]
> > > [ Unable to print this part. ]

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)