From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx2.suse.de (mx2.suse.de [195.135.220.15]) by sourceware.org (Postfix) with ESMTPS id 979F13846034 for ; Wed, 4 Nov 2020 13:00:22 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 979F13846034 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=suse.de Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=rguenther@suse.de X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 4B4F4AF49; Wed, 4 Nov 2020 13:00:21 +0000 (UTC) Date: Wed, 4 Nov 2020 14:00:21 +0100 (CET) From: Richard Biener Sender: rguenther@c653.arch.suse.de To: Tamar Christina cc: gcc-patches@gcc.gnu.org, nd@arm.com, ook@ucw.cz Subject: Re: [PATCH v2 9/18]middle-end optimize slp simplify back to back permutes. In-Reply-To: Message-ID: References: User-Agent: Alpine 2.21 (LSU 202 2017-01-01) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII X-Spam-Status: No, score=-5.2 required=5.0 tests=BAYES_00, KAM_DMARC_STATUS, RCVD_IN_MSPIKE_H3, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 04 Nov 2020 13:00:24 -0000 On Tue, 3 Nov 2020, Tamar Christina wrote: > Hi All, > > This optimizes sequential permutes. i.e. if there are two permutes back to back > this function applies the permute of the parent to the child and removed the > parent. > > If the resulting permute in the child is now a no-op. Then the child is also > dropped from the graph and the parent's parent attached to the child's child. > > This relies on the materialization point calculation in optimize SLP. > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. > Tests are included as part of the final patch as they need the SLP pattern > matcher to insert permutes in between. > > This allows us to remove useless permutes such as > > ldr q0, [x0, x3] > ldr q2, [x1, x3] > trn1 v1.4s, v0.4s, v0.4s > trn2 v0.4s, v0.4s, v0.4s > trn1 v0.4s, v1.4s, v0.4s > mov v1.16b, v3.16b > fcmla v1.4s, v0.4s, v2.4s, #0 > fcmla v1.4s, v0.4s, v2.4s, #90 > str q1, [x2, x3] > > from the sequence the vectorizer puts out and give > > ldr q0, [x0, x3] > ldr q2, [x1, x3] > mov v1.16b, v3.16b > fcmla v1.4s, v0.4s, v2.4s, #0 > fcmla v1.4s, v0.4s, v2.4s, #90 > str q1, [x2, x3] > > instead > > Ok for master? + /* If the remaining permute is a no-op then we can just drop the + node instead of materializing it. */ + if (vect_slp_tree_permute_noop_p (node)) + { + if (dump_enabled_p ()) + dump_printf_loc (MSG_NOTE, vect_location, + "removing unneeded permute node %p\n", + node); + + unsigned idx = SLP_TREE_LANE_PERMUTATION (node)[0].first; + slp_tree value = SLP_TREE_CHILDREN (node)[idx]; + unsigned src = slpg->vertices[node->vertex].pred->src; + slp_tree prev = vertices[src]; + unsigned dest; + slp_tree tmp; + FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (prev), dest, tmp) + if (tmp == node) + { + SLP_TREE_CHILDREN (prev)[dest] = value; + break; + } so I don't think this will work reliably since we do not update the graph when inserting permute nodes and thus the "parent" can refer to a permute rather than the original node now (we're just walking over all vertices in no specific order during materialization - guess using IPO might fix this apart from in cycles). You would also need to iterate over preds here (pred_next). I guess removing no-op permutes is only important for costing? They should not cause any actual code generation? You also need to adjust reference counts when you change SLP_TREE_CHILDREN (prev)[dest], first add to that of VALUE and then slp_tree_free node itself (which might be tricky at this point). +static bool +vect_slp_tree_permute_noop_p (slp_tree node) +{ + gcc_assert (SLP_TREE_CODE (node) == VEC_PERM_EXPR); + + if (!SLP_TREE_LANE_PERMUTATION (node).exists ()) + return true; + + unsigned x, seed; + lane_permutation_t perms = SLP_TREE_LANE_PERMUTATION (node); + seed = perms[0].second; + for (x = 1; x < perms.length (); x++) + if (perms[x].first != perms[0].first || perms[x].second != ++seed) + return false; 'seed' needs to be zero to be a noop permute and SLP_TREE_LANES (SLP_TREE_CHILDREN (node)[perms[0].first]) needs to be the same as SLP_TREE_LANES (node). Otherwise you'll make permutes that select parts of a vector no-op. Maybe simplify the patch and do the vect_slp_tree_permute_noop_p check in vectorizable_slp_permutation instead? The permute node adjustment part is OK, thus up to + else if (SLP_TREE_LANE_PERMUTATION (node).exists ()) + { + /* If the node if already a permute node we just need to apply + the permutation to the permute node itself. */ + if (dump_enabled_p ()) + dump_printf_loc (MSG_NOTE, vect_location, + "simplifying permute node %p\n", + node); + + vect_slp_permute (perms[perm], SLP_TREE_LANE_PERMUTATION (node), + true); in case you want to split up, independently of the rest of the patches. Thanks, Richard. > Thanks, > Tamar > > gcc/ChangeLog: > > * tree-vect-slp.c (vect_slp_tree_permute_noop_p): New. > (vect_optimize_slp): Optimize permutes. > (vectorizable_slp_permutation): Fix typo. > > -- Richard Biener SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany; GF: Felix Imend