Date: Tue, 11 Oct 2022 11:15:24 +0000 (UTC)
From: Richard Biener
To: gcc-patches@gcc.gnu.org
Subject: [PATCH] tree-optimization/107212 - SLP reduction of reduction paths
Message-ID: <20221011111524.503kKxOqbsOFt85Jm7iykLokiEcLds84bEmSqhPuHW8@z>

The following fixes an issue with how we handle epilogue generation
for SLP reductions of reduction paths where the actual live lanes
are not "canonical".  We need to make sure to identify all live
lanes as reductions and thus have to iterate over all participating
SLP lanes when walking the reduction SSA use-def chain.

Also, the previous attempt in vectorizable_live_operation, likely
meant to mitigate this issue, is misguided and has to be removed.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed to trunk
so far.

        PR tree-optimization/107212
        * tree-vect-loop.cc (vectorizable_reduction): Make sure to
        set STMT_VINFO_REDUC_DEF for all live lanes in a SLP
        reduction.
        (vectorizable_live_operation): Do not pun to the SLP
        node representative for reduction epilogue generation.

        * gcc.dg/vect/pr107212-1.c: New testcase.
        * gcc.dg/vect/pr107212-2.c: Likewise.
---
 gcc/testsuite/gcc.dg/vect/pr107212-1.c | 27 ++++++++++++++++++++++++++
 gcc/testsuite/gcc.dg/vect/pr107212-2.c | 23 ++++++++++++++++++++++
 gcc/tree-vect-loop.cc                  | 20 ++++++++++++-------
 3 files changed, 63 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr107212-1.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr107212-2.c

diff --git a/gcc/testsuite/gcc.dg/vect/pr107212-1.c b/gcc/testsuite/gcc.dg/vect/pr107212-1.c
new file mode 100644
index 00000000000..5343f9b6b23
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr107212-1.c
@@ -0,0 +1,27 @@
+/* { dg-do run } */
+
+#include "tree-vect.h"
+
+int main()
+{
+  check_vect ();
+
+  unsigned int tab[6][2] = { {69, 73}, {36, 40}, {24, 16},
+                             {16, 11}, {4, 5}, {3, 1} };
+
+  int sum_0 = 0;
+  int sum_1 = 0;
+
+  for(int t=0; t<6; t++) {
+    sum_0 += tab[t][0];
+    sum_1 += tab[t][1];
+  }
+
+  int x1 = (sum_0 < 100);
+  int x2 = (sum_0 > 200);
+
+  if (x1 || x2 || sum_1 != 146)
+    __builtin_abort ();
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/pr107212-2.c b/gcc/testsuite/gcc.dg/vect/pr107212-2.c
new file mode 100644
index 00000000000..109c2b991a6
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr107212-2.c
@@ -0,0 +1,23 @@
+/* { dg-do run } */
+
+#include "tree-vect.h"
+
+int sum_1 = 0;
+
+int main()
+{
+  check_vect ();
+
+  unsigned int tab[6][2] = {{150, 0}, {0, 0}, {0, 0}, {0, 0}, {0, 0}, {0, 0}};
+
+  int sum_0 = 0;
+
+  for (int t = 0; t < 6; t++) {
+    sum_0 += tab[t][0];
+    sum_1 += tab[t][0];
+  }
+
+  if (sum_0 < 100 || sum_0 > 200)
+    __builtin_abort();
+  return 0;
+}
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 2536cc3cf49..1996ecfee7a 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -6822,10 +6822,20 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
         }
       if (!REDUC_GROUP_FIRST_ELEMENT (vdef))
         only_slp_reduc_chain = false;
-      /* ???  For epilogue generation live members of the chain need
+      /* For epilogue generation live members of the chain need
          to point back to the PHI via their original stmt for
-         info_for_reduction to work.  */
-      if (STMT_VINFO_LIVE_P (vdef))
+         info_for_reduction to work.  For SLP we need to look at
+         all lanes here - even though we only will vectorize from
+         the SLP node with live lane zero the other live lanes also
+         need to be identified as part of a reduction to be able
+         to skip code generation for them.  */
+      if (slp_for_stmt_info)
+        {
+          for (auto s : SLP_TREE_SCALAR_STMTS (slp_for_stmt_info))
+            if (STMT_VINFO_LIVE_P (s))
+              STMT_VINFO_REDUC_DEF (vect_orig_stmt (s)) = phi_info;
+        }
+      else if (STMT_VINFO_LIVE_P (vdef))
         STMT_VINFO_REDUC_DEF (def) = phi_info;
       gimple_match_op op;
       if (!gimple_extract_op (vdef->stmt, &op))
@@ -9601,10 +9611,6 @@ vectorizable_live_operation (vec_info *vinfo,
              all involved stmts together.  */
           else if (slp_index != 0)
             return true;
-          else
-            /* For SLP reductions the meta-info is attached to
-               the representative.  */
-            stmt_info = SLP_TREE_REPRESENTATIVE (slp_node);
         }
       stmt_vec_info reduc_info = info_for_reduction (loop_vinfo, stmt_info);
       gcc_assert (reduc_info->is_reduc_info);
-- 
2.35.3
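
As a rough illustration of the vectorizable_reduction hunk above, the
difference between marking only the statement on the walked use-def
chain and marking every live lane of the SLP node can be modelled in a
few lines of plain C.  This is only a sketch under stated assumptions:
struct toy_stmt and the three helper functions are hypothetical
stand-ins invented for this note, not GCC's stmt_vec_info or any
vectorizer API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a scalar stmt in an SLP node.  The fields
   only loosely model STMT_VINFO_LIVE_P and STMT_VINFO_REDUC_DEF.  */
struct toy_stmt
{
  bool live;
  const struct toy_stmt *reduc_def;
};

/* Model of the old logic: only the stmt on the walked chain is linked
   back to the PHI, and only if that stmt itself is live.  */
static void
mark_walked_only (struct toy_stmt *walked, const struct toy_stmt *phi)
{
  assert (phi != NULL);
  if (walked->live)
    walked->reduc_def = phi;
}

/* Model of the new logic: every live lane of the SLP node is linked
   back to the PHI, whichever lane the walk follows.  */
static void
mark_all_live_lanes (struct toy_stmt *lanes, size_t n,
                     const struct toy_stmt *phi)
{
  assert (phi != NULL);
  for (size_t i = 0; i < n; i++)
    if (lanes[i].live)
      lanes[i].reduc_def = phi;
}

/* Count live lanes that were not identified as part of a reduction.  */
static size_t
count_unmarked_live (const struct toy_stmt *lanes, size_t n)
{
  size_t count = 0;
  for (size_t i = 0; i < n; i++)
    if (lanes[i].live && lanes[i].reduc_def == NULL)
      count++;
  return count;
}
```

In this model, take two lanes where only lane 1 is live and the walk
follows lane 0.  mark_walked_only then leaves one live lane unmarked,
while mark_all_live_lanes leaves none, which corresponds to the
"non-canonical" live lane situation the patch description refers to.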