Date: Mon, 26 Jun 2023 12:17:28 +0000 (UTC)
From: Richard Biener
To: gcc-patches@gcc.gnu.org
Cc: richard.sandiford@arm.com
Subject: [PATCH] tree-optimization/110381 - preserve SLP permutation with in-order reductions

The following fixes a bug that manifests itself during the fold-left
reduction transform: we pick a scalar def other than the last one to
replace and thus double-count some elements.  But the underlying issue
is that we merge a load permutation into the in-order reduction, which
is of course wrong.  Now, reduction analysis has not yet been performed
when optimizing permutations, so we have to resort to checking that
ourselves.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

	PR tree-optimization/110381
	* tree-vect-slp.cc (vect_optimize_slp_pass::start_choosing_layouts):
	Materialize permutes before fold-left reductions.

	* gcc.dg/vect/pr110381.c: New testcase.
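To make the effect concrete, here is a minimal standalone sketch (not
taken from the patch; the function names are made up for illustration)
of why reassociating the in-order reduction over the testcase's values
gives the wrong answer: with round-to-nearest doubles, adding 5 to
__DBL_MAX__ rounds back to __DBL_MAX__, so the 5 is lost unless the
opposite-signed extremes cancel first.

/* Sketch only; assumes GCC builtins (__DBL_MAX__, __builtin_abort)
   and default (non--ffast-math) floating-point semantics.  */

double __attribute__((noipa))
fold_left_acb (double a, double b, double c)
{
  /* In-order evaluation in source order a, c, b:
     ((0 + __DBL_MAX__) + -__DBL_MAX__) + 5 == 5.  */
  double sum = 0;
  sum += a;
  sum += c;
  sum += b;
  return sum;
}

double __attribute__((noipa))
memory_order_abc (double a, double b, double c)
{
  /* Permuted into memory order a, b, c: __DBL_MAX__ + 5 rounds back to
     __DBL_MAX__, so the final result is 0, not 5.  */
  double sum = 0;
  sum += a;
  sum += b;
  sum += c;
  return sum;
}

int main()
{
  if (fold_left_acb (__DBL_MAX__, 5, -__DBL_MAX__) != 5)
    __builtin_abort ();
  if (memory_order_abc (__DBL_MAX__, 5, -__DBL_MAX__) != 0)
    __builtin_abort ();
  return 0;
}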
---
 gcc/testsuite/gcc.dg/vect/pr110381.c | 40 ++++++++++++++++++++++++++++
 gcc/tree-vect-slp.cc                 | 18 +++++++++++--
 2 files changed, 56 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr110381.c

diff --git a/gcc/testsuite/gcc.dg/vect/pr110381.c b/gcc/testsuite/gcc.dg/vect/pr110381.c
new file mode 100644
index 00000000000..2313dbf11ca
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr110381.c
@@ -0,0 +1,40 @@
+/* { dg-do run } */
+
+struct FOO {
+  double a;
+  double b;
+  double c;
+};
+
+double __attribute__((noipa))
+sum_8_foos(const struct FOO* foos)
+{
+  double sum = 0;
+
+  for (int i = 0; i < 8; ++i)
+    {
+      struct FOO foo = foos[i];
+
+      /* Need to use an in-order reduction here, preserving
+         the load permutation.  */
+      sum += foo.a;
+      sum += foo.c;
+      sum += foo.b;
+    }
+
+  return sum;
+}
+
+int main()
+{
+  struct FOO foos[8];
+
+  __builtin_memset (foos, 0, sizeof (foos));
+  foos[0].a = __DBL_MAX__;
+  foos[0].b = 5;
+  foos[0].c = -__DBL_MAX__;
+
+  if (sum_8_foos (foos) != 5)
+    __builtin_abort ();
+  return 0;
+}
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 4481d43e3d7..8cb1ac1f319 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -4682,14 +4682,28 @@ vect_optimize_slp_pass::start_choosing_layouts ()
   m_partition_layout_costs.safe_grow_cleared (m_partitions.length ()
                                               * m_perms.length ());
 
-  /* We have to mark outgoing permutations facing non-reduction graph
-     entries that are not represented as to be materialized.  */
+  /* We have to mark outgoing permutations facing non-associating-reduction
+     graph entries that are not represented as to be materialized.
+     slp_inst_kind_bb_reduc currently only covers associatable reductions.  */
   for (slp_instance instance : m_vinfo->slp_instances)
     if (SLP_INSTANCE_KIND (instance) == slp_inst_kind_ctor)
       {
        unsigned int node_i = SLP_INSTANCE_TREE (instance)->vertex;
        m_partitions[m_vertices[node_i].partition].layout = 0;
       }
+    else if (SLP_INSTANCE_KIND (instance) == slp_inst_kind_reduc_chain)
+      {
+       stmt_vec_info stmt_info
+         = SLP_TREE_REPRESENTATIVE (SLP_INSTANCE_TREE (instance));
+       stmt_vec_info reduc_info = info_for_reduction (m_vinfo, stmt_info);
+       if (needs_fold_left_reduction_p (TREE_TYPE
+                                          (gimple_get_lhs (stmt_info->stmt)),
+                                        STMT_VINFO_REDUC_CODE (reduc_info)))
+         {
+           unsigned int node_i = SLP_INSTANCE_TREE (instance)->vertex;
+           m_partitions[m_vertices[node_i].partition].layout = 0;
+         }
+      }
 
   /* Check which layouts each node and partition can handle.  Calculate the
      weights associated with inserting layout changes on edges.  */
-- 
2.35.3