From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx0b-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com [148.163.158.5]) by sourceware.org (Postfix) with ESMTPS id 8BBC8385C6E8 for ; Tue, 13 Jun 2023 02:07:33 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 8BBC8385C6E8 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=linux.ibm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=linux.ibm.com Received: from pps.filterd (m0353722.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 35D27QHs012364; Tue, 13 Jun 2023 02:07:31 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding; s=pp1; bh=ImoyiwXX94q+FFMQvdFVUKR8M60ZthY2akheBRg/UW4=; b=tYw3GykVWNPhCJhKERhvPVa6qgmxcyKgnt5OMwND8K3qI/Y5JRn7E1kPr08w/GIUrnoS ehII0poytbiwu94nzGXXsMsFTLos3SeY7LLdYf7+c3NoAmMLRqVFwaXIQXu3nR0ViIzw E2jb6X8t+1lmVgggiPE+qwDOKA+J54f/yJjR9xlavhkdTCp8gJFwxD3rHqyjeZXochK4 P1R/YFcoDO6tW97hg1dsQ6lhjinU+jKYo7hq8Y3HRedIBFDwzIO4bjL9v19iWBa4r+jh sxCymyg+LTo+4c2k2AboA3RXM/rjBUaQ0IrWVB9Cb75FCKV59Xh6I1QdcF+Jk1PmFJQ2 zQ== Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3r6f3era6r-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 13 Jun 2023 02:07:30 +0000 Received: from m0353722.ppops.net (m0353722.ppops.net [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 35D27TVU012849; Tue, 13 Jun 2023 02:07:29 GMT Received: from ppma03ams.nl.ibm.com (62.31.33a9.ip4.static.sl-reverse.com [169.51.49.98]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3r6f3era21-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 13 Jun 2023 02:07:28 +0000 Received: from pps.filterd (ppma03ams.nl.ibm.com [127.0.0.1]) by ppma03ams.nl.ibm.com (8.17.1.19/8.17.1.19) with ESMTP id 35D0J485032234; Tue, 13 Jun 2023 02:03:52 GMT Received: from smtprelay02.fra02v.mail.ibm.com ([9.218.2.226]) by ppma03ams.nl.ibm.com (PPS) with ESMTPS id 3r4gt51u1c-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 13 Jun 2023 02:03:52 +0000 Received: from smtpav04.fra02v.mail.ibm.com (smtpav04.fra02v.mail.ibm.com [10.20.54.103]) by smtprelay02.fra02v.mail.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 35D23oxv27197988 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Tue, 13 Jun 2023 02:03:50 GMT Received: from smtpav04.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 5314C20043; Tue, 13 Jun 2023 02:03:50 +0000 (GMT) Received: from smtpav04.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 64FF720040; Tue, 13 Jun 2023 02:03:49 +0000 (GMT) Received: from trout.aus.stglabs.ibm.com (unknown [9.40.194.100]) by smtpav04.fra02v.mail.ibm.com (Postfix) with ESMTP; Tue, 13 Jun 2023 02:03:49 +0000 (GMT) From: Kewen Lin To: gcc-patches@gcc.gnu.org Cc: richard.guenther@gmail.com, richard.sandiford@arm.com, segher@kernel.crashing.org, bergner@linux.ibm.com Subject: [PATCH 7/9] vect: Adjust vectorizable_load costing on VMAT_CONTIGUOUS_REVERSE Date: Mon, 12 Jun 2023 21:03:28 -0500 Message-Id: X-Mailer: git-send-email 2.31.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-TM-AS-GCONF: 00 X-Proofpoint-GUID: LTRSoTFJCf6CNUvPZkwnE0L90vWKn63m X-Proofpoint-ORIG-GUID: CXPBCiOfpGBWTpc4KvoEfczgtoeTXfEf X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.254,Aquarius:18.0.957,Hydra:6.0.573,FMLib:17.11.176.26 definitions=2023-06-12_18,2023-06-12_02,2023-05-22_02 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 priorityscore=1501 mlxscore=0 lowpriorityscore=0 spamscore=0 malwarescore=0 adultscore=0 suspectscore=0 mlxlogscore=999 clxscore=1015 impostorscore=0 bulkscore=0 phishscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2305260000 definitions=main-2306130016 X-Spam-Status: No, score=-12.3 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_EF,GIT_PATCH_0,RCVD_IN_MSPIKE_H5,RCVD_IN_MSPIKE_WL,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: This patch adjusts the cost handling on VMAT_CONTIGUOUS_REVERSE in function vectorizable_load. We don't call function vect_model_load_cost for it any more. This change makes us not miscount some required vector permutation as the associated test case shows. gcc/ChangeLog: * tree-vect-stmts.cc (vect_model_load_cost): Assert it won't get VMAT_CONTIGUOUS_REVERSE any more. (vectorizable_load): Adjust the costing handling on VMAT_CONTIGUOUS_REVERSE without calling vect_model_load_cost. gcc/testsuite/ChangeLog: * gcc.dg/vect/costmodel/ppc/costmodel-vect-reversed.c: New test. --- .../costmodel/ppc/costmodel-vect-reversed.c | 22 ++++ gcc/tree-vect-stmts.cc | 109 ++++++++++++------ 2 files changed, 93 insertions(+), 38 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-reversed.c diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-reversed.c b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-reversed.c new file mode 100644 index 00000000000..651274be038 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-reversed.c @@ -0,0 +1,22 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_int } */ +/* { dg-require-effective-target powerpc_vsx_ok } */ +/* { dg-additional-options "-mvsx" } */ + +/* Verify we do cost the required vec_perm. */ + +int x[1024], y[1024]; + +void +foo () +{ + for (int i = 0; i < 512; ++i) + { + x[2 * i] = y[1023 - (2 * i)]; + x[2 * i + 1] = y[1023 - (2 * i + 1)]; + } +} +/* The reason why it doesn't check the exact count is that + retrying for the epilogue with partial vector capability + like Power10 can result in more than 1 vec_perm. */ +/* { dg-final { scan-tree-dump {\mvec_perm\M} "vect" } } */ diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc index 4c5ce2ab278..7f8d9db5363 100644 --- a/gcc/tree-vect-stmts.cc +++ b/gcc/tree-vect-stmts.cc @@ -1134,11 +1134,8 @@ vect_model_load_cost (vec_info *vinfo, slp_tree slp_node, stmt_vector_for_cost *cost_vec) { - gcc_assert (memory_access_type != VMAT_GATHER_SCATTER - && memory_access_type != VMAT_INVARIANT - && memory_access_type != VMAT_ELEMENTWISE - && memory_access_type != VMAT_STRIDED_SLP - && memory_access_type != VMAT_LOAD_STORE_LANES); + gcc_assert (memory_access_type == VMAT_CONTIGUOUS + || memory_access_type == VMAT_CONTIGUOUS_PERMUTE); unsigned int inside_cost = 0, prologue_cost = 0; bool grouped_access_p = STMT_VINFO_GROUPED_ACCESS (stmt_info); @@ -10292,7 +10289,7 @@ vectorizable_load (vec_info *vinfo, = record_stmt_cost (cost_vec, cnunits, scalar_load, stmt_info, 0, vect_body); - goto vec_num_loop_costing_end; + break; } if (STMT_VINFO_GATHER_SCATTER_P (stmt_info)) vec_offset = vec_offsets[vec_num * j + i]; @@ -10335,7 +10332,7 @@ vectorizable_load (vec_info *vinfo, inside_cost = record_stmt_cost (cost_vec, 1, vec_construct, stmt_info, 0, vect_body); - goto vec_num_loop_costing_end; + break; } unsigned HOST_WIDE_INT const_offset_nunits = TYPE_VECTOR_SUBPARTS (gs_info.offset_vectype) @@ -10390,7 +10387,7 @@ vectorizable_load (vec_info *vinfo, } if (costing_p) - goto vec_num_loop_costing_end; + break; align = known_alignment (DR_TARGET_ALIGNMENT (first_dr_info)); @@ -10563,7 +10560,7 @@ vectorizable_load (vec_info *vinfo, case dr_explicit_realign: { if (costing_p) - goto vec_num_loop_costing_end; + break; tree ptr, bump; tree vs = size_int (TYPE_VECTOR_SUBPARTS (vectype)); @@ -10627,7 +10624,7 @@ vectorizable_load (vec_info *vinfo, case dr_explicit_realign_optimized: { if (costing_p) - goto vec_num_loop_costing_end; + break; if (TREE_CODE (dataref_ptr) == SSA_NAME) new_temp = copy_ssa_name (dataref_ptr); else @@ -10650,22 +10647,37 @@ vectorizable_load (vec_info *vinfo, default: gcc_unreachable (); } - vec_dest = vect_create_destination_var (scalar_dest, vectype); - /* DATA_REF is null if we've already built the statement. */ - if (data_ref) + + /* One common place to cost the above vect load for different + alignment support schemes. */ + if (costing_p) { - vect_copy_ref_info (data_ref, DR_REF (first_dr_info->dr)); - new_stmt = gimple_build_assign (vec_dest, data_ref); + if (memory_access_type == VMAT_CONTIGUOUS_REVERSE) + vect_get_load_cost (vinfo, stmt_info, 1, + alignment_support_scheme, misalignment, + false, &inside_cost, &prologue_cost, + cost_vec, cost_vec, true); + } + else + { + vec_dest = vect_create_destination_var (scalar_dest, vectype); + /* DATA_REF is null if we've already built the statement. */ + if (data_ref) + { + vect_copy_ref_info (data_ref, DR_REF (first_dr_info->dr)); + new_stmt = gimple_build_assign (vec_dest, data_ref); + } + new_temp = make_ssa_name (vec_dest, new_stmt); + gimple_set_lhs (new_stmt, new_temp); + vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi); } - new_temp = make_ssa_name (vec_dest, new_stmt); - gimple_set_lhs (new_stmt, new_temp); - vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi); /* 3. Handle explicit realignment if necessary/supported. Create in loop: vec_dest = realign_load (msq, lsq, realignment_token) */ - if (alignment_support_scheme == dr_explicit_realign_optimized - || alignment_support_scheme == dr_explicit_realign) + if (!costing_p + && (alignment_support_scheme == dr_explicit_realign_optimized + || alignment_support_scheme == dr_explicit_realign)) { lsq = gimple_assign_lhs (new_stmt); if (!realignment_token) @@ -10690,26 +10702,34 @@ vectorizable_load (vec_info *vinfo, if (memory_access_type == VMAT_CONTIGUOUS_REVERSE) { - tree perm_mask = perm_mask_for_reverse (vectype); - new_temp = permute_vec_elements (vinfo, new_temp, new_temp, - perm_mask, stmt_info, gsi); - new_stmt = SSA_NAME_DEF_STMT (new_temp); + if (costing_p) + inside_cost = record_stmt_cost (cost_vec, 1, vec_perm, + stmt_info, 0, vect_body); + else + { + tree perm_mask = perm_mask_for_reverse (vectype); + new_temp + = permute_vec_elements (vinfo, new_temp, new_temp, + perm_mask, stmt_info, gsi); + new_stmt = SSA_NAME_DEF_STMT (new_temp); + } } /* Collect vector loads and later create their permutation in vect_transform_grouped_load (). */ - if (grouped_load || slp_perm) + if (!costing_p && (grouped_load || slp_perm)) dr_chain.quick_push (new_temp); /* Store vector loads in the corresponding SLP_NODE. */ - if (slp && !slp_perm) + if (!costing_p && slp && !slp_perm) SLP_TREE_VEC_STMTS (slp_node).quick_push (new_stmt); /* With SLP permutation we load the gaps as well, without we need to skip the gaps after we manage to fully load all elements. group_gap_adj is DR_GROUP_SIZE here. */ group_elt += nunits; - if (maybe_ne (group_gap_adj, 0U) + if (!costing_p + && maybe_ne (group_gap_adj, 0U) && !slp_perm && known_eq (group_elt, group_size - group_gap_adj)) { @@ -10724,8 +10744,6 @@ vectorizable_load (vec_info *vinfo, gsi, stmt_info, bump); group_elt = 0; } -vec_num_loop_costing_end: - ; } /* Bump the vector pointer to account for a gap or for excess elements loaded for a permuted SLP load. */ @@ -10748,18 +10766,30 @@ vec_num_loop_costing_end: if (slp && !slp_perm) continue; - if (slp_perm && !costing_p) - { + if (slp_perm) + { unsigned n_perms; /* For SLP we know we've seen all possible uses of dr_chain so direct vect_transform_slp_perm_load to DCE the unused parts. ??? This is a hack to prevent compile-time issues as seen in PR101120 and friends. */ - bool ok = vect_transform_slp_perm_load (vinfo, slp_node, dr_chain, - gsi, vf, false, &n_perms, - nullptr, true); - gcc_assert (ok); - } + if (costing_p + && memory_access_type != VMAT_CONTIGUOUS + && memory_access_type != VMAT_CONTIGUOUS_PERMUTE) + { + vect_transform_slp_perm_load (vinfo, slp_node, vNULL, nullptr, vf, + true, &n_perms, nullptr); + inside_cost = record_stmt_cost (cost_vec, n_perms, vec_perm, + stmt_info, 0, vect_body); + } + else if (!costing_p) + { + bool ok = vect_transform_slp_perm_load (vinfo, slp_node, dr_chain, + gsi, vf, false, &n_perms, + nullptr, true); + gcc_assert (ok); + } + } else if (!costing_p) { if (grouped_load) @@ -10781,8 +10811,11 @@ vec_num_loop_costing_end: if (costing_p) { - if (memory_access_type == VMAT_GATHER_SCATTER - || memory_access_type == VMAT_LOAD_STORE_LANES) + gcc_assert (memory_access_type != VMAT_INVARIANT + && memory_access_type != VMAT_ELEMENTWISE + && memory_access_type != VMAT_STRIDED_SLP); + if (memory_access_type != VMAT_CONTIGUOUS + && memory_access_type != VMAT_CONTIGUOUS_PERMUTE) { if (dump_enabled_p ()) dump_printf_loc (MSG_NOTE, vect_location, -- 2.31.1