From: Kewen Lin
To: gcc-patches@gcc.gnu.org
Cc: richard.guenther@gmail.com, richard.sandiford@arm.com
Subject: [PATCH 07/10] vect: Adjust vectorizable_store costing on VMAT_CONTIGUOUS_PERMUTE
Date: Wed, 13 Sep 2023 22:11:56 -0500
Message-Id: <03074b183ea6c016691e6174a331de1443bdf326.1694657494.git.linkw@linux.ibm.com>
X-Mailer: git-send-email 2.31.1

This patch adjusts the cost handling for VMAT_CONTIGUOUS_PERMUTE
in function vectorizable_store.  We no longer call function
vect_model_store_cost for it.  This is the interleaving store
case, so costing skips all stmts except first_stmt_info and
considers the whole group when costing first_stmt_info.  This
patch shouldn't introduce any functional changes.

gcc/ChangeLog:

        * tree-vect-stmts.cc (vect_model_store_cost): Assert it will never
        get VMAT_CONTIGUOUS_PERMUTE and remove VMAT_CONTIGUOUS_PERMUTE related
        handling.
        (vectorizable_store): Adjust the cost handling on
        VMAT_CONTIGUOUS_PERMUTE without calling vect_model_store_cost.
---
 gcc/tree-vect-stmts.cc | 128 ++++++++++++++++++++++++-----------------
 1 file changed, 74 insertions(+), 54 deletions(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index fbd16b8a487..e3ba8077091 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -967,10 +967,10 @@ vect_model_store_cost (vec_info *vinfo, stmt_vec_info stmt_info, int ncopies,
   gcc_assert (memory_access_type != VMAT_GATHER_SCATTER
               && memory_access_type != VMAT_ELEMENTWISE
               && memory_access_type != VMAT_STRIDED_SLP
-              && memory_access_type != VMAT_LOAD_STORE_LANES);
+              && memory_access_type != VMAT_LOAD_STORE_LANES
+              && memory_access_type != VMAT_CONTIGUOUS_PERMUTE);
+
   unsigned int inside_cost = 0, prologue_cost = 0;
-  stmt_vec_info first_stmt_info = stmt_info;
-  bool grouped_access_p = STMT_VINFO_GROUPED_ACCESS (stmt_info);
 
   /* ???  Somehow we need to fix this at the callers.  */
   if (slp_node)
@@ -983,35 +983,6 @@ vect_model_store_cost (vec_info *vinfo, stmt_vec_info stmt_info, int ncopies,
                                            stmt_info, 0, vect_prologue);
     }
 
-  /* Grouped stores update all elements in the group at once,
-     so we want the DR for the first statement.  */
-  if (!slp_node && grouped_access_p)
-    first_stmt_info = DR_GROUP_FIRST_ELEMENT (stmt_info);
-
-  /* True if we should include any once-per-group costs as well as
-     the cost of the statement itself.  For SLP we only get called
-     once per group anyhow.  */
-  bool first_stmt_p = (first_stmt_info == stmt_info);
-
-  /* We assume that the cost of a single store-lanes instruction is
-     equivalent to the cost of DR_GROUP_SIZE separate stores.  If a grouped
-     access is instead being provided by a permute-and-store operation,
-     include the cost of the permutes.  */
-  if (first_stmt_p
-      && memory_access_type == VMAT_CONTIGUOUS_PERMUTE)
-    {
-      /* Uses a high and low interleave or shuffle operations for each
-         needed permute.  */
-      int group_size = DR_GROUP_SIZE (first_stmt_info);
-      int nstmts = ncopies * ceil_log2 (group_size) * group_size;
-      inside_cost = record_stmt_cost (cost_vec, nstmts, vec_perm,
-                                      stmt_info, 0, vect_body);
-
-      if (dump_enabled_p ())
-        dump_printf_loc (MSG_NOTE, vect_location,
-                         "vect_model_store_cost: strided group_size = %d .\n",
-                         group_size);
-    }
 
   /* Costs of the stores.  */
   vect_get_store_cost (vinfo, stmt_info, ncopies, alignment_support_scheme,
@@ -8408,9 +8379,7 @@ vectorizable_store (vec_info *vinfo,
          costing, use the first one instead.  */
       if (grouped_store
           && !slp
-          && first_stmt_info != stmt_info
-          && (memory_access_type == VMAT_ELEMENTWISE
-              || memory_access_type == VMAT_LOAD_STORE_LANES))
+          && first_stmt_info != stmt_info)
         return true;
     }
   gcc_assert (memory_access_type == STMT_VINFO_MEMORY_ACCESS_TYPE (stmt_info));
@@ -9254,14 +9223,15 @@ vectorizable_store (vec_info *vinfo,
       return true;
     }
 
+  unsigned inside_cost = 0, prologue_cost = 0;
   auto_vec result_chain (group_size);
   auto_vec vec_oprnds;
   for (j = 0; j < ncopies; j++)
     {
       gimple *new_stmt;
-      if (j == 0 && !costing_p)
+      if (j == 0)
         {
-          if (slp)
+          if (slp && !costing_p)
             {
               /* Get vectorized arguments for SLP_NODE.  */
               vect_get_vec_defs (vinfo, stmt_info, slp_node, 1, op,
@@ -9287,13 +9257,20 @@ vectorizable_store (vec_info *vinfo,
                      that there is no interleaving, DR_GROUP_SIZE is 1,
                      and only one iteration of the loop will be executed.  */
                   op = vect_get_store_rhs (next_stmt_info);
-                  vect_get_vec_defs_for_operand (vinfo, next_stmt_info, ncopies,
-                                                 op, gvec_oprnds[i]);
-                  vec_oprnd = (*gvec_oprnds[i])[0];
-                  dr_chain.quick_push (vec_oprnd);
+                  if (costing_p
+                      && memory_access_type == VMAT_CONTIGUOUS_PERMUTE)
+                    update_prologue_cost (&prologue_cost, op);
+                  else if (!costing_p)
+                    {
+                      vect_get_vec_defs_for_operand (vinfo, next_stmt_info,
+                                                     ncopies, op,
+                                                     gvec_oprnds[i]);
+                      vec_oprnd = (*gvec_oprnds[i])[0];
+                      dr_chain.quick_push (vec_oprnd);
+                    }
                   next_stmt_info = DR_GROUP_NEXT_ELEMENT (next_stmt_info);
                 }
-              if (mask)
+              if (mask && !costing_p)
                 {
                   vect_get_vec_defs_for_operand (vinfo, stmt_info, ncopies,
                                                  mask, &vec_masks,
@@ -9303,11 +9280,13 @@ vectorizable_store (vec_info *vinfo,
             }
 
           /* We should have catched mismatched types earlier.  */
-          gcc_assert (useless_type_conversion_p (vectype,
-                                                 TREE_TYPE (vec_oprnd)));
+          gcc_assert (costing_p
+                      || useless_type_conversion_p (vectype,
+                                                    TREE_TYPE (vec_oprnd)));
           bool simd_lane_access_p
             = STMT_VINFO_SIMD_LANE_ACCESS_P (stmt_info) != 0;
-          if (simd_lane_access_p
+          if (!costing_p
+              && simd_lane_access_p
               && !loop_masks
               && TREE_CODE (DR_BASE_ADDRESS (first_dr_info->dr)) == ADDR_EXPR
               && VAR_P (TREE_OPERAND (DR_BASE_ADDRESS (first_dr_info->dr), 0))
@@ -9319,7 +9298,7 @@ vectorizable_store (vec_info *vinfo,
               dataref_ptr = unshare_expr (DR_BASE_ADDRESS (first_dr_info->dr));
               dataref_offset = build_int_cst (ref_type, 0);
             }
-          else
+          else if (!costing_p)
             dataref_ptr
               = vect_create_data_ref_ptr (vinfo, first_stmt_info, aggr_type,
                                           simd_lane_access_p ? loop : NULL,
@@ -9347,16 +9326,46 @@ vectorizable_store (vec_info *vinfo,
         }
 
       new_stmt = NULL;
-      if (!costing_p && grouped_store)
-        /* Permute.  */
-        vect_permute_store_chain (vinfo, dr_chain, group_size, stmt_info, gsi,
-                                  &result_chain);
+      if (grouped_store)
+        {
+          /* Permute.  */
+          gcc_assert (memory_access_type == VMAT_CONTIGUOUS_PERMUTE);
+          if (costing_p)
+            {
+              int group_size = DR_GROUP_SIZE (first_stmt_info);
+              int nstmts = ceil_log2 (group_size) * group_size;
+              inside_cost += record_stmt_cost (cost_vec, nstmts, vec_perm,
+                                               stmt_info, 0, vect_body);
+              if (dump_enabled_p ())
+                dump_printf_loc (MSG_NOTE, vect_location,
+                                 "vect_model_store_cost: "
+                                 "strided group_size = %d .\n",
+                                 group_size);
+            }
+          else
+            vect_permute_store_chain (vinfo, dr_chain, group_size, stmt_info,
+                                      gsi, &result_chain);
+        }
 
       stmt_vec_info next_stmt_info = first_stmt_info;
       for (i = 0; i < vec_num; i++)
         {
           if (costing_p)
-            continue;
+            {
+              if (memory_access_type == VMAT_CONTIGUOUS_PERMUTE)
+                vect_get_store_cost (vinfo, stmt_info, 1,
+                                     alignment_support_scheme, misalignment,
+                                     &inside_cost, cost_vec);
+
+              if (!slp)
+                {
+                  next_stmt_info = DR_GROUP_NEXT_ELEMENT (next_stmt_info);
+                  if (!next_stmt_info)
+                    break;
+                }
+
+              continue;
+            }
           unsigned misalign;
           unsigned HOST_WIDE_INT align;
 
@@ -9540,9 +9549,20 @@ vectorizable_store (vec_info *vinfo,
     }
 
   if (costing_p)
-    vect_model_store_cost (vinfo, stmt_info, ncopies, memory_access_type,
-                           alignment_support_scheme, misalignment, vls_type,
-                           slp_node, cost_vec);
+    {
+      if (memory_access_type == VMAT_CONTIGUOUS_PERMUTE)
+        {
+          if (dump_enabled_p ())
+            dump_printf_loc (MSG_NOTE, vect_location,
+                             "vect_model_store_cost: inside_cost = %d, "
+                             "prologue_cost = %d .\n",
+                             inside_cost, prologue_cost);
+        }
+      else
+        vect_model_store_cost (vinfo, stmt_info, ncopies, memory_access_type,
+                               alignment_support_scheme, misalignment, vls_type,
+                               slp_node, cost_vec);
+    }
 
   return true;
 }
-- 
2.31.1
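
Illustrative note for readers of the patch (not part of the change itself): the removed code charged ncopies * ceil_log2 (group_size) * group_size permutes in one step, while the new code adds ceil_log2 (group_size) * group_size once per iteration of the j loop. The standalone sketch below checks that the two shapes give the same permute count. The helper names (ceil_log2_sketch, permutes_all_at_once, permutes_per_copy) are hypothetical stand-ins written for this note, not GCC functions, and the sketch ignores record_stmt_cost and any target cost weighting.

```cpp
#include <cassert>

// Hypothetical stand-in for GCC's ceil_log2: the smallest n such that
// 2^n >= x.  Assumes x >= 1.
static int ceil_log2_sketch (unsigned x)
{
  assert (x >= 1);
  int n = 0;
  while ((1u << n) < x)
    ++n;
  return n;
}

// Shape of the removed code: count the permutes for all copies at once.
static int permutes_all_at_once (int ncopies, int group_size)
{
  return ncopies * ceil_log2_sketch (group_size) * group_size;
}

// Shape of the added code: accumulate one contribution per copy,
// mirroring "inside_cost += ..." inside the loop over j.
static int permutes_per_copy (int ncopies, int group_size)
{
  int total = 0;
  for (int j = 0; j < ncopies; j++)
    total += ceil_log2_sketch (group_size) * group_size;
  return total;
}
```

For example, with ncopies = 2 and group_size = 4, both functions count 2 * 2 * 4 = 16 permutes, which is consistent with the statement that the patch shouldn't introduce functional changes in the permute count.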