From mboxrd@z Thu Jan 1 00:00:00 1970
From: Kewen Lin
To: gcc-patches@gcc.gnu.org
Cc: richard.guenther@gmail.com, richard.sandiford@arm.com, segher@kernel.crashing.org, bergner@linux.ibm.com
Subject: [PATCH 4/9] vect: Adjust vectorizable_load costing on VMAT_ELEMENTWISE and VMAT_STRIDED_SLP
Date: Mon, 12 Jun 2023 21:03:25 -0500
Message-Id: <0281a2a022869efe379130aea6e0782e4827ef61.1686573640.git.linkw@linux.ibm.com>
X-Mailer: git-send-email 2.31.1
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

This patch adjusts the cost handling for VMAT_ELEMENTWISE and
VMAT_STRIDED_SLP in function vectorizable_load.  We no longer call
function vect_model_load_cost for them.

As PR82255 shows, we don't always need a vector construction there.
Moving the costing next to the transform lets us cost a vector
construction only when one is actually needed.  It also lets us count
the number of loads consistently for some cases.

	PR tree-optimization/82255

gcc/ChangeLog:

	* tree-vect-stmts.cc (vectorizable_load): Adjust the cost handling
	on VMAT_ELEMENTWISE and VMAT_STRIDED_SLP without calling
	vect_model_load_cost.
	(vect_model_load_cost): Assert it won't get VMAT_ELEMENTWISE and
	VMAT_STRIDED_SLP any more, and remove their related handling.

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/costmodel/ppc/costmodel-pr82255.c: New test.

2023-06-13 Bill Schmidt Kewen Lin
---
 .../vect/costmodel/ppc/costmodel-pr82255.c    |  31 ++++
 gcc/tree-vect-stmts.cc                        | 170 +++++++++++-------
 2 files changed, 134 insertions(+), 67 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-pr82255.c

diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-pr82255.c b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-pr82255.c
new file mode 100644
index 00000000000..9317ee2e15b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-pr82255.c
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_int } */
+
+/* PR82255: Ensure we don't require a vec_construct cost when we aren't
+   going to generate a strided load. */
+
+extern int abs (int __x) __attribute__ ((__nothrow__, __leaf__))
+__attribute__ ((__const__));
+
+static int
+foo (unsigned char *w, int i, unsigned char *x, int j)
+{
+  int tot = 0;
+  for (int a = 0; a < 16; a++)
+    {
+#pragma GCC unroll 16
+      for (int b = 0; b < 16; b++)
+        tot += abs (w[b] - x[b]);
+      w += i;
+      x += j;
+    }
+  return tot;
+}
+
+void
+bar (unsigned char *w, unsigned char *x, int i, int *result)
+{
+  *result = foo (w, 16, x, i);
+}
+
+/* { dg-final { scan-tree-dump-times "vec_construct" 0 "vect" } } */
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 19c61d703c8..651dc800380 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -1136,7 +1136,9 @@ vect_model_load_cost (vec_info *vinfo,
                       stmt_vector_for_cost *cost_vec)
 {
   gcc_assert ((memory_access_type != VMAT_GATHER_SCATTER || !gs_info->decl)
-              && memory_access_type != VMAT_INVARIANT);
+              && memory_access_type != VMAT_INVARIANT
+              && memory_access_type != VMAT_ELEMENTWISE
+              && memory_access_type != VMAT_STRIDED_SLP);
 
   unsigned int inside_cost = 0, prologue_cost = 0;
   bool grouped_access_p = STMT_VINFO_GROUPED_ACCESS (stmt_info);
@@ -1221,8 +1223,7 @@ vect_model_load_cost (vec_info *vinfo,
     }
 
   /* The loads themselves. */
-  if (memory_access_type == VMAT_ELEMENTWISE
-      || memory_access_type == VMAT_GATHER_SCATTER)
+  if (memory_access_type == VMAT_GATHER_SCATTER)
     {
       tree vectype = STMT_VINFO_VECTYPE (stmt_info);
       unsigned int assumed_nunits = vect_nunits_for_cost (vectype);
@@ -1244,10 +1245,10 @@ vect_model_load_cost (vec_info *vinfo,
                         alignment_support_scheme, misalignment, first_stmt_p,
                         &inside_cost, &prologue_cost,
                         cost_vec, cost_vec, true);
-  if (memory_access_type == VMAT_ELEMENTWISE
-      || memory_access_type == VMAT_STRIDED_SLP
-      || (memory_access_type == VMAT_GATHER_SCATTER
-          && gs_info->ifn == IFN_LAST && !gs_info->decl))
+
+  if (memory_access_type == VMAT_GATHER_SCATTER
+      && gs_info->ifn == IFN_LAST
+      && !gs_info->decl)
     inside_cost += record_stmt_cost (cost_vec, ncopies, vec_construct,
                                      stmt_info, 0, vect_body);
 
@@ -9591,14 +9592,6 @@ vectorizable_load (vec_info *vinfo,
   if (memory_access_type == VMAT_ELEMENTWISE
       || memory_access_type == VMAT_STRIDED_SLP)
     {
-      if (costing_p)
-        {
-          vect_model_load_cost (vinfo, stmt_info, ncopies, vf,
-                                memory_access_type, alignment_support_scheme,
-                                misalignment, &gs_info, slp_node, cost_vec);
-          return true;
-        }
-
       gimple_stmt_iterator incr_gsi;
       bool insert_after;
       tree offvar;
@@ -9610,6 +9603,7 @@ vectorizable_load (vec_info *vinfo,
       unsigned int const_nunits = nunits.to_constant ();
       unsigned HOST_WIDE_INT cst_offset = 0;
       tree dr_offset;
+      unsigned int inside_cost = 0;
 
       gcc_assert (!LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo));
       gcc_assert (!nested_in_vect_loop);
@@ -9624,6 +9618,7 @@ vectorizable_load (vec_info *vinfo,
           first_stmt_info = stmt_info;
           first_dr_info = dr_info;
         }
+
       if (slp && grouped_load)
         {
           group_size = DR_GROUP_SIZE (first_stmt_info);
@@ -9640,43 +9635,44 @@ vectorizable_load (vec_info *vinfo,
           ref_type = reference_alias_ptr_type (DR_REF (dr_info->dr));
         }
 
-      dr_offset = get_dr_vinfo_offset (vinfo, first_dr_info);
-      stride_base
-        = fold_build_pointer_plus
-            (DR_BASE_ADDRESS (first_dr_info->dr),
-             size_binop (PLUS_EXPR,
-                         convert_to_ptrofftype (dr_offset),
-                         convert_to_ptrofftype (DR_INIT (first_dr_info->dr))));
-      stride_step = fold_convert (sizetype, DR_STEP (first_dr_info->dr));
+      if (!costing_p)
+        {
+          dr_offset = get_dr_vinfo_offset (vinfo, first_dr_info);
+          stride_base = fold_build_pointer_plus (
+            DR_BASE_ADDRESS (first_dr_info->dr),
+            size_binop (PLUS_EXPR, convert_to_ptrofftype (dr_offset),
+                        convert_to_ptrofftype (DR_INIT (first_dr_info->dr))));
+          stride_step = fold_convert (sizetype, DR_STEP (first_dr_info->dr));
 
-      /* For a load with loop-invariant (but other than power-of-2)
-         stride (i.e. not a grouped access) like so:
+          /* For a load with loop-invariant (but other than power-of-2)
+             stride (i.e. not a grouped access) like so:
 
-           for (i = 0; i < n; i += stride)
-             ... = array[i];
+               for (i = 0; i < n; i += stride)
+                 ... = array[i];
 
-         we generate a new induction variable and new accesses to
-         form a new vector (or vectors, depending on ncopies):
+             we generate a new induction variable and new accesses to
+             form a new vector (or vectors, depending on ncopies):
 
-           for (j = 0; ; j += VF*stride)
-             tmp1 = array[j];
-             tmp2 = array[j + stride];
-             ...
-             vectemp = {tmp1, tmp2, ...}
-         */
+               for (j = 0; ; j += VF*stride)
+                 tmp1 = array[j];
+                 tmp2 = array[j + stride];
+                 ...
+                 vectemp = {tmp1, tmp2, ...}
+             */
 
-      ivstep = fold_build2 (MULT_EXPR, TREE_TYPE (stride_step), stride_step,
-                            build_int_cst (TREE_TYPE (stride_step), vf));
+          ivstep = fold_build2 (MULT_EXPR, TREE_TYPE (stride_step), stride_step,
+                                build_int_cst (TREE_TYPE (stride_step), vf));
 
-      standard_iv_increment_position (loop, &incr_gsi, &insert_after);
+          standard_iv_increment_position (loop, &incr_gsi, &insert_after);
 
-      stride_base = cse_and_gimplify_to_preheader (loop_vinfo, stride_base);
-      ivstep = cse_and_gimplify_to_preheader (loop_vinfo, ivstep);
-      create_iv (stride_base, PLUS_EXPR, ivstep, NULL,
-                 loop, &incr_gsi, insert_after,
-                 &offvar, NULL);
+          stride_base = cse_and_gimplify_to_preheader (loop_vinfo, stride_base);
+          ivstep = cse_and_gimplify_to_preheader (loop_vinfo, ivstep);
+          create_iv (stride_base, PLUS_EXPR, ivstep, NULL,
+                     loop, &incr_gsi, insert_after,
+                     &offvar, NULL);
 
-      stride_step = cse_and_gimplify_to_preheader (loop_vinfo, stride_step);
+          stride_step = cse_and_gimplify_to_preheader (loop_vinfo, stride_step);
+        }
 
       running_off = offvar;
       alias_off = build_int_cst (ref_type, 0);
@@ -9743,11 +9739,23 @@ vectorizable_load (vec_info *vinfo,
       unsigned int n_groups = 0;
       for (j = 0; j < ncopies; j++)
         {
-          if (nloads > 1)
+          if (nloads > 1 && !costing_p)
             vec_alloc (v, nloads);
           gimple *new_stmt = NULL;
           for (i = 0; i < nloads; i++)
             {
+              if (costing_p)
+                {
+                  if (VECTOR_TYPE_P (ltype))
+                    vect_get_load_cost (vinfo, stmt_info, 1,
+                                        alignment_support_scheme, misalignment,
+                                        false, &inside_cost, nullptr, cost_vec,
+                                        cost_vec, true);
+                  else
+                    inside_cost += record_stmt_cost (cost_vec, 1, scalar_load,
+                                                     stmt_info, 0, vect_body);
+                  continue;
+                }
               tree this_off = build_int_cst (TREE_TYPE (alias_off),
                                              group_el * elsz + cst_offset);
               tree data_ref = build2 (MEM_REF, ltype, running_off, this_off);
@@ -9778,42 +9786,70 @@ vectorizable_load (vec_info *vinfo,
                   group_el = 0;
                 }
             }
+
           if (nloads > 1)
             {
-              tree vec_inv = build_constructor (lvectype, v);
-              new_temp = vect_init_vector (vinfo, stmt_info,
-                                           vec_inv, lvectype, gsi);
-              new_stmt = SSA_NAME_DEF_STMT (new_temp);
-              if (lvectype != vectype)
+              if (costing_p)
+                inside_cost += record_stmt_cost (cost_vec, 1, vec_construct,
+                                                 stmt_info, 0, vect_body);
+              else
                 {
-                  new_stmt = gimple_build_assign (make_ssa_name (vectype),
-                                                  VIEW_CONVERT_EXPR,
-                                                  build1 (VIEW_CONVERT_EXPR,
-                                                          vectype, new_temp));
-                  vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi);
+                  tree vec_inv = build_constructor (lvectype, v);
+                  new_temp = vect_init_vector (vinfo, stmt_info, vec_inv,
+                                               lvectype, gsi);
+                  new_stmt = SSA_NAME_DEF_STMT (new_temp);
+                  if (lvectype != vectype)
+                    {
+                      new_stmt
+                        = gimple_build_assign (make_ssa_name (vectype),
+                                               VIEW_CONVERT_EXPR,
+                                               build1 (VIEW_CONVERT_EXPR,
+                                                       vectype, new_temp));
+                      vect_finish_stmt_generation (vinfo, stmt_info, new_stmt,
+                                                   gsi);
+                    }
                 }
             }
 
-          if (slp)
+          if (!costing_p)
             {
-              if (slp_perm)
-                dr_chain.quick_push (gimple_assign_lhs (new_stmt));
+              if (slp)
+                {
+                  if (slp_perm)
+                    dr_chain.quick_push (gimple_assign_lhs (new_stmt));
+                  else
+                    SLP_TREE_VEC_STMTS (slp_node).quick_push (new_stmt);
+                }
               else
-                SLP_TREE_VEC_STMTS (slp_node).quick_push (new_stmt);
-            }
-          else
-            {
-              if (j == 0)
-                *vec_stmt = new_stmt;
-              STMT_VINFO_VEC_STMTS (stmt_info).safe_push (new_stmt);
+                {
+                  if (j == 0)
+                    *vec_stmt = new_stmt;
+                  STMT_VINFO_VEC_STMTS (stmt_info).safe_push (new_stmt);
+                }
             }
         }
       if (slp_perm)
         {
           unsigned n_perms;
-          vect_transform_slp_perm_load (vinfo, slp_node, dr_chain, gsi, vf,
-                                        false, &n_perms);
+          if (costing_p)
+            {
+              unsigned n_loads;
+              vect_transform_slp_perm_load (vinfo, slp_node, vNULL, NULL, vf,
+                                            true, &n_perms, &n_loads);
+              inside_cost += record_stmt_cost (cost_vec, n_perms, vec_perm,
+                                               first_stmt_info, 0, vect_body);
+            }
+          else
+            vect_transform_slp_perm_load (vinfo, slp_node, dr_chain, gsi, vf,
+                                          false, &n_perms);
         }
+
+      if (costing_p && dump_enabled_p ())
+        dump_printf_loc (MSG_NOTE, vect_location,
+                         "vect_model_load_cost: inside_cost = %u, "
+                         "prologue_cost = 0 .\n",
+                         inside_cost);
+
       return true;
     }
 
-- 
2.31.1
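
For readers following the costing change, the per-copy accounting that
this patch moves into vectorizable_load can be sketched outside GCC as
below.  This is only an illustrative model and not GCC code: the names
load_cost_tally and tally_strided_load_cost are hypothetical, the tally
counts recorded entries rather than applying any target cost, and the
SLP permutation cost (vec_perm) is left out.  It assumes, as in the
hunks above, that each of the ncopies copies records nloads loads
(vector loads when ltype is a vector type, scalar loads otherwise) and
one vec_construct only when nloads > 1.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical tally of the cost entries recorded for one load statement.
   These are counts of entries, not target-specific costs.  */
struct load_cost_tally
{
  unsigned scalar_loads;
  unsigned vector_loads;
  unsigned vec_constructs;
};

/* Model of the costing loop for VMAT_ELEMENTWISE / VMAT_STRIDED_SLP:
   for every copy, record one load per piece, then a vec_construct only
   if more than one piece has to be assembled into a vector.  */
static struct load_cost_tally
tally_strided_load_cost (unsigned ncopies, unsigned nloads,
                         bool ltype_is_vector)
{
  struct load_cost_tally t = { 0, 0, 0 };

  /* Every copy loads at least one piece.  */
  assert (nloads >= 1);

  for (unsigned j = 0; j < ncopies; j++)
    {
      for (unsigned i = 0; i < nloads; i++)
        {
          if (ltype_is_vector)
            t.vector_loads++;
          else
            t.scalar_loads++;
        }

      /* A single piece already is the whole vector, so nothing needs
         to be constructed.  */
      if (nloads > 1)
        t.vec_constructs++;
    }

  return t;
}
```

In this model, tally_strided_load_cost (2, 1, true) records two vector
loads and no vec_construct, which appears to be the situation the new
PR82255 test is aimed at, while tally_strided_load_cost (2, 4, false)
records eight scalar loads and two vec_constructs.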