From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx0b-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com [148.163.158.5]) by sourceware.org (Postfix) with ESMTPS id C686A3858D35 for ; Tue, 23 May 2023 03:01:53 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org C686A3858D35 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=linux.ibm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=linux.ibm.com Received: from pps.filterd (m0353724.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 34N2lJlQ028776; Tue, 23 May 2023 03:01:51 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h=message-id : date : subject : to : cc : references : from : in-reply-to : content-type : content-transfer-encoding : mime-version; s=pp1; bh=g4Pgc8xczPcYbmVW6l7FzFA6ua8V1g7qHJvVXT9RiII=; b=pBFCnuKnIzZq6M/HKvpVXL/HAdV5+OKWXMShAyutbid/iz9t3840faqlg09GMCfxqsr7 lQ4y2bdx1SkUCU+BZkOCEXkJFG1fYq/ZSm9WpsQDXFU4727zCTQUCxt1QE2/Pr/Hq3Sc 4ndIfguixd6Kh4kqn9OQK5UFgkVx9dX8o8r+OTrkslIZR0aatr+rVMtOTfK/a13Djc6s 5sxxKOVrNEQDU0C/hgiamMYLoyONcTlSb4cC4htdNgFOti0LC93IN5vpibxpB+wxy/9a DJCwYCBUNFOahN/1VK4Po+9U+GYj+gRsDj3nveBC9GJpGRvHFMcm9O44G4Tu5lL005em +w== Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3qrm5a1nc9-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 23 May 2023 03:01:50 +0000 Received: from m0353724.ppops.net (m0353724.ppops.net [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 34N2kPje026283; Tue, 23 May 2023 03:01:50 GMT Received: from ppma02fra.de.ibm.com (47.49.7a9f.ip4.static.sl-reverse.com [159.122.73.71]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3qrm5a1nbu-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 23 May 2023 03:01:50 +0000 Received: from pps.filterd (ppma02fra.de.ibm.com [127.0.0.1]) by ppma02fra.de.ibm.com (8.17.1.19/8.17.1.19) with ESMTP id 34N1WiCn015874; Tue, 23 May 2023 03:01:48 GMT Received: from smtprelay07.fra02v.mail.ibm.com ([9.218.2.229]) by ppma02fra.de.ibm.com (PPS) with ESMTPS id 3qppcs91gc-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 23 May 2023 03:01:48 +0000 Received: from smtpav02.fra02v.mail.ibm.com (smtpav02.fra02v.mail.ibm.com [10.20.54.101]) by smtprelay07.fra02v.mail.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 34N31jRM53674422 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Tue, 23 May 2023 03:01:46 GMT Received: from smtpav02.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id DAC2520043; Tue, 23 May 2023 03:01:45 +0000 (GMT) Received: from smtpav02.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 8281720040; Tue, 23 May 2023 03:01:43 +0000 (GMT) Received: from [9.177.28.193] (unknown [9.177.28.193]) by smtpav02.fra02v.mail.ibm.com (Postfix) with ESMTP; Tue, 23 May 2023 03:01:43 +0000 (GMT) Message-ID: <06c5a418-ca90-b117-04b1-c3bef50ae28c@linux.ibm.com> Date: Tue, 23 May 2023 11:01:41 +0800 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:91.0) Gecko/20100101 Thunderbird/91.6.1 Subject: Re: [PATCH 2/2] vect: Enhance cost evaluation in vect_transform_slp_perm_load_1 Content-Language: en-US To: Richard Biener Cc: GCC Patches , Richard Sandiford , Segher Boessenkool , Peter Bergner References: <72a5c5db-bc06-eded-d229-82af34342515@linux.ibm.com> <71fda837-6a92-7f74-43e1-90b046919f6a@linux.ibm.com> From: "Kewen.Lin" In-Reply-To: Content-Type: text/plain; charset=UTF-8 X-TM-AS-GCONF: 00 X-Proofpoint-GUID: yIPCUn00sl4IfygdG1RXMQDtNWixECs6 X-Proofpoint-ORIG-GUID: du8RtVrQ0d-_0BlmYLxHjtthCA6MXuSA Content-Transfer-Encoding: 8bit X-Proofpoint-UnRewURL: 0 URL was un-rewritten MIME-Version: 1.0 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.254,Aquarius:18.0.957,Hydra:6.0.573,FMLib:17.11.176.26 definitions=2023-05-22_18,2023-05-22_03,2023-05-22_02 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 lowpriorityscore=0 clxscore=1015 bulkscore=0 impostorscore=0 suspectscore=0 adultscore=0 priorityscore=1501 mlxscore=0 spamscore=0 mlxlogscore=999 phishscore=0 malwarescore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2304280000 definitions=main-2305230019 X-Spam-Status: No, score=-11.5 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_EF,GIT_PATCH_0,KAM_NUMSUBJECT,KAM_SHORT,NICE_REPLY_A,RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: Hi Richi, Thanks for the review! on 2023/5/22 21:44, Richard Biener wrote: > On Wed, May 17, 2023 at 8:15 AM Kewen.Lin wrote: >> >> Hi, >> >> Following Richi's suggestion in [1], I'm working on deferring >> cost evaluation next to the transformation, this patch is >> to enhance function vect_transform_slp_perm_load_1 which >> could under-cost for vector permutation, since the costing >> doesn't try to consider nvectors_per_build, it's inconsistent >> with the transformation part. >> >> Bootstrapped and regtested on x86_64-redhat-linux, >> aarch64-linux-gnu and powerpc64{,le}-linux-gnu. >> >> Is it ok for trunk? >> >> [1] https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563624.html >> >> BR, >> Kewen >> ----- >> gcc/ChangeLog: >> >> * tree-vect-slp.cc (vect_transform_slp_perm_load_1): Adjust the >> calculation on n_perms by considering nvectors_per_build. >> >> gcc/testsuite/ChangeLog: >> >> * gcc.dg/vect/costmodel/ppc/costmodel-slp-perm.c: New test. >> --- >> .../vect/costmodel/ppc/costmodel-slp-perm.c | 23 +++++++ >> gcc/tree-vect-slp.cc | 66 ++++++++++--------- >> 2 files changed, 57 insertions(+), 32 deletions(-) >> create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-perm.c >> >> diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-perm.c b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-perm.c >> new file mode 100644 >> index 00000000000..e5c4dceddfb >> --- /dev/null >> +++ b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-perm.c >> @@ -0,0 +1,23 @@ >> +/* { dg-do compile } */ >> +/* { dg-require-effective-target vect_int } */ >> +/* { dg-require-effective-target powerpc_p9vector_ok } */ >> +/* Specify power9 to ensure the vectorization is profitable >> + and test point stands, otherwise it could be not profitable >> + to vectorize. */ >> +/* { dg-additional-options "-mdejagnu-cpu=power9 -mpower9-vector" } */ >> + >> +/* Verify we cost the exact count for required vec_perm. */ >> + >> +int x[1024], y[1024]; >> + >> +void >> +foo () >> +{ >> + for (int i = 0; i < 512; ++i) >> + { >> + x[2 * i] = y[1023 - (2 * i)]; >> + x[2 * i + 1] = y[1023 - (2 * i + 1)]; >> + } >> +} >> + >> +/* { dg-final { scan-tree-dump-times "2 times vec_perm" 1 "vect" } } */ >> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc >> index e5c9d7e766e..af9a6dd4fa9 100644 >> --- a/gcc/tree-vect-slp.cc >> +++ b/gcc/tree-vect-slp.cc >> @@ -8115,12 +8115,12 @@ vect_transform_slp_perm_load_1 (vec_info *vinfo, slp_tree node, >> >> mode = TYPE_MODE (vectype); >> poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype); >> + unsigned int nstmts = SLP_TREE_NUMBER_OF_VEC_STMTS (node); >> >> /* Initialize the vect stmts of NODE to properly insert the generated >> stmts later. */ >> if (! analyze_only) >> - for (unsigned i = SLP_TREE_VEC_STMTS (node).length (); >> - i < SLP_TREE_NUMBER_OF_VEC_STMTS (node); i++) >> + for (unsigned i = SLP_TREE_VEC_STMTS (node).length (); i < nstmts; i++) >> SLP_TREE_VEC_STMTS (node).quick_push (NULL); >> >> /* Generate permutation masks for every NODE. Number of masks for each NODE >> @@ -8161,7 +8161,10 @@ vect_transform_slp_perm_load_1 (vec_info *vinfo, slp_tree node, >> (b) the permutes only need a single vector input. */ >> mask.new_vector (nunits, group_size, 3); >> nelts_to_build = mask.encoded_nelts (); >> - nvectors_per_build = SLP_TREE_VEC_STMTS (node).length (); >> + /* It's possible to obtain zero nstmts during analyze_only, so make >> + it at least one to ensure the later computation for n_perms >> + proceed. */ >> + nvectors_per_build = nstmts > 0 ? nstmts : 1; >> in_nlanes = DR_GROUP_SIZE (stmt_info) * 3; >> } >> else >> @@ -8252,40 +8255,39 @@ vect_transform_slp_perm_load_1 (vec_info *vinfo, slp_tree node, >> return false; >> } >> >> - ++*n_perms; >> - >> + tree mask_vec = NULL_TREE; >> if (!analyze_only) >> - { >> - tree mask_vec = vect_gen_perm_mask_checked (vectype, indices); >> + mask_vec = vect_gen_perm_mask_checked (vectype, indices); >> >> - if (second_vec_index == -1) >> - second_vec_index = first_vec_index; >> + if (second_vec_index == -1) >> + second_vec_index = first_vec_index; >> >> - for (unsigned int ri = 0; ri < nvectors_per_build; ++ri) >> + for (unsigned int ri = 0; ri < nvectors_per_build; ++ri) >> + { >> + ++*n_perms; > > So the "real" change is doing > > *n_perms += nvectors_per_build; > > and *n_perms was unused when !analyze_only? And since at Yes, although both !analyze_only and analyze_only calls pass n_perms in, now only the call sites with analyze_only will use the returned n_perms further. > analysis time we (sometimes?) have zero nvectors you have to > fixup above? Which cases are that? Yes, the fixup is to avoid to result in unexpected n_perms in function vect_optimize_slp_pass::internal_node_cost。 One typical case is gcc.dg/vect/bb-slp-50.c, without special casing zero, slp2 fails to optimize out one more vec_perm unexpectedly. In vect_optimize_slp_pass::internal_node_cost, it checks if the returned n_perms is zero or not (vec_perm not needed or needed). if (!vect_transform_slp_perm_load_1 (m_vinfo, node, tmp_perm, vNULL, nullptr, vf, true, false, &n_perms)) { auto rep = SLP_TREE_REPRESENTATIVE (node); if (out_layout_i == 0) { /* Use the fallback cost if the load is an N-to-N permutation. Otherwise assume that the node will be rejected later and rebuilt from scalars. */ if (STMT_VINFO_GROUPED_ACCESS (rep) && (DR_GROUP_SIZE (DR_GROUP_FIRST_ELEMENT (rep)) == SLP_TREE_LANES (node))) return fallback_cost; return 0; } return -1; } /* See the comment above the corresponding VEC_PERM_EXPR handling. */ return n_perms == 0 ? 0 : 1; In vect_optimize_slp_pass::forward_pass (), it only considers the case that factor > 0 (there is some vec_perm needed). /* Accumulate the cost of using LAYOUT_I within NODE, both for the inputs and the outputs. */ int factor = internal_node_cost (vertex.node, layout_i, layout_i); if (factor < 0) { is_possible = false; break; } else if (factor) layout_costs.internal_cost.add_serial_cost ({ vertex.weight * factor, m_optimize_size }); BR, Kewen > > In principle the patch looks good to me. > > Richard. > >> + if (analyze_only) >> + continue; >> + /* Generate the permute statement if necessary. */ >> + tree first_vec = dr_chain[first_vec_index + ri]; >> + tree second_vec = dr_chain[second_vec_index + ri]; >> + gassign *stmt = as_a (stmt_info->stmt); >> + tree perm_dest >> + = vect_create_destination_var (gimple_assign_lhs (stmt), >> + vectype); >> + perm_dest = make_ssa_name (perm_dest); >> + gimple *perm_stmt >> + = gimple_build_assign (perm_dest, VEC_PERM_EXPR, first_vec, >> + second_vec, mask_vec); >> + vect_finish_stmt_generation (vinfo, stmt_info, perm_stmt, >> + gsi); >> + if (dce_chain) >> { >> - /* Generate the permute statement if necessary. */ >> - tree first_vec = dr_chain[first_vec_index + ri]; >> - tree second_vec = dr_chain[second_vec_index + ri]; >> - gassign *stmt = as_a (stmt_info->stmt); >> - tree perm_dest >> - = vect_create_destination_var (gimple_assign_lhs (stmt), >> - vectype); >> - perm_dest = make_ssa_name (perm_dest); >> - gimple *perm_stmt >> - = gimple_build_assign (perm_dest, VEC_PERM_EXPR, >> - first_vec, second_vec, mask_vec); >> - vect_finish_stmt_generation (vinfo, stmt_info, perm_stmt, >> - gsi); >> - if (dce_chain) >> - { >> - bitmap_set_bit (used_defs, first_vec_index + ri); >> - bitmap_set_bit (used_defs, second_vec_index + ri); >> - } >> - >> - /* Store the vector statement in NODE. */ >> - SLP_TREE_VEC_STMTS (node) [vect_stmts_counter++] >> - = perm_stmt; >> + bitmap_set_bit (used_defs, first_vec_index + ri); >> + bitmap_set_bit (used_defs, second_vec_index + ri); >> } >> + >> + /* Store the vector statement in NODE. */ >> + SLP_TREE_VEC_STMTS (node)[vect_stmts_counter++] = perm_stmt; >> } >> } >> else if (!analyze_only) >> -- >> 2.39.1