From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx0b-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com [148.163.158.5]) by sourceware.org (Postfix) with ESMTPS id DFC613858D37 for ; Wed, 24 May 2023 05:23:52 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org DFC613858D37 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=linux.ibm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=linux.ibm.com Received: from pps.filterd (m0353725.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 34O554d8011271; Wed, 24 May 2023 05:23:50 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h=message-id : date : subject : to : cc : references : from : in-reply-to : content-type : content-transfer-encoding : mime-version; s=pp1; bh=HIMFyfAWf58CGOgn1VSpx1GegbWs1lV/7oDURoWZVoE=; b=QLt/DZfQibOFH8ZtmfaT2o31uAe7viNWP7d34TasfxgYWGBc5V4ZoCLqWqArNDJicurR I3H/srpTPC/YtWrqGlVGSMD0f1SjZFGbEsgQsmikQzN/XGw8Md7HW08hrV0pw+xBFpZh Fze6Bqb92tS0F/rdkhk04DSyi6CaVfGtHR67ar5JY4ZL15c1a6Xq/Pme3bnNP0SIBSn+ 2bN3nosMc71KQQJS74Jb+ih9ZfHPAkEliGH41m2fctAQcP0DnR6iTCbRLN1mMDSTKFgJ y140gBjTNXxAEXCwAHYcmG8Rr6vigPc/8DWN3x7biDEvG1ug6EiMgOmp2QgA3YudINRb 4g== Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3qsbfys22s-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Wed, 24 May 2023 05:23:50 +0000 Received: from m0353725.ppops.net (m0353725.ppops.net [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 34O59mMq022547; Wed, 24 May 2023 05:23:49 GMT Received: from ppma02fra.de.ibm.com (47.49.7a9f.ip4.static.sl-reverse.com [159.122.73.71]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3qsbfys22a-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Wed, 24 May 2023 05:23:49 +0000 Received: from pps.filterd (ppma02fra.de.ibm.com [127.0.0.1]) by ppma02fra.de.ibm.com (8.17.1.19/8.17.1.19) with ESMTP id 34O3HF5d000413; Wed, 24 May 2023 05:23:47 GMT Received: from smtprelay03.fra02v.mail.ibm.com ([9.218.2.224]) by ppma02fra.de.ibm.com (PPS) with ESMTPS id 3qppcs9g4s-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Wed, 24 May 2023 05:23:47 +0000 Received: from smtpav01.fra02v.mail.ibm.com (smtpav01.fra02v.mail.ibm.com [10.20.54.100]) by smtprelay03.fra02v.mail.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 34O5NjJp6226482 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Wed, 24 May 2023 05:23:45 GMT Received: from smtpav01.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 031722004B; Wed, 24 May 2023 05:23:45 +0000 (GMT) Received: from smtpav01.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id DD4BD20043; Wed, 24 May 2023 05:23:41 +0000 (GMT) Received: from [9.177.82.162] (unknown [9.177.82.162]) by smtpav01.fra02v.mail.ibm.com (Postfix) with ESMTP; Wed, 24 May 2023 05:23:41 +0000 (GMT) Message-ID: <544abef2-cd6b-ebf7-4bed-8be576e58fad@linux.ibm.com> Date: Wed, 24 May 2023 13:23:39 +0800 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:91.0) Gecko/20100101 Thunderbird/91.6.1 Subject: Re: [PATCH 2/2] vect: Enhance cost evaluation in vect_transform_slp_perm_load_1 Content-Language: en-US To: Richard Biener Cc: GCC Patches , Richard Sandiford , Segher Boessenkool , Peter Bergner References: <72a5c5db-bc06-eded-d229-82af34342515@linux.ibm.com> <71fda837-6a92-7f74-43e1-90b046919f6a@linux.ibm.com> <06c5a418-ca90-b117-04b1-c3bef50ae28c@linux.ibm.com> From: "Kewen.Lin" In-Reply-To: Content-Type: text/plain; charset=UTF-8 X-TM-AS-GCONF: 00 X-Proofpoint-GUID: UoLpWf80siLLY_w1Fthk-mMzY3TXaI6F X-Proofpoint-ORIG-GUID: Ep-0OivUX_M6SeNnxKG6wRrI-Vw5ICu0 Content-Transfer-Encoding: 8bit X-Proofpoint-UnRewURL: 0 URL was un-rewritten MIME-Version: 1.0 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.254,Aquarius:18.0.957,Hydra:6.0.573,FMLib:17.11.176.26 definitions=2023-05-24_02,2023-05-23_02,2023-05-22_02 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 phishscore=0 mlxscore=0 lowpriorityscore=0 impostorscore=0 suspectscore=0 mlxlogscore=999 spamscore=0 priorityscore=1501 adultscore=0 clxscore=1015 bulkscore=0 malwarescore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2304280000 definitions=main-2305240042 X-Spam-Status: No, score=-11.5 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_EF,GIT_PATCH_0,KAM_NUMSUBJECT,KAM_SHORT,NICE_REPLY_A,RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: on 2023/5/23 14:19, Richard Biener wrote: > On Tue, May 23, 2023 at 5:01 AM Kewen.Lin wrote: >> >> Hi Richi, >> >> Thanks for the review! >> >> on 2023/5/22 21:44, Richard Biener wrote: >>> On Wed, May 17, 2023 at 8:15 AM Kewen.Lin wrote: >>>> >>>> Hi, >>>> >>>> Following Richi's suggestion in [1], I'm working on deferring >>>> cost evaluation next to the transformation, this patch is >>>> to enhance function vect_transform_slp_perm_load_1 which >>>> could under-cost for vector permutation, since the costing >>>> doesn't try to consider nvectors_per_build, it's inconsistent >>>> with the transformation part. >>>> >>>> Bootstrapped and regtested on x86_64-redhat-linux, >>>> aarch64-linux-gnu and powerpc64{,le}-linux-gnu. >>>> >>>> Is it ok for trunk? >>>> >>>> [1] https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563624.html >>>> >>>> BR, >>>> Kewen >>>> ----- >>>> gcc/ChangeLog: >>>> >>>> * tree-vect-slp.cc (vect_transform_slp_perm_load_1): Adjust the >>>> calculation on n_perms by considering nvectors_per_build. >>>> >>>> gcc/testsuite/ChangeLog: >>>> >>>> * gcc.dg/vect/costmodel/ppc/costmodel-slp-perm.c: New test. >>>> --- >>>> .../vect/costmodel/ppc/costmodel-slp-perm.c | 23 +++++++ >>>> gcc/tree-vect-slp.cc | 66 ++++++++++--------- >>>> 2 files changed, 57 insertions(+), 32 deletions(-) >>>> create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-perm.c >>>> >>>> diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-perm.c b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-perm.c >>>> new file mode 100644 >>>> index 00000000000..e5c4dceddfb >>>> --- /dev/null >>>> +++ b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-perm.c >>>> @@ -0,0 +1,23 @@ >>>> +/* { dg-do compile } */ >>>> +/* { dg-require-effective-target vect_int } */ >>>> +/* { dg-require-effective-target powerpc_p9vector_ok } */ >>>> +/* Specify power9 to ensure the vectorization is profitable >>>> + and test point stands, otherwise it could be not profitable >>>> + to vectorize. */ >>>> +/* { dg-additional-options "-mdejagnu-cpu=power9 -mpower9-vector" } */ >>>> + >>>> +/* Verify we cost the exact count for required vec_perm. */ >>>> + >>>> +int x[1024], y[1024]; >>>> + >>>> +void >>>> +foo () >>>> +{ >>>> + for (int i = 0; i < 512; ++i) >>>> + { >>>> + x[2 * i] = y[1023 - (2 * i)]; >>>> + x[2 * i + 1] = y[1023 - (2 * i + 1)]; >>>> + } >>>> +} >>>> + >>>> +/* { dg-final { scan-tree-dump-times "2 times vec_perm" 1 "vect" } } */ >>>> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc >>>> index e5c9d7e766e..af9a6dd4fa9 100644 >>>> --- a/gcc/tree-vect-slp.cc >>>> +++ b/gcc/tree-vect-slp.cc >>>> @@ -8115,12 +8115,12 @@ vect_transform_slp_perm_load_1 (vec_info *vinfo, slp_tree node, >>>> >>>> mode = TYPE_MODE (vectype); >>>> poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype); >>>> + unsigned int nstmts = SLP_TREE_NUMBER_OF_VEC_STMTS (node); >>>> >>>> /* Initialize the vect stmts of NODE to properly insert the generated >>>> stmts later. */ >>>> if (! analyze_only) >>>> - for (unsigned i = SLP_TREE_VEC_STMTS (node).length (); >>>> - i < SLP_TREE_NUMBER_OF_VEC_STMTS (node); i++) >>>> + for (unsigned i = SLP_TREE_VEC_STMTS (node).length (); i < nstmts; i++) >>>> SLP_TREE_VEC_STMTS (node).quick_push (NULL); >>>> >>>> /* Generate permutation masks for every NODE. Number of masks for each NODE >>>> @@ -8161,7 +8161,10 @@ vect_transform_slp_perm_load_1 (vec_info *vinfo, slp_tree node, >>>> (b) the permutes only need a single vector input. */ >>>> mask.new_vector (nunits, group_size, 3); >>>> nelts_to_build = mask.encoded_nelts (); >>>> - nvectors_per_build = SLP_TREE_VEC_STMTS (node).length (); >>>> + /* It's possible to obtain zero nstmts during analyze_only, so make >>>> + it at least one to ensure the later computation for n_perms >>>> + proceed. */ >>>> + nvectors_per_build = nstmts > 0 ? nstmts : 1; >>>> in_nlanes = DR_GROUP_SIZE (stmt_info) * 3; >>>> } >>>> else >>>> @@ -8252,40 +8255,39 @@ vect_transform_slp_perm_load_1 (vec_info *vinfo, slp_tree node, >>>> return false; >>>> } >>>> >>>> - ++*n_perms; >>>> - >>>> + tree mask_vec = NULL_TREE; >>>> if (!analyze_only) >>>> - { >>>> - tree mask_vec = vect_gen_perm_mask_checked (vectype, indices); >>>> + mask_vec = vect_gen_perm_mask_checked (vectype, indices); >>>> >>>> - if (second_vec_index == -1) >>>> - second_vec_index = first_vec_index; >>>> + if (second_vec_index == -1) >>>> + second_vec_index = first_vec_index; >>>> >>>> - for (unsigned int ri = 0; ri < nvectors_per_build; ++ri) >>>> + for (unsigned int ri = 0; ri < nvectors_per_build; ++ri) >>>> + { >>>> + ++*n_perms; >>> >>> So the "real" change is doing >>> >>> *n_perms += nvectors_per_build; >>> >>> and *n_perms was unused when !analyze_only? And since at >> >> Yes, although both !analyze_only and analyze_only calls pass n_perms in, now >> only the call sites with analyze_only will use the returned n_perms further. >> >>> analysis time we (sometimes?) have zero nvectors you have to >>> fixup above? Which cases are that? >> >> Yes, the fixup is to avoid to result in unexpected n_perms in function >> vect_optimize_slp_pass::internal_node_cost。 One typical case is >> gcc.dg/vect/bb-slp-50.c, without special casing zero, slp2 fails to optimize >> out one more vec_perm unexpectedly. >> >> In vect_optimize_slp_pass::internal_node_cost, it checks if the returned n_perms >> is zero or not (vec_perm not needed or needed). >> >> if (!vect_transform_slp_perm_load_1 (m_vinfo, node, tmp_perm, vNULL, >> nullptr, vf, true, false, &n_perms)) >> { >> auto rep = SLP_TREE_REPRESENTATIVE (node); >> if (out_layout_i == 0) >> { >> /* Use the fallback cost if the load is an N-to-N permutation. >> Otherwise assume that the node will be rejected later >> and rebuilt from scalars. */ >> if (STMT_VINFO_GROUPED_ACCESS (rep) >> && (DR_GROUP_SIZE (DR_GROUP_FIRST_ELEMENT (rep)) >> == SLP_TREE_LANES (node))) >> return fallback_cost; >> return 0; >> } >> return -1; >> } >> >> /* See the comment above the corresponding VEC_PERM_EXPR handling. */ >> return n_perms == 0 ? 0 : 1; >> >> In vect_optimize_slp_pass::forward_pass (), it only considers the case that >> factor > 0 (there is some vec_perm needed). >> >> /* Accumulate the cost of using LAYOUT_I within NODE, >> both for the inputs and the outputs. */ >> int factor = internal_node_cost (vertex.node, layout_i, >> layout_i); >> if (factor < 0) >> { >> is_possible = false; >> break; >> } >> else if (factor) >> layout_costs.internal_cost.add_serial_cost >> ({ vertex.weight * factor, m_optimize_size }); > > Ah, OK - thanks for clarifying. > > The patch is OK. Thanks! Committed in r14-1151. BR, Kewen