From: "Kewen.Lin"
Date: Wed, 17 May 2023 14:15:00 +0800
Subject: [PATCH 2/2] vect: Enhance cost evaluation in vect_transform_slp_perm_load_1
To: GCC Patches
Cc: Richard Biener, Richard Sandiford, Segher Boessenkool, Peter Bergner
Message-ID: <71fda837-6a92-7f74-43e1-90b046919f6a@linux.ibm.com>
In-Reply-To: <72a5c5db-bc06-eded-d229-82af34342515@linux.ibm.com>

Hi,

Following Richi's suggestion in [1], I'm working on moving the cost
evaluation next to the transformation code.  This patch enhances function
vect_transform_slp_perm_load_1, which can under-cost vector permutations:
the costing doesn't take nvectors_per_build into account, so it is
inconsistent with the transformation part.  Before this patch *n_perms was
incremented once per permutation mask, while the transformation emits
nvectors_per_build VEC_PERM_EXPRs for each mask.

Bootstrapped and regtested on x86_64-redhat-linux, aarch64-linux-gnu and
powerpc64{,le}-linux-gnu.

Is it OK for trunk?

[1] https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563624.html

BR,
Kewen
-----
gcc/ChangeLog:

	* tree-vect-slp.cc (vect_transform_slp_perm_load_1): Adjust the
	calculation of n_perms by considering nvectors_per_build.

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/costmodel/ppc/costmodel-slp-perm.c: New test.
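To make the under-counting concrete, the two counting schemes can be modeled in a few lines.  This is a standalone sketch, not GCC code: nmasks stands for the number of permutation masks the function builds, nstmts for SLP_TREE_NUMBER_OF_VEC_STMTS (node), and the helpers count_perms_old, count_perms_new and count_emitted are hypothetical.  It only models the branch in which nvectors_per_build is derived from the statement count (the mask.new_vector (nunits, group_size, 3) branch touched by the patch).

```c
#include <assert.h>

/* Hypothetical model, not GCC code.  NMASKS is the number of permutation
   masks built; NSTMTS models SLP_TREE_NUMBER_OF_VEC_STMTS (node).  */

/* Before the patch: *n_perms was bumped once per mask, no matter how
   many vectors that mask was then applied to.  */
static unsigned
count_perms_old (unsigned nmasks)
{
  return nmasks;
}

/* After the patch: one count per VEC_PERM_EXPR, i.e. per mask and per
   vector.  NSTMTS may be zero during analysis, so at least one permute
   per mask is counted, mirroring "nstmts > 0 ? nstmts : 1".  */
static unsigned
count_perms_new (unsigned nmasks, unsigned nstmts)
{
  unsigned nvectors_per_build = nstmts > 0 ? nstmts : 1;
  return nmasks * nvectors_per_build;
}

/* What the transformation emits in this branch: each mask is applied to
   NSTMTS vectors.  The model assumes the transform phase always has at
   least one vector statement.  */
static unsigned
count_emitted (unsigned nmasks, unsigned nstmts)
{
  assert (nstmts > 0);
  return nmasks * nstmts;
}
```

With one mask and two vector statements, count_perms_old (1) gives 1, while count_emitted (1, 2) and count_perms_new (1, 2) both give 2, so only the new scheme agrees with what is actually generated.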
---
 .../vect/costmodel/ppc/costmodel-slp-perm.c   | 23 +++++++
 gcc/tree-vect-slp.cc                          | 66 ++++++++++---------
 2 files changed, 57 insertions(+), 32 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-perm.c

diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-perm.c b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-perm.c
new file mode 100644
index 00000000000..e5c4dceddfb
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-perm.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* Specify power9 to ensure the vectorization is profitable
+   and test point stands, otherwise it could be not profitable
+   to vectorize.  */
+/* { dg-additional-options "-mdejagnu-cpu=power9 -mpower9-vector" } */
+
+/* Verify we cost the exact count for required vec_perm.  */
+
+int x[1024], y[1024];
+
+void
+foo ()
+{
+  for (int i = 0; i < 512; ++i)
+    {
+      x[2 * i] = y[1023 - (2 * i)];
+      x[2 * i + 1] = y[1023 - (2 * i + 1)];
+    }
+}
+
+/* { dg-final { scan-tree-dump-times "2 times vec_perm" 1 "vect" } } */
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index e5c9d7e766e..af9a6dd4fa9 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -8115,12 +8115,12 @@ vect_transform_slp_perm_load_1 (vec_info *vinfo, slp_tree node,
 
   mode = TYPE_MODE (vectype);
   poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
+  unsigned int nstmts = SLP_TREE_NUMBER_OF_VEC_STMTS (node);
 
   /* Initialize the vect stmts of NODE to properly insert the generated
      stmts later.  */
   if (! analyze_only)
-    for (unsigned i = SLP_TREE_VEC_STMTS (node).length ();
-	 i < SLP_TREE_NUMBER_OF_VEC_STMTS (node); i++)
+    for (unsigned i = SLP_TREE_VEC_STMTS (node).length (); i < nstmts; i++)
       SLP_TREE_VEC_STMTS (node).quick_push (NULL);
 
   /* Generate permutation masks for every NODE. Number of masks for each NODE
@@ -8161,7 +8161,10 @@ vect_transform_slp_perm_load_1 (vec_info *vinfo, slp_tree node,
 	 (b) the permutes only need a single vector input.  */
       mask.new_vector (nunits, group_size, 3);
       nelts_to_build = mask.encoded_nelts ();
-      nvectors_per_build = SLP_TREE_VEC_STMTS (node).length ();
+      /* It's possible to obtain zero nstmts during analyze_only, so make
+	 it at least one to ensure the later computation for n_perms
+	 proceed.  */
+      nvectors_per_build = nstmts > 0 ? nstmts : 1;
       in_nlanes = DR_GROUP_SIZE (stmt_info) * 3;
     }
   else
@@ -8252,40 +8255,39 @@ vect_transform_slp_perm_load_1 (vec_info *vinfo, slp_tree node,
 		  return false;
 		}
 
-	      ++*n_perms;
-
+	      tree mask_vec = NULL_TREE;
 	      if (!analyze_only)
-		{
-		  tree mask_vec = vect_gen_perm_mask_checked (vectype, indices);
+		mask_vec = vect_gen_perm_mask_checked (vectype, indices);
 
-		  if (second_vec_index == -1)
-		    second_vec_index = first_vec_index;
+	      if (second_vec_index == -1)
+		second_vec_index = first_vec_index;
 
-		  for (unsigned int ri = 0; ri < nvectors_per_build; ++ri)
+	      for (unsigned int ri = 0; ri < nvectors_per_build; ++ri)
+		{
+		  ++*n_perms;
+		  if (analyze_only)
+		    continue;
+		  /* Generate the permute statement if necessary.  */
+		  tree first_vec = dr_chain[first_vec_index + ri];
+		  tree second_vec = dr_chain[second_vec_index + ri];
+		  gassign *stmt = as_a <gassign *> (stmt_info->stmt);
+		  tree perm_dest
+		    = vect_create_destination_var (gimple_assign_lhs (stmt),
+						   vectype);
+		  perm_dest = make_ssa_name (perm_dest);
+		  gimple *perm_stmt
+		    = gimple_build_assign (perm_dest, VEC_PERM_EXPR, first_vec,
+					   second_vec, mask_vec);
+		  vect_finish_stmt_generation (vinfo, stmt_info, perm_stmt,
+					       gsi);
+		  if (dce_chain)
 		    {
-		      /* Generate the permute statement if necessary.  */
-		      tree first_vec = dr_chain[first_vec_index + ri];
-		      tree second_vec = dr_chain[second_vec_index + ri];
-		      gassign *stmt = as_a <gassign *> (stmt_info->stmt);
-		      tree perm_dest
-			= vect_create_destination_var (gimple_assign_lhs (stmt),
-						       vectype);
-		      perm_dest = make_ssa_name (perm_dest);
-		      gimple *perm_stmt
-			= gimple_build_assign (perm_dest, VEC_PERM_EXPR,
-					       first_vec, second_vec, mask_vec);
-		      vect_finish_stmt_generation (vinfo, stmt_info, perm_stmt,
-						   gsi);
-		      if (dce_chain)
-			{
-			  bitmap_set_bit (used_defs, first_vec_index + ri);
-			  bitmap_set_bit (used_defs, second_vec_index + ri);
-			}
-
-		      /* Store the vector statement in NODE.  */
-		      SLP_TREE_VEC_STMTS (node) [vect_stmts_counter++]
-			= perm_stmt;
+		      bitmap_set_bit (used_defs, first_vec_index + ri);
+		      bitmap_set_bit (used_defs, second_vec_index + ri);
 		    }
+
+		  /* Store the vector statement in NODE.  */
+		  SLP_TREE_VEC_STMTS (node)[vect_stmts_counter++] = perm_stmt;
 		}
 	    }
 	  else if (!analyze_only)
-- 
2.39.1
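The restructuring in the last hunk can be modeled in the same spirit: counting and emission share a single loop over nvectors_per_build, and the analysis phase skips the emission with continue, so the count can no longer drift away from what the transformation generates.  This is a hypothetical sketch, not GCC code; struct perm, apply_mask and the output array are illustrative stand-ins for building and storing the VEC_PERM_EXPR statements.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one emitted VEC_PERM_EXPR: the dr_chain
   indices of its two input vectors.  */
struct perm
{
  unsigned first, second;
};

/* Apply one permutation mask to NVECTORS_PER_BUILD vectors.  Returns the
   number of permutes counted; in transform mode (ANALYZE_ONLY false) it
   also records each permute in OUT and increments *NOUT.  */
static unsigned
apply_mask (bool analyze_only, unsigned first_vec_index,
            int second_vec_index, unsigned nvectors_per_build,
            struct perm *out, size_t *nout)
{
  unsigned n_perms = 0;

  /* Only the transform phase needs somewhere to put the statements.  */
  assert (analyze_only || (out != NULL && nout != NULL));

  /* A single-input permute uses the first vector as the second input.  */
  if (second_vec_index == -1)
    second_vec_index = (int) first_vec_index;

  for (unsigned ri = 0; ri < nvectors_per_build; ++ri)
    {
      /* Count every permute, whichever phase we are in.  */
      ++n_perms;
      if (analyze_only)
        continue;
      out[(*nout)++] = (struct perm) { first_vec_index + ri,
                                       (unsigned) second_vec_index + ri };
    }
  return n_perms;
}
```

Called in analysis mode with two vectors per build, apply_mask reports 2 permutes and records nothing; the same call in transform mode also reports 2 and records the input pairs (0, 0) and (1, 1).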