From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx0a-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com [148.163.158.5]) by sourceware.org (Postfix) with ESMTPS id E89D13955408 for ; Wed, 2 Jun 2021 09:34:15 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org E89D13955408 Received: from pps.filterd (m0098413.ppops.net [127.0.0.1]) by mx0b-001b2d01.pphosted.com (8.16.0.43/8.16.0.43) with SMTP id 1529XnNh128045; Wed, 2 Jun 2021 05:34:14 -0400 Received: from pps.reinject (localhost [127.0.0.1]) by mx0b-001b2d01.pphosted.com with ESMTP id 38x7858bad-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Wed, 02 Jun 2021 05:34:14 -0400 Received: from m0098413.ppops.net (m0098413.ppops.net [127.0.0.1]) by pps.reinject (8.16.0.43/8.16.0.43) with SMTP id 1529XreF128865; Wed, 2 Jun 2021 05:34:14 -0400 Received: from ppma04ams.nl.ibm.com (63.31.33a9.ip4.static.sl-reverse.com [169.51.49.99]) by mx0b-001b2d01.pphosted.com with ESMTP id 38x7858b9p-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Wed, 02 Jun 2021 05:34:13 -0400 Received: from pps.filterd (ppma04ams.nl.ibm.com [127.0.0.1]) by ppma04ams.nl.ibm.com (8.16.1.2/8.16.1.2) with SMTP id 1529YC6x012198; Wed, 2 Jun 2021 09:34:12 GMT Received: from b06avi18878370.portsmouth.uk.ibm.com (b06avi18878370.portsmouth.uk.ibm.com [9.149.26.194]) by ppma04ams.nl.ibm.com with ESMTP id 38ud88a7vh-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Wed, 02 Jun 2021 09:34:12 +0000 Received: from d06av21.portsmouth.uk.ibm.com (d06av21.portsmouth.uk.ibm.com [9.149.105.232]) by b06avi18878370.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 1529XYZs34341144 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Wed, 2 Jun 2021 09:33:34 GMT Received: from d06av21.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id B59F452052; Wed, 2 Jun 2021 09:34:08 +0000 (GMT) Received: from kewenlins-mbp.cn.ibm.com (unknown [9.200.146.52]) by d06av21.portsmouth.uk.ibm.com (Postfix) with ESMTP id 6BCD352050; Wed, 2 Jun 2021 09:34:07 +0000 (GMT) To: GCC Patches Cc: Richard Biener , Segher Boessenkool , Bill Schmidt From: "Kewen.Lin" Subject: [PATCH] predcom: Enabled by loop vect at O2 [PR100794] Message-ID: Date: Wed, 2 Jun 2021 17:34:05 +0800 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:78.0) Gecko/20100101 Thunderbird/78.10.0 MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="------------FA7942CC0756A992D73B7C80" Content-Language: en-US X-TM-AS-GCONF: 00 X-Proofpoint-ORIG-GUID: 42HpArcO50b591ro_QTh1sB_udKxW4HW X-Proofpoint-GUID: kqaNcXGiuMFv8jsYe0OW5Xwiar9-5lJp X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:6.0.391, 18.0.761 definitions=2021-06-02_05:2021-06-02, 2021-06-02 signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 clxscore=1015 priorityscore=1501 mlxlogscore=999 phishscore=0 spamscore=0 impostorscore=0 suspectscore=0 bulkscore=0 malwarescore=0 lowpriorityscore=0 adultscore=0 mlxscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2104190000 definitions=main-2106020061 X-Spam-Status: No, score=-11.2 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_MSPIKE_H2, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 02 Jun 2021 09:34:17 -0000 This is a multi-part message in MIME format. --------------FA7942CC0756A992D73B7C80 Content-Type: text/plain; charset=gbk Content-Transfer-Encoding: 7bit Hi, As PR100794 shows, in the current implementation PRE bypasses some optimization to avoid introducing loop carried dependence which stops loop vectorizer to vectorize the loop. At -O2, there is no downstream pass to re-catch this kind of opportunity if loop vectorizer fails to vectorize that loop. This patch follows Richi's suggestion in the PR, if predcom flag isn't set and loop vectorization will enable predcom without any unrolling implicitly. The Power9 SPEC2017 evaluation showed it can speed up 521.wrf_r 3.30% and 554.roms_r 1.08% at very-cheap cost model, no remarkable impact at cheap cost model, the build time and size impact is fine (see the PR for the details). By the way, I tested another proposal to guard PRE not skip the optimization for cheap and very-cheap vect cost models, the evaluation results showed it's fine with very cheap cost model, but it can degrade some bmks like 521.wrf_r -9.17% and 549.fotonik3d_r -2.07% etc. Bootstrapped/regtested on powerpc64le-linux-gnu P9, x86_64-redhat-linux and aarch64-linux-gnu. Is it ok for trunk? BR, Kewen ----- gcc/ChangeLog: PR tree-optimization/100794 * tree-predcom.c (tree_predictive_commoning_loop): Add parameter allow_unroll_p and only allow unrolling when it's true. (tree_predictive_commoning): Add parameter allow_unroll_p and adjust for it. (run_tree_predictive_commoning): Likewise. (class pass_predcom): Add private member allow_unroll_p. (pass_predcom::pass_predcom): Init allow_unroll_p. (pass_predcom::gate): Check flag_tree_loop_vectorize and global_options_set.x_flag_predictive_commoning. (pass_predcom::execute): Adjust for allow_unroll_p. gcc/testsuite/ChangeLog: PR tree-optimization/100794 * gcc.dg/tree-ssa/pr100794.c: New test. --------------FA7942CC0756A992D73B7C80 Content-Type: text/plain; charset=UTF-8; x-mac-type="0"; x-mac-creator="0"; name="predcom_enabled_by_loop_vect.diff" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="predcom_enabled_by_loop_vect.diff" gcc/testsuite/gcc.dg/tree-ssa/pr100794.c | 20 +++++++++ gcc/tree-predcom.c | 57 +++++++++++++++++------- 2 files changed, 60 insertions(+), 17 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr100794.c diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr100794.c b/gcc/testsuite/gcc.dg/tree-ssa/pr100794.c new file mode 100644 index 00000000000..6f707ae7fba --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr100794.c @@ -0,0 +1,20 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -ftree-loop-vectorize -fdump-tree-pcom-details -fdisable-tree-vect" } */ + +extern double arr[100]; +extern double foo (double, double); +extern double sum; + +void +test (int i_0, int i_n) +{ + int i; + for (i = i_0; i < i_n - 1; i++) + { + double a = arr[i]; + double b = arr[i + 1]; + sum += a * b; + } +} + +/* { dg-final { scan-tree-dump "Executing predictive commoning without unrolling" "pcom" } } */ diff --git a/gcc/tree-predcom.c b/gcc/tree-predcom.c index 02f911a08bb..65a93c8e505 100644 --- a/gcc/tree-predcom.c +++ b/gcc/tree-predcom.c @@ -3178,13 +3178,13 @@ insert_init_seqs (class loop *loop, vec chains) applied to this loop. */ static unsigned -tree_predictive_commoning_loop (class loop *loop) +tree_predictive_commoning_loop (class loop *loop, bool allow_unroll_p) { vec datarefs; vec dependences; struct component *components; vec chains = vNULL; - unsigned unroll_factor; + unsigned unroll_factor = 0; class tree_niter_desc desc; bool unroll = false, loop_closed_ssa = false; @@ -3272,11 +3272,13 @@ tree_predictive_commoning_loop (class loop *loop) dump_chains (dump_file, chains); } - /* Determine the unroll factor, and if the loop should be unrolled, ensure - that its number of iterations is divisible by the factor. */ - unroll_factor = determine_unroll_factor (chains); - unroll = (unroll_factor > 1 - && can_unroll_loop_p (loop, unroll_factor, &desc)); + if (allow_unroll_p) + /* Determine the unroll factor, and if the loop should be unrolled, ensure + that its number of iterations is divisible by the factor. */ + unroll_factor = determine_unroll_factor (chains); + + if (unroll_factor > 1) + unroll = can_unroll_loop_p (loop, unroll_factor, &desc); /* Execute the predictive commoning transformations, and possibly unroll the loop. */ @@ -3319,7 +3321,7 @@ tree_predictive_commoning_loop (class loop *loop) /* Runs predictive commoning. */ unsigned -tree_predictive_commoning (void) +tree_predictive_commoning (bool allow_unroll_p) { class loop *loop; unsigned ret = 0, changed = 0; @@ -3328,7 +3330,7 @@ tree_predictive_commoning (void) FOR_EACH_LOOP (loop, LI_ONLY_INNERMOST) if (optimize_loop_for_speed_p (loop)) { - changed |= tree_predictive_commoning_loop (loop); + changed |= tree_predictive_commoning_loop (loop, allow_unroll_p); } free_original_copy_tables (); @@ -3355,12 +3357,12 @@ tree_predictive_commoning (void) /* Predictive commoning Pass. */ static unsigned -run_tree_predictive_commoning (struct function *fun) +run_tree_predictive_commoning (struct function *fun, bool allow_unroll_p) { if (number_of_loops (fun) <= 1) return 0; - return tree_predictive_commoning (); + return tree_predictive_commoning (allow_unroll_p); } namespace { @@ -3382,15 +3384,36 @@ class pass_predcom : public gimple_opt_pass { public: pass_predcom (gcc::context *ctxt) - : gimple_opt_pass (pass_data_predcom, ctxt) + : gimple_opt_pass (pass_data_predcom, ctxt), + allow_unroll_p (true) {} /* opt_pass methods: */ - virtual bool gate (function *) { return flag_predictive_commoning != 0; } - virtual unsigned int execute (function *fun) - { - return run_tree_predictive_commoning (fun); - } + virtual bool + gate (function *) + { + if (flag_predictive_commoning != 0) + return true; + /* Loop vectorization enables predictive commoning implicitly + only if predictive commoning isn't set explicitly, and it + doesn't allow unrolling. */ + if (flag_tree_loop_vectorize + && !global_options_set.x_flag_predictive_commoning) + { + allow_unroll_p = false; + return true; + } + return false; + } + + virtual unsigned int + execute (function *fun) + { + return run_tree_predictive_commoning (fun, allow_unroll_p); + } + +private: + bool allow_unroll_p; }; // class pass_predcom -- 2.17.1 --------------FA7942CC0756A992D73B7C80--