From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx0a-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com [148.163.158.5]) by sourceware.org (Postfix) with ESMTPS id 5BAAB3857C73; Thu, 19 Aug 2021 05:51:17 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 5BAAB3857C73 Received: from pps.filterd (m0098419.ppops.net [127.0.0.1]) by mx0b-001b2d01.pphosted.com (8.16.0.43/8.16.0.43) with SMTP id 17J5Yxmu008620; Thu, 19 Aug 2021 01:51:17 -0400 Received: from pps.reinject (localhost [127.0.0.1]) by mx0b-001b2d01.pphosted.com with ESMTP id 3agp20a54r-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 19 Aug 2021 01:51:16 -0400 Received: from m0098419.ppops.net (m0098419.ppops.net [127.0.0.1]) by pps.reinject (8.16.0.43/8.16.0.43) with SMTP id 17J5a5jU011866; Thu, 19 Aug 2021 01:51:16 -0400 Received: from ppma04ams.nl.ibm.com (63.31.33a9.ip4.static.sl-reverse.com [169.51.49.99]) by mx0b-001b2d01.pphosted.com with ESMTP id 3agp20a549-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 19 Aug 2021 01:51:16 -0400 Received: from pps.filterd (ppma04ams.nl.ibm.com [127.0.0.1]) by ppma04ams.nl.ibm.com (8.16.1.2/8.16.1.2) with SMTP id 17J5h55F009846; Thu, 19 Aug 2021 05:51:14 GMT Received: from b06cxnps3075.portsmouth.uk.ibm.com (d06relay10.portsmouth.uk.ibm.com [9.149.109.195]) by ppma04ams.nl.ibm.com with ESMTP id 3ae5f8fq2y-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 19 Aug 2021 05:51:14 +0000 Received: from d06av24.portsmouth.uk.ibm.com (mk.ibm.com [9.149.105.60]) by b06cxnps3075.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 17J5pBCJ53412146 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 19 Aug 2021 05:51:11 GMT Received: from d06av24.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id BF4EF4203F; Thu, 19 Aug 2021 05:51:11 +0000 (GMT) Received: from d06av24.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id D37574205C; Thu, 19 Aug 2021 05:51:07 +0000 (GMT) Received: from luoxhus-MacBook-Pro.local (unknown [9.200.33.147]) by d06av24.portsmouth.uk.ibm.com (Postfix) with ESMTPS; Thu, 19 Aug 2021 05:51:07 +0000 (GMT) Subject: [PATCH v2] Don't move cold code out of loop by checking bb count To: Ulrich Drepper Cc: Richard Biener , Segher Boessenkool , Bill Schmidt , linkw@gcc.gnu.org, GCC Patches , Jan Hubicka , David Edelsohn References: <20210802050501.159058-1-luoxhu@linux.ibm.com> From: Xionghu Luo Message-ID: <05aeb621-3e5d-4194-7474-f1d8a2b352d4@linux.ibm.com> Date: Thu, 19 Aug 2021 13:51:04 +0800 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.0; rv:68.0) Gecko/20100101 Thunderbird/68.12.0 In-Reply-To: Content-Type: text/plain; charset=utf-8 Content-Language: en-US X-TM-AS-GCONF: 00 X-Proofpoint-ORIG-GUID: zckyb7-_l4Smi0YqIFBSqLxEIKLdaXe4 X-Proofpoint-GUID: ff0ah8JvIon7_4JDQbRGsPntgs4YNN6Y Content-Transfer-Encoding: 7bit X-Proofpoint-UnRewURL: 0 URL was un-rewritten MIME-Version: 1.0 X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:6.0.391, 18.0.790 definitions=2021-08-19_01:2021-08-17, 2021-08-19 signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 mlxlogscore=999 clxscore=1011 priorityscore=1501 lowpriorityscore=0 phishscore=0 malwarescore=0 mlxscore=0 bulkscore=0 impostorscore=0 suspectscore=0 spamscore=0 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2107140000 definitions=main-2108190030 X-Spam-Status: No, score=-12.2 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT, RCVD_IN_MSPIKE_H4, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 19 Aug 2021 05:51:27 -0000 On 2021/8/10 12:25, Ulrich Drepper wrote: > On Tue, Aug 10, 2021 at 4:03 AM Xionghu Luo via Gcc-patches > wrote: >> For this case, theorotically I think the master GCC will optimize it to: >> >> invariant; >> for (;;) >> if (unlikely_cond) >> for (;;) >> ; >> >> 'invariant' is moved out of outer loop, but with the patch, it will get: >> >> for (;;) >> if (unlikely_cond) >> { >> invariant; >> for (;;) >> ; >> } >> >> 'invariant' is *cold* for outer loop, but it is still *hot* for inner loop, >> so hoist it out of inner loop, this is exactly what we want, right? > > Is relying on absolute numbers really what you want? If the > 'unlikely_cond' condition depends on the iteration count of the outer > loop the probability of it being true in each individual iteration can > be low (at least that's how I use unlikely) but the overall > probability of needing the code is higher 1 - (1 - p)^n if 'p' is the > probability of 'unlikely_cond' and 'n' is the number of iterations. > Assuming complete independence of the loop iterations, otherwise it's > rather an upper limit. > > At the very least I'd generate code like this: > > first = true; > for (;;) > if (unlikely_cond) > { > if (first) > { > invariant; > first = false; > } > for (;;) > ; > } > > If it's worth hoisting the code the the extra test and flag should be > small in cost in comparison. > > If 'unlikely_cond' does not in any way depend on the loop iteration > then I think your code generation is fine. Thanks for your good suggestion, I am also not sure whether it is necessary to do it this way:) But I found that even the first step of for (;;) if (unlikely_cond) { invariant; for (;;) ; } is not supported yet. So I added a new function *find_coldest_out_loop* to search the coldest function between outermost invariant loop and own loop in compute_invariantness to move invariant out to cold loop first: [PATCH v2] Don't move cold code out of loop by checking bb count There was a patch trying to avoid move cold block out of loop: https://gcc.gnu.org/pipermail/gcc/2014-November/215551.html Richard suggested to "never hoist anything from a bb with lower execution frequency to a bb with higher one in LIM invariantness_dom_walker before_dom_children". In gimple LIM analysis, add find_coldest_out_loop to move invariants to expected target loop, then in both gimple LIM move_computations_worker and RTL loop-invariant.c find_invariants_bb, if profile count check find the loop bb is colder than target loop preheader, don't hoist it out of loop. SPEC2017 performance evaluation shows 1% performance improvement for intrate GEOMEAN and no obvious regression for others. Especially, 500.perlbench_r +7.52% (Perf shows function S_regtry of perlbench is largely improved.), and 548.exchange2_r+1.98%, 526.blender_r +1.00% on P8LE. Regression and bootstrap tested pass on P8LE, any comments? Thanks. gcc/ChangeLog: * loop-invariant.c (find_invariants_bb): Check profile count before motion. (find_invariants_body): Add argument. * tree-ssa-loop-im.c (find_coldest_out_loop): New function. (outermost_invariant_loop): Use find_coldest_out_loop. (determine_max_movement): Likewise. (move_computations_worker): Check profile count before motion. (execute_sm): Likewise. (execute_sm_exit): Check pointer validness. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/recip-3.c: Adjust. * gcc.dg/tree-ssa/ssa-lim-16.c: New test. * gcc.dg/tree-ssa/ssa-lim-17.c: New test. --- gcc/loop-invariant.c | 10 +- gcc/tree-ssa-loop-im.c | 186 +++++++++++++++++++-- gcc/testsuite/gcc.dg/tree-ssa/recip-3.c | 2 +- gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-16.c | 21 +++ gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-17.c | 26 +++ 5 files changed, 231 insertions(+), 14 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-16.c create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-17.c diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c index fca0c2b24be..5c3be7bf0eb 100644 --- a/gcc/loop-invariant.c +++ b/gcc/loop-invariant.c @@ -1183,9 +1183,14 @@ find_invariants_insn (rtx_insn *insn, bool always_reached, bool always_executed) call. */ static void -find_invariants_bb (basic_block bb, bool always_reached, bool always_executed) +find_invariants_bb (class loop *loop, basic_block bb, bool always_reached, + bool always_executed) { rtx_insn *insn; + basic_block preheader = loop_preheader_edge (loop)->src; + + if (preheader->count > bb->count) + return; FOR_BB_INSNS (bb, insn) { @@ -1214,8 +1219,7 @@ find_invariants_body (class loop *loop, basic_block *body, unsigned i; for (i = 0; i < loop->num_nodes; i++) - find_invariants_bb (body[i], - bitmap_bit_p (always_reached, i), + find_invariants_bb (loop, body[i], bitmap_bit_p (always_reached, i), bitmap_bit_p (always_executed, i)); } diff --git a/gcc/tree-ssa-loop-im.c b/gcc/tree-ssa-loop-im.c index d9f75d5025e..2b90caa90fc 100644 --- a/gcc/tree-ssa-loop-im.c +++ b/gcc/tree-ssa-loop-im.c @@ -417,6 +417,23 @@ movement_possibility (gimple *stmt) return ret; } +/* Find coldest loop between min_loop and loop by comapring profile count. */ + +static class loop * +find_coldest_out_loop (class loop *min_loop, class loop *loop) +{ + class loop *cold_loop = min_loop; + profile_count min_count = loop_preheader_edge (min_loop)->src->count; + + while (min_loop != loop) + { + min_loop = superloop_at_depth (loop, loop_depth (min_loop) + 1); + if (loop_preheader_edge (min_loop)->src->count < min_count) + cold_loop = min_loop; + } + return cold_loop; +} + /* Suppose that operand DEF is used inside the LOOP. Returns the outermost loop to that we could move the expression using DEF if it did not have other operands, i.e. the outermost loop enclosing LOOP in that the value @@ -431,18 +448,18 @@ outermost_invariant_loop (tree def, class loop *loop) struct lim_aux_data *lim_data; if (!def) - return superloop_at_depth (loop, 1); + return find_coldest_out_loop (superloop_at_depth (loop, 1), loop); if (TREE_CODE (def) != SSA_NAME) { gcc_assert (is_gimple_min_invariant (def)); - return superloop_at_depth (loop, 1); + return find_coldest_out_loop (superloop_at_depth (loop, 1), loop); } def_stmt = SSA_NAME_DEF_STMT (def); def_bb = gimple_bb (def_stmt); if (!def_bb) - return superloop_at_depth (loop, 1); + return find_coldest_out_loop (superloop_at_depth (loop, 1), loop); max_loop = find_common_loop (loop, def_bb->loop_father); @@ -452,7 +469,11 @@ outermost_invariant_loop (tree def, class loop *loop) loop_outer (lim_data->max_loop)); if (max_loop == loop) return NULL; - max_loop = superloop_at_depth (loop, loop_depth (max_loop) + 1); + max_loop = find_coldest_out_loop (max_loop, loop); + if (max_loop == loop) + return max_loop; + else + max_loop = superloop_at_depth (loop, loop_depth (max_loop) + 1); return max_loop; } @@ -684,7 +705,7 @@ determine_max_movement (gimple *stmt, bool must_preserve_exec) if (must_preserve_exec) level = ALWAYS_EXECUTED_IN (bb); else - level = superloop_at_depth (loop, 1); + level = find_coldest_out_loop (superloop_at_depth (loop, 1), loop); lim_data->max_loop = level; if (gphi *phi = dyn_cast (stmt)) @@ -787,6 +808,7 @@ determine_max_movement (gimple *stmt, bool must_preserve_exec) loop, ref); if (!lim_data->max_loop) return false; + lim_data->max_loop = find_coldest_out_loop (lim_data->max_loop, loop); } else if (! add_dependency (gimple_vuse (stmt), lim_data, loop, false)) return false; @@ -1154,6 +1176,58 @@ move_computations_worker (basic_block bb) continue; } + edge e = loop_preheader_edge (level); + if (e->src->count > bb->count) + { + if (dump_file && (dump_flags & TDF_DETAILS)) + { + fprintf (dump_file, "PHI node NOT moved to %d from %d:\n", + e->src->index, bb->index); + print_gimple_stmt (dump_file, stmt, 0); + fprintf (dump_file, "(cost %u) out of loop %d.\n\n", cost, + level->num); + } + gsi_next (&bsi); + continue; + } + else + { + unsigned i; + bool skip_phi_move = false; + for (i = 0; i < gimple_phi_num_args (stmt); i++) + { + tree def = PHI_ARG_DEF (stmt, i); + + if (TREE_CODE (def) != SSA_NAME) + continue; + + gimple *def_stmt = SSA_NAME_DEF_STMT (def); + + if (!gimple_bb (def_stmt)) + continue; + + if (!dominated_by_p (CDI_DOMINATORS, e->src, + gimple_bb (def_stmt))) + { + if (dump_file && (dump_flags & TDF_DETAILS)) + { + fprintf (dump_file, "PHI node NOT moved to %d from %d\n", + e->src->index, bb->index); + print_gimple_stmt (dump_file, stmt, 0); + fprintf (dump_file, "(cost %u) out of loop %d.\n\n", cost, + level->num); + } + skip_phi_move = true; + break; + } + } + if (skip_phi_move) + { + gsi_next (&bsi); + continue; + } + } + if (dump_file && (dump_flags & TDF_DETAILS)) { fprintf (dump_file, "Moving PHI node\n"); @@ -1191,14 +1265,13 @@ move_computations_worker (basic_block bb) tree lhs = gimple_assign_lhs (new_stmt); SSA_NAME_RANGE_INFO (lhs) = NULL; } - gsi_insert_on_edge (loop_preheader_edge (level), new_stmt); + gsi_insert_on_edge (e, new_stmt); remove_phi_node (&bsi, false); } for (gimple_stmt_iterator bsi = gsi_start_bb (bb); !gsi_end_p (bsi); ) { edge e; - gimple *stmt = gsi_stmt (bsi); lim_data = get_lim_data (stmt); @@ -1221,7 +1294,83 @@ move_computations_worker (basic_block bb) /* We do not really want to move conditionals out of the loop; we just placed it here to force its operands to be moved if necessary. */ if (gimple_code (stmt) == GIMPLE_COND) - continue; + { + gsi_next (&bsi); + continue; + } + + e = loop_preheader_edge (level); + if (e->src->count > bb->count) + { + if (dump_file && (dump_flags & TDF_DETAILS)) + { + fprintf (dump_file, "stmt: Statement NOT moved to %d from %d \n", + e->src->index, bb->index); + print_gimple_stmt (dump_file, stmt, 0); + fprintf (dump_file, "(cost %u) out of loop %d.\n\n", cost, + level->num); + } + gsi_next (&bsi); + continue; + } + else + { + if (is_gimple_assign (stmt)) + { + tree rhs1 = gimple_assign_rhs1 (stmt); + tree rhs2 = gimple_assign_rhs2 (stmt); + if (TREE_CODE (rhs1) == MEM_REF) + { + rhs2 = TREE_OPERAND (rhs1, 1); + rhs1 = TREE_OPERAND (rhs1, 0); + } + gimple *stmt1 = NULL, *stmt2 = NULL; + basic_block def_bb; + if (rhs1 && TREE_CODE (rhs1) == SSA_NAME) + { + stmt1 = SSA_NAME_DEF_STMT (rhs1); + def_bb = gimple_bb (stmt1); + if (stmt1 + && def_bb + && (def_bb == bb + || !dominated_by_p (CDI_DOMINATORS, e->src, def_bb))) + { + if (dump_file && (dump_flags & TDF_DETAILS)) + { + fprintf (dump_file, + "stmt1: Statement NOT moved to %d from %d\n", + e->src->index, bb->index); + print_gimple_stmt (dump_file, stmt, 0); + fprintf (dump_file, "(cost %u) out of loop %d.\n\n", + cost, level->num); + } + gsi_next (&bsi); + continue; + } + } + if (rhs2 && TREE_CODE (rhs2) == SSA_NAME) + { + stmt2 = SSA_NAME_DEF_STMT (rhs2); + def_bb = gimple_bb (stmt2); + if (stmt2 && def_bb + && (def_bb == bb + || !dominated_by_p (CDI_DOMINATORS, e->src, def_bb))) + { + if (dump_file && (dump_flags & TDF_DETAILS)) + { + fprintf (dump_file, + "stmt2: Statement NOT moved to %d from %d\n", + e->src->index, bb->index); + print_gimple_stmt (dump_file, stmt, 0); + fprintf (dump_file, "(cost %u) out of loop %d.\n\n", + cost, level->num); + } + gsi_next (&bsi); + continue; + } + } + } + } if (dump_file && (dump_flags & TDF_DETAILS)) { @@ -1231,7 +1380,6 @@ move_computations_worker (basic_block bb) cost, level->num); } - e = loop_preheader_edge (level); gcc_assert (!gimple_vdef (stmt)); if (gimple_vuse (stmt)) { @@ -2133,6 +2281,19 @@ execute_sm (class loop *loop, im_mem_ref *ref, bool multi_threaded_model_p = false; gimple_stmt_iterator gsi; sm_aux *aux = new sm_aux; + basic_block bb = gimple_bb (first_mem_ref_loc (loop, ref)->stmt); + + edge e = loop_preheader_edge (loop); + if (e->src->count > bb->count) + { + if (dump_file && (dump_flags & TDF_DETAILS)) + { + fprintf (dump_file, "Don't execute store motion of "); + print_generic_expr (dump_file, ref->mem.ref); + fprintf (dump_file, " from loop %d\n", loop->num); + } + return; + } if (dump_file && (dump_flags & TDF_DETAILS)) { @@ -2241,7 +2402,12 @@ execute_sm_exit (class loop *loop, edge ex, vec &seq, } else { - sm_aux *aux = *aux_map.get (ref); + sm_aux **paux = aux_map.get (ref); + sm_aux *aux; + if (paux) + aux = *paux; + else + continue; if (!aux->store_flag || kind == sm_ord) { gassign *store; diff --git a/gcc/testsuite/gcc.dg/tree-ssa/recip-3.c b/gcc/testsuite/gcc.dg/tree-ssa/recip-3.c index 638bf38db8c..641c91e719e 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/recip-3.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/recip-3.c @@ -23,4 +23,4 @@ float h () F[0] += E / d; } -/* { dg-final { scan-tree-dump-times " / " 1 "recip" } } */ +/* { dg-final { scan-tree-dump-times " / " 5 "recip" } } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-16.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-16.c new file mode 100644 index 00000000000..49b5979dc7d --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-16.c @@ -0,0 +1,21 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-lim2-details" } */ + +volatile int x; +void +assert_fail (int, char *, char *); +void +foo (int *a, int n, int k) +{ + int i; + + for (i = 0; i < n; i++) + { + if (__builtin_expect (x, 0)) + assert_fail (k / 5, "one", "two"); + a[i] = k; + } +} + +/* { dg-final { scan-tree-dump-times "stmt: Statement NOT moved to" 1 "lim2" } } */ +/* { dg-final { scan-tree-dump-times "stmt: Statement NOT moved to" 1 "lim4" } } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-17.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-17.c new file mode 100644 index 00000000000..3b1c7c0cb3e --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-17.c @@ -0,0 +1,26 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-lim2-details" } */ + +volatile int x; +void +assert_fail (int, char *, char *); +void +foo (int *a, int n, int m, int k, int s) +{ + int i; + int j; + + for (i = 0; i < m; i++) + { + if (__builtin_expect (x, 0)) + for (j = 0; j < n; j++) + { + assert_fail (k / 5, "one", "two"); + a[s] = k; + } + a[s] = s; + } +} + +/* { dg-final { scan-tree-dump-times "out of loop 2" 4 "lim2" } } */ +/* { dg-final { scan-tree-dump-times "out of loop 1" 3 "lim2" } } */ -- 2.27.0.90.geebb51ba8c