From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx0a-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com [148.163.158.5]) by sourceware.org (Postfix) with ESMTPS id 7AB8B3858D29; Fri, 24 Sep 2021 06:29:13 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 7AB8B3858D29 Received: from pps.filterd (m0098413.ppops.net [127.0.0.1]) by mx0b-001b2d01.pphosted.com (8.16.1.2/8.16.1.2) with SMTP id 18O4Anit005422; Fri, 24 Sep 2021 02:29:12 -0400 Received: from pps.reinject (localhost [127.0.0.1]) by mx0b-001b2d01.pphosted.com with ESMTP id 3b93svxmd5-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 24 Sep 2021 02:29:12 -0400 Received: from m0098413.ppops.net (m0098413.ppops.net [127.0.0.1]) by pps.reinject (8.16.0.43/8.16.0.43) with SMTP id 18O6OhCY008944; Fri, 24 Sep 2021 02:29:12 -0400 Received: from ppma04ams.nl.ibm.com (63.31.33a9.ip4.static.sl-reverse.com [169.51.49.99]) by mx0b-001b2d01.pphosted.com with ESMTP id 3b93svxmcr-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 24 Sep 2021 02:29:11 -0400 Received: from pps.filterd (ppma04ams.nl.ibm.com [127.0.0.1]) by ppma04ams.nl.ibm.com (8.16.1.2/8.16.1.2) with SMTP id 18O6CIYh013748; Fri, 24 Sep 2021 06:29:10 GMT Received: from b06cxnps4074.portsmouth.uk.ibm.com (d06relay11.portsmouth.uk.ibm.com [9.149.109.196]) by ppma04ams.nl.ibm.com with ESMTP id 3b93ga2ef5-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 24 Sep 2021 06:29:10 +0000 Received: from d06av23.portsmouth.uk.ibm.com (d06av23.portsmouth.uk.ibm.com [9.149.105.59]) by b06cxnps4074.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 18O6T6eN55050656 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Fri, 24 Sep 2021 06:29:06 GMT Received: from d06av23.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 8FF6BA4065; Fri, 24 Sep 2021 06:29:06 +0000 (GMT) Received: from d06av23.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id F08BDA405D; Fri, 24 Sep 2021 06:29:03 +0000 (GMT) Received: from luoxhus-MacBook-Pro.local (unknown [9.200.155.166]) by d06av23.portsmouth.uk.ibm.com (Postfix) with ESMTPS; Fri, 24 Sep 2021 06:29:03 +0000 (GMT) Subject: Re: [RFC] Don't move cold code out of loop by checking bb count To: Richard Biener Cc: GCC Patches , Segher Boessenkool , David Edelsohn , Bill Schmidt , Jiufu Guo , linkw@gcc.gnu.org, Feng Xue OS , Jan Hubicka References: <20210802050501.159058-1-luoxhu@linux.ibm.com> From: Xionghu Luo Message-ID: Date: Fri, 24 Sep 2021 14:29:00 +0800 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:78.0) Gecko/20100101 Thunderbird/78.14.0 In-Reply-To: Content-Type: text/plain; charset=utf-8 Content-Language: en-US X-TM-AS-GCONF: 00 X-Proofpoint-GUID: CsFPnLaG0Hjnx9iUkLSoG_b8NXXXp34t X-Proofpoint-ORIG-GUID: BBFs9JziRGzqWUkkpWz0n8hTAOZp5OUq Content-Transfer-Encoding: 7bit X-Proofpoint-UnRewURL: 0 URL was un-rewritten MIME-Version: 1.0 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.182.1,Aquarius:18.0.790,Hydra:6.0.391,FMLib:17.0.607.475 definitions=2021-09-24_01,2021-09-23_01,2020-04-07_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 mlxscore=0 suspectscore=0 bulkscore=0 lowpriorityscore=0 adultscore=0 priorityscore=1501 spamscore=0 impostorscore=0 mlxlogscore=999 phishscore=0 malwarescore=0 clxscore=1015 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2109230001 definitions=main-2109240034 X-Spam-Status: No, score=-12.0 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT, NICE_REPLY_A, RCVD_IN_MSPIKE_H4, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 24 Sep 2021 06:29:15 -0000 Update the patch to v3, not sure whether you prefer the paste style and continue to link the previous thread as Segher dislikes this... [PATCH v3] Don't move cold code out of loop by checking bb count Changes: 1. Handle max_loop in determine_max_movement instead of outermost_invariant_loop. 2. Remove unnecessary changes. 3. Add for_all_locs_in_loop (loop, ref, ref_in_loop_hot_body) in can_sm_ref_p. 4. "gsi_next (&bsi);" in move_computations_worker is kept since it caused infinite loop when implementing v1 and the iteration is missed to be updated actually. v1: https://gcc.gnu.org/pipermail/gcc-patches/2021-August/576488.html v2: https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579086.html There was a patch trying to avoid move cold block out of loop: https://gcc.gnu.org/pipermail/gcc/2014-November/215551.html Richard suggested to "never hoist anything from a bb with lower execution frequency to a bb with higher one in LIM invariantness_dom_walker before_dom_children". In gimple LIM analysis, add find_coldest_out_loop to move invariants to expected target loop, if profile count of the loop bb is colder than target loop preheader, it won't be hoisted out of loop. Likely for store motion, if all locations of the REF in loop is cold, don't do store motion of it. SPEC2017 performance evaluation shows 1% performance improvement for intrate GEOMEAN and no obvious regression for others. Especially, 500.perlbench_r +7.52% (Perf shows function S_regtry of perlbench is largely improved.), and 548.exchange2_r+1.98%, 526.blender_r +1.00% on P8LE. gcc/ChangeLog: * loop-invariant.c (find_invariants_bb): Check profile count before motion. (find_invariants_body): Add argument. * tree-ssa-loop-im.c (find_coldest_out_loop): New function. (determine_max_movement): Use find_coldest_out_loop. (move_computations_worker): Adjust and fix iteration udpate. (execute_sm_exit): Check pointer validness. (class ref_in_loop_hot_body): New functor. (ref_in_loop_hot_body::operator): New. (can_sm_ref_p): Use for_all_locs_in_loop. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/recip-3.c: Adjust. * gcc.dg/tree-ssa/ssa-lim-18.c: New test. * gcc.dg/tree-ssa/ssa-lim-19.c: New test. * gcc.dg/tree-ssa/ssa-lim-20.c: New test. --- gcc/loop-invariant.c | 10 ++-- gcc/tree-ssa-loop-im.c | 61 ++++++++++++++++++++-- gcc/testsuite/gcc.dg/tree-ssa/recip-3.c | 2 +- gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-18.c | 20 +++++++ gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c | 27 ++++++++++ gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-20.c | 25 +++++++++ gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-21.c | 28 ++++++++++ 7 files changed, 165 insertions(+), 8 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-18.c create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-20.c create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-21.c diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c index fca0c2b24be..5c3be7bf0eb 100644 --- a/gcc/loop-invariant.c +++ b/gcc/loop-invariant.c @@ -1183,9 +1183,14 @@ find_invariants_insn (rtx_insn *insn, bool always_reached, bool always_executed) call. */ static void -find_invariants_bb (basic_block bb, bool always_reached, bool always_executed) +find_invariants_bb (class loop *loop, basic_block bb, bool always_reached, + bool always_executed) { rtx_insn *insn; + basic_block preheader = loop_preheader_edge (loop)->src; + + if (preheader->count > bb->count) + return; FOR_BB_INSNS (bb, insn) { @@ -1214,8 +1219,7 @@ find_invariants_body (class loop *loop, basic_block *body, unsigned i; for (i = 0; i < loop->num_nodes; i++) - find_invariants_bb (body[i], - bitmap_bit_p (always_reached, i), + find_invariants_bb (loop, body[i], bitmap_bit_p (always_reached, i), bitmap_bit_p (always_executed, i)); } diff --git a/gcc/tree-ssa-loop-im.c b/gcc/tree-ssa-loop-im.c index 4b187c2cdaf..655fab03442 100644 --- a/gcc/tree-ssa-loop-im.c +++ b/gcc/tree-ssa-loop-im.c @@ -417,6 +417,28 @@ movement_possibility (gimple *stmt) return ret; } +/* Find coldest loop between outmost_loop and loop by comapring profile count. */ + +static class loop * +find_coldest_out_loop (class loop *outmost_loop, class loop *loop, + basic_block curr_bb) +{ + class loop *cold_loop, *min_loop; + cold_loop = min_loop = outmost_loop; + profile_count min_count = loop_preheader_edge (min_loop)->src->count; + + if (curr_bb && curr_bb->count < loop_preheader_edge (loop)->src->count) + return NULL; + + while (min_loop != loop) + { + min_loop = superloop_at_depth (loop, loop_depth (min_loop) + 1); + if (loop_preheader_edge (min_loop)->src->count < min_count) + cold_loop = min_loop; + } + return cold_loop; +} + /* Suppose that operand DEF is used inside the LOOP. Returns the outermost loop to that we could move the expression using DEF if it did not have other operands, i.e. the outermost loop enclosing LOOP in that the value @@ -685,7 +707,9 @@ determine_max_movement (gimple *stmt, bool must_preserve_exec) level = ALWAYS_EXECUTED_IN (bb); else level = superloop_at_depth (loop, 1); - lim_data->max_loop = level; + lim_data->max_loop = find_coldest_out_loop (level, loop, bb); + if (!lim_data->max_loop) + return false; if (gphi *phi = dyn_cast (stmt)) { @@ -1198,7 +1222,6 @@ move_computations_worker (basic_block bb) for (gimple_stmt_iterator bsi = gsi_start_bb (bb); !gsi_end_p (bsi); ) { edge e; - gimple *stmt = gsi_stmt (bsi); lim_data = get_lim_data (stmt); @@ -1221,7 +1244,10 @@ move_computations_worker (basic_block bb) /* We do not really want to move conditionals out of the loop; we just placed it here to force its operands to be moved if necessary. */ if (gimple_code (stmt) == GIMPLE_COND) - continue; + { + gsi_next (&bsi); + continue; + } if (dump_file && (dump_flags & TDF_DETAILS)) { @@ -2241,7 +2267,12 @@ execute_sm_exit (class loop *loop, edge ex, vec &seq, } else { - sm_aux *aux = *aux_map.get (ref); + sm_aux **paux = aux_map.get (ref); + sm_aux *aux; + if (paux) + aux = *paux; + else + continue; if (!aux->store_flag || kind == sm_ord) { gassign *store; @@ -2887,6 +2918,25 @@ ref_indep_loop_p (class loop *loop, im_mem_ref *ref, dep_kind kind) return indep_p; } +class ref_in_loop_hot_body +{ +public: + ref_in_loop_hot_body (loop *loop_) : l (loop_) {} + bool operator () (mem_ref_loc *loc); + class loop *l; +}; + +bool +ref_in_loop_hot_body::operator () (mem_ref_loc *loc) +{ + basic_block curr_bb = gimple_bb (loc->stmt); + edge e = loop_preheader_edge (l); + if (e->src->count > curr_bb->count) + return false; + else + return true; +} + /* Returns true if we can perform store motion of REF from LOOP. */ @@ -2941,6 +2991,9 @@ can_sm_ref_p (class loop *loop, im_mem_ref *ref) if (!ref_indep_loop_p (loop, ref, sm_war)) return false; + if (!for_all_locs_in_loop (loop, ref, ref_in_loop_hot_body (loop))) + return false; + return true; } diff --git a/gcc/testsuite/gcc.dg/tree-ssa/recip-3.c b/gcc/testsuite/gcc.dg/tree-ssa/recip-3.c index 638bf38db8c..641c91e719e 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/recip-3.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/recip-3.c @@ -23,4 +23,4 @@ float h () F[0] += E / d; } -/* { dg-final { scan-tree-dump-times " / " 1 "recip" } } */ +/* { dg-final { scan-tree-dump-times " / " 5 "recip" } } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-18.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-18.c new file mode 100644 index 00000000000..7326a230b3f --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-18.c @@ -0,0 +1,20 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-lim2-details" } */ + +volatile int x; +void +bar (int, char *, char *); +void +foo (int *a, int n, int k) +{ + int i; + + for (i = 0; i < n; i++) + { + if (__builtin_expect (x, 0)) + bar (k / 5, "one", "two"); + a[i] = k; + } +} + +/* { dg-final { scan-tree-dump-not "out of loop 1" "lim2" } } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c new file mode 100644 index 00000000000..f0a99fa42b4 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c @@ -0,0 +1,27 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-lim2-details" } */ + +volatile int x; +void +bar (int, char *, char *); +void +foo (int *a, int n, int m, int k, int s) +{ + int i; + int j; + + for (i = 0; i < m; i++) + { + if (__builtin_expect (x, 0)) + for (j = 0; j < n; j++) + { + bar (k / 5, "one", "two"); + a[s] = k; + } + a[s] = s; + } +} + +/* { dg-final { scan-tree-dump-times "out of loop 2" 4 "lim2" } } */ +/* { dg-final { scan-tree-dump-times "out of loop 1" 3 "lim2" } } */ + diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-20.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-20.c new file mode 100644 index 00000000000..bc60a040a70 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-20.c @@ -0,0 +1,25 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-lim2-details" } */ + +/* Test that `count' is not hoisted out of loop when bb is cold. */ + +int count; +volatile int x; + +struct obj { + int data; + struct obj *next; + +} *q; + +void +func (int m) +{ + struct obj *p; + for (int i = 0; i < m; i++) + if (__builtin_expect (x, 0)) + count++; + +} + +/* { dg-final { scan-tree-dump-not "Executing store motion of" "lim2" } } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-21.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-21.c new file mode 100644 index 00000000000..fedaa3b7119 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-21.c @@ -0,0 +1,28 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-lim2-details" } */ + +/* Test that `count' is hoisted out of loop when one of it's used bb is hot. */ + +int count; +volatile int x; + +struct obj { + int data; + struct obj *next; + +} *q; + +void +func (int m, int n) +{ + struct obj *p; + for (int i = 0; i < m; i++) + { + if (__builtin_expect (x, 0)) + count++; + count += n; + } +} + +/* { dg-final { scan-tree-dump-times "Executing store motion of" 1 "lim2" } } */ + -- 2.27.0.90.geebb51ba8c