Date: Wed, 24 Nov 2021 13:15:23 +0800
Subject: Ping: [PATCH v7 2/2] Don't move cold code out of loop by checking bb count
To: Richard Biener
Cc: Segher Boessenkool, Bill Schmidt, linkw@gcc.gnu.org, GCC Patches, Jan Hubicka, David Edelsohn
References: <20210802050501.159058-1-luoxhu@linux.ibm.com> <53b7c729-33c0-138f-fa06-d6efb7a43911@linux.ibm.com> <0b675ba1-cdab-652a-0bba-704b0090f1c5@linux.ibm.com> <2f0aee4a-dfea-d04d-6179-4d4623702f0a@linux.ibm.com> <409e3be5-0bfd-cdc4-2d93-9dbacdbc69cf@linux.ibm.com>
From: Xionghu Luo
In-Reply-To: <409e3be5-0bfd-cdc4-2d93-9dbacdbc69cf@linux.ibm.com>
List-Id: Gcc-patches mailing list

Gentle ping. Is this patch still suitable for stage 3?  Thanks.

[PATCH v7 2/2] Don't move cold code out of loop by checking bb count
https://gcc.gnu.org/pipermail/gcc-patches/2021-November/583911.html

On 2021/11/10 11:08, Xionghu Luo via Gcc-patches wrote:
>
>
> On 2021/11/4 21:00, Richard Biener wrote:
>> On Wed, Nov 3, 2021 at 2:29 PM Xionghu Luo wrote:
>>>
>>>
>>>> +  while (outmost_loop != loop)
>>>> +    {
>>>> +      if (bb_colder_than_loop_preheader (loop_preheader_edge
>>>> (outmost_loop)->src,
>>>> +                                         loop_preheader_edge (cold_loop)->src))
>>>> +        cold_loop = outmost_loop;
>>>> +      outmost_loop = superloop_at_depth (loop, loop_depth (outmost_loop) + 1);
>>>> +    }
>>>>
>>>> could be instead written as
>>>>
>>>>   coldest_loop = coldest_outermost_loop[loop->num];
>>>>   if (loop_depth (coldest_loop) < loop_depth (outermost_loop))
>>>>     return outermost_loop;
>>>>   return coldest_loop;
>>>>
>>>> ?  And in the usual case coldest_outermost_loop[L] would be the loop tree root.
>>>> It should be possible to compute such cache in a DFS walk of the loop tree
>>>> (the loop iterator by default visits in such order).
>>>
>>>
>>> Thanks.  Updated the patch with your suggestion.  Not sure whether it strictly
>>> conforms to your comments.  Though the patch passed all my added tests(coverage not enough),
>>> I am still a bit worried if pre-computed coldest_loop is outside of outermost_loop, but
>>> outermost_loop is not the COLDEST LOOP, i.e. (outer->inner)
>>>
>>> [loop tree root, coldest_loop, outermost_loop,..., second_coldest_loop, ..., loop],
>>>
>>> then function find_coldest_out_loop will return a loop NOT accord with our
>>> expectation, that should return second_coldest_loop instead of outermost_loop?
>> Hmm, interesting - yes.  I guess the common case will be that the pre-computed
>> outermost loop will be the loop at depth 1 since outer loops tend to
>> be colder than
>> inner loops?  That would then defeat the whole exercise.
>
> It is not easy to construct such cases, but finally I got the results below:
>
> 1) In many cases the inner loop is hotter than the outer loop, for example:
>
> loop 1's coldest_outermost_loop is 1, colder_than_inner_loop is NULL
> loop 2's coldest_outermost_loop is 1, colder_than_inner_loop is 1
> loop 3's coldest_outermost_loop is 1, colder_than_inner_loop is 2
> loop 4's coldest_outermost_loop is 1, colder_than_inner_loop is 2
>
>
> 2) But there are also cases where the inner loop is colder than the outer loop, like:
>
> loop 1's coldest outermost loop is 1, colder_than_inner_loop is NULL
> loop 2's coldest outermost loop is 2, colder_than_inner_loop is NULL
> loop 3's coldest outermost loop is 3, colder_than_inner_loop is NULL
>
>
>>
>> To optimize the common case but not avoiding iteration in the cases we care
>> about we could instead cache the next outermost loop that is _not_ colder
>> than loop.  So for your [ ... ] example above we'd have
>> hotter_than_inner_loop[loop] == outer (second_coldest_loop), where the
>> candidate would then be 'second_coldest_loop' and we'd then iterate
>> to hotter_than_inner_loop[hotter_than_inner_loop[loop]] to find the next
>> cold candidate we can compare against?  For the common case we'd
>> have hotter_than_inner_loop[loop] == NULL (no such loop) and we then
>> simply pick 'outermost_loop'.
>
> Thanks.  It was difficult to understand, but finally I got to know what you
> want to express :)
>
> We should cache the next loop that is *colder* than loop instead of '_not_ colder
> than loop', and 'hotter_than_inner_loop' should be 'colder_than_inner_loop',
> then it makes sense if the coldest loop is outside of outermost loop, continue to
> find a colder loop between outermost loop and current loop in
> colder_than_inner_loop[loop->num]?  Hope I understood you correctly...
>
>>
>> One comment on the patch itself below.
>>
>
> The loop in fill_cold_out_loop is also removed in the updated v7 patch.
>
>
>
> [PATCH v7 2/2] Don't move cold code out of loop by checking bb count
>
> From: Xiong Hu Luo
>
> v7 changes:
>  1. Refine get_coldest_out_loop to replace loop with checking
>     pre-computed coldest_outermost_loop and colder_than_inner_loop.
>  2. Add function fill_cold_out_loop, compute coldest_outermost_loop and
>     colder_than_inner_loop recursively without loop.
>
> v6 changes:
>  1. Add function fill_coldest_out_loop to pre compute the coldest
>     outermost loop for each loop.
>  2. Rename find_coldest_out_loop to get_coldest_out_loop.
>  3. Add testcase ssa-lim-22.c to differentiate with ssa-lim-19.c.
>
> v5 changes:
>  1. Refine comments for new functions.
>  2. Use basic_block instead of count in bb_colder_than_loop_preheader
>     to align with function name.
>  3. Refine with simpler implementation for get_coldest_out_loop and
>     ref_in_loop_hot_body::operator for better understanding.
>
> v4 changes:
>  1. Sort out profile_count comparison to function bb_cold_than_loop_preheader.
>  2. Update ref_in_loop_hot_body::operator () to find cold_loop before compare.
>  3. Split RTL invariant motion part out.
>  4. Remove aux changes.
>
> v3 changes:
>  1. Handle max_loop in determine_max_movement instead of outermost_invariant_loop.
>  2. Remove unnecessary changes.
>  3. Add for_all_locs_in_loop (loop, ref, ref_in_loop_hot_body) in can_sm_ref_p.
>  4. "gsi_next (&bsi);" in move_computations_worker is kept since it caused
>     infinite loop when implementing v1 and the iteration is missed to be
>     updated actually.
>
> v1: https://gcc.gnu.org/pipermail/gcc-patches/2021-August/576488.html
> v2: https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579086.html
> v3: https://gcc.gnu.org/pipermail/gcc-patches/2021-September/580211.html
> v4: https://gcc.gnu.org/pipermail/gcc-patches/2021-October/581231.html
> v5: https://gcc.gnu.org/pipermail/gcc-patches/2021-October/581961.html
>
> There was a patch trying to avoid moving cold blocks out of loops:
>
> https://gcc.gnu.org/pipermail/gcc/2014-November/215551.html
>
> Richard suggested to "never hoist anything from a bb with lower execution
> frequency to a bb with higher one in LIM invariantness_dom_walker
> before_dom_children".
>
> In gimple LIM analysis, add get_coldest_out_loop to move invariants to
> expected target loop, if profile count of the loop bb is colder
> than target loop preheader, it won't be hoisted out of loop.
> Likewise for store motion, if all locations of the REF in loop are cold,
> don't do store motion of it.
>
> SPEC2017 performance evaluation shows 1% performance improvement for
> intrate GEOMEAN and no obvious regression for others.  Especially,
> 500.perlbench_r +7.52% (Perf shows function S_regtry of perlbench is
> largely improved.), and 548.exchange2_r+1.98%, 526.blender_r +1.00%
> on P8LE.
>
> gcc/ChangeLog:
>
> 	* tree-ssa-loop-im.c (bb_colder_than_loop_preheader): New
> 	function.
> 	(get_coldest_out_loop): New function.
> 	(determine_max_movement): Use get_coldest_out_loop.
> 	(move_computations_worker): Adjust and fix iteration update.
> 	(class ref_in_loop_hot_body): New functor.
> 	(ref_in_loop_hot_body::operator): New.
> 	(can_sm_ref_p): Use for_all_locs_in_loop.
> 	(fill_cold_out_loop): New.
> 	(tree_ssa_lim_finalize): Free coldest_outermost_loop and
> 	colder_than_inner_loop.
> 	(loop_invariant_motion_in_fun): Call fill_cold_out_loop.
>
> gcc/testsuite/ChangeLog:
>
> 	* gcc.dg/tree-ssa/recip-3.c: Adjust.
> 	* gcc.dg/tree-ssa/ssa-lim-18.c: New test.
> 	* gcc.dg/tree-ssa/ssa-lim-19.c: New test.
> 	* gcc.dg/tree-ssa/ssa-lim-20.c: New test.
> 	* gcc.dg/tree-ssa/ssa-lim-21.c: New test.
> 	* gcc.dg/tree-ssa/ssa-lim-22.c: New test.
> ---
>  gcc/tree-ssa-loop-im.c                     | 140 ++++++++++++++++++++-
>  gcc/testsuite/gcc.dg/tree-ssa/recip-3.c    |   2 +-
>  gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-18.c |  20 +++
>  gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c |  29 +++++
>  gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-20.c |  25 ++++
>  gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-21.c |  35 ++++++
>  gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-22.c |  32 +++++
>  7 files changed, 280 insertions(+), 3 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-18.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-20.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-21.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-22.c
>
> diff --git a/gcc/tree-ssa-loop-im.c b/gcc/tree-ssa-loop-im.c
> index 4b187c2cdaf..e9b3d0cba93 100644
> --- a/gcc/tree-ssa-loop-im.c
> +++ b/gcc/tree-ssa-loop-im.c
> @@ -146,6 +146,11 @@ public:
>  enum dep_kind { lim_raw, sm_war, sm_waw };
>  enum dep_state { dep_unknown, dep_independent, dep_dependent };
>
> +/* coldest outermost loop for given loop.  */
> +class loop **coldest_outermost_loop;
> +/* colder outer loop nearest to given loop.  */
> +class loop **colder_than_inner_loop;
> +
>  /* Populate the loop dependence cache of REF for LOOP, KIND with STATE.  */
>
>  static void
> @@ -417,6 +422,53 @@ movement_possibility (gimple *stmt)
>    return ret;
>  }
>
> +/* Compare the profile count inequality of bb and preheader, it is three-state
> +   as stated in profile-count.h, FALSE is returned if inequality cannot be
> +   decided.  */
> +bool bb_colder_than_loop_preheader (basic_block bb, basic_block preheader)
> +{
> +  gcc_assert (bb && preheader);
> +  return bb->count < preheader->count;
> +}
> +
> +/* Check coldest loop between OUTERMOST_LOOP and LOOP by comparing profile
> +   count.
> +   It does three steps check:
> +   1) Check whether CURR_BB is cold in its own loop_father, if it is cold, just
> +   return NULL which means it should not be moved out at all;
> +   2) CURR_BB is NOT cold, check if pre-computed COLDEST_LOOP is outside of
> +   OUTERMOST_LOOP, if it is inside of OUTERMOST_LOOP, return the COLDEST_LOOP;
> +   3) If COLDEST_LOOP is outside of OUTERMOST_LOOP, check whether there is a
> +   colder loop between OUTERMOST_LOOP and loop in pre-computed
> +   COLDER_THAN_INNER_LOOP, return it, otherwise return OUTERMOST_LOOP.  */
> +
> +static class loop *
> +get_coldest_out_loop (class loop *outermost_loop, class loop *loop,
> +                      basic_block curr_bb)
> +{
> +  gcc_assert (outermost_loop == loop
> +              || flow_loop_nested_p (outermost_loop, loop));
> +
> +  /* If bb_colder_than_loop_preheader returns false due to three-state
> +    comparison, OUTERMOST_LOOP is returned finally to preserve the behavior.
> +    Otherwise, return the coldest loop between OUTERMOST_LOOP and LOOP.  */
> +  if (curr_bb
> +      && bb_colder_than_loop_preheader (curr_bb,
> +                                        loop_preheader_edge (loop)->src))
> +    return NULL;
> +
> +  class loop *coldest_loop = coldest_outermost_loop[loop->num];
> +  if (loop_depth (coldest_loop) < loop_depth (outermost_loop))
> +    {
> +      if (colder_than_inner_loop[loop->num] != NULL
> +          && loop_depth (outermost_loop)
> +               < loop_depth (colder_than_inner_loop[loop->num]))
> +        return colder_than_inner_loop[loop->num];
> +      return outermost_loop;
> +    }
> +  return coldest_loop;
> +}
> +
>  /* Suppose that operand DEF is used inside the LOOP.  Returns the outermost
>     loop to that we could move the expression using DEF if it did not have
>     other operands, i.e. the outermost loop enclosing LOOP in that the value
> @@ -685,7 +737,9 @@ determine_max_movement (gimple *stmt, bool must_preserve_exec)
>      level = ALWAYS_EXECUTED_IN (bb);
>    else
>      level = superloop_at_depth (loop, 1);
> -  lim_data->max_loop = level;
> +  lim_data->max_loop = get_coldest_out_loop (level, loop, bb);
> +  if (!lim_data->max_loop)
> +    return false;
>
>    if (gphi *phi = dyn_cast <gphi *> (stmt))
>      {
> @@ -1221,7 +1275,10 @@ move_computations_worker (basic_block bb)
>        /* We do not really want to move conditionals out of the loop; we just
>           placed it here to force its operands to be moved if necessary.  */
>        if (gimple_code (stmt) == GIMPLE_COND)
> -        continue;
> +        {
> +          gsi_next (&bsi);
> +          continue;
> +        }
>
>        if (dump_file && (dump_flags & TDF_DETAILS))
>          {
> @@ -2887,6 +2944,26 @@ ref_indep_loop_p (class loop *loop, im_mem_ref *ref, dep_kind kind)
>    return indep_p;
>  }
>
> +class ref_in_loop_hot_body
> +{
> +public:
> +  ref_in_loop_hot_body (loop *loop_) : l (loop_) {}
> +  bool operator () (mem_ref_loc *loc);
> +  class loop *l;
> +};
> +
> +/* Check the coldest loop between loop L and innermost loop.  If there is one
> +   cold loop between L and INNER_LOOP, store motion can be performed, otherwise
> +   no cold loop means no store motion.  get_coldest_out_loop also handles cases
> +   when l is inner_loop.  */
> +bool
> +ref_in_loop_hot_body::operator () (mem_ref_loc *loc)
> +{
> +  basic_block curr_bb = gimple_bb (loc->stmt);
> +  class loop *inner_loop = curr_bb->loop_father;
> +  return get_coldest_out_loop (l, inner_loop, curr_bb);
> +}
> +
>
>  /* Returns true if we can perform store motion of REF from LOOP.  */
>
> @@ -2941,6 +3018,12 @@ can_sm_ref_p (class loop *loop, im_mem_ref *ref)
>    if (!ref_indep_loop_p (loop, ref, sm_war))
>      return false;
>
> +  /* Verify whether the candidate is hot for LOOP.  Only do store motion if the
> +    candidate's profile count is hot.  Statement in cold BB shouldn't be moved
> +    out of its loop_father.  */
> +  if (!for_all_locs_in_loop (loop, ref, ref_in_loop_hot_body (loop)))
> +    return false;
> +
>    return true;
>  }
>
> @@ -3153,6 +3236,48 @@ fill_always_executed_in (void)
>      fill_always_executed_in_1 (loop, contains_call);
>  }
>
> +/* Find the coldest loop preheader for LOOP, also find the nearest colder loop
> +   to LOOP.  Then recursively iterate each inner loop.  */
> +
> +void
> +fill_cold_out_loop (class loop *coldest_loop, class loop *outer_loop,
> +                    class loop *loop)
> +{
> +  class loop *colder_loop = NULL;
> +  if (outer_loop)
> +    {
> +      if (bb_colder_than_loop_preheader (loop_preheader_edge (outer_loop)->src,
> +                                         loop_preheader_edge (loop)->src))
> +        colder_loop = outer_loop;
> +      else if (colder_than_inner_loop[outer_loop->num]
> +               && bb_colder_than_loop_preheader (
> +                 loop_preheader_edge (colder_than_inner_loop[outer_loop->num])
> +                   ->src,
> +                 loop_preheader_edge (loop)->src))
> +        colder_loop = colder_than_inner_loop[outer_loop->num];
> +    }
> +
> +  if (bb_colder_than_loop_preheader (loop_preheader_edge (loop)->src,
> +                                     loop_preheader_edge (coldest_loop)->src))
> +    coldest_loop = loop;
> +
> +  coldest_outermost_loop[loop->num] = coldest_loop;
> +  colder_than_inner_loop[loop->num] = colder_loop;
> +  if (dump_enabled_p ())
> +    {
> +      dump_printf (MSG_NOTE, "loop %d's coldest_outermost_loop is %d, ",
> +                   loop->num, coldest_loop->num);
> +      if (colder_loop)
> +        dump_printf (MSG_NOTE, "colder_than_inner_loop is %d\n",
> +                     colder_loop->num);
> +      else
> +        dump_printf (MSG_NOTE, "colder_than_inner_loop is NULL\n");
> +    }
> +
> +  class loop *inner_loop;
> +  for (inner_loop = loop->inner; inner_loop; inner_loop = inner_loop->next)
> +    fill_cold_out_loop (coldest_loop, loop, inner_loop);
> +}
>
>  /* Compute the global information needed by the loop invariant motion pass.  */
>
> @@ -3237,6 +3362,9 @@ tree_ssa_lim_finalize (void)
>    free_affine_expand_cache (&memory_accesses.ttae_cache);
>
>    free (bb_loop_postorder);
> +
> +  free (coldest_outermost_loop);
> +  free (colder_than_inner_loop);
>  }
>
>  /* Moves invariants from loops.  Only "expensive" invariants are moved out --
> @@ -3256,6 +3384,14 @@ loop_invariant_motion_in_fun (function *fun, bool store_motion)
>    /* Fills ALWAYS_EXECUTED_IN information for basic blocks.  */
>    fill_always_executed_in ();
>
> +  /* Pre-compute coldest outermost loop and nearest colder loop of each loop.
> +   */
> +  class loop *loop;
> +  coldest_outermost_loop = XNEWVEC (class loop *, number_of_loops (cfun));
> +  colder_than_inner_loop = XNEWVEC (class loop *, number_of_loops (cfun));
> +  for (loop = current_loops->tree_root->inner; loop != NULL; loop = loop->next)
> +    fill_cold_out_loop (loop, NULL, loop);
> +
>    int *rpo = XNEWVEC (int, last_basic_block_for_fn (fun));
>    int n = pre_and_rev_post_order_compute_fn (fun, NULL, rpo, false);
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/recip-3.c b/gcc/testsuite/gcc.dg/tree-ssa/recip-3.c
> index 638bf38db8c..641c91e719e 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/recip-3.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/recip-3.c
> @@ -23,4 +23,4 @@ float h ()
>    F[0] += E / d;
>  }
>
> -/* { dg-final { scan-tree-dump-times " / " 1 "recip" } } */
> +/* { dg-final { scan-tree-dump-times " / " 5 "recip" } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-18.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-18.c
> new file mode 100644
> index 00000000000..7326a230b3f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-18.c
> @@ -0,0 +1,20 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-lim2-details" } */
> +
> +volatile int x;
> +void
> +bar (int, char *, char *);
> +void
> +foo (int *a, int n, int k)
> +{
> +  int i;
> +
> +  for (i = 0; i < n; i++)
> +    {
> +      if (__builtin_expect (x, 0))
> +        bar (k / 5, "one", "two");
> +      a[i] = k;
> +    }
> +}
> +
> +/* { dg-final { scan-tree-dump-not "out of loop 1" "lim2" } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c
> new file mode 100644
> index 00000000000..51c1913d003
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c
> @@ -0,0 +1,29 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-lim2-details" } */
> +
> +volatile int x;
> +void
> +bar (int, char *, char *);
> +void
> +foo (int *a, int n, int m, int s, int t)
> +{
> +  int i;
> +  int j;
> +  int k;
> +
> +  for (i = 0; i < m; i++) // Loop 1
> +    {
> +      if (__builtin_expect (x, 0))
> +        for (j = 0; j < n; j++) // Loop 2
> +          for (k = 0; k < n; k++) // Loop 3
> +            {
> +              bar (s / 5, "one", "two");
> +              a[t] = s;
> +            }
> +      a[t] = t;
> +    }
> +}
> +
> +/* { dg-final { scan-tree-dump-times "out of loop 2" 4 "lim2" } } */
> +/* { dg-final { scan-tree-dump-times "out of loop 1" 3 "lim2" } } */
> +
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-20.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-20.c
> new file mode 100644
> index 00000000000..bc60a040a70
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-20.c
> @@ -0,0 +1,25 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-lim2-details" } */
> +
> +/* Test that `count' is not hoisted out of loop when bb is cold.  */
> +
> +int count;
> +volatile int x;
> +
> +struct obj {
> +  int data;
> +  struct obj *next;
> +
> +} *q;
> +
> +void
> +func (int m)
> +{
> +  struct obj *p;
> +  for (int i = 0; i < m; i++)
> +    if (__builtin_expect (x, 0))
> +      count++;
> +
> +}
> +
> +/* { dg-final { scan-tree-dump-not "Executing store motion of" "lim2" } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-21.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-21.c
> new file mode 100644
> index 00000000000..ffe6f8f699d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-21.c
> @@ -0,0 +1,35 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-lim2-details" } */
> +
> +/* Test that `data' and 'data1' is not hoisted out of inner loop and outer loop
> +   when it is in cold loop.  */
> +
> +int count;
> +volatile int x;
> +
> +struct obj {
> +  int data;
> +  int data1;
> +  struct obj *next;
> +};
> +
> +void
> +func (int m, int n, int k, struct obj *a)
> +{
> +  struct obj *q = a;
> +  for (int j = 0; j < m; j++)
> +    if (__builtin_expect (m, 0))
> +      for (int i = 0; i < m; i++)
> +        {
> +          if (__builtin_expect (x, 0))
> +            {
> +              count++;
> +              q->data += 3; /* Not hoisted out to inner loop.  */
> +            }
> +          count += n;
> +          q->data1 += k; /* Not hoisted out to outer loop.  */
> +        }
> +}
> +
> +/* { dg-final { scan-tree-dump-not "Executing store motion of" "lim2" } } */
> +
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-22.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-22.c
> new file mode 100644
> index 00000000000..16ba4ceb8ab
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-22.c
> @@ -0,0 +1,32 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-lim2-details" } */
> +
> +volatile int x;
> +volatile int y;
> +void
> +bar (int, char *, char *);
> +void
> +foo (int *a, int n, int m, int s, int t)
> +{
> +  int i;
> +  int j;
> +  int k;
> +
> +  for (i = 0; i < m; i++) // Loop 1
> +    {
> +      if (__builtin_expect (x, 0))
> +        for (j = 0; j < n; j++) // Loop 2
> +          if (__builtin_expect (y, 0))
> +            for (k = 0; k < n; k++) // Loop 3
> +              {
> +                bar (s / 5, "one", "two");
> +                a[t] = s;
> +              }
> +      a[t] = t;
> +    }
> +}
> +
> +/* { dg-final { scan-tree-dump-times "out of loop 3" 4 "lim2" } } */
> +/* { dg-final { scan-tree-dump-times "out of loop 1" 3 "lim2" } } */
> +
> +
>

--
Thanks,
Xionghu
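
P.S. For anyone following the v7 notes without a GCC tree at hand, the two caches and the lookup can be modelled on a toy loop tree.  This is only a sketch under assumptions, not the patch itself: struct toy_loop, fill_cold_out, get_coldest_out, the loop numbers and the counts are all hypothetical, and plain long counts stand in for GCC's three-state profile_count comparison.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for GCC's class loop; only what the sketch needs.  */
struct toy_loop
{
  int num;                 /* index into the two caches */
  int depth;               /* 1 for an outermost loop */
  long preheader_count;    /* plain integer, not profile_count */
  struct toy_loop *inner;  /* first nested loop */
  struct toy_loop *next;   /* next sibling */
};

enum { MAX_LOOPS = 8 };
static struct toy_loop *coldest_outermost[MAX_LOOPS];
static struct toy_loop *colder_than_inner[MAX_LOOPS];

/* Fill both caches for LOOP and, recursively, for every loop nested in it.  */
static void
fill_cold_out (struct toy_loop *coldest, struct toy_loop *outer,
               struct toy_loop *loop)
{
  struct toy_loop *colder = NULL;

  assert (loop->num >= 0 && loop->num < MAX_LOOPS);
  if (outer)
    {
      struct toy_loop *cand = colder_than_inner[outer->num];
      if (outer->preheader_count < loop->preheader_count)
        colder = outer;            /* the parent itself is colder */
      else if (cand && cand->preheader_count < loop->preheader_count)
        colder = cand;             /* reuse the parent's cached answer */
    }
  if (loop->preheader_count < coldest->preheader_count)
    coldest = loop;

  coldest_outermost[loop->num] = coldest;
  colder_than_inner[loop->num] = colder;
  for (struct toy_loop *in = loop->inner; in; in = in->next)
    fill_cold_out (coldest, loop, in);
}

/* Return the outermost loop a statement with count BB_COUNT in LOOP may be
   moved out of, bounded by OUTERMOST; NULL means do not move it at all.  */
static struct toy_loop *
get_coldest_out (struct toy_loop *outermost, struct toy_loop *loop,
                 long bb_count)
{
  if (bb_count < loop->preheader_count)
    return NULL;                   /* cold inside its own loop */

  struct toy_loop *coldest = coldest_outermost[loop->num];
  if (coldest->depth < outermost->depth)
    {
      struct toy_loop *colder = colder_than_inner[loop->num];
      if (colder && outermost->depth < colder->depth)
        return colder;
      return outermost;
    }
  return coldest;
}

/* Case 1 of the mail: inner loops hotter than outer loops.  */
static struct toy_loop hot3 = { 3, 3, 1000, NULL, NULL };
static struct toy_loop hot2 = { 2, 2, 100, &hot3, NULL };
static struct toy_loop hot1 = { 1, 1, 10, &hot2, NULL };

/* Case 2 of the mail: inner loops colder than outer loops.  */
static struct toy_loop cold3 = { 6, 3, 1, NULL, NULL };
static struct toy_loop cold2 = { 5, 2, 10, &cold3, NULL };
static struct toy_loop cold1 = { 4, 1, 100, &cold2, NULL };
```

Calling fill_cold_out (&hot1, NULL, &hot1) and fill_cold_out (&cold1, NULL, &cold1) fills the caches for the two toy trees; get_coldest_out can then be queried with a hypothetical block count, for example get_coldest_out (&cold1, &cold3, 1) for a hot statement in the innermost cold loop.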