From mboxrd@z Thu Jan  1 00:00:00 1970
From: Jiufu Guo
To: Martin Liška
Cc: Jiufu Guo via Gcc-patches, Jan Hubicka, wschmidt@linux.ibm.com,
    segher@kernel.crashing.org, Richard Biener, "Bin.Cheng"
Subject: Re: [PATCH V2] PING^2 correct COUNT and PROB for unrolled loop
References: <1580717822-6073-1-git-send-email-guojiufu@linux.ibm.com>
    <20200203162337.GK22868@kam.mff.cuni.cz>
Date: Fri, 10 Jul 2020 10:14:08 +0800
In-Reply-To: (Martin Liška's message of "Thu, 9 Jul 2020 13:55:57 +0200")
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
List-Id: Gcc-patches mailing list

Hi Martin,

Thanks so much for your time and kind help!
I hope Richi, Bin or Honza will have time to review this patch. ;-)

--- Here is a summary ---

PR68212 mentioned that the COUNT of an unrolled loop was not correct, and
comments on this PR also mentioned that the loop becomes 'cold'.

The following patch fixes the wrong COUNT/PROB of the unrolled loop.  The
patch also resets the COUNT in the case where unrolling with an unreliable
count can cause a loop to no longer look hot and therefore not get aligned.

The message below is referenced:
  https://gcc.gnu.org/ml/gcc-patches/2018-11/msg02368.html
along with these comments:
  https://gcc.gnu.org/ml/gcc-patches/2018-11/msg02380.html
  https://gcc.gnu.org/legacy-ml/gcc-patches/2020-02/msg00044.html

Bootstrap and regtest are OK on powerpc64le.  Is this OK for trunk?

ChangeLog:
2020-07-10  Jiufu Guo
            Pat Haugen

        PR rtl-optimization/68212
        * cfgloopmanip.c (duplicate_loop_to_header_edge): Correct COUNT/PROB
        for unrolled/peeled blocks.

testsuite/ChangeLog:
2020-07-10  Jiufu Guo
            Pat Haugen

        PR rtl-optimization/68212
        * gcc.dg/pr68212.c: New test.

diff --git a/gcc/cfgloopmanip.c b/gcc/cfgloopmanip.c
index 727e951..ded0046 100644
--- a/gcc/cfgloopmanip.c
+++ b/gcc/cfgloopmanip.c
@@ -31,6 +31,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimplify-me.h"
 #include "tree-ssa-loop-manip.h"
 #include "dumpfile.h"
+#include "cfgrtl.h"
 
 static void copy_loops_to (class loop **, int,
                            class loop *);
@@ -1258,14 +1259,30 @@ duplicate_loop_to_header_edge (class loop *loop, edge e,
           /* If original loop is executed COUNT_IN times, the unrolled
              loop will account SCALE_MAIN_DEN times.  */
           scale_main = count_in.probability_in (scale_main_den);
+
+          /* If we are guessing at the number of iterations and count_in
+             becomes unrealistically small, reset probability.  */
+          if (!(count_in.reliable_p () || loop->any_estimate))
+            {
+              profile_count new_count_in = count_in.apply_probability (scale_main);
+              profile_count preheader_count = loop_preheader_edge (loop)->count ();
+              if (new_count_in.apply_scale (1, 10) < preheader_count)
+                scale_main = profile_probability::likely ();
+            }
+
           scale_act = scale_main * prob_pass_main;
         }
       else
         {
+          profile_count new_loop_count;
           profile_count preheader_count = e->count ();
-          for (i = 0; i < ndupl; i++)
-            scale_main = scale_main * scale_step[i];
           scale_act = preheader_count.probability_in (count_in);
+          /* Compute final preheader count after peeling NDUPL copies.  */
+          for (i = 0; i < ndupl; i++)
+            preheader_count = preheader_count.apply_probability (scale_step[i]);
+          /* Subtract out exit(s) from peeled copies.  */
+          new_loop_count = count_in - (e->count () - preheader_count);
+          scale_main = new_loop_count.probability_in (count_in);
         }
     }
 
@@ -1381,6 +1398,38 @@ duplicate_loop_to_header_edge (class loop *loop, edge e,
           scale_bbs_frequencies (new_bbs, n, scale_act);
           scale_act = scale_act * scale_step[j];
         }
+
+      /* Need to update PROB of exit edge and corresponding COUNT.  */
+      if (orig && is_latch && (!bitmap_bit_p (wont_exit, j + 1))
+          && bbs_to_scale)
+        {
+          edge new_exit = new_spec_edges[SE_ORIG];
+          profile_count new_count_in = new_exit->src->count;
+          profile_count preheader_count = loop_preheader_edge (loop)->count ();
+          edge e;
+          edge_iterator ei;
+
+          FOR_EACH_EDGE (e, ei, new_exit->src->succs)
+            if (e != new_exit)
+              break;
+
+          gcc_assert (e && e != new_exit);
+
+          new_exit->probability = preheader_count.probability_in (new_count_in);
+          e->probability = new_exit->probability.invert ();
+
+          profile_count new_latch_count
+            = new_exit->src->count.apply_probability (e->probability);
+          profile_count old_latch_count = e->dest->count;
+
+          EXECUTE_IF_SET_IN_BITMAP (bbs_to_scale, 0, i, bi)
+            scale_bbs_frequencies_profile_count (new_bbs + i, 1,
+                                                 new_latch_count,
+                                                 old_latch_count);
+
+          if (current_ir_type () != IR_GIMPLE)
+            update_br_prob_note (e->src);
+        }
     }
   free (new_bbs);
   free (orig_loops);
diff --git a/gcc/testsuite/gcc.dg/pr68212.c b/gcc/testsuite/gcc.dg/pr68212.c
new file mode 100644
index 0000000..f3b7c22
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr68212.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-tree-vectorize -funroll-loops --param max-unroll-times=4 -fdump-rtl-alignments" } */
+
+void foo(long int *a, long int *b, long int n)
+{
+  long int i;
+
+  for (i = 0; i < n; i++)
+    a[i] = *b;
+}
+
+/* { dg-final { scan-rtl-dump-times "internal loop alignment added" 1 "alignments"} } */
+
-- 
2.7.4

Thanks!
Jiufu Guo.

Martin Liška writes:

> On 7/2/20 4:35 AM, Jiufu Guo via Gcc-patches wrote:
>> I would like to reping this patch.
>> Since this is correcting COUNT and PROB for hot blocks, it helps some
>> cases.
>>
>> https://gcc.gnu.org/legacy-ml/gcc-patches/2020-02/msg00927.html
>
> Hey.
>
> I've just briefly looked at the patch and I don't feel the right person
> to make a review for it.
>
> I believe Richi, Bin or Honza can help us here?
> Martin
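
For anyone following the peeling branch of the patch, the arithmetic can be
sketched on plain numbers.  This is only an illustration: it uses doubles
rather than GCC's profile_count/profile_probability, the names PeelScales
and peel_scales are hypothetical, and it assumes each scale_step[i] is the
probability of continuing past peeled copy i.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the peeling branch, using plain doubles
// instead of GCC's profile_count / profile_probability types.
struct PeelScales
{
  double scale_act;   // scale applied to the first peeled copy
  double scale_main;  // scale applied to the remaining loop body
};

PeelScales
peel_scales (double count_in, double entry_count,
             const std::vector<double> &scale_step)
{
  assert (count_in > 0.0);

  // Entry count that still reaches the loop after all peeled copies,
  // assuming scale_step[i] is the chance of continuing past copy i.
  double preheader_count = entry_count;
  for (double step : scale_step)
    preheader_count *= step;

  // Executions that exited through the peeled copies no longer reach
  // the loop, so subtract them from the original header count.
  double new_loop_count = count_in - (entry_count - preheader_count);

  return { entry_count / count_in, new_loop_count / count_in };
}
```

With count_in = 1000, an entry count of 100 and two peeled copies that each
continue with probability 0.5, 25 entries still reach the loop, so the loop
body keeps 925 of its 1000 executions (scale_main = 0.925).  The code the
patch replaces multiplied scale_main by every scale_step[i] instead.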