From: Ajit Kumar Agarwal
To: Jeff Law, GCC Patches
CC: Vinod Kathail, Shail Aditya Gupta, Vidhumouli Hunsigida, Nagaraju Mekala
Subject: RE: [RFC, Patch]: Optimized changes in the register used inside loop for LICM and IVOPTS.
Date: Mon, 16 Nov 2015 17:36:00 -0000
Message-ID: <37378DC5BCD0EE48BA4B082E0B55DFAA429B45DD@XAP-PVEXMBX02.xlnx.xilinx.com>
References: <37378DC5BCD0EE48BA4B082E0B55DFAA4299E44D@XAP-PVEXMBX02.xlnx.xilinx.com> <56457FA0.1070201@redhat.com>
In-Reply-To: <56457FA0.1070201@redhat.com>

-----Original Message-----
From: Jeff Law [mailto:law@redhat.com]
Sent: Friday, November 13, 2015 11:44 AM
To: Ajit Kumar Agarwal; GCC Patches
Cc: Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju Mekala
Subject: Re: [RFC, Patch]: Optimized changes in the register used inside loop for LICM and IVOPTS.

On 10/07/2015 10:32 PM, Ajit Kumar Agarwal wrote:
>
> 0001-RFC-Patch-Optimized-changes-in-the-register-used-ins.patch
>
>
> From f164fd80953f3cffd96a492c8424c83290cd43cc Mon Sep 17 00:00:00 2001
> From: Ajit Kumar Agarwal
> Date: Wed, 7 Oct 2015 20:50:40 +0200
> Subject: [PATCH] [RFC, Patch]: Optimized changes in the register used inside
>  loop for LICM and IVOPTS.
>
> Changes are done in the Loop Invariant(LICM) at RTL level and also the
> Induction variable optimization based on SSA representation. The
> current logic used in LICM for register used inside the loops is
> changed. The Live Out of the loop latch node and the Live in of the
> destination of the exit nodes is used to set the Loops Liveness at the exit of the Loop.
> The register used is the number of live variables at the exit of the
> Loop calculated above.
>
> For Induction variable optimization on tree SSA representation, the
> register used logic is based on the number of phi nodes at the loop
> header to represent the liveness at the loop. Current Logic used only
> the number of phi nodes at the loop header. Changes are made to
> represent the phi operands also live at the loop. Thus number of phi
> operands also gets incremented in the number of registers used.
>
> ChangeLog:
> 2015-10-09  Ajit Agarwal
>
>         * loop-invariant.c (compute_loop_liveness): New.
>         (determine_regs_used): New.
>         (find_invariants_to_move): Use of determine_regs_used.
>         * tree-ssa-loop-ivopts.c (determine_set_costs): Consider the phi
>         arguments for register used.
>>I think Bin rejected the tree-ssa-loop-ivopts change.  However, the
>>loop-invariant change is still pending, right?
>
> Signed-off-by: Ajit Agarwal ajitkum@xilinx.com
> ---
>  gcc/loop-invariant.c       | 72 +++++++++++++++++++++++++++++++++++++---------
>  gcc/tree-ssa-loop-ivopts.c |  4 +--
>  2 files changed, 60 insertions(+), 16 deletions(-)
>
> diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c
> index 52c8ae8..e4291c9 100644
> --- a/gcc/loop-invariant.c
> +++ b/gcc/loop-invariant.c
> @@ -1413,6 +1413,19 @@ set_move_mark (unsigned invno, int gain)
>      }
>  }
>
> +static int
> +determine_regs_used()
> +{
> +  unsigned int j;
> +  unsigned int reg_used = 2;
> +  bitmap_iterator bi;
> +
> +  EXECUTE_IF_SET_IN_BITMAP (&LOOP_DATA (curr_loop)->regs_live, 0, j, bi)
> +    (reg_used) ++;
> +
> +  return reg_used;
> +}
>>Isn't this just bitmap_count_bits (regs_live) + 2?

> @@ -2055,9 +2057,43 @@ calculate_loop_reg_pressure (void)
>      }
>  }
>
> -
> +static void
> +calculate_loop_liveness (void)
>>Needs a function comment.

I will incorporate the above comments.

> +{
> +  basic_block bb;
> +  struct loop *loop;
>
> -/* Move the invariants out of the loops.  */
> +  FOR_EACH_LOOP (loop, 0)
> +    if (loop->aux == NULL)
> +      {
> +        loop->aux = xcalloc (1, sizeof (struct loop_data));
> +        bitmap_initialize (&LOOP_DATA (loop)->regs_live, &reg_obstack);
> +      }
> +
> +  FOR_EACH_BB_FN (bb, cfun)
>>Why loop over blocks here?  Why not just iterate through all the loops
>>in the loop structure.  Order isn't particularly important AFAICT for
>>this code.

Iterating over the loop structure is enough; we don't need to iterate over the basic blocks.

> +    {
> +      int i;
> +      edge e;
> +      vec<edge> edges;
> +      edges = get_loop_exit_edges (loop);
> +      FOR_EACH_VEC_ELT (edges, i, e)
> +        {
> +          bitmap_ior_into (&LOOP_DATA (loop)->regs_live, DF_LR_OUT(e->src));
> +          bitmap_ior_into (&LOOP_DATA (loop)->regs_live, DF_LR_IN(e->dest));
>>Space before the open-paren in the previous two lines
>>DF_LR_OUT (e->src) and DF_LR_IN (e->dest)

I will incorporate this.

> +        }
> +      }
> +    }
> +}
> +
> +/* Move the invariants ut of the loops.  */
>>Looks like you introduced a typo.

>>I'd like to see testcases which show the change in # regs used
>>computation helping generate better code.

A test case would need to hit the scenario where the new variable created for a loop invariant raises the register pressure, so that the cost computed from reg_used and new_regs increases to the point where spill and fetch are expected and the invariant movement is dropped.
Constructing that situation in a small test case seems a little difficult, but I will try to identify test cases extracted from the benchmarks where we saw gains with the above method.

>>And I'd also like to see some background information on why you think
>>this is a more accurate measure for the number of registers used in the
>>loop.  regs_used AFAICT is supposed to be an estimate of the registers
>>live around the loop.  So ISTM that you get that value by live-out set
>>on the backedge of the loop.  I guess you get something similar by
>>looking at the exit edge's source block's live-out set.
>>But I don't see any value in including stuff live at the block outside the loop.

>>It also seems fairly non-intuitive.  Get the block's latch and use its
>>live-out set.  That seems more intuitive.

We are interested in the register pressure that arises from code-motion optimizations such as loop invariant motion. Register pressure is based on the number of interfering live ranges at a given point. If that number exceeds the number of available registers, a live range may be spilled. Whether it is actually spilled depends on the coloring scheme: the Briggs allocator places such live ranges on the stack during the simplification phase instead of spilling them right away. This improves the chances of getting a register and reduces spill and fetch.

Some coloring schemes also pick the live range with the highest register pressure first, remove such nodes from the interference graph, and place them on the stack at the highest priority. This may give a better chance of making the interference graph colorable.

Given this background, the liveness of the loop is a better approximation of the register pressure and should be used as the registers-used figure in the cost analysis.

The source of a loop exit edge is the loop latch node, so walking the loop exit edges takes care of the live-out of the latch node. In addition, adding the live-in of the destination of each loop exit edge adds the liveness after the loop. I also tried using only the live-out of the loop latch nodes, but did not see much gain that way.

Thanks & Regards
Ajit
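
To make the two estimates discussed above concrete, here is a minimal sketch in plain C. It is a toy model, not GCC code: `regset`, `struct exit_edge`, `count_regs`, `regs_used_exit_edges` and `regs_used_latch` are hypothetical names, and a 64-bit mask stands in for GCC's bitmap. It only shows that the patch's estimate amounts to "count the bits in the union over the exit edges, plus 2", next to the latch-only variant suggested in the review.

```c
#include <stdint.h>

/* Toy stand-in for a GCC bitmap: bit i set means register i is live.
   Illustration only; this is not GCC's bitmap API.  */
typedef uint64_t regset;

/* Hypothetical model of one loop exit edge.  */
struct exit_edge
{
  regset src_live_out;  /* plays the role of DF_LR_OUT (e->src) */
  regset dest_live_in;  /* plays the role of DF_LR_IN (e->dest) */
};

/* Count the set bits, in the role of bitmap_count_bits.  */
unsigned
count_regs (regset s)
{
  unsigned n = 0;
  for (; s != 0; s &= s - 1)  /* clear the lowest set bit each round */
    n++;
  return n;
}

/* Estimate in the style of the patch: union of the live-out of each
   exit edge's source and the live-in of its destination, plus the
   constant 2 used in determine_regs_used.  */
unsigned
regs_used_exit_edges (const struct exit_edge *edges, unsigned n_edges)
{
  regset live = 0;
  for (unsigned i = 0; i < n_edges; i++)
    live |= edges[i].src_live_out | edges[i].dest_live_in;
  return count_regs (live) + 2;
}

/* Alternative raised in the review: live-out of the latch only.  */
unsigned
regs_used_latch (regset latch_live_out)
{
  return count_regs (latch_live_out) + 2;
}
```

With a latch live-out of {r0, r1, r3} the latch-only estimate is 5. If two exit edges additionally have r4 and r5 live into their destinations, the exit-edge estimate rises to 7, which is exactly the extra "live after the loop" contribution questioned in the review.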