From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 21464 invoked by alias); 15 Oct 2009 09:32:21 -0000 Received: (qmail 21450 invoked by uid 22791); 15 Oct 2009 09:32:19 -0000 X-SWARE-Spam-Status: No, hits=-0.8 required=5.0 tests=AWL,BAYES_50 X-Spam-Check-By: sourceware.org Received: from mms2.broadcom.com (HELO mms2.broadcom.com) (216.31.210.18) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Thu, 15 Oct 2009 09:32:14 +0000 Received: from [10.16.192.232] by mms2.broadcom.com with ESMTP (Broadcom SMTP Relay (Email Firewall v6.3.2)); Thu, 15 Oct 2009 02:31:54 -0700 X-Server-Uuid: D3C04415-6FA8-4F2C-93C1-920E106A2031 Received: from SJEXCHCCR02.corp.ad.broadcom.com ([10.16.192.130]) by SJEXCHHUB02.corp.ad.broadcom.com ([10.16.192.232]) with mapi; Thu, 15 Oct 2009 02:31:54 -0700 From: "Bingfeng Mei" To: "Jean Christophe Beyler" , "Zdenek Dvorak" cc: "gcc@gcc.gnu.org" Date: Thu, 15 Oct 2009 10:00:00 -0000 Subject: RE: Turning off unrolling to certain loops Message-ID: <7FB04A5C213E9943A72EE127DB74F0AD93CCBE4AD6@SJEXCHCCR02.corp.ad.broadcom.com> References: <20091006065352.GB15626@kam.mff.cuni.cz> <20091006135624.GA18714@kam.mff.cuni.cz> <20091006150918.GA19277@kam.mff.cuni.cz> <20091008161802.GA30141@kam.mff.cuni.cz> In-Reply-To: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: quoted-printable X-IsSubscribed: yes Mailing-List: contact gcc-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-owner@gcc.gnu.org X-SW-Source: 2009-10/txt/msg00333.txt.bz2 Hello, I faced a similar issue a while ago. I filed a bug report=20 (http://gcc.gnu.org/bugzilla/show_bug.cgi?id=3D36712) In the end,=20 I implemented a simple tree-level unrolling pass in our port which uses all the existing infrastructure. It works quite well for=20 our purpose, but I hesitated to submit the patch because it contains our not-very-elegannt #prgama unroll implementation.=20 The following two functions do the unrolling. I insert the pass=20 just after the loop_prefetch pass (4.4) Cheers, Bingfeng Mei /* Perfect unrolling of a loop */ static void tree_unroll_perfect_loop (struct loop *loop, unsigned factor, edge exit) { sbitmap wont_exit; edge e; bool ok; unsigned i; VEC (edge, heap) *to_remove =3D NULL; =20=20 /* Unroll the loop and remove the exits in all iterations except for the last one. */ wont_exit =3D sbitmap_alloc (factor); sbitmap_ones (wont_exit); RESET_BIT (wont_exit, factor - 1); ok =3D gimple_duplicate_loop_to_header_edge (loop, loop_latch_edge (loop), factor - 1, wont_exit, exit, &to_remove, DLTHE_FLAG_UPDATE_FREQ); free (wont_exit); gcc_assert (ok); for (i =3D 0; VEC_iterate (edge, to_remove, i, e); i++) { ok =3D remove_path (e); gcc_assert (ok); } VEC_free (edge, heap, to_remove); update_ssa (TODO_update_ssa); =20=20 #ifdef ENABLE_CHECKING verify_flow_info (); verify_dominators (CDI_DOMINATORS); verify_loop_structure (); verify_loop_closed_ssa (); #endif } =20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20 /* Go through all the loops:=20 1. Determine unrolling factor 2. Unroll loops in different conditions -- perfect loop: no extra copy of original loop -- other loops: the original version of loops to execute the remain= ing iterations */=20=20=20=20=20=20=20=20 static unsigned int rest_of_tree_unroll (void) { loop_iterator li; struct loop *loop; unsigned ninsns, unroll_factor; HOST_WIDE_INT est_niter; struct tree_niter_desc desc; bool unrolled =3D false; initialize_original_copy_tables (); =20=20 /* Scan the loops, inner ones first. */ FOR_EACH_LOOP (li, loop, LI_FROM_INNERMOST) { =20=20=20=20=20 est_niter =3D estimated_loop_iterations_int (loop, false); ninsns =3D tree_num_loop_insns (loop, &eni_size_weights); unroll_factor =3D determine_unroll_factor (loop, ninsns, &desc, est_ni= ter); if (unroll_factor !=3D 1) { tree niters =3D number_of_exit_cond_executions(loop); =20=20=20=20=20=20=20 bool perfect_unrolling =3D false; if(niters !=3D NULL_TREE && niters!=3D chrec_dont_know && TREE_CODE(= niters) =3D=3D INTEGER_CST){ int num_iters =3D tree_low_cst(niters, 1); if((num_iters % unroll_factor) =3D=3D 0) perfect_unrolling =3D true; } =20=20=20=20=20=20=20 /* If no. of iterations can be divided by unrolling factor, we have = perfect unrolling */ if(perfect_unrolling){ tree_unroll_perfect_loop(loop, unroll_factor, single_dom_exit(loop= )); } else{ tree_unroll_loop (loop, unroll_factor, single_dom_exit (loop), &de= sc); }=20=20 unrolled =3D true; } }=20=20 =20=20 free_original_copy_tables (); =20=20 /* Need to rebuild call graph due if new function calls are created due to loop unrolling=20 FIXME: rebuild cgraph will lose some information like reason of not inl= ining*/ if(unrolled)=20=20 rebuild_cgraph_edges(); /*debug_cgraph();*/ =20=20 return 0; } > -----Original Message----- > From: gcc-owner@gcc.gnu.org [mailto:gcc-owner@gcc.gnu.org] On=20 > Behalf Of Jean Christophe Beyler > Sent: 14 October 2009 19:29 > To: Zdenek Dvorak > Cc: gcc@gcc.gnu.org > Subject: Re: Turning off unrolling to certain loops >=20 > Ok, I've actually gone a different route. Instead of waiting for the > middle end to perform this, I've directly modified the parser stage to > unroll the loop directly there. >=20 > Basically, I take the parser of the for and modify how it adds the > various statements. Telling it to, instead of doing in the > c_finish_loop : >=20 > if (body) > add_stmt (body); > if (clab) > add_stmt (build1 (LABEL_EXPR, void_type_node, clab)); > if (incr) > add_stmt (incr); > ... >=20 > I tell it to add multiple copies of body and incr and the at the end > add in the loop the rest of it. I've also added support to remove > further unrolling to these modified loops and will be handling the > "No-unroll" pragma. I then let the rest of the optimization passes, > fuse the incrementations together if possible, etc. >=20 > The initial results are quite good and seem to work and=20 > produce good code. >=20 > Currently, there are two possibilities : >=20 > - If the loop is not in the form we want, for example: >=20 > for (;i { > ... > } >=20 > Do we still unroll even though we have to trust the user that the > number of unrolling will not break the semantics ? >=20 > To handle this, I am adding warnings that will appear if the loop is > anything but : >=20 > for (i=3DC1; i < C2; i ++) > { > ... > } >=20 > Later on, once this is thoroughly tested, I will allow : >=20 > for (i=3DC1; fct (i, C2); i =3D fct2 (i)) >=20 >=20 > where fct is any comparison function with only i and C2, > fct2 is a incrementation/decrementation calculation using i. >=20 >=20 > Any comments ? Concerns ? Questions ? > Thanks in advance, > Jc >=20 > On Thu, Oct 8, 2009 at 12:22 PM, Jean Christophe Beyler > wrote: > > Hi, > > > >> such an epilogue is needed when the # of iterations is not=20 > known in the > >> compile time; it should be fairly easy to modify the=20 > unrolling not to > >> emit it when it is not necessary, > > > > Agreed, that is why I was surprised to see this in my=20 > simple example. > > It seems to me that the whole unrolling process has been made to, on > > purpose, have this epilogue in place. > > > > In the case where the unrolling would be perfect (ie. there would be > > no epilogue), the calculation of the max bound of the=20 > unrolled version > > is always done to have this epilogue (if you have 4=20 > iterations and ask > > to unroll twice, it will actually change the max bound to 3, > > therefore, having one iteration of the unrolled version and 2 > > iterations of the original...). I am currently looking at=20 > the code of > > tree_transform_and_unroll_loop to figure out how to change this and > > not have an epilogue in my cases. > > > > Jc > > >=20 >=20