From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-return-157287-listarch-gcc=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 21464 invoked by alias); 15 Oct 2009 09:32:21 -0000
Received: (qmail 21450 invoked by uid 22791); 15 Oct 2009 09:32:19 -0000
X-SWARE-Spam-Status: No, hits=-0.8 required=5.0 	tests=AWL,BAYES_50
X-Spam-Check-By: sourceware.org
Received: from mms2.broadcom.com (HELO mms2.broadcom.com) (216.31.210.18)     by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Thu, 15 Oct 2009 09:32:14 +0000
Received: from [10.16.192.232] by mms2.broadcom.com with ESMTP (Broadcom  SMTP Relay (Email Firewall v6.3.2)); Thu, 15 Oct 2009 02:31:54 -0700
X-Server-Uuid: D3C04415-6FA8-4F2C-93C1-920E106A2031
Received: from SJEXCHCCR02.corp.ad.broadcom.com ([10.16.192.130]) by  SJEXCHHUB02.corp.ad.broadcom.com ([10.16.192.232]) with mapi; Thu, 15  Oct 2009 02:31:54 -0700
From: "Bingfeng Mei" <bmei@broadcom.com>
To: "Jean Christophe Beyler" <jean.christophe.beyler@gmail.com>, 	"Zdenek Dvorak" <rakdver@kam.mff.cuni.cz>
cc: "gcc@gcc.gnu.org" <gcc@gcc.gnu.org>
Date: Thu, 15 Oct 2009 10:00:00 -0000
Subject: RE: Turning off unrolling to certain loops
Message-ID: <7FB04A5C213E9943A72EE127DB74F0AD93CCBE4AD6@SJEXCHCCR02.corp.ad.broadcom.com>
References: <c568a2600910051346n7ea55296iba879c4cae9f6be3@mail.gmail.com>  <20091006065352.GB15626@kam.mff.cuni.cz>  <c568a2600910060634v4d78ad75x463d7e20edeea0f4@mail.gmail.com>  <20091006135624.GA18714@kam.mff.cuni.cz>  <c568a2600910060756m96adf86mb1507c6717fdfdb6@mail.gmail.com>  <20091006150918.GA19277@kam.mff.cuni.cz>  <c568a2600910060828l24ec5ecct926eb2624a1d2157@mail.gmail.com>  <c568a2600910080722k794552fvbde46fed95601b21@mail.gmail.com>  <20091008161802.GA30141@kam.mff.cuni.cz>  <c568a2600910080922n7f5496bfged5d9473ca04200b@mail.gmail.com>  <c568a2600910141129i75b2fa2bre4aff548ed376905@mail.gmail.com>
In-Reply-To: <c568a2600910141129i75b2fa2bre4aff548ed376905@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain;  charset=us-ascii
Content-Transfer-Encoding: quoted-printable
X-IsSubscribed: yes
Mailing-List: contact gcc-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc/>
List-Post: <mailto:gcc@gcc.gnu.org>
List-Help: <http://gcc.gnu.org/ml/>
Sender: gcc-owner@gcc.gnu.org
X-SW-Source: 2009-10/txt/msg00333.txt.bz2

Hello,
I faced a similar issue a while ago. I filed a bug report=20
(http://gcc.gnu.org/bugzilla/show_bug.cgi?id=3D36712) In the end,=20
I implemented a simple tree-level unrolling pass in our port
which uses all the existing infrastructure. It works quite well for=20
our purpose, but I hesitated to submit the patch because it contains
our not-very-elegannt #prgama unroll implementation.=20

The following two functions do the unrolling. I insert the pass=20
just after the loop_prefetch pass (4.4)

Cheers,
Bingfeng Mei


/* Perfect unrolling of a loop */
static void tree_unroll_perfect_loop (struct loop *loop, unsigned factor,
		  edge exit)
{
  sbitmap wont_exit;
  edge e;
  bool ok;
  unsigned i;
  VEC (edge, heap) *to_remove =3D NULL;
=20=20
  /* Unroll the loop and remove the exits in all iterations except for the
     last one.  */
  wont_exit =3D sbitmap_alloc (factor);
  sbitmap_ones (wont_exit);
  RESET_BIT (wont_exit, factor - 1);

  ok =3D gimple_duplicate_loop_to_header_edge
	  (loop, loop_latch_edge (loop), factor - 1,
	   wont_exit, exit, &to_remove, DLTHE_FLAG_UPDATE_FREQ);
  free (wont_exit);
  gcc_assert (ok);

  for (i =3D 0; VEC_iterate (edge, to_remove, i, e); i++)
    {
      ok =3D remove_path (e);
      gcc_assert (ok);
    }
  VEC_free (edge, heap, to_remove);
  update_ssa (TODO_update_ssa);
=20=20
#ifdef ENABLE_CHECKING
  verify_flow_info ();
  verify_dominators (CDI_DOMINATORS);
  verify_loop_structure ();
  verify_loop_closed_ssa ();
#endif
}


=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20
/* Go through all the loops:=20
     1. Determine unrolling factor
     2. Unroll loops in different conditions
        -- perfect loop: no extra copy of original loop
        -- other loops: the original version of loops to execute the remain=
ing iterations
*/=20=20=20=20=20=20=20=20
static unsigned int rest_of_tree_unroll (void)
{
  loop_iterator li;
  struct loop *loop;
  unsigned ninsns, unroll_factor;
  HOST_WIDE_INT est_niter;
  struct tree_niter_desc desc;
  bool unrolled =3D false;

  initialize_original_copy_tables ();
=20=20
  /* Scan the loops, inner ones first.  */
  FOR_EACH_LOOP (li, loop, LI_FROM_INNERMOST)
  {
=20=20=20=20=20
     est_niter =3D estimated_loop_iterations_int (loop, false);
     ninsns =3D tree_num_loop_insns (loop, &eni_size_weights);
     unroll_factor =3D determine_unroll_factor (loop, ninsns, &desc, est_ni=
ter);
     if (unroll_factor !=3D 1)
     {
       tree niters =3D number_of_exit_cond_executions(loop);
=20=20=20=20=20=20=20
       bool perfect_unrolling =3D false;
       if(niters !=3D NULL_TREE && niters!=3D chrec_dont_know && TREE_CODE(=
niters) =3D=3D INTEGER_CST){
         int num_iters =3D tree_low_cst(niters, 1);
         if((num_iters % unroll_factor) =3D=3D 0)
           perfect_unrolling =3D true;
       }
=20=20=20=20=20=20=20
       /* If no. of iterations can be divided by unrolling factor, we have =
perfect unrolling */
       if(perfect_unrolling){
         tree_unroll_perfect_loop(loop, unroll_factor, single_dom_exit(loop=
));
       }
       else{
         tree_unroll_loop (loop, unroll_factor, single_dom_exit (loop), &de=
sc);
       }=20=20
       unrolled =3D true;
     }
  }=20=20
=20=20
  free_original_copy_tables ();
=20=20
  /* Need to rebuild call graph due if new function calls are created due to
    loop unrolling=20
    FIXME: rebuild cgraph will lose some information like reason of not inl=
ining*/
  if(unrolled)=20=20
    rebuild_cgraph_edges();
  /*debug_cgraph();*/
=20=20
  return 0;
}

> -----Original Message-----
> From: gcc-owner@gcc.gnu.org [mailto:gcc-owner@gcc.gnu.org] On=20
> Behalf Of Jean Christophe Beyler
> Sent: 14 October 2009 19:29
> To: Zdenek Dvorak
> Cc: gcc@gcc.gnu.org
> Subject: Re: Turning off unrolling to certain loops
>=20
> Ok, I've actually gone a different route. Instead of waiting for the
> middle end to perform this, I've directly modified the parser stage to
> unroll the loop directly there.
>=20
> Basically, I take the parser of the for and modify how it adds the
> various statements. Telling it to, instead of doing in the
> c_finish_loop :
>=20
>   if (body)
>     add_stmt (body);
>   if (clab)
>     add_stmt (build1 (LABEL_EXPR, void_type_node, clab));
>   if (incr)
>     add_stmt (incr);
> ...
>=20
> I tell it to add multiple copies of body and incr and the at the end
> add in the loop the rest of it. I've also added support to remove
> further unrolling to these modified loops and will be handling the
> "No-unroll" pragma. I then let the rest of the optimization passes,
> fuse the incrementations together if possible,  etc.
>=20
> The initial results are quite good and seem to work and=20
> produce good code.
>=20
> Currently, there are two possibilities :
>=20
> - If the loop is not in the form we want, for example:
>=20
> for (;i<n;)
> {
>     ...
> }
>=20
> Do we still unroll even though we have to trust the user that the
> number of unrolling will not break the semantics ?
>=20
> To handle this, I am adding warnings that will appear if the loop is
> anything but :
>=20
> for (i=3DC1; i < C2; i ++)
> {
> ...
> }
>=20
> Later on, once this is thoroughly tested, I will allow :
>=20
> for (i=3DC1; fct (i, C2); i =3D fct2 (i))
>=20
>=20
> where fct is any comparison function with only i and C2,
>          fct2 is a incrementation/decrementation calculation using i.
>=20
>=20
> Any comments ? Concerns ? Questions ?
> Thanks in advance,
> Jc
>=20
> On Thu, Oct 8, 2009 at 12:22 PM, Jean Christophe Beyler
> <jean.christophe.beyler@gmail.com> wrote:
> > Hi,
> >
> >> such an epilogue is needed when the # of iterations is not=20
> known in the
> >> compile time; it should be fairly easy to modify the=20
> unrolling not to
> >> emit it when it is not necessary,
> >
> > Agreed, that is why I was surprised to see this in my=20
> simple example.
> > It seems to me that the whole unrolling process has been made to, on
> > purpose, have this epilogue in place.
> >
> > In the case where the unrolling would be perfect (ie. there would be
> > no epilogue), the calculation of the max bound of the=20
> unrolled version
> > is always done to have this epilogue (if you have 4=20
> iterations and ask
> > to unroll twice, it will actually change the max bound to 3,
> > therefore, having one iteration of the unrolled version and 2
> > iterations of the original...). I am currently looking at=20
> the code of
> > tree_transform_and_unroll_loop to figure out how to change this and
> > not have an epilogue in my cases.
> >
> > Jc
> >
>=20
>=20