public inbox for gcc-patches@gcc.gnu.org
* [PATCH][modulo-sched] New flag to control reg-moves generation
@ 2007-08-02  7:24 Revital1 Eres
  2007-08-02 14:56 ` Andrey Belevantsev
  0 siblings, 1 reply; 6+ messages in thread
From: Revital1 Eres @ 2007-08-02  7:24 UTC (permalink / raw)
  To: Ayal Zaks; +Cc: Kenneth.Zadeck, volodyan, abel, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 3322 bytes --]


Hello,

This patch is the second in the series of patches derived from
patch 1 of 2 (http://gcc.gnu.org/ml/gcc-patches/2007-01/msg01468.html).

As mentioned in a previous message, we decided to split patch 1 of 2
into several sub-patches, marking known problems along the way for later
handling, in order to make progress towards the main change that
patch addresses: the removal of the profitability check which
currently cripples SMS.  (The previously submitted ddg patch was
off this track, as it was not related to patch 1 of 2.)

Here is the list of issues patch 1 of 2 addresses: (a more
detailed description of those items is available also in
http://gcc.gnu.org/wiki/SwingModuloScheduling)

1.1 Avoid SMS when the loop contains an inc instruction. (committed)
1.2 Fix removal of anti-deps.
1.3 Add -fmodulo-sched-allow-regmoves flag to control reg-moves generation.
1.4 Fix order of instructions within one cycle.
1.5 Remove profitability checks.

Once those issues are handled, we intend to add further
enhancements, which include extending the do-loop pattern
recognition (patch 2 of 2), and to address the known problems.

The attached patch handles issues 1.2 and 1.3 above.  It introduces a
new flag, -fmodulo-sched-allow-regmoves, which controls the generation of
reg-moves and thus enables a more aggressive SMS.

When -fno-modulo-sched-allow-regmoves is set all the dependencies exist
in the ddg and no reg-moves are needed.

When -fmodulo-sched-allow-regmoves is set we delete certain anti-deps
edges and compensate for that by generating reg-moves based on the
life-range analysis.

The anti-dep edges that will be deleted are the ones which have a true-dep
edge in the opposite direction (in other words, the kernel has only one def
of the relevant register).  Deleting those anti-dep edges allows the
corresponding uses to be scheduled further away from their def, even more
than II cycles after it (which the life-range analysis can detect).  When
there is no such opposite true-dep edge (that is, when there is more than
one def of the relevant register), we choose not to delete the anti-dep
edge for now, as deleting such an edge can violate the anti-dependence
without the corresponding life range exceeding II cycles.
We intend to support the removal of all anti-dep edges as part of
the enhancement plans mentioned above.

:ADDPATCH middle-end (modulo-sched):

This patch was bootstrapped and tested on PPC and x86_64 (also with
--enable-checking=assert), with and without the
-fmodulo-sched-allow-regmoves flag.

OK for mainline?

Thanks,
Revital


2007-08-02  Vladimir Yanovsky  <yanov@il.ibm.com>
            Revital Eres <eres@il.ibm.com>

        * doc/invoke.texi (-fmodulo-sched-allow-regmoves): Document new
        flag.
        * ddg.c (create_ddg_dependence): Do not check for interloop edges.
        Do not create anti dependence edge when a true dependence edge
        exists in the opposite direction and -fmodulo-sched-allow-regmoves
        is set.
        (add_cross_iteration_register_deps): Create anti dependence edge
        when -fno-modulo-sched-allow-regmoves is set.
        * common.opt (-fmodulo-sched-allow-regmoves): New flag.

        * gcc.dg/sms-antideps.c: New test.

(See attached file: sms-antideps.txt)
(See attached file: patch_reg_moves_flag_2_8.txt)

[-- Attachment #2: sms-antideps.txt --]
[-- Type: text/plain, Size: 623 bytes --]

/*  This test is a reduced test case for a bug that caused
    bootstrapping with -fmodulo-sched to fail.  Related to a broken
    anti-dep that was not fixed by reg-moves.  */

 /* { dg-do run } */
 /* { dg-options "-O2 -fmodulo-sched -fmodulo-sched-allow-regmoves" } */

#include <stdlib.h>

unsigned long long
foo (long long ixi, unsigned ctr)
{
  unsigned long long irslt = 1;
  long long ix = ixi;

  for (; ctr; ctr--)
    {
      irslt *= ix;
      ix *= ix;
    }

  if (irslt != 14348907)
    abort ();
  return irslt;
}


int
main ()
{
  unsigned long long res;

  res = foo (3, 4);
  /* Return explicitly so the exit status is well defined.  */
  return 0;
}


[-- Attachment #3: patch_reg_moves_flag_2_8.txt --]
[-- Type: text/plain, Size: 5759 bytes --]

Index: doc/invoke.texi
===================================================================
--- doc/invoke.texi	(revision 127104)
+++ doc/invoke.texi	(working copy)
@@ -328,7 +328,7 @@
 -finline-functions  -finline-functions-called-once @gol
 -finline-limit=@var{n}  -fkeep-inline-functions @gol
 -fkeep-static-consts  -fmerge-constants  -fmerge-all-constants @gol
--fmodulo-sched -fno-branch-count-reg @gol
+-fmodulo-sched -fmodulo-sched-allow-regmoves -fno-branch-count-reg @gol
 -fno-default-inline  -fno-defer-pop -fmove-loop-invariants @gol
 -fno-function-cse  -fno-guess-branch-probability @gol
 -fno-inline  -fno-math-errno  -fno-peephole  -fno-peephole2 @gol
@@ -5265,6 +5265,13 @@
 pass.  This pass looks at innermost loops and reorders their
 instructions by overlapping different iterations.
 
+@item -fmodulo-sched-allow-regmoves
+@opindex fmodulo-sched-allow-regmoves
+Perform more aggressive SMS based modulo scheduling with register moves
+allowed.  By setting this flag certain anti-dependences edges will be
+deleted which will trigger the generation of reg-moves based on the
+life-range analysis.
+
 @item -fno-branch-count-reg
 @opindex fno-branch-count-reg
 Do not use ``decrement and branch'' instructions on a count register,
Index: ddg.c
===================================================================
--- ddg.c	(revision 127104)
+++ ddg.c	(working copy)
@@ -150,17 +150,11 @@
 {
   ddg_edge_ptr e;
   int latency, distance = 0;
-  int interloop = (src_node->cuid >= dest_node->cuid);
   dep_type t = TRUE_DEP;
   dep_data_type dt = (mem_access_insn_p (src_node->insn)
 		      && mem_access_insn_p (dest_node->insn) ? MEM_DEP
 							     : REG_DEP);
-
-  /* For now we don't have an exact calculation of the distance,
-     so assume 1 conservatively.  */
-  if (interloop)
-     distance = 1;
-
+  gcc_assert (src_node->cuid < dest_node->cuid);
   gcc_assert (link);
 
   /* Note: REG_DEP_ANTI applies to MEM ANTI_DEP as well!!  */
@@ -168,27 +162,34 @@
     t = ANTI_DEP;
   else if (DEP_KIND (link) == REG_DEP_OUTPUT)
     t = OUTPUT_DEP;
-  latency = dep_cost (link);
 
-  e = create_ddg_edge (src_node, dest_node, t, dt, latency, distance);
+  /* We currently choose to delete certain anti-deps edges and compensate
+     for that by generating reg-moves based on the life-range analysis.
+     The anti-deps that will be deleted are the ones which have true-deps
+     edges in the opposite direction (in other words the kernel has only
+     one def of the relevant register).  
+     TODO: support the removal of all anti-deps edges, i.e. including
+     those whose register has multiple defs in the loop.  */
+  if (flag_modulo_sched_allow_regmoves && (t == ANTI_DEP && dt == REG_DEP))
+    {
+      rtx set;
 
-  if (interloop)
-    {
-      /* Some interloop dependencies are relaxed:
-	 1. Every insn is output dependent on itself; ignore such deps.
-	 2. Every true/flow dependence is an anti dependence in the
-	 opposite direction with distance 1; such register deps
-	 will be removed by renaming if broken --- ignore them.  */
-      if (!(t == OUTPUT_DEP && src_node == dest_node)
-	  && !(t == ANTI_DEP && dt == REG_DEP))
-	add_backarc_to_ddg (g, e);
-      else
-	free (e);
+      set = single_set (dest_node->insn);
+      if (set)
+        {
+          int regno = REGNO (SET_DEST (set));
+          struct df_ref *first_def =
+            df_bb_regno_first_def_find (g->bb, regno);
+          struct df_rd_bb_info *bb_info = DF_RD_BB_INFO (g->bb);
+
+          if (bitmap_bit_p (bb_info->gen, first_def->id))
+            return;
+        }
     }
-  else if (t == ANTI_DEP && dt == REG_DEP)
-    free (e);  /* We can fix broken anti register deps using reg-moves.  */
-  else
-    add_edge_to_ddg (g, e);
+
+   latency = dep_cost (link);
+   e = create_ddg_edge (src_node, dest_node, t, dt, latency, distance);
+   add_edge_to_ddg (g, e);
 }
 
 /* The same as the above function, but it doesn't require a link parameter.  */
@@ -247,6 +248,11 @@
   gcc_assert (last_def_node);
   gcc_assert (first_def);
 
+#ifdef ENABLE_CHECKING
+  if (last_def->id != first_def->id)
+    gcc_assert (!bitmap_bit_p (bb_info->gen, first_def->id));
+#endif
+
   /* Create inter-loop true dependences and anti dependences.  */
   for (r_use = DF_REF_CHAIN (last_def); r_use != NULL; r_use = r_use->next)
     {
@@ -280,14 +286,11 @@
 
 	  gcc_assert (first_def_node);
 
-          if (last_def->id != first_def->id)
-            {
-#ifdef ENABLE_CHECKING
-              gcc_assert (!bitmap_bit_p (bb_info->gen, first_def->id));
-#endif
-              create_ddg_dep_no_link (g, use_node, first_def_node, ANTI_DEP,
-                                      REG_DEP, 1);
-            }
+          if (last_def->id != first_def->id
+              || !flag_modulo_sched_allow_regmoves)
+            create_ddg_dep_no_link (g, use_node, first_def_node, ANTI_DEP,
+                                    REG_DEP, 1);
+
 	}
     }
   /* Create an inter-loop output dependence between LAST_DEF (which is the
Index: common.opt
===================================================================
--- common.opt	(revision 127104)
+++ common.opt	(working copy)
@@ -651,6 +651,10 @@
 Common Report Var(flag_modulo_sched) Optimization
 Perform SMS based modulo scheduling before the first scheduling pass
 
+fmodulo-sched-allow-regmoves
+Common Report Var(flag_modulo_sched_allow_regmoves)
+Perform SMS based modulo scheduling with register moves allowed
+
 fmove-loop-invariants
 Common Report Var(flag_move_loop_invariants) Init(1) Optimization
 Move loop invariant computations out of loops


* Re: [PATCH][modulo-sched] New flag to control reg-moves generation
  2007-08-02  7:24 [PATCH][modulo-sched] New flag to control reg-moves generation Revital1 Eres
@ 2007-08-02 14:56 ` Andrey Belevantsev
  0 siblings, 0 replies; 6+ messages in thread
From: Andrey Belevantsev @ 2007-08-02 14:56 UTC (permalink / raw)
  To: Revital1 Eres; +Cc: Ayal Zaks, Kenneth.Zadeck, volodyan, gcc-patches

On Thu, 2 Aug 2007, Revital1 Eres wrote:
> This patch is the second one in the series of patches originated from
> patch 1 of 2 (http://gcc.gnu.org/ml/gcc-patches/2007-01/msg01468.html).
I am travelling at the moment, but I will check your patches on ia64 when
I get back to work, which will be sometime next week.  Sorry for not
doing this earlier.

Andrey


* Re: [PATCH][modulo-sched] New flag to control reg-moves generation
       [not found] <OFDC8B5268.741042D6-ONC225732E.0024DB5E-C225732E.00271F28@LocalDomain>
@ 2007-08-05  7:34 ` Ayal Zaks
  0 siblings, 0 replies; 6+ messages in thread
From: Ayal Zaks @ 2007-08-05  7:34 UTC (permalink / raw)
  To: Revital1 Eres; +Cc: abel, gcc-patches, Kenneth.Zadeck, volodyan

Revital1 Eres/Haifa/IBM wrote on 05/08/2007 10:07:18:

> > OK.
> > A couple of minor comments below.
>
> Attached is the patch with your comments addressed.
> OK for mainline?
>

OK.
Ayal.



* Re: [PATCH][modulo-sched] New flag to control reg-moves generation
  2007-08-03 11:46 ` Ayal Zaks
@ 2007-08-05  7:09   ` Revital1 Eres
  0 siblings, 0 replies; 6+ messages in thread
From: Revital1 Eres @ 2007-08-05  7:09 UTC (permalink / raw)
  To: Ayal Zaks; +Cc: abel, gcc-patches, Kenneth.Zadeck, volodyan

[-- Attachment #1: Type: text/plain, Size: 7752 bytes --]

> OK.
> A couple of minor comments below.

Attached is the patch with your comments addressed.
OK for mainline?

Thanks,
Revital

(See attached file: patch_sms_5_8.txt)

> I trust no regressions were found with -fmodulo-sched-allow-regmoves,
> right? As the current behavior allows regmoves.
>
> Ayal.
>
>
> > Thanks,
> > Revital
> >
> > 2007-08-02  Vladimir Yanovsky  <yanov@il.ibm.com>
> >             Revital Eres <eres@il.ibm.com>
>
> >         * doc/invoke.texi (-fmodulo-sched-allow-regmoves): Document new
> >         flag.
> >         * ddg.c (create_ddg_dependence): Do not check for interloop edges.
> >         Do not create anti dependence edge when a true dependence edge
> >         exists in the opposite direction and -fmodulo-sched-allow-regmoves
> >         is set.
> >         (add_cross_iteration_register_deps): Create anti dependence edge
> >         when -fno-modulo-sched-allow-regmoves is set.
> >         * common.opt (-fmodulo-sched-allow-regmoves): New flag.
> >
> >         * gcc.dg/sms-antideps.c: New test.
> >
> > [attachment "sms-antideps.txt" deleted by Ayal Zaks/Haifa/IBM]
> > [attachment "patch_reg_moves_flag_2_8.txt" deleted by Ayal Zaks/Haifa/IBM]
>
>
>
> ----- Forwarded by Ayal Zaks/Haifa/IBM on 03/08/2007 08:46 -----
>
> Ayal Zaks/Haifa/IBM wrote on 03/08/2007 08:46:28:
>
> > Index: doc/invoke.texi
> > ===================================================================
> > --- doc/invoke.texi (revision 127104)
> > +++ doc/invoke.texi (working copy)
> > @@ -328,7 +328,7 @@
> >  -finline-functions  -finline-functions-called-once @gol
> >  -finline-limit=@var{n}  -fkeep-inline-functions @gol
> >  -fkeep-static-consts  -fmerge-constants  -fmerge-all-constants @gol
> > --fmodulo-sched -fno-branch-count-reg @gol
> > +-fmodulo-sched -fmodulo-sched-allow-regmoves -fno-branch-count-reg @gol
> >  -fno-default-inline  -fno-defer-pop -fmove-loop-invariants @gol
> >  -fno-function-cse  -fno-guess-branch-probability @gol
> >  -fno-inline  -fno-math-errno  -fno-peephole  -fno-peephole2 @gol
> > @@ -5265,6 +5265,13 @@
> >  pass.  This pass looks at innermost loops and reorders their
> >  instructions by overlapping different iterations.
> >
> > +@item -fmodulo-sched-allow-regmoves
> > +@opindex fmodulo-sched-allow-regmoves
> > +Perform more aggressive SMS based modulo scheduling with register moves
> > +allowed.  By setting this flag certain anti-dependences edges will be
> > +deleted which will trigger the generation of reg-moves based on the
> > +life-range analysis.
> > +
> >  @item -fno-branch-count-reg
> >  @opindex fno-branch-count-reg
> >  Do not use ``decrement and branch'' instructions on a count register,
> > Index: ddg.c
> > ===================================================================
> > --- ddg.c (revision 127104)
> > +++ ddg.c (working copy)
> > @@ -150,17 +150,11 @@
> >  {
> >    ddg_edge_ptr e;
> >    int latency, distance = 0;
> > -  int interloop = (src_node->cuid >= dest_node->cuid);
>
> As create_ddg_dependence() treats only intra-iteration dependencies,
> suggest renaming it to something like
> create_ddg_dep_from_intra_loop_link().
>
>
> >    dep_type t = TRUE_DEP;
> >    dep_data_type dt = (mem_access_insn_p (src_node->insn)
> >          && mem_access_insn_p (dest_node->insn) ? MEM_DEP
> >              : REG_DEP);
> > -
> > -  /* For now we don't have an exact calculation of the distance,
> > -     so assume 1 conservatively.  */
> > -  if (interloop)
> > -     distance = 1;
> > -
> > +  gcc_assert (src_node->cuid < dest_node->cuid);
> >    gcc_assert (link);
> >
> >    /* Note: REG_DEP_ANTI applies to MEM ANTI_DEP as well!!  */
> > @@ -168,27 +162,34 @@
> >      t = ANTI_DEP;
> >    else if (DEP_KIND (link) == REG_DEP_OUTPUT)
> >      t = OUTPUT_DEP;
> > -  latency = dep_cost (link);
> >
> > -  e = create_ddg_edge (src_node, dest_node, t, dt, latency, distance);
> > +  /* We currently choose to delete certain anti-deps edges and compensate
>                             ^^^^^^^^^
>                             not to create
>
> > +     for that by generating reg-moves based on the life-range analysis.
> > +     The anti-deps that will be deleted are the ones which have true-deps
> > +     edges in the opposite direction (in other words the kernel has only
> > +     one def of the relevant register).
> > +     TODO: support the removal of all anti-deps edges, i.e. including
> > +     those whose register has multiple defs in the loop.  */
> > +  if (flag_modulo_sched_allow_regmoves && (t == ANTI_DEP && dt == REG_DEP))
> > +    {
> > +      rtx set;
> >
> > -  if (interloop)
> > -    {
> > -      /* Some interloop dependencies are relaxed:
> > -  1. Every insn is output dependent on itself; ignore such deps.
> > -  2. Every true/flow dependence is an anti dependence in the
> > -  opposite direction with distance 1; such register deps
> > -  will be removed by renaming if broken --- ignore them.  */
> > -      if (!(t == OUTPUT_DEP && src_node == dest_node)
> > -   && !(t == ANTI_DEP && dt == REG_DEP))
> > - add_backarc_to_ddg (g, e);
> > -      else
> > - free (e);
> > +      set = single_set (dest_node->insn);
> > +      if (set)
> > +        {
> > +          int regno = REGNO (SET_DEST (set));
> > +          struct df_ref *first_def =
> > +            df_bb_regno_first_def_find (g->bb, regno);
> > +          struct df_rd_bb_info *bb_info = DF_RD_BB_INFO (g->bb);
> > +
> > +          if (bitmap_bit_p (bb_info->gen, first_def->id))
> > +            return;
> > +        }
> >      }
> > -  else if (t == ANTI_DEP && dt == REG_DEP)
> > -    free (e);  /* We can fix broken anti register deps using reg-moves.  */
> > -  else
> > -    add_edge_to_ddg (g, e);
> > +
> > +   latency = dep_cost (link);
> > +   e = create_ddg_edge (src_node, dest_node, t, dt, latency, distance);
> > +   add_edge_to_ddg (g, e);
> >  }
> >
> >  /* The same as the above function, but it doesn't require a link parameter.  */
> > @@ -247,6 +248,11 @@
> >    gcc_assert (last_def_node);
> >    gcc_assert (first_def);
> >
> > +#ifdef ENABLE_CHECKING
> > +  if (last_def->id != first_def->id)
> > +    gcc_assert (!bitmap_bit_p (bb_info->gen, first_def->id));
> > +#endif
> > +
> >    /* Create inter-loop true dependences and anti dependences.  */
> >    for (r_use = DF_REF_CHAIN (last_def); r_use != NULL; r_use = r_use->next)
> >      {
> > @@ -280,14 +286,11 @@
> >
> >     gcc_assert (first_def_node);
> >
> > -          if (last_def->id != first_def->id)
> > -            {
> > -#ifdef ENABLE_CHECKING
> > -              gcc_assert (!bitmap_bit_p (bb_info->gen, first_def->id));
> > -#endif
> > -              create_ddg_dep_no_link (g, use_node, first_def_node, ANTI_DEP,
> > -                                      REG_DEP, 1);
> > -            }
> > +          if (last_def->id != first_def->id
> > +              || !flag_modulo_sched_allow_regmoves)
> > +            create_ddg_dep_no_link (g, use_node, first_def_node, ANTI_DEP,
> > +                                    REG_DEP, 1);
> > +
> >   }
> >      }
> >    /* Create an inter-loop output dependence between LAST_DEF (which is the
> > Index: common.opt
> > ===================================================================
> > --- common.opt (revision 127104)
> > +++ common.opt (working copy)
> > @@ -651,6 +651,10 @@
> >  Common Report Var(flag_modulo_sched) Optimization
> >  Perform SMS based modulo scheduling before the first scheduling pass
> >
> > +fmodulo-sched-allow-regmoves
> > +Common Report Var(flag_modulo_sched_allow_regmoves)
> > +Perform SMS based modulo scheduling with register moves allowed
> > +
> >  fmove-loop-invariants
> >  Common Report Var(flag_move_loop_invariants) Init(1) Optimization
> >  Move loop invariant computations out of loops
>

[-- Attachment #2: patch_sms_5_8.txt --]
[-- Type: text/plain, Size: 9347 bytes --]

Index: doc/invoke.texi
===================================================================
--- doc/invoke.texi	(revision 127221)
+++ doc/invoke.texi	(working copy)
@@ -328,7 +328,7 @@
 -finline-functions  -finline-functions-called-once @gol
 -finline-limit=@var{n}  -fkeep-inline-functions @gol
 -fkeep-static-consts  -fmerge-constants  -fmerge-all-constants @gol
--fmodulo-sched -fno-branch-count-reg @gol
+-fmodulo-sched -fmodulo-sched-allow-regmoves -fno-branch-count-reg @gol
 -fno-default-inline  -fno-defer-pop -fmove-loop-invariants @gol
 -fno-function-cse  -fno-guess-branch-probability @gol
 -fno-inline  -fno-math-errno  -fno-peephole  -fno-peephole2 @gol
@@ -5265,6 +5265,13 @@
 pass.  This pass looks at innermost loops and reorders their
 instructions by overlapping different iterations.
 
+@item -fmodulo-sched-allow-regmoves
+@opindex fmodulo-sched-allow-regmoves
+Perform more aggressive SMS based modulo scheduling with register moves
+allowed.  By setting this flag certain anti-dependences edges will be
+deleted which will trigger the generation of reg-moves based on the
+life-range analysis.
+
 @item -fno-branch-count-reg
 @opindex fno-branch-count-reg
 Do not use ``decrement and branch'' instructions on a count register,
Index: ddg.c
===================================================================
--- ddg.c	(revision 127221)
+++ ddg.c	(working copy)
@@ -51,7 +51,8 @@
 static void add_backarc_to_ddg (ddg_ptr, ddg_edge_ptr);
 static void add_backarc_to_scc (ddg_scc_ptr, ddg_edge_ptr);
 static void add_scc_to_ddg (ddg_all_sccs_ptr, ddg_scc_ptr);
-static void create_ddg_dependence (ddg_ptr, ddg_node_ptr, ddg_node_ptr, dep_t);
+static void create_ddg_dep_from_intra_loop_link (ddg_ptr, ddg_node_ptr,
+                                                 ddg_node_ptr, dep_t);
 static void create_ddg_dep_no_link (ddg_ptr, ddg_node_ptr, ddg_node_ptr,
  				    dep_type, dep_data_type, int);
 static ddg_edge_ptr create_ddg_edge (ddg_node_ptr, ddg_node_ptr, dep_type,
@@ -145,22 +146,16 @@
 /* Computes the dependence parameters (latency, distance etc.), creates
    a ddg_edge and adds it to the given DDG.  */
 static void
-create_ddg_dependence (ddg_ptr g, ddg_node_ptr src_node,
-		       ddg_node_ptr dest_node, dep_t link)
+create_ddg_dep_from_intra_loop_link (ddg_ptr g, ddg_node_ptr src_node,
+                                     ddg_node_ptr dest_node, dep_t link)
 {
   ddg_edge_ptr e;
   int latency, distance = 0;
-  int interloop = (src_node->cuid >= dest_node->cuid);
   dep_type t = TRUE_DEP;
   dep_data_type dt = (mem_access_insn_p (src_node->insn)
 		      && mem_access_insn_p (dest_node->insn) ? MEM_DEP
 							     : REG_DEP);
-
-  /* For now we don't have an exact calculation of the distance,
-     so assume 1 conservatively.  */
-  if (interloop)
-     distance = 1;
-
+  gcc_assert (src_node->cuid < dest_node->cuid);
   gcc_assert (link);
 
   /* Note: REG_DEP_ANTI applies to MEM ANTI_DEP as well!!  */
@@ -168,27 +163,34 @@
     t = ANTI_DEP;
   else if (DEP_KIND (link) == REG_DEP_OUTPUT)
     t = OUTPUT_DEP;
-  latency = dep_cost (link);
 
-  e = create_ddg_edge (src_node, dest_node, t, dt, latency, distance);
+  /* We currently choose not to create certain anti-deps edges and
+     compensate for that by generating reg-moves based on the life-range
+     analysis.  The anti-deps that will be deleted are the ones which
+     have true-deps edges in the opposite direction (in other words
+     the kernel has only one def of the relevant register).  TODO:
+     support the removal of all anti-deps edges, i.e. including those
+     whose register has multiple defs in the loop.  */
+  if (flag_modulo_sched_allow_regmoves && (t == ANTI_DEP && dt == REG_DEP))
+    {
+      rtx set;
 
-  if (interloop)
-    {
-      /* Some interloop dependencies are relaxed:
-	 1. Every insn is output dependent on itself; ignore such deps.
-	 2. Every true/flow dependence is an anti dependence in the
-	 opposite direction with distance 1; such register deps
-	 will be removed by renaming if broken --- ignore them.  */
-      if (!(t == OUTPUT_DEP && src_node == dest_node)
-	  && !(t == ANTI_DEP && dt == REG_DEP))
-	add_backarc_to_ddg (g, e);
-      else
-	free (e);
+      set = single_set (dest_node->insn);
+      if (set)
+        {
+          int regno = REGNO (SET_DEST (set));
+          struct df_ref *first_def =
+            df_bb_regno_first_def_find (g->bb, regno);
+          struct df_rd_bb_info *bb_info = DF_RD_BB_INFO (g->bb);
+
+          if (bitmap_bit_p (bb_info->gen, first_def->id))
+            return;
+        }
     }
-  else if (t == ANTI_DEP && dt == REG_DEP)
-    free (e);  /* We can fix broken anti register deps using reg-moves.  */
-  else
-    add_edge_to_ddg (g, e);
+
+  latency = dep_cost (link);
+  e = create_ddg_edge (src_node, dest_node, t, dt, latency, distance);
+  add_edge_to_ddg (g, e);
 }
 
 /* The same as the above function, but it doesn't require a link parameter.  */
@@ -247,6 +249,11 @@
   gcc_assert (last_def_node);
   gcc_assert (first_def);
 
+#ifdef ENABLE_CHECKING
+  if (last_def->id != first_def->id)
+    gcc_assert (!bitmap_bit_p (bb_info->gen, first_def->id));
+#endif
+
   /* Create inter-loop true dependences and anti dependences.  */
   for (r_use = DF_REF_CHAIN (last_def); r_use != NULL; r_use = r_use->next)
     {
@@ -280,14 +287,11 @@
 
 	  gcc_assert (first_def_node);
 
-          if (last_def->id != first_def->id)
-            {
-#ifdef ENABLE_CHECKING
-              gcc_assert (!bitmap_bit_p (bb_info->gen, first_def->id));
-#endif
-              create_ddg_dep_no_link (g, use_node, first_def_node, ANTI_DEP,
-                                      REG_DEP, 1);
-            }
+          if (last_def->id != first_def->id
+              || !flag_modulo_sched_allow_regmoves)
+            create_ddg_dep_no_link (g, use_node, first_def_node, ANTI_DEP,
+                                    REG_DEP, 1);
+
 	}
     }
   /* Create an inter-loop output dependence between LAST_DEF (which is the
@@ -392,7 +396,7 @@
 	    continue;
 
       	  add_forw_dep (link);
-	  create_ddg_dependence (g, src_node, dest_node, dep);
+	  create_ddg_dep_from_intra_loop_link (g, src_node, dest_node, dep);
 	}
 
       /* If this insn modifies memory, add an edge to all insns that access
Index: ChangeLog
===================================================================
--- ChangeLog	(revision 127221)
+++ ChangeLog	(working copy)
@@ -1,3 +1,18 @@
+2007-08-05  Vladimir Yanovsky  <yanov@il.ibm.com>
+            Revital Eres <eres@il.ibm.com>
+
+	* doc/invoke.texi (-fmodulo-sched-allow-regmoves): Document new
+	flag.
+	* ddg.c (create_ddg_dependence): Rename to...
+	(create_ddg_dep_from_intra_loop_link): This.  Do not check
+	for interloop edges.  Do not create anti dependence edge when
+	a true dependence edge exists in the opposite direction and
+	-fmodulo-sched-allow-regmoves is set.
+	(build_intra_loop_deps): Call create_ddg_dep_from_intra_loop_link.
+	(add_cross_iteration_register_deps): Create anti dependence edge
+	when -fno-modulo-sched-allow-regmoves is set.
+	* common.opt (-fmodulo-sched-allow-regmoves): New flag.
+
 2007-08-04  Richard Sandiford  <richard@codesourcery.com>
 
 	* config/arm/arm.md (movsi): Add braces.
Index: testsuite/gcc.dg/sms-antideps.c
===================================================================
--- testsuite/gcc.dg/sms-antideps.c	(revision 0)
+++ testsuite/gcc.dg/sms-antideps.c	(revision 0)
@@ -0,0 +1,37 @@
+/*  This test is a reduced test case for a bug that caused a
+    bootstrap failure with -fmodulo-sched.  Related to a broken anti-dep
+    that was not fixed by reg-moves.  */
+
+ /* { dg-do run } */
+ /* { dg-options "-O2 -fmodulo-sched -fmodulo-sched-allow-regmoves" } */
+
+#include <stdlib.h>
+
+unsigned long long
+foo (long long ixi, unsigned ctr)
+{
+  unsigned long long irslt = 1;
+  long long ix = ixi;
+
+  for (; ctr; ctr--)
+    {
+      irslt *= ix;
+      ix *= ix;
+    }
+
+  if (irslt != 14348907)
+    abort ();
+  return irslt;
+}
+
+
+int
+main ()
+{
+  unsigned long long res;
+
+  res = foo (3, 4);
+  return 0;
+}
+
+
Index: testsuite/ChangeLog
===================================================================
--- testsuite/ChangeLog	(revision 127221)
+++ testsuite/ChangeLog	(working copy)
@@ -1,3 +1,8 @@
+2007-08-05  Vladimir Yanovsky  <yanov@il.ibm.com>
+            Revital Eres <eres@il.ibm.com>
+
+	* gcc.dg/sms-antideps.c: New test.
+
 2007-08-04  Paul Thomas  <pault@gcc.gnu.org>
 
 	PR fortran/31214
Index: common.opt
===================================================================
--- common.opt	(revision 127221)
+++ common.opt	(working copy)
@@ -651,6 +651,10 @@
 Common Report Var(flag_modulo_sched) Optimization
 Perform SMS based modulo scheduling before the first scheduling pass
 
+fmodulo-sched-allow-regmoves
+Common Report Var(flag_modulo_sched_allow_regmoves)
+Perform SMS based modulo scheduling with register moves allowed
+
 fmove-loop-invariants
 Common Report Var(flag_move_loop_invariants) Init(1) Optimization
 Move loop invariant computations out of loops
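
For reference, the expected value in sms-antideps.c can be checked by hand: with ix = 3 and ctr = 4 the loop multiplies irslt by 3, 9, 81 and 6561 in turn, giving 3^15 = 14348907.  A minimal standalone sketch of that arithmetic follows; power_chain and self_check are illustrative names and are not part of the patch.

```c
#include <assert.h>

/* Same arithmetic as foo () in sms-antideps.c, without the abort check.
   The name power_chain is hypothetical.  */
unsigned long long
power_chain (long long ix, unsigned ctr)
{
  unsigned long long irslt = 1;

  for (; ctr; ctr--)
    {
      irslt *= ix;	/* Multiply by the current power of the input.  */
      ix *= ix;		/* Square ix for the next iteration.  */
    }
  return irslt;
}

/* Check the constant the test compares against: 3 * 9 * 81 * 6561.  */
void
self_check (void)
{
  assert (power_chain (3, 4) == 14348907ULL);
}
```

Each iteration both reads and redefines ix and irslt, so the loop carries the true-dep and opposite anti-dep pattern on a single def that the patch is concerned with.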


* Re: [PATCH][modulo-sched] New flag to control reg-moves generation
       [not found] <OFCEF31A19.638AF6EA-ONC225732C.001EF49E-C225732C.0040AC87@LocalDomain>
@ 2007-08-03 16:07 ` Revital1 Eres
  0 siblings, 0 replies; 6+ messages in thread
From: Revital1 Eres @ 2007-08-03 16:07 UTC (permalink / raw)
  To: Ayal Zaks; +Cc: abel, gcc-patches, Kenneth.Zadeck, volodyan


> >
> > :ADDPATCH middle-end (modulo-sched):
> >
> > This patch was bootstrapped and tested on PPC and x86_64 (also with
> > --enable-checking=assert), with and without -fmodulo-sched-allow-regmoves
> > flag.
> >
> > OK for mainline?
> >

> OK.
> A couple of minor comments below.
> I trust no regressions were found with -fmodulo-sched-allow-regmoves,
> right? As the current behavior allows regmoves.

Yes, no regressions with -fmodulo-sched-allow-regmoves.
I'll commit the patch with your comments.

Thanks,
Revital


* Re: [PATCH][modulo-sched] New flag to control reg-moves generation
       [not found] <OF2ACDB28C.B413182E-ONC2257327.00660549-C225732B.00288554@LocalDomain>
@ 2007-08-03 11:46 ` Ayal Zaks
  2007-08-05  7:09   ` Revital1 Eres
  0 siblings, 1 reply; 6+ messages in thread
From: Ayal Zaks @ 2007-08-03 11:46 UTC (permalink / raw)
  To: Revital1 Eres; +Cc: abel, gcc-patches, Kenneth.Zadeck, volodyan

Revital1 Eres/Haifa/IBM wrote on 02/08/2007 10:22:35:

> Hello,
>
> This patch is the second one in the series of patches originated from
> patch 1 of 2 (http://gcc.gnu.org/ml/gcc-patches/2007-01/msg01468.html).
>
> As mentioned in a previous message we decided to split patch 1 of 2
> into several sub patches; marking known problems in the way for later
> handling in order to make a progress towards the insertion of the main
> issue this patch address - the removal of the profitability check which
> currently cripples SMS.  (The previous ddg patch that was submitted was
> out of track as it was not related to patch 1 of 2)
>
> Here is the list of issues patch 1 of 2 addresses: (a more
> detailed description of those items is available also in
> http://gcc.gnu.org/wiki/SwingModuloScheduling)
>
> 1.1 Avoid SMS when the loop contains inc instruction. (commited)
> 1.2 Fix removal of anti-deps.
> 1.3 Add -fsms-allow-reg-moves flag to control reg-moves generation.
> 1.4 Fix order of instructions within one cycle.
> 1.5 Remove profitability checks.
>
> After those issues will be handled we intend to insert further
> enhancements which includes the extension of the do-loop pattern
> recognition (patch 2 of 2) and address the known problems.
>
> The attached patch handles issues 1.2 and 1.3 above.  It introduces a
> new flag -fmodulo-sched-allow-regmoves which controls the generation of
> reg-moves and thus perform a more aggressive SMS.
>
> When -fno-modulo-sched-allow-regmoves is set all the dependencies exist
> in the ddg and no reg-moves are needed.
>
> When -fmodulo-sched-allow-regmoves is set we delete certain anti-deps
> edges and compensate for that by generating reg-moves based on the
> life-range analysis.
>
> The anti-deps that will be deleted are the ones which have true-deps edges
> in the opposite direction (in other words the kernel has only one def).
> By deleting those anti-deps edges the corresponding uses are allowed
> to be scheduled further away from their def, even more than ii cycles
> after their def (which can be detected by the life-range analysis). The
> case where there is no such opposite true-dep edge (when there is
> more than one def for the relevant register) we choose not to delete
> the anti-dep edge for now as deleting such edge can violate this anti
> dependence without having the corresponding life range exceed II cycles.
> We intend to support the removal of all the anti-deps edges as part of
> the enhancements plans mentioned above.
>
> :ADDPATCH middle-end (modulo-sched):
>
> This patch was bootstrapped and tested on PPC and x86_64 (also with
> --enable-checking=assert), with and without -fmodulo-sched-allow-regmoves
> flag.
>
> OK for mainline?
>

OK.
A couple of minor comments below.
I trust no regressions were found with -fmodulo-sched-allow-regmoves,
right? As the current behavior allows regmoves.

Ayal.


> Thanks,
> Revital
>
> 2007-08-02  Vladimir Yanovsky  <yanov@il.ibm.com>
>             Revital Eres <eres@il.ibm.com>

>         * doc/invoke.texi (-fmodulo-sched-allow-regmoves): Document new
>         flag.
>         * ddg.c (create_ddg_dependence): Do not check for interloop
>         edges.
>         Do not create anti dependence edge when a true dependence edge
>         exists in the opposite direction and -fmodulo-sched-allow-regmoves
>         is set.
>         (add_cross_iteration_register_deps): Create anti dependence edge
>         when -fno-modulo-sched-allow-regmoves is set.
>         * common.opt (-fmodulo-sched-allow-regmoves): New flag.
>
>         * gcc.dg/sms-antideps.c: New test.
>
> [attachment "sms-antideps.txt" deleted by Ayal Zaks/Haifa/IBM]
> [attachment "patch_reg_moves_flag_2_8.txt" deleted by Ayal Zaks/Haifa/IBM]



----- Forwarded by Ayal Zaks/Haifa/IBM on 03/08/2007 08:46 -----

Ayal Zaks/Haifa/IBM wrote on 03/08/2007 08:46:28:

> Index: doc/invoke.texi
> ===================================================================
> --- doc/invoke.texi (revision 127104)
> +++ doc/invoke.texi (working copy)
> @@ -328,7 +328,7 @@
>  -finline-functions  -finline-functions-called-once @gol
>  -finline-limit=@var{n}  -fkeep-inline-functions @gol
>  -fkeep-static-consts  -fmerge-constants  -fmerge-all-constants @gol
> --fmodulo-sched -fno-branch-count-reg @gol
> +-fmodulo-sched -fmodulo-sched-allow-regmoves -fno-branch-count-reg @gol
>  -fno-default-inline  -fno-defer-pop -fmove-loop-invariants @gol
>  -fno-function-cse  -fno-guess-branch-probability @gol
>  -fno-inline  -fno-math-errno  -fno-peephole  -fno-peephole2 @gol
> @@ -5265,6 +5265,13 @@
>  pass.  This pass looks at innermost loops and reorders their
>  instructions by overlapping different iterations.
>
> +@item -fmodulo-sched-allow-regmoves
> +@opindex fmodulo-sched-allow-regmoves
> +Perform more aggressive SMS based modulo scheduling with register moves
> +allowed.  By setting this flag certain anti-dependences edges will be
> +deleted which will trigger the generation of reg-moves based on the
> +life-range analysis.
> +
>  @item -fno-branch-count-reg
>  @opindex fno-branch-count-reg
>  Do not use ``decrement and branch'' instructions on a count register,
> Index: ddg.c
> ===================================================================
> --- ddg.c (revision 127104)
> +++ ddg.c (working copy)
> @@ -150,17 +150,11 @@
>  {
>    ddg_edge_ptr e;
>    int latency, distance = 0;
> -  int interloop = (src_node->cuid >= dest_node->cuid);

As create_ddg_dependence() treats only intra-iteration dependencies,
suggest renaming it to something like
create_ddg_dep_from_intra_loop_link().


>    dep_type t = TRUE_DEP;
>    dep_data_type dt = (mem_access_insn_p (src_node->insn)
>          && mem_access_insn_p (dest_node->insn) ? MEM_DEP
>              : REG_DEP);
> -
> -  /* For now we don't have an exact calculation of the distance,
> -     so assume 1 conservatively.  */
> -  if (interloop)
> -     distance = 1;
> -
> +  gcc_assert (src_node->cuid < dest_node->cuid);
>    gcc_assert (link);
>
>    /* Note: REG_DEP_ANTI applies to MEM ANTI_DEP as well!!  */
> @@ -168,27 +162,34 @@
>      t = ANTI_DEP;
>    else if (DEP_KIND (link) == REG_DEP_OUTPUT)
>      t = OUTPUT_DEP;
> -  latency = dep_cost (link);
>
> -  e = create_ddg_edge (src_node, dest_node, t, dt, latency, distance);
> +  /* We currently choose to delete certain anti-deps edges and compensate
                            ^^^^^^^^^
                            not to create

> +     for that by generating reg-moves based on the life-range analysis.
> +     The anti-deps that will be deleted are the ones which have true-deps
> +     edges in the opposite direction (in other words the kernel has only
> +     one def of the relevant register).
> +     TODO: support the removal of all anti-deps edges, i.e. including
> +     those whose register has multiple defs in the loop.  */
> +  if (flag_modulo_sched_allow_regmoves && (t == ANTI_DEP && dt == REG_DEP))
> +    {
> +      rtx set;
>
> -  if (interloop)
> -    {
> -      /* Some interloop dependencies are relaxed:
> -  1. Every insn is output dependent on itself; ignore such deps.
> -  2. Every true/flow dependence is an anti dependence in the
> -  opposite direction with distance 1; such register deps
> -  will be removed by renaming if broken --- ignore them.  */
> -      if (!(t == OUTPUT_DEP && src_node == dest_node)
> -   && !(t == ANTI_DEP && dt == REG_DEP))
> - add_backarc_to_ddg (g, e);
> -      else
> - free (e);
> +      set = single_set (dest_node->insn);
> +      if (set)
> +        {
> +          int regno = REGNO (SET_DEST (set));
> +          struct df_ref *first_def =
> +            df_bb_regno_first_def_find (g->bb, regno);
> +          struct df_rd_bb_info *bb_info = DF_RD_BB_INFO (g->bb);
> +
> +          if (bitmap_bit_p (bb_info->gen, first_def->id))
> +            return;
> +        }
>      }
> -  else if (t == ANTI_DEP && dt == REG_DEP)
> -    free (e);  /* We can fix broken anti register deps using reg-moves.  */
> -  else
> -    add_edge_to_ddg (g, e);
> +
> +   latency = dep_cost (link);
> +   e = create_ddg_edge (src_node, dest_node, t, dt, latency, distance);
> +   add_edge_to_ddg (g, e);
>  }
>
>  /* The same as the above function, but it doesn't require a link parameter.  */
> @@ -247,6 +248,11 @@
>    gcc_assert (last_def_node);
>    gcc_assert (first_def);
>
> +#ifdef ENABLE_CHECKING
> +  if (last_def->id != first_def->id)
> +    gcc_assert (!bitmap_bit_p (bb_info->gen, first_def->id));
> +#endif
> +
>    /* Create inter-loop true dependences and anti dependences.  */
>    for (r_use = DF_REF_CHAIN (last_def); r_use != NULL; r_use = r_use->next)
>      {
> @@ -280,14 +286,11 @@
>
>     gcc_assert (first_def_node);
>
> -          if (last_def->id != first_def->id)
> -            {
> -#ifdef ENABLE_CHECKING
> -              gcc_assert (!bitmap_bit_p (bb_info->gen, first_def->id));
> -#endif
> -              create_ddg_dep_no_link (g, use_node, first_def_node, ANTI_DEP,
> -                                      REG_DEP, 1);
> -            }
> +          if (last_def->id != first_def->id
> +              || !flag_modulo_sched_allow_regmoves)
> +            create_ddg_dep_no_link (g, use_node, first_def_node, ANTI_DEP,
> +                                    REG_DEP, 1);
> +
>   }
>      }
>    /* Create an inter-loop output dependence between LAST_DEF (which is the
> Index: common.opt
> ===================================================================
> --- common.opt (revision 127104)
> +++ common.opt (working copy)
> @@ -651,6 +651,10 @@
>  Common Report Var(flag_modulo_sched) Optimization
>  Perform SMS based modulo scheduling before the first scheduling pass
>
> +fmodulo-sched-allow-regmoves
> +Common Report Var(flag_modulo_sched_allow_regmoves)
> +Perform SMS based modulo scheduling with register moves allowed
> +
>  fmove-loop-invariants
>  Common Report Var(flag_move_loop_invariants) Init(1) Optimization
>  Move loop invariant computations out of loops
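
The edge-selection logic in the ddg.c hunks above can be summarised by the following simplified model.  It is only a sketch under stated assumptions: the boolean parameters stand in for the results of single_set () and the reaching-definitions "gen" bitmap test used in the patch, the enums are local copies rather than GCC's own declarations, and both function names are hypothetical.

```c
#include <stdbool.h>

/* Local stand-ins for the DDG enums; not GCC's declarations.  */
enum dep_type { TRUE_DEP, ANTI_DEP, OUTPUT_DEP };
enum dep_data_type { REG_DEP, MEM_DEP };

/* Intra-loop case: return false when the edge would be skipped.
   DEST_IS_SINGLE_SET models a non-null single_set (dest_node->insn);
   FIRST_DEF_IN_GEN models bitmap_bit_p (bb_info->gen, first_def->id),
   i.e. the first def of the register in the block is not killed by a
   later def, so the block has only one def of it.  */
bool
keep_intra_loop_edge (bool allow_regmoves, enum dep_type t,
                      enum dep_data_type dt, bool dest_is_single_set,
                      bool first_def_in_gen)
{
  if (allow_regmoves && t == ANTI_DEP && dt == REG_DEP
      && dest_is_single_set && first_def_in_gen)
    return false;	/* Left to reg-moves generation.  */
  return true;
}

/* Cross-iteration case: the anti-dep from a use to the first def is
   created when the register has several defs, or when reg-moves are
   not allowed.  MULTIPLE_DEFS models last_def->id != first_def->id.  */
bool
create_cross_iteration_anti_dep (bool allow_regmoves, bool multiple_defs)
{
  return multiple_defs || !allow_regmoves;
}
```

With the flag off both functions always keep the edge, which matches the statement that all dependencies then exist in the ddg and no reg-moves are needed.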


end of thread, other threads:[~2007-08-05  7:34 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-08-02  7:24 [PATCH][modulo-sched] New flag to control reg-moves generation Revital1 Eres
2007-08-02 14:56 ` Andrey Belevantsev
     [not found] <OF2ACDB28C.B413182E-ONC2257327.00660549-C225732B.00288554@LocalDomain>
2007-08-03 11:46 ` Ayal Zaks
2007-08-05  7:09   ` Revital1 Eres
     [not found] <OFCEF31A19.638AF6EA-ONC225732C.001EF49E-C225732C.0040AC87@LocalDomain>
2007-08-03 16:07 ` Revital1 Eres
     [not found] <OFDC8B5268.741042D6-ONC225732E.0024DB5E-C225732E.00271F28@LocalDomain>
2007-08-05  7:34 ` Ayal Zaks
