public inbox for gcc-patches@gcc.gnu.org
* [sel-sched] IA-64 back-end improvements and tweaks
@ 2007-05-22 11:41 Alexander Monakov
  2007-05-23  9:25 ` Andrey Belevantsev
From: Alexander Monakov @ 2007-05-22 11:41 UTC (permalink / raw)
  To: gcc-patches.gcc.gnu.org; +Cc: Andrey Belevantsev

[-- Attachment #1: Type: text/plain, Size: 4764 bytes --]

Hello,

The first patch adds an important improvement to the IA-64 back end: it
allows the back end to place stop bits after every simulated processor
cycle.  This typically improves floating-point code sequences, because the
processor front-end does not take argument readiness into account when
dispersing instructions, and when functional unit availability allows it to
disperse insns whose arguments are not yet ready, the whole parallel group
stalls.  It also improves generated code when the scheduler forms a
parallel group of 7 instructions (while Itanium allows at most 6
instructions per cycle) and then emits a stop bit as needed to resolve
dependencies.  In that case there is a split issue after the sixth insn
(due to resource oversubscription) and another after the seventh (due to
the explicit stop bit); with this patch, a single stop bit is emitted after
the sixth instruction instead.  However, this may also cause severe
regressions on some code (we noticed one on perlbmk from SPEC CPU2000),
because stop bits are placed in accordance with resource availability, but
the linker can then ruin the scheduler's plans by changing some "ld8" insns
into nops during address relaxation.

The new behaviour is controlled by the back-end-specific
-msched-stop-bits-after-every-cycle option and is enabled by default.
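
To illustrate, the decision the patch adds to ia64_dfa_new_cycle can be
sketched as the following stand-alone C fragment (a simplified model, not
the real GCC hook; the argument names are stand-ins for the hook's state):

```c
#include <stdbool.h>

/* Simplified sketch of the stop-bit decision added to ia64_dfa_new_cycle.
   A stop bit is requested either when dependencies already demand a group
   barrier, or, with the new flag, whenever the simulated clock advanced
   past the cycle of the last scheduled insn and that cycle contained at
   least one real ("good") insn.  */
bool
need_stop_bit (bool barrier_needed, bool stop_bits_after_every_cycle,
               int last_clock, int clock, bool scheduled_good_insn)
{
  return barrier_needed
         || (stop_bits_after_every_cycle
             && last_clock != clock
             && scheduled_good_insn);
}
```

With the flag enabled, a cycle advance alone now forces a stop bit; with it
disabled, only a required group barrier does, which matches the old
behaviour.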

The second, third and fourth patches change the way true memory
dependencies are handled in the scheduler.  Their cumulative effect is that
the IA-64 back end notices true memory dependencies and changes their cost:
to zero for those that are unlikely to alias and that operate on
floating-point data (thus allowing such insns to be placed in a single
instruction group), or to a special higher value for those that are likely
to alias (to avoid conflicts in L2).  The aliasing-likeliness estimate
comes from the heuristic implemented for speculation support, which
compares base registers and addressing modes.

These new heuristics are controlled by a corresponding option and param
(-msched-fp-mem-deps-zero-cost and sched-mem-true-dep-cost, respectively)
and are enabled by default.
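
The combined cost adjustment can be sketched as follows (a simplified
stand-alone model of the logic in these patches, not the real GCC code;
MIN_DEP_WEAK and MEM_TRUE_DEP_COST here stand in for the GCC macro and
the sched-mem-true-dep-cost param value):

```c
#include <stdbool.h>

/* Simplified model of the memory-dependence cost adjustment.  DW encodes
   the estimated weakness of a true memory dependence: 0 means "not a true
   memory dependence", MIN_DEP_WEAK means the store and load are likely to
   alias, and any higher value means they are unlikely to alias.  */
enum { MIN_DEP_WEAK = 1, MEM_TRUE_DEP_COST = 4 };

int
adjust_mem_dep_cost (int dw, bool dep_is_fp_store,
                     bool fp_mem_deps_zero_cost, int default_cost)
{
  if (dw == MIN_DEP_WEAK)
    /* Likely to alias: keep the store and the load several cycles
       apart to avoid a cache conflict stall.  */
    return MEM_TRUE_DEP_COST;
  if (dw > MIN_DEP_WEAK && fp_mem_deps_zero_cost && dep_is_fp_store)
    /* Unlikely to alias, floating-point data: zero cost lets both
       insns land in the same instruction group.  */
    return 0;
  /* Non-memory dependence, or integer data: keep the default cost.  */
  return default_cost;
}
```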

The first two patches will be committed by Andrey Belevantsev to the
sel-sched branch.  The other two will be committed later, together with
support for speculation and pipelining of outer loops.

2007-05-22  Dmitry Zhurikhin  <zhur@ispras.ru>

         * config/ia64/ia64.c (scheduled_good_insn): New static helper
         function.
         (ia64_dfa_new_cycle): Mark that a stop bit should be before current
         insn if there was a cycle advance.
         (final_emit_insn_group_barriers): Emit stop bits before insns
         starting a new cycle.
         * config/ia64/ia64.opt (msched-stop-bits-after-every-cycle): New
         target flag.

2007-05-22  Alexander Monakov  <amonakov@ispras.ru>

         * sel-sched-ir.c (tick_check_dep_with_dw): Rename from
         tick_check_note_dep, ignore cost for true memory dependencies
         that are not likely to alias.
         (tick_check_note_dep): New implementation.
         (tick_check_note_mem_dep): Calculate dependence weight.
         * common.opt (fsel-sched-mem-deps-zero-cost): New option.
         * sched-deps.c (estimate_dep_weak): Remove static qualifier.
         * sched-deps.h (estimate_dep_weak): Declare.

2007-05-22  Alexander Monakov  <amonakov@ispras.ru>

         * sel-sched-ir.c (tick_check_dep_with_dw): Use special higher
         cost for memory dependencies that are likely to alias.
         (tick_check_note_dep, tick_check_note_mem_dep): Pass 0 to
         tick_check_dep_with_dw to indicate non-memory or non-true memory
         dependencies.
         * params.def (PARAM_SELSCHED_MEM_TRUE_DEP_COST): New parameter.
         * config/ia64/ia64.c (ia64_optimization_options): Set it to 4.

2007-05-22  Alexander Monakov  <amonakov@ispras.ru>

         * target.h (struct gcc_target): Add new parameter to adjust_cost_2
         declaration.
         * haifa-sched.c (dep_cost_1): Add new parameter into declaration,
         pass it to adjust_cost_2.
         (insn_cost1): Update.
         (dep_cost): Add new parameter into declaration, pass it to
         dep_cost_1.
         * sel-sched-ir.c (tick_check_dep_with_dw): Remove special memory
         dependencies handling.
         * common.opt (fsel-sched-mem-deps-zero-cost): Remove option.
         * sched-int.h (dep_cost): Move declaration.
         * params.def (PARAM_SCHED_MEM_TRUE_DEP_COST): Rename from
         PARAM_SELSCHED_MEM_TRUE_DEP_COST.
         * config/ia64/ia64.opt (msched-fp-mem-deps-zero-cost): New option.
         * config/ia64/ia64.c (ia64_optimization_options): Rename parameter.
         (ia64_adjust_cost_2): Add new parameter into declaration, add
         special memory dependencies handling.

[-- Attachment #2: stopbits.patch.txt --]
[-- Type: text/plain, Size: 4101 bytes --]

--- gcc-local/sel-sched-dev/gcc/config/ia64/ia64.opt	(revision 27186)
+++ gcc-local/sel-sched-dev/gcc/config/ia64/ia64.opt	(revision 27187)
@@ -144,6 +144,10 @@ msched-count-spec-in-critical-path
 Common Report Var(mflag_sched_count_spec_in_critical_path) Init(0)
 Count speculative dependencies while calculating priority of instructions
 
+msched-stop-bits-after-every-cycle
+Target Report Var(mflag_sched_stop_bits_after_every_cycle) Init(1)
+Place a stop bit after every cycle when scheduling
+
 msel-sched-renaming
 Common Report Var(mflag_sel_sched_renaming) Init(1)
 Do register renaming in selective scheduling
--- gcc-local/sel-sched-dev/gcc/config/ia64/ia64.c	(revision 27186)
+++ gcc-local/sel-sched-dev/gcc/config/ia64/ia64.c	(revision 27187)
@@ -6587,6 +6587,24 @@ ia64_first_cycle_multipass_dfa_lookahead
 
 static rtx dfa_pre_cycle_insn;
 
+/* Returns 1 when a meaningful insn was scheduled between the last group
+   barrier and LAST.  */
+static int
+scheduled_good_insn (rtx last)
+{
+  if (last && recog_memoized (last) >= 0)
+    return 1;
+
+  for ( ;
+       last != NULL && !NOTE_INSN_BASIC_BLOCK_P (last)
+       && !stops_p[INSN_UID (last)];
+       last = PREV_INSN (last))
+    if (recog_memoized (last) >= 0)
+      return 1;
+
+  return 0;
+}
+
 /* We are about to being issuing INSN.  Return nonzero if we cannot
    issue it on given cycle CLOCK and return zero if we should not sort
    the ready queue on the next clock start.  */
@@ -6603,7 +6621,12 @@ ia64_dfa_new_cycle (FILE *dump, int verb
   gcc_assert (!(reload_completed && safe_group_barrier_needed (insn))
               || last_scheduled_insn);
 
-  if ((reload_completed && safe_group_barrier_needed (insn))
+  if ((reload_completed
+       && (safe_group_barrier_needed (insn)
+	   || (mflag_sched_stop_bits_after_every_cycle
+	       && last_clock != clock
+	       && last_scheduled_insn
+	       && scheduled_good_insn (last_scheduled_insn))))
       || (last_scheduled_insn
 	  && (GET_CODE (last_scheduled_insn) == CALL_INSN
 	      || GET_CODE (PATTERN (last_scheduled_insn)) == ASM_INPUT
@@ -8323,6 +8346,7 @@ final_emit_insn_group_barriers (FILE *du
 {
   rtx insn;
   int need_barrier_p = 0;
+  int seen_good_insn = 0;
   rtx prev_insn = NULL_RTX;
 
   init_insn_group_barriers ();
@@ -8344,6 +8368,7 @@ final_emit_insn_group_barriers (FILE *du
 	    emit_insn_after (gen_insn_group_barrier (GEN_INT (3)), last);
 
 	  init_insn_group_barriers ();
+	  seen_good_insn = 0;
 	  need_barrier_p = 0;
 	  prev_insn = NULL_RTX;
 	}
@@ -8352,10 +8377,14 @@ final_emit_insn_group_barriers (FILE *du
 	  if (recog_memoized (insn) == CODE_FOR_insn_group_barrier)
 	    {
 	      init_insn_group_barriers ();
+	      seen_good_insn = 0;
 	      need_barrier_p = 0;
 	      prev_insn = NULL_RTX;
 	    }
-	  else if (need_barrier_p || group_barrier_needed (insn))
+	  else if (need_barrier_p || group_barrier_needed (insn)
+		   || (mflag_sched_stop_bits_after_every_cycle
+		       && GET_MODE (insn) == TImode
+		       && seen_good_insn))
 	    {
 	      if (TARGET_EARLY_STOP_BITS)
 		{
@@ -8379,19 +8408,29 @@ final_emit_insn_group_barriers (FILE *du
 		       last != insn;
 		       last = NEXT_INSN (last))
 		    if (INSN_P (last))
-		      group_barrier_needed (last);
+		      {
+			group_barrier_needed (last);
+			if (recog_memoized (last) >= 0)
+			  seen_good_insn = 1;
+		      }
 		}
 	      else
 		{
 		  emit_insn_before (gen_insn_group_barrier (GEN_INT (3)),
 				    insn);
 		  init_insn_group_barriers ();
+		  seen_good_insn = 0;
 		}
               group_barrier_needed (insn);
+	      if (recog_memoized (insn) >= 0)
+		seen_good_insn = 1;
 	      prev_insn = NULL_RTX;
 	    }
 	  else if (recog_memoized (insn) >= 0)
-	    prev_insn = insn;
+	    {
+	      prev_insn = insn;
+	      seen_good_insn = 1;
+	    }
 	  need_barrier_p = (GET_CODE (insn) == CALL_INSN
 			    || GET_CODE (PATTERN (insn)) == ASM_INPUT
 			    || asm_noperands (PATTERN (insn)) >= 0);

[-- Attachment #3: memcost-1.patch.txt --]
[-- Type: text/plain, Size: 3854 bytes --]

--- gcc-local/sel-sched-dev/gcc/sel-sched-ir.c	(revision 27188)
+++ gcc-local/sel-sched-dev/gcc/sel-sched-ir.c	(revision 27189)
@@ -1792,14 +1792,14 @@ static int tick_check_cycle;
 /* Whether we have seen a true dependence while checking.  */
 static int tick_check_seen_true_dep;
 
-/* An implementation of note_dep hook.  Update minimal scheduling cycle 
-   for tick_check_insn given that it depends on PRO with status DS.  */
+/* Update minimal scheduling cycle for tick_check_insn given that it depends
+   on PRO with status DS and weight DW.  */
 static void
-tick_check_note_dep (insn_t pro, ds_t ds)
+tick_check_dep_with_dw (insn_t pro, ds_t ds, dw_t dw)
 {
   insn_t con;
   enum reg_note dt;
-  int tick;
+  int tick, dc;
 
   con = tick_check_insn;
 
@@ -1824,7 +1824,15 @@ tick_check_note_dep (insn_t pro, ds_t ds
       if (dt == REG_DEP_TRUE)
         tick_check_seen_true_dep = 1;
 
-      tick = INSN_SCHED_CYCLE (pro) + dep_cost (pro, dt, con);
+      if (flag_sel_sched_mem_deps_zero_cost
+	  && dt == REG_DEP_TRUE 
+	  && dw != MIN_DEP_WEAK)
+	dc = 0;
+      else
+	dc = dep_cost (pro, dt, con);
+
+      tick = INSN_SCHED_CYCLE (pro) + dc;
+
       /* When there are several kinds of dependencies between pro and con,
          only REG_DEP_TRUE should be taken into account.  */
       if (tick > tick_check_cycle && (dt == REG_DEP_TRUE 
@@ -1833,12 +1841,24 @@ tick_check_note_dep (insn_t pro, ds_t ds
     }
 }
 
+/* An implementation of note_dep hook.	*/
+static void
+tick_check_note_dep (insn_t pro, ds_t ds)
+{
+  tick_check_dep_with_dw (pro, ds, MIN_DEP_WEAK);
+}
+
 /* An implementation of note_mem_dep hook.  */
 static void
-tick_check_note_mem_dep (rtx mem1 ATTRIBUTE_UNUSED, rtx mem2 ATTRIBUTE_UNUSED,
-			 insn_t pro, ds_t ds)
+tick_check_note_mem_dep (rtx mem1, rtx mem2, insn_t pro, ds_t ds)
 {
-  tick_check_note_dep (pro, ds);
+  dw_t dw;
+
+  dw = (ds_to_dt (ds) == REG_DEP_TRUE
+	? estimate_dep_weak (mem1, mem2)
+	: MIN_DEP_WEAK);
+
+  tick_check_dep_with_dw (pro, ds, dw);
 }
 
 static struct sched_deps_info_def tick_check_deps_info =
--- gcc-local/sel-sched-dev/gcc/common.opt	(revision 27188)
+++ gcc-local/sel-sched-dev/gcc/common.opt	(revision 27189)
@@ -846,6 +846,10 @@ fsel-sched-substitution
 Common Report Var(flag_sel_sched_substitution) Init(0)
 Perform substitution in selective scheduling
 
+fsel-sched-mem-deps-zero-cost
+Common Report Var(flag_sel_sched_mem_deps_zero_cost) Init(1)
+Set the cost of may alias true mem deps to zero
+
 fsel-sched-verbose
 Common Report Var(flag_sel_sched_verbose) Init(0)
 Be verbose when running selective scheduling
--- gcc-local/sel-sched-dev/gcc/sched-deps.c	(revision 27188)
+++ gcc-local/sel-sched-dev/gcc/sched-deps.c	(revision 27189)
@@ -122,7 +122,6 @@ static void add_back_dep (rtx, rtx, enum
 static void adjust_add_sorted_back_dep (rtx, rtx, rtx *);
 static void adjust_back_add_forw_dep (rtx, rtx *);
 static void delete_forw_dep (rtx, rtx);
-static dw_t estimate_dep_weak (rtx, rtx);
 #ifdef INSN_SCHEDULING
 #ifdef ENABLE_CHECKING
 static void check_dep_status (enum reg_note, ds_t, bool);
@@ -2250,7 +2249,7 @@ delete_forw_dep (rtx insn, rtx elem)
 }
 
 /* Estimate the weakness of dependence between MEM1 and MEM2.  */
-static dw_t
+dw_t
 estimate_dep_weak (rtx mem1, rtx mem2)
 {
   rtx r1, r2;
--- gcc-local/sel-sched-dev/gcc/sched-deps.h	(revision 27188)
+++ gcc-local/sel-sched-dev/gcc/sched-deps.h	(revision 27189)
@@ -153,6 +153,7 @@ extern void add_back_forw_dep (rtx, rtx,
 extern void delete_back_forw_dep (rtx, rtx);
 extern dw_t get_dep_weak (ds_t, ds_t);
 extern ds_t set_dep_weak (ds_t, ds_t, dw_t);
+extern dw_t estimate_dep_weak (rtx, rtx);
 extern ds_t ds_merge (ds_t, ds_t);
 
 extern void sched_deps_local_init (bool);

[-- Attachment #4: memcost-2.patch.txt --]
[-- Type: text/plain, Size: 2612 bytes --]

--- gcc-local/sel-sched-dev/gcc/sel-sched-ir.c	(revision 27230)
+++ gcc-local/sel-sched-dev/gcc/sel-sched-ir.c	(revision 27231)
@@ -1851,12 +1851,24 @@ tick_check_dep_with_dw (insn_t pro, ds_t
       if (dt == REG_DEP_TRUE)
         tick_check_seen_true_dep = 1;
 
-      if (flag_sel_sched_mem_deps_zero_cost
-          && dt == REG_DEP_TRUE 
-          && dw != MIN_DEP_WEAK)
-        dc = 0;
-      else
-        dc = dep_cost (pro, dt, con);
+      /* Adjust cost depending on dependency kind.  */
+      switch (dw)
+	{
+	case 0:
+	  /* Not a true memory dependence, use default cost.  */
+	  dc = dep_cost (pro, dt, con);
+	  break;
+	case MIN_DEP_WEAK:
+	  /* Store and load are likely to alias, use higher cost to avoid stall.  */
+	  dc = PARAM_VALUE (PARAM_SELSCHED_MEM_TRUE_DEP_COST);
+	  break;
+	default:
+	  /* Store and load are likely to be independent.  */
+	  if (flag_sel_sched_mem_deps_zero_cost)
+	    dc = 0;
+	  else
+	    dc = dep_cost (pro, dt, con);
+	}
 
       tick = INSN_SCHED_CYCLE (pro) + dc;
 
@@ -1872,7 +1884,7 @@ tick_check_dep_with_dw (insn_t pro, ds_t
 static void
 tick_check_note_dep (insn_t pro, ds_t ds)
 {
-  tick_check_dep_with_dw (pro, ds, MIN_DEP_WEAK);
+  tick_check_dep_with_dw (pro, ds, 0);
 }
 
 /* An implementation of note_mem_dep hook.  */
@@ -1883,7 +1895,7 @@ tick_check_note_mem_dep (rtx mem1, rtx m
 
   dw = (ds_to_dt (ds) == REG_DEP_TRUE
         ? estimate_dep_weak (mem1, mem2)
-        : MIN_DEP_WEAK);
+        : 0);
 
   tick_check_dep_with_dw (pro, ds, dw);
 }
--- gcc-local/sel-sched-dev/gcc/config/ia64/ia64.c	(revision 27230)
+++ gcc-local/sel-sched-dev/gcc/config/ia64/ia64.c	(revision 27231)
@@ -9999,6 +9999,7 @@ ia64_optimization_options (int level ATT
   set_param_value ("simultaneous-prefetches", 6);
   set_param_value ("l1-cache-line-size", 32);
 
+  set_param_value("selsched-mem-true-dep-cost", 4);
 }
 
 /* HP-UX version_id attribute.
--- gcc-local/sel-sched-dev/gcc/params.def	(revision 27230)
+++ gcc-local/sel-sched-dev/gcc/params.def	(revision 27231)
@@ -552,6 +552,14 @@ DEFPARAM(PARAM_SELSCHED_MAX_LOOKAHEAD,
          "The maximum size of the lookahead window of selective scheduling",
          32, 0, 0)
 
+/* Minimal distance (in CPU cycles) between store and load targeting same
+   memory locations.  */
+
+DEFPARAM (PARAM_SELSCHED_MEM_TRUE_DEP_COST,
+	  "selsched-mem-true-dep-cost",
+	  "Minimal distance between overlapping store and load",
+	  1, 0, 0)
+
 DEFPARAM(PARAM_ALLOW_START,
          "allow_start",
          "Allow something",

[-- Attachment #5: memcost-3.patch.txt --]
[-- Type: text/plain, Size: 9093 bytes --]

--- gcc-local/sel-sched-dev/gcc/target.h	(revision 27672)
+++ gcc-local/sel-sched-dev/gcc/target.h	(revision 27673)
@@ -339,7 +339,7 @@ struct gcc_target
     /* Given the current cost, COST, of an insn, INSN, calculate and
        return a new cost based on its relationship to DEP_INSN through the
        dependence of type DEP_TYPE.  The default is to make no adjustment.  */
-    int (* adjust_cost_2) (rtx insn, int, rtx def_insn, int cost);
+    int (* adjust_cost_2) (rtx insn, int, rtx def_insn, int cost, int dw);
 
     /* The following member value is a pointer to a function called
        by the insn scheduler. This hook is called to notify the backend
--- gcc-local/sel-sched-dev/gcc/haifa-sched.c	(revision 27672)
+++ gcc-local/sel-sched-dev/gcc/haifa-sched.c	(revision 27673)
@@ -495,7 +495,7 @@ haifa_classify_insn (rtx insn)
 /* Forward declarations.  */
 
 HAIFA_INLINE static int insn_cost1 (rtx, enum reg_note, rtx, rtx);
-static int dep_cost_1 (rtx, enum reg_note, rtx, rtx, int);
+static int dep_cost_1 (rtx, enum reg_note, dw_t, rtx, rtx, int);
 static int priority (rtx);
 static int rank_for_schedule (const void *, const void *);
 static void swap_sort (rtx *, int);
@@ -644,12 +644,12 @@ insn_cost1 (rtx insn, enum reg_note dep_
   if (used == 0)
     return cost;
 
-  return dep_cost_1 (insn, dep_type, link, used, cost);
+  return dep_cost_1 (insn, dep_type, 0, link, used, cost);
 }
 
 /* Compute the cost of the INSN given the dependence attributes.  */
 static int
-dep_cost_1 (rtx insn, enum reg_note dep_type, rtx link, rtx used, int cost)
+dep_cost_1 (rtx insn, enum reg_note dep_type, dw_t dw, rtx link, rtx used, int cost)
 {
   /* A USE insn should never require the value used to be computed.
      This allows the computation of a function's result and parameter
@@ -678,7 +678,8 @@ dep_cost_1 (rtx insn, enum reg_note dep_
 	}
 
       if (targetm.sched.adjust_cost_2)
-	cost = targetm.sched.adjust_cost_2 (used, (int) dep_type, insn, cost);
+	cost = targetm.sched.adjust_cost_2 (used, (int) dep_type, insn, cost,
+					    dw);
       else
 	{
           if (!link && targetm.sched.adjust_cost)
@@ -704,9 +705,9 @@ dep_cost_1 (rtx insn, enum reg_note dep_
 
 /* A convenience wrapper.  */
 int
-dep_cost (rtx pro, enum reg_note dt, rtx con)
+dep_cost (rtx pro, enum reg_note dt, dw_t dw, rtx con)
 {
-  return dep_cost_1 (pro, dt, NULL, con, -1);
+  return dep_cost_1 (pro, dt, dw, NULL, con, -1);
 }
 
 /* Compute the priority number for INSN.
--- gcc-local/sel-sched-dev/gcc/sel-sched-ir.c	(revision 27672)
+++ gcc-local/sel-sched-dev/gcc/sel-sched-ir.c	(revision 27673)
@@ -1826,7 +1826,7 @@ tick_check_dep_with_dw (insn_t pro, ds_t
 {
   insn_t con;
   enum reg_note dt;
-  int tick, dc;
+  int tick;
 
   con = tick_check_insn;
 
@@ -1846,35 +1846,16 @@ tick_check_dep_with_dw (insn_t pro, ds_t
 	}
 
       gcc_assert (INSN_SCHED_CYCLE (pro) > 0);
-      
+
       dt = ds_to_dt (ds);
       if (dt == REG_DEP_TRUE)
         tick_check_seen_true_dep = 1;
 
-      /* Adjust cost depending on dependency kind.  */
-      switch (dw)
-	{
-	case 0:
-	  /* Not a true memory dependence, use default cost.  */
-	  dc = dep_cost (pro, dt, con);
-	  break;
-	case MIN_DEP_WEAK:
-	  /* Store and load are likely to alias, use higher cost to avoid stall.  */
-	  dc = PARAM_VALUE (PARAM_SELSCHED_MEM_TRUE_DEP_COST);
-	  break;
-	default:
-	  /* Store and load are likely to be independent.  */
-	  if (flag_sel_sched_mem_deps_zero_cost)
-	    dc = 0;
-	  else
-	    dc = dep_cost (pro, dt, con);
-	}
-
-      tick = INSN_SCHED_CYCLE (pro) + dc;
+      tick = INSN_SCHED_CYCLE (pro) + dep_cost (pro, dt, dw, con);
 
       /* When there are several kinds of dependencies between pro and con,
          only REG_DEP_TRUE should be taken into account.  */
-      if (tick > tick_check_cycle && (dt == REG_DEP_TRUE 
+      if (tick > tick_check_cycle && (dt == REG_DEP_TRUE
                                       || !tick_check_seen_true_dep))
 	tick_check_cycle = tick;
     }
--- gcc-local/sel-sched-dev/gcc/common.opt	(revision 27672)
+++ gcc-local/sel-sched-dev/gcc/common.opt	(revision 27673)
@@ -850,10 +850,6 @@ fsel-sched-substitution
 Common Report Var(flag_sel_sched_substitution) Init(0)
 Perform substitution in selective scheduling
 
-fsel-sched-mem-deps-zero-cost
-Common Report Var(flag_sel_sched_mem_deps_zero_cost) Init(1)
-Set the cost of may alias true mem deps to zero
-
 fsel-sched-verbose
 Common Report Var(flag_sel_sched_verbose) Init(0)
 Be verbose when running selective scheduling
--- gcc-local/sel-sched-dev/gcc/sched-int.h	(revision 27672)
+++ gcc-local/sel-sched-dev/gcc/sched-int.h	(revision 27673)
@@ -272,7 +272,6 @@ enum INSN_TRAP_CLASS
 extern size_t dfa_state_size;
 
 extern void advance_state (state_t);
-extern int dep_cost (rtx, enum reg_note, rtx);
 
 extern void sched_init (void);
 extern void sched_finish (void);
@@ -735,6 +734,7 @@ enum SPEC_SCHED_FLAGS {
 #endif
 
 /* Functions in haifa-sched.c.  */
+extern int dep_cost (rtx, enum reg_note, dw_t, rtx);
 extern int no_real_insns_p (rtx, rtx);
 
 extern int insn_cost (rtx, rtx, rtx);
--- gcc-local/sel-sched-dev/gcc/config/ia64/ia64.opt	(revision 27672)
+++ gcc-local/sel-sched-dev/gcc/config/ia64/ia64.opt	(revision 27673)
@@ -148,6 +148,10 @@ msched-stop-bits-after-every-cycle
 Target Report Var(mflag_sched_stop_bits_after_every_cycle) Init(1)
 Place a stop bit after every cycle when scheduling
 
+msched-fp-mem-deps-zero-cost
+Target Report Var(mflag_sched_fp_mem_deps_zero_cost) Init(1)
+Assume that floating-point stores and loads are not likely to cause conflict when placed into one instruction group
+
 msel-sched-renaming
 Common Report Var(mflag_sel_sched_renaming) Init(1)
 Do register renaming in selective scheduling
--- gcc-local/sel-sched-dev/gcc/config/ia64/ia64.c	(revision 27672)
+++ gcc-local/sel-sched-dev/gcc/config/ia64/ia64.c	(revision 27673)
@@ -218,7 +218,7 @@ static void ia64_output_function_epilogu
 static void ia64_output_function_end_prologue (FILE *);
 
 static int ia64_issue_rate (void);
-static int ia64_adjust_cost_2 (rtx, int, rtx, int);
+static int ia64_adjust_cost_2 (rtx, int, rtx, int, dw_t);
 static void ia64_sched_init (FILE *, int, int);
 static void ia64_sched_init_global (FILE *, int, int);
 static void ia64_sched_finish_global (FILE *, int);
@@ -6293,20 +6293,37 @@ ia64_single_set (rtx insn)
 
 /* Adjust the cost of a scheduling dependency.
    Return the new cost of a dependency of type DEP_TYPE or INSN on DEP_INSN.
-   COST is the current cost.  */
+   COST is the current cost, DW is dependency weakness.  */
 
 static int
-ia64_adjust_cost_2 (rtx insn, int dep_type1, rtx dep_insn, int cost)
+ia64_adjust_cost_2 (rtx insn, int dep_type1, rtx dep_insn, int cost, dw_t dw)
 {
   enum reg_note dep_type = (enum reg_note) dep_type1;
   enum attr_itanium_class dep_class;
   enum attr_itanium_class insn_class;
 
+  insn_class = ia64_safe_itanium_class (insn);
+  dep_class = ia64_safe_itanium_class (dep_insn);
+
+  /* Treat true memory dependencies separately.  */
+  if (dw == MIN_DEP_WEAK)
+    /* Store and load are likely to alias, use higher cost to avoid stall.  */
+    return PARAM_VALUE (PARAM_SCHED_MEM_TRUE_DEP_COST);
+  else if (dw > MIN_DEP_WEAK)
+    {
+      /* Store and load are less likely to alias.  */
+      if (mflag_sched_fp_mem_deps_zero_cost && dep_class == ITANIUM_CLASS_STF)
+	/* Assume there will be no cache conflict for floating-point data.
+	   For integer data, L1 conflict penalty is huge (17 cycles), so we
+	   never assume it will not cause a conflict.  */
+	return 0;
+      else
+	return cost;
+    }
+
   if (dep_type != REG_DEP_OUTPUT)
     return cost;
 
-  insn_class = ia64_safe_itanium_class (insn);
-  dep_class = ia64_safe_itanium_class (dep_insn);
   if (dep_class == ITANIUM_CLASS_ST || dep_class == ITANIUM_CLASS_STF
       || insn_class == ITANIUM_CLASS_ST || insn_class == ITANIUM_CLASS_STF)
     return 0;
@@ -9999,7 +10016,7 @@ ia64_optimization_options (int level ATT
   set_param_value ("simultaneous-prefetches", 6);
   set_param_value ("l1-cache-line-size", 32);
 
-  set_param_value("selsched-mem-true-dep-cost", 4);
+  set_param_value("sched-mem-true-dep-cost", 4);
 }
 
 /* HP-UX version_id attribute.
--- gcc-local/sel-sched-dev/gcc/params.def	(revision 27672)
+++ gcc-local/sel-sched-dev/gcc/params.def	(revision 27673)
@@ -555,9 +555,9 @@ DEFPARAM(PARAM_SELSCHED_MAX_LOOKAHEAD,
 /* Minimal distance (in CPU cycles) between store and load targeting same
    memory locations.  */
 
-DEFPARAM (PARAM_SELSCHED_MEM_TRUE_DEP_COST,
-	  "selsched-mem-true-dep-cost",
-	  "Minimal distance between overlapping store and load",
+DEFPARAM (PARAM_SCHED_MEM_TRUE_DEP_COST,
+	  "sched-mem-true-dep-cost",
+	  "Minimal distance between possibly conflicting store and load",
 	  1, 0, 0)
 
 DEFPARAM(PARAM_ALLOW_START,


* Re: [sel-sched] IA-64 back-end improvements and tweaks
  2007-05-22 11:41 [sel-sched] IA-64 back-end improvements and tweaks Alexander Monakov
@ 2007-05-23  9:25 ` Andrey Belevantsev
From: Andrey Belevantsev @ 2007-05-23  9:25 UTC (permalink / raw)
  To: Alexander Monakov; +Cc: gcc-patches.gcc.gnu.org

Alexander Monakov wrote:
> The first two patches will be committed by Andrey Belevantsev to the
>  sel-sched branch. The other two will be committed later with
> speculation and pipelining of outer loops support.
I have committed the last two patches after the pipelining patch.  Thanks!

Andrey


