public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH 2/3] genautomata: Optimise some more
  2009-12-04 18:46 [PATCH 1/3] Speed up and fix genautomata Segher Boessenkool
@ 2009-12-04 18:46 ` Segher Boessenkool
  2009-12-04 19:05   ` [PATCH 3/3] Speed up genattrtab Segher Boessenkool
                     ` (2 more replies)
       [not found] ` <143525932d64d7c11e6cae645887c14da120c761.1259952302.git.segher@kernel      .crashing.org>
                   ` (2 subsequent siblings)
  3 siblings, 3 replies; 12+ messages in thread
From: Segher Boessenkool @ 2009-12-04 18:46 UTC (permalink / raw)
  To: gcc-patches; +Cc: Segher Boessenkool

Simplify check_presence_pattern_sets(), check_absence_pattern_sets(),
get_excl_set().

2009-12-04  Segher Boessenkool  <segher@kernel.crashing.org>
	genautomata.c (get_excl_set): Do work per element, not per char.
	(check_presence_pattern_sets): Similar.
	(check_absence_pattern_sets): Similar.
---
 gcc/genautomata.c |   50 ++++++++++++++++++++++----------------------------
 1 files changed, 22 insertions(+), 28 deletions(-)

diff --git a/gcc/genautomata.c b/gcc/genautomata.c
index 9cbe0f7..1f24499 100644
--- a/gcc/genautomata.c
+++ b/gcc/genautomata.c
@@ -4174,20 +4174,18 @@ initiate_excl_sets (void)
 static reserv_sets_t
 get_excl_set (reserv_sets_t in_set)
 {
-  int excl_char_num;
-  int chars_num;
-  int i;
+  int el;
+  unsigned int i;
   int start_unit_num;
   int unit_num;
 
-  chars_num = els_in_cycle_reserv * sizeof (set_el_t);
-  memset (excl_set, 0, chars_num);
-  for (excl_char_num = 0; excl_char_num < chars_num; excl_char_num++)
-    if (((unsigned char *) in_set) [excl_char_num])
-      for (i = CHAR_BIT - 1; i >= 0; i--)
-	if ((((unsigned char *) in_set) [excl_char_num] >> i) & 1)
+  memset (excl_set, 0, els_in_cycle_reserv * sizeof (set_el_t));
+  for (el = 0; el < els_in_cycle_reserv; el++)
+    if (in_set[el])
+      for (i = 0; i < CHAR_BIT * sizeof (set_el_t); i++)
+	if ((in_set[el] >> i) & 1)
 	  {
-	    start_unit_num = excl_char_num * CHAR_BIT + i;
+	    start_unit_num = el * CHAR_BIT * sizeof (set_el_t) + i;
 	    if (start_unit_num >= description->units_num)
 	      return excl_set;
 	    for (unit_num = 0; unit_num < els_in_cycle_reserv; unit_num++)
@@ -4286,21 +4284,19 @@ check_presence_pattern_sets (reserv_sets_t checked_set,
 			     reserv_sets_t original_set,
 			     int final_p)
 {
-  int char_num;
-  int chars_num;
-  int i;
+  int el;
+  unsigned int i;
   int start_unit_num;
   int unit_num;
   int presence_p;
   pattern_reserv_t pat_reserv;
 
-  chars_num = els_in_cycle_reserv * sizeof (set_el_t);
-  for (char_num = 0; char_num < chars_num; char_num++)
-    if (((unsigned char *) original_set) [char_num])
-      for (i = CHAR_BIT - 1; i >= 0; i--)
-	if ((((unsigned char *) original_set) [char_num] >> i) & 1)
+  for (el = 0; el < els_in_cycle_reserv; el++)
+    if (original_set[el])
+      for (i = 0; i < CHAR_BIT * sizeof (set_el_t); i++)
+	if ((original_set[el] >> i) & 1)
 	  {
-	    start_unit_num = char_num * CHAR_BIT + i;
+	    start_unit_num = el * CHAR_BIT * sizeof (set_el_t) + i;
 	    if (start_unit_num >= description->units_num)
 	      break;
 	    if ((final_p
@@ -4335,20 +4331,18 @@ check_absence_pattern_sets (reserv_sets_t checked_set,
 			    reserv_sets_t original_set,
 			    int final_p)
 {
-  int char_num;
-  int chars_num;
-  int i;
+  int el;
+  unsigned int i;
   int start_unit_num;
   int unit_num;
   pattern_reserv_t pat_reserv;
 
-  chars_num = els_in_cycle_reserv * sizeof (set_el_t);
-  for (char_num = 0; char_num < chars_num; char_num++)
-    if (((unsigned char *) original_set) [char_num])
-      for (i = CHAR_BIT - 1; i >= 0; i--)
-	if ((((unsigned char *) original_set) [char_num] >> i) & 1)
+  for (el = 0; el < els_in_cycle_reserv; el++)
+    if (original_set[el])
+      for (i = 0; i < CHAR_BIT * sizeof (set_el_t); i++)
+	if ((original_set[el] >> i) & 1)
 	  {
-	    start_unit_num = char_num * CHAR_BIT + i;
+	    start_unit_num = el * CHAR_BIT * sizeof (set_el_t) + i;
 	    if (start_unit_num >= description->units_num)
 	      break;
 	    for (pat_reserv = (final_p
-- 
1.6.5.2.154.g549c

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 1/3] Speed up and fix genautomata
@ 2009-12-04 18:46 Segher Boessenkool
  2009-12-04 18:46 ` [PATCH 2/3] genautomata: Optimise some more Segher Boessenkool
                   ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Segher Boessenkool @ 2009-12-04 18:46 UTC (permalink / raw)
  To: gcc-patches; +Cc: Segher Boessenkool

The algoritm used to find min_issue_delay_vect is incorrect; it
will not always consider paths that take more steps to arrive at
the same node.

Instead, use a simple standard breadth-first algorithm, that is a
whole lot faster as well, especially on big complex automata.

2009-12-04  Segher Boessenkool  <segher@kernel.crashing.org>

	* genautomata.c (curr_state_pass_num): Delete.
	(min_issue_delay_pass_states): Delete.
	(min_issue_delay): Delete.
	(initiate_min_issue_delay_pass_states): Delete.
	(output_min_issue_delay_table): Compute min_issue_delay_vect
	using a breadth-first search variant.
	(output_tables): Don't call initiate_min_issue_delay_pass_states.
---
 gcc/genautomata.c |  156 +++++++++++++++++++++++++----------------------------
 1 files changed, 74 insertions(+), 82 deletions(-)

diff --git a/gcc/genautomata.c b/gcc/genautomata.c
index f332141..9cbe0f7 100644
--- a/gcc/genautomata.c
+++ b/gcc/genautomata.c
@@ -7488,72 +7488,6 @@ output_trans_table (automaton_t automaton)
   VEC_free (vect_el_t, heap, transition_vect);
 }
 
-/* The current number of passing states to find minimal issue delay
-   value for an ainsn and state.  */
-static int curr_state_pass_num;
-
-/* This recursive function passes states to find minimal issue delay
-   value for AINSN.  The state being visited is STATE.  The function
-   returns minimal issue delay value for AINSN in STATE or -1 if we
-   enter into a loop.  */
-static int
-min_issue_delay_pass_states (state_t state, ainsn_t ainsn)
-{
-  arc_t arc;
-  int min_insn_issue_delay, insn_issue_delay;
-
-  if (state->state_pass_num == curr_state_pass_num
-      || state->min_insn_issue_delay != -1)
-    /* We've entered into a loop or already have the correct value for
-       given state and ainsn.  */
-    return state->min_insn_issue_delay;
-  state->state_pass_num = curr_state_pass_num;
-  min_insn_issue_delay = -1;
-  for (arc = first_out_arc (state); arc != NULL; arc = next_out_arc (arc))
-    if (arc->insn == ainsn)
-      {
-	min_insn_issue_delay = 0;
-	break;
-      }
-    else
-      {
-        insn_issue_delay = min_issue_delay_pass_states (arc->to_state, ainsn);
-	if (insn_issue_delay != -1)
-	  {
-	    if (arc->insn->insn_reserv_decl
-		== DECL_INSN_RESERV (advance_cycle_insn_decl))
-	      insn_issue_delay++;
-	    if (min_insn_issue_delay == -1
-		|| min_insn_issue_delay > insn_issue_delay)
-	      {
-		min_insn_issue_delay = insn_issue_delay;
-		if (insn_issue_delay == 0)
-		  break;
-	      }
-	  }
-      }
-  return min_insn_issue_delay;
-}
-
-/* The function searches minimal issue delay value for AINSN in STATE.
-   The function can return negative value if we can not issue AINSN.  We
-   will report about it later.  */
-static int
-min_issue_delay (state_t state, ainsn_t ainsn)
-{
-  curr_state_pass_num++;
-  state->min_insn_issue_delay = min_issue_delay_pass_states (state, ainsn);
-  return state->min_insn_issue_delay;
-}
-
-/* The function initiates code for finding minimal issue delay values.
-   It should be called only once.  */
-static void
-initiate_min_issue_delay_pass_states (void)
-{
-  curr_state_pass_num = 0;
-}
-
 /* Form and output vectors representing minimal issue delay table of
    AUTOMATON.  The table is state x ainsn -> minimal issue delay of
    the ainsn.  */
@@ -7562,11 +7496,11 @@ output_min_issue_delay_table (automaton_t automaton)
 {
   vla_hwint_t min_issue_delay_vect;
   vla_hwint_t compressed_min_issue_delay_vect;
-  vect_el_t min_delay;
   ainsn_t ainsn;
-  size_t i, min_issue_delay_len;
-  size_t compressed_min_issue_delay_len;
+  size_t i;
+  size_t min_issue_delay_len, compressed_min_issue_delay_len;
   size_t cfactor;
+  int changed;
 
   /* Create vect of pointers to states ordered by num of transitions
      from the state (state with the maximum num is the first).  */
@@ -7577,27 +7511,86 @@ output_min_issue_delay_table (automaton_t automaton)
 			 * automaton->insn_equiv_classes_num);
   min_issue_delay_vect = VEC_alloc (vect_el_t, heap, min_issue_delay_len);
   for (i = 0; i < min_issue_delay_len; i++)
-    VEC_quick_push (vect_el_t, min_issue_delay_vect, 0);
+    VEC_quick_push (vect_el_t, min_issue_delay_vect, -1);
 
   automaton->max_min_delay = 0;
-  for (ainsn = automaton->ainsn_list; ainsn != NULL; ainsn = ainsn->next_ainsn)
+
+  do
+    {
+      size_t state_no;
+
+      changed = 0;
+
+      for (state_no = 0; state_no < VEC_length (state_t, output_states_vect);
+           state_no++)
+	{
+	  state_t s = VEC_index (state_t, output_states_vect, state_no);
+	  arc_t arc;
+
+	  for (arc = first_out_arc (s); arc; arc = next_out_arc (arc))
+	    {
+	      int k;
+
+	      size_t asn = s->order_state_num
+	                   * automaton->insn_equiv_classes_num
+	                   + arc->insn->insn_equiv_class_num;
+
+	      if (VEC_index (vect_el_t, min_issue_delay_vect, asn))
+		{
+		  VEC_replace (vect_el_t, min_issue_delay_vect, asn, 0);
+		  changed = 1;
+		}
+
+	      for (k = 0; k < automaton->insn_equiv_classes_num; k++)
+		{
+		  size_t n0, n1;
+		  vect_el_t delay0, delay1;
+
+		  n0 = s->order_state_num
+		       * automaton->insn_equiv_classes_num
+		       + k;
+		  n1 = arc->to_state->order_state_num
+		       * automaton->insn_equiv_classes_num
+		       + k;
+		  delay0 = VEC_index (vect_el_t, min_issue_delay_vect, n0);
+		  delay1 = VEC_index (vect_el_t, min_issue_delay_vect, n1);
+		  if (delay1 != -1)
+		    {
+		      if (arc->insn->insn_reserv_decl
+		          == DECL_INSN_RESERV (advance_cycle_insn_decl))
+			delay1++;
+		      if (delay1 < delay0 || delay0 == -1)
+			{
+			  VEC_replace (vect_el_t, min_issue_delay_vect, n0, delay1);
+			  changed = 1;
+			}
+		    }
+		}
+	    }
+	}
+    }
+  while (changed);
+
+  automaton->max_min_delay = 0;
+
+  for (ainsn = automaton->ainsn_list; ainsn; ainsn = ainsn->next_ainsn)
     if (ainsn->first_ainsn_with_given_equivalence_num)
       {
 	for (i = 0; i < VEC_length (state_t, output_states_vect); i++)
-	  VEC_index (state_t, output_states_vect, i)->min_insn_issue_delay = -1;
-	for (i = 0; i < VEC_length (state_t, output_states_vect); i++)
 	  {
 	    state_t s = VEC_index (state_t, output_states_vect, i);
-            min_delay = min_issue_delay (s, ainsn);
-	    if (automaton->max_min_delay < min_delay)
-	      automaton->max_min_delay = min_delay;
-	    VEC_replace (vect_el_t, min_issue_delay_vect,
-			 s->order_state_num
-			 * automaton->insn_equiv_classes_num
-			 + ainsn->insn_equiv_class_num,
-			 min_delay);
+	    size_t np = s->order_state_num
+	                * automaton->insn_equiv_classes_num
+	                + ainsn->insn_equiv_class_num;
+	    vect_el_t x = VEC_index (vect_el_t, min_issue_delay_vect, np);
+
+	    if (automaton->max_min_delay < x)
+	      automaton->max_min_delay = x;
+	    if (x == -1)
+	      VEC_replace (vect_el_t, min_issue_delay_vect, np, 0);
 	  }
       }
+
   fprintf (output_file, "/* Vector of min issue delay of insns.  */\n");
   fprintf (output_file, "static const ");
   output_range_type (output_file, 0, automaton->max_min_delay);
@@ -7749,7 +7742,6 @@ output_tables (void)
 {
   automaton_t automaton;
 
-  initiate_min_issue_delay_pass_states ();
   for (automaton = description->first_automaton;
        automaton != NULL;
        automaton = automaton->next_automaton)
-- 
1.6.5.2.154.g549c

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 3/3] Speed up genattrtab
  2009-12-04 18:46 ` [PATCH 2/3] genautomata: Optimise some more Segher Boessenkool
@ 2009-12-04 19:05   ` Segher Boessenkool
  2009-12-06 22:07     ` Michael Matz
  2010-06-01 16:19   ` [PATCH] [REPOST] genautomata: Optimise some more Segher Boessenkool
  2010-06-02 22:05   ` Vladimir Makarov
  2 siblings, 1 reply; 12+ messages in thread
From: Segher Boessenkool @ 2009-12-04 19:05 UTC (permalink / raw)
  To: gcc-patches; +Cc: Segher Boessenkool

write_attr_set: Don't update known_true based on previous cases.

Updating it is very costly.  Not updating it does not result in many
changes in the generated code.  Also, GCC should be able to do (most of)
this optimisation by itself, so doing it in the generator program
doesn't even help anything.

2009-12-04  Segher Boessenkool  <segher@kernel.crashing.org>

	* genattrtab.c (write_attr_set): Don't update known_true.
---
 gcc/genattrtab.c |   14 ++++----------
 1 files changed, 4 insertions(+), 10 deletions(-)

diff --git a/gcc/genattrtab.c b/gcc/genattrtab.c
index 1e5198a..5221e95 100644
--- a/gcc/genattrtab.c
+++ b/gcc/genattrtab.c
@@ -3725,8 +3725,6 @@ write_attr_set (struct attr_desc *attr, int indent, rtx value,
       /* Assume the default value will be the default of the COND unless we
 	 find an always true expression.  */
       rtx default_val = XEXP (value, 1);
-      rtx our_known_true = known_true;
-      rtx newexp;
       int first_if = 1;
       int i;
 
@@ -3735,17 +3733,14 @@ write_attr_set (struct attr_desc *attr, int indent, rtx value,
 	  rtx testexp;
 	  rtx inner_true;
 
-	  testexp = eliminate_known_true (our_known_true,
+	  testexp = eliminate_known_true (known_true,
 					  XVECEXP (value, 0, i),
 					  insn_code, insn_index);
-	  newexp = attr_rtx (NOT, testexp);
-	  newexp = insert_right_side (AND, our_known_true, newexp,
-				      insn_code, insn_index);
 
 	  /* If the test expression is always true or if the next `known_true'
 	     expression is always false, this is the last case, so break
 	     out and let this value be the `else' case.  */
-	  if (testexp == true_rtx || newexp == false_rtx)
+	  if (testexp == true_rtx)
 	    {
 	      default_val = XVECEXP (value, 0, i + 1);
 	      break;
@@ -3753,7 +3748,7 @@ write_attr_set (struct attr_desc *attr, int indent, rtx value,
 
 	  /* Compute the expression to pass to our recursive call as being
 	     known true.  */
-	  inner_true = insert_right_side (AND, our_known_true,
+	  inner_true = insert_right_side (AND, known_true,
 					  testexp, insn_code, insn_index);
 
 	  /* If this is always false, skip it.  */
@@ -3773,7 +3768,6 @@ write_attr_set (struct attr_desc *attr, int indent, rtx value,
 			  inner_true, insn_code, insn_index);
 	  write_indent (indent + 2);
 	  printf ("}\n");
-	  our_known_true = newexp;
 	}
 
       if (! first_if)
@@ -3785,7 +3779,7 @@ write_attr_set (struct attr_desc *attr, int indent, rtx value,
 	}
 
       write_attr_set (attr, first_if ? indent : indent + 4, default_val,
-		      prefix, suffix, our_known_true, insn_code, insn_index);
+		      prefix, suffix, known_true, insn_code, insn_index);
 
       if (! first_if)
 	{
-- 
1.6.5.2.154.g549c

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/3] Speed up and fix genautomata
       [not found] ` <143525932d64d7c11e6cae645887c14da120c761.1259952302.git.segher@kernel      .crashing.org>
@ 2009-12-04 20:41   ` Segher Boessenkool
  0 siblings, 0 replies; 12+ messages in thread
From: Segher Boessenkool @ 2009-12-04 20:41 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: gcc-patches, Segher Boessenkool

I totally forgot to mention how these patches were tested.

I ran a C-only bootstrap and testsuite test on x86_64-linux;
no regressions.

I also build a cross-compiler for all 27 targets supported
by the Linux kernel (32-bit and 64-bit separate, where
applicable), and build the kernel with that; no new failures.

The runtimes of genattrtab and genautomata go from more than
a minute for some targets, to a few seconds.


Segher

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 3/3] Speed up genattrtab
  2009-12-04 19:05   ` [PATCH 3/3] Speed up genattrtab Segher Boessenkool
@ 2009-12-06 22:07     ` Michael Matz
  2009-12-07 19:53       ` Segher Boessenkool
  0 siblings, 1 reply; 12+ messages in thread
From: Michael Matz @ 2009-12-06 22:07 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: gcc-patches

Hi,

On Fri, 4 Dec 2009, Segher Boessenkool wrote:

> write_attr_set: Don't update known_true based on previous cases.
> 
> Updating it is very costly.  Not updating it does not result in many
> changes in the generated code.

It does for some targets, and worse ...

> Also, GCC should be able to do (most of) this optimisation by itself,

... this isn't the case very often.  You'd better check compile times for 
some large routines on several architectures to be sure not to introduce 
compile time regressions for improving bootstrap times.


Ciao,
Michael.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 3/3] Speed up genattrtab
  2009-12-06 22:07     ` Michael Matz
@ 2009-12-07 19:53       ` Segher Boessenkool
  0 siblings, 0 replies; 12+ messages in thread
From: Segher Boessenkool @ 2009-12-07 19:53 UTC (permalink / raw)
  To: Michael Matz; +Cc: Segher Boessenkool, gcc-patches

>> write_attr_set: Don't update known_true based on previous cases.
>>
>> Updating it is very costly.  Not updating it does not result in many
>> changes in the generated code.
>
> It does for some targets, and worse ...
>
>> Also, GCC should be able to do (most of) this optimisation by itself,
>
> ... this isn't the case very often.  You'd better check compile times for
> some large routines on several architectures to be sure not to introduce
> compile time regressions for improving bootstrap times.

Sure, creating any real regressions in the compiler itself
wouldn't be a good thing.  I'm not _that_ stupid. :-)

Here are statistics for insn-attrtab.[co] (cross from
an AMD64 Linux system):

         |  insn-attrtab.c |    insn-attrtab.o
         |         # lines |     size of .text
         |    old      new |     old       new
----------------------------------------------
arm      | 138102 | 159799 |  477648 |  480632
mips     |  83843 |  85221 |  399161 |  400073
powerpc  |  80368 |  80825 |  360321 |  360553
sparc    |  18519 |  18975 |  129081 |  129129
i386     | 158417 | 164172 | 1154722 | 1171404
----------------------------------------------
parisc   |  13068 |  13180 |   97967 |   98575
ia64     |  15078 |  15134 |  100996 |  101028
s390     |  23601 |  24016 |  124635 |  124859
----------------------------------------------
alpha    |   7147 |   7147 |   44934 |   44934
avr32    |   2657 |   2657 |   25265 |   25265
blackfin |   2813 |   2849 |   30652 |   30820
cris     |   2529 |   2529 |   19124 |   19124
frv      |  14060 |  14080 |   68493 |   68595
h8300    |   2180 |   2222 |   21151 |   21191
m32r     |   2464 |   2532 |   20004 |   20052
m68k     |  89072 |  89158 |  559464 |  559877
mn10300  |    416 |    416 |    2162 |    2162
score    |   1252 |   1252 |   19296 |   19296
sh       |  14393 |  14777 |  126866 |  127122
xtensa   |    757 |    797 |   10121 |   10217

arm has a pretty big source code increase, but it's
all silly stuff like   if ((64) == (64))   and it
doesn't actually increase the binary size much.  i386
has the biggest actual size increase unfortunately,
but it's only 1.5% as well.

I'll run some timings of the resulting compilers; do
you have some prefered source files to test on?


Segher

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH] [REPOST] Speed up and fix genautomata
  2009-12-04 18:46 [PATCH 1/3] Speed up and fix genautomata Segher Boessenkool
  2009-12-04 18:46 ` [PATCH 2/3] genautomata: Optimise some more Segher Boessenkool
       [not found] ` <143525932d64d7c11e6cae645887c14da120c761.1259952302.git.segher@kernel      .crashing.org>
@ 2010-06-01 16:16 ` Segher Boessenkool
  2010-06-03  3:02 ` [PATCH 1/3] " Vladimir N. Makarov
  3 siblings, 0 replies; 12+ messages in thread
From: Segher Boessenkool @ 2010-06-01 16:16 UTC (permalink / raw)
  To: gcc-patches; +Cc: Segher Boessenkool

The algoritm used to find min_issue_delay_vect is incorrect; it
will not always consider paths that take more steps to arrive at
the same node.

Instead, use a simple standard breadth-first algorithm, that is a
whole lot faster as well, especially on big complex automata.

2009-12-04  Segher Boessenkool  <segher@kernel.crashing.org>

	* genautomata.c (curr_state_pass_num): Delete.
	(min_issue_delay_pass_states): Delete.
	(min_issue_delay): Delete.
	(initiate_min_issue_delay_pass_states): Delete.
	(output_min_issue_delay_table): Compute min_issue_delay_vect
	using a breadth-first search variant.
	(output_tables): Don't call initiate_min_issue_delay_pass_states.
---
 gcc/genautomata.c |  156 +++++++++++++++++++++++++----------------------------
 1 files changed, 74 insertions(+), 82 deletions(-)

diff --git a/gcc/genautomata.c b/gcc/genautomata.c
index f332141..9cbe0f7 100644
--- a/gcc/genautomata.c
+++ b/gcc/genautomata.c
@@ -7488,72 +7488,6 @@ output_trans_table (automaton_t automaton)
   VEC_free (vect_el_t, heap, transition_vect);
 }
 
-/* The current number of passing states to find minimal issue delay
-   value for an ainsn and state.  */
-static int curr_state_pass_num;
-
-/* This recursive function passes states to find minimal issue delay
-   value for AINSN.  The state being visited is STATE.  The function
-   returns minimal issue delay value for AINSN in STATE or -1 if we
-   enter into a loop.  */
-static int
-min_issue_delay_pass_states (state_t state, ainsn_t ainsn)
-{
-  arc_t arc;
-  int min_insn_issue_delay, insn_issue_delay;
-
-  if (state->state_pass_num == curr_state_pass_num
-      || state->min_insn_issue_delay != -1)
-    /* We've entered into a loop or already have the correct value for
-       given state and ainsn.  */
-    return state->min_insn_issue_delay;
-  state->state_pass_num = curr_state_pass_num;
-  min_insn_issue_delay = -1;
-  for (arc = first_out_arc (state); arc != NULL; arc = next_out_arc (arc))
-    if (arc->insn == ainsn)
-      {
-	min_insn_issue_delay = 0;
-	break;
-      }
-    else
-      {
-        insn_issue_delay = min_issue_delay_pass_states (arc->to_state, ainsn);
-	if (insn_issue_delay != -1)
-	  {
-	    if (arc->insn->insn_reserv_decl
-		== DECL_INSN_RESERV (advance_cycle_insn_decl))
-	      insn_issue_delay++;
-	    if (min_insn_issue_delay == -1
-		|| min_insn_issue_delay > insn_issue_delay)
-	      {
-		min_insn_issue_delay = insn_issue_delay;
-		if (insn_issue_delay == 0)
-		  break;
-	      }
-	  }
-      }
-  return min_insn_issue_delay;
-}
-
-/* The function searches minimal issue delay value for AINSN in STATE.
-   The function can return negative value if we can not issue AINSN.  We
-   will report about it later.  */
-static int
-min_issue_delay (state_t state, ainsn_t ainsn)
-{
-  curr_state_pass_num++;
-  state->min_insn_issue_delay = min_issue_delay_pass_states (state, ainsn);
-  return state->min_insn_issue_delay;
-}
-
-/* The function initiates code for finding minimal issue delay values.
-   It should be called only once.  */
-static void
-initiate_min_issue_delay_pass_states (void)
-{
-  curr_state_pass_num = 0;
-}
-
 /* Form and output vectors representing minimal issue delay table of
    AUTOMATON.  The table is state x ainsn -> minimal issue delay of
    the ainsn.  */
@@ -7562,11 +7496,11 @@ output_min_issue_delay_table (automaton_t automaton)
 {
   vla_hwint_t min_issue_delay_vect;
   vla_hwint_t compressed_min_issue_delay_vect;
-  vect_el_t min_delay;
   ainsn_t ainsn;
-  size_t i, min_issue_delay_len;
-  size_t compressed_min_issue_delay_len;
+  size_t i;
+  size_t min_issue_delay_len, compressed_min_issue_delay_len;
   size_t cfactor;
+  int changed;
 
   /* Create vect of pointers to states ordered by num of transitions
      from the state (state with the maximum num is the first).  */
@@ -7577,27 +7511,86 @@ output_min_issue_delay_table (automaton_t automaton)
 			 * automaton->insn_equiv_classes_num);
   min_issue_delay_vect = VEC_alloc (vect_el_t, heap, min_issue_delay_len);
   for (i = 0; i < min_issue_delay_len; i++)
-    VEC_quick_push (vect_el_t, min_issue_delay_vect, 0);
+    VEC_quick_push (vect_el_t, min_issue_delay_vect, -1);
 
   automaton->max_min_delay = 0;
-  for (ainsn = automaton->ainsn_list; ainsn != NULL; ainsn = ainsn->next_ainsn)
+
+  do
+    {
+      size_t state_no;
+
+      changed = 0;
+
+      for (state_no = 0; state_no < VEC_length (state_t, output_states_vect);
+           state_no++)
+	{
+	  state_t s = VEC_index (state_t, output_states_vect, state_no);
+	  arc_t arc;
+
+	  for (arc = first_out_arc (s); arc; arc = next_out_arc (arc))
+	    {
+	      int k;
+
+	      size_t asn = s->order_state_num
+	                   * automaton->insn_equiv_classes_num
+	                   + arc->insn->insn_equiv_class_num;
+
+	      if (VEC_index (vect_el_t, min_issue_delay_vect, asn))
+		{
+		  VEC_replace (vect_el_t, min_issue_delay_vect, asn, 0);
+		  changed = 1;
+		}
+
+	      for (k = 0; k < automaton->insn_equiv_classes_num; k++)
+		{
+		  size_t n0, n1;
+		  vect_el_t delay0, delay1;
+
+		  n0 = s->order_state_num
+		       * automaton->insn_equiv_classes_num
+		       + k;
+		  n1 = arc->to_state->order_state_num
+		       * automaton->insn_equiv_classes_num
+		       + k;
+		  delay0 = VEC_index (vect_el_t, min_issue_delay_vect, n0);
+		  delay1 = VEC_index (vect_el_t, min_issue_delay_vect, n1);
+		  if (delay1 != -1)
+		    {
+		      if (arc->insn->insn_reserv_decl
+		          == DECL_INSN_RESERV (advance_cycle_insn_decl))
+			delay1++;
+		      if (delay1 < delay0 || delay0 == -1)
+			{
+			  VEC_replace (vect_el_t, min_issue_delay_vect, n0, delay1);
+			  changed = 1;
+			}
+		    }
+		}
+	    }
+	}
+    }
+  while (changed);
+
+  automaton->max_min_delay = 0;
+
+  for (ainsn = automaton->ainsn_list; ainsn; ainsn = ainsn->next_ainsn)
     if (ainsn->first_ainsn_with_given_equivalence_num)
       {
 	for (i = 0; i < VEC_length (state_t, output_states_vect); i++)
-	  VEC_index (state_t, output_states_vect, i)->min_insn_issue_delay = -1;
-	for (i = 0; i < VEC_length (state_t, output_states_vect); i++)
 	  {
 	    state_t s = VEC_index (state_t, output_states_vect, i);
-            min_delay = min_issue_delay (s, ainsn);
-	    if (automaton->max_min_delay < min_delay)
-	      automaton->max_min_delay = min_delay;
-	    VEC_replace (vect_el_t, min_issue_delay_vect,
-			 s->order_state_num
-			 * automaton->insn_equiv_classes_num
-			 + ainsn->insn_equiv_class_num,
-			 min_delay);
+	    size_t np = s->order_state_num
+	                * automaton->insn_equiv_classes_num
+	                + ainsn->insn_equiv_class_num;
+	    vect_el_t x = VEC_index (vect_el_t, min_issue_delay_vect, np);
+
+	    if (automaton->max_min_delay < x)
+	      automaton->max_min_delay = x;
+	    if (x == -1)
+	      VEC_replace (vect_el_t, min_issue_delay_vect, np, 0);
 	  }
       }
+
   fprintf (output_file, "/* Vector of min issue delay of insns.  */\n");
   fprintf (output_file, "static const ");
   output_range_type (output_file, 0, automaton->max_min_delay);
@@ -7749,7 +7742,6 @@ output_tables (void)
 {
   automaton_t automaton;
 
-  initiate_min_issue_delay_pass_states ();
   for (automaton = description->first_automaton;
        automaton != NULL;
        automaton = automaton->next_automaton)
-- 
1.6.5.2.154.g549c

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH] [REPOST] genautomata: Optimise some more
  2009-12-04 18:46 ` [PATCH 2/3] genautomata: Optimise some more Segher Boessenkool
  2009-12-04 19:05   ` [PATCH 3/3] Speed up genattrtab Segher Boessenkool
@ 2010-06-01 16:19   ` Segher Boessenkool
  2010-06-02 22:05   ` Vladimir Makarov
  2 siblings, 0 replies; 12+ messages in thread
From: Segher Boessenkool @ 2010-06-01 16:19 UTC (permalink / raw)
  To: gcc-patches; +Cc: Segher Boessenkool

Simplify check_presence_pattern_sets(), check_absence_pattern_sets(),
get_excl_set().

2009-12-04  Segher Boessenkool  <segher@kernel.crashing.org>
	genautomata.c (get_excl_set): Do work per element, not per char.
	(check_presence_pattern_sets): Similar.
	(check_absence_pattern_sets): Similar.
---
 gcc/genautomata.c |   50 ++++++++++++++++++++++----------------------------
 1 files changed, 22 insertions(+), 28 deletions(-)

diff --git a/gcc/genautomata.c b/gcc/genautomata.c
index 9cbe0f7..1f24499 100644
--- a/gcc/genautomata.c
+++ b/gcc/genautomata.c
@@ -4174,20 +4174,18 @@ initiate_excl_sets (void)
 static reserv_sets_t
 get_excl_set (reserv_sets_t in_set)
 {
-  int excl_char_num;
-  int chars_num;
-  int i;
+  int el;
+  unsigned int i;
   int start_unit_num;
   int unit_num;
 
-  chars_num = els_in_cycle_reserv * sizeof (set_el_t);
-  memset (excl_set, 0, chars_num);
-  for (excl_char_num = 0; excl_char_num < chars_num; excl_char_num++)
-    if (((unsigned char *) in_set) [excl_char_num])
-      for (i = CHAR_BIT - 1; i >= 0; i--)
-	if ((((unsigned char *) in_set) [excl_char_num] >> i) & 1)
+  memset (excl_set, 0, els_in_cycle_reserv * sizeof (set_el_t));
+  for (el = 0; el < els_in_cycle_reserv; el++)
+    if (in_set[el])
+      for (i = 0; i < CHAR_BIT * sizeof (set_el_t); i++)
+	if ((in_set[el] >> i) & 1)
 	  {
-	    start_unit_num = excl_char_num * CHAR_BIT + i;
+	    start_unit_num = el * CHAR_BIT * sizeof (set_el_t) + i;
 	    if (start_unit_num >= description->units_num)
 	      return excl_set;
 	    for (unit_num = 0; unit_num < els_in_cycle_reserv; unit_num++)
@@ -4286,21 +4284,19 @@ check_presence_pattern_sets (reserv_sets_t checked_set,
 			     reserv_sets_t original_set,
 			     int final_p)
 {
-  int char_num;
-  int chars_num;
-  int i;
+  int el;
+  unsigned int i;
   int start_unit_num;
   int unit_num;
   int presence_p;
   pattern_reserv_t pat_reserv;
 
-  chars_num = els_in_cycle_reserv * sizeof (set_el_t);
-  for (char_num = 0; char_num < chars_num; char_num++)
-    if (((unsigned char *) original_set) [char_num])
-      for (i = CHAR_BIT - 1; i >= 0; i--)
-	if ((((unsigned char *) original_set) [char_num] >> i) & 1)
+  for (el = 0; el < els_in_cycle_reserv; el++)
+    if (original_set[el])
+      for (i = 0; i < CHAR_BIT * sizeof (set_el_t); i++)
+	if ((original_set[el] >> i) & 1)
 	  {
-	    start_unit_num = char_num * CHAR_BIT + i;
+	    start_unit_num = el * CHAR_BIT * sizeof (set_el_t) + i;
 	    if (start_unit_num >= description->units_num)
 	      break;
 	    if ((final_p
@@ -4335,20 +4331,18 @@ check_absence_pattern_sets (reserv_sets_t checked_set,
 			    reserv_sets_t original_set,
 			    int final_p)
 {
-  int char_num;
-  int chars_num;
-  int i;
+  int el;
+  unsigned int i;
   int start_unit_num;
   int unit_num;
   pattern_reserv_t pat_reserv;
 
-  chars_num = els_in_cycle_reserv * sizeof (set_el_t);
-  for (char_num = 0; char_num < chars_num; char_num++)
-    if (((unsigned char *) original_set) [char_num])
-      for (i = CHAR_BIT - 1; i >= 0; i--)
-	if ((((unsigned char *) original_set) [char_num] >> i) & 1)
+  for (el = 0; el < els_in_cycle_reserv; el++)
+    if (original_set[el])
+      for (i = 0; i < CHAR_BIT * sizeof (set_el_t); i++)
+	if ((original_set[el] >> i) & 1)
 	  {
-	    start_unit_num = char_num * CHAR_BIT + i;
+	    start_unit_num = el * CHAR_BIT * sizeof (set_el_t) + i;
 	    if (start_unit_num >= description->units_num)
 	      break;
 	    for (pat_reserv = (final_p
-- 
1.6.5.2.154.g549c

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] [REPOST] genautomata: Optimise some more
  2009-12-04 18:46 ` [PATCH 2/3] genautomata: Optimise some more Segher Boessenkool
  2009-12-04 19:05   ` [PATCH 3/3] Speed up genattrtab Segher Boessenkool
  2010-06-01 16:19   ` [PATCH] [REPOST] genautomata: Optimise some more Segher Boessenkool
@ 2010-06-02 22:05   ` Vladimir Makarov
  2010-06-07 19:28     ` NightStrike
  2 siblings, 1 reply; 12+ messages in thread
From: Vladimir Makarov @ 2010-06-02 22:05 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: gcc-patches

Segher Boessenkool wrote:
> Simplify check_presence_pattern_sets(), check_absence_pattern_sets(),
> get_excl_set().
>
> 2009-12-04  Segher Boessenkool  <segher@kernel.crashing.org>
> 	genautomata.c (get_excl_set): Do work per element, not per char.
> 	(check_presence_pattern_sets): Similar.
> 	(check_absence_pattern_sets): Similar.
>   
Sorry, for the delay with the review.

The patch is ok to commit.   Thanks for the patch.

I'll send a reveiw for the 2nd patch (min_issue_delay) tomorrow.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/3] Speed up and fix genautomata
  2009-12-04 18:46 [PATCH 1/3] Speed up and fix genautomata Segher Boessenkool
                   ` (2 preceding siblings ...)
  2010-06-01 16:16 ` [PATCH] [REPOST] " Segher Boessenkool
@ 2010-06-03  3:02 ` Vladimir N. Makarov
  3 siblings, 0 replies; 12+ messages in thread
From: Vladimir N. Makarov @ 2010-06-03  3:02 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: gcc-patches

On 12/04/2009 01:45 PM, Segher Boessenkool wrote:
> The algoritm used to find min_issue_delay_vect is incorrect; it
> will not always consider paths that take more steps to arrive at
> the same node.
>
> Instead, use a simple standard breadth-first algorithm, that is a
> whole lot faster as well, especially on big complex automata.
>
> 2009-12-04  Segher Boessenkool<segher@kernel.crashing.org>
>
> 	* genautomata.c (curr_state_pass_num): Delete.
> 	(min_issue_delay_pass_states): Delete.
> 	(min_issue_delay): Delete.
> 	(initiate_min_issue_delay_pass_states): Delete.
> 	(output_min_issue_delay_table): Compute min_issue_delay_vect
> 	using a breadth-first search variant.
>    
The patch is ok for the trunk.  Thanks for the patch, Segher.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] [REPOST] genautomata: Optimise some more
  2010-06-02 22:05   ` Vladimir Makarov
@ 2010-06-07 19:28     ` NightStrike
  2010-06-08  3:08       ` Segher Boessenkool
  0 siblings, 1 reply; 12+ messages in thread
From: NightStrike @ 2010-06-07 19:28 UTC (permalink / raw)
  To: Vladimir Makarov; +Cc: Segher Boessenkool, gcc-patches

On Wed, Jun 2, 2010 at 6:06 PM, Vladimir Makarov <vmakarov@redhat.com> wrote:
> Segher Boessenkool wrote:
>>
>> Simplify check_presence_pattern_sets(), check_absence_pattern_sets(),
>> get_excl_set().
>>
>> 2009-12-04  Segher Boessenkool  <segher@kernel.crashing.org>
>>        genautomata.c (get_excl_set): Do work per element, not per char.
>>        (check_presence_pattern_sets): Similar.
>>        (check_absence_pattern_sets): Similar.
>>
>
> Sorry, for the delay with the review.
>
> The patch is ok to commit.   Thanks for the patch.
>
> I'll send a reveiw for the 2nd patch (min_issue_delay) tomorrow.
>
>

Ping: commit?

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] [REPOST] genautomata: Optimise some more
  2010-06-07 19:28     ` NightStrike
@ 2010-06-08  3:08       ` Segher Boessenkool
  0 siblings, 0 replies; 12+ messages in thread
From: Segher Boessenkool @ 2010-06-08  3:08 UTC (permalink / raw)
  To: NightStrike; +Cc: Vladimir Makarov, Segher Boessenkool, gcc-patches

> Ping: commit?

This patch has been committed, as you could have trivially found out.
Also, it's not even a week ago -- if you want to ping other people's
old patches (which could be useful), please restrict yourself to _old_
patches.  Thanks.


Segher

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2010-06-08  3:08 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-12-04 18:46 [PATCH 1/3] Speed up and fix genautomata Segher Boessenkool
2009-12-04 18:46 ` [PATCH 2/3] genautomata: Optimise some more Segher Boessenkool
2009-12-04 19:05   ` [PATCH 3/3] Speed up genattrtab Segher Boessenkool
2009-12-06 22:07     ` Michael Matz
2009-12-07 19:53       ` Segher Boessenkool
2010-06-01 16:19   ` [PATCH] [REPOST] genautomata: Optimise some more Segher Boessenkool
2010-06-02 22:05   ` Vladimir Makarov
2010-06-07 19:28     ` NightStrike
2010-06-08  3:08       ` Segher Boessenkool
     [not found] ` <143525932d64d7c11e6cae645887c14da120c761.1259952302.git.segher@kernel      .crashing.org>
2009-12-04 20:41   ` [PATCH 1/3] Speed up and fix genautomata Segher Boessenkool
2010-06-01 16:16 ` [PATCH] [REPOST] " Segher Boessenkool
2010-06-03  3:02 ` [PATCH 1/3] " Vladimir N. Makarov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).