public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* Speed up genattrtab
@ 2010-06-15 16:58 Michael Matz
  2010-06-15 17:37 ` Mark Mitchell
  0 siblings, 1 reply; 40+ messages in thread
From: Michael Matz @ 2010-06-15 16:58 UTC (permalink / raw)
  To: gcc-patches

Hi,

okay, let's try again, after many years.  Maybe this time :)

This speeds up genattrtab to be no time issue during bootstrap anymore.
Over the years I worked on many approaches to this.  My first one was to 
throttle down the optimization, then I completely removed the 
optimization, then I implemented different kinds of optimizations, then I 
combined them with the throttled down ones, and now I'm back to more or 
less only throttling the optimizations.

Obviously switching off all optimizations creates the fastest combination 
of genattrtab+compiling insn-attrtab.c.  But it has some effect on the 
overall speed of the compiler.  Not too bad as I rectified this a bit, but 
maybe too much to be acceptable to everyone.  But for all development 
during the last years I used a so modified genattrtab, it's really nice :)

Now, after much benchmarking last week I'm proposing the below patch.  It 
does not get rid of all optimizations, but throttles it significantly 
(plus reorders the order of computation so that lowering the limits 
doesn't have too much effect for small attributes).

Numbers follow.  First the architecture, then the version of genattrtab, 
then four numbers:
 gen_u == seconds to run an optimized (!) genattrtab
 st1_1 == seconds to compile generated insn-attrtab.c with an optimized 
          cc1
 big_u == seconds to compile an artificial piece of code that generates
          large functions, many loops, scheduling opportunities
 kde_u == seconds to compile kdecore.cc, a one-file variant of an older 
          version of libkdecore.

The genattrtab versions are: clean == as in SVN, try3 == no call to 
optimize_attrs, otherwise same as proposed patch, try == the proposed 
variant.  These measurements were taken on genattrtab versions that didn't 
contain the latest changes to support enum attributes, but those have no 
speed effect (I've checked for some combinations).

arch     name   gen_u  st1_u  big_u  kde_u
alpha    clean  0      0.75   43.21  32.52
alpha    try3   0      0.26   44.19  32.85
alpha    try    0      0.35   43.26  32.50
arm      clean  6      19.66  49.25  37.89
arm      try3   0      1.78   50.04  38.00
arm      try    2      2.35   49.85  38.10
crisv32  clean  0      0.21   36.17  27.33
crisv32  try3   0      0.15   36.49  27.53
crisv32  try    0      0.23   36.21  27.43
hppa     clean  0      1.11   46.77  31.97
hppa     try3   0      0.58   46.85  32.00
hppa     try    0      0.64   46.97  31.84
i386     clean  38     34.25  33.51  29.93
i386     try3   1      1.99   34.26  30.64
i386     try    6      2.31   33.78  30.12
ia64     clean  1      1.88   66.55  49.81
ia64     try3   0      0.71   67.08  50.33
ia64     try    0      0.95   66.62  49.81
mips     clean  74     17.08  51.23
mips     try3   0      1.74   52.11
mips     try    4      2.29   50.77
powerpc  clean  56     48.59  49.74  34.15
powerpc  try3   0      2.59   50.60  34.97
powerpc  try    5      2.17   49.38  34.71
s390x    clean  0      1.82   47.26  32.83
s390x    try3   0      0.62   47.63  33.75
s390x    try    0      0.78   47.41  33.64
sh       clean  0      1.46   50.99  38.09
sh       try3   0      0.68   51.05  38.30
sh       try    0      0.91   50.79  38.13
sparc    clean  0      1.11   44.21  32.98
sparc    try3   0      0.57   44.45  33.22
sparc    try    0      0.57   43.27  32.93
x86_64   clean  52     43.81  28.78  28.72
x86_64   try3   1      2.16   29.22  29.40
x86_64   try    6      2.98   28.96  28.96

(mips wasn't able to compile kdecore.cc).  This is all cross compilers to 
$arch-linux, all running on the same host machine (a x86_64-linux iCore7 
machine).  As said, I'm proposing "try", so compare the first and third 
numbers.  It will hugely help i386, mips, powerpc and x86_64, and arm a 
bit; the others aren't a problem right now anyway.  The speed difference 
of the compiler is acceptable I think, actually even speeding up the 
compiler sometimes (probably cache effects, because the .text size of 
insn-attrtab is _much_ smaller) or being in the noise.

So, if included, we go from 95 seconds to 9 seconds for x86_64 for an 
optimized cc1, the difference will be even larger for stage2 (using an 
possibly unoptimized cc1).

As said, I'm bootstrapping with variants of this since years, but of 
course I'm regstrapping this currently on x86_64-linux.  Okay for trunk?


Ciao,
Michael.
-- 
	* genattrtab.c (insn_code_str): New global.
	(attr_rtx_cost): Move earlier, start with cost being 1.
	(simplify_test_exp): Handle one more case of distributive law,
	handle insn_code attributes, decrease cost threshold.
	(simplify_cond_insn_list, tests_attr_p, get_attr_order): New
	functions.
	(optimize_attrs): Use topological order, inline only cheap
	values.
	(write_test_expr): Handle insn_code attribute.
	(write_expr_attr_cache, write_cache_used_attr_for_value): New
	functions.
	(write_attr_switch): New function, broken out from ...
	(write_attr_get): ... this, use the above one.  Fix parameter of
	recursive call.
	(write_attr_set): Reset our_known_true sometimes, call
	write_test_expr for using cached values.
	(write_attr_case): Optimize some conditions just before writing
	out, and write the setting of cached attribute values.
	(write_eligible_delay): Use write_attr_switch.
	(find_attr): Handle insn_code attribute as builtin one.
	(main): Initialize insn_code_str.

Index: genattrtab.c
===================================================================
--- genattrtab.c	(revision 160784)
+++ genattrtab.c	(working copy)
@@ -241,6 +241,7 @@ static const char *length_str;
 static const char *delay_type_str;
 static const char *delay_1_0_str;
 static const char *num_delay_slots_str;
+static const char *insn_code_str;
 
 /* Simplify an expression.  Only call the routine if there is something to
    simplify.  */
@@ -1621,6 +1622,58 @@ write_length_unit_log (void)
   printf ("EXPORTED_CONST int length_unit_log = %u;\n", length_unit_log);
 }
 
+/* Compute approximate cost of the expression.  Used to decide whether
+   expression is cheap enough for inline.  */
+static int
+attr_rtx_cost (rtx x)
+{
+  int cost = 1;
+  enum rtx_code code;
+  if (!x)
+    return 0;
+  code = GET_CODE (x);
+  switch (code)
+    {
+    case MATCH_OPERAND:
+      if (XSTR (x, 1)[0])
+	return 10;
+      else
+	return 1;
+
+    case EQ_ATTR_ALT:
+      return 1;
+
+    case EQ_ATTR:
+      /* Alternatives don't result into function call.  */
+      if (!strcmp_check (XSTR (x, 0), alternative_name)
+          || !strcmp_check (XSTR (x, 0), insn_code_str))
+	return 1;
+      else
+	return 5;
+    default:
+      {
+	int i, j;
+	const char *fmt = GET_RTX_FORMAT (code);
+	for (i = GET_RTX_LENGTH (code) - 1; i >= 0; i--)
+	  {
+	    switch (fmt[i])
+	      {
+	      case 'V':
+	      case 'E':
+		for (j = 0; j < XVECLEN (x, i); j++)
+		  cost += attr_rtx_cost (XVECEXP (x, i, j));
+		break;
+	      case 'e':
+		cost += attr_rtx_cost (XEXP (x, i));
+		break;
+	      }
+	  }
+      }
+      break;
+    }
+  return cost;
+}
+
 /* Take a COND expression and see if any of the conditions in it can be
    simplified.  If any are known true or known false for the particular insn
    code, the COND can be further simplified.
@@ -1744,6 +1797,111 @@ simplify_cond (rtx exp, int insn_code, i
   return ret;
 }
 
+/* Taken a COND expression EXP and a value AV, and returns a possibly
+   optimized variant of EXP.  This is similar to simplify_cond, but
+   instead of optimizing for just one instruction we optimize for the
+   set of instructions found in AV.  */
+
+static rtx
+simplify_cond_insn_list (rtx exp, struct attr_value *av)
+{
+  int i;
+  /* We store the desired contents here,
+     then build a new expression if they don't match EXP.  */
+  rtx defval = XEXP (exp, 1);
+  rtx new_defval = XEXP (exp, 1);
+  int len = XVECLEN (exp, 0);
+  rtx *tests;
+  int allsame = 1;
+  rtx ret;
+
+  if (av->has_asm_insn)
+    return exp;
+
+  tests = XNEWVEC (rtx, len);
+  /* This lets us free all storage allocated below, if appropriate.  */
+  obstack_finish (rtl_obstack);
+
+  memcpy (tests, XVEC (exp, 0)->elem, len * sizeof (rtx));
+
+  /* See if default value needs simplification.  */
+  if (GET_CODE (defval) == COND)
+    new_defval = simplify_cond_insn_list (defval, av);
+
+  for (i = 0; i < len; i += 2)
+    {
+      rtx newtest, newval;
+      struct insn_ent *ie;
+      int oldcost;
+
+      oldcost = attr_rtx_cost (XVECEXP (exp, 0, i));
+      tests[i] = false_rtx;
+      for (ie = av->first_insn; ie != 0; ie = ie->next)
+        {
+	  rtx orig_test = XVECEXP (exp, 0, i);
+	  gcc_assert (ie->def->insn_code >= 0);
+	  newtest = simplify_test_exp_in_temp (orig_test, ie->def->insn_code,
+	  				       ie->def->insn_index);
+	  /* We couldn't simplify the expression for this insn code,
+	     so it makes no sense to try further, as syntactically we
+	     can't shorten the test anymore.  Further, also the
+	     simplifications done up to now can be removed.  */
+          if (newtest == orig_test)
+	    {
+	      tests[i] = orig_test;
+	      break;
+	    }
+	  if (newtest != false_rtx)
+	    {
+	      newtest = attr_rtx (AND,
+				  attr_eq (insn_code_str,
+					   attr_numeral (ie->def->insn_code)),
+				  newtest);
+	      tests[i] = insert_right_side (IOR, tests[i], newtest, -2, -2);
+	    }
+        }
+
+      if (attr_rtx_cost (tests[i]) > oldcost)
+        tests[i] = XVECEXP (exp, 0, i);
+
+      /* Simplify this test.  */
+      newtest = simplify_test_exp_in_temp (tests[i], -2, -2);
+      tests[i] = newtest;
+
+      newval = tests[i + 1];
+      /* See if this value may need simplification.  */
+      if (GET_CODE (newval) == COND)
+	newval = simplify_cond_insn_list (newval, av);
+
+      tests[i + 1] = newval;
+    }
+
+  /* See if we changed anything.  */
+  if (len != XVECLEN (exp, 0) || new_defval != XEXP (exp, 1))
+    allsame = 0;
+  else
+    for (i = 0; i < len; i++)
+      if (! attr_equal_p (tests[i], XVECEXP (exp, 0, i)))
+	{
+	  allsame = 0;
+	  break;
+	}
+
+  if (allsame)
+    ret = exp;
+  else
+    {
+      rtx newexp = rtx_alloc (COND);
+
+      XVEC (newexp, 0) = rtvec_alloc (len);
+      memcpy (XVEC (newexp, 0)->elem, tests, len * sizeof (rtx));
+      XEXP (newexp, 1) = new_defval;
+      ret = newexp;
+    }
+  free (tests);
+  return ret;
+}
+
 /* Remove an insn entry from an attribute value.  */
 
 static void
@@ -2216,57 +2374,6 @@ simplify_or_tree (rtx exp, rtx *pterm, i
   return exp;
 }
 
-/* Compute approximate cost of the expression.  Used to decide whether
-   expression is cheap enough for inline.  */
-static int
-attr_rtx_cost (rtx x)
-{
-  int cost = 0;
-  enum rtx_code code;
-  if (!x)
-    return 0;
-  code = GET_CODE (x);
-  switch (code)
-    {
-    case MATCH_OPERAND:
-      if (XSTR (x, 1)[0])
-	return 10;
-      else
-	return 0;
-
-    case EQ_ATTR_ALT:
-      return 0;
-
-    case EQ_ATTR:
-      /* Alternatives don't result into function call.  */
-      if (!strcmp_check (XSTR (x, 0), alternative_name))
-	return 0;
-      else
-	return 5;
-    default:
-      {
-	int i, j;
-	const char *fmt = GET_RTX_FORMAT (code);
-	for (i = GET_RTX_LENGTH (code) - 1; i >= 0; i--)
-	  {
-	    switch (fmt[i])
-	      {
-	      case 'V':
-	      case 'E':
-		for (j = 0; j < XVECLEN (x, i); j++)
-		  cost += attr_rtx_cost (XVECEXP (x, i, j));
-		break;
-	      case 'e':
-		cost += attr_rtx_cost (XEXP (x, i));
-		break;
-	      }
-	  }
-      }
-      break;
-    }
-  return cost;
-}
-
 /* Simplify test expression and use temporary obstack in order to avoid
    memory bloat.  Use ATTR_IND_SIMPLIFIED to avoid unnecessary simplifications
    and avoid unnecessary copying if possible.  */
@@ -2599,6 +2706,25 @@ simplify_test_exp (rtx exp, int insn_cod
 	  return SIMPLIFY_TEST_EXP (newexp, insn_code, insn_index);
 	}
 
+      /* Similarly,
+	    convert (ior (and (y) (x))
+			 (and (z) (x)))
+	    to      (and (ior (y) (z))
+			 (x))
+         Note that we want the common term to stay at the end.
+       */
+
+      else if (GET_CODE (left) == AND && GET_CODE (right) == AND
+	       && attr_equal_p (XEXP (left, 1), XEXP (right, 1)))
+	{
+	  newexp = attr_rtx (IOR, XEXP (left, 0), XEXP (right, 0));
+
+	  left = newexp;
+	  right = XEXP (right, 1);
+	  newexp = attr_rtx (AND, left, right);
+	  return SIMPLIFY_TEST_EXP (newexp, insn_code, insn_index);
+	}
+
       /* See if all or all but one of the insn's alternatives are specified
 	 in this tree.  Optimize if so.  */
 
@@ -2702,6 +2828,13 @@ simplify_test_exp (rtx exp, int insn_cod
 	  break;
 	}
 
+      if (XSTR (exp, 0) == insn_code_str)
+        {
+	  if (insn_code >= 0)
+	    newexp = atoi (XSTR (exp, 1)) == insn_code ? true_rtx : false_rtx;
+	  break;
+	}
+
       /* Look at the value for this insn code in the specified attribute.
 	 We normally can replace this comparison with the condition that
 	 would give this insn the values being tested for.  */
@@ -2734,7 +2867,7 @@ simplify_test_exp (rtx exp, int insn_cod
 	      x = evaluate_eq_attr (exp, attr, av->value,
 				    insn_code, insn_index);
 	      x = SIMPLIFY_TEST_EXP (x, insn_code, insn_index);
-	      if (attr_rtx_cost(x) < 20)
+	      if (attr_rtx_cost(x) < 7)
 		return x;
 	    }
 	}
@@ -2754,6 +2887,131 @@ simplify_test_exp (rtx exp, int insn_cod
   return newexp;
 }
 
+/* Return 1 if any EQ_ATTR subexpression of P refers to ATTR,
+   otherwise return 0.  */
+
+static int
+tests_attr_p (rtx p, struct attr_desc *attr)
+{
+  const char *fmt;
+  int i, ie, j, je;
+
+  if (GET_CODE (p) == EQ_ATTR)
+    {
+      if (XSTR (p, 0) != attr->name)
+	return 0;
+      return 1;
+    }
+
+  fmt = GET_RTX_FORMAT (GET_CODE (p));
+  ie = GET_RTX_LENGTH (GET_CODE (p));
+  for (i = 0; i < ie; i++)
+    {
+      switch (*fmt++)
+	{
+	case 'e':
+	  if (tests_attr_p (XEXP (p, i), attr))
+	    return 1;
+	  break;
+
+	case 'E':
+	  je = XVECLEN (p, i);
+	  for (j = 0; j < je; ++j)
+	    if (tests_attr_p (XVECEXP (p, i, j), attr))
+	      return 1;
+	  break;
+	}
+    }
+
+  return 0;
+}
+
+/* Calculate a topological sorting of all attributes so that
+   all attributes only depend on attributes in front of it.
+   Place the result in *RET (which is a pointer to an array of
+   attr_desc pointers), and return the size of that array.  */
+
+static int
+get_attr_order (struct attr_desc ***ret)
+{
+  int i, j;
+  int num = 0;
+  struct attr_desc *attr;
+  struct attr_desc **all, **sorted;
+  char *handled;
+  for (i = 0; i < MAX_ATTRS_INDEX; i++)
+    for (attr = attrs[i]; attr; attr = attr->next)
+      num++;
+  all = XNEWVEC (struct attr_desc *, num);
+  sorted = XNEWVEC (struct attr_desc *, num);
+  handled = XCNEWVEC (char, num);
+  num = 0;
+  for (i = 0; i < MAX_ATTRS_INDEX; i++)
+    for (attr = attrs[i]; attr; attr = attr->next)
+      all[num++] = attr;
+
+  j = 0;
+  for (i = 0; i < num; i++)
+    if (all[i]->is_const)
+      handled[i] = 1, sorted[j++] = all[i];
+
+  /* We have only few attributes hence we can live with the inner
+     loop being O(n^2), unlike the normal fast variants of topological
+     sorting.  */
+  while (j < num)
+    {
+      for (i = 0; i < num; i++)
+	if (!handled[i])
+	  {
+	    /* Let's see if I depends on anything interesting.  */
+	    int k;
+	    for (k = 0; k < num; k++)
+	      if (!handled[k])
+		{
+		  struct attr_value *av;
+		  for (av = all[i]->first_value; av; av = av->next)
+		    if (av->num_insns != 0)
+		      if (tests_attr_p (av->value, all[k]))
+			break;
+
+		  if (av)
+		    /* Something in I depends on K.  */
+		    break;
+		}
+	    if (k == num)
+	      {
+		/* Nothing in I depended on anything intersting, so
+		   it's done.  */
+		handled[i] = 1;
+		sorted[j++] = all[i];
+	      }
+	  }
+    }
+
+  for (j = 0; j < num; j++)
+    {
+      struct attr_desc *attr2;
+      struct attr_value *av;
+
+      attr = sorted[j];
+      fprintf (stderr, "%s depends on: ", attr->name);
+      for (i = 0; i < MAX_ATTRS_INDEX; ++i)
+	for (attr2 = attrs[i]; attr2; attr2 = attr2->next)
+	  if (!attr2->is_const)
+	    for (av = attr->first_value; av; av = av->next)
+	      if (av->num_insns != 0)
+		if (tests_attr_p (av->value, attr2))
+		  {
+		    fprintf (stderr, "%s, ", attr2->name);
+		    break;
+		  }
+      fprintf (stderr, "\n");
+    }
+  free (all);
+  *ret = sorted;
+  return num;
+}
+
 /* Optimize the attribute lists by seeing if we can determine conditional
    values from the known values of other attributes.  This will save subroutine
    calls during the compilation.  */
@@ -2768,6 +3026,8 @@ optimize_attrs (void)
   int i;
   struct attr_value_list *ivbuf;
   struct attr_value_list *iv;
+  struct attr_desc **topsort;
+  int topnum;
 
   /* For each insn code, make a list of all the insn_ent's for it,
      for all values for all attributes.  */
@@ -2783,18 +3043,22 @@ optimize_attrs (void)
 
   iv = ivbuf = XNEWVEC (struct attr_value_list, num_insn_ents);
 
-  for (i = 0; i < MAX_ATTRS_INDEX; i++)
-    for (attr = attrs[i]; attr; attr = attr->next)
-      for (av = attr->first_value; av; av = av->next)
-	for (ie = av->first_insn; ie; ie = ie->next)
-	  {
-	    iv->attr = attr;
-	    iv->av = av;
-	    iv->ie = ie;
-	    iv->next = insn_code_values[ie->def->insn_code];
-	    insn_code_values[ie->def->insn_code] = iv;
-	    iv++;
-	  }
+  /* Create the chain of insn*attr values such that we see dependend
+     attributes after their dependencies.  As we use a stack via the
+     next pointers start from the end of the topological order.  */
+  topnum = get_attr_order (&topsort);
+  for (i = topnum - 1; i >= 0; i--)
+    for (av = topsort[i]->first_value; av; av = av->next)
+      for (ie = av->first_insn; ie; ie = ie->next)
+	{
+	  iv->attr = topsort[i];
+	  iv->av = av;
+	  iv->ie = ie;
+	  iv->next = insn_code_values[ie->def->insn_code];
+	  insn_code_values[ie->def->insn_code] = iv;
+	  iv++;
+	}
+  free (topsort);
 
   /* Sanity check on num_insn_ents.  */
   gcc_assert (iv == ivbuf + num_insn_ents);
@@ -2829,7 +3093,15 @@ optimize_attrs (void)
 	    }
 
 	  rtl_obstack = old;
-	  if (newexp != av->value)
+	  /* If we created a new value for this instruction, and it's
+	     cheaper than the old value, and overall cheap, use that
+	     one as specific value for the current instruction.
+	     The last test is to avoid exploding the get_attr_ function
+	     sizes for no much gain.  */
+	  if (newexp != av->value
+	      && attr_rtx_cost (newexp) < attr_rtx_cost (av->value)
+	      && attr_rtx_cost (newexp) < 26
+	     )
 	    {
 	      newexp = attr_copy_rtx (newexp);
 	      remove_insn_ent (av, ie);
@@ -3332,6 +3604,12 @@ write_test_expr (rtx exp, int flags)
 	  break;
 	}
 
+      if (XSTR (exp, 0) == insn_code_str)
+        {
+	  printf ("insn_code == %s", XSTR (exp, 1));
+	  break;
+	}
+
       attr = find_attr (&XSTR (exp, 0), 0);
       gcc_assert (attr);
 
@@ -3618,10 +3896,80 @@ walk_attr_value (rtx exp)
       }
 }
 
-/* Write out a function to obtain the attribute for a given INSN.  */
+/* If a subexpression of P refers to ATTR, write out C code to retrieve
+   the value of that attribute storing it in a local variable.  */
+
+static int
+write_expr_attr_cache (rtx p, struct attr_desc *attr, int indent)
+{
+  const char *fmt;
+  int i, ie, j, je;
+
+  if (GET_CODE (p) == EQ_ATTR)
+    {
+      if (XSTR (p, 0) != attr->name)
+	return 0;
+
+      write_indent (indent);
+      if (attr->enum_name)
+	printf ("  enum %s ", attr->enum_name);
+      else if (!attr->is_numeric)
+	printf ("  enum attr_%s ", attr->name);
+      else
+	printf ("  int ");
+
+      printf ("attr_%s ATTRIBUTE_UNUSED = get_attr_%s (insn);\n",
+	      attr->name, attr->name);
+      return 1;
+    }
+
+  fmt = GET_RTX_FORMAT (GET_CODE (p));
+  ie = GET_RTX_LENGTH (GET_CODE (p));
+  for (i = 0; i < ie; i++)
+    {
+      switch (*fmt++)
+	{
+	case 'e':
+	  if (write_expr_attr_cache (XEXP (p, i), attr, indent))
+	    return 1;
+	  break;
+
+	case 'E':
+	  je = XVECLEN (p, i);
+	  for (j = 0; j < je; ++j)
+	    if (write_expr_attr_cache (XVECEXP (p, i, j), attr, indent))
+	      return 1;
+	  break;
+	}
+    }
+
+  return 0;
+}
+
+/* Given a VALUE (possibly having EQ_ATTR tests in subexpression)
+   write out C code to retrieve the value of all attributes used
+   in tests embedded therein.  */
 
 static void
-write_attr_get (struct attr_desc *attr)
+write_cache_used_attr_for_value (rtx value, int indent)
+{
+  struct attr_desc *attr2;
+  int i;
+
+  for (i = 0; i < MAX_ATTRS_INDEX; ++i)
+    for (attr2 = attrs[i]; attr2; attr2 = attr2->next)
+      if (!attr2->is_const)
+	write_expr_attr_cache (value, attr2, indent);
+}
+
+/* Write out C code to calculate the value of ATTR per instruction.
+   PREFIX and SUFFIX are used to delimit the 'return' statement delivering
+   the value to our caller.  Either it will form a real return statement,
+   or an accumulator update.  */
+
+static void
+write_attr_switch (struct attr_desc *attr, int indent, const char *prefix,
+		   const char *suffix)
 {
   struct attr_value *av, *common_av;
 
@@ -3629,6 +3977,32 @@ write_attr_get (struct attr_desc *attr)
      switch we will generate.  */
   common_av = find_most_used (attr);
 
+  write_indent (indent);
+  printf ("{\n");
+  write_indent (indent);
+  printf ("  int insn_code = recog_memoized (insn);\n");
+  write_indent (indent);
+  printf ("  switch (insn_code)\n");
+  write_indent (indent);
+  printf ("    {\n");
+
+  for (av = attr->first_value; av; av = av->next)
+    if (av != common_av)
+      write_attr_case (attr, av, 1, prefix, suffix, indent + 4, true_rtx);
+
+  write_attr_case (attr, common_av, 0, prefix, suffix, indent + 4, true_rtx);
+  
+  write_indent (indent);
+  printf ("    }\n");
+  write_indent (indent);
+  printf ("}\n\n");
+}
+
+/* Write out a function to obtain the attribute for a given INSN.  */
+
+static void
+write_attr_get (struct attr_desc *attr)
+{
   /* Write out start of function, then all values with explicit `case' lines,
      then a `default', then the value with the most uses.  */
   if (attr->enum_name)
@@ -3646,6 +4020,7 @@ write_attr_get (struct attr_desc *attr)
     printf ("get_attr_%s (rtx insn ATTRIBUTE_UNUSED)\n", attr->name);
   else
     {
+      struct attr_value *av;
       printf ("get_attr_%s (void)\n", attr->name);
       printf ("{\n");
 
@@ -3656,22 +4031,13 @@ write_attr_get (struct attr_desc *attr)
 			  av->first_insn->def->insn_index);
 	else if (av->num_insns != 0)
 	  write_attr_set (attr, 2, av->value, "return", ";",
-			  true_rtx, -2, 0);
+			  true_rtx, -2, -2);
 
       printf ("}\n\n");
       return;
     }
 
-  printf ("{\n");
-  printf ("  switch (recog_memoized (insn))\n");
-  printf ("    {\n");
-
-  for (av = attr->first_value; av; av = av->next)
-    if (av != common_av)
-      write_attr_case (attr, av, 1, "return", ";", 4, true_rtx);
-
-  write_attr_case (attr, common_av, 0, "return", ";", 4, true_rtx);
-  printf ("    }\n}\n\n");
+  write_attr_switch (attr, 0, "return", ";");
 }
 
 /* Given an AND tree of known true terms (because we are inside an `if' with
@@ -3727,6 +4093,10 @@ write_attr_set (struct attr_desc *attr,
 	  rtx testexp;
 	  rtx inner_true;
 
+	  /* Reset our_known_true after some time to not accumulate
+	     too much cruft (slowing down genattrtab).  */
+	  if ((i & 31) == 0)
+	    our_known_true = known_true;
 	  testexp = eliminate_known_true (our_known_true,
 					  XVECEXP (value, 0, i),
 					  insn_code, insn_index);
@@ -3755,7 +4125,7 @@ write_attr_set (struct attr_desc *attr,
 	  write_indent (indent);
 	  printf ("%sif ", first_if ? "" : "else ");
 	  first_if = 0;
-	  write_test_expr (testexp, 0);
+	  write_test_expr (testexp, 2);
 	  printf ("\n");
 	  write_indent (indent + 2);
 	  printf ("{\n");
@@ -3820,6 +4190,7 @@ write_attr_case (struct attr_desc *attr,
 		 int write_case_lines, const char *prefix, const char *suffix,
 		 int indent, rtx known_true)
 {
+  rtx opt_val;
   if (av->num_insns == 0)
     return;
 
@@ -3843,9 +4214,33 @@ write_attr_case (struct attr_desc *attr,
       printf ("default:\n");
     }
 
+  indent += 2;
+  write_indent (indent);
+  printf ("{\n");
+
+  opt_val = av->value;
+  /* If we have multiple but only few instructions associated with
+     this value we can possibly optimize this.  */
+  if (GET_CODE (opt_val) == COND && av->num_insns > 1 && av->num_insns < 10)
+    opt_val = simplify_cond_insn_list (opt_val, av);
+  while (GET_CODE (opt_val) == COND)
+    {
+      rtx newexp2;
+      if (av->num_insns == 1)
+	newexp2 = simplify_cond (opt_val, av->first_insn->def->insn_code,
+				 av->first_insn->def->insn_index);
+      else
+	newexp2 = simplify_cond (opt_val, -2, -2);
+      if (newexp2 == opt_val)
+	break;
+      opt_val = newexp2;
+    }
+
   /* See what we have to do to output this value.  */
   must_extract = must_constrain = address_used = 0;
-  walk_attr_value (av->value);
+  walk_attr_value (opt_val);
+
+  write_cache_used_attr_for_value (opt_val, indent);
 
   if (must_constrain)
     {
@@ -3859,11 +4254,11 @@ write_attr_case (struct attr_desc *attr,
     }
 
   if (av->num_insns == 1)
-    write_attr_set (attr, indent + 2, av->value, prefix, suffix,
+    write_attr_set (attr, indent + 2, opt_val, prefix, suffix,
 		    known_true, av->first_insn->def->insn_code,
 		    av->first_insn->def->insn_index);
   else
-    write_attr_set (attr, indent + 2, av->value, prefix, suffix,
+    write_attr_set (attr, indent + 2, opt_val, prefix, suffix,
 		    known_true, -2, 0);
 
   if (strncmp (prefix, "return", 6))
@@ -3871,6 +4266,8 @@ write_attr_case (struct attr_desc *attr,
       write_indent (indent + 2);
       printf ("break;\n");
     }
+  write_indent (indent);
+  printf ("}\n");
   printf ("\n");
 }
 
@@ -3993,7 +4390,6 @@ write_eligible_delay (const char *kind)
   char str[50];
   const char *pstr;
   struct attr_desc *attr;
-  struct attr_value *av, *common_av;
   int i;
 
   /* Compute the maximum number of delay slots required.  We use the delay
@@ -4026,19 +4422,11 @@ write_eligible_delay (const char *kind)
     {
       attr = find_attr (&delay_type_str, 0);
       gcc_assert (attr);
-      common_av = find_most_used (attr);
-
-      printf ("  insn = delay_insn;\n");
-      printf ("  switch (recog_memoized (insn))\n");
-      printf ("    {\n");
 
       sprintf (str, " * %d;\n      break;", max_slots);
-      for (av = attr->first_value; av; av = av->next)
-	if (av != common_av)
-	  write_attr_case (attr, av, 1, "slot +=", str, 4, true_rtx);
 
-      write_attr_case (attr, common_av, 0, "slot +=", str, 4, true_rtx);
-      printf ("    }\n\n");
+      printf ("  insn = delay_insn;\n");
+      write_attr_switch (attr, 2, "slot +=", str);
 
       /* Ensure matched.  Otherwise, shouldn't have been called.  */
       printf ("  gcc_assert (slot >= %d);\n\n", max_slots);
@@ -4047,20 +4435,10 @@ write_eligible_delay (const char *kind)
   /* If just one type of delay slot, write simple switch.  */
   if (num_delays == 1 && max_slots == 1)
     {
-      printf ("  insn = candidate_insn;\n");
-      printf ("  switch (recog_memoized (insn))\n");
-      printf ("    {\n");
-
       attr = find_attr (&delay_1_0_str, 0);
       gcc_assert (attr);
-      common_av = find_most_used (attr);
-
-      for (av = attr->first_value; av; av = av->next)
-	if (av != common_av)
-	  write_attr_case (attr, av, 1, "return", ";", 4, true_rtx);
-
-      write_attr_case (attr, common_av, 0, "return", ";", 4, true_rtx);
-      printf ("    }\n");
+      printf ("  insn = candidate_insn;\n");
+      write_attr_switch (attr, 2, "return", ";");
     }
 
   else
@@ -4076,21 +4454,12 @@ write_eligible_delay (const char *kind)
 	  {
 	    printf ("    case %d:\n",
 		    (i / 3) + (num_delays == 1 ? 0 : delay->num * max_slots));
-	    printf ("      switch (recog_memoized (insn))\n");
-	    printf ("\t{\n");
 
 	    sprintf (str, "*%s_%d_%d", kind, delay->num, i / 3);
 	    pstr = str;
 	    attr = find_attr (&pstr, 0);
 	    gcc_assert (attr);
-	    common_av = find_most_used (attr);
-
-	    for (av = attr->first_value; av; av = av->next)
-	      if (av != common_av)
-		write_attr_case (attr, av, 1, "return", ";", 8, true_rtx);
-
-	    write_attr_case (attr, common_av, 0, "return", ";", 8, true_rtx);
-	    printf ("      }\n");
+	    write_attr_switch (attr, 6, "return", ";");
 	  }
 
       printf ("    default:\n");
@@ -4133,7 +4502,7 @@ find_attr (const char **name_p, int crea
 
   /* Before we resort to using `strcmp', see if the string address matches
      anywhere.  In most cases, it should have been canonicalized to do so.  */
-  if (name == alternative_name)
+  if (name == alternative_name || name == insn_code_str)
     return NULL;
 
   index = name[0] & (MAX_ATTRS_INDEX - 1);
@@ -4458,6 +4827,7 @@ main (int argc, char **argv)
   delay_type_str = DEF_ATTR_STRING ("*delay_type");
   delay_1_0_str = DEF_ATTR_STRING ("*delay_1_0");
   num_delay_slots_str = DEF_ATTR_STRING ("*num_delay_slots");
+  insn_code_str = DEF_ATTR_STRING ("insn_code");
 
   printf ("/* Generated automatically by the program `genattrtab'\n\
 from the machine description file `md'.  */\n\n");
@@ -4573,7 +4943,7 @@ from the machine description file `md'.
   /* Construct extra attributes for `length'.  */
   make_length_attrs ();
 
-  /* Perform any possible optimizations to speed up compilation.  */
+  /* Perform some optimizations to speed up compilation.  */
   optimize_attrs ();
 
   /* Now write out all the `gen_attr_...' routines.  Do these before the

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: Speed up genattrtab
  2010-06-15 17:37 ` Mark Mitchell
@ 2010-06-15 17:37   ` Jan Hubicka
  2010-06-15 19:13     ` Mark Mitchell
  2010-06-15 17:59   ` Dave Korn
  1 sibling, 1 reply; 40+ messages in thread
From: Jan Hubicka @ 2010-06-15 17:37 UTC (permalink / raw)
  To: Mark Mitchell; +Cc: Michael Matz, gcc-patches

> Michael Matz wrote:
> 
> > This speeds up genattrtab to be no time issue during bootstrap anymore.
> 
> > The speed difference
> > of the compiler is acceptable I think, actually even speeding up the 
> > compiler sometimes (probably cache effects, because the .text size of 
> > insn-attrtab is _much_ smaller) or being in the noise.
> 
> I certainly see how this benefits us as GCC developers.  But, it does
> look like it has a negative effect on overall compile-time, even though
> in some cases it helps a bit.  The effect seems negative, in the
> aggregate, though I didn't try to dump your table into a spreadsheet and
> actually get a number out.  I don't think we should help ourselves at
> the expense of the user's experience.
> 
> I happened to be talking to someone else this morning about this same
> issue.  Would it help just to organize the Makefile in such a way that
> this file is being built first, so that on a parallel machine it's not
> the last thing being built?  This person observed that in many cases
> make seems to wait to *start* building insn-attrtab until very late in
> the build process, meaning that all the other cores are idle while it's
> building.

In WHOPR compilation we now partition the program based on source files (this
is something I plan to revisit later), so we get partition that corresponds to
insn-attrtab.  WPA driver sorts units by size and feeds them to parallel make
and insn-attrtab is still last on finishing build by quite long shot.  This
ignore time spent by genattrtab itself.

So just reordering makefiles is not going to solve the problem.  I also think
insn-attrtab starts late because genattrtab takes long time to generate it.
insn-attrtab on i386 also has one dominating function.  With WHOPR this can
be solved by better partitioning algorithm along with function splitting (I have
prototypes of both the second I plan to contribute soon after some bugfixing).

Honza
> 
> Thanks,
> 
> -- 
> Mark Mitchell
> CodeSourcery
> mark@codesourcery.com
> (650) 331-3385 x713

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: Speed up genattrtab
  2010-06-15 16:58 Speed up genattrtab Michael Matz
@ 2010-06-15 17:37 ` Mark Mitchell
  2010-06-15 17:37   ` Jan Hubicka
  2010-06-15 17:59   ` Dave Korn
  0 siblings, 2 replies; 40+ messages in thread
From: Mark Mitchell @ 2010-06-15 17:37 UTC (permalink / raw)
  To: Michael Matz; +Cc: gcc-patches

Michael Matz wrote:

> This speeds up genattrtab to be no time issue during bootstrap anymore.

> The speed difference
> of the compiler is acceptable I think, actually even speeding up the 
> compiler sometimes (probably cache effects, because the .text size of 
> insn-attrtab is _much_ smaller) or being in the noise.

I certainly see how this benefits us as GCC developers.  But, it does
look like it has a negative effect on overall compile-time, even though
in some cases it helps a bit.  The effect seems negative, in the
aggregate, though I didn't try to dump your table into a spreadsheet and
actually get a number out.  I don't think we should help ourselves at
the expense of the user's experience.

I happened to be talking to someone else this morning about this same
issue.  Would it help just to organize the Makefile in such a way that
this file is being built first, so that on a parallel machine it's not
the last thing being built?  This person observed that in many cases
make seems to wait to *start* building insn-attrtab until very late in
the build process, meaning that all the other cores are idle while it's
building.

Thanks,

-- 
Mark Mitchell
CodeSourcery
mark@codesourcery.com
(650) 331-3385 x713

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: Speed up genattrtab
  2010-06-15 17:37 ` Mark Mitchell
  2010-06-15 17:37   ` Jan Hubicka
@ 2010-06-15 17:59   ` Dave Korn
  2010-06-15 20:30     ` Paolo Bonzini
  2010-06-15 22:03     ` David Daney
  1 sibling, 2 replies; 40+ messages in thread
From: Dave Korn @ 2010-06-15 17:59 UTC (permalink / raw)
  To: Mark Mitchell; +Cc: Michael Matz, gcc-patches

On 15/06/2010 18:19, Mark Mitchell wrote:
> Michael Matz wrote:
> 
>> This speeds up genattrtab to be no time issue during bootstrap anymore.
> 
>> The speed difference
>> of the compiler is acceptable I think, actually even speeding up the 
>> compiler sometimes (probably cache effects, because the .text size of 
>> insn-attrtab is _much_ smaller) or being in the noise.
> 
> I certainly see how this benefits us as GCC developers.

  90 seconds out of a whole build, or did I misread something?

    cheers,
      DaveK

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: Speed up genattrtab
  2010-06-15 17:37   ` Jan Hubicka
@ 2010-06-15 19:13     ` Mark Mitchell
  2010-06-15 19:17       ` Ralf Wildenhues
                         ` (2 more replies)
  0 siblings, 3 replies; 40+ messages in thread
From: Mark Mitchell @ 2010-06-15 19:13 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Michael Matz, gcc-patches

Jan Hubicka wrote:

> In WHOPR compilation we now partition the program based on source files (this
> is something I plan to revisit later), so we get partition that corresponds to
> insn-attrtab.  WPA driver sorts units by size and feeds them to parallel make
> and insn-attrtab is still last on finishing build by quite long shot.  This
> ignore time spent by genattrtab itself.
> 
> So just reordering makefiles is not going to solve the problem.

Well, it might solve the problem for developers not using WHOPR. :-)

I certainly don't claim it's a general solution; at some point, if you
have enouh cores, the single core compiling insn-attrtab will dominate
anyhow.  But if some easy Makefile hackery would help some developers,
I'd still argue we should do it.  And I still have a hard time with a
change that's going to make the compiler slower in order to save a
minute or two during compiler builds, especially given that test cycles
are so much longer than build cycles.

-- 
Mark Mitchell
CodeSourcery
mark@codesourcery.com
(650) 331-3385 x713

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: Speed up genattrtab
  2010-06-15 19:13     ` Mark Mitchell
@ 2010-06-15 19:17       ` Ralf Wildenhues
  2010-06-15 19:22       ` Jakub Jelinek
  2010-06-16  9:32       ` Richard Guenther
  2 siblings, 0 replies; 40+ messages in thread
From: Ralf Wildenhues @ 2010-06-15 19:17 UTC (permalink / raw)
  To: Mark Mitchell; +Cc: Jan Hubicka, Michael Matz, gcc-patches

* Mark Mitchell wrote on Tue, Jun 15, 2010 at 08:39:47PM CEST:
> But if some easy Makefile hackery would help some developers,
> I'd still argue we should do it.

IIRC, this hackery has already been done sometime last year or
so, with reordering of objects and the order-only dependencies.
Unless the build has changed a lot (haven't checked), it is
simply not possible to improve in a generally useful way
without changing GNU make to allow for more fine-tuning.

It is typically helpful to not increase N in -jN too much over
the number of available cores.

Cheers,
Ralf

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: Speed up genattrtab
  2010-06-15 19:13     ` Mark Mitchell
  2010-06-15 19:17       ` Ralf Wildenhues
@ 2010-06-15 19:22       ` Jakub Jelinek
  2010-06-16 15:42         ` Jakub Jelinek
  2010-06-16  9:32       ` Richard Guenther
  2 siblings, 1 reply; 40+ messages in thread
From: Jakub Jelinek @ 2010-06-15 19:22 UTC (permalink / raw)
  To: Mark Mitchell; +Cc: Jan Hubicka, Michael Matz, gcc-patches

On Tue, Jun 15, 2010 at 11:39:47AM -0700, Mark Mitchell wrote:
> Jan Hubicka wrote:
> 
> > In WHOPR compilation we now partition the program based on source files (this
> > is something I plan to revisit later), so we get partition that corresponds to
> > insn-attrtab.  WPA driver sorts units by size and feeds them to parallel make
> > and insn-attrtab is still last on finishing build by quite long shot.  This
> > ignore time spent by genattrtab itself.
> > 
> > So just reordering makefiles is not going to solve the problem.
> 
> Well, it might solve the problem for developers not using WHOPR. :-)
> 
> I certainly don't claim it's a general solution; at some point, if you
> have enouh cores, the single core compiling insn-attrtab will dominate
> anyhow.  But if some easy Makefile hackery would help some developers,
> I'd still argue we should do it.  And I still have a hard time with a
> change that's going to make the compiler slower in order to save a
> minute or two during compiler builds, especially given that test cycles
> are so much longer than build cycles.

I believe on x86_64/i686 the most time is spent in compiling
internal_dfa_insn_code, primarily because there are so many different
schedulings.
The insn is a big switch on recog_memoized, where most of the cases first
compare ix86_schedule var to some enum.  I guess it would be certainly
faster to compile to instead split the big function into separate function
for each schedule and make internal_dfa_insn_code a function pointer, would
need to be benchmarked how it would actually perform at runtime.

	Jakub

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: Speed up genattrtab
  2010-06-15 17:59   ` Dave Korn
@ 2010-06-15 20:30     ` Paolo Bonzini
  2010-06-15 20:34       ` Paolo Bonzini
  2010-06-15 22:03     ` David Daney
  1 sibling, 1 reply; 40+ messages in thread
From: Paolo Bonzini @ 2010-06-15 20:30 UTC (permalink / raw)
  To: Dave Korn; +Cc: Mark Mitchell, Michael Matz, gcc-patches

On 06/15/2010 07:57 PM, Dave Korn wrote:
> On 15/06/2010 18:19, Mark Mitchell wrote:
>> Michael Matz wrote:
>>
>>> This speeds up genattrtab to be no time issue during bootstrap anymore.
>>
>>> The speed difference
>>> of the compiler is acceptable I think, actually even speeding up the
>>> compiler sometimes (probably cache effects, because the .text size of
>>> insn-attrtab is _much_ smaller) or being in the noise.
>>
>> I certainly see how this benefits us as GCC developers.
>
>    90 seconds out of a whole build, or did I misread something?

90 seconds out of "touch something.md; make cc1" is significant.  It's 
the most sensitive part for day-to-day backend development.

Here is Michael's data reformatted:

	big_u			kde_u		
	clean	try		clean	try	
alpha	43.21	43.26   0.12%	32.52	32.50  -0.06%
arm	49.25	49.85   1.22%	37.89	38.10   0.55%
crisv32	36.17	36.21   0.11%	27.33	27.43   0.37%
hppa	46.77	46.97   0.43%	31.97	31.84  -0.41%
i386	33.51	33.78   0.81%	29.93	30.12	 0.63%
ia64	66.55	66.62   0.11%	49.81	49.81	 0.00%
mips	51.23	50.77  -0.90%			
powerpc	49.74	49.38  -0.72%	34.15	34.71	 1.64%
s390x	47.26	47.41   0.32%	32.83	33.64	 2.47%
sh	50.99	50.79  -0.39%	38.09	38.13	 0.11%
sparc	44.21	43.27  -2.13%	32.98	32.93	-0.15%
x86_64	28.78	28.96   0.63%	28.72	28.96	 0.84%

It's a tougher call than it seems...  Unfortunately, of the four likely 
most used platforms (i386, x86_64, mips and arm) three are degraded 
consistently, and the fourth lacks one of the two data points.  That's 
indeed not too good. :-/

Compiling genattrtab at -O1 saves half of the time on x86_64, and 
cfgcleanup goes from 20 to 0.7 seconds.  So it should be worthwhile 
investigating _which_ cfgcleanup pass is spending so much time. 
Compiling insn-recog.c at -O1 instead saves 30% of the compilation time.

I used to like Michael's patches a lot but nowadays parallelization of 
the build is much better thanks to more cores in the systems.  Since 
genattrtab can run in parallel with most of the rest of the build, what 
we need to save time on is "touch x.md; make -j6 cc1" (with the 
insn-attrtab.c move-if-changed rule disabled of course).  Maybe -O1 is 
enough together with Jakub's old patch to split insn-attrtab.c into four 
or eight files.  Moreover, the "split" idea could be applied to 
insn-recog.c too.

Michael, can you get numbers for your testcases when you compile "clean" 
insn-attrtab.c and/or insn-recog.c with -O1?

Paolo

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: Speed up genattrtab
  2010-06-15 20:30     ` Paolo Bonzini
@ 2010-06-15 20:34       ` Paolo Bonzini
  0 siblings, 0 replies; 40+ messages in thread
From: Paolo Bonzini @ 2010-06-15 20:34 UTC (permalink / raw)
  To: gcc-patches; +Cc: Mark Mitchell, Michael Matz, gcc-patches

On 06/15/2010 07:57 PM, Dave Korn wrote:
> On 15/06/2010 18:19, Mark Mitchell wrote:
>> Michael Matz wrote:
>>
>>> This speeds up genattrtab to be no time issue during bootstrap anymore.
>>
>>> The speed difference
>>> of the compiler is acceptable I think, actually even speeding up the
>>> compiler sometimes (probably cache effects, because the .text size of
>>> insn-attrtab is _much_ smaller) or being in the noise.
>>
>> I certainly see how this benefits us as GCC developers.
>
>    90 seconds out of a whole build, or did I misread something?

90 seconds out of "touch something.md; make cc1" is significant.  It's 
the most sensitive part for day-to-day backend development.

Here is Michael's data reformatted:

	big_u			kde_u		
	clean	try		clean	try	
alpha	43.21	43.26   0.12%	32.52	32.50  -0.06%
arm	49.25	49.85   1.22%	37.89	38.10   0.55%
crisv32	36.17	36.21   0.11%	27.33	27.43   0.37%
hppa	46.77	46.97   0.43%	31.97	31.84  -0.41%
i386	33.51	33.78   0.81%	29.93	30.12	 0.63%
ia64	66.55	66.62   0.11%	49.81	49.81	 0.00%
mips	51.23	50.77  -0.90%			
powerpc	49.74	49.38  -0.72%	34.15	34.71	 1.64%
s390x	47.26	47.41   0.32%	32.83	33.64	 2.47%
sh	50.99	50.79  -0.39%	38.09	38.13	 0.11%
sparc	44.21	43.27  -2.13%	32.98	32.93	-0.15%
x86_64	28.78	28.96   0.63%	28.72	28.96	 0.84%

It's a tougher call than it seems...  Unfortunately, of the four likely 
most used platforms (i386, x86_64, mips and arm) three are degraded 
consistently, and the fourth lacks one of the two data points.  That's 
indeed not too good. :-/

Compiling genattrtab at -O1 saves half of the time on x86_64, and 
cfgcleanup goes from 20 to 0.7 seconds.  So it should be worthwhile 
investigating _which_ cfgcleanup pass is spending so much time. 
Compiling insn-recog.c at -O1 instead saves 30% of the compilation time.

I used to like Michael's patches a lot but nowadays parallelization of 
the build is much better thanks to more cores in the systems.  Since 
genattrtab can run in parallel with most of the rest of the build, what 
we need to save time on is "touch x.md; make -j6 cc1" (with the 
insn-attrtab.c move-if-changed rule disabled of course).  Maybe -O1 is 
enough together with Jakub's old patch to split insn-attrtab.c into four 
or eight files.  Moreover, the "split" idea could be applied to 
insn-recog.c too.

Michael, can you get numbers for your testcases when you compile "clean" 
insn-attrtab.c and/or insn-recog.c with -O1?

Paolo

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: Speed up genattrtab
  2010-06-15 17:59   ` Dave Korn
  2010-06-15 20:30     ` Paolo Bonzini
@ 2010-06-15 22:03     ` David Daney
  1 sibling, 0 replies; 40+ messages in thread
From: David Daney @ 2010-06-15 22:03 UTC (permalink / raw)
  To: Dave Korn; +Cc: Mark Mitchell, Michael Matz, gcc-patches

On 06/15/2010 10:57 AM, Dave Korn wrote:
> On 15/06/2010 18:19, Mark Mitchell wrote:
>> Michael Matz wrote:
>>
>>> This speeds up genattrtab to be no time issue during bootstrap anymore.
>>
>>> The speed difference
>>> of the compiler is acceptable I think, actually even speeding up the
>>> compiler sometimes (probably cache effects, because the .text size of
>>> insn-attrtab is _much_ smaller) or being in the noise.
>>
>> I certainly see how this benefits us as GCC developers.
>
>    90 seconds out of a whole build, or did I misread something?
>

For me, it is more like 20 minutes.  As Mark pointed out, the Makefile 
for some reason starts running genattrtab close to last in stages 2 and 
3.  So in my case (12 CPU mips64-linux), genattrtab takes about 50% of 
the wall-clock time to build a pass.

David Daney

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: Speed up genattrtab
  2010-06-15 19:13     ` Mark Mitchell
  2010-06-15 19:17       ` Ralf Wildenhues
  2010-06-15 19:22       ` Jakub Jelinek
@ 2010-06-16  9:32       ` Richard Guenther
  2010-06-16 14:30         ` Nathan Froyd
  2 siblings, 1 reply; 40+ messages in thread
From: Richard Guenther @ 2010-06-16  9:32 UTC (permalink / raw)
  To: Mark Mitchell; +Cc: Jan Hubicka, Michael Matz, gcc-patches

On Tue, Jun 15, 2010 at 8:39 PM, Mark Mitchell <mark@codesourcery.com> wrote:
> Jan Hubicka wrote:
>
>> In WHOPR compilation we now partition the program based on source files (this
>> is something I plan to revisit later), so we get partition that corresponds to
>> insn-attrtab.  WPA driver sorts units by size and feeds them to parallel make
>> and insn-attrtab is still last on finishing build by quite long shot.  This
>> ignore time spent by genattrtab itself.
>>
>> So just reordering makefiles is not going to solve the problem.
>
> Well, it might solve the problem for developers not using WHOPR. :-)
>
> I certainly don't claim it's a general solution; at some point, if you
> have enouh cores, the single core compiling insn-attrtab will dominate
> anyhow.  But if some easy Makefile hackery would help some developers,
> I'd still argue we should do it.  And I still have a hard time with a
> change that's going to make the compiler slower in order to save a
> minute or two during compiler builds, especially given that test cycles
> are so much longer than build cycles.

I think the makefile ordering is already good - it is just that genattrtab
takes so much time that compiling insn-attrtab starts very late.  Michael
addresses this by speeding up genattrtab.

Richard.

> --
> Mark Mitchell
> CodeSourcery
> mark@codesourcery.com
> (650) 331-3385 x713
>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: Speed up genattrtab
  2010-06-16  9:32       ` Richard Guenther
@ 2010-06-16 14:30         ` Nathan Froyd
  2010-06-16 14:33           ` Michael Matz
  0 siblings, 1 reply; 40+ messages in thread
From: Nathan Froyd @ 2010-06-16 14:30 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Mark Mitchell, Jan Hubicka, Michael Matz, gcc-patches

On Wed, Jun 16, 2010 at 09:15:39AM +0200, Richard Guenther wrote:
> On Tue, Jun 15, 2010 at 8:39 PM, Mark Mitchell <mark@codesourcery.com> wrote:
> > I certainly don't claim it's a general solution; at some point, if you
> > have enouh cores, the single core compiling insn-attrtab will dominate
> > anyhow.  But if some easy Makefile hackery would help some developers,
> > I'd still argue we should do it.  And I still have a hard time with a
> > change that's going to make the compiler slower in order to save a
> > minute or two during compiler builds, especially given that test cycles
> > are so much longer than build cycles.
> 
> I think the makefile ordering is already good - it is just that genattrtab
> takes so much time that compiling insn-attrtab starts very late.  Michael
> addresses this by speeding up genattrtab.

Pardon the dumb question here, but IIUC, we build insn-attrtab at each
stage.  Why not just build it once for all stages?  Do we gain anything
(besides less complicated Makefiles) by building it at every stage?  I
realize that this is not quite as nice as cutting its build time by
90%+, but moving it earlier might hide the compilation latency just as
effectively.

-Nathan

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: Speed up genattrtab
  2010-06-16 14:30         ` Nathan Froyd
@ 2010-06-16 14:33           ` Michael Matz
  2010-06-16 14:43             ` Nathan Froyd
  0 siblings, 1 reply; 40+ messages in thread
From: Michael Matz @ 2010-06-16 14:33 UTC (permalink / raw)
  To: Nathan Froyd; +Cc: Richard Guenther, Mark Mitchell, Jan Hubicka, gcc-patches

[-- Attachment #1: Type: TEXT/PLAIN, Size: 1203 bytes --]

Hi,

On Wed, 16 Jun 2010, Nathan Froyd wrote:

> On Wed, Jun 16, 2010 at 09:15:39AM +0200, Richard Guenther wrote:
> > On Tue, Jun 15, 2010 at 8:39 PM, Mark Mitchell <mark@codesourcery.com> wrote:
> > > I certainly don't claim it's a general solution; at some point, if you
> > > have enouh cores, the single core compiling insn-attrtab will dominate
> > > anyhow.  But if some easy Makefile hackery would help some developers,
> > > I'd still argue we should do it.  And I still have a hard time with a
> > > change that's going to make the compiler slower in order to save a
> > > minute or two during compiler builds, especially given that test cycles
> > > are so much longer than build cycles.
> > 
> > I think the makefile ordering is already good - it is just that genattrtab
> > takes so much time that compiling insn-attrtab starts very late.  Michael
> > addresses this by speeding up genattrtab.
> 
> Pardon the dumb question here, but IIUC, we build insn-attrtab at each 
> stage.  Why not just build it once for all stages?

Because that's the definition of bootstrapping.  Either we bootstrap, or 
we don't.  It doesn't make sense to bootstrap only half the sources.


Ciao,
Michael.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: Speed up genattrtab
  2010-06-16 14:33           ` Michael Matz
@ 2010-06-16 14:43             ` Nathan Froyd
  2010-06-16 15:33               ` Mike Stump
  2010-06-16 15:48               ` Michael Matz
  0 siblings, 2 replies; 40+ messages in thread
From: Nathan Froyd @ 2010-06-16 14:43 UTC (permalink / raw)
  To: Michael Matz; +Cc: Richard Guenther, Mark Mitchell, Jan Hubicka, gcc-patches

On Wed, Jun 16, 2010 at 03:23:49PM +0200, Michael Matz wrote:
> On Wed, 16 Jun 2010, Nathan Froyd wrote:
> > On Wed, Jun 16, 2010 at 09:15:39AM +0200, Richard Guenther wrote:
> > > I think the makefile ordering is already good - it is just that genattrtab
> > > takes so much time that compiling insn-attrtab starts very late.  Michael
> > > addresses this by speeding up genattrtab.
> > 
> > Pardon the dumb question here, but IIUC, we build insn-attrtab at each 
> > stage.  Why not just build it once for all stages?
> 
> Because that's the definition of bootstrapping.  Either we bootstrap, or 
> we don't.  It doesn't make sense to bootstrap only half the sources.

I'm not asking why we compile insn-attrtab at every stage, but why we
run genattrtab at every stage.  I understand that it's nice to compile
genattrtab at each stage for a little sanity checking, but the compiler
used to compile genattrtab should have no effect on the output of
genattrtab.

Or were you already responding to "why generate insn-attrtab at every
stage"?

-Nathan

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: Speed up genattrtab
  2010-06-16 14:43             ` Nathan Froyd
@ 2010-06-16 15:33               ` Mike Stump
  2010-06-16 15:48               ` Michael Matz
  1 sibling, 0 replies; 40+ messages in thread
From: Mike Stump @ 2010-06-16 15:33 UTC (permalink / raw)
  To: Nathan Froyd
  Cc: Michael Matz, Richard Guenther, Mark Mitchell, Jan Hubicka, gcc-patches

On Jun 16, 2010, at 6:37 AM, Nathan Froyd <froydnj@codesourcery.com> wrote:
> On Wed, Jun 16, 2010 at 03:23:49PM +0200, Michael Matz wrote:
>> On Wed, 16 Jun 2010, Nathan Froyd wrote:
>>> On Wed, Jun 16, 2010 at 09:15:39AM +0200, Richard Guenther wrote:
>>>> I think the makefile ordering is already good - it is just that genattrtab
>>>> takes so much time that compiling insn-attrtab starts very late.  Michael
>>>> addresses this by speeding up genattrtab.
>>> 
>>> Pardon the dumb question here, but IIUC, we build insn-attrtab at each 
>>> stage.  Why not just build it once for all stages?
>> 
>> Because that's the definition of bootstrapping.  Either we bootstrap, or 
>> we don't.  It doesn't make sense to bootstrap only half the sources.
> 
> I'm not asking why we compile insn-attrtab at every stage, but why we
> run genattrtab at every stage.  I understand that it's nice to compile
> genattrtab at each stage for a little sanity checking, but the compiler
> used to compile genattrtab should have no effect on the output of
> genattrtab.

That is true, unless there are bugs.  We compile with gcc , and then test to ensure there are no bugs.  In this fashion, 100% of the source that is compiled, is compiled with gcc, the very definition of bootstrap.  People that want more speed at the expense of testing, don't bootstrap. 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: Speed up genattrtab
  2010-06-15 19:22       ` Jakub Jelinek
@ 2010-06-16 15:42         ` Jakub Jelinek
  2010-06-16 15:59           ` Jan Hubicka
  2010-06-16 20:27           ` Jakub Jelinek
  0 siblings, 2 replies; 40+ messages in thread
From: Jakub Jelinek @ 2010-06-16 15:42 UTC (permalink / raw)
  To: gcc-patches; +Cc: Mark Mitchell, Jan Hubicka, Michael Matz

On Tue, Jun 15, 2010 at 09:16:33PM +0200, Jakub Jelinek wrote:
> On Tue, Jun 15, 2010 at 11:39:47AM -0700, Mark Mitchell wrote:
> I believe on x86_64/i686 the most time is spent in compiling
> internal_dfa_insn_code, primarily because there are so many different
> schedulings.
> The insn is a big switch on recog_memoized, where most of the cases first
> compare ix86_schedule var to some enum.  I guess it would be certainly
> faster to compile to instead split the big function into separate function
> for each schedule and make internal_dfa_insn_code a function pointer, would
> need to be benchmarked how it would actually perform at runtime.

Here is a WIP untested patch.

A little bit hackish (I hardcode "cpu" attribute name - the final variant
could perhaps look at all EQ_ATTRs in first decl->condexp and see whether
any of the attributes is used in EQ_ATTR in decl->condexp of all
reservations) so far and unfinished (it generates
internal_dfa_insn_code_generic64
internal_dfa_insn_code_amdfam10
internal_dfa_insn_code_core2
...
functions (and similarly for insn_default_latency_), but so far
doesn't emit the glue (in particular some function that would be called
to initialize the fn pointers based on ix86_schedule in x86_64/i686 case
(not sure if we would need to call it once post options, or e.g. during
expand, or before each scheduling), emit the actual fn pointer
and how to ensure the callers know it is a fn pointer instead of a function
(or whether we need to emit insn_dfa_insn_code as a wrapper that calls
a fn pointer).

In any case, the patch (assuming there are no bugs, would in the end
need to test whether these subfunctions give the same return values
as original function) speeded up genattrtab (-g -O0 build, i.e.
non-optimized) from 1m9.429s down to 0m3.621s,  insn-attrtab.c shrunk
from 5139992 bytes to 3822187, compilation time using optimized release
checking cc1 went down from 40.949s to 19.403s, insn-attrtab.o .text
section went down from 1003956 bytes to 635384 bytes.

--- gcc/genattrtab.c.jj	2010-06-11 09:38:08.000000000 +0200
+++ gcc/genattrtab.c	2010-06-16 16:00:00.000000000 +0200
@@ -4379,28 +4379,115 @@ make_automaton_attrs (void)
   int i;
   struct insn_reserv *decl;
   rtx code_exp, lats_exp, byps_exp;
+  const char *cpu_name;
+  struct attr_desc *cpu_attr;
 
   if (n_insn_reservs == 0)
     return;
 
-  code_exp = rtx_alloc (COND);
-  lats_exp = rtx_alloc (COND);
-
-  XVEC (code_exp, 0) = rtvec_alloc (n_insn_reservs * 2);
-  XVEC (lats_exp, 0) = rtvec_alloc (n_insn_reservs * 2);
+  cpu_name = "cpu";
+  cpu_attr = find_attr (&cpu_name, 0);
+  if (cpu_attr != NULL && 0)
+    {
+      rtx *condexps = XNEWVEC (rtx, n_insn_reservs * 3);
+      struct attr_value *val;
 
-  XEXP (code_exp, 1) = make_numeric_value (n_insn_reservs + 1);
-  XEXP (lats_exp, 1) = make_numeric_value (0);
+      gcc_assert (cpu_attr->is_const
+		  && !cpu_attr->is_special
+		  && !cpu_attr->is_numeric);
+      for (val = cpu_attr->first_value; val; val = val->next)
+	{
+	  int j;
+	  char *name;
+	  rtx test = attr_rtx (EQ_ATTR, cpu_name, XSTR (val->value, 0));
+
+	  if (val == cpu_attr->default_val)
+	    continue;
+	  gcc_assert (GET_CODE (val->value) == CONST_STRING);
+	  for (decl = all_insn_reservs, i = 0;
+	       decl;
+	       decl = decl->next)
+	    {
+	      rtx ctest = test;
+	      rtx condexp
+		= simplify_and_tree (decl->condexp, &ctest, -2, 0);
+	      if (condexp == false_rtx)
+		continue;
+	      if (condexp == true_rtx)
+		break;
+	      condexps[i] = condexp;
+	      condexps[i + 1] = make_numeric_value (decl->insn_num);
+	      condexps[i + 2] = make_numeric_value (decl->default_latency);
+	      i += 3;
+	    }
+
+	  code_exp = rtx_alloc (COND);
+	  lats_exp = rtx_alloc (COND);
+
+	  j = i / 3 * 2;
+	  XVEC (code_exp, 0) = rtvec_alloc (j);
+	  XVEC (lats_exp, 0) = rtvec_alloc (j);
+
+	  if (decl)
+	    {
+	      XEXP (code_exp, 1) = make_numeric_value (decl->insn_num);
+	      XEXP (lats_exp, 1) = make_numeric_value (decl->default_latency);
+	    }
+	  else
+	    {
+	      XEXP (code_exp, 1) = make_numeric_value (n_insn_reservs + 1);
+	      XEXP (lats_exp, 1) = make_numeric_value (0);
+	    }
+
+	  while (i > 0)
+	    {
+	      i -= 3;
+	      j -= 2;
+	      XVECEXP (code_exp, 0, j) = condexps[i];
+	      XVECEXP (lats_exp, 0, j) = condexps[i];
+
+	      XVECEXP (code_exp, 0, j + 1) = condexps[i + 1];
+	      XVECEXP (lats_exp, 0, j + 1) = condexps[i + 2];
+	    }
+
+	  name = XNEWVEC (char,
+			  sizeof ("*internal_dfa_insn_code_")
+			  + strlen (XSTR (val->value, 0)));
+	  strcpy (name, "*internal_dfa_insn_code_");
+	  strcat (name, XSTR (val->value, 0));
+	  make_internal_attr (name, code_exp, ATTR_NONE);
+	  strcpy (name, "*insn_default_latency_");
+	  strcat (name, XSTR (val->value, 0));
+	  make_internal_attr (name, lats_exp, ATTR_NONE);
+	  XDELETEVEC (name);
+	}
 
-  for (decl = all_insn_reservs, i = 0;
-       decl;
-       decl = decl->next, i += 2)
+      XDELETEVEC (condexps);
+    }
+  else
     {
-      XVECEXP (code_exp, 0, i)   = decl->condexp;
-      XVECEXP (lats_exp, 0, i)   = decl->condexp;
+      code_exp = rtx_alloc (COND);
+      lats_exp = rtx_alloc (COND);
+
+      XVEC (code_exp, 0) = rtvec_alloc (n_insn_reservs * 2);
+      XVEC (lats_exp, 0) = rtvec_alloc (n_insn_reservs * 2);
 
-      XVECEXP (code_exp, 0, i+1) = make_numeric_value (decl->insn_num);
-      XVECEXP (lats_exp, 0, i+1) = make_numeric_value (decl->default_latency);
+      XEXP (code_exp, 1) = make_numeric_value (n_insn_reservs + 1);
+      XEXP (lats_exp, 1) = make_numeric_value (0);
+
+      for (decl = all_insn_reservs, i = 0;
+	   decl;
+	   decl = decl->next, i += 2)
+	{
+	  XVECEXP (code_exp, 0, i)   = decl->condexp;
+	  XVECEXP (lats_exp, 0, i)   = decl->condexp;
+
+	  XVECEXP (code_exp, 0, i+1) = make_numeric_value (decl->insn_num);
+	  XVECEXP (lats_exp, 0, i+1)
+	    = make_numeric_value (decl->default_latency);
+	}
+      make_internal_attr ("*internal_dfa_insn_code", code_exp, ATTR_NONE);
+      make_internal_attr ("*insn_default_latency",   lats_exp, ATTR_NONE);
     }
 
   if (n_bypasses == 0)
@@ -4423,8 +4510,6 @@ make_automaton_attrs (void)
 	  }
     }
 
-  make_internal_attr ("*internal_dfa_insn_code", code_exp, ATTR_NONE);
-  make_internal_attr ("*insn_default_latency",   lats_exp, ATTR_NONE);
   make_internal_attr ("*bypass_p",               byps_exp, ATTR_NONE);
 }
 


	Jakub

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: Speed up genattrtab
  2010-06-16 14:43             ` Nathan Froyd
  2010-06-16 15:33               ` Mike Stump
@ 2010-06-16 15:48               ` Michael Matz
  2010-06-16 15:57                 ` Jan Hubicka
  1 sibling, 1 reply; 40+ messages in thread
From: Michael Matz @ 2010-06-16 15:48 UTC (permalink / raw)
  To: Nathan Froyd; +Cc: Richard Guenther, Mark Mitchell, Jan Hubicka, gcc-patches

Hi,

On Wed, 16 Jun 2010, Nathan Froyd wrote:

> > > Pardon the dumb question here, but IIUC, we build insn-attrtab at 
> > > each stage.  Why not just build it once for all stages?
> > 
> > Because that's the definition of bootstrapping.  Either we bootstrap, 
> > or we don't.  It doesn't make sense to bootstrap only half the 
> > sources.
> 
> I'm not asking why we compile insn-attrtab at every stage, but why we 
> run genattrtab at every stage.  I understand that it's nice to compile 
> genattrtab at each stage for a little sanity checking, but the compiler 
> used to compile genattrtab should have no effect on the output of 
> genattrtab.

If we were going that route, then we wouldn't need bootstrapping at all, 
because the compiler used to compile $random.c should have no effect on 
the output of the final cc1.

> Or were you already responding to "why generate insn-attrtab at every
> stage"?

Implicitely, yes.


Ciao,
Michael.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: Speed up genattrtab
  2010-06-16 15:48               ` Michael Matz
@ 2010-06-16 15:57                 ` Jan Hubicka
  2010-06-16 17:21                   ` Paolo Bonzini
  0 siblings, 1 reply; 40+ messages in thread
From: Jan Hubicka @ 2010-06-16 15:57 UTC (permalink / raw)
  To: Michael Matz
  Cc: Nathan Froyd, Richard Guenther, Mark Mitchell, Jan Hubicka, gcc-patches

> Hi,
> 
> On Wed, 16 Jun 2010, Nathan Froyd wrote:
> 
> > > > Pardon the dumb question here, but IIUC, we build insn-attrtab at 
> > > > each stage.  Why not just build it once for all stages?
> > > 
> > > Because that's the definition of bootstrapping.  Either we bootstrap, 
> > > or we don't.  It doesn't make sense to bootstrap only half the 
> > > sources.
> > 
> > I'm not asking why we compile insn-attrtab at every stage, but why we 
> > run genattrtab at every stage.  I understand that it's nice to compile 
> > genattrtab at each stage for a little sanity checking, but the compiler 
> > used to compile genattrtab should have no effect on the output of 
> > genattrtab.
> 
> If we were going that route, then we wouldn't need bootstrapping at all, 
> because the compiler used to compile $random.c should have no effect on 
> the output of the final cc1.
> 
> > Or were you already responding to "why generate insn-attrtab at every
> > stage"?
> 
> Implicitely, yes.

With a lot of cores, we might actually execute getnattrtab and compilation
of stage1 insn-attrtab.c at same time and then just sanity check that genattrtab
did not change results :)

Honza
> 
> 
> Ciao,
> Michael.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: Speed up genattrtab
  2010-06-16 15:42         ` Jakub Jelinek
@ 2010-06-16 15:59           ` Jan Hubicka
  2010-06-16 20:27           ` Jakub Jelinek
  1 sibling, 0 replies; 40+ messages in thread
From: Jan Hubicka @ 2010-06-16 15:59 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: gcc-patches, Mark Mitchell, Jan Hubicka, Michael Matz

> On Tue, Jun 15, 2010 at 09:16:33PM +0200, Jakub Jelinek wrote:
> > On Tue, Jun 15, 2010 at 11:39:47AM -0700, Mark Mitchell wrote:
> > I believe on x86_64/i686 the most time is spent in compiling
> > internal_dfa_insn_code, primarily because there are so many different
> > schedulings.
> > The insn is a big switch on recog_memoized, where most of the cases first
> > compare ix86_schedule var to some enum.  I guess it would be certainly
> > faster to compile to instead split the big function into separate function
> > for each schedule and make internal_dfa_insn_code a function pointer, would
> > need to be benchmarked how it would actually perform at runtime.
> 
> Here is a WIP untested patch.
> 
> A little bit hackish (I hardcode "cpu" attribute name - the final variant
> could perhaps look at all EQ_ATTRs in first decl->condexp and see whether
> any of the attributes is used in EQ_ATTR in decl->condexp of all
> reservations) so far and unfinished (it generates
> internal_dfa_insn_code_generic64
> internal_dfa_insn_code_amdfam10
> internal_dfa_insn_code_core2
> ...

Another breakup of the large functions I was thinking if is actually based
on INSN code.  At the moment most of get_attr_* functions are organized as
calls to extract_insn that is followed by swithc on insn code that is
calling other get_attr_* functions doing the same.

Breaking up the individual cases into separate functions _and_ making those
functions not call get_attr_* but the subfunction for specific insn code
should IMO let GCC IPA to do the right thing and improve both compile time
and runtime.

It is very dificult to make IPA to do the right thing without this breakup
since extract_insn is in the way.

Honza

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: Speed up genattrtab
  2010-06-16 15:57                 ` Jan Hubicka
@ 2010-06-16 17:21                   ` Paolo Bonzini
  2010-06-17 11:56                     ` Michael Matz
  0 siblings, 1 reply; 40+ messages in thread
From: Paolo Bonzini @ 2010-06-16 17:21 UTC (permalink / raw)
  To: Jan Hubicka
  Cc: Michael Matz, Nathan Froyd, Richard Guenther, Mark Mitchell, gcc-patches

On 06/16/2010 05:51 PM, Jan Hubicka wrote:
> With a lot of cores, we might actually execute getnattrtab and compilation
> of stage1 insn-attrtab.c at same time and then just sanity check that genattrtab
> did not change results :)

That's a different story.  It doesn't break the idea of bootstrapping, 
and can be applied to all insn-* files.

Paolo

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: Speed up genattrtab
  2010-06-16 15:42         ` Jakub Jelinek
  2010-06-16 15:59           ` Jan Hubicka
@ 2010-06-16 20:27           ` Jakub Jelinek
  2010-06-16 21:50             ` Mark Mitchell
  2010-06-17 12:27             ` Jakub Jelinek
  1 sibling, 2 replies; 40+ messages in thread
From: Jakub Jelinek @ 2010-06-16 20:27 UTC (permalink / raw)
  To: gcc-patches; +Cc: Mark Mitchell, Jan Hubicka, Michael Matz

On Wed, Jun 16, 2010 at 04:33:34PM +0200, Jakub Jelinek wrote:
> On Tue, Jun 15, 2010 at 09:16:33PM +0200, Jakub Jelinek wrote:
> > On Tue, Jun 15, 2010 at 11:39:47AM -0700, Mark Mitchell wrote:
> > I believe on x86_64/i686 the most time is spent in compiling
> > internal_dfa_insn_code, primarily because there are so many different
> > schedulings.
> > The insn is a big switch on recog_memoized, where most of the cases first
> > compare ix86_schedule var to some enum.  I guess it would be certainly
> > faster to compile to instead split the big function into separate function
> > for each schedule and make internal_dfa_insn_code a function pointer, would
> > need to be benchmarked how it would actually perform at runtime.
> 
> Here is a WIP untested patch.

And here is a patch I've actually bootstrapped/regtested on x86_64-linux and
i686-linux.  It was an --enable-checking=release build (simultaneously both
arches), so timing wasn't exact, but config.status timestamp to compare
timestamp difference was
1112s -> 843s x86_64
736s -> 630s i686.
Times from config.status to end of make were
2453s -> 2111s x86_64
1190s -> 1100s i686
and times from config.status to and of make check were
4033s -> 3805s x86_64
3331s -> 3303s i686

Except for a few expected differences (insn-attrtab.o, files including
insn-attr.h and *checksum* gcc/*.o had no differences on stripped objects).

2010-06-16  Jakub Jelinek  <jakub@redhat.com>

	* Makefile.in (cfgexpand.o): Depend on $(INSN_ATTR_H).
	* genattrtab.c (check_tune_attr, find_tune_attr): New functions.
	(make_automaton_attrs): If find_tune_attr returns non-NULL,
	write separate internal_dfa_insn_code_* and insn_default_latency_*
	functions for each attribute's value and emit init_sched_attrs
	function and function pointers.
	* genattr.c (const_attrs, reservations): New variables.
	(gen_attr): Add const attributes to const_attrs vector.
	(check_tune_attr, find_tune_attr): New functions.
	(main): Add reservations to reservations vector.  If find_tune_attr
	returns true, add prototype for init_sched_attrs and make
	internal_dfa_insn_code and insn_default_latency function pointers,
	otherwise define init_sched_attrs as dummy macro.
	* cfgexpand.c: Include insn-attr.h.
	(gimple_expand_cfg): Call init_sched_attrs.

--- gcc/Makefile.in.jj	2010-06-15 10:37:06.000000000 +0200
+++ gcc/Makefile.in	2010-06-16 18:41:41.000000000 +0200
@@ -3192,7 +3192,7 @@ cfgexpand.o : cfgexpand.c $(TREE_FLOW_H)
    coretypes.h $(TREE_DUMP_H) $(EXCEPT_H) langhooks.h $(TREE_PASS_H) $(RTL_H) \
    $(DIAGNOSTIC_H) $(TOPLEV_H) $(BASIC_BLOCK_H) $(FLAGS_H) debug.h $(PARAMS_H) \
    value-prof.h $(TREE_INLINE_H) $(TARGET_H) $(SSAEXPAND_H) \
-   tree-pretty-print.h gimple-pretty-print.h $(BITMAP_H) sbitmap.h
+   tree-pretty-print.h gimple-pretty-print.h $(BITMAP_H) sbitmap.h $(INSN_ATTR_H)
 cfgrtl.o : cfgrtl.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
    $(FLAGS_H) insn-config.h $(BASIC_BLOCK_H) $(REGS_H) hard-reg-set.h \
    output.h $(TOPLEV_H) $(FUNCTION_H) $(EXCEPT_H) $(TM_P_H) $(INSN_ATTR_H) \
--- gcc/genattrtab.c.jj	2010-06-11 09:38:08.000000000 +0200
+++ gcc/genattrtab.c	2010-06-16 19:19:57.000000000 +0200
@@ -1,6 +1,6 @@
 /* Generate code from machine description to compute values of attributes.
    Copyright (C) 1991, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000,
-   2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009
+   2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010
    Free Software Foundation, Inc.
    Contributed by Richard Kenner (kenner@vlsi1.ultra.nyu.edu)
 
@@ -4372,6 +4372,69 @@ process_bypasses (void)
 	r->bypassed = true;
 }
 
+/* Check that attribute NAME is used in define_insn_reservation condition
+   EXP.  Return true if it is.  */
+static bool
+check_tune_attr (const char *name, rtx exp)
+{
+  switch (GET_CODE (exp))
+    {
+    case AND:
+      if (check_tune_attr (name, XEXP (exp, 0)))
+	return true;
+      return check_tune_attr (name, XEXP (exp, 1));
+
+    case IOR:
+      return (check_tune_attr (name, XEXP (exp, 0))
+	      && check_tune_attr (name, XEXP (exp, 1)));
+
+    case EQ_ATTR:
+      return XSTR (exp, 0) == name;
+
+    default:
+      return false;
+    }
+}
+
+/* Try to find a const attribute (usually cpu or tune) that is used
+   in all define_insn_reservation conditions.  */
+static struct attr_desc *
+find_tune_attr (rtx exp)
+{
+  struct attr_desc *attr;
+
+  switch (GET_CODE (exp))
+    {
+    case AND:
+    case IOR:
+      attr = find_tune_attr (XEXP (exp, 0));
+      if (attr)
+	return attr;
+      return find_tune_attr (XEXP (exp, 1));
+
+    case EQ_ATTR:
+      if (XSTR (exp, 0) == alternative_name)
+	return NULL;
+
+      attr = find_attr (&XSTR (exp, 0), 0);
+      gcc_assert (attr);
+
+      if (attr->is_const && !attr->is_special)
+	{
+	  struct insn_reserv *decl;
+
+	  for (decl = all_insn_reservs; decl; decl = decl->next)
+	    if (! check_tune_attr (attr->name, decl->condexp))
+	      return NULL;
+	  return attr;
+	}
+      return NULL;
+
+    default:
+      return NULL;
+    }
+}
+
 /* Create all of the attributes that describe automaton properties.  */
 static void
 make_automaton_attrs (void)
@@ -4379,28 +4442,154 @@ make_automaton_attrs (void)
   int i;
   struct insn_reserv *decl;
   rtx code_exp, lats_exp, byps_exp;
+  struct attr_desc *tune_attr;
 
   if (n_insn_reservs == 0)
     return;
 
-  code_exp = rtx_alloc (COND);
-  lats_exp = rtx_alloc (COND);
+  tune_attr = find_tune_attr (all_insn_reservs->condexp);
+  if (tune_attr != NULL)
+    {
+      rtx *condexps = XNEWVEC (rtx, n_insn_reservs * 3);
+      struct attr_value *val;
+      bool first = true;
+
+      gcc_assert (tune_attr->is_const
+		  && !tune_attr->is_special
+		  && !tune_attr->is_numeric);
+      for (val = tune_attr->first_value; val; val = val->next)
+	{
+	  if (val == tune_attr->default_val)
+	    continue;
+	  gcc_assert (GET_CODE (val->value) == CONST_STRING);
+	  printf ("static int internal_dfa_insn_code_%s (rtx);\n"
+		  "static int insn_default_latency_%s (rtx);\n",
+		  XSTR (val->value, 0), XSTR (val->value, 0));
+	}
+
+      printf ("\n");
+      printf ("int (*internal_dfa_insn_code) (rtx);\n");
+      printf ("int (*insn_default_latency) (rtx);\n");
+      printf ("\n");
+      printf ("void\n");
+      printf ("init_sched_attrs (void)\n");
+      printf ("{\n");
+
+      for (val = tune_attr->first_value; val; val = val->next)
+	{
+	  int j;
+	  char *name;
+	  rtx test = attr_rtx (EQ_ATTR, tune_attr->name, XSTR (val->value, 0));
+
+	  if (val == tune_attr->default_val)
+	    continue;
+	  for (decl = all_insn_reservs, i = 0;
+	       decl;
+	       decl = decl->next)
+	    {
+	      rtx ctest = test;
+	      rtx condexp
+		= simplify_and_tree (decl->condexp, &ctest, -2, 0);
+	      if (condexp == false_rtx)
+		continue;
+	      if (condexp == true_rtx)
+		break;
+	      condexps[i] = condexp;
+	      condexps[i + 1] = make_numeric_value (decl->insn_num);
+	      condexps[i + 2] = make_numeric_value (decl->default_latency);
+	      i += 3;
+	    }
+
+	  code_exp = rtx_alloc (COND);
+	  lats_exp = rtx_alloc (COND);
+
+	  j = i / 3 * 2;
+	  XVEC (code_exp, 0) = rtvec_alloc (j);
+	  XVEC (lats_exp, 0) = rtvec_alloc (j);
+
+	  if (decl)
+	    {
+	      XEXP (code_exp, 1) = make_numeric_value (decl->insn_num);
+	      XEXP (lats_exp, 1) = make_numeric_value (decl->default_latency);
+	    }
+	  else
+	    {
+	      XEXP (code_exp, 1) = make_numeric_value (n_insn_reservs + 1);
+	      XEXP (lats_exp, 1) = make_numeric_value (0);
+	    }
+
+	  while (i > 0)
+	    {
+	      i -= 3;
+	      j -= 2;
+	      XVECEXP (code_exp, 0, j) = condexps[i];
+	      XVECEXP (lats_exp, 0, j) = condexps[i];
+
+	      XVECEXP (code_exp, 0, j + 1) = condexps[i + 1];
+	      XVECEXP (lats_exp, 0, j + 1) = condexps[i + 2];
+	    }
 
-  XVEC (code_exp, 0) = rtvec_alloc (n_insn_reservs * 2);
-  XVEC (lats_exp, 0) = rtvec_alloc (n_insn_reservs * 2);
+	  name = XNEWVEC (char,
+			  sizeof ("*internal_dfa_insn_code_")
+			  + strlen (XSTR (val->value, 0)));
+	  strcpy (name, "*internal_dfa_insn_code_");
+	  strcat (name, XSTR (val->value, 0));
+	  make_internal_attr (name, code_exp, ATTR_NONE);
+	  strcpy (name, "*insn_default_latency_");
+	  strcat (name, XSTR (val->value, 0));
+	  make_internal_attr (name, lats_exp, ATTR_NONE);
+	  XDELETEVEC (name);
 
-  XEXP (code_exp, 1) = make_numeric_value (n_insn_reservs + 1);
-  XEXP (lats_exp, 1) = make_numeric_value (0);
+	  if (first)
+	    {
+	      printf ("  if (");
+	      first = false;
+	    }
+	  else
+	    printf ("  else if (");
+	  write_test_expr (test, 0);
+	  printf (")\n");
+	  printf ("    {\n");
+	  printf ("      internal_dfa_insn_code\n");
+	  printf ("        = internal_dfa_insn_code_%s;\n",
+		  XSTR (val->value, 0));
+	  printf ("      insn_default_latency\n");
+	  printf ("        = insn_default_latency_%s;\n",
+		  XSTR (val->value, 0));
+	  printf ("    }\n");
+	}
+
+      printf ("  else\n");
+      printf ("    gcc_unreachable ();\n");
+      printf ("}\n");
+      printf ("\n");
 
-  for (decl = all_insn_reservs, i = 0;
-       decl;
-       decl = decl->next, i += 2)
+      XDELETEVEC (condexps);
+    }
+  else
     {
-      XVECEXP (code_exp, 0, i)   = decl->condexp;
-      XVECEXP (lats_exp, 0, i)   = decl->condexp;
+      code_exp = rtx_alloc (COND);
+      lats_exp = rtx_alloc (COND);
+
+      XVEC (code_exp, 0) = rtvec_alloc (n_insn_reservs * 2);
+      XVEC (lats_exp, 0) = rtvec_alloc (n_insn_reservs * 2);
 
-      XVECEXP (code_exp, 0, i+1) = make_numeric_value (decl->insn_num);
-      XVECEXP (lats_exp, 0, i+1) = make_numeric_value (decl->default_latency);
+      XEXP (code_exp, 1) = make_numeric_value (n_insn_reservs + 1);
+      XEXP (lats_exp, 1) = make_numeric_value (0);
+
+      for (decl = all_insn_reservs, i = 0;
+	   decl;
+	   decl = decl->next, i += 2)
+	{
+	  XVECEXP (code_exp, 0, i)   = decl->condexp;
+	  XVECEXP (lats_exp, 0, i)   = decl->condexp;
+
+	  XVECEXP (code_exp, 0, i+1) = make_numeric_value (decl->insn_num);
+	  XVECEXP (lats_exp, 0, i+1)
+	    = make_numeric_value (decl->default_latency);
+	}
+      make_internal_attr ("*internal_dfa_insn_code", code_exp, ATTR_NONE);
+      make_internal_attr ("*insn_default_latency",   lats_exp, ATTR_NONE);
     }
 
   if (n_bypasses == 0)
@@ -4423,8 +4612,6 @@ make_automaton_attrs (void)
 	  }
     }
 
-  make_internal_attr ("*internal_dfa_insn_code", code_exp, ATTR_NONE);
-  make_internal_attr ("*insn_default_latency",   lats_exp, ATTR_NONE);
   make_internal_attr ("*bypass_p",               byps_exp, ATTR_NONE);
 }
 
--- gcc/genattr.c.jj	2010-06-11 09:38:08.000000000 +0200
+++ gcc/genattr.c	2010-06-16 19:18:10.000000000 +0200
@@ -1,6 +1,6 @@
 /* Generate attribute information (insn-attr.h) from machine description.
-   Copyright (C) 1991, 1994, 1996, 1998, 1999, 2000, 2003, 2004, 2007, 2008
-   Free Software Foundation, Inc.
+   Copyright (C) 1991, 1994, 1996, 1998, 1999, 2000, 2003, 2004, 2007, 2008,
+   2010  Free Software Foundation, Inc.
    Contributed by Richard Kenner (kenner@vlsi1.ultra.nyu.edu)
 
 This file is part of GCC.
@@ -40,12 +40,18 @@ write_upcase (const char *str)
     putchar (TOUPPER(*str));
 }
 
+static VEC (rtx, heap) *const_attrs, *reservations;
+
+
 static void
 gen_attr (rtx attr)
 {
   const char *p, *tag;
   int is_const = GET_CODE (XEXP (attr, 2)) == CONST;
 
+  if (is_const)
+    VEC_safe_push (rtx, heap, const_attrs, attr);
+
   printf ("#define HAVE_ATTR_%s\n", XSTR (attr, 0));
 
   /* If numeric attribute, don't need to write an enum.  */
@@ -92,6 +98,68 @@ extern int insn_current_length (rtx);\n\
     }
 }
 
+/* Check that attribute NAME is used in define_insn_reservation condition
+   EXP.  Return true if it is.  */
+static bool
+check_tune_attr (const char *name, rtx exp)
+{
+  switch (GET_CODE (exp))
+    {
+    case AND:
+      if (check_tune_attr (name, XEXP (exp, 0)))
+	return true;
+      return check_tune_attr (name, XEXP (exp, 1));
+
+    case IOR:
+      return (check_tune_attr (name, XEXP (exp, 0))
+	      && check_tune_attr (name, XEXP (exp, 1)));
+
+    case EQ_ATTR:
+      return strcmp (XSTR (exp, 0), name) == 0;
+
+    default:
+      return false;
+    }
+}
+
+/* Try to find a const attribute (usually cpu or tune) that is used
+   in all define_insn_reservation conditions.  */
+static bool
+find_tune_attr (rtx exp)
+{
+  unsigned int i;
+  rtx attr;
+
+  switch (GET_CODE (exp))
+    {
+    case AND:
+    case IOR:
+      if (find_tune_attr (XEXP (exp, 0)))
+	return true;
+      return find_tune_attr (XEXP (exp, 1));
+
+    case EQ_ATTR:
+      if (strcmp (XSTR (exp, 0), "alternative") == 0)
+	return false;
+
+      for (i = 0; VEC_iterate (rtx, const_attrs, i, attr); i++)
+	if (strcmp (XSTR (attr, 0), XSTR (exp, 0)) == 0)
+	  {
+	    unsigned int j;
+	    rtx resv;
+
+	    for (j = 0; VEC_iterate (rtx, reservations, j, resv); j++)
+	      if (! check_tune_attr (XSTR (attr, 0), XEXP (resv, 2)))
+		return false;
+	    return true;
+	  }
+      return false;
+
+    default:
+      return false;
+    }
+}
+
 int
 main (int argc, char **argv)
 {
@@ -162,11 +230,16 @@ main (int argc, char **argv)
         }
 
       else if (GET_CODE (desc) == DEFINE_INSN_RESERVATION)
-	num_insn_reservations++;
+	{
+	  num_insn_reservations++;
+	  VEC_safe_push (rtx, heap, reservations, desc);
+	}
     }
 
   if (num_insn_reservations > 0)
     {
+      bool has_tune_attr
+	= find_tune_attr (XEXP (VEC_index (rtx, reservations, 0), 2));
       /* Output interface for pipeline hazards recognition based on
 	 DFA (deterministic finite state automata.  */
       printf ("\n#define INSN_SCHEDULING\n");
@@ -181,10 +254,24 @@ main (int argc, char **argv)
       printf ("#define CPU_UNITS_QUERY 0\n");
       printf ("#endif\n\n");
       /* Interface itself: */
-      printf ("/* Internal insn code number used by automata.  */\n");
-      printf ("extern int internal_dfa_insn_code (rtx);\n\n");
-      printf ("/* Insn latency time defined in define_insn_reservation. */\n");
-      printf ("extern int insn_default_latency (rtx);\n\n");
+      if (has_tune_attr)
+	{
+	  printf ("/* Initialize fn pointers for internal_dfa_insn_code\n");
+	  printf ("   and insn_default_latency.  */\n");
+	  printf ("extern void init_sched_attrs (void);\n\n");
+	  printf ("/* Internal insn code number used by automata.  */\n");
+	  printf ("extern int (*internal_dfa_insn_code) (rtx);\n\n");
+	  printf ("/* Insn latency time defined in define_insn_reservation. */\n");
+	  printf ("extern int (*insn_default_latency) (rtx);\n\n");
+	}
+      else
+	{
+	  printf ("#define init_sched_attrs() do { } while (0)\n\n");
+	  printf ("/* Internal insn code number used by automata.  */\n");
+	  printf ("extern int internal_dfa_insn_code (rtx);\n\n");
+	  printf ("/* Insn latency time defined in define_insn_reservation. */\n");
+	  printf ("extern int insn_default_latency (rtx);\n\n");
+	}
       printf ("/* Return nonzero if there is a bypass for given insn\n");
       printf ("   which is a data producer.  */\n");
       printf ("extern int bypass_p (rtx);\n\n");
--- gcc/cfgexpand.c.jj	2010-06-07 11:24:33.000000000 +0200
+++ gcc/cfgexpand.c	2010-06-16 18:41:04.000000000 +0200
@@ -47,6 +47,7 @@ along with GCC; see the file COPYING3.  
 #include "ssaexpand.h"
 #include "bitmap.h"
 #include "sbitmap.h"
+#include "insn-attr.h" /* For INSN_SCHEDULING.  */
 
 /* This variable holds information helping the rewriting of SSA trees
    into RTL.  */
@@ -3761,6 +3762,10 @@ gimple_expand_cfg (void)
   set_curr_insn_block (DECL_INITIAL (current_function_decl));
   prologue_locator = curr_insn_locator ();
 
+#ifdef INSN_SCHEDULING
+  init_sched_attrs ();
+#endif
+
   /* Make sure first insn is a note even if we don't want linenums.
      This makes sure the first insn will never be deleted.
      Also, final expects a note to appear there.  */


	Jakub

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: Speed up genattrtab
  2010-06-16 20:27           ` Jakub Jelinek
@ 2010-06-16 21:50             ` Mark Mitchell
  2010-06-17 10:21               ` Jakub Jelinek
  2010-06-17 12:27             ` Jakub Jelinek
  1 sibling, 1 reply; 40+ messages in thread
From: Mark Mitchell @ 2010-06-16 21:50 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: gcc-patches, Jan Hubicka, Michael Matz

Jakub Jelinek wrote:

> 2010-06-16  Jakub Jelinek  <jakub@redhat.com>
> 
> 	* Makefile.in (cfgexpand.o): Depend on $(INSN_ATTR_H).
> 	* genattrtab.c (check_tune_attr, find_tune_attr): New functions.

This should have no impact on compile-time for things compiled with GCC,
correct?  If so, for avoidance of doubt, while I haven't reviewed the
patch in detail, I certainly have no objections to it.  Let me know if
you need help getting it reviewed.

Thanks,

-- 
Mark Mitchell
CodeSourcery
mark@codesourcery.com
(650) 331-3385 x713

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: Speed up genattrtab
  2010-06-16 21:50             ` Mark Mitchell
@ 2010-06-17 10:21               ` Jakub Jelinek
  2010-06-17 11:50                 ` Jan Hubicka
  2010-06-17 13:39                 ` Michael Matz
  0 siblings, 2 replies; 40+ messages in thread
From: Jakub Jelinek @ 2010-06-17 10:21 UTC (permalink / raw)
  To: Mark Mitchell; +Cc: gcc-patches, Jan Hubicka, Michael Matz

On Wed, Jun 16, 2010 at 02:22:58PM -0700, Mark Mitchell wrote:
> Jakub Jelinek wrote:
> 
> > 2010-06-16  Jakub Jelinek  <jakub@redhat.com>
> > 
> > 	* Makefile.in (cfgexpand.o): Depend on $(INSN_ATTR_H).
> > 	* genattrtab.c (check_tune_attr, find_tune_attr): New functions.
> 
> This should have no impact on compile-time for things compiled with GCC,
> correct?  If so, for avoidance of doubt, while I haven't reviewed the
> patch in detail, I certainly have no objections to it.  Let me know if

It has compile time impact, but a mixed one.  The negative performance
impact is that internal_dfa_insn_code and insn_default_latency calls
are no longer direct function calls (on the targets which have some cpu/tune
attribute tested in all reservations), but are function pointers and thus
indirect calls.  The pointer isn't changing much usually though (unless
optimized/target attribute/pragma is used, it shouldn't change at all).
How much this costs depends on the host CPU (and whether the host CPU
is able to cache target CPU if the fn pointer isn't changing;
currently the init_sched_attrs call which is called once per function
always writes the fn pointers, usually with the same value as it already has
- would it help for some host CPUs if the function instead computed the
fn pointers into temporary variable and wrote the fn pointer var only
if the temporary is different from its current contents?).

The advantage is that the text size of the functions shrinks a lot
(at least on the architectures I've looked at - i?86/x86_64, powerpc{,64}
and s390{,x} the .text size of all the per tuning functions together
is smaller than the .text size of the old monster functions, the sum of
all the per tuning function .rodata sizes (jump tables) usually slightly
grew, but still for each individual function both sizes are much smaller),
which means that unless optimize/target attribute is used heavily and every
function uses different tuning, the new code is much more i-cache and
d-cache friendly.  Plus, many extract_insn_cached or
extract_constrain_insn_cached calls could go away - if say only one tuning
was interested in that additional info and all others don't care for
some particular insn, the new code will call it only in the function
for the tuning that needs it and not in the other tuning functions.

The last arch I've looked at was arm - there the patch doesn't make any
difference (except for #define init_sched_attrs() do { } while (0) in
insn-attr.h) because arm currently doesn't have a single const attribute
that is used in tests for all reservations.  The solution there could be
to create a new const attribute that would combine the attributes currently
used in define_insn_reservation tests (say a bitfield containing the other
attrs), will leave that to arm maintainers if they wish to do that.

	Jakub

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: Speed up genattrtab
  2010-06-17 10:21               ` Jakub Jelinek
@ 2010-06-17 11:50                 ` Jan Hubicka
  2010-06-17 13:39                 ` Michael Matz
  1 sibling, 0 replies; 40+ messages in thread
From: Jan Hubicka @ 2010-06-17 11:50 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Mark Mitchell, gcc-patches, Jan Hubicka, Michael Matz

> On Wed, Jun 16, 2010 at 02:22:58PM -0700, Mark Mitchell wrote:
> > Jakub Jelinek wrote:
> > 
> > > 2010-06-16  Jakub Jelinek  <jakub@redhat.com>
> > > 
> > > 	* Makefile.in (cfgexpand.o): Depend on $(INSN_ATTR_H).
> > > 	* genattrtab.c (check_tune_attr, find_tune_attr): New functions.
> > 
> > This should have no impact on compile-time for things compiled with GCC,
> > correct?  If so, for avoidance of doubt, while I haven't reviewed the
> > patch in detail, I certainly have no objections to it.  Let me know if
> 
> It has compile time impact, but a mixed one.  The negative performance
> impact is that internal_dfa_insn_code and insn_default_latency calls
> are no longer direct function calls (on the targets which have some cpu/tune
> attribute tested in all reservations), but are function pointers and thus
> indirect calls.  The pointer isn't changing much usually though (unless
> optimized/target attribute/pragma is used, it shouldn't change at all).
> How much this costs depends on the host CPU (and whether the host CPU
> is able to cache target CPU if the fn pointer isn't changing;
> currently the init_sched_attrs call which is called once per function
> always writes the fn pointers, usually with the same value as it already has
> - would it help for some host CPUs if the function instead computed the
> fn pointers into temporary variable and wrote the fn pointer var only
> if the temporary is different from its current contents?).

I think in general we took the way of function pointers instead of 
macro machinery with direct calls even in hot parts of program (we
have targhooks in general_operand and friendds; dataflow branch
has indirect calls in internal loop etc.).

So I would not worry about this particular case, it is not worse than
existic practices.
> 
> The advantage is that the text size of the functions shrinks a lot
> (at least on the architectures I've looked at - i?86/x86_64, powerpc{,64}
> and s390{,x} the .text size of all the per tuning functions together
> is smaller than the .text size of the old monster functions, the sum of
> all the per tuning function .rodata sizes (jump tables) usually slightly
> grew, but still for each individual function both sizes are much smaller),
> which means that unless optimize/target attribute is used heavily and every
> function uses different tuning, the new code is much more i-cache and
> d-cache friendly.  Plus, many extract_insn_cached or
> extract_constrain_insn_cached calls could go away - if say only one tuning
> was interested in that additional info and all others don't care for
> some particular insn, the new code will call it only in the function
> for the tuning that needs it and not in the other tuning functions.

Oprofiling the compilatio of small files even with LTO linked binary,
we do have a lot of system overhead (it is over 70% for empty file compilation).

I guess cost of mmpapping large binaries + the memory dirtified by startup
accounts a lot here.

Honza
> 
> The last arch I've looked at was arm - there the patch doesn't make any
> difference (except for #define init_sched_attrs() do { } while (0) in
> insn-attr.h) because arm currently doesn't have a single const attribute
> that is used in tests for all reservations.  The solution there could be
> to create a new const attribute that would combine the attributes currently
> used in define_insn_reservation tests (say a bitfield containing the other
> attrs), will leave that to arm maintainers if they wish to do that.
> 
> 	Jakub

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: Speed up genattrtab
  2010-06-16 17:21                   ` Paolo Bonzini
@ 2010-06-17 11:56                     ` Michael Matz
  2010-06-17 12:08                       ` Jan Hubicka
  2010-06-17 12:14                       ` Paolo Bonzini
  0 siblings, 2 replies; 40+ messages in thread
From: Michael Matz @ 2010-06-17 11:56 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Jan Hubicka, Nathan Froyd, Richard Guenther, Mark Mitchell, gcc-patches

Hi,

On Wed, 16 Jun 2010, Paolo Bonzini wrote:

> On 06/16/2010 05:51 PM, Jan Hubicka wrote:
> > With a lot of cores, we might actually execute getnattrtab and compilation
> > of stage1 insn-attrtab.c at same time and then just sanity check that
> > genattrtab
> > did not change results :)
> 
> That's a different story.  It doesn't break the idea of bootstrapping, 
> and can be applied to all insn-* files.

But it also doesn't give any gains.  You still would have to run 
genattrtab (to get something to compare with), and you still would have to 
compile insn-attrtab.c (to check that cc1 produced the same code as last 
stage).


Ciao,
Michael.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: Speed up genattrtab
  2010-06-17 11:56                     ` Michael Matz
@ 2010-06-17 12:08                       ` Jan Hubicka
  2010-06-17 12:20                         ` Michael Matz
  2010-06-17 12:14                       ` Paolo Bonzini
  1 sibling, 1 reply; 40+ messages in thread
From: Jan Hubicka @ 2010-06-17 12:08 UTC (permalink / raw)
  To: Michael Matz
  Cc: Paolo Bonzini, Jan Hubicka, Nathan Froyd, Richard Guenther,
	Mark Mitchell, gcc-patches

> Hi,
> 
> On Wed, 16 Jun 2010, Paolo Bonzini wrote:
> 
> > On 06/16/2010 05:51 PM, Jan Hubicka wrote:
> > > With a lot of cores, we might actually execute getnattrtab and compilation
> > > of stage1 insn-attrtab.c at same time and then just sanity check that
> > > genattrtab
> > > did not change results :)
> > 
> > That's a different story.  It doesn't break the idea of bootstrapping, 
> > and can be applied to all insn-* files.
> 
> But it also doesn't give any gains.  You still would have to run 
> genattrtab (to get something to compare with), and you still would have to 
> compile insn-attrtab.c (to check that cc1 produced the same code as last 
> stage).

This will let you to start insn-attrtab compilation as first thing during build,
so you gain parallelizm in between genattrtab and insn-attrtab compilation for stage>1.
Obviously as the insn-attrtab compilation time is still a bottleneck in whole build
(as can be seen in whopr link), we won't solve this completely, but we will get
genattrtab runtime less relevant to the whole build time.

Honza
> 
> 
> Ciao,
> Michael.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: Speed up genattrtab
  2010-06-17 11:56                     ` Michael Matz
  2010-06-17 12:08                       ` Jan Hubicka
@ 2010-06-17 12:14                       ` Paolo Bonzini
  1 sibling, 0 replies; 40+ messages in thread
From: Paolo Bonzini @ 2010-06-17 12:14 UTC (permalink / raw)
  To: Michael Matz
  Cc: Jan Hubicka, Nathan Froyd, Richard Guenther, Mark Mitchell, gcc-patches

On 06/17/2010 01:39 PM, Michael Matz wrote:
> On Wed, 16 Jun 2010, Paolo Bonzini wrote:
>> On 06/16/2010 05:51 PM, Jan Hubicka wrote:
>>> With a lot of cores, we might actually execute getnattrtab and compilation
>>> of stage1 insn-attrtab.c at same time and then just sanity check that
>>> genattrtab
>>> did not change results :)
>>
>> That's a different story.  It doesn't break the idea of bootstrapping,
>> and can be applied to all insn-* files.
>
> But it also doesn't give any gains.  You still would have to run
> genattrtab (to get something to compare with), and you still would have to
> compile insn-attrtab.c (to check that cc1 produced the same code as last
> stage).

You can parallelize those two, so you would still get a 50% gain on a 
beefy-enough machine (pre Jakub's patch).  However, the insn_dfa_code 
split makes the serialization of genattrtab and insn-attrtab.o 
compilation much less troublesome.

Paolo

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: Speed up genattrtab
  2010-06-17 12:08                       ` Jan Hubicka
@ 2010-06-17 12:20                         ` Michael Matz
  2010-06-17 12:26                           ` Jan Hubicka
  0 siblings, 1 reply; 40+ messages in thread
From: Michael Matz @ 2010-06-17 12:20 UTC (permalink / raw)
  To: Jan Hubicka
  Cc: Paolo Bonzini, Nathan Froyd, Richard Guenther, Mark Mitchell,
	gcc-patches

Hi,

On Thu, 17 Jun 2010, Jan Hubicka wrote:

> > > That's a different story.  It doesn't break the idea of 
> > > bootstrapping, and can be applied to all insn-* files.
> > 
> > But it also doesn't give any gains.  You still would have to run 
> > genattrtab (to get something to compare with), and you still would 
> > have to compile insn-attrtab.c (to check that cc1 produced the same 
> > code as last stage).
> 
> This will let you to start insn-attrtab compilation as first thing 
> during build,

And then restart if the genattrtab result turns out to be different?  Or 
just stop compilation?


Ciao,
Michael.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: Speed up genattrtab
  2010-06-17 12:20                         ` Michael Matz
@ 2010-06-17 12:26                           ` Jan Hubicka
  0 siblings, 0 replies; 40+ messages in thread
From: Jan Hubicka @ 2010-06-17 12:26 UTC (permalink / raw)
  To: Michael Matz
  Cc: Jan Hubicka, Paolo Bonzini, Nathan Froyd, Richard Guenther,
	Mark Mitchell, gcc-patches

> Hi,
> 
> On Thu, 17 Jun 2010, Jan Hubicka wrote:
> 
> > > > That's a different story.  It doesn't break the idea of 
> > > > bootstrapping, and can be applied to all insn-* files.
> > > 
> > > But it also doesn't give any gains.  You still would have to run 
> > > genattrtab (to get something to compare with), and you still would 
> > > have to compile insn-attrtab.c (to check that cc1 produced the same 
> > > code as last stage).
> > 
> > This will let you to start insn-attrtab compilation as first thing 
> > during build,
> 
> And then restart if the genattrtab result turns out to be different?  Or 
> just stop compilation?

Stop compilation. They should be the same, right?

Honza

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: Speed up genattrtab
  2010-06-16 20:27           ` Jakub Jelinek
  2010-06-16 21:50             ` Mark Mitchell
@ 2010-06-17 12:27             ` Jakub Jelinek
  1 sibling, 0 replies; 40+ messages in thread
From: Jakub Jelinek @ 2010-06-17 12:27 UTC (permalink / raw)
  To: gcc-patches; +Cc: Mark Mitchell, Jan Hubicka, Michael Matz

On Wed, Jun 16, 2010 at 09:09:24PM +0200, Jakub Jelinek wrote:
> Except for a few expected differences (insn-attrtab.o, files including
> insn-attr.h and *checksum* gcc/*.o had no differences on stripped objects).

FYI, I've just finished --enable-checking=yes,rtl bootstraps of
this patch with an extra checking patch on top of it (below) on x86_64-linux
and i686-linux to verify that the old and new internal_dfa_insn_code and
insn_default_latency give the same return values.

--- gcc/genattrtab.c.jj	2010-06-17 10:59:05.000000000 +0200
+++ gcc/genattrtab.c	2010-06-17 11:51:23.000000000 +0200
@@ -4467,9 +4467,29 @@ make_automaton_attrs (void)
 		  XSTR (val->value, 0), XSTR (val->value, 0));
 	}
 
+      printf ("static int internal_dfa_insn_code_ (rtx);\n");
+      printf ("static int insn_default_latency_ (rtx);\n");
+      printf ("static int (*ptr_internal_dfa_insn_code) (rtx);\n");
+      printf ("static int (*ptr_insn_default_latency) (rtx);\n");
       printf ("\n");
-      printf ("int (*internal_dfa_insn_code) (rtx);\n");
-      printf ("int (*insn_default_latency) (rtx);\n");
+      printf ("static int\n");
+      printf ("internal_dfa_insn_code_test_ (rtx insn)\n");
+      printf ("{\n");
+      printf ("  int ret = internal_dfa_insn_code_ (insn);\n");
+      printf ("  gcc_assert (ret == ptr_internal_dfa_insn_code (insn));\n");
+      printf ("  return ret;\n");
+      printf ("}\n");
+      printf ("static int\n");
+      printf ("insn_default_latency_test_ (rtx insn)\n");
+      printf ("{\n");
+      printf ("  int ret = insn_default_latency_ (insn);\n");
+      printf ("  gcc_assert (ret == ptr_insn_default_latency (insn));\n");
+      printf ("  return ret;\n");
+      printf ("}\n");
+      printf ("\n");
+      printf ("\n");
+      printf ("int (*internal_dfa_insn_code) (rtx) = internal_dfa_insn_code_test_;\n");
+      printf ("int (*insn_default_latency) (rtx) = insn_default_latency_test_;\n");
       printf ("\n");
       printf ("void\n");
       printf ("init_sched_attrs (void)\n");
@@ -4550,10 +4570,10 @@ make_automaton_attrs (void)
 	  write_test_expr (test, 0);
 	  printf (")\n");
 	  printf ("    {\n");
-	  printf ("      internal_dfa_insn_code\n");
+	  printf ("      ptr_internal_dfa_insn_code\n");
 	  printf ("        = internal_dfa_insn_code_%s;\n",
 		  XSTR (val->value, 0));
-	  printf ("      insn_default_latency\n");
+	  printf ("      ptr_insn_default_latency\n");
 	  printf ("        = insn_default_latency_%s;\n",
 		  XSTR (val->value, 0));
 	  printf ("    }\n");
@@ -4566,7 +4586,7 @@ make_automaton_attrs (void)
 
       XDELETEVEC (condexps);
     }
-  else
+  if (1)
     {
       code_exp = rtx_alloc (COND);
       lats_exp = rtx_alloc (COND);
@@ -4588,8 +4608,16 @@ make_automaton_attrs (void)
 	  XVECEXP (lats_exp, 0, i+1)
 	    = make_numeric_value (decl->default_latency);
 	}
-      make_internal_attr ("*internal_dfa_insn_code", code_exp, ATTR_NONE);
-      make_internal_attr ("*insn_default_latency",   lats_exp, ATTR_NONE);
+      if (tune_attr != NULL)
+	{
+	  make_internal_attr ("*internal_dfa_insn_code_", code_exp, ATTR_NONE);
+	  make_internal_attr ("*insn_default_latency_",   lats_exp, ATTR_NONE);
+	}
+      else
+	{
+	  make_internal_attr ("*internal_dfa_insn_code", code_exp, ATTR_NONE);
+	  make_internal_attr ("*insn_default_latency",   lats_exp, ATTR_NONE);
+	}
     }
 
   if (n_bypasses == 0)

	Jakub

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: Speed up genattrtab
  2010-06-17 10:21               ` Jakub Jelinek
  2010-06-17 11:50                 ` Jan Hubicka
@ 2010-06-17 13:39                 ` Michael Matz
  2010-06-17 15:24                   ` Mark Mitchell
  1 sibling, 1 reply; 40+ messages in thread
From: Michael Matz @ 2010-06-17 13:39 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Mark Mitchell, gcc-patches, Jan Hubicka

Hello,

On Thu, 17 Jun 2010, Jakub Jelinek wrote:

> On Wed, Jun 16, 2010 at 02:22:58PM -0700, Mark Mitchell wrote:
> > Jakub Jelinek wrote:
> > 
> > > 2010-06-16  Jakub Jelinek  <jakub@redhat.com>
> > > 
> > > 	* Makefile.in (cfgexpand.o): Depend on $(INSN_ATTR_H).
> > > 	* genattrtab.c (check_tune_attr, find_tune_attr): New functions.
> > 
> > This should have no impact on compile-time for things compiled with GCC,
> > correct?  If so, for avoidance of doubt, while I haven't reviewed the
> > patch in detail, I certainly have no objections to it.  Let me know if
> 
> It has compile time impact, but a mixed one.

Same setup as with the other measurements.

arch     name   gen_u  st1_u  big_u  kde_u
alpha    clean  0      0.52   40.92  29.32
alpha    jj     0      0.46   41.01  29.14
arm      clean  6      27.10  47.61  41.30
arm      jj     6      27.03  46.93  41.14
crisv32  clean  0      0.21   34.16  24.44
crisv32  jj     0      0.21   34.18  24.46
hppa     clean  0      1.10   38.37  28.44
hppa     jj     0      0.89   38.34  28.27
i386     clean  44     33.39  30.64  26.51
i386     jj     3      10.06  30.54  26.56
ia64     clean  1      1.82   64.58  45.89
ia64     jj     0      1.54   64.57  47.07
mips     clean  74     14.68  44.01
mips     jj     74     14.61  44.13
powerpc  clean  56     48.49  42.49  30.57
powerpc  jj     1      4.10   42.46  30.28
s390x    clean  0      1.96   41.84  28.99
s390x    jj     0      1.40   41.64  28.80
sh       clean  0      1.35   47.57  34.35
sh       jj     0      1.37   47.57  34.41
sparc    clean  0      1.08   36.73  29.71
sparc    jj     0      1.01   36.40  29.39
x86_64   clean  52     42.40  28.02  25.71
x86_64   jj     3      12.25  27.92  25.64


Ciao,
Michael.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: Speed up genattrtab
  2010-06-17 13:39                 ` Michael Matz
@ 2010-06-17 15:24                   ` Mark Mitchell
  2010-06-17 18:09                     ` Michael Matz
  0 siblings, 1 reply; 40+ messages in thread
From: Mark Mitchell @ 2010-06-17 15:24 UTC (permalink / raw)
  To: Michael Matz; +Cc: Jakub Jelinek, gcc-patches, Jan Hubicka

Michael Matz wrote:

>>> This should have no impact on compile-time for things compiled with GCC,
>>> correct?  If so, for avoidance of doubt, while I haven't reviewed the
>>> patch in detail, I certainly have no objections to it.  Let me know if
>> It has compile time impact, but a mixed one.
> 
> Same setup as with the other measurements.

Is it easy for you to toss that into a spreadsheet and get a geometric
mean of the impact across architectures?  Compared to the earlier table,
this looks like it has less negative impact in many cases and positive
impact in some cases.  I'm hoping that it's a wash overall...

-- 
Mark Mitchell
CodeSourcery
mark@codesourcery.com
(650) 331-3385 x713

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: Speed up genattrtab
  2010-06-17 15:24                   ` Mark Mitchell
@ 2010-06-17 18:09                     ` Michael Matz
  2010-06-17 18:44                       ` Mark Mitchell
  0 siblings, 1 reply; 40+ messages in thread
From: Michael Matz @ 2010-06-17 18:09 UTC (permalink / raw)
  To: Mark Mitchell; +Cc: Jakub Jelinek, gcc-patches, Jan Hubicka

Hi,

On Thu, 17 Jun 2010, Mark Mitchell wrote:

> Michael Matz wrote:
> 
> >>> This should have no impact on compile-time for things compiled with GCC,
> >>> correct?  If so, for avoidance of doubt, while I haven't reviewed the
> >>> patch in detail, I certainly have no objections to it.  Let me know if
> >> It has compile time impact, but a mixed one.
> > 
> > Same setup as with the other measurements.
> 
> Is it easy for you to toss that into a spreadsheet and get a geometric 
> mean of the impact across architectures?  Compared to the earlier table, 
> this looks like it has less negative impact in many cases and positive 
> impact in some cases.  I'm hoping that it's a wash overall...

It certainly has no negative impact I can measure.  The numbers have to be 
taken with a grain of salt, they are not done statistically correct (only 
two runs, without removing outliers and averaging results or taking the 
minimum).  So every difference less than say 1% contains much noise.

I think we should go with Jakubs patch.  I'll then rework my one to only 
contain the non-controversial stuff.


Ciao,
Michael.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: Speed up genattrtab
  2010-06-17 18:09                     ` Michael Matz
@ 2010-06-17 18:44                       ` Mark Mitchell
  0 siblings, 0 replies; 40+ messages in thread
From: Mark Mitchell @ 2010-06-17 18:44 UTC (permalink / raw)
  To: Michael Matz; +Cc: Jakub Jelinek, gcc-patches, Jan Hubicka

Michael Matz wrote:

> It certainly has no negative impact I can measure.

I'm happy with the results, then.

Thanks,

-- 
Mark Mitchell
CodeSourcery
mark@codesourcery.com
(650) 331-3385 x713

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: Speed up genattrtab
  2004-07-15  2:26   ` Vladimir Makarov
  2004-07-15 18:04     ` Ranjit Mathew
@ 2004-07-16 14:38     ` Segher Boessenkool
  1 sibling, 0 replies; 40+ messages in thread
From: Segher Boessenkool @ 2004-07-16 14:38 UTC (permalink / raw)
  To: Vladimir Makarov; +Cc: Gcc Patch List

> The patch is ok.  Please put the figure bracket on a separate line 
> (GNU standards require this) and change indentation inside the 
> if-statement.

Done and committed, with the following ChangeLog entry:

2004-07-16  Segher Boessenkool  <segher@kernel.crashing.org>

         * genautomata.c (add_vect): Speedup by using integers as
         bit-vectors for walking through the comb_vect and finding
         a match.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: Speed up genattrtab
  2004-07-15 18:04     ` Ranjit Mathew
@ 2004-07-16 13:19       ` Zack Weinberg
  0 siblings, 0 replies; 40+ messages in thread
From: Zack Weinberg @ 2004-07-16 13:19 UTC (permalink / raw)
  To: Ranjit Mathew; +Cc: Vladimir Makarov, Gcc Patch List, segher

Ranjit Mathew <rmathew@gmail.com> writes:

> Vladimir Makarov wrote:
>> Segher Boessenkool wrote:
> [...]
>>>... and the ChangeLog entry (oops).
>>>
>>>
>>>
>>>2004-05-15  Segher Boessenkool  <segher@kernel.crashing.org>
>>>
>>>    * genautomata.c (add_vect): Speedup.
>>>
>> 
>> The patch is ok.  Please put the figure bracket on a separate line (GNU 
>> standards require this) and change indentation inside the if-statement.
>
> Perhaps the ChangeLog entry can be a *little* more
> descriptive?

Allow me to strengthen that to a demand.

zw

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: Speed up genattrtab
  2004-07-15  2:26   ` Vladimir Makarov
@ 2004-07-15 18:04     ` Ranjit Mathew
  2004-07-16 13:19       ` Zack Weinberg
  2004-07-16 14:38     ` Segher Boessenkool
  1 sibling, 1 reply; 40+ messages in thread
From: Ranjit Mathew @ 2004-07-15 18:04 UTC (permalink / raw)
  To: Vladimir Makarov; +Cc: Gcc Patch List, segher

Vladimir Makarov wrote:
> Segher Boessenkool wrote:
[...]
>>... and the ChangeLog entry (oops).
>>
>>
>>
>>2004-05-15  Segher Boessenkool  <segher@kernel.crashing.org>
>>
>>    * genautomata.c (add_vect): Speedup.
>>
> 
> The patch is ok.  Please put the figure bracket on a separate line (GNU 
> standards require this) and change indentation inside the if-statement.

Perhaps the ChangeLog entry can be a *little* more
descriptive?

Ranjit.

-- 
Ranjit Mathew          Email: rmathew AT gmail DOT com

Bangalore, INDIA.      Web: http://ranjitmathew.tripod.com/

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: Speed up genattrtab
  2004-07-13 20:35 ` Segher Boessenkool
@ 2004-07-15  2:26   ` Vladimir Makarov
  2004-07-15 18:04     ` Ranjit Mathew
  2004-07-16 14:38     ` Segher Boessenkool
  0 siblings, 2 replies; 40+ messages in thread
From: Vladimir Makarov @ 2004-07-15  2:26 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: Gcc Patch List

Segher Boessenkool wrote:

>> This patch speeds up genautomata (by a factor of two or
>> so; target-dependent).
>
I've got the following numbers:

x86:  no speedup
itanium: 25% speedup
ppc: 30% speedup

>
>
> ... and the ChangeLog entry (oops).
>
>
>
> 2004-05-15  Segher Boessenkool  <segher@kernel.crashing.org>
>
>     * genautomata.c (add_vect): Speedup.
>
The patch is ok.  Please put the figure bracket on a separate line (GNU 
standards require this) and change indentation inside the if-statement.

!   if (vect_length - first_unempty_vect_index >= SIZEOF_LONG * CHAR_BIT) {

After fixing this you can commit the patch.

Thanks for the patch.

Vlad




^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: Speed up genattrtab
  2004-07-13 20:28 Segher Boessenkool
@ 2004-07-13 20:35 ` Segher Boessenkool
  2004-07-15  2:26   ` Vladimir Makarov
  0 siblings, 1 reply; 40+ messages in thread
From: Segher Boessenkool @ 2004-07-13 20:35 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: Gcc Patch List

> This patch speeds up genautomata (by a factor of two or
> so; target-dependent).


... and the ChangeLog entry (oops).



2004-05-15  Segher Boessenkool  <segher@kernel.crashing.org>

	* genautomata.c (add_vect): Speedup.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Speed up genattrtab
@ 2004-07-13 20:28 Segher Boessenkool
  2004-07-13 20:35 ` Segher Boessenkool
  0 siblings, 1 reply; 40+ messages in thread
From: Segher Boessenkool @ 2004-07-13 20:28 UTC (permalink / raw)
  To: Gcc Patch List

[-- Attachment #1: Type: text/plain, Size: 278 bytes --]

Hi,

This patch speeds up genautomata (by a factor of two or
so; target-dependent).

Bootstrapped and tested on powerpc64-unknown-linux-gnu
and powerpc-apple-darwin7.4.0 (and various people have
been using this for weeks now; thanks for testing!)

Okay for mainline?


Segher



[-- Attachment #2: patch-gcc-genautomata-speedup-1 --]
[-- Type: application/octet-stream, Size: 3849 bytes --]

*** /Users/segher/cvs/gcc/gcc/genautomata.c	Mon Feb 16 11:32:51 2004
--- ./genautomata.c	Sat May 15 00:12:10 2004
*************** add_vect (state_ainsn_table_t tab, int v
*** 7578,7583 ****
--- 7578,7584 ----
    int no_state_value;
    vect_el_t vect_el;
    int i;
+   unsigned long vect_mask, comb_vect_mask;
  
    if (vect_length == 0)
      abort ();
*************** add_vect (state_ainsn_table_t tab, int v
*** 7599,7621 ****
         first_unempty_vect_index++)
      if (vect [first_unempty_vect_index] != undefined_vect_el_value)
        break;
    /* Search for the place in comb vect for the inserted vect.  */
!   for (comb_vect_index = 0;
!        comb_vect_index < comb_vect_els_num;
!        comb_vect_index++)
!     {
!       for (vect_index = first_unempty_vect_index;
!            vect_index < vect_length
!              && vect_index + comb_vect_index < comb_vect_els_num;
!            vect_index++)
!         if (vect [vect_index] != undefined_vect_el_value
!             && (comb_vect_start [vect_index + comb_vect_index]
! 		!= undefined_vect_el_value))
            break;
!       if (vect_index >= vect_length
!           || vect_index + comb_vect_index >= comb_vect_els_num)
!         break;
      }
    /* Slot was found.  */
    additional_els_num = comb_vect_index + real_vect_length - comb_vect_els_num;
    if (additional_els_num < 0)
--- 7600,7675 ----
         first_unempty_vect_index++)
      if (vect [first_unempty_vect_index] != undefined_vect_el_value)
        break;
+ 
    /* Search for the place in comb vect for the inserted vect.  */
! 
!   /* Slow case.  */
!   if (vect_length - first_unempty_vect_index >= SIZEOF_LONG * CHAR_BIT) {
!     for (comb_vect_index = 0;
!          comb_vect_index < comb_vect_els_num;
!          comb_vect_index++)
!       {
!         for (vect_index = first_unempty_vect_index;
!              vect_index < vect_length
!                && vect_index + comb_vect_index < comb_vect_els_num;
!              vect_index++)
!           if (vect [vect_index] != undefined_vect_el_value
!               && (comb_vect_start [vect_index + comb_vect_index]
! 		  != undefined_vect_el_value))
!             break;
!         if (vect_index >= vect_length
!             || vect_index + comb_vect_index >= comb_vect_els_num)
            break;
!       }
!     goto found;
!   }
! 
!   /* Fast case.  */
!   vect_mask = 0;
!   for (vect_index = first_unempty_vect_index;
!        vect_index < vect_length;
!        vect_index++)
!     {
!       vect_mask = vect_mask << 1;
!       if (vect [vect_index] != undefined_vect_el_value)
! 	vect_mask |= 1;
      }
+ 
+   /* Search for the place in comb vect for the inserted vect.  */
+   comb_vect_index = 0;
+   if (comb_vect_els_num == 0)
+     goto found;
+ 
+   comb_vect_mask = 0;
+   for (vect_index = first_unempty_vect_index;
+        vect_index < vect_length && vect_index < comb_vect_els_num;
+        vect_index++)
+     {
+       comb_vect_mask <<= 1;
+       if (vect_index + comb_vect_index < comb_vect_els_num
+ 	  && comb_vect_start [vect_index + comb_vect_index]
+ 	     != undefined_vect_el_value)
+ 	comb_vect_mask |= 1;
+     }
+   if ((vect_mask & comb_vect_mask) == 0)
+     goto found;
+ 
+   for (comb_vect_index = 1, i = vect_length; i < comb_vect_els_num;
+        comb_vect_index++, i++)
+     {
+       comb_vect_mask = (comb_vect_mask << 1) | 1;
+       comb_vect_mask ^= comb_vect_start [i] == undefined_vect_el_value;
+       if ((vect_mask & comb_vect_mask) == 0)
+ 	goto found;
+     }
+   for ( ; comb_vect_index < comb_vect_els_num; comb_vect_index++)
+     {
+       comb_vect_mask <<= 1;
+       if ((vect_mask & comb_vect_mask) == 0)
+ 	goto found;
+     }
+ 
+ found:
    /* Slot was found.  */
    additional_els_num = comb_vect_index + real_vect_length - comb_vect_els_num;
    if (additional_els_num < 0)

^ permalink raw reply	[flat|nested] 40+ messages in thread

end of thread, other threads:[~2010-06-17 16:16 UTC | newest]

Thread overview: 40+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-06-15 16:58 Speed up genattrtab Michael Matz
2010-06-15 17:37 ` Mark Mitchell
2010-06-15 17:37   ` Jan Hubicka
2010-06-15 19:13     ` Mark Mitchell
2010-06-15 19:17       ` Ralf Wildenhues
2010-06-15 19:22       ` Jakub Jelinek
2010-06-16 15:42         ` Jakub Jelinek
2010-06-16 15:59           ` Jan Hubicka
2010-06-16 20:27           ` Jakub Jelinek
2010-06-16 21:50             ` Mark Mitchell
2010-06-17 10:21               ` Jakub Jelinek
2010-06-17 11:50                 ` Jan Hubicka
2010-06-17 13:39                 ` Michael Matz
2010-06-17 15:24                   ` Mark Mitchell
2010-06-17 18:09                     ` Michael Matz
2010-06-17 18:44                       ` Mark Mitchell
2010-06-17 12:27             ` Jakub Jelinek
2010-06-16  9:32       ` Richard Guenther
2010-06-16 14:30         ` Nathan Froyd
2010-06-16 14:33           ` Michael Matz
2010-06-16 14:43             ` Nathan Froyd
2010-06-16 15:33               ` Mike Stump
2010-06-16 15:48               ` Michael Matz
2010-06-16 15:57                 ` Jan Hubicka
2010-06-16 17:21                   ` Paolo Bonzini
2010-06-17 11:56                     ` Michael Matz
2010-06-17 12:08                       ` Jan Hubicka
2010-06-17 12:20                         ` Michael Matz
2010-06-17 12:26                           ` Jan Hubicka
2010-06-17 12:14                       ` Paolo Bonzini
2010-06-15 17:59   ` Dave Korn
2010-06-15 20:30     ` Paolo Bonzini
2010-06-15 20:34       ` Paolo Bonzini
2010-06-15 22:03     ` David Daney
  -- strict thread matches above, loose matches on Subject: below --
2004-07-13 20:28 Segher Boessenkool
2004-07-13 20:35 ` Segher Boessenkool
2004-07-15  2:26   ` Vladimir Makarov
2004-07-15 18:04     ` Ranjit Mathew
2004-07-16 13:19       ` Zack Weinberg
2004-07-16 14:38     ` Segher Boessenkool

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).