public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
* RFC: improving estimate_num_insns
@ 2005-07-12 17:56 Josh Conner
  2005-07-12 20:07 ` Steven Bosscher
  0 siblings, 1 reply; 6+ messages in thread
From: Josh Conner @ 2005-07-12 17:56 UTC (permalink / raw)
  To: gcc

[-- Attachment #1: Type: text/plain, Size: 4642 bytes --]

In looking at the function inliner, I did some analysis of the  
correctness of estimate_num_insns (from tree-inline.c), as this  
measurement is critical to the inlining heuristics.  Here is the  
overview of what I found on the functions in two sample files from  
the FAAD2 AAC library (measured at -O3 with inlining disabled):

syntax.c
               avg.    avg.     std.
target        ratio   error    deviation
arm-none      2.88     57%     2.37
darwin-ppc    3.17     55%     2.50
linux-x86     2.72     55%     2.06

huffman.c
               avg.    avg.     std.
target        ratio   error    deviation
arm-none      3.27    56%      2.37
darwin-ppc    4.19    53%      2.88
linux-x86     3.42    56%      2.39

     fn. differential = actual bytes / estimated insns
   avg. ratio = average (fn. differential) of all functions
     fn. error = max (fn. differential, avg. ratio) / min (fn.  
differential, avg. ratio)
   avg. error = average (fn. error) of all functions
   std. deviation = standard deviation (fn. differential) across all  
functions

Note that by choosing these metrics, my intent was to isolate the  
consistency of the differential from the magnitude of the  
differential.  In other words, the goal was to have estimates that  
were a consistent ratio to the actual results, whatever that ratio  
may be.

Thinking that there may be room for improvement on this, I tried this  
same experiment with a couple of adjustments to estimate_num_insns:
- Instead of ignoring constants, assign them a weight of 2  
instructions (one for the constant itself and one to load into memory)
- Instead of ignoring dereferences, assign a weight of 2 to an array  
reference and 1 for a pointer dereference
- Instead of ignoring case labels, assign them a weight of 2  
instructions (to represent the cost of control logic to get there)

With these modifications (tree-estimate.patch), I see the following  
results:

syntax.c
               avg.    avg.     std.
target        ratio   error    deviation
arm-none      1.68    29%      0.67
darwin-ppc    1.84    29%      0.69
linux-x86     1.62    26%      0.55

huffman.c
               avg.    avg.     std.
target        ratio   error    deviation
arm-none      1.75    26%      0.52
darwin-ppc    2.31    24%      0.69
linux-x86     1.95    25%      0.67

Which appears to be a significant improvement in the accuracy of  
estimate_num_insns.  However, note that since the avg. ratio has  
decreased significantly (estimates have gone up because we're  
counting things that were ignored before), any heuristics that are  
based on instruction counts will have effectively decreased by  
~25-50%.  To compensate for that, I bumped up any constants that are  
based on instruction estimates by 50% each (see inl-costs.patch).

With both of these changes in place, I ran some simple SPEC2000  
integer benchmarks (not fine-tuned, not reportable) and saw the  
following impact on performance (positive is better):

           darwin-ppc   linux-x86  linux-arm*
gzip      -4.7%        +11.7%     +1.3%
vpr       -0.7%        -1.1%      -----
mcf       -0.3%        +1.1%      -----
crafty    -0.8%        +1.8%      -----
parser    -----        -----      -----
eon       -4.4%        +0.3%      -----
perlbmk   +0.4%        0.0%       -1.1%
gap       -----        +1.0%      -2.9%
vortex    +4.1%        0.0%       0.0%
bzip2     +1.2%        +1.1%      -1.3%
twolf     -0.5%        -2.9%      -1.2%

   ----- = unable to obtain a valid result
   * linux-arm results are based on a subset of SPEC data and command- 
line options for use on a 32MB embedded board -- not even close to  
the full SPEC suite.

For what it's worth, code size is equal to or smaller for all  
benchmarks across all platforms.

So, here are the open issues I see at this point:
1. It appears that this change to estimate_num_instructions generates  
a much better estimate of actual code size.  However, the benchmark  
results are ambiguous.  Is this patch worth considering as-is?
2. Increasing instruction weights causes the instruction-based values  
(e.g., --param max-inline-insns-auto) to be effectively lower.   
However, changing these constants/defaults as in the second patch  
will cause a semantic change to anyone who is setting these values at  
the command line.  Is that change acceptable?

I do realize that this area is one that will eventually be likely  
best served by target-dependent routines (and constants), however I  
also see a significant benefit to all targets in fixing the default  
implementation first.

Thoughts?  Advice?

Thanks -

Josh
~~~~
Josh Conner



[-- Attachment #2: tree-estimate.patch --]
[-- Type: application/octet-stream, Size: 3220 bytes --]

Index: tree-inline.c
===================================================================
RCS file: /cvsroot/gcc/gcc/gcc/tree-inline.c,v
retrieving revision 1.200
diff -c -3 -p -r1.200 tree-inline.c
*** tree-inline.c	28 Jun 2005 19:33:23 -0000	1.200
--- tree-inline.c	1 Jul 2005 23:08:11 -0000
*************** estimate_num_insns_1 (tree *tp, int *wal
*** 1564,1574 ****
        *walk_subtrees = 0;
        return NULL;
      }
!   /* Assume that constants and references counts nothing.  These should
!      be majorized by amount of operations among them we count later
!      and are common target of CSE and similar optimizations.  */
!   else if (CONSTANT_CLASS_P (x) || REFERENCE_CLASS_P (x))
!     return NULL;
  
    switch (TREE_CODE (x))
      {
--- 1564,1593 ----
        *walk_subtrees = 0;
        return NULL;
      }
!   else if (CONSTANT_CLASS_P (x))
!     {
!       /* One word to hold the constant and a load instruction.  */
!       *count += 2;
!       *walk_subtrees = 0;
!       return NULL;
!     }
!   else if (REFERENCE_CLASS_P (x))
!     {
!       switch (TREE_CODE (x))
! 	{
! 	  /* Approximate an array access as an addition + pointer
!              dereference.  */
! 	  case ARRAY_REF:
! 	    *count += 2;
! 	    break;
! 
! 	  /* Any other dereference.  */
! 	  default:
! 	    *count += 1;
! 	    break;
! 	}
!       return NULL;
!     }
  
    switch (TREE_CODE (x))
      {
*************** estimate_num_insns_1 (tree *tp, int *wal
*** 1576,1588 ****
      case TREE_LIST:
      case TREE_VEC:
      case BLOCK:
-     case COMPONENT_REF:
-     case BIT_FIELD_REF:
-     case INDIRECT_REF:
-     case ALIGN_INDIRECT_REF:
-     case MISALIGNED_INDIRECT_REF:
-     case ARRAY_REF:
-     case ARRAY_RANGE_REF:
      case OBJ_TYPE_REF:
      case EXC_PTR_EXPR: /* ??? */
      case FILTER_EXPR: /* ??? */
--- 1595,1600 ----
*************** estimate_num_insns_1 (tree *tp, int *wal
*** 1595,1601 ****
      case ADDR_EXPR:
      case COMPLEX_EXPR:
      case RANGE_EXPR:
-     case CASE_LABEL_EXPR:
      case SSA_NAME:
      case CATCH_EXPR:
      case EH_FILTER_EXPR:
--- 1607,1612 ----
*************** estimate_num_insns_1 (tree *tp, int *wal
*** 1613,1632 ****
      case LOOP_EXPR:
      case PHI_NODE:
      case WITH_SIZE_EXPR:
        break;
  
!     /* We don't account constants for now.  Assume that the cost is amortized
!        by operations that do use them.  We may re-consider this decision once
!        we are able to optimize the tree before estimating its size and break
!        out static initializers.  */
!     case IDENTIFIER_NODE:
!     case INTEGER_CST:
!     case REAL_CST:
!     case COMPLEX_CST:
!     case VECTOR_CST:
!     case STRING_CST:
!       *walk_subtrees = 0;
!       return NULL;
  
      /* Try to estimate the cost of assignments.  We have three cases to
         deal with:
--- 1624,1636 ----
      case LOOP_EXPR:
      case PHI_NODE:
      case WITH_SIZE_EXPR:
+     case IDENTIFIER_NODE:
        break;
  
!     /* A switch statement generally requires testing logic for each case.  */
!     case CASE_LABEL_EXPR:
!       *count += 2;
!       break;
  
      /* Try to estimate the cost of assignments.  We have three cases to
         deal with:

[-- Attachment #3: inl-costs.patch --]
[-- Type: application/octet-stream, Size: 6182 bytes --]

Index: params.def
===================================================================
RCS file: /cvsroot/gcc/gcc/gcc/params.def,v
retrieving revision 1.61
diff -c -3 -p -r1.61 params.def
*** params.def	27 Jun 2005 07:40:48 -0000	1.61
--- params.def	1 Jul 2005 23:08:11 -0000
*************** DEFPARAM (PARAM_SRA_FIELD_STRUCTURE_RATI
*** 66,72 ****
     of a function counted in internal gcc instructions (not in
     real machine instructions) that is eligible for inlining
     by the tree inliner.
!    The default value is 450.
     Only functions marked inline (or methods defined in the class
     definition for C++) are affected by this.
     There are more restrictions to inlining: If inlined functions
--- 66,72 ----
     of a function counted in internal gcc instructions (not in
     real machine instructions) that is eligible for inlining
     by the tree inliner.
!    The default value is 675.
     Only functions marked inline (or methods defined in the class
     definition for C++) are affected by this.
     There are more restrictions to inlining: If inlined functions
*************** DEFPARAM (PARAM_SRA_FIELD_STRUCTURE_RATI
*** 77,83 ****
  DEFPARAM (PARAM_MAX_INLINE_INSNS_SINGLE,
  	  "max-inline-insns-single",
  	  "The maximum number of instructions in a single function eligible for inlining",
! 	  450, 0, 0)
  
  /* The single function inlining limit for functions that are
     inlined by virtue of -finline-functions (-O3).
--- 77,83 ----
  DEFPARAM (PARAM_MAX_INLINE_INSNS_SINGLE,
  	  "max-inline-insns-single",
  	  "The maximum number of instructions in a single function eligible for inlining",
! 	  675, 0, 0)
  
  /* The single function inlining limit for functions that are
     inlined by virtue of -finline-functions (-O3).
*************** DEFPARAM (PARAM_MAX_INLINE_INSNS_SINGLE,
*** 85,105 ****
     that is applied to functions marked inlined (or defined in the
     class declaration in C++) given by the "max-inline-insns-single"
     parameter.
!    The default value is 90.  */
  DEFPARAM (PARAM_MAX_INLINE_INSNS_AUTO,
  	  "max-inline-insns-auto",
  	  "The maximum number of instructions when automatically inlining",
! 	  90, 0, 0)
  
  DEFPARAM (PARAM_MAX_INLINE_INSNS_RECURSIVE,
  	  "max-inline-insns-recursive",
  	  "The maximum number of instructions inline function can grow to via recursive inlining",
! 	  450, 0, 0)
  
  DEFPARAM (PARAM_MAX_INLINE_INSNS_RECURSIVE_AUTO,
  	  "max-inline-insns-recursive-auto",
  	  "The maximum number of instructions non-inline function can grow to via recursive inlining",
! 	  450, 0, 0)
  
  DEFPARAM (PARAM_MAX_INLINE_RECURSIVE_DEPTH,
  	  "max-inline-recursive-depth",
--- 85,105 ----
     that is applied to functions marked inlined (or defined in the
     class declaration in C++) given by the "max-inline-insns-single"
     parameter.
!    The default value is 135.  */
  DEFPARAM (PARAM_MAX_INLINE_INSNS_AUTO,
  	  "max-inline-insns-auto",
  	  "The maximum number of instructions when automatically inlining",
! 	  135, 0, 0)
  
  DEFPARAM (PARAM_MAX_INLINE_INSNS_RECURSIVE,
  	  "max-inline-insns-recursive",
  	  "The maximum number of instructions inline function can grow to via recursive inlining",
! 	  675, 0, 0)
  
  DEFPARAM (PARAM_MAX_INLINE_INSNS_RECURSIVE_AUTO,
  	  "max-inline-insns-recursive-auto",
  	  "The maximum number of instructions non-inline function can grow to via recursive inlining",
! 	  675, 0, 0)
  
  DEFPARAM (PARAM_MAX_INLINE_RECURSIVE_DEPTH,
  	  "max-inline-recursive-depth",
*************** DEFPARAM(PARAM_MAX_PENDING_LIST_LENGTH,
*** 155,161 ****
  DEFPARAM(PARAM_LARGE_FUNCTION_INSNS,
  	 "large-function-insns",
  	 "The size of function body to be considered large",
! 	 2700, 0, 0)
  DEFPARAM(PARAM_LARGE_FUNCTION_GROWTH,
  	 "large-function-growth",
  	 "Maximal growth due to inlining of large function (in percent)",
--- 155,161 ----
  DEFPARAM(PARAM_LARGE_FUNCTION_INSNS,
  	 "large-function-insns",
  	 "The size of function body to be considered large",
! 	 4050, 0, 0)
  DEFPARAM(PARAM_LARGE_FUNCTION_GROWTH,
  	 "large-function-growth",
  	 "Maximal growth due to inlining of large function (in percent)",
*************** DEFPARAM(PARAM_MAX_PEEL_TIMES,
*** 233,239 ****
  DEFPARAM(PARAM_MAX_COMPLETELY_PEELED_INSNS,
  	"max-completely-peeled-insns",
  	"The maximum number of insns of a completely peeled loop",
! 	400, 0, 0)
  /* The maximum number of peelings of a single loop that is peeled completely.  */
  DEFPARAM(PARAM_MAX_COMPLETELY_PEEL_TIMES,
  	"max-completely-peel-times",
--- 233,239 ----
  DEFPARAM(PARAM_MAX_COMPLETELY_PEELED_INSNS,
  	"max-completely-peeled-insns",
  	"The maximum number of insns of a completely peeled loop",
! 	600, 0, 0)
  /* The maximum number of peelings of a single loop that is peeled completely.  */
  DEFPARAM(PARAM_MAX_COMPLETELY_PEEL_TIMES,
  	"max-completely-peel-times",
Index: tree-ssa-loop-ch.c
===================================================================
RCS file: /cvsroot/gcc/gcc/gcc/tree-ssa-loop-ch.c,v
retrieving revision 2.18
diff -c -3 -p -r2.18 tree-ssa-loop-ch.c
*** tree-ssa-loop-ch.c	25 Jun 2005 02:01:39 -0000	2.18
--- tree-ssa-loop-ch.c	1 Jul 2005 23:08:11 -0000
*************** should_duplicate_loop_header_p (basic_bl
*** 76,82 ****
      return false;
  
    /* Approximately copy the conditions that used to be used in jump.c --
!      at most 20 insns and no calls.  */
    for (bsi = bsi_start (header); !bsi_end_p (bsi); bsi_next (&bsi))
      {
        last = bsi_stmt (bsi);
--- 76,82 ----
      return false;
  
    /* Approximately copy the conditions that used to be used in jump.c --
!      at most 30 insns and no calls.  */
    for (bsi = bsi_start (header); !bsi_end_p (bsi); bsi_next (&bsi))
      {
        last = bsi_stmt (bsi);
*************** copy_loop_headers (void)
*** 150,157 ****
  
    for (i = 1; i < loops->num; i++)
      {
!       /* Copy at most 20 insns.  */
!       int limit = 20;
  
        loop = loops->parray[i];
        if (!loop)
--- 150,157 ----
  
    for (i = 1; i < loops->num; i++)
      {
!       /* Copy at most 30 insns.  */
!       int limit = 30;
  
        loop = loops->parray[i];
        if (!loop)

[-- Attachment #4: Type: text/plain, Size: 1 bytes --]



^ permalink raw reply	[flat|nested] 6+ messages in thread
* Re: RFC: improving estimate_num_insns
@ 2005-07-13  2:44 Richard Kenner
  0 siblings, 0 replies; 6+ messages in thread
From: Richard Kenner @ 2005-07-13  2:44 UTC (permalink / raw)
  To: stevenb; +Cc: gcc

    First of all, I think you should handle ARRAY_RANGE_REF and ARRAY_REF
    the same.

I don't think so.  The latter is a memory reference while the former
is more like a NOP_EXPR: it's basically just creating a new array, which
can then either be indexed or pointed to.

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2005-07-13  8:47 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2005-07-12 17:56 RFC: improving estimate_num_insns Josh Conner
2005-07-12 20:07 ` Steven Bosscher
2005-07-13  2:31   ` Josh Conner
2005-07-13  7:59     ` Steven Bosscher
2005-07-13  8:47       ` Richard Guenther
2005-07-13  2:44 Richard Kenner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).