public inbox for gcc@gcc.gnu.org
* Re: Inliner parameters
@ 2003-04-16 21:38 Richard Guenther
  2003-04-17  3:11 ` Steven Bosscher
  0 siblings, 1 reply; 7+ messages in thread
From: Richard Guenther @ 2003-04-16 21:38 UTC
  To: Steven Bosscher; +Cc: gcc

> Hi,
>
> Many 3.3 compile time regressions from 3.2 seem to be caused by the new
> tree inliner heuristics.  There are at least a few PRs for which the

They may be related to the EH reorganization as well, as noted e.g. in
PR 10196.

> inliner apparently causes serious compile time regressions (PRs 10160,
> 10196, and 10316 and part of 8361 as well).  Obviously those PRs are
> just the tip of the proverbial iceberg...
>
> All these compile time regressions disappear when the inlining limit is
> changed.  It looks like the 3.3 inliner is much more aggressive, but I
> have not seen any data to confirm this.
>
> When was the last time somebody tried to tune the parameters a bit?  Did
> anyone try the effects of different parameter settings for, say, SPEC
> and POOMA (and, ideally, on more than one platform)?

I tried various parameters for POOMA to tune the performance of the
optimized code and the key parameter to change was min-inline-insns.
This is _way_ too low for POOMA to collapse the expression template
trees. I need to bump this up to 250 to get good performance. The
max-inline-insns-single can be dropped to 250 without loss then.

Maybe the insn counting for small C++ template methods is just way off
compared to C programs? Maybe we should have different default parameters
for C and C++ programs? And we should certainly get -funit-at-a-time
working for C++ in mainline.

Richard.

* Re: Inliner parameters
  2003-04-16 21:38 Inliner parameters Richard Guenther
@ 2003-04-17  3:11 ` Steven Bosscher
  2003-04-17 10:50   ` Richard Guenther
                     ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Steven Bosscher @ 2003-04-17  3:11 UTC
  To: Richard Guenther; +Cc: gcc

On Wed 16-04-2003, at 23:17, Richard Guenther wrote:
> > Hi,
> >
> > Many 3.3 compile time regressions from 3.2 seem to be caused by the new
> > tree inliner heuristics.  There are at least a few PRs for which the
> 
> They may be related to the EH reorganization as well, as noted e.g. in
> PR 10196.

Yes, but the stuff you're inlining throws everywhere, from recursively
inlined functions.  So if you do EH, you're inlining lots of throws. 
Remember that 3.2 could do EH _and_ inlining about 12 times as fast as
3.3 for your test case.  Has EH slowed down so much from 3.2 to 3.3 that
it would explain a twelvefold compiler slowdown (and even worse before
Mark's fixup_thingy O(t^2) fix that is not even on the 3.2 branch)?  I
sure hope not.


> > When was the last time somebody tried to tune the parameters a bit?  Did
> > anyone try the effects of different parameter settings for, say, SPEC
> > and POOMA (and, ideally, on more than one platform)?
> 
> I tried various parameters for POOMA to tune the performance of the
> optimized code and the key parameter to change was min-inline-insns.
> This is _way_ too low for POOMA to collapse the expression template
> trees. I need to bump this up to 250 to get good performance. The
> max-inline-insns-single can be dropped to 250 without loss then.

What happened to the compile times with bigger min-inline-insns?

Have you also tried different values of max-inline-slope?  Larger values
of that param should also increase the number of inlined functions.

> Maybe the insn counting for small C++ template methods is just way off
> compared to C programs? Maybe we should have different default parameters
> for C and C++ programs?

It has been suggested before that C and C++ require different inline
parameters.  INSN counting with a flat INSNS_PER_STMT seems a bit crude.
Maybe we should play with that a bit and see if the default value of 10
makes sense for C++...  What is that number based on anyway?  Maybe the
number of INSNs could be varied per tree code.  Surely, on the tree-ssa
branch with GIMPLE it should be possible to make a better estimate of
the number of insns per statement?
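To make the flat-versus-per-tree-code idea concrete, here is a minimal C sketch. It assumes the 3.3 estimate is simply the statement count times INSNS_PER_STMT (10, as mentioned above); the statement kinds and the per-kind weights are hypothetical, invented purely for illustration, and do not come from GCC.

```c
#include <assert.h>
#include <stddef.h>

/* Flat rule: every statement costs INSNS_PER_STMT insns (10 per the
   thread).  */
#define INSNS_PER_STMT 10

/* Hypothetical statement kinds standing in for tree codes.  */
enum stmt_kind { STMT_DECL, STMT_RETURN, STMT_EXPR, STMT_CALL, STMT_KIND_COUNT };

/* Made-up per-kind costs, for illustration only.  */
static const int insns_per_kind[STMT_KIND_COUNT] = {
  [STMT_DECL] = 0,
  [STMT_RETURN] = 1,
  [STMT_EXPR] = 4,
  [STMT_CALL] = 12
};

/* Size estimate using one constant for every statement.  */
int flat_estimate (size_t n_stmts)
{
  return (int) n_stmts * INSNS_PER_STMT;
}

/* Size estimate summing a (hypothetical) cost per statement kind.  */
int weighted_estimate (const enum stmt_kind *stmts, size_t n_stmts)
{
  int total = 0;

  for (size_t i = 0; i < n_stmts; i++)
    {
      assert (stmts[i] < STMT_KIND_COUNT);
      total += insns_per_kind[stmts[i]];
    }
  return total;
}
```

With these made-up weights, a two-statement accessor (a declaration plus a return) is charged 20 insns by the flat rule but only 1 by the weighted one, illustrating how a flat rule could overcharge small C++ template methods.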

> And we should certainly get -funit-at-a-time
> working for C++ in mainline.

That would be really cool, yes.

Greetz
Steven

* Re: Inliner parameters
  2003-04-17  3:11 ` Steven Bosscher
  2003-04-17 10:50   ` Richard Guenther
@ 2003-04-17 10:50   ` Richard Guenther
  2003-04-17 13:29   ` Jan Hubicka
  2 siblings, 0 replies; 7+ messages in thread
From: Richard Guenther @ 2003-04-17 10:50 UTC
  To: Steven Bosscher; +Cc: gcc

On 17 Apr 2003, Steven Bosscher wrote:

> On Wed 16-04-2003, at 23:17, Richard Guenther wrote:
>
> > > When was the last time somebody tried to tune the parameters a bit?  Did
> > > anyone try the effects of different parameter settings for, say, SPEC
> > > and POOMA (and, ideally, on more than one platform)?
> >
> > I tried various parameters for POOMA to tune the performance of the
> > optimized code and the key parameter to change was min-inline-insns.
> > This is _way_ too low for POOMA to collapse the expression template
> > trees. I need to bump this up to 250 to get good performance. The
> > max-inline-insns-single can be dropped to 250 without loss then.
>
> What happened to the compile times with bigger min-inline-insns?

The following tests are for g++-3.3 (GCC) 3.3 20030414 (prerelease)

Compiling my worst-case example with -O2 -fno-exceptions --param
min-inline-insns=X gives

   X     compile-time
  50       11.25   [uses 112MB ram max]
 100       11.25   [112MB]
 150       11.50   [112MB]
 200       11.50   [113MB]
 250       12.50   [110MB]
 300       12.75   [110MB]

g++-3.2 (GCC) 3.2.3 20030414 (prerelease): 9.50   [73MB]

and with -O2 --param min-inline-insns=X (EH turned on)

   X     compile-time
  50      178.75   [uses 359MB ram max!]
 100      178.00   [368MB]
 150      179.75   [363MB]
 200      180.25   [363MB]
 250      killed after 4min [559MB - we're swapping ourselves to death...;
          it seems we're starting to inline EH stuff here]
 300      wouldn't be any better, I suppose

g++-3.2 (GCC) 3.2.3 20030414 (prerelease): 11.00  [81MB]

Note the difference between the runs with and without EH, especially in
the memory requirements (the machine has 512MB of RAM).
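A quick way to quantify that difference is to divide each 3.3 figure by the matching g++-3.2 baseline from the runs above. The helper below is just illustrative arithmetic on the reported X=50 numbers; slowdown() is a name made up for this sketch.

```c
#include <assert.h>

/* Ratio of a g++-3.3 measurement to the matching g++-3.2 baseline.
   Works for compile time and for peak memory alike.  */
double slowdown (double gcc33, double gcc32)
{
  assert (gcc32 > 0.0);
  return gcc33 / gcc32;
}
```

Plugging in the X=50 rows: without EH, slowdown(11.25, 9.50) is about 1.18 in time and slowdown(112, 73) about 1.53 in memory; with EH, slowdown(178.75, 11.00) is 16.25 in time and slowdown(359, 81) about 4.43 in memory.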

So I really think we still have a problem with EH...

And I assume we can safely raise min-inline-insns at least for C++ code.

I'll post numbers of my performance test as a follow-up (this one needs
the min-inline-insns bumped and so may show different compile time
characteristics).

> Have you also tried different values of max-inline-slope?  Larger values
> of that param should also increase the number of inlined functions.

Yes, but apart from setting it to a gazillion, the value required to get
well-performing programs depends on the actual code being compiled. For
instance, I usually get empty destructors and similarly simple stuff not
inlined if the nesting level is just deep enough. This doesn't happen
when adjusting min-inline-insns.
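One way to see why raising min-inline-insns behaves differently from raising max-inline-slope is a simplified model of the size limit. The sketch below assumes (this is an assumption, not a transcription of tree-inline.c) that the allowed callee size starts at max-inline-insns-single, shrinks linearly once more than max-inline-insns insns have already been inlined into the caller, at a rate set by max-inline-slope, and never falls below min-inline-insns. The struct, field and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-ins for the --param knobs discussed in this thread.
   The formula is a simplified model, not GCC's actual code.  */
struct inline_params
{
  int max_inline_insns_single;  /* starting limit for one callee */
  int max_inline_insns;         /* inlined insns before the limit shrinks */
  int max_inline_slope;         /* larger => limit shrinks more slowly */
  int min_inline_insns;         /* floor: callees this small always fit */
};

/* Size limit for the next callee once INLINED_SO_FAR insns have already
   been inlined into the current function (deep nesting => large value).  */
int effective_limit (const struct inline_params *p, int inlined_so_far)
{
  int limit = p->max_inline_insns_single;

  assert (p->max_inline_slope > 0);
  if (inlined_so_far > p->max_inline_insns)
    limit -= (inlined_so_far - p->max_inline_insns) / p->max_inline_slope;
  return limit > p->min_inline_insns ? limit : p->min_inline_insns;
}

/* Would a callee of CALLEE_INSNS insns be inlined at this depth?  */
bool would_inline (const struct inline_params *p, int callee_insns,
                   int inlined_so_far)
{
  return callee_insns <= effective_limit (p, inlined_so_far);
}
```

With the illustrative settings {300, 300, 32, 130}, after 6700 inlined insns the linear term has driven the limit down to 100, so the floor of 130 applies and a 200-insn callee is rejected; raising min_inline_insns to 250 admits it no matter how deep the nesting is, whereas a larger slope in this model only postpones the cutoff.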

> > Maybe the insn counting for small C++ template methods is just way off
> > compared to C programs? Maybe we should have different default parameters
> > for C and C++ programs?
>
> It has been suggested before that C and C++ require different inline
> parameters.  INSN counting with a flat INSNS_PER_STMT seems a bit crude.
> Maybe we should play with that a bit and see if the default value of 10
> makes sense for C++...  What is that number based on anyway?  Maybe the
> number of INSNs could be varied per tree code.  Surely, on the tree-ssa
> branch with GIMPLE it should be possible to make a better estimate of
> the number of insns per statement?

Also, do we account for constant parameters passed to to-be-inlined
functions? If not, we're over-estimating the number of resulting
instructions for such calls.
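To illustrate the over-estimation concern: in the hypothetical callee below, an estimator that ignores argument values has to charge for all four switch arms, yet once a call with a constant mode is inlined, constant propagation can reduce the body to a single arm. scale() and caller() are made-up names for this sketch.

```c
/* Hypothetical callee: with an unknown MODE all four arms must be kept.  */
static inline int scale (int x, int mode)
{
  switch (mode)
    {
    case 0:
      return x;
    case 1:
      return x * 2;
    case 2:
      return x * x;
    default:
      return -x;
    }
}

/* Hypothetical call site passing a constant: after inlining, constant
   propagation can fold the switch so only "x * 2" remains, although a
   size estimate that ignores the argument charges for the whole body.  */
int caller (int x)
{
  return scale (x, 1);
}
```

Here caller(5) returns 10; after inlining, only the x * 2 arm would need to survive, so counting the whole switch overstates the cost of that call site.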

Richard.

--
Richard Guenther <richard dot guenther at uni-tuebingen dot de>
WWW: http://www.tat.physik.uni-tuebingen.de/~rguenth/

* Re: Inliner parameters
  2003-04-17  3:11 ` Steven Bosscher
@ 2003-04-17 10:50   ` Richard Guenther
  2003-04-23 12:21     ` PR 10196 / " Richard Guenther
  2003-04-17 10:50   ` Richard Guenther
  2003-04-17 13:29   ` Jan Hubicka
  2 siblings, 1 reply; 7+ messages in thread
From: Richard Guenther @ 2003-04-17 10:50 UTC
  To: Steven Bosscher; +Cc: gcc

On 17 Apr 2003, Steven Bosscher wrote:

> On Wed 16-04-2003, at 23:17, Richard Guenther wrote:
> > > When was the last time somebody tried to tune the parameters a bit?  Did
> > > anyone try the effects of different parameter settings for, say, SPEC
> > > and POOMA (and, ideally, on more than one platform)?
> >
> > I tried various parameters for POOMA to tune the performance of the
> > optimized code and the key parameter to change was min-inline-insns.
> > This is _way_ too low for POOMA to collapse the expression template
> > trees. I need to bump this up to 250 to get good performance. The
> > max-inline-insns-single can be dropped to 250 without loss then.
>
> What happened to the compile times with bigger min-inline-insns?

Here are compile time and runtime numbers for my performance testcase using
g++-3.3 (GCC) 3.3 20030414 (prerelease) with options -O2 -march=athlon
-fomit-frame-pointer -funroll-loops -fno-exceptions --param min-inline-insns=X.
Lower numbers for the perf. indicator are better.

  X      compile-time    performance indicator
default     49.50           1.99804e-06
 50         50.25           2.26817e-06
100         50.00           1.96918e-06
150         51.00           1.90269e-06
200         58.25           1.83045e-06
250         61.25           1.28309e-06
300         62.75           1.29364e-06
default + -Dinline="__inline__ __attribute__((always_inline))"
            50.50           1.31171e-06
(even though the source is not tuned for the inline->always_inline
transformation)
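To put the table in perspective, the helpers below compute the runtime speedup (the indicator is lower-is-better, so baseline divided by new) and the relative compile-time cost against the default row. The function names are made up for this sketch; the inputs are the numbers reported above.

```c
#include <assert.h>

/* Lower indicator is better, so speedup = baseline / new.  */
double speedup (double base_indicator, double new_indicator)
{
  assert (new_indicator > 0.0);
  return base_indicator / new_indicator;
}

/* Relative compile-time cost: new time / baseline time.  */
double compile_cost (double base_time, double new_time)
{
  assert (base_time > 0.0);
  return new_time / base_time;
}
```

For X=250, speedup(1.99804e-06, 1.28309e-06) is about 1.56 at a compile_cost(49.50, 61.25) of about 1.24; the always_inline variant gives a speedup of about 1.52 for a compile cost of about 1.02.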

Just to show what happens with EH on, for the best parameter value above
(250) we get

250         [again goes into swap; 3min had elapsed by then]
Killed it and moved to a machine with 2GB of RAM and more GHz, where of
course the compile times can't be compared with the numbers above...
250        448.00 [uses 750MB of ram]  5.96641e-07
which is more than an order of magnitude worse than without EH, even on
a faster CPU with faster memory... ugh!

Richard.

--
Richard Guenther <richard dot guenther at uni-tuebingen dot de>
WWW: http://www.tat.physik.uni-tuebingen.de/~rguenth/

* Re: Inliner parameters
  2003-04-17  3:11 ` Steven Bosscher
  2003-04-17 10:50   ` Richard Guenther
  2003-04-17 10:50   ` Richard Guenther
@ 2003-04-17 13:29   ` Jan Hubicka
  2 siblings, 0 replies; 7+ messages in thread
From: Jan Hubicka @ 2003-04-17 13:29 UTC
  To: Steven Bosscher; +Cc: Richard Guenther, gcc

> > And we should certainly get -funit-at-a-time
> > working for C++ in mainline.
> 
> That would be really cool, yes.
I really hope I will be able to do some work on it this week (once I
get past the bug reports that have accumulated).  I am attaching the WIP
patch in case someone wants to take a look.  It works for simple cases
but fails to build some functions when iostream.h is included; I'm not
sure where I'm missing them.

Honza
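The patch moves cgraph_mark_needed_node into cgraph.c and turns its queue into the global cgraph_nodes_queue, so that the C++ front end (finish_objects) and assemble_name in varasm.c can feed it. The underlying worklist idea can be sketched in plain C as below. This is a heavily simplified model, not the GCC code: struct node, mark_needed and process_queue are hypothetical stand-ins, and the step where lowering marks callees reachable is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for struct cgraph_node.  */
struct node
{
  const char *name;
  bool needed;                /* must be output */
  bool reachable;             /* seen from some needed node */
  bool lowered;               /* already processed */
  bool has_body;              /* like DECL_SAVED_TREE being non-NULL */
  struct node *queue_next;    /* plays the role of node->aux */
  struct node **callees;      /* NULL-terminated array, or NULL */
};

/* Like cgraph_nodes_queue: a LIFO list threaded through queue_next.  */
static struct node *queue;

/* Modelled on cgraph_mark_needed_node from the patch: record NEEDED, and
   enqueue the node the first time it becomes reachable if it has a body.  */
void mark_needed (struct node *n, bool needed)
{
  if (needed)
    n->needed = true;
  if (!n->reachable)
    {
      n->reachable = true;
      if (n->has_body)
        {
          n->queue_next = queue;
          queue = n;
        }
    }
}

/* Modelled on the queue loop in cgraph_finalize_compilation_unit: pop a
   node, "lower" it and (assumption of this sketch) mark its callees
   reachable but not needed, which may enqueue them in turn.  Returns the
   number of nodes lowered.  */
int process_queue (void)
{
  int lowered = 0;

  while (queue)
    {
      struct node *n = queue;

      queue = n->queue_next;
      assert (!n->lowered && n->reachable && n->has_body);
      n->lowered = true;
      lowered++;
      for (struct node **c = n->callees; c && *c; c++)
        mark_needed (*c, false);
    }
  return lowered;
}
```

Marking only a main-like function as needed then lowers it together with everything it transitively calls, while a function nobody references is never queued; callees reached this way end up reachable but not needed.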

Index: cgraph.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/cgraph.c,v
retrieving revision 1.9
diff -c -3 -p -r1.9 cgraph.c
*** cgraph.c	8 Mar 2003 18:24:22 -0000	1.9
--- cgraph.c	12 Mar 2003 17:56:11 -0000
*************** static htab_t cgraph_hash = 0;
*** 48,53 ****
--- 48,56 ----
  /* The linked list of cgraph nodes.  */
  struct cgraph_node *cgraph_nodes;
  
+ /* Queue of cgraph nodes scheduled to be lowered.  */
+ struct cgraph_node *cgraph_nodes_queue;
+ 
  /* Number of nodes in existence.  */
  int cgraph_n_nodes;
  
*************** eq_node (p1, p2)
*** 79,85 ****
       const PTR p2;
  {
    return ((DECL_ASSEMBLER_NAME (((struct cgraph_node *) p1)->decl)) ==
! 	  DECL_ASSEMBLER_NAME ((tree) p2));
  }
  
  /* Return cgraph node assigned to DECL.  Create new one when needed.  */
--- 82,88 ----
       const PTR p2;
  {
    return ((DECL_ASSEMBLER_NAME (((struct cgraph_node *) p1)->decl)) ==
! 	  (tree) p2);
  }
  
  /* Return cgraph node assigned to DECL.  Create new one when needed.  */
*************** cgraph_node (decl)
*** 100,106 ****
      }
  
    slot =
!     (struct cgraph_node **) htab_find_slot_with_hash (cgraph_hash, decl,
  						      htab_hash_pointer
  						      (DECL_ASSEMBLER_NAME
  						       (decl)), 1);
--- 103,110 ----
      }
  
    slot =
!     (struct cgraph_node **) htab_find_slot_with_hash (cgraph_hash,
! 						      DECL_ASSEMBLER_NAME (decl),
  						      htab_hash_pointer
  						      (DECL_ASSEMBLER_NAME
  						       (decl)), 1);
*************** cgraph_node (decl)
*** 125,130 ****
--- 129,158 ----
    return node;
  }
  
+ /* Try to find existing function for identifier ID.  */
+ struct cgraph_node *
+ cgraph_node_for_identifier (id)
+      tree id;
+ {
+   struct cgraph_node **slot;
+ 
+   if (TREE_CODE (id) != IDENTIFIER_NODE)
+     abort ();
+ 
+   if (!cgraph_hash)
+     {
+       cgraph_hash = htab_create (10, hash_node, eq_node, NULL);
+       VARRAY_TREE_INIT (known_fns, 32, "known_fns");
+     }
+ 
+   slot =
+     (struct cgraph_node **) htab_find_slot_with_hash (cgraph_hash, id,
+ 						      htab_hash_pointer (id), 0);
+   if (!slot)
+     return NULL;
+   return *slot;
+ }
+ 
  /* Create edge from CALLER to CALLEE in the cgraph.  */
  
  static struct cgraph_edge *
*************** cgraph_remove_node (node)
*** 192,197 ****
--- 220,249 ----
      node->next->previous = node->previous;
    DECL_SAVED_TREE (node->decl) = NULL;
    /* Do not free the structure itself so the walk over chain can continue.  */
+ }
+ 
+ /* Notify finalize_compilation_unit that given node is reachable
+    or needed.  */
+ void
+ cgraph_mark_needed_node (node, needed)
+      struct cgraph_node *node;
+      int needed;
+ {
+   if (needed)
+     {
+       if (DECL_SAVED_TREE (node->decl))
+         announce_function (node->decl);
+       node->needed = 1;
+     }
+   if (!node->reachable)
+     {
+       node->reachable = 1;
+       if (DECL_SAVED_TREE (node->decl))
+ 	{
+ 	  node->aux = cgraph_nodes_queue;
+ 	  cgraph_nodes_queue = node;
+         }
+     }
  }
  
  
Index: cgraph.h
===================================================================
RCS file: /cvs/gcc/gcc/gcc/cgraph.h,v
retrieving revision 1.4
diff -c -3 -p -r1.4 cgraph.h
*** cgraph.h	8 Mar 2003 13:26:35 -0000	1.4
--- cgraph.h	12 Mar 2003 17:56:11 -0000
*************** struct cgraph_edge
*** 100,105 ****
--- 100,106 ----
  extern struct cgraph_node *cgraph_nodes;
  extern int cgraph_n_nodes;
  extern bool cgraph_global_info_ready;
+ extern struct cgraph_node *cgraph_nodes_queue;
  
  /* In cgraph.c  */
  void dump_cgraph			PARAMS ((FILE *));
*************** void cgraph_remove_call			PARAMS ((tree,
*** 107,112 ****
--- 108,114 ----
  void cgraph_remove_node			PARAMS ((struct cgraph_node *));
  struct cgraph_edge *cgraph_record_call	PARAMS ((tree, tree));
  struct cgraph_node *cgraph_node		PARAMS ((tree decl));
+ struct cgraph_node *cgraph_node_for_identifier	PARAMS ((tree id));
  bool cgraph_calls_p			PARAMS ((tree, tree));
  struct cgraph_local_info *cgraph_local_info PARAMS ((tree));
  struct cgraph_global_info *cgraph_global_info PARAMS ((tree));
Index: cgraphunit.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/cgraphunit.c,v
retrieving revision 1.3
diff -c -3 -p -r1.3 cgraphunit.c
*** cgraphunit.c	8 Mar 2003 13:26:35 -0000	1.3
--- cgraphunit.c	12 Mar 2003 17:56:11 -0000
*************** cgraph_finalize_function (decl, body)
*** 64,95 ****
    (*debug_hooks->deferred_inline_function) (decl);
  }
  
- static struct cgraph_node *queue = NULL;
- 
- /* Notify finalize_compilation_unit that given node is reachable
-    or needed.  */
- void
- cgraph_mark_needed_node (node, needed)
-      struct cgraph_node *node;
-      int needed;
- {
-   if (needed)
-     {
-       if (DECL_SAVED_TREE (node->decl))
-         announce_function (node->decl);
-       node->needed = 1;
-     }
-   if (!node->reachable)
-     {
-       node->reachable = 1;
-       if (DECL_SAVED_TREE (node->decl))
- 	{
- 	  node->aux = queue;
- 	  queue = node;
-         }
-     }
- }
- 
  /* Walk tree and record all calls.  Called via walk_tree.  */
  static tree
  record_call_1 (tp, walk_subtrees, data)
--- 64,69 ----
*************** cgraph_finalize_compilation_unit ()
*** 163,174 ****
        functions.  In the future, lowering will introduce new functions and
        new entry points on the way (by template instantiation and virtual
        method table generation for instance).  */
!   while (queue)
      {
!       tree decl = queue->decl;
  
!       node = queue;
!       queue = queue->aux;
        if (node->lowered || !node->reachable || !DECL_SAVED_TREE (decl))
  	abort ();
  
--- 137,148 ----
        functions.  In the future, lowering will introduce new functions and
        new entry points on the way (by template instantiation and virtual
        method table generation for instance).  */
!   while (cgraph_nodes_queue)
      {
!       tree decl = cgraph_nodes_queue->decl;
  
!       node = cgraph_nodes_queue;
!       cgraph_nodes_queue = cgraph_nodes_queue->aux;
        if (node->lowered || !node->reachable || !DECL_SAVED_TREE (decl))
  	abort ();
  
Index: varasm.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/varasm.c,v
retrieving revision 1.333
diff -c -3 -p -r1.333 varasm.c
*** varasm.c	9 Mar 2003 20:41:30 -0000	1.333
--- varasm.c	12 Mar 2003 17:56:14 -0000
*************** Software Foundation, 59 Temple Place - S
*** 49,54 ****
--- 49,55 ----
  #include "tm_p.h"
  #include "debug.h"
  #include "target.h"
+ #include "cgraph.h"
  
  #ifdef XCOFF_DEBUGGING_INFO
  #include "xcoffout.h"		/* Needed for external data
*************** assemble_name (file, name)
*** 1734,1740 ****
  
    id = maybe_get_identifier (real_name);
    if (id)
!     TREE_SYMBOL_REFERENCED (id) = 1;
  
    if (name[0] == '*')
      fputs (&name[1], file);
--- 1735,1750 ----
  
    id = maybe_get_identifier (real_name);
    if (id)
!     {
!       if (!TREE_SYMBOL_REFERENCED (id)
! 	  && !cgraph_global_info_ready)
! 	{
! 	  struct cgraph_node *node = cgraph_node_for_identifier (id);
! 	  if (node)
! 	    cgraph_mark_needed_node (node, 1);
! 	}
!       TREE_SYMBOL_REFERENCED (id) = 1;
!     }
  
    if (name[0] == '*')
      fputs (&name[1], file);
Index: cp/cp-lang.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/cp/cp-lang.c,v
retrieving revision 1.48
diff -c -3 -p -r1.48 cp-lang.c
*** cp/cp-lang.c	10 Mar 2003 07:26:33 -0000	1.48
--- cp/cp-lang.c	12 Mar 2003 17:56:15 -0000
*************** static bool cp_var_mod_type_p (tree);
*** 142,147 ****
--- 142,152 ----
  #undef LANG_HOOKS_EXPR_SIZE
  #define LANG_HOOKS_EXPR_SIZE cp_expr_size
  
+ #undef LANG_HOOKS_CALLGRAPH_EXPAND_FUNCTION
+ #define LANG_HOOKS_CALLGRAPH_EXPAND_FUNCTION really_expand_body
+ #undef LANG_HOOKS_CALLGRAPH_LOWER_FUNCTION
+ #define LANG_HOOKS_CALLGRAPH_LOWER_FUNCTION lower_function
+ 
  #undef LANG_HOOKS_MAKE_TYPE
  #define LANG_HOOKS_MAKE_TYPE cxx_make_type
  #undef LANG_HOOKS_TYPE_FOR_MODE
Index: cp/cp-tree.h
===================================================================
RCS file: /cvs/gcc/gcc/gcc/cp/cp-tree.h,v
retrieving revision 1.824
diff -c -3 -p -r1.824 cp-tree.h
*** cp/cp-tree.h	11 Mar 2003 15:43:14 -0000	1.824
--- cp/cp-tree.h	12 Mar 2003 17:56:15 -0000
*************** extern tree build_artificial_parm (tree,
*** 3854,3859 ****
--- 3854,3860 ----
  extern tree get_guard (tree);
  extern tree get_guard_cond (tree);
  extern tree set_guard (tree);
+ extern void lower_function (tree);
  
  extern void cp_error_at		(const char *msgid, ...);
  extern void cp_warning_at	(const char *msgid, ...);
*************** extern void clear_out_block             
*** 4206,4211 ****
--- 4207,4213 ----
  extern tree begin_global_stmt_expr              (void);
  extern tree finish_global_stmt_expr             (tree);
  extern tree check_template_template_default_arg (tree);
+ extern void really_expand_body			(tree);
  
  /* in tree.c */
  extern void lang_check_failed			(const char *, int,
Index: cp/decl2.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/cp/decl2.c,v
retrieving revision 1.604
diff -c -3 -p -r1.604 decl2.c
*** cp/decl2.c	11 Mar 2003 15:43:14 -0000	1.604
--- cp/decl2.c	12 Mar 2003 17:56:15 -0000
*************** Boston, MA 02111-1307, USA.  */
*** 46,51 ****
--- 46,53 ----
  #include "cpplib.h"
  #include "target.h"
  #include "c-common.h"
+ #include "cgraph.h"
+ #include "tree-inline.h"
  extern cpp_reader *parse_in;
  
  /* This structure contains information about the initializations
*************** finish_objects (int method_type, int ini
*** 2018,2023 ****
--- 2020,2026 ----
    finish_compound_stmt (/*has_no_scope=*/0, body);
    fn = finish_function (0);
    expand_body (fn);
+   cgraph_mark_needed_node (cgraph_node (fn), 1);
  
    /* When only doing semantic analysis, and no RTL generation, we
       can't call functions that directly emit assembly code; there is
*************** start_static_storage_duration_function (
*** 2177,2182 ****
--- 2180,2186 ----
  static void
  finish_static_storage_duration_function (tree body)
  {
+   tree decl;
    /* Close out the function.  */
    finish_compound_stmt (/*has_no_scope=*/0, body);
    expand_body (finish_function (0));
*************** generate_ctor_and_dtor_functions_for_pri
*** 2544,2549 ****
--- 2548,2717 ----
    return 0;
  }
  
+ /* Mark all references to static declaration as used.  Called via walk_tree.  */
+ static tree
+ mark_used_decls (tree *tp, int *walk_subtrees ATTRIBUTE_UNUSED,
+ 		 void *data ATTRIBUTE_UNUSED)
+ {
+   if (DECL_P (*tp) && TREE_STATIC (*tp) && DECL_ASSEMBLER_NAME_SET_P (*tp)
+       && TREE_CODE (*tp) != FUNCTION_DECL)
+     TREE_SYMBOL_REFERENCED (DECL_ASSEMBLER_NAME (*tp)) = 1;
+   return 0;
+ }
+ 
+ /* Called via LANGHOOK_CALLGRAPH_LOWER_FUNCTION.  Make all calls to function explicit
+    and mark all data references by the function to be output.  */
+ void
+ lower_function (tree fn)
+ {
+   bool reconsider = true;
+   tree t;
+   tree vars;
+   unsigned int i;
+ 
+   import_export_decl (fn);
+   
+   /* Does it need synthesizing?  */
+   if (DECL_ARTIFICIAL (fn) && ! DECL_INITIAL (fn)
+       && TREE_USED (fn)
+       && (! DECL_REALLY_EXTERN (fn) || DECL_INLINE (fn)))
+     {
+       /* Even though we're already at the top-level, we push
+ 	 there again.  That way, when we pop back a few lines
+ 	 hence, all of our state is restored.  Otherwise,
+ 	 finish_function doesn't clean things up, and we end
+ 	 up with CURRENT_FUNCTION_DECL set.  */
+       push_to_top_level ();
+       synthesize_method (fn);
+       pop_from_top_level ();
+       reconsider = true;
+     }
+ 
+   /* Assume that all referenced static variables will actually be used.
+      This is overactive and will need to be improved using AST.  */
+   walk_tree (&DECL_SAVED_TREE (fn), mark_used_decls, NULL, NULL);
+ 
+   while (reconsider)
+     {
+       reconsider = false;
+ 
+       /* If there are templates that we've put off instantiating, do
+ 	 them now.  */
+       instantiate_pending_templates ();
+ 
+       /* Write out virtual tables as required.  Note that writing out
+   	 the virtual table for a template class may cause the
+  	 instantiation of members of that class.  If we write out
+  	 vtables then we remove the class from our list so we don't
+  	 have to look at it again. */
+  
+       while (keyed_classes != NULL_TREE
+  	     && maybe_emit_vtables (TREE_VALUE (keyed_classes)))
+  	{
+   	  reconsider = 1;
+  	  keyed_classes = TREE_CHAIN (keyed_classes);
+  	}
+  
+       t = keyed_classes;
+       if (t != NULL_TREE)
+  	{
+  	  tree next = TREE_CHAIN (t);
+  
+  	  while (next)
+  	    {
+  	      if (maybe_emit_vtables (TREE_VALUE (next)))
+  		{
+  		  reconsider = 1;
+  		  TREE_CHAIN (t) = TREE_CHAIN (next);
+  		}
+  	      else
+  		t = next;
+  
+  	      next = TREE_CHAIN (t);
+  	    }
+  	}
+        
+       /* Write out needed type info variables. Writing out one variable
+          might cause others to be needed.  */
+       if (walk_globals (unemitted_tinfo_decl_p, emit_tinfo_decl, /*data=*/0))
+ 	reconsider = true;
+ 
+       /* The list of objects with static storage duration is built up
+ 	 in reverse order.  We clear STATIC_AGGREGATES so that any new
+ 	 aggregates added during the initialization of these will be
+ 	 initialized in the correct order when we next come around the
+ 	 loop.  */
+       vars = prune_vars_needing_no_initialization (&static_aggregates);
+ 
+       if (vars)
+ 	{
+ 	  tree v;
+ 
+ 	  /* We need to start a new initialization function each time
+ 	     through the loop.  That's because we need to know which
+ 	     vtables have been referenced, and TREE_SYMBOL_REFERENCED
+ 	     isn't computed until a function is finished, and written
+ 	     out.  That's a deficiency in the back-end.  When this is
+ 	     fixed, these initialization functions could all become
+ 	     inline, with resulting performance improvements.  */
+ 	  tree ssdf_body = start_static_storage_duration_function ();
+ 
+ 	  /* Make sure the back end knows about all the variables.  */
+ 	  write_out_vars (vars);
+ 
+ 	  /* First generate code to do all the initializations.  */
+ 	  for (v = vars; v; v = TREE_CHAIN (v))
+ 	    do_static_initialization (TREE_VALUE (v),
+ 				      TREE_PURPOSE (v));
+ 
+ 	  /* Then, generate code to do all the destructions.  Do these
+ 	     in reverse order so that the most recently constructed
+ 	     variable is the first destroyed.  If we're using
+ 	     __cxa_atexit, then we don't need to do this; functions
+ 	     were registered at initialization time to destroy the
+ 	     local statics.  */
+ 	  if (!flag_use_cxa_atexit)
+ 	    {
+ 	      vars = nreverse (vars);
+ 	      for (v = vars; v; v = TREE_CHAIN (v))
+ 		do_static_destruction (TREE_VALUE (v));
+ 	    }
+ 	  else
+ 	    vars = NULL_TREE;
+ 
+ 	  /* Finish up the static storage duration function for this
+ 	     round.  */
+ 	  finish_static_storage_duration_function (ssdf_body);
+ 
+ 	  /* All those initializations and finalizations might cause
+ 	     us to need more inline functions, more template
+ 	     instantiations, etc.  */
+ 	  reconsider = true;
+ 	}
+       
+       if (deferred_fns_used
+ 	  && wrapup_global_declarations (&VARRAY_TREE (deferred_fns, 0),
+ 					 deferred_fns_used))
+ 	reconsider = true;
+       if (walk_namespaces (wrapup_globals_for_namespace, /*data= */ 0))
+ 	reconsider = true;
+       /* Static data members are just like namespace-scope globals.  */
+       for (i = 0; i < pending_statics_used; ++i)
+ 	{
+ 	  tree decl = VARRAY_TREE (pending_statics, i);
+ 	  if (TREE_ASM_WRITTEN (decl))
+ 	    continue;
+ 	  import_export_decl (decl);
+ 	  if (DECL_NOT_REALLY_EXTERN (decl) && !DECL_IN_AGGR_P (decl))
+ 	    DECL_EXTERNAL (decl) = 0;
+ 	}
+       if (pending_statics
+ 	  && wrapup_global_declarations (&VARRAY_TREE (pending_statics, 0),
+ 					 pending_statics_used))
+ 	reconsider = true;
+     }
+ }
+ 
  /* This routine is called from the last rule in yyparse ().
     Its job is to create all the code needed to initialize and
     destroy the global aggregates.  We do the destruction
*************** finish_file ()
*** 2736,2741 ****
--- 2904,2910 ----
  	  if (!DECL_EXTERNAL (decl)
  	      && DECL_NEEDED_P (decl)
  	      && DECL_SAVED_TREE (decl)
+ 	      && !flag_unit_at_a_time
  	      && !TREE_ASM_WRITTEN (decl))
  	    {
  	      int saved_not_really_extern;
*************** finish_file ()
*** 2783,2804 ****
      } 
    while (reconsider);
  
    /* All used inline functions must have a definition at this point. */
!   for (i = 0; i < deferred_fns_used; ++i)
!     {
!       tree decl = VARRAY_TREE (deferred_fns, i);
! 
!       if (TREE_USED (decl) && DECL_DECLARED_INLINE_P (decl)
! 	  && !(TREE_ASM_WRITTEN (decl) || DECL_SAVED_TREE (decl)))
! 	{
! 	  cp_warning_at ("inline function `%D' used but never defined", decl);
! 	  /* This symbol is effectively an "extern" declaration now.
! 	     This is not strictly necessary, but removes a duplicate
! 	     warning.  */
! 	  TREE_PUBLIC (decl) = 1;
! 	}
!       
!     }
    
    /* We give C linkage to static constructors and destructors.  */
    push_lang_context (lang_name_c);
--- 2952,2974 ----
      } 
    while (reconsider);
  
+   if (!flag_unit_at_a_time)
    /* All used inline functions must have a definition at this point. */
!     for (i = 0; i < deferred_fns_used; ++i)
!       {
! 	tree decl = VARRAY_TREE (deferred_fns, i);
! 
! 	if (TREE_USED (decl) && DECL_DECLARED_INLINE_P (decl)
! 	    && !(TREE_ASM_WRITTEN (decl) || DECL_SAVED_TREE (decl)))
! 	  {
! 	    cp_warning_at ("inline function `%D' used but never defined", decl);
! 	    /* This symbol is effectively an "extern" declaration now.
! 	       This is not strictly necessary, but removes a duplicate
! 	       warning.  */
! 	    TREE_PUBLIC (decl) = 1;
! 	  }
! 	
!       }
    
    /* We give C linkage to static constructors and destructors.  */
    push_lang_context (lang_name_c);
*************** finish_file ()
*** 2809,2814 ****
--- 2979,2989 ----
      splay_tree_foreach (priority_info_map, 
  			generate_ctor_and_dtor_functions_for_priority,
  			/*data=*/0);
+   if (flag_unit_at_a_time)
+     {
+       cgraph_finalize_compilation_unit ();
+       cgraph_optimize ();
+     }
  
    /* We're done with the splay-tree now.  */
    if (priority_info_map)
Index: cp/pt.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/cp/pt.c,v
retrieving revision 1.669
diff -c -3 -p -r1.669 pt.c
*** cp/pt.c	4 Mar 2003 01:13:38 -0000	1.669
--- cp/pt.c	12 Mar 2003 17:56:16 -0000
*************** instantiate_decl (d, defer_ok)
*** 10893,10899 ****
        local_specializations = saved_local_specializations;
  
        /* Finish the function.  */
!       expand_body (finish_function (0));
      }
  
    /* We're not deferring instantiation any more.  */
--- 10893,10899 ----
        local_specializations = saved_local_specializations;
  
        /* Finish the function.  */
!       really_expand_body (finish_function (0));
      }
  
    /* We're not deferring instantiation any more.  */
Index: cp/semantics.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/cp/semantics.c,v
retrieving revision 1.299
diff -c -3 -p -r1.299 semantics.c
*** cp/semantics.c	8 Mar 2003 18:47:41 -0000	1.299
--- cp/semantics.c	12 Mar 2003 17:56:17 -0000
***************
*** 41,46 ****
--- 41,47 ----
  #include "output.h"
  #include "timevar.h"
  #include "debug.h"
+ #include "cgraph.h"
  
  /* There routines provide a modular interface to perform many parsing
     operations.  They may therefore be used during actual parsing, or
*************** emit_associated_thunks (fn)
*** 2294,2372 ****
      }
  }
  
- /* Generate RTL for FN.  */
- 
  void
! expand_body (fn)
       tree fn;
  {
    int saved_lineno;
    const char *saved_input_filename;
    tree saved_function;
  
-   /* When the parser calls us after finishing the body of a template
-      function, we don't really want to expand the body.  When we're
-      processing an in-class definition of an inline function,
-      PROCESSING_TEMPLATE_DECL will no longer be set here, so we have
-      to look at the function itself.  */
-   if (processing_template_decl
-       || (DECL_LANG_SPECIFIC (fn) 
- 	  && DECL_TEMPLATE_INFO (fn)
- 	  && uses_template_parms (DECL_TI_ARGS (fn))))
-     {
-       /* Normally, collection only occurs in rest_of_compilation.  So,
- 	 if we don't collect here, we never collect junk generated
- 	 during the processing of templates until we hit a
- 	 non-template function.  */
-       ggc_collect ();
-       return;
-     }
- 
-   /* Replace AGGR_INIT_EXPRs with appropriate CALL_EXPRs.  */
-   walk_tree_without_duplicates (&DECL_SAVED_TREE (fn),
- 				simplify_aggr_init_exprs_r,
- 				NULL);
- 
-   /* If this is a constructor or destructor body, we have to clone
-      it.  */
-   if (maybe_clone_body (fn))
-     {
-       /* We don't want to process FN again, so pretend we've written
- 	 it out, even though we haven't.  */
-       TREE_ASM_WRITTEN (fn) = 1;
-       return;
-     }
- 
-   /* There's no reason to do any of the work here if we're only doing
-      semantic analysis; this code just generates RTL.  */
-   if (flag_syntax_only)
-     return;
- 
-   /* If possible, avoid generating RTL for this function.  Instead,
-      just record it as an inline function, and wait until end-of-file
-      to decide whether to write it out or not.  */
-   if (/* We have to generate RTL if it's not an inline function.  */
-       (DECL_INLINE (fn) || DECL_COMDAT (fn))
-       /* Or if we have to emit code for inline functions anyhow.  */
-       && !flag_keep_inline_functions
-       /* Or if we actually have a reference to the function.  */
-       && !DECL_NEEDED_P (fn))
-     {
-       /* Set DECL_EXTERNAL so that assemble_external will be called as
- 	 necessary.  We'll clear it again in finish_file.  */
-       if (!DECL_EXTERNAL (fn))
- 	{
- 	  DECL_NOT_REALLY_EXTERN (fn) = 1;
- 	  DECL_EXTERNAL (fn) = 1;
- 	}
-       /* Remember this function.  In finish_file we'll decide if
- 	 we actually need to write this function out.  */
-       defer_fn (fn);
-       /* Let the back-end know that this function exists.  */
-       (*debug_hooks->deferred_inline_function) (fn);
-       return;
-     }
- 
    /* Compute the appropriate object-file linkage for inline
       functions.  */
    if (DECL_DECLARED_INLINE_P (fn))
--- 2295,2308 ----
      }
  }
  
  void
! really_expand_body (fn)
       tree fn;
  {
    int saved_lineno;
    const char *saved_input_filename;
    tree saved_function;
  
    /* Compute the appropriate object-file linkage for inline
       functions.  */
    if (DECL_DECLARED_INLINE_P (fn))
*************** expand_body (fn)
*** 2438,2443 ****
--- 2374,2467 ----
  
    /* Emit any thunks that should be emitted at the same time as FN.  */
    emit_associated_thunks (fn);
+ }
+ 
+ /* Generate RTL for FN.  */
+ 
+ void
+ expand_body (fn)
+      tree fn;
+ {
+   /* When the parser calls us after finishing the body of a template
+      function, we don't really want to expand the body.  When we're
+      processing an in-class definition of an inline function,
+      PROCESSING_TEMPLATE_DECL will no longer be set here, so we have
+      to look at the function itself.  */
+   if (processing_template_decl
+       || (DECL_LANG_SPECIFIC (fn) 
+ 	  && DECL_TEMPLATE_INFO (fn)
+ 	  && uses_template_parms (DECL_TI_ARGS (fn))))
+     {
+       /* Normally, collection only occurs in rest_of_compilation.  So,
+ 	 if we don't collect here, we never collect junk generated
+ 	 during the processing of templates until we hit a
+ 	 non-template function.  */
+       ggc_collect ();
+       return;
+     }
+ 
+   /* Replace AGGR_INIT_EXPRs with appropriate CALL_EXPRs.  */
+   walk_tree_without_duplicates (&DECL_SAVED_TREE (fn),
+ 				simplify_aggr_init_exprs_r,
+ 				NULL);
+ 
+   /* If this is a constructor or destructor body, we have to clone
+      it.  */
+   if (maybe_clone_body (fn))
+     {
+       /* We don't want to process FN again, so pretend we've written
+ 	 it out, even though we haven't.  */
+       TREE_ASM_WRITTEN (fn) = 1;
+       return;
+     }
+ 
+   /* There's no reason to do any of the work here if we're only doing
+      semantic analysis; this code just generates RTL.  */
+   if (flag_syntax_only)
+     return;
+ 
+   if (flag_unit_at_a_time)
+     {
+ #if 0
+       /* Set DECL_EXTERNAL so that assemble_external will be called as
+ 	 necessary.  We'll clear it again in finish_file.  */
+       if (!DECL_EXTERNAL (fn))
+ 	{
+ 	  DECL_NOT_REALLY_EXTERN (fn) = 1;
+ 	  DECL_EXTERNAL (fn) = 1;
+ 	}
+ #endif
+       cgraph_finalize_function (fn, DECL_SAVED_TREE (fn));
+       current_function_decl = NULL;
+       return;
+     }
+ 
+   /* If possible, avoid generating RTL for this function.  Instead,
+      just record it as an inline function, and wait until end-of-file
+      to decide whether to write it out or not.  */
+   if (/* We have to generate RTL if it's not an inline function.  */
+       (DECL_INLINE (fn) || DECL_COMDAT (fn))
+       /* Or if we have to emit code for inline functions anyhow.  */
+       && !flag_keep_inline_functions
+       /* Or if we actually have a reference to the function.  */
+       && !DECL_NEEDED_P (fn))
+     {
+       /* Set DECL_EXTERNAL so that assemble_external will be called as
+ 	 necessary.  We'll clear it again in finish_file.  */
+       if (!DECL_EXTERNAL (fn))
+ 	{
+ 	  DECL_NOT_REALLY_EXTERN (fn) = 1;
+ 	  DECL_EXTERNAL (fn) = 1;
+ 	}
+       /* Remember this function.  In finish_file we'll decide if
+ 	 we actually need to write this function out.  */
+       defer_fn (fn);
+       /* Let the back-end know that this function exists.  */
+       (*debug_hooks->deferred_inline_function) (fn);
+       return;
+     }
+ 
+   really_expand_body (fn);
  }
  
  /* Helper function for walk_tree, used by finish_function to override all

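The patch above splits the old expand_body () in two. expand_body () keeps the front-end checks and, under -funit-at-a-time, hands the function to cgraph_finalize_function () instead of generating RTL. really_expand_body () does the actual RTL expansion, and instantiate_decl () now calls it directly. Below is a hypothetical, simplified C model of the order in which the new expand_body () makes its decision. The enum, the structs and classify_expand_body () are illustrative names only, not GCC code. The AGGR_INIT_EXPR rewrite that runs before the cloning check is omitted.

```c
#include <stdbool.h>

/* Hypothetical model of the decision order in the patched expand_body ().
   None of these names exist in GCC; they only mirror the checks.  */

enum expand_action {
  ACTION_SKIP,        /* template body or -fsyntax-only: nothing to emit */
  ACTION_CLONED,      /* ctor/dtor body was cloned; FN itself is done */
  ACTION_CGRAPH,      /* -funit-at-a-time: hand FN to the call graph */
  ACTION_DEFER,       /* unreferenced inline/COMDAT: decide at end of file */
  ACTION_EXPAND_NOW   /* call really_expand_body () immediately */
};

struct fn_info {
  bool is_template;       /* processing_template_decl or dependent args */
  bool needs_cloning;     /* maybe_clone_body () would return nonzero */
  bool inline_or_comdat;  /* DECL_INLINE (fn) || DECL_COMDAT (fn) */
  bool needed;            /* DECL_NEEDED_P (fn) */
};

struct flags {
  bool syntax_only;            /* -fsyntax-only */
  bool unit_at_a_time;         /* -funit-at-a-time */
  bool keep_inline_functions;  /* -fkeep-inline-functions */
};

enum expand_action
classify_expand_body (const struct fn_info *fn, const struct flags *f)
{
  if (fn->is_template)
    return ACTION_SKIP;
  if (fn->needs_cloning)
    return ACTION_CLONED;
  if (f->syntax_only)
    return ACTION_SKIP;
  /* New in the patch: the call graph decides what gets expanded.  */
  if (f->unit_at_a_time)
    return ACTION_CGRAPH;
  /* Old behaviour: defer inline functions nobody has referenced yet.  */
  if (fn->inline_or_comdat && !f->keep_inline_functions && !fn->needed)
    return ACTION_DEFER;
  return ACTION_EXPAND_NOW;
}
```

In this model every path that does not return early ends in ACTION_EXPAND_NOW, which corresponds to the final really_expand_body (fn) call in the patch.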

* PR 10196 / Re: Inliner parameters
  2003-04-17 10:50   ` Richard Guenther
@ 2003-04-23 12:21     ` Richard Guenther
  0 siblings, 0 replies; 7+ messages in thread
From: Richard Guenther @ 2003-04-23 12:21 UTC (permalink / raw)
  To: Steven Bosscher; +Cc: gcc, Mark Mitchell

On Thu, 17 Apr 2003, Richard Guenther wrote:

> On 17 Apr 2003, Steven Bosscher wrote:
>
> > On Wed 16-04-2003, at 23:17, Richard Guenther wrote:
> > > > When was the last time somebody tried to tune the parameters a bit?  Did
> > > > anyone try the effects of different parameter settings for, say, SPEC
> > > > and POOMA (and, ideally, on more than one platform)?
> > >
> > > I tried various parameters for POOMA to tune the performance of the
> > > optimized code and the key parameter to change was min-inline-insns.
> > > This is _way_ too low for POOMA to collapse the expression template
> > > trees. I need to bump this up to 250 to get good performance. The
> > > max-inline-insns-single can be dropped to 250 without loss then.
> >
> > What happened to the compile times with bigger min-inline-insns?
>
> Here are compile-time and runtime numbers for my performance testcase using
> g++-3.3 (GCC) 3.3 20030414 (prerelease) with options -O2 -march=athlon
> -fomit-frame-pointer -funroll-loops -fno-exceptions --param min-inline-insns=X.
> Lower numbers for the perf. indicator are better.
>
>   X      compile-time    performance indicator
> default     49.50           1.99804e-06
>  50         50.25           2.26817e-06
> 100         50.00           1.96918e-06
> 150         51.00           1.90269e-06
> 200         58.25           1.83045e-06
> 250         61.25           1.28309e-06
> 300         62.75           1.29364e-06
> default + -Dinline="__inline__ __attribute__((always_inline))"
>             50.50           1.31171e-06
> (note that the source was not tuned for this inline->always_inline
> transformation)
>
> just to show what happens with EH on, for the best param above (250)
> we get
>
> 250         [again, goes into swap... - till then, 3min elapsed]
> killed it; moving to a machine with 2GB RAM and more GHz, where we can't
> compare the compile-time numbers from above, of course...
> 250        448.00 [uses 750MB of ram]  5.96641e-07
> which is more than an order of magnitude worse than without EH on a
> faster CPU with faster mem... ugh!
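The last row of the quoted table forces inlining through the preprocessor. -Dinline="__inline__ __attribute__((always_inline))" rewrites every `inline` keyword, so GCC inlines those functions independent of the restrictions that otherwise apply to inlining, such as the --param min-inline-insns limit discussed here. Below is a minimal sketch of the same trick inside a single C file. axpy () and sum_axpy () are hypothetical stand-ins for the small template methods in the thread. The closing #undef is a choice of this sketch, because a command-line -D affects the whole translation unit, including system headers.

```c
/* Same effect as compiling with
     -Dinline="__inline__ __attribute__((always_inline))"
   Defined after any system headers, so only functions declared
   `inline` below are affected: GCC must inline them.  */
#define inline __inline__ __attribute__((always_inline))

/* Hypothetical small kernel, like those expression templates produce.  */
static inline double
axpy (double a, double x, double y)
{
  return a * x + y;
}

/* Hypothetical caller: each call to axpy is expanded in place.  */
double
sum_axpy (const double *x, const double *y, int n, double a)
{
  double s = 0.0;
  for (int i = 0; i < n; i++)
    s += axpy (a, x[i], y[i]);
  return s;
}

/* Limit the macro to this sketch.  */
#undef inline
```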

With g++-3.3 (GCC) 3.3 20030423 (prerelease) I now get

250     79.75 [154MB]     1.2667e-06

which is not only a lot better in compile time and in memory usage, but
also on par in performance with the -fno-exceptions case.

Just to repeat the -fno-exceptions case, with the new gcc I get

250     63.75 [153MB]     1.26749e-06

so compile time is still worse with exceptions turned on (79.75 vs. 63.75),
but that is to be expected anyway. Just for the curious, here are the
g++-3.2 (GCC) 3.2.3 20030414 (prerelease) numbers:

with exceptions (default inlining params)

default  63.00 [170MB]   1.96574e-06

without exceptions (default inlining params)

default  54.00 [161MB]   1.97593e-06


The testcase for PR10196 now shows:

g++-3.3 -fno-exceptions: 11.75s
g++-3.3 -fexceptions:    14.75s
g++-3.2 -fno-exceptions:  9.50s
g++-3.2 -fexceptions:    11.50s

which is _a lot_ better, but still a 24% regression for -fno-exceptions
and a 28% regression for -fexceptions relative to 3.2 (19% and 22% when
measured against the 3.3 times). As both are below 30%, can we now
downgrade the priority of the PR?
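The percentages depend on which time goes in the denominator. The usual convention divides by the older, baseline compiler. Dividing by the newer time instead gives the 19% and 22% figures, which always come out smaller. Below is a small C sketch of both formulas. The function names are hypothetical.

```c
/* Slowdown of the newer compiler as a percentage of the older
   (baseline) compile time: 100 * (new - base) / base.  */
double
regression_pct (double base_seconds, double new_seconds)
{
  return 100.0 * (new_seconds - base_seconds) / base_seconds;
}

/* The same difference as a share of the newer compile time:
   100 * (new - base) / new.  For new > base > 0 this is always
   smaller than regression_pct; it yields the 19% / 22% figures.  */
double
slowdown_share_pct (double base_seconds, double new_seconds)
{
  return 100.0 * (new_seconds - base_seconds) / new_seconds;
}
```

For the PR 10196 timings, regression_pct (9.50, 11.75) is about 23.7 and regression_pct (11.50, 14.75) is about 28.3. Both stay under the 30% mark mentioned in the message.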


Btw., using libstdc++ from gcc-3.3 with code compiled by gcc-3.2 gives

  /LINUXgcc32/Bench: relocation error: ./LINUXgcc32/Bench: symbol
  _ZNSt9basic_iosIcSt11char_traitsIcEE4initEPSt15basic_streambufIcS1_E,
  version GLIBCPP_3.2 not defined in file libstdc++.so.5 with link time
  reference

The other way around, it's similar. Maybe the libstdc++ version needs to be
bumped for 3.3?

Richard.

--
Richard Guenther <richard dot guenther at uni-tuebingen dot de>
WWW: http://www.tat.physik.uni-tuebingen.de/~rguenth/



* Inliner parameters
@ 2003-04-16 17:26 Steven Bosscher
  0 siblings, 0 replies; 7+ messages in thread
From: Steven Bosscher @ 2003-04-16 17:26 UTC (permalink / raw)
  To: gcc

Hi,

Many 3.3 compile time regressions from 3.2 seem to be caused by the new
tree inliner heuristics.  There are at least a few PRs for which the
inliner apparently causes serious compile time regressions (PRs 10160,
10196, and 10316 and part of 8361 as well).  Obviously those PRs are
just the tip of the proverbial iceberg...

All these compile time regressions disappear when the inlining limit is
changed.  It looks like the 3.3 inliner is much more aggressive, but I
have not seen any data to confirm this.

When was the last time somebody tried to tune the parameters a bit?  Did
anyone try the effects of different parameter settings for, say, SPEC
and POOMA (and, ideally, on more than one platform)?

Greetz
Steven


end of thread, other threads:[~2003-04-23 11:39 UTC | newest]

Thread overview: 7+ messages
2003-04-16 21:38 Inliner parameters Richard Guenther
2003-04-17  3:11 ` Steven Bosscher
2003-04-17 10:50   ` Richard Guenther
2003-04-23 12:21     ` PR 10196 / " Richard Guenther
2003-04-17 10:50   ` Richard Guenther
2003-04-17 13:29   ` Jan Hubicka
  -- strict thread matches above, loose matches on Subject: below --
2003-04-16 17:26 Steven Bosscher
