public inbox for gcc-bugs@sourceware.org
* [Bug c++/53958] New: set_slot_part and canon_value_cmp using 90% of compile time
@ 2012-07-13 23:57 ncahill_alt at yahoo dot com
  2012-08-25 18:41 ` [Bug c++/53958] " paolo.carlini at oracle dot com
                   ` (8 more replies)
  0 siblings, 9 replies; 10+ messages in thread
From: ncahill_alt at yahoo dot com @ 2012-07-13 23:57 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53958

             Bug #: 53958
           Summary: set_slot_part and canon_value_cmp using 90% of compile
                    time
    Classification: Unclassified
           Product: gcc
           Version: 4.7.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: ncahill_alt@yahoo.com


Created attachment 27786
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=27786
Source code showing problem.

I've got a situation here where the compile basically stalls, and a profiling
sample from a late enough stage shows 92% of time going to set_slot_part and
canon_value_cmp (from var-tracking.c).  So I guess the assumptions about the
size of some data structure are being invalidated.

I am unable to give a small testcase, but I've attached the preprocessed code
that causes this condition.  The flag that seems to give the trouble is
-fno-omit-frame-pointer.  Without this flag, it completes in reasonable time.

This is with gcc 4.7.1, i686-pc-linux-gnu, 32 bit.  

Commands:

cc1plus -v b.c -dumpbase b.c -march=athlon -mtune=pentium2 -auxbase-strip b.o
-g2 -O2 -std=gnu++98 -fno-inline -funroll-loops -funswitch-loops
-fno-strict-aliasing -fno-omit-frame-pointer


gcc -pipe -g2 -fno-omit-frame-pointer -O2 -Werror -fno-strict-aliasing
-march=athlon -mtune=pentium2 e -funroll-loops -funswitch-loops -Wall
-Wcast-align -Wundef -Wformat-security -Wwrite-strings -Wno-se -Wno-conversion
-Wno-narrowing -pthread -pthread -x c++ -std=gnu++98 -Woverloaded-virtual -c
b.c -o b.o


Thank you.
Neil.



* [Bug c++/53958] set_slot_part and canon_value_cmp using 90% of compile time
  2012-07-13 23:57 [Bug c++/53958] New: set_slot_part and canon_value_cmp using 90% of compile time ncahill_alt at yahoo dot com
@ 2012-08-25 18:41 ` paolo.carlini at oracle dot com
  2012-08-26 20:37 ` [Bug c++/53958] [4.6/4.7/4.8 Regression] " steven at gcc dot gnu.org
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: paolo.carlini at oracle dot com @ 2012-08-25 18:41 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53958

Paolo Carlini <paolo.carlini at oracle dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |stevenb.gcc at gmail dot
                   |                            |com

--- Comment #1 from Paolo Carlini <paolo.carlini at oracle dot com> 2012-08-25 18:41:26 UTC ---
Maybe Steven is interested...



* [Bug c++/53958] [4.6/4.7/4.8 Regression] set_slot_part and canon_value_cmp using 90% of compile time
  2012-07-13 23:57 [Bug c++/53958] New: set_slot_part and canon_value_cmp using 90% of compile time ncahill_alt at yahoo dot com
  2012-08-25 18:41 ` [Bug c++/53958] " paolo.carlini at oracle dot com
@ 2012-08-26 20:37 ` steven at gcc dot gnu.org
  2012-08-26 23:42 ` steven at gcc dot gnu.org
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: steven at gcc dot gnu.org @ 2012-08-26 20:37 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53958

Steven Bosscher <steven at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|set_slot_part and           |[4.6/4.7/4.8 Regression]
                   |canon_value_cmp using 90%   |set_slot_part and
                   |of compile time             |canon_value_cmp using 90%
                   |                            |of compile time
      Known to fail|                            |4.6.3, 4.7.1, 4.8.0

--- Comment #2 from Steven Bosscher <steven at gcc dot gnu.org> 2012-08-26 20:36:42 UTC ---
Another one on the heap of var-tracking slowness issues...



* [Bug c++/53958] [4.6/4.7/4.8 Regression] set_slot_part and canon_value_cmp using 90% of compile time
  2012-07-13 23:57 [Bug c++/53958] New: set_slot_part and canon_value_cmp using 90% of compile time ncahill_alt at yahoo dot com
  2012-08-25 18:41 ` [Bug c++/53958] " paolo.carlini at oracle dot com
  2012-08-26 20:37 ` [Bug c++/53958] [4.6/4.7/4.8 Regression] " steven at gcc dot gnu.org
@ 2012-08-26 23:42 ` steven at gcc dot gnu.org
  2012-09-06 14:43 ` rguenth at gcc dot gnu.org
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: steven at gcc dot gnu.org @ 2012-08-26 23:42 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53958

--- Comment #3 from Steven Bosscher <steven at gcc dot gnu.org> 2012-08-26 23:41:41 UTC ---
Created attachment 28088
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=28088
Somewhat reduced, preprocessed test case

On x86_64, compile with:

$ ./cc1plus -m32 -quiet -ftime-report -std=gnu++98 -O2 -g2 PR53958.cc

With 1 "HUN" line I have:

 var-tracking dataflow   :   1.54 (24%) usr
 var-tracking emit       :   1.58 (25%) usr
 TOTAL                 :   6.43


With 2 "HUN" lines, this changes to:

 var-tracking dataflow   :   9.19 (37%) usr
 var-tracking emit       :   8.85 (36%) usr
 TOTAL                 :  24.86


With 3 "HUN" lines, things begin to get really ugly:

 var-tracking dataflow   :  33.39 (43%) usr
 var-tracking emit       :  31.79 (41%) usr
 TOTAL                 :  77.44

With 4 "HUN" lines, the last test I ran, the timings are:

 var-tracking dataflow   :  69.47 (80%) usr
 var-tracking emit       :   0.03 ( 0%) usr
 TOTAL                 :  87.24             95353 kB

So both the var-tracking dataflow and note emitting initially show quadratic
behavior for this test case, but the emitting appears to have some kind of
cut-off to make it disappear for the last test:

PR53958.cc: In function 'void
construct_core_types(simple_list<input_type_entry>&)':
PR53958.cc:234:6: note: variable tracking size limit exceeded with
-fvar-tracking-assignments, retrying without
 void construct_core_types(simple_list<input_type_entry> &typelist)

This limit is never reached in the original test case.
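
As a rough cross-check of the growth rate, the "var-tracking dataflow"
timings above can be normalised to the 1-line run and set against what a
purely quadratic cost would predict.  This is only a sketch: the function
names are hypothetical, and the number of "HUN" lines is merely a proxy for
the real problem size, so the comparison is indicative at best.

```c
#include <assert.h>
#include <stddef.h>

/* Seconds spent in "var-tracking dataflow" for 1..4 "HUN" lines,
   copied from the -ftime-report excerpts in this comment.  */
static const double dataflow_secs[] = { 1.54, 9.19, 33.39, 69.47 };

/* Ratio of the dataflow time for N_LINES "HUN" lines to the time for
   one line.  N_LINES must be in the range 1..4.  */
double
dataflow_growth (size_t n_lines)
{
  assert (n_lines >= 1 && n_lines <= 4);
  return dataflow_secs[n_lines - 1] / dataflow_secs[0];
}

/* The same ratio as predicted by a cost exactly quadratic in N_LINES
   (an idealised model, not something the report states).  */
double
quadratic_growth (size_t n_lines)
{
  return (double) n_lines * (double) n_lines;
}
```

Comparing dataflow_growth (n) with quadratic_growth (n) for n = 2, 3, 4
shows the measured ratios sitting above the idealised n*n curve, which is
consistent with growth that is at least quadratic for this test case.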



* [Bug c++/53958] [4.6/4.7/4.8 Regression] set_slot_part and canon_value_cmp using 90% of compile time
  2012-07-13 23:57 [Bug c++/53958] New: set_slot_part and canon_value_cmp using 90% of compile time ncahill_alt at yahoo dot com
                   ` (2 preceding siblings ...)
  2012-08-26 23:42 ` steven at gcc dot gnu.org
@ 2012-09-06 14:43 ` rguenth at gcc dot gnu.org
  2012-09-07  9:47 ` [Bug c++/53958] " rguenth at gcc dot gnu.org
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: rguenth at gcc dot gnu.org @ 2012-09-06 14:43 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53958

Richard Guenther <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |4.6.4



* [Bug c++/53958] set_slot_part and canon_value_cmp using 90% of compile time
  2012-07-13 23:57 [Bug c++/53958] New: set_slot_part and canon_value_cmp using 90% of compile time ncahill_alt at yahoo dot com
                   ` (3 preceding siblings ...)
  2012-09-06 14:43 ` rguenth at gcc dot gnu.org
@ 2012-09-07  9:47 ` rguenth at gcc dot gnu.org
  2013-03-06 10:38 ` steven at gcc dot gnu.org
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: rguenth at gcc dot gnu.org @ 2012-09-07  9:47 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53958

Richard Guenther <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P2
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2012-09-07
             Blocks|                            |47344
   Target Milestone|4.6.4                       |---
            Summary|[4.6/4.7/4.8 Regression]    |set_slot_part and
                   |set_slot_part and           |canon_value_cmp using 90%
                   |canon_value_cmp using 90%   |of compile time
                   |of compile time             |
     Ever Confirmed|0                           |1

--- Comment #4 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-09-07 09:46:43 UTC ---
Confirmed.  Happens in all maintained releases, duplicating the regression
to 47344.



* [Bug c++/53958] set_slot_part and canon_value_cmp using 90% of compile time
  2012-07-13 23:57 [Bug c++/53958] New: set_slot_part and canon_value_cmp using 90% of compile time ncahill_alt at yahoo dot com
                   ` (4 preceding siblings ...)
  2012-09-07  9:47 ` [Bug c++/53958] " rguenth at gcc dot gnu.org
@ 2013-03-06 10:38 ` steven at gcc dot gnu.org
  2014-01-20 10:51 ` rguenth at gcc dot gnu.org
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: steven at gcc dot gnu.org @ 2013-03-06 10:38 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53958

Steven Bosscher <steven at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org,
                   |                            |steven at gcc dot gnu.org

--- Comment #5 from Steven Bosscher <steven at gcc dot gnu.org> 2013-03-06 10:38:08 UTC ---
More var-tracking slowness. Maybe fixed by recent patches, needs triaging.

NB: this is only not marked as a regression because all maintained release
branches have the problem. That's a rather odd way to "hide" a regression,
but it was so decided by Richi. IMHO it's worth having a look at this, the
test case isn't completely unreasonable.



* [Bug c++/53958] set_slot_part and canon_value_cmp using 90% of compile time
  2012-07-13 23:57 [Bug c++/53958] New: set_slot_part and canon_value_cmp using 90% of compile time ncahill_alt at yahoo dot com
                   ` (5 preceding siblings ...)
  2013-03-06 10:38 ` steven at gcc dot gnu.org
@ 2014-01-20 10:51 ` rguenth at gcc dot gnu.org
  2014-01-20 11:02 ` jakub at gcc dot gnu.org
  2014-01-21 11:55 ` aoliva at gcc dot gnu.org
  8 siblings, 0 replies; 10+ messages in thread
From: rguenth at gcc dot gnu.org @ 2014-01-20 10:51 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53958

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu.org

--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
As usual, the iteration order imposed by pre_and_rev_postorder isn't the best
one for a forward dataflow problem.  Using inverted_post_order_compute
improves compile-time somewhat.  But I can still confirm

 variable tracking       :   0.39 ( 1%) usr   0.00 ( 0%) sys   0.41 ( 1%) wall    5504 kB ( 6%) ggc
 var-tracking dataflow   :  34.96 (84%) usr   0.16 (53%) sys  35.10 (84%) wall      47 kB ( 0%) ggc

As a general observation, I wonder why the dataflow problem computes the
_union_ as opposed to the intersection of the info on the preds
in the !MAY_HAVE_DEBUG_INSNS case.

Also if the MAY_HAVE_DEBUG_INSNS case really computes an intersection
(as documented) then it can avoid repeated clearing of the in set
and only needs to dataflow_set_merge from changed edges.

Now the discrepancy wrt the !MAY_HAVE_DEBUG_INSNS case makes me not
trust that comment blindly ...

That said, handling only changed edges can be done by doing the intersection
in the if (changed) FOR_EACH_EDGE loop and dropping the initial ->in set
compute (just retaining the post-merge-adjust).
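
The idea of merging only from changed edges can be sketched on plain
bitsets.  This is not var-tracking's data structure or its
dataflow_set_merge; the function names and the 32-bit set representation
are assumptions made for illustration.  The incremental form is only valid
for an intersection meet when each predecessor's OUT set can only shrink
between iterations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Full recompute: IN is the intersection of OUT[p] over all
   predecessors listed in PREDS.  */
uint32_t
merge_all_preds (const uint32_t *out, const int *preds, int n_preds)
{
  uint32_t in = UINT32_MAX;	/* "Top": every bit set.  */
  for (int k = 0; k < n_preds; k++)
    in &= out[preds[k]];
  return in;
}

/* Incremental update: refine OLD_IN using only the predecessors whose
   OUT set changed.  Correct only if every changed OUT[p] is a subset
   of the value it had when OLD_IN was computed.  */
uint32_t
merge_changed_preds (uint32_t old_in, const uint32_t *out,
		     const bool *changed, const int *preds, int n_preds)
{
  uint32_t in = old_in;
  for (int k = 0; k < n_preds; k++)
    if (changed[preds[k]])
      in &= out[preds[k]];
  /* Intersecting can only remove bits, never add them.  */
  assert ((in & ~old_in) == 0);
  return in;
}
```

Under the shrinking-sets assumption both functions give the same IN set,
but the incremental one skips the clearing of IN and the merges from
unchanged predecessors.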

From a quick look var-tracking doesn't seem to take the opportunity of
pruning its sets based on variable scopes (it would need to compute
scope liveness in vt_initialize).

Anyway, here's a patch improving compile-time for this testcase by ~6%

Index: gcc/var-tracking.c
===================================================================
--- gcc/var-tracking.c  (revision 206599)
+++ gcc/var-tracking.c  (working copy)
@@ -6934,12 +6934,12 @@ vt_find_locations (void)
   bool success = true;

   timevar_push (TV_VAR_TRACKING_DATAFLOW);
-  /* Compute reverse completion order of depth first search of the CFG
+  /* Compute reverse top sort order of the inverted CFG
      so that the data-flow runs faster.  */
-  rc_order = XNEWVEC (int, n_basic_blocks_for_fn (cfun) - NUM_FIXED_BLOCKS);
+  rc_order = XNEWVEC (int, n_basic_blocks_for_fn (cfun));
   bb_order = XNEWVEC (int, last_basic_block_for_fn (cfun));
-  pre_and_rev_post_order_compute (NULL, rc_order, false);
-  for (i = 0; i < n_basic_blocks_for_fn (cfun) - NUM_FIXED_BLOCKS; i++)
+  int num = inverted_post_order_compute (rc_order);
+  for (i = 0; i < num; i++)
     bb_order[rc_order[i]] = i;
   free (rc_order);
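
For readers unfamiliar with why block order matters here: a forward
dataflow problem converges fastest when a block is visited after its
predecessors.  The sketch below computes a plain reverse postorder on a toy
CFG and fills a bb_order-style priority array in the same way as the loop
in the patch.  It does not reproduce inverted_post_order_compute or any
other GCC routine; struct toy_cfg, MAX_BLOCKS and the function names are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_BLOCKS 16

/* Toy CFG: succ[b][0 .. n_succs[b]-1] are the successors of block b.  */
struct toy_cfg
{
  int n_blocks;
  int n_succs[MAX_BLOCKS];
  int succ[MAX_BLOCKS][MAX_BLOCKS];
};

/* Depth-first walk from B; append B to POST once all of its
   successors have been finished (postorder).  */
static void
dfs_post (const struct toy_cfg *cfg, int b, bool *visited,
	  int *post, int *n_post)
{
  visited[b] = true;
  for (int k = 0; k < cfg->n_succs[b]; k++)
    {
      int s = cfg->succ[b][k];
      if (!visited[s])
	dfs_post (cfg, s, visited, post, n_post);
    }
  post[(*n_post)++] = b;
}

/* Set BB_ORDER[b] to the position of block b in reverse postorder
   starting at ENTRY; unreachable blocks get -1.  Returns the number
   of reachable blocks.  */
int
toy_rev_post_order (const struct toy_cfg *cfg, int entry, int *bb_order)
{
  bool visited[MAX_BLOCKS] = { false };
  int post[MAX_BLOCKS];
  int n_post = 0;

  assert (cfg->n_blocks <= MAX_BLOCKS);
  for (int b = 0; b < cfg->n_blocks; b++)
    bb_order[b] = -1;
  dfs_post (cfg, entry, visited, post, &n_post);
  for (int i = 0; i < n_post; i++)
    bb_order[post[n_post - 1 - i]] = i;
  return n_post;
}
```

A worklist keyed on bb_order then processes, for an acyclic region, every
block after all of its predecessors, so each block there is visited once.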



* [Bug c++/53958] set_slot_part and canon_value_cmp using 90% of compile time
  2012-07-13 23:57 [Bug c++/53958] New: set_slot_part and canon_value_cmp using 90% of compile time ncahill_alt at yahoo dot com
                   ` (6 preceding siblings ...)
  2014-01-20 10:51 ` rguenth at gcc dot gnu.org
@ 2014-01-20 11:02 ` jakub at gcc dot gnu.org
  2014-01-21 11:55 ` aoliva at gcc dot gnu.org
  8 siblings, 0 replies; 10+ messages in thread
From: jakub at gcc dot gnu.org @ 2014-01-20 11:02 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53958

--- Comment #7 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #6)
> As usual, the iteration order imposed by pre_and_rev_postorder isn't the best
> one for a forward dataflow problem.  Using inverted_post_order_compute
> improves compile-time somewhat.  But I can still confirm
> 
>  variable tracking       :   0.39 ( 1%) usr   0.00 ( 0%) sys   0.41 ( 1%)
> wall    5504 kB ( 6%) ggc
>  var-tracking dataflow   :  34.96 (84%) usr   0.16 (53%) sys  35.10 (84%)
> wall      47 kB ( 0%) ggc
> 
> as general observation I wonder why the dataflow problem computes the
> _union_ as opposed to the intersection of the info on the preds
> in the !MAY_HAVE_DEBUG_INSNS case.
> 
> Also if the MAY_HAVE_DEBUG_INSNS case really computes an intersection
> (as documented) then it can avoid repeated clearing of the in set
> and only needs to dataflow_set_merge from changed edges.
> 
> Now the discrepancy in wrt the !MAY_HAVE_DEBUG_INSNS case makes me not
> trust that comment blindly ...

I think it isn't about MAY_HAVE_DEBUG_INSNS or not, but the union vs.
intersection depends on if the var is tracked by VALUEs or not, i.e. onepart
decls vs. others.

> From a quick look var-tracking doesn't seem to take the opportunity of
> pruning its sets based on variable scopes (it would need to compute
> scope liveness in vt_initialize).

That has been discussed and refused by GDB folks AFAIK.
E.g. see http://gcc.gnu.org/ml/gcc-patches/2010-03/msg00960.html
(though I think it was mostly IRC based later on).

> Anyway, here's a patch improving compile-time for this testcase by ~6%
> 
> Index: gcc/var-tracking.c
> ===================================================================
> --- gcc/var-tracking.c  (revision 206599)
> +++ gcc/var-tracking.c  (working copy)
> @@ -6934,12 +6934,12 @@ vt_find_locations (void)
>    bool success = true;
>  
>    timevar_push (TV_VAR_TRACKING_DATAFLOW);
> -  /* Compute reverse completion order of depth first search of the CFG
> +  /* Compute reverse top sort order of the inverted CFG
>       so that the data-flow runs faster.  */
> -  rc_order = XNEWVEC (int, n_basic_blocks_for_fn (cfun) - NUM_FIXED_BLOCKS);
> +  rc_order = XNEWVEC (int, n_basic_blocks_for_fn (cfun));
>    bb_order = XNEWVEC (int, last_basic_block_for_fn (cfun));
> -  pre_and_rev_post_order_compute (NULL, rc_order, false);
> -  for (i = 0; i < n_basic_blocks_for_fn (cfun) - NUM_FIXED_BLOCKS; i++)
> +  int num = inverted_post_order_compute (rc_order);
> +  for (i = 0; i < num; i++)
>      bb_order[rc_order[i]] = i;
>    free (rc_order);

If it doesn't regress on other testcases (var-tracking speed wise), then that
LGTM.



* [Bug c++/53958] set_slot_part and canon_value_cmp using 90% of compile time
  2012-07-13 23:57 [Bug c++/53958] New: set_slot_part and canon_value_cmp using 90% of compile time ncahill_alt at yahoo dot com
                   ` (7 preceding siblings ...)
  2014-01-20 11:02 ` jakub at gcc dot gnu.org
@ 2014-01-21 11:55 ` aoliva at gcc dot gnu.org
  8 siblings, 0 replies; 10+ messages in thread
From: aoliva at gcc dot gnu.org @ 2014-01-21 11:55 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53958

--- Comment #8 from Alexandre Oliva <aoliva at gcc dot gnu.org> ---
Jakub is right WRT onepart vs non-onepart vars.  Now, I can't think of any reason why
the union/intersection couldn't be done incrementally, and only for changed
incoming sets (but how would you tell an incoming set changed?).

IIRC, onepart and non-onepart sets are effectively disjoint, even if they share
the same data structures, so when we perform union on one and intersection on
another we could avoid doing that repeatedly.
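
A minimal sketch of a merge that applies a different meet to each of two
disjoint partitions sharing one set, again on plain bitmasks rather than
var-tracking's hash tables.  Which partition gets intersection and which
gets union is an assumption here (onepart bits are intersected), and
merge_partitioned and onepart_mask are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Merge sets A and B.  Bits selected by ONEPART_MASK are combined by
   intersection, all remaining bits by union.  */
uint32_t
merge_partitioned (uint32_t a, uint32_t b, uint32_t onepart_mask)
{
  uint32_t inter = a & b & onepart_mask;
  uint32_t uni = (a | b) & ~onepart_mask;
  /* The two partitions are disjoint, so the partial results never
     overlap and can be computed independently.  */
  assert ((inter & uni) == 0);
  return inter | uni;
}
```

Because the two halves do not interact, each could in principle be updated
on its own schedule, for example only when the corresponding part of an
incoming set has changed.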



end of thread, other threads:[~2014-01-21 11:55 UTC | newest]
