public inbox for gcc-bugs@sourceware.org
* [Bug c++/53958] New: set_slot_part and canon_value_cmp using 90% of compile time
From: ncahill_alt at yahoo dot com @ 2012-07-13 23:57 UTC
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53958
Bug #: 53958
Summary: set_slot_part and canon_value_cmp using 90% of compile
time
Classification: Unclassified
Product: gcc
Version: 4.7.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
AssignedTo: unassigned@gcc.gnu.org
ReportedBy: ncahill_alt@yahoo.com
Created attachment 27786
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=27786
Source code showing problem.
I've got a situation here where the compile basically stalls, and a profiling
sample from a late enough stage shows 92% of time going to set_slot_part and
canon_value_cmp (from var-tracking.c). So I guess the assumptions about the
size of some data structure are being invalidated.
I am unable to give a small testcase, but I've attached the preprocessed code
that causes this condition. The flag that seems to give the trouble is
-fno-omit-frame-pointer. Without this flag, it completes in reasonable time.
This is with gcc 4.7.1, i686-pc-linux-gnu, 32 bit.
Commands:
cc1plus -v b.c -dumpbase b.c -march=athlon -mtune=pentium2 -auxbase-strip b.o
-g2 -O2 -std=gnu++98 -fno-inline -funroll-loops -funswitch-loops
-fno-strict-aliasing -fno-omit-frame-pointer
gcc -pipe -g2 -fno-omit-frame-pointer -O2 -Werror -fno-strict-aliasing
-march=athlon -mtune=pentium2 e -funroll-loops -funswitch-loops -Wall
-Wcast-align -Wundef -Wformat-security -Wwrite-strings -Wno-se -Wno-conversion
-Wno-narrowing -pthread -pthread -x c++ -std=gnu++98 -Woverloaded-virtual -c
b.c -o b.o
Thank you.
Neil.
^ permalink raw reply [flat|nested] 10+ messages in thread
* [Bug c++/53958] set_slot_part and canon_value_cmp using 90% of compile time
From: paolo.carlini at oracle dot com @ 2012-08-25 18:41 UTC
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53958
Paolo Carlini <paolo.carlini at oracle dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |stevenb.gcc at gmail dot
| |com
--- Comment #1 from Paolo Carlini <paolo.carlini at oracle dot com> 2012-08-25 18:41:26 UTC ---
Maybe Steven is interested...
* [Bug c++/53958] [4.6/4.7/4.8 Regression] set_slot_part and canon_value_cmp using 90% of compile time
From: steven at gcc dot gnu.org @ 2012-08-26 20:37 UTC
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53958
Steven Bosscher <steven at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Summary|set_slot_part and |[4.6/4.7/4.8 Regression]
|canon_value_cmp using 90% |set_slot_part and
|of compile time |canon_value_cmp using 90%
| |of compile time
Known to fail| |4.6.3, 4.7.1, 4.8.0
--- Comment #2 from Steven Bosscher <steven at gcc dot gnu.org> 2012-08-26 20:36:42 UTC ---
Another one on the heap of var-tracking slowness issues...
* [Bug c++/53958] [4.6/4.7/4.8 Regression] set_slot_part and canon_value_cmp using 90% of compile time
From: steven at gcc dot gnu.org @ 2012-08-26 23:42 UTC
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53958
--- Comment #3 from Steven Bosscher <steven at gcc dot gnu.org> 2012-08-26 23:41:41 UTC ---
Created attachment 28088
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=28088
Somewhat reduced, preprocessed test case
On x86_64, compile with:
$ ./cc1plus -m32 -quiet -ftime-report -std=gnu++98 -O2 -g2 PR53958.cc
With 1 "HUN" line I have:
var-tracking dataflow : 1.54 (24%) usr
var-tracking emit : 1.58 (25%) usr
TOTAL : 6.43
With 2 "HUN" lines, this changes to:
var-tracking dataflow : 9.19 (37%) usr
var-tracking emit : 8.85 (36%) usr
TOTAL : 24.86
With 3 "HUN" lines, things begin to get really ugly:
var-tracking dataflow : 33.39 (43%) usr
var-tracking emit : 31.79 (41%) usr
TOTAL : 77.44
With 4 "HUN" lines, the last test I ran, the timings are:
var-tracking dataflow : 69.47 (80%) usr
var-tracking emit : 0.03 ( 0%) usr
TOTAL : 87.24 95353 kB
So both the var-tracking dataflow and note emitting initially show quadratic
behavior for this test case, but the emitting appears to have some kind of
cut-off to make it disappear for the last test:
PR53958.cc: In function 'void construct_core_types(simple_list<input_type_entry>&)':
PR53958.cc:234:6: note: variable tracking size limit exceeded with -fvar-tracking-assignments, retrying without
 void construct_core_types(simple_list<input_type_entry> &typelist)
This limit is never reached in the original test case.
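As a rough sanity check of the "quadratic" reading, the helper below compares an observed slowdown with what pure quadratic scaling t(n) = c * n^2 would predict, using only the -ftime-report numbers quoted above. The function name is hypothetical and this is plain arithmetic on the reported timings, not anything in GCC.

```c
#include <assert.h>

/* Hypothetical helper: ratio of an observed slowdown to the slowdown
   that pure quadratic scaling t(n) = c * n^2 would predict when the
   input grows from N1 to N2 units (here: "HUN" lines).  A result near
   1.0 is consistent with quadratic growth; well above 1.0 means the
   growth is worse than quadratic over that step.  */
static double
slowdown_vs_quadratic (double n1, double t1, double n2, double t2)
{
  assert (n1 > 0 && n2 > 0 && t1 > 0);
  double size_ratio = n2 / n1;
  double observed = t2 / t1;
  double predicted = size_ratio * size_ratio;
  return observed / predicted;
}
```

For the TOTAL times, going from 1 to 2 "HUN" lines gives 24.86 / 6.43 / 4, about 0.97, i.e. close to quadratic; the var-tracking dataflow times alone give 9.19 / 1.54 / 4, about 1.49.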
* [Bug c++/53958] [4.6/4.7/4.8 Regression] set_slot_part and canon_value_cmp using 90% of compile time
From: rguenth at gcc dot gnu.org @ 2012-09-06 14:43 UTC
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53958
Richard Guenther <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|--- |4.6.4
* [Bug c++/53958] set_slot_part and canon_value_cmp using 90% of compile time
From: rguenth at gcc dot gnu.org @ 2012-09-07 9:47 UTC
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53958
Richard Guenther <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Priority|P3 |P2
Status|UNCONFIRMED |NEW
Last reconfirmed| |2012-09-07
Blocks| |47344
Target Milestone|4.6.4 |---
Summary|[4.6/4.7/4.8 Regression] |set_slot_part and
|set_slot_part and |canon_value_cmp using 90%
|canon_value_cmp using 90% |of compile time
|of compile time |
Ever Confirmed|0 |1
--- Comment #4 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-09-07 09:46:43 UTC ---
Confirmed. Happens in all maintained releases, duplicating the regression
to 47344.
* [Bug c++/53958] set_slot_part and canon_value_cmp using 90% of compile time
From: steven at gcc dot gnu.org @ 2013-03-06 10:38 UTC
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53958
Steven Bosscher <steven at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |jakub at gcc dot gnu.org,
| |steven at gcc dot gnu.org
--- Comment #5 from Steven Bosscher <steven at gcc dot gnu.org> 2013-03-06 10:38:08 UTC ---
More var-tracking slowness. Maybe fixed by recent patches, needs triaging.
NB: this is only not marked as a regression because all maintained release
branches have the problem. That's a rather odd way to "hide" a regression,
but it was so decided by Richi. IMHO it's worth having a look at this, the
test case isn't completely unreasonable.
* [Bug c++/53958] set_slot_part and canon_value_cmp using 90% of compile time
From: rguenth at gcc dot gnu.org @ 2014-01-20 10:51 UTC
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53958
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |rguenth at gcc dot gnu.org
--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
As usual, the iteration order imposed by pre_and_rev_postorder isn't the best
one for a forward dataflow problem. Using inverted_post_order_compute
improves compile-time somewhat. But I can still confirm
variable tracking      :   0.39 ( 1%) usr   0.00 ( 0%) sys   0.41 ( 1%) wall    5504 kB ( 6%) ggc
var-tracking dataflow  :  34.96 (84%) usr   0.16 (53%) sys  35.10 (84%) wall      47 kB ( 0%) ggc
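Why the visiting order matters for a forward dataflow problem can be seen on a toy example. The sketch below is a simplified round-robin solver (var-tracking itself keys a worklist on a per-block priority, bb_order) over a made-up bitmask CFG representation; struct toy_cfg and solve_forward are hypothetical names. It only counts how many passes are needed before nothing changes.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_BBS 16

/* Toy CFG: preds[b] is a bitmask of the predecessors of block b and
   gen[b] the facts block b generates.  Forward "may" problem:
   out[b] = gen[b] | (union of out[p] over all predecessors p).  */
struct toy_cfg
{
  int n;
  uint32_t preds[MAX_BBS];
  uint32_t gen[MAX_BBS];
};

/* Visit the blocks in ORDER repeatedly until no OUT set changes.
   Returns the number of passes, including the final pass that only
   confirms the fixpoint.  */
static int
solve_forward (const struct toy_cfg *cfg, const int *order, uint32_t *out)
{
  assert (cfg->n <= MAX_BBS);
  int passes = 0;
  int changed = 1;
  for (int b = 0; b < cfg->n; b++)
    out[b] = 0;
  while (changed)
    {
      changed = 0;
      passes++;
      for (int i = 0; i < cfg->n; i++)
        {
          int b = order[i];
          uint32_t in = 0;
          for (int p = 0; p < cfg->n; p++)
            if (cfg->preds[b] & (1u << p))
              in |= out[p];
          uint32_t new_out = in | cfg->gen[b];
          if (new_out != out[b])
            {
              out[b] = new_out;
              changed = 1;
            }
        }
    }
  return passes;
}
```

On a straight chain 0 -> 1 -> 2 -> 3, visiting predecessors first converges in 2 passes, while the reversed order needs 5 to reach the same fixpoint; the gap grows with the length of the chain.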
As a general observation, I wonder why the dataflow problem computes the
_union_ as opposed to the intersection of the info on the preds
in the !MAY_HAVE_DEBUG_INSNS case.
Also, if the MAY_HAVE_DEBUG_INSNS case really computes an intersection
(as documented), then it can avoid repeatedly clearing the in set
and only needs to dataflow_set_merge from changed edges.
Now the discrepancy wrt the !MAY_HAVE_DEBUG_INSNS case makes me not
trust that comment blindly ...
That said, handling only changed edges can be done by doing the intersection
in the if (changed) FOR_EACH_EDGE loop and dropping the initial ->in set
compute (just retaining the post-merge-adjust).
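The "only merge from changed edges" idea can be illustrated with plain bitsets. This is a minimal sketch, assuming a monotone framework in which each predecessor's OUT set can only shrink between visits and OLD_IN is the intersection of the previous OUT sets; under those assumptions, intersecting OLD_IN with just the changed OUT sets gives the same result as a full recomputation. The function names are hypothetical, and var-tracking's real sets (location chains, VALUE equivalences) are far richer than bitmasks.

```c
#include <assert.h>
#include <stdint.h>

/* Full recomputation: IN is the intersection of every predecessor's
   OUT set.  Requires at least one predecessor.  */
static uint32_t
meet_all (const uint32_t *pred_out, int npreds)
{
  assert (npreds > 0);
  uint32_t in = pred_out[0];
  for (int i = 1; i < npreds; i++)
    in &= pred_out[i];
  return in;
}

/* Incremental recomputation.  Assumes OLD_IN was the intersection of
   the previous OUT sets and that each OUT set has only shrunk since.
   CHANGED[i] is nonzero if predecessor i's OUT set changed; unchanged
   predecessors are already accounted for in OLD_IN.  */
static uint32_t
meet_changed (uint32_t old_in, const uint32_t *pred_out,
              const int *changed, int npreds)
{
  uint32_t in = old_in;
  for (int i = 0; i < npreds; i++)
    if (changed[i])
      in &= pred_out[i];
  return in;
}
```

The mirror-image argument holds for union when OUT sets only grow. How to know that an incoming set changed (a per-edge flag here) is exactly the open question raised later in the thread.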
From a quick look, var-tracking doesn't seem to take the opportunity of
pruning its sets based on variable scopes (it would need to compute
scope liveness in vt_initialize).
Anyway, here's a patch improving compile-time for this testcase by ~6%:
Index: gcc/var-tracking.c
===================================================================
--- gcc/var-tracking.c (revision 206599)
+++ gcc/var-tracking.c (working copy)
@@ -6934,12 +6934,12 @@ vt_find_locations (void)
bool success = true;
timevar_push (TV_VAR_TRACKING_DATAFLOW);
- /* Compute reverse completion order of depth first search of the CFG
+ /* Compute reverse top sord order of the inverted CFG
so that the data-flow runs faster. */
- rc_order = XNEWVEC (int, n_basic_blocks_for_fn (cfun) - NUM_FIXED_BLOCKS);
+ rc_order = XNEWVEC (int, n_basic_blocks_for_fn (cfun));
bb_order = XNEWVEC (int, last_basic_block_for_fn (cfun));
- pre_and_rev_post_order_compute (NULL, rc_order, false);
- for (i = 0; i < n_basic_blocks_for_fn (cfun) - NUM_FIXED_BLOCKS; i++)
+ int num = inverted_post_order_compute (rc_order);
+ for (i = 0; i < num; i++)
bb_order[rc_order[i]] = i;
free (rc_order);
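The patch swaps pre_and_rev_post_order_compute for inverted_post_order_compute and then turns the resulting order into per-block priorities with bb_order[rc_order[i]] = i. A post order of the inverted CFG (a DFS from the exit along predecessor edges) emits each block after its predecessors, which suits a forward problem. The allocation presumably grows to n_basic_blocks_for_fn (cfun) because the inverted order also contains the entry and exit blocks, whereas the old call passed false for its include_entry_exit argument. The sketch below mimics this on a toy bitmask CFG, using GCC's numbering of entry as block 0 and exit as block 1; inverted_dfs and compute_bb_order are hypothetical names, and unlike GCC's function this sketch ignores blocks that cannot reach the exit (e.g. infinite loops).

```c
#include <assert.h>
#include <stdint.h>

#define MAX_BBS 16

/* DFS over predecessor edges; emit B only after all its unvisited
   predecessors have been emitted (post order of the inverted CFG).  */
static void
inverted_dfs (const uint32_t *preds, int n, int b, int *visited,
              int *order, int *count)
{
  visited[b] = 1;
  for (int p = 0; p < n; p++)
    if ((preds[b] & (1u << p)) && !visited[p])
      inverted_dfs (preds, n, p, visited, order, count);
  order[(*count)++] = b;
}

/* Fill BB_ORDER with each reached block's position in the inverted
   post order (smaller means visit earlier), as the patch does.
   Returns the number of blocks reached from EXIT_BB.  */
static int
compute_bb_order (const uint32_t *preds, int n, int exit_bb, int *bb_order)
{
  assert (n <= MAX_BBS && exit_bb < n);
  int visited[MAX_BBS] = { 0 };
  int order[MAX_BBS];
  int num = 0;
  inverted_dfs (preds, n, exit_bb, visited, order, &num);
  for (int i = 0; i < num; i++)
    bb_order[order[i]] = i;
  return num;
}
```

For a diamond entry(0) -> 2 -> {3, 4} -> 5 -> exit(1), the order is 0, 2, 3, 4, 5, 1: the entry gets priority 0, the exit the last one, and block 5 is only visited after both arms of the diamond.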
* [Bug c++/53958] set_slot_part and canon_value_cmp using 90% of compile time
From: jakub at gcc dot gnu.org @ 2014-01-20 11:02 UTC
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53958
--- Comment #7 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #6)
> As usual, the iteration order imposed by pre_and_rev_postorder isn't the best
> one for a forward dataflow problem. Using inverted_post_order_compute
> improves compile-time somewhat. But I can still confirm
>
> variable tracking : 0.39 ( 1%) usr 0.00 ( 0%) sys 0.41 ( 1%)
> wall 5504 kB ( 6%) ggc
> var-tracking dataflow : 34.96 (84%) usr 0.16 (53%) sys 35.10 (84%)
> wall 47 kB ( 0%) ggc
>
> as general observation I wonder why the dataflow problem computes the
> _union_ as opposed to the intersection of the info on the preds
> in the !MAY_HAVE_DEBUG_INSNS case.
>
> Also if the MAY_HAVE_DEBUG_INSNS case really computes an intersection
> (as documented) then it can avoid repeated clearing of the in set
> and only needs to dataflow_set_merge from changed edges.
>
> Now the discrepancy in wrt the !MAY_HAVE_DEBUG_INSNS case makes me not
> trust that comment blindly ...
I think it isn't about MAY_HAVE_DEBUG_INSNS or not; the union vs.
intersection depends on whether the var is tracked by VALUEs or not, i.e. onepart
decls vs. others.
> From a quick look var-tracking doesn't seem to take the opportunity of
> pruning its sets based on variable scopes (it would need to compute
> scope liveness in vt_initialize).
That has been discussed and refused by GDB folks AFAIK.
E.g. see http://gcc.gnu.org/ml/gcc-patches/2010-03/msg00960.html
(though I think it was mostly IRC based later on).
> Anyway, here's a patch improving compile-time for this testcase by ~6%
>
> Index: gcc/var-tracking.c
> ===================================================================
> --- gcc/var-tracking.c (revision 206599)
> +++ gcc/var-tracking.c (working copy)
> @@ -6934,12 +6934,12 @@ vt_find_locations (void)
> bool success = true;
>
> timevar_push (TV_VAR_TRACKING_DATAFLOW);
> - /* Compute reverse completion order of depth first search of the CFG
> + /* Compute reverse top sord order of the inverted CFG
> so that the data-flow runs faster. */
> - rc_order = XNEWVEC (int, n_basic_blocks_for_fn (cfun) - NUM_FIXED_BLOCKS);
> + rc_order = XNEWVEC (int, n_basic_blocks_for_fn (cfun));
> bb_order = XNEWVEC (int, last_basic_block_for_fn (cfun));
> - pre_and_rev_post_order_compute (NULL, rc_order, false);
> - for (i = 0; i < n_basic_blocks_for_fn (cfun) - NUM_FIXED_BLOCKS; i++)
> + int num = inverted_post_order_compute (rc_order);
> + for (i = 0; i < num; i++)
> bb_order[rc_order[i]] = i;
> free (rc_order);
If it doesn't regress on other testcases (var-tracking speed wise), then that
LGTM.
* [Bug c++/53958] set_slot_part and canon_value_cmp using 90% of compile time
From: aoliva at gcc dot gnu.org @ 2014-01-21 11:55 UTC
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53958
--- Comment #8 from Alexandre Oliva <aoliva at gcc dot gnu.org> ---
Jakub is right WRT onepart vs non-onepart vars. Now, I can't think of any reason why
the union/intersection couldn't be done incrementally, and only for changed
incoming sets (but how would you tell an incoming set changed?).
IIRC, onepart and non-onepart sets are effectively disjoint, even if they share
the same data structures, so when we perform union on one and intersection on
the other we could avoid doing that repeatedly.
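The disjointness point can be pictured with a deliberately simplified model: one bit per tracked variable, and a hypothetical ONEPART_MASK marking the variables tracked through VALUEs (merged by intersection) while the rest are merged by union. Because the two partitions do not overlap, a single walk over the predecessors can maintain both meets at once. This is an illustration only; merge_partitioned is a made-up name and var-tracking does not store its sets as bitmasks.

```c
#include <assert.h>
#include <stdint.h>

/* Merge the predecessors' OUT sets: intersection on the bits selected
   by ONEPART_MASK, union on all other bits.  Requires at least one
   predecessor.  */
static uint32_t
merge_partitioned (const uint32_t *pred_out, int npreds,
                   uint32_t onepart_mask)
{
  assert (npreds > 0);
  uint32_t inter = pred_out[0];   /* running intersection */
  uint32_t uni = pred_out[0];     /* running union */
  for (int i = 1; i < npreds; i++)
    {
      inter &= pred_out[i];
      uni |= pred_out[i];
    }
  return (inter & onepart_mask) | (uni & ~onepart_mask);
}
```

With onepart_mask = 0x0F, merging 0x35 and 0x96 keeps only the shared low bit (0x04) but all high bits seen on either side (0xB0), giving 0xB4.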