public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug ipa/109983] New: [12/13/14 regression] Wireshark compilation hangs with -O2 -fipa-pta
@ 2023-05-26  9:26 sjames at gcc dot gnu.org
  2023-05-26  9:28 ` [Bug ipa/109983] " sjames at gcc dot gnu.org
                   ` (19 more replies)
  0 siblings, 20 replies; 21+ messages in thread
From: sjames at gcc dot gnu.org @ 2023-05-26  9:26 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109983

            Bug ID: 109983
           Summary: [12/13/14 regression] Wireshark compilation hangs with
                    -O2 -fipa-pta
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: ipa
          Assignee: unassigned at gcc dot gnu.org
          Reporter: sjames at gcc dot gnu.org
                CC: marxin at gcc dot gnu.org
  Target Milestone: ---

Created attachment 55165
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55165&action=edit
packet-rnsap.c.i.xz

Originally reported downstream in Gentoo at https://bugs.gentoo.org/907182.

Building Wireshark with -fipa-pta ends up hanging when compiling
packet-rnsap.c.

It's enough to just try to build the reproducer with -O2 -fipa-pta for me
(actually, -O1 hangs too):
```
$ gcc-12 -O2 -fipa-pta -c packet-rnsap.c.i
[... hangs ...]
```

gcc 11.3.1 20230518 doesn't hang, while 12.3.1 20230519 / 13.1.1 20230520 /
14.0.0 20230521 do.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Bug ipa/109983] [12/13/14 regression] Wireshark compilation hangs with -O2 -fipa-pta
  2023-05-26  9:26 [Bug ipa/109983] New: [12/13/14 regression] Wireshark compilation hangs with -O2 -fipa-pta sjames at gcc dot gnu.org
@ 2023-05-26  9:28 ` sjames at gcc dot gnu.org
  2023-05-26  9:36 ` sjames at gcc dot gnu.org
                   ` (18 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: sjames at gcc dot gnu.org @ 2023-05-26  9:28 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109983

--- Comment #1 from Sam James <sjames at gcc dot gnu.org> ---
I let perf spin for a while and got this w/ 13:

```
$ perf record gcc-13 -O2 -fipa-pta -c packet-rnsap.c.i
[^C'd after ~2 minutes]

$ perf report
    43.18%  cc1      cc1                      [.] bitmap_ior_into
    40.43%  cc1      cc1                      [.] bitmap_bit_p
     9.65%  cc1      cc1                      [.] bitmap_set_bit
     2.87%  cc1      cc1                      [.] solve_graph
     0.98%  cc1      cc1                      [.] solution_set_expand
     0.80%  cc1      cc1                      [.] add_graph_edge
     0.39%  cc1      cc1                      [.] find
     0.04%  cc1      cc1                      [.]
symbol_table::decl_assembler_name_hash
     0.04%  cc1      libc.so.6                [.] malloc
     0.03%  cc1      cc1                      [.] ggc_set_mark
     0.03%  cc1      cc1                      [.] verify_ssa
     0.03%  cc1      cc1                      [.] gt_ggc_mx_lang_tree_node
     0.03%  cc1      cc1                      [.] walk_gimple_op
     0.03%  cc1      cc1                      [.] walk_tree_1
     0.03%  cc1      cc1                      [.]
base_pool_allocator<memory_block_pool>::allocate
     0.03%  cc1      cc1                      [.] solve_constraints
     0.02%  cc1      cc1                      [.]
bitmap_list_insert_element_after
     0.02%  cc1      cc1                      [.] verify_gimple_call
     0.02%  cc1      libc.so.6                [.] cfree
     0.02%  cc1      cc1                      [.] ggc_internal_alloc
     0.02%  cc1      cc1                      [.] verify_gimple_in_cfg
     0.02%  cc1      cc1                      [.] equiv_class_lookup_or_add
     0.02%  cc1      cc1                      [.] verify_gimple_stmt
     0.02%  cc1      [unknown]                [k] 0xffffffff89000b90
     0.02%  cc1      cc1                      [.] do_per_function
     0.01%  cc1      cc1                      [.]
hash_table<default_hash_traits<tree_node*>, false, xcallocator>::verify
     0.01%  cc1      cc1                      [.] contains_struct_check
     0.01%  cc1      cc1                      [.]
hash_table<hash_map<nofree_string_hash, align_flags,
simple_hashmap_traits<default_hash_traits<nofree_string_hash>, align_flags>
>::hash_entry, false, xcallocator>::verify
     0.01%  cc1      cc1                      [.] free_dominance_info
     0.01%  cc1      cc1                      [.]
operands_scanner::get_expr_operands
     0.01%  cc1      cc1                      [.] verify_expr_location_1
     0.01%  cc1      cc1                      [.] _cpp_lex_direct
     0.01%  cc1      cc1                      [.] verify_location
     0.01%  cc1      cc1                      [.] (anonymous
namespace)::dom_info::calc_dfs_tree_nonrec
     0.01%  cc1      cc1                      [.] (anonymous
namespace)::dom_info::calc_idoms
[...]
```

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Bug ipa/109983] [12/13/14 regression] Wireshark compilation hangs with -O2 -fipa-pta
  2023-05-26  9:26 [Bug ipa/109983] New: [12/13/14 regression] Wireshark compilation hangs with -O2 -fipa-pta sjames at gcc dot gnu.org
  2023-05-26  9:28 ` [Bug ipa/109983] " sjames at gcc dot gnu.org
@ 2023-05-26  9:36 ` sjames at gcc dot gnu.org
  2023-05-26 10:38 ` slyfox at gcc dot gnu.org
                   ` (17 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: sjames at gcc dot gnu.org @ 2023-05-26  9:36 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109983

Sam James <sjames at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |compile-time-hog

--- Comment #2 from Sam James <sjames at gcc dot gnu.org> ---
Note that compilation (even with checking off) is kind of slow even without
-fipa-pta, but of course it doesn't hang then.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Bug ipa/109983] [12/13/14 regression] Wireshark compilation hangs with -O2 -fipa-pta
  2023-05-26  9:26 [Bug ipa/109983] New: [12/13/14 regression] Wireshark compilation hangs with -O2 -fipa-pta sjames at gcc dot gnu.org
  2023-05-26  9:28 ` [Bug ipa/109983] " sjames at gcc dot gnu.org
  2023-05-26  9:36 ` sjames at gcc dot gnu.org
@ 2023-05-26 10:38 ` slyfox at gcc dot gnu.org
  2023-05-26 10:48 ` sjames at gcc dot gnu.org
                   ` (16 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: slyfox at gcc dot gnu.org @ 2023-05-26 10:38 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109983

Sergei Trofimovich <slyfox at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |slyfox at gcc dot gnu.org

--- Comment #3 from Sergei Trofimovich <slyfox at gcc dot gnu.org> ---
Quoting `man gcc`:

       -fipa-pta
           Perform interprocedural pointer analysis and interprocedural
modification
           and reference analysis.  This option can cause excessive memory and
           compile-time usage on large compilation units.  It is not enabled by
           default at any optimization level.

Chances are `gcc` succeeds in a longer time: O(minutes). For me it finishes in
3 minutes on gcc-14. Enabled extra checkers (verify_*) probably make it slower.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Bug ipa/109983] [12/13/14 regression] Wireshark compilation hangs with -O2 -fipa-pta
  2023-05-26  9:26 [Bug ipa/109983] New: [12/13/14 regression] Wireshark compilation hangs with -O2 -fipa-pta sjames at gcc dot gnu.org
                   ` (2 preceding siblings ...)
  2023-05-26 10:38 ` slyfox at gcc dot gnu.org
@ 2023-05-26 10:48 ` sjames at gcc dot gnu.org
  2023-05-26 11:04 ` slyfox at gcc dot gnu.org
                   ` (15 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: sjames at gcc dot gnu.org @ 2023-05-26 10:48 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109983

--- Comment #4 from Sam James <sjames at gcc dot gnu.org> ---
I know what the option does, but:
1. It's substantially slower in 12/13/14, with or without checking. If that's
expected, that's fine, but someone has to say if it is.

2. With default checking (=release) on 12.2.1 20230428, it's taken over 5
minutes.

3. Even lots of checking (yes,rtl,extra) on 11, it takes about 10 seconds.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Bug ipa/109983] [12/13/14 regression] Wireshark compilation hangs with -O2 -fipa-pta
  2023-05-26  9:26 [Bug ipa/109983] New: [12/13/14 regression] Wireshark compilation hangs with -O2 -fipa-pta sjames at gcc dot gnu.org
                   ` (3 preceding siblings ...)
  2023-05-26 10:48 ` sjames at gcc dot gnu.org
@ 2023-05-26 11:04 ` slyfox at gcc dot gnu.org
  2023-05-26 18:17 ` slyfox at gcc dot gnu.org
                   ` (14 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: slyfox at gcc dot gnu.org @ 2023-05-26 11:04 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109983

--- Comment #5 from Sergei Trofimovich <slyfox at gcc dot gnu.org> ---
> For me it finishes in 3 minutes on gcc-14.

I'll take it back. It does not finish for me in 10 minutes on gcc-14. Don't
know where I picked the number.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Bug ipa/109983] [12/13/14 regression] Wireshark compilation hangs with -O2 -fipa-pta
  2023-05-26  9:26 [Bug ipa/109983] New: [12/13/14 regression] Wireshark compilation hangs with -O2 -fipa-pta sjames at gcc dot gnu.org
                   ` (4 preceding siblings ...)
  2023-05-26 11:04 ` slyfox at gcc dot gnu.org
@ 2023-05-26 18:17 ` slyfox at gcc dot gnu.org
  2023-05-26 20:03 ` slyfox at gcc dot gnu.org
                   ` (13 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: slyfox at gcc dot gnu.org @ 2023-05-26 18:17 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109983

--- Comment #6 from Sergei Trofimovich <slyfox at gcc dot gnu.org> ---
Created attachment 55174
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55174&action=edit
packet-rnsap-shrunk-slightly.c.i.xz

packet-rnsap-shrunk-slightly.c.i.xz is a slightly shrunk version of the
original. 

It exhibits 10x slowdown and has a reasonable compilation completion time.
Might be useful to explore as is or bisect gcc:

$ gcc -O2 -c packet-rnsap-shrunk-slightly.c.i -o bug.o -fipa-pta
-Wno-deprecated-declarations -fno-ipa-pta >/dev/null 2>&1

real    0m0,657s
user    0m0,626s
sys     0m0,026s

$ gcc -O2 -c packet-rnsap-shrunk-slightly.c.i -o bug.o -fipa-pta
-Wno-deprecated-declarations -fipa-pta >/dev/null 2>&1

real    0m6,120s
user    0m6,065s
sys     0m0,045s

-ftime-report says 'ipa points-to' takes 88%.

-fdump-ipa-all-details creates 2.0G bug.i.092i.pta2 file (the rest of files are
unred 5M).

I suspect it's a pathology in solving a huge `proto_reg_handoff_rnsap()` graph.
Some variables have up to 5000 PT entries.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Bug ipa/109983] [12/13/14 regression] Wireshark compilation hangs with -O2 -fipa-pta
  2023-05-26  9:26 [Bug ipa/109983] New: [12/13/14 regression] Wireshark compilation hangs with -O2 -fipa-pta sjames at gcc dot gnu.org
                   ` (5 preceding siblings ...)
  2023-05-26 18:17 ` slyfox at gcc dot gnu.org
@ 2023-05-26 20:03 ` slyfox at gcc dot gnu.org
  2023-05-26 20:09 ` pinskia at gcc dot gnu.org
                   ` (12 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: slyfox at gcc dot gnu.org @ 2023-05-26 20:03 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109983

--- Comment #7 from Sergei Trofimovich <slyfox at gcc dot gnu.org> ---
Original packet-rnsap.c.i.xz takes 27 minutes to compile for me.

The hack below cuts this time down to 9 minutes (slashes 60% of runtime). 

The considerable amount of time is spent looking up the bitmaps for graph edges
to extract and solve PT facts.

I'd say there is a room for micro-optimization to turn bitmap to something
slightly smarter than a linked list. It will not improve the runtime too much.

Another option could be to put a limit on edge count (say, controlled by a
`param`) which `gcc` could use to fallback on conservative value.

--- a/gcc/bitmap.h
+++ b/gcc/bitmap.h
@@ -283,7 +283,7 @@ typedef unsigned long BITMAP_WORD;
 /* Number of words to use for each element in the linked list.  */

 #ifndef BITMAP_ELEMENT_WORDS
-#define BITMAP_ELEMENT_WORDS ((128 + BITMAP_WORD_BITS - 1) / BITMAP_WORD_BITS)
+#define BITMAP_ELEMENT_WORDS ((8192 + BITMAP_WORD_BITS - 1) /
BITMAP_WORD_BITS)
 #endif

 /* Number of bits in each actual element of a bitmap.  */

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Bug ipa/109983] [12/13/14 regression] Wireshark compilation hangs with -O2 -fipa-pta
  2023-05-26  9:26 [Bug ipa/109983] New: [12/13/14 regression] Wireshark compilation hangs with -O2 -fipa-pta sjames at gcc dot gnu.org
                   ` (6 preceding siblings ...)
  2023-05-26 20:03 ` slyfox at gcc dot gnu.org
@ 2023-05-26 20:09 ` pinskia at gcc dot gnu.org
  2023-05-30  7:25 ` rguenth at gcc dot gnu.org
                   ` (11 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-05-26 20:09 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109983

--- Comment #8 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Sergei Trofimovich from comment #7)
> Original packet-rnsap.c.i.xz takes 27 minutes to compile for me.
> 
> The hack below cuts this time down to 9 minutes (slashes 60% of runtime). 

Or maybe it should be moved over to use sbitmap rather than bitmap ...

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Bug ipa/109983] [12/13/14 regression] Wireshark compilation hangs with -O2 -fipa-pta
  2023-05-26  9:26 [Bug ipa/109983] New: [12/13/14 regression] Wireshark compilation hangs with -O2 -fipa-pta sjames at gcc dot gnu.org
                   ` (7 preceding siblings ...)
  2023-05-26 20:09 ` pinskia at gcc dot gnu.org
@ 2023-05-30  7:25 ` rguenth at gcc dot gnu.org
  2023-05-30  8:39 ` rguenth at gcc dot gnu.org
                   ` (10 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-05-30  7:25 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109983

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2023-05-30
   Target Milestone|---                         |12.4
     Ever confirmed|0                           |1
           Assignee|unassigned at gcc dot gnu.org      |rguenth at gcc dot gnu.org
             Status|UNCONFIRMED                 |ASSIGNED

--- Comment #9 from Richard Biener <rguenth at gcc dot gnu.org> ---
IPA PTA (the PTA solver in general) is a known hog, it got significantly worse
for testcases involving a lot of calls with Honzas re-org to improve PTA around
calls.

My suggestion is to not enable -fipa-pta if you hit such issue or in general if
you don't know it pays off with a good runtime performance boost.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Bug ipa/109983] [12/13/14 regression] Wireshark compilation hangs with -O2 -fipa-pta
  2023-05-26  9:26 [Bug ipa/109983] New: [12/13/14 regression] Wireshark compilation hangs with -O2 -fipa-pta sjames at gcc dot gnu.org
                   ` (8 preceding siblings ...)
  2023-05-30  7:25 ` rguenth at gcc dot gnu.org
@ 2023-05-30  8:39 ` rguenth at gcc dot gnu.org
  2023-05-30  9:08 ` rguenth at gcc dot gnu.org
                   ` (9 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-05-30  8:39 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109983

--- Comment #10 from Richard Biener <rguenth at gcc dot gnu.org> ---
Unsurprisingly, for the full testcase on current trunk with release checking:

 ipa points-to                      :1080.80 (100%)   0.56 ( 47%)1081.52 (100%)
 9739k (  4%)
 TOTAL                              :1083.27          1.20       1084.62       
  261M

1083.27user 1.21system 18:04.64elapsed 99%CPU (0avgtext+0avgdata
2778064maxresident)k
0inputs+9168outputs (0major+712792minor)pagefaults 0swaps


Note there are PRs showing PTA itself is slow for large functions, IPA PTA
isn't very much special here.

The bitmap data structure isn't the main issue, the main issue is the
working set becomes too big and the iteration isn't optimized for memory
locality.  The biggest hits tends to be the bitmap_ior_into done in
solve_add_graph_edge which is necessary to support the solver working
on incremental solution changes only (we only have 'changed' on nodes,
not on graph edges which are only represented implicitly).

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Bug ipa/109983] [12/13/14 regression] Wireshark compilation hangs with -O2 -fipa-pta
  2023-05-26  9:26 [Bug ipa/109983] New: [12/13/14 regression] Wireshark compilation hangs with -O2 -fipa-pta sjames at gcc dot gnu.org
                   ` (9 preceding siblings ...)
  2023-05-30  8:39 ` rguenth at gcc dot gnu.org
@ 2023-05-30  9:08 ` rguenth at gcc dot gnu.org
  2023-05-31 10:37 ` rguenth at gcc dot gnu.org
                   ` (8 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-05-30  9:08 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109983

--- Comment #11 from Richard Biener <rguenth at gcc dot gnu.org> ---
Created attachment 55213
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55213&action=edit
prototype patch

Trying to fix that (again - I usually break correctness here :/) this
would result in

 ipa points-to                      : 170.66 ( 99%)   0.28 ( 28%) 170.95 ( 98%)
 1428k (  1%)
 TOTAL                              : 173.10          0.99        174.12       
  253M

so still slow but a lot faster.  What the patch definitely misses is
handling the graph node merging cases, properly merging new_edges[] for
the merged nodes.  Note it's also trading memory locality against the
possibility of needing more solver iterations (it probably pays off for
large working sets but might hurt small ones).

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Bug ipa/109983] [12/13/14 regression] Wireshark compilation hangs with -O2 -fipa-pta
  2023-05-26  9:26 [Bug ipa/109983] New: [12/13/14 regression] Wireshark compilation hangs with -O2 -fipa-pta sjames at gcc dot gnu.org
                   ` (10 preceding siblings ...)
  2023-05-30  9:08 ` rguenth at gcc dot gnu.org
@ 2023-05-31 10:37 ` rguenth at gcc dot gnu.org
  2023-05-31 10:49 ` rguenth at gcc dot gnu.org
                   ` (7 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-05-31 10:37 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109983

--- Comment #12 from Richard Biener <rguenth at gcc dot gnu.org> ---
Bad:

Samples: 4M of event 'cycles:u', Event count (approx.): 6574968845607           
Overhead       Samples  Command  Shared Object       Symbol                     
  59.44%       2803407  cc1      cc1                 [.] bitmap_ior_into
  23.71%       1119062  cc1      cc1                 [.] bitmap_bit_p
  12.30%        583673  cc1      cc1                 [.] bitmap_set_bit
   1.13%         53698  cc1      cc1                 [.] find
   1.09%         51770  cc1      cc1                 [.] solve_constraints
   0.55%         26265  cc1      cc1                 [.] add_graph_edge
   0.52%         24726  cc1      cc1                 [.] solve_add_graph_edge
   0.34%         16328  cc1      cc1                 [.]
find_what_var_points_to
   0.23%         11259  cc1      cc1                 [.] solution_set_expand
   0.14%          7015  cc1      cc1                 [.] ipa_pta_execute
   0.13%          6296  cc1      cc1                 [.] topo_visit
   0.06%          2708  cc1      cc1                 [.] auto_var_in_fn_p

        1157768.85 msec task-clock:u                     #    1.000 CPUs
utilized          
                 0      context-switches:u               #    0.000 /sec        
                 0      cpu-migrations:u                 #    0.000 /sec        
            712310      page-faults:u                    #  615.244 /sec        
     6492736607297      cycles:u                         #    5.608 GHz        
             (83.33%)
         629989262      stalled-cycles-frontend:u        #    0.01% frontend
cycles idle     (83.33%)
        1079567639      stalled-cycles-backend:u         #    0.02% backend
cycles idle      (83.33%)
     3618200824104      instructions:u                   #    0.56  insn per
cycle         
                                                  #    0.00  stalled cycles per
insn  (83.33%)
      976059766707      branches:u                       #  843.052 M/sec      
             (83.33%)
        2658800911      branch-misses:u                  #    0.27% of all
branches          (83.33%)

    1158.070038218 seconds time elapsed

    1157.021441000 seconds user
       0.743804000 seconds sys

Samples: 4M of event 'cache-misses:u', Event count (approx.): 30071477326       
Overhead       Samples  Command  Shared Object       Symbol                     
  67.54%       2723606  cc1      cc1                 [.] bitmap_ior_into
  16.50%        994335  cc1      cc1                 [.] bitmap_bit_p
   3.33%        121544  cc1      cc1                 [.] bitmap_set_bit


Good:

Samples: 390K of event 'cycles:u', Event count (approx.): 532564428833          
Overhead       Samples  Command  Shared Object       Symbol                     
  73.09%        280949  cc1      cc1                 [.] bitmap_set_bit
   3.70%         14919  cc1      cc1                 [.] find
   3.54%         13591  cc1      cc1                 [.]
find_what_var_points_to
   3.48%         13952  cc1      cc1                 [.] bitmap_bit_p
   2.26%          8969  cc1      cc1                 [.] bitmap_ior_into
   2.24%          9027  cc1      cc1                 [.] add_graph_edge
   2.03%          8176  cc1      cc1                 [.] solve_add_graph_edge
   1.75%          7072  cc1      cc1                 [.] solution_set_expand
   1.41%          5620  cc1      cc1                 [.] ipa_pta_execute
   1.34%          5406  cc1      cc1                 [.] solve_constraints

          98853.84 msec task-clock:u                     #    1.000 CPUs
utilized          
                 0      context-switches:u               #    0.000 /sec        
                 0      cpu-migrations:u                 #    0.000 /sec        
            558875      page-faults:u                    #    5.654 K/sec       
      533990826593      cycles:u                         #    5.402 GHz        
             (83.33%)
         387751641      stalled-cycles-frontend:u        #    0.07% frontend
cycles idle     (83.33%)
         236682949      stalled-cycles-backend:u         #    0.04% backend
cycles idle      (83.33%)
     1084066822948      instructions:u                   #    2.03  insn per
cycle         
                                                  #    0.00  stalled cycles per
insn  (83.33%)
      276015818957      branches:u                       #    2.792 G/sec      
             (83.34%)
         899265351      branch-misses:u                  #    0.33% of all
branches          (83.33%)

      98.878776190 seconds time elapsed

      98.459437000 seconds user
       0.395912000 seconds sys

Samples: 140K of event 'cache-misses:u', Event count (approx.): 4272440656      
Overhead       Samples  Command  Shared Object       Symbol                     
  17.92%          6124  cc1      cc1                 [.] ipa_pta_execute
  11.93%         28566  cc1      cc1                 [.] bitmap_set_bit
  10.48%         14489  cc1      cc1                 [.] find


But the key observation is from the stats.  In the bad case we have

Points-to Stats:
Total vars:               108033
Non-pointer vars:          59
Statically unified vars:  20147
Dynamically unified vars: 0
Iterations:               4
Number of edges:          141447426
Number of implicit edges: 215676
Points to sets created:15930

and the good case sees

Points-to Stats:
Total vars:               108033
Non-pointer vars:          59
Statically unified vars:  20147
Dynamically unified vars: 0
Iterations:               10
Number of edges:          582729
Number of implicit edges: 215676
Points to sets created:15930

so we have a _lot_ more (242x) edges in the slow case (but also only 4
solver iterations) while patched we somehow do with less edges (and
thus way less propagations aka bitmap_ior) but need 2.5x the number of
solver iterations.

The final points-to solutions are 1:1 identical (semantically, I did
not look at the ESCAPED set here).

The key is the heuristic in add_graph_edge triggers way more often:

      /* The graph solving process does not avoid "triangles", thus
         there can be multiple paths from a node to another involving
         intermediate other nodes.  That causes extra copying which is
         most difficult to avoid when the intermediate node is ESCAPED
         because there are no edges added from ESCAPED.  Avoid
         adding the direct edge FROM -> TO when we have FROM -> ESCAPED
         and TO contains ESCAPED. 
         ???  Note this is only a heuristic, it does not prevent the
         situation from occuring.  The heuristic helps PR38474 and
         PR99912 significantly.  */
      if (to < FIRST_REF_NODE
          && bitmap_bit_p (graph->succs[from], find (escaped_id))
          && bitmap_bit_p (get_varinfo (find (to))->solution, escaped_id))
        {
          stats.num_avoided_edges++;
          return false;
        }

I've added num_avoided_edges and for the fast case we see

Number of avoided edges: 870215056 (that's 6x the number of edges in the slow
case!)

For the slow case we have

Number of avoided edges: 1113769444

which is even more.  But I think that should be the only reason the
number of edges created differs.  Converging more slowly means
executing ESCAPED = *ESCAPED more often which creates edges to
escaped earlier and thus possibly triggering the above.  But it
also means the 'to' solution might not yet contain ESCAPED ...

ICK.  I can get it to

 ipa points-to                      :   0.40 ( 14%)   0.03 (  5%)   0.43 ( 12%)
 1950k (  1%)
 TOTAL                              :   2.87          0.61          3.49       
  253M

when I avoid defering copies to ESCAPED itself (that makes the first
ESCAPED = *ESCAPED do more but delays increasing other solutions
to followup iterations).  In fact it probably special-cases the
ESCAPED = *ESCAPED complex constraint which we also locally iterate
until convergence before visiting other constraints.

Note the initial graph tends to be very unconnected thus the order
is somewhat random but the way we compute the topological sorting
and process the worklist ensures ESCAPED = *ESCAPED is done as something
very last in the first iteration (if there are no existing edges to
ESCAPED - I think we do have those for calls though).

It does help to order the ESCAPED connected component first:

Points-to Stats:
Total vars:               108033
Non-pointer vars:          59
Statically unified vars:  20147
Dynamically unified vars: 0
Iterations:               3
Number of edges:          239596
Number of implicit edges: 215676
Points to sets created:15930

 ipa points-to                      :   0.28 ( 10%)   0.05 (  8%)   0.34 ( 10%)
 2191k (  1%)
 TOTAL                              :   2.85          0.60          3.47       
  253M

I'm testing this approach now.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Bug ipa/109983] [12/13/14 regression] Wireshark compilation hangs with -O2 -fipa-pta
  2023-05-26  9:26 [Bug ipa/109983] New: [12/13/14 regression] Wireshark compilation hangs with -O2 -fipa-pta sjames at gcc dot gnu.org
                   ` (11 preceding siblings ...)
  2023-05-31 10:37 ` rguenth at gcc dot gnu.org
@ 2023-05-31 10:49 ` rguenth at gcc dot gnu.org
  2023-05-31 11:27 ` cvs-commit at gcc dot gnu.org
                   ` (6 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-05-31 10:49 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109983

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
      Known to work|                            |11.3.0

--- Comment #13 from Richard Biener <rguenth at gcc dot gnu.org> ---
GCC 11.3 also takes ~3s to compile this so the patch would fix the regression
in full.  It's also remotely simple enough to be backported I think.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Bug ipa/109983] [12/13/14 regression] Wireshark compilation hangs with -O2 -fipa-pta
  2023-05-26  9:26 [Bug ipa/109983] New: [12/13/14 regression] Wireshark compilation hangs with -O2 -fipa-pta sjames at gcc dot gnu.org
                   ` (12 preceding siblings ...)
  2023-05-31 10:49 ` rguenth at gcc dot gnu.org
@ 2023-05-31 11:27 ` cvs-commit at gcc dot gnu.org
  2023-05-31 11:27 ` rguenth at gcc dot gnu.org
                   ` (5 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2023-05-31 11:27 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109983

--- Comment #14 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:95e5c38a98cc64a797b1d766a20f8c0d0c807a74

commit r14-1429-g95e5c38a98cc64a797b1d766a20f8c0d0c807a74
Author: Richard Biener <rguenther@suse.de>
Date:   Wed May 31 12:07:42 2023 +0200

    ipa/109983 - (IPA) PTA speedup

    This improves the edge avoidance heuristic by re-ordering the
    topological sort of the graph to make sure the component with
    the ESCAPED node is processed first.  This improves the number
    of created edges which directly correlates with the number
    of bitmap_ior_into calls from 141447426 to 239596 and the
    compile-time from 1083s to 3s.  It also improves the compile-time
    for the related PR109143 from 81s to 27s.

    I've modernized the topological sorting API on the way as well.

            PR ipa/109983
            PR tree-optimization/109143
            * tree-ssa-structalias.cc (struct topo_info): Remove.
            (init_topo_info): Likewise.
            (free_topo_info): Likewise.
            (compute_topo_order): Simplify API, put the component
            with ESCAPED last so it's processed first.
            (topo_visit): Adjust.
            (solve_graph): Likewise.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Bug ipa/109983] [12/13/14 regression] Wireshark compilation hangs with -O2 -fipa-pta
  2023-05-26  9:26 [Bug ipa/109983] New: [12/13/14 regression] Wireshark compilation hangs with -O2 -fipa-pta sjames at gcc dot gnu.org
                   ` (13 preceding siblings ...)
  2023-05-31 11:27 ` cvs-commit at gcc dot gnu.org
@ 2023-05-31 11:27 ` rguenth at gcc dot gnu.org
  2023-05-31 19:27 ` [Bug ipa/109983] [12/13 " sjames at gcc dot gnu.org
                   ` (4 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-05-31 11:27 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109983

--- Comment #15 from Richard Biener <rguenth at gcc dot gnu.org> ---
Btw, for me GCC 12.3 takes 170s for the compile.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Bug ipa/109983] [12/13 regression] Wireshark compilation hangs with -O2 -fipa-pta
  2023-05-26  9:26 [Bug ipa/109983] New: [12/13/14 regression] Wireshark compilation hangs with -O2 -fipa-pta sjames at gcc dot gnu.org
                   ` (14 preceding siblings ...)
  2023-05-31 11:27 ` rguenth at gcc dot gnu.org
@ 2023-05-31 19:27 ` sjames at gcc dot gnu.org
  2023-06-06  8:42 ` rguenth at gcc dot gnu.org
                   ` (3 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: sjames at gcc dot gnu.org @ 2023-05-31 19:27 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109983

--- Comment #16 from Sam James <sjames at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #9)
> My suggestion is to not enable -fipa-pta if you hit such issue or in general
> if you don't know it pays off with a good runtime performance boost.

Many thanks richi. I've been telling people this informally but I'll try to
make it a bit clearer somewhere as well. We don't recommend using -fipa-pta but
some adventurous users pick it up.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Bug ipa/109983] [12/13 regression] Wireshark compilation hangs with -O2 -fipa-pta
  2023-05-26  9:26 [Bug ipa/109983] New: [12/13/14 regression] Wireshark compilation hangs with -O2 -fipa-pta sjames at gcc dot gnu.org
                   ` (15 preceding siblings ...)
  2023-05-31 19:27 ` [Bug ipa/109983] [12/13 " sjames at gcc dot gnu.org
@ 2023-06-06  8:42 ` rguenth at gcc dot gnu.org
  2023-06-23 10:29 ` cvs-commit at gcc dot gnu.org
                   ` (2 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-06-06  8:42 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109983

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P2

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Bug ipa/109983] [12/13 regression] Wireshark compilation hangs with -O2 -fipa-pta
  2023-05-26  9:26 [Bug ipa/109983] New: [12/13/14 regression] Wireshark compilation hangs with -O2 -fipa-pta sjames at gcc dot gnu.org
                   ` (16 preceding siblings ...)
  2023-06-06  8:42 ` rguenth at gcc dot gnu.org
@ 2023-06-23 10:29 ` cvs-commit at gcc dot gnu.org
  2023-11-27 11:34 ` [Bug ipa/109983] [12 " cvs-commit at gcc dot gnu.org
  2023-11-27 11:38 ` rguenth at gcc dot gnu.org
  19 siblings, 0 replies; 21+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2023-06-23 10:29 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109983

--- Comment #17 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The releases/gcc-13 branch has been updated by Richard Biener
<rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:a737da4885b7038fdd7d6646deaa1644efff2257

commit r13-7466-ga737da4885b7038fdd7d6646deaa1644efff2257
Author: Richard Biener <rguenther@suse.de>
Date:   Wed May 31 12:07:42 2023 +0200

    ipa/109983 - (IPA) PTA speedup

    This improves the edge avoidance heuristic by re-ordering the
    topological sort of the graph to make sure the component with
    the ESCAPED node is processed first.  This improves the number
    of created edges which directly correlates with the number
    of bitmap_ior_into calls from 141447426 to 239596 and the
    compile-time from 1083s to 3s.  It also improves the compile-time
    for the related PR109143 from 81s to 27s.

    I've modernized the topological sorting API on the way as well.

            PR ipa/109983
            PR tree-optimization/109143
            * tree-ssa-structalias.cc (struct topo_info): Remove.
            (init_topo_info): Likewise.
            (free_topo_info): Likewise.
            (compute_topo_order): Simplify API, put the component
            with ESCAPED last so it's processed first.
            (topo_visit): Adjust.
            (solve_graph): Likewise.

    (cherry picked from commit 95e5c38a98cc64a797b1d766a20f8c0d0c807a74)

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Bug ipa/109983] [12 regression] Wireshark compilation hangs with -O2 -fipa-pta
  2023-05-26  9:26 [Bug ipa/109983] New: [12/13/14 regression] Wireshark compilation hangs with -O2 -fipa-pta sjames at gcc dot gnu.org
                   ` (17 preceding siblings ...)
  2023-06-23 10:29 ` cvs-commit at gcc dot gnu.org
@ 2023-11-27 11:34 ` cvs-commit at gcc dot gnu.org
  2023-11-27 11:38 ` rguenth at gcc dot gnu.org
  19 siblings, 0 replies; 21+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2023-11-27 11:34 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109983

--- Comment #18 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The releases/gcc-12 branch has been updated by Richard Biener
<rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:5a381334e2b0faac75c059504155aa4a8f8809ac

commit r12-10005-g5a381334e2b0faac75c059504155aa4a8f8809ac
Author: Richard Biener <rguenther@suse.de>
Date:   Wed May 31 12:07:42 2023 +0200

    ipa/109983 - (IPA) PTA speedup

    This improves the edge avoidance heuristic by re-ordering the
    topological sort of the graph to make sure the component with
    the ESCAPED node is processed first.  This improves the number
    of created edges which directly correlates with the number
    of bitmap_ior_into calls from 141447426 to 239596 and the
    compile-time from 1083s to 3s.  It also improves the compile-time
    for the related PR109143 from 81s to 27s.

    I've modernized the topological sorting API on the way as well.

            PR ipa/109983
            PR tree-optimization/109143
            * tree-ssa-structalias.cc (struct topo_info): Remove.
            (init_topo_info): Likewise.
            (free_topo_info): Likewise.
            (compute_topo_order): Simplify API, put the component
            with ESCAPED last so it's processed first.
            (topo_visit): Adjust.
            (solve_graph): Likewise.

    (cherry picked from commit 95e5c38a98cc64a797b1d766a20f8c0d0c807a74)

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Bug ipa/109983] [12 regression] Wireshark compilation hangs with -O2 -fipa-pta
  2023-05-26  9:26 [Bug ipa/109983] New: [12/13/14 regression] Wireshark compilation hangs with -O2 -fipa-pta sjames at gcc dot gnu.org
                   ` (18 preceding siblings ...)
  2023-11-27 11:34 ` [Bug ipa/109983] [12 " cvs-commit at gcc dot gnu.org
@ 2023-11-27 11:38 ` rguenth at gcc dot gnu.org
  19 siblings, 0 replies; 21+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-11-27 11:38 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109983

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|ASSIGNED                    |RESOLVED
      Known to work|                            |12.3.1

--- Comment #19 from Richard Biener <rguenth at gcc dot gnu.org> ---
Fixed.

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2023-11-27 11:38 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-05-26  9:26 [Bug ipa/109983] New: [12/13/14 regression] Wireshark compilation hangs with -O2 -fipa-pta sjames at gcc dot gnu.org
2023-05-26  9:28 ` [Bug ipa/109983] " sjames at gcc dot gnu.org
2023-05-26  9:36 ` sjames at gcc dot gnu.org
2023-05-26 10:38 ` slyfox at gcc dot gnu.org
2023-05-26 10:48 ` sjames at gcc dot gnu.org
2023-05-26 11:04 ` slyfox at gcc dot gnu.org
2023-05-26 18:17 ` slyfox at gcc dot gnu.org
2023-05-26 20:03 ` slyfox at gcc dot gnu.org
2023-05-26 20:09 ` pinskia at gcc dot gnu.org
2023-05-30  7:25 ` rguenth at gcc dot gnu.org
2023-05-30  8:39 ` rguenth at gcc dot gnu.org
2023-05-30  9:08 ` rguenth at gcc dot gnu.org
2023-05-31 10:37 ` rguenth at gcc dot gnu.org
2023-05-31 10:49 ` rguenth at gcc dot gnu.org
2023-05-31 11:27 ` cvs-commit at gcc dot gnu.org
2023-05-31 11:27 ` rguenth at gcc dot gnu.org
2023-05-31 19:27 ` [Bug ipa/109983] [12/13 " sjames at gcc dot gnu.org
2023-06-06  8:42 ` rguenth at gcc dot gnu.org
2023-06-23 10:29 ` cvs-commit at gcc dot gnu.org
2023-11-27 11:34 ` [Bug ipa/109983] [12 " cvs-commit at gcc dot gnu.org
2023-11-27 11:38 ` rguenth at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).