public inbox for gcc-bugs@sourceware.org
* [Bug rtl-optimization/98863] New: WRF with LTO consumes a lot of memory in split2 pass
@ 2021-01-28 11:19 marxin at gcc dot gnu.org
  2021-01-28 11:20 ` [Bug rtl-optimization/98863] " marxin at gcc dot gnu.org
                   ` (49 more replies)
  0 siblings, 50 replies; 51+ messages in thread
From: marxin at gcc dot gnu.org @ 2021-01-28 11:19 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98863

            Bug ID: 98863
           Summary: WRF with LTO consumes a lot of memory in split2 pass
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: marxin at gcc dot gnu.org
            Blocks: 26163
  Target Milestone: ---

Created attachment 50072
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50072&action=edit
CPU and memory usage

Building with -flto -Ofast -march=znver needs more than 20GB of memory for a
single huge ltrans unit.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
[Bug 26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [Bug rtl-optimization/98863] WRF with LTO consumes a lot of memory in split2 pass
  2021-01-28 11:19 [Bug rtl-optimization/98863] New: WRF with LTO consumes a lot of memory in split2 pass marxin at gcc dot gnu.org
@ 2021-01-28 11:20 ` marxin at gcc dot gnu.org
  2021-01-28 13:18 ` rguenth at gcc dot gnu.org
                   ` (48 subsequent siblings)
  49 siblings, 0 replies; 51+ messages in thread
From: marxin at gcc dot gnu.org @ 2021-01-28 11:20 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98863

Martin Liška <marxin at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
                 CC|                            |jamborm at gcc dot gnu.org
   Last reconfirmed|                            |2021-01-28
             Status|UNCONFIRMED                 |NEW

--- Comment #1 from Martin Liška <marxin at gcc dot gnu.org> ---
We have an ltrans that needs > 1000s to compile and has a memory hog here:

1034s: current pass = gcse2 (306)
1034s: current pass = split2 (307)
{'ltrans': {'memory': 1.8760414123535156, 'cpu': 6.25}}
{'ltrans': {'memory': 3.2761878967285156, 'cpu': 6.25}}
{'ltrans': {'memory': 6.182369232177734, 'cpu': 6.25}}
{'ltrans': {'memory': 9.13412094116211, 'cpu': 6.25}}
{'ltrans': {'memory': 12.164928436279297, 'cpu': 6.25}}
{'ltrans': {'memory': 15.184154510498047, 'cpu': 6.25}}
{'ltrans': {'memory': 18.196331024169922, 'cpu': 6.25}}
{'ltrans': {'memory': 21.150096893310547, 'cpu': 6.25}}
{'ltrans': {'memory': 21.467578887939453, 'cpu': 6.24375}}
{'ltrans': {'memory': 21.467578887939453, 'cpu': 6.25}}
{'ltrans': {'memory': 21.468082427978516, 'cpu': 6.25}}
1045s: current pass = ree (308)

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [Bug rtl-optimization/98863] WRF with LTO consumes a lot of memory in split2 pass
  2021-01-28 11:19 [Bug rtl-optimization/98863] New: WRF with LTO consumes a lot of memory in split2 pass marxin at gcc dot gnu.org
  2021-01-28 11:20 ` [Bug rtl-optimization/98863] " marxin at gcc dot gnu.org
@ 2021-01-28 13:18 ` rguenth at gcc dot gnu.org
  2021-01-28 13:21 ` marxin at gcc dot gnu.org
                   ` (47 subsequent siblings)
  49 siblings, 0 replies; 51+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-01-28 13:18 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98863

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
Huh.  I guess you need to trace that with detailed mem stats; split itself
should really be OK, it should be linear in the number of (split) insns.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [Bug rtl-optimization/98863] WRF with LTO consumes a lot of memory in split2 pass
  2021-01-28 11:19 [Bug rtl-optimization/98863] New: WRF with LTO consumes a lot of memory in split2 pass marxin at gcc dot gnu.org
  2021-01-28 11:20 ` [Bug rtl-optimization/98863] " marxin at gcc dot gnu.org
  2021-01-28 13:18 ` rguenth at gcc dot gnu.org
@ 2021-01-28 13:21 ` marxin at gcc dot gnu.org
  2021-01-28 14:14 ` marxin at gcc dot gnu.org
                   ` (46 subsequent siblings)
  49 siblings, 0 replies; 51+ messages in thread
From: marxin at gcc dot gnu.org @ 2021-01-28 13:21 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98863

--- Comment #3 from Martin Liška <marxin at gcc dot gnu.org> ---
It's 521.wrf_r from SPEC 2017.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [Bug rtl-optimization/98863] WRF with LTO consumes a lot of memory in split2 pass
  2021-01-28 11:19 [Bug rtl-optimization/98863] New: WRF with LTO consumes a lot of memory in split2 pass marxin at gcc dot gnu.org
                   ` (2 preceding siblings ...)
  2021-01-28 13:21 ` marxin at gcc dot gnu.org
@ 2021-01-28 14:14 ` marxin at gcc dot gnu.org
  2021-01-28 14:20 ` rguenth at gcc dot gnu.org
                   ` (45 subsequent siblings)
  49 siblings, 0 replies; 51+ messages in thread
From: marxin at gcc dot gnu.org @ 2021-01-28 14:14 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98863

--- Comment #4 from Martin Liška <marxin at gcc dot gnu.org> ---
Created attachment 50075
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50075&action=edit
time and memory report

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [Bug rtl-optimization/98863] WRF with LTO consumes a lot of memory in split2 pass
  2021-01-28 11:19 [Bug rtl-optimization/98863] New: WRF with LTO consumes a lot of memory in split2 pass marxin at gcc dot gnu.org
                   ` (3 preceding siblings ...)
  2021-01-28 14:14 ` marxin at gcc dot gnu.org
@ 2021-01-28 14:20 ` rguenth at gcc dot gnu.org
  2021-01-28 14:24 ` rguenth at gcc dot gnu.org
                   ` (44 subsequent siblings)
  49 siblings, 0 replies; 51+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-01-28 14:20 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98863

--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
So GC memory according to -ftime-report isn't so bad.  Tail of the TOTAL lines,
sorted by time:

 TOTAL                              :  25.40          0.32         25.75         244M
 TOTAL                              :  26.66          0.22         26.90         130M
 TOTAL                              :  67.58          1.49         69.11         834M
 TOTAL                              :  98.32          2.98        101.36        1342M
 TOTAL                              : 671.02          9.38        680.77        2576M

The outlier for me is ltrans34, which is also the biggest unit by far.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [Bug rtl-optimization/98863] WRF with LTO consumes a lot of memory in split2 pass
  2021-01-28 11:19 [Bug rtl-optimization/98863] New: WRF with LTO consumes a lot of memory in split2 pass marxin at gcc dot gnu.org
                   ` (4 preceding siblings ...)
  2021-01-28 14:20 ` rguenth at gcc dot gnu.org
@ 2021-01-28 14:24 ` rguenth at gcc dot gnu.org
  2021-01-28 15:09 ` marxin at gcc dot gnu.org
                   ` (43 subsequent siblings)
  49 siblings, 0 replies; 51+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-01-28 14:24 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98863

--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
df-problems.c:228 (df_rd_alloc)                        702M:  6.9%      705M      18M:  0.7%          0           0       heap
df-problems.c:509 (df_rd_transfer_function)           3709M: 36.6%     3709M     188M:  7.1%          0           0       heap
df-problems.c:227 (df_rd_alloc)                       4417M: 43.6%     4417M     111M:  4.2%          0           0       heap

That's not 20GB but still quite a bit.  For GC memory, complete unrolling is
the biggest offender (but "only" 500MB).

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [Bug rtl-optimization/98863] WRF with LTO consumes a lot of memory in split2 pass
  2021-01-28 11:19 [Bug rtl-optimization/98863] New: WRF with LTO consumes a lot of memory in split2 pass marxin at gcc dot gnu.org
                   ` (5 preceding siblings ...)
  2021-01-28 14:24 ` rguenth at gcc dot gnu.org
@ 2021-01-28 15:09 ` marxin at gcc dot gnu.org
  2021-01-28 15:15 ` rguenth at gcc dot gnu.org
                   ` (42 subsequent siblings)
  49 siblings, 0 replies; 51+ messages in thread
From: marxin at gcc dot gnu.org @ 2021-01-28 15:09 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98863

--- Comment #7 from Martin Liška <marxin at gcc dot gnu.org> ---
I set ulimit -v to 16GB and attached gdb. The allocation failure happens here:

(gdb) p *current_pass
$1 = {
  <pass_data> = {
    type = RTL_PASS,
    name = 0x19f43ae "ree",

(gdb) bt
#0  xmalloc_failed (size=size@entry=65536) at /home/marxin/Programming/gcc/libiberty/xmalloc.c:119
#1  0x00000000019ac968 in xmalloc (size=65536) at /home/marxin/Programming/gcc/libiberty/xmalloc.c:149
#2  0x00000000019a3158 in call_chunkfun (size=<optimized out>, h=0x51b5240) at /home/marxin/Programming/gcc/libiberty/obstack.c:94
#3  _obstack_newchunk (h=h@entry=0x51b5240, length=length@entry=40) at /home/marxin/Programming/gcc/libiberty/obstack.c:206
#4  0x0000000000853cde in bitmap_element_allocate (head=0x5908d020, head=0x5908d020) at /home/marxin/Programming/gcc/gcc/bitmap.c:123
#5  bitmap_list_insert_element_after (head=0x5908d020, elt=0x3d861a4f8, indx=3548, node=<optimized out>) at /home/marxin/Programming/gcc/gcc/bitmap.c:312
#6  0x0000000000855b34 in bitmap_set_range (count=<optimized out>, start=0, head=0x5908d020) at /home/marxin/Programming/gcc/gcc/bitmap.c:1624
#7  bitmap_set_range (head=0x5908d020, start=0, count=<optimized out>) at /home/marxin/Programming/gcc/gcc/bitmap.c:1577
#8  0x000000000090926f in df_mir_alloc (all_blocks=<optimized out>) at /home/marxin/Programming/gcc/gcc/df-problems.c:1921
#9  0x0000000000901ef6 in df_analyze_problem (dflow=0x3ce62b0, blocks_to_consider=0x445a888, postorder=0x4a821208, n_blocks=56609) at /home/marxin/Programming/gcc/gcc/df-core.c:1162
#10 0x0000000000902e42 in df_analyze_1 () at /home/marxin/Programming/gcc/gcc/df-core.c:1228
#11 0x00000000017bb663 in find_and_remove_re () at /home/marxin/Programming/gcc/gcc/ree.c:1290
#12 rest_of_handle_ree () at /home/marxin/Programming/gcc/gcc/ree.c:1384
#13 (anonymous namespace)::pass_ree::execute (this=<optimized out>) at /home/marxin/Programming/gcc/gcc/ree.c:1412
#14 0x0000000000c62a38 in execute_one_pass (pass=0x24e5d90) at /home/marxin/Programming/gcc/gcc/passes.c:2567
#15 0x0000000000c63413 in execute_pass_list_1 (pass=0x24e5d90) at /home/marxin/Programming/gcc/gcc/passes.c:2656
#16 0x0000000000c63425 in execute_pass_list_1 (pass=0x24e5c10) at /home/marxin/Programming/gcc/gcc/passes.c:2657
#17 0x0000000000c63425 in execute_pass_list_1 (pass=0x24e4810) at /home/marxin/Programming/gcc/gcc/passes.c:2657
#18 0x0000000000c63456 in execute_pass_list (fn=0x7ffff6b457e8, pass=<optimized out>) at /home/marxin/Programming/gcc/gcc/passes.c:2667
#19 0x00000000008e2235 in cgraph_node::expand (this=0x7ffff71c8990) at /home/marxin/Programming/gcc/gcc/context.h:48
#20 0x00000000008e389f in expand_all_functions () at /home/marxin/Programming/gcc/gcc/cgraphunit.c:1995
#21 symbol_table::compile (this=<optimized out>) at /home/marxin/Programming/gcc/gcc/cgraphunit.c:2359
#22 symbol_table::compile (this=<optimized out>) at /home/marxin/Programming/gcc/gcc/cgraphunit.c:2270
#23 0x000000000082d575 in lto_main () at /home/marxin/Programming/gcc/gcc/lto/lto.c:653
#24 0x0000000000d3b83e in compile_file () at /home/marxin/Programming/gcc/gcc/toplev.c:457
#25 0x0000000000801a80 in do_compile () at /home/marxin/Programming/gcc/gcc/toplev.c:2193
#26 toplev::main (this=this@entry=0x7fffffffdd4e, argc=<optimized out>, argc@entry=20, argv=<optimized out>, argv@entry=0x7fffffffde58) at /home/marxin/Programming/gcc/gcc/toplev.c:2332
#27 0x0000000000805c35 in main (argc=20, argv=0x7fffffffde58) at /home/marxin/Programming/gcc/gcc/main.c:39

Does it help?

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [Bug rtl-optimization/98863] WRF with LTO consumes a lot of memory in split2 pass
  2021-01-28 11:19 [Bug rtl-optimization/98863] New: WRF with LTO consumes a lot of memory in split2 pass marxin at gcc dot gnu.org
                   ` (6 preceding siblings ...)
  2021-01-28 15:09 ` marxin at gcc dot gnu.org
@ 2021-01-28 15:15 ` rguenth at gcc dot gnu.org
  2021-01-28 15:17 ` rguenth at gcc dot gnu.org
                   ` (41 subsequent siblings)
  49 siblings, 0 replies; 51+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-01-28 15:15 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98863

--- Comment #8 from Richard Biener <rguenth at gcc dot gnu.org> ---
So -fno-ree doesn't help (just figured it might be given the DF numbers).  But
confirmed:

> /usr/bin/time /home/rguenther/install/gcc-11.0/usr/local/bin/../lib64/gcc/../../lib/gcc/x86_64-pc-linux-gnu/11.0.0/lto1 -quiet -dumpbase ./wrf_r.ltrans34.ltrans -march=znver2 -g0 -Ofast -Ofast -version -fno-openacc -fno-pie -fcf-protection=none -fno-openmp -ftime-report -fltrans @./wrf_r.ltrans34.ltrans.args.0 -o ./wrf_r.ltrans34.ltrans.s
GNU GIMPLE (GCC) version 11.0.0 20210128 (experimental) (x86_64-pc-linux-gnu)
        compiled by GNU C version 11.0.0 20210128 (experimental), GMP version
6.1.2, MPFR version 4.0.1, MPC version 1.1.0, isl version isl-0.18-GMP

GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
GNU GIMPLE (GCC) version 11.0.0 20210128 (experimental) (x86_64-pc-linux-gnu)
        compiled by GNU C version 11.0.0 20210128 (experimental), GMP version
6.1.2, MPFR version 4.0.1, MPC version 1.1.0, isl version isl-0.18-GMP

GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
 df reaching defs                   :  26.41 (  4%)   1.49 ( 16%)  28.07 (  4%)     0  (  0%)
 df live regs                       :  81.46 ( 12%)   0.12 (  1%)  81.73 ( 12%)     0  (  0%)
 df live&initialized regs           :  83.78 ( 13%)   0.06 (  1%)  83.77 ( 13%)     0  (  0%)
...
 PRE                                : 214.60 ( 33%)   1.35 ( 15%) 216.04 ( 32%)  2619k (  0%)
...
 LRA create live ranges             :  30.87 (  5%)   0.00 (  0%)  30.85 (  5%)  4168k (  0%)
...
 TOTAL                              : 657.16          9.30        666.82        2576M
657.16user 9.35system 11:06.87elapsed 99%CPU (0avgtext+0avgdata 25834184maxresident)k
0inputs+21088outputs (0major+11450874minor)pagefaults 0swaps

but there isn't really anything in the mem-report that explains the 25GB
max-rss.  Some int overflows might result in spectacular (but unused)
mallocs but then those shouldn't show up in resident size.

Need to rebuild GCC with dwarf4 to be able to leak-check with valgrind (will
need the whole night I guess ;))

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [Bug rtl-optimization/98863] WRF with LTO consumes a lot of memory in split2 pass
  2021-01-28 11:19 [Bug rtl-optimization/98863] New: WRF with LTO consumes a lot of memory in split2 pass marxin at gcc dot gnu.org
                   ` (7 preceding siblings ...)
  2021-01-28 15:15 ` rguenth at gcc dot gnu.org
@ 2021-01-28 15:17 ` rguenth at gcc dot gnu.org
  2021-01-28 15:18 ` rguenth at gcc dot gnu.org
                   ` (40 subsequent siblings)
  49 siblings, 0 replies; 51+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-01-28 15:17 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98863

--- Comment #9 from Richard Biener <rguenth at gcc dot gnu.org> ---
Ah, I guess -fno-ree on the lto1 command-line gets ignored :/  So yeah, there
are known issues with REE (PR80930 and PR98144).

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [Bug rtl-optimization/98863] WRF with LTO consumes a lot of memory in split2 pass
  2021-01-28 11:19 [Bug rtl-optimization/98863] New: WRF with LTO consumes a lot of memory in split2 pass marxin at gcc dot gnu.org
                   ` (8 preceding siblings ...)
  2021-01-28 15:17 ` rguenth at gcc dot gnu.org
@ 2021-01-28 15:18 ` rguenth at gcc dot gnu.org
  2021-01-28 15:30 ` rguenth at gcc dot gnu.org
                   ` (39 subsequent siblings)
  49 siblings, 0 replies; 51+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-01-28 15:18 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98863

--- Comment #10 from Richard Biener <rguenth at gcc dot gnu.org> ---
But I wonder why the mem-report doesn't show these?  They don't sum up to 20GB
for me.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [Bug rtl-optimization/98863] WRF with LTO consumes a lot of memory in split2 pass
  2021-01-28 11:19 [Bug rtl-optimization/98863] New: WRF with LTO consumes a lot of memory in split2 pass marxin at gcc dot gnu.org
                   ` (9 preceding siblings ...)
  2021-01-28 15:18 ` rguenth at gcc dot gnu.org
@ 2021-01-28 15:30 ` rguenth at gcc dot gnu.org
  2021-01-29  8:47 ` rguenth at gcc dot gnu.org
                   ` (38 subsequent siblings)
  49 siblings, 0 replies; 51+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-01-28 15:30 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98863

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Depends on|                            |80930

--- Comment #11 from Richard Biener <rguenth at gcc dot gnu.org> ---
Confirmed, -fno-ree improves it.  Peak is way down to 8GB which is still too
much of course.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80930
[Bug 80930] REE pass causes high memory usage via df_mir_alloc() with
ASAN+UBSAN turned on

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [Bug rtl-optimization/98863] WRF with LTO consumes a lot of memory in split2 pass
  2021-01-28 11:19 [Bug rtl-optimization/98863] New: WRF with LTO consumes a lot of memory in split2 pass marxin at gcc dot gnu.org
                   ` (10 preceding siblings ...)
  2021-01-28 15:30 ` rguenth at gcc dot gnu.org
@ 2021-01-29  8:47 ` rguenth at gcc dot gnu.org
  2021-01-29  9:03 ` rguenth at gcc dot gnu.org
                   ` (37 subsequent siblings)
  49 siblings, 0 replies; 51+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-01-29  8:47 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98863

--- Comment #12 from Richard Biener <rguenth at gcc dot gnu.org> ---
Wow, so the MIR problem starts with

static void
df_mir_alloc (bitmap all_blocks)
{   
...
  EXECUTE_IF_SET_IN_BITMAP (all_blocks, 0, bb_index, bi)
    {
...
          bitmap_set_range (&bb_info->in, 0, DF_REG_SIZE (df));
          bitmap_set_range (&bb_info->out, 0, DF_REG_SIZE (df));

I'll see if I can apply the same trick I applied to tree PRE when it
had its "maximum set".
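
A rough back-of-the-envelope sketch of why setting the full register range in
both sets of every block is so costly.  This is not GCC code: the helper names
are made up, and the element layout (128 bits covered per bitmap element, 40
bytes per element) is an assumption, the 40 being taken from the length=40
allocation visible in the backtrace in comment #7.

```c
#include <stdint.h>

/* Assumed element layout: each linked bitmap element covers 128 bits
   and occupies 40 bytes (hypothetical constants for this estimate).  */
enum { ELT_BITS = 128, ELT_BYTES = 40 };

/* Number of bitmap elements needed to set every bit in [0, n_regs).  */
static uint64_t
dense_elements (uint64_t n_regs)
{
  return (n_regs + ELT_BITS - 1) / ELT_BITS;
}

/* Bytes needed when both the IN and the OUT set of every basic block
   are initialised to the full range [0, n_regs).  */
static uint64_t
mir_dense_bytes (uint64_t n_blocks, uint64_t n_regs)
{
  return n_blocks * 2 * dense_elements (n_regs) * ELT_BYTES;
}
```

With n_blocks=56609 from the backtrace and 3549 elements per set (indx=3548 in
frame #5 is only a lower bound), mir_dense_bytes (56609, 3549 * 128) comes to
roughly 16GB under these assumptions, which is in the range of the observed
peak.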

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [Bug rtl-optimization/98863] WRF with LTO consumes a lot of memory in split2 pass
  2021-01-28 11:19 [Bug rtl-optimization/98863] New: WRF with LTO consumes a lot of memory in split2 pass marxin at gcc dot gnu.org
                   ` (11 preceding siblings ...)
  2021-01-29  8:47 ` rguenth at gcc dot gnu.org
@ 2021-01-29  9:03 ` rguenth at gcc dot gnu.org
  2021-01-29  9:47 ` rguenth at gcc dot gnu.org
                   ` (36 subsequent siblings)
  49 siblings, 0 replies; 51+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-01-29  9:03 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98863

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org      |rguenth at gcc dot gnu.org
             Status|NEW                         |ASSIGNED

--- Comment #13 from Richard Biener <rguenth at gcc dot gnu.org> ---
Mine.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [Bug rtl-optimization/98863] WRF with LTO consumes a lot of memory in split2 pass
  2021-01-28 11:19 [Bug rtl-optimization/98863] New: WRF with LTO consumes a lot of memory in split2 pass marxin at gcc dot gnu.org
                   ` (12 preceding siblings ...)
  2021-01-29  9:03 ` rguenth at gcc dot gnu.org
@ 2021-01-29  9:47 ` rguenth at gcc dot gnu.org
  2021-01-29  9:56 ` marxin at gcc dot gnu.org
                   ` (35 subsequent siblings)
  49 siblings, 0 replies; 51+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-01-29  9:47 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98863

--- Comment #14 from Richard Biener <rguenth at gcc dot gnu.org> ---
Created attachment 50079
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50079&action=edit
patch I am testing

Maybe you can produce updated mem-report with this patch?

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [Bug rtl-optimization/98863] WRF with LTO consumes a lot of memory in split2 pass
  2021-01-28 11:19 [Bug rtl-optimization/98863] New: WRF with LTO consumes a lot of memory in split2 pass marxin at gcc dot gnu.org
                   ` (13 preceding siblings ...)
  2021-01-29  9:47 ` rguenth at gcc dot gnu.org
@ 2021-01-29  9:56 ` marxin at gcc dot gnu.org
  2021-01-29 10:18 ` marxin at gcc dot gnu.org
                   ` (34 subsequent siblings)
  49 siblings, 0 replies; 51+ messages in thread
From: marxin at gcc dot gnu.org @ 2021-01-29  9:56 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98863

--- Comment #15 from Martin Liška <marxin at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #14)
> Created attachment 50079 [details]
> patch I am testing
> 
> Maybe you can produce updated mem-report with this patch?

I'm testing the patch with my plotting script.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [Bug rtl-optimization/98863] WRF with LTO consumes a lot of memory in split2 pass
  2021-01-28 11:19 [Bug rtl-optimization/98863] New: WRF with LTO consumes a lot of memory in split2 pass marxin at gcc dot gnu.org
                   ` (14 preceding siblings ...)
  2021-01-29  9:56 ` marxin at gcc dot gnu.org
@ 2021-01-29 10:18 ` marxin at gcc dot gnu.org
  2021-01-29 10:24 ` rguenther at suse dot de
                   ` (33 subsequent siblings)
  49 siblings, 0 replies; 51+ messages in thread
From: marxin at gcc dot gnu.org @ 2021-01-29 10:18 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98863

--- Comment #16 from Martin Liška <marxin at gcc dot gnu.org> ---
(In reply to Martin Liška from comment #15)
> (In reply to Richard Biener from comment #14)
> > Created attachment 50079 [details]
> > patch I am testing
> > 
> > Maybe you can produce updated mem-report with this patch?
> 
> I'm testing the patch with my plotting script.

And yes, it's fixed now:
https://gist.githubusercontent.com/marxin/223890df4d8d8e490b6b2918b77dacad/raw/cb7df4d0b7ef88dfa4ef8cdea0a82fbe6444e553/wrf-fixed.svg

Kudos!

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [Bug rtl-optimization/98863] WRF with LTO consumes a lot of memory in split2 pass
  2021-01-28 11:19 [Bug rtl-optimization/98863] New: WRF with LTO consumes a lot of memory in split2 pass marxin at gcc dot gnu.org
                   ` (15 preceding siblings ...)
  2021-01-29 10:18 ` marxin at gcc dot gnu.org
@ 2021-01-29 10:24 ` rguenther at suse dot de
  2021-01-29 11:03 ` rguenth at gcc dot gnu.org
                   ` (32 subsequent siblings)
  49 siblings, 0 replies; 51+ messages in thread
From: rguenther at suse dot de @ 2021-01-29 10:24 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98863

--- Comment #17 from rguenther at suse dot de <rguenther at suse dot de> ---
On Fri, 29 Jan 2021, marxin at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98863
> 
> --- Comment #16 from Martin Liška <marxin at gcc dot gnu.org> ---
> (In reply to Martin Liška from comment #15)
> > (In reply to Richard Biener from comment #14)
> > > Created attachment 50079 [details]
> > > patch I am testing
> > > 
> > > Maybe you can produce updated mem-report with this patch?
> > 
> > I'm testing the patch with my plotting script.
> 
> And yes, it's fixed now:
> https://gist.githubusercontent.com/marxin/223890df4d8d8e490b6b2918b77dacad/raw/cb7df4d0b7ef88dfa4ef8cdea0a82fbe6444e553/wrf-fixed.svg
> 
> Kudos!

Nice.  There are still short peaks, one up to 11GB; it would be interesting
to see which pass causes those as well (the smaller ones are likely
easier to analyze, and possibly caused by the same pass).

Now waiting for the patch to be approved.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [Bug rtl-optimization/98863] WRF with LTO consumes a lot of memory in split2 pass
  2021-01-28 11:19 [Bug rtl-optimization/98863] New: WRF with LTO consumes a lot of memory in split2 pass marxin at gcc dot gnu.org
                   ` (16 preceding siblings ...)
  2021-01-29 10:24 ` rguenther at suse dot de
@ 2021-01-29 11:03 ` rguenth at gcc dot gnu.org
  2021-01-29 11:03 ` [Bug rtl-optimization/98863] WRF with LTO consumes a lot of memory in REE pass rguenth at gcc dot gnu.org
                   ` (31 subsequent siblings)
  49 siblings, 0 replies; 51+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-01-29 11:03 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98863
Bug 98863 depends on bug 80930, which changed state.

Bug 80930 Summary: REE pass causes high memory usage via df_mir_alloc() with ASAN+UBSAN turned on
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80930

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |DUPLICATE

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [Bug rtl-optimization/98863] WRF with LTO consumes a lot of memory in REE pass
  2021-01-28 11:19 [Bug rtl-optimization/98863] New: WRF with LTO consumes a lot of memory in split2 pass marxin at gcc dot gnu.org
                   ` (17 preceding siblings ...)
  2021-01-29 11:03 ` rguenth at gcc dot gnu.org
@ 2021-01-29 11:03 ` rguenth at gcc dot gnu.org
  2021-01-29 11:38 ` marxin at gcc dot gnu.org
                   ` (30 subsequent siblings)
  49 siblings, 0 replies; 51+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-01-29 11:03 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98863

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|WRF with LTO consumes a lot |WRF with LTO consumes a lot
                   |of memory in split2 pass    |of memory in REE pass

--- Comment #18 from Richard Biener <rguenth at gcc dot gnu.org> ---
REE issue fixed on trunk, leaving open for eventual further analysis.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [Bug rtl-optimization/98863] WRF with LTO consumes a lot of memory in REE pass
  2021-01-28 11:19 [Bug rtl-optimization/98863] New: WRF with LTO consumes a lot of memory in split2 pass marxin at gcc dot gnu.org
                   ` (18 preceding siblings ...)
  2021-01-29 11:03 ` [Bug rtl-optimization/98863] WRF with LTO consumes a lot of memory in REE pass rguenth at gcc dot gnu.org
@ 2021-01-29 11:38 ` marxin at gcc dot gnu.org
  2021-01-29 12:13 ` rguenth at gcc dot gnu.org
                   ` (29 subsequent siblings)
  49 siblings, 0 replies; 51+ messages in thread
From: marxin at gcc dot gnu.org @ 2021-01-29 11:38 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98863

--- Comment #19 from Martin Liška <marxin at gcc dot gnu.org> ---
The other peak at ~12 GB is in:

679s: current pass = cprop (261)
{'ltrans': {'memory': 1.6539268493652344, 'cpu': 6.24375}}
{'ltrans': {'memory': 1.6539268493652344, 'cpu': 6.24375}}
{'ltrans': {'memory': 4.188739776611328, 'cpu': 6.24375}}
{'ltrans': {'memory': 4.480541229248047, 'cpu': 6.24375}}
{'ltrans': {'memory': 4.480541229248047, 'cpu': 6.1875}}
{'ltrans': {'memory': 4.480541229248047, 'cpu': 6.30625}}
{'ltrans': {'memory': 4.480541229248047, 'cpu': 6.24375}}
{'ltrans': {'memory': 4.480541229248047, 'cpu': 6.24375}}
...
{'ltrans': {'memory': 9.175521850585938, 'cpu': 6.24375}}
{'ltrans': {'memory': 9.6334228515625, 'cpu': 6.3}}
{'ltrans': {'memory': 9.737640380859375, 'cpu': 6.2375}}

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [Bug rtl-optimization/98863] WRF with LTO consumes a lot of memory in REE pass
  2021-01-28 11:19 [Bug rtl-optimization/98863] New: WRF with LTO consumes a lot of memory in split2 pass marxin at gcc dot gnu.org
                   ` (19 preceding siblings ...)
  2021-01-29 11:38 ` marxin at gcc dot gnu.org
@ 2021-01-29 12:13 ` rguenth at gcc dot gnu.org
  2021-01-29 12:50 ` [Bug rtl-optimization/98863] WRF with LTO consumes a lot of memory in REE, CPROP, PRE and LRA passes rguenth at gcc dot gnu.org
                   ` (28 subsequent siblings)
  49 siblings, 0 replies; 51+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-01-29 12:13 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98863

--- Comment #20 from Richard Biener <rguenth at gcc dot gnu.org> ---
OK, so that's guarded by gcse_or_cprop_is_too_expensive, which doesn't trigger,
but I see it allocating

      alloc_cprop_mem (last_basic_block_for_fn (cfun),
                       set_hash_table.n_elems);

which calls sbitmap_vector_alloc (n_blocks, n_sets) four times, consuming
n_blocks * n_sets / 2 bytes of memory in total.  gcse_or_cprop_is_too_expensive
uses n_basic_blocks_for_fn but stores the result in an int, which quite
likely overflows here, defeating the limit (eh).  Will fix / test.
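
A minimal sketch of how a memory estimate kept in a 32-bit int can defeat such
a limit.  This is not the GCC code; the function names, the block and set
counts, and the limit are all hypothetical.  The wraparound is emulated
explicitly because real signed overflow is undefined in C.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bytes for four bit vectors of n_blocks x n_sets bits each:
   4 * n_blocks * n_sets / 8 == n_blocks * n_sets / 2.  */
static uint64_t
cprop_bytes (uint64_t n_blocks, uint64_t n_sets)
{
  return n_blocks * n_sets / 2;
}

/* The value a 32-bit signed variable would typically end up holding:
   reduce modulo 2^32 and reinterpret as two's complement.  */
static int64_t
as_wrapped_int32 (uint64_t value)
{
  uint32_t low = (uint32_t) value;
  return low < 0x80000000u ? (int64_t) low : (int64_t) low - 4294967296LL;
}

/* The guard itself: reject the pass when the estimate exceeds the limit.  */
static bool
too_expensive (int64_t estimate, int64_t limit)
{
  return estimate > limit;
}
```

For example, with 60000 blocks and 200000 sets the real requirement is
6000000000 bytes, but the wrapped 32-bit value is 1705032704, so a limit of
2000000000 bytes triggers only when the estimate is kept in a 64-bit type.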

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [Bug rtl-optimization/98863] WRF with LTO consumes a lot of memory in REE, CPROP, PRE and LRA passes
  2021-01-28 11:19 [Bug rtl-optimization/98863] New: WRF with LTO consumes a lot of memory in split2 pass marxin at gcc dot gnu.org
                   ` (20 preceding siblings ...)
  2021-01-29 12:13 ` rguenth at gcc dot gnu.org
@ 2021-01-29 12:50 ` rguenth at gcc dot gnu.org
  2021-01-29 12:57 ` marxin at gcc dot gnu.org
                   ` (27 subsequent siblings)
  49 siblings, 0 replies; 51+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-01-29 12:50 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98863

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|missed-optimization         |memory-hog, ra
            Summary|WRF with LTO consumes a lot |WRF with LTO consumes a lot
                   |of memory in REE pass       |of memory in REE, CPROP,
                   |                            |PRE and LRA passes

--- Comment #21 from Richard Biener <rguenth at gcc dot gnu.org> ---
With that fixed ltrans34 runs into a 7GB peak in LRA.  It does use lra_simple_p
but that alone doesn't help.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [Bug rtl-optimization/98863] WRF with LTO consumes a lot of memory in REE, CPROP, PRE and LRA passes
  2021-01-28 11:19 [Bug rtl-optimization/98863] New: WRF with LTO consumes a lot of memory in split2 pass marxin at gcc dot gnu.org
                   ` (21 preceding siblings ...)
  2021-01-29 12:50 ` [Bug rtl-optimization/98863] WRF with LTO consumes a lot of memory in REE, CPROP, PRE and LRA passes rguenth at gcc dot gnu.org
@ 2021-01-29 12:57 ` marxin at gcc dot gnu.org
  2021-01-29 13:01 ` cvs-commit at gcc dot gnu.org
                   ` (26 subsequent siblings)
  49 siblings, 0 replies; 51+ messages in thread
From: marxin at gcc dot gnu.org @ 2021-01-29 12:57 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98863

--- Comment #22 from Martin Liška <marxin at gcc dot gnu.org> ---
There are other smaller spikes I can see:

663s: current pass = cse1 (259)
{'ltrans': {'memory': 1.5744667053222656, 'cpu': 6.24375}}
{'ltrans': {'memory': 1.5744667053222656, 'cpu': 6.24375}}
{'ltrans': {'memory': 2.4257125854492188, 'cpu': 6.30625}}
{'ltrans': {'memory': 3.3943252563476562, 'cpu': 6.24375}}
{'ltrans': {'memory': 4.317256927490234, 'cpu': 6.24375}}
{'ltrans': {'memory': 5.229209899902344, 'cpu': 6.24375}}
{'ltrans': {'memory': 6.062725067138672, 'cpu': 6.24375}}
{'ltrans': {'memory': 6.596076965332031, 'cpu': 6.24375}}
{'ltrans': {'memory': 6.596076965332031, 'cpu': 6.24375}}
{'ltrans': {'memory': 6.596076965332031, 'cpu': 6.30625}}
...
1020s: current pass = cprop (277)
{'ltrans': {'memory': 1.7144355773925781, 'cpu': 6.24375}}
{'ltrans': {'memory': 1.7144355773925781, 'cpu': 6.24375}}
{'ltrans': {'memory': 1.7144355773925781, 'cpu': 6.24375}}
{'ltrans': {'memory': 1.7144355773925781, 'cpu': 6.24375}}
{'ltrans': {'memory': 1.7144355773925781, 'cpu': 6.30625}}
{'ltrans': {'memory': 2.1491661071777344, 'cpu': 6.24375}}
{'ltrans': {'memory': 4.545764923095703, 'cpu': 6.24375}}
{'ltrans': {'memory': 5.894496917724609, 'cpu': 6.24375}}
{'ltrans': {'memory': 6.246974945068359, 'cpu': 6.30625}}
{'ltrans': {'memory': 6.279705047607422, 'cpu': 6.1875}}
...
1055s: current pass = dse1 (280)
{'ltrans': {'memory': 1.7354316711425781, 'cpu': 6.3125}}
{'ltrans': {'memory': 1.7354316711425781, 'cpu': 6.24375}}
{'ltrans': {'memory': 1.7354316711425781, 'cpu': 6.25}}
{'ltrans': {'memory': 1.7354316711425781, 'cpu': 6.25}}
{'ltrans': {'memory': 1.7354316711425781, 'cpu': 6.25}}
{'ltrans': {'memory': 1.9871711730957031, 'cpu': 6.25}}
{'ltrans': {'memory': 2.9451065063476562, 'cpu': 6.25}}
{'ltrans': {'memory': 3.904376983642578, 'cpu': 6.1875}}
{'ltrans': {'memory': 4.7339630126953125, 'cpu': 6.30625}}
{'ltrans': {'memory': 5.655551910400391, 'cpu': 6.1875}}
{'ltrans': {'memory': 6.43743896484375, 'cpu': 6.3125}}
{'ltrans': {'memory': 6.7246246337890625, 'cpu': 6.25}}
{'ltrans': {'memory': 6.7246246337890625, 'cpu': 6.25}}
{'ltrans': {'memory': 6.488391876220703, 'cpu': 6.24375}}
...
1096s: current pass = combine (285)
{'ltrans': {'memory': 1.7870025634765625, 'cpu': 6.24375}}
 {GC released 6144k madv_dontneed 2460k} {GC 421M -> 294M}
{'ltrans': {'memory': 1.7791824340820312, 'cpu': 6.30625}}
{'ltrans': {'memory': 1.7791824340820312, 'cpu': 6.1875}}
{'ltrans': {'memory': 1.7791824340820312, 'cpu': 6.30625}}
{'ltrans': {'memory': 1.7791824340820312, 'cpu': 6.25}}
{'ltrans': {'memory': 1.7791824340820312, 'cpu': 6.24375}}
{'ltrans': {'memory': 1.7791824340820312, 'cpu': 6.24375}}
{'ltrans': {'memory': 3.4982643127441406, 'cpu': 6.24375}}
{'ltrans': {'memory': 5.336437225341797, 'cpu': 6.18125}}
{'ltrans': {'memory': 5.823863983154297, 'cpu': 6.30625}}
{'ltrans': {'memory': 5.903423309326172, 'cpu': 6.24375}}
...

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [Bug rtl-optimization/98863] WRF with LTO consumes a lot of memory in REE, CPROP, PRE and LRA passes
  2021-01-28 11:19 [Bug rtl-optimization/98863] New: WRF with LTO consumes a lot of memory in split2 pass marxin at gcc dot gnu.org
                   ` (22 preceding siblings ...)
  2021-01-29 12:57 ` marxin at gcc dot gnu.org
@ 2021-01-29 13:01 ` cvs-commit at gcc dot gnu.org
  2021-01-29 13:23 ` rguenth at gcc dot gnu.org
                   ` (25 subsequent siblings)
  49 siblings, 0 replies; 51+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2021-01-29 13:01 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98863

--- Comment #23 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:cb52e59e33845152cef6f9042a142a246e9447f6

commit r11-6973-gcb52e59e33845152cef6f9042a142a246e9447f6
Author: Richard Biener <rguenther@suse.de>
Date:   Fri Jan 29 13:25:49 2021 +0100

    rtl-optimization/98863 - fix PRE/CPROP memory usage check

    This fixes overflow of the memory usage estimate in turn failing
    to disable itself on WRF with LTO, causing a few GBs worth of
    memory peak.

    2021-01-29  Richard Biener  <rguenther@suse.de>

            PR rtl-optimization/98863
            * gcse.c (gcse_or_cprop_is_too_expensive): Use unsigned
            HOST_WIDE_INT for the memory estimate.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [Bug rtl-optimization/98863] WRF with LTO consumes a lot of memory in REE, CPROP, PRE and LRA passes
  2021-01-28 11:19 [Bug rtl-optimization/98863] New: WRF with LTO consumes a lot of memory in split2 pass marxin at gcc dot gnu.org
                   ` (23 preceding siblings ...)
  2021-01-29 13:01 ` cvs-commit at gcc dot gnu.org
@ 2021-01-29 13:23 ` rguenth at gcc dot gnu.org
  2021-01-29 13:37 ` marxin at gcc dot gnu.org
                   ` (24 subsequent siblings)
  49 siblings, 0 replies; 51+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-01-29 13:23 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98863

--- Comment #24 from Richard Biener <rguenth at gcc dot gnu.org> ---
Created attachment 50087
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50087&action=edit
updated time and memory report

This is updated -f{time,mem}-report (with --enable-gather-detailed-mem-stats)
for ltrans34.o

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [Bug rtl-optimization/98863] WRF with LTO consumes a lot of memory in REE, CPROP, PRE and LRA passes
  2021-01-28 11:19 [Bug rtl-optimization/98863] New: WRF with LTO consumes a lot of memory in split2 pass marxin at gcc dot gnu.org
                   ` (24 preceding siblings ...)
  2021-01-29 13:23 ` rguenth at gcc dot gnu.org
@ 2021-01-29 13:37 ` marxin at gcc dot gnu.org
  2021-01-29 14:39 ` rguenth at gcc dot gnu.org
                   ` (23 subsequent siblings)
  49 siblings, 0 replies; 51+ messages in thread
From: marxin at gcc dot gnu.org @ 2021-01-29 13:37 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98863

Martin Liška <marxin at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #50072|0                           |1
        is obsolete|                            |

--- Comment #25 from Martin Liška <marxin at gcc dot gnu.org> ---
Created attachment 50088
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50088&action=edit
Current CPU/memory usage for master

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [Bug rtl-optimization/98863] WRF with LTO consumes a lot of memory in REE, CPROP, PRE and LRA passes
  2021-01-28 11:19 [Bug rtl-optimization/98863] New: WRF with LTO consumes a lot of memory in split2 pass marxin at gcc dot gnu.org
                   ` (25 preceding siblings ...)
  2021-01-29 13:37 ` marxin at gcc dot gnu.org
@ 2021-01-29 14:39 ` rguenth at gcc dot gnu.org
  2021-01-29 15:01 ` rguenth at gcc dot gnu.org
                   ` (22 subsequent siblings)
  49 siblings, 0 replies; 51+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-01-29 14:39 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98863

--- Comment #26 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #24)
> Created attachment 50087 [details]
> updated time and memory report
> 
> This is updated -f{time,mem}-report (with
> --enable-gather-detailed-mem-stats) for ltrans34.o

And the obvious candidate is the RD DF problem where DF-chain depends on RD.
Explicit RD users are haifa (sched_init (), which also adds DF-chain),
WEB, REE, loop IV and invariant and DCE.

df-problems.c:228 (df_rd_alloc)                       1276M:  7.3%     1280M      33M:  1.6%          0           0       heap
df-problems.c:509 (df_rd_transfer_function)           6709M: 38.1%     6709M     329M: 15.6%          0           0       heap
df-problems.c:227 (df_rd_alloc)                       7992M: 45.4%     7992M     200M:  9.5%          0           0       heap

but it's odd since in particular IRA does not seem to use DF_CHAIN/DF_RD
and at the start of ira() we do have 7GB RSS (for quite some time already,
somewhere in loop opt).

So the first peak to 7GB is fwprop_init, after add_phi_nodes.  Memory
is released again after fwprop but since fwprop runs at -O1 it should
likely limit itself somehow.  Richard?

The next pass is (meh), STV, which uses df_chain.

Then we have another fwprop instance.

Then remove_partial_avx_dependency and STV again, then if-conversion
but that doesn't use chain or rd - so something from before must leak,
most likely one of the i386 specific passes after combine.

Looks like remove_partial_avx_dependency is the culprit, leaving garbage
around and for whatever stupid reason adds a plethora of DF problems...

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [Bug rtl-optimization/98863] WRF with LTO consumes a lot of memory in REE, CPROP, PRE and LRA passes
  2021-01-28 11:19 [Bug rtl-optimization/98863] New: WRF with LTO consumes a lot of memory in split2 pass marxin at gcc dot gnu.org
                   ` (26 preceding siblings ...)
  2021-01-29 14:39 ` rguenth at gcc dot gnu.org
@ 2021-01-29 15:01 ` rguenth at gcc dot gnu.org
  2021-01-29 15:56 ` hjl.tools at gmail dot com
                   ` (21 subsequent siblings)
  49 siblings, 0 replies; 51+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-01-29 15:01 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98863

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hjl.tools at gmail dot com,
                   |                            |jakub at gcc dot gnu.org,
                   |                            |rsandifo at gcc dot gnu.org

--- Comment #27 from Richard Biener <rguenth at gcc dot gnu.org> ---
HJ, you added remove_partial_avx_dependency - it adds loads of DF problems
but during its execution it does not seem to use anything besides
df_insn_rescan.  I'm not too familiar with DF so I assume calling
df_analyze is necessary to mark insns for rescan (I'd assume _not_ calling
df_analyze but only setting deferred-rescan and not processing deferred
rescans in a pass not needing DF at all should work?!  Or maybe deferred
rescan is even on by default in that case).

Besides this the pass ups memory use from 2GB to 7GB and memory use doesn't
drop so there's something fishy going on here.

I'm testing the obvious removal of

              df_chain_add_problem (DF_DU_CHAIN | DF_UD_CHAIN);
              df_md_add_problem ();

which was likely copied from STV (where I removed adding MD), but STV
_does_ at least use DF and exercises use/def chains.

I'm also not sure altering the CFG is the very best thing to do when
DF is active (and doing that with MD or DU/UD might cause interesting
unknown issues I guess).  At least with not adding MD or DU/UD_CHAIN
the "leak" is gone.  Richard or Jakub may know more here, esp. whether
we can elide the df_analyze completely (I hope we can!).  Even
DF_LIVE and LR alone are a major hog on this testcase (but at least
stay within 2GB of memory while RD tops at 7GB).

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [Bug rtl-optimization/98863] WRF with LTO consumes a lot of memory in REE, CPROP, PRE and LRA passes
  2021-01-28 11:19 [Bug rtl-optimization/98863] New: WRF with LTO consumes a lot of memory in split2 pass marxin at gcc dot gnu.org
                   ` (27 preceding siblings ...)
  2021-01-29 15:01 ` rguenth at gcc dot gnu.org
@ 2021-01-29 15:56 ` hjl.tools at gmail dot com
  2021-01-29 16:32 ` cvs-commit at gcc dot gnu.org
                   ` (20 subsequent siblings)
  49 siblings, 0 replies; 51+ messages in thread
From: hjl.tools at gmail dot com @ 2021-01-29 15:56 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98863

--- Comment #28 from H.J. Lu <hjl.tools at gmail dot com> ---
(In reply to Richard Biener from comment #27)
> HJ, you added remove_partial_avx_dependency - it adds loads of DF problems
> but during its execution it does not seem to use anything besides
> df_insn_rescan.  I'm not too familiar with DF so I assume calling
> df_analyze is necessary to mark insns for rescan (I'd assume _not_ calling
> df_analyze but only setting deferred-rescan and not processing deferred
> rescans in a pass not needing DF at all should work?!  Or maybe deferred
> rescan is even on by default in that case).
> 
> Besides this the pass ups memory use from 2GB to 7GB and memory use doesn't
> drop so there's something fishy going on here.
> 
> I'm testing the obvious removal of
> 
>               df_chain_add_problem (DF_DU_CHAIN | DF_UD_CHAIN);
>               df_md_add_problem ();
> 
> which was likely copied from STV (where I removed adding MD), but STV
> _does_ at least use DF and exercises use/def chains.
> 
> I'm also not sure altering the CFG is the very best thing to do when
> DF is active (and doing that with MD or DU/UD might cause interesting
> unknown issues I guess).  At least with not adding MD or DU/UD_CHAIN
> the "leak" is gone.  Richard or Jakub may know more here, esp. whether
> we can elide the df_analyze completely (I hope we can!).  Even
> DF_LIVE and LR alone are a major hog on this testcase (but at least
> stay within 2GB of memory while RD tops at 7GB).

If it isn't used, please remove it.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [Bug rtl-optimization/98863] WRF with LTO consumes a lot of memory in REE, CPROP, PRE and LRA passes
  2021-01-28 11:19 [Bug rtl-optimization/98863] New: WRF with LTO consumes a lot of memory in split2 pass marxin at gcc dot gnu.org
                   ` (28 preceding siblings ...)
  2021-01-29 15:56 ` hjl.tools at gmail dot com
@ 2021-01-29 16:32 ` cvs-commit at gcc dot gnu.org
  2021-02-01  8:22 ` cvs-commit at gcc dot gnu.org
                   ` (19 subsequent siblings)
  49 siblings, 0 replies; 51+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2021-01-29 16:32 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98863

--- Comment #29 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:a7f52181a6a16bb6d216ff41d9c6a9da95c19b5c

commit r11-6981-ga7f52181a6a16bb6d216ff41d9c6a9da95c19b5c
Author: Richard Biener <rguenther@suse.de>
Date:   Fri Jan 29 16:02:36 2021 +0100

    rtl-optimization/98863 - tame i386 specific RPAD pass

    This removes analyzing DF with expensive problems which we do not
    use at all and which somehow cause 5GB of memory to leak.  Instead
    just do a defered rescan of added insns.

    2021-01-29  Richard Biener  <rguenther@suse.de>

            PR rtl-optimization/98863
            * config/i386/i386-features.c (remove_partial_avx_dependency):
            Do not perform DF analysis.
            (pass_data_remove_partial_avx_dependency): Remove
            TODO_df_finish.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [Bug rtl-optimization/98863] WRF with LTO consumes a lot of memory in REE, CPROP, PRE and LRA passes
  2021-01-28 11:19 [Bug rtl-optimization/98863] New: WRF with LTO consumes a lot of memory in split2 pass marxin at gcc dot gnu.org
                   ` (29 preceding siblings ...)
  2021-01-29 16:32 ` cvs-commit at gcc dot gnu.org
@ 2021-02-01  8:22 ` cvs-commit at gcc dot gnu.org
  2021-02-01  9:31 ` marxin at gcc dot gnu.org
                   ` (18 subsequent siblings)
  49 siblings, 0 replies; 51+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2021-02-01  8:22 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98863

--- Comment #30 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:972918eea873f8b1663151316c4b3aee7ae028e2

commit r11-7005-g972918eea873f8b1663151316c4b3aee7ae028e2
Author: Richard Biener <rguenther@suse.de>
Date:   Mon Feb 1 09:18:43 2021 +0100

    rtl-optimization/98863 - prune RD with LIVE in STV

    This sets DF_RD_PRUNE_DEAD_DEFS like all other uses of the UD/DU
    chain problems which makes the RD problem consume a lot less memory.

    2021-02-01  Richard Biener  <rguenther@suse.de>

            PR rtl-optimization/98863
            * config/i386/i386-features.c (convert_scalars_to_vector):
            Set DF_RD_PRUNE_DEAD_DEFS.
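
A toy model of why pruning reaching definitions with liveness shrinks the
sets (a sketch under stated assumptions: the layout and the function name
prune_dead_defs are hypothetical and are not GCC's DF structures).  Each
definition belongs to one register, and a definition is dropped from the
reaching set when its register is not live.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* REACHING has bit D set if definition D reaches this point.
   DEF_REG[D] is the register written by definition D (must be < 64).
   LIVE_REGS has bit R set if register R is live here.
   Returns the reaching set with definitions of dead registers removed.  */
uint64_t
prune_dead_defs (uint64_t reaching, const unsigned *def_reg,
                 size_t n_defs, uint64_t live_regs)
{
  uint64_t kept = 0;

  assert (n_defs <= 64);
  for (size_t d = 0; d < n_defs; d++)
    {
      assert (def_reg[d] < 64);
      if (((reaching >> d) & 1) && ((live_regs >> def_reg[d]) & 1))
        kept |= (uint64_t) 1 << d;
    }
  return kept;
}
```

With many registers that are defined but dead across large regions, most
bits fall away, which is the effect the commit message describes for the
RD problem.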

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [Bug rtl-optimization/98863] WRF with LTO consumes a lot of memory in REE, CPROP, PRE and LRA passes
  2021-01-28 11:19 [Bug rtl-optimization/98863] New: WRF with LTO consumes a lot of memory in split2 pass marxin at gcc dot gnu.org
                   ` (30 preceding siblings ...)
  2021-02-01  8:22 ` cvs-commit at gcc dot gnu.org
@ 2021-02-01  9:31 ` marxin at gcc dot gnu.org
  2021-02-01 10:41 ` rguenth at gcc dot gnu.org
                   ` (17 subsequent siblings)
  49 siblings, 0 replies; 51+ messages in thread
From: marxin at gcc dot gnu.org @ 2021-02-01  9:31 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98863

Martin Liška <marxin at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #50088|0                           |1
        is obsolete|                            |

--- Comment #31 from Martin Liška <marxin at gcc dot gnu.org> ---
Created attachment 50101
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50101&action=edit
Current CPU/memory usage for master

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [Bug rtl-optimization/98863] WRF with LTO consumes a lot of memory in REE, CPROP, PRE and LRA passes
  2021-01-28 11:19 [Bug rtl-optimization/98863] New: WRF with LTO consumes a lot of memory in split2 pass marxin at gcc dot gnu.org
                   ` (31 preceding siblings ...)
  2021-02-01  9:31 ` marxin at gcc dot gnu.org
@ 2021-02-01 10:41 ` rguenth at gcc dot gnu.org
  2021-02-01 12:07 ` [Bug rtl-optimization/98863] WRF with LTO consumes a lot of memory in REE, FWPROP and x86 specific passes rguenth at gcc dot gnu.org
                   ` (16 subsequent siblings)
  49 siblings, 0 replies; 51+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-02-01 10:41 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98863

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #50087|0                           |1
        is obsolete|                            |

--- Comment #32 from Richard Biener <rguenth at gcc dot gnu.org> ---
Created attachment 50104
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50104&action=edit
updated time and memory report

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [Bug rtl-optimization/98863] WRF with LTO consumes a lot of memory in REE, FWPROP and x86 specific passes
  2021-01-28 11:19 [Bug rtl-optimization/98863] New: WRF with LTO consumes a lot of memory in split2 pass marxin at gcc dot gnu.org
                   ` (32 preceding siblings ...)
  2021-02-01 10:41 ` rguenth at gcc dot gnu.org
@ 2021-02-01 12:07 ` rguenth at gcc dot gnu.org
  2021-02-01 12:30 ` rsandifo at gcc dot gnu.org
                   ` (15 subsequent siblings)
  49 siblings, 0 replies; 51+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-02-01 12:07 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98863

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|                            |x86_64-*-*
            Summary|WRF with LTO consumes a lot |WRF with LTO consumes a lot
                   |of memory in REE, CPROP,    |of memory in REE, FWPROP
                   |PRE and LRA passes          |and x86 specific passes

--- Comment #33 from Richard Biener <rguenth at gcc dot gnu.org> ---
So the remaining two 7GB peaks are fwprop SSA build consuming 5GB over the
otherwise ~2GB peak RSS.  Unfortunately it escapes -fmem-report and seems
to implement its own LIVE dataflow problem rather than using the LIVE from DF
and/or its USE/DEF chains.

I'm not sure how much of a regression this is (the old fwprop implementation
is gone and I've not yet backported the various memory fixes though probably
will do so).  Old fwprop used MD and built its own single def-use links as
well.

Richard, I think we need to do something about this for GCC 11.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [Bug rtl-optimization/98863] WRF with LTO consumes a lot of memory in REE, FWPROP and x86 specific passes
  2021-01-28 11:19 [Bug rtl-optimization/98863] New: WRF with LTO consumes a lot of memory in split2 pass marxin at gcc dot gnu.org
                   ` (33 preceding siblings ...)
  2021-02-01 12:07 ` [Bug rtl-optimization/98863] WRF with LTO consumes a lot of memory in REE, FWPROP and x86 specific passes rguenth at gcc dot gnu.org
@ 2021-02-01 12:30 ` rsandifo at gcc dot gnu.org
  2021-02-01 12:32 ` rguenth at gcc dot gnu.org
                   ` (14 subsequent siblings)
  49 siblings, 0 replies; 51+ messages in thread
From: rsandifo at gcc dot gnu.org @ 2021-02-01 12:30 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98863

rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|rguenth at gcc dot gnu.org         |rsandifo at gcc dot gnu.org

--- Comment #34 from rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> ---
Taking for the remaining fwprop work.  Hope to get to it later
in the week.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [Bug rtl-optimization/98863] WRF with LTO consumes a lot of memory in REE, FWPROP and x86 specific passes
  2021-01-28 11:19 [Bug rtl-optimization/98863] New: WRF with LTO consumes a lot of memory in split2 pass marxin at gcc dot gnu.org
                   ` (34 preceding siblings ...)
  2021-02-01 12:30 ` rsandifo at gcc dot gnu.org
@ 2021-02-01 12:32 ` rguenth at gcc dot gnu.org
  2021-02-01 12:45 ` [Bug rtl-optimization/98863] [11 Regression] " rguenth at gcc dot gnu.org
                   ` (13 subsequent siblings)
  49 siblings, 0 replies; 51+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-02-01 12:32 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98863

--- Comment #35 from Richard Biener <rguenth at gcc dot gnu.org> ---
So on the gcc-10 branch with 4311ae206da1, 5ea7be5068f17, 99c96e797d9662c8,
99dafea9bfebb and 327ec3ea29b58 cherry-picked we top out at 2.5GB and
RTL forwprop is not measurably increasing the RSS footprint.

This makes it a GCC 11 regression and given that fwprop is enabled at -O1
(and I guess we can't really disable it there) we have to care about
the "insanely large" testcases.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [Bug rtl-optimization/98863] [11 Regression] WRF with LTO consumes a lot of memory in REE, FWPROP and x86 specific passes
  2021-01-28 11:19 [Bug rtl-optimization/98863] New: WRF with LTO consumes a lot of memory in split2 pass marxin at gcc dot gnu.org
                   ` (35 preceding siblings ...)
  2021-02-01 12:32 ` rguenth at gcc dot gnu.org
@ 2021-02-01 12:45 ` rguenth at gcc dot gnu.org
  2021-02-01 13:35 ` cvs-commit at gcc dot gnu.org
                   ` (12 subsequent siblings)
  49 siblings, 0 replies; 51+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-02-01 12:45 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98863

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |11.0
            Summary|WRF with LTO consumes a lot |[11 Regression] WRF with
                   |of memory in REE, FWPROP    |LTO consumes a lot of
                   |and x86 specific passes     |memory in REE, FWPROP and
                   |                            |x86 specific passes

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [Bug rtl-optimization/98863] [11 Regression] WRF with LTO consumes a lot of memory in REE, FWPROP and x86 specific passes
  2021-01-28 11:19 [Bug rtl-optimization/98863] New: WRF with LTO consumes a lot of memory in split2 pass marxin at gcc dot gnu.org
                   ` (36 preceding siblings ...)
  2021-02-01 12:45 ` [Bug rtl-optimization/98863] [11 Regression] " rguenth at gcc dot gnu.org
@ 2021-02-01 13:35 ` cvs-commit at gcc dot gnu.org
  2021-02-03 10:27 ` cvs-commit at gcc dot gnu.org
                   ` (11 subsequent siblings)
  49 siblings, 0 replies; 51+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2021-02-01 13:35 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98863

--- Comment #36 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The releases/gcc-10 branch has been updated by Richard Biener
<rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:4311ae206da13a9bfdb4245feb400dbee0f528a0

commit r10-9330-g4311ae206da13a9bfdb4245feb400dbee0f528a0
Author: Richard Biener <rguenther@suse.de>
Date:   Fri Jan 29 13:25:49 2021 +0100

    rtl-optimization/98863 - fix PRE/CPROP memory usage check

    This fixes overflow of the memory usage estimate in turn failing
    to disable itself on WRF with LTO, causing a few GBs worth of
    memory peak.

    2021-01-29  Richard Biener  <rguenther@suse.de>

            PR rtl-optimization/98863
            * gcse.c (gcse_or_cprop_is_too_expensive): Use unsigned
            HOST_WIDE_INT for the memory estimate.

    (cherry picked from commit cb52e59e33845152cef6f9042a142a246e9447f6)

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [Bug rtl-optimization/98863] [11 Regression] WRF with LTO consumes a lot of memory in REE, FWPROP and x86 specific passes
  2021-01-28 11:19 [Bug rtl-optimization/98863] New: WRF with LTO consumes a lot of memory in split2 pass marxin at gcc dot gnu.org
                   ` (37 preceding siblings ...)
  2021-02-01 13:35 ` cvs-commit at gcc dot gnu.org
@ 2021-02-03 10:27 ` cvs-commit at gcc dot gnu.org
  2021-02-03 12:33 ` cvs-commit at gcc dot gnu.org
                   ` (10 subsequent siblings)
  49 siblings, 0 replies; 51+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2021-02-03 10:27 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98863

--- Comment #37 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The releases/gcc-10 branch has been updated by Richard Biener
<rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:ab5b267e1e2d29c4f1acf39af85e247894193168

commit r10-9338-gab5b267e1e2d29c4f1acf39af85e247894193168
Author: Richard Biener <rguenther@suse.de>
Date:   Mon Feb 1 09:18:43 2021 +0100

    rtl-optimization/98863 - prune RD with LIVE in STV

    This sets DF_RD_PRUNE_DEAD_DEFS like all other uses of the UD/DU
    chain problems which makes the RD problem consume a lot less memory.

    2021-02-01  Richard Biener  <rguenther@suse.de>

            PR rtl-optimization/98863
            * config/i386/i386-features.c (convert_scalars_to_vector):
            Set DF_RD_PRUNE_DEAD_DEFS.

    (cherry picked from commit 972918eea873f8b1663151316c4b3aee7ae028e2)

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [Bug rtl-optimization/98863] [11 Regression] WRF with LTO consumes a lot of memory in REE, FWPROP and x86 specific passes
  2021-01-28 11:19 [Bug rtl-optimization/98863] New: WRF with LTO consumes a lot of memory in split2 pass marxin at gcc dot gnu.org
                   ` (38 preceding siblings ...)
  2021-02-03 10:27 ` cvs-commit at gcc dot gnu.org
@ 2021-02-03 12:33 ` cvs-commit at gcc dot gnu.org
  2021-02-08  8:45 ` rsandifo at gcc dot gnu.org
                   ` (9 subsequent siblings)
  49 siblings, 0 replies; 51+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2021-02-03 12:33 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98863

--- Comment #38 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The releases/gcc-10 branch has been updated by Richard Biener
<rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:550bf0c50024c320b7010d1b2bf644c5c918ad98

commit r10-9341-g550bf0c50024c320b7010d1b2bf644c5c918ad98
Author: Richard Biener <rguenther@suse.de>
Date:   Fri Jan 29 16:02:36 2021 +0100

    rtl-optimization/98863 - tame i386 specific RPAD pass

    This removes analyzing DF with expensive problems which we do not
    use at all and which somehow cause 5GB of memory to leak.  Instead
    just do a defered rescan of added insns.

    This avoids

    > FAIL: gcc.c-torture/compile/20051216-1.c   -O1  (internal compiler error)
    > FAIL: gcc.c-torture/compile/20051216-1.c   -O1  (test for excess errors)

    by clearing DF_DEFER_INSN_RESCAN after calling df_process_deferred_rescans,
    so that it doesn't leak into following unprepared passes that expect
    non-deferred rescans.

    2021-02-03  Richard Biener  <rguenther@suse.de>
                Jakub Jelinek  <jakub@redhat.com>

            PR rtl-optimization/98863
            * config/i386/i386-features.c (remove_partial_avx_dependency):
            Do not perform DF analysis.
            (pass_data_remove_partial_avx_dependency): Remove
            TODO_df_finish.

            * gcc.target/i386/20051216-1.c: New test.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [Bug rtl-optimization/98863] [11 Regression] WRF with LTO consumes a lot of memory in REE, FWPROP and x86 specific passes
  2021-01-28 11:19 [Bug rtl-optimization/98863] New: WRF with LTO consumes a lot of memory in split2 pass marxin at gcc dot gnu.org
                   ` (39 preceding siblings ...)
  2021-02-03 12:33 ` cvs-commit at gcc dot gnu.org
@ 2021-02-08  8:45 ` rsandifo at gcc dot gnu.org
  2021-02-08  9:01 ` rguenther at suse dot de
                   ` (8 subsequent siblings)
  49 siblings, 0 replies; 51+ messages in thread
From: rsandifo at gcc dot gnu.org @ 2021-02-08  8:45 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98863

--- Comment #39 from rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> ---
Just to give an update on this: I have a patch that reduces the
amount of memory consumed by fwprop so that it no longer seems
to be an outlier.  However, it involves doing more bitmap operations.
In this testcase we have a large number of registers that
seem to be live but unused across a large region of code,
so bitmap ANDs with the live-in sets are expensive and hit
the worst-case O(nblocks×nregisters).  I'm still trying to find
a way of reducing the effect of that.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [Bug rtl-optimization/98863] [11 Regression] WRF with LTO consumes a lot of memory in REE, FWPROP and x86 specific passes
  2021-01-28 11:19 [Bug rtl-optimization/98863] New: WRF with LTO consumes a lot of memory in split2 pass marxin at gcc dot gnu.org
                   ` (40 preceding siblings ...)
  2021-02-08  8:45 ` rsandifo at gcc dot gnu.org
@ 2021-02-08  9:01 ` rguenther at suse dot de
  2021-02-08  9:23 ` richard.sandiford at arm dot com
                   ` (7 subsequent siblings)
  49 siblings, 0 replies; 51+ messages in thread
From: rguenther at suse dot de @ 2021-02-08  9:01 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98863

--- Comment #40 from rguenther at suse dot de <rguenther at suse dot de> ---
On Mon, 8 Feb 2021, rsandifo at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98863
> 
> --- Comment #39 from rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> ---
> Just to give an update on this: I have a patch that reduces the
> amount of memory consumed by fwprop so that it no longer seems
> to be an outlier.  However, it involves doing more bitmap operations.
> In this testcase we have a large number of registers that
> seem to be live but unused across a large region of code,
> so bitmap ANDs with the live-in sets are expensive and hit
> the worst-case O(nblocks×nregisters).  I'm still trying to find
> a way of reducing the effect of that.

But isn't this what the RD problem does as well (yeah, DF shows
up as quite compile-time expensive here), and thus all UD/DU chain
users suffer from the same issue?

What I didn't explore further is re-doing the way RD numbers defs
in the bitmaps with the idea that all defs just used inside a
single BB are not necessary to be represented (the local problems
take care of them).  But that of course only helps if there are
a significant number of such defs (shadowed by later defs of the same
reg in the same BB) - which usually should be the case.  There's
extra overhead for re-numbering things of course (but my hope was
to make the RD problem fit in the cache for a nice speedup...)
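
A minimal sketch of the renumbering idea, under stated assumptions: the
`Def` struct and `number_exposed_defs` are hypothetical names for
illustration and are not part of GCC's DF code.  Defs are assumed to be
listed in program order within each block, and the only criterion applied
is "shadowed by a later def of the same reg in the same BB".  Shadowed
defs get no global bit (-1), so the global bitmaps only index the defs
that can reach the end of their block.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// Hypothetical stand-in for a definition: which block it is in and
// which register it defines.
struct Def {
  int block;
  int reg;
};

// Assign dense global ids only to defs that are the last def of their
// register within their block.  Shadowed defs get -1.
std::vector<int> number_exposed_defs(const std::vector<Def> &defs) {
  std::vector<bool> exposed(defs.size(), false);
  std::set<std::pair<int, int>> seen;  // (block, reg) pairs already seen
  // Walk backwards so the first def seen for a (block, reg) pair is the
  // last one in program order.
  for (std::size_t i = defs.size(); i-- > 0;)
    exposed[i] = seen.insert({defs[i].block, defs[i].reg}).second;

  std::vector<int> id(defs.size(), -1);
  int next = 0;
  for (std::size_t i = 0; i < defs.size(); ++i)
    if (exposed[i])
      id[i] = next++;
  return id;
}
```

Whether this pays off depends, as said above, on how many defs are
actually shadowed within their own block.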

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [Bug rtl-optimization/98863] [11 Regression] WRF with LTO consumes a lot of memory in REE, FWPROP and x86 specific passes
  2021-01-28 11:19 [Bug rtl-optimization/98863] New: WRF with LTO consumes a lot of memory in split2 pass marxin at gcc dot gnu.org
                   ` (41 preceding siblings ...)
  2021-02-08  9:01 ` rguenther at suse dot de
@ 2021-02-08  9:23 ` richard.sandiford at arm dot com
  2021-02-08  9:46 ` rguenth at gcc dot gnu.org
                   ` (6 subsequent siblings)
  49 siblings, 0 replies; 51+ messages in thread
From: richard.sandiford at arm dot com @ 2021-02-08  9:23 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98863

--- Comment #41 from richard.sandiford at arm dot com ---
"rguenther at suse dot de" <gcc-bugzilla@gcc.gnu.org> writes:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98863
>
> --- Comment #40 from rguenther at suse dot de <rguenther at suse dot de> ---
> On Mon, 8 Feb 2021, rsandifo at gcc dot gnu.org wrote:
>
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98863
>> 
>> --- Comment #39 from rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> ---
>> Just to give an update on this: I have a patch that reduces the
>> amount of memory consumed by fwprop so that it no longer seems
>> to be an outlier.  However, it involves doing more bitmap operations.
>> In this testcase we have a large number of registers that
>> seem to be live but unused across a large region of code,
>> so bitmap ANDs with the live-in sets are expensive and hit
>> the worst-case O(nblocks×nregisters).  I'm still trying to find
>> a way of reducing the effect of that.
>
> But isn't this what the RD problem does as well (yeah, DF shows
> up as quite compile-time expensive here), and thus all UD/DU chain
> users suffer from the same issue?
Sure, it certainly isn't specific to the RTL-SSA code :-)
I just think we can do better than my current WIP patch does.

> What I didn't explore further is re-doing the way RD numbers defs
> in the bitmaps with the idea that all defs just used inside a
> single BB are not necessary to be represented (the local problems
> take care of them).  But that of course only helps if there are
> a significant number of such defs (shadowed by later defs of the same
> reg in the same BB) - which usually should be the case.
Yeah.  And I think the problem here is that we have a large
number of non-local defs and uses.  It doesn't look like there
are an excessive number of uses per def, just that the defs are
live across a large region before being used.

> There's extra overhead for re-numbering things of course (but my hope
> was to make the RD problem fit in the cache for a nice speedup...)

Has anyone looked into how we end up in this situation for this
testcase?  E.g. did we make bad inlining decisions?  Or is it just
a natural consequence of the way the source is written?

We should cope with the situation better regardless, but since
extreme cases like this tend to trigger --param limits, it would
be good to avoid getting into the situation too. :-)

FWIW, as far as compile-time goes, the outlier in a release build
seems to be do_rpo_vn.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [Bug rtl-optimization/98863] [11 Regression] WRF with LTO consumes a lot of memory in REE, FWPROP and x86 specific passes
  2021-01-28 11:19 [Bug rtl-optimization/98863] New: WRF with LTO consumes a lot of memory in split2 pass marxin at gcc dot gnu.org
                   ` (42 preceding siblings ...)
  2021-02-08  9:23 ` richard.sandiford at arm dot com
@ 2021-02-08  9:46 ` rguenth at gcc dot gnu.org
  2021-02-08 12:44 ` rguenth at gcc dot gnu.org
                   ` (5 subsequent siblings)
  49 siblings, 0 replies; 51+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-02-08  9:46 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98863

--- Comment #42 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to richard.sandiford from comment #41)
> "rguenther at suse dot de" <gcc-bugzilla@gcc.gnu.org> writes:
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98863
> >
> > --- Comment #40 from rguenther at suse dot de <rguenther at suse dot de> ---
> > On Mon, 8 Feb 2021, rsandifo at gcc dot gnu.org wrote:
> >
> >> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98863
> >> 
> >> --- Comment #39 from rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> ---
> >> Just to give an update on this: I have a patch that reduces the
> >> amount of memory consumed by fwprop so that it no longer seems
> >> to be an outlier.  However, it involves doing more bitmap operations.
> >> In this testcase we have a large number of registers that
> >> seem to be live but unused across a large region of code,
> >> so bitmap ANDs with the live-in sets are expensive and hit
> >> the worst-case O(nblocks×nregisters).  I'm still trying to find
> >> a way of reducing the effect of that.
> >
> > But isn't this what the RD problem does as well (yeah, DF shows
> > up as quite compile-time expensive here), and thus all UD/DU chain
> > users suffer from the same issue?
> Sure, it certainly isn't specific to the RTL-SSA code :-)
> I just think we can do better than my current WIP patch does.
> 
> > What I didn't explore further is re-doing the way RD numbers defs
> > in the bitmaps with the idea that all defs just used inside a
> > single BB are not necessary to be represented (the local problems
> > take care of them).  But that of course only helps if there are
> > a significant number of such defs (shadowed by later defs of the same
> > reg in the same BB) - which usually should be the case.
> Yeah.  And I think the problem here is that we have a large
> number of non-local defs and uses.  It doesn't look like there
> are an excessive number of uses per def, just that the defs are
> live across a large region before being used.
> 
> > There's extra overhead for re-numbering things of course (but my hope
> > was to make the RD problem fit in the cache for a nice speedup...)
> 
> Has anyone looked into how we end up in this situation for this
> testcase?  E.g. did we make bad inlining decisions?  Or is it just
> a natural consequence of the way the source is written?

I don't think it's the natural consequence.  Last time I looked at WRF
excessive compile-time issues the root is that there are _lots_ of loops
eventually cloned for pro/epilogue by vectorization so we have _lots_
of loops.  I think the issue is also visible without LTO, just less obviously
pronounced.

> We should cope with the situation better regardless, but since
> extreme cases like this tend to trigger --param limits, it would
> be good to avoid getting into the situation too. :-)

Yeah.  One source of extra lifetime is of course demotion of memory
to registers done by GIMPLE optimizers (store-motion, PRE, hoisting
but also simply FRE).  IIRC WRF once was a bad hitter at store-motion.

> FWIW, as far as compile-time goes, the outlier in a release build
> seems to be do_rpo_vn.

Yeah, I've looked at profiles and the outlier was

static void
do_unwind (unwind_state *to, int rpo_idx, rpo_elim &avail, int *bb_to_rpo)
{
...
  /* Prune [rpo_idx, ] from avail.  */
  /* ???  This is O(number-of-values-in-region) which is
     O(region-size) rather than O(iteration-piece).  */
  for (hash_table<vn_ssa_aux_hasher>::iterator i = vn_ssa_aux_hash->begin ();
       i != vn_ssa_aux_hash->end (); ++i)
    {
      while ((*i)->avail)
        {
          if (bb_to_rpo[(*i)->avail->location] < rpo_idx)
            break;
          vn_avail *av = (*i)->avail;
          (*i)->avail = (*i)->avail->next;
          av->next = avail.m_avail_freelist;
          avail.m_avail_freelist = av;
        }
    }

which has tons of cache misses (and this is triggered from the region-based
call from unrolling a lot of times, making the comment eventually apply as
well).  But for me the "total" outlier is the sum of DF times.

Again almost all dataflow problems do not fit the cache since the function
is so large.  That adds quite some constant factor to all compile-time
cost making it memory bound :/

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [Bug rtl-optimization/98863] [11 Regression] WRF with LTO consumes a lot of memory in REE, FWPROP and x86 specific passes
  2021-01-28 11:19 [Bug rtl-optimization/98863] New: WRF with LTO consumes a lot of memory in split2 pass marxin at gcc dot gnu.org
                   ` (43 preceding siblings ...)
  2021-02-08  9:46 ` rguenth at gcc dot gnu.org
@ 2021-02-08 12:44 ` rguenth at gcc dot gnu.org
  2021-02-09 12:07 ` cvs-commit at gcc dot gnu.org
                   ` (4 subsequent siblings)
  49 siblings, 0 replies; 51+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-02-08 12:44 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98863

--- Comment #43 from Richard Biener <rguenth at gcc dot gnu.org> ---
So module_configure.fppized.f90 is one of the interesting ones (IIRC lots of
small init loops), module_first_rk_step_part1.fppized.f90 is the one with most
obvious DF and fwprop participation, that one also has complete unrolling.

/home/rguenther/install/gcc-11.0/usr/local/bin/gfortran -c -o
start_em.fppized.o -I. -I./netcdf/include -I./inc -Ofast -march=znver2
-ftime-report -std=legacy -fconvert=big-endian -fno-openmp -g0
start_em.fppized.f90
 df reaching defs                   :   0.96 (  3%)   0.02 (  3%)   0.94 (  3%)     0  (  0%)
 df live regs                       :   1.15 (  4%)   0.00 (  0%)   1.13 (  4%)     0  (  0%)
 df live&initialized regs           :   1.08 (  4%)   0.01 (  2%)   1.18 (  4%)     0  (  0%)
 tree forward propagate             :   0.09 (  0%)   0.01 (  2%)   0.08 (  0%)  2217k (  0%)
 complete unrolling                 :   1.00 (  3%)   0.02 (  3%)   1.01 (  3%)    56M ( 12%)
 forward prop                       :   0.93 (  3%)   0.18 ( 29%)   1.10 (  4%)   388k (  0%)
 TOTAL                              :  30.60          0.62         31.23          492M

/home/rguenther/install/gcc-11.0/usr/local/bin/gfortran -c -o
module_advect_em.fppized.o -I. -I./netcdf/include -I./inc -Ofast -march=znver2
-ftime-report -std=legacy -fconvert=big-endian -fno-openmp -g0
module_advect_em.fppized.f90
 df reaching defs                   :   0.28 (  1%)   0.00 (  0%)   0.30 (  1%)     0  (  0%)
 df live regs                       :   1.45 (  5%)   0.00 (  0%)   1.35 (  4%)     0  (  0%)
 df live&initialized regs           :   0.54 (  2%)   0.00 (  0%)   0.64 (  2%)     0  (  0%)
 tree forward propagate             :   0.08 (  0%)   0.00 (  0%)   0.13 (  0%)  2824k (  0%)
 complete unrolling                 :   0.99 (  3%)   0.03 (  9%)   1.05 (  3%)    78M (  8%)
 forward prop                       :   0.24 (  1%)   0.00 (  0%)   0.20 (  1%)  1270k (  0%)
 TOTAL                              :  31.59          0.34         31.96          974M

/home/rguenther/install/gcc-11.0/usr/local/bin/gfortran -c -o
module_first_rk_step_part1.fppized.o -I. -I./netcdf/include -I./inc -Ofast
-march=znver2 -ftime-report -std=legacy -fconvert=big-endian -fno-openmp -g0
module_first_rk_step_part1.fppized.f90
 df reaching defs                   :   2.69 (  4%)   0.04 (  3%)   2.81 (  5%)     0  (  0%)
 df live regs                       :   1.76 (  3%)   0.01 (  1%)   1.71 (  3%)     0  (  0%)
 df live&initialized regs           :   1.57 (  3%)   0.01 (  1%)   1.58 (  3%)     0  (  0%)
 tree forward propagate             :   0.20 (  0%)   0.02 (  2%)   0.19 (  0%)  4095k (  0%)
 complete unrolling                 :   3.25 (  5%)   0.04 (  3%)   3.27 (  5%)    94M ( 11%)
 forward prop                       :   2.79 (  5%)   0.38 ( 30%)   3.16 (  5%)   765k (  0%)
 TOTAL                              :  60.41          1.27         61.72          873M

/home/rguenther/install/gcc-11.0/usr/local/bin/gfortran -c -o
solve_em.fppized.o -I. -I./netcdf/include -I./inc -Ofast -march=znver2
-ftime-report -std=legacy -fconvert=big-endian -fno-openmp -g0
solve_em.fppized.f90
 df reaching defs                   :   1.57 (  5%)   0.01 (  2%)   1.50 (  5%)     0  (  0%)
 df live regs                       :   2.22 (  7%)   0.01 (  2%)   2.24 (  7%)     0  (  0%)
 df live&initialized regs           :   2.83 (  9%)   0.01 (  2%)   2.81 (  9%)     0  (  0%)
 tree forward propagate             :   0.15 (  0%)   0.00 (  0%)   0.15 (  0%)  2644k (  0%)
 complete unrolling                 :   1.11 (  4%)   0.02 (  4%)   1.12 (  4%)    86M ( 14%)
 forward prop                       :   0.75 (  2%)   0.08 ( 14%)   0.84 (  3%)   495k (  0%)
 TOTAL                              :  31.21          0.57         31.80          629M

In the end LTO makes the case unique still ...

And then there's of course one other frequently reported bug:

/home/rguenther/install/gcc-11.0/usr/local/bin/gfortran -c -o
module_alloc_space_2.fppized.o -I. -I./netcdf/include -I./inc -Ofast
-march=znver2 -ftime-report -std=legacy -fconvert=big-endian -fno-openmp -g0
module_alloc_space_2.fppized.f90
 load CSE after reload              :  10.18 ( 37%)   0.00 (  0%)  10.19 ( 37%)    14k (  0%)
 TOTAL                              :  27.28          0.33         27.63          333M

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [Bug rtl-optimization/98863] [11 Regression] WRF with LTO consumes a lot of memory in REE, FWPROP and x86 specific passes
  2021-01-28 11:19 [Bug rtl-optimization/98863] New: WRF with LTO consumes a lot of memory in split2 pass marxin at gcc dot gnu.org
                   ` (44 preceding siblings ...)
  2021-02-08 12:44 ` rguenth at gcc dot gnu.org
@ 2021-02-09 12:07 ` cvs-commit at gcc dot gnu.org
  2021-02-09 12:40 ` rguenth at gcc dot gnu.org
                   ` (3 subsequent siblings)
  49 siblings, 0 replies; 51+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2021-02-09 12:07 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98863

--- Comment #44 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:396cc31317ebad416e234dfa5f85d42585d32437

commit r11-7147-g396cc31317ebad416e234dfa5f85d42585d32437
Author: Richard Biener <rguenther@suse.de>
Date:   Tue Feb 9 11:59:06 2021 +0100

    Fix O(region-size) unwind in VN

    This fixes the currently O(region-size) unwinding of avail info
    to be O(unwind-size) by tracking a linked-list stack of pushed
    avails.  This reduces the compile-time spent in complete unrolling
    for WRF.

    2021-02-09  Richard Biener  <rguenther@suse.de>

            PR tree-optimization/98863
            * tree-ssa-sccvn.h (vn_avail::next_undo): Add.
            * tree-ssa-sccvn.c (last_pushed_avail): New global.
            (rpo_elim::eliminate_push_avail): Chain pushed avails.
            (unwind_state::avail_top): Add.
            (do_unwind): Rewrite unwinding of avail entries.
            (do_rpo_vn): Initialize last_pushed_avail and
            avail_top of the undo state.
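
    The idea of tracking a linked-list stack of pushed avails can be
    sketched roughly as follows.  This is an illustration under
    assumptions, not the code from tree-ssa-sccvn.c: `Value`, `Avail`
    and `AvailStack` are hypothetical stand-ins, and the real
    vn_avail / vn_ssa_aux layout differs.  Each push is chained onto a
    global undo list, so unwinding to a saved top visits only the
    entries pushed since then instead of every value in the region.

```cpp
#include <cassert>
#include <cstddef>

struct Avail;

// Hypothetical stand-in for a value with a chain of avail entries.
struct Value {
  Avail *avail = nullptr;
};

struct Avail {
  int location;      // where the value became available
  Avail *next;       // older avail entry for the same value
  Avail *next_undo;  // entry pushed just before this one, for any value
  Value *owner;      // value whose chain this entry heads when pushed
};

struct AvailStack {
  Avail *last_pushed = nullptr;
  std::size_t visited = 0;  // instrumentation: entries touched by unwinding

  AvailStack() = default;
  AvailStack(const AvailStack &) = delete;
  AvailStack &operator=(const AvailStack &) = delete;

  void push(Value &v, int location) {
    Avail *a = new Avail{location, v.avail, last_pushed, &v};
    v.avail = a;
    last_pushed = a;
  }

  // Pop entries in reverse push order until the saved top is reached.
  // Because pops mirror pushes, each popped entry is the head of its
  // owner's chain, so the cost is O(number of entries unwound).
  void unwind_to(Avail *top) {
    while (last_pushed != top) {
      Avail *a = last_pushed;
      last_pushed = a->next_undo;
      a->owner->avail = a->next;
      delete a;
      ++visited;
    }
  }

  ~AvailStack() { unwind_to(nullptr); }
};
```

    A caller would save `last_pushed` when entering a region (the role
    avail_top plays in the undo state) and pass it to `unwind_to` when
    the iteration has to be redone.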

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [Bug rtl-optimization/98863] [11 Regression] WRF with LTO consumes a lot of memory in REE, FWPROP and x86 specific passes
  2021-01-28 11:19 [Bug rtl-optimization/98863] New: WRF with LTO consumes a lot of memory in split2 pass marxin at gcc dot gnu.org
                   ` (45 preceding siblings ...)
  2021-02-09 12:07 ` cvs-commit at gcc dot gnu.org
@ 2021-02-09 12:40 ` rguenth at gcc dot gnu.org
  2021-02-10 13:51 ` rguenth at gcc dot gnu.org
                   ` (2 subsequent siblings)
  49 siblings, 0 replies; 51+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-02-09 12:40 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98863

--- Comment #45 from Richard Biener <rguenth at gcc dot gnu.org> ---
perf profile from non-bootstrapped, release checking enabled lto1 for ltrans34
on a 3900X (so plenty of L3):

Samples: 1M of event 'cycles:u', Event count (approx.): 1572289976832
Overhead       Samples  Command      Shared Object     Symbol
   7.10%         96925  lto1-ltrans  lto1              [.] bitmap_and_into
   6.37%         87172  lto1-ltrans  lto1              [.] bitmap_list_insert_element_after
   5.84%         80125  lto1-ltrans  lto1              [.] bitmap_set_bit
   5.71%         78151  lto1-ltrans  lto1              [.] bitmap_ior_into
   5.70%         78041  lto1-ltrans  lto1              [.] bitmap_bit_p
   3.71%         50632  lto1-ltrans  lto1              [.] bitmap_and
   3.48%         47914  lto1-ltrans  lto1              [.] df_count_refs
   2.87%         39504  lto1-ltrans  lto1              [.] lra_create_live_ranges_1
   2.45%         33656  lto1-ltrans  lto1              [.] bitmap_elt_ior
   2.39%         32794  lto1-ltrans  lto1              [.] pre_and_rev_post_order_compute_fn
   2.34%         32200  lto1-ltrans  lto1              [.] update_pseudo_point
   2.03%         27707  lto1-ltrans  lto1              [.] bitmap_ior_and_compl
   1.79%         24445  lto1-ltrans  lto1              [.] bitmap_and_compl_into
   1.52%         20804  lto1-ltrans  lto1              [.] df_worklist_dataflow
   1.45%         19838  lto1-ltrans  lto1              [.] bitmap_copy
   1.35%         18690  lto1-ltrans  lto1              [.] get_immediate_dominator
   1.32%         18127  lto1-ltrans  lto1              [.] update_ssa
   1.26%         17562  lto1-ltrans  lto1              [.] determine_value_range
   1.09%         14873  lto1-ltrans  lto1              [.] rewrite_update_dom_walker::before_dom_children
   1.08%         14777  lto1-ltrans  lto1              [.] compute_idf
   1.05%         14349  lto1-ltrans  lto1              [.] compute_dominance_frontiers
   0.94%         12956  lto1-ltrans  libc-2.26.so      [.] __memset_avx2_unaligned_erms

at some point I wondered why keeping DF_LIVE around pays off, but we don't
have an easy knob to disable it at -O2.  Not many passes need LIVE, most
do with LR.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [Bug rtl-optimization/98863] [11 Regression] WRF with LTO consumes a lot of memory in REE, FWPROP and x86 specific passes
  2021-01-28 11:19 [Bug rtl-optimization/98863] New: WRF with LTO consumes a lot of memory in split2 pass marxin at gcc dot gnu.org
                   ` (46 preceding siblings ...)
  2021-02-09 12:40 ` rguenth at gcc dot gnu.org
@ 2021-02-10 13:51 ` rguenth at gcc dot gnu.org
  2021-02-15 15:06 ` cvs-commit at gcc dot gnu.org
  2021-02-15 15:34 ` rsandifo at gcc dot gnu.org
  49 siblings, 0 replies; 51+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-02-10 13:51 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98863

--- Comment #46 from Richard Biener <rguenth at gcc dot gnu.org> ---
Created attachment 50162
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50162&action=edit
df-live on demand

So I did the experiment to turn off DF_LIVE as permanent like we do for -O1.
w/o permanent DF_LIVE:

 df reaching defs                   :  23.36 (  7%)   0.13 (  4%)  23.84 (  7%)     0  (  0%)
 df live regs                       :  45.56 ( 14%)   0.09 (  3%)  45.61 ( 14%)     0  (  0%)
 df live&initialized regs           :  19.42 (  6%)   0.11 (  3%)  19.49 (  6%)     0  (  0%)
 TOTAL                              : 314.61          3.19        317.93         2538M

w/ permanent DF_LIVE:

 df reaching defs                   :  23.53 (  7%)   0.01 (  0%)  23.45 (  7%)     0  (  0%)
 df live regs                       :  44.95 ( 14%)   0.09 (  3%)  45.13 ( 14%)     0  (  0%)
 df live&initialized regs           :  19.70 (  6%)   0.08 (  3%)  19.68 (  6%)     0  (  0%)
 TOTAL                              : 313.83          2.94        316.92         2538M

which doesn't seem to be much of a difference (but there are some observable
periods with less memory use, since live consumes ~400MB in bitmaps).

It should be noted that the passes that add DF_LIVE anyway (and thus
on-demand at -O1) are all enabled by default at -O1.  Which makes
me question this design decision even more.  In principle keeping
DF_LIVE from invariant motion to doloop might make sense (unroll loops
isn't enabled by default, but also not all targets have doloop).
Maybe even starting from ifcvt1.

I've attached the patch used for the experiment.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [Bug rtl-optimization/98863] [11 Regression] WRF with LTO consumes a lot of memory in REE, FWPROP and x86 specific passes
  2021-01-28 11:19 [Bug rtl-optimization/98863] New: WRF with LTO consumes a lot of memory in split2 pass marxin at gcc dot gnu.org
                   ` (47 preceding siblings ...)
  2021-02-10 13:51 ` rguenth at gcc dot gnu.org
@ 2021-02-15 15:06 ` cvs-commit at gcc dot gnu.org
  2021-02-15 15:34 ` rsandifo at gcc dot gnu.org
  49 siblings, 0 replies; 51+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2021-02-15 15:06 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98863

--- Comment #47 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Sandiford <rsandifo@gcc.gnu.org>:

https://gcc.gnu.org/g:abe07a74bb7a2692eff2af151ca54e749ed5eba6

commit r11-7246-gabe07a74bb7a2692eff2af151ca54e749ed5eba6
Author: Richard Sandiford <richard.sandiford@arm.com>
Date:   Mon Feb 15 15:05:22 2021 +0000

    rtl-ssa: Reduce the amount of temporary memory needed [PR98863]

    The rtl-ssa code uses an on-the-side IL and needs to build that IL
    for each block and RTL insn.  I'd originally not used the classical
    dominance frontier method for placing phis on the basis that it seemed
    like more work in this context: we're having to visit everything in
    an RPO walk anyway, so for non-backedge cases we can tell immediately
    whether a phi node is needed.  We then speculatively created phis for
    registers that are live across backedges and simplified them later.
    This avoided having to walk most of the IL twice (once to build the
    initial IL, and once to link uses to phis).

    However, as shown in PR98863, this leads to excessive temporary
    memory in extreme cases, since we had to record the value of
    every live register on exit from every block.  In that PR,
    there were many registers that were live (but unused) across
    a large region of code.

    This patch does use the classical approach to placing phis, but tries
    to use the existing DF defs information to avoid two walks of the IL.
    We still use the previous approach for memory, since there is no
    up-front information to indicate whether a block defines memory or not.
    However, since memory is just treated as a single unified thing
    (like for gimple vops), memory doesn't suffer from the same
    scalability problems as registers.
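
    For reference, the classical approach mentioned above places phis
    on the iterated dominance frontier of the blocks that define a
    register.  The following is a minimal textbook-style sketch, not
    the rtl-ssa implementation: `place_phis` here is a hypothetical
    free function, the dominance frontiers are assumed to be
    precomputed, and no pruning by liveness is done.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// df[b] is the dominance frontier of block b (assumed precomputed).
// def_blocks is the set of blocks that define the register of interest.
// Returns the blocks that need a phi for that register.
std::set<int> place_phis(const std::vector<std::set<int>> &df,
                         const std::set<int> &def_blocks) {
  std::set<int> phi_blocks;
  std::vector<int> worklist(def_blocks.begin(), def_blocks.end());
  while (!worklist.empty()) {
    int b = worklist.back();
    worklist.pop_back();
    for (int f : df[static_cast<std::size_t>(b)]) {
      // A phi is itself a new definition, so its block must be
      // processed too unless it was already a defining block.
      if (phi_blocks.insert(f).second && def_blocks.count(f) == 0)
        worklist.push_back(f);
    }
  }
  return phi_blocks;
}
```

    For a diamond CFG 0->1, 0->2, 1->3, 2->3, a definition in block 1
    needs a phi only in the join block 3.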

    With this change, fwprop no longer seems to be a memory-hog outlier
    in the PR: the maximum RSS is similar with and without fwprop.

    The PR also shows the problems inherent in using bitmap operations
    involving the live-in and live-out sets, which in the testcase are
    very large.  I've therefore tried to reduce those operations to the
    bare minimum.

    The patch also includes other compile-time optimisations motivated
    by the PR; see the changelog for details.

    I tried adding:

        for (int i = 0; i < 200; ++i)
          {
            crtl->ssa = new rtl_ssa::function_info (cfun);
            delete crtl->ssa;
          }

    to fwprop.c to stress the code.  fwprop then took 35% of the compile
    time for the problematic partition in the PR (measured on a release
    build).  fwprop takes less than .5% of the compile time when running
    normally.

    The command:

      git diff 0b76990a9d75d97b84014e37519086b81824c307~ gcc/fwprop.c | \
        patch -p1 -R

    still gives a working compiler that uses the old fwprop.c.  The compile
    time with that version is very similar.

    For a more reasonable testcase like optabs.ii at -O, I saw a 6.7%
    compile time regression with the loop above added (i.e. creating
    the info 201 times per pass instead of once per pass).  That goes
    down to 4.8% with -O -g.  I can't measure a significant difference
    with a normal compiler (no 200-iteration loop).

    So I think that (as expected) the patch does make things a bit
    slower in the normal case.  But like Richi says, peak memory usage
    is harder for users to work around than slightly slower compile times.

    gcc/
            PR rtl-optimization/98863
            * rtl-ssa/functions.h (function_info::bb_live_out_info): Delete.
            (function_info::build_info): Turn into a declaration, moving the
            definition to internals.h.
            (function_info::bb_walker): Declare.
            (function_info::create_reg_use): Likewise.
            (function_info::calculate_potential_phi_regs): Take a build_info
            parameter.
            (function_info::place_phis, function_info::create_ebbs): Declare.
            (function_info::calculate_ebb_live_in_for_debug): Likewise.
            (function_info::populate_backedge_phis): Delete.
            (function_info::start_block, function_info::end_block): Declare.
            (function_info::populate_phi_inputs): Delete.
            (function_info::m_potential_phi_regs): Move information to
            build_info.
            * rtl-ssa/internals.h: New file.
            (function_info::bb_phi_info): New class.
            (function_info::build_info): Moved from functions.h.
            Add a constructor and destructor.
            (function_info::build_info::ebb_use): Delete.
            (function_info::build_info::ebb_def): Likewise.
            (function_info::build_info::bb_live_out): Likewise.
            (function_info::build_info::tmp_ebb_live_in_for_debug): New
            variable.
            (function_info::build_info::potential_phi_regs): Likewise.
            (function_info::build_info::potential_phi_regs_for_debug):
            Likewise.
            (function_info::build_info::ebb_def_regs): Likewise.
            (function_info::build_info::bb_phis): Likewise.
            (function_info::build_info::bb_mem_live_out): Likewise.
            (function_info::build_info::bb_to_rpo): Likewise.
            (function_info::build_info::def_stack): Likewise.
            (function_info::build_info::old_def_stack_limit): Likewise.
            * rtl-ssa/internals.inl (function_info::build_info::record_reg_def):
            Remove the regno argument.  Push the previous definition onto the
            definition stack where necessary.
            * rtl-ssa/accesses.cc: Include internals.h.
            * rtl-ssa/changes.cc: Likewise.
            * rtl-ssa/blocks.cc: Likewise.
            (function_info::build_info::build_info): Define.
            (function_info::build_info::~build_info): Likewise.
            (function_info::bb_walker): New class.
            (function_info::bb_walker::bb_walker): Define.
            (function_info::add_live_out_use): Convert a logarithmic-complexity
            test into a linear one.  Allow the same definition to be passed
            multiple times.
            (function_info::calculate_potential_phi_regs): Moved from
            functions.cc.  Take a build_info parameter and store the
            information there instead.
            (function_info::place_phis): New function.
            (function_info::add_entry_block_defs): Update call to
            record_reg_def.
            (function_info::calculate_ebb_live_in_for_debug): New function.
            (function_info::add_phi_nodes): Use bb_phis to decide which
            registers need phi nodes and initialize ebb_def_regs accordingly.
            Do not add degenerate phis here.
            (function_info::add_artificial_accesses): Use create_reg_use.
            Assert that all definitions are listed in the DF LR sets.
            Update call to record_reg_def.
            (function_info::record_block_live_out): Record live-out register
            values in the phis of successor blocks.  Use the live-out set
            when processing the last block in an EBB, instead of always
            using the live-in sets of successor blocks.  AND the live sets
            with the set of registers that have been defined in the EBB,
            rather than with all potential phi registers.  Cope correctly
            with branches back to the start of the current EBB.
            (function_info::start_block): New function.
            (function_info::end_block): Likewise.
            (function_info::populate_phi_inputs): Likewise.
            (function_info::create_ebbs): Likewise.
            (function_info::process_all_blocks): Rewrite into a multi-phase
            process.
            * rtl-ssa/functions.cc: Include internals.h.
            (function_info::calculate_potential_phi_regs): Move to blocks.cc.
            (function_info::init_function_data): Remove caller.
            * rtl-ssa/insns.cc: Include internals.h.
            (function_info::create_reg_use): New function.  Lazily add any
            degenerate phis needed by the linear RPO view.
            (function_info::record_use): Use create_reg_use.  When processing
            debug uses, use potential_phi_regs and test it before checking
            whether the register is live on entry to the current EBB.  Lazily
            calculate ebb_live_in_for_debug.
            (function_info::record_call_clobbers): Update call to
            record_reg_def.
            (function_info::record_def): Likewise.
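
For readers unfamiliar with the "definition stack" mentioned in the
record_reg_def entry above, the following is a minimal sketch of the general
technique, not the rtl-ssa code itself.  The names def_tracker, start_block,
end_block and no_def are hypothetical and chosen only for illustration.  The
sketch also pushes the previous definition on every call, whereas the
ChangeLog says the real code does so only "where necessary".

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical marker meaning "no definition seen yet".
const int no_def = -1;

// Toy model (an assumption, not GCC code): tracks the most recent
// definition id of each register while walking nested blocks.
struct def_tracker {
  std::vector<int> last_def;                        // current def per register
  std::vector<std::pair<unsigned, int> > def_stack; // (regno, previous def)
  std::vector<std::size_t> old_limits;              // stack height per open block

  explicit def_tracker(unsigned num_regs) : last_def(num_regs, -1) {}

  // Remember how tall the stack was when the block was entered.
  void start_block() { old_limits.push_back(def_stack.size()); }

  // Save the previous definition, then install the new one.
  void record_reg_def(unsigned regno, int def_id) {
    def_stack.push_back(std::make_pair(regno, last_def[regno]));
    last_def[regno] = def_id;
  }

  // Undo every definition recorded since the matching start_block.
  void end_block() {
    std::size_t limit = old_limits.back();
    old_limits.pop_back();
    while (def_stack.size() > limit) {
      last_def[def_stack.back().first] = def_stack.back().second;
      def_stack.pop_back();
    }
  }
};
```

With this shape, leaving a block costs time proportional to the number of
definitions recorded in it, rather than requiring a copy of the whole
per-register table for each block.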


* [Bug rtl-optimization/98863] [11 Regression] WRF with LTO consumes a lot of memory in REE, FWPROP and x86 specific passes
  2021-01-28 11:19 [Bug rtl-optimization/98863] New: WRF with LTO consumes a lot of memory in split2 pass marxin at gcc dot gnu.org
                   ` (48 preceding siblings ...)
  2021-02-15 15:06 ` cvs-commit at gcc dot gnu.org
@ 2021-02-15 15:34 ` rsandifo at gcc dot gnu.org
  49 siblings, 0 replies; 51+ messages in thread
From: rsandifo at gcc dot gnu.org @ 2021-02-15 15:34 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98863

rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #48 from rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> ---
Fixed on master.



Thread overview: 51+ messages
2021-01-28 11:19 [Bug rtl-optimization/98863] New: WRF with LTO consumes a lot of memory in split2 pass marxin at gcc dot gnu.org
2021-01-28 11:20 ` [Bug rtl-optimization/98863] " marxin at gcc dot gnu.org
2021-01-28 13:18 ` rguenth at gcc dot gnu.org
2021-01-28 13:21 ` marxin at gcc dot gnu.org
2021-01-28 14:14 ` marxin at gcc dot gnu.org
2021-01-28 14:20 ` rguenth at gcc dot gnu.org
2021-01-28 14:24 ` rguenth at gcc dot gnu.org
2021-01-28 15:09 ` marxin at gcc dot gnu.org
2021-01-28 15:15 ` rguenth at gcc dot gnu.org
2021-01-28 15:17 ` rguenth at gcc dot gnu.org
2021-01-28 15:18 ` rguenth at gcc dot gnu.org
2021-01-28 15:30 ` rguenth at gcc dot gnu.org
2021-01-29  8:47 ` rguenth at gcc dot gnu.org
2021-01-29  9:03 ` rguenth at gcc dot gnu.org
2021-01-29  9:47 ` rguenth at gcc dot gnu.org
2021-01-29  9:56 ` marxin at gcc dot gnu.org
2021-01-29 10:18 ` marxin at gcc dot gnu.org
2021-01-29 10:24 ` rguenther at suse dot de
2021-01-29 11:03 ` rguenth at gcc dot gnu.org
2021-01-29 11:03 ` [Bug rtl-optimization/98863] WRF with LTO consumes a lot of memory in REE pass rguenth at gcc dot gnu.org
2021-01-29 11:38 ` marxin at gcc dot gnu.org
2021-01-29 12:13 ` rguenth at gcc dot gnu.org
2021-01-29 12:50 ` [Bug rtl-optimization/98863] WRF with LTO consumes a lot of memory in REE, CPROP, PRE and LRA passes rguenth at gcc dot gnu.org
2021-01-29 12:57 ` marxin at gcc dot gnu.org
2021-01-29 13:01 ` cvs-commit at gcc dot gnu.org
2021-01-29 13:23 ` rguenth at gcc dot gnu.org
2021-01-29 13:37 ` marxin at gcc dot gnu.org
2021-01-29 14:39 ` rguenth at gcc dot gnu.org
2021-01-29 15:01 ` rguenth at gcc dot gnu.org
2021-01-29 15:56 ` hjl.tools at gmail dot com
2021-01-29 16:32 ` cvs-commit at gcc dot gnu.org
2021-02-01  8:22 ` cvs-commit at gcc dot gnu.org
2021-02-01  9:31 ` marxin at gcc dot gnu.org
2021-02-01 10:41 ` rguenth at gcc dot gnu.org
2021-02-01 12:07 ` [Bug rtl-optimization/98863] WRF with LTO consumes a lot of memory in REE, FWPROP and x86 specific passes rguenth at gcc dot gnu.org
2021-02-01 12:30 ` rsandifo at gcc dot gnu.org
2021-02-01 12:32 ` rguenth at gcc dot gnu.org
2021-02-01 12:45 ` [Bug rtl-optimization/98863] [11 Regression] " rguenth at gcc dot gnu.org
2021-02-01 13:35 ` cvs-commit at gcc dot gnu.org
2021-02-03 10:27 ` cvs-commit at gcc dot gnu.org
2021-02-03 12:33 ` cvs-commit at gcc dot gnu.org
2021-02-08  8:45 ` rsandifo at gcc dot gnu.org
2021-02-08  9:01 ` rguenther at suse dot de
2021-02-08  9:23 ` richard.sandiford at arm dot com
2021-02-08  9:46 ` rguenth at gcc dot gnu.org
2021-02-08 12:44 ` rguenth at gcc dot gnu.org
2021-02-09 12:07 ` cvs-commit at gcc dot gnu.org
2021-02-09 12:40 ` rguenth at gcc dot gnu.org
2021-02-10 13:51 ` rguenth at gcc dot gnu.org
2021-02-15 15:06 ` cvs-commit at gcc dot gnu.org
2021-02-15 15:34 ` rsandifo at gcc dot gnu.org
