* [Bug rtl-optimization/80960] [8/9/10/11 Regression] Huge memory use when compiling a very large test case
[not found] <bug-80960-4@http.gcc.gnu.org/bugzilla/>
@ 2021-01-27 12:23 ` rguenth at gcc dot gnu.org
2021-01-27 13:09 ` rguenth at gcc dot gnu.org
` (8 subsequent siblings)
9 siblings, 0 replies; 10+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-01-27 12:23 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80960
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Known to work| |4.3.4
Known to fail| |4.8.5
--- Comment #23 from Richard Biener <rguenth at gcc dot gnu.org> ---
So now I see
> /usr/bin/time gfortran-4.3 t.f90 -fdefault-integer-8 -O2 -ftime-report
combiner : 0.25 ( 2%) usr 0.00 ( 0%) sys 0.24 ( 2%) wall 9947 kB ( 5%) ggc
TOTAL : 15.43 0.21 15.65 220667 kB
15.59user 0.24system 0:15.84elapsed 99%CPU (0avgtext+0avgdata 607492maxresident)k
0inputs+0outputs (0major+164981minor)pagefaults 0swaps
> /usr/bin/time gfortran-4.8 t.f90 -fdefault-integer-8 -O2 -ftime-report
combiner : 90.22 (48%) usr 1.07 (63%) sys 91.33 (48%) wall 1757344 kB (88%) ggc
TOTAL : 188.29 1.70 190.04 2000994 kB
188.43user 1.73system 3:10.21elapsed 99%CPU (0avgtext+0avgdata 6523136maxresident)k
0inputs+0outputs (0major+1727565minor)pagefaults 0swaps
> /usr/bin/time gfortran-7 t.f90 -fdefault-integer-8 -O2 -fno-checking -ftime-report
combiner : 67.18 (64%) usr 0.56 (60%) sys 67.76 (64%) wall 2701121 kB (60%) ggc
TOTAL : 105.40 0.93 106.36 4530486 kB
105.54user 0.99system 1:46.58elapsed 99%CPU (0avgtext+0avgdata 3297696maxresident)k
48248inputs+0outputs (7major+835050minor)pagefaults 0swaps
> /usr/bin/time gfortran-10 t.f90 -fdefault-integer-8 -O2 -fno-checking -ftime-report
combiner : 0.24 ( 0%) 0.00 ( 0%) 0.22 ( 0%) 10376 kB ( 1%)
TOTAL : 52.02 0.49 52.52 1876905 kB
52.16user 0.52system 0:52.71elapsed 99%CPU (0avgtext+0avgdata 1831392maxresident)k
55032inputs+0outputs (8major+539965minor)pagefaults 0swaps
(that combine number prevails on trunk as well, I can't spot any code
that disables combine on large BBs so not sure what goes on here)
Clearly GCC 4.8.5 is bad as well, and there has been clear progress since
then on both memory use and compile time, though still not to the level of
GCC 4.3.
Interestingly memory-wise it all points to RTL DSE (GCC 10), likely
because of DF. Eventually post-reload we can simplify some things...
dead store elim2 : 6.90 ( 12%) 0.20 ( 27%) 7.12 ( 12%) 1641076 kB ( 87%)
* [Bug rtl-optimization/80960] [8/9/10/11 Regression] Huge memory use when compiling a very large test case
From: rguenth at gcc dot gnu.org @ 2021-01-27 13:09 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80960
--- Comment #24 from Richard Biener <rguenth at gcc dot gnu.org> ---
And we allocate
plus 66M 1606M
66 million PLUS RTXen via
explow.c:200 (plus_constant) 0 : 0.0% 1596M: 92.0% 0 : 0.0% 0 : 0.0% 66M
called by DSE check_mem_read_rtx and record_store. Ideally we'd not need
any of that via an interface change to canon_true_dependence and friends
(pass in an optional offset).
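The "optional offset" idea can be illustrated with a toy model. This is only
a sketch under stated assumptions: struct toy_addr, toy_plus_constant,
toy_same_addr and toy_same_addr_at are hypothetical names invented here, not
GCC interfaces, and an exact-match comparison stands in for the real overlap
test done by canon_true_dependence. It only shows why handing the offset to
the comparison avoids creating a temporary base+offset node per query.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy address expression: a base register number plus a constant.
   This is an illustration only, not GCC's RTL.  */
struct toy_addr {
    int base_reg;
    long cst;
};

static size_t toy_allocs;   /* temporary nodes created so far */

/* Style 1: the caller first materializes base+offset as a new node
   (hypothetical analogue of calling plus_constant before the query).  */
static struct toy_addr *toy_plus_constant(const struct toy_addr *a, long offset)
{
    assert(a != NULL);
    struct toy_addr *r = malloc(sizeof *r);
    if (r == NULL)
        abort();
    r->base_reg = a->base_reg;
    r->cst = a->cst + offset;
    toy_allocs++;
    return r;
}

/* Comparison that only accepts fully built addresses.  */
static bool toy_same_addr(const struct toy_addr *a, const struct toy_addr *b)
{
    return a->base_reg == b->base_reg && a->cst == b->cst;
}

/* Style 2: the comparison takes the offset as an extra argument, so
   no temporary node is needed at all.  */
static bool toy_same_addr_at(const struct toy_addr *a, long offset,
                             const struct toy_addr *b)
{
    return a->base_reg == b->base_reg && a->cst + offset == b->cst;
}
```

With style 1 every query costs one allocation; with style 2 the same answer
is obtained with none, which is the point of passing the offset through the
interface.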
Most of the time the PLUS RTX is already present in the original MEM, for example:
Breakpoint 6, record_store (body=0x7ffff42caa98, bb_info=0x3ea3b60)
at /home/rguenther/src/gcc2/gcc/dse.c:1529
1529 mem_addr = plus_constant (get_address_mode (mem), mem_addr,
offset);
(reg/f:DI 19 frame)
$14 = void
(gdb) p debug_rtx (mem)
(mem/c:DI (plus:DI (reg/f:DI 19 frame)
(const_int -440 [0xfffffffffffffe48])) [1 MEM[(struct __st_parameter_dt
*)_13].format_len+0 S8 A64])
$15 = void
(gdb) p offset
$16 = {<poly_int_pod<1, long>> = {coeffs = {-440}}, <No data fields>}
Trivially pattern-matching an existing PLUS, like
  if (MEM_P (mem)
      && GET_CODE (XEXP (mem, 0)) == PLUS
      && XEXP (XEXP (mem, 0), 0) == mem_addr
      && CONST_INT_P (XEXP (XEXP (mem, 0), 1))
      && known_eq (offset, INTVAL (XEXP (XEXP (mem, 0), 1))))
    mem_addr = XEXP (mem, 0);
  else
    mem_addr = plus_constant (get_address_mode (mem), mem_addr, offset);
doesn't help much. Most cases seem to be built over (value:...) RTXen;
those we could ggc_free, I presume. Doing that in check_mem_read_rtx
doesn't help though.
* [Bug rtl-optimization/80960] [8/9/10/11 Regression] Huge memory use when compiling a very large test case
From: rguenth at gcc dot gnu.org @ 2021-01-27 14:35 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80960
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |ASSIGNED
Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org
--- Comment #25 from Richard Biener <rguenth at gcc dot gnu.org> ---
Oh, so it's not actually those plus_constant calls but the ones called via
get_addr from true_dependence_1, which is called 60 million times from
check_mem_read_use. That does:
/* Convert the address X into something we can use.  This is done by returning
   it unchanged unless it is a VALUE or VALUE +/- constant; for VALUE
   we call cselib to get a more useful rtx.  */
rtx
get_addr (rtx x)
{
  cselib_val *v;
  struct elt_loc_list *l;
  if (GET_CODE (x) != VALUE)
    {
      if ((GET_CODE (x) == PLUS || GET_CODE (x) == MINUS)
          && GET_CODE (XEXP (x, 0)) == VALUE
          && CONST_SCALAR_INT_P (XEXP (x, 1)))
        {
          rtx op0 = get_addr (XEXP (x, 0));
          if (op0 != XEXP (x, 0))
            {
              poly_int64 c;
              if (GET_CODE (x) == PLUS
                  && poly_int_rtx_p (XEXP (x, 1), &c))
                return plus_constant (GET_MODE (x), op0, c);
thus undoing the valueization DSE does. Since it unconditionally does
this I guess DSE could do it itself instead. That helps tremendously:
dead store elim2 : 6.34 ( 11%) 0.02 ( 7%) 6.38 ( 11%) 170M ( 45%)
TOTAL : 56.96 0.27 57.26 381M
56.96user 0.29system 0:57.27elapsed 99%CPU (0avgtext+0avgdata 825148maxresident)k
0inputs+0outputs (0major+210372minor)pagefaults 0swaps
diff --git a/gcc/dse.c b/gcc/dse.c
index c88587e7d94..da0df54a2dd 100644
--- a/gcc/dse.c
+++ b/gcc/dse.c
@@ -2219,6 +2219,11 @@ check_mem_read_rtx (rtx *loc, bb_info_t bb_info)
     }
   if (maybe_ne (offset, 0))
     mem_addr = plus_constant (get_address_mode (mem), mem_addr, offset);
+  /* Avoid passing VALUE RTXen as mem_addr to canon_true_dependence
+     which will over and over re-create proper RTL and re-apply the
+     offset above.  See PR80960 where we almost allocate 1.6GB of PLUS
+     RTXen that way.  */
+  mem_addr = get_addr (mem_addr);
 
   if (group_id >= 0)
     {
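The effect of the one-line change can be modelled outside GCC. The following
is a minimal sketch, assuming a toy address type: struct toy_addr,
toy_get_addr, toy_depends and toy_scan are hypothetical stand-ins invented
for this illustration and are not GCC functions. The dependence check always
resolves a VALUE-based address itself, so resolving once before the loop over
stores turns one allocation per store into a single allocation per read.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy address: either an opaque VALUE (is_value != 0) or a resolved
   base register, plus a constant offset.  Not GCC's real RTL.  */
struct toy_addr {
    int is_value;
    int id;        /* VALUE number or register number */
    long offset;
};

static long toy_allocs;   /* nodes created by toy_get_addr */

/* Hypothetical stand-in for get_addr: a resolved address comes back
   unchanged, a VALUE-based one is rebuilt as a fresh node.  */
static struct toy_addr *toy_get_addr(struct toy_addr *x)
{
    assert(x != NULL);
    if (!x->is_value)
        return x;
    struct toy_addr *r = malloc(sizeof *r);
    if (r == NULL)
        abort();
    r->is_value = 0;
    r->id = x->id + 100;   /* pretend the VALUE maps to register id+100 */
    r->offset = x->offset;
    toy_allocs++;
    return r;
}

/* Hypothetical stand-in for the dependence check: it always resolves
   the read address itself before comparing.  */
static int toy_depends(struct toy_addr *read, const struct toy_addr *store)
{
    struct toy_addr *r = toy_get_addr(read);
    int hit = r->id == store->id && r->offset == store->offset;
    if (r != read)
        free(r);           /* under a garbage collector this would linger */
    return hit;
}

/* Check one read against n stores.  Returns the number of hits and
   stores the number of nodes allocated along the way in *allocs.
   With hoist != 0 the address is resolved once, before the loop.  */
static int toy_scan(struct toy_addr *read, const struct toy_addr *stores,
                    int n, int hoist, long *allocs)
{
    toy_allocs = 0;
    struct toy_addr *a = hoist ? toy_get_addr(read) : read;
    int hits = 0;
    for (int i = 0; i < n; i++)
        hits += toy_depends(a, &stores[i]);
    if (a != read)
        free(a);
    *allocs = toy_allocs;
    return hits;
}
```

Calling toy_scan with hoist set to 0 and then to 1 on the same inputs gives
the same hit count, but the allocation count drops from n to 1. In this toy
model that mirrors what the patch does by calling get_addr once in
check_mem_read_rtx.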
* [Bug rtl-optimization/80960] [8/9/10/11 Regression] Huge memory use when compiling a very large test case
From: segher at gcc dot gnu.org @ 2021-01-27 22:15 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80960
--- Comment #26 from Segher Boessenkool <segher at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #23)
> (that combine number prevails on trunk as well, I can't spot any code
> that disables combine on large BBs so not sure what goes on here)
There is no such thing, indeed. And the instruction combiner is
"mostly linear", so it shouldn't actually matter.
* [Bug rtl-optimization/80960] [8/9/10/11 Regression] Huge memory use when compiling a very large test case
From: cvs-commit at gcc dot gnu.org @ 2021-01-28 8:14 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80960
--- Comment #27 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:
https://gcc.gnu.org/g:a523add327c6cfdd68cf9b788ea808068d0f508c
commit r11-6948-ga523add327c6cfdd68cf9b788ea808068d0f508c
Author: Richard Biener <rguenther@suse.de>
Date: Wed Jan 27 15:35:52 2021 +0100
rtl-optimization/80960 - avoid creating garbage RTL in DSE
The following avoids repeatedly turning VALUE RTXen into
sth useful and re-applying a constant offset through get_addr
via DSE check_mem_read_rtx. Instead perform this once for
all stores to be visited in check_mem_read_rtx. This avoids
allocating 1.6GB of garbage PLUS RTXen on the PR80960
testcase, fixing the memory usage regression from old GCC.
2021-01-27 Richard Biener <rguenther@suse.de>
PR rtl-optimization/80960
* dse.c (check_mem_read_rtx): Call get_addr on the
offsetted address.
* [Bug rtl-optimization/80960] [8/9/10 Regression] Huge memory use when compiling a very large test case
From: rguenth at gcc dot gnu.org @ 2021-01-29 11:05 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80960
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Known to work| |11.0
Summary|[8/9/10/11 Regression] Huge |[8/9/10 Regression] Huge
|memory use when compiling a |memory use when compiling a
|very large test case |very large test case
--- Comment #28 from Richard Biener <rguenth at gcc dot gnu.org> ---
Fixed on trunk so far.
* [Bug rtl-optimization/80960] [9/10 Regression] Huge memory use when compiling a very large test case
From: jakub at gcc dot gnu.org @ 2021-05-14 9:49 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80960
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|8.5 |9.4
--- Comment #29 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
GCC 8 branch is being closed.
* [Bug rtl-optimization/80960] [9/10 Regression] Huge memory use when compiling a very large test case
From: cvs-commit at gcc dot gnu.org @ 2021-05-17 17:12 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80960
--- Comment #30 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The releases/gcc-10 branch has been updated by Richard Biener
<rguenth@gcc.gnu.org>:
https://gcc.gnu.org/g:47d3815f0669800666f1dd69f0c5cfecc617a12b
commit r10-9832-g47d3815f0669800666f1dd69f0c5cfecc617a12b
Author: Richard Biener <rguenther@suse.de>
Date: Wed Jan 27 15:35:52 2021 +0100
rtl-optimization/80960 - avoid creating garbage RTL in DSE
The following avoids repeatedly turning VALUE RTXen into
sth useful and re-applying a constant offset through get_addr
via DSE check_mem_read_rtx. Instead perform this once for
all stores to be visited in check_mem_read_rtx. This avoids
allocating 1.6GB of garbage PLUS RTXen on the PR80960
testcase, fixing the memory usage regression from old GCC.
2021-01-27 Richard Biener <rguenther@suse.de>
PR rtl-optimization/80960
* dse.c (check_mem_read_rtx): Call get_addr on the
offsetted address.
(cherry picked from commit a523add327c6cfdd68cf9b788ea808068d0f508c)
* [Bug rtl-optimization/80960] [9 Regression] Huge memory use when compiling a very large test case
From: cvs-commit at gcc dot gnu.org @ 2021-05-18 7:05 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80960
--- Comment #31 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The releases/gcc-9 branch has been updated by Richard Biener
<rguenth@gcc.gnu.org>:
https://gcc.gnu.org/g:6cb5edbf7e6724ff014954863989d7444ee84c6a
commit r9-9540-g6cb5edbf7e6724ff014954863989d7444ee84c6a
Author: Richard Biener <rguenther@suse.de>
Date: Wed Jan 27 15:35:52 2021 +0100
rtl-optimization/80960 - avoid creating garbage RTL in DSE
The following avoids repeatedly turning VALUE RTXen into
sth useful and re-applying a constant offset through get_addr
via DSE check_mem_read_rtx. Instead perform this once for
all stores to be visited in check_mem_read_rtx. This avoids
allocating 1.6GB of garbage PLUS RTXen on the PR80960
testcase, fixing the memory usage regression from old GCC.
2021-01-27 Richard Biener <rguenther@suse.de>
PR rtl-optimization/80960
* dse.c (check_mem_read_rtx): Call get_addr on the
offsetted address.
(cherry picked from commit a523add327c6cfdd68cf9b788ea808068d0f508c)
* [Bug rtl-optimization/80960] [9 Regression] Huge memory use when compiling a very large test case
From: rguenth at gcc dot gnu.org @ 2021-05-18 7:07 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80960
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Known to work| |9.3.1
Resolution|--- |FIXED
Status|ASSIGNED |RESOLVED
Known to fail| |9.3.0
--- Comment #32 from Richard Biener <rguenth at gcc dot gnu.org> ---
Fixed.