public inbox for gcc-bugs@sourceware.org
* [Bug rtl-optimization/80960] [8/9/10/11 Regression] Huge memory use when compiling a very large test case
From: rguenth at gcc dot gnu.org @ 2021-01-27 12:23 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80960

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
      Known to work|                            |4.3.4
      Known to fail|                            |4.8.5

--- Comment #23 from Richard Biener <rguenth at gcc dot gnu.org> ---
So now I see

> /usr/bin/time gfortran-4.3 t.f90 -fdefault-integer-8 -O2 -ftime-report
 combiner              :   0.25 ( 2%) usr   0.00 ( 0%) sys   0.24 ( 2%) wall   9947 kB ( 5%) ggc
 TOTAL                 :  15.43             0.21            15.65            220667 kB
15.59user 0.24system 0:15.84elapsed 99%CPU (0avgtext+0avgdata 607492maxresident)k
0inputs+0outputs (0major+164981minor)pagefaults 0swaps

> /usr/bin/time gfortran-4.8 t.f90 -fdefault-integer-8 -O2 -ftime-report
 combiner                :  90.22 (48%) usr   1.07 (63%) sys  91.33 (48%) wall 1757344 kB (88%) ggc
 TOTAL                 : 188.29             1.70           190.04           2000994 kB
188.43user 1.73system 3:10.21elapsed 99%CPU (0avgtext+0avgdata 6523136maxresident)k
0inputs+0outputs (0major+1727565minor)pagefaults 0swaps

> /usr/bin/time gfortran-7 t.f90 -fdefault-integer-8 -O2 -fno-checking -ftime-report
 combiner                :  67.18 (64%) usr   0.56 (60%) sys  67.76 (64%) wall 2701121 kB (60%) ggc
 TOTAL                 : 105.40             0.93           106.36           4530486 kB
105.54user 0.99system 1:46.58elapsed 99%CPU (0avgtext+0avgdata 3297696maxresident)k
48248inputs+0outputs (7major+835050minor)pagefaults 0swaps

> /usr/bin/time gfortran-10 t.f90 -fdefault-integer-8 -O2 -fno-checking -ftime-report
 combiner                           :   0.24 (  0%)   0.00 (  0%)   0.22 (  0%)  10376 kB (  1%)
 TOTAL                              :  52.02          0.49         52.52       1876905 kB
52.16user 0.52system 0:52.71elapsed 99%CPU (0avgtext+0avgdata 1831392maxresident)k
55032inputs+0outputs (8major+539965minor)pagefaults 0swaps

(that combine number prevails on trunk as well, I can't spot any code
that disables combine on large BBs so not sure what goes on here)

At least GCC 4.8.5 is clearly bad as well, and since then there has been clear
progress on both memory use and compile time, though still not back to the
level of GCC 4.3.

Interestingly, memory-wise it all points to RTL DSE (GCC 10), likely
because of DF.  Maybe we can simplify some things post-reload...

 dead store elim2                   :   6.90 ( 12%)   0.20 ( 27%)   7.12 ( 12%) 1641076 kB ( 87%)

* [Bug rtl-optimization/80960] [8/9/10/11 Regression] Huge memory use when compiling a very large test case
From: rguenth at gcc dot gnu.org @ 2021-01-27 13:09 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80960

--- Comment #24 from Richard Biener <rguenth at gcc dot gnu.org> ---
And we allocate

plus                         66M      1606M

66 million PLUS RTXen via

explow.c:200 (plus_constant)                             0 :  0.0%     1596M: 92.0%        0 :  0.0%        0 :  0.0%       66M

called from DSE's check_mem_read_rtx and record_store.  Ideally an interface
change to canon_true_dependence and friends (passing in an optional offset)
would make all of that unnecessary.
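
To make the suggested interface change concrete, here is a toy model in C.  It is a minimal sketch under stated assumptions, not GCC code: struct toy_addr and toy_may_overlap are hypothetical stand-ins for an RTL address and for canon_true_dependence, and only the same-base case is analyzed.  Because the base and the constant offset travel as separate values, the overlap check never has to build a fresh PLUS, however many times it runs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an address: a base register number plus a
   constant byte offset, kept apart instead of being folded into a new
   PLUS node for every query.  */
struct toy_addr
{
  int base;        /* e.g. 19 for the frame pointer in the gdb session */
  int64_t offset;  /* constant displacement from the base */
};

/* Hypothetical conservative dependence test: accesses of SIZE_A and
   SIZE_B bytes may overlap unless they share a base and their byte
   ranges are disjoint.  Nothing is allocated here.  */
static bool
toy_may_overlap (struct toy_addr a, int64_t size_a,
                 struct toy_addr b, int64_t size_b)
{
  if (a.base != b.base)
    return true;  /* different bases: cannot prove independence */
  return a.offset < b.offset + size_b && b.offset < a.offset + size_a;
}
```

With an 8-byte store at frame - 440 (as in the gdb session), an 8-byte read at frame - 432 is reported independent, while one at frame - 436 may overlap; a read from a different base is always treated as a possible dependence.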

Most of the time the PLUS RTX is already present in the original MEM, as in

Breakpoint 6, record_store (body=0x7ffff42caa98, bb_info=0x3ea3b60)
    at /home/rguenther/src/gcc2/gcc/dse.c:1529
1529        mem_addr = plus_constant (get_address_mode (mem), mem_addr, offset);
(reg/f:DI 19 frame)
$14 = void
(gdb) p debug_rtx (mem)
(mem/c:DI (plus:DI (reg/f:DI 19 frame)
        (const_int -440 [0xfffffffffffffe48])) [1 MEM[(struct __st_parameter_dt *)_13].format_len+0 S8 A64])
$15 = void
(gdb) p offset
$16 = {<poly_int_pod<1, long>> = {coeffs = {-440}}, <No data fields>}

Trivially pattern-matching an existing PLUS, like

      if (MEM_P (mem)
          && GET_CODE (XEXP (mem, 0)) == PLUS
          && XEXP (XEXP (mem, 0), 0) == mem_addr
          && CONST_INT_P (XEXP (XEXP (mem, 0), 1))
          && known_eq (offset, INTVAL (XEXP (XEXP (mem, 0), 1))))
        mem_addr = XEXP (mem, 0);
      else
        mem_addr = plus_constant (get_address_mode (mem), mem_addr, offset);

doesn't help much.  Most cases seem to be built over (value:...) RTXen;
those we could ggc_free, I presume.  Doing that in check_mem_read_rtx
doesn't help, though.

* [Bug rtl-optimization/80960] [8/9/10/11 Regression] Huge memory use when compiling a very large test case
From: rguenth at gcc dot gnu.org @ 2021-01-27 14:35 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80960

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org      |rguenth at gcc dot gnu.org

--- Comment #25 from Richard Biener <rguenth at gcc dot gnu.org> ---
Oh, so it's not actually those plus_constant calls but the ones made via
get_addr from true_dependence_1, which is called 60 million times from
check_mem_read_use.  get_addr does:

/* Convert the address X into something we can use.  This is done by returning
   it unchanged unless it is a VALUE or VALUE +/- constant; for VALUE
   we call cselib to get a more useful rtx.  */

rtx
get_addr (rtx x)
{
  cselib_val *v;
  struct elt_loc_list *l;

  if (GET_CODE (x) != VALUE)
    {
      if ((GET_CODE (x) == PLUS || GET_CODE (x) == MINUS)
          && GET_CODE (XEXP (x, 0)) == VALUE
          && CONST_SCALAR_INT_P (XEXP (x, 1)))
        {
          rtx op0 = get_addr (XEXP (x, 0));
          if (op0 != XEXP (x, 0))
            {
              poly_int64 c;
              if (GET_CODE (x) == PLUS
                  && poly_int_rtx_p (XEXP (x, 1), &c))
                return plus_constant (GET_MODE (x), op0, c);

thus undoing the valueization that DSE does.  Since it does this
unconditionally, I guess DSE could do it itself, once, instead.  That helps
tremendously:

 dead store elim2                   :   6.34 ( 11%)   0.02 (  7%)   6.38 ( 11%)   170M ( 45%)
 TOTAL                              :  56.96          0.27         57.26         381M
56.96user 0.29system 0:57.27elapsed 99%CPU (0avgtext+0avgdata 825148maxresident)k
0inputs+0outputs (0major+210372minor)pagefaults 0swaps

diff --git a/gcc/dse.c b/gcc/dse.c
index c88587e7d94..da0df54a2dd 100644
--- a/gcc/dse.c
+++ b/gcc/dse.c
@@ -2219,6 +2219,11 @@ check_mem_read_rtx (rtx *loc, bb_info_t bb_info)
     }
   if (maybe_ne (offset, 0))
     mem_addr = plus_constant (get_address_mode (mem), mem_addr, offset);
+  /* Avoid passing VALUE RTXen as mem_addr to canon_true_dependence
+     which will over and over re-create proper RTL and re-apply the
+     offset above.  See PR80960 where we almost allocate 1.6GB of PLUS
+     RTXen that way.  */
+  mem_addr = get_addr (mem_addr);

   if (group_id >= 0)
     {
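
The allocation pattern this patch removes can be reproduced in a toy model.  This is a hedged sketch, not GCC code: toy_rtx, toy_plus_constant, toy_get_addr and the two toy_check_read_* functions are hypothetical, and a bump-allocated arena stands in for GC-allocated RTL.  It models only what is described in this comment: canonicalizing a VALUE-based address inside every dependence query builds a new PLUS per query, while canonicalizing once up front, as the patch does with get_addr (mem_addr), builds one.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified RTL: a register, a VALUE whose op0
   is its known location, or PLUS (op0, offset).  */
enum toy_code { TOY_REG, TOY_VALUE, TOY_PLUS };

struct toy_rtx
{
  enum toy_code code;
  const struct toy_rtx *op0;  /* VALUE: its location; PLUS: the base */
  long offset;                /* PLUS only: constant displacement */
};

/* Bump arena standing in for GC-allocated RTL.  Nothing is freed, so
   every node a caller discards is garbage.  */
#define TOY_ARENA_SIZE 4096
static struct toy_rtx toy_arena[TOY_ARENA_SIZE];
static size_t toy_allocated;

/* Analogue of plus_constant: always builds a new PLUS node.  */
static const struct toy_rtx *
toy_plus_constant (const struct toy_rtx *base, long offset)
{
  if (toy_allocated == TOY_ARENA_SIZE)
    return NULL;  /* toy arena exhausted */
  struct toy_rtx *p = &toy_arena[toy_allocated++];
  p->code = TOY_PLUS;
  p->op0 = base;
  p->offset = offset;
  return p;
}

/* Simplified analogue of get_addr: a VALUE becomes its location and
   VALUE + const is rebuilt as location + const, which allocates.
   Anything else is returned unchanged.  */
static const struct toy_rtx *
toy_get_addr (const struct toy_rtx *x)
{
  if (x->code == TOY_VALUE)
    return x->op0;
  if (x->code == TOY_PLUS && x->op0->code == TOY_VALUE)
    return toy_plus_constant (x->op0->op0, x->offset);
  return x;
}

/* Before the patch: each query against one of N_STORES stores
   re-canonicalizes the read address.  Returns nodes allocated.  */
static size_t
toy_check_read_per_query (const struct toy_rtx *addr, size_t n_stores)
{
  size_t before = toy_allocated;
  for (size_t i = 0; i < n_stores; i++)
    (void) toy_get_addr (addr);  /* stands in for the dependence query */
  return toy_allocated - before;
}

/* After the patch: canonicalize once; the result is no longer
   VALUE-based, so every later query returns it unchanged.  */
static size_t
toy_check_read_hoisted (const struct toy_rtx *addr, size_t n_stores)
{
  size_t before = toy_allocated;
  const struct toy_rtx *canon = toy_get_addr (addr);
  if (canon == NULL)
    return toy_allocated - before;
  for (size_t i = 0; i < n_stores; i++)
    (void) toy_get_addr (canon);
  return toy_allocated - before;
}
```

Checking one read of (plus (value) -440) against 1000 stores allocates 1000 nodes with the first function and 1 with the second.  Scaled up to the tens of millions of queries in this testcase, that per-query allocation is the pattern behind the ~1.6GB of PLUS RTXen reported earlier in the thread.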

* [Bug rtl-optimization/80960] [8/9/10/11 Regression] Huge memory use when compiling a very large test case
From: segher at gcc dot gnu.org @ 2021-01-27 22:15 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80960

--- Comment #26 from Segher Boessenkool <segher at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #23)
> (that combine number prevails on trunk as well, I can't spot any code
> that disables combine on large BBs so not sure what goes on here)

There is no such thing, indeed.  And the instruction combiner is
"mostly linear", so it shouldn't actually matter.

* [Bug rtl-optimization/80960] [8/9/10/11 Regression] Huge memory use when compiling a very large test case
From: cvs-commit at gcc dot gnu.org @ 2021-01-28  8:14 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80960

--- Comment #27 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:a523add327c6cfdd68cf9b788ea808068d0f508c

commit r11-6948-ga523add327c6cfdd68cf9b788ea808068d0f508c
Author: Richard Biener <rguenther@suse.de>
Date:   Wed Jan 27 15:35:52 2021 +0100

    rtl-optimization/80960 - avoid creating garbage RTL in DSE

    The following avoids repeatedly turning VALUE RTXen into
    sth useful and re-applying a constant offset through get_addr
    via DSE check_mem_read_rtx.  Instead perform this once for
    all stores to be visited in check_mem_read_rtx.  This avoids
    allocating 1.6GB of garbage PLUS RTXen on the PR80960
    testcase, fixing the memory usage regression from old GCC.

    2021-01-27  Richard Biener  <rguenther@suse.de>

            PR rtl-optimization/80960
            * dse.c (check_mem_read_rtx): Call get_addr on the
            offsetted address.

* [Bug rtl-optimization/80960] [8/9/10 Regression] Huge memory use when compiling a very large test case
From: rguenth at gcc dot gnu.org @ 2021-01-29 11:05 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80960

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
      Known to work|                            |11.0
            Summary|[8/9/10/11 Regression] Huge |[8/9/10 Regression] Huge
                   |memory use when compiling a |memory use when compiling a
                   |very large test case        |very large test case

--- Comment #28 from Richard Biener <rguenth at gcc dot gnu.org> ---
Fixed on trunk so far.

* [Bug rtl-optimization/80960] [9/10 Regression] Huge memory use when compiling a very large test case
From: jakub at gcc dot gnu.org @ 2021-05-14  9:49 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80960

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|8.5                         |9.4

--- Comment #29 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
GCC 8 branch is being closed.

* [Bug rtl-optimization/80960] [9/10 Regression] Huge memory use when compiling a very large test case
From: cvs-commit at gcc dot gnu.org @ 2021-05-17 17:12 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80960

--- Comment #30 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The releases/gcc-10 branch has been updated by Richard Biener
<rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:47d3815f0669800666f1dd69f0c5cfecc617a12b

commit r10-9832-g47d3815f0669800666f1dd69f0c5cfecc617a12b
Author: Richard Biener <rguenther@suse.de>
Date:   Wed Jan 27 15:35:52 2021 +0100

    rtl-optimization/80960 - avoid creating garbage RTL in DSE

    The following avoids repeatedly turning VALUE RTXen into
    sth useful and re-applying a constant offset through get_addr
    via DSE check_mem_read_rtx.  Instead perform this once for
    all stores to be visited in check_mem_read_rtx.  This avoids
    allocating 1.6GB of garbage PLUS RTXen on the PR80960
    testcase, fixing the memory usage regression from old GCC.

    2021-01-27  Richard Biener  <rguenther@suse.de>

            PR rtl-optimization/80960
            * dse.c (check_mem_read_rtx): Call get_addr on the
            offsetted address.

    (cherry picked from commit a523add327c6cfdd68cf9b788ea808068d0f508c)

* [Bug rtl-optimization/80960] [9 Regression] Huge memory use when compiling a very large test case
From: cvs-commit at gcc dot gnu.org @ 2021-05-18  7:05 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80960

--- Comment #31 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The releases/gcc-9 branch has been updated by Richard Biener
<rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:6cb5edbf7e6724ff014954863989d7444ee84c6a

commit r9-9540-g6cb5edbf7e6724ff014954863989d7444ee84c6a
Author: Richard Biener <rguenther@suse.de>
Date:   Wed Jan 27 15:35:52 2021 +0100

    rtl-optimization/80960 - avoid creating garbage RTL in DSE

    The following avoids repeatedly turning VALUE RTXen into
    sth useful and re-applying a constant offset through get_addr
    via DSE check_mem_read_rtx.  Instead perform this once for
    all stores to be visited in check_mem_read_rtx.  This avoids
    allocating 1.6GB of garbage PLUS RTXen on the PR80960
    testcase, fixing the memory usage regression from old GCC.

    2021-01-27  Richard Biener  <rguenther@suse.de>

            PR rtl-optimization/80960
            * dse.c (check_mem_read_rtx): Call get_addr on the
            offsetted address.

    (cherry picked from commit a523add327c6cfdd68cf9b788ea808068d0f508c)

* [Bug rtl-optimization/80960] [9 Regression] Huge memory use when compiling a very large test case
From: rguenth at gcc dot gnu.org @ 2021-05-18  7:07 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80960

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
      Known to work|                            |9.3.1
         Resolution|---                         |FIXED
             Status|ASSIGNED                    |RESOLVED
      Known to fail|                            |9.3.0

--- Comment #32 from Richard Biener <rguenth at gcc dot gnu.org> ---
Fixed.
