public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug c/116002] New: GCC Compiler Hang with Recursive Macros in Function
@ 2024-07-19 12:06 iamanonymous.cs at gmail dot com
  2024-07-19 13:20 ` [Bug rtl-optimization/116002] GCC Compiler time-hog with large basic block " rguenth at gcc dot gnu.org
                   ` (7 more replies)
  0 siblings, 8 replies; 9+ messages in thread
From: iamanonymous.cs at gmail dot com @ 2024-07-19 12:06 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116002

            Bug ID: 116002
           Summary: GCC Compiler Hang with Recursive Macros in Function
           Product: gcc
           Version: 15.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: iamanonymous.cs at gmail dot com
  Target Milestone: ---
            Target: x86_64

*******************************************************************************
I encountered an issue where the GCC compiler hangs indefinitely while trying
to compile a specific piece of code involving recursive macros within a
function. The code snippet is provided below.Clang also has this problem.

*******************************************************************************
OS and Platform:
# uname -a
Linux ubuntu 4.15.0-213-generic #224-Ubuntu SMP Mon Jun 19 13:30:12 UTC 2023
x86_64 x86_64 x86_64 GNU/Linux
*******************************************************************************
# gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/root/gdbtest/gcc/gcc-15/libexec/gcc/x86_64-pc-linux-gnu/15.0.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: /root/gdbtest/gcc/obj/../gcc/configure
--prefix=/root/gdbtest/gcc/gcc-15 --enable-languages=c,c++,fortran,go
--disable-multilib
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 15.0.0 20240509 (experimental) (GCC) 
*******************************************************************************
Program:
# cat 1.c

int a(int c) {
  if (c < 12345) {
#define A \
    { int d = c % 12345; \
      int e = c ^ d * 12345; \
      int f = c * d % e % 12345; \
      c = f; }
#define B A A A A A A
#define C B B B B B B
#define D C C C C C C
#define E D D D D D D
    E E E E E
  }
  return c;
}




*******************************************************************************
Output: Nothing (Waited for nearly 300 seconds.
*******************************************************************************

Also got killed on compiler explorer:https://godbolt.org/z/9Exof1rrW

*******************************************************************************

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug rtl-optimization/116002] GCC Compiler time-hog with large basic block in Function
  2024-07-19 12:06 [Bug c/116002] New: GCC Compiler Hang with Recursive Macros in Function iamanonymous.cs at gmail dot com
@ 2024-07-19 13:20 ` rguenth at gcc dot gnu.org
  2024-07-19 13:30 ` rguenth at gcc dot gnu.org
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-07-19 13:20 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116002

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
      Known to fail|                            |14.1.1
          Component|c                           |rtl-optimization
     Ever confirmed|0                           |1
            Summary|GCC Compiler Hang with      |GCC Compiler time-hog with
                   |Recursive Macros in         |large basic block in
                   |Function                    |Function
   Last reconfirmed|                            |2024-07-19
           Keywords|                            |compile-time-hog
             Status|UNCONFIRMED                 |NEW

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
With only 'E E' it's

 CPROP                              :  18.28 ( 62%)

with 'E E E'

 dead store elim1                   :  17.04 ( 22%)
 CPROP                              :  50.19 ( 65%)

with 'E E E E'

 dead store elim1                   :  36.41 ( 23%)
 CPROP                              : 106.68 ( 67%)

this is the usual gcse memory/compile-time hog with its (and DF) bitmap
operations.
We already disable it via gcse_or_cprop_is_too_expensive but the heuristic
doesn't trigger here - it's few basic blocks only.  We end up

  <bb 2> [local count: 1073741824]:
  if (c_11666(D) <= 12344)
    goto <bb 3>; [50.00%]
  else
    goto <bb 4>; [50.00%]

  <bb 3> [local count: 536870912]:
  d_11667 = c_11666(D) % 12345;
  _1 = d_11667 * 12345;
  e_11668 = _1 ^ c_11666(D);
  _2 = c_11666(D) * d_11667;
  _3 = _2 % e_11668; 
  f_11669 = _3 % 12345;
  _4 = f_11669 * 12345;
  e_11670 = _4 ^ f_11669;
  _5 = f_11669 * f_11669;
  _6 = _5 % e_11670;
  f_11671 = _6 % 12345;
  _7 = f_11671 * 12345;
  e_11672 = _7 ^ f_11671;
  _8 = f_11671 * f_11671;
  _9 = _8 % e_11672;
...
  _11664 = _11663 % e_19442;
  f_19443 = _11664 % 12345;

  <bb 4> [local count: 1073741824]:
  # c_11665 = PHI <c_11666(D)(2), f_19443(3)>
  return c_11665;

thus very many pseudos and insns but few basic-blocks.  I believe there's
a duplicate.

Samples: 114K of event 'cycles:P', Event count (approx.): 121898207258          
Overhead       Samples  Command  Shared Object         Symbol                   
  61.00%         70222  cc1      cc1                   [.]
_Z22rtx_equal_for_cselib_1P7rtx_defS0_12machine_modei
  10.17%         11704  cc1      cc1                   [.]
_Z13cselib_lookupP7rtx_def12machine_modeiS1_
   6.42%          7400  cc1      cc1                   [.]
_ZN10hash_tableI13cselib_hasherLb0E11xcallocatorE19find_slot_with
   4.75%          5251  cc1      cc1                   [.]
_ZL10topo_visitP16constraint_graphR3vecIj7va_heap6vl_ptrEP17simpl
   2.91%          3219  cc1      cc1                   [.]
_ZL11solve_graphP16constraint_graph

ah, I've seen this as well - the CSELIB hashing is a joke;  I've run into this
earlier.  It certainly will produce collisions a lot
with a lot of expressions with the same RTX code like this.  Ah, it was
cse.cc hashing that was even worse.  But CSELIB hashing also uses + for
mixing.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug rtl-optimization/116002] GCC Compiler time-hog with large basic block in Function
  2024-07-19 12:06 [Bug c/116002] New: GCC Compiler Hang with Recursive Macros in Function iamanonymous.cs at gmail dot com
  2024-07-19 13:20 ` [Bug rtl-optimization/116002] GCC Compiler time-hog with large basic block " rguenth at gcc dot gnu.org
@ 2024-07-19 13:30 ` rguenth at gcc dot gnu.org
  2024-07-19 14:33 ` rguenth at gcc dot gnu.org
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-07-19 13:30 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116002

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org      |rguenth at gcc dot gnu.org
             Status|NEW                         |ASSIGNED

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
Created attachment 58708
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58708&action=edit
prototype patch

This improves compile time to 23s for 'E E E E E' with now

 tree PTA                           :  16.93 ( 74%)
 CPROP                              :   0.15 (  1%)

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug rtl-optimization/116002] GCC Compiler time-hog with large basic block in Function
  2024-07-19 12:06 [Bug c/116002] New: GCC Compiler Hang with Recursive Macros in Function iamanonymous.cs at gmail dot com
  2024-07-19 13:20 ` [Bug rtl-optimization/116002] GCC Compiler time-hog with large basic block " rguenth at gcc dot gnu.org
  2024-07-19 13:30 ` rguenth at gcc dot gnu.org
@ 2024-07-19 14:33 ` rguenth at gcc dot gnu.org
  2024-07-23  7:34 ` cvs-commit at gcc dot gnu.org
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-07-19 14:33 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116002

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
      Known to fail|                            |13.3.0

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
The minimal fix for this is the following with the first hunk only for
consistency (I'm not sure this is relevant).  This might be more appropriate
when one considers backporting.

diff --git a/gcc/cselib.cc b/gcc/cselib.cc
index cbaab7d515c..67f57b67649 100644
--- a/gcc/cselib.cc
+++ b/gcc/cselib.cc
@@ -1267,12 +1267,12 @@ cselib_hash_plus_const_int (rtx x, HOST_WIDE_INT c, int
create,
     return e->hash;

   unsigned hash = (unsigned) PLUS + (unsigned) GET_MODE (x);
-  hash += e->hash;
+  hash = iterative_hash_hashval_t (hash, e->hash);
   unsigned int tem_hash = (unsigned) CONST_INT + (unsigned) VOIDmode;
   tem_hash += ((unsigned) CONST_INT << 7) + (unsigned HOST_WIDE_INT) c;
   if (tem_hash == 0)
     tem_hash = (unsigned int) CONST_INT;
-  hash += tem_hash;
+  hash = iterative_hash_hashval_t (hash, tem_hash);
   return hash ? hash : 1 + (unsigned int) PLUS;
 }

@@ -1509,7 +1509,7 @@ cselib_hash_rtx (rtx x, int create, machine_mode memmode)
            if (tem_hash == 0)
              return 0;

-           hash += tem_hash;
+           hash = iterative_hash_hashval_t (hash, tem_hash);
          }
          break;
        case 'E':

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug rtl-optimization/116002] GCC Compiler time-hog with large basic block in Function
  2024-07-19 12:06 [Bug c/116002] New: GCC Compiler Hang with Recursive Macros in Function iamanonymous.cs at gmail dot com
                   ` (2 preceding siblings ...)
  2024-07-19 14:33 ` rguenth at gcc dot gnu.org
@ 2024-07-23  7:34 ` cvs-commit at gcc dot gnu.org
  2024-07-23  7:59 ` rguenth at gcc dot gnu.org
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2024-07-23  7:34 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116002

--- Comment #4 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:44e065a52fa6069d6c8cacebc8f876840d278dd0

commit r15-2218-g44e065a52fa6069d6c8cacebc8f876840d278dd0
Author: Richard Biener <rguenther@suse.de>
Date:   Fri Jul 19 16:23:51 2024 +0200

    [v2] rtl-optimization/116002 - cselib hash is bad

    The following addresses the bad hash function of cselib which uses
    integer plus for merging.  This causes a huge number of collisions
    for the testcase in the PR and thus very large compile-time.

    The following rewrites it to use inchash, eliding duplicate mixing
    of RTX code and mode in some cases and more consistently avoiding
    a return value of zero as well as treating zero as fatal.  An
    important part is to preserve mixing of hashes of commutative
    operators as commutative.

    For cselib_hash_plus_const_int this removes the apparent attempt
    of making sure to hash the same as a PLUS as cselib_hash_rtx makes
    sure to dispatch to cselib_hash_plus_const_int consistently.

    This reduces compile-time for the testcase in the PR from unknown
    to 22s and for a reduced testcase from 73s to 9s.  There's another
    pending patchset to improve the speed of inchash mixing, but it's
    not in the profile for this testcase (PTA pops up now).

    The generated code is equal.  I've also compared cc1 builds
    with and without the patch and they are now commparing equal
    after retaining commutative hashing for commutative operators.

            PR rtl-optimization/116002
            * cselib.cc (cselib_hash_rtx): Use inchash to get proper mixing.
            Consistently avoid a zero return value when hashing successfully.
            Consistently treat a zero hash value from recursing as fatal.
            Use hashval_t where appropriate.
            (cselib_hash_plus_const_int): Likewise.
            (new_cselib_val): Use hashval_t.
            (cselib_lookup_1): Likewise.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug rtl-optimization/116002] GCC Compiler time-hog with large basic block in Function
  2024-07-19 12:06 [Bug c/116002] New: GCC Compiler Hang with Recursive Macros in Function iamanonymous.cs at gmail dot com
                   ` (3 preceding siblings ...)
  2024-07-23  7:34 ` cvs-commit at gcc dot gnu.org
@ 2024-07-23  7:59 ` rguenth at gcc dot gnu.org
  2024-07-23  8:35 ` rguenth at gcc dot gnu.org
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-07-23  7:59 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116002

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
      Known to work|                            |15.0

--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
The CPROP and DSE issues are fixed for GCC 15.

With -O1 we need 4s to compile even 'E E E E E E E E E', adding 'F'
and testing with 'F F F F F F F F F' needs 43s which looks linear behavior.
Popping up on the radar are then

 combiner                           :   4.65 ( 11%)
 integrated RA                      :   7.92 ( 18%)
 LRA reload inheritance             :   4.15 ( 10%)
 LRA create live ranges             :   5.66 ( 13%)

so mainly register allocation.

With -O2 we have PTA showing up

 tree PTA                           :  12.21 ( 77%)

PTA solving should be pretty quick, but it seems we take a lot of iterations
for unknown reasons:

  32.27%         20663  cc1      cc1               [.] topo_visit
  16.29%         10432  cc1      cc1               [.] solve_constraints
  14.69%          9412  cc1      cc1               [.] find
  14.10%          9021  cc1      cc1               [.] bitmap_clear_bit
   1.19%           764  cc1      cc1               [.] bitmap_set_bit

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug rtl-optimization/116002] GCC Compiler time-hog with large basic block in Function
  2024-07-19 12:06 [Bug c/116002] New: GCC Compiler Hang with Recursive Macros in Function iamanonymous.cs at gmail dot com
                   ` (4 preceding siblings ...)
  2024-07-23  7:59 ` rguenth at gcc dot gnu.org
@ 2024-07-23  8:35 ` rguenth at gcc dot gnu.org
  2024-07-23  9:51 ` cvs-commit at gcc dot gnu.org
  2024-07-25 11:49 ` rguenth at gcc dot gnu.org
  7 siblings, 0 replies; 9+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-07-23  8:35 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116002

--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #5)
> With -O2 we have PTA showing up
> 
>  tree PTA                           :  12.21 ( 77%)
> 
> PTA solving should be pretty quick, but it seems we take a lot of iterations
> for unknown reasons:
> 
>   32.27%         20663  cc1      cc1               [.] topo_visit
>   16.29%         10432  cc1      cc1               [.] solve_constraints
>   14.69%          9412  cc1      cc1               [.] find
>   14.10%          9021  cc1      cc1               [.] bitmap_clear_bit
>    1.19%           764  cc1      cc1               [.] bitmap_set_bit

the issue is that we have only constraints like

_7 = &NONLOCAL
e_30 = _7 + UNKNOWN
e_30 = f_27 + UNKNOWN
_8 = f_27 + UNKNOWN
_8 = f_27 + UNKNOWN
_9 = _8 + UNKNOWN

those form a graph with N nodes and no copy edges as those constraints
are all complex.  Each solving iteration a few nodes get a new solution
but those dependent have already been evaluated so we bottleneck on
computing the optimal order to visit.

I'm testing a patch for this.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug rtl-optimization/116002] GCC Compiler time-hog with large basic block in Function
  2024-07-19 12:06 [Bug c/116002] New: GCC Compiler Hang with Recursive Macros in Function iamanonymous.cs at gmail dot com
                   ` (5 preceding siblings ...)
  2024-07-23  8:35 ` rguenth at gcc dot gnu.org
@ 2024-07-23  9:51 ` cvs-commit at gcc dot gnu.org
  2024-07-25 11:49 ` rguenth at gcc dot gnu.org
  7 siblings, 0 replies; 9+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2024-07-23  9:51 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116002

--- Comment #7 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:15d3b2dab9182eff036a604169b5e6f4ab3b2a40

commit r15-2223-g15d3b2dab9182eff036a604169b5e6f4ab3b2a40
Author: Richard Biener <rguenther@suse.de>
Date:   Tue Jul 23 10:29:58 2024 +0200

    tree-optimization/116002 - PTA solving slow with degenerate graph

    When the constraint graph consists of N nodes with only complex
    constraints and no copy edges we have to be lucky to arrive at
    a constraint solving order that requires the optimal number of
    iterations.  What happens in the testcase is that we bottle-neck
    on computing the visitation order but propagate changes only
    very slowly.  Luckily the testcase complex constraints are
    all copy-with-offset and those do provide a way to order
    visitation.  The following adds this which reduces the iteration
    count to one.

            PR tree-optimization/116002
            * tree-ssa-structalias.cc (topo_visit): Also consider
            SCALAR = SCALAR complex constraints as edges.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug rtl-optimization/116002] GCC Compiler time-hog with large basic block in Function
  2024-07-19 12:06 [Bug c/116002] New: GCC Compiler Hang with Recursive Macros in Function iamanonymous.cs at gmail dot com
                   ` (6 preceding siblings ...)
  2024-07-23  9:51 ` cvs-commit at gcc dot gnu.org
@ 2024-07-25 11:49 ` rguenth at gcc dot gnu.org
  7 siblings, 0 replies; 9+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-07-25 11:49 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116002

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
   Target Milestone|---                         |15.0
         Resolution|---                         |FIXED

--- Comment #8 from Richard Biener <rguenth at gcc dot gnu.org> ---
The RTL combiner is now topmost spot at -O3

 combiner                           :   5.39 ( 21%)

but I consider this bug fixed now.  Thus, fixed for GCC 15.  I don't think it's
worth backporting this for artificial testcases like this.

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2024-07-25 11:49 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-07-19 12:06 [Bug c/116002] New: GCC Compiler Hang with Recursive Macros in Function iamanonymous.cs at gmail dot com
2024-07-19 13:20 ` [Bug rtl-optimization/116002] GCC Compiler time-hog with large basic block " rguenth at gcc dot gnu.org
2024-07-19 13:30 ` rguenth at gcc dot gnu.org
2024-07-19 14:33 ` rguenth at gcc dot gnu.org
2024-07-23  7:34 ` cvs-commit at gcc dot gnu.org
2024-07-23  7:59 ` rguenth at gcc dot gnu.org
2024-07-23  8:35 ` rguenth at gcc dot gnu.org
2024-07-23  9:51 ` cvs-commit at gcc dot gnu.org
2024-07-25 11:49 ` rguenth at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).