public inbox for gcc-bugs@sourceware.org
* [Bug middle-end/115597] New: [15 Regression] vectorizer takes 20+ h compiling 510.parest in SPECCPU2017 since g:46bb4ce4d30ab749d40f6f4cef6f1fb7c7813452
@ 2024-06-23  8:24 tnfchris at gcc dot gnu.org
  2024-06-23  8:49 ` [Bug middle-end/115597] " rguenth at gcc dot gnu.org
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: tnfchris at gcc dot gnu.org @ 2024-06-23  8:24 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115597

            Bug ID: 115597
           Summary: [15 Regression] vectorizer takes 20+ h compiling
                    510.parest in SPECCPU2017 since
                    g:46bb4ce4d30ab749d40f6f4cef6f1fb7c7813452
           Product: gcc
           Version: 15.0
            Status: UNCONFIRMED
          Keywords: compile-time-hog
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: tnfchris at gcc dot gnu.org
                CC: rguenth at gcc dot gnu.org
  Target Milestone: ---
            Target: aarch64*

Created attachment 58496
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58496&action=edit
slp dump graph

Since:

commit 46bb4ce4d30ab749d40f6f4cef6f1fb7c7813452 (HEAD)
Author: Richard Biener <rguenther@suse.de>
Date:   Wed Jun 19 12:57:27 2024 +0200

    tree-optimization/114413 - SLP CSE after permute optimization

    We currently fail to re-CSE SLP nodes after optimizing permutes
    which results in off cost estimates.  For gcc.dg/vect/bb-slp-32.c
    this shows in not re-using the SLP node with the load and arithmetic
    for both the store and the reduction.  The following implements
    CSE by re-bst-mapping nodes as finalization part of vect_optimize_slp.

    I've tried to make the CSE part of permute materialization but it
    isn't a very good fit there.  I've not bothered to implement something
    more complete, also handling external defs or defs without
    SLP_TREE_SCALAR_STMTS.

    I realize this might result in more BB SLP which in turn might slow
    down code given costing for BB SLP is difficult (even that we now
    vectorize gcc.dg/vect/bb-slp-32.c on x86_64 might be not a good idea).
    This is nevertheless feeding more accurate info to costing which is
    good.

            PR tree-optimization/114413
            * tree-vect-slp.cc (release_scalar_stmts_to_slp_tree_map):
            New function, split out from ...
            (vect_analyze_slp): ... here.  Call it.
            (vect_cse_slp_nodes): New function.
            (vect_optimize_slp): Call it.

            * gcc.dg/vect/bb-slp-32.c: Expect CSE and vectorization on x86.

Compilation of 510.parest_r takes an extremely long time (20+ hours).

The problem seems to be that vect_cse_slp_nodes visits the same nodes over
and over.  The function has no visited set, and the hot loop in parest (which
is vectorizable thanks to libmvec) has many two-operator nodes, one of which
is rooted at the top level.

vect_cse_slp_nodes skips VEC_PERM_EXPR nodes but not their children, so it
ends up visiting the same subgraphs multiple times.  The graph in parest has
so many two-operator nodes that compilation essentially never finishes.

I believe this function needs a visited node set.
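
To illustrate why a missing visited set can blow up like this, here is a
minimal standalone sketch.  It is not GCC code: toy_node, walk_naive,
walk_visited and build_chain are hypothetical names made up for this example,
and the graph is only a rough stand-in for a chain of adjacent two-operator
nodes, where each node uses the next one for both operands.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for an SLP node: just a list of children.
struct toy_node
{
  std::vector<toy_node *> children;
};

// Recursive walk without a visited set; returns the number of calls made.
static std::uint64_t
walk_naive (const toy_node *node)
{
  assert (node != nullptr);
  std::uint64_t calls = 1;
  for (const toy_node *child : node->children)
    calls += walk_naive (child);
  return calls;
}

// Same walk, but the children of each node are only descended into once.
static std::uint64_t
walk_visited (const toy_node *node,
              std::unordered_set<const toy_node *> &visited)
{
  assert (node != nullptr);
  if (!visited.insert (node).second)
    return 1;  // Already seen: count the call but do not recurse.
  std::uint64_t calls = 1;
  for (const toy_node *child : node->children)
    calls += walk_visited (child, visited);
  return calls;
}

// Build a chain of DEPTH + 1 nodes in which node i uses node i + 1 for
// both of its operands, so the graph is a DAG with heavy sharing.
static std::vector<toy_node>
build_chain (unsigned depth)
{
  std::vector<toy_node> nodes (depth + 1);
  for (unsigned i = 0; i < depth; ++i)
    nodes[i].children = { &nodes[i + 1], &nodes[i + 1] };
  return nodes;
}
```

For build_chain (20), walk_naive makes 2^21 - 1 = 2097151 calls while
walk_visited makes 41, so in this toy model every additional shared level
doubles the cost of the naive walk.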

Example backtrace:

#334 0x00000000018a1e14 in vect_cse_slp_nodes (bst_map=0x41627a0,
node=@0x4132f40: 0x3df2ec0) at
/opt/buildAgent/work/5c94c4ced6ebfcd0/gcc/tree-vect-slp.cc:6111
#335 0x00000000018a1e14 in vect_cse_slp_nodes (bst_map=0x41627a0,
node=@0x41321a0: 0x3df2b90) at
/opt/buildAgent/work/5c94c4ced6ebfcd0/gcc/tree-vect-slp.cc:6111
#336 0x00000000018a1e14 in vect_cse_slp_nodes (bst_map=0x41627a0,
node=@0x4130b00: 0x3df2860) at
/opt/buildAgent/work/5c94c4ced6ebfcd0/gcc/tree-vect-slp.cc:6111
#337 0x00000000018a1e14 in vect_cse_slp_nodes (bst_map=0x41627a0,
node=@0x41348a0: 0x3df2530) at
/opt/buildAgent/work/5c94c4ced6ebfcd0/gcc/tree-vect-slp.cc:6111
#338 0x00000000018a1e14 in vect_cse_slp_nodes (bst_map=0x41627a0,
node=@0x3b8b0d0: 0x3df2310) at
/opt/buildAgent/work/5c94c4ced6ebfcd0/gcc/tree-vect-slp.cc:6111
#339 0x00000000018a1e14 in vect_cse_slp_nodes (bst_map=0x41627a0,
node=@0x41348f0: 0x3dee928) at
/opt/buildAgent/work/5c94c4ced6ebfcd0/gcc/tree-vect-slp.cc:6111
#340 0x00000000018a1e14 in vect_cse_slp_nodes (bst_map=0x41627a0,
node=@0x4134500: 0x3dee460) at
/opt/buildAgent/work/5c94c4ced6ebfcd0/gcc/tree-vect-slp.cc:6111
#341 0x00000000018a1e14 in vect_cse_slp_nodes (bst_map=0x41627a0,
node=@0x3c14600: 0x3ded690) at
/opt/buildAgent/work/5c94c4ced6ebfcd0/gcc/tree-vect-slp.cc:6111
#342 0x00000000018a1e14 in vect_cse_slp_nodes (bst_map=0x41627a0,
node=@0x3ca75f0: 0x3de7910) at
/opt/buildAgent/work/5c94c4ced6ebfcd0/gcc/tree-vect-slp.cc:6111
#343 0x00000000018a1e14 in vect_cse_slp_nodes (bst_map=0x41627a0,
node=@0x3e28590: 0x3de8768) at
/opt/buildAgent/work/5c94c4ced6ebfcd0/gcc/tree-vect-slp.cc:6111
#344 0x00000000018a1e14 in vect_cse_slp_nodes (bst_map=0x41627a0,
node=@0x3c2e4b8: 0x3de7778) at
/opt/buildAgent/work/5c94c4ced6ebfcd0/gcc/tree-vect-slp.cc:6111
#345 0x00000000018a1e14 in vect_cse_slp_nodes (bst_map=0x41627a0,
node=@0x3da5e58: 0x3de7dd8) at
/opt/buildAgent/work/5c94c4ced6ebfcd0/gcc/tree-vect-slp.cc:6111
#346 0x00000000018a1e14 in vect_cse_slp_nodes (bst_map=0x41627a0,
node=@0x41d0770: 0x3de7f70) at
/opt/buildAgent/work/5c94c4ced6ebfcd0/gcc/tree-vect-slp.cc:6111
#347 0x00000000018a1f20 in vect_optimize_slp (vinfo=0x3e291c0) at
/opt/buildAgent/work/5c94c4ced6ebfcd0/gcc/tree-vect-slp.cc:6128


* [Bug middle-end/115597] [15 Regression] vectorizer takes 20+ h compiling 510.parest in SPECCPU2017 since g:46bb4ce4d30ab749d40f6f4cef6f1fb7c7813452
From: rguenth at gcc dot gnu.org @ 2024-06-23  8:49 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115597

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                       |Added
----------------------------------------------------------------------------
   Last reconfirmed|                              |2024-06-23
   Target Milestone|---                           |15.0
           Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org
             Status|UNCONFIRMED                   |ASSIGNED
     Ever confirmed|0                             |1

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Mine.


* [Bug middle-end/115597] [15 Regression] vectorizer takes 20+ h compiling 510.parest in SPECCPU2017 since g:46bb4ce4d30ab749d40f6f4cef6f1fb7c7813452
From: rguenth at gcc dot gnu.org @ 2024-06-23  9:29 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115597

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
Ah, I feared this would happen - this case seems to be caused by a lot of
VEC_PERM nodes(?) which are not handled by the CSE process, as well as the
two-operator nodes which lack SLP_TREE_SCALAR_STMTS (we'd need NULL elements
there, something I need to add anyway).

The bst_map acts as a "visited" map, but nodes not handled there would need
a "visited" set (but as said above, the plan is to reduce that set to zero).

I'll try to reproduce this to confirm.  Usually a two-operator node shouldn't
be too bad since the next non-two-operator one will serve as a 'visited' point,
but in this graph we have several adjacent two-operator nodes without any
intermediate node handled by the bst-map processing code.  I can't reproduce
it with -Ofast -march=znver2 though.
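
As a rough illustration of the behavior described above (a toy model, not
the actual tree-vect-slp.cc code), the following sketch uses a map as the
only "visited" record.  Nodes with a key are looked up in the map and stop
the walk when already present; nodes without a key are always recursed
into.  toy_node, toy_map, toy_cse_walk and build_chain are hypothetical
names invented for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical stand-in for an SLP node.  KEY plays the role of the
// scalar stmts used for the map lookup; HAS_KEY is false for nodes the
// walk cannot look up.
struct toy_node
{
  int key = 0;
  bool has_key = true;
  std::vector<toy_node *> children;
};

using toy_map = std::map<int, toy_node *>;

// Walk the graph using MAP as the only "visited" record.  Returns the
// number of calls made.
static std::uint64_t
toy_cse_walk (toy_map &map, toy_node *node)
{
  assert (node != nullptr);
  if (node->has_key)
    {
      // A node that is already in the map stops the walk here.
      if (!map.emplace (node->key, node).second)
        return 1;
    }
  std::uint64_t calls = 1;
  for (toy_node *child : node->children)
    calls += toy_cse_walk (map, child);
  return calls;
}

// Chain of DEPTH + 1 nodes, each using the next node for both operands.
// Every KEYED_EVERY-th node gets a key; 0 means only the last node has one.
static std::vector<toy_node>
build_chain (unsigned depth, unsigned keyed_every)
{
  std::vector<toy_node> nodes (depth + 1);
  for (unsigned i = 0; i <= depth; ++i)
    {
      nodes[i].key = (int) i;
      nodes[i].has_key = (i == depth
                          || (keyed_every != 0 && i % keyed_every == 0));
      if (i < depth)
        nodes[i].children = { &nodes[i + 1], &nodes[i + 1] };
    }
  return nodes;
}
```

With a depth of 20, keying every node gives 41 calls and keying every second
node gives 61, but a run of 20 adjacent nodes without a key gives 2097151
calls.  In this model an occasional keyed node is enough to keep the walk
cheap, while a long run of unkeyed nodes is not.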

The easiest way to remedy the situation is probably to allow VEC_PERM_EXPR
CSE when the node has SLP_TREE_SCALAR_STMTS as two-operator nodes have.
So I _think_ the following should fix this.  I'm going to test it (on x86-64).

Can you check whether that fixes the issue?

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 9465d94de1a..212d5f97f7d 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -6085,7 +6085,6 @@ static void
 vect_cse_slp_nodes (scalar_stmts_to_slp_tree_map_t *bst_map, slp_tree& node)
 {
   if (SLP_TREE_DEF_TYPE (node) == vect_internal_def
-      && SLP_TREE_CODE (node) != VEC_PERM_EXPR
       /* Besides some VEC_PERM_EXPR, two-operator nodes also
         lack scalar stmts and thus CSE doesn't work via bst_map.  Ideally
         we'd have sth that works for all internal and external nodes.  */


* [Bug middle-end/115597] [15 Regression] vectorizer takes 20+ h compiling 510.parest in SPECCPU2017 since g:46bb4ce4d30ab749d40f6f4cef6f1fb7c7813452
From: tnfchris at gcc dot gnu.org @ 2024-06-23 10:29 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115597

--- Comment #3 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
> 
> Can you check whether that fixes the issue?
> 
> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> index 9465d94de1a..212d5f97f7d 100644
> --- a/gcc/tree-vect-slp.cc
> +++ b/gcc/tree-vect-slp.cc
> @@ -6085,7 +6085,6 @@ static void
>  vect_cse_slp_nodes (scalar_stmts_to_slp_tree_map_t *bst_map, slp_tree& node)
>  {
>    if (SLP_TREE_DEF_TYPE (node) == vect_internal_def
> -      && SLP_TREE_CODE (node) != VEC_PERM_EXPR
>        /* Besides some VEC_PERM_EXPR, two-operator nodes also
>          lack scalar stmts and thus CSE doesn't work via bst_map.  Ideally
>          we'd have sth that works for all internal and external nodes.  */

Yeah, that seems to do it; I can compile SPECFP again.


* [Bug middle-end/115597] [15 Regression] vectorizer takes 20+ h compiling 510.parest in SPECCPU2017 since g:46bb4ce4d30ab749d40f6f4cef6f1fb7c7813452
From: tnfchris at gcc dot gnu.org @ 2024-06-23 11:00 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115597

--- Comment #4 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #2)
> Ah, I feared this would happen - this case seems to be caused by a lot of
> VEC_PERM nodes(?) which are not handled by the CSE process, as well as the
> two-operator nodes which lack SLP_TREE_SCALAR_STMTS (we'd need NULL elements
> there, something I need to add anyway).
> 
> The bst_map acts as a "visited" map, but nodes not handled there would need
> a "visited" set (but as said above, the plan is to reduce that set to zero).
> 

Ah I see, that makes sense.

> I'll try to reproduce this to confirm.  Usually a two-operator node shouldn't
> be too bad since the next non-two-operator one will serve as a 'visited' point,
> but in this graph we have several adjacent two-operator nodes without any
> intermediate node handled by the bst-map processing code.  I can't reproduce
> it with -Ofast -march=znver2 though.
> 

Yeah, I forgot to mention that I could only reproduce it with LTO and a
recent glibc.

Thanks for the fix!


* [Bug middle-end/115597] [15 Regression] vectorizer takes 20+ h compiling 510.parest in SPECCPU2017 since g:46bb4ce4d30ab749d40f6f4cef6f1fb7c7813452
From: cvs-commit at gcc dot gnu.org @ 2024-06-23 12:18 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115597

--- Comment #5 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:2a345214fc332b6f0821edf394ff8802b768db1d

commit r15-1565-g2a345214fc332b6f0821edf394ff8802b768db1d
Author: Richard Biener <rguenther@suse.de>
Date:   Sun Jun 23 11:26:39 2024 +0200

    tree-optimization/115597 - allow CSE of two-operator VEC_PERM nodes

    The following makes sure to always CSE when there's SLP_TREE_SCALAR_STMTS
    as otherwise a chain of two-operator node operations can result in
    exponential behavior of the CSE process as likely seen when building
    510.parest on aarch64.

            PR tree-optimization/115597
            * tree-vect-slp.cc (vect_cse_slp_nodes): Allow to CSE
            VEC_PERM nodes.


* [Bug middle-end/115597] [15 Regression] vectorizer takes 20+ h compiling 510.parest in SPECCPU2017 since g:46bb4ce4d30ab749d40f6f4cef6f1fb7c7813452
From: rguenth at gcc dot gnu.org @ 2024-06-23 12:19 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115597

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|ASSIGNED                    |RESOLVED

--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
Fixed.

