public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug middle-end/115597] New: [15 Regression] vectorizer takes 20+ h compiling 510.parest in SPECCPU2017 since g:46bb4ce4d30ab749d40f6f4cef6f1fb7c7813452
@ 2024-06-23 8:24 tnfchris at gcc dot gnu.org
2024-06-23 8:49 ` [Bug middle-end/115597] " rguenth at gcc dot gnu.org
` (5 more replies)
0 siblings, 6 replies; 7+ messages in thread
From: tnfchris at gcc dot gnu.org @ 2024-06-23 8:24 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115597
Bug ID: 115597
Summary: [15 Regression] vectorizer takes 20+ h compiling
510.parest in SPECCPU2017 since
g:46bb4ce4d30ab749d40f6f4cef6f1fb7c7813452
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Keywords: compile-time-hog
Severity: normal
Priority: P3
Component: middle-end
Assignee: unassigned at gcc dot gnu.org
Reporter: tnfchris at gcc dot gnu.org
CC: rguenth at gcc dot gnu.org
Target Milestone: ---
Target: aarch64*
Created attachment 58496
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58496&action=edit
slp dump graph
Since:
commit 46bb4ce4d30ab749d40f6f4cef6f1fb7c7813452 (HEAD)
Author: Richard Biener <rguenther@suse.de>
Date: Wed Jun 19 12:57:27 2024 +0200
tree-optimization/114413 - SLP CSE after permute optimization
We currently fail to re-CSE SLP nodes after optimizing permutes
which results in off cost estimates. For gcc.dg/vect/bb-slp-32.c
this shows in not re-using the SLP node with the load and arithmetic
for both the store and the reduction. The following implements
CSE by re-bst-mapping nodes as finalization part of vect_optimize_slp.
I've tried to make the CSE part of permute materialization but it
isn't a very good fit there. I've not bothered to implement something
more complete, also handling external defs or defs without
SLP_TREE_SCALAR_STMTS.
I realize this might result in more BB SLP which in turn might slow
down code given costing for BB SLP is difficult (even that we now
vectorize gcc.dg/vect/bb-slp-32.c on x86_64 might be not a good idea).
This is nevertheless feeding more accurate info to costing which is
good.
PR tree-optimization/114413
* tree-vect-slp.cc (release_scalar_stmts_to_slp_tree_map):
New function, split out from ...
(vect_analyze_slp): ... here. Call it.
(vect_cse_slp_nodes): New function.
(vect_optimize_slp): Call it.
* gcc.dg/vect/bb-slp-32.c: Expect CSE and vectorization on x86.
Compilation takes an extremely long time in 510.parest_r.
The problem seems to be that vect_cse_slp_nodes visits the same nodes twice.
It looks like the function has no visited set, and the hot loop in parest (when
vectorizable thanks to libmvec) has many TWO_OPERANDS nodes and one of them is
rooted at the top level.
vect_cse_slp_nodes seems to skip VEC_PERM_EXPR but not it's children, as such
it ends up visiting the same subgraphs multiple times. The graph in parest has
so many TWO_OPERAND nodes that essentially compilation never finishes.
I believe this function needs a visited node set.
example call graph:
#334 0x00000000018a1e14 in vect_cse_slp_nodes (bst_map=0x41627a0,
node=@0x4132f40: 0x3df2ec0) at
/opt/buildAgent/work/5c94c4ced6ebfcd0/gcc/tree-vect-slp.cc:6111
#335 0x00000000018a1e14 in vect_cse_slp_nodes (bst_map=0x41627a0,
node=@0x41321a0: 0x3df2b90) at
/opt/buildAgent/work/5c94c4ced6ebfcd0/gcc/tree-vect-slp.cc:6111
#336 0x00000000018a1e14 in vect_cse_slp_nodes (bst_map=0x41627a0,
node=@0x4130b00: 0x3df2860) at
/opt/buildAgent/work/5c94c4ced6ebfcd0/gcc/tree-vect-slp.cc:6111
#337 0x00000000018a1e14 in vect_cse_slp_nodes (bst_map=0x41627a0,
node=@0x41348a0: 0x3df2530) at
/opt/buildAgent/work/5c94c4ced6ebfcd0/gcc/tree-vect-slp.cc:6111
#338 0x00000000018a1e14 in vect_cse_slp_nodes (bst_map=0x41627a0,
node=@0x3b8b0d0: 0x3df2310) at
/opt/buildAgent/work/5c94c4ced6ebfcd0/gcc/tree-vect-slp.cc:6111
#339 0x00000000018a1e14 in vect_cse_slp_nodes (bst_map=0x41627a0,
node=@0x41348f0: 0x3dee928) at
/opt/buildAgent/work/5c94c4ced6ebfcd0/gcc/tree-vect-slp.cc:6111
#340 0x00000000018a1e14 in vect_cse_slp_nodes (bst_map=0x41627a0,
node=@0x4134500: 0x3dee460) at
/opt/buildAgent/work/5c94c4ced6ebfcd0/gcc/tree-vect-slp.cc:6111
#341 0x00000000018a1e14 in vect_cse_slp_nodes (bst_map=0x41627a0,
node=@0x3c14600: 0x3ded690) at
/opt/buildAgent/work/5c94c4ced6ebfcd0/gcc/tree-vect-slp.cc:6111
#342 0x00000000018a1e14 in vect_cse_slp_nodes (bst_map=0x41627a0,
node=@0x3ca75f0: 0x3de7910) at
/opt/buildAgent/work/5c94c4ced6ebfcd0/gcc/tree-vect-slp.cc:6111
#343 0x00000000018a1e14 in vect_cse_slp_nodes (bst_map=0x41627a0,
node=@0x3e28590: 0x3de8768) at
/opt/buildAgent/work/5c94c4ced6ebfcd0/gcc/tree-vect-slp.cc:6111
#344 0x00000000018a1e14 in vect_cse_slp_nodes (bst_map=0x41627a0,
node=@0x3c2e4b8: 0x3de7778) at
/opt/buildAgent/work/5c94c4ced6ebfcd0/gcc/tree-vect-slp.cc:6111
#345 0x00000000018a1e14 in vect_cse_slp_nodes (bst_map=0x41627a0,
node=@0x3da5e58: 0x3de7dd8) at
/opt/buildAgent/work/5c94c4ced6ebfcd0/gcc/tree-vect-slp.cc:6111
#346 0x00000000018a1e14 in vect_cse_slp_nodes (bst_map=0x41627a0,
node=@0x41d0770: 0x3de7f70) at
/opt/buildAgent/work/5c94c4ced6ebfcd0/gcc/tree-vect-slp.cc:6111
#347 0x00000000018a1f20 in vect_optimize_slp (vinfo=0x3e291c0) at
/opt/buildAgent/work/5c94c4ced6ebfcd0/gcc/tree-vect-slp.cc:6128
^ permalink raw reply [flat|nested] 7+ messages in thread
* [Bug middle-end/115597] [15 Regression] vectorizer takes 20+ h compiling 510.parest in SPECCPU2017 since g:46bb4ce4d30ab749d40f6f4cef6f1fb7c7813452
2024-06-23 8:24 [Bug middle-end/115597] New: [15 Regression] vectorizer takes 20+ h compiling 510.parest in SPECCPU2017 since g:46bb4ce4d30ab749d40f6f4cef6f1fb7c7813452 tnfchris at gcc dot gnu.org
@ 2024-06-23 8:49 ` rguenth at gcc dot gnu.org
2024-06-23 9:29 ` rguenth at gcc dot gnu.org
` (4 subsequent siblings)
5 siblings, 0 replies; 7+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-06-23 8:49 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115597
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Last reconfirmed| |2024-06-23
Target Milestone|--- |15.0
Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org
Status|UNCONFIRMED |ASSIGNED
Ever confirmed|0 |1
--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Mine.
^ permalink raw reply [flat|nested] 7+ messages in thread
* [Bug middle-end/115597] [15 Regression] vectorizer takes 20+ h compiling 510.parest in SPECCPU2017 since g:46bb4ce4d30ab749d40f6f4cef6f1fb7c7813452
2024-06-23 8:24 [Bug middle-end/115597] New: [15 Regression] vectorizer takes 20+ h compiling 510.parest in SPECCPU2017 since g:46bb4ce4d30ab749d40f6f4cef6f1fb7c7813452 tnfchris at gcc dot gnu.org
2024-06-23 8:49 ` [Bug middle-end/115597] " rguenth at gcc dot gnu.org
@ 2024-06-23 9:29 ` rguenth at gcc dot gnu.org
2024-06-23 10:29 ` tnfchris at gcc dot gnu.org
` (3 subsequent siblings)
5 siblings, 0 replies; 7+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-06-23 9:29 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115597
--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
Ah, I feared this would happen - this case seems to be because of a lot of
VEC_PERM nodes(?) which are not handled by the CSE process as well as the
two-operator nodes which lack SLP_TREE_SCALAR_STMTS (we'd need NULL elements
there, something I need to add anyway).
The bst_map deals as "visited" map, but nodes not handled there would need
a "visited" set (but as said above, the plan is to reduce that set to zero).
I'll see to reproduce to confirm. Usually a two-operator node shouldn't
be too bad since the next non-two-operator one will serve as 'visited' point
but in this graph we have several adjacent two-operator nodes without any
intermediate node handled by the bst-map processing code. I can't reproduce
with -Ofast -march=znver2 though.
The easiest way to remedy the situation is probably to allow VEC_PERM_EXPR
CSE when the node has SLP_TREE_SCALAR_STMTS as two-operator nodes have.
So I _think_ the following should fix this. I'm going to test it (on x86-64).
Can you check whether that fixes the issue?
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 9465d94de1a..212d5f97f7d 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -6085,7 +6085,6 @@ static void
vect_cse_slp_nodes (scalar_stmts_to_slp_tree_map_t *bst_map, slp_tree& node)
{
if (SLP_TREE_DEF_TYPE (node) == vect_internal_def
- && SLP_TREE_CODE (node) != VEC_PERM_EXPR
/* Besides some VEC_PERM_EXPR, two-operator nodes also
lack scalar stmts and thus CSE doesn't work via bst_map. Ideally
we'd have sth that works for all internal and external nodes. */
^ permalink raw reply [flat|nested] 7+ messages in thread
* [Bug middle-end/115597] [15 Regression] vectorizer takes 20+ h compiling 510.parest in SPECCPU2017 since g:46bb4ce4d30ab749d40f6f4cef6f1fb7c7813452
2024-06-23 8:24 [Bug middle-end/115597] New: [15 Regression] vectorizer takes 20+ h compiling 510.parest in SPECCPU2017 since g:46bb4ce4d30ab749d40f6f4cef6f1fb7c7813452 tnfchris at gcc dot gnu.org
2024-06-23 8:49 ` [Bug middle-end/115597] " rguenth at gcc dot gnu.org
2024-06-23 9:29 ` rguenth at gcc dot gnu.org
@ 2024-06-23 10:29 ` tnfchris at gcc dot gnu.org
2024-06-23 11:00 ` tnfchris at gcc dot gnu.org
` (2 subsequent siblings)
5 siblings, 0 replies; 7+ messages in thread
From: tnfchris at gcc dot gnu.org @ 2024-06-23 10:29 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115597
--- Comment #3 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
>
> Can you check whether that fixes the issue?
>
> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> index 9465d94de1a..212d5f97f7d 100644
> --- a/gcc/tree-vect-slp.cc
> +++ b/gcc/tree-vect-slp.cc
> @@ -6085,7 +6085,6 @@ static void
> vect_cse_slp_nodes (scalar_stmts_to_slp_tree_map_t *bst_map, slp_tree& node)
> {
> if (SLP_TREE_DEF_TYPE (node) == vect_internal_def
> - && SLP_TREE_CODE (node) != VEC_PERM_EXPR
> /* Besides some VEC_PERM_EXPR, two-operator nodes also
> lack scalar stmts and thus CSE doesn't work via bst_map. Ideally
> we'd have sth that works for all internal and external nodes. */
Yeah that seems to do it, can compile SPECFP again.
^ permalink raw reply [flat|nested] 7+ messages in thread
* [Bug middle-end/115597] [15 Regression] vectorizer takes 20+ h compiling 510.parest in SPECCPU2017 since g:46bb4ce4d30ab749d40f6f4cef6f1fb7c7813452
2024-06-23 8:24 [Bug middle-end/115597] New: [15 Regression] vectorizer takes 20+ h compiling 510.parest in SPECCPU2017 since g:46bb4ce4d30ab749d40f6f4cef6f1fb7c7813452 tnfchris at gcc dot gnu.org
` (2 preceding siblings ...)
2024-06-23 10:29 ` tnfchris at gcc dot gnu.org
@ 2024-06-23 11:00 ` tnfchris at gcc dot gnu.org
2024-06-23 12:18 ` cvs-commit at gcc dot gnu.org
2024-06-23 12:19 ` rguenth at gcc dot gnu.org
5 siblings, 0 replies; 7+ messages in thread
From: tnfchris at gcc dot gnu.org @ 2024-06-23 11:00 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115597
--- Comment #4 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #2)
> Ah, I feared this would happen - this case seems to be because of a lot of
> VEC_PERM nodes(?) which are not handled by the CSE process as well as the
> two-operator nodes which lack SLP_TREE_SCALAR_STMTS (we'd need NULL elements
> there, something I need to add anyway).
>
> The bst_map deals as "visited" map, but nodes not handled there would need
> a "visited" set (but as said above, the plan is to reduce that set to zero).
>
Ah I see, that makes sense.
> I'll see to reproduce to confirm. Usually a two-operator node shouldn't
> be too bad since the next non-two-operator one will serve as 'visited' point
> but in this graph we have several adjacent two-operator nodes without any
> intermediate node handled by the bst-map processing code. I can't reproduce
> with -Ofast -march=znver2 though.
>
Yeah I forgot to mention I could only reproduce it with LTO and a recent glibc.
Thanks for the fix!
^ permalink raw reply [flat|nested] 7+ messages in thread
* [Bug middle-end/115597] [15 Regression] vectorizer takes 20+ h compiling 510.parest in SPECCPU2017 since g:46bb4ce4d30ab749d40f6f4cef6f1fb7c7813452
2024-06-23 8:24 [Bug middle-end/115597] New: [15 Regression] vectorizer takes 20+ h compiling 510.parest in SPECCPU2017 since g:46bb4ce4d30ab749d40f6f4cef6f1fb7c7813452 tnfchris at gcc dot gnu.org
` (3 preceding siblings ...)
2024-06-23 11:00 ` tnfchris at gcc dot gnu.org
@ 2024-06-23 12:18 ` cvs-commit at gcc dot gnu.org
2024-06-23 12:19 ` rguenth at gcc dot gnu.org
5 siblings, 0 replies; 7+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2024-06-23 12:18 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115597
--- Comment #5 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:
https://gcc.gnu.org/g:2a345214fc332b6f0821edf394ff8802b768db1d
commit r15-1565-g2a345214fc332b6f0821edf394ff8802b768db1d
Author: Richard Biener <rguenther@suse.de>
Date: Sun Jun 23 11:26:39 2024 +0200
tree-optimization/115597 - allow CSE of two-operator VEC_PERM nodes
The following makes sure to always CSE when there's SLP_TREE_SCALAR_STMTS
as otherwise a chain of two-operator node operations can result in
exponential behavior of the CSE process as likely seen when building
510.parest on aarch64.
PR tree-optimization/115597
* tree-vect-slp.cc (vect_cse_slp_nodes): Allow to CSE
VEC_PERM nodes.
^ permalink raw reply [flat|nested] 7+ messages in thread
* [Bug middle-end/115597] [15 Regression] vectorizer takes 20+ h compiling 510.parest in SPECCPU2017 since g:46bb4ce4d30ab749d40f6f4cef6f1fb7c7813452
2024-06-23 8:24 [Bug middle-end/115597] New: [15 Regression] vectorizer takes 20+ h compiling 510.parest in SPECCPU2017 since g:46bb4ce4d30ab749d40f6f4cef6f1fb7c7813452 tnfchris at gcc dot gnu.org
` (4 preceding siblings ...)
2024-06-23 12:18 ` cvs-commit at gcc dot gnu.org
@ 2024-06-23 12:19 ` rguenth at gcc dot gnu.org
5 siblings, 0 replies; 7+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-06-23 12:19 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115597
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Resolution|--- |FIXED
Status|ASSIGNED |RESOLVED
--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
Fixed.
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2024-06-23 12:19 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-06-23 8:24 [Bug middle-end/115597] New: [15 Regression] vectorizer takes 20+ h compiling 510.parest in SPECCPU2017 since g:46bb4ce4d30ab749d40f6f4cef6f1fb7c7813452 tnfchris at gcc dot gnu.org
2024-06-23 8:49 ` [Bug middle-end/115597] " rguenth at gcc dot gnu.org
2024-06-23 9:29 ` rguenth at gcc dot gnu.org
2024-06-23 10:29 ` tnfchris at gcc dot gnu.org
2024-06-23 11:00 ` tnfchris at gcc dot gnu.org
2024-06-23 12:18 ` cvs-commit at gcc dot gnu.org
2024-06-23 12:19 ` rguenth at gcc dot gnu.org
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).