public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/114413] New: BB SLP sub-graph merging fails to CSE nodes
From: rguenth at gcc dot gnu.org @ 2024-03-21 10:13 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114413

            Bug ID: 114413
           Summary: BB SLP sub-graph merging fails to CSE nodes
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rguenth at gcc dot gnu.org
  Target Milestone: ---

gcc.dg/vect/bb-slp-32.c shows that, while we now discover both the store
and the reduction as BB vectorization opportunities, we merge the SLP
instances into the same graph because they overlap but fail to unify nodes
within them. As a result both costing and code generation are off,
duplicating the load and the adds:

  <bb 2> [local count: 1073741824]:
  _36 = {a_12(D), b_15(D), b_15(D), a_12(D)};
  _30 = {a_12(D), b_15(D), b_15(D), a_12(D)};
  p_10 = __builtin_assume_aligned (p_9(D), 16);
  vectp.4_27 = p_10;
  vect__1.5_28 = MEM <vector(4) int> [(int *)vectp.4_27];
  vect__2.6_29 = vect__1.5_28 + { 1, 2, 3, 4 };
  vect_tem0_13.7_31 = vect__2.6_29 + _30;
  vectp.11_33 = p_10;
  vect__7.12_34 = MEM <vector(4) int> [(int *)vectp.11_33];
  vect__8.13_35 = vect__7.12_34 + { 1, 2, 3, 4 };
  vect_tem3_22.14_37 = vect__8.13_35 + _36;
  _1 = *p_10;
  _2 = _1 + 1;
  tem0_13 = _2 + a_12(D);
  _3 = MEM[(int *)p_10 + 4B];
  _4 = _3 + 2;
  tem1_16 = _4 + b_15(D);
  sum_17 = tem0_13 + tem1_16;
  _5 = MEM[(int *)p_10 + 8B];
  _6 = _5 + 3;
  tem2_19 = _6 + b_15(D);
  sum_20 = sum_17 + tem2_19;
  _7 = MEM[(int *)p_10 + 12B];
  _8 = _7 + 4;
  tem3_22 = _8 + a_12(D);
  _38 = VIEW_CONVERT_EXPR<vector(4) unsigned int>(vect_tem3_22.14_37);
  _39 = .REDUC_PLUS (_38);
  _40 = (int) _39;
  sum_23 = _40;
  MEM <vector(4) int> [(int *)&x] = vect_tem0_13.7_31;
  bar (&x);
  x ={v} {CLOBBER(eos)};

The vectorization should be profitable nevertheless; in the end we CSE
this to:

foo:
.LFB0:
        .cfi_startproc
        pushq   %rbx
        .cfi_def_cfa_offset 16
        .cfi_offset 3, -16
        movd    %edx, %xmm2
        movd    %esi, %xmm0
        movdqa  %xmm2, %xmm3
        punpckldq       %xmm0, %xmm3
        punpckldq       %xmm2, %xmm0
        subq    $16, %rsp
        .cfi_def_cfa_offset 32
        movdqa  .LC0(%rip), %xmm1
        paddd   (%rdi), %xmm1
        punpcklqdq      %xmm3, %xmm0
        movq    %rsp, %rdi
        paddd   %xmm0, %xmm1
        movdqa  %xmm1, %xmm0
        movaps  %xmm1, (%rsp)
        psrldq  $8, %xmm0
        paddd   %xmm1, %xmm0
        movdqa  %xmm0, %xmm2
        psrldq  $4, %xmm2
        paddd   %xmm2, %xmm0
        movd    %xmm0, %ebx
        call    bar
        addq    $16, %rsp
        .cfi_def_cfa_offset 16
        movl    %ebx, %eax
        popq    %rbx
        .cfi_def_cfa_offset 8
        ret
* [Bug tree-optimization/114413] BB SLP sub-graph merging fails to CSE nodes
From: rguenth at gcc dot gnu.org @ 2024-06-19 10:55 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114413

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                       |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                   |ASSIGNED
     Ever confirmed|0                             |1
           Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org
   Last reconfirmed|                              |2024-06-19
* [Bug tree-optimization/114413] BB SLP sub-graph merging fails to CSE nodes
From: cvs-commit at gcc dot gnu.org @ 2024-06-20 6:48 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114413

--- Comment #1 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:46bb4ce4d30ab749d40f6f4cef6f1fb7c7813452

commit r15-1467-g46bb4ce4d30ab749d40f6f4cef6f1fb7c7813452
Author: Richard Biener <rguenther@suse.de>
Date:   Wed Jun 19 12:57:27 2024 +0200

    tree-optimization/114413 - SLP CSE after permute optimization

    We currently fail to re-CSE SLP nodes after optimizing permutes
    which results in off cost estimates.  For gcc.dg/vect/bb-slp-32.c
    this shows in not re-using the SLP node with the load and arithmetic
    for both the store and the reduction.  The following implements
    CSE by re-bst-mapping nodes as finalization part of vect_optimize_slp.

    I've tried to make the CSE part of permute materialization but it
    isn't a very good fit there.  I've not bothered to implement something
    more complete, also handling external defs or defs without
    SLP_TREE_SCALAR_STMTS.

    I realize this might result in more BB SLP which in turn might slow
    down code given costing for BB SLP is difficult (even that we now
    vectorize gcc.dg/vect/bb-slp-32.c on x86_64 might be not a good idea).
    This is nevertheless feeding more accurate info to costing which is
    good.

            PR tree-optimization/114413
            * tree-vect-slp.cc (release_scalar_stmts_to_slp_tree_map): New
            function, split out from ...
            (vect_analyze_slp): ... here.  Call it.
            (vect_cse_slp_nodes): New function.
            (vect_optimize_slp): Call it.

            * gcc.dg/vect/bb-slp-32.c: Expect CSE and vectorization on x86.
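The re-CSE step described in the commit message can be illustrated with a small Python sketch. This is not the GCC implementation: `Node`, `cse_nodes`, `count_nodes` and `build_instance` are hypothetical names, and the scalar statement names are simplified from the dump in the original report. The sketch assumes, as the commit message suggests, that nodes are keyed only on their lane-wise scalar statements, and that nodes without scalar statements (external defs, constants) are left alone.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # eq=False keeps identity semantics, so nodes stay hashable
class Node:
    op: str
    scalar_stmts: tuple = ()  # lane-wise scalar stmts; empty for external defs
    children: list = field(default_factory=list)


def cse_nodes(roots):
    """Make nodes that cover the same scalar stmts share one canonical node."""
    canonical = {}  # scalar_stmts tuple -> first node seen with that key
    memo = {}       # node -> canonical replacement, so each node is visited once

    def visit(node):
        if node in memo:
            return memo[node]
        # Rewrite children first so sharing propagates bottom-up.
        node.children = [visit(child) for child in node.children]
        if node.scalar_stmts:
            result = canonical.setdefault(node.scalar_stmts, node)
        else:
            result = node  # assumption: nodes without scalar stmts are not CSEd
        memo[node] = result
        return result

    return [visit(root) for root in roots]


def count_nodes(roots):
    """Count distinct node objects reachable from the given roots."""
    seen = set()
    stack = list(roots)
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(node.children)
    return len(seen)


def build_instance(root_op, root_stmts):
    """Build one SLP instance with its own private copy of load and adds."""
    load = Node("load", ("_1", "_3", "_5", "_7"))
    add_const = Node("add", ("_2", "_4", "_6", "_8"), [load, Node("const")])
    add_ext = Node("add", ("tem0", "tem1", "tem2", "tem3"),
                   [add_const, Node("external")])
    return Node(root_op, root_stmts, [add_ext])


# Two overlapping instances, as in the report: a store and a reduction.
roots = [
    build_instance("store", ("x[0]", "x[1]", "x[2]", "x[3]")),
    build_instance("reduc_plus", ("sum_23",)),
]
nodes_before = count_nodes(roots)
roots = cse_nodes(roots)
nodes_after = count_nodes(roots)
print(nodes_before, nodes_after)  # prints "12 7"
```

After the walk both roots point at the same load-and-add subtree, so a cost model that visits each node once would count the load and the two adds a single time instead of twice.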
* [Bug tree-optimization/114413] BB SLP sub-graph merging fails to CSE nodes
From: rguenth at gcc dot gnu.org @ 2024-06-20 7:05 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114413

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |15.0
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
This should be largely fixed now. The missing piece that might be important
in some cases is CSE of permutes (or two-operator nodes) and of extern CTORs.
end of thread

Thread overview: 4+ messages
2024-03-21 10:13 [Bug tree-optimization/114413] New: BB SLP sub-graph merging fails to CSE nodes rguenth at gcc dot gnu.org
2024-06-19 10:55 ` [Bug tree-optimization/114413] " rguenth at gcc dot gnu.org
2024-06-20  6:48 ` cvs-commit at gcc dot gnu.org
2024-06-20  7:05 ` rguenth at gcc dot gnu.org