[Bug tree-optimization/114413] New: BB SLP sub-graph merging fails to CSE nodes

public inbox for gcc-bugs@sourceware.org

* [Bug tree-optimization/114413] New: BB SLP sub-graph merging fails to CSE nodes
@ 2024-03-21 10:13 rguenth at gcc dot gnu.org
From: rguenth at gcc dot gnu.org @ 2024-03-21 10:13 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114413

            Bug ID: 114413
           Summary: BB SLP sub-graph merging fails to CSE nodes
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rguenth at gcc dot gnu.org
  Target Milestone: ---

The testcase gcc.dg/vect/bb-slp-32.c shows that, while we now discover both
the store and the reduction as BB vectorization opportunities, we merge the
SLP instances into the same graph because they overlap but fail to unify
nodes within them.  As a result both costing and code generation are off,
duplicating the load and the adds:

  <bb 2> [local count: 1073741824]:
  _36 = {a_12(D), b_15(D), b_15(D), a_12(D)};
  _30 = {a_12(D), b_15(D), b_15(D), a_12(D)};
  p_10 = __builtin_assume_aligned (p_9(D), 16);
  vectp.4_27 = p_10;
  vect__1.5_28 = MEM <vector(4) int> [(int *)vectp.4_27];
  vect__2.6_29 = vect__1.5_28 + { 1, 2, 3, 4 };
  vect_tem0_13.7_31 = vect__2.6_29 + _30;
  vectp.11_33 = p_10;
  vect__7.12_34 = MEM <vector(4) int> [(int *)vectp.11_33];
  vect__8.13_35 = vect__7.12_34 + { 1, 2, 3, 4 };
  vect_tem3_22.14_37 = vect__8.13_35 + _36;
  _1 = *p_10;
  _2 = _1 + 1;
  tem0_13 = _2 + a_12(D);
  _3 = MEM[(int *)p_10 + 4B];
  _4 = _3 + 2;
  tem1_16 = _4 + b_15(D);
  sum_17 = tem0_13 + tem1_16;
  _5 = MEM[(int *)p_10 + 8B];
  _6 = _5 + 3;
  tem2_19 = _6 + b_15(D);
  sum_20 = sum_17 + tem2_19;
  _7 = MEM[(int *)p_10 + 12B];
  _8 = _7 + 4;
  tem3_22 = _8 + a_12(D);
  _38 = VIEW_CONVERT_EXPR<vector(4) unsigned int>(vect_tem3_22.14_37);
  _39 = .REDUC_PLUS (_38);
  _40 = (int) _39;
  sum_23 = _40;
  MEM <vector(4) int> [(int *)&x] = vect_tem0_13.7_31;
  bar (&x);
  x ={v} {CLOBBER(eos)};

The vectorization should still be profitable, though, as we CSE this to

foo:
.LFB0:
        .cfi_startproc
        pushq   %rbx
        .cfi_def_cfa_offset 16
        .cfi_offset 3, -16
        movd    %edx, %xmm2
        movd    %esi, %xmm0
        movdqa  %xmm2, %xmm3
        punpckldq       %xmm0, %xmm3
        punpckldq       %xmm2, %xmm0
        subq    $16, %rsp
        .cfi_def_cfa_offset 32
        movdqa  .LC0(%rip), %xmm1
        paddd   (%rdi), %xmm1
        punpcklqdq      %xmm3, %xmm0
        movq    %rsp, %rdi
        paddd   %xmm0, %xmm1
        movdqa  %xmm1, %xmm0
        movaps  %xmm1, (%rsp)
        psrldq  $8, %xmm0
        paddd   %xmm1, %xmm0
        movdqa  %xmm0, %xmm2
        psrldq  $4, %xmm2
        paddd   %xmm2, %xmm0
        movd    %xmm0, %ebx
        call    bar
        addq    $16, %rsp
        .cfi_def_cfa_offset 16
        movl    %ebx, %eax
        popq    %rbx
        .cfi_def_cfa_offset 8
        ret

in the end.
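
For readers without the testsuite at hand, the scalar source behind the dump
above can be reconstructed roughly as follows.  This is a sketch derived from
the GIMPLE, not the verbatim contents of gcc.dg/vect/bb-slp-32.c; the local
definition of bar, the seen[] array and check_foo are hypothetical additions
so that the example runs on its own (the assembly above calls an external bar).

```c
#include <assert.h>

/* Hypothetical stand-in for the external bar: records what foo stored
   to x[] so the result can be inspected afterwards.  */
static int seen[4];

static void
bar (int *x)
{
  for (int i = 0; i < 4; ++i)
    seen[i] = x[i];
}

/* Reconstructed from the GIMPLE dump: four lanes of p[i] + C + {a,b,b,a}
   feed both a store group (x[0..3]) and a plus reduction (sum).  */
int
foo (int *p, int a, int b)
{
  int x[4];
  int tem0, tem1, tem2, tem3;
  int sum = 0;
  p = __builtin_assume_aligned (p, 16);
  tem0 = p[0] + 1 + a;
  sum += tem0;
  tem1 = p[1] + 2 + b;
  sum += tem1;
  tem2 = p[2] + 3 + b;
  sum += tem2;
  tem3 = p[3] + 4 + a;
  sum += tem3;
  x[0] = tem0;
  x[1] = tem1;
  x[2] = tem2;
  x[3] = tem3;
  bar (x);
  return sum;
}

/* Usage: the stored lanes and the reduction must agree, whether or not
   the block was vectorized.  The input is 16-byte aligned as promised
   to __builtin_assume_aligned.  */
void
check_foo (void)
{
  int p[4] __attribute__ ((aligned (16))) = { 10, 20, 30, 40 };
  int sum = foo (p, 100, 200);
  assert (seen[0] == 111 && seen[1] == 222);
  assert (seen[2] == 233 && seen[3] == 144);
  assert (sum == 710);
}
```

Both the store and the reduction consume the same four tem values, which is
why the two SLP instances overlap and ought to share the load and add nodes.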


* [Bug tree-optimization/114413] BB SLP sub-graph merging fails to CSE nodes
From: rguenth at gcc dot gnu.org @ 2024-06-19 10:55 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114413

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |ASSIGNED
     Ever confirmed|0                           |1
           Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org
   Last reconfirmed|                            |2024-06-19


* [Bug tree-optimization/114413] BB SLP sub-graph merging fails to CSE nodes
From: cvs-commit at gcc dot gnu.org @ 2024-06-20  6:48 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114413

--- Comment #1 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:46bb4ce4d30ab749d40f6f4cef6f1fb7c7813452

commit r15-1467-g46bb4ce4d30ab749d40f6f4cef6f1fb7c7813452
Author: Richard Biener <rguenther@suse.de>
Date:   Wed Jun 19 12:57:27 2024 +0200

    tree-optimization/114413 - SLP CSE after permute optimization

    We currently fail to re-CSE SLP nodes after optimizing permutes,
    which results in inaccurate cost estimates.  For gcc.dg/vect/bb-slp-32.c
    this shows up as not re-using the SLP node with the load and arithmetic
    for both the store and the reduction.  The following implements
    CSE by re-bst-mapping nodes as a finalization step of vect_optimize_slp.

    I've tried to make the CSE part of permute materialization but it
    isn't a very good fit there.  I've not bothered to implement something
    more complete, also handling external defs or defs without
    SLP_TREE_SCALAR_STMTS.

    I realize this might result in more BB SLP, which in turn might slow
    down code given that costing for BB SLP is difficult (even the fact that
    we now vectorize gcc.dg/vect/bb-slp-32.c on x86_64 might not be a good
    idea).  This nevertheless feeds more accurate information to costing,
    which is good.

            PR tree-optimization/114413
            * tree-vect-slp.cc (release_scalar_stmts_to_slp_tree_map):
            New function, split out from ...
            (vect_analyze_slp): ... here.  Call it.
            (vect_cse_slp_nodes): New function.
            (vect_optimize_slp): Call it.

            * gcc.dg/vect/bb-slp-32.c: Expect CSE and vectorization on x86.
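
The idea of CSE by re-mapping nodes on their scalar statements can be
illustrated with a toy model.  The sketch below is not GCC's implementation:
toy_node, toy_map, toy_cse and toy_demo are hypothetical names, a linear scan
stands in for the hash map, and node equality is assumed to be decided by the
lane statement ids alone (playing the role of SLP_TREE_SCALAR_STMTS).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define LANES 4
#define MAX_KIDS 2
#define MAX_NODES 16

/* Toy stand-in for an SLP node: one scalar statement id per lane plus
   links to child nodes.  */
struct toy_node
{
  int stmts[LANES];
  struct toy_node *kids[MAX_KIDS];
  int nkids;
};

/* Canonical nodes seen so far, keyed on their lane statement ids.  */
struct toy_map
{
  struct toy_node *canon[MAX_NODES];
  int n;
};

static struct toy_node *
toy_lookup (struct toy_map *map, const struct toy_node *node)
{
  for (int i = 0; i < map->n; ++i)
    if (memcmp (map->canon[i]->stmts, node->stmts, sizeof node->stmts) == 0)
      return map->canon[i];
  return NULL;
}

/* Post-order walk: unify the children first, then either redirect *slot
   to an earlier node covering the same statements or register this node
   as the canonical one.  */
void
toy_cse (struct toy_map *map, struct toy_node **slot)
{
  struct toy_node *node = *slot;
  for (int i = 0; i < node->nkids; ++i)
    toy_cse (map, &node->kids[i]);
  struct toy_node *leader = toy_lookup (map, node);
  if (leader)
    *slot = leader;
  else if (map->n < MAX_NODES)
    map->canon[map->n++] = node;
}

/* Usage: the store and the reduction instance each carry a private copy
   of the same load -> add chain.  The ids mirror the SSA names _1.._8 of
   the dump in the report.  Returns the number of canonical nodes left.  */
int
toy_demo (void)
{
  struct toy_node load1 = { { 1, 3, 5, 7 }, { NULL }, 0 };
  struct toy_node load2 = load1;
  struct toy_node add1 = { { 2, 4, 6, 8 }, { &load1 }, 1 };
  struct toy_node add2 = { { 2, 4, 6, 8 }, { &load2 }, 1 };
  struct toy_node *store_root = &add1;
  struct toy_node *reduc_root = &add2;
  struct toy_map map = { { NULL }, 0 };

  toy_cse (&map, &store_root);
  toy_cse (&map, &reduc_root);

  assert (store_root == reduc_root);
  assert (store_root->kids[0] == &load1);
  return map.n;
}
```

After the walk both roots point at the same add node, so a cost model that
visits each distinct node once would count the load and the add a single time.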


* [Bug tree-optimization/114413] BB SLP sub-graph merging fails to CSE nodes
From: rguenth at gcc dot gnu.org @ 2024-06-20  7:05 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114413

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |15.0
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
This should be largely fixed now.  The missing piece that might be important
in some cases is CSE of permutes (or two-operator nodes) and of external CTORs.
