public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug tree-optimization/101895] New: [11/12 Regression] SLP Vectorizer change pushes VEC_PERM_EXPR into bad location spoiling further optimization opportunities
@ 2021-08-13  5:01 law at gcc dot gnu.org
  2021-08-13  5:24 ` [Bug tree-optimization/101895] " pinskia at gcc dot gnu.org
                   ` (12 more replies)
  0 siblings, 13 replies; 14+ messages in thread
From: law at gcc dot gnu.org @ 2021-08-13  5:01 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101895

            Bug ID: 101895
           Summary: [11/12 Regression] SLP Vectorizer change pushes
                    VEC_PERM_EXPR into bad location spoiling further
                    optimization opportunities
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: law at gcc dot gnu.org
                CC: rguenth at gcc dot gnu.org
  Target Milestone: ---

Consider this code:


void foo(int * restrict a, int b, int *c) {
  a[0] = c[0]*b + a[0];
  a[1] = c[2]*b + a[1];
  a[2] = c[1]*b + a[2];
  a[3] = c[3]*b + a[3];
}

Prior to this commit:

commit 126ed72b9f48f8530b194532cc281fb761690435
Author: Richard Biener <rguenther@suse.de>
Date:   Wed Sep 30 17:08:01 2020 +0200

    optimize permutes in SLP, remove vect_attempt_slp_rearrange_stmts

    This introduces a permute optimization phase for SLP which is
    intended to cover the existing permute eliding for SLP reductions
    plus handling commonizing the easy cases.

    It currently uses graphds to compute a postorder on the reverse
    SLP graph and it handles all cases vect_attempt_slp_rearrange_stmts
    did (hopefully - I've adjusted most testcases that triggered it
    a few days ago).  It restricts itself to move around bijective
    permutations to simplify things for now, mainly around constant nodes.

    As a prerequesite it makes the SLP graph cyclic (ugh).  It looks
    like it would pay off to compute a PRE/POST order visit array
    once and elide all the recursive SLP graph walks and their
    visited hash-set.  At least for the time where we do not change
    the SLP graph during such walk.

    I do not like using graphds too much but at least I don't have to
    re-implement yet another RPO walk, so maybe it isn't too bad.

    It now computes permute placement during iteration and thus should
    get cycles more obviously correct.

[ ... ]

GCC would generate this (x86_64 -O3 -march=native):

  vect__1.6_27 = VEC_PERM_EXPR <vect__1.5_25, vect__1.5_25, { 0, 2, 1, 3 }>;
  vect__2.7_29 = vect__1.6_27 * _28;
  _1 = *c_18(D);
  _2 = _1 * b_19(D);
  vectp.9_30 = a_20(D);
  vect__3.10_31 = MEM <vector(4) int> [(int *)vectp.9_30];
  vect__4.11_32 = vect__2.7_29 + vect__3.10_31;



This is good.  Note how the VEC_PERM_EXPR happens before the vector multiply
and how the vector multiply directly feeds the vector add.  On our target we
have a vector multiply-add which would be generated and all is good.


After the above change we generate this:

  vect__2.6_28 = vect__1.5_25 * _27;
  _29 = VEC_PERM_EXPR <vect__2.6_28, vect__2.6_28, { 0, 2, 1, 3 }>;
  _1 = *c_18(D);
  _2 = _1 * b_19(D);
  vectp.8_30 = a_20(D);
  vect__3.9_31 = MEM <vector(4) int> [(int *)vectp.8_30];
  vect__4.10_32 = _29 + vect__3.9_31;


Note how we have the vmul, then permute, then vadd.  This spoils our ability to
generate a vmadd.  This behavior is still seen on the trunk as well.

Conceptually it seems to me that having a permute at the start or end of a
chain of vector operations is better than moving the permute into the middle of
a chain of dependent vector operations.

We could probably fix this in the backend with some special patterns, but ISTM
that getting it right in SLP would be better.

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2023-05-29 10:05 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-13  5:01 [Bug tree-optimization/101895] New: [11/12 Regression] SLP Vectorizer change pushes VEC_PERM_EXPR into bad location spoiling further optimization opportunities law at gcc dot gnu.org
2021-08-13  5:24 ` [Bug tree-optimization/101895] " pinskia at gcc dot gnu.org
2021-08-13  5:24 ` pinskia at gcc dot gnu.org
2021-08-16  7:49 ` rguenth at gcc dot gnu.org
2021-08-25  4:40 ` law at gcc dot gnu.org
2021-08-25  7:19 ` rguenth at gcc dot gnu.org
2022-03-11 23:51 ` roger at nextmovesoftware dot com
2022-03-15  9:06 ` cvs-commit at gcc dot gnu.org
2022-03-15 23:44 ` roger at nextmovesoftware dot com
2022-03-16  7:10 ` [Bug tree-optimization/101895] [11 " rguenth at gcc dot gnu.org
2022-03-16 23:13 ` law at gcc dot gnu.org
2022-03-17 23:18 ` law at gcc dot gnu.org
2022-04-21  7:50 ` rguenth at gcc dot gnu.org
2023-05-29 10:05 ` jakub at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).