[Bug tree-optimization/101895] New: [11/12 Regression] SLP Vectorizer change pushes VEC_PERM_EXPR into bad location spoiling further optimization opportunities

From: "law at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/101895] New: [11/12 Regression] SLP Vectorizer change pushes VEC_PERM_EXPR into bad location spoiling further optimization opportunities
Date: Fri, 13 Aug 2021 05:01:45 +0000
Message-ID: <bug-101895-4@http.gcc.gnu.org/bugzilla/>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101895

            Bug ID: 101895
           Summary: [11/12 Regression] SLP Vectorizer change pushes
                    VEC_PERM_EXPR into bad location spoiling further
                    optimization opportunities
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: law at gcc dot gnu.org
                CC: rguenth at gcc dot gnu.org
  Target Milestone: ---

Consider this code:


void foo(int * restrict a, int b, int *c) {
  a[0] = c[0]*b + a[0];
  a[1] = c[2]*b + a[1];
  a[2] = c[1]*b + a[2];
  a[3] = c[3]*b + a[3];
}

Prior to this commit:

commit 126ed72b9f48f8530b194532cc281fb761690435
Author: Richard Biener <rguenther@suse.de>
Date:   Wed Sep 30 17:08:01 2020 +0200

    optimize permutes in SLP, remove vect_attempt_slp_rearrange_stmts

    This introduces a permute optimization phase for SLP which is
    intended to cover the existing permute eliding for SLP reductions
    plus handling commonizing the easy cases.

    It currently uses graphds to compute a postorder on the reverse
    SLP graph and it handles all cases vect_attempt_slp_rearrange_stmts
    did (hopefully - I've adjusted most testcases that triggered it
    a few days ago).  It restricts itself to move around bijective
    permutations to simplify things for now, mainly around constant nodes.

    As a prerequisite it makes the SLP graph cyclic (ugh).  It looks
    like it would pay off to compute a PRE/POST order visit array
    once and elide all the recursive SLP graph walks and their
    visited hash-set.  At least for the time where we do not change
    the SLP graph during such walk.

    I do not like using graphds too much but at least I don't have to
    re-implement yet another RPO walk, so maybe it isn't too bad.

    It now computes permute placement during iteration and thus should
    get cycles more obviously correct.

[ ... ]

GCC would generate this (x86_64 -O3 -march=native):

  vect__1.6_27 = VEC_PERM_EXPR <vect__1.5_25, vect__1.5_25, { 0, 2, 1, 3 }>;
  vect__2.7_29 = vect__1.6_27 * _28;
  _1 = *c_18(D);
  _2 = _1 * b_19(D);
  vectp.9_30 = a_20(D);
  vect__3.10_31 = MEM <vector(4) int> [(int *)vectp.9_30];
  vect__4.11_32 = vect__2.7_29 + vect__3.10_31;



This is good.  Note how the VEC_PERM_EXPR happens before the vector multiply
and how the vector multiply directly feeds the vector add.  Our target has a
vector multiply-add instruction, which would be generated from that pair.


After the above change we generate this:

  vect__2.6_28 = vect__1.5_25 * _27;
  _29 = VEC_PERM_EXPR <vect__2.6_28, vect__2.6_28, { 0, 2, 1, 3 }>;
  _1 = *c_18(D);
  _2 = _1 * b_19(D);
  vectp.8_30 = a_20(D);
  vect__3.9_31 = MEM <vector(4) int> [(int *)vectp.8_30];
  vect__4.10_32 = _29 + vect__3.9_31;


Note how we now have the vmul, then the permute, then the vadd.  The permute
sits between the multiply and the add, which spoils our ability to generate a
vmadd.  This behavior is still seen on trunk.

Conceptually it seems to me that having a permute at the start or end of a
chain of vector operations is better than moving the permute into the middle of
a chain of dependent vector operations.
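
To make the placement question concrete, here is a minimal sketch in plain C
that models the two GIMPLE sequences above lane by lane.  It is not GCC code:
the helper names (permute, foo_perm_first, foo_perm_middle, orderings_agree)
and the sample inputs are made up for illustration.  It only shows that,
because b is the same in every lane, the { 0, 2, 1, 3 } permute gives the same
result whether it is applied before the multiply or between the multiply and
the add, so the vectorizer is free to pick the placement that keeps the
multiply and add adjacent.

```c
#include <assert.h>  /* assert() for callers that check orderings_agree() */
#include <string.h>

#define LANES 4

/* Scalar reference: the function from the report. */
static void foo_scalar(int *restrict a, int b, const int *c) {
  a[0] = c[0]*b + a[0];
  a[1] = c[2]*b + a[1];
  a[2] = c[1]*b + a[2];
  a[3] = c[3]*b + a[3];
}

/* Models VEC_PERM_EXPR <src, src, { 0, 2, 1, 3 }> on a 4-lane vector. */
static void permute(int *dst, const int *src) {
  static const int sel[LANES] = { 0, 2, 1, 3 };
  for (int i = 0; i < LANES; i++)
    dst[i] = src[sel[i]];
}

/* Old ordering: permute the loaded c, then multiply and add back to back. */
static void foo_perm_first(int *restrict a, int b, const int *c) {
  int p[LANES];
  permute(p, c);
  for (int i = 0; i < LANES; i++)
    a[i] = p[i]*b + a[i];   /* multiply feeds the add directly */
}

/* New ordering: multiply, permute the product, then add. */
static void foo_perm_middle(int *restrict a, int b, const int *c) {
  int m[LANES], p[LANES];
  for (int i = 0; i < LANES; i++)
    m[i] = c[i]*b;
  permute(p, m);            /* permute separates the multiply from the add */
  for (int i = 0; i < LANES; i++)
    a[i] = p[i] + a[i];
}

/* Returns 1 if all three variants agree on one sample input, else 0. */
static int orderings_agree(void) {
  const int c[LANES] = { 1, 2, 3, 4 };
  const int init[LANES] = { 10, 20, 30, 40 };
  int r0[LANES], r1[LANES], r2[LANES];

  memcpy(r0, init, sizeof r0);
  memcpy(r1, init, sizeof r1);
  memcpy(r2, init, sizeof r2);

  foo_scalar(r0, 5, c);
  foo_perm_first(r1, 5, c);
  foo_perm_middle(r2, 5, c);

  return memcmp(r0, r1, sizeof r0) == 0 && memcmp(r0, r2, sizeof r0) == 0;
}
```

Calling assert(orderings_agree()) from a small driver checks the equivalence
on the sample input.  The sketch says nothing about cost: which placement is
cheaper depends on the target, which is the point of this report.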

We could probably fix this in the backend with some special patterns, but ISTM
that getting it right in SLP would be better.
