From: "law at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/101895] New: [11/12 Regression] SLP Vectorizer change pushes VEC_PERM_EXPR into bad location spoiling further optimization opportunities
Date: Fri, 13 Aug 2021 05:01:45 +0000
Message-ID: <bug-101895-4@http.gcc.gnu.org/bugzilla/>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101895

            Bug ID: 101895
           Summary: [11/12 Regression] SLP Vectorizer change pushes
                    VEC_PERM_EXPR into bad location spoiling further
                    optimization opportunities
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: law at gcc dot gnu.org
                CC: rguenth at gcc dot gnu.org
  Target Milestone: ---

Consider this code:

void foo(int * restrict a, int b, int *c) {
  a[0] = c[0]*b + a[0];
  a[1] = c[2]*b + a[1];
  a[2] = c[1]*b + a[2];
  a[3] = c[3]*b + a[3];
}

Prior to this commit:

commit 126ed72b9f48f8530b194532cc281fb761690435
Author: Richard Biener <rguenther@suse.de>
Date:   Wed Sep 30 17:08:01 2020 +0200

    optimize permutes in SLP, remove vect_attempt_slp_rearrange_stmts

    This introduces a permute optimization phase for SLP which is
    intended to cover the existing permute eliding for SLP reductions
    plus handling commonizing the easy cases.

    It currently uses graphds to compute a postorder on the reverse
    SLP graph and it handles all cases vect_attempt_slp_rearrange_stmts
    did (hopefully - I've adjusted most testcases that triggered it
    a few days ago).  It restricts itself to move around bijective
    permutations to simplify things for now, mainly around constant nodes.

    As a prerequesite it makes the SLP graph cyclic (ugh).  It looks
    like it would pay off to compute a PRE/POST order visit array
    once and elide all the recursive SLP graph walks and their
    visited hash-set.  At least for the time where we do not change
    the SLP graph during such walk.

    I do not like using graphds too much but at least I don't have to
    re-implement yet another RPO walk, so maybe it isn't too bad.

    It now computes permute placement during iteration and thus should
    get cycles more obviously correct.

    [ ... ]

GCC would generate this (x86_64 -O3 -march=native):

  vect__1.6_27 = VEC_PERM_EXPR <vect__1.5_25, vect__1.5_25, { 0, 2, 1, 3 }>;
  vect__2.7_29 = vect__1.6_27 * _28;
  _1 = *c_18(D);
  _2 = _1 * b_19(D);
  vectp.9_30 = a_20(D);
  vect__3.10_31 = MEM <vector(4) int> [(int *)vectp.9_30];
  vect__4.11_32 = vect__2.7_29 + vect__3.10_31;

This is good.  Note how the VEC_PERM_EXPR happens before the vector multiply
and how the vector multiply directly feeds the vector add.  On our target we
have a vector multiply-add which would be generated and all is good.

After the above change we generate this:

  vect__2.6_28 = vect__1.5_25 * _27;
  _29 = VEC_PERM_EXPR <vect__2.6_28, vect__2.6_28, { 0, 2, 1, 3 }>;
  _1 = *c_18(D);
  _2 = _1 * b_19(D);
  vectp.8_30 = a_20(D);
  vect__3.9_31 = MEM <vector(4) int> [(int *)vectp.8_30];
  vect__4.10_32 = _29 + vect__3.9_31;

Note how we have the vmul, then permute, then vadd.  This spoils our ability
to generate a vmadd.  This behavior is still seen on the trunk as well.

Conceptually it seems to me that having a permute at the start or end of a
chain of vector operations is better than moving the permute into the middle
of a chain of dependent vector operations.

We could probably fix this in the backend with some special patterns, but ISTM
that getting it right in SLP would be better.
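To make the two orderings concrete, here is a minimal scalar sketch in plain C. It is only an illustration, not GCC output: the function names permute_then_madd, mul_permute_add and check_orderings_agree are hypothetical, and the arrays stand in for the 4-lane vectors in the dumps. The lane selector { 0, 2, 1, 3 } is taken from the VEC_PERM_EXPR in the report. Both functions compute the same values as foo; they differ only in where the permute sits relative to the multiply.

```c
#include <assert.h>
#include <string.h>

enum { N = 4 };

/* Lane selector from the report's VEC_PERM_EXPR: { 0, 2, 1, 3 }. */
static const int perm[N] = { 0, 2, 1, 3 };

/* Ordering before the commit: permute the loaded c, then multiply and add.
   The multiply feeds the add directly, so a multiply-add can be formed.  */
void permute_then_madd(int *restrict a, int b, const int *c)
{
    int t[N];
    for (int i = 0; i < N; i++)
        t[i] = c[perm[i]];          /* permute first */
    for (int i = 0; i < N; i++)
        a[i] = t[i] * b + a[i];     /* multiply and add are adjacent */
}

/* Ordering after the commit: multiply, permute the product, then add.
   The permute now sits between the multiply and the add.  */
void mul_permute_add(int *restrict a, int b, const int *c)
{
    int m[N], t[N];
    for (int i = 0; i < N; i++)
        m[i] = c[i] * b;            /* multiply */
    for (int i = 0; i < N; i++)
        t[i] = m[perm[i]];          /* permute in the middle of the chain */
    for (int i = 0; i < N; i++)
        a[i] = t[i] + a[i];         /* add */
}

/* Hypothetical self-check: both orderings give identical results.  */
void check_orderings_agree(void)
{
    const int c[N] = { 1, 2, 3, 4 };
    int a1[N] = { 10, 20, 30, 40 };
    int a2[N] = { 10, 20, 30, 40 };

    permute_then_madd(a1, 5, c);
    mul_permute_add(a2, 5, c);
    assert(memcmp(a1, a2, sizeof a1) == 0);
}
```

Because b is the same in every lane, permuting c before the multiply or permuting the product afterwards yields the same result, which is why the placement is a free choice for the vectorizer and only the first form leaves the multiply next to the add.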
Thread overview: 14+ messages

2021-08-13  5:01 law at gcc dot gnu.org [this message]
2021-08-13  5:24 ` [Bug tree-optimization/101895] " pinskia at gcc dot gnu.org
2021-08-13  5:24 ` pinskia at gcc dot gnu.org
2021-08-16  7:49 ` rguenth at gcc dot gnu.org
2021-08-25  4:40 ` law at gcc dot gnu.org
2021-08-25  7:19 ` rguenth at gcc dot gnu.org
2022-03-11 23:51 ` roger at nextmovesoftware dot com
2022-03-15  9:06 ` cvs-commit at gcc dot gnu.org
2022-03-15 23:44 ` roger at nextmovesoftware dot com
2022-03-16  7:10 ` [Bug tree-optimization/101895] [11 " rguenth at gcc dot gnu.org
2022-03-16 23:13 ` law at gcc dot gnu.org
2022-03-17 23:18 ` law at gcc dot gnu.org
2022-04-21  7:50 ` rguenth at gcc dot gnu.org
2023-05-29 10:05 ` jakub at gcc dot gnu.org