[Bug target/96166] [10/11 Regression] -O3/-ftree-slp-vectorize turns ROL into a mess

public inbox for gcc-bugs@sourceware.org

From: "jakub at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/96166] [10/11 Regression] -O3/-ftree-slp-vectorize turns ROL into a mess
Date: Fri, 12 Feb 2021 10:11:12 +0000
Message-ID: <bug-96166-4-yWU8y0iOjS@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-96166-4@http.gcc.gnu.org/bugzilla/>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96166

--- Comment #4 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Note that the rotate isn't created by the bswap pass; it isn't really a
byteswap, just a swap of the two halves of the long long.
It comes from expansion and combine.  Expanding
  _9 = (int) _3;
  _10 = BIT_FIELD_REF <_3, 32, 32>;
  MEM[(int &)&y] = _10;
  MEM[(int &)&y + 4] = _9;
  _4 = MEM <long unsigned int> [(char * {ref-all})&y];
  MEM <long unsigned int> [(char * {ref-all})x_2(D)] = _4;
results in
(insn 7 6 8 (parallel [
            (set (reg:DI 88)
                (ashiftrt:DI (reg:DI 82 [ _3 ])
                    (const_int 32 [0x20])))
            (clobber (reg:CC 17 flags))
        ]) "pr96166.c":4:5 -1
     (nil))

(insn 8 7 9 (set (reg:DI 89)
        (zero_extend:DI (subreg:SI (reg:DI 88) 0))) "pr96166.c":4:5 -1
     (nil))

(insn 9 8 10 (set (reg:DI 91)
        (const_int -4294967296 [0xffffffff00000000])) "pr96166.c":4:5 -1
     (nil))

(insn 10 9 11 (parallel [
            (set (reg:DI 90)
                (and:DI (reg/v:DI 86 [ y ])
                    (reg:DI 91)))
            (clobber (reg:CC 17 flags))
        ]) "pr96166.c":4:5 -1
     (nil))

(insn 11 10 12 (parallel [
            (set (reg:DI 92)
                (ior:DI (reg:DI 90)
                    (reg:DI 89)))
            (clobber (reg:CC 17 flags))
        ]) "pr96166.c":4:5 -1
     (nil))

(insn 12 11 0 (set (reg/v:DI 86 [ y ])
        (reg:DI 92)) "pr96166.c":4:5 -1
     (nil))

(insn 13 12 14 (set (reg:DI 93)
        (zero_extend:DI (subreg:SI (reg:DI 82 [ _3 ]) 0))) "pr96166.c":5:5 -1
     (nil))

(insn 14 13 15 (parallel [
            (set (reg:DI 94)
                (ashift:DI (reg:DI 93)
                    (const_int 32 [0x20])))
            (clobber (reg:CC 17 flags))
        ]) "pr96166.c":5:5 -1
     (nil))

(insn 15 14 16 (set (reg:DI 95)
        (zero_extend:DI (subreg:SI (reg/v:DI 86 [ y ]) 0))) "pr96166.c":5:5 -1
     (nil))

(insn 16 15 17 (parallel [
            (set (reg:DI 96)
                (ior:DI (reg:DI 95)
                    (reg:DI 94)))
            (clobber (reg:CC 17 flags))
        ]) "pr96166.c":5:5 -1
     (nil))

(insn 17 16 0 (set (reg/v:DI 86 [ y ])
        (reg:DI 96)) "pr96166.c":5:5 -1
     (nil))

(insn 18 17 0 (set (mem:DI (reg/v/f:DI 87 [ x ]) [0 MEM <long unsigned int> [(char * {ref-all})x_2(D)]+0 S8 A8])
        (reg/v:DI 86 [ y ])) "pr96166.c":13:19 -1
     (nil))

(I must say I'm surprised y hasn't been forced onto the stack even though it is
stored in parts), and then combine matches a rotate out of that.
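
To make it easier to see why combine can recognize a rotate here, the
following is a rough C model of insns 7 through 17 from the dump above. It is
only a sketch: the function names are hypothetical, each local variable stands
in for the pseudo register named in its comment, and the ashiftrt in insn 7 is
modelled with a logical shift, which is an assumption that is safe only
because insn 8 keeps just the low 32 bits of the result.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of insns 7-17; v3 plays the role of _3 (reg 82)
   and y the role of the variable y (reg 86). */
static uint64_t expand_model(uint64_t v3, uint64_t y)
{
    uint64_t r88 = v3 >> 32;               /* insn 7: shift right by 32 (see lead-in) */
    uint64_t r89 = (uint32_t)r88;          /* insn 8: zero_extend of the SImode subreg */
    uint64_t r91 = 0xffffffff00000000ULL;  /* insn 9: mask constant */
    uint64_t r90 = y & r91;                /* insn 10: keep upper half of y */
    y = r90 | r89;                         /* insns 11-12: low half of y = high half of _3 */

    uint64_t r93 = (uint32_t)v3;           /* insn 13: zero_extend low half of _3 */
    uint64_t r94 = r93 << 32;              /* insn 14: move it to the upper half */
    uint64_t r95 = (uint32_t)y;            /* insn 15: zero_extend low half of y */
    y = r95 | r94;                         /* insns 16-17: combine the two halves */
    return y;
}

/* Reference: rotate a 64-bit value by 32 bits. */
static uint64_t rotate64_by_32(uint64_t v)
{
    return (v << 32) | (v >> 32);
}

/* Small self-check; the initial value of y does not matter. */
static void check_expand_model(void)
{
    assert(expand_model(0x1122334455667788ULL, 0) ==
           rotate64_by_32(0x1122334455667788ULL));
}
```

Whatever y holds initially, both of its halves are overwritten, so the result
depends only on _3 and equals _3 rotated by 32 bits.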
With SLP vectorization, on the other hand, we end up with:
   _9 = (int) _3;
   _10 = BIT_FIELD_REF <_3, 32, 32>;
-  MEM[(int &)&y] = _10;
-  MEM[(int &)&y + 4] = _9;
+  _11 = {_10, _9};
+  MEM <vector(2) int> [(int &)&y] = _11;
   _4 = MEM <long unsigned int> [(char * {ref-all})&y];
   MEM <long unsigned int> [(char * {ref-all})x_2(D)] = _4;
and we aren't able to undo the vectorization during the RTL optimizations.
I'm surprised the costs suggest such vectorization is beneficial; constructing
a vector just to store it into memory seems more expensive than just doing two
stores, doesn't it?
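
For reference, the source-level shape of the GIMPLE above can be sketched in C
roughly as follows. This is not the testcase from the PR; the function names
are hypothetical, and memcpy stands in for the two 32-bit stores and the
64-bit load. The equivalence with a rotate assumes a little-endian target such
as x86-64.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: store the two 32-bit halves of v in swapped order
   and read the result back as one 64-bit value. */
static uint64_t swap_halves_via_stores(uint64_t v)
{
    uint32_t lo = (uint32_t)v;          /* _9  = (int) _3 */
    uint32_t hi = (uint32_t)(v >> 32);  /* _10 = BIT_FIELD_REF <_3, 32, 32> */
    unsigned char y[8];
    uint64_t out;

    memcpy(y, &hi, sizeof hi);          /* MEM[(int &)&y] = _10 */
    memcpy(y + 4, &lo, sizeof lo);      /* MEM[(int &)&y + 4] = _9 */
    memcpy(&out, y, sizeof out);        /* _4 = MEM <long unsigned int> [&y] */
    return out;
}

/* What the scalar path is ultimately matched to: a rotate by 32. */
static uint64_t rotate_by_32(uint64_t v)
{
    return (v << 32) | (v >> 32);
}

/* Self-check; holds on little-endian targets only (assumption). */
static void check_swap_halves(void)
{
    assert(swap_halves_via_stores(0x1122334455667788ULL) ==
           rotate_by_32(0x1122334455667788ULL));
}
```

On a little-endian target the two functions return the same value, which is
the single rotate one would hope to get back for the two scalar stores.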
