From: "jakub at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/96166] [10/11 Regression] -O3/-ftree-slp-vectorize turns ROL into a mess
Date: Fri, 12 Feb 2021 10:11:12 +0000
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 10.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Status: NEW
X-Bugzilla-Priority: P2
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 10.3

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96166

--- Comment #4 from Jakub Jelinek ---
Note that the rotate isn't something created by the bswap pass; it isn't
really a byteswap, just a swap of the two halves of the long long.
It comes from expansion and combine.
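To make the distinction concrete: exchanging the two 32-bit halves of a
64-bit value is the same operation as rotating it by 32 bits, whereas a
byteswap reverses all eight bytes. A minimal sketch follows; the helper
names swap_halves, rotl64 and bswap64_ref are hypothetical and are not
taken from the PR testcase.

```c
#include <stdint.h>

/* Hypothetical helper: exchange the high and low 32-bit halves. */
static uint64_t swap_halves(uint64_t v)
{
    uint32_t lo = (uint32_t)v;
    uint32_t hi = (uint32_t)(v >> 32);
    return ((uint64_t)lo << 32) | hi;
}

/* Hypothetical helper: rotate left by n bits, valid for 0 < n < 64. */
static uint64_t rotl64(uint64_t v, unsigned n)
{
    return (v << n) | (v >> (64 - n));
}

/* Hypothetical reference byteswap: reverses the order of all 8 bytes. */
static uint64_t bswap64_ref(uint64_t v)
{
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        r = (r << 8) | (v & 0xff);
        v >>= 8;
    }
    return r;
}
```

For any input, swap_halves(v) and rotl64(v, 32) agree, while bswap64_ref(v)
generally gives a different result.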
Expanding
  _9 = (int) _3;
  _10 = BIT_FIELD_REF <_3, 32, 32>;
  MEM[(int &)&y] = _10;
  MEM[(int &)&y + 4] = _9;
  _4 = MEM [(char * {ref-all})&y];
  MEM [(char * {ref-all})x_2(D)] = _4;
results in
(insn 7 6 8 (parallel [
            (set (reg:DI 88)
                (ashiftrt:DI (reg:DI 82 [ _3 ])
                    (const_int 32 [0x20])))
            (clobber (reg:CC 17 flags))
        ]) "pr96166.c":4:5 -1
     (nil))
(insn 8 7 9 (set (reg:DI 89)
        (zero_extend:DI (subreg:SI (reg:DI 88) 0))) "pr96166.c":4:5 -1
     (nil))
(insn 9 8 10 (set (reg:DI 91)
        (const_int -4294967296 [0xffffffff00000000])) "pr96166.c":4:5 -1
     (nil))
(insn 10 9 11 (parallel [
            (set (reg:DI 90)
                (and:DI (reg/v:DI 86 [ y ])
                    (reg:DI 91)))
            (clobber (reg:CC 17 flags))
        ]) "pr96166.c":4:5 -1
     (nil))
(insn 11 10 12 (parallel [
            (set (reg:DI 92)
                (ior:DI (reg:DI 90)
                    (reg:DI 89)))
            (clobber (reg:CC 17 flags))
        ]) "pr96166.c":4:5 -1
     (nil))
(insn 12 11 0 (set (reg/v:DI 86 [ y ])
        (reg:DI 92)) "pr96166.c":4:5 -1
     (nil))
(insn 13 12 14 (set (reg:DI 93)
        (zero_extend:DI (subreg:SI (reg:DI 82 [ _3 ]) 0))) "pr96166.c":5:5 -1
     (nil))
(insn 14 13 15 (parallel [
            (set (reg:DI 94)
                (ashift:DI (reg:DI 93)
                    (const_int 32 [0x20])))
            (clobber (reg:CC 17 flags))
        ]) "pr96166.c":5:5 -1
     (nil))
(insn 15 14 16 (set (reg:DI 95)
        (zero_extend:DI (subreg:SI (reg/v:DI 86 [ y ]) 0))) "pr96166.c":5:5 -1
     (nil))
(insn 16 15 17 (parallel [
            (set (reg:DI 96)
                (ior:DI (reg:DI 95)
                    (reg:DI 94)))
            (clobber (reg:CC 17 flags))
        ]) "pr96166.c":5:5 -1
     (nil))
(insn 17 16 0 (set (reg/v:DI 86 [ y ])
        (reg:DI 96)) "pr96166.c":5:5 -1
     (nil))
(insn 18 17 0 (set (mem:DI (reg/v/f:DI 87 [ x ]) [0 MEM [(char * {ref-all})x_2(D)]+0 S8 A8])
        (reg/v:DI 86 [ y ])) "pr96166.c":13:19 -1
     (nil))
(I must say I'm surprised y hasn't been forced into stack even when it is
stored in parts) and then combine matches a rotate out of that.
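The arithmetic performed by insns 7 through 17 can be followed in plain C.
This is only a rough model of the dump above, not GCC code: the function
name expand_model is hypothetical, x stands for _3 (reg 82), y for the
pseudo holding y (reg 86), and the local names echo the pseudo register
numbers. It shows why combine can recognize a rotate by 32: the final value
no longer depends on the incoming y.

```c
#include <stdint.h>

/* Hypothetical model of insns 7-17 from the RTL dump. */
static uint64_t expand_model(uint64_t x, uint64_t y)
{
    /* insn 7: ashiftrt by 32. Only the low 32 bits of the result are
       used afterwards, so a logical shift gives the same answer here. */
    uint64_t r88 = x >> 32;
    /* insn 8: zero_extend of the low 32 bits of r88. */
    uint64_t r89 = (uint32_t)r88;
    /* insn 9: mask that keeps the high half. */
    uint64_t r91 = 0xffffffff00000000ULL;
    /* insn 10: clear the low half of y. */
    uint64_t r90 = y & r91;
    /* insns 11-12: low half of y becomes the high half of x. */
    y = r90 | r89;
    /* insn 13: zero_extend of the low 32 bits of x. */
    uint64_t r93 = (uint32_t)x;
    /* insn 14: move it into the high half. */
    uint64_t r94 = r93 << 32;
    /* insn 15: keep only the low half of y. */
    uint64_t r95 = (uint32_t)y;
    /* insns 16-17: high half of y becomes the low half of x. */
    y = r95 | r94;
    return y;
}
```

Whatever y holds on entry, the result is x with its two 32-bit halves
exchanged, which is x rotated by 32.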
With SLP vectorization, by contrast, we end up with:
   _9 = (int) _3;
   _10 = BIT_FIELD_REF <_3, 32, 32>;
-  MEM[(int &)&y] = _10;
-  MEM[(int &)&y + 4] = _9;
+  _11 = {_10, _9};
+  MEM [(int &)&y] = _11;
   _4 = MEM [(char * {ref-all})&y];
   MEM [(char * {ref-all})x_2(D)] = _4;
and we aren't able to undo the vectorization during the RTL optimizations.
I'm surprised the costs suggest such vectorization is beneficial;
constructing a vector just to store it into memory seems more expensive
than just doing two stores, doesn't it?
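The two GIMPLE forms in the diff can be mimicked at the source level. The
sketch below is an illustration under stated assumptions, not the PR
testcase: the function names are hypothetical, it assumes a little-endian
target such as x86-64 (so BIT_FIELD_REF <_3, 32, 32> is the high half), it
uses unsigned 32-bit elements where the GIMPLE uses int to avoid
implementation-defined conversions, and v2su relies on the GNU vector_size
extension.

```c
#include <stdint.h>
#include <string.h>

/* GNU extension: a vector of two 32-bit elements, 8 bytes in total. */
typedef uint32_t v2su __attribute__((vector_size(8)));

/* Scalar form: two separate 4-byte stores into y, then one 8-byte load. */
static uint64_t two_scalar_stores(uint64_t x)
{
    uint32_t y[2];
    y[0] = (uint32_t)(x >> 32);  /* models MEM[(int &)&y] = _10 */
    y[1] = (uint32_t)x;          /* models MEM[(int &)&y + 4] = _9 */
    uint64_t out;
    memcpy(&out, y, sizeof out);
    return out;
}

/* SLP form: build the vector {_10, _9} first, then store it in one go. */
static uint64_t one_vector_store(uint64_t x)
{
    v2su v = { (uint32_t)(x >> 32), (uint32_t)x };  /* models _11 = {_10, _9} */
    uint64_t out;
    memcpy(&out, &v, sizeof out);
    return out;
}
```

Both functions place the same bytes in memory, so they return the same
value; the difference the comment questions is only in how cheaply each
form can be turned into machine code.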