From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: by sourceware.org (Postfix, from userid 48) id 0BA473AA9415; Thu, 4 Mar 2021 12:14:22 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 0BA473AA9415
From: "rguenth at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/98856] [11 Regression] botan AES-128/XTS is slower by ~17% since r11-6649-g285fa338b06b804e72997c4d876ecf08a9c083af
Date: Thu, 04 Mar 2021 12:14:22 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 11.0
X-Bugzilla-Keywords: missed-optimization, ra
X-Bugzilla-Severity: normal
X-Bugzilla-Who: rguenth at gcc dot gnu.org
X-Bugzilla-Status: ASSIGNED
X-Bugzilla-Resolution:
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: rguenth at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 11.0
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields: cc keywords
Message-ID:
In-Reply-To:
References:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-BeenThere: gcc-bugs@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-bugs mailing list
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Thu, 04 Mar 2021 12:14:23 -0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98856

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |vmakarov at gcc dot gnu.org
           Keywords|                            |ra

--- Comment #17 from Richard Biener ---
So coming back here.
We're presenting RA with a quite hard problem given we have

(insn 7 4 8 2 (set (reg:TI 84 [ _9 ])
        (mem:TI (reg:DI 101) [0 MEM <__int128 unsigned> [(char * {ref-all})in_8(D)]+0 S16 A8])) 73 {*movti_internal}
     (expr_list:REG_DEAD (reg:DI 101)
        (nil)))
(insn 8 7 9 2 (parallel [
            (set (reg:DI 95)
                (lshiftrt:DI (subreg:DI (reg:TI 84 [ _9 ]) 8)
                    (const_int 63 [0x3f])))
            (clobber (reg:CC 17 flags))
        ]) "t.c":7:26 703 {*lshrdi3_1}
     (expr_list:REG_UNUSED (reg:CC 17 flags)
        (nil)))
..
(insn 10 9 11 2 (parallel [
            (set (reg:DI 97)
                (lshiftrt:DI (subreg:DI (reg:TI 84 [ _9 ]) 0)
                    (const_int 63 [0x3f])))
            (clobber (reg:CC 17 flags))
        ]) "t.c":8:30 703 {*lshrdi3_1}
     (expr_list:REG_UNUSED (reg:CC 17 flags)
..
(insn 12 11 13 2 (set (reg:V2DI 98 [ vect__5.3 ])
        (ashift:V2DI (subreg:V2DI (reg:TI 84 [ _9 ]) 0)
            (const_int 1 [0x1]))) "t.c":9:16 3611 {ashlv2di3}
     (expr_list:REG_DEAD (reg:TI 84 [ _9 ])
        (nil)))

where I wonder why we keep the (subreg:DI (reg:TI 84 ...) 8) around
for so long.  Probably the subreg pass gives up because of the V2DImode
subreg of that reg.

That said RA chooses xmm for reg:84 but then spills it immediately
to fulfil the subregs even though there's mov and pextrd that could be used
or the reload could use the original mem.  That we reload even the xmm use
is another odd thing.

Vlad, I'm not sure about the possibilities LRA has here but maybe you can
have a look at the testcase in comment#6 (use -O3 -march=znver2 or
-march=core-avx2).  For one I expected

        vmovdqu (%rsi), %xmm2
        vmovdqa %xmm2, -24(%rsp)
        movq    -16(%rsp), %rax  (2a)
        vmovdqa -24(%rsp), %xmm4  (1)
...
        movq    -24(%rsp), %rdx  (2b)

 (1) to be not there (not sure how that even survives postreload
     optimizations...)
 (2a/b) to be 'inherited' by instead loading from (%rsi) and 8(%rsi) which
     is maybe too much being asked because it requires aliasing considerations

That is, even if we don't consider using

        movq    %xmm2, %rax  (2a)
        pextrd  %xmm2, %rdx, 1  (2b)

I expected us to not spill.
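
For readers without comment#6 at hand, here is a hypothetical C sketch in
the shape of the RTL quoted above.  It is not the testcase from the bug.
The function name xts_double, the 0x87 feedback constant and the
little-endian layout are all assumptions made for illustration.  What it
shares with the dump is one 16-byte load, a right shift by 63 of each
64-bit half, and a left shift by 1 of both halves, which is the mix of
scalar and vector uses of a single TImode value discussed here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in, NOT the testcase from comment#6.
   Doubles a 128-bit value held in two little-endian 64-bit words.
   The 0x87 feedback constant is an assumption for illustration.  */
void xts_double(uint8_t *out, const uint8_t *in)
{
    uint64_t w[2];

    assert(out != NULL && in != NULL);

    /* One 16-byte load; compare insn 7 (the TImode mem).  */
    memcpy(w, in, sizeof w);

    /* Scalar use of the high half; compare insn 8 (subreg at byte 8).  */
    uint64_t carry_hi = w[1] >> 63;
    /* Scalar use of the low half; compare insn 10 (subreg at byte 0).  */
    uint64_t carry_lo = w[0] >> 63;

    /* Both halves shifted left by one; compare the V2DI ashift in
       insn 12 if the compiler chooses to vectorize this pair.  */
    w[0] <<= 1;
    w[1] <<= 1;

    /* Fold the carries back in.  */
    w[0] ^= carry_hi * 0x87;
    w[1] ^= carry_lo;

    memcpy(out, w, sizeof w);
}
```

Compiling something like this with -O3 -march=znver2 -S and looking for
stack stores next to the vmovdqu is one way to check whether the value is
being spilled.  Whether this particular sketch reproduces the spill is not
verified.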