From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: by sourceware.org (Postfix, from userid 48) id C269D386F444; Fri, 5 Mar 2021 12:27:54 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org C269D386F444
From: "rguenth at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/98856] [11 Regression] botan AES-128/XTS is slower by ~17% since r11-6649-g285fa338b06b804e72997c4d876ecf08a9c083af
Date: Fri, 05 Mar 2021 12:27:54 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 11.0
X-Bugzilla-Keywords: missed-optimization, ra
X-Bugzilla-Severity: normal
X-Bugzilla-Who: rguenth at gcc dot gnu.org
X-Bugzilla-Status: ASSIGNED
X-Bugzilla-Resolution:
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: rguenth at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 11.0
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields:
Message-ID:
In-Reply-To:
References:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-BeenThere: gcc-bugs@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-bugs mailing list
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Fri, 05 Mar 2021 12:27:54 -0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98856

--- Comment #29 from Richard Biener ---
(In reply to Uroš Bizjak from comment #27)
> (In reply to Richard Biener from comment #26)
> > but that doesn't seem to match for some unknown reason.
>
> Try this:
>
> (define_peephole2
>   [(match_scratch:DI 5 "Yv")
>    (set (match_operand:DI 0 "sse_reg_operand")
>         (match_operand:DI 1 "general_reg_operand"))
>    (set (match_operand:V2DI 2 "sse_reg_operand")
>         (vec_concat:V2DI (match_operand:DI 3 "sse_reg_operand")
>                          (match_operand:DI 4 "nonimmediate_gr_operand")))]
>   ""
>   [(set (match_dup 0)
>         (match_dup 1))
>    (set (match_dup 5)
>         (match_dup 4))
>    (set (match_dup 2)
>         (vec_concat:V2DI (match_dup 3)
>                          (match_dup 5)))])

Ah, I messed up operands.  The following works (with match_scratch placed
first as above, it happily chooses a register matching operand 0):

;; Further split pinsrq variants of vec_concatv2di with two GPR sources,
;; one already reloaded, to hide the latency of one GPR->XMM transition.
(define_peephole2
  [(set (match_operand:DI 0 "sse_reg_operand")
        (match_operand:DI 1 "general_reg_operand"))
   (match_scratch:DI 2 "Yv")
   (set (match_operand:V2DI 3 "sse_reg_operand")
        (vec_concat:V2DI (match_dup 0)
                         (match_operand:DI 4 "nonimmediate_gr_operand")))]
  "reload_completed && optimize_insn_for_speed_p ()"
  [(set (match_dup 0)
        (match_dup 1))
   (set (match_dup 2)
        (match_dup 4))
   (set (match_dup 3)
        (vec_concat:V2DI (match_dup 0)
                         (match_dup 2)))])

but for some reason it again doesn't work for the important loop.  There
we have

  389: xmm0:DI=cx:DI
      REG_DEAD cx:DI
  390: dx:DI=[sp:DI+0x10]
   56: {dx:DI=dx:DI 0>>0x3f;clobber flags:CC;}
      REG_UNUSED flags:CC
   57: xmm0:V2DI=vec_concat(xmm0:DI,dx:DI)

I suppose the reason is that there are two unrelated insns between the
xmm0 = cx:DI move and the vec_concat.  That would hint that we should not
match this GPR->XMM move in the peephole pattern itself but instead check
for it in the condition (can we use DF there?).

The simplified variant below works but IMHO also matches cases we do not
want to transform.  I can't find any example of how to restrict it in the
condition, though.

;; Further split pinsrq variants of vec_concatv2di with two GPR sources,
;; one already reloaded, to hide the latency of one GPR->XMM transition.
(define_peephole2
  [(match_scratch:DI 3 "Yv")
   (set (match_operand:V2DI 0 "sse_reg_operand")
        (vec_concat:V2DI (match_operand:DI 1 "sse_reg_operand")
                         (match_operand:DI 2 "nonimmediate_gr_operand")))]
  "reload_completed && optimize_insn_for_speed_p ()"
  [(set (match_dup 3)
        (match_dup 2))
   (set (match_dup 0)
        (vec_concat:V2DI (match_dup 1)
                         (match_dup 3)))])
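
For illustration only, the suspected cause can be modelled outside of GCC. The Python sketch below is a toy, not GCC code: the `Insn` record, its `kind` labels and the two matcher functions are hypothetical names invented here, and the only assumption carried over from the discussion is that a peephole pattern has to match consecutive insns. Under that assumption, the pattern that includes the GPR->XMM move misses the loop sequence above, while the simplified variant still finds insn 57.

```python
from typing import List, NamedTuple, Optional, Tuple


class Insn(NamedTuple):
    """Hypothetical stand-in for an RTL insn; not a GCC data structure."""
    uid: int
    kind: str              # "gpr_to_xmm", "vec_concat" or "other"
    dest: str
    srcs: Tuple[str, ...]


def match_move_then_concat(insns: List[Insn]) -> Optional[Tuple[int, int]]:
    """Model of the pattern that includes the GPR->XMM move.

    Only two *adjacent* insns are considered: a GPR->XMM move directly
    followed by a vec_concat whose first source is the moved register.
    """
    for first, second in zip(insns, insns[1:]):
        if (first.kind == "gpr_to_xmm"
                and second.kind == "vec_concat"
                and second.srcs[0] == first.dest):
            return (first.uid, second.uid)
    return None


def match_concat_only(insns: List[Insn]) -> List[int]:
    """Model of the simplified variant: any vec_concat with an XMM first
    source and a non-XMM second source, regardless of what precedes it."""
    return [i.uid for i in insns
            if i.kind == "vec_concat"
            and i.srcs[0].startswith("xmm")
            and not i.srcs[1].startswith("xmm")]


# The sequence from the loop: two unrelated insns sit between 389 and 57.
loop = [
    Insn(389, "gpr_to_xmm", "xmm0", ("cx",)),
    Insn(390, "other", "dx", ("[sp+0x10]",)),
    Insn(56, "other", "dx", ("dx",)),
    Insn(57, "vec_concat", "xmm0", ("xmm0", "dx")),
]

# The same move and vec_concat without anything in between.
adjacent = [loop[0], loop[3]]

print(match_move_then_concat(loop))
print(match_move_then_concat(adjacent))
print(match_concat_only(loop))
```

In this model the first matcher only succeeds on `adjacent`. The second matcher succeeds on both sequences, and it would equally fire on a vec_concat whose XMM input did not come from a recent GPR move, which is the over-matching concern raised above.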