From: "jakub at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/98856] [11 Regression] botan AES-128/XTS is slower by ~17% since r11-6649-g285fa338b06b804e72997c4d876ecf08a9c083af
Date: Fri, 05 Feb 2021 13:43:13 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98856

Jakub Jelinek changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |uros at gcc dot gnu.org

--- Comment #11 from Jakub Jelinek ---
For a V2DImode arithmetic right shift, I think it would be something like:

--- gcc/config/i386/sse.md.jj	2021-01-27 11:50:09.168981297 +0100
+++ gcc/config/i386/sse.md	2021-02-05 14:32:44.175463716 +0100
@@ -20313,10 +20313,55 @@ (define_expand "ashrv2di3"
        (ashiftrt:V2DI
          (match_operand:V2DI 1 "register_operand")
          (match_operand:DI 2 "nonmemory_operand")))]
-  "TARGET_XOP || TARGET_AVX512VL"
+  "TARGET_SSE4_2"
 {
   if (!TARGET_AVX512VL)
     {
+      if (CONST_INT_P (operands[2]) && INTVAL (operands[2]) == 63)
+        {
+          rtx zero = force_reg (V2DImode, CONST0_RTX (V2DImode));
+          emit_insn (gen_sse4_2_gtv2di3 (operands[0], zero, operands[1]));
+          DONE;
+        }
+      if (operands[2] == const0_rtx)
+        {
+          emit_move_insn (operands[0], operands[1]);
+          DONE;
+        }
+      if (!TARGET_XOP)
+        {
+          rtx zero = force_reg (V2DImode, CONST0_RTX (V2DImode));
+          rtx zero_or_all_ones = gen_reg_rtx (V2DImode);
+          emit_insn (gen_sse4_2_gtv2di3 (zero_or_all_ones, zero, operands[1]));
+          rtx lshr_res = gen_reg_rtx (V2DImode);
+          emit_insn (gen_lshrv2di3 (lshr_res, operands[1], operands[2]));
+          rtx ashl_res = gen_reg_rtx (V2DImode);
+          rtx amount;
+          if (CONST_INT_P (operands[2]))
+            amount = GEN_INT (64 - INTVAL (operands[2]));
+          else if (TARGET_64BIT)
+            {
+              amount = gen_reg_rtx (DImode);
+              emit_insn (gen_subdi3 (amount, force_reg (DImode, GEN_INT (64)),
+                                     operands[2]));
+            }
+          else
+            {
+              rtx temp = gen_reg_rtx (SImode);
+              emit_insn (gen_subsi3 (temp, force_reg (SImode, GEN_INT (64)),
+                                     lowpart_subreg (SImode, operands[2],
+                                                     DImode)));
+              amount = gen_reg_rtx (V4SImode);
+              emit_insn (gen_vec_setv4si_0 (amount, CONST0_RTX (V4SImode),
+                                            temp));
+            }
+          if (!CONST_INT_P (operands[2]))
+            amount = lowpart_subreg (DImode, amount, GET_MODE (amount));
+          emit_insn (gen_ashlv2di3 (ashl_res, zero_or_all_ones, amount));
+          emit_insn (gen_iorv2di3 (operands[0], lshr_res, ashl_res));
+          DONE;
+        }
+
       rtx reg = gen_reg_rtx (V2DImode);
       rtx par;
       bool negate = false;

This would go together with a cost-computation change. That change would indicate that V2DImode arithmetic right shifts are more expensive, at least those by amounts other than 63.
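The !TARGET_XOP branch depends on one identity. For 0 <= n <= 63, x >>arith n equals (x >>logical n) | (m << (64 - n)), where m is the pcmpgtq (0, x) mask: all ones in negative lanes and zero elsewhere. Below is a minimal portable C model of that sequence, working one 64-bit lane at a time. The model_* names are hypothetical. The helpers only mimic the psrlq, psllq and pcmpgtq semantics.

One detail matters here. A psllq count of 64 or more yields zero, whereas a plain C shift by 64 would be undefined. That is why a run-time shift count of 0 still works through this path.

```c
#include <stdint.h>

/* Hypothetical helpers modelling one 64-bit lane of the instructions
   the proposed !TARGET_XOP expansion emits.  */

/* pcmpgtq (0, x): all ones if x is negative, zero otherwise.  */
static uint64_t model_pcmpgtq_zero(int64_t x)
{
  return 0 > x ? UINT64_MAX : 0;
}

/* psrlq: logical right shift; counts above 63 produce zero.  */
static uint64_t model_psrlq(uint64_t x, unsigned count)
{
  return count > 63 ? 0 : x >> count;
}

/* psllq: left shift; counts above 63 produce zero.  */
static uint64_t model_psllq(uint64_t x, unsigned count)
{
  return count > 63 ? 0 : x << count;
}

/* Arithmetic right shift of both lanes of a V2DImode value by n
   (0 <= n <= 63), mirroring the gtv2di3 / lshrv2di3 / ashlv2di3 /
   iorv2di3 sequence.  The final uint64_t -> int64_t conversion relies on
   the usual two's complement wrap (as GCC and Clang define it).  */
static void model_ashrv2di3(const int64_t in[2], unsigned n, int64_t out[2])
{
  for (int i = 0; i < 2; i++)
    {
      uint64_t zero_or_all_ones = model_pcmpgtq_zero(in[i]);
      uint64_t lshr_res = model_psrlq((uint64_t) in[i], n);
      uint64_t ashl_res = model_psllq(zero_or_all_ones, 64 - n);
      out[i] = (int64_t) (lshr_res | ashl_res);
    }
}
```

When n is 63, the logical-shift term is just the sign bit (0 or 1), and the shifted mask fills every other bit. The result is therefore the mask itself, which is why a single pcmpgtq is enough for the >> 63 case.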
Even if the V2DImode arithmetic right shifts end up being more expensive than scalar code (which would surprise me, at least for the >> 63 case), I think V4DImode for TARGET_AVX2 should always be beneficial. I haven't tried to adjust the expander for that yet.
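The V4DImode/TARGET_AVX2 case can use the same sequence. AVX2 has 256-bit forms of all four instructions the fallback needs: vpcmpgtq, vpsrlq, vpsllq and vpor. A native 64-bit arithmetic right shift (vpsraq) only exists with AVX-512F, and its 128/256-bit forms need AVX512VL.

Below is a sketch of the equivalent V4DImode sequence, written with the standard AVX2 intrinsics. It is not what the expander would generate, and the function names are hypothetical. It assumes GCC or Clang on x86 for the target attribute and __builtin_cpu_supports, and falls back to portable scalar code elsewhere.

```c
#include <stdint.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define HAVE_X86_AVX2_PATH 1

/* V4DImode arithmetic right shift by n (0 <= n <= 63) using only AVX2:
   vpcmpgtq, vpsrlq, vpsllq and vpor.  */
__attribute__((target("avx2")))
static void sra_v4di_avx2(const int64_t in[4], int n, int64_t out[4])
{
  __m256i x = _mm256_loadu_si256((const __m256i *) in);
  /* All ones in lanes where x < 0, zero elsewhere.  */
  __m256i sign = _mm256_cmpgt_epi64(_mm256_setzero_si256(), x);
  /* The shift counts are taken from the low 64 bits of an XMM register.  */
  __m256i lshr_res = _mm256_srl_epi64(x, _mm_cvtsi32_si128(n));
  /* For n == 0 the count is 64, which shifts everything out (zero).  */
  __m256i ashl_res = _mm256_sll_epi64(sign, _mm_cvtsi32_si128(64 - n));
  _mm256_storeu_si256((__m256i *) out, _mm256_or_si256(lshr_res, ashl_res));
}
#endif

/* Portable scalar fallback computing floor (x / 2^n) without relying on
   the implementation-defined result of >> on negative values.  */
static void sra_v4di_scalar(const int64_t in[4], int n, int64_t out[4])
{
  for (int i = 0; i < 4; i++)
    out[i] = in[i] < 0 ? ~(~in[i] >> n) : in[i] >> n;
}

/* Use the AVX2 path only when the running CPU supports it.  */
static void sra_v4di(const int64_t in[4], int n, int64_t out[4])
{
#ifdef HAVE_X86_AVX2_PATH
  if (__builtin_cpu_supports("avx2"))
    {
      sra_v4di_avx2(in, n, out);
      return;
    }
#endif
  sra_v4di_scalar(in, n, out);
}
```

Both paths should give identical results for every n in 0..63. Comparing their outputs is a quick way to sanity-check the 256-bit sequence against the scalar definition.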