From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: by sourceware.org (Postfix, from userid 48) id A6E993861039; Fri, 5 Feb 2021 16:29:54 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org A6E993861039
From: "jakub at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/98856] [11 Regression] botan AES-128/XTS is slower by ~17% since r11-6649-g285fa338b06b804e72997c4d876ecf08a9c083af
Date: Fri, 05 Feb 2021 16:29:54 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 11.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: jakub at gcc dot gnu.org
X-Bugzilla-Status: ASSIGNED
X-Bugzilla-Resolution:
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: rguenth at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 11.0
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields:
Message-ID:
In-Reply-To:
References:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-BeenThere: gcc-bugs@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-bugs mailing list
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Fri, 05 Feb 2021 16:29:54 -0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98856

--- Comment #13 from Jakub Jelinek ---
Looking at what other compilers emit for this, ICC seems to be completely
broken, it emits logical right shifts instead of arithmetic right shift, and
LLVM trunk emits for >> 63 what this patch emits, for >> 17 it emits
        vpsrad  $17, %xmm0, %xmm1
        vpsrlq  $17, %xmm0, %xmm0
        vpblendd        $10, %xmm1, %xmm0, %xmm0
instead of
        vpxor   %xmm1, %xmm1, %xmm1
        vpcmpgtq        %xmm0, %xmm1, %xmm1
        vpsrlq  $17, %xmm0, %xmm0
        vpsllq  $47, %xmm1, %xmm1
        vpor    %xmm1, %xmm0, %xmm0
the patch emits.
For >> 47 it emits:
        vpsrad  $31, %xmm0, %xmm1
        vpsrad  $15, %xmm0, %xmm0
        vpshufd $245, %xmm0, %xmm0
        vpblendd        $10, %xmm1, %xmm0, %xmm0
etc.

So, in summary, for >> 63 with SSE4.2 I think what the patch does looks best,
for >> 63 and SSE2 we can emit psrad $31 instead and permute the odd elements
into even ones (i.e. __builtin_shuffle ((v4si) x >> 31, { 1, 1, 3, 3 })).

For >> cst where cst < 32, do a psrad and psrlq by that cst and permute such
that we get the even SI elts from the psrlq result and odd from psrad result.

For >> 32, do a psrad $31 and permute to get the even SI elts from odd elts of
the source and odd SI elts from odd results of psrad $31.

For >> cst where cst > 32, do psrad $31 and psrad $(cst-32) and permute such
that even SI elts come from odd elts of the latter and odd elts come from
odd elts of the former.
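
To sanity-check the case split above, here is a minimal scalar sketch in C. It is not the patch and uses no SSE instructions. It models one 64-bit element as two 32-bit lanes (even = low half, odd = high half) and compares the lane-based strategies against the compare-and-shift form (sign mask, shift left by 64-cst, OR). The function names `sra32`, `sra64_mask`, `sra64_lanes` and `count_mismatches` are made up for this illustration, and the sample values are arbitrary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of psrad on one 32-bit lane: arithmetic shift right by n. */
uint32_t sra32(uint32_t v, unsigned n)
{
    assert(n < 32);
    uint32_t fill = (v & 0x80000000u) ? ~(UINT32_MAX >> n) : 0u;
    return (v >> n) | fill;
}

/* Compare-and-shift form: an all-ones mask for negative inputs (the role of
   the compare against zero), shifted into the vacated high bits and OR-ed
   with the logical shift. */
uint64_t sra64_mask(uint64_t v, unsigned n)
{
    assert(n >= 1 && n <= 63);
    uint64_t mask = (v >> 63) ? UINT64_MAX : 0;
    return (v >> n) | (mask << (64 - n));
}

/* Lane-based form following the summary: pick even/odd 32-bit lanes from
   32-bit arithmetic shifts and a 64-bit logical shift. */
uint64_t sra64_lanes(uint64_t v, unsigned n)
{
    assert(n >= 1 && n <= 63);
    uint32_t hi = (uint32_t)(v >> 32);      /* odd lane of the source */
    uint32_t even, odd;

    if (n < 32) {
        even = (uint32_t)(v >> n);          /* even lane of the psrlq result */
        odd  = sra32(hi, n);                /* odd lane of the psrad result */
    } else if (n == 32) {
        even = hi;                          /* odd lane of the source */
        odd  = sra32(hi, 31);               /* odd lane of psrad $31 */
    } else {
        even = sra32(hi, n - 32);           /* odd lane of psrad $(n-32) */
        odd  = sra32(hi, 31);               /* odd lane of psrad $31 */
    }
    return ((uint64_t)odd << 32) | even;
}

/* Number of (value, shift) pairs on which the two forms disagree. */
int count_mismatches(void)
{
    static const uint64_t samples[] = {
        0x0000000000000000ull, 0x0000000000000001ull,
        0x7FFFFFFFFFFFFFFFull, 0x8000000000000000ull,
        0xFFFFFFFFFFFFFFFFull, 0x0123456789ABCDEFull,
        0xFEDCBA9876543210ull, 0x00000000FFFFFFFFull,
        0xFFFFFFFF00000000ull, 0x8000000080000000ull,
    };
    int bad = 0;
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++)
        for (unsigned n = 1; n <= 63; n++)
            if (sra64_lanes(samples[i], n) != sra64_mask(samples[i], n))
                bad++;
    return bad;
}
```

For >> 63 the last branch picks both lanes from the psrad $31 result, which is the same selection as the { 1, 1, 3, 3 } shuffle mentioned for SSE2.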