From mboxrd@z Thu Jan 1 00:00:00 1970
From: "rguenther at suse dot de"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/98856] [11 Regression] botan AES-128/XTS is slower by ~17% since r11-6649-g285fa338b06b804e72997c4d876ecf08a9c083af
Date: Fri, 05 Mar 2021 10:04:37 +0000
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 11.0
X-Bugzilla-Keywords: missed-optimization, ra
X-Bugzilla-Severity: normal
X-Bugzilla-Status: ASSIGNED
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: rguenth at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 11.0

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98856

--- Comment #25 from rguenther at suse dot de ---
On Fri, 5 Mar 2021, ubizjak at gmail dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98856
>
> --- Comment #24 from Uroš Bizjak ---
> (In reply to Richard Biener from comment #22)
> > I guess the idea of this insn setup was exactly to get IRA/LRA to choose
> > the optimal instruction sequence - otherwise exposing the reload so
> > late is probably suboptimal.
>
> There is one more tool in the toolbox.
> A peephole2 pattern can be conditionalized on an available XMM register.
> So, if an XMM reg is available, the GPR->XMM move can be emitted in front
> of the insn. So, if there is XMM register pressure, pinsrd will be used,
> but if an XMM register is available, it will be reused to emit punpcklqdq.
>
> The peephole2 pattern can also be conditionalized for targets where GPR->XMM
> moves are fast.

Note the trick is esp. important when GPR->XMM moves are _slow_.  But only
in the case where we originally combine two GPR operands.  Doing two
GPR->XMM moves and then one punpcklqdq hides half of the latency
of the slow moves since they have no data dependence on each other.
So for the peephole we should try to match this - a reloaded operand
and a GPR operand.  When the %xmm operand results from an SSE computation
there's no point in splitting out a GPR->XMM move.

So in the end a peephole2 sounds like it could better match the condition
the transform is profitable on.
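
To put rough numbers on the latency argument, here is a minimal critical-path
sketch in C.  It is only a toy model, not anything taken from GCC or from
measurements: the struct, the function names and the latency values are all
made up for illustration, and the insert form is assumed to pay the full
GPR->XMM transfer cost in series with the first move.

```c
#include <assert.h>

/* Hypothetical latencies in cycles; illustrative inputs, not measurements. */
struct latencies {
    int gpr_to_xmm;  /* movd/movq from a GPR into an XMM register */
    int shuffle;     /* punpcklqdq on two XMM registers */
};

static int max_int(int a, int b) { return a > b ? a : b; }

static void check(struct latencies l)
{
    assert(l.gpr_to_xmm >= 0 && l.shuffle >= 0);
}

/* Both operands start in GPRs.
   movq %rax,%xmm0 ; pinsrq $1,%rdx,%xmm0
   Assumption: the insert waits for the first move and then pays the
   GPR->XMM transfer itself, so the two transfers are serialized. */
static int chain_insert(struct latencies l)
{
    check(l);
    return l.gpr_to_xmm + l.gpr_to_xmm;
}

/* Both operands start in GPRs.
   movq %rax,%xmm0 ; movq %rdx,%xmm1 ; punpcklqdq %xmm1,%xmm0
   The two moves have no data dependence, so they overlap; only the
   shuffle has to wait for both of them. */
static int chain_unpack(struct latencies l)
{
    check(l);
    return max_int(l.gpr_to_xmm, l.gpr_to_xmm) + l.shuffle;
}

/* One operand is already in an XMM register (ready at cycle 0), e.g. the
   result of an SSE computation.  Only one transfer exists in either form. */
static int chain_insert_xmm_ready(struct latencies l)
{
    check(l);
    return l.gpr_to_xmm;               /* pinsrq $1,%rdx,%xmm0 */
}

static int chain_unpack_xmm_ready(struct latencies l)
{
    check(l);
    return l.gpr_to_xmm + l.shuffle;   /* movq %rdx,%xmm1 ; punpcklqdq */
}
```

With the made-up values gpr_to_xmm = 10 and shuffle = 1, the model gives 20
cycles for the insert form against 11 for the unpack form when both operands
come from GPRs.  When one operand is already in %xmm it gives 10 against 11,
so splitting out the move gains nothing there, which is the case the peephole
should avoid matching.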