From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
 id 9ECA6383E805; Fri,  5 Mar 2021 10:43:47 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 9ECA6383E805
From: "rguenth at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/98856] [11 Regression] botan AES-128/XTS is
 slower by ~17% since r11-6649-g285fa338b06b804e72997c4d876ecf08a9c083af
Date: Fri, 05 Mar 2021 10:43:47 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 11.0
X-Bugzilla-Keywords: missed-optimization, ra
X-Bugzilla-Severity: normal
X-Bugzilla-Who: rguenth at gcc dot gnu.org
X-Bugzilla-Status: ASSIGNED
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: rguenth at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 11.0
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: 
Message-ID: <bug-98856-4-NAuj5HIO2I@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-98856-4@http.gcc.gnu.org/bugzilla/>
References: <bug-98856-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-BeenThere: gcc-bugs@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-bugs mailing list <gcc-bugs.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-bugs>,
 <mailto:gcc-bugs-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-bugs/>
List-Post: <mailto:gcc-bugs@gcc.gnu.org>
List-Help: <mailto:gcc-bugs-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-bugs>,
 <mailto:gcc-bugs-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Fri, 05 Mar 2021 10:43:47 -0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98856

--- Comment #26 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to rguenther@suse.de from comment #25)
> On Fri, 5 Mar 2021, ubizjak at gmail dot com wrote:
>
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98856
> >
> > --- Comment #24 from Uroš Bizjak <ubizjak at gmail dot com> ---
> > (In reply to Richard Biener from comment #22)
> > > I guess the idea of this insn setup was exactly to get IRA/LRA choose
> > > the optimal instruction sequence - otherwise exposing the reload so
> > > late is probably suboptimal.
> >
> > There is one more tool in the toolbox. A peephole2 pattern can be
> > conditionalized on an available XMM register. So, if an XMM reg is
> > available, the GPR->XMM move can be emitted in front of the insn. So, if
> > there is XMM register pressure, pinsrd will be used, but if an XMM
> > register is available, it will be reused to emit punpcklqdq.
> >=20
> > The peephole2 pattern can also be conditionalized for targets where
> > GPR->XMM moves are fast.
>
> Note the trick is especially important when GPR->XMM moves are _slow_.
> But only in the case where we originally combine two GPR operands. Doing
> two GPR->XMM moves and then one punpcklqdq hides half of the latency of
> the slow moves, since they have no data dependence on each other. So for
> the peephole we should try to match exactly this: a reloaded operand and
> a GPR operand. When the %xmm operand results from an SSE computation,
> there's no point in splitting out a GPR->XMM move.
>
> So in the end a peephole2 sounds like it could better match the condition
> the transform is profitable on.
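For illustration only, a minimal C sketch of the two sequences being compared, written with SSE intrinsics. It assumes an x86-64 target and a GCC- or Clang-compatible compiler (for `__attribute__((target))` and `__builtin_cpu_supports`); the function names are hypothetical, and the instruction comments show what each intrinsic usually maps to, not guaranteed codegen (the compiler is free to pick either sequence for both).

```c
#include <assert.h>
#include <immintrin.h>
#include <stdint.h>

/* Variant 1 (hypothetical name): one GPR->XMM move, then pinsrq inserts
   the second GPR.  The insert has to wait for the first move, so the two
   GPR->XMM transfers are serialized.  Needs SSE4.1 and x86-64. */
__attribute__((target("sse4.1")))
static __m128i concat_pinsrq(int64_t lo, int64_t hi)
{
  __m128i v = _mm_cvtsi64_si128(lo);  /* typically movq %rdi, %xmm0 */
  return _mm_insert_epi64(v, hi, 1);  /* typically pinsrq $1, %rsi, %xmm0 */
}

/* Variant 2 (hypothetical name): two independent GPR->XMM moves, then
   punpcklqdq.  Neither movq depends on the other, so their latencies can
   overlap.  SSE2 only. */
static __m128i concat_punpck(int64_t lo, int64_t hi)
{
  __m128i a = _mm_cvtsi64_si128(lo);  /* typically movq %rdi, %xmm0 */
  __m128i b = _mm_cvtsi64_si128(hi);  /* typically movq %rsi, %xmm1 */
  return _mm_unpacklo_epi64(a, b);    /* typically punpcklqdq %xmm1, %xmm0 */
}

/* Assert that v holds lo in element 0 and hi in element 1, i.e. the
   layout of (vec_concat:V2DI lo hi). */
static void check_lanes(__m128i v, int64_t lo, int64_t hi)
{
  int64_t out[2];
  _mm_storeu_si128((__m128i *)out, v);
  assert(out[0] == lo && out[1] == hi);
}

/* Check both variants on the same inputs; the pinsrq one only runs if the
   CPU reports SSE4.1.  Returns the number of variants checked. */
static int check_both(int64_t lo, int64_t hi)
{
  check_lanes(concat_punpck(lo, hi), lo, hi);
  if (!__builtin_cpu_supports("sse4.1"))
    return 1;
  check_lanes(concat_pinsrq(lo, hi), lo, hi);
  return 2;
}
```

Both variants compute the same V2DI value; the difference is only in the dependence chain. Whether the split form actually wins depends on the GPR->XMM move latency of the target, which is why the profitability condition above matters; the generated assembly (e.g. from gcc -O2 -S) shows which sequence was chosen.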

I tried
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index db5be59f5b7..8d0d3077cf8 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -1419,6 +1419,23 @@
   DONE;
 })

+(define_peephole2
+  [(set (match_operand:DI 0 "sse_reg_operand")
+        (match_operand:DI 1 "general_gr_operand"))
+   (match_scratch:DI 2 "sse_reg_operand")
+   (set (match_operand:V2DI 2 "sse_reg_operand")
+       (vec_concat:V2DI (match_dup:DI 0)
+                        (match_operand:DI 3 "general_gr_operand")))]
+  "reload_completed"
+  [(set (match_dup 0)
+        (match_dup 1))
+   (set (match_dup 2)
+        (match_dup 3))
+   (set (match_dup 2)
+       (vec_concat:V2DI (match_dup 0)
+                        (match_dup 2)))]
+  "")
+
 ;; Merge movsd/movhpd to movupd for TARGET_SSE_UNALIGNED_LOAD_OPTIMAL targets.
 (define_peephole2
   [(set (match_operand:V2DF 0 "sse_reg_operand")

but that doesn't seem to match for some unknown reason.