From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
 id B61F13851C1C; Fri,  5 Feb 2021 10:18:38 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org B61F13851C1C
From: "rguenth at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/98856] [11 Regression] botan AES-128/XTS is
 slower by ~17% since r11-6649-g285fa338b06b804e72997c4d876ecf08a9c083af
Date: Fri, 05 Feb 2021 10:18:38 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 11.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: rguenth at gcc dot gnu.org
X-Bugzilla-Status: ASSIGNED
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: rguenth at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 11.0
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: 
Message-ID: <bug-98856-4-T3oaTb1put@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-98856-4@http.gcc.gnu.org/bugzilla/>
References: <bug-98856-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-BeenThere: gcc-bugs@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-bugs mailing list <gcc-bugs.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-bugs>,
 <mailto:gcc-bugs-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-bugs/>
List-Post: <mailto:gcc-bugs@gcc.gnu.org>
List-Help: <mailto:gcc-bugs-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-bugs>,
 <mailto:gcc-bugs-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Fri, 05 Feb 2021 10:18:38 -0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98856
--- Comment #8 from Richard Biener <rguenth at gcc dot gnu.org> ---
Exploring more options, I noticed there's no arithmetic vector V2DI right shift,
so vectorizing

  uint64_t carry = (uint64_t)(((int64_t)W[1]) >> 63) & (uint64_t)135;
  W[1] = (W[1] << 1) ^ ((uint64_t)(((int64_t)W[0]) >> 63) & (uint64_t)1);
  W[0] = (W[0] << 1) ^ carry;

didn't work out.  But V2DI >> CST with CST > 31 can be implemented by
shifting the high dwords with VPSRAD (by CST - 32), shuffling the shifted
high parts into the low positions, and then sign-extending with PMOVSXDQ.
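
To make the three steps concrete, here is a minimal lane-by-lane model in
plain C.  It is only a sketch of the idea, not vectorizer output: the helper
names (sra32, sext32to64, v2di_sra) are made up for illustration, the lanes
are kept as unsigned bit patterns to avoid implementation-defined signed
shifts, and CST is assumed to be in the range 32..63.

```c
#include <stdint.h>

/* Per-lane model of VPSRAD: arithmetic right shift of a 32-bit pattern. */
static uint32_t sra32(uint32_t x, unsigned n)
{
    uint32_t sign = (x >> 31) ? UINT32_MAX : 0;
    if (n == 0)
        return x;
    if (n > 31)
        n = 31;                     /* counts above 31 behave like 31 */
    return (x >> n) | (sign << (32 - n));
}

/* Per-lane model of PMOVSXDQ: sign-extend 32 bits to 64 bits. */
static uint64_t sext32to64(uint32_t x)
{
    return (x >> 31) ? (UINT64_C(0xFFFFFFFF00000000) | x) : (uint64_t)x;
}

/* Hypothetical helper: out[i] = in[i] >> cst (arithmetic), 32 <= cst <= 63. */
static void v2di_sra(uint64_t out[2], const uint64_t in[2], unsigned cst)
{
    uint32_t hi[2];
    int i;

    /* Step 1 (VPSRAD): shift the high dword of each lane by cst - 32. */
    for (i = 0; i < 2; i++)
        hi[i] = sra32((uint32_t)(in[i] >> 32), cst - 32);

    /* Step 2 (shuffle): hi[0] and hi[1] now play the role of the two
       low dwords that the sign extension reads.  */

    /* Step 3 (PMOVSXDQ): sign-extend each shifted dword to 64 bits. */
    for (i = 0; i < 2; i++)
        out[i] = sext32to64(hi[i]);
}
```

The low dword of each input lane never contributes, since all of its bits
are shifted out once CST is at least 32.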

Maybe there's something even more clever for the special case of >> 63.
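
One possibility for >> 63, offered as an untested idea rather than a
proposal: the result is just a per-lane sign mask, so a VPSRAD by 31 on all
four dwords followed by a single dword shuffle that broadcasts the odd
(high) dwords would do, and the same shuffle can swap the two 64-bit lanes
as the kernel needs.  The model below is plain C with made-up function names
(tweak_scalar, tweak_lanes); it only checks that the lane-wise formulation
matches the scalar kernel.

```c
#include <stdint.h>

/* Scalar reference: the kernel above, with the sign mask written as
   0 - (x >> 63) so no signed shift is needed.  */
static void tweak_scalar(uint64_t W[2])
{
    uint64_t carry = ((uint64_t)0 - (W[1] >> 63)) & (uint64_t)135;
    W[1] = (W[1] << 1) ^ (((uint64_t)0 - (W[0] >> 63)) & (uint64_t)1);
    W[0] = (W[0] << 1) ^ carry;
}

/* Lane model of a possible vector form (hypothetical, for illustration). */
static void tweak_lanes(uint64_t W[2])
{
    uint32_t d[4], m[4];
    uint64_t mask0, mask1;
    int i;

    /* View the two 64-bit lanes as four dwords; d[1] and d[3] are high. */
    d[0] = (uint32_t)W[0];
    d[1] = (uint32_t)(W[0] >> 32);
    d[2] = (uint32_t)W[1];
    d[3] = (uint32_t)(W[1] >> 32);

    /* VPSRAD by 31: every dword becomes 0 or all-ones. */
    for (i = 0; i < 4; i++)
        m[i] = (d[i] >> 31) ? UINT32_MAX : 0;

    /* Dword shuffle: broadcast the high-dword masks and swap the lanes,
       so lane 0 sees the sign of W[1] and lane 1 the sign of W[0].  */
    mask0 = ((uint64_t)m[3] << 32) | m[3];
    mask1 = ((uint64_t)m[1] << 32) | m[1];

    /* AND with the constant vector {135, 1}, shift left by 1, XOR. */
    W[0] = (W[0] << 1) ^ (mask0 & (uint64_t)135);
    W[1] = (W[1] << 1) ^ (mask1 & (uint64_t)1);
}
```

Whether this beats the scalar sequence in practice is a separate question;
the model says nothing about latency or throughput.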

As said, I'm just checking whether "optimal" vectorization of the kernel
would solve the issue.  But I guess pipelines are wide enough that the
original scalar code effectively executes "vectorized".