From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <sourceware-bugzilla@sourceware.org>
Received: by sourceware.org (Postfix, from userid 48)
 id D17143938C04; Mon,  1 Mar 2021 14:14:38 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org D17143938C04
From: "rguenther at suse dot de" <sourceware-bugzilla@sourceware.org>
To: glibc-bugs@sourceware.org
Subject: [Bug string/27457] vzeroupper use in AVX2 multiarch string functions
 cause HTM aborts
Date: Mon, 01 Mar 2021 14:14:38 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: glibc
X-Bugzilla-Component: string
X-Bugzilla-Version: 2.31
X-Bugzilla-Keywords: 
X-Bugzilla-Severity: normal
X-Bugzilla-Who: rguenther at suse dot de
X-Bugzilla-Status: NEW
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P2
X-Bugzilla-Assigned-To: hjl.tools at gmail dot com
X-Bugzilla-Target-Milestone: 2.34
X-Bugzilla-Flags: security-
X-Bugzilla-Changed-Fields: 
Message-ID: <bug-27457-131-UezLUFo8sG@http.sourceware.org/bugzilla/>
In-Reply-To: <bug-27457-131@http.sourceware.org/bugzilla/>
References: <bug-27457-131@http.sourceware.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://sourceware.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-BeenThere: glibc-bugs@sourceware.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Glibc-bugs mailing list <glibc-bugs.sourceware.org>
List-Unsubscribe: <https://sourceware.org/mailman/options/glibc-bugs>,
 <mailto:glibc-bugs-request@sourceware.org?subject=unsubscribe>
List-Archive: <https://sourceware.org/pipermail/glibc-bugs/>
List-Help: <mailto:glibc-bugs-request@sourceware.org?subject=help>
List-Subscribe: <https://sourceware.org/mailman/listinfo/glibc-bugs>,
 <mailto:glibc-bugs-request@sourceware.org?subject=subscribe>
X-List-Received-Date: Mon, 01 Mar 2021 14:14:38 -0000

https://sourceware.org/bugzilla/show_bug.cgi?id=27457

--- Comment #16 from rguenther at suse dot de ---
On Mon, 1 Mar 2021, hjl.tools at gmail dot com wrote:

> https://sourceware.org/bugzilla/show_bug.cgi?id=27457
>
> --- Comment #15 from H.J. Lu <hjl.tools at gmail dot com> ---
> (In reply to Richard Biener from comment #14)
> >
> > Note according to Agner vzeroall, for example on Haswell, decodes to
> > 20 uops while vzeroupper only requires 4.  On Skylake it's even worse
> > (34 uops).  For short sizes (as in our benchmark which had 16-31 byte
> > strcmp) this might be a bigger difference than using the SSE2 variant
> > off an early xtest result.  That said, why not, for HTM + AVX2 CPUs,
> > have an intermediate dispatcher between the AVX2 and the SSE variant
> > using xtest?  That leaves the actual implementations unchanged and thus
> > with known performance characteristic?
>
> It is implemented on users/hjl/pr27457/wrapper branch:
>
> https://gitlab.com/x86-glibc/glibc/-/tree/users/hjl/pr27457/wrapper
>
> There are 2 problems:
>
> 1. Many RTM tests failed for other reasons.
> 2. Even with vzeroall overhead, AVX version may still be faster than
> SSE version.

And the SSE version may still be faster than the AVX version with
vzeroall.

I guess we should mostly care about optimizing for "modern" CPUs,
which likely means HTM + AVX512, and that should already be optimal
on your branches by using %ymm16+.  So we're talking about
the "legacy" AVX2 + HTM path.

And there I think we should optimize the path that is _not_ in
a transaction, since that will be 99% of the cases.  To me that
means using the proven, tuned (on their respective ISA subsets)
SSE2 and AVX2 variants and simply switching between them based on
xtest.  Yes, a strcmp of a large string inside a transaction
might not run at optimal AVX2 speed.  But it will be faster
than before the xtest dispatch, since before that it would have
aborted the transaction.
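
As a rough illustration of the dispatch idea, here is a minimal C sketch.
It is not the code on the wrapper branch.  All names in it
(strcmp_sse2_stub, strcmp_avx2_stub, select_strcmp, strcmp_dispatch,
in_txn_fn) are hypothetical, both "variants" simply call strcmp, and the
transaction check is passed in as a callback so the sketch also runs on
CPUs without RTM.  A real dispatcher would presumably execute XTEST
directly, in assembly, on CPUs that report RTM support.

```c
#include <assert.h>
#include <string.h>

/* Records which stand-in variant ran last, for illustration only. */
enum variant { VARIANT_NONE, VARIANT_SSE2, VARIANT_AVX2 };
static enum variant last_variant = VARIANT_NONE;

/* Hypothetical stand-in for the tuned SSE2 strcmp (no vzeroupper needed). */
static int strcmp_sse2_stub(const char *a, const char *b)
{
    last_variant = VARIANT_SSE2;
    return strcmp(a, b);
}

/* Hypothetical stand-in for the tuned AVX2 strcmp (ends with vzeroupper). */
static int strcmp_avx2_stub(const char *a, const char *b)
{
    last_variant = VARIANT_AVX2;
    return strcmp(a, b);
}

/* Assumption: returns nonzero when executing inside a transaction.
   Real code would use the XTEST instruction instead of a callback. */
typedef int (*in_txn_fn)(void);
typedef int (*strcmp_fn)(const char *, const char *);

static int never_in_txn(void)  { return 0; }
static int always_in_txn(void) { return 1; }

/* Pick the SSE2 variant inside a transaction, the AVX2 variant otherwise. */
static strcmp_fn select_strcmp(in_txn_fn in_txn)
{
    return in_txn() ? strcmp_sse2_stub : strcmp_avx2_stub;
}

/* Intermediate dispatcher: both implementations stay unchanged. */
static int strcmp_dispatch(const char *a, const char *b, in_txn_fn in_txn)
{
    assert(in_txn != NULL);
    return select_strcmp(in_txn)(a, b);
}
```

Calling strcmp_dispatch("abc", "abd", never_in_txn) takes the AVX2
stand-in, while passing always_in_txn routes the same call to the SSE2
stand-in.  The only cost added on the common non-transactional path is
the check and one branch.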
