From: "hjl.tools at gmail dot com" <sourceware-bugzilla@sourceware.org>
To: glibc-bugs@sourceware.org
Subject: [Bug string/27457] vzeroupper use in AVX2 multiarch string functions cause HTM aborts
Date: Mon, 01 Mar 2021 14:37:33 +0000
Message-ID: <bug-27457-131-GaRqP7vqgT@http.sourceware.org/bugzilla/>
In-Reply-To: <bug-27457-131@http.sourceware.org/bugzilla/>

https://sourceware.org/bugzilla/show_bug.cgi?id=27457

--- Comment #18 from H.J. Lu <hjl.tools at gmail dot com> ---
(In reply to rguenther from comment #16)
> On Mon, 1 Mar 2021, hjl.tools at gmail dot com wrote:
>
> > https://sourceware.org/bugzilla/show_bug.cgi?id=27457
> >
> > --- Comment #15 from H.J. Lu <hjl.tools at gmail dot com> ---
> > (In reply to Richard Biener from comment #14)
> > >
> > > Note according to Agner vzeroall, for example on Haswell, decodes to
> > > 20 uops while vzeroupper only requires 4. On Skylake it's even worse
> > > (34 uops). For short sizes (as in our benchmark which had 16-31 byte
> > > strcmp) this might be a bigger difference than using the SSE2 variant
> > > off an early xtest result. That said, why not, for HTM + AVX2 CPUs,
> > > have an intermediate dispatcher between the AVX2 and the SSE variant
> > > using xtest? That leaves the actual implementations unchanged and thus
> > > with known performance characteristic?
> >
> > It is implemented on users/hjl/pr27457/wrapper branch:
> >
> > https://gitlab.com/x86-glibc/glibc/-/tree/users/hjl/pr27457/wrapper
> >
> > There are 2 problems:
> >
> > 1. Many RTM tests failed for other reasons.
> > 2. Even with vzeroall overhead, AVX version may still be faster than
> > SSE version.
>
> And the SSE version may still be faster than the AVX version with
> vzeroall.

Here is some data:

Function: strcmp
Variant: default
                                   __strcmp_avx2  __strcmp_sse2_unaligned
length=14, align1=14, align2=14:       11.36          17.50
length=14, align1=14, align2=14:       11.36          15.59
length=14, align1=14, align2=14:       11.43          15.55
length=15, align1=15, align2=15:       11.36          17.42
length=15, align1=15, align2=15:       11.96          17.41
length=15, align1=15, align2=15:       11.36          16.97
length=16, align1=16, align2=16:       11.36          18.58
length=16, align1=16, align2=16:       11.36          17.41
length=16, align1=16, align2=16:       11.43          17.34
length=17, align1=17, align2=17:       11.36          21.37
length=17, align1=17, align2=17:       11.36          18.52
length=17, align1=17, align2=17:       11.36          17.94
length=18, align1=18, align2=18:       11.36          19.73
length=18, align1=18, align2=18:       11.36          19.20
length=18, align1=18, align2=18:       11.36          19.13
length=19, align1=19, align2=19:       11.36          20.38
length=19, align1=19, align2=19:       11.36          19.39
length=19, align1=19, align2=19:       11.36          20.39
length=20, align1=20, align2=20:       11.36          21.53
length=20, align1=20, align2=20:       11.36          20.98
length=20, align1=20, align2=20:       11.36          20.93
length=21, align1=21, align2=21:       11.36          22.83
length=21, align1=21, align2=21:       11.36          22.26
length=21, align1=21, align2=21:       11.36          22.25
length=22, align1=22, align2=22:       11.43          23.37
length=22, align1=22, align2=22:       11.36          22.78
length=22, align1=22, align2=22:       12.29          22.12
length=23, align1=23, align2=23:       11.36          24.63
length=23, align1=23, align2=23:       12.53          23.97
length=23, align1=23, align2=23:       11.36          23.97
length=24, align1=24, align2=24:       11.36          24.52
length=24, align1=24, align2=24:       11.36          43.47
length=24, align1=24, align2=24:       11.36          44.47
length=25, align1=25, align2=25:       11.36          39.50
length=25, align1=25, align2=25:       11.36          48.97
length=25, align1=25, align2=25:       11.36          48.53
length=26, align1=26, align2=26:       11.36          47.87
length=26, align1=26, align2=26:       11.36          47.20
length=26, align1=26, align2=26:       11.36          47.15
length=27, align1=27, align2=27:       11.36          50.90
length=27, align1=27, align2=27:       11.44          49.98
length=27, align1=27, align2=27:       11.36          49.77
length=28, align1=28, align2=28:       11.36          49.74
length=28, align1=28, align2=28:       11.36          48.86
length=28, align1=28, align2=28:       11.36          49.08
length=29, align1=29, align2=29:       11.36          52.74
length=29, align1=29, align2=29:       11.36          54.04
length=29, align1=29, align2=29:       11.36          29.49
length=30, align1=30, align2=30:       11.36          50.91
length=30, align1=30, align2=30:       11.36          51.09
length=30, align1=30, align2=30:       11.36          51.13
length=31, align1=31, align2=31:       12.36          54.33
length=31, align1=31, align2=31:       11.36          53.49
length=31, align1=31, align2=31:       11.36          53.29
length=16, align1=0, align2=0:         11.36          18.02
length=16, align1=0, align2=0:         11.36          18.58
length=16, align1=0, align2=0:         11.36          17.34
length=16, align1=0, align2=0:         11.44          19.88
length=16, align1=0, align2=0:         11.36          16.74
length=16, align1=0, align2=0:         11.36          17.42
length=16, align1=0, align2=3:         11.36          17.34
length=16, align1=3, align2=4:         11.36          17.34
length=32, align1=0, align2=0:         12.29          61.07
length=32, align1=0, align2=0:         12.63          61.08
length=32, align1=0, align2=0:         11.36          60.48
length=32, align1=0, align2=0:         11.36          60.48
length=32, align1=0, align2=0:         11.36          60.40
length=32, align1=0, align2=0:         11.36          60.40
length=32, align1=0, align2=4:         11.36          60.40
length=32, align1=4, align2=5:         12.10          59.72

> I guess we should mostly care about optimizing for "modern" CPUs
> which likely means HTM + AVX512 which should be already optimal
> on your branches by using %ymm16+. So we're talking about
> the "legacy" AVX2 + HTM path.
>
> And there I think we should optimize the path that is _not_ in
> a transaction since that will be 99% of the cases. Which to
> me means using the proven tuned (on their respective ISA subsets)
> SSE2 and AVX2 variants and simply switch between them based on
> xtest. Yeah, so strcmp of a large string inside an transaction

I tried it and I got RTM abort for other reasons.

> might not run at optimal AVX2 speed. But it will be faster
> than before the xtest dispatch since before that it would have
> aborted the transaction.
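
For readers following the thread, the xtest-based dispatcher proposed in the quoted text could look roughly like the sketch below. This is not the code on the wrapper branch. Every name in it (strcmp_dispatch, the *_standin functions, in_txn_fn, last_variant) is hypothetical, the two variants are stand-ins that just call strcmp, and the transaction check is passed in as a function pointer so the sketch runs on machines without RTM. A real dispatcher would execute the xtest instruction at that point instead.

```c
#include <assert.h>
#include <string.h>

/* Records which stand-in ran last; present only so the sketch is observable. */
enum variant { VARIANT_NONE, VARIANT_AVX2, VARIANT_SSE2 };
static enum variant last_variant = VARIANT_NONE;

/* Hypothetical stand-in for the tuned AVX2 implementation, which ends
   with vzeroupper and would therefore abort an enclosing transaction. */
static int
strcmp_avx2_standin (const char *a, const char *b)
{
  last_variant = VARIANT_AVX2;
  return strcmp (a, b);
}

/* Hypothetical stand-in for the SSE2 implementation, which does not
   need vzeroupper. */
static int
strcmp_sse2_standin (const char *a, const char *b)
{
  last_variant = VARIANT_SSE2;
  return strcmp (a, b);
}

/* Hypothetical predicate type: returns nonzero inside a transaction.
   Assumption: a real dispatcher would use xtest here, which is only
   valid on CPUs that report RTM (or HLE) support. */
typedef int (*in_txn_fn) (void);

static int never_in_txn (void) { return 0; }
static int always_in_txn (void) { return 1; }

/* Intermediate dispatcher: leave both implementations untouched and
   pick one per call based on the transaction check. */
static int
strcmp_dispatch (in_txn_fn in_txn, const char *a, const char *b)
{
  assert (in_txn != NULL);
  if (in_txn ())
    return strcmp_sse2_standin (a, b);
  return strcmp_avx2_standin (a, b);
}
```

Calling strcmp_dispatch (never_in_txn, s1, s2) takes the AVX2 path and strcmp_dispatch (always_in_txn, s1, s2) takes the SSE2 path. The per-call check is the extra cost that the timings above would have to be weighed against.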

Please give my current approach a try.

--
You are receiving this mail because:
You are on the CC list for the bug.