From: "hjl.tools at gmail dot com" <sourceware-bugzilla@sourceware.org>
To: glibc-bugs@sourceware.org
Subject: [Bug string/27457] vzeroupper use in AVX2 multiarch string functions cause HTM aborts
Date: Mon, 01 Mar 2021 14:37:33 +0000
Message-ID: <bug-27457-131-GaRqP7vqgT@http.sourceware.org/bugzilla/>
In-Reply-To: <bug-27457-131@http.sourceware.org/bugzilla/>

https://sourceware.org/bugzilla/show_bug.cgi?id=27457

--- Comment #18 from H.J. Lu <hjl.tools at gmail dot com> ---
(In reply to rguenther from comment #16)
> On Mon, 1 Mar 2021, hjl.tools at gmail dot com wrote:
> 
> > https://sourceware.org/bugzilla/show_bug.cgi?id=27457
> > 
> > --- Comment #15 from H.J. Lu <hjl.tools at gmail dot com> ---
> > (In reply to Richard Biener from comment #14)
> > > 
> > > Note according to Agner vzeroall, for example on Haswell, decodes to
> > > 20 uops while vzeroupper only requires 4.  On Skylake it's even worse
> > > (34 uops).  For short sizes (as in our benchmark which had 16-31 byte
> > > strcmp) this might be a bigger difference than using the SSE2 variant
> > > off an early xtest result.  That said, why not, for HTM + AVX2 CPUs,
> > > have an intermediate dispatcher between the AVX2 and the SSE variant
> > > using xtest?  That leaves the actual implementations unchanged and thus
> > > with known performance characteristic?
> > 
> > It is implemented on users/hjl/pr27457/wrapper branch:
> > 
> > https://gitlab.com/x86-glibc/glibc/-/tree/users/hjl/pr27457/wrapper
> > 
> > There are 2 problems:
> > 
> > 1. Many RTM tests failed for other reasons.
> > 2. Even with vzeroall overhead, AVX version may still be faster than
> > SSE version.
> 
> And the SSE version may still be faster than the AVX version with
> vzeroall.

Here is some data:

Function: strcmp
Variant: default
                                       __strcmp_avx2    __strcmp_sse2_unaligned
     length=14, align1=14, align2=14:        11.36             17.50    
     length=14, align1=14, align2=14:        11.36             15.59    
     length=14, align1=14, align2=14:        11.43             15.55    
     length=15, align1=15, align2=15:        11.36             17.42    
     length=15, align1=15, align2=15:        11.96             17.41    
     length=15, align1=15, align2=15:        11.36             16.97    
     length=16, align1=16, align2=16:        11.36             18.58    
     length=16, align1=16, align2=16:        11.36             17.41    
     length=16, align1=16, align2=16:        11.43             17.34    
     length=17, align1=17, align2=17:        11.36             21.37    
     length=17, align1=17, align2=17:        11.36             18.52    
     length=17, align1=17, align2=17:        11.36             17.94    
     length=18, align1=18, align2=18:        11.36             19.73    
     length=18, align1=18, align2=18:        11.36             19.20    
     length=18, align1=18, align2=18:        11.36             19.13    
     length=19, align1=19, align2=19:        11.36             20.38    
     length=19, align1=19, align2=19:        11.36             19.39    
     length=19, align1=19, align2=19:        11.36             20.39    
     length=20, align1=20, align2=20:        11.36             21.53    
     length=20, align1=20, align2=20:        11.36             20.98    
     length=20, align1=20, align2=20:        11.36             20.93    
     length=21, align1=21, align2=21:        11.36             22.83    
     length=21, align1=21, align2=21:        11.36             22.26    
     length=21, align1=21, align2=21:        11.36             22.25    
     length=22, align1=22, align2=22:        11.43             23.37    
     length=22, align1=22, align2=22:        11.36             22.78    
     length=22, align1=22, align2=22:        12.29             22.12    
     length=23, align1=23, align2=23:        11.36             24.63    
     length=23, align1=23, align2=23:        12.53             23.97    
     length=23, align1=23, align2=23:        11.36             23.97    
     length=24, align1=24, align2=24:        11.36             24.52    
     length=24, align1=24, align2=24:        11.36             43.47    
     length=24, align1=24, align2=24:        11.36             44.47    
     length=25, align1=25, align2=25:        11.36             39.50    
     length=25, align1=25, align2=25:        11.36             48.97    
     length=25, align1=25, align2=25:        11.36             48.53    
     length=26, align1=26, align2=26:        11.36             47.87    
     length=26, align1=26, align2=26:        11.36             47.20    
     length=26, align1=26, align2=26:        11.36             47.15    
     length=27, align1=27, align2=27:        11.36             50.90    
     length=27, align1=27, align2=27:        11.44             49.98    
     length=27, align1=27, align2=27:        11.36             49.77    
     length=28, align1=28, align2=28:        11.36             49.74    
     length=28, align1=28, align2=28:        11.36             48.86    
     length=28, align1=28, align2=28:        11.36             49.08    
     length=29, align1=29, align2=29:        11.36             52.74    
     length=29, align1=29, align2=29:        11.36             54.04    
     length=29, align1=29, align2=29:        11.36             29.49    
     length=30, align1=30, align2=30:        11.36             50.91    
     length=30, align1=30, align2=30:        11.36             51.09    
     length=30, align1=30, align2=30:        11.36             51.13    
     length=31, align1=31, align2=31:        12.36             54.33    
     length=31, align1=31, align2=31:        11.36             53.49    
     length=31, align1=31, align2=31:        11.36             53.29    
       length=16, align1=0, align2=0:        11.36             18.02    
       length=16, align1=0, align2=0:        11.36             18.58    
       length=16, align1=0, align2=0:        11.36             17.34    
       length=16, align1=0, align2=0:        11.44             19.88    
       length=16, align1=0, align2=0:        11.36             16.74    
       length=16, align1=0, align2=0:        11.36             17.42    
       length=16, align1=0, align2=3:        11.36             17.34    
       length=16, align1=3, align2=4:        11.36             17.34    
       length=32, align1=0, align2=0:        12.29             61.07    
       length=32, align1=0, align2=0:        12.63             61.08    
       length=32, align1=0, align2=0:        11.36             60.48    
       length=32, align1=0, align2=0:        11.36             60.48    
       length=32, align1=0, align2=0:        11.36             60.40    
       length=32, align1=0, align2=0:        11.36             60.40    
       length=32, align1=0, align2=4:        11.36             60.40    
       length=32, align1=4, align2=5:        12.10             59.72    

> I guess we should mostly care about optimizing for "modern" CPUs
> which likely means HTM + AVX512 which should be already optimal
> on your branches by using %ymm16+.  So we're talking about
> the "legacy" AVX2 + HTM path.
> 
> And there I think we should optimize the path that is _not_ in
> a transaction since that will be 99% of the cases.  Which to
> me means using the proven tuned (on their respective ISA subsets)
> SSE2 and AVX2 variants and simply switch between them based on
> xtest.  Yeah, so strcmp of a large string inside a transaction

I tried it and got RTM aborts for other reasons.

> might not run at optimal AVX2 speed.  But it will be faster
> than before the xtest dispatch since before that it would have
> aborted the transaction.

Please give my current approach a try.
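
For reference, the xtest-based dispatch suggested in the quoted text could
be sketched roughly as below.  This is only an illustration, not the code
on the users/hjl/pr27457/wrapper branch.  The names strcmp_avx2_standin,
strcmp_sse2_standin, in_transaction, select_strcmp and dispatch_strcmp are
hypothetical, and the two stand-ins just forward to strcmp so the sketch
builds anywhere.  _xtest is only used when the file is built with -mrtm,
and it must then run on a CPU that supports RTM.  The sketch does not
address the RTM aborts for other reasons mentioned above.

```c
#include <string.h>
#if defined(__RTM__)
#include <immintrin.h>   /* _xtest (); available when built with -mrtm */
#endif

typedef int (*strcmp_fn) (const char *, const char *);

/* Hypothetical stand-ins for __strcmp_avx2 and __strcmp_sse2_unaligned.
   Both forward to strcmp so the sketch runs on any machine.  */
static int
strcmp_avx2_standin (const char *a, const char *b)
{
  return strcmp (a, b);
}

static int
strcmp_sse2_standin (const char *a, const char *b)
{
  return strcmp (a, b);
}

/* Return nonzero when executing inside an RTM transaction.  Without
   -mrtm this always reports "not in a transaction" (an assumption made
   so the sketch stays portable).  */
static int
in_transaction (void)
{
#if defined(__RTM__)
  return _xtest ();
#else
  return 0;
#endif
}

/* Pick the variant: the SSE2 one inside a transaction (it contains no
   vzeroupper), the AVX2 one otherwise.  */
static strcmp_fn
select_strcmp (int in_tx)
{
  return in_tx ? strcmp_sse2_standin : strcmp_avx2_standin;
}

/* Intermediate dispatcher: probe once, then call the chosen variant.  */
static int
dispatch_strcmp (const char *a, const char *b)
{
  return select_strcmp (in_transaction ()) (a, b);
}
```

The selection is kept in a separate function taking the probe result as an
argument so that both branches can be exercised without RTM hardware.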
