From: "H.J. Lu" <hjl.tools@gmail.com>
To: libc-alpha@sourceware.org
Subject: [PATCH 0/8] x86-64: Avoid RTM abort inside a RTM region
Date: Fri, 5 Mar 2021 08:53:08 -0800
Message-ID: <20210305165316.323467-1-hjl.tools@gmail.com>

Since VZEROUPPER triggers an RTM abort inside a transactionally executing
RTM region, avoid VZEROUPPER inside an RTM region in string/memory
functions:
1. Turn on Prefer_No_VZEROUPPER for processors with RTM.
2. Select functions optimized with 256-bit EVEX instructions using
YMM16-YMM31 registers, which don't need VZEROUPPER at function exit.
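3. Select AVX optimized string/memory functions with

	xtest
	jz	1f
	vzeroall
	ret
1:
	vzeroupper
	ret

at function exit on processors with RTM, but without 256-bit EVEX
instructions.  XTEST clears ZF inside a transaction, so these functions
execute VZEROALL instead of VZEROUPPER inside an RTM region.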
4. To compare two 32-byte strings, 256-bit EVEX strcmp requires 2 loads,
3 VPCMPs and 2 KORDs, while AVX2 strcmp requires only 1 load, 2 VPCMPEQs,
1 VPMINU and 1 VPMOVMSKB, so AVX2 strcmp is faster than EVEX strcmp.  Add
Prefer_AVX2_STRCMP to prefer the AVX2 strcmp family functions.
5. Add tests to verify that string/memory functions don't cause an RTM
abort inside an RTM region.
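The selection policy in items 1 to 4 can be modeled as a small decision function. This is a minimal sketch, not glibc's ifunc code: the struct, the enum and the function names are hypothetical, and the assumption that the EVEX variants require AVX512VL and AVX512BW is mine (this letter does not list the exact feature bits).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of usable CPU features; glibc keeps these in
   its own cpu_features structure.  */
struct features
{
  bool avx2;
  bool avx512vl;                /* Assumed prerequisite for EVEX.  */
  bool avx512bw;                /* Assumed prerequisite for EVEX.  */
  bool rtm;
  bool prefer_no_vzeroupper;
  bool prefer_avx2_strcmp;
};

/* Hypothetical names for the implementation variants.  */
enum variant { VARIANT_SSE2, VARIANT_AVX2, VARIANT_AVX2_RTM, VARIANT_EVEX };

/* Item 1: processors with RTM get Prefer_No_VZEROUPPER.  */
static void
set_preferences (struct features *f)
{
  if (f->rtm)
    f->prefer_no_vzeroupper = true;
}

static enum variant
select_variant (const struct features *f, bool strcmp_family)
{
  assert (f != NULL);
  if (f->avx2)
    {
      /* Item 2: EVEX code uses YMM16-YMM31 and needs no VZEROUPPER.
         Item 4: the strcmp family may prefer AVX2 instead.  */
      if (f->avx512vl && f->avx512bw
          && !(strcmp_family && f->prefer_avx2_strcmp))
        return VARIANT_EVEX;
      /* Item 3: AVX2 code ending with the xtest/vzeroall sequence.  */
      if (f->rtm)
        return VARIANT_AVX2_RTM;
      /* Plain AVX2 code ends with VZEROUPPER.  */
      if (!f->prefer_no_vzeroupper)
        return VARIANT_AVX2;
    }
  return VARIANT_SSE2;
}
```

The order of the checks encodes the preference: EVEX first because it never needs VZEROUPPER, then the RTM-safe AVX2 variant, and plain AVX2 only when nothing asks to avoid VZEROUPPER.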
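The exit sequence from item 3 can be written with GCC/Clang intrinsics for x86-64. XTEST clears ZF inside a transaction, so `jz 1f` is taken only outside one. This is an illustrative sketch, not the glibc assembly; the helper names are hypothetical, and it checks CPUID for RTM and AVX first because XTEST and VZEROUPPER fault on CPUs that lack them (the glibc *_rtm variants are only selected when RTM is usable).

```c
#include <assert.h>
#include <cpuid.h>
#include <immintrin.h>
#include <stdbool.h>

/* CPUID.(EAX=7,ECX=0):EBX bit 11 reports RTM support.  */
#define CPUID7_EBX_RTM (1u << 11)

static bool
cpu_has_rtm (void)
{
  unsigned int eax, ebx, ecx, edx;
  if (!__get_cpuid_count (7, 0, &eax, &ebx, &ecx, &edx))
    return false;
  return (ebx & CPUID7_EBX_RTM) != 0;
}

/* Hypothetical C rendering of the "xtest; jz 1f; vzeroall" exit.
   Returns 1 if VZEROALL was used (inside a transaction), 0 if
   VZEROUPPER was used.  */
__attribute__ ((target ("rtm,avx")))
static int
zero_upper_rtm_safe (void)
{
  if (_xtest ())
    {
      _mm256_zeroall ();        /* Inside an RTM region: no VZEROUPPER.  */
      return 1;
    }
  _mm256_zeroupper ();          /* Outside a transaction: VZEROUPPER.  */
  return 0;
}

/* Returns -1 when the CPU lacks RTM or AVX, else the result above.  */
static int
try_zero_upper (void)
{
  if (!cpu_has_rtm () || !__builtin_cpu_supports ("avx"))
    return -1;
  return zero_upper_rtm_safe ();
}
```

Called from ordinary code, `try_zero_upper` returns 0 on RTM hardware (no transaction is active) and -1 elsewhere.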
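The AVX2 instruction count in item 4 corresponds roughly to the following intrinsics. This is a sketch, not strcmp-avx2.S: the function names are hypothetical, it assumes both pointers have 32 readable bytes (real strcmp must also handle page crossing), and it falls back to a scalar loop when AVX2 is unavailable. In C both inputs are loaded explicitly; a compiler may fold one load into the VPCMPEQB memory operand, matching the single load counted above.

```c
#include <assert.h>
#include <immintrin.h>

/* Bit i is set when byte i differs between S1 and S2, or is NUL in S1
   (if the bytes are equal, S2 is then NUL too).  */
__attribute__ ((target ("avx2")))
static unsigned int
mismatch_or_nul_mask_avx2 (const unsigned char *s1, const unsigned char *s2)
{
  __m256i a = _mm256_loadu_si256 ((const __m256i *) s1);
  __m256i b = _mm256_loadu_si256 ((const __m256i *) s2);
  __m256i eq = _mm256_cmpeq_epi8 (a, b);   /* VPCMPEQB: 0xff where equal.  */
  __m256i m = _mm256_min_epu8 (a, eq);     /* VPMINUB: 0 at mismatch or NUL.  */
  __m256i z = _mm256_cmpeq_epi8 (m, _mm256_setzero_si256 ()); /* VPCMPEQB.  */
  return (unsigned int) _mm256_movemask_epi8 (z);            /* VPMOVMSKB.  */
}

/* Scalar reference computing the same mask.  */
static unsigned int
mismatch_or_nul_mask_scalar (const unsigned char *s1, const unsigned char *s2)
{
  unsigned int mask = 0;
  for (int i = 0; i < 32; i++)
    if (s1[i] != s2[i] || s1[i] == 0)
      mask |= 1u << i;
  return mask;
}

static unsigned int
mismatch_or_nul_mask (const unsigned char *s1, const unsigned char *s2)
{
  if (__builtin_cpu_supports ("avx2"))
    return mismatch_or_nul_mask_avx2 (s1, s2);
  return mismatch_or_nul_mask_scalar (s1, s2);
}

/* strcmp-style result for one 32-byte block.  *FOUND is 0 when the
   block has neither a mismatch nor a NUL, so the caller must continue.  */
static int
cmp_block32 (const unsigned char *s1, const unsigned char *s2, int *found)
{
  assert (found);
  unsigned int mask = mismatch_or_nul_mask (s1, s2);
  *found = mask != 0;
  if (!*found)
    return 0;
  int i = __builtin_ctz (mask);
  return (int) s1[i] - (int) s2[i];
}
```

The VPMINUB step is what lets one comparison against zero detect both a mismatch and the terminating NUL.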
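The tests in item 5 come down to running a string function inside a transaction and checking that the transaction can commit. The sketch below is a hypothetical harness, not the contents of tst-string-rtm.h. It calls memset through a volatile function pointer so the compiler cannot inline it, makes a first call outside any transaction so page faults on the buffer do not abort the transactions, and only reports how many transactions committed, because interrupts and other events can also abort RTM transactions.

```c
#include <assert.h>
#include <cpuid.h>
#include <immintrin.h>
#include <stdbool.h>
#include <string.h>

/* CPUID.(EAX=7,ECX=0):EBX bit 11 reports RTM support.  */
#define CPUID7_EBX_RTM (1u << 11)

static bool
cpu_has_rtm (void)
{
  unsigned int eax, ebx, ecx, edx;
  if (!__get_cpuid_count (7, 0, &eax, &ebx, &ecx, &edx))
    return false;
  return (ebx & CPUID7_EBX_RTM) != 0;
}

/* A volatile pointer keeps the compiler from inlining memset, so the
   libc implementation selected at run time is the one exercised.  */
static void *(*volatile memset_fn) (void *, int, size_t) = memset;

static char buf[4096];

/* Runs memset inside LOOPS RTM transactions.  Returns how many
   committed, or -1 if the CPU does not report RTM.  */
__attribute__ ((target ("rtm")))
static int
count_committed (int loops)
{
  assert (loops >= 0);
  if (!cpu_has_rtm ())
    return -1;
  /* First call outside any transaction faults the buffer in.  */
  memset_fn (buf, 0, sizeof buf);
  int committed = 0;
  for (int i = 0; i < loops; i++)
    {
      unsigned int status = _xbegin ();
      if (status == _XBEGIN_STARTED)
        {
          memset_fn (buf, i & 0xff, sizeof buf);
          _xend ();
          committed++;
        }
    }
  return committed;
}
```

A memset that executes VZEROUPPER inside the region would abort every transaction, so a committed count stuck at 0 on RTM hardware points at the problem this series addresses.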
H.J. Lu (8):
x86: Set Prefer_No_VZEROUPPER and add Prefer_AVX2_STRCMP
x86-64: Add ifunc-avx2.h functions with 256-bit EVEX
x86-64: Add strcpy family functions with 256-bit EVEX
x86-64: Add memmove family functions with 256-bit EVEX
x86-64: Add memset family functions with 256-bit EVEX
x86-64: Add memcmp family functions with 256-bit EVEX
x86-64: Add AVX optimized string/memory functions for RTM
x86: Add string/memory function tests in RTM region
sysdeps/x86/Makefile | 23 +
sysdeps/x86/cpu-features.c | 20 +-
sysdeps/x86/cpu-tunables.c | 2 +
...cpu-features-preferred_feature_index_1.def | 1 +
sysdeps/x86/tst-memchr-rtm.c | 54 +
sysdeps/x86/tst-memcmp-rtm.c | 52 +
sysdeps/x86/tst-memmove-rtm.c | 53 +
sysdeps/x86/tst-memrchr-rtm.c | 54 +
sysdeps/x86/tst-memset-rtm.c | 45 +
sysdeps/x86/tst-strchr-rtm.c | 54 +
sysdeps/x86/tst-strcpy-rtm.c | 53 +
sysdeps/x86/tst-string-rtm.h | 72 ++
sysdeps/x86/tst-strlen-rtm.c | 53 +
sysdeps/x86/tst-strncmp-rtm.c | 52 +
sysdeps/x86/tst-strrchr-rtm.c | 53 +
sysdeps/x86_64/multiarch/Makefile | 58 +-
sysdeps/x86_64/multiarch/ifunc-avx2.h | 18 +-
sysdeps/x86_64/multiarch/ifunc-impl-list.c | 339 ++++++
sysdeps/x86_64/multiarch/ifunc-memcmp.h | 17 +-
sysdeps/x86_64/multiarch/ifunc-memmove.h | 33 +-
sysdeps/x86_64/multiarch/ifunc-memset.h | 35 +-
sysdeps/x86_64/multiarch/ifunc-strcpy.h | 17 +-
sysdeps/x86_64/multiarch/ifunc-wmemset.h | 18 +-
sysdeps/x86_64/multiarch/memchr-avx2-rtm.S | 12 +
sysdeps/x86_64/multiarch/memchr-avx2.S | 45 +-
sysdeps/x86_64/multiarch/memchr-evex.S | 381 ++++++
.../x86_64/multiarch/memcmp-avx2-movbe-rtm.S | 12 +
sysdeps/x86_64/multiarch/memcmp-avx2-movbe.S | 28 +-
sysdeps/x86_64/multiarch/memcmp-evex-movbe.S | 440 +++++++
.../memmove-avx-unaligned-erms-rtm.S | 17 +
.../multiarch/memmove-evex-unaligned-erms.S | 26 +
.../multiarch/memmove-vec-unaligned-erms.S | 57 +-
sysdeps/x86_64/multiarch/memrchr-avx2-rtm.S | 12 +
sysdeps/x86_64/multiarch/memrchr-avx2.S | 53 +-
sysdeps/x86_64/multiarch/memrchr-evex.S | 337 ++++++
.../memset-avx2-unaligned-erms-rtm.S | 10 +
.../multiarch/memset-avx2-unaligned-erms.S | 12 +-
.../multiarch/memset-evex-unaligned-erms.S | 24 +
.../multiarch/memset-vec-unaligned-erms.S | 61 +-
sysdeps/x86_64/multiarch/rawmemchr-avx2-rtm.S | 4 +
sysdeps/x86_64/multiarch/rawmemchr-evex.S | 4 +
sysdeps/x86_64/multiarch/stpcpy-avx2-rtm.S | 3 +
sysdeps/x86_64/multiarch/stpcpy-evex.S | 3 +
sysdeps/x86_64/multiarch/stpncpy-avx2-rtm.S | 4 +
sysdeps/x86_64/multiarch/stpncpy-evex.S | 4 +
sysdeps/x86_64/multiarch/strcat-avx2-rtm.S | 12 +
sysdeps/x86_64/multiarch/strcat-avx2.S | 6 +-
sysdeps/x86_64/multiarch/strcat-evex.S | 283 +++++
sysdeps/x86_64/multiarch/strchr-avx2-rtm.S | 12 +
sysdeps/x86_64/multiarch/strchr-avx2.S | 28 +-
sysdeps/x86_64/multiarch/strchr-evex.S | 335 ++++++
sysdeps/x86_64/multiarch/strchr.c | 17 +-
sysdeps/x86_64/multiarch/strchrnul-avx2-rtm.S | 3 +
sysdeps/x86_64/multiarch/strchrnul-evex.S | 3 +
sysdeps/x86_64/multiarch/strcmp-avx2-rtm.S | 12 +
sysdeps/x86_64/multiarch/strcmp-avx2.S | 55 +-
sysdeps/x86_64/multiarch/strcmp-evex.S | 1043 +++++++++++++++++
sysdeps/x86_64/multiarch/strcmp.c | 19 +-
sysdeps/x86_64/multiarch/strcpy-avx2-rtm.S | 12 +
sysdeps/x86_64/multiarch/strcpy-avx2.S | 85 +-
sysdeps/x86_64/multiarch/strcpy-evex.S | 1007 ++++++++++++++++
sysdeps/x86_64/multiarch/strlen-avx2-rtm.S | 12 +
sysdeps/x86_64/multiarch/strlen-avx2.S | 43 +-
sysdeps/x86_64/multiarch/strlen-evex.S | 436 +++++++
sysdeps/x86_64/multiarch/strncat-avx2-rtm.S | 3 +
sysdeps/x86_64/multiarch/strncat-evex.S | 3 +
sysdeps/x86_64/multiarch/strncmp-avx2-rtm.S | 3 +
sysdeps/x86_64/multiarch/strncmp-evex.S | 3 +
sysdeps/x86_64/multiarch/strncmp.c | 19 +-
sysdeps/x86_64/multiarch/strncpy-avx2-rtm.S | 3 +
sysdeps/x86_64/multiarch/strncpy-evex.S | 3 +
sysdeps/x86_64/multiarch/strnlen-avx2-rtm.S | 4 +
sysdeps/x86_64/multiarch/strnlen-evex.S | 4 +
sysdeps/x86_64/multiarch/strrchr-avx2-rtm.S | 12 +
sysdeps/x86_64/multiarch/strrchr-avx2.S | 19 +-
sysdeps/x86_64/multiarch/strrchr-evex.S | 265 +++++
sysdeps/x86_64/multiarch/wcschr-avx2-rtm.S | 3 +
sysdeps/x86_64/multiarch/wcschr-evex.S | 3 +
sysdeps/x86_64/multiarch/wcscmp-avx2-rtm.S | 4 +
sysdeps/x86_64/multiarch/wcscmp-evex.S | 4 +
sysdeps/x86_64/multiarch/wcslen-avx2-rtm.S | 4 +
sysdeps/x86_64/multiarch/wcslen-evex.S | 4 +
sysdeps/x86_64/multiarch/wcsncmp-avx2-rtm.S | 5 +
sysdeps/x86_64/multiarch/wcsncmp-evex.S | 5 +
sysdeps/x86_64/multiarch/wcsnlen-avx2-rtm.S | 5 +
sysdeps/x86_64/multiarch/wcsnlen-evex.S | 5 +
sysdeps/x86_64/multiarch/wcsnlen.c | 18 +-
sysdeps/x86_64/multiarch/wcsrchr-avx2-rtm.S | 3 +
sysdeps/x86_64/multiarch/wcsrchr-evex.S | 3 +
sysdeps/x86_64/multiarch/wmemchr-avx2-rtm.S | 4 +
sysdeps/x86_64/multiarch/wmemchr-evex.S | 4 +
.../x86_64/multiarch/wmemcmp-avx2-movbe-rtm.S | 4 +
sysdeps/x86_64/multiarch/wmemcmp-evex-movbe.S | 4 +
sysdeps/x86_64/sysdep.h | 22 +
94 files changed, 6295 insertions(+), 298 deletions(-)
create mode 100644 sysdeps/x86/tst-memchr-rtm.c
create mode 100644 sysdeps/x86/tst-memcmp-rtm.c
create mode 100644 sysdeps/x86/tst-memmove-rtm.c
create mode 100644 sysdeps/x86/tst-memrchr-rtm.c
create mode 100644 sysdeps/x86/tst-memset-rtm.c
create mode 100644 sysdeps/x86/tst-strchr-rtm.c
create mode 100644 sysdeps/x86/tst-strcpy-rtm.c
create mode 100644 sysdeps/x86/tst-string-rtm.h
create mode 100644 sysdeps/x86/tst-strlen-rtm.c
create mode 100644 sysdeps/x86/tst-strncmp-rtm.c
create mode 100644 sysdeps/x86/tst-strrchr-rtm.c
create mode 100644 sysdeps/x86_64/multiarch/memchr-avx2-rtm.S
create mode 100644 sysdeps/x86_64/multiarch/memchr-evex.S
create mode 100644 sysdeps/x86_64/multiarch/memcmp-avx2-movbe-rtm.S
create mode 100644 sysdeps/x86_64/multiarch/memcmp-evex-movbe.S
create mode 100644 sysdeps/x86_64/multiarch/memmove-avx-unaligned-erms-rtm.S
create mode 100644 sysdeps/x86_64/multiarch/memmove-evex-unaligned-erms.S
create mode 100644 sysdeps/x86_64/multiarch/memrchr-avx2-rtm.S
create mode 100644 sysdeps/x86_64/multiarch/memrchr-evex.S
create mode 100644 sysdeps/x86_64/multiarch/memset-avx2-unaligned-erms-rtm.S
create mode 100644 sysdeps/x86_64/multiarch/memset-evex-unaligned-erms.S
create mode 100644 sysdeps/x86_64/multiarch/rawmemchr-avx2-rtm.S
create mode 100644 sysdeps/x86_64/multiarch/rawmemchr-evex.S
create mode 100644 sysdeps/x86_64/multiarch/stpcpy-avx2-rtm.S
create mode 100644 sysdeps/x86_64/multiarch/stpcpy-evex.S
create mode 100644 sysdeps/x86_64/multiarch/stpncpy-avx2-rtm.S
create mode 100644 sysdeps/x86_64/multiarch/stpncpy-evex.S
create mode 100644 sysdeps/x86_64/multiarch/strcat-avx2-rtm.S
create mode 100644 sysdeps/x86_64/multiarch/strcat-evex.S
create mode 100644 sysdeps/x86_64/multiarch/strchr-avx2-rtm.S
create mode 100644 sysdeps/x86_64/multiarch/strchr-evex.S
create mode 100644 sysdeps/x86_64/multiarch/strchrnul-avx2-rtm.S
create mode 100644 sysdeps/x86_64/multiarch/strchrnul-evex.S
create mode 100644 sysdeps/x86_64/multiarch/strcmp-avx2-rtm.S
create mode 100644 sysdeps/x86_64/multiarch/strcmp-evex.S
create mode 100644 sysdeps/x86_64/multiarch/strcpy-avx2-rtm.S
create mode 100644 sysdeps/x86_64/multiarch/strcpy-evex.S
create mode 100644 sysdeps/x86_64/multiarch/strlen-avx2-rtm.S
create mode 100644 sysdeps/x86_64/multiarch/strlen-evex.S
create mode 100644 sysdeps/x86_64/multiarch/strncat-avx2-rtm.S
create mode 100644 sysdeps/x86_64/multiarch/strncat-evex.S
create mode 100644 sysdeps/x86_64/multiarch/strncmp-avx2-rtm.S
create mode 100644 sysdeps/x86_64/multiarch/strncmp-evex.S
create mode 100644 sysdeps/x86_64/multiarch/strncpy-avx2-rtm.S
create mode 100644 sysdeps/x86_64/multiarch/strncpy-evex.S
create mode 100644 sysdeps/x86_64/multiarch/strnlen-avx2-rtm.S
create mode 100644 sysdeps/x86_64/multiarch/strnlen-evex.S
create mode 100644 sysdeps/x86_64/multiarch/strrchr-avx2-rtm.S
create mode 100644 sysdeps/x86_64/multiarch/strrchr-evex.S
create mode 100644 sysdeps/x86_64/multiarch/wcschr-avx2-rtm.S
create mode 100644 sysdeps/x86_64/multiarch/wcschr-evex.S
create mode 100644 sysdeps/x86_64/multiarch/wcscmp-avx2-rtm.S
create mode 100644 sysdeps/x86_64/multiarch/wcscmp-evex.S
create mode 100644 sysdeps/x86_64/multiarch/wcslen-avx2-rtm.S
create mode 100644 sysdeps/x86_64/multiarch/wcslen-evex.S
create mode 100644 sysdeps/x86_64/multiarch/wcsncmp-avx2-rtm.S
create mode 100644 sysdeps/x86_64/multiarch/wcsncmp-evex.S
create mode 100644 sysdeps/x86_64/multiarch/wcsnlen-avx2-rtm.S
create mode 100644 sysdeps/x86_64/multiarch/wcsnlen-evex.S
create mode 100644 sysdeps/x86_64/multiarch/wcsrchr-avx2-rtm.S
create mode 100644 sysdeps/x86_64/multiarch/wcsrchr-evex.S
create mode 100644 sysdeps/x86_64/multiarch/wmemchr-avx2-rtm.S
create mode 100644 sysdeps/x86_64/multiarch/wmemchr-evex.S
create mode 100644 sysdeps/x86_64/multiarch/wmemcmp-avx2-movbe-rtm.S
create mode 100644 sysdeps/x86_64/multiarch/wmemcmp-evex-movbe.S
--
2.29.2