* Re: [PATCH v2 00/10] x86-64: Avoid RTM abort inside a RTM region
From: H.J. Lu @ 2021-03-29 23:06 UTC
To: GNU C Library, Libc-stable Mailing List, Carlos O'Donell,
Florian Weimer
On Wed, Mar 24, 2021 at 11:03 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Mon, Mar 15, 2021 at 7:25 AM H.J. Lu <hjl.tools@gmail.com> wrote:
> >
> > Changes in v2:
> >
> > 1. Don't use YMM2 in EVEX strcpy/strcat.
> > 2. Correct EVEX mempcpy listing.
> > 3. Use ZMM16-ZMM31 in AVX512 memmove/memset family functions.
> >
> > ---
> > Since VZEROUPPER triggers an RTM abort inside a transactionally executing
> > RTM region, avoid VZEROUPPER inside an RTM region in string/memory
> > functions:
> >
> > 1. Turn on Prefer_No_VZEROUPPER for processors with RTM.
> > 2. Select functions optimized with 256-bit EVEX instructions using
> > YMM16-YMM31 registers, which don't need VZEROUPPER at function exit.
> > 3. Select AVX optimized string/memory functions with
> >
> > 	xtest
> > 	jz	1f
> > 	vzeroall
> > 	ret
> > 1:
> > 	vzeroupper
> > 	ret
> >
> > at function exit on processors with RTM, but without 256-bit EVEX
> > instructions.
> > 4. Comparing two 32-byte strings takes 2 loads, 3 VPCMPs and 2 KORDs with
> > 256-bit EVEX strcmp, but only 1 load, 2 VPCMPEQs, 1 VPMINU and
> > 1 VPMOVMSKB with AVX2 strcmp, so AVX2 strcmp is faster than EVEX strcmp.
> > Add Prefer_AVX2_STRCMP to prefer AVX2 strcmp family functions.
> > 5. Add tests to verify that string/memory functions won't cause RTM abort
> > in RTM region.
> > 6. Use ZMM16-ZMM31 in AVX512 memmove/memset family functions.
> >
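The selection order described in items 1-4 above can be sketched in plain C.
This is only an illustration under stated assumptions, not the selector code
from the patches: `struct cpu_summary`, `enum variant`, `select_variant` and
`select_strcmp_variant` are hypothetical names, and the exact feature checks
(for example, requiring both AVX512VL and AVX512BW for the EVEX variants) are
assumptions of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical feature summary.  glibc keeps this information in its own
   cpu_features structure with different names and accessors.  */
struct cpu_summary
{
  bool avx2;                 /* AVX2 usable.  */
  bool avx512vl;             /* 256-bit EVEX encodings usable.  */
  bool avx512bw;             /* Byte/word EVEX compares usable.  */
  bool rtm;                  /* RTM usable.  */
  bool prefer_no_vzeroupper; /* Item 1: set on processors with RTM.  */
  bool prefer_avx2_strcmp;   /* Item 4.  */
};

/* Hypothetical labels for the implementation families.  */
enum variant { VARIANT_SSE2, VARIANT_AVX2, VARIANT_AVX2_RTM, VARIANT_EVEX };

/* Items 1-3: choose a variant for a generic string/memory function.  */
enum variant
select_variant (const struct cpu_summary *cpu)
{
  assert (cpu != NULL);
  if (cpu->avx2)
    {
      /* Item 2: EVEX code uses YMM16-YMM31 and needs no VZEROUPPER.  */
      if (cpu->avx512vl && cpu->avx512bw)
        return VARIANT_EVEX;
      /* Item 3: AVX2 code that tests XTEST before VZEROUPPER.  */
      if (cpu->rtm)
        return VARIANT_AVX2_RTM;
      /* Plain AVX2 code ends with an unconditional VZEROUPPER.  */
      if (!cpu->prefer_no_vzeroupper)
        return VARIANT_AVX2;
    }
  return VARIANT_SSE2;
}

/* Item 4: the strcmp family skips EVEX when AVX2 strcmp is preferred.  */
enum variant
select_strcmp_variant (const struct cpu_summary *cpu)
{
  assert (cpu != NULL);
  if (cpu->avx2)
    {
      if (cpu->avx512vl && cpu->avx512bw && !cpu->prefer_avx2_strcmp)
        return VARIANT_EVEX;
      if (cpu->rtm)
        return VARIANT_AVX2_RTM;
      if (!cpu->prefer_no_vzeroupper)
        return VARIANT_AVX2;
    }
  return VARIANT_SSE2;
}
```

With this ordering, a processor that has RTM and AVX2 but no 256-bit EVEX
support never reaches the plain AVX2 branch, so VZEROUPPER is only executed
behind the XTEST guard.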
> > H.J. Lu (10):
> > x86: Set Prefer_No_VZEROUPPER and add Prefer_AVX2_STRCMP
> > x86-64: Add ifunc-avx2.h functions with 256-bit EVEX
> > x86-64: Add strcpy family functions with 256-bit EVEX
> > x86-64: Add memmove family functions with 256-bit EVEX
> > x86-64: Add memset family functions with 256-bit EVEX
> > x86-64: Add memcmp family functions with 256-bit EVEX
> > x86-64: Add AVX optimized string/memory functions for RTM
> > x86: Add string/memory function tests in RTM region
> > x86-64: Use ZMM16-ZMM31 in AVX512 memset family functions
> > x86-64: Use ZMM16-ZMM31 in AVX512 memmove family functions
> >
> > sysdeps/x86/Makefile | 23 +
> > sysdeps/x86/cpu-features.c | 20 +-
> > sysdeps/x86/cpu-tunables.c | 2 +
> > ...cpu-features-preferred_feature_index_1.def | 1 +
> > sysdeps/x86/tst-memchr-rtm.c | 54 +
> > sysdeps/x86/tst-memcmp-rtm.c | 52 +
> > sysdeps/x86/tst-memmove-rtm.c | 53 +
> > sysdeps/x86/tst-memrchr-rtm.c | 54 +
> > sysdeps/x86/tst-memset-rtm.c | 45 +
> > sysdeps/x86/tst-strchr-rtm.c | 54 +
> > sysdeps/x86/tst-strcpy-rtm.c | 53 +
> > sysdeps/x86/tst-string-rtm.h | 72 ++
> > sysdeps/x86/tst-strlen-rtm.c | 53 +
> > sysdeps/x86/tst-strncmp-rtm.c | 52 +
> > sysdeps/x86/tst-strrchr-rtm.c | 53 +
> > sysdeps/x86_64/multiarch/Makefile | 58 +-
> > sysdeps/x86_64/multiarch/ifunc-avx2.h | 18 +-
> > sysdeps/x86_64/multiarch/ifunc-impl-list.c | 381 +++++-
> > sysdeps/x86_64/multiarch/ifunc-memcmp.h | 17 +-
> > sysdeps/x86_64/multiarch/ifunc-memmove.h | 45 +-
> > sysdeps/x86_64/multiarch/ifunc-memset.h | 49 +-
> > sysdeps/x86_64/multiarch/ifunc-strcpy.h | 17 +-
> > sysdeps/x86_64/multiarch/ifunc-wmemset.h | 22 +-
> > sysdeps/x86_64/multiarch/memchr-avx2-rtm.S | 12 +
> > sysdeps/x86_64/multiarch/memchr-avx2.S | 45 +-
> > sysdeps/x86_64/multiarch/memchr-evex.S | 381 ++++++
> > .../x86_64/multiarch/memcmp-avx2-movbe-rtm.S | 12 +
> > sysdeps/x86_64/multiarch/memcmp-avx2-movbe.S | 28 +-
> > sysdeps/x86_64/multiarch/memcmp-evex-movbe.S | 440 +++++++
> > .../memmove-avx-unaligned-erms-rtm.S | 17 +
> > .../multiarch/memmove-avx512-unaligned-erms.S | 25 +-
> > .../multiarch/memmove-evex-unaligned-erms.S | 33 +
> > .../multiarch/memmove-vec-unaligned-erms.S | 57 +-
> > sysdeps/x86_64/multiarch/memrchr-avx2-rtm.S | 12 +
> > sysdeps/x86_64/multiarch/memrchr-avx2.S | 53 +-
> > sysdeps/x86_64/multiarch/memrchr-evex.S | 337 ++++++
> > .../memset-avx2-unaligned-erms-rtm.S | 10 +
> > .../multiarch/memset-avx2-unaligned-erms.S | 12 +-
> > .../multiarch/memset-avx512-unaligned-erms.S | 16 +-
> > .../multiarch/memset-evex-unaligned-erms.S | 24 +
> > .../multiarch/memset-vec-unaligned-erms.S | 61 +-
> > sysdeps/x86_64/multiarch/rawmemchr-avx2-rtm.S | 4 +
> > sysdeps/x86_64/multiarch/rawmemchr-evex.S | 4 +
> > sysdeps/x86_64/multiarch/stpcpy-avx2-rtm.S | 3 +
> > sysdeps/x86_64/multiarch/stpcpy-evex.S | 3 +
> > sysdeps/x86_64/multiarch/stpncpy-avx2-rtm.S | 4 +
> > sysdeps/x86_64/multiarch/stpncpy-evex.S | 4 +
> > sysdeps/x86_64/multiarch/strcat-avx2-rtm.S | 12 +
> > sysdeps/x86_64/multiarch/strcat-avx2.S | 6 +-
> > sysdeps/x86_64/multiarch/strcat-evex.S | 283 +++++
> > sysdeps/x86_64/multiarch/strchr-avx2-rtm.S | 12 +
> > sysdeps/x86_64/multiarch/strchr-avx2.S | 28 +-
> > sysdeps/x86_64/multiarch/strchr-evex.S | 335 ++++++
> > sysdeps/x86_64/multiarch/strchr.c | 17 +-
> > sysdeps/x86_64/multiarch/strchrnul-avx2-rtm.S | 3 +
> > sysdeps/x86_64/multiarch/strchrnul-evex.S | 3 +
> > sysdeps/x86_64/multiarch/strcmp-avx2-rtm.S | 12 +
> > sysdeps/x86_64/multiarch/strcmp-avx2.S | 55 +-
> > sysdeps/x86_64/multiarch/strcmp-evex.S | 1043 +++++++++++++++++
> > sysdeps/x86_64/multiarch/strcmp.c | 19 +-
> > sysdeps/x86_64/multiarch/strcpy-avx2-rtm.S | 12 +
> > sysdeps/x86_64/multiarch/strcpy-avx2.S | 85 +-
> > sysdeps/x86_64/multiarch/strcpy-evex.S | 1003 ++++++++++++++++
> > sysdeps/x86_64/multiarch/strlen-avx2-rtm.S | 12 +
> > sysdeps/x86_64/multiarch/strlen-avx2.S | 43 +-
> > sysdeps/x86_64/multiarch/strlen-evex.S | 436 +++++++
> > sysdeps/x86_64/multiarch/strncat-avx2-rtm.S | 3 +
> > sysdeps/x86_64/multiarch/strncat-evex.S | 3 +
> > sysdeps/x86_64/multiarch/strncmp-avx2-rtm.S | 3 +
> > sysdeps/x86_64/multiarch/strncmp-evex.S | 3 +
> > sysdeps/x86_64/multiarch/strncmp.c | 19 +-
> > sysdeps/x86_64/multiarch/strncpy-avx2-rtm.S | 3 +
> > sysdeps/x86_64/multiarch/strncpy-evex.S | 3 +
> > sysdeps/x86_64/multiarch/strnlen-avx2-rtm.S | 4 +
> > sysdeps/x86_64/multiarch/strnlen-evex.S | 4 +
> > sysdeps/x86_64/multiarch/strrchr-avx2-rtm.S | 12 +
> > sysdeps/x86_64/multiarch/strrchr-avx2.S | 19 +-
> > sysdeps/x86_64/multiarch/strrchr-evex.S | 265 +++++
> > sysdeps/x86_64/multiarch/wcschr-avx2-rtm.S | 3 +
> > sysdeps/x86_64/multiarch/wcschr-evex.S | 3 +
> > sysdeps/x86_64/multiarch/wcscmp-avx2-rtm.S | 4 +
> > sysdeps/x86_64/multiarch/wcscmp-evex.S | 4 +
> > sysdeps/x86_64/multiarch/wcslen-avx2-rtm.S | 4 +
> > sysdeps/x86_64/multiarch/wcslen-evex.S | 4 +
> > sysdeps/x86_64/multiarch/wcsncmp-avx2-rtm.S | 5 +
> > sysdeps/x86_64/multiarch/wcsncmp-evex.S | 5 +
> > sysdeps/x86_64/multiarch/wcsnlen-avx2-rtm.S | 5 +
> > sysdeps/x86_64/multiarch/wcsnlen-evex.S | 5 +
> > sysdeps/x86_64/multiarch/wcsnlen.c | 18 +-
> > sysdeps/x86_64/multiarch/wcsrchr-avx2-rtm.S | 3 +
> > sysdeps/x86_64/multiarch/wcsrchr-evex.S | 3 +
> > sysdeps/x86_64/multiarch/wmemchr-avx2-rtm.S | 4 +
> > sysdeps/x86_64/multiarch/wmemchr-evex.S | 4 +
> > .../x86_64/multiarch/wmemcmp-avx2-movbe-rtm.S | 4 +
> > sysdeps/x86_64/multiarch/wmemcmp-evex-movbe.S | 4 +
> > sysdeps/x86_64/sysdep.h | 22 +
> > 96 files changed, 6372 insertions(+), 337 deletions(-)
> > create mode 100644 sysdeps/x86/tst-memchr-rtm.c
> > create mode 100644 sysdeps/x86/tst-memcmp-rtm.c
> > create mode 100644 sysdeps/x86/tst-memmove-rtm.c
> > create mode 100644 sysdeps/x86/tst-memrchr-rtm.c
> > create mode 100644 sysdeps/x86/tst-memset-rtm.c
> > create mode 100644 sysdeps/x86/tst-strchr-rtm.c
> > create mode 100644 sysdeps/x86/tst-strcpy-rtm.c
> > create mode 100644 sysdeps/x86/tst-string-rtm.h
> > create mode 100644 sysdeps/x86/tst-strlen-rtm.c
> > create mode 100644 sysdeps/x86/tst-strncmp-rtm.c
> > create mode 100644 sysdeps/x86/tst-strrchr-rtm.c
> > create mode 100644 sysdeps/x86_64/multiarch/memchr-avx2-rtm.S
> > create mode 100644 sysdeps/x86_64/multiarch/memchr-evex.S
> > create mode 100644 sysdeps/x86_64/multiarch/memcmp-avx2-movbe-rtm.S
> > create mode 100644 sysdeps/x86_64/multiarch/memcmp-evex-movbe.S
> > create mode 100644 sysdeps/x86_64/multiarch/memmove-avx-unaligned-erms-rtm.S
> > create mode 100644 sysdeps/x86_64/multiarch/memmove-evex-unaligned-erms.S
> > create mode 100644 sysdeps/x86_64/multiarch/memrchr-avx2-rtm.S
> > create mode 100644 sysdeps/x86_64/multiarch/memrchr-evex.S
> > create mode 100644 sysdeps/x86_64/multiarch/memset-avx2-unaligned-erms-rtm.S
> > create mode 100644 sysdeps/x86_64/multiarch/memset-evex-unaligned-erms.S
> > create mode 100644 sysdeps/x86_64/multiarch/rawmemchr-avx2-rtm.S
> > create mode 100644 sysdeps/x86_64/multiarch/rawmemchr-evex.S
> > create mode 100644 sysdeps/x86_64/multiarch/stpcpy-avx2-rtm.S
> > create mode 100644 sysdeps/x86_64/multiarch/stpcpy-evex.S
> > create mode 100644 sysdeps/x86_64/multiarch/stpncpy-avx2-rtm.S
> > create mode 100644 sysdeps/x86_64/multiarch/stpncpy-evex.S
> > create mode 100644 sysdeps/x86_64/multiarch/strcat-avx2-rtm.S
> > create mode 100644 sysdeps/x86_64/multiarch/strcat-evex.S
> > create mode 100644 sysdeps/x86_64/multiarch/strchr-avx2-rtm.S
> > create mode 100644 sysdeps/x86_64/multiarch/strchr-evex.S
> > create mode 100644 sysdeps/x86_64/multiarch/strchrnul-avx2-rtm.S
> > create mode 100644 sysdeps/x86_64/multiarch/strchrnul-evex.S
> > create mode 100644 sysdeps/x86_64/multiarch/strcmp-avx2-rtm.S
> > create mode 100644 sysdeps/x86_64/multiarch/strcmp-evex.S
> > create mode 100644 sysdeps/x86_64/multiarch/strcpy-avx2-rtm.S
> > create mode 100644 sysdeps/x86_64/multiarch/strcpy-evex.S
> > create mode 100644 sysdeps/x86_64/multiarch/strlen-avx2-rtm.S
> > create mode 100644 sysdeps/x86_64/multiarch/strlen-evex.S
> > create mode 100644 sysdeps/x86_64/multiarch/strncat-avx2-rtm.S
> > create mode 100644 sysdeps/x86_64/multiarch/strncat-evex.S
> > create mode 100644 sysdeps/x86_64/multiarch/strncmp-avx2-rtm.S
> > create mode 100644 sysdeps/x86_64/multiarch/strncmp-evex.S
> > create mode 100644 sysdeps/x86_64/multiarch/strncpy-avx2-rtm.S
> > create mode 100644 sysdeps/x86_64/multiarch/strncpy-evex.S
> > create mode 100644 sysdeps/x86_64/multiarch/strnlen-avx2-rtm.S
> > create mode 100644 sysdeps/x86_64/multiarch/strnlen-evex.S
> > create mode 100644 sysdeps/x86_64/multiarch/strrchr-avx2-rtm.S
> > create mode 100644 sysdeps/x86_64/multiarch/strrchr-evex.S
> > create mode 100644 sysdeps/x86_64/multiarch/wcschr-avx2-rtm.S
> > create mode 100644 sysdeps/x86_64/multiarch/wcschr-evex.S
> > create mode 100644 sysdeps/x86_64/multiarch/wcscmp-avx2-rtm.S
> > create mode 100644 sysdeps/x86_64/multiarch/wcscmp-evex.S
> > create mode 100644 sysdeps/x86_64/multiarch/wcslen-avx2-rtm.S
> > create mode 100644 sysdeps/x86_64/multiarch/wcslen-evex.S
> > create mode 100644 sysdeps/x86_64/multiarch/wcsncmp-avx2-rtm.S
> > create mode 100644 sysdeps/x86_64/multiarch/wcsncmp-evex.S
> > create mode 100644 sysdeps/x86_64/multiarch/wcsnlen-avx2-rtm.S
> > create mode 100644 sysdeps/x86_64/multiarch/wcsnlen-evex.S
> > create mode 100644 sysdeps/x86_64/multiarch/wcsrchr-avx2-rtm.S
> > create mode 100644 sysdeps/x86_64/multiarch/wcsrchr-evex.S
> > create mode 100644 sysdeps/x86_64/multiarch/wmemchr-avx2-rtm.S
> > create mode 100644 sysdeps/x86_64/multiarch/wmemchr-evex.S
> > create mode 100644 sysdeps/x86_64/multiarch/wmemcmp-avx2-movbe-rtm.S
> > create mode 100644 sysdeps/x86_64/multiarch/wmemcmp-evex-movbe.S
> >
> > --
> > 2.30.2
> >
>
> These patches have been tested internally and externally for more than 2
> weeks. I have been running the patched system glibc on AVX and AVX512
> machines. If there are no objections or comments, I will check them in
> next Tuesday.
>
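For readers who want to try the kind of check listed in item 5 above, here is
a rough C sketch of running a string function inside an RTM region and
counting aborts. It is written under assumptions and is not the contents of
tst-string-rtm.h: all function names, the skip code 77 and the 5% abort
tolerance are choices made for this sketch. It needs GCC or Clang; on
non-x86 targets or processors without RTM it reports "skipped".

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#if defined (__x86_64__) || defined (__i386__)
# include <cpuid.h>
# include <immintrin.h>
# define HAVE_X86 1
#else
# define HAVE_X86 0
#endif

/* Hypothetical result codes for this sketch.  */
enum { CHECK_PASS = 0, CHECK_FAIL = 1, CHECK_SKIPPED = 77 };

/* CPUID.(EAX=7,ECX=0):EBX bit 11 enumerates RTM.  */
bool
cpu_has_rtm (void)
{
#if HAVE_X86
  unsigned int eax, ebx, ecx, edx;
  if (__get_cpuid_count (7, 0, &eax, &ebx, &ecx, &edx))
    return (ebx & (1u << 11)) != 0;
#endif
  return false;
}

/* True if NABORTS out of LOOPS stays within MAX_PERCENT.  */
bool
abort_rate_acceptable (unsigned int naborts, unsigned int loops,
                       double max_percent)
{
  assert (loops > 0);
  return 100.0 * naborts / loops <= max_percent;
}

#if HAVE_X86
/* Call FN once outside and once inside a transaction per iteration.  */
static __attribute__ ((target ("rtm"))) int
count_rtm_aborts (unsigned int loops, int (*fn) (void),
                  unsigned int *naborts)
{
  int failed = 0;
  *naborts = 0;
  for (unsigned int i = 0; i < loops; i++)
    {
      failed |= fn ();          /* Warm up outside the RTM region.  */
      if (_xbegin () == _XBEGIN_STARTED)
        {
          failed |= fn ();      /* A VZEROUPPER in here would abort.  */
          _xend ();
        }
      else
        ++*naborts;
    }
  return failed;
}
#endif

/* Run FN under RTM; the 5% tolerance is an assumption of this sketch.  */
int
run_rtm_check (unsigned int loops, int (*fn) (void))
{
  if (!cpu_has_rtm ())
    return CHECK_SKIPPED;
#if HAVE_X86
  unsigned int naborts;
  if (count_rtm_aborts (loops, fn, &naborts) != 0)
    return CHECK_FAIL;
  return abort_rate_acceptable (naborts, loops, 5.0) ? CHECK_PASS : CHECK_FAIL;
#else
  return CHECK_SKIPPED;
#endif
}

/* Example subject: strlen on a 1023-byte string.  */
char test_buf[1024];

void
prepare_buf (void)
{
  memset (test_buf, 'a', sizeof test_buf - 1);
  test_buf[sizeof test_buf - 1] = '\0';
}

int
check_strlen (void)
{
  return strlen (test_buf) == sizeof test_buf - 1 ? 0 : 1;
}
```

A caller would run `prepare_buf ()` first and then
`run_rtm_check (1000, check_strlen)`. A small number of aborts is tolerated
because transactions can abort for reasons unrelated to the function under
test.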
I checked all 10 patches into the master branch. Here are backports for the
2.33 through 2.28 branches:
https://gitlab.com/x86-glibc/glibc/-/commits/users/hjl/pr27457/2.33
https://gitlab.com/x86-glibc/glibc/-/commits/users/hjl/pr27457/2.32
https://gitlab.com/x86-glibc/glibc/-/commits/users/hjl/pr27457/2.31
https://gitlab.com/x86-glibc/glibc/-/commits/users/hjl/pr27457/2.30
https://gitlab.com/x86-glibc/glibc/-/commits/users/hjl/pr27457/2.29
https://gitlab.com/x86-glibc/glibc/-/commits/users/hjl/pr27457/2.28
--
H.J.
* Re: [PATCH v2 00/10] x86-64: Avoid RTM abort inside a RTM region
From: H.J. Lu @ 2022-01-27 17:13 UTC
To: GNU C Library, Libc-stable Mailing List, Carlos O'Donell,
Florian Weimer
On Mon, Mar 29, 2021 at 4:06 PM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> I checked all 10 patches into the master branch. Here are backports for the
> 2.33 through 2.28 branches:
>
> https://gitlab.com/x86-glibc/glibc/-/commits/users/hjl/pr27457/2.33
> https://gitlab.com/x86-glibc/glibc/-/commits/users/hjl/pr27457/2.32
> https://gitlab.com/x86-glibc/glibc/-/commits/users/hjl/pr27457/2.31
> https://gitlab.com/x86-glibc/glibc/-/commits/users/hjl/pr27457/2.30
> https://gitlab.com/x86-glibc/glibc/-/commits/users/hjl/pr27457/2.29
> https://gitlab.com/x86-glibc/glibc/-/commits/users/hjl/pr27457/2.28
>
I am backporting these to release branches.
--
H.J.