public inbox for glibc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug string/27457] New: vzeroupper use in AVX2 multiarch string functions cause HTM aborts
@ 2021-02-22 12:40 rguenth at gcc dot gnu.org
2021-02-22 12:40 ` [Bug string/27457] " rguenth at gcc dot gnu.org
` (40 more replies)
0 siblings, 41 replies; 42+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-02-22 12:40 UTC (permalink / raw)
To: glibc-bugs
https://sourceware.org/bugzilla/show_bug.cgi?id=27457
Bug ID: 27457
Summary: vzeroupper use in AVX2 multiarch string functions
cause HTM aborts
Product: glibc
Version: 2.31
Status: NEW
Severity: normal
Priority: P2
Component: string
Assignee: unassigned at sourceware dot org
Reporter: rguenth at gcc dot gnu.org
Target Milestone: ---
The use of vzeroupper in for example strcmp on a AVX2 capable machine like
Skylake-X causes HTM aborts when used inside transactions. This causes severe
performance degradation for some workloads compared to glibc without those
multiarch implementations.
For one specific benchmark the following hack restores performance (as does
removing the VZEROUPPER or replacing it with the way more costly VZEROALL):
diff --git a/sysdeps/x86_64/multiarch/strcmp-avx2.S
b/sysdeps/x86_64/multiarch/strcmp-avx2.S
index ee82fa3e19..208b396557 100644
--- a/sysdeps/x86_64/multiarch/strcmp-avx2.S
+++ b/sysdeps/x86_64/multiarch/strcmp-avx2.S
@@ -127,7 +127,8 @@ L(return):
movzbl (%rsi, %rdx), %edx
subl %edx, %eax
# endif
- VZEROUPPER
+ vpxor %ymm0, %ymm0, %ymm0
+ vpxor %ymm1, %ymm1, %ymm1
ret
.p2align 4
--
You are receiving this mail because:
You are on the CC list for the bug.
^ permalink raw reply [flat|nested] 42+ messages in thread
* [Bug string/27457] vzeroupper use in AVX2 multiarch string functions cause HTM aborts
2021-02-22 12:40 [Bug string/27457] New: vzeroupper use in AVX2 multiarch string functions cause HTM aborts rguenth at gcc dot gnu.org
@ 2021-02-22 12:40 ` rguenth at gcc dot gnu.org
2021-02-22 14:50 ` matz at suse dot de
` (39 subsequent siblings)
40 siblings, 0 replies; 42+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-02-22 12:40 UTC (permalink / raw)
To: glibc-bugs
https://sourceware.org/bugzilla/show_bug.cgi?id=27457
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |hjl at sourceware dot org
--
You are receiving this mail because:
You are on the CC list for the bug.
^ permalink raw reply [flat|nested] 42+ messages in thread
* [Bug string/27457] vzeroupper use in AVX2 multiarch string functions cause HTM aborts
2021-02-22 12:40 [Bug string/27457] New: vzeroupper use in AVX2 multiarch string functions cause HTM aborts rguenth at gcc dot gnu.org
2021-02-22 12:40 ` [Bug string/27457] " rguenth at gcc dot gnu.org
@ 2021-02-22 14:50 ` matz at suse dot de
2021-02-22 15:00 ` rguenth at gcc dot gnu.org
` (38 subsequent siblings)
40 siblings, 0 replies; 42+ messages in thread
From: matz at suse dot de @ 2021-02-22 14:50 UTC (permalink / raw)
To: glibc-bugs
https://sourceware.org/bugzilla/show_bug.cgi?id=27457
Michael Matz <matz at suse dot de> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |matz at suse dot de
--- Comment #1 from Michael Matz <matz at suse dot de> ---
FWIW, the (proprietary) benchmark regresses by 40% (!) when using the avx2
strcmp
routines, even though the overall runtime of strcmp is only about 1%. So the
transaction aborts caused by vzeroupper here are quite tremendous.
--
You are receiving this mail because:
You are on the CC list for the bug.
^ permalink raw reply [flat|nested] 42+ messages in thread
* [Bug string/27457] vzeroupper use in AVX2 multiarch string functions cause HTM aborts
2021-02-22 12:40 [Bug string/27457] New: vzeroupper use in AVX2 multiarch string functions cause HTM aborts rguenth at gcc dot gnu.org
2021-02-22 12:40 ` [Bug string/27457] " rguenth at gcc dot gnu.org
2021-02-22 14:50 ` matz at suse dot de
@ 2021-02-22 15:00 ` rguenth at gcc dot gnu.org
2021-02-22 15:26 ` hjl.tools at gmail dot com
` (37 subsequent siblings)
40 siblings, 0 replies; 42+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-02-22 15:00 UTC (permalink / raw)
To: glibc-bugs
https://sourceware.org/bugzilla/show_bug.cgi?id=27457
--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
More correct, the wcscmp path ends here with higher %ymm regs used
diff --git a/sysdeps/x86_64/multiarch/strcmp-avx2.S
b/sysdeps/x86_64/multiarch/strcmp-avx2.S
index ee82fa3e19..bd3b6243e2 100644
--- a/sysdeps/x86_64/multiarch/strcmp-avx2.S
+++ b/sysdeps/x86_64/multiarch/strcmp-avx2.S
@@ -122,13 +122,16 @@ L(wcscmp_return):
negl %eax
orl $1, %eax
L(return):
+ VZEROUPPER
+ ret
# else
movzbl (%rdi, %rdx), %eax
movzbl (%rsi, %rdx), %edx
subl %edx, %eax
-# endif
- VZEROUPPER
+ vpxor %ymm0, %ymm0, %ymm0
+ vpxor %ymm1, %ymm1, %ymm1
ret
+# endif
.p2align 4
L(return_vec_size):
--
You are receiving this mail because:
You are on the CC list for the bug.
^ permalink raw reply [flat|nested] 42+ messages in thread
* [Bug string/27457] vzeroupper use in AVX2 multiarch string functions cause HTM aborts
2021-02-22 12:40 [Bug string/27457] New: vzeroupper use in AVX2 multiarch string functions cause HTM aborts rguenth at gcc dot gnu.org
` (2 preceding siblings ...)
2021-02-22 15:00 ` rguenth at gcc dot gnu.org
@ 2021-02-22 15:26 ` hjl.tools at gmail dot com
2021-02-22 15:26 ` hjl.tools at gmail dot com
` (36 subsequent siblings)
40 siblings, 0 replies; 42+ messages in thread
From: hjl.tools at gmail dot com @ 2021-02-22 15:26 UTC (permalink / raw)
To: glibc-bugs
https://sourceware.org/bugzilla/show_bug.cgi?id=27457
H.J. Lu <hjl.tools at gmail dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Assignee|unassigned at sourceware dot org |hjl.tools at gmail dot com
--
You are receiving this mail because:
You are on the CC list for the bug.
^ permalink raw reply [flat|nested] 42+ messages in thread
* [Bug string/27457] vzeroupper use in AVX2 multiarch string functions cause HTM aborts
2021-02-22 12:40 [Bug string/27457] New: vzeroupper use in AVX2 multiarch string functions cause HTM aborts rguenth at gcc dot gnu.org
` (3 preceding siblings ...)
2021-02-22 15:26 ` hjl.tools at gmail dot com
@ 2021-02-22 15:26 ` hjl.tools at gmail dot com
2021-02-22 18:45 ` fweimer at redhat dot com
` (35 subsequent siblings)
40 siblings, 0 replies; 42+ messages in thread
From: hjl.tools at gmail dot com @ 2021-02-22 15:26 UTC (permalink / raw)
To: glibc-bugs
https://sourceware.org/bugzilla/show_bug.cgi?id=27457
H.J. Lu <hjl.tools at gmail dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC|hjl at sourceware dot org |hjl.tools at gmail dot com
--
You are receiving this mail because:
You are on the CC list for the bug.
^ permalink raw reply [flat|nested] 42+ messages in thread
* [Bug string/27457] vzeroupper use in AVX2 multiarch string functions cause HTM aborts
2021-02-22 12:40 [Bug string/27457] New: vzeroupper use in AVX2 multiarch string functions cause HTM aborts rguenth at gcc dot gnu.org
` (4 preceding siblings ...)
2021-02-22 15:26 ` hjl.tools at gmail dot com
@ 2021-02-22 18:45 ` fweimer at redhat dot com
2021-02-23 9:44 ` roman.dementiev at intel dot com
` (34 subsequent siblings)
40 siblings, 0 replies; 42+ messages in thread
From: fweimer at redhat dot com @ 2021-02-22 18:45 UTC (permalink / raw)
To: glibc-bugs
https://sourceware.org/bugzilla/show_bug.cgi?id=27457
Florian Weimer <fweimer at redhat dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |fweimer at redhat dot com
Flags| |security-
--
You are receiving this mail because:
You are on the CC list for the bug.
^ permalink raw reply [flat|nested] 42+ messages in thread
* [Bug string/27457] vzeroupper use in AVX2 multiarch string functions cause HTM aborts
2021-02-22 12:40 [Bug string/27457] New: vzeroupper use in AVX2 multiarch string functions cause HTM aborts rguenth at gcc dot gnu.org
` (5 preceding siblings ...)
2021-02-22 18:45 ` fweimer at redhat dot com
@ 2021-02-23 9:44 ` roman.dementiev at intel dot com
2021-02-27 2:39 ` hjl.tools at gmail dot com
` (33 subsequent siblings)
40 siblings, 0 replies; 42+ messages in thread
From: roman.dementiev at intel dot com @ 2021-02-23 9:44 UTC (permalink / raw)
To: glibc-bugs
https://sourceware.org/bugzilla/show_bug.cgi?id=27457
Roman Dementiev <roman.dementiev at intel dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |roman.dementiev at intel dot com
--
You are receiving this mail because:
You are on the CC list for the bug.
^ permalink raw reply [flat|nested] 42+ messages in thread
* [Bug string/27457] vzeroupper use in AVX2 multiarch string functions cause HTM aborts
2021-02-22 12:40 [Bug string/27457] New: vzeroupper use in AVX2 multiarch string functions cause HTM aborts rguenth at gcc dot gnu.org
` (6 preceding siblings ...)
2021-02-23 9:44 ` roman.dementiev at intel dot com
@ 2021-02-27 2:39 ` hjl.tools at gmail dot com
2021-02-27 7:34 ` rguenther at suse dot de
` (32 subsequent siblings)
40 siblings, 0 replies; 42+ messages in thread
From: hjl.tools at gmail dot com @ 2021-02-27 2:39 UTC (permalink / raw)
To: glibc-bugs
https://sourceware.org/bugzilla/show_bug.cgi?id=27457
H.J. Lu <hjl.tools at gmail dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|--- |2.34
--- Comment #3 from H.J. Lu <hjl.tools at gmail dot com> ---
(In reply to Richard Biener from comment #2)
> More correct, the wcscmp path ends here with higher %ymm regs used
>
> diff --git a/sysdeps/x86_64/multiarch/strcmp-avx2.S
> b/sysdeps/x86_64/multiarch/strcmp-avx2.S
> index ee82fa3e19..bd3b6243e2 100644
> --- a/sysdeps/x86_64/multiarch/strcmp-avx2.S
> +++ b/sysdeps/x86_64/multiarch/strcmp-avx2.S
> @@ -122,13 +122,16 @@ L(wcscmp_return):
> negl %eax
> orl $1, %eax
> L(return):
> + VZEROUPPER
> + ret
> # else
> movzbl (%rdi, %rdx), %eax
> movzbl (%rsi, %rdx), %edx
> subl %edx, %eax
> -# endif
> - VZEROUPPER
> + vpxor %ymm0, %ymm0, %ymm0
> + vpxor %ymm1, %ymm1, %ymm1
> ret
> +# endif
>
> .p2align 4
> L(return_vec_size):
These won't remove AVX-SSE transition penalty. I am re-implementing
all AVX string/memory functions with YMM16-YMM31, which don't need
VZEROUPPER. My current work is on users/hjl/pr27457/evex branch at
https://gitlab.com/x86-glibc/glibc/-/commits/users/hjl/pr27457/evex
--
You are receiving this mail because:
You are on the CC list for the bug.
^ permalink raw reply [flat|nested] 42+ messages in thread
* [Bug string/27457] vzeroupper use in AVX2 multiarch string functions cause HTM aborts
2021-02-22 12:40 [Bug string/27457] New: vzeroupper use in AVX2 multiarch string functions cause HTM aborts rguenth at gcc dot gnu.org
` (7 preceding siblings ...)
2021-02-27 2:39 ` hjl.tools at gmail dot com
@ 2021-02-27 7:34 ` rguenther at suse dot de
2021-02-28 14:53 ` hjl.tools at gmail dot com
` (31 subsequent siblings)
40 siblings, 0 replies; 42+ messages in thread
From: rguenther at suse dot de @ 2021-02-27 7:34 UTC (permalink / raw)
To: glibc-bugs
https://sourceware.org/bugzilla/show_bug.cgi?id=27457
--- Comment #4 from rguenther at suse dot de ---
On February 27, 2021 3:39:50 AM GMT+01:00, "hjl.tools at gmail dot com"
<sourceware-bugzilla@sourceware.org> wrote:
>https://sourceware.org/bugzilla/show_bug.cgi?id=27457
>
>H.J. Lu <hjl.tools at gmail dot com> changed:
>
> What |Removed |Added
>----------------------------------------------------------------------------
> Target Milestone|--- |2.34
>
>--- Comment #3 from H.J. Lu <hjl.tools at gmail dot com> ---
>(In reply to Richard Biener from comment #2)
>> More correct, the wcscmp path ends here with higher %ymm regs used
>>
>> diff --git a/sysdeps/x86_64/multiarch/strcmp-avx2.S
>> b/sysdeps/x86_64/multiarch/strcmp-avx2.S
>> index ee82fa3e19..bd3b6243e2 100644
>> --- a/sysdeps/x86_64/multiarch/strcmp-avx2.S
>> +++ b/sysdeps/x86_64/multiarch/strcmp-avx2.S
>> @@ -122,13 +122,16 @@ L(wcscmp_return):
>> negl %eax
>> orl $1, %eax
>> L(return):
>> + VZEROUPPER
>> + ret
>> # else
>> movzbl (%rdi, %rdx), %eax
>> movzbl (%rsi, %rdx), %edx
>> subl %edx, %eax
>> -# endif
>> - VZEROUPPER
>> + vpxor %ymm0, %ymm0, %ymm0
>> + vpxor %ymm1, %ymm1, %ymm1
>> ret
>> +# endif
>>
>> .p2align 4
>> L(return_vec_size):
>
>These won't remove AVX-SSE transition penalty. I am re-implementing
>all AVX string/memory functions with YMM16-YMM31, which don't need
>VZEROUPPER.
It should still avoid any false dependences. Ymm16 to ymm31 are only available
with AVX512, that will make the AVX2 strong functions unusable on non-avx512
hardware. Are you introducing another set of functions then?
My current work is on users/hjl/pr27457/evex branch at
>
>https://gitlab.com/x86-glibc/glibc/-/commits/users/hjl/pr27457/evex
--
You are receiving this mail because:
You are on the CC list for the bug.
^ permalink raw reply [flat|nested] 42+ messages in thread
* [Bug string/27457] vzeroupper use in AVX2 multiarch string functions cause HTM aborts
2021-02-22 12:40 [Bug string/27457] New: vzeroupper use in AVX2 multiarch string functions cause HTM aborts rguenth at gcc dot gnu.org
` (8 preceding siblings ...)
2021-02-27 7:34 ` rguenther at suse dot de
@ 2021-02-28 14:53 ` hjl.tools at gmail dot com
2021-03-01 11:32 ` fweimer at redhat dot com
` (30 subsequent siblings)
40 siblings, 0 replies; 42+ messages in thread
From: hjl.tools at gmail dot com @ 2021-02-28 14:53 UTC (permalink / raw)
To: glibc-bugs
https://sourceware.org/bugzilla/show_bug.cgi?id=27457
--- Comment #5 from H.J. Lu <hjl.tools at gmail dot com> ---
(In reply to rguenther from comment #4)
>
> It should still avoid any false dependences. Ymm16 to ymm31 are only
> available with AVX512, that will make the AVX2 strong functions unusable on
> non-avx512 hardware. Are you introducing another set of functions then?
>
> My current work is on users/hjl/pr27457/evex branch at
> >
> >https://gitlab.com/x86-glibc/glibc/-/commits/users/hjl/pr27457/evex
I added another set of AVX/AVX2 functions to support RTM. You can use
libcpu-rt-c.so:
https://gitlab.com/cpu-rt/glibc/-/wikis/libcpu-rt-c.so-to-avoid-vzeroupper-in-RTM-region
to try it out.
--
You are receiving this mail because:
You are on the CC list for the bug.
^ permalink raw reply [flat|nested] 42+ messages in thread
* [Bug string/27457] vzeroupper use in AVX2 multiarch string functions cause HTM aborts
2021-02-22 12:40 [Bug string/27457] New: vzeroupper use in AVX2 multiarch string functions cause HTM aborts rguenth at gcc dot gnu.org
` (9 preceding siblings ...)
2021-02-28 14:53 ` hjl.tools at gmail dot com
@ 2021-03-01 11:32 ` fweimer at redhat dot com
2021-03-01 12:24 ` mliska at suse dot cz
` (29 subsequent siblings)
40 siblings, 0 replies; 42+ messages in thread
From: fweimer at redhat dot com @ 2021-03-01 11:32 UTC (permalink / raw)
To: glibc-bugs
https://sourceware.org/bugzilla/show_bug.cgi?id=27457
--- Comment #6 from Florian Weimer <fweimer at redhat dot com> ---
(In reply to H.J. Lu from comment #5)
> I added another set of AVX/AVX2 functions to support RTM. You can use
> libcpu-rt-c.so:
>
> https://gitlab.com/cpu-rt/glibc/-/wikis/libcpu-rt-c.so-to-avoid-vzeroupper-
> in-RTM-region
>
> to try it out.
The sources do not seem to show the selection logic.
__memset_avx2_unaligned_rtm seems to simply have dropped the VZEROUPPER
instruction, which can't be good for general application performance.
Anyway, is there anything we can realistically do here on CPUs with AVX2, but
without AVX-512? VPXOR doesn't avoid the transition penalty according to the
Intel documentation. Switching to 128-bit registers once the CPU supports RTM
is also likely to introduce performance regressions in non-transactional code.
--
You are receiving this mail because:
You are on the CC list for the bug.
^ permalink raw reply [flat|nested] 42+ messages in thread
* [Bug string/27457] vzeroupper use in AVX2 multiarch string functions cause HTM aborts
2021-02-22 12:40 [Bug string/27457] New: vzeroupper use in AVX2 multiarch string functions cause HTM aborts rguenth at gcc dot gnu.org
` (10 preceding siblings ...)
2021-03-01 11:32 ` fweimer at redhat dot com
@ 2021-03-01 12:24 ` mliska at suse dot cz
2021-03-01 12:47 ` rguenther at suse dot de
` (28 subsequent siblings)
40 siblings, 0 replies; 42+ messages in thread
From: mliska at suse dot cz @ 2021-03-01 12:24 UTC (permalink / raw)
To: glibc-bugs
https://sourceware.org/bugzilla/show_bug.cgi?id=27457
Martin Liska <mliska at suse dot cz> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |mliska at suse dot cz
--
You are receiving this mail because:
You are on the CC list for the bug.
^ permalink raw reply [flat|nested] 42+ messages in thread
* [Bug string/27457] vzeroupper use in AVX2 multiarch string functions cause HTM aborts
2021-02-22 12:40 [Bug string/27457] New: vzeroupper use in AVX2 multiarch string functions cause HTM aborts rguenth at gcc dot gnu.org
` (11 preceding siblings ...)
2021-03-01 12:24 ` mliska at suse dot cz
@ 2021-03-01 12:47 ` rguenther at suse dot de
2021-03-01 13:13 ` roman.dementiev at intel dot com
` (27 subsequent siblings)
40 siblings, 0 replies; 42+ messages in thread
From: rguenther at suse dot de @ 2021-03-01 12:47 UTC (permalink / raw)
To: glibc-bugs
https://sourceware.org/bugzilla/show_bug.cgi?id=27457
--- Comment #7 from rguenther at suse dot de ---
On Mon, 1 Mar 2021, fweimer at redhat dot com wrote:
> https://sourceware.org/bugzilla/show_bug.cgi?id=27457
>
> --- Comment #6 from Florian Weimer <fweimer at redhat dot com> ---
> (In reply to H.J. Lu from comment #5)
> > I added another set of AVX/AVX2 functions to support RTM. You can use
> > libcpu-rt-c.so:
> >
> > https://gitlab.com/cpu-rt/glibc/-/wikis/libcpu-rt-c.so-to-avoid-vzeroupper-
> > in-RTM-region
> >
> > to try it out.
>
> The sources do not seem to show the selection logic.
> __memset_avx2_unaligned_rtm seems to simply have dropped the VZEROUPPER
> instruction, which can't be good for general application performance.
The intent was to use %ymm16+ only which does not cause any transition
penalty even w/o vzeroupper.
> Anyway, is there anything we can realistically do here on CPUs with AVX2, but
> without AVX-512? VPXOR doesn't avoid the transition penalty according to the
> Intel documentation. Switching to 128-bit registers once the CPU supports RTM
> is also likely to introduce performance regressions in non-transactional code.
There's the option to use another path when HTM is available, doing
xtest and branch to the non-AVX variants when a transaction is active.
I'm not sure about the overhead of xtest here. Nor am I sure whether
Intel has (or plans any) SKUs with HTM but not AVX512. Since HTM
on broadwell + haswell is crippled due to bugs (and thus usually
disabled in firmware) this leaves Skylake and later where I don't
know of any HTM but no-AVX512 SKUs. But this is Intel, so ...
--
You are receiving this mail because:
You are on the CC list for the bug.
^ permalink raw reply [flat|nested] 42+ messages in thread
* [Bug string/27457] vzeroupper use in AVX2 multiarch string functions cause HTM aborts
2021-02-22 12:40 [Bug string/27457] New: vzeroupper use in AVX2 multiarch string functions cause HTM aborts rguenth at gcc dot gnu.org
` (12 preceding siblings ...)
2021-03-01 12:47 ` rguenther at suse dot de
@ 2021-03-01 13:13 ` roman.dementiev at intel dot com
2021-03-01 13:19 ` fweimer at redhat dot com
` (26 subsequent siblings)
40 siblings, 0 replies; 42+ messages in thread
From: roman.dementiev at intel dot com @ 2021-03-01 13:13 UTC (permalink / raw)
To: glibc-bugs
https://sourceware.org/bugzilla/show_bug.cgi?id=27457
--- Comment #8 from Roman Dementiev <roman.dementiev at intel dot com> ---
xtest has a few cycles latency.
RTM is not disabled in firmware by default on HSX-EX, BDW-EP, BDW-EX, SKX, CLX,
CPX (server SKUs). This is different on client SKUs.
Examples of RTM but no-AVX512 SKUs: HSX-EX, BDW-EP, BDW-EX.
--
Roman
--
You are receiving this mail because:
You are on the CC list for the bug.
^ permalink raw reply [flat|nested] 42+ messages in thread
* [Bug string/27457] vzeroupper use in AVX2 multiarch string functions cause HTM aborts
2021-02-22 12:40 [Bug string/27457] New: vzeroupper use in AVX2 multiarch string functions cause HTM aborts rguenth at gcc dot gnu.org
` (13 preceding siblings ...)
2021-03-01 13:13 ` roman.dementiev at intel dot com
@ 2021-03-01 13:19 ` fweimer at redhat dot com
2021-03-01 13:21 ` hjl.tools at gmail dot com
` (25 subsequent siblings)
40 siblings, 0 replies; 42+ messages in thread
From: fweimer at redhat dot com @ 2021-03-01 13:19 UTC (permalink / raw)
To: glibc-bugs
https://sourceware.org/bugzilla/show_bug.cgi?id=27457
--- Comment #9 from Florian Weimer <fweimer at redhat dot com> ---
(In reply to rguenther from comment #7)
> On Mon, 1 Mar 2021, fweimer at redhat dot com wrote:
> > The sources do not seem to show the selection logic.
> > __memset_avx2_unaligned_rtm seems to simply have dropped the VZEROUPPER
> > instruction, which can't be good for general application performance.
>
> The intent was to use %ymm16+ only which does not cause any transition
> penalty even w/o vzeroupper.
I still saw %ymm0 usage in the disassembly, if I recall correctly. And for
AVX2, there isn't much choice. I didn't try to reverse-engineer the
corresponding IFUNC selector.
> There's the option to use another path when HTM is available, doing
> xtest and branch to the non-AVX variants when a transaction is active.
> I'm not sure about the overhead of xtest here. Nor am I sure whether
> Intel has (or plans any) SKUs with HTM but not AVX512. Since HTM
> on broadwell + haswell is crippled due to bugs (and thus usually
> disabled in firmware) this leaves Skylake and later where I don't
> know of any HTM but no-AVX512 SKUs. But this is Intel, so ...
Skylake is an overloaded term. There is e.g. Xeon E3-1240 v5 which is
advertised as having the Skylake microarchitecture, but it doesn't have
AVX-512. (I haven't checked RTM status, but I don't see why it wouldn't be
supported.) Xeon E3-1240 v6 is Kaby Lake, but it doesn't have AVX-512 either,
and according to what I see, RTM is not disabled by at least one current
distribution kernel/microcode combination.
But I think the gist is that RTM without AVX-512 exists out there even in
server parts.
--
You are receiving this mail because:
You are on the CC list for the bug.
^ permalink raw reply [flat|nested] 42+ messages in thread
* [Bug string/27457] vzeroupper use in AVX2 multiarch string functions cause HTM aborts
2021-02-22 12:40 [Bug string/27457] New: vzeroupper use in AVX2 multiarch string functions cause HTM aborts rguenth at gcc dot gnu.org
` (14 preceding siblings ...)
2021-03-01 13:19 ` fweimer at redhat dot com
@ 2021-03-01 13:21 ` hjl.tools at gmail dot com
2021-03-01 13:24 ` hjl.tools at gmail dot com
` (24 subsequent siblings)
40 siblings, 0 replies; 42+ messages in thread
From: hjl.tools at gmail dot com @ 2021-03-01 13:21 UTC (permalink / raw)
To: glibc-bugs
https://sourceware.org/bugzilla/show_bug.cgi?id=27457
--- Comment #10 from H.J. Lu <hjl.tools at gmail dot com> ---
(In reply to Florian Weimer from comment #6)
> (In reply to H.J. Lu from comment #5)
> > I added another set of AVX/AVX2 functions to support RTM. You can use
> > libcpu-rt-c.so:
> >
> > https://gitlab.com/cpu-rt/glibc/-/wikis/libcpu-rt-c.so-to-avoid-vzeroupper-
> > in-RTM-region
> >
> > to try it out.
>
> The sources do not seem to show the selection logic.
> __memset_avx2_unaligned_rtm seems to simply have dropped the VZEROUPPER
> instruction, which can't be good for general application performance.
memset-avx2-unaligned-erms-rtm.S has
#define ZERO_UPPER_VEC_REGISTERS_RETURN \
ZERO_UPPER_VEC_REGISTERS_RETURN_XTEST
0000000000000040 <__memset_avx2_unaligned_rtm>:
40: f3 0f 1e fa endbr64
44: c5 f9 6e c6 vmovd %esi,%xmm0
48: 48 89 f8 mov %rdi,%rax
4b: c4 e2 7d 78 c0 vpbroadcastb %xmm0,%ymm0
50: 48 83 fa 20 cmp $0x20,%rdx
54: 0f 82 0b 01 00 00 jb 165
<__memset_avx2_unaligned_erms_rtm+0xc5>
5a: 48 83 fa 40 cmp $0x40,%rdx
5e: 77 75 ja d5
<__memset_avx2_unaligned_erms_rtm+0x35>
60: c5 fe 7f 44 17 e0 vmovdqu %ymm0,-0x20(%rdi,%rdx,1)
66: c5 fe 7f 07 vmovdqu %ymm0,(%rdi)
6a: e9 84 00 00 00 jmp f3
<__memset_avx2_unaligned_erms_rtm+0x53>
6f: 90 nop
,,,
00000000000000a0 <__memset_avx2_unaligned_erms_rtm>:
a0: f3 0f 1e fa endbr64
a4: c5 f9 6e c6 vmovd %esi,%xmm0
a8: 48 89 f8 mov %rdi,%rax
ab: c4 e2 7d 78 c0 vpbroadcastb %xmm0,%ymm0
b0: 48 83 fa 20 cmp $0x20,%rdx
b4: 0f 82 ab 00 00 00 jb 165
<__memset_avx2_unaligned_erms_rtm+0xc5>
ba: 48 83 fa 40 cmp $0x40,%rdx
be: 77 0c ja cc
<__memset_avx2_unaligned_erms_rtm+0x2c>
c0: c5 fe 7f 44 17 e0 vmovdqu %ymm0,-0x20(%rdi,%rdx,1)
c6: c5 fe 7f 07 vmovdqu %ymm0,(%rdi)
ca: eb 27 jmp f3
<__memset_avx2_unaligned_erms_rtm+0x53>
cc: 48 3b 15 00 00 00 00 cmp 0x0(%rip),%rdx # d3
<__memset_avx2_unaligned_erms_rtm+0x33>
d3: 77 9f ja 74 <__memset_avx2_erms_rtm+0x4>
d5: 48 81 fa 80 00 00 00 cmp $0x80,%rdx
dc: 77 22 ja 100
<__memset_avx2_unaligned_erms_rtm+0x60>
de: c5 fe 7f 07 vmovdqu %ymm0,(%rdi)
e2: c5 fe 7f 47 20 vmovdqu %ymm0,0x20(%rdi)
e7: c5 fe 7f 44 17 e0 vmovdqu %ymm0,-0x20(%rdi,%rdx,1)
ed: c5 fe 7f 44 17 c0 vmovdqu %ymm0,-0x40(%rdi,%rdx,1)
f3: 0f 01 d6 xtest
f6: 74 04 je fc
<__memset_avx2_unaligned_erms_rtm+0x5c>
f8: c5 fc 77 vzeroall
fb: c3 ret
fc: c5 f8 77 vzeroupper
ff: c3 ret
--
You are receiving this mail because:
You are on the CC list for the bug.
^ permalink raw reply [flat|nested] 42+ messages in thread
* [Bug string/27457] vzeroupper use in AVX2 multiarch string functions cause HTM aborts
2021-02-22 12:40 [Bug string/27457] New: vzeroupper use in AVX2 multiarch string functions cause HTM aborts rguenth at gcc dot gnu.org
` (15 preceding siblings ...)
2021-03-01 13:21 ` hjl.tools at gmail dot com
@ 2021-03-01 13:24 ` hjl.tools at gmail dot com
2021-03-01 13:27 ` hjl.tools at gmail dot com
` (23 subsequent siblings)
40 siblings, 0 replies; 42+ messages in thread
From: hjl.tools at gmail dot com @ 2021-03-01 13:24 UTC (permalink / raw)
To: glibc-bugs
https://sourceware.org/bugzilla/show_bug.cgi?id=27457
--- Comment #11 from H.J. Lu <hjl.tools at gmail dot com> ---
(In reply to Florian Weimer from comment #9)
> (In reply to rguenther from comment #7)
> > On Mon, 1 Mar 2021, fweimer at redhat dot com wrote:
> > > The sources do not seem to show the selection logic.
> > > __memset_avx2_unaligned_rtm seems to simply have dropped the VZEROUPPER
> > > instruction, which can't be good for general application performance.
> >
> > The intent was to use %ymm16+ only which does not cause any transition
> > penalty even w/o vzeroupper.
>
> I still saw %ymm0 usage in the disassembly, if I recall correctly. And for
> AVX2, there isn't much choice. I didn't try to reverse-engineer the
> corresponding IFUNC selector.
At function exit, there is
f3: 0f 01 d6 xtest
f6: 74 04 je fc
<__memset_avx2_unaligned_erms_rtm+0x5c>
f8: c5 fc 77 vzeroall
fb: c3 ret
fc: c5 f8 77 vzeroupper
ff: c3 ret
> > There's the option to use another path when HTM is available, doing
> > xtest and branch to the non-AVX variants when a transaction is active.
> > I'm not sure about the overhead of xtest here. Nor am I sure whether
> > Intel has (or plans any) SKUs with HTM but not AVX512. Since HTM
> > on broadwell + haswell is crippled due to bugs (and thus usually
> > disabled in firmware) this leaves Skylake and later where I don't
> > know of any HTM but no-AVX512 SKUs. But this is Intel, so ...
>
> Skylake is an overloaded term. There is e.g. Xeon E3-1240 v5 which is
> advertised as having the Skylake microarchitecture, but it doesn't have
> AVX-512. (I haven't checked RTM status, but I don't see why it wouldn't be
> supported.) Xeon E3-1240 v6 is Kaby Lake, but it doesn't have AVX-512
> either, and according to what I see, RTM is not disabled by at least one
> current distribution kernel/microcode combination.
>
> But I think the gist is that RTM without AVX-512 exists out there even in
> server parts.
It is handed by
#define ZERO_UPPER_VEC_REGISTERS_RETURN \
ZERO_UPPER_VEC_REGISTERS_RETURN_XTEST
--
You are receiving this mail because:
You are on the CC list for the bug.
^ permalink raw reply [flat|nested] 42+ messages in thread
* [Bug string/27457] vzeroupper use in AVX2 multiarch string functions cause HTM aborts
2021-02-22 12:40 [Bug string/27457] New: vzeroupper use in AVX2 multiarch string functions cause HTM aborts rguenth at gcc dot gnu.org
` (16 preceding siblings ...)
2021-03-01 13:24 ` hjl.tools at gmail dot com
@ 2021-03-01 13:27 ` hjl.tools at gmail dot com
2021-03-01 13:29 ` hjl.tools at gmail dot com
` (22 subsequent siblings)
40 siblings, 0 replies; 42+ messages in thread
From: hjl.tools at gmail dot com @ 2021-03-01 13:27 UTC (permalink / raw)
To: glibc-bugs
https://sourceware.org/bugzilla/show_bug.cgi?id=27457
--- Comment #12 from H.J. Lu <hjl.tools at gmail dot com> ---
(In reply to Florian Weimer from comment #9)
> (In reply to rguenther from comment #7)
> > On Mon, 1 Mar 2021, fweimer at redhat dot com wrote:
> > > The sources do not seem to show the selection logic.
> > > __memset_avx2_unaligned_rtm seems to simply have dropped the VZEROUPPER
> > > instruction, which can't be good for general application performance.
> >
> > The intent was to use %ymm16+ only which does not cause any transition
> > penalty even w/o vzeroupper.
>
> I still saw %ymm0 usage in the disassembly, if I recall correctly. And for
> AVX2, there isn't much choice. I didn't try to reverse-engineer the
> corresponding IFUNC selector.
>
memset-evex-unaligned-erms.S has
# define VEC_SIZE 32
# define XMM0 xmm16
# define YMM0 ymm16
# define VEC0 ymm16
# define VEC(i) VEC##i
# define VMOVU vmovdqu64
# define VMOVA vmovdqa64
# define VZEROUPPER
There are no %ymm0 nor vzeroupper.
--
You are receiving this mail because:
You are on the CC list for the bug.
^ permalink raw reply [flat|nested] 42+ messages in thread
* [Bug string/27457] vzeroupper use in AVX2 multiarch string functions cause HTM aborts
2021-02-22 12:40 [Bug string/27457] New: vzeroupper use in AVX2 multiarch string functions cause HTM aborts rguenth at gcc dot gnu.org
` (17 preceding siblings ...)
2021-03-01 13:27 ` hjl.tools at gmail dot com
@ 2021-03-01 13:29 ` hjl.tools at gmail dot com
2021-03-01 13:44 ` rguenth at gcc dot gnu.org
` (21 subsequent siblings)
40 siblings, 0 replies; 42+ messages in thread
From: hjl.tools at gmail dot com @ 2021-03-01 13:29 UTC (permalink / raw)
To: glibc-bugs
https://sourceware.org/bugzilla/show_bug.cgi?id=27457
--- Comment #13 from H.J. Lu <hjl.tools at gmail dot com> ---
(In reply to rguenther from comment #7)
> On Mon, 1 Mar 2021, fweimer at redhat dot com wrote:
>
> There's the option to use another path when HTM is available, doing
> xtest and branch to the non-AVX variants when a transaction is active.
> I'm not sure about the overhead of xtest here. Nor am I sure whether
The overhead is low. Can you verify it?
--
You are receiving this mail because:
You are on the CC list for the bug.
^ permalink raw reply [flat|nested] 42+ messages in thread
* [Bug string/27457] vzeroupper use in AVX2 multiarch string functions cause HTM aborts
2021-02-22 12:40 [Bug string/27457] New: vzeroupper use in AVX2 multiarch string functions cause HTM aborts rguenth at gcc dot gnu.org
` (18 preceding siblings ...)
2021-03-01 13:29 ` hjl.tools at gmail dot com
@ 2021-03-01 13:44 ` rguenth at gcc dot gnu.org
2021-03-01 14:05 ` hjl.tools at gmail dot com
` (20 subsequent siblings)
40 siblings, 0 replies; 42+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-03-01 13:44 UTC (permalink / raw)
To: glibc-bugs
https://sourceware.org/bugzilla/show_bug.cgi?id=27457
--- Comment #14 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to H.J. Lu from comment #11)
> (In reply to Florian Weimer from comment #9)
> > (In reply to rguenther from comment #7)
> > > On Mon, 1 Mar 2021, fweimer at redhat dot com wrote:
> > > > The sources do not seem to show the selection logic.
> > > > __memset_avx2_unaligned_rtm seems to simply have dropped the VZEROUPPER
> > > > instruction, which can't be good for general application performance.
> > >
> > > The intent was to use %ymm16+ only which does not cause any transition
> > > penalty even w/o vzeroupper.
> >
> > I still saw %ymm0 usage in the disassembly, if I recall correctly. And for
> > AVX2, there isn't much choice. I didn't try to reverse-engineer the
> > corresponding IFUNC selector.
>
> At function exit, there is
>
> f3: 0f 01 d6 xtest
> f6: 74 04 je fc <__memset_avx2_unaligned_erms_rtm+0x5c>
> f8: c5 fc 77 vzeroall
> fb: c3 ret
> fc: c5 f8 77 vzeroupper
> ff: c3 ret
Note according to Agner vzeroall, for example on Haswell, decodes to
20 uops while vzeroupper only requires 4. On Skylake it's even worse
(34 uops). For short sizes (as in our benchmark which had 16-31 byte
strcmp) this might be a bigger difference than using the SSE2 variant
off an early xtest result. That said, why not, for HTM + AVX2 CPUs,
have an intermediate dispatcher between the AVX2 and the SSE variant
using xtest? That leaves the actual implementations unchanged and thus
with known performance characteristic?
--
You are receiving this mail because:
You are on the CC list for the bug.
^ permalink raw reply [flat|nested] 42+ messages in thread
* [Bug string/27457] vzeroupper use in AVX2 multiarch string functions cause HTM aborts
2021-02-22 12:40 [Bug string/27457] New: vzeroupper use in AVX2 multiarch string functions cause HTM aborts rguenth at gcc dot gnu.org
` (19 preceding siblings ...)
2021-03-01 13:44 ` rguenth at gcc dot gnu.org
@ 2021-03-01 14:05 ` hjl.tools at gmail dot com
2021-03-01 14:14 ` rguenther at suse dot de
` (19 subsequent siblings)
40 siblings, 0 replies; 42+ messages in thread
From: hjl.tools at gmail dot com @ 2021-03-01 14:05 UTC (permalink / raw)
To: glibc-bugs
https://sourceware.org/bugzilla/show_bug.cgi?id=27457
--- Comment #15 from H.J. Lu <hjl.tools at gmail dot com> ---
(In reply to Richard Biener from comment #14)
>
> Note according to Agner vzeroall, for example on Haswell, decodes to
> 20 uops while vzeroupper only requires 4. On Skylake it's even worse
> (34 uops). For short sizes (as in our benchmark which had 16-31 byte
> strcmp) this might be a bigger difference than using the SSE2 variant
> off an early xtest result. That said, why not, for HTM + AVX2 CPUs,
> have an intermediate dispatcher between the AVX2 and the SSE variant
> using xtest? That leaves the actual implementations unchanged and thus
> with known performance characteristic?
It is implemented on users/hjl/pr27457/wrapper branch:
https://gitlab.com/x86-glibc/glibc/-/tree/users/hjl/pr27457/wrapper
There are 2 problems:
1. Many RTM tests failed for other reasons.
2. Even with vzeroall overhead, AVX version may still be faster than
SSE version.
--
You are receiving this mail because:
You are on the CC list for the bug.
^ permalink raw reply [flat|nested] 42+ messages in thread
* [Bug string/27457] vzeroupper use in AVX2 multiarch string functions cause HTM aborts
2021-02-22 12:40 [Bug string/27457] New: vzeroupper use in AVX2 multiarch string functions cause HTM aborts rguenth at gcc dot gnu.org
` (20 preceding siblings ...)
2021-03-01 14:05 ` hjl.tools at gmail dot com
@ 2021-03-01 14:14 ` rguenther at suse dot de
2021-03-01 14:25 ` rguenth at gcc dot gnu.org
` (18 subsequent siblings)
40 siblings, 0 replies; 42+ messages in thread
From: rguenther at suse dot de @ 2021-03-01 14:14 UTC (permalink / raw)
To: glibc-bugs
https://sourceware.org/bugzilla/show_bug.cgi?id=27457
--- Comment #16 from rguenther at suse dot de ---
On Mon, 1 Mar 2021, hjl.tools at gmail dot com wrote:
> https://sourceware.org/bugzilla/show_bug.cgi?id=27457
>
> --- Comment #15 from H.J. Lu <hjl.tools at gmail dot com> ---
> (In reply to Richard Biener from comment #14)
> >
> > Note according to Agner vzeroall, for example on Haswell, decodes to
> > 20 uops while vzeroupper only requires 4. On Skylake it's even worse
> > (34 uops). For short sizes (as in our benchmark which had 16-31 byte
> > strcmp) this might be a bigger difference than using the SSE2 variant
> > off an early xtest result. That said, why not, for HTM + AVX2 CPUs,
> > have an intermediate dispatcher between the AVX2 and the SSE variant
> > using xtest? That leaves the actual implementations unchanged and thus
> > with known performance characteristic?
>
> It is implemented on users/hjl/pr27457/wrapper branch:
>
> https://gitlab.com/x86-glibc/glibc/-/tree/users/hjl/pr27457/wrapper
>
> There are 2 problems:
>
> 1. Many RTM tests failed for other reasons.
> 2. Even with vzeroall overhead, AVX version may still be faster than
> SSE version.
And the SSE version may still be faster than the AVX version with
vzeroall.
I guess we should mostly care about optimizing for "modern" CPUs
which likely means HTM + AVX512 which should be already optimal
on your branches by using %ymm16+. So we're talking about
the "legacy" AVX2 + HTM path.
And there I think we should optimize the path that is _not_ in
a transaction since that will be 99% of the cases. Which to
me means using the proven tuned (on their respective ISA subsets)
SSE2 and AVX2 variants and simply switch between them based on
xtest. Yeah, so strcmp of a large string inside an transaction
might not run at optimal AVX2 speed. But it will be faster
than before the xtest dispatch since before that it would have
aborted the transaction.
--
You are receiving this mail because:
You are on the CC list for the bug.
^ permalink raw reply [flat|nested] 42+ messages in thread
* [Bug string/27457] vzeroupper use in AVX2 multiarch string functions cause HTM aborts
2021-02-22 12:40 [Bug string/27457] New: vzeroupper use in AVX2 multiarch string functions cause HTM aborts rguenth at gcc dot gnu.org
` (21 preceding siblings ...)
2021-03-01 14:14 ` rguenther at suse dot de
@ 2021-03-01 14:25 ` rguenth at gcc dot gnu.org
2021-03-01 14:37 ` hjl.tools at gmail dot com
` (17 subsequent siblings)
40 siblings, 0 replies; 42+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-03-01 14:25 UTC (permalink / raw)
To: glibc-bugs
https://sourceware.org/bugzilla/show_bug.cgi?id=27457
--- Comment #17 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to H.J. Lu from comment #11)
> (In reply to Florian Weimer from comment #9)
> > (In reply to rguenther from comment #7)
> > > On Mon, 1 Mar 2021, fweimer at redhat dot com wrote:
> > > > The sources do not seem to show the selection logic.
> > > > __memset_avx2_unaligned_rtm seems to simply have dropped the VZEROUPPER
> > > > instruction, which can't be good for general application performance.
> > >
> > > The intent was to use %ymm16+ only which does not cause any transition
> > > penalty even w/o vzeroupper.
> >
> > I still saw %ymm0 usage in the disassembly, if I recall correctly. And for
> > AVX2, there isn't much choice. I didn't try to reverse-engineer the
> > corresponding IFUNC selector.
>
> At function exit, there is
>
> f3: 0f 01 d6 xtest
> f6: 74 04 je fc <__memset_avx2_unaligned_erms_rtm+0x5c>
> f8: c5 fc 77 vzeroall
> fb: c3 ret
> fc: c5 f8 77 vzeroupper
> ff: c3 ret
Btw, if the 'je' mispredicts to the vzeroupper case inside an transaction
will the speculative execution of vzeroupper abort the transaction or
does it only abort the transaction when retired?
--
You are receiving this mail because:
You are on the CC list for the bug.
^ permalink raw reply [flat|nested] 42+ messages in thread
* [Bug string/27457] vzeroupper use in AVX2 multiarch string functions cause HTM aborts
2021-02-22 12:40 [Bug string/27457] New: vzeroupper use in AVX2 multiarch string functions cause HTM aborts rguenth at gcc dot gnu.org
` (22 preceding siblings ...)
2021-03-01 14:25 ` rguenth at gcc dot gnu.org
@ 2021-03-01 14:37 ` hjl.tools at gmail dot com
2021-03-01 14:47 ` hjl.tools at gmail dot com
` (16 subsequent siblings)
40 siblings, 0 replies; 42+ messages in thread
From: hjl.tools at gmail dot com @ 2021-03-01 14:37 UTC (permalink / raw)
To: glibc-bugs
https://sourceware.org/bugzilla/show_bug.cgi?id=27457
--- Comment #18 from H.J. Lu <hjl.tools at gmail dot com> ---
(In reply to rguenther from comment #16)
> On Mon, 1 Mar 2021, hjl.tools at gmail dot com wrote:
>
> > https://sourceware.org/bugzilla/show_bug.cgi?id=27457
> >
> > --- Comment #15 from H.J. Lu <hjl.tools at gmail dot com> ---
> > (In reply to Richard Biener from comment #14)
> > >
> > > Note according to Agner vzeroall, for example on Haswell, decodes to
> > > 20 uops while vzeroupper only requires 4. On Skylake it's even worse
> > > (34 uops). For short sizes (as in our benchmark which had 16-31 byte
> > > strcmp) this might be a bigger difference than using the SSE2 variant
> > > off an early xtest result. That said, why not, for HTM + AVX2 CPUs,
> > > have an intermediate dispatcher between the AVX2 and the SSE variant
> > > using xtest? That leaves the actual implementations unchanged and thus
> > > with known performance characteristic?
> >
> > It is implemented on users/hjl/pr27457/wrapper branch:
> >
> > https://gitlab.com/x86-glibc/glibc/-/tree/users/hjl/pr27457/wrapper
> >
> > There are 2 problems:
> >
> > 1. Many RTM tests failed for other reasons.
> > 2. Even with vzeroall overhead, AVX version may still be faster than
> > SSE version.
>
> And the SSE version may still be faster than the AVX version with
> vzeroall.
Here is some data:
Function: strcmp
Variant: default
__strcmp_avx2 __strcmp_sse2_unaligned
length=14, align1=14, align2=14: 11.36 17.50
length=14, align1=14, align2=14: 11.36 15.59
length=14, align1=14, align2=14: 11.43 15.55
length=15, align1=15, align2=15: 11.36 17.42
length=15, align1=15, align2=15: 11.96 17.41
length=15, align1=15, align2=15: 11.36 16.97
length=16, align1=16, align2=16: 11.36 18.58
length=16, align1=16, align2=16: 11.36 17.41
length=16, align1=16, align2=16: 11.43 17.34
length=17, align1=17, align2=17: 11.36 21.37
length=17, align1=17, align2=17: 11.36 18.52
length=17, align1=17, align2=17: 11.36 17.94
length=18, align1=18, align2=18: 11.36 19.73
length=18, align1=18, align2=18: 11.36 19.20
length=18, align1=18, align2=18: 11.36 19.13
length=19, align1=19, align2=19: 11.36 20.38
length=19, align1=19, align2=19: 11.36 19.39
length=19, align1=19, align2=19: 11.36 20.39
length=20, align1=20, align2=20: 11.36 21.53
length=20, align1=20, align2=20: 11.36 20.98
length=20, align1=20, align2=20: 11.36 20.93
length=21, align1=21, align2=21: 11.36 22.83
length=21, align1=21, align2=21: 11.36 22.26
length=21, align1=21, align2=21: 11.36 22.25
length=22, align1=22, align2=22: 11.43 23.37
length=22, align1=22, align2=22: 11.36 22.78
length=22, align1=22, align2=22: 12.29 22.12
length=23, align1=23, align2=23: 11.36 24.63
length=23, align1=23, align2=23: 12.53 23.97
length=23, align1=23, align2=23: 11.36 23.97
length=24, align1=24, align2=24: 11.36 24.52
length=24, align1=24, align2=24: 11.36 43.47
length=24, align1=24, align2=24: 11.36 44.47
length=25, align1=25, align2=25: 11.36 39.50
length=25, align1=25, align2=25: 11.36 48.97
length=25, align1=25, align2=25: 11.36 48.53
length=26, align1=26, align2=26: 11.36 47.87
length=26, align1=26, align2=26: 11.36 47.20
length=26, align1=26, align2=26: 11.36 47.15
length=27, align1=27, align2=27: 11.36 50.90
length=27, align1=27, align2=27: 11.44 49.98
length=27, align1=27, align2=27: 11.36 49.77
length=28, align1=28, align2=28: 11.36 49.74
length=28, align1=28, align2=28: 11.36 48.86
length=28, align1=28, align2=28: 11.36 49.08
length=29, align1=29, align2=29: 11.36 52.74
length=29, align1=29, align2=29: 11.36 54.04
length=29, align1=29, align2=29: 11.36 29.49
length=30, align1=30, align2=30: 11.36 50.91
length=30, align1=30, align2=30: 11.36 51.09
length=30, align1=30, align2=30: 11.36 51.13
length=31, align1=31, align2=31: 12.36 54.33
length=31, align1=31, align2=31: 11.36 53.49
length=31, align1=31, align2=31: 11.36 53.29
length=16, align1=0, align2=0: 11.36 18.02
length=16, align1=0, align2=0: 11.36 18.58
length=16, align1=0, align2=0: 11.36 17.34
length=16, align1=0, align2=0: 11.44 19.88
length=16, align1=0, align2=0: 11.36 16.74
length=16, align1=0, align2=0: 11.36 17.42
length=16, align1=0, align2=3: 11.36 17.34
length=16, align1=3, align2=4: 11.36 17.34
length=32, align1=0, align2=0: 12.29 61.07
length=32, align1=0, align2=0: 12.63 61.08
length=32, align1=0, align2=0: 11.36 60.48
length=32, align1=0, align2=0: 11.36 60.48
length=32, align1=0, align2=0: 11.36 60.40
length=32, align1=0, align2=0: 11.36 60.40
length=32, align1=0, align2=4: 11.36 60.40
length=32, align1=4, align2=5: 12.10 59.72
> I guess we should mostly care about optimizing for "modern" CPUs
> which likely means HTM + AVX512 which should be already optimal
> on your branches by using %ymm16+. So we're talking about
> the "legacy" AVX2 + HTM path.
>
> And there I think we should optimize the path that is _not_ in
> a transaction since that will be 99% of the cases. Which to
> me means using the proven tuned (on their respective ISA subsets)
> SSE2 and AVX2 variants and simply switch between them based on
> xtest. Yeah, so strcmp of a large string inside an transaction
I tried it and I got RTM abort for other reasons.
> might not run at optimal AVX2 speed. But it will be faster
> than before the xtest dispatch since before that it would have
> aborted the transaction.
Please give my current approach is a try.
--
You are receiving this mail because:
You are on the CC list for the bug.
^ permalink raw reply [flat|nested] 42+ messages in thread
* [Bug string/27457] vzeroupper use in AVX2 multiarch string functions cause HTM aborts
2021-02-22 12:40 [Bug string/27457] New: vzeroupper use in AVX2 multiarch string functions cause HTM aborts rguenth at gcc dot gnu.org
` (23 preceding siblings ...)
2021-03-01 14:37 ` hjl.tools at gmail dot com
@ 2021-03-01 14:47 ` hjl.tools at gmail dot com
2021-03-01 14:49 ` rguenth at gcc dot gnu.org
` (15 subsequent siblings)
40 siblings, 0 replies; 42+ messages in thread
From: hjl.tools at gmail dot com @ 2021-03-01 14:47 UTC (permalink / raw)
To: glibc-bugs
https://sourceware.org/bugzilla/show_bug.cgi?id=27457
--- Comment #19 from H.J. Lu <hjl.tools at gmail dot com> ---
(In reply to Richard Biener from comment #17)
> (In reply to H.J. Lu from comment #11)
> > (In reply to Florian Weimer from comment #9)
> > > (In reply to rguenther from comment #7)
> > > > On Mon, 1 Mar 2021, fweimer at redhat dot com wrote:
> > > > > The sources do not seem to show the selection logic.
> > > > > __memset_avx2_unaligned_rtm seems to simply have dropped the VZEROUPPER
> > > > > instruction, which can't be good for general application performance.
> > > >
> > > > The intent was to use %ymm16+ only which does not cause any transition
> > > > penalty even w/o vzeroupper.
> > >
> > > I still saw %ymm0 usage in the disassembly, if I recall correctly. And for
> > > AVX2, there isn't much choice. I didn't try to reverse-engineer the
> > > corresponding IFUNC selector.
> >
> > At function exit, there is
> >
> > f3: 0f 01 d6 xtest
> > f6: 74 04 je fc <__memset_avx2_unaligned_erms_rtm+0x5c>
> > f8: c5 fc 77 vzeroall
> > fb: c3 ret
> > fc: c5 f8 77 vzeroupper
> > ff: c3 ret
>
> Btw, if the 'je' mispredicts to the vzeroupper case inside an transaction
> will the speculative execution of vzeroupper abort the transaction or
> does it only abort the transaction when retired?
My branch includes RTM test:
for (i = 0; i < loop; i++)
{
if (_xbegin() == _XBEGIN_STARTED)
{
failed |= function ();
_xend();
}
else
{
failed |= function ();
++naborts;
}
}
It passes with xtest + je.
--
You are receiving this mail because:
You are on the CC list for the bug.
^ permalink raw reply [flat|nested] 42+ messages in thread
* [Bug string/27457] vzeroupper use in AVX2 multiarch string functions cause HTM aborts
2021-02-22 12:40 [Bug string/27457] New: vzeroupper use in AVX2 multiarch string functions cause HTM aborts rguenth at gcc dot gnu.org
` (24 preceding siblings ...)
2021-03-01 14:47 ` hjl.tools at gmail dot com
@ 2021-03-01 14:49 ` rguenth at gcc dot gnu.org
2021-03-01 14:53 ` rguenth at gcc dot gnu.org
` (14 subsequent siblings)
40 siblings, 0 replies; 42+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-03-01 14:49 UTC (permalink / raw)
To: glibc-bugs
https://sourceware.org/bugzilla/show_bug.cgi?id=27457
--- Comment #20 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to H.J. Lu from comment #18)
> (In reply to rguenther from comment #16)
> > On Mon, 1 Mar 2021, hjl.tools at gmail dot com wrote:
> >
> > > https://sourceware.org/bugzilla/show_bug.cgi?id=27457
> > >
> > > --- Comment #15 from H.J. Lu <hjl.tools at gmail dot com> ---
> > > (In reply to Richard Biener from comment #14)
> > > >
> > > > Note according to Agner vzeroall, for example on Haswell, decodes to
> > > > 20 uops while vzeroupper only requires 4. On Skylake it's even worse
> > > > (34 uops). For short sizes (as in our benchmark which had 16-31 byte
> > > > strcmp) this might be a bigger difference than using the SSE2 variant
> > > > off an early xtest result. That said, why not, for HTM + AVX2 CPUs,
> > > > have an intermediate dispatcher between the AVX2 and the SSE variant
> > > > using xtest? That leaves the actual implementations unchanged and thus
> > > > with known performance characteristic?
> > >
> > > It is implemented on users/hjl/pr27457/wrapper branch:
> > >
> > > https://gitlab.com/x86-glibc/glibc/-/tree/users/hjl/pr27457/wrapper
> > >
> > > There are 2 problems:
> > >
> > > 1. Many RTM tests failed for other reasons.
> > > 2. Even with vzeroall overhead, AVX version may still be faster than
> > > SSE version.
> >
> > And the SSE version may still be faster than the AVX version with
> > vzeroall.
>
> Here is some data:
>
> Function: strcmp
> Variant: default
> __strcmp_avx2 __strcmp_sse2_unaligned
> length=14, align1=14, align2=14: 11.36 17.50
> length=14, align1=14, align2=14: 11.36 15.59
> length=14, align1=14, align2=14: 11.43 15.55
> length=15, align1=15, align2=15: 11.36 17.42
> length=15, align1=15, align2=15: 11.96 17.41
> length=15, align1=15, align2=15: 11.36 16.97
> length=16, align1=16, align2=16: 11.36 18.58
> length=16, align1=16, align2=16: 11.36 17.41
> length=16, align1=16, align2=16: 11.43 17.34
> length=17, align1=17, align2=17: 11.36 21.37
> length=17, align1=17, align2=17: 11.36 18.52
> length=17, align1=17, align2=17: 11.36 17.94
> length=18, align1=18, align2=18: 11.36 19.73
> length=18, align1=18, align2=18: 11.36 19.20
> length=18, align1=18, align2=18: 11.36 19.13
> length=19, align1=19, align2=19: 11.36 20.38
> length=19, align1=19, align2=19: 11.36 19.39
> length=19, align1=19, align2=19: 11.36 20.39
> length=20, align1=20, align2=20: 11.36 21.53
> length=20, align1=20, align2=20: 11.36 20.98
> length=20, align1=20, align2=20: 11.36 20.93
> length=21, align1=21, align2=21: 11.36 22.83
> length=21, align1=21, align2=21: 11.36 22.26
> length=21, align1=21, align2=21: 11.36 22.25
> length=22, align1=22, align2=22: 11.43 23.37
> length=22, align1=22, align2=22: 11.36 22.78
> length=22, align1=22, align2=22: 12.29 22.12
> length=23, align1=23, align2=23: 11.36 24.63
> length=23, align1=23, align2=23: 12.53 23.97
> length=23, align1=23, align2=23: 11.36 23.97
> length=24, align1=24, align2=24: 11.36 24.52
> length=24, align1=24, align2=24: 11.36 43.47
> length=24, align1=24, align2=24: 11.36 44.47
> length=25, align1=25, align2=25: 11.36 39.50
> length=25, align1=25, align2=25: 11.36 48.97
> length=25, align1=25, align2=25: 11.36 48.53
> length=26, align1=26, align2=26: 11.36 47.87
> length=26, align1=26, align2=26: 11.36 47.20
> length=26, align1=26, align2=26: 11.36 47.15
> length=27, align1=27, align2=27: 11.36 50.90
> length=27, align1=27, align2=27: 11.44 49.98
> length=27, align1=27, align2=27: 11.36 49.77
> length=28, align1=28, align2=28: 11.36 49.74
> length=28, align1=28, align2=28: 11.36 48.86
> length=28, align1=28, align2=28: 11.36 49.08
> length=29, align1=29, align2=29: 11.36 52.74
> length=29, align1=29, align2=29: 11.36 54.04
> length=29, align1=29, align2=29: 11.36 29.49
> length=30, align1=30, align2=30: 11.36 50.91
> length=30, align1=30, align2=30: 11.36 51.09
> length=30, align1=30, align2=30: 11.36 51.13
> length=31, align1=31, align2=31: 12.36 54.33
> length=31, align1=31, align2=31: 11.36 53.49
> length=31, align1=31, align2=31: 11.36 53.29
> length=16, align1=0, align2=0: 11.36 18.02
> length=16, align1=0, align2=0: 11.36 18.58
> length=16, align1=0, align2=0: 11.36 17.34
> length=16, align1=0, align2=0: 11.44 19.88
> length=16, align1=0, align2=0: 11.36 16.74
> length=16, align1=0, align2=0: 11.36 17.42
> length=16, align1=0, align2=3: 11.36 17.34
> length=16, align1=3, align2=4: 11.36 17.34
> length=32, align1=0, align2=0: 12.29 61.07
> length=32, align1=0, align2=0: 12.63 61.08
> length=32, align1=0, align2=0: 11.36 60.48
> length=32, align1=0, align2=0: 11.36 60.48
> length=32, align1=0, align2=0: 11.36 60.40
> length=32, align1=0, align2=0: 11.36 60.40
> length=32, align1=0, align2=4: 11.36 60.40
> length=32, align1=4, align2=5: 12.10 59.72
That's with or without the vzeroall actually executing?
> > I guess we should mostly care about optimizing for "modern" CPUs
> > which likely means HTM + AVX512 which should be already optimal
> > on your branches by using %ymm16+. So we're talking about
> > the "legacy" AVX2 + HTM path.
> >
> > And there I think we should optimize the path that is _not_ in
> > a transaction since that will be 99% of the cases. Which to
> > me means using the proven tuned (on their respective ISA subsets)
> > SSE2 and AVX2 variants and simply switch between them based on
> > xtest. Yeah, so strcmp of a large string inside an transaction
>
> I tried it and I got RTM abort for other reasons.
>
> > might not run at optimal AVX2 speed. But it will be faster
> > than before the xtest dispatch since before that it would have
> > aborted the transaction.
>
> Please give my current approach is a try.
Well, I know that even unconditionally doing vzeroall will fix our observed
regression since the time is dominated by all the other code inside the
transaction that is then retried a few times (and always re-fails with
vzeroupper), the strcmp part is just ~1%.
I also only have AVX512 HW with HTM so can't easily test the AVX2 + HTM
path. That said, I'm fine with the xtest/vzero{all,upper} epilogue.
But I have also been reported libmicro regressions for strcpy with length
10 (32byte aligned), hot cache, when using AVX2 vs. SSE2 (SSE2 being faster
by 20%). [note strcpy, not strcmp here] Yeah, stupid benchmark ... but
it likely shows that for small lenghts every detail matters in case you
want to shave off the last ns.
--
You are receiving this mail because:
You are on the CC list for the bug.
^ permalink raw reply [flat|nested] 42+ messages in thread
* [Bug string/27457] vzeroupper use in AVX2 multiarch string functions cause HTM aborts
2021-02-22 12:40 [Bug string/27457] New: vzeroupper use in AVX2 multiarch string functions cause HTM aborts rguenth at gcc dot gnu.org
` (25 preceding siblings ...)
2021-03-01 14:49 ` rguenth at gcc dot gnu.org
@ 2021-03-01 14:53 ` rguenth at gcc dot gnu.org
2021-03-01 15:19 ` hjl.tools at gmail dot com
` (13 subsequent siblings)
40 siblings, 0 replies; 42+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-03-01 14:53 UTC (permalink / raw)
To: glibc-bugs
https://sourceware.org/bugzilla/show_bug.cgi?id=27457
--- Comment #21 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to H.J. Lu from comment #19)
> (In reply to Richard Biener from comment #17)
> > (In reply to H.J. Lu from comment #11)
> > > (In reply to Florian Weimer from comment #9)
> > > > (In reply to rguenther from comment #7)
> > > > > On Mon, 1 Mar 2021, fweimer at redhat dot com wrote:
> > > > > > The sources do not seem to show the selection logic.
> > > > > > __memset_avx2_unaligned_rtm seems to simply have dropped the VZEROUPPER
> > > > > > instruction, which can't be good for general application performance.
> > > > >
> > > > > The intent was to use %ymm16+ only which does not cause any transition
> > > > > penalty even w/o vzeroupper.
> > > >
> > > > I still saw %ymm0 usage in the disassembly, if I recall correctly. And for
> > > > AVX2, there isn't much choice. I didn't try to reverse-engineer the
> > > > corresponding IFUNC selector.
> > >
> > > At function exit, there is
> > >
> > > f3: 0f 01 d6 xtest
> > > f6: 74 04 je fc <__memset_avx2_unaligned_erms_rtm+0x5c>
> > > f8: c5 fc 77 vzeroall
> > > fb: c3 ret
> > > fc: c5 f8 77 vzeroupper
> > > ff: c3 ret
> >
> > Btw, if the 'je' mispredicts to the vzeroupper case inside an transaction
> > will the speculative execution of vzeroupper abort the transaction or
> > does it only abort the transaction when retired?
>
> My branch includes RTM test:
>
> for (i = 0; i < loop; i++)
> {
> if (_xbegin() == _XBEGIN_STARTED)
> {
> failed |= function ();
> _xend();
> }
> else
> {
> failed |= function ();
> ++naborts;
> }
> }
>
> It passes with xtest + je.
that's good to hear. Does it still work when you add and unconditional
non-transaction
failed |= function ();
before the loop? Just trying to make sure we do actually mispredict the
xtest + je.
--
You are receiving this mail because:
You are on the CC list for the bug.
^ permalink raw reply [flat|nested] 42+ messages in thread
* [Bug string/27457] vzeroupper use in AVX2 multiarch string functions cause HTM aborts
2021-02-22 12:40 [Bug string/27457] New: vzeroupper use in AVX2 multiarch string functions cause HTM aborts rguenth at gcc dot gnu.org
` (26 preceding siblings ...)
2021-03-01 14:53 ` rguenth at gcc dot gnu.org
@ 2021-03-01 15:19 ` hjl.tools at gmail dot com
2021-03-01 23:39 ` hjl.tools at gmail dot com
` (12 subsequent siblings)
40 siblings, 0 replies; 42+ messages in thread
From: hjl.tools at gmail dot com @ 2021-03-01 15:19 UTC (permalink / raw)
To: glibc-bugs
https://sourceware.org/bugzilla/show_bug.cgi?id=27457
--- Comment #22 from H.J. Lu <hjl.tools at gmail dot com> ---
(In reply to Richard Biener from comment #21)
>
> that's good to hear. Does it still work when you add and unconditional
> non-transaction
>
> failed |= function ();
>
> before the loop? Just trying to make sure we do actually mispredict the
> xtest + je.
I will give it a try.
--
You are receiving this mail because:
You are on the CC list for the bug.
^ permalink raw reply [flat|nested] 42+ messages in thread
* [Bug string/27457] vzeroupper use in AVX2 multiarch string functions cause HTM aborts
2021-02-22 12:40 [Bug string/27457] New: vzeroupper use in AVX2 multiarch string functions cause HTM aborts rguenth at gcc dot gnu.org
` (27 preceding siblings ...)
2021-03-01 15:19 ` hjl.tools at gmail dot com
@ 2021-03-01 23:39 ` hjl.tools at gmail dot com
2021-03-05 16:54 ` hjl.tools at gmail dot com
` (11 subsequent siblings)
40 siblings, 0 replies; 42+ messages in thread
From: hjl.tools at gmail dot com @ 2021-03-01 23:39 UTC (permalink / raw)
To: glibc-bugs
https://sourceware.org/bugzilla/show_bug.cgi?id=27457
--- Comment #23 from H.J. Lu <hjl.tools at gmail dot com> ---
(In reply to H.J. Lu from comment #22)
> (In reply to Richard Biener from comment #21)
>
> >
> > that's good to hear. Does it still work when you add and unconditional
> > non-transaction
> >
> > failed |= function ();
> >
> > before the loop? Just trying to make sure we do actually mispredict the
> > xtest + je.
>
> I will give it a try.
I did:
for (i = 0; i < loop; i++)
{
failed |= function ();
if (_xbegin() == _XBEGIN_STARTED)
{
failed |= function ();
_xend();
}
else
{
failed |= function ();
++naborts;
}
}
There is no issue.
--
You are receiving this mail because:
You are on the CC list for the bug.
^ permalink raw reply [flat|nested] 42+ messages in thread
* [Bug string/27457] vzeroupper use in AVX2 multiarch string functions cause HTM aborts
2021-02-22 12:40 [Bug string/27457] New: vzeroupper use in AVX2 multiarch string functions cause HTM aborts rguenth at gcc dot gnu.org
` (28 preceding siblings ...)
2021-03-01 23:39 ` hjl.tools at gmail dot com
@ 2021-03-05 16:54 ` hjl.tools at gmail dot com
2021-03-11 10:42 ` rguenth at gcc dot gnu.org
` (10 subsequent siblings)
40 siblings, 0 replies; 42+ messages in thread
From: hjl.tools at gmail dot com @ 2021-03-05 16:54 UTC (permalink / raw)
To: glibc-bugs
https://sourceware.org/bugzilla/show_bug.cgi?id=27457
H.J. Lu <hjl.tools at gmail dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
URL| |https://sourceware.org/pipe
| |rmail/libc-alpha/2021-March
| |/123302.html
--- Comment #24 from H.J. Lu <hjl.tools at gmail dot com> ---
A patch set is posted at
https://sourceware.org/pipermail/libc-alpha/2021-March/123302.html
--
You are receiving this mail because:
You are on the CC list for the bug.
^ permalink raw reply [flat|nested] 42+ messages in thread
* [Bug string/27457] vzeroupper use in AVX2 multiarch string functions cause HTM aborts
2021-02-22 12:40 [Bug string/27457] New: vzeroupper use in AVX2 multiarch string functions cause HTM aborts rguenth at gcc dot gnu.org
` (29 preceding siblings ...)
2021-03-05 16:54 ` hjl.tools at gmail dot com
@ 2021-03-11 10:42 ` rguenth at gcc dot gnu.org
2021-03-16 13:53 ` rguenth at gcc dot gnu.org
` (9 subsequent siblings)
40 siblings, 0 replies; 42+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-03-11 10:42 UTC (permalink / raw)
To: glibc-bugs
https://sourceware.org/bugzilla/show_bug.cgi?id=27457
--- Comment #25 from Richard Biener <rguenth at gcc dot gnu.org> ---
We have successfully tested the backport to the 2.31 branch where it mitigates
the regression seen.
--
You are receiving this mail because:
You are on the CC list for the bug.
^ permalink raw reply [flat|nested] 42+ messages in thread
* [Bug string/27457] vzeroupper use in AVX2 multiarch string functions cause HTM aborts
2021-02-22 12:40 [Bug string/27457] New: vzeroupper use in AVX2 multiarch string functions cause HTM aborts rguenth at gcc dot gnu.org
` (30 preceding siblings ...)
2021-03-11 10:42 ` rguenth at gcc dot gnu.org
@ 2021-03-16 13:53 ` rguenth at gcc dot gnu.org
2021-03-16 14:12 ` hjl.tools at gmail dot com
` (8 subsequent siblings)
40 siblings, 0 replies; 42+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-03-16 13:53 UTC (permalink / raw)
To: glibc-bugs
https://sourceware.org/bugzilla/show_bug.cgi?id=27457
--- Comment #26 from Richard Biener <rguenth at gcc dot gnu.org> ---
Any progress? I see the patchset did not get any feedback in the last 10 days.
--
You are receiving this mail because:
You are on the CC list for the bug.
^ permalink raw reply [flat|nested] 42+ messages in thread
* [Bug string/27457] vzeroupper use in AVX2 multiarch string functions cause HTM aborts
2021-02-22 12:40 [Bug string/27457] New: vzeroupper use in AVX2 multiarch string functions cause HTM aborts rguenth at gcc dot gnu.org
` (31 preceding siblings ...)
2021-03-16 13:53 ` rguenth at gcc dot gnu.org
@ 2021-03-16 14:12 ` hjl.tools at gmail dot com
2021-03-29 23:00 ` hjl.tools at gmail dot com
` (7 subsequent siblings)
40 siblings, 0 replies; 42+ messages in thread
From: hjl.tools at gmail dot com @ 2021-03-16 14:12 UTC (permalink / raw)
To: glibc-bugs
https://sourceware.org/bugzilla/show_bug.cgi?id=27457
--- Comment #27 from H.J. Lu <hjl.tools at gmail dot com> ---
(In reply to Richard Biener from comment #26)
> Any progress? I see the patchset did not get any feedback in the last 10
> days.
We discussed it at glibc patch meeting on Monday. I posted the v2 patch:
https://sourceware.org/pipermail/libc-alpha/2021-March/123867.html
--
You are receiving this mail because:
You are on the CC list for the bug.
^ permalink raw reply [flat|nested] 42+ messages in thread
* [Bug string/27457] vzeroupper use in AVX2 multiarch string functions cause HTM aborts
2021-02-22 12:40 [Bug string/27457] New: vzeroupper use in AVX2 multiarch string functions cause HTM aborts rguenth at gcc dot gnu.org
` (32 preceding siblings ...)
2021-03-16 14:12 ` hjl.tools at gmail dot com
@ 2021-03-29 23:00 ` hjl.tools at gmail dot com
2022-01-27 20:21 ` cvs-commit at gcc dot gnu.org
` (6 subsequent siblings)
40 siblings, 0 replies; 42+ messages in thread
From: hjl.tools at gmail dot com @ 2021-03-29 23:00 UTC (permalink / raw)
To: glibc-bugs
https://sourceware.org/bugzilla/show_bug.cgi?id=27457
--- Comment #28 from H.J. Lu <hjl.tools at gmail dot com> ---
Fixed for 2.34 on master so far by 10 commits. The last commit is
commit e4fda4631017e49d4ee5a2755db34289b6860fa4
Author: H.J. Lu <hjl.tools@gmail.com>
Date: Sun Mar 7 09:45:23 2021 -0800
x86-64: Use ZMM16-ZMM31 in AVX512 memmove family functions
Update ifunc-memmove.h to select the function optimized with AVX512
instructions using ZMM16-ZMM31 registers to avoid RTM abort with usable
AVX512VL since VZEROUPPER isn't needed at function exit.
--
You are receiving this mail because:
You are on the CC list for the bug.
^ permalink raw reply [flat|nested] 42+ messages in thread
* [Bug string/27457] vzeroupper use in AVX2 multiarch string functions cause HTM aborts
2021-02-22 12:40 [Bug string/27457] New: vzeroupper use in AVX2 multiarch string functions cause HTM aborts rguenth at gcc dot gnu.org
` (33 preceding siblings ...)
2021-03-29 23:00 ` hjl.tools at gmail dot com
@ 2022-01-27 20:21 ` cvs-commit at gcc dot gnu.org
2022-01-27 20:23 ` cvs-commit at gcc dot gnu.org
` (5 subsequent siblings)
40 siblings, 0 replies; 42+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2022-01-27 20:21 UTC (permalink / raw)
To: glibc-bugs
https://sourceware.org/bugzilla/show_bug.cgi?id=27457
--- Comment #29 from cvs-commit at gcc dot gnu.org <cvs-commit at gcc dot gnu.org> ---
The release/2.33/master branch has been updated by H.J. Lu
<hjl@sourceware.org>:
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=902af2f5eee71c3e48fe30d43fd7c61d563e975b
commit 902af2f5eee71c3e48fe30d43fd7c61d563e975b
Author: H.J. Lu <hjl.tools@gmail.com>
Date: Thu Jan 27 12:20:21 2022 -0800
NEWS: Add a bug fix entry for BZ #27457
--
You are receiving this mail because:
You are on the CC list for the bug.
^ permalink raw reply [flat|nested] 42+ messages in thread
* [Bug string/27457] vzeroupper use in AVX2 multiarch string functions cause HTM aborts
2021-02-22 12:40 [Bug string/27457] New: vzeroupper use in AVX2 multiarch string functions cause HTM aborts rguenth at gcc dot gnu.org
` (34 preceding siblings ...)
2022-01-27 20:21 ` cvs-commit at gcc dot gnu.org
@ 2022-01-27 20:23 ` cvs-commit at gcc dot gnu.org
2022-01-27 20:47 ` cvs-commit at gcc dot gnu.org
` (4 subsequent siblings)
40 siblings, 0 replies; 42+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2022-01-27 20:23 UTC (permalink / raw)
To: glibc-bugs
https://sourceware.org/bugzilla/show_bug.cgi?id=27457
--- Comment #30 from cvs-commit at gcc dot gnu.org <cvs-commit at gcc dot gnu.org> ---
The release/2.32/master branch has been updated by H.J. Lu
<hjl@sourceware.org>:
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=05751d1c5c85b7f55086e81567cd55b025b25625
commit 05751d1c5c85b7f55086e81567cd55b025b25625
Author: H.J. Lu <hjl.tools@gmail.com>
Date: Thu Jan 27 12:22:42 2022 -0800
NEWS: Add a bug fix entry for BZ #27457
--
You are receiving this mail because:
You are on the CC list for the bug.
^ permalink raw reply [flat|nested] 42+ messages in thread
* [Bug string/27457] vzeroupper use in AVX2 multiarch string functions cause HTM aborts
2021-02-22 12:40 [Bug string/27457] New: vzeroupper use in AVX2 multiarch string functions cause HTM aborts rguenth at gcc dot gnu.org
` (35 preceding siblings ...)
2022-01-27 20:23 ` cvs-commit at gcc dot gnu.org
@ 2022-01-27 20:47 ` cvs-commit at gcc dot gnu.org
2022-01-27 20:47 ` cvs-commit at gcc dot gnu.org
` (3 subsequent siblings)
40 siblings, 0 replies; 42+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2022-01-27 20:47 UTC (permalink / raw)
To: glibc-bugs
https://sourceware.org/bugzilla/show_bug.cgi?id=27457
--- Comment #31 from cvs-commit at gcc dot gnu.org <cvs-commit at gcc dot gnu.org> ---
The release/2.31/master branch has been updated by H.J. Lu
<hjl@sourceware.org>:
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=c0cbb9345ea2d81d017e7725e3ac5250ef870513
commit c0cbb9345ea2d81d017e7725e3ac5250ef870513
Author: H.J. Lu <hjl.tools@gmail.com>
Date: Thu Jan 27 12:23:42 2022 -0800
NEWS: Add a bug fix entry for BZ #27457
--
You are receiving this mail because:
You are on the CC list for the bug.
^ permalink raw reply [flat|nested] 42+ messages in thread
* [Bug string/27457] vzeroupper use in AVX2 multiarch string functions cause HTM aborts
2021-02-22 12:40 [Bug string/27457] New: vzeroupper use in AVX2 multiarch string functions cause HTM aborts rguenth at gcc dot gnu.org
` (36 preceding siblings ...)
2022-01-27 20:47 ` cvs-commit at gcc dot gnu.org
@ 2022-01-27 20:47 ` cvs-commit at gcc dot gnu.org
2022-01-27 20:48 ` cvs-commit at gcc dot gnu.org
` (2 subsequent siblings)
40 siblings, 0 replies; 42+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2022-01-27 20:47 UTC (permalink / raw)
To: glibc-bugs
https://sourceware.org/bugzilla/show_bug.cgi?id=27457
--- Comment #32 from cvs-commit at gcc dot gnu.org <cvs-commit at gcc dot gnu.org> ---
The release/2.30/master branch has been updated by H.J. Lu
<hjl@sourceware.org>:
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=de28bb3c612daedb08fa975325c1a293fbca07a9
commit de28bb3c612daedb08fa975325c1a293fbca07a9
Author: H.J. Lu <hjl.tools@gmail.com>
Date: Thu Jan 27 12:24:44 2022 -0800
NEWS: Add a bug fix entry for BZ #27457
--
You are receiving this mail because:
You are on the CC list for the bug.
^ permalink raw reply [flat|nested] 42+ messages in thread
* [Bug string/27457] vzeroupper use in AVX2 multiarch string functions cause HTM aborts
2021-02-22 12:40 [Bug string/27457] New: vzeroupper use in AVX2 multiarch string functions cause HTM aborts rguenth at gcc dot gnu.org
` (37 preceding siblings ...)
2022-01-27 20:47 ` cvs-commit at gcc dot gnu.org
@ 2022-01-27 20:48 ` cvs-commit at gcc dot gnu.org
2022-01-27 22:41 ` cvs-commit at gcc dot gnu.org
2022-01-28 2:24 ` hjl.tools at gmail dot com
40 siblings, 0 replies; 42+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2022-01-27 20:48 UTC (permalink / raw)
To: glibc-bugs
https://sourceware.org/bugzilla/show_bug.cgi?id=27457
--- Comment #33 from cvs-commit at gcc dot gnu.org <cvs-commit at gcc dot gnu.org> ---
The release/2.29/master branch has been updated by H.J. Lu
<hjl@sourceware.org>:
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=c3535cb6cdd4cbbce22018df09cc69633781d808
commit c3535cb6cdd4cbbce22018df09cc69633781d808
Author: H.J. Lu <hjl.tools@gmail.com>
Date: Thu Jan 27 12:25:41 2022 -0800
NEWS: Add a bug fix entry for BZ #27457
--
You are receiving this mail because:
You are on the CC list for the bug.
^ permalink raw reply [flat|nested] 42+ messages in thread
* [Bug string/27457] vzeroupper use in AVX2 multiarch string functions cause HTM aborts
2021-02-22 12:40 [Bug string/27457] New: vzeroupper use in AVX2 multiarch string functions cause HTM aborts rguenth at gcc dot gnu.org
` (38 preceding siblings ...)
2022-01-27 20:48 ` cvs-commit at gcc dot gnu.org
@ 2022-01-27 22:41 ` cvs-commit at gcc dot gnu.org
2022-01-28 2:24 ` hjl.tools at gmail dot com
40 siblings, 0 replies; 42+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2022-01-27 22:41 UTC (permalink / raw)
To: glibc-bugs
https://sourceware.org/bugzilla/show_bug.cgi?id=27457
--- Comment #34 from cvs-commit at gcc dot gnu.org <cvs-commit at gcc dot gnu.org> ---
The release/2.28/master branch has been updated by H.J. Lu
<hjl@sourceware.org>:
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=2baf5616d5ec5e592d64746253713969eb473f5b
commit 2baf5616d5ec5e592d64746253713969eb473f5b
Author: H.J. Lu <hjl.tools@gmail.com>
Date: Thu Jan 27 12:49:55 2022 -0800
NEWS: Add a bug fix entry for BZ #27457
--
You are receiving this mail because:
You are on the CC list for the bug.
^ permalink raw reply [flat|nested] 42+ messages in thread
* [Bug string/27457] vzeroupper use in AVX2 multiarch string functions cause HTM aborts
2021-02-22 12:40 [Bug string/27457] New: vzeroupper use in AVX2 multiarch string functions cause HTM aborts rguenth at gcc dot gnu.org
` (39 preceding siblings ...)
2022-01-27 22:41 ` cvs-commit at gcc dot gnu.org
@ 2022-01-28 2:24 ` hjl.tools at gmail dot com
40 siblings, 0 replies; 42+ messages in thread
From: hjl.tools at gmail dot com @ 2022-01-28 2:24 UTC (permalink / raw)
To: glibc-bugs
https://sourceware.org/bugzilla/show_bug.cgi?id=27457
H.J. Lu <hjl.tools at gmail dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Resolution|--- |FIXED
Status|NEW |RESOLVED
--- Comment #35 from H.J. Lu <hjl.tools at gmail dot com> ---
Fixed for 2.34 and all release branches.
--
You are receiving this mail because:
You are on the CC list for the bug.
^ permalink raw reply [flat|nested] 42+ messages in thread
end of thread, other threads:[~2022-01-28 2:24 UTC | newest]
Thread overview: 42+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-02-22 12:40 [Bug string/27457] New: vzeroupper use in AVX2 multiarch string functions cause HTM aborts rguenth at gcc dot gnu.org
2021-02-22 12:40 ` [Bug string/27457] " rguenth at gcc dot gnu.org
2021-02-22 14:50 ` matz at suse dot de
2021-02-22 15:00 ` rguenth at gcc dot gnu.org
2021-02-22 15:26 ` hjl.tools at gmail dot com
2021-02-22 15:26 ` hjl.tools at gmail dot com
2021-02-22 18:45 ` fweimer at redhat dot com
2021-02-23 9:44 ` roman.dementiev at intel dot com
2021-02-27 2:39 ` hjl.tools at gmail dot com
2021-02-27 7:34 ` rguenther at suse dot de
2021-02-28 14:53 ` hjl.tools at gmail dot com
2021-03-01 11:32 ` fweimer at redhat dot com
2021-03-01 12:24 ` mliska at suse dot cz
2021-03-01 12:47 ` rguenther at suse dot de
2021-03-01 13:13 ` roman.dementiev at intel dot com
2021-03-01 13:19 ` fweimer at redhat dot com
2021-03-01 13:21 ` hjl.tools at gmail dot com
2021-03-01 13:24 ` hjl.tools at gmail dot com
2021-03-01 13:27 ` hjl.tools at gmail dot com
2021-03-01 13:29 ` hjl.tools at gmail dot com
2021-03-01 13:44 ` rguenth at gcc dot gnu.org
2021-03-01 14:05 ` hjl.tools at gmail dot com
2021-03-01 14:14 ` rguenther at suse dot de
2021-03-01 14:25 ` rguenth at gcc dot gnu.org
2021-03-01 14:37 ` hjl.tools at gmail dot com
2021-03-01 14:47 ` hjl.tools at gmail dot com
2021-03-01 14:49 ` rguenth at gcc dot gnu.org
2021-03-01 14:53 ` rguenth at gcc dot gnu.org
2021-03-01 15:19 ` hjl.tools at gmail dot com
2021-03-01 23:39 ` hjl.tools at gmail dot com
2021-03-05 16:54 ` hjl.tools at gmail dot com
2021-03-11 10:42 ` rguenth at gcc dot gnu.org
2021-03-16 13:53 ` rguenth at gcc dot gnu.org
2021-03-16 14:12 ` hjl.tools at gmail dot com
2021-03-29 23:00 ` hjl.tools at gmail dot com
2022-01-27 20:21 ` cvs-commit at gcc dot gnu.org
2022-01-27 20:23 ` cvs-commit at gcc dot gnu.org
2022-01-27 20:47 ` cvs-commit at gcc dot gnu.org
2022-01-27 20:47 ` cvs-commit at gcc dot gnu.org
2022-01-27 20:48 ` cvs-commit at gcc dot gnu.org
2022-01-27 22:41 ` cvs-commit at gcc dot gnu.org
2022-01-28 2:24 ` hjl.tools at gmail dot com
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).