* [Bug target/103069] cmpxchg isn't optimized
2021-11-03 19:08 [Bug target/103069] New: cmpxchg isn't optimized hjl.tools at gmail dot com
@ 2021-11-03 20:53 ` thiago at kde dot org
2021-11-04 21:25 ` thiago at kde dot org
` (21 subsequent siblings)
22 siblings, 0 replies; 24+ messages in thread
From: thiago at kde dot org @ 2021-11-03 20:53 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103069
--- Comment #1 from Thiago Macieira <thiago at kde dot org> ---
(the assembly doesn't match the source code, but we got your point)
Another possible improvement for the __atomic_fetch_{and,nand,or} functions is
that it can check whether the fetched value is already correct and branch out.
In your example, the __atomic_fetch_or with 0x40000000 can check if that bit is
already set and, if so, not execute the CMPXCHG at all.
This is a valid solution for x86 on memory orderings up to acq_rel. For other
architectures, they may still need barriers. For seq_cst, we either need a
barrier or we need to execute the CMPXCHG at least once.
Therefore, the emitted code might want to optimistically execute the operation
once and, if it fails, enter the load loop. That's a slightly longer codegen.
Whether we want that under -Os or not, you'll have to be the judge.
Prior art: glibc/sysdeps/x86_64/nptl/pthread_spin_lock.S:
ENTRY(__pthread_spin_lock)
1: LOCK
decl 0(%rdi)
jne 2f
xor %eax, %eax
ret
.align 16
2: rep
nop
cmpl $0, 0(%rdi)
jg 1b
jmp 2b
END(__pthread_spin_lock)
This does the atomic operation once, hoping it'll succeed. If it fails, it
enters the PAUSE+CMP+JG loop until the value is suitable.
^ permalink raw reply [flat|nested] 24+ messages in thread
* [Bug target/103069] cmpxchg isn't optimized
2021-11-03 19:08 [Bug target/103069] New: cmpxchg isn't optimized hjl.tools at gmail dot com
2021-11-03 20:53 ` [Bug target/103069] " thiago at kde dot org
@ 2021-11-04 21:25 ` thiago at kde dot org
2021-11-15 11:10 ` cvs-commit at gcc dot gnu.org
` (20 subsequent siblings)
22 siblings, 0 replies; 24+ messages in thread
From: thiago at kde dot org @ 2021-11-04 21:25 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103069
--- Comment #2 from Thiago Macieira <thiago at kde dot org> ---
See also bug 103090 for a few more (restricted) possibilities to replace a
cmpxchg loop with a LOCK RMW operation.
^ permalink raw reply [flat|nested] 24+ messages in thread
* [Bug target/103069] cmpxchg isn't optimized
2021-11-03 19:08 [Bug target/103069] New: cmpxchg isn't optimized hjl.tools at gmail dot com
2021-11-03 20:53 ` [Bug target/103069] " thiago at kde dot org
2021-11-04 21:25 ` thiago at kde dot org
@ 2021-11-15 11:10 ` cvs-commit at gcc dot gnu.org
2021-11-15 14:26 ` hjl.tools at gmail dot com
` (19 subsequent siblings)
22 siblings, 0 replies; 24+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2021-11-15 11:10 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103069
--- Comment #3 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Hongyu Wang <hongyuw@gcc.gnu.org>:
https://gcc.gnu.org/g:4d281ff7ddd8f6365943c0a622107f92315bb8a6
commit r12-5265-g4d281ff7ddd8f6365943c0a622107f92315bb8a6
Author: Hongyu Wang <hongyu.wang@intel.com>
Date: Fri Nov 12 10:50:46 2021 +0800
PR target/103069: Relax cmpxchg loop for x86 target
From the CPU's point of view, getting a cache line for writing is more
expensive than reading. See Appendix A.2 Spinlock in:
https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/
xeon-lock-scaling-analysis-paper.pdf
The full compare and swap will grab the cache line exclusive and causes
excessive cache line bouncing.
The atomic_fetch_{or,xor,and,nand} builtins generates cmpxchg loop under
-march=x86-64 like:
movl v(%rip), %eax
.L2:
movl %eax, %ecx
movl %eax, %edx
orl $1, %ecx
lock cmpxchgl %ecx, v(%rip)
jne .L2
movl %edx, %eax
andl $1, %eax
ret
To relax above loop, GCC should first emit a normal load, check and jump to
.L2 if cmpxchgl may fail. Before jump to .L2, PAUSE should be inserted to
yield the CPU to another hyperthread and to save power, so the code is
like
.L84:
movl (%rdi), %ecx
movl %eax, %edx
orl %esi, %edx
cmpl %eax, %ecx
jne .L82
lock cmpxchgl %edx, (%rdi)
jne .L84
.L82:
rep nop
jmp .L84
This patch adds corresponding atomic_fetch_op expanders to insert load/
compare and pause for all the atomic logic fetch builtins. Add flag
-mrelax-cmpxchg-loop to control whether to generate relaxed loop.
gcc/ChangeLog:
PR target/103069
* config/i386/i386-expand.c (ix86_expand_atomic_fetch_op_loop):
New expand function.
* config/i386/i386-options.c (ix86_target_string): Add
-mrelax-cmpxchg-loop flag.
(ix86_valid_target_attribute_inner_p): Likewise.
* config/i386/i386-protos.h (ix86_expand_atomic_fetch_op_loop):
New expand function prototype.
* config/i386/i386.opt: Add -mrelax-cmpxchg-loop.
* config/i386/sync.md (atomic_fetch_<logic><mode>): New expander
for SI,HI,QI modes.
(atomic_<logic>_fetch<mode>): Likewise.
(atomic_fetch_nand<mode>): Likewise.
(atomic_nand_fetch<mode>): Likewise.
(atomic_fetch_<logic><mode>): New expander for DI,TI modes.
(atomic_<logic>_fetch<mode>): Likewise.
(atomic_fetch_nand<mode>): Likewise.
(atomic_nand_fetch<mode>): Likewise.
* doc/invoke.texi: Document -mrelax-cmpxchg-loop.
gcc/testsuite/ChangeLog:
PR target/103069
* gcc.target/i386/pr103069-1.c: New test.
* gcc.target/i386/pr103069-2.c: Ditto.
^ permalink raw reply [flat|nested] 24+ messages in thread
* [Bug target/103069] cmpxchg isn't optimized
2021-11-03 19:08 [Bug target/103069] New: cmpxchg isn't optimized hjl.tools at gmail dot com
` (2 preceding siblings ...)
2021-11-15 11:10 ` cvs-commit at gcc dot gnu.org
@ 2021-11-15 14:26 ` hjl.tools at gmail dot com
2021-11-18 8:31 ` cvs-commit at gcc dot gnu.org
` (18 subsequent siblings)
22 siblings, 0 replies; 24+ messages in thread
From: hjl.tools at gmail dot com @ 2021-11-15 14:26 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103069
H.J. Lu <hjl.tools at gmail dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |RESOLVED
Target Milestone|--- |12.0
Resolution|--- |FIXED
--- Comment #4 from H.J. Lu <hjl.tools at gmail dot com> ---
Fixed for GCC 12.
^ permalink raw reply [flat|nested] 24+ messages in thread
* [Bug target/103069] cmpxchg isn't optimized
2021-11-03 19:08 [Bug target/103069] New: cmpxchg isn't optimized hjl.tools at gmail dot com
` (3 preceding siblings ...)
2021-11-15 14:26 ` hjl.tools at gmail dot com
@ 2021-11-18 8:31 ` cvs-commit at gcc dot gnu.org
2022-01-24 23:49 ` hjl.tools at gmail dot com
` (17 subsequent siblings)
22 siblings, 0 replies; 24+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2021-11-18 8:31 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103069
--- Comment #5 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Hongyu Wang <hongyuw@gcc.gnu.org>:
https://gcc.gnu.org/g:15f5e70cbb33b40c97325ef9d55557747a148d39
commit r12-5363-g15f5e70cbb33b40c97325ef9d55557747a148d39
Author: Hongyu Wang <hongyu.wang@intel.com>
Date: Thu Nov 18 14:45:23 2021 +0800
i386: Fix wrong codegen for -mrelax-cmpxchg-loop
For -mrelax-cmpxchg-loop introduced by PR 103069/r12-5265, it would
produce infinite loop. The correct code should be
.L84:
movl (%rdi), %ecx
movl %eax, %edx
orl %esi, %edx
cmpl %eax, %ecx
jne .L82
lock cmpxchgl %edx, (%rdi)
jne .L84
movl %r8d, %eax <<< retval is missing in previous impl
ret
.L82:
rep nop
jmp .L84
Adjust corresponding expander to fix such issue, and fix runtime test
so the problem would be exposed.
gcc/ChangeLog:
* config/i386/i386-expand.c (ix86_expand_atomic_fetch_op_loop):
Adjust generated cfg to avoid infinite loop.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr103069-2.c: Adjust.
^ permalink raw reply [flat|nested] 24+ messages in thread
* [Bug target/103069] cmpxchg isn't optimized
2021-11-03 19:08 [Bug target/103069] New: cmpxchg isn't optimized hjl.tools at gmail dot com
` (4 preceding siblings ...)
2021-11-18 8:31 ` cvs-commit at gcc dot gnu.org
@ 2022-01-24 23:49 ` hjl.tools at gmail dot com
2022-01-24 23:52 ` hjl.tools at gmail dot com
` (16 subsequent siblings)
22 siblings, 0 replies; 24+ messages in thread
From: hjl.tools at gmail dot com @ 2022-01-24 23:49 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103069
H.J. Lu <hjl.tools at gmail dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|RESOLVED |REOPENED
Ever confirmed|0 |1
Resolution|FIXED |---
Last reconfirmed| |2022-01-24
--- Comment #6 from H.J. Lu <hjl.tools at gmail dot com> ---
nptl/pthread_rwlock_common.c in glibc has:
unsigned int r = atomic_load_relaxed (&rwlock->__data.__readers);
/* Release MO so that subsequent readers or writers synchronize with us. */
while (!atomic_compare_exchange_weak_release
(&rwlock->__data.__readers, &r,
((r ^ PTHREAD_RWLOCK_WRLOCKED)
^ ((r >> PTHREAD_RWLOCK_READER_SHIFT) == 0 ? 0
: PTHREAD_RWLOCK_WRPHASE))))
{
/* TODO Back-off. */
}
-mrelax-cmpxchg-loop -O2 generates:
9240e: 89 c1 mov %eax,%ecx
92410: 31 d2 xor %edx,%edx
92412: c1 e9 03 shr $0x3,%ecx
92415: 0f 95 c2 setne %dl
92418: 31 c2 xor %eax,%edx
9241a: 83 f2 02 xor $0x2,%edx
9241d: f0 41 0f b1 10 lock cmpxchg %edx,(%r8)
92422: 75 ea jne 9240e
<__pthread_rwlock_unlock@GL
IBC_2.2.5+0x10e>
^ permalink raw reply [flat|nested] 24+ messages in thread
* [Bug target/103069] cmpxchg isn't optimized
2021-11-03 19:08 [Bug target/103069] New: cmpxchg isn't optimized hjl.tools at gmail dot com
` (5 preceding siblings ...)
2022-01-24 23:49 ` hjl.tools at gmail dot com
@ 2022-01-24 23:52 ` hjl.tools at gmail dot com
2022-01-24 23:53 ` hjl.tools at gmail dot com
` (15 subsequent siblings)
22 siblings, 0 replies; 24+ messages in thread
From: hjl.tools at gmail dot com @ 2022-01-24 23:52 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103069
--- Comment #7 from H.J. Lu <hjl.tools at gmail dot com> ---
nptl/pthread_mutex_unlock.c in glibc has:
do
{
newval = oldval & PTHREAD_MUTEX_PRIO_CEILING_MASK;
}
while (!atomic_compare_exchange_weak_release (&mutex->__data.__lock,
&oldval, newval));
GCC 12 generates:
90a81: 41 89 c0 mov %eax,%r8d
90a84: 41 81 e0 00 00 f8 ff and $0xfff80000,%r8d
90a8b: f0 44 0f b1 07 lock cmpxchg %r8d,(%rdi)
90a90: 75 ef jne 90a81
<__pthread_mutex_unlock_ful
l+0x81>
^ permalink raw reply [flat|nested] 24+ messages in thread
* [Bug target/103069] cmpxchg isn't optimized
2021-11-03 19:08 [Bug target/103069] New: cmpxchg isn't optimized hjl.tools at gmail dot com
` (6 preceding siblings ...)
2022-01-24 23:52 ` hjl.tools at gmail dot com
@ 2022-01-24 23:53 ` hjl.tools at gmail dot com
2022-01-24 23:55 ` hjl.tools at gmail dot com
` (14 subsequent siblings)
22 siblings, 0 replies; 24+ messages in thread
From: hjl.tools at gmail dot com @ 2022-01-24 23:53 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103069
--- Comment #8 from H.J. Lu <hjl.tools at gmail dot com> ---
nptl/pthread_create.c has
do
pd->nextevent = __nptl_last_event;
while (atomic_compare_and_exchange_bool_acq (&__nptl_last_event,
pd, pd->nextevent));
we got
8d1b7: 48 8b 05 da 55 16 00 mov 0x1655da(%rip),%rax
8d1be: 48 89 83 68 06 00 00 mov %rax,0x668(%rbx)
8d1c5: f0 4c 0f b1 35 ca 55 lock cmpxchg %r14,0x1655ca(%rip)
8d1ce: 75 e7 jne 8d1b7
<pthread_create@@GLIBC_2.34
+0xb57>
^ permalink raw reply [flat|nested] 24+ messages in thread
* [Bug target/103069] cmpxchg isn't optimized
2021-11-03 19:08 [Bug target/103069] New: cmpxchg isn't optimized hjl.tools at gmail dot com
` (7 preceding siblings ...)
2022-01-24 23:53 ` hjl.tools at gmail dot com
@ 2022-01-24 23:55 ` hjl.tools at gmail dot com
2022-01-25 0:04 ` thiago at kde dot org
` (13 subsequent siblings)
22 siblings, 0 replies; 24+ messages in thread
From: hjl.tools at gmail dot com @ 2022-01-24 23:55 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103069
H.J. Lu <hjl.tools at gmail dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|REOPENED |NEW
--- Comment #9 from H.J. Lu <hjl.tools at gmail dot com> ---
nptl/nptl_setxid.c in glibc has
do
{
flags = THREAD_GETMEM (self, cancelhandling);
newval = THREAD_ATOMIC_CMPXCHG_VAL (self, cancelhandling,
flags & ~SETXID_BITMASK, flags);
}
while (flags != newval);
GCC 12 generates:
899f0: 64 8b 14 25 08 03 00 mov %fs:0x308,%edx
899f8: 89 d6 mov %edx,%esi
899fa: 89 d0 mov %edx,%eax
899fc: 83 e6 bf and $0xffffffbf,%esi
899ff: f0 0f b1 31 lock cmpxchg %esi,(%rcx)
89a03: 75 eb jne 899f0
<__GI___nptl_setxid_sighand
ler+0x90>
^ permalink raw reply [flat|nested] 24+ messages in thread
* [Bug target/103069] cmpxchg isn't optimized
2021-11-03 19:08 [Bug target/103069] New: cmpxchg isn't optimized hjl.tools at gmail dot com
` (8 preceding siblings ...)
2022-01-24 23:55 ` hjl.tools at gmail dot com
@ 2022-01-25 0:04 ` thiago at kde dot org
2022-02-15 8:59 ` wwwhhhyyy333 at gmail dot com
` (12 subsequent siblings)
22 siblings, 0 replies; 24+ messages in thread
From: thiago at kde dot org @ 2022-01-25 0:04 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103069
--- Comment #10 from Thiago Macieira <thiago at kde dot org> ---
(In reply to H.J. Lu from comment #9)
> nptl/nptl_setxid.c in glibc has
>
> do
> {
> flags = THREAD_GETMEM (self, cancelhandling);
> newval = THREAD_ATOMIC_CMPXCHG_VAL (self, cancelhandling,
> flags & ~SETXID_BITMASK, flags);
> }
> while (flags != newval);
>
> GCC 12 generates:
>
> 899f0: 64 8b 14 25 08 03 00 mov %fs:0x308,%edx
> 899f8: 89 d6 mov %edx,%esi
> 899fa: 89 d0 mov %edx,%eax
> 899fc: 83 e6 bf and $0xffffffbf,%esi
> 899ff: f0 0f b1 31 lock cmpxchg %esi,(%rcx)
> 89a03: 75 eb jne 899f0 <__GI___nptl_setxid_sighand
> ler+0x90>
This one is a single bit. This one should be replaced with a LOCK BTC and no
loop.
^ permalink raw reply [flat|nested] 24+ messages in thread
* [Bug target/103069] cmpxchg isn't optimized
2021-11-03 19:08 [Bug target/103069] New: cmpxchg isn't optimized hjl.tools at gmail dot com
` (9 preceding siblings ...)
2022-01-25 0:04 ` thiago at kde dot org
@ 2022-02-15 8:59 ` wwwhhhyyy333 at gmail dot com
2022-02-22 3:36 ` cvs-commit at gcc dot gnu.org
` (11 subsequent siblings)
22 siblings, 0 replies; 24+ messages in thread
From: wwwhhhyyy333 at gmail dot com @ 2022-02-15 8:59 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103069
--- Comment #11 from Hongyu Wang <wwwhhhyyy333 at gmail dot com> ---
For the case with atomic_compare_exchange_weak_release, it can be expanded as
loop: mov %eax,%r8d
and $0xfff80000,%r8d
mov (%r8),%rsi <--- load lock first
cmp %rsi,%rax <--- compare with expected input
jne .L2 <--- lock ne expected
lock cmpxchg %r8d,(%rdi)
mov %rsi,%rax <--- perform the behavior of failed cmpxchg
jne loop
But this is not suitable for atomic_compare_exchange_strong, as the document
said
Unlike atomic_compare_exchange_weak, this strong version is required to always
return true when expected indeed compares equal to the contained object, not
allowing spurious failures. If we expand cmpxchg as above, it would result in
spurious failure since the load is not atomic.
So for
do
pd->nextevent = __nptl_last_event;
while (atomic_compare_and_exchange_bool_acq (&__nptl_last_event,
pd, pd->nextevent));
who invokes atomic_compare_exchange_strong we may not simply adjust the
expander. It is better to know the call is in loop condition and relax it
accordingly.
^ permalink raw reply [flat|nested] 24+ messages in thread
* [Bug target/103069] cmpxchg isn't optimized
2021-11-03 19:08 [Bug target/103069] New: cmpxchg isn't optimized hjl.tools at gmail dot com
` (10 preceding siblings ...)
2022-02-15 8:59 ` wwwhhhyyy333 at gmail dot com
@ 2022-02-22 3:36 ` cvs-commit at gcc dot gnu.org
2022-02-22 3:38 ` wwwhhhyyy333 at gmail dot com
` (10 subsequent siblings)
22 siblings, 0 replies; 24+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2022-02-22 3:36 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103069
--- Comment #12 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Hongyu Wang <hongyuw@gcc.gnu.org>:
https://gcc.gnu.org/g:0435b978f95971e139882549f5a1765c50682216
commit r12-7316-g0435b978f95971e139882549f5a1765c50682216
Author: Hongyu Wang <hongyu.wang@intel.com>
Date: Fri Feb 11 14:44:15 2022 +0800
i386: Relax cmpxchg instruction under -mrelax-cmpxchg-loop [PR103069]
For cmpxchg, it is commonly used in spin loop, and several user code
such as pthread directly takes cmpxchg as loop condition, which cause
huge cache bouncing.
This patch extends previous implementation to relax all cmpxchg
instruction under -mrelax-cmpxchg-loop with an extra atomic load,
compare and emulate the failed cmpxchg behavior.
For original spin loop which looks like
loop: mov %eax,%r8d
or $1,%r8d
lock cmpxchg %r8d,(%rdi)
jne loop
It will now truns to
loop: mov %eax,%r8d
or $1,%r8d
mov (%r8),%rsi <--- load lock first
cmp %rsi,%rax <--- compare with expected input
jne .L2 <--- lock ne expected
lock cmpxchg %r8d,(%rdi)
jne loop
L2: mov %rsi,%rax <--- perform the behavior of failed cmpxchg
jne loop
under -mrelax-cmpxchg-loop.
gcc/ChangeLog:
PR target/103069
* config/i386/i386-expand.cc (ix86_expand_atomic_fetch_op_loop):
Split atomic fetch and loop part.
(ix86_expand_cmpxchg_loop): New expander for cmpxchg loop.
* config/i386/i386-protos.h (ix86_expand_cmpxchg_loop): New
prototype.
* config/i386/sync.md (atomic_compare_and_swap<mode>): Call new
expander under TARGET_RELAX_CMPXCHG_LOOP.
(atomic_compare_and_swap<mode>): Likewise for doubleword modes.
gcc/testsuite/ChangeLog:
PR target/103069
* gcc.target/i386/pr103069-2.c: Adjust result check.
* gcc.target/i386/pr103069-3.c: New test.
* gcc.target/i386/pr103069-4.c: Likewise.
^ permalink raw reply [flat|nested] 24+ messages in thread
* [Bug target/103069] cmpxchg isn't optimized
2021-11-03 19:08 [Bug target/103069] New: cmpxchg isn't optimized hjl.tools at gmail dot com
` (11 preceding siblings ...)
2022-02-22 3:36 ` cvs-commit at gcc dot gnu.org
@ 2022-02-22 3:38 ` wwwhhhyyy333 at gmail dot com
2022-02-22 4:16 ` thiago at kde dot org
` (9 subsequent siblings)
22 siblings, 0 replies; 24+ messages in thread
From: wwwhhhyyy333 at gmail dot com @ 2022-02-22 3:38 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103069
--- Comment #13 from Hongyu Wang <wwwhhhyyy333 at gmail dot com> ---
All above glibc cases are now both relaxed by an load/cmp to skip cmpxchg under
-mrelax-cmpxchg-loop,
but for
> do
> {
> flags = THREAD_GETMEM (self, cancelhandling);
> newval = THREAD_ATOMIC_CMPXCHG_VAL (self, cancelhandling,
> flags & ~SETXID_BITMASK, flags);
> }
> while (flags != newval);
If we want to optimize it to lock btc, we need to know the cmpxchg lies in a
loop. So it may require an extra pass to do further analysis and optimize,
which is not a good idea to do in stage 4.
^ permalink raw reply [flat|nested] 24+ messages in thread
* [Bug target/103069] cmpxchg isn't optimized
2021-11-03 19:08 [Bug target/103069] New: cmpxchg isn't optimized hjl.tools at gmail dot com
` (12 preceding siblings ...)
2022-02-22 3:38 ` wwwhhhyyy333 at gmail dot com
@ 2022-02-22 4:16 ` thiago at kde dot org
2022-02-22 8:21 ` wwwhhhyyy333 at gmail dot com
` (8 subsequent siblings)
22 siblings, 0 replies; 24+ messages in thread
From: thiago at kde dot org @ 2022-02-22 4:16 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103069
--- Comment #14 from Thiago Macieira <thiago at kde dot org> ---
I'd restrict relaxations to loops emitted by the compiler. All other atomic
operations shouldn't be modified at all, unless the user asks for it. That
includes non-looping atomic operations (like LOCK BTC, LOCK XADD) as well as a
pure LOCK CMPXCHG that came from a single __atomic_compare_exchange by the
user.
I'd welcome the ability to relax the latter, especially if with one codebase I
could be efficient in CAS architectures as well as LL/SC ones.
^ permalink raw reply [flat|nested] 24+ messages in thread
* [Bug target/103069] cmpxchg isn't optimized
2021-11-03 19:08 [Bug target/103069] New: cmpxchg isn't optimized hjl.tools at gmail dot com
` (13 preceding siblings ...)
2022-02-22 4:16 ` thiago at kde dot org
@ 2022-02-22 8:21 ` wwwhhhyyy333 at gmail dot com
2022-02-22 18:05 ` thiago at kde dot org
` (7 subsequent siblings)
22 siblings, 0 replies; 24+ messages in thread
From: wwwhhhyyy333 at gmail dot com @ 2022-02-22 8:21 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103069
--- Comment #15 from Hongyu Wang <wwwhhhyyy333 at gmail dot com> ---
(In reply to Thiago Macieira from comment #14)
> I'd restrict relaxations to loops emitted by the compiler. All other atomic
> operations shouldn't be modified at all, unless the user asks for it. That
> includes non-looping atomic operations (like LOCK BTC, LOCK XADD) as well as
> a pure LOCK CMPXCHG that came from a single __atomic_compare_exchange by the
> user.
>
> I'd welcome the ability to relax the latter, especially if with one codebase
> I could be efficient in CAS architectures as well as LL/SC ones.
The latest patch relaxed the pure LOCK CMPXCHG with -mrelax-cmpxchg-loop as the
commit message shows. So if you want, I can split this part to another switch
like -mrelax-cmpxchg-insn.
^ permalink raw reply [flat|nested] 24+ messages in thread
* [Bug target/103069] cmpxchg isn't optimized
2021-11-03 19:08 [Bug target/103069] New: cmpxchg isn't optimized hjl.tools at gmail dot com
` (14 preceding siblings ...)
2022-02-22 8:21 ` wwwhhhyyy333 at gmail dot com
@ 2022-02-22 18:05 ` thiago at kde dot org
2022-02-22 18:41 ` jakub at gcc dot gnu.org
` (6 subsequent siblings)
22 siblings, 0 replies; 24+ messages in thread
From: thiago at kde dot org @ 2022-02-22 18:05 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103069
--- Comment #16 from Thiago Macieira <thiago at kde dot org> ---
Can this option be enabled and disabled with a _Pragma?
^ permalink raw reply [flat|nested] 24+ messages in thread
* [Bug target/103069] cmpxchg isn't optimized
2021-11-03 19:08 [Bug target/103069] New: cmpxchg isn't optimized hjl.tools at gmail dot com
` (15 preceding siblings ...)
2022-02-22 18:05 ` thiago at kde dot org
@ 2022-02-22 18:41 ` jakub at gcc dot gnu.org
2022-02-22 20:25 ` thiago at kde dot org
` (5 subsequent siblings)
22 siblings, 0 replies; 24+ messages in thread
From: jakub at gcc dot gnu.org @ 2022-02-22 18:41 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103069
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |jakub at gcc dot gnu.org
--- Comment #17 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
_Pragma("GCC target \"relax-cmpxchg-loop\"")
should do that (ditto target("relax-cmpxchg-loop") attribute).
^ permalink raw reply [flat|nested] 24+ messages in thread
* [Bug target/103069] cmpxchg isn't optimized
2021-11-03 19:08 [Bug target/103069] New: cmpxchg isn't optimized hjl.tools at gmail dot com
` (16 preceding siblings ...)
2022-02-22 18:41 ` jakub at gcc dot gnu.org
@ 2022-02-22 20:25 ` thiago at kde dot org
2022-02-23 3:35 ` wwwhhhyyy333 at gmail dot com
` (4 subsequent siblings)
22 siblings, 0 replies; 24+ messages in thread
From: thiago at kde dot org @ 2022-02-22 20:25 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103069
--- Comment #18 from Thiago Macieira <thiago at kde dot org> ---
(In reply to Jakub Jelinek from comment #17)
> _Pragma("GCC target \"relax-cmpxchg-loop\"")
> should do that (ditto target("relax-cmpxchg-loop") attribute).
The attribute is applied to a function. I'm hoping to do it for s block of
code:
_Pragma("GCC push_options")
_Pragma("GCC target \"relax-cmpxchg-loop\"")
__atomic_compare_exchange_weak(....);
_Pragma("GCC pop_options")
^ permalink raw reply [flat|nested] 24+ messages in thread
* [Bug target/103069] cmpxchg isn't optimized
2021-11-03 19:08 [Bug target/103069] New: cmpxchg isn't optimized hjl.tools at gmail dot com
` (17 preceding siblings ...)
2022-02-22 20:25 ` thiago at kde dot org
@ 2022-02-23 3:35 ` wwwhhhyyy333 at gmail dot com
2022-02-23 4:06 ` thiago at kde dot org
` (3 subsequent siblings)
22 siblings, 0 replies; 24+ messages in thread
From: wwwhhhyyy333 at gmail dot com @ 2022-02-23 3:35 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103069
--- Comment #19 from Hongyu Wang <wwwhhhyyy333 at gmail dot com> ---
(In reply to Thiago Macieira from comment #18)
> (In reply to Jakub Jelinek from comment #17)
> > _Pragma("GCC target \"relax-cmpxchg-loop\"")
> > should do that (ditto target("relax-cmpxchg-loop") attribute).
>
> The attribute is applied to a function. I'm hoping to do it for s block of
> code:
>
> _Pragma("GCC push_options")
> _Pragma("GCC target \"relax-cmpxchg-loop\"")
> __atomic_compare_exchange_weak(....);
> _Pragma("GCC pop_options")
I'm not aware of any target __attribute__ or #Pragma can be used to code block,
at this level user can change their code directly, so I don't know why it is
needed..
^ permalink raw reply [flat|nested] 24+ messages in thread
* [Bug target/103069] cmpxchg isn't optimized
2021-11-03 19:08 [Bug target/103069] New: cmpxchg isn't optimized hjl.tools at gmail dot com
` (18 preceding siblings ...)
2022-02-23 3:35 ` wwwhhhyyy333 at gmail dot com
@ 2022-02-23 4:06 ` thiago at kde dot org
2022-04-13 8:18 ` cvs-commit at gcc dot gnu.org
` (2 subsequent siblings)
22 siblings, 0 replies; 24+ messages in thread
From: thiago at kde dot org @ 2022-02-23 4:06 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103069
--- Comment #20 from Thiago Macieira <thiago at kde dot org> ---
I think there will be cases where the relaxation makes sense and others where
it doesn't because the surrounding code already does it. So I'd like to control
per emission.
If I can't do it per code block, I suppose I could make a lambda block
[&]() __attribute__((target("relax-cmpxchg-loop"))) {
return __atomic_compare_exchange_weak(....);
}();
Of course, it might be easier to simply relax it myself at that point.
^ permalink raw reply [flat|nested] 24+ messages in thread
* [Bug target/103069] cmpxchg isn't optimized
2021-11-03 19:08 [Bug target/103069] New: cmpxchg isn't optimized hjl.tools at gmail dot com
` (19 preceding siblings ...)
2022-02-23 4:06 ` thiago at kde dot org
@ 2022-04-13 8:18 ` cvs-commit at gcc dot gnu.org
2022-05-06 8:31 ` jakub at gcc dot gnu.org
2023-05-08 12:23 ` rguenth at gcc dot gnu.org
22 siblings, 0 replies; 24+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2022-04-13 8:18 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103069
--- Comment #21 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Hongyu Wang <hongyuw@gcc.gnu.org>:
https://gcc.gnu.org/g:522f25e90c781d284f8347a04940db8b41c42fd5
commit r12-8136-g522f25e90c781d284f8347a04940db8b41c42fd5
Author: Hongyu Wang <hongyu.wang@intel.com>
Date: Wed Apr 13 14:51:36 2022 +0800
i386: Fix infinite loop under -mrelax-cmpxchg-loop [PR 103069]
For -mrelax-cmpxchg-loop which relaxes atomic_fetch_<logic> loops,
there is a missing set to %eax when compare fails, which would result
in infinite loop in some benchmark. Add set to %eax to avoid it.
gcc/ChangeLog:
PR target/103069
* config/i386/i386-expand.cc (ix86_expand_cmpxchg_loop):
Add missing set to target_val at pause label.
^ permalink raw reply [flat|nested] 24+ messages in thread
* [Bug target/103069] cmpxchg isn't optimized
2021-11-03 19:08 [Bug target/103069] New: cmpxchg isn't optimized hjl.tools at gmail dot com
` (20 preceding siblings ...)
2022-04-13 8:18 ` cvs-commit at gcc dot gnu.org
@ 2022-05-06 8:31 ` jakub at gcc dot gnu.org
2023-05-08 12:23 ` rguenth at gcc dot gnu.org
22 siblings, 0 replies; 24+ messages in thread
From: jakub at gcc dot gnu.org @ 2022-05-06 8:31 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103069
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|12.0 |12.2
--- Comment #22 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
GCC 12.1 is being released, retargeting bugs to GCC 12.2.
^ permalink raw reply [flat|nested] 24+ messages in thread
* [Bug target/103069] cmpxchg isn't optimized
2021-11-03 19:08 [Bug target/103069] New: cmpxchg isn't optimized hjl.tools at gmail dot com
` (21 preceding siblings ...)
2022-05-06 8:31 ` jakub at gcc dot gnu.org
@ 2023-05-08 12:23 ` rguenth at gcc dot gnu.org
22 siblings, 0 replies; 24+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-05-08 12:23 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103069
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|12.3 |12.4
--- Comment #24 from Richard Biener <rguenth at gcc dot gnu.org> ---
GCC 12.3 is being released, retargeting bugs to GCC 12.4.
^ permalink raw reply [flat|nested] 24+ messages in thread