public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug target/110184] New: [i386] Missed optimisation: atomic operations should use PF, ZF and SF
@ 2023-06-08 22:55 thiago at kde dot org
2023-06-08 23:12 ` [Bug target/110184] " pinskia at gcc dot gnu.org
` (3 more replies)
0 siblings, 4 replies; 5+ messages in thread
From: thiago at kde dot org @ 2023-06-08 22:55 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110184
Bug ID: 110184
Summary: [i386] Missed optimisation: atomic operations should
use PF, ZF and SF
Product: gcc
Version: 13.1.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: thiago at kde dot org
Target Milestone: ---
Follow up from https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102566
The x86 locked ALU operations always set PF, ZF and SF, so the atomic builtins
could use those to emit more optimal code instead of a cmpxchg loop.
Given:
template <auto Op> int atomic_rmw_op(std::atomic_int &i)
{
int old = Op(i);
if (old == 0)
return 1;
if (old < 0)
return 2;
return 0;
}
-------
Starting with the non-standard __atomic_OP_fetch, the current code for
inline int andn_fetch_1(std::atomic_int &i)
{
return __atomic_and_fetch((int *)&i, ~1, 0);
}
is
L33:
movl %eax, %edx
andl $-2, %edx
lock cmpxchgl %edx, (%rdi)
jne .L33
movl %edx, %eax
shrl $31, %eax
addl %eax, %eax // eax = 2 if edx < 0
testl %edx, %edx
movl $1, %edx
cmove %edx, %eax
But it could be more optimally written as:
movl %ecx, 1
movl %edx, 2
xorl %eax, %eax
lock andl $-2, (%rdi)
cmove %ecx, %eax
cmovs %edx, %eax
The other __atomic_OP_fetch operations are very similar. I note that GCC
already realises that if you perform __atomic_and_fetch(ptr, 1), the result
can't have the sign bit set.
-------
For the standard atomic_fetch_OP operations, there are a couple of caveats:
fetch_and: if the retrieved value is ANDed again with the same pattern; for
example:
int pattern = 0x80000001;
return i.fetch_and(pattern, std::memory_order_relaxed) & pattern;
This appears to be partially implemented, depending on what the pattern is. For
example, it generates the optimal code for pattern = 3, 15, 0x7fffffff,
0x80000000. It appears to be related to testing for either SF or ZF, but not
both.
fetch_or: always for SF, for the useful case when the pattern being ORed
doesn't already contain the sign bit. If it does (a "non-useful case"), then
the comparison is a constant, and likewise for ZF because it's never set if the
pattern isn't zero.
fetch_xor: always, because the original value is reconstructible. Avoid
generating unnecessary code in case the code already does the XOR itself, as
in:
return i.fetch_xor(1, std::memory_order_relaxed) ^ 1;
See https://gcc.godbolt.org/z/n9bMnaE4e for full results.
^ permalink raw reply [flat|nested] 5+ messages in thread
* [Bug target/110184] [i386] Missed optimisation: atomic operations should use PF, ZF and SF
2023-06-08 22:55 [Bug target/110184] New: [i386] Missed optimisation: atomic operations should use PF, ZF and SF thiago at kde dot org
@ 2023-06-08 23:12 ` pinskia at gcc dot gnu.org
2023-06-08 23:16 ` [Bug target/110184] [x86] " pinskia at gcc dot gnu.org
` (2 subsequent siblings)
3 siblings, 0 replies; 5+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-06-08 23:12 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110184
--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Created attachment 55288
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55288&action=edit
Full testcase
^ permalink raw reply [flat|nested] 5+ messages in thread
* [Bug target/110184] [x86] Missed optimisation: atomic operations should use PF, ZF and SF
2023-06-08 22:55 [Bug target/110184] New: [i386] Missed optimisation: atomic operations should use PF, ZF and SF thiago at kde dot org
2023-06-08 23:12 ` [Bug target/110184] " pinskia at gcc dot gnu.org
@ 2023-06-08 23:16 ` pinskia at gcc dot gnu.org
2023-11-19 17:58 ` securesneakers at gmail dot com
2023-11-19 17:59 ` securesneakers at gmail dot com
3 siblings, 0 replies; 5+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-06-08 23:16 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110184
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Severity|normal |enhancement
^ permalink raw reply [flat|nested] 5+ messages in thread
* [Bug target/110184] [x86] Missed optimisation: atomic operations should use PF, ZF and SF
2023-06-08 22:55 [Bug target/110184] New: [i386] Missed optimisation: atomic operations should use PF, ZF and SF thiago at kde dot org
2023-06-08 23:12 ` [Bug target/110184] " pinskia at gcc dot gnu.org
2023-06-08 23:16 ` [Bug target/110184] [x86] " pinskia at gcc dot gnu.org
@ 2023-11-19 17:58 ` securesneakers at gmail dot com
2023-11-19 17:59 ` securesneakers at gmail dot com
3 siblings, 0 replies; 5+ messages in thread
From: securesneakers at gmail dot com @ 2023-11-19 17:58 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110184
Ivan Bodrov <securesneakers at gmail dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |securesneakers at gmail dot com
--- Comment #2 from Ivan Bodrov <securesneakers at gmail dot com> ---
This seem to have been implemented, at least for __atomic_fetch_and, but the
optimization is very fragile and fails when "lock and" value and mask used
during checking come from separate literals:
$ cat fragile-fetch-and.c
void slowpath(unsigned long *p);
void func_bad(unsigned long *p)
{
if (__atomic_fetch_and(p, ~1UL, __ATOMIC_RELAXED) & ~1UL)
slowpath(p);
}
void func_good(unsigned long *p)
{
unsigned long mask = ~1UL;
if (__atomic_fetch_and(p, mask, __ATOMIC_RELAXED) & mask)
slowpath(p);
}
Compiling this we can see that even though functions are the same, the first
one wasn't optimized:
$ gcc --version
gcc (GCC) 13.2.1 20230801
Copyright (C) 2023 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
$ uname -s -m
Linux x86_64
$ gcc -O2 -c fragile-fetch-and.c
$ objdump -d fragile-fetch-and.o
fragile-fetch-and.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <func_bad>:
0: 48 8b 07 mov (%rdi),%rax
3: 48 89 c1 mov %rax,%rcx
6: 48 89 c2 mov %rax,%rdx
9: 48 83 e1 fe and $0xfffffffffffffffe,%rcx
d: f0 48 0f b1 0f lock cmpxchg %rcx,(%rdi)
12: 75 ef jne 3 <func_bad+0x3>
14: 48 83 fa 01 cmp $0x1,%rdx
18: 77 06 ja 20 <func_bad+0x20>
1a: c3 ret
1b: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
20: e9 00 00 00 00 jmp 25 <func_bad+0x25>
25: 66 66 2e 0f 1f 84 00 data16 cs nopw 0x0(%rax,%rax,1)
2c: 00 00 00 00
0000000000000030 <func_good>:
30: f0 48 83 27 fe lock andq $0xfffffffffffffffe,(%rdi)
35: 75 09 jne 40 <func_good+0x10>
37: c3 ret
38: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
3f: 00
40: e9 00 00 00 00 jmp 45 <func_good+0x15>
^ permalink raw reply [flat|nested] 5+ messages in thread
* [Bug target/110184] [x86] Missed optimisation: atomic operations should use PF, ZF and SF
2023-06-08 22:55 [Bug target/110184] New: [i386] Missed optimisation: atomic operations should use PF, ZF and SF thiago at kde dot org
` (2 preceding siblings ...)
2023-11-19 17:58 ` securesneakers at gmail dot com
@ 2023-11-19 17:59 ` securesneakers at gmail dot com
3 siblings, 0 replies; 5+ messages in thread
From: securesneakers at gmail dot com @ 2023-11-19 17:59 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110184
--- Comment #3 from Ivan Bodrov <securesneakers at gmail dot com> ---
Created attachment 56646
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56646&action=edit
Fails to apply optimization for __atomic_fetch_and ZF-flag with separate
literals
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2023-11-19 17:59 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-06-08 22:55 [Bug target/110184] New: [i386] Missed optimisation: atomic operations should use PF, ZF and SF thiago at kde dot org
2023-06-08 23:12 ` [Bug target/110184] " pinskia at gcc dot gnu.org
2023-06-08 23:16 ` [Bug target/110184] [x86] " pinskia at gcc dot gnu.org
2023-11-19 17:58 ` securesneakers at gmail dot com
2023-11-19 17:59 ` securesneakers at gmail dot com
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).