* [Bug target/55966] __atomic_fetch_* generate wrong code for HLE
2013-01-14 6:11 [Bug target/55966] New: __atomic_fetch_* generate wrong code for HLE andi-gcc at firstfloor dot org
@ 2013-01-14 19:06 ` andi-gcc at firstfloor dot org
2013-01-14 19:52 ` andi-gcc at firstfloor dot org
` (8 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: andi-gcc at firstfloor dot org @ 2013-01-14 19:06 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55966
--- Comment #1 from Andi Kleen <andi-gcc at firstfloor dot org> 2013-01-14 19:06:02 UTC ---
Here's a test case. This requires the libstdc++ HLE patch from
http://gcc.gnu.org/ml/gcc-patches/2013-01/msg00673.html
g++ -std=gnu++0x
#include <atomic>
#define ACQ memory_order_acquire | __memory_order_hle_acquire
#define REL memory_order_release | __memory_order_hle_release
int main()
{
using namespace std;
atomic_ulong au = ATOMIC_VAR_INIT(0);
if (!au.fetch_and(1, ACQ))
au.fetch_and(-1, REL);
unsigned lock = 0;
__atomic_fetch_and(&lock, 1, __ATOMIC_HLE_ACQUIRE|__ATOMIC_ACQUIRE);
return 0;
}
The first fetch_and generates incorrectly:
.L2:
movq %rax, %rcx
movq %rax, %rdx
andl $1, %ecx
lock; cmpxchgq %rcx, -24(%rsp)
jne .L2
The second generates correctly:
lock;
.byte 0xf3
andq $-1, -24(%rsp)
The __atomic_fetch_and generates correctly:
lock;
.byte 0xf2
andl $1, -28(%rsp)
The first fetch_and should generate the same code sequence.
^ permalink raw reply [flat|nested] 11+ messages in thread
* [Bug target/55966] __atomic_fetch_* generate wrong code for HLE
2013-01-14 6:11 [Bug target/55966] New: __atomic_fetch_* generate wrong code for HLE andi-gcc at firstfloor dot org
2013-01-14 19:06 ` [Bug target/55966] " andi-gcc at firstfloor dot org
@ 2013-01-14 19:52 ` andi-gcc at firstfloor dot org
2013-01-14 20:15 ` andi-gcc at firstfloor dot org
` (7 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: andi-gcc at firstfloor dot org @ 2013-01-14 19:52 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55966
--- Comment #2 from Andi Kleen <andi-gcc at firstfloor dot org> 2013-01-14 19:52:03 UTC ---
Created attachment 29163
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=29163
preprocessed testcase
^ permalink raw reply [flat|nested] 11+ messages in thread
* [Bug target/55966] __atomic_fetch_* generate wrong code for HLE
2013-01-14 6:11 [Bug target/55966] New: __atomic_fetch_* generate wrong code for HLE andi-gcc at firstfloor dot org
2013-01-14 19:06 ` [Bug target/55966] " andi-gcc at firstfloor dot org
2013-01-14 19:52 ` andi-gcc at firstfloor dot org
@ 2013-01-14 20:15 ` andi-gcc at firstfloor dot org
2013-01-14 21:22 ` ubizjak at gmail dot com
` (6 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: andi-gcc at firstfloor dot org @ 2013-01-14 20:15 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55966
Andi Kleen <andi-gcc at firstfloor dot org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Attachment #29163|0 |1
is obsolete| |
--- Comment #3 from Andi Kleen <andi-gcc at firstfloor dot org> 2013-01-14 20:15:11 UTC ---
Created attachment 29164
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=29164
preprocessed test case
Sorry earlier was the wrong file
^ permalink raw reply [flat|nested] 11+ messages in thread
* [Bug target/55966] __atomic_fetch_* generate wrong code for HLE
2013-01-14 6:11 [Bug target/55966] New: __atomic_fetch_* generate wrong code for HLE andi-gcc at firstfloor dot org
` (2 preceding siblings ...)
2013-01-14 20:15 ` andi-gcc at firstfloor dot org
@ 2013-01-14 21:22 ` ubizjak at gmail dot com
2013-01-14 21:25 ` ubizjak at gmail dot com
` (5 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: ubizjak at gmail dot com @ 2013-01-14 21:22 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55966
--- Comment #4 from Uros Bizjak <ubizjak at gmail dot com> 2013-01-14 21:22:35 UTC ---
The problem is, that in failed case maybe_emit_op() gets target register to
return the result to, so with after=false, it expands via
optab->mem_fetch_before.
Unfortunately, atomic_fetch_<logic> or atomic_<logic>_fetch do not exist for
x86 target, so generic expander falls back to exchange loop.
^ permalink raw reply [flat|nested] 11+ messages in thread
* [Bug target/55966] __atomic_fetch_* generate wrong code for HLE
2013-01-14 6:11 [Bug target/55966] New: __atomic_fetch_* generate wrong code for HLE andi-gcc at firstfloor dot org
` (3 preceding siblings ...)
2013-01-14 21:22 ` ubizjak at gmail dot com
@ 2013-01-14 21:25 ` ubizjak at gmail dot com
2013-01-14 22:05 ` andi-gcc at firstfloor dot org
` (4 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: ubizjak at gmail dot com @ 2013-01-14 21:25 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55966
--- Comment #5 from Uros Bizjak <ubizjak at gmail dot com> 2013-01-14 21:25:13 UTC ---
Following testcase will expand to a cmpxchg loop:
int hle_and (int *p, int v)
{
return __atomic_fetch_and_4 (p, v, __ATOMIC_ACQUIRE | __ATOMIC_HLE_ACQUIRE);
}
^ permalink raw reply [flat|nested] 11+ messages in thread
* [Bug target/55966] __atomic_fetch_* generate wrong code for HLE
2013-01-14 6:11 [Bug target/55966] New: __atomic_fetch_* generate wrong code for HLE andi-gcc at firstfloor dot org
` (4 preceding siblings ...)
2013-01-14 21:25 ` ubizjak at gmail dot com
@ 2013-01-14 22:05 ` andi-gcc at firstfloor dot org
2013-01-14 22:23 ` amacleod at redhat dot com
` (3 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: andi-gcc at firstfloor dot org @ 2013-01-14 22:05 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55966
--- Comment #6 from Andi Kleen <andi-gcc at firstfloor dot org> 2013-01-14 22:05:38 UTC ---
Hmm that's true. x86 doesn't have xand, x_or, x_xor, only xadd
Maybe cmpxchg is the only way?
For some special cases it can be done (like and single bit-> btr, or single bit
-> btr), but it's probably complicated to implement.
In this case I would prefer to forbid those for HLE.
I guess more arguments for the target hook.
Other ideas?
^ permalink raw reply [flat|nested] 11+ messages in thread
* [Bug target/55966] __atomic_fetch_* generate wrong code for HLE
2013-01-14 6:11 [Bug target/55966] New: __atomic_fetch_* generate wrong code for HLE andi-gcc at firstfloor dot org
` (5 preceding siblings ...)
2013-01-14 22:05 ` andi-gcc at firstfloor dot org
@ 2013-01-14 22:23 ` amacleod at redhat dot com
2013-01-14 22:32 ` andi-gcc at firstfloor dot org
` (2 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: amacleod at redhat dot com @ 2013-01-14 22:23 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55966
Andrew Macleod <amacleod at redhat dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |amacleod at redhat dot com
--- Comment #7 from Andrew Macleod <amacleod at redhat dot com> 2013-01-14 22:22:58 UTC ---
What do you mean 'forbid' it?
Isn't the HLE bit suppose to be a hint anyway? Saying if the hardware supports
it, use it on this atomic instruction... Its not incorrect to ignore it, and
some targets simply don't support it. so its ignored.
I would expect that when we have to drop back to a compare-exchange loop we're
no longer going to really care?
^ permalink raw reply [flat|nested] 11+ messages in thread
* [Bug target/55966] __atomic_fetch_* generate wrong code for HLE
2013-01-14 6:11 [Bug target/55966] New: __atomic_fetch_* generate wrong code for HLE andi-gcc at firstfloor dot org
` (6 preceding siblings ...)
2013-01-14 22:23 ` amacleod at redhat dot com
@ 2013-01-14 22:32 ` andi-gcc at firstfloor dot org
2013-01-14 22:34 ` andi-gcc at firstfloor dot org
2024-04-09 2:58 ` pinskia at gcc dot gnu.org
9 siblings, 0 replies; 11+ messages in thread
From: andi-gcc at firstfloor dot org @ 2013-01-14 22:32 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55966
--- Comment #8 from Andi Kleen <andi-gcc at firstfloor dot org> 2013-01-14 22:32:06 UTC ---
forbid = give warning and drop bit
It's a hint, but in a good implementation it should not be silently dropped or
code generated that has no chance to elide. It's a quality of implementation
issue.
IMHO there should be at least a warning for this case, like we do for other
cases where elision doesn't work.
^ permalink raw reply [flat|nested] 11+ messages in thread
* [Bug target/55966] __atomic_fetch_* generate wrong code for HLE
2013-01-14 6:11 [Bug target/55966] New: __atomic_fetch_* generate wrong code for HLE andi-gcc at firstfloor dot org
` (7 preceding siblings ...)
2013-01-14 22:32 ` andi-gcc at firstfloor dot org
@ 2013-01-14 22:34 ` andi-gcc at firstfloor dot org
2024-04-09 2:58 ` pinskia at gcc dot gnu.org
9 siblings, 0 replies; 11+ messages in thread
From: andi-gcc at firstfloor dot org @ 2013-01-14 22:34 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55966
--- Comment #9 from Andi Kleen <andi-gcc at firstfloor dot org> 2013-01-14 22:34:16 UTC ---
Also i need to look more closely, but most likely the C++ atomic code should be
changed to avoid this situation. This would give much better code on x86 in any
case even without elision.
^ permalink raw reply [flat|nested] 11+ messages in thread
* [Bug target/55966] __atomic_fetch_* generate wrong code for HLE
2013-01-14 6:11 [Bug target/55966] New: __atomic_fetch_* generate wrong code for HLE andi-gcc at firstfloor dot org
` (8 preceding siblings ...)
2013-01-14 22:34 ` andi-gcc at firstfloor dot org
@ 2024-04-09 2:58 ` pinskia at gcc dot gnu.org
9 siblings, 0 replies; 11+ messages in thread
From: pinskia at gcc dot gnu.org @ 2024-04-09 2:58 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55966
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Resolution|--- |WONTFIX
Status|UNCONFIRMED |RESOLVED
--- Comment #10 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Dropping __ATOMIC_HLE_ACQUIRE/__ATOMIC_HLE_RELEASE is always ok thing to do, it
is only there for a hint anyways. So closing as won't fix as Intel also disable
HLE on almost all cores anyways:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c2955f270a84762343000f103e0640d29c7a96f3
^ permalink raw reply [flat|nested] 11+ messages in thread