public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug target/54087] New: __atomic_fetch_add does not use xadd instruction
@ 2012-07-24 19:00 drepper.fsp at gmail dot com
2012-07-24 19:21 ` [Bug target/54087] " ubizjak at gmail dot com
` (10 more replies)
0 siblings, 11 replies; 12+ messages in thread
From: drepper.fsp at gmail dot com @ 2012-07-24 19:00 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54087
Bug #: 54087
Summary: __atomic_fetch_add does not use xadd instruction
Classification: Unclassified
Product: gcc
Version: 4.8.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
AssignedTo: unassigned@gcc.gnu.org
ReportedBy: drepper.fsp@gmail.com
Target: x86_64-redhat-linux
Compiling this code
int a;
int f1(int p)
{
return __atomic_sub_fetch(&a, p, __ATOMIC_SEQ_CST) == 0;
}
int f2(int p)
{
return __atomic_fetch_sub(&a, p, __ATOMIC_SEQ_CST) - p == 0;
}
you'll see that neither function uses the xadd instruction with the lock
prefix. Instead an expensive emulation using cmpxchg is used:
0000000000000000 <f1>:
0: 8b 05 00 00 00 00 mov 0x0(%rip),%eax # 6 <f1+0x6>
2: R_X86_64_PC32 a-0x4
6: 89 c2 mov %eax,%edx
8: 29 fa sub %edi,%edx
a: f0 0f b1 15 00 00 00 lock cmpxchg %edx,0x0(%rip) # 12
<f1+0x12>
11: 00
e: R_X86_64_PC32 a-0x4
12: 75 f2 jne 6 <f1+0x6>
14: 31 c0 xor %eax,%eax
16: 85 d2 test %edx,%edx
18: 0f 94 c0 sete %al
1b: c3 retq
This implementation not only is larger, it has possibly (unlikely) unbounded
cost and even if the cmpxchg succeeds right away it is costlier. The last
point is esepcially true if the cache line for the variable in question is not
in the core's cache. In this case the initial load causes a I->S transition
for the cache line and the cmpxchg an additional and possibly also very
expensive S->E transition. Using xadd would cause a I->E transition.
The config/i386/sync.md file in the current tree contains a pattern for
atomic_fetch_add which does use xadd but it seems not to be used, even if
instead of the function parameter an immediate value is used.
;; For operand 2 nonmemory_operand predicate is used instead of
;; register_operand to allow combiner to better optimize atomic
;; additions of constants.
(define_insn "atomic_fetch_add<mode>"
[(set (match_operand:SWI 0 "register_operand" "=<r>")
(unspec_volatile:SWI
[(match_operand:SWI 1 "memory_operand" "+m")
(match_operand:SI 3 "const_int_operand")] ;; model
UNSPECV_XCHG))
(set (match_dup 1)
(plus:SWI (match_dup 1)
(match_operand:SWI 2 "nonmemory_operand" "0")))
(clobber (reg:CC FLAGS_REG))]
"TARGET_XADD"
"lock{%;} %K3xadd{<imodesuffix>}\t{%0, %1|%1, %0}")
X86_ARCH_XADD should be defined for every architecture but i386.
^ permalink raw reply [flat|nested] 12+ messages in thread
* [Bug target/54087] __atomic_fetch_add does not use xadd instruction
2012-07-24 19:00 [Bug target/54087] New: __atomic_fetch_add does not use xadd instruction drepper.fsp at gmail dot com
@ 2012-07-24 19:21 ` ubizjak at gmail dot com
2012-07-24 19:22 ` ubizjak at gmail dot com
` (9 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: ubizjak at gmail dot com @ 2012-07-24 19:21 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54087
--- Comment #1 from Uros Bizjak <ubizjak at gmail dot com> 2012-07-24 19:21:02 UTC ---
Use __atomic_add_fetch and __atomic_fetch_sub in the testcase, and you will
get:
0000000000000000 <f1>:
0: 89 f8 mov %edi,%eax
2: f0 0f c1 05 00 00 00 lock xadd %eax,0x0(%rip) # a <f1+0xa>
9: 00
a: 01 f8 add %edi,%eax
c: 85 c0 test %eax,%eax
e: 0f 94 c0 sete %al
11: 0f b6 c0 movzbl %al,%eax
14: c3 retq
0000000000000020 <f2>:
20: 89 f8 mov %edi,%eax
22: f0 0f c1 05 00 00 00 lock xadd %eax,0x0(%rip) # 2a <f2+0xa>
29: 00
2a: 39 f8 cmp %edi,%eax
2c: 0f 94 c0 sete %al
2f: 0f b6 c0 movzbl %al,%eax
32: c3 retq
^ permalink raw reply [flat|nested] 12+ messages in thread
* [Bug target/54087] __atomic_fetch_add does not use xadd instruction
2012-07-24 19:00 [Bug target/54087] New: __atomic_fetch_add does not use xadd instruction drepper.fsp at gmail dot com
2012-07-24 19:21 ` [Bug target/54087] " ubizjak at gmail dot com
@ 2012-07-24 19:22 ` ubizjak at gmail dot com
2012-08-01 16:06 ` drepper.fsp at gmail dot com
` (8 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: ubizjak at gmail dot com @ 2012-07-24 19:22 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54087
--- Comment #2 from Uros Bizjak <ubizjak at gmail dot com> 2012-07-24 19:22:00 UTC ---
(In reply to comment #1)
> Use __atomic_add_fetch and __atomic_fetch_sub in the testcase, and you will
Eh, __atomic_fetch_add.
^ permalink raw reply [flat|nested] 12+ messages in thread
* [Bug target/54087] __atomic_fetch_add does not use xadd instruction
2012-07-24 19:00 [Bug target/54087] New: __atomic_fetch_add does not use xadd instruction drepper.fsp at gmail dot com
2012-07-24 19:21 ` [Bug target/54087] " ubizjak at gmail dot com
2012-07-24 19:22 ` ubizjak at gmail dot com
@ 2012-08-01 16:06 ` drepper.fsp at gmail dot com
2012-08-02 14:33 ` drepper.fsp at gmail dot com
` (7 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: drepper.fsp at gmail dot com @ 2012-08-01 16:06 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54087
--- Comment #3 from Ulrich Drepper <drepper.fsp at gmail dot com> 2012-08-01 16:06:33 UTC ---
(In reply to comment #2)
> (In reply to comment #1)
> > Use __atomic_add_fetch and __atomic_fetch_sub in the testcase, and you will
>
> Eh, __atomic_fetch_add.
Yes, but the compiler should automatically do this. The extreme case is this:
int v;
int a(void)
{
return __sync_sub_and_fetch(&v, 5);
}
int b(void)
{
return __sync_add_and_fetch(&v, -5);
}
The second function does compile as expected. The first doesn't, it uses
cmpxchg.
Shouldn't this be easy enough to fix by adding patterns for atomic_fetch_sub
and atomic_sub_fetch which match if the second parameter is a constant? If
it's not a constant a bit more code is needed but that should be no problem
either.
^ permalink raw reply [flat|nested] 12+ messages in thread
* [Bug target/54087] __atomic_fetch_add does not use xadd instruction
2012-07-24 19:00 [Bug target/54087] New: __atomic_fetch_add does not use xadd instruction drepper.fsp at gmail dot com
` (2 preceding siblings ...)
2012-08-01 16:06 ` drepper.fsp at gmail dot com
@ 2012-08-02 14:33 ` drepper.fsp at gmail dot com
2012-08-02 17:36 ` ubizjak at gmail dot com
` (6 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: drepper.fsp at gmail dot com @ 2012-08-02 14:33 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54087
--- Comment #4 from Ulrich Drepper <drepper.fsp at gmail dot com> 2012-08-02 14:33:19 UTC ---
One more data point. In a micro-benchmark which uses realistic code used in
production the change from
__sync_sub_and_fetch(var, constant)
to
__sync_add_and_fetch(var, -constant)
lead to a 10% to 27% improvement in performance. The cmpxchg use with the
necessary initial load and I->S cache transition really kills performance when
memory is highly contested.
^ permalink raw reply [flat|nested] 12+ messages in thread
* [Bug target/54087] __atomic_fetch_add does not use xadd instruction
2012-07-24 19:00 [Bug target/54087] New: __atomic_fetch_add does not use xadd instruction drepper.fsp at gmail dot com
` (3 preceding siblings ...)
2012-08-02 14:33 ` drepper.fsp at gmail dot com
@ 2012-08-02 17:36 ` ubizjak at gmail dot com
2012-08-03 2:17 ` drepper.fsp at gmail dot com
` (5 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: ubizjak at gmail dot com @ 2012-08-02 17:36 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54087
--- Comment #5 from Uros Bizjak <ubizjak at gmail dot com> 2012-08-02 17:35:30 UTC ---
Created attachment 27927
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=27927
Patch that introduces atomic_fetch_sub<mode> with const_int operands
This patch introduces atomic_fetch_sub<mode>:
--cut here--
(define_expand "atomic_fetch_sub<mode>"
[(set (match_operand:SWI 0 "register_operand")
(unspec_volatile:SWI
[(match_operand:SWI 1 "memory_operand")
(match_operand:SI 3 "const_int_operand")] ;; model
UNSPECV_XCHG))
(set (match_dup 1)
(minus:SWI (match_dup 1)
(match_operand:SWI 2 "const_int_operand")))
(clobber (reg:CC FLAGS_REG))]
"TARGET_XADD"
{
/* Avoid overflows. */
if (mode_signbit_p (<MODE>mode, operands[2]))
FAIL;
operands[2] = negate_rtx (<MODE>mode, operands[2]);
emit_insn (gen_atomic_fetch_add<mode> (operands[0], operands[1],
operands[2], operands[3]));
DONE;
})
--cut here--
We have to take care of overflows, so we can handle only const_int operands
with known-to-be overflow free values.
^ permalink raw reply [flat|nested] 12+ messages in thread
* [Bug target/54087] __atomic_fetch_add does not use xadd instruction
2012-07-24 19:00 [Bug target/54087] New: __atomic_fetch_add does not use xadd instruction drepper.fsp at gmail dot com
` (4 preceding siblings ...)
2012-08-02 17:36 ` ubizjak at gmail dot com
@ 2012-08-03 2:17 ` drepper.fsp at gmail dot com
2012-08-23 14:33 ` amacleod at redhat dot com
` (4 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: drepper.fsp at gmail dot com @ 2012-08-03 2:17 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54087
--- Comment #6 from Ulrich Drepper <drepper.fsp at gmail dot com> 2012-08-03 02:16:57 UTC ---
(In reply to comment #5)
> This patch introduces atomic_fetch_sub<mode>:
Seems to work nicely.
^ permalink raw reply [flat|nested] 12+ messages in thread
* [Bug target/54087] __atomic_fetch_add does not use xadd instruction
2012-07-24 19:00 [Bug target/54087] New: __atomic_fetch_add does not use xadd instruction drepper.fsp at gmail dot com
` (5 preceding siblings ...)
2012-08-03 2:17 ` drepper.fsp at gmail dot com
@ 2012-08-23 14:33 ` amacleod at redhat dot com
2012-08-23 15:42 ` drepper.fsp at gmail dot com
` (3 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: amacleod at redhat dot com @ 2012-08-23 14:33 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54087
Andrew Macleod <amacleod at redhat dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Attachment #27927|0 |1
is obsolete| |
Status|UNCONFIRMED |NEW
Last reconfirmed| |2012-08-23
AssignedTo|unassigned at gcc dot |amacleod at redhat dot com
|gnu.org |
Ever Confirmed|0 |1
--- Comment #7 from Andrew Macleod <amacleod at redhat dot com> 2012-08-23 14:33:01 UTC ---
Created attachment 28074
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=28074
Generic expansion patch proposal
This patch changes the tree-rtl expansion code to try using the complementary
atomic add or sub operation with a negative operand if an instruction sequence
is not created.
This will enable the optimization automatically for all targets.
Check to see if it solves the problem as well.
^ permalink raw reply [flat|nested] 12+ messages in thread
* [Bug target/54087] __atomic_fetch_add does not use xadd instruction
2012-07-24 19:00 [Bug target/54087] New: __atomic_fetch_add does not use xadd instruction drepper.fsp at gmail dot com
` (6 preceding siblings ...)
2012-08-23 14:33 ` amacleod at redhat dot com
@ 2012-08-23 15:42 ` drepper.fsp at gmail dot com
2012-10-01 15:50 ` amacleod at redhat dot com
` (2 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: drepper.fsp at gmail dot com @ 2012-08-23 15:42 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54087
--- Comment #8 from Ulrich Drepper <drepper.fsp at gmail dot com> 2012-08-23 15:41:49 UTC ---
(In reply to comment #7)
> Check to see if it solves the problem as well.
I tested it. Seems to work in all cases and does not disturb other
optimizations like comparisons with zero.
^ permalink raw reply [flat|nested] 12+ messages in thread
* [Bug target/54087] __atomic_fetch_add does not use xadd instruction
2012-07-24 19:00 [Bug target/54087] New: __atomic_fetch_add does not use xadd instruction drepper.fsp at gmail dot com
` (7 preceding siblings ...)
2012-08-23 15:42 ` drepper.fsp at gmail dot com
@ 2012-10-01 15:50 ` amacleod at redhat dot com
2013-11-19 14:09 ` glisse at gcc dot gnu.org
2013-11-19 14:46 ` drepper.fsp+rhbz at gmail dot com
10 siblings, 0 replies; 12+ messages in thread
From: amacleod at redhat dot com @ 2012-10-01 15:50 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54087
--- Comment #9 from Andrew Macleod <amacleod at redhat dot com> 2012-10-01 15:50:26 UTC ---
Author: amacleod
Date: Mon Oct 1 15:50:09 2012
New Revision: 191929
URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=191929
Log:
2012-10-01 Andrew MacLeod <amacleod@redhat.com>
PR target/54087
* optabs.c (expand_atomic_fetch_op_no_fallback): New. Factored code
from expand_atomic_fetch_op.
(expand_atomic_fetch_op): Try atomic_{add|sub} operations in terms of
the other one if direct opcode fails.
* testsuite/gcc.dg/pr54087.c: New testcase for atomic_sub ->
atomic_add when atomic_sub fails.
Added:
trunk/gcc/testsuite/gcc.dg/pr54087.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/optabs.c
trunk/gcc/testsuite/ChangeLog
^ permalink raw reply [flat|nested] 12+ messages in thread
* [Bug target/54087] __atomic_fetch_add does not use xadd instruction
2012-07-24 19:00 [Bug target/54087] New: __atomic_fetch_add does not use xadd instruction drepper.fsp at gmail dot com
` (8 preceding siblings ...)
2012-10-01 15:50 ` amacleod at redhat dot com
@ 2013-11-19 14:09 ` glisse at gcc dot gnu.org
2013-11-19 14:46 ` drepper.fsp+rhbz at gmail dot com
10 siblings, 0 replies; 12+ messages in thread
From: glisse at gcc dot gnu.org @ 2013-11-19 14:09 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54087
Marc Glisse <glisse at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution|--- |FIXED
--- Comment #10 from Marc Glisse <glisse at gcc dot gnu.org> ---
Looks like it is fixed.
^ permalink raw reply [flat|nested] 12+ messages in thread
* [Bug target/54087] __atomic_fetch_add does not use xadd instruction
2012-07-24 19:00 [Bug target/54087] New: __atomic_fetch_add does not use xadd instruction drepper.fsp at gmail dot com
` (9 preceding siblings ...)
2013-11-19 14:09 ` glisse at gcc dot gnu.org
@ 2013-11-19 14:46 ` drepper.fsp+rhbz at gmail dot com
10 siblings, 0 replies; 12+ messages in thread
From: drepper.fsp+rhbz at gmail dot com @ 2013-11-19 14:46 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54087
Ulrich Drepper <drepper.fsp+rhbz at gmail dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |drepper.fsp+rhbz at gmail dot com
--- Comment #11 from Ulrich Drepper <drepper.fsp+rhbz at gmail dot com> ---
Yes, although there still in an oddity. The code from comment #3 is compiled
as:
0000000000000000 <a>:
0: ba fb ff ff ff mov $0xfffffffb,%edx
5: 89 d0 mov %edx,%eax
7: f0 0f c1 05 00 00 00 lock xadd %eax,0x0(%rip) # f <a+0xf>
e: 00
b: R_X86_64_PC32 v-0x4
f: 01 d0 add %edx,%eax
11: c3 retq
12: 66 66 66 66 66 2e 0f data32 data32 data32 data32 nopw
%cs:0x0(%rax,%rax,1)
19: 1f 84 00 00 00 00 00
0000000000000020 <b>:
20: b8 fb ff ff ff mov $0xfffffffb,%eax
25: f0 0f c1 05 00 00 00 lock xadd %eax,0x0(%rip) # 2d <b+0xd>
2c: 00
29: R_X86_64_PC32 v-0x4
2d: 83 e8 05 sub $0x5,%eax
30: c3 retq
There is no reason for the difference. In both cases the latter sequence
should be generated.
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2013-11-19 14:46 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-07-24 19:00 [Bug target/54087] New: __atomic_fetch_add does not use xadd instruction drepper.fsp at gmail dot com
2012-07-24 19:21 ` [Bug target/54087] " ubizjak at gmail dot com
2012-07-24 19:22 ` ubizjak at gmail dot com
2012-08-01 16:06 ` drepper.fsp at gmail dot com
2012-08-02 14:33 ` drepper.fsp at gmail dot com
2012-08-02 17:36 ` ubizjak at gmail dot com
2012-08-03 2:17 ` drepper.fsp at gmail dot com
2012-08-23 14:33 ` amacleod at redhat dot com
2012-08-23 15:42 ` drepper.fsp at gmail dot com
2012-10-01 15:50 ` amacleod at redhat dot com
2013-11-19 14:09 ` glisse at gcc dot gnu.org
2013-11-19 14:46 ` drepper.fsp+rhbz at gmail dot com
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).