[Bug target/54087] New: __atomic_fetch_add does not use xadd instruction

public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed

* [Bug target/54087] New: __atomic_fetch_add does not use xadd instruction
@ 2012-07-24 19:00 drepper.fsp at gmail dot com
  2012-07-24 19:21 ` [Bug target/54087] " ubizjak at gmail dot com
                   ` (10 more replies)
  0 siblings, 11 replies; 12+ messages in thread
From: drepper.fsp at gmail dot com @ 2012-07-24 19:00 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54087

             Bug #: 54087
           Summary: __atomic_fetch_add does not use xadd instruction
    Classification: Unclassified
           Product: gcc
           Version: 4.8.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: drepper.fsp@gmail.com
            Target: x86_64-redhat-linux


Compiling this code

int a;

int f1(int p)
{
  return __atomic_sub_fetch(&a, p, __ATOMIC_SEQ_CST) == 0;
}

int f2(int p)
{
  return __atomic_fetch_sub(&a, p, __ATOMIC_SEQ_CST) - p == 0;
}

you'll see that neither function uses the xadd instruction with the lock
prefix.  Instead an expensive emulation using cmpxchg is used:

0000000000000000 <f1>:
   0:    8b 05 00 00 00 00        mov    0x0(%rip),%eax        # 6 <f1+0x6>
            2: R_X86_64_PC32    a-0x4
   6:    89 c2                    mov    %eax,%edx
   8:    29 fa                    sub    %edi,%edx
   a:    f0 0f b1 15 00 00 00     lock cmpxchg %edx,0x0(%rip)        # 12
<f1+0x12>
  11:    00 
            e: R_X86_64_PC32    a-0x4
  12:    75 f2                    jne    6 <f1+0x6>
  14:    31 c0                    xor    %eax,%eax
  16:    85 d2                    test   %edx,%edx
  18:    0f 94 c0                 sete   %al
  1b:    c3                       retq   

This implementation not only is larger, it has possibly (unlikely) unbounded
cost and even if the cmpxchg succeeds right away it is costlier.  The last
point is esepcially true if the cache line for the variable in question is not
in the core's cache.  In this case the initial load causes a I->S transition
for the cache line and the cmpxchg an additional and possibly also very
expensive S->E transition.  Using xadd would cause a I->E transition.

The config/i386/sync.md file in the current tree contains a pattern for
atomic_fetch_add which does use xadd but it seems not to be used, even if
instead of the function parameter an immediate value is used.

;; For operand 2 nonmemory_operand predicate is used instead of
;; register_operand to allow combiner to better optimize atomic
;; additions of constants.
(define_insn "atomic_fetch_add<mode>"
  [(set (match_operand:SWI 0 "register_operand" "=<r>")
        (unspec_volatile:SWI
          [(match_operand:SWI 1 "memory_operand" "+m")
           (match_operand:SI 3 "const_int_operand")]            ;; model
          UNSPECV_XCHG))
   (set (match_dup 1)
        (plus:SWI (match_dup 1)
                  (match_operand:SWI 2 "nonmemory_operand" "0")))
   (clobber (reg:CC FLAGS_REG))]
  "TARGET_XADD"
  "lock{%;} %K3xadd{<imodesuffix>}\t{%0, %1|%1, %0}")


X86_ARCH_XADD should be defined for every architecture but i386.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug target/54087] __atomic_fetch_add does not use xadd instruction
  2012-07-24 19:00 [Bug target/54087] New: __atomic_fetch_add does not use xadd instruction drepper.fsp at gmail dot com
@ 2012-07-24 19:21 ` ubizjak at gmail dot com
  2012-07-24 19:22 ` ubizjak at gmail dot com
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: ubizjak at gmail dot com @ 2012-07-24 19:21 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54087

--- Comment #1 from Uros Bizjak <ubizjak at gmail dot com> 2012-07-24 19:21:02 UTC ---
Use __atomic_add_fetch and __atomic_fetch_sub in the testcase, and you will
get:

0000000000000000 <f1>:
   0:   89 f8                   mov    %edi,%eax
   2:   f0 0f c1 05 00 00 00    lock xadd %eax,0x0(%rip)        # a <f1+0xa>
   9:   00 
   a:   01 f8                   add    %edi,%eax
   c:   85 c0                   test   %eax,%eax
   e:   0f 94 c0                sete   %al
  11:   0f b6 c0                movzbl %al,%eax
  14:   c3                      retq   

0000000000000020 <f2>:
  20:   89 f8                   mov    %edi,%eax
  22:   f0 0f c1 05 00 00 00    lock xadd %eax,0x0(%rip)        # 2a <f2+0xa>
  29:   00 
  2a:   39 f8                   cmp    %edi,%eax
  2c:   0f 94 c0                sete   %al
  2f:   0f b6 c0                movzbl %al,%eax
  32:   c3                      retq


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug target/54087] __atomic_fetch_add does not use xadd instruction
  2012-07-24 19:00 [Bug target/54087] New: __atomic_fetch_add does not use xadd instruction drepper.fsp at gmail dot com
  2012-07-24 19:21 ` [Bug target/54087] " ubizjak at gmail dot com
@ 2012-07-24 19:22 ` ubizjak at gmail dot com
  2012-08-01 16:06 ` drepper.fsp at gmail dot com
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: ubizjak at gmail dot com @ 2012-07-24 19:22 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54087

--- Comment #2 from Uros Bizjak <ubizjak at gmail dot com> 2012-07-24 19:22:00 UTC ---
(In reply to comment #1)
> Use __atomic_add_fetch and __atomic_fetch_sub in the testcase, and you will

Eh, __atomic_fetch_add.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug target/54087] __atomic_fetch_add does not use xadd instruction
  2012-07-24 19:00 [Bug target/54087] New: __atomic_fetch_add does not use xadd instruction drepper.fsp at gmail dot com
  2012-07-24 19:21 ` [Bug target/54087] " ubizjak at gmail dot com
  2012-07-24 19:22 ` ubizjak at gmail dot com
@ 2012-08-01 16:06 ` drepper.fsp at gmail dot com
  2012-08-02 14:33 ` drepper.fsp at gmail dot com
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: drepper.fsp at gmail dot com @ 2012-08-01 16:06 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54087

--- Comment #3 from Ulrich Drepper <drepper.fsp at gmail dot com> 2012-08-01 16:06:33 UTC ---
(In reply to comment #2)
> (In reply to comment #1)
> > Use __atomic_add_fetch and __atomic_fetch_sub in the testcase, and you will
> 
> Eh, __atomic_fetch_add.

Yes, but the compiler should automatically do this.  The extreme case is this:


int v;

int a(void)
{
  return __sync_sub_and_fetch(&v, 5);
}

int b(void)
{
  return __sync_add_and_fetch(&v, -5);
}


The second function does compile as expected.  The first doesn't, it uses
cmpxchg.

Shouldn't this be easy enough to fix by adding patterns for atomic_fetch_sub
and atomic_sub_fetch which match if the second parameter is a constant?  If
it's not a constant a bit more code is needed but that should be no problem
either.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug target/54087] __atomic_fetch_add does not use xadd instruction
  2012-07-24 19:00 [Bug target/54087] New: __atomic_fetch_add does not use xadd instruction drepper.fsp at gmail dot com
                   ` (2 preceding siblings ...)
  2012-08-01 16:06 ` drepper.fsp at gmail dot com
@ 2012-08-02 14:33 ` drepper.fsp at gmail dot com
  2012-08-02 17:36 ` ubizjak at gmail dot com
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: drepper.fsp at gmail dot com @ 2012-08-02 14:33 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54087

--- Comment #4 from Ulrich Drepper <drepper.fsp at gmail dot com> 2012-08-02 14:33:19 UTC ---
One more data point.  In a micro-benchmark which uses realistic code used in
production the change from

   __sync_sub_and_fetch(var, constant)

to

   __sync_add_and_fetch(var, -constant)

lead to a 10% to 27% improvement in performance.  The cmpxchg use with the
necessary initial load and I->S cache transition really kills performance when
memory is highly contested.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug target/54087] __atomic_fetch_add does not use xadd instruction
  2012-07-24 19:00 [Bug target/54087] New: __atomic_fetch_add does not use xadd instruction drepper.fsp at gmail dot com
                   ` (3 preceding siblings ...)
  2012-08-02 14:33 ` drepper.fsp at gmail dot com
@ 2012-08-02 17:36 ` ubizjak at gmail dot com
  2012-08-03  2:17 ` drepper.fsp at gmail dot com
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: ubizjak at gmail dot com @ 2012-08-02 17:36 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54087

--- Comment #5 from Uros Bizjak <ubizjak at gmail dot com> 2012-08-02 17:35:30 UTC ---
Created attachment 27927
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=27927
Patch that introduces atomic_fetch_sub<mode> with const_int operands

This patch introduces atomic_fetch_sub<mode>:

--cut here--
(define_expand "atomic_fetch_sub<mode>"
  [(set (match_operand:SWI 0 "register_operand")
    (unspec_volatile:SWI
      [(match_operand:SWI 1 "memory_operand")
       (match_operand:SI 3 "const_int_operand")]        ;; model
      UNSPECV_XCHG))
   (set (match_dup 1)
    (minus:SWI (match_dup 1)
           (match_operand:SWI 2 "const_int_operand")))
   (clobber (reg:CC FLAGS_REG))]
  "TARGET_XADD"
{
  /* Avoid overflows.  */
  if (mode_signbit_p (<MODE>mode, operands[2]))
    FAIL;

  operands[2] = negate_rtx (<MODE>mode, operands[2]);

  emit_insn (gen_atomic_fetch_add<mode> (operands[0], operands[1],
                     operands[2], operands[3]));
  DONE;
})
--cut here--

We have to take care of overflows, so we can handle only const_int operands
with known-to-be overflow free values.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug target/54087] __atomic_fetch_add does not use xadd instruction
  2012-07-24 19:00 [Bug target/54087] New: __atomic_fetch_add does not use xadd instruction drepper.fsp at gmail dot com
                   ` (4 preceding siblings ...)
  2012-08-02 17:36 ` ubizjak at gmail dot com
@ 2012-08-03  2:17 ` drepper.fsp at gmail dot com
  2012-08-23 14:33 ` amacleod at redhat dot com
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: drepper.fsp at gmail dot com @ 2012-08-03  2:17 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54087

--- Comment #6 from Ulrich Drepper <drepper.fsp at gmail dot com> 2012-08-03 02:16:57 UTC ---
(In reply to comment #5)
> This patch introduces atomic_fetch_sub<mode>:

Seems to work nicely.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug target/54087] __atomic_fetch_add does not use xadd instruction
  2012-07-24 19:00 [Bug target/54087] New: __atomic_fetch_add does not use xadd instruction drepper.fsp at gmail dot com
                   ` (5 preceding siblings ...)
  2012-08-03  2:17 ` drepper.fsp at gmail dot com
@ 2012-08-23 14:33 ` amacleod at redhat dot com
  2012-08-23 15:42 ` drepper.fsp at gmail dot com
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: amacleod at redhat dot com @ 2012-08-23 14:33 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54087

Andrew Macleod <amacleod at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #27927|0                           |1
        is obsolete|                            |
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2012-08-23
         AssignedTo|unassigned at gcc dot       |amacleod at redhat dot com
                   |gnu.org                     |
     Ever Confirmed|0                           |1

--- Comment #7 from Andrew Macleod <amacleod at redhat dot com> 2012-08-23 14:33:01 UTC ---
Created attachment 28074
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=28074
Generic expansion patch proposal

This patch changes the tree-rtl expansion code to try using the complementary 
atomic add or sub operation with a negative operand if an instruction sequence
is not created.

This will enable the optimization automatically for all targets.

Check to see if it solves the problem as well.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug target/54087] __atomic_fetch_add does not use xadd instruction
  2012-07-24 19:00 [Bug target/54087] New: __atomic_fetch_add does not use xadd instruction drepper.fsp at gmail dot com
                   ` (6 preceding siblings ...)
  2012-08-23 14:33 ` amacleod at redhat dot com
@ 2012-08-23 15:42 ` drepper.fsp at gmail dot com
  2012-10-01 15:50 ` amacleod at redhat dot com
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: drepper.fsp at gmail dot com @ 2012-08-23 15:42 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54087

--- Comment #8 from Ulrich Drepper <drepper.fsp at gmail dot com> 2012-08-23 15:41:49 UTC ---
(In reply to comment #7)
> Check to see if it solves the problem as well.

I tested it.  Seems to  work in all cases and does not disturb other
optimizations like comparisons with zero.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug target/54087] __atomic_fetch_add does not use xadd instruction
  2012-07-24 19:00 [Bug target/54087] New: __atomic_fetch_add does not use xadd instruction drepper.fsp at gmail dot com
                   ` (7 preceding siblings ...)
  2012-08-23 15:42 ` drepper.fsp at gmail dot com
@ 2012-10-01 15:50 ` amacleod at redhat dot com
  2013-11-19 14:09 ` glisse at gcc dot gnu.org
  2013-11-19 14:46 ` drepper.fsp+rhbz at gmail dot com
  10 siblings, 0 replies; 12+ messages in thread
From: amacleod at redhat dot com @ 2012-10-01 15:50 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54087

--- Comment #9 from Andrew Macleod <amacleod at redhat dot com> 2012-10-01 15:50:26 UTC ---
Author: amacleod
Date: Mon Oct  1 15:50:09 2012
New Revision: 191929

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=191929
Log:

2012-10-01  Andrew MacLeod  <amacleod@redhat.com>

    PR target/54087
    * optabs.c (expand_atomic_fetch_op_no_fallback): New.  Factored code
    from expand_atomic_fetch_op.
    (expand_atomic_fetch_op):  Try atomic_{add|sub} operations in terms of
    the other one if direct opcode fails.
    * testsuite/gcc.dg/pr54087.c:  New testcase for atomic_sub -> 
    atomic_add when atomic_sub fails.


Added:
    trunk/gcc/testsuite/gcc.dg/pr54087.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/optabs.c
    trunk/gcc/testsuite/ChangeLog


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug target/54087] __atomic_fetch_add does not use xadd instruction
  2012-07-24 19:00 [Bug target/54087] New: __atomic_fetch_add does not use xadd instruction drepper.fsp at gmail dot com
                   ` (8 preceding siblings ...)
  2012-10-01 15:50 ` amacleod at redhat dot com
@ 2013-11-19 14:09 ` glisse at gcc dot gnu.org
  2013-11-19 14:46 ` drepper.fsp+rhbz at gmail dot com
  10 siblings, 0 replies; 12+ messages in thread
From: glisse at gcc dot gnu.org @ 2013-11-19 14:09 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54087

Marc Glisse <glisse at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #10 from Marc Glisse <glisse at gcc dot gnu.org> ---
Looks like it is fixed.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug target/54087] __atomic_fetch_add does not use xadd instruction
  2012-07-24 19:00 [Bug target/54087] New: __atomic_fetch_add does not use xadd instruction drepper.fsp at gmail dot com
                   ` (9 preceding siblings ...)
  2013-11-19 14:09 ` glisse at gcc dot gnu.org
@ 2013-11-19 14:46 ` drepper.fsp+rhbz at gmail dot com
  10 siblings, 0 replies; 12+ messages in thread
From: drepper.fsp+rhbz at gmail dot com @ 2013-11-19 14:46 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54087

Ulrich Drepper <drepper.fsp+rhbz at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |drepper.fsp+rhbz at gmail dot com

--- Comment #11 from Ulrich Drepper <drepper.fsp+rhbz at gmail dot com> ---
Yes, although there still in an oddity.  The code from comment #3 is compiled
as:

0000000000000000 <a>:
   0:    ba fb ff ff ff           mov    $0xfffffffb,%edx
   5:    89 d0                    mov    %edx,%eax
   7:    f0 0f c1 05 00 00 00     lock xadd %eax,0x0(%rip)        # f <a+0xf>
   e:    00 
            b: R_X86_64_PC32    v-0x4
   f:    01 d0                    add    %edx,%eax
  11:    c3                       retq   
  12:    66 66 66 66 66 2e 0f     data32 data32 data32 data32 nopw
%cs:0x0(%rax,%rax,1)
  19:    1f 84 00 00 00 00 00 

0000000000000020 <b>:
  20:    b8 fb ff ff ff           mov    $0xfffffffb,%eax
  25:    f0 0f c1 05 00 00 00     lock xadd %eax,0x0(%rip)        # 2d <b+0xd>
  2c:    00 
            29: R_X86_64_PC32    v-0x4
  2d:    83 e8 05                 sub    $0x5,%eax
  30:    c3                       retq   



There is no reason for the difference.  In both cases the latter sequence
should be generated.


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2013-11-19 14:46 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-07-24 19:00 [Bug target/54087] New: __atomic_fetch_add does not use xadd instruction drepper.fsp at gmail dot com
2012-07-24 19:21 ` [Bug target/54087] " ubizjak at gmail dot com
2012-07-24 19:22 ` ubizjak at gmail dot com
2012-08-01 16:06 ` drepper.fsp at gmail dot com
2012-08-02 14:33 ` drepper.fsp at gmail dot com
2012-08-02 17:36 ` ubizjak at gmail dot com
2012-08-03  2:17 ` drepper.fsp at gmail dot com
2012-08-23 14:33 ` amacleod at redhat dot com
2012-08-23 15:42 ` drepper.fsp at gmail dot com
2012-10-01 15:50 ` amacleod at redhat dot com
2013-11-19 14:09 ` glisse at gcc dot gnu.org
2013-11-19 14:46 ` drepper.fsp+rhbz at gmail dot com

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).