public inbox for gcc-bugs@sourceware.org
* [Bug rtl-optimization/48986] New: Missed optimization in atomic decrement on x86/x64
@ 2011-05-13 10:32 piotr.wyderski at gmail dot com
  2011-05-13 15:07 ` [Bug rtl-optimization/48986] " jakub at gcc dot gnu.org
                   ` (6 more replies)
  0 siblings, 7 replies; 8+ messages in thread
From: piotr.wyderski at gmail dot com @ 2011-05-13 10:32 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48986

           Summary: Missed optimization in atomic decrement on x86/x64
           Product: gcc
           Version: 4.6.0
            Status: UNCONFIRMED
          Severity: minor
          Priority: P3
         Component: rtl-optimization
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: piotr.wyderski@gmail.com


Many uses of __sync_fetch_and_add() boil down to
a decrement followed by a check whether the result is
zero, in order to delete the pointee. The most natural
way to write this is:

bool xxx_decrement(int* p) {

   return __sync_fetch_and_add(p, -1) == 1;
}

void yyy(int* p) {

    if (xxx_decrement(p)) {

        delete p;
    }
}

Unfortunately, GCC compiles it in a literal way:

<__Z3yyyPi>:
  40edd0:    83 ec 0c                 sub    $0xc,%esp
  40edd3:    ba ff ff ff ff           mov    $0xffffffff,%edx
  40edd8:    8b 44 24 10              mov    0x10(%esp),%eax
  40eddc:    f0 0f c1 10              lock xadd %edx,(%eax)
  40ede0:    83 fa 01                 cmp    $0x1,%edx
  40ede3:    74 0b                    je     40edf0 <__Z3yyyPi+0x20>
  40ede5:    83 c4 0c                 add    $0xc,%esp
  40ede8:    c3                       ret    
  40ede9:    8d b4 26 00 00 00 00     lea    0x0(%esi,%eiz,1),%esi
  40edf0:    89 44 24 10              mov    %eax,0x10(%esp)
  40edf4:    83 c4 0c                 add    $0xc,%esp
  40edf7:    e9 24 03 00 00           jmp    40f120 <___wrap__ZdlPv>
  40edfc:    8d 74 26 00              lea    0x0(%esi,%eiz,1),%esi 

with the gist being:

  40edd3:    ba ff ff ff ff           mov    $0xffffffff,%edx
  40eddc:    f0 0f c1 10              lock xadd %edx,(%eax)
  40ede0:    83 fa 01                 cmp    $0x1,%edx
  40ede3:    74 0b                    je     40edf0 <__Z3yyyPi+0x20>

This special case should be handled by the optimizer and produce:

   lock subl $0x01,(%eax)
   je ...

or:

   lock decl (%eax)
   je ...

on processors where dec does not incur a flags dependency penalty
(dec leaves the carry flag unchanged), e.g. some AMD chips.
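
The requested sequence works because "old value == 1" is the same
condition as "new value == 0", which is what the zero flag reports
after a locked sub or dec. A minimal sketch of that equivalence at the
source level follows. The helper names are hypothetical and only the
__sync builtins are GCC's. No claim is made about the code a
particular GCC version emits for either form.

```cpp
// Hypothetical helper names; only the __sync builtins come from GCC.

// Form used in the report: fetch the old value and compare it with 1.
bool dec_old_is_one(int* p) {
    return __sync_fetch_and_add(p, -1) == 1;
}

// Equivalent form: compare the new value with 0. This is the condition
// that ZF would hold after a locked sub/dec on the memory operand.
bool dec_new_is_zero(int* p) {
    return __sync_sub_and_fetch(p, 1) == 0;
}
```

Both helpers return true only for the call that drops the counter
from 1 to 0.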

Please note that this generalizes to any N:

   return __sync_fetch_and_add(p, -N) == N;

with a remark that for N != 1 the dec replacement can't be used.
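
As an illustration of the general case, here is a sketch with N as a
template parameter. The names sub_old_is_n and sub_new_is_zero are
hypothetical. The two forms agree for every starting value, because
old == N holds exactly when old - N == 0.

```cpp
// Hypothetical names; N is a compile-time constant for illustration.

// Form from the report: old value compared with N.
template <int N>
bool sub_old_is_n(int* p) {
    return __sync_fetch_and_add(p, -N) == N;
}

// Equivalent form: new value compared with 0, i.e. the zero-flag
// condition after a locked subtract of N from the memory operand.
template <int N>
bool sub_new_is_zero(int* p) {
    return __sync_sub_and_fetch(p, N) == 0;
}
```

For example, with N = 3 both forms return true only when the counter
held exactly 3 before the call.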




Thread overview: 8+ messages
2011-05-13 10:32 [Bug rtl-optimization/48986] New: Missed optimization in atomic decrement on x86/x64 piotr.wyderski at gmail dot com
2011-05-13 15:07 ` [Bug rtl-optimization/48986] " jakub at gcc dot gnu.org
2011-05-16 12:07 ` jakub at gcc dot gnu.org
2011-05-16 13:14 ` [Bug target/48986] " jakub at gcc dot gnu.org
2011-05-17  7:58 ` jakub at gcc dot gnu.org
2011-05-17  8:17 ` jakub at gcc dot gnu.org
2011-05-17  8:41 ` jakub at gcc dot gnu.org
2021-07-26  5:22 ` pinskia at gcc dot gnu.org
