public inbox for gcc-bugs@sourceware.org
* [Bug other/49244] New: no intrinsics to emit 'lock bts' and 'lock btc'
@ 2011-05-31 19:38 desrt at desrt dot ca
  2011-06-01  6:55 ` [Bug other/49244] " jakub at gcc dot gnu.org
                   ` (6 more replies)
  0 siblings, 7 replies; 8+ messages in thread
From: desrt at desrt dot ca @ 2011-05-31 19:38 UTC
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49244

           Summary: no intrinsics to emit 'lock bts' and 'lock btc'
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: other
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: desrt@desrt.ca


I want to be able to code this function:

#include <stdbool.h>

bool
set_and_test (int *a,
              int bit)
{
  unsigned int mask = (1u << bit);

  return (__sync_fetch_and_or (a, mask) & mask) != 0;
}

and have GCC not emit a loop on amd64 and x86.


GCC presently emits a loop for __sync_fetch_and_or() in this case.  That's
because the 'lock or' instruction discards the previous value, so it can only
be used in cases where the result is ignored.  Since we compare against the old
value, GCC has to emit a compare-and-swap loop.
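
For illustration, the loop in question behaves roughly like the retry loop
below. This is a sketch written with the later __atomic builtins, not GCC's
actual output, and the helper name fetch_or_via_cas is hypothetical.

```c
#include <stdbool.h>

/* Hypothetical helper: emulates an atomic fetch-or that returns the old
   value, using a compare-and-swap retry loop.  */
unsigned int
fetch_or_via_cas (unsigned int *p, unsigned int mask)
{
  unsigned int old = __atomic_load_n (p, __ATOMIC_RELAXED);

  /* On failure, __atomic_compare_exchange_n stores the current value of
     *p into old, so the next iteration retries with fresh data.  */
  while (!__atomic_compare_exchange_n (p, &old, old | mask, false,
                                       __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST))
    ;

  return old;
}
```

The loop is needed only because the caller wants the previous value; when the
result is discarded, a single 'lock or' is enough.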

This special case (set and test a single bit) corresponds quite directly to the
'lock bts' assembly instruction, though.  GCC could emit that instead.

It would be nice if GCC could detect (by magic?) that I am only interested in
this single bit or (probably much easier) expose an intrinsic that lets me
access this functionality on platforms where it exists and falls back to using
__sync_fetch_and_or() otherwise.
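
As a rough sketch of what such a helper could look like when written by hand,
assuming GCC-style inline assembly in AT&T syntax: the name
atomic_test_and_set_bit is hypothetical and is not a GCC builtin.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper, not a GCC builtin: atomically sets bit `bit` of *a
   and returns whether that bit was already set.  */
bool
atomic_test_and_set_bit (unsigned int *a, unsigned int bit)
{
  /* With a register bit offset, bts on a memory operand would address
     memory beyond *a for bit >= 32, and the shift in the fallback would
     be undefined, so the range is checked up front.  */
  assert (bit < sizeof (unsigned int) * CHAR_BIT);

#if defined (__GNUC__) && (defined (__x86_64__) || defined (__i386__))
  bool was_set;

  /* lock bts copies the old bit into the carry flag; setc stores the
     carry flag into was_set.  */
  __asm__ __volatile__ ("lock btsl %2, %0\n\t"
                        "setc %1"
                        : "+m" (*a), "=q" (was_set)
                        : "Ir" (bit)
                        : "cc", "memory");
  return was_set;
#else
  /* Portable fallback: the sequence from the report.  */
  unsigned int mask = 1u << bit;

  return (__sync_fetch_and_or (a, mask) & mask) != 0;
#endif
}
```

On targets other than x86 the preprocessor selects the __sync_fetch_and_or()
path, so callers see the same behaviour either way.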



* [Bug other/49244] no intrinsics to emit 'lock bts' and 'lock btc'
From: jakub at gcc dot gnu.org @ 2011-06-01  6:55 UTC
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49244

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |aldyh at gcc dot gnu.org,
                   |                            |jakub at gcc dot gnu.org,
                   |                            |rth at gcc dot gnu.org

--- Comment #1 from Jakub Jelinek <jakub at gcc dot gnu.org> 2011-06-01 06:54:05 UTC ---
I'm afraid this can't be done in the combiner nor in peephole2 and generally
would be hard to do after expansion, but perhaps it could be done during
expansion or earlier, by folding the (complex) sequence into a new builtin,
preferably with a space in the name to make it inaccessible to users, and then
expanding that builtin.



* [Bug other/49244] no intrinsics to emit 'lock bts' and 'lock btc'
From: desrt at desrt dot ca @ 2011-06-02 23:07 UTC
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49244

--- Comment #2 from Ryan Lortie <desrt at desrt dot ca> 2011-06-02 23:05:58 UTC ---
I'd be happy enough if GCC published a "recommended sequence" or something so
that it would be easier to check for this exact sequence...

A new user-accessible builtin would also make me happy.



* [Bug target/49244] no intrinsics to emit 'lock bts' and 'lock btc'
From: mgorny at gentoo dot org @ 2012-08-23  9:47 UTC
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49244

Michał Górny <mgorny at gentoo dot org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mgorny at gentoo dot org

--- Comment #3 from Michał Górny <mgorny at gentoo dot org> 2012-08-23 09:46:56 UTC ---
I believe that this became more important with C++11 <atomic>, and a separate
intrinsic will not be enough anymore. I believe that GCC should be able to
optimize such simple cases.



* [Bug target/49244] __sync or __atomic builtins will not emit 'lock bts/btr/btc'
From: jakub at gcc dot gnu.org @ 2021-10-04 14:52 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49244

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hjl.tools at gmail dot com

--- Comment #24 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
I wanted to look at #c20, but at least my i9-7960X for e.g. lock; btsl $65, var
acts the same as lock; btsl $1, var rather than lock; btsl $1, var+8,
so maybe #c20 is not possible.


* [Bug target/49244] __sync or __atomic builtins will not emit 'lock bts/btr/btc'
From: hjl.tools at gmail dot com @ 2021-10-06  2:16 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49244

--- Comment #25 from H.J. Lu <hjl.tools at gmail dot com> ---
(In reply to Jakub Jelinek from comment #24)
> I wanted to look at #c20, but at least my i9-7960X for e.g. lock; btsl $65,
> var
> acts the same as lock; btsl $1, var rather than lock; btsl $1, var+8,
> so maybe #c20 is not possible.

The maximum bit position is 63 for the immediate operand.
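
The arithmetic behind the observation in comment #24 can be modelled as
follows. This is only a model under the assumption of a 32-bit operand
(btsl); the struct and function names are hypothetical. It contrasts the
observed behaviour, where the immediate is reduced modulo the operand width,
with the wider addressing that "btsl $1, var+8" would have required.

```c
#include <stddef.h>

/* Hypothetical description of which bit an instruction touches.  */
struct bit_target
{
  size_t byte_offset;  /* offset from the start of var, in bytes */
  unsigned int bit;    /* bit index within the 32-bit word */
};

/* Observed behaviour for an immediate bit offset with a 32-bit operand:
   the high bits of the immediate are ignored, so the access stays in the
   word at var.  */
struct bit_target
imm_form_target (unsigned int imm)
{
  struct bit_target t = { 0, imm % 32u };
  return t;
}

/* What would have been needed instead: the bit position selects both a
   32-bit word and a bit within that word.  */
struct bit_target
wide_offset_target (unsigned int bitpos)
{
  struct bit_target t = { (bitpos / 32u) * 4u, bitpos % 32u };
  return t;
}
```

For a bit position of 65 the first function yields bit 1 at offset 0, while
the second yields bit 1 at offset 8.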


* [Bug target/49244] __sync or __atomic builtins will not emit 'lock bts/btr/btc'
From: jakub at gcc dot gnu.org @ 2021-10-06  8:00 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49244

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #26 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
So bug is fixed then, #c20 is impossible.


* [Bug target/49244] __sync or __atomic builtins will not emit 'lock bts/btr/btc'
From: cvs-commit at gcc dot gnu.org @ 2021-11-10  9:17 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49244

--- Comment #27 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by hongtao Liu <liuhongt@gcc.gnu.org>:

https://gcc.gnu.org/g:fb161782545224f55ba26ba663889c5e6e9a04d1

commit r12-5102-gfb161782545224f55ba26ba663889c5e6e9a04d1
Author: liuhongt <hongtao.liu@intel.com>
Date:   Mon Oct 25 13:59:51 2021 +0800

    Improve integer bit test on __atomic_fetch_[or|and]_* returns

    commit adedd5c173388ae505470df152b9cb3947339566
    Author: Jakub Jelinek <jakub@redhat.com>
    Date:   Tue May 3 13:37:25 2016 +0200

        re PR target/49244 (__sync or __atomic builtins will not emit 'lock bts/btr/btc')

    optimized bit test on __atomic_fetch_or_* and __atomic_fetch_and_* returns
    with lock bts/btr/btc by turning

      mask_2 = 1 << cnt_1;
      _4 = __atomic_fetch_or_* (ptr_6, mask_2, _3);
      _5 = _4 & mask_2;

    into

      _4 = ATOMIC_BIT_TEST_AND_SET (ptr_6, cnt_1, 0, _3);
      _5 = _4;

    and

      mask_6 = 1 << bit_5(D);
      _1 = ~mask_6;
      _2 = __atomic_fetch_and_4 (v_8(D), _1, 0);
      _3 = _2 & mask_6;
      _4 = _3 != 0;

    into

      mask_6 = 1 << bit_5(D);
      _1 = ~mask_6;
      _11 = .ATOMIC_BIT_TEST_AND_RESET (v_8(D), bit_5(D), 1, 0);
      _4 = _11 != 0;
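
    For orientation, C source of roughly the following shape is what the two
    GIMPLE patterns above correspond to.  This is a sketch with hypothetical
    function names; whether a particular GCC version and option set actually
    emits lock bts/btr for it is best checked in the generated assembly.

```c
#include <stdbool.h>

/* Sets bit `bit` of *v and reports whether it was set before.
   `bit` must be less than the width of unsigned int.  */
bool
test_and_set (unsigned int *v, unsigned int bit)
{
  unsigned int mask = 1u << bit;

  return (__atomic_fetch_or (v, mask, __ATOMIC_SEQ_CST) & mask) != 0;
}

/* Clears bit `bit` of *v and reports whether it was set before.
   `bit` must be less than the width of unsigned int.  */
bool
test_and_reset (unsigned int *v, unsigned int bit)
{
  unsigned int mask = 1u << bit;

  return (__atomic_fetch_and (v, ~mask, __ATOMIC_SEQ_CST) & mask) != 0;
}
```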

    But it failed to optimize many equivalent, but slightly different cases:

    1.
      _1 = __atomic_fetch_or_4 (ptr_6, 1, _3);
      _4 = (_Bool) _1;
    2.
      _1 = __atomic_fetch_and_4 (ptr_6, ~1, _3);
      _4 = (_Bool) _1;
    3.
      _1 = __atomic_fetch_or_4 (ptr_6, 1, _3);
      _7 = ~_1;
      _5 = (_Bool) _7;
    4.
      _1 = __atomic_fetch_and_4 (ptr_6, ~1, _3);
      _7 = ~_1;
      _5 = (_Bool) _7;
    5.
      _1 = __atomic_fetch_or_4 (ptr_6, 1, _3);
      _2 = (int) _1;
      _7 = ~_2;
      _5 = (_Bool) _7;
    6.
      _1 = __atomic_fetch_and_4 (ptr_6, ~1, _3);
      _2 = (int) _1;
      _7 = ~_2;
      _5 = (_Bool) _7;
    7.
      _1 = __atomic_fetch_or_4 (ptr_6, 0x80000000, _3);
      _5 = (signed int) _1;
      _4 = _5 < 0;
    8.
      _1 = __atomic_fetch_and_4 (ptr_6, 0x7fffffff, _3);
      _5 = (signed int) _1;
      _4 = _5 < 0;
    9.
      _1 = 1 << bit_4(D);
      mask_5 = (unsigned int) _1;
      _2 = __atomic_fetch_or_4 (v_7(D), mask_5, 0);
      _3 = _2 & mask_5;
    10.
      mask_7 = 1 << bit_6(D);
      _1 = ~mask_7;
      _2 = (unsigned int) _1;
      _3 = __atomic_fetch_and_4 (v_9(D), _2, 0);
      _4 = (int) _3;
      _5 = _4 & mask_7;
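
    As a hedged illustration, source such as the following is the kind of
    code that could plausibly lower to cases 1, 2, 7 and 8.  The mapping to
    those exact GIMPLE sequences is an assumption, the function names are
    hypothetical, and a 32-bit unsigned int is assumed.

```c
#include <stdbool.h>

/* Constant mask 1: sets bit 0 and tests its old value.  */
bool
low_bit_set_by_or (unsigned int *p)
{
  return __atomic_fetch_or (p, 1u, __ATOMIC_RELAXED) & 1u;
}

/* Constant mask ~1: clears bit 0 and tests its old value.  */
bool
low_bit_cleared_by_and (unsigned int *p)
{
  return __atomic_fetch_and (p, ~1u, __ATOMIC_RELAXED) & 1u;
}

/* Sign-bit mask: sets bit 31; the old bit is tested with a signed
   comparison (the conversion to int relies on GCC's modular behaviour).  */
bool
top_bit_set_by_or (unsigned int *p)
{
  return (int) __atomic_fetch_or (p, 0x80000000u, __ATOMIC_RELAXED) < 0;
}

/* Clears bit 31 and tests its old value with a signed comparison.  */
bool
top_bit_cleared_by_and (unsigned int *p)
{
  return (int) __atomic_fetch_and (p, 0x7fffffffu, __ATOMIC_RELAXED) < 0;
}
```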

    We make

      mask_2 = 1 << cnt_1;
      _4 = __atomic_fetch_or_* (ptr_6, mask_2, _3);
      _5 = _4 & mask_2;

    and

      mask_6 = 1 << bit_5(D);
      _1 = ~mask_6;
      _2 = __atomic_fetch_and_4 (v_8(D), _1, 0);
      _3 = _2 & mask_6;
      _4 = _3 != 0;

    the canonical forms for this optimization and transform cases 1-8 to the
    equivalent canonical form.  For cases 9 and 10, we simply remove the cast
    before __atomic_fetch_or_4/__atomic_fetch_and_4, giving

      _1 = 1 << bit_4(D);
      _2 = __atomic_fetch_or_4 (v_7(D), _1, 0);
      _3 = _2 & _1;

    and

      mask_7 = 1 << bit_6(D);
      _1 = ~mask_7;
      _3 = __atomic_fetch_and_4 (v_9(D), _1, 0);
      _6 = _3 & mask_7;
      _5 = (int) _6;
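
    The casts in question typically come from mixing signed and unsigned
    types in the source.  The sketch below shows source that might produce
    such casts; the exact GIMPLE it yields is an assumption and the function
    names are hypothetical.  `bit` is assumed to be in the range 0..30 so
    that the signed shift is well defined.

```c
/* The mask is computed as an int shift and then converted to unsigned
   before the atomic operation.  */
unsigned int
set_bit_int_shift (unsigned int *v, int bit)
{
  unsigned int mask = 1 << bit;

  return __atomic_fetch_or (v, mask, __ATOMIC_RELAXED) & mask;
}

/* The mask stays an int: its complement is converted to unsigned for the
   atomic operation and the result is converted back to int for the test.  */
int
clear_bit_int_mask (unsigned int *v, int bit)
{
  int mask = 1 << bit;

  return (int) __atomic_fetch_and (v, ~mask, __ATOMIC_RELAXED) & mask;
}
```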

    2021-11-04  H.J. Lu  <hongjiu.lu@intel.com>
                Hongtao Liu  <hongtao.liu@intel.com>
    gcc/

            PR middle-end/102566
            * match.pd (nop_atomic_bit_test_and_p): New match.
            * tree-ssa-ccp.c (convert_atomic_bit_not): New function.
            (gimple_nop_atomic_bit_test_and_p): New prototype.
            (optimize_atomic_bit_test_and): Transform equivalent, but slightly
            different cases to their canonical forms.

    gcc/testsuite/

            PR middle-end/102566
            * g++.target/i386/pr102566-1.C: New test.
            * g++.target/i386/pr102566-2.C: Likewise.
            * g++.target/i386/pr102566-3.C: Likewise.
            * g++.target/i386/pr102566-4.C: Likewise.
            * g++.target/i386/pr102566-5a.C: Likewise.
            * g++.target/i386/pr102566-5b.C: Likewise.
            * g++.target/i386/pr102566-6a.C: Likewise.
            * g++.target/i386/pr102566-6b.C: Likewise.
            * gcc.target/i386/pr102566-1a.c: Likewise.
            * gcc.target/i386/pr102566-1b.c: Likewise.
            * gcc.target/i386/pr102566-2.c: Likewise.
            * gcc.target/i386/pr102566-3a.c: Likewise.
            * gcc.target/i386/pr102566-3b.c: Likewise.
            * gcc.target/i386/pr102566-4.c: Likewise.
            * gcc.target/i386/pr102566-5.c: Likewise.
            * gcc.target/i386/pr102566-6.c: Likewise.
            * gcc.target/i386/pr102566-7.c: Likewise.
            * gcc.target/i386/pr102566-8a.c: Likewise.
            * gcc.target/i386/pr102566-8b.c: Likewise.
            * gcc.target/i386/pr102566-9a.c: Likewise.
            * gcc.target/i386/pr102566-9b.c: Likewise.
            * gcc.target/i386/pr102566-10a.c: Likewise.
            * gcc.target/i386/pr102566-10b.c: Likewise.
            * gcc.target/i386/pr102566-11.c: Likewise.
            * gcc.target/i386/pr102566-12.c: Likewise.
            * gcc.target/i386/pr102566-13.c: New test.
            * gcc.target/i386/pr102566-14.c: New test.
