* [Bug target/98737] Atomic operation on x86 not optimized to use flags
2021-01-18 19:59 [Bug target/98737] New: Atomic operation on x86 not optimized to use flags drepper.fsp+rhbz at gmail dot com
From: rguenth at gcc dot gnu.org @ 2021-01-19 7:55 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98737
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What            |Removed      |Added
   ------------------------------------------------------------
   Status                  |UNCONFIRMED  |NEW
   Last reconfirmed        |             |2021-01-19
   Keywords                |             |missed-optimization
   Ever confirmed          |0            |1
   Target                  |x86          |x86_64-*-* i?86-*-*
--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Confirmed.  Probably difficult to achieve since the atomics map to
parallel/UNSPEC:

(insn 7 6 8 (parallel [
            (set (reg:DI 83 [ _2 ])
                (unspec_volatile:DI [
                        (mem/v:DI (symbol_ref:DI ("a") [flags 0x2] <var_decl 0x7ffff7ff5b40 a>) [-1 S8 A64])
                        (const_int 3 [0x3])
                    ] UNSPECV_XCHG))
            (set (mem/v:DI (symbol_ref:DI ("a") [flags 0x2] <var_decl 0x7ffff7ff5b40 a>) [-1 S8 A64])
                (plus:DI (mem/v:DI (symbol_ref:DI ("a") [flags 0x2] <var_decl 0x7ffff7ff5b40 a>) [-1 S8 A64])
                    (reg:DI 86)))
            (clobber (reg:CC 17 flags))
        ]) "t.c":5:10 -1
     (nil))
* [Bug target/98737] Atomic operation on x86 not optimized to use flags
From: ubizjak at gmail dot com @ 2021-01-19 8:37 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98737
--- Comment #2 from Uroš Bizjak <ubizjak at gmail dot com> ---
This can be optimized with peephole2; we already have a similar case in sync.md:
;; This peephole2 and following insn optimize
;; __sync_fetch_and_add (x, -N) == N into just lock {add,sub,inc,dec}
;; followed by testing of flags instead of lock xadd and comparisons.
* [Bug target/98737] Atomic operation on x86 not optimized to use flags
From: redi at gcc dot gnu.org @ 2021-01-19 9:53 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98737
--- Comment #3 from Jonathan Wakely <redi at gcc dot gnu.org> ---
(In reply to Uroš Bizjak from comment #2)
> This can be optimized with peephole2, we already have similar case in
> sync.md:
>
> ;; This peephole2 and following insn optimize
> ;; __sync_fetch_and_add (x, -N) == N into just lock {add,sub,inc,dec}
> ;; followed by testing of flags instead of lock xadd and comparisons.
That doesn't seem to do anything though; I see the same bad code for

_Bool h(long b)
{
  return __sync_fetch_and_add(&a, -b) == b;
}
* [Bug target/98737] Atomic operation on x86 not optimized to use flags
From: jakub at gcc dot gnu.org @ 2021-01-19 10:01 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98737
Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What  |Removed  |Added
   ------------------------------------------------
   CC            |         |jakub at gcc dot gnu.org
--- Comment #4 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
That peephole handles only constants, i.e.

_Bool h(void)
{
  return __sync_fetch_and_add(&a, -32) == 32;
}

_Bool i(void)
{
  return __sync_fetch_and_add(&a, 64) == -64;
}

Anyway, for #c0 the peephole2 would need to match the UNSPEC_XCHG followed by
the operation again followed by a comparison, because it is op_fetch rather
than fetch_op.
* [Bug target/98737] Atomic operation on x86 not optimized to use flags
From: jakub at gcc dot gnu.org @ 2021-01-26 15:09 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98737
Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What      |Removed                        |Added
   ------------------------------------------------------------------------
   Assignee          |unassigned at gcc dot gnu.org  |jakub at gcc dot gnu.org
   Status            |NEW                            |ASSIGNED
--- Comment #5 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Created attachment 50058
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50058&action=edit
gcc11-pr98737.patch
Untested fix.
* [Bug target/98737] Atomic operation on x86 not optimized to use flags
From: drepper.fsp+rhbz at gmail dot com @ 2021-01-26 16:45 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98737
--- Comment #6 from Ulrich Drepper <drepper.fsp+rhbz at gmail dot com> ---
(In reply to Jakub Jelinek from comment #5)
> Created attachment 50058 [details]
> gcc11-pr98737.patch
>
> Untested fix.
This only handles sub?
The same applies to add, or, and, xor. Maybe nand? Can this patch be
generalized?
* [Bug target/98737] Atomic operation on x86 not optimized to use flags
From: jakub at gcc dot gnu.org @ 2021-01-26 17:11 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98737
--- Comment #7 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
The sub fix won't be the same as the add one; perhaps xor/or/and can be
handled by the same peephole2, but even for that I'm not sure.  Though e.g.
trying __atomic_or_fetch (&a, b, ...) == 0 doesn't seem to be something people
would use.
* [Bug target/98737] Atomic operation on x86 not optimized to use flags
From: drepper.fsp+rhbz at gmail dot com @ 2021-01-26 20:10 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98737
--- Comment #8 from Ulrich Drepper <drepper.fsp+rhbz at gmail dot com> ---
(In reply to Jakub Jelinek from comment #7)
> The sub fix won't be the same as would add, perhaps xor/or/and can be
> handled by the same peephole2, but even for that I'm not sure.
Just a proposal, but I can see myself using code like this.
> Though e.g.
> trying __atomic_or_fetch (&a, b, ...) == 0 doesn't seem to be something
> people would use.
I can see this being valid.  If b is a variable of some type (e.g.,
representing a flag), a could be the set of all flags set, and the result of
the or operation being non-zero could mean some work based on the flags needs
to be done.
The alternative is

  if ((b != 0 && __atomic_or_fetch(&a, b, ...)) || a != 0)
    ...

Unconditionally performing the or is likely faster than the additional tests
and jumps.
* [Bug target/98737] Atomic operation on x86 not optimized to use flags
From: jakub at gcc dot gnu.org @ 2021-12-14 10:20 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98737
Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What                 |Removed  |Added
   ---------------------------------------------
   Attachment #50058 is obsolete|0        |1
--- Comment #9 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Created attachment 51997
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51997&action=edit
gcc12-pr98737.patch
Updated patch which uses an internal fn and optabs for this.
Only the *_fetch and not the fetch_* atomics are handled though.
For the latter, I wonder if we shouldn't instead generally try to replace
__atomic_fetch_add (&x, y, whatever) + y with __atomic_add_fetch (&x, y,
whatever), and __atomic_add_fetch (&x, y, whatever) - y with
__atomic_fetch_add (&x, y, whatever), etc.
* [Bug target/98737] Atomic operation on x86 not optimized to use flags
From: cvs-commit at gcc dot gnu.org @ 2022-01-03 13:17 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98737
--- Comment #10 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:
https://gcc.gnu.org/g:6362627b27f395b054f359244fcfcb15ac0ac2ab
commit r12-6190-g6362627b27f395b054f359244fcfcb15ac0ac2ab
Author: Jakub Jelinek <jakub@redhat.com>
Date: Mon Jan 3 14:02:23 2022 +0100
i386, fab: Optimize __atomic_{add,sub,and,or,xor}_fetch (x, y, z)
{==,!=,<,<=,>,>=} 0 [PR98737]
On Wed, Jan 27, 2021 at 12:27:13PM +0100, Ulrich Drepper via Gcc-patches
wrote:
> On 1/27/21 11:37 AM, Jakub Jelinek wrote:
> > Would equality comparison against 0 handle the most common cases.
> >
> > The user can write it as
> > __atomic_sub_fetch (x, y, z) == 0
> > or
> > __atomic_fetch_sub (x, y, z) - y == 0
> > though, so the expansion code would need to be able to cope with both.
>
> Please also keep !=0, <0, <=0, >0, and >=0 in mind. They all can be
> useful and can be handled with the flags.
<= 0 and > 0 don't really work well with lock {add,sub,inc,dec}; x86 doesn't
have comparisons that would look solely at both SF and ZF and not at other
flags (and emitting two separate conditional jumps or two setcc insns and
oring them together looks awful).
But the rest can work.
Here is a patch that adds internal functions and optabs for these,
recognizes them at the same spot as e.g. .ATOMIC_BIT_TEST_AND* internal
functions (fold all builtins pass) and expands them appropriately (or for
the <= 0 and > 0 cases of +/- FAILs and let's middle-end fall back).
So far I have handled just the op_fetch builtins, IMHO instead of handling
also __atomic_fetch_sub (x, y, z) - y == 0 etc. we should canonicalize
__atomic_fetch_sub (x, y, z) - y to __atomic_sub_fetch (x, y, z) (and vice
versa).
2022-01-03 Jakub Jelinek <jakub@redhat.com>
PR target/98737
* internal-fn.def (ATOMIC_ADD_FETCH_CMP_0, ATOMIC_SUB_FETCH_CMP_0,
ATOMIC_AND_FETCH_CMP_0, ATOMIC_OR_FETCH_CMP_0,
ATOMIC_XOR_FETCH_CMP_0):
New internal fns.
* internal-fn.h (ATOMIC_OP_FETCH_CMP_0_EQ,
ATOMIC_OP_FETCH_CMP_0_NE,
ATOMIC_OP_FETCH_CMP_0_LT, ATOMIC_OP_FETCH_CMP_0_LE,
ATOMIC_OP_FETCH_CMP_0_GT, ATOMIC_OP_FETCH_CMP_0_GE): New
enumerators.
* internal-fn.c (expand_ATOMIC_ADD_FETCH_CMP_0,
expand_ATOMIC_SUB_FETCH_CMP_0, expand_ATOMIC_AND_FETCH_CMP_0,
expand_ATOMIC_OR_FETCH_CMP_0, expand_ATOMIC_XOR_FETCH_CMP_0): New
functions.
* optabs.def (atomic_add_fetch_cmp_0_optab,
atomic_sub_fetch_cmp_0_optab, atomic_and_fetch_cmp_0_optab,
atomic_or_fetch_cmp_0_optab, atomic_xor_fetch_cmp_0_optab): New
direct optabs.
* builtins.h (expand_ifn_atomic_op_fetch_cmp_0): Declare.
* builtins.c (expand_ifn_atomic_op_fetch_cmp_0): New function.
* tree-ssa-ccp.c: Include internal-fn.h.
(optimize_atomic_bit_test_and): Add . before internal fn call
in function comment. Change return type from void to bool and
return true only if successfully replaced.
(optimize_atomic_op_fetch_cmp_0): New function.
(pass_fold_builtins::execute): Use optimize_atomic_op_fetch_cmp_0
for BUILT_IN_ATOMIC_{ADD,SUB,AND,OR,XOR}_FETCH_{1,2,4,8,16} and
BUILT_IN_SYNC_{ADD,SUB,AND,OR,XOR}_AND_FETCH_{1,2,4,8,16},
for *XOR* ones only if optimize_atomic_bit_test_and failed.
* config/i386/sync.md
(atomic_<plusminus_mnemonic>_fetch_cmp_0<mode>,
atomic_<logic>_fetch_cmp_0<mode>): New define_expand patterns.
(atomic_add_fetch_cmp_0<mode>_1, atomic_sub_fetch_cmp_0<mode>_1,
atomic_<logic>_fetch_cmp_0<mode>_1): New define_insn patterns.
* doc/md.texi (atomic_add_fetch_cmp_0<mode>,
atomic_sub_fetch_cmp_0<mode>, atomic_and_fetch_cmp_0<mode>,
atomic_or_fetch_cmp_0<mode>, atomic_xor_fetch_cmp_0<mode>):
Document
new named patterns.
* gcc.target/i386/pr98737-1.c: New test.
* gcc.target/i386/pr98737-2.c: New test.
* gcc.target/i386/pr98737-3.c: New test.
* gcc.target/i386/pr98737-4.c: New test.
* gcc.target/i386/pr98737-5.c: New test.
* gcc.target/i386/pr98737-6.c: New test.
* gcc.target/i386/pr98737-7.c: New test.
* [Bug target/98737] Atomic operation on x86 not optimized to use flags
From: jakub at gcc dot gnu.org @ 2022-01-12 15:50 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98737
--- Comment #11 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Created attachment 52170
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52170&action=edit
gcc12-pr98737-canon.patch
Untested patch to fix up cases where users use the fetch_op or op_fetch atomic
when they should be using the other one.
* [Bug target/98737] Atomic operation on x86 not optimized to use flags
From: cvs-commit at gcc dot gnu.org @ 2022-01-14 11:07 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98737
--- Comment #12 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:
https://gcc.gnu.org/g:9896e96d4cae00d0f4d2b694284cb30bbd9c80fc
commit r12-6577-g9896e96d4cae00d0f4d2b694284cb30bbd9c80fc
Author: Jakub Jelinek <jakub@redhat.com>
Date: Fri Jan 14 12:04:59 2022 +0100
forwprop: Canonicalize atomic fetch_op op x to op_fetch or vice versa
[PR98737]
When writing the PR98737 fix, I've handled just the case where people
use __atomic_op_fetch (p, x, y) etc.
But some people actually use the other builtins, like
__atomic_fetch_op (p, x, y) op x.
The following patch canonicalizes the latter to the former and vice versa
when possible, provided the result of the builtin has a single use and, if
that use is a cast of the same precision, that cast's lhs also has a single
use.
For all ops of +, -, &, | and ^ we can do those
__atomic_fetch_op (p, x, y) op x -> __atomic_op_fetch (p, x, y)
(and __sync too) opts, but cases of INTEGER_CST and SSA_NAME x
behave differently. For INTEGER_CST, typically - x is
canonicalized to + (-x), while for SSA_NAME we need to handle various
casts, which sometimes happen on the second argument of the builtin
(there can be even two subsequent casts for char/short due to the
promotions we do) and there can be a cast on the argument of op too.
And all ops but - are commutative.
For the other direction, i.e.
__atomic_op_fetch (p, x, y) rop x -> __atomic_fetch_op (p, x, y)
we can't handle op of & and |, those aren't reversible, for
op + rop is -, for - rop is + and for ^ rop is ^, otherwise the same
stuff as above applies.
And, there is another case, we canonicalize
x - y == 0 (or != 0) and x ^ y == 0 (or != 0) to x == y (or x != y)
and for constant y x + y == 0 (or != 0) to x == -y (or != -y),
so the patch also virtually undoes those canonicalizations, because
e.g. for the earlier PR98737 patch but even generally, it is better
if a result of atomic op fetch is compared against 0 than doing
atomic fetch op and compare it to some variable or non-zero constant.
As for debug info, for non-reversible operations (& and |) the patch
resets debug stmts if there are any, for -fnon-call-exceptions too
(didn't want to include debug temps right before all uses), but
otherwise it emits (on richi's request) the reverse operation from
the result as a new setter of the old lhs, so that later DCE fixes
up the debug info.
On the emitted assembly for the testcases which are fairly large,
I see substantial decreases of the *.s size:
-rw-rw-r--. 1 jakub jakub 116897 Jan 13 09:58 pr98737-1.svanilla
-rw-rw-r--. 1 jakub jakub 93861 Jan 13 09:57 pr98737-1.spatched
-rw-rw-r--. 1 jakub jakub 70257 Jan 13 09:57 pr98737-2.svanilla
-rw-rw-r--. 1 jakub jakub 67537 Jan 13 09:57 pr98737-2.spatched
There are some functions where due to RA we get one more instruction
than previously, but most of them are smaller even when not hitting
the PR98737 previous patch's optimizations.
2022-01-14 Jakub Jelinek <jakub@redhat.com>
PR target/98737
* tree-ssa-forwprop.c (simplify_builtin_call): Canonicalize
__atomic_fetch_op (p, x, y) op x into __atomic_op_fetch (p, x, y)
and __atomic_op_fetch (p, x, y) iop x into
__atomic_fetch_op (p, x, y).
* gcc.dg/tree-ssa/pr98737-1.c: New test.
* gcc.dg/tree-ssa/pr98737-2.c: New test.