public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug c/105495] New: `__atomic_compare_exchange` prevents tail-call optimization
@ 2022-05-05 14:00 lh_mouse at 126 dot com
  2022-05-05 14:10 ` [Bug c/105495] " lh_mouse at 126 dot com
                   ` (6 more replies)
  0 siblings, 7 replies; 8+ messages in thread
From: lh_mouse at 126 dot com @ 2022-05-05 14:00 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105495

            Bug ID: 105495
           Summary: `__atomic_compare_exchange` prevents tail-call
                    optimization
           Product: gcc
           Version: 11.3.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: lh_mouse at 126 dot com
  Target Milestone: ---

Godbolt: https://gcc.godbolt.org/z/7ob6zc17P

Offending testcase:

```c
typedef struct { int b; } cond;

int
__MCF_batch_release_common(cond* p, int c);

int
_MCF_cond_signal_some(cond* p, int x)
  {
    cond c = {x}, n = {2};
    __atomic_compare_exchange(p, &c, &n, 1, 0, 0);
    return __MCF_batch_release_common(p, x);
  }
```


GCC output:

```asm
_MCF_cond_signal_some:
        sub     rsp, 24
        mov     edx, 2
        mov     eax, esi
        mov     DWORD PTR [rsp+12], esi
        lock cmpxchg    DWORD PTR [rdi], edx
        je      .L2
        mov     DWORD PTR [rsp+12], eax      <------- note this extra store,
which clang doesn't generate
.L2:
        call    __MCF_batch_release_common
        add     rsp, 24
        ret
```


Clang output:

```asm
_MCF_cond_signal_some:                  # @_MCF_cond_signal_some
        mov     ecx, 2
        mov     eax, esi
        lock            cmpxchg dword ptr [rdi], ecx
        jmp     __MCF_batch_release_common      # TAILCALL
```

1. If `cond` was defined as a scalar type such as `long`, there is no such
issue.
2. `__atomic_exchange` doesn't suffer from this issue.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug c/105495] `__atomic_compare_exchange` prevents tail-call optimization
  2022-05-05 14:00 [Bug c/105495] New: `__atomic_compare_exchange` prevents tail-call optimization lh_mouse at 126 dot com
@ 2022-05-05 14:10 ` lh_mouse at 126 dot com
  2022-05-05 14:17 ` rguenth at gcc dot gnu.org
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: lh_mouse at 126 dot com @ 2022-05-05 14:10 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105495

--- Comment #1 from LIU Hao <lh_mouse at 126 dot com> ---
A possible workaround is to use a scalar type to provide storage for local
variables, and cast them as needed:

Godbolt: https://gcc.godbolt.org/z/n7zq7Pn4G

```c
typedef struct { int b; } cond;

int
__MCF_batch_release_common(cond* p, int c);

int
_MCF_cond_signal_some(cond* p, int x)
  {
    int c = {x}, n = {2};
    __atomic_compare_exchange((cond*)p, (cond*)&c, (cond*)&n, 1, 0, 0);
    return __MCF_batch_release_common(p, x);
  }
```

This makes GCC output the same assembly as Clang.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug c/105495] `__atomic_compare_exchange` prevents tail-call optimization
  2022-05-05 14:00 [Bug c/105495] New: `__atomic_compare_exchange` prevents tail-call optimization lh_mouse at 126 dot com
  2022-05-05 14:10 ` [Bug c/105495] " lh_mouse at 126 dot com
@ 2022-05-05 14:17 ` rguenth at gcc dot gnu.org
  2022-05-05 14:30 ` lh_mouse at 126 dot com
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-05-05 14:17 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105495

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2022-05-05
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
The issue is that we pass the 2nd argument by reference which causes a stack
slot to be allocated for 'c':

  c.b = x_2(D);
  __atomic_compare_exchange_4 (p_4(D), &c, 2, 1, 0, 0);

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug c/105495] `__atomic_compare_exchange` prevents tail-call optimization
  2022-05-05 14:00 [Bug c/105495] New: `__atomic_compare_exchange` prevents tail-call optimization lh_mouse at 126 dot com
  2022-05-05 14:10 ` [Bug c/105495] " lh_mouse at 126 dot com
  2022-05-05 14:17 ` rguenth at gcc dot gnu.org
@ 2022-05-05 14:30 ` lh_mouse at 126 dot com
  2022-05-05 14:33 ` jakub at gcc dot gnu.org
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: lh_mouse at 126 dot com @ 2022-05-05 14:30 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105495

--- Comment #3 from LIU Hao <lh_mouse at 126 dot com> ---
Wouldn't that go away if the value in it is never read back?

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug c/105495] `__atomic_compare_exchange` prevents tail-call optimization
  2022-05-05 14:00 [Bug c/105495] New: `__atomic_compare_exchange` prevents tail-call optimization lh_mouse at 126 dot com
                   ` (2 preceding siblings ...)
  2022-05-05 14:30 ` lh_mouse at 126 dot com
@ 2022-05-05 14:33 ` jakub at gcc dot gnu.org
  2022-05-05 15:28 ` [Bug middle-end/105495] " pinskia at gcc dot gnu.org
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: jakub at gcc dot gnu.org @ 2022-05-05 14:33 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105495

--- Comment #4 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
The reason why it works in #c1 is that we replace the
  c = x_4(D);
  n_6 = 2;
  n.0_1 = n_6;
  n.1_2 = (unsigned int) n.0_1;
  __atomic_compare_exchange_4 (p_7(D), &c, n.1_2, 1, 0, 0);
call in the IL with:
  c_20 = x_4(D);
  _15 = c_20;
  _16 = VIEW_CONVERT_EXPR<unsigned int>(_15);
  _17 = .ATOMIC_COMPARE_EXCHANGE (p_7(D), _16, 2, 260, 0, 0);
  _18 = REALPART_EXPR <_17>;
  _19 = VIEW_CONVERT_EXPR<int>(_18);
  c_21 = _19;
during ccp1 pass, optimizing away the addressables.
But we don't do that for aggregates with sizes of integer types,
but supposedly we could do that too.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug middle-end/105495] `__atomic_compare_exchange` prevents tail-call optimization
  2022-05-05 14:00 [Bug c/105495] New: `__atomic_compare_exchange` prevents tail-call optimization lh_mouse at 126 dot com
                   ` (3 preceding siblings ...)
  2022-05-05 14:33 ` jakub at gcc dot gnu.org
@ 2022-05-05 15:28 ` pinskia at gcc dot gnu.org
  2022-05-06  7:40 ` lh_mouse at 126 dot com
  2022-10-31  7:55 ` [Bug tree-optimization/105495] " pinskia at gcc dot gnu.org
  6 siblings, 0 replies; 8+ messages in thread
From: pinskia at gcc dot gnu.org @ 2022-05-05 15:28 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105495

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement
          Component|c                           |middle-end

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug middle-end/105495] `__atomic_compare_exchange` prevents tail-call optimization
  2022-05-05 14:00 [Bug c/105495] New: `__atomic_compare_exchange` prevents tail-call optimization lh_mouse at 126 dot com
                   ` (4 preceding siblings ...)
  2022-05-05 15:28 ` [Bug middle-end/105495] " pinskia at gcc dot gnu.org
@ 2022-05-06  7:40 ` lh_mouse at 126 dot com
  2022-10-31  7:55 ` [Bug tree-optimization/105495] " pinskia at gcc dot gnu.org
  6 siblings, 0 replies; 8+ messages in thread
From: lh_mouse at 126 dot com @ 2022-05-06  7:40 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105495

--- Comment #5 from LIU Hao <lh_mouse at 126 dot com> ---
This does not trigger the issue:

```c
#define __atomic_compare_exchange(p,c,n,w,ms,mf)  \
  ({ int __temp;  \
     __builtin_memcpy(&__temp, c, sizeof(*c));  \
     _Bool __r = __atomic_compare_exchange(p, (__typeof__(*(c))*) &__temp, n,
w, ms, mf);  \
     __builtin_memcpy(c, &__temp, sizeof(*c));  \
     __r;  });

typedef struct { int b; } cond;

int
__MCF_batch_release_common(cond* p, int c);

int
_MCF_cond_signal_some(cond* p, int x)
  {
    cond c = {x}, n = {2};
    __atomic_compare_exchange(p, &c, &n, 1, 0, 0);
    return __MCF_batch_release_common(p, x);
  }
```

which results in (godbolt https://gcc.godbolt.org/z/n68T1c6oP):

```asm
_MCF_cond_signal_some:
        mov     edx, 2
        mov     eax, esi
        lock cmpxchg    DWORD PTR [rdi], edx
        jmp     __MCF_batch_release_common
```

Effectively, we are using a `int` to provide storage for a struct of
`sizeof(int)`.

But if we use a `long` to provide storage for the struct, such issue reappears:

```c
#define __atomic_compare_exchange(p,c,n,w,ms,mf)  \
  ({ long __temp;  \
     __builtin_memcpy(&__temp, c, sizeof(*c));  \
     _Bool __r = __atomic_compare_exchange(p, (__typeof__(*(c))*) &__temp, n,
w, ms, mf);  \
     __builtin_memcpy(c, &__temp, sizeof(*c));  \
     __r;  });

typedef struct { int b; } cond;

int
__MCF_batch_release_common(cond* p, int c);

int
_MCF_cond_signal_some(cond* p, int x)
  {
    cond c = {x}, n = {2};
    __atomic_compare_exchange(p, &c, &n, 1, 0, 0);
    return __MCF_batch_release_common(p, x);
  }
```

which results in (https://gcc.godbolt.org/z/PGof8nGd7)

```asm
_MCF_cond_signal_some:
        mov     edx, 2
        mov     eax, esi
        mov     DWORD PTR [rsp-16], esi
        lock cmpxchg    DWORD PTR [rdi], edx
        je      .L2
        mov     DWORD PTR [rsp-16], eax
.L2:
        jmp     __MCF_batch_release_common
```

It is also notable that with this kind of hacks, GCC is finally able to perform
TCO on the return statement.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug tree-optimization/105495] `__atomic_compare_exchange` prevents tail-call optimization
  2022-05-05 14:00 [Bug c/105495] New: `__atomic_compare_exchange` prevents tail-call optimization lh_mouse at 126 dot com
                   ` (5 preceding siblings ...)
  2022-05-06  7:40 ` lh_mouse at 126 dot com
@ 2022-10-31  7:55 ` pinskia at gcc dot gnu.org
  6 siblings, 0 replies; 8+ messages in thread
From: pinskia at gcc dot gnu.org @ 2022-10-31  7:55 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105495

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|middle-end                  |tree-optimization

--- Comment #6 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
I wonder if we could add some access attribute such that the 2nd argument of
__atomic_compare_exchange_4 is marked as read only ...

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2022-10-31  7:55 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-05 14:00 [Bug c/105495] New: `__atomic_compare_exchange` prevents tail-call optimization lh_mouse at 126 dot com
2022-05-05 14:10 ` [Bug c/105495] " lh_mouse at 126 dot com
2022-05-05 14:17 ` rguenth at gcc dot gnu.org
2022-05-05 14:30 ` lh_mouse at 126 dot com
2022-05-05 14:33 ` jakub at gcc dot gnu.org
2022-05-05 15:28 ` [Bug middle-end/105495] " pinskia at gcc dot gnu.org
2022-05-06  7:40 ` lh_mouse at 126 dot com
2022-10-31  7:55 ` [Bug tree-optimization/105495] " pinskia at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).