public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug c/111870] New: Miscompile of atomic rmw or on x86 (not aarch, though)
@ 2023-10-18 20:17 fallenleafs at icloud dot com
  2023-10-18 20:24 ` [Bug target/111870] " pinskia at gcc dot gnu.org
                   ` (9 more replies)
  0 siblings, 10 replies; 11+ messages in thread
From: fallenleafs at icloud dot com @ 2023-10-18 20:17 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111870

            Bug ID: 111870
           Summary: Miscompile of atomic rmw or on x86 (not aarch, though)
           Product: gcc
           Version: 13.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: fallenleafs at icloud dot com
  Target Milestone: ---

Recently I wrote one little piece of code that gave me too much headache. After
investigation, I discovered that on x86 the sequence of instructions that GCC
(strangely, llvm as well) produces is invalid because after applying
optimisations, atomicity of load gets elided. You can witness that gcc makes up
nonsense here[https://godbolt.org/z/Thfeq1KGW]. There you can find c code and
if you compile it with gcc13.2 to aarch, atomic_fetch_or_explicit gets
translated to a loop of a pair of special load-store instructions which is
correct lowering, but if you do it for x86, you can, in fact, witness that
generated code does not contain `lock or ...` instruction, which would be
correct code, but instead `lock cmpxchg ...` which is invalid.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug target/111870] Miscompile of atomic rmw or on x86 (not aarch, though)
  2023-10-18 20:17 [Bug c/111870] New: Miscompile of atomic rmw or on x86 (not aarch, though) fallenleafs at icloud dot com
@ 2023-10-18 20:24 ` pinskia at gcc dot gnu.org
  2023-10-18 20:32 ` pinskia at gcc dot gnu.org
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-10-18 20:24 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111870

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Created attachment 56145
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56145&action=edit
testcase

Next time please enter attach or place inline the testcase rather than just a
link.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug target/111870] Miscompile of atomic rmw or on x86 (not aarch, though)
  2023-10-18 20:17 [Bug c/111870] New: Miscompile of atomic rmw or on x86 (not aarch, though) fallenleafs at icloud dot com
  2023-10-18 20:24 ` [Bug target/111870] " pinskia at gcc dot gnu.org
@ 2023-10-18 20:32 ` pinskia at gcc dot gnu.org
  2023-10-18 20:35 ` pinskia at gcc dot gnu.org
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-10-18 20:32 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111870

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2023-10-18
             Status|UNCONFIRMED                 |WAITING

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
>atomicity of load gets elided. 

No IT DOES NOT.

        mov     rax, QWORD PTR [rdi]
.L3:
        mov     rsi, rax
        mov     rdx, rax
        or      rsi, 1
        lock cmpxchg    QWORD PTR [rdi], rsi
        jne     .L3

That is correct as far as I know.

an atomic load from [rdi] and then do the or and then do a compare-and-exchange
with the new value (old value was in rdx for the comparison and the new value
is stored into rax).

That is very much atomically doing the IOR.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug target/111870] Miscompile of atomic rmw or on x86 (not aarch, though)
  2023-10-18 20:17 [Bug c/111870] New: Miscompile of atomic rmw or on x86 (not aarch, though) fallenleafs at icloud dot com
  2023-10-18 20:24 ` [Bug target/111870] " pinskia at gcc dot gnu.org
  2023-10-18 20:32 ` pinskia at gcc dot gnu.org
@ 2023-10-18 20:35 ` pinskia at gcc dot gnu.org
  2023-10-18 20:37 ` fallenleafs at icloud dot com
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-10-18 20:35 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111870

--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Or maybe the issue is you don't understand the cmpxchg instruction and how it
gives back the original value too.


The RTL form for the "lock;cmpxchg " is:

(insn:TI 14 10 17 5 (parallel [
            (set (reg:DI 0 ax [108])
                (unspec_volatile:DI [
                        (mem/v:DI (reg/v/f:DI 5 di [orig:105 map ] [105]) [-1 
S8 A64])
                        (reg:DI 0 ax [108])
                        (reg:DI 4 si [107])
                        (const_int 32773 [0x8005])
                    ] UNSPECV_CMPXCHG))
            (set (mem/v:DI (reg/v/f:DI 5 di [orig:105 map ] [105]) [-1  S8
A64])
                (unspec_volatile:DI [
                        (const_int 0 [0])
                    ] UNSPECV_CMPXCHG))
            (set (reg:CCZ 17 flags)
                (unspec_volatile:CCZ [
                        (const_int 0 [0])
                    ] UNSPECV_CMPXCHG))
        ]) "/app/example.c":15:22 9336 {atomic_compare_and_swapdi_1}
     (expr_list:REG_DEAD (reg:DI 4 si [107])
        (nil)))

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug target/111870] Miscompile of atomic rmw or on x86 (not aarch, though)
  2023-10-18 20:17 [Bug c/111870] New: Miscompile of atomic rmw or on x86 (not aarch, though) fallenleafs at icloud dot com
                   ` (2 preceding siblings ...)
  2023-10-18 20:35 ` pinskia at gcc dot gnu.org
@ 2023-10-18 20:37 ` fallenleafs at icloud dot com
  2023-10-18 20:38 ` fallenleafs at icloud dot com
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: fallenleafs at icloud dot com @ 2023-10-18 20:37 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111870

--- Comment #4 from isoosqa <fallenleafs at icloud dot com> ---
Oops, I sent wrong code. This is the one https://godbolt.org/z/GxdvMdP76

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug target/111870] Miscompile of atomic rmw or on x86 (not aarch, though)
  2023-10-18 20:17 [Bug c/111870] New: Miscompile of atomic rmw or on x86 (not aarch, though) fallenleafs at icloud dot com
                   ` (3 preceding siblings ...)
  2023-10-18 20:37 ` fallenleafs at icloud dot com
@ 2023-10-18 20:38 ` fallenleafs at icloud dot com
  2023-10-18 20:43 ` pinskia at gcc dot gnu.org
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: fallenleafs at icloud dot com @ 2023-10-18 20:38 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111870

--- Comment #5 from isoosqa <fallenleafs at icloud dot com> ---
Please, forgive me. I typed stuff wrong in original link

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug target/111870] Miscompile of atomic rmw or on x86 (not aarch, though)
  2023-10-18 20:17 [Bug c/111870] New: Miscompile of atomic rmw or on x86 (not aarch, though) fallenleafs at icloud dot com
                   ` (4 preceding siblings ...)
  2023-10-18 20:38 ` fallenleafs at icloud dot com
@ 2023-10-18 20:43 ` pinskia at gcc dot gnu.org
  2023-10-18 20:44 ` pinskia at gcc dot gnu.org
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-10-18 20:43 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111870

--- Comment #6 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
If you don't use the return value of atomic_fetch_or_explicit, there is no need
for a compare-and-exchange (swap) loop. If you need the fetch part, the
compare-and-exchange loop needs to be used as `lock or` does not provide that.


I still don't understand what exactly you are saying is wrong except you want
to use `lock or` in both cases but you can't since that will not get the right
atomic fetch part of the atomic_fetch_or.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug target/111870] Miscompile of atomic rmw or on x86 (not aarch, though)
  2023-10-18 20:17 [Bug c/111870] New: Miscompile of atomic rmw or on x86 (not aarch, though) fallenleafs at icloud dot com
                   ` (5 preceding siblings ...)
  2023-10-18 20:43 ` pinskia at gcc dot gnu.org
@ 2023-10-18 20:44 ` pinskia at gcc dot gnu.org
  2023-10-18 20:45 ` pinskia at gcc dot gnu.org
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-10-18 20:44 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111870

--- Comment #7 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
That is not using the fetch part is optimized to just `lock or`.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug target/111870] Miscompile of atomic rmw or on x86 (not aarch, though)
  2023-10-18 20:17 [Bug c/111870] New: Miscompile of atomic rmw or on x86 (not aarch, though) fallenleafs at icloud dot com
                   ` (6 preceding siblings ...)
  2023-10-18 20:44 ` pinskia at gcc dot gnu.org
@ 2023-10-18 20:45 ` pinskia at gcc dot gnu.org
  2023-10-18 20:46 ` pinskia at gcc dot gnu.org
  2023-10-18 22:38 ` pinskia at gcc dot gnu.org
  9 siblings, 0 replies; 11+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-10-18 20:45 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111870

--- Comment #8 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
On aarch64, ldset does both a load and ior. that is unlike the `lock or` on
x86.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug target/111870] Miscompile of atomic rmw or on x86 (not aarch, though)
  2023-10-18 20:17 [Bug c/111870] New: Miscompile of atomic rmw or on x86 (not aarch, though) fallenleafs at icloud dot com
                   ` (7 preceding siblings ...)
  2023-10-18 20:45 ` pinskia at gcc dot gnu.org
@ 2023-10-18 20:46 ` pinskia at gcc dot gnu.org
  2023-10-18 22:38 ` pinskia at gcc dot gnu.org
  9 siblings, 0 replies; 11+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-10-18 20:46 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111870

--- Comment #9 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
If I change your testcase to be:
uint64_t huh2 (_Atomic(uint64_t)* map, int t) {
   return atomic_fetch_or_explicit(map, t, memory_order_relaxed);
}

You will see that it does the `lock cmpxchg` loop too.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug target/111870] Miscompile of atomic rmw or on x86 (not aarch, though)
  2023-10-18 20:17 [Bug c/111870] New: Miscompile of atomic rmw or on x86 (not aarch, though) fallenleafs at icloud dot com
                   ` (8 preceding siblings ...)
  2023-10-18 20:46 ` pinskia at gcc dot gnu.org
@ 2023-10-18 22:38 ` pinskia at gcc dot gnu.org
  9 siblings, 0 replies; 11+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-10-18 22:38 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111870

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |INVALID
             Status|WAITING                     |RESOLVED

--- Comment #10 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
There is no atomicity issue here.
If we need the old (or new) value, we use a compare-and-exchange loop to get
the old value. If we don't need it, we use an atomic or.

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2023-10-18 22:38 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-10-18 20:17 [Bug c/111870] New: Miscompile of atomic rmw or on x86 (not aarch, though) fallenleafs at icloud dot com
2023-10-18 20:24 ` [Bug target/111870] " pinskia at gcc dot gnu.org
2023-10-18 20:32 ` pinskia at gcc dot gnu.org
2023-10-18 20:35 ` pinskia at gcc dot gnu.org
2023-10-18 20:37 ` fallenleafs at icloud dot com
2023-10-18 20:38 ` fallenleafs at icloud dot com
2023-10-18 20:43 ` pinskia at gcc dot gnu.org
2023-10-18 20:44 ` pinskia at gcc dot gnu.org
2023-10-18 20:45 ` pinskia at gcc dot gnu.org
2023-10-18 20:46 ` pinskia at gcc dot gnu.org
2023-10-18 22:38 ` pinskia at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).