public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug c++/104480] New: [trunk] Combining stores across memory locations might violate [intro.memory]/3
@ 2022-02-10  8:13 marc.mutz at hotmail dot com
  2022-02-10  8:25 ` [Bug tree-optimization/104480] [12 Regression] " pinskia at gcc dot gnu.org
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: marc.mutz at hotmail dot com @ 2022-02-10  8:13 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104480

            Bug ID: 104480
           Summary: [trunk] Combining stores across memory locations might
                    violate [intro.memory]/3
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: marc.mutz at hotmail dot com
  Target Milestone: ---

I'm not sure whether GCC trunk just became much smarter, or introduced a
regresssion. Sorry if it's the former.

Consider:

// https://gcc.godbolt.org/z/ch8rTob7c
struct S1
{
    int a1 : 16;
    int a2 : 16;
};
struct S2
{
    short a1;
    short a2;
};

extern char x;

template<typename T> void f(T &t) {
    t.a1 = x;
    t.a2 = x + 1;
}
template void f(S1 &);
template void f(S2 &);

All GCC version up to 11.2 will use two movw to implement both f()
instantiations. GCC trunk now uses one movl in both instantiations. That's
clearly allowed for f<S1>() by [intro.memory]/3, but it's less clear that it's
an allowed optimisation for S2, because a1, s2 are two separate memory
locations there. Clang, in fact, produces different code for the two
instantiations.

Of course, GCC might be the clever kid here and realize that it can combine the
writes, because that's a valid observable sequence, but an object of type S2,
having alignment 2, may cross a cacheline-boundary, in which case the movl
might not be atomic, even on x86, and then a different core may observe the
writes out of order, which probably shouldn't happen.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug tree-optimization/104480] [12 Regression] Combining stores across memory locations might violate [intro.memory]/3
  2022-02-10  8:13 [Bug c++/104480] New: [trunk] Combining stores across memory locations might violate [intro.memory]/3 marc.mutz at hotmail dot com
@ 2022-02-10  8:25 ` pinskia at gcc dot gnu.org
  2022-02-10  8:40 ` rguenth at gcc dot gnu.org
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: pinskia at gcc dot gnu.org @ 2022-02-10  8:25 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104480

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |12.0
          Component|c++                         |tree-optimization
            Summary|[trunk] Combining stores    |[12 Regression] Combining
                   |across memory locations     |stores across memory
                   |might violate               |locations might violate
                   |[intro.memory]/3            |[intro.memory]/3

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
That would mean any vectorization of stores that is unaligned would cause
problems.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug tree-optimization/104480] [12 Regression] Combining stores across memory locations might violate [intro.memory]/3
  2022-02-10  8:13 [Bug c++/104480] New: [trunk] Combining stores across memory locations might violate [intro.memory]/3 marc.mutz at hotmail dot com
  2022-02-10  8:25 ` [Bug tree-optimization/104480] [12 Regression] " pinskia at gcc dot gnu.org
@ 2022-02-10  8:40 ` rguenth at gcc dot gnu.org
  2022-02-10  8:49 ` pinskia at gcc dot gnu.org
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-02-10  8:40 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104480

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Version|unknown                     |12.0

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
I don't think [intro.memory]/3 (wherever that should point to?) is realized
this way on CPUs with a less strong memory ordering guarantee than x86.  And we
definitely do not ensure atomicity or commit order unless you use atomic access
primitives.

So I think this is invalid.  As Andrew says we're happily combining

void foo (double * __restrict a, double *b)
{
  a[0] = b[0];
  a[1] = b[1];
}

into

        movupd  (%rsi), %xmm0
        movups  %xmm0, (%rdi)

since forever using vectorization which would have the exact same issue
when the store crosses a cacheline boundary.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug tree-optimization/104480] [12 Regression] Combining stores across memory locations might violate [intro.memory]/3
  2022-02-10  8:13 [Bug c++/104480] New: [trunk] Combining stores across memory locations might violate [intro.memory]/3 marc.mutz at hotmail dot com
  2022-02-10  8:25 ` [Bug tree-optimization/104480] [12 Regression] " pinskia at gcc dot gnu.org
  2022-02-10  8:40 ` rguenth at gcc dot gnu.org
@ 2022-02-10  8:49 ` pinskia at gcc dot gnu.org
  2022-02-10 10:48 ` redi at gcc dot gnu.org
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: pinskia at gcc dot gnu.org @ 2022-02-10  8:49 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104480

--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #2)
> I don't think [intro.memory]/3 

https://eel.is/c++draft/intro.memory


The only thing this mentions is:
Two or more threads of execution can access separate memory locations without
interfering with each other.


So that means you can access S2.a1 and S2.a2 on two seperate threads
seperately. It also means you cannot depend on the sequence if you write S2.a1
and S2.a2 either and reading it in the same order from the other thread. Unless
you have a memory barrier.

So I think this is invalid too.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug tree-optimization/104480] [12 Regression] Combining stores across memory locations might violate [intro.memory]/3
  2022-02-10  8:13 [Bug c++/104480] New: [trunk] Combining stores across memory locations might violate [intro.memory]/3 marc.mutz at hotmail dot com
                   ` (2 preceding siblings ...)
  2022-02-10  8:49 ` pinskia at gcc dot gnu.org
@ 2022-02-10 10:48 ` redi at gcc dot gnu.org
  2022-02-10 10:51 ` redi at gcc dot gnu.org
  2022-02-10 11:54 ` marc.mutz at hotmail dot com
  5 siblings, 0 replies; 7+ messages in thread
From: redi at gcc dot gnu.org @ 2022-02-10 10:48 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104480

--- Comment #4 from Jonathan Wakely <redi at gcc dot gnu.org> ---
(In reply to Marc Mutz from comment #0)
> the movl might not be atomic, even on x86, and then a different core may
> observe the writes out of order, which probably shouldn't happen.

Nothing in your code imposes an ordering on those stores. Another thread cannot
observe it, because you are using non-atomic operations.

If you want a specific order, you need to use atomics (or volatile, but that
would be the wrong solution).

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug tree-optimization/104480] [12 Regression] Combining stores across memory locations might violate [intro.memory]/3
  2022-02-10  8:13 [Bug c++/104480] New: [trunk] Combining stores across memory locations might violate [intro.memory]/3 marc.mutz at hotmail dot com
                   ` (3 preceding siblings ...)
  2022-02-10 10:48 ` redi at gcc dot gnu.org
@ 2022-02-10 10:51 ` redi at gcc dot gnu.org
  2022-02-10 11:54 ` marc.mutz at hotmail dot com
  5 siblings, 0 replies; 7+ messages in thread
From: redi at gcc dot gnu.org @ 2022-02-10 10:51 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104480

--- Comment #5 from Jonathan Wakely <redi at gcc dot gnu.org> ---
(In reply to Jonathan Wakely from comment #4)
> Nothing in your code imposes an ordering on those stores. Another thread
> cannot observe it, because you are using non-atomic operations.

To be more precise, another thread cannot even observe *one* of those stores
without a data race. It certainly can't observe both and conclude they happened
out of order.

To avoid a data race the other thread must ensure that the entire call to f
happens-before any read of either memory location. And if the entire call
happens-before the read, you can't see which order the stores happened in.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug tree-optimization/104480] [12 Regression] Combining stores across memory locations might violate [intro.memory]/3
  2022-02-10  8:13 [Bug c++/104480] New: [trunk] Combining stores across memory locations might violate [intro.memory]/3 marc.mutz at hotmail dot com
                   ` (4 preceding siblings ...)
  2022-02-10 10:51 ` redi at gcc dot gnu.org
@ 2022-02-10 11:54 ` marc.mutz at hotmail dot com
  5 siblings, 0 replies; 7+ messages in thread
From: marc.mutz at hotmail dot com @ 2022-02-10 11:54 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104480

Marc Mutz <marc.mutz at hotmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |INVALID

--- Comment #6 from Marc Mutz <marc.mutz at hotmail dot com> ---
Thanks, that makes sense. It's a data race if it's not synchronized.

And if I write to only one of the members, then GCC uses movw like everyone
else: https://gcc.godbolt.org/z/9rsK5osaM

So, GCC _is_ the clever kid!

Sorry for the noise, then.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2022-02-10 11:54 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-10  8:13 [Bug c++/104480] New: [trunk] Combining stores across memory locations might violate [intro.memory]/3 marc.mutz at hotmail dot com
2022-02-10  8:25 ` [Bug tree-optimization/104480] [12 Regression] " pinskia at gcc dot gnu.org
2022-02-10  8:40 ` rguenth at gcc dot gnu.org
2022-02-10  8:49 ` pinskia at gcc dot gnu.org
2022-02-10 10:48 ` redi at gcc dot gnu.org
2022-02-10 10:51 ` redi at gcc dot gnu.org
2022-02-10 11:54 ` marc.mutz at hotmail dot com

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).