public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
* TBAA
@ 2021-05-16  6:56 Uecker, Martin
  2021-05-17  7:08 ` TBAA Richard Biener
  0 siblings, 1 reply; 6+ messages in thread
From: Uecker, Martin @ 2021-05-16  6:56 UTC (permalink / raw)
  To: gcc



Hi Richard,

I noticed that GCC 11 has different behavior in the following
example relative to 10.2 with -O2. I wonder whether this
is an intentional change and if yes, what are the rules?

Thanks!
Martin

https://godbolt.org/z/57res7ax1

#include <stdio.h>
#include <stdlib.h>


__attribute__((__noinline__, __weak__))
void f(long unk, void *pa, void *pa2, void *pb, long *x) {
  for (long i = 0; i < unk; ++i) {
    int oldy = *(int *)pa;
    *(float *)pb = 42;
    *(int *)pa2 = oldy ^ x[i];
  }
}

int main(void) {
  void *p = malloc(sizeof(int));
  *(int *)p = 13;
  f(1, p, p, p, (long []){ 0 });
  printf("*pa(%d)\n", *(int *)p);
}

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: TBAA
  2021-05-16  6:56 TBAA Uecker, Martin
@ 2021-05-17  7:08 ` Richard Biener
  2021-05-17  7:32   ` TBAA Uecker, Martin
  0 siblings, 1 reply; 6+ messages in thread
From: Richard Biener @ 2021-05-17  7:08 UTC (permalink / raw)
  To: Uecker, Martin; +Cc: gcc

On Sun, May 16, 2021 at 8:57 AM Uecker, Martin
<Martin.Uecker@med.uni-goettingen.de> wrote:
>
>
>
> Hi Richard,
>
> I noticed that GCC 11 has different behavior in the following
> example relative to 10.2 with -O2. I wonder whether this
> is an intentional change and if yes, what are the rules?

Yes, this is a fix for the long-standing PR57359 where we
failed to preserve the order of stores when applying store-motion.

The basic rule is that stores may not be re-ordered based on
TBAA since a store changes the dynamic type of a memory
location.  That notably happens for C/C++ programs re-using
allocated storage and for C++ programs using placement
new.

It looked like a not so important bug since all other compilers
fall into the same trap, nevertheless they are all wrong in
doing so (and so was GCC).  All work fine at -O0 of course.

Richard.

> Thanks!
> Martin
>
> https://godbolt.org/z/57res7ax1
>
> #include <stdio.h>
> #include <stdlib.h>
>
>
> __attribute__((__noinline__, __weak__))
> void f(long unk, void *pa, void *pa2, void *pb, long *x) {
>   for (long i = 0; i < unk; ++i) {
>     int oldy = *(int *)pa;
>     *(float *)pb = 42;
>     *(int *)pa2 = oldy ^ x[i];
>   }
> }
>
> int main(void) {
>   void *p = malloc(sizeof(int));
>   *(int *)p = 13;
>   f(1, p, p, p, (long []){ 0 });
>   printf("*pa(%d)\n", *(int *)p);
> }

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: TBAA
  2021-05-17  7:08 ` TBAA Richard Biener
@ 2021-05-17  7:32   ` Uecker, Martin
  2021-05-17  7:44     ` TBAA Richard Biener
  0 siblings, 1 reply; 6+ messages in thread
From: Uecker, Martin @ 2021-05-17  7:32 UTC (permalink / raw)
  To: richard.guenther; +Cc: gcc



Am Montag, den 17.05.2021, 09:08 +0200 schrieb Richard Biener:
> On Sun, May 16, 2021 at 8:57 AM Uecker, Martin
> <Martin.Uecker@med.uni-goettingen.de> wrote:
> > 
> > 
> > Hi Richard,
> > 
> > I noticed that GCC 11 has different behavior in the following
> > example relative to 10.2 with -O2. I wonder whether this
> > is an intentional change and if yes, what are the rules?
> 
> Yes, this is a fix for the long-standing PR57359 where we
> failed to preserve the order of stores when applying store-motion.
> 
> The basic rule is that stores may not be re-ordered based on
> TBAA since a store changes the dynamic type of a memory
> location.  That notably happens for C/C++ programs re-using
> allocated storage and for C++ programs using placement
> new.
> 
> It looked like a not so important bug since all other compilers
> fall into the same trap, nevertheless they are all wrong in
> doing so (and so was GCC).  All work fine at -O0 of course.
> 
> Richard.

This is excellent news! We were already thinking about
how one could change the effective type rules in C, but 
it is not clear how this could have been done.

Best,
Martin



> > Thanks!
> > Martin
> > 
> > https://godbolt.org/z/57res7ax1
> > 
> > #include <stdio.h>
> > #include <stdlib.h>
> > 
> > 
> > __attribute__((__noinline__, __weak__))
> > void f(long unk, void *pa, void *pa2, void *pb, long *x) {
> >   for (long i = 0; i < unk; ++i) {
> >     int oldy = *(int *)pa;
> >     *(float *)pb = 42;
> >     *(int *)pa2 = oldy ^ x[i];
> >   }
> > }
> > 
> > int main(void) {
> >   void *p = malloc(sizeof(int));
> >   *(int *)p = 13;
> >   f(1, p, p, p, (long []){ 0 });
> >   printf("*pa(%d)\n", *(int *)p);
> > }

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: TBAA
  2021-05-17  7:32   ` TBAA Uecker, Martin
@ 2021-05-17  7:44     ` Richard Biener
  2021-05-17 11:46       ` TBAA Uecker, Martin
  0 siblings, 1 reply; 6+ messages in thread
From: Richard Biener @ 2021-05-17  7:44 UTC (permalink / raw)
  To: Uecker, Martin; +Cc: gcc

On Mon, May 17, 2021 at 9:32 AM Uecker, Martin
<Martin.Uecker@med.uni-goettingen.de> wrote:
>
>
>
> Am Montag, den 17.05.2021, 09:08 +0200 schrieb Richard Biener:
> > On Sun, May 16, 2021 at 8:57 AM Uecker, Martin
> > <Martin.Uecker@med.uni-goettingen.de> wrote:
> > >
> > >
> > > Hi Richard,
> > >
> > > I noticed that GCC 11 has different behavior in the following
> > > example relative to 10.2 with -O2. I wonder whether this
> > > is an intentional change and if yes, what are the rules?
> >
> > Yes, this is a fix for the long-standing PR57359 where we
> > failed to preserve the order of stores when applying store-motion.
> >
> > The basic rule is that stores may not be re-ordered based on
> > TBAA since a store changes the dynamic type of a memory
> > location.  That notably happens for C/C++ programs re-using
> > allocated storage and for C++ programs using placement
> > new.
> >
> > It looked like a not so important bug since all other compilers
> > fall into the same trap, nevertheless they are all wrong in
> > doing so (and so was GCC).  All work fine at -O0 of course.
> >
> > Richard.
>
> This is excellent news! We were already thinking about
> how one could change the effective type rules in C, but
> it is not clear how this could have been done.

Yes, indeed.  It also turned out the solution found for GCC
has zero performance impact in practice.  The idea is simply
to re-do dependent stores in program order on each loop exit.
Without this "trick" fixing the bug has severe impact on SPEC
(fixing as in disabling the invalid store-motion optimizations).

Richard.

> Best,
> Martin
>
>
>
> > > Thanks!
> > > Martin
> > >
> > > https://godbolt.org/z/57res7ax1
> > >
> > > #include <stdio.h>
> > > #include <stdlib.h>
> > >
> > >
> > > __attribute__((__noinline__, __weak__))
> > > void f(long unk, void *pa, void *pa2, void *pb, long *x) {
> > >   for (long i = 0; i < unk; ++i) {
> > >     int oldy = *(int *)pa;
> > >     *(float *)pb = 42;
> > >     *(int *)pa2 = oldy ^ x[i];
> > >   }
> > > }
> > >
> > > int main(void) {
> > >   void *p = malloc(sizeof(int));
> > >   *(int *)p = 13;
> > >   f(1, p, p, p, (long []){ 0 });
> > >   printf("*pa(%d)\n", *(int *)p);
> > > }

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: TBAA
  2021-05-17  7:44     ` TBAA Richard Biener
@ 2021-05-17 11:46       ` Uecker, Martin
  2021-05-17 14:10         ` TBAA Richard Biener
  0 siblings, 1 reply; 6+ messages in thread
From: Uecker, Martin @ 2021-05-17 11:46 UTC (permalink / raw)
  To: richard.guenther; +Cc: gcc



Am Montag, den 17.05.2021, 09:44 +0200 schrieb Richard Biener:
> On Mon, May 17, 2021 at 9:32 AM Uecker, Martin
> <Martin.Uecker@med.uni-goettingen.de> wrote:
> > 
> > 
> > Am Montag, den 17.05.2021, 09:08 +0200 schrieb Richard Biener:
> > > On Sun, May 16, 2021 at 8:57 AM Uecker, Martin
> > > <Martin.Uecker@med.uni-goettingen.de> wrote:
> > > > 
> > > > Hi Richard,
> > > > 
> > > > I noticed that GCC 11 has different behavior in the following
> > > > example relative to 10.2 with -O2. I wonder whether this
> > > > is an intentional change and if yes, what are the rules?
> > > 
> > > Yes, this is a fix for the long-standing PR57359 where we
> > > failed to preserve the order of stores when applying store-
> > > motion.
> > > 
> > > The basic rule is that stores may not be re-ordered based on
> > > TBAA since a store changes the dynamic type of a memory
> > > location.  That notably happens for C/C++ programs re-using
> > > allocated storage and for C++ programs using placement
> > > new.
> > > 
> > > It looked like a not so important bug since all other compilers
> > > fall into the same trap, nevertheless they are all wrong in
> > > doing so (and so was GCC).  All work fine at -O0 of course.
> > > 
> > > Richard.
> > 
> > This is excellent news! We were already thinking about
> > how one could change the effective type rules in C, but
> > it is not clear how this could have been done.
> 
> Yes, indeed.  It also turned out the solution found for GCC
> has zero performance impact in practice.  The idea is simply
> to re-do dependent stores in program order on each loop exit.
> Without this "trick" fixing the bug has severe impact on SPEC
> (fixing as in disabling the invalid store-motion optimizations).

Nice!

Was moving these stores out of a loop the only
case where GCC did this incorrect store-motions? 


The other areas where the effective type
rules appear unclear and/or not matching reality are:

- effective type of aggregates when they are incrementally
constructed or changed  (not sure whether this is relevant
for optimization in GCC)

- access paths, i.e. is 'i.p' an access to 'i' or is
'a[3]' an access to 'a' with array type (assuming 'a'
is an lvalue of array type before it decays and not
a pointer). And if yes, when does this access happen? 
I think GCC relies on this.

- initial sequence rule


Anything else?

Best,
Martin



> Richard.
> 
> > Best,
> > Martin
> > 
> > 
> > 
> > > > Thanks!
> > > > Martin
> > > > 
> > > > https://godbolt.org/z/57res7ax1
> > > > 
> > > > #include <stdio.h>
> > > > #include <stdlib.h>
> > > > 
> > > > 
> > > > __attribute__((__noinline__, __weak__))
> > > > void f(long unk, void *pa, void *pa2, void *pb, long *x) {
> > > >   for (long i = 0; i < unk; ++i) {
> > > >     int oldy = *(int *)pa;
> > > >     *(float *)pb = 42;
> > > >     *(int *)pa2 = oldy ^ x[i];
> > > >   }
> > > > }
> > > > 
> > > > int main(void) {
> > > >   void *p = malloc(sizeof(int));
> > > >   *(int *)p = 13;
> > > >   f(1, p, p, p, (long []){ 0 });
> > > >   printf("*pa(%d)\n", *(int *)p);
> > > > }

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: TBAA
  2021-05-17 11:46       ` TBAA Uecker, Martin
@ 2021-05-17 14:10         ` Richard Biener
  0 siblings, 0 replies; 6+ messages in thread
From: Richard Biener @ 2021-05-17 14:10 UTC (permalink / raw)
  To: Uecker, Martin; +Cc: gcc

On Mon, May 17, 2021 at 1:46 PM Uecker, Martin
<Martin.Uecker@med.uni-goettingen.de> wrote:
>
>
>
> Am Montag, den 17.05.2021, 09:44 +0200 schrieb Richard Biener:
> > On Mon, May 17, 2021 at 9:32 AM Uecker, Martin
> > <Martin.Uecker@med.uni-goettingen.de> wrote:
> > >
> > >
> > > Am Montag, den 17.05.2021, 09:08 +0200 schrieb Richard Biener:
> > > > On Sun, May 16, 2021 at 8:57 AM Uecker, Martin
> > > > <Martin.Uecker@med.uni-goettingen.de> wrote:
> > > > >
> > > > > Hi Richard,
> > > > >
> > > > > I noticed that GCC 11 has different behavior in the following
> > > > > example relative to 10.2 with -O2. I wonder whether this
> > > > > is an intentional change and if yes, what are the rules?
> > > >
> > > > Yes, this is a fix for the long-standing PR57359 where we
> > > > failed to preserve the order of stores when applying store-
> > > > motion.
> > > >
> > > > The basic rule is that stores may not be re-ordered based on
> > > > TBAA since a store changes the dynamic type of a memory
> > > > location.  That notably happens for C/C++ programs re-using
> > > > allocated storage and for C++ programs using placement
> > > > new.
> > > >
> > > > It looked like a not so important bug since all other compilers
> > > > fall into the same trap, nevertheless they are all wrong in
> > > > doing so (and so was GCC).  All work fine at -O0 of course.
> > > >
> > > > Richard.
> > >
> > > This is excellent news! We were already thinking about
> > > how one could change the effective type rules in C, but
> > > it is not clear how this could have been done.
> >
> > Yes, indeed.  It also turned out the solution found for GCC
> > has zero performance impact in practice.  The idea is simply
> > to re-do dependent stores in program order on each loop exit.
> > Without this "trick" fixing the bug has severe impact on SPEC
> > (fixing as in disabling the invalid store-motion optimizations).
>
> Nice!
>
> Was moving these stores out of a loop the only
> case where GCC did this incorrect store-motions?

Those are the ones I'm aware of, yes.

> The other areas where the effective type
> rules appear unclear and/or not matching reality are:
>
> - effective type of aggregates when they are incrementally
> constructed or changed  (not sure whether this is relevant
> for optimization in GCC)

So like

struct S { int i; };
(int *)p = 1;
 ... = *(struct S *)p;
or
 ... = ((struct S*)p)->i;

this and cases with multiple members should work fine with GCC.

> - access paths, i.e. is 'i.p' an access to 'i' or is
> 'a[3]' an access to 'a' with array type (assuming 'a'
> is an lvalue of array type before it decays and not
> a pointer). And if yes, when does this access happen?
> I think GCC relies on this.

Yes, GCC assumes that i.p is an access to 'i' (but not
in address-taking context like &i.p).  Likewise for
((struct S *)p)->i

> - initial sequence rule

We do not honor this, but in practice if you have

struct S1 { int i; float f; };
struct S2 { int i; double g; };

then accessing the initial sequence as *(int *) works,
but

struct S1 s1 = {};
int i = ((struct S2 *)&s1)->i;

does not because we consider it an access as 'S2' due
to the previous point.  If it were just an access to 'i' it
would work again.

> Anything else?

The most recent interesting case from the C++ world is
when they allowed

struct S { int n; std::byte a[16]; }

S s;
new (s.a) T();
S s2;
s2 = s;

thus construct an arbitrary type into an aggregate member
of std::byte (or char type I believe).  With GCCs memory model
copying this like s2.a = s.a would have worked (because std::byte
and char alias everything).  But when you do s2 = s then the
accesses as 'struct S' do not TBAA conflict with say 'double'
(if you make T == double).  For this reason the C++ frontend
now drops TBAA for all std::byte/char array containing aggregeates ...

I do know this kind of stuff is used in practice in C code
as well but usually C people do not use aggregate copying
but instead use memcpy (which would be fine).

So maybe just an anecdote but eventually such situation occurs
in (future) C as well.

Richard.


> Best,
> Martin
>
>
>
> > Richard.
> >
> > > Best,
> > > Martin
> > >
> > >
> > >
> > > > > Thanks!
> > > > > Martin
> > > > >
> > > > > https://godbolt.org/z/57res7ax1
> > > > >
> > > > > #include <stdio.h>
> > > > > #include <stdlib.h>
> > > > >
> > > > >
> > > > > __attribute__((__noinline__, __weak__))
> > > > > void f(long unk, void *pa, void *pa2, void *pb, long *x) {
> > > > >   for (long i = 0; i < unk; ++i) {
> > > > >     int oldy = *(int *)pa;
> > > > >     *(float *)pb = 42;
> > > > >     *(int *)pa2 = oldy ^ x[i];
> > > > >   }
> > > > > }
> > > > >
> > > > > int main(void) {
> > > > >   void *p = malloc(sizeof(int));
> > > > >   *(int *)p = 13;
> > > > >   f(1, p, p, p, (long []){ 0 });
> > > > >   printf("*pa(%d)\n", *(int *)p);
> > > > > }

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2021-05-17 14:10 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-16  6:56 TBAA Uecker, Martin
2021-05-17  7:08 ` TBAA Richard Biener
2021-05-17  7:32   ` TBAA Uecker, Martin
2021-05-17  7:44     ` TBAA Richard Biener
2021-05-17 11:46       ` TBAA Uecker, Martin
2021-05-17 14:10         ` TBAA Richard Biener

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).