From mboxrd@z Thu Jan  1 00:00:00 1970
From: Joe Buck <jbuck@synopsys.COM>
To: gcc@gcc.gnu.org
Cc: rms@gnu.org
Subject: type based aliasing again
Date: Wed, 08 Sep 1999 18:11:00 -0000
Message-id: <199909090109.SAA28465@atrus.synopsys.com>
X-SW-Source: 1999-09/msg00316.html

It seems that RMS and Linus have something in common, in that neither
is happy with the way type-based aliasing is used in gcc 2.95.

RMS has a proposal to do something about it that I'd like to discuss
with you all.  It would be nice if RMS had the bandwidth to take this
up himself, but there seem to be some sensitivities among some developers
over RMS telling them what to do, so it'll probably just go smoother
if I pass the thing on.  Besides, I like his idea.  (Please keep the
cc to RMS on replies).


Some background, for those unfamiliar with the issue: in the past, Fortran
compilers typically generated much faster code than C compilers for
essentially the same code, in part because of Fortran's aliasing rules.
The arguments to a Fortran subroutine can never overlap, so the compiler
doesn't have to worry that writing to one array argument will change
another, and loops can keep more values in registers.  In C, on the other
hand, it's common to pass two pointers to the same array to a function.
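
To make the C side of that concrete, here is a minimal sketch (the
function name add_into is made up for illustration, it is not from any
real code base).  Because dst and src are allowed to point into the same
array, the compiler has to assume that a store through dst can change a
value it will later read through src:

```c
#include <stddef.h>

/* Adds src[i] into dst[i] for n elements.  dst and src may point into
   the same array, so each store to dst[i] might change a later src[j];
   the compiler cannot simply keep src values cached in registers. */
void add_into(int *dst, const int *src, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        dst[i] += src[i];
}
```

Calling add_into(a + 1, a, 3) on a four-element array is legal C, and
each iteration then reads the value the previous iteration just wrote.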

So the ANSI C folks came up with the type-aliasing rule: the compiler
is entitled to assume that accesses (possibly via a pointer) to memory
of one type can't alter memory of another type (they made an exception
for accesses through the character types: char, signed char, unsigned
char, and their qualified versions).  The rule basically makes it
illegal to access a value of one type through a pointer of another type
(except for a list of exceptions, e.g. signed and unsigned flavors of
the same integral type).

(For a more precise statement of the rule, see the mail archive).
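
As a rough illustration of what the rule buys the compiler (store_both
is a hypothetical name, and this is only a sketch): since an int and a
float are different types, the compiler may assume the store through fp
does not change *ip, and so it is allowed to return 1 without reloading
*ip from memory.

```c
/* Under the type-based rule, *ip and *fp are assumed to be distinct
   objects, so the final read of *ip may be replaced by the constant 1. */
int store_both(int *ip, float *fp)
{
    *ip = 1;
    *fp = 2.0f;
    return *ip;
}
```

When the two pointers really do refer to separate objects, as the rule
requires, the result is the same with or without the optimization.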

As of gcc 2.95, optimizations that take advantage of this rule are turned
on by default.  The problem is that there is a good deal of
technically-invalid code out there (that, for example, reads and writes
long data as pairs of shorts, without using unions).  gcc 2.95 may
malfunction on such code unless the flag -fno-strict-aliasing is provided.
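
Here is a sketch of the kind of code in question, together with the
union form.  I am assuming fixed-width types (uint32_t, uint16_t) in
place of long and short so that the sizes are known; the names
word_halves and split_word are made up for the example.

```c
#include <stdint.h>

/* The technically invalid pattern looks like this:
       uint32_t v = 0x12345678;
       uint16_t first = ((uint16_t *)&v)[0];
   It reads a uint32_t object through a uint16_t pointer.
   The union below expresses the same access without the pointer cast. */
union word_halves {
    uint32_t whole;
    uint16_t half[2];
};

/* Copies the two 16-bit halves of v into out[0] and out[1].
   Which half holds the low-order bits depends on the byte order. */
void split_word(uint32_t v, uint16_t out[2])
{
    union word_halves u;

    u.whole = v;
    out[0] = u.half[0];
    out[1] = u.half[1];
}
```

The union version goes through a single object whose type names both
views, which is the form the gcc documentation describes as supported
even with -fstrict-aliasing.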

The question that has been asked by a number of people is whether we
can make some of the more "obvious" cases work correctly, or somehow
help users find and clean up such cases, and there were a number of
technical proposals, some of which were infeasible, some of which just
weren't satisfactory.

RMS came up with a proposal that I think is reasonable, though we should
have expert input on it.  The idea goes like this: currently, we do
the type-based check first: if two references are of different types,
they do not alias.  If they are of the same or compatible types, we
then proceed to ask (to oversimplify things) whether the references
can collide.  We obtain a three-way answer: yes, we know they collide
(aliasing); no, they never collide (no aliasing), or maybe, we can't tell
(assume aliasing to be safe).  Generally speaking, we do this analysis
by seeing if two references are both offsets from the same base address
and then looking at the offsets.

The change is simply to postpone the type-based check until after the
other analysis is done.  If we detect that the references collide, but
the types say that we can assume they don't, we issue a warning and
then tell the compiler that there is aliasing.  (A variant is to silently
accept the code, but I would prefer issuing a warning).  If we fall into
the "maybe" case, we assume no aliasing if the types don't match.
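
The two orderings can be sketched as follows.  This is not gcc's actual
code; the enum, the function names and the warn parameter are all
hypothetical, and the address analysis is reduced to its three-way
result.

```c
/* Result of the base-plus-offset analysis described above. */
enum overlap { OVERLAP_NO, OVERLAP_YES, OVERLAP_MAYBE };

/* Current order: the type check comes first, so references of
   different types are never treated as aliasing. */
int may_alias_current(int types_compatible, enum overlap addr)
{
    if (!types_compatible)
        return 0;
    return addr != OVERLAP_NO;      /* "maybe" is treated as aliasing */
}

/* Proposed order: the address analysis comes first.  A known collision
   between incompatible types sets *warn and is treated as aliasing. */
int may_alias_proposed(int types_compatible, enum overlap addr, int *warn)
{
    *warn = 0;
    if (addr == OVERLAP_YES) {
        if (!types_compatible)
            *warn = 1;              /* the types said "no alias" */
        return 1;
    }
    if (addr == OVERLAP_NO)
        return 0;
    /* "maybe": fall back on the types, as is done today */
    return types_compatible;
}
```

The only inputs on which the two functions disagree are incompatible
types with a known collision, which is exactly the case where the
warning would be issued.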

Some questions:

- Can anyone tell me why this is not feasible?

- For those of you with code that is impacted by -fstrict-aliasing, can
  you look and see whether this modification would give you a warning?
  (That is, can the compiler see based on purely local information that
  the true address and the pointed-to object overlap)?

- Should we do this?

Joe

