From mboxrd@z Thu Jan 1 00:00:00 1970
From: Joe Buck
To: gcc@gcc.gnu.org
Cc: rms@gnu.org
Subject: type based aliasing again
Date: Wed, 08 Sep 1999 18:11:00 -0000
Message-id: <199909090109.SAA28465@atrus.synopsys.com>
X-SW-Source: 1999-09/msg00316.html

It seems that RMS and Linus have something in common, in that neither is happy with the way type-based aliasing is used in gcc 2.95. RMS has a proposal to do something about it that I'd like to discuss with you all.

It would be nice if RMS had the bandwidth to take this up himself, but there seem to be some sensitivities among some developers over RMS telling them what to do, so it'll probably just go smoother if I pass the thing on. Besides, I like his idea. (Please keep the cc to RMS on replies).

Some background, for those unfamiliar with the issue: in the past, Fortran compilers typically generated much faster code than C compilers for essentially the same code in part because of Fortran's aliasing rules: the arguments to a Fortran subroutine can never overlap, so the compiler doesn't have to worry that writing to one array argument will change another, so loops can keep more stuff in registers. In C, on the other hand, it's common to pass two pointers to the same array to a function.

So the ANSI C folks came up with the type-aliasing rule: the compiler is entitled to assume that accesses (possibly via a pointer) to memory of one type can't alter memory of another type (they made an exception for char* pointers and void* pointers, also const char, unsigned char, etc). The rule basically makes it illegal to access a value of one type through a pointer of another type (except for a list of exceptions, e.g. signed and unsigned flavors of the same integral type). (For a more precise statement of the rule, see the mail archive). As of gcc 2.95, optimizations that take advantage of this rule are turned on by default.
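
To make the rule concrete, here is a minimal sketch in C. The function names (fill, first_byte) and their parameters are hypothetical, made up for illustration; nothing here comes from the gcc sources.

```c
/* Hypothetical illustration of the type-aliasing rule; the names fill
   and first_byte are assumptions made for this sketch. */

/* out points to float, count points to int.  Under the rule, the
   compiler may assume the stores to out[i] never change *count, so it
   is free to load *count once and keep it in a register for the loop. */
void fill(float *out, const int *count)
{
    int i;

    for (i = 0; i < *count; i++)
        out[i] = 0.0f;
}

/* Character types are among the exceptions: the bytes of any object
   may be examined through an unsigned char pointer. */
unsigned char first_byte(const void *object)
{
    const unsigned char *bytes = (const unsigned char *) object;

    return bytes[0];
}
```

If count and out had the same pointed-to type, the compiler would have to allow for the possibility that a store through out changes *count, and reload it on every iteration.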
The problem is that there is a good deal of technically-invalid code out there (that, for example, reads and writes long data as pairs of shorts, without using unions). gcc 2.95 may malfunction on such code unless the flag -fno-strict-aliasing is provided.

The question that has been asked by a number of people is whether we can make some of the more "obvious" cases work correctly, or somehow help users find and clean up such cases, and there were a number of technical proposals, some of which were infeasible, some of which just weren't satisfactory.

RMS came up with a proposal that I think is reasonable, though we should have expert input on it. The idea goes like this: currently, we do the type-based check first: if two references are of different types, they do not alias. If they are of the same or compatible types, we then proceed to ask (to oversimplify things) whether the references can collide. We obtain a three-way answer: yes, we know they collide (aliasing); no, they never collide (no aliasing), or maybe, we can't tell (assume aliasing to be safe). Generally speaking, we do this analysis by seeing if two references are both offsets from the same base address and then look at the offsets.

The change is simply to postpone the type-based check until after the other analysis is done. If we detect that the references collide, but the types say that we can assume they don't, we issue a warning and then tell the compiler that there is aliasing. (A variant is to silently accept the code, but I would prefer issuing a warning). If we fall into the "maybe" case, we assume no aliasing if the types don't match.

Some questions:

- Can anyone tell me why this is not feasible?

- For those of you with code that is impacted by -fstrict-aliasing, can you look and see whether this modification would give you a warning? (That is, can the compiler see based on purely local information that the true address and the pointed-to object overlap)?

- Should we do this?
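
The difference between the current and the proposed ordering can be sketched as a small model in C. This is only an illustration under simplifying assumptions: the struct, the enum, and the function names below are hypothetical and are not gcc's internal data structures or interfaces, and type compatibility is reduced to comparing integer ids.

```c
/* Hypothetical model of a memory reference; NOT gcc's representation. */
struct mem_ref {
    int  base;    /* id of the base address, or -1 if unknown */
    long offset;  /* byte offset from that base */
    long size;    /* number of bytes accessed */
    int  type;    /* type id; equal ids stand for compatible types */
};

enum overlap { OVERLAP_NO, OVERLAP_MAYBE, OVERLAP_YES };

/* Three-way answer from base/offset analysis alone. */
enum overlap address_overlap(const struct mem_ref *a, const struct mem_ref *b)
{
    if (a->base < 0 || b->base < 0 || a->base != b->base)
        return OVERLAP_MAYBE;   /* not offsets from one base: can't tell */
    if (a->offset + a->size <= b->offset || b->offset + b->size <= a->offset)
        return OVERLAP_NO;      /* disjoint byte ranges */
    return OVERLAP_YES;         /* byte ranges intersect */
}

/* Current order: type check first.  Returns 1 if the two references
   must be treated as aliasing, 0 otherwise. */
int alias_current(const struct mem_ref *a, const struct mem_ref *b)
{
    if (a->type != b->type)
        return 0;               /* different types: assume no aliasing */
    return address_overlap(a, b) != OVERLAP_NO;
}

/* Proposed order: address analysis first, types consulted afterwards.
   *warn is set to 1 when a known collision contradicts the type rule. */
int alias_proposed(const struct mem_ref *a, const struct mem_ref *b, int *warn)
{
    enum overlap result = address_overlap(a, b);

    *warn = 0;
    if (result == OVERLAP_NO)
        return 0;
    if (result == OVERLAP_YES) {
        if (a->type != b->type)
            *warn = 1;          /* issue a warning, but honor the aliasing */
        return 1;
    }
    return a->type == b->type;  /* "maybe": fall back to the type rule */
}
```

In this model, a 4-byte access at offset 0 and a 2-byte access of a different type at offset 2 from the same base are reported as non-aliasing by alias_current, while alias_proposed reports aliasing and sets the warning flag. When either base is unknown and the types differ, both functions report no aliasing.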
Joe