public inbox for gcc@gcc.gnu.org
* Idea: Eliminate libf2c/f2c.h installation from g77 entirely?
@ 1998-04-18 16:05 Craig Burley
  1998-04-18 20:56 ` Mumit Khan
  1998-04-19  0:11 ` Jeffrey A Law
  0 siblings, 2 replies; 24+ messages in thread
From: Craig Burley @ 1998-04-18 16:05 UTC (permalink / raw)
  To: egcs, g77-alpha

I'm thinking it might be time to eliminate from g77 any and all
attempts to "help" the sysadmin by installing the libf2c.a and
f2c.h files somewhere, which is currently done so that f2c users
"automatically" use the same files as used by g77, making successful
linking of executables combining objects generated by both g77 and
f2c more likely.

(A short summary of the overall issues: g77 currently uses libf2c,
f2c's run-time library, for its own run-time needs.  It ships with
a patched version of libf2c, but we've avoided creating gratuitous
incompatibilities in the interface.  That, combined with appropriate
effort in g77, allows users to pretty much pick and choose which
Fortran modules they compile with g77 and which with f2c, link
the objects together, and have things actually work, even including
starting I/O in a g77-compiled module and continuing it in an
f2c-compiled one.)

In the past it seemed convenient to try and offer this sort of
"one-stop shopping" from g77, but now I wonder whether this produces
more problems than it's worth.  A quick list of my thoughts on this:

  -  If f2c isn't being used, the only need for f2c.h (which g77
     generates its copy of via its configuration process from f2c.h.in,
     which in turn is just a modified copy of netlib's f2c's f2c.h)
     is when building g77's copy of libf2c.  There is no need to install
     this f2c.h anywhere; it's almost like an object file in this
     respect, in that, once the build is completed, it isn't needed.

  -  If f2c isn't being used, as long as g77 prefers "its" copy of libf2c.a
     (e.g. the one in $prefix/lib/gcc-lib/$machine/$version) over the one
     in the system's (/usr/lib) library, libf2c.a need not be installed
     anywhere else.  That is, it's really no different than the cc1,
     cc1plus, f771, and specs files, AFAIK.

  -  We've gotten bug report(s) that official-g77's method to determine
     whether the sysadmin wants libf2c.a and f2c.h installed is somewhat
     buggy.  (This method involves creating a file named `f2c-install-ok'
     in the source or build directory, or overriding a `make' macro
     when building.)  In fact, I just noticed that the f77.uninstall target
     in g77 0.5.22 uses the wrong pathname to delete f2c.h, a different
     bug than the one(s) reported.

  -  egcs apparently forcibly installs f2c.h in the system's include
     directory upon "make install" instead of "consulting" the
     `f2c-install-ok' mechanism described above.  At least one bug
     report has been filed about, IIRC, this problem, which is easily
     fixed regardless.

  -  The various configuration/build/install mechanisms employed by
     g77 -- meaning the ones in gcc 2.7, gcc 2.8, egcs 1.0, and, soon,
     egcs 1.1 -- all seem to agree, at least in principle, on the idea
     of explicit configuration of locations for where to install things.

     But this approach doesn't offer an elegant way to cope with what
     g77 is trying to do -- namely, install what amounts to part of
     some *other* product in a location the installer cannot control
     (except, modulo the egcs install bug, by enabling or disabling it).
     Especially since that other product is not (currently, I believe)
     distributed in a GNUish form, i.e. no GNU configuration, etc.

  -  From the above item, it seems practical to expect the sysadmin
     to be able, if not actually prefer, to make the decision when
     and where to move g77's libf2c.a and f2c.h to provide inter-
     operability with f2c.

A sysadmin who doesn't have f2c users won't care about what we decide
to do here.  A sysadmin who has f2c users who don't care about g77
inter-operability usually won't either, although perhaps there's some
bonus to getting g77's versions of libf2c.a and f2c.h (perhaps the
automatic configuration is nice).

Only a sysadmin who cares about g77 and f2c inter-operability will be
interested in what we do, and I believe that it'll be safer and more
robust, overall, to have g77 just "do its job" and not futz with bits
and pieces of f2c installation, leaving it to the sysadmin to do that,
as would be described in the g77 docs.

The only thing, other than a minor convenience, I can think of that
might be a problem is that sysadmins who don't notice that we've
removed this capability might end up thinking they've got a coherent
f2c+g77 installation when they haven't -- and that *might* result
in subtle problems down the road for users who combine f2c and
g77 objects in executables (e.g. if the API, or its implementation
in the egcs/gcc back end, changes subtly on the system).

I think that's not worth worrying about much, especially for egcs
at the moment.  For gcc, g77 could notice if the `f2c-install-ok'
mechanism is still being enabled and, if so, issue some kind of
warning, pointing to a relevant node in the g77 docs.  In any case,
it's not clear whether g77's even putting things in the right
place for the typical f2c/gcc interoperability-using installation.

So, I'd like to implement this removal, along with the relevant docs,
myself in the following release schedule:

  -  For g77, in version 0.5.23, the upcoming version that will also
     be the first to be based on gcc 2.8 (instead of gcc 2.7).

  -  For egcs, in version 1.1.

Please let me know if you see any problems with this idea, though
I'm also interested in "sounds like a good idea" messages as well.

In the long run, g77 should be improved to use a run-time library
better suited to its needs, which we've been calling `libg77',
though no real work has started on this.  One of the advantages to
a new library is that semi-mangled names for external (visible)
procedures can be used to significantly reduce the possibility
of inadvertent clashes causing subtle numerical bugs (as happens
for, e.g., cabs() between libf2c.a and, IIRC, a SunOS library).

Note that <199801161552.KAA18293@melange.gnu.org>, which should
be in the egcs mailing-list archives, is a message I sent that
attempted to describe the state of g77 vis-a-vis f2c.h, in
particular.  People wanting more info on the subject matter should
read that first.  I'm essentially proposing to undertake Choice 2
from a list of possible choices described in that email, now.

(I'm going to stuff this message in the g77.plan file too.)

        tq vm, (burley)

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Idea: Eliminate libf2c/f2c.h installation from g77 entirely?
  1998-04-18 16:05 Idea: Eliminate libf2c/f2c.h installation from g77 entirely? Craig Burley
@ 1998-04-18 20:56 ` Mumit Khan
  1998-04-19  0:11 ` Jeffrey A Law
  1 sibling, 0 replies; 24+ messages in thread
From: Mumit Khan @ 1998-04-18 20:56 UTC (permalink / raw)
  To: egcs

Craig Burley <burley@gnu.org> writes:
> I'm thinking it might be time to eliminate from g77 any and all
> attempts to "help" the sysadmin by installing the libf2c.a and
> f2c.h files somewhere, which is currently done so that f2c users
> "automatically" use the same files as used by g77, making successful
> linking of executables combining objects generated by both g77 and
> f2c more likely.

I would like to see this "help" feature eliminated as well. Then again,
I can't speak for f2c users.

This "help" has caused some headache here, and I've resorted to patching 
the Makefile to *not* install the extra copies of f2c.h and libf2c.a for
a few years now.

Regards,
Mumit


* Re: Idea: Eliminate libf2c/f2c.h installation from g77 entirely?
  1998-04-18 16:05 Idea: Eliminate libf2c/f2c.h installation from g77 entirely? Craig Burley
  1998-04-18 20:56 ` Mumit Khan
@ 1998-04-19  0:11 ` Jeffrey A Law
  1998-04-19  1:49   ` Craig Burley
  1 sibling, 1 reply; 24+ messages in thread
From: Jeffrey A Law @ 1998-04-19  0:11 UTC (permalink / raw)
  To: Craig Burley; +Cc: egcs, g77-alpha

  In message <199804182125.RAA23699@melange.gnu.org> you write:
  > I'm thinking it might be time to eliminate from g77 any and all
  > attempts to "help" the sysadmin by installing the libf2c.a and
  > f2c.h files somewhere, which is currently done so that f2c users
  > "automatically" use the same files as used by g77, making successful
  > linking of executables combining objects generated by both g77 and
  > f2c more likely.
I'm all for it :-)  We've had quite a few folks complaining about the
installation procedure writing outside of $prefix on the egcs
lists.

Of course, we might get complaints in the other direction after
we make the change.

However, I still feel removing the extra copies of f2c.h and
libf2c.a is the right thing to do.

  > (A short summary of the overall issues: g77 currently uses libf2c,
  > f2c's run-time library, for its own run-time needs.  It ships with
  > a patched version of libf2c, but we've avoided creating gratuitous
  > incompatibilities in the interface.  That, combined with appropriate
  > effort in g77, allows users to pretty much pick and choose which
  > Fortran modules they compile with g77 and which with f2c, link
  > the objects together, and have things actually work, even including
  > starting I/O in a g77-compiled module and continuing it in an
  > f2c-compiled one.)
Yup.  And I've even taken advantage of this at one point, though
I don't remember why :-)

  >   -  If f2c isn't being used, the only need for f2c.h (which g77
  >      generates its copy of via its configuration process from f2c.h.in,
  >      which in turn is just a modified copy of netlib's f2c's f2c.h)
  >      is when building g77's copy of libf2c.  There is no need to install
  >      this f2c.h anywhere; it's almost like an object file in this
  >      respect, in that, once the build is completed, it isn't needed.
I thought we needed to make sure that someone building a translated
file picked up the right f2c.h.

  >   -  If f2c isn't being used, as long as g77 prefers "its" copy of libf2c.a
  >      (e.g. the one in $prefix/lib/gcc-lib/$machine/$version) over the one
  >      in the system's (/usr/lib) library, libf2c.a need not be installed
  >      anywhere else.  That is, it's really no different than the cc1,
  >      cc1plus, f771, and specs files, AFAIK.
I'm pretty sure that the libf2c.a from libsubdir will be preferred,
but we should double check.



  > Only a sysadmin who cares about g77 and f2c inter-operability will be
  > interested in what we do, and I believe that it'll be safer and more
  > robust, overall, to have g77 just "do its job" and not futz with bits
  > and pieces of f2c installation, leaving it to the sysadmin to do that,
  > as would be described in the g77 docs.
I'd tend to agree.  If we want to make their life easier we might
consider a configure time option to install f2c.h & libf2c.a
in /usr/local/..., regardless of the $prefix option.

I also get the feeling that f2c is becoming less and less important
as g77 continues to move forward and gain wider acceptance.  I can't
think of a reason to use f2c over g77 on the two platforms where I
do most of my work (hppa & x86).


  > So, I'd like to implement this removal, along with the relevant docs,
  > myself in the following release schedule:
  > 
  >   -  For g77, in version 0.5.23, the upcoming version that will also
  >      be the first to be based on gcc 2.8 (instead of gcc 2.7).
  > 
  >   -  For egcs, in version 1.1.
  > 
  > Please let me know if you see any problems with this idea, though
  > I'm also interested in "sounds like a good idea" messages as well.
They both sound good to me.


jeff


* Re: Idea: Eliminate libf2c/f2c.h installation from g77 entirely?
  1998-04-19  0:11 ` Jeffrey A Law
@ 1998-04-19  1:49   ` Craig Burley
  1998-04-19  5:18     ` Dave Love
                       ` (2 more replies)
  0 siblings, 3 replies; 24+ messages in thread
From: Craig Burley @ 1998-04-19  1:49 UTC (permalink / raw)
  To: law; +Cc: egcs, g77-alpha

>  In message <199804182125.RAA23699@melange.gnu.org> you write:
>  > I'm thinking it might be time to eliminate from g77 any and all
>  > attempts to "help" the sysadmin by installing the libf2c.a and
>  > f2c.h files somewhere, which is currently done so that f2c users
>  > "automatically" use the same files as used by g77, making successful
>  > linking of executables combining objects generated by both g77 and
>  > f2c more likely.
>I'm all for it :-)  We've had quite a few folks complaining about the
>installation procedure writing outside of $prefix on the egcs
>lists.
>
>Of course, we might get complaints in the other direction after
>we make the change.

Yeah, but, in the end, it's easier to tell people to copy the
two files they want to wherever they want them than to tell them
to restore the two system files "make install" on egcs overwrote.  :)

>However, I still feel removing the extra copies of f2c.h and
>libf2c.a is the right thing to do.

I agree.  In looking at g77 0.5.22 even more closely, I discovered
the whole installation of these two files was sufficiently screwed
up that it probably never did install them into the system directories
anyway, despite all the attempts to check whether doing so was okay.

If I'm right about that, given that there have been basically no
complaints, I'd say not having g77 installation overwrite the system
copies of libf2c.a and f2c.h is something nobody is likely to object
to (at least for their own purposes).

>  >   -  If f2c isn't being used, the only need for f2c.h (which g77
>  >      generates its copy of via its configuration process from f2c.h.in,
>  >      which in turn is just a modified copy of netlib's f2c's f2c.h)
>  >      is when building g77's copy of libf2c.  There is no need to install
>  >      this f2c.h anywhere; it's almost like an object file in this
>  >      respect, in that, once the build is completed, it isn't needed.
>I thought we needed to make sure that someone building a translated
>file picked up the right f2c.h.

The first phrase on that item applies to the whole item, though that
wasn't clear -- *only* f2c users need f2c.h sitting around somewhere
(except that g77 needs it during its build of libf2c.a, as explained).

>  >   -  If f2c isn't being used, as long as g77 prefers "its" copy of libf2c.a
>  >      (e.g. the one in $prefix/lib/gcc-lib/$machine/$version) over the one
>  >      in the system's (/usr/lib) library, libf2c.a need not be installed
>  >      anywhere else.  That is, it's really no different than the cc1,
>  >      cc1plus, f771, and specs files, AFAIK.
>I'm pretty sure that the libf2c.a from libsubdir will be preferred,
>but we should double check.

This is the only mystery to me at this point.  I've not yet really
explored the vast underworld of UNIX (and other systems') linking,
search lists, and so on.  (Except PRIMOS, where I pretty much
wrote the book.  :)

>  > Only a sysadmin who cares about g77 and f2c inter-operability will be
>  > interested in what we do, and I believe that it'll be safer and more
>  > robust, overall, to have g77 just "do its job" and not futz with bits
>  > and pieces of f2c installation, leaving it to the sysadmin to do that,
>  > as would be described in the g77 docs.
>I'd tend to agree.  If we want to make their life easier we might
>consider a configure time option to install f2c.h & libf2c.a
>in /usr/local/..., regardless of the $prefix option.

That's sort of like what g77 pretended, but apparently failed, to
offer.  (Maybe it worked for some old releases, but I don't think
it does for those circa 0.5.22.)

I suggest we wait until people ask for it, then ask them why they
can't just write up and distribute a simple script that, given the
f2c and g77 configurations they point it to, updates the former
with the latter's libf2c.a and f2c.h, instead of re-infecting
g77 (via egcs or whatever) with non-GNU configuration info.

Pretty soon it'd also be nice to add the use of libtool or
what-not to g77 so libf2c.a can be built as a shared library.
Maybe that involves the whole multi-libbing thing, which I've
yet to explore, but maybe not.

In any case, the simpler the procedure, the easier it is to
introduce new stuff like making a shared library, multi-libbing,
and so on to libf2c.a (or maybe we'll rename it libg77.a soon).

>I also get the feeling that f2c is becoming less and less important
>as g77 continues to move forward and gain wider acceptance.  I can't
>think of a reason to use f2c over g77 on the two platforms where I
>do most of my work (hppa & x86).

Array bounds checking comes to mind.  Yes, I'd like to knock this
one off someday soon.  It'll probably take only a little research
and time to do an f2c-equivalent job of it; much more to do a
Digital-Fortran-style job.

        tq vm, (burley)


* Re: Idea: Eliminate libf2c/f2c.h installation from g77 entirely?
  1998-04-19  1:49   ` Craig Burley
@ 1998-04-19  5:18     ` Dave Love
  1998-04-19 11:19       ` Craig Burley
  1998-04-19  5:37     ` array bounds checking? Dave Love
  1998-04-19  8:15     ` using libtool Dave Love
  2 siblings, 1 reply; 24+ messages in thread
From: Dave Love @ 1998-04-19  5:18 UTC (permalink / raw)
  To: egcs, g77-alpha

I don't understand this discussion (again).  libf2c.a and f2c.h were
installed outside the gcc tree to avoid an f2c installation screwing
g77, not to help f2c.  What's changed to avoid this?  I don't think
anything has on my GNU box.  (gcc will prefer stuff in /usr/local.)

I think the way to avoid conflicts is renaming of the g77 version.
Linking f2c-generated C would still work as well by doing the job with
the `g77' program if that would still compile other languages than
Fortran.  (That's what I do with the non-egcs version when I test f2c
stuff.)

(The correct) f2c.h is necessary or, IMHO should be used, if you write
g77-callable C.
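
As a minimal sketch of what "g77-callable C" can look like under the f2c
calling convention (external name in lower case with a trailing
underscore, every argument passed by reference), consider the following.
The function name `ddot3_' is hypothetical, and the two typedefs are only
stand-ins for what the installed f2c.h would provide; the real header
chooses the underlying C types per configuration, which is why picking up
the matching f2c.h matters.

```c
/* Stand-ins for typedefs normally supplied by f2c.h.  The real header
   selects the underlying C types for the platform; these two lines are
   an assumption made only to keep the sketch self-contained.  */
typedef long int integer;
typedef double doublereal;

/* Hypothetical routine, intended to be referenced from Fortran as a
   DOUBLE PRECISION function DDOT3(X, Y).  Arrays arrive as a pointer
   to their first element; scalars would arrive as pointers too.  */
doublereal
ddot3_ (doublereal *x, doublereal *y)
{
  doublereal sum = 0.0;
  integer i;

  /* Fixed length of three elements, purely for illustration.  */
  for (i = 0; i < 3; ++i)
    sum += x[i] * y[i];
  return sum;
}
```

If the typedefs above disagree with the ones used to build the run-time
library, argument sizes can silently mismatch, so real code should
include the f2c.h that belongs to the compiler instead of redeclaring them.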

AFAIR at one time what happened about installing was controlled by a
configure option, but it was tricky to implement and fragile in the
gcc 2.7 framework.

Sorry I missed the lossage with the current f2c (non-)installation.  


* array bounds checking?
  1998-04-19  1:49   ` Craig Burley
  1998-04-19  5:18     ` Dave Love
@ 1998-04-19  5:37     ` Dave Love
  1998-04-21 19:10       ` Jim Wilson
  1998-04-19  8:15     ` using libtool Dave Love
  2 siblings, 1 reply; 24+ messages in thread
From: Dave Love @ 1998-04-19  5:37 UTC (permalink / raw)
  To: egcs

 Craig> Array bounds checking comes to mind.  Yes, I'd like to knock
 Craig> this one off someday soon.  It'll probably take only a little
 Craig> research and time to do an f2c-equivalent job of it; much more
 Craig> to do a Digital-Fortran-style job.

Is this no longer proposed for the backend?  If not, I can probably do
it for Fortran with some advice on writing the tree code, for which I
couldn't find a suitable example.


* using libtool
  1998-04-19  1:49   ` Craig Burley
  1998-04-19  5:18     ` Dave Love
  1998-04-19  5:37     ` array bounds checking? Dave Love
@ 1998-04-19  8:15     ` Dave Love
  2 siblings, 0 replies; 24+ messages in thread
From: Dave Love @ 1998-04-19  8:15 UTC (permalink / raw)
  To: egcs

>>>>> "Craig" == Craig Burley <burley@gnu.org> writes:

 Craig> Pretty soon it'd also be nice to add the use of libtool or
 Craig> what-not to g77 so libf2c.a can be built as a shared library.

Is there any reason we can't do that now?  I can't remember whether
I couldn't/didn't just because building the library was going to
change or whether it was ill-advised.

Similarly, is there any reason not to use automake to maintain libf2c?
It would have prevented a few build/configuration bugs in the past, I
think, and I already have a somewhat outdated f2c distribution set up
to use it.


* Re: Idea: Eliminate libf2c/f2c.h installation from g77 entirely?
  1998-04-19  5:18     ` Dave Love
@ 1998-04-19 11:19       ` Craig Burley
  0 siblings, 0 replies; 24+ messages in thread
From: Craig Burley @ 1998-04-19 11:19 UTC (permalink / raw)
  To: d.love; +Cc: egcs, g77-alpha

>I don't understand this discussion (again).  libf2c.a and f2c.h were
>installed outside the gcc tree to avoid an f2c installation screwing
>g77, not to help f2c.  What's changed to avoid this?  I don't think
>anything has on my GNU box.  (gcc will prefer stuff in /usr/local.)

Maybe you understand the situation better than I do, which could
easily explain your not understanding my email!

"gcc foo.o -lf2c -lm" (aka "g77 foo.o") needs to pull in libf2c.a.
It seems like you're saying it'll prefer, e.g., /usr/local/lib/libf2c.a
over /usr/local/lib/gcc-lib/$machine/$version/libf2c.a.  The latter
certainly should be "correct" in the near-vanilla cases; the former
might not be if it's not installed by g77 or if, after being
installed, it is overwritten by installing (or re-installing) f2c.

If that's the case, then my proposal is indeed wrong for libf2c.a.
(It still works for f2c.h, since g77 doesn't depend on that post-
installation, but I'd like to fix the whole problem in a
consistent way.)

>I think the way to avoid conflicts is renaming of the g77 version.
>Linking f2c-generated C would still work as well by doing the job with
>the `g77' program if that would still compile other languages than
>Fortran.  (That's what I do with the non-egcs version when I test f2c
>stuff.)

Okay, I'm all for this option, in fact I came close to proposing it
before deciding to offer what seemed like a less radical approach.

>(The correct) f2c.h is necessary or, IMHO should be used, if you write
>g77-callable C.

Right, I'm still in favor of "installing" it as
/usr/local/lib/gcc-lib/$machine/$version/f2c.h, so anyone
can copy it out or whatever afterwards.  Don't see any need
to stop that practice.

>AFAIR at one time what happened about installing was controlled by a
>configure option, but it was tricky to implement and fragile in the
>gcc 2.7 framework.

Apparently!  As an experiment I was trying yesterday to fix this
in a mythical 0.5.22.1.  After getting the basics, I abandoned it,
because I felt I'd learned enough by poring over the relevant stuff
without trying to get it correct (given that I'm not planning to
release a 0.5.22.1).

>Sorry I missed the lossage with the current f2c (non-)installation.  

That's okay, if people don't complain something's broken, it's
pretty amazing if we ever notice it at all.  In this case, two
different bug reports came in within a few weeks of each other,
one against g77 0.5.22's confusions, the other against egcs
overwriting libf2c.a even when installing a cross-compiler.

I'd like to amend my proposal now to include renaming `libf2c.a' to
`libg2c.a', to further reduce the chances of conflict (e.g. the
situation where installing f2c results in g77 linking to the "new",
but possibly older and not properly configured, libf2c.a, instead
of the one built specifically for that version of g77).

That is, I propose that g77 0.5.23 (based on gcc 2.8) and egcs 1.1,
when released, install libf2c.a as libg2c.a *only* in the gcc-lib
hierarchy, and f2c.h as the same name *only* in that hierarchy as well.

Any objections?

Also, is it really certain that gcc's invocation of ld defaults
to searching the system libraries before its gcc-lib/ ones?  If that's
certainly *not* the case, then we don't really need to rename the
library.

        tq vm, (burley)

P.S. I've got a good-enough memory to know that there's almost nothing
I'm proposing above, or previously, that hasn't already been suggested
to me or asked for, by g77 users for the most part, in the past!  Thanks
to those who made these suggestions!


* Re: array bounds checking?
  1998-04-19  5:37     ` array bounds checking? Dave Love
@ 1998-04-21 19:10       ` Jim Wilson
  1998-05-03 22:27         ` Greg McGary
  0 siblings, 1 reply; 24+ messages in thread
From: Jim Wilson @ 1998-04-21 19:10 UTC (permalink / raw)
  To: Dave Love; +Cc: egcs

	Is this no longer proposed for the backend?  If not, I can probably do
	it for Fortran with some advice on writing the tree code, for which I
	couldn't find a suitable example.

It is probably more a question of when it will happen.  It is likely that
this support will appear in the middle-end eventually, but it may not happen
soon enough for your purposes.

Jim


* Re: array bounds checking?
  1998-04-21 19:10       ` Jim Wilson
@ 1998-05-03 22:27         ` Greg McGary
  1998-05-08 16:04           ` Gerald Pfeifer
  1998-05-22  0:21           ` Jeffrey A Law
  0 siblings, 2 replies; 24+ messages in thread
From: Greg McGary @ 1998-05-03 22:27 UTC (permalink / raw)
  To: Jim Wilson, Dave Love, burley; +Cc: egcs

Jim Wilson <wilson@cygnus.com> writes:

> 	Is this no longer proposed for the backend?  If not, I can probably do
> 	it for Fortran with some advice on writing the tree code, for which I
> 	couldn't find a suitable example.
> 
> It is probably more a question of when it will happen.  It is likely that
> this support will appear in the middle-end eventually, but it may not happen
> soon enough for your purposes.

Last year, I did a complete array and pointer bounds checking
implementation for C and C++ in gcc-2.7.2.  I have every intention of
getting this merged into gcc-2.8.x and/or egcs, but my paid work has
placed unrelenting demands on my time.  It should be adaptable to
g77 as well.

I would be *very* grateful for some assistance with merging, cleanup
suggested by kenner, and testing on more targets.  So far, my only
runtime testing has been for i960 and i386.

Here's a detailed technical summary of what I've done:

------------------------------------------------------------------------------
	Technical Specification for
	Bounded Pointers in GCC

	by Greg McGary <gkm@gnu.ai.mit.edu>
	revised April 8, 1997

In the following discussion, `gcc' shall refer to both the GNU C
compiler and the GNU C++ compiler.

>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Goals:
>>>>>>>>>>>>>>>>

* Fine grained bounds checking of pointer, array and object references
  of all storage classes: static, heap and automatic.

* Runtime overhead of 50-75% and space overhead of 50-75%: This should
  be small enough that programmers will find it acceptable to enable
  bounded-pointers during development, only disabling them for
  production releases.

* Escape mechanism to override defaults and explicitly enable or
  disable checking of specific types or declarations.

* Design & implementation acceptable as a permanent feature of gcc.

* Ability to mix checked and unchecked object code without
  recompilation: The option `--(un)bounded-libraries' allows you to
  tell gcc what default boundedness is possessed by pointers whose
  declarations appear in system header files, and whose machine
  representations occur in system libraries.  This allows you to
  conveniently mix bounds-checked application code with unchecked
  libraries.

  Bounded pointers can't do anything to patch the unchecked objects
  the way Purify does, but a program compiled with bounded pointers
  can at least inter-operate with code compiled without.  However,
  bounded pointers offer functionality that Purify can only dream
  about, since Purify is constrained by a commercial business model
  wherein it must operate on code available in object-form only.
  Since the GNU system has no such constraint, we are free to
  implement bounds checking a better way, and since the GNU system
  has a complete C library, it's possible to build applications that
  contain 100% bounds-checked code.

>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Non-Goals:
>>>>>>>>>>>>>>>>

Bounded pointers are no help for the following problems.  Bounded
pointers are just one piece, albeit a major one, in a complete
system for automatic detection of memory usage bugs.

* Detection of memory leaks for heap-allocated memory.

* Detection of the use of uninitialized memory:  References through a
  valid pointer will not detect the (un)initialized condition of the
  referent object.  However, references through NULL pointers can be
  caught by bounded pointers, and if gcc is optionally instructed to
  initialize all otherwise uninitialized pointer variables with
  automatic storage class, some level of uninitialized variable
  checking is possible.

* Detection of the use of stale or dangling pointers: A bounded
  pointer appears valid if its value falls within the interval
  [base..extent).  However, the referent object might be an automatic
  variable that's no longer in scope, or a heap variable that's been
  released.

>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Bounded Pointers with Checks Prior to Dereference:
>>>>>>>>>>>>>>>>

All pointer objects (unless explicitly overridden), comprise three
words: the pointer, a base and an extent.  The base is the lowest valid
address value that the pointer may assume, while the extent is the
lowest address above the highest valid address value.

Dereferencing a pointer causes gcc to generate code to check the
pointer against its base & extent, and to call an error-function if
the bounds are violated.  (An alternate bounds-checking approach is to
check the results of all pointer arithmetic.  The drawback here is
that loops commonly auto-increment past the extent of an array in the
loop-exit test.)
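
The representation and the check-before-dereference rule can be sketched
in ordinary C as follows.  This is only an illustration under stated
assumptions, not the code gcc would generate: the struct and the `bp_'
function names are hypothetical, accesses are assumed to be one byte
wide, and the error function is taken to be the default `abort'.

```c
#include <stdlib.h>

/* Hypothetical three-word layout; the field names are illustrative.  */
struct bounded_char_ptr
{
  char *value;   /* the pointer itself */
  char *base;    /* lowest valid address */
  char *extent;  /* lowest address above the highest valid address */
};

/* Build a bounded pointer to byte `index' of an object of `length'
   bytes.  The caller must keep `index' inside the real allocation so
   that the address computation itself is well defined.  */
struct bounded_char_ptr
bp_make (char *object, size_t length, size_t index)
{
  struct bounded_char_ptr p;

  p.value = object + index;
  p.base = object;
  p.extent = object + length;
  return p;
}

/* Nonzero if the pointer value lies in the interval [base..extent).  */
int
bp_in_bounds (struct bounded_char_ptr p)
{
  return p.base <= p.value && p.value < p.extent;
}

/* Checked load: the bounds test runs before the dereference.  */
char
bp_load (struct bounded_char_ptr p)
{
  if (!bp_in_bounds (p))
    abort ();
  return *p.value;
}
```

Note that merely forming a pointer equal to the extent is harmless here;
only `bp_load' on such a pointer fails, which is the property that lets a
loop step one past the end of an array in its exit test.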

>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Error functions:
>>>>>>>>>>>>>>>>

(Note: the design of this facility is not yet final, and will change)

The default error function is `abort'.  The user may override this
with a custom error function.  If a function with the name
`__array_bounds_violation' has been declared at the time gcc generates
bounds checks for an array, it will be called instead of `abort'.  If
a function with the name `__pointer_bounds_violation' has been
declared at the time gcc generates bounds checks for a pointer, it
will be called instead of `abort'.

The function `__array_bounds_violation' accepts zero or more
arguments from the following list:

    char const *file_name;
    int line_number;
    char const *array_name;
    int upper_bound;
    char const *index_name;
    int bad_index;

The function `__pointer_bounds_violation' accepts zero or more
arguments from the following list:

    char const *file_name;
    int line_number;
    char const *pointer_name;
    int upper_bound;
    int bad_index;

Gcc looks for these arguments by name and in this order, so the
user-declared argument names and types *must* match those in this
list, and appear in this order, though unwanted arguments may be
freely omitted.

Here is an example definition of an array error function that accepts
all possible arguments:

    #include <stdio.h>

    void
    __array_bounds_violation (char const *file_name, int line_number,
			      char const *array_name, int upper_bound,
			      char const *index_name, int bad_index)
    {
      printf ("%s:%d: array bounds violation: %s[%s == %d] bounds: 0..%d\n",
	      file_name, line_number, array_name,
	      index_name, bad_index, upper_bound);
    }

Here is an example definition of a pointer error function that accepts
all possible arguments:

    void
    __pointer_bounds_violation (char const *file_name, int line_number,
				char const *pointer_name,
				int upper_bound, int bad_index)
    {
      printf ("%s:%d: pointer bounds violation: %s, index: %d, bounds: 0..%d\n",
	      file_name, line_number, pointer_name, bad_index, upper_bound);
    }

Here are example functions that only report filename and line number:

    void
    __array_bounds_violation (char const *file_name, int line_number)
    {
      printf ("%s:%d: array bounds violation\n", file_name, line_number);
    }

    void
    __pointer_bounds_violation (char const *file_name, int line_number)
    {
      printf ("%s:%d: pointer bounds violation\n", file_name, line_number);
    }

In all cases, these functions return without causing the program to
crash, so potentially many violations may be reported in one run.
If you wish to cause a crash at the point of a violation, simply
add a call to `abort' after printing the message.
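
As a rough illustration of the report-and-continue behavior described
above, here is a plain C sketch that performs by hand the kind of check
gcc would generate for an array store.  The names
`report_array_violation', `violation_count' and `checked_store' are
hypothetical and exist only for this sketch; the real hook would be
named `__array_bounds_violation' and would be called from
compiler-generated code, not from a helper like this one.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical counter so a caller can see how many violations occurred.  */
int violation_count = 0;

/* Stand-in for a user-supplied reporting function that accepts only the
   first two arguments.  It reports and returns, so execution continues.  */
void
report_array_violation (char const *file_name, int line_number)
{
  fprintf (stderr, "%s:%d: array bounds violation\n", file_name, line_number);
  violation_count++;
}

/* Hand-written equivalent of a checked `array[index] = value' for an
   array of `length' elements.  Skipping the store on a bad index is a
   choice made for this sketch, so that it stays well-defined.  */
void
checked_store (char *array, int length, int index, char value,
               char const *file_name, int line_number)
{
  assert (array != NULL && length > 0);
  if (index < 0 || index >= length)
    {
      report_array_violation (file_name, line_number);
      return;
    }
  array[index] = value;
}
```

Two bad stores in a row produce two reports and the program keeps
running; a call to `abort' at the end of the reporting function would
stop at the first one instead.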


The type of the inner bounded object determines which of these is
called, *not* the syntax of the expression in which it is used.  If
the inner bounded object is a pointer, then we use
`__pointer_bounds_violation'; if it's an array, we use
`__array_bounds_violation'.  E.g.,

    char array[10];
    char *pointer = array + 5;

    array[i] = 1;	/* calls __array_bounds_violation */
    pointer[i] = 2;	/* calls __pointer_bounds_violation */
    *(array + i) = 3;	/* calls __array_bounds_violation */
    *(pointer + i) = 4;	/* calls __pointer_bounds_violation */
    
For `__pointer_bounds_violation', the value of `bad_index' is computed
as (__ptrvalue__ (p) - __ptrbase__ (p)), and the value of
`upper_bound' is computed as (__ptrextent__ (p) - __ptrbase__ (p) - 1)
(dereferencing a NULL pointer will show an upper_bound of -1).  The
value of `pointer_name' is the expression that's dereferenced, e.g.,
"(pointer+i)" for *both* of the above example assignments using
`pointer'.
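
The two formulas above can be tried out with a small model in standard
C.  This is only a sketch: `struct bp_triple' and the two functions are
hypothetical names, the three components are held as integers rather
than real pointers, and elements are assumed to be one byte wide so
that differences need no scaling.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: the three words of a bounded pointer, held as
   integers so that out-of-range values can be examined safely.  */
struct bp_triple
{
  uintptr_t value;   /* what __ptrvalue__ would yield */
  uintptr_t base;    /* what __ptrbase__ would yield */
  uintptr_t extent;  /* what __ptrextent__ would yield: one past the end */
};

/* bad_index = value - base; negative when value lies below base.  */
intptr_t
bp_bad_index (struct bp_triple p)
{
  if (p.value >= p.base)
    return (intptr_t) (p.value - p.base);
  return -(intptr_t) (p.base - p.value);
}

/* upper_bound = extent - base - 1.  */
intptr_t
bp_upper_bound (struct bp_triple p)
{
  assert (p.base <= p.extent);
  return (intptr_t) (p.extent - p.base) - 1;
}
```

With all three components zero, as for a NULL pointer, `bp_upper_bound'
yields -1, matching the remark above.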

The most convenient way of ensuring that gcc sees the declarations of
your reporting functions in all compilation units is to put them into a
header file, then use gcc's `-include <file>' command-line option.
This relieves you of the burden of changing the source code.

Caveats:

* As is also the case with compiler syntax errors, it is possible for
  bounds violations to cascade, or for one violation to be reported
  many times.

* The above feature is no substitute for a symbolic debugger.  The
  reporting functions above only indicate the source code location of
  the topmost frame of the call stack.  It is often necessary to
  inspect the complete program state in order to diagnose the cause of
  the bounds violation and identify the best fix.

* The verbose reporting functions can be very expensive in text space
  overhead, which is proportional to the number of arguments passed.
  If you're already in the habit of using gdb to debug bounds violations,
  you're probably better off just using the default `abort'.

>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Other Checks:
>>>>>>>>>>>>>>>>

Pointer subtraction could cause gcc to generate checks that the
operands refer to the same object.  Low-level systems code should
probably disable this form of checking by operating on unbounded
pointers.

Use of uninitialized pointers and the NULL pointer can be trapped
automatically by initializing the bounded-pointer object to a value
that is guaranteed to trigger a bounds violation upon dereference.

>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Bounded Pointer Declarations:
>>>>>>>>>>>>>>>>

The bounded or unbounded state of a pointer is determined by a
type-qualifier of either `__bounded__' or `__unbounded__'.  With the
`-fno-bounded-pointers' switch (the gcc default), in the absence of an
explicit qualifier, a pointer decl is considered `__unbounded__'.  With
the `-fbounded-pointers' switch, in the absence of an explicit
qualifier, a pointer decl is considered `__bounded__'.  A pointer
qualified implicitly or explicitly as `__bounded__' occupies three
words of storage for the pointer itself, a base and an extent.  A
pointer qualified implicitly or explicitly as `__unbounded__' occupies
a single word, as is traditional for C pointers.

If a file is compiled with `-fno-bounded-pointers' *and* contains no
explicit `__bounded__' qualifiers, then the resultant object will be
the same as if it were compiled with a version of gcc without this
feature.  `-fno-bounded-pointers' will *not* disable the effect of an
explicit `__bounded__' qualifier.  If the user desires that behavior,
s/he should add `-D__bounded__= -D__unbounded__=' to the gcc
command-line to elide the qualifiers.

Header files included from specific directories can be designated as
containing pointer declarations that are bounded or unbounded.

    --unbounded-includes DIRECTORY
    -iunbounded DIRECTORY
	When processing an include file whose directory prefix matches
	DIRECTORY, make the pointer declarations unbounded by default.

    --bounded-includes DIRECTORY
    -ibounded DIRECTORY
	When processing an include file whose directory prefix matches
	DIRECTORY, make the pointer declarations bounded by default.

    --unbounded-standard-includes
    -iunboundedstdinc
	When processing an include file whose directory prefix matches
	one of the standard system include directories, make the
	pointer declarations unbounded by default.

    --bounded-standard-includes
    -iboundedstdinc
	When processing an include file whose directory prefix matches
	one of the standard system include directories, make the
	pointer declarations bounded by default.

Type qualifiers represent fine-grained control over the boundedness of
pointers.  Command-line switches represent coarse-grained control.
Intermediate-grained control is provided by the following syntax:

    __bounded__ {
	/* pointer declarations are bounded by default
	   within this block. */
    }

    __unbounded__ {
	/* pointer declarations are unbounded by default
	   within this block. */
    }


The __bounded__ and __unbounded__ keywords may be regarded as prefix
operators that apply to blocks.  This syntax may also appear at
toplevel (file scope).

>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Runtime pointer boundary checks:
>>>>>>>>>>>>>>>>

The command-line switch `-fcheck-bounds' controls generation of
runtime code to perform bounds checks on bounded pointers.  The effect
of this switch is orthogonal to `-fbounded-pointers'.
`-fbounded-pointers' controls how much space (one word or three words)
is occupied by pointers passed as function arguments, or stored in
memory with storage class static, whereas `-fcheck-bounds' controls
whether or not gcc generates runtime code to validate bounded
pointers.  When `-fcheck-bounds' is specified in conjunction with
`-fno-bounded-pointers', gcc treats pointer variables having automatic
storage-class as bounded, and those passed as arguments or having
static storage-class as unbounded.  This gives as much bounds checking
as can be accomplished without altering the static storage layout or
function calling conventions.  Generally, this means that all array
references and some pointer dereferences can be checked.

The following syntax offers finer grained control over generation of
runtime pointer boundary checks:

    __check_bounds__ {
	/* generate range checks for bounded pointers. */
    }

    __no_check_bounds__ {
	/* Don't generate range checks for bounded pointers. */
    }

>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Selection of Bounded-Pointer Components:
>>>>>>>>>>>>>>>>

The builtin prefix operators `__ptrvalue__ <expr>', `__ptrbase__
<expr>', `__ptrextent__ <expr>', where <expr> is a pointer-valued
expression, extract the individual components of a bounded pointer.
These operators have the same syntax and precedence as `sizeof'.

Applied to an unbounded pointer, `__ptrbase__' returns UNKNOWN_BASE and
`__ptrextent__' returns UNKNOWN_EXTENT, in other words, the most permissive range
of values possible.

>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Optimizations:
>>>>>>>>>>>>>>>>

These two optimizations are essential to meeting both the time and
space performance goals:

(1) It is unnecessary to recheck bounds of a pointer that remains
    unchanged since the last check.  The gcc back-end must be enhanced
    to eliminate the redundant checks.

(2) If a bounded-pointer loop-induction variable increments
    monotonically the check against the base need be performed only
    once at the top of the loop.  Similarly, if such a variable
    decrements monotonically, the check against the extent need only
    be performed once.

    Even better, if gcc can statically determine the range of values a
    bounded-pointer loop-induction variable will assume, it need only
    check the variable's minimum value against the base, and the
    variable's maximum value against the extent, performing both
    checks once before the loop.

    Since these are generally valid loop optimizations, they should be
    implemented in a general way.

This third item is not an optimization per se, but rather describes
storage allocation policy that allows the optimizer to function on
bounded-pointer variables with automatic storage class:

(3) The three components of a bounded-pointer triple (pointer, base,
    extent) should be assigned to registers whenever registers are
    available.  Although it is tempting to think of a bounded pointer
    object as a three-word structure, there is no requirement that the
    three components of a bounded-pointer variable with automatic
    storage class occupy contiguous stack slots, unless the address of
    the variable is taken, in which case it must be laid out
    contiguously as (pointer, base, extent).  If in registers, there
    should be no requirement that they occupy contiguous registers
    (although there is an advantage to doing so on architectures that
    can move, load or store multiple registers with a single
    instruction), nor should there be any restriction against
    assigning some components to registers and others to stack slots.
    As an automatic variable, unless its address is taken, a
    bounded-pointer may be considered to be individually declared
    variables whose register assignment is done independently.
    Bounded pointers passed as arguments must be assigned contiguous
    positions in the proper order under the target's argument passing
    convention, either as contiguous registers or as contiguous stack
    slots (or possibly a mixture, if the three components of the
    argument straddle the boundary between args passed in registers
    and those passed on the stack).  Note that gcc already handles
    automatic storage for the two components (real & imaginary) of its
    builtin complex type in this way, using the CONCAT RTL expression
    to hold the two components.  Bounded pointers employ the CONCAT3
    RTL expression type.
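
To make optimization (2) concrete, here is a sketch in standard C that
contrasts a check on every iteration with a single check hoisted in
front of the loop.  It works on array indices rather than on real
bounded pointers, and every name in it (`checks_executed',
`range_in_bounds', `sum_checked_each', `sum_checked_once') is
hypothetical; it shows the transformation, not gcc's implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical counter of executed bounds checks, for comparison only.  */
unsigned long checks_executed = 0;

/* Nonzero if [first, first + count) lies within an object of `length'
   elements.  Each call counts as one executed check.  */
int
range_in_bounds (size_t first, size_t count, size_t length)
{
  checks_executed++;
  return first <= length && count <= length - first;
}

/* Naive form: one check per iteration.  Returns 0, or -1 on violation.  */
int
sum_checked_each (int const *array, size_t length, size_t first,
                  size_t count, long *sum)
{
  size_t n;

  assert (array != NULL && sum != NULL);
  *sum = 0;
  for (n = 0; n < count; n++)
    {
      if (!range_in_bounds (first + n, 1, length))
        return -1;
      *sum += array[first + n];
    }
  return 0;
}

/* Hoisted form: the range of the induction variable is known before the
   loop starts, so both ends are checked once and the body is unchecked.  */
int
sum_checked_once (int const *array, size_t length, size_t first,
                  size_t count, long *sum)
{
  size_t n;

  assert (array != NULL && sum != NULL);
  if (!range_in_bounds (first, count, length))
    return -1;
  *sum = 0;
  for (n = 0; n < count; n++)
    *sum += array[first + n];
  return 0;
}
```

Both functions accept and reject the same ranges; the difference is
that the second executes one check per call rather than one per
element.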

>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> External Arrays:
>>>>>>>>>>>>>>>>

When arrays are declared `extern' with array-bounds omitted, a
compilation unit has no way of knowing the bounds of the array at
compile time.  In this case, gcc emits the extent as a SYMBOL_REF to a
synthetic symbol called `foo.ext' (for an array named `foo').  If such
a reference is generated for an array defined in an object not
compiled with `-fbounded-pointers' or `-fcheck-bounds', then this
reference will be undefined.  In this case, GNU ld (or collect) can
be modified to synthesize the definition, or to set it to
UNKNOWN_EXTENT if for some reason it can't determine the size of the
array.

>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Incomplete Structures:
>>>>>>>>>>>>>>>>

GNU C allows variables of incomplete structure type to be declared
extern.  E.g., the following from libio.h:

    extern struct _IO_FILE_plus _IO_stdin_;
    #define _IO_stdin ((_IO_FILE*)(&_IO_stdin_))

Gcc can't determine the extent of _IO_stdin for ordinary user programs
whose compilation units do not include the declaration of this
structure.  This is currently an unsolved problem.  A probable
solution is to mark the extent of all public static-storage-class
structs with a synthetic symbol exactly as is now done for arrays, as
described above.

>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Variably Sized Arrays in Structs:
>>>>>>>>>>>>>>>>

A common technique for coding a fixed header followed by a variably
sized body is to declare a structure such as this:

    struct foo
    {
      int fixed1;
      int fixed2;
      char name[1];
    };

Where `name' is the variably sized member.  Gcc even allows the size
of such an array to be specified as 0.  Such structures are always
dynamically allocated.  In this case, the extent of the variably sized
array is the same as the extent of the enclosing structure.
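
A sketch of what this means in practice, in standard C: the allocation
is sized for the header plus the real name, and the bounds recorded for
`name' run to the end of that allocation rather than to the end of the
declared one-element array.  `struct name_bounds' and `foo_create' are
hypothetical names used only for this illustration; with real bounded
pointers gcc would track these bounds itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct foo
{
  int fixed1;
  int fixed2;
  char name[1];                 /* really variably sized */
};

/* Hypothetical record of the bounds a bounded pointer to `name' would
   carry.  */
struct name_bounds
{
  char *base;
  char *extent;                 /* one past the last valid byte */
};

/* Allocate a struct foo large enough to hold NAME and record the bounds
   of its `name' member.  Returns NULL if allocation fails.  */
struct foo *
foo_create (char const *name, struct name_bounds *bounds)
{
  size_t name_size;
  size_t total;
  struct foo *f;

  assert (name != NULL && bounds != NULL);
  name_size = strlen (name) + 1;
  total = offsetof (struct foo, name) + name_size;
  if (total < sizeof (struct foo))
    total = sizeof (struct foo);
  f = malloc (total);
  if (f == NULL)
    return NULL;
  f->fixed1 = 0;
  f->fixed2 = 0;
  bounds->base = (char *) f + offsetof (struct foo, name);
  /* The extent of `name' is the extent of the enclosing allocation.  */
  bounds->extent = (char *) f + total;
  memcpy (bounds->base, name, name_size);
  return f;
}
```

The caller releases the structure with `free' as usual.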

>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Function Pointers:
>>>>>>>>>>>>>>>>

To save space, pointers to functions are `__unbounded__' by default,
since programs don't generally perform arithmetic on them.
`-Wpointer-arith' will give compiler warnings about arithmetic on
function pointers, and so long as one adheres to the discipline of
avoiding casts of function pointers to non-function-pointer types,
that should be sufficient to ensure safety in this context.
When a function is called through a pointer, the pointer value will be
implicitly checked against the range [start, etext).  Calls through
trampolines will have such checks disabled.

>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Explicit Casts:
>>>>>>>>>>>>>>>>

By default, casts have no effect on the bounds of a pointer.
E.g.:

    char chars[10];		/* bounds are [chars..chars+10): 10 bytes */
    int *icp = (int *)chars;	/* bounds remain [chars..chars+10): 10 bytes */
    int i;			/* bounds are [&i..&i+1): 4 bytes */
    char *cip = (char *)&i;	/* bounds remain [&i..&i+1): 4 bytes */

If the programmer desires a cast to reset bounds, the __bounded__
qualifier must be applied to the pointer type in the cast expression.

>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Heap Storage Allocation (Malloc):
>>>>>>>>>>>>>>>>

A special bounds-aware malloc and C++ `new' operator returns
bounded pointers with a base & extent covering the allocated region.

>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Stack Storage Allocation (alloca):
>>>>>>>>>>>>>>>>

Just as with malloc, a bounded-pointer aware alloca returns bounded
pointers.

>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> UNKNOWN:
>>>>>>>>>>>>>>>>

Special bit-patterns represent unknown values of base & extent.  The
value of UNKNOWN_BASE is generally 0, and the UNKNOWN_EXTENT value is
generally ~0 (all 1s).  The values are chosen to yield the maximally
permissive bounds.

>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> String and memory library functions:
>>>>>>>>>>>>>>>>

String & mem functions and any other library primitives that must be
fast can be coded so that the boundary checks are performed once at
the beginning, followed by the real work which is done using simple
unbounded pointers, e.g.:

    char *
    strcpy (char *dest, char const *src)
    {
      char *dest_0 = dest;
    #if BOUNDED_POINTERS
      /* strlen will check bounds of src */
      char const *__unbounded__ dest_end = dest + 1 + strlen (src);
      if ((__ptrvalue__ dest) < (__ptrbase__ dest)
	  || (dest_end > (__ptrextent__ dest)))
	abort ();
    #endif
      do
	*(__ptrvalue__ dest)++ = *(__ptrvalue__ src);
      while (*(__ptrvalue__ src)++);
      return dest_0;
    }
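
For readers without a bounded-pointer compiler, the same
check-once-then-copy pattern can be approximated in standard C.  This
is a sketch under stated assumptions: `struct bounded_char_ptr' and
`bp_strcpy' are hypothetical names, the bounds are passed explicitly
instead of travelling with the pointer, only the destination is
checked, and a violation returns NULL where the real thing would call
`abort'.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model of a bounded `char *'.  */
struct bounded_char_ptr
{
  char *value;
  char *base;
  char *extent;                 /* one past the last valid byte */
};

/* Copy SRC, including its terminating NUL, to DEST.  Returns the start
   of the copy, or NULL if the copy would not fit within DEST's bounds.  */
char *
bp_strcpy (struct bounded_char_ptr dest, char const *src)
{
  size_t needed;
  char *d;

  assert (src != NULL && dest.base <= dest.extent);
  needed = strlen (src) + 1;
  /* Single up-front check: the whole destination range must fit.  */
  if (dest.value < dest.base || dest.value > dest.extent
      || needed > (size_t) (dest.extent - dest.value))
    return NULL;
  /* The real work uses a plain pointer with no further checks.  */
  d = dest.value;
  do
    *d++ = *src;
  while (*src++);
  return dest.value;
}
```

A string that exactly fills the remaining space is accepted; one byte
more is rejected.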

>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> argv and environ arrays:
>>>>>>>>>>>>>>>>

The program argument string vector (argv) and environment string
vector (environ) come from the OS as arrays of unbounded pointers.
The interim solution is to apply explicit __unbounded__ qualifiers
to both of these declarations, e.g.:

    char *__unbounded__ *__unbounded__ argv;

This approach, while expedient, has the disadvantages of requiring
source code changes to all programs and creating an unnecessary
bounds-checking loophole.  A better solution is to recreate both
vectors with bounded-pointers very early in the startup sequence.
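
The "better solution" might look roughly like the following in
standard C, with the bounds derived from the length of each string.
This is a guess at the shape of such startup code, not a description
of what glibc does; `struct bounded_string' and `make_bounded_argv' are
hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of a bounded `char *'.  */
struct bounded_string
{
  char *value;
  char *base;
  char *extent;                 /* one past the terminating NUL */
};

/* Build a vector of ARGC bounded strings from the unbounded ARGV,
   followed by an all-NULL terminator entry.  Returns a malloc'd vector,
   or NULL if allocation fails.  */
struct bounded_string *
make_bounded_argv (int argc, char **argv)
{
  struct bounded_string *bv;
  int i;

  assert (argc >= 0 && argv != NULL);
  bv = malloc (((size_t) argc + 1) * sizeof *bv);
  if (bv == NULL)
    return NULL;
  for (i = 0; i < argc; i++)
    {
      bv[i].value = argv[i];
      bv[i].base = argv[i];
      bv[i].extent = argv[i] + strlen (argv[i]) + 1;
    }
  bv[argc].value = NULL;
  bv[argc].base = NULL;
  bv[argc].extent = NULL;
  return bv;
}
```

The same loop would serve for environ, counting entries up to its NULL
terminator first.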

>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Mixing Bounded & Unbounded Pointers at Source Level:
>>>>>>>>>>>>>>>>

Semantics of assignment & argument passing for mixed bounded and
unbounded pointers is as follows:

lhs (or param)	rhs (or arg)	semantics
--------------	------------	---------
bounded		unbounded	base & extent of lhs set to UNKNOWN
unbounded	bounded		base & extent of rhs are discarded
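
The two rows of the table can be modeled in standard C as a pair of
conversions.  This sketch assumes the "general" values mentioned
earlier (0 for UNKNOWN_BASE and all 1s for UNKNOWN_EXTENT); `struct
bp_words' and the three functions are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values, per the UNKNOWN section: maximally permissive bounds.  */
#define UNKNOWN_BASE   ((uintptr_t) 0)
#define UNKNOWN_EXTENT (~(uintptr_t) 0)

/* Hypothetical model of the three words of a bounded pointer.  */
struct bp_words
{
  uintptr_t value;
  uintptr_t base;
  uintptr_t extent;
};

/* bounded = unbounded: base & extent of the lhs are set to UNKNOWN.  */
struct bp_words
bp_from_unbounded (void *p)
{
  struct bp_words bp;

  bp.value = (uintptr_t) p;
  bp.base = UNKNOWN_BASE;
  bp.extent = UNKNOWN_EXTENT;
  return bp;
}

/* unbounded = bounded: base & extent of the rhs are discarded.  */
void *
bp_to_unbounded (struct bp_words bp)
{
  return (void *) bp.value;
}

/* Nonzero if a dereference of BP would pass its bounds check.  */
int
bp_in_bounds (struct bp_words bp)
{
  assert (bp.base <= bp.extent);
  return bp.value >= bp.base && bp.value < bp.extent;
}
```

A pointer that has passed through an unbounded variable still works,
but any check on it passes, which is the loophole such conversions
open.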

Two types must have identical __unbounded__ qualifiers at all but the
outermost pointer level in order to be compatible.  e.g.,

    char *__unbounded__ *a;
    char *__unbounded__ *__unbounded__ b;
    char **__unbounded__ c;

`a' and `b' are compatible, but `c' is compatible with neither `a' nor
`b'.  The incompatibility arises from the different sizes of bounded
and unbounded pointer objects.

>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Mixing Compiled Modules, some with Bounded
>>>>>>>>>>>>>>>> Pointers and others with Unbounded Pointers:
>>>>>>>>>>>>>>>>

As long as the module interface declarations are compiled with
boundedness type-qualifiers that match the module's object code, all
should be well, e.g.:

    /* My libc doesn't grok bounded pointers 8^( */
    extern int strcmp (char const *__unbounded__ x, char const *__unbounded__ y);

    int
    my_strcmp (char const *x, char const *y)
    {
      return strcmp (x, y);
    }

The above example is contrived.  Here's another contrived example:

    /* My libc doesn't grok bounded pointers 8^( */
    __unbounded__ {
    #include <string.h>
    #include <stdlib.h>
    #include <stdio.h>
    }

    ...

The easiest way to handle the problem of properly declaring standard
system library interfaces whose pointers are unbounded is to use the
option `--unbounded-standard-includes' (also known as `-iunboundedstdinc').

>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Implementation Status:
>>>>>>>>>>>>>>>>

As of the first week of March, 1997, the full bounded pointer
implementation for C is 95% complete, though bugs surely remain.
glibc has been ported to bounded-pointers for Linux on the x86, which
involved adding bounded-pointer checks to the top of assorted
assembler-language string functions, and putting wrappers around the
system call entry-points to check the bounds of pointer arguments and
convert the bounded pointers to unbounded prior to trapping to the OS.

glibc-2.0.1 successfully compiles with `-fbounded-pointers', and
several utilities from textutils (md5sum, fmt and sort) run to
completion for the small number of test cases tried so far.  The next
step is to test more GNU packages compiled with `-fbounded-pointers'
and linked with the bounded-pointer version of glibc.

For the C++ front-end, only `-fcheck-bounds' is complete and tested.
Much of the support for bounded-pointers is present in the C++
front-end and back-end but is still incomplete and untested.

Remaining tasks are:

* Add bounded-pointer support to libstdc++.
* Implement optimizations to eliminate redundant checks.
* Test, debug, test, debug, test, debug...

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: array bounds checking?
  1998-05-03 22:27         ` Greg McGary
@ 1998-05-08 16:04           ` Gerald Pfeifer
  1998-05-08 16:08             ` Greg McGary
  1998-05-22  0:21           ` Jeffrey A Law
  1 sibling, 1 reply; 24+ messages in thread
From: Gerald Pfeifer @ 1998-05-08 16:04 UTC (permalink / raw)
  To: Greg McGary; +Cc: Jim Wilson, Dave Love, burley, egcs

On 3 May 1998, Greg McGary wrote:
> Last year, I did a complete array and pointer bounds checking
> implementation for C and C++ in gcc-2.7.2.  I have every intention of
> getting this merged into gcc-2.8.x and/or egcs, but my paid work has
> placed unrelenting demands on my time. [...]

Getting this into egcs should be much easier than getting it into gcc. :-/

> I would be *very* grateful for some assistance with merging, cleanup
> suggested by kenner, and testing on more targets.  So far, my only
> runtime testing has been for i960 and i386.

I can offer testing support under sun-sparc-solaris2.5.1/2.6 and
FreeBSD-2.2.6, perhaps also alpha-dec-osf4.

Gerald
-- 
Gerald Pfeifer (Jerry)      Vienna University of Technology
pfeifer@dbai.tuwien.ac.at   http://www.dbai.tuwien.ac.at/~pfeifer/


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: array bounds checking?
  1998-05-08 16:04           ` Gerald Pfeifer
@ 1998-05-08 16:08             ` Greg McGary
  0 siblings, 0 replies; 24+ messages in thread
From: Greg McGary @ 1998-05-08 16:08 UTC (permalink / raw)
  To: Gerald Pfeifer; +Cc: Greg McGary, Jim Wilson, Dave Love, burley, egcs

Gerald Pfeifer <pfeifer@dbai.tuwien.ac.at> writes:

> Getting this into egcs should be much easier than getting it into gcc. :-/

Probably true.

> > I would be *very* grateful for some assistance with merging, cleanup
> > suggested by kenner, and testing on more targets.  So far, my only
> > runtime testing has been for i960 and i386.

FYI, I just finished a gcc-2.8.1 merge and from there will merge into egcs.
I'll also do the cleanup.  I don't think it will be as much work as I
had anticipated.

> I can offer testing support under sun-sparc-solaris2.5.1/2.6 and
> FreeBSD-2.2.6, perhaps also alpha-dec-osf4.

I already have SPARC & i386 solaris2.5.1.  If you can get it, the
useful one would be Alpha since that represents another target arch.
FreeBSD is just another i386.

Thanks for the offer.  I'll get back to you when I have the gcc-2.8.1
and egcs merges done, and it's all working as well as it did in gcc-2.7.2.

Greg

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: array bounds checking?
  1998-05-03 22:27         ` Greg McGary
  1998-05-08 16:04           ` Gerald Pfeifer
@ 1998-05-22  0:21           ` Jeffrey A Law
  1998-05-22  0:32             ` Greg McGary
                               ` (4 more replies)
  1 sibling, 5 replies; 24+ messages in thread
From: Jeffrey A Law @ 1998-05-22  0:21 UTC (permalink / raw)
  To: Greg McGary; +Cc: Jim Wilson, Dave Love, burley, egcs

I looked over the bounds checking and bounded pointer stuff for
a little while today.

Based on specs alone it sounds like bounded pointers are the
better direction.

My biggest concern is the change in pointer size; yes BP has
mechanisms to handle this, but it just makes me uneasy.

It sounds like you use a CONCAT like rtx to represent a bounded
pointer.  Is that correct?  If so, then we're going to need to
actually fix the CONCAT support.  It's got some problems right now.

The fact that support for the C++ front-end is already done is a
major win.

I also believe that checking just memory accesses is probably
better than checking all pointer arithmetic.  It is possible
in some languages (Ada) to have a pointer outside the object
being pointed to.  Some backends do similar things when
optimizing.

It might be interesting to hear from the Fortran folks if they
have any opinions.

It would also be good to get some sense of the changes involved;
particularly for front-ends.  We'd want to be able to hook into
the Fortran front end initially.  The gpc guys would probably also
benefit from this code, so we'll want them to be able to hook in
as well.


I'd also be a little concerned about legal issues -- do we have
any known patent exposure with either implementation?  I don't
really know what patents might be floating around in this 
arena.

I also don't see a copyright on file for you personally or a
disclaimer from Ascend.  Those issues will also have to be
resolved.


We'll probably want to start looking at the code before we go
any further.

jeff

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: array bounds checking?
  1998-05-22  0:21           ` Jeffrey A Law
@ 1998-05-22  0:32             ` Greg McGary
  1998-05-22  5:42             ` Greg McGary
                               ` (3 subsequent siblings)
  4 siblings, 0 replies; 24+ messages in thread
From: Greg McGary @ 1998-05-22  0:32 UTC (permalink / raw)
  To: law; +Cc: Greg McGary, Jim Wilson, Dave Love, burley, egcs

Jeffrey A Law <law@hurl.cygnus.com> writes:

> My biggest concern is the change in pointer size; yes BP has
> mechanisms to handle this, but it just makes me uneasy.

It does take some getting used to.  Having a BP C library helps
immensely.

> It sounds like you use a CONCAT like rtx to represent a bounded
> pointer.  Is that correct?

I invented a new rtx CONCAT3.  I don't much like it and think both
special-cases CONCAT and CONCAT3 should be replaced with a
variable-length rtx where XEXP (foo, 0) is the length.

> If so, then we're going to need to actually fix the CONCAT support.
> It's got some problems right now.

Whatever you can tell me about CONCAT problems would be a great help.
I already fixed a few CONCAT bugs in 2.7.2.

> It would also be good to get some sense of the changes involved;
> particularly for front-ends.

As promised in an earlier message, I'll send you a useless patch you
can review.  (Useless because it's either for 2.7.2 or untested for
2.8.1)

> We'd want to be able to hook into the Fortran front end initially.
> The gpc guys would probably also benefit from this code, so we'll
> want them to be able to hook in as well.

What's "gpc"?  Pascal?

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: array bounds checking?
  1998-05-22  0:21           ` Jeffrey A Law
  1998-05-22  0:32             ` Greg McGary
@ 1998-05-22  5:42             ` Greg McGary
  1998-05-22 12:21             ` Per Bothner
                               ` (2 subsequent siblings)
  4 siblings, 0 replies; 24+ messages in thread
From: Greg McGary @ 1998-05-22  5:42 UTC (permalink / raw)
  To: law; +Cc: Greg McGary, Jim Wilson, Dave Love, burley, egcs

Jeffrey A Law <law@hurl.cygnus.com> writes:

> I'd also be a little concerned about legal issues -- do we have
> any known patent exposure with either implementation?  I don't
> really know what patents might be floating around in this 
> arena.

I have no idea about that, but my gut feeling is that BPs aren't very
interesting to commercial enterprises because BPs do such violence to
the size of pointers and appear to irrevocably break binary
compatibility, being designed to work best on systems where full
source code is available.  Pure/Atria has patents on techniques that
allow them to check object code for which there is no source.  In the
commercial world, the focus is on interoperability with 3rd party
proprietary object-code-only libraries.

Someone should look at this reference from Richard W.M. Jones's
report:

    S.C. Kendall, “Bcc: run-time checking for C programs”, USENIX
    Toronto 1983 Summer Conference Proceedings, USENIX Assoc., El
    Cerrito, C.A. (1983)

RWMJ writes:

    Bcc was the first attempt to add runtime checking to C
    programs. It is a prepass compiler that performs lint-like static
    checking of the code and adds dynamic checking to the code it
    writes out too. Like other work in this area it suffered from
    being too slow for developers to include bounds checking in
    distributed software. It worked by replacing pointers with
    extended structures containing the pointer and its lower and upper
    bounds. Naturally C libraries needed to be recompiled.

That sure looks like prior art to me.  I'm pretty sure Sam did this
work at Sun.  I asked Sam for a copy of his paper, but he didn't have
one, and didn't have it online either.  (I'll bet someone at Cygnus
can lay their hands on the Toronto '83 USENIX Proceedings.)  I
corresponded with Sam last year, though I can't find the messages in
my email archives...  As I recall, he was very down on the whole
experience since performance was so bad, wished me sympathy, and said
he hoped never to write such a tool again.  He seemed to think the BP
approach had little value, so I would be surprised if he or Sun
bothered to patent it.  I'm sure performance sucked because the
implementation was naive and didn't allow for the components to be
individually assigned to registers.  I recall that he mentioned a 10x
slowdown.  BPs as I implemented them in gcc only showed approx 70%
runtime overhead and that's before optimizing-out redundant checks.

> I also don't see a copyright on file for you personally or a
> disclaimer from Ascend.  Those issues will also have to be
> resolved.

A couple weeks ago, I mailed to rms assignments of future copyright
interest for gcc, g++, binutils and just about anything else I might
conceivably hack.  I'll check to make sure he got it.  The assignments
were signed by me as an individual and as president of my corporation.
My corporation does work under contract with Ascend, so a disclaimer
from them shouldn't be necessary.

> We'll probably want to start looking at the code before we go
> any further.

Fine.  As soon as I have some spare cycles, I can send you a patch.  I
can either send you a working patch for gcc-2.7.2, or an untested,
probably broken patch for gcc-2.8.1.  Since you're just looking
generally at the design and implementation, running code shouldn't be
necessary for you at this point.  There are known problems which are
on my list to fix, and I will list those along with the patch.

Greg

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: array bounds checking?
  1998-05-22 12:21             ` Per Bothner
@ 1998-05-22 12:14               ` Jeffrey A Law
  1998-05-22 16:41               ` Pieter Nagel
  1 sibling, 0 replies; 24+ messages in thread
From: Jeffrey A Law @ 1998-05-22 12:14 UTC (permalink / raw)
  To: Per Bothner; +Cc: egcs

  In message <199805221832.LAA15463@cygnus.com> you write:
  > Are you sure?  That is not my understanding.
Yes.  In the Ada world they're usually called virtual origins.  I
don't know if it's a user-visible feature or just something that
the optimizer folks deal with.

jeff

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: array bounds checking?
  1998-05-22  0:21           ` Jeffrey A Law
  1998-05-22  0:32             ` Greg McGary
  1998-05-22  5:42             ` Greg McGary
@ 1998-05-22 12:21             ` Per Bothner
  1998-05-22 12:14               ` Jeffrey A Law
  1998-05-22 16:41               ` Pieter Nagel
  1998-05-22 16:41             ` Dave Love
  1998-05-22 20:21             ` Greg McGary
  4 siblings, 2 replies; 24+ messages in thread
From: Per Bothner @ 1998-05-22 12:21 UTC (permalink / raw)
  To: law; +Cc: egcs

> It is possible in some languages (Ada) to have a pointer outside the object
> being pointed to.

Are you sure?  That is not my understanding.

> We'd want to be able to hook into
> the Fortran front end initially.  The gpc guys would probably also
> benefit from this code, so we'll want them to be able to hook in as well.

The need for bounded pointers arises in languages that
don't have real arrays, but instead allow unchecked
pointer arithmetic - i.e. C, C++, and objective-C.
You don't have pointer arithmetic in (most) other languages,
so the problem is a lot simpler.

	--Per

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: array bounds checking?
  1998-05-22 12:21             ` Per Bothner
  1998-05-22 12:14               ` Jeffrey A Law
@ 1998-05-22 16:41               ` Pieter Nagel
  1 sibling, 0 replies; 24+ messages in thread
From: Pieter Nagel @ 1998-05-22 16:41 UTC (permalink / raw)
  To: egcs

On Fri, 22 May 1998, Per Bothner wrote:

> > It is possible in some languages (Ada) to have a pointer outside the object
> > being pointed to.
> 
> Are you sure?  That is not my understanding.

In C/C++, a pointer value which is "one past the edge" is legal,
because a lot of loops increment/decrement pointers before they test
for end of loop.

Note that the pointer *value* is legal, but dereferencing it is still
undefined.
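
A minimal sketch of the loop shape in question, in standard C
(`sum_ints' is just an illustrative name):

```c
#include <assert.h>
#include <stddef.h>

/* Sum LENGTH ints.  `end' points one past the last element: it may be
   formed and compared against, but it is never dereferenced.  */
long
sum_ints (int const *array, size_t length)
{
  int const *p;
  int const *end;
  long sum = 0;

  assert (array != NULL);
  end = array + length;
  for (p = array; p != end; p++)
    sum += *p;
  return sum;
}
```

A checker that validates only memory accesses lets this loop through
untouched; one that validated every pointer value would need a special
case for `end'.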

-- 
     ,_
     /_)              /| /
    /   i e t e r    / |/ a g e l


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: array bounds checking?
  1998-05-22  0:21           ` Jeffrey A Law
                               ` (2 preceding siblings ...)
  1998-05-22 12:21             ` Per Bothner
@ 1998-05-22 16:41             ` Dave Love
  1998-05-22 20:21             ` Greg McGary
  4 siblings, 0 replies; 24+ messages in thread
From: Dave Love @ 1998-05-22 16:41 UTC (permalink / raw)
  To: egcs

>>>>> "Jeff" == Jeffrey A Law <law@hurl.cygnus.com> writes:

 Jeff> It might be interesting to hear from the Fortran folks if they
 Jeff> have any opinions.

[I think Craig is busy at the moment.]

I haven't looked at this closely, but I don't think it's really
relevant to Fortran.  With its proper multidimensional arrays the
problem seems quite simple (modulo possible tricks to reduce the
overhead which we wouldn't expect).

To do the runtime bounds checking we want is probably only an hour or
two's work on the front end for someone sufficiently expert on tree
code; Toon and I aren't and we haven't hassled Craig over it.  I think
there are three places in f/com.c where one needs to generate a loop
over the array indices and define/invoke a suitable runtime error when
appropriate.  (AFAIR the relevant middle-end code seemed to get the
information so it could be done there instead.)
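
As a rough illustration of the check described above (written in C
rather than as tree code, and not taken from f/com.c), the loop over
the array indices could look like the sketch below.  The names
dim_bounds and checked_offset are hypothetical, and the caller is
assumed to raise the runtime error when a nonzero value comes back.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical descriptor for one dimension of a Fortran array,
   e.g. A(1:3, 0:4) has bounds {1,3} and {0,4}. */
struct dim_bounds {
    long lower;
    long upper;
};

/* Check every subscript against its declared bounds.  On success,
   store the column-major element offset in *offset and return 0.
   On failure, return the 1-based number of the first offending
   dimension and leave *offset untouched. */
int checked_offset(const struct dim_bounds *dims, const long *subs,
                   size_t rank, size_t *offset)
{
    size_t off = 0;
    size_t stride = 1;

    for (size_t d = 0; d < rank; ++d) {
        assert(dims[d].lower <= dims[d].upper);

        if (subs[d] < dims[d].lower || subs[d] > dims[d].upper)
            return (int)(d + 1);

        off += (size_t)(subs[d] - dims[d].lower) * stride;
        stride *= (size_t)(dims[d].upper - dims[d].lower + 1);
    }

    *offset = off;
    return 0;
}
```

Because the bounds of each dimension are known separately, no pointer
bookkeeping is needed; the check is a pair of comparisons per index.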

The situation in a future with fortran90-style arrays and `pointers'
might be significantly different in that respect, but I guess not.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: array bounds checking?
  1998-05-22  0:21           ` Jeffrey A Law
                               ` (3 preceding siblings ...)
  1998-05-22 16:41             ` Dave Love
@ 1998-05-22 20:21             ` Greg McGary
  1998-05-26  2:32               ` Ken Raeburn
  4 siblings, 1 reply; 24+ messages in thread
From: Greg McGary @ 1998-05-22 20:21 UTC (permalink / raw)
  To: law; +Cc: Greg McGary, Jim Wilson, Dave Love, burley, egcs

Jeffrey A Law <law@hurl.cygnus.com> writes:

> My biggest concern is the change in pointer size; yes BP has
> mechanisms to handle this, but it just makes me uneasy.

There is a problem mixing BP application code with non-BP libc for GNU
packages and other programs that try hard to be portable.  Such
packages often contain declarations of system functions in order to
make up for deficiencies in the vendor's header files, either because
the function is missing entirely, or because the application wishes to
declare it with a prototype.  Unless care is taken, the application's
declaration will use bounded pointers, but the vendor's declaration
and the definition from the vendor's library will have unbounded.
Portable packages also define replacements for system functions when
the vendor's version is buggy.  If a declaration for such a function
appears in a system header, it will use unbounded pointers, but, as
part of the application, the definition might easily use bounded
pointers.

All of these problems can be sidestepped by using a BP libc, but that
option is only available if you have GNU libc, or some other libc that
can be compiled from source.  For those stuck with vendor-supplied
binary-only system libraries, BPs can be considered incentive to move
to an open-source OS!  Alternatively, the package can be reworked to
ensure that boundedness of application-supplied declarations of
standard functions is consistent with the function definitions.

For libraries that aren't libc, the problem is much reduced, to the
extent that the headers for the library are complete and correct.  I
found in practice that mixing BP application with non-BP X11 presented
no problems.

One can also play games with wrappers for standard functions that
convert bounded to unbounded pointers, however that sounds like mucho
work.
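
A minimal sketch of one such wrapper, assuming a three-word pointer
(the value plus low and high limits).  The struct layout and the name
bp_memset are hypothetical and are not taken from the actual BP
implementation; the sketch only shows where the bounded pointer is
checked and then stripped down to a plain pointer for the non-BP libc.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical bounded pointer: the pointer value plus the limits of
   the object it is allowed to touch. */
struct bounded_ptr {
    char *value;
    char *low;    /* first valid byte */
    char *high;   /* one past the last valid byte */
};

/* Check that [value, value + n) lies within [low, high), then hand
   the bare pointer to the unbounded libc memset.
   Returns 0 on success, -1 on a bounds violation. */
int bp_memset(struct bounded_ptr p, int c, size_t n)
{
    assert(p.low <= p.high);

    if (p.value < p.low || p.value > p.high)
        return -1;
    if (n > (size_t)(p.high - p.value))
        return -1;

    memset(p.value, c, n);   /* non-BP library call */
    return 0;
}
```

Every standard function taking or returning a pointer would need a
wrapper of this kind, which is where the work piles up.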

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: array bounds checking?
  1998-05-22 20:21             ` Greg McGary
@ 1998-05-26  2:32               ` Ken Raeburn
  0 siblings, 0 replies; 24+ messages in thread
From: Ken Raeburn @ 1998-05-26  2:32 UTC (permalink / raw)
  To: Greg McGary; +Cc: egcs

Greg McGary <gkm@eng.ascend.com> writes:

> All of these problems can be sidestepped by using a BP libc, but that
> option is only available if you have GNU libc, or some other libc that
> can be compiled from source.  For those stuck with vendor-supplied
> binary-only system libraries, BPs can be considered incentive to move
> to an open-source OS!  Alternatively, the package can be reworked to
> ensure that boundedness of application-supplied declarations of
> standard functions is consistent with the function definitions.

The Checker support already in gcc/egcs has another way to work around
libraries that can't be recompiled: Use a compiler option to rename
the versions of functions compiled to use memory checking, and all
functions they call, and provide a means to write the appropriate
wrapper function for functions you can't just recompile.  The wrapper
function can explicitly check the memory references the library
function is expected to make.

	extern void memset (...);
	extern void checked_memset (...) asm ("chkr.memset");

	void checked_memset (...) {
		chkr_check_addr (...);
		memset (...);
	}

I'm not saying it's the ideal fix, but it might be worth considering.

Ken

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: array bounds checking?
@ 1998-05-24  4:18 Geert Bosch
  0 siblings, 0 replies; 24+ messages in thread
From: Geert Bosch @ 1998-05-24  4:18 UTC (permalink / raw)
  To: law, Per Bothner; +Cc: egcs

On Fri, 22 May 1998 13:13:42 -0600, Jeffrey A Law wrote:

  Yes.  In the Ada world they're usually called virtual origins.  I
  don't know if it's a user-visible feature or just something that
  the optimizer folks deal with.

Currently virtual origins are not used with the GNAT (GNU Ada)
compiler, but this could get implemented in the future. Such
optimizations would not in any way be user-visible. Even taking
the address of such an object has to return the address of the
first storage unit occupied by the actual data. 

Regards,
   Geert



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: array bounds checking?
  1998-05-22 12:21 array bounds checking? Niall Smart
@ 1998-05-22 16:41 ` Greg McGary
  0 siblings, 0 replies; 24+ messages in thread
From: Greg McGary @ 1998-05-22 16:41 UTC (permalink / raw)
  To: Niall Smart; +Cc: law, Greg McGary, Jim Wilson, Dave Love, burley, egcs

njs3@doc.ic.ac.uk (Niall Smart) writes:

> > I looked over the bounds checking and bounded pointer stuff for
> > a little while today.
> 
> Are you guys familiar with the stuff at
> 
> 	http://www.doc.ic.ac.uk/~phjk/BoundsChecking.html

Niall,

Thanks for the heads-up!

Yes, we are familiar with it--it is Richard W.M. Jones's
implementation and is exactly what Jeff is evaluating along with my
bounded pointer implementation.

Greg

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: array bounds checking?
@ 1998-05-22 12:21 Niall Smart
  1998-05-22 16:41 ` Greg McGary
  0 siblings, 1 reply; 24+ messages in thread
From: Niall Smart @ 1998-05-22 12:21 UTC (permalink / raw)
  To: law, Greg McGary; +Cc: Jim Wilson, Dave Love, burley, egcs

On May 21,  9:03pm, Jeffrey A Law wrote:
} Subject: Re: array bounds checking?
> 
> I looked over the bounds checking and bounded pointer stuff for
> a little while today.

Are you guys familiar with the stuff at

	http://www.doc.ic.ac.uk/~phjk/BoundsChecking.html

If you're interested I'm sure Paul would like to know, tell him 
I sent you :)

Niall Smart

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~1998-05-26  2:32 UTC | newest]

Thread overview: 24+ messages
1998-04-18 16:05 Idea: Eliminate libf2c/f2c.h installation from g77 entirely? Craig Burley
1998-04-18 20:56 ` Mumit Khan
1998-04-19  0:11 ` Jeffrey A Law
1998-04-19  1:49   ` Craig Burley
1998-04-19  5:18     ` Dave Love
1998-04-19 11:19       ` Craig Burley
1998-04-19  5:37     ` array bounds checking? Dave Love
1998-04-21 19:10       ` Jim Wilson
1998-05-03 22:27         ` Greg McGary
1998-05-08 16:04           ` Gerald Pfeifer
1998-05-08 16:08             ` Greg McGary
1998-05-22  0:21           ` Jeffrey A Law
1998-05-22  0:32             ` Greg McGary
1998-05-22  5:42             ` Greg McGary
1998-05-22 12:21             ` Per Bothner
1998-05-22 12:14               ` Jeffrey A Law
1998-05-22 16:41               ` Pieter Nagel
1998-05-22 16:41             ` Dave Love
1998-05-22 20:21             ` Greg McGary
1998-05-26  2:32               ` Ken Raeburn
1998-04-19  8:15     ` using libtool Dave Love
1998-05-22 12:21 array bounds checking? Niall Smart
1998-05-22 16:41 ` Greg McGary
1998-05-24  4:18 Geert Bosch
