Re: "Documentation by paper"

public inbox for gcc@gcc.gnu.org

* Re: "Documentation by paper"
@ 2004-01-27 18:56 Richard Kenner
  2004-01-27 19:31 ` Diego Novillo
  0 siblings, 1 reply; 171+ messages in thread
From: Richard Kenner @ 2004-01-27 18:56 UTC (permalink / raw)
  To: asutton; +Cc: gcc

    not everybody learns from the bottom up. using documentation
    generators like doxygen provides a pretty convenient way to provide a
    high-level picture of the system and its components. so yes...  there
    is most definitely value there.

I strongly agree that having high-level documentation is important, but
I don't see that documentation generators produce it!

"high-level documentation" means a very broad overview of a program and
how it works.  This *cannot* be produced automatically since none of
the information in it would be present anywhere else.

What you are talking about producing isn't anything "high-level", but just
a very low-level directory of functions and data structures.

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-01-27 18:56 "Documentation by paper" Richard Kenner
@ 2004-01-27 19:31 ` Diego Novillo
  0 siblings, 0 replies; 171+ messages in thread
From: Diego Novillo @ 2004-01-27 19:31 UTC (permalink / raw)
  To: Richard Kenner; +Cc: asutton, gcc

On Tue, 2004-01-27 at 13:55, Richard Kenner wrote:
>     not everybody learns from the bottom up. using documentation
>     generators like doxygen provides a pretty convenient way to provide a
>     high-level picture of the system and its components. so yes...  there
>     is most definitely value there.
> 
> I strongly agree that having high-level documentation is important, but
> I don't see that documentation generators produce it!
> 
Not 'produce', 'extract'.

If such high-level documentation is written inside a comment at the top
of each file, then you can coax the publishing tool of your choice to
emit it in a dozen different formats.

If folks don't care for having formatting markers in the comments, that's
fine with me.  But it is possible to extract documentation from the
source code directly, and that is what I would like us to do for the
internal API and design documentation, at least.

Diego.
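
The extraction step Diego describes can be sketched in a few lines. This is a minimal illustration, not any existing GCC tool: the helper name `extract_overview` and the sample source text are assumptions made up for the example, and it only handles a `/* ... */` comment at the very top of a file.

```python
import re

# Hypothetical helper (not part of GCC or doxygen): pull the text of the
# first /* ... */ comment, provided it is the first thing in the file.
LEADING_COMMENT = re.compile(r"\A\s*/\*(.*?)\*/", re.DOTALL)


def extract_overview(source):
    """Return the leading block comment as plain text, or None if absent."""
    match = LEADING_COMMENT.match(source)
    if match is None:
        return None
    # Drop per-line indentation and any leading '*' decoration.
    lines = [line.strip().lstrip("*").strip()
             for line in match.group(1).splitlines()]
    return "\n".join(lines).strip()


# Made-up sample file contents, for illustration only.
example = """/* Constant propagation pass.
   Walks the SSA graph and folds constants.  */
#include "config.h"
int main(void) { return 0; }
"""

print(extract_overview(example))
```

The extracted text could then be handed to whatever publishing tool is preferred; the sketch deliberately says nothing about formatting markers inside the comment.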


* Re: "Documentation by paper"
  2004-02-11 15:42                         ` Daniel Berlin
@ 2004-02-11 15:56                           ` Daniel Berlin
  0 siblings, 0 replies; 171+ messages in thread
From: Daniel Berlin @ 2004-02-11 15:56 UTC (permalink / raw)
  To: Daniel Berlin; +Cc: gcc, Jamie Lokier, Kai Henningsen


On Feb 11, 2004, at 10:42 AM, Daniel Berlin wrote:

>
> On Feb 11, 2004, at 7:30 AM, Jamie Lokier wrote:
>
>> Daniel Berlin wrote:
>>> Someone mentioned something that wasn't related to the thread, which is
>>> that they thought they read in a paper that a certain algorithm
>>> required useless phi nodes.
>>> I responded that I had provided evidence to contradict this.
>>
>> You have not provided such evidence.
>
> Pardon?
> Look through the archives.
>
>>
>> What makes you think we read the same paper or the same algorithm?
>
> I provided numbers for SSAPRE, SSA-CCP, and a few other algorithms 
> which claimed they needed extraneous phi nodes.

I didn't mean "needed" here, actually.
Needed implies they were a correctness issue.
Instead, they claimed it might help the algorithm perform better.


* Re: "Documentation by paper"
  2004-02-11 12:38               ` Jamie Lokier
@ 2004-02-11 15:51                 ` Daniel Berlin
  0 siblings, 0 replies; 171+ messages in thread
From: Daniel Berlin @ 2004-02-11 15:51 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: paolo.bonzini, law, kenner, gcc, Peter Barada

>
> As I recall the extra PHI nodes were for correctness of a particular
> algorithm, not performance.  In other words, additional structure
> beyond SSA.
SSA includes phi nodes.  Having extraneous phi nodes is still an SSA 
form, it's just not a minimal one.
If you are referring to some algorithm that requires an extended SSA 
form, such as gated SSA, that would be a different case.

> I could be 100% mistaken, because it was many years ago,
> but your bizarre leap of logic here is still bizarre :/

I have never read an SSA optimization paper that *required* extra phi 
nodes for correctness.  I have read plenty that provide claims that 
extra phi nodes help performance of the algorithm  (all of which are 
based on some original claim by Cytron about pruned vs minimal vs 
semi-pruned).

An algorithm which required useless phi nodes for correctness (IE 
required semi-pruned and not working on minimal) would be 100% 
broken/unsound, and certainly, we would not use it.
--Dan
>
> -- Jamie


* Re: "Documentation by paper"
  2004-02-11 12:31                       ` Jamie Lokier
@ 2004-02-11 15:42                         ` Daniel Berlin
  2004-02-11 15:56                           ` Daniel Berlin
  0 siblings, 1 reply; 171+ messages in thread
From: Daniel Berlin @ 2004-02-11 15:42 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: gcc, Kai Henningsen


On Feb 11, 2004, at 7:30 AM, Jamie Lokier wrote:

> Daniel Berlin wrote:
>> Someone mentioned something that wasn't related to the thread, which is
>> that they thought they read in a paper that a certain algorithm
>> required useless phi nodes.
>> I responded that I had provided evidence to contradict this.
>
> You have not provided such evidence.

Pardon?
Look through the archives.

>
> What makes you think we read the same paper or the same algorithm?

I provided numbers for SSAPRE, SSA-CCP, and a few other algorithms 
which claimed they needed extraneous phi nodes.
--Dan


* Re: "Documentation by paper"
  2004-02-09 18:39             ` law
  2004-02-09 19:12               ` Robert Dewar
@ 2004-02-11 12:38               ` Jamie Lokier
  2004-02-11 15:51                 ` Daniel Berlin
  1 sibling, 1 reply; 171+ messages in thread
From: Jamie Lokier @ 2004-02-11 12:38 UTC (permalink / raw)
  To: law; +Cc: Peter Barada, kenner, paolo.bonzini, gcc

law@redhat.com wrote:
> In message <20040209175301.GB3455@mail.shareable.org>, Jamie Lokier writes:
>  >Granted, but papers do refer to them as "minimal SSA" et al., and the
>  >properties are relevant to algorithms.  Without being 100% certain
>  >from memory, I recall some algorithms require some of the "useless"
>  >PHI nodes to be present.
> The claims were that some algorithms may be able to do a better job at
> optimizing when those useless PHIs were present.  I believe Daniel showed
> some evidence that refuted those claims.

Now I'm really confused.  I don't recall which paper I read that an
algorithm (described in the paper) required certain extra PHI nodes,
i.e. a certain non-minimal requirement, but how can you assume it had
anything at all to do with performance claims in a different paper
that you read?

As I recall the extra PHI nodes were for correctness of a particular
algorithm, not performance.  In other words, additional structure
beyond SSA.  I could be 100% mistaken, because it was many years ago,
but your bizarre leap of logic here is still bizarre :/

-- Jamie


* Re: "Documentation by paper"
  2004-02-10 20:31                     ` Daniel Berlin
  2004-02-10 20:49                       ` Joern Rennecke
@ 2004-02-11 12:31                       ` Jamie Lokier
  2004-02-11 15:42                         ` Daniel Berlin
  1 sibling, 1 reply; 171+ messages in thread
From: Jamie Lokier @ 2004-02-11 12:31 UTC (permalink / raw)
  To: Daniel Berlin; +Cc: Kai Henningsen, gcc

Daniel Berlin wrote:
> Someone mentioned something that wasn't related to the thread, which is 
> that they thought they read in a paper that a certain algorithm 
> required useless phi nodes.
> I responded that I had provided evidence to contradict this.

You have not provided such evidence.

What makes you think we read the same paper or the same algorithm?

-- Jamie


* Re: "Documentation by paper"
  2004-02-10 20:31                     ` Daniel Berlin
@ 2004-02-10 20:49                       ` Joern Rennecke
  2004-02-11 12:31                       ` Jamie Lokier
  1 sibling, 0 replies; 171+ messages in thread
From: Joern Rennecke @ 2004-02-10 20:49 UTC (permalink / raw)
  To: Daniel Berlin; +Cc: gcc, Kai Henningsen

> Someone mentioned something that wasn't related to the thread, which is 
> that they thought they read in a paper that a certain algorithm 
> required useless phi nodes.
> I responded that I had provided evidence to contradict this.
> It had nothing to do with documentation per se, actually, it was just 
> dragged back into the documentation thread by someone saying the fact 
> that there is no restriction where nobody would expect one,  should be 
> documented, which is actually a bit silly.

If there is a paper out there that deals with this algorithm -
particularly if it's a paper used to write the implementation and/or 
referenced in the documentation - which states that there is such
a restriction, somebody who has read the paper will expect this restriction,
unless he has read also some other paper / email / comment / etc that
disproves or at least denies that statement.


* Re: "Documentation by paper"
  2004-02-10 19:51                   ` Kai Henningsen
@ 2004-02-10 20:31                     ` Daniel Berlin
  2004-02-10 20:49                       ` Joern Rennecke
  2004-02-11 12:31                       ` Jamie Lokier
  0 siblings, 2 replies; 171+ messages in thread
From: Daniel Berlin @ 2004-02-10 20:31 UTC (permalink / raw)
  To: Kai Henningsen; +Cc: gcc

On Feb 10, 2004, at 12:46 PM, Kai Henningsen wrote:

> dberlin@dberlin.org (Daniel Berlin)  wrote on 09.02.04 in 
> <67F3F3F9-5B2D-11D8-ABD3-000A95DA505C@dberlin.org>:
>
>> On Feb 9, 2004, at 1:13 PM, Robert Dewar wrote:
>>
>>> Daniel Berlin wrote:
>>>
>>>> This is theoretical.
>>>> I provided some data that showed at least one algorithm that claimed
>>>> they wanted "useless" phi nodes to be present worked slightly better
>>>> without them.
>>>
>>> Well surely it is part of the specification of the *implementation*
>>> whether these "useless" nodes are supposed to be present or not.
>> No, and you've missed my point.
>> The paper on the exact algorithm we implement claims they help.
>> With an implementation implemented *EXACTLY* according to this
>> algorithm, they do not help.
>> That is all.
>> It has nothing to do with an implementation difference.
>
> I smell a case of not seeing the wood for the trees.
Whatever.
Someone mentioned something that wasn't related to the thread, which is 
that they thought they read in a paper that a certain algorithm 
required useless phi nodes.
I responded that I had provided evidence to contradict this.
It had nothing to do with documentation per se, actually, it was just 
dragged back into the documentation thread by someone saying the fact 
that there is no restriction where nobody would expect one,  should be 
documented, which is actually a bit silly.

>
> Or in other words, it's you who didn't get the point. This thread is NOT
> about what a specific optimizer wants to see wrt. PHI nodes, it is about
> DOCUMENTING things like what this optimizer wants to see,

The optimizer doesn't care what it sees here.
What is there to document?

>  whatever that happens to be.

Right.
"This optimizer accepts everything given to it, just the same, and 
optimizes it all equally well"
That's real useful.
Any optimizer that can't optimize certain things or patterns should 
certainly have what it can't optimize noted (if only so it can be 
improved), but that is *NOT* the case here.
Here the case is that there *IS* no restriction.  Thus, there is 
nothing to document in regards to a restriction.
Of course, I'm sure someone else will say that we should document the 
fact that there is no restriction, lest some alien from mars be reading 
the code and on that planet, they assume restrictions that we don't.

Of course, it's that style of documentation that reminds me so much of 
the statutes I end up reading for various classes, and causes my eyes 
and brain to bleed.   Congress tries to document things in at least a 
thousand pages per spec. (I could also point out that even as detailed 
as they are, they never work anyway. Imagine that. Developer 
documentation [which is what it really is] that is both voluminous and 
highly detailed, and doesn't help at all)


* Re: "Documentation by paper"
  2004-02-09 18:26                 ` Daniel Berlin
@ 2004-02-10 19:51                   ` Kai Henningsen
  2004-02-10 20:31                     ` Daniel Berlin
  0 siblings, 1 reply; 171+ messages in thread
From: Kai Henningsen @ 2004-02-10 19:51 UTC (permalink / raw)
  To: gcc

dberlin@dberlin.org (Daniel Berlin)  wrote on 09.02.04 in <67F3F3F9-5B2D-11D8-ABD3-000A95DA505C@dberlin.org>:

> On Feb 9, 2004, at 1:13 PM, Robert Dewar wrote:
>
> > Daniel Berlin wrote:
> >
> >> This is theoretical.
> >> I provided some data that showed at least one algorithm that claimed
> >> they wanted "useless" phi nodes to be present worked slightly better
> >> without them.
> >
> > Well surely it is part of the specification of the *implementation*
> > whether these "useless" nodes are supposed to be present or not.
> No, and you've missed my point.
> The paper on the exact algorithm we implement claims they help.
> With an implementation implemented *EXACTLY* according to this
> algorithm, they do not help.
> That is all.
> It has nothing to do with an implementation difference.

I smell a case of not seeing the wood for the trees.

Or in other words, it's you who didn't get the point. This thread is NOT  
about what a specific optimizer wants to see wrt. PHI nodes, it is about  
DOCUMENTING things like what this optimizer wants to see, whatever that  
happens to be. Don't confuse an example for an argument with the argument  
itself.

Now if you had said "sure, and that's why file XXX says this", *that*  
would have been relevant.

MfG Kai


* Re: "Documentation by paper"
  2004-02-09 16:28     ` law
  2004-02-09 16:45       ` Robert Dewar
@ 2004-02-10 19:51       ` Kai Henningsen
  1 sibling, 0 replies; 171+ messages in thread
From: Kai Henningsen @ 2004-02-10 19:51 UTC (permalink / raw)
  To: gcc

law@redhat.com  wrote on 09.02.04 in <200402091627.i19GRE1F022341@speedy.slc.redhat.com>:

> In message <92K5FmlXw-B@khms.westfalen.de>, Kai Henningsen writes:
>  >law@redhat.com  wrote on 03.02.04 in
>  ><200402031634.i13GYTa3019190@speedy.slc.redhat.com>:
>  >
>  >> In message <10402031550.AA21796@vlsi1.ultra.nyu.edu>, Richard Kenner
>  >> writes: How is that relevant?  Dominators are discussed in several texts
>  >> one can read, including, but not limited to Morgan, Muchnick, Appel,
>  >> Aho, etc. Dominators actually pre-date your decade-ago compiler course.
>  >> Pick up a book and do a little reading :-)
>  >
>  >Hmmm ... are they in the Dragon book? I certainly don't recall them, but
>  >then it's been a while since I last looked in there.
> Yup.  I went back and double checked before making that statement.
>
>  >> At some point you have to assume a base level of knowledge for your
>  >> reader.  Are we going to define CFG in every file which uses the CFG?
>  >> Are we going to define the basic properties of SSA in every
>  >> file which uses that form?
>  >
>  >Ever heard of this thing called a "glossary"? Needn't be in every file
>  >either. How about - daring idea - it gets put into a file called, oh,
>  >"glossary.texi"?
> But again, why bother when there are already things like wikipedia and
> numerous texts which already define these basic terms.  Other folks have
> already defined these basic terms, probably in better verbiage than we'd come
> up with.  Why recreate the wheel?

Because those aren't part of gcc.

And because by your argument, nobody would ever need a glossary. Which is  
obviously absurd, so your argument cannot be right.

MfG Kai


* Re: "Documentation by paper"
  2004-02-09 18:54 ` Zack Weinberg
@ 2004-02-10 19:51   ` Kai Henningsen
  0 siblings, 0 replies; 171+ messages in thread
From: Kai Henningsen @ 2004-02-10 19:51 UTC (permalink / raw)
  To: gcc

zack@codesourcery.com (Zack Weinberg)  wrote on 09.02.04 in <87n07s13uf.fsf@egil.codesourcery.com>:

> kenner@vlsi1.ultra.nyu.edu (Richard Kenner) writes:
>
> >     Are you advocating that there should be no internals manual?  I can
> >     see an argument for "no internals manual is necessary, just use
> >     comments in the code", and I can see an argument for "every internal
> >     interface should at least be mentioned in the internals manual", but
> >     not for some limbo state where there's no way to know if something is
> >     in the manual or not.
> >
> > No.  I'm advocating that the *primary* documentation should be in the
> > source code and that any documentation (internal or external) should be
> > derived from there, not vice versa.
>
> Weird, I thought you didn't like extracting documentation from the
> source code.

The way I read him, as long as he gets his primary readable docs in the  
source, he's willing to have the rest of you extract something from there.

As for the internals manual -

A case *could* be made to reduce that to a sort of "Introduction to GCC  
internals for beginners", including a general introduction and a directory  
of what to find in which file (the latter presumably at least partly  
generated), and having all the rest in some source files.

Or, alternatively, having all the rest in those sourcefiles and extract it  
from there for the manual.

Of course, this then runs head-on into the GPL/GFDL incompatibility ...

Anyway, if one wants to go that way, presumably there'd have to be some  
sort of standard file preamble. Just for sake of illustration, not meant  
as a serious proposal:

----- start of file ----
/* <group> <one-line description>
<copyright/licence boilerplate>*/

/* <long
    description> */

#include ...
...
----- rest of file -----

where group is meant to sort this to the right place in a directory, say  
"c.parser" or whatever - meant to be sorted by <group>.<filename>. (This  
is actually mostly taken from a single look at c-decl.c - adding the  
group.)
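
To make the illustration above concrete, here is a minimal sketch of how such a preamble could be turned into a sorted directory. It assumes exactly the layout shown (first comment: group plus one-line description, second comment: long description). The function name `parse_preamble`, the group names, and the descriptions are hypothetical; only the file name c-decl.c comes from the text above.

```python
import re

COMMENT = re.compile(r"/\*(.*?)\*/", re.DOTALL)


def parse_preamble(filename, source):
    """Parse the illustrative two-comment preamble; return None if absent."""
    comments = COMMENT.findall(source)
    if len(comments) < 2:
        return None
    first_lines = comments[0].strip().splitlines()
    if not first_lines:
        return None
    # First line of the first comment: "<group> <one-line description>".
    group, _, summary = first_lines[0].partition(" ")
    # Second comment: long description, with whitespace collapsed.
    long_description = " ".join(comments[1].split())
    return {
        "key": group + "." + filename,  # sort key: <group>.<filename>
        "summary": summary.strip(),
        "long": long_description,
    }


# Made-up file contents following the template; descriptions are invented.
files = {
    "tree-ssa.c": """/* middle.ssa SSA construction helpers
<copyright/licence boilerplate>*/

/* Builds and
    updates SSA form.  */
""",
    "c-decl.c": """/* c.parser Process declarations and variables
<copyright/licence boilerplate>*/

/* Handles declarations,
    scopes and bindings.  */
""",
}

entries = [parse_preamble(name, text) for name, text in files.items()]
directory = sorted((e for e in entries if e is not None),
                   key=lambda e: e["key"])
for entry in directory:
    print(entry["key"], "-", entry["summary"])
```

Sorting on the combined key keeps files of one group adjacent in the generated directory, which is the only property the template is meant to provide.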

MfG Kai


* Re: "Documentation by paper"
  2004-02-09 18:39             ` law
@ 2004-02-09 19:12               ` Robert Dewar
  2004-02-11 12:38               ` Jamie Lokier
  1 sibling, 0 replies; 171+ messages in thread
From: Robert Dewar @ 2004-02-09 19:12 UTC (permalink / raw)
  To: law; +Cc: Jamie Lokier, Peter Barada, kenner, paolo.bonzini, gcc

law@redhat.com wrote:

> In message <20040209175301.GB3455@mail.shareable.org>, Jamie Lokier writes:
>  >Granted, but papers do refer to them as "minimal SSA" et al., and the
>  >properties are relevant to algorithms.  Without being 100% certain
>  >from memory, I recall some algorithms require some of the "useless"
>  >PHI nodes to be present.

> I'll also note that our optimizers don't care if they're presented with the
> extra PHI nodes or not

Fine, this should be part of the documented spec.




* Re: "Documentation by paper"
@ 2004-02-09 19:05 Richard Kenner
  0 siblings, 0 replies; 171+ messages in thread
From: Richard Kenner @ 2004-02-09 19:05 UTC (permalink / raw)
  To: pkoning; +Cc: gcc

    An internals manual has indexing (or at least it can, and should).
    How do you do that in code docs?

You lost me.

    In my experience, a major problem with gcc -- unless you've been
    working on it full time for at least 5 years -- is that it's so
    complex that it's nearly impossible to get started.  The internals
    manual helps a little, though its indexing and cross-referencing isn't
    nearly good enough.  If the internals manual is deprecated in favor of
    /* ... */, I fear this problem will get much worse, not better.

I haven't heard anybody say it's "deprecated".  The point, though, is that
the code documentation will always need to be more detailed than an internals
manual. This is usually true for even overview documentation.  Some person,
not a tool, needs to figure out what information to include in the
internals manual, how to present it, and how to index it.  That person has
all the code documentation as material to use for the manual, but presumably
would want to both edit and reformat it.


* Re: "Documentation by paper"
  2004-02-09 18:55 Richard Kenner
@ 2004-02-09 18:59 ` Paul Koning
  0 siblings, 0 replies; 171+ messages in thread
From: Paul Koning @ 2004-02-09 18:59 UTC (permalink / raw)
  To: kenner; +Cc: zack, gcc

>>>>> "Richard" == Richard Kenner <kenner@vlsi1.ultra.nyu.edu> writes:

 >> Weird, I thought you didn't like extracting documentation
 >> from the source code.

 Richard> I don't like *automated* extraction, but there's nothing
 Richard> wrong with a human rewriting code documentation to the level
 Richard> appropriate for an internals manual.

An internals manual has indexing (or at least it can, and should).
How do you do that in code docs?

In my experience, a major problem with gcc -- unless you've been
working on it full time for at least 5 years -- is that it's so
complex that it's nearly impossible to get started.  The internals
manual helps a little, though its indexing and cross-referencing isn't
nearly good enough.  If the internals manual is deprecated in favor of
/* ... */, I fear this problem will get much worse, not better.

   paul


* Re: "Documentation by paper"
@ 2004-02-09 18:55 Richard Kenner
  2004-02-09 18:59 ` Paul Koning
  0 siblings, 1 reply; 171+ messages in thread
From: Richard Kenner @ 2004-02-09 18:55 UTC (permalink / raw)
  To: zack; +Cc: gcc

    Weird, I thought you didn't like extracting documentation from the
    source code.

I don't like *automated* extraction, but there's nothing wrong with a human
rewriting code documentation to the level appropriate for an internals manual.


* Re: "Documentation by paper"
  2004-02-09 18:52 Richard Kenner
@ 2004-02-09 18:54 ` Zack Weinberg
  2004-02-10 19:51   ` Kai Henningsen
  0 siblings, 1 reply; 171+ messages in thread
From: Zack Weinberg @ 2004-02-09 18:54 UTC (permalink / raw)
  To: Richard Kenner; +Cc: gcc

kenner@vlsi1.ultra.nyu.edu (Richard Kenner) writes:

>     Are you advocating that there should be no internals manual?  I can
>     see an argument for "no internals manual is necessary, just use
>     comments in the code", and I can see an argument for "every internal
>     interface should at least be mentioned in the internals manual", but
>     not for some limbo state where there's no way to know if something is
>     in the manual or not.
>
> No.  I'm advocating that the *primary* documentation should be in the
> source code and that any documentation (internal or external) should be
> derived from there, not vice versa.

Weird, I thought you didn't like extracting documentation from the
source code.

zw


* Re: "Documentation by paper"
@ 2004-02-09 18:52 Richard Kenner
  2004-02-09 18:54 ` Zack Weinberg
  0 siblings, 1 reply; 171+ messages in thread
From: Richard Kenner @ 2004-02-09 18:52 UTC (permalink / raw)
  To: zack; +Cc: gcc

    Are you advocating that there should be no internals manual?  I can
    see an argument for "no internals manual is necessary, just use
    comments in the code", and I can see an argument for "every internal
    interface should at least be mentioned in the internals manual", but
    not for some limbo state where there's no way to know if something is
    in the manual or not.

No.  I'm advocating that the *primary* documentation should be in the
source code and that any documentation (internal or external) should be
derived from there, not vice versa.

    Your original patches don't contain much in the way of documentation,
    either.

What happened here was that the original implementation had few enough
functions that there wasn't a need for anything other than the localized
documentation, but then things grew to the point where it was indeed needed
and I never went back and added it.  I will do so soon.


* Re: "Documentation by paper"
  2004-02-09 18:37 Richard Kenner
@ 2004-02-09 18:45 ` Zack Weinberg
  0 siblings, 0 replies; 171+ messages in thread
From: Zack Weinberg @ 2004-02-09 18:45 UTC (permalink / raw)
  To: Richard Kenner; +Cc: gcc

kenner@vlsi1.ultra.nyu.edu (Richard Kenner) writes:

>     You invented MEM_ATTRS and the adjust_address family of functions.
>     There is not one word about these things in doc/*.texi.  
>
> I don't believe they should be there, but they should indeed be *somewhere*.

Are you advocating that there should be no internals manual?  I can
see an argument for "no internals manual is necessary, just use
comments in the code", and I can see an argument for "every internal
interface should at least be mentioned in the internals manual", but
not for some limbo state where there's no way to know if something is
in the manual or not.

>     Write documentation for all this, to the standard you are advocating,
>     and let us see if it's good enough.
>
> Fair enough, though there was stuff added after I did the original work.

Your original patches don't contain much in the way of documentation,
either.

zw


* Re: "Documentation by paper"
  2004-02-09 17:53           ` Jamie Lokier
  2004-02-09 18:07             ` Daniel Berlin
@ 2004-02-09 18:39             ` law
  2004-02-09 19:12               ` Robert Dewar
  2004-02-11 12:38               ` Jamie Lokier
  1 sibling, 2 replies; 171+ messages in thread
From: law @ 2004-02-09 18:39 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: Peter Barada, kenner, paolo.bonzini, gcc

In message <20040209175301.GB3455@mail.shareable.org>, Jamie Lokier writes:
 >Granted, but papers do refer to them as "minimal SSA" et al., and the
 >properties are relevant to algorithms.  Without being 100% certain
 >from memory, I recall some algorithms require some of the "useless"
 >PHI nodes to be present.
The claims were that some algorithms may be able to do a better job at
optimizing when those useless PHIs were present.  I believe Daniel showed
some evidence that refuted those claims.

I'll also note that our optimizers don't care if they're presented with the
extra PHI nodes or not (which should be true for any SSA optimizer since
otherwise you'd be required to run a DCE pass before entering any SSA
optimizer to remove dead phi nodes).
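
The dead-phi cleanup mentioned above can be sketched on a toy representation. This is not GCC's data structure or its DCE pass: the dictionary-based instruction table, the opcode strings, and the function name `remove_dead_phis` are all assumptions made for the example. A phi node is treated as dead when nothing reachable from the roots (here, a return) uses its result.

```python
def remove_dead_phis(instructions, roots):
    """Drop phi nodes whose results are never needed.

    instructions: dict mapping an SSA name to (opcode, [operand names]).
    roots: names that must be kept (e.g. returns, stores).
    """
    live = set()
    worklist = list(roots)
    while worklist:
        name = worklist.pop()
        if name in live or name not in instructions:
            continue
        live.add(name)
        # Everything a live instruction reads is live as well.
        worklist.extend(instructions[name][1])
    # Only phi nodes are removed here; other dead code is left alone.
    return {name: ins for name, ins in instructions.items()
            if ins[0] != "phi" or name in live}


# Toy program: y3 merges two values but is never used afterwards.
program = {
    "x1": ("const", []),
    "x2": ("const", []),
    "x3": ("phi", ["x1", "x2"]),
    "y1": ("const", []),
    "y2": ("const", []),
    "y3": ("phi", ["y1", "y2"]),
    "ret": ("return", ["x3"]),
}

pruned = remove_dead_phis(program, roots=["ret"])
print(sorted(pruned))
```

Both `program` and `pruned` are valid SSA in this toy model; the second simply carries fewer phi nodes, which is the sense in which an optimizer should not care which one it is handed.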

I'll note that the differences between pruned, semi-pruned and minimal SSA
are also discussed in a variety of papers and textbooks, though not as far
back as the Dragon book.

jeff



* Re: "Documentation by paper"
@ 2004-02-09 18:37 Richard Kenner
  2004-02-09 18:45 ` Zack Weinberg
  0 siblings, 1 reply; 171+ messages in thread
From: Richard Kenner @ 2004-02-09 18:37 UTC (permalink / raw)
  To: zack; +Cc: gcc

    You invented MEM_ATTRS and the adjust_address family of functions.
    There is not one word about these things in doc/*.texi.  

I don't believe they should be there, but they should indeed be *somewhere*.

    Write documentation for all this, to the standard you are advocating,
    and let us see if it's good enough.

Fair enough, though there was stuff added after I did the original work.


* Re: "Documentation by paper"
  2004-02-09 17:19 Richard Kenner
  2004-02-09 18:14 ` Joe Buck
@ 2004-02-09 18:34 ` Zack Weinberg
  1 sibling, 0 replies; 171+ messages in thread
From: Zack Weinberg @ 2004-02-09 18:34 UTC (permalink / raw)
  To: Richard Kenner; +Cc: gcc

Put your money where your mouth is.

You've been saying all along that the original author is the person
who must write the documentation.  Fine.  You invented MEM_ATTRS and
the adjust_address family of functions.  There is not one word about
these things in doc/*.texi.  The function-level comments for the
adjust_address family do not give any clue of the most important
detail, viz., which one to use for what.  The fields of MEM_ATTRS are
cursorily documented in rtl.h but there is no high-level view.

Write documentation for all this, to the standard you are advocating,
and let us see if it's good enough.

zw


* Re: "Documentation by paper"
@ 2004-02-09 18:28 Robert Dewar
  0 siblings, 0 replies; 171+ messages in thread
From: Robert Dewar @ 2004-02-09 18:28 UTC (permalink / raw)
  To: dberlin, dewar; +Cc: gcc, jamie, kenner, law, paolo.bonzini, peter

> No, and you've missed my point.
> The paper on the exact algorithm we implement claims they help.
> With an implementation implemented *EXACTLY* according to this 
> algorithm, they do not help.
> That is all.
> It has nothing to do with an implementation difference.

Again, remember that this thread is about documentation. This is indeed
an interesting observation, and most certainly something that should be
in the documentation. Just referring to the paper (see subject line
above) would be quite misleading if the paper implies that they help
but the implementation does not use them because of evidence that the
paper is wrong at that stage.


* Re: "Documentation by paper"
  2004-02-09 18:14               ` Robert Dewar
@ 2004-02-09 18:26                 ` Daniel Berlin
  2004-02-10 19:51                   ` Kai Henningsen
  0 siblings, 1 reply; 171+ messages in thread
From: Daniel Berlin @ 2004-02-09 18:26 UTC (permalink / raw)
  To: Robert Dewar; +Cc: paolo.bonzini, Jamie Lokier, law, kenner, gcc, Peter Barada


On Feb 9, 2004, at 1:13 PM, Robert Dewar wrote:

> Daniel Berlin wrote:
>
>> This is theoretical.
>> I provided some data that showed at least one algorithm that claimed 
>> they wanted "useless" phi nodes to be present worked slightly better 
>> without them.
>
> Well surely it is part of the specification of the *implementation*
> whether these "useless" nodes are supposed to be present or not.
No, and you've missed my point.
The paper on the exact algorithm we implement claims they help.
With an implementation implemented *EXACTLY* according to this 
algorithm, they do not help.
That is all.
It has nothing to do with an implementation difference.


* Re: "Documentation by paper"
@ 2004-02-09 18:20 Robert Dewar
  0 siblings, 0 replies; 171+ messages in thread
From: Robert Dewar @ 2004-02-09 18:20 UTC (permalink / raw)
  To: Joe.Buck, kenner; +Cc: gcc, law

> Almost all the differences in the standard compiler terms (for example,
> dominators) that have been mentioned in this discussion, or that I have
> observed, are the difference between < and <= : a partial order relation
> is defined, and some texts use the < form and others use the <= form.
> The texts that use the <= definition usually then use some adjective like
> "strict" or "proper" to get the corresponding < definition.

The definition of basic blocks also varies somewhat. CS in general
suffers from a lack of absolutely well defined terminology, both for
data structures and for algorithms (for instance for some people quicksort
inherently includes the in place exchange algorithm, for others the
essence is only the arbitrary division, and the exchange is a data
structure detail). It never does any harm to define terms :-)
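
As an illustration of the < versus <= point quoted above, here is a minimal sketch in C of the two dominance relations. The four-block diamond CFG, the bitmask encoding, and all of the names (pred_mask, dom, compute_dominators, dominates_p, strictly_dominates_p) are assumptions made up for this example; none of them is taken from GCC.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 4-block diamond CFG: 0 -> 1, 0 -> 2, 1 -> 3, 2 -> 3.  */
enum { N_BLOCKS = 4 };

/* pred_mask[b] has bit p set when block p is a predecessor of block b.  */
static const uint32_t pred_mask[N_BLOCKS] = { 0x0, 0x1, 0x1, 0x6 };

/* dom[b] has bit d set when block d dominates block b (reflexive form).  */
static uint32_t dom[N_BLOCKS];

/* Iterative data-flow solution: dom(entry) = {entry};
   dom(b) = {b} union (intersection of dom(p) over all predecessors p).  */
static void
compute_dominators (void)
{
  const uint32_t all = (1u << N_BLOCKS) - 1;
  bool changed = true;

  dom[0] = 1u;
  for (int b = 1; b < N_BLOCKS; b++)
    dom[b] = all;

  while (changed)
    {
      changed = false;
      for (int b = 1; b < N_BLOCKS; b++)
        {
          uint32_t meet = all;
          for (int p = 0; p < N_BLOCKS; p++)
            if (pred_mask[b] & (1u << p))
              meet &= dom[p];
          meet |= 1u << b;
          if (meet != dom[b])
            {
              dom[b] = meet;
              changed = true;
            }
        }
    }
}

/* The "<=" form: every block dominates itself.  */
static bool
dominates_p (int d, int b)
{
  return (dom[b] >> d) & 1u;
}

/* The "<" form ("strict" or "proper" dominance): excludes d == b.  */
static bool
strictly_dominates_p (int d, int b)
{
  return d != b && dominates_p (d, b);
}
```

After compute_dominators () has run, the two predicates differ only on the diagonal: block 3 dominates itself but does not strictly dominate itself. A one-line comment saying which form a given function means removes the ambiguity.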

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-02-09 17:19 Richard Kenner
@ 2004-02-09 18:14 ` Joe Buck
  2004-02-09 18:34 ` Zack Weinberg
  1 sibling, 0 replies; 171+ messages in thread
From: Joe Buck @ 2004-02-09 18:14 UTC (permalink / raw)
  To: Richard Kenner; +Cc: law, gcc

On Mon, Feb 09, 2004 at 12:22:12PM -0500, Richard Kenner wrote:
>     But again, why bother when there are already things like wikipedia and
>     numerous texts which already define these basic terms.  Other folks
>     have already defined these basic terms, probably in better verbage
>     than we'd come up with.  Why recreate the wheel?
> 
> Once again, because:
> 
> (1) There are usually subtle differences between the definitions in these
> "numerous texts" and we need to be specific which one we need.

Almost all the differences in the standard compiler terms (for example,
dominators) that have been mentioned in this discussion, or that I have
observed, are the difference between < and <= : a partial order relation
is defined, and some texts use the < form and others use the <= form.
The texts that use the <= definition usually then use some adjective like
"strict" or "proper" to get the corresponding < definition.

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-02-09 18:07             ` Daniel Berlin
@ 2004-02-09 18:14               ` Robert Dewar
  2004-02-09 18:26                 ` Daniel Berlin
  0 siblings, 1 reply; 171+ messages in thread
From: Robert Dewar @ 2004-02-09 18:14 UTC (permalink / raw)
  To: Daniel Berlin; +Cc: Jamie Lokier, paolo.bonzini, law, kenner, gcc, Peter Barada

Daniel Berlin wrote:

> This is theoretical.
> I provided some data that showed at least one algorithm that claimed 
> they wanted "useless" phi nodes to be present worked slightly better 
> without them.

Well surely it is part of the specification of the *implementation*
whether these "useless" nodes are supposed to be present or not.
Actually you want to know if the producer is supposed to produce
them, and whether the consumer requires/prohibits/tolerates them.

Remember this thread is about documentation. This kind of detail
is exactly the sort of thing that the documentation should provide.

It's also a very good indication of why you can't easily reverse
documentation from engineering.

Suppose the spec is

Producer is not supposed to produce "useless" nodes.

Consumer is supposed to tolerate them if they are produced.

Now in practice the consumer does not get tested with useless
nodes, but it is supposed to accept them. It may be quite
difficult to tell that this is the case from just the code.
Furthermore, if there is a latent bug where one of these
nodes would blow up the consumer, then the reverse engineered
documentation might incorrectly conclude that the consumer
is not supposed to tolerate them.

It's really not that much trouble to pin things like this
down with proper documentation!
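
A minimal sketch of how such a producer/consumer contract might be written down next to the code. The phi_node structure, the num_uses field, and both function names are hypothetical and much simpler than anything in GCC; "useless" is assumed here to mean a PHI whose result has no uses.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified PHI node; not GCC's actual representation.  */
struct phi_node
{
  int result;     /* SSA name defined by this PHI.  */
  int num_uses;   /* Number of uses of RESULT; 0 means "useless" here.  */
};

/* Contract (the part that cannot be recovered from the code alone):
   - The producer is not supposed to emit useless PHI nodes.
   - Consumers must nevertheless tolerate them if they appear.  */

/* Return true if PHI defines a name that is never used.  */
static bool
phi_useless_p (const struct phi_node *phi)
{
  return phi->num_uses == 0;
}

/* Example consumer: count the PHI nodes that matter.  Per the contract
   above, it skips useless nodes rather than rejecting them.  */
static size_t
count_live_phis (const struct phi_node *phis, size_t n)
{
  size_t live = 0;
  for (size_t i = 0; i < n; i++)
    if (!phi_useless_p (&phis[i]))
      live++;
  return live;
}
```

The two-line contract comment is the part that matters: if the consumer had a latent bug on useless nodes, a reader with only the code could not tell whether that was a bug or the intended behavior.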

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-02-09 17:53           ` Jamie Lokier
@ 2004-02-09 18:07             ` Daniel Berlin
  2004-02-09 18:14               ` Robert Dewar
  2004-02-09 18:39             ` law
  1 sibling, 1 reply; 171+ messages in thread
From: Daniel Berlin @ 2004-02-09 18:07 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: paolo.bonzini, law, kenner, gcc, Peter Barada


On Feb 9, 2004, at 12:53 PM, Jamie Lokier wrote:

> law@redhat.com wrote:
>> In message <20040208062304.GB25666@mail.shareable.org>, Jamie Lokier 
>> writes:
>>> Note that there are several different variations of SSA form,
>>> particularly minimal vs. non-minimal, so a definition of which one is
>>> being used and assumed e.g. by tree-SSA would be good.
>>
>> Minimal vs pruned vs semi-pruned are not properties of the SSA form 
>> -- they
>> only differ in how many useless PHI nodes are initially generated when
>> translating into SSA form.
>
> Granted, but papers do refer to them as "minimal SSA" et al., and the
> properties are relevant to algorithms.  Without being 100% certain
> from memory, I recall some algorithms require some of the "useless"
> PHI nodes to be present.

This is theoretical.
I provided some data that showed at least one algorithm that claimed 
they wanted "useless" phi nodes to be present worked slightly better 
without them.

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-02-09 16:25         ` law
@ 2004-02-09 17:53           ` Jamie Lokier
  2004-02-09 18:07             ` Daniel Berlin
  2004-02-09 18:39             ` law
  0 siblings, 2 replies; 171+ messages in thread
From: Jamie Lokier @ 2004-02-09 17:53 UTC (permalink / raw)
  To: law; +Cc: Peter Barada, kenner, paolo.bonzini, gcc

law@redhat.com wrote:
> In message <20040208062304.GB25666@mail.shareable.org>, Jamie Lokier writes:
>  >Note that there are several different variations of SSA form,
>  >particularly minimal vs. non-minimal, so a definition of which one is
>  >being used and assumed e.g. by tree-SSA would be good.
>
> Minimal vs pruned vs semi-pruned are not properties of the SSA form -- they
> only differ in how many useless PHI nodes are initially generated when
> translating into SSA form.

Granted, but papers do refer to them as "minimal SSA" et al., and the
properties are relevant to algorithms.  Without being 100% certain
from memory, I recall some algorithms require some of the "useless"
PHI nodes to be present.

-- Jamie

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
@ 2004-02-09 17:19 Richard Kenner
  2004-02-09 18:14 ` Joe Buck
  2004-02-09 18:34 ` Zack Weinberg
  0 siblings, 2 replies; 171+ messages in thread
From: Richard Kenner @ 2004-02-09 17:19 UTC (permalink / raw)
  To: law; +Cc: gcc

    But again, why bother when there are already things like wikipedia and
    numerous texts which already define these basic terms.  Other folks
    have already defined these basic terms, probably in better verbage
    than we'd come up with.  Why recreate the wheel?

Once again, because:

(1) There are usually subtle differences between the definitions in these
"numerous texts" and we need to be specific which one we need.

(2) If you're taking some code to read on a plane (or similar), you
don't have access to "things like wikipedia and numerous texts".

(3) We have to explain our datastructures and functions anyway and
it's simpler to do so as part of a definition or explanation than
just referencing one and trying to stitch things in.

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-02-09 16:28     ` law
@ 2004-02-09 16:45       ` Robert Dewar
  2004-02-10 19:51       ` Kai Henningsen
  1 sibling, 0 replies; 171+ messages in thread
From: Robert Dewar @ 2004-02-09 16:45 UTC (permalink / raw)
  To: law; +Cc: Kai Henningsen, gcc

law@redhat.com wrote:

> But again, why bother when there are already things like wikipedia and
> numerous texts which already define these basic terms.  Other folks have
> already defined these basic terms, probably in better verbage than we'd come
> up with.  Why recreate the wheel?

Because precise definitions can vary. When people write mathematical or
scientific papers, they always precisely define terms, and the reason
for this is that there can be subtle (or sometimes not so subtle)
variations in exact definitions. These kinds of variations are exactly
what can cause difficulties in programs.

Furthermore, the general definitions are likely to be a notch too 
abstract. Often in the context of a given program, you can provide
definitions which are slightly more concrete and understandable in
the context of an implementation.

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-02-07  0:14   ` Kai Henningsen
@ 2004-02-09 16:28     ` law
  2004-02-09 16:45       ` Robert Dewar
  2004-02-10 19:51       ` Kai Henningsen
  0 siblings, 2 replies; 171+ messages in thread
From: law @ 2004-02-09 16:28 UTC (permalink / raw)
  To: Kai Henningsen; +Cc: gcc

In message <92K5FmlXw-B@khms.westfalen.de>, Kai Henningsen writes:
 >law@redhat.com  wrote on 03.02.04 in <200402031634.i13GYTa3019190@speedy.slc.
 >redhat.com>:
 >
 >> In message <10402031550.AA21796@vlsi1.ultra.nyu.edu>, Richard Kenner writes
 >> How is that relevant?  Dominators are discussed in several texts one
 >> can read, including, but not limited to Morgan, Muchnick, Appel, Aho, etc.
 >> Dominators actually pre-date your decade-ago compiler course.  Pick up a
 >> book and do a little reading :-)
 >
 >Hmmm ... are they in the Dragon book? I certainly don't recall them, but  
 >then it's been a while since I last looked in there.
Yup.  I went back and double checked before making that statement.

 >> At some point you have to assume a base level of knowledge for your
 >> reader.  Are we going to define CFG in every file which uses the CFG?
 >> Are we going to define the basic properties of SSA in every
 >> file which uses that form?
 >
 >Ever heard of this thing called a "glossary"? Needn't be in every file  
 >either. How about - daring idea - it gets put into a file called, oh,  
 >"glossary.texi"?
But again, why bother when there are already things like wikipedia and
numerous texts which already define these basic terms.  Other folks have
already defined these basic terms, probably in better verbiage than we'd come
up with.  Why recreate the wheel?

jeff

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-02-08  6:23       ` Jamie Lokier
@ 2004-02-09 16:25         ` law
  2004-02-09 17:53           ` Jamie Lokier
  0 siblings, 1 reply; 171+ messages in thread
From: law @ 2004-02-09 16:25 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: Peter Barada, kenner, paolo.bonzini, gcc

In message <20040208062304.GB25666@mail.shareable.org>, Jamie Lokier writes:
 >law@redhat.com wrote:
 >> You can look up SSA, dominator, and a variety of other common terms and get
 >> reasonably concise definitions.
 >
 >Note that there are several different variations of SSA form,
 >particularly minimal vs. non-minimal, so a definition of which one is
 >being used and assumed e.g. by tree-SSA would be good.
Minimal vs pruned vs semi-pruned are not properties of the SSA form -- they
only differ in how many useless PHI nodes are initially generated when
translating into SSA form.

jeff

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-02-03 17:07     ` law
  2004-02-03 17:28       ` Daniel Berlin
@ 2004-02-08  6:23       ` Jamie Lokier
  2004-02-09 16:25         ` law
  1 sibling, 1 reply; 171+ messages in thread
From: Jamie Lokier @ 2004-02-08  6:23 UTC (permalink / raw)
  To: law; +Cc: Peter Barada, kenner, paolo.bonzini, gcc

law@redhat.com wrote:
> You can look up SSA, dominator, and a variety of other common terms and get
> reasonably concise definitions.

Note that there are several different variations of SSA form,
particularly minimal vs. non-minimal, so a definition of which one is
being used and assumed e.g. by tree-SSA would be good.

-- Jamie

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-02-02 22:02 ` Joern Rennecke
@ 2004-02-07  0:15   ` Kai Henningsen
  0 siblings, 0 replies; 171+ messages in thread
From: Kai Henningsen @ 2004-02-07  0:15 UTC (permalink / raw)
  To: gcc

joern.rennecke@superh.com (Joern Rennecke)  wrote on 02.02.04 in <200402022202.i12M24814736@linsvr4.uk.superh.com>:

> > Of course sufficiently powerful pre and post conditions would be able
> > to describe what a function does, but if these conditions are written
> > in a low level language without set comprehension and quantifiers, this
> > can be impractical in practice.
>
> Do you think this is practical for functions like rest_of_compilation
> using *any* description language different from the implementation
> language?

I'd think it is pretty much impossible in the implementation language. (C  
is bad at assert()ing anything complex.)

However, you could try the "English" description language.
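
To illustrate the point about assert(), a small sketch: even the simple post-condition "the output is sorted" needs a hand-written loop standing in for a universal quantifier. The names sorted_p and sort_ints are made up for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* "For all i in [1, n): a[i-1] <= a[i]", spelled out by hand because C
   has no quantifiers to put inside assert().  */
static bool
sorted_p (const int *a, size_t n)
{
  for (size_t i = 1; i < n; i++)
    if (a[i - 1] > a[i])
      return false;
  return true;
}

/* Simple insertion sort whose post-condition is checked via the helper.  */
static void
sort_ints (int *a, size_t n)
{
  for (size_t i = 1; i < n; i++)
    {
      int key = a[i];
      size_t j = i;
      while (j > 0 && a[j - 1] > key)
        {
          a[j] = a[j - 1];
          j--;
        }
      a[j] = key;
    }
  /* Checks ordering only; it does not check that the result is a
     permutation of the input.  */
  assert (sorted_p (a, n));
}
```

Even here the assertion captures only half of what "sorts the array" means, which is why a sentence of English in the comment still earns its place.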

MfG Kai

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-02-03 16:35 ` law
  2004-02-03 16:48   ` Peter Barada
@ 2004-02-07  0:14   ` Kai Henningsen
  2004-02-09 16:28     ` law
  1 sibling, 1 reply; 171+ messages in thread
From: Kai Henningsen @ 2004-02-07  0:14 UTC (permalink / raw)
  To: gcc

law@redhat.com  wrote on 03.02.04 in <200402031634.i13GYTa3019190@speedy.slc.redhat.com>:

> In message <10402031550.AA21796@vlsi1.ultra.nyu.edu>, Richard Kenner writes:
>  >    Which you have never read (at least a modern one), if you do not know
>  >    what a dominator is.  My first compiler course, which used only part
>  >    of Appel's "Modern compiler implementation" (which is nowhere near in
>  >    depth with respect to Muchnick or Morgan) did teach dominators.
>  >
>  >Correct.  My last compiler course was well over a decade ago.
> How is that relevant?  Dominators are discussed in several texts one
> can read, including, but not limited to Morgan, Muchnick, Appel, Aho, etc.
> Dominators actually pre-date your decade-ago compiler course.  Pick up a
> book and do a little reading :-)

Hmmm ... are they in the Dragon book? I certainly don't recall them, but  
then it's been a while since I last looked in there.

The Dragon book certainly covers (a lot) more than the compiler courses I  
took (which were at least two decades ago) - those covered none of those  
terms.

> Which I'm less and less inclined to do since we're using standard
> terminology dating back over 15 years (you can find dominators and
> dominator tree all the way back in the dragon book and probably
> earlier if you care to look).

Ah. Well, if it's in the Dragon book, I consider it easily available  
knowledge. (Gah. Where *did* I put that?)

> At some point you have to assume a base level of knowledge for your
> reader.  Are we going to define CFG in every file which uses the CFG?
> Are we going to define the basic properties of SSA in every
> file which uses that form?

Ever heard of this thing called a "glossary"? Needn't be in every file  
either. How about - daring idea - it gets put into a file called, oh,  
"glossary.texi"?

MfG Kai

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-02-05 19:57 ` Felix Lee
@ 2004-02-06 10:51   ` Robert Dewar
  0 siblings, 0 replies; 171+ messages in thread
From: Robert Dewar @ 2004-02-06 10:51 UTC (permalink / raw)
  To: Felix Lee; +Cc: gcc

Felix Lee wrote:

> kenner@vlsi1.ultra.nyu.edu (Richard Kenner):
> 
>>How could I possibly write documentation on code that I need documentation
>>to understand?  Any documentation I could write would be totally useless!
> 
> 
> why do you say that?  I think it's generally not that hard to
> approach unfamiliar code and add documentation when I figure out
> what it's doing.  it's often just tedious.  of course, my
> understanding can be faulty, but that's what peer review is
> supposed to be for.

Well I already answered this. It is *impossible* to generate
complete documentation this way. Yes, you can say what a function
does (that's the information contained in the code), but you
can't say:

   o  what it is supposed to do (there may be a discrepancy due to
      bugs, or the function may be non-deterministic, i.e. there are
      several ways to achieve the required result, and by documenting
      from the code, you will overspecify.)

   o  why particular choices were made in the way it was written

   o  what it is doing at a higher level of abstraction

   o  what was not done, and why it was not done.
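
A small sketch of the overspecification problem from the first bullet. The function index_of_min is invented for this example; the point is only the difference between what the comment promises and what the loop happens to do.

```c
#include <stddef.h>

/* Return the index of a minimum element of A[0..N-1].  N must be > 0.

   Intent: if the minimum occurs more than once, ANY of its indices is an
   acceptable result; callers must not rely on which one they get.

   Why not documented as "first minimum": the current loop happens to
   return the first one, but that is an accident of the implementation,
   not a promise.  */
static size_t
index_of_min (const int *a, size_t n)
{
  size_t best = 0;
  for (size_t i = 1; i < n; i++)
    if (a[i] < a[best])
      best = i;
  return best;
}
```

Someone documenting this from the code alone would most likely write "returns the index of the first minimum", which is stricter than the intended specification and would wrongly forbid a later rewrite that scans from the other end.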

Certainly there are cases where people who did not write the code
originally can improve the documentation if it is missing, but it
is an illusion to think that this can substitute for accurate and
complete documentation written contemporaneously with the code.

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
@ 2004-02-05 20:09 Richard Kenner
  0 siblings, 0 replies; 171+ messages in thread
From: Richard Kenner @ 2004-02-05 20:09 UTC (permalink / raw)
  To: felix.1; +Cc: gcc

    > How could I possibly write documentation on code that I need
    > documentation to understand?  Any documentation I could write would
    > be totally useless!

    why do you say that?  I think it's generally not that hard to
    approach unfamiliar code and add documentation when I figure out
    what it's doing.

Yes, you can often add missing comments in the body of a function, but the
really important comments, as Robert says, are *why* things are done the
way they are done and what is *intended* to be done.  All you're ever
going to be able to figure out is what *is* being done no matter how
long you look at the code.


    /* Write a function, `added_clobbers_hard_reg_p' this is given an insn_code
       number that needs clobbers and returns 1 if they include a clobber of a
       hard reg and 0 if they just clobber SCRATCH.  */

    I don't have much idea what that comment means, mostly because the
    function name 'added_clobbers_hard_reg_p' doesn't make sense to me.  I
    don't know what the 'added' refers to.

Ooops.  Typo: "this is given" should be "that is given".

    that was also a change you made.  so, starting from the assumption
    that you're not an idiot, I can work on reconstructing your logic,
    which involves looking at insn_invalid_p's callers, digging into
    mailing list discussions, asking you, etc.  after doing all that, I
    could submit patches that summarize what I find, so other people don't
    have to go through the same investigative process.  but all that's too
    much work right now.  I have other things I need to do.  maybe later.

Yes, but note that all of that is about *one* comment on *one* function.
The major issue here isn't the comments on individual functions, but
missing multiple paragraphs that explain an overview of how things
are supposed to work.

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-02-04 14:08 Richard Kenner
  2004-02-04 17:50 ` Joe Buck
@ 2004-02-05 19:57 ` Felix Lee
  2004-02-06 10:51   ` Robert Dewar
  1 sibling, 1 reply; 171+ messages in thread
From: Felix Lee @ 2004-02-05 19:57 UTC (permalink / raw)
  To: gcc

kenner@vlsi1.ultra.nyu.edu (Richard Kenner):
> How could I possibly write documentation on code that I need documentation
> to understand?  Any documentation I could write would be totally useless!

why do you say that?  I think it's generally not that hard to
approach unfamiliar code and add documentation when I figure out
what it's doing.  it's often just tedious.  of course, my
understanding can be faulty, but that's what peer review is
supposed to be for.

as an experiment, I picked a random gcc file, genemit.c.  I've
never looked at it before, and I'm not really familiar with gcc
internals.  you added a function to it a couple years ago

    /* Write a function, `added_clobbers_hard_reg_p' this is given an insn_code
       number that needs clobbers and returns 1 if they include a clobber of a
       hard reg and 0 if they just clobber SCRATCH.  */

    static void
    output_added_clobbers_hard_reg_p ()

I don't have much idea what that comment means, mostly because
the function name 'added_clobbers_hard_reg_p' doesn't make sense
to me.  I don't know what the 'added' refers to.

so I look at who calls added_clobbers_hard_reg_p, which is
insn_invalid_p in recog.c.  that's helpful, and it suggests to me
that if the function were named something like
'adding_clobbers_would_clobber_hard_reg_p' then I'd have
understood the comment better, and maybe even understood the
function without a descriptive comment.

or maybe the word 'added' should just be deleted, since it
doesn't seem to me that where the function's used is relevant to
what it does.  if it were called just 'clobbers_hard_reg_p', then
I wouldn't have been confused by the comment.

but after looking at insn_invalid_p, I'm now unclear why the
added_clobbers_hard_reg_p function exists at all, since my
intuition is that a function named insn_invalid_p should be a
pure function, and the prologue comment doesn't give me any
reason to expect the function to have side effects.  so I'm
puzzled insn_invalid_p modifies the insn it's given.  it seems to
me the modification should happen somewhere earlier, but perhaps
there's a reason for doing it this way.

that was also a change you made.  so, starting from the
assumption that you're not an idiot, I can work on reconstructing
your logic, which involves looking at insn_invalid_p's callers,
digging into mailing list discussions, asking you, etc.  after
doing all that, I could submit patches that summarize what I
find, so other people don't have to go through the same
investigative process.  but all that's too much work right now.
I have other things I need to do.  maybe later.
--

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-02-04 18:58 ` Joe Buck
@ 2004-02-04 19:10   ` Robert Dewar
  0 siblings, 0 replies; 171+ messages in thread
From: Robert Dewar @ 2004-02-04 19:10 UTC (permalink / raw)
  To: Joe Buck; +Cc: Richard Kenner, gcc

Joe Buck wrote:

> OK, the *best* would be written by that almost non-existent programmer who
> combines a love of writing clear English prose with the extraordinary
> discipline required to do it first and keep it current.  But since this
> species is almost non-existent, you need to read my phrase "best
> documentation" for "best documentation achievable in practice".

You are FAR too pessimistic, I know lots of programmers who meet
these criteria. It is of course important that comments are reviewed
with the same energy and intensity used in reviewing code. You simply
have to establish an environment in which comments are first class
citizens and everyone spends effort appropriately.

> In any case, we have to go from what we have: right now we have code that
> is not adequately documented.  Time travel is not an option.

Sure, but we can in many cases ask authors to do the best retrospective
job they can (and preferably not simply tell us to go rummage around in
the code :-)



^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-02-04 18:05 Richard Kenner
@ 2004-02-04 18:58 ` Joe Buck
  2004-02-04 19:10   ` Robert Dewar
  0 siblings, 1 reply; 171+ messages in thread
From: Joe Buck @ 2004-02-04 18:58 UTC (permalink / raw)
  To: Richard Kenner; +Cc: gcc

On Wed, Feb 04, 2004 at 01:07:40PM -0500, Richard Kenner wrote:
>     In my experience, the best documention is written by someone with good
>     technical writing skill as well as some programming knowledge who
>     initially doesn't understand the code, but who learns what he or she needs
>     to know by a dialog with the author.  
> 
> I agree you can develope *adequate* documentation that way, but strongly
> disagree that it's the *best*.

OK, the *best* would be written by that almost non-existent programmer who
combines a love of writing clear English prose with the extraordinary
discipline required to do it first and keep it current.  But since this
species is almost non-existent, you need to read my phrase "best
documentation" for "best documentation achievable in practice".

In any case, we have to go from what we have: right now we have code that
is not adequately documented.  Time travel is not an option.

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
@ 2004-02-04 18:05 Richard Kenner
  2004-02-04 18:58 ` Joe Buck
  0 siblings, 1 reply; 171+ messages in thread
From: Richard Kenner @ 2004-02-04 18:05 UTC (permalink / raw)
  To: Joe.Buck; +Cc: gcc

    In my experience, the best documention is written by someone with good
    technical writing skill as well as some programming knowledge who
    initially doesn't understand the code, but who learns what he or she needs
    to know by a dialog with the author.  

I agree you can develop *adequate* documentation that way, but strongly
disagree that it's the *best*.  When you do this after-the-fact, the author
will likely have forgotten many things that were important to have known
when writing the code.  Also, it's very hard to really find out everything
significant in this "game of 20 questions" approach.

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-02-04 14:08 Richard Kenner
@ 2004-02-04 17:50 ` Joe Buck
  2004-02-05 19:57 ` Felix Lee
  1 sibling, 0 replies; 171+ messages in thread
From: Joe Buck @ 2004-02-04 17:50 UTC (permalink / raw)
  To: Richard Kenner; +Cc: matz, gcc

On Wed, Feb 04, 2004 at 09:11:29AM -0500, Richard Kenner wrote:
> How could I possibly write documentation on code that I need documentation
> to understand?  Any documentation I could write would be totally useless!

In my experience, the best documentation is written by someone with good
technical writing skill as well as some programming knowledge who
initially doesn't understand the code, but who learns what he or she needs
to know by a dialog with the author.  The process of generating the
documentation this way often uncovers bugs, which fall out when the coder
can't explain something, or can't answer a question about a boundary case.

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-02-04  2:56 ` Russ Allbery
@ 2004-02-04 17:26   ` Phil Edwards
  0 siblings, 0 replies; 171+ messages in thread
From: Phil Edwards @ 2004-02-04 17:26 UTC (permalink / raw)
  To: Russ Allbery; +Cc: gcc

On Tue, Feb 03, 2004 at 06:56:49PM -0800, Russ Allbery wrote:
> Richard Kenner <kenner@vlsi1.ultra.nyu.edu> writes:
> 
> > Yes, but a *user* of bison is usually a compiler writer!
> 
> Er, no.  Not even remotely.  A user of bison is usually someone who has
> something they want to parse that's representable by a simple grammar.  I
> would wager that the majority of uses of bison in practice are for parsing
> configuration files.

Most definitely, in my experience.  I taught myself yacc/bison using the
Levine text well before my first compiler course, specifically to write a
parser for various input files for my then-employer.  Nobody else in my
group there had ever written a compiler, but they'd all used some yacc
variant for config or data files.

-- 
"What attribute do you consider most valuable to the politician?"
"To be able to raise a cause which shall produce an effect, and
then *fight the effect*."
    - Captain T.W.S. Kidd and Senatorial candidate Abraham Lincoln

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-02-04 14:46       ` Robert Dewar
@ 2004-02-04 15:56         ` Daniel Berlin
  0 siblings, 0 replies; 171+ messages in thread
From: Daniel Berlin @ 2004-02-04 15:56 UTC (permalink / raw)
  To: Robert Dewar; +Cc: gcc, Michael Matz, Richard Kenner, felix.1

>
> > Thread closed now?  Pretty please?
>
> Feel free to use the amazing kill thread capability of your
> mailer (I trust it has this :-) The subject line has been
> quite clear, and although we have deviated from the original
> point (that a mere reference to a paper does not constitute
> acceptable documentation), the discussion has in fact stayed
> quite consistently on track.

Except that some of us don't use mailers that reply to discussions 
properly, so the replies don't get put in the thread.

This is an inconvenience to all of us, of course.
Maybe people would be more willing to sacrifice and go along with 
certain viewpoints that aren't their own if others would do the same 
(IE "sacrifice" and use threading mail programs) in general. Just a 
thought.  After all, I don't believe this is the first time the whole 
"non-properly threading mailer" problem has been pointed out.  While 
you (Robert), have been accommodating in this regard (and a big thank 
you for that), others have not, even when it hurts all of us (much like 
the lack of documentation).

Then again, in this thread, some are continually appearing to say (I 
don't want to put words in people's mouths) that their way is the only 
way to good software, and that everyone else's viewpoint (even if it 
only minorly varies from their own) is evil and must be destroyed in 
order to produce good software.
As if the exact level of documentation necessary in programs was a 
settled argument in software engineering, and everyone is just being 
lazy and not wanting to follow it.
It's all about compromise, folks.
Here it's quite clear that everyone has their own dug in position on 
this topic, and it's probably better to find out *exactly* what their 
positions are (In terms of what documentation they think should be 
required, and to what level), and find some  compromise, than it is to 
reiterate the positions over and over again and the reasons for them.

Right now what we are doing reminds me of the Simpsons where they are 
getting electro-shock treatment, and are all strapped into chairs 
shocking each other (and of course, Maggie is just randomly hitting 
buttons shocking other family members).
To quote Dr. Marvin Monroe from the Simpsons, "This is not the way to a 
healthy family interaction"

As a lawyer in training, I see this style of "discussion" maybe 8 
million times a day.  It's never produced a solution, and after 
watching it for a while you just want to throttle everyone on both 
sides until they come to a compromise.  Unfortunately, throttling 
people is not considered to be a valid style of mediation.

On a side note, I'd love to be able to just kill thread it instead of 
writing these emails, but as previously stated, some of us don't use 
mailers that do threaded replies properly.

--Dan

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-02-04 14:30     ` Michael Matz
  2004-02-04 14:43       ` Arnaud Charlet
@ 2004-02-04 14:46       ` Robert Dewar
  2004-02-04 15:56         ` Daniel Berlin
  1 sibling, 1 reply; 171+ messages in thread
From: Robert Dewar @ 2004-02-04 14:46 UTC (permalink / raw)
  To: Michael Matz; +Cc: Richard Kenner, felix.1, gcc

Michael Matz wrote:

> This is all nice and fine.  I think you just want to discuss for the sake
> of discussion, because otherwise you would realize that this is all
> already after the fact.  We _have_ code which Kenner thinks is not
> documented very well (the only prove being that he hadn't heard of
> dominators before).  Adjusting any rules whatsoever (or enforcing existing
> once) is not going to help a little bit with that fact.  And still Kenner
> states the obvious again and again instead of helping.  Note that you do 
> a similar thing.  _That_ is my point. 

How can we help? As we point out in detail, we can't write
documentation successfully for undocumented code written
by others. The burden of documentation must lie with the
authors of the code, nothing else can work successfully.

Doing it after the fact is not ideal, but if it has to be
done after the fact OK. But the documentation must be done
by those who wrote the code, know what it is supposed to do,
know why they made the choices they did, know why they did
not do certain things that may otherwise appear obvious
choices, know what invariants hold in the code etc.

An interesting analogy is in real engineering. Suppose you are
Israel, and you have a Mirage jet fighter, but you lack the
documentation (i.e. blueprints) and France suddenly decides
not to provide any more spare parts. Can you reverse engineer
the parts yourself? Answer: almost impossible. You can look at
a ball bearing, for example, and measure its size, but you can't
measure the tolerance, which is all-important. You just have to
have the blueprints with these specifications.

 > Thread closed now?  Pretty please?

Feel free to use the amazing kill thread capability of your
mailer (I trust it has this :-) The subject line has been
quite clear, and although we have deviated from the original
point (that a mere reference to a paper does not constitute
acceptable documentation), the discussion has in fact stayed
quite consistently on track.

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-02-04 14:30     ` Michael Matz
@ 2004-02-04 14:43       ` Arnaud Charlet
  2004-02-04 14:46       ` Robert Dewar
  1 sibling, 0 replies; 171+ messages in thread
From: Arnaud Charlet @ 2004-02-04 14:43 UTC (permalink / raw)
  To: Michael Matz; +Cc: Robert Dewar, Richard Kenner, felix.1, gcc

> And still Kenner
> states the obvious again and again instead of helping.  Note that you do 
> a similar thing.  _That_ is my point.  Thread closed now?  Pretty please?

Maybe Richard is stating the obvious because people are not listening:
Richard cannot contribute these comments since he does not understand
the code in question, and other people are apparently simply
ignoring his requests for adding comments (except for a few welcome
exceptions).

Your messages seem to imply that it is Richard's fault that the code is
undocumented and that he can't document it. I find that pretty unfair,
and representative of the fact that too many people still consider
adding documentation to be unimportant and boring work. This is apparently
your view, which I find too bad.

Arno

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
@ 2004-02-04 14:35 Richard Kenner
  0 siblings, 0 replies; 171+ messages in thread
From: Richard Kenner @ 2004-02-04 14:35 UTC (permalink / raw)
  To: matz; +Cc: gcc

    And still Kenner states the obvious again and again instead of
    helping.  Note that you do a similar thing.  _That_ is my point.

There's no way anybody other than the original author can "help" with the
missing documentation because the documentation, to be useful, must contain
things that *aren't* in the code, and nobody else can know those things.

Moreover, the reason I complain about things is that I don't understand them
because they aren't sufficiently documented (note that none of these to date
have involved tree-ssa code since I haven't looked at it yet).  Given that I
can't understand it, how can I possibly "help" with documenting it?

That's why it's so critical that we enforce standards before code is
committed into the main tree.

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-02-04 14:11   ` Robert Dewar
@ 2004-02-04 14:30     ` Michael Matz
  2004-02-04 14:43       ` Arnaud Charlet
  2004-02-04 14:46       ` Robert Dewar
  0 siblings, 2 replies; 171+ messages in thread
From: Michael Matz @ 2004-02-04 14:30 UTC (permalink / raw)
  To: Robert Dewar; +Cc: Richard Kenner, felix.1, gcc

Hi,

On Wed, 4 Feb 2004, Robert Dewar wrote:

> > You are global write, you can surely apply your documentation patches on
> > your own.  Oh wait ...
> 
> There's something very wrong here. It is essential that documentation
> be written as the code is produced, or where feasible *before* the

This is all nice and fine.  I think you just want to discuss for the sake
of discussion, because otherwise you would realize that this is all
already after the fact.  We _have_ code which Kenner thinks is not
documented very well (the only proof being that he hadn't heard of
dominators before).  Adjusting any rules whatsoever (or enforcing existing
ones) is not going to help a little bit with that fact.  And still Kenner
states the obvious again and again instead of helping.  Note that you do 
a similar thing.  _That_ is my point.  Thread closed now?  Pretty please?

Ciao,
Michael.

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-02-04 13:49 ` Michael Matz
@ 2004-02-04 14:11   ` Robert Dewar
  2004-02-04 14:30     ` Michael Matz
  0 siblings, 1 reply; 171+ messages in thread
From: Robert Dewar @ 2004-02-04 14:11 UTC (permalink / raw)
  To: Michael Matz; +Cc: Richard Kenner, felix.1, gcc

Michael Matz wrote:

> You are global write, you can surely apply your documentation patches on 
> your own.  Oh wait ...

There's something very wrong here. It is essential that documentation
be written as the code is produced, or where feasible *before* the
code is produced. Adding documentation after the fact is almost
never satisfactory for several reasons.

1. The style of adding documentation after the fact tends to go along
with a viewpoint of documentation as some kind of annoying extra
junk work, which is not conducive to getting good documentation.

2. It encourages a style of "I will write the code, but someone
else can document it if they really need documentation". This is
a variation of "real programmers don't need no !@# documentation"
and again is not conducive to getting good documentation.

3. By the time the program is all working, you may well have forgotten
details that in fact are quite critical for understanding things.
Better to write the documentation at the point where you are working
to understand things, and they are fresh in your mind.

4. In the case where someone else is expected to provide this
documentation (apparently what Michael has in mind here), the
process of groveling through code to reconstruct the documentation
can never succeed for three fundamental reasons:

    4a. You end up documenting what the code does, instead of what
        it is supposed to do.

    4b. You can document what the code does, but not why particular
        choices of implementation were made.

    4c. An important part of documentation is what you did NOT do
        and why you did NOT do it. That can never be deduced from
        the code.

Note that paragraph 4 here also shows why "code groveling" itself
can never substitute for good documentation.

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
@ 2004-02-04 14:08 Richard Kenner
  2004-02-04 17:50 ` Joe Buck
  2004-02-05 19:57 ` Felix Lee
  0 siblings, 2 replies; 171+ messages in thread
From: Richard Kenner @ 2004-02-04 14:08 UTC (permalink / raw)
  To: matz; +Cc: gcc

    > The problem is that it doesn't get done!
    
    Thank you for all the nice documentation patches you send instead of
    wasting time by repeating the same rant over and over.  They help
    tremendously to resolve this situation.

How could I possibly write documentation on code that I need documentation
to understand?  Any documentation I could write would be totally useless!

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-02-04 13:20 Richard Kenner
@ 2004-02-04 13:49 ` Michael Matz
  2004-02-04 14:11   ` Robert Dewar
  0 siblings, 1 reply; 171+ messages in thread
From: Michael Matz @ 2004-02-04 13:49 UTC (permalink / raw)
  To: Richard Kenner; +Cc: felix.1, gcc

Hi,

On Wed, 4 Feb 2004, Richard Kenner wrote:

> The problem is that it doesn't get done!

Thank you for all the nice documentation patches you send instead of
wasting time by repeating the same rant over and over.  They help
tremendously to resolve this situation.

> I've sent about a half dozen of such messages and in only *one* of those
> cases did anybody do anything about it.

You are global write, you can surely apply your documentation patches on 
your own.  Oh wait ...

Ciao,
Michael.

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-02-03 22:20   ` Dale Johannesen
@ 2004-02-04 13:48     ` Robert Dewar
  0 siblings, 0 replies; 171+ messages in thread
From: Robert Dewar @ 2004-02-04 13:48 UTC (permalink / raw)
  To: Dale Johannesen; +Cc: gcc, Richard Kenner, s.bosscher

Dale Johannesen wrote:

> While I have no wish to discourage anyone from improving the
> documentation (and I'm curious how many people on this thread
> are actually doing so), I think you exaggerate the dangers.
> "block" means different things to FE people and BE people,
> and there are places where it isn't obvious which is meant until
> you grovel through the code a bit, and we've all been coping with
> it for years.

I am a bit flabbergasted by this reply, which essentially
says "we don't need accurate documentation, because we have
managed in the past by 'grovel[ing] through code a bit'".

The push for good documentation comes from a viewpoint that
this kind of groveling is highly undesirable. Yes, you can
program everything in absolute binary machine language with
no comments at all if you have to, but the fact that you
can manage with an unacceptable state of affairs is not a
good argument for continuing it.

The particular example I gave (the question of whether A
dominates itself) is exactly the sort of thing that can
cause subtle bugs in code, especially in the *maintenance*
of such code. The original author may have absolutely known
the answer to this question (and may well, indoctrinated by
some particular compiler course or book, not even know there
*is* a question, after all several correspondents in this
thread seemed to think the definition of dominator was
universal).

But a maintenance programmer who has taken a different
compiler course or read a different compiler book, may
assume the other definition, and that can indeed result
in confusion and subtle bugs.

The case of a dominator is actually a very nice one. This
is a simple concept that can be explained in a couple of
short sentences. Why on EARTH would anyone object to
including such a definition in the code? I find such
objection inexplicable.

In the GNAT world, we regard it as a bug if someone
coming to read the code new finds the comments incomplete.
We file and fix bug reports of this kind like any other
bug report, since we regard correct and complete
documentation as being as important as correct and
complete code.

I am sure we still have lots of such bugs in the GNAT
code (some of them are marked with ??? in the sources,
often by me :-) Feel free to point out others :-)

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
@ 2004-02-04 13:20 Richard Kenner
  2004-02-04 13:49 ` Michael Matz
  0 siblings, 1 reply; 171+ messages in thread
From: Richard Kenner @ 2004-02-04 13:20 UTC (permalink / raw)
  To: felix.1; +Cc: gcc

    well, that's true for pretty much any word in any natural language.
    it's generally expected that people can resolve ambiguities from
    context (even in math papers).  and in programming, the actual program
    is usually a pretty strong disambiguator.

It most certainly *is not*!  The problem is that programs are not perfect and
may have bugs.  The whole purpose of the documentation is to describe what
the program is *supposed* to do.  If you don't have the details of what it's
supposed to do, you can't tell if it's correct or not.

And these subtle variations in definition are precisely the places where
bugs will reside!

    everyone already agrees that readable code is a good thing.  it
    isn't hard to handle that on a case-by-case basis.  "it took me a
    while to figure out what this meant.  how about this patch for
    clarity?"

The problem is that it doesn't get done!  I've sent about a half dozen of
such messages and in only *one* of those cases did anybody do anything
about it.

That's why we need strict documentation standards that are enforced *before*
code is accepted.

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
@ 2004-02-04 13:10 Richard Kenner
  0 siblings, 0 replies; 171+ messages in thread
From: Richard Kenner @ 2004-02-04 13:10 UTC (permalink / raw)
  To: dalej; +Cc: gcc

    While I have no wish to discourage anyone from improving the
    documentation (and I'm curious how many people on this thread
    are actually doing so), I think you exaggerate the dangers.
    "block" means different things to FE people and BE people,
    and there are places where it isn't obvious which is meant until
    you grovel through the code a bit, and we've all been coping with
    it for years.

And that's a *good thing* that we should seek to emulate in the future?

As I said, the lack of good definition of "block" and "edge" as used in
the CFG code makes that code very inaccessible and very hard to maintain
since it's not clear what the intent of a lot of the code is.

This most certainly needs to be cleaned up!

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-02-03 16:28 Richard Kenner
  2004-02-03 22:06 ` Robert Dewar
@ 2004-02-04  2:56 ` Russ Allbery
  2004-02-04 17:26   ` Phil Edwards
  1 sibling, 1 reply; 171+ messages in thread
From: Russ Allbery @ 2004-02-04  2:56 UTC (permalink / raw)
  To: gcc

Richard Kenner <kenner@vlsi1.ultra.nyu.edu> writes:

> Yes, but a *user* of bison is usually a compiler writer!

Er, no.  Not even remotely.  A user of bison is usually someone who has
something they want to parse that's representable by a simple grammar.  I
would wager that the majority of uses of bison in practice are for parsing
configuration files.

-- 
Russ Allbery (rra@stanford.edu)             <http://www.eyrie.org/~eagle/>

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
       [not found]         ` <dewar@gnat.com>
@ 2004-02-04  0:00           ` Felix Lee
  0 siblings, 0 replies; 171+ messages in thread
From: Felix Lee @ 2004-02-04  0:00 UTC (permalink / raw)
  To: gcc

Robert Dewar <dewar@gnat.com>:
> If any term yields more than 500 pages, you can almost COUNT on the
> fact that there will be subtle variations in the definition of the
> term!

well, that's true for pretty much any word in any natural
language.  it's generally expected that people can resolve
ambiguities from context (even in math papers).  and in
programming, the actual program is usually a pretty strong
disambiguator.

if all those pages refer to the same paper that defines the term,
then there's probably not going to be much variation in usage.
this can be turned into an argument in favor of "documentation by
paper", since semantic drift happens when you don't simply cite
authority, but this is getting silly.

everyone already agrees that readable code is a good thing.  it
isn't hard to handle that on a case-by-case basis.  "it took me a
while to figure out what this meant.  how about this patch for
clarity?"
--

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-02-03 16:57 Richard Kenner
@ 2004-02-03 22:22 ` Robert Dewar
  0 siblings, 0 replies; 171+ messages in thread
From: Robert Dewar @ 2004-02-03 22:22 UTC (permalink / raw)
  To: Richard Kenner; +Cc: law, gcc

Richard Kenner wrote:

> I'd also expect "symbol" and "context-free grammers" to be well known among
> the CS community that would use Bison, but their manual defines them.

Context Free Grammar is another term for which different authors have
subtly different definitions, e.g. are empty rules allowed (they are
of course not necessary).

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-02-03 16:48   ` Peter Barada
                       ` (2 preceding siblings ...)
  2004-02-03 17:09     ` Daniel Berlin
@ 2004-02-03 22:20     ` Robert Dewar
  3 siblings, 0 replies; 171+ messages in thread
From: Robert Dewar @ 2004-02-03 22:20 UTC (permalink / raw)
  To: Peter Barada; +Cc: law, kenner, paolo.bonzini, gcc

Peter Barada wrote:

> Another possiblity is to have one file that has the definition of
> terms, as well as bibliography entries for the papers and books that
> the algorithms were derived from.  That way if someone(like me who
> took his last compiler course twenty years ago using the dragon book)
> wanted to learn more about GCC and especially its optimizers, they could
> use the bibliography as a guide to the relavent papers and books.

Another advantage of this approach is that it helps promote uniformity
in the use of terms. Even with definitions and references, it can be
very confusing if one part of the compiler uses a term with a slightly
different definition than another part.


^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-02-03 22:06 ` Robert Dewar
@ 2004-02-03 22:20   ` Dale Johannesen
  2004-02-04 13:48     ` Robert Dewar
  0 siblings, 1 reply; 171+ messages in thread
From: Dale Johannesen @ 2004-02-03 22:20 UTC (permalink / raw)
  To: Robert Dewar; +Cc: gcc, Dale Johannesen, Richard Kenner, s.bosscher

On Feb 3, 2004, at 2:05 PM, Robert Dewar wrote:
> A little followup on Dominator. I looked through a bunch of
> Google references, and indeed there is a variation on whether
> people consider that a node can dominate itself. Some authors
> use the term strict dominator to refer to the relation that
> excludes A dominating itself, but others use simply Dominator
> to refer to this.
>
> That's typical of little variations that can cause endless
> troubles if you don't have accurate sets of definitions.

While I have no wish to discourage anyone from improving the
documentation (and I'm curious how many people on this thread
are actually doing so), I think you exaggerate the dangers.
"block" means different things to FE people and BE people,
and there are places where it isn't obvious which is meant until
you grovel through the code a bit, and we've all been coping with
it for years.

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-02-03 16:44     ` Felix Lee
@ 2004-02-03 22:18       ` Robert Dewar
       [not found]         ` <dewar@gnat.com>
  0 siblings, 1 reply; 171+ messages in thread
From: Robert Dewar @ 2004-02-03 22:18 UTC (permalink / raw)
  To: Felix Lee; +Cc: gcc

Felix Lee wrote:

> how about, any term that yields more than 500 pages at Google can
> be assumed to be common knowledge in the field. 

If any term yields more than 500 pages, you can almost COUNT on the
fact that there will be subtle variations in the definition of the
term!

There is a reason why scientific papers (including computer science
papers) define their terminology.

Now in some cases, it is reasonable to refer to some standard
reference for the definitions being used, but at LEAST that should
be done. It is definitely a bad idea to assume that everyone has
the same understanding of a term that you have. Just because you
learned a term in a compiler course, or read it in a compiler
text book does NOT mean that everyone uses the term the way you do
and subtle differences are the details that the devil hides in!

In a case like dominator, the definition is easy and
straightforward, and might as well be repeated. That's not
true of all terms, and indeed for some more complex terms
it is reasonable to use references for definitions.

So something like "we use the term XXX to mean [brief
overview of definition]. For a precise definition of the
way this term is used here see YYY." is appropriately
helpful.

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-02-03 21:27   ` Robert Dewar
@ 2004-02-03 22:16     ` Daniel Berlin
  0 siblings, 0 replies; 171+ messages in thread
From: Daniel Berlin @ 2004-02-03 22:16 UTC (permalink / raw)
  To: Robert Dewar; +Cc: gcc, Steven Bosscher, Richard Kenner, paolo.bonzini


On Feb 3, 2004, at 4:26 PM, Robert Dewar wrote:

> Steven Bosscher wrote:
>
>> You can leave out an explanation of what a dominator is and still
>> be self-contained.  What's next, do we have to explain what a finite
>> automaton is in the sources of GCC?
>
> It's definitely worth defining what a dominator is, since this is
> indeed a term that is not used in absolutely identical manner by
> all compiler papers.

Pardon?
This is certainly an interesting claim.
I've never seen a compiler paper that didn't define a dominator as 
"Node V dominates Node U if every path from start to U contains V" 
(excluding s/Node/Vertex/ and s/start/entry/)

I think if they did, the author would likely be ridiculed and possibly 
beaten.
--Dan

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-02-03 16:28 Richard Kenner
@ 2004-02-03 22:06 ` Robert Dewar
  2004-02-03 22:20   ` Dale Johannesen
  2004-02-04  2:56 ` Russ Allbery
  1 sibling, 1 reply; 171+ messages in thread
From: Robert Dewar @ 2004-02-03 22:06 UTC (permalink / raw)
  To: Richard Kenner; +Cc: s.bosscher, gcc

A little followup on Dominator. I looked through a bunch of
Google references, and indeed there is a variation on whether
people consider that a node can dominate itself. Some authors
use the term strict dominator to refer to the relation that
excludes A dominating itself, but others use simply Dominator
to refer to this.

That's typical of little variations that can cause endless
troubles if you don't have accurate sets of definitions.
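
To make the two readings concrete, here is a minimal sketch in Python.
It is not GCC code: the function names and the dictionary
representation of the flow graph (node -> list of successors) are
assumptions made purely for illustration, and it uses the plain
iterative set-intersection method rather than anything clever. The
docstring states which definition is in force, which is the whole
point.

```python
def dominators(cfg, entry):
    """Return {node: set of nodes that dominate it}.

    Definition used here: V dominates U if every path from `entry` to U
    contains V.  Under this definition every node dominates itself.
    Nodes unreachable from `entry` are omitted from the result.
    """
    # Collect the nodes reachable from entry.
    reachable = set()
    stack = [entry]
    while stack:
        node = stack.pop()
        if node in reachable:
            continue
        reachable.add(node)
        stack.extend(cfg.get(node, ()))

    # Build predecessor sets restricted to reachable nodes.
    preds = {node: set() for node in reachable}
    for node in reachable:
        for succ in cfg.get(node, ()):
            preds[succ].add(node)

    # Start from "everything dominates everything" and shrink to a fixpoint.
    dom = {node: set(reachable) for node in reachable}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for node in reachable:
            if node == entry:
                continue
            new = set.intersection(*(dom[p] for p in preds[node])) | {node}
            if new != dom[node]:
                dom[node] = new
                changed = True
    return dom


def strict_dominators(dom):
    """Strict variant: the same relation with 'A dominates A' removed."""
    return {node: ds - {node} for node, ds in dom.items()}


if __name__ == "__main__":
    # A diamond: entry branches to a and b, which both reach exit.
    cfg = {"entry": ["a", "b"], "a": ["exit"], "b": ["exit"], "exit": []}
    dom = dominators(cfg, "entry")
    print(sorted(dom["exit"]))
    print(sorted(strict_dominators(dom)["exit"]))
```

Code that asks "does A dominate B?" gives a different answer for
A == B depending on which of the two functions it consults, which is
exactly the kind of difference a maintainer needs to have spelled out.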

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-02-03 16:00 ` Steven Bosscher
@ 2004-02-03 21:27   ` Robert Dewar
  2004-02-03 22:16     ` Daniel Berlin
  0 siblings, 1 reply; 171+ messages in thread
From: Robert Dewar @ 2004-02-03 21:27 UTC (permalink / raw)
  To: Steven Bosscher; +Cc: Richard Kenner, paolo.bonzini, gcc

Steven Bosscher wrote:

> You can leave out an explanation of what a dominator is and still
> be self-contained.  What's next, do we have to explain what a finite
> automaton is in the sources of GCC?

It's definitely worth defining what a dominator is, since this is
indeed a term that is not used in absolutely identical manner by
all compiler papers. You will certainly see in such a paper that
terms like this are typically defined. Since papers don't see fit
to assume that everyone knows the terminology, I hardly think that
gcc can take a different position :-)

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-02-03 15:40 Paolo Bonzini
@ 2004-02-03 21:21 ` Robert Dewar
  0 siblings, 0 replies; 171+ messages in thread
From: Robert Dewar @ 2004-02-03 21:21 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kenner, law, gcc

Paolo Bonzini wrote:

>>>I assume that everyone knows what dominators and post-dominators are?
>>>What about a dominator tree?  Value numbering? SSA form?  How using
>>>value numbering on the SSA form during a dominator walk gives us
>>>redundancy elimination on an almost-global scale without the need to
>>>invalidate values from our hash tables?

Note that it is VERY dangerous to rely on everyone knowing what
you mean when you use terms like this. It is very often the case
that different authors use the same term in either very different
senses, or, more worryingly, in subtly different senses.

So it is very important to define terms here. In the case where
definitions correspond to those used by some particular standard
reference book, it is fine to refer to that book for additional
information, but the definitions should be reasonably complete.

Consider that even the definition of prime number, or natural
number, is controversial :-)

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-02-03 17:07     ` law
@ 2004-02-03 17:28       ` Daniel Berlin
  2004-02-08  6:23       ` Jamie Lokier
  1 sibling, 0 replies; 171+ messages in thread
From: Daniel Berlin @ 2004-02-03 17:28 UTC (permalink / raw)
  To: law; +Cc: gcc, Peter Barada, kenner, paolo.bonzini

> Yup.  Though again wikipedia is probably a better solution for that
> problem as well :-)
>
> You can look up SSA, dominator, and a variety of other common terms 
> and get
> reasonably concise definitions.
>
> Having a bibliography is definitely a good thing, particularly since I 
> want
> to see us working more with well known, published algorithms rather 
> than
> cobbling together our own from scratch.

This bibliography could, of course, be auto-generated from comments 
containing paper references in the files.

(Sorry, i had to).
--Dan
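
A rough idea of what such extraction could look like, as a sketch in
Python only. It assumes a convention that does not exist in the GCC
sources today: a comment line carrying the tag "Reference:" followed
by the citation. The tag, the function name collect_references and the
file names in the example are all hypothetical.

```python
import re

# Hypothetical convention: "Reference: <citation>" somewhere in a comment,
# with the citation running to the end of that line.
REF = re.compile(r"Reference:\s*(.+)")


def collect_references(files):
    """Build a bibliography from tagged comments.

    `files` maps a file name to its source text.  Returns a list of
    (citation, [file names citing it]) sorted by citation.
    """
    index = {}
    for name, text in files.items():
        for match in REF.finditer(text):
            # Drop a trailing C comment terminator and surrounding blanks.
            citation = match.group(1).strip().rstrip("*/").strip()
            index.setdefault(citation, set()).add(name)
    return [(citation, sorted(names)) for citation, names in sorted(index.items())]


if __name__ == "__main__":
    files = {
        "pass_a.c": "/* Reference: Paper A.  */\nint x;\n",
        "pass_b.c": "/* Reference: Paper A.  */\n/* Reference: Paper B.  */\n",
    }
    for citation, names in collect_references(files):
        print(citation, "cited in", ", ".join(names))
```

Note that this only collects what the authors wrote down; it does not
produce the definitions themselves.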

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-02-03 17:09     ` Daniel Berlin
@ 2004-02-03 17:28       ` Peter Barada
  0 siblings, 0 replies; 171+ messages in thread
From: Peter Barada @ 2004-02-03 17:28 UTC (permalink / raw)
  To: dberlin; +Cc: gcc, law, paolo.bonzini, kenner


>If you really want to learn the basics, you could always look at slides 
>from todays compiler construction courses.
>
>I did a quick google search and found at least 5 nice sets of slides 
>that explain this stuff very well, some with bibliographies at the end 
>(and some with copies of the papers).

You have a URL?

>However, a bunch of compiler writers trying to keep an up to date 
>bibliography of seminal papers in compiler history seems like asking 
>for trouble.

That would be a detriment since instead of reflecting the algorithms used to
write the GCC optimizer passes, the bibliography would reflect the
current state of compiler writing.

Of course if the bibliography gets *really* out of date, then it's time
to visit the optimizer passes and update the code :-)

-- 
Peter Barada
peter@the-baradas.com

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-02-03 16:56 Richard Kenner
@ 2004-02-03 17:13 ` Lars Segerlund
  0 siblings, 0 replies; 171+ messages in thread
From: Lars Segerlund @ 2004-02-03 17:13 UTC (permalink / raw)
  To: gcc

On Tue, 3 Feb 04 11:58:34 EST
kenner@vlsi1.ultra.nyu.edu (Richard Kenner) wrote:

>      >Correct.  My last compiler course was well over a decade ago.
>     How is that relevant?
> 
> It acknowleged the statement made.
> 
>     At some point you have to assume a base level of knowledge for your
>     reader.  Are we going to define CFG in every file which uses the CFG?
> 
> No, but we need to in *some* file.
> 
>     Are we going to define the basic properties of SSA in every file which
>     uses that form? 
> 
> Likewise.
> 
>     Hmm, wait, you have to know what blocks and edges are to understand
>     what a CFG is, so every file which uses the CFG has to also define
>     blocks & edges.
> 
> No, but *something* needs to.  In fact, it's very important to define
> what a "block" is because the definition is quite complex and depends on
> such things as whether -fno-call-exceptions (or whatever it's called) is
> enabled.  Indeed, there has ben considerable confusion about how jump
> tables fit into blocks.
> 
> In it's simplest form, edges can be defined in one sentence.

 Hi hi ha ha ha ho ho ho ... stop this is killing me :-) .....

 / pun intended, Lars Segerlund.


>  But actually
> need a lot more than that because of all the special edge types we define
> for EH, for example.
> 
> Indeed I find the CFG code very hard to read (and even harder to modify)
> precisely because these things are *not* defined.
> 
> So you have chosen a good example!

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-02-03 16:48   ` Peter Barada
  2004-02-03 17:03     ` Paul Koning
  2004-02-03 17:07     ` law
@ 2004-02-03 17:09     ` Daniel Berlin
  2004-02-03 17:28       ` Peter Barada
  2004-02-03 22:20     ` Robert Dewar
  3 siblings, 1 reply; 171+ messages in thread
From: Daniel Berlin @ 2004-02-03 17:09 UTC (permalink / raw)
  To: Peter Barada; +Cc: gcc, law, paolo.bonzini, kenner

>
> Another possiblity is to have one file that has the definition of
> terms, as well as bibliography entries for the papers and books that
> the algorithms were derived from.  That way if someone(like me who
> took his last compiler course twenty years ago using the dragon book)
> wanted to learn more about GCC and especially its optimizers, they 
> could
> use the bibliography as a guide to the relavent papers and books.

If you really want to learn the basics, you could always look at slides 
from today's compiler construction courses.

I did a quick google search and found at least 5 nice sets of slides 
that explain this stuff very well, some with bibliographies at the end 
(and some with copies of the papers).

However, a bunch of compiler writers trying to keep an up to date 
bibliography of seminal papers in compiler history seems like asking 
for trouble.
--Dan

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-02-03 16:48   ` Peter Barada
  2004-02-03 17:03     ` Paul Koning
@ 2004-02-03 17:07     ` law
  2004-02-03 17:28       ` Daniel Berlin
  2004-02-08  6:23       ` Jamie Lokier
  2004-02-03 17:09     ` Daniel Berlin
  2004-02-03 22:20     ` Robert Dewar
  3 siblings, 2 replies; 171+ messages in thread
From: law @ 2004-02-03 17:07 UTC (permalink / raw)
  To: Peter Barada; +Cc: kenner, paolo.bonzini, gcc

In message <20040203164801.62EB4990D5@baradas.org>, Peter Barada writes:
 >Another possiblity is to have one file that has the definition of
 >terms, as well as bibliography entries for the papers and books that
 >the algorithms were derived from.  That way if someone(like me who
 >took his last compiler course twenty years ago using the dragon book)
 >wanted to learn more about GCC and especially its optimizers, they could
 >use the bibliography as a guide to the relavent papers and books.
Yup.  Though again wikipedia is probably a better solution for that
problem as well :-)

You can look up SSA, dominator, and a variety of other common terms and get
reasonably concise definitions.

Having a bibliography is definitely a good thing, particularly since I want
to see us working more with well known, published algorithms rather than
cobbling together our own from scratch.


Jeff



^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-02-03 16:48   ` Peter Barada
@ 2004-02-03 17:03     ` Paul Koning
  2004-02-03 17:07     ` law
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 171+ messages in thread
From: Paul Koning @ 2004-02-03 17:03 UTC (permalink / raw)
  To: peter; +Cc: law, kenner, paolo.bonzini, gcc

>>>>> "Peter" == Peter Barada <peter@the-baradas.com> writes:

 Peter> Another possiblity is to have one file that has the definition
 Peter> of terms, as well as bibliography entries for the papers and
 Peter> books that the algorithms were derived from.  That way if
 Peter> someone(like me who took his last compiler course twenty years
 Peter> ago using the dragon book) wanted to learn more about GCC and
 Peter> especially its optimizers, they could use the bibliography as
 Peter> a guide to the relavent papers and books.

I would welcome that too.  None of the terms that have been argued
about here were ones that showed up in my 1970s vintage compiler
course or textbooks.  Telling people to read up in modern texts is
fine, but it would help to know which ones.  I assume it is still
true, as it was in the 1970s, that some compiler textbooks are fit
only for kindling.

     paul

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
@ 2004-02-03 16:57 Richard Kenner
  2004-02-03 22:22 ` Robert Dewar
  0 siblings, 1 reply; 171+ messages in thread
From: Richard Kenner @ 2004-02-03 16:57 UTC (permalink / raw)
  To: law; +Cc: gcc

    I would disagree strongly.  dominators, dominator tree, SSA, value
    numbering, loop tree are quite well known within the compiler
    community.  In fact, I would hazard a guess that it's probably just
    some GCC folks that don't have a firm grasp on these concepts.  They
    are what I would consider basic terminology and concepts for anyone in
    this field.

I'd also expect "symbol" and "context-free grammars" to be well known among
the CS community that would use Bison, but their manual defines them.

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
@ 2004-02-03 16:56 Richard Kenner
  2004-02-03 17:13 ` Lars Segerlund
  0 siblings, 1 reply; 171+ messages in thread
From: Richard Kenner @ 2004-02-03 16:56 UTC (permalink / raw)
  To: law; +Cc: gcc

     >Correct.  My last compiler course was well over a decade ago.
    How is that relevant?

It acknowledged the statement made.

    At some point you have to assume a base level of knowledge for your
    reader.  Are we going to define CFG in every file which uses the CFG?

No, but we need to in *some* file.

    Are we going to define the basic properties of SSA in every file which
    uses that form? 

Likewise.

    Hmm, wait, you have to know what blocks and edges are to understand
    what a CFG is, so every file which uses the CFG has to also define
    blocks & edges.

No, but *something* needs to.  In fact, it's very important to define
what a "block" is because the definition is quite complex and depends on
such things as whether -fno-call-exceptions (or whatever it's called) is
enabled.  Indeed, there has been considerable confusion about how jump
tables fit into blocks.

In its simplest form, edges can be defined in one sentence.  But we actually
need a lot more than that because of all the special edge types we define
for EH, for example.

Indeed I find the CFG code very hard to read (and even harder to modify)
precisely because these things are *not* defined.
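
To make the request concrete, here is a minimal sketch of the kind of
definition that could sit in one place.  It is Python rather than GCC code,
and every name in it (Block, Edge, EdgeKind, add_edge) is hypothetical; the
three edge kinds are only examples of the sort of special cases mentioned
above, not GCC's actual list.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class EdgeKind(Enum):
    """Illustrative edge kinds only; a real compiler defines more."""
    FALLTHRU = auto()   # control simply falls into the next block
    BRANCH = auto()     # an explicit jump
    EH = auto()         # exceptional control flow


@dataclass(eq=False)    # compare by identity; the graph may be cyclic
class Block:
    index: int
    succs: list = field(default_factory=list)   # outgoing Edge objects
    preds: list = field(default_factory=list)   # incoming Edge objects


@dataclass(eq=False)
class Edge:
    src: Block
    dest: Block
    kind: EdgeKind


def add_edge(src, dest, kind):
    """Create an edge and record it on both of its endpoints."""
    edge = Edge(src, dest, kind)
    src.succs.append(edge)
    dest.preds.append(edge)
    return edge


# A block that may either fall through or raise an exception.
entry, normal, handler = Block(0), Block(1), Block(2)
add_edge(entry, normal, EdgeKind.FALLTHRU)
add_edge(entry, handler, EdgeKind.EH)
```

Even this toy version shows why one sentence is not enough: whether the
second edge exists at all depends on what counts as ending a block.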

So you have chosen a good example!

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-02-03 16:20 Richard Kenner
@ 2004-02-03 16:54 ` Jan Hubicka
  0 siblings, 0 replies; 171+ messages in thread
From: Jan Hubicka @ 2004-02-03 16:54 UTC (permalink / raw)
  To: Richard Kenner; +Cc: paolo.bonzini, gcc

>     Everybody with less than guru knowledge of gcc asks, why.  And why CSE
>     propagates addressof into memories.  And who takes care of it for
>     nonoptimizing compilation.  And whether it is even needed for
>     nonoptimizing compilation.  And the flow of thought goes on.
> 
> Indeed.  The ADDRESSOF addition was one that gave GCC a lot of
> trouble, precisely because of the issues you raise, that its semantics
> were very poorly defined.
> 
> I see that as cautionary that we need to do better in the future, not
> as something to emulate into the future!

I hope that addressof will be eliminated soon.  In fact I do have a patch
for this for tree-SSA, but we still need to resolve how to optimize
out casts of the style *(const char *)&char_variable that occur
commonly in gimplified C++ code (Richard Henderson has plans for fixing
the C++ FE to avoid producing this, so I hope this will be sorted out soon).
> 
>     Passes that are developed right now, especially as part of the
>     tree-ssa work (but also Jan and Zdenek's rtlopt work), work at a
>     different, more generic level, and require a different kind of culture
>     on the reader.
> 
> Perhaps, but we have to be very careful of setting too high a level of
> knowledge needed to be able to work on GCC.  Sure, if you're working on
> subtle parts of these algorithms, you need to understand them "cold", but
> an overview description is essential for those doing minor work in these
> files.

I would agree here that the notion of counts/frequencies/probabilities,
hotness/coldness of basic blocks and the various types of edges is mildly
confusing (the last one even to someone who knows what a CFG/profile is).

I tried to use quite precise names that result in longer identifiers
(like maybe_hot_bb_p/probably_cold_bb_p) that indicate what kind of
approximation it is, but it is still not very obvious.

I wrote a chapter for gccint.texi devoted to the CFG and profile and I think
it is quite a shame that it has never been accepted.  Steven has an updated
version.  This only shows that it is quite difficult to judge where the
documentation should go in the current scheme.

Honza
> 
> As I said, the Bison manual even defines the term "symbol".  That sort of
> approach seems the appropriate one to me.

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-02-03 16:35 ` law
@ 2004-02-03 16:48   ` Peter Barada
  2004-02-03 17:03     ` Paul Koning
                       ` (3 more replies)
  2004-02-07  0:14   ` Kai Henningsen
  1 sibling, 4 replies; 171+ messages in thread
From: Peter Barada @ 2004-02-03 16:48 UTC (permalink / raw)
  To: law; +Cc: kenner, paolo.bonzini, gcc


> >Of course not, but it's always useful to define each term used at least
> >to some extent.
>Which I'm less and less inclined to do since we're using standard 
>terminology dating back over 15 years (you can find dominators and 
>dominator tree all the way back in the dragon book and probably 
>earlier if you care to look).
>
>At some point you have to assume a base level of knowledge for your
>reader.  Are we going to define CFG in every file which uses the CFG?
>Are we going to define the basic properties of SSA in every
>file which uses that form?    Hmm, wait, you have to know what blocks
>and edges are to understand what a CFG is, so every file which uses the
>CFG has to also define blocks & edges.  It's actually rather silly when you
>start to think about it -- particularly when we use a series of building
>blocks (blocks, edge, CFG, dominators, loop tree, SSA, value numbering) to 
>implement an optimization or analysis pass.

Another possibility is to have one file that has the definition of
terms, as well as bibliography entries for the papers and books that
the algorithms were derived from.  That way if someone (like me who
took his last compiler course twenty years ago using the dragon book)
wanted to learn more about GCC and especially its optimizers, they could
use the bibliography as a guide to the relevant papers and books.

-- 
Peter Barada
peter@the-baradas.com

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-02-03 12:12 Richard Kenner
@ 2004-02-03 16:46 ` Felix Lee
  0 siblings, 0 replies; 171+ messages in thread
From: Felix Lee @ 2004-02-03 16:46 UTC (permalink / raw)
  To: gcc

kenner@vlsi1.ultra.nyu.edu (Richard Kenner):
> In some ways, sure, but often the best way to study code is to do it
> on a long plane flight and that means having everything in a couple of
> well-defined places.

gcc/doc is already over half the size of the complete works of
Shakespeare.  that's a very long plane flight.

core gcc source (no subdirectories) is about 4 Shakespeares.
gcc/cp is another Shakespeare.
gcc/f is another Shakespeare.
gcc/objc is tiny, 3 Macbeths.

1 Shakespeare ~ 5M chars
    1 Macbeth ~ 100K chars
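
Adding up the figures above (a rough sketch in Python; the numbers are the
approximations quoted in this message, not fresh measurements, and the
dictionary keys are just labels):

```python
# Unit sizes from the table above, in characters.
SHAKESPEARE = 5_000_000
MACBETH = 100_000

source_sizes = {
    "gcc (core, no subdirectories)": 4 * SHAKESPEARE,
    "gcc/cp": 1 * SHAKESPEARE,
    "gcc/f": 1 * SHAKESPEARE,
    "gcc/objc": 3 * MACBETH,
}

# Total size of the four source entries, in characters and in Shakespeares.
total_chars = sum(source_sizes.values())
total_shakespeares = total_chars / SHAKESPEARE
print(total_chars, total_shakespeares)
```

that is a little over 6 Shakespeares of source, before counting gcc/doc.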
--

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-02-03  8:09   ` law
@ 2004-02-03 16:44     ` Felix Lee
  2004-02-03 22:18       ` Robert Dewar
  0 siblings, 1 reply; 171+ messages in thread
From: Felix Lee @ 2004-02-03 16:44 UTC (permalink / raw)
  To: gcc

law@redhat.com:
> One of the interesting questions I'm working through right now is to
> figure out how much I should assume the reader knows.  For example,

how about, any term that yields more than 500 pages at Google can
be assumed to be common knowledge in the field.  like, "register
coloring" (with quotes) gets 572 pages.  well, if that's
considered an obscure technique, then maybe raise the threshold
to 2000 or so.
--

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-02-03 16:23 ` Steven Bosscher
@ 2004-02-03 16:40   ` law
  0 siblings, 0 replies; 171+ messages in thread
From: law @ 2004-02-03 16:40 UTC (permalink / raw)
  To: Steven Bosscher; +Cc: Richard Kenner, gcc

In message <200402031724.10531.s.bosscher@student.tudelft.nl>, Steven Bosscher 
writes:
 >discussion was, AFAICT, about GCC internals documentation.  I honestly don't
 >believe that anyone working on compiler optimizations does not know/understand
 >the concept of dominance, or DFA, and so on.
Well, certainly such people do exist (probably concentrated in the GCC
community :(.  But I'm not particularly interested in trying to re-invent
what so many basic compiler texts already have done as far as explaining these
basic concepts.


 > These are just computer science fundamentals.  A user doesn't have to
 > understand them, but any compiler writer should.
Agreed 100%.


jeff

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-02-03 16:16 Richard Kenner
  2004-02-03 16:23 ` Steven Bosscher
@ 2004-02-03 16:38 ` law
  1 sibling, 0 replies; 171+ messages in thread
From: law @ 2004-02-03 16:38 UTC (permalink / raw)
  To: Richard Kenner; +Cc: s.bosscher, gcc

In message <10402031618.AA21947@vlsi1.ultra.nyu.edu>, Richard Kenner writes:
 >As an example of the sort of thing I'd expect to see, look at the Bison
 >manual.  Not only does it not simply leave "LALR(1)" to be looked up (and
 >that's a much more well known term than those previously mentioned), it even
 >defines "context-free grammar" and "symbol".
I would disagree strongly.  dominators, dominator tree, SSA, value numbering,
loop tree are quite well known within the compiler community.  In fact, I
would hazard a guess that it's probably just some GCC folks that don't have
a firm grasp on these concepts.  They are what I would consider basic
terminology and concepts for anyone in this field.

jeff




^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
@ 2004-02-03 16:37 Paolo Bonzini
  0 siblings, 0 replies; 171+ messages in thread
From: Paolo Bonzini @ 2004-02-03 16:37 UTC (permalink / raw)
  To: Richard Kenner; +Cc: gcc

> Yes, but a *user* of bison is usually a compiler writer!

I used bison in the project for an image processing course.  I documented
why my grammar has shift-reduce conflicts; but not why I used
corner-reflection to implement image convolution, only that I implemented
it.

Paolo

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-02-03 15:48 Richard Kenner
  2004-02-03 16:00 ` Steven Bosscher
  2004-02-03 16:07 ` Paolo Bonzini
@ 2004-02-03 16:35 ` law
  2004-02-03 16:48   ` Peter Barada
  2004-02-07  0:14   ` Kai Henningsen
  2 siblings, 2 replies; 171+ messages in thread
From: law @ 2004-02-03 16:35 UTC (permalink / raw)
  To: Richard Kenner; +Cc: paolo.bonzini, gcc

In message <10402031550.AA21796@vlsi1.ultra.nyu.edu>, Richard Kenner writes:
 >    Which you have never read (at least a modern one), if you do not know
 >    what a dominator is.  My first compiler course, which used only part
 >    of Appel's "Modern compiler implementation" (which is nowhere near in
 >    depth with respect to Muchnick or Morgan) did teach dominators.
 >
 >Correct.  My last compiler course was well over a decade ago.
How is that relevant?  Dominators are discussed in several texts one 
can read, including, but not limited to Morgan, Muchnick, Appel, Aho, etc.
Dominators actually pre-date your decade-ago compiler course.  Pick up a
book and do a little reading :-)

What has changed in the last 15 years is that dominators are used by a
variety of optimization and analysis algorithms instead of primarily being
used for loop discovery.
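
For anyone who wants the one-paragraph version: a block D dominates a block
N if every path from the entry to N goes through D.  The sketch below is
the textbook iterative computation, written in Python for brevity; it is not
GCC's implementation, and the function name and the tiny example graph are
made up for illustration.

```python
def dominators(succs, entry):
    """Map each block to the set of blocks that dominate it.

    `succs` maps every block name to a list of its successors.
    """
    nodes = set(succs)
    preds = {n: set() for n in nodes}
    for n, outs in succs.items():
        for s in outs:
            preds[s].add(n)

    # Start from "everything dominates everything" and shrink to a fixpoint.
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            incoming = [dom[p] for p in preds[n]]
            common = set.intersection(*incoming) if incoming else set()
            new = {n} | common
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


# A single loop: header -> body -> header, with an exit from the header.
cfg = {
    "entry": ["header"],
    "header": ["body", "exit"],
    "body": ["header"],
    "exit": [],
}
dom = dominators(cfg, "entry")

# Loop discovery: an edge whose target dominates its source is a back edge.
back_edges = [(n, s) for n in cfg for s in cfg[n] if s in dom[n]]
```

The last line is the classic loop-discovery use; the newer passes lean on
the same sets (or the tree built from them) for other purposes.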

 >
 >    I'd love gcc to become a free compiler text, but it is not its purpose.
 >
 >Of course not, but it's always useful to define each term used at least
 >to some extent.
Which I'm less and less inclined to do since we're using standard 
terminology dating back over 15 years (you can find dominators and 
dominator tree all the way back in the dragon book and probably 
earlier if you care to look).

At some point you have to assume a base level of knowledge for your
reader.  Are we going to define CFG in every file which uses the CFG?
Are we going to define the basic properties of SSA in every
file which uses that form?    Hmm, wait, you have to know what blocks
and edges are to understand what a CFG is, so every file which uses the
CFG has to also define blocks & edges.  It's actually rather silly when you
start to think about it -- particularly when we use a series of building
blocks (blocks, edge, CFG, dominators, loop tree, SSA, value numbering) to 
implement an optimization or analysis pass.

 > GCC has always been pretty self-contained and I see no
 >reason to change that policy at this point.
We disagree then.

jeff

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
@ 2004-02-03 16:28 Richard Kenner
  2004-02-03 22:06 ` Robert Dewar
  2004-02-04  2:56 ` Russ Allbery
  0 siblings, 2 replies; 171+ messages in thread
From: Richard Kenner @ 2004-02-03 16:28 UTC (permalink / raw)
  To: s.bosscher; +Cc: gcc

    The difference here is that you are talking about a bison _user_
    manual.  This discussion was, AFAICT, about GCC internals
    documentation.  I honestly don't believe that anyone working on
    compiler optimizations does not know/understand the concept of
    dominance, or DFA, and so on. These are just computer science
    fundamentals.  A user doesn't have to understand them, but any
    compiler writer should.

Yes, but a *user* of bison is usually a compiler writer!

    I would like better internals/interfaces documentation too.  But I'd rather
    focus on GCC specific implementation details than on terminology and basic
    algorithms and concepts.

I agree with the *focus*, but I think both can be done and the documentation
will read a lot smoother if it spends the time to discuss the terms it's
using.  It will at least serve as reminders and also help tie the details of
the implementation into the theory.

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-02-03 16:16 Richard Kenner
@ 2004-02-03 16:23 ` Steven Bosscher
  2004-02-03 16:40   ` law
  2004-02-03 16:38 ` law
  1 sibling, 1 reply; 171+ messages in thread
From: Steven Bosscher @ 2004-02-03 16:23 UTC (permalink / raw)
  To: Richard Kenner; +Cc: gcc

On Tuesday 03 February 2004 17:18, Richard Kenner wrote:
>     You can leave out an explanation of what a dominator is and still
>     be self-contained.
>
> Not in the sense that it has been in the past.
>
>     What's next, do we have to explain what a finite automaton is in the
>     sources of GCC?
>
> It's always hard to know where to draw the line, but it would seem to me
> that in any case where you have a finite automaton, the description of the
> automaton itself would almost serve as a definition-by-example of the term.
>
> As an example of the sort of thing I'd expect to see, look at the Bison
> manual.  Not only does it not simply leave "LALR(1)" to be looked up (and
> that's a much more well known term than those previously mentioned), it
> even defines "context-free grammar" and "symbol".

The difference here is that you are talking about a bison _user_ manual.  This
discussion was, AFAICT, about GCC internals documentation.  I honestly don't
believe that anyone working on compiler optimizations does not know/understand
the concept of dominance, or DFA, and so on. These are just computer science
fundamentals.  A user doesn't have to understand them, but any compiler writer
should.

I would like better internals/interfaces documentation too.  But I'd rather
focus on GCC specific implementation details than on terminology and basic
algorithms and concepts.

Gr.
Steven

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
@ 2004-02-03 16:20 Richard Kenner
  2004-02-03 16:54 ` Jan Hubicka
  0 siblings, 1 reply; 171+ messages in thread
From: Richard Kenner @ 2004-02-03 16:20 UTC (permalink / raw)
  To: paolo.bonzini; +Cc: gcc

    Everybody with less than guru knowledge of gcc asks, why.  And why CSE
    propagates addressof into memories.  And who takes care of it for
    nonoptimizing compilation.  And whether it is even needed for
    nonoptimizing compilation.  And the flow of thought goes on.

Indeed.  The ADDRESSOF addition was one that gave GCC a lot of
trouble, precisely because of the issues you raise, that its semantics
were very poorly defined.

I see that as cautionary that we need to do better in the future, not
as something to emulate into the future!

    Passes that are developed right now, especially as part of the
    tree-ssa work (but also Jan and Zdenek's rtlopt work), work at a
    different, more generic level, and require a different kind of culture
    on the reader.

Perhaps, but we have to be very careful of setting too high a level of
knowledge needed to be able to work on GCC.  Sure, if you're working on
subtle parts of these algorithms, you need to understand them "cold", but
an overview description is essential for those doing minor work in these
files.

As I said, the Bison manual even defines the term "symbol".  That sort of
approach seems the appropriate one to me.

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
@ 2004-02-03 16:16 Richard Kenner
  2004-02-03 16:23 ` Steven Bosscher
  2004-02-03 16:38 ` law
  0 siblings, 2 replies; 171+ messages in thread
From: Richard Kenner @ 2004-02-03 16:16 UTC (permalink / raw)
  To: s.bosscher; +Cc: gcc

    You can leave out an explanation of what a dominator is and still
    be self-contained.  

Not in the sense that it has been in the past.

    What's next, do we have to explain what a finite automaton is in the
    sources of GCC?

It's always hard to know where to draw the line, but it would seem to me that
in any case where you have a finite automaton, the description of the
automaton itself would almost serve as a definition-by-example of the term.
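
As a small illustration of definition-by-example (a Python sketch with
made-up names, not taken from GCC): once the states, the transition table,
and the accepting set are written down, a reader has effectively been told
what a finite automaton is.

```python
def classify(ch):
    """Map a character to the input class the automaton works with."""
    if ch.isalpha() or ch == "_":
        return "letter"
    if ch.isdigit():
        return "digit"
    return "other"


# (current state, input class) -> next state; a missing entry means "reject".
TRANSITIONS = {
    ("start", "letter"): "ident",
    ("ident", "letter"): "ident",
    ("ident", "digit"): "ident",
}
ACCEPTING = {"ident"}


def accepts(text):
    """Return True if `text` is an identifier: a letter, then letters/digits."""
    state = "start"
    for ch in text:
        state = TRANSITIONS.get((state, classify(ch)))
        if state is None:
            return False
    return state in ACCEPTING
```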

As an example of the sort of thing I'd expect to see, look at the Bison
manual.  Not only does it not simply leave "LALR(1)" to be looked up (and
that's a much more well known term than those previously mentioned), it even
defines "context-free grammar" and "symbol".

I think that manual sets a good standard for GCC documentation and we should
do at least as much as it does.

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-02-03 15:48 Richard Kenner
  2004-02-03 16:00 ` Steven Bosscher
@ 2004-02-03 16:07 ` Paolo Bonzini
  2004-02-03 16:35 ` law
  2 siblings, 0 replies; 171+ messages in thread
From: Paolo Bonzini @ 2004-02-03 16:07 UTC (permalink / raw)
  To: Richard Kenner; +Cc: gcc

> Of course not, but it's always useful to define each term used at least
> to some extent.  GCC has always been pretty self-contained and I see no
> reason to change that policy at this point.

Lovely policy indeed, but the way GCC is designed has changed a lot in the
last decade.  Passes like CSE, reload and combine, while very simple in
their basic work, are full of ad-hoc tests and are very low-level: you
cannot actually understand them unless you understand what kind of RTL
the previous passes are expected to produce.

Look at this (7 November 2002) discussion between Zack Weinberg and Richard
Henderson to understand what I mean:

zw  Staring at toplev.c for awhile leads me to believe that the right fix
    for this is to run GCSE _before_ CSE, and make GCSE aware of
    CONSTANT_P_RTX.  The appended patch does just that.  Since this is a
    drastic change, I will also check what overall effect this has on code
    quality by examining bootstrap times and the size of the generated
    compiler.
    --
rth I wonder if this means we can get rid of one of the local cse
    passes?  It wouldn't surprise me if this is indeed now redundant,
    since gcse has acquired some local propagation skills.
    --
zw  What actually happens is it blows up compiling libgcc, due to some
    construct or other that simplify-rtx.c doesn't know how to handle but
    cse's simplifiers do.  I haven't yet dug into this.
    --
rth I should have remembered this.  cse1 is needed so that we
    propagate addressof into memories.  Addressof elimination
    needs to happen before gcse because gcse can't handle
    aliasing memory and registers.
    --
zw  How hard would it be to teach gcse about aliasing
    memory and registers?
    --
rth Very.

Everybody with less than guru knowledge of gcc asks, why.  And why CSE
propagates addressof into memories.  And who takes care of it for
nonoptimizing compilation.  And whether it is even needed for nonoptimizing
compilation.  And the flow of thought goes on.
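
One way such knowledge could at least be written down is as explicit
ordering constraints.  The sketch below is Python, not anything in toplev.c;
the pass labels and the check_pass_order helper are hypothetical, and the
first constraint (cse1 before addressof elimination) is only my reading of
the exchange above.

```python
def check_pass_order(order, must_precede):
    """Return the (earlier, later) constraints that `order` violates."""
    position = {name: i for i, name in enumerate(order)}
    return [(a, b) for a, b in must_precede if position[a] > position[b]]


# Constraints as read from the discussion above (labels are illustrative).
constraints = [
    ("cse1", "addressof_elim"),   # cse1 propagates addressof into memories
    ("addressof_elim", "gcse"),   # gcse can't handle aliasing memory/registers
]

current = ["cse1", "addressof_elim", "gcse"]
proposed = ["gcse", "cse1", "addressof_elim"]   # "run GCSE before CSE"

ok = check_pass_order(current, constraints)
broken = check_pass_order(proposed, constraints)
```

With the constraints stated, the proposed reordering fails for a reason a
newcomer can read, rather than one only rth remembers.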

Passes that are developed right now, especially as part of the tree-ssa work
(but also Jan and Zdenek's rtlopt work), work at a different, more generic
level, and require a different kind of culture on the reader.

Paolo

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-02-03 15:48 Richard Kenner
@ 2004-02-03 16:00 ` Steven Bosscher
  2004-02-03 21:27   ` Robert Dewar
  2004-02-03 16:07 ` Paolo Bonzini
  2004-02-03 16:35 ` law
  2 siblings, 1 reply; 171+ messages in thread
From: Steven Bosscher @ 2004-02-03 16:00 UTC (permalink / raw)
  To: Richard Kenner, paolo.bonzini; +Cc: gcc

On Tuesday 03 February 2004 16:50, Richard Kenner wrote:
>     Which you have never read (at least a modern one), if you do not know
>     what a dominator is.  My first compiler course, which used only part
>     of Appel's "Modern compiler implementation" (which is nowhere near in
>     depth with respect to Muchnick or Morgan) did teach dominators.
>
> Correct.  My last compiler course was well over a decade ago.
>
>     I'd love gcc to become a free compiler text, but it is not its purpose.
>
> Of course not, but it's always useful to define each term used at least
> to some extent.  GCC has always been pretty self-contained and I see no
> reason to change that policy at this point.

You can leave out an explanation of what a dominator is and still
be self-contained.  What's next, do we have to explain what a finite
automaton is in the sources of GCC?

Gr.
Steven

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
@ 2004-02-03 15:48 Richard Kenner
  2004-02-03 16:00 ` Steven Bosscher
                   ` (2 more replies)
  0 siblings, 3 replies; 171+ messages in thread
From: Richard Kenner @ 2004-02-03 15:48 UTC (permalink / raw)
  To: paolo.bonzini; +Cc: gcc

    Which you have never read (at least a modern one), if you do not know
    what a dominator is.  My first compiler course, which used only part
    of Appel's "Modern compiler implementation" (which is nowhere near in
    depth with respect to Muchnick or Morgan) did teach dominators.

Correct.  My last compiler course was well over a decade ago.

    I'd love gcc to become a free compiler text, but it is not its purpose.

Of course not, but it's always useful to define each term used at least
to some extent.  GCC has always been pretty self-contained and I see no
reason to change that policy at this point.

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
@ 2004-02-03 15:40 Paolo Bonzini
  2004-02-03 21:21 ` Robert Dewar
  0 siblings, 1 reply; 171+ messages in thread
From: Paolo Bonzini @ 2004-02-03 15:40 UTC (permalink / raw)
  To: kenner; +Cc: law, gcc

> > I assume that everyone knows what dominators and post-dominators are?
> > What about a dominator tree?  Value numbering? SSA form?  How using
> > value numbering on the SSA form during a dominator walk gives us
> > redundancy elimination on an almost-global scale without the need to
> > invalidate values from our hash tables?

I'd expect something like this to suffice [just guessing, this is not about
the actual tree-ssa code]:

"The value numbering pass builds on a simple dominator tree walk.  We go
through every available expression and put it into a hash table ("we give it
a number").  In the dominator tree, blocks higher in the tree must execute
before their children: thus, visiting it in pre-order ensures that all the
available numbered expressions are already in the hash table.  Using SSA
form, we can avoid building def-use/use-def chains, and find redundant
expressions without invalidating the values in the hash table; we only
remove the expressions from the hash table when we have visited all the
dominated blocks.

Note that the redundancy elimination is not really global because..."

I can presume that SSA form is documented (at least briefly) in the
convert-to-SSA pass.  I would not document the algorithms for convert-to-SSA
though, these are available in compiler texts without substantial flaws.
Value numbering is actually quite trivial, so no paper reference may even be
necessary, but it may be different for SSAPRE or points-to analysis.
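
To show how little there is to it, here is a toy version of the walk
described in the sample text above.  Like that text, it is a guess at the
general technique and not the actual tree-ssa code; it is written in Python,
and the statement format (dest, op, lhs, rhs), the function name, and the
example blocks are all assumptions made for illustration.

```python
def dominator_value_numbering(blocks, dom_children, root):
    """Find redundant SSA definitions with a pre-order dominator-tree walk.

    blocks:       block name -> list of (dest, op, lhs, rhs) statements
    dom_children: block name -> list of children in the dominator tree
    Returns a map from each redundant SSA name to the earlier equivalent name.
    """
    available = {}   # (op, lhs, rhs) -> SSA name already holding that value
    replaced = {}    # redundant SSA name -> equivalent earlier SSA name

    def visit(block):
        added = []
        for dest, op, lhs, rhs in blocks[block]:
            # Look through names already proven equal to an earlier one.
            key = (op, replaced.get(lhs, lhs), replaced.get(rhs, rhs))
            if key in available:
                replaced[dest] = available[key]
            else:
                available[key] = dest
                added.append(key)
        for child in dom_children.get(block, []):
            visit(child)
        # Leaving the subtree: drop what this block added.  SSA names are
        # never redefined, so nothing ever needs to be invalidated earlier.
        for key in added:
            del available[key]

    visit(root)
    return replaced


# A diamond: B0 branches to B1 and B2, which both reach B3.
blocks = {
    "B0": [("t1", "+", "a", "b")],
    "B1": [("t2", "+", "a", "b"), ("t3", "*", "t2", "c")],
    "B2": [("t4", "*", "t1", "c")],
    "B3": [("t5", "*", "t1", "c")],
}
dom_children = {"B0": ["B1", "B2", "B3"]}
result = dominator_value_numbering(blocks, dom_children, "B0")
```

In this toy input only t2 is found redundant.  The multiplication in B3 is
computed on both incoming paths, but neither B1 nor B2 dominates B3, so the
sketch does not see it; that is a property of the sketch, not a statement
about any real pass.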

> I can't answer for people in general, but I don't know what *any* of those
> things are and I would expect to see at least a definition of any of those
> terms in adequate documentation.  Of course, the documentation can't
> substitute for a good compiler text,

Which you have never read (at least a modern one), if you do not know what a
dominator is.  My first compiler course, which used only part of Appel's
"Modern compiler implementation" (which is nowhere near in depth with
respect to Muchnick or Morgan) did teach dominators.

I'd love gcc to become a free compiler text, but it is not its purpose.

Paolo

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
@ 2004-02-03 12:20 Richard Kenner
  0 siblings, 0 replies; 171+ messages in thread
From: Richard Kenner @ 2004-02-03 12:20 UTC (permalink / raw)
  To: joern.rennecke; +Cc: gcc

    That's not a formal specification, but an informal description, just
    as we have in our comments right now.  In fact, our current comment
    goes into a little more detail than that, although it is still necessarily
    incomplete.

We're talking about "informal description".  The term "specification" is not
a shorthand for "formal specification", but for "interface specification".

    You couldn't use either of these descriptions to tell you all you need
    to implement the function separately.

Of course not, but that's not its purpose.  The documentation of the
interface says what you have to know to *call* the function, not to
implement it.
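
A small illustration of the distinction (Python, with a made-up function
that has nothing to do with GCC): the comment states what a caller must
supply and what comes back, and deliberately says nothing about how the
answer is found.

```python
def find_first_set_bit(value):
    """Return the zero-based index of the least significant set bit of VALUE.

    VALUE must be a positive integer; ValueError is raised otherwise.
    """
    if not isinstance(value, int) or value <= 0:
        raise ValueError("value must be a positive integer")
    # Implementation detail, free to change without touching the comment:
    # isolate the lowest set bit, then convert it to an index.
    return (value & -value).bit_length() - 1
```

Nobody could reimplement the function from that comment alone, and nobody
needs to in order to call it correctly.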

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-02-03  0:59 Richard Kenner
@ 2004-02-03 12:17 ` Joern Rennecke
  0 siblings, 0 replies; 171+ messages in thread
From: Joern Rennecke @ 2004-02-03 12:17 UTC (permalink / raw)
  To: Richard Kenner; +Cc: joern.rennecke, gcc

> Sure.  The specification is unrelated to the implementation: the function
> does whatever is needed to compile and optimize a function.  The details
> don't matter at all to the specification.

That's not a formal specification, but an informal description, just
as we have in our comments right now.  In fact, our current comment goes
into a little more detail than that, although it is still necessarily incomplete.

You couldn't use either of these descriptions to tell you all you need
to implement the function separately.

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
@ 2004-02-03 12:12 Richard Kenner
  2004-02-03 16:46 ` Felix Lee
  0 siblings, 1 reply; 171+ messages in thread
From: Richard Kenner @ 2004-02-03 12:12 UTC (permalink / raw)
  To: law; +Cc: gcc

    Though sometimes it is difficult to find a more concise way to
    describe the algorithms than is found in the references.  

It's perfectly reasonable to paraphrase the description in the reference.
The reference itself, of course, will have lots more information than
the algorithm, so that description would only be a fraction of the size of 
the whole reference.

    What I tend to be particularly interested in is implementation
    details, flaws in the paper, limitations imposed due to our
    framework/implementation, differences in design decisions, etc.

Right, but that's hard to do without having a separate description of
the algorithm because the best way to do those things is inside the
narrative of the algorithm.

    One of the interesting questions I'm working through right now is to
    figure out how much I should assume the reader knows.  For example, do
    I assume that everyone knows what dominators and post-dominators are?
    What about a dominator tree?  Value numbering? SSA form?  How using
    value numbering on the SSA form during a dominator walk gives us
    redundancy elimination on an almost-global scale without the need to
    invalidate values from our hash tables?

I can't answer for people in general, but I don't know what *any* of those
things are and I would expect to see at least a definition of any of those
terms in adequate documentation.  Of course, the documentation can't
substitute for a good compiler text, but there should at least be a few
sentences saying what each thing is.

    Which brings me to the need for this kind of documentation to also
    live in the modern world of the web -- hyperlinks from the optimizer's
    high level documentation back to the underlying concepts it builds
    upon, IMHO, is much better than bringing up a zillion files in your
    favorite text editor.

In some ways, sure, but often the best way to study code is to do it
on a long plane flight and that means having everything in a couple of
well-defined places.

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-01-30 23:18 ` Mark Mitchell
  2004-02-02 11:02   ` Lars Segerlund
@ 2004-02-03  8:09   ` law
  2004-02-03 16:44     ` Felix Lee
  1 sibling, 1 reply; 171+ messages in thread
From: law @ 2004-02-03  8:09 UTC (permalink / raw)
  To: Mark Mitchell; +Cc: Richard Kenner, gcc

In message <401AE59C.9080809@codesourcery.com>, Mark Mitchell writes:
 >Richard Kenner wrote:
 >
 >>I've been noticing for a while that there are an increasing number of files
 >>in GCC where the only overview documentation is a reference to a paper or
 >>textbook.
 >>
 >>I think this is totally unacceptable documentation and that we need to have
 >>a policy about this sort of documentation.
 >>  
 >>
 >Very little discussion in the long ensuing thread seems to relate back 
 >to this key point from Kenner's original email.  Independent of the pros 
 >and cons of Doxygen and its ilk, let's agree that documentation has to 
 >be present in the GCC source tree for the algorithms that are in use in 
 >the compiler.  If you write new code, it's good to reference papers that 
 >inspired it, but that's no substitute for good comments on the functions 
 >that explain what they do and good comments in the code that explain why 
 >it works the way it does.
 >
 >I don't think we need to officially adopt Kenner's list of policies 
 >because I think they are already implied by our current coding standards. 
 >
 >But, we do need to enforce them!
Can't argue with that.  Though sometimes it is difficult to find a more
concise way to describe the algorithms than is found in the references.
What I tend to be particularly interested in is implementation details,
flaws in the paper, limitations imposed due to our framework/implementation,
differences in design decisions, etc.

One of the interesting questions I'm working through right now is to
figure out how much I should assume the reader knows.  For example,
do I assume that everyone knows what dominators and post-dominators are?
What about a dominator tree?  Value numbering? SSA form?   How using
value numbering on the SSA form during a dominator walk gives us redundancy
elimination on an almost-global scale without the need to invalidate values
from our hash tables?
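
[ To make that last question concrete, here is a minimal C sketch under
  stated assumptions; it is not GCC's code.  Statements are SSA triples,
  blocks form a dominator tree, and a plain array searched linearly stands
  in for the hash table.  Every name in it (struct stmt, struct block,
  struct dom_state, dom_state_init, dom_walk) is hypothetical.  Entries are
  popped when the walk leaves a block; since each SSA name is assigned only
  once, no entry is ever invalidated by a later assignment. ]

```c
#include <assert.h>
#include <stddef.h>

/* One SSA statement: DST = LHS OP RHS.  Names are small integers and,
   because this is SSA form, each DST is assigned exactly once.  */
struct stmt { int dst; char op; int lhs; int rhs; };

#define MAX_KIDS 4
#define MAX_AVAIL 64
#define MAX_NAMES 64

/* A basic block plus its children in the dominator tree.  */
struct block
{
  const struct stmt *stmts;
  size_t n_stmts;
  const struct block *kids[MAX_KIDS];
  size_t n_kids;
};

/* AVAIL is used as a stack of expressions computed in the blocks that
   dominate the current one.  LEADER[X] is the earlier name known to hold
   the same value as X, or X itself.  */
struct dom_state
{
  struct stmt avail[MAX_AVAIL];
  size_t n_avail;
  int leader[MAX_NAMES];
  int n_redundant;
};

void
dom_state_init (struct dom_state *st)
{
  st->n_avail = 0;
  st->n_redundant = 0;
  for (int i = 0; i < MAX_NAMES; i++)
    st->leader[i] = i;
}

/* Walk the dominator tree rooted at BB, counting redundant statements
   and recording their leaders in ST.  */
void
dom_walk (const struct block *bb, struct dom_state *st)
{
  size_t mark = st->n_avail;	/* Remember where this block's entries start.  */

  for (size_t i = 0; i < bb->n_stmts; i++)
    {
      struct stmt s = bb->stmts[i];
      int found = -1;

      assert (s.dst >= 0 && s.dst < MAX_NAMES);
      assert (s.lhs >= 0 && s.lhs < MAX_NAMES);
      assert (s.rhs >= 0 && s.rhs < MAX_NAMES);

      /* Value-number the operands: replace each by its leader.  */
      s.lhs = st->leader[s.lhs];
      s.rhs = st->leader[s.rhs];

      for (size_t j = 0; j < st->n_avail && found < 0; j++)
	if (st->avail[j].op == s.op
	    && st->avail[j].lhs == s.lhs
	    && st->avail[j].rhs == s.rhs)
	  found = st->avail[j].dst;

      if (found >= 0)
	{
	  /* Already computed in a dominating position: redundant.  */
	  st->leader[s.dst] = found;
	  st->n_redundant++;
	}
      else
	{
	  assert (st->n_avail < MAX_AVAIL);
	  st->avail[st->n_avail++] = s;
	}
    }

  for (size_t k = 0; k < bb->n_kids; k++)
    dom_walk (bb->kids[k], st);

  /* Leaving BB: pop what it added, so sibling subtrees never see it.  */
  st->n_avail = mark;
}
```

[ With an entry block computing 2 = 0 + 1 and two children, a child that
  recomputes 0 + 1 is flagged as redundant, while an expression first
  computed in one child is not visible from its sibling. ]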

I'm going to be somewhat lucky in the specific documentation I'm working
on right now as each of these subjects largely lives in a single file where
I can document the basic concepts.  [ And in cases where we don't have that
kind of nice separation, I'll be looking to add that separation. ]

But even with that documentation, someone reading the dominator optimizer's
docs is going to have to have a good grasp on a number of underlying
concepts before the dominator optimizer's documentation is going to make
sense.

Which brings me to the need for this kind of documentation to also live
in the modern world of the web -- hyperlinks from the optimizer's high
level documentation back to the underlying concepts it builds upon,
IMHO, is much better than bringing up a zillion files in your favorite
text editor.

I doubt we'll have an integrated solution where the docs live in one place
(the source) and are automagically extracted into web pages, but if I could
do that without introducing too much clutter in the source, I definitely
would...  Sigh, sniffle.

[ Not to mention things like graphs look a lot better if you're not
  limited by ascii text :-) ]

On the subject of APIs -- as someone mentioned, there are things in the
compiler that really should be exposed purely by an API -- interfaces
into the gimplifier, dominator walker, statement linking/walking, etc.
Even if we never actually separate that code into a library, programming
as if it were a shared library would greatly help with the separation
issues that plague gcc.

jeff

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
@ 2004-02-03  0:59 Richard Kenner
  2004-02-03 12:17 ` Joern Rennecke
  0 siblings, 1 reply; 171+ messages in thread
From: Richard Kenner @ 2004-02-03  0:59 UTC (permalink / raw)
  To: joern.rennecke; +Cc: gcc

    Do you think this is practical for functions like rest_of_compilation
    using *any* description language different from the implementation
    language?

Sure.  The specification is unrelated to the implementation: the function
does whatever is needed to compile and optimize a function.  The details
don't matter at all to the specification.

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-02-02 17:19 Robert Dewar
@ 2004-02-02 22:02 ` Joern Rennecke
  2004-02-07  0:15   ` Kai Henningsen
  0 siblings, 1 reply; 171+ messages in thread
From: Joern Rennecke @ 2004-02-02 22:02 UTC (permalink / raw)
  To: Robert Dewar; +Cc: Joe.Buck, jamie, asutton, dewar, gcc, kenner, law

> Of course sufficiently powerful pre and post conditions would be able
> to describe what a function does, but if these conditions are written
> in a low level language without set comprehension and quantifiers, this
> can be impractical.

Do you think this is practical for functions like rest_of_compilation
using *any* description language different from the implementation
language?

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-02-02 17:51 ` Jamie Lokier
@ 2004-02-02 19:28   ` Robert Dewar
  0 siblings, 0 replies; 171+ messages in thread
From: Robert Dewar @ 2004-02-02 19:28 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: Joe.Buck, asutton, gcc, kenner, law

Jamie Lokier wrote:

> Robert Dewar wrote:
> 
>>I would also say that it is unacceptable to start work on
>>an implementation without first creating the fully commented spec.
> 
> 
> How do you write the fully commented spec of an implementation which
> may take 10 years to figure out what it should do exactly?  You can
> easily be stuck with a spec that is not practical to implement, or
> does not behave as you hoped it would.  Sometimes the only way to
> understand the implications of the logic of a specification is to have
> an implementation to hand.

Implementing something without knowing what you are implementing
is a recipe for mess. Of course you go through a prototyping
process in which the spec can and will change. Note that the
waterfall model is largely replaced by the spiral model these
days for precisely that reason.

> To pick an example, are you saying all of the exposed mechanism of the
> VM layer of the Linux kernel must be fully specified before anyone
> begins to implement it?  That's absurdly impractical, because until
> many varying structures have been tried, nobody knows what will work
> well.  Only the highest levels of specification are possible in
> advance.

I very strongly disagree with this. You cannot implement anything
without knowing what you are implementing. The rule is quite simple:
write down what you are going to implement before you implement it.
If during implementation you find problems in the spec, then you
modify the spec, but once you get into this discipline it is
surprising how little such modification is needed.

For PD/FMS, an operating system I wrote for Honeywell (it included a
visual interface, full development tools, multi-tasking exec,
debugger, fancy file system with indexed files, database
manager and lots of other goodies), the reference manual was
completely written before a line of code was written, and
relatively few changes were made to that manual later on.

Yes, with some projects it is hard to specify so much in
advance, and that is why you must iterate. But when you
start to write a C function, you must have some idea of
what you are doing; the rule is simply: please write it
down first. Documenting after the fact is a recipe for
poor and inadequate documentation.

But this has probably got off topic. The important thing,
which everyone should agree on, is that at least when code
is delivered for integration, it should have full functional
specs, and that should be a prerequisite for inclusion.
Furthermore, a mere reference to a paper with an algorithm
is NOT adequate documentation.

> This is what I meant by non-waterfall development style: sometimes
> it's just _easier_ to develop an implementation and the details of its
> spec at the same time (and change all the callers as you go if there
> are any by then), because it's too hard to figure out what is
> practical to implement until you have done some of it.

The spiral model is well suited to dealing with this kind
of situation.

> However you may have a point.  Some parts of GCC are an example of
> this type of development gone awry :)

Indeed :-)

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-02-02 16:59                 ` Joe Buck
  2004-02-02 17:10                   ` Jamie Lokier
@ 2004-02-02 18:55                   ` Alexandre E. Kopilovitch
  1 sibling, 0 replies; 171+ messages in thread
From: Alexandre E. Kopilovitch @ 2004-02-02 18:55 UTC (permalink / raw)
  To: Joe Buck, Robert Dewar
  Cc: Andrew Sutton, gcc, Jamie Lokier, Richard Kenner, law

On Mon, 2 Feb 2004 08:59:11 -0800, Joe Buck wrote:
> Robert, the Ada interface description is far from a spec, for any
> nontrivial function.  Almost any real spec will describe restrictions
> on input, and guarantees about the output, that are not written
> in a form that the compiler parses (even in languages like Eiffel with
> support for preconditions and postconditions).  This means that we
> are back to comments.

Yes, but these comments are localized inside the specs, so for documentation
purposes there isn't too much difference between them and actual code (except
that a comment may be plain wrong -:) .

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-02-02 17:16 Robert Dewar
@ 2004-02-02 17:51 ` Jamie Lokier
  2004-02-02 19:28   ` Robert Dewar
  0 siblings, 1 reply; 171+ messages in thread
From: Jamie Lokier @ 2004-02-02 17:51 UTC (permalink / raw)
  To: Robert Dewar; +Cc: Joe.Buck, asutton, gcc, kenner, law

Robert Dewar wrote:
> I would also say that it is unacceptable to start work on
> an implementation without first creating the fully commented spec.

How do you write the fully commented spec of an implementation which
may take 10 years to figure out what it should do exactly?  You can
easily be stuck with a spec that is not practical to implement, or
does not behave as you hoped it would.  Sometimes the only way to
understand the implications of the logic of a specification is to have
an implementation to hand.

To pick an example, are you saying all of the exposed mechanism of the
VM layer of the Linux kernel must be fully specified before anyone
begins to implement it?  That's absurdly impractical, because until
many varying structures have been tried, nobody knows what will work
well.  Only the highest levels of specification are possible in
advance.

This is what I meant by non-waterfall development style: sometimes
it's just _easier_ to develop an implementation and the details of its
spec at the same time (and change all the callers as you go if there
are any by then), because it's too hard to figure out what is
practical to implement until you have done some of it.

However you may have a point.  Some parts of GCC are an example of
this type of development gone awry :)

-- Jamie

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-02-02 17:10                   ` Jamie Lokier
@ 2004-02-02 17:30                     ` Joe Buck
  0 siblings, 0 replies; 171+ messages in thread
From: Joe Buck @ 2004-02-02 17:30 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: Robert Dewar, Andrew Sutton, law, Richard Kenner, gcc

On Mon, Feb 02, 2004 at 05:10:04PM +0000, Jamie Lokier wrote:
> Joe Buck wrote:
> > Robert, the Ada interface description is far from a spec, for any
> > nontrivial function.  Almost any real spec will describe restrictions
> > on input, and guarantees about the output, that are not written
> > in a form that the compiler parses (even in languages like Eiffel with
> > support for preconditions and postconditions).  This means that we
> > are back to comments.
> 
> I would add that a real spec tells you what the function _does_ and
> sometimes how to use it, in addition to pre/postconditions :)

Taken generally enough, "what the function _does_" == "postconditions",
"how to use it" == "preconditions".
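
[ A minimal sketch of that correspondence in C, assuming a hypothetical
  function find_first that is not taken from any real library.  The
  comment carries the spec; the asserts check the parts of it that C can
  express directly.  Note that the "no earlier element matches" clause
  needs a loop, because C has no quantifiers. ]

```c
#include <assert.h>
#include <stddef.h>

/* Return the index of the first element of A[0 .. N-1] equal to KEY,
   or N if there is none.

   Precondition ("how to use it"): A points to at least N readable ints;
   A may be null only if N is zero.

   Postcondition ("what it does"): the result R satisfies R <= N.  If
   R < N then A[R] == KEY.  No element before index R equals KEY.  */
size_t
find_first (const int *a, size_t n, int key)
{
  size_t r = 0;

  assert (a != NULL || n == 0);		/* Precondition.  */

  while (r < n && a[r] != key)
    r++;

  /* Postcondition checks.  */
  assert (r <= n);
  assert (r == n || a[r] == key);
  for (size_t i = 0; i < r; i++)
    assert (a[i] != key);

  return r;
}
```

[ A caller would use it as find_first (v, 4, 7); the checks compile away
  when NDEBUG is defined, but the comment stays as the spec. ]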

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
@ 2004-02-02 17:19 Robert Dewar
  2004-02-02 22:02 ` Joern Rennecke
  0 siblings, 1 reply; 171+ messages in thread
From: Robert Dewar @ 2004-02-02 17:19 UTC (permalink / raw)
  To: Joe.Buck, jamie; +Cc: asutton, dewar, gcc, kenner, law

> I would add that a real spec tells you what the function _does_ and
> sometimes how to use it, in addition to pre/postconditions :)

It also might have *some* details of the implementation if they are
important to its use, e.g. knowing whether a sort is nlogn or n**2.

Of course sufficiently powerful pre and post conditions would be able
to describe what a function does, but if these conditions are written
in a low level language without set comprehension and quantifiers, this
can be impractical.

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
@ 2004-02-02 17:16 Robert Dewar
  2004-02-02 17:51 ` Jamie Lokier
  0 siblings, 1 reply; 171+ messages in thread
From: Robert Dewar @ 2004-02-02 17:16 UTC (permalink / raw)
  To: Joe.Buck, dewar; +Cc: asutton, gcc, jamie, kenner, law

> Robert, the Ada interface description is far from a spec, for any
> nontrivial function.  Almost any real spec will describe restrictions
> on input, and guarantees about the output, that are not written
> in a form that the compiler parses (even in languages like Eiffel with
> support for preconditions and postconditions).  This means that we
> are back to comments.

Of course, but the interface description is required to be there, and it is
universal Ada practice to put the necessary additional comments in the spec.
In C, header files are not required, and although it *might* be universal
practice to create header files analogous to Ada specs, in practice it is
not. Certainly in Ada you can't go creating a package body before you
have created the package spec, and no one would think of trying :-)

By the way, for me, even for a trivial function, comments are essential. I
very much dislike the style of depending on names of parameters etc and
having the reader guess the function. To me, comments must be complete
and describe all parameters etc in detail, without depending on any
guesswork. I would also say that it is unacceptable to start work on
an implementation without first creating the fully commented spec.

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-02-02 16:59                 ` Joe Buck
@ 2004-02-02 17:10                   ` Jamie Lokier
  2004-02-02 17:30                     ` Joe Buck
  2004-02-02 18:55                   ` Alexandre E. Kopilovitch
  1 sibling, 1 reply; 171+ messages in thread
From: Jamie Lokier @ 2004-02-02 17:10 UTC (permalink / raw)
  To: Joe Buck; +Cc: Robert Dewar, Andrew Sutton, law, Richard Kenner, gcc

Joe Buck wrote:
> Robert, the Ada interface description is far from a spec, for any
> nontrivial function.  Almost any real spec will describe restrictions
> on input, and guarantees about the output, that are not written
> in a form that the compiler parses (even in languages like Eiffel with
> support for preconditions and postconditions).  This means that we
> are back to comments.

I would add that a real spec tells you what the function _does_ and
sometimes how to use it, in addition to pre/postconditions :)

-- Jamie

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-02-02 15:56               ` Robert Dewar
@ 2004-02-02 16:59                 ` Joe Buck
  2004-02-02 17:10                   ` Jamie Lokier
  2004-02-02 18:55                   ` Alexandre E. Kopilovitch
  0 siblings, 2 replies; 171+ messages in thread
From: Joe Buck @ 2004-02-02 16:59 UTC (permalink / raw)
  To: Robert Dewar; +Cc: Jamie Lokier, Andrew Sutton, law, Richard Kenner, gcc

On Mon, Feb 02, 2004 at 10:55:21AM -0500, Robert Dewar wrote:
> Of course my anticipated response is that the spec should ALWAYS be
> written first. A nice property of Ada is that it mandates this at
> the language level (you can't compile a package body without
> compiling the spec :-)

Robert, the Ada interface description is far from a spec, for any
nontrivial function.  Almost any real spec will describe restrictions
on input, and guarantees about the output, that are not written
in a form that the compiler parses (even in languages like Eiffel with
support for preconditions and postconditions).  This means that we
are back to comments.

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-02-02 15:40             ` Jamie Lokier
@ 2004-02-02 15:56               ` Robert Dewar
  2004-02-02 16:59                 ` Joe Buck
  0 siblings, 1 reply; 171+ messages in thread
From: Robert Dewar @ 2004-02-02 15:56 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: Andrew Sutton, law, Richard Kenner, Joe.Buck, gcc

Jamie Lokier wrote:
>     2. Implementers of a function which is not necessarily written in
>        "waterfall" style, i.e. not spec first, need to see and edit
>        documentation at the same time as the corresponding code.

Of course my anticipated response is that the spec should ALWAYS be
written first. A nice property of Ada is that it mandates this at
the language level (you can't compile a package body without
compiling the spec :-)

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-02-01 23:05           ` Robert Dewar
@ 2004-02-02 15:40             ` Jamie Lokier
  2004-02-02 15:56               ` Robert Dewar
  0 siblings, 1 reply; 171+ messages in thread
From: Jamie Lokier @ 2004-02-02 15:40 UTC (permalink / raw)
  To: Robert Dewar; +Cc: Andrew Sutton, law, Richard Kenner, Joe.Buck, gcc

Robert Dewar wrote:
> >I like fully documented headers a lot.  The problem I find is what to
> >put in the implementation file?  When editing something in a .c file,
> >it's very helpful to have the textual comments which describe the
> >interface near the source which implements them, usually as a comment
> >just prior to the function it documents.
> 
> Well that's a point of disagreement, I see no reason for
> duplicating the interface specs. Of course this is also
> partially dependent on tools. I am assuming that any reasonable
> environment is set up so that the spec you want is a mouse click
> away (that's certainly the way we edit any Ada source using GPS
> or GLIDE).

As long as it doesn't interrupt the editing flow, e.g. by hiding the
code I'm working on.

> >In my own programs, I tend to duplicate interface comments and whole
> >file explanations: what's in the header file is repeated in the
> >implementation file.  I hate the duplication, and the extra work
> >required to maintain it, but find it makes the source that much
> >clearer than a bunch of functions with no interface comments would be.
> 
> I think this duplication is a mistake. Perhaps it reflects
> a poor tool environment?

Yes it does, but it also reflects two different needs from the documentation:

    1. Callers of a function need to see documentation and minimal
       type signatures.  For this, header files are quite good, as are
       popup documentation, "mouse click" etc.

    2. Implementers of a function which is not necessarily written in
       "waterfall" style, i.e. not spec first, need to see and edit
       documentation at the same time as the corresponding code.

The best language environment I've used for handling code with
documentation is Emacs Lisp, where a couple of keypresses brings up
the documentation for any function without interrupting the flow of
editing.  Rather than jumping to the other text, it pops up occupying
part of the screen.  If Emacs did that for other languages such as C,
it would be exceptionally useful; it is perfect for the caller
audience.

Emacs Lisp also requires that you write interface and specifications
right inside the code which implements them.  This works very well,
and not just for me.  Look at the libraries of 3rd party Emacs Lisp
code: authors tend to document every public function instinctively.

In theory Java is similar, but the barrier to using the Javadoc
comments is higher than in Emacs Lisp, and I found myself writing Java
code without extensively documenting functions using Javadoc.  Java
code from other authors that I have worked with is similar, although I
know that any professional quality library should have Javadoc comments.

An interesting difference.  I'm sure this has a lot to do with tools,
and with the fact that Emacs Lisp comments use simple conventions
(like GNU standard comments: capitals indicate parameter names, short
flowing descriptions), whereas Javadoc requires more skilled use of
markup and is not as plain to read for the person editing the source code.

That last point may be relevant to the style of comments preferred in
GCC source...
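
[ For comparison, a minimal sketch of that comment style on a C function.
  The function count_above and its parameters are hypothetical, made up
  for the illustration; only the convention (parameter names in capitals,
  a short flowing description, no markup) is the point. ]

```c
#include <assert.h>
#include <stddef.h>

/* Return the number of elements of VEC, an array of LEN ints, that are
   strictly greater than LIMIT.  VEC is not modified.  If LEN is zero,
   VEC is not examined and may be null.  */
size_t
count_above (const int *vec, size_t len, int limit)
{
  size_t count = 0;

  assert (vec != NULL || len == 0);

  for (size_t i = 0; i < len; i++)
    if (vec[i] > limit)
      count++;

  return count;
}
```

[ The comment reads as plain prose in the editor, and a reader can map
  VEC, LEN and LIMIT onto the signature without any extraction tool. ]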

-- Jamie

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-01-30 23:18 ` Mark Mitchell
@ 2004-02-02 11:02   ` Lars Segerlund
  2004-02-03  8:09   ` law
  1 sibling, 0 replies; 171+ messages in thread
From: Lars Segerlund @ 2004-02-02 11:02 UTC (permalink / raw)
  To: gcc


 I think a nice point to make would be that nothing that's not documented gets 'in'.

 Particularly algorithms and such; this way people could discuss an issue instead of waiting for the trouble to show up.

 / Lars Segerlund.

On Fri, 30 Jan 2004 15:15:40 -0800
Mark Mitchell <mark@codesourcery.com> wrote:

> Richard Kenner wrote:
> 
> >I've been noticing for a while that there are an increasing number of files
> >in GCC where the only overview documentation is a reference to a paper or
> >textbook.
> >
> >I think this is totally unacceptable documentation and that we need to have a
> >policy about this sort of documentation.
> >  
> >
> Very little discussion in the long ensuing thread seems to relate back 
> to this key point from Kenner's original email.  Independent of the pros 
> and cons of Doxygen and its ilk, let's agree that documentation has to 
> be present in the GCC source tree for the algorithms that are in use in 
> the compiler.  If you write new code, it's good to reference papers that 
> inspired it, but that's no substitute for good comments on the functions
> that explain what they do and good comments in the code that explain why 
> it works the way it does.
> 
> I don't think we need to officially adopt Kenner's list of policies 
> because I think they are already implied by our current coding standards. 
> 
> But, we do need to enforce them!
> 
> -- 
> Mark Mitchell
> CodeSourcery, LLC
> (916) 791-8304
> mark@codesourcery.com
> 

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-02-01 21:13         ` Jamie Lokier
@ 2004-02-01 23:05           ` Robert Dewar
  2004-02-01 23:05           ` Robert Dewar
  1 sibling, 0 replies; 171+ messages in thread
From: Robert Dewar @ 2004-02-01 23:05 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: Andrew Sutton, law, Richard Kenner, Joe.Buck, gcc

Jamie Lokier wrote:
> Robert Dewar wrote:
> 
>>Now in the case of Java, there is no natural separation of specs and
>>implementation (for those of us of the Ada persuasion, this is one
>>of the serious deficiencies of the language). So you are most definitely
>>forced to use the approach of documentation extraction (assuming for
>>a moment that we are all agreeing on the merits of having the *source*
>>of the documentation reside with the source files).
> 
> 
> That _is_ the natural separation of specs and implementation in Java:
> the class source is the implementation (which you might not have
> access to), and the extracted documentation is the spec.
> 
> The mechanics of this can be seen as a way to synchronise text and
> declarations between the implementation and spec.  The fact that
> everyone does it in the direction class -> documentation is
> conventional, it's not a given.
> 
> In principle you could edit text in the spec (i.e. the documentation)
> and have changes applied automatically to the class source, but I've
> not heard of any editing environments that work that way.
> 
> (Ada and C imply _some_ duplication between the spec and
> implementation, don't they?  Every C type signature in a header file
> is repeated in the implementation; it's accepted practice that you
> have to edit both places to keep them synchronised).
> 
> 
>>I far prefer the fully documented header approach when it comes to C,
> 
> 
> I like fully documented headers a lot.  The problem I find is what to
> put in the implementation file?  When editing something in a .c file,
> it's very helpful to have the textual comments which describe the
> interface near the source which implements them, usually as a comment
> just prior to the function it documents.
> 
> In my own programs, I tend to duplicate interface comments and whole
> file explanations: what's in the header file is repeated in the
> implementation file.  I hate the duplication, and the extra work
> required to maintain it, but find it makes the source that much
> clearer than a bunch of functions with no interface comments would be.
> 
> A tool which keeps the two synchronised might be helpful, but I
> haven't seen one or thought much about that.
> 
> -- Jamie

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-02-01 21:13         ` Jamie Lokier
  2004-02-01 23:05           ` Robert Dewar
@ 2004-02-01 23:05           ` Robert Dewar
  2004-02-02 15:40             ` Jamie Lokier
  1 sibling, 1 reply; 171+ messages in thread
From: Robert Dewar @ 2004-02-01 23:05 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: Andrew Sutton, law, Richard Kenner, Joe.Buck, gcc

Jamie Lokier wrote:

> That _is_ the natural separation of specs and implementation in Java:
> the class source is the implementation (which you might not have
> access to), and the extracted documentation is the spec.

Yes, of course, that's what I said. The point (and the important
difference) is that this documentation extraction is not an
integral part of the language

> The mechanics of this can be seen as a way to synchronise text and
> declarations between the implementation and spec.  The fact that
> everyone does it in the direction class -> documentation is
> conventional, it's not a given.

nor is it defined within the language

> In principle you could edit text in the spec (i.e. the documentation)
> and have changes applied automatically to the class source, but I've
> not heard of any editing environments that work that way.

Yes, indeed you can often deal with what might otherwise be seen
as deficiencies in a language with external tools (cf lint).

> (Ada and C imply _some_ duplication between the spec and
> implementation, don't they?  Every C type signature in a header file
> is repeated in the implementation; it's accepted practice that you
> have to edit both places to keep them synchronised).

Right, but that of course is a conscious choice (the duplication).
Pascal for instance avoids that duplication. In practice there is
no inconvenience here, since changing the physical interface will
of course affect the implementation in any case.

> I like fully documented headers a lot.  The problem I find is what to
> put in the implementation file?  When editing something in a .c file,
> it's very helpful to have the textual comments which describe the
> interface near the source which implements them, usually as a comment
> just prior to the function it documents.

Well that's a point of disagreement, I see no reason for
duplicating the interface specs. Of course this is also
partially dependent on tools. I am assuming that any reasonable
environment is set up so that the spec you want is a mouse click
away (that's certainly the way we edit any Ada source using GPS
or GLIDE).

> In my own programs, I tend to duplicate interface comments and whole
> file explanations: what's in the header file is repeated in the
> implementation file.  I hate the duplication, and the extra work
> required to maintain it, but find it makes the source that much
> clearer than a bunch of functions with no interface comments would be.

I think this duplication is a mistake. Perhaps it reflects
a poor tool environment?

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-02-01 12:08       ` Robert Dewar
@ 2004-02-01 21:13         ` Jamie Lokier
  2004-02-01 23:05           ` Robert Dewar
  2004-02-01 23:05           ` Robert Dewar
  0 siblings, 2 replies; 171+ messages in thread
From: Jamie Lokier @ 2004-02-01 21:13 UTC (permalink / raw)
  To: Robert Dewar; +Cc: Andrew Sutton, law, Richard Kenner, Joe.Buck, gcc

Robert Dewar wrote:
> Now in the case of Java, there is no natural separation of specs and
> implementation (for those of us of the Ada persuasion, this is one
> of the serious deficiencies of the language). So you are most definitely
> forced to use the approach of documentation extraction (assuming for
> a moment that we are all agreeing on the merits of having the *source*
> of the documentation reside with the source files).

That _is_ the natural separation of specs and implementation in Java:
the class source is the implementation (which you might not have
access to), and the extracted documentation is the spec.

The mechanics of this can be seen as a way to synchronise text and
declarations between the implementation and spec.  The fact that
everyone does it in the direction class -> documentation is
conventional, it's not a given.

In principle you could edit text in the spec (i.e. the documentation)
and have changes applied automatically to the class source, but I've
not heard of any editing environments that work that way.

(Ada and C imply _some_ duplication between the spec and
implementation, don't they?  Every C type signature in a header file
is repeated in the implementation; it's accepted practice that you
have to edit both places to keep them synchronised).
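
[ A minimal sketch of that duplication in C, shown in one file for
  brevity.  The function clamp and the file names clamp.h and clamp.c are
  hypothetical.  With the declaration in scope, the compiler rejects a
  definition whose types disagree with it; the interface comment, by
  contrast, is not checked by anything. ]

```c
#include <assert.h>

/* What would live in clamp.h: the interface.  */

/* Return X limited to the range LO .. HI inclusive.  LO must not
   exceed HI.  */
int clamp (int x, int lo, int hi);

/* What would live in clamp.c, which would include clamp.h.  */

/* The signature is repeated here and must be edited together with the
   declaration above.  */
int
clamp (int x, int lo, int hi)
{
  assert (lo <= hi);

  if (x < lo)
    return lo;
  if (x > hi)
    return hi;
  return x;
}
```

[ Changing only one of the two signatures, say making HI a long in the
  definition, is diagnosed as conflicting types as long as the
  implementation file includes the header. ]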

> I far prefer the fully documented header approach when it comes to C,

I like fully documented headers a lot.  The problem I find is what to
put in the implementation file?  When editing something in a .c file,
it's very helpful to have the textual comments which describe the
interface near the source which implements them, usually as a comment
just prior to the function it documents.

In my own programs, I tend to duplicate interface comments and whole
file explanations: what's in the header file is repeated in the
implementation file.  I hate the duplication, and the extra work
required to maintain it, but find it makes the source that much
clearer than a bunch of functions with no interface comments would be.

A tool which keeps the two synchronised might be helpful, but I
haven't seen one or thought much about that.

-- Jamie

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
@ 2004-02-01 12:17 Richard Kenner
  0 siblings, 0 replies; 171+ messages in thread
From: Richard Kenner @ 2004-02-01 12:17 UTC (permalink / raw)
  To: asutton; +Cc: gcc

    maybe it's me, but it seems a bit odd that people would argue against
    documentation extraction in favor of having developers look at the
    implementation as a reference. but then again, i'm only a software
    engin... i mean grad student.

I haven't heard anybody make that argument.  The issue is that of
"extraction".  A lot of us fail to see that "extracting" documentation
is useful, and if the method of generating it involves annotations that
make the source harder to read and write, it's a net negative.

As Robert points out, if you generate a separate file for documentation,
it's easy to have old versions of that file around without knowing it.
If all the documentation (both specification and implementation) is in
the same place as the source, you don't have that risk.

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-02-01  6:19     ` Andrew Sutton
@ 2004-02-01 12:08       ` Robert Dewar
  2004-02-01 21:13         ` Jamie Lokier
  0 siblings, 1 reply; 171+ messages in thread
From: Robert Dewar @ 2004-02-01 12:08 UTC (permalink / raw)
  To: Andrew Sutton; +Cc: law, Richard Kenner, Joe.Buck, gcc

Andrew Sutton wrote:

>>There is a *big* difference between looking at interface comments in
>>a header and looking at the implementation!
> 
> 
> isn't that the _point_ of an interface? i sure as hell don't look at the libc 
> implementations of posix methods to figure out what the parameters do.
> 
> sorry to keep extending this thread... it's late and i have some strong 
> feelings about the topic - especially since my masters thesis revolves around 
> the topics of program understanding, redocumentation and reverse engineering 
> (not reverse engineering like decss). maybe it's me, but it seems a bit odd 
> that people would argue against documentation extraction in favor of having 
> developers look at the implementation as a reference. but then again, i'm 
> only a software engin... i mean grad student.

You seem to be missing the point I made that you quoted above. Looking 
at interface documentation (in a C header or Ada spec) is not at ALL
the same thing as "look[ing] at the implementation". In the case of Ada,
it is universally understood that the spec of a package should act 
precisely as the documentation. There is no real point in "extracting"
this documentation, since the spec should contain exactly the needed
documentation and nothing more or less. The advantage of this approach
is that the spec is compiled by the compiler, so at least those parts
of it that are syntactic (argument types etc) are guaranteed to be
properly formed, and if not guaranteed to be correct, at least 
guaranteed to be consistent with the implementation. The comments in
a spec should be *exactly* the required interface documentation. Nothing
more and nothing else.

The GNAT library follows this style. In the reference manual, you find
for example that g-spipat.ads/adb provides Spitbol-like pattern matching.

That single sentence acts as a guidepost to where to look if you need
this kind of functionality. If you want to know the details of how to
use this package, you look in the spec file g-spipat.ads (you certainly
do NOT look in the very complex implementation in the body, in the file
g-spipat.adb).

Now in the case of Java, there is no natural separation of specs and
implementation (for those of us of the Ada persuasion, this is one
of the serious deficiencies of the language). So you are most definitely
forced to use the approach of documentation extraction (assuming for
a moment that we are all agreeing on the merits of having the *source*
of the documentation reside with the source files).

In the case of C, we can go either way. If we want to follow the Ada
style, we treat C headers in exactly the same way as Ada programmers
would treat Ada specs, i.e. we put in these headers exactly the
required interface documentation, nothing more and nothing less,
formatted in a manner that makes it as easy to read as possible.
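
A minimal sketch of what such a header entry might look like in C.  The
function count_above and its parameters are invented for illustration and
are not taken from GCC.  The declaration and its comment are what a user
would read; the definition below it stands in for the .c file and is
shown in the same block only so the example compiles on its own:

```c
#include <stddef.h>

/* Return the number of elements in VALUES[0 .. COUNT-1] that are
   strictly greater than LIMIT.  VALUES may be NULL only if COUNT is
   zero.  The array is not modified.  */
extern size_t count_above (const int *values, size_t count, int limit);

/* What would live in the implementation file.  Nothing below this
   line should be needed in order to call the function correctly.  */
size_t
count_above (const int *values, size_t count, int limit)
{
  size_t n = 0;
  for (size_t i = 0; i < count; i++)
    if (values[i] > limit)
      n++;
  return n;
}
```

The comment states the restrictions on use (when NULL is allowed, that
the input is untouched), which is the information a caller would
otherwise have to dig out of the body.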

If you want to follow the Java style with C, you end up putting the
bare bones documentation into headers, and then extracting the
documentation from the implementation files.

There is nothing that says objectively that one of these approaches
in C is better than the other. Obviously the contrasting Ada and
Java designs show that there are strong proponents of the two
approaches. Furthermore, the opinions about which approach is
better are strongly held (Ada advocates for example consider
Ada's approach to separation of spec and body to be one of the
very important strong features of the language).

So it is quite natural that there be a significant argument about
which style is preferable when it comes to C. To me, as always,
consistency is the most important thing. That's especially true
when it comes to documentation, where unfortunately many programmers
seem to have to be led kicking and screaming to this task :-)

I far prefer the fully documented header approach when it comes
to C, but I could live with an enforced and universal style of
annotations and document extraction. What is NOT a good idea is
to let individual programmers decide for themselves, since then
the third alternative (no good documentation at all) is far too
attractive to far too many programmers!

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
       [not found]   ` <401C3F16.3040706@gnat.com>
@ 2004-02-01  6:19     ` Andrew Sutton
  2004-02-01 12:08       ` Robert Dewar
  0 siblings, 1 reply; 171+ messages in thread
From: Andrew Sutton @ 2004-02-01  6:19 UTC (permalink / raw)
  To: law; +Cc: Robert Dewar, Richard Kenner, Joe.Buck, gcc

> There is a *big* difference between looking at interface comments in
> a header and looking at the implementation!

isn't that the _point_ of an interface? i sure as hell don't look at the libc 
implementations of posix methods to figure out what the parameters do.

sorry to keep extending this thread... it's late and i have some strong 
feelings about the topic - especially since my masters thesis revolves around 
the topics of program understanding, redocumentation and reverse engineering 
(not reverse engineering like decss). maybe it's me, but it seems a bit odd 
that people would argue against documentation extraction in favor of having 
developers look at the implementation as a reference. but then again, i'm 
only a software engin... i mean grad student.

andrew sutton
asutton@cs.kent.edu

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-01-31 23:52         ` Robert Dewar
@ 2004-02-01  6:08           ` Andrew Sutton
  0 siblings, 0 replies; 171+ messages in thread
From: Andrew Sutton @ 2004-02-01  6:08 UTC (permalink / raw)
  To: Daniel Berlin
  Cc: Robert Dewar, Zack Weinberg, gcc, Joe Buck, Richard Kenner,
	Diego Novillo, lars.segerlund

> > Is there anyone *except* Richard Kenner who is against this?
>
> Yes, me. I agree with Richard on this, I don't see any
> significant advantage in the separated documentation. In
> fact I suspect it in practice has the negative effect of
> out-of-date documentation wandering around.
> 
> I would certainly not pay even a minimal obfuscation cost
> in the sources for this. For one thing, it makes it harder
> work to keep the documentation up to date in the sources.
> Given most people's attitude to documentation, it is hard
> enough to get people to keep doc up to date without putting
> additional roadblocks in the way.

just a lurker's comment, but i think the point that several people were trying 
to make was that you can write a relatively static overview and use something 
like doxygen to keep the specifics of the api up to date. it shouldn't be 
that hard to document your functions.

andrew sutton
asutton@cs.kent.edu

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-01-27 15:48 ` Lars Segerlund
@ 2004-02-01  0:43   ` Robert Dewar
  0 siblings, 0 replies; 171+ messages in thread
From: Robert Dewar @ 2004-02-01  0:43 UTC (permalink / raw)
  To: Lars Segerlund; +Cc: gcc

Apologies for waking this thread up :-)
I was in Rome with somewhat marginal email access!

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-01-27 18:37       ` Daniel Berlin
  2004-01-27 18:58         ` Ian Lance Taylor
  2004-01-28 16:42         ` Joern Rennecke
@ 2004-01-31 23:52         ` Robert Dewar
  2004-02-01  6:08           ` Andrew Sutton
  2 siblings, 1 reply; 171+ messages in thread
From: Robert Dewar @ 2004-01-31 23:52 UTC (permalink / raw)
  To: Daniel Berlin
  Cc: Zack Weinberg, gcc, Joe Buck, Richard Kenner, Diego Novillo,
	lars.segerlund

Daniel Berlin wrote:
>>
>> Dunno about anyone else, but I am all for this.
>>
> 
> Is there anyone *except* Richard Kenner who is against this?

Yes, me. I agree with Richard on this, I don't see any
significant advantage in the separated documentation. In
fact I suspect it in practice has the negative effect of
out-of-date documentation wandering around.

I would certainly not pay even a minimal obfuscation cost
in the sources for this. For one thing, it makes it harder
work to keep the documentation up to date in the sources.
Given most people's attitude to documentation, it is hard
enough to get people to keep doc up to date without putting
additional roadblocks in the way.

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-01-27 15:40 Richard Kenner
  2004-01-27 15:48 ` Lars Segerlund
@ 2004-01-30 23:18 ` Mark Mitchell
  2004-02-02 11:02   ` Lars Segerlund
  2004-02-03  8:09   ` law
  1 sibling, 2 replies; 171+ messages in thread
From: Mark Mitchell @ 2004-01-30 23:18 UTC (permalink / raw)
  To: Richard Kenner; +Cc: gcc

Richard Kenner wrote:

>I've been noticing for a while that there are an increasing number of files
>in GCC where the only overview documentation is a reference to a paper or
>textbook.
>
>I think this is totally unacceptable documentation and that we need to have a
>policy about this sort of documentation.
>  
>
Very little discussion in the long ensuing thread seems to relate back 
to this key point from Kenner's original email.  Independent of the pros 
and cons of Doxygen and its ilk, let's agree that documentation has to 
be present in the GCC source tree for the algorithms that are in use in 
the compiler.  If you write new code, it's good to reference papers that 
inspired it, but that's no substitute for good comments on the functions 
that explain what they do and good comments in the code that explain why 
it works the way it does.

I don't think we need to officially adopt Kenner's list of policies 
because I think they are already implied by our current coding standards. 

But, we do need to enforce them!

-- 
Mark Mitchell
CodeSourcery, LLC
(916) 791-8304
mark@codesourcery.com

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-01-29 20:37 ` Joe Buck
  2004-01-29 22:50   ` Andrew Sutton
@ 2004-01-30 17:29   ` Robert Dewar
  1 sibling, 0 replies; 171+ messages in thread
From: Robert Dewar @ 2004-01-30 17:29 UTC (permalink / raw)
  To: Joe Buck; +Cc: Richard Kenner, law, gcc

Joe Buck wrote:

> This is a mistake, in my view.  It has always been the practice on the
> teams I work with that a header file is supposed to be self-contained,
> in the sense that a user of a module only needs to read the declarations
> and documentation in the header file to use any functions declared there.
> Wading into the source code is of course needed for debugging, but if
> studying the source code is needed to figure out restrictions on use of
> the function, that means that the header documentation is not adequate.

I strongly agree with this position. But a more general comment is that
clearly there are two very different schools of thought on C headers, so
what is important is to have a clear policy and be consistent throughout
gcc. A mixture of the two styles (document in header, document in
implementation file) is the worst of all worlds.




^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-01-29 20:37 ` Joe Buck
@ 2004-01-29 22:50   ` Andrew Sutton
  2004-01-30 17:29   ` Robert Dewar
  1 sibling, 0 replies; 171+ messages in thread
From: Andrew Sutton @ 2004-01-29 22:50 UTC (permalink / raw)
  To: gcc

> This is a mistake, in my view.  It has always been the practice on the
> teams I work with that a header file is supposed to be self-contained,
> in the sense that a user of a module only needs to read the declarations
> and documentation in the header file to use any functions declared there.
> Wading into the source code is of course needed for debugging, but if
> studying the source code is needed to figure out restrictions on use of
> the function, that means that the header documentation is not adequate.

just a comment... no pun intended. commenting header files is great, but if 
you're using a documentation extractor then you probably don't need to. 
besides, in an active build environment, changing the documentation in a 
header file can be a real pain if there are dependencies on the header file.

just a thought... why rely on header files for api documentation anyway. i 
mean, if i need to remind myself of the parameters to open(), i don't look at 
unistd.h, i look at the man page. it should be no different for internal 
api's either (not man pages per se, but other documentation).

andrew sutton
asutton@cs.kent.edu

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
@ 2004-01-29 20:50 Richard Kenner
  0 siblings, 0 replies; 171+ messages in thread
From: Richard Kenner @ 2004-01-29 20:50 UTC (permalink / raw)
  To: Joe.Buck; +Cc: gcc

    This is a mistake, in my view.  It has always been the practice on the
    teams I work with that a header file is supposed to be self-contained,
    in the sense that a user of a module only needs to read the declarations
    and documentation in the header file to use any functions declared there.

Sorry I wasn't clear: I certainly agree with that, but I don't think it's
realistic to get that changed given the pervasiveness of the opposing view.

What can be done as a compromise, and what I try to do, is to duplicate
all the information in front of a function at its declaration in the
header file. That has that effect, but is more of a maintenance burden
(though slight, due to cut-and-paste).

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-01-29 19:05 Richard Kenner
@ 2004-01-29 20:37 ` Joe Buck
  2004-01-29 22:50   ` Andrew Sutton
  2004-01-30 17:29   ` Robert Dewar
  0 siblings, 2 replies; 171+ messages in thread
From: Joe Buck @ 2004-01-29 20:37 UTC (permalink / raw)
  To: Richard Kenner; +Cc: law, gcc

On Thu, Jan 29, 2004 at 02:07:03PM -0500, Richard Kenner wrote:
>     Conceptually, an API should be usable without looking at the
>     implementation.
> 
> Most certainly!
> 
> In Ada, the specification and implementation are always in different files,
> so that can actually be enforced.
> 
> You can do that in C too, of course, by putting the specification into the .h
> file and the implementation into the .c file, but the convention has been to
> document the function's parameters in the .c file rather than .h.

This is a mistake, in my view.  It has always been the practice on the
teams I work with that a header file is supposed to be self-contained,
in the sense that a user of a module only needs to read the declarations
and documentation in the header file to use any functions declared there.
Wading into the source code is of course needed for debugging, but if
studying the source code is needed to figure out restrictions on use of
the function, that means that the header documentation is not adequate.

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
@ 2004-01-29 19:05 Richard Kenner
  2004-01-29 20:37 ` Joe Buck
  0 siblings, 1 reply; 171+ messages in thread
From: Richard Kenner @ 2004-01-29 19:05 UTC (permalink / raw)
  To: law; +Cc: gcc

    Conceptually, an API should be usable without looking at the
    implementation.

Most certainly!

In Ada, the specification and implementation are always in different files,
so that can actually be enforced.

You can do that in C too, of course, by putting the specification into the .h
file and the implementation into the .c file, but the convention has been to
document the function's parameters in the .c file rather than .h.

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-01-27 19:43   ` Ian Lance Taylor
@ 2004-01-29 18:37     ` law
  0 siblings, 0 replies; 171+ messages in thread
From: law @ 2004-01-29 18:37 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: Richard Kenner, Joe.Buck, gcc

In message <m34quhjii1.fsf@gossamer.airs.com>, Ian Lance Taylor writes:
 >law@redhat.com writes:
 >
 >> IMHO that's actually a huge long term liability as it actually discourages
 >> writing good API interfaces and documenting them, while at the same time
 >> encourages developers to actually look at the implementation to determine
 >> if the code in question actually does what they want.
 >
 >I agree with you, but extracting documentation automatically from
 >source code comments, and from the source code itself, isn't API
 >documentation either.  It's just another way of reading the source
 >code.
Agreed.  My point was that the way we do things now discourages us from
actually having well-defined APIs and encourages developers to dig into
the implementation details.

Conceptually, an API should be usable without looking at the implementation.

Jeff

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-01-27 21:55 ` Phil Edwards
  2004-01-27 21:59   ` Ian Lance Taylor
@ 2004-01-29 18:33   ` law
  1 sibling, 0 replies; 171+ messages in thread
From: law @ 2004-01-29 18:33 UTC (permalink / raw)
  To: Phil Edwards; +Cc: Richard Kenner, gcc

In message <4016DC85.1040704@codesourcery.com>, Phil Edwards writes:
 >You're also ignoring the fact that we don't need to be a libfoo.a in order to
 >have an exported API.  "Functions to work with GIMPLE" would be an excellent
 >module to have listed, for example.  It certainly has an API. 
Precisely.  And there's other code which should, IMHO, have a well-defined
API.  Examples off the top of my head would be the dominator tree walker
and the SSA-CCP engine.

 >Kenner, nobody is arguing against having a high-level documentation overview.
Right.  In fact, I think we might as well make high-level documentation a
requirement for the tree-ssa bits.  It shouldn't be a surprise that I've
already cobbled something together for the dominator walker and the 
dominator optimizer.

Jeff


^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-01-27 18:37       ` Daniel Berlin
  2004-01-27 18:58         ` Ian Lance Taylor
@ 2004-01-28 16:42         ` Joern Rennecke
  2004-01-31 23:52         ` Robert Dewar
  2 siblings, 0 replies; 171+ messages in thread
From: Joern Rennecke @ 2004-01-28 16:42 UTC (permalink / raw)
  To: Daniel Berlin
  Cc: gcc, Joe Buck, Richard Kenner, Diego Novillo, lars.segerlund,
	Zack Weinberg

> > Dunno about anyone else, but I am all for this.
> >
> 
> Is there anyone *except* Richard Kenner who is against this?

It depends on how much clutter is added to the source, and also
on the impact on maintenance.

Regression testing is already quite time-consuming.  I wouldn't want to
have to check the doxygenation for every patch I install.

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-01-27 20:17 ` Phil Edwards
  2004-01-27 21:08   ` Ian Lance Taylor
@ 2004-01-27 23:22   ` Bernd Schmidt
  1 sibling, 0 replies; 171+ messages in thread
From: Bernd Schmidt @ 2004-01-27 23:22 UTC (permalink / raw)
  To: Phil Edwards; +Cc: Richard Kenner, gcc

On Tue, 27 Jan 2004, Phil Edwards wrote:

> Richard Kenner wrote:
> >
> >     The @param and @return and whatnot are not intended for humans
> >
> > But humans will be reading them.  Every time they look at the sources.
>
> Then use a folding text editor.  /** is a clear marker to begin folding.
> I certainly never called it that.  I certainly don't dispute the need for
> writing down what you're asking for.  But we also need useful API catalogs:

I guess this debate is mainly about people's differing habits.  Personally,
I like uncluttered comments in my source code.  Cluttering them and then
hiding them with a folding editor isn't my idea of an improvement.
I don't see a need for external documentation duplicating these comments; I
don't see what is gained by having to look at two different files instead of
just one when you're working with the source.

>      http://www.jaj.com/space/phil/gccdoxy/struct___unwind___context.html
>
> and cross-references:
>
>      http://www.jaj.com/space/phil/gccdoxy/unwind-dw2-fde-glibc_8c__incl.png

I've looked at these, and I can't say I find this information useful.  But
that's probably because I'm not used to working with it.

But this is all offtopic.  FWIW, I agree with Kenner's observation that
just citing a paper isn't good enough to document an algorithm.

Bernd

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-01-27 20:29 Richard Kenner
  2004-01-27 20:58 ` DJ Delorie
@ 2004-01-27 22:34 ` Tom Tromey
  1 sibling, 0 replies; 171+ messages in thread
From: Tom Tromey @ 2004-01-27 22:34 UTC (permalink / raw)
  To: Richard Kenner; +Cc: gcc

>>>>> "Richard" == Richard Kenner <kenner@vlsi1.ultra.nyu.edu> writes:

Richard> In other words, I think this sort of approach is perfectly
Richard> acceptable, and a good idea, for libraries, where the API is
Richard> what counts.  But GCC is a very different animal, precisely
Richard> because of the interactions you mention.

Well, GCC has many faces.  Parts of it definitely do need API-type
documentation.  For instance, documenting the API to (generic- or
gimple-) trees would be a great benefit to front-end writers.  OTOH
writing API documentation for every non-static function of every
tree-ssa optimizer may be less useful.

My experience with java (which heavily uses the doxygen-like javadoc)
is that documentation like this is very useful.  It isn't a complete
solution; you must also have a high-level overview and, sometimes,
tutorials.  It also works best when the javadoc is written as a
specification or contract and not a mere description of the method's
body.
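
To make the distinction concrete, here is a small sketch in C rather
than Java.  The function bounded_length is hypothetical and exists only
for this example; the comment shows both styles side by side:

```c
#include <stddef.h>

/* Description style (restates the body, tells a caller little):
     Loop over S until a NUL is found, incrementing a counter.

   Contract style (what a caller may rely on):
     Return the number of characters in S before the terminating NUL,
     or MAXLEN if no NUL occurs within the first MAXLEN characters.
     S must either be NUL-terminated or point to at least MAXLEN
     readable characters.  S is not modified.  */
size_t
bounded_length (const char *s, size_t maxlen)
{
  size_t n = 0;
  while (n < maxlen && s[n] != '\0')
    n++;
  return n;
}
```

The contract version stays true if the loop is later rewritten, which is
what makes it usable as a specification.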

Tom

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-01-27 17:51 ` Joe Buck
  2004-01-27 17:57   ` Daniel Berlin
  2004-01-27 18:04   ` Diego Novillo
@ 2004-01-27 22:12   ` Geert Bosch
  2 siblings, 0 replies; 171+ messages in thread
From: Geert Bosch @ 2004-01-27 22:12 UTC (permalink / raw)
  To: Joe Buck; +Cc: gcc, lars.segerlund, Richard Kenner

On Jan 27, 2004, at 12:50, Joe Buck wrote:
> Paper documentation is nice as well, but the way to get both, and to 
> keep
> the code and documentation consistent, is to use doxygen-style comments
> and use that to generate the documentation.

Having looked at the tree-ssa documentation for a bit, I find that
the doxygen generated documentation is nice in that it provides
cross-references. However, I find the fill-in style of documentation
useless. There is a lot of "empty" documentation, which basically
rehashes all fields of a structure with their references, but without
any description of what that field does. Since all semantic information
is obviously present to link each identifier use in the code to its
definition and all other uses, I don't see why extra markup is needed,
or why the generated documentation needs so much verbosity.

The main issue right now is lack of documentation, not the
formatting of it. Especially lacking are high-level per-file overviews
and full descriptions of data structures (including individual
fields), along with a few key properties such as conditions for data
to be valid, initialization and main use of the data (why it exists).

   -Geert

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-01-27 21:55 ` Phil Edwards
@ 2004-01-27 21:59   ` Ian Lance Taylor
  2004-01-29 18:33   ` law
  1 sibling, 0 replies; 171+ messages in thread
From: Ian Lance Taylor @ 2004-01-27 21:59 UTC (permalink / raw)
  To: Phil Edwards; +Cc: Richard Kenner, gcc

Phil Edwards <phil@codesourcery.com> writes:

> Kenner, nobody is arguing against having a high-level documentation overview.
> Everybody agrees that it's a great idea.  Please, write one.  Those of us who
> cannot keep the source code in our heads use these crutches like API listings
> and cross-reference tools.  One is not a replacement for the other.

Conversely, nobody is arguing against API listings and cross-reference
tools.

The immediate question is whether to add annotations to comments in
the source code.  (Your examples demonstrate that these annotations
are not absolutely required for cross-reference tools.)

The original question was whether to require full descriptions of
algorithms in gcc source files, rather than just references to
academic papers.

Ian

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-01-27 20:51 Richard Kenner
@ 2004-01-27 21:55 ` Phil Edwards
  2004-01-27 21:59   ` Ian Lance Taylor
  2004-01-29 18:33   ` law
  0 siblings, 2 replies; 171+ messages in thread
From: Phil Edwards @ 2004-01-27 21:55 UTC (permalink / raw)
  To: Richard Kenner; +Cc: gcc

Richard Kenner wrote:
>     I certainly never called it that.  I certainly don't dispute the need
>     for writing down what you're asking for.  But we also need useful API
>     catalogs:
> 
> But these aren't libraries that have APIs.  Instead, they are a
> collection of functions that implement a compiler.

How many times has this list seen an email asking, "I need to do thus-and-so,
do we have a function for that?"  I've asked it myself.  Yes, we could go
read every header file and hope.  I find a list of APIs far more handy and
far more accessible.

You're also ignoring the fact that we don't need to be a libfoo.a in order to
have an exported API.  "Functions to work with GIMPLE" would be an excellent
module to have listed, for example.  It certainly has an API.  The last summit
pointed out that GCC has nothing approaching a modular interface; everything
knows about everything else.  Claiming that internal blocks of code don't have
APIs is an excellent step towards preserving the mish-mash state.

> I'm not convinced having a printed cross-reference (or one in a format
> related to printing) is useful.  I don't see what benefit they have that
> a tags file does not.  Can you explain that?

...the heck?  You read a tags file with your bare eyes, then?  I think I'm
done with this conversation.

Kenner, nobody is arguing against having a high-level documentation overview.
Everybody agrees that it's a great idea.  Please, write one.  Those of us who
cannot keep the source code in our heads use these crutches like API listings
and cross-reference tools.  One is not a replacement for the other.

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-01-27 21:08   ` Ian Lance Taylor
@ 2004-01-27 21:37     ` Phil Edwards
  0 siblings, 0 replies; 171+ messages in thread
From: Phil Edwards @ 2004-01-27 21:37 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: gcc

On Tue, Jan 27, 2004 at 03:58:32PM -0500, Ian Lance Taylor wrote:
> 
> Do source code annotations help produce the sorts of thing you show in
> these examples?

No, those particular examples were done in a "pretend that everything
actually has an invisible comment" mode, as an example of what we can get
out of doxygen even without commenting.  Adding annotations would help
to reduce the clutter, as by default doxygen (and some similar tools)
will only follow/graph entities that have been annotated.

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-01-27 20:17 ` Phil Edwards
@ 2004-01-27 21:08   ` Ian Lance Taylor
  2004-01-27 21:37     ` Phil Edwards
  2004-01-27 23:22   ` Bernd Schmidt
  1 sibling, 1 reply; 171+ messages in thread
From: Ian Lance Taylor @ 2004-01-27 21:08 UTC (permalink / raw)
  To: Phil Edwards; +Cc: gcc

Phil Edwards <phil@codesourcery.com> writes:

> I certainly never called it that.  I certainly don't dispute the need for
> writing down what you're asking for.  But we also need useful API catalogs:
> 
>      http://www.jaj.com/space/phil/gccdoxy/struct___unwind___context.html
> 
> and cross-references:
> 
>      http://www.jaj.com/space/phil/gccdoxy/unwind-dw2-fde-glibc_8c__incl.png
> 
> and trying to do those by hand is just monstrously stupid.

Well, those aren't really my cup of tea, but, if you like them, it
seems entirely possible to generate them from the source code
directly.  At least, Source Navigator manages it.  If you want to also
pick up the comment immediately preceding the function, that too seems
possible.

Do source code annotations help produce the sorts of thing you show in
these examples?

Ian

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-01-27 20:29 Richard Kenner
@ 2004-01-27 20:58 ` DJ Delorie
  2004-01-27 22:34 ` Tom Tromey
  1 sibling, 0 replies; 171+ messages in thread
From: DJ Delorie @ 2004-01-27 20:58 UTC (permalink / raw)
  To: kenner; +Cc: ian, gcc

> In other words, I think this sort of approach is perfectly
> acceptable, and a good idea, for libraries, where the API is what
> counts.

Note that we do something similar in libiberty.  Comments before each
function are extracted directly into texinfo documents, simply to
collect the API from where it's edited to where it's readable.
DJGPP's libc does something similar - texinfo docs are kept with the
sources (one texi file per c file), and collected into the libc
reference.
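
For readers who have not seen this kind of extraction, the core idea can
be sketched in a few lines of C.  This is NOT how libiberty or DJGPP
actually do it; the function extract_doc_comment and the choice of a
slash-star-star opener as the marker are assumptions made purely for
illustration:

```c
#include <stddef.h>
#include <string.h>

/* Copy the body of the first doc comment in SRC into OUT, which has
   room for OUTSZ bytes.  A doc comment is taken to start with a slash
   followed by two stars and to end at the next star-slash.  Return the
   number of bytes copied (excluding the NUL), or 0 if no complete doc
   comment is found or OUT is too small.  */
size_t
extract_doc_comment (const char *src, char *out, size_t outsz)
{
  static const char open_mark[] = "/**";
  static const char close_mark[] = "*/";

  /* Locate the opening marker and skip past it.  */
  const char *start = strstr (src, open_mark);
  if (start == NULL)
    return 0;
  start += strlen (open_mark);

  /* Locate the end of the comment.  */
  const char *end = strstr (start, close_mark);
  if (end == NULL)
    return 0;

  /* Refuse to truncate: the whole body plus a NUL must fit.  */
  size_t len = (size_t) (end - start);
  if (len + 1 > outsz)
    return 0;

  memcpy (out, start, len);
  out[len] = '\0';
  return len;
}
```

A real tool would loop over every comment in a file and write the text
out in the target format; the point here is only that the documentation
is edited next to the code and collected mechanically.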

But I agree that none of the comments in any of the gcc sources have
really helped me *understand* why the gcc code is designed the way it
is.

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
@ 2004-01-27 20:51 Richard Kenner
  2004-01-27 21:55 ` Phil Edwards
  0 siblings, 1 reply; 171+ messages in thread
From: Richard Kenner @ 2004-01-27 20:51 UTC (permalink / raw)
  To: phil; +Cc: gcc

    I certainly never called it that.  I certainly don't dispute the need
    for writing down what you're asking for.  But we also need useful API
    catalogs:

But these aren't libraries that have APIs.  Instead, they are a
collection of functions that implement a compiler.

    and cross-references:

I'm not convinced having a printed cross-reference (or one in a format
related to printing) is useful.  I don't see what benefit they have that
a tags file does not.  Can you explain that?

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-01-27 20:17     ` Daniel Berlin
  2004-01-27 20:34       ` Gabriel Dos Reis
@ 2004-01-27 20:42       ` Andrew Sutton
  1 sibling, 0 replies; 171+ messages in thread
From: Andrew Sutton @ 2004-01-27 20:42 UTC (permalink / raw)
  To: Gabriel Dos Reis; +Cc: Daniel Berlin, Joe.Buck, gcc, Richard Kenner

On Tuesday 27 January 2004 03:16 pm, Daniel Berlin wrote:
> > | Maybe for you, but i find that incredibly difficult to read and parse
> > | compared to the original marked up version, let alone understand.
> >
> >
> > I far much prefer Richard's version. Simple and clear to read. I've
> > tried to stay far away from the doxygen business in V3land.
> >
> 
> To each his own. For me, at least, it requires a few minutes of intense 
> staring to try to understand Richard's version, but about 5 seconds to 
> understand the first version.
> 
> Maybe it's cause law school has screwed up my ability to read.

maybe the @param, @return act as beacons and represent abstractions about the 
commented function pretty well. you don't have to read the entire comment to 
find the return type or an explanation of one of the parameters.
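
for anyone who hasn't seen the style under discussion, a rough sketch of
a doxygen-annotated C function follows.  the function max_value is made
up for the example and is not from gcc; @param, @return and @p are
standard doxygen commands:

```c
#include <stddef.h>

/**
 * Find the largest value in an array.
 *
 * @param values  Array to scan; must not be NULL.
 * @param count   Number of elements in @p values; must be at least 1.
 * @return        The largest element of @p values.
 */
int
max_value (const int *values, size_t count)
{
  int best = values[0];
  for (size_t i = 1; i < count; i++)
    if (values[i] > best)
      best = values[i];
  return best;
}
```

whether the tags read as beacons or as clutter is exactly the
disagreement in this thread; the example only shows what is being
argued about.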

i'm probably reading it the same way you are. maybe if we built a WYSIWYG 
editor that would render doxygen comments on the fly, the whole world could 
be happy.

andrew sutton
asutton@cs.kent.edu

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-01-27 19:55 Richard Kenner
  2004-01-27 20:39 ` Diego Novillo
@ 2004-01-27 20:40 ` Laurent GUERBY
  1 sibling, 0 replies; 171+ messages in thread
From: Laurent GUERBY @ 2004-01-27 20:40 UTC (permalink / raw)
  To: Richard Kenner; +Cc: dnovillo, gcc

On Tue, 2004-01-27 at 20:55, Richard Kenner wrote:
> No, that's not the level of documentation we're talking about!
> 
> The comments in each file describe what each file does at the high
> level.  What high-level documentation of GCC would consist of is an
> overview of what the files do when put together.  That is not at all
> the same as a concatenation of all the high-level comments in file.
> They are two very different things and you cannot derive one from the
> other.  The upper-level documentation should concentrate on such
> things as what order optimizations are done and how they interact.
> These are not things that are documented in the individual source files.

BTW, do the Ada front-end and middle-end (gigi) have such high-level
documentation? For the Ada expander I remember the original paper; maybe
it has been integrated as comments in one file nowadays? High-level
comments in the files are there IIRC, but it's been a while
since I looked at it, so it may have changed for new stuff.

Laurent


^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-01-27 19:55 Richard Kenner
@ 2004-01-27 20:39 ` Diego Novillo
  2004-01-27 20:40 ` Laurent GUERBY
  1 sibling, 0 replies; 171+ messages in thread
From: Diego Novillo @ 2004-01-27 20:39 UTC (permalink / raw)
  To: Richard Kenner; +Cc: gcc

On Tue, 2004-01-27 at 14:55, Richard Kenner wrote:

> They are two very different things and you cannot derive one from the
> other.  The upper-level documentation should concentrate on such
> things as what order optimizations are done and how they interact.
> These are not things that are documented in the individual source files.
> 
We agree on that point.  And that kind of documentation is something
that could go in a single file, say gcc.c.  I am not advocating against
design documentation.  On the contrary, we need it badly.  What I'm
saying is that we can have it together with the source code.


>     But it is possible to extract documentation from the source code
>     directly, and that is what I would like us to do for the internal API
>     and design documentation, at least.
> 
> I don't see this.  I don't understand how duplicating the low-level
> information that's in each file into a master document would improve the
> documentation quality of GCC.
>
You can have it cross referenced from the high-level documentation.  

> But from an *API* point
> of view, it's very simple: you call a function of a certain name and it
> does a certain optimization.
>
And so, you would tell your tool to extract that one single function.
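
A rough sketch of what I mean, with made-up names (halve_evens, is_even
and halve are purely illustrative, not GCC functions).  Only the entry
point carries a /** comment; the helpers are static and use plain
comments.  As far as I know, whether undocumented statics show up in the
output depends on doxygen settings such as EXTRACT_ALL and
EXTRACT_STATIC, so treat this as an assumption about the configuration:

```cpp
#include <vector>

/* Internal helper: plain comment, so doxygen does not treat it as
   documentation.  Returns true if X is even.  */
static bool is_even(int x)
{
  return x % 2 == 0;
}

/* Internal helper, likewise not a doxygen comment.  */
static int halve(int x)
{
  return x / 2;
}

/** Halve every even element of V in place; odd elements are untouched.
    This is the only function meant to appear in the extracted API.  */
void halve_evens(std::vector<int> &v)
{
  for (int &x : v)
    if (is_even(x))
      x = halve(x);
}
```

The file can be as long as it likes internally; the extracted page would
describe just the one function a caller needs.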


Diego.

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-01-27 20:17     ` Daniel Berlin
@ 2004-01-27 20:34       ` Gabriel Dos Reis
  2004-01-27 20:42       ` Andrew Sutton
  1 sibling, 0 replies; 171+ messages in thread
From: Gabriel Dos Reis @ 2004-01-27 20:34 UTC (permalink / raw)
  To: Daniel Berlin; +Cc: Joe.Buck, gcc, Richard Kenner

Daniel Berlin <dberlin@dberlin.org> writes:

| > | Maybe for you, but i find that incredibly difficult to read and parse
| > | compared to the original marked up version, let alone understand.
| >
| > I much prefer Richard's version. Simple and clear to read. I've
| > tried to stay far away from the doxygen business in V3land.
| >
| 
| To each his own. For me, at least, it requires a few minutes of
| intense staring to try to understand Richard's version, but about 5
| seconds to understand the first version.
| 
| Maybe it's cause law school has screwed up my ability to read.

I have no idea whether that is the case or how law school can have
that impact.  

Oh, just to clear any misunderstanding, I'm also used to typesetting
tools like (La)TeX and I have been using them for more than a decade
to typeset scientific papers; but I find the whole machine-oriented
thingy awful to follow (that is why I much prefer Joris' TeXmacs).
The only reason I like (La)TeX is that it provides me with a far
better tool for typesetting in a constrained environment.  I do not
see such an advantage for the doxygen stuff in the case of documenting
C code.

Anyway, I spoke because someone wanted to know whether Kenner was the
only one against.  Other than that, I do not believe you're going to
convince right now ;-)

-- Gaby

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
@ 2004-01-27 20:29 Richard Kenner
  2004-01-27 20:58 ` DJ Delorie
  2004-01-27 22:34 ` Tom Tromey
  0 siblings, 2 replies; 171+ messages in thread
From: Richard Kenner @ 2004-01-27 20:29 UTC (permalink / raw)
  To: ian; +Cc: gcc

    I would say that libstdc++ is a place where doxygen-style comments are
    more useful, because the documentation for a library tends to be more
    reference style anyhow.  There are relatively few specific cases where
    different library functions must work together in interesting ways,
    and thus require additional documentation.

I agree, but would put it a little differently: because the functions
represent a published interface, in that case, the intended readership of the
comments is very different from that of the code, but you want to put the two
physically close together for maintainability purposes.

In other words, I think this sort of approach is perfectly acceptable, and a
good idea, for libraries, where the API is what counts.  But GCC is a very
different animal, precisely because of the interactions you mention.

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-01-27 19:44 ` Joe Buck
@ 2004-01-27 20:17   ` Phil Edwards
  0 siblings, 0 replies; 171+ messages in thread
From: Phil Edwards @ 2004-01-27 20:17 UTC (permalink / raw)
  To: Joe Buck; +Cc: Richard Kenner, gcc

On Tue, Jan 27, 2004 at 11:43:39AM -0800, Joe Buck wrote:
> On Tue, Jan 27, 2004 at 01:50:36PM -0500, Richard Kenner wrote:
> > This should simply be:
> > 
> > /* This function searches a sequence for a matching sub-sequence.
> >    FIRST1, LAST1, FIRST2 and LAST2 are all forward iterators.
> >    The first iterator J in the range [FIRST1,LAST1-(LAST2-FIRST2)) such that
> >    *(J+N) == *(FIRST2+N) for each @c N in the range [0,LAST2-FIRST2) is
> >    returned, or LAST1 if no such iterator exists.  */
> > 
> > That's a *lot* cleaner and easier to read.
> 
> Fine, doxygen is cool with that.  If you write it as
> 
> /** This function searches a sequence for a matching sub-sequence.
>     FIRST1, LAST1, FIRST2 and LAST2 are all forward iterators.
>     The first iterator J in the range [FIRST1,LAST1-(LAST2-FIRST2)) such that
>     *(J+N) == *(FIRST2+N) for each @c N in the range [0,LAST2-FIRST2) is
>     returned, or LAST1 if no such iterator exists.  */
> 
> it will be treated as a doxygen comment, with the first sentence as
> the brief description and the whole paragraph as the long description.
> All I changed was the extra * character.

Now look at the output.  Compare it to what the output looked like before.
Note the loss of presented information.

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-01-27 20:08   ` Gabriel Dos Reis
@ 2004-01-27 20:17     ` Daniel Berlin
  2004-01-27 20:34       ` Gabriel Dos Reis
  2004-01-27 20:42       ` Andrew Sutton
  0 siblings, 2 replies; 171+ messages in thread
From: Daniel Berlin @ 2004-01-27 20:17 UTC (permalink / raw)
  To: Gabriel Dos Reis; +Cc: Joe.Buck, gcc, Richard Kenner

> | Maybe for you, but i find that incredibly difficult to read and parse
> | compared to the original marked up version, let alone understand.
>
> I much prefer Richard's version. Simple and clear to read. I've
> tried to stay far away from the doxygen business in V3land.
>

To each his own. For me, at least, it requires a few minutes of intense 
staring to try to understand Richard's version, but about 5 seconds to 
understand the first version.

Maybe it's cause law school has screwed up my ability to read.

> -- Gaby

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-01-27 20:05 Richard Kenner
  2004-01-27 20:11 ` Phil Edwards
@ 2004-01-27 20:17 ` Phil Edwards
  2004-01-27 21:08   ` Ian Lance Taylor
  2004-01-27 23:22   ` Bernd Schmidt
  1 sibling, 2 replies; 171+ messages in thread
From: Phil Edwards @ 2004-01-27 20:17 UTC (permalink / raw)
  To: Richard Kenner; +Cc: gcc

Richard Kenner wrote:
> 
>     The @param and @return and whatnot are not intended for humans
> 
> But humans will be reading them.  Every time they look at the sources.

Then use a folding text editor.  /** is a clear marker to begin folding.


> I would *certainly* dispute that fact. It's far *worse* than "nothing".
> What we desperately need is well-written high-level documentation of
> methods and algorithms and how everything fits together.  Finding ways
> of automatically extracting low-level details from code and calling
> it "high-level documentation" *detracts* from that effort.

We're talking past each other.

I certainly never called it that.  I certainly don't dispute the need for
writing down what you're asking for.  But we also need useful API catalogs:

     http://www.jaj.com/space/phil/gccdoxy/struct___unwind___context.html

and cross-references:

     http://www.jaj.com/space/phil/gccdoxy/unwind-dw2-fde-glibc_8c__incl.png

and trying to do those by hand is just monstrously stupid.

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
@ 2004-01-27 20:13 Richard Kenner
  0 siblings, 0 replies; 171+ messages in thread
From: Richard Kenner @ 2004-01-27 20:13 UTC (permalink / raw)
  To: Joe.Buck; +Cc: gcc

    Fine, doxygen is cool with that.  If you write it as

	...

    it will be treated as a doxygen comment, with the first sentence as
    the brief description and the whole paragraph as the long description.
    All I changed was the extra * character.

The issue I had was the annotations used to describe the parameters and
return value.  I certainly have no problem with the above, except if it's
viewed as replacing proper high-level documentation.

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-01-27 20:05 Richard Kenner
@ 2004-01-27 20:11 ` Phil Edwards
  2004-01-27 20:17 ` Phil Edwards
  1 sibling, 0 replies; 171+ messages in thread
From: Phil Edwards @ 2004-01-27 20:11 UTC (permalink / raw)
  To: Richard Kenner; +Cc: gcc

Richard Kenner wrote:
>     Do you have a suggested syntax which allows us to extract nicely formatted
>     HTML, LaTeX, and troff documentation?
> 
> No, because you can't "extract" documentation in a mechanical fashion.
> High-level documentation is not a collection of low-level details.

I've said many times on the libstdc++ list, and I'll say it here:  the goal of
extracting comments is to get a /reference/, not a textbook, not a tutorial.
Look at the libstdc++ web pages -- actually go /look/ at them -- and note that
the HOWTOs and writeups are separate from the API extractions.  They reference
each other, but neither replaces the other.

If you want high-level documentation, then look at the internals manual -- the
Texinfo one, the badly outdated and incomplete one -- and improve it.  If you
want API documentation, then extract comments.

We can even combine them.  We can even write the high-level stuff in such a
way as to have doxygen "extract" it verbatim, formatting intact.  I don't
believe anybody's suggesting that we stop trying to write an internals manual.
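
To make the "extract it verbatim" idea concrete, here is a toy sketch.
Everything in it is hypothetical (the passes drop_zeros and
merge_duplicates are invented for illustration and have nothing to do
with GCC's real passes).  The overview sits in an @file comment inside
@verbatim/@endverbatim, which asks doxygen to keep the layout as typed:

```cpp
#include <algorithm>
#include <vector>

/** @file
    Toy two-pass pipeline (hypothetical; not GCC code).

    @verbatim
    Pass order and interaction:

      1. drop_zeros        removes every 0 from the sequence
      2. merge_duplicates  collapses runs of equal adjacent values

    Pass 1 must run first: removing a 0 can make two equal values
    adjacent, which pass 2 then merges.
    @endverbatim  */

/** Remove every 0 from V.  */
void drop_zeros(std::vector<int> &v)
{
  v.erase(std::remove(v.begin(), v.end(), 0), v.end());
}

/** Collapse each run of equal adjacent values in V to one value.  */
void merge_duplicates(std::vector<int> &v)
{
  v.erase(std::unique(v.begin(), v.end()), v.end());
}

/** Run both passes on V in the documented order.  */
void run_passes(std::vector<int> &v)
{
  drop_zeros(v);
  merge_duplicates(v);
}
```

The per-function comments are the reference; the @file block is the
hand-written part that explains ordering and interaction, and no tool
can generate it for you.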

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-01-27 19:09 ` Daniel Berlin
  2004-01-27 19:13   ` Ian Lance Taylor
@ 2004-01-27 20:08   ` Gabriel Dos Reis
  2004-01-27 20:17     ` Daniel Berlin
  1 sibling, 1 reply; 171+ messages in thread
From: Gabriel Dos Reis @ 2004-01-27 20:08 UTC (permalink / raw)
  To: Daniel Berlin; +Cc: Richard Kenner, Joe.Buck, gcc

Daniel Berlin <dberlin@dberlin.org> writes:

| On Jan 27, 2004, at 1:50 PM, Richard Kenner wrote:
| 
| >   /**
| >    *  @brief Search a sequence for a matching sub-sequence.
| >    *  @param  first1  A forward iterator.
| >    *  @param  last1   A forward iterator.
| >    *  @param  first2  A forward iterator.
| >    *  @param  last2   A forward iterator.
| >    *  @return   The first iterator @c i in the range
| >    *  @p [first1,last1-(last2-first2)) such that @c *(i+N) == @p *(first2+N)
| >    *  for each @c N in the range @p [0,last2-first2), or @p last1 if no
| >    *  such iterator exists.
| >
| ...
| 
| > This should simply be:
| >
| > /* This function searches a sequence for a matching sub-sequence.
| >    FIRST1, LAST1, FIRST2 and LAST2 are all forward iterators.
| >    The first iterator J in the range [FIRST1,LAST1-(LAST2-FIRST2)) such that
| >    *(J+N) == *(FIRST2+N) for each @c N in the range [0,LAST2-FIRST2) is
| >    returned, or LAST1 if no such iterator exists.  */
| 
| > That's a *lot* cleaner and easier to read.
| >
| Maybe for you, but i find that incredibly difficult to read and parse
| compared to the original marked up version, let alone understand.

I much prefer Richard's version. Simple and clear to read. I've
tried to stay far away from the doxygen business in V3land.

-- Gaby

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
@ 2004-01-27 20:05 Richard Kenner
  2004-01-27 20:11 ` Phil Edwards
  2004-01-27 20:17 ` Phil Edwards
  0 siblings, 2 replies; 171+ messages in thread
From: Richard Kenner @ 2004-01-27 20:05 UTC (permalink / raw)
  To: phil; +Cc: gcc

    Do you have a suggested syntax which allows us to extract nicely formatted
    HTML, LaTeX, and troff documentation?

No, because you can't "extract" documentation in a mechanical fashion.
High-level documentation is not a collection of low-level details.

    The @param and @return and whatnot are not intended for humans

But humans will be reading them.  Every time they look at the sources.

    I have complaints about doxygen myself.  But IMO it's better than anything
    else that's been concretely suggested.  And -- undisputed fact -- it's
    better than what we have right now, which is nothing.

I would *certainly* dispute that fact. It's far *worse* than "nothing".
What we desperately need is well-written high-level documentation of
methods and algorithms and how everything fits together.  Finding ways
of automatically extracting low-level details from code and calling
it "high-level documentation" *detracts* from that effort.

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-01-27 19:31   ` Phil Edwards
@ 2004-01-27 19:59     ` Ian Lance Taylor
  0 siblings, 0 replies; 171+ messages in thread
From: Ian Lance Taylor @ 2004-01-27 19:59 UTC (permalink / raw)
  To: Phil Edwards; +Cc: law, Richard Kenner, Joe.Buck, gcc

Phil Edwards <phil@codesourcery.com> writes:

> I'm going to regret getting into this discussion...

Yeah, me too....

> Blame the leading '*' on me.  I introduced doxygen into libstdc++, and
> all these comments have followed that initial style.

I would say that libstdc++ is a place where doxygen-style comments are
more useful, because the documentation for a library tends to be more
reference style anyhow.  There are relatively few specific cases where
different library functions must work together in interesting ways,
and thus require additional documentation.

Also, libstdc++, as an implementation of the C++ standard library, has
a variety of high quality API documentation written by other people.
Not to mention it starts from a high quality design, certainly a
design which has had much more thought put into it than the design of
the gcc internals.

> I have complaints about doxygen myself.  But IMO it's better than anything
> else that's been concretely suggested.  And -- undisputed fact -- it's
> better than what we have right now, which is nothing.

Well, no, I think that fact is precisely what is being disputed.  If
you have something which makes it a bit harder to write source code,
and a bit harder to read source code, and doesn't help otherwise, then
it is in fact better to have nothing.

Just to give you an alternate concrete suggestion, we could add
sections to the internals documentation for each major data structure,
and we could add sections to the internals documentation for each
compiler pass.  The initial versions of those docs could be pulled
from the source code and lightly edited.  We could rely on volunteers
to clean up the text over time, as already happens from time to time
with the current manuals.  We could demand good documentation for
future additions.

This is obviously a lot harder, but I think it's in the realm of the
possible.  I personally think it would be better than anything
extracted from the source code.  And I think it would be better than
nothing.

Ian

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
@ 2004-01-27 19:55 Richard Kenner
  2004-01-27 20:39 ` Diego Novillo
  2004-01-27 20:40 ` Laurent GUERBY
  0 siblings, 2 replies; 171+ messages in thread
From: Richard Kenner @ 2004-01-27 19:55 UTC (permalink / raw)
  To: dnovillo; +Cc: gcc

    If such high-level documentation is written inside a comment at the top
    of each file, then you can coax the publishing tool of your choice to
    emit it in a dozen different formats.

No, that's not the level of documentation we're talking about!

The comments in each file describe what each file does at the high
level.  What high-level documentation of GCC would consist of is an
overview of what the files do when put together.  That is not at all
the same as a concatenation of all the high-level comments in the files.
They are two very different things and you cannot derive one from the
other.  The upper-level documentation should concentrate on such
things as what order optimizations are done and how they interact.
These are not things that are documented in the individual source files.

    But it is possible to extract documentation from the source code
    directly, and that is what I would like us to do for the internal API
    and design documentation, at least.

I don't see this.  I don't understand how duplicating the low-level
information that's in each file into a master document would improve the
documentation quality of GCC.  You can make the API argument, but even
that's weak.  Consider the optimizers, for example: you may have a 10,000
line source file that goes into extensive detail on an optimization
algorithm and has dozens of internal functions. But from an *API* point
of view, it's very simple: you call a function of a certain name and it
does a certain optimization.

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-01-27 18:53 Richard Kenner
  2004-01-27 19:09 ` Daniel Berlin
  2004-01-27 19:20 ` law
@ 2004-01-27 19:44 ` Joe Buck
  2004-01-27 20:17   ` Phil Edwards
  2 siblings, 1 reply; 171+ messages in thread
From: Joe Buck @ 2004-01-27 19:44 UTC (permalink / raw)
  To: Richard Kenner; +Cc: gcc

On Tue, Jan 27, 2004 at 01:50:36PM -0500, Richard Kenner wrote:
> This should simply be:
> 
> /* This function searches a sequence for a matching sub-sequence.
>    FIRST1, LAST1, FIRST2 and LAST2 are all forward iterators.
>    The first iterator J in the range [FIRST1,LAST1-(LAST2-FIRST2)) such that
>    *(J+N) == *(FIRST2+N) for each @c N in the range [0,LAST2-FIRST2) is
>    returned, or LAST1 if no such iterator exists.  */
> 
> That's a *lot* cleaner and easier to read.

Fine, doxygen is cool with that.  If you write it as

/** This function searches a sequence for a matching sub-sequence.
    FIRST1, LAST1, FIRST2 and LAST2 are all forward iterators.
    The first iterator J in the range [FIRST1,LAST1-(LAST2-FIRST2)) such that
    *(J+N) == *(FIRST2+N) for each @c N in the range [0,LAST2-FIRST2) is
    returned, or LAST1 if no such iterator exists.  */

it will be treated as a doxygen comment, with the first sentence as
the brief description and the whole paragraph as the long description.
All I changed was the extra * character.
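
For illustration, here is that comment attached to a naive
implementation.  The name search_sketch and the loop are my own
assumptions for the example, not the libstdc++ code; and as far as I
know, treating the first sentence as the brief description depends on
the JAVADOC_AUTOBRIEF setting:

```cpp
/** This function searches a sequence for a matching sub-sequence.
    FIRST1, LAST1, FIRST2 and LAST2 are all forward iterators.
    The first iterator J in the range [FIRST1,LAST1-(LAST2-FIRST2)) such that
    *(J+N) == *(FIRST2+N) for each @c N in the range [0,LAST2-FIRST2) is
    returned, or LAST1 if no such iterator exists.  */
template <typename FwdIt1, typename FwdIt2>
FwdIt1 search_sketch(FwdIt1 first1, FwdIt1 last1, FwdIt2 first2, FwdIt2 last2)
{
  for (;; ++first1)
    {
      FwdIt1 j = first1;
      for (FwdIt2 k = first2;; ++j, ++k)
        {
          if (k == last2)
            return first1;  // whole pattern matched starting at first1
          if (j == last1)
            return last1;   // text exhausted: no match is possible
          if (!(*j == *k))
            break;          // mismatch: try the next start position
        }
    }
}
```

The source reads the same as the plain-comment version; the extra *
only matters to the tool.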





^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-01-27 19:20 ` law
  2004-01-27 19:31   ` Phil Edwards
@ 2004-01-27 19:43   ` Ian Lance Taylor
  2004-01-29 18:37     ` law
       [not found]   ` <401C3F16.3040706@gnat.com>
  2 siblings, 1 reply; 171+ messages in thread
From: Ian Lance Taylor @ 2004-01-27 19:43 UTC (permalink / raw)
  To: law; +Cc: Richard Kenner, Joe.Buck, gcc

law@redhat.com writes:

> IMHO that's actually a huge long term liability as it actually discourages
> writing good API interfaces and documenting them, while at the same time
> encourages developers to actually look at the implementation to determine
> if the code in question actually does what they want.

I agree with you, but extracting documentation automatically from
source code comments, and from the source code itself, isn't API
documentation either.  It's just another way of reading the source
code.

Source code and documentation aren't the same thing.  The only case
where I've seen source code work as documentation is Web (that is, the
TeX implementation language), and Web is much harder to write than the
combined effort of writing source code and separate documentation.

The reason source code and documentation aren't the same thing is that
they serve different purposes.  Source code tells you what the
function does.  Documentation tells you how that fits into a larger
plan.  Documentation is organized for a human reader.  Source code is
organized for the compiler.

The advantage of Web is that it lets you split apart a single function
into different logical portions, and lets you group together the
logical portions of several different functions.  The most obvious use
is that you can pull the error cases out of the function, leaving the
reader looking only at the mainline case, which is generally much
easier to understand and to document.  Unfortunately, as noted, it's
very hard to program this way.

As I said, I'm not strongly opposed to extracting documentation from
the gcc sources.  But it's no substitute for actual documentation.

Ian

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
@ 2004-01-27 19:43 Richard Kenner
  0 siblings, 0 replies; 171+ messages in thread
From: Richard Kenner @ 2004-01-27 19:43 UTC (permalink / raw)
  To: law; +Cc: gcc

    And totally useless unless one goes into the source code to read it.

I think you're making that statement irrespective of the formatting of
the comments and I agree.

    IMHO that's actually a huge long term liability as it actually
    discourages writing good API interfaces and documenting them, while at
    the same time encourages developers to actually look at the
    implementation to determine if the code in question actually does what
    they want.

I'm not sure what "that" means in your first line above.

The problem I have with automatic documentation systems is that they
encourage a "fill in the checklist" mentality to documentation.  In other
words, programmers tend to feel that since they've filled in all the blanks,
they've written good documentation. Lots of industrial code looks that way.

But actually the quality of documentation can't be measured by any metric.  The
ideal documentation is what's needed to explain the code. Less than that
isn't sufficient, but more than that adds clutter.  No mechanical system can
create that quality of work.  There are 500-line functions that are totally
clear without any comments and where adding comments is clutter and there are
10-line functions that need a few paragraphs of comments.

Writing quality documentation is an art, much like writing good code.
Mechanizing things doesn't help.  It's far easier to do a good job
documenting a function if you're writing a paragraph consisting of English
sentences than filling in the blanks.

For example, if I have a function with four parameters named FIRST1, LAST1,
FIRST2, and LAST2, simply saying "The four parameters are the ranges of two
insn chains that we are comparing" is completely fine to document what the
parameters do.  Writing:

	* @param first1 RTL pointer to head of insn chain 1
	* @param last1  RTL pointer to tail of insn chain 1
	* @param first2 RTL pointer to head of insn chain 2
	* @param last2  RTL pointer to tail of insn chain 2

actually says *less* than the above sentence and is a *huge* amount of
totally unnecessary clutter.
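
As a sketch of the prose style on actual code (struct insn and
chains_equal_p are hypothetical stand-ins invented for this example;
real GCC insns are RTL objects, and the chains here are assumed to be
non-empty with each LAST reachable from its FIRST):

```cpp
/* Hypothetical stand-in for an insn: a code and a link to the next one.  */
struct insn
{
  int code;
  insn *next;
};

/* Return true if the two insn chains FIRST1..LAST1 and FIRST2..LAST2
   (both ends inclusive) have the same length and the same codes in the
   same order.  */
bool chains_equal_p(const insn *first1, const insn *last1,
                    const insn *first2, const insn *last2)
{
  for (;;)
    {
      if (first1->code != first2->code)
        return false;
      bool end1 = (first1 == last1);
      bool end2 = (first2 == last2);
      if (end1 || end2)
        return end1 && end2;  // equal only if both chains end together
      first1 = first1->next;
      first2 = first2->next;
    }
}
```

One sentence covers all four parameters and says how they relate, which
four separate @param lines would not.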

    Now having said that, I've never liked reading doxygen annotated
    comments; I find the annotations make the comments harder to read.
    Unfortunately, I don't have a better solution.

To what problem?

Also, can I take this discussion of annotations to mean that everybody agrees
with my original message about using papers and textbooks as substitutes for
documentation?

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-01-27 19:20 ` law
@ 2004-01-27 19:31   ` Phil Edwards
  2004-01-27 19:59     ` Ian Lance Taylor
  2004-01-27 19:43   ` Ian Lance Taylor
       [not found]   ` <401C3F16.3040706@gnat.com>
  2 siblings, 1 reply; 171+ messages in thread
From: Phil Edwards @ 2004-01-27 19:31 UTC (permalink / raw)
  To: law; +Cc: Richard Kenner, Joe.Buck, gcc


I'm going to regret getting into this discussion...


On Tue, Jan 27, 2004 at 12:12:22PM -0700, law@redhat.com wrote:
> In message <10401271850.AA29985@vlsi1.ultra.nyu.edu>, Richard Kenner writes:
>  >  /**
>  >   *  @brief Search a sequence for a matching sub-sequence.
>  >   *  @param  first1  A forward iterator.
>  >   *  @param  last1   A forward iterator.
>  >   *  @param  first2  A forward iterator.
>  >   *  @param  last2   A forward iterator.
>  >   *  @return   The first iterator @c i in the range
>  >   *  @p [first1,last1-(last2-first2)) such that @c *(i+N) == @p *(first2+N)
>  >   *  for each @c N in the range @p [0,last2-first2), or @p last1 if no
>  >   *  such iterator exists.
>  >
>  >This is *exactly* the sort of clutter I'm very strongly against.  It makes it
>  >*far* harder to read the source file with those "@param" tokens in the way.

Do you have a suggested syntax which allows us to extract nicely formatted
HTML, LaTeX, and troff documentation?


>  >And the "*" in each line is a violation of the GNU coding conventions, in
>  >addition to adding yet more clutter.  Sure, external documentation can be
>  >handy, but not at that huge readability cost.

Blame the leading '*' on me.  I introduced doxygen into libstdc++, and
all these comments have followed that initial style.


>  >This should simply be:
>  >
>  >/* This function searches a sequence for a matching sub-sequence.
>  >   FIRST1, LAST1, FIRST2 and LAST2 are all forward iterators.
>  >   The first iterator J in the range [FIRST1,LAST1-(LAST2-FIRST2)) such that
>  >   *(J+N) == *(FIRST2+N) for each @c N in the range [0,LAST2-FIRST2) is
>  >   returned, or LAST1 if no such iterator exists.  */
>  >
>  >That's a *lot* cleaner and easier to read.
> And totally useless unless one goes into the source code to read it.

Exactly.  The @param and @return and whatnot are not intended for humans
until after they've been processed.  If you think variables look good in
all caps in the source, fine, that's a valid opinion, but they look like
crap in HTML and LaTeX output.

I have complaints about doxygen myself.  But IMO it's better than anything
else that's been concretely suggested.  And -- undisputed fact -- it's
better than what we have right now, which is nothing.

-- 
Besides a mathematical inclination, an exceptionally good mastery of
one's native tongue is the most vital asset of a competent programmer.
                                          - Edsger Dijkstra, 1930-2002

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
@ 2004-01-27 19:28 Richard Kenner
  0 siblings, 0 replies; 171+ messages in thread
From: Richard Kenner @ 2004-01-27 19:28 UTC (permalink / raw)
  To: ian; +Cc: gcc

    An organized reference to gcc implementation internals would be very
    useful.  But that's not what we get from documentation extracted from
    the source code.  What we get is a disorganized set of reference
    information.  It tells you the information that's easy to find out in
    various other ways, but it doesn't tell you what you really need to
    know, which is the implicit assumptions and the underlying ideas.

That's exactly my point.

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-01-27 18:53 Richard Kenner
  2004-01-27 19:09 ` Daniel Berlin
@ 2004-01-27 19:20 ` law
  2004-01-27 19:31   ` Phil Edwards
                     ` (2 more replies)
  2004-01-27 19:44 ` Joe Buck
  2 siblings, 3 replies; 171+ messages in thread
From: law @ 2004-01-27 19:20 UTC (permalink / raw)
  To: Richard Kenner; +Cc: Joe.Buck, gcc

In message <10401271850.AA29985@vlsi1.ultra.nyu.edu>, Richard Kenner writes:
 >  /**
 >   *  @brief Search a sequence for a matching sub-sequence.
 >   *  @param  first1  A forward iterator.
 >   *  @param  last1   A forward iterator.
 >   *  @param  first2  A forward iterator.
 >   *  @param  last2   A forward iterator.
 >   *  @return   The first iterator @c i in the range
 >   *  @p [first1,last1-(last2-first2)) such that @c *(i+N) == @p *(first2+N)
 >   *  for each @c N in the range @p [0,last2-first2), or @p last1 if no
 >   *  such iterator exists.
 >
 >This is *exactly* the sort of clutter I'm very strongly against.  It makes it
 >*far* harder to read the source file with those "@param" tokens in the way.
 >And the "*" in each line is a violation of the GNU coding conventions, in
 >addition to adding yet more clutter.  Sure, external documentation can be
 >handy, but not at that huge readability cost.
 >
 >This should simply be:
 >
 >/* This function searches a sequence for a matching sub-sequence.
 >   FIRST1, LAST1, FIRST2 and LAST2 are all forward iterators.
 >   The first iterator J in the range [FIRST1,LAST1-(LAST2-FIRST2)) such that
 >   *(J+N) == *(FIRST2+N) for each @c N in the range [0,LAST2-FIRST2) is
 >   returned, or LAST1 if no such iterator exists.  */
 >
 >That's a *lot* cleaner and easier to read.
And totally useless unless one goes into the source code to read it.

IMHO that's actually a huge long term liability as it actually discourages
writing good API interfaces and documenting them, while at the same time
encourages developers to actually look at the implementation to determine
if the code in question actually does what they want.

Now having said that, I've never liked reading doxygen annotated comments;
I find the annotations make the comments harder to read.  Unfortunately, I
don't have a better solution.




jeff




^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-01-27 19:09 ` Daniel Berlin
@ 2004-01-27 19:13   ` Ian Lance Taylor
  2004-01-27 20:08   ` Gabriel Dos Reis
  1 sibling, 0 replies; 171+ messages in thread
From: Ian Lance Taylor @ 2004-01-27 19:13 UTC (permalink / raw)
  To: Daniel Berlin; +Cc: Richard Kenner, Joe.Buck, gcc

Daniel Berlin <dberlin@dberlin.org> writes:

> > That's a *lot* cleaner and easier to read.
> >
> Maybe for you, but i find that incredibly difficult to read and parse
> compared to the original marked up version, let alone understand.

I find Richard's version easier to understand.

Interesting.

Ian

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-01-27 18:53 Richard Kenner
@ 2004-01-27 19:09 ` Daniel Berlin
  2004-01-27 19:13   ` Ian Lance Taylor
  2004-01-27 20:08   ` Gabriel Dos Reis
  2004-01-27 19:20 ` law
  2004-01-27 19:44 ` Joe Buck
  2 siblings, 2 replies; 171+ messages in thread
From: Daniel Berlin @ 2004-01-27 19:09 UTC (permalink / raw)
  To: Richard Kenner; +Cc: Joe.Buck, gcc


On Jan 27, 2004, at 1:50 PM, Richard Kenner wrote:

>   /**
>    *  @brief Search a sequence for a matching sub-sequence.
>    *  @param  first1  A forward iterator.
>    *  @param  last1   A forward iterator.
>    *  @param  first2  A forward iterator.
>    *  @param  last2   A forward iterator.
>    *  @return   The first iterator @c i in the range
>    *  @p [first1,last1-(last2-first2)) such that @c *(i+N) == @p *(first2+N)
>    *  for each @c N in the range @p [0,last2-first2), or @p last1 if no
>    *  such iterator exists.
>
...

> This should simply be:
>
> /* This function searches a sequence for a matching sub-sequence.
>    FIRST1, LAST1, FIRST2 and LAST2 are all forward iterators.
>    The first iterator J in the range [FIRST1,LAST1-(LAST2-FIRST2)) such that
>    *(J+N) == *(FIRST2+N) for each N in the range [0,LAST2-FIRST2) is
>    returned, or LAST1 if no such iterator exists.  */

> That's a *lot* cleaner and easier to read.
>
Maybe for you, but I find that incredibly difficult to read and parse
compared to the original marked up version, let alone understand.


^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-01-27 18:37       ` Daniel Berlin
@ 2004-01-27 18:58         ` Ian Lance Taylor
  2004-01-28 16:42         ` Joern Rennecke
  2004-01-31 23:52         ` Robert Dewar
  2 siblings, 0 replies; 171+ messages in thread
From: Ian Lance Taylor @ 2004-01-27 18:58 UTC (permalink / raw)
  To: Daniel Berlin
  Cc: Zack Weinberg, gcc, Joe Buck, Richard Kenner, Diego Novillo,
	lars.segerlund

Daniel Berlin <dberlin@dberlin.org> writes:

> > Dunno about anyone else, but I am all for this.
> >
> 
> Is there anyone *except* Richard Kenner who is against this?

I've never found this sort of documentation to be useful.  If I want
to find out what a function does, I go look at it in the source code.
M-. is easy to type in Emacs.  If I wanted anything more, I would use
Source Navigator, but I don't.

An organized reference to gcc implementation internals would be very
useful.  But that's not what we get from documentation extracted from
the source code.  What we get is a disorganized set of reference
information.  It tells you the information that's easy to find out in
various other ways, but it doesn't tell you what you really need to
know, which is the implicit assumptions and the underlying ideas.

Also, I find source annotations to be slightly irritating in practice.

I don't feel terribly strongly about this.  But overall I would find
it to be a minor hindrance and no help.

Ian

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
@ 2004-01-27 18:53 Richard Kenner
  2004-01-27 19:09 ` Daniel Berlin
                   ` (2 more replies)
  0 siblings, 3 replies; 171+ messages in thread
From: Richard Kenner @ 2004-01-27 18:53 UTC (permalink / raw)
  To: Joe.Buck; +Cc: gcc

  /**
   *  @brief Search a sequence for a matching sub-sequence.
   *  @param  first1  A forward iterator.
   *  @param  last1   A forward iterator.
   *  @param  first2  A forward iterator.
   *  @param  last2   A forward iterator.
   *  @return   The first iterator @c i in the range
   *  @p [first1,last1-(last2-first2)) such that @c *(i+N) == @p *(first2+N)
   *  for each @c N in the range @p [0,last2-first2), or @p last1 if no
   *  such iterator exists.

This is *exactly* the sort of clutter I'm very strongly against.  It makes it
*far* harder to read the source file with those "@param" tokens in the way.
And the "*" in each line is a violation of the GNU coding conventions, in
addition to adding yet more clutter.  Sure, external documentation can be
handy, but not at that huge readability cost.

This should simply be:

/* This function searches a sequence for a matching sub-sequence.
   FIRST1, LAST1, FIRST2 and LAST2 are all forward iterators.
   The first iterator J in the range [FIRST1,LAST1-(LAST2-FIRST2)) such that
   *(J+N) == *(FIRST2+N) for each N in the range [0,LAST2-FIRST2) is
   returned, or LAST1 if no such iterator exists.  */

That's a *lot* cleaner and easier to read.

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-01-27 18:19 Richard Kenner
  2004-01-27 18:24 ` Diego Novillo
@ 2004-01-27 18:38 ` Andrew Sutton
  1 sibling, 0 replies; 171+ messages in thread
From: Andrew Sutton @ 2004-01-27 18:38 UTC (permalink / raw)
  To: gcc

On Tuesday 27 January 2004 01:09 pm, Richard Kenner wrote:
>     Which reminds me.  What happened to the proposal that was discussed in
>     the last GCC summit to start adding doxygen markers to the source code
>     comments?
> 
> I'd be very strongly *against* such a proposal.  I don't see anything but
> the slightest advantage of such "external" documentation, certainly not
> worth cluttering up code with "markers".
> 
> If there is a value to having such external documentation, let's teach it
> how to read GCC's documentation style, which already has enough
> information.

just a quick point of reference... almost any documentation on the topic of, 
well... documentation basically says that external documentation is a very, 
very good thing to have - there are whole academic conferences on the topic 
(SIGDOC, DocEng, etc.) and this turns out to be quite a big topic in software 
maintenance/evolution and program understanding.

besides, limiting yourself to internal (code)  documentation can pose 
significant barriers to people coming on to the project (think new 
contributors/bug fixers). not everybody learns from the bottom up. using 
documentation generators like doxygen provides a pretty convenient way to 
provide a high-level picture of the system and its components. so yes... 
there is most definitely value there.

personally, i'd be for doxygen comments, but that's me and i'm just a lurker 
on this list :)

andrew sutton
asutton@cs.kent.edu

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-01-27 18:24     ` Zack Weinberg
@ 2004-01-27 18:37       ` Daniel Berlin
  2004-01-27 18:58         ` Ian Lance Taylor
                           ` (2 more replies)
  0 siblings, 3 replies; 171+ messages in thread
From: Daniel Berlin @ 2004-01-27 18:37 UTC (permalink / raw)
  To: Zack Weinberg
  Cc: gcc, Joe Buck, Richard Kenner, Diego Novillo, lars.segerlund

>
> Dunno about anyone else, but I am all for this.
>

Is there anyone *except* Richard Kenner who is against this?

> zw

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-01-27 18:04   ` Diego Novillo
@ 2004-01-27 18:24     ` Zack Weinberg
  2004-01-27 18:37       ` Daniel Berlin
  0 siblings, 1 reply; 171+ messages in thread
From: Zack Weinberg @ 2004-01-27 18:24 UTC (permalink / raw)
  To: Diego Novillo; +Cc: Joe Buck, Richard Kenner, lars.segerlund, gcc

Diego Novillo <dnovillo@redhat.com> writes:

> Which reminds me.  What happened to the proposal that was discussed in
> the last GCC summit to start adding doxygen markers to the source code
> comments?
>
> Right now, I run a simplistic script over some files in the tree-ssa
> branch to add the basic '/**' markers to get online documentation.  It'd
> be nice if we could start using the full range of doxygen mark ups in
> the comments (if not doxygen, any other similar tool would do).

Dunno about anyone else, but I am all for this.

zw

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-01-27 18:19 Richard Kenner
@ 2004-01-27 18:24 ` Diego Novillo
  2004-01-27 18:38 ` Andrew Sutton
  1 sibling, 0 replies; 171+ messages in thread
From: Diego Novillo @ 2004-01-27 18:24 UTC (permalink / raw)
  To: Richard Kenner; +Cc: gcc

On Tue, 2004-01-27 at 13:09, Richard Kenner wrote:
>     Which reminds me.  What happened to the proposal that was discussed in
>     the last GCC summit to start adding doxygen markers to the source code
>     comments?
> 
> I'd be very strongly *against* such a proposal.  I don't see anything but
> the slightest advantage of such "external" documentation, certainly not
> worth cluttering up code with "markers".
> 
*shrug*  No strong opinions here.  Formatting markers don't bother me.

> If there is a value to having such external documentation, let's teach it
> how to read GCC's documentation style, which already has enough information.
>
Which is what we do with the script to publish tree-ssa source
documentation.


Diego.

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-01-27 18:03 Richard Kenner
@ 2004-01-27 18:20 ` Joe Buck
  0 siblings, 0 replies; 171+ messages in thread
From: Joe Buck @ 2004-01-27 18:20 UTC (permalink / raw)
  To: Richard Kenner; +Cc: gcc

On Tue, Jan 27, 2004 at 01:01:32PM -0500, Richard Kenner wrote:
>     Paper documentation is nice as well, but the way to get both, and to
>     keep the code and documentation consistent, is to use doxygen-style
>     comments and use that to generate the documentation.
> 
> Agreed.  I have no problems with having a *second* copy, as long as the
> primary one is in the file and "doxygen-style" doesn't clutter up the
> source and make it harder to read the comments.

> I don't know anything about the annotations, but if it were a tradeoff
> between making the source slightly less clean to get a secondary documentation,
> I'd vote *against* the secondary documentation since it isn't nearly as
> useful as an easy-to-read documentation in the source.

Here is an example of a doxygen comment that describes a function.
It describes one of the std::search functions in the standard C++ library.

See

http://gcc.gnu.org/onlinedocs/libstdc++/latest-doxygen/namespacestd.html#a287

for the output.

The directives @p and @c just make the output prettier and aren't strictly
necessary, and in fact it appears that they are misused here: @p or @c
seem to set the next identifier in a fixed-width font, but they seem to
do nothing if the next token is not an identifier.

  /**
   *  @brief Search a sequence for a matching sub-sequence.
   *  @param  first1  A forward iterator.
   *  @param  last1   A forward iterator.
   *  @param  first2  A forward iterator.
   *  @param  last2   A forward iterator.
   *  @return   The first iterator @c i in the range
   *  @p [first1,last1-(last2-first2)) such that @c *(i+N) == @p *(first2+N)
   *  for each @c N in the range @p [0,last2-first2), or @p last1 if no
   *  such iterator exists.
   *
   *  Searches the range @p [first1,last1) for a sub-sequence that compares
   *  equal value-by-value with the sequence given by @p [first2,last2) and
   *  returns an iterator to the first element of the sub-sequence, or
   *  @p last1 if the sub-sequence is not found.
   *
   *  Because the sub-sequence must lie completely within the range
   *  @p [first1,last1) it must start at a position less than
   *  @p last1-(last2-first2) where @p last2-first2 is the length of the
   *  sub-sequence.
   *  This means that the returned iterator @c i will be in the range
   *  @p [first1,last1-(last2-first2))
  */
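
To make the @return clause above concrete, here is a minimal sketch of
calling std::search and interpreting its result.  The wrapper name
first_match_index is hypothetical and exists only for this illustration;
the only library call used is std::search from <algorithm>.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

/* Return the index in HAYSTACK at which NEEDLE first occurs, or
   HAYSTACK.size () if it does not occur.  This is a thin wrapper around
   std::search; the wrapper name is hypothetical.  */
std::size_t
first_match_index (const std::string &haystack, const std::string &needle)
{
  /* std::search returns an iterator to the start of the first match, or
     its second argument (here haystack.end ()) when there is no match.  */
  std::string::const_iterator it
    = std::search (haystack.begin (), haystack.end (),
                   needle.begin (), needle.end ());
  return static_cast<std::size_t> (it - haystack.begin ());
}
```

With HAYSTACK "abcabc" and NEEDLE "bc" the result is 1; with NEEDLE "x"
the result is 6, the "last1" case.  With HAYSTACK "abc" and NEEDLE "bc"
the match starts at index 1, which is exactly last1-(last2-first2), so a
match that ends at last1 begins at that position.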

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
@ 2004-01-27 18:19 Richard Kenner
  2004-01-27 18:24 ` Diego Novillo
  2004-01-27 18:38 ` Andrew Sutton
  0 siblings, 2 replies; 171+ messages in thread
From: Richard Kenner @ 2004-01-27 18:19 UTC (permalink / raw)
  To: dnovillo; +Cc: gcc

    Which reminds me.  What happened to the proposal that was discussed in
    the last GCC summit to start adding doxygen markers to the source code
    comments?

I'd be very strongly *against* such a proposal.  I don't see anything but
the slightest advantage of such "external" documentation, certainly not
worth cluttering up code with "markers".

If there is a value to having such external documentation, let's teach it
how to read GCC's documentation style, which already has enough information.

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
@ 2004-01-27 18:06 Richard Kenner
  0 siblings, 0 replies; 171+ messages in thread
From: Richard Kenner @ 2004-01-27 18:06 UTC (permalink / raw)
  To: dberlin; +Cc: gcc

    I really liked it when tree-ssa had doxygen generated comments.
    They have a nice style and form to them that remind you to make sure 
    you've added docs for all the relevant things in a function (like 
    return value, etc).

I just looked at what appears to be a manual for doxygen and I must say that
I very much *do not* like those annotations.  I think the very slight advantage
of having a separate document does not justify the clutter they cause in the
source files.

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-01-27 17:51 ` Joe Buck
  2004-01-27 17:57   ` Daniel Berlin
@ 2004-01-27 18:04   ` Diego Novillo
  2004-01-27 18:24     ` Zack Weinberg
  2004-01-27 22:12   ` Geert Bosch
  2 siblings, 1 reply; 171+ messages in thread
From: Diego Novillo @ 2004-01-27 18:04 UTC (permalink / raw)
  To: Joe Buck; +Cc: Richard Kenner, lars.segerlund, gcc

On Tue, 2004-01-27 at 12:50, Joe Buck wrote:
> >     Perhaps the documentation doesn't have to be in the file, but there
> >     has to be some texinfo documentation for it ?
> 
> On Tue, Jan 27, 2004 at 10:50:28AM -0500, Richard Kenner wrote:
> > No, I think it has to be in the file.  That's the most logical place for it
> > and the only place that has any hope of being maintained as the file changes.
> 
> Paper documentation is nice as well, but the way to get both, and to keep
> the code and documentation consistent, is to use doxygen-style comments
> and use that to generate the documentation.
>
Which reminds me.  What happened to the proposal that was discussed in
the last GCC summit to start adding doxygen markers to the source code
comments?

Right now, I run a simplistic script over some files in the tree-ssa
branch to add the basic '/**' markers to get online documentation.  It'd
be nice if we could start using the full range of doxygen mark ups in
the comments (if not doxygen, any other similar tool would do).
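
For readers wondering what such a marker pass might look like, the
following is only a rough sketch and not the script actually run on the
tree-ssa branch.  It assumes that the comments of interest open with
"/* " in column zero; the function name add_doxygen_markers is
hypothetical.

```cpp
#include <cassert>
#include <sstream>
#include <string>

/* Return a copy of SOURCE in which every comment opener "/* " found at
   the start of a line is rewritten as "/** ".  Comments that start
   elsewhere on a line, and openers that are already "/**", are left
   untouched.  Every output line ends with a newline.  */
std::string
add_doxygen_markers (const std::string &source)
{
  std::istringstream in (source);
  std::string line;
  std::string out;

  while (std::getline (in, line))
    {
      /* Only a plain opener in column zero is promoted.  */
      if (line.compare (0, 3, "/* ") == 0)
        line.replace (0, 2, "/**");
      out += line;
      out += '\n';
    }
  return out;
}
```

A pass like this leaves the comment text itself alone, which is why it
yields only the basic online documentation and none of the richer
markup.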


Diego.

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
@ 2004-01-27 18:03 Richard Kenner
  2004-01-27 18:20 ` Joe Buck
  0 siblings, 1 reply; 171+ messages in thread
From: Richard Kenner @ 2004-01-27 18:03 UTC (permalink / raw)
  To: Joe.Buck; +Cc: gcc

    Paper documentation is nice as well, but the way to get both, and to
    keep the code and documentation consistent, is to use doxygen-style
    comments and use that to generate the documentation.

Agreed.  I have no problems with having a *second* copy, as long as the
primary one is in the file and "doxygen-style" doesn't clutter up the
source and make it harder to read the comments.

I don't know anything about the annotations, but if it were a tradeoff
between making the source slightly less clean to get a secondary documentation,
I'd vote *against* the secondary documentation since it isn't nearly as
useful as an easy-to-read documentation in the source.

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-01-27 17:51 ` Joe Buck
@ 2004-01-27 17:57   ` Daniel Berlin
  2004-01-27 18:04   ` Diego Novillo
  2004-01-27 22:12   ` Geert Bosch
  2 siblings, 0 replies; 171+ messages in thread
From: Daniel Berlin @ 2004-01-27 17:57 UTC (permalink / raw)
  To: Joe Buck; +Cc: gcc, lars.segerlund, Richard Kenner


On Jan 27, 2004, at 12:50 PM, Joe Buck wrote:

>>     Perhaps the documentation doesn't have to be in the file, but there
>>     has to be some texinfo documentation for it ?
>
> On Tue, Jan 27, 2004 at 10:50:28AM -0500, Richard Kenner wrote:
>> No, I think it has to be in the file.  That's the most logical place for it
>> and the only place that has any hope of being maintained as the file changes.
>
> Paper documentation is nice as well, but the way to get both, and to keep
> the code and documentation consistent, is to use doxygen-style comments
> and use that to generate the documentation.
>

That would be nice.
I really liked it when tree-ssa had doxygen generated comments.
They have a nice style and form to them that remind you to make sure 
you've added docs for all the relevant things in a function (like 
return value, etc).
--Dan

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-01-27 15:48 Richard Kenner
@ 2004-01-27 17:51 ` Joe Buck
  2004-01-27 17:57   ` Daniel Berlin
                     ` (2 more replies)
  0 siblings, 3 replies; 171+ messages in thread
From: Joe Buck @ 2004-01-27 17:51 UTC (permalink / raw)
  To: Richard Kenner; +Cc: lars.segerlund, gcc

>     Perhaps the documentation doesn't have to be in the file, but there
>     has to be some texinfo documentation for it ?

On Tue, Jan 27, 2004 at 10:50:28AM -0500, Richard Kenner wrote:
> No, I think it has to be in the file.  That's the most logical place for it
> and the only place that has any hope of being maintained as the file changes.

Paper documentation is nice as well, but the way to get both, and to keep
the code and documentation consistent, is to use doxygen-style comments
and use that to generate the documentation.

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
@ 2004-01-27 15:48 Richard Kenner
  2004-01-27 17:51 ` Joe Buck
  0 siblings, 1 reply; 171+ messages in thread
From: Richard Kenner @ 2004-01-27 15:48 UTC (permalink / raw)
  To: lars.segerlund; +Cc: gcc

    Perhaps the documentation doesn't have to be in the file, but there
    has to be some texinfo documentation for it ?

No, I think it has to be in the file.  That's the most logical place for it
and the only place that has any hope of being maintained as the file changes.

    If we had this describing every file in gccint it would be sooo nice !

Most files do have internal documentation saying what they are doing.

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: "Documentation by paper"
  2004-01-27 15:40 Richard Kenner
@ 2004-01-27 15:48 ` Lars Segerlund
  2004-02-01  0:43   ` Robert Dewar
  2004-01-30 23:18 ` Mark Mitchell
  1 sibling, 1 reply; 171+ messages in thread
From: Lars Segerlund @ 2004-01-27 15:48 UTC (permalink / raw)
  To: gcc


 Perhaps the documentation doesn't have to be in the file, but there has to be some texinfo documentation for it ?

 If we had this describing every file in gccint it would be sooo nice !

 I suspect that would make it much  easier to get into gcc.

 A good crossreference/algorithmic overview doesn't exist as it is today.

 / regards, Lars Segerlund.


On Tue, 27 Jan 04 10:24:56 EST
kenner@vlsi1.ultra.nyu.edu (Richard Kenner) wrote:

> I've been noticing for a while that there are an increasing number of files
> in GCC where the only overview documentation is a reference to a paper or
> textbook.
> 
> I think this is totally unacceptable documentation and that we need to have a
> policy about this sort of documentation.
> 
> My reasons are as follows:
> 
> (1) It's unreasonable for any person who wants to work on that file to have
> to locate and read the paper.  Sure, if you are doing significant algorithmic
> work on the file, you need to have read the reference.  But for small things,
> or most debugging, you don't need that much information and most changes are
> small things.
> 
> (2) It's rare to implement an algorithm *exactly* as presented in the paper,
> so we'd need a list of changes.  That means the description is a combination
> of a reference and a set of changes, which is complex.
> 
> (3) A critical part of the overview is identifying what part of the code and
> data structures does what.  If you have a complete description of the
> algorithm in the file, this flows very naturally because you intersperse
> references to the functions and structs into your description of the
> algorithm.  Otherwise, you have to use odd language to link the
> implementation with the algorithm.  This link is perhaps the most critical
> part of the documentation but is the part most commonly left out.
> 
> 
> Certainly the reference needs to be there as well, both for credit purposes
> and to supply further details.  For example, normally a critical part of
> documentation is not just what's being done but *why* it's being done and why
> other things *aren't* being done.  Here, the paper can serve those purposes.
> 
> As I've been getting into some of the newer parts of the compiler, I've been
> very hampered by the lack of proper documentation.  I think improving this
> documentation ought to be one of the major goals of 3.5 aside from any other
> changes.
> 
> I'd like to get agreement on the following documentation standard for the
> cases where papers or texts are referenced:
> 
> (1) The algorithm should be described fully enough in a block of comments at
> the front of the file that the goals and methods of the algorithm can be
> completely understood just from those comments.
> 
> (2) As part of that narrative, any differences between the algorithm in the
> reference and the code should be explained.  Likewise, any implementation
> choices should be pointed out.
> 
> (3) Again as part of the narrative, each major function and data structure
> should be mentioned.
> 
> (4) The reference should be supplied in a clear manner.  If it is available
> online, a URL should be supplied.
> 
> Of course, each file should also meet the minimum documentation requirements
> in other areas:
> 
> (1) There should be a block of comments in front of every function giving the
> external specification of each function, including the meaning of every
> argument.
> 
> (2) Within each function, there should be enough comments to explain the role
> of each part of the function in implementing those external specifications.
> At a minimum, this means a sentence or two for each non-trivial "if"
> statement or loop.  These should not be a translation of the code into
> English, but provide the linkage between specification and implementation.
> 
> (3) Any design choices, especially choices about *not* doing something, need
> to be clearly documented.
> 
> Does everybody agree with these standards?  If we can't get consensus, I'd
> like to ask the SC to look at this issue.

^ permalink raw reply	[flat|nested] 171+ messages in thread

* "Documentation by paper"
@ 2004-01-27 15:40 Richard Kenner
  2004-01-27 15:48 ` Lars Segerlund
  2004-01-30 23:18 ` Mark Mitchell
  0 siblings, 2 replies; 171+ messages in thread
From: Richard Kenner @ 2004-01-27 15:40 UTC (permalink / raw)
  To: gcc

I've been noticing for a while that there are an increasing number of files
in GCC where the only overview documentation is a reference to a paper or
textbook.

I think this is totally unacceptable documentation and that we need to have a
policy about this sort of documentation.

My reasons are as follows:

(1) It's unreasonable for any person who wants to work on that file to have
to locate and read the paper.  Sure, if you are doing significant algorithmic
work on the file, you need to have read the reference.  But for small things,
or most debugging, you don't need that much information and most changes are
small things.

(2) It's rare to implement an algorithm *exactly* as presented in the paper,
so we'd need a list of changes.  That means the description is a combination
of a reference and a set of changes, which is complex.

(3) A critical part of the overview is identifying what part of the code and
data structures does what.  If you have a complete description of the
algorithm in the file, this flows very naturally because you intersperse
references to the functions and structs into your description of the
algorithm.  Otherwise, you have to use odd language to link the
implementation with the algorithm.  This link is perhaps the most critical
part of the documentation but is the part most commonly left out.

Certainly the reference needs to be there as well, both for credit purposes
and to supply further details.  For example, normally a critical part of
documentation is not just what's being done but *why* it's being done and why
other things *aren't* being done.  Here, the paper can serve those purposes.

As I've been getting into some of the newer parts of the compiler, I've been
very hampered by the lack of proper documentation.  I think improving this
documentation ought to be one of the major goals of 3.5 aside from any other
changes.

I'd like to get agreement on the following documentation standard for the
cases where papers or texts are referenced:

(1) The algorithm should be described fully enough in a block of comments at
the front of the file that the goals and methods of the algorithm can be
completely understood just from those comments.

(2) As part of that narrative, any differences between the algorithm in the
reference and the code should be explained.  Likewise, any implementation
choices should be pointed out.

(3) Again as part of the narrative, each major function and data structure
should be mentioned.

(4) The reference should be supplied in a clear manner.  If it is available
online, a URL should be supplied.

Of course, each file should also meet the minimum documentation requirements
in other areas:

(1) There should be a block of comments in front of every function giving the
external specification of each function, including the meaning of every
argument.

(2) Within each function, there should be enough comments to explain the role
of each part of the function in implementing those external specifications.
At a minimum, this means a sentence or two for each non-trivial "if"
statement or loop.  These should not be a translation of the code into
English, but provide the linkage between specification and implementation.

(3) Any design choices, especially choices about *not* doing something, need
to be clearly documented.

Does everybody agree with these standards?  If we can't get consensus, I'd
like to ask the SC to look at this issue.
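
As an illustration of what a file meeting the proposed standard might
look like, here is a deliberately tiny sketch.  The algorithm (Euclid's
greatest common divisor), the function name gcd_euclid and the
placeholder reference line are all assumptions chosen for the example;
none of them come from GCC.

```cpp
/* Greatest common divisor by Euclid's algorithm.

   Overview: the pair (A, B) is repeatedly replaced by (B, A mod B)
   until B is zero; the remaining A is the result.  The whole algorithm
   lives in gcd_euclid and uses no auxiliary data structures.

   Implementation choices: the loop is iterative rather than recursive,
   so stack use is constant.  We deliberately do *not* accept signed
   values; callers must pass magnitudes.

   Reference: <citation and URL would go here; placeholder only>.  */

#include <cassert>

/* Return the greatest common divisor of A and B.  If one argument is
   zero the other is returned; if both are zero the result is zero.  */
unsigned
gcd_euclid (unsigned a, unsigned b)
{
  /* Each iteration preserves gcd (A, B) while strictly decreasing B,
     which is what guarantees termination.  */
  while (b != 0)
    {
      unsigned r = a % b;
      a = b;
      b = r;
    }
  return a;
}
```

The header comment carries the overview, the implementation choices and
the reference; the function comment gives the external specification;
the loop comment links the code back to that specification rather than
restating it in English.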

^ permalink raw reply	[flat|nested] 171+ messages in thread

end of thread, other threads:[~2004-02-11 15:56 UTC | newest]

Thread overview: 171+ messages
2004-01-27 18:56 "Documentation by paper" Richard Kenner
2004-01-27 19:31 ` Diego Novillo
  -- strict thread matches above, loose matches on Subject: below --
2004-02-09 19:05 Richard Kenner
2004-02-09 18:55 Richard Kenner
2004-02-09 18:59 ` Paul Koning
2004-02-09 18:52 Richard Kenner
2004-02-09 18:54 ` Zack Weinberg
2004-02-10 19:51   ` Kai Henningsen
2004-02-09 18:37 Richard Kenner
2004-02-09 18:45 ` Zack Weinberg
2004-02-09 18:28 Robert Dewar
2004-02-09 18:20 Robert Dewar
2004-02-09 17:19 Richard Kenner
2004-02-09 18:14 ` Joe Buck
2004-02-09 18:34 ` Zack Weinberg
2004-02-05 20:09 Richard Kenner
2004-02-04 18:05 Richard Kenner
2004-02-04 18:58 ` Joe Buck
2004-02-04 19:10   ` Robert Dewar
2004-02-04 14:35 Richard Kenner
2004-02-04 14:08 Richard Kenner
2004-02-04 17:50 ` Joe Buck
2004-02-05 19:57 ` Felix Lee
2004-02-06 10:51   ` Robert Dewar
2004-02-04 13:20 Richard Kenner
2004-02-04 13:49 ` Michael Matz
2004-02-04 14:11   ` Robert Dewar
2004-02-04 14:30     ` Michael Matz
2004-02-04 14:43       ` Arnaud Charlet
2004-02-04 14:46       ` Robert Dewar
2004-02-04 15:56         ` Daniel Berlin
2004-02-04 13:10 Richard Kenner
2004-02-03 16:57 Richard Kenner
2004-02-03 22:22 ` Robert Dewar
2004-02-03 16:56 Richard Kenner
2004-02-03 17:13 ` Lars Segerlund
2004-02-03 16:37 Paolo Bonzini
2004-02-03 16:28 Richard Kenner
2004-02-03 22:06 ` Robert Dewar
2004-02-03 22:20   ` Dale Johannesen
2004-02-04 13:48     ` Robert Dewar
2004-02-04  2:56 ` Russ Allbery
2004-02-04 17:26   ` Phil Edwards
2004-02-03 16:20 Richard Kenner
2004-02-03 16:54 ` Jan Hubicka
2004-02-03 16:16 Richard Kenner
2004-02-03 16:23 ` Steven Bosscher
2004-02-03 16:40   ` law
2004-02-03 16:38 ` law
2004-02-03 15:48 Richard Kenner
2004-02-03 16:00 ` Steven Bosscher
2004-02-03 21:27   ` Robert Dewar
2004-02-03 22:16     ` Daniel Berlin
2004-02-03 16:07 ` Paolo Bonzini
2004-02-03 16:35 ` law
2004-02-03 16:48   ` Peter Barada
2004-02-03 17:03     ` Paul Koning
2004-02-03 17:07     ` law
2004-02-03 17:28       ` Daniel Berlin
2004-02-08  6:23       ` Jamie Lokier
2004-02-09 16:25         ` law
2004-02-09 17:53           ` Jamie Lokier
2004-02-09 18:07             ` Daniel Berlin
2004-02-09 18:14               ` Robert Dewar
2004-02-09 18:26                 ` Daniel Berlin
2004-02-10 19:51                   ` Kai Henningsen
2004-02-10 20:31                     ` Daniel Berlin
2004-02-10 20:49                       ` Joern Rennecke
2004-02-11 12:31                       ` Jamie Lokier
2004-02-11 15:42                         ` Daniel Berlin
2004-02-11 15:56                           ` Daniel Berlin
2004-02-09 18:39             ` law
2004-02-09 19:12               ` Robert Dewar
2004-02-11 12:38               ` Jamie Lokier
2004-02-11 15:51                 ` Daniel Berlin
2004-02-03 17:09     ` Daniel Berlin
2004-02-03 17:28       ` Peter Barada
2004-02-03 22:20     ` Robert Dewar
2004-02-07  0:14   ` Kai Henningsen
2004-02-09 16:28     ` law
2004-02-09 16:45       ` Robert Dewar
2004-02-10 19:51       ` Kai Henningsen
2004-02-03 15:40 Paolo Bonzini
2004-02-03 21:21 ` Robert Dewar
2004-02-03 12:20 Richard Kenner
2004-02-03 12:12 Richard Kenner
2004-02-03 16:46 ` Felix Lee
2004-02-03  0:59 Richard Kenner
2004-02-03 12:17 ` Joern Rennecke
2004-02-02 17:19 Robert Dewar
2004-02-02 22:02 ` Joern Rennecke
2004-02-07  0:15   ` Kai Henningsen
2004-02-02 17:16 Robert Dewar
2004-02-02 17:51 ` Jamie Lokier
2004-02-02 19:28   ` Robert Dewar
2004-02-01 12:17 Richard Kenner
2004-01-29 20:50 Richard Kenner
2004-01-29 19:05 Richard Kenner
2004-01-29 20:37 ` Joe Buck
2004-01-29 22:50   ` Andrew Sutton
2004-01-30 17:29   ` Robert Dewar
2004-01-27 20:51 Richard Kenner
2004-01-27 21:55 ` Phil Edwards
2004-01-27 21:59   ` Ian Lance Taylor
2004-01-29 18:33   ` law
2004-01-27 20:29 Richard Kenner
2004-01-27 20:58 ` DJ Delorie
2004-01-27 22:34 ` Tom Tromey
2004-01-27 20:13 Richard Kenner
2004-01-27 20:05 Richard Kenner
2004-01-27 20:11 ` Phil Edwards
2004-01-27 20:17 ` Phil Edwards
2004-01-27 21:08   ` Ian Lance Taylor
2004-01-27 21:37     ` Phil Edwards
2004-01-27 23:22   ` Bernd Schmidt
2004-01-27 19:55 Richard Kenner
2004-01-27 20:39 ` Diego Novillo
2004-01-27 20:40 ` Laurent GUERBY
2004-01-27 19:43 Richard Kenner
2004-01-27 19:28 Richard Kenner
2004-01-27 18:53 Richard Kenner
2004-01-27 19:09 ` Daniel Berlin
2004-01-27 19:13   ` Ian Lance Taylor
2004-01-27 20:08   ` Gabriel Dos Reis
2004-01-27 20:17     ` Daniel Berlin
2004-01-27 20:34       ` Gabriel Dos Reis
2004-01-27 20:42       ` Andrew Sutton
2004-01-27 19:20 ` law
2004-01-27 19:31   ` Phil Edwards
2004-01-27 19:59     ` Ian Lance Taylor
2004-01-27 19:43   ` Ian Lance Taylor
2004-01-29 18:37     ` law
     [not found]   ` <401C3F16.3040706@gnat.com>
2004-02-01  6:19     ` Andrew Sutton
2004-02-01 12:08       ` Robert Dewar
2004-02-01 21:13         ` Jamie Lokier
2004-02-01 23:05           ` Robert Dewar
2004-02-01 23:05           ` Robert Dewar
2004-02-02 15:40             ` Jamie Lokier
2004-02-02 15:56               ` Robert Dewar
2004-02-02 16:59                 ` Joe Buck
2004-02-02 17:10                   ` Jamie Lokier
2004-02-02 17:30                     ` Joe Buck
2004-02-02 18:55                   ` Alexandre E. Kopilovitch
2004-01-27 19:44 ` Joe Buck
2004-01-27 20:17   ` Phil Edwards
2004-01-27 18:19 Richard Kenner
2004-01-27 18:24 ` Diego Novillo
2004-01-27 18:38 ` Andrew Sutton
2004-01-27 18:06 Richard Kenner
2004-01-27 18:03 Richard Kenner
2004-01-27 18:20 ` Joe Buck
2004-01-27 15:48 Richard Kenner
2004-01-27 17:51 ` Joe Buck
2004-01-27 17:57   ` Daniel Berlin
2004-01-27 18:04   ` Diego Novillo
2004-01-27 18:24     ` Zack Weinberg
2004-01-27 18:37       ` Daniel Berlin
2004-01-27 18:58         ` Ian Lance Taylor
2004-01-28 16:42         ` Joern Rennecke
2004-01-31 23:52         ` Robert Dewar
2004-02-01  6:08           ` Andrew Sutton
2004-01-27 22:12   ` Geert Bosch
2004-01-27 15:40 Richard Kenner
2004-01-27 15:48 ` Lars Segerlund
2004-02-01  0:43   ` Robert Dewar
2004-01-30 23:18 ` Mark Mitchell
2004-02-02 11:02   ` Lars Segerlund
2004-02-03  8:09   ` law
2004-02-03 16:44     ` Felix Lee
2004-02-03 22:18       ` Robert Dewar
     [not found]         ` <dewar@gnat.com>
2004-02-04  0:00           ` Felix Lee
