public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
* Re: An issue for the SC: horrible documentation quality of GCC
@ 2003-05-10 15:29 Richard Kenner
  2003-05-10 15:52 ` Zdenek Dvorak
  0 siblings, 1 reply; 56+ messages in thread
From: Richard Kenner @ 2003-05-10 15:29 UTC (permalink / raw)
  To: rakdver; +Cc: gcc

    > Perhaps, but a lot of that complexity is *needed*.  The reason that GCC
    > has historically beaten other compilers in code quality is paying
    > attention to important heuristics and less to "textbook" material.
    > Sure, the latter is also critical, but the pragmatics cannot be ignored.

    the problem with this approach is that you will eventually end up with
    an incomprehensible piece of code where everything depends on everything
    else and you cannot make the simplest change without everything falling
    apart.

Obviously, that's not desirable, but I don't see why it necessarily follows
from taking a pragmatic approach to optimizations.

    GCSE is basically machine independent, so there should not be a need for
    it to take machine-dependent issues into account. These dependencies
    should be isolated in a few later passes that can take care of the
    issues you mention without harming each other.

Here I don't follow you.  *All* of the optimizers are written in a
machine-independent style but take into account machine-dependent
parameterizations.  Are you claiming that somehow this should not be true
for gcse?

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: An issue for the SC: horrible documentation quality of GCC
  2003-05-10 15:29 An issue for the SC: horrible documentation quality of GCC Richard Kenner
@ 2003-05-10 15:52 ` Zdenek Dvorak
  2003-05-10 16:09   ` David Edelsohn
  2003-05-12  9:07   ` Richard Earnshaw
  0 siblings, 2 replies; 56+ messages in thread
From: Zdenek Dvorak @ 2003-05-10 15:52 UTC (permalink / raw)
  To: Richard Kenner; +Cc: gcc

Hello,

>     > Perhaps, but a lot of that complexity is *needed*.  The reason that GCC
>     > has historically beaten other compilers in code quality is paying
>     > attention to important heuristics and less to "textbook" material.
>     > Sure, the latter is also critical, but the pragmatics cannot be ignored.
> 
>     the problem with this approach is that you will eventually end up with
>     an incomprehensible piece of code where everything depends on everything
>     else and you cannot make the simplest change without everything falling
>     apart.
> 
> Obviously, that's not desirable, but I don't see why it necessarily follows
> from taking a pragmatic approach to optimizations.

just from practice -- look where this "pragmatic approach" has brought
gcc. Yes, we have a mighty and cool "CSE" pass; but after 2 years of working
on gcc, I would not dare to change a single line there because I don't
know what would happen.

>     GCSE is basically machine independent, so there should not be a need for
>     it to take machine-dependent issues into account. These dependencies
>     should be isolated in a few later passes that can take care of the
>     issues you mention without harming each other.
> 
> Here I don't follow you.  *All* of the optimizers are written in a
> machine-independent style but take into account machine-dependent
> parameterizations.  Are you claiming that somehow this should not be true
> for gcse?

no -- there are optimizations that are obviously machine-dependent (like
scheduling, register allocation, peephole, the combiner and a whole lot of
others). Then there are optimizations that are based on some general
machine-non-specific idea (like that computing one expression several
times is wrong, or that replacing a variable with its constant value is
good -- CSE, GCSE, copy/const propagation, part of the loop
optimizations and some others fall into this category). The sane way
to write a compiler is to run these generic passes first, then
run machine-dependent passes that may eventually fix the mistakes
the previous optimizations have made (due to their generic ideas
not being right in some corner cases). This way we would get
approximately the same results, but in a much more transparent way.
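
Schematically (just a sketch of the intended structure, not the actual
pass order):

  generic:           CSE -> GCSE -> copy/const propagation -> loop opts
  machine-dependent: scheduling -> register allocation -> peephole ->
                     cleanup/fixup passes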

Zdenek

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: An issue for the SC: horrible documentation quality of GCC
  2003-05-10 15:52 ` Zdenek Dvorak
@ 2003-05-10 16:09   ` David Edelsohn
  2003-05-10 16:17     ` Zdenek Dvorak
  2003-05-10 16:57     ` Michael S. Zick
  2003-05-12  9:07   ` Richard Earnshaw
  1 sibling, 2 replies; 56+ messages in thread
From: David Edelsohn @ 2003-05-10 16:09 UTC (permalink / raw)
  To: Zdenek Dvorak; +Cc: Richard Kenner, gcc

>>>>> Zdenek Dvorak writes:

Zdenek> The sane way
Zdenek> to write a compiler is to run these generic passes first, then
Zdenek> run machine-dependent passes that may eventually fix the mistakes
Zdenek> the previous optimizations have made (due to their generic ideas
Zdenek> not being right in some corner cases). This way we would get
Zdenek> approximately the same results, but in a much more transparent way.

	This assumes that the mistakes can be corrected with a later
machine-dependent pass, which is not always correct.  Plus it takes more
compile time to correct the mistakes than to choose the correct
heuristics in the first place.  The generic pass already needs to apply
*some* heuristic.

David

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: An issue for the SC: horrible documentation quality of GCC
  2003-05-10 16:09   ` David Edelsohn
@ 2003-05-10 16:17     ` Zdenek Dvorak
  2003-05-10 16:57     ` Michael S. Zick
  1 sibling, 0 replies; 56+ messages in thread
From: Zdenek Dvorak @ 2003-05-10 16:17 UTC (permalink / raw)
  To: David Edelsohn; +Cc: Richard Kenner, gcc

Hello,

> Zdenek> The sane way
> Zdenek> to write a compiler is to run these generic passes first, then
> Zdenek> run machine-dependent passes that may eventually fix the mistakes
> Zdenek> the previous optimizations have made (due to their generic ideas
> Zdenek> not being right in some corner cases). This way we would get
> Zdenek> approximately the same results, but in a much more transparent way.
> 
> 	This assumes that the mistakes can be corrected with a later
> machine-dependent pass, which is not always correct.  Plus it takes more
> compile time to correct the mistakes than to choose the correct
> heuristics in the first place.

I guess that for every such heuristic you can, with some care, find an
example where it does something stupid. In general it makes more sense to
have a few passes with really good heuristics than to have some lame
heuristic in every pass.

As for the speed -- I guess this is just a matter of opinion, as there
is nothing to compare. My opinion is that the increased speed of most of the
passes would make up for the decreased speed of a few cleanup passes
(they must be present in some form anyway; they already have to gather all
the data, so the slowdown should not be that dramatic).

> The generic pass already needs to apply *some* heuristic.

this is of course true; but it should target the most obvious ones and
not try to solve every corner case in every optimization.

Zdenek

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: An issue for the SC: horrible documentation quality of GCC
  2003-05-10 16:09   ` David Edelsohn
  2003-05-10 16:17     ` Zdenek Dvorak
@ 2003-05-10 16:57     ` Michael S. Zick
  1 sibling, 0 replies; 56+ messages in thread
From: Michael S. Zick @ 2003-05-10 16:57 UTC (permalink / raw)
  To: David Edelsohn, Zdenek Dvorak; +Cc: Richard Kenner, gcc

On Saturday 10 May 2003 11:07 am, David Edelsohn wrote:
> >>>>> Zdenek Dvorak writes:
>
> Zdenek> The sane way
> Zdenek> to write a compiler is to run these generic passes first, then
> Zdenek> run machine-dependent passes that may eventually fix the mistakes
> Zdenek> the previous optimizations have made (due to their generic ideas
> Zdenek> not being right in some corner cases). This way we would get
> Zdenek> approximately the same results, but in a much more transparent way.
>
> 	This assumes that the mistakes can be corrected with a later
> machine-dependent pass, which is not always correct.  Plus it takes more
> compile time to correct the mistakes than to choose the correct
> heuristics in the first place.  The generic pass already needs to apply
> *some* heuristic.
>
> David
In support of David's conclusions, consider two simplified, extreme cases.

Take a constant value which does not fit within an instruction field of an
instruction that is otherwise required.

Choosing where that constant value will be held during its lifetime is
clearly a hardware-dependent decision.

On a 1024-register machine, you would probably choose to keep it
in a register.

On a 3-register machine, keeping it in a register is probably not
a good choice.  The hardware designer has probably made provisions
to handle this situation (a fast base-page pool, fast stack ops, something).
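
As a hypothetical sketch of that kind of parameterized choice (all names
invented):

  /* Decide where a non-encodable constant lives, driven by a target
     parameter instead of a hard-wired assumption.  */
  if (target_num_registers () >= BIG_REGISTER_FILE)
    keep_constant_in_register (cst);
  else
    use_target_constant_pool (cst);  /* base page, stack, ... */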

The initial "generic", "hardware independent" code which does the bulk of
the transformations from source language representations to machine
representations should not be blindly generating any specific choice.

Doing so requires that someone envision, write, and maintain code that
later "undoes" those decisions.  Code which the compiler must execute.

All of which is an argument for the existing "hardware independent" 
transformations WITH consideration of "hardware parameterizations".
(Note the word: "BLINDLY" two sentences above.)

My point here is that it would be better to strengthen and broaden the
usage of the "hardware parameterizations" in the "generic, hardware
independent" transformations than to rely on later "hardware dependent"
passes (which must be separately envisioned, written, maintained forever
and executed by the compiler) to "clean up" any "bad decisions".

Even to the extent that doing so might have a negative effect on the
generic pass's performance.  Everything the compiler "gets right" at
this stage eliminates mucho (technical term there) later code.

And, in returning to the subject of this thread, it would be nice if each
function had at least a "statement of intent" at the point of definition.

Mike

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: An issue for the SC: horrible documentation quality of GCC
  2003-05-10 15:52 ` Zdenek Dvorak
  2003-05-10 16:09   ` David Edelsohn
@ 2003-05-12  9:07   ` Richard Earnshaw
  1 sibling, 0 replies; 56+ messages in thread
From: Richard Earnshaw @ 2003-05-12  9:07 UTC (permalink / raw)
  To: Zdenek Dvorak; +Cc: Richard Kenner, gcc, Richard.Earnshaw


> no -- there are optimizations that are obviously machine-dependent (like
> scheduling, register allocation, peephole, the combiner and a whole lot of
> others). Then there are optimizations that are based on some general
> machine-non-specific idea (like that computing one expression several
> times is wrong, or that replacing a variable with its constant value is
> good -- CSE, GCSE, copy/const propagation, part of the loop
> optimizations and some others fall into this category). The sane way
> to write a compiler is to run these generic passes first, then
> run machine-dependent passes that may eventually fix the mistakes
> the previous optimizations have made (due to their generic ideas
> not being right in some corner cases). This way we would get
> approximately the same results, but in a much more transparent way.

I don't believe you can generalize like this.  On a very fast machine with 
a small number of registers it *can* be more efficient to recalculate an 
expression than to keep a copy of a previously evaluated version.

On many ARM implementations a shift by a constant is free in an arithmetic 
operation, so cse'ing

  int *b, *c, d;

  b += d;
  c += d;

which, in RTL, is

  (set b (plus (mult d 4) b))
  (set c (plus (mult d 4) c))

into 

  (set t1 (mult d 4))
  (set b (plus t1 b))
  (set c (plus t1 c))

is a de-optimization (takes 3 cycles instead of 2).
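
Concretely, with a hypothetical register assignment, that is the
difference between

	add	r0, r0, r2, lsl #2
	add	r1, r1, r2, lsl #2

and

	mov	ip, r2, lsl #2
	add	r0, r0, ip
	add	r1, r1, ip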

R.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: An issue for the SC: horrible documentation quality of GCC
  2003-05-11 19:58 Richard Kenner
@ 2003-05-13  5:59 ` Michael S. Zick
  0 siblings, 0 replies; 56+ messages in thread
From: Michael S. Zick @ 2003-05-13  5:59 UTC (permalink / raw)
  To: Richard Kenner, kaih; +Cc: gcc

> On Sunday 11 May 2003 03:03 pm, Richard Kenner wrote:
>> On Sunday 11 May 2003 12:29 pm, Kai Henningsen wrote:
>>     As so often, there is an answer that is perfectly obvious, but
>>     not helpful in practice.
Unfortunately, the question is not perfectly obvious.

>>
>>     Have a cost function that determines how good the optimization of a
>>     program is.
For the phrase "cost function" taken to mean a general concept, agreed.

>>
>>     Run all optimizations in some predetermined sequence.
>>
>>     Compare the cost function of the output to the one of the input.
>>
>>     As long as there was some improvement, do it again.
Better would be a single "costing function principle" computed "on the
fly" and refined at each stage of optimization.
I.e., a single-pass, stepwise-refined "costing function principle" rather
than an iteration.

>>
>>     Then use the best result.
>>
>>     Compile times? What do you mean, compile times?
>>
> Even aside from compile times, since some optimizations are performed
> before register allocation and some after, there's no such iteration
> possible.
Ah... but there could be a non-iterative "costing function" of the optimized
transformations.  One which begins in the non-hardware-dependent, generic
transformations and converges to code suitable for a specific processor
within the hardware-dependent transformations.

Following the long-standing CS tradition of deceptive clarity, I will use a
well-established, widely used term to mean something entirely different.

"Address Space"

The highest level (least refined) concept of "Address Space" would be 
where program logic transforms are performed for an abstract machine
with a (virtually) unlimited register set.
Those "Address Spaces" could be numbered:
0 - The address space with the property of "use in place".
1 - The address space with the property of "fetch to use".

"Address Space" in action:

When a specific constant is encountered, not having any hardware
knowledge, it must be presumed to reside in Address Space 1 - fetch
to use.
So assign it to a pseudo register.  Now that constant is in Address 
Space 0.
That movement from Address Space 1 to Address Space 0 incurs
a cost.  That cost may be represented by a small integer value.

Unfortunately, costing algorithms often stop with accumulating that
sense of "cost".

Instead, record that cost with the pseudo register - this small integer
value now represents the "investment" made in having this constant
in Address Space 0.

Each additional time this specific constant is encountered, it is found
to already be in Address Space 0.  Not having any hardware
knowledge, presume that pseudo register is the source of the constant.

The use of an Address Space 0 pseudo register has a cost.  That cost
may be represented by a small integer value.  That value should be
numerically less than the Address Space 1 cost.

Each of those additional times this specific constant is encountered,
record the DIFFERENCE of the two costs.  Literally, the interest
earned on the investment of moving the constant into 
Address Space 0.

So far, our new pseudo register has three fields associated with
it:  1) The address space it occupies.  2) The cumulative cost it has
acquired during its address space movements.  3) The cumulative
interest earned.

Now when an optimization decision has to be made, it can be
based on the "rate of return".  NOT on the cost (in instructions,
cycles, whatever - we don't know that yet.)  NOR on the interest
(which, at this point, is equivalent to frequency of re-use).
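
A minimal sketch in C of the record I have in mind (all names invented):

  struct pseudo_cost
  {
    int addr_space;   /* which "Address Space" the pseudo occupies */
    int investment;   /* cumulative cost of its address space movements */
    int interest;     /* cumulative difference earned by re-uses */
  };

  /* Scaled to avoid fractions; decisions compare these values, not raw
     costs or raw re-use counts.  */
  static int
  rate_of_return (const struct pseudo_cost *p)
  {
    return p->investment > 0 ? (100 * p->interest) / p->investment : 0;
  }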

So far, not very interesting with only a two-address-space costing
function.  Also, the results would probably not be much different
than not bothering to do any of this in the first place.

Now the interesting part...

Question: "Who said our abstract machine only has two 
address spaces?"
Not I.

Question: "Who said that an abstract machine with a (virtually)
unlimited set of registers doesn't have a limit on the number
of registers that may occupy a single address space?"
Not I.

At this generic level, there would be an Address Space roughly
corresponding to each of what the hardware people call an
"addressing mode".
While there is great variation in hardware instructions, there
are relatively few addressing modes.

Which brings me around to what I called "hardware dependent
parameterization of the generic, hardware independent
optimizations" mentioned in:
Ref: <http://gcc.gnu.org/ml/gcc/2003-05/msg01009.html>

There would be a small table for each supported processor
that includes the cost and pseudo-register limit for each
Address Space.

Such a table under this scheme should be adequate to keep
the early optimizations from generating something that the
hardware-dependent backend might have to totally rewrite.
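
A sketch of what such a table might look like (values invented):

  struct addr_space_param
  {
    int cost;       /* cost of a use in this Address Space */
    int reg_limit;  /* pseudos it may hold at once, -1 = unlimited */
  };

  static const struct addr_space_param example_target_spaces[] =
  {
    { 1, 14 },  /* space 0: use in place (registers) */
    { 3, -1 },  /* space 1: fetch to use (memory)    */
  };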

Note that these "Address Spaces", by the time they get to
the hardware code generation, would encompass the existing
"Register Mode Classes".  
One might want to turn this description inside-out by calling 
my "Address Spaces" instead "Pseudo Register Modes".

Note that this "costing function principle" can be carried
through (and used in) the entire compilation process.

For instance: what choice do you make when you have a
limited resource to spend (either money or hard registers)?

Easy (and, as Kai said above, "obvious"): you use that
resource on the investment that has the highest rate of return.

Similarly for a decision of what to carry in a register, spill to
the stack, or re-compute.
Or what to hoist out of a loop or when a pointer increment
can stay in a loop because the hardware has auto-increment
registers.

It is a matter of asking the right question, not the obviousness
of the answer.

Mike

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: An issue for the SC: horrible documentation quality of GCC
  2003-05-12 17:39 ` law
@ 2003-05-12 20:15   ` Richard Henderson
  0 siblings, 0 replies; 56+ messages in thread
From: Richard Henderson @ 2003-05-12 20:15 UTC (permalink / raw)
  To: law; +Cc: Richard Kenner, jh, gcc

On Mon, May 12, 2003 at 11:36:08AM -0600, law@redhat.com wrote:
> I believe Richard Henderson outlined those in a message to
> the GCC lists sometime in the last couple of months.

http://gcc.gnu.org/ml/gcc/2003-02/msg00763.html


r~

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: An issue for the SC: horrible documentation quality of GCC
  2003-05-10  2:21 Richard Kenner
  2003-05-10 15:12 ` Zdenek Dvorak
@ 2003-05-12 17:39 ` law
  2003-05-12 20:15   ` Richard Henderson
  1 sibling, 1 reply; 56+ messages in thread
From: law @ 2003-05-12 17:39 UTC (permalink / raw)
  To: Richard Kenner; +Cc: jh, gcc

In message <10305100225.AA20837@vlsi1.ultra.nyu.edu>, Richard Kenner writes:
 >    Then you will end up with never-ending dataflow problems in the global
 >    pass.  GCSE in its simplest definition is already too complicated
 >    for us to implement correctly (we have been at it since '97 and still it
 >    is inferior to what is done by other compilers).  Adding more
 >    complexity, like CSE has, would make it even crazier.
 >
 >Perhaps, but a lot of that complexity is *needed*.  The reason that GCC
 >has historically beaten other compilers in code quality is paying attention 
 >to important heuristics and less to "textbook" material.  Sure, the latter
 >is also critical, but the pragmatics cannot be ignored.
Note, I would disagree with the statement "GCC has historically beaten
other compilers in code quality".

In my experience of looking at GCC against compilers from multiple vendors,
including HP, IBM, Intel, Sun and others, GCC has consistently performed
slightly worse for integer codes (assuming you didn't turn on any
inter-procedural optimizations in the competing compiler).  The difference
in performance is small, but measurable (typically around 5%).  Clearly in
some cases GCC does better and in some cases the competing compiler does
better, but the overall picture I've seen has GCC performing slightly worse.

In those same comparisons, but using FP intensive code, GCC's performance
is significantly worse -- on the order of 10-15% worse.

When I've analyzed the codes in question, the inability to spot and remove
higher-level redundancies is one of the major issues, along with register
allocation and good scheduling.


 >    CSE does a lot and is a powerful pass that does approximately what is
 >    accomplished by multiple passes as described in the literature.  The
 >    problem is that we don't know how to make it global and/or faster and
 >    we know that it is too slow right now.
 >
 >A lot of the slowness is due to the "global" parts of its activities.  I've
 >suggested a number of times that they be turned off since they would seem
 >redundant with GCSE.  But when this is tried, it lowers performance, which
 >certainly suggests that the "complexity" of CSE is worth something.
The global nature of CSE is not redundant with GCSE.  It never has been.

Roger's recent changes (jump bypassing, I believe) got us closer, but
there are still a couple of important cases that CSE handles that GCSE
does not.  I believe Richard Henderson outlined those in a message to
the GCC lists sometime in the last couple of months.


Jeff


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: An issue for the SC: horrible documentation quality of GCC
@ 2003-05-12 16:25 Richard Kenner
  0 siblings, 0 replies; 56+ messages in thread
From: Richard Kenner @ 2003-05-12 16:25 UTC (permalink / raw)
  To: matz; +Cc: gcc

    > For this and other reasons I've come to the conclusion that we need to
    > separate our optimizations into generic and target dependent
    > optimizations. And that conceptually the generic optimizations should
    > strive to eliminate as much redundancy as possible, fully propagate
    > constants/copies, etc and leave it to the expanders and target
    > dependent optimizers to "undo" unprofitable redundancy elimination,
    > cprop, etc.

    Amen.

And I continue to strongly disagree with the above.  The key to GCC's
success over the years has precisely been the knowledge of the
underlying machine characteristics within the optimizers.  I certainly
agree that there are some high-level optimizations that can be
performed in a machine-independent manner, and these are the ones that
are being done on trees.

But I think that once we're at the state of optimizing RTL, all optimizers
need to know a lot about the machine they are optimizing for.  Moreover,
I believe that the RTL optimizers will always have to duplicate some
optimizations done at the tree level: for example, address calculations can
produce common subexpressions (often global) and loop invariants.
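
For example (a sketch, not exact RTL):

  a[i] = 0; b[i] = 0;

shares nothing visible at the tree level, but once the addresses are
expanded, both stores contain the scaled index as a common subexpression:

  (set (reg t) (mult (reg i) (const_int 4)))
  (set (mem (plus (reg a) (reg t))) (const_int 0))
  (set (mem (plus (reg b) (reg t))) (const_int 0))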

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: An issue for the SC: horrible documentation quality of GCC
  2003-05-12 16:05   ` law
@ 2003-05-12 16:20     ` Michael Matz
  0 siblings, 0 replies; 56+ messages in thread
From: Michael Matz @ 2003-05-12 16:20 UTC (permalink / raw)
  To: law; +Cc: Zdenek Dvorak, Richard Kenner, jh, gcc

Hi,

On Mon, 12 May 2003 law@redhat.com wrote:

> For this and other reasons I've come to the conclusion that we need to
> separate our optimizations into generic and target dependent optimizations.
> And that conceptually the generic optimizations should strive to eliminate
> as much redundancy as possible, fully propagate constants/copies, etc and
> leave it to the expanders and target dependent optimizers to "undo"
> unprofitable redundancy elimination, cprop, etc.

Amen.


Ciao,
Michael.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: An issue for the SC: horrible documentation quality of GCC
  2003-05-10 15:12 ` Zdenek Dvorak
@ 2003-05-12 16:05   ` law
  2003-05-12 16:20     ` Michael Matz
  0 siblings, 1 reply; 56+ messages in thread
From: law @ 2003-05-12 16:05 UTC (permalink / raw)
  To: Zdenek Dvorak; +Cc: Richard Kenner, jh, gcc

In message <20030510151224.GA31523@atrey.karlin.mff.cuni.cz>, Zdenek Dvorak writes:
 >the problem with this approach is that you will eventually end up with
 >an incomprehensible piece of code where everything depends on everything
 >else and you cannot make the simplest change without everything falling
 >apart.
Actually, I would claim we are already in this state, largely because
we've parameterized every optimization pass based on the target's
characteristics.  As a result, it has become nearly impossible to
do work on our optimizers and have any confidence that the work will
be at least performance neutral on targets which do not closely 
match the developer's target machine.

Worse yet, the highly target dependent nature of our optimizers often
results in missed optimization opportunities.  For example, consider a
typical RISC target which has limitations on what (if any) constants can
be used in a conditional jump.  I've seen many cases where global constant
propagation can prove that both arguments to the conditional are constants,
but due to the limitations of the target architecture we can't do the
replacements which would ultimately result in the realization that the
condition is a compile-time constant.

For this and other reasons I've come to the conclusion that we need to 
separate our optimizations into generic and target dependent optimizations.
And that conceptually the generic optimizations should strive to eliminate
as much redundancy as possible, fully propagate constants/copies, etc and
leave it to the expanders and target dependent optimizers to "undo" 
unprofitable redundancy elimination, cprop, etc.

 >GCSE is basically machine independent, so there should not be a need for
 >it to take machine-dependent issues into account.
Right.  In fact, I've never been happy about some of the machine dependencies
we've introduced into GCSE.  For example, why in the hell should GCSE care
whether or not a particular port has properly defined its reload_[inout]_ccmode
patterns?  Well, gcse does care!  It may not be obvious, but
AVOID_CCMODE_COPIES was introduced to deal with this problem.  That's
unbelievably silly.

It's probably worth keeping in mind that GCSE does have one large 
target dependency -- namely that each and every substitution made by GCSE
must result in a valid insn.  This is a huge target dependency -- and a
legitimate target dependency given where GCSE sits in the optimization path.
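
Roughly, as a sketch of the constraint (a fragment; not the actual
gcse.c code):

  /* Every substitution must still yield a recognizable insn;
     validate_change cancels the change if the target rejects it.  */
  if (! validate_change (insn, loc, new_rtx, 0))
    ;  /* keep the old expression */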

Jeff

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: An issue for the SC: horrible documentation quality of GCC
@ 2003-05-12 11:37 Robert Dewar
  0 siblings, 0 replies; 56+ messages in thread
From: Robert Dewar @ 2003-05-12 11:37 UTC (permalink / raw)
  To: kenner, matz; +Cc: gcc

> Well, at the top of gcse.c it says that one of the passes in GCSE is
> copy/constant propagation.  This is a standard optimization, and hence the
> actual description of any algorithm (on the high level) seems superfluous.
> Tricks used in the implementation should of course be documented, but have
> no place in any high-level overview comments.

copy/constant propagation is really quite vague.  It can refer to various
algorithms and intended optimizations. I think it is quite important to
define terms like this. Often people have taken a compiler course, or
read some particular book, or set of papers, and they assume that terminology
is standard. That's far from true in the compiler field in general, and in
optimization in particular, so it is quite important to define terms like
this clearly.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: An issue for the SC: horrible documentation quality of GCC
@ 2003-05-12 11:36 Robert Dewar
  0 siblings, 0 replies; 56+ messages in thread
From: Robert Dewar @ 2003-05-12 11:36 UTC (permalink / raw)
  To: gcc, kenner

I definitely agree with Richard that it is very important for all interfaces
to be completely documented at the time the code is put in (indeed, good
practice is to write the documentation at the time the code is written!).

Certainly, at a minimum, any new functions should be fully documented,
including the exact semantics of each argument.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: An issue for the SC: horrible documentation quality of GCC
@ 2003-05-12 11:16 Richard Kenner
  0 siblings, 0 replies; 56+ messages in thread
From: Richard Kenner @ 2003-05-12 11:16 UTC (permalink / raw)
  To: Richard.Earnshaw; +Cc: gcc

    Which, in RTL is

      (set b (plus (mult d 4) b))
      (set c (plus (mult d 4) c))

    into 

      (set t1 (mult d 4))
      (set b (plus t1 b))
      (set c (plus t1 c))

    is a de-optimization (takes 3 cycles instead of 2).

It's long been on my list to try to see if I can find some way to turn the
latter back into the former.  The problem is that combine won't do it 
because it doesn't know it can kill T1.  But I've never come up with
any way that does it.  I think this sort of thing is very common,
especially with addressing modes on CISC machines.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: An issue for the SC: horrible documentation quality of GCC
@ 2003-05-11 19:58 Richard Kenner
  2003-05-13  5:59 ` Michael S. Zick
  0 siblings, 1 reply; 56+ messages in thread
From: Richard Kenner @ 2003-05-11 19:58 UTC (permalink / raw)
  To: kaih; +Cc: gcc

    As so often, there is an answer that is perfectly obvious, but
    not helpful in practice.

    Have a cost function that determines how good the optimization of a  
    program is.

    Run all optimizations in some predetermined sequence.

    Compare the cost function of the output to the one of the input.

    As long as there was some improvement, do it again.

    Then use the best result.

    Compile times? What do you mean, compile times?

Even aside from compile times, since some optimizations are performed
before register allocation and some after, there's no such iteration possible.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: An issue for the SC: horrible documentation quality of GCC
  2003-05-10 16:49 Richard Kenner
@ 2003-05-11 19:42 ` Kai Henningsen
  0 siblings, 0 replies; 56+ messages in thread
From: Kai Henningsen @ 2003-05-11 19:42 UTC (permalink / raw)
  To: gcc

kenner@vlsi1.ultra.nyu.edu (Richard Kenner)  wrote on 10.05.03 in <10305101653.AA23590@vlsi1.ultra.nyu.edu>:

> The issue of what order to run optimizations is very tricky and I don't
> think can be answered in that sort of general way.

As so often, there is an answer that is perfectly obvious, but
not helpful in practice.

Have a cost function that determines how good the optimization of a  
program is.

Run all optimizations in some predetermined sequence.

Compare the cost function of the output to the one of the input.

As long as there was some improvement, do it again.

Then use the best result.
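
In pseudo-C (all names hypothetical), that loop is just:

  cost = cost_fn (program);
  do
    {
      old_cost = cost;
      run_all_passes (program);
      cost = cost_fn (program);
    }
  while (cost < old_cost);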

Compile times? What do you mean, compile times?

MfG Kai

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: An issue for the SC: horrible documentation quality of GCC
@ 2003-05-10 18:08 Richard Kenner
  0 siblings, 0 replies; 56+ messages in thread
From: Richard Kenner @ 2003-05-10 18:08 UTC (permalink / raw)
  To: rakdver; +Cc: gcc

    good luck. If you want them to be really good, they must all work globally.
    They must also take into account what any other later optimization could
    do. This is not feasible.

It isn't feasible to do it perfectly, of course not, but if we
properly specify what later optimizations do, it's not that unreasonable
to get something that makes the right decision the vast majority of the time.

    Why should it not? You must gather the data on which you base your
    decision, which definitely is not fast if you want to do it everywhere.

Usually that data is fairly cheap to get, such as a cost function.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: An issue for the SC: horrible documentation quality of GCC
  2003-05-10 17:04 Richard Kenner
@ 2003-05-10 17:18 ` Zdenek Dvorak
  0 siblings, 0 replies; 56+ messages in thread
From: Zdenek Dvorak @ 2003-05-10 17:18 UTC (permalink / raw)
  To: Richard Kenner; +Cc: gcc

Hello,

>     I guess that for every such heuristic you can, with some care, find an
>     example where it does something stupid. In general it makes more sense to
>     have a few passes with really good heuristics than to have some lame
>     heuristic in every pass.
> 
> No, all should be good!

good luck. If you want them to be really good, they must all work globally.
They must also take into account what any other later optimization could
do. This is not feasible.

> The strength of GCC is that it does optimization
> everywhere.  We should not sacrifice that. 

The strength of gcc is that it has several fairly good passes, including
the "CSE". Unfortunately it is also a great weakness, as in order to
make them produce such good code we have sacrificed other important
criteria of what "good" means.

>     As for the speed -- I guess this is just a matter of opinion, as there
>     is nothing to compare. My opinion is that the increased speed of most
>     of the passes would make up for the decreased speed of a few cleanup
>     passes (they must be present in some form anyway; they already have
>     to gather all the data, so the slowdown should not be that dramatic).
> 
> Increased over what?  Why does a heuristic slow things down?

Why should it not? You must gather the data on which you base your
decision, which definitely is not fast if you want to do it everywhere.

Zdenek

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: An issue for the SC: horrible documentation quality of GCC
@ 2003-05-10 17:04 Richard Kenner
  2003-05-10 17:18 ` Zdenek Dvorak
  0 siblings, 1 reply; 56+ messages in thread
From: Richard Kenner @ 2003-05-10 17:04 UTC (permalink / raw)
  To: rakdver; +Cc: gcc

    I guess that for every such heuristic you can, with some care, find an
    example where it does something stupid. In general it makes more sense to
    have a few passes with really good heuristics than to have some lame
    heuristic in every pass.

No, all should be good!  The strength of GCC is that it does optimization
everywhere.  We should not sacrifice that. 

    As for the speed -- I guess this is just a matter of opinion, as there
    is nothing to compare. My opinion is that the increased speed of most
    of the passes would make up for the decreased speed of a few cleanup
    passes (they must be present in some form anyway; they already have
    to gather all the data, so the slowdown should not be that dramatic).

Increased over what?  Why does a heuristic slow things down?

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: An issue for the SC: horrible documentation quality of GCC
@ 2003-05-10 16:49 Richard Kenner
  2003-05-11 19:42 ` Kai Henningsen
  0 siblings, 1 reply; 56+ messages in thread
From: Richard Kenner @ 2003-05-10 16:49 UTC (permalink / raw)
  To: rakdver; +Cc: gcc

    just from practice -- look where this "pragmatic approach" has brought
    gcc. Yes, we have a mighty and cool "CSE" pass; but after 2 years of
    working on gcc, I would not dare to change a single line there because I
    don't know what would happen.

That's just lack of familiarity.  I fully understand cse.c but am afraid to
change any line of gcse.c!

    no -- there are optimizations that are obviously machine-dependent
    (like scheduling, register allocation, peephole, the combiner and a
    whole lot of others). Then there are optimizations that are based on
    some general machine-non-specific idea (like that computing one
    expression several times is wrong, or that replacing a variable with
    its constant value is good -- CSE, GCSE, copy/const propagation, part
    of the loop optimizations and some others fall into this category).

I don't understand this distinction at all.  The combiner, for example, acts
by always decreasing the number of insns.  That's a very machine-independent
criterion and optimization.  All of the folding done by it and by CSE is
machine-independent.

Replacing a variable with its constant value may or may not be an
"optimization", but whether it is or not is a machine-dependent choice, so I
don't see why you consider that among the machine-independent parts.

CSE can be viewed as very machine-dependent because the choice of which
expression is best for a given insn is dependent on lots of machine-specific
parameters.

And even "computing one expression several times is wrong" may not be correct
for simple enough expressions when register pressure is an issue.

So I really don't see *any* optimization that, done properly, is
machine-independent.  The key to GCC has been to use machine-specific
information to guide all optimizations.

    The sane way to write a compiler is to run these generic passes
    first, then run machine-dependent passes that may eventually fix the
    mistakes the previous optimizations have made (due to their generic
    ideas not being right in some corner cases). This way we would get
    approximately the same results, but in a much more transparent way.

No, I disagree.  Some machine-dependent optimization might expose more
common expressions or loop invariants.

The issue of what order to run optimizations is very tricky and I don't
think can be answered in that sort of general way.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: An issue for the SC: horrible documentation quality of GCC
  2003-05-10  2:21 Richard Kenner
@ 2003-05-10 15:12 ` Zdenek Dvorak
  2003-05-12 16:05   ` law
  2003-05-12 17:39 ` law
  1 sibling, 1 reply; 56+ messages in thread
From: Zdenek Dvorak @ 2003-05-10 15:12 UTC (permalink / raw)
  To: Richard Kenner; +Cc: jh, gcc

Hello,

>     Then you will end up with never-ending dataflow problems in the global
>     pass.  GCSE in its simplest definition is already too complicated
>     for us to implement correctly (we have been at it since '97 and still it
>     is inferior to what is done by other compilers).  Adding more
>     complexity, like CSE has, would make it even crazier.
> 
> Perhaps, but a lot of that complexity is *needed*.  The reason that GCC
> has historically beaten other compilers in code quality is paying attention 
> to important heuristics and less to "textbook" material.  Sure, the latter
> is also critical, but the pragmatics cannot be ignored.

the problem with this approach is that you will eventually end up with
an incomprehensible piece of code where everything depends on everything
else and you cannot make the simplest change without everything falling
apart.

GCSE is basically machine independent, so there should not be a need for
it to take machine-dependent issues into account. These dependencies
should be isolated in a few later passes that can take care of the
issues you mention without harming each other.

Zdenek

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: An issue for the SC: horrible documentation quality of GCC
  2003-05-09 22:26 Richard Kenner
  2003-05-09 22:44 ` Jan Hubicka
  2003-05-09 22:48 ` Joe Buck
@ 2003-05-10 12:07 ` Richard Earnshaw
  2 siblings, 0 replies; 56+ messages in thread
From: Richard Earnshaw @ 2003-05-10 12:07 UTC (permalink / raw)
  To: Richard Kenner; +Cc: jh, gcc, Richard.Earnshaw


> 
> No, it's a dead machine.  I was just giving that as an example of the sort
> of problems you can get into by blindly replacing registers by constants.
> 

Actually, there's a case in reload_cse where exactly the opposite is done: 
a constant is blindly replaced by a register with no regard to the cost.  
On ARM this can be detrimental when the constant is a shift specifier, so

	mov	r0, #5
	mov	r1, r2, lsr r0

is at least a cycle slower than

	mov	r0, #5
	mov	r1, r2, lsr #5

R.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: An issue for the SC: horrible documentation quality of GCC
@ 2003-05-10  2:21 Richard Kenner
  2003-05-10 15:12 ` Zdenek Dvorak
  2003-05-12 17:39 ` law
  0 siblings, 2 replies; 56+ messages in thread
From: Richard Kenner @ 2003-05-10  2:21 UTC (permalink / raw)
  To: jh; +Cc: gcc

    Then you will end up with never-ending dataflow problems in the global
    pass.  GCSE in its simplest definition is already too complicated
    for us to implement correctly (we have been at it since '97 and still it
    is inferior to what is done by other compilers).  Adding more
    complexity, like CSE has, would make it even crazier.

Perhaps, but a lot of that complexity is *needed*.  The reason that GCC
has historically beaten other compilers in code quality is paying attention 
to important heuristics and less to "textbook" material.  Sure, the latter
is also critical, but the pragmatics cannot be ignored.

    CSE does a lot and is a powerful pass that does approximately what is
    accomplished by multiple passes as described in the literature.  The
    problem is that we don't know how to make it global and/or faster and
    we know that it is too slow right now.

A lot of the slowness is due to the "global" parts of its activities.  I've
suggested a number of times that they be turned off since they would seem
redundant with GCSE.  But when this is tried, it lowers performance, which
certainly suggests that the "complexity" of CSE is worth something.

    I am still in the dark.  Perhaps it is time to go to sleep first and then
    continue arguing.  Could you give me some example of the code you
    would like to see transformed?

It's code that I *don't* want to see transformed.  For example, if I call
a group of functions and pass each a constant of 100 on a RISC machine where
the first operand is passed in RX, I expect to see a series of insns, each
setting RX to 100.  I don't want to see one insn setting a pseudo to 100
and then insns setting RX to that.
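
That is, I expect something like

  (set (reg RX) (const_int 100))   ;; repeated before each call

rather than (pseudo number invented)

  (set (reg 200) (const_int 100))
  (set (reg RX) (reg 200))         ;; before each call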

    >     I mean an extended basic block like:
    > 
    >       A
    >      / \
    >     B   C
    > 
    > That's not what cse.c means by an extended basic block.

    CSE will actually follow the jumps, so it should track it.

Yes, but that's not the case I'm talking about: I'm talking about the
extended basic block case.

It is just pure if-then-else (I didn't draw the join edge)

    How? Assuming that B has some other instructions, I will end up with
    both the copy and the branch.  Or am I missing something else?

I think we're talking about

	if (condition)
	   A
	else
	   B

This has two jumps.

If I can make A into a NOP, this becomes

	if (!condition)
           B

which has one jump.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: An issue for the SC: horrible documentation quality of GCC
  2003-05-09 23:13 Richard Kenner
@ 2003-05-09 23:23 ` Jan Hubicka
  0 siblings, 0 replies; 56+ messages in thread
From: Jan Hubicka @ 2003-05-09 23:23 UTC (permalink / raw)
  To: Richard Kenner; +Cc: jh, gcc

>     They are tied in the allocator; the point is to tie them earlier, so
>     the expression
>     (plus (reg a)  (reg c))
>     will look equivalent to
>     (plus (reg a)  (reg a))
> 
> You don't need to do a replacement: you just record in a table, like
> cse.c does, what all the equivalences are.

Then you will end up with never-ending dataflow problems in the global
pass.  GCSE in its simplest definition is already too complicated
for us to implement correctly (we have been at it since '97 and still it
is inferior to what is done by other compilers).  Adding more
complexity, like CSE has, would make it even crazier.

CSE does a lot and is a powerful pass that does approximately what is
accomplished by multiple passes as described in the literature.  The problem
is that we don't know how to make it global and/or faster and we know
that it is too slow right now.
> 
> I think the confusion here is in trying to do two distinct things
> simultaneously: one is to track values and the other is to actually make
> changes in insns.

Perhaps doing GCSE purely on the reg_equal notes would accomplish what
you seem to be shooting for (we would have one canonical interpretation
of the value as described above), but I am not sure if it is a step in
the right direction (getting us out of the CSE pass performance trap we are
in).
> 
> We all agree that it is important to have a good table of all the
> equivalent values.  But the question of what substitutions in the insn
> stream should be done is a completely independent question.
>     We do have heuristics to deal with this but they don't work well either
>     and I believe that they work worse than the copy propagation mechanism
>     (at least that is how I explain the speedups I measured after
>     introducing the special-purpose copy propagation pass)
> 
> Because you are using that pass to not only modify insns, but also track
> values with gcse.  If you separate those two functions, you may find that
> it's better still.
Perhaps I can do the test with GCSE disabled (so I won't take advantage
of that).  I would guess that it would still be a win.
> 
>     In the case where we can use it each time, constant propagation as
>     implemented will do that (replace every occurrence of the pseudo by the
>     constant).  I thought you were speaking about the situation where it is
>     not a good idea to replace the registers by a constant?
> 
> That's in the case of a reg-reg assignment, not a use of a constant
> within an expression.

I am still in the dark.  Perhaps it is time to go to sleep first and then
continue arguing.  Could you give me some example of the code you
would like to see transformed?
>     I mean an extended basic block like:
> 
>       A
>      / \
>     B   C
> 
> That's not what cse.c means by an extended basic block.

CSE will actually follow the jumps, so it should track it.
It is just pure if-then-else (I didn't draw the join edge)
> 
>     Now CSE is deciding whether the NOP will be in A or B.  A is put before
>     B (this is usual) and B contains some other instructions.  Now it will
>     place the NOP in B and keep the copy in A, and it can be the case that
>     A is executed 1000000 times while B is not executed at all.
> 
> You're forgetting that the case in question had *two* branches.  If the inner
> isn't executed, you're replacing a branch with a copy, which is much faster
> on modern machines. 
How? Assuming that B has some other instructions, I will end up with
both the copy and the branch.  Or am I missing something else?

Honza

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: An issue for the SC: horrible documentation quality of GCC
@ 2003-05-09 23:13 Richard Kenner
  2003-05-09 23:23 ` Jan Hubicka
  0 siblings, 1 reply; 56+ messages in thread
From: Richard Kenner @ 2003-05-09 23:13 UTC (permalink / raw)
  To: jh; +Cc: gcc

    They are tied in the allocator; the point is to tie them earlier, so
    the expression
    (plus (reg a)  (reg c))
    will look equivalent to
    (plus (reg a)  (reg a))

You don't need to do a replacement: you just record in a table, like
cse.c does, what all the equivalences are.

I think the confusion here is in trying to do two distinct things
simultaneously: one is to track values and the other is to actually make
changes in insns.

We all agree that it is important to have a good table of all the
equivalent values.  But the question of what substitutions in the insn
stream should be done is a completely independent question.

    You can construct cases where copy propagation increases the live ranges
    of registers (when the earliest incarnation of the value is killed so
    the later incarnations are needed).  I seem to remember that even this
    subset of the optimization (looking for the minimal number of copies with
    the minimal number of simultaneously live registers) is NP-complete.

Right.  That's why heuristics are important.

    We do have heuristics to deal with this but they don't work well either
    and I believe that they work worse than the copy propagation mechanism
    (at least that is how I explain the speedups I measured after
    introducing the special-purpose copy propagation pass)

Because you are using that pass to not only modify insns, but also track
values with gcse.  If you separate those two functions, you may find that
it's better still.

    In the case where we can use it each time, constant propagation as
    implemented will do that (replace every occurrence of the pseudo by the
    constant).  I thought you were speaking about the situation where it is
    not a good idea to replace the registers by a constant?

That's in the case of a reg-reg assignment, not a use of a constant
within an expression.

    I mean an extended basic block like:

      A
     / \
    B   C

That's not what cse.c means by an extended basic block.

    Now CSE is deciding whether the NOP will be in A or B.  A is put before B
    (this is usual) and B contains some other instructions.  Now it will
    place the NOP in B and keep the copy in A, and it can be the case that A
    is executed 1000000 times while B is not executed at all.

You're forgetting that the case in question had *two* branches.  If the inner
isn't executed, you're replacing a branch with a copy, which is much faster
on modern machines. 

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: An issue for the SC: horrible documentation quality of GCC
  2003-05-09 22:51 Richard Kenner
@ 2003-05-09 22:59 ` Jan Hubicka
  0 siblings, 0 replies; 56+ messages in thread
From: Jan Hubicka @ 2003-05-09 22:59 UTC (permalink / raw)
  To: Richard Kenner; +Cc: jh, gcc

>     For each use of the register (reg c) you look for the chain of
>     operations preceding it, like:
>     (set (reg b) (reg a))
>     (set (reg c) (reg b))
>     and you replace the use by the oldest copy of the value, (reg a) in this
>     case.
> 
> Why?  That seems, at best, to accomplish nothing and, at worst, to
> pessimize the code.  I'm not familiar with the new register allocator,
> but in the original one, all of those registers, assuming they are local,
> will be tied together and all allocated the same hard register.
They are tied in the allocator; the point is to tie them earlier, so the
expression
(plus (reg a)  (reg c))
will look equivalent to
(plus (reg a)  (reg a))
This is what PRE needs to work correctly.
> 
> It would seem to me that when you do this optimization, you run the
> real risk of increasing the lifetime of a register over what it was
> before, especially if you consider the cases where some of the
> registers in question are hard regs and some are global.  Don't we already
> have code that knows how to handle these cases efficiently?

Depends on the definition of efficiently.  The problem is difficult to
solve in full generality.  You can construct cases where copy
propagation increases the live ranges of registers (when the earliest
incarnation of the value is killed so the later incarnations are needed).
I seem to remember that even this subset of the optimization (looking for
the minimal number of copies with the minimal number of simultaneously live
registers) is NP-complete.

We do have heuristics to deal with this but they don't work well either
and I believe that they work worse than the copy propagation mechanism
(at least that is how I explain the speedups I measured after
introducing the special-purpose copy propagation pass)
>     It is not perfect, as in the case where the user already wrote code
>     initializing a variable to 100 many times, we won't cache the copy of
>     100 in a register.
> 
> But if the constant of 100 can fit into one insn, why would you *want to*
> cache it in a register?  It's more efficient to use it each time rather
> than waste a register.

In the case where we can use it each time, constant propagation as
implemented will do that (replace every occurrence of the pseudo by the
constant).  I thought you were speaking about the situation where it is
not a good idea to replace the registers by a constant?
> 
>     This is also interesting.  How well does the heuristic work?  In fact
>     I would expect that it will often fail to eliminate the else block
>     and result in moving the NOP to the less frequently executed place in the
>     superblock.
> 
> I'm not sure what you mean here.
I mean an extended basic block like:

      A
     / \
    B   C
Now CSE is deciding whether the NOP will be in A or B.  A is put before B
(this is usual) and B contains some other instructions.  Now it will
place the NOP in B and keep the copy in A, and it can be the case that A is
executed 1000000 times while B is not executed at all.

Honza

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: An issue for the SC: horrible documentation quality of GCC
  2003-05-09 22:30 Richard Kenner
@ 2003-05-09 22:51 ` Jan Hubicka
  0 siblings, 0 replies; 56+ messages in thread
From: Jan Hubicka @ 2003-05-09 22:51 UTC (permalink / raw)
  To: Richard Kenner; +Cc: jh, gcc

>     In all three compiler books I have, the algorithm is documented as
>     maintaining a hash of available expressions; when there is a hit, it
>     inserts a copy instruction on one side and a use on the other.  This
>     is what GCSE does.
> 
> I'd call that a rather over-specified definition.  I wouldn't say that a
> key attribute of a CSE algorithm would be the insertion of those copies.
> Indeed, cse.c doesn't *want* to do that since it would increase the number
> of insns considerably.  It's not at all clear to me that the number of
> times such copies would win over the more straightforward approach that 
> cse.c does justifies the overhead in all the new insns and pseudos.
> 
> I agree that it makes sense for a global cse to do such a thing, however,
> because then cse.c can clean it up.

In the books it is copy propagation that makes the copy dead, followed by DCE
killing it.  Of course this is simplified, and for a local pass there is no
reason to insert a copy when the register is not killed in between.  I
guess any sane implementation would do that.
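
Schematically, the textbook transformation is

  a = x + y;              a = x + y; t = a;
  ...              =>     ...
  b = x + y;              b = t;

with copy propagation and DCE cleaning up afterwards.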

The whole point is to make the code meet certain expectations so that
the other algorithms behave consistently.

For instance, when you do GCSE and you have two identical chains in two basic
blocks, but you run CSE with heuristics on these and it for some
reason decides to compute each chain slightly differently, you lose.
For globals to work, it is important to make the locals behave
consistently on all occurrences of the same pattern.  This is what our
CSE pass lacks.  It is also what is difficult to achieve at the RTL level,
because RTL brings unexpected limitations the algorithms are not
expected to deal with.  This is where the real magic starts...

Books also document reverse copy propagation, which should clean up
some cases where the forward algorithm fails to remove the copy (like in
the case I showed originally where the pseudo is killed, so the copy can
not be eliminated, but when you replace the first computation of the
value by a computation into the copy directly, you save it).  Register
coalescing does that too, but in a more expensive way.

Honza

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: An issue for the SC: horrible documentation quality of GCC
@ 2003-05-09 22:51 Richard Kenner
  2003-05-09 22:59 ` Jan Hubicka
  0 siblings, 1 reply; 56+ messages in thread
From: Richard Kenner @ 2003-05-09 22:51 UTC (permalink / raw)
  To: jh; +Cc: gcc

    For each use of the register (reg c) you look for the chain of
    operations preceding it, like:
    (set (reg b) (reg a))
    (set (reg c) (reg b))
    and you replace the use by the oldest copy of the value, (reg a) in this
    case.

Why?  That seems, at best, to accomplish nothing and, at worst, to
pessimize the code.  I'm not familiar with the new register allocator,
but in the original one, all of those registers, assuming they are local,
will be tied together and all allocated the same hard register.

It would seem to me that when you do this optimization, you run the
real risk of increasing the lifetime of a register over what it was
before, especially if you consider the cases where some of the
registers in question are hard regs and some are global.  Don't we already
have code that knows how to handle these cases efficiently?

    I see, you are looking for something like (set (reg a) (reg b)) where
    reg b is known to be 100.  Constant propagation will turn this into
    (set (reg a) (const_int 100)), but it is possible that copy propagation would
    eeliminate the need for reg a entirely replacing all the uses by reg b
    (or this can be done in register allocation).

I'm talking about register allocation, not copy propagation.

    In the compiler books this does not matter; 

A *lot* of things "don't matter" in compiler books: that's why they are
*books*.  ;-)

    It is not perfect: in the case where the user already wrote code
    initializing a variable to 100 many times, we won't cache the copy of
    100 in a register.

But if the constant of 100 can fit into one insn, why would you *want to*
cache it in a register?  It's more efficient to use it each time rather
than waste a register.

    This is also interesting.  How well does the heuristic work?  In fact
    I would expect that it will often fail to eliminate the else block
    and result in moving the NOP to the less frequently executed place in
    the superblock.

I'm not sure what you mean here.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: An issue for the SC: horrible documentation quality of GCC
  2003-05-09 22:26 Richard Kenner
  2003-05-09 22:44 ` Jan Hubicka
@ 2003-05-09 22:48 ` Joe Buck
  2003-05-10 12:07 ` Richard Earnshaw
  2 siblings, 0 replies; 56+ messages in thread
From: Joe Buck @ 2003-05-09 22:48 UTC (permalink / raw)
  To: Richard Kenner; +Cc: jh, gcc

On Fri, May 09, 2003 at 06:30:56PM -0400, Richard Kenner wrote:
>     This would not work for the copy propagation (as we need to remove the
>     reference to the register in order to get dead code removed).
> 
> Can you define "copy propagation".  I'm not familiar with the term.

Muchnick's definition: a transformation that, given an assignment x <- y
for some variables x, y, replaces later uses of x with uses of y, as long
as intervening instructions have not changed the value of either x or y.
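
Concretely (a sketch in the notation of that definition):

    x <- y
    z <- x + 1    /* becomes z <- y + 1 */
    w <- x * 2    /* becomes w <- y * 2 */

and if no uses of x remain afterwards, the copy x <- y is dead and can be
removed.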
 

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: An issue for the SC: horrible documentation quality of GCC
  2003-05-09 22:26 Richard Kenner
@ 2003-05-09 22:44 ` Jan Hubicka
  2003-05-09 22:48 ` Joe Buck
  2003-05-10 12:07 ` Richard Earnshaw
  2 siblings, 0 replies; 56+ messages in thread
From: Jan Hubicka @ 2003-05-09 22:44 UTC (permalink / raw)
  To: Richard Kenner; +Cc: jh, gcc

>     This would not work for the copy propagation (as we need to remove the
>     reference to the register in order to get dead code removed).
> 
> Can you define "copy propagation".  I'm not familiar with the term.

For each use of the register (reg c) you look for the chain of
operations preceding it, like:
(set (reg b) (reg a))
(set (reg c) (reg b))
and you replace the use by the oldest copy of the value, (reg a) in this
case.
> 
>     Can you do some benchmarking on ROMP?  
> 
> No, it's a dead machine.  I was just giving that as an example of the sort
> of problems you can get into by blindly replacing registers by constants.
> 
> It can be a tricky issue.  On the one hand, if you have a reg-reg SET with
> at least one being a pseudo, if you replace the source with a constant, you
> may have replaced a NOP with an insn that does something.  On the other,
> doing that replacement may have eliminated the last use of another pseudo.

I see, you are looking for something like
(set (reg a) (reg b)) where reg b is known to be 100.  Constant
propagation will turn this into (set (reg a) (const_int 100)), but it is
possible that copy propagation would eliminate the need for reg a
entirely, replacing all the uses by reg b (or this can be done in
register allocation).

I see that this is handled incorrectly in GCC.  In the compiler books
this does not matter; the IL is regular enough to allow constants
to be used as operands everywhere, so there is no reason for such loads
of constants into registers.

One approach would be to simply prohibit constant propagation on reg-reg
moves.  That would keep the code no worse and should not interact badly
with the other optimizers.  For PRE this does not matter, as it works
purely on expressions (a reg-reg copy is not an expression).  We can do
copy propagation that will possibly result in killing the set entirely.
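
As a sketch of the difference (made-up registers): given

(set (reg b) (const_int 100))
(set (reg a) (reg b))
(use (reg a))

constant propagation rewrites the copy into (set (reg a) (const_int 100)),
turning an insn the register allocator could have made a no-op into a real
constant load; copy propagation instead rewrites the use into (use (reg b)),
leaving (set (reg a) (reg b)) dead and trivially removable.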

It is not perfect: in the case where the user already wrote code
initializing a variable to 100 many times, we won't cache the copy of 100
in a register.  That can be accomplished by teaching PRE that a constant
is also an expression to eliminate, but one would need to be careful to
avoid ping-pong between cprop and PRE.

> In other words, if we have
> 
> 	if (condition)
> 	   a = exp1;
>         else
> 	   a = exp2;
> 
> and we have a choice of putting the NOP before the "if" or making the
> assignment from exp1 be a NOP, the latter is preferred since it eliminates
> junk.  Thus the heuristic is to have the NOP as *late* as possible.  But
> I think the gcse.c code moves it as *early* as possible.

This is also interesting.  How well does the heuristic work?  In fact I
would expect that it will often fail to eliminate the else block and
result in moving the NOP to the less frequently executed place in the
superblock.

There is also a generic solution to that: register coalescing.  When
doing that, you can eliminate copies in priority order depending on how
commonly they are executed, and you can add a heuristic saying that when
the copy is the only thing in the basic block it has higher priority.

A simple register coalescing pass is implemented on the CFG branch; the
new register allocator has one too, I suspect a much more sophisticated
one.  This is something Mark should comment on.

Honza

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: An issue for the SC: horrible documentation quality of GCC
@ 2003-05-09 22:30 Richard Kenner
  2003-05-09 22:51 ` Jan Hubicka
  0 siblings, 1 reply; 56+ messages in thread
From: Richard Kenner @ 2003-05-09 22:30 UTC (permalink / raw)
  To: jh; +Cc: gcc

    In all three compiler books I have, the algorithm is defined as
    maintaining a hash table of available expressions; when there is a
    hit, it inserts a copy instruction at one site and a use at the
    other.  This is what GCSE does.

I'd call that a rather over-specified definition.  I wouldn't say that a
key attribute of a CSE algorithm would be the insertion of those copies.
Indeed, cse.c doesn't *want* to do that since it would increase the number
of insns considerably.  It's not at all clear to me that the cases where
such copies win over the more straightforward approach that cse.c takes
justify the overhead of all the new insns and pseudos.

I agree that it makes sense for a global cse to do such a thing, however,
because then cse.c can clean it up.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: An issue for the SC: horrible documentation quality of GCC
@ 2003-05-09 22:26 Richard Kenner
  2003-05-09 22:44 ` Jan Hubicka
                   ` (2 more replies)
  0 siblings, 3 replies; 56+ messages in thread
From: Richard Kenner @ 2003-05-09 22:26 UTC (permalink / raw)
  To: jh; +Cc: gcc

    This would not work for the copy propagation (as we need to remove the
    reference to the register in order to get dead code removed).

Can you define "copy propagation".  I'm not familiar with the term.

    Can you do some benchmarking on ROMP?  

No, it's a dead machine.  I was just giving that as an example of the sort
of problems you can get into by blindly replacing registers by constants.

It can be a tricky issue.  On the one hand, if you have a reg-reg SET with
at least one being a pseudo, if you replace the source with a constant, you
may have replaced a NOP with an insn that does something.  On the other,
doing that replacement may have eliminated the last use of another pseudo.

    I was worried about that when I was adding the local cprop pass (which
    doesn't introduce the problem, just makes it worse) and I did an
    evaluation on SPARC (maybe MIPS, I don't remember exactly) and the
    code produced was faster and shorter on average, so I concluded that
    this should be mostly safe.  When the constant is expensive, it
    usually means that it always needs to be loaded into a register and
    then the pattern should refuse an immediate operand.

Clearly, but the issue isn't just when you need more than one instruction
to load the constant.  See above.  You run a risk of turning an insn that
will become a NOP into a real insn.

In some sense, the question is where it is better to have a NOP.  But in
the case I mentioned, where the NOP is in a conditional context, it's best
to have it there.

In other words, if we have

	if (condition)
	   a = exp1;
        else
	   a = exp2;

and we have a choice of putting the NOP before the "if" or making the
assignment from exp1 be a NOP, the latter is preferred since it eliminates
junk.  Thus the heuristic is to have the NOP as *late* as possible.  But
I think the gcse.c code moves it as *early* as possible.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: An issue for the SC: horrible documentation quality of GCC
  2003-05-09 22:17 Richard Kenner
@ 2003-05-09 22:24 ` Jan Hubicka
  0 siblings, 0 replies; 56+ messages in thread
From: Jan Hubicka @ 2003-05-09 22:24 UTC (permalink / raw)
  To: Richard Kenner; +Cc: jh, gcc

>     No, the CSE algorithm as usually defined in the books would simplify
>     for instance:
>     (set (reg a) (plus b c))
>     (use a)
>     (set (reg a) (something else))
>     (set (reg b) (plus b c))
> 
>     But our algorithm won't.  On the other hand, it does things that the
>     CSE described in the books won't, so it is not CSE as the rest of the
>     world knows it, and naming it CSE is misleading.
> 
> Well I'm not sure what the "usual" cse algorithm would simplify this into
> since the result of the addition is no longer around, but defining an
> algorithm by what it would do in an obscure case seems odd to me.

In all three compiler books I have, the algorithm is defined as
maintaining a hash table of available expressions; when there is a hit,
it inserts a copy instruction at one site and a use at the other.  This
is what GCSE does.
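
Roughly, the books' transformation looks like this (a sketch, with a
made-up temporary t):

(set (reg a) (plus b c))
...
(set (reg d) (plus b c))

becomes

(set (reg t) (plus b c))
(set (reg a) (reg t))
...
(set (reg d) (reg t))

i.e. a copy is inserted at the first occurrence and the later computation
is replaced by a use of the temporary.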

It is not too odd a situation; it is what you usually get after (manual)
loop unrolling, when the same variables are reused many times for no
good reason.

The global optimizers are usually described as a group of three main
passes: copy propagation, constant propagation and CSE (PRE), each
having a local and a global pass.  This is what I would like to slowly
converge to.  We have all the main bits in place; they are just twisted
together.

Honza

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: An issue for the SC: horrible documentation quality of GCC
  2003-05-09 21:36 Richard Kenner
@ 2003-05-09 22:19 ` Jan Hubicka
  0 siblings, 0 replies; 56+ messages in thread
From: Jan Hubicka @ 2003-05-09 22:19 UTC (permalink / raw)
  To: Richard Kenner; +Cc: rth, gcc

>     Because cse.c is too slow.  It cannot be used for code cleanup
>     within another pass.  Compilation time being high on everyone's
>     minds, I think this is an exceedingly good reason to have two
>     passes.
> 
> Fair enough, but it would seem that all it has to do is to add a
> REG_EQUAL note to accomplish its inter-pass purposes and then let CSE
> decide whether the zero or a register is best.

This would not work for the copy propagation (as we need to remove the
reference to the register in order to get dead code removed).

For constant propagation this scheme would work, I guess.  It came in
with the original gcse checkin in '97 (about the time I started to get
interested in GCC) and in fact I've added the code to add REG_EQUAL
notes in order to get it working in more complicated situations (where
the actual replacement is not possible; the old code just gave up).  I
never tried the more radical approach of killing the replacement code
entirely.

Can you do some benchmarking on ROMP?  I was worried about that when I
was adding the local cprop pass (which doesn't introduce the problem,
just makes it worse) and I did an evaluation on SPARC (maybe MIPS, I
don't remember exactly), and the code produced was faster and shorter on
average, so I concluded that this should be mostly safe.  When the
constant is expensive, it usually means that it always needs to be
loaded into a register, and then the pattern should refuse an immediate
operand.  Perhaps ROMP is different.
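
(Concretely, "refuse an immediate operand" means the operand predicate in
the insn pattern: an operand declared as, say, "register_operand" instead
of "general_operand" or "nonmemory_operand" will never match a constant,
so the constant must first be loaded into a register.  This describes the
general mechanism only; I am not pointing at any particular pattern.)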
Honza

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: An issue for the SC: horrible documentation quality of GCC
@ 2003-05-09 22:17 Richard Kenner
  2003-05-09 22:24 ` Jan Hubicka
  0 siblings, 1 reply; 56+ messages in thread
From: Richard Kenner @ 2003-05-09 22:17 UTC (permalink / raw)
  To: jh; +Cc: gcc

    No, the CSE algorithm as usually defined in the books would simplify
    for instance:
    (set (reg a) (plus b c))
    (use a)
    (set (reg a) (something else))
    (set (reg b) (plus b c))

    But our algorithm won't.  On the other hand, it does things that the
    CSE described in the books won't, so it is not CSE as the rest of the
    world knows it, and naming it CSE is misleading.

Well I'm not sure what the "usual" cse algorithm would simplify this into
since the result of the addition is no longer around, but defining an
algorithm by what it would do in an obscure case seems odd to me.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: An issue for the SC: horrible documentation quality of GCC
  2003-05-09 16:18 Richard Kenner
@ 2003-05-09 22:06 ` Jan Hubicka
  0 siblings, 0 replies; 56+ messages in thread
From: Jan Hubicka @ 2003-05-09 22:06 UTC (permalink / raw)
  To: Richard Kenner; +Cc: matz, gcc

>     Well, when dead code is removed this will look like:
> 
>       p2 <-- 100 + p3
> 
> Only if p1 is, in fact, dead.
> 
>     But this needs to be determined at a later point, instead of cluttering
>     all the optimization passes with machine dependend checks.
> 
> You lost me here.  All of RTL and every choice within it is implicitly
> machine-dependent.  Every cost calculation is machine-dependent.  The
> entire RTL optimization strategy is based on making machine-dependent
> decisions (of course, written in machine-independent ways).
> 
>     CSE as we implement it is fundamentally different from gcse.  For one
>     our CSE is not any CSE but a kind of ad hoc value numbering
>     intertwined with expression simplification.
> 
> Again, you lost me.  It eliminates common subexpressions.  Isn't that what
> "cse" stands for?
No, the CSE algorithm as usually defined in the books would simplify
for instance:
(set (reg a) (plus b c))
(use a)
(set (reg a) (something else))
(set (reg b) (plus b c))

But our algorithm won't.  On the other hand, it does things that the CSE
described in the books won't, so it is not CSE as the rest of the world
knows it, and naming it CSE is misleading.

Honza
> 
>     And I can't find a function in cse.c which does something similar
>     (i.e. just propagating constants and copies).  
> 
> Well, fold_rtx, I suppose, but that's because it does everything at
> one time.  And, as I've pointed out, it's not always better to
> substitute a constant for a register: sometimes, it's best to do it
> the other way around.  I have a favorite example for this:  the (now-obsolete)
> ROMP chip had an ABI where the return register and the first argument 
> register were the same. It also had a conditional return insn.  So if you
> have a function with one operand, say X, and you have:
> 
> 	if (x == 0)
> 	  return 0;
> 
> What you really want to do with the insn setting the return value is
> to change *from* the 0 *to* the equivalent, X.  That way that insn is a
> no-op, and you can do this as a conditional return insn.
> 
> CSE gets this right because it uses the cost metric.  It sounds like the
> code in gcse.c will get this wrong.
> 
>     Like I said, local_cprop* has nothing to do with gcse itself, except being
>     run to clean up the code between the gcse iterations.
> 
> Then that's all the *more* reason why it should be documented!

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: An issue for the SC: horrible documentation quality of GCC
  2003-05-09 15:57 ` Michael Matz
  2003-05-09 16:34   ` David Edelsohn
@ 2003-05-09 22:01   ` Jan Hubicka
  1 sibling, 0 replies; 56+ messages in thread
From: Jan Hubicka @ 2003-05-09 22:01 UTC (permalink / raw)
  To: Michael Matz; +Cc: Richard Kenner, gcc

> Hi,
> 
> On Fri, 9 May 2003, Richard Kenner wrote:
> 
> > Then in what sense is it "local"?  Do you know?
> 
> Maybe the initial author felt it was local in the sense that it doesn't
> use real and global data flow information, but instead a conservative
> subset, like implemented with cselib.

Well, I actually originally did it purely locally, but later found that
a) initialization of cselib at each basic block is expensive
   (I fixed this later, so this should no longer be such a hot issue)
b) doing superblock CSE makes the whole thing converge faster and is
   completely free.
I failed to document that somehow; that is a mistake.  I will add
documentation.
It duplicates what cse already knows how to do for three reasons.  First,
it needs to be fast; second, it needs to be consistent with what the
global optimizations expect.  The global optimizations work optimally
only when constants appear as constants and copies use the oldest value
inside the BB.  CSE fails to do that; it uses heuristics, often leaving
two copies in the same code.
Third, I would like to see CSE replaced by standard, documented
algorithms.  I still prefer having a copy propagation pass where I can
open a book and see what copy propagation does over having a CSE pass
that is not doing CSE at all.

I will send a patch updating the documentation.
Also, concerning the gcse.c file, Zdenek did a series of patches to
break it up into multiple files, one for each individual pass it does
(like cprop, PRE, unification, store motion, null pointer elimination),
all currently folded together.  These somehow got stuck in the review
process, unfortunately.  I believe splitting the whole thing up and
adding an introductory comment for each file, with a reference to the
appropriate paper/book, would help a lot.
> 
> >
> >       p1 <-- 100
> >       p2 <-- p1 + p3
> >     into
> >       p1 <-- 100
> >       p2 <-- 100 + p3
> >
> > For many machines, the former is better.
> 
> Well, when dead code is removed this will look like:
> 
>   p2 <-- 100 + p3
> 
> i.e. without setting p1 at all.  I guess there are not many machines where
> the latter is not better (or at least not worse) than the initial sequence
> with the explicit setup of p1.

Yes, I think we should have a simple specialized pass here instead of the
crazy heuristics we have in cse.  For example, for i386 this is still
true in the case where the constant 100 is used many times in the code.
This is something CSE can't decide with just the local information in
mind, but the cost of using the constant versus the cost of loading the
constant into a register would decide it.
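
(Roughly, for a constant used N times the comparison is

  N * cost (constant as immediate operand)
      versus
  cost (load constant into register) + N * cost (register operand)

and only a pass that sees all N uses and consults the machine's cost
model can evaluate that.  This is a sketch of the tradeoff, not what
cse.c computes today.)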

Honza
> 
> Also what is better depends on when this optimization is run.  To clean up
> code, which usually is the goal for optimizations run early, you want to do
> this copy propagation unconditionally to better expose the
> redundancies.
> 
> If it then is determined that for a certain machine emitting that constant
> instead of a register reference is worse, and that constant still sits
> around in some register (for instance if the p1-setup could not be
> removed for some reason), that conversion can be reversed again, i.e. the
> register can be used again, or even a copy can again be inserted.
> 
> But this needs to be determined at a later point, instead of cluttering
> all the optimization passes with machine-dependent checks.
> 
> > CSE has lots of logic to do this and knows which is better.
> 
> Yes, and now look at CSE's code.
> 
> > What is this code doing, why does it to things differently than CSE, and
> > how does it interact with CSE?  These are both questions and things that
> > need to be documented in the code.
> >
> > I'd expect gcse to only operate on the *start* of a basic block and let
> > CSE make the decisions *within* the block.
> 
> CSE as we implement it is fundamentally different from gcse.  For one, our
> CSE is not really CSE but a kind of ad hoc value numbering intertwined with
> expression simplification.
> 
> That having been said, the local_cprop pass _does_ use cselib to determine
> if a pseudo is the same as another pseudo or equal to a constant.  And I
> can't find a function in cse.c which does something similar (i.e. just
> propagating constants and copies).  For gcse we don't want to run a
> full-featured CSE pass in each iteration (because CSE is one of the
> slowest things in gcc overall), so someone has obviously implemented
> the missing functionality using as much code from cse (namely cselib) as
> possible.  It was just placed into gcse.c, although it doesn't really have
> any connection to it.  It is just a small cleanup pass.
> 
> > But the code doesn't seem to do that.  Without the missing
> > documentation, I can't tell if that's a bug in the code or is the way it
> > was designed to work.
> 
> Like I said, local_cprop* has nothing to do with gcse itself, except being
> run to clean up the code between the gcse iterations.
> 
> 
> Ciao,
> Michael.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: An issue for the SC: horrible documentation quality of GCC
@ 2003-05-09 21:36 Richard Kenner
  2003-05-09 22:19 ` Jan Hubicka
  0 siblings, 1 reply; 56+ messages in thread
From: Richard Kenner @ 2003-05-09 21:36 UTC (permalink / raw)
  To: rth; +Cc: gcc

    Because cse.c is too slow.  It cannot be used for code cleanup
    within another pass.  Compilation time being high on everyone's
    minds, I think this is an exceedingly good reason to have two
    passes.

Fair enough, but it would seem that all it has to do is to add a
REG_EQUAL note to accomplish its inter-pass purposes and then let CSE
decide whether the zero or a register is best.
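
Something like this, as a sketch: keep the insn

	(set (reg a) (reg x))

but attach a note of the form (expr_list:REG_EQUAL (const_int 0) (nil))
to it, so a later pass that prefers the constant can read it from the
note, while CSE can keep the register form when that is cheaper.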

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: An issue for the SC: horrible documentation quality of GCC
  2003-05-09 16:08 Richard Kenner
@ 2003-05-09 21:04 ` Richard Henderson
  0 siblings, 0 replies; 56+ messages in thread
From: Richard Henderson @ 2003-05-09 21:04 UTC (permalink / raw)
  To: Richard Kenner; +Cc: drow, gcc

On Fri, May 09, 2003 at 12:13:34PM -0400, Richard Kenner wrote:
> Indeed now that I see that message, I remember it.  But it doesn't answer the
> question I have: why is it appropriate to have *two* constant propagation
> routines within a basic block where only one (the one in cse.c) knows how to
> properly apply cost functions ...

Because cse.c is too slow.  It cannot be used for code cleanup
within another pass.  Compilation time being high on everyone's
minds, I think this is an exceedingly good reason to have two
passes.

As for examining or ignoring cost functions, Jan's patch seems
to have been a win on the major platforms.  Of course, they're
all fairly trivial when it comes to costs of constants that can
be recognized within instructions.


r~

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: An issue for the SC: horrible documentation quality of GCC
  2003-05-09 12:12 Richard Kenner
                   ` (2 preceding siblings ...)
  2003-05-09 17:23 ` Joseph S. Myers
@ 2003-05-09 20:30 ` Scott Robert Ladd
  3 siblings, 0 replies; 56+ messages in thread
From: Scott Robert Ladd @ 2003-05-09 20:30 UTC (permalink / raw)
  To: gcc mailing list

Richard Kenner wrote:
> I want to repeat what I just said in a separate message: the amount of
> totally undocumented code in GCC is approaching very troublesome levels.
> There are whole areas of the compiler in which nobody but their author can
> work due to a complete lack of any high-level documentation.

Totally agreed. One major impediment to my working on GCC is the lack of
accurate and complete documentation, either external or embedded in the
code.

> The local_cprop_* functions in gcse are a very good case in point.
> 
> There is no comment anywhere that I can find of the form "This phase performs
> the ABC optimization by doing the following algorithm.  This optimization
> is useful because XYZ and does not duplicate a similar optimization in QQQ
> because of BAR."

Exactly. For example, I would very much like to add automatic
vectorization to GCC; this is the one area where GCC's performance falls
short of its commercial competitors for high-performance applications. I
have a vague notion of where to start, after studying the code -- but I
hesitate to start hacking on such an important project as GCC without a
clear sense of how the code works.

> There's no point in trying to understand how we got to this appalling state
> of affairs, but we do need to decide how to proceed from now.  My suggestion
> is the following:
> 
> (1) Do not allow any further patches to the compiler that do not have proper
> documentation.

Enforcing such a dictate would be problematic at best. You'll find
resistance from the "we need this now" crowd, which seems to lead a
robust life in both free and commercial products.

Even for writers like myself, documenting code is less enjoyable than
creating it -- yet when delivering code to a client, I document every
line thoroughly, because that is the "right" thing to do. I don't think
many "free" developers treat "free" software projects as they would a
commercial "business" product -- and therein lies the lack of motivation
for documenting code.

> (2) Remove any parts of the compiler for which documentation is not supplied
> by the 3.4 release.

Not feasible, given that this would leave us with a non-working compiler. :)

That said, I've expressed interest in the past in working on the gcc
documentation; however, my inquiries to the community about specific
issues (i.e., what the heck does this do?) tend to fall on deaf ears.

Quality documentation for GCC will result from a community consensus
that documentation is important, and a combined effort to describe the
arcane aspects of the code base. The big question is: How do we rally
that support?

> It is very clear that the approach of supplying documentation later
> does not work.

Most certainly true.

-- 
Scott Robert Ladd
Coyote Gulch Productions (http://www.coyotegulch.com)
Professional programming for science and engineering;
Interesting and unusual bits of very free code.


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: An issue for the SC: horrible documentation quality of GCC
  2003-05-09 16:39 Benjamin Kosnik
@ 2003-05-09 18:22 ` Phil Edwards
  0 siblings, 0 replies; 56+ messages in thread
From: Phil Edwards @ 2003-05-09 18:22 UTC (permalink / raw)
  To: Benjamin Kosnik; +Cc: gcc

On Fri, May 09, 2003 at 11:38:20AM -0500, Benjamin Kosnik wrote:
> 
> For showing interrelationships between functions, you might find the
> doxygen maps useful. 
> 
> http://www.jaj.com/space/phil/gccdoxy/gccdoxy_20021228.tar.bz2

Aw crumbs, you mean somebody uses that?  Now I gotta keep it up to date!

-- 
If ye love wealth greater than liberty, the tranquility of servitude greater
than the animating contest for freedom, go home and leave us in peace.  We seek
not your counsel, nor your arms.  Crouch down and lick the hand that feeds you;
and may posterity forget that ye were our countrymen.            - Samuel Adams

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: An issue for the SC: horrible documentation quality of GCC
  2003-05-09 12:12 Richard Kenner
  2003-05-09 12:41 ` Michael Matz
  2003-05-09 13:25 ` Paul Koning
@ 2003-05-09 17:23 ` Joseph S. Myers
  2003-05-09 20:30 ` Scott Robert Ladd
  3 siblings, 0 replies; 56+ messages in thread
From: Joseph S. Myers @ 2003-05-09 17:23 UTC (permalink / raw)
  To: Richard Kenner; +Cc: gcc

On Fri, 9 May 2003, Richard Kenner wrote:

> There's no point in trying to understand how we got to this appalling state
> of affairs, but we do need to decide how to proceed from now.  My suggestion
> is the following:
> 
> (1) Do not allow any further patches to the compiler that do not have proper
> documentation.

How about the suggestion Phil Edwards made two years ago
<http://gcc.gnu.org/ml/gcc/2001-05/msg00612.html> of a total code freeze
until the documentation is adequate?

I expect that my list of undocumented target macros
<http://gcc.gnu.org/ml/gcc/2001-06/msg00507.html> is still largely
accurate, though some have been documented, and some were defined only for
obsoleted targets (in which case we should consider removing the uses of
the macros unless there is a conceptual reason they may be useful for
future targets).

-- 
Joseph S. Myers
jsm28@cam.ac.uk

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: An issue for the SC: horrible documentation quality of GCC
@ 2003-05-09 16:39 Benjamin Kosnik
  2003-05-09 18:22 ` Phil Edwards
  0 siblings, 1 reply; 56+ messages in thread
From: Benjamin Kosnik @ 2003-05-09 16:39 UTC (permalink / raw)
  To: gcc


For showing interrelationships between functions, you might find the
doxygen maps useful. 

http://www.jaj.com/space/phil/gccdoxy/gccdoxy_20021228.tar.bz2

-benjamin

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: An issue for the SC: horrible documentation quality of GCC
  2003-05-09 15:57 ` Michael Matz
@ 2003-05-09 16:34   ` David Edelsohn
  2003-05-09 22:01   ` Jan Hubicka
  1 sibling, 0 replies; 56+ messages in thread
From: David Edelsohn @ 2003-05-09 16:34 UTC (permalink / raw)
  To: Michael Matz; +Cc: Richard Kenner, gcc

>>>>> Michael Matz writes:

Michael> But this needs to be determined at a later point, instead of cluttering
Michael> all the optimization passes with machine-dependent checks.

	Not every processor has an x86 cost model.  Generating fewer
instructions usually is an accurate, generic metric.  However, choosing
one sequence of instructions over another is very machine-dependent.

David

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: An issue for the SC: horrible documentation quality of GCC
@ 2003-05-09 16:18 Richard Kenner
  2003-05-09 22:06 ` Jan Hubicka
  0 siblings, 1 reply; 56+ messages in thread
From: Richard Kenner @ 2003-05-09 16:18 UTC (permalink / raw)
  To: matz; +Cc: gcc

    Well, when dead code is removed this will look like:

      p2 <-- 100 + p3

Only if p1 is, in fact, dead.

    But this needs to be determined at a later point, instead of cluttering
    all the optimization passes with machine-dependent checks.

You lost me here.  All of RTL and every choice within it is implicitly
machine-dependent.  Every cost calculation is machine-dependent.  The
entire RTL optimization strategy is based on making machine-dependent
decisions (of course, written in machine-independent ways).

    CSE as we implement it is fundamentally different from gcse.  For one,
    our CSE is not really CSE but a kind of ad hoc value numbering
    intertwined with expression simplification.

Again, you lost me.  It eliminates common subexpressions.  Isn't that what
"cse" stands for?

    And I can't find a function in cse.c which does something similar
    (i.e. just propagating constants and copies).  

Well, fold_rtx, I suppose, but that's because it does everything at
one time.  And, as I've pointed out, it's not always better to
substitute a constant for a register: sometimes, it's best to do it
the other way around.  I have a favorite example for this:  the (now-obsolete)
ROMP chip had an ABI where the return register and the first argument 
register were the same. It also had a conditional return insn.  So if you
have a function with one operand, say X, and you have:

	if (x == 0)
	  return 0;

What you really want to do with the insn setting the return value is
to change *from* the 0 *to* the equivalent, X.  That way that insn is a
no-op, and you can do this as a conditional return insn.
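
In RTL terms, roughly (a sketch; r2 is a made-up name for the shared
return/argument register):

	(set (reg r2) (const_int 0))	;; return value is 0

becomes

	(set (reg r2) (reg r2))		;; x already lives in r2: a no-op

which can then be folded into the conditional return.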

CSE gets this right because it uses the cost metric.  It sounds like the
code in gcse.c will get this wrong.

    Like I said, local_cprop* has nothing to do with gcse itself, except being
    run to clean up the code between the gcse iterations.

Then that's all the *more* reason why it should be documented!

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: An issue for the SC: horrible documentation quality of GCC
@ 2003-05-09 16:08 Richard Kenner
  2003-05-09 21:04 ` Richard Henderson
  0 siblings, 1 reply; 56+ messages in thread
From: Richard Kenner @ 2003-05-09 16:08 UTC (permalink / raw)
  To: drow; +Cc: gcc

    We don't live in a void of one flat source tree.  While I'm not arguing
    that documentation would be better, it's still easy to find the history
    of this code.  It was added last July by Jan, in this patch:
      http://gcc.gnu.org/ml/gcc-patches/2002-07/msg00617.html

Indeed now that I see that message, I remember it.  But it doesn't answer the
question I have: why is it appropriate to have *two* constant propagation
routines within a basic block where only one (the one in cse.c) knows how to
properly apply cost functions?  At the time I saw this message, I was dubious
about this and thought I even raised the issue on the lists.

Moreover, this goes counter to the claim that was made just a few hours ago,
which is that the "local" did *not* refer to doing something that was
non-global, thus showing that there is indeed major confusion due to lack of
documentation.

However, the patch as cited in the message should *not* have been approved,
both because of the missing documentation, and because of the confusion about
the role of CSE in this task, as exemplified by the last sentence of the
posting: "I am surprised our local CSE don't know how to handle it, but I
will check it separately."  Since that "check" should have resulted in
not needing this patch, it needed to be done before the patch was approved.

My point in raising this was not to specifically get the information on
the code (as you point out, it's not hard to find the message where a 
particular piece of code was first proposed), but to address the more general
issue that there is *lots* of code that is improperly documented and it's
unreasonable to do this sort of search for *every* such function.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: An issue for the SC: horrible documentation quality of GCC
  2003-05-09 15:36 Richard Kenner
  2003-05-09 15:56 ` Daniel Jacobowitz
@ 2003-05-09 15:57 ` Michael Matz
  2003-05-09 16:34   ` David Edelsohn
  2003-05-09 22:01   ` Jan Hubicka
  1 sibling, 2 replies; 56+ messages in thread
From: Michael Matz @ 2003-05-09 15:57 UTC (permalink / raw)
  To: Richard Kenner; +Cc: gcc

Hi,

On Fri, 9 May 2003, Richard Kenner wrote:

> Then in what sense is it "local"?  Do you know?

Maybe the initial author felt it was local in the sense that it doesn't
use real and global data flow information, but instead a conservative
subset, like implemented with cselib.

>
>       p1 <-- 100
>       p2 <-- p1 + p3
>     into
>       p1 <-- 100
>       p2 <-- 100 + p3
>
> For many machines, the former is better.

Well, when dead code is removed this will look like:

  p2 <-- 100 + p3

i.e. without setting p1 at all.  I guess there are not many machines where
the latter is not better (or at least not worse) than the initial sequence
with the explicit setup of p1.

Also what is better depends on when this optimization is run.  To clean up
code, which usually is the goal for optimizations run early, you want to do
this copy propagation unconditionally to better expose the
redundancies.

If it then is determined that for a certain machine emitting that constant
instead of a register reference is worse, and that constant still sits
around in some register (for instance if the p1-setup could not be
removed for some reason), that conversion can be reversed again, i.e. the
register can be used again, or even a copy can again be inserted.

But this needs to be determined at a later point, instead of cluttering
all the optimization passes with machine-dependent checks.

> CSE has lots of logic to do this and knows which is better.

Yes, and now look at CSE's code.

> What is this code doing, why does it do things differently than CSE, and
> how does it interact with CSE?  These are both questions and things that
> need to be documented in the code.
>
> I'd expect gcse to only operate on the *start* of a basic block and let
> CSE make the decisions *within* the block.

CSE as we implement it is fundamentally different from gcse.  For one, our
CSE is not really CSE but a kind of ad hoc value numbering intertwined with
expression simplification.

That having been said, the local_cprop pass _does_ use cselib to determine
if a pseudo is the same as another pseudo or equal to a constant.  And I
can't find a function in cse.c which does something similar (i.e. just
propagating constants and copies).  For gcse we don't want to run a
full-featured CSE pass in each iteration (because CSE is one of the
slowest things in gcc overall), so someone has obviously implemented
the missing functionality using as much code from cse (namely cselib) as
possible.  It was just placed into gcse.c, although it doesn't really have
any connection to it.  It is just a small cleanup pass.

> But the code doesn't seem to do that.  Without the missing
> documentation, I can't tell if that's a bug in the code or is the way it
> was designed to work.

Like I said, local_cprop* has nothing to do with gcse itself, except being
run to clean up the code between the gcse iterations.


Ciao,
Michael.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: An issue for the SC: horrible documentation quality of GCC
  2003-05-09 15:36 Richard Kenner
@ 2003-05-09 15:56 ` Daniel Jacobowitz
  2003-05-09 15:57 ` Michael Matz
  1 sibling, 0 replies; 56+ messages in thread
From: Daniel Jacobowitz @ 2003-05-09 15:56 UTC (permalink / raw)
  To: Richard Kenner; +Cc: matz, gcc

On Fri, May 09, 2003 at 11:41:25AM -0400, Richard Kenner wrote:
>     It seems to be a confusing name, as it doesn't work locally at all.  
> 
> Then in what sense is it "local"?  Do you know?
> 
>       p1 <-- 100
>       p2 <-- p1 + p3
>     into
>       p1 <-- 100
>       p2 <-- 100 + p3
> 
> For many machines, the former is better.  CSE has lots of logic to do this
> and knows which is better.  What is this code doing, why does it to things
> differently than CSE, and how does it interact with CSE?  These are both
> questions and things that need to be documented in the code.
> 
> I'd expect gcse to only operate on the *start* of a basic block and let
> CSE make the decisions *within* the block.  But the code doesn't seem to
> do that.  Without the missing documentation, I can't tell if that's a bug
> in the code or is the way it was designed to work.

We don't live in a void of one flat source tree.  While I'm not arguing
that documentation would be better, it's still easy to find the history
of this code.  It was added last July by Jan, in this patch:
  http://gcc.gnu.org/ml/gcc-patches/2002-07/msg00617.html

It was as simple as using CVS to find the patch, and then Google to
find the mailing list posting.

-- 
Daniel Jacobowitz
MontaVista Software                         Debian GNU/Linux Developer

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: An issue for the SC: horrible documentation quality of GCC
@ 2003-05-09 15:36 Richard Kenner
  2003-05-09 15:56 ` Daniel Jacobowitz
  2003-05-09 15:57 ` Michael Matz
  0 siblings, 2 replies; 56+ messages in thread
From: Richard Kenner @ 2003-05-09 15:36 UTC (permalink / raw)
  To: matz; +Cc: gcc

    It seems to be a confusing name, as it doesn't work locally at all.  

Then in what sense is it "local"?  Do you know?

      p1 <-- 100
      p2 <-- p1 + p3
    into
      p1 <-- 100
      p2 <-- 100 + p3

For many machines, the former is better.  CSE has lots of logic to do this
and knows which is better.  What is this code doing, why does it do things
differently than CSE, and how does it interact with CSE?  These are both
questions and things that need to be documented in the code.

I'd expect gcse to only operate on the *start* of a basic block and let
CSE make the decisions *within* the block.  But the code doesn't seem to
do that.  Without the missing documentation, I can't tell if that's a bug
in the code or is the way it was designed to work.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: An issue for the SC: horrible documentation quality of GCC
  2003-05-09 14:10 Richard Kenner
@ 2003-05-09 14:37 ` Michael Matz
  0 siblings, 0 replies; 56+ messages in thread
From: Michael Matz @ 2003-05-09 14:37 UTC (permalink / raw)
  To: Richard Kenner; +Cc: gcc

Hi,

On Fri, 9 May 2003, Richard Kenner wrote:

>     Also before the *local_cprop* functions, except local_cprop_pass(),
>     which is only called at toplevel from one_cprop_pass().
>
> Also, what does "local" mean in this context?  I'd take it as the opposite
> of "global", meaning to me that this does an optimzation on a small part of
> the function being compiled.  Which part?  I see no documentation of that.
>
> Or is this just a confusing name?

It seems to be a confusing name, as it doesn't work locally at all.  In
fact the whole local_cprop pass goes over _all_ instructions (not per
basic block, but by NEXT_INSN), using cselib to propagate constants and
register sources into uses.  I.e. it converts code such as:

  p1 <-- 100
  p2 <-- p1 + p3
into
  p1 <-- 100
  p2 <-- 100 + p3

And also this
  p1 <-- p4
  p2 <-- p1 + p3
into
  p1 <-- p4
  p2 <-- p4 + p3

The goal of this is to possibly remove all references to a pseudo, and
hence make the definition trivially deletable, which is faster than any
data flow based dead code removal, but also removes less code.  But it
might be Good Enough to be used repeatedly inside the GCSE loop.
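
E.g. in the first example above, once the use in the addition has been
rewritten to 100, if no other use of p1 remains, the sequence becomes

  p2 <-- 100 + p3

with the definition of p1 deleted on the spot, no dataflow analysis
needed.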


Ciao,
Michael.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: An issue for the SC: horrible documentation quality of GCC
@ 2003-05-09 14:10 Richard Kenner
  2003-05-09 14:37 ` Michael Matz
  0 siblings, 1 reply; 56+ messages in thread
From: Richard Kenner @ 2003-05-09 14:10 UTC (permalink / raw)
  To: matz; +Cc: gcc

    Also before the *local_cprop* functions, except local_cprop_pass(),
    which is only called at toplevel from one_cprop_pass().  

Also, what does "local" mean in this context?  I'd take it as the opposite
of "global", meaning to me that this does an optimzation on a small part of
the function being compiled.  Which part?  I see no documentation of that.

Or is this just a confusing name?

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: An issue for the SC: horrible documentation quality of GCC
@ 2003-05-09 14:01 Richard Kenner
  0 siblings, 0 replies; 56+ messages in thread
From: Richard Kenner @ 2003-05-09 14:01 UTC (permalink / raw)
  To: matz; +Cc: gcc

    I don't know about you.  But when I look at gcse.c, I see heaps of
    comments, before most of the functions.  

The problem is that it's a requirement that it be before *all* functions,
not "most".  I count nine that are missing it and didn't look at how many
are incomplete (see one example below).

    Also before the *local_cprop* functions, except local_cprop_pass(), which
    is only called at toplevel from one_cprop_pass().

But if I'm looking to see what a specific local_cprop function does,
I'm going to look in front of local_cprop_pass, not try to find the function
that calls it and look there.

Now let's consider the documentation of one_cprop_pass.  It refers to a
"copy/constant propagation pass".   I did a search of the entire file for
the string "copy/constant propagation" and find *no* description of what
that term means.  Yes, most people would know what "constant propagation"
is, but what does "copy" mean in that context?

    Also the whole gcse.c is documented on the top of that file rather
    extensively.

No, I don't consider it "extensive" at all.  Yes, it gives a very broad
overview of what's being done, but doesn't link that overview to the
implementation.

    And additionally a heap of paper references are given
    for people really interested in the theoretic background.

That's good, but I suspect that the comments also assume that every
reader has read all the papers.  If the intent is to rely on a paper
as a prerequisite for the documentation, there should be a pointer to *one*
paper and a URL where it can be found.

    Well, on the top of gcse.c it says that one of the passes in GCSE is
    copy/constant propagation.  This is a standard optimization, and hence
    the actual description of any algorithm (on the high level) seems
    superfluous.

I disagree.  There are lots of "standard" optimizations, but once you
go from the realm of a paper into an actual implementation, you need to
say *precisely* what you've done.

    Tricks used in the implementation should of course be
    documented, but have no place in any high level overview comments.

Agreed, but what *is* important in that high-level overview, and is missing,
is an explanation of how the optimizations in this file interact with other
similar optimizations.  Specifically, it should say what the boundaries
between what's done there and what's done in CSE are.

I also see absolutely no discussion of the critical issue of when constants
should replace registers: CSE has a lot of code to do that and if it's being
done differently in gcse, there needs to be documentation saying why.

    About the actual problem you have: There is code to adjust the
    REG_EQUAL notes for changed libcall blocks, namely in
    adjust_libcall_notes() called from do_local_cprop(), which sounds as
    if it should do what you wanted.  Ergo you probably want to look at
    that function, why it doesn't work.  The third match for "LIBCALL" set
    me on the right track, so it wasn't exactly difficult to find.

OK, but just as an example for what's missing, I'm looking at 

/* LIBCALL_SP is a zero-terminated array of insns at the end of a libcall;
   their REG_EQUAL notes need updating.  */

static bool
do_local_cprop (x, insn, alter_jumps, libcall_sp)

It's nice that *one* operand to that function is documented, but it would
be even better if *all* of them were and better still if there were at least
one sentence saying what the function as a whole is supposed to do.

Getting back to adjust_libcall_notes: there are absolutely no comments
within the routine.  There's a call to reg_set_between_p whose purpose
isn't obvious to me and nothing in the function header comments say why
register usage would be important.  Since there are no comments in the
function itself, there's no way to know what was intended there.  Perhaps
that's related to the bug, perhaps not, but without knowing the *purpose*
of that call, it's hard to say.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: An issue for the SC: horrible documentation quality of GCC
  2003-05-09 12:12 Richard Kenner
  2003-05-09 12:41 ` Michael Matz
@ 2003-05-09 13:25 ` Paul Koning
  2003-05-09 17:23 ` Joseph S. Myers
  2003-05-09 20:30 ` Scott Robert Ladd
  3 siblings, 0 replies; 56+ messages in thread
From: Paul Koning @ 2003-05-09 13:25 UTC (permalink / raw)
  To: kenner; +Cc: gcc

>>>>> "Richard" == Richard Kenner <kenner@vlsi1.ultra.nyu.edu> writes:

 Richard> I want to repeat what I just said in a separate message: the
 Richard> amount of totally undocumented code in GCC is approaching
 Richard> very troublesome levels.  There are who areas of the compler
 Richard> in which nobody but their author can work due to a complete
 Richard> lack of any high-level documentation....

Speaking as someone who has been trying to dig into gcc with the help
of gccint.*, I certainly agree.  However...

 Richard> (2) Remove any parts of the compiler for which documentation
 Richard> is not supplied by the 3.4 release.

That doesn't seem very practical because you'd end up with much of the
compiler gone -- like the parser, and the code generator... :-(

FWIW, I think the situation with gcc is substantially better than,
say, gdb, or worse yet, binutils.

     paul

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: An issue for the SC: horrible documentation quality of GCC
  2003-05-09 12:12 Richard Kenner
@ 2003-05-09 12:41 ` Michael Matz
  2003-05-09 13:25 ` Paul Koning
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 56+ messages in thread
From: Michael Matz @ 2003-05-09 12:41 UTC (permalink / raw)
  To: Richard Kenner; +Cc: gcc

Hi,

On Fri, 9 May 2003, Richard Kenner wrote:

> I want to repeat what I just said in a separate message: the amount of
> totally undocumented code in GCC is approaching very troublesome levels.

I don't know about you.  But when I look at gcse.c, I see heaps of
comments, before most of the functions.  Also before the *local_cprop*
functions, except local_cprop_pass(), which is only called at toplevel
from one_cprop_pass().  Also the whole gcse.c is documented on the top of
that file rather extensively.  And additionally a heap of paper references
are given for people really interested in the theoretic background.

I think you highly exaggerate the situation.

> There is no comment anywhere that I can find of the form "This phase performs
> the ABC optimization by doing the following algorithm.  This optimization
> is useful because XYZ and does not duplicate a similar optimization in QQQ
> because of BAR."

Well, on the top of gcse.c it says that one of the passes in GCSE is
copy/constant propagation.  This is a standard optimization, and hence the
actual description of any algorithm (on the high level) seems superfluous.
Tricks used in the implementation should of course be documented, but have
no place in any high level overview comments.

About the actual problem you have: There is code to adjust the REG_EQUAL
notes for changed libcall blocks, namely in adjust_libcall_notes() called
from do_local_cprop(), which sounds as if it should do what you wanted.
Ergo you probably want to look at that function to see why it doesn't
work.  The
third match for "LIBCALL" set me on the right track, so it wasn't exactly
difficult to find.


Ciao,
Michael.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* An issue for the SC: horrible documentation quality of GCC
@ 2003-05-09 12:12 Richard Kenner
  2003-05-09 12:41 ` Michael Matz
                   ` (3 more replies)
  0 siblings, 4 replies; 56+ messages in thread
From: Richard Kenner @ 2003-05-09 12:12 UTC (permalink / raw)
  To: gcc

I want to repeat what I just said in a separate message: the amount of
totally undocumented code in GCC is approaching very troublesome levels.
There are whole areas of the compiler in which nobody but their author can
work due to a complete lack of any high-level documentation.

The local_cprop_* functions in gcse are a very good case in point.

There is no comment anywhere that I can find of the form "This phase performs
the ABC optimization by doing the following algorithm.  This optimization
is useful because XYZ and does not duplicate a similar optimization in QQQ
because of BAR."

From the name, I can conclude this has *something* to do with "local
constant propagation".  If I studied the code long enough, I could probably
figure out what it actually *does*.

But there is no way to figure out the really important part of the
missing documentation, which is what the phase is *supposed* to do and
the motivations for the new phase.

This is by no means the first time this has happened: in most cases when I've
had to look at a relatively-recent part of the compiler, I find this lack
of documentation and I have complained about it before.

There's no point in trying to understand how we got to this appalling state
of affairs, but we do need to decide how to proceed from now.  My suggestion
is the following:

(1) Do not allow any further patches to the compiler that do not have proper
documentation.

(2) Remove any parts of the compiler for which documentation is not supplied
by the 3.4 release.

It is very clear that the approach of supplying documentation later
does not work.

^ permalink raw reply	[flat|nested] 56+ messages in thread
