Re: Language-independent functions-as-trees representation

public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed

* Re: Language-independent functions-as-trees representation
  2002-07-20 19:47 Language-independent functions-as-trees representation Jason Merrill
@ 2002-07-20 13:58 ` Per Bothner
  2002-07-21 22:04   ` Jason Merrill
  2002-07-20 22:17 ` Steven Bosscher
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 95+ messages in thread
From: Per Bothner @ 2002-07-20 13:58 UTC (permalink / raw)
  To: Jason Merrill; +Cc: gcc

Jason Merrill wrote:
> In a previous thread, Diego has suggested that inlining would be done on
> language-dependent trees.  I think this is a mistake; IMO it should be done
> at the SIMPLE level.  Requiring the inliner to know about frontend trees is
> wrong.

But why can't the inliner work at the GENERIC level?

> Per (or other Java folk), why does the Java frontend use BLOCK as an
> expression rather than use BIND_EXPR, which seems to serve the same
> purpose?

I don't remember.  Possibly because I did think the BIND_EXPR bought us
anything over using just the BLOCK, which we needed anyway.  It does
look like using BIND_EXPR would be cleaner.
-- 
	--Per Bothner
per@bothner.com   http://www.bothner.com/per/

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Language-independent functions-as-trees representation
@ 2002-07-20 19:47 Jason Merrill
  2002-07-20 13:58 ` Per Bothner
                   ` (8 more replies)
  0 siblings, 9 replies; 95+ messages in thread
From: Jason Merrill @ 2002-07-20 19:47 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Per Bothner, Diego Novillo, Mark Mitchell, gcc, Jason Merrill

I'd like to return to the discussion of a language-independent
functions-as-trees representation that was going on in January.  At this
point I'd like to ignore the question of representation within a particular
frontend (INITIAL?), and focus on what the frontend hands off to the
middle-end (GENERIC?) and the simplified form which is used in
optimization (SIMPLE).

Previous threads on the subject include:

  http://gcc.gnu.org/ml/gcc-patches/2000-06/threads.html#00460
  http://gcc.gnu.org/ml/gcc-patches/2001-11/threads.html#01332
  http://gcc.gnu.org/ml/gcc/2002-01/threads.html#00082

In each of these threads, rth and Per have disagreed over the EXPR/STMT
duality.  My oversimplified summary:

Per argues that having (for example) both IF_STMT and COND_EXPR is
gratuitous duplication.  rth argues that at the SIMPLE level we want to
distinguish statements (which have side effects/affect control flow) and
expressions (which don't).  Per points out that we don't need different
tree codes to get that distinction; we can just say that in SIMPLE a
COND_EXPR always has void type, so it only affects control flow.

Although I've argued with Per in the past, while working on the tree-ssa
branch I've largely come around to his point of view.  In SIMPLE, we deal
with lists of (let's call them) statements.  If one of these statements
affects control flow, we can tell that by the top tree code; it doesn't
matter whether it's spelled IF_STMT or COND_EXPR.  In GENERIC, we can't
tell anything from the top tree code, since expressions can contain
statements, so again the segregation of tree codes doesn't matter.

Working on SIMPLE has made this clearer to me because simplification
basically involves turning expressions into statements.  Sometimes an
expression is wrapped in another expression node, and I want to preserve
that nesting at the statement level.  But we can't currently nest _STMTs
inside _EXPRs, so we end up jumping through hoops.  For instance, to
simplify an EXPR_WITH_FILE_LOCATION, we create new _STMTs, set their line
numbers, and add FILE_STMTs as necessary.  It would be simpler if we could
just leave everything under the EXPR_WITH_FILE_LOCATION, and add it to the
list of tree codes which can contain other SIMPLE statements.  And I'm
looking at the same sort of thing when I start dealing with exception
handling constructs.  If I simplify something out from under a
TRY_CATCH_EXPR, I don't want to have to replace it with a TRY_BLOCK.

Furthermore, defining expressions to have no side-effects at all would seem
to exclude CALL_EXPRs, trapping arithmetic, and volatile accesses.  rth,
what exactly is the distinction you want to express?

-----
Other notes:

We still need to address the issue of expressing ordering requirements in
SIMPLE, as discussed in the third thread above.  For instance, currently
the tree-ssa branch always simplifies function arguments in left-to-right
order, which is not required for C and is inefficient on PUSH_ROUNDING
targets.  We need more tree codes to express these things.  Perhaps
SEQUENCE_POINT and UNORDERED_LIST would be sufficient.  I guess it's
reasonable to defer dealing with this until the basic infrastructure is
done, but I haven't seen an actual decision to that effect, it just seems
to have been dropped.

In a previous thread, Diego has suggested that inlining would be done on
language-dependent trees.  I think this is a mistake; IMO it should be done
at the SIMPLE level.  Requiring the inliner to know about frontend trees is
wrong.

Per (or other Java folk), why does the Java frontend use BLOCK as an
expression rather than use BIND_EXPR, which seems to serve the same
purpose?

The current TARGET_EXPR/WITH_CLEANUP_EXPR/CLEANUP_POINT_EXPR structure is
unsuited to a structured tree representation; it relies on bookkeeping
outside of the trees (i.e. expand_decl_cleanup/expand_cleanups).  I think
we should move some of those into the C++ frontend, as they are useful for
building up the C++ INITIAL representation, but that when we get to GENERIC 
they should be replaced with nested TRY_FINALLY_EXPRs.

SCOPE_STMT and DECL_STMT have no place in GENERIC, either.  The BIND_EXPR
abstraction is essentially the same as the LET_STMT in the McGill SIMPLE
document; it expresses a block which contains declarations and code for
which those decls are in scope.  This is a much more structured
representation, though again the others are a useful shorthand for INITIAL.

Jason

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: Language-independent functions-as-trees representation
  2002-07-20 19:47 Language-independent functions-as-trees representation Jason Merrill
  2002-07-20 13:58 ` Per Bothner
@ 2002-07-20 22:17 ` Steven Bosscher
  2002-07-21  6:20   ` Richard Henderson
  2002-07-21  0:09 ` Neil Booth
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 95+ messages in thread
From: Steven Bosscher @ 2002-07-20 22:17 UTC (permalink / raw)
  To: Jason Merrill; +Cc: gcc

Op za 20-07-2002, om 14:47 schreef Jason Merrill:
> In a previous thread, Diego has suggested that inlining would be done on
> language-dependent trees.  I think this is a mistake; IMO it should be done
> at the SIMPLE level.  Requiring the inliner to know about frontend trees is
> wrong.

That depends on what you're trying to inline. I know for sure that it
would be *mandatory* for an efficient inliner for Fortran 95 to know
about the front end.

For example, subroutines, functions or user defined operators can take
whole arrays or (worse!) array sections as their actual arguments.
Sometimes we need to copy in/out these arguments in temporary arrays,
but if the operator or function can be inlined, we can scalarize the
whole think inline and avoid the temporary arrays.

Inlining of user operators could be a big win, and users probably expect
inlining of statement functions or nested subroutines/functions.

Once we've generated SIMPLE code, we've lost the opportunity to do this
kind of inlining.

Greetz
Steven

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: Language-independent functions-as-trees representation
  2002-07-20 19:47 Language-independent functions-as-trees representation Jason Merrill
  2002-07-20 13:58 ` Per Bothner
  2002-07-20 22:17 ` Steven Bosscher
@ 2002-07-21  0:09 ` Neil Booth
  2002-07-21  0:41   ` Richard Henderson
  2002-07-21  0:13 ` Richard Henderson
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 95+ messages in thread
From: Neil Booth @ 2002-07-21  0:09 UTC (permalink / raw)
  To: Jason Merrill
  Cc: Richard Henderson, Per Bothner, Diego Novillo, Mark Mitchell, gcc

Jason Merrill wrote:-

> In a previous thread, Diego has suggested that inlining would be done on
> language-dependent trees.  I think this is a mistake; IMO it should be done
> at the SIMPLE level.  Requiring the inliner to know about frontend trees is
> wrong.

With appropriate use of a language hook or two, it should be possible
to do it on the front-end representation.  I suspect this is ideal in
general, because different languages have different constraints on what
is and is not possible in various situations, that tends to get lost
after lowering.

Neil.

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: Language-independent functions-as-trees representation
  2002-07-20 19:47 Language-independent functions-as-trees representation Jason Merrill
                   ` (2 preceding siblings ...)
  2002-07-21  0:09 ` Neil Booth
@ 2002-07-21  0:13 ` Richard Henderson
  2002-07-21  7:06 ` Andreas Jaeger
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 95+ messages in thread
From: Richard Henderson @ 2002-07-21  0:13 UTC (permalink / raw)
  To: Jason Merrill; +Cc: Per Bothner, Diego Novillo, Mark Mitchell, gcc

On Sat, Jul 20, 2002 at 01:47:18PM +0100, Jason Merrill wrote:
> Furthermore, defining expressions to have no side-effects at all would seem
> to exclude CALL_EXPRs, trapping arithmetic, and volatile accesses.  rth,
> what exactly is the distinction you want to express?

In WHIRL, calls are not expressions.  As for trapping arithmetic
and volatile accesses... I have no idea.

I see your point wrt code reuse making it easier to lower the
trees.  I guess I simply wanted stronger typing.  Given that
you're doing a good hunk of the work, I'll defer to what you
think best.

> We still need to address the issue of expressing ordering requirements in
> SIMPLE, as discussed in the third thread above.  For instance, currently
> the tree-ssa branch always simplifies function arguments in left-to-right
> order, which is not required for C and is inefficient on PUSH_ROUNDING
> targets.  We need more tree codes to express these things.  Perhaps
> SEQUENCE_POINT and UNORDERED_LIST would be sufficient.  I guess it's
> reasonable to defer dealing with this until the basic infrastructure is
> done, but I haven't seen an actual decision to that effect, it just seems
> to have been dropped.

No one came up with a good idea.

> In a previous thread, Diego has suggested that inlining would be done on
> language-dependent trees.  I think this is a mistake; IMO it should be done
> at the SIMPLE level.  Requiring the inliner to know about frontend trees is
> wrong.

Indeed.

> The current TARGET_EXPR/WITH_CLEANUP_EXPR/CLEANUP_POINT_EXPR structure is
> unsuited to a structured tree representation; it relies on bookkeeping
> outside of the trees (i.e. expand_decl_cleanup/expand_cleanups).  I think
> we should move some of those into the C++ frontend, as they are useful for
> building up the C++ INITIAL representation, but that when we get to GENERIC 
> they should be replaced with nested TRY_FINALLY_EXPRs.

Yes.



r~

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: Language-independent functions-as-trees representation
  2002-07-21  0:09 ` Neil Booth
@ 2002-07-21  0:41   ` Richard Henderson
  0 siblings, 0 replies; 95+ messages in thread
From: Richard Henderson @ 2002-07-21  0:41 UTC (permalink / raw)
  To: Neil Booth; +Cc: Jason Merrill, Per Bothner, Diego Novillo, Mark Mitchell, gcc

On Sat, Jul 20, 2002 at 09:35:40PM +0100, Neil Booth wrote:
> With appropriate use of a language hook or two, it should be possible
> to do it on the front-end representation.  I suspect this is ideal in
> general, because different languages have different constraints on what
> is and is not possible in various situations, that tends to get lost
> after lowering.

I'm not fond of the idea.  If we've lost so much in the lowering
that we cannnot do decent inlining, then we've mis-targeted the
intermediate language.


r~

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: Language-independent functions-as-trees representation
  2002-07-20 22:17 ` Steven Bosscher
@ 2002-07-21  6:20   ` Richard Henderson
  2002-07-21  9:43     ` Steven Bosscher
  0 siblings, 1 reply; 95+ messages in thread
From: Richard Henderson @ 2002-07-21  6:20 UTC (permalink / raw)
  To: Steven Bosscher; +Cc: Jason Merrill, gcc

On Sat, Jul 20, 2002 at 10:09:53PM +0200, Steven Bosscher wrote:
> Once we've generated SIMPLE code, we've lost the opportunity to do this
> kind of inlining.

Err, why?


r~

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: Language-independent functions-as-trees representation
  2002-07-20 19:47 Language-independent functions-as-trees representation Jason Merrill
                   ` (3 preceding siblings ...)
  2002-07-21  0:13 ` Richard Henderson
@ 2002-07-21  7:06 ` Andreas Jaeger
  2002-07-21 11:09   ` Pop Sébastian
  2002-07-21 17:03 ` Language-independent functions-as-trees representation Mark Mitchell
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 95+ messages in thread
From: Andreas Jaeger @ 2002-07-21  7:06 UTC (permalink / raw)
  To: Jason Merrill
  Cc: Richard Henderson, Per Bothner, Diego Novillo, Mark Mitchell, gcc

Jason Merrill <jason@redhat.com> writes:

> In a previous thread, Diego has suggested that inlining would be done on
> language-dependent trees.  I think this is a mistake; IMO it should be done
> at the SIMPLE level.  Requiring the inliner to know about frontend trees is
> wrong.

Inlining on SIMPLE might also allow to inline mixed languages, e.g. C
code into Fortran.  I'm not sure how many people use this kind of mixed
languages for one project but if they do, it would help,

Andreas
-- 
 Andreas Jaeger
  SuSE Labs aj@suse.de
   private aj@arthur.inka.de
    http://www.suse.de/~aj

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: Language-independent functions-as-trees representation
  2002-07-21  6:20   ` Richard Henderson
@ 2002-07-21  9:43     ` Steven Bosscher
  2002-07-21 14:31       ` Richard Henderson
  0 siblings, 1 reply; 95+ messages in thread
From: Steven Bosscher @ 2002-07-21  9:43 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Jason Merrill, gcc

Op zo 21-07-2002, om 07:17 schreef Richard Henderson:
> On Sat, Jul 20, 2002 at 10:09:53PM +0200, Steven Bosscher wrote:
> > Once we've generated SIMPLE code, we've lost the opportunity to do this
> > kind of inlining.
> 
> Err, why?

Because at that point:
- we have already created the temporary arrays and the code to copy 
  in/out the array
- we have already scalarized nonzero rank assignments

SIMPLE doesn't know about array syntax, and Fortran 95 does, remember?

A trivial example:

program could_inline
real, dimension(20) :: a,b,c
a = foo(b+c) ! of course b and c would be defined before this
stop

contains
	function foo(x) result (y)
	   real, dimension(:), intent (in) :: x
	   real, dimension(:), intent (out) :: y
	   y = x
	end function foo
end

Obvioulsy the function is trivial enough that it could be inlined. But
once we have generated SIMPLE, the actual argument 'b+c' is evaluated
and put in a temporary array before we do the call.

So instead of generating SIMPLE for:

program could_inline
real, dimension(20) :: a,b,c
a = b+c
stop

we would have to produde the SIMPE equivalent for something like:

program could_inline
real, dimension(20) :: a,b,c
tmp1 = b+c ! tmp1 declared locally in PRE chain of the call stmt
call foo(a, tmp1)
stop

contains
	subroutine foo(y,x)
	   real, dimension(:), intent (in) :: x
	   real, dimension(:), intent (out) :: y
	   y = x
	end subroutine foo
end

Greetz
Steven

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: Language-independent functions-as-trees representation
  2002-07-21  7:06 ` Andreas Jaeger
@ 2002-07-21 11:09   ` Pop Sébastian
  2002-07-21 11:44     ` Andreas Jaeger
  0 siblings, 1 reply; 95+ messages in thread
From: Pop Sébastian @ 2002-07-21 11:09 UTC (permalink / raw)
  To: Andreas Jaeger
  Cc: Jason Merrill, Richard Henderson, Per Bothner, Diego Novillo,
	Mark Mitchell, gcc

On Sun, Jul 21, 2002 at 08:42:43AM +0200, Andreas Jaeger wrote:
> Jason Merrill <jason@redhat.com> writes:
> 
> > In a previous thread, Diego has suggested that inlining would be done on
> > language-dependent trees.  I think this is a mistake; IMO it should be done
> > at the SIMPLE level.  Requiring the inliner to know about frontend trees is
> > wrong.
> 
> Inlining on SIMPLE might also allow to inline mixed languages, e.g. C
> code into Fortran.  

This would require to store SIMPLE trees comming from different front-ends
then reconstruct a global tree. 
(a little as the SGI's compiler works for interprocedural analysis, storing 
WHIRL trees into .o files, then building the whole tree at link/optimize time).

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: Language-independent functions-as-trees representation
  2002-07-21 11:09   ` Pop Sébastian
@ 2002-07-21 11:44     ` Andreas Jaeger
  2002-07-21 23:20       ` Phil Edwards
  2002-07-22  7:10       ` where is tree dump utility Sathiskanna
  0 siblings, 2 replies; 95+ messages in thread
From: Andreas Jaeger @ 2002-07-21 11:44 UTC (permalink / raw)
  To: Pop SÃ©bastian
  Cc: Jason Merrill, Richard Henderson, Per Bothner, Diego Novillo,
	Mark Mitchell, gcc

Pop SÃ©bastian <pop@gauvain.u-strasbg.fr> writes:

> On Sun, Jul 21, 2002 at 08:42:43AM +0200, Andreas Jaeger wrote:
>> Jason Merrill <jason@redhat.com> writes:
>> 
>> > In a previous thread, Diego has suggested that inlining would be done on
>> > language-dependent trees.  I think this is a mistake; IMO it should be done
>> > at the SIMPLE level.  Requiring the inliner to know about frontend trees is
>> > wrong.
>> 
>> Inlining on SIMPLE might also allow to inline mixed languages, e.g. C
>> code into Fortran.  
>
> This would require to store SIMPLE trees comming from different front-ends
> then reconstruct a global tree. 
> (a little as the SGI's compiler works for interprocedural analysis, storing 
> WHIRL trees into .o files, then building the whole tree at link/optimize time).

It would only make sense with whole program optimizations for
interprocedural analysis.  Currently GCC does not do anything of this
kind but we should think forward and whole program optimizations is
something that some of us might want to see.


Andreas
-- 
 Andreas Jaeger
  SuSE Labs aj@suse.de
   private aj@arthur.inka.de
    http://www.suse.de/~aj

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: Language-independent functions-as-trees representation
  2002-07-21  9:43     ` Steven Bosscher
@ 2002-07-21 14:31       ` Richard Henderson
  2002-07-21 15:07         ` Daniel Berlin
  2002-07-21 15:15         ` Steven Bosscher
  0 siblings, 2 replies; 95+ messages in thread
From: Richard Henderson @ 2002-07-21 14:31 UTC (permalink / raw)
  To: Steven Bosscher; +Cc: Jason Merrill, gcc

On Sun, Jul 21, 2002 at 10:32:03AM +0200, Steven Bosscher wrote:
> Because at that point:
> - we have already created the temporary arrays and the code to copy 
>   in/out the array
> - we have already scalarized nonzero rank assignments
> 
> SIMPLE doesn't know about array syntax, and Fortran 95 does, remember?

I don't necessarily think this must be a hard and fast rule.
Are there instances (aside from inlining) that could take
advantage of optimiziation before scalarization?

I've suggested before that it may well be useful to have cases
in which language-specific nodes may survive into SIMPLE.  We'd
treat them as black boxes during optimization and call back
into the front end to lower them.

r~

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: Language-independent functions-as-trees representation
  2002-07-21 14:31       ` Richard Henderson
@ 2002-07-21 15:07         ` Daniel Berlin
  2002-07-21 15:15         ` Steven Bosscher
  1 sibling, 0 replies; 95+ messages in thread
From: Daniel Berlin @ 2002-07-21 15:07 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Steven Bosscher, Jason Merrill, gcc

On Sun, 21 Jul 2002, Richard Henderson wrote:

> On Sun, Jul 21, 2002 at 10:32:03AM +0200, Steven Bosscher wrote:
> > Because at that point:
> > - we have already created the temporary arrays and the code to copy 
> >   in/out the array
> > - we have already scalarized nonzero rank assignments
> > 
> > SIMPLE doesn't know about array syntax, and Fortran 95 does, remember?
> 
> I don't necessarily think this must be a hard and fast rule.
> Are there instances (aside from inlining) that could take
> advantage of optimiziation before scalarization?
> 
> I've suggested before that it may well be useful to have cases
> in which language-specific nodes may survive into SIMPLE.  We'd
> treat them as black boxes during optimization and call back
> into the front end to lower them.

Booo, hisss.
That would *screw* points-to analysis and other things quite harshly, 
unless we added language dependent hooks for it (which defeats the purpose 
of having language-independent alias analysis).

Right now, the points-to analysis will work on all languages (though from 
a f77 perspective, it doesn't handle the COMMON stuff (or whatever it's 
called.).

If I moved to a flow-sensitive alias-analysis, or some of the k-limiting 
ones (which differentiate between structure fields), i'd be kinda screwed 
depending on the "black box" nodes.

I'd rather try to avoid language-dependent stuff at all costs.

Other compilers just have some HLO (High Level Optimizer) hook that is 
language dependent, where the language can perform whatever optimizations 
can't be done well at a language-independent tree level.

The reasoning is probably:
If 99% of the language can be expressed in the  language-independent form 
that is amenable to optimization, just fine, why screw up 10 language 
independent optimizers so we can optimize the other 1% of the language 
statements  well. It makes more sense to write an optimizer at a language 
dependent level that does something smart with those 1% *before* they are 
made into  language-independent trees.

Does this add more code? Maybe.
Does it give you better optimization, and cleaner maintenance of language 
independent stuff? yes.

 > > > > r~
> 
> 

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: Language-independent functions-as-trees representation
  2002-07-21 14:31       ` Richard Henderson
  2002-07-21 15:07         ` Daniel Berlin
@ 2002-07-21 15:15         ` Steven Bosscher
  1 sibling, 0 replies; 95+ messages in thread
From: Steven Bosscher @ 2002-07-21 15:15 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Jason Merrill, gcc

Op zo 21-07-2002, om 19:01 schreef Richard Henderson:
> On Sun, Jul 21, 2002 at 10:32:03AM +0200, Steven Bosscher wrote:
> > Because at that point:
> > - we have already created the temporary arrays and the code to copy 
> >   in/out the array
> > - we have already scalarized nonzero rank assignments
> > 
> > SIMPLE doesn't know about array syntax, and Fortran 95 does, remember?
> 
> I don't necessarily think this must be a hard and fast rule.
> Are there instances (aside from inlining) that could take
> advantage of optimiziation before scalarization?

Well, of course there are cases where you'd like to optimize before you
scalarize. Basically, any optimization that removes duplicate code from
the scalarized version of the program. But the optimizations I'm
thinking of here are Fortran specific, e.g. PRE, but for complete array
expressions.

> I've suggested before that it may well be useful to have cases
> in which language-specific nodes may survive into SIMPLE.  We'd
> treat them as black boxes during optimization and call back
> into the front end to lower them.

I think that we can represent most Fortran statements and expressions in
SIMPLE, just not array syntax.

The kind of optimizations you would do on SIMPLE trees usually touch
expressions. So if you would do SIMPLE simplifications before
scalarizing, and you would treat array syntax as a black box, I don't
think you'd be doing any optimizations that couldn't be done after the
whole program is SIMPLEfied.

Greetz
Steven

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: Language-independent functions-as-trees representation
  2002-07-20 19:47 Language-independent functions-as-trees representation Jason Merrill
                   ` (4 preceding siblings ...)
  2002-07-21  7:06 ` Andreas Jaeger
@ 2002-07-21 17:03 ` Mark Mitchell
  2002-07-21 20:30   ` Jason Merrill
  2002-07-22 21:09   ` Diego Novillo
  2002-07-22  2:42 ` where is the data Sathiskanna
                   ` (2 subsequent siblings)
  8 siblings, 2 replies; 95+ messages in thread
From: Mark Mitchell @ 2002-07-21 17:03 UTC (permalink / raw)
  To: Jason Merrill, Richard Henderson; +Cc: Per Bothner, Diego Novillo, gcc



--On Saturday, July 20, 2002 01:47:18 PM +0100 Jason Merrill 
<jason@redhat.com> wrote:

> I'd like to return to the discussion of a language-independent
> functions-as-trees representation that was going on in January.

I don't have much of an opinion about this once we get to SIMPLE, at
least not yet.

My disagreement with Per is closer to the front end; I would really
like the internal representation produced from C++ semantic analysis
to look like C++ -- with various implicit things made explict (like
default arguments, choice of overloaded function, choice of template
specialization, action to take on cleanup of object, etc.)

In SIMPLE, one place where I would like the statement/expression to come
in to play is the following:

  If two expressions are "the same", i.e., both are "x + y", then they
  are actually the same pointer.

In other words, statements are the control flow glue that holds
computations together.  The computations don't care about where they
are in the control flow.  This makes some optimizations easier and
more efficient, and it's a very appealing view of the world.

-- 
Mark Mitchell                mark@codesourcery.com
CodeSourcery, LLC            http://www.codesourcery.com

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: Language-independent functions-as-trees representation
  2002-07-21 17:03 ` Language-independent functions-as-trees representation Mark Mitchell
@ 2002-07-21 20:30   ` Jason Merrill
  2002-07-21 22:44     ` Mark Mitchell
  2002-07-22  5:11     ` Daniel Berlin
  2002-07-22 21:09   ` Diego Novillo
  1 sibling, 2 replies; 95+ messages in thread
From: Jason Merrill @ 2002-07-21 20:30 UTC (permalink / raw)
  To: Mark Mitchell
  Cc: Richard Henderson, Per Bothner, Diego Novillo, gcc, Jason Merrill

>>>>> "Mark" == Mark Mitchell <mark@codesourcery.com> writes:

> My disagreement with Per is closer to the front end; I would really
> like the internal representation produced from C++ semantic analysis
> to look like C++ -- with various implicit things made explict (like
> default arguments, choice of overloaded function, choice of template
> specialization, action to take on cleanup of object, etc.)

I don't have a problem with this, except that it makes lowering more
complicated.

> In SIMPLE, one place where I would like the statement/expression to come
> in to play is the following:

>   If two expressions are "the same", i.e., both are "x + y", then they
>   are actually the same pointer.

Why?  We don't currently share expressions in RTL; why would we want to do
so in SIMPLE?

The current simplification code explicitly unshares all the trees to avoid
SAVE_EXPR-like problems with shared expressions.  For example, this code
from cppexpr.c:

  prio = optab[op].prio - ((optab[op].flags & LEFT_ASSOC) != 0);

gets folded to:

  prio = ((int)optab[(unsigned int)op].flags & 2) != 0
    ? (unsigned int)((int)optab[(unsigned int)op].prio - 1) 
    : (unsigned int)(int)optab[(unsigned int)op].prio;

where the "optab[op].prio" is shared.  If we don't unshare before
simplifying, we get something like (only partially simplified):

 T.1 = ((int)optab[(unsigned int)op].flags & 2);
 if (T.1 != 0)
   {
     T.2 = optab[(unsigned int)op].prio;
     prio = (unsigned int)((int)T.2 - 1);
   }
 else
   {
     prio = (unsigned int)(int)T.2;
   }

Oops.

> In other words, statements are the control flow glue that holds
> computations together.  The computations don't care about where they
> are in the control flow.  This makes some optimizations easier and
> more efficient, and it's a very appealing view of the world.

Sorry, I don't follow.  Most computations care very much about where they
are in the control flow; if I move computations about arbitrarily, their
values will tend to change.  The value of "x + y" is different before and
after I increment x.  It's the job of CSE to recognize when we can share
values between multiple instances of an expression.

Jason

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: Language-independent functions-as-trees representation
  2002-07-20 13:58 ` Per Bothner
@ 2002-07-21 22:04   ` Jason Merrill
  0 siblings, 0 replies; 95+ messages in thread
From: Jason Merrill @ 2002-07-21 22:04 UTC (permalink / raw)
  To: Per Bothner; +Cc: gcc

>>>>> "Per" == Per Bothner <per@bothner.com> writes:

> Jason Merrill wrote:
>> In a previous thread, Diego has suggested that inlining would be done on
>> language-dependent trees.  I think this is a mistake; IMO it should be done
>> at the SIMPLE level.  Requiring the inliner to know about frontend trees is
>> wrong.

> But why can't the inliner work at the GENERIC level?

It certainly could.  But what would be the advantage of doing it there
rather than in SIMPLE?

Jason

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: Language-independent functions-as-trees representation
  2002-07-21 20:30   ` Jason Merrill
@ 2002-07-21 22:44     ` Mark Mitchell
  2002-07-21 23:06       ` Jason Merrill
  2002-07-22  5:11     ` Daniel Berlin
  1 sibling, 1 reply; 95+ messages in thread
From: Mark Mitchell @ 2002-07-21 22:44 UTC (permalink / raw)
  To: Jason Merrill; +Cc: Richard Henderson, Per Bothner, Diego Novillo, gcc

> Why?  We don't currently share expressions in RTL; why would we want to do
> so in SIMPLE?

Maybe this will make more sense after my other comment below.

Besides that, it makes things like rtx_equal_p simple/faster to implement,
and could save tons of memory.

> Sorry, I don't follow.  Most computations care very much about where they
> are in the control flow; if I move computations about arbitrarily, their
> values will tend to change.  The value of "x + y" is different before and
> after I increment x.  It's the job of CSE to recognize when we can share
> values between multiple instances of an expression.

I was implicitly (sorry!) in SSA form.

-- 
Mark Mitchell                mark@codesourcery.com
CodeSourcery, LLC            http://www.codesourcery.com

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: Language-independent functions-as-trees representation
  2002-07-21 22:44     ` Mark Mitchell
@ 2002-07-21 23:06       ` Jason Merrill
  2002-07-21 23:53         ` Mark Mitchell
  0 siblings, 1 reply; 95+ messages in thread
From: Jason Merrill @ 2002-07-21 23:06 UTC (permalink / raw)
  To: Mark Mitchell
  Cc: Richard Henderson, Per Bothner, Diego Novillo, gcc, Jason Merrill

>>>>> "Mark" == Mark Mitchell <mark@codesourcery.com> writes:

>> Why?  We don't currently share expressions in RTL; why would we want to do
>> so in SIMPLE?

> Maybe this will make more sense after my other comment below.

> Besides that, it makes things like rtx_equal_p simple/faster to implement,
> and could save tons of memory.

>> Sorry, I don't follow.  Most computations care very much about where they
>> are in the control flow; if I move computations about arbitrarily, their
>> values will tend to change.  The value of "x + y" is different before and
>> after I increment x.  It's the job of CSE to recognize when we can share
>> values between multiple instances of an expression.

> I was implicitly (sorry!) in SSA form.

I remain unenlightened.  Can you give an example of where such sharing
would be useful?

Jason

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: Language-independent functions-as-trees representation
  2002-07-21 11:44     ` Andreas Jaeger
@ 2002-07-21 23:20       ` Phil Edwards
  2002-07-22  2:24         ` Daniel Berlin
  2002-07-22  7:10       ` where is tree dump utility Sathiskanna
  1 sibling, 1 reply; 95+ messages in thread
From: Phil Edwards @ 2002-07-21 23:20 UTC (permalink / raw)
  To: Andreas Jaeger
  Cc: Jason Merrill, Richard Henderson, Per Bothner, Diego Novillo,
	Mark Mitchell, gcc

On Sun, Jul 21, 2002 at 04:22:28PM +0200, Andreas Jaeger wrote:
> Pop SÃ©bastian <pop@gauvain.u-strasbg.fr> writes:
> > This would require to store SIMPLE trees comming from different front-ends
> > then reconstruct a global tree. 
> > (a little as the SGI's compiler works for interprocedural analysis, storing 
> > WHIRL trees into .o files, then building the whole tree at link/optimize time).
> 
> It would only make sense with whole program optimizations for
> interprocedural analysis.  Currently GCC does not do anything of this
> kind but we should think forward and whole program optimizations is
> something that some of us might want to see.

I was under the impression that this kind of whole-program IPA could only
be done in the linker.  Or at least, in whatever tool is performing the
role of the linker.

Phil

-- 
If ye love wealth greater than liberty, the tranquility of servitude greater
than the animating contest for freedom, go home and leave us in peace.  We seek
not your counsel, nor your arms.  Crouch down and lick the hand that feeds you;
and may posterity forget that ye were our countrymen.            - Samuel Adams

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: Language-independent functions-as-trees representation
  2002-07-21 23:06       ` Jason Merrill
@ 2002-07-21 23:53         ` Mark Mitchell
  2002-07-22  0:13           ` Jason Merrill
  0 siblings, 1 reply; 95+ messages in thread
From: Mark Mitchell @ 2002-07-21 23:53 UTC (permalink / raw)
  To: Jason Merrill; +Cc: Richard Henderson, Per Bothner, Diego Novillo, gcc

>> I was implicitly (sorry!) in SSA form.
>
> I remain unenlightened.  Can you give an example of where such sharing
> would be useful?

You mean, I assume, other than the obvious memory savings.  (In practice,
those savings can be rather substantial, and since we are trying to
avoid making compile-time any worse than it already is, something that
substantially reduces memory use may be worthwhile for that reason if
for no other.)

Consider that you want to do value-numbering to find places where you
compute the same value multiple times.  (Perhaps so as to move
computations to some point that dominates all of the uses, for example.)

Without sharing, you must recursively examine each new expression to see
to which existing expressions is identical.  (You can use hashing to speed
this up, but you must ultimately do a recursive walk.)

With sharing, you just do a pointer comparison.

However, with sharing, you must do work as you build the structure so that
you make sure that you really are sharing.  But that is not so hard; you
just keep tables of the things built from nodes you already have and check
to see if the node you are building uses the same operation to combine
the arguments as the one you already have.

-- 
Mark Mitchell                mark@codesourcery.com
CodeSourcery, LLC            http://www.codesourcery.com

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: Language-independent functions-as-trees representation
  2002-07-21 23:53         ` Mark Mitchell
@ 2002-07-22  0:13           ` Jason Merrill
  2002-07-22  3:17             ` Daniel Berlin
  0 siblings, 1 reply; 95+ messages in thread
From: Jason Merrill @ 2002-07-22  0:13 UTC (permalink / raw)
  To: Mark Mitchell
  Cc: Richard Henderson, Per Bothner, Diego Novillo, gcc, Jason Merrill

>>>>> "Mark" == Mark Mitchell <mark@codesourcery.com> writes:

>>> I was implicitly (sorry!) in SSA form.
>> 
>> I remain unenlightened.  Can you give an example of where such sharing
>> would be useful?

> You mean, I assume, other than the obvious memory savings.  (In practice,
> those savings can be rather substantial, and since we are trying to
> avoid making compile-time any worse than it already is, something that
> substantially reduces memory use may be worthwhile for that reason if
> for no other.)

Yep.

> Consider that you want to do value-numbering to find places where you
> compute the same value multiple times.  [snip]

Sure, any time you need to compare things it's faster if you can use
pointer comparison.  I was actually hoping for a small testcase, so we
could talk about the tradeoffs.

> However, with sharing, you must do work as you build the structure so that
> you make sure that you really are sharing.  But that is not so hard; you
> just keep tables of the things built from nodes you already have and check
> to see if the node you are building uses the same operation to combine
> the arguments as the one you already have.

The tricky part, as I mentioned before, is maintaining proper semantics
when simplifying shared trees, or indeed when making any modifications.
Sharing makes copying and comparison fast, but makes modification slow.
Simplification generally involves the recursive replacement of expressions
with temporary variables.  If the expressions are shared, this will change
semantics unless you somehow arrange to unshare them on modification, which
could imply an arbitrary amount of backtracking.  If I have a+b+c+d+e+f+g
and I want to replace a+b with a temp, I have to make a new copy of the
whole expression everywhere it appears, or suddenly all instances will use
the same temp, which is wrong if 'a' changes in the interim.

Or are you suggesting that we start sharing trees after simplification?
Given that we only do GC between functions, I don't think that would
provide any memory savings.

And I still don't see how this relates to the statement/expression
distinction.

Jason

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: Language-independent functions-as-trees representation
  2002-07-21 23:20       ` Phil Edwards
@ 2002-07-22  2:24         ` Daniel Berlin
  2002-07-22  6:23           ` Phil Edwards
  0 siblings, 1 reply; 95+ messages in thread
From: Daniel Berlin @ 2002-07-22  2:24 UTC (permalink / raw)
  To: Phil Edwards
  Cc: Andreas Jaeger, Jason Merrill, Richard Henderson, Per Bothner,
	Diego Novillo, Mark Mitchell, gcc

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: TEXT/PLAIN; charset=X-UNKNOWN, Size: 1464 bytes --]

On Sun, 21 Jul 2002, Phil Edwards wrote:

> On Sun, Jul 21, 2002 at 04:22:28PM +0200, Andreas Jaeger wrote:
> > Pop Sébastian <pop@gauvain.u-strasbg.fr> writes:
> > > This would require to store SIMPLE trees comming from different front-ends
> > > then reconstruct a global tree. 
> > > (a little as the SGI's compiler works for interprocedural analysis, storing 
> > > WHIRL trees into .o files, then building the whole tree at link/optimize time).
> > 
> > It would only make sense with whole program optimizations for
> > interprocedural analysis.  Currently GCC does not do anything of this
> > kind but we should think forward and whole program optimizations is
> > something that some of us might want to see.
> 
> I was under the impression that this kind of whole-program IPA could only
> be done in the linker.  Or at least, in whatever tool is performing the
> role of the linker.

Only?
Certainly not, there are tons of ways to do it.
And it's not usually done "in" the linker.
The linker just calls the front/middle/back end (varies by compiler), 
with the names of all the objects, or, loads a shared lib and hands it 
the sections that contain the IPA data.

But as far as just inlining goes, you could do it without waiting till 
link time, it's just trickier (because you then have to make sure the 
trees you've stored are still up to date.  In the linker's case, you can 
assume the linker is only called once the objects are up-to-date.)
 > > 

^ permalink raw reply	[flat|nested] 95+ messages in thread

* where is the data
  2002-07-20 19:47 Language-independent functions-as-trees representation Jason Merrill
                   ` (5 preceding siblings ...)
  2002-07-21 17:03 ` Language-independent functions-as-trees representation Mark Mitchell
@ 2002-07-22  2:42 ` Sathiskanna
  2002-07-22 19:04 ` Language-independent functions-as-trees representation Diego Novillo
  2002-08-23  5:41 ` Jason Merrill
  8 siblings, 0 replies; 95+ messages in thread
From: Sathiskanna @ 2002-07-22  2:42 UTC (permalink / raw)
  To: gcc

Hi,

In GCC where the data is stored while converting to RTL from tree?

sathis

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: Language-independent functions-as-trees representation
  2002-07-22  0:13           ` Jason Merrill
@ 2002-07-22  3:17             ` Daniel Berlin
  0 siblings, 0 replies; 95+ messages in thread
From: Daniel Berlin @ 2002-07-22  3:17 UTC (permalink / raw)
  To: Jason Merrill
  Cc: Mark Mitchell, Richard Henderson, Per Bothner, Diego Novillo, gcc

On Mon, 22 Jul 2002, Jason Merrill wrote:

> >>>>> "Mark" == Mark Mitchell <mark@codesourcery.com> writes:
> 
> >>> I was implicitly (sorry!) in SSA form.
> >> 
> >> I remain unenlightened.  Can you give an example of where such sharing
> >> would be useful?
> 
> > You mean, I assume, other than the obvious memory savings.  (In practice,
> > those savings can be rather substantial, and since we are trying to
> > avoid making compile-time any worse than it already is, something that
> > substantially reduces memory use may be worthwhile for that reason if
> > for no other.)
> 
> Yep.
> 
> > Consider that you want to do value-numbering to find places where you
> > compute the same value multiple times.  [snip]
> 
> Sure, any time you need to compare things it's faster if you can use
> pointer comparison.  I was actually hoping for a small testcase, so we
> could talk about the tradeoffs.
> 
> > However, with sharing, you must do work as you build the structure so that
> > you make sure that you really are sharing.  But that is not so hard; you
> > just keep tables of the things built from nodes you already have and check
> > to see if the node you are building uses the same operation to combine
> > the arguments as the one you already have.
> 
> The tricky part, as I mentioned before, is maintaining proper semantics
> when simplifying shared trees, or indeed when making any modifications.
> Sharing makes copying and comparison fast, but makes modification slow.
> Simplification generally involves the recursive replacement of expressions
> with temporary variables.  If the expressions are shared, this will change
> semantics unless you somehow arrange to unshare them on modification, which
> could imply an arbitrary amount of backtracking.

We also have no easy way to determine if they are currently shared, so 
unless this is added, you have to make a pass over all the trees, marking 
them, and seeing if you come across one already marked in the traversal 
(if you do, it's shared).

>  If I have a+b+c+d+e+f+g
> and I want to replace a+b with a temp, I have to make a new copy of the
> whole expression everywhere it appears, or suddenly all instances will use
> the same temp, which is wrong if 'a' changes in the interim.

Which wouldn't be so hard if you knew where the sharing starts, but you 
actually don't, unless we start and update this info.

And if we have to track what's shared, it adds memory overhead, and time 
overhead.
And adds code and maintenance cost.

And if you want to do IPA at some time in the future, having to make a 
pass over every single tree in existence in every file, to determine 
which are shared (unless we are unsharing them before writing them out, 
which runs you into the problem of memory usage/compile time when you 
actually do the compiling of them), is painful.



> 
> Or are you suggesting that we start sharing trees after simplification?


> Given that we only do GC between functions, I don't think that would
> provide any memory savings.
> 
> And I still don't see how this relates to the statement/expression
> distinction.
> 
> Jason
> 
> 

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: Language-independent functions-as-trees representation
  2002-07-21 20:30   ` Jason Merrill
  2002-07-21 22:44     ` Mark Mitchell
@ 2002-07-22  5:11     ` Daniel Berlin
  2002-07-22  7:18       ` Jason Merrill
  1 sibling, 1 reply; 95+ messages in thread
From: Daniel Berlin @ 2002-07-22  5:11 UTC (permalink / raw)
  To: Jason Merrill
  Cc: Mark Mitchell, Richard Henderson, Per Bothner, Diego Novillo, gcc

On Sun, 21 Jul 2002, Jason Merrill wrote:

> >>>>> "Mark" == Mark Mitchell <mark@codesourcery.com> writes:
> 
> > My disagreement with Per is closer to the front end; I would really
> > like the internal representation produced from C++ semantic analysis
> > to look like C++ -- with various implicit things made explict (like
> > default arguments, choice of overloaded function, choice of template
> > specialization, action to take on cleanup of object, etc.)
> 
> I don't have a problem with this, except that it makes lowering more
> complicated.
> 
> > In SIMPLE, one place where I would like the statement/expression to come
> > in to play is the following:
> 
> >   If two expressions are "the same", i.e., both are "x + y", then they
> >   are actually the same pointer.
> 
> Why?  We don't currently share expressions in RTL; why would we want to do
> so in SIMPLE?
> 
> The current simplification code explicitly unshares all the trees to avoid
> SAVE_EXPR-like problems with shared expressions.

Not all of them. Or at least, if it should, it's buggy.

From a SSAPRE dump:
In BB 111, insert save of la - lb to pretmp.4753 in statement T.4530 = la 
- lb on line 2047
In BB 135, insert reload of la - lb from pretmp.4753 in statement T.4676 = 
la - lb on line 2091
In BB 136, insert reload of la - lb from pretmp.4753 in statement T.4530 = 
pretmp.4753 = la - lb on line 2047


The last one occurs because, once simplified, we have:
    for (T.4530 = la - lb; j <= T.4530; T.4530 = la - lb)
        {


Where the T.4530 = la - lb 's are shared.

 --Dan

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: Language-independent functions-as-trees representation
  2002-07-22  2:24         ` Daniel Berlin
@ 2002-07-22  6:23           ` Phil Edwards
  0 siblings, 0 replies; 95+ messages in thread
From: Phil Edwards @ 2002-07-22  6:23 UTC (permalink / raw)
  To: Daniel Berlin
  Cc: Andreas Jaeger, Jason Merrill, Richard Henderson, Per Bothner,
	Diego Novillo, Mark Mitchell, gcc

On Mon, Jul 22, 2002 at 01:35:30AM -0400, Daniel Berlin wrote:
> On Sun, 21 Jul 2002, Phil Edwards wrote:
> > > It would only make sense with whole program optimizations for
> > > interprocedural analysis.  Currently GCC does not do anything of this
> > > kind but we should think forward and whole program optimizations is
> > > something that some of us might want to see.
> > 
> > I was under the impression that this kind of whole-program IPA could only
> > be done in the linker.  Or at least, in whatever tool is performing the
> > role of the linker.
> 
> Only?
> Certainly not, there are tons of ways to do it.

Sorry, I was thinking of the present situation, where the compiler (usually)
never sees the whole program at once.

> And it's not usually done "in" the linker.
> The linker just calls the front/middle/back end (varies by compiler), 
> with the names of all the objects,

Ah, hadn't thought of that.  Usually when I think of IPA, I think of a
smarter linker, rather than a smarter back end with a linker that knows
how to call back to it.

> or, loads a shared lib and hands it 
> the sections that contain the IPA data.

Hmmmm...


Phil

-- 
If ye love wealth greater than liberty, the tranquility of servitude greater
than the animating contest for freedom, go home and leave us in peace.  We seek
not your counsel, nor your arms.  Crouch down and lick the hand that feeds you;
and may posterity forget that ye were our countrymen.            - Samuel Adams

^ permalink raw reply	[flat|nested] 95+ messages in thread

* where is tree dump utility
  2002-07-21 11:44     ` Andreas Jaeger
  2002-07-21 23:20       ` Phil Edwards
@ 2002-07-22  7:10       ` Sathiskanna
  2002-07-22 13:25         ` Diego Novillo
  1 sibling, 1 reply; 95+ messages in thread
From: Sathiskanna @ 2002-07-22  7:10 UTC (permalink / raw)
  To: gcc


Hi,
which version has tree dump utility??

-kanna

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: Language-independent functions-as-trees representation
  2002-07-22  5:11     ` Daniel Berlin
@ 2002-07-22  7:18       ` Jason Merrill
  2002-07-22 13:11         ` Daniel Berlin
  0 siblings, 1 reply; 95+ messages in thread
From: Jason Merrill @ 2002-07-22  7:18 UTC (permalink / raw)
  To: Daniel Berlin; +Cc: gcc, Jason Merrill, Diego Novillo

>>>>> "Daniel" == Daniel Berlin <dberlin@dberlin.org> writes:

> On Sun, 21 Jul 2002, Jason Merrill wrote:

>> The current simplification code explicitly unshares all the trees to avoid
>> SAVE_EXPR-like problems with shared expressions.

> Not all of them. Or at least, if it should, it's buggy.

> From a SSAPRE dump:
> In BB 111, insert save of la - lb to pretmp.4753 in statement T.4530 = la 
> - lb on line 2047
> In BB 135, insert reload of la - lb from pretmp.4753 in statement T.4676 = 
> la - lb on line 2091
> In BB 136, insert reload of la - lb from pretmp.4753 in statement T.4530 = 
> pretmp.4753 = la - lb on line 2047

> The last one occurs because, once simplified, we have:
>     for (T.4530 = la - lb; j <= T.4530; T.4530 = la - lb)
>         {

> Where the T.4530 = la - lb 's are shared.

They shouldn't be; we do a deep_copy_list so the two copies will be
unshared.

I'm feeling pretty dubious about this duplication, though.  Any duplication
of code should be done for reasons of optimization, not to try to shoehorn
the simplified code into C loop forms.  So instead of simplifying

for (; j <= la - lb;)
  { body }

that way, I think we should simplify it to

while (true)
  {
    T.4530 = la - lb;
    if (j <= T.4530)
      { body }
  }

and let loop optimization deduce an induction variable later, if it can.
I have similar issues with the insert_before_continue_end stuff.

Jason

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: Language-independent functions-as-trees representation
  2002-07-22  7:18       ` Jason Merrill
@ 2002-07-22 13:11         ` Daniel Berlin
  2002-07-23 18:20           ` Jason Merrill
  0 siblings, 1 reply; 95+ messages in thread
From: Daniel Berlin @ 2002-07-22 13:11 UTC (permalink / raw)
  To: Jason Merrill; +Cc: gcc, Diego Novillo

> 
> They shouldn't be; we do a deep_copy_list so the two copies will be
> unshared.

But we don't in this case. It's a simplify_for_stmt problem, I guess.
It happens in tons of for loops in this file, but trying to generate a 
simple case (at least, in the 10 minutes i spent trying a while ago) was 
not easy.

It does, however, demonstrate another issue with sharing.  Hard to 
track down problems that may only occur in weird corner cases where a 
normally unshared expression gets shared.

 > 
> I'm feeling pretty dubious about this duplication, though.  Any duplication
> of code should be done for reasons of optimization, not to try to shoehorn
> the simplified code into C loop forms.  So instead of simplifying
> 
> for (; j <= la - lb;)
>   { body }
> 
> that way, I think we should simplify it to
> 
> while (true)
>   {
>     T.4530 = la - lb;
>     if (j <= T.4530)
>       { body }
>   }
> 
> and let loop optimization deduce an induction variable later, if it can.
> I have similar issues with the insert_before_continue_end stuff.
As long as you don't throw goto's in the body to get flow control correct, 
i'm happy with any loop representation.
:)

> 
> Jason
> 
> 

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: where is tree dump utility
  2002-07-22  7:10       ` where is tree dump utility Sathiskanna
@ 2002-07-22 13:25         ` Diego Novillo
  0 siblings, 0 replies; 95+ messages in thread
From: Diego Novillo @ 2002-07-22 13:25 UTC (permalink / raw)
  To: Sathiskanna; +Cc: gcc

On Mon, 22 Jul 2002, Sathiskanna wrote:

> Hi,
> which version has tree dump utility??
> 
Any recent release will have a -fdump-translation-unit switch.


Diego.

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: Language-independent functions-as-trees representation
  2002-07-20 19:47 Language-independent functions-as-trees representation Jason Merrill
                   ` (6 preceding siblings ...)
  2002-07-22  2:42 ` where is the data Sathiskanna
@ 2002-07-22 19:04 ` Diego Novillo
  2002-07-23 10:21   ` Fergus Henderson
  2002-07-23 17:48   ` Jason Merrill
  2002-08-23  5:41 ` Jason Merrill
  8 siblings, 2 replies; 95+ messages in thread
From: Diego Novillo @ 2002-07-22 19:04 UTC (permalink / raw)
  To: Jason Merrill; +Cc: Richard Henderson, Per Bothner, Mark Mitchell, gcc

On Sat, 20 Jul 2002, Jason Merrill wrote:

> I'd like to return to the discussion of a language-independent
> functions-as-trees representation that was going on in January.  At this
> point I'd like to ignore the question of representation within a particular
> frontend (INITIAL?), and focus on what the frontend hands off to the
> middle-end (GENERIC?) and the simplified form which is used in
> optimization (SIMPLE).
> 
I'm curious about GENERIC.  Currently we are making each front
end do an INITIAL->SIMPLE pass via the simplify_function_tree
hook.  What advantage do you see in having INITIAL->GENERIC->SIMPLE?

> Per argues that having (for example) both IF_STMT and COND_EXPR is
> gratuitous duplication.  rth argues that at the SIMPLE level we want to
>
I don't have a strong opinion on this.  All we need is to be able
to do traversals of this kind:

	for every basic block B
	  for every stmt S in B
	    for every expr E in S

whether statements have different codes or are just of type
'void' is the same to me.  But we need to know whether a tree
node affects flow of control or not.  That's important
for building the flowgraph.

> We still need to address the issue of expressing ordering requirements in
> SIMPLE, as discussed in the third thread above.  For instance, currently
>
Nothing was decided.  I think that SEQUENCE_POINT or BARRIER
nodes should be sufficient.  Data dependencies will dictate
natural sequences in the code.  We can then insert barriers when
we need to express partial orderings.  Or maybe, have SEQUENCE
regions (BEGIN_SEQUENCE / END_SEQUENCE).

> In a previous thread, Diego has suggested that inlining would be done on
> language-dependent trees.  I think this is a mistake; IMO it should be done
> at the SIMPLE level.  Requiring the inliner to know about frontend trees is
> wrong.
> 
Ah, yes.  I have changed my mind since then.  One of the
suggestions for doing a Java inliner was to write the
simplification pass for Java and use the existing C inliner.


Diego.

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: Language-independent functions-as-trees representation
  2002-07-21 17:03 ` Language-independent functions-as-trees representation Mark Mitchell
  2002-07-21 20:30   ` Jason Merrill
@ 2002-07-22 21:09   ` Diego Novillo
  1 sibling, 0 replies; 95+ messages in thread
From: Diego Novillo @ 2002-07-22 21:09 UTC (permalink / raw)
  To: Mark Mitchell; +Cc: Jason Merrill, Richard Henderson, Per Bothner, gcc

On Sun, 21 Jul 2002, Mark Mitchell wrote:

> 
> 
> --On Saturday, July 20, 2002 01:47:18 PM +0100 Jason Merrill 
> <jason@redhat.com> wrote:
> 
> >I'd like to return to the discussion of a language-independent
> >functions-as-trees representation that was going on in January.
> 
> I don't have much of an opinion about this once we get to SIMPLE, at
> least not yet.
> 
> My disagreement with Per is closer to the front end; I would really
> like the internal representation produced from C++ semantic analysis
> to look like C++ -- with various implicit things made explict (like
> default arguments, choice of overloaded function, choice of template
> specialization, action to take on cleanup of object, etc.)
> 
> In SIMPLE, one place where I would like the statement/expression to come
> in to play is the following:
> 
>  If two expressions are "the same", i.e., both are "x + y", then they
>  are actually the same pointer.
> 
This sharing has given us no end of headaches.  The problem is
that the current definition of 'same' is purely syntactic.  Since
we are re-writing these expressions, the test of equality must
consider data flow:

	   (A)				   (B)
	a = x + y;			a = x + y;
	x = 3;				b = x + y;
	b = x + y;

In (A), sharing "x + y" causes PRE to generate the wrong code.
The same doesn't happen in (B).  Jason and Daniel had given other
examples elsewhere in this thread.

So, we can only set up the sharing after computing data flow
information.  That means that the parser shouldn't be sharing
expressions because it cannot possibly determine if two
expressions are the same.


Diego.

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: Language-independent functions-as-trees representation
  2002-07-22 19:04 ` Language-independent functions-as-trees representation Diego Novillo
@ 2002-07-23 10:21   ` Fergus Henderson
  2002-07-23 11:26     ` Diego Novillo
  2002-07-23 17:48   ` Jason Merrill
  1 sibling, 1 reply; 95+ messages in thread
From: Fergus Henderson @ 2002-07-23 10:21 UTC (permalink / raw)
  To: Diego Novillo; +Cc: gcc

On 22-Jul-2002, Diego Novillo <dnovillo@redhat.com> wrote:
> On Sat, 20 Jul 2002, Jason Merrill wrote:
> 
> > I'd like to return to the discussion of a language-independent
> > functions-as-trees representation that was going on in January.  At this
> > point I'd like to ignore the question of representation within a particular
> > frontend (INITIAL?), and focus on what the frontend hands off to the
> > middle-end (GENERIC?) and the simplified form which is used in
> > optimization (SIMPLE).
>
> I'm curious about GENERIC.  Currently we are making each front
> end do an INITIAL->SIMPLE pass via the simplify_function_tree
> hook.  What advantage do you see in having INITIAL->GENERIC->SIMPLE?

Less code in the language front-ends, and code reuse rather than code
duplication for the conversion to SIMPLE.

It's easier for each front end to generate GENERIC than it is to
generate SIMPLE, because GENERIC is more expressive and has less
constraints than SIMPLE.

-- 
Fergus Henderson <fjh@cs.mu.oz.au>  |  "I have always known that the pursuit
The University of Melbourne         |  of excellence is a lethal habit"
WWW: <http://www.cs.mu.oz.au/~fjh>  |     -- the last words of T. S. Garp.

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: Language-independent functions-as-trees representation
  2002-07-23 10:21   ` Fergus Henderson
@ 2002-07-23 11:26     ` Diego Novillo
  2002-07-23 11:34       ` Toon Moene
  0 siblings, 1 reply; 95+ messages in thread
From: Diego Novillo @ 2002-07-23 11:26 UTC (permalink / raw)
  To: Fergus Henderson; +Cc: gcc

On Tue, 23 Jul 2002, Fergus Henderson wrote:

> It's easier for each front end to generate GENERIC than it is to
> generate SIMPLE, because GENERIC is more expressive and has less
> constraints than SIMPLE.
> 
Thanks.  That makes sense.


Diego.

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: Language-independent functions-as-trees representation
  2002-07-23 11:26     ` Diego Novillo
@ 2002-07-23 11:34       ` Toon Moene
  2002-07-23 12:01         ` Diego Novillo
  0 siblings, 1 reply; 95+ messages in thread
From: Toon Moene @ 2002-07-23 11:34 UTC (permalink / raw)
  To: Diego Novillo; +Cc: Fergus Henderson, gcc

Diego Novillo wrote:

> On Tue, 23 Jul 2002, Fergus Henderson wrote:
> 
> > It's easier for each front end to generate GENERIC than it is to
> > generate SIMPLE, because GENERIC is more expressive and has less
> > constraints than SIMPLE.
> >
> Thanks.  That makes sense.

Where's GENERIC documented ?  Is this something the g95 developers should
know about (I'm seeing the name for the first time) ?

-- 
Toon Moene - mailto:toon@moene.indiv.nluug.nl - phoneto: +31 346 214290
Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
Maintainer, GNU Fortran 77: http://gcc.gnu.org/onlinedocs/g77_news.html
Join GNU Fortran 95: http://g95.sourceforge.net/ (under construction)

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: Language-independent functions-as-trees representation
  2002-07-23 11:34       ` Toon Moene
@ 2002-07-23 12:01         ` Diego Novillo
  2002-07-23 17:48           ` Pop Sébastian
  0 siblings, 1 reply; 95+ messages in thread
From: Diego Novillo @ 2002-07-23 12:01 UTC (permalink / raw)
  To: Toon Moene; +Cc: Fergus Henderson, gcc

On Tue, 23 Jul 2002, Toon Moene wrote:

> Diego Novillo wrote:
> 
> > On Tue, 23 Jul 2002, Fergus Henderson wrote:
> > 
> > > It's easier for each front end to generate GENERIC than it is to
> > > generate SIMPLE, because GENERIC is more expressive and has less
> > > constraints than SIMPLE.
> > >
> > Thanks.  That makes sense.
> 
> Where's GENERIC documented ?  Is this something the g95 developers should
> know about (I'm seeing the name for the first time) ?
> 
I don't think it's documented anywhere.  I saw them first
mentioned by Jason as a working name for language-independent
trees that do not have the strict structure imposed by SIMPLE.


Diego.

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: Language-independent functions-as-trees representation
  2002-07-23 12:01         ` Diego Novillo
@ 2002-07-23 17:48           ` Pop Sébastian
  2002-07-26 15:56             ` Jason Merrill
  0 siblings, 1 reply; 95+ messages in thread
From: Pop Sébastian @ 2002-07-23 17:48 UTC (permalink / raw)
  To: Diego Novillo; +Cc: Toon Moene, Fergus Henderson, gcc

On Tue, Jul 23, 2002 at 09:15:07AM -0400, Diego Novillo wrote:
> > 
> > Where's GENERIC documented ?  Is this something the g95 developers should
> > know about (I'm seeing the name for the first time) ?
> > 
> I don't think it's documented anywhere.  I saw them first
> mentioned by Jason as a working name for language-independent
> trees that do not have the strict structure imposed by SIMPLE.
> 

Then it seems that the issue about over simplification of call 
expressions due to the 3-address form of SIMPLE at inlining time is 
solved if we perform inlining not on SIMPLE, but on GENERIC trees.  

I understand GENERIC as beeing the original tree structure on which
we eliminate sugar syntaxes (most of which comes from c/c++ front-end
trees).  

As Richard Henderson suggested we'll need some more GENERIC nodes 
comming from other languages than C/C++ family.  F95 and Java are two 
good candidates.  We'll have to extend trees with nodes for expressing 
parallelism/thread concepts as pointed out by Richard.  

Maybe what we search in GENERIC is the definition of a normal form for 
imperative programs?

Sebastian.

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: Language-independent functions-as-trees representation
  2002-07-22 19:04 ` Language-independent functions-as-trees representation Diego Novillo
  2002-07-23 10:21   ` Fergus Henderson
@ 2002-07-23 17:48   ` Jason Merrill
  2002-07-23 18:29     ` Andrew Haley
                       ` (2 more replies)
  1 sibling, 3 replies; 95+ messages in thread
From: Jason Merrill @ 2002-07-23 17:48 UTC (permalink / raw)
  To: Diego Novillo
  Cc: Richard Henderson, Per Bothner, Mark Mitchell, gcc, Jason Merrill

>>>>> "Diego" == Diego Novillo <dnovillo@redhat.com> writes:

> On Sat, 20 Jul 2002, Jason Merrill wrote:
>> I'd like to return to the discussion of a language-independent
>> functions-as-trees representation that was going on in January.  At this
>> point I'd like to ignore the question of representation within a particular
>> frontend (INITIAL?), and focus on what the frontend hands off to the
>> middle-end (GENERIC?) and the simplified form which is used in
>> optimization (SIMPLE).
>> 
> I'm curious about GENERIC.  Currently we are making each front
> end do an INITIAL->SIMPLE pass via the simplify_function_tree
> hook.  What advantage do you see in having INITIAL->GENERIC->SIMPLE?

Modularity, mainly.  The idea is that by the time we get to the
simplification stage there won't be any language-specific trees to worry
about.

On the other hand, going from INITIAL to GENERIC will use a lot of the same
functionality as from GENERIC to SIMPLE, so perhaps it would be simpler
just to go straight to SIMPLE, using a langhook to handle language-specific
trees.

Any other opinions?

>> Per argues that having (for example) both IF_STMT and COND_EXPR is
>> gratuitous duplication.  rth argues that at the SIMPLE level we want to
>> 
> I don't have a strong opinion on this.  All we need is to be able
> to do traversals of this kind:

> 	for every basic block B
> 	  for every stmt S in B
> 	    for every expr E in S

> whether statements have different codes or are just of type
> 'void' is the same to me.  But we need to know whether a tree
> node affects flow of control or not.  That's important
> for building the flowgraph.

Well, anything that can trap affects flow of control; in C++, most function
calls have edges to the local exception handler.  In Java, some arithmetic
does as well.

On the other side, a COMPOUND_STMT doesn't affect flow of control; it's
just a way of grouping things at the same level.

I suppose to me, the distinction is not about any intrinsic property of the
statement, it's whether or not its value is used in further calculations.
In expand_expr terms, whether or not "ignore" is true.

>> We still need to address the issue of expressing ordering requirements in
>> SIMPLE, as discussed in the third thread above.  For instance, currently

> Nothing was decided.  I think that SEQUENCE_POINT or BARRIER
> nodes should be sufficient.  Data dependencies will dictate
> natural sequences in the code.  We can then insert barriers when
> we need to express partial orderings.  Or maybe, have SEQUENCE
> regions (BEGIN_SEQUENCE / END_SEQUENCE).

I think we also need something to express the absence of ordering
restrictions.  If we write f(g(), h()), there will be a sequence point
after each of the calls to g and h (at least in C++).  Or if they're
inlined, within the calls.  We need to make it explicit that despite the
presence of sequence points, the optimizer can switch the two around.
That's why I was suggesting something like UNORDERED_LIST.

Jason

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: Language-independent functions-as-trees representation
  2002-07-22 13:11         ` Daniel Berlin
@ 2002-07-23 18:20           ` Jason Merrill
  2002-07-27 14:10             ` Pop Sébastian
  0 siblings, 1 reply; 95+ messages in thread
From: Jason Merrill @ 2002-07-23 18:20 UTC (permalink / raw)
  To: Daniel Berlin; +Cc: gcc, Diego Novillo, Jason Merrill

>>>>> "Daniel" == Daniel Berlin <dberlin@dberlin.org> writes:

>> They shouldn't be; we do a deep_copy_list so the two copies will be
>> unshared.

> But we don't in this case. It's a simplify_for_stmt problem, I guess.

Bizarre.  The deep-copy is unconditional.

> It does, however, demonstrate another issue with sharing.  Hard to 
> track down problems that may only occur in weird corner cases where a 
> normally unshared expression gets shared.

Yep.

>> I'm feeling pretty dubious about this duplication, though.
>> [...]
>> I have similar issues with the insert_before_continue_end stuff.

> As long as you don't throw goto's in the body to get flow control correct, 
> i'm happy with any loop representation.

Well, I basically see two options.

1) Leave for and do-while loops in their natural structure, and wait for
   the loop optimizer to decompose them completely.  Or,

2) Lower all C loops at simplification time into infinite loops, so that

  for (init; cond; incr)
    {
      body1
        continue;
      body2
    }

would turn into something like

  init
  while (true)
    {
      if (! cond)
        break;
      {
        body1
          goto cont;
        body2
      }
     cont:
      incr
    }

...where the goto could be expressed by something like an EXIT_BLOCK_EXPR;
would that be easier to deal with in optimizers?

If we stay with option 1, the way to avoid the duplication problem is to
let the condition and increment fields contain an arbitrary block of code,
not just an expression.

I suppose my question is which approach is really simpler.  In one of the
earlier threads rth referred to FOR_INIT_STMT as a useless abstraction,
suggesting that the only useful forms were the infinite loop and the
FORTRAN-like do-loop, with a strong induction variable.  On the other hand,
we currently denote the condition and continue blocks in the RTL, so it
might make sense to do so in the trees as well.

Of course, the jump-elimination pass needs something like a while loop; an
infinite loop won't do, since to get out you need break, which is a jump...

Stepping back, the chunks of a C loop are:

 Code run before the first iteration (for-init-stmt), not really part of
   the loop
Top of loop
 Code run before each iteration (anything in a for or while condition
   before the actual test)
 A conditional exit (in a for or while)
 The body
Continue point
 Code run after each iteration (the for increment expression, anything in a
   do-while condition before the actual test)
 A conditional exit (in a do-while)
 Jump to top

So a generic LOOP_STMT/LOOP_EXPR would have LOOP_EXPR_PROLOGUE (arbitrary
code), LOOP_EXPR_PRECONDITION (condexpr), LOOP_EXPR_BODY (arbitrary code),
LOOP_EXPR_EPILOGUE (arbitrary code), LOOP_EXPR_POSTCONDITION (condexpr).
This seems like a useful form; what do others think?

Jason

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: Language-independent functions-as-trees representation
  2002-07-23 17:48   ` Jason Merrill
@ 2002-07-23 18:29     ` Andrew Haley
  2002-07-23 19:07     ` Richard Henderson
  2002-07-23 21:57     ` Fergus Henderson
  2 siblings, 0 replies; 95+ messages in thread
From: Andrew Haley @ 2002-07-23 18:29 UTC (permalink / raw)
  To: Jason Merrill
  Cc: Diego Novillo, Richard Henderson, Per Bothner, Mark Mitchell, gcc

Jason Merrill writes:
 > >>>>> "Diego" == Diego Novillo <dnovillo@redhat.com> writes:
 > 
 > > On Sat, 20 Jul 2002, Jason Merrill wrote:
 > >> I'd like to return to the discussion of a language-independent
 > >> functions-as-trees representation that was going on in January.  At this
 > >> point I'd like to ignore the question of representation within a particular
 > >> frontend (INITIAL?), and focus on what the frontend hands off to the
 > >> middle-end (GENERIC?) and the simplified form which is used in
 > >> optimization (SIMPLE).
 > >> 
 > > I'm curious about GENERIC.  Currently we are making each front
 > > end do an INITIAL->SIMPLE pass via the simplify_function_tree
 > > hook.  What advantage do you see in having INITIAL->GENERIC->SIMPLE?
 > 
 > Modularity, mainly.  The idea is that by the time we get to the
 > simplification stage there won't be any language-specific trees to worry
 > about.
 > 
 > On the other hand, going from INITIAL to GENERIC will use a lot of the same
 > functionality as from GENERIC to SIMPLE, so perhaps it would be simpler
 > just to go straight to SIMPLE, using a langhook to handle language-specific
 > trees.
 > 
 > Any other opinions?

WRT Java, It's not clear to me whether going from INITIAL to GENERIC
will be much easier than going from INITIAL to SIMPLE.  It certainly
could be, but that depends on the design of GENERIC.  If it is easier
then I'm in favour; the less language-specific code the better.

Andrew.

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: Language-independent functions-as-trees representation
  2002-07-23 17:48   ` Jason Merrill
  2002-07-23 18:29     ` Andrew Haley
@ 2002-07-23 19:07     ` Richard Henderson
  2002-07-23 23:40       ` Jason Merrill
  2002-07-23 21:57     ` Fergus Henderson
  2 siblings, 1 reply; 95+ messages in thread
From: Richard Henderson @ 2002-07-23 19:07 UTC (permalink / raw)
  To: Jason Merrill; +Cc: Diego Novillo, Per Bothner, Mark Mitchell, gcc

On Tue, Jul 23, 2002 at 06:01:21PM +0100, Jason Merrill wrote:
> On the other hand, going from INITIAL to GENERIC will use a lot of the same
> functionality as from GENERIC to SIMPLE, so perhaps it would be simpler
> just to go straight to SIMPLE, using a langhook to handle language-specific
> trees.

I think I like this idea.  I can't imagine except that it'd be
faster, and use less memory in the process.

> > But we need to know whether a tree
> > node affects flow of control or not.  That's important
> > for building the flowgraph.
> 
> Well, anything that can trap affects flow of control; in C++, most function
> calls have edges to the local exception handler.  In Java, some arithmetic
> does as well.

Yes, but I'd say that traping arithmetic cannot be a nested expression.
It would only be able to appear as the statement "t = a op b".  Perhaps
similarly with trapping memory references.

> I think we also need something to express the absence of ordering
> restrictions.  If we write f(g(), h()), there will be a sequence point
> after each of the calls to g and h (at least in C++).  Or if they're
> inlined, within the calls.  We need to make it explicit that despite the
> presence of sequence points, the optimizer can switch the two around.
> That's why I was suggesting something like UNORDERED_LIST.

Alternately, the number of places in which this applies seems
relatively limited.  We might be able to do with a langhook 
that says what order in which arguments may be evaluated.

What other benefits do we get from the complexity of expressing
the partial ordering of expressions?


r~

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: Language-independent functions-as-trees representation
  2002-07-23 17:48   ` Jason Merrill
  2002-07-23 18:29     ` Andrew Haley
  2002-07-23 19:07     ` Richard Henderson
@ 2002-07-23 21:57     ` Fergus Henderson
  2002-07-24  0:22       ` Jason Merrill
  2 siblings, 1 reply; 95+ messages in thread
From: Fergus Henderson @ 2002-07-23 21:57 UTC (permalink / raw)
  To: Jason Merrill
  Cc: Diego Novillo, Richard Henderson, Per Bothner, Mark Mitchell, gcc

On 23-Jul-2002, Jason Merrill <jason@redhat.com> wrote:
> >>>>> "Diego" == Diego Novillo <dnovillo@redhat.com> writes:
> 
> > On Sat, 20 Jul 2002, Jason Merrill wrote:
> >> I'd like to return to the discussion of a language-independent
> >> functions-as-trees representation that was going on in January.  At this
> >> point I'd like to ignore the question of representation within a particular
> >> frontend (INITIAL?), and focus on what the frontend hands off to the
> >> middle-end (GENERIC?) and the simplified form which is used in
> >> optimization (SIMPLE).
> >> 
> > I'm curious about GENERIC.  Currently we are making each front
> > end do an INITIAL->SIMPLE pass via the simplify_function_tree
> > hook.  What advantage do you see in having INITIAL->GENERIC->SIMPLE?
> 
> Modularity, mainly.  The idea is that by the time we get to the
> simplification stage there won't be any language-specific trees to worry
> about.

Right.

> On the other hand, going from INITIAL to GENERIC will use a lot of the same
> functionality as from GENERIC to SIMPLE, so perhaps it would be simpler
> just to go straight to SIMPLE, using a langhook to handle language-specific
> trees.

For the Mercury front-end, ideally there will be no INITIAL trees --
at least not as GCC tree nodes -- since the Mercury front-end uses its
own intermediate representation.  Ideally we would convert straight from the
Mercury compiler's intermediate representation (MLDS) to GENERIC.

Currently the Mercury compiler builds GCC trees for atomic statements, but
converts compound statements directly from MLDS to RTL.  So we miss out on
tree-based inlining, and other tree-level whole-function optimizations.
The main reason for going straight from MLDS to RTL is that there are
no language-independent tree nodes for some constructs that we need,
e.g. switch statements.  I would like that to change.

So part of the point of GENERIC is to have a language-independent
representation which is expressive enough to be an easy target language,
and to express all the constructs that the back-end needs to know about.

Going straight from INITIAL (language-dependent) trees to SIMPLE trees
wouldn't work very well for us.  We'd have to do extra work.
One way would be to construct our own INITIAL representation for Mercury,
which would require extra work to e.g. define a GCC tree node for
(Mercury) switch statements, and then to define the hook to handle this
language-specific tree node in the INITIAL->SIMPLE conversion.
The other way would be to go straight from MLDS to SIMPLE, but that too
would be more work for us, since SIMPLE is a less expressive language --
we'd have to duplicate much of the work involved in the conversion to SIMPLE.

I suspect the same would apply to GNAT, since they also have their own
intermediate representation not based on GCC trees.

-- 
Fergus Henderson <fjh@cs.mu.oz.au>  |  "I have always known that the pursuit
The University of Melbourne         |  of excellence is a lethal habit"
WWW: <http://www.cs.mu.oz.au/~fjh>  |     -- the last words of T. S. Garp.

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: Language-independent functions-as-trees representation
  2002-07-23 19:07     ` Richard Henderson
@ 2002-07-23 23:40       ` Jason Merrill
  2002-07-24  3:01         ` Richard Henderson
  0 siblings, 1 reply; 95+ messages in thread
From: Jason Merrill @ 2002-07-23 23:40 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Diego Novillo, Per Bothner, Mark Mitchell, gcc

>>>>> "Richard" == Richard Henderson <rth@redhat.com> writes:

> On Tue, Jul 23, 2002 at 06:01:21PM +0100, Jason Merrill wrote:
>> On the other hand, going from INITIAL to GENERIC will use a lot of the same
>> functionality as from GENERIC to SIMPLE, so perhaps it would be simpler
>> just to go straight to SIMPLE, using a langhook to handle language-specific
>> trees.

> I think I like this idea.  I can't imagine except that it'd be
> faster, and use less memory in the process.

I would think.

>> > But we need to know whether a tree
>> > node affects flow of control or not.  That's important
>> > for building the flowgraph.
>> 
>> Well, anything that can trap affects flow of control; in C++, most
>> function calls have edges to the local exception handler.  In Java, some
>> arithmetic does as well.

> Yes, but I'd say that traping arithmetic cannot be a nested expression.
> It would only be able to appear as the statement "t = a op b".  Perhaps
> similarly with trapping memory references.

In SIMPLE that's as complex as any expression gets.

>> I think we also need something to express the absence of ordering
>> restrictions.  If we write f(g(), h()), there will be a sequence point
>> after each of the calls to g and h (at least in C++).  Or if they're
>> inlined, within the calls.  We need to make it explicit that despite the
>> presence of sequence points, the optimizer can switch the two around.
>> That's why I was suggesting something like UNORDERED_LIST.

> Alternately, the number of places in which this applies seems
> relatively limited.  We might be able to do with a langhook 
> that says what order in which arguments may be evaluated.

That could be useful at simplification time, but not after; once we've
simplified, all the arguments are temporary variables or constants, and
their evaluations are serialized.  Likewise for arithmetic operands.

It may well be that the optimization opportunities from switching operand
evaluation around at a later phase of optimization are small compared to
the complexity of representing it.  But the complexity is also small; an
initial implementation could just treat UNORDERED_LIST like
COMPOUND_STMT/EXPR.

> What other benefits do we get from the complexity of expressing
> the partial ordering of expressions?

I would imagine that there could be some benefit to reordering evaluation
to reduce exception regions for temporary cleanups, or to bring instances
of a common subexpression closer together in the codestream, or whatever.

Of course, that we don't currently make any attempt to take advantage of
this opportunity could suggest that indeed it isn't very useful...

Jason

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: Language-independent functions-as-trees representation
  2002-07-23 21:57     ` Fergus Henderson
@ 2002-07-24  0:22       ` Jason Merrill
  0 siblings, 0 replies; 95+ messages in thread
From: Jason Merrill @ 2002-07-24  0:22 UTC (permalink / raw)
  To: Fergus Henderson
  Cc: Diego Novillo, Richard Henderson, Per Bothner, Mark Mitchell, gcc

>>>>> "Fergus" == Fergus Henderson <fjh@cs.mu.oz.au> writes:

> On 23-Jul-2002, Jason Merrill <jason@redhat.com> wrote:

>> >>>>> "Diego" == Diego Novillo <dnovillo@redhat.com> writes:

>> > I'm curious about GENERIC.  Currently we are making each front
>> > end do an INITIAL->SIMPLE pass via the simplify_function_tree
>> > hook.  What advantage do you see in having INITIAL->GENERIC->SIMPLE?
>> 
>> Modularity, mainly.  The idea is that by the time we get to the
>> simplification stage there won't be any language-specific trees to worry
>> about.

> Right.

>> On the other hand, going from INITIAL to GENERIC will use a lot of the same
>> functionality as from GENERIC to SIMPLE, so perhaps it would be simpler
>> just to go straight to SIMPLE, using a langhook to handle language-specific
>> trees.

> For the Mercury front-end, ideally there will be no INITIAL trees --
> at least not as GCC tree nodes -- since the Mercury front-end uses its
> own intermediate representation.  Ideally we would convert straight from the
> Mercury compiler's intermediate representation (MLDS) to GENERIC.

I don't think this has to be an either-or situation.  SIMPLE will use the
same tree codes as GENERIC, it just has more constraints on what are valid
patterns.  Even if the C family frontends go straight from INITIAL to
SIMPLE, it should work just fine for you to convert from MLDS to GENERIC
and then use the language-independent parts of the simplification code to
lower that to SIMPLE.

> Currently the Mercury compiler builds GCC trees for atomic statements, but
> converts compound statements directly from MLDS to RTL.  So we miss out on
> tree-based inlining, and other tree-level whole-function optimizations.
> The main reason for going straight from MLDS to RTL is that there are
> no language-independent tree nodes for some constructs that we need,
> e.g. switch statements.  I would like that to change.

Me, too.  There is a SWITCH_EXPR in the backend now, but it's really just a
hand-wave placeholder for the Java version.

> So part of the point of GENERIC is to have a language-independent
> representation which is expressive enough to be an easy target language,
> and to express all the constructs that the back-end needs to know about.

Indeed.

Jason

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: Language-independent functions-as-trees representation
  2002-07-23 23:40       ` Jason Merrill
@ 2002-07-24  3:01         ` Richard Henderson
  2002-07-24 11:34           ` Jason Merrill
  0 siblings, 1 reply; 95+ messages in thread
From: Richard Henderson @ 2002-07-24  3:01 UTC (permalink / raw)
  To: Jason Merrill; +Cc: Diego Novillo, Per Bothner, Mark Mitchell, gcc

On Wed, Jul 24, 2002 at 01:14:06AM +0100, Jason Merrill wrote:
> >> Well, anything that can trap affects flow of control; in C++, most
> >> function calls have edges to the local exception handler.  In Java, some
> >> arithmetic does as well.
> 
> > Yes, but I'd say that traping arithmetic cannot be a nested expression.
> > It would only be able to appear as the statement "t = a op b".  Perhaps
> > similarly with trapping memory references.
> 
> In SIMPLE that's as complex as any expression gets.

I guess I don't see your point then.

> It may well be that the optimization opportunities from switching operand
> evaluation around at a later phase of optimization are small compared to
> the complexity of representing it.  But the complexity is also small; an
> initial implementation could just treat UNORDERED_LIST like
> COMPOUND_STMT/EXPR.

That's a highly simplified representation of unsequenced expressions.
Is it enough to represent

	(a++, a + 2) + (b++, b + 3)

which has two ordered pairs, but no ordering across the +?

> Of course, that we don't currently make any attempt to take advantage of
> this opportunity could suggest that indeed it isn't very useful...

It's also a hard problem, so I don't think that's necessarily true.

At the same time, I do think there's probably little to be gained.
Certainly not enough to justify the complexity that I sense is
associated with the problem.  Indeed, I think that most of the
possible benefit would come in the form of relaxed scheduling
constraints, which would mean tracking this partial ordering all
the way through the optimizers.  Which is pretty much a non-starter.

r~

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: Language-independent functions-as-trees representation
  2002-07-24  3:01         ` Richard Henderson
@ 2002-07-24 11:34           ` Jason Merrill
  0 siblings, 0 replies; 95+ messages in thread
From: Jason Merrill @ 2002-07-24 11:34 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Diego Novillo, Per Bothner, Mark Mitchell, gcc, Jason Merrill

>>>>> "Richard" == Richard Henderson <rth@redhat.com> writes:

> On Wed, Jul 24, 2002 at 01:14:06AM +0100, Jason Merrill wrote:
>> >> Well, anything that can trap affects flow of control; in C++, most
>> >> function calls have edges to the local exception handler.  In Java, some
>> >> arithmetic does as well.
>> 
>> > Yes, but I'd say that traping arithmetic cannot be a nested expression.
>> > It would only be able to appear as the statement "t = a op b".  Perhaps
>> > similarly with trapping memory references.
>> 
>> In SIMPLE that's as complex as any expression gets.

> I guess I don't see your point then.

My point was just that affecting the flow of control is not a difference
between statements and expressions.

>> It may well be that the optimization opportunities from switching operand
>> evaluation around at a later phase of optimization are small compared to
>> the complexity of representing it.  But the complexity is also small; an
>> initial implementation could just treat UNORDERED_LIST like
>> COMPOUND_STMT/EXPR.

> That's a highly simplified representation of unsequenced expressions.
> Is it enough to represent

> 	(a++, a + 2) + (b++, b + 3)

> which has two ordered pairs, but no ordering across the +?

Sure:

unordered_list
  (
    (a = a + 1, t1 = a + 2),
    (b = b + 1, t2 = b + 3)
  )
t3 = t1 + t2

> At the same time, I do think there's probably little to be gained.
> Certainly not enough to justify the complexity that I sense is
> associated with the problem.  Indeed, I think that most of the
> possible benefit would come in the form of relaxed scheduling
> constraints, which would mean tracking this partial ordering all
> the way through the optimizers.  Which is pretty much a non-starter.

Fair enough.

Jason

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: Language-independent functions-as-trees representation
  2002-07-23 17:48           ` Pop Sébastian
@ 2002-07-26 15:56             ` Jason Merrill
  0 siblings, 0 replies; 95+ messages in thread
From: Jason Merrill @ 2002-07-26 15:56 UTC (permalink / raw)
  To: Pop Sébastian; +Cc: Diego Novillo, Toon Moene, Fergus Henderson, gcc

>>>>> "Pop" == Pop Sébastian <pop@gauvain.u-strasbg.fr> writes:

> On Tue, Jul 23, 2002 at 09:15:07AM -0400, Diego Novillo wrote:
>> > 
>> > Where's GENERIC documented ?  Is this something the g95 developers should
>> > know about (I'm seeing the name for the first time) ?
>> > 
>> I don't think it's documented anywhere.  I saw them first
>> mentioned by Jason as a working name for language-independent
>> trees that do not have the strict structure imposed by SIMPLE.

> Then it seems that the issue about over simplification of call 
> expressions due to the 3-address form of SIMPLE at inlining time is 
> solved if we perform inlining not on SIMPLE, but on GENERIC trees.  

Not really; the issue affects simplification regardless of whether we do
inlining before or after simplification.  (Though if we do it after, we'll
probably want to simplify the expansion).

> I understand GENERIC as beeing the original tree structure on which
> we eliminate sugar syntaxes (most of which comes from c/c++ front-end
> trees).

FYI, I think you mean "syntactic sugar".

> As Richard Henderson suggested we'll need some more GENERIC nodes 
> comming from other languages than C/C++ family.  F95 and Java are two 
> good candidates.  We'll have to extend trees with nodes for expressing 
> parallelism/thread concepts as pointed out by Richard.  

> Maybe what we search in GENERIC is the definition of a normal form for 
> imperative programs?

Yes.

Jason

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: Language-independent functions-as-trees representation
  2002-07-23 18:20           ` Jason Merrill
@ 2002-07-27 14:10             ` Pop Sébastian
  2002-07-30  9:02               ` Jason Merrill
  0 siblings, 1 reply; 95+ messages in thread
From: Pop Sébastian @ 2002-07-27 14:10 UTC (permalink / raw)
  To: Jason Merrill; +Cc: Daniel Berlin, gcc, Diego Novillo

On Tue, Jul 23, 2002 at 06:13:51PM +0100, Jason Merrill wrote:
> Well, I basically see two options.
> 
> 1) Leave for and do-while loops in their natural structure, and wait for
>    the loop optimizer to decompose them completely.  Or,
> 
> 2) Lower all C loops at simplification time into infinite loops, so that
> 
>   for (init; cond; incr)
>     {
>       body1
>         continue;
>       body2
>     }
> 
> would turn into something like
> 
>   init
>   while (true)
>     {
>       if (! cond)
>         break;
>       {
>         body1
>           goto cont;
>         body2
>       }
>      cont:
>       incr
>     }
> 
> ...where the goto could be expressed by something like an EXIT_BLOCK_EXPR;
> would that be easier to deal with in optimizers?
> 

Solution 2 will flatten all loop structures to a unique loop structure
without any distinction.  I think that in combination with an induction 
variable detector we can promote an infinite loop to a higher loop structure
(fortran's DO loop).  If these higher level loops can only be produced by 
the analyzer, then the loop nest optimizer could concentrate only on these
higher forms of loops and discard all others loops.

I think that the goal of the simplifier is to lower expressions and statements
to a normal form for reducing the number of allowed forms for the same concept
(reducing syntactic sugar :-) ).  This normal form reduces analyzer's 
complexity.  Analyzers promote expressions and statements to a higher level 
that contains more intrinsic information destined to optimizers.

> If we stay with option 1, the way to avoid the duplication problem is to
> let the condition and increment fields contain an arbitrary block of code,
> not just an expression.
> 
> I suppose my question is which approach is really simpler.  In one of the
> earlier threads rth referred to FOR_INIT_STMT as a useless abstraction,
> suggesting that the only useful forms were the infinite loop and the
> FORTRAN-like do-loop, with a strong induction variable.  On the other hand,
> we currently denote the condition and continue blocks in the RTL, so it
> might make sense to do so in the trees as well.
> 
> Of course, the jump-elimination pass needs something like a while loop; an
> infinite loop won't do, since to get out you need break, which is a jump...
> 

We could promote an infinite loop to a conditional loop after jump-elimination.

> Stepping back, the chunks of a C loop are:
> 
>  Code run before the first iteration (for-init-stmt), not really part of
>    the loop
> Top of loop
>  Code run before each iteration (anything in a for or while condition
>    before the actual test)
>  A conditional exit (in a for or while)
>  The body
> Continue point
>  Code run after each iteration (the for increment expression, anything in a
>    do-while condition before the actual test)
>  A conditional exit (in a do-while)
>  Jump to top
> 
> So a generic LOOP_STMT/LOOP_EXPR would have LOOP_EXPR_PROLOGUE (arbitrary
> code), LOOP_EXPR_PRECONDITION (condexpr), LOOP_EXPR_BODY (arbitrary code),
> LOOP_EXPR_EPILOGUE (arbitrary code), LOOP_EXPR_POSTCONDITION (condexpr).
> This seems like a useful form; what do others think?
> 

These EXPRs/STMTs could be good candidates for a GENERIC loop?

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: Language-independent functions-as-trees representation
  2002-07-27 14:10             ` Pop Sébastian
@ 2002-07-30  9:02               ` Jason Merrill
  0 siblings, 0 replies; 95+ messages in thread
From: Jason Merrill @ 2002-07-30  9:02 UTC (permalink / raw)
  To: Pop Sébastian; +Cc: Daniel Berlin, gcc, Diego Novillo

On Sat, 27 Jul 2002 08:56:47 +0200, Pop Sebastian <pop@gauvain.u-strasbg.fr> wrote:

> On Tue, Jul 23, 2002 at 06:13:51PM +0100, Jason Merrill wrote:

>>   init
>>   while (true)
>>     {
>>       if (! cond)
>>         break;
>>       {
>>         body1
>>           goto cont;
>>         body2
>>       }
>>      cont:
>>       incr
>>     }
>> 
>> ...where the goto could be expressed by something like an EXIT_BLOCK_EXPR;
>> would that be easier to deal with in optimizers?
>
> Solution 2 will flatten all loop structures to a unique loop structure
> without any distinction.  I think that in combination with an induction 
> variable detector we can promote an infinite loop to a higher loop structure
> (fortran's DO loop).  If these higher level loops can only be produced by 
> the analyzer, then the loop nest optimizer could concentrate only on these
> higher forms of loops and discard all others loops.

Yep, that's the idea.

> I think that the goal of the simplifier is to lower expressions and
> statements to a normal form for reducing the number of allowed forms for
> the same concept (reducing syntactic sugar :-) ).  This normal form
> reduces analyzer's complexity.  Analyzers promote expressions and
> statements to a higher level that contains more intrinsic information
> destined to optimizers.

Yes.

>> I suppose my question is which approach is really simpler.  In one of the
>> earlier threads rth referred to FOR_INIT_STMT as a useless abstraction,
>> suggesting that the only useful forms were the infinite loop and the
>> FORTRAN-like do-loop, with a strong induction variable.  On the other hand,
>> we currently denote the condition and continue blocks in the RTL, so it
>> might make sense to do so in the trees as well.
>> 
>> Of course, the jump-elimination pass needs something like a while loop; an
>> infinite loop won't do, since to get out you need break, which is a jump...

> We could promote an infinite loop to a conditional loop after
> jump-elimination.

Certainly.  But then, I wonder, why bother reducing it in the first place?

>> So a generic LOOP_STMT/LOOP_EXPR would have LOOP_EXPR_PROLOGUE (arbitrary
>> code), LOOP_EXPR_PRECONDITION (condexpr), LOOP_EXPR_BODY (arbitrary code),
>> LOOP_EXPR_EPILOGUE (arbitrary code), LOOP_EXPR_POSTCONDITION (condexpr).
>> This seems like a useful form; what do others think?

> These EXPRs/STMTs could be good candidates for a GENERIC loop?

That would make sense, but I still wonder which form is more appropriate
for SIMPLE.  The flat representation requires less knowledge of loop
structure from other passes, but requires more intelligence from the loop
optimizer; if we want to rotate the loop, which we basically always do, we
would need to re-derive the loop structure, at least the
prologue/precondition bits.

Hmm.  This structured form still isn't expressive enough to handle C++
loops with condition declarations, as the cleanup for the condition
variable needs to wrap the body and epilogue.  So we would need to flatten
such loops anyway.  So perhaps it makes sense to do that for all SIMPLE
loops.

Jason

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: Language-independent functions-as-trees representation
  2002-07-20 19:47 Language-independent functions-as-trees representation Jason Merrill
                   ` (7 preceding siblings ...)
  2002-07-22 19:04 ` Language-independent functions-as-trees representation Diego Novillo
@ 2002-08-23  5:41 ` Jason Merrill
  2002-08-23  6:00   ` Gabriel Dos Reis
                     ` (6 more replies)
  8 siblings, 7 replies; 95+ messages in thread
From: Jason Merrill @ 2002-08-23  5:41 UTC (permalink / raw)
  To: gcc; +Cc: Jason Merrill

I'd like to bring this subject up yet again, last discussed in

  http://gcc.gnu.org/ml/gcc/2002-07/threads.html#00890

I continue to believe that the current tree IR is fundamentally flawed; it
was designed as a convenient shorthand for representing C parse trees, and
is fairly unwieldy for use by optimizers.  I laid out various problems in
the message above.  One primary weakness is that many of the _STMT
codes just mean "call expand_foo now", and as such are a reasonable
representation if we are only going to pass them to the expander, as has
been the case in the past, but not if we want them to actually express
program semantics in a properly structured way.

I also continue to take issue with the distinction between _STMT and _EXPR
nodes.  Discussing this with other folks, I have come to the conclusion
that the real distinction between statements and expressions is that a
statement is something for which we are only interested in its
side-effects, not its value.  It's a difference of evaluation context, not
always an intrinsic difference.  Something like a cleanup can appear in
either context; it seems sensible to use TRY_FINALLY_EXPR to express it in
both, rather than convert it to a TRY_FINALLY_STMT when it moves to
statement context.

I've been working for a couple of weeks off and on to retarget the
simplifier to generate something much more like the Java frontend tree IR,
which seems much cleaner to me, largely by virtue of working mostly in the
existing backend tree codes.  I've made a lot of progress, though
significant things are still broken, but my morale is fading.  I see more
and more people building things on the current IR every day, increasing the
work that will be necessary to switch optimizers and such over to the
hypothetical new scheme.

It seems to me that these fundamental design questions need to be
straightened out now, before we build anything more on top of the current
IR, or we're just digging ourselves a deeper hole.  As long as we're
building something new, we should take the time to do it right.  Do others
disagree?  I suppose another strategy would be to clean up the expand
placeholders in the C/C++ IR one by one, rather than try to simplify it to
something rather different all at once, but that would not address the
STMT/EXPR issue.

Basically, I'm wondering if other people are interested enough in this
change to 1) help with the design for the IR and 2) help with porting the
existing tree optimizers to the new IR.

In any case, here's a draft of a design of a new SIMPLE representation.
It's a bit sketchy at this point, but I'm very interested in comments.

------

   function:
     FUNCTION_DECL
       DECL_SAVED_TREE -> block
   block:
     BIND_EXPR
       BIND_EXPR_VARS -> DECL chain
       BIND_EXPR_BLOCK -> BLOCK
       BIND_EXPR_BODY -> compound-stmt

A BIND_EXPR takes the place of the current COMPOUND_STMT, SCOPE_STMT and
DECL_STMT; all of the decls for a block are given RTL at the beginning of
the block.  DECLs with static initializers keep their DECL_INITIAL; other
initializations are implemented with INIT_EXPRs in the codestream.  The
Java "BLOCK_EXPR" is very similar.

   compound-stmt:
     COMPOUND_EXPR
       op0 -> non-compound-stmt
       op1 -> stmt

rth has raised some questions about the advisability of using COMPOUND_EXPR
to chain statements; the current scheme uses TREE_CHAIN of the statements
themselves.  To me, the benefit is modularity; apart from the earlier
complaints about the STMT/EXPR distinction, using COMPOUND_EXPR makes it
easy to replace a single complex expression with a sequence of simple ones,
simply by plugging in a COMPOUND_EXPR in its place.  The current scheme
requires a lot more pointer management in order to splice the new STMTs in
at both ends.

It seems to me that double-chaining could be provided by using the
TREE_CHAIN of the COMPOUND_EXPRs.

   stmt: compound-stmt | non-compound-stmt
   non-compound-stmt:
     block
     | loop-stmt
     | if-stmt
     | switch-stmt
     | labeled-block-stmt
     | jump-stmt
     | label-stmt
     | try-stmt
     | modify-stmt
     | call-stmt
   loop-stmt:
     LOOP_EXPR
       LOOP_EXPR_BODY -> stmt | NULL_TREE
     | DO_LOOP_EXPR
       (to be defined later)

The Java loop has 1 (or 0) EXIT_EXPR, used to express the loop condition.
This makes it easy to distinguish from 'break's, which are expressed
with EXIT_BLOCK_EXPR.  

EXIT_EXPR is a bit backwards for this purpose, as its sense is opposite to
that of the loop condition, so we end up calling invert_truthvalue twice in
the process of generating and expanding it.  But that's not a big deal.

From an optimization perspective, are LABELED_BLOCK_EXPR/EXIT_BLOCK_EXPR
easier to deal with than plain gotos?  I assume they're preferable to the
current loosely bound BREAK_STMT, which has no information about what it's
exiting.  EXIT_EXPR would have the same problem if it were used to express
'break'.

   if-stmt:
     COND_EXPR
       op0 -> condition
       op1 -> stmt
       op2 -> stmt
   switch-stmt:
     SWITCH_EXPR
       op0 -> val
       op1 -> stmt

The McCAT SIMPLE requires the simplifier to make case labels disjoint by
copying shared code around, allowing a more structured representation of a
switch.  I think this is too dubious an optimization to be performed by
default, but might be interesting as part of a goto-elimination pass; a
possible representation would be to also allow a TREE_LIST for op1.

   labeled-block-stmt:
     LABELED_BLOCK_EXPR
       op0 -> LABEL_DECL
       op1 -> stmt
   jump-stmt:
     EXIT_EXPR
         op0 -> condition
     | GOTO_EXPR
         op0 -> LABEL_DECL | '*' ID
     | RETURN_EXPR
         op0 -> modify-stmt | NULL_TREE

I had thought about always moving the assignment to the return value out of
the RETURN_EXPR, but it seems like expand_return depends on getting a
MODIFY_EXPR in order to handle some return semantics.

     | EXIT_BLOCK_EXPR
         op0 -> ref to LABELED_BLOCK_EXPR
         op1 -> NULL_TREE
     | THROW_EXPR? 

I'm not sure how we want to represent throws for the purpose of to
generating an ERT_THROW region?  I had thought about using a THROW_EXPR
wrapper, but that wouldn't work in non-simplified code where calls can have
complex args.  Perhaps annotation of the CALL_EXPR would work better.

     | RESX_EXPR
   label-stmt:
     LABEL_EXPR
         op0 -> LABEL_DECL
     | CASE_LABEL_EXPR
         CASE_LOW -> val | NULL_TREE
         CASE_HIGH -> val | NULL_TREE
   try-stmt:
     TRY_CATCH_EXPR

This will need to be extended to handle type-based catch clauses as well.

     | TRY_FINALLY_EXPR

I think it makes sense to leave this as a separate tree code for handling
cleanups.

   modify-stmt:
     MODIFY_EXPR | INIT_EXPR
       op0 -> lhs
       op1 -> rhs
   call-stmt: CALL_EXPR
     op0 -> ID
     op1 -> arglist

Assignment and calls are the only expressions with intrinsic side-effects,
so only they can appear at statement context.

The rest of this is basically copied from the McCAT design.  I think it
still needs some tweaking, but that can wait until after the
statement-level stuff is worked out.

   varname : compref | ID (rvalue)
   lhs: varname | '*' ID  (lvalue)
   pseudo-lval: ID | '*' ID  (either)
   compref :
     COMPONENT_REF
       op0 -> compref | pseudo-lval
     | ARRAY_REF
       op0 -> compref | pseudo-lval
       op1 -> val

   condition : val | val relop val
   val : ID | CONST

   rhs        : varname | CONST
	      | '*' ID
	      | '&' varname_or_temp
	      | call_expr
	      | unop val
	      | val binop val
	      | '(' cast ')' varname

   unop    : '+' | '-' | '!' | '~'
   binop   : relop | '-' | '+' | '/' | '*' | '%' | '&' | '|' | '<<' | '>>' | '^'
   relop   : '<' | '<=' | '>' | '>=' | '==' | '!='

-----
Other thoughts:

I've gotten questions about what GENERIC means, and whether it's worth
designing yet another IR.  I don't think any design is necessary; GENERIC
is just whatever can be expressed in the generic trees.  The l-i simplifier
should be able to reduce any valid tree structure to SIMPLE.

It's still not clear to me how best to represent line number information.
I'd appreciate someone from the Java team explaining to me how it's handled
there, particularly the use of EXPR_WFL_LINECOL on
non-EXPR_WITH_FILE_LOCATION nodes.

There has been uncertainty about what to do with frontend trees that we
want to wait and lower after inlining; the problem is how to treat them in
dataflow optimizations.  I think that we don't actually need to worry about
that; we can do more localized optimizations, such as inlining, and then
lower them to backend trees.

I've attached my current patch below; it's still in the toy stages, but
gives an idea of my implementation strategy.

Jason

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: Language-independent functions-as-trees representation
  2002-08-23  5:41 ` Jason Merrill
@ 2002-08-23  6:00   ` Gabriel Dos Reis
  2002-08-23  7:41     ` Jason Merrill
  2002-08-23  8:22   ` Pop SÃ©bastian
                     ` (5 subsequent siblings)
  6 siblings, 1 reply; 95+ messages in thread
From: Gabriel Dos Reis @ 2002-08-23  6:00 UTC (permalink / raw)
  To: Jason Merrill; +Cc: gcc

Jason Merrill <jason@redhat.com> writes:

| It seems to me that these fundamental design questions need to be
| straightened out now, before we build anything more on top of the current
| IR, or we're just digging ourselves a deeper hole.  As long as we're
| building something new, we should take the time to do it right.  Do others
| disagree? 

All you said make sense to me.  One problem with leaving with history
is that you have to justify why things are the way they are each time
you come across a shortcoming :-)

| I suppose another strategy would be to clean up the expand
| placeholders in the C/C++ IR one by one, rather than try to simplify it to
| something rather different all at once, but that would not address the
| STMT/EXPR issue.

True.  Is it acceptable (from the Release Manager point of view) to
make that change now?  If we can't then it would be a really confusing
situation with the increasing number of branches and the amazing work
already done on them...  If we can, then that may make life a lot
easier... 

-- Gaby

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: Language-independent functions-as-trees representation
  2002-08-23  6:00   ` Gabriel Dos Reis
@ 2002-08-23  7:41     ` Jason Merrill
  0 siblings, 0 replies; 95+ messages in thread
From: Jason Merrill @ 2002-08-23  7:41 UTC (permalink / raw)
  To: Gabriel Dos Reis; +Cc: gcc

On 23 Aug 2002 14:56:10 +0200, Gabriel Dos Reis <gdr@integrable-solutions.net> wrote:

> Jason Merrill <jason@redhat.com> writes:
>
> | I suppose another strategy would be to clean up the expand
> | placeholders in the C/C++ IR one by one, rather than try to simplify it to
> | something rather different all at once, but that would not address the
> | STMT/EXPR issue.
>
> True.  Is it acceptable (from the Release Manager point of view) to make
> that change now?  If we can't then it would be a really confusing
> situation with the increasing number of branches and the amazing work
> already done on them...  If we can, then that may make life a lot
> easier...

My current strategy only changes the simplifier, so it only affects the
tree-ssa branch.  I'm also in favor of changing the C/C++ frontend IR, but
the urgent thing is to establish the framework for the optimizers.  How we
get there isn't as important.

Jason

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: Language-independent functions-as-trees representation
  2002-08-23  5:41 ` Jason Merrill
  2002-08-23  6:00   ` Gabriel Dos Reis
@ 2002-08-23  8:22   ` Pop SÃ©bastian
  2002-08-23  8:46     ` Jason Merrill
  2002-08-23  8:23   ` Jason Merrill
                     ` (4 subsequent siblings)
  6 siblings, 1 reply; 95+ messages in thread
From: Pop SÃƒÂ©bastian @ 2002-08-23  8:22 UTC (permalink / raw)
  To: Jason Merrill; +Cc: gcc

On Fri, Aug 23, 2002 at 01:41:28PM +0100, Jason Merrill wrote:
> 
> It seems to me that these fundamental design questions need to be
> straightened out now, before we build anything more on top of the current
> IR, or we're just digging ourselves a deeper hole.  As long as we're
> building something new, we should take the time to do it right.  Do others
> disagree?  

Agree.  

But we have to try to implement some trivial optimisations in order
to see what tools we'll need further.  As a trivial example I see the 
double-chaining without which goto-elimination is much harder to implement.

Another example is the problem on which I stopped to develop an improvement
for the switch to binary search : I just needed a vector of CASE_LABELs
on which I need a simple fast sorting based on the value of the CASE_LABEL.

vector<tree> then a simple std::sort (), or even better: vector<case_label> :-)

I can imagine how hard it will be to implement all the loop nest 
optimizations _without_ the support of a high level programming language.  
But that's another story...

> 
> Basically, I'm wondering if other people are interested enough in this
> change to 1) help with the design for the IR and 2) help with porting the
> existing tree optimizers to the new IR.
> 
I'm ready for giving of my free time for both tasks :-)

> 
> It seems to me that double-chaining could be provided by using the
> TREE_CHAIN of the COMPOUND_EXPRs.
> 
Hmm, interesting, I never thought to this solution.

I have another solution:  
since we'll need this double chaining only in the optimizer, 
and if we decide to write this optimizer in C++,
then we could use a map<stmt, stmt> for keeping the information
about previous statements.  

This could be a solution for keeping SSA information as well ...

>    stmt: compound-stmt | non-compound-stmt
>    non-compound-stmt:
>      block
>      | loop-stmt
>      | if-stmt
>      | switch-stmt
>      | labeled-block-stmt
>      | jump-stmt
>      | label-stmt
>      | try-stmt
>      | modify-stmt
>      | call-stmt
>    loop-stmt:
>      LOOP_EXPR
>        LOOP_EXPR_BODY -> stmt | NULL_TREE
>      | DO_LOOP_EXPR
>        (to be defined later)
> 
> The Java loop has 1 (or 0) EXIT_EXPR, used to express the loop condition.
> This makes it easy to distinguish from 'break's, which are expressed
> with EXIT_BLOCK_EXPR.  
> 
> EXIT_EXPR is a bit backwards for this purpose, as its sense is opposite to
> that of the loop condition, so we end up calling invert_truthvalue twice in
> the process of generating and expanding it.  But that's not a big deal.
> 
> From an optimization perspective, are LABELED_BLOCK_EXPR/EXIT_BLOCK_EXPR
> easier to deal with than plain gotos?  I assume they're preferable to the
> current loosely bound BREAK_STMT, which has no information about what it's
> exiting.  EXIT_EXPR would have the same problem if it were used to express
> 'break'.
> 
>    if-stmt:
>      COND_EXPR
>        op0 -> condition
>        op1 -> stmt
>        op2 -> stmt
>    switch-stmt:
>      SWITCH_EXPR
>        op0 -> val
>        op1 -> stmt
> 
> The McCAT SIMPLE requires the simplifier to make case labels disjoint by
> copying shared code around, allowing a more structured representation of a
> switch.  I think this is too dubious an optimization to be performed by
> default, but might be interesting as part of a goto-elimination pass; 

Agree.  This would just copy too much code.  I think this optimization 
should take also some arbitrary bounds (ie. no more than 20 cases, and 
no more than k copied statements).  Otherwise we could end with a huge
tree...

> 
> I've attached my current patch below; it's still in the toy stages, but
> gives an idea of my implementation strategy.
> 
I'll take a look.  
Thanks.

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: Language-independent functions-as-trees representation
  2002-08-23  5:41 ` Jason Merrill
  2002-08-23  6:00   ` Gabriel Dos Reis
  2002-08-23  8:22   ` Pop SÃ©bastian
@ 2002-08-23  8:23   ` Jason Merrill
  2002-08-23 12:16   ` Diego Novillo
                     ` (3 subsequent siblings)
  6 siblings, 0 replies; 95+ messages in thread
From: Jason Merrill @ 2002-08-23  8:23 UTC (permalink / raw)
  To: gcc

On Fri, 23 Aug 2002 13:41:28 +0100, Jason Merrill <jason@redhat.com> wrote:

> EXIT_EXPR is a bit backwards for this purpose, as its sense is opposite to
> that of the loop condition, so we end up calling invert_truthvalue twice in
> the process of generating and expanding it.  But that's not a big deal.

But perhaps we should change it anyway.  Is anyone aware of a frontend
other than Java which uses EXIT_EXPR?  If nothing else, we could control
this with a flag.

Jason

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: Language-independent functions-as-trees representation
  2002-08-23  8:22   ` Pop SÃ©bastian
@ 2002-08-23  8:46     ` Jason Merrill
  0 siblings, 0 replies; 95+ messages in thread
From: Jason Merrill @ 2002-08-23  8:46 UTC (permalink / raw)
  To: Pop SÃƒÂ©bastian; +Cc: gcc

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain, Size: 865 bytes --]

On Fri, 23 Aug 2002 17:21:58 +0200, Pop SÃ©bastian <pop@gauvain.u-strasbg.fr> wrote:

> On Fri, Aug 23, 2002 at 01:41:28PM +0100, Jason Merrill wrote:

> But we have to try to implement some trivial optimisations in order
> to see what tools we'll need further.

Certainly.  But I'd like to get away from dealing with SCOPE_STMT as
quickly as possible.

> Another example is the problem on which I stopped to develop an
> improvement for the switch to binary search : I just needed a vector of
> CASE_LABELs on which I need a simple fast sorting based on the value of
> the CASE_LABEL.

We could certainly add such a field to the SWITCH_EXPR.  Say
SWITCH_EXPR_VALUES, which is either NULL_TREE or a TREE_VEC of
CASE_LABEL_EXPRs.

> vector<tree> then a simple std::sort (), or even better:
> vector<case_label> :-)

Using qsort() isn't *that* horrible.  :)

Jason

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: Language-independent functions-as-trees representation
  2002-08-23  5:41 ` Jason Merrill
                     ` (2 preceding siblings ...)
  2002-08-23  8:23   ` Jason Merrill
@ 2002-08-23 12:16   ` Diego Novillo
  2002-08-23 15:42     ` Daniel Berlin
                       ` (3 more replies)
  2002-08-23 13:01   ` Neil Booth
                     ` (2 subsequent siblings)
  6 siblings, 4 replies; 95+ messages in thread
From: Diego Novillo @ 2002-08-23 12:16 UTC (permalink / raw)
  To: Jason Merrill; +Cc: gcc

On Fri, 23 Aug 2002, Jason Merrill wrote:

> Basically, I'm wondering if other people are interested enough in this
> change to 1) help with the design for the IR and 2) help with porting the
> existing tree optimizers to the new IR.
> 
Yes to both.  I'm actually more interested in #2 but can help
with #1 by finding rough spots when building a flowgraph, SSA and
the other transformations.

> rth has raised some questions about the advisability of using COMPOUND_EXPR
> to chain statements; the current scheme uses TREE_CHAIN of the statements
> themselves.  To me, the benefit is modularity; apart from the earlier
> complaints about the STMT/EXPR distinction, using COMPOUND_EXPR makes it
> easy to replace a single complex expression with a sequence of simple ones,
> simply by plugging in a COMPOUND_EXPR in its place.  The current scheme
> requires a lot more pointer management in order to splice the new STMTs in
> at both ends.
> 
From an optimizer perspective, all you really want is be able to
traverse all the statements in a basic block using a single
traversal.  The current scheme of having COMPOUND_STMT (or _EXPR)
start a new chain is pretty convenient for building the
flowgraph.  It lets you keep the nesting information in the
flowgraph itself (basic blocks know exactly which block starts
the scope they belong to without having to compute dominance
info).


> It seems to me that double-chaining could be provided by using the
> TREE_CHAIN of the COMPOUND_EXPRs.
> 
How so?  I'm afraid I'm not following you here.

>    loop-stmt:
>      LOOP_EXPR
>        LOOP_EXPR_BODY -> stmt | NULL_TREE
>      | DO_LOOP_EXPR
>        (to be defined later)
> 
I would prefer to have 2 generic loop expressions and 1 Fortran
do-loop expression.  The two generic loops would be for
do-while and while constructs.  However, we could get by with
just a do-while guarded with an if() (a-la SUIF).  For the
do-loop expression,  I'd like:

	DO_LOOP_EXPR
		INDEX_DECL -> var
		LB_EXPR -> val
		UB_EXPR -> val
		COND_CODE -> '<' | '>' | '<=' | '>='
		STEP_EXPR -> val
		LANDING_EXPR -> COMPOUND_EXPR
		BODY_EXPR -> COMPOUND_EXPR

This is equivalent to the following:

for (INDEX_DECL = LB_EXPR;
     INDEX_DECL COND_CODE UB_EXPR;
     INDEX_DECL += STEP_EXPR)
{
  if (INDEX_DECL == LB_EXPR)
    LANDING_EXPR;
  else
    BODY_EXPR;
}

LANDING_EXPR is just a convenience for optimizers to have
somewhere to blindly move invariant code to.  This would be
expanded to something like:

INDEX_DECL = LB_EXPR;
if (INDEX_DECL COND_CODE UB_EXPR)
    LANDING_EXPR;
for (...)

We could do without it.  Another important feature of
DO_LOOP_EXPR is that both UB_EXPR and STEP_EXPR should be loop
invariant (ideally, constants).  Anything that the front-end
can't (or won't) convert directly can be dealt with a loop
canonicalization pass which would do some basic data flow
analysis to re-write WHILE_EXPRs.

Another thing I'd like to have in each statement is the concept
of parent statement.  This is a pointer to the immediately
enclosing control statement.  We currently gather this
information while building the CFG, so if it's too difficult to
gather while parsing, we could fill it out while building the
CFG.


> The Java loop has 1 (or 0) EXIT_EXPR, used to express the loop condition.
> This makes it easy to distinguish from 'break's, which are expressed
> with EXIT_BLOCK_EXPR.  
> 
> EXIT_EXPR is a bit backwards for this purpose, as its sense is opposite to
> that of the loop condition, so we end up calling invert_truthvalue twice in
> the process of generating and expanding it.  But that's not a big deal.
> 
> >From an optimization perspective, are LABELED_BLOCK_EXPR/EXIT_BLOCK_EXPR
> easier to deal with than plain gotos?  I assume they're preferable to the
> current loosely bound BREAK_STMT, which has no information about what it's
> exiting.  EXIT_EXPR would have the same problem if it were used to express
> 'break'.
> 
What's wrong with BREAK_STMT?  As long as you know its parent,
it's relatively easy to figure out where to send the flow edge.
GOTO_STMTs are just as easy to deal with.  I don't have a problem
with the existing setup.  I don't know about
LABELED_BLOCK_EXPR/EXIT_BLOCK_EXPR.

> I've attached my current patch below; it's still in the toy stages, but
> gives an idea of my implementation strategy.
> 
Note that any changes you make to the tree representation are
very likely to break the CFG and SSA builders.  We want to
experiment, but we also need to make sure that any changes we
make to the IL are reflected in the CFG and dataflow info.

I'm all for experimenting with these IL changes, but I would
first want to have something like SSA-CCP working so that any
IL changes that break SSA are immediately detected with a
bootstrap/c-torture failure.


Diego.

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: Language-independent functions-as-trees representation
  2002-08-23  5:41 ` Jason Merrill
                     ` (3 preceding siblings ...)
  2002-08-23 12:16   ` Diego Novillo
@ 2002-08-23 13:01   ` Neil Booth
  2002-08-28  4:44     ` Jason Merrill
  2002-08-28 14:06   ` Jason Merrill
  2002-08-29 14:10   ` Richard Henderson
  6 siblings, 1 reply; 95+ messages in thread
From: Neil Booth @ 2002-08-23 13:01 UTC (permalink / raw)
  To: Jason Merrill; +Cc: gcc

Jason Merrill wrote:-

> It's still not clear to me how best to represent line number information.
> I'd appreciate someone from the Java team explaining to me how it's handled
> there, particularly the use of EXPR_WFL_LINECOL on
> non-EXPR_WITH_FILE_LOCATION nodes.

I think this new IR is an *excellent* opportunity to get this right.
Namely, a location is uniquely specified by a (line, col) pair; there is
no need for a filename.  See cpplib and line-map.h.  There's no reason
all of GCC can't do this.

Neil.

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: Language-independent functions-as-trees representation
  2002-08-23 12:16   ` Diego Novillo
@ 2002-08-23 15:42     ` Daniel Berlin
  2002-08-23 17:26     ` Jason Merrill
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 95+ messages in thread
From: Daniel Berlin @ 2002-08-23 15:42 UTC (permalink / raw)
  To: Diego Novillo; +Cc: Jason Merrill, gcc

> 
> > rth has raised some questions about the advisability of using COMPOUND_EXPR
> > to chain statements; the current scheme uses TREE_CHAIN of the statements
> > themselves.  To me, the benefit is modularity; apart from the earlier
> > complaints about the STMT/EXPR distinction, using COMPOUND_EXPR makes it
> > easy to replace a single complex expression with a sequence of simple ones,
> > simply by plugging in a COMPOUND_EXPR in its place.  The current scheme
> > requires a lot more pointer management in order to splice the new STMTs in
> > at both ends.
> > 
> From an optimizer perspective, all you really want is be able to
> traverse all the statements in a basic block using a single
> traversal.

Depends on if we are talking SSA optimizers or not.
Non-SSA optimizers (if we are going to have any) need to be able to do 
this.
SSA optimizers only care about being able to traverse the various chains, 
not the statements themselves.
>  The current scheme of having COMPOUND_STMT (or _EXPR)
> start a new chain is pretty convenient for building the
> flowgraph.  It lets you keep the nesting information in the
> flowgraph itself (basic blocks know exactly which block starts
> the scope they belong to without having to compute dominance
> info).

> 
> Another thing I'd like to have in each statement is the concept
> of parent statement.  This is a pointer to the immediately
> enclosing control statement.  We currently gather this
> information while building the CFG, so if it's too difficult to
> gather while parsing, we could fill it out while building the
> CFG.
This would be nice, as it would solve the complex insertion problems, 
likely.

> I'm all for experimenting with these IL changes, but I would
> first want to have something like SSA-CCP working so that any
> IL changes that break SSA are immediately detected with a
> bootstrap/c-torture failure.

SSA-PRE is also *very* sensitive to IR changes. 
But i'm working on pointer analysis usability before i get back to it, in 
the hopes that the magic SSA fairies will take care of problems inserting 
into various types of statements (since that's why it's not bootstrappable 
with it on right now)
 > > 
> Diego.
> 
> 

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: Language-independent functions-as-trees representation
  2002-08-23 12:16   ` Diego Novillo
  2002-08-23 15:42     ` Daniel Berlin
@ 2002-08-23 17:26     ` Jason Merrill
  2002-08-23 17:42       ` Zack Weinberg
                         ` (2 more replies)
  2002-08-26 16:58     ` Paul Brook
  2002-08-29 14:21     ` Richard Henderson
  3 siblings, 3 replies; 95+ messages in thread
From: Jason Merrill @ 2002-08-23 17:26 UTC (permalink / raw)
  To: Diego Novillo; +Cc: gcc, Jason Merrill

On Fri, 23 Aug 2002 15:16:21 -0400, Diego Novillo <dnovillo@redhat.com> wrote:

> On Fri, 23 Aug 2002, Jason Merrill wrote:

>> It seems to me that double-chaining could be provided by using the
>> TREE_CHAIN of the COMPOUND_EXPRs.
>> 
> How so?  I'm afraid I'm not following you here.

TREE_CHAIN of a COMPOUND_EXPR is currently unused; a sequence of statements
is represented by a series of COMPOUND_EXPRs, each with a statement in op0,
pointing to the next one through op1.

>>    loop-stmt:
>>      LOOP_EXPR
>>        LOOP_EXPR_BODY -> stmt | NULL_TREE
>>      | DO_LOOP_EXPR
>>        (to be defined later)

> I would prefer to have 2 generic loop expressions and 1 Fortran
> do-loop expression.  The two generic loops would be for
> do-while and while constructs.  However, we could get by with
> just a do-while guarded with an if() (a-la SUIF).

The Java frontend uses a LOOP_EXPR (infinite loop) with an EXIT_EXPR for
the loop condition, either at the beginning or end of the statement chain
in LOOP_EXPR_BODY, for while and do-while loops respectively.  Does this
work for you, or is that a significant inconvenience compared to hanging
the condition directly off of the loop node?

The key benefit of this scheme for SIMPLE is that it allows us to simplify
the loop condition without having to copy its preque around a la
insert_before_continue.  If the condition needs simplification, we just end
up with a few other statements before we get to the EXIT_EXPR.

> For the
> do-loop expression,  I'd like:
>
> 	DO_LOOP_EXPR
> 		INDEX_DECL -> var
> 		LB_EXPR -> val
> 		UB_EXPR -> val
> 		COND_CODE -> '<' | '>' | '<=' | '>='
> 		STEP_EXPR -> val
> 		LANDING_EXPR -> COMPOUND_EXPR
> 		BODY_EXPR -> COMPOUND_EXPR

I would say -> stmt for the last two.

> LANDING_EXPR is just a convenience for optimizers to have somewhere to
> blindly move invariant code to.  We could do without it.

I think my preference would be to do without it; if we're moving code out
of the loop, it should go out of the loop.  But I don't feel strongly about
this.

>> From an optimization perspective, are LABELED_BLOCK_EXPR/EXIT_BLOCK_EXPR
>> easier to deal with than plain gotos?  I assume they're preferable to
>> the current loosely bound BREAK_STMT, which has no information about
>> what it's exiting.  EXIT_EXPR would have the same problem if it were
>> used to express 'break'.

> What's wrong with BREAK_STMT?  As long as you know its parent,
> it's relatively easy to figure out where to send the flow edge.
> GOTO_STMTs are just as easy to deal with.  I don't have a problem
> with the existing setup.  I don't know about
> LABELED_BLOCK_EXPR/EXIT_BLOCK_EXPR.

My thinking was that if you're walking through a function the first time,
when you see an EXIT_BLOCK_EXPR you know exactly which blocks it is going
to close, and thus what cleanups and whatnot you might need to do.  With a
general goto, if you haven't seen the label yet you don't know much of
anything.  I suppose that in the optimizer you can annotate the gotos
appropriately, but it's a bit more complicated.

The thing about BREAK_STMT is that expand_exit_something has variable
semantics; the things it does or doesn't exit depend on the value of
exit_flag passed to expand_start_loop et al.  I'd prefer to make what it
exits explicit.

>> I've attached my current patch below; it's still in the toy stages, but
>> gives an idea of my implementation strategy.
>> 
> Note that any changes you make to the tree representation are
> very likely to break the CFG and SSA builders.  We want to
> experiment, but we also need to make sure that any changes we
> make to the IL are reflected in the CFG and dataflow info.

Indeed; that's why I was asking for people interested in helping to adjust
the various optimizers.

Jason

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: Language-independent functions-as-trees representation
  2002-08-23 17:26     ` Jason Merrill
@ 2002-08-23 17:42       ` Zack Weinberg
  2002-08-23 22:25         ` Per Bothner
  2002-08-23 22:31       ` Per Bothner
  2002-08-26 19:56       ` Diego Novillo
  2 siblings, 1 reply; 95+ messages in thread
From: Zack Weinberg @ 2002-08-23 17:42 UTC (permalink / raw)
  To: Jason Merrill; +Cc: Diego Novillo, gcc

On Sat, Aug 24, 2002 at 01:26:06AM +0100, Jason Merrill wrote:
> 
> TREE_CHAIN of a COMPOUND_EXPR is currently unused; a sequence of statements
> is represented by a series of COMPOUND_EXPRs, each with a statement in op0,
> pointing to the next one through op1.

I'm a little concerned about memory consumption if we have to have a
COMPOUND_EXPR for (nearly) every statement, on top of whatever sort of
thing the statement itself is.  A two-operand EXPR node is 24 bytes,
which currently gets rounded up to 32 (I plan to fix that).

zw

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: Language-independent functions-as-trees representation
  2002-08-23 17:42       ` Zack Weinberg
@ 2002-08-23 22:25         ` Per Bothner
  2002-08-26  0:07           ` Zack Weinberg
  2002-08-28  6:50           ` Jason Merrill
  0 siblings, 2 replies; 95+ messages in thread
From: Per Bothner @ 2002-08-23 22:25 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: Jason Merrill, gcc

Zack Weinberg wrote:

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: Language-independent functions-as-trees representation
  2002-08-23 17:26     ` Jason Merrill
  2002-08-23 17:42       ` Zack Weinberg
@ 2002-08-23 22:31       ` Per Bothner
  2002-08-26 19:56       ` Diego Novillo
  2 siblings, 0 replies; 95+ messages in thread
From: Per Bothner @ 2002-08-23 22:31 UTC (permalink / raw)
  To: Jason Merrill; +Cc: gcc

Jason Merrill wrote:

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: Language-independent functions-as-trees representation
  2002-08-23 22:25         ` Per Bothner
@ 2002-08-26  0:07           ` Zack Weinberg
  2002-08-26  3:07             ` Neil Booth
  2002-08-28  6:50           ` Jason Merrill
  1 sibling, 1 reply; 95+ messages in thread
From: Zack Weinberg @ 2002-08-26  0:07 UTC (permalink / raw)
  To: Per Bothner; +Cc: Jason Merrill, gcc

On Fri, Aug 23, 2002 at 10:25:14PM -0700, Per Bothner wrote:
> Zack Weinberg wrote:
> >I'm a little concerned about memory consumption if we have to have a
> >COMPOUND_EXPR for (nearly) every statement, on top of whatever sort of
> >thing the statement itself is.  A two-operand EXPR node is 24 bytes,
> >which currently gets rounded up to 32 (I plan to fix that).
> 
> Two possible solutions:
> 
> We could have a variable-sized COMPOUND_EXPR (a la TREE_VEC).
> A chain of 20 "statements" is a single COMPOUND_EXPR with 20
> "operands".  Compact, fast, great locality, easy to traverse
> in either direction.  Not so easy to build of incrementally
> - but for that a parser can use a temporary obstack, or
> temporary TREE_LIST, and then re-cycle the TREE_LIST at the
> end of the block.  Or just use 2-operand COMPOUND_EXPR, and
> recycle them at the end of the block.
> 
> This may be awkward for some optimizations, but its more of a
> coding issue than a performance issue, I believe.  E.g. an
> optimization that does major re-organization should perhaps
> just copy the entire tree, and throw away the old one.  If you
> do only a few insertions, use an extra 2-operand COMPOUND_EXPR.

This is akin to what you were talking about earlier for insn lists.
Personally I like the idea in both contexts -- other nice things are
likely to fall out of it (e.g. it being easy to have tables of
annotations indexed by array offset).

TREE_LIST nodes, incidentally, have ridiculous overhead if being used
as plain old cons chains, or even alists.  Four pointers (chain, type,
purpose, value) plus a whole word of flag bits which are generally
unused.

I don't have any definite ideas for what *should* be done here -- I
just wanted to raise awareness of the memory overhead involved in the
proposed solution.

[Experimentation with different statement tree representations would
be a fine thing to do on the faster-compiler-branch, hint hint...]

zw

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: Language-independent functions-as-trees representation
  2002-08-26  0:07           ` Zack Weinberg
@ 2002-08-26  3:07             ` Neil Booth
  2002-08-26  3:18               ` Gabriel Dos Reis
  0 siblings, 1 reply; 95+ messages in thread
From: Neil Booth @ 2002-08-26  3:07 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: Per Bothner, Jason Merrill, gcc

Zack Weinberg wrote:-

> TREE_LIST nodes, incidentally, have ridiculous overhead if being used
> as plain old cons chains, or even alists.  Four pointers (chain, type,
> purpose, value) plus a whole word of flag bits which are generally
> unused.

I really hope this all gets fixed by moving to more "definite" structs,
rather than a catch-all union for trees.  It'll speed up bootstrap too,
by making much of the tree checking we do unnecessary.

Neil.

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: Language-independent functions-as-trees representation
  2002-08-26  3:07             ` Neil Booth
@ 2002-08-26  3:18               ` Gabriel Dos Reis
  0 siblings, 0 replies; 95+ messages in thread
From: Gabriel Dos Reis @ 2002-08-26  3:18 UTC (permalink / raw)
  To: Neil Booth; +Cc: Zack Weinberg, Per Bothner, Jason Merrill, gcc

Neil Booth <neil@daikokuya.co.uk> writes:

| Zack Weinberg wrote:-
| 
| > TREE_LIST nodes, incidentally, have ridiculous overhead if being used
| > as plain old cons chains, or even alists.  Four pointers (chain, type,
| > purpose, value) plus a whole word of flag bits which are generally
| > unused.
| 
| I really hope this all gets fixed by moving to more "definite" structs,
| rather than a catch-all union for trees.

I agree with you.  I even hope that some silly type-errors will be
caught. 

-- Gaby

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: Language-independent functions-as-trees representation
  2002-08-23 12:16   ` Diego Novillo
  2002-08-23 15:42     ` Daniel Berlin
  2002-08-23 17:26     ` Jason Merrill
@ 2002-08-26 16:58     ` Paul Brook
  2002-08-29 14:21     ` Richard Henderson
  3 siblings, 0 replies; 95+ messages in thread
From: Paul Brook @ 2002-08-26 16:58 UTC (permalink / raw)
  To: gcc

I'm currently helping write the Fortran 95 frontend to GCC 
(g95.sourceforege.net). This is of some interest to as we generate SIMPLE 
trees a function at a time..

On Friday 23 August 2002 8:16 pm, Diego Novillo wrote:
> On Fri, 23 Aug 2002, Jason Merrill wrote:
> > Basically, I'm wondering if other people are interested enough in this
> > change to 1) help with the design for the IR and 2) help with porting
> > the existing tree optimizers to the new IR.

I definitely think it's a decision that should be made sooner rather than 
later. It does seem a bit strange to have the two different classes of 
tree. I'll give what help I can, although I've not much idea how the new 
optimizers work.

<snip>
> just a do-while guarded with an if() (a-la SUIF).  For the
> do-loop expression,  I'd like:
>
> 	DO_LOOP_EXPR
> 		INDEX_DECL -> var
> 		LB_EXPR -> val
> 		UB_EXPR -> val
> 		COND_CODE -> '<' | '>' | '<=' | '>='
> 		STEP_EXPR -> val
> 		LANDING_EXPR -> COMPOUND_EXPR
> 		BODY_EXPR -> COMPOUND_EXPR
>
> This is equivalent to the following:
>
> for (INDEX_DECL = LB_EXPR;
>      INDEX_DECL COND_CODE UB_EXPR;
>      INDEX_DECL += STEP_EXPR)
> {
>   if (INDEX_DECL == LB_EXPR)
>     LANDING_EXPR;
>   else
>     BODY_EXPR;
> }

This is pretty much ideal for fortran. We generate a lot of loops like this 
implicity for the scalarized array code.

> LANDING_EXPR is just a convenience for optimizers to have
> somewhere to blindly move invariant code to.  This would be
> expanded to something like:
>
> INDEX_DECL = LB_EXPR;
> if (INDEX_DECL COND_CODE UB_EXPR)
>     LANDING_EXPR;
> for (...)
>
> We could do without it.  Another important feature of
> DO_LOOP_EXPR is that both UB_EXPR and STEP_EXPR should be loop
> invariant (ideally, constants).  Anything that the front-end
> can't (or won't) convert directly can be dealt with a loop
> canonicalization pass which would do some basic data flow
> analysis to re-write WHILE_EXPRs.

Agreed.

<snip>
> > The Java loop has 1 (or 0) EXIT_EXPR, used to express the loop
> > condition. This makes it easy to distinguish from 'break's, which are
> > expressed with EXIT_BLOCK_EXPR.
> >
> > EXIT_EXPR is a bit backwards for this purpose, as its sense is opposite
> > to that of the loop condition, so we end up calling invert_truthvalue
> > twice in the process of generating and expanding it.  But that's not a
> > big deal.
> >
> > >From an optimization perspective, are
> > > LABELED_BLOCK_EXPR/EXIT_BLOCK_EXPR
> >
> > easier to deal with than plain gotos?  I assume they're preferable to
> > the current loosely bound BREAK_STMT, which has no information about
> > what it's exiting.  EXIT_EXPR would have the same problem if it were
> > used to express 'break'.
>
> What's wrong with BREAK_STMT?  As long as you know its parent,
> it's relatively easy to figure out where to send the flow edge.
> GOTO_STMTs are just as easy to deal with.  I don't have a problem
> with the existing setup.  I don't know about
> LABELED_BLOCK_EXPR/EXIT_BLOCK_EXPR.

The existing BREAK_STMT isn't sufficient for Fortran purposes, where a 
break/continue statement can break out of an outer loop. We're currently 
implementing this with GOTO_STMTs, but could equally well use 
LABELED_BLOCK_EXPR/EXIT_BLOCK_EXPR if this made things easier for 
optimizations.

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: Language-independent functions-as-trees representation
  2002-08-23 17:26     ` Jason Merrill
  2002-08-23 17:42       ` Zack Weinberg
  2002-08-23 22:31       ` Per Bothner
@ 2002-08-26 19:56       ` Diego Novillo
  2002-08-29 14:35         ` Richard Henderson
  2 siblings, 1 reply; 95+ messages in thread
From: Diego Novillo @ 2002-08-26 19:56 UTC (permalink / raw)
  To: Jason Merrill; +Cc: gcc

On Sat, 24 Aug 2002, Jason Merrill wrote:

> TREE_CHAIN of a COMPOUND_EXPR is currently unused; a sequence of statements
> is represented by a series of COMPOUND_EXPRs, each with a statement in op0,
> pointing to the next one through op1.
> 
And TREE_CHAIN would point to the previous one?  Isn't the extra
encapsulation even more cache hostile?

How about flattening the IR when we convert it into SIMPLE?  We
know that SIMPLE trees have a much more rigid layout.  Each tree
has at most two operands (lists could be modeled with array
containers).

Furthermore, we could flatten out statements into arrays as we
build the flow graph of the function.  Each basic block contains
an array of statements.  This setup makes intra-block insertions
expensive, so we could double chain them instead of treating them
like an array (similarly to what we do with the BASIC_BLOCK
array).


> The Java frontend uses a LOOP_EXPR (infinite loop) with an EXIT_EXPR for
> the loop condition, either at the beginning or end of the statement chain
> in LOOP_EXPR_BODY, for while and do-while loops respectively.  Does this
> work for you, or is that a significant inconvenience compared to hanging
> the condition directly off of the loop node?
> 
*shrug*  So, if the first node of a LOOP_EXPR_BODY is an
EXIT_EXPR then I'm dealing with a while(), otherwise I'm dealing
with a do/while?  I guess it would work.

> The key benefit of this scheme for SIMPLE is that it allows us to simplify
> the loop condition without having to copy its preque around a la
> insert_before_continue.  If the condition needs simplification, we just end
> up with a few other statements before we get to the EXIT_EXPR.
> 
Hmm, true, but this has just messed up the straightforward scheme
for building the flowgraph that I had described in my previous
paragraph.

This means that the only way for the flowgraph builder to tell a
while() from a do/while() is to walk the LOOP_EXPR_BODY and see
whether it's last node is an EXIT_EXPR?  That's very irritating.
We will have to walk the body twice, once to tell what it is and
the second time to build the nodes.

> > For the
> > do-loop expression,  I'd like:
> >
> > 	DO_LOOP_EXPR
> > 		INDEX_DECL -> var
> > 		LB_EXPR -> val
> > 		UB_EXPR -> val
> > 		COND_CODE -> '<' | '>' | '<=' | '>='
> > 		STEP_EXPR -> val
> > 		LANDING_EXPR -> COMPOUND_EXPR
> > 		BODY_EXPR -> COMPOUND_EXPR
> 
> I would say -> stmt for the last two.
> 
Yes.

> > LANDING_EXPR is just a convenience for optimizers to have somewhere to
> > blindly move invariant code to.  We could do without it.
> 
> I think my preference would be to do without it; if we're moving code out
> of the loop, it should go out of the loop.  But I don't feel strongly about
> this.
> 
The advantage of this is that all the optimizers can blindly move
code to the landing pad.  Afterwards, we have a single pass that
deals with generating code for it (ie, emitting the if() guard
before the loop begins).

> > What's wrong with BREAK_STMT?  As long as you know its parent,
> > it's relatively easy to figure out where to send the flow edge.
> > GOTO_STMTs are just as easy to deal with.  I don't have a problem
> > with the existing setup.  I don't know about
> > LABELED_BLOCK_EXPR/EXIT_BLOCK_EXPR.
> 
> My thinking was that if you're walking through a function the first time,
> when you see an EXIT_BLOCK_EXPR you know exactly which blocks it is going
> to close, and thus what cleanups and whatnot you might need to do.  With a
> general goto, if you haven't seen the label yet you don't know much of
> anything.  I suppose that in the optimizer you can annotate the gotos
> appropriately, but it's a bit more complicated.
> 
The flowgraph is built in two steps.  You first build all the
basic blocks and then you build all the edges.  By the time you
are building the edges, you have seen every label that is the
target of a goto.

> The thing about BREAK_STMT is that expand_exit_something has variable
> semantics; the things it does or doesn't exit depend on the value of
> exit_flag passed to expand_start_loop et al.  I'd prefer to make what it
> exits explicit.
> 
OK.  But you will always need to model gotos, right?  There's no
escaping that, I presume.



Diego.

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: Language-independent functions-as-trees representation
  2002-08-23 13:01   ` Neil Booth
@ 2002-08-28  4:44     ` Jason Merrill
  2002-08-28  5:17       ` Gabriel Dos Reis
  0 siblings, 1 reply; 95+ messages in thread
From: Jason Merrill @ 2002-08-28  4:44 UTC (permalink / raw)
  To: Neil Booth; +Cc: gcc

On Fri, 23 Aug 2002 21:01:22 +0100, Neil Booth <neil@daikokuya.co.uk> wrote:

> Jason Merrill wrote:-
>
>> It's still not clear to me how best to represent line number information.
>> I'd appreciate someone from the Java team explaining to me how it's handled
>> there, particularly the use of EXPR_WFL_LINECOL on
>> non-EXPR_WITH_FILE_LOCATION nodes.
>
> I think this new IR is an *excellent* opportunity to get this right.
> Namely, a location is uniquely specified by a (line, col) pair; there is
> no need for a filename.  See cpplib and line-map.h.  There's no reason
> all of GCC can't do this.

Makes sense.  I'm also interested in what dewar was saying about having
logical line numbers for instantiations; that sort of thing seems necessary
if we are going to carry template instantiation context information into
the tree IR.

I currently have no opinion about whether to express line and column
numbers separately or use a lookup table.

Does anyone object to using TREE_COMPLEXITY for source position in all
expressions (i.e. things which have TREE_COMPLEXITY, not *_DECL, *_CST)?
None of the frontends currently in the tree seem to use it for anything
that makes it into the backend.

Jason

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: Language-independent functions-as-trees representation
  2002-08-28  4:44     ` Jason Merrill
@ 2002-08-28  5:17       ` Gabriel Dos Reis
  2002-08-28  7:10         ` Jason Merrill
  0 siblings, 1 reply; 95+ messages in thread
From: Gabriel Dos Reis @ 2002-08-28  5:17 UTC (permalink / raw)
  To: Jason Merrill; +Cc: Neil Booth, gcc

Jason Merrill <jason@redhat.com> writes:

Hi,

| Does anyone object to using TREE_COMPLEXITY for source position in all
| expressions (i.e. things which have TREE_COMPLEXITY, not *_DECL, *_CST)?

Sorry, I'm unclear here.  We already have DECL_SOURCE_LOCATION to
indicate source location for _DECLs.  Would you mind elaborating a bit
on your idea?

Thanks,

-- Gaby

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: Language-independent functions-as-trees representation
  2002-08-23 22:25         ` Per Bothner
  2002-08-26  0:07           ` Zack Weinberg
@ 2002-08-28  6:50           ` Jason Merrill
  2002-08-28 16:42             ` Per Bothner
  1 sibling, 1 reply; 95+ messages in thread
From: Jason Merrill @ 2002-08-28  6:50 UTC (permalink / raw)
  To: Per Bothner; +Cc: Zack Weinberg, gcc

On Fri, 23 Aug 2002 22:25:14 -0700, Per Bothner <per@bothner.com> wrote:

> Zack Weinberg wrote:
>> I'm a little concerned about memory consumption if we have to have a
>> COMPOUND_EXPR for (nearly) every statement, on top of whatever sort of
>> thing the statement itself is.  A two-operand EXPR node is 24 bytes,
>> which currently gets rounded up to 32 (I plan to fix that).
>
> Two possible solutions:
>
> We could have a variable-sized COMPOUND_EXPR (a la TREE_VEC).
> A chain of 20 "statements" is a single COMPOUND_EXPR with 20
> "operands".  Compact, fast, great locality, easy to traverse
> in either direction.  Not so easy to build incrementally
> - but for that a parser can use a temporary obstack, or
> temporary TREE_LIST, and then re-cycle the TREE_LIST at the
> end of the block.  Or just use 2-operand COMPOUND_EXPR, and
> recycle them at the end of the block.
>
> This may be awkward for some optimizations, but its more of a
> coding issue than a performance issue, I believe.  E.g. an
> optimization that does major re-organization should perhaps
> just copy the entire tree, and throw away the old one.  If you
> do only a few insertions, use an extra 2-operand COMPOUND_EXPR.

This does sound awkward, particularly for simplification.  In the code I've
been writing which uses 2-operand COMPOUND_EXPR, simplifying a complex
statement just means replacing it with a COMPOUND_EXPR; later, we go
through and flatten the COMPOUND_EXPRs so that they're only
right-recursive.  Of course, this last step could replace them with a
variable-sized one instead, but this would add complexity in order to avoid
doing it too early.

Allocating and then throwing away all these COMPOUND_EXPRs would use even
more memory in the short term than just using the 2-operand forms, since we
don't do GC until we're done with the function.  We might be able to reduce
the wastage by using a varray for temporary storage.

Deleting a statement could be handled by replacing it with empty_stmt_node
until we get around to compacting.

Of course, even an array involves more memory use and pointer chasing than
just using the TREE_CHAIN.

I don't see any real benefit to random access.

> Alternatively, we can just use the TREE_CHAIN of expression
> nodes to chain them  Some valid expressions, including
> constants and declaration references would need to be wrapped
> in some other expression, if they are to be chained, but
> that is any enough.

Those expressions shouldn't appear at statement context (since they have no
side-effects), so we're OK.  It does seem fragile, though, and code that
currently uses the TREE_CHAIN of other expressions (i.e. PUSH_LABELED_BLOCK
in Java) would need to be changed.

This leaves the question of how to handle double-chaining of statements.
The type of any expression at statement context isn't important (and will
usually be void), so we could reuse that field like the current code does.
Or look up backwards links in another table.

This scheme makes substitution more complicated; if we have some sort of
wrapper, we can just replace one statement with another (or with a
COMPOUND_EXPR).  If we use the TREE_CHAIN directly, we need to splice the
replacement in at both beginning and end.

This scheme provides no easy way of finding the end of a chain; walking
over the whole list each time could get expensive in large functions.

> The solution of just chaining expressions together does the
> smentic problem of when are we talking about the component
> expression, and are we talking about the chain.  Should expand_expr
> applied to an expression implicitly also expand its TREE_CHAIN?
> That appears to be what c-semantics.c does, at first glance.

Makes sense.  Or there could be a CHAINED_EXPR_LIST node wrapping the first
one, which could also have a pointer to the end of the chain.

I'm not compelled by any of the options mentioned.  Any other opinions?

Jason

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: Language-independent functions-as-trees representation
  2002-08-28  5:17       ` Gabriel Dos Reis
@ 2002-08-28  7:10         ` Jason Merrill
  2002-08-28  8:02           ` Diego Novillo
  2002-08-28  8:32           ` Gabriel Dos Reis
  0 siblings, 2 replies; 95+ messages in thread
From: Jason Merrill @ 2002-08-28  7:10 UTC (permalink / raw)
  To: Gabriel Dos Reis; +Cc: Neil Booth, gcc

On 28 Aug 2002 14:13:12 +0200, Gabriel Dos Reis <gdr@integrable-solutions.net> wrote:

> Jason Merrill <jason@redhat.com> writes:
>
> | Does anyone object to using TREE_COMPLEXITY for source position in all
> | expressions (i.e. things which have TREE_COMPLEXITY, not *_DECL, *_CST)?
>
> Sorry, I'm unclear here.  We already have DECL_SOURCE_LOCATION to
> indicate source location for _DECLs.  Would you mind elaborating a bit
> on your idea?

Yes, but that only applies to _DECLs.  We need source location info for
other expressions, too, if we're going to have useful debugging info.  I'm
suggesting that we use TREE_COMPLEXITY for this purpose.

Jason

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: Language-independent functions-as-trees representation
  2002-08-28  7:10         ` Jason Merrill
@ 2002-08-28  8:02           ` Diego Novillo
  2002-08-28  8:32           ` Gabriel Dos Reis
  1 sibling, 0 replies; 95+ messages in thread
From: Diego Novillo @ 2002-08-28  8:02 UTC (permalink / raw)
  To: Jason Merrill; +Cc: Gabriel Dos Reis, Neil Booth, gcc

On Wed, 2002-08-28 at 10:10, Jason Merrill wrote:
> On 28 Aug 2002 14:13:12 +0200, Gabriel Dos Reis <gdr@integrable-solutions.net> wrote:
> 
> > Jason Merrill <jason@redhat.com> writes:
> >
> > | Does anyone object to using TREE_COMPLEXITY for source position in all
> > | expressions (i.e. things which have TREE_COMPLEXITY, not *_DECL, *_CST)?
> >
> > Sorry, I'm unclear here.  We already have DECL_SOURCE_LOCATION to
> > indicate source location for _DECLs.  Would you mind elaborating a bit
> > on your idea?
> 
> Yes, but that only applies to _DECLs.  We need source location info for
> other expressions, too, if we're going to have useful debugging info.  I'm
> suggesting that we use TREE_COMPLEXITY for this purpose.
> 
Minor nit.  We would be renaming the accessor to something more
intuitive than TREE_COMPLEXITY, right?


Diego.

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: Language-independent functions-as-trees representation
  2002-08-28  7:10         ` Jason Merrill
  2002-08-28  8:02           ` Diego Novillo
@ 2002-08-28  8:32           ` Gabriel Dos Reis
  1 sibling, 0 replies; 95+ messages in thread
From: Gabriel Dos Reis @ 2002-08-28  8:32 UTC (permalink / raw)
  To: Jason Merrill; +Cc: Neil Booth, gcc

Jason Merrill <jason@redhat.com> writes:

| On 28 Aug 2002 14:13:12 +0200, Gabriel Dos Reis <gdr@integrable-solutions.net> wrote:
| 
| > Jason Merrill <jason@redhat.com> writes:
| >
| > | Does anyone object to using TREE_COMPLEXITY for source position in all
| > | expressions (i.e. things which have TREE_COMPLEXITY, not *_DECL, *_CST)?
| >
| > Sorry, I'm unclear here.  We already have DECL_SOURCE_LOCATION to
| > indicate source location for _DECLs.  Would you mind elaborating a bit
| > on your idea?
| 
| Yes, but that only applies to _DECLs.  We need source location info for
| other expressions, too, if we're going to have useful debugging info.  I'm
| suggesting that we use TREE_COMPLEXITY for this purpose.

Aha, OK.  That makes sense to me, now. 

Sometime ago, Neil proposed to attach source location information to
every token CPP passes to the front-end (using line-map.h machinery).
I think both your proposals are complementary and make sense.

-- Gaby

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: Language-independent functions-as-trees representation
  2002-08-23  5:41 ` Jason Merrill
                     ` (4 preceding siblings ...)
  2002-08-23 13:01   ` Neil Booth
@ 2002-08-28 14:06   ` Jason Merrill
  2002-09-10 17:08     ` Gerald Pfeifer
  2002-08-29 14:10   ` Richard Henderson
  6 siblings, 1 reply; 95+ messages in thread
From: Jason Merrill @ 2002-08-28 14:06 UTC (permalink / raw)
  To: gcc

FWIW, I've created a new branch, named bnw-simple-branch, for my work in
progress on the new simplifier.  Please ask me before checking anything
into it.

Jason

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: Language-independent functions-as-trees representation
  2002-08-28  6:50           ` Jason Merrill
@ 2002-08-28 16:42             ` Per Bothner
  0 siblings, 0 replies; 95+ messages in thread
From: Per Bothner @ 2002-08-28 16:42 UTC (permalink / raw)
  To: Jason Merrill; +Cc: gcc

Jason Merrill wrote:

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: Language-independent functions-as-trees representation
  2002-08-23  5:41 ` Jason Merrill
                     ` (5 preceding siblings ...)
  2002-08-28 14:06   ` Jason Merrill
@ 2002-08-29 14:10   ` Richard Henderson
  2002-09-03  5:16     ` Jason Merrill
  6 siblings, 1 reply; 95+ messages in thread
From: Richard Henderson @ 2002-08-29 14:10 UTC (permalink / raw)
  To: Jason Merrill; +Cc: gcc

On Fri, Aug 23, 2002 at 01:41:28PM +0100, Jason Merrill wrote:
> rth has raised some questions about the advisability of using COMPOUND_EXPR
> to chain statements; the current scheme uses TREE_CHAIN of the statements
> themselves.

Indeed.  I'm not really sure what to do about this either.  Per's scheme
of using chained array fragments to reduce the memory allocation overhead
also complicates the act of iteration (without moving to C++ with more
abstract iterators).

It'd be nice if the thingy used here were exactly 3 words long (next,
prev, expr), and avoided the extra tree overhead.  Perhaps that's too
much to hope for.

Perhaps this can be deferred for now by creating some 
FOR_EACH_STMT (STMT, START) macros that do the iteration.  Then we
can replace the iteration implementation more easily later.

> To me, the benefit is modularity; apart from the earlier
> complaints about the STMT/EXPR distinction, using COMPOUND_EXPR makes it
> easy to replace a single complex expression with a sequence of simple ones,
> simply by plugging in a COMPOUND_EXPR in its place.  The current scheme
> requires a lot more pointer management in order to splice the new STMTs in
> at both ends.

I don't see this as particularly compelling.  Write one function
to do the splice and it's done.

> From an optimization perspective, are LABELED_BLOCK_EXPR/EXIT_BLOCK_EXPR
> easier to deal with than plain gotos?

Perhaps, but since gotos have to be handled in any case, two passes
are required for CFG construction.  I guess the main point of having
something of this form is for goto-elimination vs switch statements.

Though I don't see the point of LABELED_BLOCK_EXPR; EXIT_BLOCK_EXPR
could just as well refer to the stmt tree that it exits.

> I assume they're preferable to the
> current loosely bound BREAK_STMT, which has no information about what it's
> exiting.  EXIT_EXPR would have the same problem if it were used to express
> 'break'.

Yes, I really dislike having such loosely-bound constructs persisting
this long.  This goes for BREAK, EXIT, and CASE_LABEL.

>    switch-stmt:
>      SWITCH_EXPR
>        op0 -> val
>        op1 -> stmt

Speaking of CASE_LABEL, one way to fix that would be to have the
SWITCH_EXPR have a vector that mapped indicies to plain old LABEL_EXPRs.
Thoughts here?  Perhaps I am making too much work for the front end?

> I'm not sure how we want to represent throws for the purpose of to
> generating an ERT_THROW region?  I had thought about using a THROW_EXPR
> wrapper, but that wouldn't work in non-simplified code where calls can have
> complex args.  Perhaps annotation of the CALL_EXPR would work better.

Probably.

>    varname : compref | ID (rvalue)
>    lhs: varname | '*' ID  (lvalue)

Is this lhs really correct?  Surely ID (rvalue) isn't valid here.
Oughtn't it be

	lhs: compref | ID (lvalue) | * ID (rvalue)

r~

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: Language-independent functions-as-trees representation
  2002-08-23 12:16   ` Diego Novillo
                       ` (2 preceding siblings ...)
  2002-08-26 16:58     ` Paul Brook
@ 2002-08-29 14:21     ` Richard Henderson
  3 siblings, 0 replies; 95+ messages in thread
From: Richard Henderson @ 2002-08-29 14:21 UTC (permalink / raw)
  To: Diego Novillo; +Cc: Jason Merrill, gcc

On Fri, Aug 23, 2002 at 03:16:21PM -0400, Diego Novillo wrote:
> 	DO_LOOP_EXPR
> 		INDEX_DECL -> var
> 		LB_EXPR -> val
> 		UB_EXPR -> val
> 		COND_CODE -> '<' | '>' | '<=' | '>='
> 		STEP_EXPR -> val
> 		LANDING_EXPR -> COMPOUND_EXPR
> 		BODY_EXPR -> COMPOUND_EXPR
> 
> This is equivalent to the following:
> 
> for (INDEX_DECL = LB_EXPR;
>      INDEX_DECL COND_CODE UB_EXPR;
>      INDEX_DECL += STEP_EXPR)
> {
>   if (INDEX_DECL == LB_EXPR)
>     LANDING_EXPR;
>   else
>     BODY_EXPR;
> }
>
> LANDING_EXPR is just a convenience for optimizers to have
> somewhere to blindly move invariant code to.  This would be
> expanded to something like:
> 
> INDEX_DECL = LB_EXPR;
> if (INDEX_DECL COND_CODE UB_EXPR)
>     LANDING_EXPR;
> for (...)

Surely a better expansion is

	INDEX_DECL = LB_EXPR;
	if (INDEX_DECL COND_CODE UB_EXPR)
	  {
	    LANDING_EXPR;
	    do {
	      BODY_EXPR;
	      INDEX_DECL += STEP_EXPR;
	    } while (INDEX_DECL COND_CODE UB_EXPR);
	  }

And, really, I'm sort of with Jason in wondering whether the
LANDING_EXPR is really useful at this stage.

> Another important feature of DO_LOOP_EXPR is that both UB_EXPR and
> STEP_EXPR should be loop invariant (ideally, constants).

We have two choices here: either require the front end to 
know that these are loop-invariant beforehand (either by
dataflow analysis or by the undefined-ness of the source
program that would have created such nodes), or we can
copy these expressions into temporaries ourselves.  I.e.

	T.lb = LB_EXPR;
	T.ub = UB_EXPR;
	INDEX_DECL = T.lb;
	if (INDEX_DECL COND_CODE T.ub)
	  ...

> What's wrong with BREAK_STMT?  As long as you know its parent,

That's the point, really.  Not having to track the parent
after the fact.


r~

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: Language-independent functions-as-trees representation
  2002-08-26 19:56       ` Diego Novillo
@ 2002-08-29 14:35         ` Richard Henderson
  2002-09-03  5:38           ` Jason Merrill
  0 siblings, 1 reply; 95+ messages in thread
From: Richard Henderson @ 2002-08-29 14:35 UTC (permalink / raw)
  To: Diego Novillo; +Cc: Jason Merrill, gcc

On Mon, Aug 26, 2002 at 10:56:21PM -0400, Diego Novillo wrote:
> *shrug*  So, if the first node of a LOOP_EXPR_BODY is an
> EXIT_EXPR then I'm dealing with a while(), otherwise I'm dealing
> with a do/while?  I guess it would work.

No.  Consider

	while (foo() == bar())
	  baz();

	LOOP_EXPR
	  T.1 = foo();
	  T.2 = bar();
	  EXIT_EXPR T.1 != T.2;
	  baz();

vs

	do {
	  baz();
	} while (foo() == bar());

	LOOP_EXPR
	  baz();
	  T.1 = foo();
	  T.2 = bar();
	  EXIT_EXPR T.1 != T.2;

Why does CFG creation care about the difference between while
and do/while anyway?

One thing I do note here is that loop rotation cannot be done
easily and correctly with things in this form.  I fixed a bug
with that just this year.  See

2002-01-30  Richard Henderson  <rth@redhat.com>

        PR opt/5076
        * rtl.h (NOTE_INSN_LOOP_END_TOP_COND): New.
        * rtl.c (note_insn_name): Update.
        * emit-rtl.c (remove_unnecessary_notes): Kill it.
        * stmt.c (expand_end_loop): Kill jump opt code.  Use LOOP_END_TOP_COND
        to perform loop rotation.
        (expand_exit_loop_top_cond): New.
        * tree.h (expand_exit_loop_top_cond): Declare it.
        * c-semantics.c (genrtl_while_stmt): Use it.
        (genrtl_for_stmt): Likewise.

I don't see that the problem encountered at the rtl level 
will be any different at the tree level.  This might be evidence
for retaining distinct WHILE and DO_WHILE nodes and dispensing
with plain LOOP_EXPR.



r~

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: Language-independent functions-as-trees representation
  2002-08-29 14:10   ` Richard Henderson
@ 2002-09-03  5:16     ` Jason Merrill
  2002-09-04 10:20       ` Paul Brook
  0 siblings, 1 reply; 95+ messages in thread
From: Jason Merrill @ 2002-09-03  5:16 UTC (permalink / raw)
  To: Richard Henderson; +Cc: gcc

On Thu, 29 Aug 2002 14:10:43 -0700, Richard Henderson <rth@redhat.com> wrote:

> On Fri, Aug 23, 2002 at 01:41:28PM +0100, Jason Merrill wrote:
>> rth has raised some questions about the advisability of using COMPOUND_EXPR
>> to chain statements; the current scheme uses TREE_CHAIN of the statements
>> themselves.

> Perhaps this can be deferred for now by creating some 
> FOR_EACH_STMT (STMT, START) macros that do the iteration.  Then we
> can replace the iteration implementation more easily later.

Indeed, I've already added a foreach_stmt function to c-simplify.c on the
bnw-simple-branch.

> Speaking of CASE_LABEL, one way to fix that would be to have the
> SWITCH_EXPR have a vector that mapped indices to plain old LABEL_EXPRs.
> Thoughts here?  Perhaps I am making too much work for the front end?

I was just talking about giving it an optional vector of case labels; I
think it would make sense for the simplifier to lower CASE_LABELs to
LABEL_EXPRs and build up that vector.

Jason

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: Language-independent functions-as-trees representation
  2002-08-29 14:35         ` Richard Henderson
@ 2002-09-03  5:38           ` Jason Merrill
  0 siblings, 0 replies; 95+ messages in thread
From: Jason Merrill @ 2002-09-03  5:38 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Diego Novillo, gcc

On Thu, 29 Aug 2002 14:35:32 -0700, Richard Henderson <rth@redhat.com> wrote:

> Consider
>
> 	while (foo() == bar())
> 	  baz();
>
> 	LOOP_EXPR
> 	  T.1 = foo();
> 	  T.2 = bar();
> 	  EXIT_EXPR T.1 != T.2;
> 	  baz();
>
> vs
>
> 	do {
> 	  baz();
> 	} while (foo() == bar());
>
> 	LOOP_EXPR
> 	  baz();
> 	  T.1 = foo();
> 	  T.2 = bar();
> 	  EXIT_EXPR T.1 != T.2;
>
> One thing I do note here is that loop rotation cannot be done
> easily and correctly with things in this form.  I fixed a bug
> with that just this year.  See
>
> 2002-01-30  Richard Henderson  <rth@redhat.com>
>
>         PR opt/5076
>         * rtl.h (NOTE_INSN_LOOP_END_TOP_COND): New.
>         * rtl.c (note_insn_name): Update.
>         * emit-rtl.c (remove_unnecessary_notes): Kill it.
>         * stmt.c (expand_end_loop): Kill jump opt code.  Use LOOP_END_TOP_COND
>         to perform loop rotation.
>         (expand_exit_loop_top_cond): New.
>         * tree.h (expand_exit_loop_top_cond): Declare it.
>         * c-semantics.c (genrtl_while_stmt): Use it.
>         (genrtl_for_stmt): Likewise.
>
> I don't see that the problem encountered at the rtl level 
> will be any different at the tree level.  This might be evidence
> for retaining distinct WHILE and DO_WHILE nodes and dispensing
> with plain LOOP_EXPR.

I don't think so; we can set a flag on the EXIT_EXPR/GOTO_EXPR that would
work just like LOOP_END_TOP_COND does for RTL.

However, I don't see any reason why you can't just arbitrarily move the
first basic block in the loop to the bottom.  If the branch happens to come
from something like 'if (!foo) break;' rather than a loop condition, that's
fine; it would be tested just as often.  The problem in opt/5076 seems to
have to do with not knowing the full extent of the loop; this
representation doesn't have that problem.  Even if the testcase were

int
main (void)
{
  int iNbr = 1;
  int test = 0;
  while (true)    /* was test == 0 */
    {
      inc ();
      if (iNbr == 0)
        break;
      else
        {
          inc ();
          iNbr--;
        }
      test = 1;
    }
  if (count != 2)
    abort ();
  return 0;
}

We could rotate it to

  goto first;
  while (true)
    {
      inc ();
      iNbr--;
      test = 1;
    first:
      inc ();
      if (iNbr == 0)
        break;
    }

or just suppress rotation because the break condition has an 'else' clause.

As we discussed on Thursday, this would be complicated by a cleanup for a
C++ condition declaration, but as long as all the code to be moved is at
the same nesting level, I don't see any reason why it would be a problem.

Jason

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: Language-independent functions-as-trees representation
  2002-09-03  5:16     ` Jason Merrill
@ 2002-09-04 10:20       ` Paul Brook
  0 siblings, 0 replies; 95+ messages in thread
From: Paul Brook @ 2002-09-04 10:20 UTC (permalink / raw)
  To: gcc

On Monday 02 September 2002 10:03 pm, Jason Merrill wrote:
> On Thu, 29 Aug 2002 14:10:43 -0700, Richard Henderson <rth@redhat.com> 
wrote:
> > On Fri, Aug 23, 2002 at 01:41:28PM +0100, Jason Merrill wrote:
> >> rth has raised some questions about the advisability of using
> >> COMPOUND_EXPR to chain statements; the current scheme uses TREE_CHAIN
> >> of the statements themselves.
> >
> > Perhaps this can be deferred for now by creating some
> > FOR_EACH_STMT (STMT, START) macros that do the iteration.  Then we
> > can replace the iteration implementation more easily later.
>
> Indeed, I've already added a foreach_stmt function to c-simplify.c on the
> bnw-simple-branch.

I haven't looked at the code, but I'd gess this is a language-independant 
function used by the optimizers. If so, could you put it in a l-i file, 
rather than c-simplify.c. Otherwise It'll just mean more work later. There 
are already some functions (eg. deep_copy_node/list) which shouldn't be in 
c-simplify.c, It seems silly to add more.

Basically I'm opposed to anything that isn't C specific going into 
c-simplify.c because this makes maintenance of other languages (in my case 
Fortran95) difficult and requires unneccessary code duplication.

Paul Brook

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: Language-independent functions-as-trees representation
  2002-08-28 14:06   ` Jason Merrill
@ 2002-09-10 17:08     ` Gerald Pfeifer
  0 siblings, 0 replies; 95+ messages in thread
From: Gerald Pfeifer @ 2002-09-10 17:08 UTC (permalink / raw)
  To: Jason Merrill; +Cc: gcc

On Wed, 28 Aug 2002, Jason Merrill wrote:
> FWIW, I've created a new branch, named bnw-simple-branch, for my work in
> progress on the new simplifier.  Please ask me before checking anything
> into it.

How about mentioning this in cvs.html (i.e., both the name of the branch,
a quick description of its purpose and the policy)?

Gerald
-- 
Gerald "Jerry" pfeifer@dbai.tuwien.ac.at http://www.dbai.tuwien.ac.at/~pfeifer/

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: Language-independent functions-as-trees representation
@ 2002-08-28  5:03 Robert Dewar
  0 siblings, 0 replies; 95+ messages in thread
From: Robert Dewar @ 2002-08-28  5:03 UTC (permalink / raw)
  To: jason, neil; +Cc: gcc

> Makes sense.  I'm also interested in what dewar was saying about having
> logical line numbers for instantiations; that sort of thing seems necessary
> if we are going to carry template instantiation context information into
> the tree IR.

For details see the GNAT units sinput.ads/sinput.adb (I don't think you particularly
need to know Ada or many other details of GNAT to follow the approach).

Basically we use a single 32-bit number to represent a source position.

We have a table that shows where each file starts in this 32-bit space, and you
can binary search this table to find out what file a 32-bit value belongs to.

For instantiations we create essentially a phantom copy of the source that has
a separate entry in this table, so that when you go back to the table you
know that the source position is offset bla in file xxx instantiated at yyy
(since yyy is itself a similar 32-bit location, the instantiation chain
can be followed recursively).

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: Language-independent functions-as-trees representation
@ 2002-08-24 12:47 Robert Dewar
  0 siblings, 0 replies; 95+ messages in thread
From: Robert Dewar @ 2002-08-24 12:47 UTC (permalink / raw)
  To: gcc, mszick

>On Friday 23 August 2002 04:35 pm, Robert Dewar wrote:
>>
>> But Four thousand million characrters of sourcers is a heck of a lot.
>> This is typically a hundred million lines of source. I doubt there are
>> any programs in existence where a single compilation requires a hundred
>> million lines of source!
>See: OpenOffice.org

You are mixing up the size of a program which, if you count lots of subsystems
can of course be in this range, with the size of a single compilation which
cannot reasonably be anywhere near in this range.

(for one thing, if there was a single file to be compiled requiring hundreds
of millions of lines to compile, I doubt you would ever see gcc complete
compiling it :-)

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: Language-independent functions-as-trees representation
  2002-08-23 14:35 Robert Dewar
  2002-08-23 15:21 ` Neil Booth
@ 2002-08-24  7:59 ` Michael S. Zick
  1 sibling, 0 replies; 95+ messages in thread
From: Michael S. Zick @ 2002-08-24  7:59 UTC (permalink / raw)
  To: gcc

On Friday 23 August 2002 04:35 pm, Robert Dewar wrote:
>
> But Four thousand million characrters of sourcers is a heck of a lot.
> This is typically a hundred million lines of source. I doubt there are
> any programs in existence where a single compilation requires a hundred
> million lines of source!
See: OpenOffice.org

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: Language-independent functions-as-trees representation
@ 2002-08-23 18:20 Robert Dewar
  0 siblings, 0 replies; 95+ messages in thread
From: Robert Dewar @ 2002-08-23 18:20 UTC (permalink / raw)
  To: dewar, neil; +Cc: gcc, jason

> Figuring out line numbers is painful if all you have is a character
> offset into the preprocessed source, or something similar.

Why? All you need is a sorted line table and you do binary searches. This
is the way things are done in GNAT, and it works just fine.

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: Language-independent functions-as-trees representation
  2002-08-23 15:25 Richard Kenner
@ 2002-08-23 15:33 ` Neil Booth
  0 siblings, 0 replies; 95+ messages in thread
From: Neil Booth @ 2002-08-23 15:33 UTC (permalink / raw)
  To: Richard Kenner; +Cc: gcc

Richard Kenner wrote:-

>     Figuring out line numbers is painful if all you have is a character
>     offset into the preprocessed source, or something similar.
> 
> Why?  You make a map from line number to starting character number and
> do a binary search.

And that array can get absolutely huge, is always being resized, and
adding to it is a very frequent operation.  At which point it is far
from obvious that you're gaining anything.  I imagine the average
compilation unit in GCC touches 10000 entries here frequently;
Fergus was reporting files he compiles with almost a million lines.

The line maps that CPP uses are pretty minimal in comparison (at a
guess, one hundred for an average compilation unit; for every
enter / leave of a file).  It just requires you keep track of column
(2 bytes) for whatever things you care about.  Often that is just the
tokens of the current logical line, and things that stick around like
DECLs.

Without the column info, cpp's token would have 2 bytes of padding, so
it doesn't even cost anything.

Neil.

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: Language-independent functions-as-trees representation
@ 2002-08-23 15:25 Richard Kenner
  2002-08-23 15:33 ` Neil Booth
  0 siblings, 1 reply; 95+ messages in thread
From: Richard Kenner @ 2002-08-23 15:25 UTC (permalink / raw)
  To: neil; +Cc: gcc

    Figuring out line numbers is painful if all you have is a character
    offset into the preprocessed source, or something similar.

Why?  You make a map from line number to starting character number and
do a binary search.

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: Language-independent functions-as-trees representation
  2002-08-23 14:35 Robert Dewar
@ 2002-08-23 15:21 ` Neil Booth
  2002-08-24  7:59 ` Michael S. Zick
  1 sibling, 0 replies; 95+ messages in thread
From: Neil Booth @ 2002-08-23 15:21 UTC (permalink / raw)
  To: Robert Dewar; +Cc: gcc, jason

Robert Dewar wrote:-

> Of course! That's why my message said 4 gigabytes of sources and not
> four gigalines.
> 
> But Four thousand million characrters of sourcers is a heck of a lot.
> This is typically a hundred million lines of source. I doubt there are
> any programs in existence where a single compilation requires a hundred
> million lines of source!

Figuring out line numbers is painful if all you have is a character
offset into the preprocessed source, or something similar.

Neil.

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: Language-independent functions-as-trees representation
@ 2002-08-23 14:35 Robert Dewar
  2002-08-23 15:21 ` Neil Booth
  2002-08-24  7:59 ` Michael S. Zick
  0 siblings, 2 replies; 95+ messages in thread
From: Robert Dewar @ 2002-08-23 14:35 UTC (permalink / raw)
  To: dewar, neil; +Cc: gcc, jason

> > How can that be? 32 bits = 4 gigabytes of sources for one compilation. Pretty
> > hard to believe that there is any program that requires anything close to
> > that.
> 
> As my original message said, there's a column too.

Of course! That's why my message said 4 gigabytes of sources and not
four gigalines.

But Four thousand million characrters of sourcers is a heck of a lot.
This is typically a hundred million lines of source. I doubt there are
any programs in existence where a single compilation requires a hundred
million lines of source!

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: Language-independent functions-as-trees representation
@ 2002-08-23 14:14 Robert Dewar
  2002-08-23 14:14 ` Neil Booth
  0 siblings, 1 reply; 95+ messages in thread
From: Robert Dewar @ 2002-08-23 14:14 UTC (permalink / raw)
  To: dewar, neil; +Cc: gcc, jason

> We discussed this a while ago, and 32-bits is not enough for C etc.
> 
> Neil.


How can that be? 32 bits = 4 gigabytes of sources for one compilation. Pretty
hard to believe that there is any program that requires anything close to
that.

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: Language-independent functions-as-trees representation
  2002-08-23 14:14 Robert Dewar
@ 2002-08-23 14:14 ` Neil Booth
  0 siblings, 0 replies; 95+ messages in thread
From: Neil Booth @ 2002-08-23 14:14 UTC (permalink / raw)
  To: Robert Dewar; +Cc: gcc, jason

Robert Dewar wrote:-

> How can that be? 32 bits = 4 gigabytes of sources for one compilation. Pretty
> hard to believe that there is any program that requires anything close to
> that.

As my original message said, there's a column too.

Neil.

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: Language-independent functions-as-trees representation
  2002-08-23 13:53 Robert Dewar
@ 2002-08-23 14:10 ` Neil Booth
  0 siblings, 0 replies; 95+ messages in thread
From: Neil Booth @ 2002-08-23 14:10 UTC (permalink / raw)
  To: Robert Dewar; +Cc: jason, gcc

Robert Dewar wrote:-

> In GNAT we specify a location by a single 32-bit number which includes information
> about nested generic instantiations (see sinput for details).

We discussed this a while ago, and 32-bits is not enough for C etc.

Neil.

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: Language-independent functions-as-trees representation
@ 2002-08-23 13:53 Robert Dewar
  2002-08-23 14:10 ` Neil Booth
  0 siblings, 1 reply; 95+ messages in thread
From: Robert Dewar @ 2002-08-23 13:53 UTC (permalink / raw)
  To: jason, neil; +Cc: gcc

In GNAT we specify a location by a single 32-bit number which includes information
about nested generic instantiations (see sinput for details).

^ permalink raw reply	[flat|nested] 95+ messages in thread

end of thread, other threads:[~2002-09-11  0:08 UTC | newest]

Thread overview: 95+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2002-07-20 19:47 Language-independent functions-as-trees representation Jason Merrill
2002-07-20 13:58 ` Per Bothner
2002-07-21 22:04   ` Jason Merrill
2002-07-20 22:17 ` Steven Bosscher
2002-07-21  6:20   ` Richard Henderson
2002-07-21  9:43     ` Steven Bosscher
2002-07-21 14:31       ` Richard Henderson
2002-07-21 15:07         ` Daniel Berlin
2002-07-21 15:15         ` Steven Bosscher
2002-07-21  0:09 ` Neil Booth
2002-07-21  0:41   ` Richard Henderson
2002-07-21  0:13 ` Richard Henderson
2002-07-21  7:06 ` Andreas Jaeger
2002-07-21 11:09   ` Pop Sébastian
2002-07-21 11:44     ` Andreas Jaeger
2002-07-21 23:20       ` Phil Edwards
2002-07-22  2:24         ` Daniel Berlin
2002-07-22  6:23           ` Phil Edwards
2002-07-22  7:10       ` where is tree dump utility Sathiskanna
2002-07-22 13:25         ` Diego Novillo
2002-07-21 17:03 ` Language-independent functions-as-trees representation Mark Mitchell
2002-07-21 20:30   ` Jason Merrill
2002-07-21 22:44     ` Mark Mitchell
2002-07-21 23:06       ` Jason Merrill
2002-07-21 23:53         ` Mark Mitchell
2002-07-22  0:13           ` Jason Merrill
2002-07-22  3:17             ` Daniel Berlin
2002-07-22  5:11     ` Daniel Berlin
2002-07-22  7:18       ` Jason Merrill
2002-07-22 13:11         ` Daniel Berlin
2002-07-23 18:20           ` Jason Merrill
2002-07-27 14:10             ` Pop Sébastian
2002-07-30  9:02               ` Jason Merrill
2002-07-22 21:09   ` Diego Novillo
2002-07-22  2:42 ` where is the data Sathiskanna
2002-07-22 19:04 ` Language-independent functions-as-trees representation Diego Novillo
2002-07-23 10:21   ` Fergus Henderson
2002-07-23 11:26     ` Diego Novillo
2002-07-23 11:34       ` Toon Moene
2002-07-23 12:01         ` Diego Novillo
2002-07-23 17:48           ` Pop Sébastian
2002-07-26 15:56             ` Jason Merrill
2002-07-23 17:48   ` Jason Merrill
2002-07-23 18:29     ` Andrew Haley
2002-07-23 19:07     ` Richard Henderson
2002-07-23 23:40       ` Jason Merrill
2002-07-24  3:01         ` Richard Henderson
2002-07-24 11:34           ` Jason Merrill
2002-07-23 21:57     ` Fergus Henderson
2002-07-24  0:22       ` Jason Merrill
2002-08-23  5:41 ` Jason Merrill
2002-08-23  6:00   ` Gabriel Dos Reis
2002-08-23  7:41     ` Jason Merrill
2002-08-23  8:22   ` Pop SÃ©bastian
2002-08-23  8:46     ` Jason Merrill
2002-08-23  8:23   ` Jason Merrill
2002-08-23 12:16   ` Diego Novillo
2002-08-23 15:42     ` Daniel Berlin
2002-08-23 17:26     ` Jason Merrill
2002-08-23 17:42       ` Zack Weinberg
2002-08-23 22:25         ` Per Bothner
2002-08-26  0:07           ` Zack Weinberg
2002-08-26  3:07             ` Neil Booth
2002-08-26  3:18               ` Gabriel Dos Reis
2002-08-28  6:50           ` Jason Merrill
2002-08-28 16:42             ` Per Bothner
2002-08-23 22:31       ` Per Bothner
2002-08-26 19:56       ` Diego Novillo
2002-08-29 14:35         ` Richard Henderson
2002-09-03  5:38           ` Jason Merrill
2002-08-26 16:58     ` Paul Brook
2002-08-29 14:21     ` Richard Henderson
2002-08-23 13:01   ` Neil Booth
2002-08-28  4:44     ` Jason Merrill
2002-08-28  5:17       ` Gabriel Dos Reis
2002-08-28  7:10         ` Jason Merrill
2002-08-28  8:02           ` Diego Novillo
2002-08-28  8:32           ` Gabriel Dos Reis
2002-08-28 14:06   ` Jason Merrill
2002-09-10 17:08     ` Gerald Pfeifer
2002-08-29 14:10   ` Richard Henderson
2002-09-03  5:16     ` Jason Merrill
2002-09-04 10:20       ` Paul Brook
2002-08-23 13:53 Robert Dewar
2002-08-23 14:10 ` Neil Booth
2002-08-23 14:14 Robert Dewar
2002-08-23 14:14 ` Neil Booth
2002-08-23 14:35 Robert Dewar
2002-08-23 15:21 ` Neil Booth
2002-08-24  7:59 ` Michael S. Zick
2002-08-23 15:25 Richard Kenner
2002-08-23 15:33 ` Neil Booth
2002-08-23 18:20 Robert Dewar
2002-08-24 12:47 Robert Dewar
2002-08-28  5:03 Robert Dewar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).