public inbox for gcc@gcc.gnu.org
* Lazy allocation of DECL_ASSEMBLER_NAME
@ 2004-03-01 19:10 Mark Mitchell
  2004-03-01 19:47 ` Geoff Keating
  2004-03-01 20:20 ` Jan Hubicka
  0 siblings, 2 replies; 28+ messages in thread
From: Mark Mitchell @ 2004-03-01 19:10 UTC (permalink / raw)
  To: jh; +Cc: gcc


Jan --

I believe that your cgraph stuff has caused us to be much less lazy
about allocating DECL_ASSEMBLER_NAME than we used to be.  

A while back, I changed DECL_ASSEMBLER_NAME to allocate lazily because
I observed that we were allocating a ton of space for mangled names
for entities that were never emitted.  Now, it looks like cgraph has
gone back to using DECL_ASSEMBLER_NAME unconditionally.

It is basically a bug for any part of the code in the compiler to use
DECL_ASSEMBLER_NAME other than the routines that actually emit
assembly code.  It should be possible to assign DECL_ASSEMBLER_NAMEs
only after the entire translation unit has been parsed, analyzed, and
it has been decided what functions and variables need to be emitted to
the object file.  All other uses are either wrong, or hacks that
should be replaced with a cleaner mechanism.
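
For illustration, the lazy scheme described above might be sketched as
follows.  This is a toy model under stated assumptions, not GCC's tree
code: `toy_decl`, `toy_mangle`, `toy_assembler_name` and the underscore
"mangling" are all hypothetical names invented for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a declaration node; not GCC's real tree structure. */
struct toy_decl {
    const char *source_name;   /* name as written by the user */
    char *assembler_name;      /* NULL until first requested */
};

/* Counts how many names were actually built, to make the saving visible. */
static int names_allocated = 0;

/* Hypothetical mangler: simply prefixes the source name with '_'. */
static char *toy_mangle(const char *source_name)
{
    size_t len = strlen(source_name) + 2;   /* '_' + name + NUL */
    char *out = malloc(len);
    if (out == NULL)
        abort();
    snprintf(out, len, "_%s", source_name);
    names_allocated++;
    return out;
}

/* Lazy accessor: the name is computed on first use and cached after that.
   In the scheme discussed here, only assembly-emitting code would call it. */
static const char *toy_assembler_name(struct toy_decl *d)
{
    assert(d != NULL);
    if (d->assembler_name == NULL)
        d->assembler_name = toy_mangle(d->source_name);
    return d->assembler_name;
}
```

A declaration that is never passed to `toy_assembler_name` keeps a NULL
`assembler_name` and costs no allocation, which is the saving at stake.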

To check for whether two declarations are the same, just compare their
addresses.  If a front end creates two FUNCTION_DECLs or VAR_DECLs
that happen to have the same DECL_ASSEMBLER_NAME that's either a bug
or an intentional trick on the part of the front end: by the time
things get to the middle end there should be only one DECL for each
declared entity.

Would you please see if you can fix this problem -- after fixing some
of the wrong-code/ICE problems in 3.4?

Thanks,

--
Mark Mitchell
CodeSourcery, LLC
mark@codesourcery.com



* Re: Lazy allocation of DECL_ASSEMBLER_NAME
  2004-03-01 19:10 Lazy allocation of DECL_ASSEMBLER_NAME Mark Mitchell
@ 2004-03-01 19:47 ` Geoff Keating
  2004-03-01 19:57   ` Zack Weinberg
                     ` (2 more replies)
  2004-03-01 20:20 ` Jan Hubicka
  1 sibling, 3 replies; 28+ messages in thread
From: Geoff Keating @ 2004-03-01 19:47 UTC (permalink / raw)
  To: mark; +Cc: gcc

Mark Mitchell <mark@codesourcery.com> writes:

> Jan --
> 
> I believe that your cgraph stuff has caused us to be much less lazy
> about allocating DECL_ASSEMBLER_NAME than we used to be.  
> 
> A while back, I changed DECL_ASSEMBLER_NAME to allocate lazily because
> I observed that we were allocating a ton of space for mangled names
> for entities that were never emitted.  Now, it looks like cgraph has
> gone back to using DECL_ASSEMBLER_NAME unconditionally.
> 
> It is basically a bug for any part of the code in the compiler to use
> DECL_ASSEMBLER_NAME other than the routines that actually emit
> assembly code.  It should be possible to assign DECL_ASSEMBLER_NAMEs
> only after the entire translation unit has been parsed, analyzed, and
> it has been decided what functions and variables need to be emitted to
> the object file.  All other uses are either wrong, or hacks that
> should be replaced with a cleaner mechanism.

Hi Mark,

Could you mention this in the documentation of DECL_ASSEMBLER_NAME in
c-tree.texi?

> To check for whether two declarations are the same, just compare their
> addresses.

You mean, the addresses of the trees, or the addresses of the objects
(presumably you would use fold and see if it was integer_zero_node or not)?

> If a front end creates two FUNCTION_DECLs or VAR_DECLs
> that happen to have the same DECL_ASSEMBLER_NAME that's either a bug
> or an intentional trick on the part of the front end: by the time
> things get to the middle end there should be only one DECL for each
> declared entity.

I don't believe that the frontends actually ensure this.  For instance,

extern int x;
extern int y asm ("x");

will produce two DECLs that refer to the same integer.

Likewise, with Zack's c-decl changes, I believe that it
always returns the new DECL, which means that in code like:

extern int foo();
int bar(void) { return foo(1); }
extern int foo(int);
int baz(void) { return foo(2); }

bar will reference one DECL of foo, and baz will reference a different
one.  I think there are cases (perhaps even this example) where it'll
do this even without Zack's changes.

-- 
- Geoffrey Keating <geoffk@geoffk.org>


* Re: Lazy allocation of DECL_ASSEMBLER_NAME
  2004-03-01 19:47 ` Geoff Keating
@ 2004-03-01 19:57   ` Zack Weinberg
  2004-03-01 20:21     ` Geoff Keating
  2004-03-01 20:11   ` Mark Mitchell
  2004-03-02 17:04   ` Mark Mitchell
  2 siblings, 1 reply; 28+ messages in thread
From: Zack Weinberg @ 2004-03-01 19:57 UTC (permalink / raw)
  To: Geoff Keating; +Cc: mark, gcc

Geoff Keating <geoffk@geoffk.org> writes:

> Likewise, I believe with Zack's c-decl changes, I believe that it
> always returns the new DECL,

I can't speak to anything else here, but this is not true; I
considered that approach and rejected it.  The thrust of the
(currently incomplete) work is now to always use the oldest decl,
adding information from later ones as necessary.  Cases where that
does not happen are bugs - the bugs I've been trying to fix; the whole
problem is that the middle end is not prepared to cope with multiple
DECLs for the same object.
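
As a rough sketch of the "always use the oldest decl" approach, assuming
a toy function-declaration record rather than a real FUNCTION_DECL: the
struct `toy_fn_decl` and the helper `toy_merge` are hypothetical and are
not the front end's merge_decls or duplicate_decls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy record for a function declaration; not GCC's FUNCTION_DECL. */
struct toy_fn_decl {
    const char *name;
    bool has_prototype;   /* false for an old-style "int foo();" */
    int param_count;      /* meaningful only when has_prototype is true */
};

/* Fold whatever a later declaration adds into the oldest one and return
   the oldest, so that every reference keeps pointing at a single node.
   Hypothetical helper: only prototype information is merged here. */
static struct toy_fn_decl *toy_merge(struct toy_fn_decl *oldest,
                                     const struct toy_fn_decl *newer)
{
    assert(oldest != NULL && newer != NULL);
    if (!oldest->has_prototype && newer->has_prototype) {
        oldest->has_prototype = true;
        oldest->param_count = newer->param_count;
    }
    return oldest;
}
```

With "extern int foo();" followed later by "extern int foo(int);", both
callers would end up holding the same pointer, and that one node would
carry the prototype.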

zw


* Re: Lazy allocation of DECL_ASSEMBLER_NAME
  2004-03-01 19:47 ` Geoff Keating
  2004-03-01 19:57   ` Zack Weinberg
@ 2004-03-01 20:11   ` Mark Mitchell
  2004-03-01 20:19     ` Zack Weinberg
  2004-03-02 17:04   ` Mark Mitchell
  2 siblings, 1 reply; 28+ messages in thread
From: Mark Mitchell @ 2004-03-01 20:11 UTC (permalink / raw)
  To: Geoff Keating; +Cc: gcc


>Could you mention this in the documentation of DECL_ASSEMBLER_NAME in
>c-tree.texi?
>  
>
Yes.

>>To check for whether two declarations are the same, just compare their
>>addresses.
>>    
>>
>
>You mean, the addresses of the trees
>  
>
Yes, I mean the addresses of the trees.  If that makes PCH unhappy, then 
we use DECL_UID.  (Much though I think that wasting a word in every DECL 
for a UID when the address is *already* a UID is goofy.  The EDG front 
end is a worked example demonstrating that you can use pointers as 
indices and still make PCH work by swizzling the pointers on the way 
in.  We really should do that; it would allow us to save memory and 
would make PCH more robust.)

>>If a front end creates two FUNCTION_DECLs or VAR_DECLs
>>that happen to have the same DECL_ASSEMBLER_NAME that's either a bug
>>or an intentional trick on the part of the front end: by the time
>>things get to the middle end there should be only one DECL for each
>>declared entity.
>>    
>>
>
>I don't believe that the frontends actually ensure this.  For instance,
>
>extern int x;
>extern int y asm ("x");
>
>will produce two DECLs that refer to the same integer.
>  
>
Yes -- but that's the user's issue.

From the point of view of the compiler, we should still treat these as 
two separate variables.

You could even do
 
  extern float y asm ("x");

and I don't expect that we would presently give you a warning.

I certainly don't think that should be an error -- I think it's 
perfectly valid input -- but of course the user must be very careful 
about type-based aliasing rules if playing such games.

The asm-specifier should have nothing to do with anything except what 
name is put out in the assembly file.
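
To make the distinction concrete, here is a small sketch that assumes a
toy declaration record; `toy_decl`, `same_entity` and `same_symbol` are
hypothetical names, not GCC interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Toy declaration: a source-level name plus the name used in the output. */
struct toy_decl {
    const char *source_name;
    const char *asm_name;   /* what the assembly file would call it */
};

/* Identity test: one node per declared entity, so compare addresses. */
static bool same_entity(const struct toy_decl *a, const struct toy_decl *b)
{
    assert(a != NULL && b != NULL);
    return a == b;
}

/* Name test: says only that both would be written out under one symbol. */
static bool same_symbol(const struct toy_decl *a, const struct toy_decl *b)
{
    assert(a != NULL && b != NULL);
    return strcmp(a->asm_name, b->asm_name) == 0;
}
```

For "extern int x;" and "extern int y asm ("x");" the name test would
succeed while the identity test fails, matching the view that the
compiler treats them as two variables that happen to share an output
name.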

>Likewise, I believe with Zack's c-decl changes, I believe that it
>always returns the new DECL, which means that in code like:
>
>extern int foo();
>int bar(void) { return foo(1); }
>extern int foo(int);
>int baz(void) { return foo(2); }
>
>bar will reference one DECL of foo, and baz will reference a different
>one.  I think there are cases (perhaps even this example) where it'll
>do this even without Zack's changes.
>  
>
The point of Zack's changes was (in part) to eliminate having multiple 
copies of a FUNCTION_DECL for foo.

I'm not 100% sure that this was fully completed in the current patch, 
but if not, it will be soon.

There should be just one FUNCTION_DECL for "foo", and by the time we 
reach the middle end, its type should be "int ()(int)".

-- 
Mark Mitchell
CodeSourcery, LLC
(916) 791-8304
mark@codesourcery.com


* Re: Lazy allocation of DECL_ASSEMBLER_NAME
  2004-03-01 20:11   ` Mark Mitchell
@ 2004-03-01 20:19     ` Zack Weinberg
  0 siblings, 0 replies; 28+ messages in thread
From: Zack Weinberg @ 2004-03-01 20:19 UTC (permalink / raw)
  To: Mark Mitchell; +Cc: Geoff Keating, gcc

Mark Mitchell <mark@codesourcery.com> writes:

> Yes, I mean the addresses of the trees.  If that makes PCH unhappy,
> then we use DECL_UID.  (Much though I think that wasting a word in
> every DECL for a UID when the address is *already* a UID is goofy.
> The EDG front end is a worked example demonstrating that you can use
> pointers as indices and still make PCH work by swizzling the pointers
> on the way in.  We really should do that; it would allow us to save
> memory and would make PCH more robust.)

In fact I have a patch waiting for me to get time to finish it up,
which removes DECL_UID and TYPE_UID entirely with no noticeable ill
effect.  There is *one* use of each in the 3.4 tree, and in both cases
pointer hashing is safe.

I'll be keeping the field itself around in mainline, because rth says
it's actually useful on tree-ssa somehow, although I hope that another
solution can be found.

zw


* Re: Lazy allocation of DECL_ASSEMBLER_NAME
  2004-03-01 19:10 Lazy allocation of DECL_ASSEMBLER_NAME Mark Mitchell
  2004-03-01 19:47 ` Geoff Keating
@ 2004-03-01 20:20 ` Jan Hubicka
  2004-03-01 20:27   ` Mark Mitchell
  1 sibling, 1 reply; 28+ messages in thread
From: Jan Hubicka @ 2004-03-01 20:20 UTC (permalink / raw)
  To: Mark Mitchell; +Cc: jh, gcc

> 
> Jan --
> 
> I believe that your cgraph stuff has caused us to be much less lazy
> about allocating DECL_ASSEMBLER_NAME than we used to be.  
> 
> A while back, I changed DECL_ASSEMBLER_NAME to allocate lazily because
> I observed that we were allocating a ton of space for mangled names
> for entities that were never emitted.  Now, it looks like cgraph has
> gone back to using DECL_ASSEMBLER_NAME unconditionally.
> 
> It is basically a bug for any part of the code in the compiler to use
> DECL_ASSEMBLER_NAME other than the routines that actually emit
> assembly code.  It should be possible to assign DECL_ASSEMBLER_NAMEs
> only after the entire translation unit has been parsed, analyzed, and
> it has been decided what functions and variables need to be emitted to
> the object file.  All other uses are either wrong, or hacks that
> should be replaced with a cleaner mechanism.
> 
> To check for whether two declarations are the same, just compare their
> addresses.  If a front end creates two FUNCTION_DECLs or VAR_DECLs
> that happen to have the same DECL_ASSEMBLER_NAME that's either a bug
> or an intentional trick on the part of the front end: by the time
> things get to the middle end there should be only one DECL for each
> declared entity.
> 
> Would you please see if you can fix this problem -- after fixing some
> of the wrong-code/ICE problems in 3.4?

I will try to look into it, but last time I tried to do so, both C and
C++ frontends were creating separate declarations for local declarations,
forward declarations and such, so a single function got many DECLs.
Also we can't hash the addresses directly, as the hashtable is saved into
PCH headers, so I am unsure about a better solution to the hashing.
I need at least one entity that is stable across PCH and multiple
declarations...

Honza
> 
> Thanks,
> 
> --
> Mark Mitchell
> CodeSourcery, LLC
> mark@codesourcery.com
> 
> 


* Re: Lazy allocation of DECL_ASSEMBLER_NAME
  2004-03-01 19:57   ` Zack Weinberg
@ 2004-03-01 20:21     ` Geoff Keating
  2004-03-01 20:25       ` Mark Mitchell
  2004-03-01 21:02       ` Dale Johannesen
  0 siblings, 2 replies; 28+ messages in thread
From: Geoff Keating @ 2004-03-01 20:21 UTC (permalink / raw)
  To: zack, dalej; +Cc: mark, gcc

> From: Zack Weinberg <zack@codesourcery.com>
> Date: Mon, 01 Mar 2004 11:57:56 -0800

> Geoff Keating <geoffk@geoffk.org> writes:
> 
> > Likewise, I believe with Zack's c-decl changes, I believe that it
> > always returns the new DECL,
> 
> I can't speak to anything else here, but this is not true; I
> considered that approach and rejected it.  The thrust of the
> (currently incomplete) work is now to always use the oldest decl,
> adding information from later ones as necessary.  Cases where that
> does not happen are bugs - the bugs I've been trying to fix; the whole
> problem is that the middle end is not prepared to cope with multiple
> DECLs for the same object.

That does sound like a good plan.  The difficulty will be in ensuring
that the newer decl is not used before duplicate_decls is called on
the pair of decls.  In particular, this will require some changes to
IMA: merge_translation_unit_decls will need to go away because
duplicate_decls will have to be called when the decl is seen rather
than later on.  I've wanted to do this ever since we removed the GCC
extension that made it necessary to do this as a final pass, but don't
have time right now.

Is this going to be necessary for 3.4?  It sounds like a lot of work
very late in the release cycle.  It may be better to have the
middle-end compare DECL_ASSEMBLER_NAME for 3.4 and make these changes
in 3.5.

(I've added Dale since he was struggling with a related problem in the
tree-ssa branch, and neither he nor I knew about this new requirement
for 'one DECL' in the backend.)

-- 
- Geoffrey Keating <geoffk@geoffk.org>


* Re: Lazy allocation of DECL_ASSEMBLER_NAME
  2004-03-01 20:21     ` Geoff Keating
@ 2004-03-01 20:25       ` Mark Mitchell
  2004-03-01 21:15         ` Zack Weinberg
  2004-03-01 21:02       ` Dale Johannesen
  1 sibling, 1 reply; 28+ messages in thread
From: Mark Mitchell @ 2004-03-01 20:25 UTC (permalink / raw)
  To: Geoff Keating; +Cc: zack, dalej, gcc

Geoff Keating wrote:

>>From: Zack Weinberg <zack@codesourcery.com>
>>Date: Mon, 01 Mar 2004 11:57:56 -0800
>>    
>>
>
>  
>
>>Geoff Keating <geoffk@geoffk.org> writes:
>>
>>    
>>
>>>Likewise, I believe with Zack's c-decl changes, I believe that it
>>>always returns the new DECL,
>>>      
>>>
>>I can't speak to anything else here, but this is not true; I
>>considered that approach and rejected it.  The thrust of the
>>(currently incomplete) work is now to always use the oldest decl,
>>adding information from later ones as necessary.  Cases where that
>>does not happen are bugs - the bugs I've been trying to fix; the whole
>>problem is that the middle end is not prepared to cope with multiple
>>DECLs for the same object.
>>    
>>
>
>That does sound like a good plan.  The difficulty will be in ensuring
>that the newer decl is not used before duplicate_decls is called on
>the pair of decls; in particular, this will require some changes to
>IMA, merge_translation_unit_decls will need to go away because
>duplicate_decls will have to be called when the decl is seen rather
>than later on.  I've wanted to do this ever since we removed the GCC
>extension that made it necessary to do this as a final pass, but don't
>have time right now.
>
>Is this going to be necessary for 3.4?  
>
No.

Unfortunately, we hosed ourselves for 3.4.  Zack started on this work to 
fix some problems in the C front end that have been there forever, and 
unfortunately didn't finish the project.  So, now we have some old bugs 
fixed but we also have some regressions.  As you say, it's too dramatic 
a change to try to fix all of the problems for 3.4, so we're going to 
have to live with some regressions.

-- 
Mark Mitchell
CodeSourcery, LLC
(916) 791-8304
mark@codesourcery.com


* Re: Lazy allocation of DECL_ASSEMBLER_NAME
  2004-03-01 20:20 ` Jan Hubicka
@ 2004-03-01 20:27   ` Mark Mitchell
  2004-03-01 20:33     ` Jan Hubicka
                       ` (2 more replies)
  0 siblings, 3 replies; 28+ messages in thread
From: Mark Mitchell @ 2004-03-01 20:27 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: gcc


>I will try to look into it, but last time I tried to do so, both C and
>C++ frontends were creating separate declarations for local declarations,
>  
>
If so, those are bugs -- or cases where the front ends really want you 
to think of those as separate objects.

>forward declarations and such, so a single function got many DECLs.
>Also we can't hash the addresses directly, as the hashtable is saved into
>PCH headers, so I am unsure about a better solution to the hashing.
>I need at least one entity that is stable across PCH and multiple
>declarations...
>  
>
Use DECL_UID.

But, really, someone should just fix PCH to do the obvious pointer 
swizzling and/or hash-table rebuilds.  It's silly to be inventing new 
uniquifiers, computing them, filling up memory with them, etc., in lots 
of places when we could just fix the PCH machinery.
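
The pointer swizzling mentioned here can be sketched roughly as below.
This is a minimal illustration, not how GCC's PCH code works: it assumes
all nodes live in one array, and `toy_node`, `toy_saved_node`, `toy_save`
and `toy_load` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* In-memory form: nodes refer to each other through real pointers. */
struct toy_node {
    int value;
    struct toy_node *next;
};

/* Saved form: the pointer is replaced by an index into the node array. */
struct toy_saved_node {
    int value;
    int32_t next_index;      /* -1 stands for a null pointer */
};

/* Saving: turn each pointer into an index relative to the array base.
   Assumes every non-null `next` points into `nodes[0..n-1]`. */
static void toy_save(const struct toy_node *nodes, size_t n,
                     struct toy_saved_node *out)
{
    assert(nodes != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        out[i].value = nodes[i].value;
        out[i].next_index =
            nodes[i].next ? (int32_t)(nodes[i].next - nodes) : -1;
    }
}

/* Loading: swizzle indices back into pointers relative to wherever the
   array lives now, which may differ from where it lived when saved. */
static void toy_load(const struct toy_saved_node *saved, size_t n,
                     struct toy_node *nodes)
{
    assert(saved != NULL && nodes != NULL);
    for (size_t i = 0; i < n; i++) {
        nodes[i].value = saved[i].value;
        nodes[i].next =
            saved[i].next_index >= 0 ? &nodes[saved[i].next_index] : NULL;
    }
}
```

Any table keyed on node addresses would then be rebuilt after loading,
since the addresses it hashed before saving are no longer valid.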

-- 
Mark Mitchell
CodeSourcery, LLC
(916) 791-8304
mark@codesourcery.com


* Re: Lazy allocation of DECL_ASSEMBLER_NAME
  2004-03-01 20:27   ` Mark Mitchell
@ 2004-03-01 20:33     ` Jan Hubicka
  2004-03-01 21:19     ` Zack Weinberg
  2004-03-01 21:19     ` Geoff Keating
  2 siblings, 0 replies; 28+ messages in thread
From: Jan Hubicka @ 2004-03-01 20:33 UTC (permalink / raw)
  To: Mark Mitchell; +Cc: Jan Hubicka, gcc

> 
> >I will try to look into it, but last time I tried to do so, both C and
> >C++ frontends were creating separate declarations for local declarations,
> > 
> >
> If so, those are bugs -- or cases where the front ends really want you 
> to think of those as separate objects.

Tricking me into thinking that the objects are separate when they are
not is a problem.  The observation about multiple decls is pretty old and
Zack already mentioned that he should've fixed them, so hopefully it will
work.  But the backward inlining definitely didn't work even for trivial
testcases then.
> 
> >forward declarations and such, so a single function got many DECLs.
> >Also we can't hash the addresses directly, as the hashtable is saved into
> >PCH headers, so I am unsure about a better solution to the hashing.
> >I need at least one entity that is stable across PCH and multiple
> >declarations...
> > 
> >
> Use DECL_UID.
> 
> But, really, someone should just fix PCH to do the obvious pointer 
> swizzling and/or hash-table rebuilds.  It's silly to be inventing new 
> uniquifiers, computing them, filling up memory with them, etc., in lots 
> of places when we could just fix the PCH machinery.

OK, I will give it a try and see if it works.  (BTW the situation of
DECL_RTL set is not terribly bad - we create these only for DECLs
finalized and a relatively high portion of these gets used.  I used to
have some data for this, but can't find them).

Thanks,
Honza
> 
> -- 
> Mark Mitchell
> CodeSourcery, LLC
> (916) 791-8304
> mark@codesourcery.com
> 


* Re: Lazy allocation of DECL_ASSEMBLER_NAME
  2004-03-01 20:21     ` Geoff Keating
  2004-03-01 20:25       ` Mark Mitchell
@ 2004-03-01 21:02       ` Dale Johannesen
  2004-03-01 21:45         ` Mark Mitchell
  1 sibling, 1 reply; 28+ messages in thread
From: Dale Johannesen @ 2004-03-01 21:02 UTC (permalink / raw)
  To: Geoff Keating; +Cc: gcc, zack, mark


On Mar 1, 2004, at 12:21 PM, Geoff Keating wrote:

>> From: Zack Weinberg <zack@codesourcery.com>
>> Date: Mon, 01 Mar 2004 11:57:56 -0800
>
>> Geoff Keating <geoffk@geoffk.org> writes:
>>
>>> Likewise, I believe with Zack's c-decl changes, I believe that it
>>> always returns the new DECL,
>>
>> I can't speak to anything else here, but this is not true; I
>> considered that approach and rejected it.  The thrust of the
>> (currently incomplete) work is now to always use the oldest decl,
>> adding information from later ones as necessary.  Cases where that
>> does not happen are bugs - the bugs I've been trying to fix; the whole
>> problem is that the middle end is not prepared to cope with multiple
>> DECLs for the same object.
>
> That does sound like a good plan.  The difficulty will be in ensuring
> that the newer decl is not used before duplicate_decls is called on
> the pair of decls; in particular, this will require some changes to
> IMA, merge_translation_unit_decls will need to go away because
> duplicate_decls will have to be called when the decl is seen rather
> than later on.  I've wanted to do this ever since we removed the GCC
> extension that made it necessary to do this as a final pass, but don't
> have time right now.
>
> Is this going to be necessary for 3.4?  It sounds like a lot of work
> very late in the release cycle.  It may be better to have the
> middle-end compare DECL_ASSEMBLER_NAME for 3.4 and make these changes
> in 3.5.
>
> (I've added Dale since he was struggling with a related problem in the
> tree-ssa branch, and neither he nor I knew about this new requirement
> for 'one DECL' in the backend.)

Thanks.  The problem I'm struggling with involves use of multiple
TYPE_DECL nodes for "the same" type multiply defined in different files.
Geoff has posted an example showing that compatibility of types is not
transitive in C99, which I believe implies you can't unify all the
compatible types into one node, and further implies that comparisons for
type compatibility can't just test for pointer equality, as they do now
in many places in tree-ssa.  This is a mess; beyond the correctness
problem, going through the full rigmarole for type comparisons all the
time has the potential for introducing compile speed regressions.  I am
thinking along the lines of keeping a hash table or some such for
remembering the result of type1=?=type2.
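
One way such a cache might look, as a sketch only: a small direct-mapped
table keyed on the pair of type nodes.  Everything here is assumed for
illustration (`toy_type`, `toy_compatible_slow`, `toy_compatible`, the
table size, the hash), and because it hashes raw addresses it would not
survive being saved into a PCH file as is.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy type node; the real comparison would look at far more than this. */
struct toy_type { int kind; int size; };

/* Counts full comparisons so the effect of the cache can be observed. */
static int slow_compares = 0;

/* Stand-in for the expensive structural compatibility check. */
static bool toy_compatible_slow(const struct toy_type *a,
                                const struct toy_type *b)
{
    slow_compares++;
    return a->kind == b->kind && a->size == b->size;
}

#define CACHE_SLOTS 64

struct cache_entry {
    const struct toy_type *a, *b;
    bool result;
    bool valid;
};
static struct cache_entry cache[CACHE_SLOTS];

/* Memoized check.  Results are stored per pair, never inferred through a
   third type, so a non-transitive relation is handled correctly.  The
   relation is assumed symmetric, so the pair is put in a fixed order. */
static bool toy_compatible(const struct toy_type *a, const struct toy_type *b)
{
    assert(a != NULL && b != NULL);
    if (a == b)
        return true;
    if ((uintptr_t)a > (uintptr_t)b) {
        const struct toy_type *tmp = a;
        a = b;
        b = tmp;
    }
    size_t slot = (((uintptr_t)a >> 4) ^ ((uintptr_t)b >> 7)) % CACHE_SLOTS;
    struct cache_entry *e = &cache[slot];
    if (e->valid && e->a == a && e->b == b)
        return e->result;
    e->a = a;
    e->b = b;
    e->result = toy_compatible_slow(a, b);
    e->valid = true;
    return e->result;
}
```

A colliding pair simply overwrites the slot, so the cache stays bounded
at the cost of occasionally repeating a full comparison.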


* Re: Lazy allocation of DECL_ASSEMBLER_NAME
  2004-03-01 20:25       ` Mark Mitchell
@ 2004-03-01 21:15         ` Zack Weinberg
  2004-03-01 21:46           ` Geoff Keating
  0 siblings, 1 reply; 28+ messages in thread
From: Zack Weinberg @ 2004-03-01 21:15 UTC (permalink / raw)
  To: Mark Mitchell; +Cc: Geoff Keating, dalej, gcc

Mark Mitchell <mark@codesourcery.com> writes:
> Geoff Keating wrote:
>>>From: Zack Weinberg <zack@codesourcery.com>
>>>The thrust of the (currently incomplete) work is now to always use
>>>the oldest decl, adding information from later ones as necessary.
>>
>>That does sound like a good plan.  The difficulty will be in ensuring
>>that the newer decl is not used before duplicate_decls is called on
>>the pair of decls; in particular, this will require some changes to
>>IMA, merge_translation_unit_decls will need to go away because
>>duplicate_decls will have to be called when the decl is seen rather
>>than later on.

Yes.  I have partial code implementing this.  The fiddliest part is
the nontransitive type comparison problem that Dale is encountering.
I personally think that the front end should go to whatever lengths
are necessary to present just one type to the language-independent
compiler, even across multiple translation units, even in the presence
of a nontransitive type system.  I think this is easier than it might
sound - the trick I have in mind is permuting TYPE_MAIN_VARIANT so
that it's always the appropriate choice for *this* translation unit.

Since these patches are only going onto mainline for the time being, I
expect the next couple patches will fix all the problems with
single-translation-unit mode at the expense of disabling IMA, and then
we can come back to IMA's problems.

>> Is this going to be necessary for 3.4?
>>
> No.
>
> Unfortunately, we hosed ourselves for 3.4.  Zack started on this work
> to fix some problems in the C front end that have been there forever,
> and unfortunately didn't finish the project.  So, now we have some old
> bugs fixed but we also have some regressions.  As you say, it's too
> dramatic a change to try to fix all of the problems for 3.4, so we're
> going to have to live with some regressions.

I'm hoping that the patches will prove stable enough to backport in
the 3.4.1 time frame, and I'm maintaining a 3.4 backport tree with
that in mind.

zw


* Re: Lazy allocation of DECL_ASSEMBLER_NAME
  2004-03-01 20:27   ` Mark Mitchell
  2004-03-01 20:33     ` Jan Hubicka
  2004-03-01 21:19     ` Zack Weinberg
@ 2004-03-01 21:19     ` Geoff Keating
  2 siblings, 0 replies; 28+ messages in thread
From: Geoff Keating @ 2004-03-01 21:19 UTC (permalink / raw)
  To: Mark Mitchell; +Cc: gcc

Mark Mitchell <mark@codesourcery.com> writes:

> >I will try to look into it, but last time I tried to do so, both C and
> >C++ frontends were creating separate declarations for local declarations,
> >
> If so, those are bugs -- or cases where the front ends really want
> you to think of those as separate objects.
> 
> >forward declarations and such, so a single function got many DECLs.
> >Also we can't hash the addresses directly, as the hashtable is saved into
> >PCH headers, so I am unsure about a better solution to the hashing.
> >I need at least one entity that is stable across PCH and multiple
> >declarations...
> >
> Use DECL_UID.
> 
> But, really, someone should just fix PCH to do the obvious pointer
> swizzling and/or hash-table rebuilds.  It's silly to be inventing new
> uniquifiers, computing them, filling up memory with them, etc., in
> lots of places when we could just fix the PCH machinery.

You'll want to be doing some benchmarks before committing to one
approach...

-- 
- Geoffrey Keating <geoffk@geoffk.org>


* Re: Lazy allocation of DECL_ASSEMBLER_NAME
  2004-03-01 20:27   ` Mark Mitchell
  2004-03-01 20:33     ` Jan Hubicka
@ 2004-03-01 21:19     ` Zack Weinberg
  2004-03-01 21:19     ` Geoff Keating
  2 siblings, 0 replies; 28+ messages in thread
From: Zack Weinberg @ 2004-03-01 21:19 UTC (permalink / raw)
  To: Mark Mitchell; +Cc: Jan Hubicka, gcc

Mark Mitchell <mark@codesourcery.com> writes:

>>forward declarations and such, so a single function got many DECLs.
>>Also we can't hash the addresses directly, as the hashtable is saved into
>>PCH headers, so I am unsure about a better solution to the hashing.
>>I need at least one entity that is stable across PCH and multiple
>>declarations...
>>
> Use DECL_UID.

DECL_UID will not work: the multiple decls that we currently have for
the same symbol do not share a DECL_UID (there is special code in
copy_node, merge_decls (C), duplicate_decls (C++) to ensure that they
do not).

zw


* Re: Lazy allocation of DECL_ASSEMBLER_NAME
  2004-03-01 21:02       ` Dale Johannesen
@ 2004-03-01 21:45         ` Mark Mitchell
  2004-03-01 21:48           ` Dale Johannesen
  0 siblings, 1 reply; 28+ messages in thread
From: Mark Mitchell @ 2004-03-01 21:45 UTC (permalink / raw)
  To: Dale Johannesen; +Cc: Geoff Keating, gcc, zack

Dale Johannesen wrote:

> Thanks.  The problem I'm struggling with involves use of multiple
> TYPE_DECL nodes for "the same" type multiply defined in different files.

Well, IMA is a whole 'nother can of worms.

I think our current approach is overly fragile, but that's not really 
material here.

TYPE_DECLs are not covered by my earlier statements.  We already have 
multiple TYPE_DECLs for the same type because some of our TYPE_DECLs 
refer to typedefs, which are of course the "same type" from the point of 
view of the language.  My statement should refer to FUNCTION_DECLs, 
VAR_DECLs, and CONST_DECLs for enumeration constants.

-- 
Mark Mitchell
CodeSourcery, LLC
(916) 791-8304
mark@codesourcery.com


* Re: Lazy allocation of DECL_ASSEMBLER_NAME
  2004-03-01 21:15         ` Zack Weinberg
@ 2004-03-01 21:46           ` Geoff Keating
  2004-03-01 22:03             ` Mark Mitchell
  0 siblings, 1 reply; 28+ messages in thread
From: Geoff Keating @ 2004-03-01 21:46 UTC (permalink / raw)
  To: zack; +Cc: mark, dalej, gcc

> From: Zack Weinberg <zack@codesourcery.com>
> Date: Mon, 01 Mar 2004 13:15:52 -0800

> Mark Mitchell <mark@codesourcery.com> writes:
> > Geoff Keating wrote:
> >>>From: Zack Weinberg <zack@codesourcery.com>
> >>>The thrust of the (currently incomplete) work is now to always use
> >>>the oldest decl, adding information from later ones as necessary.
> >>
> >>That does sound like a good plan.  The difficulty will be in ensuring
> >>that the newer decl is not used before duplicate_decls is called on
> >>the pair of decls; in particular, this will require some changes to
> >>IMA, merge_translation_unit_decls will need to go away because
> >>duplicate_decls will have to be called when the decl is seen rather
> >>than later on.
> 
> Yes.  I have partial code implementing this.  The fiddliest part is
> the nontransitive type comparison problem that Dale is encountering.
> I personally think that the front end should go to whatever lengths
> are necessary to present just one type to the language-independent
> compiler, even across multiple translation units, even in the presence
> of a nontransitive type system.  I think this is easier than it might
> sound - the trick I have in mind is permuting TYPE_MAIN_VARIANT so
> that it's always the appropriate choice for *this* translation unit.

It might be difficult to determine which translation unit is "this
unit" at any given point; you might have code that does:


struct foo;
extern struct foo * make_a_foo (void);
extern void use_a_foo (struct foo *);

extern struct foo * make_another_foo (void);
extern void use_another_foo (struct foo *);

void bar(void) {
  struct foo * f1, *f2;

  f1 = make_a_foo ();
  f2 = make_another_foo ();
  use_a_foo (f1);
  use_another_foo (f2);
}

where the two 'struct foo's are different in the translation units
that implement make_a_foo and make_another_foo, and then have all four
routines inlined into 'bar' so that 'bar' needs to see both versions
of the structure.  Of course, in a real program it wouldn't be done in
such a bare-faced fashion; bar() itself might have been constructed
through inlining, for example.

Worse, you might see this file first, so you won't know about the
problem until later in parsing, which will be after you've already
unified the declarations in this file.

-- 
- Geoffrey Keating <geoffk@geoffk.org>


* Re: Lazy allocation of DECL_ASSEMBLER_NAME
  2004-03-01 21:45         ` Mark Mitchell
@ 2004-03-01 21:48           ` Dale Johannesen
  0 siblings, 0 replies; 28+ messages in thread
From: Dale Johannesen @ 2004-03-01 21:48 UTC (permalink / raw)
  To: Mark Mitchell; +Cc: gcc, zack, Geoff Keating

On Mar 1, 2004, at 1:45 PM, Mark Mitchell wrote:
> Dale Johannesen wrote:
>
>> Thanks.  The problem I'm struggling with involves use of multiple
>> TYPE_DECL nodes for "the same" type multiply defined in different files.
>
> Well, IMA is a whole 'nother can of worms.
> TYPE_DECLs are not covered by my earlier statements.  We already have 
> multiple TYPE_DECLs for the same type because some of our TYPE_DECLs 
> refer to typedefs, which are of course the "same type" from the point 
> of view of the language.
>  My statement should refer to FUNCTION_DECLs, VAR_DECLs, and 
> CONST_DECLs for enumeration constants.

Thanks for the clarification.


* Re: Lazy allocation of DECL_ASSEMBLER_NAME
  2004-03-01 21:46           ` Geoff Keating
@ 2004-03-01 22:03             ` Mark Mitchell
  2004-03-01 22:54               ` Geoff Keating
  0 siblings, 1 reply; 28+ messages in thread
From: Mark Mitchell @ 2004-03-01 22:03 UTC (permalink / raw)
  To: Geoff Keating; +Cc: zack, dalej, gcc

Geoff Keating wrote:

>>Yes.  I have partial code implementing this.  The fiddliest part is
>>the nontransitive type comparison problem that Dale is encountering.
>>I personally think that the front end should go to whatever lengths
>>are necessary to present just one type to the language-independent
>>compiler, even across multiple translation units, even in the presence
>>of a nontransitive type system.  I think this is easier than it might
>>sound - the trick I have in mind is permuting TYPE_MAIN_VARIANT so
>>that it's always the appropriate choice for *this* translation unit.
>>    
>>
>
>It might be difficult to determine which translation unit is "this
>unit" at any given point; you might have code that does:
>  
>
There are lots of things that are hard about inter-module analysis, as 
you of course know. :-)

It's an inherent design problem: we're essentially trying to combine 
multiple C translation units into a single C translation unit, and C is 
not a language that permits that.  Most object file formats are designed 
to support C, so we see these problems at that level too.  For example, 
in the past, I've complained about the fact that if we have two static 
functions with the same name in different translation units, the single 
.o that we generate renames them.

In the example that you give, the program is not conforming if the 
definitions of "struct foo" are not identical across translation units.  
(Or, at least, that would be true in ISO C++.  It might not have to be 
true in ISO C.  You may know more than I.)  In any case, if the 
structures do not match up across the translation units, we should just 
not combine them.  That is certainly not the common case.

We'll likely break some aspects of inter-module optimization when we fix 
some of the single translation-unit issues.  That is what we get for 
collectively not agreeing on a single development plan.

-- 
Mark Mitchell
CodeSourcery, LLC
(916) 791-8304
mark@codesourcery.com


* Re: Lazy allocation of DECL_ASSEMBLER_NAME
  2004-03-01 22:03             ` Mark Mitchell
@ 2004-03-01 22:54               ` Geoff Keating
  2004-03-02  0:03                 ` Mark Mitchell
  0 siblings, 1 reply; 28+ messages in thread
From: Geoff Keating @ 2004-03-01 22:54 UTC (permalink / raw)
  To: Mark Mitchell; +Cc: zack, dalej, gcc

Mark Mitchell <mark@codesourcery.com> writes:

> Geoff Keating wrote:
> 
> >>Yes.  I have partial code implementing this.  The fiddliest part is
> >>the nontransitive type comparison problem that Dale is encountering.
> >>I personally think that the front end should go to whatever lengths
> >>are necessary to present just one type to the language-independent
> >>compiler, even across multiple translation units, even in the presence
> >>of a nontransitive type system.  I think this is easier than it might
> >>sound - the trick I have in mind is permuting TYPE_MAIN_VARIANT so
> >>that it's always the appropriate choice for *this* translation unit.
> >>
> >
> >It might be difficult to determine which translation unit is "this
> >unit" at any given point; you might have code that does:
> >
> There are lots of things that are hard about inter-module analysis, as
> you of course know. :-)

By 'difficult' I meant 'impossible and also conceptually wrong'.

Inter-module analysis is very simple.  Trying to change decl handling
in the compiler without thinking about the consequences of
inter-module analysis: that's hard.

> It's an inherent design problem: we're essentially trying to combine
> multiple C translation units into a single C translation unit, and C
> is not a language that permits that.

I'm not trying to make a single C translation unit.  I'm not sure that
is a good idea, or is even possible.  I don't recommend anyone try it;
I think it's not a productive design direction.

What I did to implement IMA is to make the compiler handle multiple
translation units in the same process.

>  Most object file formats are
> designed to support C, so we see these problems at that level too.
> For example, in the past, I've complained about the fact that if we
> have two static functions with the same name in different translation
> units, the single .o that we generate renames them.
> 
> In the example that you give, the program is not conforming if the
> definitions of "struct foo" are not identical across translation
> units.  (Or, at least, that would be true in ISO C++.  It might not
> have to be true in ISO C.  You may know more than I.)  In any case, if
> the structures do not match up across the translation units, we should
> just not combine them.  That is certainly not the common case.

C is very very different from C++ for the purposes of IMA.  In C++,
types have linkage, there is the one definition rule, and there are no
'tentative definitions'.  IMA for C++ will look quite different.

I'm not sure what you mean by 'just not combine them'.  (I'm not sure
which "them" you're referring to, structures or TUs, and I'm not sure
what "combine" means in this context.)

> We'll likely break some aspects of inter-module optimization when we
> fix some of the single translation-unit issues.  That is what we get
> for collectively not agreeing on a single development plan.

I worry when I see statements like this.  I presume that by "break",
you mean that "while working on these patches, some things will break,
but of course those will be fixed before any patch is actually checked
in, or as soon as possible afterwards if we didn't notice them before
checkin", correct?

Breaking IMA means that it's hard to get meaningful SPEC numbers,
which means that some people can't work on GCC until the problem is
fixed.

-- 
- Geoffrey Keating <geoffk@geoffk.org>


* Re: Lazy allocation of DECL_ASSEMBLER_NAME
  2004-03-01 22:54               ` Geoff Keating
@ 2004-03-02  0:03                 ` Mark Mitchell
  0 siblings, 0 replies; 28+ messages in thread
From: Mark Mitchell @ 2004-03-02  0:03 UTC (permalink / raw)
  To: Geoff Keating; +Cc: zack, dalej, gcc


>Inter-module analysis is very simple.  Trying to change decl handling
>in the compiler without thinking about the consequences of
>inter-module analysis: that's hard.
>  
>
Implementing inter-module analysis without thinking about the current 
known defects in the way the front end manages its symbol table is also 
hard. :-)

>What I did to implement IMA is to make the compiler handle multiple
>translation units in the same process.
>  
>
Yes, and that's a good thing. 

>>We'll likely break some aspects of inter-module optimization when we
>>fix some of the single translation-unit issues.  That is what we get
>>for collectively not agreeing on a single development plan.
>>    
>>
>
>I worry when I see statements like this.  I presume that by "break",
>you mean that "while working on these patches, some things will break,
>but of course those will be fixed before any patch is actually checked
>in, or as soon as possible afterwards if we didn't notice them before
>checkin", correct?
>
>  
>
Obviously, that would be the goal.

But, my feeling is that the intermodule stuff  was committed without as 
much planning as would have been ideal. 

I don't think this was your fault. 

You posted a message back on May 19th, and didn't get much in the way of 
comments.  In particular, I certainly did not comment -- I failed to see 
your patch.  Partly, that might be because it was only posted to 
gcc-patches -- there was no design discussion on gcc.  But there's no 
question that you made your early draft available for comment, and you 
didn't get much feedback, so I don't think anyone could possibly 
criticize you for proceeding with your plan.

But, there were lots of other balls in the air that are going to impact 
IMA.  We've talked for years about combining the C and C++ front ends, 
which would be a huge win from a maintenance standpoint, and would give 
a more consistent user-experience, but will make it difficult to keep 
IMA in its current form.  Zack's c-decl rewrite has been in progress for 
close to a year and is basically the only way to fix a number of bugs in 
the C front end.  Those will likely impact IMA, and some of them may not 
have easy fixes.  All that information was out there in the community, 
but it never got fashioned into a coherent plan.  As a result, you 
checked in your changes, and now we've got obstacles on other fronts 
that might have been avoided.

My feeling is that we will at some point likely want to rewrite IMA 
support to address these issues and to make it a much more 
language-independent framework and to make it support multiple languages 
at once.  I think the current version is like my 
load-hoisting/store-sinking implementation: effective for some of the 
common cases, but not as general, maintainable, or powerful as one would 
like.  (Thankfully, my stuff will be dying soon, in favor of modern 
flow-graph oriented techniques!)

It may not be possible to do all this at once -- the new IMA support 
might need the new front end stuff, and the new front end stuff might 
not work with the old IMA, and so we'd have to either do the front end 
work first (temporarily breaking IMA), or we would have to make both 
sets of changes at once.

In the future, I'd really like to see design documents written up for 
these kinds of pervasive changes (PCH, IMA, combining the C and C++ 
front ends, implementing "export", tree-ssa, compressed RTL, etc.) and 
posted to the GCC list *before* a patch is written and submitted.  After 
there's a patch, it's hard to change things, and it's almost impossible 
to go in a different direction once the patch has been checked in.

In the shorter term, what I hope and expect will happen is that you and 
Zack will collaborate to make the appropriate changes: changes to fix 
the C front end's various language-conformance bugs and changes to fix 
IMA accordingly.  None of that's going to happen in 3.4.0 -- and quite 
likely not in 3.4.x at all.  On the mainline, we can deal with some 
brief breakage as y'all work out the details of these changes.

-- 
Mark Mitchell
CodeSourcery, LLC
(916) 791-8304
mark@codesourcery.com


* Re: Lazy allocation of DECL_ASSEMBLER_NAME
  2004-03-01 19:47 ` Geoff Keating
  2004-03-01 19:57   ` Zack Weinberg
  2004-03-01 20:11   ` Mark Mitchell
@ 2004-03-02 17:04   ` Mark Mitchell
  2004-03-02 17:11     ` Jan Hubicka
  2 siblings, 1 reply; 28+ messages in thread
From: Mark Mitchell @ 2004-03-02 17:04 UTC (permalink / raw)
  To: Geoff Keating; +Cc: gcc

[-- Attachment #1: Type: text/plain, Size: 217 bytes --]


>
>Could you mention this in the documentation of DECL_ASSEMBLER_NAME in
>c-tree.texi?
>  
>
I committed the attached patch to the mainline.

-- 
Mark Mitchell
CodeSourcery, LLC
(916) 791-8304
mark@codesourcery.com


[-- Attachment #2: diffs --]
[-- Type: text/plain, Size: 1327 bytes --]

2004-03-02  Mark Mitchell  <mark@codesourcery.com>

	* doc/c-tree.texi (DECL_ASSEMBLER_NAME): Mention that using this
	macro results in memory allocation.

Index: c-tree.texi
===================================================================
RCS file: /cvs/gcc/gcc/gcc/doc/c-tree.texi,v
retrieving revision 1.52
diff -c -5 -p -r1.52 c-tree.texi
*** c-tree.texi	18 Jan 2004 11:57:11 -0000	1.52
--- c-tree.texi	2 Mar 2004 16:57:12 -0000
*************** name is computed in the same way on all 
*** 1103,1112 ****
--- 1103,1119 ----
  is required to deal with the object file format used on a particular
  platform, it is the responsibility of the back end to perform those
  modifications.  (Of course, the back end should not modify
  @code{DECL_ASSEMBLER_NAME} itself.)
  
+ Using @code{DECL_ASSEMBLER_NAME} will cause additional memory to be
+ allocated (for the mangled name of the entity) so it should be used
+ only when emitting assembly code.  It should not be used within the
+ optimizers to determine whether or not two declarations are the same,
+ even though some of the existing optimizers do use it in that way.
+ These uses will be removed over time.
+ 
  @item DECL_EXTERNAL
  This predicate holds if the function is undefined.
  
  @item TREE_PUBLIC
  This predicate holds if the function has external linkage.


* Re: Lazy allocation of DECL_ASSEMBLER_NAME
  2004-03-02 17:04   ` Mark Mitchell
@ 2004-03-02 17:11     ` Jan Hubicka
  2004-03-02 17:18       ` Mark Mitchell
  0 siblings, 1 reply; 28+ messages in thread
From: Jan Hubicka @ 2004-03-02 17:11 UTC (permalink / raw)
  To: Mark Mitchell; +Cc: Geoff Keating, gcc

> 
> >
> >Could you mention this in the documentation of DECL_ASSEMBLER_NAME in
> >c-tree.texi?
> > 
> >
> I committed the attached patch to the mainline.
>   
> + Using @code{DECL_ASSEMBLER_NAME} will cause additional memory to be
> + allocated (for the mangled name of the entity) so it should be used
> + only when emitting assembly code.  It should not be used within the
> + optimizers to determine whether or not two declarations are the same,
> + even though some of the existing optimizers do use it in that way.
> + These uses will be removed over time.
> + 

OK, so to summarize the situation, the conclusion is that the DECL nodes
shall be unique now and thus using DECL_ID indexed hash table shall be
safe?
Also I need to get nodes from identifier names back in varasm.c (while
outputting them).  Is there some comfortable way to get the DECLs back
without having two hashtables?

Would it be OK if I did it once the cgraph merge patch is settled down?

Honza

>   @item DECL_EXTERNAL
>   This predicate holds if the function is undefined.
>   
>   @item TREE_PUBLIC
>   This predicate holds if the function has external linkage.


* Re: Lazy allocation of DECL_ASSEMBLER_NAME
  2004-03-02 17:11     ` Jan Hubicka
@ 2004-03-02 17:18       ` Mark Mitchell
  2004-03-02 17:22         ` Jan Hubicka
  0 siblings, 1 reply; 28+ messages in thread
From: Mark Mitchell @ 2004-03-02 17:18 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Geoff Keating, gcc


>OK, so to summarize the situation, the conclusion is that the DECL nodes
>shall be unique now and thus using DECL_ID indexed hash table shall be
>safe?
>  
>
Yes.  If there are problems with that, then we will need to fix the 
front ends.

>Also I need to get nodes from identifier names back in varasm.c (while
>outputting them).  Is there some comfortable way to get the DECLs back
>without having two hashtables?
>  
>
Not that I know of.  The real fix would be to change the interfaces so 
that you had DECLs, rather than names, in varasm.c -- but I know that 
would be a huge change.

>Would it be OK if I did it once the cgraph merge patch is settled down?
>  
>
Yes.  We're not going to make this optimization for 3.4, and there is 
lots of time until 3.5.

I'd really like to see you make progress on some of the wrong-code/ICE 
bugs in 3.4.  For example, the cselib_record_set problems should be fixed.

Thanks,

-- 
Mark Mitchell
CodeSourcery, LLC
(916) 791-8304
mark@codesourcery.com


* Re: Lazy allocation of DECL_ASSEMBLER_NAME
  2004-03-02 17:18       ` Mark Mitchell
@ 2004-03-02 17:22         ` Jan Hubicka
  0 siblings, 0 replies; 28+ messages in thread
From: Jan Hubicka @ 2004-03-02 17:22 UTC (permalink / raw)
  To: Mark Mitchell; +Cc: Jan Hubicka, Geoff Keating, gcc

> 
> >OK, so to summarize the situation, the conclusion is that the DECL nodes
> >shall be unique now and thus using DECL_ID indexed hash table shall be
> >safe?
> > 
> >
> Yes.  If there are problems with that, then we will need to fix the 
> front ends.
> 
> >Also I need to get nodes from identifier names back in varasm.c (while
> >outputting them).  Is there some comfortable way to get the DECLs back
> >without having two hashtables?
> > 
> >
> Not that I know of.  The real fix would be to change the interfaces so 
> that you had DECLs, rather than names, in varasm.c -- but I know that 
> would be a huge change.
> 
> >Would it be OK if I did it once the cgraph merge patch is settled down?
> > 
> >
> Yes.  We're not going to make this optimization for 3.4, and there is 
> lots of time until 3.5.
> 
> I'd really like to see you make progress on some of the wrong-code/ICE 
> bugs in 3.4.  For example, the cselib_record_set problems should be fixed.

Yes, I plan to work on them this week (I was traveling for most of last
week, so I didn't have much time for fun tasks like watching bugzilla).
The cselib_record_set problem is, however, a pretty nasty old issue (not
sure how I got attached to that bug :).  I will probably try to make a
patch to avoid the abort as RTH suggests, but I think we can get various
random misoptimizations in the presence of multiple sets of the same
destination in a single instruction...

Honza
> 
> Thanks,
> 
> -- 
> Mark Mitchell
> CodeSourcery, LLC
> (916) 791-8304
> mark@codesourcery.com


* Re: Lazy allocation of DECL_ASSEMBLER_NAME
  2004-03-01 23:58   ` Chris Lattner
@ 2004-03-02  0:09     ` Mark Mitchell
  0 siblings, 0 replies; 28+ messages in thread
From: Mark Mitchell @ 2004-03-02  0:09 UTC (permalink / raw)
  To: Chris Lattner; +Cc: Geoff Keating, zack, dalej, gcc

Chris Lattner wrote:

>On Mon, 1 Mar 2004, Mark Mitchell wrote:
>
>  
>
>>>Just to be clear, this is not a problem with IMA, this is a problem with
>>>doing it at the source level.
>>>
>>>      
>>>
>>I completely agree.  In GCC, the current IMA stuff is sort-of halfway
>>in-between.  The representation we use is basically at the source level,
>>but, as Geoff mentions, we don't quite fully squish everything together
>>into one translation unit.  So, it's pretty similar to C, but if there
>>were a source language for this representation, it would also have some
>>kind of "module" construct.
>>    
>>
>
>It is also worth considering how support for Java, Fortran, Ada, ... will
>be affected by the IMA work in general.  If you really want to do IMA, it
>will be important to *nail down* exactly what the representation looks
>like.
>
Exactly!

-- 
Mark Mitchell
CodeSourcery, LLC
(916) 791-8304
mark@codesourcery.com


* Re: Lazy allocation of DECL_ASSEMBLER_NAME
  2004-03-01 23:26 ` Mark Mitchell
@ 2004-03-01 23:58   ` Chris Lattner
  2004-03-02  0:09     ` Mark Mitchell
  0 siblings, 1 reply; 28+ messages in thread
From: Chris Lattner @ 2004-03-01 23:58 UTC (permalink / raw)
  To: Mark Mitchell; +Cc: Geoff Keating, zack, dalej, gcc

On Mon, 1 Mar 2004, Mark Mitchell wrote:

> >Just to be clear, this is not a problem with IMA, this is a problem with
> >doing it at the source level.
> >
> I completely agree.  In GCC, the current IMA stuff is sort-of halfway
> in-between.  The representation we use is basically at the source level,
> but, as Geoff mentions, we don't quite fully squish everything together
> into one translation unit.  So, it's pretty similar to C, but if there
> were a source language for this representation, it would also have some
> kind of "module" construct.

It is also worth considering how support for Java, Fortran, Ada, ... will
be affected by the IMA work in general.  If you really want to do IMA, it
will be important to *nail down* exactly what the representation looks
like.

-Chris

-- 
http://llvm.cs.uiuc.edu/
http://www.nondot.org/~sabre/Projects/


* Re: Lazy allocation of DECL_ASSEMBLER_NAME
  2004-03-01 22:21 Chris Lattner
@ 2004-03-01 23:26 ` Mark Mitchell
  2004-03-01 23:58   ` Chris Lattner
  0 siblings, 1 reply; 28+ messages in thread
From: Mark Mitchell @ 2004-03-01 23:26 UTC (permalink / raw)
  To: Chris Lattner; +Cc: Geoff Keating, zack, dalej, gcc

Chris Lattner wrote:

>Mark Mitchell wrote:
>  
>
>>There are lots of things that are hard about inter-module analysis, as
>>you of course know. :-) It's an inherent design problem: we're
>>essentially trying to combine multiple C translation units into a single
>>C translation unit, and C is not a language that permits that. Most
>>object file formats are designed to support C, so we see these problems
>>at that level too.
>>    
>>
>
>Just to be clear, this is not a problem with IMA, this is a problem with
>doing it at the source level. 
>
I completely agree.  In GCC, the current IMA stuff is sort-of halfway 
in-between.  The representation we use is basically at the source level, 
but, as Geoff mentions, we don't quite fully squish everything together 
into one translation unit.  So, it's pretty similar to C, but if there 
were a source language for this representation, it would also have some 
kind of "module" construct.

 > This comes up all of the time in real-world programs. zlib and 300.twolf

I know! :-) I used to write tools that found the mismatches, and they 
are pretty common.


--
Mark Mitchell

CodeSourcery, LLC
(916) 791-8304
mark@codesourcery.com


* Re: Lazy allocation of DECL_ASSEMBLER_NAME
@ 2004-03-01 22:21 Chris Lattner
  2004-03-01 23:26 ` Mark Mitchell
  0 siblings, 1 reply; 28+ messages in thread
From: Chris Lattner @ 2004-03-01 22:21 UTC (permalink / raw)
  To: Mark Mitchell, Geoff Keating, zack, dalej, gcc


Mark Mitchell wrote:
> There are lots of things that are hard about inter-module analysis, as
> you of course know. :-) It's an inherent design problem: we're
> essentially trying to combine multiple C translation units into a single
> C translation unit, and C is not a language that permits that. Most
> object file formats are designed to support C, so we see these problems
> at that level too.

Just to be clear, this is not a problem with IMA, this is a problem with
doing it at the source level.  C is just a really ugly language and never
really defined what happens if two structs or other declarations don't
agree.

In LLVM we address this by doing IPO on a non-source-level representation,
which fixes the issue, but since LLVM is a type-safe representation, our
linker has to be able to handle cases where you are linking two things
with different types.  In the case of LLVM, the semantics are clearly
defined so this isn't a problem, it just took implementation.  FWIW, this
was alluded to in my GCC summit paper (sec 4.4).

In GCC, perhaps the best way to do it is to make the IMA and normal
optimizers depend on the source-level types as little as possible: they
simply cannot be trusted.  Silently breaking the program is obviously a
bad alternative.

> In the example that you give, the program is not conforming if the
> definitions of "struct foo" are not identical across translation units.
> (Or, at least, that would be true in ISO C++. It might not have to be
> true in ISO C. You may know more than I.) In any case, if the structures
> do not match up across the translation units, we should just not combine
> them.  That is certainly not the common case.

This comes up all of the time in real-world programs.  zlib and 300.twolf
are examples that spring to mind immediately.  libstdc++ does some other
funky things declaring "std::cin" as an array of characters in one .o
file, etc.  Unfortunately, though compiler vendors might like to not
handle this, we _have to_.  It's just not acceptable to base optimization
decisions on type-equality, unless there are conservative fallbacks.

-Chris

-- 
http://llvm.cs.uiuc.edu/
http://www.nondot.org/~sabre/Projects/
