Re: Removing duplicate DWARF2 info

public inbox for gcc@gcc.gnu.org

* Re: Removing duplicate DWARF2 info
@ 2000-07-11 11:34 Mike Stump
  2000-07-11 18:45 ` Daniel Berlin+list.gcc
  0 siblings, 1 reply; 10+ messages in thread
From: Mike Stump @ 2000-07-11 11:34 UTC (permalink / raw)
  To: dberlin, zack; +Cc: gcc

> Date: Mon, 10 Jul 2000 23:54:33 -0400 (EDT)
> From: Daniel Berlin <dberlin@redhat.com>

> I know this has been hashed over before, but nobody ever seems to do
> it where it needs to be done (I do it in GDB when we load the
> stuff), and i'm guessing it's because there are three people on the
> planet with the knowledge of how LD works that can make this happen.

If you did it the right way, in bfd, and if bfd handled all things
symbol, then enabling ld to make use of it also should be easy for
you...  But alas, I fear it wasn't quite done like this.

> Why can't we simply not emit the duplicate info, taking LD out of
> the picture completely.  All it seems this would entail is keeping
> track of what we've emitted info for, over the course of more than
> one file.  Couldn't we have a simple persistent hash table in a
> file, and do a lookup in that, then do the check to see if we
> emitted it during this compilation?  That doesn't really seem that
> tricky to do, or am i missing something?

I often argue for a generic alongside database.  The repo database
almost fits that role.  With such a beast, quite a few things become
trivial.  I don't happen to think you're missing too much, assuming
you already know about things like usage of .o files in more than one
executable/library.

Profile-based feedback for branch prediction I would argue wants it,
improving compilation speeds for C++ wants it (think in part
precompiled headers), debugging dups want it, templates want it...

> From: Zack Weinberg <zack@wolery.cumb.org>
> Date: Mon, 10 Jul 2000 21:06:20 -0700
> To: Daniel Berlin <dberlin@redhat.com>

> Do you want to teach every makefile in existence about this persistent
> hash table?

Since the repo database could in fact be used to do exactly what is
requested, and since I have in fact not seen a lot of repo stuff in
Makefiles, I'd think this is a slight overstatement of the reality of
the situation, though, without 5 years of experience on a new system,
I do agree, peering into the future is, at times, hard.


* Re: Removing duplicate DWARF2 info
  2000-07-11 11:34 Removing duplicate DWARF2 info Mike Stump
@ 2000-07-11 18:45 ` Daniel Berlin+list.gcc
  0 siblings, 0 replies; 10+ messages in thread
From: Daniel Berlin+list.gcc @ 2000-07-11 18:45 UTC (permalink / raw)
  To: Mike Stump; +Cc: zack, gcc

Mike Stump <mrs@windriver.com> writes:

> > Date: Mon, 10 Jul 2000 23:54:33 -0400 (EDT)
> > From: Daniel Berlin <dberlin@redhat.com>
>
> > I know this has been hashed over before, but nobody ever seems to do it
> > where it needs to be done (I do it in GDB when we load the stuff), and i'm
> > guessing it's because there are three people on the planet with the
> > knowledge of how LD works that can make this happen.
>
> If you did it the right way, in bfd, and if bfd handled all things symbol,
> then enabling ld to make use of it also should be easy for you...  But alas,
> I fear it wasn't quite done like this.


I can make BFD read and write DWARF2.
The problem is that removing info once written requires a *lot* of
recalculation and rewriting.

Links already take forever because they have to deal with so much duplicate
info. If Jason can make LD not become an order of magnitude slower with his
scheme, I'd be very impressed. It's just going to make things slower than
they are already, anyway.
The ideal solution is to not write the info in the first place.
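
To illustrate why removal after the fact means recalculation, here is a minimal sketch using a toy model rather than the real DWARF2 encoding. It assumes each entry has a byte offset, a size, and references to other entries expressed as offsets; the function name `remove_entry` and the dict layout are hypothetical. Dropping one entry shifts every later entry, so every surviving offset and reference has to be rewritten.

```python
def remove_entry(entries, victim):
    """Drop entries[victim] and fix up offsets and references of the rest.

    Each entry is a dict with 'offset', 'size', and 'refs' (a list of
    offsets of other entries). Toy model only, not real DWARF2.
    """
    dead = entries[victim]["offset"]
    new_offset = {}  # old offset -> offset after compaction
    kept = []
    pos = 0
    for i, entry in enumerate(entries):
        if i == victim:
            continue
        new_offset[entry["offset"]] = pos
        kept.append(entry)
        pos += entry["size"]

    result = []
    for entry in kept:
        if dead in entry["refs"]:
            raise ValueError("entry is still referenced; cannot remove it")
        result.append({
            "offset": new_offset[entry["offset"]],
            "size": entry["size"],
            # every reference must be remapped, even in untouched entries
            "refs": [new_offset[r] for r in entry["refs"]],
        })
    return result


if __name__ == "__main__":
    entries = [
        {"offset": 0, "size": 10, "refs": [30]},
        {"offset": 10, "size": 20, "refs": []},
        {"offset": 30, "size": 5, "refs": [0]},
    ]
    for entry in remove_entry(entries, 1):
        print(entry)
```

Even in this toy, removing a single entry touches every entry that follows it and every reference that points past it.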



> > Why can't we simply not emit the duplicate info, taking LD out of the
> > picture completely.  All it seems this would entail is keeping track of what
> > we've emitted info for, over the course of more than one file.  Couldn't we
> > have a simple persistent hash table in a file, and do a lookup in that, then
> > do the check to see if we emitted it during this compilation?  That doesn't
> > really seem that tricky to do, or am i missing something?
>
> I often argue for a generic alongside database.
I've seen that discussion (I've been lurking for years) quite a few times.
I've always thought the benefits outweigh the disadvantages, but I seem to be in the minority.

> The repo database almost fits that role.  With such a beast, quite a few
> things become trivial.  I don't happen to think you're missing too much,
> assuming you already know about things like usage of .o files in more than
> one executable/library.

Yeah, but my answer is: who cares if it's not absolutely perfect?
It would never produce incorrect info.
Let's say we miss 3% of the duplicate info.
We've still removed 97% of it.
I know of people who have >1 gig of debug info, almost all of it duplicate
info.
So we could only remove 993 meg of duplicate info, instead of 1023.99 meg.
Somehow, I think people would still be ecstatic.
Also, if you play the "do it in the linker" game, you still have to process
all one gig, and what you do is remove 97% of what is there.
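
The arithmetic behind those figures can be checked with a short calculation. This is only a worked example of the numbers quoted in the message (1023.99 meg of duplicate info, a 97% hit rate); the variable names are illustrative.

```python
duplicate_mb = 1023.99  # duplicate debug info, figure from the message
hit_rate = 0.97         # fraction of duplicates the scheme would catch

removed_mb = duplicate_mb * hit_rate
left_mb = duplicate_mb - removed_mb

print(f"removed {removed_mb:.0f} meg, left behind {left_mb:.0f} meg")
```

A 3% miss rate leaves roughly 31 meg of duplicates in place, which matches the 993 meg figure given for what would be removed.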

> Profile-based feedback for branch prediction I would argue wants it, improving
> compilation speeds for C++ wants it (think in part precompiled headers),
> debugging dups want it, templates want it...

You also missed whole-program and inter-file optimizations that would be
trivial given a generic database.


> > From: Zack Weinberg <zack@wolery.cumb.org>
> > Date: Mon, 10 Jul 2000 21:06:20 -0700
> > To: Daniel Berlin <dberlin@redhat.com>
>
> > Do you want to teach every makefile in existence about this persistent hash
> > table?
>
> Since the repo database could in fact be used to do exactly what is requested,
> and since I have in fact not seen a lot of repo stuff in Makefiles, I'd think
> this is a slight overstatement of the reality of the situation, though,
> without 5 years of experience on a new system, I do agree, peering into the
> future is, at times, hard.



* Re: Removing duplicate DWARF2 info
  2000-07-10 21:22     ` Daniel Berlin
@ 2000-07-11  0:22       ` Martin v. Loewis
  0 siblings, 0 replies; 10+ messages in thread
From: Martin v. Loewis @ 2000-07-11  0:22 UTC (permalink / raw)
  To: dberlin; +Cc: meissner, zack, gcc

> Then you can do it using a similar scheme to the way template repositories
> are done, not emitting the actual debug info until right before you link,
> during the recompile.

Are you proposing that every source file is compiled *at least twice*,
once during build and once during linking? I'm sure users of the C++
compiler will love that scheme...

Regards,
Martin


* Re: Removing duplicate DWARF2 info
  2000-07-10 21:17   ` Daniel Berlin
@ 2000-07-11  0:18     ` Martin v. Loewis
  0 siblings, 0 replies; 10+ messages in thread
From: Martin v. Loewis @ 2000-07-11  0:18 UTC (permalink / raw)
  To: dberlin; +Cc: zack, gcc

> When you clean out the build directory, you lose the info.
> You lost the objects too, so what's the problem?
> It's like -frepo, but for debug info instead.

Indeed, yet another repository mechanism. A number of gcc contributors
(including myself) believe that any repository mechanism is inherently
broken, and that it is impossible to have it work in all cases where
it currently works.

Apart from the "multiple build directory" issue, you also have the
"static library issue", where you have enormous problems locating the
repository for object files that live in a .a file.

Regards,
Martin


* Re: Removing duplicate DWARF2 info
  2000-07-10 21:16   ` Michael Meissner
@ 2000-07-10 21:22     ` Daniel Berlin
  2000-07-11  0:22       ` Martin v. Loewis
  0 siblings, 1 reply; 10+ messages in thread
From: Daniel Berlin @ 2000-07-10 21:22 UTC (permalink / raw)
  To: Michael Meissner; +Cc: Zack Weinberg, gcc

> 
> I suspect it is more that people who usually work on the linker are busy, and
> it hasn't filtered up yet as important.

Somehow I suspect this isn't right. It's particularly annoying to rewrite
DWARF2 info once it's been written, because of how it works.

It's easier to either
A. not write it in the first place
or
B. ignore duplicates when reading it in

In fact, I've yet to see a LINKER that removes duplicate DWARF2 info.
It's always done as a prelinking step.
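
Option B, ignoring duplicates at read time, can be sketched roughly as follows. This is a toy stand-in that assumes each compilation unit is simply a list of (name, definition) pairs; it is not GDB's actual reader, and `load_types` is a hypothetical name.

```python
def load_types(compilation_units):
    """Merge per-unit type entries, keeping only the first copy of each.

    Each unit is a list of (name, definition) pairs. Returns the unique
    entries in first-seen order plus a count of skipped duplicates.
    """
    seen = {}
    skipped = 0
    for unit in compilation_units:
        for name, definition in unit:
            key = (name, definition)
            if key in seen:
                skipped += 1  # identical entry already loaded from another unit
                continue
            seen[key] = True
    return list(seen), skipped


if __name__ == "__main__":
    units = [
        [("foo", "struct { int x; }"), ("bar", "int")],
        [("foo", "struct { int x; }")],
    ]
    types, skipped = load_types(units)
    print(types, skipped)
```

Note that this saves memory in the reader but does nothing for file size or link time, since every duplicate still has to be read once.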

> 
> Consider several different build trees.  You also can't assume that all objects
> in a build tree are going to be linked into the same executable.  The classic
> case is GCC itself, which uses logically distinct build and host compilers.
> 
> In addition, libraries may be built by other people that may include debug
> information, that may or may not be the same as the information you see.

Then you can do it using a similar scheme to the way template repositories
are done, not emitting the actual debug info until right before you link,
during the recompile.

--Dan


* Re: Removing duplicate DWARF2 info
  2000-07-10 21:06 ` Zack Weinberg
  2000-07-10 21:16   ` Michael Meissner
@ 2000-07-10 21:17   ` Daniel Berlin
  2000-07-11  0:18     ` Martin v. Loewis
  1 sibling, 1 reply; 10+ messages in thread
From: Daniel Berlin @ 2000-07-10 21:17 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: gcc

> 
> Do you want to teach every makefile in existence about this persistent
> hash table?  Which will probably require adding features to make, and
> therefore locking every makefile in existence into one implementation
> of make - which is a thing we try ridiculously hard not to do?
> 
> And don't think you can get away with not telling make about it,
> either.  Consider what happens when you clean out a build directory -
> and that's the *simplest* case.
> 

When you clean out the build directory, you lose the info.
You lost the objects too, so what's the problem?
It's like -frepo, but for debug info instead.
> zw
> 


* Re: Removing duplicate DWARF2 info
  2000-07-10 21:06 ` Zack Weinberg
@ 2000-07-10 21:16   ` Michael Meissner
  2000-07-10 21:22     ` Daniel Berlin
  2000-07-10 21:17   ` Daniel Berlin
  1 sibling, 1 reply; 10+ messages in thread
From: Michael Meissner @ 2000-07-10 21:16 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: Daniel Berlin, gcc

On Mon, Jul 10, 2000 at 09:06:20PM -0700, Zack Weinberg wrote:
> On Mon, Jul 10, 2000 at 11:54:33PM -0400, Daniel Berlin wrote:
> > I know this has been hashed over before, but nobody ever seems to do it
> > where it needs to be done (I do it in GDB when we load the stuff), and i'm
> > guessing it's because there are three people on the planet with the
> > knowledge of how LD works that can make this happen.

I suspect it is more that people who usually work on the linker are busy, and
it hasn't filtered up yet as important.

> > Why can't we simply not emit the duplicate info, taking LD out of the
> > picture completely.
> > All it seems this would entail is keeping track of what we've emitted info
> > for, over the course of more than one file. 
> > Couldn't we have a simple persistent hash table in a file, and do a lookup
> > in that, then do the check to see if we emitted it during this
> > compilation?
> > That doesn't really seem that tricky to do, or am i missing something?
> 
> Do you want to teach every makefile in existence about this persistent
> hash table?  Which will probably require adding features to make, and
> therefore locking every makefile in existence into one implementation
> of make - which is a thing we try ridiculously hard not to do?
> 
> And don't think you can get away with not telling make about it,
> either.  Consider what happens when you clean out a build directory -
> and that's the *simplest* case.

Consider several different build trees.  You also can't assume that all objects
in a build tree are going to be linked into the same executable.  The classic
case is GCC itself, which uses logically distinct build and host compilers.

In addition, libraries may be built by other people that may include debug
information, that may or may not be the same as the information you see.
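
The concern about objects from one build tree ending up in different executables can be made concrete with a toy model. It assumes a single shared table per build tree and represents debug info as a set of type names; `compile_object` and `missing_debug_info` are hypothetical names, and nothing here reflects how GCC actually emits DWARF2.

```python
def compile_object(name, types_used, table):
    """Compile one object, emitting debug info only for types not in table.

    `table` stands in for the shared persistent hash table (hypothetical).
    """
    emitted = {t for t in types_used if t not in table}
    table.update(emitted)
    return {"name": name, "uses": set(types_used), "emitted": emitted}


def missing_debug_info(objects):
    """Types used by the linked objects whose debug info none of them carries."""
    used = set()
    present = set()
    for obj in objects:
        used |= obj["uses"]
        present |= obj["emitted"]
    return used - present


table = set()
a = compile_object("a.o", ["struct foo"], table)
b = compile_object("b.o", ["struct foo", "struct bar"], table)

print(missing_debug_info([a, b]))  # both objects linked together
print(missing_debug_info([b]))     # b.o linked into a second program alone
```

Under these assumptions the first link is complete, while the second is missing `struct foo`, because b.o skipped it on the strength of a.o having emitted it.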

-- 
Michael Meissner, Red Hat, Inc.
PMB 198, 174 Littleton Road #3, Westford, Massachusetts 01886, USA
Work:	  meissner@redhat.com		phone: +1 978-486-9304
Non-work: meissner@spectacle-pond.org	fax:   +1 978-692-4482


* Re: Removing duplicate DWARF2 info
  2000-07-10 20:54 Daniel Berlin
  2000-07-10 21:06 ` Zack Weinberg
@ 2000-07-10 21:07 ` Jeffrey A Law
  1 sibling, 0 replies; 10+ messages in thread
From: Jeffrey A Law @ 2000-07-10 21:07 UTC (permalink / raw)
  To: Daniel Berlin; +Cc: gcc

  In message <Pine.LNX.4.21.0007102350050.29170-100000@devserv.devel.redhat.com>
  you write:
  > I know this has been hashed over before, but nobody ever seems to do it
  > where it needs to be done (I do it in GDB when we load the stuff), and i'm
  > guessing it's because there are three people on the planet with the
  > knowledge of how LD works that can make this happen.
  > 
  > Why can't we simply not emit the duplicate info, taking LD out of the
  > picture completely.
  > All it seems this would entail is keeping track of what we've emitted info
  > for, over the course of more than one file. 
  > Couldn't we have a simple persistent hash table in a file, and do a lookup
  > in that, then do the check to see if we emitted it during this
  > compilation?
  > That doesn't really seem that tricky to do, or am i missing something?
I don't believe dwarf2 debug records are structured in a way that makes
that kind of scheme easy.

I believe Jason Merrill (jason@cygnus.com) has devised a scheme that allows
these kinds of optimizations to be implemented (in the linker).  But I do not
know its details offhand.


jeff


* Re: Removing duplicate DWARF2 info
  2000-07-10 20:54 Daniel Berlin
@ 2000-07-10 21:06 ` Zack Weinberg
  2000-07-10 21:16   ` Michael Meissner
  2000-07-10 21:17   ` Daniel Berlin
  2000-07-10 21:07 ` Jeffrey A Law
  1 sibling, 2 replies; 10+ messages in thread
From: Zack Weinberg @ 2000-07-10 21:06 UTC (permalink / raw)
  To: Daniel Berlin; +Cc: gcc

On Mon, Jul 10, 2000 at 11:54:33PM -0400, Daniel Berlin wrote:
> I know this has been hashed over before, but nobody ever seems to do it
> where it needs to be done (I do it in GDB when we load the stuff), and i'm
> guessing it's because there are three people on the planet with the
> knowledge of how LD works that can make this happen.
> 
> Why can't we simply not emit the duplicate info, taking LD out of the
> picture completely.
> All it seems this would entail is keeping track of what we've emitted info
> for, over the course of more than one file. 
> Couldn't we have a simple persistent hash table in a file, and do a lookup
> in that, then do the check to see if we emitted it during this
> compilation?
> That doesn't really seem that tricky to do, or am i missing something?

Do you want to teach every makefile in existence about this persistent
hash table?  Which will probably require adding features to make, and
therefore locking every makefile in existence into one implementation
of make - which is a thing we try ridiculously hard not to do?

And don't think you can get away with not telling make about it,
either.  Consider what happens when you clean out a build directory -
and that's the *simplest* case.

zw


* Removing duplicate DWARF2 info
@ 2000-07-10 20:54 Daniel Berlin
  2000-07-10 21:06 ` Zack Weinberg
  2000-07-10 21:07 ` Jeffrey A Law
  0 siblings, 2 replies; 10+ messages in thread
From: Daniel Berlin @ 2000-07-10 20:54 UTC (permalink / raw)
  To: gcc

I know this has been hashed over before, but nobody ever seems to do it
where it needs to be done (I do it in GDB when we load the stuff), and i'm
guessing it's because there are three people on the planet with the
knowledge of how LD works that can make this happen.

Why can't we simply not emit the duplicate info, taking LD out of the
picture completely.
All it seems this would entail is keeping track of what we've emitted info
for, over the course of more than one file. 
Couldn't we have a simple persistent hash table in a file, and do a lookup
in that, then do the check to see if we emitted it during this
compilation?
That doesn't really seem that tricky to do, or am i missing something?
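
A minimal sketch of the persistent-table idea described above, under stated assumptions: the table is a JSON file holding hashes of type descriptions, and a "compilation" is just a list of description strings. The file format and the names `type_key`, `load_table`, `save_table`, and `compile_unit` are all hypothetical; nothing here corresponds to an existing GCC option or file.

```python
import hashlib
import json
import os


def type_key(description):
    """Stable key for a piece of debug info (here: a hash of its text)."""
    return hashlib.sha1(description.encode("utf-8")).hexdigest()


def load_table(path):
    """Load the persistent set of already-emitted keys, or an empty set."""
    if not os.path.exists(path):
        return set()
    with open(path, "r", encoding="utf-8") as f:
        return set(json.load(f))


def save_table(path, table):
    """Write the set of emitted keys back to disk."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(sorted(table), f)


def compile_unit(descriptions, table_path):
    """Return the descriptions this compilation would emit debug info for."""
    table = load_table(table_path)
    emitted = []
    for desc in descriptions:
        key = type_key(desc)
        if key in table:
            continue  # emitted by an earlier compilation, or earlier in this one
        table.add(key)
        emitted.append(desc)
    save_table(table_path, table)
    return emitted
```

Calling `compile_unit(["struct foo", "struct bar"], path)` and then `compile_unit(["struct foo", "struct baz"], path)` with the same path emits `struct foo` only once. The sketch ignores concurrent compiles and stale entries, which are the kinds of problems raised in the replies.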

