public inbox for gdb@sourceware.org
* RFC: include mod time and size in DWARF file name table
@ 2013-01-12 13:24 Martin Runge
  2013-01-13  4:19 ` John Gilmore
  0 siblings, 1 reply; 5+ messages in thread
From: Martin Runge @ 2013-01-12 13:24 UTC (permalink / raw)
  To: gdb; +Cc: Martin Runge

Hi all,

The DWARF standard seems to have fields for the modification time and
file size of each source file that was used to build a binary.
Standard tools like readelf and objdump will display them but by
default, gcc fills them with zeros (except for VMS targets):

readelf -wl dwarftest1

.....
 The File Name Table:
  Entry Dir     Time    Size    Name
  1     1       0       0       dwarftest1.cpp
  2     2       0       0       iostream
  3     2       0       0       cstddef
  4     2       0       0       cwchar
....

I patched gcc (gas from binutils, to be more precise) to fill in
modification time and file size of the source files. The result:

readelf -wl dwarftest1

....
 The File Name Table:
  Entry Dir     Time    Size    Name
  1     1       1352976872      149     dwarftest1.cpp
  2     2       1353406399      2665    iostream
  3     3       1353406307      12542   stddef.h
  4     2       1353406399      1838    cstddef
  5     2       1353406399      6665    cwchar
....
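
A dump in this format is easy to read back mechanically. The following is
only a sketch in Python, assuming the table text looks exactly like the
readelf output above; the name parse_file_table is hypothetical and not part
of any existing tool.

```python
import re

# One data row of the "File Name Table": Entry, Dir, Time, Size, Name.
# The header row does not match because "Entry" is not a number.
ROW = re.compile(r"^\s*(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\S.*?)\s*$")


def parse_file_table(text):
    """Return one dict per table row found in the dumped text."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match is None:
            continue  # skip headers, separators and unrelated lines
        entry, dir_index, mtime, size, name = match.groups()
        rows.append({
            "entry": int(entry),
            "dir": int(dir_index),
            "mtime": int(mtime),
            "size": int(size),
            "name": name,
        })
    return rows
```

The Dir column is only an index into the directory table, so a real tool
would have to resolve it before it could locate the file on disk.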

This information can be used to check whether one is debugging the right
source file, or whether the source file has changed since compilation,
which leads to stepping through mismatched lines and other strange errors.
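
As an illustration of the kind of check meant here, a minimal sketch in
Python. It assumes the recorded time is a Unix timestamp in seconds and the
size is in bytes; the function name check_source and its return values are
hypothetical and say nothing about how gdb would implement this.

```python
import os


def check_source(path, recorded_mtime, recorded_size):
    """Compare a source file on disk with the values from the debug info.

    Returns "unknown" when the producer recorded zeros, "missing" when the
    file cannot be stat'ed, and otherwise "match" or "changed".
    """
    if recorded_mtime == 0 and recorded_size == 0:
        return "unknown"  # nothing was recorded, so nothing can be checked
    try:
        st = os.stat(path)
    except OSError:
        return "missing"
    if int(st.st_mtime) == recorded_mtime and st.st_size == recorded_size:
        return "match"
    return "changed"
```

A debugger could print a warning for "changed" and stay silent for
"unknown", so that binaries built by an unpatched toolchain behave as before.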

It can also be used to detect errors like the following:
a program that links to a library and the library itself have been
compiled using different versions of the header files that specify the
library's interface, e.g. data types passed from one to the other.

So why do I post this on the gdb mailing list, when it is GCC/binutils
that fill in this information?

There are comments in the GCC sources explaining that GCC fills in
zeros instead of time and size because some debuggers stop working
correctly when this information is present.

Is this still valid? I think that for gdb and Linux targets it is not true.
I have started developing a gdb patch that will check source files
based on the time and size information before setting breakpoints
there or displaying the source.

Does anybody see any problems with this approach that I just haven't
run into yet? And if it works, would the gdb developers support a
change request to GCC/binutils?

regards
Martin


* Re: RFC: include mod time and size in DWARF file name table
  2013-01-12 13:24 RFC: include mod time and size in DWARF file name table Martin Runge
@ 2013-01-13  4:19 ` John Gilmore
  2013-01-13  5:54   ` Joel Brobecker
  0 siblings, 1 reply; 5+ messages in thread
From: John Gilmore @ 2013-01-13  4:19 UTC (permalink / raw)
  To: Martin Runge; +Cc: gdb, Martin Runge

> Does anybody see any problems with this approach that I just haven't
> run into yet? And if it works, would the gdb developers support a
> change request to GCC/binutils?

Yes.  The big problem with this approach is that you can never compile
the same source code files to produce the same binary code files.

In other words, it becomes hard to tell whether you actually have the
source code that matches your binaries -- because there will always be
diffs between the binaries you're investigating, and the binaries that
you just compiled from source.  You start needing more and more
complicated "object file diff" tools, which can have their own obscure
failures because they aren't commonly used like "cmp" and "diff" are.

At Cygnus we weeded out all the ways in which dates, times, temporary
file names, hostnames, etc, were wending their way into object files.
We did that so we could make damn sure that the compiler was producing
the exact same object code when compiling on every host platform.  So
that we could make sure that the 3-phase compiler bootstrap produced
the same object code when the compiler was compiled with itself, as
when the compiler was compiled with some other compiler.  So that
we could verify that a Linux distribution that claims to ship "matching
source code" actually does ship the source code that matches all their
binaries.  (So far I don't know of any distro that actually does
validate this -- so I suspect they are failing in a variety of ways.
What isn't regularly tested is probably broken.)

We found many, many bugs of all sorts by being able to do direct
binary comparison to detect allegedly "just the same" object files
that actually weren't the same.  Uninitialized variables, alignment
issues, byte order problems, freeing of memory before using it, etc,
etc, etc, all turn up as minor, minor changes in the generated object
files.  The minor loss of function such as not having GDB warn you
when you invoke the wrong symbol file, is well worth the price of
immediately and automatically detecting all those bugs.
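
The direct comparison described above needs nothing more than what "cmp"
does. A rough sketch in Python follows; the function name first_difference
is hypothetical, and this is not the tooling that was used at Cygnus.

```python
def first_difference(path_a, path_b, chunk_size=65536):
    """Return the offset of the first differing byte, or None if identical."""
    offset = 0
    with open(path_a, "rb") as file_a, open(path_b, "rb") as file_b:
        while True:
            a = file_a.read(chunk_size)
            b = file_b.read(chunk_size)
            if a != b:
                # Find the first differing byte inside this chunk.
                for i, (x, y) in enumerate(zip(a, b)):
                    if x != y:
                        return offset + i
                # No byte differs: one file is a prefix of the other.
                return offset + min(len(a), len(b))
            if not a:
                return None  # both files ended without a difference
            offset += len(a)
```

Two builds of the same sources that embed timestamps would report an offset
here on every rebuild, which is exactly what makes the check useless as a
bug detector.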

	John


* Re: RFC: include mod time and size in DWARF file name table
  2013-01-13  4:19 ` John Gilmore
@ 2013-01-13  5:54   ` Joel Brobecker
  2013-01-14 12:31     ` Martin Runge
  0 siblings, 1 reply; 5+ messages in thread
From: Joel Brobecker @ 2013-01-13  5:54 UTC (permalink / raw)
  To: John Gilmore; +Cc: Martin Runge, gdb, Martin Runge

> Yes.  The big problem with this approach is that you can never compile
> the same source code files to produce the same binary code files.
[snip]

I can also confirm that we have had many reports about this as well.

-- 
Joel


* Re: RFC: include mod time and size in DWARF file name table
  2013-01-13  5:54   ` Joel Brobecker
@ 2013-01-14 12:31     ` Martin Runge
  2013-01-15 17:51       ` Tom Tromey
  0 siblings, 1 reply; 5+ messages in thread
From: Martin Runge @ 2013-01-14 12:31 UTC (permalink / raw)
  To: Joel Brobecker; +Cc: John Gilmore, gdb, Martin Runge

Thanks for your replies. I now understand that just looking at the
modification time (and size) is not exact enough:

We often use code generators in our tooling (moc, uic, resource
compiler from Qt, as well as flex, bison and many, many more). They
generate the source files fed into gcc at build time. So the compiler
will put the generation timestamp into the mentioned fields of the
debug info, which will differ on every single rebuild even if the
sources (those under version control) did not change at all.
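
The effect can be reproduced with a small sketch in Python. The function
regenerate is a hypothetical stand-in for a code generator, and the SHA-256
digest is used only to show that the content is unchanged; it is not
something the patch records.

```python
import hashlib
import os


def regenerate(path, content, when):
    """Stand-in for a code generator: rewrite the file, stamped at `when`."""
    with open(path, "wb") as f:
        f.write(content)
    os.utime(path, (when, when))  # (access time, modification time)


def fingerprint(path):
    """Return (mtime in seconds, size in bytes, SHA-256 of the content)."""
    st = os.stat(path)
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    return int(st.st_mtime), st.st_size, digest
```

Calling regenerate twice with the same content but different times gives two
fingerprints whose size and digest agree while the mtime differs, so a check
based on mtime would flag every rebuild.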

One thing is making sure that one uses compatible binaries/symbol
files on host and target for remote debugging. This is necessary for
gdb to work correctly. The other thing is finding errors caused
by incorrect builds, and maybe giving gdb the features needed to detect
these problems, too: for example, being able to find out why the
binaries differ and which source file caused the problem.

There are still some cases left that stop gdb from working correctly
that cannot be solved by binary comparison:

Imagine an executable that uses a library. The executable is not
changed very often, but the library is rebuilt every day. If the
library's interface (defined in a header file included by both)
changes, you get a process (the executable with the library loaded)
that knows two versions of the things defined in that header file. How
can this be detected, once the wrong build messed things up?

Similar errors can be introduced by using different compiler switches
or compiler versions, possibly with different defines set by default
for building both.

Actually, it's about debugging your build system.

best regards
Martin



* Re: RFC: include mod time and size in DWARF file name table
  2013-01-14 12:31     ` Martin Runge
@ 2013-01-15 17:51       ` Tom Tromey
  0 siblings, 0 replies; 5+ messages in thread
From: Tom Tromey @ 2013-01-15 17:51 UTC (permalink / raw)
  To: Martin Runge; +Cc: Joel Brobecker, John Gilmore, gdb, Martin Runge

>>>>> "Martin" == Martin Runge <martin.runge@web.de> writes:

Martin> Imagine an executable that uses a library. The executable is not
Martin> changed very often, but the library is rebuilt every day. If the
Martin> library's interface (defined in a header file included by both)
Martin> changes, you get a process (the executable with the library loaded)
Martin> that knows two versions of the things defined in that header file. How
Martin> can this be detected, once the wrong build messed things up?

I would suggest writing a tool to check the ABI using the debuginfo.
Then you can compare types between the executable and the library.
There are already some ABI checkers out there, maybe one would suit your
purposes.
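
To make the idea concrete, a very rough sketch in Python of the comparison
step only. It assumes the type layouts have already been extracted from the
debuginfo of both objects into plain dictionaries; that extraction is not
shown, and the data shape and the name compare_layouts are hypothetical.

```python
def compare_layouts(exe_types, lib_types):
    """Report types that both sides know but lay out differently.

    Each argument maps a type name to a pair:
        (byte_size, [(member_name, byte_offset, member_type), ...])
    """
    problems = []
    for name in sorted(exe_types.keys() & lib_types.keys()):
        exe_size, exe_members = exe_types[name]
        lib_size, lib_members = lib_types[name]
        if exe_size != lib_size:
            problems.append(f"{name}: size {exe_size} vs {lib_size}")
        if list(exe_members) != list(lib_members):
            problems.append(f"{name}: member layout differs")
    return problems
```

Types known to only one side are ignored here, since an executable normally
uses just a part of the library's interface.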

Tom


end of thread, other threads:[~2013-01-15 17:51 UTC | newest]
