public inbox for gcc-help@gcc.gnu.org
* Re: gcc binary output differs whether it is built from *.o or *.a
@ 2010-01-11 22:07 Michael Morrell
From: Michael Morrell @ 2010-01-11 22:07 UTC
  To: gcc-help

I've been doing a LOT of work lately trying to understand why I don't
always get 100% identical output from identical source, so I'm somewhat
of an expert in this area.

> I assume that it does not come from the ar packaging, because
> checksums on ar archives are never the same.

The ar archive checksums differ because ar stores a timestamp for each
archive member (something the new and long overdue -D option will
address).  However, your problem is still with how you are calling "ar".
If you put the members in the archive in the same order as you would
list them on the gcc command line, you will get identical output:

  gcc -o foo main.o 1.o 2.o

vs

  ar q bar.a 1.o 2.o
  gcc -o foo main.o bar.a

The use of "-g" won't affect this.
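
A minimal sketch of the archive recipe above, assuming GNU ar from
binutils 2.20 or later (older versions reject the D modifier). The helper
name make_archive is made up for illustration; the point is only that the
member order is written down once and the D modifier zeroes the per-member
timestamp and owner fields.

```shell
#!/bin/sh
# Sketch only: assumes GNU ar >= 2.20; make_archive is a hypothetical helper.
set -eu

# make_archive ARCHIVE MEMBER...
# Rebuild ARCHIVE from scratch with members in exactly the order given.
make_archive() {
    archive=$1
    shift
    rm -f "$archive"        # avoid inheriting members or order from an old archive
    # r = add members, c = create without a warning, D = deterministic headers
    ar rcD "$archive" "$@"
}
```

For the example above this would be "make_archive bar.a 1.o 2.o" followed
by "gcc -o foo main.o bar.a".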

  Michael


* Re: gcc binary output differs whether it is built from *.o or *.a
  2010-01-08 15:47   ` Ian Lance Taylor
@ 2010-01-17 14:41     ` BEAUGY Alexandre
From: BEAUGY Alexandre @ 2010-01-17 14:41 UTC
  To: gcc-help

Dear Ian and Michael,

Thanks a lot for your answers, which both focus on ar's
non-deterministic behaviour (in binutils versions prior to 2.20), and
for the solution you proposed, i.e. using a more recent version of
binutils (>= 2.20) and enabling the D modifier to generate a
deterministic archive.

Nevertheless, I'm currently working on RHEL 4.4, which still ships
binutils 2.15.92.0.2-21, and I'm stuck with it. Therefore I will
continue using my workaround.

Again, thanks a lot to you all.

Regards,

-- 
Alexandre Beaugy



* Re: gcc binary output differs whether it is built from *.o or *.a
  2010-01-08 14:59 ` Alexandre Beaugy
@ 2010-01-08 15:47   ` Ian Lance Taylor
  2010-01-17 14:41     ` BEAUGY Alexandre
From: Ian Lance Taylor @ 2010-01-08 15:47 UTC
  To: Alexandre Beaugy; +Cc: gcc-help

Alexandre Beaugy <beaugy.a@free.fr> writes:

> Then, how does gcc behave at link time? How does it handle static
> libraries? Or perhaps, how does the linker behave? Does gcc pass it a
> list of object files to link (having first extracted the files from
> the ar archive)? Why do some binaries, linked with static libs, have a
> constant checksum, while others never get the same checksum twice in a
> row, with the same C sources?

gcc's behaviour at link time is to pass the arguments to the linker,
with some prepended files and some appended files and libraries.

When using an archive, the main effect on the linker is the order of
the symbols in the archive symbol map.  You can see the archive symbol
map using nm --print-armap on the archive.  The order of the symbols
in the archive symbol map is determined by the order in which the
objects are passed to the ar program when creating the archive.

I assume you are using precisely the same ld and ar programs in all
cases, otherwise all bets are off.

By the way, as of binutils 2.20 GNU ar supports a D modifier to
generate a deterministic archive, which should always be bitwise
identical given the same inputs.
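
To compare the symbol maps of two archives, the index part of the nm
output can be isolated and diffed. A small sketch, assuming GNU nm prints
the index under an "Archive index:" heading, followed by one "symbol in
member" line per symbol and then a blank line; armap_index and libbar.a
are placeholder names.

```shell
#!/bin/sh
# armap_index (hypothetical helper): read `nm --print-armap` output on stdin
# and print only the "symbol in member" lines of the archive index.
armap_index() {
    awk '
        /^Archive index:/ { in_index = 1; next }  # heading that starts the index
        in_index && NF == 0 { exit }              # blank line ends the index
        in_index { print }
    '
}

# Usage, with libbar.a standing in for a real archive:
#   nm --print-armap libbar.a | armap_index
```

Saving that output for the archive built on each machine and running diff
on the two files shows whether the symbol order is the same.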

Ian


* Re: gcc binary output differs whether it is built from *.o or *.a
       [not found] <86816328.6906451262962766781.JavaMail.root@spooler8-g27.priv.proxad.net>
@ 2010-01-08 14:59 ` Alexandre Beaugy
  2010-01-08 15:47   ` Ian Lance Taylor
From: Alexandre Beaugy @ 2010-01-08 14:59 UTC
  To: gcc-help

----- "Cedric Roux" <cedric.roux@acri-st.fr> wrote:
> beaugy.a@free.fr wrote:
> > Hi all,
> > So, to put it in a nutshell, all my generated object files are
> > identical on dev1 and dev2, and the object files contained in my
> > convenience libraries are all identical. The only difference
> > remaining, before I generate my binary, resides in the generated
> > convenience libraries, which are not identical, but their contents
> > are. So AFAIK, this slight difference should not matter. So
> > "why does gcc output (MD5 checksum) differ when I build a binary
> > using the project object files (*.o) or the project convenience
> > libraries (*.a)?" and "what can I do to fix that?".
> 
> Your *.o files are processed in a different order on the
> command line than in the .a archive, which puts symbols
> at different addresses, so obviously different binaries
> are produced.
> 
> Even if you do:
> gcc 1.o 2.o
> And:
> gcc 2.o 1.o
> you get binaries with different md5.

As you (and Eljay, later) suggested, the order of object files on my
compilation command line matters. I dare say I had already noticed that
very point (as the workaround at the bottom of this mail will show
you...). I assume it has to do with addressing, which differs as the
order of the object files changes. Nevertheless, since I'm using the
autotools, the order of the object files remains the same on the
command line, so the checksum should too.

As for Eljay's suggestion about using "-frandom-seed=foo" when
generating C++ objects: I am already using this option for C++
objects, and my project is mainly C based.

As for Eljay's second remark, about debugging info that could make
checksums vary, I dare say I had noticed this behaviour too.
Therefore, I removed all "-g*" options from the compilation options.

At the end, with both "-frandom-seed=foo" and debug info removed, I am
able to always generate reproducibly identical object files.

After that, if I link all the object files into a binary, I always get
the same checksum for the binary (anywhere, at any time). But if I
choose to group my object files into various static libraries before
linking those static libraries into a binary, it results in:
    - some binaries with a constant checksum (anywhere, at any time);
    - but other binaries whose checksum varies from build to build.

I assume that it does not come from the ar packaging, because
checksums on ar archives are never the same. The explanation, I
think, is that ar, like tar, stores metadata about each archived
object file (e.g. creation date, owner, group, etc.), and this
metadata may vary between compilation platforms.

Therefore, I think that the difference between the binaries produced
on my two platforms comes from the (possibly) different ways in which
object files are extracted from the ar archives, producing different
object file lists. And we are back to your suggestion about the order
on the command line.

This is confirmed by the workaround I had to put in some of my
projects to generate reproducibly identical binaries:

[BEGIN: Makefile.am extract]
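# Workaround: relink foo from the objects stored in the convenience
# archives rather than from the archives themselves.
#  1. Map each libtool libX.la entry of foo_LDADD to its .libs/libX.a
#     file, prefixing paths that start with "." with ../ so that they
#     resolve from inside tmp-ar-o.
#  2. Extract every archive into tmp-ar-o with "ar x".
#  3. Remove the *-2.o files and rename each *-1.o file by stripping
#     the "-1".
#  4. Link foo_OBJECTS together with the extracted objects.
foo$(EXEEXT): $(foo_OBJECTS) $(foo_DEPENDENCIES)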
	@rm -f foo$(EXEEXT)
	@rm -rf tmp-ar-o
	@mkdir -p tmp-ar-o
	@pattern="[[:blank:]]*\([^[:blank:]]\+\)/lib\([^[:blank:]]\+\)\.la" ; \
	 lib_list=`echo "$(foo_LDADD)" | \
	           sed "s#$$pattern# \1/.libs/lib\2.a#g"` ; \
	 pattern="[[:blank:]]*\([\.][^[:blank:]]\+\)\.a" ; \
	 lib_list=`echo "$$lib_list" | \
	           sed "/[[:blank:]]\+-l/d ; s#$$pattern# ../\1.a#g"` ; \
	 pushd tmp-ar-o ; \
	 for lib in $$lib_list ; do \
	   echo "ar x $$lib" ; \
	   ar x $$lib ; \
	 done ; \
	 rm -f *-2.o ; \
	 for dupobj in `ls *-1.o` ; do \
	   obj=`echo $$dupobj | sed 's/-1//'` ; \
	   mv $$dupobj $$obj ; \
	 done ; \
	 popd
	$(LINK) $(foo_LDFLAGS) $(foo_OBJECTS) tmp-ar-o/*.o $(LIBS)
	@rm -rf tmp-ar-o
[END: Makefile.am extract]

(It extracts the contents of the project's local static libraries into
 a temporary folder and then does the linking. For clarity, LDFLAGS
 contains only external shared libraries (e.g. libm.so, libpthread.so,
 etc.).)

Then, how does gcc behave at link time? How does it handle static
libraries? Or perhaps, how does the linker behave? Does gcc pass it a
list of object files to link (having first extracted the files from
the ar archive)? Why do some binaries, linked with static libs, have a
constant checksum, while others never get the same checksum twice in a
row, with the same C sources?

I hope you can help me.

Thanks a lot for your help.

Regards,

-- 
Alexandre Beaugy


* Re: gcc binary output differs whether it is built from *.o or *.a
  2010-01-07 10:02 ` beaugy.a
@ 2010-01-07 10:28   ` Cedric Roux
From: Cedric Roux @ 2010-01-07 10:28 UTC
  To: beaugy.a; +Cc: gcc-help

beaugy.a@free.fr wrote:
> Hi all,
> So, to put it in a nutshell, all my generated object files are
> identical on dev1 and dev2, and the object files contained in my
> convenience libraries are all identical. The only difference
> remaining, before I generate my binary, resides in the generated
> convenience libraries, which are not identical, but their contents
> are. So AFAIK, this slight difference should not matter. So
> "why does gcc output (MD5 checksum) differ when I build a binary
> using the project object files (*.o) or the project convenience
> libraries (*.a)?" and "what can I do to fix that?".

Your *.o files are processed in a different order on the
command line than in the .a archive, which puts symbols
at different addresses, so obviously different binaries
are produced.

Even if you do:
gcc 1.o 2.o
And:
gcc 2.o 1.o
you get binaries with different md5.
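
Since the order alone changes the output, one way to take it out of the
picture is to never rely on whatever order the shell or the build tool
happens to produce. A sketch, assuming the objects sit in one directory;
sorted_objects is a made-up helper, and the byte-order sort (LC_ALL=C) is
just one possible fixed order, not something the thread prescribes.

```shell
#!/bin/sh
# sorted_objects DIR (hypothetical helper): print the *.o files in DIR,
# one per line, sorted by byte value so the order ignores the user's locale.
sorted_objects() {
    for f in "$1"/*.o; do
        # An unmatched glob stays literal; skip it instead of printing it.
        if [ -e "$f" ]; then
            printf '%s\n' "$f"
        fi
    done | LC_ALL=C sort
}

# Usage (breaks on file names containing whitespace):
#   gcc -o foo main.o $(sorted_objects objs)
```

The same list can be handed to ar, so that the archive members and the
plain object link see the objects in one and the same order.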

My 0.02 euros.
Cédric.


* gcc binary output differs whether it is built from *.o or *.a
       [not found] <471223414.6661911262858011824.JavaMail.root@spooler8-g27.priv.proxad.net>
@ 2010-01-07 10:02 ` beaugy.a
  2010-01-07 10:28   ` Cedric Roux
From: beaugy.a @ 2010-01-07 10:02 UTC
  To: gcc-help

Hi all,

To fully understand the purpose of my e-mail, you should know what my
goal is: "produce reproducibly identical binaries and libraries on all
my development computers" (i.e. produce binaries and libraries whose
MD5 checksum is the same on all my development computers).

Here is a brief description of my environment:

Computer 1:
  - hostname: dev1
  - user: userX
  - home: /home/userX
  - OS: RHEL4-U4

Computer 2:
  - hostname: dev2
  - user: userY
  - home: /home/userY
  - OS: RHEL4-U4

On both computers, the installed package sets are equivalent
(cloned). Only the login user names differ from one computer to the
other (this will matter when it comes to the static libraries'
checksums).

Now the full version of my e-mail's subject: "Why does gcc output (MD5
checksum) differ when I build a binary using the project object files
(*.o) or the project convenience libraries (*.a)?".

In my peregrinations, I first realized that the first source of
varying checksums was the non-systematic use of "-fPIC -DPIC" in the
gcc options for object file generation. Therefore, I configured my
project with "--with-pic" to force the use of PIC for each object
generated. The result was as expected: all object files produced,
whether on dev1 or on dev2, are identical (same md5sum). Nevertheless,
the generated binary is still different on dev1 and dev2.

I looked at the convenience libraries I produced and discovered that
none of them was identical between dev1 and dev2. A quick look inside
the produced ar archives and a little reading on
http://en.wikipedia.org/wiki/Ar_%28Unix%29 gave me the explanation.
Ar, like tar, stores various metadata for each archived object, such
as its modification timestamp, its uid, its gid, etc. Therefore,
AFAIK, in no case can convenience libraries be produced twice in a row
with the same MD5 checksum (i.e. if the object files are also
re-generated, their new timestamps will change the MD5 checksum). So,
to understand what was going on in there, I extracted the contents of
all the ar archives into one directory and checked all my object
files. All object files extracted from the convenience libraries were
the same as the ones spread over my project directory. Good point, but
that was also what I expected.
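
The effect of that metadata can be reproduced without any compiler. The
sketch below hand-writes two one-member archives in the common ar layout
(an 8-byte magic string, then a 60-byte header holding name, mtime, uid,
gid, mode and size, then the data). The member name, timestamps and ids
are made-up values, and write_ar is a hypothetical helper. The payload is
the same in both archives, yet the checksums differ.

```shell
#!/bin/sh
# write_ar MTIME UID GID (hypothetical helper): emit a one-member ar archive
# on stdout. Field widths follow the common ar header: 16+12+6+6+8+10+2 bytes.
write_ar() {
    payload='same bytes'    # 10 bytes; even length, so no padding byte is needed
    printf '!<arch>\n'
    printf '%-16s%-12s%-6s%-6s%-8s%-10s`\n' \
        'demo.o/' "$1" "$2" "$3" '100644' "${#payload}"
    printf '%s' "$payload"
}

# Same payload, different timestamp and owner: the two checksums differ.
write_ar 1000000000 500 500 | cksum
write_ar 1000000060 501 501 | cksum
```

If ar is installed, saving one of these streams to a file and running
"ar tv" on it should list demo.o with the chosen owner and date.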

Therefore I tried a little workaround to reach my goal: extract all
object files contained in my convenience libraries and build my binary
from these extracted object files. And here are the results obtained
on both computers:

dev1:
  - Checksum of binary built from *.a: fda85686a499d273596d33b661a01913
  - Checksum of binary built from *.o: 9e1525d404db45bf920e54a8728313eb

dev2:
  - Checksum of binary built from *.a: 5f8c6a658343741f1a251f00ab60a9c3
  - Checksum of binary built from *.o: 9e1525d404db45bf920e54a8728313eb

Then, when building my binary with object files, both generated
binaries are identical on dev1 and dev2. But if I use convenience
libraries as input, generated binaries are different.
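
To narrow down where two machines diverge, a checksum manifest of every
object and archive can be generated on each machine and then diffed. A
sketch, assuming md5sum from coreutils is available; manifest and the
directory and file names in the usage comment are placeholders.

```shell
#!/bin/sh
# manifest DIR (hypothetical helper): print "md5  path" for every *.o and *.a
# under DIR, with paths relative to DIR and in a locale-independent order.
manifest() {
    (
        cd "$1" || exit 1
        find . -type f \( -name '*.o' -o -name '*.a' \) |
            LC_ALL=C sort |
            while IFS= read -r f; do
                # md5sum prints "hash  -" for stdin; keep only the hash.
                sum=$(md5sum < "$f" | cut -d' ' -f1)
                printf '%s  %s\n' "$sum" "$f"
            done
    )
}

# Usage: run on each machine, then compare the two files.
#   manifest ~/project > manifest.$(hostname)
#   diff manifest.dev1 manifest.dev2
```

Lines that differ only for *.a files while every *.o line matches point at
the archives themselves rather than at the compiler.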

So, to put it in a nutshell, all my generated object files are
identical on dev1 and dev2, and the object files contained in my
convenience libraries are all identical. The only difference
remaining, before I generate my binary, resides in the generated
convenience libraries, which are not identical, but their contents
are. So AFAIK, this slight difference should not matter. So
"why does gcc output (MD5 checksum) differ when I build a binary
using the project object files (*.o) or the project convenience
libraries (*.a)?" and "what can I do to fix that?".

Here are the commands I executed on dev1 and dev2 to generate my
binaries, respectively, with object files (*.o) or convenience
libraries (*.a) as inputs:

gcc -Wall -Wextra -Ulinux -Dlinux=linux -D_GNU_SOURCE -DNDEBUG \
-o my_binary *.o -L/opt/snmp/lib /opt/snmp/lib/libnetsnmpmibs.so \
/usr/lib/librpm.so -L/usr/src/build/757196-i386/install/usr/lib \
-L/usr/lib /usr/lib/librpmdb.so -L/usr/local/lib -lelf -lselinux \
/usr/lib/librpmio.so /usr/lib/libbeecrypt.so -lbz2 /usr/lib/libpopt.so \
/opt/snmp/lib/libnetsnmphelpers.so /opt/snmp/lib/libnetsnmpagent.so \
/opt/snmp/lib/libnetsnmp.so -lcrypto /usr/lib/libccgnu2.so \
/usr/lib/libccext2.so -lrt -lxerces-c -ldl /usr/lib/libxml2.so -lpthread \
-lz -lm -lcib -lplumb -lcrmcommon -lhbclient -lc -Wl,--rpath \
-Wl,/opt/snmp/lib -Wl,--rpath -Wl,/opt/snmp/lib

gcc -Wall -Wextra -Ulinux -Dlinux=linux -D_GNU_SOURCE -DNDEBUG \
-o my_binary -L/opt/snmp/lib /opt/snmp/lib/libnetsnmpmibs.so \
/usr/lib/librpm.so -L/usr/src/build/757196-i386/install/usr/lib \
-L/usr/lib /usr/lib/librpmdb.so -L/usr/local/lib -lelf -lselinux \
/usr/lib/librpmio.so /usr/lib/libbeecrypt.so -lbz2 /usr/lib/libpopt.so \
/opt/snmp/lib/libnetsnmphelpers.so /opt/snmp/lib/libnetsnmpagent.so \
/opt/snmp/lib/libnetsnmp.so -lcrypto /usr/lib/libccgnu2.so \
/usr/lib/libccext2.so -lrt -lxerces-c -ldl /usr/lib/libxml2.so -lpthread \
-lz -lm -lcib -lplumb -lcrmcommon -lhbclient *.a -lc -Wl,--rpath \
-Wl,/opt/snmp/lib -Wl,--rpath -Wl,/opt/snmp/lib

Note: These command lines originally come from my autotools Makefile.

Thank you, in advance, for your help.

Kind Regards,

--
Alexandre BEAUGY
Product Engineer at Egis Avia
(Toulouse, France)
