public inbox for binutils@sourceware.org
* Comparing binaries automatically
From: Philip Prindeville @ 2017-12-19  4:48 UTC
  To: binutils

Hi.

We’ve got a situation where we’re refactoring Makefiles, headers, etc. and want to make sure that the resultant code is still what we’re expecting.

The easiest way to do this would be to compare the binaries with “cmp” or something like that, but unfortunately it would be easy and wrong.

Problem is we have things in our build automation like:

REVISION := $(shell git log -1 --pretty=format:%h .)

CPPFLAGS += -DREVISION='"$(REVISION)"'

And there are programs that build __DATE__ and __USAGE__ into their help, etc.

Obviously these will change with successive compilations.

Is there an easy way to say, in high-level terms, “compare a.out and b.out ignoring section .xyzzy or the contents of memory occupied by variables ‘foo’ and ‘bar’”?

Then I could do:

const char __date[] = __DATE__;

and then interpolate “__date” into printf-format arguments instead of __DATE__… but say “ignore the symbol ‘__date’ when comparing executables”.

Or alternately:

const char __date[] __attribute__((section(".xyzzy"))) = __DATE__;

and then say “ignore section .xyzzy when comparing executables”.

I thought about using objcopy and deleting the “.xyzzy” section (objcopy -R .xyzzy ...) and using that for comparison, but that would result in undefined symbols.

I doubt I’m the first to need to do this, and indeed when I was at Cisco and we needed to validate a new compiler or toolchain, we’d swap out tools but do a sophisticated comparison of the binaries to see if they were identical… and then when they weren’t, we’d have to sift through disassembly listings trying to figure out what changed, why, and if the new code was equivalent or better (and correct, obviously).

Just wish I had asked some questions about what the methodology was for the validation.

Equally, I wish that binutils included an “objcmp” utility for comparing executables, modulo symbols and sections you might want to ignore.

Come to think of it, such a utility would be very useful for validating changes to gcc, ld, gas, etc.

Anyone know of a way to do a “smart” comparison of executables (or relocatable objects, or shared libraries, or…)?

Thanks,

-Philip


* Re: Comparing binaries automatically
From: Simon Richter @ 2017-12-19 13:06 UTC
  To: binutils



Hi,

On 19.12.2017 05:48, Philip Prindeville wrote:

> Anyone know of a way to do a “smart” comparison of executables (or relocatable objects, or shared libraries, or…)?

The Debian Reproducible Builds project has a tool called "diffoscope",
as well as a few years' experience reproducing compilation results, so
this may be a good starting point:

https://wiki.debian.org/ReproducibleBuilds

They don't have a good solution for __DATE__ yet though, unless you
consider a compiler flag warning about its use a solution.

   Simon



* Re: Comparing binaries automatically
From: Philip Prindeville @ 2017-12-19 19:09 UTC
  To: Simon Richter; +Cc: binutils



> On Dec 19, 2017, at 6:06 AM, Simon Richter <Simon.Richter@hogyros.de> wrote:
> 
> Hi,
> 
> On 19.12.2017 05:48, Philip Prindeville wrote:
> 
>> Anyone know of a way to do a “smart” comparison of executables (or relocatable objects, or shared libraries, or…)?
> 
> The Debian Reproducible Builds project has a tool called "diffoscope",
> as well as a few years' experience reproducing compilation results, so
> this may be a good starting point:
> 
> https://wiki.debian.org/ReproducibleBuilds
> 
> They don't have a good solution for __DATE__ yet though, unless you
> consider a compiler flag warning about its use a solution.
> 
>   Simon
> 


Yeah, I’ve seen a variety of approaches to the __DATE__/__TIME__/__TIMESTAMP__ problem, including setting SOURCE_DATE_EPOCH to the mtime of the youngest file in the tarball or the timestamp of the most recent git commit, etc. (although squashing/rebasing commits tends to screw up their timestamp, in that the code might be changed/updated but the timestamp remains that of the original commit, not the rebased one).
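
For reference, the two SOURCE_DATE_EPOCH sources mentioned above could be sketched roughly as follows. The function names are made up for illustration, and the mtime variant assumes GNU find and stat. Note that %ct is the committer date, which a rebase does update; the author date (%at) is the one that keeps the original commit's time.

```shell
# Sketch only: two ways to derive a value for SOURCE_DATE_EPOCH.
# Function names are hypothetical; "stat -c %Y" assumes GNU coreutils.

# Committer timestamp (seconds since the epoch) of the newest commit
# that touched the given directory.
epoch_from_git() {
    git -C "$1" log -1 --pretty=format:%ct -- .
}

# Newest modification time among the regular files of an unpacked
# source tree, e.g. an extracted tarball.
epoch_from_mtime() {
    find "$1" -type f -exec stat -c %Y {} + | sort -n | tail -n 1
}
```

The result would then be exported before the build, e.g. `export SOURCE_DATE_EPOCH=$(epoch_from_git .)`. As far as I know, GCC 7 and later use that value for __DATE__ and __TIME__; older compilers ignore it.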

No perfect solution.

Will have a look at diffoscope, thanks.


-Philip


* Re: Comparing binaries automatically
From: Philip Prindeville @ 2017-12-20  1:15 UTC
  To: Simon Richter; +Cc: binutils




> On Dec 19, 2017, at 12:09 PM, Philip Prindeville <philipp_subx@redfish-solutions.com> wrote:
> 
> 
> 
>> On Dec 19, 2017, at 6:06 AM, Simon Richter <Simon.Richter@hogyros.de> wrote:
>> 
>> Hi,
>> 
>> On 19.12.2017 05:48, Philip Prindeville wrote:
>> 
>>> Anyone know of a way to do a “smart” comparison of executables (or relocatable objects, or shared libraries, or…)?
>> 
>> The Debian Reproducible Builds project has a tool called "diffoscope",
>> as well as a few years' experience reproducing compilation results, so
>> this may be a good starting point:
>> 
>> https://wiki.debian.org/ReproducibleBuilds
>> 
>> They don't have a good solution for __DATE__ yet though, unless you
>> consider a compiler flag warning about its use a solution.
>> 
>>  Simon
>> 
> 
> 
> Yeah, I’ve seen a variety of approaches to the __DATE__/__TIME__/__TIMESTAMP__ problem, including setting SOURCE_DATE_EPOCH to the mtime of the youngest file in the tarball or the timestamp of the most recent git commit, etc. (although squashing/rebasing commits tends to screw up their timestamp, in that the code might be changed/updated but the timestamp remains that of the original commit, not the rebased one).
> 
> No perfect solution.
> 
> Will have a look at diffoscope, thanks.
> 
> 
> -Philip
> 


Hi all,

I tried doing the following:

#define __Variant __attribute__((section(".variant")))

…

const char __revision[] __Variant = REVISION;		/* defined by CPPFLAGS on command-line */

and then replacing occurrences (there was just one) of REVISION with __revision, and recompiling.

Then I tried the following steps:

% mips-openwrt-linux-uclibc-objcopy -R .variant dnssec_test dnssec_test.stripped
% mips-openwrt-linux-uclibc-objdump -D dnssec_test > dnssec_test.lst
% mips-openwrt-linux-uclibc-objdump -D dnssec_test.stripped > dnssec_test.stripped.lst
% diff -u dnssec_test.lst dnssec_test.stripped.lst > dnssec_test.diff

The result is attached.

The differences in the filenames I expected.  The missing .variant section was of course the desired effect.

But what I don’t get is why the symbol names for the subsequent addresses (00414010, 0041443c, 00414440, 004144f0, and 00000000) ended up disagreeing.

In the future, when comparing two different compiles, I’ll be deleting the .variant section from both of them, so the offsets should be affected similarly, and there won’t be disagreement.

But here on the same file before and after deletion, there was.  It shouldn’t be a problem but it is curious and I’d like to understand it and if possible, find a workaround.
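
A minimal sketch of that plan: delete the variant section from both builds and compare what is left. The function name and the OBJCOPY variable are illustrative, not part of binutils; -R is objcopy's short form of --remove-section. It assumes the section has the same size in both builds, since otherwise later offsets shift and the files differ anyway.

```shell
# Sketch only: byte-compare two builds after removing one section from each.
# OBJCOPY may name a cross tool such as mips-openwrt-linux-uclibc-objcopy.
# Returns 0 if the stripped copies are identical, non-zero if they differ
# or if objcopy failed.
cmp_without_section() {
    section=$1 old=$2 new=$3
    tmpdir=$(mktemp -d) || return 2
    "${OBJCOPY:-objcopy}" -R "$section" "$old" "$tmpdir/old" &&
    "${OBJCOPY:-objcopy}" -R "$section" "$new" "$tmpdir/new" &&
    cmp "$tmpdir/old" "$tmpdir/new"
    status=$?
    rm -rf "$tmpdir"
    return "$status"
}
```

With made-up paths, that would be used as `OBJCOPY=mips-openwrt-linux-uclibc-objcopy cmp_without_section .variant old/dnssec_test new/dnssec_test`. If the linker emits a build ID note, that will likely differ between builds as well and may need the same treatment.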

Thanks,

-Philip




[-- Attachment #2: dnssec_test.diff --]
[-- Type: application/octet-stream, Size: 1583 bytes --]

--- dnssec_test.lst	2017-12-19 23:23:10.041489211 +0000
+++ dnssec_test.stripped.lst	2017-12-19 23:23:12.973515877 +0000
@@ -1,5 +1,5 @@
 
-dnssec_test:     file format elf32-tradbigmips
+dnssec_test.stripped:     file format elf32-tradbigmips
 
 
 Disassembly of section .interp:
@@ -4686,18 +4686,6 @@
   403ae0:	00403690 	0x403690
   403ae4:	0040369c 	0x40369c
 
-Disassembly of section .variant:
-
-00403ae8 <.variant>:
-  403ae8:	616e6472 	0x616e6472
-  403aec:	6f69642d 	0x6f69642d
-  403af0:	32303137 	andi	s0,s1,0x3137
-  403af4:	2d30392d 	sltiu	s0,t1,14637
-  403af8:	7263322d 	0x7263322d
-  403afc:	372d6763 	ori	t5,t9,0x6763
-  403b00:	64313538 	0x64313538
-  403b04:	36350000 	ori	s5,s1,0x0
-
 Disassembly of section .eh_frame:
 
 00403b08 <.eh_frame>:
@@ -4717,7 +4705,7 @@
 
 Disassembly of section .data:
 
-00414010 <_fdata>:
+00414010 <.data>:
   414010:	59000000 	blezl	t0,414014 <_fdata+0x4>
   414014:	46000000 	add.s	$f0,$f0,$f0
   414018:	46000000 	add.s	$f0,$f0,$f0
@@ -4732,12 +4720,12 @@
 
 Disassembly of section .rld_map:
 
-0041443c <__RLD_MAP>:
+0041443c <_fdata+0x42c>:
   41443c:	00000000 	nop
 
 Disassembly of section .got.plt:
 
-00414440 <.got.plt>:
+00414440 <__RLD_MAP+0x4>:
 	...
   414448:	00400a20 	0x400a20
   41444c:	00400a20 	0x400a20
@@ -4784,12 +4772,12 @@
 
 Disassembly of section .bss:
 
-004144f0 <stderr>:
+004144f0 <.bss>:
 	...
 
 Disassembly of section .comment:
 
-00000000 <.comment>:
+00000000 <stderr-0x4144f0>:
    0:	4743433a 	c1	0x143433a
    4:	2028474e 	addi	t0,at,18254
    8:	55292033 	bnel	t1,t1,80d8 <_init-0x3f88e8>
