Machine Readable Format options? Try 2 without autoattachment.

public inbox for binutils@sourceware.org

* Machine Readable Format options? Try 2 without autoattachment.
@ 2020-04-10 23:29 John Carter
  2020-04-17 14:46 ` Nick Clifton
  0 siblings, 1 reply; 2+ messages in thread
From: John Carter @ 2020-04-10 23:29 UTC
  To: binutils

I have been using binutils for various datamining tasks for decades now....

I build multiple products for multiple tool chains and multiple
platforms from the same code base.... carved up by #if spaghetti and
cunning linking.

The one and only thing that truly knows what ends up being called by
what, and what ends up where, is the preprocessor / compiler / linker.

Binutils, via tools like nm and objdump, can tell me what the compiler
/ linker did....

But the output formats are designed for human consumption and have,
ahhh, umfeatures that make automated parsing and querying hard.

For example, it's quite common in the output formats to omit a field,
forcing you to use heuristics to navigate past the corresponding
spaces to pick up the next column.
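
A minimal sketch of the omitted-field problem, assuming the default
(BSD-style) nm layout of "value type name", where undefined symbols
leave the value column blank. The NmSymbol type and parse_nm_line
helper are hypothetical names for illustration, and the sample lines
are made up. The heuristic is fragile by design: it breaks, for
instance, on demangled C++ names that contain spaces.

```python
from typing import NamedTuple, Optional


class NmSymbol(NamedTuple):
    value: Optional[int]  # None when nm leaves the address column blank
    kind: str             # one-letter symbol type, e.g. "T", "U", "C"
    name: str


def parse_nm_line(line: str) -> Optional[NmSymbol]:
    """Parse one line of default (BSD-style) nm output, or return None."""
    fields = line.split()
    if len(fields) == 3 and len(fields[1]) == 1:
        try:
            value = int(fields[0], 16)
        except ValueError:
            return None
        return NmSymbol(value, fields[1], fields[2])
    if len(fields) == 2 and len(fields[0]) == 1:
        # Heuristic: only two columns, so the value column was omitted
        # (typical for undefined symbols).
        return NmSymbol(None, fields[0], fields[1])
    return None  # blank lines, "file.o:" headers, anything unexpected


# Made-up sample in the assumed layout.
sample = """\
0000000000000000 T main
                 U puts
0000000000000004 C counter
"""
symbols = [s for s in map(parse_nm_line, sample.splitlines()) if s]
```

Every consumer of the text output ends up re-deriving a guess like the
two-column branch above, which is the kind of kludge a structured
format would make unnecessary.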

Also, the formats are informally specified... not something you really
want to rely on.

However, the world is awash with well defined human readable machine
formats that could trivially be used. (eg. json, yaml, ...)

Some of the binutils tools have --format=XXX settings (typically used
for compatibility with legacy standards).

It would be trivial to add a --format=json option.

Ideally all and every bit of information that can be obtained from the
binutils tools should be available in a machine readable form... (a
well factored / designed sqlite db would be a dream...)

...but in practice I'm usually parsing objdump --syms, or nm --extern
or even occasionally objdump -d to pull call graph and definition -
reference graph information or objdump --dwarf=info to pull macro
definitions info.

It's always a sore point with me that the first step in datamining elf
data is always to write a one-off custom somewhat kludgy parser to
read the output format.

However I'm clearly not alone in this desire, as witnessed by this
llvm-dev mail thread...

https://groups.google.com/d/topic/llvm-dev/U-sTsZB-6ls/discussion

Is there any initiative afoot to produce machine readable output formats
such as csv or json or yaml or...?

My typical destination for these activities is either a ruby script or
sqlite.
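
As an illustration of that destination, here is a rough sketch of
loading symbol records into sqlite and querying them. No binutils tool
emits this JSON today; the record shape (file, name, type, value) and
the table layout are assumptions made up for the example.

```python
import json
import sqlite3

# Hypothetical records, shaped like what a JSON-emitting nm might produce.
records = json.loads("""[
  {"file": "a.o", "name": "main",   "type": "T", "value": 0},
  {"file": "a.o", "name": "helper", "type": "U", "value": null},
  {"file": "b.o", "name": "helper", "type": "T", "value": 0},
  {"file": "b.o", "name": "orphan", "type": "T", "value": 16}
]""")

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE symbols (file TEXT, name TEXT, type TEXT, value INTEGER)"
)
db.executemany(
    "INSERT INTO symbols VALUES (:file, :name, :type, :value)", records
)

# For each undefined reference, find the object file that defines it.
rows = db.execute("""
    SELECT u.file, u.name, d.file
    FROM symbols AS u
    JOIN symbols AS d ON d.name = u.name
    WHERE u.type = 'U' AND d.type != 'U'
    ORDER BY u.file, u.name
""").fetchall()

for user, name, definer in rows:
    print(f"{user} references {name}, defined in {definer}")
```

With structured input the loading step is a single executemany call;
all of the effort goes into the query rather than into the parser.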

Of course, a standalone tool could do this, but for most tasks it's
just adding to the preexisting list of formatters (eg. bsd / posix /
sysv / and now add json)

I did look at elfutils, I wasn't aware of them until Nick mentioned
them, but they seem way behind in all aspects, including output
formats.

Would the maintainers object to a pull request that added such a feature
(probably easier than doing half baked parsers)?

What would the preferred output format be?

What would the preferred command line interface be (eg. Instead of the
usual bsd / posix / sysv ... options on --format= add --format=csv or
something?)

In my ideal dream universe there would be a "convert everything elf and
dwarf knows about this large collection of files into a well designed
relational data model in a sqlite db" switch.

But that is probably a large step too far.

Arguably this exists, it's called libbfd, libdwarf and libelf....but
those have to cope with the many tentacled horrors of decades of
legacy systems and cpus, which is why everyone uses the binutils tools
not the raw naked bfd.

Example use case just for inspiration:

Supposing you have a large body (>2000) of object files * 4 tool chains *
20 products all compiled * 1000 unit tests, all compiled with
-ffunction-sections -fdata-sections and linked with --gc-sections...

Tell me which functions / macros / are in the source but never used anywhere?

Believe me, whenever I do this class of analysis on a mature system,
you'll be surprised at how much code I delete, and how much easier it
is to refactor the remaining code...
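
A rough sketch of the core of that analysis, assuming the symbols have
already been extracted per object file as (type, name) pairs in nm's
one-letter notation. The function name and the sample data are
hypothetical. Note the limits: calls within a single object file never
show up as "U" entries, so a global used only inside its own file is
reported too, as is an entry point such as main.

```python
from collections import defaultdict


def unreferenced_globals(symbols_by_file):
    """Return {name: [defining files]} for globals no object file imports.

    symbols_by_file maps an object file name to (type, name) pairs.
    """
    defined = defaultdict(set)  # symbol name -> files that define it
    referenced = set()          # names some object file leaves undefined
    for path, syms in symbols_by_file.items():
        for kind, name in syms:
            if kind == "U":
                referenced.add(name)
            elif kind.isupper():
                # In nm output an upper-case type letter generally
                # marks a global (external) symbol.
                defined[name].add(path)
    return {
        name: sorted(files)
        for name, files in defined.items()
        if name not in referenced
    }


# Made-up input for illustration.
example = {
    "a.o": [("T", "main"), ("U", "helper"), ("t", "local_only")],
    "b.o": [("T", "helper"), ("T", "orphan")],
}
candidates = unreferenced_globals(example)
```

The result is a list of candidates for deletion, not a verdict; each
name still needs checking against the caveats above.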

Thanks!

John


* Re: Machine Readable Format options? Try 2 without autoattachment.
  2020-04-10 23:29 Machine Readable Format options? Try 2 without autoattachment John Carter
@ 2020-04-17 14:46 ` Nick Clifton
  0 siblings, 0 replies; 2+ messages in thread
From: Nick Clifton @ 2020-04-17 14:46 UTC
  To: John Carter, binutils

Hi John,

> However, the world is awash with well defined human readable machine
> formats that could trivially be used. (eg. json, yaml, ...)

Well "trivial" might not be the right word here.  Especially for something
like disassemblies which tend to be quite complex to produce.

Talking of which, did you see the recent addition of ASCII art flow
graphs to the disassembler's output?  Maybe hooking into this feature
will allow the production of the kind of output that you are seeking.

> Ideally all and every bit of information that can be obtained from the
> binutils tools should be available in a machine readable form... (a
> well factored / designed sqlite db would be a dream...)

I am not familiar with these formats, so please excuse me if the
following question is silly - are these formats self documenting ?
By which I mean is it necessary to come up with a specification for
the contents of the documents and then code the tools to conform
to this specification ?  Or can the tools just dump bits of information
piecemeal into an output file ?
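
One way to look at that question, as a small sketch: JSON carries its
own structure (field names, nesting, basic types), so a tool can emit
records piecemeal and a reader can still find fields by name. What it
does not carry is meaning, so the set of keys and what a value like "T"
stands for still need a written specification. The record and the
EXPECTED table below are hypothetical.

```python
import json

# A hypothetical symbol record; the keys are invented for this example.
record = {"name": "main", "type": "T", "value": 4409, "size": 38}

text = json.dumps(record, sort_keys=True)
decoded = json.loads(text)

# Structure survives the round trip without any external description:
# fields are found by name, not by column position.
symbol_name = decoded["name"]

# Semantics do not. A consumer still needs an agreed list of keys and
# types, which is what a specification would provide.
EXPECTED = {
    "name": str,
    "type": str,
    "value": (int, type(None)),
    "size": (int, type(None)),
}


def conforms(rec):
    """Check a record against the (hypothetical) agreed field list."""
    return set(rec) == set(EXPECTED) and all(
        isinstance(rec[key], types) for key, types in EXPECTED.items()
    )
```

So a tool can start by dumping whatever it knows, but consumers only
get a stable interface once the keys and their meanings are written
down.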

> https://groups.google.com/d/topic/llvm-dev/U-sTsZB-6ls/discussion
> 
> Is there any initiative afoot to produce machine readable output formats
> such as csv or json or yaml or...?

Well modulo that discussion thread - no.  Of course we are always willing
to review contributions that work towards this goal.  But I can say from 
a personal point of view that unless a Red Hat customer asks for this
feature, I am not going to be able to have the time to implement it.  :-(

> I did look at elfutils, I wasn't aware of them until Nick mentioned
> them, but they seem way behind in all aspects, including output
> formats.

Really - I had always thought they were up to date, if not leading the field.
But maybe they do not provide the kind of functionality that you really need.

> Would the maintainers object to a pull request that added such a feature
> (probably easier than doing half baked parsers)?

Nope - we will happily review anything.  I assume that you are aware
of the requirement for an FSF copyright assignment notice though ?

> What would the preferred output format be?

Your choice.  If you are writing it, you get to choose the format.

> What would the preferred command line interface be (eg. Instead of the
> usual bsd / posix / sysv ... options on --format= add --format=csv or
> something?)

For tools that have the --format option then adding a new alternative
to the list of formats makes sense.  For tools which do not already
have this option then I would suggest adding a slightly longer option
name:  --output-format=<json/sqlite/ascii>.
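
To show the shape of that interface, here is a sketch of a wrapper
script accepting the suggested option. The binutils tools themselves
are written in C and parse options differently; "symdump" is a
hypothetical program name, and only the option name and its three
values come from the suggestion above.

```python
import argparse


def build_parser():
    # "symdump" is a made-up wrapper, not an existing binutils tool.
    parser = argparse.ArgumentParser(prog="symdump")
    parser.add_argument(
        "--output-format",
        choices=["json", "sqlite", "ascii"],
        default="ascii",  # keep the human-readable output as the default
        help="format to write symbol information in",
    )
    parser.add_argument("files", nargs="+", help="object files to inspect")
    return parser


args = build_parser().parse_args(["--output-format=json", "a.o", "b.o"])
print(args.output_format, args.files)
```

Defaulting to the existing human-readable output keeps current users
and scripts unaffected by the new option.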

> Supposing you have a large body (>2000) of object files * 4 tool chains *
> 20 products all compiled * 1000 unit tests, all compiled with
> -ffunction-sections -fdata-sections and linked with --gc-sections...
> 
> Tell me which functions / macros / are in the source but never used anywhere?

Link them all with --print-gc-sections added to the command line and
capture the output...  Well that should work for functions anyway.  I
think that for unused macros you are going to need compiler help, not
linker help.
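
A sketch of capturing that output, assuming the linker reports each
discarded section with a line of the form "removing unused section
'...' in file '...'" and that -ffunction-sections places each function
in a section named .text.<function>. Both the message wording and the
section naming should be checked against the toolchain in use; the
sample lines and the helper name are made up.

```python
import re

# Assumed wording of the linker's --print-gc-sections messages.
GC_LINE = re.compile(
    r"removing unused section '(?P<section>[^']+)' in file '(?P<file>[^']+)'"
)


def discarded_functions(linker_output):
    """Return (file, function) pairs for discarded .text.<name> sections."""
    result = []
    for match in GC_LINE.finditer(linker_output):
        section = match.group("section")
        if section.startswith(".text."):
            result.append((match.group("file"), section[len(".text."):]))
    return result


# Made-up sample of captured linker output.
sample = """\
/usr/bin/ld: removing unused section '.text.orphan' in file 'b.o'
/usr/bin/ld: removing unused section '.data.table' in file 'b.o'
"""
dead = discarded_functions(sample)
```

A function is only dead across the whole code base if it is discarded
in every link, so the per-link results would still need intersecting
across all products and tool chains.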

Cheers
  Nick
