public inbox for gcc@gcc.gnu.org
* New feature: -fdump-gimple-nodes (once more, with feeling)
@ 2024-02-13 19:46 Robert Dubner
  2024-02-14  7:40 ` Andi Kleen
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Robert Dubner @ 2024-02-13 19:46 UTC (permalink / raw)
  To: 'GCC Mailing List'

I have not contributed to GCC before, so I am not totally sure how to go
about it.

So, I am letting you know what I want to do, so that I can get advice on a
good way to do it.  I have read https://gcc.gnu.org/contribute.html, and I
have reviewed the GNU Coding Standards and the GCC additional coding
standards, so I have some idea of what's needed.  But there is a gulf
between theory and practice, and I am hoping for guidance.
 
Jim Lowden and I have been developing a COBOL front end for GCC.  He's
primarily been parsing the language.  It's been my task to generate the
GENERIC/GIMPLE trees for the parsed code.  We've been working at this for
a couple of years.  We have reached the point where we want to start
submitting patches for the community to evaluate.
 
I figured I would start small, where "small" means mainly one new source
code file of 1,580 lines.

When I first started trying to generate GIMPLE trees to implement
functions, it became clear to me that I needed to be able to
reverse-engineer known good trees generated by the C front end.  Oh, I
could see what other front ends were doing in their source code.  But I
didn't know what the goal was.  I wanted to see not just individual nodes,
but how they all related to each other.

There didn't seem to be any such functionality in GCC.  I found a routine
in print-tree.cc which printed out a single node, but I needed to
understand the entire tree of nodes for a function.  And I very quickly
got tired -- very tired -- of trying to figure out the relationships
between nodes, and I wanted more information than the print-tree routines
were providing.

So, I created gcc/dump-gimple-nodes.cc, which implements the
dump_gimple_nodes() function.  It is controlled by the new
-fdump-gimple-nodes command-line option, which hooks into the top of the
gimplify_function_tree() function in gcc/gimplify.cc.

The dump_gimple_nodes() function does a depth-first walk of the specified
function_decl, outputting each node once in a readable format.  Each node
gets an arbitrary identifying number.  There are two output files; the
first, "func_name.nodes", is pure text.  After I got tired of endlessly
searching through the text file for the next node of interest, I created
the "func_name.nodes.html" file, which is the same information with
internal hyperlinks between the nodes.
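The walk described above can be sketched in a few lines.  This is a minimal, hypothetical illustration, not the code in dump-gimple-nodes.cc: it assumes a made-up toy_node struct in place of GCC's tree (no real tree accessors are used), does a preorder depth-first walk that numbers each node the first time it is reached, skips nodes it has already numbered (so shared nodes and cycles are emitted once), and then prints one record per node with operands shown by number.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define MAX_NODES 64
#define MAX_KIDS 4

/* Hypothetical stand-in for a GCC tree node: only the tree-code name and
   up to MAX_KIDS operand pointers.  Real trees carry far more fields. */
struct toy_node {
  const char *code;                /* e.g. "function_decl" */
  struct toy_node *kids[MAX_KIDS]; /* operands; unused slots are NULL */
  int number;                      /* -1 until the walk assigns one */
};

/* Append printf-style text to buf, truncating safely at cap bytes. */
static void appendf(char *buf, size_t cap, const char *fmt, ...) {
  size_t len = strlen(buf);
  va_list ap;
  assert(cap > 0 && len < cap);
  va_start(ap, fmt);
  vsnprintf(buf + len, cap - len, fmt, ap);
  va_end(ap);
}

/* Preorder depth-first walk: number each node the first time it is
   reached and record it in order[].  Already-numbered nodes are skipped,
   so every node is numbered exactly once even with sharing or cycles. */
static void number_nodes(struct toy_node *n, struct toy_node **order,
                         int *count) {
  if (n == NULL || n->number >= 0 || *count >= MAX_NODES)
    return;
  n->number = (*count)++;
  order[n->number] = n;
  for (int i = 0; i < MAX_KIDS; i++)
    number_nodes(n->kids[i], order, count);
}

/* Number everything reachable from root, then print one record per node,
   naming each operand by its number.  Nodes beyond MAX_NODES stay
   unnumbered and would show as -1.  Returns the number of nodes. */
static int dump_nodes(struct toy_node *root, char *buf, size_t cap) {
  struct toy_node *order[MAX_NODES];
  int count = 0;
  assert(cap > 0);
  buf[0] = '\0';
  number_nodes(root, order, &count);
  for (int i = 0; i < count; i++) {
    const struct toy_node *n = order[i];
    appendf(buf, cap, "*** This is NodeNumber%d\n", n->number);
    appendf(buf, cap, "tree_code: %s\n", n->code);
    for (int k = 0; k < MAX_KIDS; k++)
      if (n->kids[k] != NULL)
        appendf(buf, cap, "op%d: NodeNumber%d %s\n", k,
                n->kids[k]->number, n->kids[k]->code);
  }
  return count;
}
```

Numbering everything before printing is a deliberate choice here: a record can then refer to an operand's number even when that operand is printed later.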

Here are the first two nodes of a typical simple function:

***********************************This is NodeNumber0
(0x7f12e13b0d00) NodeNumber0
tree_code: function_decl
tree_code_class: tcc_declaration
base_flags: static public
type: NodeNumber1 function_type
name: NodeNumber6410 identifier_node "main"
context: NodeNumber107 translation_unit_decl "bigger.c"
source_location: bigger.c:7:5
uid: 3663
initial(bindings): NodeNumber6411 block
machine_mode: QI(15)
align: 8
warn_if_not_align: 0
pt_uid: 3663
raw_assembler_name: NodeNumber6410 identifier_node "main"
visibility: default
result: NodeNumber6412 result_decl
function(pointer): 0x7f12e135d508
arguments: NodeNumber6413 parm_decl "argc"
saved_tree(function_body): NodeNumber6417 statement_list
function_code: 0
function_flags: public no_instrument_function_entry_exit
***********************************This is NodeNumber1
(0x7f12e13b3d20) NodeNumber1
tree_code: function_type
tree_code_class: tcc_type
machine_mode: QI(15)
type: NodeNumber2 integer_type
address_space:0
size(in bits): NodeNumber55 uint128 8
size_unit(in bytes): NodeNumber12 uint64 1
uid: 1515
precision: 0
contains_placeholder: 0
align: 8
warn_if_not_align: 0
alias_set_type: -1
canonical: NodeNumber1 function_type
main_variant: NodeNumber1 function_type
values: NodeNumber6408 tree_list
***********************************

Note how even when an attribute points to another node, e.g.,

arguments: NodeNumber6413 parm_decl "argc"

the output routine goes down another level or two in an attempt to make it
more meaningful.  The attribute points just to NodeNumber6413, but the
output shows that node to be a parm_decl, and there is additional code
that recognizes that a parm_decl has an identifier_node with the value
"argc".
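Looking "one level down" when printing a reference, plus the hyperlinked variant used for the .html file, might look roughly like the following sketch.  The toy_node struct and its name link are hypothetical, loosely modeled on the way DECL_NAME links a decl to its identifier_node in GCC; the sketch also assumes each node's own HTML record carries an anchor id="NodeNumberN" for the link to land on.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical toy node: "name" points at an identifier node, loosely
   the way DECL_NAME links a decl to its identifier_node in GCC. */
struct toy_node {
  int number;                   /* NodeNumber assigned by the walk */
  const char *code;             /* e.g. "parm_decl" */
  const struct toy_node *name;  /* identifier node, or NULL */
  const char *text;             /* spelling, for identifier nodes only */
};

/* Copy s into out, escaping the characters that are special in HTML
   text; truncates rather than overflowing out. */
static void html_escape(const char *s, char *out, size_t cap) {
  size_t len = 0;
  assert(cap > 0);
  for (; *s != '\0'; s++) {
    const char *rep = *s == '&' ? "&amp;" : *s == '<' ? "&lt;"
                    : *s == '>' ? "&gt;" : NULL;
    size_t n = rep != NULL ? strlen(rep) : 1;
    if (len + n >= cap)
      break;
    if (rep != NULL)
      memcpy(out + len, rep, n);
    else
      out[len] = *s;
    len += n;
  }
  out[len] = '\0';
}

/* Describe the node a field points at, looking one level further down:
   if the target has an identifier, append its spelling in quotes.
   With html != 0 the number becomes a link to "#NodeNumberN". */
static void describe_ref(const struct toy_node *t, int html,
                         char *buf, size_t cap) {
  const char *label = NULL;
  char quoted[160] = "";
  assert(cap > 0);
  if (t->name != NULL && t->name->text != NULL)
    label = t->name->text;          /* e.g. parm_decl -> "argc" */
  else if (t->text != NULL)
    label = t->text;                /* the target is itself an identifier */
  if (label != NULL) {
    char esc[128];
    if (html)
      html_escape(label, esc, sizeof esc);
    else
      snprintf(esc, sizeof esc, "%s", label);
    snprintf(quoted, sizeof quoted, " \"%s\"", esc);
  }
  if (html)
    snprintf(buf, cap, "<a href=\"#NodeNumber%d\">NodeNumber%d</a> %s%s",
             t->number, t->number, t->code, quoted);
  else
    snprintf(buf, cap, "NodeNumber%d %s%s", t->number, t->code, quoted);
}
```

For a parm_decl numbered 6413 whose identifier is "argc", the text form yields the same shape as the line quoted above: NodeNumber6413 parm_decl "argc".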

An example of a complete dump is available at
https://www.dubner.com/main.nodes.html.  The C source code that generated
it is available at the end of
https://cobolworx.com/pages/dump-gimple-nodes.html

I found this feature to be absolutely necessary when figuring out how
working front ends built valid GIMPLE trees for functions.  I am hopeful
other developers can see the utility.

Does this require any further discussion?  Or is my next step to start
developing the series of patches that will create the dump-gimple-nodes
source code, and that will modify Makefile.in, gimplify.cc, and common.opt
to incorporate it?

Thanks so much for any suggestions and guidance,

Bob Dubner


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: New feature: -fdump-gimple-nodes (once more, with feeling)
  2024-02-13 19:46 New feature: -fdump-gimple-nodes (once more, with feeling) Robert Dubner
@ 2024-02-14  7:40 ` Andi Kleen
  2024-02-14 14:10   ` David Malcolm
  2024-02-16 14:42   ` Florian Weimer
  2024-02-14 10:24 ` Richard Biener
  2024-02-14 16:31 ` Dimitar Dimitrov
  2 siblings, 2 replies; 7+ messages in thread
From: Andi Kleen @ 2024-02-14  7:40 UTC (permalink / raw)
  To: Robert Dubner; +Cc: 'GCC Mailing List'

Robert Dubner <rdubner@symas.com> writes:

> There didn't seem to be any such functionality in GCC.  I found a routine
> in print-tree.cc which printed out a single node, but I needed to
> understand the entire tree of nodes for a function.

FWIW the standard way to do this is to run the compiler in gdb with
the .gdbinit in the object directory, set a suitable break
point and then use pt etc to dump the trees. It prints all the fields
and you can use the gdb command line to explore further.

> ***********************************This is NodeNumber0
> (0x7f12e13b0d00) NodeNumber0
> tree_code: function_decl
> tree_code_class: tcc_declaration

My suggestion if you go this route would be to generate
some standard format like YAML or JSON that other tools
can easily parse.

-Andi
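To make the JSON suggestion concrete, here is a minimal sketch of what such output could look like.  It is hypothetical throughout: it uses a made-up, already-numbered toy_node record, and the field names "id", "tree_code", "name" and "refs" are invented for illustration.  A real patch would more likely build on GCC's own JSON routines (gcc/json.h) than print JSON by hand; the point here is the shape, a flat array of nodes whose edges are lists of ids.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical flattened node record, as produced after numbering:
   references to other nodes are stored as their NodeNumbers. */
struct toy_node {
  int number;
  const char *code;   /* tree code name */
  const char *name;   /* identifier spelling, or NULL */
  int refs[4];        /* NodeNumbers of referenced nodes */
  int nrefs;          /* how many entries of refs[] are used */
};

/* Append printf-style text to buf, truncating safely at cap bytes. */
static void appendf(char *buf, size_t cap, const char *fmt, ...) {
  size_t len = strlen(buf);
  va_list ap;
  assert(cap > 0 && len < cap);
  va_start(ap, fmt);
  vsnprintf(buf + len, cap - len, fmt, ap);
  va_end(ap);
}

/* Append s as a JSON string literal, escaping '"', '\\' and control
   characters (U+0000..U+001F) as RFC 8259 requires.  Other bytes pass
   through unchanged, so the input is assumed to be UTF-8. */
static void append_json_string(char *buf, size_t cap, const char *s) {
  appendf(buf, cap, "\"");
  for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; p++) {
    if (*p == '"' || *p == '\\')
      appendf(buf, cap, "\\%c", *p);
    else if (*p < 0x20)
      appendf(buf, cap, "\\u%04x", (unsigned)*p);
    else
      appendf(buf, cap, "%c", *p);
  }
  appendf(buf, cap, "\"");
}

/* Emit the nodes as a JSON array of objects; each node's references
   become a "refs" array of ids, i.e. a flat adjacency list. */
static void nodes_to_json(const struct toy_node *nodes, int count,
                          char *buf, size_t cap) {
  assert(cap > 0);
  buf[0] = '\0';
  appendf(buf, cap, "[");
  for (int i = 0; i < count; i++) {
    const struct toy_node *n = &nodes[i];
    assert(n->nrefs >= 0 && n->nrefs <= 4);
    appendf(buf, cap, "%s{\"id\":%d,\"tree_code\":", i > 0 ? "," : "",
            n->number);
    append_json_string(buf, cap, n->code);
    if (n->name != NULL) {
      appendf(buf, cap, ",\"name\":");
      append_json_string(buf, cap, n->name);
    }
    appendf(buf, cap, ",\"refs\":[");
    for (int k = 0; k < n->nrefs; k++)
      appendf(buf, cap, "%s%d", k > 0 ? "," : "", n->refs[k]);
    appendf(buf, cap, "]}");
  }
  appendf(buf, cap, "]");
}
```

Keeping references as ids rather than nesting objects means shared nodes and cycles need no special handling, and any JSON-aware tool can rebuild the graph.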


* Re: New feature: -fdump-gimple-nodes (once more, with feeling)
  2024-02-13 19:46 New feature: -fdump-gimple-nodes (once more, with feeling) Robert Dubner
  2024-02-14  7:40 ` Andi Kleen
@ 2024-02-14 10:24 ` Richard Biener
  2024-02-14 16:31 ` Dimitar Dimitrov
  2 siblings, 0 replies; 7+ messages in thread
From: Richard Biener @ 2024-02-14 10:24 UTC (permalink / raw)
  To: Robert Dubner; +Cc: GCC Mailing List

On Tue, Feb 13, 2024 at 8:47 PM Robert Dubner <rdubner@symas.com> wrote:
> So, I created the gcc/dump-gimple-nodes.cc source code, which implements
> the dump_gimple_nodes() function, which is controlled by the new
> -fdump-gimple-nodes GCC command-line option.  That option hooks into the
> top of the gimplify_function_tree() function in gcc/gimplify.cc.

A first comment is that you seem to dump the GENERIC graph the frontend
feeds to the gimplifier.  This isn't GIMPLE just yet, so it possibly should
be named dump_generic_nodes ().

We dump a textual representation at a similar state with -fdump-tree-original.
There's a -raw modifier (-fdump-tree-original-raw) that, for C for example,
streams

;; Function main (null)
;; enabled by -tree-original

@1      statement_list   0   : @2       1   : @3
@2      bind_expr        type: @4       body: @5
@3      return_expr      type: @4       expr: @6
@4      void_type        name: @7       algn: 8
@5      statement_list
@6      modify_expr      type: @8       op 0: @9       op 1: @10
@7      type_decl        name: @11      type: @4
@8      integer_type     name: @12      size: @13      algn: 32
                         prec: 32       sign: signed   min : @14
                         max : @15
...

I didn't track down where the C frontend triggers this or what utility
it uses in the end.  It is also somewhat frontend specific, likely before
genericization.

I agree with Andi that these days something more structured might be
preferable (but your HTML example might be good for a human to parse and
click through).



* Re: New feature: -fdump-gimple-nodes (once more, with feeling)
  2024-02-14  7:40 ` Andi Kleen
@ 2024-02-14 14:10   ` David Malcolm
  2024-02-16 14:42   ` Florian Weimer
  1 sibling, 0 replies; 7+ messages in thread
From: David Malcolm @ 2024-02-14 14:10 UTC (permalink / raw)
  To: Andi Kleen, Robert Dubner; +Cc: 'GCC Mailing List'

On Tue, 2024-02-13 at 23:40 -0800, Andi Kleen via Gcc wrote:
> Robert Dubner <rdubner@symas.com> writes:
> 
> > There didn't seem to be any such functionality in GCC.  I found a routine
> > in print-tree.cc which printed out a single node, but I needed to
> > understand the entire tree of nodes for a function.
> 
> FWIW the standard way to do this is to run the compiler in gdb with
> the .gdbinit in the object directory, set a suitable break
> point and then use pt etc to dump the trees. It prints all the fields
> and you can use the gdb command line to explore further.
> 
> > ***********************************This is NodeNumber0
> > (0x7f12e13b0d00) NodeNumber0
> > tree_code: function_decl
> > tree_code_class: tcc_declaration
> 
> My suggestion if you go this route would be to generate
> some standard format like YAML or JSON that other tools
> can easily parse.

I'd love it if we had a JSON output for our IR.  FWIW, as of
r14-6228-g3bd8241a1f1982 our JSON output routines can nicely format the
generated JSON in a way that I find very readable in debugging (and I'm
using this when debugging the analyzer).

Dave



* Re: New feature: -fdump-gimple-nodes (once more, with feeling)
  2024-02-13 19:46 New feature: -fdump-gimple-nodes (once more, with feeling) Robert Dubner
  2024-02-14  7:40 ` Andi Kleen
  2024-02-14 10:24 ` Richard Biener
@ 2024-02-14 16:31 ` Dimitar Dimitrov
  2024-02-14 21:41   ` Robert Dubner
  2 siblings, 1 reply; 7+ messages in thread
From: Dimitar Dimitrov @ 2024-02-14 16:31 UTC (permalink / raw)
  To: Robert Dubner; +Cc: 'GCC Mailing List'

On Tue, Feb 13, 2024 at 01:46:11PM -0600, Robert Dubner wrote:
...
> An example of a complete dump is available at
> https://www.dubner.com/main.nodes.html.  The C source code that generated
> it is available at the end of
> https://cobolworx.com/pages/dump-gimple-nodes.html
> 

Hyperlinked text is useful.  But I would love a graphical visualization
even more, e.g. via either Graphviz or PlantUML.

Regards,
Dimitar
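For the Graphviz route, emitting DOT from already-numbered nodes is only a few lines.  A minimal sketch, again assuming a hypothetical toy_node record with references stored as node numbers; the graph name "generic" and the box shape are arbitrary choices for illustration.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical numbered node: references are stored as NodeNumbers. */
struct toy_node {
  int number;
  const char *code;   /* tree code name, e.g. "function_decl" */
  int refs[4];        /* NodeNumbers of referenced nodes */
  int nrefs;          /* how many entries of refs[] are used */
};

/* Append printf-style text to buf, truncating safely at cap bytes. */
static void appendf(char *buf, size_t cap, const char *fmt, ...) {
  size_t len = strlen(buf);
  va_list ap;
  assert(cap > 0 && len < cap);
  va_start(ap, fmt);
  vsnprintf(buf + len, cap - len, fmt, ap);
  va_end(ap);
}

/* Emit a Graphviz digraph: one box per node labelled "N: tree_code" and
   one edge per reference.  Labels hold only tree-code names, which need
   no escaping; adding identifier text would require escaping '"' and '\'. */
static void nodes_to_dot(const struct toy_node *nodes, int count,
                         char *buf, size_t cap) {
  assert(cap > 0);
  buf[0] = '\0';
  appendf(buf, cap, "digraph generic {\n  node [shape=box];\n");
  for (int i = 0; i < count; i++)
    appendf(buf, cap, "  n%d [label=\"%d: %s\"];\n", nodes[i].number,
            nodes[i].number, nodes[i].code);
  for (int i = 0; i < count; i++)
    for (int k = 0; k < nodes[i].nrefs; k++)
      appendf(buf, cap, "  n%d -> n%d;\n", nodes[i].number,
              nodes[i].refs[k]);
  appendf(buf, cap, "}\n");
}
```

Written to a file, the result can be rendered with the standard Graphviz tool, e.g. dot -Tsvg generic.dot -o generic.svg.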


* RE: New feature: -fdump-gimple-nodes (once more, with feeling)
  2024-02-14 16:31 ` Dimitar Dimitrov
@ 2024-02-14 21:41   ` Robert Dubner
  0 siblings, 0 replies; 7+ messages in thread
From: Robert Dubner @ 2024-02-14 21:41 UTC (permalink / raw)
  To: Dimitar Dimitrov; +Cc: GCC Mailing List

I have thought about a graphical representation more than once.  Heck, the
connections between nodes are among the things I needed to know in the
first place.  And the necessary information is certainly all there in the
output I generate; I have drawn pieces of the tree connections by hand
many times.

But it doesn't seem to me to scale.

A candidate for the smallest possible executable program one can write
in C and compile with GCC is

      void main(){}  /* I didn't say it would do anything *useful* */

The GENERIC tree for that program has in excess of fifty nodes.

      #include <stdio.h>
      void main(){printf("hello, world\n");}

has in excess of 4,800 nodes because stdio.h was brought in.  Without a
plotter that draws on the sides of buildings or on football pitches (you'd
sit in the stands with binoculars to read the results), it's difficult for
me to envision how the graphical representation could be useful.  I don't
claim my imagination should be the limiting factor.  On the other hand, I
don't think the compiler should be generating that directly, anyway.

(I've managed to distract myself.  Now I want to build a wheeled robot
that wanders around a football pitch drawing with colored chalk dust.)

My current takeaway from these responses (thank you so much,
incidentally!) is that whatever utility I have created here would be
enhanced by JSON (one and a half "votes", so far) or YAML (half a vote)
output.

Once the tree is available in JSON, separate utilities that take that
output and display it graphically should be straightforward to write.

Okay then.  I'll change the naming from "*gimple*" to "*generic*" as more
accurate, and I'll generate JSON in addition to the other two files.

Thanks again.


* Re: New feature: -fdump-gimple-nodes (once more, with feeling)
  2024-02-14  7:40 ` Andi Kleen
  2024-02-14 14:10   ` David Malcolm
@ 2024-02-16 14:42   ` Florian Weimer
  1 sibling, 0 replies; 7+ messages in thread
From: Florian Weimer @ 2024-02-16 14:42 UTC (permalink / raw)
  To: Andi Kleen via Gcc; +Cc: Robert Dubner, Andi Kleen

* Andi Kleen via Gcc:

>> ***********************************This is NodeNumber0
>> (0x7f12e13b0d00) NodeNumber0
>> tree_code: function_decl
>> tree_code_class: tcc_declaration
>
> My suggestion if you go this route would be to generate
> some standard format like YAML or JSON that other tools
> can easily parse.

And C++ code that emits equivalent GENERIC or GIMPLE, perhaps?

Thanks,
Florian



end of thread, other threads:[~2024-02-16 14:42 UTC | newest]

Thread overview: 7+ messages
2024-02-13 19:46 New feature: -fdump-gimple-nodes (once more, with feeling) Robert Dubner
2024-02-14  7:40 ` Andi Kleen
2024-02-14 14:10   ` David Malcolm
2024-02-16 14:42   ` Florian Weimer
2024-02-14 10:24 ` Richard Biener
2024-02-14 16:31 ` Dimitar Dimitrov
2024-02-14 21:41   ` Robert Dubner
