* [google] Patch to add compiler flag to dump callgraph edge profiles in special .note sections (issue4591045)
From: Sriraman Tallam @ 2011-06-08 2:27 UTC
To: reply, gcc-patches
Patch Description:
=================
I am working on a project to do global function layout in the linker, where the linker reads the callgraph edge profile information generated by FDO and uses it to find an ordering of functions that places functions that frequently call each other closer together, like the Pettis-Hansen code ordering algorithm described in the paper "Profile Guided Code Positioning" (PLDI 1990).
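As a rough illustration of the kind of ordering a linker plugin could compute from these edge counts, the C sketch below implements a simplified, Pettis-Hansen-style greedy chain merge. It is not the gold plugin and not the published algorithm: functions are assumed to be numbered 0..n-1, `struct cg_edge`, `order_functions` and `MAX_FUNCS` are hypothetical names, and the orientation heuristics of the original paper are omitted (the callee's chain is always appended after the caller's chain).

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_FUNCS 64

/* One weighted call graph edge; functions are numbered 0..nfuncs-1.  */
struct cg_edge
{
  int caller, callee;
  long long count;
};

/* qsort comparator: heaviest edges first.  */
static int
cmp_edge_desc (const void *a, const void *b)
{
  const struct cg_edge *x = a, *y = b;
  if (x->count < y->count)
    return 1;
  if (x->count > y->count)
    return -1;
  return 0;
}

/* Simplified Pettis-Hansen-style layout: visit edges from heaviest to
   lightest and concatenate the chains holding caller and callee, so hot
   caller/callee pairs end up next to each other.  EDGES is sorted in
   place; the order of all NFUNCS functions is written to OUT.  */
static void
order_functions (int nfuncs, struct cg_edge *edges, size_t nedges, int *out)
{
  int chain_of[MAX_FUNCS]; /* Chain id = first function of the chain.  */
  int next_in[MAX_FUNCS];  /* Next function in the same chain, or -1.  */
  int last_of[MAX_FUNCS];  /* Last function of the chain with this id.  */
  int n = 0;

  assert (nfuncs >= 0 && nfuncs <= MAX_FUNCS);
  for (int f = 0; f < nfuncs; f++)
    {
      chain_of[f] = f;
      next_in[f] = -1;
      last_of[f] = f;
    }

  qsort (edges, nedges, sizeof *edges, cmp_edge_desc);
  for (size_t i = 0; i < nedges; i++)
    {
      int a = chain_of[edges[i].caller];
      int b = chain_of[edges[i].callee];
      if (a == b)
        continue; /* Already in the same chain.  */
      next_in[last_of[a]] = b; /* Append chain B after chain A.  */
      last_of[a] = last_of[b];
      for (int f = b; f != -1; f = next_in[f])
        chain_of[f] = a; /* Relabel the members of the old chain B.  */
    }

  /* Emit each chain, in order of its first function's number.  */
  for (int f = 0; f < nfuncs; f++)
    if (chain_of[f] == f)
      for (int g = f; g != -1; g = next_in[g])
        out[n++] = g;
}
```

For example, numbering zap, bar, foo and main as 0 to 3, with edges foo->zap (50), foo->bar (100) and main->foo (1), the sketch yields the order main, foo, bar, zap.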
This patch adds a flag that allows the callgraph edge profile information to be stored in .note sections called ".note.callgraph.text". The new compiler flag -fcallgraph-profiles-sections generates these sections and must be used along with -fprofile-use. I have added a PARAM to output only the callgraph edges whose counts are greater than a specified threshold. Once this is available, the linker can read these sections and build a global callgraph that can be used to determine a global function ordering.
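Building the global callgraph mostly amounts to interning function names and accumulating counts. Because dump_cgraph_profiles emits one callee/count pair per call graph edge, a callee invoked from several call sites in the same caller can appear more than once in a section, so a reader would likely want to merge duplicate pairs. A minimal sketch, assuming fixed-size tables and hypothetical names (`struct global_cgraph`, `intern_function`, `add_edge`); it simply sums every duplicate it sees, and the name strings are stored by pointer, so they must outlive the graph.

```c
#include <assert.h>
#include <string.h>

#define MAX_NODES 128
#define MAX_EDGES 512

/* Hypothetical whole-program call graph built from many note sections.
   Functions are identified by their index into NAMES.  */
struct global_cgraph
{
  const char *names[MAX_NODES]; /* Borrowed pointers, not copies.  */
  int nnodes;
  struct
  {
    int caller, callee;
    long long count;
  } edges[MAX_EDGES];
  int nedges;
};

/* Returns the index of NAME, adding it to G if it has not been seen.  */
static int
intern_function (struct global_cgraph *g, const char *name)
{
  for (int i = 0; i < g->nnodes; i++)
    if (strcmp (g->names[i], name) == 0)
      return i;
  assert (g->nnodes < MAX_NODES);
  g->names[g->nnodes] = name;
  return g->nnodes++;
}

/* Records CALLER -> CALLEE with COUNT, summing into an existing pair.  */
static void
add_edge (struct global_cgraph *g, const char *caller, const char *callee,
          long long count)
{
  int a = intern_function (g, caller);
  int b = intern_function (g, callee);

  for (int i = 0; i < g->nedges; i++)
    if (g->edges[i].caller == a && g->edges[i].callee == b)
      {
        g->edges[i].count += count;
        return;
      }
  assert (g->nedges < MAX_EDGES);
  g->edges[g->nedges].caller = a;
  g->edges[g->nedges].callee = b;
  g->edges[g->nedges].count = count;
  g->nedges++;
}
```

A linear search keeps the sketch short; a real plugin handling large programs would presumably use hash tables instead.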
I am adding plugin support in the gold linker to allow linker plugins to read the contents of sections, and also adding plugin hooks to specify a desired ordering of functions to the linker. The linker patch is available here: http://sourceware.org/ml/binutils/2011-03/msg00043.html. Once this is available, linker plugins can be used to determine the function layout of the final binary, for example with the Pettis-Hansen algorithm.
Example: the new .note.callgraph.text section looks like this for a function foo that calls bar 100 times and zap 50 times:
****************************
.section .note.callgraph.text._Z3foov,"",@progbits
.string "Function _Z3foov"
.string "_Z3barv"
.string "100"
.string "_Z3zapv"
.string "50"
***************************
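Each `.string` directive emits a NUL-terminated string, so the assembled section contents are a plain sequence of C strings: the `Function` header, then alternating callee names and decimal counts. A linker plugin holding the raw section bytes could decode them roughly as shown below. This is a sketch that assumes the bytes are exactly what the directives above produce (no padding); `parse_callgraph_note` and `struct parsed_edge` are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One callee/count pair decoded from a note section.  */
struct parsed_edge
{
  const char *callee; /* Points into the section buffer.  */
  long long count;
};

/* Decodes one ".note.callgraph.text.*" payload: "Function <caller>",
   then alternating callee name and decimal count strings.  Stores the
   caller name in *CALLER and up to MAX_EDGES pairs in OUT.  Returns the
   number of pairs, or -1 if the payload is malformed or has more than
   MAX_EDGES pairs.  */
static int
parse_callgraph_note (const char *buf, size_t size, const char **caller,
                      struct parsed_edge *out, int max_edges)
{
  static const char header[] = "Function ";
  const char *p = buf;
  const char *end = buf + size;
  int n = 0;

  assert (buf != NULL && caller != NULL);
  /* The last string must end inside the buffer, so strlen cannot overrun.  */
  if (size == 0 || buf[size - 1] != '\0')
    return -1;
  if (strncmp (p, header, sizeof header - 1) != 0)
    return -1;
  *caller = p + sizeof header - 1;
  p += strlen (p) + 1;

  while (p < end)
    {
      const char *callee = p;
      char *stop;
      long long count;

      p += strlen (p) + 1;
      if (p >= end || n >= max_edges)
        return -1; /* Callee without a count, or too many pairs.  */
      count = strtoll (p, &stop, 10);
      if (stop == p || *stop != '\0')
        return -1; /* Count is not a plain decimal number.  */
      p += strlen (p) + 1;
      out[n].callee = callee;
      out[n].count = count;
      n++;
    }
  return n;
}
```

Fed the bytes of the example section for _Z3foov, this returns 2 and reports the pairs (_Z3barv, 100) and (_Z3zapv, 50). The decoded strings point into the buffer, so no copying is needed while the section data stays mapped.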
For now, this is for google/main. I will re-submit for review to trunk along with data layout.
Google ref 41940
2011-06-07 Sriraman Tallam <tmsriram@google.com>
* doc/invoke.texi: document option -fcallgraph-profiles-sections.
* final.c (dump_cgraph_profiles): New function.
(rest_of_handle_final): Create new section '.note.callgraph.text'
with compiler flag -fcallgraph-profiles-sections.
* common.opt: New option -fcallgraph-profiles-sections.
* params.def (DEFPARAM): New param
PARAM_NOTE_CGRAPH_SECTION_EDGE_THRESHOLD.
Index: doc/invoke.texi
===================================================================
--- doc/invoke.texi (revision 174789)
+++ doc/invoke.texi (working copy)
@@ -351,7 +351,7 @@ Objective-C and Objective-C++ Dialects}.
-falign-labels[=@var{n}] -falign-loops[=@var{n}] -fassociative-math @gol
-fauto-inc-dec -fbranch-probabilities -fbranch-target-load-optimize @gol
-fbranch-target-load-optimize2 -fbtr-bb-exclusive -fcaller-saves @gol
--fcheck-data-deps -fclone-hot-version-paths @gol
+-fcallgraph-profiles-sections -fcheck-data-deps -fclone-hot-version-paths @gol
-fcombine-stack-adjustments -fconserve-stack @gol
-fcompare-elim -fcprop-registers -fcrossjumping @gol
-fcse-follow-jumps -fcse-skip-blocks -fcx-fortran-rules @gol
@@ -8114,6 +8114,15 @@ Do not promote static functions with always inline
@opindex fripa-verbose
Enable printing of verbose information about dynamic inter-procedural optimizations.
This is used in conjunction with the @option{-fripa}.
+
+@item -fcallgraph-profiles-sections
+@opindex fcallgraph-profiles-sections
+Emit call graph edge profile counts in @code{.note.callgraph.text} sections.
+This option must be used together with @option{-fprofile-use}.  A separate
+@code{.note.callgraph.text} section is created for each function; it lists
+every callee together with the number of times it is called from that
+function.  Use @option{--param note-cgraph-section-edge-threshold=@var{n}}
+to list only the edges whose count is greater than @var{n}.
@end table
The following options control compiler behavior regarding floating
Index: final.c
===================================================================
--- final.c (revision 174789)
+++ final.c (working copy)
@@ -4321,13 +4321,37 @@ debug_free_queue (void)
symbol_queue_size = 0;
}
}
-\f
+
+/* List the call graph profiled edges whose value is greater than
+ PARAM_NOTE_CGRAPH_SECTION_EDGE_THRESHOLD in the
+ ".note.callgraph.text" section. */
+static void
+dump_cgraph_profiles (void)
+{
+ struct cgraph_node *node = cgraph_node (current_function_decl);
+ struct cgraph_edge *e;
+ struct cgraph_node *callee;
+
+ for (e = node->callees; e != NULL; e = e->next_callee)
+ {
+ if (e->count <= PARAM_VALUE (PARAM_NOTE_CGRAPH_SECTION_EDGE_THRESHOLD))
+ continue;
+ callee = e->callee;
+ fprintf (asm_out_file, "\t.string \"%s\"\n",
+ IDENTIFIER_POINTER (decl_assembler_name (callee->decl)));
+ fprintf (asm_out_file, "\t.string \"" HOST_WIDEST_INT_PRINT_DEC "\"\n",
+ e->count);
+ }
+}
+
/* Turn the RTL into assembly. */
static unsigned int
rest_of_handle_final (void)
{
rtx x;
const char *fnname;
+ char *profile_fnname;
+ unsigned int flags;
/* Get the function's name, as described by its RTL. This may be
different from the DECL_NAME name used in the source file. */
@@ -4387,6 +4411,21 @@ rest_of_handle_final (void)
targetm.asm_out.destructor (XEXP (DECL_RTL (current_function_decl), 0),
decl_fini_priority_lookup
(current_function_decl));
+
+ /* With -fcallgraph-profiles-sections, add a ".note.callgraph.text" section
+ for storing profiling information. */
+ if (flag_callgraph_profiles_sections
+ && flag_profile_use
+ && cgraph_node (current_function_decl) != NULL)
+ {
+ flags = SECTION_DEBUG;
+ profile_fnname = concat (".note.callgraph.text.", fnname, NULL);
+ switch_to_section (get_section (profile_fnname, flags, NULL));
+ fprintf (asm_out_file, "\t.string \"Function %s\"\n", fnname);
+ dump_cgraph_profiles ();
+ free (profile_fnname);
+ }
+
return 0;
}
Index: common.opt
===================================================================
--- common.opt (revision 174789)
+++ common.opt (working copy)
@@ -907,6 +907,10 @@ fcaller-saves
Common Report Var(flag_caller_saves) Optimization
Save registers around function calls
+fcallgraph-profiles-sections
+Common Report Var(flag_callgraph_profiles_sections) Init(0)
+Generate .note.callgraph.text sections listing callees and edge counts.
+
fcheck-data-deps
Common Report Var(flag_check_data_deps)
Compare the results of several data dependence analyzers.
Index: params.def
===================================================================
--- params.def (revision 174789)
+++ params.def (working copy)
@@ -1002,6 +1002,15 @@ DEFPARAM (PARAM_MVERSN_CLONE_CGRAPH_DEPTH,
"maximum length of the call graph path to be cloned "
"while doing multiversioning",
2, 0, 5)
+
+/* Only output those call graph edges in .note.callgraph.text sections
+ whose count is greater than this value. */
+DEFPARAM (PARAM_NOTE_CGRAPH_SECTION_EDGE_THRESHOLD,
+ "note-cgraph-section-edge-threshold",
+ "call graph edges with counts at or below this value are "
+ "omitted from .note.callgraph.text sections",
+ 0, 0, 0)
+
/*
Local variables:
mode:c
--
This patch is available for review at http://codereview.appspot.com/4591045
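To make the emitted text and the threshold semantics concrete, here is a standalone C model of what rest_of_handle_final and dump_cgraph_profiles write for one function. It is an approximation, not GCC code: `struct note_edge` and `emit_callgraph_note` are hypothetical, and the `.section` line is hard-coded in the ELF form shown in the example rather than produced through get_section and switch_to_section. An edge is skipped when its count is less than or equal to the threshold, so the default of 0 drops only edges that were never executed.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for one cgraph edge: the callee's assembler
   name and the edge's profile count.  */
struct note_edge
{
  const char *callee;
  long long count;
};

/* Writes the note section for function FNNAME to OUT: the section
   switch, the "Function FNNAME" header, then one callee/count pair per
   edge whose count is strictly greater than THRESHOLD.  */
static void
emit_callgraph_note (FILE *out, const char *fnname,
                     const struct note_edge *edges, size_t nedges,
                     long long threshold)
{
  /* Names go unescaped inside .string, so they must not contain '"'.  */
  assert (strchr (fnname, '"') == NULL);

  /* ELF form as in the example; GCC derives this from the section flags.  */
  fprintf (out, "\t.section\t.note.callgraph.text.%s,\"\",@progbits\n",
           fnname);
  fprintf (out, "\t.string \"Function %s\"\n", fnname);

  for (size_t i = 0; i < nedges; i++)
    {
      if (edges[i].count <= threshold)
        continue; /* Same test as dump_cgraph_profiles.  */
      assert (strchr (edges[i].callee, '"') == NULL);
      fprintf (out, "\t.string \"%s\"\n", edges[i].callee);
      fprintf (out, "\t.string \"%lld\"\n", edges[i].count);
    }
}
```

For instance, given edges to `_Z3barv` (100), `_Z3zapv` (50) and `_Z4coldv` (0) with a threshold of 50, only the `_Z3barv` pair is written, since 50 is not strictly greater than 50.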
* Re: [google] Patch to add compiler flag to dump callgraph edge profiles in special .note sections (issue4591045)
From: Sriraman Tallam @ 2011-06-08 16:30 UTC
To: reply, GCC Patches, Xinliang David Li
+davidxl
* Re: [google] Patch to add compiler flag to dump callgraph edge profiles in special .note sections (issue4591045)
From: Xinliang David Li @ 2011-06-08 16:37 UTC
To: Sriraman Tallam; +Cc: reply, GCC Patches
ok for google/main.
David
On Wed, Jun 8, 2011 at 9:13 AM, Sriraman Tallam <tmsriram@google.com> wrote:
> +davidxl
* Re: [google] Patch to add compiler flag to dump callgraph edge profiles in special .note sections (issue4591045)
From: Sriraman Tallam @ 2011-06-08 16:41 UTC
To: Xinliang David Li; +Cc: reply, GCC Patches
On Wed, Jun 8, 2011 at 9:16 AM, Xinliang David Li <davidxl@google.com> wrote:
> ok for google/main.
Thanks, the patch is now committed.