public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [RFC PATCH] Add inlining growth bias flag
@ 2019-09-06 10:59 Graham Markall
  2019-09-09 18:55 ` Jeff Law
  2019-09-12 10:08 ` Richard Biener
  0 siblings, 2 replies; 7+ messages in thread
From: Graham Markall @ 2019-09-06 10:59 UTC (permalink / raw)
  To: gcc-patches; +Cc: ofer.shinaar, nidal.faour, Graham Markall

This patch is an RFC to invite comments as to whether the approach
to solving the problem is a suitable one for upstream GCC, or whether
there are alternative approaches that would be recommended.

Motivation
----------

We have observed that in some cases the estimation of callee growth for
inlining particular functions can be tuned for better overall code size
with particular programs on particular targets. Although modification of
the heuristics to make a general improvement is a difficult problem to
tackle, simply biasing the growth by a fixed amount can lead to
improvements in code size within the context of a particular program.

This has first been tested on a proprietary program, where setting the
growth bias to -2 resulted in a saving of 1.35% in the text section size
(62396 bytes as opposed to 63252 bytes). Using the Embench suite (
https://www.embench.org/ ) for a riscv32 bare-metal target with -Os also
shows that adjusting the inline growth estimate carefully can also
reduce code size. Results presented here are percentages relative to the
Embench reference platform (which is the standard way of presenting
Embench results) for values of the bias from -2 to 2:

Benchmark	-2	-1	0	1	2
aha-mont64	 99.05	 99.05	 99.05	 99.05	 99.05
crc32		100.00	100.00	100.00	100.00	100.00
cubic		 94.66	 94.66	 94.66	 94.66	 94.66
edn		100.00	100.00	100.00	100.00	100.00
huffbench	 99.88	 99.88	 99.88	 99.88	 99.88
matmult-int	100.00	100.00	100.00	102.86	102.86
minver		 97.96	 97.96	 97.96	106.88	106.88
nbody		 98.87	 98.87	 98.87	 98.87	 98.87
nettle-aes	 99.31	 99.10	 99.10	 99.10	 99.10
nettle-sha256	 99.89	 99.89	 99.89	 99.89	 99.89
nsichneu	100.00	100.00	100.00	100.00	100.00
picojpeg	102.56	102.54	 99.73	 99.38	 99.28
qrduino		 99.70	 99.70	 99.90	 99.90	 99.90
sglib-combined	 95.52	100.00	100.00	100.43	101.12
slre		100.66	100.66	 99.09	 98.85	 98.85
st		 96.36	 96.36	 96.36	 96.82	 96.82
statemate	100.00	100.00	100.00	100.00	100.00
ud		100.00	100.00	100.00	100.00	100.00
wikisort	 99.48	 99.48	 99.48	 99.48	 99.48
Mean		 99.14	 99.36	 99.15	 99.77	 99.80

In most cases, leaving the bias at 0 (unmodified) produces the smallest
code. However, there are some cases where an alternative value prevails,
The "Best Diff" column shows the reduction in size compared to leaving
the bias at 0, for cases where changing it yielded an improvement:

Benchmark	Best	Worst   Best diff
aha-mont64	0	0
crc32		0	0
cubic		0	0
edn		0	0
huffbench	0	0
matmult-int	0	1 / 2
minver		0	1 / 2
nbody		0	0
nettle-aes	0	-2
nettle-sha256	0	0
nsichneu	0	0
picojpeg	2	-2      -0.45%
qrduino		-1 /-2	0       -0.20%
sglib-combined	-2	1       -4.48%
slre		1 / 2	-1 / -2 -0.24%
st		0	1 / 2
statemate	0	0
ud		0	0
wikisort	0	0


In summary, for this test setup:

- In most cases, leaving the growth bias at 0 is optimal.
- Biasing the growth estimate positively may either increase or decrease
  code size.
- Biasing the estimate negatively may also either increase or decrease
  code size.
- In a small number of cases, biasing the growth estimate improves code
  size (up to 4.48% smaller for sglib-combined).


Patch
-----

The growth bias can be set with a flag:

  -finline-growth-bias=<number>

which controls the bias (an increase or decrease) of the inline growth
in ipa-inline.c. In cases where the bias would result in a negative
function size, we clip the growth estimate so that adding the growth to
the size of the function results in a size of 0, by setting the growth
to the negative of the function size.

There is not a great deal of validation of the argument - if a
non-integer is passed as the parameter (e.g. -finline-growth-bias=xyz),
it will be as if the parameter were 0. More validation could be added if
necessary. It seemed to me that GCC's infrastructure doesn't seem to
anticipate option values / parameters that could contain negative
values. For parameters, -1 seemed to represent an error and could result
in an ICE (-2 etc. pass through OK though). For options, I looked at the
UInteger and Host_Wide_Int numeric types, but they both expect a
positive integer. I did consider extending this with an Integer type
that could accept positive and negative integers, (e.g. starting with
augmenting switch_bit_fields in opt-functions.awk) but rooting out
assumptions of non-negative integer values in places further down seemed
like an onerous task.


Further discussion
------------------

Given that a default setting of 0 (equivalent to current behaviour)
should provide for some potential improvement in code size in individual
situations where it is critical (albeit using a rather coarse instrument
to do so), without affecting the general case - is the addition of such
a flag to GCC likely to be considered an acceptable inclusion for this
purpose?


gcc/ChangeLog:

        * common.opt: Add -finline-growth-bias= flag.
        * ipa-inline.h (estimate_edge_growth): Adjust inline
        growth by bias factor, if supplied.
---
 gcc/common.opt   |  4 ++++
 gcc/ipa-inline.h | 39 ++++++++++++++++++++++++++++++++++++++-
 2 files changed, 42 insertions(+), 1 deletion(-)

diff --git a/gcc/common.opt b/gcc/common.opt
index c1605385712..4725df2c477 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1670,6 +1670,10 @@ finline-limit=
 Common RejectNegative Joined UInteger
 -finline-limit=<number>	Limit the size of inlined functions to <number>.
 
+finline-growth-bias=
+Common RejectNegative Joined Var(flag_inline_growth_bias) Init(0)
+-finline-growth-bias=<number>	Adjust the growth estimate for inlined functions by <number>.
+
 finline-atomics
 Common Report Var(flag_inline_atomics) Init(1) Optimization
 Inline __atomic operations when a lock free instruction sequence is available.
diff --git a/gcc/ipa-inline.h b/gcc/ipa-inline.h
index 18c8e1eebd0..65e105b00a2 100644
--- a/gcc/ipa-inline.h
+++ b/gcc/ipa-inline.h
@@ -84,7 +84,44 @@ estimate_edge_growth (struct cgraph_edge *edge)
 {
   ipa_call_summary *s = ipa_call_summaries->get (edge);
   gcc_checking_assert (s->call_stmt_size || !edge->callee->analyzed);
-  return (estimate_edge_size (edge) - s->call_stmt_size);
+
+  if (dump_file)
+    {
+      const char *caller_name
+	= edge->caller->ultimate_alias_target ()->name ();
+      const char *callee_name
+	= edge->callee->ultimate_alias_target ()->name ();
+
+      fprintf (dump_file, "Estimate edge growth for %s -> %s\n",
+	       caller_name, callee_name);
+      fprintf (dump_file, "\t estimate_edge_size = %d\n",
+	       estimate_edge_size (edge));
+      fprintf (dump_file, "\t s (%p) ->call_stmt_size = %d\n",
+	       ((void *) s), s->call_stmt_size);
+    }
+
+  int growth = (estimate_edge_size (edge) - s->call_stmt_size);
+
+  int sz = 0;
+  if (flag_inline_growth_bias)
+    sz = strtol(flag_inline_growth_bias, NULL, 0);
+  if (sz != 0)
+    {
+      struct cgraph_node *caller = edge->caller;
+      ipa_fn_summary *fs = ipa_fn_summaries->get (caller);
+      if (dump_file)
+	fprintf (dump_file, "\t Adjusting growth by %d\n", sz);
+      growth += sz;
+      if (growth < 0 && fs->size + growth < 0) {
+	if (dump_file)
+          fprintf (dump_file, "\t Growth %d would shrink size beneath zero.\n", growth);
+	growth = -fs->size;
+        if (dump_file)
+	  fprintf (dump_file, "\t Setting growth to %d to make size zero.\n", growth);
+      }
+    }
+
+  return growth;
 }
 
 /* Return estimated callee runtime increase after inlining
-- 
2.21.0

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC PATCH] Add inlining growth bias flag
  2019-09-06 10:59 [RFC PATCH] Add inlining growth bias flag Graham Markall
@ 2019-09-09 18:55 ` Jeff Law
  2019-09-11 20:42   ` Graham Markall
  2019-09-12 10:08 ` Richard Biener
  1 sibling, 1 reply; 7+ messages in thread
From: Jeff Law @ 2019-09-09 18:55 UTC (permalink / raw)
  To: Graham Markall, gcc-patches; +Cc: ofer.shinaar, nidal.faour

On 9/6/19 4:58 AM, Graham Markall wrote:
> This patch is an RFC to invite comments as to whether the approach
> to solving the problem is a suitable one for upstream GCC, or whether
> there are alternative approaches that would be recommended.
> 
> Motivation
> ----------
> 
> We have observed that in some cases the estimation of callee growth for
> inlining particular functions can be tuned for better overall code size
> with particular programs on particular targets. Although modification of
> the heuristics to make a general improvement is a difficult problem to
> tackle, simply biasing the growth by a fixed amount can lead to
> improvements in code size within the context of a particular program.
> 
> This has first been tested on a proprietary program, where setting the
> growth bias to -2 resulted in a saving of 1.35% in the text section size
> (62396 bytes as opposed to 63252 bytes). Using the Embench suite (
> https://www.embench.org/ ) for a riscv32 bare-metal target with -Os also
> shows that adjusting the inline growth estimate carefully can also
> reduce code size. Results presented here are percentages relative to the
> Embench reference platform (which is the standard way of presenting
> Embench results) for values of the bias from -2 to 2:
> 
> Benchmark	-2	-1	0	1	2
> aha-mont64	 99.05	 99.05	 99.05	 99.05	 99.05
> crc32		100.00	100.00	100.00	100.00	100.00
> cubic		 94.66	 94.66	 94.66	 94.66	 94.66
> edn		100.00	100.00	100.00	100.00	100.00
> huffbench	 99.88	 99.88	 99.88	 99.88	 99.88
> matmult-int	100.00	100.00	100.00	102.86	102.86
> minver		 97.96	 97.96	 97.96	106.88	106.88
> nbody		 98.87	 98.87	 98.87	 98.87	 98.87
> nettle-aes	 99.31	 99.10	 99.10	 99.10	 99.10
> nettle-sha256	 99.89	 99.89	 99.89	 99.89	 99.89
> nsichneu	100.00	100.00	100.00	100.00	100.00
> picojpeg	102.56	102.54	 99.73	 99.38	 99.28
> qrduino		 99.70	 99.70	 99.90	 99.90	 99.90
> sglib-combined	 95.52	100.00	100.00	100.43	101.12
> slre		100.66	100.66	 99.09	 98.85	 98.85
> st		 96.36	 96.36	 96.36	 96.82	 96.82
> statemate	100.00	100.00	100.00	100.00	100.00
> ud		100.00	100.00	100.00	100.00	100.00
> wikisort	 99.48	 99.48	 99.48	 99.48	 99.48
> Mean		 99.14	 99.36	 99.15	 99.77	 99.80
> 
> In most cases, leaving the bias at 0 (unmodified) produces the smallest
> code. However, there are some cases where an alternative value prevails,
> The "Best Diff" column shows the reduction in size compared to leaving
> the bias at 0, for cases where changing it yielded an improvement:
> 
> Benchmark	Best	Worst   Best diff
> aha-mont64	0	0
> crc32		0	0
> cubic		0	0
> edn		0	0
> huffbench	0	0
> matmult-int	0	1 / 2
> minver		0	1 / 2
> nbody		0	0
> nettle-aes	0	-2
> nettle-sha256	0	0
> nsichneu	0	0
> picojpeg	2	-2      -0.45%
> qrduino		-1 /-2	0       -0.20%
> sglib-combined	-2	1       -4.48%
> slre		1 / 2	-1 / -2 -0.24%
> st		0	1 / 2
> statemate	0	0
> ud		0	0
> wikisort	0	0
> 
> 
> In summary, for this test setup:
> 
> - In most cases, leaving the growth bias at 0 is optimal.
> - Biasing the growth estimate positively may either increase or decrease
>   code size.
> - Biasing the estimate negatively may also either increase or decrease
>   code size.
> - In a small number of cases, biasing the growth estimate improves code
>   size (up to 4.48% smaller for sglib-combined).
> 
> 
> Patch
> -----
> 
> The growth bias can be set with a flag:
> 
>   -finline-growth-bias=<number>
> 
> which controls the bias (an increase or decrease) of the inline growth
> in ipa-inline.c. In cases where the bias would result in a negative
> function size, we clip the growth estimate so that adding the growth to
> the size of the function results in a size of 0, by setting the growth
> to the negative of the function size.
> 
> There is not a great deal of validation of the argument - if a
> non-integer is passed as the parameter (e.g. -finline-growth-bias=xyz),
> it will be as if the parameter were 0. More validation could be added if
> necessary. It seemed to me that GCC's infrastructure doesn't seem to
> anticipate option values / parameters that could contain negative
> values. For parameters, -1 seemed to represent an error and could result
> in an ICE (-2 etc. pass through OK though). For options, I looked at the
> UInteger and Host_Wide_Int numeric types, but they both expect a
> positive integer. I did consider extending this with an Integer type
> that could accept positive and negative integers, (e.g. starting with
> augmenting switch_bit_fields in opt-functions.awk) but rooting out
> assumptions of non-negative integer values in places further down seemed
> like an onerous task.
> 
> 
> Further discussion
> ------------------
> 
> Given that a default setting of 0 (equivalent to current behaviour)
> should provide for some potential improvement in code size in individual
> situations where it is critical (albeit using a rather coarse instrument
> to do so), without affecting the general case - is the addition of such
> a flag to GCC likely to be considered an acceptable inclusion for this
> purpose?
> 
> 
> gcc/ChangeLog:
> 
>         * common.opt: Add -finline-growth-bias= flag.
>         * ipa-inline.h (estimate_edge_growth): Adjust inline
>         growth by bias factor, if supplied.
I'm not sure if we really want to expose this as a first class option.
Perhaps a PARAM would be better.  Jan would have the final call though
since he's done the most work on the IPA bits.

Jeff

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC PATCH] Add inlining growth bias flag
  2019-09-09 18:55 ` Jeff Law
@ 2019-09-11 20:42   ` Graham Markall
  0 siblings, 0 replies; 7+ messages in thread
From: Graham Markall @ 2019-09-11 20:42 UTC (permalink / raw)
  To: Jeff Law, gcc-patches; +Cc: ofer.shinaar, nidal.faour


[-- Attachment #1.1: Type: text/plain, Size: 549 bytes --]

Hi Jeff,

On 09/09/2019 19:55, Jeff Law wrote:
> I'm not sure if we really want to expose this as a first class option.
> Perhaps a PARAM would be better.  Jan would have the final call though
> since he's done the most work on the IPA bits.

Many thanks for the suggestion - I could turn it into a pair of integer
params (one to increase the growth and one to decrease it) - would that
still be suitable? (I think there's no easy way to have a param with as
signed value, but am I missing something obvious?)

Best regards,
Graham.


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC PATCH] Add inlining growth bias flag
  2019-09-06 10:59 [RFC PATCH] Add inlining growth bias flag Graham Markall
  2019-09-09 18:55 ` Jeff Law
@ 2019-09-12 10:08 ` Richard Biener
  2019-11-25 12:50   ` [PATCH v2] Add inline growth bias param Graham Markall
  1 sibling, 1 reply; 7+ messages in thread
From: Richard Biener @ 2019-09-12 10:08 UTC (permalink / raw)
  To: Graham Markall; +Cc: GCC Patches, ofer.shinaar, nidal.faour

On Fri, Sep 6, 2019 at 12:59 PM Graham Markall
<graham.markall@embecosm.com> wrote:
>
> This patch is an RFC to invite comments as to whether the approach
> to solving the problem is a suitable one for upstream GCC, or whether
> there are alternative approaches that would be recommended.
>
> Motivation
> ----------
>
> We have observed that in some cases the estimation of callee growth for
> inlining particular functions can be tuned for better overall code size
> with particular programs on particular targets. Although modification of
> the heuristics to make a general improvement is a difficult problem to
> tackle, simply biasing the growth by a fixed amount can lead to
> improvements in code size within the context of a particular program.
>
> This has first been tested on a proprietary program, where setting the
> growth bias to -2 resulted in a saving of 1.35% in the text section size
> (62396 bytes as opposed to 63252 bytes). Using the Embench suite (
> https://www.embench.org/ ) for a riscv32 bare-metal target with -Os also
> shows that adjusting the inline growth estimate carefully can also
> reduce code size. Results presented here are percentages relative to the
> Embench reference platform (which is the standard way of presenting
> Embench results) for values of the bias from -2 to 2:
>
> Benchmark       -2      -1      0       1       2
> aha-mont64       99.05   99.05   99.05   99.05   99.05
> crc32           100.00  100.00  100.00  100.00  100.00
> cubic            94.66   94.66   94.66   94.66   94.66
> edn             100.00  100.00  100.00  100.00  100.00
> huffbench        99.88   99.88   99.88   99.88   99.88
> matmult-int     100.00  100.00  100.00  102.86  102.86
> minver           97.96   97.96   97.96  106.88  106.88
> nbody            98.87   98.87   98.87   98.87   98.87
> nettle-aes       99.31   99.10   99.10   99.10   99.10
> nettle-sha256    99.89   99.89   99.89   99.89   99.89
> nsichneu        100.00  100.00  100.00  100.00  100.00
> picojpeg        102.56  102.54   99.73   99.38   99.28
> qrduino          99.70   99.70   99.90   99.90   99.90
> sglib-combined   95.52  100.00  100.00  100.43  101.12
> slre            100.66  100.66   99.09   98.85   98.85
> st               96.36   96.36   96.36   96.82   96.82
> statemate       100.00  100.00  100.00  100.00  100.00
> ud              100.00  100.00  100.00  100.00  100.00
> wikisort         99.48   99.48   99.48   99.48   99.48
> Mean             99.14   99.36   99.15   99.77   99.80
>
> In most cases, leaving the bias at 0 (unmodified) produces the smallest
> code. However, there are some cases where an alternative value prevails,
> The "Best Diff" column shows the reduction in size compared to leaving
> the bias at 0, for cases where changing it yielded an improvement:
>
> Benchmark       Best    Worst   Best diff
> aha-mont64      0       0
> crc32           0       0
> cubic           0       0
> edn             0       0
> huffbench       0       0
> matmult-int     0       1 / 2
> minver          0       1 / 2
> nbody           0       0
> nettle-aes      0       -2
> nettle-sha256   0       0
> nsichneu        0       0
> picojpeg        2       -2      -0.45%
> qrduino         -1 /-2  0       -0.20%
> sglib-combined  -2      1       -4.48%
> slre            1 / 2   -1 / -2 -0.24%
> st              0       1 / 2
> statemate       0       0
> ud              0       0
> wikisort        0       0
>
>
> In summary, for this test setup:
>
> - In most cases, leaving the growth bias at 0 is optimal.
> - Biasing the growth estimate positively may either increase or decrease
>   code size.
> - Biasing the estimate negatively may also either increase or decrease
>   code size.
> - In a small number of cases, biasing the growth estimate improves code
>   size (up to 4.48% smaller for sglib-combined).
>
>
> Patch
> -----
>
> The growth bias can be set with a flag:
>
>   -finline-growth-bias=<number>
>
> which controls the bias (an increase or decrease) of the inline growth
> in ipa-inline.c. In cases where the bias would result in a negative
> function size, we clip the growth estimate so that adding the growth to
> the size of the function results in a size of 0, by setting the growth
> to the negative of the function size.
>
> There is not a great deal of validation of the argument - if a
> non-integer is passed as the parameter (e.g. -finline-growth-bias=xyz),
> it will be as if the parameter were 0. More validation could be added if
> necessary. It seemed to me that GCC's infrastructure doesn't seem to
> anticipate option values / parameters that could contain negative
> values. For parameters, -1 seemed to represent an error and could result
> in an ICE (-2 etc. pass through OK though). For options, I looked at the
> UInteger and Host_Wide_Int numeric types, but they both expect a
> positive integer. I did consider extending this with an Integer type
> that could accept positive and negative integers, (e.g. starting with
> augmenting switch_bit_fields in opt-functions.awk) but rooting out
> assumptions of non-negative integer values in places further down seemed
> like an onerous task.
>
>
> Further discussion
> ------------------
>
> Given that a default setting of 0 (equivalent to current behaviour)
> should provide for some potential improvement in code size in individual
> situations where it is critical (albeit using a rather coarse instrument
> to do so), without affecting the general case - is the addition of such
> a flag to GCC likely to be considered an acceptable inclusion for this
> purpose?

I agree with Jeff that this should be a --param like all other inliner
tunings.

What the bias achieves is adjusting functions size estimate by
an absolute value so I'd implement it there like

Index: gcc/ipa-fnsummary.c
===================================================================
--- gcc/ipa-fnsummary.c (revision 275680)
+++ gcc/ipa-fnsummary.c (working copy)
@@ -2467,6 +2467,8 @@ compute_fn_summary (struct cgraph_node *
       break;
   node->calls_comdat_local = (e != NULL);

+  info->self_size = MAX (0, info->self_size + PARAM_VALUE
(PARAM_INLINE_FNSIZE_BIAS));
+
   /* Inlining characteristics are maintained by the cgraph_mark_inline.  */
   info->size = info->self_size;
   info->stack_frame_offset = 0;

which has the advantage to reduce the number of times the adjustment is
done.  I also think that you can use --param unlinlined-function-insns to
get the same effect of your parameter, but it doesn't allow negative values
though an effective negative value of -2 is possible since the default is 2.
The parameter is used during summary computation and so if we want
an additional parameter it could be accounted for at the same place this one is.

Richard.

>
> gcc/ChangeLog:
>
>         * common.opt: Add -finline-growth-bias= flag.
>         * ipa-inline.h (estimate_edge_growth): Adjust inline
>         growth by bias factor, if supplied.
> ---
>  gcc/common.opt   |  4 ++++
>  gcc/ipa-inline.h | 39 ++++++++++++++++++++++++++++++++++++++-
>  2 files changed, 42 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/common.opt b/gcc/common.opt
> index c1605385712..4725df2c477 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -1670,6 +1670,10 @@ finline-limit=
>  Common RejectNegative Joined UInteger
>  -finline-limit=<number>        Limit the size of inlined functions to <number>.
>
> +finline-growth-bias=
> +Common RejectNegative Joined Var(flag_inline_growth_bias) Init(0)
> +-finline-growth-bias=<number>  Adjust the growth estimate for inlined functions by <number>.
> +
>  finline-atomics
>  Common Report Var(flag_inline_atomics) Init(1) Optimization
>  Inline __atomic operations when a lock free instruction sequence is available.
> diff --git a/gcc/ipa-inline.h b/gcc/ipa-inline.h
> index 18c8e1eebd0..65e105b00a2 100644
> --- a/gcc/ipa-inline.h
> +++ b/gcc/ipa-inline.h
> @@ -84,7 +84,44 @@ estimate_edge_growth (struct cgraph_edge *edge)
>  {
>    ipa_call_summary *s = ipa_call_summaries->get (edge);
>    gcc_checking_assert (s->call_stmt_size || !edge->callee->analyzed);
> -  return (estimate_edge_size (edge) - s->call_stmt_size);
> +
> +  if (dump_file)
> +    {
> +      const char *caller_name
> +       = edge->caller->ultimate_alias_target ()->name ();
> +      const char *callee_name
> +       = edge->callee->ultimate_alias_target ()->name ();
> +
> +      fprintf (dump_file, "Estimate edge growth for %s -> %s\n",
> +              caller_name, callee_name);
> +      fprintf (dump_file, "\t estimate_edge_size = %d\n",
> +              estimate_edge_size (edge));
> +      fprintf (dump_file, "\t s (%p) ->call_stmt_size = %d\n",
> +              ((void *) s), s->call_stmt_size);
> +    }
> +
> +  int growth = (estimate_edge_size (edge) - s->call_stmt_size);
> +
> +  int sz = 0;
> +  if (flag_inline_growth_bias)
> +    sz = strtol(flag_inline_growth_bias, NULL, 0);
> +  if (sz != 0)
> +    {
> +      struct cgraph_node *caller = edge->caller;
> +      ipa_fn_summary *fs = ipa_fn_summaries->get (caller);
> +      if (dump_file)
> +       fprintf (dump_file, "\t Adjusting growth by %d\n", sz);
> +      growth += sz;
> +      if (growth < 0 && fs->size + growth < 0) {
> +       if (dump_file)
> +          fprintf (dump_file, "\t Growth %d would shrink size beneath zero.\n", growth);
> +       growth = -fs->size;
> +        if (dump_file)
> +         fprintf (dump_file, "\t Setting growth to %d to make size zero.\n", growth);
> +      }
> +    }
> +
> +  return growth;
>  }
>
>  /* Return estimated callee runtime increase after inlining
> --
> 2.21.0
>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH v2] Add inline growth bias param
  2019-09-12 10:08 ` Richard Biener
@ 2019-11-25 12:50   ` Graham Markall
  2019-11-27  8:26     ` Richard Biener
  0 siblings, 1 reply; 7+ messages in thread
From: Graham Markall @ 2019-11-25 12:50 UTC (permalink / raw)
  To: gcc-patches
  Cc: ofer.shinaar, nidal.faour, craig.blackmore, richard.guenther,
	law, Graham Markall

Hi Richard,

Many thanks for the suggestion of an alternative implementation. I tried
implementing the suggestion, and I had a couple of observations:

1. As well as applying the bias in compute_fn_summary, it seemed to also
be necessary to apply it in ip_update_overall_fn_summary to avoid an ICE
resulting from size_info->size and size_info->self_size differing at the
end of compute_fn_summary.

2. The results that it produced were exactly the same as those using the
param uninlined-function-insns, so it would probably be redundant to add
the additional parameter implemented this way.

The implementation I tried is at:
https://github.com/gmarkall/gcc/commit/1d22f65b8a392cffc235ecb49143a93d1720b91c

When I benchmarked the code size again using Embench with both the
inline-growth-bias param implemented this way, and the
uninlined-function-insns param the results were, in both cases:

Benchmark       0       1       2       3       4
---------       ------  ------  ------  ------  ------
aha-mont64       99.05   99.05   99.05   99.05   99.05
crc32           100.00  100.00  100.00  100.00  100.00
cubic            94.66   94.66   94.66   94.66   94.66
edn              96.14   96.14   96.14   96.14   96.14
huffbench        99.88   99.88   99.88   99.88   99.88
matmult-int     100.00  100.00  100.00  100.00  100.00
minver          100.56  100.56  100.56  100.56  100.56
nbody           101.13  101.13  101.13  101.13  101.13
nettle-aes       99.03   99.03   99.03   99.03   99.03
nettle-sha256    99.89   99.89   99.89   99.89   99.89
nsichneu        100.00  100.00  100.00  100.00  100.00
picojpeg         99.35   99.35  100.10  100.10  100.10
qrduino         101.81  101.81  101.81  101.81  101.81
sglib-combined  100.00  100.00  100.00  100.00  100.00
slre             99.09   99.42   99.42  100.49  100.49
st               96.36   96.36   96.36   96.36   96.36
statemate       100.22  100.22  100.22  100.22  100.22
ud              100.00  100.00  100.00  100.00  100.00
wikisort         99.48   99.48   99.48   99.48   99.48
Mean             99.28   99.30   99.34   99.40   99.40

These results show that:

1. Setting the uninlined-function-insns value to 0 rather than 2
generally seems to yield an improvement in code size on RISC-V, without
making any individual benchmark larger than the current default of 2.

2. There are two cases where the patch in the original version of the
patch (v1) makes a slightly greater improvement than using
uninlined-function-insns or the patch linked above (ufi). These are:

Benchmark       ufi      v1
---------       ---      --
qrduino         101.81   101.61
sglib-combined  100.00    95.87

(note that the v1 values are slightly different to those posted in the
original patch - I rebased the original patch on a more recent version
of the master branch and the values changed slightly.)

Additionally, for a proprietary application that we tested the flags
with, the code sizes were for using neither flag (Original) for
using uninlined-function-insns, and using the original patch, with the
parameter values that resulted in the smallest code size, were:

Original   ufi (val=0)  v1 (val=-2)
57380      57184        56756
(100.0%)   (99.66%)     (98.91%)


Patch v2
--------

In conclusion, it seems that in some cases there is some additional code
size saving that can be gained in specific cases using the original
implementation of the patch. I have tidied up the original version of
the patch and adjusted the option so that it is a param instead of a
flag as suggested by both Jeff and yourself. I chose a default param
value of 5 to allow for a reasonable scope for setting negative values -
although I haven't seen any result where setting the value beneath an
effective setting of -2 yielded an improvement, this does allow a little
more room in case there are cases where going below -2 would help.

Is there sufficient data to indicate that the additional parameter as
implemented in the attached patch is worth adding?


Many thanks,
Graham.



---
 gcc/ipa-inline.h | 21 ++++++++++++++++++++-
 gcc/params.def   |  5 +++++
 2 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/gcc/ipa-inline.h b/gcc/ipa-inline.h
index 18c8e1eebd0..fb10677ea8f 100644
--- a/gcc/ipa-inline.h
+++ b/gcc/ipa-inline.h
@@ -21,6 +21,8 @@ along with GCC; see the file COPYING3.  If not see
 #ifndef GCC_IPA_INLINE_H
 #define GCC_IPA_INLINE_H
 
+#include "params.h"
+
 /* Data we cache about callgraph edges during inlining to avoid expensive
    re-computations during the greedy algorithm.  */
 class edge_growth_cache_entry
@@ -84,7 +86,24 @@ estimate_edge_growth (struct cgraph_edge *edge)
 {
   ipa_call_summary *s = ipa_call_summaries->get (edge);
   gcc_checking_assert (s->call_stmt_size || !edge->callee->analyzed);
-  return (estimate_edge_size (edge) - s->call_stmt_size);
+
+  int growth = (estimate_edge_size (edge) - s->call_stmt_size);
+
+  /* Bias function growth according to the bias parameter. The default is
+     parameter value is 5 to allow for slight negative biases, so we subtract 5
+     to allow an effective default value of 0.  */
+  growth += PARAM_VALUE (PARAM_INLINE_GROWTH_BIAS) - 5;
+
+  /* Function size cannot be negative, so if the growth is negative to the point
+     that it will reduce function size below 0, then we cap the growth such that
+     it makes the function size exactly zero.  */
+  struct cgraph_node *caller = edge->caller;
+  ipa_size_summary *fs = ipa_size_summaries->get (caller);
+  if (fs->size + growth < 0) {
+    growth = -fs->size;
+  }
+
+  return growth;
 }
 
 /* Return estimated callee runtime increase after inlining
diff --git a/gcc/params.def b/gcc/params.def
index 7928f6f071e..4274d8f36e0 100644
--- a/gcc/params.def
+++ b/gcc/params.def
@@ -112,6 +112,11 @@ DEFPARAM (PARAM_INLINE_HEURISTICS_HINT_PERCENT_O2,
 	  "The scale (in percents) applied to inline-insns-single and auto limits when heuristics hints that inlining is very profitable.",
 	  200, 100, 1000000)
 
+DEFPARAM (PARAM_INLINE_GROWTH_BIAS,
+	  "inline-growth-bias",
+	  "Bias of inlining growth overhead (default 5 is equal to no bias).",
+	  5, 0, 1000000)
+
 DEFPARAM (PARAM_MAX_INLINE_INSNS_SIZE,
 	  "max-inline-insns-size",
 	  "The maximum number of instructions when inlining for size.",
-- 
2.21.0

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2] Add inline growth bias param
  2019-11-25 12:50   ` [PATCH v2] Add inline growth bias param Graham Markall
@ 2019-11-27  8:26     ` Richard Biener
  2019-11-27  8:29       ` Jan Hubicka
  0 siblings, 1 reply; 7+ messages in thread
From: Richard Biener @ 2019-11-27  8:26 UTC (permalink / raw)
  To: Graham Markall
  Cc: GCC Patches, ofer.shinaar, nidal.faour, craig.blackmore,
	Jeff Law, Jan Hubicka

On Mon, Nov 25, 2019 at 1:44 PM Graham Markall
<graham.markall@embecosm.com> wrote:
>
> Hi Richard,
>
> Many thanks for the suggestion of an alternative implementation. I tried
> implementing the suggestion, and I had a couple of observations:
>
> 1. As well as applying the bias in compute_fn_summary, it seemed to also
> be necessary to apply it in ip_update_overall_fn_summary to avoid an ICE
> resulting from size_info->size and size_info->self_size differing at the
> end of compute_fn_summary.
>
> 2. The results that it produced were exactly the same as those using the
> param uninlined-function-insns, so it would probably be redundant to add
> the additional parameter implemented this way.
>
> The implementation I tried is at:
> https://github.com/gmarkall/gcc/commit/1d22f65b8a392cffc235ecb49143a93d1720b91c
>
> When I benchmarked the code size again using Embench with both the
> inline-growth-bias param implemented this way, and the
> uninlined-function-insns param the results were, in both cases:
>
> Benchmark       0       1       2       3       4
> ---------       ------  ------  ------  ------  ------
> aha-mont64       99.05   99.05   99.05   99.05   99.05
> crc32           100.00  100.00  100.00  100.00  100.00
> cubic            94.66   94.66   94.66   94.66   94.66
> edn              96.14   96.14   96.14   96.14   96.14
> huffbench        99.88   99.88   99.88   99.88   99.88
> matmult-int     100.00  100.00  100.00  100.00  100.00
> minver          100.56  100.56  100.56  100.56  100.56
> nbody           101.13  101.13  101.13  101.13  101.13
> nettle-aes       99.03   99.03   99.03   99.03   99.03
> nettle-sha256    99.89   99.89   99.89   99.89   99.89
> nsichneu        100.00  100.00  100.00  100.00  100.00
> picojpeg         99.35   99.35  100.10  100.10  100.10
> qrduino         101.81  101.81  101.81  101.81  101.81
> sglib-combined  100.00  100.00  100.00  100.00  100.00
> slre             99.09   99.42   99.42  100.49  100.49
> st               96.36   96.36   96.36   96.36   96.36
> statemate       100.22  100.22  100.22  100.22  100.22
> ud              100.00  100.00  100.00  100.00  100.00
> wikisort         99.48   99.48   99.48   99.48   99.48
> Mean             99.28   99.30   99.34   99.40   99.40
>
> These results show that:
>
> 1. Setting the uninlined-function-insns value to 0 rather than 2
> generally seems to yield an improvement in code size on RISC-V, without
> making any individual benchmark larger than the current default of 2.
>
> 2. There are two cases where the patch in the original version of the
> patch (v1) makes a slightly greater improvement than using
> uninlined-function-insns or the patch linked above (ufi). These are:
>
> Benchmark       ufi      v1
> ---------       ---      --
> qrduino         101.81   101.61
> sglib-combined  100.00    95.87
>
> (note that the v1 values are slightly different to those posted in the
> original patch - I rebased the original patch on a more recent version
> of the master branch and the values changed slightly.)
>
> Additionally, for a proprietary application that we tested the flags
> with, the code sizes were for using neither flag (Original) for
> using uninlined-function-insns, and using the original patch, with the
> parameter values that resulted in the smallest code size, were:
>
> Original   ufi (val=0)  v1 (val=-2)
> 57380      57184        56756
> (100.0%)   (99.66%)     (98.91%)
>
>
> Patch v2
> --------
>
> In conclusion, it seems that in some cases there is some additional code
> size saving that can be gained in specific cases using the original
> implementation of the patch. I have tidied up the original version of
> the patch and adjusted the option so that it is a param instead of a
> flag as suggested by both Jeff and yourself. I chose a default param
> value of 5 to allow for a reasonable scope for setting negative values -
> although I haven't seen any result where setting the value beneath an
> effective setting of -2 yielded an improvement, this does allow a little
> more room in case there are cases where going below -2 would help.
>
> Is there sufficient data to indicate that the additional parameter as
> implemented in the attached patch is worth adding?

There needs to be more explanation on what this really achieves
and as I said IIRC we should already have a --param that accounts
for this, --param uninlined-function-insns which maybe only misses
the ability to become negative.

Richard.

>
> Many thanks,
> Graham.
>
>
>
> ---
>  gcc/ipa-inline.h | 21 ++++++++++++++++++++-
>  gcc/params.def   |  5 +++++
>  2 files changed, 25 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/ipa-inline.h b/gcc/ipa-inline.h
> index 18c8e1eebd0..fb10677ea8f 100644
> --- a/gcc/ipa-inline.h
> +++ b/gcc/ipa-inline.h
> @@ -21,6 +21,8 @@ along with GCC; see the file COPYING3.  If not see
>  #ifndef GCC_IPA_INLINE_H
>  #define GCC_IPA_INLINE_H
>
> +#include "params.h"
> +
>  /* Data we cache about callgraph edges during inlining to avoid expensive
>     re-computations during the greedy algorithm.  */
>  class edge_growth_cache_entry
> @@ -84,7 +86,24 @@ estimate_edge_growth (struct cgraph_edge *edge)
>  {
>    ipa_call_summary *s = ipa_call_summaries->get (edge);
>    gcc_checking_assert (s->call_stmt_size || !edge->callee->analyzed);
> -  return (estimate_edge_size (edge) - s->call_stmt_size);
> +
> +  int growth = (estimate_edge_size (edge) - s->call_stmt_size);
> +
> +  /* Bias function growth according to the bias parameter. The default is
> +     parameter value is 5 to allow for slight negative biases, so we subtract 5
> +     to allow an effective default value of 0.  */
> +  growth += PARAM_VALUE (PARAM_INLINE_GROWTH_BIAS) - 5;
> +
> +  /* Function size cannot be negative, so if the growth is negative to the point
> +     that it will reduce function size below 0, then we cap the growth such that
> +     it makes the function size exactly zero.  */
> +  struct cgraph_node *caller = edge->caller;
> +  ipa_size_summary *fs = ipa_size_summaries->get (caller);
> +  if (fs->size + growth < 0) {
> +    growth = -fs->size;
> +  }
> +
> +  return growth;
>  }
>
>  /* Return estimated callee runtime increase after inlining
> diff --git a/gcc/params.def b/gcc/params.def
> index 7928f6f071e..4274d8f36e0 100644
> --- a/gcc/params.def
> +++ b/gcc/params.def
> @@ -112,6 +112,11 @@ DEFPARAM (PARAM_INLINE_HEURISTICS_HINT_PERCENT_O2,
>           "The scale (in percents) applied to inline-insns-single and auto limits when heuristics hints that inlining is very profitable.",
>           200, 100, 1000000)
>
> +DEFPARAM (PARAM_INLINE_GROWTH_BIAS,
> +         "inline-growth-bias",
> +         "Bias of inlining growth overhead (default 5 is equal to no bias).",
> +         5, 0, 1000000)
> +
>  DEFPARAM (PARAM_MAX_INLINE_INSNS_SIZE,
>           "max-inline-insns-size",
>           "The maximum number of instructions when inlining for size.",
> --
> 2.21.0
>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2] Add inline growth bias param
  2019-11-27  8:26     ` Richard Biener
@ 2019-11-27  8:29       ` Jan Hubicka
  0 siblings, 0 replies; 7+ messages in thread
From: Jan Hubicka @ 2019-11-27  8:29 UTC (permalink / raw)
  To: Richard Biener
  Cc: Graham Markall, GCC Patches, ofer.shinaar, nidal.faour,
	craig.blackmore, Jeff Law

> > 2. The results that it produced were exactly the same as those using the
> > param uninlined-function-insns, so it would probably be redundant to add
> > the additional parameter implemented this way.
> >
> > The implementation I tried is at:
> > https://github.com/gmarkall/gcc/commit/1d22f65b8a392cffc235ecb49143a93d1720b91c
> >
> > When I benchmarked the code size again using Embench with both the
> > inline-growth-bias param implemented this way, and the
> > uninlined-function-insns param the results were, in both cases:
> >
> > Benchmark       0       1       2       3       4
> > ---------       ------  ------  ------  ------  ------
> > aha-mont64       99.05   99.05   99.05   99.05   99.05
> > crc32           100.00  100.00  100.00  100.00  100.00
> > cubic            94.66   94.66   94.66   94.66   94.66
> > edn              96.14   96.14   96.14   96.14   96.14
> > huffbench        99.88   99.88   99.88   99.88   99.88
> > matmult-int     100.00  100.00  100.00  100.00  100.00
> > minver          100.56  100.56  100.56  100.56  100.56
> > nbody           101.13  101.13  101.13  101.13  101.13
> > nettle-aes       99.03   99.03   99.03   99.03   99.03
> > nettle-sha256    99.89   99.89   99.89   99.89   99.89
> > nsichneu        100.00  100.00  100.00  100.00  100.00
> > picojpeg         99.35   99.35  100.10  100.10  100.10
> > qrduino         101.81  101.81  101.81  101.81  101.81
> > sglib-combined  100.00  100.00  100.00  100.00  100.00
> > slre             99.09   99.42   99.42  100.49  100.49
> > st               96.36   96.36   96.36   96.36   96.36
> > statemate       100.22  100.22  100.22  100.22  100.22
> > ud              100.00  100.00  100.00  100.00  100.00
> > wikisort         99.48   99.48   99.48   99.48   99.48
> > Mean             99.28   99.30   99.34   99.40   99.40
> >
> > These results show that:
> >
> > 1. Setting the uninlined-function-insns value to 0 rather than 2
> > generally seems to yield an improvement in code size on RISC-V, without
> > making any individual benchmark larger than the current default of 2.

It would be interesting to understand this better - I am generally
trying the inline metrics to match reality and the constant of "2"
basically accounts the fact that offline copy of function has some
prologue/epilogue/return instructions which we do not see at gimple
level and thus we want to account it. These are always optimized out
after inliing.

I made this param mostly becased on presentaiton of s390 performance
where function calls, prologues, epilogues etc are significantly more
expensive.

So rather than biassing code size estimates to be less realistic I would
try to fix it on the inline heuristics side.

Do you have small examples for to look at?
> >
> > 2. There are two cases where the patch in the original version of the
> > patch (v1) makes a slightly greater improvement than using
> > uninlined-function-insns or the patch linked above (ufi). These are:
> >
> > Benchmark       ufi      v1
> > ---------       ---      --
> > qrduino         101.81   101.61
> > sglib-combined  100.00    95.87
> >
> > (note that the v1 values are slightly different to those posted in the
> > original patch - I rebased the original patch on a more recent version
> > of the master branch and the values changed slightly.)
> >
> > Additionally, for a proprietary application that we tested the flags
> > with, the code sizes were for using neither flag (Original) for
> > using uninlined-function-insns, and using the original patch, with the
> > parameter values that resulted in the smallest code size, were:
> >
> > Original   ufi (val=0)  v1 (val=-2)
> > 57380      57184        56756
> > (100.0%)   (99.66%)     (98.91%)
> >
> >
> > Patch v2
> > --------
> >
> > In conclusion, it seems that in some cases there is some additional code
> > size saving that can be gained in specific cases using the original
> > implementation of the patch. I have tidied up the original version of
> > the patch and adjusted the option so that it is a param instead of a
> > flag as suggested by both Jeff and yourself. I chose a default param
> > value of 5 to allow for a reasonable scope for setting negative values -
> > although I haven't seen any result where setting the value beneath an
> > effective setting of -2 yielded an improvement, this does allow a little
> > more room in case there are cases where going below -2 would help.
> >
> > Is there sufficient data to indicate that the additional parameter as
> > implemented in the attached patch is worth adding?
> 
> There needs to be more explanation on what this really achieves
> and as I said IIRC we should already have a --param that accounts
> for this, --param uninlined-function-insns which maybe only misses
> the ability to become negative.
> 
> Richard.
> 
> >
> > Many thanks,
> > Graham.
> >
> >
> >
> > ---
> >  gcc/ipa-inline.h | 21 ++++++++++++++++++++-
> >  gcc/params.def   |  5 +++++
> >  2 files changed, 25 insertions(+), 1 deletion(-)
> >
> > diff --git a/gcc/ipa-inline.h b/gcc/ipa-inline.h
> > index 18c8e1eebd0..fb10677ea8f 100644
> > --- a/gcc/ipa-inline.h
> > +++ b/gcc/ipa-inline.h
> > @@ -21,6 +21,8 @@ along with GCC; see the file COPYING3.  If not see
> >  #ifndef GCC_IPA_INLINE_H
> >  #define GCC_IPA_INLINE_H
> >
> > +#include "params.h"
> > +
> >  /* Data we cache about callgraph edges during inlining to avoid expensive
> >     re-computations during the greedy algorithm.  */
> >  class edge_growth_cache_entry
> > @@ -84,7 +86,24 @@ estimate_edge_growth (struct cgraph_edge *edge)
> >  {
> >    ipa_call_summary *s = ipa_call_summaries->get (edge);
> >    gcc_checking_assert (s->call_stmt_size || !edge->callee->analyzed);
> > -  return (estimate_edge_size (edge) - s->call_stmt_size);
> > +
> > +  int growth = (estimate_edge_size (edge) - s->call_stmt_size);
> > +
> > +  /* Bias function growth according to the bias parameter. The default is
> > +     parameter value is 5 to allow for slight negative biases, so we subtract 5
> > +     to allow an effective default value of 0.  */
> > +  growth += PARAM_VALUE (PARAM_INLINE_GROWTH_BIAS) - 5;
> > +
> > +  /* Function size cannot be negative, so if the growth is negative to the point
> > +     that it will reduce function size below 0, then we cap the growth such that
> > +     it makes the function size exactly zero.  */
> > +  struct cgraph_node *caller = edge->caller;
> > +  ipa_size_summary *fs = ipa_size_summaries->get (caller);
> > +  if (fs->size + growth < 0) {
> > +    growth = -fs->size;
> > +  }
> > +
> > +  return growth;

The esitmated growth is used in the following ways

1) it is compared to max-inline-insns-* and early-inlining-insns parameters
2) it is used to estimate overall code growth for inlining for size
3) it affects badness metrics.

Again instead of biassing growth, perhaps we should work out what of
1, 2, 3 needs returning.

Honza
> >  }
> >
> >  /* Return estimated callee runtime increase after inlining
> > diff --git a/gcc/params.def b/gcc/params.def
> > index 7928f6f071e..4274d8f36e0 100644
> > --- a/gcc/params.def
> > +++ b/gcc/params.def
> > @@ -112,6 +112,11 @@ DEFPARAM (PARAM_INLINE_HEURISTICS_HINT_PERCENT_O2,
> >           "The scale (in percents) applied to inline-insns-single and auto limits when heuristics hints that inlining is very profitable.",
> >           200, 100, 1000000)
> >
> > +DEFPARAM (PARAM_INLINE_GROWTH_BIAS,
> > +         "inline-growth-bias",
> > +         "Bias of inlining growth overhead (default 5 is equal to no bias).",
> > +         5, 0, 1000000)
> > +
> >  DEFPARAM (PARAM_MAX_INLINE_INSNS_SIZE,
> >           "max-inline-insns-size",
> >           "The maximum number of instructions when inlining for size.",
> > --
> > 2.21.0
> >

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2019-11-27  8:26 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-09-06 10:59 [RFC PATCH] Add inlining growth bias flag Graham Markall
2019-09-09 18:55 ` Jeff Law
2019-09-11 20:42   ` Graham Markall
2019-09-12 10:08 ` Richard Biener
2019-11-25 12:50   ` [PATCH v2] Add inline growth bias param Graham Markall
2019-11-27  8:26     ` Richard Biener
2019-11-27  8:29       ` Jan Hubicka

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).