[PATCH] Check calls before loop unrolling

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* [PATCH] Check calls before loop unrolling
@ 2020-08-20  4:34 guojiufu
  2020-08-24  9:16 ` Richard Biener
  2020-09-01  3:33 ` Jiufu Guo
  0 siblings, 2 replies; 17+ messages in thread
From: guojiufu @ 2020-08-20  4:34 UTC (permalink / raw)
  To: gcc-patches; +Cc: guojiufu, wschmidt, segher, dje.gcc

Hi,

When unroll loops, if there are calls inside the loop, those calls
may raise negative impacts for unrolling.  This patch adds a param
param_max_unrolled_calls, and checks if the number of calls inside
the loop bigger than this param, loop is prevent from unrolling.

This patch is checking the _average_ number of calls which is the
summary of call numbers multiply the possibility of the call maybe
executed.  The _average_ number could be a fraction, to keep the
precision, the param is the threshold number multiply 10000.

Bootstrap and regtest pass on powerpc64le.  Is this ok for trunk?

gcc/ChangeLog
2020-08-19  Jiufu Guo   <guojiufu@cn.ibm.com>

	* params.opt (param_max_unrolled_average_calls_x10000): New param.
	* cfgloop.h (average_num_loop_calls): New declare.
	* cfgloopanal.c (average_num_loop_calls): New function.
	* loop-unroll.c (decide_unroll_constant_iteration,
	decide_unroll_runtime_iterations,
	decide_unroll_stupid): Check average_num_loop_calls and
	param_max_unrolled_average_calls_x10000.
---
 gcc/cfgloop.h     |  2 ++
 gcc/cfgloopanal.c | 25 +++++++++++++++++++++++++
 gcc/loop-unroll.c | 10 ++++++++++
 gcc/params.opt    |  4 ++++
 4 files changed, 41 insertions(+)

diff --git a/gcc/cfgloop.h b/gcc/cfgloop.h
index 18b404e292f..dab933da150 100644
--- a/gcc/cfgloop.h
+++ b/gcc/cfgloop.h
@@ -21,6 +21,7 @@ along with GCC; see the file COPYING3.  If not see
 #define GCC_CFGLOOP_H
 
 #include "cfgloopmanip.h"
+#include "sreal.h"
 
 /* Structure to hold decision about unrolling/peeling.  */
 enum lpt_dec
@@ -387,6 +388,7 @@ extern vec<edge> get_loop_exit_edges (const class loop *, basic_block * = NULL);
 extern edge single_exit (const class loop *);
 extern edge single_likely_exit (class loop *loop, vec<edge>);
 extern unsigned num_loop_branches (const class loop *);
+extern sreal average_num_loop_calls (const class loop *);
 
 extern edge loop_preheader_edge (const class loop *);
 extern edge loop_latch_edge (const class loop *);
diff --git a/gcc/cfgloopanal.c b/gcc/cfgloopanal.c
index 0b33e8272a7..a314db4e0c0 100644
--- a/gcc/cfgloopanal.c
+++ b/gcc/cfgloopanal.c
@@ -233,6 +233,31 @@ average_num_loop_insns (const class loop *loop)
   return ret;
 }
 
+/* Count the number of call insns in LOOP.  */
+sreal
+average_num_loop_calls (const class loop *loop)
+{
+  basic_block *bbs;
+  rtx_insn *insn;
+  unsigned int i, bncalls;
+  sreal ncalls = 0;
+
+  bbs = get_loop_body (loop);
+  for (i = 0; i < loop->num_nodes; i++)
+    {
+      bncalls = 0;
+      FOR_BB_INSNS (bbs[i], insn)
+	if (CALL_P (insn))
+	  bncalls++;
+
+      ncalls += (sreal) bncalls
+	* bbs[i]->count.to_sreal_scale (loop->header->count);
+    }
+  free (bbs);
+
+  return ncalls;
+}
+
 /* Returns expected number of iterations of LOOP, according to
    measured or guessed profile.
 
diff --git a/gcc/loop-unroll.c b/gcc/loop-unroll.c
index 693c7768868..56b8fb37d2a 100644
--- a/gcc/loop-unroll.c
+++ b/gcc/loop-unroll.c
@@ -370,6 +370,10 @@ decide_unroll_constant_iterations (class loop *loop, int flags)
     nunroll = nunroll_by_av;
   if (nunroll > (unsigned) param_max_unroll_times)
     nunroll = param_max_unroll_times;
+  if (!loop->unroll
+      && (average_num_loop_calls (loop) * (sreal) 10000).to_int ()
+	   > (unsigned) param_max_unrolled_average_calls_x10000)
+    nunroll = 0;
 
   if (targetm.loop_unroll_adjust)
     nunroll = targetm.loop_unroll_adjust (nunroll, loop);
@@ -689,6 +693,9 @@ decide_unroll_runtime_iterations (class loop *loop, int flags)
     nunroll = nunroll_by_av;
   if (nunroll > (unsigned) param_max_unroll_times)
     nunroll = param_max_unroll_times;
+  if ((average_num_loop_calls (loop) * (sreal) 10000).to_int ()
+      > (unsigned) param_max_unrolled_average_calls_x10000)
+    nunroll = 0;
 
   if (targetm.loop_unroll_adjust)
     nunroll = targetm.loop_unroll_adjust (nunroll, loop);
@@ -1173,6 +1180,9 @@ decide_unroll_stupid (class loop *loop, int flags)
     nunroll = nunroll_by_av;
   if (nunroll > (unsigned) param_max_unroll_times)
     nunroll = param_max_unroll_times;
+  if ((average_num_loop_calls (loop) * (sreal) 10000).to_int ()
+      > (unsigned) param_max_unrolled_average_calls_x10000)
+    nunroll = 0;
 
   if (targetm.loop_unroll_adjust)
     nunroll = targetm.loop_unroll_adjust (nunroll, loop);
diff --git a/gcc/params.opt b/gcc/params.opt
index f39e5d1a012..80605861223 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -634,6 +634,10 @@ The maximum number of unrollings of a single loop.
 Common Joined UInteger Var(param_max_unrolled_insns) Init(200) Param Optimization
 The maximum number of instructions to consider to unroll in a loop.
 
+-param=max-unrolled-average-calls-x10000=
+Common Joined UInteger Var(param_max_unrolled_average_calls_x10000) Init(0) Param Optimization
+The maximum number of calls to consider to unroll in a loop on average and multiply 10000.
+
 -param=max-unswitch-insns=
 Common Joined UInteger Var(param_max_unswitch_insns) Init(50) Param Optimization
 The maximum number of insns of an unswitched loop.
-- 
2.25.1


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Check calls before loop unrolling
  2020-08-20  4:34 [PATCH] Check calls before loop unrolling guojiufu
@ 2020-08-24  9:16 ` Richard Biener
  2020-08-24 11:16   ` Jan Hubicka
  2020-09-01  3:33 ` Jiufu Guo
  1 sibling, 1 reply; 17+ messages in thread
From: Richard Biener @ 2020-08-24  9:16 UTC (permalink / raw)
  To: guojiufu; +Cc: GCC Patches, Bill Schmidt, David Edelsohn, Segher Boessenkool

On Thu, Aug 20, 2020 at 6:35 AM guojiufu via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> Hi,
>
> When unroll loops, if there are calls inside the loop, those calls
> may raise negative impacts for unrolling.  This patch adds a param
> param_max_unrolled_calls, and checks if the number of calls inside
> the loop bigger than this param, loop is prevent from unrolling.
>
> This patch is checking the _average_ number of calls which is the
> summary of call numbers multiply the possibility of the call maybe
> executed.  The _average_ number could be a fraction, to keep the
> precision, the param is the threshold number multiply 10000.
>
> Bootstrap and regtest pass on powerpc64le.  Is this ok for trunk?

Can you try mimicking what try_unroll_loop_completely on GIMPLE does
instead?  IIRC the main motivation to not unroll calls is the spilling code
around it which we cannot estimate very well.  And that spilling happens
irrespective of whether the call is in a hot or cold path so I'm not sure
it makes sense to use the "average" number of calls here.

Richard.

> gcc/ChangeLog
> 2020-08-19  Jiufu Guo   <guojiufu@cn.ibm.com>
>
>         * params.opt (param_max_unrolled_average_calls_x10000): New param.
>         * cfgloop.h (average_num_loop_calls): New declare.
>         * cfgloopanal.c (average_num_loop_calls): New function.
>         * loop-unroll.c (decide_unroll_constant_iteration,
>         decide_unroll_runtime_iterations,
>         decide_unroll_stupid): Check average_num_loop_calls and
>         param_max_unrolled_average_calls_x10000.
> ---
>  gcc/cfgloop.h     |  2 ++
>  gcc/cfgloopanal.c | 25 +++++++++++++++++++++++++
>  gcc/loop-unroll.c | 10 ++++++++++
>  gcc/params.opt    |  4 ++++
>  4 files changed, 41 insertions(+)
>
> diff --git a/gcc/cfgloop.h b/gcc/cfgloop.h
> index 18b404e292f..dab933da150 100644
> --- a/gcc/cfgloop.h
> +++ b/gcc/cfgloop.h
> @@ -21,6 +21,7 @@ along with GCC; see the file COPYING3.  If not see
>  #define GCC_CFGLOOP_H
>
>  #include "cfgloopmanip.h"
> +#include "sreal.h"
>
>  /* Structure to hold decision about unrolling/peeling.  */
>  enum lpt_dec
> @@ -387,6 +388,7 @@ extern vec<edge> get_loop_exit_edges (const class loop *, basic_block * = NULL);
>  extern edge single_exit (const class loop *);
>  extern edge single_likely_exit (class loop *loop, vec<edge>);
>  extern unsigned num_loop_branches (const class loop *);
> +extern sreal average_num_loop_calls (const class loop *);
>
>  extern edge loop_preheader_edge (const class loop *);
>  extern edge loop_latch_edge (const class loop *);
> diff --git a/gcc/cfgloopanal.c b/gcc/cfgloopanal.c
> index 0b33e8272a7..a314db4e0c0 100644
> --- a/gcc/cfgloopanal.c
> +++ b/gcc/cfgloopanal.c
> @@ -233,6 +233,31 @@ average_num_loop_insns (const class loop *loop)
>    return ret;
>  }
>
> +/* Count the number of call insns in LOOP.  */
> +sreal
> +average_num_loop_calls (const class loop *loop)
> +{
> +  basic_block *bbs;
> +  rtx_insn *insn;
> +  unsigned int i, bncalls;
> +  sreal ncalls = 0;
> +
> +  bbs = get_loop_body (loop);
> +  for (i = 0; i < loop->num_nodes; i++)
> +    {
> +      bncalls = 0;
> +      FOR_BB_INSNS (bbs[i], insn)
> +       if (CALL_P (insn))
> +         bncalls++;
> +
> +      ncalls += (sreal) bncalls
> +       * bbs[i]->count.to_sreal_scale (loop->header->count);
> +    }
> +  free (bbs);
> +
> +  return ncalls;
> +}
> +
>  /* Returns expected number of iterations of LOOP, according to
>     measured or guessed profile.
>
> diff --git a/gcc/loop-unroll.c b/gcc/loop-unroll.c
> index 693c7768868..56b8fb37d2a 100644
> --- a/gcc/loop-unroll.c
> +++ b/gcc/loop-unroll.c
> @@ -370,6 +370,10 @@ decide_unroll_constant_iterations (class loop *loop, int flags)
>      nunroll = nunroll_by_av;
>    if (nunroll > (unsigned) param_max_unroll_times)
>      nunroll = param_max_unroll_times;
> +  if (!loop->unroll
> +      && (average_num_loop_calls (loop) * (sreal) 10000).to_int ()
> +          > (unsigned) param_max_unrolled_average_calls_x10000)
> +    nunroll = 0;
>
>    if (targetm.loop_unroll_adjust)
>      nunroll = targetm.loop_unroll_adjust (nunroll, loop);
> @@ -689,6 +693,9 @@ decide_unroll_runtime_iterations (class loop *loop, int flags)
>      nunroll = nunroll_by_av;
>    if (nunroll > (unsigned) param_max_unroll_times)
>      nunroll = param_max_unroll_times;
> +  if ((average_num_loop_calls (loop) * (sreal) 10000).to_int ()
> +      > (unsigned) param_max_unrolled_average_calls_x10000)
> +    nunroll = 0;
>
>    if (targetm.loop_unroll_adjust)
>      nunroll = targetm.loop_unroll_adjust (nunroll, loop);
> @@ -1173,6 +1180,9 @@ decide_unroll_stupid (class loop *loop, int flags)
>      nunroll = nunroll_by_av;
>    if (nunroll > (unsigned) param_max_unroll_times)
>      nunroll = param_max_unroll_times;
> +  if ((average_num_loop_calls (loop) * (sreal) 10000).to_int ()
> +      > (unsigned) param_max_unrolled_average_calls_x10000)
> +    nunroll = 0;
>
>    if (targetm.loop_unroll_adjust)
>      nunroll = targetm.loop_unroll_adjust (nunroll, loop);
> diff --git a/gcc/params.opt b/gcc/params.opt
> index f39e5d1a012..80605861223 100644
> --- a/gcc/params.opt
> +++ b/gcc/params.opt
> @@ -634,6 +634,10 @@ The maximum number of unrollings of a single loop.
>  Common Joined UInteger Var(param_max_unrolled_insns) Init(200) Param Optimization
>  The maximum number of instructions to consider to unroll in a loop.
>
> +-param=max-unrolled-average-calls-x10000=
> +Common Joined UInteger Var(param_max_unrolled_average_calls_x10000) Init(0) Param Optimization
> +The maximum number of calls to consider to unroll in a loop on average and multiply 10000.
> +
>  -param=max-unswitch-insns=
>  Common Joined UInteger Var(param_max_unswitch_insns) Init(50) Param Optimization
>  The maximum number of insns of an unswitched loop.
> --
> 2.25.1
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Check calls before loop unrolling
  2020-08-24  9:16 ` Richard Biener
@ 2020-08-24 11:16   ` Jan Hubicka
  2020-08-25  2:26     ` Jiufu Guo
  2020-09-16  3:27     ` Jiufu Guo
  0 siblings, 2 replies; 17+ messages in thread
From: Jan Hubicka @ 2020-08-24 11:16 UTC (permalink / raw)
  To: Richard Biener
  Cc: guojiufu, Bill Schmidt, GCC Patches, Segher Boessenkool, David Edelsohn

> On Thu, Aug 20, 2020 at 6:35 AM guojiufu via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > Hi,
> >
> > When unroll loops, if there are calls inside the loop, those calls
> > may raise negative impacts for unrolling.  This patch adds a param
> > param_max_unrolled_calls, and checks if the number of calls inside
> > the loop bigger than this param, loop is prevent from unrolling.
> >
> > This patch is checking the _average_ number of calls which is the
> > summary of call numbers multiply the possibility of the call maybe
> > executed.  The _average_ number could be a fraction, to keep the
> > precision, the param is the threshold number multiply 10000.
> >
> > Bootstrap and regtest pass on powerpc64le.  Is this ok for trunk?
> 
> Can you try mimicking what try_unroll_loop_completely on GIMPLE does
> instead?  IIRC the main motivation to not unroll calls is the spilling code
> around it which we cannot estimate very well.  And that spilling happens
> irrespective of whether the call is in a hot or cold path so I'm not sure
> it makes sense to use the "average" number of calls here.

As long as I remember, we excluded calls simply becuase it is/was an
expensive intruction so it was an indication that the loop overhead is
small compared to the overhead of loop body.

Honza

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Check calls before loop unrolling
  2020-08-24 11:16   ` Jan Hubicka
@ 2020-08-25  2:26     ` Jiufu Guo
  2020-09-16  3:27     ` Jiufu Guo
  1 sibling, 0 replies; 17+ messages in thread
From: Jiufu Guo @ 2020-08-25  2:26 UTC (permalink / raw)
  To: Jan Hubicka
  Cc: Richard Biener, Bill Schmidt, GCC Patches, Segher Boessenkool,
	David Edelsohn

On 2020-08-24 19:16, Jan Hubicka wrote:
>> On Thu, Aug 20, 2020 at 6:35 AM guojiufu via Gcc-patches
>> <gcc-patches@gcc.gnu.org> wrote:
>> >
>> > Hi,
>> >
>> > This patch is checking the _average_ number of calls which is the
>> > summary of call numbers multiply the possibility of the call maybe
>> > executed.  The _average_ number could be a fraction, to keep the
>> > precision, the param is the threshold number multiply 10000.
>> >
>> Can you try mimicking what try_unroll_loop_completely on GIMPLE does
>> instead?  IIRC the main motivation to not unroll calls is the spilling 
>> code
>> around it which we cannot estimate very well.  And that spilling 
>> happens
>> irrespective of whether the call is in a hot or cold path so I'm not 
>> sure
>> it makes sense to use the "average" number of calls here.

In try_unroll_loop_completely, it is checking the calls in the hot path:
num_non_pure_calls_on_hot_path), and avoid unrolling if there is.
This is one reason for here to use "average".
> 
> As long as I remember, we excluded calls simply becuase it is/was an
> expensive intruction so it was an indication that the loop overhead is
> small compared to the overhead of loop body.

Thanks, Honza and Richard!
> 
> Honza

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Check calls before loop unrolling
  2020-08-20  4:34 [PATCH] Check calls before loop unrolling guojiufu
  2020-08-24  9:16 ` Richard Biener
@ 2020-09-01  3:33 ` Jiufu Guo
  2020-11-19 19:13   ` Jeff Law
  1 sibling, 1 reply; 17+ messages in thread
From: Jiufu Guo @ 2020-09-01  3:33 UTC (permalink / raw)
  To: gcc-patches; +Cc: wschmidt, segher, dje.gcc, richard.guenther, hubicka

guojiufu <guojiufu@linux.ibm.com> writes:

Hi,

In this patch, the default value of
param=max-unrolled-average-calls-x10000 is '0', which means to unroll
a loop, there should be no call inside the body.  Do I need to set the
default value to a bigger value (16?) for later tune?  Biger value will
keep the behavior unchanged.

And is this patch ok for trunk?  Thanks a lot for you comments!

BR.
Jiufu.


> Hi,
>
> When unroll loops, if there are calls inside the loop, those calls
> may raise negative impacts for unrolling.  This patch adds a param
> param_max_unrolled_calls, and checks if the number of calls inside
> the loop bigger than this param, loop is prevent from unrolling.
>
> This patch is checking the _average_ number of calls which is the
> summary of call numbers multiply the possibility of the call maybe
> executed.  The _average_ number could be a fraction, to keep the
> precision, the param is the threshold number multiply 10000.
>
> Bootstrap and regtest pass on powerpc64le.  Is this ok for trunk?
>
> gcc/ChangeLog
> 2020-08-19  Jiufu Guo   <guojiufu@cn.ibm.com>
>
> 	* params.opt (param_max_unrolled_average_calls_x10000): New param.
> 	* cfgloop.h (average_num_loop_calls): New declare.
> 	* cfgloopanal.c (average_num_loop_calls): New function.
> 	* loop-unroll.c (decide_unroll_constant_iteration,
> 	decide_unroll_runtime_iterations,
> 	decide_unroll_stupid): Check average_num_loop_calls and
> 	param_max_unrolled_average_calls_x10000.
> ---
>  gcc/cfgloop.h     |  2 ++
>  gcc/cfgloopanal.c | 25 +++++++++++++++++++++++++
>  gcc/loop-unroll.c | 10 ++++++++++
>  gcc/params.opt    |  4 ++++
>  4 files changed, 41 insertions(+)
>
> diff --git a/gcc/cfgloop.h b/gcc/cfgloop.h
> index 18b404e292f..dab933da150 100644
> --- a/gcc/cfgloop.h
> +++ b/gcc/cfgloop.h
> @@ -21,6 +21,7 @@ along with GCC; see the file COPYING3.  If not see
>  #define GCC_CFGLOOP_H
>  
>  #include "cfgloopmanip.h"
> +#include "sreal.h"
>  
>  /* Structure to hold decision about unrolling/peeling.  */
>  enum lpt_dec
> @@ -387,6 +388,7 @@ extern vec<edge> get_loop_exit_edges (const class loop *, basic_block * = NULL);
>  extern edge single_exit (const class loop *);
>  extern edge single_likely_exit (class loop *loop, vec<edge>);
>  extern unsigned num_loop_branches (const class loop *);
> +extern sreal average_num_loop_calls (const class loop *);
>  
>  extern edge loop_preheader_edge (const class loop *);
>  extern edge loop_latch_edge (const class loop *);
> diff --git a/gcc/cfgloopanal.c b/gcc/cfgloopanal.c
> index 0b33e8272a7..a314db4e0c0 100644
> --- a/gcc/cfgloopanal.c
> +++ b/gcc/cfgloopanal.c
> @@ -233,6 +233,31 @@ average_num_loop_insns (const class loop *loop)
>    return ret;
>  }
>  
> +/* Count the number of call insns in LOOP.  */
> +sreal
> +average_num_loop_calls (const class loop *loop)
> +{
> +  basic_block *bbs;
> +  rtx_insn *insn;
> +  unsigned int i, bncalls;
> +  sreal ncalls = 0;
> +
> +  bbs = get_loop_body (loop);
> +  for (i = 0; i < loop->num_nodes; i++)
> +    {
> +      bncalls = 0;
> +      FOR_BB_INSNS (bbs[i], insn)
> +	if (CALL_P (insn))
> +	  bncalls++;
> +
> +      ncalls += (sreal) bncalls
> +	* bbs[i]->count.to_sreal_scale (loop->header->count);
> +    }
> +  free (bbs);
> +
> +  return ncalls;
> +}
> +
>  /* Returns expected number of iterations of LOOP, according to
>     measured or guessed profile.
>  
> diff --git a/gcc/loop-unroll.c b/gcc/loop-unroll.c
> index 693c7768868..56b8fb37d2a 100644
> --- a/gcc/loop-unroll.c
> +++ b/gcc/loop-unroll.c
> @@ -370,6 +370,10 @@ decide_unroll_constant_iterations (class loop *loop, int flags)
>      nunroll = nunroll_by_av;
>    if (nunroll > (unsigned) param_max_unroll_times)
>      nunroll = param_max_unroll_times;
> +  if (!loop->unroll
> +      && (average_num_loop_calls (loop) * (sreal) 10000).to_int ()
> +	   > (unsigned) param_max_unrolled_average_calls_x10000)
> +    nunroll = 0;
>  
>    if (targetm.loop_unroll_adjust)
>      nunroll = targetm.loop_unroll_adjust (nunroll, loop);
> @@ -689,6 +693,9 @@ decide_unroll_runtime_iterations (class loop *loop, int flags)
>      nunroll = nunroll_by_av;
>    if (nunroll > (unsigned) param_max_unroll_times)
>      nunroll = param_max_unroll_times;
> +  if ((average_num_loop_calls (loop) * (sreal) 10000).to_int ()
> +      > (unsigned) param_max_unrolled_average_calls_x10000)
> +    nunroll = 0;
>  
>    if (targetm.loop_unroll_adjust)
>      nunroll = targetm.loop_unroll_adjust (nunroll, loop);
> @@ -1173,6 +1180,9 @@ decide_unroll_stupid (class loop *loop, int flags)
>      nunroll = nunroll_by_av;
>    if (nunroll > (unsigned) param_max_unroll_times)
>      nunroll = param_max_unroll_times;
> +  if ((average_num_loop_calls (loop) * (sreal) 10000).to_int ()
> +      > (unsigned) param_max_unrolled_average_calls_x10000)
> +    nunroll = 0;
>  
>    if (targetm.loop_unroll_adjust)
>      nunroll = targetm.loop_unroll_adjust (nunroll, loop);
> diff --git a/gcc/params.opt b/gcc/params.opt
> index f39e5d1a012..80605861223 100644
> --- a/gcc/params.opt
> +++ b/gcc/params.opt
> @@ -634,6 +634,10 @@ The maximum number of unrollings of a single loop.
>  Common Joined UInteger Var(param_max_unrolled_insns) Init(200) Param Optimization
>  The maximum number of instructions to consider to unroll in a loop.
>  
> +-param=max-unrolled-average-calls-x10000=
> +Common Joined UInteger Var(param_max_unrolled_average_calls_x10000) Init(0) Param Optimization
> +The maximum number of calls to consider to unroll in a loop on average and multiply 10000.
> +
>  -param=max-unswitch-insns=
>  Common Joined UInteger Var(param_max_unswitch_insns) Init(50) Param Optimization
>  The maximum number of insns of an unswitched loop.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Check calls before loop unrolling
  2020-08-24 11:16   ` Jan Hubicka
  2020-08-25  2:26     ` Jiufu Guo
@ 2020-09-16  3:27     ` Jiufu Guo
  1 sibling, 0 replies; 17+ messages in thread
From: Jiufu Guo @ 2020-09-16  3:27 UTC (permalink / raw)
  To: Jan Hubicka
  Cc: Richard Biener, Bill Schmidt, GCC Patches, Segher Boessenkool,
	David Edelsohn

Hi all,

This patch sets the default value to 16 for parameter
max_unrolled_average_calls which could be used to restict calls in loop
when unrolling.  This default value(16) is a big number which keeps
current behavior for almost all cases.

Bootstrap and regtest pass on powerpc64le.  Is this ok for trunk?

Thanks for comments!

Jiufu Guo

gcc/ChangeLog
2020-09-16  Jiufu Guo   <guojiufu@cn.ibm.com>

	* params.opt (param_max_unrolled_average_calls_x10000): New param.
	* cfgloop.h (average_num_loop_calls): New declare.
	* cfgloopanal.c (average_num_loop_calls): New function.
	* loop-unroll.c (decide_unroll_constant_iteration,
	decide_unroll_runtime_iterations,
	decide_unroll_stupid): Check average_num_loop_calls and
	param_max_unrolled_average_calls_x10000.
---
 gcc/cfgloop.h     |  2 ++
 gcc/cfgloopanal.c | 25 +++++++++++++++++++++++++
 gcc/loop-unroll.c | 10 ++++++++++
 gcc/params.opt    |  4 ++++
 4 files changed, 41 insertions(+)

diff --git a/gcc/cfgloop.h b/gcc/cfgloop.h
index 18b404e292f..dab933da150 100644
--- a/gcc/cfgloop.h
+++ b/gcc/cfgloop.h
@@ -21,6 +21,7 @@ along with GCC; see the file COPYING3.  If not see
 #define GCC_CFGLOOP_H
 
 #include "cfgloopmanip.h"
+#include "sreal.h"
 
 /* Structure to hold decision about unrolling/peeling.  */
 enum lpt_dec
@@ -387,6 +388,7 @@ extern vec<edge> get_loop_exit_edges (const class loop *, basic_block * = NULL);
 extern edge single_exit (const class loop *);
 extern edge single_likely_exit (class loop *loop, vec<edge>);
 extern unsigned num_loop_branches (const class loop *);
+extern sreal average_num_loop_calls (const class loop *);
 
 extern edge loop_preheader_edge (const class loop *);
 extern edge loop_latch_edge (const class loop *);
diff --git a/gcc/cfgloopanal.c b/gcc/cfgloopanal.c
index 0b33e8272a7..a314db4e0c0 100644
--- a/gcc/cfgloopanal.c
+++ b/gcc/cfgloopanal.c
@@ -233,6 +233,31 @@ average_num_loop_insns (const class loop *loop)
   return ret;
 }
 
+/* Count the number of call insns in LOOP.  */
+sreal
+average_num_loop_calls (const class loop *loop)
+{
+  basic_block *bbs;
+  rtx_insn *insn;
+  unsigned int i, bncalls;
+  sreal ncalls = 0;
+
+  bbs = get_loop_body (loop);
+  for (i = 0; i < loop->num_nodes; i++)
+    {
+      bncalls = 0;
+      FOR_BB_INSNS (bbs[i], insn)
+	if (CALL_P (insn))
+	  bncalls++;
+
+      ncalls += (sreal) bncalls
+	* bbs[i]->count.to_sreal_scale (loop->header->count);
+    }
+  free (bbs);
+
+  return ncalls;
+}
+
 /* Returns expected number of iterations of LOOP, according to
    measured or guessed profile.
 
diff --git a/gcc/loop-unroll.c b/gcc/loop-unroll.c
index 693c7768868..56b8fb37d2a 100644
--- a/gcc/loop-unroll.c
+++ b/gcc/loop-unroll.c
@@ -370,6 +370,10 @@ decide_unroll_constant_iterations (class loop *loop, int flags)
     nunroll = nunroll_by_av;
   if (nunroll > (unsigned) param_max_unroll_times)
     nunroll = param_max_unroll_times;
+  if (!loop->unroll
+      && (average_num_loop_calls (loop) * (sreal) 10000).to_int ()
+	   > (unsigned) param_max_unrolled_average_calls_x10000)
+    nunroll = 0;
 
   if (targetm.loop_unroll_adjust)
     nunroll = targetm.loop_unroll_adjust (nunroll, loop);
@@ -689,6 +693,9 @@ decide_unroll_runtime_iterations (class loop *loop, int flags)
     nunroll = nunroll_by_av;
   if (nunroll > (unsigned) param_max_unroll_times)
     nunroll = param_max_unroll_times;
+  if ((average_num_loop_calls (loop) * (sreal) 10000).to_int ()
+      > (unsigned) param_max_unrolled_average_calls_x10000)
+    nunroll = 0;
 
   if (targetm.loop_unroll_adjust)
     nunroll = targetm.loop_unroll_adjust (nunroll, loop);
@@ -1173,6 +1180,9 @@ decide_unroll_stupid (class loop *loop, int flags)
     nunroll = nunroll_by_av;
   if (nunroll > (unsigned) param_max_unroll_times)
     nunroll = param_max_unroll_times;
+  if ((average_num_loop_calls (loop) * (sreal) 10000).to_int ()
+      > (unsigned) param_max_unrolled_average_calls_x10000)
+    nunroll = 0;
 
   if (targetm.loop_unroll_adjust)
     nunroll = targetm.loop_unroll_adjust (nunroll, loop);
diff --git a/gcc/params.opt b/gcc/params.opt
index f39e5d1a012..80605861223 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -634,6 +634,10 @@ The maximum number of unrollings of a single loop.
 Common Joined UInteger Var(param_max_unrolled_insns) Init(200) Param Optimization
 The maximum number of instructions to consider to unroll in a loop.
 
+-param=max-unrolled-average-calls-x10000=
+Common Joined UInteger Var(param_max_unrolled_average_calls_x10000) Init(160000) Param Optimization
+The maximum number of calls to consider to unroll in a loop on average and multiply 10000.
+
 -param=max-unswitch-insns=
 Common Joined UInteger Var(param_max_unswitch_insns) Init(50) Param Optimization
 The maximum number of insns of an unswitched loop.
-- 
2.25.1



Jan Hubicka <hubicka@ucw.cz> writes:

>> On Thu, Aug 20, 2020 at 6:35 AM guojiufu via Gcc-patches
>> <gcc-patches@gcc.gnu.org> wrote:
>> >
>> > Hi,
>> >
>> > When unroll loops, if there are calls inside the loop, those calls
>> > may raise negative impacts for unrolling.  This patch adds a param
>> > param_max_unrolled_calls, and checks if the number of calls inside
>> > the loop bigger than this param, loop is prevent from unrolling.
>> >
>> > This patch is checking the _average_ number of calls which is the
>> > summary of call numbers multiply the possibility of the call maybe
>> > executed.  The _average_ number could be a fraction, to keep the
>> > precision, the param is the threshold number multiply 10000.
>> >
>> > Bootstrap and regtest pass on powerpc64le.  Is this ok for trunk?
>> 
>> Can you try mimicking what try_unroll_loop_completely on GIMPLE does
>> instead?  IIRC the main motivation to not unroll calls is the spilling code
>> around it which we cannot estimate very well.  And that spilling happens
>> irrespective of whether the call is in a hot or cold path so I'm not sure
>> it makes sense to use the "average" number of calls here.
>
> As long as I remember, we excluded calls simply becuase it is/was an
> expensive intruction so it was an indication that the loop overhead is
> small compared to the overhead of loop body.
>
> Honza

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Check calls before loop unrolling
  2020-09-01  3:33 ` Jiufu Guo
@ 2020-11-19 19:13   ` Jeff Law
  2020-11-19 19:42     ` Segher Boessenkool
  0 siblings, 1 reply; 17+ messages in thread
From: Jeff Law @ 2020-11-19 19:13 UTC (permalink / raw)
  To: Jiufu Guo, gcc-patches; +Cc: wschmidt, dje.gcc, segher, hubicka



On 8/31/20 9:33 PM, Jiufu Guo via Gcc-patches wrote:
> guojiufu <guojiufu@linux.ibm.com> writes:
>
> Hi,
>
> In this patch, the default value of
> param=max-unrolled-average-calls-x10000 is '0', which means to unroll
> a loop, there should be no call inside the body.  Do I need to set the
> default value to a bigger value (16?) for later tune?  Biger value will
> keep the behavior unchanged.
>
> And is this patch ok for trunk?  Thanks a lot for you comments!
>
> BR.
> Jiufu.
>
>
>> Hi,
>>
>> When unroll loops, if there are calls inside the loop, those calls
>> may raise negative impacts for unrolling.  This patch adds a param
>> param_max_unrolled_calls, and checks if the number of calls inside
>> the loop bigger than this param, loop is prevent from unrolling.
>>
>> This patch is checking the _average_ number of calls which is the
>> summary of call numbers multiply the possibility of the call maybe
>> executed.  The _average_ number could be a fraction, to keep the
>> precision, the param is the threshold number multiply 10000.
>>
>> Bootstrap and regtest pass on powerpc64le.  Is this ok for trunk?
>>
>> gcc/ChangeLog
>> 2020-08-19  Jiufu Guo   <guojiufu@cn.ibm.com>
>>
>> 	* params.opt (param_max_unrolled_average_calls_x10000): New param.
>> 	* cfgloop.h (average_num_loop_calls): New declare.
>> 	* cfgloopanal.c (average_num_loop_calls): New function.
>> 	* loop-unroll.c (decide_unroll_constant_iteration,
>> 	decide_unroll_runtime_iterations,
>> 	decide_unroll_stupid): Check average_num_loop_calls and
>> 	param_max_unrolled_average_calls_x10000.
So what's the motivation behind adding a PARAM to control this
behavior?  I'm not a big fan of exposing a lot of PARAMs for users to
tune behavior (though I've made the same lapse in judgment myself).  In
my mind a PARAM is really more about controlling pathological behavior.

jeff


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Check calls before loop unrolling
  2020-11-19 19:13   ` Jeff Law
@ 2020-11-19 19:42     ` Segher Boessenkool
  2020-11-19 19:53       ` Jeff Law
  0 siblings, 1 reply; 17+ messages in thread
From: Segher Boessenkool @ 2020-11-19 19:42 UTC (permalink / raw)
  To: Jeff Law; +Cc: Jiufu Guo, gcc-patches, wschmidt, dje.gcc, hubicka

On Thu, Nov 19, 2020 at 12:13:34PM -0700, Jeff Law wrote:
> On 8/31/20 9:33 PM, Jiufu Guo via Gcc-patches wrote:
> > guojiufu <guojiufu@linux.ibm.com> writes:
> >> When unroll loops, if there are calls inside the loop, those calls
> >> may raise negative impacts for unrolling.  This patch adds a param
> >> param_max_unrolled_calls, and checks if the number of calls inside
> >> the loop bigger than this param, loop is prevent from unrolling.
> >>
> >> This patch is checking the _average_ number of calls which is the
> >> summary of call numbers multiply the possibility of the call maybe
> >> executed.  The _average_ number could be a fraction, to keep the
> >> precision, the param is the threshold number multiply 10000.
> >>
> >> Bootstrap and regtest pass on powerpc64le.  Is this ok for trunk?
> >>
> >> gcc/ChangeLog
> >> 2020-08-19  Jiufu Guo   <guojiufu@cn.ibm.com>
> >>
> >> 	* params.opt (param_max_unrolled_average_calls_x10000): New param.
> >> 	* cfgloop.h (average_num_loop_calls): New declare.
> >> 	* cfgloopanal.c (average_num_loop_calls): New function.
> >> 	* loop-unroll.c (decide_unroll_constant_iteration,
> >> 	decide_unroll_runtime_iterations,
> >> 	decide_unroll_stupid): Check average_num_loop_calls and
> >> 	param_max_unrolled_average_calls_x10000.
> So what's the motivation behind adding a PARAM to control this
> behavior?  I'm not a big fan of exposing a lot of PARAMs for users to
> tune behavior (though I've made the same lapse in judgment myself).  In
> my mind a PARAM is really more about controlling pathological behavior.

But we (Power) need very different tuning than what others apparently
need.  It is similar to inlining, in that that also differs a lot
between archs how aggressively to do that optimally.


Segher

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Check calls before loop unrolling
  2020-11-19 19:42     ` Segher Boessenkool
@ 2020-11-19 19:53       ` Jeff Law
  2020-11-19 20:01         ` Segher Boessenkool
  0 siblings, 1 reply; 17+ messages in thread
From: Jeff Law @ 2020-11-19 19:53 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: Jiufu Guo, gcc-patches, wschmidt, dje.gcc, hubicka



On 11/19/20 12:42 PM, Segher Boessenkool wrote:
> On Thu, Nov 19, 2020 at 12:13:34PM -0700, Jeff Law wrote:
>> On 8/31/20 9:33 PM, Jiufu Guo via Gcc-patches wrote:
>>> guojiufu <guojiufu@linux.ibm.com> writes:
>>>> When unroll loops, if there are calls inside the loop, those calls
>>>> may raise negative impacts for unrolling.  This patch adds a param
>>>> param_max_unrolled_calls, and checks if the number of calls inside
>>>> the loop bigger than this param, loop is prevent from unrolling.
>>>>
>>>> This patch is checking the _average_ number of calls which is the
>>>> summary of call numbers multiply the possibility of the call maybe
>>>> executed.  The _average_ number could be a fraction, to keep the
>>>> precision, the param is the threshold number multiply 10000.
>>>>
>>>> Bootstrap and regtest pass on powerpc64le.  Is this ok for trunk?
>>>>
>>>> gcc/ChangeLog
>>>> 2020-08-19  Jiufu Guo   <guojiufu@cn.ibm.com>
>>>>
>>>> 	* params.opt (param_max_unrolled_average_calls_x10000): New param.
>>>> 	* cfgloop.h (average_num_loop_calls): New declare.
>>>> 	* cfgloopanal.c (average_num_loop_calls): New function.
>>>> 	* loop-unroll.c (decide_unroll_constant_iteration,
>>>> 	decide_unroll_runtime_iterations,
>>>> 	decide_unroll_stupid): Check average_num_loop_calls and
>>>> 	param_max_unrolled_average_calls_x10000.
>> So what's the motivation behind adding a PARAM to control this
>> behavior?  I'm not a big fan of exposing a lot of PARAMs for users to
>> tune behavior (though I've made the same lapse in judgment myself).  In
>> my mind a PARAM is really more about controlling pathological behavior.
> But we (Power) need very different tuning than what others apparently
> need.  It is similar to inlining, in that that also differs a lot
> between archs how aggressively to do that optimally.
But what I think that argues is that we've got a gap in the costing
model and/or how its being used.  Throwing PARAMS at the problem isn't
really useful for the end user.  The vast majority aren't going to use
them and of the ones that do, most are probably going to get it wrong.

In  my mind fixing things so they work with no magic arguments is best. 
PARAMS are the worst solution.  A -f flag with no arguments is somewhere
in between.  Others may clearly have different opinions here.


jeff


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Check calls before loop unrolling
  2020-11-19 19:53       ` Jeff Law
@ 2020-11-19 20:01         ` Segher Boessenkool
  2020-11-19 22:30           ` Jeff Law
  0 siblings, 1 reply; 17+ messages in thread
From: Segher Boessenkool @ 2020-11-19 20:01 UTC (permalink / raw)
  To: Jeff Law; +Cc: Jiufu Guo, gcc-patches, wschmidt, dje.gcc, hubicka

On Thu, Nov 19, 2020 at 12:53:27PM -0700, Jeff Law wrote:
> On 11/19/20 12:42 PM, Segher Boessenkool wrote:
> > On Thu, Nov 19, 2020 at 12:13:34PM -0700, Jeff Law wrote:
> >> On 8/31/20 9:33 PM, Jiufu Guo via Gcc-patches wrote:
> >>> guojiufu <guojiufu@linux.ibm.com> writes:
> >>>> When unroll loops, if there are calls inside the loop, those calls
> >>>> may raise negative impacts for unrolling.  This patch adds a param
> >>>> param_max_unrolled_calls, and checks if the number of calls inside
> >>>> the loop bigger than this param, loop is prevent from unrolling.
> >>>>
> >>>> This patch is checking the _average_ number of calls which is the
> >>>> summary of call numbers multiply the possibility of the call maybe
> >>>> executed.  The _average_ number could be a fraction, to keep the
> >>>> precision, the param is the threshold number multiply 10000.
> >>>>
> >>>> Bootstrap and regtest pass on powerpc64le.  Is this ok for trunk?
> >>>>
> >>>> gcc/ChangeLog
> >>>> 2020-08-19  Jiufu Guo   <guojiufu@cn.ibm.com>
> >>>>
> >>>> 	* params.opt (param_max_unrolled_average_calls_x10000): New param.
> >>>> 	* cfgloop.h (average_num_loop_calls): New declare.
> >>>> 	* cfgloopanal.c (average_num_loop_calls): New function.
> >>>> 	* loop-unroll.c (decide_unroll_constant_iteration,
> >>>> 	decide_unroll_runtime_iterations,
> >>>> 	decide_unroll_stupid): Check average_num_loop_calls and
> >>>> 	param_max_unrolled_average_calls_x10000.
> >> So what's the motivation behind adding a PARAM to control this
> >> behavior?  I'm not a big fan of exposing a lot of PARAMs for users to
> >> tune behavior (though I've made the same lapse in judgment myself).  In
> >> my mind a PARAM is really more about controlling pathological behavior.
> > But we (Power) need very different tuning than what others apparently
> > need.  It is similar to inlining, in that that also differs a lot
> > between archs how aggressively to do that optimally.
> But what I think that argues is that we've got a gap in the costing
> model and/or how its being used.  Throwing PARAMS at the problem isn't
> really useful for the end user.  The vast majority aren't going to use
> them and of the ones that do, most are probably going to get it wrong.

No, the vast majority of people will *not* (consciously) use them,
because the target defaults will set things to useful values.

The compiler could use saner "generic" defaults perhaps, but those will
still not be satisfactory for anyone (except when they aren't generic in
fact but instead tuned for one arch ;-) ) -- unrolling is just too
important for performance.

> In  my mind fixing things so they work with no magic arguments is best. 
> PARAMS are the worst solution.  A -f flag with no arguments is somewhere
> in between.  Others may clearly have different opinions here.

There is no big difference between params and flags here, IMO -- it has
to be a -f with a value as well, for good results.

Since we have (almost) all such tunings in --param already, I'd say this
one belongs there as well?


Segher

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Check calls before loop unrolling
  2020-11-19 20:01         ` Segher Boessenkool
@ 2020-11-19 22:30           ` Jeff Law
  2020-11-19 23:56             ` Segher Boessenkool
  0 siblings, 1 reply; 17+ messages in thread
From: Jeff Law @ 2020-11-19 22:30 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: Jiufu Guo, gcc-patches, wschmidt, dje.gcc, hubicka



On 11/19/20 1:01 PM, Segher Boessenkool wrote:
> On Thu, Nov 19, 2020 at 12:53:27PM -0700, Jeff Law wrote:
>> On 11/19/20 12:42 PM, Segher Boessenkool wrote:
>>> On Thu, Nov 19, 2020 at 12:13:34PM -0700, Jeff Law wrote:
>>>> On 8/31/20 9:33 PM, Jiufu Guo via Gcc-patches wrote:
>>>>> guojiufu <guojiufu@linux.ibm.com> writes:
>>>>>> When unroll loops, if there are calls inside the loop, those calls
>>>>>> may raise negative impacts for unrolling.  This patch adds a param
>>>>>> param_max_unrolled_calls, and checks if the number of calls inside
>>>>>> the loop bigger than this param, loop is prevent from unrolling.
>>>>>>
>>>>>> This patch is checking the _average_ number of calls which is the
>>>>>> summary of call numbers multiply the possibility of the call maybe
>>>>>> executed.  The _average_ number could be a fraction, to keep the
>>>>>> precision, the param is the threshold number multiply 10000.
>>>>>>
>>>>>> Bootstrap and regtest pass on powerpc64le.  Is this ok for trunk?
>>>>>>
>>>>>> gcc/ChangeLog
>>>>>> 2020-08-19  Jiufu Guo   <guojiufu@cn.ibm.com>
>>>>>>
>>>>>> 	* params.opt (param_max_unrolled_average_calls_x10000): New param.
>>>>>> 	* cfgloop.h (average_num_loop_calls): New declare.
>>>>>> 	* cfgloopanal.c (average_num_loop_calls): New function.
>>>>>> 	* loop-unroll.c (decide_unroll_constant_iteration,
>>>>>> 	decide_unroll_runtime_iterations,
>>>>>> 	decide_unroll_stupid): Check average_num_loop_calls and
>>>>>> 	param_max_unrolled_average_calls_x10000.
>>>> So what's the motivation behind adding a PARAM to control this
>>>> behavior?  I'm not a big fan of exposing a lot of PARAMs for users to
>>>> tune behavior (though I've made the same lapse in judgment myself).  In
>>>> my mind a PARAM is really more about controlling pathological behavior.
>>> But we (Power) need very different tuning than what others apparently
>>> need.  It is similar to inlining, in that that also differs a lot
>>> between archs how aggressively to do that optimally.
>> But what I think that argues is that we've got a gap in the costing
>> model and/or how its being used.  Throwing PARAMS at the problem isn't
>> really useful for the end user.  The vast majority aren't going to use
>> them and of the ones that do, most are probably going to get it wrong.
> No, the vast majority of people will *not* (consciously) use them,
> because the target defaults will set things to useful values.
>
> The compiler could use saner "generic" defaults perhaps, but those will
> still not be satisfactory for anyone (except when they aren't generic in
> fact but instead tuned for one arch ;-) ) -- unrolling is just too
> important for performance.
Then fix the heuristics, don't add new PARAMS :-)

It didn't even occur to me until now that you may be pushing to have the
ppc backend have different values for the PARAMS.  I would strongly
discourage that.  It's been a huge headache in the s390 backend already.

>
>> In  my mind fixing things so they work with no magic arguments is best. 
>> PARAMS are the worst solution.  A -f flag with no arguments is somewhere
>> in between.  Others may clearly have different opinions here.
> There is no big difference between params and flags here, IMO -- it has
> to be a -f with a value as well, for good results.
Which is a signal that we have a deeper problem.  -f with a value is no
different than a param.

>
> Since we have (almost) all such tunings in --param already, I'd say this
> one belongs there as well?
I'm not convinced at this point. 

jeff


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Check calls before loop unrolling
  2020-11-19 22:30           ` Jeff Law
@ 2020-11-19 23:56             ` Segher Boessenkool
  2020-11-20  7:48               ` Richard Biener
  2020-11-20 15:22               ` Jan Hubicka
  0 siblings, 2 replies; 17+ messages in thread
From: Segher Boessenkool @ 2020-11-19 23:56 UTC (permalink / raw)
  To: Jeff Law; +Cc: Jiufu Guo, gcc-patches, wschmidt, dje.gcc, hubicka

On Thu, Nov 19, 2020 at 03:30:37PM -0700, Jeff Law wrote:
> > No, the vast majority of people will *not* (consciously) use them,
> > because the target defaults will set things to useful values.
> >
> > The compiler could use saner "generic" defaults perhaps, but those will
> > still not be satisfactory for anyone (except when they aren't generic in
> > fact but instead tuned for one arch ;-) ) -- unrolling is just too
> > important for performance.
> Then fix the heuristics, don't add new PARAMS :-)

I just said that cannot work?

> It didn't even occur to me until now that you may be pushing to have the
> ppc backend have different values for the PARAMS.  I would strongly
> discourage that.  It's been a huge headache in the s390 backend already.

It also makes a huge performance difference.  That the generic parts
of GCC are only tuned for x86 (or not well tuned for anything?) is a
huge roadblock for us.

I am not saying we should have six hundred different tunings.  But we
need a few (and we already *have* a few, not params but generic flags,
just like many other targets fwiw).

We *do* have a few custom param settings already, just like aarch64,
ia64, and sh, actually.

> >> In  my mind fixing things so they work with no magic arguments is best. 
> >> PARAMS are the worst solution.  A -f flag with no arguments is somewhere
> >> in between.  Others may clearly have different opinions here.
> > There is no big difference between params and flags here, IMO -- it has
> > to be a -f with a value as well, for good results.
> Which is a signal that we have a deeper problem.  -f with a value is no
> different than a param.

Yes exactly.

> > Since we have (almost) all such tunings in --param already, I'd say this
> > one belongs there as well?
> I'm not convinced at this point. 

Why not?

We have way many params, yes.  But the first step to counteract that
would be to deprecate and get rid of many existing ones, not to block
having new ones which can be useful (while many of the existing ones are
not).

Or, we could accept that it is not really a problem at all.  You seem to
have a strong opinion that it *is*, but I don't understand that; maybe
you can explain a bit more?

Thanks,

Segher

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Check calls before loop unrolling
  2020-11-19 23:56             ` Segher Boessenkool
@ 2020-11-20  7:48               ` Richard Biener
  2020-11-20 14:58                 ` David Edelsohn
  2020-11-20 15:22               ` Jan Hubicka
  1 sibling, 1 reply; 17+ messages in thread
From: Richard Biener @ 2020-11-20  7:48 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Jeff Law, Bill Schmidt, GCC Patches, Jan Hubicka, David Edelsohn

On Fri, Nov 20, 2020 at 12:58 AM Segher Boessenkool
<segher@kernel.crashing.org> wrote:
>
> On Thu, Nov 19, 2020 at 03:30:37PM -0700, Jeff Law wrote:
> > > No, the vast majority of people will *not* (consciously) use them,
> > > because the target defaults will set things to useful values.
> > >
> > > The compiler could use saner "generic" defaults perhaps, but those will
> > > still not be satisfactory for anyone (except when they aren't generic in
> > > fact but instead tuned for one arch ;-) ) -- unrolling is just too
> > > important for performance.
> > Then fix the heuristics, don't add new PARAMS :-)
>
> I just said that cannot work?
>
> > It didn't even occur to me until now that you may be pushing to have the
> > ppc backend have different values for the PARAMS.  I would strongly
> > discourage that.  It's been a huge headache in the s390 backend already.
>
> It also makes a huge performance difference.  That the generic parts
> of GCC are only tuned for x86 (or not well tuned for anything?) is a
> huge roadblock for us.
>
> I am not saying we should have six hundred different tunings.  But we
> need a few (and we already *have* a few, not params but generic flags,
> just like many other targets fwiw).
>
> We *do* have a few custom param settings already, just like aarch64,
> ia64, and sh, actually.
>
> > >> In  my mind fixing things so they work with no magic arguments is best.
> > >> PARAMS are the worst solution.  A -f flag with no arguments is somewhere
> > >> in between.  Others may clearly have different opinions here.
> > > There is no big difference between params and flags here, IMO -- it has
> > > to be a -f with a value as well, for good results.
> > Which is a signal that we have a deeper problem.  -f with a value is no
> > different than a param.
>
> Yes exactly.
>
> > > Since we have (almost) all such tunings in --param already, I'd say this
> > > one belongs there as well?
> > I'm not convinced at this point.
>
> Why not?
>
> We have way many params, yes.

--params were introduced to avoid "magic numbers" in code and at the
same time not overwhelm users with many -f options.  That they are
runtime-controllable was probably done because we could and because
it's nice for GCC developers.

>  But the first step to counteract that
> would be to deprecate and get rid of many existing ones, not to block
> having new ones which can be useful (while many of the existing ones are
> not).

Not sure about this - sure, if heuristic can be simplified to use N < M
(previous) "magic" numbers that's better.  But if "deprecating" just
involves pasting the current --param default literally into the heuristcs
then no, please not.

For this particular patch the question is if the heuristic is sound,
not the particular magic number.  And I have no opinion about this
(being this is the RTL unroller).

Richard.

>
> Or, we could accept that it is not really a problem at all.  You seem to
> have a strong opinion that it *is*, but I don't understand that; maybe
> you can explain a bit more?
>
> Thanks,
>
>
> Segher

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Check calls before loop unrolling
  2020-11-20  7:48               ` Richard Biener
@ 2020-11-20 14:58                 ` David Edelsohn
  0 siblings, 0 replies; 17+ messages in thread
From: David Edelsohn @ 2020-11-20 14:58 UTC (permalink / raw)
  To: Richard Biener, Segher Boessenkool, Jeff Law
  Cc: Bill Schmidt, GCC Patches, Jan Hubicka

On Fri, Nov 20, 2020 at 2:48 AM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Fri, Nov 20, 2020 at 12:58 AM Segher Boessenkool
> <segher@kernel.crashing.org> wrote:
> >
> > On Thu, Nov 19, 2020 at 03:30:37PM -0700, Jeff Law wrote:
> > > > No, the vast majority of people will *not* (consciously) use them,
> > > > because the target defaults will set things to useful values.
> > > >
> > > > The compiler could use saner "generic" defaults perhaps, but those will
> > > > still not be satisfactory for anyone (except when they aren't generic in
> > > > fact but instead tuned for one arch ;-) ) -- unrolling is just too
> > > > important for performance.
> > > Then fix the heuristics, don't add new PARAMS :-)
> >
> > I just said that cannot work?
> >
> > > It didn't even occur to me until now that you may be pushing to have the
> > > ppc backend have different values for the PARAMS.  I would strongly
> > > discourage that.  It's been a huge headache in the s390 backend already.
> >
> > It also makes a huge performance difference.  That the generic parts
> > of GCC are only tuned for x86 (or not well tuned for anything?) is a
> > huge roadblock for us.
> >
> > I am not saying we should have six hundred different tunings.  But we
> > need a few (and we already *have* a few, not params but generic flags,
> > just like many other targets fwiw).
> >
> > We *do* have a few custom param settings already, just like aarch64,
> > ia64, and sh, actually.
> >
> > > >> In  my mind fixing things so they work with no magic arguments is best.
> > > >> PARAMS are the worst solution.  A -f flag with no arguments is somewhere
> > > >> in between.  Others may clearly have different opinions here.
> > > > There is no big difference between params and flags here, IMO -- it has
> > > > to be a -f with a value as well, for good results.
> > > Which is a signal that we have a deeper problem.  -f with a value is no
> > > different than a param.
> >
> > Yes exactly.
> >
> > > > Since we have (almost) all such tunings in --param already, I'd say this
> > > > one belongs there as well?
> > > I'm not convinced at this point.
> >
> > Why not?
> >
> > We have way many params, yes.
>
> --params were introduced to avoid "magic numbers" in code and at the
> same time not overwhelm users with many -f options.  That they are
> runtime-controllable was probably done because we could and because
> it's nice for GCC developers.

GCC historically has not done a good job at loop unrolling.  And
tuning of loop unrolling is inherently architecture- and
microarchitecture-specific.  There is no "better heuristic".  There is
nothing inherent in the instruction stream to determine an optimal
unrolling that is correct for x86 and AArch64 and RISC-V and s390x and
Power.

Based on what academic literature or experience of other compilers
have you determined that this limitation can be addressed with "fix
the heuristics"?

The patch *IS* trying to fix the heuristics.  The heuristics require
additional, processor-specific information.  And a parameter is the
natural mechanism in GCC to provide a numerical value to adjust a
heuristic.

As Richard wrote, the GCC community chose to collect the "magic
numbers" in a centralized table with a consistent interface that can
be overridden by individual ports.  The ability to override the
parameters on the command line was for convenience and ease of
development.  It's not meant as a value that any end-user normally
will adjust.  But there is no reason to arbitrarily start a campaign
against parameters.

GCC supports a large number of targets, and that requires the ability
to adjust the optimization and transformation behavior of the compiler
to achieve the best performance on a wide variety of processors.  We
would appreciate it if you would not block a patch that improves the
code generation of GCC on Power (and other targets) because of an
aesthetic concern about too many parameters in GCC.  That ship has
sailed.

Thanks, David

>
> >  But the first step to counteract that
> > would be to deprecate and get rid of many existing ones, not to block
> > having new ones which can be useful (while many of the existing ones are
> > not).
>
> Not sure about this - sure, if heuristic can be simplified to use N < M
> (previous) "magic" numbers that's better.  But if "deprecating" just
> involves pasting the current --param default literally into the heuristcs
> then no, please not.
>
> For this particular patch the question is if the heuristic is sound,
> not the particular magic number.  And I have no opinion about this
> (being this is the RTL unroller).
>
> Richard.
>
> >
> > Or, we could accept that it is not really a problem at all.  You seem to
> > have a strong opinion that it *is*, but I don't understand that; maybe
> > you can explain a bit more?
> >
> > Thanks,
> >
> >
> > Segher

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Check calls before loop unrolling
  2020-11-19 23:56             ` Segher Boessenkool
  2020-11-20  7:48               ` Richard Biener
@ 2020-11-20 15:22               ` Jan Hubicka
  2020-11-20 18:09                 ` Segher Boessenkool
  1 sibling, 1 reply; 17+ messages in thread
From: Jan Hubicka @ 2020-11-20 15:22 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: Jeff Law, wschmidt, gcc-patches, dje.gcc

> On Thu, Nov 19, 2020 at 03:30:37PM -0700, Jeff Law wrote:
> > > No, the vast majority of people will *not* (consciously) use them,
> > > because the target defaults will set things to useful values.
> > >
> > > The compiler could use saner "generic" defaults perhaps, but those will
> > > still not be satisfactory for anyone (except when they aren't generic in
> > > fact but instead tuned for one arch ;-) ) -- unrolling is just too
> > > important for performance.
> > Then fix the heuristics, don't add new PARAMS :-)
> 
> I just said that cannot work?
> 
> > It didn't even occur to me until now that you may be pushing to have the
> > ppc backend have different values for the PARAMS.  I would strongly
> > discourage that.  It's been a huge headache in the s390 backend already.
> 
> It also makes a huge performance difference.  That the generic parts
> of GCC are only tuned for x86 (or not well tuned for anything?) is a
> huge roadblock for us.

As you know I spend quite some time on inliner heuristics but even after
the years I have no clear idea how the requirements differs from x86-64
to ppc, arm and s390.  Clearly compared to x86_64 prologues may get more
expensive on ppc/arm because of more registers (so we should inline less
to cold code) and function calls are more expensive (so we sould inline
more to hot code). We do have PR for that in testusite where most of
them I looked through.

Problem is that each of us has different metodology - different
bechmarks to look at and different opinions on what is good for O2 and
O3.  From long term maintenace POV I am worried about changing a lot of
--param defaults in different backends simply becuase the meaning of
those values keeps changing (as early opts improve; we get better on
tracking optimizations during IPA passes; and our focus shift from C
with sane inlines to basic C++ to heavy templatized C++ with many broken
inline hints to heavy C++ with lto).

For this reason I tend to preffer to not tweak in taret specific ways
unless there is very clear evidence to do so just because I think I will
not be able to maintain code quality testing in future.

It would be very interesting to set up testing that could let us compare
basic arches side to side to different defaults. Our LNT testing does
good job for x86-64 but we have basically zero coverage publically
available on other targets and it is very hard to get inliner relevant
banchmarks (where SPEC is not the best choice) done in comparable way on
multiple arches.

Honza

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Check calls before loop unrolling
  2020-11-20 15:22               ` Jan Hubicka
@ 2020-11-20 18:09                 ` Segher Boessenkool
  2020-11-23  8:42                   ` Richard Biener
  0 siblings, 1 reply; 17+ messages in thread
From: Segher Boessenkool @ 2020-11-20 18:09 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Jeff Law, wschmidt, gcc-patches, dje.gcc

Hi!

On Fri, Nov 20, 2020 at 04:22:47PM +0100, Jan Hubicka wrote:
> As you know I spend quite some time on inliner heuristics but even after
> the years I have no clear idea how the requirements differs from x86-64
> to ppc, arm and s390.  Clearly compared to x86_64 prologues may get more
> expensive on ppc/arm because of more registers (so we should inline less
> to cold code) and function calls are more expensive (so we sould inline
> more to hot code). We do have PR for that in testusite where most of
> them I looked through.

I made -fshrink-wrap-separate to make prologues less expensive for stuff
that is only used on the cold paths.  This matters a lot -- and much
more could be done there, but that requires changing the generated code,
not just reordering it, so it is harder to do.

Prologues (and epilogues) are only expensive if they are only needed for
cold code, in a hot function.

> Problem is that each of us has different metodology - different
> bechmarks to look at

This is a good thing often as well, it increases our total coverage.
But if not everything sees all results that also hurts :-/

> and different opinions on what is good for O2 and
> O3.

Yeah.  The documentation for -O3 merely says "Optimize yet more.", but
that is no guidance at all: why would a user ever use -O2 then?

I always understood it as "-O2 is always faster than -O1, but -O3 is not
always faster than -O2".  Aka "-O2 is always a good choice, and -O3 is a
an even better choice for *some* code, but that needs testing per case".

In at least that understanding, and also to battle inflation in general,
we probably should move some things from -O3 to -O2.

> From long term maintenace POV I am worried about changing a lot of
> --param defaults in different backends

Me too.  But changing a few key ones is just too important for
performance :-/

> simply becuase the meaning of
> those values keeps changing (as early opts improve; we get better on
> tracking optimizations during IPA passes; and our focus shift from C
> with sane inlines to basic C++ to heavy templatized C++ with many broken
> inline hints to heavy C++ with lto).

I don't like if targets start to differ too much (in what generic passes
effectively do), no matter what.  It's just not maintainable.

> For this reason I tend to preffer to not tweak in taret specific ways
> unless there is very clear evidence to do so just because I think I will
> not be able to maintain code quality testing in future.

Yes, completely agreed.  But that exception is important :-)

> It would be very interesting to set up testing that could let us compare
> basic arches side to side to different defaults. Our LNT testing does
> good job for x86-64 but we have basically zero coverage publically
> available on other targets and it is very hard to get inliner relevant
> banchmarks (where SPEC is not the best choice) done in comparable way on
> multiple arches.

We cannot help with that on the cfarm, unless we get dedicated hardware
for such benchmarking (and I am not holding my breath for that, getting
good coverage at all is hard enough).  So you probably need to get such
support for every arch separately, elsewhere :-/

Segher

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Check calls before loop unrolling
  2020-11-20 18:09                 ` Segher Boessenkool
@ 2020-11-23  8:42                   ` Richard Biener
  0 siblings, 0 replies; 17+ messages in thread
From: Richard Biener @ 2020-11-23  8:42 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: Jan Hubicka, GCC Patches, Bill Schmidt, David Edelsohn

On Fri, Nov 20, 2020 at 7:11 PM Segher Boessenkool
<segher@kernel.crashing.org> wrote:
>
> Hi!
>
> On Fri, Nov 20, 2020 at 04:22:47PM +0100, Jan Hubicka wrote:
> > As you know I spend quite some time on inliner heuristics but even after
> > the years I have no clear idea how the requirements differs from x86-64
> > to ppc, arm and s390.  Clearly compared to x86_64 prologues may get more
> > expensive on ppc/arm because of more registers (so we should inline less
> > to cold code) and function calls are more expensive (so we sould inline
> > more to hot code). We do have PR for that in testusite where most of
> > them I looked through.
>
> I made -fshrink-wrap-separate to make prologues less expensive for stuff
> that is only used on the cold paths.  This matters a lot -- and much
> more could be done there, but that requires changing the generated code,
> not just reordering it, so it is harder to do.
>
> Prologues (and epilogues) are only expensive if they are only needed for
> cold code, in a hot function.
>
> > Problem is that each of us has different metodology - different
> > bechmarks to look at
>
> This is a good thing often as well, it increases our total coverage.
> But if not everything sees all results that also hurts :-/
>
> > and different opinions on what is good for O2 and
> > O3.
>
> Yeah.  The documentation for -O3 merely says "Optimize yet more.", but
> that is no guidance at all: why would a user ever use -O2 then?
>
> I always understood it as "-O2 is always faster than -O1, but -O3 is not
> always faster than -O2".  Aka "-O2 is always a good choice, and -O3 is a
> an even better choice for *some* code, but that needs testing per case".

So basically -O2 is supposed to be well-balanced in compile-time, code-size,
performance and debuggability (if there's sth like that with optimized code...).

-O1 is what you should use for machine-generated code, we kind-of promise
to have no quadratic or worse algorithms in compile-time/memory-use here
so you can throw a multi-gigabyte source function at GCC and it should not
blow up.  And -O1 still optimizes.

-Os is when you want small code size at all cost (compile-time, less
performance).

-O3 is when you want performance at all cost (compile-time + code size and
the ability to debug)

So I'd always use -O2 unless doing a compute workload where I'd chose
-O3 (maybe selective for the relevant TUs).

Then there's profile-feedback and LTO which are enable them if you can
(I'd avoid them for code you need -O1 for).  It really helps GCC to make
an appropriate decision what code to optimize more/less for the goal
of the balanced profile which -O2 is.

> In at least that understanding, and also to battle inflation in general,
> we probably should move some things from -O3 to -O2.
>
> > From long term maintenace POV I am worried about changing a lot of
> > --param defaults in different backends
>
> Me too.  But changing a few key ones is just too important for
> performance :-/
>
> > simply becuase the meaning of
> > those values keeps changing (as early opts improve; we get better on
> > tracking optimizations during IPA passes; and our focus shift from C
> > with sane inlines to basic C++ to heavy templatized C++ with many broken
> > inline hints to heavy C++ with lto).
>
> I don't like if targets start to differ too much (in what generic passes
> effectively do), no matter what.  It's just not maintainable.
>
> > For this reason I tend to preffer to not tweak in taret specific ways
> > unless there is very clear evidence to do so just because I think I will
> > not be able to maintain code quality testing in future.
>
> Yes, completely agreed.  But that exception is important :-)
>
> > It would be very interesting to set up testing that could let us compare
> > basic arches side to side to different defaults. Our LNT testing does
> > good job for x86-64 but we have basically zero coverage publically
> > available on other targets and it is very hard to get inliner relevant
> > banchmarks (where SPEC is not the best choice) done in comparable way on
> > multiple arches.
>
> We cannot help with that on the cfarm, unless we get dedicated hardware
> for such benchmarking (and I am not holding my breath for that, getting
> good coverage at all is hard enough).  So you probably need to get such
> support for every arch separately, elsewhere :-/
>
>
> Segher

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2020-11-23  8:42 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-20  4:34 [PATCH] Check calls before loop unrolling guojiufu
2020-08-24  9:16 ` Richard Biener
2020-08-24 11:16   ` Jan Hubicka
2020-08-25  2:26     ` Jiufu Guo
2020-09-16  3:27     ` Jiufu Guo
2020-09-01  3:33 ` Jiufu Guo
2020-11-19 19:13   ` Jeff Law
2020-11-19 19:42     ` Segher Boessenkool
2020-11-19 19:53       ` Jeff Law
2020-11-19 20:01         ` Segher Boessenkool
2020-11-19 22:30           ` Jeff Law
2020-11-19 23:56             ` Segher Boessenkool
2020-11-20  7:48               ` Richard Biener
2020-11-20 14:58                 ` David Edelsohn
2020-11-20 15:22               ` Jan Hubicka
2020-11-20 18:09                 ` Segher Boessenkool
2020-11-23  8:42                   ` Richard Biener

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).