public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH 1/2] Seperate -funroll-loops for GIMPLE unroller and RTL unroller
@ 2020-05-25 11:10 guojiufu
  2020-05-25 11:10 ` [PATCH 2/2] rs6000: Turn on -frtl-unroll-loops instead -funroll-loops at -O2 guojiufu
  2020-05-25 12:14 ` [PATCH 1/2] Seperate -funroll-loops for GIMPLE unroller and RTL unroller Richard Biener
  0 siblings, 2 replies; 11+ messages in thread
From: guojiufu @ 2020-05-25 11:10 UTC (permalink / raw)
  To: gcc-patches; +Cc: guojiufu, wschmidt, segher, dje.gcc, richard.guenther

Currently option -funroll-loops controls both GIMPLE unroler and
RTL unroller. It is not able to control GIMPLE cunroller and
RTL unroller independently.  This patch introducing different flags
to control them seperately, and this also provide more freedom to
tune one of them without affecting another.

This patch introduces two undocumented flags: -fcomplete-unroll-loops
for GIMPLE cunroll, and -frtl-unroll-loops for RTL unroller.  And
these two options are enabled by original -funroll-loops.

Bootstrap and regtest pass on powerpc64le, is this ok for trunk?

Jiufu

ChangeLog:
2020-05-25  Jiufu Guo   <guojiufu@cn.ibm.com>

	* common.opt: Add -frtl-unroll-loops and -fcomplete-unroll-loops.
	* opts.c (enable_fdo_optimizations): Replace flag_unroll_loops
	with flag_complete_unroll_loops.
	* toplev.c (process_options): set flag_rtl_unroll_loops and
	flag_complete_unroll_loops if not explicitly set by user.
	* tree-ssa-loop-ivcanon.c (pass_complete_unroll::execute): Replace
	flag_unroll_loops with flag_complete_unroll_loops.
	* loop-init.c (pass_loop2::gate): Replace flag_unroll_loops with
	flag_rtl_unroll_loops.
	(pass_rtl_unroll_loops::gate): Replace flag_unroll_loops with
	flag_rtl_unroll_loops.
---
 gcc/common.opt              | 8 ++++++++
 gcc/loop-init.c             | 6 +++---
 gcc/opts.c                  | 2 +-
 gcc/toplev.c                | 6 ++++++
 gcc/tree-ssa-loop-ivcanon.c | 2 +-
 5 files changed, 19 insertions(+), 5 deletions(-)

diff --git a/gcc/common.opt b/gcc/common.opt
index 4464049fc1f..3b5ab52bb9d 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2856,6 +2856,14 @@ funroll-all-loops
 Common Report Var(flag_unroll_all_loops) Optimization
 Perform loop unrolling for all loops.
 
+frtl-unroll-loops
+Common Undocumented Var(flag_rtl_unroll_loops) Init(2) Optimization
+; Perform rtl loop unrolling when iteration count is known.
+
+fcomplete-unroll-loops
+Common Undocumented Var(flag_complete_unroll_loops) Init(2) Optimization
+; Perform GIMPLE loop complete unrolling.
+
 ; Nonzero means that loop optimizer may assume that the induction variables
 ; that control loops do not overflow and that the loops with nontrivial
 ; exit condition are not infinite
diff --git a/gcc/loop-init.c b/gcc/loop-init.c
index 401e5282907..e955068f36c 100644
--- a/gcc/loop-init.c
+++ b/gcc/loop-init.c
@@ -360,7 +360,7 @@ pass_loop2::gate (function *fun)
   if (optimize > 0
       && (flag_move_loop_invariants
 	  || flag_unswitch_loops
-	  || flag_unroll_loops
+	  || flag_rtl_unroll_loops
 	  || (flag_branch_on_count_reg && targetm.have_doloop_end ())
 	  || cfun->has_unroll))
     return true;
@@ -560,7 +560,7 @@ public:
   /* opt_pass methods: */
   virtual bool gate (function *)
     {
-      return (flag_unroll_loops || flag_unroll_all_loops || cfun->has_unroll);
+      return (flag_rtl_unroll_loops || flag_unroll_all_loops || cfun->has_unroll);
     }
 
   virtual unsigned int execute (function *);
@@ -576,7 +576,7 @@ pass_rtl_unroll_loops::execute (function *fun)
       if (dump_file)
 	df_dump (dump_file);
 
-      if (flag_unroll_loops)
+      if (flag_rtl_unroll_loops)
 	flags |= UAP_UNROLL;
       if (flag_unroll_all_loops)
 	flags |= UAP_UNROLL_ALL;
diff --git a/gcc/opts.c b/gcc/opts.c
index ec3ca0720f9..64c35d8d7fc 100644
--- a/gcc/opts.c
+++ b/gcc/opts.c
@@ -1702,7 +1702,7 @@ enable_fdo_optimizations (struct gcc_options *opts,
 {
   SET_OPTION_IF_UNSET (opts, opts_set, flag_branch_probabilities, value);
   SET_OPTION_IF_UNSET (opts, opts_set, flag_profile_values, value);
-  SET_OPTION_IF_UNSET (opts, opts_set, flag_unroll_loops, value);
+  SET_OPTION_IF_UNSET (opts, opts_set, flag_complete_unroll_loops, value);
   SET_OPTION_IF_UNSET (opts, opts_set, flag_peel_loops, value);
   SET_OPTION_IF_UNSET (opts, opts_set, flag_tracer, value);
   SET_OPTION_IF_UNSET (opts, opts_set, flag_value_profile_transformations,
diff --git a/gcc/toplev.c b/gcc/toplev.c
index 96316fbd23b..c2b94d33464 100644
--- a/gcc/toplev.c
+++ b/gcc/toplev.c
@@ -1474,6 +1474,12 @@ process_options (void)
   if (flag_unroll_all_loops)
     flag_unroll_loops = 1;
 
+  if (flag_rtl_unroll_loops == AUTODETECT_VALUE)
+    flag_rtl_unroll_loops = flag_unroll_loops;
+
+  if (flag_complete_unroll_loops == AUTODETECT_VALUE)
+    flag_complete_unroll_loops = flag_unroll_loops;
+
   /* web and rename-registers help when run after loop unrolling.  */
   if (flag_web == AUTODETECT_VALUE)
     flag_web = flag_unroll_loops;
diff --git a/gcc/tree-ssa-loop-ivcanon.c b/gcc/tree-ssa-loop-ivcanon.c
index 8ab6ab3330c..cd5df353df5 100644
--- a/gcc/tree-ssa-loop-ivcanon.c
+++ b/gcc/tree-ssa-loop-ivcanon.c
@@ -1603,7 +1603,7 @@ pass_complete_unroll::execute (function *fun)
      re-peeling the same loop multiple times.  */
   if (flag_peel_loops)
     peeled_loops = BITMAP_ALLOC (NULL);
-  unsigned int val = tree_unroll_loops_completely (flag_unroll_loops
+  unsigned int val = tree_unroll_loops_completely (flag_complete_unroll_loops
 						   || flag_peel_loops
 						   || optimize >= 3, true);
   if (peeled_loops)
-- 
2.17.1


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 2/2] rs6000: Turn on -frtl-unroll-loops instead -funroll-loops at -O2
  2020-05-25 11:10 [PATCH 1/2] Seperate -funroll-loops for GIMPLE unroller and RTL unroller guojiufu
@ 2020-05-25 11:10 ` guojiufu
  2020-05-25 12:14 ` [PATCH 1/2] Seperate -funroll-loops for GIMPLE unroller and RTL unroller Richard Biener
  1 sibling, 0 replies; 11+ messages in thread
From: guojiufu @ 2020-05-25 11:10 UTC (permalink / raw)
  To: gcc-patches; +Cc: guojiufu, wschmidt, segher, dje.gcc, richard.guenther

Previously, turning -funroll-loops on at -O2, which also turn on
GIMPLE cunroll fully.  While cunroll unrolls some complex loops.

This patch turn on -frtl-unroll-loops at -O2 only, and continue to
use previous tuned rs6000 heurisitics for small loops.  While this
patch does not turn on GIMPLE cunroll any more.  We may tune
cunroll in near future at -O2.

In this patch, it become simpler to check/set -fweb, -frename-register
and -munroll-only-small-loops.  Together with -frtl-unroll-loops, -fweb
is useful, then turn -fweb on;  and -frename-registers is no need to
be checked, because it is affected by -frtl-unroll-loops.


Bootstrap and regtest pass on powerpc64le, is this ok for trunk?
And backport to GCC10 together with the patch "Seperate -funroll-loops
 for GIMPLE unroller and RTL unroller"

Jiufu


gcc/ChangeLog
2020-05-25  Jiufu Guo   <guojiufu@cn.ibm.com>

	PR target/95018
	* common/config/rs6000/rs6000-common.c
	(rs6000_option_optimization_table)
	[OPT_LEVELS_2_PLUS_SPEED_ONLY]: Replace -funroll-loops
	with -frtl-unroll-loops.  Remove -munroll-only-small-loops
	and add -fweb.
	[OPT_LEVELS_ALL]: Remove turn off -frename-registers.
	* config/rs6000/rs6000.c (rs6000_option_override_internal):
	-funroll-loops overrides -munroll-only-small-loops and
	-frtl-unroll-loops.
---
 gcc/common/config/rs6000/rs6000-common.c | 11 +++--------
 gcc/config/rs6000/rs6000.c               | 21 ++++++++++-----------
 2 files changed, 13 insertions(+), 19 deletions(-)

diff --git a/gcc/common/config/rs6000/rs6000-common.c b/gcc/common/config/rs6000/rs6000-common.c
index ee37b9dc90b..c7388edb867 100644
--- a/gcc/common/config/rs6000/rs6000-common.c
+++ b/gcc/common/config/rs6000/rs6000-common.c
@@ -34,14 +34,9 @@ static const struct default_options rs6000_option_optimization_table[] =
     { OPT_LEVELS_ALL, OPT_fsplit_wide_types_early, NULL, 1 },
     /* Enable -fsched-pressure for first pass instruction scheduling.  */
     { OPT_LEVELS_1_PLUS, OPT_fsched_pressure, NULL, 1 },
-    /* Enable -munroll-only-small-loops with -funroll-loops to unroll small
-       loops at -O2 and above by default.  */
-    { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_funroll_loops, NULL, 1 },
-    { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_munroll_only_small_loops, NULL, 1 },
-
-    /* -frename-registers leads to non-optimal codegen and performance
-       on rs6000, turn it off by default.  */
-    { OPT_LEVELS_ALL, OPT_frename_registers, NULL, 0 },
+    /* Enable -frtl-unroll-loops and -fweb at -O2 and above by default.  */
+    { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_frtl_unroll_loops, NULL, 1 },
+    { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_fweb, NULL, 1 },
 
     /* Double growth factor to counter reduced min jump length.  */
     { OPT_LEVELS_ALL, OPT__param_max_grow_copy_bb_insns_, NULL, 16 },
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 8435bc15d72..96620651a59 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -4557,17 +4557,16 @@ rs6000_option_override_internal (bool global_init_p)
 			   param_sched_pressure_algorithm,
 			   SCHED_PRESSURE_MODEL);
 
-      /* Explicit -funroll-loops turns -munroll-only-small-loops off, and
-	 turns -frename-registers on.  */
-      if ((global_options_set.x_flag_unroll_loops && flag_unroll_loops)
-	   || (global_options_set.x_flag_unroll_all_loops
-	       && flag_unroll_all_loops))
-	{
-	  if (!global_options_set.x_unroll_only_small_loops)
-	    unroll_only_small_loops = 0;
-	  if (!global_options_set.x_flag_rename_registers)
-	    flag_rename_registers = 1;
-	}
+      /* if -f[no-]unroll-loops is specified explicitly, turn [off/]on
+	 -frtl-unroll-loops.  */
+      if (global_options_set.x_flag_unroll_loops
+	  && !global_options_set.x_flag_rtl_unroll_loops)
+	flag_rtl_unroll_loops = flag_unroll_loops;
+	
+      /* If flag_unroll_loops is effect, not _only_ small loops, but
+	 large loops are unrolled if possible.  */
+      if (!global_options_set.x_unroll_only_small_loops)
+	unroll_only_small_loops = flag_unroll_loops ? 0 : 1;
 
       /* If using typedef char *va_list, signal that
 	 __builtin_va_start (&ap, 0) can be optimized to
-- 
2.17.1


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/2] Seperate -funroll-loops for GIMPLE unroller and RTL unroller
  2020-05-25 11:10 [PATCH 1/2] Seperate -funroll-loops for GIMPLE unroller and RTL unroller guojiufu
  2020-05-25 11:10 ` [PATCH 2/2] rs6000: Turn on -frtl-unroll-loops instead -funroll-loops at -O2 guojiufu
@ 2020-05-25 12:14 ` Richard Biener
  2020-05-25 17:40   ` Segher Boessenkool
  1 sibling, 1 reply; 11+ messages in thread
From: Richard Biener @ 2020-05-25 12:14 UTC (permalink / raw)
  To: guojiufu; +Cc: GCC Patches, Bill Schmidt, Segher Boessenkool, David Edelsohn

On Mon, May 25, 2020 at 1:10 PM guojiufu <guojiufu@linux.ibm.com> wrote:
>
> Currently option -funroll-loops controls both GIMPLE unroler and
> RTL unroller. It is not able to control GIMPLE cunroller and
> RTL unroller independently.  This patch introducing different flags
> to control them seperately, and this also provide more freedom to
> tune one of them without affecting another.
>
> This patch introduces two undocumented flags: -fcomplete-unroll-loops
> for GIMPLE cunroll, and -frtl-unroll-loops for RTL unroller.  And
> these two options are enabled by original -funroll-loops.

I don't like the new -fcomplete-unroll-loops name.  It does not
match the implementation either (the gate functions of cunroll
and cunrolli only check optimize against 2/3), given it just
controls whether unrolling can grow code size.

> -  SET_OPTION_IF_UNSET (opts, opts_set, flag_unroll_loops, value);
> +  SET_OPTION_IF_UNSET (opts, opts_set, flag_complete_unroll_loops, value);

This is also not a no-op change.

Since a new flag is not needed to fix the regression please avoid
adding -fcomplete-unroll-loops.

For -frtl-unroll-loops you should be able to use

EnabledBy(funroll-loops)

and thus you can simplify option handling accordingly.

Thanks,
Richard.

> Bootstrap and regtest pass on powerpc64le, is this ok for trunk?
>
> Jiufu
>
> ChangeLog:
> 2020-05-25  Jiufu Guo   <guojiufu@cn.ibm.com>
>
>         * common.opt: Add -frtl-unroll-loops and -fcomplete-unroll-loops.
>         * opts.c (enable_fdo_optimizations): Replace flag_unroll_loops
>         with flag_complete_unroll_loops.
>         * toplev.c (process_options): set flag_rtl_unroll_loops and
>         flag_complete_unroll_loops if not explicitly set by user.
>         * tree-ssa-loop-ivcanon.c (pass_complete_unroll::execute): Replace
>         flag_unroll_loops with flag_complete_unroll_loops.
>         * loop-init.c (pass_loop2::gate): Replace flag_unroll_loops with
>         flag_rtl_unroll_loops.
>         (pass_rtl_unroll_loops::gate): Replace flag_unroll_loops with
>         flag_rtl_unroll_loops.
> ---
>  gcc/common.opt              | 8 ++++++++
>  gcc/loop-init.c             | 6 +++---
>  gcc/opts.c                  | 2 +-
>  gcc/toplev.c                | 6 ++++++
>  gcc/tree-ssa-loop-ivcanon.c | 2 +-
>  5 files changed, 19 insertions(+), 5 deletions(-)
>
> diff --git a/gcc/common.opt b/gcc/common.opt
> index 4464049fc1f..3b5ab52bb9d 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -2856,6 +2856,14 @@ funroll-all-loops
>  Common Report Var(flag_unroll_all_loops) Optimization
>  Perform loop unrolling for all loops.
>
> +frtl-unroll-loops
> +Common Undocumented Var(flag_rtl_unroll_loops) Init(2) Optimization
> +; Perform rtl loop unrolling when iteration count is known.
> +
> +fcomplete-unroll-loops
> +Common Undocumented Var(flag_complete_unroll_loops) Init(2) Optimization
> +; Perform GIMPLE loop complete unrolling.
> +
>  ; Nonzero means that loop optimizer may assume that the induction variables
>  ; that control loops do not overflow and that the loops with nontrivial
>  ; exit condition are not infinite
> diff --git a/gcc/loop-init.c b/gcc/loop-init.c
> index 401e5282907..e955068f36c 100644
> --- a/gcc/loop-init.c
> +++ b/gcc/loop-init.c
> @@ -360,7 +360,7 @@ pass_loop2::gate (function *fun)
>    if (optimize > 0
>        && (flag_move_loop_invariants
>           || flag_unswitch_loops
> -         || flag_unroll_loops
> +         || flag_rtl_unroll_loops
>           || (flag_branch_on_count_reg && targetm.have_doloop_end ())
>           || cfun->has_unroll))
>      return true;
> @@ -560,7 +560,7 @@ public:
>    /* opt_pass methods: */
>    virtual bool gate (function *)
>      {
> -      return (flag_unroll_loops || flag_unroll_all_loops || cfun->has_unroll);
> +      return (flag_rtl_unroll_loops || flag_unroll_all_loops || cfun->has_unroll);
>      }
>
>    virtual unsigned int execute (function *);
> @@ -576,7 +576,7 @@ pass_rtl_unroll_loops::execute (function *fun)
>        if (dump_file)
>         df_dump (dump_file);
>
> -      if (flag_unroll_loops)
> +      if (flag_rtl_unroll_loops)
>         flags |= UAP_UNROLL;
>        if (flag_unroll_all_loops)
>         flags |= UAP_UNROLL_ALL;
> diff --git a/gcc/opts.c b/gcc/opts.c
> index ec3ca0720f9..64c35d8d7fc 100644
> --- a/gcc/opts.c
> +++ b/gcc/opts.c
> @@ -1702,7 +1702,7 @@ enable_fdo_optimizations (struct gcc_options *opts,
>  {
>    SET_OPTION_IF_UNSET (opts, opts_set, flag_branch_probabilities, value);
>    SET_OPTION_IF_UNSET (opts, opts_set, flag_profile_values, value);
> -  SET_OPTION_IF_UNSET (opts, opts_set, flag_unroll_loops, value);
> +  SET_OPTION_IF_UNSET (opts, opts_set, flag_complete_unroll_loops, value);
>    SET_OPTION_IF_UNSET (opts, opts_set, flag_peel_loops, value);
>    SET_OPTION_IF_UNSET (opts, opts_set, flag_tracer, value);
>    SET_OPTION_IF_UNSET (opts, opts_set, flag_value_profile_transformations,
> diff --git a/gcc/toplev.c b/gcc/toplev.c
> index 96316fbd23b..c2b94d33464 100644
> --- a/gcc/toplev.c
> +++ b/gcc/toplev.c
> @@ -1474,6 +1474,12 @@ process_options (void)
>    if (flag_unroll_all_loops)
>      flag_unroll_loops = 1;
>
> +  if (flag_rtl_unroll_loops == AUTODETECT_VALUE)
> +    flag_rtl_unroll_loops = flag_unroll_loops;
> +
> +  if (flag_complete_unroll_loops == AUTODETECT_VALUE)
> +    flag_complete_unroll_loops = flag_unroll_loops;
> +
>    /* web and rename-registers help when run after loop unrolling.  */
>    if (flag_web == AUTODETECT_VALUE)
>      flag_web = flag_unroll_loops;
> diff --git a/gcc/tree-ssa-loop-ivcanon.c b/gcc/tree-ssa-loop-ivcanon.c
> index 8ab6ab3330c..cd5df353df5 100644
> --- a/gcc/tree-ssa-loop-ivcanon.c
> +++ b/gcc/tree-ssa-loop-ivcanon.c
> @@ -1603,7 +1603,7 @@ pass_complete_unroll::execute (function *fun)
>       re-peeling the same loop multiple times.  */
>    if (flag_peel_loops)
>      peeled_loops = BITMAP_ALLOC (NULL);
> -  unsigned int val = tree_unroll_loops_completely (flag_unroll_loops
> +  unsigned int val = tree_unroll_loops_completely (flag_complete_unroll_loops
>                                                    || flag_peel_loops
>                                                    || optimize >= 3, true);
>    if (peeled_loops)
> --
> 2.17.1
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/2] Seperate -funroll-loops for GIMPLE unroller and RTL unroller
  2020-05-25 12:14 ` [PATCH 1/2] Seperate -funroll-loops for GIMPLE unroller and RTL unroller Richard Biener
@ 2020-05-25 17:40   ` Segher Boessenkool
  2020-05-25 17:58     ` Richard Biener
  0 siblings, 1 reply; 11+ messages in thread
From: Segher Boessenkool @ 2020-05-25 17:40 UTC (permalink / raw)
  To: Richard Biener; +Cc: guojiufu, GCC Patches, Bill Schmidt, David Edelsohn

On Mon, May 25, 2020 at 02:14:02PM +0200, Richard Biener wrote:
> On Mon, May 25, 2020 at 1:10 PM guojiufu <guojiufu@linux.ibm.com> wrote:
> Since a new flag is not needed to fix the regression please avoid
> adding -fcomplete-unroll-loops.
> 
> For -frtl-unroll-loops you should be able to use

Erm.  That *is* a new command-line option (the internal flags I do not
care about so much: new implementation details are *good*).  And a new
name that is a mistake in my opinion, for many reasons (users do not
know and should not have to care about "rtl"; the name is not
descriptive; it is useless churn, it is not the same name as we have
had for decades now; it is adding a new option for a future where we
will do most unrolling at gimple level, a future we do not know will
ever exist, and we do not know what that will look like anyway; it is
an extra level of indirection (in the name)).

We should not have an -frtl-unroll-loops if we do not have a
-ftree-unroll-loops (or whatever).

Unrolling early is not a good idea in general (the problems with the
very trivial complete unrolling case just underline that).  But we
*should* know which code we expect to unroll later, for better costing.
Adding names like "rtl-unroll-loops" only stands in the way of getting
a better design here.


Segher

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/2] Seperate -funroll-loops for GIMPLE unroller and RTL unroller
  2020-05-25 17:40   ` Segher Boessenkool
@ 2020-05-25 17:58     ` Richard Biener
  2020-05-25 17:59       ` David Edelsohn
  2020-05-25 18:10       ` Segher Boessenkool
  0 siblings, 2 replies; 11+ messages in thread
From: Richard Biener @ 2020-05-25 17:58 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: guojiufu, GCC Patches, Bill Schmidt, David Edelsohn

On May 25, 2020 7:40:00 PM GMT+02:00, Segher Boessenkool <segher@kernel.crashing.org> wrote:
>On Mon, May 25, 2020 at 02:14:02PM +0200, Richard Biener wrote:
>> On Mon, May 25, 2020 at 1:10 PM guojiufu <guojiufu@linux.ibm.com>
>wrote:
>> Since a new flag is not needed to fix the regression please avoid
>> adding -fcomplete-unroll-loops.
>> 
>> For -frtl-unroll-loops you should be able to use
>
>Erm.  That *is* a new command-line option (the internal flags I do not
>care about so much: new implementation details are *good*).  And a new
>name that is a mistake in my opinion, for many reasons (users do not
>know and should not have to care about "rtl"; the name is not
>descriptive; it is useless churn, it is not the same name as we have
>had for decades now; it is adding a new option for a future where we
>will do most unrolling at gimple level, a future we do not know will
>ever exist, and we do not know what that will look like anyway; it is
>an extra level of indirection (in the name)).
>
>We should not have an -frtl-unroll-loops if we do not have a
>-ftree-unroll-loops (or whatever).
>
>Unrolling early is not a good idea in general (the problems with the
>very trivial complete unrolling case just underline that).  But we
>*should* know which code we expect to unroll later, for better costing.
>Adding names like "rtl-unroll-loops" only stands in the way of getting
>a better design here.

You folks made ppc specific hacks instead of a better design. Those now stand in the way as well. But sure, simply do not expose the flag to the users, use
Var(flag_rtl_unroll_loops). My other points still stand. 

Feel free to ignore the regression part on the branch and come up with a great design. But don't expect to backport that then. 

Richard. 

>
>Segher


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/2] Seperate -funroll-loops for GIMPLE unroller and RTL unroller
  2020-05-25 17:58     ` Richard Biener
@ 2020-05-25 17:59       ` David Edelsohn
  2020-05-26  4:58         ` Jiufu Guo
  2020-05-25 18:10       ` Segher Boessenkool
  1 sibling, 1 reply; 11+ messages in thread
From: David Edelsohn @ 2020-05-25 17:59 UTC (permalink / raw)
  To: Richard Biener; +Cc: Segher Boessenkool, guojiufu, GCC Patches, Bill Schmidt

On Mon, May 25, 2020 at 1:58 PM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On May 25, 2020 7:40:00 PM GMT+02:00, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> >On Mon, May 25, 2020 at 02:14:02PM +0200, Richard Biener wrote:
> >> On Mon, May 25, 2020 at 1:10 PM guojiufu <guojiufu@linux.ibm.com>
> >wrote:
> >> Since a new flag is not needed to fix the regression please avoid
> >> adding -fcomplete-unroll-loops.
> >>
> >> For -frtl-unroll-loops you should be able to use
> >
> >Erm.  That *is* a new command-line option (the internal flags I do not
> >care about so much: new implementation details are *good*).  And a new
> >name that is a mistake in my opinion, for many reasons (users do not
> >know and should not have to care about "rtl"; the name is not
> >descriptive; it is useless churn, it is not the same name as we have
> >had for decades now; it is adding a new option for a future where we
> >will do most unrolling at gimple level, a future we do not know will
> >ever exist, and we do not know what that will look like anyway; it is
> >an extra level of indirection (in the name)).
> >
> >We should not have an -frtl-unroll-loops if we do not have a
> >-ftree-unroll-loops (or whatever).
> >
> >Unrolling early is not a good idea in general (the problems with the
> >very trivial complete unrolling case just underline that).  But we
> >*should* know which code we expect to unroll later, for better costing.
> >Adding names like "rtl-unroll-loops" only stands in the way of getting
> >a better design here.
>
> You folks made ppc specific hacks instead of a better design. Those now stand in the way as well. But sure, simply do not expose the flag to the users, use
> Var(flag_rtl_unroll_loops). My other points still stand.
>
> Feel free to ignore the regression part on the branch and come up with a great design. But don't expect to backport that then.

I completely agree.

This path is digging a deeper and deeper hole.

- David

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/2] Seperate -funroll-loops for GIMPLE unroller and RTL unroller
  2020-05-25 17:58     ` Richard Biener
  2020-05-25 17:59       ` David Edelsohn
@ 2020-05-25 18:10       ` Segher Boessenkool
  1 sibling, 0 replies; 11+ messages in thread
From: Segher Boessenkool @ 2020-05-25 18:10 UTC (permalink / raw)
  To: Richard Biener; +Cc: guojiufu, GCC Patches, Bill Schmidt, David Edelsohn

On Mon, May 25, 2020 at 07:58:09PM +0200, Richard Biener wrote:
> You folks made ppc specific hacks instead of a better design. Those now stand in the way as well. But sure, simply do not expose the flag to the users, use
> Var(flag_rtl_unroll_loops). My other points still stand. 
> 
> Feel free to ignore the regression part on the branch and come up with a great design. But don't expect to backport that then. 

I just do not think fixing the regression should make things worse for
a long time, if that can be avoided.


Segher

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/2] Seperate -funroll-loops for GIMPLE unroller and RTL unroller
  2020-05-25 17:59       ` David Edelsohn
@ 2020-05-26  4:58         ` Jiufu Guo
  2020-05-28 14:22           ` Richard Biener
  0 siblings, 1 reply; 11+ messages in thread
From: Jiufu Guo @ 2020-05-26  4:58 UTC (permalink / raw)
  To: David Edelsohn
  Cc: Richard Biener, Segher Boessenkool, GCC Patches, Bill Schmidt

David Edelsohn <dje.gcc@gmail.com> writes:

> On Mon, May 25, 2020 at 1:58 PM Richard Biener
> <richard.guenther@gmail.com> wrote:
>>
>> On May 25, 2020 7:40:00 PM GMT+02:00, Segher Boessenkool <segher@kernel.crashing.org> wrote:
>> >On Mon, May 25, 2020 at 02:14:02PM +0200, Richard Biener wrote:
>> >> On Mon, May 25, 2020 at 1:10 PM guojiufu <guojiufu@linux.ibm.com>
>> >wrote:
>> >> Since a new flag is not needed to fix the regression please avoid
>> >> adding -fcomplete-unroll-loops.
>> >>
>> >> For -frtl-unroll-loops you should be able to use
>> >
>> >Erm.  That *is* a new command-line option (the internal flags I do not
>> >care about so much: new implementation details are *good*).  And a new
>> >name that is a mistake in my opinion, for many reasons (users do not
>> >know and should not have to care about "rtl"; the name is not
>> >descriptive; it is useless churn, it is not the same name as we have
>> >had for decades now; it is adding a new option for a future where we
>> >will do most unrolling at gimple level, a future we do not know will
>> >ever exist, and we do not know what that will look like anyway; it is
>> >an extra level of indirection (in the name)).
>> >
>> >We should not have an -frtl-unroll-loops if we do not have a
>> >-ftree-unroll-loops (or whatever).
>> >
>> >Unrolling early is not a good idea in general (the problems with the
>> >very trivial complete unrolling case just underline that).  But we
>> >*should* know which code we expect to unroll later, for better costing.
>> >Adding names like "rtl-unroll-loops" only stands in the way of getting
>> >a better design here.
>>
>> You folks made ppc specific hacks instead of a better design. Those
>> now stand in the way as well. But sure, simply do not expose the
>> flag to the users, use
>> Var(flag_rtl_unroll_loops). My other points still stand.
>>
>> Feel free to ignore the regression part on the branch and come up
>> with a great design. But don't expect to backport that then.
>
> I completely agree.

Thanks a lot for all your comments, suggestions, and tips in the
discussion.  Thank Richar, Segher, David, Hanza, and all!

I may have an explanation about the intention of this work.

We know that loop unrolling is a complex and tickly thing.  It could
help some kinds of code in a great manner.  Sometimes there are side
effects.  For different types of loop and different platforms, it may
result in different effects.
It would makes sense to tune the loop unrolling accordingly.  And so, to
help and tune loop unrolling is what we want to do.

Currently, we have loop unroller at GIMPLE part (cunroll/cunrolli) and
RTL part.  There are some options (like -funroll-loops) and --params to
control unrollers.

Through target hook, it would be helpful for different platforms to tune
unroller: checking the type of loops, check optimization level.
Existing hooks may help with something, like turn --params.

Adding separate flags(or options) may be helpful to control different
behaviors independently.  This is one reason for the patch which
introduces internal undocumented options.

One previous patch, r10-4525, is tunning for ppc at -O2. Which
implements an existing hook for rs6000 to check simple loops for RTL
unroller. For cunroll, it just enables it even increasing size at -O2
directly, without check the type of the loops.  And then the
side/negative effects of cunroll are also visible at -O2 besides
positive effects.  In PR95018, the side effect is shown on complex loop
(early exit, and more peeling).
One idea is for cunroll to tune it to avoid side effects. And if the
heuristic is suitable, it would be helpful for other usage, like -O3 and
-funroll-loops.

Thanks for any comments!

Jiufu

>
> This path is digging a deeper and deeper hole.
>
> - David

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/2] Seperate -funroll-loops for GIMPLE unroller and RTL unroller
  2020-05-26  4:58         ` Jiufu Guo
@ 2020-05-28 14:22           ` Richard Biener
  2020-05-28 18:45             ` Segher Boessenkool
  0 siblings, 1 reply; 11+ messages in thread
From: Richard Biener @ 2020-05-28 14:22 UTC (permalink / raw)
  To: Jiufu Guo; +Cc: David Edelsohn, Segher Boessenkool, GCC Patches, Bill Schmidt

On Tue, May 26, 2020 at 6:58 AM Jiufu Guo <guojiufu@linux.ibm.com> wrote:
>
> David Edelsohn <dje.gcc@gmail.com> writes:
>
> > On Mon, May 25, 2020 at 1:58 PM Richard Biener
> > <richard.guenther@gmail.com> wrote:
> >>
> >> On May 25, 2020 7:40:00 PM GMT+02:00, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> >> >On Mon, May 25, 2020 at 02:14:02PM +0200, Richard Biener wrote:
> >> >> On Mon, May 25, 2020 at 1:10 PM guojiufu <guojiufu@linux.ibm.com>
> >> >wrote:
> >> >> Since a new flag is not needed to fix the regression please avoid
> >> >> adding -fcomplete-unroll-loops.
> >> >>
> >> >> For -frtl-unroll-loops you should be able to use
> >> >
> >> >Erm.  That *is* a new command-line option (the internal flags I do not
> >> >care about so much: new implementation details are *good*).  And a new
> >> >name that is a mistake in my opinion, for many reasons (users do not
> >> >know and should not have to care about "rtl"; the name is not
> >> >descriptive; it is useless churn, it is not the same name as we have
> >> >had for decades now; it is adding a new option for a future where we
> >> >will do most unrolling at gimple level, a future we do not know will
> >> >ever exist, and we do not know what that will look like anyway; it is
> >> >an extra level of indirection (in the name)).
> >> >
> >> >We should not have an -frtl-unroll-loops if we do not have a
> >> >-ftree-unroll-loops (or whatever).
> >> >
> >> >Unrolling early is not a good idea in general (the problems with the
> >> >very trivial complete unrolling case just underline that).  But we
> >> >*should* know which code we expect to unroll later, for better costing.
> >> >Adding names like "rtl-unroll-loops" only stands in the way of getting
> >> >a better design here.
> >>
> >> You folks made ppc specific hacks instead of a better design. Those
> >> now stand in the way as well. But sure, simply do not expose the
> >> flag to the users, use
> >> Var(flag_rtl_unroll_loops). My other points still stand.
> >>
> >> Feel free to ignore the regression part on the branch and come up
> >> with a great design. But don't expect to backport that then.
> >
> > I completely agree.
>
> Thanks a lot for all your comments, suggestions, and tips in the
> discussion.  Thank Richar, Segher, David, Hanza, and all!
>
> I may have an explanation about the intention of this work.
>
> We know that loop unrolling is a complex and tickly thing.  It could
> help some kinds of code in a great manner.  Sometimes there are side
> effects.  For different types of loop and different platforms, it may
> result in different effects.
> It would makes sense to tune the loop unrolling accordingly.  And so, to
> help and tune loop unrolling is what we want to do.
>
> Currently, we have loop unroller at GIMPLE part (cunroll/cunrolli) and
> RTL part.  There are some options (like -funroll-loops) and --params to
> control unrollers.
>
> Through target hook, it would be helpful for different platforms to tune
> unroller: checking the type of loops, check optimization level.
> Existing hooks may help with something, like turn --params.
>
> Adding separate flags(or options) may be helpful to control different
> behaviors independently.  This is one reason for the patch which
> introduces internal undocumented options.
>
> One previous patch, r10-4525, is tunning for ppc at -O2. Which
> implements an existing hook for rs6000 to check simple loops for RTL
> unroller. For cunroll, it just enables it even increasing size at -O2
> directly, without check the type of the loops.  And then the
> side/negative effects of cunroll are also visible at -O2 besides
> positive effects.  In PR95018, the side effect is shown on complex loop
> (early exit, and more peeling).
> One idea is for cunroll to tune it to avoid side effects. And if the
> heuristic is suitable, it would be helpful for other usage, like -O3 and
> -funroll-loops.
>
> Thanks for any comments!

For GIMPLE level transforms I don't think targets have more knowledge
than the middle-end.  In fact GIMPLE complete unrolling is about
secondary effects, removing redundancies and abstraction.  So IMHO
the correct approach is to look at individual cases and try to improve
the generic code rather than try to get better benchmark results
on a per-target manner by magical parameter tuning.

For what the RTL unroller does it indeed depends very heavily on
the target whether sth is beneficial or not.

So I'd like to see specific cases where you think cunroll should
do "better" on powerpc only but not elsewhere.

Richard.

> Jiufu
>
> >
> > This path is digging a deeper and deeper hole.
> >
> > - David

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/2] Seperate -funroll-loops for GIMPLE unroller and RTL unroller
  2020-05-28 14:22           ` Richard Biener
@ 2020-05-28 18:45             ` Segher Boessenkool
  2020-05-29  3:58               ` Jiufu Guo
  0 siblings, 1 reply; 11+ messages in thread
From: Segher Boessenkool @ 2020-05-28 18:45 UTC (permalink / raw)
  To: Richard Biener; +Cc: Jiufu Guo, David Edelsohn, GCC Patches, Bill Schmidt

Hi!

On Thu, May 28, 2020 at 04:22:16PM +0200, Richard Biener wrote:
> For GIMPLE level transforms I don't think targets have more knowledge
> than the middle-end.

Yes, certainly.

> In fact GIMPLE complete unrolling is about
> secondary effects, removing redundancies and abstraction.  So IMHO
> the correct approach is to look at individual cases and try to improve
> the generic code

Yep.

> rather than try to get better benchmark results
> on a per-target manner by magical parameter tuning.

I'm no fan of that for target-specific code either.  It's fine to be led
by benchmarks, but usually a better justification is needed.

> For what the RTL unroller does it indeed depends very heavily on
> the target whether sth is beneficial or not.

Yes :-(  And this means it will need to remain late in the pass
pipeline,  or at least the decision needs to use target information
(just like what ivopts does).

> So I'd like to see specific cases where you think cunroll should
> do "better" on powerpc only but not elsewhere.

It is probably not a good idea in general to unroll 14 times, yes :-)


Segher

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/2] Seperate -funroll-loops for GIMPLE unroller and RTL unroller
  2020-05-28 18:45             ` Segher Boessenkool
@ 2020-05-29  3:58               ` Jiufu Guo
  0 siblings, 0 replies; 11+ messages in thread
From: Jiufu Guo @ 2020-05-29  3:58 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Richard Biener, David Edelsohn, GCC Patches, Bill Schmidt

Segher Boessenkool <segher@kernel.crashing.org> writes:

> Hi!
>
> On Thu, May 28, 2020 at 04:22:16PM +0200, Richard Biener wrote:
>> For GIMPLE level transforms I don't think targets have more knowledge
>> than the middle-end.
>
> Yes, certainly.
>
>> In fact GIMPLE complete unrolling is about
>> secondary effects, removing redundancies and abstraction.  So IMHO
>> the correct approach is to look at individual cases and try to improve
>> the generic code
>
> Yep.
>
>> rather than try to get better benchmark results
>> on a per-target manner by magical parameter tuning.
>
> I'm no fan of that for target-specific code either.  It's fine to be led
> by benchmarks, but usually a better justification is needed.

Thanks all,
Agree, we'd better tune it in generic code.

Jiufu

>
>> For what the RTL unroller does it indeed depends very heavily on
>> the target whether sth is beneficial or not.
>
> Yes :-(  And this means it will need to remain late in the pass
> pipeline,  or at least the decision needs to use target information
> (just like what ivopts does).
>
>> So I'd like to see specific cases where you think cunroll should
>> do "better" on powerpc only but not elsewhere.
>
> It is probably not a good idea in general to unroll 14 times, yes :-)
>
>
> Segher

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2020-05-29  3:59 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-25 11:10 [PATCH 1/2] Seperate -funroll-loops for GIMPLE unroller and RTL unroller guojiufu
2020-05-25 11:10 ` [PATCH 2/2] rs6000: Turn on -frtl-unroll-loops instead -funroll-loops at -O2 guojiufu
2020-05-25 12:14 ` [PATCH 1/2] Seperate -funroll-loops for GIMPLE unroller and RTL unroller Richard Biener
2020-05-25 17:40   ` Segher Boessenkool
2020-05-25 17:58     ` Richard Biener
2020-05-25 17:59       ` David Edelsohn
2020-05-26  4:58         ` Jiufu Guo
2020-05-28 14:22           ` Richard Biener
2020-05-28 18:45             ` Segher Boessenkool
2020-05-29  3:58               ` Jiufu Guo
2020-05-25 18:10       ` Segher Boessenkool

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).