[RFC, Patch]: Optimized changes in the register used inside loop for LICM and IVOPTS.

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* [RFC, Patch]: Optimized changes in the register used inside loop for LICM and IVOPTS.
@ 2015-10-08  4:32 Ajit Kumar Agarwal
  2015-10-08  4:59 ` Bin.Cheng
  2015-11-13  6:13 ` Jeff Law
  0 siblings, 2 replies; 14+ messages in thread
From: Ajit Kumar Agarwal @ 2015-10-08  4:32 UTC (permalink / raw)
  To: GCC Patches
  Cc: Vinod Kathail, Shail Aditya Gupta, Vidhumouli Hunsigida, Nagaraju Mekala

[-- Attachment #1: Type: text/plain, Size: 3159 bytes --]

Following Proposed:

Changes are done in the Loop Invariant(LICM) at RTL level and also the Induction variable optimization based on SSA representation. 
The current logic used in LICM for register used inside the loops is changed. The Live Out of the loop latch node and the Live in of the 
destination of the exit nodes is used to set the Loops Liveness at the exit of the Loop. The register used is the number of live variables 
at the exit of the Loop calculated above.

For Induction variable optimization on tree SSA representation, the register used logic is based on the number of phi nodes at the loop 
header to represent the liveness at the loop. Current Logic used only the number of phi nodes at the loop header. I have made changes
 to represent the phi operands also live at the loop. Thus number of phi operands also gets incremented in the number of registers used.

Performance runs:

Bootstrapping with i386 goes through fine. The spec cpu 2000 benchmarks is run and following performance runs and the code size for
 i386 target seen.

Ratio with the above optimization changes vs ratio without above optimizations for INT benchmarks (3785.261 vs 3783.064).
Ratio with the above optimization changes vs ratio without above optimization for FP benchmarks ( 4676.763189 vs 4676.072428 ).

Code size reduction for INT benchmarks : 2324 instructions.
Code size reduction for FP benchmarks : 1283 instructions.

For Microblaze target the Mibench and EEMBC benchmarks is run and the following improvements is seen.

(qos_lite(5.3%), consumer_jpeg_c(1.34%), security_rijndael_d(1.8%), security_rijndael_e(1.4%))

Code Size reduction for Mibench  = 16164 instructions.
Code Size reduction for EEMBC = 98 instructions.

Patch ChangeLog:

PATCH] [RFC, Patch]: Optimized changes in the register used inside  loop for LICM and IVOPTS.

Changes are done in the Loop Invariant(LICM) at RTL level and also the Induction variable optimization 
based on SSA representation. The current logic used in LICM for register used inside the loops is changed. 
The Live Out of the loop latch node and the Live in of the destination of the exit nodes is used to set the
 Loops Liveness at the exit of the Loop. The register used is the number of live variables at the exit of the
 Loop calculated above.

For Induction variable optimization on tree SSA representation, the register used logic is based on the
 number of phi nodes at the loop header to represent the liveness at the loop.  Current Logic used only
 the number of phi nodes at the loop header.  Changes are made to represent the phi operands also live
 at the loop. Thus number of phi operands also gets incremented in the number of registers used.

ChangeLog:
2015-10-09  Ajit Agarwal  <ajitkum@xilinx.com>

        * loop-invariant.c (compute_loop_liveness): New.
        (determine_regs_used): New.
        (find_invariants_to_move): Use of determine_regs_used.
        * tree-ssa-loop-ivopts.c (determine_set_costs): Consider the phi
        arguments for register used.

Signed-off-by:Ajit Agarwal ajitkum@xilinx.com

Thanks & Regards
Ajit

[-- Attachment #2: 0001-RFC-Patch-Optimized-changes-in-the-register-used-ins.patch --]
[-- Type: application/octet-stream, Size: 4674 bytes --]

From f164fd80953f3cffd96a492c8424c83290cd43cc Mon Sep 17 00:00:00 2001
From: Ajit Kumar Agarwal <ajitkum@xilix.com>
Date: Wed, 7 Oct 2015 20:50:40 +0200
Subject: [PATCH] [RFC, Patch]: Optimized changes in the register used inside
 loop for LICM and IVOPTS.

Changes are done in the Loop Invariant(LICM) at RTL level and also the
Induction variable optimization based on SSA representation. The current
logic used in LICM for register used inside the loops is changed. The
Live Out of the loop latch node and the Live in of the destination of
the exit nodes is used to set the Loops Liveness at the exit of the Loop.
The register used is the number of live variables at the exit of the
Loop calculated above.

For Induction variable optimization on tree SSA representation, the register
used logic is based on the number of phi nodes at the loop header to represent
the liveness at the loop.  Current Logic used only the number of phi nodes at
the loop header.  Changes are made to represent the phi operands also live at
the loop. Thus number of phi operands also gets incremented in the number of
registers used.

ChangeLog:
2015-10-09  Ajit Agarwal  <ajitkum@xilinx.com>

	* loop-invariant.c (compute_loop_liveness): New.
	(determine_regs_used): New.
	(find_invariants_to_move): Use of determine_regs_used.
	* tree-ssa-loop-ivopts.c (determine_set_costs): Consider the phi
	arguments for register used.

Signed-off-by:Ajit Agarwal ajitkum@xilinx.com
---
 gcc/loop-invariant.c       | 72 +++++++++++++++++++++++++++++++++++++---------
 gcc/tree-ssa-loop-ivopts.c |  4 +--
 2 files changed, 60 insertions(+), 16 deletions(-)

diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c
index 52c8ae8..e4291c9 100644
--- a/gcc/loop-invariant.c
+++ b/gcc/loop-invariant.c
@@ -1413,6 +1413,19 @@ set_move_mark (unsigned invno, int gain)
     }
 }

+static int
+determine_regs_used()
+{
+  unsigned int j;
+  unsigned int reg_used = 2;
+  bitmap_iterator bi;
+
+  EXECUTE_IF_SET_IN_BITMAP (&LOOP_DATA (curr_loop)->regs_live, 0, j, bi)
+    (reg_used) ++;
+
+  return reg_used;
+}
+
 /* Determines which invariants to move.  */

 static void
@@ -1433,18 +1446,7 @@ find_invariants_to_move (bool speed, bool call_p)
        registers used; we put some initial bound here to stand for
        induction variables etc.  that we do not detect.  */
     {
-      unsigned int n_regs = DF_REG_SIZE (df);
-
-      regs_used = 2;
-
-      for (i = 0; i < n_regs; i++)
-	{
-	  if (!DF_REGNO_FIRST_DEF (i) && DF_REGNO_LAST_USE (i))
-	    {
-	      /* This is a value that is used but not changed inside loop.  */
-	      regs_used++;
-	    }
-	}
+      regs_used = determine_regs_used();
     }

   if (! flag_ira_loop_pressure)
@@ -2055,9 +2057,43 @@ calculate_loop_reg_pressure (void)
     }
 }

-\f
+static void
+calculate_loop_liveness (void)
+{
+  basic_block bb;
+  struct loop *loop;

-/* Move the invariants out of the loops.  */
+  FOR_EACH_LOOP (loop, 0)
+    if (loop->aux == NULL)
+      {
+        loop->aux = xcalloc (1, sizeof (struct loop_data));
+        bitmap_initialize (&LOOP_DATA (loop)->regs_live, &reg_obstack);
+     }
+
+  FOR_EACH_BB_FN (bb, cfun)
+  {
+    curr_loop = bb->loop_father;
+    if (curr_loop == current_loops->tree_root)
+      continue;
+
+    for (loop = curr_loop;
+         loop != current_loops->tree_root;
+         loop = loop_outer (loop))
+       {
+         int  i;
+         edge e;
+         vec<edge> edges;
+         edges = get_loop_exit_edges (loop);
+         FOR_EACH_VEC_ELT (edges, i, e)
+         {
+           bitmap_ior_into (&LOOP_DATA (loop)->regs_live, DF_LR_OUT(e->src));
+           bitmap_ior_into (&LOOP_DATA (loop)->regs_live, DF_LR_IN(e->dest));
+         }
+      }
+  }
+}
+
+/* Move the invariants  ut of the loops.  */

 void
 move_loop_invariants (void)
@@ -2072,6 +2108,14 @@ move_loop_invariants (void)
       calculate_loop_reg_pressure ();
       regstat_free_n_sets_and_refs ();
     }
+  else
+    {
+      df_analyze();
+      regstat_init_n_sets_and_refs ();
+      calculate_loop_liveness();
+      regstat_free_n_sets_and_refs ();
+    }
+
   df_set_flags (DF_EQ_NOTES + DF_DEFER_INSN_RESCAN);
   /* Process the loops, innermost first.  */
   FOR_EACH_LOOP (loop, LI_FROM_INNERMOST)
diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index 945d34b..9d94c60 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -5736,8 +5736,8 @@ determine_set_costs (struct ivopts_data *data)

       if (get_iv (data, op))
 	continue;
-
-      n++;
+      
+      n += gimple_phi_num_args (phi);
     }

   EXECUTE_IF_SET_IN_BITMAP (data->relevant, 0, j, bi)
-- 
1.8.2.1

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC, Patch]: Optimized changes in the register used inside loop for LICM and IVOPTS.
  2015-10-08  4:32 [RFC, Patch]: Optimized changes in the register used inside loop for LICM and IVOPTS Ajit Kumar Agarwal
@ 2015-10-08  4:59 ` Bin.Cheng
  2015-10-08  5:54   ` Ajit Kumar Agarwal
  2015-11-13  6:13 ` Jeff Law
  1 sibling, 1 reply; 14+ messages in thread
From: Bin.Cheng @ 2015-10-08  4:59 UTC (permalink / raw)
  To: Ajit Kumar Agarwal
  Cc: GCC Patches, Vinod Kathail, Shail Aditya Gupta,
	Vidhumouli Hunsigida, Nagaraju Mekala

On Thu, Oct 8, 2015 at 12:32 PM, Ajit Kumar Agarwal
<ajit.kumar.agarwal@xilinx.com> wrote:
> Following Proposed:
>
> Changes are done in the Loop Invariant(LICM) at RTL level and also the Induction variable optimization based on SSA representation.
> The current logic used in LICM for register used inside the loops is changed. The Live Out of the loop latch node and the Live in of the
> destination of the exit nodes is used to set the Loops Liveness at the exit of the Loop. The register used is the number of live variables
> at the exit of the Loop calculated above.
>
> For Induction variable optimization on tree SSA representation, the register used logic is based on the number of phi nodes at the loop
> header to represent the liveness at the loop. Current Logic used only the number of phi nodes at the loop header. I have made changes
>  to represent the phi operands also live at the loop. Thus number of phi operands also gets incremented in the number of registers used.
Hi,
For the GIMPLE IVO part, I don't think the change is reasonable
enough.  IMHO, IVO fails to restrict iv number in some complex cases,
your change tries to rectify that by increasing register pressure
irrespective to out-of-ssa and coalescing.  I think the original code
models reg-pressure better, what needs to be changed is how we compute
cost from register pressure and use that to restrict iv number.
As for the specific function determine_set_costs, I think one change
is necessary to rule out all floating point phi nodes, because they do
not have impact on IVO register pressure.  Actually this change will
further reduce register pressure for fp related cases.

Thanks,
bin
>
> Performance runs:
>
> Bootstrapping with i386 goes through fine. The spec cpu 2000 benchmarks is run and following performance runs and the code size for
>  i386 target seen.
>
> Ratio with the above optimization changes vs ratio without above optimizations for INT benchmarks (3785.261 vs 3783.064).
> Ratio with the above optimization changes vs ratio without above optimization for FP benchmarks ( 4676.763189 vs 4676.072428 ).
>
> Code size reduction for INT benchmarks : 2324 instructions.
> Code size reduction for FP benchmarks : 1283 instructions.
>
> For Microblaze target the Mibench and EEMBC benchmarks is run and the following improvements is seen.
>
> (qos_lite(5.3%), consumer_jpeg_c(1.34%), security_rijndael_d(1.8%), security_rijndael_e(1.4%))
>
> Code Size reduction for Mibench  = 16164 instructions.
> Code Size reduction for EEMBC = 98 instructions.
>
> Patch ChangeLog:
>
> PATCH] [RFC, Patch]: Optimized changes in the register used inside  loop for LICM and IVOPTS.
>
> Changes are done in the Loop Invariant(LICM) at RTL level and also the Induction variable optimization
> based on SSA representation. The current logic used in LICM for register used inside the loops is changed.
> The Live Out of the loop latch node and the Live in of the destination of the exit nodes is used to set the
>  Loops Liveness at the exit of the Loop. The register used is the number of live variables at the exit of the
>  Loop calculated above.
>
> For Induction variable optimization on tree SSA representation, the register used logic is based on the
>  number of phi nodes at the loop header to represent the liveness at the loop.  Current Logic used only
>  the number of phi nodes at the loop header.  Changes are made to represent the phi operands also live
>  at the loop. Thus number of phi operands also gets incremented in the number of registers used.
>
> ChangeLog:
> 2015-10-09  Ajit Agarwal  <ajitkum@xilinx.com>
>
>         * loop-invariant.c (compute_loop_liveness): New.
>         (determine_regs_used): New.
>         (find_invariants_to_move): Use of determine_regs_used.
>         * tree-ssa-loop-ivopts.c (determine_set_costs): Consider the phi
>         arguments for register used.
>
> Signed-off-by:Ajit Agarwal ajitkum@xilinx.com
>
> Thanks & Regards
> Ajit

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [RFC, Patch]: Optimized changes in the register used inside loop for LICM and IVOPTS.
  2015-10-08  4:59 ` Bin.Cheng
@ 2015-10-08  5:54   ` Ajit Kumar Agarwal
  2015-10-09  2:45     ` Bin.Cheng
  0 siblings, 1 reply; 14+ messages in thread
From: Ajit Kumar Agarwal @ 2015-10-08  5:54 UTC (permalink / raw)
  To: Bin.Cheng
  Cc: GCC Patches, Vinod Kathail, Shail Aditya Gupta,
	Vidhumouli Hunsigida, Nagaraju Mekala



-----Original Message-----
From: Bin.Cheng [mailto:amker.cheng@gmail.com] 
Sent: Thursday, October 08, 2015 10:29 AM
To: Ajit Kumar Agarwal
Cc: GCC Patches; Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju Mekala
Subject: Re: [RFC, Patch]: Optimized changes in the register used inside loop for LICM and IVOPTS.

On Thu, Oct 8, 2015 at 12:32 PM, Ajit Kumar Agarwal <ajit.kumar.agarwal@xilinx.com> wrote:
> Following Proposed:
>
> Changes are done in the Loop Invariant(LICM) at RTL level and also the Induction variable optimization based on SSA representation.
> The current logic used in LICM for register used inside the loops is 
> changed. The Live Out of the loop latch node and the Live in of the 
> destination of the exit nodes is used to set the Loops Liveness at the exit of the Loop. The register used is the number of live variables at the exit of the Loop calculated above.
>
> For Induction variable optimization on tree SSA representation, the 
> register used logic is based on the number of phi nodes at the loop 
> header to represent the liveness at the loop. Current Logic used only the number of phi nodes at the loop header. I have made changes  to represent the phi operands also live at the loop. Thus number of phi operands also gets incremented in the number of registers used.
Hi,
>>For the GIMPLE IVO part, I don't think the change is reasonable enough.  IMHO, IVO fails to restrict iv number in some complex cases, your change tries to >>rectify that by increasing register pressure irrespective to out-of-ssa and coalescing.  I think the original code models reg-pressure better, what needs to be >>changed is how we compute cost from register pressure and use that to restrict iv number.

Considering the liveness with respect to all the phi arguments will not increase the register pressure. It improves the heuristics for restricting
The IV that increases the register pressure. The cost model uses regs_used and modelling the Liveness with respect to the phi arguments measures
Better register pressure.

Number of phi nodes in the loop header is not only the criteria for regs_used, but the number of liveness with respect to loop should be 
Criteria to measure appropriate register pressure.

Thanks & Regards
Ajit
>>As for the specific function determine_set_costs, I think one change is necessary to rule out all floating point phi nodes, because they do not have impact on >>IVO register pressure.  Actually this change will further reduce register pressure for fp related cases.


Thanks,
bin
>
> Performance runs:
>
> Bootstrapping with i386 goes through fine. The spec cpu 2000 
> benchmarks is run and following performance runs and the code size for
>  i386 target seen.
>
> Ratio with the above optimization changes vs ratio without above optimizations for INT benchmarks (3785.261 vs 3783.064).
> Ratio with the above optimization changes vs ratio without above optimization for FP benchmarks ( 4676.763189 vs 4676.072428 ).
>
> Code size reduction for INT benchmarks : 2324 instructions.
> Code size reduction for FP benchmarks : 1283 instructions.
>
> For Microblaze target the Mibench and EEMBC benchmarks is run and the following improvements is seen.
>
> (qos_lite(5.3%), consumer_jpeg_c(1.34%), security_rijndael_d(1.8%), 
> security_rijndael_e(1.4%))
>
> Code Size reduction for Mibench  = 16164 instructions.
> Code Size reduction for EEMBC = 98 instructions.
>
> Patch ChangeLog:
>
> PATCH] [RFC, Patch]: Optimized changes in the register used inside  loop for LICM and IVOPTS.
>
> Changes are done in the Loop Invariant(LICM) at RTL level and also the 
> Induction variable optimization based on SSA representation. The current logic used in LICM for register used inside the loops is changed.
> The Live Out of the loop latch node and the Live in of the destination 
> of the exit nodes is used to set the  Loops Liveness at the exit of 
> the Loop. The register used is the number of live variables at the exit of the  Loop calculated above.
>
> For Induction variable optimization on tree SSA representation, the 
> register used logic is based on the  number of phi nodes at the loop 
> header to represent the liveness at the loop.  Current Logic used only  
> the number of phi nodes at the loop header.  Changes are made to represent the phi operands also live  at the loop. Thus number of phi operands also gets incremented in the number of registers used.
>
> ChangeLog:
> 2015-10-09  Ajit Agarwal  <ajitkum@xilinx.com>
>
>         * loop-invariant.c (compute_loop_liveness): New.
>         (determine_regs_used): New.
>         (find_invariants_to_move): Use of determine_regs_used.
>         * tree-ssa-loop-ivopts.c (determine_set_costs): Consider the phi
>         arguments for register used.
>
> Signed-off-by:Ajit Agarwal ajitkum@xilinx.com
>
> Thanks & Regards
> Ajit

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC, Patch]: Optimized changes in the register used inside loop for LICM and IVOPTS.
  2015-10-08  5:54   ` Ajit Kumar Agarwal
@ 2015-10-09  2:45     ` Bin.Cheng
  2015-10-12  4:25       ` Ajit Kumar Agarwal
  0 siblings, 1 reply; 14+ messages in thread
From: Bin.Cheng @ 2015-10-09  2:45 UTC (permalink / raw)
  To: Ajit Kumar Agarwal
  Cc: GCC Patches, Vinod Kathail, Shail Aditya Gupta,
	Vidhumouli Hunsigida, Nagaraju Mekala

On Thu, Oct 8, 2015 at 1:53 PM, Ajit Kumar Agarwal
<ajit.kumar.agarwal@xilinx.com> wrote:
>
>
> -----Original Message-----
> From: Bin.Cheng [mailto:amker.cheng@gmail.com]
> Sent: Thursday, October 08, 2015 10:29 AM
> To: Ajit Kumar Agarwal
> Cc: GCC Patches; Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju Mekala
> Subject: Re: [RFC, Patch]: Optimized changes in the register used inside loop for LICM and IVOPTS.
>
> On Thu, Oct 8, 2015 at 12:32 PM, Ajit Kumar Agarwal <ajit.kumar.agarwal@xilinx.com> wrote:
>> Following Proposed:
>>
>> Changes are done in the Loop Invariant(LICM) at RTL level and also the Induction variable optimization based on SSA representation.
>> The current logic used in LICM for register used inside the loops is
>> changed. The Live Out of the loop latch node and the Live in of the
>> destination of the exit nodes is used to set the Loops Liveness at the exit of the Loop. The register used is the number of live variables at the exit of the Loop calculated above.
>>
>> For Induction variable optimization on tree SSA representation, the
>> register used logic is based on the number of phi nodes at the loop
>> header to represent the liveness at the loop. Current Logic used only the number of phi nodes at the loop header. I have made changes  to represent the phi operands also live at the loop. Thus number of phi operands also gets incremented in the number of registers used.
> Hi,
>>>For the GIMPLE IVO part, I don't think the change is reasonable enough.  IMHO, IVO fails to restrict iv number in some complex cases, your change tries to >>rectify that by increasing register pressure irrespective to out-of-ssa and coalescing.  I think the original code models reg-pressure better, what needs to be >>changed is how we compute cost from register pressure and use that to restrict iv number.
>
> Considering the liveness with respect to all the phi arguments will not increase the register pressure. It improves the heuristics for restricting
> The IV that increases the register pressure. The cost model uses regs_used and modelling the
I think register pressure is increased along with regs_needed, doesn't
matter if it will be canceled in estimate_reg_pressure_cost for both
ends of cost comparison.
Liveness with respect to the phi arguments measures
> Better register pressure.
I agree IV number should be controlled for some cases, but not by
increasing `n' using phi argument number unconditionally.  Considering
summary reduction as an example, most likely the ssa names will be
coalesced and held in single register.  Furthermore, there is no
reason to count phi node/arg number for floating point phi nodes.

>
> Number of phi nodes in the loop header is not only the criteria for regs_used, but the number of liveness with respect to loop should be
> Criteria to measure appropriate register pressure.
IMHO, it's hard to accurately track liveness info on SSA(PHI), because
of coalescing etc.  So could you give some examples/proof for this?

Thanks,
bin
>
> Thanks & Regards
> Ajit

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [RFC, Patch]: Optimized changes in the register used inside loop for LICM and IVOPTS.
  2015-10-09  2:45     ` Bin.Cheng
@ 2015-10-12  4:25       ` Ajit Kumar Agarwal
  0 siblings, 0 replies; 14+ messages in thread
From: Ajit Kumar Agarwal @ 2015-10-12  4:25 UTC (permalink / raw)
  To: Bin.Cheng
  Cc: GCC Patches, Vinod Kathail, Shail Aditya Gupta,
	Vidhumouli Hunsigida, Nagaraju Mekala

-----Original Message-----
From: Bin.Cheng [mailto:amker.cheng@gmail.com] 
Sent: Friday, October 09, 2015 8:15 AM
To: Ajit Kumar Agarwal
Cc: GCC Patches; Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju Mekala
Subject: Re: [RFC, Patch]: Optimized changes in the register used inside loop for LICM and IVOPTS.

On Thu, Oct 8, 2015 at 1:53 PM, Ajit Kumar Agarwal <ajit.kumar.agarwal@xilinx.com> wrote:
>
>
> -----Original Message-----
> From: Bin.Cheng [mailto:amker.cheng@gmail.com]
> Sent: Thursday, October 08, 2015 10:29 AM
> To: Ajit Kumar Agarwal
> Cc: GCC Patches; Vinod Kathail; Shail Aditya Gupta; Vidhumouli 
> Hunsigida; Nagaraju Mekala
> Subject: Re: [RFC, Patch]: Optimized changes in the register used inside loop for LICM and IVOPTS.
>
> On Thu, Oct 8, 2015 at 12:32 PM, Ajit Kumar Agarwal <ajit.kumar.agarwal@xilinx.com> wrote:
>> Following Proposed:
>>
>> Changes are done in the Loop Invariant(LICM) at RTL level and also the Induction variable optimization based on SSA representation.
>> The current logic used in LICM for register used inside the loops is 
>> changed. The Live Out of the loop latch node and the Live in of the 
>> destination of the exit nodes is used to set the Loops Liveness at the exit of the Loop. The register used is the number of live variables at the exit of the Loop calculated above.
>>
>> For Induction variable optimization on tree SSA representation, the 
>> register used logic is based on the number of phi nodes at the loop 
>> header to represent the liveness at the loop. Current Logic used only the number of phi nodes at the loop header. I have made changes  to represent the phi operands also live at the loop. Thus number of phi operands also gets incremented in the number of registers used.
> Hi,
>>>For the GIMPLE IVO part, I don't think the change is reasonable enough.  IMHO, IVO fails to restrict iv number in some complex cases, your change tries to >>rectify that by increasing register pressure irrespective to out-of-ssa and coalescing.  I think the original code models reg-pressure better, what needs to be >>changed is how we compute cost from register pressure and use that to restrict iv number.
>
> Considering the liveness with respect to all the phi arguments will 
> not increase the register pressure. It improves the heuristics for 
> restricting The IV that increases the register pressure. The cost 
> model uses regs_used and modelling the
>>I think register pressure is increased along with regs_needed, doesn't matter if it will be canceled in estimate_reg_pressure_cost for both ends of cost >>comparison.
>>Liveness with respect to the phi arguments measures
> Better register pressure.
>>I agree IV number should be controlled for some cases, but not by increasing `n' using phi argument number unconditionally.  Considering summary >>reduction as an example, most likely the ssa names will be coalesced and held in single register.  Furthermore, there is no reason to count phi node/arg >>number for floating point phi nodes.

>
> Number of phi nodes in the loop header is not only the criteria for 
> regs_used, but the number of liveness with respect to loop should be Criteria to measure appropriate register pressure.
>>IMHO, it's hard to accurately track liveness info on SSA(PHI), because of coalescing etc.  So could you give some examples/proof for this?

I agree with you that it is hard to predict the exact mapping from SSA to the actual register allocation due to coalescing and out of SSA.

The Interference on phi arguments and results are important criteria for register pressure on SSA. The conventional SSA where the  phi 
arguments don't interfere. Most of the current compilers don't have conventional SSA. In the Non-conventional SSA there are chances
the phi arguments interfere. The Non-Conventional SSA arises due to the copy propagation of ssa names  makes the phi arguments
interfere. Due to non-conventional nature of SSA the phi arguments interfere and should be considered for the register used.  I interpret
the register used as the number of interfering live ranges that leads to increase or decrease in register pressure. 

On top of the above the Out of SSA or SSA names coalescing, for conventional SSA is quite simple as each phi nodes is assigned to new
variables and the def and use is replaced with the new variables and makes the case of assigning single register and then the corresponding 
phi node is removed. But in the Non- Conventional nature of SSA, the out of ssa makes the SSA conventional by inserting copying to each
of the predecessor node and assigned it to new variable on each of the predecessor nodes and the phi node is replaced with new variables
of the copy results generated in each of the predecessor as phi arguments and assigned to a new result and the copy is inserted after the 
phi nodes with source as the new phi result to a new variable. This makes the SSA conventional and out of ssa uses the above approach
for conventional to remove the phi nodes and replace the def and use.

This nature of out of SSA for non- conventional SSA increases the copy instructions and thus lead to increase in register pressure.
Though aggressive and conservative coalescing remove some of the copy instruction but not possible to remove all the copy instructions.

I liked the approach of calculating the number of phi nodes in the loop header because of the following properties of SSA.

The SSA variables that are live at the live at the header should be live at all other nodes of the Loop because of the SSA properties  of 
header dominates the all other nodes of the Loop.

The current implementation of the number of phi nodes to the register used in GIMPLE IVOPTS is not appropriate and sufficient reasons 
to consider the register used. The number of phi arguments should also be considered to have an effect of register pressure on the
number of IVS.

This concludes to consider the number of interfering phi arguments of non-conventional SSA in the register used  effecting the register
pressure.

 If you have any examples that impacted the above, Please share with us.

Thanks & Regards
Ajit

Thanks,
bin
>
> Thanks & Regards
> Ajit

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC, Patch]: Optimized changes in the register used inside loop for LICM and IVOPTS.
  2015-10-08  4:32 [RFC, Patch]: Optimized changes in the register used inside loop for LICM and IVOPTS Ajit Kumar Agarwal
  2015-10-08  4:59 ` Bin.Cheng
@ 2015-11-13  6:13 ` Jeff Law
  2015-11-13  6:32   ` Bin.Cheng
                     ` (2 more replies)
  1 sibling, 3 replies; 14+ messages in thread
From: Jeff Law @ 2015-11-13  6:13 UTC (permalink / raw)
  To: Ajit Kumar Agarwal, GCC Patches
  Cc: Vinod Kathail, Shail Aditya Gupta, Vidhumouli Hunsigida, Nagaraju Mekala

On 10/07/2015 10:32 PM, Ajit Kumar Agarwal wrote:

>
> 0001-RFC-Patch-Optimized-changes-in-the-register-used-ins.patch
>
>
>  From f164fd80953f3cffd96a492c8424c83290cd43cc Mon Sep 17 00:00:00 2001
> From: Ajit Kumar Agarwal<ajitkum@xilix.com>
> Date: Wed, 7 Oct 2015 20:50:40 +0200
> Subject: [PATCH] [RFC, Patch]: Optimized changes in the register used inside
>   loop for LICM and IVOPTS.
>
> Changes are done in the Loop Invariant(LICM) at RTL level and also the
> Induction variable optimization based on SSA representation. The current
> logic used in LICM for register used inside the loops is changed. The
> Live Out of the loop latch node and the Live in of the destination of
> the exit nodes is used to set the Loops Liveness at the exit of the Loop.
> The register used is the number of live variables at the exit of the
> Loop calculated above.
>
> For Induction variable optimization on tree SSA representation, the register
> used logic is based on the number of phi nodes at the loop header to represent
> the liveness at the loop.  Current Logic used only the number of phi nodes at
> the loop header.  Changes are made to represent the phi operands also live at
> the loop. Thus number of phi operands also gets incremented in the number of
> registers used.
>
> ChangeLog:
> 2015-10-09  Ajit Agarwal<ajitkum@xilinx.com>
>
> 	* loop-invariant.c (compute_loop_liveness): New.
> 	(determine_regs_used): New.
> 	(find_invariants_to_move): Use of determine_regs_used.
> 	* tree-ssa-loop-ivopts.c (determine_set_costs): Consider the phi
> 	arguments for register used.
I think Bin rejected the tree-ssa-loop-ivopts change.  However, the 
loop-invariant change is still pending, right?


>
> Signed-off-by:Ajit Agarwalajitkum@xilinx.com
> ---
>   gcc/loop-invariant.c       | 72 +++++++++++++++++++++++++++++++++++++---------
>   gcc/tree-ssa-loop-ivopts.c |  4 +--
>   2 files changed, 60 insertions(+), 16 deletions(-)
>
> diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c
> index 52c8ae8..e4291c9 100644
> --- a/gcc/loop-invariant.c
> +++ b/gcc/loop-invariant.c
> @@ -1413,6 +1413,19 @@ set_move_mark (unsigned invno, int gain)
>       }
>   }
>
> +static int
> +determine_regs_used()
> +{
> +  unsigned int j;
> +  unsigned int reg_used = 2;
> +  bitmap_iterator bi;
> +
> +  EXECUTE_IF_SET_IN_BITMAP (&LOOP_DATA (curr_loop)->regs_live, 0, j, bi)
> +    (reg_used) ++;
> +
> +  return reg_used;
> +}
Isn't this just bitmap_count_bits (regs_live) + 2?


> @@ -2055,9 +2057,43 @@ calculate_loop_reg_pressure (void)
>       }
>   }
>
> -
> +static void
> +calculate_loop_liveness (void)
Needs a function comment.


> +{
> +  basic_block bb;
> +  struct loop *loop;
>
> -/* Move the invariants out of the loops.  */
> +  FOR_EACH_LOOP (loop, 0)
> +    if (loop->aux == NULL)
> +      {
> +        loop->aux = xcalloc (1, sizeof (struct loop_data));
> +        bitmap_initialize (&LOOP_DATA (loop)->regs_live, &reg_obstack);
> +     }
> +
> +  FOR_EACH_BB_FN (bb, cfun)
Why loop over blocks here?  Why not just iterate through all the loops 
in the loop structure.  Order isn't particularly important AFAICT for 
this code.



> +       {
> +         int  i;
> +         edge e;
> +         vec<edge> edges;
> +         edges = get_loop_exit_edges (loop);
> +         FOR_EACH_VEC_ELT (edges, i, e)
> +         {
> +           bitmap_ior_into (&LOOP_DATA (loop)->regs_live, DF_LR_OUT(e->src));
> +           bitmap_ior_into (&LOOP_DATA (loop)->regs_live, DF_LR_IN(e->dest));
Space before the open-paren in the previous two lines
DF_LR_OUT (e->src) and FD_LR_INT (e->dest))


> +         }
> +      }
> +  }
> +}
> +
> +/* Move the invariants  ut of the loops.  */
Looks like you introduced a typo.

I'd like to see testcases which show the change in # regs used 
computation helping generate better code.

And  I'd also like to see some background information on why you think 
this is a more accurate measure for the number of registers used in the 
loop.  regs_used AFAICT is supposed to be an estimate of the registers 
live around the loop.  So ISTM that you get that value by live-out set 
on the backedge of the loop.  I guess you get somethign similar by 
looking at the exit edge's source block's live-out set.  But I don't see 
any value  in including stuff live at the block outside the loop.

It also seems fairly non-intuitive.  Get the block's latch and use its 
live-out set.  That seems more intuitive.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC, Patch]: Optimized changes in the register used inside loop for LICM and IVOPTS.
  2015-11-13  6:13 ` Jeff Law
@ 2015-11-13  6:32   ` Bin.Cheng
  2015-11-13 10:18     ` Richard Biener
  2015-11-16 17:36   ` Ajit Kumar Agarwal
  2015-11-16 17:57   ` Ajit Kumar Agarwal
  2 siblings, 1 reply; 14+ messages in thread
From: Bin.Cheng @ 2015-11-13  6:32 UTC (permalink / raw)
  To: Jeff Law, Richard Biener
  Cc: Ajit Kumar Agarwal, GCC Patches, Vinod Kathail,
	Shail Aditya Gupta, Vidhumouli Hunsigida, Nagaraju Mekala

On Fri, Nov 13, 2015 at 2:13 PM, Jeff Law <law@redhat.com> wrote:
> On 10/07/2015 10:32 PM, Ajit Kumar Agarwal wrote:
>
>>
>> 0001-RFC-Patch-Optimized-changes-in-the-register-used-ins.patch
>>
>>
>>  From f164fd80953f3cffd96a492c8424c83290cd43cc Mon Sep 17 00:00:00 2001
>> From: Ajit Kumar Agarwal<ajitkum@xilix.com>
>> Date: Wed, 7 Oct 2015 20:50:40 +0200
>> Subject: [PATCH] [RFC, Patch]: Optimized changes in the register used
>> inside
>>   loop for LICM and IVOPTS.
>>
>> Changes are done in the Loop Invariant(LICM) at RTL level and also the
>> Induction variable optimization based on SSA representation. The current
>> logic used in LICM for register used inside the loops is changed. The
>> Live Out of the loop latch node and the Live in of the destination of
>> the exit nodes is used to set the Loops Liveness at the exit of the Loop.
>> The register used is the number of live variables at the exit of the
>> Loop calculated above.
>>
>> For Induction variable optimization on tree SSA representation, the
>> register
>> used logic is based on the number of phi nodes at the loop header to
>> represent
>> the liveness at the loop.  Current Logic used only the number of phi nodes
>> at
>> the loop header.  Changes are made to represent the phi operands also live
>> at
>> the loop. Thus number of phi operands also gets incremented in the number
>> of
>> registers used.
>>
>> ChangeLog:
>> 2015-10-09  Ajit Agarwal<ajitkum@xilinx.com>
>>
>>         * loop-invariant.c (compute_loop_liveness): New.
>>         (determine_regs_used): New.
>>         (find_invariants_to_move): Use of determine_regs_used.
>>         * tree-ssa-loop-ivopts.c (determine_set_costs): Consider the phi
>>         arguments for register used.
>
> I think Bin rejected the tree-ssa-loop-ivopts change.  However, the
> loop-invariant change is still pending, right?
Ah, reject is a strong word, I am just being dumb and don't understand
why it's a general better estimation yet.
Maybe Richard have some inputs here?

Thanks,
bin
>
>
>>
>> Signed-off-by:Ajit Agarwalajitkum@xilinx.com
>> ---
>>   gcc/loop-invariant.c       | 72
>> +++++++++++++++++++++++++++++++++++++---------
>>   gcc/tree-ssa-loop-ivopts.c |  4 +--
>>   2 files changed, 60 insertions(+), 16 deletions(-)
>>
>> diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c
>> index 52c8ae8..e4291c9 100644
>> --- a/gcc/loop-invariant.c
>> +++ b/gcc/loop-invariant.c
>> @@ -1413,6 +1413,19 @@ set_move_mark (unsigned invno, int gain)
>>       }
>>   }
>>
>> +static int
>> +determine_regs_used()
>> +{
>> +  unsigned int j;
>> +  unsigned int reg_used = 2;
>> +  bitmap_iterator bi;
>> +
>> +  EXECUTE_IF_SET_IN_BITMAP (&LOOP_DATA (curr_loop)->regs_live, 0, j, bi)
>> +    (reg_used) ++;
>> +
>> +  return reg_used;
>> +}
>
> Isn't this just bitmap_count_bits (regs_live) + 2?
>
>
>> @@ -2055,9 +2057,43 @@ calculate_loop_reg_pressure (void)
>>       }
>>   }
>>
>> -
>> +static void
>> +calculate_loop_liveness (void)
>
> Needs a function comment.
>
>
>> +{
>> +  basic_block bb;
>> +  struct loop *loop;
>>
>> -/* Move the invariants out of the loops.  */
>> +  FOR_EACH_LOOP (loop, 0)
>> +    if (loop->aux == NULL)
>> +      {
>> +        loop->aux = xcalloc (1, sizeof (struct loop_data));
>> +        bitmap_initialize (&LOOP_DATA (loop)->regs_live, &reg_obstack);
>> +     }
>> +
>> +  FOR_EACH_BB_FN (bb, cfun)
>
> Why loop over blocks here?  Why not just iterate through all the loops in
> the loop structure.  Order isn't particularly important AFAICT for this
> code.
>
>
>
>> +       {
>> +         int  i;
>> +         edge e;
>> +         vec<edge> edges;
>> +         edges = get_loop_exit_edges (loop);
>> +         FOR_EACH_VEC_ELT (edges, i, e)
>> +         {
>> +           bitmap_ior_into (&LOOP_DATA (loop)->regs_live,
>> DF_LR_OUT(e->src));
>> +           bitmap_ior_into (&LOOP_DATA (loop)->regs_live,
>> DF_LR_IN(e->dest));
>
> Space before the open-paren in the previous two lines
> DF_LR_OUT (e->src) and FD_LR_INT (e->dest))
>
>
>> +         }
>> +      }
>> +  }
>> +}
>> +
>> +/* Move the invariants  ut of the loops.  */
>
> Looks like you introduced a typo.
>
> I'd like to see testcases which show the change in # regs used computation
> helping generate better code.
>
> And  I'd also like to see some background information on why you think this
> is a more accurate measure for the number of registers used in the loop.
> regs_used AFAICT is supposed to be an estimate of the registers live around
> the loop.  So ISTM that you get that value by live-out set on the backedge
> of the loop.  I guess you get somethign similar by looking at the exit
> edge's source block's live-out set.  But I don't see any value  in including
> stuff live at the block outside the loop.
>
> It also seems fairly non-intuitive.  Get the block's latch and use its
> live-out set.  That seems more intuitive.
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC, Patch]: Optimized changes in the register used inside loop for LICM and IVOPTS.
  2015-11-13  6:32   ` Bin.Cheng
@ 2015-11-13 10:18     ` Richard Biener
  0 siblings, 0 replies; 14+ messages in thread
From: Richard Biener @ 2015-11-13 10:18 UTC (permalink / raw)
  To: Bin.Cheng
  Cc: Jeff Law, Ajit Kumar Agarwal, GCC Patches, Vinod Kathail,
	Shail Aditya Gupta, Vidhumouli Hunsigida, Nagaraju Mekala

On Fri, Nov 13, 2015 at 7:31 AM, Bin.Cheng <amker.cheng@gmail.com> wrote:
> On Fri, Nov 13, 2015 at 2:13 PM, Jeff Law <law@redhat.com> wrote:
>> On 10/07/2015 10:32 PM, Ajit Kumar Agarwal wrote:
>>
>>>
>>> 0001-RFC-Patch-Optimized-changes-in-the-register-used-ins.patch
>>>
>>>
>>>  From f164fd80953f3cffd96a492c8424c83290cd43cc Mon Sep 17 00:00:00 2001
>>> From: Ajit Kumar Agarwal<ajitkum@xilix.com>
>>> Date: Wed, 7 Oct 2015 20:50:40 +0200
>>> Subject: [PATCH] [RFC, Patch]: Optimized changes in the register used
>>> inside
>>>   loop for LICM and IVOPTS.
>>>
>>> Changes are done in the Loop Invariant(LICM) at RTL level and also the
>>> Induction variable optimization based on SSA representation. The current
>>> logic used in LICM for register used inside the loops is changed. The
>>> Live Out of the loop latch node and the Live in of the destination of
>>> the exit nodes is used to set the Loops Liveness at the exit of the Loop.
>>> The register used is the number of live variables at the exit of the
>>> Loop calculated above.
>>>
>>> For Induction variable optimization on tree SSA representation, the
>>> register
>>> used logic is based on the number of phi nodes at the loop header to
>>> represent
>>> the liveness at the loop.  Current Logic used only the number of phi nodes
>>> at
>>> the loop header.  Changes are made to represent the phi operands also live
>>> at
>>> the loop. Thus number of phi operands also gets incremented in the number
>>> of
>>> registers used.
>>>
>>> ChangeLog:
>>> 2015-10-09  Ajit Agarwal<ajitkum@xilinx.com>
>>>
>>>         * loop-invariant.c (compute_loop_liveness): New.
>>>         (determine_regs_used): New.
>>>         (find_invariants_to_move): Use of determine_regs_used.
>>>         * tree-ssa-loop-ivopts.c (determine_set_costs): Consider the phi
>>>         arguments for register used.
>>
>> I think Bin rejected the tree-ssa-loop-ivopts change.  However, the
>> loop-invariant change is still pending, right?
> Ah, reject is a strong word, I am just being dumb and don't understand
> why it's a general better estimation yet.
> Maybe Richard have some inputs here?

Not really.  I agree with Bin that the change doesn't look like an improvement
by design (might be one by accident for some benchmarks).

Richard.

> Thanks,
> bin
>>
>>
>>>
>>> Signed-off-by:Ajit Agarwalajitkum@xilinx.com
>>> ---
>>>   gcc/loop-invariant.c       | 72
>>> +++++++++++++++++++++++++++++++++++++---------
>>>   gcc/tree-ssa-loop-ivopts.c |  4 +--
>>>   2 files changed, 60 insertions(+), 16 deletions(-)
>>>
>>> diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c
>>> index 52c8ae8..e4291c9 100644
>>> --- a/gcc/loop-invariant.c
>>> +++ b/gcc/loop-invariant.c
>>> @@ -1413,6 +1413,19 @@ set_move_mark (unsigned invno, int gain)
>>>       }
>>>   }
>>>
>>> +static int
>>> +determine_regs_used()
>>> +{
>>> +  unsigned int j;
>>> +  unsigned int reg_used = 2;
>>> +  bitmap_iterator bi;
>>> +
>>> +  EXECUTE_IF_SET_IN_BITMAP (&LOOP_DATA (curr_loop)->regs_live, 0, j, bi)
>>> +    (reg_used) ++;
>>> +
>>> +  return reg_used;
>>> +}
>>
>> Isn't this just bitmap_count_bits (regs_live) + 2?
>>
>>
>>> @@ -2055,9 +2057,43 @@ calculate_loop_reg_pressure (void)
>>>       }
>>>   }
>>>
>>> -
>>> +static void
>>> +calculate_loop_liveness (void)
>>
>> Needs a function comment.
>>
>>
>>> +{
>>> +  basic_block bb;
>>> +  struct loop *loop;
>>>
>>> -/* Move the invariants out of the loops.  */
>>> +  FOR_EACH_LOOP (loop, 0)
>>> +    if (loop->aux == NULL)
>>> +      {
>>> +        loop->aux = xcalloc (1, sizeof (struct loop_data));
>>> +        bitmap_initialize (&LOOP_DATA (loop)->regs_live, &reg_obstack);
>>> +     }
>>> +
>>> +  FOR_EACH_BB_FN (bb, cfun)
>>
>> Why loop over blocks here?  Why not just iterate through all the loops in
>> the loop structure.  Order isn't particularly important AFAICT for this
>> code.
>>
>>
>>
>>> +       {
>>> +         int  i;
>>> +         edge e;
>>> +         vec<edge> edges;
>>> +         edges = get_loop_exit_edges (loop);
>>> +         FOR_EACH_VEC_ELT (edges, i, e)
>>> +         {
>>> +           bitmap_ior_into (&LOOP_DATA (loop)->regs_live,
>>> DF_LR_OUT(e->src));
>>> +           bitmap_ior_into (&LOOP_DATA (loop)->regs_live,
>>> DF_LR_IN(e->dest));
>>
>> Space before the open-paren in the previous two lines
>> DF_LR_OUT (e->src) and FD_LR_INT (e->dest))
>>
>>
>>> +         }
>>> +      }
>>> +  }
>>> +}
>>> +
>>> +/* Move the invariants  ut of the loops.  */
>>
>> Looks like you introduced a typo.
>>
>> I'd like to see testcases which show the change in # regs used computation
>> helping generate better code.
>>
>> And  I'd also like to see some background information on why you think this
>> is a more accurate measure for the number of registers used in the loop.
>> regs_used AFAICT is supposed to be an estimate of the registers live around
>> the loop.  So ISTM that you get that value by live-out set on the backedge
>> of the loop.  I guess you get somethign similar by looking at the exit
>> edge's source block's live-out set.  But I don't see any value  in including
>> stuff live at the block outside the loop.
>>
>> It also seems fairly non-intuitive.  Get the block's latch and use its
>> live-out set.  That seems more intuitive.
>>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [RFC, Patch]: Optimized changes in the register used inside loop for LICM and IVOPTS.
  2015-11-13  6:13 ` Jeff Law
  2015-11-13  6:32   ` Bin.Cheng
@ 2015-11-16 17:36   ` Ajit Kumar Agarwal
  2015-11-16 23:00     ` Jeff Law
  2015-11-16 17:57   ` Ajit Kumar Agarwal
  2 siblings, 1 reply; 14+ messages in thread
From: Ajit Kumar Agarwal @ 2015-11-16 17:36 UTC (permalink / raw)
  To: Jeff Law, GCC Patches
  Cc: Vinod Kathail, Shail Aditya Gupta, Vidhumouli Hunsigida, Nagaraju Mekala



-----Original Message-----
From: Jeff Law [mailto:law@redhat.com] 
Sent: Friday, November 13, 2015 11:44 AM
To: Ajit Kumar Agarwal; GCC Patches
Cc: Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju Mekala
Subject: Re: [RFC, Patch]: Optimized changes in the register used inside loop for LICM and IVOPTS.

On 10/07/2015 10:32 PM, Ajit Kumar Agarwal wrote:

>
> 0001-RFC-Patch-Optimized-changes-in-the-register-used-ins.patch
>
>
>  From f164fd80953f3cffd96a492c8424c83290cd43cc Mon Sep 17 00:00:00 
> 2001
> From: Ajit Kumar Agarwal<ajitkum@xilix.com>
> Date: Wed, 7 Oct 2015 20:50:40 +0200
> Subject: [PATCH] [RFC, Patch]: Optimized changes in the register used inside
>   loop for LICM and IVOPTS.
>
> Changes are done in the Loop Invariant(LICM) at RTL level and also the 
> Induction variable optimization based on SSA representation. The 
> current logic used in LICM for register used inside the loops is 
> changed. The Live Out of the loop latch node and the Live in of the 
> destination of the exit nodes is used to set the Loops Liveness at the exit of the Loop.
> The register used is the number of live variables at the exit of the 
> Loop calculated above.
>
> For Induction variable optimization on tree SSA representation, the 
> register used logic is based on the number of phi nodes at the loop 
> header to represent the liveness at the loop.  Current Logic used only 
> the number of phi nodes at the loop header.  Changes are made to 
> represent the phi operands also live at the loop. Thus number of phi 
> operands also gets incremented in the number of registers used.
>
> ChangeLog:
> 2015-10-09  Ajit Agarwal<ajitkum@xilinx.com>
>
> 	* loop-invariant.c (compute_loop_liveness): New.
> 	(determine_regs_used): New.
> 	(find_invariants_to_move): Use of determine_regs_used.
> 	* tree-ssa-loop-ivopts.c (determine_set_costs): Consider the phi
> 	arguments for register used.
>>I think Bin rejected the tree-ssa-loop-ivopts change.  However, the loop-invariant change is still pending, right?


>
> Signed-off-by:Ajit Agarwalajitkum@xilinx.com
> ---
>   gcc/loop-invariant.c       | 72 +++++++++++++++++++++++++++++++++++++---------
>   gcc/tree-ssa-loop-ivopts.c |  4 +--
>   2 files changed, 60 insertions(+), 16 deletions(-)
>
> diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c
> index 52c8ae8..e4291c9 100644
> --- a/gcc/loop-invariant.c
> +++ b/gcc/loop-invariant.c
> @@ -1413,6 +1413,19 @@ set_move_mark (unsigned invno, int gain)
>       }
>   }
>
> +static int
> +determine_regs_used()
> +{
> +  unsigned int j;
> +  unsigned int reg_used = 2;
> +  bitmap_iterator bi;
> +
> +  EXECUTE_IF_SET_IN_BITMAP (&LOOP_DATA (curr_loop)->regs_live, 0, j, bi)
> +    (reg_used) ++;
> +
> +  return reg_used;
> +}
>>Isn't this just bitmap_count_bits (regs_live) + 2?


> @@ -2055,9 +2057,43 @@ calculate_loop_reg_pressure (void)
>       }
>   }
>
> -
> +static void
> +calculate_loop_liveness (void)
>>Needs a function comment.

I will incorporate the above comments.
> +{
> +  basic_block bb;
> +  struct loop *loop;
>
> -/* Move the invariants out of the loops.  */
> +  FOR_EACH_LOOP (loop, 0)
> +    if (loop->aux == NULL)
> +      {
> +        loop->aux = xcalloc (1, sizeof (struct loop_data));
> +        bitmap_initialize (&LOOP_DATA (loop)->regs_live, &reg_obstack);
> +     }
> +
> +  FOR_EACH_BB_FN (bb, cfun)
>>Why loop over blocks here?  Why not just iterate through all the loops 
>>in the loop structure.  Order isn't particularly important AFAICT for 
>>this code.

Iterating over the Loop structure is enough. We don't need iterating over the basic blocks.

> +       {
> +         int  i;
> +         edge e;
> +         vec<edge> edges;
> +         edges = get_loop_exit_edges (loop);
> +         FOR_EACH_VEC_ELT (edges, i, e)
> +         {
> +           bitmap_ior_into (&LOOP_DATA (loop)->regs_live, DF_LR_OUT(e->src));
> +           bitmap_ior_into (&LOOP_DATA (loop)->regs_live, DF_LR_IN(e->dest));
>>Space before the open-paren in the previous two lines
>>DF_LR_OUT (e->src) and FD_LR_INT (e->dest))

I will incorporate this.

> +         }
> +      }
> +  }
> +}
> +
> +/* Move the invariants  ut of the loops.  */
>>Looks like you introduced a typo.

>>I'd like to see testcases which show the change in # regs used 
>>computation helping generate better code. 

We need to measure the test case with the scenario where the new variable created for loop invariant increases the register pressure and
the cost with respect to reg_used and new_regs increases that lead to spill and fetch and drop the invariant movement.

Getting that environment in the test case seems to be little difficult. But I will try to identify the testcases extracted from the Benchmarks
 where we got the gains by the above method.


>>And  I'd also like to see some background information on why you think 
>>this is a more accurate measure for the number of registers used in the 
>>loop.  regs_used AFAICT is supposed to be an estimate of the registers 
>>live around the loop.  So ISTM that you get that value by live-out set 
>>on the backedge of the loop.  I guess you get somethign similar by 
>>looking at the exit edge's source block's live-out set.  But I don't see 
>>any value  in including stuff live at the block outside the loop.

>>It also seems fairly non-intuitive.  Get the block's latch and use its 
>>live-out set.  That seems more intuitive.

We are interested in registers pressure that arises with code motion optimization like Loop Invariant. The register pressure
is based out of the number of interfering live ranges at a given point. If that exceeds the number of available registers, there may be chance
of spilling that live range. Again the spilling is based on the coloring scheme where the Briggs Allocator place such live ranges into the stack
during simplification phase instead of just spilling. This might enables the chances of getting the register and reduces the spill and fetch.

Again some of the coloring scheme choose the live range that has high register pressure at the highest priority for coloring and removes
Such nodes from interference graph and place it on the stack at the highest priority. This may give a better changes of covering the 
Interference graph colorable.

Based on the above background and considering all the above points the number of liveness of the loop is a better approximation of the
register pressure and should be considered for the register used to do the cost analysis.

Getting the loop exit edges and the source of the exit edge is the Loop latch node. This will take care of Live out of the latch node.
Moreover adding the Live-in of the destination edge of the loop exit edges will add  the liveness after the loops.  I also tried using only liveout 
of the loop latch nodes but did not see much gains using only  liveout of the loop latch node.

Thanks & Regards
Ajit

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [RFC, Patch]: Optimized changes in the register used inside loop for LICM and IVOPTS.
  2015-11-13  6:13 ` Jeff Law
  2015-11-13  6:32   ` Bin.Cheng
  2015-11-16 17:36   ` Ajit Kumar Agarwal
@ 2015-11-16 17:57   ` Ajit Kumar Agarwal
  2015-11-17  5:37     ` Bin.Cheng
  2 siblings, 1 reply; 14+ messages in thread
From: Ajit Kumar Agarwal @ 2015-11-16 17:57 UTC (permalink / raw)
  To: Jeff Law, GCC Patches
  Cc: Vinod Kathail, Shail Aditya Gupta, Vidhumouli Hunsigida, Nagaraju Mekala


Sorry I missed out some of the points in earlier mail which is given below.

-----Original Message-----
From: Ajit Kumar Agarwal 
Sent: Monday, November 16, 2015 11:07 PM
To: 'Jeff Law'; GCC Patches
Cc: Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju Mekala
Subject: RE: [RFC, Patch]: Optimized changes in the register used inside loop for LICM and IVOPTS.



-----Original Message-----
From: Jeff Law [mailto:law@redhat.com]
Sent: Friday, November 13, 2015 11:44 AM
To: Ajit Kumar Agarwal; GCC Patches
Cc: Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju Mekala
Subject: Re: [RFC, Patch]: Optimized changes in the register used inside loop for LICM and IVOPTS.

On 10/07/2015 10:32 PM, Ajit Kumar Agarwal wrote:

>
> 0001-RFC-Patch-Optimized-changes-in-the-register-used-ins.patch
>
>
>  From f164fd80953f3cffd96a492c8424c83290cd43cc Mon Sep 17 00:00:00
> 2001
> From: Ajit Kumar Agarwal<ajitkum@xilix.com>
> Date: Wed, 7 Oct 2015 20:50:40 +0200
> Subject: [PATCH] [RFC, Patch]: Optimized changes in the register used inside
>   loop for LICM and IVOPTS.
>
> Changes are done in the Loop Invariant(LICM) at RTL level and also the 
> Induction variable optimization based on SSA representation. The 
> current logic used in LICM for register used inside the loops is 
> changed. The Live Out of the loop latch node and the Live in of the 
> destination of the exit nodes is used to set the Loops Liveness at the exit of the Loop.
> The register used is the number of live variables at the exit of the 
> Loop calculated above.
>
> For Induction variable optimization on tree SSA representation, the 
> register used logic is based on the number of phi nodes at the loop 
> header to represent the liveness at the loop.  Current Logic used only 
> the number of phi nodes at the loop header.  Changes are made to 
> represent the phi operands also live at the loop. Thus number of phi 
> operands also gets incremented in the number of registers used.
>
> ChangeLog:
> 2015-10-09  Ajit Agarwal<ajitkum@xilinx.com>
>
> 	* loop-invariant.c (compute_loop_liveness): New.
> 	(determine_regs_used): New.
> 	(find_invariants_to_move): Use of determine_regs_used.
> 	* tree-ssa-loop-ivopts.c (determine_set_costs): Consider the phi
> 	arguments for register used.
>>I think Bin rejected the tree-ssa-loop-ivopts change.  However, the loop-invariant change is still pending, right?


>
> Signed-off-by:Ajit Agarwalajitkum@xilinx.com
> ---
>   gcc/loop-invariant.c       | 72 +++++++++++++++++++++++++++++++++++++---------
>   gcc/tree-ssa-loop-ivopts.c |  4 +--
>   2 files changed, 60 insertions(+), 16 deletions(-)
>
> diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c index 
> 52c8ae8..e4291c9 100644
> --- a/gcc/loop-invariant.c
> +++ b/gcc/loop-invariant.c
> @@ -1413,6 +1413,19 @@ set_move_mark (unsigned invno, int gain)
>       }
>   }
>
> +static int
> +determine_regs_used()
> +{
> +  unsigned int j;
> +  unsigned int reg_used = 2;
> +  bitmap_iterator bi;
> +
> +  EXECUTE_IF_SET_IN_BITMAP (&LOOP_DATA (curr_loop)->regs_live, 0, j, bi)
> +    (reg_used) ++;
> +
> +  return reg_used;
> +}
>>Isn't this just bitmap_count_bits (regs_live) + 2?


> @@ -2055,9 +2057,43 @@ calculate_loop_reg_pressure (void)
>       }
>   }
>
> -
> +static void
> +calculate_loop_liveness (void)
>>Needs a function comment.

I will incorporate the above comments.
> +{
> +  basic_block bb;
> +  struct loop *loop;
>
> -/* Move the invariants out of the loops.  */
> +  FOR_EACH_LOOP (loop, 0)
> +    if (loop->aux == NULL)
> +      {
> +        loop->aux = xcalloc (1, sizeof (struct loop_data));
> +        bitmap_initialize (&LOOP_DATA (loop)->regs_live, &reg_obstack);
> +     }
> +
> +  FOR_EACH_BB_FN (bb, cfun)
>>Why loop over blocks here?  Why not just iterate through all the loops 
>>in the loop structure.  Order isn't particularly important AFAICT for 
>>this code.

Iterating over the Loop structure is enough. We don't need iterating over the basic blocks.

> +       {
> +         int  i;
> +         edge e;
> +         vec<edge> edges;
> +         edges = get_loop_exit_edges (loop);
> +         FOR_EACH_VEC_ELT (edges, i, e)
> +         {
> +           bitmap_ior_into (&LOOP_DATA (loop)->regs_live, DF_LR_OUT(e->src));
> +           bitmap_ior_into (&LOOP_DATA (loop)->regs_live, 
> + DF_LR_IN(e->dest));
>>Space before the open-paren in the previous two lines DF_LR_OUT 
>>(e->src) and FD_LR_INT (e->dest))

I will incorporate this.

> +         }
> +      }
> +  }
> +}
> +
> +/* Move the invariants  ut of the loops.  */
>>Looks like you introduced a typo.

>>I'd like to see testcases which show the change in # regs used 
>>computation helping generate better code.

We need to measure the test case with the scenario where the new variable created for loop invariant increases the register pressure 
and the cost with respect to reg_used and new_regs increases that lead to spill and fetch and drop the invariant movement.

Getting that environment in the test case seems to be little difficult. But I will try to identify the testcases extracted from the Benchmarks 
 where we got the gains by the above method.


>>And  I'd also like to see some background information on why you think 
>>this is a more accurate measure for the number of registers used in 
>>the loop.  regs_used AFAICT is supposed to be an estimate of the 
>>registers live around the loop.  So ISTM that you get that value by 
>>live-out set on the backedge of the loop.  I guess you get somethign 
>>similar by looking at the exit edge's source block's live-out set.  
>>But I don't see any value  in including stuff live at the block outside the loop.

>>It also seems fairly non-intuitive.  Get the block's latch and use its 
>>live-out set.  That seems more intuitive.

We are interested in registers pressure that arises with code motion optimization like Loop Invariant. The register pressure is based out 
of the number of interfering live ranges at a given point. If that exceeds the number of available registers, there may be chance of spilling 
that live range. Again the spilling is based on the coloring scheme where the Briggs Allocator place such live ranges into the stack during 
simplification phase instead of just spilling. This might enables the chances of getting the register and reduces the spill and fetch.

Again some of the coloring scheme choose the live range that has high register pressure at the highest priority for coloring and removes 
Such nodes from interference graph and place it on the stack at the highest priority. This may give a better changes of covering the Interference 
graph colorable.

In the region based allocator the loops forms a region and accordingly the allocno are assigned accumulating the live range info from inner loops to outerloops.
The number of liveness and the register pressure contributes significantly  for the loops forming the regions. 
 
Based on the above background and considering all the above points the number of liveness of the loop is a better approximation of the register pressure and
 should be considered for the register used to do the cost analysis.

Getting the loop exit edges and the source of the exit edge is the Loop latch node. This will take care of Live out of the latch node.
Moreover adding the Live-in of the destination edge of the loop exit edges will add  the liveness after the loops.  I also tried using only liveout of the loop 
latch nodes but did not see much gains using only  liveout of the loop latch node.

Thanks & Regards
Ajit

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC, Patch]: Optimized changes in the register used inside loop for LICM and IVOPTS.
  2015-11-16 17:36   ` Ajit Kumar Agarwal
@ 2015-11-16 23:00     ` Jeff Law
  2015-11-29 17:14       ` Ajit Kumar Agarwal
  0 siblings, 1 reply; 14+ messages in thread
From: Jeff Law @ 2015-11-16 23:00 UTC (permalink / raw)
  To: Ajit Kumar Agarwal, GCC Patches
  Cc: Vinod Kathail, Shail Aditya Gupta, Vidhumouli Hunsigida, Nagaraju Mekala

On 11/16/2015 10:36 AM, Ajit Kumar Agarwal wrote:
>> For Induction variable optimization on tree SSA representation, the
>> register used logic is based on the number of phi nodes at the loop
>> header to represent the liveness at the loop.  Current Logic used only
>> the number of phi nodes at the loop header.  Changes are made to
>> represent the phi operands also live at the loop. Thus number of phi
>> operands also gets incremented in the number of registers used.
But my question is why is the # of PHI operands useful here.  You'd have 
a stronger argument if it was the number of unique operands in each PHI. 
  While I don't doubt this patch improved things, I think it's just 
putting a band-aid over the problem.

I think anything that just looks at PHIs or at register liveness at loop 
boundaries is inherently underestimating the register pressure 
implications of code motion from inside to outside a loop.

If an object is pulled out of the loop, then it's going to conflict with 
nearly every object that births in the loop (because the object being 
moved now has to live throughout the loop).  There's exceptions, but I 
doubt they matter in practice.  The object is also going to conflict 
with anything else that is live through the loop.  At least that's how 
it seems to me at first thought.

So build the set of objects (SSA_NAMEs) that either birth or are live 
through the loop that have the same type class as the object we want to 
hoist out of the loop (scalar, floating point, vector).  Use that set of 
objects to estimate register pressure.

It won't be exact because some of those objects could end up coloring 
the same.  BUt it's probably still considerably more accurate than what 
we have now.

I suspect that would be a better estimator for register pressure as far 
as LICM is concerned.

jeff

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC, Patch]: Optimized changes in the register used inside loop for LICM and IVOPTS.
  2015-11-16 17:57   ` Ajit Kumar Agarwal
@ 2015-11-17  5:37     ` Bin.Cheng
  0 siblings, 0 replies; 14+ messages in thread
From: Bin.Cheng @ 2015-11-17  5:37 UTC (permalink / raw)
  To: Ajit Kumar Agarwal
  Cc: Jeff Law, GCC Patches, Vinod Kathail, Shail Aditya Gupta,
	Vidhumouli Hunsigida, Nagaraju Mekala

On Tue, Nov 17, 2015 at 1:56 AM, Ajit Kumar Agarwal
<ajit.kumar.agarwal@xilinx.com> wrote:
>
> Sorry I missed out some of the points in earlier mail which is given below.
>
> -----Original Message-----
> From: Ajit Kumar Agarwal
> Sent: Monday, November 16, 2015 11:07 PM
> To: 'Jeff Law'; GCC Patches
> Cc: Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju Mekala
> Subject: RE: [RFC, Patch]: Optimized changes in the register used inside loop for LICM and IVOPTS.
>
>
>
> -----Original Message-----
> From: Jeff Law [mailto:law@redhat.com]
> Sent: Friday, November 13, 2015 11:44 AM
> To: Ajit Kumar Agarwal; GCC Patches
> Cc: Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju Mekala
> Subject: Re: [RFC, Patch]: Optimized changes in the register used inside loop for LICM and IVOPTS.
>
> On 10/07/2015 10:32 PM, Ajit Kumar Agarwal wrote:
>
>>
>> 0001-RFC-Patch-Optimized-changes-in-the-register-used-ins.patch
>>
>>
>>  From f164fd80953f3cffd96a492c8424c83290cd43cc Mon Sep 17 00:00:00
>> 2001
>> From: Ajit Kumar Agarwal<ajitkum@xilix.com>
>> Date: Wed, 7 Oct 2015 20:50:40 +0200
>> Subject: [PATCH] [RFC, Patch]: Optimized changes in the register used inside
>>   loop for LICM and IVOPTS.
>>
>> Changes are done in the Loop Invariant(LICM) at RTL level and also the
>> Induction variable optimization based on SSA representation. The
>> current logic used in LICM for register used inside the loops is
>> changed. The Live Out of the loop latch node and the Live in of the
>> destination of the exit nodes is used to set the Loops Liveness at the exit of the Loop.
>> The register used is the number of live variables at the exit of the
>> Loop calculated above.
>>
>> For Induction variable optimization on tree SSA representation, the
>> register used logic is based on the number of phi nodes at the loop
>> header to represent the liveness at the loop.  Current Logic used only
>> the number of phi nodes at the loop header.  Changes are made to
>> represent the phi operands also live at the loop. Thus number of phi
>> operands also gets incremented in the number of registers used.
>>
>> ChangeLog:
>> 2015-10-09  Ajit Agarwal<ajitkum@xilinx.com>
>>
>>       * loop-invariant.c (compute_loop_liveness): New.
>>       (determine_regs_used): New.
>>       (find_invariants_to_move): Use of determine_regs_used.
>>       * tree-ssa-loop-ivopts.c (determine_set_costs): Consider the phi
>>       arguments for register used.
>>>I think Bin rejected the tree-ssa-loop-ivopts change.  However, the loop-invariant change is still pending, right?
>
>
>>
>> Signed-off-by:Ajit Agarwalajitkum@xilinx.com
>> ---
>>   gcc/loop-invariant.c       | 72 +++++++++++++++++++++++++++++++++++++---------
>>   gcc/tree-ssa-loop-ivopts.c |  4 +--
>>   2 files changed, 60 insertions(+), 16 deletions(-)
>>
>> diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c index
>> 52c8ae8..e4291c9 100644
>> --- a/gcc/loop-invariant.c
>> +++ b/gcc/loop-invariant.c
>> @@ -1413,6 +1413,19 @@ set_move_mark (unsigned invno, int gain)
>>       }
>>   }
>>
>> +static int
>> +determine_regs_used()
>> +{
>> +  unsigned int j;
>> +  unsigned int reg_used = 2;
>> +  bitmap_iterator bi;
>> +
>> +  EXECUTE_IF_SET_IN_BITMAP (&LOOP_DATA (curr_loop)->regs_live, 0, j, bi)
>> +    (reg_used) ++;
>> +
>> +  return reg_used;
>> +}
>>>Isn't this just bitmap_count_bits (regs_live) + 2?
>
>
>> @@ -2055,9 +2057,43 @@ calculate_loop_reg_pressure (void)
>>       }
>>   }
>>
>> -
>> +static void
>> +calculate_loop_liveness (void)
>>>Needs a function comment.
>
> I will incorporate the above comments.
>> +{
>> +  basic_block bb;
>> +  struct loop *loop;
>>
>> -/* Move the invariants out of the loops.  */
>> +  FOR_EACH_LOOP (loop, 0)
>> +    if (loop->aux == NULL)
>> +      {
>> +        loop->aux = xcalloc (1, sizeof (struct loop_data));
>> +        bitmap_initialize (&LOOP_DATA (loop)->regs_live, &reg_obstack);
>> +     }
>> +
>> +  FOR_EACH_BB_FN (bb, cfun)
>>>Why loop over blocks here?  Why not just iterate through all the loops
>>>in the loop structure.  Order isn't particularly important AFAICT for
>>>this code.
>
> Iterating over the Loop structure is enough. We don't need iterating over the basic blocks.
>
>> +       {
>> +         int  i;
>> +         edge e;
>> +         vec<edge> edges;
>> +         edges = get_loop_exit_edges (loop);
>> +         FOR_EACH_VEC_ELT (edges, i, e)
>> +         {
>> +           bitmap_ior_into (&LOOP_DATA (loop)->regs_live, DF_LR_OUT(e->src));
>> +           bitmap_ior_into (&LOOP_DATA (loop)->regs_live,
>> + DF_LR_IN(e->dest));
>>>Space before the open-paren in the previous two lines DF_LR_OUT
>>>(e->src) and FD_LR_INT (e->dest))
>
> I will incorporate this.
>
>> +         }
>> +      }
>> +  }
>> +}
>> +
>> +/* Move the invariants  ut of the loops.  */
>>>Looks like you introduced a typo.
>
>>>I'd like to see testcases which show the change in # regs used
>>>computation helping generate better code.
>
> We need to measure the test case with the scenario where the new variable created for loop invariant increases the register pressure
> and the cost with respect to reg_used and new_regs increases that lead to spill and fetch and drop the invariant movement.
>
> Getting that environment in the test case seems to be little difficult. But I will try to identify the testcases extracted from the Benchmarks
>  where we got the gains by the above method.
>
>
>>>And  I'd also like to see some background information on why you think
>>>this is a more accurate measure for the number of registers used in
>>>the loop.  regs_used AFAICT is supposed to be an estimate of the
>>>registers live around the loop.  So ISTM that you get that value by
>>>live-out set on the backedge of the loop.  I guess you get somethign
>>>similar by looking at the exit edge's source block's live-out set.
>>>But I don't see any value  in including stuff live at the block outside the loop.
>
>>>It also seems fairly non-intuitive.  Get the block's latch and use its
>>>live-out set.  That seems more intuitive.
>
> We are interested in registers pressure that arises with code motion optimization like Loop Invariant. The register pressure is based out
> of the number of interfering live ranges at a given point. If that exceeds the number of available registers, there may be chance of spilling
> that live range. Again the spilling is based on the coloring scheme where the Briggs Allocator place such live ranges into the stack during
> simplification phase instead of just spilling. This might enables the chances of getting the register and reduces the spill and fetch.
>
> Again some of the coloring scheme choose the live range that has high register pressure at the highest priority for coloring and removes
> Such nodes from interference graph and place it on the stack at the highest priority. This may give a better changes of covering the Interference
> graph colorable.
>
> In the region based allocator the loops forms a region and accordingly the allocno are assigned accumulating the live range info from inner loops to outerloops.
> The number of liveness and the register pressure contributes significantly  for the loops forming the regions.
IIUC, the change also counts live through registers, this is kind of
against intuition if region based RA is used.  Shouldn't live through
regs be saved/restored at loop boundary?  ISTM, this change punishes
loop invariant hoist for register pressure around/outside loop.  I
think we can rule out live through registers like below and see what
will happen:
+
+  EXECUTE_IF_SET_IN_BITMAP (&LOOP_DATA (curr_loop)->regs_live, 0, j, bi)
+    if (!DF_REGNO_FIRST_DEF (i))
+      (reg_used) ++;
+

I am also curious why the loop live in register are not counted.  It's
strange to count live out registers and use that information to guide
an optimization that happens at the point of loop live in.

I will benchmark this on aarch64 later.

Thanks,
bin
>
> Based on the above background and considering all the above points the number of liveness of the loop is a better approximation of the register pressure and
>  should be considered for the register used to do the cost analysis.
>
> Getting the loop exit edges and the source of the exit edge is the Loop latch node. This will take care of Live out of the latch node.
> Moreover adding the Live-in of the destination edge of the loop exit edges will add  the liveness after the loops.  I also tried using only liveout of the loop
> latch nodes but did not see much gains using only  liveout of the loop latch node.
>
> Thanks & Regards
> Ajit

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [RFC, Patch]: Optimized changes in the register used inside loop for LICM and IVOPTS.
  2015-11-16 23:00     ` Jeff Law
@ 2015-11-29 17:14       ` Ajit Kumar Agarwal
  2015-11-30 22:31         ` Jeff Law
  0 siblings, 1 reply; 14+ messages in thread
From: Ajit Kumar Agarwal @ 2015-11-29 17:14 UTC (permalink / raw)
  To: Jeff Law, GCC Patches
  Cc: Vinod Kathail, Shail Aditya Gupta, Vidhumouli Hunsigida, Nagaraju Mekala

Hello Jeff:

-----Original Message-----
From: Jeff Law [mailto:law@redhat.com] 
Sent: Tuesday, November 17, 2015 4:30 AM
To: Ajit Kumar Agarwal; GCC Patches
Cc: Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju Mekala
Subject: Re: [RFC, Patch]: Optimized changes in the register used inside loop for LICM and IVOPTS.

On 11/16/2015 10:36 AM, Ajit Kumar Agarwal wrote:
>> For Induction variable optimization on tree SSA representation, the 
>> register used logic is based on the number of phi nodes at the loop 
>> header to represent the liveness at the loop.  Current Logic used 
>> only the number of phi nodes at the loop header.  Changes are made to 
>> represent the phi operands also live at the loop. Thus number of phi 
>> operands also gets incremented in the number of registers used.
>>But my question is why is the # of PHI operands useful here.  You'd have a stronger argument if it was the number of unique operands in each PHI. 
 >> While I don't doubt this patch improved things, I think it's just putting a band-aid over the problem.

>>I think anything that just looks at PHIs or at register liveness at loop boundaries is inherently underestimating the register pressure implications of code motion from inside to outside a >>loop.

>>If an object is pulled out of the loop, then it's going to conflict with nearly every object that births in the loop (because the object being moved now has to live throughout the loop).  >>There's exceptions, but I doubt they matter in practice.  The object is also going to conflict with anything else that is live through the loop.  At least that's how it seems to me at first >>thought.

>>So build the set of objects (SSA_NAMEs) that either birth or are live through the loop that have the same type class as the object we want to hoist out of the loop (scalar, floating point, >>vector).  Use that set of objects to estimate register pressure.

I agree with the above.  To add up on the above, we only require to calculate the set of objects ( SSA_NAMES) that are live at the birth or the header of the loop.
We don't need to calculate the live through the Loop considering Live in and Live out of all the basic blocks of the Loop. This is because the set of objects (SSA_NAMES) 
That are live-in at the birth or header of the loop will be live-in at every node in the Loop.

If a v live out at the header of the loop then the variable is live-in at every node in the Loop. To prove this, Consider a Loop L with header h such that
The variable v defined at d is live-in at h. Since v is live at h, d is not part of L. This follows from the dominance property, i.e. h is strictly dominated by d.
Furthermore, there exists a path from h to a use of v which does not go through d. For every node of the loop, p, since the loop is strongly connected
Component of the CFG, there exists a path, consisting only of nodes of L from p to h. Concatenating those two paths prove that v is live-in and live-out
Of p.

On top of live-in at the birth or header of the loop as proven above, if we calculate the Live out of the exit block of the block and Live-in at the destination
Edge of the exit block of the loops. This consider the liveness outside of the Loop.

The above two cases forms the basis of better estimator for register pressure as far as LICM is concerned.

If you agree with the above, I will implement add the above in the patch for register_used estimates for better estimate of register pressure for LICM.

Thanks & Regards
Ajit

>>It won't be exact because some of those objects could end up coloring the same.  BUt it's probably still considerably more accurate than what we have now.

>>I suspect that would be a better estimator for register pressure as far as LICM is concerned.

jeff

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC, Patch]: Optimized changes in the register used inside loop for LICM and IVOPTS.
  2015-11-29 17:14       ` Ajit Kumar Agarwal
@ 2015-11-30 22:31         ` Jeff Law
  0 siblings, 0 replies; 14+ messages in thread
From: Jeff Law @ 2015-11-30 22:31 UTC (permalink / raw)
  To: Ajit Kumar Agarwal, GCC Patches
  Cc: Vinod Kathail, Shail Aditya Gupta, Vidhumouli Hunsigida, Nagaraju Mekala

On 11/29/2015 09:24 AM, Ajit Kumar Agarwal wrote:
>
> I agree with the above.  To add up on the above, we only require to calculate the set of objects ( SSA_NAMES) that are live at the birth or the header of the loop.
> We don't need to calculate the live through the Loop considering Live in and Live out of all the basic blocks of the Loop. This is because the set of objects (SSA_NAMES)
> That are live-in at the birth or header of the loop will be live-in at every node in the Loop.
>
> If a v live out at the header of the loop then the variable is live-in at every node in the Loop. To prove this, Consider a Loop L with header h such that
> The variable v defined at d is live-in at h. Since v is live at h, d is not part of L. This follows from the dominance property, i.e. h is strictly dominated by d.
> Furthermore, there exists a path from h to a use of v which does not go through d. For every node of the loop, p, since the loop is strongly connected
> Component of the CFG, there exists a path, consisting only of nodes of L from p to h. Concatenating those two paths prove that v is live-in and live-out
> Of p.
>
> On top of live-in at the birth or header of the loop as proven above, if we calculate the Live out of the exit block of the block and Live-in at the destination
> Edge of the exit block of the loops. This consider the liveness outside of the Loop.
>
> The above two cases forms the basis of better estimator for register pressure as far as LICM is concerned.
>
> If you agree with the above, I will implement add the above in the patch for register_used estimates for better estimate of register pressure for LICM.
Yes, I think we're in agreement.

jeff

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2015-11-30 22:27 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-10-08  4:32 [RFC, Patch]: Optimized changes in the register used inside loop for LICM and IVOPTS Ajit Kumar Agarwal
2015-10-08  4:59 ` Bin.Cheng
2015-10-08  5:54   ` Ajit Kumar Agarwal
2015-10-09  2:45     ` Bin.Cheng
2015-10-12  4:25       ` Ajit Kumar Agarwal
2015-11-13  6:13 ` Jeff Law
2015-11-13  6:32   ` Bin.Cheng
2015-11-13 10:18     ` Richard Biener
2015-11-16 17:36   ` Ajit Kumar Agarwal
2015-11-16 23:00     ` Jeff Law
2015-11-29 17:14       ` Ajit Kumar Agarwal
2015-11-30 22:31         ` Jeff Law
2015-11-16 17:57   ` Ajit Kumar Agarwal
2015-11-17  5:37     ` Bin.Cheng

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).