public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH GCC][4/4]Better handle store-stores chain if eliminated stores only store loop invariant
@ 2017-06-27 10:50 Bin Cheng
  2017-07-10  8:25 ` Bin.Cheng
  0 siblings, 1 reply; 7+ messages in thread
From: Bin Cheng @ 2017-06-27 10:50 UTC (permalink / raw)
  To: gcc-patches; +Cc: nd

[-- Attachment #1: Type: text/plain, Size: 1108 bytes --]

Hi,
This is a followup patch better handling below case:
     for (i = 0; i < n; i++)
       {
	 a[i] = 1;
	 a[i+2] = 2;
       }
Instead of generating root variables by loading from memory and propagating with PHI
nodes, like:
     t0 = a[0];
     t1 = a[1];
     for (i = 0; i < n; i++)
       {
	 a[i] = 1;
	 t2 = 2;
	 t0 = t1;
	 t1 = t2;
       }
     a[n] = t0;
     a[n+1] = t1;
We can simply store loop invariant values after loop body if we know loop iterates more
than chain->length times, like:
     for (i = 0; i < n; i++)
       {
	 a[i] = 1;
       }
     a[n] = 2;
     a[n+1] = 2;

Bootstrap(O2/O3) in patch series on x86_64 and AArch64.  Is it OK?

Thanks,
bin
2017-06-21  Bin Cheng  <bin.cheng@arm.com>

	* tree-predcom.c: (struct chain): Handle store-store chain in which
	stores for elimination only store loop invariant values.
	(execute_pred_commoning_chain): Ditto.
	(prepare_initializers_chain_store_elim): Ditto.
	(prepare_finalizers): Ditto.
	(is_inv_store_elimination_chain): New function.
	(initialize_root_vars_store_elim_1): New function.

[-- Attachment #2: inv-store-elimination-20170621.txt --]
[-- Type: text/plain, Size: 6142 bytes --]

From 16603c31d42e44f93a5de0faa0354629e669c5d0 Mon Sep 17 00:00:00 2001
From: Bin Cheng <binche01@e108451-lin.cambridge.arm.com>
Date: Wed, 21 Jun 2017 16:18:43 +0100
Subject: [PATCH 6/6] inv-store-elimination-20170621.txt

---
 gcc/tree-predcom.c | 131 ++++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 125 insertions(+), 6 deletions(-)

diff --git a/gcc/tree-predcom.c b/gcc/tree-predcom.c
index 9be93e4..8e38be4 100644
--- a/gcc/tree-predcom.c
+++ b/gcc/tree-predcom.c
@@ -327,6 +327,10 @@ typedef struct chain
 
   /* True if this chain was combined together with some other chain.  */
   unsigned combined : 1;
+
+  /* True if this is store elimination chain and eliminated stores store
+     loop invariant value into memory.  */
+  unsigned inv_store_elimination : 1;
 } *chain_p;
 
 
@@ -1630,6 +1634,98 @@ initialize_root_vars (struct loop *loop, chain_p chain, bitmap tmp_vars)
     }
 }
 
+/* For inter-iteration store elimination CHAIN in LOOP, returns true if
+   all stores to be eliminated store loop invariant values into memory.
+   In this case, we can use these invariant values directly after LOOP.  */
+
+static bool
+is_inv_store_elimination_chain (struct loop *loop, chain_p chain)
+{
+  if (chain->length == 0 || chain->type != CT_STORE_STORE)
+    return false;
+
+  gcc_assert (!chain->has_max_use_after);
+
+  /* If loop iterates for unknown times or fewer times than chain->lenght,
+     we still need to setup root variable and propagate it with PHI node.  */
+  tree niters = number_of_latch_executions (loop);
+  if (TREE_CODE (niters) != INTEGER_CST || wi::leu_p (niters, chain->length))
+    return false;
+
+  /* Check stores in chain for elimination if they only store loop invariant
+     values.  */
+  for (unsigned i = 0; i < chain->length; i++)
+    {
+      dref a = get_chain_last_ref_at (chain, i);
+      if (a == NULL)
+	continue;
+
+      gimple *def_stmt, *stmt = a->stmt;
+      if (!gimple_assign_single_p (stmt))
+	return false;
+
+      tree val = gimple_assign_rhs1 (stmt);
+      if (TREE_CLOBBER_P (val))
+	return false;
+
+      if (TREE_CODE (val) == INTEGER_CST || TREE_CODE (val) == REAL_CST)
+	continue;
+
+      if (TREE_CODE (val) != SSA_NAME)
+	return false;
+
+      def_stmt = SSA_NAME_DEF_STMT (val);
+      if (gimple_nop_p (def_stmt))
+	continue;
+
+      if (flow_bb_inside_loop_p (loop, gimple_bb (def_stmt)))
+	return false;
+    }
+  return true;
+}
+
+/* Creates root variables for store elimination CHAIN in which stores for
+   elimination only store loop invariant values.  In this case, we neither
+   need to load root variables before loop nor propagate it with PHI nodes.  */
+
+static void
+initialize_root_vars_store_elim_1 (chain_p chain)
+{
+  tree var;
+  unsigned i, n = chain->length;
+
+  chain->vars.create (n);
+  chain->vars.safe_grow_cleared (n);
+
+  /* Initialize root value for eliminated stores at each distance.  */
+  for (i = 0; i < n; i++)
+    {
+      dref a = get_chain_last_ref_at (chain, i);
+      if (a == NULL)
+	continue;
+
+      var = gimple_assign_rhs1 (a->stmt);
+      chain->vars[a->distance] = var;
+    }
+
+  /* We don't propagate values with PHI nodes, so manually propagate value
+     to bubble positions.  */
+  var = chain->vars[0];
+  for (i = 1; i < n; i++)
+    {
+      if (chain->vars[i] != NULL_TREE)
+	{
+	  var = chain->vars[i];
+	  continue;
+	}
+      chain->vars[i] = var;
+    }
+
+  /* Revert the vector.  */
+  for (i = 0; i < n / 2; i++)
+    std::swap (chain->vars[i], chain->vars[n - i - 1]);
+}
+
 /* Creates root variables for store elimination CHAIN in which stores for
    elimination store loop variant values.  In this case, we may need to
    load root variables before LOOP and propagate it with PHI nodes.  Uids
@@ -1953,10 +2049,20 @@ execute_pred_commoning_chain (struct loop *loop, chain_p chain,
     {
       if (chain->length > 0)
 	{
-	  /* For inter-iteration store elimination chain, set up the
-	     variables by loading from memory before loop, copying from rhs
-	     of stores for elimination and propagate it with PHI nodes.  */
-	  initialize_root_vars_store_elim_2 (loop, chain, tmp_vars);
+	  if (chain->inv_store_elimination)
+	    {
+	      /* If dead stores in this chain only store loop invariant
+		 values, we can simply record the invariant value and use
+		 it directly after loop.  */
+	      initialize_root_vars_store_elim_1 (chain);
+	    }
+	  else
+	    {
+	      /* If dead stores in this chain store loop variant values,
+		 we need to set up the variables by loading from memory
+		 before loop and propagating it with PHI nodes.  */
+	      initialize_root_vars_store_elim_2 (loop, chain, tmp_vars);
+	    }
 
 	  /* For inter-iteration store elimination chain, stores at each
 	     distance in loop's last (chain->length - 1) iterations can't
@@ -2657,7 +2763,7 @@ try_combine_chains (vec<chain_p> *chains)
    otherwise.  */
 
 static bool
-prepare_initializers_chain_store_elim (struct loop *, chain_p chain)
+prepare_initializers_chain_store_elim (struct loop *loop, chain_p chain)
 {
   unsigned i, n = chain->length;
 
@@ -2665,6 +2771,15 @@ prepare_initializers_chain_store_elim (struct loop *, chain_p chain)
   if (n == 0 && chain->type == CT_STORE_STORE)
     return true;
 
+  /* For store elimination chain, there is nothing to initialize if stores
+     to be eliminated only store loop invariant values into memory.  */
+  if (chain->type == CT_STORE_STORE
+      && is_inv_store_elimination_chain (loop, chain))
+    {
+      chain->inv_store_elimination = true;
+      return true;
+    }
+
   chain->inits.create (n);
   chain->inits.safe_grow_cleared (n);
 
@@ -2864,7 +2979,11 @@ prepare_finalizers (struct loop *loop, vec<chain_p> chains)
       if (prepare_finalizers_chain (loop, chain))
 	{
 	  i++;
-	  loop_closed_ssa = true;
+	  /* We don't corrupt loop closed ssa form for store elimination
+	     chain if eliminated stores only store loop invariant values
+	     into memory.  */
+	  if (!chain->inv_store_elimination)
+	    loop_closed_ssa |= (!chain->inv_store_elimination);
 	}
       else
 	{
-- 
1.9.1


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH GCC][4/4]Better handle store-stores chain if eliminated stores only store loop invariant
  2017-06-27 10:50 [PATCH GCC][4/4]Better handle store-stores chain if eliminated stores only store loop invariant Bin Cheng
@ 2017-07-10  8:25 ` Bin.Cheng
  2017-07-25 11:48   ` Richard Biener
  0 siblings, 1 reply; 7+ messages in thread
From: Bin.Cheng @ 2017-07-10  8:25 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1429 bytes --]

On Tue, Jun 27, 2017 at 11:49 AM, Bin Cheng <Bin.Cheng@arm.com> wrote:
> Hi,
> This is a followup patch better handling below case:
>      for (i = 0; i < n; i++)
>        {
>          a[i] = 1;
>          a[i+2] = 2;
>        }
> Instead of generating root variables by loading from memory and propagating with PHI
> nodes, like:
>      t0 = a[0];
>      t1 = a[1];
>      for (i = 0; i < n; i++)
>        {
>          a[i] = 1;
>          t2 = 2;
>          t0 = t1;
>          t1 = t2;
>        }
>      a[n] = t0;
>      a[n+1] = t1;
> We can simply store loop invariant values after loop body if we know loop iterates more
> than chain->length times, like:
>      for (i = 0; i < n; i++)
>        {
>          a[i] = 1;
>        }
>      a[n] = 2;
>      a[n+1] = 2;
>
> Bootstrap(O2/O3) in patch series on x86_64 and AArch64.  Is it OK?
Update patch wrto changes in previous patch.
Bootstrap and test on x86_64 and AArch64.  Is it OK?

Thanks,
bin
>
> Thanks,
> bin
> 2017-06-21  Bin Cheng  <bin.cheng@arm.com>
>
>         * tree-predcom.c: (struct chain): Handle store-store chain in which
>         stores for elimination only store loop invariant values.
>         (execute_pred_commoning_chain): Ditto.
>         (prepare_initializers_chain_store_elim): Ditto.
>         (prepare_finalizers): Ditto.
>         (is_inv_store_elimination_chain): New function.
>         (initialize_root_vars_store_elim_1): New function.

[-- Attachment #2: 0006-inv-store-elimination-20170627.txt.patch --]
[-- Type: text/x-patch, Size: 6141 bytes --]

From 7b03b8fe27ce144b557cfd10143e2531323164db Mon Sep 17 00:00:00 2001
From: Bin Cheng <binche01@e108451-lin.cambridge.arm.com>
Date: Wed, 5 Jul 2017 17:30:38 +0100
Subject: [PATCH 6/6] inv-store-elimination-20170627.txt

---
 gcc/tree-predcom.c | 131 ++++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 125 insertions(+), 6 deletions(-)

diff --git a/gcc/tree-predcom.c b/gcc/tree-predcom.c
index ac0d0d0..d2738a5 100644
--- a/gcc/tree-predcom.c
+++ b/gcc/tree-predcom.c
@@ -331,6 +331,10 @@ typedef struct chain
 
   /* True if this chain was combined together with some other chain.  */
   unsigned combined : 1;
+
+  /* True if this is store elimination chain and eliminated stores store
+     loop invariant value into memory.  */
+  unsigned inv_store_elimination : 1;
 } *chain_p;
 
 
@@ -1634,6 +1638,98 @@ initialize_root_vars (struct loop *loop, chain_p chain, bitmap tmp_vars)
     }
 }
 
+/* For inter-iteration store elimination CHAIN in LOOP, returns true if
+   all stores to be eliminated store loop invariant values into memory.
+   In this case, we can use these invariant values directly after LOOP.  */
+
+static bool
+is_inv_store_elimination_chain (struct loop *loop, chain_p chain)
+{
+  if (chain->length == 0 || chain->type != CT_STORE_STORE)
+    return false;
+
+  gcc_assert (!chain->has_max_use_after);
+
+  /* If loop iterates for unknown times or fewer times than chain->lenght,
+     we still need to setup root variable and propagate it with PHI node.  */
+  tree niters = number_of_latch_executions (loop);
+  if (TREE_CODE (niters) != INTEGER_CST || wi::leu_p (niters, chain->length))
+    return false;
+
+  /* Check stores in chain for elimination if they only store loop invariant
+     values.  */
+  for (unsigned i = 0; i < chain->length; i++)
+    {
+      dref a = get_chain_last_ref_at (chain, i);
+      if (a == NULL)
+	continue;
+
+      gimple *def_stmt, *stmt = a->stmt;
+      if (!gimple_assign_single_p (stmt))
+	return false;
+
+      tree val = gimple_assign_rhs1 (stmt);
+      if (TREE_CLOBBER_P (val))
+	return false;
+
+      if (TREE_CODE (val) == INTEGER_CST || TREE_CODE (val) == REAL_CST)
+	continue;
+
+      if (TREE_CODE (val) != SSA_NAME)
+	return false;
+
+      def_stmt = SSA_NAME_DEF_STMT (val);
+      if (gimple_nop_p (def_stmt))
+	continue;
+
+      if (flow_bb_inside_loop_p (loop, gimple_bb (def_stmt)))
+	return false;
+    }
+  return true;
+}
+
+/* Creates root variables for store elimination CHAIN in which stores for
+   elimination only store loop invariant values.  In this case, we neither
+   need to load root variables before loop nor propagate it with PHI nodes.  */
+
+static void
+initialize_root_vars_store_elim_1 (chain_p chain)
+{
+  tree var;
+  unsigned i, n = chain->length;
+
+  chain->vars.create (n);
+  chain->vars.safe_grow_cleared (n);
+
+  /* Initialize root value for eliminated stores at each distance.  */
+  for (i = 0; i < n; i++)
+    {
+      dref a = get_chain_last_ref_at (chain, i);
+      if (a == NULL)
+	continue;
+
+      var = gimple_assign_rhs1 (a->stmt);
+      chain->vars[a->distance] = var;
+    }
+
+  /* We don't propagate values with PHI nodes, so manually propagate value
+     to bubble positions.  */
+  var = chain->vars[0];
+  for (i = 1; i < n; i++)
+    {
+      if (chain->vars[i] != NULL_TREE)
+	{
+	  var = chain->vars[i];
+	  continue;
+	}
+      chain->vars[i] = var;
+    }
+
+  /* Revert the vector.  */
+  for (i = 0; i < n / 2; i++)
+    std::swap (chain->vars[i], chain->vars[n - i - 1]);
+}
+
 /* Creates root variables for store elimination CHAIN in which stores for
    elimination store loop variant values.  In this case, we may need to
    load root variables before LOOP and propagate it with PHI nodes.  Uids
@@ -1957,10 +2053,20 @@ execute_pred_commoning_chain (struct loop *loop, chain_p chain,
     {
       if (chain->length > 0)
 	{
-	  /* For inter-iteration store elimination chain, set up the
-	     variables by loading from memory before loop, copying from rhs
-	     of stores for elimination and propagate it with PHI nodes.  */
-	  initialize_root_vars_store_elim_2 (loop, chain, tmp_vars);
+	  if (chain->inv_store_elimination)
+	    {
+	      /* If dead stores in this chain only store loop invariant
+		 values, we can simply record the invariant value and use
+		 it directly after loop.  */
+	      initialize_root_vars_store_elim_1 (chain);
+	    }
+	  else
+	    {
+	      /* If dead stores in this chain store loop variant values,
+		 we need to set up the variables by loading from memory
+		 before loop and propagating it with PHI nodes.  */
+	      initialize_root_vars_store_elim_2 (loop, chain, tmp_vars);
+	    }
 
 	  /* For inter-iteration store elimination chain, stores at each
 	     distance in loop's last (chain->length - 1) iterations can't
@@ -2661,7 +2767,7 @@ try_combine_chains (vec<chain_p> *chains)
    otherwise.  */
 
 static bool
-prepare_initializers_chain_store_elim (struct loop *, chain_p chain)
+prepare_initializers_chain_store_elim (struct loop *loop, chain_p chain)
 {
   unsigned i, n = chain->length;
 
@@ -2674,6 +2780,15 @@ prepare_initializers_chain_store_elim (struct loop *, chain_p chain)
   if (n == 0 && chain->type == CT_STORE_STORE)
     return true;
 
+  /* For store elimination chain, there is nothing to initialize if stores
+     to be eliminated only store loop invariant values into memory.  */
+  if (chain->type == CT_STORE_STORE
+      && is_inv_store_elimination_chain (loop, chain))
+    {
+      chain->inv_store_elimination = true;
+      return true;
+    }
+
   chain->inits.create (n);
   chain->inits.safe_grow_cleared (n);
 
@@ -2866,7 +2981,11 @@ prepare_finalizers (struct loop *loop, vec<chain_p> chains)
       if (prepare_finalizers_chain (loop, chain))
 	{
 	  i++;
-	  loop_closed_ssa = true;
+	  /* We don't corrupt loop closed ssa form for store elimination
+	     chain if eliminated stores only store loop invariant values
+	     into memory.  */
+	  if (!chain->inv_store_elimination)
+	    loop_closed_ssa |= (!chain->inv_store_elimination);
 	}
       else
 	{
-- 
1.9.1


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH GCC][4/4]Better handle store-stores chain if eliminated stores only store loop invariant
  2017-07-10  8:25 ` Bin.Cheng
@ 2017-07-25 11:48   ` Richard Biener
  2017-07-25 12:38     ` Bin.Cheng
  2017-07-28 15:42     ` Bin.Cheng
  0 siblings, 2 replies; 7+ messages in thread
From: Richard Biener @ 2017-07-25 11:48 UTC (permalink / raw)
  To: Bin.Cheng; +Cc: gcc-patches

On Mon, Jul 10, 2017 at 10:24 AM, Bin.Cheng <amker.cheng@gmail.com> wrote:
> On Tue, Jun 27, 2017 at 11:49 AM, Bin Cheng <Bin.Cheng@arm.com> wrote:
>> Hi,
>> This is a followup patch better handling below case:
>>      for (i = 0; i < n; i++)
>>        {
>>          a[i] = 1;
>>          a[i+2] = 2;
>>        }
>> Instead of generating root variables by loading from memory and propagating with PHI
>> nodes, like:
>>      t0 = a[0];
>>      t1 = a[1];
>>      for (i = 0; i < n; i++)
>>        {
>>          a[i] = 1;
>>          t2 = 2;
>>          t0 = t1;
>>          t1 = t2;
>>        }
>>      a[n] = t0;
>>      a[n+1] = t1;
>> We can simply store loop invariant values after loop body if we know loop iterates more
>> than chain->length times, like:
>>      for (i = 0; i < n; i++)
>>        {
>>          a[i] = 1;
>>        }
>>      a[n] = 2;
>>      a[n+1] = 2;
>>
>> Bootstrap(O2/O3) in patch series on x86_64 and AArch64.  Is it OK?
> Update patch wrto changes in previous patch.
> Bootstrap and test on x86_64 and AArch64.  Is it OK?

+      if (TREE_CODE (val) == INTEGER_CST || TREE_CODE (val) == REAL_CST)
+       continue;

Please use CONSTANT_CLASS_P (val) instead.  I suppose VECTOR_CST or
FIXED_CST would be ok as well for example.

Ok with that change.  Did we eventually optimize this in followup
passes previously?

Richard.

> Thanks,
> bin
>>
>> Thanks,
>> bin
>> 2017-06-21  Bin Cheng  <bin.cheng@arm.com>
>>
>>         * tree-predcom.c: (struct chain): Handle store-store chain in which
>>         stores for elimination only store loop invariant values.
>>         (execute_pred_commoning_chain): Ditto.
>>         (prepare_initializers_chain_store_elim): Ditto.
>>         (prepare_finalizers): Ditto.
>>         (is_inv_store_elimination_chain): New function.
>>         (initialize_root_vars_store_elim_1): New function.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH GCC][4/4]Better handle store-stores chain if eliminated stores only store loop invariant
  2017-07-25 11:48   ` Richard Biener
@ 2017-07-25 12:38     ` Bin.Cheng
  2017-07-25 12:58       ` Richard Biener
  2017-07-28 15:42     ` Bin.Cheng
  1 sibling, 1 reply; 7+ messages in thread
From: Bin.Cheng @ 2017-07-25 12:38 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches

On Tue, Jul 25, 2017 at 12:48 PM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Mon, Jul 10, 2017 at 10:24 AM, Bin.Cheng <amker.cheng@gmail.com> wrote:
>> On Tue, Jun 27, 2017 at 11:49 AM, Bin Cheng <Bin.Cheng@arm.com> wrote:
>>> Hi,
>>> This is a followup patch better handling below case:
>>>      for (i = 0; i < n; i++)
>>>        {
>>>          a[i] = 1;
>>>          a[i+2] = 2;
>>>        }
>>> Instead of generating root variables by loading from memory and propagating with PHI
>>> nodes, like:
>>>      t0 = a[0];
>>>      t1 = a[1];
>>>      for (i = 0; i < n; i++)
>>>        {
>>>          a[i] = 1;
>>>          t2 = 2;
>>>          t0 = t1;
>>>          t1 = t2;
>>>        }
>>>      a[n] = t0;
>>>      a[n+1] = t1;
>>> We can simply store loop invariant values after loop body if we know loop iterates more
>>> than chain->length times, like:
>>>      for (i = 0; i < n; i++)
>>>        {
>>>          a[i] = 1;
>>>        }
>>>      a[n] = 2;
>>>      a[n+1] = 2;
>>>
>>> Bootstrap(O2/O3) in patch series on x86_64 and AArch64.  Is it OK?
>> Update patch wrto changes in previous patch.
>> Bootstrap and test on x86_64 and AArch64.  Is it OK?
>
> +      if (TREE_CODE (val) == INTEGER_CST || TREE_CODE (val) == REAL_CST)
> +       continue;
>
> Please use CONSTANT_CLASS_P (val) instead.  I suppose VECTOR_CST or
> FIXED_CST would be ok as well for example.
>
> Ok with that change.  Did we eventually optimize this in followup
> passes previously?
Probably not?  Given below test:

int a[10000], b[10000], c[10000];
int f(void)
{
  int i, n = 100;
  int t0 = a[0];
  int t1 = a[1];
     for (i = 0; i < n; i++)
       {
         a[i] = 1;
         int t2 = 2;
         t0 = t1;
         t1 = t2;
       }
     a[n] = t0;
     a[n+1] = t1;
  return 0;
}
The optimized dump is as:

  <bb 2> [1.00%] [count: INV]:
  t1_8 = a[1];
  ivtmp.9_17 = (unsigned long) &a;
  _16 = ivtmp.9_17 + 400;

  <bb 3> [99.00%] [count: INV]:
  # t1_20 = PHI <2(3), t1_8(2)>
  # ivtmp.9_2 = PHI <ivtmp.9_1(3), ivtmp.9_17(2)>
  _15 = (void *) ivtmp.9_2;
  MEM[base: _15, offset: 0B] = 1;
  ivtmp.9_1 = ivtmp.9_2 + 4;
  if (ivtmp.9_1 != _16)
    goto <bb 3>; [98.99%] [count: INV]
  else
    goto <bb 4>; [1.01%] [count: INV]

  <bb 4> [1.00%] [count: INV]:
  a[100] = t1_20;
  a[101] = 2;
  return 0;

We now eliminate one phi and leave another behind.  It is vrp1/dce2
when the phi is eliminated.

Thanks,
bin

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH GCC][4/4]Better handle store-stores chain if eliminated stores only store loop invariant
  2017-07-25 12:38     ` Bin.Cheng
@ 2017-07-25 12:58       ` Richard Biener
  2017-07-25 13:11         ` Bin.Cheng
  0 siblings, 1 reply; 7+ messages in thread
From: Richard Biener @ 2017-07-25 12:58 UTC (permalink / raw)
  To: Bin.Cheng; +Cc: gcc-patches

On Tue, Jul 25, 2017 at 2:38 PM, Bin.Cheng <amker.cheng@gmail.com> wrote:
> On Tue, Jul 25, 2017 at 12:48 PM, Richard Biener
> <richard.guenther@gmail.com> wrote:
>> On Mon, Jul 10, 2017 at 10:24 AM, Bin.Cheng <amker.cheng@gmail.com> wrote:
>>> On Tue, Jun 27, 2017 at 11:49 AM, Bin Cheng <Bin.Cheng@arm.com> wrote:
>>>> Hi,
>>>> This is a followup patch better handling below case:
>>>>      for (i = 0; i < n; i++)
>>>>        {
>>>>          a[i] = 1;
>>>>          a[i+2] = 2;
>>>>        }
>>>> Instead of generating root variables by loading from memory and propagating with PHI
>>>> nodes, like:
>>>>      t0 = a[0];
>>>>      t1 = a[1];
>>>>      for (i = 0; i < n; i++)
>>>>        {
>>>>          a[i] = 1;
>>>>          t2 = 2;
>>>>          t0 = t1;
>>>>          t1 = t2;
>>>>        }
>>>>      a[n] = t0;
>>>>      a[n+1] = t1;
>>>> We can simply store loop invariant values after loop body if we know loop iterates more
>>>> than chain->length times, like:
>>>>      for (i = 0; i < n; i++)
>>>>        {
>>>>          a[i] = 1;
>>>>        }
>>>>      a[n] = 2;
>>>>      a[n+1] = 2;
>>>>
>>>> Bootstrap(O2/O3) in patch series on x86_64 and AArch64.  Is it OK?
>>> Update patch wrto changes in previous patch.
>>> Bootstrap and test on x86_64 and AArch64.  Is it OK?
>>
>> +      if (TREE_CODE (val) == INTEGER_CST || TREE_CODE (val) == REAL_CST)
>> +       continue;
>>
>> Please use CONSTANT_CLASS_P (val) instead.  I suppose VECTOR_CST or
>> FIXED_CST would be ok as well for example.
>>
>> Ok with that change.  Did we eventually optimize this in followup
>> passes previously?
> Probably not?  Given below test:
>
> int a[10000], b[10000], c[10000];
> int f(void)
> {
>   int i, n = 100;
>   int t0 = a[0];
>   int t1 = a[1];
>      for (i = 0; i < n; i++)
>        {
>          a[i] = 1;
>          int t2 = 2;
>          t0 = t1;
>          t1 = t2;
>        }
>      a[n] = t0;
>      a[n+1] = t1;
>   return 0;
> }
> The optimized dump is as:
>
>   <bb 2> [1.00%] [count: INV]:
>   t1_8 = a[1];
>   ivtmp.9_17 = (unsigned long) &a;
>   _16 = ivtmp.9_17 + 400;
>
>   <bb 3> [99.00%] [count: INV]:
>   # t1_20 = PHI <2(3), t1_8(2)>
>   # ivtmp.9_2 = PHI <ivtmp.9_1(3), ivtmp.9_17(2)>
>   _15 = (void *) ivtmp.9_2;
>   MEM[base: _15, offset: 0B] = 1;
>   ivtmp.9_1 = ivtmp.9_2 + 4;
>   if (ivtmp.9_1 != _16)
>     goto <bb 3>; [98.99%] [count: INV]
>   else
>     goto <bb 4>; [1.01%] [count: INV]
>
>   <bb 4> [1.00%] [count: INV]:
>   a[100] = t1_20;
>   a[101] = 2;
>   return 0;
>
> We now eliminate one phi and leave another behind.  It is vrp1/dce2
> when the phi is eliminated.

Ok, I see.  Maybe worth filing a missed optimization PR.

Richard.

> Thanks,
> bin

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH GCC][4/4]Better handle store-stores chain if eliminated stores only store loop invariant
  2017-07-25 12:58       ` Richard Biener
@ 2017-07-25 13:11         ` Bin.Cheng
  0 siblings, 0 replies; 7+ messages in thread
From: Bin.Cheng @ 2017-07-25 13:11 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches

On Tue, Jul 25, 2017 at 1:57 PM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Tue, Jul 25, 2017 at 2:38 PM, Bin.Cheng <amker.cheng@gmail.com> wrote:
>> On Tue, Jul 25, 2017 at 12:48 PM, Richard Biener
>> <richard.guenther@gmail.com> wrote:
>>> On Mon, Jul 10, 2017 at 10:24 AM, Bin.Cheng <amker.cheng@gmail.com> wrote:
>>>> On Tue, Jun 27, 2017 at 11:49 AM, Bin Cheng <Bin.Cheng@arm.com> wrote:
>>>>> Hi,
>>>>> This is a followup patch better handling below case:
>>>>>      for (i = 0; i < n; i++)
>>>>>        {
>>>>>          a[i] = 1;
>>>>>          a[i+2] = 2;
>>>>>        }
>>>>> Instead of generating root variables by loading from memory and propagating with PHI
>>>>> nodes, like:
>>>>>      t0 = a[0];
>>>>>      t1 = a[1];
>>>>>      for (i = 0; i < n; i++)
>>>>>        {
>>>>>          a[i] = 1;
>>>>>          t2 = 2;
>>>>>          t0 = t1;
>>>>>          t1 = t2;
>>>>>        }
>>>>>      a[n] = t0;
>>>>>      a[n+1] = t1;
>>>>> We can simply store loop invariant values after loop body if we know loop iterates more
>>>>> than chain->length times, like:
>>>>>      for (i = 0; i < n; i++)
>>>>>        {
>>>>>          a[i] = 1;
>>>>>        }
>>>>>      a[n] = 2;
>>>>>      a[n+1] = 2;
>>>>>
>>>>> Bootstrap(O2/O3) in patch series on x86_64 and AArch64.  Is it OK?
>>>> Update patch wrto changes in previous patch.
>>>> Bootstrap and test on x86_64 and AArch64.  Is it OK?
>>>
>>> +      if (TREE_CODE (val) == INTEGER_CST || TREE_CODE (val) == REAL_CST)
>>> +       continue;
>>>
>>> Please use CONSTANT_CLASS_P (val) instead.  I suppose VECTOR_CST or
>>> FIXED_CST would be ok as well for example.
>>>
>>> Ok with that change.  Did we eventually optimize this in followup
>>> passes previously?
>> Probably not?  Given below test:
>>
>> int a[10000], b[10000], c[10000];
>> int f(void)
>> {
>>   int i, n = 100;
>>   int t0 = a[0];
>>   int t1 = a[1];
>>      for (i = 0; i < n; i++)
>>        {
>>          a[i] = 1;
>>          int t2 = 2;
>>          t0 = t1;
>>          t1 = t2;
>>        }
>>      a[n] = t0;
>>      a[n+1] = t1;
>>   return 0;
>> }
>> The optimized dump is as:
>>
>>   <bb 2> [1.00%] [count: INV]:
>>   t1_8 = a[1];
>>   ivtmp.9_17 = (unsigned long) &a;
>>   _16 = ivtmp.9_17 + 400;
>>
>>   <bb 3> [99.00%] [count: INV]:
>>   # t1_20 = PHI <2(3), t1_8(2)>
>>   # ivtmp.9_2 = PHI <ivtmp.9_1(3), ivtmp.9_17(2)>
>>   _15 = (void *) ivtmp.9_2;
>>   MEM[base: _15, offset: 0B] = 1;
>>   ivtmp.9_1 = ivtmp.9_2 + 4;
>>   if (ivtmp.9_1 != _16)
>>     goto <bb 3>; [98.99%] [count: INV]
>>   else
>>     goto <bb 4>; [1.01%] [count: INV]
>>
>>   <bb 4> [1.00%] [count: INV]:
>>   a[100] = t1_20;
>>   a[101] = 2;
>>   return 0;
>>
>> We now eliminate one phi and leave another behind.  It is vrp1/dce2
>> when the phi is eliminated.
>
> Ok, I see.  Maybe worth filing a missed optimization PR.
Right, PR81549 filed.

Thanks,
bin
>
> Richard.
>
>> Thanks,
>> bin

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH GCC][4/4]Better handle store-stores chain if eliminated stores only store loop invariant
  2017-07-25 11:48   ` Richard Biener
  2017-07-25 12:38     ` Bin.Cheng
@ 2017-07-28 15:42     ` Bin.Cheng
  1 sibling, 0 replies; 7+ messages in thread
From: Bin.Cheng @ 2017-07-28 15:42 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches

On Tue, Jul 25, 2017 at 12:48 PM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Mon, Jul 10, 2017 at 10:24 AM, Bin.Cheng <amker.cheng@gmail.com> wrote:
>> On Tue, Jun 27, 2017 at 11:49 AM, Bin Cheng <Bin.Cheng@arm.com> wrote:
>>> Hi,
>>> This is a followup patch better handling below case:
>>>      for (i = 0; i < n; i++)
>>>        {
>>>          a[i] = 1;
>>>          a[i+2] = 2;
>>>        }
>>> Instead of generating root variables by loading from memory and propagating with PHI
>>> nodes, like:
>>>      t0 = a[0];
>>>      t1 = a[1];
>>>      for (i = 0; i < n; i++)
>>>        {
>>>          a[i] = 1;
>>>          t2 = 2;
>>>          t0 = t1;
>>>          t1 = t2;
>>>        }
>>>      a[n] = t0;
>>>      a[n+1] = t1;
>>> We can simply store loop invariant values after loop body if we know loop iterates more
>>> than chain->length times, like:
>>>      for (i = 0; i < n; i++)
>>>        {
>>>          a[i] = 1;
>>>        }
>>>      a[n] = 2;
>>>      a[n+1] = 2;
>>>
>>> Bootstrap(O2/O3) in patch series on x86_64 and AArch64.  Is it OK?
>> Update patch wrto changes in previous patch.
>> Bootstrap and test on x86_64 and AArch64.  Is it OK?
>
> +      if (TREE_CODE (val) == INTEGER_CST || TREE_CODE (val) == REAL_CST)
> +       continue;
>
> Please use CONSTANT_CLASS_P (val) instead.  I suppose VECTOR_CST or
> FIXED_CST would be ok as well for example.
>
> Ok with that change.  Did we eventually optimize this in followup
Patch revised as suggested.  I retested whole patch set and applied as
250665 ~ 250670.  Will keep eyes on possible breakage in next couple
weeks.

Thanks,
bin
> passes previously?
>
> Richard.
>
>> Thanks,
>> bin
>>>
>>> Thanks,
>>> bin
>>> 2017-06-21  Bin Cheng  <bin.cheng@arm.com>
>>>
>>>         * tree-predcom.c: (struct chain): Handle store-store chain in which
>>>         stores for elimination only store loop invariant values.
>>>         (execute_pred_commoning_chain): Ditto.
>>>         (prepare_initializers_chain_store_elim): Ditto.
>>>         (prepare_finalizers): Ditto.
>>>         (is_inv_store_elimination_chain): New function.
>>>         (initialize_root_vars_store_elim_1): New function.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2017-07-28 15:42 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-06-27 10:50 [PATCH GCC][4/4]Better handle store-stores chain if eliminated stores only store loop invariant Bin Cheng
2017-07-10  8:25 ` Bin.Cheng
2017-07-25 11:48   ` Richard Biener
2017-07-25 12:38     ` Bin.Cheng
2017-07-25 12:58       ` Richard Biener
2017-07-25 13:11         ` Bin.Cheng
2017-07-28 15:42     ` Bin.Cheng

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).