public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH PR71347][Partial revert r235513]Compute cost for all uses in group
@ 2016-06-13  9:57 Bin Cheng
  2016-06-13 10:10 ` Richard Biener
  2016-06-18  8:59 ` Andreas Schwab
  0 siblings, 2 replies; 7+ messages in thread
From: Bin Cheng @ 2016-06-13  9:57 UTC (permalink / raw)
  To: gcc-patches; +Cc: nd

[-- Attachment #1: Type: text/plain, Size: 701 bytes --]

Hi,
This patch partially reverts part of r235513 to fix PR71347, the original patch is to improve compilation time for a small amount.  Root cause as analyzed in bugzilla PR is that we can't skip computing cost for sub iv_use if it has different position to the first use in group.  The patch also includes a new test.

Bootstrap and test on x86_64.  Is it OK?

Thanks,
bin

2016-05-31  Bin Cheng  <bin.cheng@arm.com>

	PR tree-optimization/71347
	* tree-ssa-loop-ivopts.c (determine_group_iv_cost_address): Compute
	cost for all uses in group.

gcc/testsuite/ChangeLog
2016-05-31  Bin Cheng  <bin.cheng@arm.com>

	PR tree-optimization/71347
	* gcc.dg/tree-ssa/pr71347.c: New test.

[-- Attachment #2: pr71347-20160531.txt --]
[-- Type: text/plain, Size: 2540 bytes --]

diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index 1e8d637..25b9780 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -5115,7 +5115,7 @@ determine_group_iv_cost_address (struct ivopts_data *data,
 {
   unsigned i;
   bitmap depends_on;
-  bool can_autoinc, first = true;
+  bool can_autoinc;
   iv_inv_expr_ent *inv_expr = NULL;
   struct iv_use *use = group->vuses[0];
   comp_cost sum_cost = no_cost, cost;
@@ -5142,30 +5142,11 @@ determine_group_iv_cost_address (struct ivopts_data *data,
     {
       struct iv_use *next = group->vuses[i];
 
-      /* Compute cost for the first use with different offset to the main
-	 use and add it afterwards.  Costs for these uses could be quite
-	 different.  Given below uses in a group:
-	   use 0  : {base + A + offset_0, step}
-	   use 0.1: {base + A + offset_0, step}
-	   use 0.2: {base + A + offset_1, step}
-	   use 0.3: {base + A + offset_2, step}
-	 when we need to compute costs with candidate:
-	   cand 1 : {base + B + offset_0, step}
-
-	 The first use with different offset is use 0.2, its cost is larger
-	 than cost of use 0/0.1 because we need to compute:
-	   A - B + offset_1 - offset_0
-	   rather than:
-	   A - B.  */
-      if (first && next->addr_offset != use->addr_offset)
-	{
-	  first = false;
-	  cost = get_computation_cost (data, next, cand, true,
-				       NULL, &can_autoinc, NULL);
-	  /* Remove setup cost.  */
-	  if (!cost.infinite_cost_p ())
-	    cost -= cost.scratch;
-	}
+      /* TODO: We could skip computing cost for sub iv_use when it has the
+	 same cost as the first iv_use, but the cost really depends on the
+	 offset and where the iv_use is.  */
+	cost = get_computation_cost (data, next, cand, true,
+				     NULL, &can_autoinc, NULL);
       sum_cost += cost;
     }
   set_group_iv_cost (data, group, cand, sum_cost, depends_on,
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr71347.c b/gcc/testsuite/gcc.dg/tree-ssa/pr71347.c
new file mode 100644
index 0000000..7e5ad49
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr71347.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+double in;
+extern void Write (double);
+void foo (void)
+{
+  static double X[9];
+  int i;
+        X[1] = in * in; 
+        for (i = 2; i <= 8; i++)
+            X[i] = X[i - 1] * X[1]; 
+        Write (X[5]);
+}
+
+/* Load of X[i - i] can be omitted by reusing X[i] in previous iteration.  */
+/* { dg-final { scan-tree-dump-not ".* = MEM.*;" "optimized"} } */

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH PR71347][Partial revert r235513]Compute cost for all uses in group
  2016-06-13  9:57 [PATCH PR71347][Partial revert r235513]Compute cost for all uses in group Bin Cheng
@ 2016-06-13 10:10 ` Richard Biener
  2016-06-18  8:59 ` Andreas Schwab
  1 sibling, 0 replies; 7+ messages in thread
From: Richard Biener @ 2016-06-13 10:10 UTC (permalink / raw)
  To: Bin Cheng; +Cc: gcc-patches, nd

On Mon, Jun 13, 2016 at 11:57 AM, Bin Cheng <Bin.Cheng@arm.com> wrote:
> Hi,
> This patch partially reverts part of r235513 to fix PR71347, the original patch is to improve compilation time for a small amount.  Root cause as analyzed in bugzilla PR is that we can't skip computing cost for sub iv_use if it has different position to the first use in group.  The patch also includes a new test.
>
> Bootstrap and test on x86_64.  Is it OK?

Ok.

Richard.

> Thanks,
> bin
>
> 2016-05-31  Bin Cheng  <bin.cheng@arm.com>
>
>         PR tree-optimization/71347
>         * tree-ssa-loop-ivopts.c (determine_group_iv_cost_address): Compute
>         cost for all uses in group.
>
> gcc/testsuite/ChangeLog
> 2016-05-31  Bin Cheng  <bin.cheng@arm.com>
>
>         PR tree-optimization/71347
>         * gcc.dg/tree-ssa/pr71347.c: New test.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH PR71347][Partial revert r235513]Compute cost for all uses in group
  2016-06-13  9:57 [PATCH PR71347][Partial revert r235513]Compute cost for all uses in group Bin Cheng
  2016-06-13 10:10 ` Richard Biener
@ 2016-06-18  8:59 ` Andreas Schwab
  2016-06-20  8:18   ` Christophe Lyon
  1 sibling, 1 reply; 7+ messages in thread
From: Andreas Schwab @ 2016-06-18  8:59 UTC (permalink / raw)
  To: Bin Cheng; +Cc: gcc-patches, nd

Bin Cheng <Bin.Cheng@arm.com> writes:

> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr71347.c b/gcc/testsuite/gcc.dg/tree-ssa/pr71347.c
> new file mode 100644
> index 0000000..7e5ad49
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr71347.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +
> +double in;
> +extern void Write (double);
> +void foo (void)
> +{
> +  static double X[9];
> +  int i;
> +        X[1] = in * in; 
> +        for (i = 2; i <= 8; i++)
> +            X[i] = X[i - 1] * X[1]; 
> +        Write (X[5]);
> +}
> +
> +/* Load of X[i - i] can be omitted by reusing X[i] in previous iteration.  */
> +/* { dg-final { scan-tree-dump-not ".* = MEM.*;" "optimized"} } */

The test fails on ia64, this is what I get in .optimized:

;; Function foo (foo, funcdef_no=0, decl_uid=1387, cgraph_uid=0, symbol_order=1)

foo ()
{
  int i;
  static double X[9];
  double in.0_1;
  double in.1_2;
  double _3;
  int _4;
  double _5;
  double _6;
  double _7;
  double _8;

  <bb 2>:
  in.0_1 = in;
  in.1_2 = in;
  _3 = in.0_1 * in.1_2;
  X[1] = _3;
  i_13 = 2;
  goto <bb 4>;

  <bb 3>:
  _4 = i_9 + -1;
  _5 = X[_4];
  _6 = X[1];
  _7 = _5 * _6;
  X[i_9] = _7;
  i_15 = i_9 + 1;

  <bb 4>:
  # i_9 = PHI <i_13(2), i_15(3)>
  if (i_9 <= 8)
    goto <bb 3>;
  else
    goto <bb 5>;

  <bb 5>:
  _8 = X[5];
  Write (_8);
  return;

}


Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH PR71347][Partial revert r235513]Compute cost for all uses in group
  2016-06-18  8:59 ` Andreas Schwab
@ 2016-06-20  8:18   ` Christophe Lyon
  2016-06-20  8:20     ` Bin.Cheng
  0 siblings, 1 reply; 7+ messages in thread
From: Christophe Lyon @ 2016-06-20  8:18 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: Bin Cheng, gcc-patches, nd

On 18 June 2016 at 10:59, Andreas Schwab <schwab@linux-m68k.org> wrote:
> Bin Cheng <Bin.Cheng@arm.com> writes:
>
>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr71347.c b/gcc/testsuite/gcc.dg/tree-ssa/pr71347.c
>> new file mode 100644
>> index 0000000..7e5ad49
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr71347.c
>> @@ -0,0 +1,17 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2 -fdump-tree-optimized" } */
>> +
>> +double in;
>> +extern void Write (double);
>> +void foo (void)
>> +{
>> +  static double X[9];
>> +  int i;
>> +        X[1] = in * in;
>> +        for (i = 2; i <= 8; i++)
>> +            X[i] = X[i - 1] * X[1];
>> +        Write (X[5]);
>> +}
>> +
>> +/* Load of X[i - i] can be omitted by reusing X[i] in previous iteration.  */
>> +/* { dg-final { scan-tree-dump-not ".* = MEM.*;" "optimized"} } */
>
> The test fails on ia64, this is what I get in .optimized:
>

The test passes on aarch64, but fails on arm targets. Maybe that's
easier for Bin to reproduce?

Christophe

> ;; Function foo (foo, funcdef_no=0, decl_uid=1387, cgraph_uid=0, symbol_order=1)
>
> foo ()
> {
>   int i;
>   static double X[9];
>   double in.0_1;
>   double in.1_2;
>   double _3;
>   int _4;
>   double _5;
>   double _6;
>   double _7;
>   double _8;
>
>   <bb 2>:
>   in.0_1 = in;
>   in.1_2 = in;
>   _3 = in.0_1 * in.1_2;
>   X[1] = _3;
>   i_13 = 2;
>   goto <bb 4>;
>
>   <bb 3>:
>   _4 = i_9 + -1;
>   _5 = X[_4];
>   _6 = X[1];
>   _7 = _5 * _6;
>   X[i_9] = _7;
>   i_15 = i_9 + 1;
>
>   <bb 4>:
>   # i_9 = PHI <i_13(2), i_15(3)>
>   if (i_9 <= 8)
>     goto <bb 3>;
>   else
>     goto <bb 5>;
>
>   <bb 5>:
>   _8 = X[5];
>   Write (_8);
>   return;
>
> }
>
>
> Andreas.
>
> --
> Andreas Schwab, schwab@linux-m68k.org
> GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
> "And now for something completely different."

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH PR71347][Partial revert r235513]Compute cost for all uses in group
  2016-06-20  8:18   ` Christophe Lyon
@ 2016-06-20  8:20     ` Bin.Cheng
  2016-06-20  9:11       ` Bin.Cheng
  0 siblings, 1 reply; 7+ messages in thread
From: Bin.Cheng @ 2016-06-20  8:20 UTC (permalink / raw)
  To: Christophe Lyon; +Cc: Andreas Schwab, Bin Cheng, gcc-patches, nd

On Mon, Jun 20, 2016 at 9:18 AM, Christophe Lyon
<christophe.lyon@linaro.org> wrote:
> On 18 June 2016 at 10:59, Andreas Schwab <schwab@linux-m68k.org> wrote:
>> Bin Cheng <Bin.Cheng@arm.com> writes:
>>
>>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr71347.c b/gcc/testsuite/gcc.dg/tree-ssa/pr71347.c
>>> new file mode 100644
>>> index 0000000..7e5ad49
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr71347.c
>>> @@ -0,0 +1,17 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-O2 -fdump-tree-optimized" } */
>>> +
>>> +double in;
>>> +extern void Write (double);
>>> +void foo (void)
>>> +{
>>> +  static double X[9];
>>> +  int i;
>>> +        X[1] = in * in;
>>> +        for (i = 2; i <= 8; i++)
>>> +            X[i] = X[i - 1] * X[1];
>>> +        Write (X[5]);
>>> +}
>>> +
>>> +/* Load of X[i - i] can be omitted by reusing X[i] in previous iteration.  */
>>> +/* { dg-final { scan-tree-dump-not ".* = MEM.*;" "optimized"} } */
>>
>> The test fails on ia64, this is what I get in .optimized:
>>
>
> The test passes on aarch64, but fails on arm targets. Maybe that's
> easier for Bin to reproduce?
Hi all,
Sorry for the inconvenience, I will have a look at the two targets.

Thanks,
bin
>
> Christophe
>
>> ;; Function foo (foo, funcdef_no=0, decl_uid=1387, cgraph_uid=0, symbol_order=1)
>>
>> foo ()
>> {
>>   int i;
>>   static double X[9];
>>   double in.0_1;
>>   double in.1_2;
>>   double _3;
>>   int _4;
>>   double _5;
>>   double _6;
>>   double _7;
>>   double _8;
>>
>>   <bb 2>:
>>   in.0_1 = in;
>>   in.1_2 = in;
>>   _3 = in.0_1 * in.1_2;
>>   X[1] = _3;
>>   i_13 = 2;
>>   goto <bb 4>;
>>
>>   <bb 3>:
>>   _4 = i_9 + -1;
>>   _5 = X[_4];
>>   _6 = X[1];
>>   _7 = _5 * _6;
>>   X[i_9] = _7;
>>   i_15 = i_9 + 1;
>>
>>   <bb 4>:
>>   # i_9 = PHI <i_13(2), i_15(3)>
>>   if (i_9 <= 8)
>>     goto <bb 3>;
>>   else
>>     goto <bb 5>;
>>
>>   <bb 5>:
>>   _8 = X[5];
>>   Write (_8);
>>   return;
>>
>> }
>>
>>
>> Andreas.
>>
>> --
>> Andreas Schwab, schwab@linux-m68k.org
>> GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
>> "And now for something completely different."

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH PR71347][Partial revert r235513]Compute cost for all uses in group
  2016-06-20  8:20     ` Bin.Cheng
@ 2016-06-20  9:11       ` Bin.Cheng
  2016-06-20  9:19         ` Andreas Schwab
  0 siblings, 1 reply; 7+ messages in thread
From: Bin.Cheng @ 2016-06-20  9:11 UTC (permalink / raw)
  To: Christophe Lyon; +Cc: Andreas Schwab, Bin Cheng, gcc-patches, nd

On Mon, Jun 20, 2016 at 9:20 AM, Bin.Cheng <amker.cheng@gmail.com> wrote:
> On Mon, Jun 20, 2016 at 9:18 AM, Christophe Lyon
> <christophe.lyon@linaro.org> wrote:
>> On 18 June 2016 at 10:59, Andreas Schwab <schwab@linux-m68k.org> wrote:
>>> Bin Cheng <Bin.Cheng@arm.com> writes:
>>>
>>>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr71347.c b/gcc/testsuite/gcc.dg/tree-ssa/pr71347.c
>>>> new file mode 100644
>>>> index 0000000..7e5ad49
>>>> --- /dev/null
>>>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr71347.c
>>>> @@ -0,0 +1,17 @@
>>>> +/* { dg-do compile } */
>>>> +/* { dg-options "-O2 -fdump-tree-optimized" } */
>>>> +
>>>> +double in;
>>>> +extern void Write (double);
>>>> +void foo (void)
>>>> +{
>>>> +  static double X[9];
>>>> +  int i;
>>>> +        X[1] = in * in;
>>>> +        for (i = 2; i <= 8; i++)
>>>> +            X[i] = X[i - 1] * X[1];
>>>> +        Write (X[5]);
>>>> +}
>>>> +
>>>> +/* Load of X[i - i] can be omitted by reusing X[i] in previous iteration.  */
>>>> +/* { dg-final { scan-tree-dump-not ".* = MEM.*;" "optimized"} } */
>>>
>>> The test fails on ia64, this is what I get in .optimized:
>>>
>>
>> The test passes on aarch64, but fails on arm targets. Maybe that's
>> easier for Bin to reproduce?
> Hi all,
> Sorry for the inconvenience, I will have a look at the two targets.
Hmm, the failure is because post-increment is enabled in IVOPT on both
ia64 and arm.  As a result, IVOPT tends to choose iv_cand which is
incremented after the first store.  The dump for IVOPT is as:


  <bb 3>:
  # prephitmp_20 = PHI <pretmp_15(4), _2(2)>
  # prephitmp_22 = PHI <pretmp_21(4), _2(2)>
  # ivtmp.23_16 = PHI <ivtmp.23_8(4), ivtmp.23_23(2)>
  _6 = prephitmp_20 * prephitmp_22;
  _5 = (void *) ivtmp.23_16;
  MEM[base: _5, offset: 0B] = _6;
  ivtmp.23_8 = ivtmp.23_16 + 8;
  if (ivtmp.23_8 != _27)
    goto <bb 4>;
  else
    goto <bb 5>;

  <bb 4>:
  _24 = (void *) ivtmp.23_8;
  _25 = _24 + 18446744073709551608;
  pretmp_15 = MEM[base: _25, offset: 0B];
  pretmp_21 = X[1];
  goto <bb 3>;

Note address expressions of the load and store now are of different
forms, though of the same value.  I will look into DOM to see if it
can be improved to handle address expressions in different forms.
Also I believe this case is long time failed before it was introduced,
I will mark it XFAIL for the moment.

Thanks,
bin
>
> Thanks,
> bin
>>
>> Christophe
>>
>>> ;; Function foo (foo, funcdef_no=0, decl_uid=1387, cgraph_uid=0, symbol_order=1)
>>>
>>> foo ()
>>> {
>>>   int i;
>>>   static double X[9];
>>>   double in.0_1;
>>>   double in.1_2;
>>>   double _3;
>>>   int _4;
>>>   double _5;
>>>   double _6;
>>>   double _7;
>>>   double _8;
>>>
>>>   <bb 2>:
>>>   in.0_1 = in;
>>>   in.1_2 = in;
>>>   _3 = in.0_1 * in.1_2;
>>>   X[1] = _3;
>>>   i_13 = 2;
>>>   goto <bb 4>;
>>>
>>>   <bb 3>:
>>>   _4 = i_9 + -1;
>>>   _5 = X[_4];
>>>   _6 = X[1];
>>>   _7 = _5 * _6;
>>>   X[i_9] = _7;
>>>   i_15 = i_9 + 1;
>>>
>>>   <bb 4>:
>>>   # i_9 = PHI <i_13(2), i_15(3)>
>>>   if (i_9 <= 8)
>>>     goto <bb 3>;
>>>   else
>>>     goto <bb 5>;
>>>
>>>   <bb 5>:
>>>   _8 = X[5];
>>>   Write (_8);
>>>   return;
>>>
>>> }
>>>
>>>
>>> Andreas.
>>>
>>> --
>>> Andreas Schwab, schwab@linux-m68k.org
>>> GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
>>> "And now for something completely different."

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH PR71347][Partial revert r235513]Compute cost for all uses in group
  2016-06-20  9:11       ` Bin.Cheng
@ 2016-06-20  9:19         ` Andreas Schwab
  0 siblings, 0 replies; 7+ messages in thread
From: Andreas Schwab @ 2016-06-20  9:19 UTC (permalink / raw)
  To: Bin.Cheng; +Cc: Christophe Lyon, Bin Cheng, gcc-patches, nd

"Bin.Cheng" <amker.cheng@gmail.com> writes:

>>> The test passes on aarch64, but fails on arm targets. Maybe that's
>>> easier for Bin to reproduce?
>> Hi all,
>> Sorry for the inconvenience, I will have a look at the two targets.
> Hmm, the failure is because post-increment is enabled in IVOPT on both
> ia64 and arm.  As a result, IVOPT tends to choose iv_cand which is
> incremented after the first store.  The dump for IVOPT is as:

FWIW, the test also fails on m68k, for the same reason.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2016-06-20  9:19 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-06-13  9:57 [PATCH PR71347][Partial revert r235513]Compute cost for all uses in group Bin Cheng
2016-06-13 10:10 ` Richard Biener
2016-06-18  8:59 ` Andreas Schwab
2016-06-20  8:18   ` Christophe Lyon
2016-06-20  8:20     ` Bin.Cheng
2016-06-20  9:11       ` Bin.Cheng
2016-06-20  9:19         ` Andreas Schwab

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).