* [PATCH PR71347][Partial revert r235513]Compute cost for all uses in group
@ 2016-06-13 9:57 Bin Cheng
2016-06-13 10:10 ` Richard Biener
2016-06-18 8:59 ` Andreas Schwab
0 siblings, 2 replies; 7+ messages in thread
From: Bin Cheng @ 2016-06-13 9:57 UTC (permalink / raw)
To: gcc-patches; +Cc: nd
[-- Attachment #1: Type: text/plain, Size: 701 bytes --]
Hi,
This patch partially reverts part of r235513 to fix PR71347, the original patch is to improve compilation time for a small amount. Root cause as analyzed in bugzilla PR is that we can't skip computing cost for sub iv_use if it has different position to the first use in group. The patch also includes a new test.
Bootstrap and test on x86_64. Is it OK?
Thanks,
bin
2016-05-31 Bin Cheng <bin.cheng@arm.com>
PR tree-optimization/71347
* tree-ssa-loop-ivopts.c (determine_group_iv_cost_address): Compute
cost for all uses in group.
gcc/testsuite/ChangeLog
2016-05-31 Bin Cheng <bin.cheng@arm.com>
PR tree-optimization/71347
* gcc.dg/tree-ssa/pr71347.c: New test.
[-- Attachment #2: pr71347-20160531.txt --]
[-- Type: text/plain, Size: 2540 bytes --]
diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index 1e8d637..25b9780 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -5115,7 +5115,7 @@ determine_group_iv_cost_address (struct ivopts_data *data,
{
unsigned i;
bitmap depends_on;
- bool can_autoinc, first = true;
+ bool can_autoinc;
iv_inv_expr_ent *inv_expr = NULL;
struct iv_use *use = group->vuses[0];
comp_cost sum_cost = no_cost, cost;
@@ -5142,30 +5142,11 @@ determine_group_iv_cost_address (struct ivopts_data *data,
{
struct iv_use *next = group->vuses[i];
- /* Compute cost for the first use with different offset to the main
- use and add it afterwards. Costs for these uses could be quite
- different. Given below uses in a group:
- use 0 : {base + A + offset_0, step}
- use 0.1: {base + A + offset_0, step}
- use 0.2: {base + A + offset_1, step}
- use 0.3: {base + A + offset_2, step}
- when we need to compute costs with candidate:
- cand 1 : {base + B + offset_0, step}
-
- The first use with different offset is use 0.2, its cost is larger
- than cost of use 0/0.1 because we need to compute:
- A - B + offset_1 - offset_0
- rather than:
- A - B. */
- if (first && next->addr_offset != use->addr_offset)
- {
- first = false;
- cost = get_computation_cost (data, next, cand, true,
- NULL, &can_autoinc, NULL);
- /* Remove setup cost. */
- if (!cost.infinite_cost_p ())
- cost -= cost.scratch;
- }
+ /* TODO: We could skip computing cost for sub iv_use when it has the
+ same cost as the first iv_use, but the cost really depends on the
+ offset and where the iv_use is. */
+ cost = get_computation_cost (data, next, cand, true,
+ NULL, &can_autoinc, NULL);
sum_cost += cost;
}
set_group_iv_cost (data, group, cand, sum_cost, depends_on,
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr71347.c b/gcc/testsuite/gcc.dg/tree-ssa/pr71347.c
new file mode 100644
index 0000000..7e5ad49
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr71347.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+double in;
+extern void Write (double);
+void foo (void)
+{
+ static double X[9];
+ int i;
+ X[1] = in * in;
+ for (i = 2; i <= 8; i++)
+ X[i] = X[i - 1] * X[1];
+ Write (X[5]);
+}
+
+/* Load of X[i - i] can be omitted by reusing X[i] in previous iteration. */
+/* { dg-final { scan-tree-dump-not ".* = MEM.*;" "optimized"} } */
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH PR71347][Partial revert r235513]Compute cost for all uses in group
2016-06-13 9:57 [PATCH PR71347][Partial revert r235513]Compute cost for all uses in group Bin Cheng
@ 2016-06-13 10:10 ` Richard Biener
2016-06-18 8:59 ` Andreas Schwab
1 sibling, 0 replies; 7+ messages in thread
From: Richard Biener @ 2016-06-13 10:10 UTC (permalink / raw)
To: Bin Cheng; +Cc: gcc-patches, nd
On Mon, Jun 13, 2016 at 11:57 AM, Bin Cheng <Bin.Cheng@arm.com> wrote:
> Hi,
> This patch partially reverts part of r235513 to fix PR71347, the original patch is to improve compilation time for a small amount. Root cause as analyzed in bugzilla PR is that we can't skip computing cost for sub iv_use if it has different position to the first use in group. The patch also includes a new test.
>
> Bootstrap and test on x86_64. Is it OK?
Ok.
Richard.
> Thanks,
> bin
>
> 2016-05-31 Bin Cheng <bin.cheng@arm.com>
>
> PR tree-optimization/71347
> * tree-ssa-loop-ivopts.c (determine_group_iv_cost_address): Compute
> cost for all uses in group.
>
> gcc/testsuite/ChangeLog
> 2016-05-31 Bin Cheng <bin.cheng@arm.com>
>
> PR tree-optimization/71347
> * gcc.dg/tree-ssa/pr71347.c: New test.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH PR71347][Partial revert r235513]Compute cost for all uses in group
2016-06-13 9:57 [PATCH PR71347][Partial revert r235513]Compute cost for all uses in group Bin Cheng
2016-06-13 10:10 ` Richard Biener
@ 2016-06-18 8:59 ` Andreas Schwab
2016-06-20 8:18 ` Christophe Lyon
1 sibling, 1 reply; 7+ messages in thread
From: Andreas Schwab @ 2016-06-18 8:59 UTC (permalink / raw)
To: Bin Cheng; +Cc: gcc-patches, nd
Bin Cheng <Bin.Cheng@arm.com> writes:
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr71347.c b/gcc/testsuite/gcc.dg/tree-ssa/pr71347.c
> new file mode 100644
> index 0000000..7e5ad49
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr71347.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +
> +double in;
> +extern void Write (double);
> +void foo (void)
> +{
> + static double X[9];
> + int i;
> + X[1] = in * in;
> + for (i = 2; i <= 8; i++)
> + X[i] = X[i - 1] * X[1];
> + Write (X[5]);
> +}
> +
> +/* Load of X[i - i] can be omitted by reusing X[i] in previous iteration. */
> +/* { dg-final { scan-tree-dump-not ".* = MEM.*;" "optimized"} } */
The test fails on ia64, this is what I get in .optimized:
;; Function foo (foo, funcdef_no=0, decl_uid=1387, cgraph_uid=0, symbol_order=1)
foo ()
{
int i;
static double X[9];
double in.0_1;
double in.1_2;
double _3;
int _4;
double _5;
double _6;
double _7;
double _8;
<bb 2>:
in.0_1 = in;
in.1_2 = in;
_3 = in.0_1 * in.1_2;
X[1] = _3;
i_13 = 2;
goto <bb 4>;
<bb 3>:
_4 = i_9 + -1;
_5 = X[_4];
_6 = X[1];
_7 = _5 * _6;
X[i_9] = _7;
i_15 = i_9 + 1;
<bb 4>:
# i_9 = PHI <i_13(2), i_15(3)>
if (i_9 <= 8)
goto <bb 3>;
else
goto <bb 5>;
<bb 5>:
_8 = X[5];
Write (_8);
return;
}
Andreas.
--
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH PR71347][Partial revert r235513]Compute cost for all uses in group
2016-06-18 8:59 ` Andreas Schwab
@ 2016-06-20 8:18 ` Christophe Lyon
2016-06-20 8:20 ` Bin.Cheng
0 siblings, 1 reply; 7+ messages in thread
From: Christophe Lyon @ 2016-06-20 8:18 UTC (permalink / raw)
To: Andreas Schwab; +Cc: Bin Cheng, gcc-patches, nd
On 18 June 2016 at 10:59, Andreas Schwab <schwab@linux-m68k.org> wrote:
> Bin Cheng <Bin.Cheng@arm.com> writes:
>
>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr71347.c b/gcc/testsuite/gcc.dg/tree-ssa/pr71347.c
>> new file mode 100644
>> index 0000000..7e5ad49
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr71347.c
>> @@ -0,0 +1,17 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2 -fdump-tree-optimized" } */
>> +
>> +double in;
>> +extern void Write (double);
>> +void foo (void)
>> +{
>> + static double X[9];
>> + int i;
>> + X[1] = in * in;
>> + for (i = 2; i <= 8; i++)
>> + X[i] = X[i - 1] * X[1];
>> + Write (X[5]);
>> +}
>> +
>> +/* Load of X[i - i] can be omitted by reusing X[i] in previous iteration. */
>> +/* { dg-final { scan-tree-dump-not ".* = MEM.*;" "optimized"} } */
>
> The test fails on ia64, this is what I get in .optimized:
>
The test passes on aarch64, but fails on arm targets. Maybe that's
easier for Bin to reproduce?
Christophe
> ;; Function foo (foo, funcdef_no=0, decl_uid=1387, cgraph_uid=0, symbol_order=1)
>
> foo ()
> {
> int i;
> static double X[9];
> double in.0_1;
> double in.1_2;
> double _3;
> int _4;
> double _5;
> double _6;
> double _7;
> double _8;
>
> <bb 2>:
> in.0_1 = in;
> in.1_2 = in;
> _3 = in.0_1 * in.1_2;
> X[1] = _3;
> i_13 = 2;
> goto <bb 4>;
>
> <bb 3>:
> _4 = i_9 + -1;
> _5 = X[_4];
> _6 = X[1];
> _7 = _5 * _6;
> X[i_9] = _7;
> i_15 = i_9 + 1;
>
> <bb 4>:
> # i_9 = PHI <i_13(2), i_15(3)>
> if (i_9 <= 8)
> goto <bb 3>;
> else
> goto <bb 5>;
>
> <bb 5>:
> _8 = X[5];
> Write (_8);
> return;
>
> }
>
>
> Andreas.
>
> --
> Andreas Schwab, schwab@linux-m68k.org
> GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
> "And now for something completely different."
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH PR71347][Partial revert r235513]Compute cost for all uses in group
2016-06-20 8:18 ` Christophe Lyon
@ 2016-06-20 8:20 ` Bin.Cheng
2016-06-20 9:11 ` Bin.Cheng
0 siblings, 1 reply; 7+ messages in thread
From: Bin.Cheng @ 2016-06-20 8:20 UTC (permalink / raw)
To: Christophe Lyon; +Cc: Andreas Schwab, Bin Cheng, gcc-patches, nd
On Mon, Jun 20, 2016 at 9:18 AM, Christophe Lyon
<christophe.lyon@linaro.org> wrote:
> On 18 June 2016 at 10:59, Andreas Schwab <schwab@linux-m68k.org> wrote:
>> Bin Cheng <Bin.Cheng@arm.com> writes:
>>
>>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr71347.c b/gcc/testsuite/gcc.dg/tree-ssa/pr71347.c
>>> new file mode 100644
>>> index 0000000..7e5ad49
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr71347.c
>>> @@ -0,0 +1,17 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-O2 -fdump-tree-optimized" } */
>>> +
>>> +double in;
>>> +extern void Write (double);
>>> +void foo (void)
>>> +{
>>> + static double X[9];
>>> + int i;
>>> + X[1] = in * in;
>>> + for (i = 2; i <= 8; i++)
>>> + X[i] = X[i - 1] * X[1];
>>> + Write (X[5]);
>>> +}
>>> +
>>> +/* Load of X[i - i] can be omitted by reusing X[i] in previous iteration. */
>>> +/* { dg-final { scan-tree-dump-not ".* = MEM.*;" "optimized"} } */
>>
>> The test fails on ia64, this is what I get in .optimized:
>>
>
> The test passes on aarch64, but fails on arm targets. Maybe that's
> easier for Bin to reproduce?
Hi all,
Sorry for the inconvenience, I will have a look at the two targets.
Thanks,
bin
>
> Christophe
>
>> ;; Function foo (foo, funcdef_no=0, decl_uid=1387, cgraph_uid=0, symbol_order=1)
>>
>> foo ()
>> {
>> int i;
>> static double X[9];
>> double in.0_1;
>> double in.1_2;
>> double _3;
>> int _4;
>> double _5;
>> double _6;
>> double _7;
>> double _8;
>>
>> <bb 2>:
>> in.0_1 = in;
>> in.1_2 = in;
>> _3 = in.0_1 * in.1_2;
>> X[1] = _3;
>> i_13 = 2;
>> goto <bb 4>;
>>
>> <bb 3>:
>> _4 = i_9 + -1;
>> _5 = X[_4];
>> _6 = X[1];
>> _7 = _5 * _6;
>> X[i_9] = _7;
>> i_15 = i_9 + 1;
>>
>> <bb 4>:
>> # i_9 = PHI <i_13(2), i_15(3)>
>> if (i_9 <= 8)
>> goto <bb 3>;
>> else
>> goto <bb 5>;
>>
>> <bb 5>:
>> _8 = X[5];
>> Write (_8);
>> return;
>>
>> }
>>
>>
>> Andreas.
>>
>> --
>> Andreas Schwab, schwab@linux-m68k.org
>> GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
>> "And now for something completely different."
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH PR71347][Partial revert r235513]Compute cost for all uses in group
2016-06-20 8:20 ` Bin.Cheng
@ 2016-06-20 9:11 ` Bin.Cheng
2016-06-20 9:19 ` Andreas Schwab
0 siblings, 1 reply; 7+ messages in thread
From: Bin.Cheng @ 2016-06-20 9:11 UTC (permalink / raw)
To: Christophe Lyon; +Cc: Andreas Schwab, Bin Cheng, gcc-patches, nd
On Mon, Jun 20, 2016 at 9:20 AM, Bin.Cheng <amker.cheng@gmail.com> wrote:
> On Mon, Jun 20, 2016 at 9:18 AM, Christophe Lyon
> <christophe.lyon@linaro.org> wrote:
>> On 18 June 2016 at 10:59, Andreas Schwab <schwab@linux-m68k.org> wrote:
>>> Bin Cheng <Bin.Cheng@arm.com> writes:
>>>
>>>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr71347.c b/gcc/testsuite/gcc.dg/tree-ssa/pr71347.c
>>>> new file mode 100644
>>>> index 0000000..7e5ad49
>>>> --- /dev/null
>>>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr71347.c
>>>> @@ -0,0 +1,17 @@
>>>> +/* { dg-do compile } */
>>>> +/* { dg-options "-O2 -fdump-tree-optimized" } */
>>>> +
>>>> +double in;
>>>> +extern void Write (double);
>>>> +void foo (void)
>>>> +{
>>>> + static double X[9];
>>>> + int i;
>>>> + X[1] = in * in;
>>>> + for (i = 2; i <= 8; i++)
>>>> + X[i] = X[i - 1] * X[1];
>>>> + Write (X[5]);
>>>> +}
>>>> +
>>>> +/* Load of X[i - i] can be omitted by reusing X[i] in previous iteration. */
>>>> +/* { dg-final { scan-tree-dump-not ".* = MEM.*;" "optimized"} } */
>>>
>>> The test fails on ia64, this is what I get in .optimized:
>>>
>>
>> The test passes on aarch64, but fails on arm targets. Maybe that's
>> easier for Bin to reproduce?
> Hi all,
> Sorry for the inconvenience, I will have a look at the two targets.
Hmm, the failure is because post-increment is enabled in IVOPT on both
ia64 and arm. As a result, IVOPT tends to choose iv_cand which is
incremented after the first store. The dump for IVOPT is as:
<bb 3>:
# prephitmp_20 = PHI <pretmp_15(4), _2(2)>
# prephitmp_22 = PHI <pretmp_21(4), _2(2)>
# ivtmp.23_16 = PHI <ivtmp.23_8(4), ivtmp.23_23(2)>
_6 = prephitmp_20 * prephitmp_22;
_5 = (void *) ivtmp.23_16;
MEM[base: _5, offset: 0B] = _6;
ivtmp.23_8 = ivtmp.23_16 + 8;
if (ivtmp.23_8 != _27)
goto <bb 4>;
else
goto <bb 5>;
<bb 4>:
_24 = (void *) ivtmp.23_8;
_25 = _24 + 18446744073709551608;
pretmp_15 = MEM[base: _25, offset: 0B];
pretmp_21 = X[1];
goto <bb 3>;
Note address expressions of the load and store now are of different
forms, though of the same value. I will look into DOM to see if it
can be improved to handle address expressions in different forms.
Also I believe this case is long time failed before it was introduced,
I will mark it XFAIL for the moment.
Thanks,
bin
>
> Thanks,
> bin
>>
>> Christophe
>>
>>> ;; Function foo (foo, funcdef_no=0, decl_uid=1387, cgraph_uid=0, symbol_order=1)
>>>
>>> foo ()
>>> {
>>> int i;
>>> static double X[9];
>>> double in.0_1;
>>> double in.1_2;
>>> double _3;
>>> int _4;
>>> double _5;
>>> double _6;
>>> double _7;
>>> double _8;
>>>
>>> <bb 2>:
>>> in.0_1 = in;
>>> in.1_2 = in;
>>> _3 = in.0_1 * in.1_2;
>>> X[1] = _3;
>>> i_13 = 2;
>>> goto <bb 4>;
>>>
>>> <bb 3>:
>>> _4 = i_9 + -1;
>>> _5 = X[_4];
>>> _6 = X[1];
>>> _7 = _5 * _6;
>>> X[i_9] = _7;
>>> i_15 = i_9 + 1;
>>>
>>> <bb 4>:
>>> # i_9 = PHI <i_13(2), i_15(3)>
>>> if (i_9 <= 8)
>>> goto <bb 3>;
>>> else
>>> goto <bb 5>;
>>>
>>> <bb 5>:
>>> _8 = X[5];
>>> Write (_8);
>>> return;
>>>
>>> }
>>>
>>>
>>> Andreas.
>>>
>>> --
>>> Andreas Schwab, schwab@linux-m68k.org
>>> GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
>>> "And now for something completely different."
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH PR71347][Partial revert r235513]Compute cost for all uses in group
2016-06-20 9:11 ` Bin.Cheng
@ 2016-06-20 9:19 ` Andreas Schwab
0 siblings, 0 replies; 7+ messages in thread
From: Andreas Schwab @ 2016-06-20 9:19 UTC (permalink / raw)
To: Bin.Cheng; +Cc: Christophe Lyon, Bin Cheng, gcc-patches, nd
"Bin.Cheng" <amker.cheng@gmail.com> writes:
>>> The test passes on aarch64, but fails on arm targets. Maybe that's
>>> easier for Bin to reproduce?
>> Hi all,
>> Sorry for the inconvenience, I will have a look at the two targets.
> Hmm, the failure is because post-increment is enabled in IVOPT on both
> ia64 and arm. As a result, IVOPT tends to choose iv_cand which is
> incremented after the first store. The dump for IVOPT is as:
FWIW, the test also fails on m68k, for the same reason.
Andreas.
--
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2016-06-20 9:19 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-06-13 9:57 [PATCH PR71347][Partial revert r235513]Compute cost for all uses in group Bin Cheng
2016-06-13 10:10 ` Richard Biener
2016-06-18 8:59 ` Andreas Schwab
2016-06-20 8:18 ` Christophe Lyon
2016-06-20 8:20 ` Bin.Cheng
2016-06-20 9:11 ` Bin.Cheng
2016-06-20 9:19 ` Andreas Schwab
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).