* Missed optimization in PRE?
@ 2012-03-29 10:02 Bin.Cheng
2012-03-29 10:07 ` Richard Guenther
0 siblings, 1 reply; 15+ messages in thread
From: Bin.Cheng @ 2012-03-29 10:02 UTC (permalink / raw)
To: gcc
Hi,
Following is the tree dump of 094t.pre for a test program.
Question is loads of D.5375_12/D.5375_14 are redundant on path <bb2,
bb7, bb5, bb6>,
but why not lowered into basic block 3, where it is used.
BTW, seems no tree pass handles this case currently.
Any idea? Thanks
int z$imag;
int z$real;
int D.5378;
int D.5377;
int D.5376;
int D.5375;
<bb 2>:
D.5375_11 = REALPART_EXPR <g2>;
D.5376_12 = IMAGPART_EXPR <g2>;
D.5377_13 = REALPART_EXPR <g3>;
D.5378_14 = IMAGPART_EXPR <g3>;
if (D.5375_11 == D.5377_13)
goto <bb 3>;
else
goto <bb 7>;
<bb 3>:
if (D.5376_12 == D.5378_14)
goto <bb 4>;
else
goto <bb 7>;
<bb 4>:
z$real_15 = REALPART_EXPR <g1>;
z$imag_16 = IMAGPART_EXPR <g1>;
goto <bb 6>;
<bb 7>:
<bb 5>:
z$real_17 = REALPART_EXPR <g4>;
z$imag_18 = IMAGPART_EXPR <g4>;
<bb 6>:
# z$real_19 = PHI <z$real_15(4), z$real_17(5)>
# z$imag_20 = PHI <z$imag_16(4), z$imag_18(5)>
REALPART_EXPR <g1> = z$real_19;
IMAGPART_EXPR <g1> = z$imag_20;
return 0;
--
Best Regards.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Missed optimization in PRE?
2012-03-29 10:02 Missed optimization in PRE? Bin.Cheng
@ 2012-03-29 10:07 ` Richard Guenther
2012-03-29 10:10 ` Bin.Cheng
0 siblings, 1 reply; 15+ messages in thread
From: Richard Guenther @ 2012-03-29 10:07 UTC (permalink / raw)
To: Bin.Cheng; +Cc: gcc
On Thu, Mar 29, 2012 at 12:02 PM, Bin.Cheng <amker.cheng@gmail.com> wrote:
> Hi,
> Following is the tree dump of 094t.pre for a test program.
> Question is loads of D.5375_12/D.5375_14 are redundant on path <bb2,
> bb7, bb5, bb6>,
> but why not lowered into basic block 3, where it is used.
>
> BTW, seems no tree pass handles this case currently.
tree-ssa-sink.c should do this.
> Any idea? Thanks
>
> int z$imag;
> int z$real;
> int D.5378;
> int D.5377;
> int D.5376;
> int D.5375;
>
> <bb 2>:
> D.5375_11 = REALPART_EXPR <g2>;
> D.5376_12 = IMAGPART_EXPR <g2>;
> D.5377_13 = REALPART_EXPR <g3>;
> D.5378_14 = IMAGPART_EXPR <g3>;
> if (D.5375_11 == D.5377_13)
> goto <bb 3>;
> else
> goto <bb 7>;
>
> <bb 3>:
> if (D.5376_12 == D.5378_14)
> goto <bb 4>;
> else
> goto <bb 7>;
>
> <bb 4>:
> z$real_15 = REALPART_EXPR <g1>;
> z$imag_16 = IMAGPART_EXPR <g1>;
> goto <bb 6>;
>
> <bb 7>:
>
> <bb 5>:
> z$real_17 = REALPART_EXPR <g4>;
> z$imag_18 = IMAGPART_EXPR <g4>;
>
> <bb 6>:
> # z$real_19 = PHI <z$real_15(4), z$real_17(5)>
> # z$imag_20 = PHI <z$imag_16(4), z$imag_18(5)>
> REALPART_EXPR <g1> = z$real_19;
> IMAGPART_EXPR <g1> = z$imag_20;
> return 0;
>
> --
> Best Regards.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Missed optimization in PRE?
2012-03-29 10:07 ` Richard Guenther
@ 2012-03-29 10:10 ` Bin.Cheng
2012-03-29 10:14 ` Richard Guenther
0 siblings, 1 reply; 15+ messages in thread
From: Bin.Cheng @ 2012-03-29 10:10 UTC (permalink / raw)
To: Richard Guenther; +Cc: gcc
On Thu, Mar 29, 2012 at 6:07 PM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> On Thu, Mar 29, 2012 at 12:02 PM, Bin.Cheng <amker.cheng@gmail.com> wrote:
>> Hi,
>> Following is the tree dump of 094t.pre for a test program.
>> Question is loads of D.5375_12/D.5375_14 are redundant on path <bb2,
>> bb7, bb5, bb6>,
>> but why not lowered into basic block 3, where it is used.
>>
>> BTW, seems no tree pass handles this case currently.
>
> tree-ssa-sink.c should do this.
>
It does not work for me, I will double check and update soon.
--
Best Regards.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Missed optimization in PRE?
2012-03-29 10:10 ` Bin.Cheng
@ 2012-03-29 10:14 ` Richard Guenther
2012-03-29 10:22 ` Bin.Cheng
2012-03-29 15:25 ` Bin.Cheng
0 siblings, 2 replies; 15+ messages in thread
From: Richard Guenther @ 2012-03-29 10:14 UTC (permalink / raw)
To: Bin.Cheng; +Cc: gcc
On Thu, Mar 29, 2012 at 12:10 PM, Bin.Cheng <amker.cheng@gmail.com> wrote:
> On Thu, Mar 29, 2012 at 6:07 PM, Richard Guenther
> <richard.guenther@gmail.com> wrote:
>> On Thu, Mar 29, 2012 at 12:02 PM, Bin.Cheng <amker.cheng@gmail.com> wrote:
>>> Hi,
>>> Following is the tree dump of 094t.pre for a test program.
>>> Question is loads of D.5375_12/D.5375_14 are redundant on path <bb2,
>>> bb7, bb5, bb6>,
>>> but why not lowered into basic block 3, where it is used.
>>>
>>> BTW, seems no tree pass handles this case currently.
>>
>> tree-ssa-sink.c should do this.
>>
> It does not work for me, I will double check and update soon.
Well, "should" as in, it's the place to do it. And certainly the pass can sink
loads, so this must be a missed optimization.
Richard.
> --
> Best Regards.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Missed optimization in PRE?
2012-03-29 10:14 ` Richard Guenther
@ 2012-03-29 10:22 ` Bin.Cheng
2012-03-29 15:25 ` Bin.Cheng
1 sibling, 0 replies; 15+ messages in thread
From: Bin.Cheng @ 2012-03-29 10:22 UTC (permalink / raw)
To: Richard Guenther; +Cc: gcc
On Thu, Mar 29, 2012 at 6:14 PM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> On Thu, Mar 29, 2012 at 12:10 PM, Bin.Cheng <amker.cheng@gmail.com> wrote:
>> On Thu, Mar 29, 2012 at 6:07 PM, Richard Guenther
>> <richard.guenther@gmail.com> wrote:
>>> On Thu, Mar 29, 2012 at 12:02 PM, Bin.Cheng <amker.cheng@gmail.com> wrote:
>>>> Hi,
>>>> Following is the tree dump of 094t.pre for a test program.
>>>> Question is loads of D.5375_12/D.5375_14 are redundant on path <bb2,
>>>> bb7, bb5, bb6>,
>>>> but why not lowered into basic block 3, where it is used.
>>>>
>>>> BTW, seems no tree pass handles this case currently.
>>>
>>> tree-ssa-sink.c should do this.
>>>
>> It does not work for me, I will double check and update soon.
>
> Well, "should" as in, it's the place to do it. And certainly the pass can sink
> loads, so this must be a missed optimization.
>
ok, I will investigate it.
--
Best Regards.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Missed optimization in PRE?
2012-03-29 10:14 ` Richard Guenther
2012-03-29 10:22 ` Bin.Cheng
@ 2012-03-29 15:25 ` Bin.Cheng
2012-03-30 8:16 ` Richard Guenther
1 sibling, 1 reply; 15+ messages in thread
From: Bin.Cheng @ 2012-03-29 15:25 UTC (permalink / raw)
To: Richard Guenther; +Cc: gcc
On Thu, Mar 29, 2012 at 6:14 PM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> On Thu, Mar 29, 2012 at 12:10 PM, Bin.Cheng <amker.cheng@gmail.com> wrote:
>> On Thu, Mar 29, 2012 at 6:07 PM, Richard Guenther
>> <richard.guenther@gmail.com> wrote:
>>> On Thu, Mar 29, 2012 at 12:02 PM, Bin.Cheng <amker.cheng@gmail.com> wrote:
>>>> Hi,
>>>> Following is the tree dump of 094t.pre for a test program.
>>>> Question is loads of D.5375_12/D.5375_14 are redundant on path <bb2,
>>>> bb7, bb5, bb6>,
>>>> but why not lowered into basic block 3, where it is used.
>>>>
>>>> BTW, seems no tree pass handles this case currently.
>>>
>>> tree-ssa-sink.c should do this.
>>>
>> It does not work for me, I will double check and update soon.
>
> Well, "should" as in, it's the place to do it. And certainly the pass can sink
> loads, so this must be a missed optimization.
>
Curiously, it is said explicitly that "We don't want to sink loads from memory."
in tree-ssa-sink.c function statement_sink_location, and the condition is
if (stmt_ends_bb_p (stmt)
|| gimple_has_side_effects (stmt)
|| gimple_has_volatile_ops (stmt)
|| (gimple_vuse (stmt) && !gimple_vdef (stmt))
<-----------------check load
|| (cfun->has_local_explicit_reg_vars
&& TYPE_MODE (TREE_TYPE (gimple_assign_lhs (stmt))) == BLKmode))
return false;
I haven't found any clue about this decision in ChangeLogs.
--
Best Regards.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Missed optimization in PRE?
2012-03-29 15:25 ` Bin.Cheng
@ 2012-03-30 8:16 ` Richard Guenther
2012-03-30 9:43 ` Bin.Cheng
0 siblings, 1 reply; 15+ messages in thread
From: Richard Guenther @ 2012-03-30 8:16 UTC (permalink / raw)
To: Bin.Cheng; +Cc: gcc
On Thu, Mar 29, 2012 at 5:25 PM, Bin.Cheng <amker.cheng@gmail.com> wrote:
> On Thu, Mar 29, 2012 at 6:14 PM, Richard Guenther
> <richard.guenther@gmail.com> wrote:
>> On Thu, Mar 29, 2012 at 12:10 PM, Bin.Cheng <amker.cheng@gmail.com> wrote:
>>> On Thu, Mar 29, 2012 at 6:07 PM, Richard Guenther
>>> <richard.guenther@gmail.com> wrote:
>>>> On Thu, Mar 29, 2012 at 12:02 PM, Bin.Cheng <amker.cheng@gmail.com> wrote:
>>>>> Hi,
>>>>> Following is the tree dump of 094t.pre for a test program.
>>>>> Question is loads of D.5375_12/D.5375_14 are redundant on path <bb2,
>>>>> bb7, bb5, bb6>,
>>>>> but why not lowered into basic block 3, where it is used.
>>>>>
>>>>> BTW, seems no tree pass handles this case currently.
>>>>
>>>> tree-ssa-sink.c should do this.
>>>>
>>> It does not work for me, I will double check and update soon.
>>
>> Well, "should" as in, it's the place to do it. And certainly the pass can sink
>> loads, so this must be a missed optimization.
>>
> Curiously, it is said explicitly that "We don't want to sink loads from memory."
> in tree-ssa-sink.c function statement_sink_location, and the condition is
>
> if (stmt_ends_bb_p (stmt)
> || gimple_has_side_effects (stmt)
> || gimple_has_volatile_ops (stmt)
> || (gimple_vuse (stmt) && !gimple_vdef (stmt))
> <-----------------check load
> || (cfun->has_local_explicit_reg_vars
> && TYPE_MODE (TREE_TYPE (gimple_assign_lhs (stmt))) == BLKmode))
> return false;
>
> I haven't found any clue about this decision in ChangeLogs.
Ah, that's probably because usually you want to hoist loads and sink stores,
separating them (like a scheduler would do). We'd want to restrict sinking
of loads to sink into not post-dominated regions (thus where they end up
being executed less times).
Richard.
>
> --
> Best Regards.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Missed optimization in PRE?
2012-03-30 8:16 ` Richard Guenther
@ 2012-03-30 9:43 ` Bin.Cheng
2012-04-09 6:00 ` Bin.Cheng
0 siblings, 1 reply; 15+ messages in thread
From: Bin.Cheng @ 2012-03-30 9:43 UTC (permalink / raw)
To: Richard Guenther; +Cc: gcc
On Fri, Mar 30, 2012 at 4:15 PM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> On Thu, Mar 29, 2012 at 5:25 PM, Bin.Cheng <amker.cheng@gmail.com> wrote:
>> On Thu, Mar 29, 2012 at 6:14 PM, Richard Guenther
>> <richard.guenther@gmail.com> wrote:
>>> On Thu, Mar 29, 2012 at 12:10 PM, Bin.Cheng <amker.cheng@gmail.com> wrote:
>>>> On Thu, Mar 29, 2012 at 6:07 PM, Richard Guenther
>>>> <richard.guenther@gmail.com> wrote:
>>>>> On Thu, Mar 29, 2012 at 12:02 PM, Bin.Cheng <amker.cheng@gmail.com> wrote:
>>>>>> Hi,
>>>>>> Following is the tree dump of 094t.pre for a test program.
>>>>>> Question is loads of D.5375_12/D.5375_14 are redundant on path <bb2,
>>>>>> bb7, bb5, bb6>,
>>>>>> but why not lowered into basic block 3, where it is used.
>>>>>>
>>>>>> BTW, seems no tree pass handles this case currently.
>>>>>
>>>>> tree-ssa-sink.c should do this.
>>>>>
>>>> It does not work for me, I will double check and update soon.
>>>
>>> Well, "should" as in, it's the place to do it. And certainly the pass can sink
>>> loads, so this must be a missed optimization.
>>>
>> Curiously, it is said explicitly that "We don't want to sink loads from memory."
>> in tree-ssa-sink.c function statement_sink_location, and the condition is
>>
>> if (stmt_ends_bb_p (stmt)
>> || gimple_has_side_effects (stmt)
>> || gimple_has_volatile_ops (stmt)
>> || (gimple_vuse (stmt) && !gimple_vdef (stmt))
>> <-----------------check load
>> || (cfun->has_local_explicit_reg_vars
>> && TYPE_MODE (TREE_TYPE (gimple_assign_lhs (stmt))) == BLKmode))
>> return false;
>>
>> I haven't found any clue about this decision in ChangeLogs.
>
> Ah, that's probably because usually you want to hoist loads and sink stores,
> separating them (like a scheduler would do). We'd want to restrict sinking
> of loads to sink into not post-dominated regions (thus where they end up
> being executed less times).
>
Understood, I will work on this.
Thanks.
--
Best Regards.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Missed optimization in PRE?
2012-03-30 9:43 ` Bin.Cheng
@ 2012-04-09 6:00 ` Bin.Cheng
2012-04-09 11:02 ` Richard Guenther
0 siblings, 1 reply; 15+ messages in thread
From: Bin.Cheng @ 2012-04-09 6:00 UTC (permalink / raw)
To: Richard Guenther; +Cc: gcc
On Fri, Mar 30, 2012 at 5:43 PM, Bin.Cheng <amker.cheng@gmail.com> wrote:
> On Fri, Mar 30, 2012 at 4:15 PM, Richard Guenther
> <richard.guenther@gmail.com> wrote:
>> On Thu, Mar 29, 2012 at 5:25 PM, Bin.Cheng <amker.cheng@gmail.com> wrote:
>>> On Thu, Mar 29, 2012 at 6:14 PM, Richard Guenther
>>> <richard.guenther@gmail.com> wrote:
>>>> On Thu, Mar 29, 2012 at 12:10 PM, Bin.Cheng <amker.cheng@gmail.com> wrote:
>>>>> On Thu, Mar 29, 2012 at 6:07 PM, Richard Guenther
>>>>> <richard.guenther@gmail.com> wrote:
>>>>>> On Thu, Mar 29, 2012 at 12:02 PM, Bin.Cheng <amker.cheng@gmail.com> wrote:
>>>>>>> Hi,
>>>>>>> Following is the tree dump of 094t.pre for a test program.
>>>>>>> Question is loads of D.5375_12/D.5375_14 are redundant on path <bb2,
>>>>>>> bb7, bb5, bb6>,
>>>>>>> but why not lowered into basic block 3, where it is used.
>>>>>>>
>>>>>>> BTW, seems no tree pass handles this case currently.
>>>>>>
>>>>>> tree-ssa-sink.c should do this.
>>>>>>
>>>>> It does not work for me, I will double check and update soon.
>>>>
>>>> Well, "should" as in, it's the place to do it. And certainly the pass can sink
>>>> loads, so this must be a missed optimization.
>>>>
>>> Curiously, it is said explicitly that "We don't want to sink loads from memory."
>>> in tree-ssa-sink.c function statement_sink_location, and the condition is
>>>
>>> if (stmt_ends_bb_p (stmt)
>>> || gimple_has_side_effects (stmt)
>>> || gimple_has_volatile_ops (stmt)
>>> || (gimple_vuse (stmt) && !gimple_vdef (stmt))
>>> <-----------------check load
>>> || (cfun->has_local_explicit_reg_vars
>>> && TYPE_MODE (TREE_TYPE (gimple_assign_lhs (stmt))) == BLKmode))
>>> return false;
>>>
>>> I haven't found any clue about this decision in ChangeLogs.
>>
>> Ah, that's probably because usually you want to hoist loads and sink stores,
>> separating them (like a scheduler would do). We'd want to restrict sinking
>> of loads to sink into not post-dominated regions (thus where they end up
>> being executed less times).
Hi Richard,
I am testing a patch to sink load of memory to proper basic block.
Everything goes fine except auto-vectorization, sinking of load sometime
corrupts the canonical form of data references. I haven't touched auto-vec
before and cannot tell whether it's good or bad to do sink before auto-vec.
For example, the slp-cond-1.c
<bb 3>:
# i_39 = PHI <i_32(11), 0(2)>
D.5150_5 = i_39 * 2;
D.5151_10 = D.5150_5 + 1;
D.5153_17 = a[D.5150_5];
D.5154_19 = b[D.5150_5];
if (D.5153_17 >= D.5154_19)
goto <bb 9>;
else
goto <bb 4>;
<bb 9>:
d0_6 = d[D.5150_5]; <-----this is sunk from bb3
goto <bb 5>;
<bb 4>:
e0_8 = e[D.5150_5]; <-----this is sunk from bb3
<bb 5>:
# d0_2 = PHI <d0_6(9), e0_8(4)>
k[D.5150_5] = d0_2;
D.5159_26 = a[D.5151_10];
D.5160_29 = b[D.5151_10];
if (D.5159_26 >= D.5160_29)
goto <bb 10>;
else
goto <bb 6>;
<bb 10>:
d1_11 = d[D.5151_10]; <-----this is sunk from bb3
goto <bb 7>;
<bb 6>:
e1_14 = e[D.5151_10]; <-----this is sunk from bb3
<bb 7>:
.......
I will look into auto-vect but not sure how to handle this case.
Any comments? Thanks very much.
--
Best Regards.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Missed optimization in PRE?
2012-04-09 6:00 ` Bin.Cheng
@ 2012-04-09 11:02 ` Richard Guenther
2012-04-11 3:28 ` Bin.Cheng
0 siblings, 1 reply; 15+ messages in thread
From: Richard Guenther @ 2012-04-09 11:02 UTC (permalink / raw)
To: Bin.Cheng; +Cc: gcc
On Mon, Apr 9, 2012 at 8:00 AM, Bin.Cheng <amker.cheng@gmail.com> wrote:
> On Fri, Mar 30, 2012 at 5:43 PM, Bin.Cheng <amker.cheng@gmail.com> wrote:
>> On Fri, Mar 30, 2012 at 4:15 PM, Richard Guenther
>> <richard.guenther@gmail.com> wrote:
>>> On Thu, Mar 29, 2012 at 5:25 PM, Bin.Cheng <amker.cheng@gmail.com> wrote:
>>>> On Thu, Mar 29, 2012 at 6:14 PM, Richard Guenther
>>>> <richard.guenther@gmail.com> wrote:
>>>>> On Thu, Mar 29, 2012 at 12:10 PM, Bin.Cheng <amker.cheng@gmail.com> wrote:
>>>>>> On Thu, Mar 29, 2012 at 6:07 PM, Richard Guenther
>>>>>> <richard.guenther@gmail.com> wrote:
>>>>>>> On Thu, Mar 29, 2012 at 12:02 PM, Bin.Cheng <amker.cheng@gmail.com> wrote:
>>>>>>>> Hi,
>>>>>>>> Following is the tree dump of 094t.pre for a test program.
>>>>>>>> Question is loads of D.5375_12/D.5375_14 are redundant on path <bb2,
>>>>>>>> bb7, bb5, bb6>,
>>>>>>>> but why not lowered into basic block 3, where it is used.
>>>>>>>>
>>>>>>>> BTW, seems no tree pass handles this case currently.
>>>>>>>
>>>>>>> tree-ssa-sink.c should do this.
>>>>>>>
>>>>>> It does not work for me, I will double check and update soon.
>>>>>
>>>>> Well, "should" as in, it's the place to do it. And certainly the pass can sink
>>>>> loads, so this must be a missed optimization.
>>>>>
>>>> Curiously, it is said explicitly that "We don't want to sink loads from memory."
>>>> in tree-ssa-sink.c function statement_sink_location, and the condition is
>>>>
>>>> if (stmt_ends_bb_p (stmt)
>>>> || gimple_has_side_effects (stmt)
>>>> || gimple_has_volatile_ops (stmt)
>>>> || (gimple_vuse (stmt) && !gimple_vdef (stmt))
>>>> <-----------------check load
>>>> || (cfun->has_local_explicit_reg_vars
>>>> && TYPE_MODE (TREE_TYPE (gimple_assign_lhs (stmt))) == BLKmode))
>>>> return false;
>>>>
>>>> I haven't found any clue about this decision in ChangeLogs.
>>>
>>> Ah, that's probably because usually you want to hoist loads and sink stores,
>>> separating them (like a scheduler would do). We'd want to restrict sinking
>>> of loads to sink into not post-dominated regions (thus where they end up
>>> being executed less times).
>
> Hi Richard,
> I am testing a patch to sink load of memory to proper basic block.
> Everything goes fine except auto-vectorization, sinking of load sometime
> corrupts the canonical form of data references. I haven't touched auto-vec
> before and cannot tell whether it's good or bad to do sink before auto-vec.
> For example, the slp-cond-1.c
>
> <bb 3>:
> # i_39 = PHI <i_32(11), 0(2)>
> D.5150_5 = i_39 * 2;
> D.5151_10 = D.5150_5 + 1;
> D.5153_17 = a[D.5150_5];
> D.5154_19 = b[D.5150_5];
> if (D.5153_17 >= D.5154_19)
> goto <bb 9>;
> else
> goto <bb 4>;
>
> <bb 9>:
> d0_6 = d[D.5150_5]; <-----this is sunk from bb3
> goto <bb 5>;
>
> <bb 4>:
> e0_8 = e[D.5150_5]; <-----this is sunk from bb3
>
> <bb 5>:
> # d0_2 = PHI <d0_6(9), e0_8(4)>
> k[D.5150_5] = d0_2;
> D.5159_26 = a[D.5151_10];
> D.5160_29 = b[D.5151_10];
> if (D.5159_26 >= D.5160_29)
> goto <bb 10>;
> else
> goto <bb 6>;
>
>
> <bb 10>:
> d1_11 = d[D.5151_10]; <-----this is sunk from bb3
> goto <bb 7>;
>
> <bb 6>:
> e1_14 = e[D.5151_10]; <-----this is sunk from bb3
>
> <bb 7>:
> .......
>
> I will look into auto-vect but not sure how to handle this case.
>
> Any comments? Thanks very much.
Simple - the vectorizer expects empty latch blocks. So simply
never sink stuff into latch-blocks - I think the current code already
tries to avoid that for regular computations.
Richard.
> --
> Best Regards.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Missed optimization in PRE?
2012-04-09 11:02 ` Richard Guenther
@ 2012-04-11 3:28 ` Bin.Cheng
2012-04-11 8:05 ` Bin.Cheng
0 siblings, 1 reply; 15+ messages in thread
From: Bin.Cheng @ 2012-04-11 3:28 UTC (permalink / raw)
To: Richard Guenther; +Cc: gcc
On Mon, Apr 9, 2012 at 7:02 PM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> On Mon, Apr 9, 2012 at 8:00 AM, Bin.Cheng <amker.cheng@gmail.com> wrote:
>> On Fri, Mar 30, 2012 at 5:43 PM, Bin.Cheng <amker.cheng@gmail.com> wrote:
>>
>> Hi Richard,
>> I am testing a patch to sink load of memory to proper basic block.
>> Everything goes fine except auto-vectorization, sinking of load sometime
>> corrupts the canonical form of data references. I haven't touched auto-vec
>> before and cannot tell whether it's good or bad to do sink before auto-vec.
>> For example, the slp-cond-1.c
>>
>> <bb 3>:
>> # i_39 = PHI <i_32(11), 0(2)>
>> D.5150_5 = i_39 * 2;
>> D.5151_10 = D.5150_5 + 1;
>> D.5153_17 = a[D.5150_5];
>> D.5154_19 = b[D.5150_5];
>> if (D.5153_17 >= D.5154_19)
>> goto <bb 9>;
>> else
>> goto <bb 4>;
>>
>> <bb 9>:
>> d0_6 = d[D.5150_5]; <-----this is sunk from bb3
>> goto <bb 5>;
>>
>> <bb 4>:
>> e0_8 = e[D.5150_5]; <-----this is sunk from bb3
>>
>> <bb 5>:
>> # d0_2 = PHI <d0_6(9), e0_8(4)>
>> k[D.5150_5] = d0_2;
>> D.5159_26 = a[D.5151_10];
>> D.5160_29 = b[D.5151_10];
>> if (D.5159_26 >= D.5160_29)
>> goto <bb 10>;
>> else
>> goto <bb 6>;
>>
>>
>> <bb 10>:
>> d1_11 = d[D.5151_10]; <-----this is sunk from bb3
>> goto <bb 7>;
>>
>> <bb 6>:
>> e1_14 = e[D.5151_10]; <-----this is sunk from bb3
>>
>> <bb 7>:
>> .......
>>
>> I will look into auto-vect but not sure how to handle this case.
>>
>> Any comments? Thanks very much.
>
> Simple - the vectorizer expects empty latch blocks. So simply
> never sink stuff into latch-blocks - I think the current code already
> tries to avoid that for regular computations.
Seems not the story. The sinkto basic block is not latch basic block at all.
The problem is about if-conversion, code of the case in slp-cond-1.c is like:
__attribute__((noinline, noclone)) void
f4 (void)
{
int i;
for (i = 0; i < N/2; ++i)
{
int d0 = d[2*i], e0 = e[2*i];
int d1 = d[2*i+1], e1 = e[2*i+1];
k[2*i] = a[2*i] >= b[2*i] ? d0 : e0; <------example
k[2*i+1] = a[2*i+1] >= b[2*i+1] ? d1 : e1;
}
}
It is strictly formed in conditional operations, just like the output
of if-conversion pass before auto-vec.
Now sink pass sinks load of d0/e0 into then/else branch, since they
are partial dead code in the opposite branch, which corrupts the
canonical form.
Ideally, if-conversion pass should handle the sinked code and collapse
if-then-else into conditional operations.
But as showed by dump of if-conversion pass before auto-vect:
<bb 3>:
# i_39 = PHI <i_32(10), 0(2)>
# ivtmp.137_41 = PHI <ivtmp.137_38(10), 16(2)>
D.5150_5 = i_39 * 2;
D.5151_10 = D.5150_5 + 1;
D.5153_17 = a[D.5150_5];
D.5154_19 = b[D.5150_5];
if (D.5153_17 >= D.5154_19)
goto <bb 4>;
else
goto <bb 5>;
<bb 4>:
d0_6 = d[D.5150_5];
goto <bb 6>;
<bb 5>:
e0_8 = e[D.5150_5];
<bb 6>:
# d0_2 = PHI <d0_6(4), e0_8(5)>
k[D.5150_5] = d0_2;
It does not work as expected.
So I think sink pass is fine for this case, it is the if-conversion
pass need further investigation.
Comments? Thanks.
--
Best Regards.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Missed optimization in PRE?
2012-04-11 3:28 ` Bin.Cheng
@ 2012-04-11 8:05 ` Bin.Cheng
2012-04-11 9:09 ` Richard Guenther
0 siblings, 1 reply; 15+ messages in thread
From: Bin.Cheng @ 2012-04-11 8:05 UTC (permalink / raw)
To: Richard Guenther; +Cc: gcc
On Wed, Apr 11, 2012 at 11:28 AM, Bin.Cheng <amker.cheng@gmail.com> wrote:
> On Mon, Apr 9, 2012 at 7:02 PM, Richard Guenther
> <richard.guenther@gmail.com> wrote:
>> On Mon, Apr 9, 2012 at 8:00 AM, Bin.Cheng <amker.cheng@gmail.com> wrote:
>>> On Fri, Mar 30, 2012 at 5:43 PM, Bin.Cheng <amker.cheng@gmail.com> wrote:
>
>>>
>>> Hi Richard,
>>> I am testing a patch to sink load of memory to proper basic block.
>>> Everything goes fine except auto-vectorization, sinking of load sometime
>>> corrupts the canonical form of data references. I haven't touched auto-vec
>>> before and cannot tell whether it's good or bad to do sink before auto-vec.
>>> For example, the slp-cond-1.c
>>>
>>> <bb 3>:
>>> # i_39 = PHI <i_32(11), 0(2)>
>>> D.5150_5 = i_39 * 2;
>>> D.5151_10 = D.5150_5 + 1;
>>> D.5153_17 = a[D.5150_5];
>>> D.5154_19 = b[D.5150_5];
>>> if (D.5153_17 >= D.5154_19)
>>> goto <bb 9>;
>>> else
>>> goto <bb 4>;
>>>
>>> <bb 9>:
>>> d0_6 = d[D.5150_5]; <-----this is sunk from bb3
>>> goto <bb 5>;
>>>
>>> <bb 4>:
>>> e0_8 = e[D.5150_5]; <-----this is sunk from bb3
>>>
>>> <bb 5>:
>>> # d0_2 = PHI <d0_6(9), e0_8(4)>
>>> k[D.5150_5] = d0_2;
>>> D.5159_26 = a[D.5151_10];
>>> D.5160_29 = b[D.5151_10];
>>> if (D.5159_26 >= D.5160_29)
>>> goto <bb 10>;
>>> else
>>> goto <bb 6>;
>>>
>>>
>>> <bb 10>:
>>> d1_11 = d[D.5151_10]; <-----this is sunk from bb3
>>> goto <bb 7>;
>>>
>>> <bb 6>:
>>> e1_14 = e[D.5151_10]; <-----this is sunk from bb3
>>>
>>> <bb 7>:
>>> .......
>>>
>>> I will look into auto-vect but not sure how to handle this case.
>>>
>>> Any comments? Thanks very much.
>>
>> Simple - the vectorizer expects empty latch blocks. So simply
>> never sink stuff into latch-blocks - I think the current code already
>> tries to avoid that for regular computations.
>
> Seems not the story. The sinkto basic block is not latch basic block at all.
>
> The problem is about if-conversion, code of the case in slp-cond-1.c is like:
>
> __attribute__((noinline, noclone)) void
> f4 (void)
> {
> int i;
> for (i = 0; i < N/2; ++i)
> {
> int d0 = d[2*i], e0 = e[2*i];
> int d1 = d[2*i+1], e1 = e[2*i+1];
> k[2*i] = a[2*i] >= b[2*i] ? d0 : e0; <------example
> k[2*i+1] = a[2*i+1] >= b[2*i+1] ? d1 : e1;
> }
> }
> It is strictly formed in conditional operations, just like the output
> of if-conversion pass before auto-vec.
>
> Now sink pass sinks load of d0/e0 into then/else branch, since they
> are partial dead code in the opposite branch, which corrupts the
> canonical form.
>
> Ideally, if-conversion pass should handle the sinked code and collapse
> if-then-else into conditional operations.
> But as showed by dump of if-conversion pass before auto-vect:
>
> <bb 3>:
> # i_39 = PHI <i_32(10), 0(2)>
> # ivtmp.137_41 = PHI <ivtmp.137_38(10), 16(2)>
> D.5150_5 = i_39 * 2;
> D.5151_10 = D.5150_5 + 1;
> D.5153_17 = a[D.5150_5];
> D.5154_19 = b[D.5150_5];
> if (D.5153_17 >= D.5154_19)
> goto <bb 4>;
> else
> goto <bb 5>;
>
> <bb 4>:
> d0_6 = d[D.5150_5];
> goto <bb 6>;
>
> <bb 5>:
> e0_8 = e[D.5150_5];
>
> <bb 6>:
> # d0_2 = PHI <d0_6(4), e0_8(5)>
> k[D.5150_5] = d0_2;
>
> It does not work as expected.
>
> So I think sink pass is fine for this case, it is the if-conversion
> pass need further investigation.
> Comments? Thanks.
Turns out if-conversion checks whether gimple statement traps or not.
For the statement "d0_6 = d[D.5150_5];", it assumes rhs might trap,
because it sees the index not INTEGER_CST.
Two questions:
1, The check can be enhanced in gimple_could_trap_p_1 for ARRAY_REF.
2, Should I check this before sinking load from memory? If yes, then why sink of
store does not do such check?
Again, please correct if I missed something important.
Thanks very much.
--
Best Regards.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Missed optimization in PRE?
2012-04-11 8:05 ` Bin.Cheng
@ 2012-04-11 9:09 ` Richard Guenther
2012-04-11 9:25 ` Bin.Cheng
0 siblings, 1 reply; 15+ messages in thread
From: Richard Guenther @ 2012-04-11 9:09 UTC (permalink / raw)
To: Bin.Cheng; +Cc: gcc
On Wed, Apr 11, 2012 at 10:05 AM, Bin.Cheng <amker.cheng@gmail.com> wrote:
> On Wed, Apr 11, 2012 at 11:28 AM, Bin.Cheng <amker.cheng@gmail.com> wrote:
>> On Mon, Apr 9, 2012 at 7:02 PM, Richard Guenther
>> <richard.guenther@gmail.com> wrote:
>>> On Mon, Apr 9, 2012 at 8:00 AM, Bin.Cheng <amker.cheng@gmail.com> wrote:
>>>> On Fri, Mar 30, 2012 at 5:43 PM, Bin.Cheng <amker.cheng@gmail.com> wrote:
>>
>>>>
>>>> Hi Richard,
>>>> I am testing a patch to sink load of memory to proper basic block.
>>>> Everything goes fine except auto-vectorization, sinking of load sometime
>>>> corrupts the canonical form of data references. I haven't touched auto-vec
>>>> before and cannot tell whether it's good or bad to do sink before auto-vec.
>>>> For example, the slp-cond-1.c
>>>>
>>>> <bb 3>:
>>>> # i_39 = PHI <i_32(11), 0(2)>
>>>> D.5150_5 = i_39 * 2;
>>>> D.5151_10 = D.5150_5 + 1;
>>>> D.5153_17 = a[D.5150_5];
>>>> D.5154_19 = b[D.5150_5];
>>>> if (D.5153_17 >= D.5154_19)
>>>> goto <bb 9>;
>>>> else
>>>> goto <bb 4>;
>>>>
>>>> <bb 9>:
>>>> d0_6 = d[D.5150_5]; <-----this is sunk from bb3
>>>> goto <bb 5>;
>>>>
>>>> <bb 4>:
>>>> e0_8 = e[D.5150_5]; <-----this is sunk from bb3
>>>>
>>>> <bb 5>:
>>>> # d0_2 = PHI <d0_6(9), e0_8(4)>
>>>> k[D.5150_5] = d0_2;
>>>> D.5159_26 = a[D.5151_10];
>>>> D.5160_29 = b[D.5151_10];
>>>> if (D.5159_26 >= D.5160_29)
>>>> goto <bb 10>;
>>>> else
>>>> goto <bb 6>;
>>>>
>>>>
>>>> <bb 10>:
>>>> d1_11 = d[D.5151_10]; <-----this is sunk from bb3
>>>> goto <bb 7>;
>>>>
>>>> <bb 6>:
>>>> e1_14 = e[D.5151_10]; <-----this is sunk from bb3
>>>>
>>>> <bb 7>:
>>>> .......
>>>>
>>>> I will look into auto-vect but not sure how to handle this case.
>>>>
>>>> Any comments? Thanks very much.
>>>
>>> Simple - the vectorizer expects empty latch blocks. So simply
>>> never sink stuff into latch-blocks - I think the current code already
>>> tries to avoid that for regular computations.
>>
>> Seems not the story. The sinkto basic block is not latch basic block at all.
>>
>> The problem is about if-conversion, code of the case in slp-cond-1.c is like:
>>
>> __attribute__((noinline, noclone)) void
>> f4 (void)
>> {
>> int i;
>> for (i = 0; i < N/2; ++i)
>> {
>> int d0 = d[2*i], e0 = e[2*i];
>> int d1 = d[2*i+1], e1 = e[2*i+1];
>> k[2*i] = a[2*i] >= b[2*i] ? d0 : e0; <------example
>> k[2*i+1] = a[2*i+1] >= b[2*i+1] ? d1 : e1;
>> }
>> }
>> It is strictly formed in conditional operations, just like the output
>> of if-conversion pass before auto-vec.
>>
>> Now sink pass sinks load of d0/e0 into then/else branch, since they
>> are partial dead code in the opposite branch, which corrupts the
>> canonical form.
>>
>> Ideally, if-conversion pass should handle the sinked code and collapse
>> if-then-else into conditional operations.
>> But as showed by dump of if-conversion pass before auto-vect:
>>
>> <bb 3>:
>> # i_39 = PHI <i_32(10), 0(2)>
>> # ivtmp.137_41 = PHI <ivtmp.137_38(10), 16(2)>
>> D.5150_5 = i_39 * 2;
>> D.5151_10 = D.5150_5 + 1;
>> D.5153_17 = a[D.5150_5];
>> D.5154_19 = b[D.5150_5];
>> if (D.5153_17 >= D.5154_19)
>> goto <bb 4>;
>> else
>> goto <bb 5>;
>>
>> <bb 4>:
>> d0_6 = d[D.5150_5];
>> goto <bb 6>;
>>
>> <bb 5>:
>> e0_8 = e[D.5150_5];
>>
>> <bb 6>:
>> # d0_2 = PHI <d0_6(4), e0_8(5)>
>> k[D.5150_5] = d0_2;
>>
>> It does not work as expected.
>>
>> So I think sink pass is fine for this case, it is the if-conversion
>> pass need further investigation.
>> Comments? Thanks.
>
> Turns out if-conversion checks whether gimple statement traps or not.
> For the statement "d0_6 = d[D.5150_5];", it assumes rhs might trap,
> because it sees the index not INTEGER_CST.
Yes. After sinking the load is no longer executed unconditionally but
if-conversion would make it so. This is information loss caused by
sinking.
> Two questions:
> 1, The check can be enhanced in gimple_could_trap_p_1 for ARRAY_REF.
No, the index may be out-of-bounds.
> 2, Should I check this before sinking load from memory? If yes, then why sink of
> store does not do such check?
Sinking is never a problem - the code will only be executed less times. The
issue with if-conversion is that it speculates the loads / stores, so they may
not trap if they were originally not executed.
So this is a pass ordering issue - sinking and if-conversion have different
conflicting goals. Btw, you also make RTL if-conversion harder. I suppose
you should try to avoid messing up if-conversion possibilities so early,
thus, not sink in these cases. The same issue is present
for non-loads that are possibly trapping. So I'm not even sure we can easily
detect these cases - apart from never sinking possibly trapping stmts.
At least you could say that the side-effect of trapping has to be preserved
(note that we do not generally do that, which you might consider a bug).
Richard.
> Again, please correct if I missed something important.
> Thanks very much.
>
> --
> Best Regards.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Missed optimization in PRE?
2012-04-11 9:09 ` Richard Guenther
@ 2012-04-11 9:25 ` Bin.Cheng
2012-04-11 10:02 ` Richard Guenther
0 siblings, 1 reply; 15+ messages in thread
From: Bin.Cheng @ 2012-04-11 9:25 UTC (permalink / raw)
To: Richard Guenther; +Cc: gcc
On Wed, Apr 11, 2012 at 5:09 PM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> On Wed, Apr 11, 2012 at 10:05 AM, Bin.Cheng <amker.cheng@gmail.com> wrote:
>> On Wed, Apr 11, 2012 at 11:28 AM, Bin.Cheng <amker.cheng@gmail.com> wrote:
>>
>> Turns out if-conversion checks whether gimple statement traps or not.
>> For the statement "d0_6 = d[D.5150_5];", it assumes rhs might trap,
>> because it sees the index not INTEGER_CST.
>
> Yes. After sinking the load is no longer executed unconditionally but
> if-conversion would make it so. This is information loss caused by
> sinking.
>
>> Two questions:
>> 1, The check can be enhanced in gimple_could_trap_p_1 for ARRAY_REF.
>
> No, the index may be out-of-bounds.
Apart from this topic. Actually I mean we can do more bound check
rather than assume
out-of-bounds if the index is not CST.
For this example, the source code is like:
#define N 32
int a[N], b[N];
int d[N], e[N];
int k[N];
__attribute__((noinline, noclone)) void
f4 (void)
{
int i;
for (i = 0; i < N/2; ++i)
{
int d0 = d[2*i], e0 = e[2*i];
int d1 = d[2*i+1], e1 = e[2*i+1];
k[2*i] = a[2*i] >= b[2*i] ? d0 : e0;
k[2*i+1] = a[2*i+1] >= b[2*i+1] ? d1 : e1;
}
}
The load "d[2*i+1]" is never out-of-bound, right?
>
>> 2, Should I check this before sinking load from memory? If yes, then why sink of
>> store does not do such check?
>
> Sinking is never a problem - the code will only be executed less times. The
> issue with if-conversion is that it speculates the loads / stores, so they may
> not trap if they were originally not executed.
>
> So this is a pass ordering issue - sinking and if-conversion have different
> conflicting goals. Btw, you also make RTL if-conversion harder. I suppose
Yes, I have already found a case resulting in bad basic block ordering
at RTL level,
though not sure it RTL if-conversion related.
> you should try to avoid messing up if-conversion possibilities so early,
> thus, not sink in these cases. The same issue is present
> for non-loads that are possibly trapping. So I'm not even sure we can easily
> detect these cases - apart from never sinking possibly trapping stmts.
>
> At least you could say that the side-effect of trapping has to be preserved
> (note that we do not generally do that, which you might consider a bug).
Understood. I have already tested to not sink possibly trapping stmt, but not
sure whether this is still wanted in GCC.
Thanks.
--
Best Regards.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Missed optimization in PRE?
2012-04-11 9:25 ` Bin.Cheng
@ 2012-04-11 10:02 ` Richard Guenther
0 siblings, 0 replies; 15+ messages in thread
From: Richard Guenther @ 2012-04-11 10:02 UTC (permalink / raw)
To: Bin.Cheng; +Cc: gcc
On Wed, Apr 11, 2012 at 11:25 AM, Bin.Cheng <amker.cheng@gmail.com> wrote:
> On Wed, Apr 11, 2012 at 5:09 PM, Richard Guenther
> <richard.guenther@gmail.com> wrote:
>> On Wed, Apr 11, 2012 at 10:05 AM, Bin.Cheng <amker.cheng@gmail.com> wrote:
>>> On Wed, Apr 11, 2012 at 11:28 AM, Bin.Cheng <amker.cheng@gmail.com> wrote:
>
>>>
>>> Turns out if-conversion checks whether gimple statement traps or not.
>>> For the statement "d0_6 = d[D.5150_5];", it assumes rhs might trap,
>>> because it sees the index not INTEGER_CST.
>>
>> Yes. After sinking the load is no longer executed unconditionally but
>> if-conversion would make it so. This is information loss caused by
>> sinking.
>>
>>> Two questions:
>>> 1, The check can be enhanced in gimple_could_trap_p_1 for ARRAY_REF.
>>
>> No, the index may be out-of-bounds.
>
> Apart from this topic. Actually I mean we can do more bound check
> rather than assume
> out-of-bounds if the index is not CST.
> For this example, the source code is like:
>
> #define N 32
> int a[N], b[N];
> int d[N], e[N];
> int k[N];
>
> __attribute__((noinline, noclone)) void
> f4 (void)
> {
> int i;
> for (i = 0; i < N/2; ++i)
> {
> int d0 = d[2*i], e0 = e[2*i];
> int d1 = d[2*i+1], e1 = e[2*i+1];
> k[2*i] = a[2*i] >= b[2*i] ? d0 : e0;
> k[2*i+1] = a[2*i+1] >= b[2*i+1] ? d1 : e1;
> }
> }
>
> The load "d[2*i+1]" is never out-of-bound, right?
Right. value-range analysis could set TREE_NO_TRAP on the memory
reference.
>>
>>> 2, Should I check this before sinking load from memory? If yes, then why sink of
>>> store does not do such check?
>>
>> Sinking is never a problem - the code will only be executed less times. The
>> issue with if-conversion is that it speculates the loads / stores, so they may
>> not trap if they were originally not executed.
>>
>> So this is a pass ordering issue - sinking and if-conversion have different
>> conflicting goals. Btw, you also make RTL if-conversion harder. I suppose
> Yes, I have already found a case resulting in bad basic block ordering
> at RTL level,
> though not sure it RTL if-conversion related.
>
>> you should try to avoid messing up if-conversion possibilities so early,
>> thus, not sink in these cases. The same issue is present
>> for non-loads that are possibly trapping. So I'm not even sure we can easily
>> detect these cases - apart from never sinking possibly trapping stmts.
>>
>> At least you could say that the side-effect of trapping has to be preserved
>> (note that we do not generally do that, which you might consider a bug).
>
> Understood. I have already tested to not sink possibly trapping stmt, but not
> sure whether this is still wanted in GCC.
In general the sinking is wanted. For non-trapping stmts it would be
obvious I think.
Richard.
> Thanks.
>
>
> --
> Best Regards.
^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2012-04-11 10:02 UTC | newest]
Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-03-29 10:02 Missed optimization in PRE? Bin.Cheng
2012-03-29 10:07 ` Richard Guenther
2012-03-29 10:10 ` Bin.Cheng
2012-03-29 10:14 ` Richard Guenther
2012-03-29 10:22 ` Bin.Cheng
2012-03-29 15:25 ` Bin.Cheng
2012-03-30 8:16 ` Richard Guenther
2012-03-30 9:43 ` Bin.Cheng
2012-04-09 6:00 ` Bin.Cheng
2012-04-09 11:02 ` Richard Guenther
2012-04-11 3:28 ` Bin.Cheng
2012-04-11 8:05 ` Bin.Cheng
2012-04-11 9:09 ` Richard Guenther
2012-04-11 9:25 ` Bin.Cheng
2012-04-11 10:02 ` Richard Guenther
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).