* Missed optimization in PRE? @ 2012-03-29 10:02 Bin.Cheng 2012-03-29 10:07 ` Richard Guenther 0 siblings, 1 reply; 15+ messages in thread From: Bin.Cheng @ 2012-03-29 10:02 UTC (permalink / raw) To: gcc Hi, Following is the tree dump of 094t.pre for a test program. Question is loads of D.5375_12/D.5375_14 are redundant on path <bb2, bb7, bb5, bb6>, but why not lowered into basic block 3, where it is used. BTW, seems no tree pass handles this case currently. Any idea? Thanks int z$imag; int z$real; int D.5378; int D.5377; int D.5376; int D.5375; <bb 2>: D.5375_11 = REALPART_EXPR <g2>; D.5376_12 = IMAGPART_EXPR <g2>; D.5377_13 = REALPART_EXPR <g3>; D.5378_14 = IMAGPART_EXPR <g3>; if (D.5375_11 == D.5377_13) goto <bb 3>; else goto <bb 7>; <bb 3>: if (D.5376_12 == D.5378_14) goto <bb 4>; else goto <bb 7>; <bb 4>: z$real_15 = REALPART_EXPR <g1>; z$imag_16 = IMAGPART_EXPR <g1>; goto <bb 6>; <bb 7>: <bb 5>: z$real_17 = REALPART_EXPR <g4>; z$imag_18 = IMAGPART_EXPR <g4>; <bb 6>: # z$real_19 = PHI <z$real_15(4), z$real_17(5)> # z$imag_20 = PHI <z$imag_16(4), z$imag_18(5)> REALPART_EXPR <g1> = z$real_19; IMAGPART_EXPR <g1> = z$imag_20; return 0; -- Best Regards. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Missed optimization in PRE? 2012-03-29 10:02 Missed optimization in PRE? Bin.Cheng @ 2012-03-29 10:07 ` Richard Guenther 2012-03-29 10:10 ` Bin.Cheng 0 siblings, 1 reply; 15+ messages in thread From: Richard Guenther @ 2012-03-29 10:07 UTC (permalink / raw) To: Bin.Cheng; +Cc: gcc On Thu, Mar 29, 2012 at 12:02 PM, Bin.Cheng <amker.cheng@gmail.com> wrote: > Hi, > Following is the tree dump of 094t.pre for a test program. > Question is loads of D.5375_12/D.5375_14 are redundant on path <bb2, > bb7, bb5, bb6>, > but why not lowered into basic block 3, where it is used. > > BTW, seems no tree pass handles this case currently. tree-ssa-sink.c should do this. > Any idea? Thanks > > int z$imag; > int z$real; > int D.5378; > int D.5377; > int D.5376; > int D.5375; > > <bb 2>: > D.5375_11 = REALPART_EXPR <g2>; > D.5376_12 = IMAGPART_EXPR <g2>; > D.5377_13 = REALPART_EXPR <g3>; > D.5378_14 = IMAGPART_EXPR <g3>; > if (D.5375_11 == D.5377_13) > goto <bb 3>; > else > goto <bb 7>; > > <bb 3>: > if (D.5376_12 == D.5378_14) > goto <bb 4>; > else > goto <bb 7>; > > <bb 4>: > z$real_15 = REALPART_EXPR <g1>; > z$imag_16 = IMAGPART_EXPR <g1>; > goto <bb 6>; > > <bb 7>: > > <bb 5>: > z$real_17 = REALPART_EXPR <g4>; > z$imag_18 = IMAGPART_EXPR <g4>; > > <bb 6>: > # z$real_19 = PHI <z$real_15(4), z$real_17(5)> > # z$imag_20 = PHI <z$imag_16(4), z$imag_18(5)> > REALPART_EXPR <g1> = z$real_19; > IMAGPART_EXPR <g1> = z$imag_20; > return 0; > > -- > Best Regards. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Missed optimization in PRE? 2012-03-29 10:07 ` Richard Guenther @ 2012-03-29 10:10 ` Bin.Cheng 2012-03-29 10:14 ` Richard Guenther 0 siblings, 1 reply; 15+ messages in thread From: Bin.Cheng @ 2012-03-29 10:10 UTC (permalink / raw) To: Richard Guenther; +Cc: gcc On Thu, Mar 29, 2012 at 6:07 PM, Richard Guenther <richard.guenther@gmail.com> wrote: > On Thu, Mar 29, 2012 at 12:02 PM, Bin.Cheng <amker.cheng@gmail.com> wrote: >> Hi, >> Following is the tree dump of 094t.pre for a test program. >> Question is loads of D.5375_12/D.5375_14 are redundant on path <bb2, >> bb7, bb5, bb6>, >> but why not lowered into basic block 3, where it is used. >> >> BTW, seems no tree pass handles this case currently. > > tree-ssa-sink.c should do this. > It does not work for me, I will double check and update soon. -- Best Regards. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Missed optimization in PRE? 2012-03-29 10:10 ` Bin.Cheng @ 2012-03-29 10:14 ` Richard Guenther 2012-03-29 10:22 ` Bin.Cheng 2012-03-29 15:25 ` Bin.Cheng 0 siblings, 2 replies; 15+ messages in thread From: Richard Guenther @ 2012-03-29 10:14 UTC (permalink / raw) To: Bin.Cheng; +Cc: gcc On Thu, Mar 29, 2012 at 12:10 PM, Bin.Cheng <amker.cheng@gmail.com> wrote: > On Thu, Mar 29, 2012 at 6:07 PM, Richard Guenther > <richard.guenther@gmail.com> wrote: >> On Thu, Mar 29, 2012 at 12:02 PM, Bin.Cheng <amker.cheng@gmail.com> wrote: >>> Hi, >>> Following is the tree dump of 094t.pre for a test program. >>> Question is loads of D.5375_12/D.5375_14 are redundant on path <bb2, >>> bb7, bb5, bb6>, >>> but why not lowered into basic block 3, where it is used. >>> >>> BTW, seems no tree pass handles this case currently. >> >> tree-ssa-sink.c should do this. >> > It does not work for me, I will double check and update soon. Well, "should" as in, it's the place to do it. And certainly the pass can sink loads, so this must be a missed optimization. Richard. > -- > Best Regards. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Missed optimization in PRE? 2012-03-29 10:14 ` Richard Guenther @ 2012-03-29 10:22 ` Bin.Cheng 2012-03-29 15:25 ` Bin.Cheng 1 sibling, 0 replies; 15+ messages in thread From: Bin.Cheng @ 2012-03-29 10:22 UTC (permalink / raw) To: Richard Guenther; +Cc: gcc On Thu, Mar 29, 2012 at 6:14 PM, Richard Guenther <richard.guenther@gmail.com> wrote: > On Thu, Mar 29, 2012 at 12:10 PM, Bin.Cheng <amker.cheng@gmail.com> wrote: >> On Thu, Mar 29, 2012 at 6:07 PM, Richard Guenther >> <richard.guenther@gmail.com> wrote: >>> On Thu, Mar 29, 2012 at 12:02 PM, Bin.Cheng <amker.cheng@gmail.com> wrote: >>>> Hi, >>>> Following is the tree dump of 094t.pre for a test program. >>>> Question is loads of D.5375_12/D.5375_14 are redundant on path <bb2, >>>> bb7, bb5, bb6>, >>>> but why not lowered into basic block 3, where it is used. >>>> >>>> BTW, seems no tree pass handles this case currently. >>> >>> tree-ssa-sink.c should do this. >>> >> It does not work for me, I will double check and update soon. > > Well, "should" as in, it's the place to do it. And certainly the pass can sink > loads, so this must be a missed optimization. > ok, I will investigate it. -- Best Regards. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Missed optimization in PRE? 2012-03-29 10:14 ` Richard Guenther 2012-03-29 10:22 ` Bin.Cheng @ 2012-03-29 15:25 ` Bin.Cheng 2012-03-30 8:16 ` Richard Guenther 1 sibling, 1 reply; 15+ messages in thread From: Bin.Cheng @ 2012-03-29 15:25 UTC (permalink / raw) To: Richard Guenther; +Cc: gcc On Thu, Mar 29, 2012 at 6:14 PM, Richard Guenther <richard.guenther@gmail.com> wrote: > On Thu, Mar 29, 2012 at 12:10 PM, Bin.Cheng <amker.cheng@gmail.com> wrote: >> On Thu, Mar 29, 2012 at 6:07 PM, Richard Guenther >> <richard.guenther@gmail.com> wrote: >>> On Thu, Mar 29, 2012 at 12:02 PM, Bin.Cheng <amker.cheng@gmail.com> wrote: >>>> Hi, >>>> Following is the tree dump of 094t.pre for a test program. >>>> Question is loads of D.5375_12/D.5375_14 are redundant on path <bb2, >>>> bb7, bb5, bb6>, >>>> but why not lowered into basic block 3, where it is used. >>>> >>>> BTW, seems no tree pass handles this case currently. >>> >>> tree-ssa-sink.c should do this. >>> >> It does not work for me, I will double check and update soon. > > Well, "should" as in, it's the place to do it. And certainly the pass can sink > loads, so this must be a missed optimization. > Curiously, it is said explicitly that "We don't want to sink loads from memory." in tree-ssa-sink.c function statement_sink_location, and the condition is if (stmt_ends_bb_p (stmt) || gimple_has_side_effects (stmt) || gimple_has_volatile_ops (stmt) || (gimple_vuse (stmt) && !gimple_vdef (stmt)) <-----------------check load || (cfun->has_local_explicit_reg_vars && TYPE_MODE (TREE_TYPE (gimple_assign_lhs (stmt))) == BLKmode)) return false; I haven't found any clue about this decision in ChangeLogs. -- Best Regards. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Missed optimization in PRE? 2012-03-29 15:25 ` Bin.Cheng @ 2012-03-30 8:16 ` Richard Guenther 2012-03-30 9:43 ` Bin.Cheng 0 siblings, 1 reply; 15+ messages in thread From: Richard Guenther @ 2012-03-30 8:16 UTC (permalink / raw) To: Bin.Cheng; +Cc: gcc On Thu, Mar 29, 2012 at 5:25 PM, Bin.Cheng <amker.cheng@gmail.com> wrote: > On Thu, Mar 29, 2012 at 6:14 PM, Richard Guenther > <richard.guenther@gmail.com> wrote: >> On Thu, Mar 29, 2012 at 12:10 PM, Bin.Cheng <amker.cheng@gmail.com> wrote: >>> On Thu, Mar 29, 2012 at 6:07 PM, Richard Guenther >>> <richard.guenther@gmail.com> wrote: >>>> On Thu, Mar 29, 2012 at 12:02 PM, Bin.Cheng <amker.cheng@gmail.com> wrote: >>>>> Hi, >>>>> Following is the tree dump of 094t.pre for a test program. >>>>> Question is loads of D.5375_12/D.5375_14 are redundant on path <bb2, >>>>> bb7, bb5, bb6>, >>>>> but why not lowered into basic block 3, where it is used. >>>>> >>>>> BTW, seems no tree pass handles this case currently. >>>> >>>> tree-ssa-sink.c should do this. >>>> >>> It does not work for me, I will double check and update soon. >> >> Well, "should" as in, it's the place to do it. And certainly the pass can sink >> loads, so this must be a missed optimization. >> > Curiously, it is said explicitly that "We don't want to sink loads from memory." > in tree-ssa-sink.c function statement_sink_location, and the condition is > > if (stmt_ends_bb_p (stmt) > || gimple_has_side_effects (stmt) > || gimple_has_volatile_ops (stmt) > || (gimple_vuse (stmt) && !gimple_vdef (stmt)) > <-----------------check load > || (cfun->has_local_explicit_reg_vars > && TYPE_MODE (TREE_TYPE (gimple_assign_lhs (stmt))) == BLKmode)) > return false; > > I haven't found any clue about this decision in ChangeLogs. Ah, that's probably because usually you want to hoist loads and sink stores, separating them (like a scheduler would do). We'd want to restrict sinking of loads to sink into not post-dominated regions (thus where they end up being executed less times). Richard. > > -- > Best Regards. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Missed optimization in PRE? 2012-03-30 8:16 ` Richard Guenther @ 2012-03-30 9:43 ` Bin.Cheng 2012-04-09 6:00 ` Bin.Cheng 0 siblings, 1 reply; 15+ messages in thread From: Bin.Cheng @ 2012-03-30 9:43 UTC (permalink / raw) To: Richard Guenther; +Cc: gcc On Fri, Mar 30, 2012 at 4:15 PM, Richard Guenther <richard.guenther@gmail.com> wrote: > On Thu, Mar 29, 2012 at 5:25 PM, Bin.Cheng <amker.cheng@gmail.com> wrote: >> On Thu, Mar 29, 2012 at 6:14 PM, Richard Guenther >> <richard.guenther@gmail.com> wrote: >>> On Thu, Mar 29, 2012 at 12:10 PM, Bin.Cheng <amker.cheng@gmail.com> wrote: >>>> On Thu, Mar 29, 2012 at 6:07 PM, Richard Guenther >>>> <richard.guenther@gmail.com> wrote: >>>>> On Thu, Mar 29, 2012 at 12:02 PM, Bin.Cheng <amker.cheng@gmail.com> wrote: >>>>>> Hi, >>>>>> Following is the tree dump of 094t.pre for a test program. >>>>>> Question is loads of D.5375_12/D.5375_14 are redundant on path <bb2, >>>>>> bb7, bb5, bb6>, >>>>>> but why not lowered into basic block 3, where it is used. >>>>>> >>>>>> BTW, seems no tree pass handles this case currently. >>>>> >>>>> tree-ssa-sink.c should do this. >>>>> >>>> It does not work for me, I will double check and update soon. >>> >>> Well, "should" as in, it's the place to do it. And certainly the pass can sink >>> loads, so this must be a missed optimization. >>> >> Curiously, it is said explicitly that "We don't want to sink loads from memory." >> in tree-ssa-sink.c function statement_sink_location, and the condition is >> >> if (stmt_ends_bb_p (stmt) >> || gimple_has_side_effects (stmt) >> || gimple_has_volatile_ops (stmt) >> || (gimple_vuse (stmt) && !gimple_vdef (stmt)) >> <-----------------check load >> || (cfun->has_local_explicit_reg_vars >> && TYPE_MODE (TREE_TYPE (gimple_assign_lhs (stmt))) == BLKmode)) >> return false; >> >> I haven't found any clue about this decision in ChangeLogs. > > Ah, that's probably because usually you want to hoist loads and sink stores, > separating them (like a scheduler would do). We'd want to restrict sinking > of loads to sink into not post-dominated regions (thus where they end up > being executed less times). > Understood, I will work on this. Thanks. -- Best Regards. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Missed optimization in PRE? 2012-03-30 9:43 ` Bin.Cheng @ 2012-04-09 6:00 ` Bin.Cheng 2012-04-09 11:02 ` Richard Guenther 0 siblings, 1 reply; 15+ messages in thread From: Bin.Cheng @ 2012-04-09 6:00 UTC (permalink / raw) To: Richard Guenther; +Cc: gcc On Fri, Mar 30, 2012 at 5:43 PM, Bin.Cheng <amker.cheng@gmail.com> wrote: > On Fri, Mar 30, 2012 at 4:15 PM, Richard Guenther > <richard.guenther@gmail.com> wrote: >> On Thu, Mar 29, 2012 at 5:25 PM, Bin.Cheng <amker.cheng@gmail.com> wrote: >>> On Thu, Mar 29, 2012 at 6:14 PM, Richard Guenther >>> <richard.guenther@gmail.com> wrote: >>>> On Thu, Mar 29, 2012 at 12:10 PM, Bin.Cheng <amker.cheng@gmail.com> wrote: >>>>> On Thu, Mar 29, 2012 at 6:07 PM, Richard Guenther >>>>> <richard.guenther@gmail.com> wrote: >>>>>> On Thu, Mar 29, 2012 at 12:02 PM, Bin.Cheng <amker.cheng@gmail.com> wrote: >>>>>>> Hi, >>>>>>> Following is the tree dump of 094t.pre for a test program. >>>>>>> Question is loads of D.5375_12/D.5375_14 are redundant on path <bb2, >>>>>>> bb7, bb5, bb6>, >>>>>>> but why not lowered into basic block 3, where it is used. >>>>>>> >>>>>>> BTW, seems no tree pass handles this case currently. >>>>>> >>>>>> tree-ssa-sink.c should do this. >>>>>> >>>>> It does not work for me, I will double check and update soon. >>>> >>>> Well, "should" as in, it's the place to do it. And certainly the pass can sink >>>> loads, so this must be a missed optimization. >>>> >>> Curiously, it is said explicitly that "We don't want to sink loads from memory." >>> in tree-ssa-sink.c function statement_sink_location, and the condition is >>> >>> if (stmt_ends_bb_p (stmt) >>> || gimple_has_side_effects (stmt) >>> || gimple_has_volatile_ops (stmt) >>> || (gimple_vuse (stmt) && !gimple_vdef (stmt)) >>> <-----------------check load >>> || (cfun->has_local_explicit_reg_vars >>> && TYPE_MODE (TREE_TYPE (gimple_assign_lhs (stmt))) == BLKmode)) >>> return false; >>> >>> I haven't found any clue about this decision in ChangeLogs. >> >> Ah, that's probably because usually you want to hoist loads and sink stores, >> separating them (like a scheduler would do). We'd want to restrict sinking >> of loads to sink into not post-dominated regions (thus where they end up >> being executed less times). Hi Richard, I am testing a patch to sink load of memory to proper basic block. Everything goes fine except auto-vectorization, sinking of load sometime corrupts the canonical form of data references. I haven't touched auto-vec before and cannot tell whether it's good or bad to do sink before auto-vec. For example, the slp-cond-1.c <bb 3>: # i_39 = PHI <i_32(11), 0(2)> D.5150_5 = i_39 * 2; D.5151_10 = D.5150_5 + 1; D.5153_17 = a[D.5150_5]; D.5154_19 = b[D.5150_5]; if (D.5153_17 >= D.5154_19) goto <bb 9>; else goto <bb 4>; <bb 9>: d0_6 = d[D.5150_5]; <-----this is sunk from bb3 goto <bb 5>; <bb 4>: e0_8 = e[D.5150_5]; <-----this is sunk from bb3 <bb 5>: # d0_2 = PHI <d0_6(9), e0_8(4)> k[D.5150_5] = d0_2; D.5159_26 = a[D.5151_10]; D.5160_29 = b[D.5151_10]; if (D.5159_26 >= D.5160_29) goto <bb 10>; else goto <bb 6>; <bb 10>: d1_11 = d[D.5151_10]; <-----this is sunk from bb3 goto <bb 7>; <bb 6>: e1_14 = e[D.5151_10]; <-----this is sunk from bb3 <bb 7>: ....... I will look into auto-vect but not sure how to handle this case. Any comments? Thanks very much. -- Best Regards. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Missed optimization in PRE? 2012-04-09 6:00 ` Bin.Cheng @ 2012-04-09 11:02 ` Richard Guenther 2012-04-11 3:28 ` Bin.Cheng 0 siblings, 1 reply; 15+ messages in thread From: Richard Guenther @ 2012-04-09 11:02 UTC (permalink / raw) To: Bin.Cheng; +Cc: gcc On Mon, Apr 9, 2012 at 8:00 AM, Bin.Cheng <amker.cheng@gmail.com> wrote: > On Fri, Mar 30, 2012 at 5:43 PM, Bin.Cheng <amker.cheng@gmail.com> wrote: >> On Fri, Mar 30, 2012 at 4:15 PM, Richard Guenther >> <richard.guenther@gmail.com> wrote: >>> On Thu, Mar 29, 2012 at 5:25 PM, Bin.Cheng <amker.cheng@gmail.com> wrote: >>>> On Thu, Mar 29, 2012 at 6:14 PM, Richard Guenther >>>> <richard.guenther@gmail.com> wrote: >>>>> On Thu, Mar 29, 2012 at 12:10 PM, Bin.Cheng <amker.cheng@gmail.com> wrote: >>>>>> On Thu, Mar 29, 2012 at 6:07 PM, Richard Guenther >>>>>> <richard.guenther@gmail.com> wrote: >>>>>>> On Thu, Mar 29, 2012 at 12:02 PM, Bin.Cheng <amker.cheng@gmail.com> wrote: >>>>>>>> Hi, >>>>>>>> Following is the tree dump of 094t.pre for a test program. >>>>>>>> Question is loads of D.5375_12/D.5375_14 are redundant on path <bb2, >>>>>>>> bb7, bb5, bb6>, >>>>>>>> but why not lowered into basic block 3, where it is used. >>>>>>>> >>>>>>>> BTW, seems no tree pass handles this case currently. >>>>>>> >>>>>>> tree-ssa-sink.c should do this. >>>>>>> >>>>>> It does not work for me, I will double check and update soon. >>>>> >>>>> Well, "should" as in, it's the place to do it. And certainly the pass can sink >>>>> loads, so this must be a missed optimization. >>>>> >>>> Curiously, it is said explicitly that "We don't want to sink loads from memory." >>>> in tree-ssa-sink.c function statement_sink_location, and the condition is >>>> >>>> if (stmt_ends_bb_p (stmt) >>>> || gimple_has_side_effects (stmt) >>>> || gimple_has_volatile_ops (stmt) >>>> || (gimple_vuse (stmt) && !gimple_vdef (stmt)) >>>> <-----------------check load >>>> || (cfun->has_local_explicit_reg_vars >>>> && TYPE_MODE (TREE_TYPE (gimple_assign_lhs (stmt))) == BLKmode)) >>>> return false; >>>> >>>> I haven't found any clue about this decision in ChangeLogs. >>> >>> Ah, that's probably because usually you want to hoist loads and sink stores, >>> separating them (like a scheduler would do). We'd want to restrict sinking >>> of loads to sink into not post-dominated regions (thus where they end up >>> being executed less times). > > Hi Richard, > I am testing a patch to sink load of memory to proper basic block. > Everything goes fine except auto-vectorization, sinking of load sometime > corrupts the canonical form of data references. I haven't touched auto-vec > before and cannot tell whether it's good or bad to do sink before auto-vec. > For example, the slp-cond-1.c > > <bb 3>: > # i_39 = PHI <i_32(11), 0(2)> > D.5150_5 = i_39 * 2; > D.5151_10 = D.5150_5 + 1; > D.5153_17 = a[D.5150_5]; > D.5154_19 = b[D.5150_5]; > if (D.5153_17 >= D.5154_19) > goto <bb 9>; > else > goto <bb 4>; > > <bb 9>: > d0_6 = d[D.5150_5]; <-----this is sunk from bb3 > goto <bb 5>; > > <bb 4>: > e0_8 = e[D.5150_5]; <-----this is sunk from bb3 > > <bb 5>: > # d0_2 = PHI <d0_6(9), e0_8(4)> > k[D.5150_5] = d0_2; > D.5159_26 = a[D.5151_10]; > D.5160_29 = b[D.5151_10]; > if (D.5159_26 >= D.5160_29) > goto <bb 10>; > else > goto <bb 6>; > > > <bb 10>: > d1_11 = d[D.5151_10]; <-----this is sunk from bb3 > goto <bb 7>; > > <bb 6>: > e1_14 = e[D.5151_10]; <-----this is sunk from bb3 > > <bb 7>: > ....... > > I will look into auto-vect but not sure how to handle this case. > > Any comments? Thanks very much. Simple - the vectorizer expects empty latch blocks. So simply never sink stuff into latch-blocks - I think the current code already tries to avoid that for regular computations. Richard. > -- > Best Regards. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Missed optimization in PRE? 2012-04-09 11:02 ` Richard Guenther @ 2012-04-11 3:28 ` Bin.Cheng 2012-04-11 8:05 ` Bin.Cheng 0 siblings, 1 reply; 15+ messages in thread From: Bin.Cheng @ 2012-04-11 3:28 UTC (permalink / raw) To: Richard Guenther; +Cc: gcc On Mon, Apr 9, 2012 at 7:02 PM, Richard Guenther <richard.guenther@gmail.com> wrote: > On Mon, Apr 9, 2012 at 8:00 AM, Bin.Cheng <amker.cheng@gmail.com> wrote: >> On Fri, Mar 30, 2012 at 5:43 PM, Bin.Cheng <amker.cheng@gmail.com> wrote: >> >> Hi Richard, >> I am testing a patch to sink load of memory to proper basic block. >> Everything goes fine except auto-vectorization, sinking of load sometime >> corrupts the canonical form of data references. I haven't touched auto-vec >> before and cannot tell whether it's good or bad to do sink before auto-vec. >> For example, the slp-cond-1.c >> >> <bb 3>: >> # i_39 = PHI <i_32(11), 0(2)> >> D.5150_5 = i_39 * 2; >> D.5151_10 = D.5150_5 + 1; >> D.5153_17 = a[D.5150_5]; >> D.5154_19 = b[D.5150_5]; >> if (D.5153_17 >= D.5154_19) >> goto <bb 9>; >> else >> goto <bb 4>; >> >> <bb 9>: >> d0_6 = d[D.5150_5]; <-----this is sunk from bb3 >> goto <bb 5>; >> >> <bb 4>: >> e0_8 = e[D.5150_5]; <-----this is sunk from bb3 >> >> <bb 5>: >> # d0_2 = PHI <d0_6(9), e0_8(4)> >> k[D.5150_5] = d0_2; >> D.5159_26 = a[D.5151_10]; >> D.5160_29 = b[D.5151_10]; >> if (D.5159_26 >= D.5160_29) >> goto <bb 10>; >> else >> goto <bb 6>; >> >> >> <bb 10>: >> d1_11 = d[D.5151_10]; <-----this is sunk from bb3 >> goto <bb 7>; >> >> <bb 6>: >> e1_14 = e[D.5151_10]; <-----this is sunk from bb3 >> >> <bb 7>: >> ....... >> >> I will look into auto-vect but not sure how to handle this case. >> >> Any comments? Thanks very much. > > Simple - the vectorizer expects empty latch blocks. So simply > never sink stuff into latch-blocks - I think the current code already > tries to avoid that for regular computations. Seems not the story. The sinkto basic block is not latch basic block at all. The problem is about if-conversion, code of the case in slp-cond-1.c is like: __attribute__((noinline, noclone)) void f4 (void) { int i; for (i = 0; i < N/2; ++i) { int d0 = d[2*i], e0 = e[2*i]; int d1 = d[2*i+1], e1 = e[2*i+1]; k[2*i] = a[2*i] >= b[2*i] ? d0 : e0; <------example k[2*i+1] = a[2*i+1] >= b[2*i+1] ? d1 : e1; } } It is strictly formed in conditional operations, just like the output of if-conversion pass before auto-vec. Now sink pass sinks load of d0/e0 into then/else branch, since they are partial dead code in the opposite branch, which corrupts the canonical form. Ideally, if-conversion pass should handle the sinked code and collapse if-then-else into conditional operations. But as showed by dump of if-conversion pass before auto-vect: <bb 3>: # i_39 = PHI <i_32(10), 0(2)> # ivtmp.137_41 = PHI <ivtmp.137_38(10), 16(2)> D.5150_5 = i_39 * 2; D.5151_10 = D.5150_5 + 1; D.5153_17 = a[D.5150_5]; D.5154_19 = b[D.5150_5]; if (D.5153_17 >= D.5154_19) goto <bb 4>; else goto <bb 5>; <bb 4>: d0_6 = d[D.5150_5]; goto <bb 6>; <bb 5>: e0_8 = e[D.5150_5]; <bb 6>: # d0_2 = PHI <d0_6(4), e0_8(5)> k[D.5150_5] = d0_2; It does not work as expected. So I think sink pass is fine for this case, it is the if-conversion pass need further investigation. Comments? Thanks. -- Best Regards. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Missed optimization in PRE? 2012-04-11 3:28 ` Bin.Cheng @ 2012-04-11 8:05 ` Bin.Cheng 2012-04-11 9:09 ` Richard Guenther 0 siblings, 1 reply; 15+ messages in thread From: Bin.Cheng @ 2012-04-11 8:05 UTC (permalink / raw) To: Richard Guenther; +Cc: gcc On Wed, Apr 11, 2012 at 11:28 AM, Bin.Cheng <amker.cheng@gmail.com> wrote: > On Mon, Apr 9, 2012 at 7:02 PM, Richard Guenther > <richard.guenther@gmail.com> wrote: >> On Mon, Apr 9, 2012 at 8:00 AM, Bin.Cheng <amker.cheng@gmail.com> wrote: >>> On Fri, Mar 30, 2012 at 5:43 PM, Bin.Cheng <amker.cheng@gmail.com> wrote: > >>> >>> Hi Richard, >>> I am testing a patch to sink load of memory to proper basic block. >>> Everything goes fine except auto-vectorization, sinking of load sometime >>> corrupts the canonical form of data references. I haven't touched auto-vec >>> before and cannot tell whether it's good or bad to do sink before auto-vec. >>> For example, the slp-cond-1.c >>> >>> <bb 3>: >>> # i_39 = PHI <i_32(11), 0(2)> >>> D.5150_5 = i_39 * 2; >>> D.5151_10 = D.5150_5 + 1; >>> D.5153_17 = a[D.5150_5]; >>> D.5154_19 = b[D.5150_5]; >>> if (D.5153_17 >= D.5154_19) >>> goto <bb 9>; >>> else >>> goto <bb 4>; >>> >>> <bb 9>: >>> d0_6 = d[D.5150_5]; <-----this is sunk from bb3 >>> goto <bb 5>; >>> >>> <bb 4>: >>> e0_8 = e[D.5150_5]; <-----this is sunk from bb3 >>> >>> <bb 5>: >>> # d0_2 = PHI <d0_6(9), e0_8(4)> >>> k[D.5150_5] = d0_2; >>> D.5159_26 = a[D.5151_10]; >>> D.5160_29 = b[D.5151_10]; >>> if (D.5159_26 >= D.5160_29) >>> goto <bb 10>; >>> else >>> goto <bb 6>; >>> >>> >>> <bb 10>: >>> d1_11 = d[D.5151_10]; <-----this is sunk from bb3 >>> goto <bb 7>; >>> >>> <bb 6>: >>> e1_14 = e[D.5151_10]; <-----this is sunk from bb3 >>> >>> <bb 7>: >>> ....... >>> >>> I will look into auto-vect but not sure how to handle this case. >>> >>> Any comments? Thanks very much. >> >> Simple - the vectorizer expects empty latch blocks. So simply >> never sink stuff into latch-blocks - I think the current code already >> tries to avoid that for regular computations. > > Seems not the story. The sinkto basic block is not latch basic block at all. > > The problem is about if-conversion, code of the case in slp-cond-1.c is like: > > __attribute__((noinline, noclone)) void > f4 (void) > { > int i; > for (i = 0; i < N/2; ++i) > { > int d0 = d[2*i], e0 = e[2*i]; > int d1 = d[2*i+1], e1 = e[2*i+1]; > k[2*i] = a[2*i] >= b[2*i] ? d0 : e0; <------example > k[2*i+1] = a[2*i+1] >= b[2*i+1] ? d1 : e1; > } > } > It is strictly formed in conditional operations, just like the output > of if-conversion pass before auto-vec. > > Now sink pass sinks load of d0/e0 into then/else branch, since they > are partial dead code in the opposite branch, which corrupts the > canonical form. > > Ideally, if-conversion pass should handle the sinked code and collapse > if-then-else into conditional operations. > But as showed by dump of if-conversion pass before auto-vect: > > <bb 3>: > # i_39 = PHI <i_32(10), 0(2)> > # ivtmp.137_41 = PHI <ivtmp.137_38(10), 16(2)> > D.5150_5 = i_39 * 2; > D.5151_10 = D.5150_5 + 1; > D.5153_17 = a[D.5150_5]; > D.5154_19 = b[D.5150_5]; > if (D.5153_17 >= D.5154_19) > goto <bb 4>; > else > goto <bb 5>; > > <bb 4>: > d0_6 = d[D.5150_5]; > goto <bb 6>; > > <bb 5>: > e0_8 = e[D.5150_5]; > > <bb 6>: > # d0_2 = PHI <d0_6(4), e0_8(5)> > k[D.5150_5] = d0_2; > > It does not work as expected. > > So I think sink pass is fine for this case, it is the if-conversion > pass need further investigation. > Comments? Thanks. Turns out if-conversion checks whether gimple statement traps or not. For the statement "d0_6 = d[D.5150_5];", it assumes rhs might trap, because it sees the index not INTEGER_CST. Two questions: 1, The check can be enhanced in gimple_could_trap_p_1 for ARRAY_REF. 2, Should I check this before sinking load from memory? If yes, then why sink of store does not do such check? Again, please correct if I missed something important. Thanks very much. -- Best Regards. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Missed optimization in PRE? 2012-04-11 8:05 ` Bin.Cheng @ 2012-04-11 9:09 ` Richard Guenther 2012-04-11 9:25 ` Bin.Cheng 0 siblings, 1 reply; 15+ messages in thread From: Richard Guenther @ 2012-04-11 9:09 UTC (permalink / raw) To: Bin.Cheng; +Cc: gcc On Wed, Apr 11, 2012 at 10:05 AM, Bin.Cheng <amker.cheng@gmail.com> wrote: > On Wed, Apr 11, 2012 at 11:28 AM, Bin.Cheng <amker.cheng@gmail.com> wrote: >> On Mon, Apr 9, 2012 at 7:02 PM, Richard Guenther >> <richard.guenther@gmail.com> wrote: >>> On Mon, Apr 9, 2012 at 8:00 AM, Bin.Cheng <amker.cheng@gmail.com> wrote: >>>> On Fri, Mar 30, 2012 at 5:43 PM, Bin.Cheng <amker.cheng@gmail.com> wrote: >> >>>> >>>> Hi Richard, >>>> I am testing a patch to sink load of memory to proper basic block. >>>> Everything goes fine except auto-vectorization, sinking of load sometime >>>> corrupts the canonical form of data references. I haven't touched auto-vec >>>> before and cannot tell whether it's good or bad to do sink before auto-vec. >>>> For example, the slp-cond-1.c >>>> >>>> <bb 3>: >>>> # i_39 = PHI <i_32(11), 0(2)> >>>> D.5150_5 = i_39 * 2; >>>> D.5151_10 = D.5150_5 + 1; >>>> D.5153_17 = a[D.5150_5]; >>>> D.5154_19 = b[D.5150_5]; >>>> if (D.5153_17 >= D.5154_19) >>>> goto <bb 9>; >>>> else >>>> goto <bb 4>; >>>> >>>> <bb 9>: >>>> d0_6 = d[D.5150_5]; <-----this is sunk from bb3 >>>> goto <bb 5>; >>>> >>>> <bb 4>: >>>> e0_8 = e[D.5150_5]; <-----this is sunk from bb3 >>>> >>>> <bb 5>: >>>> # d0_2 = PHI <d0_6(9), e0_8(4)> >>>> k[D.5150_5] = d0_2; >>>> D.5159_26 = a[D.5151_10]; >>>> D.5160_29 = b[D.5151_10]; >>>> if (D.5159_26 >= D.5160_29) >>>> goto <bb 10>; >>>> else >>>> goto <bb 6>; >>>> >>>> >>>> <bb 10>: >>>> d1_11 = d[D.5151_10]; <-----this is sunk from bb3 >>>> goto <bb 7>; >>>> >>>> <bb 6>: >>>> e1_14 = e[D.5151_10]; <-----this is sunk from bb3 >>>> >>>> <bb 7>: >>>> ....... >>>> >>>> I will look into auto-vect but not sure how to handle this case. >>>> >>>> Any comments? Thanks very much. >>> >>> Simple - the vectorizer expects empty latch blocks. So simply >>> never sink stuff into latch-blocks - I think the current code already >>> tries to avoid that for regular computations. >> >> Seems not the story. The sinkto basic block is not latch basic block at all. >> >> The problem is about if-conversion, code of the case in slp-cond-1.c is like: >> >> __attribute__((noinline, noclone)) void >> f4 (void) >> { >> int i; >> for (i = 0; i < N/2; ++i) >> { >> int d0 = d[2*i], e0 = e[2*i]; >> int d1 = d[2*i+1], e1 = e[2*i+1]; >> k[2*i] = a[2*i] >= b[2*i] ? d0 : e0; <------example >> k[2*i+1] = a[2*i+1] >= b[2*i+1] ? d1 : e1; >> } >> } >> It is strictly formed in conditional operations, just like the output >> of if-conversion pass before auto-vec. >> >> Now sink pass sinks load of d0/e0 into then/else branch, since they >> are partial dead code in the opposite branch, which corrupts the >> canonical form. >> >> Ideally, if-conversion pass should handle the sinked code and collapse >> if-then-else into conditional operations. >> But as showed by dump of if-conversion pass before auto-vect: >> >> <bb 3>: >> # i_39 = PHI <i_32(10), 0(2)> >> # ivtmp.137_41 = PHI <ivtmp.137_38(10), 16(2)> >> D.5150_5 = i_39 * 2; >> D.5151_10 = D.5150_5 + 1; >> D.5153_17 = a[D.5150_5]; >> D.5154_19 = b[D.5150_5]; >> if (D.5153_17 >= D.5154_19) >> goto <bb 4>; >> else >> goto <bb 5>; >> >> <bb 4>: >> d0_6 = d[D.5150_5]; >> goto <bb 6>; >> >> <bb 5>: >> e0_8 = e[D.5150_5]; >> >> <bb 6>: >> # d0_2 = PHI <d0_6(4), e0_8(5)> >> k[D.5150_5] = d0_2; >> >> It does not work as expected. >> >> So I think sink pass is fine for this case, it is the if-conversion >> pass need further investigation. >> Comments? Thanks. > > Turns out if-conversion checks whether gimple statement traps or not. > For the statement "d0_6 = d[D.5150_5];", it assumes rhs might trap, > because it sees the index not INTEGER_CST. Yes. After sinking the load is no longer executed unconditionally but if-conversion would make it so. This is information loss caused by sinking. > Two questions: > 1, The check can be enhanced in gimple_could_trap_p_1 for ARRAY_REF. No, the index may be out-of-bounds. > 2, Should I check this before sinking load from memory? If yes, then why sink of > store does not do such check? Sinking is never a problem - the code will only be executed less times. The issue with if-conversion is that it speculates the loads / stores, so they may not trap if they were originally not executed. So this is a pass ordering issue - sinking and if-conversion have different conflicting goals. Btw, you also make RTL if-conversion harder. I suppose you should try to avoid messing up if-conversion possibilities so early, thus, not sink in these cases. The same issue is present for non-loads that are possibly trapping. So I'm not even sure we can easily detect these cases - apart from never sinking possibly trapping stmts. At least you could say that the side-effect of trapping has to be preserved (note that we do not generally do that, which you might consider a bug). Richard. > Again, please correct if I missed something important. > Thanks very much. > > -- > Best Regards. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Missed optimization in PRE? 2012-04-11 9:09 ` Richard Guenther @ 2012-04-11 9:25 ` Bin.Cheng 2012-04-11 10:02 ` Richard Guenther 0 siblings, 1 reply; 15+ messages in thread From: Bin.Cheng @ 2012-04-11 9:25 UTC (permalink / raw) To: Richard Guenther; +Cc: gcc On Wed, Apr 11, 2012 at 5:09 PM, Richard Guenther <richard.guenther@gmail.com> wrote: > On Wed, Apr 11, 2012 at 10:05 AM, Bin.Cheng <amker.cheng@gmail.com> wrote: >> On Wed, Apr 11, 2012 at 11:28 AM, Bin.Cheng <amker.cheng@gmail.com> wrote: >> >> Turns out if-conversion checks whether gimple statement traps or not. >> For the statement "d0_6 = d[D.5150_5];", it assumes rhs might trap, >> because it sees the index not INTEGER_CST. > > Yes. After sinking the load is no longer executed unconditionally but > if-conversion would make it so. This is information loss caused by > sinking. > >> Two questions: >> 1, The check can be enhanced in gimple_could_trap_p_1 for ARRAY_REF. > > No, the index may be out-of-bounds. Apart from this topic. Actually I mean we can do more bound check rather than assume out-of-bounds if the index is not CST. For this example, the source code is like: #define N 32 int a[N], b[N]; int d[N], e[N]; int k[N]; __attribute__((noinline, noclone)) void f4 (void) { int i; for (i = 0; i < N/2; ++i) { int d0 = d[2*i], e0 = e[2*i]; int d1 = d[2*i+1], e1 = e[2*i+1]; k[2*i] = a[2*i] >= b[2*i] ? d0 : e0; k[2*i+1] = a[2*i+1] >= b[2*i+1] ? d1 : e1; } } The load "d[2*i+1]" is never out-of-bound, right? > >> 2, Should I check this before sinking load from memory? If yes, then why sink of >> store does not do such check? > > Sinking is never a problem - the code will only be executed less times. The > issue with if-conversion is that it speculates the loads / stores, so they may > not trap if they were originally not executed. > > So this is a pass ordering issue - sinking and if-conversion have different > conflicting goals. Btw, you also make RTL if-conversion harder. I suppose Yes, I have already found a case resulting in bad basic block ordering at RTL level, though not sure it RTL if-conversion related. > you should try to avoid messing up if-conversion possibilities so early, > thus, not sink in these cases. The same issue is present > for non-loads that are possibly trapping. So I'm not even sure we can easily > detect these cases - apart from never sinking possibly trapping stmts. > > At least you could say that the side-effect of trapping has to be preserved > (note that we do not generally do that, which you might consider a bug). Understood. I have already tested to not sink possibly trapping stmt, but not sure whether this is still wanted in GCC. Thanks. -- Best Regards. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Missed optimization in PRE? 2012-04-11 9:25 ` Bin.Cheng @ 2012-04-11 10:02 ` Richard Guenther 0 siblings, 0 replies; 15+ messages in thread From: Richard Guenther @ 2012-04-11 10:02 UTC (permalink / raw) To: Bin.Cheng; +Cc: gcc On Wed, Apr 11, 2012 at 11:25 AM, Bin.Cheng <amker.cheng@gmail.com> wrote: > On Wed, Apr 11, 2012 at 5:09 PM, Richard Guenther > <richard.guenther@gmail.com> wrote: >> On Wed, Apr 11, 2012 at 10:05 AM, Bin.Cheng <amker.cheng@gmail.com> wrote: >>> On Wed, Apr 11, 2012 at 11:28 AM, Bin.Cheng <amker.cheng@gmail.com> wrote: > >>> >>> Turns out if-conversion checks whether gimple statement traps or not. >>> For the statement "d0_6 = d[D.5150_5];", it assumes rhs might trap, >>> because it sees the index not INTEGER_CST. >> >> Yes. After sinking the load is no longer executed unconditionally but >> if-conversion would make it so. This is information loss caused by >> sinking. >> >>> Two questions: >>> 1, The check can be enhanced in gimple_could_trap_p_1 for ARRAY_REF. >> >> No, the index may be out-of-bounds. > > Apart from this topic. Actually I mean we can do more bound check > rather than assume > out-of-bounds if the index is not CST. > For this example, the source code is like: > > #define N 32 > int a[N], b[N]; > int d[N], e[N]; > int k[N]; > > __attribute__((noinline, noclone)) void > f4 (void) > { > int i; > for (i = 0; i < N/2; ++i) > { > int d0 = d[2*i], e0 = e[2*i]; > int d1 = d[2*i+1], e1 = e[2*i+1]; > k[2*i] = a[2*i] >= b[2*i] ? d0 : e0; > k[2*i+1] = a[2*i+1] >= b[2*i+1] ? d1 : e1; > } > } > > The load "d[2*i+1]" is never out-of-bound, right? Right. value-range analysis could set TREE_NO_TRAP on the memory reference. >> >>> 2, Should I check this before sinking load from memory? If yes, then why sink of >>> store does not do such check? >> >> Sinking is never a problem - the code will only be executed less times. The >> issue with if-conversion is that it speculates the loads / stores, so they may >> not trap if they were originally not executed. >> >> So this is a pass ordering issue - sinking and if-conversion have different >> conflicting goals. Btw, you also make RTL if-conversion harder. I suppose > Yes, I have already found a case resulting in bad basic block ordering > at RTL level, > though not sure it RTL if-conversion related. > >> you should try to avoid messing up if-conversion possibilities so early, >> thus, not sink in these cases. The same issue is present >> for non-loads that are possibly trapping. So I'm not even sure we can easily >> detect these cases - apart from never sinking possibly trapping stmts. >> >> At least you could say that the side-effect of trapping has to be preserved >> (note that we do not generally do that, which you might consider a bug). > > Understood. I have already tested to not sink possibly trapping stmt, but not > sure whether this is still wanted in GCC. In general the sinking is wanted. For non-trapping stmts it would be obvious I think. Richard. > Thanks. > > > -- > Best Regards. ^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2012-04-11 10:02 UTC | newest] Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2012-03-29 10:02 Missed optimization in PRE? Bin.Cheng 2012-03-29 10:07 ` Richard Guenther 2012-03-29 10:10 ` Bin.Cheng 2012-03-29 10:14 ` Richard Guenther 2012-03-29 10:22 ` Bin.Cheng 2012-03-29 15:25 ` Bin.Cheng 2012-03-30 8:16 ` Richard Guenther 2012-03-30 9:43 ` Bin.Cheng 2012-04-09 6:00 ` Bin.Cheng 2012-04-09 11:02 ` Richard Guenther 2012-04-11 3:28 ` Bin.Cheng 2012-04-11 8:05 ` Bin.Cheng 2012-04-11 9:09 ` Richard Guenther 2012-04-11 9:25 ` Bin.Cheng 2012-04-11 10:02 ` Richard Guenther
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).