* [PATCH, cprop] Check rtx_cost when propagating constant @ 2014-06-17 2:11 Zhenqiang Chen 2014-06-17 8:15 ` Richard Biener 0 siblings, 1 reply; 5+ messages in thread From: Zhenqiang Chen @ 2014-06-17 2:11 UTC (permalink / raw) To: gcc-patches; +Cc: Andrew Pinski Hi, For some large constant, ports like ARM, need one more instructions to operate it. e.g #define MASK 0xfe00ff void maskdata (int * data, int len) { int i = len; for (; i > 0; i -= 2) { data[i] &= MASK; data[i + 1] &= MASK; } } Need two instructions for each AND operation: and r3, r3, #16711935 bic r3, r3, #65536 If we keep the MASK in a register, loop2_invariant pass can hoist it out the loop. And it can be shared by different references. So the patch skips constant propagation if it makes INSN's cost higher. Bootstrap and no make check regression on X86-64 and ARM Chrome book. OK for trunk? Thanks! -Zhenqiang ChangeLog: 2014-06-17 Zhenqiang Chen <zhenqiang.chen@linaro.org> * cprop.c (try_replace_reg): Check cost for constants. diff --git a/gcc/cprop.c b/gcc/cprop.c index aef3ee8..c9cf02a 100644 --- a/gcc/cprop.c +++ b/gcc/cprop.c @@ -733,6 +733,14 @@ try_replace_reg (rtx from, rtx to, rtx insn) rtx src = 0; int success = 0; rtx set = single_set (insn); + int old_cost = 0; + bool copy_p = false; + bool speed = optimize_bb_for_speed_p (BLOCK_FOR_INSN (insn)); + + if (set && SET_SRC (set) && REG_P (SET_SRC (set))) + copy_p = true; + else + old_cost = set_rtx_cost (set, speed); /* Usually we substitute easy stuff, so we won't copy everything. We however need to take care to not duplicate non-trivial CONST @@ -740,6 +748,20 @@ try_replace_reg (rtx from, rtx to, rtx insn) to = copy_rtx (to); validate_replace_src_group (from, to, insn); + + /* For CONSTANT_P (TO), loop2_invariant pass might hoist it out the loop. + And it can be shared by different references. So skip propagation if + it makes INSN's rtx cost higher. */ + if (set && !copy_p && CONSTANT_P (to)) + { + int new_cost = set_rtx_cost (set, speed); + if (new_cost > old_cost) + { + cancel_changes (0); + return false; + } + } + if (num_changes_pending () && apply_change_group ()) success = 1; ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH, cprop] Check rtx_cost when propagating constant 2014-06-17 2:11 [PATCH, cprop] Check rtx_cost when propagating constant Zhenqiang Chen @ 2014-06-17 8:15 ` Richard Biener 2014-06-17 9:42 ` Zhenqiang Chen 0 siblings, 1 reply; 5+ messages in thread From: Richard Biener @ 2014-06-17 8:15 UTC (permalink / raw) To: Zhenqiang Chen; +Cc: gcc-patches, Andrew Pinski On Tue, Jun 17, 2014 at 4:11 AM, Zhenqiang Chen <zhenqiang.chen@linaro.org> wrote: > Hi, > > For some large constant, ports like ARM, need one more instructions to > operate it. e.g > > #define MASK 0xfe00ff > void maskdata (int * data, int len) > { > int i = len; > for (; i > 0; i -= 2) > { > data[i] &= MASK; > data[i + 1] &= MASK; > } > } > > Need two instructions for each AND operation: > > and r3, r3, #16711935 > bic r3, r3, #65536 > > If we keep the MASK in a register, loop2_invariant pass can hoist it > out the loop. And it can be shared by different references. > > So the patch skips constant propagation if it makes INSN's cost higher. So cprop undos invariant motions work here? Should we make sure we add a REG_EQUAL note when not propagating? > Bootstrap and no make check regression on X86-64 and ARM Chrome book. > > OK for trunk? > > Thanks! > -Zhenqiang > > ChangeLog: > 2014-06-17 Zhenqiang Chen <zhenqiang.chen@linaro.org> > > * cprop.c (try_replace_reg): Check cost for constants. > > diff --git a/gcc/cprop.c b/gcc/cprop.c > index aef3ee8..c9cf02a 100644 > --- a/gcc/cprop.c > +++ b/gcc/cprop.c > @@ -733,6 +733,14 @@ try_replace_reg (rtx from, rtx to, rtx insn) > rtx src = 0; > int success = 0; > rtx set = single_set (insn); > + int old_cost = 0; > + bool copy_p = false; > + bool speed = optimize_bb_for_speed_p (BLOCK_FOR_INSN (insn)); > + > + if (set && SET_SRC (set) && REG_P (SET_SRC (set))) > + copy_p = true; > + else > + old_cost = set_rtx_cost (set, speed); Looks bogus for set == NULL? Also what about register pressure? I think this kind of change needs wider testing as RTX costs are usually not fully implemented and you introduce a new use kind (or is it already used elsewhere in this way to compute cost difference of a set with s/reg/const?). What kind of performance difference do you see? Thanks, Richard. > /* Usually we substitute easy stuff, so we won't copy everything. > We however need to take care to not duplicate non-trivial CONST > @@ -740,6 +748,20 @@ try_replace_reg (rtx from, rtx to, rtx insn) > to = copy_rtx (to); > > validate_replace_src_group (from, to, insn); > + > + /* For CONSTANT_P (TO), loop2_invariant pass might hoist it out the loop. > + And it can be shared by different references. So skip propagation if > + it makes INSN's rtx cost higher. */ > + if (set && !copy_p && CONSTANT_P (to)) > + { > + int new_cost = set_rtx_cost (set, speed); > + if (new_cost > old_cost) > + { > + cancel_changes (0); > + return false; > + } > + } > + > if (num_changes_pending () && apply_change_group ()) > success = 1; ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH, cprop] Check rtx_cost when propagating constant 2014-06-17 8:15 ` Richard Biener @ 2014-06-17 9:42 ` Zhenqiang Chen 2014-06-19 9:45 ` Zhenqiang Chen 0 siblings, 1 reply; 5+ messages in thread From: Zhenqiang Chen @ 2014-06-17 9:42 UTC (permalink / raw) To: Richard Biener; +Cc: gcc-patches, Andrew Pinski On 17 June 2014 16:15, Richard Biener <richard.guenther@gmail.com> wrote: > On Tue, Jun 17, 2014 at 4:11 AM, Zhenqiang Chen > <zhenqiang.chen@linaro.org> wrote: >> Hi, >> >> For some large constant, ports like ARM, need one more instructions to >> operate it. e.g >> >> #define MASK 0xfe00ff >> void maskdata (int * data, int len) >> { >> int i = len; >> for (; i > 0; i -= 2) >> { >> data[i] &= MASK; >> data[i + 1] &= MASK; >> } >> } >> >> Need two instructions for each AND operation: >> >> and r3, r3, #16711935 >> bic r3, r3, #65536 >> >> If we keep the MASK in a register, loop2_invariant pass can hoist it >> out the loop. And it can be shared by different references. >> >> So the patch skips constant propagation if it makes INSN's cost higher. > > So cprop undos invariant motions work here? Yes. GLOBAL CONST-PROP will undo invariant motions. > Should we make sure we add a REG_EQUAL note when not propagating? Logs show there already has REG_EQUAL note. >> Bootstrap and no make check regression on X86-64 and ARM Chrome book. >> >> OK for trunk? >> >> Thanks! >> -Zhenqiang >> >> ChangeLog: >> 2014-06-17 Zhenqiang Chen <zhenqiang.chen@linaro.org> >> >> * cprop.c (try_replace_reg): Check cost for constants. >> >> diff --git a/gcc/cprop.c b/gcc/cprop.c >> index aef3ee8..c9cf02a 100644 >> --- a/gcc/cprop.c >> +++ b/gcc/cprop.c >> @@ -733,6 +733,14 @@ try_replace_reg (rtx from, rtx to, rtx insn) >> rtx src = 0; >> int success = 0; >> rtx set = single_set (insn); >> + int old_cost = 0; >> + bool copy_p = false; >> + bool speed = optimize_bb_for_speed_p (BLOCK_FOR_INSN (insn)); >> + >> + if (set && SET_SRC (set) && REG_P (SET_SRC (set))) >> + copy_p = true; >> + else >> + old_cost = set_rtx_cost (set, speed); > > Looks bogus for set == NULL? set_rtx_cost has checked it. If it is NULL, the function will return 0; > Also what about register pressure? Do you think it has big register pressure impact? I think it does not increase register pressure. > I think this kind of change needs wider testing as RTX costs are > usually not fully implemented and you introduce a new use kind > (or is it already used elsewhere in this way to compute cost > difference of a set with s/reg/const?). Passes like fwprop, cse, auto_inc_dec, uses RTX costs to make the decision. e.g. in function attempt_change of auto-inc-dec.c, it has code segments like: old_cost = (set_src_cost (mem, speed) + set_rtx_cost (PATTERN (inc_insn.insn), speed)); new_cost = set_src_cost (mem_tmp, speed); ... if (old_cost < new_cost) { ... return false; } The usage of RTX costs in this patch is similar. I had run X86-64 bootstrap and regression tests with --enable-languages=c,c++,lto,fortran,go,ada,objc,obj-c++,java And ARM bootstrap and regression tests with --enable-languages=c,c++,fortran,lto,objc,obj-c++ I will run tests on i686. What other tests do you think I have to run? > What kind of performance difference do you see? I had run coremark, dhrystone, eembc on ARM Cortex-M4 (with some arm backend changes). Coremark with some options show >10% performance improvement. dhrystone is a little better. Some wave in eembc, but overall result is better. I will run spec2000 on X86-64 and ARM, and back to you about the performance changes. Thanks! -Zhenqiang > Thanks, > Richard. > >> /* Usually we substitute easy stuff, so we won't copy everything. >> We however need to take care to not duplicate non-trivial CONST >> @@ -740,6 +748,20 @@ try_replace_reg (rtx from, rtx to, rtx insn) >> to = copy_rtx (to); >> >> validate_replace_src_group (from, to, insn); >> + >> + /* For CONSTANT_P (TO), loop2_invariant pass might hoist it out the loop. >> + And it can be shared by different references. So skip propagation if >> + it makes INSN's rtx cost higher. */ >> + if (set && !copy_p && CONSTANT_P (to)) >> + { >> + int new_cost = set_rtx_cost (set, speed); >> + if (new_cost > old_cost) >> + { >> + cancel_changes (0); >> + return false; >> + } >> + } >> + >> if (num_changes_pending () && apply_change_group ()) >> success = 1; ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH, cprop] Check rtx_cost when propagating constant 2014-06-17 9:42 ` Zhenqiang Chen @ 2014-06-19 9:45 ` Zhenqiang Chen 2014-07-18 5:05 ` Jeff Law 0 siblings, 1 reply; 5+ messages in thread From: Zhenqiang Chen @ 2014-06-19 9:45 UTC (permalink / raw) To: Richard Biener; +Cc: gcc-patches, Andrew Pinski On 17 June 2014 17:42, Zhenqiang Chen <zhenqiang.chen@linaro.org> wrote: > On 17 June 2014 16:15, Richard Biener <richard.guenther@gmail.com> wrote: >> On Tue, Jun 17, 2014 at 4:11 AM, Zhenqiang Chen >> <zhenqiang.chen@linaro.org> wrote: >>> Hi, >>> >>> For some large constant, ports like ARM, need one more instructions to >>> operate it. e.g >>> >>> #define MASK 0xfe00ff >>> void maskdata (int * data, int len) >>> { >>> int i = len; >>> for (; i > 0; i -= 2) >>> { >>> data[i] &= MASK; >>> data[i + 1] &= MASK; >>> } >>> } >>> >>> Need two instructions for each AND operation: >>> >>> and r3, r3, #16711935 >>> bic r3, r3, #65536 >>> >>> If we keep the MASK in a register, loop2_invariant pass can hoist it >>> out the loop. And it can be shared by different references. >>> >>> So the patch skips constant propagation if it makes INSN's cost higher. >> >> So cprop undos invariant motions work here? > > Yes. GLOBAL CONST-PROP will undo invariant motions. > >> Should we make sure we add a REG_EQUAL note when not propagating? > > Logs show there already has REG_EQUAL note. > >>> Bootstrap and no make check regression on X86-64 and ARM Chrome book. >>> >>> OK for trunk? >>> >>> Thanks! >>> -Zhenqiang >>> >>> ChangeLog: >>> 2014-06-17 Zhenqiang Chen <zhenqiang.chen@linaro.org> >>> >>> * cprop.c (try_replace_reg): Check cost for constants. >>> >>> diff --git a/gcc/cprop.c b/gcc/cprop.c >>> index aef3ee8..c9cf02a 100644 >>> --- a/gcc/cprop.c >>> +++ b/gcc/cprop.c >>> @@ -733,6 +733,14 @@ try_replace_reg (rtx from, rtx to, rtx insn) >>> rtx src = 0; >>> int success = 0; >>> rtx set = single_set (insn); >>> + int old_cost = 0; >>> + bool copy_p = false; >>> + bool speed = optimize_bb_for_speed_p (BLOCK_FOR_INSN (insn)); >>> + >>> + if (set && SET_SRC (set) && REG_P (SET_SRC (set))) >>> + copy_p = true; >>> + else >>> + old_cost = set_rtx_cost (set, speed); >> >> Looks bogus for set == NULL? > > set_rtx_cost has checked it. If it is NULL, the function will return 0; > >> Also what about register pressure? > > Do you think it has big register pressure impact? I think it does not > increase register pressure. > >> I think this kind of change needs wider testing as RTX costs are >> usually not fully implemented and you introduce a new use kind >> (or is it already used elsewhere in this way to compute cost >> difference of a set with s/reg/const?). > > Passes like fwprop, cse, auto_inc_dec, uses RTX costs to make the > decision. e.g. in function attempt_change of auto-inc-dec.c, it has > code segments like: > > old_cost = (set_src_cost (mem, speed) > + set_rtx_cost (PATTERN (inc_insn.insn), speed)); > new_cost = set_src_cost (mem_tmp, speed); > ... > if (old_cost < new_cost) > { > ... > return false; > } > > The usage of RTX costs in this patch is similar. > > I had run X86-64 bootstrap and regression tests with > --enable-languages=c,c++,lto,fortran,go,ada,objc,obj-c++,java > > And ARM bootstrap and regression tests with > --enable-languages=c,c++,fortran,lto,objc,obj-c++ > > I will run tests on i686. What other tests do you think I have to run? > >> What kind of performance difference do you see? > > I had run coremark, dhrystone, eembc on ARM Cortex-M4 (with some arm > backend changes). Coremark with some options show >10% performance > improvement. dhrystone is a little better. Some wave in eembc, but > overall result is better. > > I will run spec2000 on X86-64 and ARM, and back to you about the > performance changes. Please ignore my previous comments about Cortex-M4 performance since it does not base on clean codes. Here is a summary for performance result on X86-64 and ARM. For X86-64, I run SPEC2000 INT and FP (-O3). There is no improvement or regression. As tests, I moved the code segment to end of function try_replace_reg and check insns which meet "success && new_cost > old_cost". Logs show only 52 occurrences for all SPEC2000 build and the only one instruction pattern: *adddi_1 is impacted. For *adddi_1, rtx_cost increases from 8 to 10 when changing a register operand to a constant. For ARM Cortex-M4, minimal changes for Coremark, Dhrystone and EEMBC. For ARM Chrome book (Cortex-A15), some wave in SPEC2000 INT test. But the final result does not show improvement or regression. The patch is updated to remove the "bogus" code and keep more constants. Bootstrap and no make check regression on X86-64, i686 and ARM. diff --git a/gcc/cprop.c b/gcc/cprop.c index aef3ee8..6ea6be0 100644 --- a/gcc/cprop.c +++ b/gcc/cprop.c @@ -733,6 +733,28 @@ try_replace_reg (rtx from, rtx to, rtx insn) rtx src = 0; int success = 0; rtx set = single_set (insn); + int old_cost = 0; + bool const_p = false; + bool speed = optimize_bb_for_speed_p (BLOCK_FOR_INSN (insn)); + + if (set && SET_SRC (set)) + { + rtx src = SET_SRC (set); + if (REG_P (src) || GET_CODE (src) == SUBREG) + const_p = true; + else + { + if (note != 0 + && REG_NOTE_KIND (note) == REG_EQUAL + && (GET_CODE (XEXP (note, 0)) == CONST + || CONSTANT_P (XEXP (note, 0)))) + { + const_p = true; + } + else + old_cost = set_rtx_cost (set, speed); + } + } /* Usually we substitute easy stuff, so we won't copy everything. We however need to take care to not duplicate non-trivial CONST @@ -740,6 +762,20 @@ try_replace_reg (rtx from, rtx to, rtx insn) to = copy_rtx (to); validate_replace_src_group (from, to, insn); + + /* For CONSTANT_P (TO), loop2_invariant pass might hoist it out the loop. + And it can be shared by different references. So skip propagation if + it makes INSN's rtx cost higher. */ + if (set && SET_SRC (set) && !const_p && CONSTANT_P (to)) + { + if (!CONSTANT_P (SET_SRC (set)) + && (set_rtx_cost (set, speed) > old_cost)) + { + cancel_changes (0); + return false; + } + } + if (num_changes_pending () && apply_change_group ()) success = 1; ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH, cprop] Check rtx_cost when propagating constant 2014-06-19 9:45 ` Zhenqiang Chen @ 2014-07-18 5:05 ` Jeff Law 0 siblings, 0 replies; 5+ messages in thread From: Jeff Law @ 2014-07-18 5:05 UTC (permalink / raw) To: Zhenqiang Chen, Richard Biener; +Cc: gcc-patches, Andrew Pinski On 06/19/14 03:44, Zhenqiang Chen wrote: >>>> >>>> ChangeLog: >>>> 2014-06-17 Zhenqiang Chen <zhenqiang.chen@linaro.org> >>>> >>>> * cprop.c (try_replace_reg): Check cost for constants. >>>> >>>> diff --git a/gcc/cprop.c b/gcc/cprop.c >>>> index aef3ee8..c9cf02a 100644 >>>> --- a/gcc/cprop.c >>>> +++ b/gcc/cprop.c >>>> @@ -733,6 +733,14 @@ try_replace_reg (rtx from, rtx to, rtx insn) >>>> rtx src = 0; >>>> int success = 0; >>>> rtx set = single_set (insn); >>>> + int old_cost = 0; >>>> + bool copy_p = false; >>>> + bool speed = optimize_bb_for_speed_p (BLOCK_FOR_INSN (insn)); >>>> + >>>> + if (set && SET_SRC (set) && REG_P (SET_SRC (set))) >>>> + copy_p = true; >>>> + else >>>> + old_cost = set_rtx_cost (set, speed); >>> >>> Looks bogus for set == NULL? >> >> set_rtx_cost has checked it. If it is NULL, the function will return 0; >> >>> Also what about register pressure? >> >> Do you think it has big register pressure impact? I think it does not >> increase register pressure. I would expect a small impact on register pressure. In general anytime we avoid propagating a constant to its uses and instead hold the value in a register register pressure will increase. > > Here is a summary for performance result on X86-64 and ARM. > > For X86-64, I run SPEC2000 INT and FP (-O3). There is no improvement > or regression. As tests, I moved the code segment to end of function > try_replace_reg and check insns which meet "success && new_cost > > old_cost". Logs show only 52 occurrences for all SPEC2000 build and > the only one instruction pattern: *adddi_1 is impacted. For *adddi_1, > rtx_cost increases from 8 to 10 when changing a register operand to a > constant. > > For ARM Cortex-M4, minimal changes for Coremark, Dhrystone and EEMBC. > For ARM Chrome book (Cortex-A15), some wave in SPEC2000 INT test. But > the final result does not show improvement or regression. > > The patch is updated to remove the "bogus" code and keep more constants. > > Bootstrap and no make check regression on X86-64, i686 and ARM. So with no notable improvements, do you still want this patch to go forward? You certainly need a testcase. It's fine if it's ARM specific, though obviously you get wider testing if you've got an x86_64 test. > > diff --git a/gcc/cprop.c b/gcc/cprop.c > index aef3ee8..6ea6be0 100644 > --- a/gcc/cprop.c > +++ b/gcc/cprop.c > @@ -733,6 +733,28 @@ try_replace_reg (rtx from, rtx to, rtx insn) > rtx src = 0; > int success = 0; > rtx set = single_set (insn); > + int old_cost = 0; > + bool const_p = false; > + bool speed = optimize_bb_for_speed_p (BLOCK_FOR_INSN (insn)); > + > + if (set && SET_SRC (set)) > + { > + rtx src = SET_SRC (set); > + if (REG_P (src) || GET_CODE (src) == SUBREG) > + const_p = true; > + else > + { > + if (note != 0 > + && REG_NOTE_KIND (note) == REG_EQUAL > + && (GET_CODE (XEXP (note, 0)) == CONST > + || CONSTANT_P (XEXP (note, 0)))) > + { > + const_p = true; > + } > + else > + old_cost = set_rtx_cost (set, speed); > + } > + } Can you come up with a better name than "const_p". I see that and my first though is "that variable indicates if some object is a constant". But that's not how its used here. > > /* Usually we substitute easy stuff, so we won't copy everything. > We however need to take care to not duplicate non-trivial CONST > @@ -740,6 +762,20 @@ try_replace_reg (rtx from, rtx to, rtx insn) > to = copy_rtx (to); > > validate_replace_src_group (from, to, insn); > + > + /* For CONSTANT_P (TO), loop2_invariant pass might hoist it out the loop. > + And it can be shared by different references. So skip propagation if > + it makes INSN's rtx cost higher. */ > + if (set && SET_SRC (set) && !const_p && CONSTANT_P (to)) > + { > + if (!CONSTANT_P (SET_SRC (set)) > + && (set_rtx_cost (set, speed) > old_cost)) > + { > + cancel_changes (0); > + return false; > + } > + } > + > if (num_changes_pending () && apply_change_group ()) > success = 1; I guess this needs to be after replacement so that you can easily compute the new cost. Another case where our costing API is lame :( I'm not going to have you fix that. Why the !CONSTANT_P check? Why not just let the rtx costing alone determine if we want to avoid this propagation? jeff ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2014-07-18 5:05 UTC | newest] Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2014-06-17 2:11 [PATCH, cprop] Check rtx_cost when propagating constant Zhenqiang Chen 2014-06-17 8:15 ` Richard Biener 2014-06-17 9:42 ` Zhenqiang Chen 2014-06-19 9:45 ` Zhenqiang Chen 2014-07-18 5:05 ` Jeff Law
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).