* [PATCH] rs6000: Don't split constant oprator when add, move to temp register for future optimization @ 2020-03-26 10:06 luoxhu 2020-03-26 15:18 ` will schmidt 2020-03-27 14:33 ` Segher Boessenkool 0 siblings, 2 replies; 11+ messages in thread From: luoxhu @ 2020-03-26 10:06 UTC (permalink / raw) To: gcc-patches; +Cc: segher, wschmidt, guojiufu, linkw, Xionghu Luo From: Xionghu Luo <luoxhu@linux.ibm.com> Remove split code from add<mode>3 to allow a later pass to split. This allows later logic to hoist out constant load in add instructions. In loop, lis+ori could be hoisted out to improve performance compared with previous addis+addi (About 15% on typical case), weak point is one more register is used and one more instruction is generated. i.e.: addis 3,3,0x8765 addi 3,3,0x4321 => lis 9,0x8765 ori 9,9,0x4321 add 3,3,9 gcc/ChangeLog: 2020-03-26 Xiong Hu Luo <luoxhu@linux.ibm.com> * config/rs6000/rs6000.md (add<mode>3): Remove split code, move constant to temp register before add. gcc/testsuite/ChangeLog: 2020-03-26 Xiong Hu Luo <luoxhu@linux.ibm.com> * gcc.target/powerpc/add-const.c: New. --- gcc/config/rs6000/rs6000.md | 25 ++------------------ gcc/testsuite/gcc.target/powerpc/add-const.c | 18 ++++++++++++++ 2 files changed, 20 insertions(+), 23 deletions(-) create mode 100644 gcc/testsuite/gcc.target/powerpc/add-const.c diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md index ad88b6783af..72f3f604e0d 100644 --- a/gcc/config/rs6000/rs6000.md +++ b/gcc/config/rs6000/rs6000.md @@ -1733,29 +1733,8 @@ (define_expand "add<mode>3" || rtx_equal_p (operands[0], operands[1])) ? operands[0] : gen_reg_rtx (<MODE>mode)); - /* Adding a constant to r0 is not a valid insn, so use a different - strategy in that case. */ - if (reg_or_subregno (operands[1]) == 0 || reg_or_subregno (tmp) == 0) - { - if (operands[0] == operands[1]) - FAIL; - rs6000_emit_move (operands[0], operands[2], <MODE>mode); - emit_insn (gen_add<mode>3 (operands[0], operands[1], operands[0])); - DONE; - } - - HOST_WIDE_INT val = INTVAL (operands[2]); - HOST_WIDE_INT low = ((val & 0xffff) ^ 0x8000) - 0x8000; - HOST_WIDE_INT rest = trunc_int_for_mode (val - low, <MODE>mode); - - if (<MODE>mode == DImode && !satisfies_constraint_L (GEN_INT (rest))) - FAIL; - - /* The ordering here is important for the prolog expander. - When space is allocated from the stack, adding 'low' first may - produce a temporary deallocation (which would be bad). */ - emit_insn (gen_add<mode>3 (tmp, operands[1], GEN_INT (rest))); - emit_insn (gen_add<mode>3 (operands[0], tmp, GEN_INT (low))); + rs6000_emit_move (tmp, operands[2], <MODE>mode); + emit_insn (gen_add<mode>3 (operands[0], operands[1], tmp)); DONE; } }) diff --git a/gcc/testsuite/gcc.target/powerpc/add-const.c b/gcc/testsuite/gcc.target/powerpc/add-const.c new file mode 100644 index 00000000000..e1007247b32 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/add-const.c @@ -0,0 +1,18 @@ +/* { dg-do compile { target { lp64 } } } */ +/* { dg-options "-O3 -fno-unroll-loops" } */ + +/* Ensure the lis,ori are generated, which indicates they have + been hoisted outside of the loop. */ + +typedef unsigned long ulong; +ulong +foo (ulong n, ulong h) +{ + int i; + for (i = 0; i < n; i++) + h = ((h + 8) | h) + 0x87654321; + return h; +} + +/* { dg-final { scan-assembler-times {\mlis\M} 1 } } */ +/* { dg-final { scan-assembler-times {\mori\M} 1 } } */ -- 2.21.0.777.g83232e3864 ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] rs6000: Don't split constant oprator when add, move to temp register for future optimization 2020-03-26 10:06 [PATCH] rs6000: Don't split constant oprator when add, move to temp register for future optimization luoxhu @ 2020-03-26 15:18 ` will schmidt 2020-03-27 14:33 ` Segher Boessenkool 1 sibling, 0 replies; 11+ messages in thread From: will schmidt @ 2020-03-26 15:18 UTC (permalink / raw) To: luoxhu, gcc-patches; +Cc: wschmidt, linkw, segher On Thu, 2020-03-26 at 05:06 -0500, luoxhu--- via Gcc-patches wrote: > From: Xionghu Luo <luoxhu@linux.ibm.com> > > Remove split code from add<mode>3 to allow a later pass to split. > This allows later logic to hoist out constant load in add > instructions. > In loop, lis+ori could be hoisted out to improve performance compared > with > previous addis+addi (About 15% on typical case), weak point is > one more register is used and one more instruction is > generated. i.e.: > > addis 3,3,0x8765 > addi 3,3,0x4321 > > => > > lis 9,0x8765 > ori 9,9,0x4321 > add 3,3,9 LGTM. :-) I defer to Segher for his review & approval, etc. Thanks, -Will > > gcc/ChangeLog: > > 2020-03-26 Xiong Hu Luo <luoxhu@linux.ibm.com> > > * config/rs6000/rs6000.md (add<mode>3): Remove split code, move constant > to temp register before add. > > gcc/testsuite/ChangeLog: > > 2020-03-26 Xiong Hu Luo <luoxhu@linux.ibm.com> > > * gcc.target/powerpc/add-const.c: New. > --- > gcc/config/rs6000/rs6000.md | 25 ++-------------- > ---- > gcc/testsuite/gcc.target/powerpc/add-const.c | 18 ++++++++++++++ > 2 files changed, 20 insertions(+), 23 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/powerpc/add-const.c > > diff --git a/gcc/config/rs6000/rs6000.md > b/gcc/config/rs6000/rs6000.md > index ad88b6783af..72f3f604e0d 100644 > --- a/gcc/config/rs6000/rs6000.md > +++ b/gcc/config/rs6000/rs6000.md > @@ -1733,29 +1733,8 @@ (define_expand "add<mode>3" > || rtx_equal_p (operands[0], operands[1])) > ? operands[0] : gen_reg_rtx (<MODE>mode)); > > - /* Adding a constant to r0 is not a valid insn, so use a > different > - strategy in that case. */ > - if (reg_or_subregno (operands[1]) == 0 || reg_or_subregno > (tmp) == 0) > - { > - if (operands[0] == operands[1]) > - FAIL; > - rs6000_emit_move (operands[0], operands[2], <MODE>mode); > - emit_insn (gen_add<mode>3 (operands[0], operands[1], > operands[0])); > - DONE; > - } > - > - HOST_WIDE_INT val = INTVAL (operands[2]); > - HOST_WIDE_INT low = ((val & 0xffff) ^ 0x8000) - 0x8000; > - HOST_WIDE_INT rest = trunc_int_for_mode (val - low, > <MODE>mode); > - > - if (<MODE>mode == DImode && !satisfies_constraint_L (GEN_INT > (rest))) > - FAIL; > - > - /* The ordering here is important for the prolog expander. > - When space is allocated from the stack, adding 'low' first may > - produce a temporary deallocation (which would be bad). */ > - emit_insn (gen_add<mode>3 (tmp, operands[1], GEN_INT (rest))); > - emit_insn (gen_add<mode>3 (operands[0], tmp, GEN_INT (low))); > + rs6000_emit_move (tmp, operands[2], <MODE>mode); > + emit_insn (gen_add<mode>3 (operands[0], operands[1], tmp)); > DONE; > } > }) > diff --git a/gcc/testsuite/gcc.target/powerpc/add-const.c > b/gcc/testsuite/gcc.target/powerpc/add-const.c > new file mode 100644 > index 00000000000..e1007247b32 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/powerpc/add-const.c > @@ -0,0 +1,18 @@ > +/* { dg-do compile { target { lp64 } } } */ > +/* { dg-options "-O3 -fno-unroll-loops" } */ > + > +/* Ensure the lis,ori are generated, which indicates they have > + been hoisted outside of the loop. */ > + > +typedef unsigned long ulong; > +ulong > +foo (ulong n, ulong h) > +{ > + int i; > + for (i = 0; i < n; i++) > + h = ((h + 8) | h) + 0x87654321; > + return h; > +} > + > +/* { dg-final { scan-assembler-times {\mlis\M} 1 } } */ > +/* { dg-final { scan-assembler-times {\mori\M} 1 } } */ ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] rs6000: Don't split constant oprator when add, move to temp register for future optimization 2020-03-26 10:06 [PATCH] rs6000: Don't split constant oprator when add, move to temp register for future optimization luoxhu 2020-03-26 15:18 ` will schmidt @ 2020-03-27 14:33 ` Segher Boessenkool 2020-03-30 3:59 ` luoxhu 1 sibling, 1 reply; 11+ messages in thread From: Segher Boessenkool @ 2020-03-27 14:33 UTC (permalink / raw) To: luoxhu; +Cc: gcc-patches, wschmidt, guojiufu, linkw Hi! On Thu, Mar 26, 2020 at 05:06:43AM -0500, luoxhu@linux.ibm.com wrote: > Remove split code from add<mode>3 to allow a later pass to split. > This allows later logic to hoist out constant load in add instructions. > In loop, lis+ori could be hoisted out to improve performance compared with > previous addis+addi (About 15% on typical case), weak point is > one more register is used and one more instruction is generated. i.e.: > > addis 3,3,0x8765 > addi 3,3,0x4321 > > => > > lis 9,0x8765 > ori 9,9,0x4321 > add 3,3,9 What does it do overall? Say, to SPEC. What does it do to execution time, and what does it do to binary size? Do we want something later in the RTL pipeline to make "addi"s etc. again? > 2020-03-26 Xiong Hu Luo <luoxhu@linux.ibm.com> > > * config/rs6000/rs6000.md (add<mode>3): Remove split code, move constant > to temp register before add. This should not be indented, so just: * config/rs6000/rs6000.md (add<mode>3): Remove split code, move constant to temp register before add. We have six separate add<mode>3 patterns (three of those are in rs6000.md, too). You can write something like (add<mode>3 for SDI): to show which iterator (or mode) this one is for. This is helpful with any <mode> or <code> or the like, even if there (currently) is only one pattern you could mean. > diff --git a/gcc/testsuite/gcc.target/powerpc/add-const.c b/gcc/testsuite/gcc.target/powerpc/add-const.c > new file mode 100644 > index 00000000000..e1007247b32 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/powerpc/add-const.c > @@ -0,0 +1,18 @@ > +/* { dg-do compile { target { lp64 } } } */ > +/* { dg-options "-O3 -fno-unroll-loops" } */ > + > +/* Ensure the lis,ori are generated, which indicates they have > + been hoisted outside of the loop. */ This sounds a bit fragile -- but we'll see :-) It is small code, so it probably will be fine, it won't change its generated code often. Segher ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] rs6000: Don't split constant oprator when add, move to temp register for future optimization 2020-03-27 14:33 ` Segher Boessenkool @ 2020-03-30 3:59 ` luoxhu 2020-04-02 22:16 ` Segher Boessenkool 0 siblings, 1 reply; 11+ messages in thread From: luoxhu @ 2020-03-30 3:59 UTC (permalink / raw) To: Segher Boessenkool; +Cc: gcc-patches, wschmidt, guojiufu, linkw On 2020/3/27 22:33, Segher Boessenkool wrote: > Hi! > > On Thu, Mar 26, 2020 at 05:06:43AM -0500, luoxhu@linux.ibm.com wrote: >> Remove split code from add<mode>3 to allow a later pass to split. >> This allows later logic to hoist out constant load in add instructions. >> In loop, lis+ori could be hoisted out to improve performance compared with >> previous addis+addi (About 15% on typical case), weak point is >> one more register is used and one more instruction is generated. i.e.: >> >> addis 3,3,0x8765 >> addi 3,3,0x4321 >> >> => >> >> lis 9,0x8765 >> ori 9,9,0x4321 >> add 3,3,9 > > What does it do overall? Say, to SPEC. What does it do to execution > time, and what does it do to binary size? > > Do we want something later in the RTL pipeline to make "addi"s etc. again? > >> 2020-03-26 Xiong Hu Luo <luoxhu@linux.ibm.com> >> >> * config/rs6000/rs6000.md (add<mode>3): Remove split code, move constant >> to temp register before add. > > This should not be indented, so just: > * config/rs6000/rs6000.md (add<mode>3): Remove split code, move constant > to temp register before add. > > We have six separate add<mode>3 patterns (three of those are in rs6000.md, > too). You can write something like > (add<mode>3 for SDI): > to show which iterator (or mode) this one is for. This is helpful with > any <mode> or <code> or the like, even if there (currently) is only one > pattern you could mean. > Thanks, it is necessary to re-enable split add as some later RTL passes like final will still need generate addis+addi (case: g++.dg/opt/thunk1.C ). Update the patch as below: [PATCH] rs6000: Don't split constant operator add before reload, move to temp register for future optimization Don't split code from add<mode>3 for SDI to allow a later pass to split. This allows later logic to hoist out constant load in add instructions. In loop, lis+ori could be hoisted out to improve performance compared with previous addis+addi (About 15% on typical case), weak point is one more register is used and one more instruction is generated. i.e.: addis 3,3,0x8765 addi 3,3,0x4321 => lis 9,0x8765 ori 9,9,0x4321 add 3,3,9 No obvious performance and binary size change to SPEC2017. gcc/ChangeLog: 2020-03-30 Xiong Hu Luo <luoxhu@linux.ibm.com> * config/rs6000/rs6000.md (add<mode>3 for SDI): Don't split before reload, move constant to temp register for add. gcc/testsuite/ChangeLog: 2020-03-26 Xiong Hu Luo <luoxhu@linux.ibm.com> * gcc.target/powerpc/add-const.c: New. --- gcc/config/rs6000/rs6000.md | 51 +++++++++----------- gcc/testsuite/gcc.target/powerpc/add-const.c | 18 +++++++ 2 files changed, 42 insertions(+), 27 deletions(-) create mode 100644 gcc/testsuite/gcc.target/powerpc/add-const.c diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md index ad88b6783af..76af3d5b1d9 100644 --- a/gcc/config/rs6000/rs6000.md +++ b/gcc/config/rs6000/rs6000.md @@ -1729,34 +1729,31 @@ (define_expand "add<mode>3" if (CONST_INT_P (operands[2]) && !add_operand (operands[2], <MODE>mode)) { - rtx tmp = ((!can_create_pseudo_p () - || rtx_equal_p (operands[0], operands[1])) - ? operands[0] : gen_reg_rtx (<MODE>mode)); - - /* Adding a constant to r0 is not a valid insn, so use a different - strategy in that case. */ - if (reg_or_subregno (operands[1]) == 0 || reg_or_subregno (tmp) == 0) - { - if (operands[0] == operands[1]) - FAIL; - rs6000_emit_move (operands[0], operands[2], <MODE>mode); - emit_insn (gen_add<mode>3 (operands[0], operands[1], operands[0])); - DONE; - } - - HOST_WIDE_INT val = INTVAL (operands[2]); - HOST_WIDE_INT low = ((val & 0xffff) ^ 0x8000) - 0x8000; - HOST_WIDE_INT rest = trunc_int_for_mode (val - low, <MODE>mode); - - if (<MODE>mode == DImode && !satisfies_constraint_L (GEN_INT (rest))) - FAIL; + if (can_create_pseudo_p ()) + { + rtx tmp = rtx_equal_p (operands[0], operands[1]) + ? operands[0] : gen_reg_rtx (<MODE>mode); - /* The ordering here is important for the prolog expander. - When space is allocated from the stack, adding 'low' first may - produce a temporary deallocation (which would be bad). */ - emit_insn (gen_add<mode>3 (tmp, operands[1], GEN_INT (rest))); - emit_insn (gen_add<mode>3 (operands[0], tmp, GEN_INT (low))); - DONE; + rs6000_emit_move (tmp, operands[2], <MODE>mode); + emit_insn (gen_add<mode>3 (operands[0], operands[1], tmp)); + DONE; + } + else + { + HOST_WIDE_INT val = INTVAL (operands[2]); + HOST_WIDE_INT low = ((val & 0xffff) ^ 0x8000) - 0x8000; + HOST_WIDE_INT rest = trunc_int_for_mode (val - low, <MODE>mode); + + if (<MODE>mode == DImode && !satisfies_constraint_L (GEN_INT (rest))) + FAIL; + + /* The ordering here is important for the prolog expander. + When space is allocated from the stack, adding 'low' first may + produce a temporary deallocation (which would be bad). */ + emit_insn (gen_add<mode>3 (operands[0], operands[1], GEN_INT (rest))); + emit_insn (gen_add<mode>3 (operands[0], operands[0], GEN_INT (low))); + DONE; + } } }) diff --git a/gcc/testsuite/gcc.target/powerpc/add-const.c b/gcc/testsuite/gcc.target/powerpc/add-const.c new file mode 100644 index 00000000000..e1007247b32 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/add-const.c @@ -0,0 +1,18 @@ +/* { dg-do compile { target { lp64 } } } */ +/* { dg-options "-O3 -fno-unroll-loops" } */ + +/* Ensure the lis,ori are generated, which indicates they have + been hoisted outside of the loop. */ + +typedef unsigned long ulong; +ulong +foo (ulong n, ulong h) +{ + int i; + for (i = 0; i < n; i++) + h = ((h + 8) | h) + 0x87654321; + return h; +} + +/* { dg-final { scan-assembler-times {\mlis\M} 1 } } */ +/* { dg-final { scan-assembler-times {\mori\M} 1 } } */ -- 2.21.0.777.g83232e3864 ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] rs6000: Don't split constant oprator when add, move to temp register for future optimization 2020-03-30 3:59 ` luoxhu @ 2020-04-02 22:16 ` Segher Boessenkool 2020-04-03 6:13 ` luoxhu 0 siblings, 1 reply; 11+ messages in thread From: Segher Boessenkool @ 2020-04-02 22:16 UTC (permalink / raw) To: luoxhu; +Cc: gcc-patches, wschmidt, guojiufu, linkw Hi! On Mon, Mar 30, 2020 at 11:59:57AM +0800, luoxhu wrote: > > Do we want something later in the RTL pipeline to make "addi"s etc. again? (This would be a good thing to consider -- maybe a define_insn_and_split will work. But see below). > [PATCH] rs6000: Don't split constant operator add before reload, move to temp register for future optimization > > Don't split code from add<mode>3 for SDI to allow a later pass to split. > This allows later logic to hoist out constant load in add instructions. > In loop, lis+ori could be hoisted out to improve performance compared with > previous addis+addi (About 15% on typical case), weak point is > one more register is used and one more instruction is generated. i.e.: > > addis 3,3,0x8765 > addi 3,3,0x4321 > > => > > lis 9,0x8765 > ori 9,9,0x4321 > add 3,3,9 (This patch will of course have to wait for stage 1). Such a define_insn_and_split could be for an add of a (signed) 32-bit immediate. combine will try to combine the three insns (lis;ori;add), and match the new pattern. This also links in with Alan's work on big immediates, and with paddi insns, etc. Segher ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] rs6000: Don't split constant oprator when add, move to temp register for future optimization 2020-04-02 22:16 ` Segher Boessenkool @ 2020-04-03 6:13 ` luoxhu 2020-04-03 12:13 ` Alan Modra 2020-04-03 21:08 ` Segher Boessenkool 0 siblings, 2 replies; 11+ messages in thread From: luoxhu @ 2020-04-03 6:13 UTC (permalink / raw) To: Segher Boessenkool; +Cc: gcc-patches, wschmidt, guojiufu, linkw On 2020/4/3 06:16, Segher Boessenkool wrote: > Hi! > > On Mon, Mar 30, 2020 at 11:59:57AM +0800, luoxhu wrote: >>> Do we want something later in the RTL pipeline to make "addi"s etc. again? > > (This would be a good thing to consider -- maybe a define_insn_and_split > will work. But see below). > >> [PATCH] rs6000: Don't split constant operator add before reload, move to temp register for future optimization >> >> Don't split code from add<mode>3 for SDI to allow a later pass to split. >> This allows later logic to hoist out constant load in add instructions. >> In loop, lis+ori could be hoisted out to improve performance compared with >> previous addis+addi (About 15% on typical case), weak point is >> one more register is used and one more instruction is generated. i.e.: >> >> addis 3,3,0x8765 >> addi 3,3,0x4321 >> >> => >> >> lis 9,0x8765 >> ori 9,9,0x4321 >> add 3,3,9 > > (This patch will of course have to wait for stage 1). > > Such a define_insn_and_split could be for an add of a (signed) 32-bit > immediate. combine will try to combine the three insns (lis;ori;add), > and match the new pattern. Currently 286r.split2 will split "12:%9:DI=0x87654321" to lis+ori by rs6000_emit_set_const of define_split, do you mean add new define_insn_and_split to do the split? Another patch to do this after this one goes upstream in stage 1? > > This also links in with Alan's work on big immediates, and with paddi > insns, etc. Seems PR94393? Yes, rs6000_emit_set_const calls rs6000_emit_set_long_const for DImode. I tried unsigned long like 0xabcd87654321, 0xffffabcd87654321 and 0xc000000000000000ULL, All of them are outside of loop even without my patch. No difference with or without Alan's patch. 0xabcd87654321: li 9,0 ori 9,9,0xabcd sldi 9,9,32 oris 9,9,0x8765 ori 9,9,0x4321 0xffffabcd87654321: lis 9,0xabcd ori 9,9,0x8765 sldi 9,9,16 ori 9,9,0x4321 0xc000000000000000ULL: li 9,-1 rldicr 9,9,0,1 Thanks, Xionghu > > > Segher > ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] rs6000: Don't split constant oprator when add, move to temp register for future optimization 2020-04-03 6:13 ` luoxhu @ 2020-04-03 12:13 ` Alan Modra 2020-04-03 21:11 ` Segher Boessenkool 2020-04-03 21:08 ` Segher Boessenkool 1 sibling, 1 reply; 11+ messages in thread From: Alan Modra @ 2020-04-03 12:13 UTC (permalink / raw) To: luoxhu; +Cc: Segher Boessenkool, wschmidt, gcc-patches, linkw On Fri, Apr 03, 2020 at 02:13:06PM +0800, luoxhu via Gcc-patches wrote: > Seems PR94393? Yes, rs6000_emit_set_const calls rs6000_emit_set_long_const for DImode. > I tried unsigned long like 0xabcd87654321, 0xffffabcd87654321 and 0xc000000000000000ULL, > All of them are outside of loop even without my patch. No difference with or without > Alan's patch. Segher probably meant the part I'm working on and haven't posted yet, fixing lots of problems with rs6000_rtx_costs. One of the improvements I'm aiming for is that we should be able to emit code that loads constants from memory without following optimisation passes converting the loads from memory to those long sequences of dependent instructions. -- Alan Modra Australia Development Lab, IBM ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] rs6000: Don't split constant oprator when add, move to temp register for future optimization 2020-04-03 12:13 ` Alan Modra @ 2020-04-03 21:11 ` Segher Boessenkool 2020-04-03 23:51 ` Alan Modra 0 siblings, 1 reply; 11+ messages in thread From: Segher Boessenkool @ 2020-04-03 21:11 UTC (permalink / raw) To: Alan Modra; +Cc: luoxhu, wschmidt, gcc-patches, linkw On Fri, Apr 03, 2020 at 10:43:49PM +1030, Alan Modra wrote: > On Fri, Apr 03, 2020 at 02:13:06PM +0800, luoxhu via Gcc-patches wrote: > > Seems PR94393? Yes, rs6000_emit_set_const calls rs6000_emit_set_long_const for DImode. > > I tried unsigned long like 0xabcd87654321, 0xffffabcd87654321 and 0xc000000000000000ULL, > > All of them are outside of loop even without my patch. No difference with or without > > Alan's patch. > > Segher probably meant the part I'm working on and haven't posted yet, > fixing lots of problems with rs6000_rtx_costs. I meant PR94393 in fact, but yeah, this is connected *everywhere* :-) > One of the improvements > I'm aiming for is that we should be able to emit code that loads > constants from memory without following optimisation passes converting > the loads from memory to those long sequences of dependent instructions. Yeah. Looking forward to it :-) Segher ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] rs6000: Don't split constant oprator when add, move to temp register for future optimization 2020-04-03 21:11 ` Segher Boessenkool @ 2020-04-03 23:51 ` Alan Modra 2020-04-07 21:47 ` Segher Boessenkool 0 siblings, 1 reply; 11+ messages in thread From: Alan Modra @ 2020-04-03 23:51 UTC (permalink / raw) To: Segher Boessenkool; +Cc: luoxhu, wschmidt, gcc-patches, linkw On Fri, Apr 03, 2020 at 04:11:32PM -0500, Segher Boessenkool wrote: > On Fri, Apr 03, 2020 at 10:43:49PM +1030, Alan Modra wrote: > > Segher probably meant the part I'm working on and haven't posted yet, > > fixing lots of problems with rs6000_rtx_costs. > > I meant PR94393 in fact, but yeah, this is connected *everywhere* :-) > > > One of the improvements > > I'm aiming for is that we should be able to emit code that loads > > constants from memory without following optimisation passes converting > > the loads from memory to those long sequences of dependent instructions. > > Yeah. Looking forward to it :-) I have a series of small patches already, the most significant so far being the one that adds the following comment to rs6000_rtx_costs: "Calls from places like optabs.c:avoid_expensive_constant will come here with OUTER_CODE set to an operation such as AND with X being a CONST_INT or other CONSTANT_P type. This will be compared against set_src_cost, where we'll come here with OUTER_CODE as SET and X the same constant. Calls from places like default_noce_conversion_profitable_p will come here via seq_cost and pass the pattern of a SET insn in X. The SET isn't handled here so *TOTAL will remain as COSTS_N_INSNS(1) multiplied by the number of words being set. Presuming the insn is valid and set_dest a reg, rs6000_rtx_costs will next see the set_src. Continuing the example of an AND, this might be an AND of two other regs. This AND should cost zero here since the insn cost has already been counted. The overall cost value should be comparable to rs6000_insn_cost." It pays to figure out what you need to do before doing anything. :-) Those two paragraphs will result in quite a few changes. The first one means that, for example, the rs6000_is_valid_and_mask test belongs under case CONST_INT as || (outer_code == AND && rs6000_is_valid_and_mask (x, mode)) not under case AND. The second paragraph says we are costing practically all operations too highly. I'm still in analysis mode, worried about whether gcc generates deep rtl expressions to pass to rtx_cost. I have a vague recollection of seeing that years ago, but it looks like most places with anything complex generate a sequence of insns and cost the sequence. If we do have expressions like (set (reg1) (and (plus (reg2) cst1) cst2)) then rs6000_rtx_cost should recognise that AND as costing an extra insn. -- Alan Modra Australia Development Lab, IBM ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] rs6000: Don't split constant oprator when add, move to temp register for future optimization 2020-04-03 23:51 ` Alan Modra @ 2020-04-07 21:47 ` Segher Boessenkool 0 siblings, 0 replies; 11+ messages in thread From: Segher Boessenkool @ 2020-04-07 21:47 UTC (permalink / raw) To: Alan Modra; +Cc: luoxhu, wschmidt, gcc-patches, linkw Hi! On Sat, Apr 04, 2020 at 10:21:00AM +1030, Alan Modra wrote: > I have a series of small patches already, the most significant so far > being the one that adds the following comment to rs6000_rtx_costs: > > "Calls from places like optabs.c:avoid_expensive_constant will come > here with OUTER_CODE set to an operation such as AND with X being a > CONST_INT or other CONSTANT_P type. This will be compared against > set_src_cost, where we'll come here with OUTER_CODE as SET and X > the same constant. > > Calls from places like default_noce_conversion_profitable_p will > come here via seq_cost and pass the pattern of a SET insn in X. That uses set_rtx_cost, which still doesn't use insn_cost, yes. set_rtx_cost and set_src_cost are the two biggest things that still need rtx_cost. > The SET isn't handled here so *TOTAL will remain as > COSTS_N_INSNS(1) multiplied by the number of words being set. > Presuming the insn is valid and set_dest a reg, rs6000_rtx_costs > will next see the set_src. Continuing the example of an AND, this > might be an AND of two other regs. This AND should cost zero here > since the insn cost has already been counted. The overall cost > value should be comparable to rs6000_insn_cost." The biggest problem in converting it is that sometimes there is no valid code yet, and constructing it isn't easy either, but some code still wants to know the cost :-/ > It pays to figure out what you need to do before doing anything. :-) Yes :-) > Those two paragraphs will result in quite a few changes. The first > one means that, for example, the rs6000_is_valid_and_mask test belongs > under case CONST_INT as > || (outer_code == AND > && rs6000_is_valid_and_mask (x, mode)) > not under case AND. Does it? Isn't the rtx_cost logic inside-out enough already? > The second paragraph says we are costing practically all operations > too highly. Another huge advantage of insn_cost. > I'm still in analysis mode, worried about whether gcc generates deep > rtl expressions to pass to rtx_cost. Yes :-( > I have a vague recollection of > seeing that years ago, but it looks like most places with anything > complex generate a sequence of insns and cost the sequence. Sure, and anything not a single_set will use insn_cost, not set_rtx_cost, and sanity prevails. > If we do > have expressions like (set (reg1) (and (plus (reg2) cst1) cst2)) then > rs6000_rtx_cost should recognise that AND as costing an extra insn. Well, what does such an expression *mean*, if there is no instruction like that? How should we estimate the cost if we do not know what insns it will generate? Segher ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] rs6000: Don't split constant oprator when add, move to temp register for future optimization 2020-04-03 6:13 ` luoxhu 2020-04-03 12:13 ` Alan Modra @ 2020-04-03 21:08 ` Segher Boessenkool 1 sibling, 0 replies; 11+ messages in thread From: Segher Boessenkool @ 2020-04-03 21:08 UTC (permalink / raw) To: luoxhu; +Cc: gcc-patches, wschmidt, guojiufu, linkw Hi! On Fri, Apr 03, 2020 at 02:13:06PM +0800, luoxhu wrote: > Currently 286r.split2 will split "12:%9:DI=0x87654321" to lis+ori by > rs6000_emit_set_const of define_split, do you mean add new define_insn_and_split > to do the split? Another patch to do this after this one goes upstream in stage 1? I mean a splitter that recognises reg := reg + imm for a big immediate, and then does addis+addi again if that is the best way to do it. We expand as a load of the 32-bit constant, which means it can be hoisted just fine, and when we can use paddi we can; but when we cannot, we can still split it again. > > This also links in with Alan's work on big immediates, and with paddi > > insns, etc. > > Seems PR94393? Yes, PR94393. And everything else that will show up if we start pulling at that string (which we should now, btw, for paddi). Segher ^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2020-04-07 21:47 UTC | newest] Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2020-03-26 10:06 [PATCH] rs6000: Don't split constant oprator when add, move to temp register for future optimization luoxhu 2020-03-26 15:18 ` will schmidt 2020-03-27 14:33 ` Segher Boessenkool 2020-03-30 3:59 ` luoxhu 2020-04-02 22:16 ` Segher Boessenkool 2020-04-03 6:13 ` luoxhu 2020-04-03 12:13 ` Alan Modra 2020-04-03 21:11 ` Segher Boessenkool 2020-04-03 23:51 ` Alan Modra 2020-04-07 21:47 ` Segher Boessenkool 2020-04-03 21:08 ` Segher Boessenkool
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).