public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] rs6000: Don't split constant oprator when add, move to temp register for future optimization
@ 2020-03-26 10:06 luoxhu
  2020-03-26 15:18 ` will schmidt
  2020-03-27 14:33 ` Segher Boessenkool
  0 siblings, 2 replies; 11+ messages in thread
From: luoxhu @ 2020-03-26 10:06 UTC (permalink / raw)
  To: gcc-patches; +Cc: segher, wschmidt, guojiufu, linkw, Xionghu Luo

From: Xionghu Luo <luoxhu@linux.ibm.com>

Remove split code from add<mode>3 to allow a later pass to split.
This allows later logic to hoist out constant load in add instructions.
In loop, lis+ori could be hoisted out to improve performance compared with
previous addis+addi (About 15% on typical case), weak point is
one more register is used and one more instruction is generated.  i.e.:

addis 3,3,0x8765
addi 3,3,0x4321

=>

lis 9,0x8765
ori 9,9,0x4321
add 3,3,9

gcc/ChangeLog:

2020-03-26  Xiong Hu Luo  <luoxhu@linux.ibm.com>

	* config/rs6000/rs6000.md (add<mode>3): Remove split code, move constant
	  to temp register before add.

gcc/testsuite/ChangeLog:

2020-03-26  Xiong Hu Luo  <luoxhu@linux.ibm.com>

	* gcc.target/powerpc/add-const.c: New.
---
 gcc/config/rs6000/rs6000.md                  | 25 ++------------------
 gcc/testsuite/gcc.target/powerpc/add-const.c | 18 ++++++++++++++
 2 files changed, 20 insertions(+), 23 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/add-const.c

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index ad88b6783af..72f3f604e0d 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -1733,29 +1733,8 @@ (define_expand "add<mode>3"
 		  || rtx_equal_p (operands[0], operands[1]))
 		 ? operands[0] : gen_reg_rtx (<MODE>mode));
 
-      /* Adding a constant to r0 is not a valid insn, so use a different
-	 strategy in that case.  */
-      if (reg_or_subregno (operands[1]) == 0 || reg_or_subregno (tmp) == 0)
-	{
-	  if (operands[0] == operands[1])
-	    FAIL;
-	  rs6000_emit_move (operands[0], operands[2], <MODE>mode);
-	  emit_insn (gen_add<mode>3 (operands[0], operands[1], operands[0]));
-	  DONE;
-	}
-
-      HOST_WIDE_INT val = INTVAL (operands[2]);
-      HOST_WIDE_INT low = ((val & 0xffff) ^ 0x8000) - 0x8000;
-      HOST_WIDE_INT rest = trunc_int_for_mode (val - low, <MODE>mode);
-
-      if (<MODE>mode == DImode && !satisfies_constraint_L (GEN_INT (rest)))
-	FAIL;
-
-      /* The ordering here is important for the prolog expander.
-	 When space is allocated from the stack, adding 'low' first may
-	 produce a temporary deallocation (which would be bad).  */
-      emit_insn (gen_add<mode>3 (tmp, operands[1], GEN_INT (rest)));
-      emit_insn (gen_add<mode>3 (operands[0], tmp, GEN_INT (low)));
+      rs6000_emit_move (tmp, operands[2], <MODE>mode);
+      emit_insn (gen_add<mode>3 (operands[0], operands[1], tmp));
       DONE;
     }
 })
diff --git a/gcc/testsuite/gcc.target/powerpc/add-const.c b/gcc/testsuite/gcc.target/powerpc/add-const.c
new file mode 100644
index 00000000000..e1007247b32
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/add-const.c
@@ -0,0 +1,18 @@
+/* { dg-do compile { target { lp64 } } } */
+/* { dg-options "-O3 -fno-unroll-loops" } */
+
+/* Ensure the lis,ori are generated, which indicates they have
+   been hoisted outside of the loop.  */
+
+typedef unsigned long ulong;
+ulong
+foo (ulong n, ulong h)
+{
+  int i;
+  for (i = 0; i < n; i++)
+    h = ((h + 8) | h) + 0x87654321;
+  return h;
+}
+
+/* { dg-final { scan-assembler-times {\mlis\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mori\M} 1 } } */
-- 
2.21.0.777.g83232e3864


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] rs6000: Don't split constant oprator when add, move to temp register for future optimization
  2020-03-26 10:06 [PATCH] rs6000: Don't split constant oprator when add, move to temp register for future optimization luoxhu
@ 2020-03-26 15:18 ` will schmidt
  2020-03-27 14:33 ` Segher Boessenkool
  1 sibling, 0 replies; 11+ messages in thread
From: will schmidt @ 2020-03-26 15:18 UTC (permalink / raw)
  To: luoxhu, gcc-patches; +Cc: wschmidt, linkw, segher

On Thu, 2020-03-26 at 05:06 -0500, luoxhu--- via Gcc-patches wrote:
> From: Xionghu Luo <luoxhu@linux.ibm.com>
> 
> Remove split code from add<mode>3 to allow a later pass to split.
> This allows later logic to hoist out constant load in add
> instructions.
> In loop, lis+ori could be hoisted out to improve performance compared
> with
> previous addis+addi (About 15% on typical case), weak point is
> one more register is used and one more instruction is
> generated.  i.e.:
> 
> addis 3,3,0x8765
> addi 3,3,0x4321
> 
> =>
> 
> lis 9,0x8765
> ori 9,9,0x4321
> add 3,3,9

LGTM.  :-)
I defer to Segher for his review & approval, etc.

Thanks,
-Will

> 
> gcc/ChangeLog:
> 
> 2020-03-26  Xiong Hu Luo  <luoxhu@linux.ibm.com>
> 
> 	* config/rs6000/rs6000.md (add<mode>3): Remove split code, move constant
> 	  to temp register before add.
> 
> gcc/testsuite/ChangeLog:
> 
> 2020-03-26  Xiong Hu Luo  <luoxhu@linux.ibm.com>
> 
> 	* gcc.target/powerpc/add-const.c: New.
> ---
>  gcc/config/rs6000/rs6000.md                  | 25 ++--------------
> ----
>  gcc/testsuite/gcc.target/powerpc/add-const.c | 18 ++++++++++++++
>  2 files changed, 20 insertions(+), 23 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/add-const.c
> 
> diff --git a/gcc/config/rs6000/rs6000.md
> b/gcc/config/rs6000/rs6000.md
> index ad88b6783af..72f3f604e0d 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -1733,29 +1733,8 @@ (define_expand "add<mode>3"
>  		  || rtx_equal_p (operands[0], operands[1]))
>  		 ? operands[0] : gen_reg_rtx (<MODE>mode));
> 
> -      /* Adding a constant to r0 is not a valid insn, so use a
> different
> -	 strategy in that case.  */
> -      if (reg_or_subregno (operands[1]) == 0 || reg_or_subregno
> (tmp) == 0)
> -	{
> -	  if (operands[0] == operands[1])
> -	    FAIL;
> -	  rs6000_emit_move (operands[0], operands[2], <MODE>mode);
> -	  emit_insn (gen_add<mode>3 (operands[0], operands[1],
> operands[0]));
> -	  DONE;
> -	}
> -
> -      HOST_WIDE_INT val = INTVAL (operands[2]);
> -      HOST_WIDE_INT low = ((val & 0xffff) ^ 0x8000) - 0x8000;
> -      HOST_WIDE_INT rest = trunc_int_for_mode (val - low,
> <MODE>mode);
> -
> -      if (<MODE>mode == DImode && !satisfies_constraint_L (GEN_INT
> (rest)))
> -	FAIL;
> -
> -      /* The ordering here is important for the prolog expander.
> -	 When space is allocated from the stack, adding 'low' first may
> -	 produce a temporary deallocation (which would be bad).  */
> -      emit_insn (gen_add<mode>3 (tmp, operands[1], GEN_INT (rest)));
> -      emit_insn (gen_add<mode>3 (operands[0], tmp, GEN_INT (low)));
> +      rs6000_emit_move (tmp, operands[2], <MODE>mode);
> +      emit_insn (gen_add<mode>3 (operands[0], operands[1], tmp));
>        DONE;
>      }
>  })
> diff --git a/gcc/testsuite/gcc.target/powerpc/add-const.c
> b/gcc/testsuite/gcc.target/powerpc/add-const.c
> new file mode 100644
> index 00000000000..e1007247b32
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/add-const.c
> @@ -0,0 +1,18 @@
> +/* { dg-do compile { target { lp64 } } } */
> +/* { dg-options "-O3 -fno-unroll-loops" } */
> +
> +/* Ensure the lis,ori are generated, which indicates they have
> +   been hoisted outside of the loop.  */
> +
> +typedef unsigned long ulong;
> +ulong
> +foo (ulong n, ulong h)
> +{
> +  int i;
> +  for (i = 0; i < n; i++)
> +    h = ((h + 8) | h) + 0x87654321;
> +  return h;
> +}
> +
> +/* { dg-final { scan-assembler-times {\mlis\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mori\M} 1 } } */


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] rs6000: Don't split constant oprator when add, move to temp register for future optimization
  2020-03-26 10:06 [PATCH] rs6000: Don't split constant oprator when add, move to temp register for future optimization luoxhu
  2020-03-26 15:18 ` will schmidt
@ 2020-03-27 14:33 ` Segher Boessenkool
  2020-03-30  3:59   ` luoxhu
  1 sibling, 1 reply; 11+ messages in thread
From: Segher Boessenkool @ 2020-03-27 14:33 UTC (permalink / raw)
  To: luoxhu; +Cc: gcc-patches, wschmidt, guojiufu, linkw

Hi!

On Thu, Mar 26, 2020 at 05:06:43AM -0500, luoxhu@linux.ibm.com wrote:
> Remove split code from add<mode>3 to allow a later pass to split.
> This allows later logic to hoist out constant load in add instructions.
> In loop, lis+ori could be hoisted out to improve performance compared with
> previous addis+addi (About 15% on typical case), weak point is
> one more register is used and one more instruction is generated.  i.e.:
> 
> addis 3,3,0x8765
> addi 3,3,0x4321
> 
> =>
> 
> lis 9,0x8765
> ori 9,9,0x4321
> add 3,3,9

What does it do overall?  Say, to SPEC.  What does it do to execution
time, and what does it do to binary size?

Do we want something later in the RTL pipeline to make "addi"s etc. again?

> 2020-03-26  Xiong Hu Luo  <luoxhu@linux.ibm.com>
> 
> 	* config/rs6000/rs6000.md (add<mode>3): Remove split code, move constant
> 	  to temp register before add.

This should not be indented, so just:
	* config/rs6000/rs6000.md (add<mode>3): Remove split code, move constant
	to temp register before add.

We have six separate add<mode>3 patterns (three of those are in rs6000.md,
too).  You can write something like
	(add<mode>3 for SDI):
to show which iterator (or mode) this one is for.  This is helpful with
any <mode> or <code> or the like, even if there (currently) is only one
pattern you could mean.

> diff --git a/gcc/testsuite/gcc.target/powerpc/add-const.c b/gcc/testsuite/gcc.target/powerpc/add-const.c
> new file mode 100644
> index 00000000000..e1007247b32
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/add-const.c
> @@ -0,0 +1,18 @@
> +/* { dg-do compile { target { lp64 } } } */
> +/* { dg-options "-O3 -fno-unroll-loops" } */
> +
> +/* Ensure the lis,ori are generated, which indicates they have
> +   been hoisted outside of the loop.  */

This sounds a bit fragile -- but we'll see :-)  It is small code, so it
probably will be fine, it won't change its generated code often.


Segher

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] rs6000: Don't split constant oprator when add, move to temp register for future optimization
  2020-03-27 14:33 ` Segher Boessenkool
@ 2020-03-30  3:59   ` luoxhu
  2020-04-02 22:16     ` Segher Boessenkool
  0 siblings, 1 reply; 11+ messages in thread
From: luoxhu @ 2020-03-30  3:59 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: gcc-patches, wschmidt, guojiufu, linkw



On 2020/3/27 22:33, Segher Boessenkool wrote:
> Hi!
> 
> On Thu, Mar 26, 2020 at 05:06:43AM -0500, luoxhu@linux.ibm.com wrote:
>> Remove split code from add<mode>3 to allow a later pass to split.
>> This allows later logic to hoist out constant load in add instructions.
>> In loop, lis+ori could be hoisted out to improve performance compared with
>> previous addis+addi (About 15% on typical case), weak point is
>> one more register is used and one more instruction is generated.  i.e.:
>>
>> addis 3,3,0x8765
>> addi 3,3,0x4321
>>
>> =>
>>
>> lis 9,0x8765
>> ori 9,9,0x4321
>> add 3,3,9
> 
> What does it do overall?  Say, to SPEC.  What does it do to execution
> time, and what does it do to binary size?
> 
> Do we want something later in the RTL pipeline to make "addi"s etc. again?
> 
>> 2020-03-26  Xiong Hu Luo  <luoxhu@linux.ibm.com>
>>
>> 	* config/rs6000/rs6000.md (add<mode>3): Remove split code, move constant
>> 	  to temp register before add.
> 
> This should not be indented, so just:
> 	* config/rs6000/rs6000.md (add<mode>3): Remove split code, move constant
> 	to temp register before add.
> 
> We have six separate add<mode>3 patterns (three of those are in rs6000.md,
> too).  You can write something like
> 	(add<mode>3 for SDI):
> to show which iterator (or mode) this one is for.  This is helpful with
> any <mode> or <code> or the like, even if there (currently) is only one
> pattern you could mean.
> 

Thanks, it is necessary to re-enable split add as some later RTL passes like final will
still need generate addis+addi (case: g++.dg/opt/thunk1.C ). 
Update the patch as below:


[PATCH] rs6000: Don't split constant operator add before reload, move to temp register for future optimization

Don't split code from add<mode>3 for SDI to allow a later pass to split.
This allows later logic to hoist out constant load in add instructions.
In loop, lis+ori could be hoisted out to improve performance compared with
previous addis+addi (About 15% on typical case), weak point is
one more register is used and one more instruction is generated.  i.e.:

addis 3,3,0x8765
addi 3,3,0x4321

=>

lis 9,0x8765
ori 9,9,0x4321
add 3,3,9

No obvious performance and binary size change to SPEC2017.

gcc/ChangeLog:

2020-03-30  Xiong Hu Luo  <luoxhu@linux.ibm.com>

	* config/rs6000/rs6000.md (add<mode>3 for SDI): Don't split before reload,
	move constant to temp register for add.

gcc/testsuite/ChangeLog:

2020-03-26  Xiong Hu Luo  <luoxhu@linux.ibm.com>

	* gcc.target/powerpc/add-const.c: New.
---
 gcc/config/rs6000/rs6000.md                  | 51 +++++++++-----------
 gcc/testsuite/gcc.target/powerpc/add-const.c | 18 +++++++
 2 files changed, 42 insertions(+), 27 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/add-const.c

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index ad88b6783af..76af3d5b1d9 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -1729,34 +1729,31 @@ (define_expand "add<mode>3"
 
   if (CONST_INT_P (operands[2]) && !add_operand (operands[2], <MODE>mode))
     {
-      rtx tmp = ((!can_create_pseudo_p ()
-		  || rtx_equal_p (operands[0], operands[1]))
-		 ? operands[0] : gen_reg_rtx (<MODE>mode));
-
-      /* Adding a constant to r0 is not a valid insn, so use a different
-	 strategy in that case.  */
-      if (reg_or_subregno (operands[1]) == 0 || reg_or_subregno (tmp) == 0)
-	{
-	  if (operands[0] == operands[1])
-	    FAIL;
-	  rs6000_emit_move (operands[0], operands[2], <MODE>mode);
-	  emit_insn (gen_add<mode>3 (operands[0], operands[1], operands[0]));
-	  DONE;
-	}
-
-      HOST_WIDE_INT val = INTVAL (operands[2]);
-      HOST_WIDE_INT low = ((val & 0xffff) ^ 0x8000) - 0x8000;
-      HOST_WIDE_INT rest = trunc_int_for_mode (val - low, <MODE>mode);
-
-      if (<MODE>mode == DImode && !satisfies_constraint_L (GEN_INT (rest)))
-	FAIL;
+      if (can_create_pseudo_p ())
+      {
+	rtx tmp = rtx_equal_p (operands[0], operands[1])
+	  ? operands[0] : gen_reg_rtx (<MODE>mode);
 
-      /* The ordering here is important for the prolog expander.
-	 When space is allocated from the stack, adding 'low' first may
-	 produce a temporary deallocation (which would be bad).  */
-      emit_insn (gen_add<mode>3 (tmp, operands[1], GEN_INT (rest)));
-      emit_insn (gen_add<mode>3 (operands[0], tmp, GEN_INT (low)));
-      DONE;
+	rs6000_emit_move (tmp, operands[2], <MODE>mode);
+	emit_insn (gen_add<mode>3 (operands[0], operands[1], tmp));
+	DONE;
+      }
+      else
+      {
+	HOST_WIDE_INT val = INTVAL (operands[2]);
+	HOST_WIDE_INT low = ((val & 0xffff) ^ 0x8000) - 0x8000;
+	HOST_WIDE_INT rest = trunc_int_for_mode (val - low, <MODE>mode);
+
+	if (<MODE>mode == DImode && !satisfies_constraint_L (GEN_INT (rest)))
+	  FAIL;
+
+	/* The ordering here is important for the prolog expander.
+	   When space is allocated from the stack, adding 'low' first may
+	   produce a temporary deallocation (which would be bad).  */
+	emit_insn (gen_add<mode>3 (operands[0], operands[1], GEN_INT (rest)));
+	emit_insn (gen_add<mode>3 (operands[0], operands[0], GEN_INT (low)));
+	DONE;
+      }
     }
 })
 
diff --git a/gcc/testsuite/gcc.target/powerpc/add-const.c b/gcc/testsuite/gcc.target/powerpc/add-const.c
new file mode 100644
index 00000000000..e1007247b32
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/add-const.c
@@ -0,0 +1,18 @@
+/* { dg-do compile { target { lp64 } } } */
+/* { dg-options "-O3 -fno-unroll-loops" } */
+
+/* Ensure the lis,ori are generated, which indicates they have
+   been hoisted outside of the loop.  */
+
+typedef unsigned long ulong;
+ulong
+foo (ulong n, ulong h)
+{
+  int i;
+  for (i = 0; i < n; i++)
+    h = ((h + 8) | h) + 0x87654321;
+  return h;
+}
+
+/* { dg-final { scan-assembler-times {\mlis\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mori\M} 1 } } */
-- 
2.21.0.777.g83232e3864



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] rs6000: Don't split constant oprator when add, move to temp register for future optimization
  2020-03-30  3:59   ` luoxhu
@ 2020-04-02 22:16     ` Segher Boessenkool
  2020-04-03  6:13       ` luoxhu
  0 siblings, 1 reply; 11+ messages in thread
From: Segher Boessenkool @ 2020-04-02 22:16 UTC (permalink / raw)
  To: luoxhu; +Cc: gcc-patches, wschmidt, guojiufu, linkw

Hi!

On Mon, Mar 30, 2020 at 11:59:57AM +0800, luoxhu wrote:
> > Do we want something later in the RTL pipeline to make "addi"s etc. again?

(This would be a good thing to consider -- maybe a define_insn_and_split
will work.  But see below).

> [PATCH] rs6000: Don't split constant operator add before reload, move to temp register for future optimization
> 
> Don't split code from add<mode>3 for SDI to allow a later pass to split.
> This allows later logic to hoist out constant load in add instructions.
> In loop, lis+ori could be hoisted out to improve performance compared with
> previous addis+addi (About 15% on typical case), weak point is
> one more register is used and one more instruction is generated.  i.e.:
> 
> addis 3,3,0x8765
> addi 3,3,0x4321
> 
> =>
> 
> lis 9,0x8765
> ori 9,9,0x4321
> add 3,3,9

(This patch will of course have to wait for stage 1).

Such a define_insn_and_split could be for an add of a (signed) 32-bit
immediate.  combine will try to combine the three insns (lis;ori;add),
and match the new pattern.

This also links in with Alan's work on big immediates, and with paddi
insns, etc.


Segher

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] rs6000: Don't split constant oprator when add, move to temp register for future optimization
  2020-04-02 22:16     ` Segher Boessenkool
@ 2020-04-03  6:13       ` luoxhu
  2020-04-03 12:13         ` Alan Modra
  2020-04-03 21:08         ` Segher Boessenkool
  0 siblings, 2 replies; 11+ messages in thread
From: luoxhu @ 2020-04-03  6:13 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: gcc-patches, wschmidt, guojiufu, linkw



On 2020/4/3 06:16, Segher Boessenkool wrote:
> Hi!
> 
> On Mon, Mar 30, 2020 at 11:59:57AM +0800, luoxhu wrote:
>>> Do we want something later in the RTL pipeline to make "addi"s etc. again?
> 
> (This would be a good thing to consider -- maybe a define_insn_and_split
> will work.  But see below).
> 
>> [PATCH] rs6000: Don't split constant operator add before reload, move to temp register for future optimization
>>
>> Don't split code from add<mode>3 for SDI to allow a later pass to split.
>> This allows later logic to hoist out constant load in add instructions.
>> In loop, lis+ori could be hoisted out to improve performance compared with
>> previous addis+addi (About 15% on typical case), weak point is
>> one more register is used and one more instruction is generated.  i.e.:
>>
>> addis 3,3,0x8765
>> addi 3,3,0x4321
>>
>> =>
>>
>> lis 9,0x8765
>> ori 9,9,0x4321
>> add 3,3,9
> 
> (This patch will of course have to wait for stage 1).
> 
> Such a define_insn_and_split could be for an add of a (signed) 32-bit
> immediate.  combine will try to combine the three insns (lis;ori;add),
> and match the new pattern.

Currently 286r.split2 will split "12:%9:DI=0x87654321" to lis+ori by
rs6000_emit_set_const of define_split, do you mean add new define_insn_and_split
to do the split?  Another patch to do this after this one goes upstream in stage 1?

> 
> This also links in with Alan's work on big immediates, and with paddi
> insns, etc.

Seems PR94393?  Yes, rs6000_emit_set_const calls rs6000_emit_set_long_const for DImode.
I tried unsigned long like 0xabcd87654321, 0xffffabcd87654321 and 0xc000000000000000ULL, 
All of them are outside of loop even without my patch.  No difference with or without
Alan's patch.

0xabcd87654321: li 9,0  ori 9,9,0xabcd  sldi 9,9,32   oris 9,9,0x8765    ori 9,9,0x4321
0xffffabcd87654321: lis 9,0xabcd   ori 9,9,0x8765     sldi 9,9,16     ori 9,9,0x4321
0xc000000000000000ULL:   li 9,-1   rldicr 9,9,0,1


Thanks,
Xionghu

> 
> 
> Segher
> 


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] rs6000: Don't split constant oprator when add, move to temp register for future optimization
  2020-04-03  6:13       ` luoxhu
@ 2020-04-03 12:13         ` Alan Modra
  2020-04-03 21:11           ` Segher Boessenkool
  2020-04-03 21:08         ` Segher Boessenkool
  1 sibling, 1 reply; 11+ messages in thread
From: Alan Modra @ 2020-04-03 12:13 UTC (permalink / raw)
  To: luoxhu; +Cc: Segher Boessenkool, wschmidt, gcc-patches, linkw

On Fri, Apr 03, 2020 at 02:13:06PM +0800, luoxhu via Gcc-patches wrote:
> Seems PR94393?  Yes, rs6000_emit_set_const calls rs6000_emit_set_long_const for DImode.
> I tried unsigned long like 0xabcd87654321, 0xffffabcd87654321 and 0xc000000000000000ULL, 
> All of them are outside of loop even without my patch.  No difference with or without
> Alan's patch.

Segher probably meant the part I'm working on and haven't posted yet,
fixing lots of problems with rs6000_rtx_costs.  One of the improvements
I'm aiming for is that we should be able to emit code that loads
constants from memory without following optimisation passes converting
the loads from memory to those long sequences of dependent instructions.

-- 
Alan Modra
Australia Development Lab, IBM

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] rs6000: Don't split constant oprator when add, move to temp register for future optimization
  2020-04-03  6:13       ` luoxhu
  2020-04-03 12:13         ` Alan Modra
@ 2020-04-03 21:08         ` Segher Boessenkool
  1 sibling, 0 replies; 11+ messages in thread
From: Segher Boessenkool @ 2020-04-03 21:08 UTC (permalink / raw)
  To: luoxhu; +Cc: gcc-patches, wschmidt, guojiufu, linkw

Hi!

On Fri, Apr 03, 2020 at 02:13:06PM +0800, luoxhu wrote:
> Currently 286r.split2 will split "12:%9:DI=0x87654321" to lis+ori by
> rs6000_emit_set_const of define_split, do you mean add new define_insn_and_split
> to do the split?  Another patch to do this after this one goes upstream in stage 1?

I mean a splitter that recognises
  reg := reg + imm
for a big immediate, and then does addis+addi again if that is the best
way to do it.

We expand as a load of the 32-bit constant, which means it can be hoisted
just fine, and when we can use paddi we can; but when we cannot, we can
still split it again.

> > This also links in with Alan's work on big immediates, and with paddi
> > insns, etc.
> 
> Seems PR94393?

Yes, PR94393.  And everything else that will show up if we start pulling
at that string (which we should now, btw, for paddi).


Segher

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] rs6000: Don't split constant oprator when add, move to temp register for future optimization
  2020-04-03 12:13         ` Alan Modra
@ 2020-04-03 21:11           ` Segher Boessenkool
  2020-04-03 23:51             ` Alan Modra
  0 siblings, 1 reply; 11+ messages in thread
From: Segher Boessenkool @ 2020-04-03 21:11 UTC (permalink / raw)
  To: Alan Modra; +Cc: luoxhu, wschmidt, gcc-patches, linkw

On Fri, Apr 03, 2020 at 10:43:49PM +1030, Alan Modra wrote:
> On Fri, Apr 03, 2020 at 02:13:06PM +0800, luoxhu via Gcc-patches wrote:
> > Seems PR94393?  Yes, rs6000_emit_set_const calls rs6000_emit_set_long_const for DImode.
> > I tried unsigned long like 0xabcd87654321, 0xffffabcd87654321 and 0xc000000000000000ULL, 
> > All of them are outside of loop even without my patch.  No difference with or without
> > Alan's patch.
> 
> Segher probably meant the part I'm working on and haven't posted yet,
> fixing lots of problems with rs6000_rtx_costs.

I meant PR94393 in fact, but yeah, this is connected *everywhere* :-)

> One of the improvements
> I'm aiming for is that we should be able to emit code that loads
> constants from memory without following optimisation passes converting
> the loads from memory to those long sequences of dependent instructions.

Yeah.  Looking forward to it :-)


Segher

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] rs6000: Don't split constant oprator when add, move to temp register for future optimization
  2020-04-03 21:11           ` Segher Boessenkool
@ 2020-04-03 23:51             ` Alan Modra
  2020-04-07 21:47               ` Segher Boessenkool
  0 siblings, 1 reply; 11+ messages in thread
From: Alan Modra @ 2020-04-03 23:51 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: luoxhu, wschmidt, gcc-patches, linkw

On Fri, Apr 03, 2020 at 04:11:32PM -0500, Segher Boessenkool wrote:
> On Fri, Apr 03, 2020 at 10:43:49PM +1030, Alan Modra wrote:
> > Segher probably meant the part I'm working on and haven't posted yet,
> > fixing lots of problems with rs6000_rtx_costs.
> 
> I meant PR94393 in fact, but yeah, this is connected *everywhere* :-)
> 
> > One of the improvements
> > I'm aiming for is that we should be able to emit code that loads
> > constants from memory without following optimisation passes converting
> > the loads from memory to those long sequences of dependent instructions.
> 
> Yeah.  Looking forward to it :-)

I have a series of small patches already, the most significant so far
being the one that adds the following comment to rs6000_rtx_costs:

  "Calls from places like optabs.c:avoid_expensive_constant will come
   here with OUTER_CODE set to an operation such as AND with X being a
   CONST_INT or other CONSTANT_P type.  This will be compared against
   set_src_cost, where we'll come here with OUTER_CODE as SET and X
   the same constant.

   Calls from places like default_noce_conversion_profitable_p will
   come here via seq_cost and pass the pattern of a SET insn in X.
   The SET isn't handled here so *TOTAL will remain as
   COSTS_N_INSNS(1) multiplied by the number of words being set.
   Presuming the insn is valid and set_dest a reg, rs6000_rtx_costs
   will next see the set_src.  Continuing the example of an AND, this
   might be an AND of two other regs.  This AND should cost zero here
   since the insn cost has already been counted.  The overall cost
   value should be comparable to rs6000_insn_cost."

It pays to figure out what you need to do before doing anything.  :-)

Those two paragraphs will result in quite a few changes.  The first
one means that, for example, the rs6000_is_valid_and_mask test belongs
under case CONST_INT as
	  || (outer_code == AND
	      && rs6000_is_valid_and_mask (x, mode))
not under case AND.

The second paragraph says we are costing practically all operations
too highly.

I'm still in analysis mode, worried about whether gcc generates deep
rtl expressions to pass to rtx_cost.  I have a vague recollection of
seeing that years ago, but it looks like most places with anything
complex generate a sequence of insns and cost the sequence.  If we do
have expressions like (set (reg1) (and (plus (reg2) cst1) cst2)) then
rs6000_rtx_cost should recognise that AND as costing an extra insn.

-- 
Alan Modra
Australia Development Lab, IBM

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] rs6000: Don't split constant oprator when add, move to temp register for future optimization
  2020-04-03 23:51             ` Alan Modra
@ 2020-04-07 21:47               ` Segher Boessenkool
  0 siblings, 0 replies; 11+ messages in thread
From: Segher Boessenkool @ 2020-04-07 21:47 UTC (permalink / raw)
  To: Alan Modra; +Cc: luoxhu, wschmidt, gcc-patches, linkw

Hi!

On Sat, Apr 04, 2020 at 10:21:00AM +1030, Alan Modra wrote:
> I have a series of small patches already, the most significant so far
> being the one that adds the following comment to rs6000_rtx_costs:
> 
>   "Calls from places like optabs.c:avoid_expensive_constant will come
>    here with OUTER_CODE set to an operation such as AND with X being a
>    CONST_INT or other CONSTANT_P type.  This will be compared against
>    set_src_cost, where we'll come here with OUTER_CODE as SET and X
>    the same constant.
> 
>    Calls from places like default_noce_conversion_profitable_p will
>    come here via seq_cost and pass the pattern of a SET insn in X.

That uses set_rtx_cost, which still doesn't use insn_cost, yes.
set_rtx_cost and set_src_cost are the two biggest things that still
need rtx_cost.

>    The SET isn't handled here so *TOTAL will remain as
>    COSTS_N_INSNS(1) multiplied by the number of words being set.
>    Presuming the insn is valid and set_dest a reg, rs6000_rtx_costs
>    will next see the set_src.  Continuing the example of an AND, this
>    might be an AND of two other regs.  This AND should cost zero here
>    since the insn cost has already been counted.  The overall cost
>    value should be comparable to rs6000_insn_cost."

The biggest problem in converting it is that sometimes there is no valid
code yet, and constructing it isn't easy either, but some code still
wants to know the cost :-/

> It pays to figure out what you need to do before doing anything.  :-)

Yes :-)

> Those two paragraphs will result in quite a few changes.  The first
> one means that, for example, the rs6000_is_valid_and_mask test belongs
> under case CONST_INT as
> 	  || (outer_code == AND
> 	      && rs6000_is_valid_and_mask (x, mode))
> not under case AND.

Does it?  Isn't the rtx_cost logic inside-out enough already?

> The second paragraph says we are costing practically all operations
> too highly.

Another huge advantage of insn_cost.

> I'm still in analysis mode, worried about whether gcc generates deep
> rtl expressions to pass to rtx_cost.

Yes :-(

> I have a vague recollection of
> seeing that years ago, but it looks like most places with anything
> complex generate a sequence of insns and cost the sequence.

Sure, and anything not a single_set will use insn_cost, not set_rtx_cost,
and sanity prevails.

> If we do
> have expressions like (set (reg1) (and (plus (reg2) cst1) cst2)) then
> rs6000_rtx_cost should recognise that AND as costing an extra insn.

Well, what does such an expression *mean*, if there is no instruction
like that?  How should we estimate the cost if we do not know what insns
it will generate?


Segher

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2020-04-07 21:47 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-26 10:06 [PATCH] rs6000: Don't split constant oprator when add, move to temp register for future optimization luoxhu
2020-03-26 15:18 ` will schmidt
2020-03-27 14:33 ` Segher Boessenkool
2020-03-30  3:59   ` luoxhu
2020-04-02 22:16     ` Segher Boessenkool
2020-04-03  6:13       ` luoxhu
2020-04-03 12:13         ` Alan Modra
2020-04-03 21:11           ` Segher Boessenkool
2020-04-03 23:51             ` Alan Modra
2020-04-07 21:47               ` Segher Boessenkool
2020-04-03 21:08         ` Segher Boessenkool

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).