public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] rs6000: Enable generate const through pli+pli+rldimi
@ 2022-08-10  7:11 Jiufu Guo
  2022-08-10 16:43 ` Segher Boessenkool
  0 siblings, 1 reply; 4+ messages in thread
From: Jiufu Guo @ 2022-08-10  7:11 UTC (permalink / raw)
  To: gcc-patches; +Cc: segher, dje.gcc, linkw, guojiufu

Hi,

As mentioned in PR106550, since pli could support 34bits immediate, we could
use less instructions(3insn would be ok) to build 64bits constant with pli.

For example, for constant 0x020805006106003, we could generate it with:
asm code1:
pli 9,101736451 (0x6106003)
sldi 9,9,32
paddi 9,9, 2130000 (0x0208050)

or asm code2:
pli 10, 2130000
pli 9, 101736451
rldimi 9, 10, 32, 0

If there is only one register can be used, then the asm code1 is ok. Otherwise
asm code2 may be better.

This patch re-enable the constant building(splitter) before RA by updating the
constrains from int_reg_operand_not_pseudo to gpc_reg_operand.  And then, we
could use two different pseduo for two pli(s), and asm code2 can be generated.

This patch also could generate asm code1 if hard register is allocated for the
constant.

This patch pass boostrap and regtest on ppc64le(includes p10).
Is it ok for trunk?

BR,
Jeff(Jiufu)


	PR target/106550

gcc/ChangeLog:

	* config/rs6000/rs6000.cc (rs6000_emit_set_long_const): Enable constant
	build with pli instructions.
	* config/rs6000/rs6000.md: Use gpc_reg_operand for constant splitter.

gcc/testsuite/ChangeLog:

	* gcc.target/powerpc/pr106550.c: New test.


---
 gcc/config/rs6000/rs6000.cc                 | 40 +++++++++++++++++++++
 gcc/config/rs6000/rs6000.md                 |  2 +-
 gcc/testsuite/gcc.target/powerpc/pr106550.c | 14 ++++++++
 3 files changed, 55 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr106550.c

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index df491bee2ea..a2e6b7f59a0 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -10181,6 +10181,46 @@ rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c)
 			gen_rtx_IOR (DImode, copy_rtx (temp),
 				     GEN_INT (ud1)));
     }
+  else if (TARGET_PREFIXED)
+    {
+      /* pli 9,high32 + pli 10,low32 + rldimi 9,10,32,0.  */
+      if (can_create_pseudo_p ())
+	{
+	  temp = gen_reg_rtx (DImode);
+	  rtx temp1 = gen_reg_rtx (DImode);
+	  emit_move_insn (copy_rtx (temp), GEN_INT ((ud4 << 16) | ud3));
+	  emit_move_insn (copy_rtx (temp1), GEN_INT ((ud2 << 16) | ud1));
+
+	  rtx one = gen_rtx_AND (DImode, temp1, GEN_INT (0xffffffff));
+	  rtx two = gen_rtx_ASHIFT (DImode, temp, GEN_INT (32));
+	  emit_move_insn (dest, gen_rtx_IOR (DImode, one, two));
+	}
+
+      /* pli 9,high32 + sldi 9,32 + paddi 9,9,low32.  */
+      else
+	{
+	  emit_move_insn (copy_rtx (dest), GEN_INT ((ud4 << 16) | ud3));
+
+	  emit_move_insn (copy_rtx (dest),
+			  gen_rtx_ASHIFT (DImode, copy_rtx (dest),
+					  GEN_INT (32)));
+
+	  bool cannot_use_paddi = REGNO (dest) == FIRST_GPR_REGNO;
+
+	  /* Use paddi for the low32 bits.  */
+	  if (ud2 != 0 && ud1 != 0 && !cannot_use_paddi)
+	    emit_move_insn (dest, gen_rtx_PLUS (DImode, copy_rtx (dest),
+						GEN_INT ((ud2 << 16) | ud1)));
+	  /* Use oris, ori for low32 bits.  */
+	  if (ud2 != 0 && (ud1 == 0 || cannot_use_paddi))
+	    emit_move_insn (ud1 != 0 ? copy_rtx (dest) : dest,
+			    gen_rtx_IOR (DImode, copy_rtx (dest),
+					 GEN_INT (ud2 << 16)));
+	  if (ud1 != 0 && (ud2 == 0 || cannot_use_paddi))
+	    emit_move_insn (dest, gen_rtx_IOR (DImode, copy_rtx (dest),
+					       GEN_INT (ud1)));
+	}
+    }
   else
     {
       temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode);
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 1367a2cb779..abe777a593c 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -9659,7 +9659,7 @@ (define_split
 ;; When non-easy constants can go in the TOC, this should use
 ;; easy_fp_constant predicate.
 (define_split
-  [(set (match_operand:DI 0 "int_reg_operand_not_pseudo")
+  [(set (match_operand:DI 0 "gpc_reg_operand")
 	(match_operand:DI 1 "const_int_operand"))]
   "TARGET_POWERPC64 && num_insns_constant (operands[1], DImode) > 1"
   [(set (match_dup 0) (match_dup 2))
diff --git a/gcc/testsuite/gcc.target/powerpc/pr106550.c b/gcc/testsuite/gcc.target/powerpc/pr106550.c
new file mode 100644
index 00000000000..bca7760bad9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr106550.c
@@ -0,0 +1,14 @@
+/* PR target/106550 */
+/* { dg-do compile { target lp64 } } */
+/* { dg-options "-O2 -std=c99 -mdejagnu-cpu=power10" } */
+
+void
+foo (unsigned long long *a)
+{
+  *a++ = 0x020805006106003;
+  *a++ = 0x2351847027482577;  
+}
+
+/* { dg-final { scan-assembler-times {\mpli\M} 4 } } */
+/* { dg-final { scan-assembler-times {\mrldimi\M} 2 } } */
+
-- 
2.17.1


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] rs6000: Enable generate const through pli+pli+rldimi
  2022-08-10  7:11 [PATCH] rs6000: Enable generate const through pli+pli+rldimi Jiufu Guo
@ 2022-08-10 16:43 ` Segher Boessenkool
  2022-08-11 12:52   ` Jiufu Guo
  0 siblings, 1 reply; 4+ messages in thread
From: Segher Boessenkool @ 2022-08-10 16:43 UTC (permalink / raw)
  To: Jiufu Guo; +Cc: gcc-patches, dje.gcc, linkw

Hi!

On Wed, Aug 10, 2022 at 03:11:23PM +0800, Jiufu Guo wrote:
> As mentioned in PR106550, since pli could support 34bits immediate, we could
> use less instructions(3insn would be ok) to build 64bits constant with pli.
> 
> For example, for constant 0x020805006106003, we could generate it with:
> asm code1:
> pli 9,101736451 (0x6106003)
> sldi 9,9,32
> paddi 9,9, 2130000 (0x0208050)
> 
> or asm code2:
> pli 10, 2130000
> pli 9, 101736451
> rldimi 9, 10, 32, 0
> 
> If there is only one register can be used, then the asm code1 is ok. Otherwise
> asm code2 may be better.

It is significantly better yes.  That code with sldi is perhaps what we
have to do after reload, but all those three insns are sequential,
expensive.

> This patch re-enable the constant building(splitter) before RA by updating the
> constrains from int_reg_operand_not_pseudo to gpc_reg_operand.  And then, we
> could use two different pseduo for two pli(s), and asm code2 can be generated.

> This patch also could generate asm code1 if hard register is allocated for the
> constant.

> +  else if (TARGET_PREFIXED)
> +    {
> +      /* pli 9,high32 + pli 10,low32 + rldimi 9,10,32,0.  */
> +      if (can_create_pseudo_p ())
> +	{
> +	  temp = gen_reg_rtx (DImode);
> +	  rtx temp1 = gen_reg_rtx (DImode);
> +	  emit_move_insn (copy_rtx (temp), GEN_INT ((ud4 << 16) | ud3));
> +	  emit_move_insn (copy_rtx (temp1), GEN_INT ((ud2 << 16) | ud1));
> +
> +	  rtx one = gen_rtx_AND (DImode, temp1, GEN_INT (0xffffffff));

Why do you meed to mask the value here?  That is a nop, no?

> +	  rtx two = gen_rtx_ASHIFT (DImode, temp, GEN_INT (32));
> +	  emit_move_insn (dest, gen_rtx_IOR (DImode, one, two));

But you can call gen_rotldi3_insert_3 explicitly, a better idea if this
code can run late (so we cannot rely on other optimisations to clean
things up).

> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -9659,7 +9659,7 @@ (define_split
>  ;; When non-easy constants can go in the TOC, this should use
>  ;; easy_fp_constant predicate.
>  (define_split
> -  [(set (match_operand:DI 0 "int_reg_operand_not_pseudo")
> +  [(set (match_operand:DI 0 "gpc_reg_operand")
>  	(match_operand:DI 1 "const_int_operand"))]
>    "TARGET_POWERPC64 && num_insns_constant (operands[1], DImode) > 1"
>    [(set (match_dup 0) (match_dup 2))

This is a huge change.  Do you have some indication that it helps /
hurts / is neutral?  Some reasoning why it is a good idea?

I am not against it, but some more rationale would be good :-)

Btw, this splitter uses operands[2] and [3] in the replacement, and
neither of those exists.  The replacement never is used of course.
Instead, rs6000_emit_set_const is called always.  It would be less
misleading if the replacement text was just "(pc)" or such.


Segher

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] rs6000: Enable generate const through pli+pli+rldimi
  2022-08-10 16:43 ` Segher Boessenkool
@ 2022-08-11 12:52   ` Jiufu Guo
  2022-08-11 13:48     ` Segher Boessenkool
  0 siblings, 1 reply; 4+ messages in thread
From: Jiufu Guo @ 2022-08-11 12:52 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: gcc-patches, dje.gcc, linkw


Hi,

Segher Boessenkool <segher@kernel.crashing.org> writes:
> Hi!
>
> On Wed, Aug 10, 2022 at 03:11:23PM +0800, Jiufu Guo wrote:
>> As mentioned in PR106550, since pli could support 34bits immediate, we could
>> use less instructions(3insn would be ok) to build 64bits constant with pli.
>> 
>> For example, for constant 0x020805006106003, we could generate it with:
>> asm code1:
>> pli 9,101736451 (0x6106003)
>> sldi 9,9,32
>> paddi 9,9, 2130000 (0x0208050)
>> 
>> or asm code2:
>> pli 10, 2130000
>> pli 9, 101736451
>> rldimi 9, 10, 32, 0
>> 
>> If there is only one register can be used, then the asm code1 is ok. Otherwise
>> asm code2 may be better.
>
> It is significantly better yes.  That code with sldi is perhaps what we
> have to do after reload, but all those three insns are sequential,
> expensive.

>
>> This patch re-enable the constant building(splitter) before RA by updating the
>> constrains from int_reg_operand_not_pseudo to gpc_reg_operand.  And then, we
>> could use two different pseduo for two pli(s), and asm code2 can be generated.
>
>> This patch also could generate asm code1 if hard register is allocated for the
>> constant.
>
>> +  else if (TARGET_PREFIXED)
>> +    {
>> +      /* pli 9,high32 + pli 10,low32 + rldimi 9,10,32,0.  */
>> +      if (can_create_pseudo_p ())
>> +	{
>> +	  temp = gen_reg_rtx (DImode);
>> +	  rtx temp1 = gen_reg_rtx (DImode);
>> +	  emit_move_insn (copy_rtx (temp), GEN_INT ((ud4 << 16) | ud3));
>> +	  emit_move_insn (copy_rtx (temp1), GEN_INT ((ud2 << 16) | ud1));
>> +
>> +	  rtx one = gen_rtx_AND (DImode, temp1, GEN_INT (0xffffffff));
>
> Why do you meed to mask the value here?  That is a nop, no?
As you mentioned, this is not needed if using gen_rotldi3_insert_3.
>
>> +	  rtx two = gen_rtx_ASHIFT (DImode, temp, GEN_INT (32));
>> +	  emit_move_insn (dest, gen_rtx_IOR (DImode, one, two));
>
> But you can call gen_rotldi3_insert_3 explicitly, a better idea if this
> code can run late (so we cannot rely on other optimisations to clean
> things up).
Thanks! Using gen_rotldi3_insert_3 would indicate the rotate behavior
explicitly.

emit_insn (gen_rotldi3_insert_3 (dest, temp, GEN_INT (32), temp1,
                                GEN_INT (0xffffffff)));

>
>> --- a/gcc/config/rs6000/rs6000.md
>> +++ b/gcc/config/rs6000/rs6000.md
>> @@ -9659,7 +9659,7 @@ (define_split
>>  ;; When non-easy constants can go in the TOC, this should use
>>  ;; easy_fp_constant predicate.
>>  (define_split
>> -  [(set (match_operand:DI 0 "int_reg_operand_not_pseudo")
>> +  [(set (match_operand:DI 0 "gpc_reg_operand")
>>  	(match_operand:DI 1 "const_int_operand"))]
>>    "TARGET_POWERPC64 && num_insns_constant (operands[1], DImode) > 1"
>>    [(set (match_dup 0) (match_dup 2))
>
> This is a huge change.  Do you have some indication that it helps /
> hurts / is neutral?  Some reasoning why it is a good idea?
Thanks for this concern! I would do more check/test for this.

The 'int_reg_operand_not_pseudo' cause this splitter only work after RA.
Using 'int_reg_operand_not_pseudo', the code sequence "pli+sldi+paddi"
can be used.

>
> I am not against it, but some more rationale would be good :-)
>
> Btw, this splitter uses operands[2] and [3] in the replacement, and
> neither of those exists.  The replacement never is used of course.
> Instead, rs6000_emit_set_const is called always.  It would be less
> misleading if the replacement text was just "(pc)" or such.

Right, "(pc)" would be better to avoid misleading.

BR,
Jeff(Jiufu)
>
>
> Segher

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] rs6000: Enable generate const through pli+pli+rldimi
  2022-08-11 12:52   ` Jiufu Guo
@ 2022-08-11 13:48     ` Segher Boessenkool
  0 siblings, 0 replies; 4+ messages in thread
From: Segher Boessenkool @ 2022-08-11 13:48 UTC (permalink / raw)
  To: Jiufu Guo; +Cc: gcc-patches, dje.gcc, linkw

Hi!

On Thu, Aug 11, 2022 at 08:52:49PM +0800, Jiufu Guo wrote:
> Segher Boessenkool <segher@kernel.crashing.org> writes:
> > On Wed, Aug 10, 2022 at 03:11:23PM +0800, Jiufu Guo wrote:
> >> @@ -9659,7 +9659,7 @@ (define_split
> >>  ;; When non-easy constants can go in the TOC, this should use
> >>  ;; easy_fp_constant predicate.
> >>  (define_split
> >> -  [(set (match_operand:DI 0 "int_reg_operand_not_pseudo")
> >> +  [(set (match_operand:DI 0 "gpc_reg_operand")
> >>  	(match_operand:DI 1 "const_int_operand"))]
> >>    "TARGET_POWERPC64 && num_insns_constant (operands[1], DImode) > 1"
> >>    [(set (match_dup 0) (match_dup 2))
> >
> > This is a huge change.  Do you have some indication that it helps /
> > hurts / is neutral?  Some reasoning why it is a good idea?
> Thanks for this concern! I would do more check/test for this.
> 
> The 'int_reg_operand_not_pseudo' cause this splitter only work after RA.
> Using 'int_reg_operand_not_pseudo', the code sequence "pli+sldi+paddi"
> can be used.

Yes.  Splitting early and to allowing pseudos is all good -- but this
was disallowed for a reason, supposedly.

Changing this now and then changing it back in a few weeks or months
isn't progress :-/


Segher

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2022-08-11 13:49 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-10  7:11 [PATCH] rs6000: Enable generate const through pli+pli+rldimi Jiufu Guo
2022-08-10 16:43 ` Segher Boessenkool
2022-08-11 12:52   ` Jiufu Guo
2022-08-11 13:48     ` Segher Boessenkool

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).