[PATCH] rs6000: Use rldimi for vec init instead of shift + ior

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* [PATCH] rs6000: Use rldimi for vec init instead of shift + ior
@ 2021-02-03  6:37 Kewen.Lin
  2021-02-18  1:50 ` PING " Kewen.Lin
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Kewen.Lin @ 2021-02-03  6:37 UTC (permalink / raw)
  To: GCC Patches; +Cc: Segher Boessenkool, Bill Schmidt, David Edelsohn

[-- Attachment #1: Type: text/plain, Size: 2026 bytes --]

Hi,

This patch merges the previously approved one[1] and its relied patch
made by Segher here[2], it's to make unsigned int vector init go with
rldimi to merge two integers instead of shift and ior.

Segher's patch in [2] is required to make the test case pass,
otherwise the costing for new pseudo-to-pseudo copies and the folding
with nonzero_bits in combine will make the rl*imi pattern become
compact and split into ior and shift unexpectedly.

The commit log of Segher's patch describes it in more details:

"An rl*imi is usually written as an IOR of an ASHIFT or similar, and an
AND of a register with a constant mask.  In some cases combine knows
that that AND doesn't do anything (because all zero bits in that mask
correspond to bits known to be already zero), and then no pattern
matches.  This patch adds a define_split for such cases.  It uses
nonzero_bits in the condition of the splitter, but does not need it
afterwards for the instruction to be recognised.  This is necessary
because later passes can see fewer nonzero_bits.

Because it is a splitter, combine will only use it when starting with
three insns (or more), even though the result is just one.  This isn't
a huge problem in practice, but some possible combinations still won't
happen."

Bootstrapped/regtested on powerpc64le-linux-gnu P9 and
powerpc64-linux-gnu P8, also SPEC2017 build/run passed on P9.

Is it ok for trunk?

BR,
Kewen

[1] https://gcc.gnu.org/pipermail/gcc-patches/2020-December/562407.html
[2] https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563526.html


gcc/ChangeLog:

2020-02-03  Segher Boessenkool  <segher@kernel.crashing.org>
	    Kewen Lin  <linkw@gcc.gnu.org>

	* config/rs6000/rs6000.md (*rotl<mode>3_insert_3): Renamed to...
	(rotl<mode>3_insert_3): ...this.
	(plus_ior_xor): New code_iterator.
	(define_split for GPR rl*imi): New splitter.
	* config/rs6000/vsx.md (vsx_init_v4si): Use gen_rotldi3_insert_3
	for integer merging.

gcc/testsuite/ChangeLog:

	* gcc.target/powerpc/vec-init-10.c: New test.

-----


[-- Attachment #2: vec_init.patch --]
[-- Type: text/plain, Size: 3872 bytes --]

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index bb9fb42f82a..dca311ebc80 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -4067,7 +4067,7 @@
   [(set_attr "type" "insert")])
 
 ; There are also some forms without one of the ANDs.
-(define_insn "*rotl<mode>3_insert_3"
+(define_insn "rotl<mode>3_insert_3"
   [(set (match_operand:GPR 0 "gpc_reg_operand" "=r")
 	(ior:GPR (and:GPR (match_operand:GPR 3 "gpc_reg_operand" "0")
 			  (match_operand:GPR 4 "const_int_operand" "n"))
@@ -4082,6 +4082,24 @@
 }
   [(set_attr "type" "insert")])
 
+(define_code_iterator plus_ior_xor [plus ior xor])
+
+(define_split
+  [(set (match_operand:GPR 0 "gpc_reg_operand")
+	(plus_ior_xor:GPR (ashift:GPR (match_operand:GPR 1 "gpc_reg_operand")
+				      (match_operand:SI 2 "const_int_operand"))
+			  (match_operand:GPR 3 "gpc_reg_operand")))]
+  "nonzero_bits (operands[3], <MODE>mode)
+   < HOST_WIDE_INT_1U << INTVAL (operands[2])"
+  [(set (match_dup 0)
+	(ior:GPR (and:GPR (match_dup 3)
+			  (match_dup 4))
+		 (ashift:GPR (match_dup 1)
+			     (match_dup 2))))]
+{
+  operands[4] = GEN_INT ((HOST_WIDE_INT_1U << INTVAL (operands[2])) - 1);
+})
+
 (define_insn "*rotl<mode>3_insert_4"
   [(set (match_operand:GPR 0 "gpc_reg_operand" "=r")
 	(ior:GPR (and:GPR (match_operand:GPR 3 "gpc_reg_operand" "0")
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 0c1bda522a9..07c2f7ffa6e 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -3008,28 +3008,22 @@
    (use (match_operand:SI 4 "gpc_reg_operand"))]
    "VECTOR_MEM_VSX_P (V4SImode) && TARGET_DIRECT_MOVE_64BIT"
 {
-  rtx a = gen_reg_rtx (DImode);
-  rtx b = gen_reg_rtx (DImode);
-  rtx c = gen_reg_rtx (DImode);
-  rtx d = gen_reg_rtx (DImode);
-  emit_insn (gen_zero_extendsidi2 (a, operands[1]));
-  emit_insn (gen_zero_extendsidi2 (b, operands[2]));
-  emit_insn (gen_zero_extendsidi2 (c, operands[3]));
-  emit_insn (gen_zero_extendsidi2 (d, operands[4]));
+  rtx a = gen_lowpart_SUBREG (DImode, operands[1]);
+  rtx b = gen_lowpart_SUBREG (DImode, operands[2]);
+  rtx c = gen_lowpart_SUBREG (DImode, operands[3]);
+  rtx d = gen_lowpart_SUBREG (DImode, operands[4]);
   if (!BYTES_BIG_ENDIAN)
     {
       std::swap (a, b);
       std::swap (c, d);
     }
 
-  rtx aa = gen_reg_rtx (DImode);
   rtx ab = gen_reg_rtx (DImode);
-  rtx cc = gen_reg_rtx (DImode);
   rtx cd = gen_reg_rtx (DImode);
-  emit_insn (gen_ashldi3 (aa, a, GEN_INT (32)));
-  emit_insn (gen_ashldi3 (cc, c, GEN_INT (32)));
-  emit_insn (gen_iordi3 (ab, aa, b));
-  emit_insn (gen_iordi3 (cd, cc, d));
+  emit_insn (gen_rotldi3_insert_3 (ab, a, GEN_INT (32), b,
+				   GEN_INT (0xffffffff)));
+  emit_insn (gen_rotldi3_insert_3 (cd, c, GEN_INT (32), d,
+				   GEN_INT (0xffffffff)));
 
   rtx abcd = gen_reg_rtx (V2DImode);
   emit_insn (gen_vsx_concat_v2di (abcd, ab, cd));
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-init-10.c b/gcc/testsuite/gcc.target/powerpc/vec-init-10.c
new file mode 100644
index 00000000000..680538e67f3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-init-10.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=power8" } */
+
+/* Check that we can optimize sldi + or to rldimi for vector int init.  */
+
+vector unsigned int
+testu (unsigned int i1, unsigned int i2, unsigned int i3, unsigned int i4)
+{
+  vector unsigned int v = {i1, i2, i3, i4};
+  return v;
+}
+
+vector signed int
+tests (signed int i1, signed int i2, signed int i3, signed int i4)
+{
+  vector signed int v = {i1, i2, i3, i4};
+  return v;
+}
+
+/* { dg-final { scan-assembler-not "sldi" } } */
+/* { dg-final { scan-assembler-not "or" } } */
+/* { dg-final { scan-assembler-times {\mrldimi\M} 4 } } */

^ permalink raw reply	[flat|nested] 11+ messages in thread

* PING [PATCH] rs6000: Use rldimi for vec init instead of shift + ior
  2021-02-03  6:37 [PATCH] rs6000: Use rldimi for vec init instead of shift + ior Kewen.Lin
@ 2021-02-18  1:50 ` Kewen.Lin
  2021-02-18 17:58 ` will schmidt
  2021-02-18 18:33 ` Segher Boessenkool
  2 siblings, 0 replies; 11+ messages in thread
From: Kewen.Lin @ 2021-02-18  1:50 UTC (permalink / raw)
  To: GCC Patches; +Cc: Bill Schmidt, David Edelsohn, Segher Boessenkool

Hi,

I'd like to gentle ping this:

https://gcc.gnu.org/pipermail/gcc-patches/2021-February/564758.html

BR,
Kewen

on 2021/2/3 下午2:37, Kewen.Lin via Gcc-patches wrote:
> Hi,
> 
> This patch merges the previously approved one[1] and its relied patch
> made by Segher here[2], it's to make unsigned int vector init go with
> rldimi to merge two integers instead of shift and ior.
> 
> Segher's patch in [2] is required to make the test case pass,
> otherwise the costing for new pseudo-to-pseudo copies and the folding
> with nonzero_bits in combine will make the rl*imi pattern become
> compact and split into ior and shift unexpectedly.
> 
> The commit log of Segher's patch describes it in more details:
> 
> "An rl*imi is usually written as an IOR of an ASHIFT or similar, and an
> AND of a register with a constant mask.  In some cases combine knows
> that that AND doesn't do anything (because all zero bits in that mask
> correspond to bits known to be already zero), and then no pattern
> matches.  This patch adds a define_split for such cases.  It uses
> nonzero_bits in the condition of the splitter, but does not need it
> afterwards for the instruction to be recognised.  This is necessary
> because later passes can see fewer nonzero_bits.
> 
> Because it is a splitter, combine will only use it when starting with
> three insns (or more), even though the result is just one.  This isn't
> a huge problem in practice, but some possible combinations still won't
> happen."
> 
> Bootstrapped/regtested on powerpc64le-linux-gnu P9 and
> powerpc64-linux-gnu P8, also SPEC2017 build/run passed on P9.
> 
> Is it ok for trunk?
> 
> BR,
> Kewen
> 
> [1] https://gcc.gnu.org/pipermail/gcc-patches/2020-December/562407.html
> [2] https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563526.html
> 
> 
> gcc/ChangeLog:
> 
> 2020-02-03  Segher Boessenkool  <segher@kernel.crashing.org>
> 	    Kewen Lin  <linkw@gcc.gnu.org>
> 
> 	* config/rs6000/rs6000.md (*rotl<mode>3_insert_3): Renamed to...
> 	(rotl<mode>3_insert_3): ...this.
> 	(plus_ior_xor): New code_iterator.
> 	(define_split for GPR rl*imi): New splitter.
> 	* config/rs6000/vsx.md (vsx_init_v4si): Use gen_rotldi3_insert_3
> 	for integer merging.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.target/powerpc/vec-init-10.c: New test.
> 
> -----
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] rs6000: Use rldimi for vec init instead of shift + ior
  2021-02-03  6:37 [PATCH] rs6000: Use rldimi for vec init instead of shift + ior Kewen.Lin
  2021-02-18  1:50 ` PING " Kewen.Lin
@ 2021-02-18 17:58 ` will schmidt
  2021-02-18 18:28   ` Segher Boessenkool
  2021-02-18 18:33 ` Segher Boessenkool
  2 siblings, 1 reply; 11+ messages in thread
From: will schmidt @ 2021-02-18 17:58 UTC (permalink / raw)
  To: Kewen.Lin, GCC Patches; +Cc: Bill Schmidt, David Edelsohn, Segher Boessenkool

On Wed, 2021-02-03 at 14:37 +0800, Kewen.Lin via Gcc-patches wrote:
> Hi,
> 

Hi,


> This patch merges the previously approved one[1] and its relied patch


I don't see the review for [1] in the archives.  


> made by Segher here[2], it's to make unsigned int vector init go with
> rldimi to merge two integers instead of shift and ior.
> 
> Segher's patch in [2] is required to make the test case pass,
> otherwise the costing for new pseudo-to-pseudo copies and the folding
> with nonzero_bits in combine will make the rl*imi pattern become
> compact and split into ior and shift unexpectedly.
> 
> The commit log of Segher's patch describes it in more details:
> 
> "An rl*imi is usually written as an IOR of an ASHIFT or similar, and an
> AND of a register with a constant mask.  In some cases combine knows
> that that AND doesn't do anything (because all zero bits in that mask
> correspond to bits known to be already zero), and then no pattern
> matches.  This patch adds a define_split for such cases.  It uses
> nonzero_bits in the condition of the splitter, but does not need it
> afterwards for the instruction to be recognised.  This is necessary
> because later passes can see fewer nonzero_bits.
> 
> Because it is a splitter, combine will only use it when starting with
> three insns (or more), even though the result is just one.  This isn't
> a huge problem in practice, but some possible combinations still won't
> happen."
> 
> Bootstrapped/regtested on powerpc64le-linux-gnu P9 and
> powerpc64-linux-gnu P8, also SPEC2017 build/run passed on P9.
> 
> Is it ok for trunk?
> 
> BR,
> Kewen
> 
> [1] https://gcc.gnu.org/pipermail/gcc-patches/2020-December/562407.html
> [2] https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563526.html
> 
> 
> gcc/ChangeLog:
> 
> 2020-02-03  Segher Boessenkool  <segher@kernel.crashing.org>
> 	    Kewen Lin  <linkw@gcc.gnu.org>
> 
> 	* config/rs6000/rs6000.md (*rotl<mode>3_insert_3): Renamed to...
> 	(rotl<mode>3_insert_3): ...this.

ok

> 	(plus_ior_xor): New code_iterator.
> 	(define_split for GPR rl*imi): New splitter.

As described above, these two changes appear to identical to what was
posted by Segher in [1].

(
> 	* config/rs6000/vsx.md (vsx_init_v4si): Use gen_rotldi3_insert_3
> 	for integer merging.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.target/powerpc/vec-init-10.c: New test.

+/* { dg-final { scan-assembler-not "or" } } */

As is, it looks OK to me.  Per other reviews i've gotten, you may
get a request to wrap the "or" with \m \M .
Some existing cases with
scan-assembler-not have a leading whitespace/tab qualifier too. 
i.e.
/* { dg-final { scan-assembler-not "\[ \t\]or "      } } */

Thanks,
-Will

> 
> -----
> 


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] rs6000: Use rldimi for vec init instead of shift + ior
  2021-02-18 17:58 ` will schmidt
@ 2021-02-18 18:28   ` Segher Boessenkool
  0 siblings, 0 replies; 11+ messages in thread
From: Segher Boessenkool @ 2021-02-18 18:28 UTC (permalink / raw)
  To: will schmidt; +Cc: Kewen.Lin, GCC Patches, Bill Schmidt, David Edelsohn

On Thu, Feb 18, 2021 at 11:58:59AM -0600, will schmidt wrote:
> On Wed, 2021-02-03 at 14:37 +0800, Kewen.Lin via Gcc-patches wrote:
> > This patch merges the previously approved one[1] and its relied patch
> 
> I don't see the review for [1] in the archives.  

I said:
  Can you make a different testcase perhaps?  This patch looks fine
  otherwise.

> > made by Segher here[2], it's to make unsigned int vector init go with
> > rldimi to merge two integers instead of shift and ior.
> > 
> > Segher's patch in [2] is required to make the test case pass,
> > otherwise the costing for new pseudo-to-pseudo copies and the folding
> > with nonzero_bits in combine will make the rl*imi pattern become
> > compact and split into ior and shift unexpectedly.
> > 
> > The commit log of Segher's patch describes it in more details:
> > 
> > "An rl*imi is usually written as an IOR of an ASHIFT or similar, and an
> > AND of a register with a constant mask.  In some cases combine knows
> > that that AND doesn't do anything (because all zero bits in that mask
> > correspond to bits known to be already zero), and then no pattern
> > matches.  This patch adds a define_split for such cases.  It uses
> > nonzero_bits in the condition of the splitter, but does not need it
> > afterwards for the instruction to be recognised.  This is necessary
> > because later passes can see fewer nonzero_bits.
> > 
> > Because it is a splitter, combine will only use it when starting with
> > three insns (or more), even though the result is just one.  This isn't
> > a huge problem in practice, but some possible combinations still won't
> > happen."
> > 
> > Bootstrapped/regtested on powerpc64le-linux-gnu P9 and
> > powerpc64-linux-gnu P8, also SPEC2017 build/run passed on P9.
> > 
> > Is it ok for trunk?
> > 
> > BR,
> > Kewen
> > 
> > [1] https://gcc.gnu.org/pipermail/gcc-patches/2020-December/562407.html
> > [2] https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563526.html
> > 
> > 
> > gcc/ChangeLog:
> > 
> > 2020-02-03  Segher Boessenkool  <segher@kernel.crashing.org>
> > 	    Kewen Lin  <linkw@gcc.gnu.org>
> > 
> > 	* config/rs6000/rs6000.md (*rotl<mode>3_insert_3): Renamed to...
> > 	(rotl<mode>3_insert_3): ...this.
> 
> ok
> 
> > 	(plus_ior_xor): New code_iterator.
> > 	(define_split for GPR rl*imi): New splitter.
> 
> As described above, these two changes appear to identical to what was
> posted by Segher in [1].

But with testcase :-)

> +/* { dg-final { scan-assembler-not "or" } } */
> 
> As is, it looks OK to me.  Per other reviews i've gotten, you may
> get a request to wrap the "or" with \m \M .

Right, because it is very easy to end up with "or" as a substring of
something else.

> Some existing cases with
> scan-assembler-not have a leading whitespace/tab qualifier too. 
> i.e.
> /* { dg-final { scan-assembler-not "\[ \t\]or "      } } */

Yeah, but the \m in {\mor\M} will do the same already.

Thanks,


Segher

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] rs6000: Use rldimi for vec init instead of shift + ior
  2021-02-03  6:37 [PATCH] rs6000: Use rldimi for vec init instead of shift + ior Kewen.Lin
  2021-02-18  1:50 ` PING " Kewen.Lin
  2021-02-18 17:58 ` will schmidt
@ 2021-02-18 18:33 ` Segher Boessenkool
  2021-02-19  3:06   ` Kewen.Lin
  2 siblings, 1 reply; 11+ messages in thread
From: Segher Boessenkool @ 2021-02-18 18:33 UTC (permalink / raw)
  To: Kewen.Lin; +Cc: GCC Patches, Bill Schmidt, David Edelsohn

Hi!

On Wed, Feb 03, 2021 at 02:37:05PM +0800, Kewen.Lin wrote:
> This patch merges the previously approved one[1] and its relied patch
> made by Segher here[2], it's to make unsigned int vector init go with
> rldimi to merge two integers instead of shift and ior.

> gcc/ChangeLog:
> 
> 2020-02-03  Segher Boessenkool  <segher@kernel.crashing.org>
> 	    Kewen Lin  <linkw@gcc.gnu.org>
> 
> 	* config/rs6000/rs6000.md (*rotl<mode>3_insert_3): Renamed to...
> 	(rotl<mode>3_insert_3): ...this.
> 	(plus_ior_xor): New code_iterator.
> 	(define_split for GPR rl*imi): New splitter.
> 	* config/rs6000/vsx.md (vsx_init_v4si): Use gen_rotldi3_insert_3
> 	for integer merging.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.target/powerpc/vec-init-10.c: New test.

Is there a PR you should mention here?

> +/* { dg-final { scan-assembler-not "sldi" } } */
> +/* { dg-final { scan-assembler-not "or" } } */
> +/* { dg-final { scan-assembler-times {\mrldimi\M} 4 } } */

/* { dg-final { scan-assembler-not {\msldi\M} } } */
/* { dg-final { scan-assembler-not {\mor\M} } } */
/* { dg-final { scan-assembler-times {\mrldimi\M} 4 } } */

Okay for trunk with that tweak.  Thanks!


Segher

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] rs6000: Use rldimi for vec init instead of shift + ior
  2021-02-18 18:33 ` Segher Boessenkool
@ 2021-02-19  3:06   ` Kewen.Lin
  2021-02-19 21:03     ` Segher Boessenkool
  0 siblings, 1 reply; 11+ messages in thread
From: Kewen.Lin @ 2021-02-19  3:06 UTC (permalink / raw)
  To: Segher Boessenkool, will schmidt
  Cc: GCC Patches, Bill Schmidt, David Edelsohn

Hi Segher & Will,

Thanks for your review comments!

on 2021/2/19 上午2:33, Segher Boessenkool wrote:
> Hi!
> 
> On Wed, Feb 03, 2021 at 02:37:05PM +0800, Kewen.Lin wrote:
>> This patch merges the previously approved one[1] and its relied patch
>> made by Segher here[2], it's to make unsigned int vector init go with
>> rldimi to merge two integers instead of shift and ior.
> 
>> gcc/ChangeLog:
>>
>> 2020-02-03  Segher Boessenkool  <segher@kernel.crashing.org>
>> 	    Kewen Lin  <linkw@gcc.gnu.org>
>>
>> 	* config/rs6000/rs6000.md (*rotl<mode>3_insert_3): Renamed to...
>> 	(rotl<mode>3_insert_3): ...this.
>> 	(plus_ior_xor): New code_iterator.
>> 	(define_split for GPR rl*imi): New splitter.
>> 	* config/rs6000/vsx.md (vsx_init_v4si): Use gen_rotldi3_insert_3
>> 	for integer merging.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 	* gcc.target/powerpc/vec-init-10.c: New test.
> 
> Is there a PR you should mention here?

I thought this is trivial so didn't file one upstream PR.  I did a
searching in upstream bugzilla, PR93453 looks similar but different.
I confirmed that the current patch can't make it pass (just two insns
instead of three insns).

Do you happen to have one related in mind?  If no, I will commit it
without PR.

> 
>> +/* { dg-final { scan-assembler-not "sldi" } } */
>> +/* { dg-final { scan-assembler-not "or" } } */
>> +/* { dg-final { scan-assembler-times {\mrldimi\M} 4 } } */
> 
> /* { dg-final { scan-assembler-not {\msldi\M} } } */
> /* { dg-final { scan-assembler-not {\mor\M} } } */
> /* { dg-final { scan-assembler-times {\mrldimi\M} 4 } } */
> 
> Okay for trunk with that tweak.  Thanks!
> 

Will fix it, thanks!


BR,
Kewen

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] rs6000: Use rldimi for vec init instead of shift + ior
  2021-02-19  3:06   ` Kewen.Lin
@ 2021-02-19 21:03     ` Segher Boessenkool
  0 siblings, 0 replies; 11+ messages in thread
From: Segher Boessenkool @ 2021-02-19 21:03 UTC (permalink / raw)
  To: Kewen.Lin; +Cc: will schmidt, GCC Patches, Bill Schmidt, David Edelsohn

Hi!

On Fri, Feb 19, 2021 at 11:06:16AM +0800, Kewen.Lin wrote:
> on 2021/2/19 上午2:33, Segher Boessenkool wrote:
> > Is there a PR you should mention here?
> 
> I thought this is trivial so didn't file one upstream PR.  I did a
> searching in upstream bugzilla, PR93453 looks similar but different.
> I confirmed that the current patch can't make it pass (just two insns
> instead of three insns).
> 
> Do you happen to have one related in mind?  If no, I will commit it
> without PR.

That is fine.  I just asked if perhaps you forgot to mention a PR; I do
that all the time myself!  Filing a new PR if a patch is ready and
approved is just bureaucracy, so let's not (it is useful if it needs
backporting though, to have a place to track that).

Thanks,


Segher

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH] rs6000: Use rldimi for vec init instead of shift + ior
@ 2020-12-22  8:08 Kewen.Lin
  2021-01-15  0:50 ` Segher Boessenkool
  0 siblings, 1 reply; 11+ messages in thread
From: Kewen.Lin @ 2020-12-22  8:08 UTC (permalink / raw)
  To: GCC Patches; +Cc: Segher Boessenkool, Bill Schmidt

[-- Attachment #1: Type: text/plain, Size: 902 bytes --]

Hi,

This patch is to make unsigned int vector init go with
rldimi to merge two integers instead of shift and ior.

I tried to use nonzero_bits in md file to make it more
general, but the testing shows it isn't doable.  The
reason is that some passes would replace some pseudos
with other pseudos and do the recog again, but at that
time the nonzero_bits could get rough information and
lead the recog fails unexpectedly.

btw, the test case would reply on the combine patch[1].

Bootstrapped/regtested on powerpc64le-linux-gnu P9.

BR,
Kewen

[1] https://gcc.gnu.org/pipermail/gcc-patches/2020-December/561413.html

gcc/ChangeLog:

	* config/rs6000/rs6000.md (*rotl<mode>3_insert_3): Renamed to...
	(rotl<mode>3_insert_3): ...this.
	* config/rs6000/vsx.md (vsx_init_v4si): Use gen_rotldi3_insert_3
	for integer merging.

gcc/testsuite/ChangeLog:

	* gcc.target/powerpc/vec-init-10.c: New test.

-----

[-- Attachment #2: vec_init.diff --]
[-- Type: text/plain, Size: 3053 bytes --]

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 0e799198a50..3529b79d35d 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -4067,7 +4067,7 @@
   [(set_attr "type" "insert")])
 
 ; There are also some forms without one of the ANDs.
-(define_insn "*rotl<mode>3_insert_3"
+(define_insn "rotl<mode>3_insert_3"
   [(set (match_operand:GPR 0 "gpc_reg_operand" "=r")
 	(ior:GPR (and:GPR (match_operand:GPR 3 "gpc_reg_operand" "0")
 			  (match_operand:GPR 4 "const_int_operand" "n"))
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 947631d83ee..37105a5aabf 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -3008,28 +3008,22 @@
    (use (match_operand:SI 4 "gpc_reg_operand"))]
    "VECTOR_MEM_VSX_P (V4SImode) && TARGET_DIRECT_MOVE_64BIT"
 {
-  rtx a = gen_reg_rtx (DImode);
-  rtx b = gen_reg_rtx (DImode);
-  rtx c = gen_reg_rtx (DImode);
-  rtx d = gen_reg_rtx (DImode);
-  emit_insn (gen_zero_extendsidi2 (a, operands[1]));
-  emit_insn (gen_zero_extendsidi2 (b, operands[2]));
-  emit_insn (gen_zero_extendsidi2 (c, operands[3]));
-  emit_insn (gen_zero_extendsidi2 (d, operands[4]));
+  rtx a = gen_lowpart_SUBREG (DImode, operands[1]);
+  rtx b = gen_lowpart_SUBREG (DImode, operands[2]);
+  rtx c = gen_lowpart_SUBREG (DImode, operands[3]);
+  rtx d = gen_lowpart_SUBREG (DImode, operands[4]);
   if (!BYTES_BIG_ENDIAN)
     {
       std::swap (a, b);
       std::swap (c, d);
     }
 
-  rtx aa = gen_reg_rtx (DImode);
   rtx ab = gen_reg_rtx (DImode);
-  rtx cc = gen_reg_rtx (DImode);
   rtx cd = gen_reg_rtx (DImode);
-  emit_insn (gen_ashldi3 (aa, a, GEN_INT (32)));
-  emit_insn (gen_ashldi3 (cc, c, GEN_INT (32)));
-  emit_insn (gen_iordi3 (ab, aa, b));
-  emit_insn (gen_iordi3 (cd, cc, d));
+  emit_insn (gen_rotldi3_insert_3 (ab, a, GEN_INT (32), b,
+				   GEN_INT (0xffffffff)));
+  emit_insn (gen_rotldi3_insert_3 (cd, c, GEN_INT (32), d,
+				   GEN_INT (0xffffffff)));
 
   rtx abcd = gen_reg_rtx (V2DImode);
   emit_insn (gen_vsx_concat_v2di (abcd, ab, cd));
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-init-10.c b/gcc/testsuite/gcc.target/powerpc/vec-init-10.c
new file mode 100644
index 00000000000..680538e67f3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-init-10.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=power8" } */
+
+/* Check that we can optimize sldi + or to rldimi for vector int init.  */
+
+vector unsigned int
+testu (unsigned int i1, unsigned int i2, unsigned int i3, unsigned int i4)
+{
+  vector unsigned int v = {i1, i2, i3, i4};
+  return v;
+}
+
+vector signed int
+tests (signed int i1, signed int i2, signed int i3, signed int i4)
+{
+  vector signed int v = {i1, i2, i3, i4};
+  return v;
+}
+
+/* { dg-final { scan-assembler-not "sldi" } } */
+/* { dg-final { scan-assembler-not "or" } } */
+/* { dg-final { scan-assembler-times {\mrldimi\M} 4 } } */

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] rs6000: Use rldimi for vec init instead of shift + ior
  2020-12-22  8:08 Kewen.Lin
@ 2021-01-15  0:50 ` Segher Boessenkool
  2021-01-15  6:40   ` Kewen.Lin
  0 siblings, 1 reply; 11+ messages in thread
From: Segher Boessenkool @ 2021-01-15  0:50 UTC (permalink / raw)
  To: Kewen.Lin; +Cc: GCC Patches, Bill Schmidt

Hi!

On Tue, Dec 22, 2020 at 04:08:26PM +0800, Kewen.Lin wrote:
> This patch is to make unsigned int vector init go with
> rldimi to merge two integers instead of shift and ior.
> 
> I tried to use nonzero_bits in md file to make it more
> general, but the testing shows it isn't doable.  The
> reason is that some passes would replace some pseudos
> with other pseudos and do the recog again, but at that
> time the nonzero_bits could get rough information and
> lead the recog fails unexpectedly.

Aha.  So nonzero_bits is unusable for most purposes as well.  Great :-/

> btw, the test case would reply on the combine patch[1].

Can you make a different testcase perhaps?  This patch looks fine
otherwise.

(Ideally the compiler would just form those insert insns itself, of
course.  Why doesn't that work?  This is important for quite a lot of
code, we really can advantageously use the rl*imi insns often!)


Segher

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] rs6000: Use rldimi for vec init instead of shift + ior
  2021-01-15  0:50 ` Segher Boessenkool
@ 2021-01-15  6:40   ` Kewen.Lin
  2021-01-19  7:14     ` Kewen.Lin
  0 siblings, 1 reply; 11+ messages in thread
From: Kewen.Lin @ 2021-01-15  6:40 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: GCC Patches, Bill Schmidt

Hi Segher,

on 2021/1/15 上午8:50, Segher Boessenkool wrote:
> Hi!
> 
> On Tue, Dec 22, 2020 at 04:08:26PM +0800, Kewen.Lin wrote:
>> This patch is to make unsigned int vector init go with
>> rldimi to merge two integers instead of shift and ior.
>>
>> I tried to use nonzero_bits in md file to make it more
>> general, but the testing shows it isn't doable.  The
>> reason is that some passes would replace some pseudos
>> with other pseudos and do the recog again, but at that
>> time the nonzero_bits could get rough information and
>> lead the recog fails unexpectedly.
> 
> Aha.  So nonzero_bits is unusable for most purposes as well.  Great :-/
> 
>> btw, the test case would reply on the combine patch[1].
> 
> Can you make a different testcase perhaps?  This patch looks fine
> otherwise.
> 

Thanks!  But I'm sorry that there is a typo, it should be "rely on"
rather than "reply on", without the patch[1] "combine: zeroing cost
for new copies", this test case doesn't get the desirable result.
I'll explain it more in that patch's thread.  If your uncommitted
patch there with define_split and nonzero_bits works, this patch
can be discarded totally.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2020-December/561413.html

> (Ideally the compiler would just form those insert insns itself, of
> course.  Why doesn't that work?  This is important for quite a lot of
> code, we really can advantageously use the rl*imi insns often!)
> 
The reason why it doesn't work is that the *rotl<mode>3_insert_3
pattern requires one explicit AND (with some masks) there but in
this case the AND is optimized away due to that nonzero_bits
indicates the AND is redundant.

BR,
Kewen

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] rs6000: Use rldimi for vec init instead of shift + ior
  2021-01-15  6:40   ` Kewen.Lin
@ 2021-01-19  7:14     ` Kewen.Lin
  0 siblings, 0 replies; 11+ messages in thread
From: Kewen.Lin @ 2021-01-19  7:14 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: Bill Schmidt, GCC Patches

on 2021/1/15 下午2:40, Kewen.Lin via Gcc-patches wrote:
> Hi Segher,
> 
> on 2021/1/15 上午8:50, Segher Boessenkool wrote:
>> Hi!
>>
>> On Tue, Dec 22, 2020 at 04:08:26PM +0800, Kewen.Lin wrote:
>>> This patch is to make unsigned int vector init go with
>>> rldimi to merge two integers instead of shift and ior.
>>>
>>> I tried to use nonzero_bits in md file to make it more
>>> general, but the testing shows it isn't doable.  The
>>> reason is that some passes would replace some pseudos
>>> with other pseudos and do the recog again, but at that
>>> time the nonzero_bits could get rough information and
>>> lead the recog fails unexpectedly.
>>
>> Aha.  So nonzero_bits is unusable for most purposes as well.  Great :-/
>>
>>> btw, the test case would reply on the combine patch[1].
>>
>> Can you make a different testcase perhaps?  This patch looks fine
>> otherwise.
>>
> 
> Thanks!  But I'm sorry that there is a typo, it should be "rely on"
> rather than "reply on", without the patch[1] "combine: zeroing cost
> for new copies", this test case doesn't get the desirable result.
> I'll explain it more in that patch's thread.  If your uncommitted
> patch there with define_split and nonzero_bits works, this patch
> can be discarded totally.

After the testing and the commit log shows, I realized that that
patch can only work for three instruction combination, so this patch
is still needed.


BR,
Kewen

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2021-02-19 21:04 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-02-03  6:37 [PATCH] rs6000: Use rldimi for vec init instead of shift + ior Kewen.Lin
2021-02-18  1:50 ` PING " Kewen.Lin
2021-02-18 17:58 ` will schmidt
2021-02-18 18:28   ` Segher Boessenkool
2021-02-18 18:33 ` Segher Boessenkool
2021-02-19  3:06   ` Kewen.Lin
2021-02-19 21:03     ` Segher Boessenkool
  -- strict thread matches above, loose matches on Subject: below --
2020-12-22  8:08 Kewen.Lin
2021-01-15  0:50 ` Segher Boessenkool
2021-01-15  6:40   ` Kewen.Lin
2021-01-19  7:14     ` Kewen.Lin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).