public inbox for gcc-patches@gcc.gnu.org
From: Xionghu Luo <yinyuefengyi@gmail.com>
To: "Kewen.Lin" <linkw@linux.ibm.com>,
	Segher Boessenkool <segher@kernel.crashing.org>
Cc: Xionghu Luo <xionghuluo@tencent.com>,
	gcc-patches@gcc.gnu.org, David Edelsohn <dje.gcc@gmail.com>,
	Segher Boessenkool <segher@kernel.crashing.org>
Subject: Ping: [PATCH v4] rs6000: Fix incorrect RTL for Power LE when removing the UNSPECS [PR106069]
Date: Wed, 24 Aug 2022 09:24:31 +0800
Message-ID: <5418ebd2-d544-f4cc-d930-bdde64ad2807@gmail.com>
In-Reply-To: <37b57a54-f98e-96a3-edff-866c8aae4c7d@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 63 bytes --]

Hi Segher, I'd like to resend and ping for this patch. Thanks.

[-- Attachment #2: v4-0001-rs6000-Fix-incorrect-RTL-for-Power-LE-when-removi.patch --]
[-- Type: text/plain, Size: 25714 bytes --]

From 23bffdacdf0eb1140c7a3571e6158797f4818d57 Mon Sep 17 00:00:00 2001
From: Xionghu Luo <xionghuluo@tencent.com>
Date: Thu, 4 Aug 2022 03:44:58 +0000
Subject: [PATCH v4] rs6000: Fix incorrect RTL for Power LE when removing the
 UNSPECS [PR106069]

v4: Update per comments.
v3: Rename altivec_vmrghb_direct_le to altivec_vmrglb_direct_le to match
the actual output ASM vmrglb.  Likewise for all similar xxx_direct_le
patterns.
v2: Split the direct pattern into BE and LE variants with the same RTL
but different insns.

The native RTL expression for vec_mrghw should be the same for BE and LE,
since it is register- and endian-independent.  So both BE and LE need to
generate exactly the same RTL, with index [0 4 1 5], when expanding
vec_mrghw with vec_select and vec_concat.

(set (reg:V4SI 141) (vec_select:V4SI (vec_concat:V8SI
		   (subreg:V4SI (reg:V16QI 139) 0)
		   (subreg:V4SI (reg:V16QI 140) 0))
		   [const_int 0 4 1 5]))
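
As a rough illustration (not part of the patch), the semantics of this
vec_select over vec_concat can be modelled in plain C++.  The helpers
vec_concat, vec_select and merge_high_words below are hypothetical
stand-ins that mirror the RTL codes; they operate on std::array in RTL
element order, not on real vector registers:

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using V4SI = std::array<unsigned, 4>;
using V8SI = std::array<unsigned, 8>;

// Model of RTL vec_concat: the elements of A followed by those of B.
static V8SI vec_concat (const V4SI &a, const V4SI &b)
{
  V8SI r{};
  for (std::size_t i = 0; i < 4; ++i)
    {
      r[i] = a[i];
      r[i + 4] = b[i];
    }
  return r;
}

// Model of RTL vec_select: pick elements of V using the indices in SEL.
static V4SI vec_select (const V8SI &v, const std::array<int, 4> &sel)
{
  V4SI r{};
  for (std::size_t i = 0; i < 4; ++i)
    {
      assert (sel[i] >= 0 && sel[i] < 8);
      r[i] = v[sel[i]];
    }
  return r;
}

// The [0 4 1 5] selection from the RTL above: interleave the first two
// elements of A and B.
static V4SI merge_high_words (const V4SI &a, const V4SI &b)
{
  return vec_select (vec_concat (a, b), {0, 4, 1, 5});
}
```

With a = {10, 11, 12, 13} and b = {20, 21, 22, 23}, merge_high_words
yields {10, 20, 11, 21} in this model, whatever the target endianness.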

The combine pass can then perform the nested vec_select optimization
in simplify-rtx.c:simplify_binary_operation_1 on both BE and LE:

21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel [0 4 1 5])
24: {r151:SI=vec_select(r150:V4SI,parallel [const_int 3]);}

=>

21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel)
24: {r151:SI=vec_select(r146:V4SI,parallel [const_int 1]);}

The endianness check is then needed only once, at final ASM generation.
The generated ASM is also better, because the nested vec_select is
simplified to a simple scalar load.
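
The index arithmetic behind that nested vec_select folding can be
sketched as follows.  This is only a model under stated assumptions, not
the code in simplify-rtx: the Source struct and fold_nested_select
function are hypothetical names introduced for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Where a selected element really comes from.
struct Source
{
  int operand;	// 0 = first vec_concat operand, 1 = second
  int index;	// element index within that operand
};

// Fold vec_select (vec_select (vec_concat (a, b), SEL), [OUTER]) into a
// single-element select from a or b.  N is the element count of a and b.
template <std::size_t N>
static Source fold_nested_select (const std::array<int, N> &sel,
				  std::size_t outer)
{
  assert (outer < N);
  int idx = sel[outer];		// index into the concatenated vector
  int nelts = static_cast<int> (N);
  assert (idx >= 0 && idx < 2 * nelts);
  return Source{idx / nelts, idx % nelts};
}
```

For the selector [0 4 1 5] and outer index 3, the sketch returns operand
1, element 1, which corresponds to insn 24 above reading element 1 of
r146 directly.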

Regression tested on Power8{LE,BE}{32,64} and Power{9,10}LE{32,64}
Linux; all tests pass.

gcc/ChangeLog:

	PR target/106069
	* config/rs6000/altivec.md (altivec_vmrghb_direct): Remove.
	(altivec_vmrghb_direct_be): New pattern for BE.
	(altivec_vmrghb_direct_le): New pattern for LE.
	(altivec_vmrghh_direct): Remove.
	(altivec_vmrghh_direct_be): New pattern for BE.
	(altivec_vmrghh_direct_le): New pattern for LE.
	(altivec_vmrghw_direct_<mode>): Remove.
	(altivec_vmrghw_direct_<mode>_be): New pattern for BE.
	(altivec_vmrghw_direct_<mode>_le): New pattern for LE.
	(altivec_vmrglb_direct): Remove.
	(altivec_vmrglb_direct_be): New pattern for BE.
	(altivec_vmrglb_direct_le): New pattern for LE.
	(altivec_vmrglh_direct): Remove.
	(altivec_vmrglh_direct_be): New pattern for BE.
	(altivec_vmrglh_direct_le): New pattern for LE.
	(altivec_vmrglw_direct_<mode>): Remove.
	(altivec_vmrglw_direct_<mode>_be): New pattern for BE.
	(altivec_vmrglw_direct_<mode>_le): New pattern for LE.
	* config/rs6000/rs6000.cc (altivec_expand_vec_perm_const):
	Adjust.
	* config/rs6000/vsx.md: Likewise.

gcc/testsuite/ChangeLog:

	PR target/106069
	* g++.target/powerpc/pr106069.C: New test.

Signed-off-by: Xionghu Luo <xionghuluo@tencent.com>
---
 gcc/config/rs6000/altivec.md                | 222 ++++++++++++++------
 gcc/config/rs6000/rs6000.cc                 |  24 +--
 gcc/config/rs6000/vsx.md                    |  28 +--
 gcc/testsuite/g++.target/powerpc/pr106069.C | 118 +++++++++++
 4 files changed, 307 insertions(+), 85 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/powerpc/pr106069.C

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 2c4940f2e21..c6a381908cb 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -1144,15 +1144,16 @@ (define_expand "altivec_vmrghb"
    (use (match_operand:V16QI 2 "register_operand"))]
   "TARGET_ALTIVEC"
 {
-  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrghb_direct
-						: gen_altivec_vmrglb_direct;
-  if (!BYTES_BIG_ENDIAN)
-    std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (
+      gen_altivec_vmrghb_direct_be (operands[0], operands[1], operands[2]));
+  else
+    emit_insn (
+      gen_altivec_vmrglb_direct_le (operands[0], operands[2], operands[1]));
   DONE;
 })
 
-(define_insn "altivec_vmrghb_direct"
+(define_insn "altivec_vmrghb_direct_be"
   [(set (match_operand:V16QI 0 "register_operand" "=v")
 	(vec_select:V16QI
 	  (vec_concat:V32QI
@@ -1166,7 +1167,25 @@ (define_insn "altivec_vmrghb_direct"
 		     (const_int 5) (const_int 21)
 		     (const_int 6) (const_int 22)
 		     (const_int 7) (const_int 23)])))]
-  "TARGET_ALTIVEC"
+  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
+  "vmrghb %0,%1,%2"
+  [(set_attr "type" "vecperm")])
+
+(define_insn "altivec_vmrghb_direct_le"
+  [(set (match_operand:V16QI 0 "register_operand" "=v")
+	(vec_select:V16QI
+	  (vec_concat:V32QI
+	    (match_operand:V16QI 2 "register_operand" "v")
+	    (match_operand:V16QI 1 "register_operand" "v"))
+	  (parallel [(const_int  8) (const_int 24)
+		     (const_int  9) (const_int 25)
+		     (const_int 10) (const_int 26)
+		     (const_int 11) (const_int 27)
+		     (const_int 12) (const_int 28)
+		     (const_int 13) (const_int 29)
+		     (const_int 14) (const_int 30)
+		     (const_int 15) (const_int 31)])))]
+  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
   "vmrghb %0,%1,%2"
   [(set_attr "type" "vecperm")])
 
@@ -1176,17 +1195,18 @@ (define_expand "altivec_vmrghh"
    (use (match_operand:V8HI 2 "register_operand"))]
   "TARGET_ALTIVEC"
 {
-  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrghh_direct
-						: gen_altivec_vmrglh_direct;
-  if (!BYTES_BIG_ENDIAN)
-    std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (
+      gen_altivec_vmrghh_direct_be (operands[0], operands[1], operands[2]));
+  else
+    emit_insn (
+      gen_altivec_vmrglh_direct_le (operands[0], operands[2], operands[1]));
   DONE;
 })
 
-(define_insn "altivec_vmrghh_direct"
+(define_insn "altivec_vmrghh_direct_be"
   [(set (match_operand:V8HI 0 "register_operand" "=v")
-        (vec_select:V8HI
+	(vec_select:V8HI
 	  (vec_concat:V16HI
 	    (match_operand:V8HI 1 "register_operand" "v")
 	    (match_operand:V8HI 2 "register_operand" "v"))
@@ -1194,7 +1214,21 @@ (define_insn "altivec_vmrghh_direct"
 		     (const_int 1) (const_int 9)
 		     (const_int 2) (const_int 10)
 		     (const_int 3) (const_int 11)])))]
-  "TARGET_ALTIVEC"
+  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
+  "vmrghh %0,%1,%2"
+  [(set_attr "type" "vecperm")])
+
+(define_insn "altivec_vmrghh_direct_le"
+  [(set (match_operand:V8HI 0 "register_operand" "=v")
+        (vec_select:V8HI
+	  (vec_concat:V16HI
+	    (match_operand:V8HI 2 "register_operand" "v")
+	    (match_operand:V8HI 1 "register_operand" "v"))
+	  (parallel [(const_int 4) (const_int 12)
+		     (const_int 5) (const_int 13)
+		     (const_int 6) (const_int 14)
+		     (const_int 7) (const_int 15)])))]
+  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
   "vmrghh %0,%1,%2"
   [(set_attr "type" "vecperm")])
 
@@ -1204,16 +1238,18 @@ (define_expand "altivec_vmrghw"
    (use (match_operand:V4SI 2 "register_operand"))]
   "VECTOR_MEM_ALTIVEC_P (V4SImode)"
 {
-  rtx (*fun) (rtx, rtx, rtx);
-  fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrghw_direct_v4si
-			 : gen_altivec_vmrglw_direct_v4si;
-  if (!BYTES_BIG_ENDIAN)
-    std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0],
+						  operands[1],
+						  operands[2]));
+  else
+    emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0],
+						  operands[2],
+						  operands[1]));
   DONE;
 })
 
-(define_insn "altivec_vmrghw_direct_<mode>"
+(define_insn "altivec_vmrghw_direct_<mode>_be"
   [(set (match_operand:VSX_W 0 "register_operand" "=wa,v")
 	(vec_select:VSX_W
 	  (vec_concat:<VS_double>
@@ -1221,7 +1257,21 @@ (define_insn "altivec_vmrghw_direct_<mode>"
 	    (match_operand:VSX_W 2 "register_operand" "wa,v"))
 	  (parallel [(const_int 0) (const_int 4)
 		     (const_int 1) (const_int 5)])))]
-  "TARGET_ALTIVEC"
+  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
+  "@
+   xxmrghw %x0,%x1,%x2
+   vmrghw %0,%1,%2"
+  [(set_attr "type" "vecperm")])
+
+(define_insn "altivec_vmrghw_direct_<mode>_le"
+  [(set (match_operand:VSX_W 0 "register_operand" "=wa,v")
+	(vec_select:VSX_W
+	  (vec_concat:<VS_double>
+	    (match_operand:VSX_W 2 "register_operand" "wa,v")
+	    (match_operand:VSX_W 1 "register_operand" "wa,v"))
+	  (parallel [(const_int 2) (const_int 6)
+		     (const_int 3) (const_int 7)])))]
+  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
   "@
    xxmrghw %x0,%x1,%x2
    vmrghw %0,%1,%2"
@@ -1250,15 +1300,16 @@ (define_expand "altivec_vmrglb"
    (use (match_operand:V16QI 2 "register_operand"))]
   "TARGET_ALTIVEC"
 {
-  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrglb_direct
-						: gen_altivec_vmrghb_direct;
-  if (!BYTES_BIG_ENDIAN)
-    std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (
+      gen_altivec_vmrglb_direct_be (operands[0], operands[1], operands[2]));
+  else
+    emit_insn (
+      gen_altivec_vmrghb_direct_le (operands[0], operands[2], operands[1]));
   DONE;
 })
 
-(define_insn "altivec_vmrglb_direct"
+(define_insn "altivec_vmrglb_direct_be"
   [(set (match_operand:V16QI 0 "register_operand" "=v")
 	(vec_select:V16QI
 	  (vec_concat:V32QI
@@ -1272,7 +1323,25 @@ (define_insn "altivec_vmrglb_direct"
 		     (const_int 13) (const_int 29)
 		     (const_int 14) (const_int 30)
 		     (const_int 15) (const_int 31)])))]
-  "TARGET_ALTIVEC"
+  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
+  "vmrglb %0,%1,%2"
+  [(set_attr "type" "vecperm")])
+
+(define_insn "altivec_vmrglb_direct_le"
+  [(set (match_operand:V16QI 0 "register_operand" "=v")
+	(vec_select:V16QI
+	  (vec_concat:V32QI
+	    (match_operand:V16QI 2 "register_operand" "v")
+	    (match_operand:V16QI 1 "register_operand" "v"))
+	  (parallel [(const_int 0) (const_int 16)
+		     (const_int 1) (const_int 17)
+		     (const_int 2) (const_int 18)
+		     (const_int 3) (const_int 19)
+		     (const_int 4) (const_int 20)
+		     (const_int 5) (const_int 21)
+		     (const_int 6) (const_int 22)
+		     (const_int 7) (const_int 23)])))]
+  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
   "vmrglb %0,%1,%2"
   [(set_attr "type" "vecperm")])
 
@@ -1282,15 +1351,16 @@ (define_expand "altivec_vmrglh"
    (use (match_operand:V8HI 2 "register_operand"))]
   "TARGET_ALTIVEC"
 {
-  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrglh_direct
-						: gen_altivec_vmrghh_direct;
-  if (!BYTES_BIG_ENDIAN)
-    std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (
+      gen_altivec_vmrglh_direct_be (operands[0], operands[1], operands[2]));
+  else
+    emit_insn (
+      gen_altivec_vmrghh_direct_le (operands[0], operands[2], operands[1]));
   DONE;
 })
 
-(define_insn "altivec_vmrglh_direct"
+(define_insn "altivec_vmrglh_direct_be"
   [(set (match_operand:V8HI 0 "register_operand" "=v")
         (vec_select:V8HI
 	  (vec_concat:V16HI
@@ -1300,7 +1370,21 @@ (define_insn "altivec_vmrglh_direct"
 		     (const_int 5) (const_int 13)
 		     (const_int 6) (const_int 14)
 		     (const_int 7) (const_int 15)])))]
-  "TARGET_ALTIVEC"
+  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
+  "vmrglh %0,%1,%2"
+  [(set_attr "type" "vecperm")])
+
+(define_insn "altivec_vmrglh_direct_le"
+  [(set (match_operand:V8HI 0 "register_operand" "=v")
+	(vec_select:V8HI
+	  (vec_concat:V16HI
+	    (match_operand:V8HI 2 "register_operand" "v")
+	    (match_operand:V8HI 1 "register_operand" "v"))
+	  (parallel [(const_int 0) (const_int 8)
+		     (const_int 1) (const_int 9)
+		     (const_int 2) (const_int 10)
+		     (const_int 3) (const_int 11)])))]
+  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
   "vmrglh %0,%1,%2"
   [(set_attr "type" "vecperm")])
 
@@ -1310,16 +1394,18 @@ (define_expand "altivec_vmrglw"
    (use (match_operand:V4SI 2 "register_operand"))]
   "VECTOR_MEM_ALTIVEC_P (V4SImode)"
 {
-  rtx (*fun) (rtx, rtx, rtx);
-  fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrglw_direct_v4si
-			 : gen_altivec_vmrghw_direct_v4si;
-  if (!BYTES_BIG_ENDIAN)
-    std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0],
+						  operands[1],
+						  operands[2]));
+  else
+    emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0],
+						  operands[2],
+						  operands[1]));
   DONE;
 })
 
-(define_insn "altivec_vmrglw_direct_<mode>"
+(define_insn "altivec_vmrglw_direct_<mode>_be"
   [(set (match_operand:VSX_W 0 "register_operand" "=wa,v")
 	(vec_select:VSX_W
 	  (vec_concat:<VS_double>
@@ -1327,7 +1413,21 @@ (define_insn "altivec_vmrglw_direct_<mode>"
 	    (match_operand:VSX_W 2 "register_operand" "wa,v"))
 	  (parallel [(const_int 2) (const_int 6)
 		     (const_int 3) (const_int 7)])))]
-  "TARGET_ALTIVEC"
+  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
+  "@
+   xxmrglw %x0,%x1,%x2
+   vmrglw %0,%1,%2"
+  [(set_attr "type" "vecperm")])
+
+(define_insn "altivec_vmrglw_direct_<mode>_le"
+  [(set (match_operand:VSX_W 0 "register_operand" "=wa,v")
+	(vec_select:VSX_W
+	  (vec_concat:<VS_double>
+	    (match_operand:VSX_W 2 "register_operand" "wa,v")
+	    (match_operand:VSX_W 1 "register_operand" "wa,v"))
+	  (parallel [(const_int 0) (const_int 4)
+		     (const_int 1) (const_int 5)])))]
+  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
   "@
    xxmrglw %x0,%x1,%x2
    vmrglw %0,%1,%2"
@@ -3699,13 +3799,13 @@ (define_expand "vec_widen_umult_hi_v16qi"
     {
       emit_insn (gen_altivec_vmuleub (ve, operands[1], operands[2]));
       emit_insn (gen_altivec_vmuloub (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrghh_direct (operands[0], ve, vo));
+      emit_insn (gen_altivec_vmrghh_direct_be (operands[0], ve, vo));
     }
   else
     {
       emit_insn (gen_altivec_vmuloub (ve, operands[1], operands[2]));
       emit_insn (gen_altivec_vmuleub (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrghh_direct (operands[0], vo, ve));
+      emit_insn (gen_altivec_vmrghh_direct_le (operands[0], vo, ve));
     }
   DONE;
 })
@@ -3724,13 +3824,13 @@ (define_expand "vec_widen_umult_lo_v16qi"
     {
       emit_insn (gen_altivec_vmuleub (ve, operands[1], operands[2]));
       emit_insn (gen_altivec_vmuloub (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrglh_direct (operands[0], ve, vo));
+      emit_insn (gen_altivec_vmrglh_direct_be (operands[0], ve, vo));
     }
   else
     {
       emit_insn (gen_altivec_vmuloub (ve, operands[1], operands[2]));
       emit_insn (gen_altivec_vmuleub (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrglh_direct (operands[0], vo, ve));
+      emit_insn (gen_altivec_vmrglh_direct_le (operands[0], vo, ve));
     }
   DONE;
 })
@@ -3749,13 +3849,13 @@ (define_expand "vec_widen_smult_hi_v16qi"
     {
       emit_insn (gen_altivec_vmulesb (ve, operands[1], operands[2]));
       emit_insn (gen_altivec_vmulosb (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrghh_direct (operands[0], ve, vo));
+      emit_insn (gen_altivec_vmrghh_direct_be (operands[0], ve, vo));
     }
   else
     {
       emit_insn (gen_altivec_vmulosb (ve, operands[1], operands[2]));
       emit_insn (gen_altivec_vmulesb (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrghh_direct (operands[0], vo, ve));
+      emit_insn (gen_altivec_vmrghh_direct_le (operands[0], vo, ve));
     }
   DONE;
 })
@@ -3774,13 +3874,13 @@ (define_expand "vec_widen_smult_lo_v16qi"
     {
       emit_insn (gen_altivec_vmulesb (ve, operands[1], operands[2]));
       emit_insn (gen_altivec_vmulosb (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrglh_direct (operands[0], ve, vo));
+      emit_insn (gen_altivec_vmrglh_direct_be (operands[0], ve, vo));
     }
   else
     {
       emit_insn (gen_altivec_vmulosb (ve, operands[1], operands[2]));
       emit_insn (gen_altivec_vmulesb (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrglh_direct (operands[0], vo, ve));
+      emit_insn (gen_altivec_vmrglh_direct_le (operands[0], vo, ve));
     }
   DONE;
 })
@@ -3799,13 +3899,13 @@ (define_expand "vec_widen_umult_hi_v8hi"
     {
       emit_insn (gen_altivec_vmuleuh (ve, operands[1], operands[2]));
       emit_insn (gen_altivec_vmulouh (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], ve, vo));
+      emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0], ve, vo));
     }
   else
     {
       emit_insn (gen_altivec_vmulouh (ve, operands[1], operands[2]));
       emit_insn (gen_altivec_vmuleuh (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], vo, ve));
+      emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0], vo, ve));
     }
   DONE;
 })
@@ -3824,13 +3924,13 @@ (define_expand "vec_widen_umult_lo_v8hi"
     {
       emit_insn (gen_altivec_vmuleuh (ve, operands[1], operands[2]));
       emit_insn (gen_altivec_vmulouh (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], ve, vo));
+      emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0], ve, vo));
     }
   else
     {
       emit_insn (gen_altivec_vmulouh (ve, operands[1], operands[2]));
       emit_insn (gen_altivec_vmuleuh (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], vo, ve));
+      emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0], vo, ve));
     }
   DONE;
 })
@@ -3849,13 +3949,13 @@ (define_expand "vec_widen_smult_hi_v8hi"
     {
       emit_insn (gen_altivec_vmulesh (ve, operands[1], operands[2]));
       emit_insn (gen_altivec_vmulosh (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], ve, vo));
+      emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0], ve, vo));
     }
   else
     {
       emit_insn (gen_altivec_vmulosh (ve, operands[1], operands[2]));
       emit_insn (gen_altivec_vmulesh (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], vo, ve));
+      emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0], vo, ve));
     }
   DONE;
 })
@@ -3874,13 +3974,13 @@ (define_expand "vec_widen_smult_lo_v8hi"
     {
       emit_insn (gen_altivec_vmulesh (ve, operands[1], operands[2]));
       emit_insn (gen_altivec_vmulosh (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], ve, vo));
+      emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0], ve, vo));
     }
   else
     {
       emit_insn (gen_altivec_vmulosh (ve, operands[1], operands[2]));
       emit_insn (gen_altivec_vmulesh (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], vo, ve));
+      emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0], vo, ve));
     }
   DONE;
 })
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index df491bee2ea..c6ccd40e089 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -22942,28 +22942,28 @@ altivec_expand_vec_perm_const (rtx target, rtx op0, rtx op1,
      CODE_FOR_altivec_vpkuwum_direct,
      {2, 3, 6, 7, 10, 11, 14, 15, 18, 19, 22, 23, 26, 27, 30, 31}},
     {OPTION_MASK_ALTIVEC,
-     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghb_direct
-		      : CODE_FOR_altivec_vmrglb_direct,
+     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghb_direct_be
+		      : CODE_FOR_altivec_vmrglb_direct_le,
      {0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23}},
     {OPTION_MASK_ALTIVEC,
-     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghh_direct
-		      : CODE_FOR_altivec_vmrglh_direct,
+     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghh_direct_be
+		      : CODE_FOR_altivec_vmrglh_direct_le,
      {0, 1, 16, 17, 2, 3, 18, 19, 4, 5, 20, 21, 6, 7, 22, 23}},
     {OPTION_MASK_ALTIVEC,
-     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghw_direct_v4si
-		      : CODE_FOR_altivec_vmrglw_direct_v4si,
+     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghw_direct_v4si_be
+		      : CODE_FOR_altivec_vmrglw_direct_v4si_le,
      {0, 1, 2, 3, 16, 17, 18, 19, 4, 5, 6, 7, 20, 21, 22, 23}},
     {OPTION_MASK_ALTIVEC,
-     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglb_direct
-		      : CODE_FOR_altivec_vmrghb_direct,
+     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglb_direct_be
+		      : CODE_FOR_altivec_vmrghb_direct_le,
      {8, 24, 9, 25, 10, 26, 11, 27, 12, 28, 13, 29, 14, 30, 15, 31}},
     {OPTION_MASK_ALTIVEC,
-     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglh_direct
-		      : CODE_FOR_altivec_vmrghh_direct,
+     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglh_direct_be
+		      : CODE_FOR_altivec_vmrghh_direct_le,
      {8, 9, 24, 25, 10, 11, 26, 27, 12, 13, 28, 29, 14, 15, 30, 31}},
     {OPTION_MASK_ALTIVEC,
-     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglw_direct_v4si
-		      : CODE_FOR_altivec_vmrghw_direct_v4si,
+     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglw_direct_v4si_be
+		      : CODE_FOR_altivec_vmrghw_direct_v4si_le,
      {8, 9, 10, 11, 24, 25, 26, 27, 12, 13, 14, 15, 28, 29, 30, 31}},
     {OPTION_MASK_P8_VECTOR,
      BYTES_BIG_ENDIAN ? CODE_FOR_p8_vmrgew_v4sf_direct
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index e226a93bbe5..80f84e9b141 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -4688,12 +4688,14 @@ (define_expand "vsx_xxmrghw_<mode>"
 		     (const_int 1) (const_int 5)])))]
   "VECTOR_MEM_VSX_P (<MODE>mode)"
 {
-  rtx (*fun) (rtx, rtx, rtx);
-  fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrghw_direct_<mode>
-			 : gen_altivec_vmrglw_direct_<mode>;
-  if (!BYTES_BIG_ENDIAN)
-    std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0],
+						  operands[1],
+						  operands[2]));
+  else
+    emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0],
+						  operands[2],
+						  operands[1]));
   DONE;
 }
   [(set_attr "type" "vecperm")])
@@ -4708,12 +4710,14 @@ (define_expand "vsx_xxmrglw_<mode>"
 		     (const_int 3) (const_int 7)])))]
   "VECTOR_MEM_VSX_P (<MODE>mode)"
 {
-  rtx (*fun) (rtx, rtx, rtx);
-  fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrglw_direct_<mode>
-			 : gen_altivec_vmrghw_direct_<mode>;
-  if (!BYTES_BIG_ENDIAN)
-    std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0],
+						  operands[1],
+						  operands[2]));
+  else
+    emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0],
+						  operands[2],
+						  operands[1]));
   DONE;
 }
   [(set_attr "type" "vecperm")])
diff --git a/gcc/testsuite/g++.target/powerpc/pr106069.C b/gcc/testsuite/g++.target/powerpc/pr106069.C
new file mode 100644
index 00000000000..c89739ecb55
--- /dev/null
+++ b/gcc/testsuite/g++.target/powerpc/pr106069.C
@@ -0,0 +1,118 @@
+/* { dg-options "-O -fno-tree-forwprop -maltivec" } */
+/* { dg-require-effective-target vmx_hw } */
+/* { dg-do run } */
+
+typedef __attribute__ ((altivec (vector__))) unsigned native_simd_type;
+
+union
+{
+  native_simd_type V;
+  int R[4];
+} store_le_vec;
+
+struct S
+{
+  S () = default;
+  S (unsigned B0)
+  {
+    native_simd_type val{B0};
+    m_simd = val;
+  }
+  void store_le (unsigned int out[])
+  {
+    store_le_vec.V = m_simd;
+    unsigned int x0 = store_le_vec.R[0];
+    __builtin_memcpy (out, &x0, 4);
+  }
+  S rotl (unsigned int r)
+  {
+    native_simd_type rot{r};
+    return __builtin_vec_rl (m_simd, rot);
+  }
+  void operator+= (S other)
+  {
+    m_simd = __builtin_vec_add (m_simd, other.m_simd);
+  }
+  void operator^= (S other)
+  {
+    m_simd = __builtin_vec_xor (m_simd, other.m_simd);
+  }
+  static void transpose (S &B0, S B1, S B2, S B3)
+  {
+    native_simd_type T0 = __builtin_vec_mergeh (B0.m_simd, B2.m_simd);
+    native_simd_type T1 = __builtin_vec_mergeh (B1.m_simd, B3.m_simd);
+    native_simd_type T2 = __builtin_vec_mergel (B0.m_simd, B2.m_simd);
+    native_simd_type T3 = __builtin_vec_mergel (B1.m_simd, B3.m_simd);
+    B0 = __builtin_vec_mergeh (T0, T1);
+    B3 = __builtin_vec_mergel (T2, T3);
+  }
+  S (native_simd_type x) : m_simd (x) {}
+  native_simd_type m_simd;
+};
+
+void
+foo (unsigned int output[], unsigned state[])
+{
+  S R00 = state[0];
+  S R01 = state[0];
+  S R02 = state[2];
+  S R03 = state[0];
+  S R05 = state[5];
+  S R06 = state[6];
+  S R07 = state[7];
+  S R08 = state[8];
+  S R09 = state[9];
+  S R10 = state[10];
+  S R11 = state[11];
+  S R12 = state[12];
+  S R13 = state[13];
+  S R14 = state[4];
+  S R15 = state[15];
+  for (int r = 0; r != 10; ++r)
+    {
+      R09 += R13;
+      R11 += R15;
+      R05 ^= R09;
+      R06 ^= R10;
+      R07 ^= R11;
+      R07 = R07.rotl (7);
+      R00 += R05;
+      R01 += R06;
+      R02 += R07;
+      R15 ^= R00;
+      R12 ^= R01;
+      R13 ^= R02;
+      R00 += R05;
+      R01 += R06;
+      R02 += R07;
+      R15 ^= R00;
+      R12 = R12.rotl (8);
+      R13 = R13.rotl (8);
+      R10 += R15;
+      R11 += R12;
+      R08 += R13;
+      R09 += R14;
+      R05 ^= R10;
+      R06 ^= R11;
+      R07 ^= R08;
+      R05 = R05.rotl (7);
+      R06 = R06.rotl (7);
+      R07 = R07.rotl (7);
+    }
+  R00 += state[0];
+  S::transpose (R00, R01, R02, R03);
+  R00.store_le (output);
+}
+
+unsigned int res[1];
+unsigned main_state[]{1634760805, 60878,      2036477234, 6,
+		      0,	  825562964,  1471091955, 1346092787,
+		      506976774,  4197066702, 518848283,  118491664,
+		      0,	  0,	      0,	  0};
+int
+main ()
+{
+  foo (res, main_state);
+  if (res[0] != 0x41fcef98)
+    __builtin_abort ();
+}
-- 
2.27.0

