[PATCH v4] rs6000: Fix incorrect RTL for Power LE when removing the UNSPECS [PR106069]

public inbox for gcc-patches@gcc.gnu.org

* [PATCH v4] rs6000: Fix incorrect RTL for Power LE when removing the UNSPECS [PR106069]
@ 2023-02-10  2:59 Xionghu Luo
  2023-02-28  6:43 ` Ping: " Xionghu Luo
  2023-03-30 19:30 ` Segher Boessenkool
  0 siblings, 2 replies; 5+ messages in thread
From: Xionghu Luo @ 2023-02-10  2:59 UTC (permalink / raw)
  To: gcc-patches; +Cc: segher, dje.gcc, wschmidt, guojiufu, linkw, Xionghu Luo

Resending this patch.

v4: Update per comments.
v3: Rename altivec_vmrghb_direct_le to altivec_vmrglb_direct_le to match
the actual output ASM vmrglb.  Likewise for all similar xxx_direct_le
patterns.
v2: Split the direct pattern into BE and LE variants with the same RTL
but different insns.

The native RTL expression for vec_mrghw should be the same for BE and LE,
as it is register- and endian-independent.  So both BE and LE need to
generate exactly the same RTL, with index [0 4 1 5], when expanding
vec_mrghw with vec_select and vec_concat.

(set (reg:V4SI 141) (vec_select:V4SI (vec_concat:V8SI
		   (subreg:V4SI (reg:V16QI 139) 0)
		   (subreg:V4SI (reg:V16QI 140) 0))
		   [const_int 0 4 1 5]))
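
For readers less familiar with these RTL codes, the selection above can be
modelled in plain C++.  This is only an illustrative sketch, not GCC code:
the helper names (vec_concat, vec_select, merge_high_words) are hypothetical
and merely mirror the RTL operators, under the assumption that vec_concat
numbers the elements of its first operand before those of its second.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical model of RTL vec_concat: the elements of `a` come first,
// followed by the elements of `b`.
template <typename T, std::size_t N>
std::array<T, 2 * N> vec_concat(const std::array<T, N>& a,
                                const std::array<T, N>& b) {
  std::array<T, 2 * N> out{};
  for (std::size_t i = 0; i < N; ++i) {
    out[i] = a[i];
    out[N + i] = b[i];
  }
  return out;
}

// Hypothetical model of RTL vec_select: result element i is src[sel[i]].
template <typename T, std::size_t M, std::size_t K>
std::array<T, K> vec_select(const std::array<T, M>& src,
                            const std::array<std::size_t, K>& sel) {
  std::array<T, K> out{};
  for (std::size_t i = 0; i < K; ++i) {
    assert(sel[i] < M);  // every selector must index into the source vector
    out[i] = src[sel[i]];
  }
  return out;
}

// The [0 4 1 5] selection from the RTL above, applied to two 4-element vectors.
std::array<int, 4> merge_high_words(const std::array<int, 4>& a,
                                    const std::array<int, 4>& b) {
  const std::array<std::size_t, 4> sel{0, 4, 1, 5};
  return vec_select(vec_concat(a, b), sel);
}
```

In this model the result interleaves elements 0 and 1 of the two inputs,
and nothing in it depends on byte order.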

The combine pass can then perform the nested vec_select optimization
in simplify-rtx.c:simplify_binary_operation_1 on both BE and LE:

21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel [0 4 1 5])
24: {r151:SI=vec_select(r150:V4SI,parallel [const_int 3]);}

=>

21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel)
24: {r151:SI=vec_select(r146:V4SI,parallel [const_int 1]);}

The endianness check is then needed only once, at final ASM generation.
The resulting ASM is better because the nested vec_select is simplified
to a simple scalar load.
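
The index arithmetic behind that nested vec_select fold can be sketched in
C++ as follows.  This is a hypothetical model, not the code in simplify-rtx:
LaneRef and fold_nested_select are names invented for illustration, and the
sketch assumes both concat operands have the same number of elements.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Which concat operand an element comes from (0 = first, 1 = second),
// and which lane of that operand.
struct LaneRef {
  int operand;
  std::size_t lane;
};

// Given the selector of an inner vec_select over vec_concat(a, b) and the
// index used by an outer single-element vec_select, return the operand and
// lane that the outer select ultimately reads.
template <std::size_t N>
LaneRef fold_nested_select(const std::array<std::size_t, N>& inner_sel,
                           std::size_t outer_index) {
  assert(outer_index < N);
  const std::size_t concat_index = inner_sel[outer_index];
  assert(concat_index < 2 * N);
  if (concat_index < N)
    return {0, concat_index};    // element lives in the first operand
  return {1, concat_index - N};  // element lives in the second operand
}
```

With the selector [0 4 1 5] and outer index 3, the model yields lane 1 of
the second operand, which matches the rewritten insn 24 reading element 1
of r146.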

Regression testing passes on Power8{LE,BE}{32,64} and Power{9,10}LE{32,64}
Linux.

gcc/ChangeLog:

	PR target/106069
	* config/rs6000/altivec.md (altivec_vmrghb_direct): Remove.
	(altivec_vmrghb_direct_be): New pattern for BE.
	(altivec_vmrghb_direct_le): New pattern for LE.
	(altivec_vmrghh_direct): Remove.
	(altivec_vmrghh_direct_be): New pattern for BE.
	(altivec_vmrghh_direct_le): New pattern for LE.
	(altivec_vmrghw_direct_<mode>): Remove.
	(altivec_vmrghw_direct_<mode>_be): New pattern for BE.
	(altivec_vmrghw_direct_<mode>_le): New pattern for LE.
	(altivec_vmrglb_direct): Remove.
	(altivec_vmrglb_direct_be): New pattern for BE.
	(altivec_vmrglb_direct_le): New pattern for LE.
	(altivec_vmrglh_direct): Remove.
	(altivec_vmrglh_direct_be): New pattern for BE.
	(altivec_vmrglh_direct_le): New pattern for LE.
	(altivec_vmrglw_direct_<mode>): Remove.
	(altivec_vmrglw_direct_<mode>_be): New pattern for BE.
	(altivec_vmrglw_direct_<mode>_le): New pattern for LE.
	* config/rs6000/rs6000.cc (altivec_expand_vec_perm_const):
	Adjust.
	* config/rs6000/vsx.md: Likewise.

gcc/testsuite/ChangeLog:

	PR target/106069
	* g++.target/powerpc/pr106069.C: New test.

Signed-off-by: Xionghu Luo <xionghuluo@tencent.com>
---
 gcc/config/rs6000/altivec.md                | 222 ++++++++++++++------
 gcc/config/rs6000/rs6000.cc                 |  24 +--
 gcc/config/rs6000/vsx.md                    |  28 +--
 gcc/testsuite/g++.target/powerpc/pr106069.C | 118 +++++++++++
 4 files changed, 307 insertions(+), 85 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/powerpc/pr106069.C

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 30606b8ab21..4bfeecec224 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -1144,15 +1144,16 @@ (define_expand "altivec_vmrghb"
    (use (match_operand:V16QI 2 "register_operand"))]
   "TARGET_ALTIVEC"
 {
-  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrghb_direct
-						: gen_altivec_vmrglb_direct;
-  if (!BYTES_BIG_ENDIAN)
-    std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (
+      gen_altivec_vmrghb_direct_be (operands[0], operands[1], operands[2]));
+  else
+    emit_insn (
+      gen_altivec_vmrglb_direct_le (operands[0], operands[2], operands[1]));
   DONE;
 })
 
-(define_insn "altivec_vmrghb_direct"
+(define_insn "altivec_vmrghb_direct_be"
   [(set (match_operand:V16QI 0 "register_operand" "=v")
 	(vec_select:V16QI
 	  (vec_concat:V32QI
@@ -1166,7 +1167,25 @@ (define_insn "altivec_vmrghb_direct"
 		     (const_int 5) (const_int 21)
 		     (const_int 6) (const_int 22)
 		     (const_int 7) (const_int 23)])))]
-  "TARGET_ALTIVEC"
+  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
+  "vmrghb %0,%1,%2"
+  [(set_attr "type" "vecperm")])
+
+(define_insn "altivec_vmrghb_direct_le"
+  [(set (match_operand:V16QI 0 "register_operand" "=v")
+	(vec_select:V16QI
+	  (vec_concat:V32QI
+	    (match_operand:V16QI 2 "register_operand" "v")
+	    (match_operand:V16QI 1 "register_operand" "v"))
+	  (parallel [(const_int  8) (const_int 24)
+		     (const_int  9) (const_int 25)
+		     (const_int 10) (const_int 26)
+		     (const_int 11) (const_int 27)
+		     (const_int 12) (const_int 28)
+		     (const_int 13) (const_int 29)
+		     (const_int 14) (const_int 30)
+		     (const_int 15) (const_int 31)])))]
+  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
   "vmrghb %0,%1,%2"
   [(set_attr "type" "vecperm")])
 
@@ -1176,17 +1195,18 @@ (define_expand "altivec_vmrghh"
    (use (match_operand:V8HI 2 "register_operand"))]
   "TARGET_ALTIVEC"
 {
-  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrghh_direct
-						: gen_altivec_vmrglh_direct;
-  if (!BYTES_BIG_ENDIAN)
-    std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (
+      gen_altivec_vmrghh_direct_be (operands[0], operands[1], operands[2]));
+  else
+    emit_insn (
+      gen_altivec_vmrglh_direct_le (operands[0], operands[2], operands[1]));
   DONE;
 })
 
-(define_insn "altivec_vmrghh_direct"
+(define_insn "altivec_vmrghh_direct_be"
   [(set (match_operand:V8HI 0 "register_operand" "=v")
-        (vec_select:V8HI
+	(vec_select:V8HI
 	  (vec_concat:V16HI
 	    (match_operand:V8HI 1 "register_operand" "v")
 	    (match_operand:V8HI 2 "register_operand" "v"))
@@ -1194,7 +1214,21 @@ (define_insn "altivec_vmrghh_direct"
 		     (const_int 1) (const_int 9)
 		     (const_int 2) (const_int 10)
 		     (const_int 3) (const_int 11)])))]
-  "TARGET_ALTIVEC"
+  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
+  "vmrghh %0,%1,%2"
+  [(set_attr "type" "vecperm")])
+
+(define_insn "altivec_vmrghh_direct_le"
+  [(set (match_operand:V8HI 0 "register_operand" "=v")
+        (vec_select:V8HI
+	  (vec_concat:V16HI
+	    (match_operand:V8HI 2 "register_operand" "v")
+	    (match_operand:V8HI 1 "register_operand" "v"))
+	  (parallel [(const_int 4) (const_int 12)
+		     (const_int 5) (const_int 13)
+		     (const_int 6) (const_int 14)
+		     (const_int 7) (const_int 15)])))]
+  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
   "vmrghh %0,%1,%2"
   [(set_attr "type" "vecperm")])
 
@@ -1204,16 +1238,18 @@ (define_expand "altivec_vmrghw"
    (use (match_operand:V4SI 2 "register_operand"))]
   "VECTOR_MEM_ALTIVEC_P (V4SImode)"
 {
-  rtx (*fun) (rtx, rtx, rtx);
-  fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrghw_direct_v4si
-			 : gen_altivec_vmrglw_direct_v4si;
-  if (!BYTES_BIG_ENDIAN)
-    std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0],
+						  operands[1],
+						  operands[2]));
+  else
+    emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0],
+						  operands[2],
+						  operands[1]));
   DONE;
 })
 
-(define_insn "altivec_vmrghw_direct_<mode>"
+(define_insn "altivec_vmrghw_direct_<mode>_be"
   [(set (match_operand:VSX_W 0 "register_operand" "=wa,v")
 	(vec_select:VSX_W
 	  (vec_concat:<VS_double>
@@ -1221,7 +1257,21 @@ (define_insn "altivec_vmrghw_direct_<mode>"
 	    (match_operand:VSX_W 2 "register_operand" "wa,v"))
 	  (parallel [(const_int 0) (const_int 4)
 		     (const_int 1) (const_int 5)])))]
-  "TARGET_ALTIVEC"
+  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
+  "@
+   xxmrghw %x0,%x1,%x2
+   vmrghw %0,%1,%2"
+  [(set_attr "type" "vecperm")])
+
+(define_insn "altivec_vmrghw_direct_<mode>_le"
+  [(set (match_operand:VSX_W 0 "register_operand" "=wa,v")
+	(vec_select:VSX_W
+	  (vec_concat:<VS_double>
+	    (match_operand:VSX_W 2 "register_operand" "wa,v")
+	    (match_operand:VSX_W 1 "register_operand" "wa,v"))
+	  (parallel [(const_int 2) (const_int 6)
+		     (const_int 3) (const_int 7)])))]
+  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
   "@
    xxmrghw %x0,%x1,%x2
    vmrghw %0,%1,%2"
@@ -1250,15 +1300,16 @@ (define_expand "altivec_vmrglb"
    (use (match_operand:V16QI 2 "register_operand"))]
   "TARGET_ALTIVEC"
 {
-  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrglb_direct
-						: gen_altivec_vmrghb_direct;
-  if (!BYTES_BIG_ENDIAN)
-    std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (
+      gen_altivec_vmrglb_direct_be (operands[0], operands[1], operands[2]));
+  else
+    emit_insn (
+      gen_altivec_vmrghb_direct_le (operands[0], operands[2], operands[1]));
   DONE;
 })
 
-(define_insn "altivec_vmrglb_direct"
+(define_insn "altivec_vmrglb_direct_be"
   [(set (match_operand:V16QI 0 "register_operand" "=v")
 	(vec_select:V16QI
 	  (vec_concat:V32QI
@@ -1272,7 +1323,25 @@ (define_insn "altivec_vmrglb_direct"
 		     (const_int 13) (const_int 29)
 		     (const_int 14) (const_int 30)
 		     (const_int 15) (const_int 31)])))]
-  "TARGET_ALTIVEC"
+  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
+  "vmrglb %0,%1,%2"
+  [(set_attr "type" "vecperm")])
+
+(define_insn "altivec_vmrglb_direct_le"
+  [(set (match_operand:V16QI 0 "register_operand" "=v")
+	(vec_select:V16QI
+	  (vec_concat:V32QI
+	    (match_operand:V16QI 2 "register_operand" "v")
+	    (match_operand:V16QI 1 "register_operand" "v"))
+	  (parallel [(const_int 0) (const_int 16)
+		     (const_int 1) (const_int 17)
+		     (const_int 2) (const_int 18)
+		     (const_int 3) (const_int 19)
+		     (const_int 4) (const_int 20)
+		     (const_int 5) (const_int 21)
+		     (const_int 6) (const_int 22)
+		     (const_int 7) (const_int 23)])))]
+  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
   "vmrglb %0,%1,%2"
   [(set_attr "type" "vecperm")])
 
@@ -1282,15 +1351,16 @@ (define_expand "altivec_vmrglh"
    (use (match_operand:V8HI 2 "register_operand"))]
   "TARGET_ALTIVEC"
 {
-  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrglh_direct
-						: gen_altivec_vmrghh_direct;
-  if (!BYTES_BIG_ENDIAN)
-    std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (
+      gen_altivec_vmrglh_direct_be (operands[0], operands[1], operands[2]));
+  else
+    emit_insn (
+      gen_altivec_vmrghh_direct_le (operands[0], operands[2], operands[1]));
   DONE;
 })
 
-(define_insn "altivec_vmrglh_direct"
+(define_insn "altivec_vmrglh_direct_be"
   [(set (match_operand:V8HI 0 "register_operand" "=v")
         (vec_select:V8HI
 	  (vec_concat:V16HI
@@ -1300,7 +1370,21 @@ (define_insn "altivec_vmrglh_direct"
 		     (const_int 5) (const_int 13)
 		     (const_int 6) (const_int 14)
 		     (const_int 7) (const_int 15)])))]
-  "TARGET_ALTIVEC"
+  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
+  "vmrglh %0,%1,%2"
+  [(set_attr "type" "vecperm")])
+
+(define_insn "altivec_vmrglh_direct_le"
+  [(set (match_operand:V8HI 0 "register_operand" "=v")
+	(vec_select:V8HI
+	  (vec_concat:V16HI
+	    (match_operand:V8HI 2 "register_operand" "v")
+	    (match_operand:V8HI 1 "register_operand" "v"))
+	  (parallel [(const_int 0) (const_int 8)
+		     (const_int 1) (const_int 9)
+		     (const_int 2) (const_int 10)
+		     (const_int 3) (const_int 11)])))]
+  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
   "vmrglh %0,%1,%2"
   [(set_attr "type" "vecperm")])
 
@@ -1310,16 +1394,18 @@ (define_expand "altivec_vmrglw"
    (use (match_operand:V4SI 2 "register_operand"))]
   "VECTOR_MEM_ALTIVEC_P (V4SImode)"
 {
-  rtx (*fun) (rtx, rtx, rtx);
-  fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrglw_direct_v4si
-			 : gen_altivec_vmrghw_direct_v4si;
-  if (!BYTES_BIG_ENDIAN)
-    std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0],
+						  operands[1],
+						  operands[2]));
+  else
+    emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0],
+						  operands[2],
+						  operands[1]));
   DONE;
 })
 
-(define_insn "altivec_vmrglw_direct_<mode>"
+(define_insn "altivec_vmrglw_direct_<mode>_be"
   [(set (match_operand:VSX_W 0 "register_operand" "=wa,v")
 	(vec_select:VSX_W
 	  (vec_concat:<VS_double>
@@ -1327,7 +1413,21 @@ (define_insn "altivec_vmrglw_direct_<mode>"
 	    (match_operand:VSX_W 2 "register_operand" "wa,v"))
 	  (parallel [(const_int 2) (const_int 6)
 		     (const_int 3) (const_int 7)])))]
-  "TARGET_ALTIVEC"
+  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
+  "@
+   xxmrglw %x0,%x1,%x2
+   vmrglw %0,%1,%2"
+  [(set_attr "type" "vecperm")])
+
+(define_insn "altivec_vmrglw_direct_<mode>_le"
+  [(set (match_operand:VSX_W 0 "register_operand" "=wa,v")
+	(vec_select:VSX_W
+	  (vec_concat:<VS_double>
+	    (match_operand:VSX_W 2 "register_operand" "wa,v")
+	    (match_operand:VSX_W 1 "register_operand" "wa,v"))
+	  (parallel [(const_int 0) (const_int 4)
+		     (const_int 1) (const_int 5)])))]
+  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
   "@
    xxmrglw %x0,%x1,%x2
    vmrglw %0,%1,%2"
@@ -3699,13 +3799,13 @@ (define_expand "vec_widen_umult_hi_v16qi"
     {
       emit_insn (gen_altivec_vmuleub (ve, operands[1], operands[2]));
       emit_insn (gen_altivec_vmuloub (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrghh_direct (operands[0], ve, vo));
+      emit_insn (gen_altivec_vmrghh_direct_be (operands[0], ve, vo));
     }
   else
     {
       emit_insn (gen_altivec_vmuloub (ve, operands[1], operands[2]));
       emit_insn (gen_altivec_vmuleub (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrghh_direct (operands[0], vo, ve));
+      emit_insn (gen_altivec_vmrghh_direct_le (operands[0], vo, ve));
     }
   DONE;
 })
@@ -3724,13 +3824,13 @@ (define_expand "vec_widen_umult_lo_v16qi"
     {
       emit_insn (gen_altivec_vmuleub (ve, operands[1], operands[2]));
       emit_insn (gen_altivec_vmuloub (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrglh_direct (operands[0], ve, vo));
+      emit_insn (gen_altivec_vmrglh_direct_be (operands[0], ve, vo));
     }
   else
     {
       emit_insn (gen_altivec_vmuloub (ve, operands[1], operands[2]));
       emit_insn (gen_altivec_vmuleub (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrglh_direct (operands[0], vo, ve));
+      emit_insn (gen_altivec_vmrglh_direct_le (operands[0], vo, ve));
     }
   DONE;
 })
@@ -3749,13 +3849,13 @@ (define_expand "vec_widen_smult_hi_v16qi"
     {
       emit_insn (gen_altivec_vmulesb (ve, operands[1], operands[2]));
       emit_insn (gen_altivec_vmulosb (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrghh_direct (operands[0], ve, vo));
+      emit_insn (gen_altivec_vmrghh_direct_be (operands[0], ve, vo));
     }
   else
     {
       emit_insn (gen_altivec_vmulosb (ve, operands[1], operands[2]));
       emit_insn (gen_altivec_vmulesb (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrghh_direct (operands[0], vo, ve));
+      emit_insn (gen_altivec_vmrghh_direct_le (operands[0], vo, ve));
     }
   DONE;
 })
@@ -3774,13 +3874,13 @@ (define_expand "vec_widen_smult_lo_v16qi"
     {
       emit_insn (gen_altivec_vmulesb (ve, operands[1], operands[2]));
       emit_insn (gen_altivec_vmulosb (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrglh_direct (operands[0], ve, vo));
+      emit_insn (gen_altivec_vmrglh_direct_be (operands[0], ve, vo));
     }
   else
     {
       emit_insn (gen_altivec_vmulosb (ve, operands[1], operands[2]));
       emit_insn (gen_altivec_vmulesb (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrglh_direct (operands[0], vo, ve));
+      emit_insn (gen_altivec_vmrglh_direct_le (operands[0], vo, ve));
     }
   DONE;
 })
@@ -3799,13 +3899,13 @@ (define_expand "vec_widen_umult_hi_v8hi"
     {
       emit_insn (gen_altivec_vmuleuh (ve, operands[1], operands[2]));
       emit_insn (gen_altivec_vmulouh (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], ve, vo));
+      emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0], ve, vo));
     }
   else
     {
       emit_insn (gen_altivec_vmulouh (ve, operands[1], operands[2]));
       emit_insn (gen_altivec_vmuleuh (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], vo, ve));
+      emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0], vo, ve));
     }
   DONE;
 })
@@ -3824,13 +3924,13 @@ (define_expand "vec_widen_umult_lo_v8hi"
     {
       emit_insn (gen_altivec_vmuleuh (ve, operands[1], operands[2]));
       emit_insn (gen_altivec_vmulouh (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], ve, vo));
+      emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0], ve, vo));
     }
   else
     {
       emit_insn (gen_altivec_vmulouh (ve, operands[1], operands[2]));
       emit_insn (gen_altivec_vmuleuh (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], vo, ve));
+      emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0], vo, ve));
     }
   DONE;
 })
@@ -3849,13 +3949,13 @@ (define_expand "vec_widen_smult_hi_v8hi"
     {
       emit_insn (gen_altivec_vmulesh (ve, operands[1], operands[2]));
       emit_insn (gen_altivec_vmulosh (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], ve, vo));
+      emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0], ve, vo));
     }
   else
     {
       emit_insn (gen_altivec_vmulosh (ve, operands[1], operands[2]));
       emit_insn (gen_altivec_vmulesh (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], vo, ve));
+      emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0], vo, ve));
     }
   DONE;
 })
@@ -3874,13 +3974,13 @@ (define_expand "vec_widen_smult_lo_v8hi"
     {
       emit_insn (gen_altivec_vmulesh (ve, operands[1], operands[2]));
       emit_insn (gen_altivec_vmulosh (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], ve, vo));
+      emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0], ve, vo));
     }
   else
     {
       emit_insn (gen_altivec_vmulosh (ve, operands[1], operands[2]));
       emit_insn (gen_altivec_vmulesh (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], vo, ve));
+      emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0], vo, ve));
     }
   DONE;
 })
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 16ca3a31757..aba6315cd5f 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -23196,28 +23196,28 @@ altivec_expand_vec_perm_const (rtx target, rtx op0, rtx op1,
      CODE_FOR_altivec_vpkuwum_direct,
      {2, 3, 6, 7, 10, 11, 14, 15, 18, 19, 22, 23, 26, 27, 30, 31}},
     {OPTION_MASK_ALTIVEC,
-     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghb_direct
-		      : CODE_FOR_altivec_vmrglb_direct,
+     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghb_direct_be
+		      : CODE_FOR_altivec_vmrglb_direct_le,
      {0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23}},
     {OPTION_MASK_ALTIVEC,
-     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghh_direct
-		      : CODE_FOR_altivec_vmrglh_direct,
+     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghh_direct_be
+		      : CODE_FOR_altivec_vmrglh_direct_le,
      {0, 1, 16, 17, 2, 3, 18, 19, 4, 5, 20, 21, 6, 7, 22, 23}},
     {OPTION_MASK_ALTIVEC,
-     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghw_direct_v4si
-		      : CODE_FOR_altivec_vmrglw_direct_v4si,
+     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghw_direct_v4si_be
+		      : CODE_FOR_altivec_vmrglw_direct_v4si_le,
      {0, 1, 2, 3, 16, 17, 18, 19, 4, 5, 6, 7, 20, 21, 22, 23}},
     {OPTION_MASK_ALTIVEC,
-     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglb_direct
-		      : CODE_FOR_altivec_vmrghb_direct,
+     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglb_direct_be
+		      : CODE_FOR_altivec_vmrghb_direct_le,
      {8, 24, 9, 25, 10, 26, 11, 27, 12, 28, 13, 29, 14, 30, 15, 31}},
     {OPTION_MASK_ALTIVEC,
-     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglh_direct
-		      : CODE_FOR_altivec_vmrghh_direct,
+     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglh_direct_be
+		      : CODE_FOR_altivec_vmrghh_direct_le,
      {8, 9, 24, 25, 10, 11, 26, 27, 12, 13, 28, 29, 14, 15, 30, 31}},
     {OPTION_MASK_ALTIVEC,
-     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglw_direct_v4si
-		      : CODE_FOR_altivec_vmrghw_direct_v4si,
+     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglw_direct_v4si_be
+		      : CODE_FOR_altivec_vmrghw_direct_v4si_le,
      {8, 9, 10, 11, 24, 25, 26, 27, 12, 13, 14, 15, 28, 29, 30, 31}},
     {OPTION_MASK_P8_VECTOR,
      BYTES_BIG_ENDIAN ? CODE_FOR_p8_vmrgew_v4sf_direct
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 0865608f94a..f8d2c316a55 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -4683,12 +4683,14 @@ (define_expand "vsx_xxmrghw_<mode>"
 		     (const_int 1) (const_int 5)])))]
   "VECTOR_MEM_VSX_P (<MODE>mode)"
 {
-  rtx (*fun) (rtx, rtx, rtx);
-  fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrghw_direct_<mode>
-			 : gen_altivec_vmrglw_direct_<mode>;
-  if (!BYTES_BIG_ENDIAN)
-    std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0],
+						  operands[1],
+						  operands[2]));
+  else
+    emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0],
+						  operands[2],
+						  operands[1]));
   DONE;
 }
   [(set_attr "type" "vecperm")])
@@ -4703,12 +4705,14 @@ (define_expand "vsx_xxmrglw_<mode>"
 		     (const_int 3) (const_int 7)])))]
   "VECTOR_MEM_VSX_P (<MODE>mode)"
 {
-  rtx (*fun) (rtx, rtx, rtx);
-  fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrglw_direct_<mode>
-			 : gen_altivec_vmrghw_direct_<mode>;
-  if (!BYTES_BIG_ENDIAN)
-    std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0],
+						  operands[1],
+						  operands[2]));
+  else
+    emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0],
+						  operands[2],
+						  operands[1]));
   DONE;
 }
   [(set_attr "type" "vecperm")])
diff --git a/gcc/testsuite/g++.target/powerpc/pr106069.C b/gcc/testsuite/g++.target/powerpc/pr106069.C
new file mode 100644
index 00000000000..c89739ecb55
--- /dev/null
+++ b/gcc/testsuite/g++.target/powerpc/pr106069.C
@@ -0,0 +1,118 @@
+/* { dg-options "-O -fno-tree-forwprop -maltivec" } */
+/* { dg-require-effective-target vmx_hw } */
+/* { dg-do run } */
+
+typedef __attribute__ ((altivec (vector__))) unsigned native_simd_type;
+
+union
+{
+  native_simd_type V;
+  int R[4];
+} store_le_vec;
+
+struct S
+{
+  S () = default;
+  S (unsigned B0)
+  {
+    native_simd_type val{B0};
+    m_simd = val;
+  }
+  void store_le (unsigned int out[])
+  {
+    store_le_vec.V = m_simd;
+    unsigned int x0 = store_le_vec.R[0];
+    __builtin_memcpy (out, &x0, 4);
+  }
+  S rotl (unsigned int r)
+  {
+    native_simd_type rot{r};
+    return __builtin_vec_rl (m_simd, rot);
+  }
+  void operator+= (S other)
+  {
+    m_simd = __builtin_vec_add (m_simd, other.m_simd);
+  }
+  void operator^= (S other)
+  {
+    m_simd = __builtin_vec_xor (m_simd, other.m_simd);
+  }
+  static void transpose (S &B0, S B1, S B2, S B3)
+  {
+    native_simd_type T0 = __builtin_vec_mergeh (B0.m_simd, B2.m_simd);
+    native_simd_type T1 = __builtin_vec_mergeh (B1.m_simd, B3.m_simd);
+    native_simd_type T2 = __builtin_vec_mergel (B0.m_simd, B2.m_simd);
+    native_simd_type T3 = __builtin_vec_mergel (B1.m_simd, B3.m_simd);
+    B0 = __builtin_vec_mergeh (T0, T1);
+    B3 = __builtin_vec_mergel (T2, T3);
+  }
+  S (native_simd_type x) : m_simd (x) {}
+  native_simd_type m_simd;
+};
+
+void
+foo (unsigned int output[], unsigned state[])
+{
+  S R00 = state[0];
+  S R01 = state[0];
+  S R02 = state[2];
+  S R03 = state[0];
+  S R05 = state[5];
+  S R06 = state[6];
+  S R07 = state[7];
+  S R08 = state[8];
+  S R09 = state[9];
+  S R10 = state[10];
+  S R11 = state[11];
+  S R12 = state[12];
+  S R13 = state[13];
+  S R14 = state[4];
+  S R15 = state[15];
+  for (int r = 0; r != 10; ++r)
+    {
+      R09 += R13;
+      R11 += R15;
+      R05 ^= R09;
+      R06 ^= R10;
+      R07 ^= R11;
+      R07 = R07.rotl (7);
+      R00 += R05;
+      R01 += R06;
+      R02 += R07;
+      R15 ^= R00;
+      R12 ^= R01;
+      R13 ^= R02;
+      R00 += R05;
+      R01 += R06;
+      R02 += R07;
+      R15 ^= R00;
+      R12 = R12.rotl (8);
+      R13 = R13.rotl (8);
+      R10 += R15;
+      R11 += R12;
+      R08 += R13;
+      R09 += R14;
+      R05 ^= R10;
+      R06 ^= R11;
+      R07 ^= R08;
+      R05 = R05.rotl (7);
+      R06 = R06.rotl (7);
+      R07 = R07.rotl (7);
+    }
+  R00 += state[0];
+  S::transpose (R00, R01, R02, R03);
+  R00.store_le (output);
+}
+
+unsigned int res[1];
+unsigned main_state[]{1634760805, 60878,      2036477234, 6,
+		      0,	  825562964,  1471091955, 1346092787,
+		      506976774,  4197066702, 518848283,  118491664,
+		      0,	  0,	      0,	  0};
+int
+main ()
+{
+  foo (res, main_state);
+  if (res[0] != 0x41fcef98)
+    __builtin_abort ();
+}
-- 
2.27.0



* Ping: [PATCH v4] rs6000: Fix incorrect RTL for Power LE when removing the UNSPECS [PR106069]
  2023-02-10  2:59 [PATCH v4] rs6000: Fix incorrect RTL for Power LE when removing the UNSPECS [PR106069] Xionghu Luo
@ 2023-02-28  6:43 ` Xionghu Luo
  2023-03-30 19:30 ` Segher Boessenkool
  1 sibling, 0 replies; 5+ messages in thread
From: Xionghu Luo @ 2023-02-28  6:43 UTC (permalink / raw)
  To: Xionghu Luo, gcc-patches; +Cc: segher, linkw

Hi Segher, pinging this for stage 4.


On 2023/2/10 10:59, Xionghu Luo via Gcc-patches wrote:
> Resend this patch...
> 
> v4: Update per comments.
> v3: rename altivec_vmrghb_direct_le to altivec_vmrglb_direct_le to match
> the actual output ASM vmrglb. Likewise for all similar xxx_direct_le
> patterns.
> v2: Split the direct pattern to be and le with same RTL but different insn.
> 
> The native RTL expression for vec_mrghw should be same for BE and LE as
> they are register and endian-independent.  So both BE and LE need
> generate exactly same RTL with index [0 4 1 5] when expanding vec_mrghw
> with vec_select and vec_concat.
> 
> (set (reg:V4SI 141) (vec_select:V4SI (vec_concat:V8SI
> 		   (subreg:V4SI (reg:V16QI 139) 0)
> 		   (subreg:V4SI (reg:V16QI 140) 0))
> 		   [const_int 0 4 1 5]))
> 
> Then combine pass could do the nested vec_select optimization
> in simplify-rtx.c:simplify_binary_operation_1 also on both BE and LE:
> 
> 21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel [0 4 1 5])
> 24: {r151:SI=vec_select(r150:V4SI,parallel [const_int 3]);}
> 
> =>
> 
> 21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel)
> 24: {r151:SI=vec_select(r146:V4SI,parallel [const_int 1]);}
> 
> The endianness check need only once at ASM generation finally.
> ASM would be better due to nested vec_select simplified to simple scalar
> load.
> 
> Regression tested pass for Power8{LE,BE}{32,64} and Power{9,10}LE{32,64}
> Linux.
> 
> gcc/ChangeLog:
> 
> 	PR target/106069
> 	* config/rs6000/altivec.md (altivec_vmrghb_direct): Remove.
> 	(altivec_vmrghb_direct_be): New pattern for BE.
> 	(altivec_vmrghb_direct_le): New pattern for LE.
> 	(altivec_vmrghh_direct): Remove.
> 	(altivec_vmrghh_direct_be): New pattern for BE.
> 	(altivec_vmrghh_direct_le): New pattern for LE.
> 	(altivec_vmrghw_direct_<mode>): Remove.
> 	(altivec_vmrghw_direct_<mode>_be): New pattern for BE.
> 	(altivec_vmrghw_direct_<mode>_le): New pattern for LE.
> 	(altivec_vmrglb_direct): Remove.
> 	(altivec_vmrglb_direct_be): New pattern for BE.
> 	(altivec_vmrglb_direct_le): New pattern for LE.
> 	(altivec_vmrglh_direct): Remove.
> 	(altivec_vmrglh_direct_be): New pattern for BE.
> 	(altivec_vmrglh_direct_le): New pattern for LE.
> 	(altivec_vmrglw_direct_<mode>): Remove.
> 	(altivec_vmrglw_direct_<mode>_be): New pattern for BE.
> 	(altivec_vmrglw_direct_<mode>_le): New pattern for LE.
> 	* config/rs6000/rs6000.cc (altivec_expand_vec_perm_const):
> 	Adjust.
> 	* config/rs6000/vsx.md: Likewise.
> 
> gcc/testsuite/ChangeLog:
> 
> 	PR target/106069
> 	* g++.target/powerpc/pr106069.C: New test.
> 
> Signed-off-by: Xionghu Luo <xionghuluo@tencent.com>
> ---
>   gcc/config/rs6000/altivec.md                | 222 ++++++++++++++------
>   gcc/config/rs6000/rs6000.cc                 |  24 +--
>   gcc/config/rs6000/vsx.md                    |  28 +--
>   gcc/testsuite/g++.target/powerpc/pr106069.C | 118 +++++++++++
>   4 files changed, 307 insertions(+), 85 deletions(-)
>   create mode 100644 gcc/testsuite/g++.target/powerpc/pr106069.C
> 
> diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
> index 30606b8ab21..4bfeecec224 100644
> --- a/gcc/config/rs6000/altivec.md
> +++ b/gcc/config/rs6000/altivec.md
> @@ -1144,15 +1144,16 @@ (define_expand "altivec_vmrghb"
>      (use (match_operand:V16QI 2 "register_operand"))]
>     "TARGET_ALTIVEC"
>   {
> -  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrghb_direct
> -						: gen_altivec_vmrglb_direct;
> -  if (!BYTES_BIG_ENDIAN)
> -    std::swap (operands[1], operands[2]);
> -  emit_insn (fun (operands[0], operands[1], operands[2]));
> +  if (BYTES_BIG_ENDIAN)
> +    emit_insn (
> +      gen_altivec_vmrghb_direct_be (operands[0], operands[1], operands[2]));
> +  else
> +    emit_insn (
> +      gen_altivec_vmrglb_direct_le (operands[0], operands[2], operands[1]));
>     DONE;
>   })
>   
> -(define_insn "altivec_vmrghb_direct"
> +(define_insn "altivec_vmrghb_direct_be"
>     [(set (match_operand:V16QI 0 "register_operand" "=v")
>   	(vec_select:V16QI
>   	  (vec_concat:V32QI
> @@ -1166,7 +1167,25 @@ (define_insn "altivec_vmrghb_direct"
>   		     (const_int 5) (const_int 21)
>   		     (const_int 6) (const_int 22)
>   		     (const_int 7) (const_int 23)])))]
> -  "TARGET_ALTIVEC"
> +  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
> +  "vmrghb %0,%1,%2"
> +  [(set_attr "type" "vecperm")])
> +
> +(define_insn "altivec_vmrghb_direct_le"
> +  [(set (match_operand:V16QI 0 "register_operand" "=v")
> +	(vec_select:V16QI
> +	  (vec_concat:V32QI
> +	    (match_operand:V16QI 2 "register_operand" "v")
> +	    (match_operand:V16QI 1 "register_operand" "v"))
> +	  (parallel [(const_int  8) (const_int 24)
> +		     (const_int  9) (const_int 25)
> +		     (const_int 10) (const_int 26)
> +		     (const_int 11) (const_int 27)
> +		     (const_int 12) (const_int 28)
> +		     (const_int 13) (const_int 29)
> +		     (const_int 14) (const_int 30)
> +		     (const_int 15) (const_int 31)])))]
> +  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
>     "vmrghb %0,%1,%2"
>     [(set_attr "type" "vecperm")])
>   
> @@ -1176,17 +1195,18 @@ (define_expand "altivec_vmrghh"
>      (use (match_operand:V8HI 2 "register_operand"))]
>     "TARGET_ALTIVEC"
>   {
> -  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrghh_direct
> -						: gen_altivec_vmrglh_direct;
> -  if (!BYTES_BIG_ENDIAN)
> -    std::swap (operands[1], operands[2]);
> -  emit_insn (fun (operands[0], operands[1], operands[2]));
> +  if (BYTES_BIG_ENDIAN)
> +    emit_insn (
> +      gen_altivec_vmrghh_direct_be (operands[0], operands[1], operands[2]));
> +  else
> +    emit_insn (
> +      gen_altivec_vmrglh_direct_le (operands[0], operands[2], operands[1]));
>     DONE;
>   })
>   
> -(define_insn "altivec_vmrghh_direct"
> +(define_insn "altivec_vmrghh_direct_be"
>     [(set (match_operand:V8HI 0 "register_operand" "=v")
> -        (vec_select:V8HI
> +	(vec_select:V8HI
>   	  (vec_concat:V16HI
>   	    (match_operand:V8HI 1 "register_operand" "v")
>   	    (match_operand:V8HI 2 "register_operand" "v"))
> @@ -1194,7 +1214,21 @@ (define_insn "altivec_vmrghh_direct"
>   		     (const_int 1) (const_int 9)
>   		     (const_int 2) (const_int 10)
>   		     (const_int 3) (const_int 11)])))]
> -  "TARGET_ALTIVEC"
> +  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
> +  "vmrghh %0,%1,%2"
> +  [(set_attr "type" "vecperm")])
> +
> +(define_insn "altivec_vmrghh_direct_le"
> +  [(set (match_operand:V8HI 0 "register_operand" "=v")
> +        (vec_select:V8HI
> +	  (vec_concat:V16HI
> +	    (match_operand:V8HI 2 "register_operand" "v")
> +	    (match_operand:V8HI 1 "register_operand" "v"))
> +	  (parallel [(const_int 4) (const_int 12)
> +		     (const_int 5) (const_int 13)
> +		     (const_int 6) (const_int 14)
> +		     (const_int 7) (const_int 15)])))]
> +  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
>     "vmrghh %0,%1,%2"
>     [(set_attr "type" "vecperm")])
>   
> @@ -1204,16 +1238,18 @@ (define_expand "altivec_vmrghw"
>      (use (match_operand:V4SI 2 "register_operand"))]
>     "VECTOR_MEM_ALTIVEC_P (V4SImode)"
>   {
> -  rtx (*fun) (rtx, rtx, rtx);
> -  fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrghw_direct_v4si
> -			 : gen_altivec_vmrglw_direct_v4si;
> -  if (!BYTES_BIG_ENDIAN)
> -    std::swap (operands[1], operands[2]);
> -  emit_insn (fun (operands[0], operands[1], operands[2]));
> +  if (BYTES_BIG_ENDIAN)
> +    emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0],
> +						  operands[1],
> +						  operands[2]));
> +  else
> +    emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0],
> +						  operands[2],
> +						  operands[1]));
>     DONE;
>   })
>   
> -(define_insn "altivec_vmrghw_direct_<mode>"
> +(define_insn "altivec_vmrghw_direct_<mode>_be"
>     [(set (match_operand:VSX_W 0 "register_operand" "=wa,v")
>   	(vec_select:VSX_W
>   	  (vec_concat:<VS_double>
> @@ -1221,7 +1257,21 @@ (define_insn "altivec_vmrghw_direct_<mode>"
>   	    (match_operand:VSX_W 2 "register_operand" "wa,v"))
>   	  (parallel [(const_int 0) (const_int 4)
>   		     (const_int 1) (const_int 5)])))]
> -  "TARGET_ALTIVEC"
> +  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
> +  "@
> +   xxmrghw %x0,%x1,%x2
> +   vmrghw %0,%1,%2"
> +  [(set_attr "type" "vecperm")])
> +
> +(define_insn "altivec_vmrghw_direct_<mode>_le"
> +  [(set (match_operand:VSX_W 0 "register_operand" "=wa,v")
> +	(vec_select:VSX_W
> +	  (vec_concat:<VS_double>
> +	    (match_operand:VSX_W 2 "register_operand" "wa,v")
> +	    (match_operand:VSX_W 1 "register_operand" "wa,v"))
> +	  (parallel [(const_int 2) (const_int 6)
> +		     (const_int 3) (const_int 7)])))]
> +  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
>     "@
>      xxmrghw %x0,%x1,%x2
>      vmrghw %0,%1,%2"
> @@ -1250,15 +1300,16 @@ (define_expand "altivec_vmrglb"
>      (use (match_operand:V16QI 2 "register_operand"))]
>     "TARGET_ALTIVEC"
>   {
> -  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrglb_direct
> -						: gen_altivec_vmrghb_direct;
> -  if (!BYTES_BIG_ENDIAN)
> -    std::swap (operands[1], operands[2]);
> -  emit_insn (fun (operands[0], operands[1], operands[2]));
> +  if (BYTES_BIG_ENDIAN)
> +    emit_insn (
> +      gen_altivec_vmrglb_direct_be (operands[0], operands[1], operands[2]));
> +  else
> +    emit_insn (
> +      gen_altivec_vmrghb_direct_le (operands[0], operands[2], operands[1]));
>     DONE;
>   })
>   
> -(define_insn "altivec_vmrglb_direct"
> +(define_insn "altivec_vmrglb_direct_be"
>     [(set (match_operand:V16QI 0 "register_operand" "=v")
>   	(vec_select:V16QI
>   	  (vec_concat:V32QI
> @@ -1272,7 +1323,25 @@ (define_insn "altivec_vmrglb_direct"
>   		     (const_int 13) (const_int 29)
>   		     (const_int 14) (const_int 30)
>   		     (const_int 15) (const_int 31)])))]
> -  "TARGET_ALTIVEC"
> +  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
> +  "vmrglb %0,%1,%2"
> +  [(set_attr "type" "vecperm")])
> +
> +(define_insn "altivec_vmrglb_direct_le"
> +  [(set (match_operand:V16QI 0 "register_operand" "=v")
> +	(vec_select:V16QI
> +	  (vec_concat:V32QI
> +	    (match_operand:V16QI 2 "register_operand" "v")
> +	    (match_operand:V16QI 1 "register_operand" "v"))
> +	  (parallel [(const_int 0) (const_int 16)
> +		     (const_int 1) (const_int 17)
> +		     (const_int 2) (const_int 18)
> +		     (const_int 3) (const_int 19)
> +		     (const_int 4) (const_int 20)
> +		     (const_int 5) (const_int 21)
> +		     (const_int 6) (const_int 22)
> +		     (const_int 7) (const_int 23)])))]
> +  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
>     "vmrglb %0,%1,%2"
>     [(set_attr "type" "vecperm")])
>   
> @@ -1282,15 +1351,16 @@ (define_expand "altivec_vmrglh"
>      (use (match_operand:V8HI 2 "register_operand"))]
>     "TARGET_ALTIVEC"
>   {
> -  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrglh_direct
> -						: gen_altivec_vmrghh_direct;
> -  if (!BYTES_BIG_ENDIAN)
> -    std::swap (operands[1], operands[2]);
> -  emit_insn (fun (operands[0], operands[1], operands[2]));
> +  if (BYTES_BIG_ENDIAN)
> +    emit_insn (
> +      gen_altivec_vmrglh_direct_be (operands[0], operands[1], operands[2]));
> +  else
> +    emit_insn (
> +      gen_altivec_vmrghh_direct_le (operands[0], operands[2], operands[1]));
>     DONE;
>   })
>   
> -(define_insn "altivec_vmrglh_direct"
> +(define_insn "altivec_vmrglh_direct_be"
>     [(set (match_operand:V8HI 0 "register_operand" "=v")
>           (vec_select:V8HI
>   	  (vec_concat:V16HI
> @@ -1300,7 +1370,21 @@ (define_insn "altivec_vmrglh_direct"
>   		     (const_int 5) (const_int 13)
>   		     (const_int 6) (const_int 14)
>   		     (const_int 7) (const_int 15)])))]
> -  "TARGET_ALTIVEC"
> +  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
> +  "vmrglh %0,%1,%2"
> +  [(set_attr "type" "vecperm")])
> +
> +(define_insn "altivec_vmrglh_direct_le"
> +  [(set (match_operand:V8HI 0 "register_operand" "=v")
> +	(vec_select:V8HI
> +	  (vec_concat:V16HI
> +	    (match_operand:V8HI 2 "register_operand" "v")
> +	    (match_operand:V8HI 1 "register_operand" "v"))
> +	  (parallel [(const_int 0) (const_int 8)
> +		     (const_int 1) (const_int 9)
> +		     (const_int 2) (const_int 10)
> +		     (const_int 3) (const_int 11)])))]
> +  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
>     "vmrglh %0,%1,%2"
>     [(set_attr "type" "vecperm")])
>   
> @@ -1310,16 +1394,18 @@ (define_expand "altivec_vmrglw"
>      (use (match_operand:V4SI 2 "register_operand"))]
>     "VECTOR_MEM_ALTIVEC_P (V4SImode)"
>   {
> -  rtx (*fun) (rtx, rtx, rtx);
> -  fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrglw_direct_v4si
> -			 : gen_altivec_vmrghw_direct_v4si;
> -  if (!BYTES_BIG_ENDIAN)
> -    std::swap (operands[1], operands[2]);
> -  emit_insn (fun (operands[0], operands[1], operands[2]));
> +  if (BYTES_BIG_ENDIAN)
> +    emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0],
> +						  operands[1],
> +						  operands[2]));
> +  else
> +    emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0],
> +						  operands[2],
> +						  operands[1]));
>     DONE;
>   })
>   
> -(define_insn "altivec_vmrglw_direct_<mode>"
> +(define_insn "altivec_vmrglw_direct_<mode>_be"
>     [(set (match_operand:VSX_W 0 "register_operand" "=wa,v")
>   	(vec_select:VSX_W
>   	  (vec_concat:<VS_double>
> @@ -1327,7 +1413,21 @@ (define_insn "altivec_vmrglw_direct_<mode>"
>   	    (match_operand:VSX_W 2 "register_operand" "wa,v"))
>   	  (parallel [(const_int 2) (const_int 6)
>   		     (const_int 3) (const_int 7)])))]
> -  "TARGET_ALTIVEC"
> +  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
> +  "@
> +   xxmrglw %x0,%x1,%x2
> +   vmrglw %0,%1,%2"
> +  [(set_attr "type" "vecperm")])
> +
> +(define_insn "altivec_vmrglw_direct_<mode>_le"
> +  [(set (match_operand:VSX_W 0 "register_operand" "=wa,v")
> +	(vec_select:VSX_W
> +	  (vec_concat:<VS_double>
> +	    (match_operand:VSX_W 2 "register_operand" "wa,v")
> +	    (match_operand:VSX_W 1 "register_operand" "wa,v"))
> +	  (parallel [(const_int 0) (const_int 4)
> +		     (const_int 1) (const_int 5)])))]
> +  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
>     "@
>      xxmrglw %x0,%x1,%x2
>      vmrglw %0,%1,%2"
> @@ -3699,13 +3799,13 @@ (define_expand "vec_widen_umult_hi_v16qi"
>       {
>         emit_insn (gen_altivec_vmuleub (ve, operands[1], operands[2]));
>         emit_insn (gen_altivec_vmuloub (vo, operands[1], operands[2]));
> -      emit_insn (gen_altivec_vmrghh_direct (operands[0], ve, vo));
> +      emit_insn (gen_altivec_vmrghh_direct_be (operands[0], ve, vo));
>       }
>     else
>       {
>         emit_insn (gen_altivec_vmuloub (ve, operands[1], operands[2]));
>         emit_insn (gen_altivec_vmuleub (vo, operands[1], operands[2]));
> -      emit_insn (gen_altivec_vmrghh_direct (operands[0], vo, ve));
> +      emit_insn (gen_altivec_vmrghh_direct_le (operands[0], vo, ve));
>       }
>     DONE;
>   })
> @@ -3724,13 +3824,13 @@ (define_expand "vec_widen_umult_lo_v16qi"
>       {
>         emit_insn (gen_altivec_vmuleub (ve, operands[1], operands[2]));
>         emit_insn (gen_altivec_vmuloub (vo, operands[1], operands[2]));
> -      emit_insn (gen_altivec_vmrglh_direct (operands[0], ve, vo));
> +      emit_insn (gen_altivec_vmrglh_direct_be (operands[0], ve, vo));
>       }
>     else
>       {
>         emit_insn (gen_altivec_vmuloub (ve, operands[1], operands[2]));
>         emit_insn (gen_altivec_vmuleub (vo, operands[1], operands[2]));
> -      emit_insn (gen_altivec_vmrglh_direct (operands[0], vo, ve));
> +      emit_insn (gen_altivec_vmrglh_direct_le (operands[0], vo, ve));
>       }
>     DONE;
>   })
> @@ -3749,13 +3849,13 @@ (define_expand "vec_widen_smult_hi_v16qi"
>       {
>         emit_insn (gen_altivec_vmulesb (ve, operands[1], operands[2]));
>         emit_insn (gen_altivec_vmulosb (vo, operands[1], operands[2]));
> -      emit_insn (gen_altivec_vmrghh_direct (operands[0], ve, vo));
> +      emit_insn (gen_altivec_vmrghh_direct_be (operands[0], ve, vo));
>       }
>     else
>       {
>         emit_insn (gen_altivec_vmulosb (ve, operands[1], operands[2]));
>         emit_insn (gen_altivec_vmulesb (vo, operands[1], operands[2]));
> -      emit_insn (gen_altivec_vmrghh_direct (operands[0], vo, ve));
> +      emit_insn (gen_altivec_vmrghh_direct_le (operands[0], vo, ve));
>       }
>     DONE;
>   })
> @@ -3774,13 +3874,13 @@ (define_expand "vec_widen_smult_lo_v16qi"
>       {
>         emit_insn (gen_altivec_vmulesb (ve, operands[1], operands[2]));
>         emit_insn (gen_altivec_vmulosb (vo, operands[1], operands[2]));
> -      emit_insn (gen_altivec_vmrglh_direct (operands[0], ve, vo));
> +      emit_insn (gen_altivec_vmrglh_direct_be (operands[0], ve, vo));
>       }
>     else
>       {
>         emit_insn (gen_altivec_vmulosb (ve, operands[1], operands[2]));
>         emit_insn (gen_altivec_vmulesb (vo, operands[1], operands[2]));
> -      emit_insn (gen_altivec_vmrglh_direct (operands[0], vo, ve));
> +      emit_insn (gen_altivec_vmrglh_direct_le (operands[0], vo, ve));
>       }
>     DONE;
>   })
> @@ -3799,13 +3899,13 @@ (define_expand "vec_widen_umult_hi_v8hi"
>       {
>         emit_insn (gen_altivec_vmuleuh (ve, operands[1], operands[2]));
>         emit_insn (gen_altivec_vmulouh (vo, operands[1], operands[2]));
> -      emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], ve, vo));
> +      emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0], ve, vo));
>       }
>     else
>       {
>         emit_insn (gen_altivec_vmulouh (ve, operands[1], operands[2]));
>         emit_insn (gen_altivec_vmuleuh (vo, operands[1], operands[2]));
> -      emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], vo, ve));
> +      emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0], vo, ve));
>       }
>     DONE;
>   })
> @@ -3824,13 +3924,13 @@ (define_expand "vec_widen_umult_lo_v8hi"
>       {
>         emit_insn (gen_altivec_vmuleuh (ve, operands[1], operands[2]));
>         emit_insn (gen_altivec_vmulouh (vo, operands[1], operands[2]));
> -      emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], ve, vo));
> +      emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0], ve, vo));
>       }
>     else
>       {
>         emit_insn (gen_altivec_vmulouh (ve, operands[1], operands[2]));
>         emit_insn (gen_altivec_vmuleuh (vo, operands[1], operands[2]));
> -      emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], vo, ve));
> +      emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0], vo, ve));
>       }
>     DONE;
>   })
> @@ -3849,13 +3949,13 @@ (define_expand "vec_widen_smult_hi_v8hi"
>       {
>         emit_insn (gen_altivec_vmulesh (ve, operands[1], operands[2]));
>         emit_insn (gen_altivec_vmulosh (vo, operands[1], operands[2]));
> -      emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], ve, vo));
> +      emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0], ve, vo));
>       }
>     else
>       {
>         emit_insn (gen_altivec_vmulosh (ve, operands[1], operands[2]));
>         emit_insn (gen_altivec_vmulesh (vo, operands[1], operands[2]));
> -      emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], vo, ve));
> +      emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0], vo, ve));
>       }
>     DONE;
>   })
> @@ -3874,13 +3974,13 @@ (define_expand "vec_widen_smult_lo_v8hi"
>       {
>         emit_insn (gen_altivec_vmulesh (ve, operands[1], operands[2]));
>         emit_insn (gen_altivec_vmulosh (vo, operands[1], operands[2]));
> -      emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], ve, vo));
> +      emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0], ve, vo));
>       }
>     else
>       {
>         emit_insn (gen_altivec_vmulosh (ve, operands[1], operands[2]));
>         emit_insn (gen_altivec_vmulesh (vo, operands[1], operands[2]));
> -      emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], vo, ve));
> +      emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0], vo, ve));
>       }
>     DONE;
>   })
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index 16ca3a31757..aba6315cd5f 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -23196,28 +23196,28 @@ altivec_expand_vec_perm_const (rtx target, rtx op0, rtx op1,
>        CODE_FOR_altivec_vpkuwum_direct,
>        {2, 3, 6, 7, 10, 11, 14, 15, 18, 19, 22, 23, 26, 27, 30, 31}},
>       {OPTION_MASK_ALTIVEC,
> -     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghb_direct
> -		      : CODE_FOR_altivec_vmrglb_direct,
> +     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghb_direct_be
> +		      : CODE_FOR_altivec_vmrglb_direct_le,
>        {0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23}},
>       {OPTION_MASK_ALTIVEC,
> -     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghh_direct
> -		      : CODE_FOR_altivec_vmrglh_direct,
> +     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghh_direct_be
> +		      : CODE_FOR_altivec_vmrglh_direct_le,
>        {0, 1, 16, 17, 2, 3, 18, 19, 4, 5, 20, 21, 6, 7, 22, 23}},
>       {OPTION_MASK_ALTIVEC,
> -     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghw_direct_v4si
> -		      : CODE_FOR_altivec_vmrglw_direct_v4si,
> +     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghw_direct_v4si_be
> +		      : CODE_FOR_altivec_vmrglw_direct_v4si_le,
>        {0, 1, 2, 3, 16, 17, 18, 19, 4, 5, 6, 7, 20, 21, 22, 23}},
>       {OPTION_MASK_ALTIVEC,
> -     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglb_direct
> -		      : CODE_FOR_altivec_vmrghb_direct,
> +     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglb_direct_be
> +		      : CODE_FOR_altivec_vmrghb_direct_le,
>        {8, 24, 9, 25, 10, 26, 11, 27, 12, 28, 13, 29, 14, 30, 15, 31}},
>       {OPTION_MASK_ALTIVEC,
> -     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglh_direct
> -		      : CODE_FOR_altivec_vmrghh_direct,
> +     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglh_direct_be
> +		      : CODE_FOR_altivec_vmrghh_direct_le,
>        {8, 9, 24, 25, 10, 11, 26, 27, 12, 13, 28, 29, 14, 15, 30, 31}},
>       {OPTION_MASK_ALTIVEC,
> -     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglw_direct_v4si
> -		      : CODE_FOR_altivec_vmrghw_direct_v4si,
> +     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglw_direct_v4si_be
> +		      : CODE_FOR_altivec_vmrghw_direct_v4si_le,
>        {8, 9, 10, 11, 24, 25, 26, 27, 12, 13, 14, 15, 28, 29, 30, 31}},
>       {OPTION_MASK_P8_VECTOR,
>        BYTES_BIG_ENDIAN ? CODE_FOR_p8_vmrgew_v4sf_direct
> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index 0865608f94a..f8d2c316a55 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -4683,12 +4683,14 @@ (define_expand "vsx_xxmrghw_<mode>"
>   		     (const_int 1) (const_int 5)])))]
>     "VECTOR_MEM_VSX_P (<MODE>mode)"
>   {
> -  rtx (*fun) (rtx, rtx, rtx);
> -  fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrghw_direct_<mode>
> -			 : gen_altivec_vmrglw_direct_<mode>;
> -  if (!BYTES_BIG_ENDIAN)
> -    std::swap (operands[1], operands[2]);
> -  emit_insn (fun (operands[0], operands[1], operands[2]));
> +  if (BYTES_BIG_ENDIAN)
> +    emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0],
> +						  operands[1],
> +						  operands[2]));
> +  else
> +    emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0],
> +						  operands[2],
> +						  operands[1]));
>     DONE;
>   }
>     [(set_attr "type" "vecperm")])
> @@ -4703,12 +4705,14 @@ (define_expand "vsx_xxmrglw_<mode>"
>   		     (const_int 3) (const_int 7)])))]
>     "VECTOR_MEM_VSX_P (<MODE>mode)"
>   {
> -  rtx (*fun) (rtx, rtx, rtx);
> -  fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrglw_direct_<mode>
> -			 : gen_altivec_vmrghw_direct_<mode>;
> -  if (!BYTES_BIG_ENDIAN)
> -    std::swap (operands[1], operands[2]);
> -  emit_insn (fun (operands[0], operands[1], operands[2]));
> +  if (BYTES_BIG_ENDIAN)
> +    emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0],
> +						  operands[1],
> +						  operands[2]));
> +  else
> +    emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0],
> +						  operands[2],
> +						  operands[1]));
>     DONE;
>   }
>     [(set_attr "type" "vecperm")])
> diff --git a/gcc/testsuite/g++.target/powerpc/pr106069.C b/gcc/testsuite/g++.target/powerpc/pr106069.C
> new file mode 100644
> index 00000000000..c89739ecb55
> --- /dev/null
> +++ b/gcc/testsuite/g++.target/powerpc/pr106069.C
> @@ -0,0 +1,118 @@
> +/* { dg-options "-O -fno-tree-forwprop -maltivec" } */
> +/* { dg-require-effective-target vmx_hw } */
> +/* { dg-do run } */
> +
> +typedef __attribute__ ((altivec (vector__))) unsigned native_simd_type;
> +
> +union
> +{
> +  native_simd_type V;
> +  int R[4];
> +} store_le_vec;
> +
> +struct S
> +{
> +  S () = default;
> +  S (unsigned B0)
> +  {
> +    native_simd_type val{B0};
> +    m_simd = val;
> +  }
> +  void store_le (unsigned int out[])
> +  {
> +    store_le_vec.V = m_simd;
> +    unsigned int x0 = store_le_vec.R[0];
> +    __builtin_memcpy (out, &x0, 4);
> +  }
> +  S rotl (unsigned int r)
> +  {
> +    native_simd_type rot{r};
> +    return __builtin_vec_rl (m_simd, rot);
> +  }
> +  void operator+= (S other)
> +  {
> +    m_simd = __builtin_vec_add (m_simd, other.m_simd);
> +  }
> +  void operator^= (S other)
> +  {
> +    m_simd = __builtin_vec_xor (m_simd, other.m_simd);
> +  }
> +  static void transpose (S &B0, S B1, S B2, S B3)
> +  {
> +    native_simd_type T0 = __builtin_vec_mergeh (B0.m_simd, B2.m_simd);
> +    native_simd_type T1 = __builtin_vec_mergeh (B1.m_simd, B3.m_simd);
> +    native_simd_type T2 = __builtin_vec_mergel (B0.m_simd, B2.m_simd);
> +    native_simd_type T3 = __builtin_vec_mergel (B1.m_simd, B3.m_simd);
> +    B0 = __builtin_vec_mergeh (T0, T1);
> +    B3 = __builtin_vec_mergel (T2, T3);
> +  }
> +  S (native_simd_type x) : m_simd (x) {}
> +  native_simd_type m_simd;
> +};
> +
> +void
> +foo (unsigned int output[], unsigned state[])
> +{
> +  S R00 = state[0];
> +  S R01 = state[0];
> +  S R02 = state[2];
> +  S R03 = state[0];
> +  S R05 = state[5];
> +  S R06 = state[6];
> +  S R07 = state[7];
> +  S R08 = state[8];
> +  S R09 = state[9];
> +  S R10 = state[10];
> +  S R11 = state[11];
> +  S R12 = state[12];
> +  S R13 = state[13];
> +  S R14 = state[4];
> +  S R15 = state[15];
> +  for (int r = 0; r != 10; ++r)
> +    {
> +      R09 += R13;
> +      R11 += R15;
> +      R05 ^= R09;
> +      R06 ^= R10;
> +      R07 ^= R11;
> +      R07 = R07.rotl (7);
> +      R00 += R05;
> +      R01 += R06;
> +      R02 += R07;
> +      R15 ^= R00;
> +      R12 ^= R01;
> +      R13 ^= R02;
> +      R00 += R05;
> +      R01 += R06;
> +      R02 += R07;
> +      R15 ^= R00;
> +      R12 = R12.rotl (8);
> +      R13 = R13.rotl (8);
> +      R10 += R15;
> +      R11 += R12;
> +      R08 += R13;
> +      R09 += R14;
> +      R05 ^= R10;
> +      R06 ^= R11;
> +      R07 ^= R08;
> +      R05 = R05.rotl (7);
> +      R06 = R06.rotl (7);
> +      R07 = R07.rotl (7);
> +    }
> +  R00 += state[0];
> +  S::transpose (R00, R01, R02, R03);
> +  R00.store_le (output);
> +}
> +
> +unsigned int res[1];
> +unsigned main_state[]{1634760805, 60878,      2036477234, 6,
> +		      0,	  825562964,  1471091955, 1346092787,
> +		      506976774,  4197066702, 518848283,  118491664,
> +		      0,	  0,	      0,	  0};
> +int
> +main ()
> +{
> +  foo (res, main_state);
> +  if (res[0] != 0x41fcef98)
> +    __builtin_abort ();
> +}


* Re: [PATCH v4] rs6000: Fix incorrect RTL for Power LE when removing the UNSPECS [PR106069]
  2023-02-10  2:59 [PATCH v4] rs6000: Fix incorrect RTL for Power LE when removing the UNSPECS [PR106069] Xionghu Luo
  2023-02-28  6:43 ` Ping: " Xionghu Luo
@ 2023-03-30 19:30 ` Segher Boessenkool
  2023-03-31  2:47   ` Xionghu Luo
  1 sibling, 1 reply; 5+ messages in thread
From: Segher Boessenkool @ 2023-03-30 19:30 UTC (permalink / raw)
  To: Xionghu Luo; +Cc: gcc-patches, dje.gcc, wschmidt, guojiufu, linkw

Hi!

On Fri, Feb 10, 2023 at 10:59:52AM +0800, Xionghu Luo via Gcc-patches wrote:
> The native RTL expression for vec_mrghw should be same for BE and LE as
> they are register and endian-independent.

This isn't so obvious at all.  All elements of these constructs are
very much not endian-independent, because of very unfortunate choices
in the meaning of some RTL constructs.  It is possible all things in
this negate all other things, but please show that then.

>  So both BE and LE need
> generate exactly same RTL with index [0 4 1 5] when expanding vec_mrghw
> with vec_select and vec_concat.
> 
> (set (reg:V4SI 141) (vec_select:V4SI (vec_concat:V8SI
> 		   (subreg:V4SI (reg:V16QI 139) 0)
> 		   (subreg:V4SI (reg:V16QI 140) 0))
> 		   [const_int 0 4 1 5]))

With BE, if the source vecs are ABCD and EFGH, the vec_concat gives
ABCDEFGH, and the vec_select then gives AEBF.

What happens for LE?


Segher


* Re: [PATCH v4] rs6000: Fix incorrect RTL for Power LE when removing the UNSPECS [PR106069]
  2023-03-30 19:30 ` Segher Boessenkool
@ 2023-03-31  2:47   ` Xionghu Luo
  0 siblings, 0 replies; 5+ messages in thread
From: Xionghu Luo @ 2023-03-31  2:47 UTC (permalink / raw)
  To: Segher Boessenkool, Xionghu Luo
  Cc: gcc-patches, dje.gcc, wschmidt, guojiufu, linkw

Thanks,

On 2023/3/31 03:30, Segher Boessenkool wrote:
> Hi!
> 
> On Fri, Feb 10, 2023 at 10:59:52AM +0800, Xionghu Luo via Gcc-patches wrote:
>> The native RTL expression for vec_mrghw should be same for BE and LE as
>> they are register and endian-independent.
> 
> This isn't so obvious at all.  All elements of these constructs are
> very much not endian-independent, because of very unfortunate choices
> in the meaning of some RTL constructs.  It is possible all things in
> this negate all other things, but please show that then.
> 
>>   So both BE and LE need
>> generate exactly same RTL with index [0 4 1 5] when expanding vec_mrghw
>> with vec_select and vec_concat.
>>
>> (set (reg:V4SI 141) (vec_select:V4SI (vec_concat:V8SI
>> 		   (subreg:V4SI (reg:V16QI 139) 0)
>> 		   (subreg:V4SI (reg:V16QI 140) 0))
>> 		   [const_int 0 4 1 5]))
> 
> With BE, if the source vecs are ABCD and EFGH, the vec_concat gives
> ABCDEFGH, and the vec_select then gives AEBF.
> 
> What happens for LE?

On LE, the sources look like DCBA and HGFE; vec_concat gives HGFEDCBA
with the index reversed [7 6 5 4 3 2 1 0], so it also chooses FBEA, like BE.
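
The lane bookkeeping can be checked with a small standalone C model.  This is
only a sketch under stated assumptions: concat_select, lane0_on_right and
mergeh_selfcheck are hypothetical helper names, not GCC code, and the model
numbers lanes the way RTL does (lane 0 of the vec_concat is lane 0 of its
first operand).

```c
#include <assert.h>
#include <string.h>

/* Model of (vec_select (vec_concat a b) (parallel [sel ...])) for 4-lane
   vectors.  Lanes use RTL numbering: lanes 0..3 of the concat come from A,
   lanes 4..7 from B.  OUT receives the four selected lanes plus a NUL.  */
void
concat_select (const char *a, const char *b, const int sel[4], char out[5])
{
  char cat[8];
  memcpy (cat, a, 4);
  memcpy (cat + 4, b, 4);
  for (int i = 0; i < 4; i++)
    out[i] = cat[sel[i]];
  out[4] = '\0';
}

/* Write the same four lanes with lane 0 on the right, the way the LE
   picture (DCBA, HGFE) is drawn.  */
void
lane0_on_right (const char in[5], char out[5])
{
  for (int i = 0; i < 4; i++)
    out[i] = in[3 - i];
  out[4] = '\0';
}

/* Self-check using the ABCD/EFGH example and the merge-high selector.  */
void
mergeh_selfcheck (void)
{
  const int mergeh[4] = {0, 4, 1, 5};
  char lanes[5], drawn_le[5];

  concat_select ("ABCD", "EFGH", mergeh, lanes);
  assert (strcmp (lanes, "AEBF") == 0);

  lane0_on_right (lanes, drawn_le);
  assert (strcmp (drawn_le, "FBEA") == 0);
}
```

Read with lane 0 on the left, the selected lanes are AEBF; drawn with lane 0
on the right, the same lanes read FBEA.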


Take this case as an example on P8LE:

test.c:

#include <altivec.h>
#include <stdio.h>

__attribute__ ((__noinline__))
vector int bar (vector int a, vector int b)
{
   return vec_vmrghw (a, b);
}

int main ()
{

   vector int a = {0xa1345678, 0xa2345678,0xa3345678, 0xa4345678};
   vector int b = {0xb1345678, 0xb2345678,0xb3345678, 0xb4345678};
   vector int c = bar (a, b);
   printf("%x,%x,%x,%x\n", c[0], c[1], c[2], c[3]);
   return c[0];
}


.expand:

_3 = VEC_PERM_EXPR <a_1(D), b_2(D), { 0, 4, 1, 5 }>;

(insn 7 4 8 2 (set (reg:V16QI 122)
         (subreg:V16QI (reg/v:V4SI 118 [ a ]) 0)) "test.c":15:10 -1
      (nil))
(insn 8 7 9 2 (set (reg:V16QI 123)
         (subreg:V16QI (reg/v:V4SI 119 [ b ]) 0)) "test.c":15:10 -1
      (nil))
(insn 9 8 10 2 (set (reg:V4SI 124)
         (vec_select:V4SI (vec_concat:V8SI (subreg:V4SI (reg:V16QI 122) 0)
                 (subreg:V4SI (reg:V16QI 123) 0))
             (parallel [
                     (const_int 0 [0])
                     (const_int 4 [0x4])
                     (const_int 1 [0x1])
                     (const_int 5 [0x5])
                 ]))) "test.c":15:10 -1
      (nil))


And .vregs to .final:

(insn 15 9 16 (set (reg/i:V4SI 66 %v2)
         (vec_select:V4SI (vec_concat:V8SI (reg:V4SI 66 %v2 [125])
                 (reg:V4SI 67 %v3 [126]))
             (parallel [
                     (const_int 0 [0])
                     (const_int 4 [0x4])
                     (const_int 1 [0x1])
                     (const_int 5 [0x5])
                 ]))) "test.c":16:1 1825 {altivec_vmrglw_direct_v4si_le}
      (expr_list:REG_DEAD (reg:V4SI 67 %v3 [126])
         (nil)))


With this patch, altivec_vmrglw_direct_v4si_le is defined as:


(define_insn "altivec_vmrglw_direct_<mode>_le"
   [(set (match_operand:VSX_W 0 "register_operand" "=wa,v")
         (vec_select:VSX_W
           (vec_concat:<VS_double>
             (match_operand:VSX_W 2 "register_operand" "wa,v")
             (match_operand:VSX_W 1 "register_operand" "wa,v"))
           (parallel [(const_int 0) (const_int 4)
                      (const_int 1) (const_int 5)])))]
   "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
   "@
    xxmrglw %x0,%x1,%x2
    vmrglw %0,%1,%2"
   [(set_attr "type" "vecperm")])


ASM:

bar:
.LFB11:
         .cfi_startproc
         xxmrglw 34,35,34
         blr


./test
a1345678,b1345678,a2345678,b2345678

Exactly matches [a1 b1 a2 b2].  Does this look reasonable?
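
The same result can be reproduced by hand with a small C model of the emitted
instruction.  This is a sketch under stated assumptions, not GCC code: the
helper names are hypothetical; hw_vmrglw assumes the usual Power ISA
description of vmrglw/xxmrglw (register words numbered from the left, result
= VRA[2], VRB[2], VRA[3], VRB[3]); and the model assumes that on LE element i
of a V4SI value sits in register word 3 - i.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed hardware semantics of vmrglw/xxmrglw, with register words
   numbered from the left (word 0 is the most significant word).  */
void
hw_vmrglw (uint32_t vrt[4], const uint32_t vra[4], const uint32_t vrb[4])
{
  uint32_t t[4] = { vra[2], vrb[2], vra[3], vrb[3] };
  for (int i = 0; i < 4; i++)
    vrt[i] = t[i];
}

/* Assumed LE layout: element i of the vector lives in register word 3 - i.
   The mapping is its own inverse, so it converts in both directions.  */
void
le_swap_view (uint32_t dst[4], const uint32_t src[4])
{
  for (int i = 0; i < 4; i++)
    dst[3 - i] = src[i];
}

/* vec_vmrghw (a, b) on LE as in the ASM above: "xxmrglw 34,35,34",
   i.e. the merge-low instruction with b as VRA and a as VRB.  */
void
le_vec_mergeh (uint32_t c[4], const uint32_t a[4], const uint32_t b[4])
{
  uint32_t ra[4], rb[4], rc[4];
  le_swap_view (ra, a);
  le_swap_view (rb, b);
  hw_vmrglw (rc, rb, ra);	/* operands swapped, as in the _le pattern */
  le_swap_view (c, rc);
}

/* Self-check with the input values from test.c.  */
void
le_mergeh_selfcheck (void)
{
  const uint32_t a[4] = { 0xa1345678u, 0xa2345678u, 0xa3345678u, 0xa4345678u };
  const uint32_t b[4] = { 0xb1345678u, 0xb2345678u, 0xb3345678u, 0xb4345678u };
  uint32_t c[4];

  le_vec_mergeh (c, a, b);
  assert (c[0] == 0xa1345678u && c[1] == 0xb1345678u);
  assert (c[2] == 0xa2345678u && c[3] == 0xb2345678u);
}
```

Under these assumptions the operand swap and the high/low swap cancel the
reversed word numbering, so the model yields elements 0 and 1 of a and b
interleaved, the same as the output printed above.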


BR,
Xionghu



* Re: [PATCH v4] rs6000: Fix incorrect RTL for Power LE when removing the UNSPECS [PR106069]
  2022-08-16  6:53         ` Kewen.Lin
@ 2022-08-17  6:23           ` Xionghu Luo
  0 siblings, 0 replies; 5+ messages in thread
From: Xionghu Luo @ 2022-08-17  6:23 UTC (permalink / raw)
  To: Kewen.Lin; +Cc: Xionghu Luo, gcc-patches, David Edelsohn, Segher Boessenkool



On 2022/8/16 14:53, Kewen.Lin wrote:
> Hi Xionghu,
> 
> Thanks for the updated version of patch, some comments are inlined.
> 
> on 2022/8/11 14:15, Xionghu Luo wrote:
>>
>>
>> On 2022/8/11 01:07, Segher Boessenkool wrote:
>>> On Wed, Aug 10, 2022 at 02:39:02PM +0800, Xionghu Luo wrote:
>>>> On 2022/8/9 11:01, Kewen.Lin wrote:
>>>>> I have some concern on those changed "altivec_*_direct", IMHO the suffix
>>>>> "_direct" is normally to indicate the define_insn is mapped to the
>>>>> corresponding hw insn directly.  With this change, for example,
>>>>> altivec_vmrghb_direct can be mapped into vmrghb or vmrglb, this looks
>>>>> misleading.  Maybe we can add the corresponding _direct_le and _direct_be
>>>>> versions, both are mapped into the same insn but have different RTL
>>>>> patterns.  Looking forward to Segher's and David's suggestions.
>>>>
>>>> Thanks!  Do you mean same RTL patterns with different hw insn?
>>>
>>> A pattern called altivec_vmrghb_direct_le should always emit a vmrghb
>>> instruction, never a vmrglb instead.  Misleading names are an expensive
>>> problem.
>>>
>>>
>>
>> Thanks.  Then on LE platforms, if the user calls altivec_vmrghw, it will be
>> expanded to RTL (vec_select (vec_concat R0 R1) [0 4 1 5]), and
>> finally matched to altivec_vmrglw_direct_v4si_le with ASM "vmrglw".
>> For BE it is just straightforward, which seems clearer :-)  OK for master?
>>
>>
>> [PATCH v3] rs6000: Fix incorrect RTL for Power LE when removing the UNSPECS [PR106069]
>>
>> v3: rename altivec_vmrghb_direct_le to altivec_vmrglb_direct_le to match
>> the actual output ASM vmrglb. Likewise for all similar xxx_direct_le
>> patterns.
>> v2: Split the direct pattern to be and le with same RTL but different insn.
>>
>> The native RTL expression for vec_mrghw should be same for BE and LE as
>> they are register and endian-independent.  So both BE and LE need
>> generate exactly same RTL with index [0 4 1 5] when expanding vec_mrghw
>> with vec_select and vec_concat.
>>
>> (set (reg:V4SI 141) (vec_select:V4SI (vec_concat:V8SI
>>             (subreg:V4SI (reg:V16QI 139) 0)
>>             (subreg:V4SI (reg:V16QI 140) 0))
>>             [const_int 0 4 1 5]))
>>
>> Then combine pass could do the nested vec_select optimization
>> in simplify-rtx.c:simplify_binary_operation_1 also on both BE and LE:
>>
>> 21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel [0 4 1 5])
>> 24: {r151:SI=vec_select(r150:V4SI,parallel [const_int 3]);}
>>
>> =>
>>
>> 21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel)
>> 24: {r151:SI=vec_select(r146:V4SI,parallel [const_int 1]);}
>>
>> The endianness check is then needed only once, at final ASM generation.
>> The ASM is better because the nested vec_select is simplified to a simple
>> scalar load.
>>
>> Regression tested pass for Power8{LE,BE}{32,64} and Power{9,10}LE{64}
>> Linux(Thanks to Kewen).
>>
>> gcc/ChangeLog:
>>
>>      PR target/106069
>>      * config/rs6000/altivec.md (altivec_vmrghb_direct): Remove.
>>      (altivec_vmrghb_direct_be): New pattern for BE.
>>      (altivec_vmrglb_direct_le): New pattern for LE.
>>      (altivec_vmrghh_direct): Remove.
>>      (altivec_vmrghh_direct_be): New pattern for BE.
>>      (altivec_vmrglh_direct_le): New pattern for LE.
>>      (altivec_vmrghw_direct_<mode>): Remove.
>>      (altivec_vmrghw_direct_<mode>_be): New pattern for BE.
>>      (altivec_vmrglw_direct_<mode>_le): New pattern for LE.
>>      (altivec_vmrglb_direct): Remove.
>>      (altivec_vmrglb_direct_be): New pattern for BE.
>>      (altivec_vmrghb_direct_le): New pattern for LE.
>>      (altivec_vmrglh_direct): Remove.
>>      (altivec_vmrglh_direct_be): New pattern for BE.
>>      (altivec_vmrghh_direct_le): New pattern for LE.
>>      (altivec_vmrglw_direct_<mode>): Remove.
>>      (altivec_vmrglw_direct_<mode>_be): New pattern for BE.
>>      (altivec_vmrghw_direct_<mode>_le): New pattern for LE.
>>      * config/rs6000/rs6000.cc (altivec_expand_vec_perm_const):
>>      Adjust.
>>      * config/rs6000/vsx.md: Likewise.
>>
>> gcc/testsuite/ChangeLog:
>>
>>      PR target/106069
>>      * g++.target/powerpc/pr106069.C: New test.
>>
>> Signed-off-by: Xionghu Luo <xionghuluo@tencent.com>
>> ---
>>   gcc/config/rs6000/altivec.md                | 223 ++++++++++++++------
>>   gcc/config/rs6000/rs6000.cc                 |  36 ++--
>>   gcc/config/rs6000/vsx.md                    |  24 +--
>>   gcc/testsuite/g++.target/powerpc/pr106069.C | 120 +++++++++++
>>   4 files changed, 305 insertions(+), 98 deletions(-)
>>   create mode 100644 gcc/testsuite/g++.target/powerpc/pr106069.C
>>
>> diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
>> index 2c4940f2e21..78245f470e9 100644
>> --- a/gcc/config/rs6000/altivec.md
>> +++ b/gcc/config/rs6000/altivec.md
>> @@ -1144,15 +1144,17 @@ (define_expand "altivec_vmrghb"
>>      (use (match_operand:V16QI 2 "register_operand"))]
>>     "TARGET_ALTIVEC"
>>   {
>> -  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrghb_direct
>> -                        : gen_altivec_vmrglb_direct;
>> -  if (!BYTES_BIG_ENDIAN)
>> -    std::swap (operands[1], operands[2]);
>> -  emit_insn (fun (operands[0], operands[1], operands[2]));
>> +  rtvec v = gen_rtvec (16, GEN_INT (0), GEN_INT (16), GEN_INT (1), GEN_INT (17),
>> +              GEN_INT (2), GEN_INT (18), GEN_INT (3), GEN_INT (19),
>> +              GEN_INT (4), GEN_INT (20), GEN_INT (5), GEN_INT (21),
>> +              GEN_INT (6), GEN_INT (22), GEN_INT (7), GEN_INT (23));
>> +  rtx x = gen_rtx_VEC_CONCAT (V32QImode, operands[1], operands[2]);
>> +  x = gen_rtx_VEC_SELECT (V16QImode, x, gen_rtx_PARALLEL (VOIDmode, v));
>> +  emit_insn (gen_rtx_SET (operands[0], x));
>>     DONE;
>>   })
> 
> I think you can just call gen_altivec_vmrghb_direct_be and
> gen_altivec_vmrghb_direct_le separately here.  Similar for some other
> define_expands.
> 
>>   
>> -(define_insn "altivec_vmrghb_direct"
>> +(define_insn "altivec_vmrghb_direct_be"
>>     [(set (match_operand:V16QI 0 "register_operand" "=v")
>>       (vec_select:V16QI
>>         (vec_concat:V32QI
>> @@ -1166,27 +1168,46 @@ (define_insn "altivec_vmrghb_direct"
>>                (const_int 5) (const_int 21)
>>                (const_int 6) (const_int 22)
>>                (const_int 7) (const_int 23)])))]
>> -  "TARGET_ALTIVEC"
>> +  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
>>     "vmrghb %0,%1,%2"
>>     [(set_attr "type" "vecperm")])
>>   
> 
> Could you move the following altivec_vmrghb_direct_le here?
> Then readers can easily check the difference between be and
> le for the same altivec_vmrghb_direct.
> 
> Same comment applied for some other similar cases.
> 
>> +(define_insn "altivec_vmrglb_direct_le"
>> +  [(set (match_operand:V16QI 0 "register_operand" "=v")
>> +    (vec_select:V16QI
>> +      (vec_concat:V32QI
>> +        (match_operand:V16QI 1 "register_operand" "v")
>> +        (match_operand:V16QI 2 "register_operand" "v"))
>> +      (parallel [(const_int 0) (const_int 16)
>> +             (const_int 1) (const_int 17)
>> +             (const_int 2) (const_int 18)
>> +             (const_int 3) (const_int 19)
>> +             (const_int 4) (const_int 20)
>> +             (const_int 5) (const_int 21)
>> +             (const_int 6) (const_int 22)
>> +             (const_int 7) (const_int 23)])))]
>> +  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
>> +  "vmrglb %0,%2,%1"
>> +  [(set_attr "type" "vecperm")])
> 
> Could you update this pattern for assembly "vmrglb %0,%1,%2"
> instead of "vmrglb %0,%2,%1"?  I checked the previous md
> before the culprit commit 0910c516a3d72af048, it emits
> "vmrglb %0,%1,%2" for altivec_vmrglb_direct.
> 
> Same comment applied for some other similar cases.
> 
>> +
>>   (define_expand "altivec_vmrghh"
>>     [(use (match_operand:V8HI 0 "register_operand"))
>>      (use (match_operand:V8HI 1 "register_operand"))
>>      (use (match_operand:V8HI 2 "register_operand"))]
>>     "TARGET_ALTIVEC"
>>   {
>> -  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrghh_direct
>> -                        : gen_altivec_vmrglh_direct;
>> -  if (!BYTES_BIG_ENDIAN)
>> -    std::swap (operands[1], operands[2]);
>> -  emit_insn (fun (operands[0], operands[1], operands[2]));
>> +  rtvec v = gen_rtvec (8, GEN_INT (0), GEN_INT (8), GEN_INT (1), GEN_INT (9),
>> +              GEN_INT (2), GEN_INT (10), GEN_INT (3), GEN_INT (11));
>> +  rtx x = gen_rtx_VEC_CONCAT (V16HImode, operands[1], operands[2]);
>> +
>> +  x = gen_rtx_VEC_SELECT (V8HImode, x, gen_rtx_PARALLEL (VOIDmode, v));
>> +  emit_insn (gen_rtx_SET (operands[0], x));
>>     DONE;
>>   })
>>   
>> -(define_insn "altivec_vmrghh_direct"
>> +(define_insn "altivec_vmrghh_direct_be"
>>     [(set (match_operand:V8HI 0 "register_operand" "=v")
>> -        (vec_select:V8HI
>> +    (vec_select:V8HI
>>         (vec_concat:V16HI
>>           (match_operand:V8HI 1 "register_operand" "v")
>>           (match_operand:V8HI 2 "register_operand" "v"))
>> @@ -1194,26 +1215,38 @@ (define_insn "altivec_vmrghh_direct"
>>                (const_int 1) (const_int 9)
>>                (const_int 2) (const_int 10)
>>                (const_int 3) (const_int 11)])))]
>> -  "TARGET_ALTIVEC"
>> +  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
>>     "vmrghh %0,%1,%2"
>>     [(set_attr "type" "vecperm")])
>>   
>> +(define_insn "altivec_vmrglh_direct_le"
>> +  [(set (match_operand:V8HI 0 "register_operand" "=v")
>> +    (vec_select:V8HI
>> +      (vec_concat:V16HI
>> +        (match_operand:V8HI 1 "register_operand" "v")
>> +        (match_operand:V8HI 2 "register_operand" "v"))
>> +      (parallel [(const_int 0) (const_int 8)
>> +             (const_int 1) (const_int 9)
>> +             (const_int 2) (const_int 10)
>> +             (const_int 3) (const_int 11)])))]
>> +  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
>> +  "vmrglh %0,%2,%1"
>> +  [(set_attr "type" "vecperm")])
>> +
>>   (define_expand "altivec_vmrghw"
>>     [(use (match_operand:V4SI 0 "register_operand"))
>>      (use (match_operand:V4SI 1 "register_operand"))
>>      (use (match_operand:V4SI 2 "register_operand"))]
>>     "VECTOR_MEM_ALTIVEC_P (V4SImode)"
>>   {
>> -  rtx (*fun) (rtx, rtx, rtx);
>> -  fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrghw_direct_v4si
>> -             : gen_altivec_vmrglw_direct_v4si;
>> -  if (!BYTES_BIG_ENDIAN)
>> -    std::swap (operands[1], operands[2]);
>> -  emit_insn (fun (operands[0], operands[1], operands[2]));
>> +  rtvec v = gen_rtvec (4, GEN_INT (0), GEN_INT (4), GEN_INT (1), GEN_INT (5));
>> +  rtx x = gen_rtx_VEC_CONCAT (V8SImode, operands[1], operands[2]);
>> +  x = gen_rtx_VEC_SELECT (V4SImode, x, gen_rtx_PARALLEL (VOIDmode, v));
>> +  emit_insn (gen_rtx_SET (operands[0], x));
>>     DONE;
>>   })
>>   
>> -(define_insn "altivec_vmrghw_direct_<mode>"
>> +(define_insn "altivec_vmrghw_direct_<mode>_be"
>>     [(set (match_operand:VSX_W 0 "register_operand" "=wa,v")
>>       (vec_select:VSX_W
>>         (vec_concat:<VS_double>
>> @@ -1221,10 +1254,24 @@ (define_insn "altivec_vmrghw_direct_<mode>"
>>           (match_operand:VSX_W 2 "register_operand" "wa,v"))
>>         (parallel [(const_int 0) (const_int 4)
>>                (const_int 1) (const_int 5)])))]
>> -  "TARGET_ALTIVEC"
>> +  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
>> +  "@
>> +  xxmrghw %x0,%x1,%x2
>> +  vmrghw %0,%1,%2"
>> +  [(set_attr "type" "vecperm")])
>> +
>> +(define_insn "altivec_vmrglw_direct_<mode>_le"
>> +  [(set (match_operand:VSX_W 0 "register_operand" "=wa,v")
>> +    (vec_select:VSX_W
>> +      (vec_concat:<VS_double>
>> +        (match_operand:VSX_W 1 "register_operand" "wa,v")
>> +        (match_operand:VSX_W 2 "register_operand" "wa,v"))
>> +      (parallel [(const_int 0) (const_int 4)
>> +             (const_int 1) (const_int 5)])))]
>> +  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
>>     "@
>> -   xxmrghw %x0,%x1,%x2
>> -   vmrghw %0,%1,%2"
>> +  xxmrglw %x0,%x2,%x1
>> +  vmrglw %0,%2,%1"
>>     [(set_attr "type" "vecperm")])
>>   
>>   (define_insn "*altivec_vmrghsf"
>> @@ -1250,15 +1297,17 @@ (define_expand "altivec_vmrglb"
>>      (use (match_operand:V16QI 2 "register_operand"))]
>>     "TARGET_ALTIVEC"
>>   {
>> -  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrglb_direct
>> -                        : gen_altivec_vmrghb_direct;
>> -  if (!BYTES_BIG_ENDIAN)
>> -    std::swap (operands[1], operands[2]);
>> -  emit_insn (fun (operands[0], operands[1], operands[2]));
>> +  rtvec v = gen_rtvec (16, GEN_INT (8), GEN_INT (24), GEN_INT (9), GEN_INT (25),
>> +              GEN_INT (10), GEN_INT (26), GEN_INT (11), GEN_INT (27),
>> +              GEN_INT (12), GEN_INT (28), GEN_INT (13), GEN_INT (29),
>> +              GEN_INT (14), GEN_INT (30), GEN_INT (15), GEN_INT (31));
>> +  rtx x = gen_rtx_VEC_CONCAT (V32QImode, operands[1], operands[2]);
>> +  x = gen_rtx_VEC_SELECT (V16QImode, x, gen_rtx_PARALLEL (VOIDmode, v));
>> +  emit_insn (gen_rtx_SET (operands[0], x));
>>     DONE;
>>   })
>>   
>> -(define_insn "altivec_vmrglb_direct"
>> +(define_insn "altivec_vmrglb_direct_be"
>>     [(set (match_operand:V16QI 0 "register_operand" "=v")
>>       (vec_select:V16QI
>>         (vec_concat:V32QI
>> @@ -1272,25 +1321,43 @@ (define_insn "altivec_vmrglb_direct"
>>                (const_int 13) (const_int 29)
>>                (const_int 14) (const_int 30)
>>                (const_int 15) (const_int 31)])))]
>> -  "TARGET_ALTIVEC"
>> +  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
>>     "vmrglb %0,%1,%2"
>>     [(set_attr "type" "vecperm")])
>>   
>> +(define_insn "altivec_vmrghb_direct_le"
>> +  [(set (match_operand:V16QI 0 "register_operand" "=v")
>> +    (vec_select:V16QI
>> +      (vec_concat:V32QI
>> +        (match_operand:V16QI 1 "register_operand" "v")
>> +        (match_operand:V16QI 2 "register_operand" "v"))
>> +      (parallel [(const_int  8) (const_int 24)
>> +             (const_int  9) (const_int 25)
>> +             (const_int 10) (const_int 26)
>> +             (const_int 11) (const_int 27)
>> +             (const_int 12) (const_int 28)
>> +             (const_int 13) (const_int 29)
>> +             (const_int 14) (const_int 30)
>> +             (const_int 15) (const_int 31)])))]
>> +  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
>> +  "vmrghb %0,%2,%1"
>> +  [(set_attr "type" "vecperm")])
>> +
>>   (define_expand "altivec_vmrglh"
>>     [(use (match_operand:V8HI 0 "register_operand"))
>>      (use (match_operand:V8HI 1 "register_operand"))
>>      (use (match_operand:V8HI 2 "register_operand"))]
>>     "TARGET_ALTIVEC"
>>   {
>> -  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrglh_direct
>> -                        : gen_altivec_vmrghh_direct;
>> -  if (!BYTES_BIG_ENDIAN)
>> -    std::swap (operands[1], operands[2]);
>> -  emit_insn (fun (operands[0], operands[1], operands[2]));
>> +  rtvec v = gen_rtvec (8, GEN_INT (4), GEN_INT (12), GEN_INT (5), GEN_INT (13),
>> +              GEN_INT (6), GEN_INT (14), GEN_INT (7), GEN_INT (15));
>> +  rtx x = gen_rtx_VEC_CONCAT (V16HImode, operands[1], operands[2]);
>> +  x = gen_rtx_VEC_SELECT (V8HImode, x, gen_rtx_PARALLEL (VOIDmode, v));
>> +  emit_insn (gen_rtx_SET (operands[0], x));
>>     DONE;
>>   })
>>   
>> -(define_insn "altivec_vmrglh_direct"
>> +(define_insn "altivec_vmrglh_direct_be"
>>     [(set (match_operand:V8HI 0 "register_operand" "=v")
>>           (vec_select:V8HI
>>         (vec_concat:V16HI
>> @@ -1300,26 +1367,38 @@ (define_insn "altivec_vmrglh_direct"
>>                (const_int 5) (const_int 13)
>>                (const_int 6) (const_int 14)
>>                (const_int 7) (const_int 15)])))]
>> -  "TARGET_ALTIVEC"
>> +  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
>>     "vmrglh %0,%1,%2"
>>     [(set_attr "type" "vecperm")])
>>   
>> +(define_insn "altivec_vmrghh_direct_le"
>> +  [(set (match_operand:V8HI 0 "register_operand" "=v")
>> +        (vec_select:V8HI
>> +      (vec_concat:V16HI
>> +        (match_operand:V8HI 1 "register_operand" "v")
>> +        (match_operand:V8HI 2 "register_operand" "v"))
>> +      (parallel [(const_int 4) (const_int 12)
>> +             (const_int 5) (const_int 13)
>> +             (const_int 6) (const_int 14)
>> +             (const_int 7) (const_int 15)])))]
>> +  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
>> +  "vmrghh %0,%2,%1"
>> +  [(set_attr "type" "vecperm")])
>> +
>>   (define_expand "altivec_vmrglw"
>>     [(use (match_operand:V4SI 0 "register_operand"))
>>      (use (match_operand:V4SI 1 "register_operand"))
>>      (use (match_operand:V4SI 2 "register_operand"))]
>>     "VECTOR_MEM_ALTIVEC_P (V4SImode)"
>>   {
>> -  rtx (*fun) (rtx, rtx, rtx);
>> -  fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrglw_direct_v4si
>> -             : gen_altivec_vmrghw_direct_v4si;
>> -  if (!BYTES_BIG_ENDIAN)
>> -    std::swap (operands[1], operands[2]);
>> -  emit_insn (fun (operands[0], operands[1], operands[2]));
>> +  rtvec v = gen_rtvec (4, GEN_INT (2), GEN_INT (6), GEN_INT (3), GEN_INT (7));
>> +  rtx x = gen_rtx_VEC_CONCAT (V8SImode, operands[1], operands[2]);
>> +  x = gen_rtx_VEC_SELECT (V4SImode, x, gen_rtx_PARALLEL (VOIDmode, v));
>> +  emit_insn (gen_rtx_SET (operands[0], x));
>>     DONE;
>>   })
>>   
>> -(define_insn "altivec_vmrglw_direct_<mode>"
>> +(define_insn "altivec_vmrglw_direct_<mode>_be"
>>     [(set (match_operand:VSX_W 0 "register_operand" "=wa,v")
>>       (vec_select:VSX_W
>>         (vec_concat:<VS_double>
>> @@ -1327,10 +1406,24 @@ (define_insn "altivec_vmrglw_direct_<mode>"
>>           (match_operand:VSX_W 2 "register_operand" "wa,v"))
>>         (parallel [(const_int 2) (const_int 6)
>>                (const_int 3) (const_int 7)])))]
>> -  "TARGET_ALTIVEC"
>> +  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
>> +  "@
>> +  xxmrglw %x0,%x1,%x2
>> +  vmrglw %0,%1,%2"
>> +  [(set_attr "type" "vecperm")])
>> +
>> +(define_insn "altivec_vmrghw_direct_<mode>_le"
>> +  [(set (match_operand:VSX_W 0 "register_operand" "=wa,v")
>> +    (vec_select:VSX_W
>> +      (vec_concat:<VS_double>
>> +        (match_operand:VSX_W 1 "register_operand" "wa,v")
>> +        (match_operand:VSX_W 2 "register_operand" "wa,v"))
>> +      (parallel [(const_int 2) (const_int 6)
>> +             (const_int 3) (const_int 7)])))]
>> +  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
>>     "@
>> -   xxmrglw %x0,%x1,%x2
>> -   vmrglw %0,%1,%2"
>> +  xxmrghw %x0,%x2,%x1
>> +  vmrghw %0,%2,%1"
>>     [(set_attr "type" "vecperm")])
>>   
>>   (define_insn "*altivec_vmrglsf"
>> @@ -3699,13 +3792,13 @@ (define_expand "vec_widen_umult_hi_v16qi"
>>       {
>>         emit_insn (gen_altivec_vmuleub (ve, operands[1], operands[2]));
>>         emit_insn (gen_altivec_vmuloub (vo, operands[1], operands[2]));
>> -      emit_insn (gen_altivec_vmrghh_direct (operands[0], ve, vo));
>> +      emit_insn (gen_altivec_vmrghh_direct_be (operands[0], ve, vo));
>>       }
>>     else
>>       {
>>         emit_insn (gen_altivec_vmuloub (ve, operands[1], operands[2]));
>>         emit_insn (gen_altivec_vmuleub (vo, operands[1], operands[2]));
>> -      emit_insn (gen_altivec_vmrghh_direct (operands[0], vo, ve));
>> +      emit_insn (gen_altivec_vmrghh_direct_le (operands[0], ve, vo));
> 
> Note that if you change assembly "vmrghh %0,%2,%1" to "vmrghh %0,%1,%2",
> you need to change this to:
> 
>    emit_insn (gen_altivec_vmrghh_direct_le (operands[0], vo, ve));
> 
> Same comment applied for some other similar cases.
> 
>>       }
>>     DONE;
>>   })
>> @@ -3724,13 +3817,13 @@ (define_expand "vec_widen_umult_lo_v16qi"
>>       {
>>         emit_insn (gen_altivec_vmuleub (ve, operands[1], operands[2]));
>>         emit_insn (gen_altivec_vmuloub (vo, operands[1], operands[2]));
>> -      emit_insn (gen_altivec_vmrglh_direct (operands[0], ve, vo));
>> +      emit_insn (gen_altivec_vmrglh_direct_be (operands[0], ve, vo));
>>       }
>>     else
>>       {
>>         emit_insn (gen_altivec_vmuloub (ve, operands[1], operands[2]));
>>         emit_insn (gen_altivec_vmuleub (vo, operands[1], operands[2]));
>> -      emit_insn (gen_altivec_vmrglh_direct (operands[0], vo, ve));
>> +      emit_insn (gen_altivec_vmrglh_direct_le (operands[0], ve, vo));
>>       }
>>     DONE;
>>   })
>> @@ -3749,13 +3842,13 @@ (define_expand "vec_widen_smult_hi_v16qi"
>>       {
>>         emit_insn (gen_altivec_vmulesb (ve, operands[1], operands[2]));
>>         emit_insn (gen_altivec_vmulosb (vo, operands[1], operands[2]));
>> -      emit_insn (gen_altivec_vmrghh_direct (operands[0], ve, vo));
>> +      emit_insn (gen_altivec_vmrghh_direct_be (operands[0], ve, vo));
>>       }
>>     else
>>       {
>>         emit_insn (gen_altivec_vmulosb (ve, operands[1], operands[2]));
>>         emit_insn (gen_altivec_vmulesb (vo, operands[1], operands[2]));
>> -      emit_insn (gen_altivec_vmrghh_direct (operands[0], vo, ve));
>> +      emit_insn (gen_altivec_vmrghh_direct_le (operands[0], ve, vo));
>>       }
>>     DONE;
>>   })
>> @@ -3774,13 +3867,13 @@ (define_expand "vec_widen_smult_lo_v16qi"
>>       {
>>         emit_insn (gen_altivec_vmulesb (ve, operands[1], operands[2]));
>>         emit_insn (gen_altivec_vmulosb (vo, operands[1], operands[2]));
>> -      emit_insn (gen_altivec_vmrglh_direct (operands[0], ve, vo));
>> +      emit_insn (gen_altivec_vmrglh_direct_be (operands[0], ve, vo));
>>       }
>>     else
>>       {
>>         emit_insn (gen_altivec_vmulosb (ve, operands[1], operands[2]));
>>         emit_insn (gen_altivec_vmulesb (vo, operands[1], operands[2]));
>> -      emit_insn (gen_altivec_vmrglh_direct (operands[0], vo, ve));
>> +      emit_insn (gen_altivec_vmrglh_direct_le (operands[0], ve, vo));
>>       }
>>     DONE;
>>   })
>> @@ -3799,13 +3892,13 @@ (define_expand "vec_widen_umult_hi_v8hi"
>>       {
>>         emit_insn (gen_altivec_vmuleuh (ve, operands[1], operands[2]));
>>         emit_insn (gen_altivec_vmulouh (vo, operands[1], operands[2]));
>> -      emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], ve, vo));
>> +      emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0], ve, vo));
>>       }
>>     else
>>       {
>>         emit_insn (gen_altivec_vmulouh (ve, operands[1], operands[2]));
>>         emit_insn (gen_altivec_vmuleuh (vo, operands[1], operands[2]));
>> -      emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], vo, ve));
>> +      emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0], ve, vo));
>>       }
>>     DONE;
>>   })
>> @@ -3824,13 +3917,13 @@ (define_expand "vec_widen_umult_lo_v8hi"
>>       {
>>         emit_insn (gen_altivec_vmuleuh (ve, operands[1], operands[2]));
>>         emit_insn (gen_altivec_vmulouh (vo, operands[1], operands[2]));
>> -      emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], ve, vo));
>> +      emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0], ve, vo));
>>       }
>>     else
>>       {
>>         emit_insn (gen_altivec_vmulouh (ve, operands[1], operands[2]));
>>         emit_insn (gen_altivec_vmuleuh (vo, operands[1], operands[2]));
>> -      emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], vo, ve));
>> +      emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0], ve, vo));
>>       }
>>     DONE;
>>   })
>> @@ -3849,13 +3942,13 @@ (define_expand "vec_widen_smult_hi_v8hi"
>>       {
>>         emit_insn (gen_altivec_vmulesh (ve, operands[1], operands[2]));
>>         emit_insn (gen_altivec_vmulosh (vo, operands[1], operands[2]));
>> -      emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], ve, vo));
>> +      emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0], ve, vo));
>>       }
>>     else
>>       {
>>         emit_insn (gen_altivec_vmulosh (ve, operands[1], operands[2]));
>>         emit_insn (gen_altivec_vmulesh (vo, operands[1], operands[2]));
>> -      emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], vo, ve));
>> +      emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0], ve, vo));
>>       }
>>     DONE;
>>   })
>> @@ -3874,13 +3967,13 @@ (define_expand "vec_widen_smult_lo_v8hi"
>>       {
>>         emit_insn (gen_altivec_vmulesh (ve, operands[1], operands[2]));
>>         emit_insn (gen_altivec_vmulosh (vo, operands[1], operands[2]));
>> -      emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], ve, vo));
>> +      emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0], ve, vo));
>>       }
>>     else
>>       {
>>         emit_insn (gen_altivec_vmulosh (ve, operands[1], operands[2]));
>>         emit_insn (gen_altivec_vmulesh (vo, operands[1], operands[2]));
>> -      emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], vo, ve));
>> +      emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0], ve, vo));
>>       }
>>     DONE;
>>   })
>> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
>> index df491bee2ea..97da7706f63 100644
>> --- a/gcc/config/rs6000/rs6000.cc
>> +++ b/gcc/config/rs6000/rs6000.cc
>> @@ -22941,29 +22941,17 @@ altivec_expand_vec_perm_const (rtx target, rtx op0, rtx op1,
>>       {OPTION_MASK_ALTIVEC,
>>        CODE_FOR_altivec_vpkuwum_direct,
>>        {2, 3, 6, 7, 10, 11, 14, 15, 18, 19, 22, 23, 26, 27, 30, 31}},
>> -    {OPTION_MASK_ALTIVEC,
>> -     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghb_direct
>> -              : CODE_FOR_altivec_vmrglb_direct,
>> +    {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghb_direct_be,
>>        {0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23}},
> 
> Before the culprit commit 0910c516a3d72af04, we have:
> 
>      { OPTION_MASK_ALTIVEC,
>        (BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghb_direct
>         : CODE_FOR_altivec_vmrglb_direct),
>        {  0, 16,  1, 17,  2, 18,  3, 19,  4, 20,  5, 21,  6, 22,  7, 23 } },
> 
> I think we should use:
> 
>      { OPTION_MASK_ALTIVEC,
>        (BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghb_direct_be
>         : CODE_FOR_altivec_vmrglb_direct_le),
>        {  0, 16,  1, 17,  2, 18,  3, 19,  4, 20,  5, 21,  6, 22,  7, 23 } },
> 
> here instead.  Similar comment for those related below.
> 
>> -    {OPTION_MASK_ALTIVEC,
>> -     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghh_direct
>> -              : CODE_FOR_altivec_vmrglh_direct,
>> +    {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghh_direct_be,
>>        {0, 1, 16, 17, 2, 3, 18, 19, 4, 5, 20, 21, 6, 7, 22, 23}},
>> -    {OPTION_MASK_ALTIVEC,
>> -     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghw_direct_v4si
>> -              : CODE_FOR_altivec_vmrglw_direct_v4si,
>> +    {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghw_direct_v4si_be,
>>        {0, 1, 2, 3, 16, 17, 18, 19, 4, 5, 6, 7, 20, 21, 22, 23}},
>> -    {OPTION_MASK_ALTIVEC,
>> -     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglb_direct
>> -              : CODE_FOR_altivec_vmrghb_direct,
>> +    {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglb_direct_be,
>>        {8, 24, 9, 25, 10, 26, 11, 27, 12, 28, 13, 29, 14, 30, 15, 31}},
>> -    {OPTION_MASK_ALTIVEC,
>> -     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglh_direct
>> -              : CODE_FOR_altivec_vmrghh_direct,
>> +    {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglh_direct_be,
>>        {8, 9, 24, 25, 10, 11, 26, 27, 12, 13, 28, 29, 14, 15, 30, 31}},
>> -    {OPTION_MASK_ALTIVEC,
>> -     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglw_direct_v4si
>> -              : CODE_FOR_altivec_vmrghw_direct_v4si,
>> +    {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglw_direct_v4si_be,
>>        {8, 9, 10, 11, 24, 25, 26, 27, 12, 13, 14, 15, 28, 29, 30, 31}},
>>       {OPTION_MASK_P8_VECTOR,
>>        BYTES_BIG_ENDIAN ? CODE_FOR_p8_vmrgew_v4sf_direct
>> @@ -23146,9 +23134,15 @@ altivec_expand_vec_perm_const (rtx target, rtx op0, rtx op1,
>>   
>>             /* For little-endian, the two input operands must be swapped
>>                (or swapped back) to ensure proper right-to-left numbering
>> -             from 0 to 2N-1.  */
>> -      if (swapped ^ !BYTES_BIG_ENDIAN
>> -          && icode != CODE_FOR_vsx_xxpermdi_v16qi)
>> +         from 0 to 2N-1.  Excludes the vmrg[lh][bhw] and xxpermdi ops.  */
>> +      if (swapped ^ !BYTES_BIG_ENDIAN)
>> +        if (!(icode == CODE_FOR_altivec_vmrghb_direct_be
>> +          || icode == CODE_FOR_altivec_vmrglb_direct_be
>> +          || icode == CODE_FOR_altivec_vmrghh_direct_be
>> +          || icode == CODE_FOR_altivec_vmrglh_direct_be
>> +          || icode == CODE_FOR_altivec_vmrghw_direct_v4si_be
>> +          || icode == CODE_FOR_altivec_vmrglw_direct_v4si_be
>> +          || icode == CODE_FOR_vsx_xxpermdi_v16qi))
>>           std::swap (op0, op1);
> 
> IIUC, we don't need this part of change once we fix the operand order in
> the assembly for those LE "direct"s.
> 
> BR,
> Kewen
> 

Thanks.  All the comments are addressed in v4.


v4: Update per comments.
v3: Rename altivec_vmrghb_direct_le to altivec_vmrglb_direct_le to match
the actual output ASM vmrglb.  Likewise for all similar xxx_direct_le
patterns.
v2: Split the direct pattern into BE and LE variants with the same RTL
but different insns.

The native RTL expression for vec_mrghw should be the same for BE and LE,
as it is register- and endian-independent.  So both BE and LE need to
generate exactly the same RTL, with index [0 4 1 5], when expanding
vec_mrghw with vec_select and vec_concat.

(set (reg:V4SI 141) (vec_select:V4SI (vec_concat:V8SI
		   (subreg:V4SI (reg:V16QI 139) 0)
		   (subreg:V4SI (reg:V16QI 140) 0))
		   [const_int 0 4 1 5]))
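
For illustration only (this is not part of the patch): the selectors used
by the merge patterns follow one simple rule, which a small standalone
C++ sketch can reproduce.  The name merge_selector is hypothetical; the
index lists it yields are the ones that appear in the altivec.md patterns
touched by this patch ([0 4 1 5], [2 6 3 7], [0 16 1 17 ...],
[8 24 9 25 ...]).

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper, not GCC code.  For two N-element input vectors that
// are concatenated into a 2N-element vector (first input in lanes 0..N-1,
// second input in lanes N..2N-1), build the vec_select index list that
// interleaves one half of each input.
//   low == false: first halves,  e.g. N = 4 gives [0 4 1 5]
//   low == true : second halves, e.g. N = 4 gives [2 6 3 7]
std::vector<std::size_t> merge_selector(std::size_t n, bool low) {
  std::vector<std::size_t> sel;
  const std::size_t base = low ? n / 2 : 0;
  for (std::size_t i = 0; i < n / 2; ++i) {
    sel.push_back(base + i);      // element taken from the first input
    sel.push_back(n + base + i);  // element taken from the second input
  }
  return sel;
}
```

Nothing in this rule depends on the endianness, which is the point made
above: the index list is a property of the RTL, not of the target byte
order.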

The combine pass can then perform the nested vec_select optimization
in simplify-rtx.cc:simplify_binary_operation_1 on both BE and LE:

21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel [0 4 1 5])
24: {r151:SI=vec_select(r150:V4SI,parallel [const_int 3]);}

=>

21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel)
24: {r151:SI=vec_select(r146:V4SI,parallel [const_int 1]);}

The endianness check is then needed only once, at final ASM generation.
The generated ASM is better because the nested vec_select is simplified
to a simple scalar load.
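
As a rough illustration of why the fold from insn 24 above is valid, here
is a toy C++ model (an assumption-laden sketch, not the simplify-rtx
implementation).  V4, V8, Source, vec_concat, vec_select4, fold_extract
and kMergeHigh are hypothetical names introduced only for this example.

```cpp
#include <array>
#include <cstddef>

// Toy model, not GCC code: a V4SI value is four ints, and vec_concat of
// two V4SI values is eight ints with the first operand in lanes 0..3 and
// the second operand in lanes 4..7.
using V4 = std::array<int, 4>;
using V8 = std::array<int, 8>;

constexpr V8 vec_concat(const V4 &a, const V4 &b) {
  return {a[0], a[1], a[2], a[3], b[0], b[1], b[2], b[3]};
}

constexpr V4 vec_select4(const V8 &v, const std::array<std::size_t, 4> &sel) {
  return {v[sel[0]], v[sel[1]], v[sel[2]], v[sel[3]]};
}

// Result of folding "extract lane OUTER of a selected concat": which input
// operand (0 or 1) the scalar comes from, and which lane of that operand.
struct Source {
  int operand;
  std::size_t lane;
};

constexpr Source fold_extract(const std::array<std::size_t, 4> &sel,
                              std::size_t outer) {
  return sel[outer] < 4 ? Source{0, sel[outer]} : Source{1, sel[outer] - 4};
}

// The selector from insn 21.
constexpr std::array<std::size_t, 4> kMergeHigh = {0, 4, 1, 5};

// Lane 3 of the merged value is lane 1 of the second operand, which is the
// rewrite shown for insn 24 (r150 lane 3 becomes r146 lane 1).
static_assert(fold_extract(kMergeHigh, 3).operand == 1, "second operand");
static_assert(fold_extract(kMergeHigh, 3).lane == 1, "lane 1");
```

The model only covers the index arithmetic; which instruction and operand
order are finally printed is still decided by the BE/LE insn patterns.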

Regression testing passed on Power8{LE,BE}{32,64} and
Power{9,10}LE{32,64} Linux.

gcc/ChangeLog:

	PR target/106069
	* config/rs6000/altivec.md (altivec_vmrghb_direct): Remove.
	(altivec_vmrghb_direct_be): New pattern for BE.
	(altivec_vmrghb_direct_le): New pattern for LE.
	(altivec_vmrghh_direct): Remove.
	(altivec_vmrghh_direct_be): New pattern for BE.
	(altivec_vmrghh_direct_le): New pattern for LE.
	(altivec_vmrghw_direct_<mode>): Remove.
	(altivec_vmrghw_direct_<mode>_be): New pattern for BE.
	(altivec_vmrghw_direct_<mode>_le): New pattern for LE.
	(altivec_vmrglb_direct): Remove.
	(altivec_vmrglb_direct_be): New pattern for BE.
	(altivec_vmrglb_direct_le): New pattern for LE.
	(altivec_vmrglh_direct): Remove.
	(altivec_vmrglh_direct_be): New pattern for BE.
	(altivec_vmrglh_direct_le): New pattern for LE.
	(altivec_vmrglw_direct_<mode>): Remove.
	(altivec_vmrglw_direct_<mode>_be): New pattern for BE.
	(altivec_vmrglw_direct_<mode>_le): New pattern for LE.
	* config/rs6000/rs6000.cc (altivec_expand_vec_perm_const):
	Adjust.
	* config/rs6000/vsx.md: Likewise.

gcc/testsuite/ChangeLog:

	PR target/106069
	* g++.target/powerpc/pr106069.C: New test.

Signed-off-by: Xionghu Luo <xionghuluo@tencent.com>
---
  gcc/config/rs6000/altivec.md                | 230 ++++++++++++++------
  gcc/config/rs6000/rs6000.cc                 |  24 +-
  gcc/config/rs6000/vsx.md                    |  28 ++-
  gcc/testsuite/g++.target/powerpc/pr106069.C | 120 ++++++++++
  4 files changed, 313 insertions(+), 89 deletions(-)
  create mode 100644 gcc/testsuite/g++.target/powerpc/pr106069.C

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 2c4940f2e21..962df4657e6 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -1144,15 +1144,16 @@ (define_expand "altivec_vmrghb"
     (use (match_operand:V16QI 2 "register_operand"))]
    "TARGET_ALTIVEC"
  {
-  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrghb_direct
-						: gen_altivec_vmrglb_direct;
-  if (!BYTES_BIG_ENDIAN)
-    std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (
+      gen_altivec_vmrghb_direct_be (operands[0], operands[1], operands[2]));
+  else
+    emit_insn (
+      gen_altivec_vmrglb_direct_le (operands[0], operands[2], operands[1]));
    DONE;
  })
  
-(define_insn "altivec_vmrghb_direct"
+(define_insn "altivec_vmrghb_direct_be"
    [(set (match_operand:V16QI 0 "register_operand" "=v")
  	(vec_select:V16QI
  	  (vec_concat:V32QI
@@ -1166,7 +1167,25 @@ (define_insn "altivec_vmrghb_direct"
  		     (const_int 5) (const_int 21)
  		     (const_int 6) (const_int 22)
  		     (const_int 7) (const_int 23)])))]
-  "TARGET_ALTIVEC"
+  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
+  "vmrghb %0,%1,%2"
+  [(set_attr "type" "vecperm")])
+
+(define_insn "altivec_vmrghb_direct_le"
+  [(set (match_operand:V16QI 0 "register_operand" "=v")
+	(vec_select:V16QI
+	  (vec_concat:V32QI
+	    (match_operand:V16QI 2 "register_operand" "v")
+	    (match_operand:V16QI 1 "register_operand" "v"))
+	  (parallel [(const_int  8) (const_int 24)
+		     (const_int  9) (const_int 25)
+		     (const_int 10) (const_int 26)
+		     (const_int 11) (const_int 27)
+		     (const_int 12) (const_int 28)
+		     (const_int 13) (const_int 29)
+		     (const_int 14) (const_int 30)
+		     (const_int 15) (const_int 31)])))]
+  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
    "vmrghb %0,%1,%2"
    [(set_attr "type" "vecperm")])
  
@@ -1176,17 +1195,18 @@ (define_expand "altivec_vmrghh"
     (use (match_operand:V8HI 2 "register_operand"))]
    "TARGET_ALTIVEC"
  {
-  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrghh_direct
-						: gen_altivec_vmrglh_direct;
-  if (!BYTES_BIG_ENDIAN)
-    std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (
+      gen_altivec_vmrghh_direct_be (operands[0], operands[1], operands[2]));
+  else
+    emit_insn (
+      gen_altivec_vmrglh_direct_le (operands[0], operands[2], operands[1]));
    DONE;
  })
  
-(define_insn "altivec_vmrghh_direct"
+(define_insn "altivec_vmrghh_direct_be"
    [(set (match_operand:V8HI 0 "register_operand" "=v")
-        (vec_select:V8HI
+	(vec_select:V8HI
  	  (vec_concat:V16HI
  	    (match_operand:V8HI 1 "register_operand" "v")
  	    (match_operand:V8HI 2 "register_operand" "v"))
@@ -1194,7 +1214,21 @@ (define_insn "altivec_vmrghh_direct"
  		     (const_int 1) (const_int 9)
  		     (const_int 2) (const_int 10)
  		     (const_int 3) (const_int 11)])))]
-  "TARGET_ALTIVEC"
+  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
+  "vmrghh %0,%1,%2"
+  [(set_attr "type" "vecperm")])
+
+(define_insn "altivec_vmrghh_direct_le"
+  [(set (match_operand:V8HI 0 "register_operand" "=v")
+        (vec_select:V8HI
+	  (vec_concat:V16HI
+	    (match_operand:V8HI 2 "register_operand" "v")
+	    (match_operand:V8HI 1 "register_operand" "v"))
+	  (parallel [(const_int 4) (const_int 12)
+		     (const_int 5) (const_int 13)
+		     (const_int 6) (const_int 14)
+		     (const_int 7) (const_int 15)])))]
+  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
    "vmrghh %0,%1,%2"
    [(set_attr "type" "vecperm")])
  
@@ -1204,16 +1238,18 @@ (define_expand "altivec_vmrghw"
     (use (match_operand:V4SI 2 "register_operand"))]
    "VECTOR_MEM_ALTIVEC_P (V4SImode)"
  {
-  rtx (*fun) (rtx, rtx, rtx);
-  fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrghw_direct_v4si
-			 : gen_altivec_vmrglw_direct_v4si;
-  if (!BYTES_BIG_ENDIAN)
-    std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0],
+						  operands[1],
+						  operands[2]));
+  else
+    emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0],
+						  operands[2],
+						  operands[1]));
    DONE;
  })
  
-(define_insn "altivec_vmrghw_direct_<mode>"
+(define_insn "altivec_vmrghw_direct_<mode>_be"
    [(set (match_operand:VSX_W 0 "register_operand" "=wa,v")
  	(vec_select:VSX_W
  	  (vec_concat:<VS_double>
@@ -1221,10 +1257,24 @@ (define_insn "altivec_vmrghw_direct_<mode>"
  	    (match_operand:VSX_W 2 "register_operand" "wa,v"))
  	  (parallel [(const_int 0) (const_int 4)
  		     (const_int 1) (const_int 5)])))]
-  "TARGET_ALTIVEC"
+  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
+  "@
+  xxmrghw %x0,%x1,%x2
+  vmrghw %0,%1,%2"
+  [(set_attr "type" "vecperm")])
+
+(define_insn "altivec_vmrghw_direct_<mode>_le"
+  [(set (match_operand:VSX_W 0 "register_operand" "=wa,v")
+	(vec_select:VSX_W
+	  (vec_concat:<VS_double>
+	    (match_operand:VSX_W 2 "register_operand" "wa,v")
+	    (match_operand:VSX_W 1 "register_operand" "wa,v"))
+	  (parallel [(const_int 2) (const_int 6)
+		     (const_int 3) (const_int 7)])))]
+  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
    "@
-   xxmrghw %x0,%x1,%x2
-   vmrghw %0,%1,%2"
+  xxmrghw %x0,%x1,%x2
+  vmrghw %0,%1,%2"
    [(set_attr "type" "vecperm")])
  
  (define_insn "*altivec_vmrghsf"
@@ -1250,15 +1300,16 @@ (define_expand "altivec_vmrglb"
     (use (match_operand:V16QI 2 "register_operand"))]
    "TARGET_ALTIVEC"
  {
-  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrglb_direct
-						: gen_altivec_vmrghb_direct;
-  if (!BYTES_BIG_ENDIAN)
-    std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (
+      gen_altivec_vmrglb_direct_be (operands[0], operands[1], operands[2]));
+  else
+    emit_insn (
+      gen_altivec_vmrghb_direct_le (operands[0], operands[2], operands[1]));
    DONE;
  })
  
-(define_insn "altivec_vmrglb_direct"
+(define_insn "altivec_vmrglb_direct_be"
    [(set (match_operand:V16QI 0 "register_operand" "=v")
  	(vec_select:V16QI
  	  (vec_concat:V32QI
@@ -1272,7 +1323,25 @@ (define_insn "altivec_vmrglb_direct"
  		     (const_int 13) (const_int 29)
  		     (const_int 14) (const_int 30)
  		     (const_int 15) (const_int 31)])))]
-  "TARGET_ALTIVEC"
+  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
+  "vmrglb %0,%1,%2"
+  [(set_attr "type" "vecperm")])
+
+(define_insn "altivec_vmrglb_direct_le"
+  [(set (match_operand:V16QI 0 "register_operand" "=v")
+	(vec_select:V16QI
+	  (vec_concat:V32QI
+	    (match_operand:V16QI 2 "register_operand" "v")
+	    (match_operand:V16QI 1 "register_operand" "v"))
+	  (parallel [(const_int 0) (const_int 16)
+		     (const_int 1) (const_int 17)
+		     (const_int 2) (const_int 18)
+		     (const_int 3) (const_int 19)
+		     (const_int 4) (const_int 20)
+		     (const_int 5) (const_int 21)
+		     (const_int 6) (const_int 22)
+		     (const_int 7) (const_int 23)])))]
+  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
    "vmrglb %0,%1,%2"
    [(set_attr "type" "vecperm")])
  
@@ -1282,15 +1351,16 @@ (define_expand "altivec_vmrglh"
     (use (match_operand:V8HI 2 "register_operand"))]
    "TARGET_ALTIVEC"
  {
-  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrglh_direct
-						: gen_altivec_vmrghh_direct;
-  if (!BYTES_BIG_ENDIAN)
-    std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (
+      gen_altivec_vmrglh_direct_be (operands[0], operands[1], operands[2]));
+  else
+    emit_insn (
+      gen_altivec_vmrghh_direct_le (operands[0], operands[2], operands[1]));
    DONE;
  })
  
-(define_insn "altivec_vmrglh_direct"
+(define_insn "altivec_vmrglh_direct_be"
    [(set (match_operand:V8HI 0 "register_operand" "=v")
          (vec_select:V8HI
  	  (vec_concat:V16HI
@@ -1300,7 +1370,21 @@ (define_insn "altivec_vmrglh_direct"
  		     (const_int 5) (const_int 13)
  		     (const_int 6) (const_int 14)
  		     (const_int 7) (const_int 15)])))]
-  "TARGET_ALTIVEC"
+  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
+  "vmrglh %0,%1,%2"
+  [(set_attr "type" "vecperm")])
+
+(define_insn "altivec_vmrglh_direct_le"
+  [(set (match_operand:V8HI 0 "register_operand" "=v")
+	(vec_select:V8HI
+	  (vec_concat:V16HI
+	    (match_operand:V8HI 2 "register_operand" "v")
+	    (match_operand:V8HI 1 "register_operand" "v"))
+	  (parallel [(const_int 0) (const_int 8)
+		     (const_int 1) (const_int 9)
+		     (const_int 2) (const_int 10)
+		     (const_int 3) (const_int 11)])))]
+  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
    "vmrglh %0,%1,%2"
    [(set_attr "type" "vecperm")])
  
@@ -1310,16 +1394,18 @@ (define_expand "altivec_vmrglw"
     (use (match_operand:V4SI 2 "register_operand"))]
    "VECTOR_MEM_ALTIVEC_P (V4SImode)"
  {
-  rtx (*fun) (rtx, rtx, rtx);
-  fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrglw_direct_v4si
-			 : gen_altivec_vmrghw_direct_v4si;
-  if (!BYTES_BIG_ENDIAN)
-    std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0],
+						  operands[1],
+						  operands[2]));
+  else
+    emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0],
+						  operands[2],
+						  operands[1]));
    DONE;
  })
  
-(define_insn "altivec_vmrglw_direct_<mode>"
+(define_insn "altivec_vmrglw_direct_<mode>_be"
    [(set (match_operand:VSX_W 0 "register_operand" "=wa,v")
  	(vec_select:VSX_W
  	  (vec_concat:<VS_double>
@@ -1327,10 +1413,24 @@ (define_insn "altivec_vmrglw_direct_<mode>"
  	    (match_operand:VSX_W 2 "register_operand" "wa,v"))
  	  (parallel [(const_int 2) (const_int 6)
  		     (const_int 3) (const_int 7)])))]
-  "TARGET_ALTIVEC"
+  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
+  "@
+  xxmrglw %x0,%x1,%x2
+  vmrglw %0,%1,%2"
+  [(set_attr "type" "vecperm")])
+
+(define_insn "altivec_vmrglw_direct_<mode>_le"
+  [(set (match_operand:VSX_W 0 "register_operand" "=wa,v")
+	(vec_select:VSX_W
+	  (vec_concat:<VS_double>
+	    (match_operand:VSX_W 2 "register_operand" "wa,v")
+	    (match_operand:VSX_W 1 "register_operand" "wa,v"))
+	  (parallel [(const_int 0) (const_int 4)
+		     (const_int 1) (const_int 5)])))]
+  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
    "@
-   xxmrglw %x0,%x1,%x2
-   vmrglw %0,%1,%2"
+  xxmrglw %x0,%x1,%x2
+  vmrglw %0,%1,%2"
    [(set_attr "type" "vecperm")])
  
  (define_insn "*altivec_vmrglsf"
@@ -3699,13 +3799,13 @@ (define_expand "vec_widen_umult_hi_v16qi"
      {
        emit_insn (gen_altivec_vmuleub (ve, operands[1], operands[2]));
        emit_insn (gen_altivec_vmuloub (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrghh_direct (operands[0], ve, vo));
+      emit_insn (gen_altivec_vmrghh_direct_be (operands[0], ve, vo));
      }
    else
      {
        emit_insn (gen_altivec_vmuloub (ve, operands[1], operands[2]));
        emit_insn (gen_altivec_vmuleub (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrghh_direct (operands[0], vo, ve));
+      emit_insn (gen_altivec_vmrghh_direct_le (operands[0], vo, ve));
      }
    DONE;
  })
@@ -3724,13 +3824,13 @@ (define_expand "vec_widen_umult_lo_v16qi"
      {
        emit_insn (gen_altivec_vmuleub (ve, operands[1], operands[2]));
        emit_insn (gen_altivec_vmuloub (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrglh_direct (operands[0], ve, vo));
+      emit_insn (gen_altivec_vmrglh_direct_be (operands[0], ve, vo));
      }
    else
      {
        emit_insn (gen_altivec_vmuloub (ve, operands[1], operands[2]));
        emit_insn (gen_altivec_vmuleub (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrglh_direct (operands[0], vo, ve));
+      emit_insn (gen_altivec_vmrglh_direct_le (operands[0], vo, ve));
      }
    DONE;
  })
@@ -3749,13 +3849,13 @@ (define_expand "vec_widen_smult_hi_v16qi"
      {
        emit_insn (gen_altivec_vmulesb (ve, operands[1], operands[2]));
        emit_insn (gen_altivec_vmulosb (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrghh_direct (operands[0], ve, vo));
+      emit_insn (gen_altivec_vmrghh_direct_be (operands[0], ve, vo));
      }
    else
      {
        emit_insn (gen_altivec_vmulosb (ve, operands[1], operands[2]));
        emit_insn (gen_altivec_vmulesb (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrghh_direct (operands[0], vo, ve));
+      emit_insn (gen_altivec_vmrghh_direct_le (operands[0], vo, ve));
      }
    DONE;
  })
@@ -3774,13 +3874,13 @@ (define_expand "vec_widen_smult_lo_v16qi"
      {
        emit_insn (gen_altivec_vmulesb (ve, operands[1], operands[2]));
        emit_insn (gen_altivec_vmulosb (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrglh_direct (operands[0], ve, vo));
+      emit_insn (gen_altivec_vmrglh_direct_be (operands[0], ve, vo));
      }
    else
      {
        emit_insn (gen_altivec_vmulosb (ve, operands[1], operands[2]));
        emit_insn (gen_altivec_vmulesb (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrglh_direct (operands[0], vo, ve));
+      emit_insn (gen_altivec_vmrglh_direct_le (operands[0], vo, ve));
      }
    DONE;
  })
@@ -3799,13 +3899,13 @@ (define_expand "vec_widen_umult_hi_v8hi"
      {
        emit_insn (gen_altivec_vmuleuh (ve, operands[1], operands[2]));
        emit_insn (gen_altivec_vmulouh (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], ve, vo));
+      emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0], ve, vo));
      }
    else
      {
        emit_insn (gen_altivec_vmulouh (ve, operands[1], operands[2]));
        emit_insn (gen_altivec_vmuleuh (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], vo, ve));
+      emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0], vo, ve));
      }
    DONE;
  })
@@ -3824,13 +3924,13 @@ (define_expand "vec_widen_umult_lo_v8hi"
      {
        emit_insn (gen_altivec_vmuleuh (ve, operands[1], operands[2]));
        emit_insn (gen_altivec_vmulouh (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], ve, vo));
+      emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0], ve, vo));
      }
    else
      {
        emit_insn (gen_altivec_vmulouh (ve, operands[1], operands[2]));
        emit_insn (gen_altivec_vmuleuh (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], vo, ve));
+      emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0], vo, ve));
      }
    DONE;
  })
@@ -3849,13 +3949,13 @@ (define_expand "vec_widen_smult_hi_v8hi"
      {
        emit_insn (gen_altivec_vmulesh (ve, operands[1], operands[2]));
        emit_insn (gen_altivec_vmulosh (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], ve, vo));
+      emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0], ve, vo));
      }
    else
      {
        emit_insn (gen_altivec_vmulosh (ve, operands[1], operands[2]));
        emit_insn (gen_altivec_vmulesh (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], vo, ve));
+      emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0], vo, ve));
      }
    DONE;
  })
@@ -3874,13 +3974,13 @@ (define_expand "vec_widen_smult_lo_v8hi"
      {
        emit_insn (gen_altivec_vmulesh (ve, operands[1], operands[2]));
        emit_insn (gen_altivec_vmulosh (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], ve, vo));
+      emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0], ve, vo));
      }
    else
      {
        emit_insn (gen_altivec_vmulosh (ve, operands[1], operands[2]));
        emit_insn (gen_altivec_vmulesh (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], vo, ve));
+      emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0], vo, ve));
      }
    DONE;
  })
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index df491bee2ea..c6ccd40e089 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -22942,28 +22942,28 @@ altivec_expand_vec_perm_const (rtx target, rtx op0, rtx op1,
       CODE_FOR_altivec_vpkuwum_direct,
       {2, 3, 6, 7, 10, 11, 14, 15, 18, 19, 22, 23, 26, 27, 30, 31}},
      {OPTION_MASK_ALTIVEC,
-     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghb_direct
-		      : CODE_FOR_altivec_vmrglb_direct,
+     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghb_direct_be
+		      : CODE_FOR_altivec_vmrglb_direct_le,
       {0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23}},
      {OPTION_MASK_ALTIVEC,
-     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghh_direct
-		      : CODE_FOR_altivec_vmrglh_direct,
+     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghh_direct_be
+		      : CODE_FOR_altivec_vmrglh_direct_le,
       {0, 1, 16, 17, 2, 3, 18, 19, 4, 5, 20, 21, 6, 7, 22, 23}},
      {OPTION_MASK_ALTIVEC,
-     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghw_direct_v4si
-		      : CODE_FOR_altivec_vmrglw_direct_v4si,
+     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghw_direct_v4si_be
+		      : CODE_FOR_altivec_vmrglw_direct_v4si_le,
       {0, 1, 2, 3, 16, 17, 18, 19, 4, 5, 6, 7, 20, 21, 22, 23}},
      {OPTION_MASK_ALTIVEC,
-     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglb_direct
-		      : CODE_FOR_altivec_vmrghb_direct,
+     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglb_direct_be
+		      : CODE_FOR_altivec_vmrghb_direct_le,
       {8, 24, 9, 25, 10, 26, 11, 27, 12, 28, 13, 29, 14, 30, 15, 31}},
      {OPTION_MASK_ALTIVEC,
-     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglh_direct
-		      : CODE_FOR_altivec_vmrghh_direct,
+     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglh_direct_be
+		      : CODE_FOR_altivec_vmrghh_direct_le,
       {8, 9, 24, 25, 10, 11, 26, 27, 12, 13, 28, 29, 14, 15, 30, 31}},
      {OPTION_MASK_ALTIVEC,
-     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglw_direct_v4si
-		      : CODE_FOR_altivec_vmrghw_direct_v4si,
+     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglw_direct_v4si_be
+		      : CODE_FOR_altivec_vmrghw_direct_v4si_le,
       {8, 9, 10, 11, 24, 25, 26, 27, 12, 13, 14, 15, 28, 29, 30, 31}},
      {OPTION_MASK_P8_VECTOR,
       BYTES_BIG_ENDIAN ? CODE_FOR_p8_vmrgew_v4sf_direct
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index e226a93bbe5..80f84e9b141 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -4688,12 +4688,14 @@ (define_expand "vsx_xxmrghw_<mode>"
  		     (const_int 1) (const_int 5)])))]
    "VECTOR_MEM_VSX_P (<MODE>mode)"
  {
-  rtx (*fun) (rtx, rtx, rtx);
-  fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrghw_direct_<mode>
-			 : gen_altivec_vmrglw_direct_<mode>;
-  if (!BYTES_BIG_ENDIAN)
-    std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0],
+						  operands[1],
+						  operands[2]));
+  else
+    emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0],
+						  operands[2],
+						  operands[1]));
    DONE;
  }
    [(set_attr "type" "vecperm")])
@@ -4708,12 +4710,14 @@ (define_expand "vsx_xxmrglw_<mode>"
  		     (const_int 3) (const_int 7)])))]
    "VECTOR_MEM_VSX_P (<MODE>mode)"
  {
-  rtx (*fun) (rtx, rtx, rtx);
-  fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrglw_direct_<mode>
-			 : gen_altivec_vmrghw_direct_<mode>;
-  if (!BYTES_BIG_ENDIAN)
-    std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0],
+						  operands[1],
+						  operands[2]));
+  else
+    emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0],
+						  operands[2],
+						  operands[1]));
    DONE;
  }
    [(set_attr "type" "vecperm")])
diff --git a/gcc/testsuite/g++.target/powerpc/pr106069.C b/gcc/testsuite/g++.target/powerpc/pr106069.C
new file mode 100644
index 00000000000..2cde9b821e3
--- /dev/null
+++ b/gcc/testsuite/g++.target/powerpc/pr106069.C
@@ -0,0 +1,120 @@
+/* { dg-options "-O -fno-tree-forwprop -maltivec" } */
+/* { dg-require-effective-target vmx_hw } */
+/* { dg-do run } */
+
+extern "C" void *
+memcpy (void *, const void *, unsigned long);
+typedef __attribute__ ((altivec (vector__))) unsigned native_simd_type;
+
+union
+{
+  native_simd_type V;
+  int R[4];
+} store_le_vec;
+
+struct S
+{
+  S () = default;
+  S (unsigned B0)
+  {
+    native_simd_type val{B0};
+    m_simd = val;
+  }
+  void store_le (unsigned int out[])
+  {
+    store_le_vec.V = m_simd;
+    unsigned int x0 = store_le_vec.R[0];
+    memcpy (out, &x0, 4);
+  }
+  S rotl (unsigned int r)
+  {
+    native_simd_type rot{r};
+    return __builtin_vec_rl (m_simd, rot);
+  }
+  void operator+= (S other)
+  {
+    m_simd = __builtin_vec_add (m_simd, other.m_simd);
+  }
+  void operator^= (S other)
+  {
+    m_simd = __builtin_vec_xor (m_simd, other.m_simd);
+  }
+  static void transpose (S &B0, S B1, S B2, S B3)
+  {
+    native_simd_type T0 = __builtin_vec_mergeh (B0.m_simd, B2.m_simd);
+    native_simd_type T1 = __builtin_vec_mergeh (B1.m_simd, B3.m_simd);
+    native_simd_type T2 = __builtin_vec_mergel (B0.m_simd, B2.m_simd);
+    native_simd_type T3 = __builtin_vec_mergel (B1.m_simd, B3.m_simd);
+    B0 = __builtin_vec_mergeh (T0, T1);
+    B3 = __builtin_vec_mergel (T2, T3);
+  }
+  S (native_simd_type x) : m_simd (x) {}
+  native_simd_type m_simd;
+};
+
+void
+foo (unsigned int output[], unsigned state[])
+{
+  S R00 = state[0];
+  S R01 = state[0];
+  S R02 = state[2];
+  S R03 = state[0];
+  S R05 = state[5];
+  S R06 = state[6];
+  S R07 = state[7];
+  S R08 = state[8];
+  S R09 = state[9];
+  S R10 = state[10];
+  S R11 = state[11];
+  S R12 = state[12];
+  S R13 = state[13];
+  S R14 = state[4];
+  S R15 = state[15];
+  for (int r = 0; r != 10; ++r)
+    {
+      R09 += R13;
+      R11 += R15;
+      R05 ^= R09;
+      R06 ^= R10;
+      R07 ^= R11;
+      R07 = R07.rotl (7);
+      R00 += R05;
+      R01 += R06;
+      R02 += R07;
+      R15 ^= R00;
+      R12 ^= R01;
+      R13 ^= R02;
+      R00 += R05;
+      R01 += R06;
+      R02 += R07;
+      R15 ^= R00;
+      R12 = R12.rotl (8);
+      R13 = R13.rotl (8);
+      R10 += R15;
+      R11 += R12;
+      R08 += R13;
+      R09 += R14;
+      R05 ^= R10;
+      R06 ^= R11;
+      R07 ^= R08;
+      R05 = R05.rotl (7);
+      R06 = R06.rotl (7);
+      R07 = R07.rotl (7);
+    }
+  R00 += state[0];
+  S::transpose (R00, R01, R02, R03);
+  R00.store_le (output);
+}
+
+unsigned int res[1];
+unsigned main_state[]{1634760805, 60878,      2036477234, 6,
+		      0,	  825562964,  1471091955, 1346092787,
+		      506976774,  4197066702, 518848283,  118491664,
+		      0,	  0,	      0,	  0};
+int
+main ()
+{
+  foo (res, main_state);
+  if (res[0] != 0x41fcef98)
+    __builtin_abort ();
+}
-- 
2.27.0
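
For readers checking the operand swap in the _le patterns, here is a minimal sketch (not part of the patch) that models the RTL and the instruction on plain arrays. It assumes RTL element numbering follows memory order, and that the merge instructions are defined on big-endian element numbers, so a little-endian register is viewed in reverse. The helper names (select_concat, hw_vmrglw_on_le, word_sel_to_bytes, le_mergeh_matches_rtl) are hypothetical and exist only for this illustration.

```cpp
#include <array>
#include <cstddef>

using V4 = std::array<unsigned, 4>;

// RTL semantics of (vec_select (vec_concat a b) (parallel [sel...])).
// Indices 0..3 pick from a, indices 4..7 pick from b.
V4 select_concat(const V4 &a, const V4 &b, const std::array<int, 4> &sel) {
  V4 r{};
  for (std::size_t i = 0; i < 4; ++i)
    r[i] = sel[i] < 4 ? a[sel[i]] : b[sel[i] - 4];
  return r;
}

// Convert between RTL (little-endian) and big-endian element numbering.
V4 reverse(const V4 &v) { return {v[3], v[2], v[1], v[0]}; }

// Assumed model of "vmrglw vD,vA,vB" as observed on little-endian:
// merge the low words in big-endian numbering, then convert back.
V4 hw_vmrglw_on_le(const V4 &va, const V4 &vb) {
  V4 a = reverse(va), b = reverse(vb);
  V4 r_be = {a[2], b[2], a[3], b[3]};
  return reverse(r_be);
}

// The LE branch of the altivec_vmrghw expander passes (op0, op2, op1) to
// the vmrglw _le pattern, which prints "vmrglw %0,%1,%2". The RTL it
// carries is still the select [0 4 1 5] over (vec_concat x y).
bool le_mergeh_matches_rtl() {
  V4 x = {10, 11, 12, 13}, y = {20, 21, 22, 23};
  V4 rtl = select_concat(x, y, {0, 4, 1, 5});
  V4 hw = hw_vmrglw_on_le(y, x); // insn operand 1 = y, operand 2 = x
  // Element 3 of the merge is y[1], which is what combine folds
  // a nested vec_select of element 3 down to.
  return rtl == hw && rtl[3] == y[1];
}

// Expand a V4SI element selector into the byte selector form used by
// the table in altivec_expand_vec_perm_const.
std::array<int, 16> word_sel_to_bytes(const std::array<int, 4> &sel) {
  std::array<int, 16> bytes{};
  for (std::size_t i = 0; i < 4; ++i)
    for (int j = 0; j < 4; ++j)
      bytes[4 * i + j] = 4 * sel[i] + j;
  return bytes;
}
```

Under these assumptions, word_sel_to_bytes on [0 4 1 5] reproduces the byte selector listed for the vmrghw_be/vmrglw_le entry in the rs6000.cc hunk, and on [2 6 3 7] the one listed for the vmrglw_be/vmrghw_le entry.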



end of thread, other threads:[~2023-03-31  2:47 UTC | newest]

Thread overview: 5+ messages
2023-02-10  2:59 [PATCH v4] rs6000: Fix incorrect RTL for Power LE when removing the UNSPECS [PR106069] Xionghu Luo
2023-02-28  6:43 ` Ping: " Xionghu Luo
2023-03-30 19:30 ` Segher Boessenkool
2023-03-31  2:47   ` Xionghu Luo
  -- strict thread matches above, loose matches on Subject: below --
2022-08-08  3:42 [PATCH] " Xionghu Luo
2022-08-09  3:01 ` Kewen.Lin
2022-08-10  6:39   ` [PATCH v2] " Xionghu Luo
2022-08-10 17:07     ` Segher Boessenkool
2022-08-11  6:15       ` Xionghu Luo
2022-08-16  6:53         ` Kewen.Lin
2022-08-17  6:23           ` [PATCH v4] " Xionghu Luo
