[PATCH] rs6000: Fix incorrect RTL for Power LE when removing the UNSPECS [PR106069]

public inbox for gcc-patches@gcc.gnu.org

* [PATCH] rs6000: Fix incorrect RTL for Power LE when removing the UNSPECS [PR106069]
@ 2022-08-08  3:42 Xionghu Luo
  2022-08-09  3:01 ` Kewen.Lin
From: Xionghu Luo @ 2022-08-08  3:42 UTC (permalink / raw)
  To: gcc-patches; +Cc: segher, linkw, Xionghu Luo

The native RTL expression for vmrghw should be the same for BE and LE,
since it describes a register operation and is endian-independent.  So
both BE and LE need to generate exactly the same RTL, with index [0 4 1 5],
when expanding vmrghw with vec_select and vec_concat.

(set (reg:V4SI 141)
     (vec_select:V4SI
       (vec_concat:V8SI
         (subreg:V4SI (reg:V16QI 139) 0)
         (subreg:V4SI (reg:V16QI 140) 0))
       (parallel [(const_int 0) (const_int 4)
                  (const_int 1) (const_int 5)])))

Then the combine pass can perform the nested vec_select optimization
in simplify-rtx.c:simplify_binary_operation_1 on both BE and LE:

21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel [0 4 1 5])
24: {r151:SI=vec_select(r150:V4SI,parallel [const_int 3]);}

=>

21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel)
24: {r151:SI=vec_select(r146:V4SI,parallel [const_int 1]);}

The endianness check is then needed only once, at final assembly output.
The generated assembly is also better, since the nested vec_select is
simplified to a simple scalar load.
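
On LE, the output templates work because GCC numbers vector elements in the opposite direction from the hardware.  As a result, a merge-high in RTL element terms is the hardware merge-low with the inputs swapped.  The C++ sketch below checks this for the word case ("vmrghw %0,%1,%2" on BE, "vmrglw %0,%2,%1" on LE).  It assumes that GCC element i on LE corresponds to hardware (big-endian numbered) word 3 - i, and all function names are hypothetical.

```cpp
#include <array>
#include <cassert>

// Four 32-bit lanes of a vector register.
using Words = std::array<int, 4>;

// Hardware semantics in big-endian word numbering (word 0 is leftmost).
Words hw_vmrghw(const Words &a, const Words &b) { return {a[0], b[0], a[1], b[1]}; }
Words hw_vmrglw(const Words &a, const Words &b) { return {a[2], b[2], a[3], b[3]}; }

// Assumed LE mapping: GCC element i lives in hardware word 3 - i.
// The mapping is its own inverse, so one helper converts both ways.
Words reverse_words(const Words &w) { return {w[3], w[2], w[1], w[0]}; }

// RTL vec_select (vec_concat (a, b), parallel [0 4 1 5]) in GCC element order.
Words rtl_merge_high(const Words &a, const Words &b) { return {a[0], b[0], a[1], b[1]}; }

// BE template "vmrghw %0,%1,%2": GCC and hardware numbering coincide.
Words emitted_be(const Words &a, const Words &b) { return hw_vmrghw(a, b); }

// LE template "vmrglw %0,%2,%1": map the inputs to hardware words, run the
// merge-low with the operands swapped, then map the result back.
Words emitted_le(const Words &a, const Words &b) {
  return reverse_words(hw_vmrglw(reverse_words(b), reverse_words(a)));
}

// Both templates must implement the same RTL.
void check_merge_high_templates() {
  const Words a{1, 2, 3, 4};
  const Words b{5, 6, 7, 8};
  assert(emitted_be(a, b) == rtl_merge_high(a, b));
  assert(emitted_le(a, b) == rtl_merge_high(a, b));
}
```

The same reasoning covers the merge-low RTL [2 6 3 7], which becomes "vmrghw %0,%2,%1" on LE.  The byte and halfword variants differ only in the number of lanes.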

Regression testing passed on Power8{LE,BE}{32,64} and Power{9,10}LE{32,64}
Linux (thanks to Kewen).  OK for master?  Or should we revert r12-4496 to
restore the UNSPEC implementation?

gcc/ChangeLog:
	PR target/106069
	* config/rs6000/altivec.md (altivec_vmrghb): Emit same native
	RTL for BE and LE.
	(altivec_vmrghh): Likewise.
	(altivec_vmrghw): Likewise.
	(*altivec_vmrghsf): Adjust.
	(altivec_vmrglb): Likewise.
	(altivec_vmrglh): Likewise.
	(altivec_vmrglw): Likewise.
	(*altivec_vmrglsf): Adjust.
	(altivec_vmrghb_direct): Emit different ASM for BE and LE.
	(altivec_vmrghh_direct): Likewise.
	(altivec_vmrghw_direct_<mode>): Likewise.
	(altivec_vmrglb_direct): Likewise.
	(altivec_vmrglh_direct): Likewise.
	(altivec_vmrglw_direct_<mode>): Likewise.
	(vec_widen_smult_hi_v16qi): Adjust.
	(vec_widen_smult_lo_v16qi): Adjust.
	(vec_widen_umult_hi_v16qi): Adjust.
	(vec_widen_umult_lo_v16qi): Adjust.
	(vec_widen_smult_hi_v8hi): Adjust.
	(vec_widen_smult_lo_v8hi): Adjust.
	(vec_widen_umult_hi_v8hi): Adjust.
	(vec_widen_umult_lo_v8hi): Adjust.
	* config/rs6000/rs6000.cc (altivec_expand_vec_perm_const): Emit same
	native RTL for BE and LE.
	* config/rs6000/vsx.md (vsx_xxmrghw_<mode>): Likewise.
	(vsx_xxmrglw_<mode>): Likewise.

gcc/testsuite/ChangeLog:
	PR target/106069
	* gcc.target/powerpc/pr106069.C: New test.

Signed-off-by: Xionghu Luo <xionghuluo@tencent.com>
---
 gcc/config/rs6000/altivec.md                | 122 ++++++++++++--------
 gcc/config/rs6000/rs6000.cc                 |  36 +++---
 gcc/config/rs6000/vsx.md                    |  16 +--
 gcc/testsuite/gcc.target/powerpc/pr106069.C | 118 +++++++++++++++++++
 4 files changed, 209 insertions(+), 83 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr106069.C

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 2c4940f2e21..8d9c0109559 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -1144,11 +1144,7 @@ (define_expand "altivec_vmrghb"
    (use (match_operand:V16QI 2 "register_operand"))]
   "TARGET_ALTIVEC"
 {
-  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrghb_direct
-						: gen_altivec_vmrglb_direct;
-  if (!BYTES_BIG_ENDIAN)
-    std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
+  emit_insn (gen_altivec_vmrghb_direct (operands[0], operands[1], operands[2]));
   DONE;
 })
 
@@ -1167,7 +1163,12 @@ (define_insn "altivec_vmrghb_direct"
 		     (const_int 6) (const_int 22)
 		     (const_int 7) (const_int 23)])))]
   "TARGET_ALTIVEC"
-  "vmrghb %0,%1,%2"
+  {
+     if (BYTES_BIG_ENDIAN)
+      return "vmrghb %0,%1,%2";
+    else
+      return "vmrglb %0,%2,%1";
+ }
   [(set_attr "type" "vecperm")])
 
 (define_expand "altivec_vmrghh"
@@ -1176,11 +1177,7 @@ (define_expand "altivec_vmrghh"
    (use (match_operand:V8HI 2 "register_operand"))]
   "TARGET_ALTIVEC"
 {
-  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrghh_direct
-						: gen_altivec_vmrglh_direct;
-  if (!BYTES_BIG_ENDIAN)
-    std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
+  emit_insn (gen_altivec_vmrghh_direct (operands[0], operands[1], operands[2]));
   DONE;
 })
 
@@ -1195,7 +1192,12 @@ (define_insn "altivec_vmrghh_direct"
 		     (const_int 2) (const_int 10)
 		     (const_int 3) (const_int 11)])))]
   "TARGET_ALTIVEC"
-  "vmrghh %0,%1,%2"
+  {
+     if (BYTES_BIG_ENDIAN)
+      return "vmrghh %0,%1,%2";
+    else
+      return "vmrglh %0,%2,%1";
+ }
   [(set_attr "type" "vecperm")])
 
 (define_expand "altivec_vmrghw"
@@ -1204,12 +1206,8 @@ (define_expand "altivec_vmrghw"
    (use (match_operand:V4SI 2 "register_operand"))]
   "VECTOR_MEM_ALTIVEC_P (V4SImode)"
 {
-  rtx (*fun) (rtx, rtx, rtx);
-  fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrghw_direct_v4si
-			 : gen_altivec_vmrglw_direct_v4si;
-  if (!BYTES_BIG_ENDIAN)
-    std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
+  emit_insn (
+    gen_altivec_vmrghw_direct_v4si (operands[0], operands[1], operands[2]));
   DONE;
 })
 
@@ -1222,9 +1220,22 @@ (define_insn "altivec_vmrghw_direct_<mode>"
 	  (parallel [(const_int 0) (const_int 4)
 		     (const_int 1) (const_int 5)])))]
   "TARGET_ALTIVEC"
-  "@
-   xxmrghw %x0,%x1,%x2
-   vmrghw %0,%1,%2"
+ {
+   if (which_alternative == 0)
+   {
+     if (BYTES_BIG_ENDIAN)
+       return "xxmrghw %x0,%x1,%x2";
+     else
+       return "xxmrglw %x0,%x2,%x1";
+   }
+   else
+   {
+     if (BYTES_BIG_ENDIAN)
+      return "vmrghw %0,%1,%2";
+    else
+      return "vmrglw %0,%2,%1";
+   }
+ }
   [(set_attr "type" "vecperm")])
 
 (define_insn "*altivec_vmrghsf"
@@ -1250,11 +1261,7 @@ (define_expand "altivec_vmrglb"
    (use (match_operand:V16QI 2 "register_operand"))]
   "TARGET_ALTIVEC"
 {
-  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrglb_direct
-						: gen_altivec_vmrghb_direct;
-  if (!BYTES_BIG_ENDIAN)
-    std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
+  emit_insn (gen_altivec_vmrglb_direct (operands[0], operands[1], operands[2]));
   DONE;
 })
 
@@ -1273,7 +1280,12 @@ (define_insn "altivec_vmrglb_direct"
 		     (const_int 14) (const_int 30)
 		     (const_int 15) (const_int 31)])))]
   "TARGET_ALTIVEC"
-  "vmrglb %0,%1,%2"
+  {
+    if (BYTES_BIG_ENDIAN)
+      return "vmrglb %0,%1,%2";
+    else
+      return "vmrghb %0,%2,%1";
+ }
   [(set_attr "type" "vecperm")])
 
 (define_expand "altivec_vmrglh"
@@ -1282,11 +1294,7 @@ (define_expand "altivec_vmrglh"
    (use (match_operand:V8HI 2 "register_operand"))]
   "TARGET_ALTIVEC"
 {
-  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrglh_direct
-						: gen_altivec_vmrghh_direct;
-  if (!BYTES_BIG_ENDIAN)
-    std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
+  emit_insn (gen_altivec_vmrglh_direct (operands[0], operands[1], operands[2]));
   DONE;
 })
 
@@ -1301,7 +1309,12 @@ (define_insn "altivec_vmrglh_direct"
 		     (const_int 6) (const_int 14)
 		     (const_int 7) (const_int 15)])))]
   "TARGET_ALTIVEC"
-  "vmrglh %0,%1,%2"
+  {
+    if (BYTES_BIG_ENDIAN)
+      return "vmrglh %0,%1,%2";
+    else
+      return "vmrghh %0,%2,%1";
+ }
   [(set_attr "type" "vecperm")])
 
 (define_expand "altivec_vmrglw"
@@ -1310,12 +1323,8 @@ (define_expand "altivec_vmrglw"
    (use (match_operand:V4SI 2 "register_operand"))]
   "VECTOR_MEM_ALTIVEC_P (V4SImode)"
 {
-  rtx (*fun) (rtx, rtx, rtx);
-  fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrglw_direct_v4si
-			 : gen_altivec_vmrghw_direct_v4si;
-  if (!BYTES_BIG_ENDIAN)
-    std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
+  emit_insn (
+    gen_altivec_vmrglw_direct_v4si (operands[0], operands[1], operands[2]));
   DONE;
 })
 
@@ -1328,9 +1337,22 @@ (define_insn "altivec_vmrglw_direct_<mode>"
 	  (parallel [(const_int 2) (const_int 6)
 		     (const_int 3) (const_int 7)])))]
   "TARGET_ALTIVEC"
-  "@
-   xxmrglw %x0,%x1,%x2
-   vmrglw %0,%1,%2"
+  {
+   if (which_alternative == 0)
+   {
+     if (BYTES_BIG_ENDIAN)
+       return "xxmrglw %x0,%x1,%x2";
+     else
+       return "xxmrghw %x0,%x2,%x1";
+   }
+   else
+   {
+     if (BYTES_BIG_ENDIAN)
+      return "vmrglw %0,%1,%2";
+     else
+      return "vmrghw %0,%2,%1";
+   }
+ }
   [(set_attr "type" "vecperm")])
 
 (define_insn "*altivec_vmrglsf"
@@ -3705,7 +3727,7 @@ (define_expand "vec_widen_umult_hi_v16qi"
     {
       emit_insn (gen_altivec_vmuloub (ve, operands[1], operands[2]));
       emit_insn (gen_altivec_vmuleub (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrghh_direct (operands[0], vo, ve));
+      emit_insn (gen_altivec_vmrglh_direct (operands[0], ve, vo));
     }
   DONE;
 })
@@ -3730,7 +3752,7 @@ (define_expand "vec_widen_umult_lo_v16qi"
     {
       emit_insn (gen_altivec_vmuloub (ve, operands[1], operands[2]));
       emit_insn (gen_altivec_vmuleub (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrglh_direct (operands[0], vo, ve));
+      emit_insn (gen_altivec_vmrghh_direct (operands[0], ve, vo));
     }
   DONE;
 })
@@ -3755,7 +3777,7 @@ (define_expand "vec_widen_smult_hi_v16qi"
     {
       emit_insn (gen_altivec_vmulosb (ve, operands[1], operands[2]));
       emit_insn (gen_altivec_vmulesb (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrghh_direct (operands[0], vo, ve));
+      emit_insn (gen_altivec_vmrglh_direct (operands[0], ve, vo));
     }
   DONE;
 })
@@ -3780,7 +3802,7 @@ (define_expand "vec_widen_smult_lo_v16qi"
     {
       emit_insn (gen_altivec_vmulosb (ve, operands[1], operands[2]));
       emit_insn (gen_altivec_vmulesb (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrglh_direct (operands[0], vo, ve));
+      emit_insn (gen_altivec_vmrghh_direct (operands[0], ve, vo));
     }
   DONE;
 })
@@ -3805,7 +3827,7 @@ (define_expand "vec_widen_umult_hi_v8hi"
     {
       emit_insn (gen_altivec_vmulouh (ve, operands[1], operands[2]));
       emit_insn (gen_altivec_vmuleuh (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], vo, ve));
+      emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], ve, vo));
     }
   DONE;
 })
@@ -3830,7 +3852,7 @@ (define_expand "vec_widen_umult_lo_v8hi"
     {
       emit_insn (gen_altivec_vmulouh (ve, operands[1], operands[2]));
       emit_insn (gen_altivec_vmuleuh (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], vo, ve));
+      emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], ve, vo));
     }
   DONE;
 })
@@ -3855,7 +3877,7 @@ (define_expand "vec_widen_smult_hi_v8hi"
     {
       emit_insn (gen_altivec_vmulosh (ve, operands[1], operands[2]));
       emit_insn (gen_altivec_vmulesh (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], vo, ve));
+      emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], ve, vo));
     }
   DONE;
 })
@@ -3880,7 +3902,7 @@ (define_expand "vec_widen_smult_lo_v8hi"
     {
       emit_insn (gen_altivec_vmulosh (ve, operands[1], operands[2]));
       emit_insn (gen_altivec_vmulesh (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], vo, ve));
+      emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], ve, vo));
     }
   DONE;
 })
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index df491bee2ea..018bea9f2f8 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -22941,29 +22941,17 @@ altivec_expand_vec_perm_const (rtx target, rtx op0, rtx op1,
     {OPTION_MASK_ALTIVEC,
      CODE_FOR_altivec_vpkuwum_direct,
      {2, 3, 6, 7, 10, 11, 14, 15, 18, 19, 22, 23, 26, 27, 30, 31}},
-    {OPTION_MASK_ALTIVEC,
-     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghb_direct
-		      : CODE_FOR_altivec_vmrglb_direct,
+    {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghb_direct,
      {0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23}},
-    {OPTION_MASK_ALTIVEC,
-     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghh_direct
-		      : CODE_FOR_altivec_vmrglh_direct,
+    {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghh_direct,
      {0, 1, 16, 17, 2, 3, 18, 19, 4, 5, 20, 21, 6, 7, 22, 23}},
-    {OPTION_MASK_ALTIVEC,
-     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghw_direct_v4si
-		      : CODE_FOR_altivec_vmrglw_direct_v4si,
+    {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghw_direct_v4si,
      {0, 1, 2, 3, 16, 17, 18, 19, 4, 5, 6, 7, 20, 21, 22, 23}},
-    {OPTION_MASK_ALTIVEC,
-     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglb_direct
-		      : CODE_FOR_altivec_vmrghb_direct,
+    {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglb_direct,
      {8, 24, 9, 25, 10, 26, 11, 27, 12, 28, 13, 29, 14, 30, 15, 31}},
-    {OPTION_MASK_ALTIVEC,
-     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglh_direct
-		      : CODE_FOR_altivec_vmrghh_direct,
+    {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglh_direct,
      {8, 9, 24, 25, 10, 11, 26, 27, 12, 13, 28, 29, 14, 15, 30, 31}},
-    {OPTION_MASK_ALTIVEC,
-     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglw_direct_v4si
-		      : CODE_FOR_altivec_vmrghw_direct_v4si,
+    {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglw_direct_v4si,
      {8, 9, 10, 11, 24, 25, 26, 27, 12, 13, 14, 15, 28, 29, 30, 31}},
     {OPTION_MASK_P8_VECTOR,
      BYTES_BIG_ENDIAN ? CODE_FOR_p8_vmrgew_v4sf_direct
@@ -23146,9 +23134,15 @@ altivec_expand_vec_perm_const (rtx target, rtx op0, rtx op1,
 
           /* For little-endian, the two input operands must be swapped
              (or swapped back) to ensure proper right-to-left numbering
-             from 0 to 2N-1.  */
-	  if (swapped ^ !BYTES_BIG_ENDIAN
-	      && icode != CODE_FOR_vsx_xxpermdi_v16qi)
+	     from 0 to 2N-1.  Excludes the vmrg[lh][bhw] and xxpermdi ops.  */
+	  if (swapped ^ !BYTES_BIG_ENDIAN)
+	    if (!(icode == CODE_FOR_altivec_vmrghb_direct
+		  || icode == CODE_FOR_altivec_vmrglb_direct
+		  || icode == CODE_FOR_altivec_vmrghh_direct
+		  || icode == CODE_FOR_altivec_vmrglh_direct
+		  || icode == CODE_FOR_altivec_vmrghw_direct_v4si
+		  || icode == CODE_FOR_altivec_vmrglw_direct_v4si
+		  || icode == CODE_FOR_vsx_xxpermdi_v16qi))
 	    std::swap (op0, op1);
 	  if (imode != V16QImode)
 	    {
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index e226a93bbe5..b84f667e4b2 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -4688,12 +4688,8 @@ (define_expand "vsx_xxmrghw_<mode>"
 		     (const_int 1) (const_int 5)])))]
   "VECTOR_MEM_VSX_P (<MODE>mode)"
 {
-  rtx (*fun) (rtx, rtx, rtx);
-  fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrghw_direct_<mode>
-			 : gen_altivec_vmrglw_direct_<mode>;
-  if (!BYTES_BIG_ENDIAN)
-    std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
+  emit_insn (
+    gen_altivec_vmrghw_direct_v4si (operands[0], operands[1], operands[2]));
   DONE;
 }
   [(set_attr "type" "vecperm")])
@@ -4708,12 +4704,8 @@ (define_expand "vsx_xxmrglw_<mode>"
 		     (const_int 3) (const_int 7)])))]
   "VECTOR_MEM_VSX_P (<MODE>mode)"
 {
-  rtx (*fun) (rtx, rtx, rtx);
-  fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrglw_direct_<mode>
-			 : gen_altivec_vmrghw_direct_<mode>;
-  if (!BYTES_BIG_ENDIAN)
-    std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
+  emit_insn (
+    gen_altivec_vmrglw_direct_v4si (operands[0], operands[1], operands[2]));
   DONE;
 }
   [(set_attr "type" "vecperm")])
diff --git a/gcc/testsuite/gcc.target/powerpc/pr106069.C b/gcc/testsuite/gcc.target/powerpc/pr106069.C
new file mode 100644
index 00000000000..56219a74692
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr106069.C
@@ -0,0 +1,118 @@
+/* { dg-do run } */
+
+extern "C" void *
+memcpy (void *, const void *, unsigned long);
+typedef __attribute__ ((altivec (vector__))) unsigned native_simd_type;
+
+union
+{
+  native_simd_type V;
+  int R[4];
+} store_le_vec;
+
+struct S
+{
+  S () = default;
+  S (unsigned B0)
+  {
+    native_simd_type val{B0};
+    m_simd = val;
+  }
+  void store_le (unsigned int out[])
+  {
+    store_le_vec.V = m_simd;
+    unsigned int x0 = store_le_vec.R[0];
+    memcpy (out, &x0, 1);
+  }
+  S rotl (unsigned int r)
+  {
+    native_simd_type rot{r};
+    return __builtin_vec_rl (m_simd, rot);
+  }
+  void operator+= (S other)
+  {
+    m_simd = __builtin_vec_add (m_simd, other.m_simd);
+  }
+  void operator^= (S other)
+  {
+    m_simd = __builtin_vec_xor (m_simd, other.m_simd);
+  }
+  static void transpose (S &B0, S B1, S B2, S B3)
+  {
+    native_simd_type T0 = __builtin_vec_mergeh (B0.m_simd, B2.m_simd);
+    native_simd_type T1 = __builtin_vec_mergeh (B1.m_simd, B3.m_simd);
+    native_simd_type T2 = __builtin_vec_mergel (B0.m_simd, B2.m_simd);
+    native_simd_type T3 = __builtin_vec_mergel (B1.m_simd, B3.m_simd);
+    B0 = __builtin_vec_mergeh (T0, T1);
+    B3 = __builtin_vec_mergel (T2, T3);
+  }
+  S (native_simd_type x) : m_simd (x) {}
+  native_simd_type m_simd;
+};
+
+void
+foo (unsigned int output[], unsigned state[])
+{
+  S R00 = state[0];
+  S R01 = state[0];
+  S R02 = state[2];
+  S R03 = state[0];
+  S R05 = state[5];
+  S R06 = state[6];
+  S R07 = state[7];
+  S R08 = state[8];
+  S R09 = state[9];
+  S R10 = state[10];
+  S R11 = state[11];
+  S R12 = state[12];
+  S R13 = state[13];
+  S R14 = state[4];
+  S R15 = state[15];
+  for (int r = 0; r != 10; ++r)
+    {
+      R09 += R13;
+      R11 += R15;
+      R05 ^= R09;
+      R06 ^= R10;
+      R07 ^= R11;
+      R07 = R07.rotl (7);
+      R00 += R05;
+      R01 += R06;
+      R02 += R07;
+      R15 ^= R00;
+      R12 ^= R01;
+      R13 ^= R02;
+      R00 += R05;
+      R01 += R06;
+      R02 += R07;
+      R15 ^= R00;
+      R12 = R12.rotl (8);
+      R13 = R13.rotl (8);
+      R10 += R15;
+      R11 += R12;
+      R08 += R13;
+      R09 += R14;
+      R05 ^= R10;
+      R06 ^= R11;
+      R07 ^= R08;
+      R05 = R05.rotl (7);
+      R06 = R06.rotl (7);
+      R07 = R07.rotl (7);
+    }
+  R00 += state[0];
+  S::transpose (R00, R01, R02, R03);
+  R00.store_le (output);
+}
+
+unsigned int res[1];
+unsigned main_state[]{1634760805, 60878,      2036477234, 6,
+		      0,	  825562964,  1471091955, 1346092787,
+		      506976774,  4197066702, 518848283,  118491664,
+		      0,	  0,	      0,	  0};
+int
+main ()
+{
+  foo (res, main_state);
+  if (res[0] != 0x41fcef98)
+    __builtin_abort ();
+}
-- 
2.27.0



* Re: [PATCH] rs6000: Fix incorrect RTL for Power LE when removing the UNSPECS [PR106069]
  2022-08-08  3:42 [PATCH] rs6000: Fix incorrect RTL for Power LE when removing the UNSPECS [PR106069] Xionghu Luo
@ 2022-08-09  3:01 ` Kewen.Lin
  2022-08-09 22:03   ` Segher Boessenkool
  2022-08-10  6:39   ` [PATCH v2] " Xionghu Luo
From: Kewen.Lin @ 2022-08-09  3:01 UTC (permalink / raw)
  To: Xionghu Luo; +Cc: segher, Xionghu Luo, gcc-patches, David Edelsohn

Hi Xionghu,

Thanks for the fix.

on 2022/8/8 11:42, Xionghu Luo wrote:
> Regression testing passed on Power8{LE,BE}{32,64} and Power{9,10}LE{32,64}

Sorry, no -m32 for LE testing.  I noticed the attachment in that PR didn't
include the test case (though the ChangeLog has it), so I re-tested it;
nothing changed.  :)

> Linux (thanks to Kewen).  OK for master?  Or should we revert r12-4496 to
> restore the UNSPEC implementation?
> 

I have some concerns about the changed "altivec_*_direct" patterns.  IMHO the
suffix "_direct" normally indicates that the define_insn maps directly to the
corresponding hw insn.  With this change, for example,
altivec_vmrghb_direct can be mapped to vmrghb or vmrglb, which looks
misleading.  Maybe we can add corresponding _direct_le and _direct_be
versions, both mapped to the same insn but with different RTL
patterns.  Looking forward to Segher's and David's suggestions.

[snip]
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr106069.C b/gcc/testsuite/gcc.target/powerpc/pr106069.C
> new file mode 100644
> index 00000000000..56219a74692
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr106069.C

Since this is a C++ test case, it should be placed in gcc/testsuite/g++.target/powerpc/.

> @@ -0,0 +1,118 @@
> +/* { dg-do run } */

This case requires AltiVec, so it needs something like:

/* { dg-require-effective-target vmx_hw } */
/* { dg-options "-maltivec" } */

BR,
Kewen



* Re: [PATCH] rs6000: Fix incorrect RTL for Power LE when removing the UNSPECS [PR106069]
  2022-08-09  3:01 ` Kewen.Lin
@ 2022-08-09 22:03   ` Segher Boessenkool
  2022-08-10  6:39   ` [PATCH v2] " Xionghu Luo
From: Segher Boessenkool @ 2022-08-09 22:03 UTC (permalink / raw)
  To: Kewen.Lin; +Cc: Xionghu Luo, Xionghu Luo, gcc-patches, David Edelsohn

Hi!

On Tue, Aug 09, 2022 at 11:01:05AM +0800, Kewen.Lin wrote:
> on 2022/8/8 11:42, Xionghu Luo wrote:
> > Regression testing passed on Power8{LE,BE}{32,64} and Power{9,10}LE{32,64}
> 
> Sorry, no -m32 for LE testing.

You can use -m32 on powerpc64le-*, but the default configuration
disallows it.  There also is powerpcle-*, which in the distant past
actually was used (string insns (like lswi) and multiple insns (like
lmw) do not work, and unaligned accesses are more problematic as well,
but :-) )

It isn't something we support with ELFv2 at all, indeed.

> I have some concerns about the changed "altivec_*_direct" patterns.  IMHO the
> suffix "_direct" normally indicates that the define_insn maps directly to the
> corresponding hw insn.

Exactly.  Let's please keep this intact.

> With this change, for example,
> altivec_vmrghb_direct can be mapped to vmrghb or vmrglb, which looks
> misleading.  Maybe we can add corresponding _direct_le and _direct_be
> versions, both mapped to the same insn but with different RTL
> patterns.

If that is the best we can do, that is the best we can do.  It would be
lovely if there was something nicer we can do though :-)

Segher


* Re: [PATCH v2] rs6000: Fix incorrect RTL for Power LE when removing the UNSPECS [PR106069]
  2022-08-09  3:01 ` Kewen.Lin
  2022-08-09 22:03   ` Segher Boessenkool
@ 2022-08-10  6:39   ` Xionghu Luo
  2022-08-10 17:07     ` Segher Boessenkool
From: Xionghu Luo @ 2022-08-10  6:39 UTC (permalink / raw)
  To: Kewen.Lin; +Cc: segher, Xionghu Luo, gcc-patches, David Edelsohn



On 2022/8/9 11:01, Kewen.Lin wrote:
> Hi Xionghu,
> 
> Thanks for the fix.
> 
> on 2022/8/8 11:42, Xionghu Luo wrote:
>> The native RTL expression for vec_mrghw should be same for BE and LE as
>> they are register and endian-independent.  So both BE and LE need
>> generate exactly same RTL with index [0 4 1 5] when expanding vec_mrghw
>> with vec_select and vec_concat.
>>
>> (set (reg:V4SI 141) (vec_select:V4SI (vec_concat:V8SI
>> 		   (subreg:V4SI (reg:V16QI 139) 0)
>> 		   (subreg:V4SI (reg:V16QI 140) 0))
>> 		   [const_int 0 4 1 5]))
>>
>> Then combine pass could do the nested vec_select optimization
>> in simplify-rtx.c:simplify_binary_operation_1 also on both BE and LE:
>>
>> 21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel [0 4 1 5])
>> 24: {r151:SI=vec_select(r150:V4SI,parallel [const_int 3]);}
>>
>> =>
>>
>> 21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel)
>> 24: {r151:SI=vec_select(r146:V4SI,parallel [const_int 1]);}
>>
>> The endianness check need only once at ASM generation finally.
>> ASM would be better due to nested vec_select simplified to simple scalar
>> load.
>>
>> Regression tested pass for Power8{LE,BE}{32,64} and Power{9,10}LE{32,64}
> 
> Sorry, there was no -m32 testing for LE.  I noticed that the attachment in that
> PR didn't include the test case (though the ChangeLog lists it), so I re-tested
> with it; nothing changed.  :)
> 
>> Linux(Thanks to Kewen), OK for master?  Or should we revert r12-4496 to
>> restore to the UNSPEC implementation?
>>
> 
> I have some concern on those changed "altivec_*_direct", IMHO the suffix
> "_direct" is normally to indicate the define_insn is mapped to the
> corresponding hw insn directly.  With this change, for example,
> altivec_vmrghb_direct can be mapped into vmrghb or vmrglb, this looks
> misleading.  Maybe we can add the corresponding _direct_le and _direct_be
> versions, both are mapped into the same insn but have different RTL
> patterns.  Looking forward to Segher's and David's suggestions.
> 

Thanks!  Do you mean same RTL patterns with different hw insn?
Updated as:

v2: Split each direct pattern into _be and _le variants with the same RTL but
different instructions.

The native RTL expression for vec_mrghw should be the same for BE and LE, since
it is register-based and endian-independent.  So both BE and LE need to
generate exactly the same RTL, with index [0 4 1 5], when expanding vec_mrghw
with vec_select and vec_concat.

(set (reg:V4SI 141) (vec_select:V4SI (vec_concat:V8SI
		   (subreg:V4SI (reg:V16QI 139) 0)
		   (subreg:V4SI (reg:V16QI 140) 0))
		   [const_int 0 4 1 5]))
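For illustration only: the RTL above selects by element index within the concatenation of the two operands, with no reference to memory byte order, which is why a single selector can describe both BE and LE. A minimal C++17 sketch under that reading; vec_select_concat, op1, op2 and the sample values are hypothetical names, not GCC internals:

```cpp
#include <array>
#include <cstddef>

// Hypothetical model (not GCC code) of the RTL expression
//   (vec_select (vec_concat A B) (parallel SEL))
// Indices are RTL element numbers, never memory byte offsets, so one SEL
// means the same thing on big- and little-endian targets.
template <typename T, std::size_t N, std::size_t M>
constexpr std::array<T, M> vec_select_concat(
    const std::array<T, N>& a, const std::array<T, N>& b,
    const std::array<std::size_t, M>& sel) {
  std::array<T, M> out{};
  for (std::size_t i = 0; i < M; ++i)
    // Indices 0..N-1 name elements of A; N..2N-1 name elements of B.
    out[i] = sel[i] < N ? a[sel[i]] : b[sel[i] - N];
  return out;
}

// Merge high on V4SI: selector [0 4 1 5] interleaves the first two elements.
constexpr std::array<int, 4> op1{10, 11, 12, 13};
constexpr std::array<int, 4> op2{20, 21, 22, 23};
constexpr auto merged =
    vec_select_concat(op1, op2, std::array<std::size_t, 4>{0, 4, 1, 5});
static_assert(merged[0] == 10 && merged[1] == 20 &&
              merged[2] == 11 && merged[3] == 21,
              "[0 4 1 5] selects {op1[0], op2[0], op1[1], op2[1]}");
```

With selector [2 6 3 7] the same helper models the merge-low case.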

Then the combine pass can do the nested vec_select optimization
in simplify-rtx.c:simplify_binary_operation_1 on both BE and LE as well:

21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel [0 4 1 5])
24: {r151:SI=vec_select(r150:V4SI,parallel [const_int 3]);}

=>

21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel)
24: {r151:SI=vec_select(r146:V4SI,parallel [const_int 1]);}

The endianness check is then needed only once, when the final ASM is generated.
The ASM is also better, because the nested vec_select is simplified to a simple
scalar load.
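The index arithmetic behind that fold can be modelled in a few lines: element I of the inner vec_select is element SEL[I] of the concatenation, so it can be read straight from one source register. This is a hypothetical C++17 sketch of the arithmetic only (ScalarSource and fold_nested_select are illustrative names), not the simplify-rtx implementation:

```cpp
#include <array>
#include <cstddef>

// Hypothetical sketch of the effect of folding
//   (vec_select:SI (vec_select (vec_concat A B) (parallel SEL)) (parallel [I]))
// Element I of the inner result is element SEL[I] of (vec_concat A B),
// which lives entirely in one of the two source registers.
struct ScalarSource {
  int operand;        // 0 for A, 1 for B
  std::size_t index;  // element number inside that operand
};

template <std::size_t N, std::size_t M>
constexpr ScalarSource fold_nested_select(const std::array<std::size_t, M>& sel,
                                          std::size_t i) {
  const std::size_t k = sel[i];  // position within the 2N-element concat
  return k < N ? ScalarSource{0, k} : ScalarSource{1, k - N};
}

// Insns 21/24 above: SEL = [0 4 1 5], I = 3, A = r141, B = r146.
constexpr std::array<std::size_t, 4> merge_high{0, 4, 1, 5};
constexpr ScalarSource src = fold_nested_select<4>(merge_high, 3);
static_assert(src.operand == 1 && src.index == 1,
              "element 3 of the merge is element 1 of r146");
```

For insn 24 (I = 3) it yields operand B (r146), element 1, which matches the simplified insn in the dump.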

Regression tested on Power8{LE,BE}{32,64} and Power{9,10}LE{32,64}
Linux (thanks to Kewen).  OK for master?  Or should we revert r12-4496 to
restore the UNSPEC implementation?

gcc/ChangeLog:
	PR target/106069
	* config/rs6000/altivec.md (altivec_vmrghb): Emit same native
	RTL for BE and LE.
	(altivec_vmrghh): Likewise.
	(altivec_vmrghw): Likewise.
	(*altivec_vmrghsf): Adjust.
	(altivec_vmrglb): Likewise.
	(altivec_vmrglh): Likewise.
	(altivec_vmrglw): Likewise.
	(*altivec_vmrglsf): Adjust.
	(altivec_vmrghb_direct): Emit different ASM for BE and LE.
	(altivec_vmrghh_direct): Likewise.
	(altivec_vmrghw_direct_<mode>): Likewise.
	(altivec_vmrglb_direct): Likewise.
	(altivec_vmrglh_direct): Likewise.
	(altivec_vmrglw_direct_<mode>): Likewise.
	(vec_widen_smult_hi_v16qi): Adjust.
	(vec_widen_smult_lo_v16qi): Adjust.
	(vec_widen_umult_hi_v16qi): Adjust.
	(vec_widen_umult_lo_v16qi): Adjust.
	(vec_widen_smult_hi_v8hi): Adjust.
	(vec_widen_smult_lo_v8hi): Adjust.
	(vec_widen_umult_hi_v8hi): Adjust.
	(vec_widen_umult_lo_v8hi): Adjust.
	* config/rs6000/rs6000.cc (altivec_expand_vec_perm_const): Emit same
	native RTL for BE and LE.
	* config/rs6000/vsx.md (vsx_xxmrghw_<mode>): Likewise.
	(vsx_xxmrglw_<mode>): Likewise.

gcc/testsuite/ChangeLog:
	PR target/106069
	* g++.target/powerpc/pr106069.C: New test.

Signed-off-by: Xionghu Luo <xionghuluo@tencent.com>
---
  gcc/config/rs6000/altivec.md                | 223 ++++++++++++++------
  gcc/config/rs6000/rs6000.cc                 |  36 ++--
  gcc/config/rs6000/vsx.md                    |  26 +--
  gcc/testsuite/g++.target/powerpc/pr106069.C | 120 +++++++++++
  4 files changed, 303 insertions(+), 102 deletions(-)
  create mode 100644 gcc/testsuite/g++.target/powerpc/pr106069.C

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 2c4940f2e21..f5c7a89de7c 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -1144,15 +1144,17 @@ (define_expand "altivec_vmrghb"
     (use (match_operand:V16QI 2 "register_operand"))]
    "TARGET_ALTIVEC"
  {
-  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrghb_direct
-						: gen_altivec_vmrglb_direct;
-  if (!BYTES_BIG_ENDIAN)
-    std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
+  rtvec v = gen_rtvec (16, GEN_INT (0), GEN_INT (16), GEN_INT (1), GEN_INT (17),
+		      GEN_INT (2), GEN_INT (18), GEN_INT (3), GEN_INT (19),
+		      GEN_INT (4), GEN_INT (20), GEN_INT (5), GEN_INT (21),
+		      GEN_INT (6), GEN_INT (22), GEN_INT (7), GEN_INT (23));
+  rtx x = gen_rtx_VEC_CONCAT (V32QImode, operands[1], operands[2]);
+  x = gen_rtx_VEC_SELECT (V16QImode, x, gen_rtx_PARALLEL (VOIDmode, v));
+  emit_insn (gen_rtx_SET (operands[0], x));
    DONE;
  })

-(define_insn "altivec_vmrghb_direct"
+(define_insn "altivec_vmrghb_direct_be"
    [(set (match_operand:V16QI 0 "register_operand" "=v")
  	(vec_select:V16QI
  	  (vec_concat:V32QI
@@ -1166,27 +1168,46 @@ (define_insn "altivec_vmrghb_direct"
  		     (const_int 5) (const_int 21)
  		     (const_int 6) (const_int 22)
  		     (const_int 7) (const_int 23)])))]
-  "TARGET_ALTIVEC"
+  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
    "vmrghb %0,%1,%2"
    [(set_attr "type" "vecperm")])

+(define_insn "altivec_vmrghb_direct_le"
+  [(set (match_operand:V16QI 0 "register_operand" "=v")
+	(vec_select:V16QI
+	  (vec_concat:V32QI
+	    (match_operand:V16QI 1 "register_operand" "v")
+	    (match_operand:V16QI 2 "register_operand" "v"))
+	  (parallel [(const_int 0) (const_int 16)
+		     (const_int 1) (const_int 17)
+		     (const_int 2) (const_int 18)
+		     (const_int 3) (const_int 19)
+		     (const_int 4) (const_int 20)
+		     (const_int 5) (const_int 21)
+		     (const_int 6) (const_int 22)
+		     (const_int 7) (const_int 23)])))]
+  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
+  "vmrglb %0,%2,%1"
+  [(set_attr "type" "vecperm")])
+
  (define_expand "altivec_vmrghh"
    [(use (match_operand:V8HI 0 "register_operand"))
     (use (match_operand:V8HI 1 "register_operand"))
     (use (match_operand:V8HI 2 "register_operand"))]
    "TARGET_ALTIVEC"
  {
-  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrghh_direct
-						: gen_altivec_vmrglh_direct;
-  if (!BYTES_BIG_ENDIAN)
-    std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
+  rtvec v = gen_rtvec (8, GEN_INT (0), GEN_INT (8), GEN_INT (1), GEN_INT (9),
+		      GEN_INT (2), GEN_INT (10), GEN_INT (3), GEN_INT (11));
+  rtx x = gen_rtx_VEC_CONCAT (V16HImode, operands[1], operands[2]);
+
+  x = gen_rtx_VEC_SELECT (V8HImode, x, gen_rtx_PARALLEL (VOIDmode, v));
+  emit_insn (gen_rtx_SET (operands[0], x));
    DONE;
  })

-(define_insn "altivec_vmrghh_direct"
+(define_insn "altivec_vmrghh_direct_be"
    [(set (match_operand:V8HI 0 "register_operand" "=v")
-        (vec_select:V8HI
+	(vec_select:V8HI
  	  (vec_concat:V16HI
  	    (match_operand:V8HI 1 "register_operand" "v")
  	    (match_operand:V8HI 2 "register_operand" "v"))
@@ -1194,26 +1215,38 @@ (define_insn "altivec_vmrghh_direct"
  		     (const_int 1) (const_int 9)
  		     (const_int 2) (const_int 10)
  		     (const_int 3) (const_int 11)])))]
-  "TARGET_ALTIVEC"
+  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
    "vmrghh %0,%1,%2"
    [(set_attr "type" "vecperm")])

+(define_insn "altivec_vmrghh_direct_le"
+  [(set (match_operand:V8HI 0 "register_operand" "=v")
+	(vec_select:V8HI
+	  (vec_concat:V16HI
+	    (match_operand:V8HI 1 "register_operand" "v")
+	    (match_operand:V8HI 2 "register_operand" "v"))
+	  (parallel [(const_int 0) (const_int 8)
+		     (const_int 1) (const_int 9)
+		     (const_int 2) (const_int 10)
+		     (const_int 3) (const_int 11)])))]
+  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
+  "vmrglh %0,%2,%1"
+  [(set_attr "type" "vecperm")])
+
  (define_expand "altivec_vmrghw"
    [(use (match_operand:V4SI 0 "register_operand"))
     (use (match_operand:V4SI 1 "register_operand"))
     (use (match_operand:V4SI 2 "register_operand"))]
    "VECTOR_MEM_ALTIVEC_P (V4SImode)"
  {
-  rtx (*fun) (rtx, rtx, rtx);
-  fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrghw_direct_v4si
-			 : gen_altivec_vmrglw_direct_v4si;
-  if (!BYTES_BIG_ENDIAN)
-    std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
+  rtvec v = gen_rtvec (4, GEN_INT (0), GEN_INT (4), GEN_INT (1), GEN_INT (5));
+  rtx x = gen_rtx_VEC_CONCAT (V8SImode, operands[1], operands[2]);
+  x = gen_rtx_VEC_SELECT (V4SImode, x, gen_rtx_PARALLEL (VOIDmode, v));
+  emit_insn (gen_rtx_SET (operands[0], x));
    DONE;
  })

-(define_insn "altivec_vmrghw_direct_<mode>"
+(define_insn "altivec_vmrghw_direct_<mode>_be"
    [(set (match_operand:VSX_W 0 "register_operand" "=wa,v")
  	(vec_select:VSX_W
  	  (vec_concat:<VS_double>
@@ -1221,10 +1254,24 @@ (define_insn "altivec_vmrghw_direct_<mode>"
  	    (match_operand:VSX_W 2 "register_operand" "wa,v"))
  	  (parallel [(const_int 0) (const_int 4)
  		     (const_int 1) (const_int 5)])))]
-  "TARGET_ALTIVEC"
+  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
+  "@
+  xxmrghw %x0,%x1,%x2
+  vmrghw %0,%1,%2"
+  [(set_attr "type" "vecperm")])
+
+(define_insn "altivec_vmrghw_direct_<mode>_le"
+  [(set (match_operand:VSX_W 0 "register_operand" "=wa,v")
+	(vec_select:VSX_W
+	  (vec_concat:<VS_double>
+	    (match_operand:VSX_W 1 "register_operand" "wa,v")
+	    (match_operand:VSX_W 2 "register_operand" "wa,v"))
+	  (parallel [(const_int 0) (const_int 4)
+		     (const_int 1) (const_int 5)])))]
+  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
    "@
-   xxmrghw %x0,%x1,%x2
-   vmrghw %0,%1,%2"
+  xxmrglw %x0,%x2,%x1
+  vmrglw %0,%2,%1"
    [(set_attr "type" "vecperm")])

  (define_insn "*altivec_vmrghsf"
@@ -1250,15 +1297,17 @@ (define_expand "altivec_vmrglb"
     (use (match_operand:V16QI 2 "register_operand"))]
    "TARGET_ALTIVEC"
  {
-  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrglb_direct
-						: gen_altivec_vmrghb_direct;
-  if (!BYTES_BIG_ENDIAN)
-    std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
+  rtvec v = gen_rtvec (16, GEN_INT (8), GEN_INT (24), GEN_INT (9), GEN_INT (25),
+		      GEN_INT (10), GEN_INT (26), GEN_INT (11), GEN_INT (27),
+		      GEN_INT (12), GEN_INT (28), GEN_INT (13), GEN_INT (29),
+		      GEN_INT (14), GEN_INT (30), GEN_INT (15), GEN_INT (31));
+  rtx x = gen_rtx_VEC_CONCAT (V32QImode, operands[1], operands[2]);
+  x = gen_rtx_VEC_SELECT (V16QImode, x, gen_rtx_PARALLEL (VOIDmode, v));
+  emit_insn (gen_rtx_SET (operands[0], x));
    DONE;
  })

-(define_insn "altivec_vmrglb_direct"
+(define_insn "altivec_vmrglb_direct_be"
    [(set (match_operand:V16QI 0 "register_operand" "=v")
  	(vec_select:V16QI
  	  (vec_concat:V32QI
@@ -1272,25 +1321,43 @@ (define_insn "altivec_vmrglb_direct"
  		     (const_int 13) (const_int 29)
  		     (const_int 14) (const_int 30)
  		     (const_int 15) (const_int 31)])))]
-  "TARGET_ALTIVEC"
+  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
    "vmrglb %0,%1,%2"
    [(set_attr "type" "vecperm")])

+(define_insn "altivec_vmrglb_direct_le"
+  [(set (match_operand:V16QI 0 "register_operand" "=v")
+	(vec_select:V16QI
+	  (vec_concat:V32QI
+	    (match_operand:V16QI 1 "register_operand" "v")
+	    (match_operand:V16QI 2 "register_operand" "v"))
+	  (parallel [(const_int  8) (const_int 24)
+		     (const_int  9) (const_int 25)
+		     (const_int 10) (const_int 26)
+		     (const_int 11) (const_int 27)
+		     (const_int 12) (const_int 28)
+		     (const_int 13) (const_int 29)
+		     (const_int 14) (const_int 30)
+		     (const_int 15) (const_int 31)])))]
+  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
+  "vmrghb %0,%2,%1"
+  [(set_attr "type" "vecperm")])
+
  (define_expand "altivec_vmrglh"
    [(use (match_operand:V8HI 0 "register_operand"))
     (use (match_operand:V8HI 1 "register_operand"))
     (use (match_operand:V8HI 2 "register_operand"))]
    "TARGET_ALTIVEC"
  {
-  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrglh_direct
-						: gen_altivec_vmrghh_direct;
-  if (!BYTES_BIG_ENDIAN)
-    std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
+  rtvec v = gen_rtvec (8, GEN_INT (4), GEN_INT (12), GEN_INT (5), GEN_INT (13),
+		      GEN_INT (6), GEN_INT (14), GEN_INT (7), GEN_INT (15));
+  rtx x = gen_rtx_VEC_CONCAT (V16HImode, operands[1], operands[2]);
+  x = gen_rtx_VEC_SELECT (V8HImode, x, gen_rtx_PARALLEL (VOIDmode, v));
+  emit_insn (gen_rtx_SET (operands[0], x));
    DONE;
  })

-(define_insn "altivec_vmrglh_direct"
+(define_insn "altivec_vmrglh_direct_be"
    [(set (match_operand:V8HI 0 "register_operand" "=v")
          (vec_select:V8HI
  	  (vec_concat:V16HI
@@ -1300,26 +1367,38 @@ (define_insn "altivec_vmrglh_direct"
  		     (const_int 5) (const_int 13)
  		     (const_int 6) (const_int 14)
  		     (const_int 7) (const_int 15)])))]
-  "TARGET_ALTIVEC"
+  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
    "vmrglh %0,%1,%2"
    [(set_attr "type" "vecperm")])

+(define_insn "altivec_vmrglh_direct_le"
+  [(set (match_operand:V8HI 0 "register_operand" "=v")
+        (vec_select:V8HI
+	  (vec_concat:V16HI
+	    (match_operand:V8HI 1 "register_operand" "v")
+	    (match_operand:V8HI 2 "register_operand" "v"))
+	  (parallel [(const_int 4) (const_int 12)
+		     (const_int 5) (const_int 13)
+		     (const_int 6) (const_int 14)
+		     (const_int 7) (const_int 15)])))]
+  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
+  "vmrghh %0,%2,%1"
+  [(set_attr "type" "vecperm")])
+
  (define_expand "altivec_vmrglw"
    [(use (match_operand:V4SI 0 "register_operand"))
     (use (match_operand:V4SI 1 "register_operand"))
     (use (match_operand:V4SI 2 "register_operand"))]
    "VECTOR_MEM_ALTIVEC_P (V4SImode)"
  {
-  rtx (*fun) (rtx, rtx, rtx);
-  fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrglw_direct_v4si
-			 : gen_altivec_vmrghw_direct_v4si;
-  if (!BYTES_BIG_ENDIAN)
-    std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
+  rtvec v = gen_rtvec (4, GEN_INT (2), GEN_INT (6), GEN_INT (3), GEN_INT (7));
+  rtx x = gen_rtx_VEC_CONCAT (V8SImode, operands[1], operands[2]);
+  x = gen_rtx_VEC_SELECT (V4SImode, x, gen_rtx_PARALLEL (VOIDmode, v));
+  emit_insn (gen_rtx_SET (operands[0], x));
    DONE;
  })

-(define_insn "altivec_vmrglw_direct_<mode>"
+(define_insn "altivec_vmrglw_direct_<mode>_be"
    [(set (match_operand:VSX_W 0 "register_operand" "=wa,v")
  	(vec_select:VSX_W
  	  (vec_concat:<VS_double>
@@ -1327,10 +1406,24 @@ (define_insn "altivec_vmrglw_direct_<mode>"
  	    (match_operand:VSX_W 2 "register_operand" "wa,v"))
  	  (parallel [(const_int 2) (const_int 6)
  		     (const_int 3) (const_int 7)])))]
-  "TARGET_ALTIVEC"
+  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
+  "@
+  xxmrglw %x0,%x1,%x2
+  vmrglw %0,%1,%2"
+  [(set_attr "type" "vecperm")])
+
+(define_insn "altivec_vmrglw_direct_<mode>_le"
+  [(set (match_operand:VSX_W 0 "register_operand" "=wa,v")
+	(vec_select:VSX_W
+	  (vec_concat:<VS_double>
+	    (match_operand:VSX_W 1 "register_operand" "wa,v")
+	    (match_operand:VSX_W 2 "register_operand" "wa,v"))
+	  (parallel [(const_int 2) (const_int 6)
+		     (const_int 3) (const_int 7)])))]
+  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
    "@
-   xxmrglw %x0,%x1,%x2
-   vmrglw %0,%1,%2"
+  xxmrghw %x0,%x2,%x1
+  vmrghw %0,%2,%1"
    [(set_attr "type" "vecperm")])

  (define_insn "*altivec_vmrglsf"
@@ -3699,13 +3792,13 @@ (define_expand "vec_widen_umult_hi_v16qi"
      {
        emit_insn (gen_altivec_vmuleub (ve, operands[1], operands[2]));
        emit_insn (gen_altivec_vmuloub (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrghh_direct (operands[0], ve, vo));
+      emit_insn (gen_altivec_vmrghh_direct_be (operands[0], ve, vo));
      }
    else
      {
        emit_insn (gen_altivec_vmuloub (ve, operands[1], operands[2]));
        emit_insn (gen_altivec_vmuleub (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrghh_direct (operands[0], vo, ve));
+      emit_insn (gen_altivec_vmrglh_direct_le (operands[0], ve, vo));
      }
    DONE;
  })
@@ -3724,13 +3817,13 @@ (define_expand "vec_widen_umult_lo_v16qi"
      {
        emit_insn (gen_altivec_vmuleub (ve, operands[1], operands[2]));
        emit_insn (gen_altivec_vmuloub (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrglh_direct (operands[0], ve, vo));
+      emit_insn (gen_altivec_vmrglh_direct_be (operands[0], ve, vo));
      }
    else
      {
        emit_insn (gen_altivec_vmuloub (ve, operands[1], operands[2]));
        emit_insn (gen_altivec_vmuleub (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrglh_direct (operands[0], vo, ve));
+      emit_insn (gen_altivec_vmrghh_direct_le (operands[0], ve, vo));
      }
    DONE;
  })
@@ -3749,13 +3842,13 @@ (define_expand "vec_widen_smult_hi_v16qi"
      {
        emit_insn (gen_altivec_vmulesb (ve, operands[1], operands[2]));
        emit_insn (gen_altivec_vmulosb (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrghh_direct (operands[0], ve, vo));
+      emit_insn (gen_altivec_vmrghh_direct_be (operands[0], ve, vo));
      }
    else
      {
        emit_insn (gen_altivec_vmulosb (ve, operands[1], operands[2]));
        emit_insn (gen_altivec_vmulesb (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrghh_direct (operands[0], vo, ve));
+      emit_insn (gen_altivec_vmrglh_direct_le (operands[0], ve, vo));
      }
    DONE;
  })
@@ -3774,13 +3867,13 @@ (define_expand "vec_widen_smult_lo_v16qi"
      {
        emit_insn (gen_altivec_vmulesb (ve, operands[1], operands[2]));
        emit_insn (gen_altivec_vmulosb (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrglh_direct (operands[0], ve, vo));
+      emit_insn (gen_altivec_vmrglh_direct_be (operands[0], ve, vo));
      }
    else
      {
        emit_insn (gen_altivec_vmulosb (ve, operands[1], operands[2]));
        emit_insn (gen_altivec_vmulesb (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrglh_direct (operands[0], vo, ve));
+      emit_insn (gen_altivec_vmrghh_direct_le (operands[0], ve, vo));
      }
    DONE;
  })
@@ -3799,13 +3892,13 @@ (define_expand "vec_widen_umult_hi_v8hi"
      {
        emit_insn (gen_altivec_vmuleuh (ve, operands[1], operands[2]));
        emit_insn (gen_altivec_vmulouh (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], ve, vo));
+      emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0], ve, vo));
      }
    else
      {
        emit_insn (gen_altivec_vmulouh (ve, operands[1], operands[2]));
        emit_insn (gen_altivec_vmuleuh (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], vo, ve));
+      emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0], ve, vo));
      }
    DONE;
  })
@@ -3824,13 +3917,13 @@ (define_expand "vec_widen_umult_lo_v8hi"
      {
        emit_insn (gen_altivec_vmuleuh (ve, operands[1], operands[2]));
        emit_insn (gen_altivec_vmulouh (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], ve, vo));
+      emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0], ve, vo));
      }
    else
      {
        emit_insn (gen_altivec_vmulouh (ve, operands[1], operands[2]));
        emit_insn (gen_altivec_vmuleuh (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], vo, ve));
+      emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0], ve, vo));
      }
    DONE;
  })
@@ -3849,13 +3942,13 @@ (define_expand "vec_widen_smult_hi_v8hi"
      {
        emit_insn (gen_altivec_vmulesh (ve, operands[1], operands[2]));
        emit_insn (gen_altivec_vmulosh (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], ve, vo));
+      emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0], ve, vo));
      }
    else
      {
        emit_insn (gen_altivec_vmulosh (ve, operands[1], operands[2]));
        emit_insn (gen_altivec_vmulesh (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], vo, ve));
+      emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0], ve, vo));
      }
    DONE;
  })
@@ -3874,13 +3967,13 @@ (define_expand "vec_widen_smult_lo_v8hi"
      {
        emit_insn (gen_altivec_vmulesh (ve, operands[1], operands[2]));
        emit_insn (gen_altivec_vmulosh (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], ve, vo));
+      emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0], ve, vo));
      }
    else
      {
        emit_insn (gen_altivec_vmulosh (ve, operands[1], operands[2]));
        emit_insn (gen_altivec_vmulesh (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], vo, ve));
+      emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0], ve, vo));
      }
    DONE;
  })
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index df491bee2ea..97da7706f63 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -22941,29 +22941,17 @@ altivec_expand_vec_perm_const (rtx target, rtx op0, rtx op1,
      {OPTION_MASK_ALTIVEC,
       CODE_FOR_altivec_vpkuwum_direct,
       {2, 3, 6, 7, 10, 11, 14, 15, 18, 19, 22, 23, 26, 27, 30, 31}},
-    {OPTION_MASK_ALTIVEC,
-     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghb_direct
-		      : CODE_FOR_altivec_vmrglb_direct,
+    {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghb_direct_be,
       {0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23}},
-    {OPTION_MASK_ALTIVEC,
-     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghh_direct
-		      : CODE_FOR_altivec_vmrglh_direct,
+    {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghh_direct_be,
       {0, 1, 16, 17, 2, 3, 18, 19, 4, 5, 20, 21, 6, 7, 22, 23}},
-    {OPTION_MASK_ALTIVEC,
-     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghw_direct_v4si
-		      : CODE_FOR_altivec_vmrglw_direct_v4si,
+    {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghw_direct_v4si_be,
       {0, 1, 2, 3, 16, 17, 18, 19, 4, 5, 6, 7, 20, 21, 22, 23}},
-    {OPTION_MASK_ALTIVEC,
-     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglb_direct
-		      : CODE_FOR_altivec_vmrghb_direct,
+    {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglb_direct_be,
       {8, 24, 9, 25, 10, 26, 11, 27, 12, 28, 13, 29, 14, 30, 15, 31}},
-    {OPTION_MASK_ALTIVEC,
-     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglh_direct
-		      : CODE_FOR_altivec_vmrghh_direct,
+    {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglh_direct_be,
       {8, 9, 24, 25, 10, 11, 26, 27, 12, 13, 28, 29, 14, 15, 30, 31}},
-    {OPTION_MASK_ALTIVEC,
-     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglw_direct_v4si
-		      : CODE_FOR_altivec_vmrghw_direct_v4si,
+    {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglw_direct_v4si_be,
       {8, 9, 10, 11, 24, 25, 26, 27, 12, 13, 14, 15, 28, 29, 30, 31}},
      {OPTION_MASK_P8_VECTOR,
       BYTES_BIG_ENDIAN ? CODE_FOR_p8_vmrgew_v4sf_direct
@@ -23146,9 +23134,15 @@ altivec_expand_vec_perm_const (rtx target, rtx op0, rtx op1,

            /* For little-endian, the two input operands must be swapped
               (or swapped back) to ensure proper right-to-left numbering
-             from 0 to 2N-1.  */
-	  if (swapped ^ !BYTES_BIG_ENDIAN
-	      && icode != CODE_FOR_vsx_xxpermdi_v16qi)
+	     from 0 to 2N-1.  Excludes the vmrg[lh][bhw] and xxpermdi ops.  */
+	  if (swapped ^ !BYTES_BIG_ENDIAN)
+	    if (!(icode == CODE_FOR_altivec_vmrghb_direct_be
+		  || icode == CODE_FOR_altivec_vmrglb_direct_be
+		  || icode == CODE_FOR_altivec_vmrghh_direct_be
+		  || icode == CODE_FOR_altivec_vmrglh_direct_be
+		  || icode == CODE_FOR_altivec_vmrghw_direct_v4si_be
+		  || icode == CODE_FOR_altivec_vmrglw_direct_v4si_be
+		  || icode == CODE_FOR_vsx_xxpermdi_v16qi))
  	    std::swap (op0, op1);
  	  if (imode != V16QImode)
  	    {
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index e226a93bbe5..c46d7e4f643 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -4678,7 +4678,7 @@ (define_insn "vsx_xxspltd_<mode>"
    [(set_attr "type" "vecperm")])

  ;; V4SF/V4SI interleave
-(define_expand "vsx_xxmrghw_<mode>"
+(define_insn "vsx_xxmrghw_<mode>"
    [(set (match_operand:VSX_W 0 "vsx_register_operand" "=wa")
          (vec_select:VSX_W
  	  (vec_concat:<VS_double>
@@ -4688,17 +4688,14 @@ (define_expand "vsx_xxmrghw_<mode>"
  		     (const_int 1) (const_int 5)])))]
    "VECTOR_MEM_VSX_P (<MODE>mode)"
  {
-  rtx (*fun) (rtx, rtx, rtx);
-  fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrghw_direct_<mode>
-			 : gen_altivec_vmrglw_direct_<mode>;
-  if (!BYTES_BIG_ENDIAN)
-    std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
-  DONE;
+  if (BYTES_BIG_ENDIAN)
+    return "xxmrghw %x0,%x1,%x2";
+  else
+    return "xxmrglw %x0,%x2,%x1";
  }
    [(set_attr "type" "vecperm")])

-(define_expand "vsx_xxmrglw_<mode>"
+(define_insn "vsx_xxmrglw_<mode>"
    [(set (match_operand:VSX_W 0 "vsx_register_operand" "=wa")
  	(vec_select:VSX_W
  	  (vec_concat:<VS_double>
@@ -4708,13 +4705,10 @@ (define_expand "vsx_xxmrglw_<mode>"
  		     (const_int 3) (const_int 7)])))]
    "VECTOR_MEM_VSX_P (<MODE>mode)"
  {
-  rtx (*fun) (rtx, rtx, rtx);
-  fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrglw_direct_<mode>
-			 : gen_altivec_vmrghw_direct_<mode>;
-  if (!BYTES_BIG_ENDIAN)
-    std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
-  DONE;
+  if (BYTES_BIG_ENDIAN)
+    return "xxmrglw %x0,%x1,%x2";
+  else
+    return "xxmrghw %x0,%x2,%x1";
  }
    [(set_attr "type" "vecperm")])

diff --git a/gcc/testsuite/g++.target/powerpc/pr106069.C b/gcc/testsuite/g++.target/powerpc/pr106069.C
new file mode 100644
index 00000000000..2cde9b821e3
--- /dev/null
+++ b/gcc/testsuite/g++.target/powerpc/pr106069.C
@@ -0,0 +1,120 @@
+/* { dg-options "-O -fno-tree-forwprop -maltivec" } */
+/* { dg-require-effective-target vmx_hw } */
+/* { dg-do run } */
+
+extern "C" void *
+memcpy (void *, const void *, unsigned long);
+typedef __attribute__ ((altivec (vector__))) unsigned native_simd_type;
+
+union
+{
+  native_simd_type V;
+  int R[4];
+} store_le_vec;
+
+struct S
+{
+  S () = default;
+  S (unsigned B0)
+  {
+    native_simd_type val{B0};
+    m_simd = val;
+  }
+  void store_le (unsigned int out[])
+  {
+    store_le_vec.V = m_simd;
+    unsigned int x0 = store_le_vec.R[0];
+    memcpy (out, &x0, 4);
+  }
+  S rotl (unsigned int r)
+  {
+    native_simd_type rot{r};
+    return __builtin_vec_rl (m_simd, rot);
+  }
+  void operator+= (S other)
+  {
+    m_simd = __builtin_vec_add (m_simd, other.m_simd);
+  }
+  void operator^= (S other)
+  {
+    m_simd = __builtin_vec_xor (m_simd, other.m_simd);
+  }
+  static void transpose (S &B0, S B1, S B2, S B3)
+  {
+    native_simd_type T0 = __builtin_vec_mergeh (B0.m_simd, B2.m_simd);
+    native_simd_type T1 = __builtin_vec_mergeh (B1.m_simd, B3.m_simd);
+    native_simd_type T2 = __builtin_vec_mergel (B0.m_simd, B2.m_simd);
+    native_simd_type T3 = __builtin_vec_mergel (B1.m_simd, B3.m_simd);
+    B0 = __builtin_vec_mergeh (T0, T1);
+    B3 = __builtin_vec_mergel (T2, T3);
+  }
+  S (native_simd_type x) : m_simd (x) {}
+  native_simd_type m_simd;
+};
+
+void
+foo (unsigned int output[], unsigned state[])
+{
+  S R00 = state[0];
+  S R01 = state[0];
+  S R02 = state[2];
+  S R03 = state[0];
+  S R05 = state[5];
+  S R06 = state[6];
+  S R07 = state[7];
+  S R08 = state[8];
+  S R09 = state[9];
+  S R10 = state[10];
+  S R11 = state[11];
+  S R12 = state[12];
+  S R13 = state[13];
+  S R14 = state[4];
+  S R15 = state[15];
+  for (int r = 0; r != 10; ++r)
+    {
+      R09 += R13;
+      R11 += R15;
+      R05 ^= R09;
+      R06 ^= R10;
+      R07 ^= R11;
+      R07 = R07.rotl (7);
+      R00 += R05;
+      R01 += R06;
+      R02 += R07;
+      R15 ^= R00;
+      R12 ^= R01;
+      R13 ^= R02;
+      R00 += R05;
+      R01 += R06;
+      R02 += R07;
+      R15 ^= R00;
+      R12 = R12.rotl (8);
+      R13 = R13.rotl (8);
+      R10 += R15;
+      R11 += R12;
+      R08 += R13;
+      R09 += R14;
+      R05 ^= R10;
+      R06 ^= R11;
+      R07 ^= R08;
+      R05 = R05.rotl (7);
+      R06 = R06.rotl (7);
+      R07 = R07.rotl (7);
+    }
+  R00 += state[0];
+  S::transpose (R00, R01, R02, R03);
+  R00.store_le (output);
+}
+
+unsigned int res[1];
+unsigned main_state[]{1634760805, 60878,      2036477234, 6,
+		      0,	  825562964,  1471091955, 1346092787,
+		      506976774,  4197066702, 518848283,  118491664,
+		      0,	  0,	      0,	  0};
+int
+main ()
+{
+  foo (res, main_state);
+  if (res[0] != 0x41fcef98)
+    __builtin_abort ();
+}
-- 
2.27.0



* Re: [PATCH v2] rs6000: Fix incorrect RTL for Power LE when removing the UNSPECS [PR106069]
  2022-08-10  6:39   ` [PATCH v2] " Xionghu Luo
@ 2022-08-10 17:07     ` Segher Boessenkool
  2022-08-11  6:15       ` Xionghu Luo
  0 siblings, 1 reply; 12+ messages in thread
From: Segher Boessenkool @ 2022-08-10 17:07 UTC
  To: Xionghu Luo; +Cc: Kewen.Lin, Xionghu Luo, gcc-patches, David Edelsohn

On Wed, Aug 10, 2022 at 02:39:02PM +0800, Xionghu Luo wrote:
> On 2022/8/9 11:01, Kewen.Lin wrote:
> >I have some concern on those changed "altivec_*_direct", IMHO the suffix
> >"_direct" is normally to indicate the define_insn is mapped to the
> >corresponding hw insn directly.  With this change, for example,
> >altivec_vmrghb_direct can be mapped into vmrghb or vmrglb, this looks
> >misleading.  Maybe we can add the corresponding _direct_le and _direct_be
> >versions, both are mapped into the same insn but have different RTL
> >patterns.  Looking forward to Segher's and David's suggestions.
> 
> Thanks!  Do you mean same RTL patterns with different hw insn?

A pattern called altivec_vmrghb_direct_le should always emit a vmrghb
instruction, never a vmrglb instead.  Misleading names are an expensive
problem.


Segher

* Re: [PATCH v2] rs6000: Fix incorrect RTL for Power LE when removing the UNSPECS [PR106069]
  2022-08-10 17:07     ` Segher Boessenkool
@ 2022-08-11  6:15       ` Xionghu Luo
  2022-08-16  6:53         ` Kewen.Lin
  0 siblings, 1 reply; 12+ messages in thread
From: Xionghu Luo @ 2022-08-11  6:15 UTC
  To: Segher Boessenkool; +Cc: Kewen.Lin, Xionghu Luo, gcc-patches, David Edelsohn



On 2022/8/11 01:07, Segher Boessenkool wrote:
> On Wed, Aug 10, 2022 at 02:39:02PM +0800, Xionghu Luo wrote:
>> On 2022/8/9 11:01, Kewen.Lin wrote:
>>> I have some concern on those changed "altivec_*_direct", IMHO the suffix
>>> "_direct" is normally to indicate the define_insn is mapped to the
>>> corresponding hw insn directly.  With this change, for example,
>>> altivec_vmrghb_direct can be mapped into vmrghb or vmrglb, this looks
>>> misleading.  Maybe we can add the corresponding _direct_le and _direct_be
>>> versions, both are mapped into the same insn but have different RTL
>>> patterns.  Looking forward to Segher's and David's suggestions.
>>
>> Thanks!  Do you mean same RTL patterns with different hw insn?
> 
> A pattern called altivec_vmrghb_direct_le should always emit a vmrghb
> instruction, never a vmrglb instead.  Misleading names are an expensive
> problem.
> 
> 

Thanks.  Then on LE platforms, if the user calls altivec_vmrghw, it will be
expanded to the RTL (vec_select (vec_concat R0 R1) (parallel [0 4 1 5])), and
finally matched to altivec_vmrglw_direct_v4si_le, which emits "vmrglw".
For BE it is just straightforward, which seems clearer :-).  OK for master?
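A small model can confirm why the LE pattern needs the operand swap. It assumes the usual GCC convention on little-endian Power that RTL element i lives in ISA (big-endian numbered) element N-1-i, and uses the ISA definitions of vmrgh[bhw] and vmrgl[bhw]. All names below are hypothetical; this is a C++17 sketch, not GCC code:

```cpp
#include <array>
#include <cstddef>

// Hypothetical model: ISA element numbering is big-endian (element 0 is the
// leftmost); on little-endian, GCC's RTL element i is ISA element N-1-i.
template <std::size_t N>
using Vec = std::array<int, N>;

// Map between RTL (LE) order and ISA order; i -> N-1-i is its own inverse.
template <std::size_t N>
constexpr Vec<N> reverse_elems(const Vec<N>& v) {
  Vec<N> r{};
  for (std::size_t i = 0; i < N; ++i) r[i] = v[N - 1 - i];
  return r;
}

// vmrgh[bhw] / vmrgl[bhw] on ISA-numbered elements:
//   high: {A0, B0, A1, B1, ...}   low: {A(N/2), B(N/2), A(N/2+1), ...}
template <std::size_t N>
constexpr Vec<N> isa_merge(const Vec<N>& va, const Vec<N>& vb, bool low) {
  Vec<N> d{};
  const std::size_t base = low ? N / 2 : 0;
  for (std::size_t j = 0; j < N / 2; ++j) {
    d[2 * j] = va[base + j];
    d[2 * j + 1] = vb[base + j];
  }
  return d;
}

// LE insn "vmrg{l,h}X %0,%2,%1" with RTL operands x (%1) and y (%2),
// with the result viewed back in RTL element order.
template <std::size_t N>
constexpr Vec<N> le_merge_swapped(const Vec<N>& x, const Vec<N>& y,
                                  bool emit_low) {
  return reverse_elems(isa_merge(reverse_elems(y), reverse_elems(x), emit_low));
}

constexpr Vec<4> x{10, 11, 12, 13};
constexpr Vec<4> y{20, 21, 22, 23};
// "vmrglw %0,%2,%1" acts as (vec_select (vec_concat x y) (parallel [0 4 1 5])).
constexpr Vec<4> hi = le_merge_swapped(x, y, true);
static_assert(hi[0] == 10 && hi[1] == 20 && hi[2] == 11 && hi[3] == 21,
              "LE vmrglw with swapped operands gives RTL [0 4 1 5]");
```

Passing emit_low = false models "vmrghw %0,%2,%1", which yields the [2 6 3 7] selection that altivec_vmrglw expands to. The same template with N = 16 covers the byte variants.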


[PATCH v3] rs6000: Fix incorrect RTL for Power LE when removing the UNSPECS [PR106069]

v3: Rename altivec_vmrghb_direct_le to altivec_vmrglb_direct_le to match
the actual output ASM, vmrglb.  Likewise for all similar xxx_direct_le
patterns.
v2: Split each direct pattern into BE and LE variants with the same RTL but
different instructions.

The native RTL expression for vec_mrghw should be the same for BE and LE,
as it is register- and endian-independent.  So both BE and LE need to
generate exactly the same RTL, with index [0 4 1 5], when expanding
vec_mrghw with vec_select and vec_concat.

(set (reg:V4SI 141) (vec_select:V4SI (vec_concat:V8SI
		   (subreg:V4SI (reg:V16QI 139) 0)
		   (subreg:V4SI (reg:V16QI 140) 0))
		   [const_int 0 4 1 5]))
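The index lists that the new define_expands build with gen_rtvec follow a single rule for every element width.  As a sketch (merge_selector is a hypothetical helper, not something in GCC), they can be generated as follows; because the numbers are RTL element numbers, nothing here depends on endianness.

```c
#include <assert.h>

/* Fill SEL with the vec_select indices that interleave the high (first)
   or low (second) halves of two N-element vectors joined by vec_concat:
   high gives {0, N, 1, N+1, ...}, low gives {N/2, N + N/2, ...}.  */
static void merge_selector (int n, int high, int sel[])
{
  assert (n > 0 && n % 2 == 0);
  int base = high ? 0 : n / 2;
  for (int i = 0; i < n / 2; i++)
    {
      sel[2 * i] = base + i;          /* element taken from operand 1 */
      sel[2 * i + 1] = n + base + i;  /* element taken from operand 2 */
    }
}
```

For N = 4 and the high half this yields 0, 4, 1, 5 (the altivec_vmrghw list); for N = 16 and the low half it yields 8, 24, 9, 25, ..., 15, 31, the list used in the new altivec_vmrglb expander.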

The combine pass can then perform the nested vec_select simplification
in simplify-rtx.c:simplify_binary_operation_1 on both BE and LE:

21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel [0 4 1 5])
24: {r151:SI=vec_select(r150:V4SI,parallel [const_int 3]);}

=>

21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel)
24: {r151:SI=vec_select(r146:V4SI,parallel [const_int 1]);}
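The simplification above is index arithmetic: element 3 of the [0 4 1 5] selection is index 5 of the 8-element concatenation, and since 5 >= 4 it is element 5 - 4 = 1 of the second operand, r146.  A minimal sketch of that step (fold_select_of_concat is a hypothetical helper; the real logic in simplify_binary_operation_1 is more general):

```c
#include <assert.h>

/* Resolve element IDX of (vec_select (vec_concat op0 op1) (parallel SEL)),
   where each operand has N elements.  Return 0 or 1 for the operand the
   element comes from and store that operand's element number in *ELT.  */
static int fold_select_of_concat (const int sel[], int n, int idx, int *elt)
{
  int k = sel[idx];
  assert (k >= 0 && k < 2 * n);
  if (k < n)
    {
      *elt = k;      /* taken from the first operand */
      return 0;
    }
  *elt = k - n;      /* taken from the second operand */
  return 1;
}
```

With sel = {0, 4, 1, 5}, n = 4 and idx = 3 this returns operand 1 and element 1, matching the rewritten insn 24.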

The endianness check is then needed only once, when the assembly is
finally generated.  The resulting assembly is also better, because the
nested vec_select is simplified to a simple scalar load.
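Once every expander emits endian-neutral RTL, the only endian-dependent decision left is in the output template.  The following sketch (merge_asm_template is hypothetical, and it ignores the VSX xxmrg[hl]w alternatives) reproduces the AltiVec templates of the new _be/_le pattern pairs: on BE the RTL maps to the same-named instruction, while on LE it maps to the opposite half with operands 1 and 2 swapped.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write into BUF the AltiVec output template for a merge pattern whose
   RTL interleaves the high (HIGH != 0) or low half of operands 1 and 2.
   SIZE is the element suffix: 'b', 'h' or 'w'.  */
static void merge_asm_template (int high, char size, int big_endian,
                                char *buf, size_t len)
{
  assert (size != '\0' && strchr ("bhw", size) != NULL);
  /* On LE, RTL element 0 is the rightmost element, so the RTL "high" half
     is the hardware "low" half and the operand order is reversed.  */
  int hw_high = big_endian ? high : !high;
  snprintf (buf, len, "vmrg%c%c %%0,%s", hw_high ? 'h' : 'l', size,
            big_endian ? "%1,%2" : "%2,%1");
}
```

For instance, merge_asm_template (1, 'w', 0, buf, sizeof buf) produces "vmrglw %0,%2,%1", the template of altivec_vmrglw_direct_<mode>_le.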

Regression testing passed on Power8{LE,BE}{32,64} and Power{9,10}LE{64}
Linux (thanks to Kewen).

gcc/ChangeLog:

	PR target/106069
	* config/rs6000/altivec.md (altivec_vmrghb_direct): Remove.
	(altivec_vmrghb_direct_be): New pattern for BE.
	(altivec_vmrglb_direct_le): New pattern for LE.
	(altivec_vmrghh_direct): Remove.
	(altivec_vmrghh_direct_be): New pattern for BE.
	(altivec_vmrglh_direct_le): New pattern for LE.
	(altivec_vmrghw_direct_<mode>): Remove.
	(altivec_vmrghw_direct_<mode>_be): New pattern for BE.
	(altivec_vmrglw_direct_<mode>_le): New pattern for LE.
	(altivec_vmrglb_direct): Remove.
	(altivec_vmrglb_direct_be): New pattern for BE.
	(altivec_vmrghb_direct_le): New pattern for LE.
	(altivec_vmrglh_direct): Remove.
	(altivec_vmrglh_direct_be): New pattern for BE.
	(altivec_vmrghh_direct_le): New pattern for LE.
	(altivec_vmrglw_direct_<mode>): Remove.
	(altivec_vmrglw_direct_<mode>_be): New pattern for BE.
	(altivec_vmrghw_direct_<mode>_le): New pattern for LE.
	* config/rs6000/rs6000.cc (altivec_expand_vec_perm_const):
	Adjust.
	* config/rs6000/vsx.md: Likewise.

gcc/testsuite/ChangeLog:

	PR target/106069
	* g++.target/powerpc/pr106069.C: New test.

Signed-off-by: Xionghu Luo <xionghuluo@tencent.com>
---
  gcc/config/rs6000/altivec.md                | 223 ++++++++++++++------
  gcc/config/rs6000/rs6000.cc                 |  36 ++--
  gcc/config/rs6000/vsx.md                    |  24 +--
  gcc/testsuite/g++.target/powerpc/pr106069.C | 120 +++++++++++
  4 files changed, 305 insertions(+), 98 deletions(-)
  create mode 100644 gcc/testsuite/g++.target/powerpc/pr106069.C

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 2c4940f2e21..78245f470e9 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -1144,15 +1144,17 @@ (define_expand "altivec_vmrghb"
     (use (match_operand:V16QI 2 "register_operand"))]
    "TARGET_ALTIVEC"
  {
-  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrghb_direct
-						: gen_altivec_vmrglb_direct;
-  if (!BYTES_BIG_ENDIAN)
-    std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
+  rtvec v = gen_rtvec (16, GEN_INT (0), GEN_INT (16), GEN_INT (1), GEN_INT (17),
+		      GEN_INT (2), GEN_INT (18), GEN_INT (3), GEN_INT (19),
+		      GEN_INT (4), GEN_INT (20), GEN_INT (5), GEN_INT (21),
+		      GEN_INT (6), GEN_INT (22), GEN_INT (7), GEN_INT (23));
+  rtx x = gen_rtx_VEC_CONCAT (V32QImode, operands[1], operands[2]);
+  x = gen_rtx_VEC_SELECT (V16QImode, x, gen_rtx_PARALLEL (VOIDmode, v));
+  emit_insn (gen_rtx_SET (operands[0], x));
    DONE;
  })
  
-(define_insn "altivec_vmrghb_direct"
+(define_insn "altivec_vmrghb_direct_be"
    [(set (match_operand:V16QI 0 "register_operand" "=v")
  	(vec_select:V16QI
  	  (vec_concat:V32QI
@@ -1166,27 +1168,46 @@ (define_insn "altivec_vmrghb_direct"
  		     (const_int 5) (const_int 21)
  		     (const_int 6) (const_int 22)
  		     (const_int 7) (const_int 23)])))]
-  "TARGET_ALTIVEC"
+  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
    "vmrghb %0,%1,%2"
    [(set_attr "type" "vecperm")])
  
+(define_insn "altivec_vmrglb_direct_le"
+  [(set (match_operand:V16QI 0 "register_operand" "=v")
+	(vec_select:V16QI
+	  (vec_concat:V32QI
+	    (match_operand:V16QI 1 "register_operand" "v")
+	    (match_operand:V16QI 2 "register_operand" "v"))
+	  (parallel [(const_int 0) (const_int 16)
+		     (const_int 1) (const_int 17)
+		     (const_int 2) (const_int 18)
+		     (const_int 3) (const_int 19)
+		     (const_int 4) (const_int 20)
+		     (const_int 5) (const_int 21)
+		     (const_int 6) (const_int 22)
+		     (const_int 7) (const_int 23)])))]
+  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
+  "vmrglb %0,%2,%1"
+  [(set_attr "type" "vecperm")])
+
  (define_expand "altivec_vmrghh"
    [(use (match_operand:V8HI 0 "register_operand"))
     (use (match_operand:V8HI 1 "register_operand"))
     (use (match_operand:V8HI 2 "register_operand"))]
    "TARGET_ALTIVEC"
  {
-  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrghh_direct
-						: gen_altivec_vmrglh_direct;
-  if (!BYTES_BIG_ENDIAN)
-    std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
+  rtvec v = gen_rtvec (8, GEN_INT (0), GEN_INT (8), GEN_INT (1), GEN_INT (9),
+		      GEN_INT (2), GEN_INT (10), GEN_INT (3), GEN_INT (11));
+  rtx x = gen_rtx_VEC_CONCAT (V16HImode, operands[1], operands[2]);
+
+  x = gen_rtx_VEC_SELECT (V8HImode, x, gen_rtx_PARALLEL (VOIDmode, v));
+  emit_insn (gen_rtx_SET (operands[0], x));
    DONE;
  })
  
-(define_insn "altivec_vmrghh_direct"
+(define_insn "altivec_vmrghh_direct_be"
    [(set (match_operand:V8HI 0 "register_operand" "=v")
-        (vec_select:V8HI
+	(vec_select:V8HI
  	  (vec_concat:V16HI
  	    (match_operand:V8HI 1 "register_operand" "v")
  	    (match_operand:V8HI 2 "register_operand" "v"))
@@ -1194,26 +1215,38 @@ (define_insn "altivec_vmrghh_direct"
  		     (const_int 1) (const_int 9)
  		     (const_int 2) (const_int 10)
  		     (const_int 3) (const_int 11)])))]
-  "TARGET_ALTIVEC"
+  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
    "vmrghh %0,%1,%2"
    [(set_attr "type" "vecperm")])
  
+(define_insn "altivec_vmrglh_direct_le"
+  [(set (match_operand:V8HI 0 "register_operand" "=v")
+	(vec_select:V8HI
+	  (vec_concat:V16HI
+	    (match_operand:V8HI 1 "register_operand" "v")
+	    (match_operand:V8HI 2 "register_operand" "v"))
+	  (parallel [(const_int 0) (const_int 8)
+		     (const_int 1) (const_int 9)
+		     (const_int 2) (const_int 10)
+		     (const_int 3) (const_int 11)])))]
+  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
+  "vmrglh %0,%2,%1"
+  [(set_attr "type" "vecperm")])
+
  (define_expand "altivec_vmrghw"
    [(use (match_operand:V4SI 0 "register_operand"))
     (use (match_operand:V4SI 1 "register_operand"))
     (use (match_operand:V4SI 2 "register_operand"))]
    "VECTOR_MEM_ALTIVEC_P (V4SImode)"
  {
-  rtx (*fun) (rtx, rtx, rtx);
-  fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrghw_direct_v4si
-			 : gen_altivec_vmrglw_direct_v4si;
-  if (!BYTES_BIG_ENDIAN)
-    std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
+  rtvec v = gen_rtvec (4, GEN_INT (0), GEN_INT (4), GEN_INT (1), GEN_INT (5));
+  rtx x = gen_rtx_VEC_CONCAT (V8SImode, operands[1], operands[2]);
+  x = gen_rtx_VEC_SELECT (V4SImode, x, gen_rtx_PARALLEL (VOIDmode, v));
+  emit_insn (gen_rtx_SET (operands[0], x));
    DONE;
  })
  
-(define_insn "altivec_vmrghw_direct_<mode>"
+(define_insn "altivec_vmrghw_direct_<mode>_be"
    [(set (match_operand:VSX_W 0 "register_operand" "=wa,v")
  	(vec_select:VSX_W
  	  (vec_concat:<VS_double>
@@ -1221,10 +1254,24 @@ (define_insn "altivec_vmrghw_direct_<mode>"
  	    (match_operand:VSX_W 2 "register_operand" "wa,v"))
  	  (parallel [(const_int 0) (const_int 4)
  		     (const_int 1) (const_int 5)])))]
-  "TARGET_ALTIVEC"
+  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
+  "@
+  xxmrghw %x0,%x1,%x2
+  vmrghw %0,%1,%2"
+  [(set_attr "type" "vecperm")])
+
+(define_insn "altivec_vmrglw_direct_<mode>_le"
+  [(set (match_operand:VSX_W 0 "register_operand" "=wa,v")
+	(vec_select:VSX_W
+	  (vec_concat:<VS_double>
+	    (match_operand:VSX_W 1 "register_operand" "wa,v")
+	    (match_operand:VSX_W 2 "register_operand" "wa,v"))
+	  (parallel [(const_int 0) (const_int 4)
+		     (const_int 1) (const_int 5)])))]
+  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
    "@
-   xxmrghw %x0,%x1,%x2
-   vmrghw %0,%1,%2"
+  xxmrglw %x0,%x2,%x1
+  vmrglw %0,%2,%1"
    [(set_attr "type" "vecperm")])
  
  (define_insn "*altivec_vmrghsf"
@@ -1250,15 +1297,17 @@ (define_expand "altivec_vmrglb"
     (use (match_operand:V16QI 2 "register_operand"))]
    "TARGET_ALTIVEC"
  {
-  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrglb_direct
-						: gen_altivec_vmrghb_direct;
-  if (!BYTES_BIG_ENDIAN)
-    std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
+  rtvec v = gen_rtvec (16, GEN_INT (8), GEN_INT (24), GEN_INT (9), GEN_INT (25),
+		      GEN_INT (10), GEN_INT (26), GEN_INT (11), GEN_INT (27),
+		      GEN_INT (12), GEN_INT (28), GEN_INT (13), GEN_INT (29),
+		      GEN_INT (14), GEN_INT (30), GEN_INT (15), GEN_INT (31));
+  rtx x = gen_rtx_VEC_CONCAT (V32QImode, operands[1], operands[2]);
+  x = gen_rtx_VEC_SELECT (V16QImode, x, gen_rtx_PARALLEL (VOIDmode, v));
+  emit_insn (gen_rtx_SET (operands[0], x));
    DONE;
  })
  
-(define_insn "altivec_vmrglb_direct"
+(define_insn "altivec_vmrglb_direct_be"
    [(set (match_operand:V16QI 0 "register_operand" "=v")
  	(vec_select:V16QI
  	  (vec_concat:V32QI
@@ -1272,25 +1321,43 @@ (define_insn "altivec_vmrglb_direct"
  		     (const_int 13) (const_int 29)
  		     (const_int 14) (const_int 30)
  		     (const_int 15) (const_int 31)])))]
-  "TARGET_ALTIVEC"
+  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
    "vmrglb %0,%1,%2"
    [(set_attr "type" "vecperm")])
  
+(define_insn "altivec_vmrghb_direct_le"
+  [(set (match_operand:V16QI 0 "register_operand" "=v")
+	(vec_select:V16QI
+	  (vec_concat:V32QI
+	    (match_operand:V16QI 1 "register_operand" "v")
+	    (match_operand:V16QI 2 "register_operand" "v"))
+	  (parallel [(const_int  8) (const_int 24)
+		     (const_int  9) (const_int 25)
+		     (const_int 10) (const_int 26)
+		     (const_int 11) (const_int 27)
+		     (const_int 12) (const_int 28)
+		     (const_int 13) (const_int 29)
+		     (const_int 14) (const_int 30)
+		     (const_int 15) (const_int 31)])))]
+  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
+  "vmrghb %0,%2,%1"
+  [(set_attr "type" "vecperm")])
+
  (define_expand "altivec_vmrglh"
    [(use (match_operand:V8HI 0 "register_operand"))
     (use (match_operand:V8HI 1 "register_operand"))
     (use (match_operand:V8HI 2 "register_operand"))]
    "TARGET_ALTIVEC"
  {
-  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrglh_direct
-						: gen_altivec_vmrghh_direct;
-  if (!BYTES_BIG_ENDIAN)
-    std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
+  rtvec v = gen_rtvec (8, GEN_INT (4), GEN_INT (12), GEN_INT (5), GEN_INT (13),
+		      GEN_INT (6), GEN_INT (14), GEN_INT (7), GEN_INT (15));
+  rtx x = gen_rtx_VEC_CONCAT (V16HImode, operands[1], operands[2]);
+  x = gen_rtx_VEC_SELECT (V8HImode, x, gen_rtx_PARALLEL (VOIDmode, v));
+  emit_insn (gen_rtx_SET (operands[0], x));
    DONE;
  })
  
-(define_insn "altivec_vmrglh_direct"
+(define_insn "altivec_vmrglh_direct_be"
    [(set (match_operand:V8HI 0 "register_operand" "=v")
          (vec_select:V8HI
  	  (vec_concat:V16HI
@@ -1300,26 +1367,38 @@ (define_insn "altivec_vmrglh_direct"
  		     (const_int 5) (const_int 13)
  		     (const_int 6) (const_int 14)
  		     (const_int 7) (const_int 15)])))]
-  "TARGET_ALTIVEC"
+  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
    "vmrglh %0,%1,%2"
    [(set_attr "type" "vecperm")])
  
+(define_insn "altivec_vmrghh_direct_le"
+  [(set (match_operand:V8HI 0 "register_operand" "=v")
+        (vec_select:V8HI
+	  (vec_concat:V16HI
+	    (match_operand:V8HI 1 "register_operand" "v")
+	    (match_operand:V8HI 2 "register_operand" "v"))
+	  (parallel [(const_int 4) (const_int 12)
+		     (const_int 5) (const_int 13)
+		     (const_int 6) (const_int 14)
+		     (const_int 7) (const_int 15)])))]
+  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
+  "vmrghh %0,%2,%1"
+  [(set_attr "type" "vecperm")])
+
  (define_expand "altivec_vmrglw"
    [(use (match_operand:V4SI 0 "register_operand"))
     (use (match_operand:V4SI 1 "register_operand"))
     (use (match_operand:V4SI 2 "register_operand"))]
    "VECTOR_MEM_ALTIVEC_P (V4SImode)"
  {
-  rtx (*fun) (rtx, rtx, rtx);
-  fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrglw_direct_v4si
-			 : gen_altivec_vmrghw_direct_v4si;
-  if (!BYTES_BIG_ENDIAN)
-    std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
+  rtvec v = gen_rtvec (4, GEN_INT (2), GEN_INT (6), GEN_INT (3), GEN_INT (7));
+  rtx x = gen_rtx_VEC_CONCAT (V8SImode, operands[1], operands[2]);
+  x = gen_rtx_VEC_SELECT (V4SImode, x, gen_rtx_PARALLEL (VOIDmode, v));
+  emit_insn (gen_rtx_SET (operands[0], x));
    DONE;
  })
  
-(define_insn "altivec_vmrglw_direct_<mode>"
+(define_insn "altivec_vmrglw_direct_<mode>_be"
    [(set (match_operand:VSX_W 0 "register_operand" "=wa,v")
  	(vec_select:VSX_W
  	  (vec_concat:<VS_double>
@@ -1327,10 +1406,24 @@ (define_insn "altivec_vmrglw_direct_<mode>"
  	    (match_operand:VSX_W 2 "register_operand" "wa,v"))
  	  (parallel [(const_int 2) (const_int 6)
  		     (const_int 3) (const_int 7)])))]
-  "TARGET_ALTIVEC"
+  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
+  "@
+  xxmrglw %x0,%x1,%x2
+  vmrglw %0,%1,%2"
+  [(set_attr "type" "vecperm")])
+
+(define_insn "altivec_vmrghw_direct_<mode>_le"
+  [(set (match_operand:VSX_W 0 "register_operand" "=wa,v")
+	(vec_select:VSX_W
+	  (vec_concat:<VS_double>
+	    (match_operand:VSX_W 1 "register_operand" "wa,v")
+	    (match_operand:VSX_W 2 "register_operand" "wa,v"))
+	  (parallel [(const_int 2) (const_int 6)
+		     (const_int 3) (const_int 7)])))]
+  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
    "@
-   xxmrglw %x0,%x1,%x2
-   vmrglw %0,%1,%2"
+  xxmrghw %x0,%x2,%x1
+  vmrghw %0,%2,%1"
    [(set_attr "type" "vecperm")])
  
  (define_insn "*altivec_vmrglsf"
@@ -3699,13 +3792,13 @@ (define_expand "vec_widen_umult_hi_v16qi"
      {
        emit_insn (gen_altivec_vmuleub (ve, operands[1], operands[2]));
        emit_insn (gen_altivec_vmuloub (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrghh_direct (operands[0], ve, vo));
+      emit_insn (gen_altivec_vmrghh_direct_be (operands[0], ve, vo));
      }
    else
      {
        emit_insn (gen_altivec_vmuloub (ve, operands[1], operands[2]));
        emit_insn (gen_altivec_vmuleub (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrghh_direct (operands[0], vo, ve));
+      emit_insn (gen_altivec_vmrghh_direct_le (operands[0], ve, vo));
      }
    DONE;
  })
@@ -3724,13 +3817,13 @@ (define_expand "vec_widen_umult_lo_v16qi"
      {
        emit_insn (gen_altivec_vmuleub (ve, operands[1], operands[2]));
        emit_insn (gen_altivec_vmuloub (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrglh_direct (operands[0], ve, vo));
+      emit_insn (gen_altivec_vmrglh_direct_be (operands[0], ve, vo));
      }
    else
      {
        emit_insn (gen_altivec_vmuloub (ve, operands[1], operands[2]));
        emit_insn (gen_altivec_vmuleub (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrglh_direct (operands[0], vo, ve));
+      emit_insn (gen_altivec_vmrglh_direct_le (operands[0], ve, vo));
      }
    DONE;
  })
@@ -3749,13 +3842,13 @@ (define_expand "vec_widen_smult_hi_v16qi"
      {
        emit_insn (gen_altivec_vmulesb (ve, operands[1], operands[2]));
        emit_insn (gen_altivec_vmulosb (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrghh_direct (operands[0], ve, vo));
+      emit_insn (gen_altivec_vmrghh_direct_be (operands[0], ve, vo));
      }
    else
      {
        emit_insn (gen_altivec_vmulosb (ve, operands[1], operands[2]));
        emit_insn (gen_altivec_vmulesb (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrghh_direct (operands[0], vo, ve));
+      emit_insn (gen_altivec_vmrghh_direct_le (operands[0], ve, vo));
      }
    DONE;
  })
@@ -3774,13 +3867,13 @@ (define_expand "vec_widen_smult_lo_v16qi"
      {
        emit_insn (gen_altivec_vmulesb (ve, operands[1], operands[2]));
        emit_insn (gen_altivec_vmulosb (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrglh_direct (operands[0], ve, vo));
+      emit_insn (gen_altivec_vmrglh_direct_be (operands[0], ve, vo));
      }
    else
      {
        emit_insn (gen_altivec_vmulosb (ve, operands[1], operands[2]));
        emit_insn (gen_altivec_vmulesb (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrglh_direct (operands[0], vo, ve));
+      emit_insn (gen_altivec_vmrglh_direct_le (operands[0], ve, vo));
      }
    DONE;
  })
@@ -3799,13 +3892,13 @@ (define_expand "vec_widen_umult_hi_v8hi"
      {
        emit_insn (gen_altivec_vmuleuh (ve, operands[1], operands[2]));
        emit_insn (gen_altivec_vmulouh (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], ve, vo));
+      emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0], ve, vo));
      }
    else
      {
        emit_insn (gen_altivec_vmulouh (ve, operands[1], operands[2]));
        emit_insn (gen_altivec_vmuleuh (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], vo, ve));
+      emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0], ve, vo));
      }
    DONE;
  })
@@ -3824,13 +3917,13 @@ (define_expand "vec_widen_umult_lo_v8hi"
      {
        emit_insn (gen_altivec_vmuleuh (ve, operands[1], operands[2]));
        emit_insn (gen_altivec_vmulouh (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], ve, vo));
+      emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0], ve, vo));
      }
    else
      {
        emit_insn (gen_altivec_vmulouh (ve, operands[1], operands[2]));
        emit_insn (gen_altivec_vmuleuh (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], vo, ve));
+      emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0], ve, vo));
      }
    DONE;
  })
@@ -3849,13 +3942,13 @@ (define_expand "vec_widen_smult_hi_v8hi"
      {
        emit_insn (gen_altivec_vmulesh (ve, operands[1], operands[2]));
        emit_insn (gen_altivec_vmulosh (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], ve, vo));
+      emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0], ve, vo));
      }
    else
      {
        emit_insn (gen_altivec_vmulosh (ve, operands[1], operands[2]));
        emit_insn (gen_altivec_vmulesh (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], vo, ve));
+      emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0], ve, vo));
      }
    DONE;
  })
@@ -3874,13 +3967,13 @@ (define_expand "vec_widen_smult_lo_v8hi"
      {
        emit_insn (gen_altivec_vmulesh (ve, operands[1], operands[2]));
        emit_insn (gen_altivec_vmulosh (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], ve, vo));
+      emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0], ve, vo));
      }
    else
      {
        emit_insn (gen_altivec_vmulosh (ve, operands[1], operands[2]));
        emit_insn (gen_altivec_vmulesh (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], vo, ve));
+      emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0], ve, vo));
      }
    DONE;
  })
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index df491bee2ea..97da7706f63 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -22941,29 +22941,17 @@ altivec_expand_vec_perm_const (rtx target, rtx op0, rtx op1,
      {OPTION_MASK_ALTIVEC,
       CODE_FOR_altivec_vpkuwum_direct,
       {2, 3, 6, 7, 10, 11, 14, 15, 18, 19, 22, 23, 26, 27, 30, 31}},
-    {OPTION_MASK_ALTIVEC,
-     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghb_direct
-		      : CODE_FOR_altivec_vmrglb_direct,
+    {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghb_direct_be,
       {0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23}},
-    {OPTION_MASK_ALTIVEC,
-     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghh_direct
-		      : CODE_FOR_altivec_vmrglh_direct,
+    {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghh_direct_be,
       {0, 1, 16, 17, 2, 3, 18, 19, 4, 5, 20, 21, 6, 7, 22, 23}},
-    {OPTION_MASK_ALTIVEC,
-     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghw_direct_v4si
-		      : CODE_FOR_altivec_vmrglw_direct_v4si,
+    {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghw_direct_v4si_be,
       {0, 1, 2, 3, 16, 17, 18, 19, 4, 5, 6, 7, 20, 21, 22, 23}},
-    {OPTION_MASK_ALTIVEC,
-     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglb_direct
-		      : CODE_FOR_altivec_vmrghb_direct,
+    {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglb_direct_be,
       {8, 24, 9, 25, 10, 26, 11, 27, 12, 28, 13, 29, 14, 30, 15, 31}},
-    {OPTION_MASK_ALTIVEC,
-     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglh_direct
-		      : CODE_FOR_altivec_vmrghh_direct,
+    {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglh_direct_be,
       {8, 9, 24, 25, 10, 11, 26, 27, 12, 13, 28, 29, 14, 15, 30, 31}},
-    {OPTION_MASK_ALTIVEC,
-     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglw_direct_v4si
-		      : CODE_FOR_altivec_vmrghw_direct_v4si,
+    {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglw_direct_v4si_be,
       {8, 9, 10, 11, 24, 25, 26, 27, 12, 13, 14, 15, 28, 29, 30, 31}},
      {OPTION_MASK_P8_VECTOR,
       BYTES_BIG_ENDIAN ? CODE_FOR_p8_vmrgew_v4sf_direct
@@ -23146,9 +23134,15 @@ altivec_expand_vec_perm_const (rtx target, rtx op0, rtx op1,
  
            /* For little-endian, the two input operands must be swapped
               (or swapped back) to ensure proper right-to-left numbering
-             from 0 to 2N-1.  */
-	  if (swapped ^ !BYTES_BIG_ENDIAN
-	      && icode != CODE_FOR_vsx_xxpermdi_v16qi)
+	     from 0 to 2N-1.  Excludes the vmrg[lh][bhw] and xxpermdi ops.  */
+	  if (swapped ^ !BYTES_BIG_ENDIAN)
+	    if (!(icode == CODE_FOR_altivec_vmrghb_direct_be
+		  || icode == CODE_FOR_altivec_vmrglb_direct_be
+		  || icode == CODE_FOR_altivec_vmrghh_direct_be
+		  || icode == CODE_FOR_altivec_vmrglh_direct_be
+		  || icode == CODE_FOR_altivec_vmrghw_direct_v4si_be
+		  || icode == CODE_FOR_altivec_vmrglw_direct_v4si_be
+		  || icode == CODE_FOR_vsx_xxpermdi_v16qi))
  	    std::swap (op0, op1);
  	  if (imode != V16QImode)
  	    {
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index e226a93bbe5..2ae1bce131d 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -4688,12 +4688,12 @@ (define_expand "vsx_xxmrghw_<mode>"
  		     (const_int 1) (const_int 5)])))]
    "VECTOR_MEM_VSX_P (<MODE>mode)"
  {
-  rtx (*fun) (rtx, rtx, rtx);
-  fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrghw_direct_<mode>
-			 : gen_altivec_vmrglw_direct_<mode>;
-  if (!BYTES_BIG_ENDIAN)
-    std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (
+	gen_altivec_vmrghw_direct_v4si_be (operands[0], operands[1], operands[2]));
+  else
+    emit_insn (
+	gen_altivec_vmrglw_direct_v4si_le (operands[0], operands[1], operands[2]));
    DONE;
  }
    [(set_attr "type" "vecperm")])
@@ -4708,12 +4708,12 @@ (define_expand "vsx_xxmrglw_<mode>"
  		     (const_int 3) (const_int 7)])))]
    "VECTOR_MEM_VSX_P (<MODE>mode)"
  {
-  rtx (*fun) (rtx, rtx, rtx);
-  fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrglw_direct_<mode>
-			 : gen_altivec_vmrghw_direct_<mode>;
-  if (!BYTES_BIG_ENDIAN)
-    std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (
+	gen_altivec_vmrglw_direct_v4si_be (operands[0], operands[1], operands[2]));
+  else
+    emit_insn (
+	gen_altivec_vmrghw_direct_v4si_le (operands[0], operands[1], operands[2]));
    DONE;
  }
    [(set_attr "type" "vecperm")])
diff --git a/gcc/testsuite/g++.target/powerpc/pr106069.C b/gcc/testsuite/g++.target/powerpc/pr106069.C
new file mode 100644
index 00000000000..2cde9b821e3
--- /dev/null
+++ b/gcc/testsuite/g++.target/powerpc/pr106069.C
@@ -0,0 +1,120 @@
+/* { dg-options "-O -fno-tree-forwprop -maltivec" } */
+/* { dg-require-effective-target vmx_hw } */
+/* { dg-do run } */
+
+extern "C" void *
+memcpy (void *, const void *, unsigned long);
+typedef __attribute__ ((altivec (vector__))) unsigned native_simd_type;
+
+union
+{
+  native_simd_type V;
+  int R[4];
+} store_le_vec;
+
+struct S
+{
+  S () = default;
+  S (unsigned B0)
+  {
+    native_simd_type val{B0};
+    m_simd = val;
+  }
+  void store_le (unsigned int out[])
+  {
+    store_le_vec.V = m_simd;
+    unsigned int x0 = store_le_vec.R[0];
+    memcpy (out, &x0, 4);
+  }
+  S rotl (unsigned int r)
+  {
+    native_simd_type rot{r};
+    return __builtin_vec_rl (m_simd, rot);
+  }
+  void operator+= (S other)
+  {
+    m_simd = __builtin_vec_add (m_simd, other.m_simd);
+  }
+  void operator^= (S other)
+  {
+    m_simd = __builtin_vec_xor (m_simd, other.m_simd);
+  }
+  static void transpose (S &B0, S B1, S B2, S B3)
+  {
+    native_simd_type T0 = __builtin_vec_mergeh (B0.m_simd, B2.m_simd);
+    native_simd_type T1 = __builtin_vec_mergeh (B1.m_simd, B3.m_simd);
+    native_simd_type T2 = __builtin_vec_mergel (B0.m_simd, B2.m_simd);
+    native_simd_type T3 = __builtin_vec_mergel (B1.m_simd, B3.m_simd);
+    B0 = __builtin_vec_mergeh (T0, T1);
+    B3 = __builtin_vec_mergel (T2, T3);
+  }
+  S (native_simd_type x) : m_simd (x) {}
+  native_simd_type m_simd;
+};
+
+void
+foo (unsigned int output[], unsigned state[])
+{
+  S R00 = state[0];
+  S R01 = state[0];
+  S R02 = state[2];
+  S R03 = state[0];
+  S R05 = state[5];
+  S R06 = state[6];
+  S R07 = state[7];
+  S R08 = state[8];
+  S R09 = state[9];
+  S R10 = state[10];
+  S R11 = state[11];
+  S R12 = state[12];
+  S R13 = state[13];
+  S R14 = state[4];
+  S R15 = state[15];
+  for (int r = 0; r != 10; ++r)
+    {
+      R09 += R13;
+      R11 += R15;
+      R05 ^= R09;
+      R06 ^= R10;
+      R07 ^= R11;
+      R07 = R07.rotl (7);
+      R00 += R05;
+      R01 += R06;
+      R02 += R07;
+      R15 ^= R00;
+      R12 ^= R01;
+      R13 ^= R02;
+      R00 += R05;
+      R01 += R06;
+      R02 += R07;
+      R15 ^= R00;
+      R12 = R12.rotl (8);
+      R13 = R13.rotl (8);
+      R10 += R15;
+      R11 += R12;
+      R08 += R13;
+      R09 += R14;
+      R05 ^= R10;
+      R06 ^= R11;
+      R07 ^= R08;
+      R05 = R05.rotl (7);
+      R06 = R06.rotl (7);
+      R07 = R07.rotl (7);
+    }
+  R00 += state[0];
+  S::transpose (R00, R01, R02, R03);
+  R00.store_le (output);
+}
+
+unsigned int res[1];
+unsigned main_state[]{1634760805, 60878,      2036477234, 6,
+		      0,	  825562964,  1471091955, 1346092787,
+		      506976774,  4197066702, 518848283,  118491664,
+		      0,	  0,	      0,	  0};
+int
+main ()
+{
+  foo (res, main_state);
+  if (res[0] != 0x41fcef98)
+    __builtin_abort ();
+}
-- 
2.27.0




* Re: [PATCH v2] rs6000: Fix incorrect RTL for Power LE when removing the UNSPECS [PR106069]
From: Kewen.Lin @ 2022-08-16  6:53 UTC
  To: Xionghu Luo; +Cc: Xionghu Luo, gcc-patches, David Edelsohn, Segher Boessenkool

Hi Xionghu,

Thanks for the updated version of the patch; some comments are inline.

on 2022/8/11 14:15, Xionghu Luo wrote:
> 
> 
> On 2022/8/11 01:07, Segher Boessenkool wrote:
>> On Wed, Aug 10, 2022 at 02:39:02PM +0800, Xionghu Luo wrote:
>>> On 2022/8/9 11:01, Kewen.Lin wrote:
>>>> I have some concern on those changed "altivec_*_direct", IMHO the suffix
>>>> "_direct" is normally to indicate the define_insn is mapped to the
>>>> corresponding hw insn directly.  With this change, for example,
>>>> altivec_vmrghb_direct can be mapped into vmrghb or vmrglb, this looks
>>>> misleading.  Maybe we can add the corresponding _direct_le and _direct_be
>>>> versions, both are mapped into the same insn but have different RTL
>>>> patterns.  Looking forward to Segher's and David's suggestions.
>>>
>>> Thanks!  Do you mean same RTL patterns with different hw insn?
>>
>> A pattern called altivec_vmrghb_direct_le should always emit a vmrghb
>> instruction, never a vmrglb instead.  Misleading names are an expensive
>> problem.
>>
>>
> 
> Thanks.  Then on LE platforms, if user calls altivec_vmrghw，it will be
> expanded to RTL (vec_select (vec_concat (R0 R1 (0 4 1 5))), and
> finally matched to altivec_vmrglw_direct_v4si_le with ASM "vmrglw".
> For BE just strict forward, seems more clear :-), OK for master?
> 
> 
> [PATCH v3] rs6000: Fix incorrect RTL for Power LE when removing the UNSPECS [PR106069]
> 
> v3: rename altivec_vmrghb_direct_le to altivec_vmrglb_direct_le to match
> the actual output ASM vmrglb. Likewise for all similar xxx_direct_le
> patterns.
> v2: Split the direct pattern to be and le with same RTL but different insn.
> 
> The native RTL expression for vec_mrghw should be same for BE and LE as
> they are register and endian-independent.  So both BE and LE need
> generate exactly same RTL with index [0 4 1 5] when expanding vec_mrghw
> with vec_select and vec_concat.
> 
> (set (reg:V4SI 141) (vec_select:V4SI (vec_concat:V8SI
>            (subreg:V4SI (reg:V16QI 139) 0)
>            (subreg:V4SI (reg:V16QI 140) 0))
>            [const_int 0 4 1 5]))
> 
> Then combine pass could do the nested vec_select optimization
> in simplify-rtx.c:simplify_binary_operation_1 also on both BE and LE:
> 
> 21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel [0 4 1 5])
> 24: {r151:SI=vec_select(r150:V4SI,parallel [const_int 3]);}
> 
> =>
> 
> 21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel)
> 24: {r151:SI=vec_select(r146:V4SI,parallel [const_int 1]);}
> 
> The endianness check need only once at ASM generation finally.
> ASM would be better due to nested vec_select simplified to simple scalar
> load.
> 
> Regression tested pass for Power8{LE,BE}{32,64} and Power{9,10}LE{64}
> Linux(Thanks to Kewen).
> 
> gcc/ChangeLog:
> 
>     PR target/106069
>     * config/rs6000/altivec.md (altivec_vmrghb_direct): Remove.
>     (altivec_vmrghb_direct_be): New pattern for BE.
>     (altivec_vmrglb_direct_le): New pattern for LE.
>     (altivec_vmrghh_direct): Remove.
>     (altivec_vmrghh_direct_be): New pattern for BE.
>     (altivec_vmrglh_direct_le): New pattern for LE.
>     (altivec_vmrghw_direct_<mode>): Remove.
>     (altivec_vmrghw_direct_<mode>_be): New pattern for BE.
>     (altivec_vmrglw_direct_<mode>_le): New pattern for LE.
>     (altivec_vmrglb_direct): Remove.
>     (altivec_vmrglb_direct_be): New pattern for BE.
>     (altivec_vmrghb_direct_le): New pattern for LE.
>     (altivec_vmrglh_direct): Remove.
>     (altivec_vmrglh_direct_be): New pattern for BE.
>     (altivec_vmrghh_direct_le): New pattern for LE.
>     (altivec_vmrglw_direct_<mode>): Remove.
>     (altivec_vmrglw_direct_<mode>_be): New pattern for BE.
>     (altivec_vmrghw_direct_<mode>_le): New pattern for LE.
>     * config/rs6000/rs6000.cc (altivec_expand_vec_perm_const):
>     Adjust.
>     * config/rs6000/vsx.md: Likewise.
> 
> gcc/testsuite/ChangeLog:
> 
>     PR target/106069
>     * g++.target/powerpc/pr106069.C: New test.
> 
> Signed-off-by: Xionghu Luo <xionghuluo@tencent.com>
> ---
>  gcc/config/rs6000/altivec.md                | 223 ++++++++++++++------
>  gcc/config/rs6000/rs6000.cc                 |  36 ++--
>  gcc/config/rs6000/vsx.md                    |  24 +--
>  gcc/testsuite/g++.target/powerpc/pr106069.C | 120 +++++++++++
>  4 files changed, 305 insertions(+), 98 deletions(-)
>  create mode 100644 gcc/testsuite/g++.target/powerpc/pr106069.C
> 
> diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
> index 2c4940f2e21..78245f470e9 100644
> --- a/gcc/config/rs6000/altivec.md
> +++ b/gcc/config/rs6000/altivec.md
> @@ -1144,15 +1144,17 @@ (define_expand "altivec_vmrghb"
>     (use (match_operand:V16QI 2 "register_operand"))]
>    "TARGET_ALTIVEC"
>  {
> -  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrghb_direct
> -                        : gen_altivec_vmrglb_direct;
> -  if (!BYTES_BIG_ENDIAN)
> -    std::swap (operands[1], operands[2]);
> -  emit_insn (fun (operands[0], operands[1], operands[2]));
> +  rtvec v = gen_rtvec (16, GEN_INT (0), GEN_INT (16), GEN_INT (1), GEN_INT (17),
> +              GEN_INT (2), GEN_INT (18), GEN_INT (3), GEN_INT (19),
> +              GEN_INT (4), GEN_INT (20), GEN_INT (5), GEN_INT (21),
> +              GEN_INT (6), GEN_INT (22), GEN_INT (7), GEN_INT (23));
> +  rtx x = gen_rtx_VEC_CONCAT (V32QImode, operands[1], operands[2]);
> +  x = gen_rtx_VEC_SELECT (V16QImode, x, gen_rtx_PARALLEL (VOIDmode, v));
> +  emit_insn (gen_rtx_SET (operands[0], x));
>    DONE;
>  })

I think you can just call gen_altivec_vmrghb_direct_be and
gen_altivec_vmrghb_direct_le separately here.  The same applies to some
of the other define_expands.

>  
> -(define_insn "altivec_vmrghb_direct"
> +(define_insn "altivec_vmrghb_direct_be"
>    [(set (match_operand:V16QI 0 "register_operand" "=v")
>      (vec_select:V16QI
>        (vec_concat:V32QI
> @@ -1166,27 +1168,46 @@ (define_insn "altivec_vmrghb_direct"
>               (const_int 5) (const_int 21)
>               (const_int 6) (const_int 22)
>               (const_int 7) (const_int 23)])))]
> -  "TARGET_ALTIVEC"
> +  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
>    "vmrghb %0,%1,%2"
>    [(set_attr "type" "vecperm")])
>  

Could you move altivec_vmrghb_direct_le (defined further below) here?
Then readers can easily compare the BE and LE variants of the same
altivec_vmrghb_direct.

The same comment applies to the other similar cases.

> +(define_insn "altivec_vmrglb_direct_le"
> +  [(set (match_operand:V16QI 0 "register_operand" "=v")
> +    (vec_select:V16QI
> +      (vec_concat:V32QI
> +        (match_operand:V16QI 1 "register_operand" "v")
> +        (match_operand:V16QI 2 "register_operand" "v"))
> +      (parallel [(const_int 0) (const_int 16)
> +             (const_int 1) (const_int 17)
> +             (const_int 2) (const_int 18)
> +             (const_int 3) (const_int 19)
> +             (const_int 4) (const_int 20)
> +             (const_int 5) (const_int 21)
> +             (const_int 6) (const_int 22)
> +             (const_int 7) (const_int 23)])))]
> +  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
> +  "vmrglb %0,%2,%1"
> +  [(set_attr "type" "vecperm")])

Could you change the assembly in this pattern to "vmrglb %0,%1,%2"
instead of "vmrglb %0,%2,%1"?  I checked the md file before the culprit
commit 0910c516a3d72af048, and it emitted "vmrglb %0,%1,%2" for
altivec_vmrglb_direct.

The same comment applies to the other similar cases.
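Whether the LE pattern should use "%2,%1" or "%1,%2" depends on how the RTL operands relate to the hardware lanes. The simulation below is a hypothetical model, not GCC code. It assumes that GCC element E of an n-element LE vector sits in big-endian lane n-1-E. It then applies the ISA's BE-numbered definition of the merge-low instructions. Under those assumptions, vmrgl<x> D,A,B yields {B[0], A[0], B[1], A[1], ...} in GCC element order.

```c
#include <assert.h>

#define MAXN 16

/* Merge-low (vmrglb/vmrglh/vmrglw style) in the ISA's big-endian lane
   numbering, lane 0 being the leftmost element of the register:
     D[2i] = A[n/2 + i],  D[2i+1] = B[n/2 + i],  for i < n/2.  */
static void
hw_vmrgl (int n, const int *a_be, const int *b_be, int *d_be)
{
  for (int i = 0; i < n / 2; i++)
    {
      d_be[2 * i] = a_be[n / 2 + i];
      d_be[2 * i + 1] = b_be[n / 2 + i];
    }
}

/* Assumed LE layout: GCC element E lives in BE lane n-1-E, so converting
   between the two numberings reverses the element order.  */
static void
reverse_elems (int n, const int *in, int *out)
{
  for (int e = 0; e < n; e++)
    out[e] = in[n - 1 - e];
}

/* Result, in GCC element order, of "vmrgl<x> D,A,B" on an LE target,
   with A and B also given in GCC element order.  */
static void
le_vmrgl (int n, const int *a, const int *b, int *d)
{
  int a_be[MAXN], b_be[MAXN], d_be[MAXN];
  assert (n <= MAXN && n % 2 == 0);
  reverse_elems (n, a, a_be);
  reverse_elems (n, b, b_be);
  hw_vmrgl (n, a_be, b_be, d_be);
  reverse_elems (n, d_be, d);
}
```

In this model, "vmrglb %0,%2,%1" (A = op2, B = op1) computes (vec_select (vec_concat op1 op2) [0 16 1 17 ...]), which is the RTL the pattern currently has. Emitting "vmrglb %0,%1,%2" would instead correspond to (vec_concat op2 op1). The RTL operands, or the callers' arguments, would then have to be swapped to keep the pattern correct.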

> +
>  (define_expand "altivec_vmrghh"
>    [(use (match_operand:V8HI 0 "register_operand"))
>     (use (match_operand:V8HI 1 "register_operand"))
>     (use (match_operand:V8HI 2 "register_operand"))]
>    "TARGET_ALTIVEC"
>  {
> -  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrghh_direct
> -                        : gen_altivec_vmrglh_direct;
> -  if (!BYTES_BIG_ENDIAN)
> -    std::swap (operands[1], operands[2]);
> -  emit_insn (fun (operands[0], operands[1], operands[2]));
> +  rtvec v = gen_rtvec (8, GEN_INT (0), GEN_INT (8), GEN_INT (1), GEN_INT (9),
> +              GEN_INT (2), GEN_INT (10), GEN_INT (3), GEN_INT (11));
> +  rtx x = gen_rtx_VEC_CONCAT (V16HImode, operands[1], operands[2]);
> +
> +  x = gen_rtx_VEC_SELECT (V8HImode, x, gen_rtx_PARALLEL (VOIDmode, v));
> +  emit_insn (gen_rtx_SET (operands[0], x));
>    DONE;
>  })
>  
> -(define_insn "altivec_vmrghh_direct"
> +(define_insn "altivec_vmrghh_direct_be"
>    [(set (match_operand:V8HI 0 "register_operand" "=v")
> -        (vec_select:V8HI
> +    (vec_select:V8HI
>        (vec_concat:V16HI
>          (match_operand:V8HI 1 "register_operand" "v")
>          (match_operand:V8HI 2 "register_operand" "v"))
> @@ -1194,26 +1215,38 @@ (define_insn "altivec_vmrghh_direct"
>               (const_int 1) (const_int 9)
>               (const_int 2) (const_int 10)
>               (const_int 3) (const_int 11)])))]
> -  "TARGET_ALTIVEC"
> +  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
>    "vmrghh %0,%1,%2"
>    [(set_attr "type" "vecperm")])
>  
> +(define_insn "altivec_vmrglh_direct_le"
> +  [(set (match_operand:V8HI 0 "register_operand" "=v")
> +    (vec_select:V8HI
> +      (vec_concat:V16HI
> +        (match_operand:V8HI 1 "register_operand" "v")
> +        (match_operand:V8HI 2 "register_operand" "v"))
> +      (parallel [(const_int 0) (const_int 8)
> +             (const_int 1) (const_int 9)
> +             (const_int 2) (const_int 10)
> +             (const_int 3) (const_int 11)])))]
> +  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
> +  "vmrglh %0,%2,%1"
> +  [(set_attr "type" "vecperm")])
> +
>  (define_expand "altivec_vmrghw"
>    [(use (match_operand:V4SI 0 "register_operand"))
>     (use (match_operand:V4SI 1 "register_operand"))
>     (use (match_operand:V4SI 2 "register_operand"))]
>    "VECTOR_MEM_ALTIVEC_P (V4SImode)"
>  {
> -  rtx (*fun) (rtx, rtx, rtx);
> -  fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrghw_direct_v4si
> -             : gen_altivec_vmrglw_direct_v4si;
> -  if (!BYTES_BIG_ENDIAN)
> -    std::swap (operands[1], operands[2]);
> -  emit_insn (fun (operands[0], operands[1], operands[2]));
> +  rtvec v = gen_rtvec (4, GEN_INT (0), GEN_INT (4), GEN_INT (1), GEN_INT (5));
> +  rtx x = gen_rtx_VEC_CONCAT (V8SImode, operands[1], operands[2]);
> +  x = gen_rtx_VEC_SELECT (V4SImode, x, gen_rtx_PARALLEL (VOIDmode, v));
> +  emit_insn (gen_rtx_SET (operands[0], x));
>    DONE;
>  })
>  
> -(define_insn "altivec_vmrghw_direct_<mode>"
> +(define_insn "altivec_vmrghw_direct_<mode>_be"
>    [(set (match_operand:VSX_W 0 "register_operand" "=wa,v")
>      (vec_select:VSX_W
>        (vec_concat:<VS_double>
> @@ -1221,10 +1254,24 @@ (define_insn "altivec_vmrghw_direct_<mode>"
>          (match_operand:VSX_W 2 "register_operand" "wa,v"))
>        (parallel [(const_int 0) (const_int 4)
>               (const_int 1) (const_int 5)])))]
> -  "TARGET_ALTIVEC"
> +  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
> +  "@
> +  xxmrghw %x0,%x1,%x2
> +  vmrghw %0,%1,%2"
> +  [(set_attr "type" "vecperm")])
> +
> +(define_insn "altivec_vmrglw_direct_<mode>_le"
> +  [(set (match_operand:VSX_W 0 "register_operand" "=wa,v")
> +    (vec_select:VSX_W
> +      (vec_concat:<VS_double>
> +        (match_operand:VSX_W 1 "register_operand" "wa,v")
> +        (match_operand:VSX_W 2 "register_operand" "wa,v"))
> +      (parallel [(const_int 0) (const_int 4)
> +             (const_int 1) (const_int 5)])))]
> +  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
>    "@
> -   xxmrghw %x0,%x1,%x2
> -   vmrghw %0,%1,%2"
> +  xxmrglw %x0,%x2,%x1
> +  vmrglw %0,%2,%1"
>    [(set_attr "type" "vecperm")])
>  
>  (define_insn "*altivec_vmrghsf"
> @@ -1250,15 +1297,17 @@ (define_expand "altivec_vmrglb"
>     (use (match_operand:V16QI 2 "register_operand"))]
>    "TARGET_ALTIVEC"
>  {
> -  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrglb_direct
> -                        : gen_altivec_vmrghb_direct;
> -  if (!BYTES_BIG_ENDIAN)
> -    std::swap (operands[1], operands[2]);
> -  emit_insn (fun (operands[0], operands[1], operands[2]));
> +  rtvec v = gen_rtvec (16, GEN_INT (8), GEN_INT (24), GEN_INT (9), GEN_INT (25),
> +              GEN_INT (10), GEN_INT (26), GEN_INT (11), GEN_INT (27),
> +              GEN_INT (12), GEN_INT (28), GEN_INT (13), GEN_INT (29),
> +              GEN_INT (14), GEN_INT (30), GEN_INT (15), GEN_INT (31));
> +  rtx x = gen_rtx_VEC_CONCAT (V32QImode, operands[1], operands[2]);
> +  x = gen_rtx_VEC_SELECT (V16QImode, x, gen_rtx_PARALLEL (VOIDmode, v));
> +  emit_insn (gen_rtx_SET (operands[0], x));
>    DONE;
>  })
>  
> -(define_insn "altivec_vmrglb_direct"
> +(define_insn "altivec_vmrglb_direct_be"
>    [(set (match_operand:V16QI 0 "register_operand" "=v")
>      (vec_select:V16QI
>        (vec_concat:V32QI
> @@ -1272,25 +1321,43 @@ (define_insn "altivec_vmrglb_direct"
>               (const_int 13) (const_int 29)
>               (const_int 14) (const_int 30)
>               (const_int 15) (const_int 31)])))]
> -  "TARGET_ALTIVEC"
> +  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
>    "vmrglb %0,%1,%2"
>    [(set_attr "type" "vecperm")])
>  
> +(define_insn "altivec_vmrghb_direct_le"
> +  [(set (match_operand:V16QI 0 "register_operand" "=v")
> +    (vec_select:V16QI
> +      (vec_concat:V32QI
> +        (match_operand:V16QI 1 "register_operand" "v")
> +        (match_operand:V16QI 2 "register_operand" "v"))
> +      (parallel [(const_int  8) (const_int 24)
> +             (const_int  9) (const_int 25)
> +             (const_int 10) (const_int 26)
> +             (const_int 11) (const_int 27)
> +             (const_int 12) (const_int 28)
> +             (const_int 13) (const_int 29)
> +             (const_int 14) (const_int 30)
> +             (const_int 15) (const_int 31)])))]
> +  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
> +  "vmrghb %0,%2,%1"
> +  [(set_attr "type" "vecperm")])
> +
>  (define_expand "altivec_vmrglh"
>    [(use (match_operand:V8HI 0 "register_operand"))
>     (use (match_operand:V8HI 1 "register_operand"))
>     (use (match_operand:V8HI 2 "register_operand"))]
>    "TARGET_ALTIVEC"
>  {
> -  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrglh_direct
> -                        : gen_altivec_vmrghh_direct;
> -  if (!BYTES_BIG_ENDIAN)
> -    std::swap (operands[1], operands[2]);
> -  emit_insn (fun (operands[0], operands[1], operands[2]));
> +  rtvec v = gen_rtvec (8, GEN_INT (4), GEN_INT (12), GEN_INT (5), GEN_INT (13),
> +              GEN_INT (6), GEN_INT (14), GEN_INT (7), GEN_INT (15));
> +  rtx x = gen_rtx_VEC_CONCAT (V16HImode, operands[1], operands[2]);
> +  x = gen_rtx_VEC_SELECT (V8HImode, x, gen_rtx_PARALLEL (VOIDmode, v));
> +  emit_insn (gen_rtx_SET (operands[0], x));
>    DONE;
>  })
>  
> -(define_insn "altivec_vmrglh_direct"
> +(define_insn "altivec_vmrglh_direct_be"
>    [(set (match_operand:V8HI 0 "register_operand" "=v")
>          (vec_select:V8HI
>        (vec_concat:V16HI
> @@ -1300,26 +1367,38 @@ (define_insn "altivec_vmrglh_direct"
>               (const_int 5) (const_int 13)
>               (const_int 6) (const_int 14)
>               (const_int 7) (const_int 15)])))]
> -  "TARGET_ALTIVEC"
> +  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
>    "vmrglh %0,%1,%2"
>    [(set_attr "type" "vecperm")])
>  
> +(define_insn "altivec_vmrghh_direct_le"
> +  [(set (match_operand:V8HI 0 "register_operand" "=v")
> +        (vec_select:V8HI
> +      (vec_concat:V16HI
> +        (match_operand:V8HI 1 "register_operand" "v")
> +        (match_operand:V8HI 2 "register_operand" "v"))
> +      (parallel [(const_int 4) (const_int 12)
> +             (const_int 5) (const_int 13)
> +             (const_int 6) (const_int 14)
> +             (const_int 7) (const_int 15)])))]
> +  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
> +  "vmrghh %0,%2,%1"
> +  [(set_attr "type" "vecperm")])
> +
>  (define_expand "altivec_vmrglw"
>    [(use (match_operand:V4SI 0 "register_operand"))
>     (use (match_operand:V4SI 1 "register_operand"))
>     (use (match_operand:V4SI 2 "register_operand"))]
>    "VECTOR_MEM_ALTIVEC_P (V4SImode)"
>  {
> -  rtx (*fun) (rtx, rtx, rtx);
> -  fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrglw_direct_v4si
> -             : gen_altivec_vmrghw_direct_v4si;
> -  if (!BYTES_BIG_ENDIAN)
> -    std::swap (operands[1], operands[2]);
> -  emit_insn (fun (operands[0], operands[1], operands[2]));
> +  rtvec v = gen_rtvec (4, GEN_INT (2), GEN_INT (6), GEN_INT (3), GEN_INT (7));
> +  rtx x = gen_rtx_VEC_CONCAT (V8SImode, operands[1], operands[2]);
> +  x = gen_rtx_VEC_SELECT (V4SImode, x, gen_rtx_PARALLEL (VOIDmode, v));
> +  emit_insn (gen_rtx_SET (operands[0], x));
>    DONE;
>  })
>  
> -(define_insn "altivec_vmrglw_direct_<mode>"
> +(define_insn "altivec_vmrglw_direct_<mode>_be"
>    [(set (match_operand:VSX_W 0 "register_operand" "=wa,v")
>      (vec_select:VSX_W
>        (vec_concat:<VS_double>
> @@ -1327,10 +1406,24 @@ (define_insn "altivec_vmrglw_direct_<mode>"
>          (match_operand:VSX_W 2 "register_operand" "wa,v"))
>        (parallel [(const_int 2) (const_int 6)
>               (const_int 3) (const_int 7)])))]
> -  "TARGET_ALTIVEC"
> +  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
> +  "@
> +  xxmrglw %x0,%x1,%x2
> +  vmrglw %0,%1,%2"
> +  [(set_attr "type" "vecperm")])
> +
> +(define_insn "altivec_vmrghw_direct_<mode>_le"
> +  [(set (match_operand:VSX_W 0 "register_operand" "=wa,v")
> +    (vec_select:VSX_W
> +      (vec_concat:<VS_double>
> +        (match_operand:VSX_W 1 "register_operand" "wa,v")
> +        (match_operand:VSX_W 2 "register_operand" "wa,v"))
> +      (parallel [(const_int 2) (const_int 6)
> +             (const_int 3) (const_int 7)])))]
> +  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
>    "@
> -   xxmrglw %x0,%x1,%x2
> -   vmrglw %0,%1,%2"
> +  xxmrghw %x0,%x2,%x1
> +  vmrghw %0,%2,%1"
>    [(set_attr "type" "vecperm")])
>  
>  (define_insn "*altivec_vmrglsf"
> @@ -3699,13 +3792,13 @@ (define_expand "vec_widen_umult_hi_v16qi"
>      {
>        emit_insn (gen_altivec_vmuleub (ve, operands[1], operands[2]));
>        emit_insn (gen_altivec_vmuloub (vo, operands[1], operands[2]));
> -      emit_insn (gen_altivec_vmrghh_direct (operands[0], ve, vo));
> +      emit_insn (gen_altivec_vmrghh_direct_be (operands[0], ve, vo));
>      }
>    else
>      {
>        emit_insn (gen_altivec_vmuloub (ve, operands[1], operands[2]));
>        emit_insn (gen_altivec_vmuleub (vo, operands[1], operands[2]));
> -      emit_insn (gen_altivec_vmrghh_direct (operands[0], vo, ve));
> +      emit_insn (gen_altivec_vmrghh_direct_le (operands[0], ve, vo));

Note that if you change the assembly from "vmrghh %0,%2,%1" to
"vmrghh %0,%1,%2", you need to change this to:

  emit_insn (gen_altivec_vmrghh_direct_le (operands[0], vo, ve));

The same comment applies to the other similar cases.

>      }
>    DONE;
>  })
> @@ -3724,13 +3817,13 @@ (define_expand "vec_widen_umult_lo_v16qi"
>      {
>        emit_insn (gen_altivec_vmuleub (ve, operands[1], operands[2]));
>        emit_insn (gen_altivec_vmuloub (vo, operands[1], operands[2]));
> -      emit_insn (gen_altivec_vmrglh_direct (operands[0], ve, vo));
> +      emit_insn (gen_altivec_vmrglh_direct_be (operands[0], ve, vo));
>      }
>    else
>      {
>        emit_insn (gen_altivec_vmuloub (ve, operands[1], operands[2]));
>        emit_insn (gen_altivec_vmuleub (vo, operands[1], operands[2]));
> -      emit_insn (gen_altivec_vmrglh_direct (operands[0], vo, ve));
> +      emit_insn (gen_altivec_vmrglh_direct_le (operands[0], ve, vo));
>      }
>    DONE;
>  })
> @@ -3749,13 +3842,13 @@ (define_expand "vec_widen_smult_hi_v16qi"
>      {
>        emit_insn (gen_altivec_vmulesb (ve, operands[1], operands[2]));
>        emit_insn (gen_altivec_vmulosb (vo, operands[1], operands[2]));
> -      emit_insn (gen_altivec_vmrghh_direct (operands[0], ve, vo));
> +      emit_insn (gen_altivec_vmrghh_direct_be (operands[0], ve, vo));
>      }
>    else
>      {
>        emit_insn (gen_altivec_vmulosb (ve, operands[1], operands[2]));
>        emit_insn (gen_altivec_vmulesb (vo, operands[1], operands[2]));
> -      emit_insn (gen_altivec_vmrghh_direct (operands[0], vo, ve));
> +      emit_insn (gen_altivec_vmrghh_direct_le (operands[0], ve, vo));
>      }
>    DONE;
>  })
> @@ -3774,13 +3867,13 @@ (define_expand "vec_widen_smult_lo_v16qi"
>      {
>        emit_insn (gen_altivec_vmulesb (ve, operands[1], operands[2]));
>        emit_insn (gen_altivec_vmulosb (vo, operands[1], operands[2]));
> -      emit_insn (gen_altivec_vmrglh_direct (operands[0], ve, vo));
> +      emit_insn (gen_altivec_vmrglh_direct_be (operands[0], ve, vo));
>      }
>    else
>      {
>        emit_insn (gen_altivec_vmulosb (ve, operands[1], operands[2]));
>        emit_insn (gen_altivec_vmulesb (vo, operands[1], operands[2]));
> -      emit_insn (gen_altivec_vmrglh_direct (operands[0], vo, ve));
> +      emit_insn (gen_altivec_vmrglh_direct_le (operands[0], ve, vo));
>      }
>    DONE;
>  })
> @@ -3799,13 +3892,13 @@ (define_expand "vec_widen_umult_hi_v8hi"
>      {
>        emit_insn (gen_altivec_vmuleuh (ve, operands[1], operands[2]));
>        emit_insn (gen_altivec_vmulouh (vo, operands[1], operands[2]));
> -      emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], ve, vo));
> +      emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0], ve, vo));
>      }
>    else
>      {
>        emit_insn (gen_altivec_vmulouh (ve, operands[1], operands[2]));
>        emit_insn (gen_altivec_vmuleuh (vo, operands[1], operands[2]));
> -      emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], vo, ve));
> +      emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0], ve, vo));
>      }
>    DONE;
>  })
> @@ -3824,13 +3917,13 @@ (define_expand "vec_widen_umult_lo_v8hi"
>      {
>        emit_insn (gen_altivec_vmuleuh (ve, operands[1], operands[2]));
>        emit_insn (gen_altivec_vmulouh (vo, operands[1], operands[2]));
> -      emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], ve, vo));
> +      emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0], ve, vo));
>      }
>    else
>      {
>        emit_insn (gen_altivec_vmulouh (ve, operands[1], operands[2]));
>        emit_insn (gen_altivec_vmuleuh (vo, operands[1], operands[2]));
> -      emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], vo, ve));
> +      emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0], ve, vo));
>      }
>    DONE;
>  })
> @@ -3849,13 +3942,13 @@ (define_expand "vec_widen_smult_hi_v8hi"
>      {
>        emit_insn (gen_altivec_vmulesh (ve, operands[1], operands[2]));
>        emit_insn (gen_altivec_vmulosh (vo, operands[1], operands[2]));
> -      emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], ve, vo));
> +      emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0], ve, vo));
>      }
>    else
>      {
>        emit_insn (gen_altivec_vmulosh (ve, operands[1], operands[2]));
>        emit_insn (gen_altivec_vmulesh (vo, operands[1], operands[2]));
> -      emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], vo, ve));
> +      emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0], ve, vo));
>      }
>    DONE;
>  })
> @@ -3874,13 +3967,13 @@ (define_expand "vec_widen_smult_lo_v8hi"
>      {
>        emit_insn (gen_altivec_vmulesh (ve, operands[1], operands[2]));
>        emit_insn (gen_altivec_vmulosh (vo, operands[1], operands[2]));
> -      emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], ve, vo));
> +      emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0], ve, vo));
>      }
>    else
>      {
>        emit_insn (gen_altivec_vmulosh (ve, operands[1], operands[2]));
>        emit_insn (gen_altivec_vmulesh (vo, operands[1], operands[2]));
> -      emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], vo, ve));
> +      emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0], ve, vo));
>      }
>    DONE;
>  })
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index df491bee2ea..97da7706f63 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -22941,29 +22941,17 @@ altivec_expand_vec_perm_const (rtx target, rtx op0, rtx op1,
>      {OPTION_MASK_ALTIVEC,
>       CODE_FOR_altivec_vpkuwum_direct,
>       {2, 3, 6, 7, 10, 11, 14, 15, 18, 19, 22, 23, 26, 27, 30, 31}},
> -    {OPTION_MASK_ALTIVEC,
> -     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghb_direct
> -              : CODE_FOR_altivec_vmrglb_direct,
> +    {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghb_direct_be,
>       {0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23}},

Before the culprit commit 0910c516a3d72af04, we had:

    { OPTION_MASK_ALTIVEC,
      (BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghb_direct
       : CODE_FOR_altivec_vmrglb_direct),
      {  0, 16,  1, 17,  2, 18,  3, 19,  4, 20,  5, 21,  6, 22,  7, 23 } },

I think we should use:

    { OPTION_MASK_ALTIVEC,
      (BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghb_direct_be
       : CODE_FOR_altivec_vmrglb_direct_le),
      {  0, 16,  1, 17,  2, 18,  3, 19,  4, 20,  5, 21,  6, 22,  7, 23 } },

here instead.  Similar comments apply to the related entries below.
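The table entry shape suggested above for altivec_expand_vec_perm_const keeps one selector and picks the pattern by endianness. Below is a simplified, hypothetical stand-in. The struct, `merge_table` and `match_merge` are made up for illustration; only the pattern names and selectors come from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* One hypothetical table row: a 16-byte selector plus the pattern to use
   on each endianness.  */
struct perm_entry
{
  const char *be_pattern;
  const char *le_pattern;
  unsigned char perm[16];
};

static const struct perm_entry merge_table[] = {
  {"altivec_vmrghb_direct_be", "altivec_vmrglb_direct_le",
   {0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23}},
  {"altivec_vmrglb_direct_be", "altivec_vmrghb_direct_le",
   {8, 24, 9, 25, 10, 26, 11, 27, 12, 28, 13, 29, 14, 30, 15, 31}},
};

/* Return the pattern name for selector SEL on the given endianness,
   or NULL if no row matches.  */
static const char *
match_merge (const unsigned char sel[16], bool big_endian)
{
  assert (sel != NULL);
  for (size_t i = 0; i < sizeof merge_table / sizeof merge_table[0]; i++)
    if (memcmp (sel, merge_table[i].perm, 16) == 0)
      return big_endian ? merge_table[i].be_pattern
                        : merge_table[i].le_pattern;
  return NULL;
}
```

The same selector maps to the vmrghb pattern on BE and to the vmrglb pattern on LE. This mirrors the removed `BYTES_BIG_ENDIAN ? ... : ...` entries.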

> -    {OPTION_MASK_ALTIVEC,
> -     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghh_direct
> -              : CODE_FOR_altivec_vmrglh_direct,
> +    {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghh_direct_be,
>       {0, 1, 16, 17, 2, 3, 18, 19, 4, 5, 20, 21, 6, 7, 22, 23}},
> -    {OPTION_MASK_ALTIVEC,
> -     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghw_direct_v4si
> -              : CODE_FOR_altivec_vmrglw_direct_v4si,
> +    {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghw_direct_v4si_be,
>       {0, 1, 2, 3, 16, 17, 18, 19, 4, 5, 6, 7, 20, 21, 22, 23}},
> -    {OPTION_MASK_ALTIVEC,
> -     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglb_direct
> -              : CODE_FOR_altivec_vmrghb_direct,
> +    {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglb_direct_be,
>       {8, 24, 9, 25, 10, 26, 11, 27, 12, 28, 13, 29, 14, 30, 15, 31}},
> -    {OPTION_MASK_ALTIVEC,
> -     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglh_direct
> -              : CODE_FOR_altivec_vmrghh_direct,
> +    {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglh_direct_be,
>       {8, 9, 24, 25, 10, 11, 26, 27, 12, 13, 28, 29, 14, 15, 30, 31}},
> -    {OPTION_MASK_ALTIVEC,
> -     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglw_direct_v4si
> -              : CODE_FOR_altivec_vmrghw_direct_v4si,
> +    {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglw_direct_v4si_be,
>       {8, 9, 10, 11, 24, 25, 26, 27, 12, 13, 14, 15, 28, 29, 30, 31}},
>      {OPTION_MASK_P8_VECTOR,
>       BYTES_BIG_ENDIAN ? CODE_FOR_p8_vmrgew_v4sf_direct
> @@ -23146,9 +23134,15 @@ altivec_expand_vec_perm_const (rtx target, rtx op0, rtx op1,
>  
>            /* For little-endian, the two input operands must be swapped
>               (or swapped back) to ensure proper right-to-left numbering
> -             from 0 to 2N-1.  */
> -      if (swapped ^ !BYTES_BIG_ENDIAN
> -          && icode != CODE_FOR_vsx_xxpermdi_v16qi)
> +         from 0 to 2N-1.  Excludes the vmrg[lh][bhw] and xxpermdi ops.  */
> +      if (swapped ^ !BYTES_BIG_ENDIAN)
> +        if (!(icode == CODE_FOR_altivec_vmrghb_direct_be
> +          || icode == CODE_FOR_altivec_vmrglb_direct_be
> +          || icode == CODE_FOR_altivec_vmrghh_direct_be
> +          || icode == CODE_FOR_altivec_vmrglh_direct_be
> +          || icode == CODE_FOR_altivec_vmrghw_direct_v4si_be
> +          || icode == CODE_FOR_altivec_vmrglw_direct_v4si_be
> +          || icode == CODE_FOR_vsx_xxpermdi_v16qi))
>          std::swap (op0, op1);

IIUC, we don't need this part of the change once we fix the operand order
in the assembly of those LE "direct" patterns.
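The comment in this hunk says the two input operands are swapped to get right-to-left numbering from 0 to 2N-1. That index arithmetic can be checked by simulation. The sketch below is a hypothetical model, not the rs6000.cc code. It assumes that LE element E lives in BE lane 15 - E. It also assumes a vperm-like instruction that indexes the BE-numbered 32-byte concatenation of its inputs. Swapping the inputs and rewriting each selector index E as 31 - E then reproduces the LE vec_select. The selector vector itself is also laid out in reversed lane order, like any other LE vector.

```c
#include <assert.h>

/* Two-input byte permute in BE lane numbering: result lane J is lane
   SEL_BE[J] (low 5 bits) of the 32-byte concatenation A:B.  */
static void
hw_perm (const unsigned char a_be[16], const unsigned char b_be[16],
         const unsigned char sel_be[16], unsigned char d_be[16])
{
  for (int j = 0; j < 16; j++)
    {
      int s = sel_be[j] & 31;
      d_be[j] = s < 16 ? a_be[s] : b_be[s - 16];
    }
}

/* Emulate (vec_select (vec_concat op0 op1) SEL) with all vectors and SEL
   in GCC's LE element order: swap the inputs and map index E to 31 - E.  */
static void
le_perm (const unsigned char op0[16], const unsigned char op1[16],
         const unsigned char sel[16], unsigned char d[16])
{
  unsigned char a_be[16], b_be[16], sel_be[16], d_be[16];
  for (int j = 0; j < 16; j++)
    {
      assert (sel[15 - j] < 32);
      /* LE element E sits in BE lane 15 - E; the swap puts op1 first.  */
      a_be[j] = op1[15 - j];
      b_be[j] = op0[15 - j];
      /* BE lane J of the result is LE element 15 - J.  */
      sel_be[j] = (unsigned char) (31 - sel[15 - j]);
    }
  hw_perm (a_be, b_be, sel_be, d_be);
  for (int e = 0; e < 16; e++)
    d[e] = d_be[15 - e];
}
```

This models only the generic path. The point made above is that once the LE merge patterns encode their own operand order, they need no special-casing in this swap condition.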

BR,
Kewen

>        if (imode != V16QImode)
>          {
> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index e226a93bbe5..2ae1bce131d 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -4688,12 +4688,12 @@ (define_expand "vsx_xxmrghw_<mode>"
>               (const_int 1) (const_int 5)])))]
>    "VECTOR_MEM_VSX_P (<MODE>mode)"
>  {
> -  rtx (*fun) (rtx, rtx, rtx);
> -  fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrghw_direct_<mode>
> -             : gen_altivec_vmrglw_direct_<mode>;
> -  if (!BYTES_BIG_ENDIAN)
> -    std::swap (operands[1], operands[2]);
> -  emit_insn (fun (operands[0], operands[1], operands[2]));
> +  if (BYTES_BIG_ENDIAN)
> +    emit_insn (
> +    gen_altivec_vmrghw_direct_v4si_be (operands[0], operands[1], operands[2]));
> +  else
> +    emit_insn (
> +    gen_altivec_vmrglw_direct_v4si_le (operands[0], operands[1], operands[2]));
>    DONE;
>  }
>    [(set_attr "type" "vecperm")])
> @@ -4708,12 +4708,12 @@ (define_expand "vsx_xxmrglw_<mode>"
>               (const_int 3) (const_int 7)])))]
>    "VECTOR_MEM_VSX_P (<MODE>mode)"
>  {
> -  rtx (*fun) (rtx, rtx, rtx);
> -  fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrglw_direct_<mode>
> -             : gen_altivec_vmrghw_direct_<mode>;
> -  if (!BYTES_BIG_ENDIAN)
> -    std::swap (operands[1], operands[2]);
> -  emit_insn (fun (operands[0], operands[1], operands[2]));
> +  if (BYTES_BIG_ENDIAN)
> +    emit_insn (
> +    gen_altivec_vmrglw_direct_v4si_be (operands[0], operands[1], operands[2]));
> +  else
> +    emit_insn (
> +    gen_altivec_vmrghw_direct_v4si_le (operands[0], operands[1], operands[2]));
>    DONE;
>  }
>    [(set_attr "type" "vecperm")])
> diff --git a/gcc/testsuite/g++.target/powerpc/pr106069.C b/gcc/testsuite/g++.target/powerpc/pr106069.C
> new file mode 100644
> index 00000000000..2cde9b821e3
> --- /dev/null
> +++ b/gcc/testsuite/g++.target/powerpc/pr106069.C
> @@ -0,0 +1,120 @@
> +/* { dg-options "-O -fno-tree-forwprop -maltivec" } */
> +/* { dg-require-effective-target vmx_hw } */
> +/* { dg-do run } */
> +
> +extern "C" void *
> +memcpy (void *, const void *, unsigned long);
> +typedef __attribute__ ((altivec (vector__))) unsigned native_simd_type;
> +
> +union
> +{
> +  native_simd_type V;
> +  int R[4];
> +} store_le_vec;
> +
> +struct S
> +{
> +  S () = default;
> +  S (unsigned B0)
> +  {
> +    native_simd_type val{B0};
> +    m_simd = val;
> +  }
> +  void store_le (unsigned int out[])
> +  {
> +    store_le_vec.V = m_simd;
> +    unsigned int x0 = store_le_vec.R[0];
> +    memcpy (out, &x0, 4);
> +  }
> +  S rotl (unsigned int r)
> +  {
> +    native_simd_type rot{r};
> +    return __builtin_vec_rl (m_simd, rot);
> +  }
> +  void operator+= (S other)
> +  {
> +    m_simd = __builtin_vec_add (m_simd, other.m_simd);
> +  }
> +  void operator^= (S other)
> +  {
> +    m_simd = __builtin_vec_xor (m_simd, other.m_simd);
> +  }
> +  static void transpose (S &B0, S B1, S B2, S B3)
> +  {
> +    native_simd_type T0 = __builtin_vec_mergeh (B0.m_simd, B2.m_simd);
> +    native_simd_type T1 = __builtin_vec_mergeh (B1.m_simd, B3.m_simd);
> +    native_simd_type T2 = __builtin_vec_mergel (B0.m_simd, B2.m_simd);
> +    native_simd_type T3 = __builtin_vec_mergel (B1.m_simd, B3.m_simd);
> +    B0 = __builtin_vec_mergeh (T0, T1);
> +    B3 = __builtin_vec_mergel (T2, T3);
> +  }
> +  S (native_simd_type x) : m_simd (x) {}
> +  native_simd_type m_simd;
> +};
> +
> +void
> +foo (unsigned int output[], unsigned state[])
> +{
> +  S R00 = state[0];
> +  S R01 = state[0];
> +  S R02 = state[2];
> +  S R03 = state[0];
> +  S R05 = state[5];
> +  S R06 = state[6];
> +  S R07 = state[7];
> +  S R08 = state[8];
> +  S R09 = state[9];
> +  S R10 = state[10];
> +  S R11 = state[11];
> +  S R12 = state[12];
> +  S R13 = state[13];
> +  S R14 = state[4];
> +  S R15 = state[15];
> +  for (int r = 0; r != 10; ++r)
> +    {
> +      R09 += R13;
> +      R11 += R15;
> +      R05 ^= R09;
> +      R06 ^= R10;
> +      R07 ^= R11;
> +      R07 = R07.rotl (7);
> +      R00 += R05;
> +      R01 += R06;
> +      R02 += R07;
> +      R15 ^= R00;
> +      R12 ^= R01;
> +      R13 ^= R02;
> +      R00 += R05;
> +      R01 += R06;
> +      R02 += R07;
> +      R15 ^= R00;
> +      R12 = R12.rotl (8);
> +      R13 = R13.rotl (8);
> +      R10 += R15;
> +      R11 += R12;
> +      R08 += R13;
> +      R09 += R14;
> +      R05 ^= R10;
> +      R06 ^= R11;
> +      R07 ^= R08;
> +      R05 = R05.rotl (7);
> +      R06 = R06.rotl (7);
> +      R07 = R07.rotl (7);
> +    }
> +  R00 += state[0];
> +  S::transpose (R00, R01, R02, R03);
> +  R00.store_le (output);
> +}
> +
> +unsigned int res[1];
> +unsigned main_state[]{1634760805, 60878,      2036477234, 6,
> +              0,      825562964,  1471091955, 1346092787,
> +              506976774,  4197066702, 518848283,  118491664,
> +              0,      0,          0,      0};
> +int
> +main ()
> +{
> +  foo (res, main_state);
> +  if (res[0] != 0x41fcef98)
> +    __builtin_abort ();
> +}


* Re: [PATCH v4] rs6000: Fix incorrect RTL for Power LE when removing the UNSPECS [PR106069]
  2022-08-16  6:53         ` Kewen.Lin
@ 2022-08-17  6:23           ` Xionghu Luo
  2022-08-24  1:24             ` Ping: " Xionghu Luo
  0 siblings, 1 reply; 12+ messages in thread
From: Xionghu Luo @ 2022-08-17  6:23 UTC
  To: Kewen.Lin; +Cc: Xionghu Luo, gcc-patches, David Edelsohn, Segher Boessenkool



On 2022/8/16 14:53, Kewen.Lin wrote:
> Hi Xionghu,
> 
> Thanks for the updated version of patch, some comments are inlined.
> 
> on 2022/8/11 14:15, Xionghu Luo wrote:
>>
>>
>> On 2022/8/11 01:07, Segher Boessenkool wrote:
>>> On Wed, Aug 10, 2022 at 02:39:02PM +0800, Xionghu Luo wrote:
>>>> On 2022/8/9 11:01, Kewen.Lin wrote:
>>>>> I have some concerns about the changed "altivec_*_direct" patterns.  IMHO
>>>>> the suffix "_direct" normally indicates that the define_insn maps
>>>>> directly to the corresponding hw insn.  With this change, for example,
>>>>> altivec_vmrghb_direct can be mapped to vmrghb or vmrglb, which looks
>>>>> misleading.  Maybe we can add corresponding _direct_le and _direct_be
>>>>> versions, both mapped to the same insn but with different RTL
>>>>> patterns.  Looking forward to Segher's and David's suggestions.
>>>>
>>>> Thanks!  Do you mean the same RTL patterns with different hw insns?
>>>
>>> A pattern called altivec_vmrghb_direct_le should always emit a vmrghb
>>> instruction, never a vmrglb instead.  Misleading names are an expensive
>>> problem.
>>>
>>>
>>
>> Thanks.  Then on LE platforms, if the user calls altivec_vmrghw, it will be
>> expanded to RTL (vec_select (vec_concat R0 R1) (parallel [0 4 1 5])), and
>> finally matched to altivec_vmrglw_direct_v4si_le with ASM "vmrglw".
>> For BE it is straightforward, which seems clearer :-).  OK for master?
>>
>>
>> [PATCH v3] rs6000: Fix incorrect RTL for Power LE when removing the UNSPECS [PR106069]
>>
>> v3: Rename altivec_vmrghb_direct_le to altivec_vmrglb_direct_le to match
>> the actual output ASM vmrglb.  Likewise for all similar xxx_direct_le
>> patterns.
>> v2: Split each direct pattern into BE and LE variants with the same RTL
>> but different insns.
>>
>> The native RTL expression for vec_mrghw should be the same for BE and LE,
>> as they are register- and endian-independent.  So both BE and LE need to
>> generate exactly the same RTL with index [0 4 1 5] when expanding vec_mrghw
>> with vec_select and vec_concat.
>>
>> (set (reg:V4SI 141) (vec_select:V4SI (vec_concat:V8SI
>>             (subreg:V4SI (reg:V16QI 139) 0)
>>             (subreg:V4SI (reg:V16QI 140) 0))
>>             [const_int 0 4 1 5]))
>>
>> Then the combine pass can perform the nested vec_select optimization
>> in simplify-rtx.c:simplify_binary_operation_1 on both BE and LE:
>>
>> 21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel [0 4 1 5])
>> 24: {r151:SI=vec_select(r150:V4SI,parallel [const_int 3]);}
>>
>> =>
>>
>> 21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel)
>> 24: {r151:SI=vec_select(r146:V4SI,parallel [const_int 1]);}
>>
>> The endianness check is then needed only once, when the assembly is
>> finally generated.  The assembly also improves, because the nested
>> vec_select is simplified to a simple scalar load.
>>
>> Regression tested (passed) on Power8{LE,BE}{32,64} and Power{9,10}LE{64}
>> Linux (thanks to Kewen).
>>
>> gcc/ChangeLog:
>>
>>      PR target/106069
>>      * config/rs6000/altivec.md (altivec_vmrghb_direct): Remove.
>>      (altivec_vmrghb_direct_be): New pattern for BE.
>>      (altivec_vmrglb_direct_le): New pattern for LE.
>>      (altivec_vmrghh_direct): Remove.
>>      (altivec_vmrghh_direct_be): New pattern for BE.
>>      (altivec_vmrglh_direct_le): New pattern for LE.
>>      (altivec_vmrghw_direct_<mode>): Remove.
>>      (altivec_vmrghw_direct_<mode>_be): New pattern for BE.
>>      (altivec_vmrglw_direct_<mode>_le): New pattern for LE.
>>      (altivec_vmrglb_direct): Remove.
>>      (altivec_vmrglb_direct_be): New pattern for BE.
>>      (altivec_vmrghb_direct_le): New pattern for LE.
>>      (altivec_vmrglh_direct): Remove.
>>      (altivec_vmrglh_direct_be): New pattern for BE.
>>      (altivec_vmrghh_direct_le): New pattern for LE.
>>      (altivec_vmrglw_direct_<mode>): Remove.
>>      (altivec_vmrglw_direct_<mode>_be): New pattern for BE.
>>      (altivec_vmrghw_direct_<mode>_le): New pattern for LE.
>>      * config/rs6000/rs6000.cc (altivec_expand_vec_perm_const):
>>      Adjust.
>>      * config/rs6000/vsx.md: Likewise.
>>
>> gcc/testsuite/ChangeLog:
>>
>>      PR target/106069
>>      * g++.target/powerpc/pr106069.C: New test.
>>
>> Signed-off-by: Xionghu Luo <xionghuluo@tencent.com>
>> ---
>>   gcc/config/rs6000/altivec.md                | 223 ++++++++++++++------
>>   gcc/config/rs6000/rs6000.cc                 |  36 ++--
>>   gcc/config/rs6000/vsx.md                    |  24 +--
>>   gcc/testsuite/g++.target/powerpc/pr106069.C | 120 +++++++++++
>>   4 files changed, 305 insertions(+), 98 deletions(-)
>>   create mode 100644 gcc/testsuite/g++.target/powerpc/pr106069.C
>>
>> diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
>> index 2c4940f2e21..78245f470e9 100644
>> --- a/gcc/config/rs6000/altivec.md
>> +++ b/gcc/config/rs6000/altivec.md
>> @@ -1144,15 +1144,17 @@ (define_expand "altivec_vmrghb"
>>      (use (match_operand:V16QI 2 "register_operand"))]
>>     "TARGET_ALTIVEC"
>>   {
>> -  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrghb_direct
>> -                        : gen_altivec_vmrglb_direct;
>> -  if (!BYTES_BIG_ENDIAN)
>> -    std::swap (operands[1], operands[2]);
>> -  emit_insn (fun (operands[0], operands[1], operands[2]));
>> +  rtvec v = gen_rtvec (16, GEN_INT (0), GEN_INT (16), GEN_INT (1), GEN_INT (17),
>> +              GEN_INT (2), GEN_INT (18), GEN_INT (3), GEN_INT (19),
>> +              GEN_INT (4), GEN_INT (20), GEN_INT (5), GEN_INT (21),
>> +              GEN_INT (6), GEN_INT (22), GEN_INT (7), GEN_INT (23));
>> +  rtx x = gen_rtx_VEC_CONCAT (V32QImode, operands[1], operands[2]);
>> +  x = gen_rtx_VEC_SELECT (V16QImode, x, gen_rtx_PARALLEL (VOIDmode, v));
>> +  emit_insn (gen_rtx_SET (operands[0], x));
>>     DONE;
>>   })
> 
> I think you can just call gen_altivec_vmrghb_direct_be and
> gen_altivec_vmrghb_direct_le separately here.  Similar for some other
> define_expands.
> 
>>   
>> -(define_insn "altivec_vmrghb_direct"
>> +(define_insn "altivec_vmrghb_direct_be"
>>     [(set (match_operand:V16QI 0 "register_operand" "=v")
>>       (vec_select:V16QI
>>         (vec_concat:V32QI
>> @@ -1166,27 +1168,46 @@ (define_insn "altivec_vmrghb_direct"
>>                (const_int 5) (const_int 21)
>>                (const_int 6) (const_int 22)
>>                (const_int 7) (const_int 23)])))]
>> -  "TARGET_ALTIVEC"
>> +  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
>>     "vmrghb %0,%1,%2"
>>     [(set_attr "type" "vecperm")])
>>   
> 
> Could you move the following altivec_vmrghb_direct_le here?
> Then readers can easily check the difference between be and
> le for the same altivec_vmrghb_direct.
> 
> Same comment applied for some other similar cases.
> 
>> +(define_insn "altivec_vmrglb_direct_le"
>> +  [(set (match_operand:V16QI 0 "register_operand" "=v")
>> +    (vec_select:V16QI
>> +      (vec_concat:V32QI
>> +        (match_operand:V16QI 1 "register_operand" "v")
>> +        (match_operand:V16QI 2 "register_operand" "v"))
>> +      (parallel [(const_int 0) (const_int 16)
>> +             (const_int 1) (const_int 17)
>> +             (const_int 2) (const_int 18)
>> +             (const_int 3) (const_int 19)
>> +             (const_int 4) (const_int 20)
>> +             (const_int 5) (const_int 21)
>> +             (const_int 6) (const_int 22)
>> +             (const_int 7) (const_int 23)])))]
>> +  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
>> +  "vmrglb %0,%2,%1"
>> +  [(set_attr "type" "vecperm")])
> 
> Could you update this pattern for assembly "vmrglb %0,%1,%2"
> instead of "vmrglb %0,%2,%1"?  I checked the previous md
> before the culprit commit 0910c516a3d72af048, it emits
> "vmrglb %0,%1,%2" for altivec_vmrglb_direct.
> 
> Same comment applied for some other similar cases.
> 
>> +
>>   (define_expand "altivec_vmrghh"
>>     [(use (match_operand:V8HI 0 "register_operand"))
>>      (use (match_operand:V8HI 1 "register_operand"))
>>      (use (match_operand:V8HI 2 "register_operand"))]
>>     "TARGET_ALTIVEC"
>>   {
>> -  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrghh_direct
>> -                        : gen_altivec_vmrglh_direct;
>> -  if (!BYTES_BIG_ENDIAN)
>> -    std::swap (operands[1], operands[2]);
>> -  emit_insn (fun (operands[0], operands[1], operands[2]));
>> +  rtvec v = gen_rtvec (8, GEN_INT (0), GEN_INT (8), GEN_INT (1), GEN_INT (9),
>> +              GEN_INT (2), GEN_INT (10), GEN_INT (3), GEN_INT (11));
>> +  rtx x = gen_rtx_VEC_CONCAT (V16HImode, operands[1], operands[2]);
>> +
>> +  x = gen_rtx_VEC_SELECT (V8HImode, x, gen_rtx_PARALLEL (VOIDmode, v));
>> +  emit_insn (gen_rtx_SET (operands[0], x));
>>     DONE;
>>   })
>>   
>> -(define_insn "altivec_vmrghh_direct"
>> +(define_insn "altivec_vmrghh_direct_be"
>>     [(set (match_operand:V8HI 0 "register_operand" "=v")
>> -        (vec_select:V8HI
>> +    (vec_select:V8HI
>>         (vec_concat:V16HI
>>           (match_operand:V8HI 1 "register_operand" "v")
>>           (match_operand:V8HI 2 "register_operand" "v"))
>> @@ -1194,26 +1215,38 @@ (define_insn "altivec_vmrghh_direct"
>>                (const_int 1) (const_int 9)
>>                (const_int 2) (const_int 10)
>>                (const_int 3) (const_int 11)])))]
>> -  "TARGET_ALTIVEC"
>> +  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
>>     "vmrghh %0,%1,%2"
>>     [(set_attr "type" "vecperm")])
>>   
>> +(define_insn "altivec_vmrglh_direct_le"
>> +  [(set (match_operand:V8HI 0 "register_operand" "=v")
>> +    (vec_select:V8HI
>> +      (vec_concat:V16HI
>> +        (match_operand:V8HI 1 "register_operand" "v")
>> +        (match_operand:V8HI 2 "register_operand" "v"))
>> +      (parallel [(const_int 0) (const_int 8)
>> +             (const_int 1) (const_int 9)
>> +             (const_int 2) (const_int 10)
>> +             (const_int 3) (const_int 11)])))]
>> +  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
>> +  "vmrglh %0,%2,%1"
>> +  [(set_attr "type" "vecperm")])
>> +
>>   (define_expand "altivec_vmrghw"
>>     [(use (match_operand:V4SI 0 "register_operand"))
>>      (use (match_operand:V4SI 1 "register_operand"))
>>      (use (match_operand:V4SI 2 "register_operand"))]
>>     "VECTOR_MEM_ALTIVEC_P (V4SImode)"
>>   {
>> -  rtx (*fun) (rtx, rtx, rtx);
>> -  fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrghw_direct_v4si
>> -             : gen_altivec_vmrglw_direct_v4si;
>> -  if (!BYTES_BIG_ENDIAN)
>> -    std::swap (operands[1], operands[2]);
>> -  emit_insn (fun (operands[0], operands[1], operands[2]));
>> +  rtvec v = gen_rtvec (4, GEN_INT (0), GEN_INT (4), GEN_INT (1), GEN_INT (5));
>> +  rtx x = gen_rtx_VEC_CONCAT (V8SImode, operands[1], operands[2]);
>> +  x = gen_rtx_VEC_SELECT (V4SImode, x, gen_rtx_PARALLEL (VOIDmode, v));
>> +  emit_insn (gen_rtx_SET (operands[0], x));
>>     DONE;
>>   })
>>   
>> -(define_insn "altivec_vmrghw_direct_<mode>"
>> +(define_insn "altivec_vmrghw_direct_<mode>_be"
>>     [(set (match_operand:VSX_W 0 "register_operand" "=wa,v")
>>       (vec_select:VSX_W
>>         (vec_concat:<VS_double>
>> @@ -1221,10 +1254,24 @@ (define_insn "altivec_vmrghw_direct_<mode>"
>>           (match_operand:VSX_W 2 "register_operand" "wa,v"))
>>         (parallel [(const_int 0) (const_int 4)
>>                (const_int 1) (const_int 5)])))]
>> -  "TARGET_ALTIVEC"
>> +  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
>> +  "@
>> +  xxmrghw %x0,%x1,%x2
>> +  vmrghw %0,%1,%2"
>> +  [(set_attr "type" "vecperm")])
>> +
>> +(define_insn "altivec_vmrglw_direct_<mode>_le"
>> +  [(set (match_operand:VSX_W 0 "register_operand" "=wa,v")
>> +    (vec_select:VSX_W
>> +      (vec_concat:<VS_double>
>> +        (match_operand:VSX_W 1 "register_operand" "wa,v")
>> +        (match_operand:VSX_W 2 "register_operand" "wa,v"))
>> +      (parallel [(const_int 0) (const_int 4)
>> +             (const_int 1) (const_int 5)])))]
>> +  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
>>     "@
>> -   xxmrghw %x0,%x1,%x2
>> -   vmrghw %0,%1,%2"
>> +  xxmrglw %x0,%x2,%x1
>> +  vmrglw %0,%2,%1"
>>     [(set_attr "type" "vecperm")])
>>   
>>   (define_insn "*altivec_vmrghsf"
>> @@ -1250,15 +1297,17 @@ (define_expand "altivec_vmrglb"
>>      (use (match_operand:V16QI 2 "register_operand"))]
>>     "TARGET_ALTIVEC"
>>   {
>> -  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrglb_direct
>> -                        : gen_altivec_vmrghb_direct;
>> -  if (!BYTES_BIG_ENDIAN)
>> -    std::swap (operands[1], operands[2]);
>> -  emit_insn (fun (operands[0], operands[1], operands[2]));
>> +  rtvec v = gen_rtvec (16, GEN_INT (8), GEN_INT (24), GEN_INT (9), GEN_INT (25),
>> +              GEN_INT (10), GEN_INT (26), GEN_INT (11), GEN_INT (27),
>> +              GEN_INT (12), GEN_INT (28), GEN_INT (13), GEN_INT (29),
>> +              GEN_INT (14), GEN_INT (30), GEN_INT (15), GEN_INT (31));
>> +  rtx x = gen_rtx_VEC_CONCAT (V32QImode, operands[1], operands[2]);
>> +  x = gen_rtx_VEC_SELECT (V16QImode, x, gen_rtx_PARALLEL (VOIDmode, v));
>> +  emit_insn (gen_rtx_SET (operands[0], x));
>>     DONE;
>>   })
>>   
>> -(define_insn "altivec_vmrglb_direct"
>> +(define_insn "altivec_vmrglb_direct_be"
>>     [(set (match_operand:V16QI 0 "register_operand" "=v")
>>       (vec_select:V16QI
>>         (vec_concat:V32QI
>> @@ -1272,25 +1321,43 @@ (define_insn "altivec_vmrglb_direct"
>>                (const_int 13) (const_int 29)
>>                (const_int 14) (const_int 30)
>>                (const_int 15) (const_int 31)])))]
>> -  "TARGET_ALTIVEC"
>> +  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
>>     "vmrglb %0,%1,%2"
>>     [(set_attr "type" "vecperm")])
>>   
>> +(define_insn "altivec_vmrghb_direct_le"
>> +  [(set (match_operand:V16QI 0 "register_operand" "=v")
>> +    (vec_select:V16QI
>> +      (vec_concat:V32QI
>> +        (match_operand:V16QI 1 "register_operand" "v")
>> +        (match_operand:V16QI 2 "register_operand" "v"))
>> +      (parallel [(const_int  8) (const_int 24)
>> +             (const_int  9) (const_int 25)
>> +             (const_int 10) (const_int 26)
>> +             (const_int 11) (const_int 27)
>> +             (const_int 12) (const_int 28)
>> +             (const_int 13) (const_int 29)
>> +             (const_int 14) (const_int 30)
>> +             (const_int 15) (const_int 31)])))]
>> +  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
>> +  "vmrghb %0,%2,%1"
>> +  [(set_attr "type" "vecperm")])
>> +
>>   (define_expand "altivec_vmrglh"
>>     [(use (match_operand:V8HI 0 "register_operand"))
>>      (use (match_operand:V8HI 1 "register_operand"))
>>      (use (match_operand:V8HI 2 "register_operand"))]
>>     "TARGET_ALTIVEC"
>>   {
>> -  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrglh_direct
>> -                        : gen_altivec_vmrghh_direct;
>> -  if (!BYTES_BIG_ENDIAN)
>> -    std::swap (operands[1], operands[2]);
>> -  emit_insn (fun (operands[0], operands[1], operands[2]));
>> +  rtvec v = gen_rtvec (8, GEN_INT (4), GEN_INT (12), GEN_INT (5), GEN_INT (13),
>> +              GEN_INT (6), GEN_INT (14), GEN_INT (7), GEN_INT (15));
>> +  rtx x = gen_rtx_VEC_CONCAT (V16HImode, operands[1], operands[2]);
>> +  x = gen_rtx_VEC_SELECT (V8HImode, x, gen_rtx_PARALLEL (VOIDmode, v));
>> +  emit_insn (gen_rtx_SET (operands[0], x));
>>     DONE;
>>   })
>>   
>> -(define_insn "altivec_vmrglh_direct"
>> +(define_insn "altivec_vmrglh_direct_be"
>>     [(set (match_operand:V8HI 0 "register_operand" "=v")
>>           (vec_select:V8HI
>>         (vec_concat:V16HI
>> @@ -1300,26 +1367,38 @@ (define_insn "altivec_vmrglh_direct"
>>                (const_int 5) (const_int 13)
>>                (const_int 6) (const_int 14)
>>                (const_int 7) (const_int 15)])))]
>> -  "TARGET_ALTIVEC"
>> +  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
>>     "vmrglh %0,%1,%2"
>>     [(set_attr "type" "vecperm")])
>>   
>> +(define_insn "altivec_vmrghh_direct_le"
>> +  [(set (match_operand:V8HI 0 "register_operand" "=v")
>> +        (vec_select:V8HI
>> +      (vec_concat:V16HI
>> +        (match_operand:V8HI 1 "register_operand" "v")
>> +        (match_operand:V8HI 2 "register_operand" "v"))
>> +      (parallel [(const_int 4) (const_int 12)
>> +             (const_int 5) (const_int 13)
>> +             (const_int 6) (const_int 14)
>> +             (const_int 7) (const_int 15)])))]
>> +  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
>> +  "vmrghh %0,%2,%1"
>> +  [(set_attr "type" "vecperm")])
>> +
>>   (define_expand "altivec_vmrglw"
>>     [(use (match_operand:V4SI 0 "register_operand"))
>>      (use (match_operand:V4SI 1 "register_operand"))
>>      (use (match_operand:V4SI 2 "register_operand"))]
>>     "VECTOR_MEM_ALTIVEC_P (V4SImode)"
>>   {
>> -  rtx (*fun) (rtx, rtx, rtx);
>> -  fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrglw_direct_v4si
>> -             : gen_altivec_vmrghw_direct_v4si;
>> -  if (!BYTES_BIG_ENDIAN)
>> -    std::swap (operands[1], operands[2]);
>> -  emit_insn (fun (operands[0], operands[1], operands[2]));
>> +  rtvec v = gen_rtvec (4, GEN_INT (2), GEN_INT (6), GEN_INT (3), GEN_INT (7));
>> +  rtx x = gen_rtx_VEC_CONCAT (V8SImode, operands[1], operands[2]);
>> +  x = gen_rtx_VEC_SELECT (V4SImode, x, gen_rtx_PARALLEL (VOIDmode, v));
>> +  emit_insn (gen_rtx_SET (operands[0], x));
>>     DONE;
>>   })
>>   
>> -(define_insn "altivec_vmrglw_direct_<mode>"
>> +(define_insn "altivec_vmrglw_direct_<mode>_be"
>>     [(set (match_operand:VSX_W 0 "register_operand" "=wa,v")
>>       (vec_select:VSX_W
>>         (vec_concat:<VS_double>
>> @@ -1327,10 +1406,24 @@ (define_insn "altivec_vmrglw_direct_<mode>"
>>           (match_operand:VSX_W 2 "register_operand" "wa,v"))
>>         (parallel [(const_int 2) (const_int 6)
>>                (const_int 3) (const_int 7)])))]
>> -  "TARGET_ALTIVEC"
>> +  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
>> +  "@
>> +  xxmrglw %x0,%x1,%x2
>> +  vmrglw %0,%1,%2"
>> +  [(set_attr "type" "vecperm")])
>> +
>> +(define_insn "altivec_vmrghw_direct_<mode>_le"
>> +  [(set (match_operand:VSX_W 0 "register_operand" "=wa,v")
>> +    (vec_select:VSX_W
>> +      (vec_concat:<VS_double>
>> +        (match_operand:VSX_W 1 "register_operand" "wa,v")
>> +        (match_operand:VSX_W 2 "register_operand" "wa,v"))
>> +      (parallel [(const_int 2) (const_int 6)
>> +             (const_int 3) (const_int 7)])))]
>> +  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
>>     "@
>> -   xxmrglw %x0,%x1,%x2
>> -   vmrglw %0,%1,%2"
>> +  xxmrghw %x0,%x2,%x1
>> +  vmrghw %0,%2,%1"
>>     [(set_attr "type" "vecperm")])
>>   
>>   (define_insn "*altivec_vmrglsf"
>> @@ -3699,13 +3792,13 @@ (define_expand "vec_widen_umult_hi_v16qi"
>>       {
>>         emit_insn (gen_altivec_vmuleub (ve, operands[1], operands[2]));
>>         emit_insn (gen_altivec_vmuloub (vo, operands[1], operands[2]));
>> -      emit_insn (gen_altivec_vmrghh_direct (operands[0], ve, vo));
>> +      emit_insn (gen_altivec_vmrghh_direct_be (operands[0], ve, vo));
>>       }
>>     else
>>       {
>>         emit_insn (gen_altivec_vmuloub (ve, operands[1], operands[2]));
>>         emit_insn (gen_altivec_vmuleub (vo, operands[1], operands[2]));
>> -      emit_insn (gen_altivec_vmrghh_direct (operands[0], vo, ve));
>> +      emit_insn (gen_altivec_vmrghh_direct_le (operands[0], ve, vo));
> 
> Note that if you change assembly "vmrghh %0,%2,%1" to "vmrghh %0,%1,%2",
> you need to change this to:
> 
>    emit_insn (gen_altivec_vmrghh_direct_le (operands[0], vo, ve));
> 
> Same comment applied for some other similar cases.
> 
>>       }
>>     DONE;
>>   })
>> @@ -3724,13 +3817,13 @@ (define_expand "vec_widen_umult_lo_v16qi"
>>       {
>>         emit_insn (gen_altivec_vmuleub (ve, operands[1], operands[2]));
>>         emit_insn (gen_altivec_vmuloub (vo, operands[1], operands[2]));
>> -      emit_insn (gen_altivec_vmrglh_direct (operands[0], ve, vo));
>> +      emit_insn (gen_altivec_vmrglh_direct_be (operands[0], ve, vo));
>>       }
>>     else
>>       {
>>         emit_insn (gen_altivec_vmuloub (ve, operands[1], operands[2]));
>>         emit_insn (gen_altivec_vmuleub (vo, operands[1], operands[2]));
>> -      emit_insn (gen_altivec_vmrglh_direct (operands[0], vo, ve));
>> +      emit_insn (gen_altivec_vmrglh_direct_le (operands[0], ve, vo));
>>       }
>>     DONE;
>>   })
>> @@ -3749,13 +3842,13 @@ (define_expand "vec_widen_smult_hi_v16qi"
>>       {
>>         emit_insn (gen_altivec_vmulesb (ve, operands[1], operands[2]));
>>         emit_insn (gen_altivec_vmulosb (vo, operands[1], operands[2]));
>> -      emit_insn (gen_altivec_vmrghh_direct (operands[0], ve, vo));
>> +      emit_insn (gen_altivec_vmrghh_direct_be (operands[0], ve, vo));
>>       }
>>     else
>>       {
>>         emit_insn (gen_altivec_vmulosb (ve, operands[1], operands[2]));
>>         emit_insn (gen_altivec_vmulesb (vo, operands[1], operands[2]));
>> -      emit_insn (gen_altivec_vmrghh_direct (operands[0], vo, ve));
>> +      emit_insn (gen_altivec_vmrghh_direct_le (operands[0], ve, vo));
>>       }
>>     DONE;
>>   })
>> @@ -3774,13 +3867,13 @@ (define_expand "vec_widen_smult_lo_v16qi"
>>       {
>>         emit_insn (gen_altivec_vmulesb (ve, operands[1], operands[2]));
>>         emit_insn (gen_altivec_vmulosb (vo, operands[1], operands[2]));
>> -      emit_insn (gen_altivec_vmrglh_direct (operands[0], ve, vo));
>> +      emit_insn (gen_altivec_vmrglh_direct_be (operands[0], ve, vo));
>>       }
>>     else
>>       {
>>         emit_insn (gen_altivec_vmulosb (ve, operands[1], operands[2]));
>>         emit_insn (gen_altivec_vmulesb (vo, operands[1], operands[2]));
>> -      emit_insn (gen_altivec_vmrglh_direct (operands[0], vo, ve));
>> +      emit_insn (gen_altivec_vmrglh_direct_le (operands[0], ve, vo));
>>       }
>>     DONE;
>>   })
>> @@ -3799,13 +3892,13 @@ (define_expand "vec_widen_umult_hi_v8hi"
>>       {
>>         emit_insn (gen_altivec_vmuleuh (ve, operands[1], operands[2]));
>>         emit_insn (gen_altivec_vmulouh (vo, operands[1], operands[2]));
>> -      emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], ve, vo));
>> +      emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0], ve, vo));
>>       }
>>     else
>>       {
>>         emit_insn (gen_altivec_vmulouh (ve, operands[1], operands[2]));
>>         emit_insn (gen_altivec_vmuleuh (vo, operands[1], operands[2]));
>> -      emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], vo, ve));
>> +      emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0], ve, vo));
>>       }
>>     DONE;
>>   })
>> @@ -3824,13 +3917,13 @@ (define_expand "vec_widen_umult_lo_v8hi"
>>       {
>>         emit_insn (gen_altivec_vmuleuh (ve, operands[1], operands[2]));
>>         emit_insn (gen_altivec_vmulouh (vo, operands[1], operands[2]));
>> -      emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], ve, vo));
>> +      emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0], ve, vo));
>>       }
>>     else
>>       {
>>         emit_insn (gen_altivec_vmulouh (ve, operands[1], operands[2]));
>>         emit_insn (gen_altivec_vmuleuh (vo, operands[1], operands[2]));
>> -      emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], vo, ve));
>> +      emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0], ve, vo));
>>       }
>>     DONE;
>>   })
>> @@ -3849,13 +3942,13 @@ (define_expand "vec_widen_smult_hi_v8hi"
>>       {
>>         emit_insn (gen_altivec_vmulesh (ve, operands[1], operands[2]));
>>         emit_insn (gen_altivec_vmulosh (vo, operands[1], operands[2]));
>> -      emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], ve, vo));
>> +      emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0], ve, vo));
>>       }
>>     else
>>       {
>>         emit_insn (gen_altivec_vmulosh (ve, operands[1], operands[2]));
>>         emit_insn (gen_altivec_vmulesh (vo, operands[1], operands[2]));
>> -      emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], vo, ve));
>> +      emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0], ve, vo));
>>       }
>>     DONE;
>>   })
>> @@ -3874,13 +3967,13 @@ (define_expand "vec_widen_smult_lo_v8hi"
>>       {
>>         emit_insn (gen_altivec_vmulesh (ve, operands[1], operands[2]));
>>         emit_insn (gen_altivec_vmulosh (vo, operands[1], operands[2]));
>> -      emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], ve, vo));
>> +      emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0], ve, vo));
>>       }
>>     else
>>       {
>>         emit_insn (gen_altivec_vmulosh (ve, operands[1], operands[2]));
>>         emit_insn (gen_altivec_vmulesh (vo, operands[1], operands[2]));
>> -      emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], vo, ve));
>> +      emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0], ve, vo));
>>       }
>>     DONE;
>>   })
>> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
>> index df491bee2ea..97da7706f63 100644
>> --- a/gcc/config/rs6000/rs6000.cc
>> +++ b/gcc/config/rs6000/rs6000.cc
>> @@ -22941,29 +22941,17 @@ altivec_expand_vec_perm_const (rtx target, rtx op0, rtx op1,
>>       {OPTION_MASK_ALTIVEC,
>>        CODE_FOR_altivec_vpkuwum_direct,
>>        {2, 3, 6, 7, 10, 11, 14, 15, 18, 19, 22, 23, 26, 27, 30, 31}},
>> -    {OPTION_MASK_ALTIVEC,
>> -     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghb_direct
>> -              : CODE_FOR_altivec_vmrglb_direct,
>> +    {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghb_direct_be,
>>        {0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23}},
> 
> Before the culprit commit 0910c516a3d72af04, we have:
> 
>      { OPTION_MASK_ALTIVEC,
>        (BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghb_direct
>         : CODE_FOR_altivec_vmrglb_direct),
>        {  0, 16,  1, 17,  2, 18,  3, 19,  4, 20,  5, 21,  6, 22,  7, 23 } },
> 
> I think we should use:
> 
>      { OPTION_MASK_ALTIVEC,
>        (BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghb_direct_be
>         : CODE_FOR_altivec_vmrglb_direct_le),
>        {  0, 16,  1, 17,  2, 18,  3, 19,  4, 20,  5, 21,  6, 22,  7, 23 } },
> 
> here instead.  Similar comment for those related below.
> 
>> -    {OPTION_MASK_ALTIVEC,
>> -     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghh_direct
>> -              : CODE_FOR_altivec_vmrglh_direct,
>> +    {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghh_direct_be,
>>        {0, 1, 16, 17, 2, 3, 18, 19, 4, 5, 20, 21, 6, 7, 22, 23}},
>> -    {OPTION_MASK_ALTIVEC,
>> -     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghw_direct_v4si
>> -              : CODE_FOR_altivec_vmrglw_direct_v4si,
>> +    {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghw_direct_v4si_be,
>>        {0, 1, 2, 3, 16, 17, 18, 19, 4, 5, 6, 7, 20, 21, 22, 23}},
>> -    {OPTION_MASK_ALTIVEC,
>> -     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglb_direct
>> -              : CODE_FOR_altivec_vmrghb_direct,
>> +    {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglb_direct_be,
>>        {8, 24, 9, 25, 10, 26, 11, 27, 12, 28, 13, 29, 14, 30, 15, 31}},
>> -    {OPTION_MASK_ALTIVEC,
>> -     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglh_direct
>> -              : CODE_FOR_altivec_vmrghh_direct,
>> +    {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglh_direct_be,
>>        {8, 9, 24, 25, 10, 11, 26, 27, 12, 13, 28, 29, 14, 15, 30, 31}},
>> -    {OPTION_MASK_ALTIVEC,
>> -     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglw_direct_v4si
>> -              : CODE_FOR_altivec_vmrghw_direct_v4si,
>> +    {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglw_direct_v4si_be,
>>        {8, 9, 10, 11, 24, 25, 26, 27, 12, 13, 14, 15, 28, 29, 30, 31}},
>>       {OPTION_MASK_P8_VECTOR,
>>        BYTES_BIG_ENDIAN ? CODE_FOR_p8_vmrgew_v4sf_direct
>> @@ -23146,9 +23134,15 @@ altivec_expand_vec_perm_const (rtx target, rtx op0, rtx op1,
>>   
>>             /* For little-endian, the two input operands must be swapped
>>                (or swapped back) to ensure proper right-to-left numbering
>> -             from 0 to 2N-1.  */
>> -      if (swapped ^ !BYTES_BIG_ENDIAN
>> -          && icode != CODE_FOR_vsx_xxpermdi_v16qi)
>> +         from 0 to 2N-1.  Excludes the vmrg[lh][bhw] and xxpermdi ops.  */
>> +      if (swapped ^ !BYTES_BIG_ENDIAN)
>> +        if (!(icode == CODE_FOR_altivec_vmrghb_direct_be
>> +          || icode == CODE_FOR_altivec_vmrglb_direct_be
>> +          || icode == CODE_FOR_altivec_vmrghh_direct_be
>> +          || icode == CODE_FOR_altivec_vmrglh_direct_be
>> +          || icode == CODE_FOR_altivec_vmrghw_direct_v4si_be
>> +          || icode == CODE_FOR_altivec_vmrglw_direct_v4si_be
>> +          || icode == CODE_FOR_vsx_xxpermdi_v16qi))
>>           std::swap (op0, op1);
> 
> IIUC, we don't need this part of the change once we fix the operand order
> in the assembly for those LE "direct"s.
> 
> BR,
> Kewen
> 

Thanks.  All the comments are addressed in v4.


v4: Update per review comments.
v3: Rename altivec_vmrghb_direct_le to altivec_vmrglb_direct_le to match
the actual output ASM vmrglb.  Likewise for all similar xxx_direct_le
patterns.
v2: Split the direct pattern into BE and LE variants with the same RTL
but different insns.

The native RTL expression for vec_mrghw should be the same for BE and LE,
as it is register- and endian-independent.  So both BE and LE need to
generate exactly the same RTL, with index [0 4 1 5], when expanding
vec_mrghw with vec_select and vec_concat.

(set (reg:V4SI 141) (vec_select:V4SI (vec_concat:V8SI
		   (subreg:V4SI (reg:V16QI 139) 0)
		   (subreg:V4SI (reg:V16QI 140) 0))
		   [const_int 0 4 1 5]))
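The same index rule covers the byte, halfword and word merges: merge-high
pairs lanes 0..n/2-1 of the first operand with the same lanes of the
second, and merge-low does the same for lanes n/2..n-1.  A minimal C++
sketch of that rule follows; merge_selector is a hypothetical helper for
illustration, not part of GCC.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not part of GCC): build the vec_select parallel
// for a merge-high (high = true) or merge-low over vec_concat (op1, op2),
// where each operand has n lanes.  The indices use GCC's lane numbering,
// so BE and LE get the same list.
std::vector<int> merge_selector(int n, bool high) {
  assert(n > 0 && n % 2 == 0);
  std::vector<int> sel;
  sel.reserve(n);
  const int base = high ? 0 : n / 2;  // first lane taken from each operand
  for (int i = 0; i < n / 2; ++i) {
    sel.push_back(base + i);      // lane from op1
    sel.push_back(n + base + i);  // same lane from op2
  }
  return sel;
}
```

For example, merge_selector (4, true) gives {0, 4, 1, 5} and
merge_selector (16, false) gives {8, 24, 9, 25, ..., 15, 31}, the
parallel used by the vmrglb patterns.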

Then the combine pass can do the nested vec_select optimization in
simplify-rtx.cc:simplify_binary_operation_1 on both BE and LE:

21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel [0 4 1 5])
24: {r151:SI=vec_select(r150:V4SI,parallel [const_int 3]);}

=>

21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel)
24: {r151:SI=vec_select(r146:V4SI,parallel [const_int 1]);}
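The fold in insn 24 is index arithmetic: lane 3 of the inner selection
is entry 5 of the parallel [0 4 1 5], i.e. lane 1 of the second
vec_concat operand r146.  A minimal C++ sketch of that composition
follows.  LaneRef and fold_nested_select are hypothetical names, and this
does not claim to mirror the actual code in simplify-rtx.cc.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model (not GCC's implementation) of folding
//   (vec_select (vec_select (vec_concat a b) inner) [outer_idx])
// down to one lane of a or b, where a and b each have n lanes.
struct LaneRef {
  int operand;  // 0 = a (first vec_concat operand), 1 = b
  int lane;     // lane within that operand
};

LaneRef fold_nested_select(const std::vector<int>& inner, int outer_idx,
                           int n) {
  assert(outer_idx >= 0 &&
         static_cast<std::size_t>(outer_idx) < inner.size());
  const int concat_idx = inner[outer_idx];  // index into the 2n-lane concat
  assert(concat_idx >= 0 && concat_idx < 2 * n);
  return LaneRef{concat_idx / n, concat_idx % n};
}
```

With inner = {0, 4, 1, 5}, outer_idx = 3 and n = 4 this returns
operand 1, lane 1, matching the rewritten insn 24.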

The endianness check is then needed only once, at final ASM generation.
The ASM is also better, since the nested vec_select is simplified to a
simple scalar load.
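The choice can wait until the output template because the hardware
merges, seen through LE lane numbering, produce exactly the
endian-independent RTL.  A small C++ model follows, assuming (as the
rs6000 port does for V4SI) that GCC lane i on little-endian is ISA
element 3 - i.  hw_vmrghw, hw_vmrglw, flip, run_on_le and
rtl_select_concat are hypothetical helpers written for illustration.

```cpp
#include <array>
#include <cassert>

using V4 = std::array<int, 4>;

// vmrghw / vmrglw as defined by the ISA, in its big-endian element
// numbering (element 0 is the most significant word of the register).
V4 hw_vmrghw(const V4& a, const V4& b) { return {a[0], b[0], a[1], b[1]}; }
V4 hw_vmrglw(const V4& a, const V4& b) { return {a[2], b[2], a[3], b[3]}; }

// Assumed LE convention: GCC lane i is ISA element 3 - i, so reversing
// the lanes converts in either direction.
V4 flip(const V4& v) { return {v[3], v[2], v[1], v[0]}; }

// Execute a hardware merge on LE; inputs and result use GCC lane numbering.
using HwMerge = V4 (*)(const V4&, const V4&);
V4 run_on_le(HwMerge insn, const V4& ra, const V4& rb) {
  return flip(insn(flip(ra), flip(rb)));
}

// (vec_select (vec_concat x y) (parallel sel)) in GCC lane numbering.
V4 rtl_select_concat(const V4& x, const V4& y, const V4& sel) {
  V4 r{};
  for (int i = 0; i < 4; ++i)
    r[i] = sel[i] < 4 ? x[sel[i]] : y[sel[i] - 4];
  return r;
}
```

With op1 = {10, 11, 12, 13} and op2 = {20, 21, 22, 23},
run_on_le (hw_vmrglw, op2, op1) gives {10, 20, 11, 21}, the same lanes as
the RTL [0 4 1 5] on vec_concat (op1, op2): a vmrglw with swapped
operands, as the LE expander emits.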

Regression testing passed on Power8{LE,BE}{32,64} and Power{9,10}LE{32,64}
Linux.

gcc/ChangeLog:

	PR target/106069
	* config/rs6000/altivec.md (altivec_vmrghb_direct): Remove.
	(altivec_vmrghb_direct_be): New pattern for BE.
	(altivec_vmrghb_direct_le): New pattern for LE.
	(altivec_vmrghh_direct): Remove.
	(altivec_vmrghh_direct_be): New pattern for BE.
	(altivec_vmrghh_direct_le): New pattern for LE.
	(altivec_vmrghw_direct_<mode>): Remove.
	(altivec_vmrghw_direct_<mode>_be): New pattern for BE.
	(altivec_vmrghw_direct_<mode>_le): New pattern for LE.
	(altivec_vmrglb_direct): Remove.
	(altivec_vmrglb_direct_be): New pattern for BE.
	(altivec_vmrglb_direct_le): New pattern for LE.
	(altivec_vmrglh_direct): Remove.
	(altivec_vmrglh_direct_be): New pattern for BE.
	(altivec_vmrglh_direct_le): New pattern for LE.
	(altivec_vmrglw_direct_<mode>): Remove.
	(altivec_vmrglw_direct_<mode>_be): New pattern for BE.
	(altivec_vmrglw_direct_<mode>_le): New pattern for LE.
	* config/rs6000/rs6000.cc (altivec_expand_vec_perm_const):
	Adjust.
	* config/rs6000/vsx.md: Likewise.

gcc/testsuite/ChangeLog:

	PR target/106069
	* g++.target/powerpc/pr106069.C: New test.

Signed-off-by: Xionghu Luo <xionghuluo@tencent.com>
---
  gcc/config/rs6000/altivec.md                | 230 ++++++++++++++------
  gcc/config/rs6000/rs6000.cc                 |  24 +-
  gcc/config/rs6000/vsx.md                    |  28 ++-
  gcc/testsuite/g++.target/powerpc/pr106069.C | 120 ++++++++++
  4 files changed, 313 insertions(+), 89 deletions(-)
  create mode 100644 gcc/testsuite/g++.target/powerpc/pr106069.C

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 2c4940f2e21..962df4657e6 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -1144,15 +1144,16 @@ (define_expand "altivec_vmrghb"
     (use (match_operand:V16QI 2 "register_operand"))]
    "TARGET_ALTIVEC"
  {
-  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrghb_direct
-						: gen_altivec_vmrglb_direct;
-  if (!BYTES_BIG_ENDIAN)
-    std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (
+      gen_altivec_vmrghb_direct_be (operands[0], operands[1], operands[2]));
+  else
+    emit_insn (
+      gen_altivec_vmrglb_direct_le (operands[0], operands[2], operands[1]));
    DONE;
  })
  
-(define_insn "altivec_vmrghb_direct"
+(define_insn "altivec_vmrghb_direct_be"
    [(set (match_operand:V16QI 0 "register_operand" "=v")
  	(vec_select:V16QI
  	  (vec_concat:V32QI
@@ -1166,7 +1167,25 @@ (define_insn "altivec_vmrghb_direct"
  		     (const_int 5) (const_int 21)
  		     (const_int 6) (const_int 22)
  		     (const_int 7) (const_int 23)])))]
-  "TARGET_ALTIVEC"
+  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
+  "vmrghb %0,%1,%2"
+  [(set_attr "type" "vecperm")])
+
+(define_insn "altivec_vmrghb_direct_le"
+  [(set (match_operand:V16QI 0 "register_operand" "=v")
+	(vec_select:V16QI
+	  (vec_concat:V32QI
+	    (match_operand:V16QI 2 "register_operand" "v")
+	    (match_operand:V16QI 1 "register_operand" "v"))
+	  (parallel [(const_int  8) (const_int 24)
+		     (const_int  9) (const_int 25)
+		     (const_int 10) (const_int 26)
+		     (const_int 11) (const_int 27)
+		     (const_int 12) (const_int 28)
+		     (const_int 13) (const_int 29)
+		     (const_int 14) (const_int 30)
+		     (const_int 15) (const_int 31)])))]
+  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
    "vmrghb %0,%1,%2"
    [(set_attr "type" "vecperm")])
  
@@ -1176,17 +1195,18 @@ (define_expand "altivec_vmrghh"
     (use (match_operand:V8HI 2 "register_operand"))]
    "TARGET_ALTIVEC"
  {
-  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrghh_direct
-						: gen_altivec_vmrglh_direct;
-  if (!BYTES_BIG_ENDIAN)
-    std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (
+      gen_altivec_vmrghh_direct_be (operands[0], operands[1], operands[2]));
+  else
+    emit_insn (
+      gen_altivec_vmrglh_direct_le (operands[0], operands[2], operands[1]));
    DONE;
  })
  
-(define_insn "altivec_vmrghh_direct"
+(define_insn "altivec_vmrghh_direct_be"
    [(set (match_operand:V8HI 0 "register_operand" "=v")
-        (vec_select:V8HI
+	(vec_select:V8HI
  	  (vec_concat:V16HI
  	    (match_operand:V8HI 1 "register_operand" "v")
  	    (match_operand:V8HI 2 "register_operand" "v"))
@@ -1194,7 +1214,21 @@ (define_insn "altivec_vmrghh_direct"
  		     (const_int 1) (const_int 9)
  		     (const_int 2) (const_int 10)
  		     (const_int 3) (const_int 11)])))]
-  "TARGET_ALTIVEC"
+  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
+  "vmrghh %0,%1,%2"
+  [(set_attr "type" "vecperm")])
+
+(define_insn "altivec_vmrghh_direct_le"
+  [(set (match_operand:V8HI 0 "register_operand" "=v")
+        (vec_select:V8HI
+	  (vec_concat:V16HI
+	    (match_operand:V8HI 2 "register_operand" "v")
+	    (match_operand:V8HI 1 "register_operand" "v"))
+	  (parallel [(const_int 4) (const_int 12)
+		     (const_int 5) (const_int 13)
+		     (const_int 6) (const_int 14)
+		     (const_int 7) (const_int 15)])))]
+  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
    "vmrghh %0,%1,%2"
    [(set_attr "type" "vecperm")])
  
@@ -1204,16 +1238,18 @@ (define_expand "altivec_vmrghw"
     (use (match_operand:V4SI 2 "register_operand"))]
    "VECTOR_MEM_ALTIVEC_P (V4SImode)"
  {
-  rtx (*fun) (rtx, rtx, rtx);
-  fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrghw_direct_v4si
-			 : gen_altivec_vmrglw_direct_v4si;
-  if (!BYTES_BIG_ENDIAN)
-    std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0],
+						  operands[1],
+						  operands[2]));
+  else
+    emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0],
+						  operands[2],
+						  operands[1]));
    DONE;
  })
  
-(define_insn "altivec_vmrghw_direct_<mode>"
+(define_insn "altivec_vmrghw_direct_<mode>_be"
    [(set (match_operand:VSX_W 0 "register_operand" "=wa,v")
  	(vec_select:VSX_W
  	  (vec_concat:<VS_double>
@@ -1221,10 +1257,24 @@ (define_insn "altivec_vmrghw_direct_<mode>"
  	    (match_operand:VSX_W 2 "register_operand" "wa,v"))
  	  (parallel [(const_int 0) (const_int 4)
  		     (const_int 1) (const_int 5)])))]
-  "TARGET_ALTIVEC"
+  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
+  "@
+  xxmrghw %x0,%x1,%x2
+  vmrghw %0,%1,%2"
+  [(set_attr "type" "vecperm")])
+
+(define_insn "altivec_vmrghw_direct_<mode>_le"
+  [(set (match_operand:VSX_W 0 "register_operand" "=wa,v")
+	(vec_select:VSX_W
+	  (vec_concat:<VS_double>
+	    (match_operand:VSX_W 2 "register_operand" "wa,v")
+	    (match_operand:VSX_W 1 "register_operand" "wa,v"))
+	  (parallel [(const_int 2) (const_int 6)
+		     (const_int 3) (const_int 7)])))]
+  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
    "@
-   xxmrghw %x0,%x1,%x2
-   vmrghw %0,%1,%2"
+  xxmrghw %x0,%x1,%x2
+  vmrghw %0,%1,%2"
    [(set_attr "type" "vecperm")])
  
  (define_insn "*altivec_vmrghsf"
@@ -1250,15 +1300,16 @@ (define_expand "altivec_vmrglb"
     (use (match_operand:V16QI 2 "register_operand"))]
    "TARGET_ALTIVEC"
  {
-  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrglb_direct
-						: gen_altivec_vmrghb_direct;
-  if (!BYTES_BIG_ENDIAN)
-    std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (
+      gen_altivec_vmrglb_direct_be (operands[0], operands[1], operands[2]));
+  else
+    emit_insn (
+      gen_altivec_vmrghb_direct_le (operands[0], operands[2], operands[1]));
    DONE;
  })
  
-(define_insn "altivec_vmrglb_direct"
+(define_insn "altivec_vmrglb_direct_be"
    [(set (match_operand:V16QI 0 "register_operand" "=v")
  	(vec_select:V16QI
  	  (vec_concat:V32QI
@@ -1272,7 +1323,25 @@ (define_insn "altivec_vmrglb_direct"
  		     (const_int 13) (const_int 29)
  		     (const_int 14) (const_int 30)
  		     (const_int 15) (const_int 31)])))]
-  "TARGET_ALTIVEC"
+  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
+  "vmrglb %0,%1,%2"
+  [(set_attr "type" "vecperm")])
+
+(define_insn "altivec_vmrglb_direct_le"
+  [(set (match_operand:V16QI 0 "register_operand" "=v")
+	(vec_select:V16QI
+	  (vec_concat:V32QI
+	    (match_operand:V16QI 2 "register_operand" "v")
+	    (match_operand:V16QI 1 "register_operand" "v"))
+	  (parallel [(const_int 0) (const_int 16)
+		     (const_int 1) (const_int 17)
+		     (const_int 2) (const_int 18)
+		     (const_int 3) (const_int 19)
+		     (const_int 4) (const_int 20)
+		     (const_int 5) (const_int 21)
+		     (const_int 6) (const_int 22)
+		     (const_int 7) (const_int 23)])))]
+  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
    "vmrglb %0,%1,%2"
    [(set_attr "type" "vecperm")])
  
@@ -1282,15 +1351,16 @@ (define_expand "altivec_vmrglh"
     (use (match_operand:V8HI 2 "register_operand"))]
    "TARGET_ALTIVEC"
  {
-  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrglh_direct
-						: gen_altivec_vmrghh_direct;
-  if (!BYTES_BIG_ENDIAN)
-    std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (
+      gen_altivec_vmrglh_direct_be (operands[0], operands[1], operands[2]));
+  else
+    emit_insn (
+      gen_altivec_vmrghh_direct_le (operands[0], operands[2], operands[1]));
    DONE;
  })
  
-(define_insn "altivec_vmrglh_direct"
+(define_insn "altivec_vmrglh_direct_be"
    [(set (match_operand:V8HI 0 "register_operand" "=v")
          (vec_select:V8HI
  	  (vec_concat:V16HI
@@ -1300,7 +1370,21 @@ (define_insn "altivec_vmrglh_direct"
  		     (const_int 5) (const_int 13)
  		     (const_int 6) (const_int 14)
  		     (const_int 7) (const_int 15)])))]
-  "TARGET_ALTIVEC"
+  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
+  "vmrglh %0,%1,%2"
+  [(set_attr "type" "vecperm")])
+
+(define_insn "altivec_vmrglh_direct_le"
+  [(set (match_operand:V8HI 0 "register_operand" "=v")
+	(vec_select:V8HI
+	  (vec_concat:V16HI
+	    (match_operand:V8HI 2 "register_operand" "v")
+	    (match_operand:V8HI 1 "register_operand" "v"))
+	  (parallel [(const_int 0) (const_int 8)
+		     (const_int 1) (const_int 9)
+		     (const_int 2) (const_int 10)
+		     (const_int 3) (const_int 11)])))]
+  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
    "vmrglh %0,%1,%2"
    [(set_attr "type" "vecperm")])
  
@@ -1310,16 +1394,18 @@ (define_expand "altivec_vmrglw"
     (use (match_operand:V4SI 2 "register_operand"))]
    "VECTOR_MEM_ALTIVEC_P (V4SImode)"
  {
-  rtx (*fun) (rtx, rtx, rtx);
-  fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrglw_direct_v4si
-			 : gen_altivec_vmrghw_direct_v4si;
-  if (!BYTES_BIG_ENDIAN)
-    std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0],
+						  operands[1],
+						  operands[2]));
+  else
+    emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0],
+						  operands[2],
+						  operands[1]));
    DONE;
  })
  
-(define_insn "altivec_vmrglw_direct_<mode>"
+(define_insn "altivec_vmrglw_direct_<mode>_be"
    [(set (match_operand:VSX_W 0 "register_operand" "=wa,v")
  	(vec_select:VSX_W
  	  (vec_concat:<VS_double>
@@ -1327,10 +1413,24 @@ (define_insn "altivec_vmrglw_direct_<mode>"
  	    (match_operand:VSX_W 2 "register_operand" "wa,v"))
  	  (parallel [(const_int 2) (const_int 6)
  		     (const_int 3) (const_int 7)])))]
-  "TARGET_ALTIVEC"
+  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
+  "@
+  xxmrglw %x0,%x1,%x2
+  vmrglw %0,%1,%2"
+  [(set_attr "type" "vecperm")])
+
+(define_insn "altivec_vmrglw_direct_<mode>_le"
+  [(set (match_operand:VSX_W 0 "register_operand" "=wa,v")
+	(vec_select:VSX_W
+	  (vec_concat:<VS_double>
+	    (match_operand:VSX_W 2 "register_operand" "wa,v")
+	    (match_operand:VSX_W 1 "register_operand" "wa,v"))
+	  (parallel [(const_int 0) (const_int 4)
+		     (const_int 1) (const_int 5)])))]
+  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
    "@
-   xxmrglw %x0,%x1,%x2
-   vmrglw %0,%1,%2"
+  xxmrglw %x0,%x1,%x2
+  vmrglw %0,%1,%2"
    [(set_attr "type" "vecperm")])
  
  (define_insn "*altivec_vmrglsf"
@@ -3699,13 +3799,13 @@ (define_expand "vec_widen_umult_hi_v16qi"
      {
        emit_insn (gen_altivec_vmuleub (ve, operands[1], operands[2]));
        emit_insn (gen_altivec_vmuloub (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrghh_direct (operands[0], ve, vo));
+      emit_insn (gen_altivec_vmrghh_direct_be (operands[0], ve, vo));
      }
    else
      {
        emit_insn (gen_altivec_vmuloub (ve, operands[1], operands[2]));
        emit_insn (gen_altivec_vmuleub (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrghh_direct (operands[0], vo, ve));
+      emit_insn (gen_altivec_vmrghh_direct_le (operands[0], vo, ve));
      }
    DONE;
  })
@@ -3724,13 +3824,13 @@ (define_expand "vec_widen_umult_lo_v16qi"
      {
        emit_insn (gen_altivec_vmuleub (ve, operands[1], operands[2]));
        emit_insn (gen_altivec_vmuloub (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrglh_direct (operands[0], ve, vo));
+      emit_insn (gen_altivec_vmrglh_direct_be (operands[0], ve, vo));
      }
    else
      {
        emit_insn (gen_altivec_vmuloub (ve, operands[1], operands[2]));
        emit_insn (gen_altivec_vmuleub (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrglh_direct (operands[0], vo, ve));
+      emit_insn (gen_altivec_vmrglh_direct_le (operands[0], vo, ve));
      }
    DONE;
  })
@@ -3749,13 +3849,13 @@ (define_expand "vec_widen_smult_hi_v16qi"
      {
        emit_insn (gen_altivec_vmulesb (ve, operands[1], operands[2]));
        emit_insn (gen_altivec_vmulosb (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrghh_direct (operands[0], ve, vo));
+      emit_insn (gen_altivec_vmrghh_direct_be (operands[0], ve, vo));
      }
    else
      {
        emit_insn (gen_altivec_vmulosb (ve, operands[1], operands[2]));
        emit_insn (gen_altivec_vmulesb (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrghh_direct (operands[0], vo, ve));
+      emit_insn (gen_altivec_vmrghh_direct_le (operands[0], vo, ve));
      }
    DONE;
  })
@@ -3774,13 +3874,13 @@ (define_expand "vec_widen_smult_lo_v16qi"
      {
        emit_insn (gen_altivec_vmulesb (ve, operands[1], operands[2]));
        emit_insn (gen_altivec_vmulosb (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrglh_direct (operands[0], ve, vo));
+      emit_insn (gen_altivec_vmrglh_direct_be (operands[0], ve, vo));
      }
    else
      {
        emit_insn (gen_altivec_vmulosb (ve, operands[1], operands[2]));
        emit_insn (gen_altivec_vmulesb (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrglh_direct (operands[0], vo, ve));
+      emit_insn (gen_altivec_vmrglh_direct_le (operands[0], vo, ve));
      }
    DONE;
  })
@@ -3799,13 +3899,13 @@ (define_expand "vec_widen_umult_hi_v8hi"
      {
        emit_insn (gen_altivec_vmuleuh (ve, operands[1], operands[2]));
        emit_insn (gen_altivec_vmulouh (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], ve, vo));
+      emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0], ve, vo));
      }
    else
      {
        emit_insn (gen_altivec_vmulouh (ve, operands[1], operands[2]));
        emit_insn (gen_altivec_vmuleuh (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], vo, ve));
+      emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0], vo, ve));
      }
    DONE;
  })
@@ -3824,13 +3924,13 @@ (define_expand "vec_widen_umult_lo_v8hi"
      {
        emit_insn (gen_altivec_vmuleuh (ve, operands[1], operands[2]));
        emit_insn (gen_altivec_vmulouh (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], ve, vo));
+      emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0], ve, vo));
      }
    else
      {
        emit_insn (gen_altivec_vmulouh (ve, operands[1], operands[2]));
        emit_insn (gen_altivec_vmuleuh (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], vo, ve));
+      emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0], vo, ve));
      }
    DONE;
  })
@@ -3849,13 +3949,13 @@ (define_expand "vec_widen_smult_hi_v8hi"
      {
        emit_insn (gen_altivec_vmulesh (ve, operands[1], operands[2]));
        emit_insn (gen_altivec_vmulosh (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], ve, vo));
+      emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0], ve, vo));
      }
    else
      {
        emit_insn (gen_altivec_vmulosh (ve, operands[1], operands[2]));
        emit_insn (gen_altivec_vmulesh (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], vo, ve));
+      emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0], vo, ve));
      }
    DONE;
  })
@@ -3874,13 +3974,13 @@ (define_expand "vec_widen_smult_lo_v8hi"
      {
        emit_insn (gen_altivec_vmulesh (ve, operands[1], operands[2]));
        emit_insn (gen_altivec_vmulosh (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], ve, vo));
+      emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0], ve, vo));
      }
    else
      {
        emit_insn (gen_altivec_vmulosh (ve, operands[1], operands[2]));
        emit_insn (gen_altivec_vmulesh (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], vo, ve));
+      emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0], vo, ve));
      }
    DONE;
  })
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index df491bee2ea..c6ccd40e089 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -22942,28 +22942,28 @@ altivec_expand_vec_perm_const (rtx target, rtx op0, rtx op1,
       CODE_FOR_altivec_vpkuwum_direct,
       {2, 3, 6, 7, 10, 11, 14, 15, 18, 19, 22, 23, 26, 27, 30, 31}},
      {OPTION_MASK_ALTIVEC,
-     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghb_direct
-		      : CODE_FOR_altivec_vmrglb_direct,
+     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghb_direct_be
+		      : CODE_FOR_altivec_vmrglb_direct_le,
       {0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23}},
      {OPTION_MASK_ALTIVEC,
-     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghh_direct
-		      : CODE_FOR_altivec_vmrglh_direct,
+     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghh_direct_be
+		      : CODE_FOR_altivec_vmrglh_direct_le,
       {0, 1, 16, 17, 2, 3, 18, 19, 4, 5, 20, 21, 6, 7, 22, 23}},
      {OPTION_MASK_ALTIVEC,
-     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghw_direct_v4si
-		      : CODE_FOR_altivec_vmrglw_direct_v4si,
+     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghw_direct_v4si_be
+		      : CODE_FOR_altivec_vmrglw_direct_v4si_le,
       {0, 1, 2, 3, 16, 17, 18, 19, 4, 5, 6, 7, 20, 21, 22, 23}},
      {OPTION_MASK_ALTIVEC,
-     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglb_direct
-		      : CODE_FOR_altivec_vmrghb_direct,
+     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglb_direct_be
+		      : CODE_FOR_altivec_vmrghb_direct_le,
       {8, 24, 9, 25, 10, 26, 11, 27, 12, 28, 13, 29, 14, 30, 15, 31}},
      {OPTION_MASK_ALTIVEC,
-     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglh_direct
-		      : CODE_FOR_altivec_vmrghh_direct,
+     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglh_direct_be
+		      : CODE_FOR_altivec_vmrghh_direct_le,
       {8, 9, 24, 25, 10, 11, 26, 27, 12, 13, 28, 29, 14, 15, 30, 31}},
      {OPTION_MASK_ALTIVEC,
-     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglw_direct_v4si
-		      : CODE_FOR_altivec_vmrghw_direct_v4si,
+     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglw_direct_v4si_be
+		      : CODE_FOR_altivec_vmrghw_direct_v4si_le,
       {8, 9, 10, 11, 24, 25, 26, 27, 12, 13, 14, 15, 28, 29, 30, 31}},
      {OPTION_MASK_P8_VECTOR,
       BYTES_BIG_ENDIAN ? CODE_FOR_p8_vmrgew_v4sf_direct
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index e226a93bbe5..80f84e9b141 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -4688,12 +4688,14 @@ (define_expand "vsx_xxmrghw_<mode>"
  		     (const_int 1) (const_int 5)])))]
    "VECTOR_MEM_VSX_P (<MODE>mode)"
  {
-  rtx (*fun) (rtx, rtx, rtx);
-  fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrghw_direct_<mode>
-			 : gen_altivec_vmrglw_direct_<mode>;
-  if (!BYTES_BIG_ENDIAN)
-    std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0],
+						  operands[1],
+						  operands[2]));
+  else
+    emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0],
+						  operands[2],
+						  operands[1]));
    DONE;
  }
    [(set_attr "type" "vecperm")])
@@ -4708,12 +4710,14 @@ (define_expand "vsx_xxmrglw_<mode>"
  		     (const_int 3) (const_int 7)])))]
    "VECTOR_MEM_VSX_P (<MODE>mode)"
  {
-  rtx (*fun) (rtx, rtx, rtx);
-  fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrglw_direct_<mode>
-			 : gen_altivec_vmrghw_direct_<mode>;
-  if (!BYTES_BIG_ENDIAN)
-    std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0],
+						  operands[1],
+						  operands[2]));
+  else
+    emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0],
+						  operands[2],
+						  operands[1]));
    DONE;
  }
    [(set_attr "type" "vecperm")])
diff --git a/gcc/testsuite/g++.target/powerpc/pr106069.C b/gcc/testsuite/g++.target/powerpc/pr106069.C
new file mode 100644
index 00000000000..2cde9b821e3
--- /dev/null
+++ b/gcc/testsuite/g++.target/powerpc/pr106069.C
@@ -0,0 +1,120 @@
+/* { dg-options "-O -fno-tree-forwprop -maltivec" } */
+/* { dg-require-effective-target vmx_hw } */
+/* { dg-do run } */
+
+extern "C" void *
+memcpy (void *, const void *, unsigned long);
+typedef __attribute__ ((altivec (vector__))) unsigned native_simd_type;
+
+union
+{
+  native_simd_type V;
+  int R[4];
+} store_le_vec;
+
+struct S
+{
+  S () = default;
+  S (unsigned B0)
+  {
+    native_simd_type val{B0};
+    m_simd = val;
+  }
+  void store_le (unsigned int out[])
+  {
+    store_le_vec.V = m_simd;
+    unsigned int x0 = store_le_vec.R[0];
+    memcpy (out, &x0, 4);
+  }
+  S rotl (unsigned int r)
+  {
+    native_simd_type rot{r};
+    return __builtin_vec_rl (m_simd, rot);
+  }
+  void operator+= (S other)
+  {
+    m_simd = __builtin_vec_add (m_simd, other.m_simd);
+  }
+  void operator^= (S other)
+  {
+    m_simd = __builtin_vec_xor (m_simd, other.m_simd);
+  }
+  static void transpose (S &B0, S B1, S B2, S B3)
+  {
+    native_simd_type T0 = __builtin_vec_mergeh (B0.m_simd, B2.m_simd);
+    native_simd_type T1 = __builtin_vec_mergeh (B1.m_simd, B3.m_simd);
+    native_simd_type T2 = __builtin_vec_mergel (B0.m_simd, B2.m_simd);
+    native_simd_type T3 = __builtin_vec_mergel (B1.m_simd, B3.m_simd);
+    B0 = __builtin_vec_mergeh (T0, T1);
+    B3 = __builtin_vec_mergel (T2, T3);
+  }
+  S (native_simd_type x) : m_simd (x) {}
+  native_simd_type m_simd;
+};
+
+void
+foo (unsigned int output[], unsigned state[])
+{
+  S R00 = state[0];
+  S R01 = state[0];
+  S R02 = state[2];
+  S R03 = state[0];
+  S R05 = state[5];
+  S R06 = state[6];
+  S R07 = state[7];
+  S R08 = state[8];
+  S R09 = state[9];
+  S R10 = state[10];
+  S R11 = state[11];
+  S R12 = state[12];
+  S R13 = state[13];
+  S R14 = state[4];
+  S R15 = state[15];
+  for (int r = 0; r != 10; ++r)
+    {
+      R09 += R13;
+      R11 += R15;
+      R05 ^= R09;
+      R06 ^= R10;
+      R07 ^= R11;
+      R07 = R07.rotl (7);
+      R00 += R05;
+      R01 += R06;
+      R02 += R07;
+      R15 ^= R00;
+      R12 ^= R01;
+      R13 ^= R02;
+      R00 += R05;
+      R01 += R06;
+      R02 += R07;
+      R15 ^= R00;
+      R12 = R12.rotl (8);
+      R13 = R13.rotl (8);
+      R10 += R15;
+      R11 += R12;
+      R08 += R13;
+      R09 += R14;
+      R05 ^= R10;
+      R06 ^= R11;
+      R07 ^= R08;
+      R05 = R05.rotl (7);
+      R06 = R06.rotl (7);
+      R07 = R07.rotl (7);
+    }
+  R00 += state[0];
+  S::transpose (R00, R01, R02, R03);
+  R00.store_le (output);
+}
+
+unsigned int res[1];
+unsigned main_state[]{1634760805, 60878,      2036477234, 6,
+		      0,	  825562964,  1471091955, 1346092787,
+		      506976774,  4197066702, 518848283,  118491664,
+		      0,	  0,	      0,	  0};
+int
+main ()
+{
+  foo (res, main_state);
+  if (res[0] != 0x41fcef98)
+    __builtin_abort ();
+}
-- 
2.27.0



* Ping: [PATCH v4] rs6000: Fix incorrect RTL for Power LE when removing the UNSPECS [PR106069]
  2022-08-17  6:23           ` [PATCH v4] " Xionghu Luo
@ 2022-08-24  1:24             ` Xionghu Luo
  2023-01-18  9:11               ` Kewen.Lin
  0 siblings, 1 reply; 12+ messages in thread
From: Xionghu Luo @ 2022-08-24  1:24 UTC (permalink / raw)
  To: Kewen.Lin, Segher Boessenkool
  Cc: Xionghu Luo, gcc-patches, David Edelsohn, Segher Boessenkool

[-- Attachment #1: Type: text/plain, Size: 63 bytes --]

Hi Segher, I'd like to resend and ping this patch. Thanks.

[-- Attachment #2: v4-0001-rs6000-Fix-incorrect-RTL-for-Power-LE-when-removi.patch --]
[-- Type: text/plain, Size: 25714 bytes --]

From 23bffdacdf0eb1140c7a3571e6158797f4818d57 Mon Sep 17 00:00:00 2001
From: Xionghu Luo <xionghuluo@tencent.com>
Date: Thu, 4 Aug 2022 03:44:58 +0000
Subject: [PATCH v4] rs6000: Fix incorrect RTL for Power LE when removing the
 UNSPECS [PR106069]

v4: Update per comments.
v3: Rename altivec_vmrghb_direct_le to altivec_vmrglb_direct_le to match
the ASM it actually outputs (vmrglb).  Likewise for all similar
xxx_direct_le patterns.
v2: Split each direct pattern into BE and LE variants with the same RTL
but a different insn.

The native RTL expression for vec_mrghw should be the same for BE and LE,
as it is register- and endian-independent.  So both BE and LE need to
generate exactly the same RTL, with index [0 4 1 5], when expanding
vec_mrghw with vec_select and vec_concat.

(set (reg:V4SI 141) (vec_select:V4SI (vec_concat:V8SI
		   (subreg:V4SI (reg:V16QI 139) 0)
		   (subreg:V4SI (reg:V16QI 140) 0))
		   [const_int 0 4 1 5]))
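To make the selector concrete, here is a minimal C++ model of the two RTL operators on plain arrays. It assumes that index i always means GCC's vector element i, with the same numbering on BE and LE, as argued above. The helper names (vec_concat, vec_select, merge_high) are hypothetical stand-ins, not GCC APIs.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helpers that model the RTL operators on plain arrays.
// Index i always means GCC's vector element i, independent of endianness.
using V4 = std::array<int, 4>;
using V8 = std::array<int, 8>;
using Sel4 = std::array<std::size_t, 4>;

// (vec_concat:V8SI a b): the elements of a followed by the elements of b.
constexpr V8 vec_concat(const V4 &a, const V4 &b) {
  V8 r{};
  for (std::size_t i = 0; i < 4; ++i) {
    r[i] = a[i];
    r[i + 4] = b[i];
  }
  return r;
}

// (vec_select:V4SI v (parallel [s0 s1 s2 s3])): result element i is v[sel[i]].
constexpr V4 vec_select(const V8 &v, const Sel4 &sel) {
  V4 r{};
  for (std::size_t i = 0; i < 4; ++i)
    r[i] = v[sel[i]];
  return r;
}

// Merge high as written in the RTL above: selector [0 4 1 5] interleaves
// elements 0 and 1 of each input, giving {a[0], b[0], a[1], b[1]}.
constexpr V4 merge_high(const V4 &a, const V4 &b) {
  return vec_select(vec_concat(a, b), Sel4{0, 4, 1, 5});
}
```

For example, merge_high({10, 11, 12, 13}, {20, 21, 22, 23}) yields {10, 20, 11, 21}. The functions are constexpr (C++17 is assumed) so the result can be checked at compile time.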

The combine pass can then apply the nested vec_select optimization in
simplify-rtx.c:simplify_binary_operation_1 on both BE and LE:

21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel [0 4 1 5])
24: {r151:SI=vec_select(r150:V4SI,parallel [const_int 3]);}

=>

21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel)
24: {r151:SI=vec_select(r146:V4SI,parallel [const_int 1]);}
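The fold shown above amounts to composing two selectors. What follows is a minimal sketch, assuming V4SI inputs and the shared [0 4 1 5] selector; the real logic in simplify_binary_operation_1 is more general and is not reproduced here. ScalarSource and fold_nested_select are hypothetical names.

```cpp
#include <array>
#include <cstddef>

// Which input of (vec_concat:V8SI x y) a single selected element comes from.
struct ScalarSource {
  int operand;        // 0 = x (first vec_concat operand), 1 = y (second)
  std::size_t index;  // element index within that operand
};

constexpr std::size_t kUnits = 4;  // V4SI: four elements per input

// Hypothetical sketch of the index composition behind the fold
//   (vec_select (vec_select (vec_concat x y) sel) [k])
//     => (vec_select x-or-y [j])
// Outer element k is inner element sel[k], i.e. element sel[k] of the
// 8-element concatenation; indices >= kUnits belong to y.
constexpr ScalarSource fold_nested_select(const std::array<std::size_t, kUnits> &sel,
                                          std::size_t k) {
  const std::size_t concat_index = sel[k];
  if (concat_index < kUnits)
    return {0, concat_index};
  return {1, concat_index - kUnits};
}
```

With sel = [0 4 1 5] from insn 21 and k = 3 from insn 24, concat_index is 5. The scalar is therefore element 1 of the second operand, r146, which matches the simplified insn 24. This composition is only valid if the selector in the RTL means the same thing on both endiannesses.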

The endianness check is then needed only once, at final ASM generation.
The ASM is also better, because the nested vec_select is simplified to a
simple scalar load.
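The LE side of this argument can be checked with a small model. It assumes the Power ISA definition of vmrglb/vmrglh/vmrglw (interleave the second halves of vA and vB, counting elements from the most significant end). It also assumes GCC's LE convention that GCC element e lives in ISA element N-1-e. Under these assumptions, emitting vmrgl with the two inputs exchanged computes exactly the endian-independent merge-high selection [0 N 1 N+1 ...] over (vec_concat x y). The LE expanders rely on this when they call a *_direct_le pattern with operands[2] and operands[1] swapped. All names below are hypothetical.

```cpp
#include <array>
#include <cstddef>

// Hypothetical model: N elements per vector (4 = words, 8 = halfwords,
// 16 = bytes).
template <std::size_t N>
using Vec = std::array<int, N>;

// Assumed ISA semantics of vmrgl[bhw] vD,vA,vB in ISA (big-endian) element
// order: interleave the second halves of vA and vB.
template <std::size_t N>
constexpr Vec<N> isa_vmrgl(const Vec<N> &a, const Vec<N> &b) {
  Vec<N> d{};
  for (std::size_t i = 0; i < N / 2; ++i) {
    d[2 * i] = a[N / 2 + i];
    d[2 * i + 1] = b[N / 2 + i];
  }
  return d;
}

// Convert between GCC LE element order and ISA element order (self-inverse).
template <std::size_t N>
constexpr Vec<N> reverse_elements(const Vec<N> &v) {
  Vec<N> r{};
  for (std::size_t i = 0; i < N; ++i)
    r[i] = v[N - 1 - i];
  return r;
}

// What "vmrgl %0,%1,%2" computes on LE with %1 = y and %2 = x,
// expressed back in GCC LE element order.
template <std::size_t N>
constexpr Vec<N> le_vmrgl_swapped(const Vec<N> &x, const Vec<N> &y) {
  return reverse_elements(isa_vmrgl(reverse_elements(y), reverse_elements(x)));
}

// The endian-independent RTL: (vec_select (vec_concat x y) [0 N 1 N+1 ...]).
template <std::size_t N>
constexpr Vec<N> rtl_merge_high(const Vec<N> &x, const Vec<N> &y) {
  Vec<N> d{};
  for (std::size_t i = 0; i < N / 2; ++i) {
    d[2 * i] = x[i];      // concat index i
    d[2 * i + 1] = y[i];  // concat index N + i
  }
  return d;
}

// True if the LE instruction and the RTL agree on every element.
template <std::size_t N>
constexpr bool le_vmrgl_matches_rtl() {
  Vec<N> x{}, y{};
  for (std::size_t i = 0; i < N; ++i) {
    x[i] = static_cast<int>(i);
    y[i] = static_cast<int>(100 + i);
  }
  const Vec<N> hw = le_vmrgl_swapped(x, y);
  const Vec<N> rtl = rtl_merge_high(x, y);
  for (std::size_t i = 0; i < N; ++i)
    if (hw[i] != rtl[i])
      return false;
  return true;
}
```

On BE no check is needed, because ISA and GCC element numbering coincide, so vmrgh with the original operand order already matches the RTL. The model is written for any N so that the same reasoning covers the byte (vmrglb) and halfword (vmrglh) variants renamed in v3.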

Regression tested (passes) on Power8{LE,BE}{32,64} and
Power{9,10}LE{32,64} Linux.

gcc/ChangeLog:

	PR target/106069
	* config/rs6000/altivec.md (altivec_vmrghb_direct): Remove.
	(altivec_vmrghb_direct_be): New pattern for BE.
	(altivec_vmrghb_direct_le): New pattern for LE.
	(altivec_vmrghh_direct): Remove.
	(altivec_vmrghh_direct_be): New pattern for BE.
	(altivec_vmrghh_direct_le): New pattern for LE.
	(altivec_vmrghw_direct_<mode>): Remove.
	(altivec_vmrghw_direct_<mode>_be): New pattern for BE.
	(altivec_vmrghw_direct_<mode>_le): New pattern for LE.
	(altivec_vmrglb_direct): Remove.
	(altivec_vmrglb_direct_be): New pattern for BE.
	(altivec_vmrglb_direct_le): New pattern for LE.
	(altivec_vmrglh_direct): Remove.
	(altivec_vmrglh_direct_be): New pattern for BE.
	(altivec_vmrglh_direct_le): New pattern for LE.
	(altivec_vmrglw_direct_<mode>): Remove.
	(altivec_vmrglw_direct_<mode>_be): New pattern for BE.
	(altivec_vmrglw_direct_<mode>_le): New pattern for LE.
	* config/rs6000/rs6000.cc (altivec_expand_vec_perm_const):
	Adjust.
	* config/rs6000/vsx.md (vsx_xxmrghw_<mode>): Adjust.
	(vsx_xxmrglw_<mode>): Likewise.

gcc/testsuite/ChangeLog:

	PR target/106069
	* g++.target/powerpc/pr106069.C: New test.

Signed-off-by: Xionghu Luo <xionghuluo@tencent.com>
---
 gcc/config/rs6000/altivec.md                | 222 ++++++++++++++------
 gcc/config/rs6000/rs6000.cc                 |  24 +--
 gcc/config/rs6000/vsx.md                    |  28 +--
 gcc/testsuite/g++.target/powerpc/pr106069.C | 118 +++++++++++
 4 files changed, 307 insertions(+), 85 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/powerpc/pr106069.C

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 2c4940f2e21..c6a381908cb 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -1144,15 +1144,16 @@ (define_expand "altivec_vmrghb"
    (use (match_operand:V16QI 2 "register_operand"))]
   "TARGET_ALTIVEC"
 {
-  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrghb_direct
-						: gen_altivec_vmrglb_direct;
-  if (!BYTES_BIG_ENDIAN)
-    std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (
+      gen_altivec_vmrghb_direct_be (operands[0], operands[1], operands[2]));
+  else
+    emit_insn (
+      gen_altivec_vmrglb_direct_le (operands[0], operands[2], operands[1]));
   DONE;
 })
 
-(define_insn "altivec_vmrghb_direct"
+(define_insn "altivec_vmrghb_direct_be"
   [(set (match_operand:V16QI 0 "register_operand" "=v")
 	(vec_select:V16QI
 	  (vec_concat:V32QI
@@ -1166,7 +1167,25 @@ (define_insn "altivec_vmrghb_direct"
 		     (const_int 5) (const_int 21)
 		     (const_int 6) (const_int 22)
 		     (const_int 7) (const_int 23)])))]
-  "TARGET_ALTIVEC"
+  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
+  "vmrghb %0,%1,%2"
+  [(set_attr "type" "vecperm")])
+
+(define_insn "altivec_vmrghb_direct_le"
+  [(set (match_operand:V16QI 0 "register_operand" "=v")
+	(vec_select:V16QI
+	  (vec_concat:V32QI
+	    (match_operand:V16QI 2 "register_operand" "v")
+	    (match_operand:V16QI 1 "register_operand" "v"))
+	  (parallel [(const_int  8) (const_int 24)
+		     (const_int  9) (const_int 25)
+		     (const_int 10) (const_int 26)
+		     (const_int 11) (const_int 27)
+		     (const_int 12) (const_int 28)
+		     (const_int 13) (const_int 29)
+		     (const_int 14) (const_int 30)
+		     (const_int 15) (const_int 31)])))]
+  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
   "vmrghb %0,%1,%2"
   [(set_attr "type" "vecperm")])
 
@@ -1176,17 +1195,18 @@ (define_expand "altivec_vmrghh"
    (use (match_operand:V8HI 2 "register_operand"))]
   "TARGET_ALTIVEC"
 {
-  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrghh_direct
-						: gen_altivec_vmrglh_direct;
-  if (!BYTES_BIG_ENDIAN)
-    std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (
+      gen_altivec_vmrghh_direct_be (operands[0], operands[1], operands[2]));
+  else
+    emit_insn (
+      gen_altivec_vmrglh_direct_le (operands[0], operands[2], operands[1]));
   DONE;
 })
 
-(define_insn "altivec_vmrghh_direct"
+(define_insn "altivec_vmrghh_direct_be"
   [(set (match_operand:V8HI 0 "register_operand" "=v")
-        (vec_select:V8HI
+	(vec_select:V8HI
 	  (vec_concat:V16HI
 	    (match_operand:V8HI 1 "register_operand" "v")
 	    (match_operand:V8HI 2 "register_operand" "v"))
@@ -1194,7 +1214,21 @@ (define_insn "altivec_vmrghh_direct"
 		     (const_int 1) (const_int 9)
 		     (const_int 2) (const_int 10)
 		     (const_int 3) (const_int 11)])))]
-  "TARGET_ALTIVEC"
+  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
+  "vmrghh %0,%1,%2"
+  [(set_attr "type" "vecperm")])
+
+(define_insn "altivec_vmrghh_direct_le"
+  [(set (match_operand:V8HI 0 "register_operand" "=v")
+        (vec_select:V8HI
+	  (vec_concat:V16HI
+	    (match_operand:V8HI 2 "register_operand" "v")
+	    (match_operand:V8HI 1 "register_operand" "v"))
+	  (parallel [(const_int 4) (const_int 12)
+		     (const_int 5) (const_int 13)
+		     (const_int 6) (const_int 14)
+		     (const_int 7) (const_int 15)])))]
+  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
   "vmrghh %0,%1,%2"
   [(set_attr "type" "vecperm")])
 
@@ -1204,16 +1238,18 @@ (define_expand "altivec_vmrghw"
    (use (match_operand:V4SI 2 "register_operand"))]
   "VECTOR_MEM_ALTIVEC_P (V4SImode)"
 {
-  rtx (*fun) (rtx, rtx, rtx);
-  fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrghw_direct_v4si
-			 : gen_altivec_vmrglw_direct_v4si;
-  if (!BYTES_BIG_ENDIAN)
-    std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0],
+						  operands[1],
+						  operands[2]));
+  else
+    emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0],
+						  operands[2],
+						  operands[1]));
   DONE;
 })
 
-(define_insn "altivec_vmrghw_direct_<mode>"
+(define_insn "altivec_vmrghw_direct_<mode>_be"
   [(set (match_operand:VSX_W 0 "register_operand" "=wa,v")
 	(vec_select:VSX_W
 	  (vec_concat:<VS_double>
@@ -1221,7 +1257,21 @@ (define_insn "altivec_vmrghw_direct_<mode>"
 	    (match_operand:VSX_W 2 "register_operand" "wa,v"))
 	  (parallel [(const_int 0) (const_int 4)
 		     (const_int 1) (const_int 5)])))]
-  "TARGET_ALTIVEC"
+  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
+  "@
+   xxmrghw %x0,%x1,%x2
+   vmrghw %0,%1,%2"
+  [(set_attr "type" "vecperm")])
+
+(define_insn "altivec_vmrghw_direct_<mode>_le"
+  [(set (match_operand:VSX_W 0 "register_operand" "=wa,v")
+	(vec_select:VSX_W
+	  (vec_concat:<VS_double>
+	    (match_operand:VSX_W 2 "register_operand" "wa,v")
+	    (match_operand:VSX_W 1 "register_operand" "wa,v"))
+	  (parallel [(const_int 2) (const_int 6)
+		     (const_int 3) (const_int 7)])))]
+  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
   "@
    xxmrghw %x0,%x1,%x2
    vmrghw %0,%1,%2"
@@ -1250,15 +1300,16 @@ (define_expand "altivec_vmrglb"
    (use (match_operand:V16QI 2 "register_operand"))]
   "TARGET_ALTIVEC"
 {
-  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrglb_direct
-						: gen_altivec_vmrghb_direct;
-  if (!BYTES_BIG_ENDIAN)
-    std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (
+      gen_altivec_vmrglb_direct_be (operands[0], operands[1], operands[2]));
+  else
+    emit_insn (
+      gen_altivec_vmrghb_direct_le (operands[0], operands[2], operands[1]));
   DONE;
 })
 
-(define_insn "altivec_vmrglb_direct"
+(define_insn "altivec_vmrglb_direct_be"
   [(set (match_operand:V16QI 0 "register_operand" "=v")
 	(vec_select:V16QI
 	  (vec_concat:V32QI
@@ -1272,7 +1323,25 @@ (define_insn "altivec_vmrglb_direct"
 		     (const_int 13) (const_int 29)
 		     (const_int 14) (const_int 30)
 		     (const_int 15) (const_int 31)])))]
-  "TARGET_ALTIVEC"
+  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
+  "vmrglb %0,%1,%2"
+  [(set_attr "type" "vecperm")])
+
+(define_insn "altivec_vmrglb_direct_le"
+  [(set (match_operand:V16QI 0 "register_operand" "=v")
+	(vec_select:V16QI
+	  (vec_concat:V32QI
+	    (match_operand:V16QI 2 "register_operand" "v")
+	    (match_operand:V16QI 1 "register_operand" "v"))
+	  (parallel [(const_int 0) (const_int 16)
+		     (const_int 1) (const_int 17)
+		     (const_int 2) (const_int 18)
+		     (const_int 3) (const_int 19)
+		     (const_int 4) (const_int 20)
+		     (const_int 5) (const_int 21)
+		     (const_int 6) (const_int 22)
+		     (const_int 7) (const_int 23)])))]
+  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
   "vmrglb %0,%1,%2"
   [(set_attr "type" "vecperm")])
 
@@ -1282,15 +1351,16 @@ (define_expand "altivec_vmrglh"
    (use (match_operand:V8HI 2 "register_operand"))]
   "TARGET_ALTIVEC"
 {
-  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrglh_direct
-						: gen_altivec_vmrghh_direct;
-  if (!BYTES_BIG_ENDIAN)
-    std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (
+      gen_altivec_vmrglh_direct_be (operands[0], operands[1], operands[2]));
+  else
+    emit_insn (
+      gen_altivec_vmrghh_direct_le (operands[0], operands[2], operands[1]));
   DONE;
 })
 
-(define_insn "altivec_vmrglh_direct"
+(define_insn "altivec_vmrglh_direct_be"
   [(set (match_operand:V8HI 0 "register_operand" "=v")
         (vec_select:V8HI
 	  (vec_concat:V16HI
@@ -1300,7 +1370,21 @@ (define_insn "altivec_vmrglh_direct"
 		     (const_int 5) (const_int 13)
 		     (const_int 6) (const_int 14)
 		     (const_int 7) (const_int 15)])))]
-  "TARGET_ALTIVEC"
+  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
+  "vmrglh %0,%1,%2"
+  [(set_attr "type" "vecperm")])
+
+(define_insn "altivec_vmrglh_direct_le"
+  [(set (match_operand:V8HI 0 "register_operand" "=v")
+	(vec_select:V8HI
+	  (vec_concat:V16HI
+	    (match_operand:V8HI 2 "register_operand" "v")
+	    (match_operand:V8HI 1 "register_operand" "v"))
+	  (parallel [(const_int 0) (const_int 8)
+		     (const_int 1) (const_int 9)
+		     (const_int 2) (const_int 10)
+		     (const_int 3) (const_int 11)])))]
+  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
   "vmrglh %0,%1,%2"
   [(set_attr "type" "vecperm")])
 
@@ -1310,16 +1394,18 @@ (define_expand "altivec_vmrglw"
    (use (match_operand:V4SI 2 "register_operand"))]
   "VECTOR_MEM_ALTIVEC_P (V4SImode)"
 {
-  rtx (*fun) (rtx, rtx, rtx);
-  fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrglw_direct_v4si
-			 : gen_altivec_vmrghw_direct_v4si;
-  if (!BYTES_BIG_ENDIAN)
-    std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0],
+						  operands[1],
+						  operands[2]));
+  else
+    emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0],
+						  operands[2],
+						  operands[1]));
   DONE;
 })
 
-(define_insn "altivec_vmrglw_direct_<mode>"
+(define_insn "altivec_vmrglw_direct_<mode>_be"
   [(set (match_operand:VSX_W 0 "register_operand" "=wa,v")
 	(vec_select:VSX_W
 	  (vec_concat:<VS_double>
@@ -1327,7 +1413,21 @@ (define_insn "altivec_vmrglw_direct_<mode>"
 	    (match_operand:VSX_W 2 "register_operand" "wa,v"))
 	  (parallel [(const_int 2) (const_int 6)
 		     (const_int 3) (const_int 7)])))]
-  "TARGET_ALTIVEC"
+  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
+  "@
+   xxmrglw %x0,%x1,%x2
+   vmrglw %0,%1,%2"
+  [(set_attr "type" "vecperm")])
+
+(define_insn "altivec_vmrglw_direct_<mode>_le"
+  [(set (match_operand:VSX_W 0 "register_operand" "=wa,v")
+	(vec_select:VSX_W
+	  (vec_concat:<VS_double>
+	    (match_operand:VSX_W 2 "register_operand" "wa,v")
+	    (match_operand:VSX_W 1 "register_operand" "wa,v"))
+	  (parallel [(const_int 0) (const_int 4)
+		     (const_int 1) (const_int 5)])))]
+  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
   "@
    xxmrglw %x0,%x1,%x2
    vmrglw %0,%1,%2"
@@ -3699,13 +3799,13 @@ (define_expand "vec_widen_umult_hi_v16qi"
     {
       emit_insn (gen_altivec_vmuleub (ve, operands[1], operands[2]));
       emit_insn (gen_altivec_vmuloub (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrghh_direct (operands[0], ve, vo));
+      emit_insn (gen_altivec_vmrghh_direct_be (operands[0], ve, vo));
     }
   else
     {
       emit_insn (gen_altivec_vmuloub (ve, operands[1], operands[2]));
       emit_insn (gen_altivec_vmuleub (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrghh_direct (operands[0], vo, ve));
+      emit_insn (gen_altivec_vmrghh_direct_le (operands[0], vo, ve));
     }
   DONE;
 })
@@ -3724,13 +3824,13 @@ (define_expand "vec_widen_umult_lo_v16qi"
     {
       emit_insn (gen_altivec_vmuleub (ve, operands[1], operands[2]));
       emit_insn (gen_altivec_vmuloub (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrglh_direct (operands[0], ve, vo));
+      emit_insn (gen_altivec_vmrglh_direct_be (operands[0], ve, vo));
     }
   else
     {
       emit_insn (gen_altivec_vmuloub (ve, operands[1], operands[2]));
       emit_insn (gen_altivec_vmuleub (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrglh_direct (operands[0], vo, ve));
+      emit_insn (gen_altivec_vmrglh_direct_le (operands[0], vo, ve));
     }
   DONE;
 })
@@ -3749,13 +3849,13 @@ (define_expand "vec_widen_smult_hi_v16qi"
     {
       emit_insn (gen_altivec_vmulesb (ve, operands[1], operands[2]));
       emit_insn (gen_altivec_vmulosb (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrghh_direct (operands[0], ve, vo));
+      emit_insn (gen_altivec_vmrghh_direct_be (operands[0], ve, vo));
     }
   else
     {
       emit_insn (gen_altivec_vmulosb (ve, operands[1], operands[2]));
       emit_insn (gen_altivec_vmulesb (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrghh_direct (operands[0], vo, ve));
+      emit_insn (gen_altivec_vmrghh_direct_le (operands[0], vo, ve));
     }
   DONE;
 })
@@ -3774,13 +3874,13 @@ (define_expand "vec_widen_smult_lo_v16qi"
     {
       emit_insn (gen_altivec_vmulesb (ve, operands[1], operands[2]));
       emit_insn (gen_altivec_vmulosb (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrglh_direct (operands[0], ve, vo));
+      emit_insn (gen_altivec_vmrglh_direct_be (operands[0], ve, vo));
     }
   else
     {
       emit_insn (gen_altivec_vmulosb (ve, operands[1], operands[2]));
       emit_insn (gen_altivec_vmulesb (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrglh_direct (operands[0], vo, ve));
+      emit_insn (gen_altivec_vmrglh_direct_le (operands[0], vo, ve));
     }
   DONE;
 })
@@ -3799,13 +3899,13 @@ (define_expand "vec_widen_umult_hi_v8hi"
     {
       emit_insn (gen_altivec_vmuleuh (ve, operands[1], operands[2]));
       emit_insn (gen_altivec_vmulouh (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], ve, vo));
+      emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0], ve, vo));
     }
   else
     {
       emit_insn (gen_altivec_vmulouh (ve, operands[1], operands[2]));
       emit_insn (gen_altivec_vmuleuh (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], vo, ve));
+      emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0], vo, ve));
     }
   DONE;
 })
@@ -3824,13 +3924,13 @@ (define_expand "vec_widen_umult_lo_v8hi"
     {
       emit_insn (gen_altivec_vmuleuh (ve, operands[1], operands[2]));
       emit_insn (gen_altivec_vmulouh (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], ve, vo));
+      emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0], ve, vo));
     }
   else
     {
       emit_insn (gen_altivec_vmulouh (ve, operands[1], operands[2]));
       emit_insn (gen_altivec_vmuleuh (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], vo, ve));
+      emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0], vo, ve));
     }
   DONE;
 })
@@ -3849,13 +3949,13 @@ (define_expand "vec_widen_smult_hi_v8hi"
     {
       emit_insn (gen_altivec_vmulesh (ve, operands[1], operands[2]));
       emit_insn (gen_altivec_vmulosh (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], ve, vo));
+      emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0], ve, vo));
     }
   else
     {
       emit_insn (gen_altivec_vmulosh (ve, operands[1], operands[2]));
       emit_insn (gen_altivec_vmulesh (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], vo, ve));
+      emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0], vo, ve));
     }
   DONE;
 })
@@ -3874,13 +3974,13 @@ (define_expand "vec_widen_smult_lo_v8hi"
     {
       emit_insn (gen_altivec_vmulesh (ve, operands[1], operands[2]));
       emit_insn (gen_altivec_vmulosh (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], ve, vo));
+      emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0], ve, vo));
     }
   else
     {
       emit_insn (gen_altivec_vmulosh (ve, operands[1], operands[2]));
       emit_insn (gen_altivec_vmulesh (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], vo, ve));
+      emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0], vo, ve));
     }
   DONE;
 })
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index df491bee2ea..c6ccd40e089 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -22942,28 +22942,28 @@ altivec_expand_vec_perm_const (rtx target, rtx op0, rtx op1,
      CODE_FOR_altivec_vpkuwum_direct,
      {2, 3, 6, 7, 10, 11, 14, 15, 18, 19, 22, 23, 26, 27, 30, 31}},
     {OPTION_MASK_ALTIVEC,
-     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghb_direct
-		      : CODE_FOR_altivec_vmrglb_direct,
+     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghb_direct_be
+		      : CODE_FOR_altivec_vmrglb_direct_le,
      {0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23}},
     {OPTION_MASK_ALTIVEC,
-     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghh_direct
-		      : CODE_FOR_altivec_vmrglh_direct,
+     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghh_direct_be
+		      : CODE_FOR_altivec_vmrglh_direct_le,
      {0, 1, 16, 17, 2, 3, 18, 19, 4, 5, 20, 21, 6, 7, 22, 23}},
     {OPTION_MASK_ALTIVEC,
-     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghw_direct_v4si
-		      : CODE_FOR_altivec_vmrglw_direct_v4si,
+     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghw_direct_v4si_be
+		      : CODE_FOR_altivec_vmrglw_direct_v4si_le,
      {0, 1, 2, 3, 16, 17, 18, 19, 4, 5, 6, 7, 20, 21, 22, 23}},
     {OPTION_MASK_ALTIVEC,
-     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglb_direct
-		      : CODE_FOR_altivec_vmrghb_direct,
+     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglb_direct_be
+		      : CODE_FOR_altivec_vmrghb_direct_le,
      {8, 24, 9, 25, 10, 26, 11, 27, 12, 28, 13, 29, 14, 30, 15, 31}},
     {OPTION_MASK_ALTIVEC,
-     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglh_direct
-		      : CODE_FOR_altivec_vmrghh_direct,
+     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglh_direct_be
+		      : CODE_FOR_altivec_vmrghh_direct_le,
      {8, 9, 24, 25, 10, 11, 26, 27, 12, 13, 28, 29, 14, 15, 30, 31}},
     {OPTION_MASK_ALTIVEC,
-     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglw_direct_v4si
-		      : CODE_FOR_altivec_vmrghw_direct_v4si,
+     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglw_direct_v4si_be
+		      : CODE_FOR_altivec_vmrghw_direct_v4si_le,
      {8, 9, 10, 11, 24, 25, 26, 27, 12, 13, 14, 15, 28, 29, 30, 31}},
     {OPTION_MASK_P8_VECTOR,
      BYTES_BIG_ENDIAN ? CODE_FOR_p8_vmrgew_v4sf_direct
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index e226a93bbe5..80f84e9b141 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -4688,12 +4688,14 @@ (define_expand "vsx_xxmrghw_<mode>"
 		     (const_int 1) (const_int 5)])))]
   "VECTOR_MEM_VSX_P (<MODE>mode)"
 {
-  rtx (*fun) (rtx, rtx, rtx);
-  fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrghw_direct_<mode>
-			 : gen_altivec_vmrglw_direct_<mode>;
-  if (!BYTES_BIG_ENDIAN)
-    std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0],
+						  operands[1],
+						  operands[2]));
+  else
+    emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0],
+						  operands[2],
+						  operands[1]));
   DONE;
 }
   [(set_attr "type" "vecperm")])
@@ -4708,12 +4710,14 @@ (define_expand "vsx_xxmrglw_<mode>"
 		     (const_int 3) (const_int 7)])))]
   "VECTOR_MEM_VSX_P (<MODE>mode)"
 {
-  rtx (*fun) (rtx, rtx, rtx);
-  fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrglw_direct_<mode>
-			 : gen_altivec_vmrghw_direct_<mode>;
-  if (!BYTES_BIG_ENDIAN)
-    std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0],
+						  operands[1],
+						  operands[2]));
+  else
+    emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0],
+						  operands[2],
+						  operands[1]));
   DONE;
 }
   [(set_attr "type" "vecperm")])
diff --git a/gcc/testsuite/g++.target/powerpc/pr106069.C b/gcc/testsuite/g++.target/powerpc/pr106069.C
new file mode 100644
index 00000000000..c89739ecb55
--- /dev/null
+++ b/gcc/testsuite/g++.target/powerpc/pr106069.C
@@ -0,0 +1,118 @@
+/* { dg-options "-O -fno-tree-forwprop -maltivec" } */
+/* { dg-require-effective-target vmx_hw } */
+/* { dg-do run } */
+
+typedef __attribute__ ((altivec (vector__))) unsigned native_simd_type;
+
+union
+{
+  native_simd_type V;
+  int R[4];
+} store_le_vec;
+
+struct S
+{
+  S () = default;
+  S (unsigned B0)
+  {
+    native_simd_type val{B0};
+    m_simd = val;
+  }
+  void store_le (unsigned int out[])
+  {
+    store_le_vec.V = m_simd;
+    unsigned int x0 = store_le_vec.R[0];
+    __builtin_memcpy (out, &x0, 4);
+  }
+  S rotl (unsigned int r)
+  {
+    native_simd_type rot{r};
+    return __builtin_vec_rl (m_simd, rot);
+  }
+  void operator+= (S other)
+  {
+    m_simd = __builtin_vec_add (m_simd, other.m_simd);
+  }
+  void operator^= (S other)
+  {
+    m_simd = __builtin_vec_xor (m_simd, other.m_simd);
+  }
+  static void transpose (S &B0, S B1, S B2, S B3)
+  {
+    native_simd_type T0 = __builtin_vec_mergeh (B0.m_simd, B2.m_simd);
+    native_simd_type T1 = __builtin_vec_mergeh (B1.m_simd, B3.m_simd);
+    native_simd_type T2 = __builtin_vec_mergel (B0.m_simd, B2.m_simd);
+    native_simd_type T3 = __builtin_vec_mergel (B1.m_simd, B3.m_simd);
+    B0 = __builtin_vec_mergeh (T0, T1);
+    B3 = __builtin_vec_mergel (T2, T3);
+  }
+  S (native_simd_type x) : m_simd (x) {}
+  native_simd_type m_simd;
+};
+
+void
+foo (unsigned int output[], unsigned state[])
+{
+  S R00 = state[0];
+  S R01 = state[0];
+  S R02 = state[2];
+  S R03 = state[0];
+  S R05 = state[5];
+  S R06 = state[6];
+  S R07 = state[7];
+  S R08 = state[8];
+  S R09 = state[9];
+  S R10 = state[10];
+  S R11 = state[11];
+  S R12 = state[12];
+  S R13 = state[13];
+  S R14 = state[4];
+  S R15 = state[15];
+  for (int r = 0; r != 10; ++r)
+    {
+      R09 += R13;
+      R11 += R15;
+      R05 ^= R09;
+      R06 ^= R10;
+      R07 ^= R11;
+      R07 = R07.rotl (7);
+      R00 += R05;
+      R01 += R06;
+      R02 += R07;
+      R15 ^= R00;
+      R12 ^= R01;
+      R13 ^= R02;
+      R00 += R05;
+      R01 += R06;
+      R02 += R07;
+      R15 ^= R00;
+      R12 = R12.rotl (8);
+      R13 = R13.rotl (8);
+      R10 += R15;
+      R11 += R12;
+      R08 += R13;
+      R09 += R14;
+      R05 ^= R10;
+      R06 ^= R11;
+      R07 ^= R08;
+      R05 = R05.rotl (7);
+      R06 = R06.rotl (7);
+      R07 = R07.rotl (7);
+    }
+  R00 += state[0];
+  S::transpose (R00, R01, R02, R03);
+  R00.store_le (output);
+}
+
+unsigned int res[1];
+unsigned main_state[]{1634760805, 60878,      2036477234, 6,
+		      0,	  825562964,  1471091955, 1346092787,
+		      506976774,  4197066702, 518848283,  118491664,
+		      0,	  0,	      0,	  0};
+int
+main ()
+{
+  foo (res, main_state);
+  if (res[0] != 0x41fcef98)
+    __builtin_abort ();
+}
-- 
2.27.0
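
The commit message argues that vec_mergeh on V4SI should expand to the same vec_select (vec_concat (a, b), [0 4 1 5]) on BE and LE, so that combine can fold a nested vec_select. The following is a minimal compile-time C++ model of those two RTL codes. It works on plain arrays rather than GCC's rtx API, and it reproduces the fold in the insn 21/24 example. The names vec_concat, vec_select, fold_nested_select and the lane values are hypothetical stand-ins, not GCC code.

```cpp
#include <array>
#include <cstddef>

// Toy model of two RTL vector codes on plain arrays; this is NOT
// GCC's rtx API, only the element-selection semantics.
using V4 = std::array<int, 4>;
using V8 = std::array<int, 8>;
using Sel = std::array<std::size_t, 4>;

// vec_concat:V8SI (a, b): the elements of a followed by those of b.
constexpr V8 vec_concat(const V4 &a, const V4 &b) {
  return V8{a[0], a[1], a[2], a[3], b[0], b[1], b[2], b[3]};
}

// vec_select:V4SI (v, parallel [s0 s1 s2 s3]): result[i] = v[s_i].
constexpr V4 vec_select(const V8 &v, const Sel &sel) {
  return V4{v[sel[0]], v[sel[1]], v[sel[2]], v[sel[3]]};
}

// The merge-high selector, identical for BE and LE under the patch.
constexpr Sel kMergeHigh{0, 4, 1, 5};

// Where element k of vec_select (vec_concat (a, b), outer) really comes
// from: operand 0 (a) or 1 (b), and the lane within that operand.
struct Lane {
  int operand;
  std::size_t index;
};

constexpr Lane fold_nested_select(const Sel &outer, std::size_t k) {
  return outer[k] < 4 ? Lane{0, outer[k]} : Lane{1, outer[k] - 4};
}

// Hypothetical values standing in for pseudos r141 and r146.
constexpr V4 r141{10, 11, 12, 13};
constexpr V4 r146{20, 21, 22, 23};
constexpr V4 r150 = vec_select(vec_concat(r141, r146), kMergeHigh);

// Insn 24 selects element 3 of r150; it folds to element 1 of r146.
static_assert(fold_nested_select(kMergeHigh, 3).operand == 1, "from r146");
static_assert(fold_nested_select(kMergeHigh, 3).index == 1, "lane 1");
static_assert(r150[3] == r146[1], "fold agrees with direct evaluation");
```

Every check is a static_assert, so the snippet verifies itself by compiling (for example with g++ -std=c++14 -c) and needs no main. The fold only reads the selector, so it is valid only if the selector means the same thing on both endiannesses, which is the property the patch restores.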



* Re: Ping: [PATCH v4] rs6000: Fix incorrect RTL for Power LE when removing the UNSPECS [PR106069]
  2022-08-24  1:24             ` Ping: " Xionghu Luo
@ 2023-01-18  9:11               ` Kewen.Lin
  2023-02-09  2:15                 ` Xionghu Luo
  0 siblings, 1 reply; 12+ messages in thread
From: Kewen.Lin @ 2023-01-18  9:11 UTC
  To: Xionghu Luo, Segher Boessenkool
  Cc: Xionghu Luo, gcc-patches, David Edelsohn, Jakub Jelinek

Hi Segher,

I guess this patch slipped under your radar. :)

As Jakub asked about the status in PR106069, I applied the attached patch from
Xionghu to the latest trunk, re-tested it, and confirmed that it still
bootstraps and passes regression testing on powerpc64-linux-gnu P8 and
powerpc64le-linux-gnu P9 and P10.

This new version separates the direct patterns into LE and BE variants, which
is clearer than before, and it looks good to me.  What do you think?  Looking
forward to your opinion.
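
The following sketch shows why the LE patterns swap operands and emit the opposite merge instruction. It is a small compile-time C++ model, not GCC code; the helper names hw_vmrghw, hw_vmrglw, flip and rtl_merge_high are hypothetical. It assumes the usual convention that on little-endian, GCC vector element i occupies ISA word lane 3 - i. It then checks that vmrglw with swapped operands yields the RTL merge-high [0 4 1 5]. This matches what the v4 altivec_vmrghw expander emits on LE through altivec_vmrglw_direct_v4si_le with operands 1 and 2 swapped.

```cpp
#include <array>

// Toy model, not GCC code.  A V4SI register is four 32-bit lanes,
// numbered as the Power ISA does: lane 0 is the leftmost word.
using Lanes = std::array<int, 4>;

constexpr bool same(const Lanes &a, const Lanes &b) {
  return a[0] == b[0] && a[1] == b[1] && a[2] == b[2] && a[3] == b[3];
}

// vmrghw vD,vA,vB (ISA lane numbering): {A0, B0, A1, B1}.
constexpr Lanes hw_vmrghw(const Lanes &a, const Lanes &b) {
  return Lanes{a[0], b[0], a[1], b[1]};
}

// vmrglw vD,vA,vB (ISA lane numbering): {A2, B2, A3, B3}.
constexpr Lanes hw_vmrglw(const Lanes &a, const Lanes &b) {
  return Lanes{a[2], b[2], a[3], b[3]};
}

// Assumed LE convention: GCC element i sits in ISA lane 3 - i.  The
// mapping is its own inverse, so flip() converts in both directions.
constexpr Lanes flip(const Lanes &v) {
  return Lanes{v[3], v[2], v[1], v[0]};
}

// The RTL merge-high vec_select (vec_concat (x, y), [0 4 1 5]),
// expressed in GCC element numbering.
constexpr Lanes rtl_merge_high(const Lanes &x, const Lanes &y) {
  return Lanes{x[0], y[0], x[1], y[1]};
}

constexpr Lanes x{1, 2, 3, 4};  // GCC element order
constexpr Lanes y{5, 6, 7, 8};

// BE: element order equals lane order, so "vmrghw x,y" matches the RTL.
static_assert(same(hw_vmrghw(x, y), rtl_merge_high(x, y)), "BE");

// LE: the same RTL is matched by "vmrglw y,x" (operands swapped) ...
static_assert(same(flip(hw_vmrglw(flip(y), flip(x))), rtl_merge_high(x, y)),
              "LE");

// ... while reusing the BE instruction on LE gives the wrong lanes.
static_assert(!same(flip(hw_vmrghw(flip(x), flip(y))), rtl_merge_high(x, y)),
              "LE with vmrghw is wrong");
```

Compiling the snippet (C++14 or later) is the whole test. Under these assumptions it also illustrates why v4 keeps separate _be and _le insns: the RTL is shared, but the instruction and operand order that implement it differ by endianness.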

BTW, the link in the archives:
https://gcc.gnu.org/pipermail/gcc-patches/2022-August/600169.html

BR,
Kewen

on 2022/8/24 09:24, Xionghu Luo wrote:
> Subject:
> Ping: [PATCH v4] rs6000: Fix incorrect RTL for Power LE when removing the UNSPECS [PR106069]
> From:
> Xionghu Luo <yinyuefengyi@gmail.com>
> Date:
> 2022/8/24, 09:24
> 
> To:
> "Kewen.Lin" <linkw@linux.ibm.com>, Segher Boessenkool <segher@kernel.crashing.org>
> Cc:
> Xionghu Luo <xionghuluo@tencent.com>, gcc-patches@gcc.gnu.org, David Edelsohn <dje.gcc@gmail.com>, Segher Boessenkool <segher@kernel.crashing.org>
> 
> 
> Hi Segher, I'd like to resend and ping this patch. Thanks.
> 
> v4-0001-rs6000-Fix-incorrect-RTL-for-Power-LE-when-removi.patch
> 
> From 23bffdacdf0eb1140c7a3571e6158797f4818d57 Mon Sep 17 00:00:00 2001
> From: Xionghu Luo <xionghuluo@tencent.com>
> Date: Thu, 4 Aug 2022 03:44:58 +0000
> Subject: [PATCH v4] rs6000: Fix incorrect RTL for Power LE when removing the
>  UNSPECS [PR106069]
> 
> v4: Update per comments.
> v3: Rename altivec_vmrghb_direct_le to altivec_vmrglb_direct_le to match
> the vmrglb instruction it actually emits.  Likewise for all similar
> xxx_direct_le patterns.
> v2: Split the direct pattern into BE and LE variants with the same RTL
> but different insns.
> 
> The native RTL expression for vec_mrghw should be the same for BE and LE,
> as it is register- and endian-independent.  So both BE and LE need to
> generate exactly the same RTL, with index [0 4 1 5], when expanding
> vec_mrghw with vec_select and vec_concat.
> 
> (set (reg:V4SI 141) (vec_select:V4SI (vec_concat:V8SI
> 		   (subreg:V4SI (reg:V16QI 139) 0)
> 		   (subreg:V4SI (reg:V16QI 140) 0))
> 		   [const_int 0 4 1 5]))
> 
> Then the combine pass can perform the nested vec_select optimization
> in simplify-rtx.c:simplify_binary_operation_1 on both BE and LE:
> 
> 21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel [0 4 1 5])
> 24: {r151:SI=vec_select(r150:V4SI,parallel [const_int 3]);}
> 
> =>
> 
> 21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel)
> 24: {r151:SI=vec_select(r146:V4SI,parallel [const_int 1]);}
> 
> The endianness check is then needed only once, when the final ASM is
> generated.  The ASM also improves, since the nested vec_select is
> simplified to a simple scalar load.
> 
> Regression tested on Power8{LE,BE}{32,64} and Power{9,10}LE{32,64}
> Linux.
> 
> gcc/ChangeLog:
> 
> 	PR target/106069
> 	* config/rs6000/altivec.md (altivec_vmrghb_direct): Remove.
> 	(altivec_vmrghb_direct_be): New pattern for BE.
> 	(altivec_vmrghb_direct_le): New pattern for LE.
> 	(altivec_vmrghh_direct): Remove.
> 	(altivec_vmrghh_direct_be): New pattern for BE.
> 	(altivec_vmrghh_direct_le): New pattern for LE.
> 	(altivec_vmrghw_direct_<mode>): Remove.
> 	(altivec_vmrghw_direct_<mode>_be): New pattern for BE.
> 	(altivec_vmrghw_direct_<mode>_le): New pattern for LE.
> 	(altivec_vmrglb_direct): Remove.
> 	(altivec_vmrglb_direct_be): New pattern for BE.
> 	(altivec_vmrglb_direct_le): New pattern for LE.
> 	(altivec_vmrglh_direct): Remove.
> 	(altivec_vmrglh_direct_be): New pattern for BE.
> 	(altivec_vmrglh_direct_le): New pattern for LE.
> 	(altivec_vmrglw_direct_<mode>): Remove.
> 	(altivec_vmrglw_direct_<mode>_be): New pattern for BE.
> 	(altivec_vmrglw_direct_<mode>_le): New pattern for LE.
> 	* config/rs6000/rs6000.cc (altivec_expand_vec_perm_const):
> 	Adjust.
> 	* config/rs6000/vsx.md: Likewise.
> 
> gcc/testsuite/ChangeLog:
> 
> 	PR target/106069
> 	* g++.target/powerpc/pr106069.C: New test.
> 
> Signed-off-by: Xionghu Luo <xionghuluo@tencent.com>
> ---
>  gcc/config/rs6000/altivec.md                | 222 ++++++++++++++------
>  gcc/config/rs6000/rs6000.cc                 |  24 +--
>  gcc/config/rs6000/vsx.md                    |  28 +--
>  gcc/testsuite/g++.target/powerpc/pr106069.C | 118 +++++++++++
>  4 files changed, 307 insertions(+), 85 deletions(-)
>  create mode 100644 gcc/testsuite/g++.target/powerpc/pr106069.C
> 
> diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
> index 2c4940f2e21..c6a381908cb 100644
> --- a/gcc/config/rs6000/altivec.md
> +++ b/gcc/config/rs6000/altivec.md
> @@ -1144,15 +1144,16 @@ (define_expand "altivec_vmrghb"
>     (use (match_operand:V16QI 2 "register_operand"))]
>    "TARGET_ALTIVEC"
>  {
> -  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrghb_direct
> -						: gen_altivec_vmrglb_direct;
> -  if (!BYTES_BIG_ENDIAN)
> -    std::swap (operands[1], operands[2]);
> -  emit_insn (fun (operands[0], operands[1], operands[2]));
> +  if (BYTES_BIG_ENDIAN)
> +    emit_insn (
> +      gen_altivec_vmrghb_direct_be (operands[0], operands[1], operands[2]));
> +  else
> +    emit_insn (
> +      gen_altivec_vmrglb_direct_le (operands[0], operands[2], operands[1]));
>    DONE;
>  })
>  
> -(define_insn "altivec_vmrghb_direct"
> +(define_insn "altivec_vmrghb_direct_be"
>    [(set (match_operand:V16QI 0 "register_operand" "=v")
>  	(vec_select:V16QI
>  	  (vec_concat:V32QI
> @@ -1166,7 +1167,25 @@ (define_insn "altivec_vmrghb_direct"
>  		     (const_int 5) (const_int 21)
>  		     (const_int 6) (const_int 22)
>  		     (const_int 7) (const_int 23)])))]
> -  "TARGET_ALTIVEC"
> +  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
> +  "vmrghb %0,%1,%2"
> +  [(set_attr "type" "vecperm")])
> +
> +(define_insn "altivec_vmrghb_direct_le"
> +  [(set (match_operand:V16QI 0 "register_operand" "=v")
> +	(vec_select:V16QI
> +	  (vec_concat:V32QI
> +	    (match_operand:V16QI 2 "register_operand" "v")
> +	    (match_operand:V16QI 1 "register_operand" "v"))
> +	  (parallel [(const_int  8) (const_int 24)
> +		     (const_int  9) (const_int 25)
> +		     (const_int 10) (const_int 26)
> +		     (const_int 11) (const_int 27)
> +		     (const_int 12) (const_int 28)
> +		     (const_int 13) (const_int 29)
> +		     (const_int 14) (const_int 30)
> +		     (const_int 15) (const_int 31)])))]
> +  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
>    "vmrghb %0,%1,%2"
>    [(set_attr "type" "vecperm")])
>  
> @@ -1176,17 +1195,18 @@ (define_expand "altivec_vmrghh"
>     (use (match_operand:V8HI 2 "register_operand"))]
>    "TARGET_ALTIVEC"
>  {
> -  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrghh_direct
> -						: gen_altivec_vmrglh_direct;
> -  if (!BYTES_BIG_ENDIAN)
> -    std::swap (operands[1], operands[2]);
> -  emit_insn (fun (operands[0], operands[1], operands[2]));
> +  if (BYTES_BIG_ENDIAN)
> +    emit_insn (
> +      gen_altivec_vmrghh_direct_be (operands[0], operands[1], operands[2]));
> +  else
> +    emit_insn (
> +      gen_altivec_vmrglh_direct_le (operands[0], operands[2], operands[1]));
>    DONE;
>  })
>  
> -(define_insn "altivec_vmrghh_direct"
> +(define_insn "altivec_vmrghh_direct_be"
>    [(set (match_operand:V8HI 0 "register_operand" "=v")
> -        (vec_select:V8HI
> +	(vec_select:V8HI
>  	  (vec_concat:V16HI
>  	    (match_operand:V8HI 1 "register_operand" "v")
>  	    (match_operand:V8HI 2 "register_operand" "v"))
> @@ -1194,7 +1214,21 @@ (define_insn "altivec_vmrghh_direct"
>  		     (const_int 1) (const_int 9)
>  		     (const_int 2) (const_int 10)
>  		     (const_int 3) (const_int 11)])))]
> -  "TARGET_ALTIVEC"
> +  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
> +  "vmrghh %0,%1,%2"
> +  [(set_attr "type" "vecperm")])
> +
> +(define_insn "altivec_vmrghh_direct_le"
> +  [(set (match_operand:V8HI 0 "register_operand" "=v")
> +        (vec_select:V8HI
> +	  (vec_concat:V16HI
> +	    (match_operand:V8HI 2 "register_operand" "v")
> +	    (match_operand:V8HI 1 "register_operand" "v"))
> +	  (parallel [(const_int 4) (const_int 12)
> +		     (const_int 5) (const_int 13)
> +		     (const_int 6) (const_int 14)
> +		     (const_int 7) (const_int 15)])))]
> +  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
>    "vmrghh %0,%1,%2"
>    [(set_attr "type" "vecperm")])
>  
> @@ -1204,16 +1238,18 @@ (define_expand "altivec_vmrghw"
>     (use (match_operand:V4SI 2 "register_operand"))]
>    "VECTOR_MEM_ALTIVEC_P (V4SImode)"
>  {
> -  rtx (*fun) (rtx, rtx, rtx);
> -  fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrghw_direct_v4si
> -			 : gen_altivec_vmrglw_direct_v4si;
> -  if (!BYTES_BIG_ENDIAN)
> -    std::swap (operands[1], operands[2]);
> -  emit_insn (fun (operands[0], operands[1], operands[2]));
> +  if (BYTES_BIG_ENDIAN)
> +    emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0],
> +						  operands[1],
> +						  operands[2]));
> +  else
> +    emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0],
> +						  operands[2],
> +						  operands[1]));
>    DONE;
>  })
>  
> -(define_insn "altivec_vmrghw_direct_<mode>"
> +(define_insn "altivec_vmrghw_direct_<mode>_be"
>    [(set (match_operand:VSX_W 0 "register_operand" "=wa,v")
>  	(vec_select:VSX_W
>  	  (vec_concat:<VS_double>
> @@ -1221,7 +1257,21 @@ (define_insn "altivec_vmrghw_direct_<mode>"
>  	    (match_operand:VSX_W 2 "register_operand" "wa,v"))
>  	  (parallel [(const_int 0) (const_int 4)
>  		     (const_int 1) (const_int 5)])))]
> -  "TARGET_ALTIVEC"
> +  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
> +  "@
> +   xxmrghw %x0,%x1,%x2
> +   vmrghw %0,%1,%2"
> +  [(set_attr "type" "vecperm")])
> +
> +(define_insn "altivec_vmrghw_direct_<mode>_le"
> +  [(set (match_operand:VSX_W 0 "register_operand" "=wa,v")
> +	(vec_select:VSX_W
> +	  (vec_concat:<VS_double>
> +	    (match_operand:VSX_W 2 "register_operand" "wa,v")
> +	    (match_operand:VSX_W 1 "register_operand" "wa,v"))
> +	  (parallel [(const_int 2) (const_int 6)
> +		     (const_int 3) (const_int 7)])))]
> +  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
>    "@
>     xxmrghw %x0,%x1,%x2
>     vmrghw %0,%1,%2"
> @@ -1250,15 +1300,16 @@ (define_expand "altivec_vmrglb"
>     (use (match_operand:V16QI 2 "register_operand"))]
>    "TARGET_ALTIVEC"
>  {
> -  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrglb_direct
> -						: gen_altivec_vmrghb_direct;
> -  if (!BYTES_BIG_ENDIAN)
> -    std::swap (operands[1], operands[2]);
> -  emit_insn (fun (operands[0], operands[1], operands[2]));
> +  if (BYTES_BIG_ENDIAN)
> +    emit_insn (
> +      gen_altivec_vmrglb_direct_be (operands[0], operands[1], operands[2]));
> +  else
> +    emit_insn (
> +      gen_altivec_vmrghb_direct_le (operands[0], operands[2], operands[1]));
>    DONE;
>  })
>  
> -(define_insn "altivec_vmrglb_direct"
> +(define_insn "altivec_vmrglb_direct_be"
>    [(set (match_operand:V16QI 0 "register_operand" "=v")
>  	(vec_select:V16QI
>  	  (vec_concat:V32QI
> @@ -1272,7 +1323,25 @@ (define_insn "altivec_vmrglb_direct"
>  		     (const_int 13) (const_int 29)
>  		     (const_int 14) (const_int 30)
>  		     (const_int 15) (const_int 31)])))]
> -  "TARGET_ALTIVEC"
> +  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
> +  "vmrglb %0,%1,%2"
> +  [(set_attr "type" "vecperm")])
> +
> +(define_insn "altivec_vmrglb_direct_le"
> +  [(set (match_operand:V16QI 0 "register_operand" "=v")
> +	(vec_select:V16QI
> +	  (vec_concat:V32QI
> +	    (match_operand:V16QI 2 "register_operand" "v")
> +	    (match_operand:V16QI 1 "register_operand" "v"))
> +	  (parallel [(const_int 0) (const_int 16)
> +		     (const_int 1) (const_int 17)
> +		     (const_int 2) (const_int 18)
> +		     (const_int 3) (const_int 19)
> +		     (const_int 4) (const_int 20)
> +		     (const_int 5) (const_int 21)
> +		     (const_int 6) (const_int 22)
> +		     (const_int 7) (const_int 23)])))]
> +  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
>    "vmrglb %0,%1,%2"
>    [(set_attr "type" "vecperm")])
>  
> @@ -1282,15 +1351,16 @@ (define_expand "altivec_vmrglh"
>     (use (match_operand:V8HI 2 "register_operand"))]
>    "TARGET_ALTIVEC"
>  {
> -  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrglh_direct
> -						: gen_altivec_vmrghh_direct;
> -  if (!BYTES_BIG_ENDIAN)
> -    std::swap (operands[1], operands[2]);
> -  emit_insn (fun (operands[0], operands[1], operands[2]));
> +  if (BYTES_BIG_ENDIAN)
> +    emit_insn (
> +      gen_altivec_vmrglh_direct_be (operands[0], operands[1], operands[2]));
> +  else
> +    emit_insn (
> +      gen_altivec_vmrghh_direct_le (operands[0], operands[2], operands[1]));
>    DONE;
>  })
>  
> -(define_insn "altivec_vmrglh_direct"
> +(define_insn "altivec_vmrglh_direct_be"
>    [(set (match_operand:V8HI 0 "register_operand" "=v")
>          (vec_select:V8HI
>  	  (vec_concat:V16HI
> @@ -1300,7 +1370,21 @@ (define_insn "altivec_vmrglh_direct"
>  		     (const_int 5) (const_int 13)
>  		     (const_int 6) (const_int 14)
>  		     (const_int 7) (const_int 15)])))]
> -  "TARGET_ALTIVEC"
> +  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
> +  "vmrglh %0,%1,%2"
> +  [(set_attr "type" "vecperm")])
> +
> +(define_insn "altivec_vmrglh_direct_le"
> +  [(set (match_operand:V8HI 0 "register_operand" "=v")
> +	(vec_select:V8HI
> +	  (vec_concat:V16HI
> +	    (match_operand:V8HI 2 "register_operand" "v")
> +	    (match_operand:V8HI 1 "register_operand" "v"))
> +	  (parallel [(const_int 0) (const_int 8)
> +		     (const_int 1) (const_int 9)
> +		     (const_int 2) (const_int 10)
> +		     (const_int 3) (const_int 11)])))]
> +  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
>    "vmrglh %0,%1,%2"
>    [(set_attr "type" "vecperm")])
>  
> @@ -1310,16 +1394,18 @@ (define_expand "altivec_vmrglw"
>     (use (match_operand:V4SI 2 "register_operand"))]
>    "VECTOR_MEM_ALTIVEC_P (V4SImode)"
>  {
> -  rtx (*fun) (rtx, rtx, rtx);
> -  fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrglw_direct_v4si
> -			 : gen_altivec_vmrghw_direct_v4si;
> -  if (!BYTES_BIG_ENDIAN)
> -    std::swap (operands[1], operands[2]);
> -  emit_insn (fun (operands[0], operands[1], operands[2]));
> +  if (BYTES_BIG_ENDIAN)
> +    emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0],
> +						  operands[1],
> +						  operands[2]));
> +  else
> +    emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0],
> +						  operands[2],
> +						  operands[1]));
>    DONE;
>  })
>  
> -(define_insn "altivec_vmrglw_direct_<mode>"
> +(define_insn "altivec_vmrglw_direct_<mode>_be"
>    [(set (match_operand:VSX_W 0 "register_operand" "=wa,v")
>  	(vec_select:VSX_W
>  	  (vec_concat:<VS_double>
> @@ -1327,7 +1413,21 @@ (define_insn "altivec_vmrglw_direct_<mode>"
>  	    (match_operand:VSX_W 2 "register_operand" "wa,v"))
>  	  (parallel [(const_int 2) (const_int 6)
>  		     (const_int 3) (const_int 7)])))]
> -  "TARGET_ALTIVEC"
> +  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
> +  "@
> +   xxmrglw %x0,%x1,%x2
> +   vmrglw %0,%1,%2"
> +  [(set_attr "type" "vecperm")])
> +
> +(define_insn "altivec_vmrglw_direct_<mode>_le"
> +  [(set (match_operand:VSX_W 0 "register_operand" "=wa,v")
> +	(vec_select:VSX_W
> +	  (vec_concat:<VS_double>
> +	    (match_operand:VSX_W 2 "register_operand" "wa,v")
> +	    (match_operand:VSX_W 1 "register_operand" "wa,v"))
> +	  (parallel [(const_int 0) (const_int 4)
> +		     (const_int 1) (const_int 5)])))]
> +  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
>    "@
>     xxmrglw %x0,%x1,%x2
>     vmrglw %0,%1,%2"
> @@ -3699,13 +3799,13 @@ (define_expand "vec_widen_umult_hi_v16qi"
>      {
>        emit_insn (gen_altivec_vmuleub (ve, operands[1], operands[2]));
>        emit_insn (gen_altivec_vmuloub (vo, operands[1], operands[2]));
> -      emit_insn (gen_altivec_vmrghh_direct (operands[0], ve, vo));
> +      emit_insn (gen_altivec_vmrghh_direct_be (operands[0], ve, vo));
>      }
>    else
>      {
>        emit_insn (gen_altivec_vmuloub (ve, operands[1], operands[2]));
>        emit_insn (gen_altivec_vmuleub (vo, operands[1], operands[2]));
> -      emit_insn (gen_altivec_vmrghh_direct (operands[0], vo, ve));
> +      emit_insn (gen_altivec_vmrghh_direct_le (operands[0], vo, ve));
>      }
>    DONE;
>  })
> @@ -3724,13 +3824,13 @@ (define_expand "vec_widen_umult_lo_v16qi"
>      {
>        emit_insn (gen_altivec_vmuleub (ve, operands[1], operands[2]));
>        emit_insn (gen_altivec_vmuloub (vo, operands[1], operands[2]));
> -      emit_insn (gen_altivec_vmrglh_direct (operands[0], ve, vo));
> +      emit_insn (gen_altivec_vmrglh_direct_be (operands[0], ve, vo));
>      }
>    else
>      {
>        emit_insn (gen_altivec_vmuloub (ve, operands[1], operands[2]));
>        emit_insn (gen_altivec_vmuleub (vo, operands[1], operands[2]));
> -      emit_insn (gen_altivec_vmrglh_direct (operands[0], vo, ve));
> +      emit_insn (gen_altivec_vmrglh_direct_le (operands[0], vo, ve));
>      }
>    DONE;
>  })
> @@ -3749,13 +3849,13 @@ (define_expand "vec_widen_smult_hi_v16qi"
>      {
>        emit_insn (gen_altivec_vmulesb (ve, operands[1], operands[2]));
>        emit_insn (gen_altivec_vmulosb (vo, operands[1], operands[2]));
> -      emit_insn (gen_altivec_vmrghh_direct (operands[0], ve, vo));
> +      emit_insn (gen_altivec_vmrghh_direct_be (operands[0], ve, vo));
>      }
>    else
>      {
>        emit_insn (gen_altivec_vmulosb (ve, operands[1], operands[2]));
>        emit_insn (gen_altivec_vmulesb (vo, operands[1], operands[2]));
> -      emit_insn (gen_altivec_vmrghh_direct (operands[0], vo, ve));
> +      emit_insn (gen_altivec_vmrghh_direct_le (operands[0], vo, ve));
>      }
>    DONE;
>  })
> @@ -3774,13 +3874,13 @@ (define_expand "vec_widen_smult_lo_v16qi"
>      {
>        emit_insn (gen_altivec_vmulesb (ve, operands[1], operands[2]));
>        emit_insn (gen_altivec_vmulosb (vo, operands[1], operands[2]));
> -      emit_insn (gen_altivec_vmrglh_direct (operands[0], ve, vo));
> +      emit_insn (gen_altivec_vmrglh_direct_be (operands[0], ve, vo));
>      }
>    else
>      {
>        emit_insn (gen_altivec_vmulosb (ve, operands[1], operands[2]));
>        emit_insn (gen_altivec_vmulesb (vo, operands[1], operands[2]));
> -      emit_insn (gen_altivec_vmrglh_direct (operands[0], vo, ve));
> +      emit_insn (gen_altivec_vmrglh_direct_le (operands[0], vo, ve));
>      }
>    DONE;
>  })
> @@ -3799,13 +3899,13 @@ (define_expand "vec_widen_umult_hi_v8hi"
>      {
>        emit_insn (gen_altivec_vmuleuh (ve, operands[1], operands[2]));
>        emit_insn (gen_altivec_vmulouh (vo, operands[1], operands[2]));
> -      emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], ve, vo));
> +      emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0], ve, vo));
>      }
>    else
>      {
>        emit_insn (gen_altivec_vmulouh (ve, operands[1], operands[2]));
>        emit_insn (gen_altivec_vmuleuh (vo, operands[1], operands[2]));
> -      emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], vo, ve));
> +      emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0], vo, ve));
>      }
>    DONE;
>  })
> @@ -3824,13 +3924,13 @@ (define_expand "vec_widen_umult_lo_v8hi"
>      {
>        emit_insn (gen_altivec_vmuleuh (ve, operands[1], operands[2]));
>        emit_insn (gen_altivec_vmulouh (vo, operands[1], operands[2]));
> -      emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], ve, vo));
> +      emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0], ve, vo));
>      }
>    else
>      {
>        emit_insn (gen_altivec_vmulouh (ve, operands[1], operands[2]));
>        emit_insn (gen_altivec_vmuleuh (vo, operands[1], operands[2]));
> -      emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], vo, ve));
> +      emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0], vo, ve));
>      }
>    DONE;
>  })
> @@ -3849,13 +3949,13 @@ (define_expand "vec_widen_smult_hi_v8hi"
>      {
>        emit_insn (gen_altivec_vmulesh (ve, operands[1], operands[2]));
>        emit_insn (gen_altivec_vmulosh (vo, operands[1], operands[2]));
> -      emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], ve, vo));
> +      emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0], ve, vo));
>      }
>    else
>      {
>        emit_insn (gen_altivec_vmulosh (ve, operands[1], operands[2]));
>        emit_insn (gen_altivec_vmulesh (vo, operands[1], operands[2]));
> -      emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], vo, ve));
> +      emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0], vo, ve));
>      }
>    DONE;
>  })
> @@ -3874,13 +3974,13 @@ (define_expand "vec_widen_smult_lo_v8hi"
>      {
>        emit_insn (gen_altivec_vmulesh (ve, operands[1], operands[2]));
>        emit_insn (gen_altivec_vmulosh (vo, operands[1], operands[2]));
> -      emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], ve, vo));
> +      emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0], ve, vo));
>      }
>    else
>      {
>        emit_insn (gen_altivec_vmulosh (ve, operands[1], operands[2]));
>        emit_insn (gen_altivec_vmulesh (vo, operands[1], operands[2]));
> -      emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], vo, ve));
> +      emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0], vo, ve));
>      }
>    DONE;
>  })
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index df491bee2ea..c6ccd40e089 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -22942,28 +22942,28 @@ altivec_expand_vec_perm_const (rtx target, rtx op0, rtx op1,
>       CODE_FOR_altivec_vpkuwum_direct,
>       {2, 3, 6, 7, 10, 11, 14, 15, 18, 19, 22, 23, 26, 27, 30, 31}},
>      {OPTION_MASK_ALTIVEC,
> -     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghb_direct
> -		      : CODE_FOR_altivec_vmrglb_direct,
> +     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghb_direct_be
> +		      : CODE_FOR_altivec_vmrglb_direct_le,
>       {0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23}},
>      {OPTION_MASK_ALTIVEC,
> -     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghh_direct
> -		      : CODE_FOR_altivec_vmrglh_direct,
> +     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghh_direct_be
> +		      : CODE_FOR_altivec_vmrglh_direct_le,
>       {0, 1, 16, 17, 2, 3, 18, 19, 4, 5, 20, 21, 6, 7, 22, 23}},
>      {OPTION_MASK_ALTIVEC,
> -     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghw_direct_v4si
> -		      : CODE_FOR_altivec_vmrglw_direct_v4si,
> +     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghw_direct_v4si_be
> +		      : CODE_FOR_altivec_vmrglw_direct_v4si_le,
>       {0, 1, 2, 3, 16, 17, 18, 19, 4, 5, 6, 7, 20, 21, 22, 23}},
>      {OPTION_MASK_ALTIVEC,
> -     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglb_direct
> -		      : CODE_FOR_altivec_vmrghb_direct,
> +     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglb_direct_be
> +		      : CODE_FOR_altivec_vmrghb_direct_le,
>       {8, 24, 9, 25, 10, 26, 11, 27, 12, 28, 13, 29, 14, 30, 15, 31}},
>      {OPTION_MASK_ALTIVEC,
> -     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglh_direct
> -		      : CODE_FOR_altivec_vmrghh_direct,
> +     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglh_direct_be
> +		      : CODE_FOR_altivec_vmrghh_direct_le,
>       {8, 9, 24, 25, 10, 11, 26, 27, 12, 13, 28, 29, 14, 15, 30, 31}},
>      {OPTION_MASK_ALTIVEC,
> -     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglw_direct_v4si
> -		      : CODE_FOR_altivec_vmrghw_direct_v4si,
> +     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglw_direct_v4si_be
> +		      : CODE_FOR_altivec_vmrghw_direct_v4si_le,
>       {8, 9, 10, 11, 24, 25, 26, 27, 12, 13, 14, 15, 28, 29, 30, 31}},
>      {OPTION_MASK_P8_VECTOR,
>       BYTES_BIG_ENDIAN ? CODE_FOR_p8_vmrgew_v4sf_direct
> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index e226a93bbe5..80f84e9b141 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -4688,12 +4688,14 @@ (define_expand "vsx_xxmrghw_<mode>"
>  		     (const_int 1) (const_int 5)])))]
>    "VECTOR_MEM_VSX_P (<MODE>mode)"
>  {
> -  rtx (*fun) (rtx, rtx, rtx);
> -  fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrghw_direct_<mode>
> -			 : gen_altivec_vmrglw_direct_<mode>;
> -  if (!BYTES_BIG_ENDIAN)
> -    std::swap (operands[1], operands[2]);
> -  emit_insn (fun (operands[0], operands[1], operands[2]));
> +  if (BYTES_BIG_ENDIAN)
> +    emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0],
> +						  operands[1],
> +						  operands[2]));
> +  else
> +    emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0],
> +						  operands[2],
> +						  operands[1]));
>    DONE;
>  }
>    [(set_attr "type" "vecperm")])
> @@ -4708,12 +4710,14 @@ (define_expand "vsx_xxmrglw_<mode>"
>  		     (const_int 3) (const_int 7)])))]
>    "VECTOR_MEM_VSX_P (<MODE>mode)"
>  {
> -  rtx (*fun) (rtx, rtx, rtx);
> -  fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrglw_direct_<mode>
> -			 : gen_altivec_vmrghw_direct_<mode>;
> -  if (!BYTES_BIG_ENDIAN)
> -    std::swap (operands[1], operands[2]);
> -  emit_insn (fun (operands[0], operands[1], operands[2]));
> +  if (BYTES_BIG_ENDIAN)
> +    emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0],
> +						  operands[1],
> +						  operands[2]));
> +  else
> +    emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0],
> +						  operands[2],
> +						  operands[1]));
>    DONE;
>  }
>    [(set_attr "type" "vecperm")])
> diff --git a/gcc/testsuite/g++.target/powerpc/pr106069.C b/gcc/testsuite/g++.target/powerpc/pr106069.C
> new file mode 100644
> index 00000000000..c89739ecb55
> --- /dev/null
> +++ b/gcc/testsuite/g++.target/powerpc/pr106069.C
> @@ -0,0 +1,118 @@
> +/* { dg-options "-O -fno-tree-forwprop -maltivec" } */
> +/* { dg-require-effective-target vmx_hw } */
> +/* { dg-do run } */
> +
> +typedef __attribute__ ((altivec (vector__))) unsigned native_simd_type;
> +
> +union
> +{
> +  native_simd_type V;
> +  int R[4];
> +} store_le_vec;
> +
> +struct S
> +{
> +  S () = default;
> +  S (unsigned B0)
> +  {
> +    native_simd_type val{B0};
> +    m_simd = val;
> +  }
> +  void store_le (unsigned int out[])
> +  {
> +    store_le_vec.V = m_simd;
> +    unsigned int x0 = store_le_vec.R[0];
> +    __builtin_memcpy (out, &x0, 4);
> +  }
> +  S rotl (unsigned int r)
> +  {
> +    native_simd_type rot{r};
> +    return __builtin_vec_rl (m_simd, rot);
> +  }
> +  void operator+= (S other)
> +  {
> +    m_simd = __builtin_vec_add (m_simd, other.m_simd);
> +  }
> +  void operator^= (S other)
> +  {
> +    m_simd = __builtin_vec_xor (m_simd, other.m_simd);
> +  }
> +  static void transpose (S &B0, S B1, S B2, S B3)
> +  {
> +    native_simd_type T0 = __builtin_vec_mergeh (B0.m_simd, B2.m_simd);
> +    native_simd_type T1 = __builtin_vec_mergeh (B1.m_simd, B3.m_simd);
> +    native_simd_type T2 = __builtin_vec_mergel (B0.m_simd, B2.m_simd);
> +    native_simd_type T3 = __builtin_vec_mergel (B1.m_simd, B3.m_simd);
> +    B0 = __builtin_vec_mergeh (T0, T1);
> +    B3 = __builtin_vec_mergel (T2, T3);
> +  }
> +  S (native_simd_type x) : m_simd (x) {}
> +  native_simd_type m_simd;
> +};
> +
> +void
> +foo (unsigned int output[], unsigned state[])
> +{
> +  S R00 = state[0];
> +  S R01 = state[0];
> +  S R02 = state[2];
> +  S R03 = state[0];
> +  S R05 = state[5];
> +  S R06 = state[6];
> +  S R07 = state[7];
> +  S R08 = state[8];
> +  S R09 = state[9];
> +  S R10 = state[10];
> +  S R11 = state[11];
> +  S R12 = state[12];
> +  S R13 = state[13];
> +  S R14 = state[4];
> +  S R15 = state[15];
> +  for (int r = 0; r != 10; ++r)
> +    {
> +      R09 += R13;
> +      R11 += R15;
> +      R05 ^= R09;
> +      R06 ^= R10;
> +      R07 ^= R11;
> +      R07 = R07.rotl (7);
> +      R00 += R05;
> +      R01 += R06;
> +      R02 += R07;
> +      R15 ^= R00;
> +      R12 ^= R01;
> +      R13 ^= R02;
> +      R00 += R05;
> +      R01 += R06;
> +      R02 += R07;
> +      R15 ^= R00;
> +      R12 = R12.rotl (8);
> +      R13 = R13.rotl (8);
> +      R10 += R15;
> +      R11 += R12;
> +      R08 += R13;
> +      R09 += R14;
> +      R05 ^= R10;
> +      R06 ^= R11;
> +      R07 ^= R08;
> +      R05 = R05.rotl (7);
> +      R06 = R06.rotl (7);
> +      R07 = R07.rotl (7);
> +    }
> +  R00 += state[0];
> +  S::transpose (R00, R01, R02, R03);
> +  R00.store_le (output);
> +}
> +
> +unsigned int res[1];
> +unsigned main_state[]{1634760805, 60878,      2036477234, 6,
> +		      0,	  825562964,  1471091955, 1346092787,
> +		      506976774,  4197066702, 518848283,  118491664,
> +		      0,	  0,	      0,	  0};
> +int
> +main ()
> +{
> +  foo (res, main_state);
> +  if (res[0] != 0x41fcef98)
> +    __builtin_abort ();
> +}
> -- 2.27.0
> 
> Attachment:
> 
> v4-0001-rs6000-Fix-incorrect-RTL-for-Power-LE-when-removi.patch	25.1 K
> 


BR,
Kewen


* Re: Ping: [PATCH v4] rs6000: Fix incorrect RTL for Power LE when removing the UNSPECS [PR106069]
  2023-01-18  9:11               ` Kewen.Lin
@ 2023-02-09  2:15                 ` Xionghu Luo
  2023-02-09 15:52                   ` Segher Boessenkool
From: Xionghu Luo @ 2023-02-09  2:15 UTC (permalink / raw)
  To: Kewen.Lin, Segher Boessenkool
  Cc: Xionghu Luo, gcc-patches, David Edelsohn, Jakub Jelinek

Thanks Kewen!
Ping this again @Segher.
Maybe we could go ahead and merge this patch if Segher has no objections,
since it has already been through several reviews and rounds of testing...


BR,
Xionghu


On 2023/1/18 17:11, Kewen.Lin wrote:
> Hi Segher,
> 
> I guess this patch slipped under your radar. :)
> 
> As Jakub asked about the status in PR106069, I applied the attached patch from
> Xionghu to the latest trunk, re-tested it, and confirmed that it still bootstraps
> and regtests on powerpc64-linux-gnu P8 and powerpc64le-linux-gnu P9 and P10.
> 
> This new version separates the direct patterns into LE and BE variants, which is
> clearer than before, and it looks good to me.  What do you think?  Looking forward
> to your opinion.
> 
> btw, the link in archives:
> https://gcc.gnu.org/pipermail/gcc-patches/2022-August/600169.html
> 
> BR,
> Kewen
> 
> on 2022/8/24 09:24, Xionghu Luo wrote:
>> Subject:
>> Ping: [PATCH v4] rs6000: Fix incorrect RTL for Power LE when removing the UNSPECS [PR106069]
>> From:
>> Xionghu Luo <yinyuefengyi@gmail.com>
>> Date:
>> 2022/8/24, 09:24
>>
>> To:
>> "Kewen.Lin" <linkw@linux.ibm.com>, Segher Boessenkool <segher@kernel.crashing.org>
>> Cc:
>> Xionghu Luo <xionghuluo@tencent.com>, gcc-patches@gcc.gnu.org, David Edelsohn <dje.gcc@gmail.com>, Segher Boessenkool <segher@kernel.crashing.org>
>>
>>
>> Hi Segher, I'd like to resend and ping this patch. Thanks.
>>
>> v4-0001-rs6000-Fix-incorrect-RTL-for-Power-LE-when-removi.patch
>>
>>  From 23bffdacdf0eb1140c7a3571e6158797f4818d57 Mon Sep 17 00:00:00 2001
>> From: Xionghu Luo <xionghuluo@tencent.com>
>> Date: Thu, 4 Aug 2022 03:44:58 +0000
>> Subject: [PATCH v4] rs6000: Fix incorrect RTL for Power LE when removing the
>>   UNSPECS [PR106069]
>>
>> v4: Update per comments.
>> v3: rename altivec_vmrghb_direct_le to altivec_vmrglb_direct_le to match
>> the actual output ASM vmrglb. Likewise for all similar xxx_direct_le
>> patterns.
>> v2: Split the direct patterns into BE and LE variants with the same RTL but different insns.
>>
>> The native RTL expressions for vec_mrghw should be the same for BE and LE,
>> as they are register- and endian-independent.  So both BE and LE need to
>> generate exactly the same RTL with index [0 4 1 5] when expanding vec_mrghw
>> with vec_select and vec_concat.
>>
>> (set (reg:V4SI 141) (vec_select:V4SI (vec_concat:V8SI
>> 		   (subreg:V4SI (reg:V16QI 139) 0)
>> 		   (subreg:V4SI (reg:V16QI 140) 0))
>> 		   [const_int 0 4 1 5]))
>>
>> The combine pass can then perform the nested vec_select optimization
>> in simplify-rtx.c:simplify_binary_operation_1 on both BE and LE as well:
>>
>> 21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel [0 4 1 5])
>> 24: {r151:SI=vec_select(r150:V4SI,parallel [const_int 3]);}
>>
>> =>
>>
>> 21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel)
>> 24: {r151:SI=vec_select(r146:V4SI,parallel [const_int 1]);}
>>
>> The endianness check is then needed only once, at final ASM generation.
>> The generated ASM is also better, because the nested vec_select is
>> simplified to a simple scalar load.
>>
>> Regression testing passed on Power8{LE,BE}{32,64} and Power{9,10}LE{32,64}
>> Linux.
>>
>> gcc/ChangeLog:
>>
>> 	PR target/106069
>> 	* config/rs6000/altivec.md (altivec_vmrghb_direct): Remove.
>> 	(altivec_vmrghb_direct_be): New pattern for BE.
>> 	(altivec_vmrghb_direct_le): New pattern for LE.
>> 	(altivec_vmrghh_direct): Remove.
>> 	(altivec_vmrghh_direct_be): New pattern for BE.
>> 	(altivec_vmrghh_direct_le): New pattern for LE.
>> 	(altivec_vmrghw_direct_<mode>): Remove.
>> 	(altivec_vmrghw_direct_<mode>_be): New pattern for BE.
>> 	(altivec_vmrghw_direct_<mode>_le): New pattern for LE.
>> 	(altivec_vmrglb_direct): Remove.
>> 	(altivec_vmrglb_direct_be): New pattern for BE.
>> 	(altivec_vmrglb_direct_le): New pattern for LE.
>> 	(altivec_vmrglh_direct): Remove.
>> 	(altivec_vmrglh_direct_be): New pattern for BE.
>> 	(altivec_vmrglh_direct_le): New pattern for LE.
>> 	(altivec_vmrglw_direct_<mode>): Remove.
>> 	(altivec_vmrglw_direct_<mode>_be): New pattern for BE.
>> 	(altivec_vmrglw_direct_<mode>_le): New pattern for LE.
>> 	* config/rs6000/rs6000.cc (altivec_expand_vec_perm_const):
>> 	Adjust.
>> 	* config/rs6000/vsx.md: Likewise.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 	PR target/106069
>> 	* g++.target/powerpc/pr106069.C: New test.
>>
>> Signed-off-by: Xionghu Luo <xionghuluo@tencent.com>
>> ---
>>   gcc/config/rs6000/altivec.md                | 222 ++++++++++++++------
>>   gcc/config/rs6000/rs6000.cc                 |  24 +--
>>   gcc/config/rs6000/vsx.md                    |  28 +--
>>   gcc/testsuite/g++.target/powerpc/pr106069.C | 118 +++++++++++
>>   4 files changed, 307 insertions(+), 85 deletions(-)
>>   create mode 100644 gcc/testsuite/g++.target/powerpc/pr106069.C
>>
>> diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
>> index 2c4940f2e21..c6a381908cb 100644
>> --- a/gcc/config/rs6000/altivec.md
>> +++ b/gcc/config/rs6000/altivec.md
>> @@ -1144,15 +1144,16 @@ (define_expand "altivec_vmrghb"
>>      (use (match_operand:V16QI 2 "register_operand"))]
>>     "TARGET_ALTIVEC"
>>   {
>> -  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrghb_direct
>> -						: gen_altivec_vmrglb_direct;
>> -  if (!BYTES_BIG_ENDIAN)
>> -    std::swap (operands[1], operands[2]);
>> -  emit_insn (fun (operands[0], operands[1], operands[2]));
>> +  if (BYTES_BIG_ENDIAN)
>> +    emit_insn (
>> +      gen_altivec_vmrghb_direct_be (operands[0], operands[1], operands[2]));
>> +  else
>> +    emit_insn (
>> +      gen_altivec_vmrglb_direct_le (operands[0], operands[2], operands[1]));
>>     DONE;
>>   })
>>   
>> -(define_insn "altivec_vmrghb_direct"
>> +(define_insn "altivec_vmrghb_direct_be"
>>     [(set (match_operand:V16QI 0 "register_operand" "=v")
>>   	(vec_select:V16QI
>>   	  (vec_concat:V32QI
>> @@ -1166,7 +1167,25 @@ (define_insn "altivec_vmrghb_direct"
>>   		     (const_int 5) (const_int 21)
>>   		     (const_int 6) (const_int 22)
>>   		     (const_int 7) (const_int 23)])))]
>> -  "TARGET_ALTIVEC"
>> +  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
>> +  "vmrghb %0,%1,%2"
>> +  [(set_attr "type" "vecperm")])
>> +
>> +(define_insn "altivec_vmrghb_direct_le"
>> +  [(set (match_operand:V16QI 0 "register_operand" "=v")
>> +	(vec_select:V16QI
>> +	  (vec_concat:V32QI
>> +	    (match_operand:V16QI 2 "register_operand" "v")
>> +	    (match_operand:V16QI 1 "register_operand" "v"))
>> +	  (parallel [(const_int  8) (const_int 24)
>> +		     (const_int  9) (const_int 25)
>> +		     (const_int 10) (const_int 26)
>> +		     (const_int 11) (const_int 27)
>> +		     (const_int 12) (const_int 28)
>> +		     (const_int 13) (const_int 29)
>> +		     (const_int 14) (const_int 30)
>> +		     (const_int 15) (const_int 31)])))]
>> +  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
>>     "vmrghb %0,%1,%2"
>>     [(set_attr "type" "vecperm")])
>>   
>> @@ -1176,17 +1195,18 @@ (define_expand "altivec_vmrghh"
>>      (use (match_operand:V8HI 2 "register_operand"))]
>>     "TARGET_ALTIVEC"
>>   {
>> -  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrghh_direct
>> -						: gen_altivec_vmrglh_direct;
>> -  if (!BYTES_BIG_ENDIAN)
>> -    std::swap (operands[1], operands[2]);
>> -  emit_insn (fun (operands[0], operands[1], operands[2]));
>> +  if (BYTES_BIG_ENDIAN)
>> +    emit_insn (
>> +      gen_altivec_vmrghh_direct_be (operands[0], operands[1], operands[2]));
>> +  else
>> +    emit_insn (
>> +      gen_altivec_vmrglh_direct_le (operands[0], operands[2], operands[1]));
>>     DONE;
>>   })
>>   
>> -(define_insn "altivec_vmrghh_direct"
>> +(define_insn "altivec_vmrghh_direct_be"
>>     [(set (match_operand:V8HI 0 "register_operand" "=v")
>> -        (vec_select:V8HI
>> +	(vec_select:V8HI
>>   	  (vec_concat:V16HI
>>   	    (match_operand:V8HI 1 "register_operand" "v")
>>   	    (match_operand:V8HI 2 "register_operand" "v"))
>> @@ -1194,7 +1214,21 @@ (define_insn "altivec_vmrghh_direct"
>>   		     (const_int 1) (const_int 9)
>>   		     (const_int 2) (const_int 10)
>>   		     (const_int 3) (const_int 11)])))]
>> -  "TARGET_ALTIVEC"
>> +  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
>> +  "vmrghh %0,%1,%2"
>> +  [(set_attr "type" "vecperm")])
>> +
>> +(define_insn "altivec_vmrghh_direct_le"
>> +  [(set (match_operand:V8HI 0 "register_operand" "=v")
>> +        (vec_select:V8HI
>> +	  (vec_concat:V16HI
>> +	    (match_operand:V8HI 2 "register_operand" "v")
>> +	    (match_operand:V8HI 1 "register_operand" "v"))
>> +	  (parallel [(const_int 4) (const_int 12)
>> +		     (const_int 5) (const_int 13)
>> +		     (const_int 6) (const_int 14)
>> +		     (const_int 7) (const_int 15)])))]
>> +  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
>>     "vmrghh %0,%1,%2"
>>     [(set_attr "type" "vecperm")])
>>   
>> @@ -1204,16 +1238,18 @@ (define_expand "altivec_vmrghw"
>>      (use (match_operand:V4SI 2 "register_operand"))]
>>     "VECTOR_MEM_ALTIVEC_P (V4SImode)"
>>   {
>> -  rtx (*fun) (rtx, rtx, rtx);
>> -  fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrghw_direct_v4si
>> -			 : gen_altivec_vmrglw_direct_v4si;
>> -  if (!BYTES_BIG_ENDIAN)
>> -    std::swap (operands[1], operands[2]);
>> -  emit_insn (fun (operands[0], operands[1], operands[2]));
>> +  if (BYTES_BIG_ENDIAN)
>> +    emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0],
>> +						  operands[1],
>> +						  operands[2]));
>> +  else
>> +    emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0],
>> +						  operands[2],
>> +						  operands[1]));
>>     DONE;
>>   })
>>   
>> -(define_insn "altivec_vmrghw_direct_<mode>"
>> +(define_insn "altivec_vmrghw_direct_<mode>_be"
>>     [(set (match_operand:VSX_W 0 "register_operand" "=wa,v")
>>   	(vec_select:VSX_W
>>   	  (vec_concat:<VS_double>
>> @@ -1221,7 +1257,21 @@ (define_insn "altivec_vmrghw_direct_<mode>"
>>   	    (match_operand:VSX_W 2 "register_operand" "wa,v"))
>>   	  (parallel [(const_int 0) (const_int 4)
>>   		     (const_int 1) (const_int 5)])))]
>> -  "TARGET_ALTIVEC"
>> +  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
>> +  "@
>> +   xxmrghw %x0,%x1,%x2
>> +   vmrghw %0,%1,%2"
>> +  [(set_attr "type" "vecperm")])
>> +
>> +(define_insn "altivec_vmrghw_direct_<mode>_le"
>> +  [(set (match_operand:VSX_W 0 "register_operand" "=wa,v")
>> +	(vec_select:VSX_W
>> +	  (vec_concat:<VS_double>
>> +	    (match_operand:VSX_W 2 "register_operand" "wa,v")
>> +	    (match_operand:VSX_W 1 "register_operand" "wa,v"))
>> +	  (parallel [(const_int 2) (const_int 6)
>> +		     (const_int 3) (const_int 7)])))]
>> +  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
>>     "@
>>      xxmrghw %x0,%x1,%x2
>>      vmrghw %0,%1,%2"
>> @@ -1250,15 +1300,16 @@ (define_expand "altivec_vmrglb"
>>      (use (match_operand:V16QI 2 "register_operand"))]
>>     "TARGET_ALTIVEC"
>>   {
>> -  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrglb_direct
>> -						: gen_altivec_vmrghb_direct;
>> -  if (!BYTES_BIG_ENDIAN)
>> -    std::swap (operands[1], operands[2]);
>> -  emit_insn (fun (operands[0], operands[1], operands[2]));
>> +  if (BYTES_BIG_ENDIAN)
>> +    emit_insn (
>> +      gen_altivec_vmrglb_direct_be (operands[0], operands[1], operands[2]));
>> +  else
>> +    emit_insn (
>> +      gen_altivec_vmrghb_direct_le (operands[0], operands[2], operands[1]));
>>     DONE;
>>   })
>>   
>> -(define_insn "altivec_vmrglb_direct"
>> +(define_insn "altivec_vmrglb_direct_be"
>>     [(set (match_operand:V16QI 0 "register_operand" "=v")
>>   	(vec_select:V16QI
>>   	  (vec_concat:V32QI
>> @@ -1272,7 +1323,25 @@ (define_insn "altivec_vmrglb_direct"
>>   		     (const_int 13) (const_int 29)
>>   		     (const_int 14) (const_int 30)
>>   		     (const_int 15) (const_int 31)])))]
>> -  "TARGET_ALTIVEC"
>> +  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
>> +  "vmrglb %0,%1,%2"
>> +  [(set_attr "type" "vecperm")])
>> +
>> +(define_insn "altivec_vmrglb_direct_le"
>> +  [(set (match_operand:V16QI 0 "register_operand" "=v")
>> +	(vec_select:V16QI
>> +	  (vec_concat:V32QI
>> +	    (match_operand:V16QI 2 "register_operand" "v")
>> +	    (match_operand:V16QI 1 "register_operand" "v"))
>> +	  (parallel [(const_int 0) (const_int 16)
>> +		     (const_int 1) (const_int 17)
>> +		     (const_int 2) (const_int 18)
>> +		     (const_int 3) (const_int 19)
>> +		     (const_int 4) (const_int 20)
>> +		     (const_int 5) (const_int 21)
>> +		     (const_int 6) (const_int 22)
>> +		     (const_int 7) (const_int 23)])))]
>> +  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
>>     "vmrglb %0,%1,%2"
>>     [(set_attr "type" "vecperm")])
>>   
>> @@ -1282,15 +1351,16 @@ (define_expand "altivec_vmrglh"
>>      (use (match_operand:V8HI 2 "register_operand"))]
>>     "TARGET_ALTIVEC"
>>   {
>> -  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrglh_direct
>> -						: gen_altivec_vmrghh_direct;
>> -  if (!BYTES_BIG_ENDIAN)
>> -    std::swap (operands[1], operands[2]);
>> -  emit_insn (fun (operands[0], operands[1], operands[2]));
>> +  if (BYTES_BIG_ENDIAN)
>> +    emit_insn (
>> +      gen_altivec_vmrglh_direct_be (operands[0], operands[1], operands[2]));
>> +  else
>> +    emit_insn (
>> +      gen_altivec_vmrghh_direct_le (operands[0], operands[2], operands[1]));
>>     DONE;
>>   })
>>   
>> -(define_insn "altivec_vmrglh_direct"
>> +(define_insn "altivec_vmrglh_direct_be"
>>     [(set (match_operand:V8HI 0 "register_operand" "=v")
>>           (vec_select:V8HI
>>   	  (vec_concat:V16HI
>> @@ -1300,7 +1370,21 @@ (define_insn "altivec_vmrglh_direct"
>>   		     (const_int 5) (const_int 13)
>>   		     (const_int 6) (const_int 14)
>>   		     (const_int 7) (const_int 15)])))]
>> -  "TARGET_ALTIVEC"
>> +  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
>> +  "vmrglh %0,%1,%2"
>> +  [(set_attr "type" "vecperm")])
>> +
>> +(define_insn "altivec_vmrglh_direct_le"
>> +  [(set (match_operand:V8HI 0 "register_operand" "=v")
>> +	(vec_select:V8HI
>> +	  (vec_concat:V16HI
>> +	    (match_operand:V8HI 2 "register_operand" "v")
>> +	    (match_operand:V8HI 1 "register_operand" "v"))
>> +	  (parallel [(const_int 0) (const_int 8)
>> +		     (const_int 1) (const_int 9)
>> +		     (const_int 2) (const_int 10)
>> +		     (const_int 3) (const_int 11)])))]
>> +  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
>>     "vmrglh %0,%1,%2"
>>     [(set_attr "type" "vecperm")])
>>   
>> @@ -1310,16 +1394,18 @@ (define_expand "altivec_vmrglw"
>>      (use (match_operand:V4SI 2 "register_operand"))]
>>     "VECTOR_MEM_ALTIVEC_P (V4SImode)"
>>   {
>> -  rtx (*fun) (rtx, rtx, rtx);
>> -  fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrglw_direct_v4si
>> -			 : gen_altivec_vmrghw_direct_v4si;
>> -  if (!BYTES_BIG_ENDIAN)
>> -    std::swap (operands[1], operands[2]);
>> -  emit_insn (fun (operands[0], operands[1], operands[2]));
>> +  if (BYTES_BIG_ENDIAN)
>> +    emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0],
>> +						  operands[1],
>> +						  operands[2]));
>> +  else
>> +    emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0],
>> +						  operands[2],
>> +						  operands[1]));
>>     DONE;
>>   })
>>   
>> -(define_insn "altivec_vmrglw_direct_<mode>"
>> +(define_insn "altivec_vmrglw_direct_<mode>_be"
>>     [(set (match_operand:VSX_W 0 "register_operand" "=wa,v")
>>   	(vec_select:VSX_W
>>   	  (vec_concat:<VS_double>
>> @@ -1327,7 +1413,21 @@ (define_insn "altivec_vmrglw_direct_<mode>"
>>   	    (match_operand:VSX_W 2 "register_operand" "wa,v"))
>>   	  (parallel [(const_int 2) (const_int 6)
>>   		     (const_int 3) (const_int 7)])))]
>> -  "TARGET_ALTIVEC"
>> +  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
>> +  "@
>> +   xxmrglw %x0,%x1,%x2
>> +   vmrglw %0,%1,%2"
>> +  [(set_attr "type" "vecperm")])
>> +
>> +(define_insn "altivec_vmrglw_direct_<mode>_le"
>> +  [(set (match_operand:VSX_W 0 "register_operand" "=wa,v")
>> +	(vec_select:VSX_W
>> +	  (vec_concat:<VS_double>
>> +	    (match_operand:VSX_W 2 "register_operand" "wa,v")
>> +	    (match_operand:VSX_W 1 "register_operand" "wa,v"))
>> +	  (parallel [(const_int 0) (const_int 4)
>> +		     (const_int 1) (const_int 5)])))]
>> +  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
>>     "@
>>      xxmrglw %x0,%x1,%x2
>>      vmrglw %0,%1,%2"
>> @@ -3699,13 +3799,13 @@ (define_expand "vec_widen_umult_hi_v16qi"
>>       {
>>         emit_insn (gen_altivec_vmuleub (ve, operands[1], operands[2]));
>>         emit_insn (gen_altivec_vmuloub (vo, operands[1], operands[2]));
>> -      emit_insn (gen_altivec_vmrghh_direct (operands[0], ve, vo));
>> +      emit_insn (gen_altivec_vmrghh_direct_be (operands[0], ve, vo));
>>       }
>>     else
>>       {
>>         emit_insn (gen_altivec_vmuloub (ve, operands[1], operands[2]));
>>         emit_insn (gen_altivec_vmuleub (vo, operands[1], operands[2]));
>> -      emit_insn (gen_altivec_vmrghh_direct (operands[0], vo, ve));
>> +      emit_insn (gen_altivec_vmrghh_direct_le (operands[0], vo, ve));
>>       }
>>     DONE;
>>   })
>> @@ -3724,13 +3824,13 @@ (define_expand "vec_widen_umult_lo_v16qi"
>>       {
>>         emit_insn (gen_altivec_vmuleub (ve, operands[1], operands[2]));
>>         emit_insn (gen_altivec_vmuloub (vo, operands[1], operands[2]));
>> -      emit_insn (gen_altivec_vmrglh_direct (operands[0], ve, vo));
>> +      emit_insn (gen_altivec_vmrglh_direct_be (operands[0], ve, vo));
>>       }
>>     else
>>       {
>>         emit_insn (gen_altivec_vmuloub (ve, operands[1], operands[2]));
>>         emit_insn (gen_altivec_vmuleub (vo, operands[1], operands[2]));
>> -      emit_insn (gen_altivec_vmrglh_direct (operands[0], vo, ve));
>> +      emit_insn (gen_altivec_vmrglh_direct_le (operands[0], vo, ve));
>>       }
>>     DONE;
>>   })
>> @@ -3749,13 +3849,13 @@ (define_expand "vec_widen_smult_hi_v16qi"
>>       {
>>         emit_insn (gen_altivec_vmulesb (ve, operands[1], operands[2]));
>>         emit_insn (gen_altivec_vmulosb (vo, operands[1], operands[2]));
>> -      emit_insn (gen_altivec_vmrghh_direct (operands[0], ve, vo));
>> +      emit_insn (gen_altivec_vmrghh_direct_be (operands[0], ve, vo));
>>       }
>>     else
>>       {
>>         emit_insn (gen_altivec_vmulosb (ve, operands[1], operands[2]));
>>         emit_insn (gen_altivec_vmulesb (vo, operands[1], operands[2]));
>> -      emit_insn (gen_altivec_vmrghh_direct (operands[0], vo, ve));
>> +      emit_insn (gen_altivec_vmrghh_direct_le (operands[0], vo, ve));
>>       }
>>     DONE;
>>   })
>> @@ -3774,13 +3874,13 @@ (define_expand "vec_widen_smult_lo_v16qi"
>>       {
>>         emit_insn (gen_altivec_vmulesb (ve, operands[1], operands[2]));
>>         emit_insn (gen_altivec_vmulosb (vo, operands[1], operands[2]));
>> -      emit_insn (gen_altivec_vmrglh_direct (operands[0], ve, vo));
>> +      emit_insn (gen_altivec_vmrglh_direct_be (operands[0], ve, vo));
>>       }
>>     else
>>       {
>>         emit_insn (gen_altivec_vmulosb (ve, operands[1], operands[2]));
>>         emit_insn (gen_altivec_vmulesb (vo, operands[1], operands[2]));
>> -      emit_insn (gen_altivec_vmrglh_direct (operands[0], vo, ve));
>> +      emit_insn (gen_altivec_vmrglh_direct_le (operands[0], vo, ve));
>>       }
>>     DONE;
>>   })
>> @@ -3799,13 +3899,13 @@ (define_expand "vec_widen_umult_hi_v8hi"
>>       {
>>         emit_insn (gen_altivec_vmuleuh (ve, operands[1], operands[2]));
>>         emit_insn (gen_altivec_vmulouh (vo, operands[1], operands[2]));
>> -      emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], ve, vo));
>> +      emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0], ve, vo));
>>       }
>>     else
>>       {
>>         emit_insn (gen_altivec_vmulouh (ve, operands[1], operands[2]));
>>         emit_insn (gen_altivec_vmuleuh (vo, operands[1], operands[2]));
>> -      emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], vo, ve));
>> +      emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0], vo, ve));
>>       }
>>     DONE;
>>   })
>> @@ -3824,13 +3924,13 @@ (define_expand "vec_widen_umult_lo_v8hi"
>>       {
>>         emit_insn (gen_altivec_vmuleuh (ve, operands[1], operands[2]));
>>         emit_insn (gen_altivec_vmulouh (vo, operands[1], operands[2]));
>> -      emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], ve, vo));
>> +      emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0], ve, vo));
>>       }
>>     else
>>       {
>>         emit_insn (gen_altivec_vmulouh (ve, operands[1], operands[2]));
>>         emit_insn (gen_altivec_vmuleuh (vo, operands[1], operands[2]));
>> -      emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], vo, ve));
>> +      emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0], vo, ve));
>>       }
>>     DONE;
>>   })
>> @@ -3849,13 +3949,13 @@ (define_expand "vec_widen_smult_hi_v8hi"
>>       {
>>         emit_insn (gen_altivec_vmulesh (ve, operands[1], operands[2]));
>>         emit_insn (gen_altivec_vmulosh (vo, operands[1], operands[2]));
>> -      emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], ve, vo));
>> +      emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0], ve, vo));
>>       }
>>     else
>>       {
>>         emit_insn (gen_altivec_vmulosh (ve, operands[1], operands[2]));
>>         emit_insn (gen_altivec_vmulesh (vo, operands[1], operands[2]));
>> -      emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], vo, ve));
>> +      emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0], vo, ve));
>>       }
>>     DONE;
>>   })
>> @@ -3874,13 +3974,13 @@ (define_expand "vec_widen_smult_lo_v8hi"
>>       {
>>         emit_insn (gen_altivec_vmulesh (ve, operands[1], operands[2]));
>>         emit_insn (gen_altivec_vmulosh (vo, operands[1], operands[2]));
>> -      emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], ve, vo));
>> +      emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0], ve, vo));
>>       }
>>     else
>>       {
>>         emit_insn (gen_altivec_vmulosh (ve, operands[1], operands[2]));
>>         emit_insn (gen_altivec_vmulesh (vo, operands[1], operands[2]));
>> -      emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], vo, ve));
>> +      emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0], vo, ve));
>>       }
>>     DONE;
>>   })
>> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
>> index df491bee2ea..c6ccd40e089 100644
>> --- a/gcc/config/rs6000/rs6000.cc
>> +++ b/gcc/config/rs6000/rs6000.cc
>> @@ -22942,28 +22942,28 @@ altivec_expand_vec_perm_const (rtx target, rtx op0, rtx op1,
>>        CODE_FOR_altivec_vpkuwum_direct,
>>        {2, 3, 6, 7, 10, 11, 14, 15, 18, 19, 22, 23, 26, 27, 30, 31}},
>>       {OPTION_MASK_ALTIVEC,
>> -     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghb_direct
>> -		      : CODE_FOR_altivec_vmrglb_direct,
>> +     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghb_direct_be
>> +		      : CODE_FOR_altivec_vmrglb_direct_le,
>>        {0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23}},
>>       {OPTION_MASK_ALTIVEC,
>> -     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghh_direct
>> -		      : CODE_FOR_altivec_vmrglh_direct,
>> +     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghh_direct_be
>> +		      : CODE_FOR_altivec_vmrglh_direct_le,
>>        {0, 1, 16, 17, 2, 3, 18, 19, 4, 5, 20, 21, 6, 7, 22, 23}},
>>       {OPTION_MASK_ALTIVEC,
>> -     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghw_direct_v4si
>> -		      : CODE_FOR_altivec_vmrglw_direct_v4si,
>> +     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghw_direct_v4si_be
>> +		      : CODE_FOR_altivec_vmrglw_direct_v4si_le,
>>        {0, 1, 2, 3, 16, 17, 18, 19, 4, 5, 6, 7, 20, 21, 22, 23}},
>>       {OPTION_MASK_ALTIVEC,
>> -     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglb_direct
>> -		      : CODE_FOR_altivec_vmrghb_direct,
>> +     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglb_direct_be
>> +		      : CODE_FOR_altivec_vmrghb_direct_le,
>>        {8, 24, 9, 25, 10, 26, 11, 27, 12, 28, 13, 29, 14, 30, 15, 31}},
>>       {OPTION_MASK_ALTIVEC,
>> -     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglh_direct
>> -		      : CODE_FOR_altivec_vmrghh_direct,
>> +     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglh_direct_be
>> +		      : CODE_FOR_altivec_vmrghh_direct_le,
>>        {8, 9, 24, 25, 10, 11, 26, 27, 12, 13, 28, 29, 14, 15, 30, 31}},
>>       {OPTION_MASK_ALTIVEC,
>> -     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglw_direct_v4si
>> -		      : CODE_FOR_altivec_vmrghw_direct_v4si,
>> +     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglw_direct_v4si_be
>> +		      : CODE_FOR_altivec_vmrghw_direct_v4si_le,
>>        {8, 9, 10, 11, 24, 25, 26, 27, 12, 13, 14, 15, 28, 29, 30, 31}},
>>       {OPTION_MASK_P8_VECTOR,
>>        BYTES_BIG_ENDIAN ? CODE_FOR_p8_vmrgew_v4sf_direct
>> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
>> index e226a93bbe5..80f84e9b141 100644
>> --- a/gcc/config/rs6000/vsx.md
>> +++ b/gcc/config/rs6000/vsx.md
>> @@ -4688,12 +4688,14 @@ (define_expand "vsx_xxmrghw_<mode>"
>>   		     (const_int 1) (const_int 5)])))]
>>     "VECTOR_MEM_VSX_P (<MODE>mode)"
>>   {
>> -  rtx (*fun) (rtx, rtx, rtx);
>> -  fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrghw_direct_<mode>
>> -			 : gen_altivec_vmrglw_direct_<mode>;
>> -  if (!BYTES_BIG_ENDIAN)
>> -    std::swap (operands[1], operands[2]);
>> -  emit_insn (fun (operands[0], operands[1], operands[2]));
>> +  if (BYTES_BIG_ENDIAN)
>> +    emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0],
>> +						  operands[1],
>> +						  operands[2]));
>> +  else
>> +    emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0],
>> +						  operands[2],
>> +						  operands[1]));
>>     DONE;
>>   }
>>     [(set_attr "type" "vecperm")])
>> @@ -4708,12 +4710,14 @@ (define_expand "vsx_xxmrglw_<mode>"
>>   		     (const_int 3) (const_int 7)])))]
>>     "VECTOR_MEM_VSX_P (<MODE>mode)"
>>   {
>> -  rtx (*fun) (rtx, rtx, rtx);
>> -  fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrglw_direct_<mode>
>> -			 : gen_altivec_vmrghw_direct_<mode>;
>> -  if (!BYTES_BIG_ENDIAN)
>> -    std::swap (operands[1], operands[2]);
>> -  emit_insn (fun (operands[0], operands[1], operands[2]));
>> +  if (BYTES_BIG_ENDIAN)
>> +    emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0],
>> +						  operands[1],
>> +						  operands[2]));
>> +  else
>> +    emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0],
>> +						  operands[2],
>> +						  operands[1]));
>>     DONE;
>>   }
>>     [(set_attr "type" "vecperm")])
>> diff --git a/gcc/testsuite/g++.target/powerpc/pr106069.C b/gcc/testsuite/g++.target/powerpc/pr106069.C
>> new file mode 100644
>> index 00000000000..c89739ecb55
>> --- /dev/null
>> +++ b/gcc/testsuite/g++.target/powerpc/pr106069.C
>> @@ -0,0 +1,118 @@
>> +/* { dg-options "-O -fno-tree-forwprop -maltivec" } */
>> +/* { dg-require-effective-target vmx_hw } */
>> +/* { dg-do run } */
>> +
>> +typedef __attribute__ ((altivec (vector__))) unsigned native_simd_type;
>> +
>> +union
>> +{
>> +  native_simd_type V;
>> +  int R[4];
>> +} store_le_vec;
>> +
>> +struct S
>> +{
>> +  S () = default;
>> +  S (unsigned B0)
>> +  {
>> +    native_simd_type val{B0};
>> +    m_simd = val;
>> +  }
>> +  void store_le (unsigned int out[])
>> +  {
>> +    store_le_vec.V = m_simd;
>> +    unsigned int x0 = store_le_vec.R[0];
>> +    __builtin_memcpy (out, &x0, 4);
>> +  }
>> +  S rotl (unsigned int r)
>> +  {
>> +    native_simd_type rot{r};
>> +    return __builtin_vec_rl (m_simd, rot);
>> +  }
>> +  void operator+= (S other)
>> +  {
>> +    m_simd = __builtin_vec_add (m_simd, other.m_simd);
>> +  }
>> +  void operator^= (S other)
>> +  {
>> +    m_simd = __builtin_vec_xor (m_simd, other.m_simd);
>> +  }
>> +  static void transpose (S &B0, S B1, S B2, S B3)
>> +  {
>> +    native_simd_type T0 = __builtin_vec_mergeh (B0.m_simd, B2.m_simd);
>> +    native_simd_type T1 = __builtin_vec_mergeh (B1.m_simd, B3.m_simd);
>> +    native_simd_type T2 = __builtin_vec_mergel (B0.m_simd, B2.m_simd);
>> +    native_simd_type T3 = __builtin_vec_mergel (B1.m_simd, B3.m_simd);
>> +    B0 = __builtin_vec_mergeh (T0, T1);
>> +    B3 = __builtin_vec_mergel (T2, T3);
>> +  }
>> +  S (native_simd_type x) : m_simd (x) {}
>> +  native_simd_type m_simd;
>> +};
>> +
>> +void
>> +foo (unsigned int output[], unsigned state[])
>> +{
>> +  S R00 = state[0];
>> +  S R01 = state[0];
>> +  S R02 = state[2];
>> +  S R03 = state[0];
>> +  S R05 = state[5];
>> +  S R06 = state[6];
>> +  S R07 = state[7];
>> +  S R08 = state[8];
>> +  S R09 = state[9];
>> +  S R10 = state[10];
>> +  S R11 = state[11];
>> +  S R12 = state[12];
>> +  S R13 = state[13];
>> +  S R14 = state[4];
>> +  S R15 = state[15];
>> +  for (int r = 0; r != 10; ++r)
>> +    {
>> +      R09 += R13;
>> +      R11 += R15;
>> +      R05 ^= R09;
>> +      R06 ^= R10;
>> +      R07 ^= R11;
>> +      R07 = R07.rotl (7);
>> +      R00 += R05;
>> +      R01 += R06;
>> +      R02 += R07;
>> +      R15 ^= R00;
>> +      R12 ^= R01;
>> +      R13 ^= R02;
>> +      R00 += R05;
>> +      R01 += R06;
>> +      R02 += R07;
>> +      R15 ^= R00;
>> +      R12 = R12.rotl (8);
>> +      R13 = R13.rotl (8);
>> +      R10 += R15;
>> +      R11 += R12;
>> +      R08 += R13;
>> +      R09 += R14;
>> +      R05 ^= R10;
>> +      R06 ^= R11;
>> +      R07 ^= R08;
>> +      R05 = R05.rotl (7);
>> +      R06 = R06.rotl (7);
>> +      R07 = R07.rotl (7);
>> +    }
>> +  R00 += state[0];
>> +  S::transpose (R00, R01, R02, R03);
>> +  R00.store_le (output);
>> +}
>> +
>> +unsigned int res[1];
>> +unsigned main_state[]{1634760805, 60878,      2036477234, 6,
>> +		      0,	  825562964,  1471091955, 1346092787,
>> +		      506976774,  4197066702, 518848283,  118491664,
>> +		      0,	  0,	      0,	  0};
>> +int
>> +main ()
>> +{
>> +  foo (res, main_state);
>> +  if (res[0] != 0x41fcef98)
>> +    __builtin_abort ();
>> +}
>> -- 2.27.0
>>
>> Attachment:
>>
>> v4-0001-rs6000-Fix-incorrect-RTL-for-Power-LE-when-removi.patch	25.1 K
>>
> 
> 
> BR,
> Kewen
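
The vec_widen_*mult_{hi,lo}_v8hi expanders in the hunks above compute the even-lane and odd-lane products separately and then interleave them with a word merge. Below is a minimal C++ sketch of the big-endian path. It is a scalar model under stated assumptions, not GCC code: the V8HI/V4SI aliases and every function name are hypothetical, and lanes are numbered in big-endian (instruction) order. On little-endian, the patch instead puts the vmulouh results in ve and the vmuleuh results in vo, then passes them to the *_le merge pattern in the opposite order. That path is not modeled here.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical scalar model of 128-bit AltiVec registers.  Assumption: lane 0
// is the leftmost (most significant) lane, i.e. big-endian element order.
using V8HI = std::array<uint16_t, 8>;
using V4SI = std::array<uint32_t, 4>;

// vmuleuh: unsigned 16x16->32 multiply of the even-numbered halfwords.
V4SI vmuleuh (const V8HI &a, const V8HI &b)
{
  V4SI r{};
  for (int i = 0; i < 4; ++i)
    r[i] = uint32_t{a[2 * i]} * b[2 * i];
  return r;
}

// vmulouh: the same for the odd-numbered halfwords.
V4SI vmulouh (const V8HI &a, const V8HI &b)
{
  V4SI r{};
  for (int i = 0; i < 4; ++i)
    r[i] = uint32_t{a[2 * i + 1]} * b[2 * i + 1];
  return r;
}

// vmrghw / vmrglw: interleave the first (high) or last (low) two words.
V4SI vmrghw (const V4SI &a, const V4SI &b) { return {a[0], b[0], a[1], b[1]}; }
V4SI vmrglw (const V4SI &a, const V4SI &b) { return {a[2], b[2], a[3], b[3]}; }

// BE path of vec_widen_umult_{hi,lo}_v8hi: ve holds the even products,
// vo the odd ones, and the word merge restores lane order.
V4SI widen_umult_hi_be (const V8HI &a, const V8HI &b)
{
  V4SI ve = vmuleuh (a, b), vo = vmulouh (a, b);
  return vmrghw (ve, vo);
}

V4SI widen_umult_lo_be (const V8HI &a, const V8HI &b)
{
  V4SI ve = vmuleuh (a, b), vo = vmulouh (a, b);
  return vmrglw (ve, vo);
}

// Reference result: widen and multiply lanes [first, first + 4) directly.
V4SI widen_umult_ref (const V8HI &a, const V8HI &b, int first)
{
  V4SI r{};
  for (int i = 0; i < 4; ++i)
    r[i] = uint32_t{a[first + i]} * b[first + i];
  return r;
}

// For one input, the even/odd + merge sequence must match the reference.
void self_check ()
{
  const V8HI a{1, 2, 3, 4, 5, 6, 7, 65535};
  const V8HI b{10, 20, 30, 40, 50, 60, 70, 65535};
  assert (widen_umult_hi_be (a, b) == widen_umult_ref (a, b, 0));
  assert (widen_umult_lo_be (a, b) == widen_umult_ref (a, b, 4));
}
```

Under these assumptions, merging the even and odd products with vmrghw gives the widened products of lanes 0 to 3, and vmrglw gives lanes 4 to 7.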

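On little-endian, the table in altivec_expand_vec_perm_const above pairs a merge-high selector with a merge-low pattern (and vice versa), and vsx_xxmrghw_<mode> passes operands[2] before operands[1]. The C++ sketch below models lane order only, to show why. It assumes that on LE the program's element i lives in register lane N - 1 - i. All names are hypothetical, and the model says nothing about the RTL that the _be/_le patterns emit.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model: only lane order matters, so every lane is a uint32_t.
template <std::size_t N>
using Vec = std::array<uint32_t, N>;

// vmrgh*/vmrgl* in register (big-endian) lane order: interleave N/2 lanes of
// a and b starting at lane `first` (0 for merge high, N/2 for merge low).
template <std::size_t N>
Vec<N> interleave (const Vec<N> &a, const Vec<N> &b, std::size_t first)
{
  Vec<N> r{};
  for (std::size_t i = 0; i < N / 2; ++i)
    {
      r[2 * i] = a[first + i];
      r[2 * i + 1] = b[first + i];
    }
  return r;
}

template <std::size_t N>
Vec<N> merge_high (const Vec<N> &a, const Vec<N> &b) { return interleave (a, b, 0); }

template <std::size_t N>
Vec<N> merge_low (const Vec<N> &a, const Vec<N> &b) { return interleave (a, b, N / 2); }

// Assumed little-endian mapping: program element i sits in register lane
// N - 1 - i.  flip converts between the two orders (it is its own inverse).
template <std::size_t N>
Vec<N> flip (const Vec<N> &v)
{
  Vec<N> r{};
  for (std::size_t i = 0; i < N; ++i)
    r[i] = v[N - 1 - i];
  return r;
}

// Constant permute in program element order: result lane i is lane sel[i]
// of the 2N-lane concatenation of op0 and op1.
template <std::size_t N>
Vec<N> perm_const (const Vec<N> &op0, const Vec<N> &op1,
                   const std::array<std::size_t, N> &sel)
{
  Vec<N> r{};
  for (std::size_t i = 0; i < N; ++i)
    r[i] = sel[i] < N ? op0[sel[i]] : op1[sel[i] - N];
  return r;
}

// Little-endian: flip the inputs into register order, run the instruction
// with the operands swapped, then flip the result back into program order.
template <std::size_t N>
Vec<N> le_merge_low_swapped (const Vec<N> &a, const Vec<N> &b)
{
  return flip (merge_low (flip (b), flip (a)));
}

template <std::size_t N>
Vec<N> le_merge_high_swapped (const Vec<N> &a, const Vec<N> &b)
{
  return flip (merge_high (flip (b), flip (a)));
}

void self_check ()
{
  Vec<16> a{}, b{};
  for (std::size_t i = 0; i < 16; ++i)
    {
      a[i] = uint32_t (0xa00 + i);
      b[i] = uint32_t (0xb00 + i);
    }
  // Byte selectors of the first and fourth vmrg entries in the table.
  const std::array<std::size_t, 16> hi{0, 16, 1, 17, 2, 18, 3, 19,
                                       4, 20, 5, 21, 6, 22, 7, 23};
  const std::array<std::size_t, 16> lo{8, 24, 9, 25, 10, 26, 11, 27,
                                       12, 28, 13, 29, 14, 30, 15, 31};
  // BE: program order is register order, so each selector is a plain merge.
  assert (perm_const (a, b, hi) == merge_high (a, b));
  assert (perm_const (a, b, lo) == merge_low (a, b));
  // LE: the same selector needs the opposite merge with swapped inputs.
  assert (perm_const (a, b, hi) == le_merge_low_swapped (a, b));
  assert (perm_const (a, b, lo) == le_merge_high_swapped (a, b));
}
```

In this model, each selector is a plain merge on BE. On LE, the same selector needs the opposite merge with the two inputs swapped, which matches the operand order used by the little-endian branches of the vsx.md hunk.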

* Re: Ping: [PATCH v4] rs6000: Fix incorrect RTL for Power LE when removing the UNSPECS [PR106069]
  2023-02-09  2:15                 ` Xionghu Luo
@ 2023-02-09 15:52                   ` Segher Boessenkool
  0 siblings, 0 replies; 12+ messages in thread
From: Segher Boessenkool @ 2023-02-09 15:52 UTC (permalink / raw)
  To: Xionghu Luo
  Cc: Kewen.Lin, Xionghu Luo, gcc-patches, David Edelsohn, Jakub Jelinek

On Thu, Feb 09, 2023 at 10:15:22AM +0800, Xionghu Luo wrote:
> Thanks Kewen!
> Ping this again @Segher.
> Maybe we could also merge this patch if there are no objections from Segher,
> as it has already had several reviews and tests...

Please send the patch as the head of its own thread, not as a reply deep
in a thread of an older version?


Segher


Thread overview: 12+ messages
2022-08-08  3:42 [PATCH] rs6000: Fix incorrect RTL for Power LE when removing the UNSPECS [PR106069] Xionghu Luo
2022-08-09  3:01 ` Kewen.Lin
2022-08-09 22:03   ` Segher Boessenkool
2022-08-10  6:39   ` [PATCH v2] " Xionghu Luo
2022-08-10 17:07     ` Segher Boessenkool
2022-08-11  6:15       ` Xionghu Luo
2022-08-16  6:53         ` Kewen.Lin
2022-08-17  6:23           ` [PATCH v4] " Xionghu Luo
2022-08-24  1:24             ` Ping: " Xionghu Luo
2023-01-18  9:11               ` Kewen.Lin
2023-02-09  2:15                 ` Xionghu Luo
2023-02-09 15:52                   ` Segher Boessenkool
