* [PATCH v4] rs6000: Fix incorrect RTL for Power LE when removing the UNSPECS [PR106069]
From: Xionghu Luo @ 2023-02-10  2:59 UTC
To: gcc-patches; +Cc: segher, dje.gcc, wschmidt, guojiufu, linkw, Xionghu Luo

Resend this patch...

v4: Update per comments.
v3: Rename altivec_vmrghb_direct_le to altivec_vmrglb_direct_le to match
    the actual output ASM vmrglb.  Likewise for all similar xxx_direct_le
    patterns.
v2: Split the direct pattern into be and le variants with the same RTL but
    different insns.

The native RTL expression for vec_mrghw should be the same for BE and LE,
as it is register- and endian-independent.  So both BE and LE need to
generate exactly the same RTL with index [0 4 1 5] when expanding vec_mrghw
with vec_select and vec_concat.

(set (reg:V4SI 141) (vec_select:V4SI (vec_concat:V8SI
           (subreg:V4SI (reg:V16QI 139) 0)
           (subreg:V4SI (reg:V16QI 140) 0))
           [const_int 0 4 1 5]))

Then the combine pass can do the nested vec_select optimization
in simplify-rtx.cc:simplify_binary_operation_1 on both BE and LE:

21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel [0 4 1 5])
24: {r151:SI=vec_select(r150:V4SI,parallel [const_int 3]);}

=>

21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel [0 4 1 5])
24: {r151:SI=vec_select(r146:V4SI,parallel [const_int 1]);}

The endianness check is then needed only once, at final ASM generation.
The ASM is better because the nested vec_select is simplified to a simple
scalar load.

Regression tested on Power8{LE,BE}{32,64} and Power{9,10}LE{32,64}
Linux.

gcc/ChangeLog:

        PR target/106069
        * config/rs6000/altivec.md (altivec_vmrghb_direct): Remove.
        (altivec_vmrghb_direct_be): New pattern for BE.
        (altivec_vmrghb_direct_le): New pattern for LE.
        (altivec_vmrghh_direct): Remove.
        (altivec_vmrghh_direct_be): New pattern for BE.
        (altivec_vmrghh_direct_le): New pattern for LE.
        (altivec_vmrghw_direct_<mode>): Remove.
        (altivec_vmrghw_direct_<mode>_be): New pattern for BE.
        (altivec_vmrghw_direct_<mode>_le): New pattern for LE.
        (altivec_vmrglb_direct): Remove.
        (altivec_vmrglb_direct_be): New pattern for BE.
        (altivec_vmrglb_direct_le): New pattern for LE.
        (altivec_vmrglh_direct): Remove.
        (altivec_vmrglh_direct_be): New pattern for BE.
        (altivec_vmrglh_direct_le): New pattern for LE.
        (altivec_vmrglw_direct_<mode>): Remove.
        (altivec_vmrglw_direct_<mode>_be): New pattern for BE.
        (altivec_vmrglw_direct_<mode>_le): New pattern for LE.
        * config/rs6000/rs6000.cc (altivec_expand_vec_perm_const):
        Adjust.
        * config/rs6000/vsx.md: Likewise.

gcc/testsuite/ChangeLog:

        PR target/106069
        * g++.target/powerpc/pr106069.C: New test.

Signed-off-by: Xionghu Luo <xionghuluo@tencent.com>
---
 gcc/config/rs6000/altivec.md                | 222 ++++++++++++++------
 gcc/config/rs6000/rs6000.cc                 |  24 +--
 gcc/config/rs6000/vsx.md                    |  28 +--
 gcc/testsuite/g++.target/powerpc/pr106069.C | 118 +++++++++++
 4 files changed, 307 insertions(+), 85 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/powerpc/pr106069.C

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 30606b8ab21..4bfeecec224 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -1144,15 +1144,16 @@ (define_expand "altivec_vmrghb"
    (use (match_operand:V16QI 2 "register_operand"))]
   "TARGET_ALTIVEC"
 {
-  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrghb_direct
-                                                : gen_altivec_vmrglb_direct;
-  if (!BYTES_BIG_ENDIAN)
-    std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (
+      gen_altivec_vmrghb_direct_be (operands[0], operands[1], operands[2]));
+  else
+    emit_insn (
+      gen_altivec_vmrglb_direct_le (operands[0], operands[2], operands[1]));
   DONE;
 })
 
-(define_insn "altivec_vmrghb_direct"
+(define_insn "altivec_vmrghb_direct_be"
   [(set (match_operand:V16QI 0 "register_operand" "=v")
         (vec_select:V16QI
           (vec_concat:V32QI
@@ -1166,7 +1167,25 @@ (define_insn "altivec_vmrghb_direct"
                      (const_int 5) (const_int 21)
                      (const_int 6) (const_int 22)
                      (const_int 7) (const_int 23)])))]
-  "TARGET_ALTIVEC"
+  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
+  "vmrghb %0,%1,%2"
+  [(set_attr "type" "vecperm")])
+
+(define_insn "altivec_vmrghb_direct_le"
+  [(set (match_operand:V16QI 0 "register_operand" "=v")
+        (vec_select:V16QI
+          (vec_concat:V32QI
+            (match_operand:V16QI 2 "register_operand" "v")
+            (match_operand:V16QI 1 "register_operand" "v"))
+          (parallel [(const_int 8) (const_int 24)
+                     (const_int 9) (const_int 25)
+                     (const_int 10) (const_int 26)
+                     (const_int 11) (const_int 27)
+                     (const_int 12) (const_int 28)
+                     (const_int 13) (const_int 29)
+                     (const_int 14) (const_int 30)
+                     (const_int 15) (const_int 31)])))]
+  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
   "vmrghb %0,%1,%2"
   [(set_attr "type" "vecperm")])
 
@@ -1176,17 +1195,18 @@ (define_expand "altivec_vmrghh"
    (use (match_operand:V8HI 2 "register_operand"))]
   "TARGET_ALTIVEC"
 {
-  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrghh_direct
-                                                : gen_altivec_vmrglh_direct;
-  if (!BYTES_BIG_ENDIAN)
-    std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (
+      gen_altivec_vmrghh_direct_be (operands[0], operands[1], operands[2]));
+  else
+    emit_insn (
+      gen_altivec_vmrglh_direct_le (operands[0], operands[2], operands[1]));
   DONE;
 })
 
-(define_insn "altivec_vmrghh_direct"
+(define_insn "altivec_vmrghh_direct_be"
   [(set (match_operand:V8HI 0 "register_operand" "=v")
-        (vec_select:V8HI
+	(vec_select:V8HI
           (vec_concat:V16HI
             (match_operand:V8HI 1 "register_operand" "v")
             (match_operand:V8HI 2 "register_operand" "v"))
@@ -1194,7 +1214,21 @@ (define_insn "altivec_vmrghh_direct"
                      (const_int 1) (const_int 9)
                      (const_int 2) (const_int 10)
                      (const_int 3) (const_int 11)])))]
-  "TARGET_ALTIVEC"
+  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
+  "vmrghh %0,%1,%2"
+  [(set_attr "type" "vecperm")])
+
+(define_insn "altivec_vmrghh_direct_le"
+  [(set (match_operand:V8HI 0 "register_operand" "=v")
+        (vec_select:V8HI
+          (vec_concat:V16HI
+            (match_operand:V8HI 2 "register_operand" "v")
+            (match_operand:V8HI 1 "register_operand" "v"))
+          (parallel [(const_int 4) (const_int 12)
+                     (const_int 5) (const_int 13)
+                     (const_int 6) (const_int 14)
+                     (const_int 7) (const_int 15)])))]
+  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
   "vmrghh %0,%1,%2"
   [(set_attr "type" "vecperm")])
 
@@ -1204,16 +1238,18 @@ (define_expand "altivec_vmrghw"
    (use (match_operand:V4SI 2 "register_operand"))]
   "VECTOR_MEM_ALTIVEC_P (V4SImode)"
 {
-  rtx (*fun) (rtx, rtx, rtx);
-  fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrghw_direct_v4si
-                         : gen_altivec_vmrglw_direct_v4si;
-  if (!BYTES_BIG_ENDIAN)
-    std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0],
+                                                  operands[1],
+                                                  operands[2]));
+  else
+    emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0],
+                                                  operands[2],
+                                                  operands[1]));
   DONE;
 })
 
-(define_insn "altivec_vmrghw_direct_<mode>"
+(define_insn "altivec_vmrghw_direct_<mode>_be"
   [(set (match_operand:VSX_W 0 "register_operand" "=wa,v")
         (vec_select:VSX_W
           (vec_concat:<VS_double>
@@ -1221,7 +1257,21 @@ (define_insn "altivec_vmrghw_direct_<mode>"
             (match_operand:VSX_W 2 "register_operand" "wa,v"))
           (parallel [(const_int 0) (const_int 4)
                      (const_int 1) (const_int 5)])))]
-  "TARGET_ALTIVEC"
+  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
+  "@
+   xxmrghw %x0,%x1,%x2
+   vmrghw %0,%1,%2"
+  [(set_attr "type" "vecperm")])
+
+(define_insn "altivec_vmrghw_direct_<mode>_le"
+  [(set (match_operand:VSX_W 0 "register_operand" "=wa,v")
+        (vec_select:VSX_W
+          (vec_concat:<VS_double>
+            (match_operand:VSX_W 2 "register_operand" "wa,v")
+            (match_operand:VSX_W 1 "register_operand" "wa,v"))
+          (parallel [(const_int 2) (const_int 6)
+                     (const_int 3) (const_int 7)])))]
+  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
   "@
    xxmrghw %x0,%x1,%x2
    vmrghw %0,%1,%2"
@@ -1250,15 +1300,16 @@ (define_expand "altivec_vmrglb"
    (use (match_operand:V16QI 2 "register_operand"))]
   "TARGET_ALTIVEC"
 {
-  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrglb_direct
-                                                : gen_altivec_vmrghb_direct;
-  if (!BYTES_BIG_ENDIAN)
-    std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (
+      gen_altivec_vmrglb_direct_be (operands[0], operands[1], operands[2]));
+  else
+    emit_insn (
+      gen_altivec_vmrghb_direct_le (operands[0], operands[2], operands[1]));
   DONE;
 })
 
-(define_insn "altivec_vmrglb_direct"
+(define_insn "altivec_vmrglb_direct_be"
   [(set (match_operand:V16QI 0 "register_operand" "=v")
         (vec_select:V16QI
           (vec_concat:V32QI
@@ -1272,7 +1323,25 @@ (define_insn "altivec_vmrglb_direct"
                      (const_int 13) (const_int 29)
                      (const_int 14) (const_int 30)
                      (const_int 15) (const_int 31)])))]
-  "TARGET_ALTIVEC"
+  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
+  "vmrglb %0,%1,%2"
+  [(set_attr "type" "vecperm")])
+
+(define_insn "altivec_vmrglb_direct_le"
+  [(set (match_operand:V16QI 0 "register_operand" "=v")
+        (vec_select:V16QI
+          (vec_concat:V32QI
+            (match_operand:V16QI 2 "register_operand" "v")
+            (match_operand:V16QI 1 "register_operand" "v"))
+          (parallel [(const_int 0) (const_int 16)
+                     (const_int 1) (const_int 17)
+                     (const_int 2) (const_int 18)
+                     (const_int 3) (const_int 19)
+                     (const_int 4) (const_int 20)
+                     (const_int 5) (const_int 21)
+                     (const_int 6) (const_int 22)
+                     (const_int 7) (const_int 23)])))]
+  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
   "vmrglb %0,%1,%2"
   [(set_attr "type" "vecperm")])
 
@@ -1282,15 +1351,16 @@ (define_expand "altivec_vmrglh"
    (use (match_operand:V8HI 2 "register_operand"))]
   "TARGET_ALTIVEC"
 {
-  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrglh_direct
-                                                : gen_altivec_vmrghh_direct;
-  if (!BYTES_BIG_ENDIAN)
-    std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (
+      gen_altivec_vmrglh_direct_be (operands[0], operands[1], operands[2]));
+  else
+    emit_insn (
+      gen_altivec_vmrghh_direct_le (operands[0], operands[2], operands[1]));
   DONE;
 })
 
-(define_insn "altivec_vmrglh_direct"
+(define_insn "altivec_vmrglh_direct_be"
   [(set (match_operand:V8HI 0 "register_operand" "=v")
         (vec_select:V8HI
           (vec_concat:V16HI
@@ -1300,7 +1370,21 @@ (define_insn "altivec_vmrglh_direct"
                      (const_int 5) (const_int 13)
                      (const_int 6) (const_int 14)
                      (const_int 7) (const_int 15)])))]
-  "TARGET_ALTIVEC"
+  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
+  "vmrglh %0,%1,%2"
+  [(set_attr "type" "vecperm")])
+
+(define_insn "altivec_vmrglh_direct_le"
+  [(set (match_operand:V8HI 0 "register_operand" "=v")
+        (vec_select:V8HI
+          (vec_concat:V16HI
+            (match_operand:V8HI 2 "register_operand" "v")
+            (match_operand:V8HI 1 "register_operand" "v"))
+          (parallel [(const_int 0) (const_int 8)
+                     (const_int 1) (const_int 9)
+                     (const_int 2) (const_int 10)
+                     (const_int 3) (const_int 11)])))]
+  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
   "vmrglh %0,%1,%2"
   [(set_attr "type" "vecperm")])
 
@@ -1310,16 +1394,18 @@ (define_expand "altivec_vmrglw"
    (use (match_operand:V4SI 2 "register_operand"))]
   "VECTOR_MEM_ALTIVEC_P (V4SImode)"
 {
-  rtx (*fun) (rtx, rtx, rtx);
-  fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrglw_direct_v4si
-                         : gen_altivec_vmrghw_direct_v4si;
-  if (!BYTES_BIG_ENDIAN)
-    std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0],
+                                                  operands[1],
+                                                  operands[2]));
+  else
+    emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0],
+                                                  operands[2],
+                                                  operands[1]));
   DONE;
 })
 
-(define_insn "altivec_vmrglw_direct_<mode>"
+(define_insn "altivec_vmrglw_direct_<mode>_be"
   [(set (match_operand:VSX_W 0 "register_operand" "=wa,v")
         (vec_select:VSX_W
           (vec_concat:<VS_double>
@@ -1327,7 +1413,21 @@ (define_insn "altivec_vmrglw_direct_<mode>"
             (match_operand:VSX_W 2 "register_operand" "wa,v"))
           (parallel [(const_int 2) (const_int 6)
                      (const_int 3) (const_int 7)])))]
-  "TARGET_ALTIVEC"
+  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
+  "@
+   xxmrglw %x0,%x1,%x2
+   vmrglw %0,%1,%2"
+  [(set_attr "type" "vecperm")])
+
+(define_insn "altivec_vmrglw_direct_<mode>_le"
+  [(set (match_operand:VSX_W 0 "register_operand" "=wa,v")
+        (vec_select:VSX_W
+          (vec_concat:<VS_double>
+            (match_operand:VSX_W 2 "register_operand" "wa,v")
+            (match_operand:VSX_W 1 "register_operand" "wa,v"))
+          (parallel [(const_int 0) (const_int 4)
+                     (const_int 1) (const_int 5)])))]
+  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
   "@
    xxmrglw %x0,%x1,%x2
    vmrglw %0,%1,%2"
@@ -3699,13 +3799,13 @@ (define_expand "vec_widen_umult_hi_v16qi"
     {
       emit_insn (gen_altivec_vmuleub (ve, operands[1], operands[2]));
       emit_insn (gen_altivec_vmuloub (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrghh_direct (operands[0], ve, vo));
+      emit_insn (gen_altivec_vmrghh_direct_be (operands[0], ve, vo));
     }
   else
     {
       emit_insn (gen_altivec_vmuloub (ve, operands[1], operands[2]));
       emit_insn (gen_altivec_vmuleub (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrghh_direct (operands[0], vo, ve));
+      emit_insn (gen_altivec_vmrghh_direct_le (operands[0], vo, ve));
     }
   DONE;
 })
@@ -3724,13 +3824,13 @@ (define_expand "vec_widen_umult_lo_v16qi"
     {
       emit_insn (gen_altivec_vmuleub (ve, operands[1], operands[2]));
       emit_insn (gen_altivec_vmuloub (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrglh_direct (operands[0], ve, vo));
+      emit_insn (gen_altivec_vmrglh_direct_be (operands[0], ve, vo));
     }
   else
     {
       emit_insn (gen_altivec_vmuloub (ve, operands[1], operands[2]));
       emit_insn (gen_altivec_vmuleub (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrglh_direct (operands[0], vo, ve));
+      emit_insn (gen_altivec_vmrglh_direct_le (operands[0], vo, ve));
     }
   DONE;
 })
@@ -3749,13 +3849,13 @@ (define_expand "vec_widen_smult_hi_v16qi"
     {
       emit_insn (gen_altivec_vmulesb (ve, operands[1], operands[2]));
       emit_insn (gen_altivec_vmulosb (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrghh_direct (operands[0], ve, vo));
+      emit_insn (gen_altivec_vmrghh_direct_be (operands[0], ve, vo));
     }
   else
     {
       emit_insn (gen_altivec_vmulosb (ve, operands[1], operands[2]));
       emit_insn (gen_altivec_vmulesb (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrghh_direct (operands[0], vo, ve));
+      emit_insn (gen_altivec_vmrghh_direct_le (operands[0], vo, ve));
     }
   DONE;
 })
@@ -3774,13 +3874,13 @@ (define_expand "vec_widen_smult_lo_v16qi"
     {
       emit_insn (gen_altivec_vmulesb (ve, operands[1], operands[2]));
       emit_insn (gen_altivec_vmulosb (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrglh_direct (operands[0], ve, vo));
+      emit_insn (gen_altivec_vmrglh_direct_be (operands[0], ve, vo));
     }
   else
     {
       emit_insn (gen_altivec_vmulosb (ve, operands[1], operands[2]));
       emit_insn (gen_altivec_vmulesb (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrglh_direct (operands[0], vo, ve));
+      emit_insn (gen_altivec_vmrglh_direct_le (operands[0], vo, ve));
     }
   DONE;
 })
@@ -3799,13 +3899,13 @@ (define_expand "vec_widen_umult_hi_v8hi"
     {
       emit_insn (gen_altivec_vmuleuh (ve, operands[1], operands[2]));
       emit_insn (gen_altivec_vmulouh (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], ve, vo));
+      emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0], ve, vo));
     }
   else
     {
       emit_insn (gen_altivec_vmulouh (ve, operands[1], operands[2]));
       emit_insn (gen_altivec_vmuleuh (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], vo, ve));
+      emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0], vo, ve));
     }
   DONE;
 })
@@ -3824,13 +3924,13 @@ (define_expand "vec_widen_umult_lo_v8hi"
     {
       emit_insn (gen_altivec_vmuleuh (ve, operands[1], operands[2]));
       emit_insn (gen_altivec_vmulouh (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], ve, vo));
+      emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0], ve, vo));
     }
   else
     {
       emit_insn (gen_altivec_vmulouh (ve, operands[1], operands[2]));
       emit_insn (gen_altivec_vmuleuh (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], vo, ve));
+      emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0], vo, ve));
     }
   DONE;
 })
@@ -3849,13 +3949,13 @@ (define_expand "vec_widen_smult_hi_v8hi"
     {
       emit_insn (gen_altivec_vmulesh (ve, operands[1], operands[2]));
       emit_insn (gen_altivec_vmulosh (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], ve, vo));
+      emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0], ve, vo));
     }
   else
     {
       emit_insn (gen_altivec_vmulosh (ve, operands[1], operands[2]));
       emit_insn (gen_altivec_vmulesh (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], vo, ve));
+      emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0], vo, ve));
     }
   DONE;
 })
@@ -3874,13 +3974,13 @@ (define_expand "vec_widen_smult_lo_v8hi"
     {
       emit_insn (gen_altivec_vmulesh (ve, operands[1], operands[2]));
       emit_insn (gen_altivec_vmulosh (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], ve, vo));
+      emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0], ve, vo));
     }
   else
     {
       emit_insn (gen_altivec_vmulosh (ve, operands[1], operands[2]));
       emit_insn (gen_altivec_vmulesh (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], vo, ve));
+      emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0], vo, ve));
     }
   DONE;
 })
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 16ca3a31757..aba6315cd5f 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -23196,28 +23196,28 @@ altivec_expand_vec_perm_const (rtx target, rtx op0, rtx op1,
      CODE_FOR_altivec_vpkuwum_direct,
      {2, 3, 6, 7, 10, 11, 14, 15, 18, 19, 22, 23, 26, 27, 30, 31}},
     {OPTION_MASK_ALTIVEC,
-     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghb_direct
-                      : CODE_FOR_altivec_vmrglb_direct,
+     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghb_direct_be
+                      : CODE_FOR_altivec_vmrglb_direct_le,
      {0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23}},
     {OPTION_MASK_ALTIVEC,
-     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghh_direct
-                      : CODE_FOR_altivec_vmrglh_direct,
+     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghh_direct_be
+                      : CODE_FOR_altivec_vmrglh_direct_le,
      {0, 1, 16, 17, 2, 3, 18, 19, 4, 5, 20, 21, 6, 7, 22, 23}},
     {OPTION_MASK_ALTIVEC,
-     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghw_direct_v4si
-                      : CODE_FOR_altivec_vmrglw_direct_v4si,
+     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghw_direct_v4si_be
+                      : CODE_FOR_altivec_vmrglw_direct_v4si_le,
      {0, 1, 2, 3, 16, 17, 18, 19, 4, 5, 6, 7, 20, 21, 22, 23}},
     {OPTION_MASK_ALTIVEC,
-     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglb_direct
-                      : CODE_FOR_altivec_vmrghb_direct,
+     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglb_direct_be
+                      : CODE_FOR_altivec_vmrghb_direct_le,
      {8, 24, 9, 25, 10, 26, 11, 27, 12, 28, 13, 29, 14, 30, 15, 31}},
     {OPTION_MASK_ALTIVEC,
-     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglh_direct
-                      : CODE_FOR_altivec_vmrghh_direct,
+     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglh_direct_be
+                      : CODE_FOR_altivec_vmrghh_direct_le,
      {8, 9, 24, 25, 10, 11, 26, 27, 12, 13, 28, 29, 14, 15, 30, 31}},
     {OPTION_MASK_ALTIVEC,
-     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglw_direct_v4si
-                      : CODE_FOR_altivec_vmrghw_direct_v4si,
+     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglw_direct_v4si_be
+                      : CODE_FOR_altivec_vmrghw_direct_v4si_le,
      {8, 9, 10, 11, 24, 25, 26, 27, 12, 13, 14, 15, 28, 29, 30, 31}},
     {OPTION_MASK_P8_VECTOR,
      BYTES_BIG_ENDIAN ? CODE_FOR_p8_vmrgew_v4sf_direct
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 0865608f94a..f8d2c316a55 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -4683,12 +4683,14 @@ (define_expand "vsx_xxmrghw_<mode>"
                      (const_int 1) (const_int 5)])))]
   "VECTOR_MEM_VSX_P (<MODE>mode)"
 {
-  rtx (*fun) (rtx, rtx, rtx);
-  fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrghw_direct_<mode>
-                         : gen_altivec_vmrglw_direct_<mode>;
-  if (!BYTES_BIG_ENDIAN)
-    std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0],
+                                                  operands[1],
+                                                  operands[2]));
+  else
+    emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0],
+                                                  operands[2],
+                                                  operands[1]));
   DONE;
 }
   [(set_attr "type" "vecperm")])
@@ -4703,12 +4705,14 @@ (define_expand "vsx_xxmrglw_<mode>"
                      (const_int 3) (const_int 7)])))]
   "VECTOR_MEM_VSX_P (<MODE>mode)"
 {
-  rtx (*fun) (rtx, rtx, rtx);
-  fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrglw_direct_<mode>
-                         : gen_altivec_vmrghw_direct_<mode>;
-  if (!BYTES_BIG_ENDIAN)
-    std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0],
+                                                  operands[1],
+                                                  operands[2]));
+  else
+    emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0],
+                                                  operands[2],
+                                                  operands[1]));
   DONE;
 }
   [(set_attr "type" "vecperm")])
diff --git a/gcc/testsuite/g++.target/powerpc/pr106069.C b/gcc/testsuite/g++.target/powerpc/pr106069.C
new file mode 100644
index 00000000000..c89739ecb55
--- /dev/null
+++ b/gcc/testsuite/g++.target/powerpc/pr106069.C
@@ -0,0 +1,118 @@
+/* { dg-options "-O -fno-tree-forwprop -maltivec" } */
+/* { dg-require-effective-target vmx_hw } */
+/* { dg-do run } */
+
+typedef __attribute__ ((altivec (vector__))) unsigned native_simd_type;
+
+union
+{
+  native_simd_type V;
+  int R[4];
+} store_le_vec;
+
+struct S
+{
+  S () = default;
+  S (unsigned B0)
+  {
+    native_simd_type val{B0};
+    m_simd = val;
+  }
+  void store_le (unsigned int out[])
+  {
+    store_le_vec.V = m_simd;
+    unsigned int x0 = store_le_vec.R[0];
+    __builtin_memcpy (out, &x0, 4);
+  }
+  S rotl (unsigned int r)
+  {
+    native_simd_type rot{r};
+    return __builtin_vec_rl (m_simd, rot);
+  }
+  void operator+= (S other)
+  {
+    m_simd = __builtin_vec_add (m_simd, other.m_simd);
+  }
+  void operator^= (S other)
+  {
+    m_simd = __builtin_vec_xor (m_simd, other.m_simd);
+  }
+  static void transpose (S &B0, S B1, S B2, S B3)
+  {
+    native_simd_type T0 = __builtin_vec_mergeh (B0.m_simd, B2.m_simd);
+    native_simd_type T1 = __builtin_vec_mergeh (B1.m_simd, B3.m_simd);
+    native_simd_type T2 = __builtin_vec_mergel (B0.m_simd, B2.m_simd);
+    native_simd_type T3 = __builtin_vec_mergel (B1.m_simd, B3.m_simd);
+    B0 = __builtin_vec_mergeh (T0, T1);
+    B3 = __builtin_vec_mergel (T2, T3);
+  }
+  S (native_simd_type x) : m_simd (x) {}
+  native_simd_type m_simd;
+};
+
+void
+foo (unsigned int output[], unsigned state[])
+{
+  S R00 = state[0];
+  S R01 = state[0];
+  S R02 = state[2];
+  S R03 = state[0];
+  S R05 = state[5];
+  S R06 = state[6];
+  S R07 = state[7];
+  S R08 = state[8];
+  S R09 = state[9];
+  S R10 = state[10];
+  S R11 = state[11];
+  S R12 = state[12];
+  S R13 = state[13];
+  S R14 = state[4];
+  S R15 = state[15];
+  for (int r = 0; r != 10; ++r)
+    {
+      R09 += R13;
+      R11 += R15;
+      R05 ^= R09;
+      R06 ^= R10;
+      R07 ^= R11;
+      R07 = R07.rotl (7);
+      R00 += R05;
+      R01 += R06;
+      R02 += R07;
+      R15 ^= R00;
+      R12 ^= R01;
+      R13 ^= R02;
+      R00 += R05;
+      R01 += R06;
+      R02 += R07;
+      R15 ^= R00;
+      R12 = R12.rotl (8);
+      R13 = R13.rotl (8);
+      R10 += R15;
+      R11 += R12;
+      R08 += R13;
+      R09 += R14;
+      R05 ^= R10;
+      R06 ^= R11;
+      R07 ^= R08;
+      R05 = R05.rotl (7);
+      R06 = R06.rotl (7);
+      R07 = R07.rotl (7);
+    }
+  R00 += state[0];
+  S::transpose (R00, R01, R02, R03);
+  R00.store_le (output);
+}
+
+unsigned int res[1];
+unsigned main_state[]{1634760805, 60878, 2036477234, 6,
+                      0, 825562964, 1471091955, 1346092787,
+                      506976774, 4197066702, 518848283, 118491664,
+                      0, 0, 0, 0};
+int
+main ()
+{
+  foo (res, main_state);
+  if (res[0] != 0x41fcef98)
+    __builtin_abort ();
+}
-- 
2.27.0
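
The nested vec_select simplification described in the cover letter can be illustrated with a small standalone model. This is only a sketch under stated assumptions: `vec_concat`, `vec_select` and `select_lane` below are toy C++ functions written for this note, not GCC internals, and they assume the RTL convention that element 0 of a vec_concat is element 0 of its first operand.

```cpp
#include <array>
#include <cstddef>

// Toy 4-element and 8-element vectors standing in for V4SI and V8SI.
using V4 = std::array<unsigned, 4>;
using V8 = std::array<unsigned, 8>;
using Sel4 = std::array<std::size_t, 4>;

// Toy model of vec_concat: elements of a (indices 0..3) followed by
// elements of b (indices 4..7).
inline V8 vec_concat (const V4 &a, const V4 &b)
{
  return V8{a[0], a[1], a[2], a[3], b[0], b[1], b[2], b[3]};
}

// Toy model of vec_select with a four-entry parallel.
inline V4 vec_select (const V8 &v, const Sel4 &sel)
{
  return V4{v[sel[0]], v[sel[1]], v[sel[2]], v[sel[3]]};
}

// Hypothetical helper mirroring the simplification: read one lane of
// vec_select (vec_concat (a, b), sel) directly from a or b, without
// materializing the merged vector.
inline unsigned select_lane (const V4 &a, const V4 &b, const Sel4 &sel,
                             std::size_t lane)
{
  std::size_t idx = sel[lane];
  return idx < 4 ? a[idx] : b[idx - 4];
}
```

With the selector [0 4 1 5], lane 3 of the merged vector maps to index 5 of the concatenation, which is element 1 of the second operand. That is the same rewrite as insn 24 in the cover letter, where `vec_select(r150, [3])` becomes `vec_select(r146, [1])`.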
* Ping: [PATCH v4] rs6000: Fix incorrect RTL for Power LE when removing the UNSPECS [PR106069]
From: Xionghu Luo @ 2023-02-28  6:43 UTC
To: Xionghu Luo, gcc-patches; +Cc: segher, linkw

Hi Segher,

Ping this for stage 4...
gen_altivec_vmrglw_direct_v4si > - : gen_altivec_vmrghw_direct_v4si; > - if (!BYTES_BIG_ENDIAN) > - std::swap (operands[1], operands[2]); > - emit_insn (fun (operands[0], operands[1], operands[2])); > + if (BYTES_BIG_ENDIAN) > + emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0], > + operands[1], > + operands[2])); > + else > + emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0], > + operands[2], > + operands[1])); > DONE; > }) > > -(define_insn "altivec_vmrglw_direct_<mode>" > +(define_insn "altivec_vmrglw_direct_<mode>_be" > [(set (match_operand:VSX_W 0 "register_operand" "=wa,v") > (vec_select:VSX_W > (vec_concat:<VS_double> > @@ -1327,7 +1413,21 @@ (define_insn "altivec_vmrglw_direct_<mode>" > (match_operand:VSX_W 2 "register_operand" "wa,v")) > (parallel [(const_int 2) (const_int 6) > (const_int 3) (const_int 7)])))] > - "TARGET_ALTIVEC" > + "TARGET_ALTIVEC && BYTES_BIG_ENDIAN" > + "@ > + xxmrglw %x0,%x1,%x2 > + vmrglw %0,%1,%2" > + [(set_attr "type" "vecperm")]) > + > +(define_insn "altivec_vmrglw_direct_<mode>_le" > + [(set (match_operand:VSX_W 0 "register_operand" "=wa,v") > + (vec_select:VSX_W > + (vec_concat:<VS_double> > + (match_operand:VSX_W 2 "register_operand" "wa,v") > + (match_operand:VSX_W 1 "register_operand" "wa,v")) > + (parallel [(const_int 0) (const_int 4) > + (const_int 1) (const_int 5)])))] > + "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN" > "@ > xxmrglw %x0,%x1,%x2 > vmrglw %0,%1,%2" > @@ -3699,13 +3799,13 @@ (define_expand "vec_widen_umult_hi_v16qi" > { > emit_insn (gen_altivec_vmuleub (ve, operands[1], operands[2])); > emit_insn (gen_altivec_vmuloub (vo, operands[1], operands[2])); > - emit_insn (gen_altivec_vmrghh_direct (operands[0], ve, vo)); > + emit_insn (gen_altivec_vmrghh_direct_be (operands[0], ve, vo)); > } > else > { > emit_insn (gen_altivec_vmuloub (ve, operands[1], operands[2])); > emit_insn (gen_altivec_vmuleub (vo, operands[1], operands[2])); > - emit_insn (gen_altivec_vmrghh_direct (operands[0], vo, ve)); > + 
emit_insn (gen_altivec_vmrghh_direct_le (operands[0], vo, ve)); > } > DONE; > }) > @@ -3724,13 +3824,13 @@ (define_expand "vec_widen_umult_lo_v16qi" > { > emit_insn (gen_altivec_vmuleub (ve, operands[1], operands[2])); > emit_insn (gen_altivec_vmuloub (vo, operands[1], operands[2])); > - emit_insn (gen_altivec_vmrglh_direct (operands[0], ve, vo)); > + emit_insn (gen_altivec_vmrglh_direct_be (operands[0], ve, vo)); > } > else > { > emit_insn (gen_altivec_vmuloub (ve, operands[1], operands[2])); > emit_insn (gen_altivec_vmuleub (vo, operands[1], operands[2])); > - emit_insn (gen_altivec_vmrglh_direct (operands[0], vo, ve)); > + emit_insn (gen_altivec_vmrglh_direct_le (operands[0], vo, ve)); > } > DONE; > }) > @@ -3749,13 +3849,13 @@ (define_expand "vec_widen_smult_hi_v16qi" > { > emit_insn (gen_altivec_vmulesb (ve, operands[1], operands[2])); > emit_insn (gen_altivec_vmulosb (vo, operands[1], operands[2])); > - emit_insn (gen_altivec_vmrghh_direct (operands[0], ve, vo)); > + emit_insn (gen_altivec_vmrghh_direct_be (operands[0], ve, vo)); > } > else > { > emit_insn (gen_altivec_vmulosb (ve, operands[1], operands[2])); > emit_insn (gen_altivec_vmulesb (vo, operands[1], operands[2])); > - emit_insn (gen_altivec_vmrghh_direct (operands[0], vo, ve)); > + emit_insn (gen_altivec_vmrghh_direct_le (operands[0], vo, ve)); > } > DONE; > }) > @@ -3774,13 +3874,13 @@ (define_expand "vec_widen_smult_lo_v16qi" > { > emit_insn (gen_altivec_vmulesb (ve, operands[1], operands[2])); > emit_insn (gen_altivec_vmulosb (vo, operands[1], operands[2])); > - emit_insn (gen_altivec_vmrglh_direct (operands[0], ve, vo)); > + emit_insn (gen_altivec_vmrglh_direct_be (operands[0], ve, vo)); > } > else > { > emit_insn (gen_altivec_vmulosb (ve, operands[1], operands[2])); > emit_insn (gen_altivec_vmulesb (vo, operands[1], operands[2])); > - emit_insn (gen_altivec_vmrglh_direct (operands[0], vo, ve)); > + emit_insn (gen_altivec_vmrglh_direct_le (operands[0], vo, ve)); > } > DONE; > }) > @@ -3799,13 
+3899,13 @@ (define_expand "vec_widen_umult_hi_v8hi" > { > emit_insn (gen_altivec_vmuleuh (ve, operands[1], operands[2])); > emit_insn (gen_altivec_vmulouh (vo, operands[1], operands[2])); > - emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], ve, vo)); > + emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0], ve, vo)); > } > else > { > emit_insn (gen_altivec_vmulouh (ve, operands[1], operands[2])); > emit_insn (gen_altivec_vmuleuh (vo, operands[1], operands[2])); > - emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], vo, ve)); > + emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0], vo, ve)); > } > DONE; > }) > @@ -3824,13 +3924,13 @@ (define_expand "vec_widen_umult_lo_v8hi" > { > emit_insn (gen_altivec_vmuleuh (ve, operands[1], operands[2])); > emit_insn (gen_altivec_vmulouh (vo, operands[1], operands[2])); > - emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], ve, vo)); > + emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0], ve, vo)); > } > else > { > emit_insn (gen_altivec_vmulouh (ve, operands[1], operands[2])); > emit_insn (gen_altivec_vmuleuh (vo, operands[1], operands[2])); > - emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], vo, ve)); > + emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0], vo, ve)); > } > DONE; > }) > @@ -3849,13 +3949,13 @@ (define_expand "vec_widen_smult_hi_v8hi" > { > emit_insn (gen_altivec_vmulesh (ve, operands[1], operands[2])); > emit_insn (gen_altivec_vmulosh (vo, operands[1], operands[2])); > - emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], ve, vo)); > + emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0], ve, vo)); > } > else > { > emit_insn (gen_altivec_vmulosh (ve, operands[1], operands[2])); > emit_insn (gen_altivec_vmulesh (vo, operands[1], operands[2])); > - emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], vo, ve)); > + emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0], vo, ve)); > } > DONE; > }) > @@ -3874,13 +3974,13 @@ (define_expand 
"vec_widen_smult_lo_v8hi" > { > emit_insn (gen_altivec_vmulesh (ve, operands[1], operands[2])); > emit_insn (gen_altivec_vmulosh (vo, operands[1], operands[2])); > - emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], ve, vo)); > + emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0], ve, vo)); > } > else > { > emit_insn (gen_altivec_vmulosh (ve, operands[1], operands[2])); > emit_insn (gen_altivec_vmulesh (vo, operands[1], operands[2])); > - emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], vo, ve)); > + emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0], vo, ve)); > } > DONE; > }) > diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc > index 16ca3a31757..aba6315cd5f 100644 > --- a/gcc/config/rs6000/rs6000.cc > +++ b/gcc/config/rs6000/rs6000.cc > @@ -23196,28 +23196,28 @@ altivec_expand_vec_perm_const (rtx target, rtx op0, rtx op1, > CODE_FOR_altivec_vpkuwum_direct, > {2, 3, 6, 7, 10, 11, 14, 15, 18, 19, 22, 23, 26, 27, 30, 31}}, > {OPTION_MASK_ALTIVEC, > - BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghb_direct > - : CODE_FOR_altivec_vmrglb_direct, > + BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghb_direct_be > + : CODE_FOR_altivec_vmrglb_direct_le, > {0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23}}, > {OPTION_MASK_ALTIVEC, > - BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghh_direct > - : CODE_FOR_altivec_vmrglh_direct, > + BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghh_direct_be > + : CODE_FOR_altivec_vmrglh_direct_le, > {0, 1, 16, 17, 2, 3, 18, 19, 4, 5, 20, 21, 6, 7, 22, 23}}, > {OPTION_MASK_ALTIVEC, > - BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghw_direct_v4si > - : CODE_FOR_altivec_vmrglw_direct_v4si, > + BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghw_direct_v4si_be > + : CODE_FOR_altivec_vmrglw_direct_v4si_le, > {0, 1, 2, 3, 16, 17, 18, 19, 4, 5, 6, 7, 20, 21, 22, 23}}, > {OPTION_MASK_ALTIVEC, > - BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglb_direct > - : CODE_FOR_altivec_vmrghb_direct, > + BYTES_BIG_ENDIAN ? 
CODE_FOR_altivec_vmrglb_direct_be > + : CODE_FOR_altivec_vmrghb_direct_le, > {8, 24, 9, 25, 10, 26, 11, 27, 12, 28, 13, 29, 14, 30, 15, 31}}, > {OPTION_MASK_ALTIVEC, > - BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglh_direct > - : CODE_FOR_altivec_vmrghh_direct, > + BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglh_direct_be > + : CODE_FOR_altivec_vmrghh_direct_le, > {8, 9, 24, 25, 10, 11, 26, 27, 12, 13, 28, 29, 14, 15, 30, 31}}, > {OPTION_MASK_ALTIVEC, > - BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglw_direct_v4si > - : CODE_FOR_altivec_vmrghw_direct_v4si, > + BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglw_direct_v4si_be > + : CODE_FOR_altivec_vmrghw_direct_v4si_le, > {8, 9, 10, 11, 24, 25, 26, 27, 12, 13, 14, 15, 28, 29, 30, 31}}, > {OPTION_MASK_P8_VECTOR, > BYTES_BIG_ENDIAN ? CODE_FOR_p8_vmrgew_v4sf_direct > diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md > index 0865608f94a..f8d2c316a55 100644 > --- a/gcc/config/rs6000/vsx.md > +++ b/gcc/config/rs6000/vsx.md > @@ -4683,12 +4683,14 @@ (define_expand "vsx_xxmrghw_<mode>" > (const_int 1) (const_int 5)])))] > "VECTOR_MEM_VSX_P (<MODE>mode)" > { > - rtx (*fun) (rtx, rtx, rtx); > - fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrghw_direct_<mode> > - : gen_altivec_vmrglw_direct_<mode>; > - if (!BYTES_BIG_ENDIAN) > - std::swap (operands[1], operands[2]); > - emit_insn (fun (operands[0], operands[1], operands[2])); > + if (BYTES_BIG_ENDIAN) > + emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0], > + operands[1], > + operands[2])); > + else > + emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0], > + operands[2], > + operands[1])); > DONE; > } > [(set_attr "type" "vecperm")]) > @@ -4703,12 +4705,14 @@ (define_expand "vsx_xxmrglw_<mode>" > (const_int 3) (const_int 7)])))] > "VECTOR_MEM_VSX_P (<MODE>mode)" > { > - rtx (*fun) (rtx, rtx, rtx); > - fun = BYTES_BIG_ENDIAN ? 
gen_altivec_vmrglw_direct_<mode> > - : gen_altivec_vmrghw_direct_<mode>; > - if (!BYTES_BIG_ENDIAN) > - std::swap (operands[1], operands[2]); > - emit_insn (fun (operands[0], operands[1], operands[2])); > + if (BYTES_BIG_ENDIAN) > + emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0], > + operands[1], > + operands[2])); > + else > + emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0], > + operands[2], > + operands[1])); > DONE; > } > [(set_attr "type" "vecperm")]) > diff --git a/gcc/testsuite/g++.target/powerpc/pr106069.C b/gcc/testsuite/g++.target/powerpc/pr106069.C > new file mode 100644 > index 00000000000..c89739ecb55 > --- /dev/null > +++ b/gcc/testsuite/g++.target/powerpc/pr106069.C > @@ -0,0 +1,118 @@ > +/* { dg-options "-O -fno-tree-forwprop -maltivec" } */ > +/* { dg-require-effective-target vmx_hw } */ > +/* { dg-do run } */ > + > +typedef __attribute__ ((altivec (vector__))) unsigned native_simd_type; > + > +union > +{ > + native_simd_type V; > + int R[4]; > +} store_le_vec; > + > +struct S > +{ > + S () = default; > + S (unsigned B0) > + { > + native_simd_type val{B0}; > + m_simd = val; > + } > + void store_le (unsigned int out[]) > + { > + store_le_vec.V = m_simd; > + unsigned int x0 = store_le_vec.R[0]; > + __builtin_memcpy (out, &x0, 4); > + } > + S rotl (unsigned int r) > + { > + native_simd_type rot{r}; > + return __builtin_vec_rl (m_simd, rot); > + } > + void operator+= (S other) > + { > + m_simd = __builtin_vec_add (m_simd, other.m_simd); > + } > + void operator^= (S other) > + { > + m_simd = __builtin_vec_xor (m_simd, other.m_simd); > + } > + static void transpose (S &B0, S B1, S B2, S B3) > + { > + native_simd_type T0 = __builtin_vec_mergeh (B0.m_simd, B2.m_simd); > + native_simd_type T1 = __builtin_vec_mergeh (B1.m_simd, B3.m_simd); > + native_simd_type T2 = __builtin_vec_mergel (B0.m_simd, B2.m_simd); > + native_simd_type T3 = __builtin_vec_mergel (B1.m_simd, B3.m_simd); > + B0 = __builtin_vec_mergeh (T0, T1); > + B3 = 
__builtin_vec_mergel (T2, T3); > + } > + S (native_simd_type x) : m_simd (x) {} > + native_simd_type m_simd; > +}; > + > +void > +foo (unsigned int output[], unsigned state[]) > +{ > + S R00 = state[0]; > + S R01 = state[0]; > + S R02 = state[2]; > + S R03 = state[0]; > + S R05 = state[5]; > + S R06 = state[6]; > + S R07 = state[7]; > + S R08 = state[8]; > + S R09 = state[9]; > + S R10 = state[10]; > + S R11 = state[11]; > + S R12 = state[12]; > + S R13 = state[13]; > + S R14 = state[4]; > + S R15 = state[15]; > + for (int r = 0; r != 10; ++r) > + { > + R09 += R13; > + R11 += R15; > + R05 ^= R09; > + R06 ^= R10; > + R07 ^= R11; > + R07 = R07.rotl (7); > + R00 += R05; > + R01 += R06; > + R02 += R07; > + R15 ^= R00; > + R12 ^= R01; > + R13 ^= R02; > + R00 += R05; > + R01 += R06; > + R02 += R07; > + R15 ^= R00; > + R12 = R12.rotl (8); > + R13 = R13.rotl (8); > + R10 += R15; > + R11 += R12; > + R08 += R13; > + R09 += R14; > + R05 ^= R10; > + R06 ^= R11; > + R07 ^= R08; > + R05 = R05.rotl (7); > + R06 = R06.rotl (7); > + R07 = R07.rotl (7); > + } > + R00 += state[0]; > + S::transpose (R00, R01, R02, R03); > + R00.store_le (output); > +} > + > +unsigned int res[1]; > +unsigned main_state[]{1634760805, 60878, 2036477234, 6, > + 0, 825562964, 1471091955, 1346092787, > + 506976774, 4197066702, 518848283, 118491664, > + 0, 0, 0, 0}; > +int > +main () > +{ > + foo (res, main_state); > + if (res[0] != 0x41fcef98) > + __builtin_abort (); > +} ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v4] rs6000: Fix incorrect RTL for Power LE when removing the UNSPECS [PR106069]
  2023-02-10  2:59 [PATCH v4] rs6000: Fix incorrect RTL for Power LE when removing the UNSPECS [PR106069] Xionghu Luo
  2023-02-28  6:43 ` Ping: " Xionghu Luo
@ 2023-03-30 19:30 ` Segher Boessenkool
  2023-03-31  2:47   ` Xionghu Luo
  2024-06-12  7:50 ` Kewen.Lin
  2024-06-18 20:31 ` Segher Boessenkool
  3 siblings, 1 reply; 10+ messages in thread
From: Segher Boessenkool @ 2023-03-30 19:30 UTC
  To: Xionghu Luo; +Cc: gcc-patches, dje.gcc, wschmidt, guojiufu, linkw

Hi!

On Fri, Feb 10, 2023 at 10:59:52AM +0800, Xionghu Luo via Gcc-patches wrote:
> The native RTL expression for vec_mrghw should be same for BE and LE as
> they are register and endian-independent.

This isn't so obvious at all.  All elements of these constructs are
very much not endian-independent, because of very unfortunate choices
in the meaning of some RTL constructs.  It is possible all things in
this negate all other things, but please show that then.

> So both BE and LE need
> generate exactly same RTL with index [0 4 1 5] when expanding vec_mrghw
> with vec_select and vec_concat.
>
> (set (reg:V4SI 141) (vec_select:V4SI (vec_concat:V8SI
>    (subreg:V4SI (reg:V16QI 139) 0)
>    (subreg:V4SI (reg:V16QI 140) 0))
>    [const_int 0 4 1 5]))

With BE, if the source vecs are ABCD and EFGH, the vec_concat gives
ABCDEFGH, and the vec_select then gives AEBF.

What happens for LE?


Segher
* Re: [PATCH v4] rs6000: Fix incorrect RTL for Power LE when removing the UNSPECS [PR106069]
  2023-03-30 19:30 ` Segher Boessenkool
@ 2023-03-31  2:47   ` Xionghu Luo
  0 siblings, 0 replies; 10+ messages in thread
From: Xionghu Luo @ 2023-03-31 2:47 UTC
  To: Segher Boessenkool, Xionghu Luo
  Cc: gcc-patches, dje.gcc, wschmidt, guojiufu, linkw

Thanks,

On 2023/3/31 03:30, Segher Boessenkool wrote:
> Hi!
>
> On Fri, Feb 10, 2023 at 10:59:52AM +0800, Xionghu Luo via Gcc-patches wrote:
>> The native RTL expression for vec_mrghw should be same for BE and LE as
>> they are register and endian-independent.
>
> This isn't so obvious at all.  All elements of these constructs are
> very much not endian-independent, because of very unfortunate choices
> in the meaning of some RTL constructs.  It is possible all things in
> this negate all other things, but please show that then.
>
>> So both BE and LE need
>> generate exactly same RTL with index [0 4 1 5] when expanding vec_mrghw
>> with vec_select and vec_concat.
>>
>> (set (reg:V4SI 141) (vec_select:V4SI (vec_concat:V8SI
>>    (subreg:V4SI (reg:V16QI 139) 0)
>>    (subreg:V4SI (reg:V16QI 140) 0))
>>    [const_int 0 4 1 5]))
>
> With BE, if the source vecs are ABCD and EFGH, the vec_concat gives
> ABCDEFGH, and the vec_select then gives AEBF.
>
> What happens for LE?

On LE, the sources look like DCBA and HGFE, and vec_concat gives
HGFEDCBA with the indices reversed [7 6 5 4 3 2 1 0], so [0 4 1 5] also
picks A, E, B and F (written FBEA in this reversed notation), the same
elements as on BE.
Take this case as an example on P8LE:

test.c

#include <altivec.h>
#include <stdio.h>

__attribute__ ((__noinline__))
vector int bar (vector int a, vector int b)
{
  return vec_vmrghw (a, b);
}

int main ()
{
  vector int a = {0xa1345678, 0xa2345678, 0xa3345678, 0xa4345678};
  vector int b = {0xb1345678, 0xb2345678, 0xb3345678, 0xb4345678};
  vector int c = bar (a, b);
  printf ("%x,%x,%x,%x\n", c[0], c[1], c[2], c[3]);
  return c[0];
}

.expand:

_3 = VEC_PERM_EXPR <a_1(D), b_2(D), { 0, 4, 1, 5 }>;

(insn 7 4 8 2 (set (reg:V16QI 122)
        (subreg:V16QI (reg/v:V4SI 118 [ a ]) 0)) "test.c":15:10 -1
     (nil))
(insn 8 7 9 2 (set (reg:V16QI 123)
        (subreg:V16QI (reg/v:V4SI 119 [ b ]) 0)) "test.c":15:10 -1
     (nil))
(insn 9 8 10 2 (set (reg:V4SI 124)
        (vec_select:V4SI (vec_concat:V8SI (subreg:V4SI (reg:V16QI 122) 0)
                (subreg:V4SI (reg:V16QI 123) 0))
            (parallel [
                    (const_int 0 [0])
                    (const_int 4 [0x4])
                    (const_int 1 [0x1])
                    (const_int 5 [0x5])
                ]))) "test.c":15:10 -1
     (nil))

And from .vregs to .final:

(insn 15 9 16 (set (reg/i:V4SI 66 %v2)
        (vec_select:V4SI (vec_concat:V8SI (reg:V4SI 66 %v2 [125])
                (reg:V4SI 67 %v3 [126]))
            (parallel [
                    (const_int 0 [0])
                    (const_int 4 [0x4])
                    (const_int 1 [0x1])
                    (const_int 5 [0x5])
                ]))) "test.c":16:1 1825 {altivec_vmrglw_direct_v4si_le}
     (expr_list:REG_DEAD (reg:V4SI 67 %v3 [126])
        (nil)))

With this patch, altivec_vmrglw_direct_v4si_le is defined as:

(define_insn "altivec_vmrglw_direct_<mode>_le"
  [(set (match_operand:VSX_W 0 "register_operand" "=wa,v")
        (vec_select:VSX_W
          (vec_concat:<VS_double>
            (match_operand:VSX_W 2 "register_operand" "wa,v")
            (match_operand:VSX_W 1 "register_operand" "wa,v"))
          (parallel [(const_int 0) (const_int 4)
                     (const_int 1) (const_int 5)])))]
  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
  "@
   xxmrglw %x0,%x1,%x2
   vmrglw %0,%1,%2"
  [(set_attr "type" "vecperm")])

ASM:

bar:
.LFB11:
        .cfi_startproc
        xxmrglw 34,35,34
        blr

./test
a1345678,b1345678,a2345678,b2345678

This exactly matches [a1 b1 a2 b2].  Does this look reasonable?

BR,
Xionghu
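The BE/LE argument in this reply can be checked with a small model.  The
following is only a sketch under stated assumptions, not GCC code: vectors
are held in RTL element order, the helpers to_from_hw, hw_vmrghw and
hw_vmrglw are hypothetical names that model the left-to-right element
numbering of the hardware registers, and LE is modeled as a plain reversal
of the four words.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using V4 = std::array<std::uint32_t, 4>;
using V8 = std::array<std::uint32_t, 8>;

// RTL vec_concat: the elements of `a` come first, then the elements of `b`.
V8 vec_concat(const V4 &a, const V4 &b) {
  V8 r{};
  for (std::size_t i = 0; i < 4; ++i) {
    r[i] = a[i];
    r[i + 4] = b[i];
  }
  return r;
}

// RTL vec_select: pick the listed element numbers from `v`.
V4 vec_select(const V8 &v, const std::array<std::size_t, 4> &sel) {
  V4 r{};
  for (std::size_t i = 0; i < 4; ++i) {
    assert(sel[i] < v.size());
    r[i] = v[sel[i]];
  }
  return r;
}

// Hypothetical helper: convert between RTL element order and the
// left-to-right register order.  Identity on BE, reversal on LE, so the
// same function converts in both directions.
V4 to_from_hw(const V4 &v, bool big_endian) {
  if (big_endian)
    return v;
  return V4{v[3], v[2], v[1], v[0]};
}

// Model of "vmrghw vD,vA,vB": interleave the two leftmost words.
V4 hw_vmrghw(const V4 &va, const V4 &vb, bool big_endian) {
  const V4 a = to_from_hw(va, big_endian);
  const V4 b = to_from_hw(vb, big_endian);
  return to_from_hw(V4{a[0], b[0], a[1], b[1]}, big_endian);
}

// Model of "vmrglw vD,vA,vB": interleave the two rightmost words.
V4 hw_vmrglw(const V4 &va, const V4 &vb, bool big_endian) {
  const V4 a = to_from_hw(va, big_endian);
  const V4 b = to_from_hw(vb, big_endian);
  return to_from_hw(V4{a[2], b[2], a[3], b[3]}, big_endian);
}
```

In this model, with the vectors a and b from test.c, the RTL expression
vec_select (vec_concat (a, b), [0 4 1 5]), the BE instruction
hw_vmrghw (a, b, true) and the LE instruction hw_vmrglw (b, a, false) all
produce the same four words, which correspond to the a1,b1,a2,b2 output
shown above.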
* Re: [PATCH v4] rs6000: Fix incorrect RTL for Power LE when removing the UNSPECS [PR106069] 2023-02-10 2:59 [PATCH v4] rs6000: Fix incorrect RTL for Power LE when removing the UNSPECS [PR106069] Xionghu Luo 2023-02-28 6:43 ` Ping: " Xionghu Luo 2023-03-30 19:30 ` Segher Boessenkool @ 2024-06-12 7:50 ` Kewen.Lin 2024-06-18 19:02 ` Peter Bergner 2024-06-18 20:31 ` Segher Boessenkool 3 siblings, 1 reply; 10+ messages in thread From: Kewen.Lin @ 2024-06-12 7:50 UTC (permalink / raw) To: Xionghu Luo; +Cc: segher, dje.gcc, guojiufu, linkw, gcc-patches, Peter Bergner Hi, on 2023/2/10 10:59, Xionghu Luo wrote: > Resend this patch... > > v4: Update per comments. > v3: rename altivec_vmrghb_direct_le to altivec_vmrglb_direct_le to match > the actual output ASM vmrglb. Likewise for all similar xxx_direct_le > patterns. > v2: Split the direct pattern to be and le with same RTL but different insn. > > The native RTL expression for vec_mrghw should be same for BE and LE as > they are register and endian-independent. So both BE and LE need > generate exactly same RTL with index [0 4 1 5] when expanding vec_mrghw > with vec_select and vec_concat. > > (set (reg:V4SI 141) (vec_select:V4SI (vec_concat:V8SI > (subreg:V4SI (reg:V16QI 139) 0) > (subreg:V4SI (reg:V16QI 140) 0)) > [const_int 0 4 1 5])) > > Then combine pass could do the nested vec_select optimization > in simplify-rtx.c:simplify_binary_operation_1 also on both BE and LE: > > 21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel [0 4 1 5]) > 24: {r151:SI=vec_select(r150:V4SI,parallel [const_int 3]);} > > => > > 21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel) > 24: {r151:SI=vec_select(r146:V4SI,parallel [const_int 1]);} > > The endianness check need only once at ASM generation finally. > ASM would be better due to nested vec_select simplified to simple scalar > load. > > Regression tested pass for Power8{LE,BE}{32,64} and Power{9,10}LE{32,64} > Linux. 
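The nested vec_select simplification quoted above amounts to mapping the
outer element number through the inner parallel and then deciding which
vec_concat input it lands in.  A minimal sketch of that index arithmetic
follows; ElementRef and fold_nested_select are hypothetical names used
only for illustration and are not the code in simplify-rtx.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Where a folded element extract ends up.
struct ElementRef {
  int operand;          // 0: first vec_concat input, 1: second input
  std::size_t element;  // element number within that input
};

// Fold vec_select (vec_select (vec_concat (x, y), inner_sel), [outer_index])
// into a direct reference to one element of x or y.
ElementRef fold_nested_select(const std::array<std::size_t, 4> &inner_sel,
                              std::size_t outer_index) {
  const std::size_t nunits = inner_sel.size();
  assert(outer_index < nunits);
  const std::size_t idx = inner_sel[outer_index];
  assert(idx < 2 * nunits);
  if (idx < nunits)
    return ElementRef{0, idx};
  return ElementRef{1, idx - nunits};
}
```

For the quoted insns, inner_sel is [0 4 1 5] and the outer index is 3, so
the fold yields element 1 of the second input, which is the
vec_select (r146, [1]) shown after the "=>".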
As the recent PR115355 shows, this issue can also affect behavior when
users enable vectorization, so IMHO we should get this landed as soon as
possible.

The culprit commit r12-4496 changed the expanders for vector merge
{high/h,low/l} {byte/b, halfword/h, word/w}, which are mainly used to
expand the built-in functions vec_merge{h,l} (and are also used as gen
functions in some internal places).  As PVIPR defines, vec_mergeh "Merges
the first halves (in element order) of two vectors" and vec_mergel "Merges
the last halves (in element order) of two vectors", so both of them
clearly have endian considerations.

Taking define_expand "altivec_vmrghb" as an example, before commit
r12-4496 it generated the RTL pattern below for both BE and LE:

// from (define_insn "*altivec_vmrghb_internal"

  [(set (match_operand:V16QI 0 "register_operand" "=v")
        (vec_select:V16QI
          (vec_concat:V32QI
            (match_operand:V16QI 1 "register_operand" "v")
            (match_operand:V16QI 2 "register_operand" "v"))
          (parallel [(const_int 0) (const_int 16)
                     (const_int 1) (const_int 17)
                     (const_int 2) (const_int 18)
                     (const_int 3) (const_int 19)
                     (const_int 4) (const_int 20)
                     (const_int 5) (const_int 21)
                     (const_int 6) (const_int 22)
                     (const_int 7) (const_int 23)])))]

which matches hardware insn "vmrghb %0,%1,%2" on BE and
"vmrglb %0,%2,%1" on LE.

After commit r12-4496, on BE it generates the RTL pattern

// from (define_insn "altivec_vmrghb_direct"

  [(set (match_operand:V16QI 0 "register_operand" "=v")
        (vec_select:V16QI
          (vec_concat:V32QI
            (match_operand:V16QI 1 "register_operand" "v")
            (match_operand:V16QI 2 "register_operand" "v"))
          (parallel [(const_int 0) (const_int 16)
                     (const_int 1) (const_int 17)
                     (const_int 2) (const_int 18)
                     (const_int 3) (const_int 19)
                     (const_int 4) (const_int 20)
                     (const_int 5) (const_int 21)
                     (const_int 6) (const_int 22)
                     (const_int 7) (const_int 23)])))]

and matches hw insn "vmrghb %0,%1,%2", which is consistent with the
previous behavior.
However, on LE it generates the pattern

// from (define_insn "altivec_vmrglb_direct"

  [(set (match_operand:V16QI 0 "register_operand" "=v")
        (vec_select:V16QI
          (vec_concat:V32QI
            (match_operand:V16QI 2 "register_operand" "v")
            (match_operand:V16QI 1 "register_operand" "v"))
          (parallel [(const_int 8) (const_int 24)
                     (const_int 9) (const_int 25)
                     (const_int 10) (const_int 26)
                     (const_int 11) (const_int 27)
                     (const_int 12) (const_int 28)
                     (const_int 13) (const_int 29)
                     (const_int 14) (const_int 30)
                     (const_int 15) (const_int 31)])))]

Note that it is shown here adjusted for the effect of std::swap on the
operands.  It matches hw insn "vmrglb %0,%1,%2", which is the same as
before (given the swapped operands), but its associated RTL pattern is
totally changed, which is wrong.  If optimization passes leave this
pattern alone, it is still fine even though the pattern doesn't represent
its hw insn; that's why simple testing on the bif doesn't expose this
issue.  But once an optimization pass such as combine makes changes based
on this wrong pattern, the result is unexpected, because the pattern
doesn't match the semantics that the expanded insn is intended to
represent.

So this patch fixes the inconsistency introduced by the culprit commit
r12-4496, ensuring that the associated RTL patterns are the same as before
and have the same semantics as their mapped hw insns.  With the proposed
patch, expanders like altivec_vmrghb expand into altivec_vmrghb_direct_be
or altivec_vmrglb_direct_le, depending on the endianness, and the patterns
are turned back to the previous ones.  The altivec_vmrg{hl}{bhw}_direct
patterns are expected to emit the insns vmrg{hl}{bhw} directly, and to
make that clearer there are different versions for BE and LE (with _be and
_le suffixes, and the RTL patterns are of course different).

Given all of the above, I believe this patch is a correct fix, and
considering the impact of the issue, I'd like to get this pushed next week
if there are no objections.
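The mismatch described above can be illustrated with a small model.  This
is a sketch under stated assumptions rather than GCC code: each byte is
represented by a label (0..15 for the first expander operand a, 16..31 for
the second operand b), vectors are held in RTL element order, and
hw_vmrglb_le is a hypothetical helper that models "vmrglb vD,vA,vB" on LE
by treating hardware element k as RTL element 15 - k.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using V16 = std::array<int, 16>;
using V32 = std::array<int, 32>;

// Labels start, start + 1, ..., start + 15.
V16 labels(int start) {
  V16 v{};
  for (std::size_t i = 0; i < 16; ++i)
    v[i] = start + static_cast<int>(i);
  return v;
}

// RTL vec_concat: elements of `a` are numbered 0..15, those of `b` 16..31.
V32 vec_concat(const V16 &a, const V16 &b) {
  V32 r{};
  for (std::size_t i = 0; i < 16; ++i) {
    r[i] = a[i];
    r[i + 16] = b[i];
  }
  return r;
}

// RTL vec_select: pick the listed element numbers from `v`.
V16 vec_select(const V32 &v, const V16 &sel) {
  V16 r{};
  for (std::size_t i = 0; i < 16; ++i) {
    assert(sel[i] >= 0 && sel[i] < 32);
    r[i] = v[static_cast<std::size_t>(sel[i])];
  }
  return r;
}

// Selector {first, first + 16, first + 1, first + 17, ...}.
V16 merge_selector(int first) {
  V16 s{};
  for (std::size_t i = 0; i < 8; ++i) {
    s[2 * i] = first + static_cast<int>(i);
    s[2 * i + 1] = first + 16 + static_cast<int>(i);
  }
  return s;
}

// Hypothetical model of "vmrglb vD,vA,vB" on an LE target.  Inputs and
// result are in RTL element order.
V16 hw_vmrglb_le(const V16 &va, const V16 &vb) {
  V16 hw{};
  for (std::size_t i = 0; i < 8; ++i) {
    hw[2 * i] = va[15 - (8 + i)];      // hardware element 8 + i of vA
    hw[2 * i + 1] = vb[15 - (8 + i)];  // hardware element 8 + i of vB
  }
  V16 r{};
  for (std::size_t j = 0; j < 16; ++j)
    r[j] = hw[15 - j];
  return r;
}
```

In this model, hw_vmrglb_le (b, a) equals
vec_select (vec_concat (a, b), merge_selector (0)), i.e. the pattern used
before r12-4496 and restored by the patch, while the r12-4496 pattern
vec_select (vec_concat (b, a), merge_selector (8)) describes a different
vector whose element 0 is byte 8 of b instead of byte 0 of a.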
btw, it's bootstrapped and regtested on powerpc64-linux-gnu P8/P9 & powerpc64le-linux-gnu P9/P10 again. BR, Kewen > > gcc/ChangeLog: > > PR target/106069 > * config/rs6000/altivec.md (altivec_vmrghb_direct): Remove. > (altivec_vmrghb_direct_be): New pattern for BE. > (altivec_vmrghb_direct_le): New pattern for LE. > (altivec_vmrghh_direct): Remove. > (altivec_vmrghh_direct_be): New pattern for BE. > (altivec_vmrghh_direct_le): New pattern for LE. > (altivec_vmrghw_direct_<mode>): Remove. > (altivec_vmrghw_direct_<mode>_be): New pattern for BE. > (altivec_vmrghw_direct_<mode>_le): New pattern for LE. > (altivec_vmrglb_direct): Remove. > (altivec_vmrglb_direct_be): New pattern for BE. > (altivec_vmrglb_direct_le): New pattern for LE. > (altivec_vmrglh_direct): Remove. > (altivec_vmrglh_direct_be): New pattern for BE. > (altivec_vmrglh_direct_le): New pattern for LE. > (altivec_vmrglw_direct_<mode>): Remove. > (altivec_vmrglw_direct_<mode>_be): New pattern for BE. > (altivec_vmrglw_direct_<mode>_le): New pattern for LE. > * config/rs6000/rs6000.cc (altivec_expand_vec_perm_const): > Adjust. > * config/rs6000/vsx.md: Likewise. > > gcc/testsuite/ChangeLog: > > PR target/106069 > * g++.target/powerpc/pr106069.C: New test. > > Signed-off-by: Xionghu Luo <xionghuluo@tencent.com> > --- > gcc/config/rs6000/altivec.md | 222 ++++++++++++++------ > gcc/config/rs6000/rs6000.cc | 24 +-- > gcc/config/rs6000/vsx.md | 28 +-- > gcc/testsuite/g++.target/powerpc/pr106069.C | 118 +++++++++++ > 4 files changed, 307 insertions(+), 85 deletions(-) > create mode 100644 gcc/testsuite/g++.target/powerpc/pr106069.C > > diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md > index 30606b8ab21..4bfeecec224 100644 > --- a/gcc/config/rs6000/altivec.md > +++ b/gcc/config/rs6000/altivec.md > @@ -1144,15 +1144,16 @@ (define_expand "altivec_vmrghb" > (use (match_operand:V16QI 2 "register_operand"))] > "TARGET_ALTIVEC" > { > - rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? 
gen_altivec_vmrghb_direct > - : gen_altivec_vmrglb_direct; > - if (!BYTES_BIG_ENDIAN) > - std::swap (operands[1], operands[2]); > - emit_insn (fun (operands[0], operands[1], operands[2])); > + if (BYTES_BIG_ENDIAN) > + emit_insn ( > + gen_altivec_vmrghb_direct_be (operands[0], operands[1], operands[2])); > + else > + emit_insn ( > + gen_altivec_vmrglb_direct_le (operands[0], operands[2], operands[1])); > DONE; > }) > > -(define_insn "altivec_vmrghb_direct" > +(define_insn "altivec_vmrghb_direct_be" > [(set (match_operand:V16QI 0 "register_operand" "=v") > (vec_select:V16QI > (vec_concat:V32QI > @@ -1166,7 +1167,25 @@ (define_insn "altivec_vmrghb_direct" > (const_int 5) (const_int 21) > (const_int 6) (const_int 22) > (const_int 7) (const_int 23)])))] > - "TARGET_ALTIVEC" > + "TARGET_ALTIVEC && BYTES_BIG_ENDIAN" > + "vmrghb %0,%1,%2" > + [(set_attr "type" "vecperm")]) > + > +(define_insn "altivec_vmrghb_direct_le" > + [(set (match_operand:V16QI 0 "register_operand" "=v") > + (vec_select:V16QI > + (vec_concat:V32QI > + (match_operand:V16QI 2 "register_operand" "v") > + (match_operand:V16QI 1 "register_operand" "v")) > + (parallel [(const_int 8) (const_int 24) > + (const_int 9) (const_int 25) > + (const_int 10) (const_int 26) > + (const_int 11) (const_int 27) > + (const_int 12) (const_int 28) > + (const_int 13) (const_int 29) > + (const_int 14) (const_int 30) > + (const_int 15) (const_int 31)])))] > + "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN" > "vmrghb %0,%1,%2" > [(set_attr "type" "vecperm")]) > > @@ -1176,17 +1195,18 @@ (define_expand "altivec_vmrghh" > (use (match_operand:V8HI 2 "register_operand"))] > "TARGET_ALTIVEC" > { > - rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? 
gen_altivec_vmrghh_direct > - : gen_altivec_vmrglh_direct; > - if (!BYTES_BIG_ENDIAN) > - std::swap (operands[1], operands[2]); > - emit_insn (fun (operands[0], operands[1], operands[2])); > + if (BYTES_BIG_ENDIAN) > + emit_insn ( > + gen_altivec_vmrghh_direct_be (operands[0], operands[1], operands[2])); > + else > + emit_insn ( > + gen_altivec_vmrglh_direct_le (operands[0], operands[2], operands[1])); > DONE; > }) > > -(define_insn "altivec_vmrghh_direct" > +(define_insn "altivec_vmrghh_direct_be" > [(set (match_operand:V8HI 0 "register_operand" "=v") > - (vec_select:V8HI > + (vec_select:V8HI > (vec_concat:V16HI > (match_operand:V8HI 1 "register_operand" "v") > (match_operand:V8HI 2 "register_operand" "v")) > @@ -1194,7 +1214,21 @@ (define_insn "altivec_vmrghh_direct" > (const_int 1) (const_int 9) > (const_int 2) (const_int 10) > (const_int 3) (const_int 11)])))] > - "TARGET_ALTIVEC" > + "TARGET_ALTIVEC && BYTES_BIG_ENDIAN" > + "vmrghh %0,%1,%2" > + [(set_attr "type" "vecperm")]) > + > +(define_insn "altivec_vmrghh_direct_le" > + [(set (match_operand:V8HI 0 "register_operand" "=v") > + (vec_select:V8HI > + (vec_concat:V16HI > + (match_operand:V8HI 2 "register_operand" "v") > + (match_operand:V8HI 1 "register_operand" "v")) > + (parallel [(const_int 4) (const_int 12) > + (const_int 5) (const_int 13) > + (const_int 6) (const_int 14) > + (const_int 7) (const_int 15)])))] > + "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN" > "vmrghh %0,%1,%2" > [(set_attr "type" "vecperm")]) > > @@ -1204,16 +1238,18 @@ (define_expand "altivec_vmrghw" > (use (match_operand:V4SI 2 "register_operand"))] > "VECTOR_MEM_ALTIVEC_P (V4SImode)" > { > - rtx (*fun) (rtx, rtx, rtx); > - fun = BYTES_BIG_ENDIAN ? 
gen_altivec_vmrghw_direct_v4si > - : gen_altivec_vmrglw_direct_v4si; > - if (!BYTES_BIG_ENDIAN) > - std::swap (operands[1], operands[2]); > - emit_insn (fun (operands[0], operands[1], operands[2])); > + if (BYTES_BIG_ENDIAN) > + emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0], > + operands[1], > + operands[2])); > + else > + emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0], > + operands[2], > + operands[1])); > DONE; > }) > > -(define_insn "altivec_vmrghw_direct_<mode>" > +(define_insn "altivec_vmrghw_direct_<mode>_be" > [(set (match_operand:VSX_W 0 "register_operand" "=wa,v") > (vec_select:VSX_W > (vec_concat:<VS_double> > @@ -1221,7 +1257,21 @@ (define_insn "altivec_vmrghw_direct_<mode>" > (match_operand:VSX_W 2 "register_operand" "wa,v")) > (parallel [(const_int 0) (const_int 4) > (const_int 1) (const_int 5)])))] > - "TARGET_ALTIVEC" > + "TARGET_ALTIVEC && BYTES_BIG_ENDIAN" > + "@ > + xxmrghw %x0,%x1,%x2 > + vmrghw %0,%1,%2" > + [(set_attr "type" "vecperm")]) > + > +(define_insn "altivec_vmrghw_direct_<mode>_le" > + [(set (match_operand:VSX_W 0 "register_operand" "=wa,v") > + (vec_select:VSX_W > + (vec_concat:<VS_double> > + (match_operand:VSX_W 2 "register_operand" "wa,v") > + (match_operand:VSX_W 1 "register_operand" "wa,v")) > + (parallel [(const_int 2) (const_int 6) > + (const_int 3) (const_int 7)])))] > + "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN" > "@ > xxmrghw %x0,%x1,%x2 > vmrghw %0,%1,%2" > @@ -1250,15 +1300,16 @@ (define_expand "altivec_vmrglb" > (use (match_operand:V16QI 2 "register_operand"))] > "TARGET_ALTIVEC" > { > - rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? 
gen_altivec_vmrglb_direct > - : gen_altivec_vmrghb_direct; > - if (!BYTES_BIG_ENDIAN) > - std::swap (operands[1], operands[2]); > - emit_insn (fun (operands[0], operands[1], operands[2])); > + if (BYTES_BIG_ENDIAN) > + emit_insn ( > + gen_altivec_vmrglb_direct_be (operands[0], operands[1], operands[2])); > + else > + emit_insn ( > + gen_altivec_vmrghb_direct_le (operands[0], operands[2], operands[1])); > DONE; > }) > > -(define_insn "altivec_vmrglb_direct" > +(define_insn "altivec_vmrglb_direct_be" > [(set (match_operand:V16QI 0 "register_operand" "=v") > (vec_select:V16QI > (vec_concat:V32QI > @@ -1272,7 +1323,25 @@ (define_insn "altivec_vmrglb_direct" > (const_int 13) (const_int 29) > (const_int 14) (const_int 30) > (const_int 15) (const_int 31)])))] > - "TARGET_ALTIVEC" > + "TARGET_ALTIVEC && BYTES_BIG_ENDIAN" > + "vmrglb %0,%1,%2" > + [(set_attr "type" "vecperm")]) > + > +(define_insn "altivec_vmrglb_direct_le" > + [(set (match_operand:V16QI 0 "register_operand" "=v") > + (vec_select:V16QI > + (vec_concat:V32QI > + (match_operand:V16QI 2 "register_operand" "v") > + (match_operand:V16QI 1 "register_operand" "v")) > + (parallel [(const_int 0) (const_int 16) > + (const_int 1) (const_int 17) > + (const_int 2) (const_int 18) > + (const_int 3) (const_int 19) > + (const_int 4) (const_int 20) > + (const_int 5) (const_int 21) > + (const_int 6) (const_int 22) > + (const_int 7) (const_int 23)])))] > + "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN" > "vmrglb %0,%1,%2" > [(set_attr "type" "vecperm")]) > > @@ -1282,15 +1351,16 @@ (define_expand "altivec_vmrglh" > (use (match_operand:V8HI 2 "register_operand"))] > "TARGET_ALTIVEC" > { > - rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? 
gen_altivec_vmrglh_direct > - : gen_altivec_vmrghh_direct; > - if (!BYTES_BIG_ENDIAN) > - std::swap (operands[1], operands[2]); > - emit_insn (fun (operands[0], operands[1], operands[2])); > + if (BYTES_BIG_ENDIAN) > + emit_insn ( > + gen_altivec_vmrglh_direct_be (operands[0], operands[1], operands[2])); > + else > + emit_insn ( > + gen_altivec_vmrghh_direct_le (operands[0], operands[2], operands[1])); > DONE; > }) > > -(define_insn "altivec_vmrglh_direct" > +(define_insn "altivec_vmrglh_direct_be" > [(set (match_operand:V8HI 0 "register_operand" "=v") > (vec_select:V8HI > (vec_concat:V16HI > @@ -1300,7 +1370,21 @@ (define_insn "altivec_vmrglh_direct" > (const_int 5) (const_int 13) > (const_int 6) (const_int 14) > (const_int 7) (const_int 15)])))] > - "TARGET_ALTIVEC" > + "TARGET_ALTIVEC && BYTES_BIG_ENDIAN" > + "vmrglh %0,%1,%2" > + [(set_attr "type" "vecperm")]) > + > +(define_insn "altivec_vmrglh_direct_le" > + [(set (match_operand:V8HI 0 "register_operand" "=v") > + (vec_select:V8HI > + (vec_concat:V16HI > + (match_operand:V8HI 2 "register_operand" "v") > + (match_operand:V8HI 1 "register_operand" "v")) > + (parallel [(const_int 0) (const_int 8) > + (const_int 1) (const_int 9) > + (const_int 2) (const_int 10) > + (const_int 3) (const_int 11)])))] > + "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN" > "vmrglh %0,%1,%2" > [(set_attr "type" "vecperm")]) > > @@ -1310,16 +1394,18 @@ (define_expand "altivec_vmrglw" > (use (match_operand:V4SI 2 "register_operand"))] > "VECTOR_MEM_ALTIVEC_P (V4SImode)" > { > - rtx (*fun) (rtx, rtx, rtx); > - fun = BYTES_BIG_ENDIAN ? 
gen_altivec_vmrglw_direct_v4si > - : gen_altivec_vmrghw_direct_v4si; > - if (!BYTES_BIG_ENDIAN) > - std::swap (operands[1], operands[2]); > - emit_insn (fun (operands[0], operands[1], operands[2])); > + if (BYTES_BIG_ENDIAN) > + emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0], > + operands[1], > + operands[2])); > + else > + emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0], > + operands[2], > + operands[1])); > DONE; > }) > > -(define_insn "altivec_vmrglw_direct_<mode>" > +(define_insn "altivec_vmrglw_direct_<mode>_be" > [(set (match_operand:VSX_W 0 "register_operand" "=wa,v") > (vec_select:VSX_W > (vec_concat:<VS_double> > @@ -1327,7 +1413,21 @@ (define_insn "altivec_vmrglw_direct_<mode>" > (match_operand:VSX_W 2 "register_operand" "wa,v")) > (parallel [(const_int 2) (const_int 6) > (const_int 3) (const_int 7)])))] > - "TARGET_ALTIVEC" > + "TARGET_ALTIVEC && BYTES_BIG_ENDIAN" > + "@ > + xxmrglw %x0,%x1,%x2 > + vmrglw %0,%1,%2" > + [(set_attr "type" "vecperm")]) > + > +(define_insn "altivec_vmrglw_direct_<mode>_le" > + [(set (match_operand:VSX_W 0 "register_operand" "=wa,v") > + (vec_select:VSX_W > + (vec_concat:<VS_double> > + (match_operand:VSX_W 2 "register_operand" "wa,v") > + (match_operand:VSX_W 1 "register_operand" "wa,v")) > + (parallel [(const_int 0) (const_int 4) > + (const_int 1) (const_int 5)])))] > + "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN" > "@ > xxmrglw %x0,%x1,%x2 > vmrglw %0,%1,%2" > @@ -3699,13 +3799,13 @@ (define_expand "vec_widen_umult_hi_v16qi" > { > emit_insn (gen_altivec_vmuleub (ve, operands[1], operands[2])); > emit_insn (gen_altivec_vmuloub (vo, operands[1], operands[2])); > - emit_insn (gen_altivec_vmrghh_direct (operands[0], ve, vo)); > + emit_insn (gen_altivec_vmrghh_direct_be (operands[0], ve, vo)); > } > else > { > emit_insn (gen_altivec_vmuloub (ve, operands[1], operands[2])); > emit_insn (gen_altivec_vmuleub (vo, operands[1], operands[2])); > - emit_insn (gen_altivec_vmrghh_direct (operands[0], vo, ve)); > + 
emit_insn (gen_altivec_vmrghh_direct_le (operands[0], vo, ve)); > } > DONE; > }) > @@ -3724,13 +3824,13 @@ (define_expand "vec_widen_umult_lo_v16qi" > { > emit_insn (gen_altivec_vmuleub (ve, operands[1], operands[2])); > emit_insn (gen_altivec_vmuloub (vo, operands[1], operands[2])); > - emit_insn (gen_altivec_vmrglh_direct (operands[0], ve, vo)); > + emit_insn (gen_altivec_vmrglh_direct_be (operands[0], ve, vo)); > } > else > { > emit_insn (gen_altivec_vmuloub (ve, operands[1], operands[2])); > emit_insn (gen_altivec_vmuleub (vo, operands[1], operands[2])); > - emit_insn (gen_altivec_vmrglh_direct (operands[0], vo, ve)); > + emit_insn (gen_altivec_vmrglh_direct_le (operands[0], vo, ve)); > } > DONE; > }) > @@ -3749,13 +3849,13 @@ (define_expand "vec_widen_smult_hi_v16qi" > { > emit_insn (gen_altivec_vmulesb (ve, operands[1], operands[2])); > emit_insn (gen_altivec_vmulosb (vo, operands[1], operands[2])); > - emit_insn (gen_altivec_vmrghh_direct (operands[0], ve, vo)); > + emit_insn (gen_altivec_vmrghh_direct_be (operands[0], ve, vo)); > } > else > { > emit_insn (gen_altivec_vmulosb (ve, operands[1], operands[2])); > emit_insn (gen_altivec_vmulesb (vo, operands[1], operands[2])); > - emit_insn (gen_altivec_vmrghh_direct (operands[0], vo, ve)); > + emit_insn (gen_altivec_vmrghh_direct_le (operands[0], vo, ve)); > } > DONE; > }) > @@ -3774,13 +3874,13 @@ (define_expand "vec_widen_smult_lo_v16qi" > { > emit_insn (gen_altivec_vmulesb (ve, operands[1], operands[2])); > emit_insn (gen_altivec_vmulosb (vo, operands[1], operands[2])); > - emit_insn (gen_altivec_vmrglh_direct (operands[0], ve, vo)); > + emit_insn (gen_altivec_vmrglh_direct_be (operands[0], ve, vo)); > } > else > { > emit_insn (gen_altivec_vmulosb (ve, operands[1], operands[2])); > emit_insn (gen_altivec_vmulesb (vo, operands[1], operands[2])); > - emit_insn (gen_altivec_vmrglh_direct (operands[0], vo, ve)); > + emit_insn (gen_altivec_vmrglh_direct_le (operands[0], vo, ve)); > } > DONE; > }) > @@ -3799,13 
+3899,13 @@ (define_expand "vec_widen_umult_hi_v8hi" > { > emit_insn (gen_altivec_vmuleuh (ve, operands[1], operands[2])); > emit_insn (gen_altivec_vmulouh (vo, operands[1], operands[2])); > - emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], ve, vo)); > + emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0], ve, vo)); > } > else > { > emit_insn (gen_altivec_vmulouh (ve, operands[1], operands[2])); > emit_insn (gen_altivec_vmuleuh (vo, operands[1], operands[2])); > - emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], vo, ve)); > + emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0], vo, ve)); > } > DONE; > }) > @@ -3824,13 +3924,13 @@ (define_expand "vec_widen_umult_lo_v8hi" > { > emit_insn (gen_altivec_vmuleuh (ve, operands[1], operands[2])); > emit_insn (gen_altivec_vmulouh (vo, operands[1], operands[2])); > - emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], ve, vo)); > + emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0], ve, vo)); > } > else > { > emit_insn (gen_altivec_vmulouh (ve, operands[1], operands[2])); > emit_insn (gen_altivec_vmuleuh (vo, operands[1], operands[2])); > - emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], vo, ve)); > + emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0], vo, ve)); > } > DONE; > }) > @@ -3849,13 +3949,13 @@ (define_expand "vec_widen_smult_hi_v8hi" > { > emit_insn (gen_altivec_vmulesh (ve, operands[1], operands[2])); > emit_insn (gen_altivec_vmulosh (vo, operands[1], operands[2])); > - emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], ve, vo)); > + emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0], ve, vo)); > } > else > { > emit_insn (gen_altivec_vmulosh (ve, operands[1], operands[2])); > emit_insn (gen_altivec_vmulesh (vo, operands[1], operands[2])); > - emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], vo, ve)); > + emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0], vo, ve)); > } > DONE; > }) > @@ -3874,13 +3974,13 @@ (define_expand 
"vec_widen_smult_lo_v8hi" > { > emit_insn (gen_altivec_vmulesh (ve, operands[1], operands[2])); > emit_insn (gen_altivec_vmulosh (vo, operands[1], operands[2])); > - emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], ve, vo)); > + emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0], ve, vo)); > } > else > { > emit_insn (gen_altivec_vmulosh (ve, operands[1], operands[2])); > emit_insn (gen_altivec_vmulesh (vo, operands[1], operands[2])); > - emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], vo, ve)); > + emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0], vo, ve)); > } > DONE; > }) > diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc > index 16ca3a31757..aba6315cd5f 100644 > --- a/gcc/config/rs6000/rs6000.cc > +++ b/gcc/config/rs6000/rs6000.cc > @@ -23196,28 +23196,28 @@ altivec_expand_vec_perm_const (rtx target, rtx op0, rtx op1, > CODE_FOR_altivec_vpkuwum_direct, > {2, 3, 6, 7, 10, 11, 14, 15, 18, 19, 22, 23, 26, 27, 30, 31}}, > {OPTION_MASK_ALTIVEC, > - BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghb_direct > - : CODE_FOR_altivec_vmrglb_direct, > + BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghb_direct_be > + : CODE_FOR_altivec_vmrglb_direct_le, > {0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23}}, > {OPTION_MASK_ALTIVEC, > - BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghh_direct > - : CODE_FOR_altivec_vmrglh_direct, > + BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghh_direct_be > + : CODE_FOR_altivec_vmrglh_direct_le, > {0, 1, 16, 17, 2, 3, 18, 19, 4, 5, 20, 21, 6, 7, 22, 23}}, > {OPTION_MASK_ALTIVEC, > - BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghw_direct_v4si > - : CODE_FOR_altivec_vmrglw_direct_v4si, > + BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghw_direct_v4si_be > + : CODE_FOR_altivec_vmrglw_direct_v4si_le, > {0, 1, 2, 3, 16, 17, 18, 19, 4, 5, 6, 7, 20, 21, 22, 23}}, > {OPTION_MASK_ALTIVEC, > - BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglb_direct > - : CODE_FOR_altivec_vmrghb_direct, > + BYTES_BIG_ENDIAN ? 
CODE_FOR_altivec_vmrglb_direct_be > + : CODE_FOR_altivec_vmrghb_direct_le, > {8, 24, 9, 25, 10, 26, 11, 27, 12, 28, 13, 29, 14, 30, 15, 31}}, > {OPTION_MASK_ALTIVEC, > - BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglh_direct > - : CODE_FOR_altivec_vmrghh_direct, > + BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglh_direct_be > + : CODE_FOR_altivec_vmrghh_direct_le, > {8, 9, 24, 25, 10, 11, 26, 27, 12, 13, 28, 29, 14, 15, 30, 31}}, > {OPTION_MASK_ALTIVEC, > - BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglw_direct_v4si > - : CODE_FOR_altivec_vmrghw_direct_v4si, > + BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglw_direct_v4si_be > + : CODE_FOR_altivec_vmrghw_direct_v4si_le, > {8, 9, 10, 11, 24, 25, 26, 27, 12, 13, 14, 15, 28, 29, 30, 31}}, > {OPTION_MASK_P8_VECTOR, > BYTES_BIG_ENDIAN ? CODE_FOR_p8_vmrgew_v4sf_direct > diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md > index 0865608f94a..f8d2c316a55 100644 > --- a/gcc/config/rs6000/vsx.md > +++ b/gcc/config/rs6000/vsx.md > @@ -4683,12 +4683,14 @@ (define_expand "vsx_xxmrghw_<mode>" > (const_int 1) (const_int 5)])))] > "VECTOR_MEM_VSX_P (<MODE>mode)" > { > - rtx (*fun) (rtx, rtx, rtx); > - fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrghw_direct_<mode> > - : gen_altivec_vmrglw_direct_<mode>; > - if (!BYTES_BIG_ENDIAN) > - std::swap (operands[1], operands[2]); > - emit_insn (fun (operands[0], operands[1], operands[2])); > + if (BYTES_BIG_ENDIAN) > + emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0], > + operands[1], > + operands[2])); > + else > + emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0], > + operands[2], > + operands[1])); > DONE; > } > [(set_attr "type" "vecperm")]) > @@ -4703,12 +4705,14 @@ (define_expand "vsx_xxmrglw_<mode>" > (const_int 3) (const_int 7)])))] > "VECTOR_MEM_VSX_P (<MODE>mode)" > { > - rtx (*fun) (rtx, rtx, rtx); > - fun = BYTES_BIG_ENDIAN ? 
gen_altivec_vmrglw_direct_<mode> > - : gen_altivec_vmrghw_direct_<mode>; > - if (!BYTES_BIG_ENDIAN) > - std::swap (operands[1], operands[2]); > - emit_insn (fun (operands[0], operands[1], operands[2])); > + if (BYTES_BIG_ENDIAN) > + emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0], > + operands[1], > + operands[2])); > + else > + emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0], > + operands[2], > + operands[1])); > DONE; > } > [(set_attr "type" "vecperm")]) > diff --git a/gcc/testsuite/g++.target/powerpc/pr106069.C b/gcc/testsuite/g++.target/powerpc/pr106069.C > new file mode 100644 > index 00000000000..c89739ecb55 > --- /dev/null > +++ b/gcc/testsuite/g++.target/powerpc/pr106069.C > @@ -0,0 +1,118 @@ > +/* { dg-options "-O -fno-tree-forwprop -maltivec" } */ > +/* { dg-require-effective-target vmx_hw } */ > +/* { dg-do run } */ > + > +typedef __attribute__ ((altivec (vector__))) unsigned native_simd_type; > + > +union > +{ > + native_simd_type V; > + int R[4]; > +} store_le_vec; > + > +struct S > +{ > + S () = default; > + S (unsigned B0) > + { > + native_simd_type val{B0}; > + m_simd = val; > + } > + void store_le (unsigned int out[]) > + { > + store_le_vec.V = m_simd; > + unsigned int x0 = store_le_vec.R[0]; > + __builtin_memcpy (out, &x0, 4); > + } > + S rotl (unsigned int r) > + { > + native_simd_type rot{r}; > + return __builtin_vec_rl (m_simd, rot); > + } > + void operator+= (S other) > + { > + m_simd = __builtin_vec_add (m_simd, other.m_simd); > + } > + void operator^= (S other) > + { > + m_simd = __builtin_vec_xor (m_simd, other.m_simd); > + } > + static void transpose (S &B0, S B1, S B2, S B3) > + { > + native_simd_type T0 = __builtin_vec_mergeh (B0.m_simd, B2.m_simd); > + native_simd_type T1 = __builtin_vec_mergeh (B1.m_simd, B3.m_simd); > + native_simd_type T2 = __builtin_vec_mergel (B0.m_simd, B2.m_simd); > + native_simd_type T3 = __builtin_vec_mergel (B1.m_simd, B3.m_simd); > + B0 = __builtin_vec_mergeh (T0, T1); > + B3 = 
__builtin_vec_mergel (T2, T3); > + } > + S (native_simd_type x) : m_simd (x) {} > + native_simd_type m_simd; > +}; > + > +void > +foo (unsigned int output[], unsigned state[]) > +{ > + S R00 = state[0]; > + S R01 = state[0]; > + S R02 = state[2]; > + S R03 = state[0]; > + S R05 = state[5]; > + S R06 = state[6]; > + S R07 = state[7]; > + S R08 = state[8]; > + S R09 = state[9]; > + S R10 = state[10]; > + S R11 = state[11]; > + S R12 = state[12]; > + S R13 = state[13]; > + S R14 = state[4]; > + S R15 = state[15]; > + for (int r = 0; r != 10; ++r) > + { > + R09 += R13; > + R11 += R15; > + R05 ^= R09; > + R06 ^= R10; > + R07 ^= R11; > + R07 = R07.rotl (7); > + R00 += R05; > + R01 += R06; > + R02 += R07; > + R15 ^= R00; > + R12 ^= R01; > + R13 ^= R02; > + R00 += R05; > + R01 += R06; > + R02 += R07; > + R15 ^= R00; > + R12 = R12.rotl (8); > + R13 = R13.rotl (8); > + R10 += R15; > + R11 += R12; > + R08 += R13; > + R09 += R14; > + R05 ^= R10; > + R06 ^= R11; > + R07 ^= R08; > + R05 = R05.rotl (7); > + R06 = R06.rotl (7); > + R07 = R07.rotl (7); > + } > + R00 += state[0]; > + S::transpose (R00, R01, R02, R03); > + R00.store_le (output); > +} > + > +unsigned int res[1]; > +unsigned main_state[]{1634760805, 60878, 2036477234, 6, > + 0, 825562964, 1471091955, 1346092787, > + 506976774, 4197066702, 518848283, 118491664, > + 0, 0, 0, 0}; > +int > +main () > +{ > + foo (res, main_state); > + if (res[0] != 0x41fcef98) > + __builtin_abort (); > +} ^ permalink raw reply [flat|nested] 10+ messages in thread
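For readers checking the selector indices in the split patterns, here is a small scalar model, not GCC code, of the word-merge case. It assumes two things that the patch does not spell out: that the hardware vmrghw interleaves the two leftmost words of its inputs when words are counted in big-endian order, and that on LE the RTL element i is word 3 - i counted from the left. All function names (`hw_vmrghw`, `flip`, `select_concat`, and so on) are made up for illustration. Under those assumptions the sketch compares what "vmrghw %0,%1,%2" computes with the `vec_select`/`vec_concat` RTL of `altivec_vmrghw_direct_<mode>_be` and `altivec_vmrghw_direct_<mode>_le`.

```cpp
#include <array>
#include <cassert>

// Four 32-bit words; the array index is the RTL element number.
using V4 = std::array<unsigned, 4>;

// Hardware vmrghw, written in big-endian (left-to-right) word numbering:
// the result interleaves the two leftmost words of a and b.
V4 hw_vmrghw (const V4 &a, const V4 &b)
{
  return {a[0], b[0], a[1], b[1]};
}

// Convert between LE RTL element numbering and BE word numbering
// (assumption: element i on LE is word 3 - i counted from the left).
V4 flip (const V4 &v)
{
  return {v[3], v[2], v[1], v[0]};
}

// Model of (vec_select (vec_concat x y) (parallel sel)).
V4 select_concat (const V4 &x, const V4 &y, const std::array<int, 4> &sel)
{
  V4 r{};
  for (int i = 0; i < 4; ++i)
    r[i] = sel[i] < 4 ? x[sel[i]] : y[sel[i] - 4];
  return r;
}

// BE: RTL numbering and hardware numbering coincide.
V4 insn_on_be (const V4 &op1, const V4 &op2)
{
  return hw_vmrghw (op1, op2);
}

// RTL of altivec_vmrghw_direct_<mode>_be: concat (op1, op2), [0 4 1 5].
V4 rtl_on_be (const V4 &op1, const V4 &op2)
{
  return select_concat (op1, op2, {0, 4, 1, 5});
}

// What "vmrghw %0,%1,%2" leaves in %0 on LE, in RTL element numbering.
V4 insn_on_le (const V4 &op1, const V4 &op2)
{
  return flip (hw_vmrghw (flip (op1), flip (op2)));
}

// RTL of altivec_vmrghw_direct_<mode>_le: concat (op2, op1), [2 6 3 7].
V4 rtl_on_le (const V4 &op1, const V4 &op2)
{
  return select_concat (op2, op1, {2, 6, 3, 7});
}

// Returns true when the modelled insn and the modelled RTL agree.
bool patterns_agree (const V4 &op1, const V4 &op2)
{
  return insn_on_be (op1, op2) == rtl_on_be (op1, op2)
	 && insn_on_le (op1, op2) == rtl_on_le (op1, op2);
}
```

With op1 = {10, 11, 12, 13} and op2 = {20, 21, 22, 23}, both `insn_on_le` and `rtl_on_le` give {22, 12, 23, 13} in this model, which is why the LE pattern swaps the `vec_concat` operands and uses [2 6 3 7] while still printing vmrghw with the operands in their original order.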
* Re: [PATCH v4] rs6000: Fix incorrect RTL for Power LE when removing the UNSPECS [PR106069]
From: Peter Bergner @ 2024-06-18 19:02 UTC
To: Kewen.Lin, Xionghu Luo
Cc: segher, dje.gcc, guojiufu, linkw, gcc-patches

On 6/12/24 2:50 AM, Kewen.Lin wrote:
> As the recent PR115355 shows, this issue can also affect the
> behavior when users are adopting vectorization optimization,
> IMHO we should get this landed as soon as possible.

I agree we want this fixed ASAP.

> As all said above, I believe this patch is a correct fix and
> considering the impact of the issue, I'd like to get this
> pushed next week if no objections.

The only complaint I have on the patch, and I know this existed before
the patch, is we're using register_operand for the predicate for these
patterns when we probably should be using altivec_register_operand or
vsx_register_operand depending on the specific pattern.

Yes, other pre-existing patterns use that, but those should probably be
fixed too. Maybe we go with register_operand for now with this patch
and then have a follow-on patch (from us) that cleans those all up???

Otherwise, LGTM (although I can't approve it).

Peter
* Re: [PATCH v4] rs6000: Fix incorrect RTL for Power LE when removing the UNSPECS [PR106069]
From: Kewen.Lin @ 2024-06-19 7:28 UTC
To: Peter Bergner
Cc: segher, dje.gcc, guojiufu, linkw, gcc-patches, Xionghu Luo

on 2024/6/19 03:02, Peter Bergner wrote:
> On 6/12/24 2:50 AM, Kewen.Lin wrote:
>> As the recent PR115355 shows, this issue can also affect the
>> behavior when users are adopting vectorization optimization,
>> IMHO we should get this landed as soon as possible.
>
> I agree we want this fixed ASAP.
>
>> As all said above, I believe this patch is a correct fix and
>> considering the impact of the issue, I'd like to get this
>> pushed next week if no objections.
>
> The only complaint I have on the patch, and I know this existed before
> the patch, is we're using register_operand for the predicate for these
> patterns when we probably should be using altivec_register_operand or
> vsx_register_operand depending on the specific pattern.

Good catch.

>
> Yes, other pre-existing patterns use that, but those should probably be
> fixed too. Maybe we go with register_operand for now with this patch
> and then have a follow-on patch (from us) that cleans those all up???

Yes, since this issue existed before and sort of widely, I think we want
some other separated patch to clean them up.

>
> Otherwise, LGTM (although I can't approve it).

Thanks! I noticed Segher posted some more review comments on patch v4,
I'll follow up them. :)

BR,
Kewen
* Re: [PATCH v4] rs6000: Fix incorrect RTL for Power LE when removing the UNSPECS [PR106069]
From: Segher Boessenkool @ 2024-06-18 20:31 UTC
To: Xionghu Luo
Cc: gcc-patches, dje.gcc, wschmidt, guojiufu, linkw

On Fri, Feb 10, 2023 at 10:59:52AM +0800, Xionghu Luo via Gcc-patches wrote:

So, nothing here is obvious at all still. Could you please split it up
a bit more, so that every step is either small or simple?

So maybe first just split patterns to BE and LE versions, and nothing
else?

And one patch per insn, if at all possible.

This matters so that a regression search will immediately show the
culprit pattern, if anything went wrong.

Most patches will not change anything consequential, but some will, and
it should be very clear which do!

And change (or add) comments in the patch so that I don't have to ask
the same questions as before again! :-)

Most of this seems clean and good, but there is just too much
independent stuff going on at the same time. If your patch series is
split up correctly writing a changelog for it is very easy (this is a
good canary to use!), and if we get regressions from this it should be
trivial to find the problem, too.

> @@ -3699,13 +3799,13 @@ (define_expand "vec_widen_umult_hi_v16qi"
>      {
>        emit_insn (gen_altivec_vmuleub (ve, operands[1], operands[2]));
>        emit_insn (gen_altivec_vmuloub (vo, operands[1], operands[2]));
> -      emit_insn (gen_altivec_vmrghh_direct (operands[0], ve, vo));
> +      emit_insn (gen_altivec_vmrghh_direct_be (operands[0], ve, vo));
>      }
>    else
>      {
>        emit_insn (gen_altivec_vmuloub (ve, operands[1], operands[2]));
>        emit_insn (gen_altivec_vmuleub (vo, operands[1], operands[2]));
> -      emit_insn (gen_altivec_vmrghh_direct (operands[0], vo, ve));
> +      emit_insn (gen_altivec_vmrghh_direct_le (operands[0], vo, ve));
>      }
>    DONE;
>  })

Please don't. Call the generic gen_vmrg* patterns from the widen
things, don't try to do the compiler's job of specialising stuff, it
only makes things much less readable, and causes more mistakes. Just do
like what was there before, essentially.

Segher
* Re: [PATCH v4] rs6000: Fix incorrect RTL for Power LE when removing the UNSPECS [PR106069]
From: Kewen.Lin @ 2024-06-19 7:29 UTC
To: Segher Boessenkool
Cc: gcc-patches, dje.gcc, guojiufu, linkw, Peter Bergner, Xionghu Luo

Hi Segher,

on 2024/6/19 04:31, Segher Boessenkool wrote:
> On Fri, Feb 10, 2023 at 10:59:52AM +0800, Xionghu Luo via Gcc-patches wrote:
> So, nothing here is obvious at all still. Could you please split it up
> a bit more, so that every step is either small or simple?

I just chatted with Xionghu off-list, he is being busy on some other
tasks and preferred me to follow up this.

>
> So maybe first just split patterns to BE and LE versions, and nothing
> else?
>
> And one patch per insn, if at all possible.

OK, I'll try to separate them as element type word, half-word and byte.

>
> This matters so that a regression search will immediately show the
> culprit pattern, if anything went wrong.
>
> Most patches will not change anything consequential, but some will, and
> it should be very clear which do!
>
> And change (or add) comments in the patch so that I don't have to ask
> the same questions as before again! :-)
>
> Most of this seems clean and good, but there is just too much
> independent stuff going on at the same time. If your patch series is
> split up correctly writing a changelog for it is very easy (this is a
> good canary to use!), and if we get regressions from this it should be
> trivial to find the problem, too.

Good point.

>
>> @@ -3699,13 +3799,13 @@ (define_expand "vec_widen_umult_hi_v16qi"
>>      {
>>        emit_insn (gen_altivec_vmuleub (ve, operands[1], operands[2]));
>>        emit_insn (gen_altivec_vmuloub (vo, operands[1], operands[2]));
>> -      emit_insn (gen_altivec_vmrghh_direct (operands[0], ve, vo));
>> +      emit_insn (gen_altivec_vmrghh_direct_be (operands[0], ve, vo));
>>      }
>>    else
>>      {
>>        emit_insn (gen_altivec_vmuloub (ve, operands[1], operands[2]));
>>        emit_insn (gen_altivec_vmuleub (vo, operands[1], operands[2]));
>> -      emit_insn (gen_altivec_vmrghh_direct (operands[0], vo, ve));
>> +      emit_insn (gen_altivec_vmrghh_direct_le (operands[0], vo, ve));
>>      }
>>    DONE;
>>  })
>
> Please don't. Call the generic gen_vmrg* patterns from the widen
> things, don't try to do the compiler's job of specialising stuff, it
> only makes things much less readable, and causes more mistakes. Just do
> like what was there before, essentially.

Before r12-4496 (the culprit commit), this part looks like:

@@ -3795,182 +3708,182 @@ (define_expand "vec_widen_smult_hi_v16qi"
       emit_insn (gen_altivec_vmrghh_direct (operands[0], ve, vo));
     }
   else
     {
       emit_insn (gen_altivec_vmulosb (ve, operands[1], operands[2]));
       emit_insn (gen_altivec_vmulesb (vo, operands[1], operands[2]));
       emit_insn (gen_altivec_vmrghh_direct (operands[0], vo, ve));
     }
   DONE;
 })

, its associated gen_altivec_vmrghh_direct looks like:

-(define_insn "altivec_vmrghh_direct"
-  [(set (match_operand:V8HI 0 "register_operand" "=v")
-        (unspec:V8HI [(match_operand:V8HI 1 "register_operand" "v")
-                      (match_operand:V8HI 2 "register_operand" "v")]
-                     UNSPEC_VMRGH_DIRECT))]
-  "TARGET_ALTIVEC"
   "vmrghh %0,%1,%2"
   [(set_attr "type" "vecperm")])

, the intention is to emit exactly the insn "vmrghh".

It's doable to call gen_vmrg* here instead, but I'm not sure if it's
more readable, as this vec_widen_smult_hi_v16qi expander already has
the different arms for BE and LE, for calling with the generic
gen_vmrg*, it would be gen_altivec_vmrghb for BE and gen_altivec_vmrglb
for LE, for LE readers need to be more careful that we actually
generate vmrghh. From this perspective,
gen_altivec_vmrghh_direct_{be,le} seems more straight.

BR,
Kewen
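To make the operand order in the LE arm of the widening expanders easier to follow, here is a scalar model, not GCC code, of the sequence vmuloub / vmuleub / "vmrghh %0,vo,ve" that `vec_widen_umult_hi_v16qi` emits on LE with the v4 patch. It rests on assumptions that should be checked against the ISA: vmuleub and vmuloub multiply the even and odd bytes counted in big-endian order, vmrghh interleaves the four leftmost halfwords of its inputs, and on LE the RTL element i is element N - 1 - i counted from the left. The helper names are invented for this sketch.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using V16 = std::array<unsigned char, 16>;
using V8 = std::array<unsigned short, 8>;

// Convert between LE RTL element numbering and BE (left-to-right)
// numbering: element i becomes element N - 1 - i.
template <typename T, std::size_t N>
std::array<T, N> flip (const std::array<T, N> &v)
{
  std::array<T, N> r{};
  for (std::size_t i = 0; i < N; ++i)
    r[i] = v[N - 1 - i];
  return r;
}

// Hardware vmuleub in BE numbering: products of the even bytes.
V8 hw_vmuleub (const V16 &a, const V16 &b)
{
  V8 r{};
  for (int k = 0; k < 8; ++k)
    r[k] = static_cast<unsigned short> (a[2 * k] * b[2 * k]);
  return r;
}

// Hardware vmuloub in BE numbering: products of the odd bytes.
V8 hw_vmuloub (const V16 &a, const V16 &b)
{
  V8 r{};
  for (int k = 0; k < 8; ++k)
    r[k] = static_cast<unsigned short> (a[2 * k + 1] * b[2 * k + 1]);
  return r;
}

// Hardware vmrghh in BE numbering: interleave the four leftmost halfwords.
V8 hw_vmrghh (const V8 &a, const V8 &b)
{
  V8 r{};
  for (int k = 0; k < 4; ++k)
    {
      r[2 * k] = a[k];
      r[2 * k + 1] = b[k];
    }
  return r;
}

// LE arm of the expander; inputs and result use LE RTL element numbering.
V8 widen_umult_hi_le (const V16 &op1, const V16 &op2)
{
  V16 a = flip (op1), b = flip (op2);	// register contents, BE numbering
  V8 ve = hw_vmuloub (a, b);		// even elements in LE numbering
  V8 vo = hw_vmuleub (a, b);		// odd elements in LE numbering
  return flip (hw_vmrghh (vo, ve));	// "vmrghh %0,vo,ve"
}
```

In this model element j of the result is the product of input elements 8 + j, so the LE arm yields the products of RTL elements 8..15 in order. That is the sense in which the LE arm really does want the vmrghh instruction with (vo, ve), whichever generator name is used to reach it.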
* [PATCH] rs6000: Fix incorrect RTL for Power LE when removing the UNSPECS [PR106069] @ 2022-08-08 3:42 Xionghu Luo 2022-08-09 3:01 ` Kewen.Lin 0 siblings, 1 reply; 10+ messages in thread From: Xionghu Luo @ 2022-08-08 3:42 UTC (permalink / raw) To: gcc-patches; +Cc: segher, linkw, Xionghu Luo The native RTL expression for vec_mrghw should be same for BE and LE as they are register and endian-independent. So both BE and LE need generate exactly same RTL with index [0 4 1 5] when expanding vec_mrghw with vec_select and vec_concat. (set (reg:V4SI 141) (vec_select:V4SI (vec_concat:V8SI (subreg:V4SI (reg:V16QI 139) 0) (subreg:V4SI (reg:V16QI 140) 0)) [const_int 0 4 1 5])) Then combine pass could do the nested vec_select optimization in simplify-rtx.c:simplify_binary_operation_1 also on both BE and LE: 21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel [0 4 1 5]) 24: {r151:SI=vec_select(r150:V4SI,parallel [const_int 3]);} => 21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel) 24: {r151:SI=vec_select(r146:V4SI,parallel [const_int 1]);} The endianness check need only once at ASM generation finally. ASM would be better due to nested vec_select simplified to simple scalar load. Regression tested pass for Power8{LE,BE}{32,64} and Power{9,10}LE{32,64} Linux(Thanks to Kewen), OK for master? Or should we revert r12-4496 to restore to the UNSPEC implementation? gcc/ChangeLog: PR target/106069 * config/rs6000/altivec.md (altivec_vmrghb): Emit same native RTL for BE and LE. (altivec_vmrghh): Likewise. (altivec_vmrghw): Likewise. (*altivec_vmrghsf): Adjust. (altivec_vmrglb): Likewise. (altivec_vmrglh): Likewise. (altivec_vmrglw): Likewise. (*altivec_vmrglsf): Adjust. (altivec_vmrghb_direct): Emit different ASM for BE and LE. (altivec_vmrghh_direct): Likewise. (altivec_vmrghw_direct_<mode>): Likewise. (altivec_vmrglb_direct): Likewise. (altivec_vmrglh_direct): Likewise. (altivec_vmrglw_direct_<mode>): Likewise. (vec_widen_smult_hi_v16qi): Adjust. 
(vec_widen_smult_lo_v16qi): Adjust. (vec_widen_umult_hi_v16qi): Adjust. (vec_widen_umult_lo_v16qi): Adjust. (vec_widen_smult_hi_v8hi): Adjust. (vec_widen_smult_lo_v8hi): Adjust. (vec_widen_umult_hi_v8hi): Adjust. (vec_widen_umult_lo_v8hi): Adjust. * config/rs6000/rs6000.cc (altivec_expand_vec_perm_const): Emit same native RTL for BE and LE. * config/rs6000/vsx.md (vsx_xxmrghw_<mode>): Likewise. (vsx_xxmrglw_<mode>): Likewise. gcc/testsuite/ChangeLog: PR target/106069 * gcc.target/powerpc/pr106069.C: New test. Signed-off-by: Xionghu Luo <xionghuluo@tencent.com> --- gcc/config/rs6000/altivec.md | 122 ++++++++++++-------- gcc/config/rs6000/rs6000.cc | 36 +++--- gcc/config/rs6000/vsx.md | 16 +-- gcc/testsuite/gcc.target/powerpc/pr106069.C | 118 +++++++++++++++++++ 4 files changed, 209 insertions(+), 83 deletions(-) create mode 100644 gcc/testsuite/gcc.target/powerpc/pr106069.C diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md index 2c4940f2e21..8d9c0109559 100644 --- a/gcc/config/rs6000/altivec.md +++ b/gcc/config/rs6000/altivec.md @@ -1144,11 +1144,7 @@ (define_expand "altivec_vmrghb" (use (match_operand:V16QI 2 "register_operand"))] "TARGET_ALTIVEC" { - rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? 
gen_altivec_vmrghb_direct - : gen_altivec_vmrglb_direct; - if (!BYTES_BIG_ENDIAN) - std::swap (operands[1], operands[2]); - emit_insn (fun (operands[0], operands[1], operands[2])); + emit_insn (gen_altivec_vmrghb_direct (operands[0], operands[1], operands[2])); DONE; }) @@ -1167,7 +1163,12 @@ (define_insn "altivec_vmrghb_direct" (const_int 6) (const_int 22) (const_int 7) (const_int 23)])))] "TARGET_ALTIVEC" - "vmrghb %0,%1,%2" + { + if (BYTES_BIG_ENDIAN) + return "vmrghb %0,%1,%2"; + else + return "vmrglb %0,%2,%1"; + } [(set_attr "type" "vecperm")]) (define_expand "altivec_vmrghh" @@ -1176,11 +1177,7 @@ (define_expand "altivec_vmrghh" (use (match_operand:V8HI 2 "register_operand"))] "TARGET_ALTIVEC" { - rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrghh_direct - : gen_altivec_vmrglh_direct; - if (!BYTES_BIG_ENDIAN) - std::swap (operands[1], operands[2]); - emit_insn (fun (operands[0], operands[1], operands[2])); + emit_insn (gen_altivec_vmrghh_direct (operands[0], operands[1], operands[2])); DONE; }) @@ -1195,7 +1192,12 @@ (define_insn "altivec_vmrghh_direct" (const_int 2) (const_int 10) (const_int 3) (const_int 11)])))] "TARGET_ALTIVEC" - "vmrghh %0,%1,%2" + { + if (BYTES_BIG_ENDIAN) + return "vmrghh %0,%1,%2"; + else + return "vmrglh %0,%2,%1"; + } [(set_attr "type" "vecperm")]) (define_expand "altivec_vmrghw" @@ -1204,12 +1206,8 @@ (define_expand "altivec_vmrghw" (use (match_operand:V4SI 2 "register_operand"))] "VECTOR_MEM_ALTIVEC_P (V4SImode)" { - rtx (*fun) (rtx, rtx, rtx); - fun = BYTES_BIG_ENDIAN ? 
gen_altivec_vmrghw_direct_v4si - : gen_altivec_vmrglw_direct_v4si; - if (!BYTES_BIG_ENDIAN) - std::swap (operands[1], operands[2]); - emit_insn (fun (operands[0], operands[1], operands[2])); + emit_insn ( + gen_altivec_vmrghw_direct_v4si (operands[0], operands[1], operands[2])); DONE; }) @@ -1222,9 +1220,22 @@ (define_insn "altivec_vmrghw_direct_<mode>" (parallel [(const_int 0) (const_int 4) (const_int 1) (const_int 5)])))] "TARGET_ALTIVEC" - "@ - xxmrghw %x0,%x1,%x2 - vmrghw %0,%1,%2" + { + if (which_alternative == 0) + { + if (BYTES_BIG_ENDIAN) + return "xxmrghw %x0,%x1,%x2"; + else + return "xxmrglw %x0,%x2,%x1"; + } + else + { + if (BYTES_BIG_ENDIAN) + return "vmrghw %0,%1,%2"; + else + return "vmrglw %0,%2,%1"; + } + } [(set_attr "type" "vecperm")]) (define_insn "*altivec_vmrghsf" @@ -1250,11 +1261,7 @@ (define_expand "altivec_vmrglb" (use (match_operand:V16QI 2 "register_operand"))] "TARGET_ALTIVEC" { - rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrglb_direct - : gen_altivec_vmrghb_direct; - if (!BYTES_BIG_ENDIAN) - std::swap (operands[1], operands[2]); - emit_insn (fun (operands[0], operands[1], operands[2])); + emit_insn (gen_altivec_vmrglb_direct (operands[0], operands[1], operands[2])); DONE; }) @@ -1273,7 +1280,12 @@ (define_insn "altivec_vmrglb_direct" (const_int 14) (const_int 30) (const_int 15) (const_int 31)])))] "TARGET_ALTIVEC" - "vmrglb %0,%1,%2" + { + if (BYTES_BIG_ENDIAN) + return "vmrglb %0,%1,%2"; + else + return "vmrghb %0,%2,%1"; + } [(set_attr "type" "vecperm")]) (define_expand "altivec_vmrglh" @@ -1282,11 +1294,7 @@ (define_expand "altivec_vmrglh" (use (match_operand:V8HI 2 "register_operand"))] "TARGET_ALTIVEC" { - rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? 
gen_altivec_vmrglh_direct - : gen_altivec_vmrghh_direct; - if (!BYTES_BIG_ENDIAN) - std::swap (operands[1], operands[2]); - emit_insn (fun (operands[0], operands[1], operands[2])); + emit_insn (gen_altivec_vmrglh_direct (operands[0], operands[1], operands[2])); DONE; }) @@ -1301,7 +1309,12 @@ (define_insn "altivec_vmrglh_direct" (const_int 6) (const_int 14) (const_int 7) (const_int 15)])))] "TARGET_ALTIVEC" - "vmrglh %0,%1,%2" + { + if (BYTES_BIG_ENDIAN) + return "vmrglh %0,%1,%2"; + else + return "vmrghh %0,%2,%1"; + } [(set_attr "type" "vecperm")]) (define_expand "altivec_vmrglw" @@ -1310,12 +1323,8 @@ (define_expand "altivec_vmrglw" (use (match_operand:V4SI 2 "register_operand"))] "VECTOR_MEM_ALTIVEC_P (V4SImode)" { - rtx (*fun) (rtx, rtx, rtx); - fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrglw_direct_v4si - : gen_altivec_vmrghw_direct_v4si; - if (!BYTES_BIG_ENDIAN) - std::swap (operands[1], operands[2]); - emit_insn (fun (operands[0], operands[1], operands[2])); + emit_insn ( + gen_altivec_vmrglw_direct_v4si (operands[0], operands[1], operands[2])); DONE; }) @@ -1328,9 +1337,22 @@ (define_insn "altivec_vmrglw_direct_<mode>" (parallel [(const_int 2) (const_int 6) (const_int 3) (const_int 7)])))] "TARGET_ALTIVEC" - "@ - xxmrglw %x0,%x1,%x2 - vmrglw %0,%1,%2" + { + if (which_alternative == 0) + { + if (BYTES_BIG_ENDIAN) + return "xxmrglw %x0,%x1,%x2"; + else + return "xxmrghw %x0,%x2,%x1"; + } + else + { + if (BYTES_BIG_ENDIAN) + return "vmrglw %0,%1,%2"; + else + return "vmrghw %0,%2,%1"; + } + } [(set_attr "type" "vecperm")]) (define_insn "*altivec_vmrglsf" @@ -3705,7 +3727,7 @@ (define_expand "vec_widen_umult_hi_v16qi" { emit_insn (gen_altivec_vmuloub (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmuleub (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrghh_direct (operands[0], vo, ve)); + emit_insn (gen_altivec_vmrglh_direct (operands[0], ve, vo)); } DONE; }) @@ -3730,7 +3752,7 @@ (define_expand "vec_widen_umult_lo_v16qi" { emit_insn 
(gen_altivec_vmuloub (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmuleub (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrglh_direct (operands[0], vo, ve)); + emit_insn (gen_altivec_vmrghh_direct (operands[0], ve, vo)); } DONE; }) @@ -3755,7 +3777,7 @@ (define_expand "vec_widen_smult_hi_v16qi" { emit_insn (gen_altivec_vmulosb (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmulesb (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrghh_direct (operands[0], vo, ve)); + emit_insn (gen_altivec_vmrglh_direct (operands[0], ve, vo)); } DONE; }) @@ -3780,7 +3802,7 @@ (define_expand "vec_widen_smult_lo_v16qi" { emit_insn (gen_altivec_vmulosb (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmulesb (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrglh_direct (operands[0], vo, ve)); + emit_insn (gen_altivec_vmrghh_direct (operands[0], ve, vo)); } DONE; }) @@ -3805,7 +3827,7 @@ (define_expand "vec_widen_umult_hi_v8hi" { emit_insn (gen_altivec_vmulouh (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmuleuh (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], vo, ve)); + emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], ve, vo)); } DONE; }) @@ -3830,7 +3852,7 @@ (define_expand "vec_widen_umult_lo_v8hi" { emit_insn (gen_altivec_vmulouh (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmuleuh (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], vo, ve)); + emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], ve, vo)); } DONE; }) @@ -3855,7 +3877,7 @@ (define_expand "vec_widen_smult_hi_v8hi" { emit_insn (gen_altivec_vmulosh (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmulesh (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], vo, ve)); + emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], ve, vo)); } DONE; }) @@ -3880,7 +3902,7 @@ (define_expand "vec_widen_smult_lo_v8hi" 
{ emit_insn (gen_altivec_vmulosh (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmulesh (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], vo, ve)); + emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], ve, vo)); } DONE; }) diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc index df491bee2ea..018bea9f2f8 100644 --- a/gcc/config/rs6000/rs6000.cc +++ b/gcc/config/rs6000/rs6000.cc @@ -22941,29 +22941,17 @@ altivec_expand_vec_perm_const (rtx target, rtx op0, rtx op1, {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vpkuwum_direct, {2, 3, 6, 7, 10, 11, 14, 15, 18, 19, 22, 23, 26, 27, 30, 31}}, - {OPTION_MASK_ALTIVEC, - BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghb_direct - : CODE_FOR_altivec_vmrglb_direct, + {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghb_direct, {0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23}}, - {OPTION_MASK_ALTIVEC, - BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghh_direct - : CODE_FOR_altivec_vmrglh_direct, + {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghh_direct, {0, 1, 16, 17, 2, 3, 18, 19, 4, 5, 20, 21, 6, 7, 22, 23}}, - {OPTION_MASK_ALTIVEC, - BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghw_direct_v4si - : CODE_FOR_altivec_vmrglw_direct_v4si, + {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghw_direct_v4si, {0, 1, 2, 3, 16, 17, 18, 19, 4, 5, 6, 7, 20, 21, 22, 23}}, - {OPTION_MASK_ALTIVEC, - BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglb_direct - : CODE_FOR_altivec_vmrghb_direct, + {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglb_direct, {8, 24, 9, 25, 10, 26, 11, 27, 12, 28, 13, 29, 14, 30, 15, 31}}, - {OPTION_MASK_ALTIVEC, - BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglh_direct - : CODE_FOR_altivec_vmrghh_direct, + {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglh_direct, {8, 9, 24, 25, 10, 11, 26, 27, 12, 13, 28, 29, 14, 15, 30, 31}}, - {OPTION_MASK_ALTIVEC, - BYTES_BIG_ENDIAN ? 
CODE_FOR_altivec_vmrglw_direct_v4si - : CODE_FOR_altivec_vmrghw_direct_v4si, + {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglw_direct_v4si, {8, 9, 10, 11, 24, 25, 26, 27, 12, 13, 14, 15, 28, 29, 30, 31}}, {OPTION_MASK_P8_VECTOR, BYTES_BIG_ENDIAN ? CODE_FOR_p8_vmrgew_v4sf_direct @@ -23146,9 +23134,15 @@ altivec_expand_vec_perm_const (rtx target, rtx op0, rtx op1, /* For little-endian, the two input operands must be swapped (or swapped back) to ensure proper right-to-left numbering - from 0 to 2N-1. */ - if (swapped ^ !BYTES_BIG_ENDIAN - && icode != CODE_FOR_vsx_xxpermdi_v16qi) + from 0 to 2N-1. Excludes the vmrg[lh][bhw] and xxpermdi ops. */ + if (swapped ^ !BYTES_BIG_ENDIAN) + if (!(icode == CODE_FOR_altivec_vmrghb_direct + || icode == CODE_FOR_altivec_vmrglb_direct + || icode == CODE_FOR_altivec_vmrghh_direct + || icode == CODE_FOR_altivec_vmrglh_direct + || icode == CODE_FOR_altivec_vmrghw_direct_v4si + || icode == CODE_FOR_altivec_vmrglw_direct_v4si + || icode == CODE_FOR_vsx_xxpermdi_v16qi)) std::swap (op0, op1); if (imode != V16QImode) { diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md index e226a93bbe5..b84f667e4b2 100644 --- a/gcc/config/rs6000/vsx.md +++ b/gcc/config/rs6000/vsx.md @@ -4688,12 +4688,8 @@ (define_expand "vsx_xxmrghw_<mode>" (const_int 1) (const_int 5)])))] "VECTOR_MEM_VSX_P (<MODE>mode)" { - rtx (*fun) (rtx, rtx, rtx); - fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrghw_direct_<mode> - : gen_altivec_vmrglw_direct_<mode>; - if (!BYTES_BIG_ENDIAN) - std::swap (operands[1], operands[2]); - emit_insn (fun (operands[0], operands[1], operands[2])); + emit_insn ( + gen_altivec_vmrghw_direct_v4si (operands[0], operands[1], operands[2])); DONE; } [(set_attr "type" "vecperm")]) @@ -4708,12 +4704,8 @@ (define_expand "vsx_xxmrglw_<mode>" (const_int 3) (const_int 7)])))] "VECTOR_MEM_VSX_P (<MODE>mode)" { - rtx (*fun) (rtx, rtx, rtx); - fun = BYTES_BIG_ENDIAN ? 
gen_altivec_vmrglw_direct_<mode> - : gen_altivec_vmrghw_direct_<mode>; - if (!BYTES_BIG_ENDIAN) - std::swap (operands[1], operands[2]); - emit_insn (fun (operands[0], operands[1], operands[2])); + emit_insn ( + gen_altivec_vmrglw_direct_v4si (operands[0], operands[1], operands[2])); DONE; } [(set_attr "type" "vecperm")]) diff --git a/gcc/testsuite/gcc.target/powerpc/pr106069.C b/gcc/testsuite/gcc.target/powerpc/pr106069.C new file mode 100644 index 00000000000..56219a74692 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/pr106069.C @@ -0,0 +1,118 @@ +/* { dg-do run } */ + +extern "C" void * +memcpy (void *, const void *, unsigned long); +typedef __attribute__ ((altivec (vector__))) unsigned native_simd_type; + +union +{ + native_simd_type V; + int R[4]; +} store_le_vec; + +struct S +{ + S () = default; + S (unsigned B0) + { + native_simd_type val{B0}; + m_simd = val; + } + void store_le (unsigned int out[]) + { + store_le_vec.V = m_simd; + unsigned int x0 = store_le_vec.R[0]; + memcpy (out, &x0, 1); + } + S rotl (unsigned int r) + { + native_simd_type rot{r}; + return __builtin_vec_rl (m_simd, rot); + } + void operator+= (S other) + { + m_simd = __builtin_vec_add (m_simd, other.m_simd); + } + void operator^= (S other) + { + m_simd = __builtin_vec_xor (m_simd, other.m_simd); + } + static void transpose (S &B0, S B1, S B2, S B3) + { + native_simd_type T0 = __builtin_vec_mergeh (B0.m_simd, B2.m_simd); + native_simd_type T1 = __builtin_vec_mergeh (B1.m_simd, B3.m_simd); + native_simd_type T2 = __builtin_vec_mergel (B0.m_simd, B2.m_simd); + native_simd_type T3 = __builtin_vec_mergel (B1.m_simd, B3.m_simd); + B0 = __builtin_vec_mergeh (T0, T1); + B3 = __builtin_vec_mergel (T2, T3); + } + S (native_simd_type x) : m_simd (x) {} + native_simd_type m_simd; +}; + +void +foo (unsigned int output[], unsigned state[]) +{ + S R00 = state[0]; + S R01 = state[0]; + S R02 = state[2]; + S R03 = state[0]; + S R05 = state[5]; + S R06 = state[6]; + S R07 = state[7]; + S R08 = 
state[8]; + S R09 = state[9]; + S R10 = state[10]; + S R11 = state[11]; + S R12 = state[12]; + S R13 = state[13]; + S R14 = state[4]; + S R15 = state[15]; + for (int r = 0; r != 10; ++r) + { + R09 += R13; + R11 += R15; + R05 ^= R09; + R06 ^= R10; + R07 ^= R11; + R07 = R07.rotl (7); + R00 += R05; + R01 += R06; + R02 += R07; + R15 ^= R00; + R12 ^= R01; + R13 ^= R02; + R00 += R05; + R01 += R06; + R02 += R07; + R15 ^= R00; + R12 = R12.rotl (8); + R13 = R13.rotl (8); + R10 += R15; + R11 += R12; + R08 += R13; + R09 += R14; + R05 ^= R10; + R06 ^= R11; + R07 ^= R08; + R05 = R05.rotl (7); + R06 = R06.rotl (7); + R07 = R07.rotl (7); + } + R00 += state[0]; + S::transpose (R00, R01, R02, R03); + R00.store_le (output); +} + +unsigned int res[1]; +unsigned main_state[]{1634760805, 60878, 2036477234, 6, + 0, 825562964, 1471091955, 1346092787, + 506976774, 4197066702, 518848283, 118491664, + 0, 0, 0, 0}; +int +main () +{ + foo (res, main_state); + if (res[0] != 0x41fcef98) + __builtin_abort (); +} -- 2.27.0 ^ permalink raw reply [flat|nested] 10+ messages in thread
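Editorial sketch, not part of the patch: the commit message argues that combine can fold a lane extract through the `[0 4 1 5]` merge. The portable C++ model below illustrates that index arithmetic under stated assumptions. All names (`model_vec_concat`, `model_vec_select`, `select_lane_simplified`, `nested_select_agrees`) are hypothetical helpers invented for illustration, not GCC functions, and the model only follows RTL element numbering; it says nothing about which instruction is finally emitted.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using V4 = std::array<unsigned, 4>;
using V8 = std::array<unsigned, 8>;
using Sel4 = std::array<std::size_t, 4>;

// Model of vec_concat: operand a supplies elements 0..3, operand b 4..7.
V8 model_vec_concat(const V4 &a, const V4 &b)
{
  V8 r{};
  for (std::size_t i = 0; i < 4; ++i)
    {
      r[i] = a[i];
      r[i + 4] = b[i];
    }
  return r;
}

// Model of vec_select: pick the listed elements of v, in order.
V4 model_vec_select(const V8 &v, const Sel4 &sel)
{
  V4 r{};
  for (std::size_t i = 0; i < 4; ++i)
    r[i] = v[sel[i]];
  return r;
}

// The nested-select simplification: extracting one lane of the merged
// vector only needs the selector entry for that lane, which names an
// element of a (index < 4) or of b (index - 4).
unsigned select_lane_simplified(const V4 &a, const V4 &b, const Sel4 &sel,
                                std::size_t lane)
{
  const std::size_t idx = sel[lane];
  return idx < 4 ? a[idx] : b[idx - 4];
}

// Checks that the simplified form agrees with the full merge for every
// lane, and that lane 3 of the [0 4 1 5] merge is element 1 of b, as in
// the insn 24 example from the commit message.
bool nested_select_agrees()
{
  const V4 a{10u, 11u, 12u, 13u};
  const V4 b{20u, 21u, 22u, 23u};
  const Sel4 mrghw{0, 4, 1, 5};
  const V4 merged = model_vec_select(model_vec_concat(a, b), mrghw);
  for (std::size_t lane = 0; lane < 4; ++lane)
    if (merged[lane] != select_lane_simplified(a, b, mrghw, lane))
      return false;
  return merged[3] == b[1];
}
```

Calling `nested_select_agrees ()` returns true in this model: selector entry 3 is 5, which falls in the second operand at element 5 - 4 = 1, matching the `vec_select(r146:V4SI,parallel [const_int 1])` form quoted above.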
* Re: [PATCH] rs6000: Fix incorrect RTL for Power LE when removing the UNSPECS [PR106069]
  2022-08-08  3:42 [PATCH] " Xionghu Luo
@ 2022-08-09  3:01 ` Kewen.Lin
  2022-08-10  6:39   ` [PATCH v2] " Xionghu Luo
  0 siblings, 1 reply; 10+ messages in thread
From: Kewen.Lin @ 2022-08-09  3:01 UTC (permalink / raw)
  To: Xionghu Luo; +Cc: segher, Xionghu Luo, gcc-patches, David Edelsohn

Hi Xionghu,

Thanks for the fix.

on 2022/8/8 11:42, Xionghu Luo wrote:
> The native RTL expression for vec_mrghw should be same for BE and LE as
> they are register and endian-independent. So both BE and LE need
> generate exactly same RTL with index [0 4 1 5] when expanding vec_mrghw
> with vec_select and vec_concat.
>
> (set (reg:V4SI 141) (vec_select:V4SI (vec_concat:V8SI
> (subreg:V4SI (reg:V16QI 139) 0)
> (subreg:V4SI (reg:V16QI 140) 0))
> [const_int 0 4 1 5]))
>
> Then combine pass could do the nested vec_select optimization
> in simplify-rtx.c:simplify_binary_operation_1 also on both BE and LE:
>
> 21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel [0 4 1 5])
> 24: {r151:SI=vec_select(r150:V4SI,parallel [const_int 3]);}
>
> =>
>
> 21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel)
> 24: {r151:SI=vec_select(r146:V4SI,parallel [const_int 1]);}
>
> The endianness check need only once at ASM generation finally.
> ASM would be better due to nested vec_select simplified to simple scalar
> load.
>
> Regression tested pass for Power8{LE,BE}{32,64} and Power{9,10}LE{32,64}

Sorry, no -m32 for LE testing. I noticed the attachment in that PR didn't
include the test case (though the changelog has it), so I re-tested it;
nothing changed. :)

> Linux(Thanks to Kewen), OK for master? Or should we revert r12-4496 to
> restore to the UNSPEC implementation?
>

I have some concerns about the changed "altivec_*_direct" patterns. IMHO the
suffix "_direct" normally indicates that the define_insn is mapped to the
corresponding hw insn directly.
With this change, for example, altivec_vmrghb_direct can be mapped into vmrghb or vmrglb, this looks misleading. Maybe we can add the corresponding _direct_le and _direct_be versions, both are mapped into the same insn but have different RTL patterns. Looking forward to Segher's and David's suggestions. > gcc/ChangeLog: > PR target/106069 > * config/rs6000/altivec.md (altivec_vmrghb): Emit same native > RTL for BE and LE. > (altivec_vmrghh): Likewise. > (altivec_vmrghw): Likewise. > (*altivec_vmrghsf): Adjust. > (altivec_vmrglb): Likewise. > (altivec_vmrglh): Likewise. > (altivec_vmrglw): Likewise. > (*altivec_vmrglsf): Adjust. > (altivec_vmrghb_direct): Emit different ASM for BE and LE. > (altivec_vmrghh_direct): Likewise. > (altivec_vmrghw_direct_<mode>): Likewise. > (altivec_vmrglb_direct): Likewise. > (altivec_vmrglh_direct): Likewise. > (altivec_vmrglw_direct_<mode>): Likewise. > (vec_widen_smult_hi_v16qi): Adjust. > (vec_widen_smult_lo_v16qi): Adjust. > (vec_widen_umult_hi_v16qi): Adjust. > (vec_widen_umult_lo_v16qi): Adjust. > (vec_widen_smult_hi_v8hi): Adjust. > (vec_widen_smult_lo_v8hi): Adjust. > (vec_widen_umult_hi_v8hi): Adjust. > (vec_widen_umult_lo_v8hi): Adjust. > * config/rs6000/rs6000.cc (altivec_expand_vec_perm_const): Emit same > native RTL for BE and LE. > * config/rs6000/vsx.md (vsx_xxmrghw_<mode>): Likewise. > (vsx_xxmrglw_<mode>): Likewise. > > gcc/testsuite/ChangeLog: > PR target/106069 > * gcc.target/powerpc/pr106069.C: New test. 
> > Signed-off-by: Xionghu Luo <xionghuluo@tencent.com> > --- > gcc/config/rs6000/altivec.md | 122 ++++++++++++-------- > gcc/config/rs6000/rs6000.cc | 36 +++--- > gcc/config/rs6000/vsx.md | 16 +-- > gcc/testsuite/gcc.target/powerpc/pr106069.C | 118 +++++++++++++++++++ > 4 files changed, 209 insertions(+), 83 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/powerpc/pr106069.C > > diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md > index 2c4940f2e21..8d9c0109559 100644 > --- a/gcc/config/rs6000/altivec.md > +++ b/gcc/config/rs6000/altivec.md > @@ -1144,11 +1144,7 @@ (define_expand "altivec_vmrghb" > (use (match_operand:V16QI 2 "register_operand"))] > "TARGET_ALTIVEC" > { > - rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrghb_direct > - : gen_altivec_vmrglb_direct; > - if (!BYTES_BIG_ENDIAN) > - std::swap (operands[1], operands[2]); > - emit_insn (fun (operands[0], operands[1], operands[2])); > + emit_insn (gen_altivec_vmrghb_direct (operands[0], operands[1], operands[2])); > DONE; > }) > > @@ -1167,7 +1163,12 @@ (define_insn "altivec_vmrghb_direct" > (const_int 6) (const_int 22) > (const_int 7) (const_int 23)])))] > "TARGET_ALTIVEC" > - "vmrghb %0,%1,%2" > + { > + if (BYTES_BIG_ENDIAN) > + return "vmrghb %0,%1,%2"; > + else > + return "vmrglb %0,%2,%1"; > + } > [(set_attr "type" "vecperm")]) > > (define_expand "altivec_vmrghh" > @@ -1176,11 +1177,7 @@ (define_expand "altivec_vmrghh" > (use (match_operand:V8HI 2 "register_operand"))] > "TARGET_ALTIVEC" > { > - rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? 
gen_altivec_vmrghh_direct > - : gen_altivec_vmrglh_direct; > - if (!BYTES_BIG_ENDIAN) > - std::swap (operands[1], operands[2]); > - emit_insn (fun (operands[0], operands[1], operands[2])); > + emit_insn (gen_altivec_vmrghh_direct (operands[0], operands[1], operands[2])); > DONE; > }) > > @@ -1195,7 +1192,12 @@ (define_insn "altivec_vmrghh_direct" > (const_int 2) (const_int 10) > (const_int 3) (const_int 11)])))] > "TARGET_ALTIVEC" > - "vmrghh %0,%1,%2" > + { > + if (BYTES_BIG_ENDIAN) > + return "vmrghh %0,%1,%2"; > + else > + return "vmrglh %0,%2,%1"; > + } > [(set_attr "type" "vecperm")]) > > (define_expand "altivec_vmrghw" > @@ -1204,12 +1206,8 @@ (define_expand "altivec_vmrghw" > (use (match_operand:V4SI 2 "register_operand"))] > "VECTOR_MEM_ALTIVEC_P (V4SImode)" > { > - rtx (*fun) (rtx, rtx, rtx); > - fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrghw_direct_v4si > - : gen_altivec_vmrglw_direct_v4si; > - if (!BYTES_BIG_ENDIAN) > - std::swap (operands[1], operands[2]); > - emit_insn (fun (operands[0], operands[1], operands[2])); > + emit_insn ( > + gen_altivec_vmrghw_direct_v4si (operands[0], operands[1], operands[2])); > DONE; > }) > [snip] > [(set_attr "type" "vecperm")]) > diff --git a/gcc/testsuite/gcc.target/powerpc/pr106069.C b/gcc/testsuite/gcc.target/powerpc/pr106069.C > new file mode 100644 > index 00000000000..56219a74692 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/powerpc/pr106069.C Since this is a C++ test case, it should be placed in gcc/testsuite/g++.target/powerpc/. 
> @@ -0,0 +1,118 @@ > +/* { dg-do run } */ This case requires altivec, it needs something like: /* { dg-require-effective-target vmx_hw } */ /* { dg-options "-maltivec" } */ BR, Kewen > + > +extern "C" void * > +memcpy (void *, const void *, unsigned long); > +typedef __attribute__ ((altivec (vector__))) unsigned native_simd_type; > + > +union > +{ > + native_simd_type V; > + int R[4]; > +} store_le_vec; > + > +struct S > +{ > + S () = default; > + S (unsigned B0) > + { > + native_simd_type val{B0}; > + m_simd = val; > + } > + void store_le (unsigned int out[]) > + { > + store_le_vec.V = m_simd; > + unsigned int x0 = store_le_vec.R[0]; > + memcpy (out, &x0, 1); > + } > + S rotl (unsigned int r) > + { > + native_simd_type rot{r}; > + return __builtin_vec_rl (m_simd, rot); > + } > + void operator+= (S other) > + { > + m_simd = __builtin_vec_add (m_simd, other.m_simd); > + } > + void operator^= (S other) > + { > + m_simd = __builtin_vec_xor (m_simd, other.m_simd); > + } > + static void transpose (S &B0, S B1, S B2, S B3) > + { > + native_simd_type T0 = __builtin_vec_mergeh (B0.m_simd, B2.m_simd); > + native_simd_type T1 = __builtin_vec_mergeh (B1.m_simd, B3.m_simd); > + native_simd_type T2 = __builtin_vec_mergel (B0.m_simd, B2.m_simd); > + native_simd_type T3 = __builtin_vec_mergel (B1.m_simd, B3.m_simd); > + B0 = __builtin_vec_mergeh (T0, T1); > + B3 = __builtin_vec_mergel (T2, T3); > + } > + S (native_simd_type x) : m_simd (x) {} > + native_simd_type m_simd; > +}; > + > +void > +foo (unsigned int output[], unsigned state[]) > +{ > + S R00 = state[0]; > + S R01 = state[0]; > + S R02 = state[2]; > + S R03 = state[0]; > + S R05 = state[5]; > + S R06 = state[6]; > + S R07 = state[7]; > + S R08 = state[8]; > + S R09 = state[9]; > + S R10 = state[10]; > + S R11 = state[11]; > + S R12 = state[12]; > + S R13 = state[13]; > + S R14 = state[4]; > + S R15 = state[15]; > + for (int r = 0; r != 10; ++r) > + { > + R09 += R13; > + R11 += R15; > + R05 ^= R09; > + R06 ^= R10; > + 
R07 ^= R11; > + R07 = R07.rotl (7); > + R00 += R05; > + R01 += R06; > + R02 += R07; > + R15 ^= R00; > + R12 ^= R01; > + R13 ^= R02; > + R00 += R05; > + R01 += R06; > + R02 += R07; > + R15 ^= R00; > + R12 = R12.rotl (8); > + R13 = R13.rotl (8); > + R10 += R15; > + R11 += R12; > + R08 += R13; > + R09 += R14; > + R05 ^= R10; > + R06 ^= R11; > + R07 ^= R08; > + R05 = R05.rotl (7); > + R06 = R06.rotl (7); > + R07 = R07.rotl (7); > + } > + R00 += state[0]; > + S::transpose (R00, R01, R02, R03); > + R00.store_le (output); > +} > + > +unsigned int res[1]; > +unsigned main_state[]{1634760805, 60878, 2036477234, 6, > + 0, 825562964, 1471091955, 1346092787, > + 506976774, 4197066702, 518848283, 118491664, > + 0, 0, 0, 0}; > +int > +main () > +{ > + foo (res, main_state); > + if (res[0] != 0x41fcef98) > + __builtin_abort (); > +} ^ permalink raw reply [flat|nested] 10+ messages in thread
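Editorial sketch, not part of the thread: the naming concern above comes from one RTL pattern mapping to different instructions per endianness (for example `vmrghw %0,%1,%2` on BE versus `vmrglw %0,%2,%1` on LE). The C++ model below checks that equivalence for the word-sized merge-high case. It assumes that the instruction semantics are defined in big-endian (left-to-right) element numbering and that, on little-endian, RTL element `i` of a four-element vector sits in hardware element `3 - i`. Every name here (`hw_vmrghw`, `hw_vmrglw`, `rtl_merge_high`, `flip`, `le_lowering`, `lowerings_match`) is a hypothetical helper for illustration only.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using V4 = std::array<unsigned, 4>;

// Instruction semantics in big-endian (left-to-right) element numbering.
V4 hw_vmrghw(const V4 &x, const V4 &y) { return {x[0], y[0], x[1], y[1]}; }
V4 hw_vmrglw(const V4 &x, const V4 &y) { return {x[2], y[2], x[3], y[3]}; }

// RTL semantics of the merge-high pattern, in RTL element numbering:
// vec_select [0 4 1 5] over vec_concat (op1, op2).
V4 rtl_merge_high(const V4 &op1, const V4 &op2)
{
  return {op1[0], op2[0], op1[1], op2[1]};
}

// Assumed LE mapping: RTL element i lives in hardware element 3 - i.
V4 flip(const V4 &v) { return {v[3], v[2], v[1], v[0]}; }

// Model of the LE template "vmrglw %0,%2,%1": view both operands as
// hardware registers, run the instruction with swapped inputs, then read
// the result back in RTL element order.
V4 le_lowering(const V4 &op1, const V4 &op2)
{
  return flip(hw_vmrglw(flip(op2), flip(op1)));
}

// On BE the RTL and hardware numbering coincide, so vmrghw is used as is;
// on LE the swapped vmrglw must give the same RTL-visible result.
bool lowerings_match()
{
  const V4 a{10u, 11u, 12u, 13u};
  const V4 b{20u, 21u, 22u, 23u};
  const V4 expected = rtl_merge_high(a, b);
  return hw_vmrghw(a, b) == expected && le_lowering(a, b) == expected;
}
```

In this model `lowerings_match ()` returns true, which is consistent with the patch emitting one RTL form for both endiannesses and deferring the endianness decision to the output template.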
* Re: [PATCH v2] rs6000: Fix incorrect RTL for Power LE when removing the UNSPECS [PR106069] 2022-08-09 3:01 ` Kewen.Lin @ 2022-08-10 6:39 ` Xionghu Luo 2022-08-10 17:07 ` Segher Boessenkool 0 siblings, 1 reply; 10+ messages in thread From: Xionghu Luo @ 2022-08-10 6:39 UTC (permalink / raw) To: Kewen.Lin; +Cc: segher, Xionghu Luo, gcc-patches, David Edelsohn On 2022/8/9 11:01, Kewen.Lin wrote: > Hi Xionghu, > > Thanks for the fix. > > on 2022/8/8 11:42, Xionghu Luo wrote: >> The native RTL expression for vec_mrghw should be same for BE and LE as >> they are register and endian-independent. So both BE and LE need >> generate exactly same RTL with index [0 4 1 5] when expanding vec_mrghw >> with vec_select and vec_concat. >> >> (set (reg:V4SI 141) (vec_select:V4SI (vec_concat:V8SI >> (subreg:V4SI (reg:V16QI 139) 0) >> (subreg:V4SI (reg:V16QI 140) 0)) >> [const_int 0 4 1 5])) >> >> Then combine pass could do the nested vec_select optimization >> in simplify-rtx.c:simplify_binary_operation_1 also on both BE and LE: >> >> 21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel [0 4 1 5]) >> 24: {r151:SI=vec_select(r150:V4SI,parallel [const_int 3]);} >> >> => >> >> 21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel) >> 24: {r151:SI=vec_select(r146:V4SI,parallel [const_int 1]);} >> >> The endianness check need only once at ASM generation finally. >> ASM would be better due to nested vec_select simplified to simple scalar >> load. >> >> Regression tested pass for Power8{LE,BE}{32,64} and Power{9,10}LE{32,64} > > Sorry, no -m32 for LE testing. I noticed the attachement in that PR didn't > include the test case (though the changelog has it), so I re-tested it > again, nothing changed. :) > >> Linux(Thanks to Kewen), OK for master? Or should we revert r12-4496 to >> restore to the UNSPEC implementation? 
>> > > I have some concern on those changed "altivec_*_direct", IMHO the suffix > "_direct" is normally to indicate the define_insn is mapped to the > corresponding hw insn directly. With this change, for example, > altivec_vmrghb_direct can be mapped into vmrghb or vmrglb, this looks > misleading. Maybe we can add the corresponding _direct_le and _direct_be > versions, both are mapped into the same insn but have different RTL > patterns. Looking forward to Segher's and David's suggestions. > Thanks! Do you mean same RTL patterns with different hw insn? Updated as: v2: Split the direct pattern to be and le with same RTL but different insn. The native RTL expression for vec_mrghw should be same for BE and LE as they are register and endian-independent. So both BE and LE need generate exactly same RTL with index [0 4 1 5] when expanding vec_mrghw with vec_select and vec_concat. (set (reg:V4SI 141) (vec_select:V4SI (vec_concat:V8SI (subreg:V4SI (reg:V16QI 139) 0) (subreg:V4SI (reg:V16QI 140) 0)) [const_int 0 4 1 5])) Then combine pass could do the nested vec_select optimization in simplify-rtx.c:simplify_binary_operation_1 also on both BE and LE: 21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel [0 4 1 5]) 24: {r151:SI=vec_select(r150:V4SI,parallel [const_int 3]);} => 21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel) 24: {r151:SI=vec_select(r146:V4SI,parallel [const_int 1]);} The endianness check need only once at ASM generation finally. ASM would be better due to nested vec_select simplified to simple scalar load. Regression tested pass for Power8{LE,BE}{32,64} and Power{9,10}LE{32,64} Linux(Thanks to Kewen), OK for master? Or should we revert r12-4496 to restore to the UNSPEC implementation? gcc/ChangeLog: PR target/106069 * config/rs6000/altivec.md (altivec_vmrghb): Emit same native RTL for BE and LE. (altivec_vmrghh): Likewise. (altivec_vmrghw): Likewise. (*altivec_vmrghsf): Adjust. (altivec_vmrglb): Likewise. 
(altivec_vmrglh): Likewise. (altivec_vmrglw): Likewise. (*altivec_vmrglsf): Adjust. (altivec_vmrghb_direct): Emit different ASM for BE and LE. (altivec_vmrghh_direct): Likewise. (altivec_vmrghw_direct_<mode>): Likewise. (altivec_vmrglb_direct): Likewise. (altivec_vmrglh_direct): Likewise. (altivec_vmrglw_direct_<mode>): Likewise. (vec_widen_smult_hi_v16qi): Adjust. (vec_widen_smult_lo_v16qi): Adjust. (vec_widen_umult_hi_v16qi): Adjust. (vec_widen_umult_lo_v16qi): Adjust. (vec_widen_smult_hi_v8hi): Adjust. (vec_widen_smult_lo_v8hi): Adjust. (vec_widen_umult_hi_v8hi): Adjust. (vec_widen_umult_lo_v8hi): Adjust. * config/rs6000/rs6000.cc (altivec_expand_vec_perm_const): Emit same native RTL for BE and LE. * config/rs6000/vsx.md (vsx_xxmrghw_<mode>): Likewise. (vsx_xxmrglw_<mode>): Likewise. gcc/testsuite/ChangeLog: PR target/106069 * g++.target/powerpc/pr106069.C: New test. Signed-off-by: Xionghu Luo <xionghuluo@tencent.com> --- gcc/config/rs6000/altivec.md | 223 ++++++++++++++------ gcc/config/rs6000/rs6000.cc | 36 ++-- gcc/config/rs6000/vsx.md | 26 +-- gcc/testsuite/g++.target/powerpc/pr106069.C | 120 +++++++++++ 4 files changed, 303 insertions(+), 102 deletions(-) create mode 100644 gcc/testsuite/g++.target/powerpc/pr106069.C diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md index 2c4940f2e21..f5c7a89de7c 100644 --- a/gcc/config/rs6000/altivec.md +++ b/gcc/config/rs6000/altivec.md @@ -1144,15 +1144,17 @@ (define_expand "altivec_vmrghb" (use (match_operand:V16QI 2 "register_operand"))] "TARGET_ALTIVEC" { - rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? 
gen_altivec_vmrghb_direct - : gen_altivec_vmrglb_direct; - if (!BYTES_BIG_ENDIAN) - std::swap (operands[1], operands[2]); - emit_insn (fun (operands[0], operands[1], operands[2])); + rtvec v = gen_rtvec (16, GEN_INT (0), GEN_INT (16), GEN_INT (1), GEN_INT (17), + GEN_INT (2), GEN_INT (18), GEN_INT (3), GEN_INT (19), + GEN_INT (4), GEN_INT (20), GEN_INT (5), GEN_INT (21), + GEN_INT (6), GEN_INT (22), GEN_INT (7), GEN_INT (23)); + rtx x = gen_rtx_VEC_CONCAT (V32QImode, operands[1], operands[2]); + x = gen_rtx_VEC_SELECT (V16QImode, x, gen_rtx_PARALLEL (VOIDmode, v)); + emit_insn (gen_rtx_SET (operands[0], x)); DONE; }) -(define_insn "altivec_vmrghb_direct" +(define_insn "altivec_vmrghb_direct_be" [(set (match_operand:V16QI 0 "register_operand" "=v") (vec_select:V16QI (vec_concat:V32QI @@ -1166,27 +1168,46 @@ (define_insn "altivec_vmrghb_direct" (const_int 5) (const_int 21) (const_int 6) (const_int 22) (const_int 7) (const_int 23)])))] - "TARGET_ALTIVEC" + "TARGET_ALTIVEC && BYTES_BIG_ENDIAN" "vmrghb %0,%1,%2" [(set_attr "type" "vecperm")]) +(define_insn "altivec_vmrghb_direct_le" + [(set (match_operand:V16QI 0 "register_operand" "=v") + (vec_select:V16QI + (vec_concat:V32QI + (match_operand:V16QI 1 "register_operand" "v") + (match_operand:V16QI 2 "register_operand" "v")) + (parallel [(const_int 0) (const_int 16) + (const_int 1) (const_int 17) + (const_int 2) (const_int 18) + (const_int 3) (const_int 19) + (const_int 4) (const_int 20) + (const_int 5) (const_int 21) + (const_int 6) (const_int 22) + (const_int 7) (const_int 23)])))] + "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN" + "vmrglb %0,%2,%1" + [(set_attr "type" "vecperm")]) + (define_expand "altivec_vmrghh" [(use (match_operand:V8HI 0 "register_operand")) (use (match_operand:V8HI 1 "register_operand")) (use (match_operand:V8HI 2 "register_operand"))] "TARGET_ALTIVEC" { - rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? 
gen_altivec_vmrghh_direct - : gen_altivec_vmrglh_direct; - if (!BYTES_BIG_ENDIAN) - std::swap (operands[1], operands[2]); - emit_insn (fun (operands[0], operands[1], operands[2])); + rtvec v = gen_rtvec (8, GEN_INT (0), GEN_INT (8), GEN_INT (1), GEN_INT (9), + GEN_INT (2), GEN_INT (10), GEN_INT (3), GEN_INT (11)); + rtx x = gen_rtx_VEC_CONCAT (V16HImode, operands[1], operands[2]); + + x = gen_rtx_VEC_SELECT (V8HImode, x, gen_rtx_PARALLEL (VOIDmode, v)); + emit_insn (gen_rtx_SET (operands[0], x)); DONE; }) -(define_insn "altivec_vmrghh_direct" +(define_insn "altivec_vmrghh_direct_be" [(set (match_operand:V8HI 0 "register_operand" "=v") - (vec_select:V8HI + (vec_select:V8HI (vec_concat:V16HI (match_operand:V8HI 1 "register_operand" "v") (match_operand:V8HI 2 "register_operand" "v")) @@ -1194,26 +1215,38 @@ (define_insn "altivec_vmrghh_direct" (const_int 1) (const_int 9) (const_int 2) (const_int 10) (const_int 3) (const_int 11)])))] - "TARGET_ALTIVEC" + "TARGET_ALTIVEC && BYTES_BIG_ENDIAN" "vmrghh %0,%1,%2" [(set_attr "type" "vecperm")]) +(define_insn "altivec_vmrghh_direct_le" + [(set (match_operand:V8HI 0 "register_operand" "=v") + (vec_select:V8HI + (vec_concat:V16HI + (match_operand:V8HI 1 "register_operand" "v") + (match_operand:V8HI 2 "register_operand" "v")) + (parallel [(const_int 0) (const_int 8) + (const_int 1) (const_int 9) + (const_int 2) (const_int 10) + (const_int 3) (const_int 11)])))] + "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN" + "vmrglh %0,%2,%1" + [(set_attr "type" "vecperm")]) + (define_expand "altivec_vmrghw" [(use (match_operand:V4SI 0 "register_operand")) (use (match_operand:V4SI 1 "register_operand")) (use (match_operand:V4SI 2 "register_operand"))] "VECTOR_MEM_ALTIVEC_P (V4SImode)" { - rtx (*fun) (rtx, rtx, rtx); - fun = BYTES_BIG_ENDIAN ? 
gen_altivec_vmrghw_direct_v4si - : gen_altivec_vmrglw_direct_v4si; - if (!BYTES_BIG_ENDIAN) - std::swap (operands[1], operands[2]); - emit_insn (fun (operands[0], operands[1], operands[2])); + rtvec v = gen_rtvec (4, GEN_INT (0), GEN_INT (4), GEN_INT (1), GEN_INT (5)); + rtx x = gen_rtx_VEC_CONCAT (V8SImode, operands[1], operands[2]); + x = gen_rtx_VEC_SELECT (V4SImode, x, gen_rtx_PARALLEL (VOIDmode, v)); + emit_insn (gen_rtx_SET (operands[0], x)); DONE; }) -(define_insn "altivec_vmrghw_direct_<mode>" +(define_insn "altivec_vmrghw_direct_<mode>_be" [(set (match_operand:VSX_W 0 "register_operand" "=wa,v") (vec_select:VSX_W (vec_concat:<VS_double> @@ -1221,10 +1254,24 @@ (define_insn "altivec_vmrghw_direct_<mode>" (match_operand:VSX_W 2 "register_operand" "wa,v")) (parallel [(const_int 0) (const_int 4) (const_int 1) (const_int 5)])))] - "TARGET_ALTIVEC" + "TARGET_ALTIVEC && BYTES_BIG_ENDIAN" + "@ + xxmrghw %x0,%x1,%x2 + vmrghw %0,%1,%2" + [(set_attr "type" "vecperm")]) + +(define_insn "altivec_vmrghw_direct_<mode>_le" + [(set (match_operand:VSX_W 0 "register_operand" "=wa,v") + (vec_select:VSX_W + (vec_concat:<VS_double> + (match_operand:VSX_W 1 "register_operand" "wa,v") + (match_operand:VSX_W 2 "register_operand" "wa,v")) + (parallel [(const_int 0) (const_int 4) + (const_int 1) (const_int 5)])))] + "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN" "@ - xxmrghw %x0,%x1,%x2 - vmrghw %0,%1,%2" + xxmrglw %x0,%x2,%x1 + vmrglw %0,%2,%1" [(set_attr "type" "vecperm")]) (define_insn "*altivec_vmrghsf" @@ -1250,15 +1297,17 @@ (define_expand "altivec_vmrglb" (use (match_operand:V16QI 2 "register_operand"))] "TARGET_ALTIVEC" { - rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? 
gen_altivec_vmrglb_direct - : gen_altivec_vmrghb_direct; - if (!BYTES_BIG_ENDIAN) - std::swap (operands[1], operands[2]); - emit_insn (fun (operands[0], operands[1], operands[2])); + rtvec v = gen_rtvec (16, GEN_INT (8), GEN_INT (24), GEN_INT (9), GEN_INT (25), + GEN_INT (10), GEN_INT (26), GEN_INT (11), GEN_INT (27), + GEN_INT (12), GEN_INT (28), GEN_INT (13), GEN_INT (29), + GEN_INT (14), GEN_INT (30), GEN_INT (15), GEN_INT (31)); + rtx x = gen_rtx_VEC_CONCAT (V32QImode, operands[1], operands[2]); + x = gen_rtx_VEC_SELECT (V16QImode, x, gen_rtx_PARALLEL (VOIDmode, v)); + emit_insn (gen_rtx_SET (operands[0], x)); DONE; }) -(define_insn "altivec_vmrglb_direct" +(define_insn "altivec_vmrglb_direct_be" [(set (match_operand:V16QI 0 "register_operand" "=v") (vec_select:V16QI (vec_concat:V32QI @@ -1272,25 +1321,43 @@ (define_insn "altivec_vmrglb_direct" (const_int 13) (const_int 29) (const_int 14) (const_int 30) (const_int 15) (const_int 31)])))] - "TARGET_ALTIVEC" + "TARGET_ALTIVEC && BYTES_BIG_ENDIAN" "vmrglb %0,%1,%2" [(set_attr "type" "vecperm")]) +(define_insn "altivec_vmrglb_direct_le" + [(set (match_operand:V16QI 0 "register_operand" "=v") + (vec_select:V16QI + (vec_concat:V32QI + (match_operand:V16QI 1 "register_operand" "v") + (match_operand:V16QI 2 "register_operand" "v")) + (parallel [(const_int 8) (const_int 24) + (const_int 9) (const_int 25) + (const_int 10) (const_int 26) + (const_int 11) (const_int 27) + (const_int 12) (const_int 28) + (const_int 13) (const_int 29) + (const_int 14) (const_int 30) + (const_int 15) (const_int 31)])))] + "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN" + "vmrghb %0,%2,%1" + [(set_attr "type" "vecperm")]) + (define_expand "altivec_vmrglh" [(use (match_operand:V8HI 0 "register_operand")) (use (match_operand:V8HI 1 "register_operand")) (use (match_operand:V8HI 2 "register_operand"))] "TARGET_ALTIVEC" { - rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? 
gen_altivec_vmrglh_direct - : gen_altivec_vmrghh_direct; - if (!BYTES_BIG_ENDIAN) - std::swap (operands[1], operands[2]); - emit_insn (fun (operands[0], operands[1], operands[2])); + rtvec v = gen_rtvec (8, GEN_INT (4), GEN_INT (12), GEN_INT (5), GEN_INT (13), + GEN_INT (6), GEN_INT (14), GEN_INT (7), GEN_INT (15)); + rtx x = gen_rtx_VEC_CONCAT (V16HImode, operands[1], operands[2]); + x = gen_rtx_VEC_SELECT (V8HImode, x, gen_rtx_PARALLEL (VOIDmode, v)); + emit_insn (gen_rtx_SET (operands[0], x)); DONE; }) -(define_insn "altivec_vmrglh_direct" +(define_insn "altivec_vmrglh_direct_be" [(set (match_operand:V8HI 0 "register_operand" "=v") (vec_select:V8HI (vec_concat:V16HI @@ -1300,26 +1367,38 @@ (define_insn "altivec_vmrglh_direct" (const_int 5) (const_int 13) (const_int 6) (const_int 14) (const_int 7) (const_int 15)])))] - "TARGET_ALTIVEC" + "TARGET_ALTIVEC && BYTES_BIG_ENDIAN" "vmrglh %0,%1,%2" [(set_attr "type" "vecperm")]) +(define_insn "altivec_vmrglh_direct_le" + [(set (match_operand:V8HI 0 "register_operand" "=v") + (vec_select:V8HI + (vec_concat:V16HI + (match_operand:V8HI 1 "register_operand" "v") + (match_operand:V8HI 2 "register_operand" "v")) + (parallel [(const_int 4) (const_int 12) + (const_int 5) (const_int 13) + (const_int 6) (const_int 14) + (const_int 7) (const_int 15)])))] + "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN" + "vmrghh %0,%2,%1" + [(set_attr "type" "vecperm")]) + (define_expand "altivec_vmrglw" [(use (match_operand:V4SI 0 "register_operand")) (use (match_operand:V4SI 1 "register_operand")) (use (match_operand:V4SI 2 "register_operand"))] "VECTOR_MEM_ALTIVEC_P (V4SImode)" { - rtx (*fun) (rtx, rtx, rtx); - fun = BYTES_BIG_ENDIAN ? 
gen_altivec_vmrglw_direct_v4si - : gen_altivec_vmrghw_direct_v4si; - if (!BYTES_BIG_ENDIAN) - std::swap (operands[1], operands[2]); - emit_insn (fun (operands[0], operands[1], operands[2])); + rtvec v = gen_rtvec (4, GEN_INT (2), GEN_INT (6), GEN_INT (3), GEN_INT (7)); + rtx x = gen_rtx_VEC_CONCAT (V8SImode, operands[1], operands[2]); + x = gen_rtx_VEC_SELECT (V4SImode, x, gen_rtx_PARALLEL (VOIDmode, v)); + emit_insn (gen_rtx_SET (operands[0], x)); DONE; }) -(define_insn "altivec_vmrglw_direct_<mode>" +(define_insn "altivec_vmrglw_direct_<mode>_be" [(set (match_operand:VSX_W 0 "register_operand" "=wa,v") (vec_select:VSX_W (vec_concat:<VS_double> @@ -1327,10 +1406,24 @@ (define_insn "altivec_vmrglw_direct_<mode>" (match_operand:VSX_W 2 "register_operand" "wa,v")) (parallel [(const_int 2) (const_int 6) (const_int 3) (const_int 7)])))] - "TARGET_ALTIVEC" + "TARGET_ALTIVEC && BYTES_BIG_ENDIAN" + "@ + xxmrglw %x0,%x1,%x2 + vmrglw %0,%1,%2" + [(set_attr "type" "vecperm")]) + +(define_insn "altivec_vmrglw_direct_<mode>_le" + [(set (match_operand:VSX_W 0 "register_operand" "=wa,v") + (vec_select:VSX_W + (vec_concat:<VS_double> + (match_operand:VSX_W 1 "register_operand" "wa,v") + (match_operand:VSX_W 2 "register_operand" "wa,v")) + (parallel [(const_int 2) (const_int 6) + (const_int 3) (const_int 7)])))] + "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN" "@ - xxmrglw %x0,%x1,%x2 - vmrglw %0,%1,%2" + xxmrghw %x0,%x2,%x1 + vmrghw %0,%2,%1" [(set_attr "type" "vecperm")]) (define_insn "*altivec_vmrglsf" @@ -3699,13 +3792,13 @@ (define_expand "vec_widen_umult_hi_v16qi" { emit_insn (gen_altivec_vmuleub (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmuloub (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrghh_direct (operands[0], ve, vo)); + emit_insn (gen_altivec_vmrghh_direct_be (operands[0], ve, vo)); } else { emit_insn (gen_altivec_vmuloub (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmuleub (vo, operands[1], operands[2])); - emit_insn 
(gen_altivec_vmrghh_direct (operands[0], vo, ve)); + emit_insn (gen_altivec_vmrglh_direct_le (operands[0], ve, vo)); } DONE; }) @@ -3724,13 +3817,13 @@ (define_expand "vec_widen_umult_lo_v16qi" { emit_insn (gen_altivec_vmuleub (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmuloub (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrglh_direct (operands[0], ve, vo)); + emit_insn (gen_altivec_vmrglh_direct_be (operands[0], ve, vo)); } else { emit_insn (gen_altivec_vmuloub (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmuleub (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrglh_direct (operands[0], vo, ve)); + emit_insn (gen_altivec_vmrghh_direct_le (operands[0], ve, vo)); } DONE; }) @@ -3749,13 +3842,13 @@ (define_expand "vec_widen_smult_hi_v16qi" { emit_insn (gen_altivec_vmulesb (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmulosb (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrghh_direct (operands[0], ve, vo)); + emit_insn (gen_altivec_vmrghh_direct_be (operands[0], ve, vo)); } else { emit_insn (gen_altivec_vmulosb (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmulesb (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrghh_direct (operands[0], vo, ve)); + emit_insn (gen_altivec_vmrglh_direct_le (operands[0], ve, vo)); } DONE; }) @@ -3774,13 +3867,13 @@ (define_expand "vec_widen_smult_lo_v16qi" { emit_insn (gen_altivec_vmulesb (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmulosb (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrglh_direct (operands[0], ve, vo)); + emit_insn (gen_altivec_vmrglh_direct_be (operands[0], ve, vo)); } else { emit_insn (gen_altivec_vmulosb (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmulesb (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrglh_direct (operands[0], vo, ve)); + emit_insn (gen_altivec_vmrghh_direct_le (operands[0], ve, vo)); } DONE; }) @@ -3799,13 +3892,13 @@ (define_expand "vec_widen_umult_hi_v8hi" 
{ emit_insn (gen_altivec_vmuleuh (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmulouh (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], ve, vo)); + emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0], ve, vo)); } else { emit_insn (gen_altivec_vmulouh (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmuleuh (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], vo, ve)); + emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0], ve, vo)); } DONE; }) @@ -3824,13 +3917,13 @@ (define_expand "vec_widen_umult_lo_v8hi" { emit_insn (gen_altivec_vmuleuh (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmulouh (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], ve, vo)); + emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0], ve, vo)); } else { emit_insn (gen_altivec_vmulouh (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmuleuh (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], vo, ve)); + emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0], ve, vo)); } DONE; }) @@ -3849,13 +3942,13 @@ (define_expand "vec_widen_smult_hi_v8hi" { emit_insn (gen_altivec_vmulesh (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmulosh (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], ve, vo)); + emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0], ve, vo)); } else { emit_insn (gen_altivec_vmulosh (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmulesh (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], vo, ve)); + emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0], ve, vo)); } DONE; }) @@ -3874,13 +3967,13 @@ (define_expand "vec_widen_smult_lo_v8hi" { emit_insn (gen_altivec_vmulesh (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmulosh (vo, operands[1], operands[2])); - 
emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], ve, vo)); + emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0], ve, vo)); } else { emit_insn (gen_altivec_vmulosh (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmulesh (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], vo, ve)); + emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0], ve, vo)); } DONE; }) diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc index df491bee2ea..97da7706f63 100644 --- a/gcc/config/rs6000/rs6000.cc +++ b/gcc/config/rs6000/rs6000.cc @@ -22941,29 +22941,17 @@ altivec_expand_vec_perm_const (rtx target, rtx op0, rtx op1, {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vpkuwum_direct, {2, 3, 6, 7, 10, 11, 14, 15, 18, 19, 22, 23, 26, 27, 30, 31}}, - {OPTION_MASK_ALTIVEC, - BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghb_direct - : CODE_FOR_altivec_vmrglb_direct, + {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghb_direct_be, {0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23}}, - {OPTION_MASK_ALTIVEC, - BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghh_direct - : CODE_FOR_altivec_vmrglh_direct, + {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghh_direct_be, {0, 1, 16, 17, 2, 3, 18, 19, 4, 5, 20, 21, 6, 7, 22, 23}}, - {OPTION_MASK_ALTIVEC, - BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghw_direct_v4si - : CODE_FOR_altivec_vmrglw_direct_v4si, + {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghw_direct_v4si_be, {0, 1, 2, 3, 16, 17, 18, 19, 4, 5, 6, 7, 20, 21, 22, 23}}, - {OPTION_MASK_ALTIVEC, - BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglb_direct - : CODE_FOR_altivec_vmrghb_direct, + {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglb_direct_be, {8, 24, 9, 25, 10, 26, 11, 27, 12, 28, 13, 29, 14, 30, 15, 31}}, - {OPTION_MASK_ALTIVEC, - BYTES_BIG_ENDIAN ? 
CODE_FOR_altivec_vmrglh_direct - : CODE_FOR_altivec_vmrghh_direct, + {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglh_direct_be, {8, 9, 24, 25, 10, 11, 26, 27, 12, 13, 28, 29, 14, 15, 30, 31}}, - {OPTION_MASK_ALTIVEC, - BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglw_direct_v4si - : CODE_FOR_altivec_vmrghw_direct_v4si, + {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglw_direct_v4si_be, {8, 9, 10, 11, 24, 25, 26, 27, 12, 13, 14, 15, 28, 29, 30, 31}}, {OPTION_MASK_P8_VECTOR, BYTES_BIG_ENDIAN ? CODE_FOR_p8_vmrgew_v4sf_direct @@ -23146,9 +23134,15 @@ altivec_expand_vec_perm_const (rtx target, rtx op0, rtx op1, /* For little-endian, the two input operands must be swapped (or swapped back) to ensure proper right-to-left numbering - from 0 to 2N-1. */ - if (swapped ^ !BYTES_BIG_ENDIAN - && icode != CODE_FOR_vsx_xxpermdi_v16qi) + from 0 to 2N-1. Excludes the vmrg[lh][bhw] and xxpermdi ops. */ + if (swapped ^ !BYTES_BIG_ENDIAN) + if (!(icode == CODE_FOR_altivec_vmrghb_direct_be + || icode == CODE_FOR_altivec_vmrglb_direct_be + || icode == CODE_FOR_altivec_vmrghh_direct_be + || icode == CODE_FOR_altivec_vmrglh_direct_be + || icode == CODE_FOR_altivec_vmrghw_direct_v4si_be + || icode == CODE_FOR_altivec_vmrglw_direct_v4si_be + || icode == CODE_FOR_vsx_xxpermdi_v16qi)) std::swap (op0, op1); if (imode != V16QImode) { diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md index e226a93bbe5..c46d7e4f643 100644 --- a/gcc/config/rs6000/vsx.md +++ b/gcc/config/rs6000/vsx.md @@ -4678,7 +4678,7 @@ (define_insn "vsx_xxspltd_<mode>" [(set_attr "type" "vecperm")]) ;; V4SF/V4SI interleave -(define_expand "vsx_xxmrghw_<mode>" +(define_insn "vsx_xxmrghw_<mode>" [(set (match_operand:VSX_W 0 "vsx_register_operand" "=wa") (vec_select:VSX_W (vec_concat:<VS_double> @@ -4688,17 +4688,14 @@ (define_expand "vsx_xxmrghw_<mode>" (const_int 1) (const_int 5)])))] "VECTOR_MEM_VSX_P (<MODE>mode)" { - rtx (*fun) (rtx, rtx, rtx); - fun = BYTES_BIG_ENDIAN ? 
gen_altivec_vmrghw_direct_<mode> - : gen_altivec_vmrglw_direct_<mode>; - if (!BYTES_BIG_ENDIAN) - std::swap (operands[1], operands[2]); - emit_insn (fun (operands[0], operands[1], operands[2])); - DONE; + if (BYTES_BIG_ENDIAN) + return "xxmrghw %x0,%x1,%x2"; + else + return "xxmrglw %x0,%x2,%x1"; } [(set_attr "type" "vecperm")]) -(define_expand "vsx_xxmrglw_<mode>" +(define_insn "vsx_xxmrglw_<mode>" [(set (match_operand:VSX_W 0 "vsx_register_operand" "=wa") (vec_select:VSX_W (vec_concat:<VS_double> @@ -4708,13 +4705,10 @@ (define_expand "vsx_xxmrglw_<mode>" (const_int 3) (const_int 7)])))] "VECTOR_MEM_VSX_P (<MODE>mode)" { - rtx (*fun) (rtx, rtx, rtx); - fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrglw_direct_<mode> - : gen_altivec_vmrghw_direct_<mode>; - if (!BYTES_BIG_ENDIAN) - std::swap (operands[1], operands[2]); - emit_insn (fun (operands[0], operands[1], operands[2])); - DONE; + if (BYTES_BIG_ENDIAN) + return "xxmrglw %x0,%x1,%x2"; + else + return "xxmrghw %x0,%x2,%x1"; } [(set_attr "type" "vecperm")]) diff --git a/gcc/testsuite/g++.target/powerpc/pr106069.C b/gcc/testsuite/g++.target/powerpc/pr106069.C new file mode 100644 index 00000000000..2cde9b821e3 --- /dev/null +++ b/gcc/testsuite/g++.target/powerpc/pr106069.C @@ -0,0 +1,120 @@ +/* { dg-options "-O -fno-tree-forwprop -maltivec" } */ +/* { dg-require-effective-target vmx_hw } */ +/* { dg-do run } */ + +extern "C" void * +memcpy (void *, const void *, unsigned long); +typedef __attribute__ ((altivec (vector__))) unsigned native_simd_type; + +union +{ + native_simd_type V; + int R[4]; +} store_le_vec; + +struct S +{ + S () = default; + S (unsigned B0) + { + native_simd_type val{B0}; + m_simd = val; + } + void store_le (unsigned int out[]) + { + store_le_vec.V = m_simd; + unsigned int x0 = store_le_vec.R[0]; + memcpy (out, &x0, 4); + } + S rotl (unsigned int r) + { + native_simd_type rot{r}; + return __builtin_vec_rl (m_simd, rot); + } + void operator+= (S other) + { + m_simd = __builtin_vec_add (m_simd, 
other.m_simd); + } + void operator^= (S other) + { + m_simd = __builtin_vec_xor (m_simd, other.m_simd); + } + static void transpose (S &B0, S B1, S B2, S B3) + { + native_simd_type T0 = __builtin_vec_mergeh (B0.m_simd, B2.m_simd); + native_simd_type T1 = __builtin_vec_mergeh (B1.m_simd, B3.m_simd); + native_simd_type T2 = __builtin_vec_mergel (B0.m_simd, B2.m_simd); + native_simd_type T3 = __builtin_vec_mergel (B1.m_simd, B3.m_simd); + B0 = __builtin_vec_mergeh (T0, T1); + B3 = __builtin_vec_mergel (T2, T3); + } + S (native_simd_type x) : m_simd (x) {} + native_simd_type m_simd; +}; + +void +foo (unsigned int output[], unsigned state[]) +{ + S R00 = state[0]; + S R01 = state[0]; + S R02 = state[2]; + S R03 = state[0]; + S R05 = state[5]; + S R06 = state[6]; + S R07 = state[7]; + S R08 = state[8]; + S R09 = state[9]; + S R10 = state[10]; + S R11 = state[11]; + S R12 = state[12]; + S R13 = state[13]; + S R14 = state[4]; + S R15 = state[15]; + for (int r = 0; r != 10; ++r) + { + R09 += R13; + R11 += R15; + R05 ^= R09; + R06 ^= R10; + R07 ^= R11; + R07 = R07.rotl (7); + R00 += R05; + R01 += R06; + R02 += R07; + R15 ^= R00; + R12 ^= R01; + R13 ^= R02; + R00 += R05; + R01 += R06; + R02 += R07; + R15 ^= R00; + R12 = R12.rotl (8); + R13 = R13.rotl (8); + R10 += R15; + R11 += R12; + R08 += R13; + R09 += R14; + R05 ^= R10; + R06 ^= R11; + R07 ^= R08; + R05 = R05.rotl (7); + R06 = R06.rotl (7); + R07 = R07.rotl (7); + } + R00 += state[0]; + S::transpose (R00, R01, R02, R03); + R00.store_le (output); +} + +unsigned int res[1]; +unsigned main_state[]{1634760805, 60878, 2036477234, 6, + 0, 825562964, 1471091955, 1346092787, + 506976774, 4197066702, 518848283, 118491664, + 0, 0, 0, 0}; +int +main () +{ + foo (res, main_state); + if (res[0] != 0x41fcef98) + __builtin_abort (); +} -- 2.27.0 ^ permalink raw reply [flat|nested] 10+ messages in thread
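The combine simplification quoted in the patch description (taking element 3 of a [0 4 1 5] merge is the same as taking element 1 of the second input) can be checked with a small stand-alone model. This is only a sketch: the `V4` type and the `select_concat`, `nested` and `simplified` helpers are hypothetical names for illustration and are not part of GCC.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using V4 = std::array<int, 4>;

// Model of vec_select (vec_concat (a, b), sel) in RTL element order:
// an index below 4 picks from a, an index of 4..7 picks from b.
V4 select_concat(const V4 &a, const V4 &b, const std::array<int, 4> &sel) {
  V4 r{};
  for (std::size_t i = 0; i < 4; ++i)
    r[i] = sel[i] < 4 ? a[sel[i]] : b[sel[i] - 4];
  return r;
}

// Unsimplified form: build the merged vector, then select element 3.
int nested(const V4 &a, const V4 &b) {
  return select_concat(a, b, {0, 4, 1, 5})[3];
}

// Simplified form from the combine example: element 1 of the second input.
int simplified(const V4 &b) { return b[1]; }
```

Because the model uses only RTL element numbering, the equivalence does not depend on endianness, which is the property the patch relies on.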
* Re: [PATCH v2] rs6000: Fix incorrect RTL for Power LE when removing the UNSPECS [PR106069]
  2022-08-10  6:39 ` [PATCH v2] " Xionghu Luo
@ 2022-08-10 17:07   ` Segher Boessenkool
  2022-08-11  6:15     ` Xionghu Luo
  0 siblings, 1 reply; 10+ messages in thread
From: Segher Boessenkool @ 2022-08-10 17:07 UTC
To: Xionghu Luo; +Cc: Kewen.Lin, Xionghu Luo, gcc-patches, David Edelsohn

On Wed, Aug 10, 2022 at 02:39:02PM +0800, Xionghu Luo wrote:
> On 2022/8/9 11:01, Kewen.Lin wrote:
> >I have some concern on those changed "altivec_*_direct", IMHO the suffix
> >"_direct" is normally to indicate the define_insn is mapped to the
> >corresponding hw insn directly.  With this change, for example,
> >altivec_vmrghb_direct can be mapped into vmrghb or vmrglb, this looks
> >misleading.  Maybe we can add the corresponding _direct_le and _direct_be
> >versions, both are mapped into the same insn but have different RTL
> >patterns.  Looking forward to Segher's and David's suggestions.
>
> Thanks! Do you mean same RTL patterns with different hw insn?

A pattern called altivec_vmrghb_direct_le should always emit a vmrghb
instruction, never a vmrglb instead.  Misleading names are an expensive
problem.


Segher
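To make the naming question concrete, here is a small model of why the merge-high RTL maps to vmrghw on BE but to vmrglw with swapped inputs on LE. It is a sketch under stated assumptions: the instruction semantics are written in the hardware's left-to-right word numbering, RTL element i on LE is assumed to live in hardware word 3 - i, and all function names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Vec = std::array<std::uint32_t, 4>;

// Instruction semantics, words numbered left to right (hardware order).
Vec hw_vmrghw(const Vec &a, const Vec &b) { return {a[0], b[0], a[1], b[1]}; }
Vec hw_vmrglw(const Vec &a, const Vec &b) { return {a[2], b[2], a[3], b[3]}; }

// RTL semantics of vec_select (vec_concat (a, b), [0 4 1 5]) in element order.
Vec rtl_merge_high(const Vec &a, const Vec &b) {
  return {a[0], b[0], a[1], b[1]};
}

// Convert between LE element order and hardware word order (self-inverse).
Vec flip(const Vec &v) { return {v[3], v[2], v[1], v[0]}; }

// BE: element order equals hardware order, so vmrghw a,b matches the RTL.
bool be_mapping_holds(const Vec &a, const Vec &b) {
  return rtl_merge_high(a, b) == hw_vmrghw(a, b);
}

// LE: the same RTL is produced by vmrglw with the two inputs swapped.
bool le_mapping_holds(const Vec &a, const Vec &b) {
  return rtl_merge_high(a, b) == flip(hw_vmrglw(flip(b), flip(a)));
}
```

In this model the RTL is identical in both cases and only the emitted mnemonic and operand order differ, so a pattern name can follow either the RTL or the instruction it emits, but not both on LE.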
* Re: [PATCH v2] rs6000: Fix incorrect RTL for Power LE when removing the UNSPECS [PR106069]
  2022-08-10 17:07 ` Segher Boessenkool
@ 2022-08-11  6:15   ` Xionghu Luo
  2022-08-16  6:53     ` Kewen.Lin
  0 siblings, 1 reply; 10+ messages in thread
From: Xionghu Luo @ 2022-08-11 6:15 UTC
To: Segher Boessenkool; +Cc: Kewen.Lin, Xionghu Luo, gcc-patches, David Edelsohn

On 2022/8/11 01:07, Segher Boessenkool wrote:
> On Wed, Aug 10, 2022 at 02:39:02PM +0800, Xionghu Luo wrote:
>> On 2022/8/9 11:01, Kewen.Lin wrote:
>>> I have some concern on those changed "altivec_*_direct", IMHO the suffix
>>> "_direct" is normally to indicate the define_insn is mapped to the
>>> corresponding hw insn directly.  With this change, for example,
>>> altivec_vmrghb_direct can be mapped into vmrghb or vmrglb, this looks
>>> misleading.  Maybe we can add the corresponding _direct_le and _direct_be
>>> versions, both are mapped into the same insn but have different RTL
>>> patterns.  Looking forward to Segher's and David's suggestions.
>>
>> Thanks! Do you mean same RTL patterns with different hw insn?
>
> A pattern called altivec_vmrghb_direct_le should always emit a vmrghb
> instruction, never a vmrglb instead.  Misleading names are an expensive
> problem.
>

Thanks.  Then on LE platforms, if the user calls altivec_vmrghw, it will be
expanded to the RTL (vec_select (vec_concat R0 R1) (0 4 1 5)) and finally
matched to altivec_vmrglw_direct_v4si_le, which emits "vmrglw".  For BE it is
straightforward.  This seems clearer :-).  OK for master?


[PATCH v3] rs6000: Fix incorrect RTL for Power LE when removing the UNSPECS [PR106069]

v3: rename altivec_vmrghb_direct_le to altivec_vmrglb_direct_le to match
the actual output ASM vmrglb.  Likewise for all similar xxx_direct_le
patterns.
v2: Split the direct pattern into BE and LE versions with the same RTL but
different insns.

The native RTL expression for vec_mrghw should be the same for BE and LE,
as it is register- and endian-independent.  So both BE and LE need to
generate exactly the same RTL with index [0 4 1 5] when expanding vec_mrghw
with vec_select and vec_concat.

(set (reg:V4SI 141) (vec_select:V4SI (vec_concat:V8SI
	   (subreg:V4SI (reg:V16QI 139) 0)
	   (subreg:V4SI (reg:V16QI 140) 0))
	   [const_int 0 4 1 5]))

Then the combine pass can do the nested vec_select optimization
in simplify-rtx.c:simplify_binary_operation_1 on both BE and LE:

21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel [0 4 1 5])
24: {r151:SI=vec_select(r150:V4SI,parallel [const_int 3]);}

=>

21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel)
24: {r151:SI=vec_select(r146:V4SI,parallel [const_int 1]);}

The endianness check is then needed only once, at final ASM generation.
The ASM is better because the nested vec_select is simplified to a simple
scalar load.

Regression testing passes on Power8{LE,BE}{32,64} and Power{9,10}LE{64}
Linux (thanks to Kewen).

gcc/ChangeLog:

	PR target/106069
	* config/rs6000/altivec.md (altivec_vmrghb_direct): Remove.
	(altivec_vmrghb_direct_be): New pattern for BE.
	(altivec_vmrglb_direct_le): New pattern for LE.
	(altivec_vmrghh_direct): Remove.
	(altivec_vmrghh_direct_be): New pattern for BE.
	(altivec_vmrglh_direct_le): New pattern for LE.
	(altivec_vmrghw_direct_<mode>): Remove.
	(altivec_vmrghw_direct_<mode>_be): New pattern for BE.
	(altivec_vmrglw_direct_<mode>_le): New pattern for LE.
	(altivec_vmrglb_direct): Remove.
	(altivec_vmrglb_direct_be): New pattern for BE.
	(altivec_vmrghb_direct_le): New pattern for LE.
	(altivec_vmrglh_direct): Remove.
	(altivec_vmrglh_direct_be): New pattern for BE.
	(altivec_vmrghh_direct_le): New pattern for LE.
	(altivec_vmrglw_direct_<mode>): Remove.
	(altivec_vmrglw_direct_<mode>_be): New pattern for BE.
	(altivec_vmrghw_direct_<mode>_le): New pattern for LE.
	* config/rs6000/rs6000.cc (altivec_expand_vec_perm_const):
	Adjust.
	* config/rs6000/vsx.md: Likewise.

gcc/testsuite/ChangeLog:

	PR target/106069
	* g++.target/powerpc/pr106069.C: New test.
Signed-off-by: Xionghu Luo <xionghuluo@tencent.com> --- gcc/config/rs6000/altivec.md | 223 ++++++++++++++------ gcc/config/rs6000/rs6000.cc | 36 ++-- gcc/config/rs6000/vsx.md | 24 +-- gcc/testsuite/g++.target/powerpc/pr106069.C | 120 +++++++++++ 4 files changed, 305 insertions(+), 98 deletions(-) create mode 100644 gcc/testsuite/g++.target/powerpc/pr106069.C diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md index 2c4940f2e21..78245f470e9 100644 --- a/gcc/config/rs6000/altivec.md +++ b/gcc/config/rs6000/altivec.md @@ -1144,15 +1144,17 @@ (define_expand "altivec_vmrghb" (use (match_operand:V16QI 2 "register_operand"))] "TARGET_ALTIVEC" { - rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrghb_direct - : gen_altivec_vmrglb_direct; - if (!BYTES_BIG_ENDIAN) - std::swap (operands[1], operands[2]); - emit_insn (fun (operands[0], operands[1], operands[2])); + rtvec v = gen_rtvec (16, GEN_INT (0), GEN_INT (16), GEN_INT (1), GEN_INT (17), + GEN_INT (2), GEN_INT (18), GEN_INT (3), GEN_INT (19), + GEN_INT (4), GEN_INT (20), GEN_INT (5), GEN_INT (21), + GEN_INT (6), GEN_INT (22), GEN_INT (7), GEN_INT (23)); + rtx x = gen_rtx_VEC_CONCAT (V32QImode, operands[1], operands[2]); + x = gen_rtx_VEC_SELECT (V16QImode, x, gen_rtx_PARALLEL (VOIDmode, v)); + emit_insn (gen_rtx_SET (operands[0], x)); DONE; }) -(define_insn "altivec_vmrghb_direct" +(define_insn "altivec_vmrghb_direct_be" [(set (match_operand:V16QI 0 "register_operand" "=v") (vec_select:V16QI (vec_concat:V32QI @@ -1166,27 +1168,46 @@ (define_insn "altivec_vmrghb_direct" (const_int 5) (const_int 21) (const_int 6) (const_int 22) (const_int 7) (const_int 23)])))] - "TARGET_ALTIVEC" + "TARGET_ALTIVEC && BYTES_BIG_ENDIAN" "vmrghb %0,%1,%2" [(set_attr "type" "vecperm")]) +(define_insn "altivec_vmrglb_direct_le" + [(set (match_operand:V16QI 0 "register_operand" "=v") + (vec_select:V16QI + (vec_concat:V32QI + (match_operand:V16QI 1 "register_operand" "v") + (match_operand:V16QI 2 
"register_operand" "v")) + (parallel [(const_int 0) (const_int 16) + (const_int 1) (const_int 17) + (const_int 2) (const_int 18) + (const_int 3) (const_int 19) + (const_int 4) (const_int 20) + (const_int 5) (const_int 21) + (const_int 6) (const_int 22) + (const_int 7) (const_int 23)])))] + "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN" + "vmrglb %0,%2,%1" + [(set_attr "type" "vecperm")]) + (define_expand "altivec_vmrghh" [(use (match_operand:V8HI 0 "register_operand")) (use (match_operand:V8HI 1 "register_operand")) (use (match_operand:V8HI 2 "register_operand"))] "TARGET_ALTIVEC" { - rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrghh_direct - : gen_altivec_vmrglh_direct; - if (!BYTES_BIG_ENDIAN) - std::swap (operands[1], operands[2]); - emit_insn (fun (operands[0], operands[1], operands[2])); + rtvec v = gen_rtvec (8, GEN_INT (0), GEN_INT (8), GEN_INT (1), GEN_INT (9), + GEN_INT (2), GEN_INT (10), GEN_INT (3), GEN_INT (11)); + rtx x = gen_rtx_VEC_CONCAT (V16HImode, operands[1], operands[2]); + + x = gen_rtx_VEC_SELECT (V8HImode, x, gen_rtx_PARALLEL (VOIDmode, v)); + emit_insn (gen_rtx_SET (operands[0], x)); DONE; }) -(define_insn "altivec_vmrghh_direct" +(define_insn "altivec_vmrghh_direct_be" [(set (match_operand:V8HI 0 "register_operand" "=v") - (vec_select:V8HI + (vec_select:V8HI (vec_concat:V16HI (match_operand:V8HI 1 "register_operand" "v") (match_operand:V8HI 2 "register_operand" "v")) @@ -1194,26 +1215,38 @@ (define_insn "altivec_vmrghh_direct" (const_int 1) (const_int 9) (const_int 2) (const_int 10) (const_int 3) (const_int 11)])))] - "TARGET_ALTIVEC" + "TARGET_ALTIVEC && BYTES_BIG_ENDIAN" "vmrghh %0,%1,%2" [(set_attr "type" "vecperm")]) +(define_insn "altivec_vmrglh_direct_le" + [(set (match_operand:V8HI 0 "register_operand" "=v") + (vec_select:V8HI + (vec_concat:V16HI + (match_operand:V8HI 1 "register_operand" "v") + (match_operand:V8HI 2 "register_operand" "v")) + (parallel [(const_int 0) (const_int 8) + (const_int 1) (const_int 9) + (const_int 
2) (const_int 10) + (const_int 3) (const_int 11)])))] + "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN" + "vmrglh %0,%2,%1" + [(set_attr "type" "vecperm")]) + (define_expand "altivec_vmrghw" [(use (match_operand:V4SI 0 "register_operand")) (use (match_operand:V4SI 1 "register_operand")) (use (match_operand:V4SI 2 "register_operand"))] "VECTOR_MEM_ALTIVEC_P (V4SImode)" { - rtx (*fun) (rtx, rtx, rtx); - fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrghw_direct_v4si - : gen_altivec_vmrglw_direct_v4si; - if (!BYTES_BIG_ENDIAN) - std::swap (operands[1], operands[2]); - emit_insn (fun (operands[0], operands[1], operands[2])); + rtvec v = gen_rtvec (4, GEN_INT (0), GEN_INT (4), GEN_INT (1), GEN_INT (5)); + rtx x = gen_rtx_VEC_CONCAT (V8SImode, operands[1], operands[2]); + x = gen_rtx_VEC_SELECT (V4SImode, x, gen_rtx_PARALLEL (VOIDmode, v)); + emit_insn (gen_rtx_SET (operands[0], x)); DONE; }) -(define_insn "altivec_vmrghw_direct_<mode>" +(define_insn "altivec_vmrghw_direct_<mode>_be" [(set (match_operand:VSX_W 0 "register_operand" "=wa,v") (vec_select:VSX_W (vec_concat:<VS_double> @@ -1221,10 +1254,24 @@ (define_insn "altivec_vmrghw_direct_<mode>" (match_operand:VSX_W 2 "register_operand" "wa,v")) (parallel [(const_int 0) (const_int 4) (const_int 1) (const_int 5)])))] - "TARGET_ALTIVEC" + "TARGET_ALTIVEC && BYTES_BIG_ENDIAN" + "@ + xxmrghw %x0,%x1,%x2 + vmrghw %0,%1,%2" + [(set_attr "type" "vecperm")]) + +(define_insn "altivec_vmrglw_direct_<mode>_le" + [(set (match_operand:VSX_W 0 "register_operand" "=wa,v") + (vec_select:VSX_W + (vec_concat:<VS_double> + (match_operand:VSX_W 1 "register_operand" "wa,v") + (match_operand:VSX_W 2 "register_operand" "wa,v")) + (parallel [(const_int 0) (const_int 4) + (const_int 1) (const_int 5)])))] + "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN" "@ - xxmrghw %x0,%x1,%x2 - vmrghw %0,%1,%2" + xxmrglw %x0,%x2,%x1 + vmrglw %0,%2,%1" [(set_attr "type" "vecperm")]) (define_insn "*altivec_vmrghsf" @@ -1250,15 +1297,17 @@ (define_expand "altivec_vmrglb" (use 
(match_operand:V16QI 2 "register_operand"))] "TARGET_ALTIVEC" { - rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrglb_direct - : gen_altivec_vmrghb_direct; - if (!BYTES_BIG_ENDIAN) - std::swap (operands[1], operands[2]); - emit_insn (fun (operands[0], operands[1], operands[2])); + rtvec v = gen_rtvec (16, GEN_INT (8), GEN_INT (24), GEN_INT (9), GEN_INT (25), + GEN_INT (10), GEN_INT (26), GEN_INT (11), GEN_INT (27), + GEN_INT (12), GEN_INT (28), GEN_INT (13), GEN_INT (29), + GEN_INT (14), GEN_INT (30), GEN_INT (15), GEN_INT (31)); + rtx x = gen_rtx_VEC_CONCAT (V32QImode, operands[1], operands[2]); + x = gen_rtx_VEC_SELECT (V16QImode, x, gen_rtx_PARALLEL (VOIDmode, v)); + emit_insn (gen_rtx_SET (operands[0], x)); DONE; }) -(define_insn "altivec_vmrglb_direct" +(define_insn "altivec_vmrglb_direct_be" [(set (match_operand:V16QI 0 "register_operand" "=v") (vec_select:V16QI (vec_concat:V32QI @@ -1272,25 +1321,43 @@ (define_insn "altivec_vmrglb_direct" (const_int 13) (const_int 29) (const_int 14) (const_int 30) (const_int 15) (const_int 31)])))] - "TARGET_ALTIVEC" + "TARGET_ALTIVEC && BYTES_BIG_ENDIAN" "vmrglb %0,%1,%2" [(set_attr "type" "vecperm")]) +(define_insn "altivec_vmrghb_direct_le" + [(set (match_operand:V16QI 0 "register_operand" "=v") + (vec_select:V16QI + (vec_concat:V32QI + (match_operand:V16QI 1 "register_operand" "v") + (match_operand:V16QI 2 "register_operand" "v")) + (parallel [(const_int 8) (const_int 24) + (const_int 9) (const_int 25) + (const_int 10) (const_int 26) + (const_int 11) (const_int 27) + (const_int 12) (const_int 28) + (const_int 13) (const_int 29) + (const_int 14) (const_int 30) + (const_int 15) (const_int 31)])))] + "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN" + "vmrghb %0,%2,%1" + [(set_attr "type" "vecperm")]) + (define_expand "altivec_vmrglh" [(use (match_operand:V8HI 0 "register_operand")) (use (match_operand:V8HI 1 "register_operand")) (use (match_operand:V8HI 2 "register_operand"))] "TARGET_ALTIVEC" { - rtx (*fun) (rtx, rtx, 
rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrglh_direct - : gen_altivec_vmrghh_direct; - if (!BYTES_BIG_ENDIAN) - std::swap (operands[1], operands[2]); - emit_insn (fun (operands[0], operands[1], operands[2])); + rtvec v = gen_rtvec (8, GEN_INT (4), GEN_INT (12), GEN_INT (5), GEN_INT (13), + GEN_INT (6), GEN_INT (14), GEN_INT (7), GEN_INT (15)); + rtx x = gen_rtx_VEC_CONCAT (V16HImode, operands[1], operands[2]); + x = gen_rtx_VEC_SELECT (V8HImode, x, gen_rtx_PARALLEL (VOIDmode, v)); + emit_insn (gen_rtx_SET (operands[0], x)); DONE; }) -(define_insn "altivec_vmrglh_direct" +(define_insn "altivec_vmrglh_direct_be" [(set (match_operand:V8HI 0 "register_operand" "=v") (vec_select:V8HI (vec_concat:V16HI @@ -1300,26 +1367,38 @@ (define_insn "altivec_vmrglh_direct" (const_int 5) (const_int 13) (const_int 6) (const_int 14) (const_int 7) (const_int 15)])))] - "TARGET_ALTIVEC" + "TARGET_ALTIVEC && BYTES_BIG_ENDIAN" "vmrglh %0,%1,%2" [(set_attr "type" "vecperm")]) +(define_insn "altivec_vmrghh_direct_le" + [(set (match_operand:V8HI 0 "register_operand" "=v") + (vec_select:V8HI + (vec_concat:V16HI + (match_operand:V8HI 1 "register_operand" "v") + (match_operand:V8HI 2 "register_operand" "v")) + (parallel [(const_int 4) (const_int 12) + (const_int 5) (const_int 13) + (const_int 6) (const_int 14) + (const_int 7) (const_int 15)])))] + "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN" + "vmrghh %0,%2,%1" + [(set_attr "type" "vecperm")]) + (define_expand "altivec_vmrglw" [(use (match_operand:V4SI 0 "register_operand")) (use (match_operand:V4SI 1 "register_operand")) (use (match_operand:V4SI 2 "register_operand"))] "VECTOR_MEM_ALTIVEC_P (V4SImode)" { - rtx (*fun) (rtx, rtx, rtx); - fun = BYTES_BIG_ENDIAN ? 
gen_altivec_vmrglw_direct_v4si - : gen_altivec_vmrghw_direct_v4si; - if (!BYTES_BIG_ENDIAN) - std::swap (operands[1], operands[2]); - emit_insn (fun (operands[0], operands[1], operands[2])); + rtvec v = gen_rtvec (4, GEN_INT (2), GEN_INT (6), GEN_INT (3), GEN_INT (7)); + rtx x = gen_rtx_VEC_CONCAT (V8SImode, operands[1], operands[2]); + x = gen_rtx_VEC_SELECT (V4SImode, x, gen_rtx_PARALLEL (VOIDmode, v)); + emit_insn (gen_rtx_SET (operands[0], x)); DONE; }) -(define_insn "altivec_vmrglw_direct_<mode>" +(define_insn "altivec_vmrglw_direct_<mode>_be" [(set (match_operand:VSX_W 0 "register_operand" "=wa,v") (vec_select:VSX_W (vec_concat:<VS_double> @@ -1327,10 +1406,24 @@ (define_insn "altivec_vmrglw_direct_<mode>" (match_operand:VSX_W 2 "register_operand" "wa,v")) (parallel [(const_int 2) (const_int 6) (const_int 3) (const_int 7)])))] - "TARGET_ALTIVEC" + "TARGET_ALTIVEC && BYTES_BIG_ENDIAN" + "@ + xxmrglw %x0,%x1,%x2 + vmrglw %0,%1,%2" + [(set_attr "type" "vecperm")]) + +(define_insn "altivec_vmrghw_direct_<mode>_le" + [(set (match_operand:VSX_W 0 "register_operand" "=wa,v") + (vec_select:VSX_W + (vec_concat:<VS_double> + (match_operand:VSX_W 1 "register_operand" "wa,v") + (match_operand:VSX_W 2 "register_operand" "wa,v")) + (parallel [(const_int 2) (const_int 6) + (const_int 3) (const_int 7)])))] + "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN" "@ - xxmrglw %x0,%x1,%x2 - vmrglw %0,%1,%2" + xxmrghw %x0,%x2,%x1 + vmrghw %0,%2,%1" [(set_attr "type" "vecperm")]) (define_insn "*altivec_vmrglsf" @@ -3699,13 +3792,13 @@ (define_expand "vec_widen_umult_hi_v16qi" { emit_insn (gen_altivec_vmuleub (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmuloub (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrghh_direct (operands[0], ve, vo)); + emit_insn (gen_altivec_vmrghh_direct_be (operands[0], ve, vo)); } else { emit_insn (gen_altivec_vmuloub (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmuleub (vo, operands[1], operands[2])); - emit_insn 
(gen_altivec_vmrghh_direct (operands[0], vo, ve)); + emit_insn (gen_altivec_vmrghh_direct_le (operands[0], ve, vo)); } DONE; }) @@ -3724,13 +3817,13 @@ (define_expand "vec_widen_umult_lo_v16qi" { emit_insn (gen_altivec_vmuleub (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmuloub (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrglh_direct (operands[0], ve, vo)); + emit_insn (gen_altivec_vmrglh_direct_be (operands[0], ve, vo)); } else { emit_insn (gen_altivec_vmuloub (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmuleub (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrglh_direct (operands[0], vo, ve)); + emit_insn (gen_altivec_vmrglh_direct_le (operands[0], ve, vo)); } DONE; }) @@ -3749,13 +3842,13 @@ (define_expand "vec_widen_smult_hi_v16qi" { emit_insn (gen_altivec_vmulesb (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmulosb (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrghh_direct (operands[0], ve, vo)); + emit_insn (gen_altivec_vmrghh_direct_be (operands[0], ve, vo)); } else { emit_insn (gen_altivec_vmulosb (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmulesb (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrghh_direct (operands[0], vo, ve)); + emit_insn (gen_altivec_vmrghh_direct_le (operands[0], ve, vo)); } DONE; }) @@ -3774,13 +3867,13 @@ (define_expand "vec_widen_smult_lo_v16qi" { emit_insn (gen_altivec_vmulesb (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmulosb (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrglh_direct (operands[0], ve, vo)); + emit_insn (gen_altivec_vmrglh_direct_be (operands[0], ve, vo)); } else { emit_insn (gen_altivec_vmulosb (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmulesb (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrglh_direct (operands[0], vo, ve)); + emit_insn (gen_altivec_vmrglh_direct_le (operands[0], ve, vo)); } DONE; }) @@ -3799,13 +3892,13 @@ (define_expand "vec_widen_umult_hi_v8hi" 
{ emit_insn (gen_altivec_vmuleuh (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmulouh (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], ve, vo)); + emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0], ve, vo)); } else { emit_insn (gen_altivec_vmulouh (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmuleuh (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], vo, ve)); + emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0], ve, vo)); } DONE; }) @@ -3824,13 +3917,13 @@ (define_expand "vec_widen_umult_lo_v8hi" { emit_insn (gen_altivec_vmuleuh (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmulouh (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], ve, vo)); + emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0], ve, vo)); } else { emit_insn (gen_altivec_vmulouh (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmuleuh (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], vo, ve)); + emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0], ve, vo)); } DONE; }) @@ -3849,13 +3942,13 @@ (define_expand "vec_widen_smult_hi_v8hi" { emit_insn (gen_altivec_vmulesh (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmulosh (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], ve, vo)); + emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0], ve, vo)); } else { emit_insn (gen_altivec_vmulosh (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmulesh (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], vo, ve)); + emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0], ve, vo)); } DONE; }) @@ -3874,13 +3967,13 @@ (define_expand "vec_widen_smult_lo_v8hi" { emit_insn (gen_altivec_vmulesh (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmulosh (vo, operands[1], operands[2])); - 
emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], ve, vo)); + emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0], ve, vo)); } else { emit_insn (gen_altivec_vmulosh (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmulesh (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], vo, ve)); + emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0], ve, vo)); } DONE; }) diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc index df491bee2ea..97da7706f63 100644 --- a/gcc/config/rs6000/rs6000.cc +++ b/gcc/config/rs6000/rs6000.cc @@ -22941,29 +22941,17 @@ altivec_expand_vec_perm_const (rtx target, rtx op0, rtx op1, {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vpkuwum_direct, {2, 3, 6, 7, 10, 11, 14, 15, 18, 19, 22, 23, 26, 27, 30, 31}}, - {OPTION_MASK_ALTIVEC, - BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghb_direct - : CODE_FOR_altivec_vmrglb_direct, + {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghb_direct_be, {0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23}}, - {OPTION_MASK_ALTIVEC, - BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghh_direct - : CODE_FOR_altivec_vmrglh_direct, + {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghh_direct_be, {0, 1, 16, 17, 2, 3, 18, 19, 4, 5, 20, 21, 6, 7, 22, 23}}, - {OPTION_MASK_ALTIVEC, - BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghw_direct_v4si - : CODE_FOR_altivec_vmrglw_direct_v4si, + {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghw_direct_v4si_be, {0, 1, 2, 3, 16, 17, 18, 19, 4, 5, 6, 7, 20, 21, 22, 23}}, - {OPTION_MASK_ALTIVEC, - BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglb_direct - : CODE_FOR_altivec_vmrghb_direct, + {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglb_direct_be, {8, 24, 9, 25, 10, 26, 11, 27, 12, 28, 13, 29, 14, 30, 15, 31}}, - {OPTION_MASK_ALTIVEC, - BYTES_BIG_ENDIAN ? 
CODE_FOR_altivec_vmrglh_direct - : CODE_FOR_altivec_vmrghh_direct, + {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglh_direct_be, {8, 9, 24, 25, 10, 11, 26, 27, 12, 13, 28, 29, 14, 15, 30, 31}}, - {OPTION_MASK_ALTIVEC, - BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglw_direct_v4si - : CODE_FOR_altivec_vmrghw_direct_v4si, + {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglw_direct_v4si_be, {8, 9, 10, 11, 24, 25, 26, 27, 12, 13, 14, 15, 28, 29, 30, 31}}, {OPTION_MASK_P8_VECTOR, BYTES_BIG_ENDIAN ? CODE_FOR_p8_vmrgew_v4sf_direct @@ -23146,9 +23134,15 @@ altivec_expand_vec_perm_const (rtx target, rtx op0, rtx op1, /* For little-endian, the two input operands must be swapped (or swapped back) to ensure proper right-to-left numbering - from 0 to 2N-1. */ - if (swapped ^ !BYTES_BIG_ENDIAN - && icode != CODE_FOR_vsx_xxpermdi_v16qi) + from 0 to 2N-1. Excludes the vmrg[lh][bhw] and xxpermdi ops. */ + if (swapped ^ !BYTES_BIG_ENDIAN) + if (!(icode == CODE_FOR_altivec_vmrghb_direct_be + || icode == CODE_FOR_altivec_vmrglb_direct_be + || icode == CODE_FOR_altivec_vmrghh_direct_be + || icode == CODE_FOR_altivec_vmrglh_direct_be + || icode == CODE_FOR_altivec_vmrghw_direct_v4si_be + || icode == CODE_FOR_altivec_vmrglw_direct_v4si_be + || icode == CODE_FOR_vsx_xxpermdi_v16qi)) std::swap (op0, op1); if (imode != V16QImode) { diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md index e226a93bbe5..2ae1bce131d 100644 --- a/gcc/config/rs6000/vsx.md +++ b/gcc/config/rs6000/vsx.md @@ -4688,12 +4688,12 @@ (define_expand "vsx_xxmrghw_<mode>" (const_int 1) (const_int 5)])))] "VECTOR_MEM_VSX_P (<MODE>mode)" { - rtx (*fun) (rtx, rtx, rtx); - fun = BYTES_BIG_ENDIAN ? 
gen_altivec_vmrghw_direct_<mode> - : gen_altivec_vmrglw_direct_<mode>; - if (!BYTES_BIG_ENDIAN) - std::swap (operands[1], operands[2]); - emit_insn (fun (operands[0], operands[1], operands[2])); + if (BYTES_BIG_ENDIAN) + emit_insn ( + gen_altivec_vmrghw_direct_v4si_be (operands[0], operands[1], operands[2])); + else + emit_insn ( + gen_altivec_vmrglw_direct_v4si_le (operands[0], operands[1], operands[2])); DONE; } [(set_attr "type" "vecperm")]) @@ -4708,12 +4708,12 @@ (define_expand "vsx_xxmrglw_<mode>" (const_int 3) (const_int 7)])))] "VECTOR_MEM_VSX_P (<MODE>mode)" { - rtx (*fun) (rtx, rtx, rtx); - fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrglw_direct_<mode> - : gen_altivec_vmrghw_direct_<mode>; - if (!BYTES_BIG_ENDIAN) - std::swap (operands[1], operands[2]); - emit_insn (fun (operands[0], operands[1], operands[2])); + if (BYTES_BIG_ENDIAN) + emit_insn ( + gen_altivec_vmrglw_direct_v4si_be (operands[0], operands[1], operands[2])); + else + emit_insn ( + gen_altivec_vmrghw_direct_v4si_le (operands[0], operands[1], operands[2])); DONE; } [(set_attr "type" "vecperm")]) diff --git a/gcc/testsuite/g++.target/powerpc/pr106069.C b/gcc/testsuite/g++.target/powerpc/pr106069.C new file mode 100644 index 00000000000..2cde9b821e3 --- /dev/null +++ b/gcc/testsuite/g++.target/powerpc/pr106069.C @@ -0,0 +1,120 @@ +/* { dg-options "-O -fno-tree-forwprop -maltivec" } */ +/* { dg-require-effective-target vmx_hw } */ +/* { dg-do run } */ + +extern "C" void * +memcpy (void *, const void *, unsigned long); +typedef __attribute__ ((altivec (vector__))) unsigned native_simd_type; + +union +{ + native_simd_type V; + int R[4]; +} store_le_vec; + +struct S +{ + S () = default; + S (unsigned B0) + { + native_simd_type val{B0}; + m_simd = val; + } + void store_le (unsigned int out[]) + { + store_le_vec.V = m_simd; + unsigned int x0 = store_le_vec.R[0]; + memcpy (out, &x0, 4); + } + S rotl (unsigned int r) + { + native_simd_type rot{r}; + return __builtin_vec_rl (m_simd, rot); + } + void 
operator+= (S other) + { + m_simd = __builtin_vec_add (m_simd, other.m_simd); + } + void operator^= (S other) + { + m_simd = __builtin_vec_xor (m_simd, other.m_simd); + } + static void transpose (S &B0, S B1, S B2, S B3) + { + native_simd_type T0 = __builtin_vec_mergeh (B0.m_simd, B2.m_simd); + native_simd_type T1 = __builtin_vec_mergeh (B1.m_simd, B3.m_simd); + native_simd_type T2 = __builtin_vec_mergel (B0.m_simd, B2.m_simd); + native_simd_type T3 = __builtin_vec_mergel (B1.m_simd, B3.m_simd); + B0 = __builtin_vec_mergeh (T0, T1); + B3 = __builtin_vec_mergel (T2, T3); + } + S (native_simd_type x) : m_simd (x) {} + native_simd_type m_simd; +}; + +void +foo (unsigned int output[], unsigned state[]) +{ + S R00 = state[0]; + S R01 = state[0]; + S R02 = state[2]; + S R03 = state[0]; + S R05 = state[5]; + S R06 = state[6]; + S R07 = state[7]; + S R08 = state[8]; + S R09 = state[9]; + S R10 = state[10]; + S R11 = state[11]; + S R12 = state[12]; + S R13 = state[13]; + S R14 = state[4]; + S R15 = state[15]; + for (int r = 0; r != 10; ++r) + { + R09 += R13; + R11 += R15; + R05 ^= R09; + R06 ^= R10; + R07 ^= R11; + R07 = R07.rotl (7); + R00 += R05; + R01 += R06; + R02 += R07; + R15 ^= R00; + R12 ^= R01; + R13 ^= R02; + R00 += R05; + R01 += R06; + R02 += R07; + R15 ^= R00; + R12 = R12.rotl (8); + R13 = R13.rotl (8); + R10 += R15; + R11 += R12; + R08 += R13; + R09 += R14; + R05 ^= R10; + R06 ^= R11; + R07 ^= R08; + R05 = R05.rotl (7); + R06 = R06.rotl (7); + R07 = R07.rotl (7); + } + R00 += state[0]; + S::transpose (R00, R01, R02, R03); + R00.store_le (output); +} + +unsigned int res[1]; +unsigned main_state[]{1634760805, 60878, 2036477234, 6, + 0, 825562964, 1471091955, 1346092787, + 506976774, 4197066702, 518848283, 118491664, + 0, 0, 0, 0}; +int +main () +{ + foo (res, main_state); + if (res[0] != 0x41fcef98) + __builtin_abort (); +} -- 2.27.0
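The index arithmetic in the commit message can be checked with a small standalone model. The sketch below is plain C++ and not GCC code: `V4`, `V8`, `Sel4`, `Lane`, `fold_nested_select`, `hw_vmrghw`, `hw_vmrglw` and `reverse_lanes` are hypothetical names introduced only for illustration. It assumes that on LE, RTL element i of a four-word vector sits in hardware word 3 - i (hardware words numbered the BE way), and it models only the word merges. Under those assumptions it shows that selecting element 3 of a [0 4 1 5] merge folds to element 1 of the second operand, and that the endian-independent merge RTL corresponds on LE to the opposite-named instruction with swapped operands.

```cpp
#include <cstddef>

// Hypothetical 4-lane model; none of these names are GCC internals.
struct V4 { unsigned e[4]; };
struct V8 { unsigned e[8]; };
struct Sel4 { std::size_t i[4]; };

// RTL-style vec_concat: elements of a, then elements of b.
constexpr V8 vec_concat(const V4 &a, const V4 &b) {
  return V8{{a.e[0], a.e[1], a.e[2], a.e[3], b.e[0], b.e[1], b.e[2], b.e[3]}};
}

// RTL-style vec_select: pick elements of x by index.
constexpr V4 vec_select(const V8 &x, const Sel4 &s) {
  return V4{{x.e[s.i[0]], x.e[s.i[1]], x.e[s.i[2]], x.e[s.i[3]]}};
}

// Which operand (0 or 1) and which element a nested select of lane k hits.
struct Lane { int operand; std::size_t index; };
constexpr Lane fold_nested_select(const Sel4 &s, std::size_t k) {
  return s.i[k] < 4 ? Lane{0, s.i[k]} : Lane{1, s.i[k] - 4};
}

// Word merges as the hardware sees them (words numbered the BE way).
constexpr V4 hw_vmrghw(const V4 &a, const V4 &b) {
  return V4{{a.e[0], b.e[0], a.e[1], b.e[1]}};
}
constexpr V4 hw_vmrglw(const V4 &a, const V4 &b) {
  return V4{{a.e[2], b.e[2], a.e[3], b.e[3]}};
}

// Assumed LE mapping: RTL element i <-> hardware word 3 - i.
constexpr V4 reverse_lanes(const V4 &v) {
  return V4{{v.e[3], v.e[2], v.e[1], v.e[0]}};
}

constexpr bool same(const V4 &x, const V4 &y) {
  return x.e[0] == y.e[0] && x.e[1] == y.e[1]
         && x.e[2] == y.e[2] && x.e[3] == y.e[3];
}

constexpr V4 op1{{10, 11, 12, 13}};
constexpr V4 op2{{20, 21, 22, 23}};
constexpr Sel4 merge_high{{0, 4, 1, 5}};
constexpr Sel4 merge_low{{2, 6, 3, 7}};

// Endian-independent RTL results, in RTL element order.
constexpr V4 rtl_high = vec_select(vec_concat(op1, op2), merge_high);
constexpr V4 rtl_low = vec_select(vec_concat(op1, op2), merge_low);

// Element 3 of the [0 4 1 5] merge is element 1 of the second operand.
static_assert(rtl_high.e[3] == op2.e[1], "nested select folds to op2[1]");

// LE: [0 4 1 5] corresponds to "vmrglw %0,%2,%1" in this model.
static_assert(same(rtl_high,
                   reverse_lanes(hw_vmrglw(reverse_lanes(op2),
                                           reverse_lanes(op1)))),
              "LE merge-high maps to vmrglw with swapped operands");

// LE: [2 6 3 7] corresponds to "vmrghw %0,%2,%1" in this model.
static_assert(same(rtl_low,
                   reverse_lanes(hw_vmrghw(reverse_lanes(op2),
                                           reverse_lanes(op1)))),
              "LE merge-low maps to vmrghw with swapped operands");
```

All checks are `static_assert`s, so compiling the fragment is enough to confirm them. The byte and halfword merges would follow the same pattern with 16 and 8 lanes.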
* Re: [PATCH v2] rs6000: Fix incorrect RTL for Power LE when removing the UNSPECS [PR106069] 2022-08-11 6:15 ` Xionghu Luo @ 2022-08-16 6:53 ` Kewen.Lin 2022-08-17 6:23 ` [PATCH v4] " Xionghu Luo 0 siblings, 1 reply; 10+ messages in thread From: Kewen.Lin @ 2022-08-16 6:53 UTC (permalink / raw) To: Xionghu Luo; +Cc: Xionghu Luo, gcc-patches, David Edelsohn, Segher Boessenkool Hi Xionghu, Thanks for the updated version of the patch; some comments are inlined. on 2022/8/11 14:15, Xionghu Luo wrote: > > > On 2022/8/11 01:07, Segher Boessenkool wrote: >> On Wed, Aug 10, 2022 at 02:39:02PM +0800, Xionghu Luo wrote: >>> On 2022/8/9 11:01, Kewen.Lin wrote: >>>> I have some concerns about those changed "altivec_*_direct" patterns. IMHO the suffix >>>> "_direct" normally indicates that the define_insn is mapped to the >>>> corresponding hw insn directly. With this change, for example, >>>> altivec_vmrghb_direct can be mapped into vmrghb or vmrglb, which looks >>>> misleading. Maybe we can add the corresponding _direct_le and _direct_be >>>> versions; both are mapped into the same insn but have different RTL >>>> patterns. Looking forward to Segher's and David's suggestions. >>> >>> Thanks! Do you mean the same RTL patterns with different hw insns? >> >> A pattern called altivec_vmrghb_direct_le should always emit a vmrghb >> instruction, never a vmrglb instead. Misleading names are an expensive >> problem. >> >> > > Thanks. Then on LE platforms, if the user calls altivec_vmrghw, it will be > expanded to RTL (vec_select (vec_concat (R0 R1 (0 4 1 5))), and > finally matched to altivec_vmrglw_direct_v4si_le with ASM "vmrglw". > For BE it is just straightforward, which seems clearer :-). OK for master? > > > [PATCH v3] rs6000: Fix incorrect RTL for Power LE when removing the UNSPECS [PR106069] > > v3: rename altivec_vmrghb_direct_le to altivec_vmrglb_direct_le to match > the actual output ASM vmrglb. Likewise for all similar xxx_direct_le
> v2: Split the direct pattern to be and le with same RTL but different insn. > > The native RTL expression for vec_mrghw should be same for BE and LE as > they are register and endian-independent. So both BE and LE need > generate exactly same RTL with index [0 4 1 5] when expanding vec_mrghw > with vec_select and vec_concat. > > (set (reg:V4SI 141) (vec_select:V4SI (vec_concat:V8SI > (subreg:V4SI (reg:V16QI 139) 0) > (subreg:V4SI (reg:V16QI 140) 0)) > [const_int 0 4 1 5])) > > Then combine pass could do the nested vec_select optimization > in simplify-rtx.c:simplify_binary_operation_1 also on both BE and LE: > > 21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel [0 4 1 5]) > 24: {r151:SI=vec_select(r150:V4SI,parallel [const_int 3]);} > > => > > 21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel) > 24: {r151:SI=vec_select(r146:V4SI,parallel [const_int 1]);} > > The endianness check need only once at ASM generation finally. > ASM would be better due to nested vec_select simplified to simple scalar > load. > > Regression tested pass for Power8{LE,BE}{32,64} and Power{9,10}LE{64} > Linux(Thanks to Kewen). > > gcc/ChangeLog: > > PR target/106069 > * config/rs6000/altivec.md (altivec_vmrghb_direct): Remove. > (altivec_vmrghb_direct_be): New pattern for BE. > (altivec_vmrglb_direct_le): New pattern for LE. > (altivec_vmrghh_direct): Remove. > (altivec_vmrghh_direct_be): New pattern for BE. > (altivec_vmrglh_direct_le): New pattern for LE. > (altivec_vmrghw_direct_<mode>): Remove. > (altivec_vmrghw_direct_<mode>_be): New pattern for BE. > (altivec_vmrglw_direct_<mode>_le): New pattern for LE. > (altivec_vmrglb_direct): Remove. > (altivec_vmrglb_direct_be): New pattern for BE. > (altivec_vmrghb_direct_le): New pattern for LE. > (altivec_vmrglh_direct): Remove. > (altivec_vmrglh_direct_be): New pattern for BE. > (altivec_vmrghh_direct_le): New pattern for LE. > (altivec_vmrglw_direct_<mode>): Remove. 
> (altivec_vmrglw_direct_<mode>_be): New pattern for BE. > (altivec_vmrghw_direct_<mode>_le): New pattern for LE. > * config/rs6000/rs6000.cc (altivec_expand_vec_perm_const): > Adjust. > * config/rs6000/vsx.md: Likewise. > > gcc/testsuite/ChangeLog: > > PR target/106069 > * g++.target/powerpc/pr106069.C: New test. > > Signed-off-by: Xionghu Luo <xionghuluo@tencent.com> > --- > gcc/config/rs6000/altivec.md | 223 ++++++++++++++------ > gcc/config/rs6000/rs6000.cc | 36 ++-- > gcc/config/rs6000/vsx.md | 24 +-- > gcc/testsuite/g++.target/powerpc/pr106069.C | 120 +++++++++++ > 4 files changed, 305 insertions(+), 98 deletions(-) > create mode 100644 gcc/testsuite/g++.target/powerpc/pr106069.C > > diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md > index 2c4940f2e21..78245f470e9 100644 > --- a/gcc/config/rs6000/altivec.md > +++ b/gcc/config/rs6000/altivec.md > @@ -1144,15 +1144,17 @@ (define_expand "altivec_vmrghb" > (use (match_operand:V16QI 2 "register_operand"))] > "TARGET_ALTIVEC" > { > - rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrghb_direct > - : gen_altivec_vmrglb_direct; > - if (!BYTES_BIG_ENDIAN) > - std::swap (operands[1], operands[2]); > - emit_insn (fun (operands[0], operands[1], operands[2])); > + rtvec v = gen_rtvec (16, GEN_INT (0), GEN_INT (16), GEN_INT (1), GEN_INT (17), > + GEN_INT (2), GEN_INT (18), GEN_INT (3), GEN_INT (19), > + GEN_INT (4), GEN_INT (20), GEN_INT (5), GEN_INT (21), > + GEN_INT (6), GEN_INT (22), GEN_INT (7), GEN_INT (23)); > + rtx x = gen_rtx_VEC_CONCAT (V32QImode, operands[1], operands[2]); > + x = gen_rtx_VEC_SELECT (V16QImode, x, gen_rtx_PARALLEL (VOIDmode, v)); > + emit_insn (gen_rtx_SET (operands[0], x)); > DONE; > }) I think you can just call gen_altivec_vmrghb_direct_be and gen_altivec_vmrghb_direct_le separately here. Similar for some other define_expands. 
> > -(define_insn "altivec_vmrghb_direct" > +(define_insn "altivec_vmrghb_direct_be" > [(set (match_operand:V16QI 0 "register_operand" "=v") > (vec_select:V16QI > (vec_concat:V32QI > @@ -1166,27 +1168,46 @@ (define_insn "altivec_vmrghb_direct" > (const_int 5) (const_int 21) > (const_int 6) (const_int 22) > (const_int 7) (const_int 23)])))] > - "TARGET_ALTIVEC" > + "TARGET_ALTIVEC && BYTES_BIG_ENDIAN" > "vmrghb %0,%1,%2" > [(set_attr "type" "vecperm")]) > Could you move the following altivec_vmrghb_direct_le here? Then readers can easily check the difference between be and le for the same altivec_vmrghb_direct. Same comment applied for some other similar cases. > +(define_insn "altivec_vmrglb_direct_le" > + [(set (match_operand:V16QI 0 "register_operand" "=v") > + (vec_select:V16QI > + (vec_concat:V32QI > + (match_operand:V16QI 1 "register_operand" "v") > + (match_operand:V16QI 2 "register_operand" "v")) > + (parallel [(const_int 0) (const_int 16) > + (const_int 1) (const_int 17) > + (const_int 2) (const_int 18) > + (const_int 3) (const_int 19) > + (const_int 4) (const_int 20) > + (const_int 5) (const_int 21) > + (const_int 6) (const_int 22) > + (const_int 7) (const_int 23)])))] > + "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN" > + "vmrglb %0,%2,%1" > + [(set_attr "type" "vecperm")]) Could you update this pattern for assembly "vmrglb %0,%1,%2" instead of "vmrglb %0,%2,%1"? I checked the previous md before the culprit commit 0910c516a3d72af048, it emits "vmrglb %0,%1,%2" for altivec_vmrglb_direct. Same comment applied for some other similar cases. > + > (define_expand "altivec_vmrghh" > [(use (match_operand:V8HI 0 "register_operand")) > (use (match_operand:V8HI 1 "register_operand")) > (use (match_operand:V8HI 2 "register_operand"))] > "TARGET_ALTIVEC" > { > - rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? 
gen_altivec_vmrghh_direct > - : gen_altivec_vmrglh_direct; > - if (!BYTES_BIG_ENDIAN) > - std::swap (operands[1], operands[2]); > - emit_insn (fun (operands[0], operands[1], operands[2])); > + rtvec v = gen_rtvec (8, GEN_INT (0), GEN_INT (8), GEN_INT (1), GEN_INT (9), > + GEN_INT (2), GEN_INT (10), GEN_INT (3), GEN_INT (11)); > + rtx x = gen_rtx_VEC_CONCAT (V16HImode, operands[1], operands[2]); > + > + x = gen_rtx_VEC_SELECT (V8HImode, x, gen_rtx_PARALLEL (VOIDmode, v)); > + emit_insn (gen_rtx_SET (operands[0], x)); > DONE; > }) > > -(define_insn "altivec_vmrghh_direct" > +(define_insn "altivec_vmrghh_direct_be" > [(set (match_operand:V8HI 0 "register_operand" "=v") > - (vec_select:V8HI > + (vec_select:V8HI > (vec_concat:V16HI > (match_operand:V8HI 1 "register_operand" "v") > (match_operand:V8HI 2 "register_operand" "v")) > @@ -1194,26 +1215,38 @@ (define_insn "altivec_vmrghh_direct" > (const_int 1) (const_int 9) > (const_int 2) (const_int 10) > (const_int 3) (const_int 11)])))] > - "TARGET_ALTIVEC" > + "TARGET_ALTIVEC && BYTES_BIG_ENDIAN" > "vmrghh %0,%1,%2" > [(set_attr "type" "vecperm")]) > > +(define_insn "altivec_vmrglh_direct_le" > + [(set (match_operand:V8HI 0 "register_operand" "=v") > + (vec_select:V8HI > + (vec_concat:V16HI > + (match_operand:V8HI 1 "register_operand" "v") > + (match_operand:V8HI 2 "register_operand" "v")) > + (parallel [(const_int 0) (const_int 8) > + (const_int 1) (const_int 9) > + (const_int 2) (const_int 10) > + (const_int 3) (const_int 11)])))] > + "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN" > + "vmrglh %0,%2,%1" > + [(set_attr "type" "vecperm")]) > + > (define_expand "altivec_vmrghw" > [(use (match_operand:V4SI 0 "register_operand")) > (use (match_operand:V4SI 1 "register_operand")) > (use (match_operand:V4SI 2 "register_operand"))] > "VECTOR_MEM_ALTIVEC_P (V4SImode)" > { > - rtx (*fun) (rtx, rtx, rtx); > - fun = BYTES_BIG_ENDIAN ? 
gen_altivec_vmrghw_direct_v4si > - : gen_altivec_vmrglw_direct_v4si; > - if (!BYTES_BIG_ENDIAN) > - std::swap (operands[1], operands[2]); > - emit_insn (fun (operands[0], operands[1], operands[2])); > + rtvec v = gen_rtvec (4, GEN_INT (0), GEN_INT (4), GEN_INT (1), GEN_INT (5)); > + rtx x = gen_rtx_VEC_CONCAT (V8SImode, operands[1], operands[2]); > + x = gen_rtx_VEC_SELECT (V4SImode, x, gen_rtx_PARALLEL (VOIDmode, v)); > + emit_insn (gen_rtx_SET (operands[0], x)); > DONE; > }) > > -(define_insn "altivec_vmrghw_direct_<mode>" > +(define_insn "altivec_vmrghw_direct_<mode>_be" > [(set (match_operand:VSX_W 0 "register_operand" "=wa,v") > (vec_select:VSX_W > (vec_concat:<VS_double> > @@ -1221,10 +1254,24 @@ (define_insn "altivec_vmrghw_direct_<mode>" > (match_operand:VSX_W 2 "register_operand" "wa,v")) > (parallel [(const_int 0) (const_int 4) > (const_int 1) (const_int 5)])))] > - "TARGET_ALTIVEC" > + "TARGET_ALTIVEC && BYTES_BIG_ENDIAN" > + "@ > + xxmrghw %x0,%x1,%x2 > + vmrghw %0,%1,%2" > + [(set_attr "type" "vecperm")]) > + > +(define_insn "altivec_vmrglw_direct_<mode>_le" > + [(set (match_operand:VSX_W 0 "register_operand" "=wa,v") > + (vec_select:VSX_W > + (vec_concat:<VS_double> > + (match_operand:VSX_W 1 "register_operand" "wa,v") > + (match_operand:VSX_W 2 "register_operand" "wa,v")) > + (parallel [(const_int 0) (const_int 4) > + (const_int 1) (const_int 5)])))] > + "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN" > "@ > - xxmrghw %x0,%x1,%x2 > - vmrghw %0,%1,%2" > + xxmrglw %x0,%x2,%x1 > + vmrglw %0,%2,%1" > [(set_attr "type" "vecperm")]) > > (define_insn "*altivec_vmrghsf" > @@ -1250,15 +1297,17 @@ (define_expand "altivec_vmrglb" > (use (match_operand:V16QI 2 "register_operand"))] > "TARGET_ALTIVEC" > { > - rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? 
gen_altivec_vmrglb_direct > - : gen_altivec_vmrghb_direct; > - if (!BYTES_BIG_ENDIAN) > - std::swap (operands[1], operands[2]); > - emit_insn (fun (operands[0], operands[1], operands[2])); > + rtvec v = gen_rtvec (16, GEN_INT (8), GEN_INT (24), GEN_INT (9), GEN_INT (25), > + GEN_INT (10), GEN_INT (26), GEN_INT (11), GEN_INT (27), > + GEN_INT (12), GEN_INT (28), GEN_INT (13), GEN_INT (29), > + GEN_INT (14), GEN_INT (30), GEN_INT (15), GEN_INT (31)); > + rtx x = gen_rtx_VEC_CONCAT (V32QImode, operands[1], operands[2]); > + x = gen_rtx_VEC_SELECT (V16QImode, x, gen_rtx_PARALLEL (VOIDmode, v)); > + emit_insn (gen_rtx_SET (operands[0], x)); > DONE; > }) > > -(define_insn "altivec_vmrglb_direct" > +(define_insn "altivec_vmrglb_direct_be" > [(set (match_operand:V16QI 0 "register_operand" "=v") > (vec_select:V16QI > (vec_concat:V32QI > @@ -1272,25 +1321,43 @@ (define_insn "altivec_vmrglb_direct" > (const_int 13) (const_int 29) > (const_int 14) (const_int 30) > (const_int 15) (const_int 31)])))] > - "TARGET_ALTIVEC" > + "TARGET_ALTIVEC && BYTES_BIG_ENDIAN" > "vmrglb %0,%1,%2" > [(set_attr "type" "vecperm")]) > > +(define_insn "altivec_vmrghb_direct_le" > + [(set (match_operand:V16QI 0 "register_operand" "=v") > + (vec_select:V16QI > + (vec_concat:V32QI > + (match_operand:V16QI 1 "register_operand" "v") > + (match_operand:V16QI 2 "register_operand" "v")) > + (parallel [(const_int 8) (const_int 24) > + (const_int 9) (const_int 25) > + (const_int 10) (const_int 26) > + (const_int 11) (const_int 27) > + (const_int 12) (const_int 28) > + (const_int 13) (const_int 29) > + (const_int 14) (const_int 30) > + (const_int 15) (const_int 31)])))] > + "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN" > + "vmrghb %0,%2,%1" > + [(set_attr "type" "vecperm")]) > + > (define_expand "altivec_vmrglh" > [(use (match_operand:V8HI 0 "register_operand")) > (use (match_operand:V8HI 1 "register_operand")) > (use (match_operand:V8HI 2 "register_operand"))] > "TARGET_ALTIVEC" > { > - rtx (*fun) (rtx, rtx, rtx) = 
BYTES_BIG_ENDIAN ? gen_altivec_vmrglh_direct > - : gen_altivec_vmrghh_direct; > - if (!BYTES_BIG_ENDIAN) > - std::swap (operands[1], operands[2]); > - emit_insn (fun (operands[0], operands[1], operands[2])); > + rtvec v = gen_rtvec (8, GEN_INT (4), GEN_INT (12), GEN_INT (5), GEN_INT (13), > + GEN_INT (6), GEN_INT (14), GEN_INT (7), GEN_INT (15)); > + rtx x = gen_rtx_VEC_CONCAT (V16HImode, operands[1], operands[2]); > + x = gen_rtx_VEC_SELECT (V8HImode, x, gen_rtx_PARALLEL (VOIDmode, v)); > + emit_insn (gen_rtx_SET (operands[0], x)); > DONE; > }) > > -(define_insn "altivec_vmrglh_direct" > +(define_insn "altivec_vmrglh_direct_be" > [(set (match_operand:V8HI 0 "register_operand" "=v") > (vec_select:V8HI > (vec_concat:V16HI > @@ -1300,26 +1367,38 @@ (define_insn "altivec_vmrglh_direct" > (const_int 5) (const_int 13) > (const_int 6) (const_int 14) > (const_int 7) (const_int 15)])))] > - "TARGET_ALTIVEC" > + "TARGET_ALTIVEC && BYTES_BIG_ENDIAN" > "vmrglh %0,%1,%2" > [(set_attr "type" "vecperm")]) > > +(define_insn "altivec_vmrghh_direct_le" > + [(set (match_operand:V8HI 0 "register_operand" "=v") > + (vec_select:V8HI > + (vec_concat:V16HI > + (match_operand:V8HI 1 "register_operand" "v") > + (match_operand:V8HI 2 "register_operand" "v")) > + (parallel [(const_int 4) (const_int 12) > + (const_int 5) (const_int 13) > + (const_int 6) (const_int 14) > + (const_int 7) (const_int 15)])))] > + "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN" > + "vmrghh %0,%2,%1" > + [(set_attr "type" "vecperm")]) > + > (define_expand "altivec_vmrglw" > [(use (match_operand:V4SI 0 "register_operand")) > (use (match_operand:V4SI 1 "register_operand")) > (use (match_operand:V4SI 2 "register_operand"))] > "VECTOR_MEM_ALTIVEC_P (V4SImode)" > { > - rtx (*fun) (rtx, rtx, rtx); > - fun = BYTES_BIG_ENDIAN ? 
gen_altivec_vmrglw_direct_v4si > - : gen_altivec_vmrghw_direct_v4si; > - if (!BYTES_BIG_ENDIAN) > - std::swap (operands[1], operands[2]); > - emit_insn (fun (operands[0], operands[1], operands[2])); > + rtvec v = gen_rtvec (4, GEN_INT (2), GEN_INT (6), GEN_INT (3), GEN_INT (7)); > + rtx x = gen_rtx_VEC_CONCAT (V8SImode, operands[1], operands[2]); > + x = gen_rtx_VEC_SELECT (V4SImode, x, gen_rtx_PARALLEL (VOIDmode, v)); > + emit_insn (gen_rtx_SET (operands[0], x)); > DONE; > }) > > -(define_insn "altivec_vmrglw_direct_<mode>" > +(define_insn "altivec_vmrglw_direct_<mode>_be" > [(set (match_operand:VSX_W 0 "register_operand" "=wa,v") > (vec_select:VSX_W > (vec_concat:<VS_double> > @@ -1327,10 +1406,24 @@ (define_insn "altivec_vmrglw_direct_<mode>" > (match_operand:VSX_W 2 "register_operand" "wa,v")) > (parallel [(const_int 2) (const_int 6) > (const_int 3) (const_int 7)])))] > - "TARGET_ALTIVEC" > + "TARGET_ALTIVEC && BYTES_BIG_ENDIAN" > + "@ > + xxmrglw %x0,%x1,%x2 > + vmrglw %0,%1,%2" > + [(set_attr "type" "vecperm")]) > + > +(define_insn "altivec_vmrghw_direct_<mode>_le" > + [(set (match_operand:VSX_W 0 "register_operand" "=wa,v") > + (vec_select:VSX_W > + (vec_concat:<VS_double> > + (match_operand:VSX_W 1 "register_operand" "wa,v") > + (match_operand:VSX_W 2 "register_operand" "wa,v")) > + (parallel [(const_int 2) (const_int 6) > + (const_int 3) (const_int 7)])))] > + "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN" > "@ > - xxmrglw %x0,%x1,%x2 > - vmrglw %0,%1,%2" > + xxmrghw %x0,%x2,%x1 > + vmrghw %0,%2,%1" > [(set_attr "type" "vecperm")]) > > (define_insn "*altivec_vmrglsf" > @@ -3699,13 +3792,13 @@ (define_expand "vec_widen_umult_hi_v16qi" > { > emit_insn (gen_altivec_vmuleub (ve, operands[1], operands[2])); > emit_insn (gen_altivec_vmuloub (vo, operands[1], operands[2])); > - emit_insn (gen_altivec_vmrghh_direct (operands[0], ve, vo)); > + emit_insn (gen_altivec_vmrghh_direct_be (operands[0], ve, vo)); > } > else > { > emit_insn (gen_altivec_vmuloub (ve, operands[1], 
operands[2])); > emit_insn (gen_altivec_vmuleub (vo, operands[1], operands[2])); > - emit_insn (gen_altivec_vmrghh_direct (operands[0], vo, ve)); > + emit_insn (gen_altivec_vmrghh_direct_le (operands[0], ve, vo)); Note that if you change assembly "vmrghh %0,%2,%1" to "vmrghh %0,%1,%2", you need to change this to: emit_insn (gen_altivec_vmrghh_direct_le (operands[0], vo, ve)); Same comment applied for some other similar cases. > } > DONE; > }) > @@ -3724,13 +3817,13 @@ (define_expand "vec_widen_umult_lo_v16qi" > { > emit_insn (gen_altivec_vmuleub (ve, operands[1], operands[2])); > emit_insn (gen_altivec_vmuloub (vo, operands[1], operands[2])); > - emit_insn (gen_altivec_vmrglh_direct (operands[0], ve, vo)); > + emit_insn (gen_altivec_vmrglh_direct_be (operands[0], ve, vo)); > } > else > { > emit_insn (gen_altivec_vmuloub (ve, operands[1], operands[2])); > emit_insn (gen_altivec_vmuleub (vo, operands[1], operands[2])); > - emit_insn (gen_altivec_vmrglh_direct (operands[0], vo, ve)); > + emit_insn (gen_altivec_vmrglh_direct_le (operands[0], ve, vo)); > } > DONE; > }) > @@ -3749,13 +3842,13 @@ (define_expand "vec_widen_smult_hi_v16qi" > { > emit_insn (gen_altivec_vmulesb (ve, operands[1], operands[2])); > emit_insn (gen_altivec_vmulosb (vo, operands[1], operands[2])); > - emit_insn (gen_altivec_vmrghh_direct (operands[0], ve, vo)); > + emit_insn (gen_altivec_vmrghh_direct_be (operands[0], ve, vo)); > } > else > { > emit_insn (gen_altivec_vmulosb (ve, operands[1], operands[2])); > emit_insn (gen_altivec_vmulesb (vo, operands[1], operands[2])); > - emit_insn (gen_altivec_vmrghh_direct (operands[0], vo, ve)); > + emit_insn (gen_altivec_vmrghh_direct_le (operands[0], ve, vo)); > } > DONE; > }) > @@ -3774,13 +3867,13 @@ (define_expand "vec_widen_smult_lo_v16qi" > { > emit_insn (gen_altivec_vmulesb (ve, operands[1], operands[2])); > emit_insn (gen_altivec_vmulosb (vo, operands[1], operands[2])); > - emit_insn (gen_altivec_vmrglh_direct (operands[0], ve, vo)); > + emit_insn 
(gen_altivec_vmrglh_direct_be (operands[0], ve, vo)); > } > else > { > emit_insn (gen_altivec_vmulosb (ve, operands[1], operands[2])); > emit_insn (gen_altivec_vmulesb (vo, operands[1], operands[2])); > - emit_insn (gen_altivec_vmrglh_direct (operands[0], vo, ve)); > + emit_insn (gen_altivec_vmrglh_direct_le (operands[0], ve, vo)); > } > DONE; > }) > @@ -3799,13 +3892,13 @@ (define_expand "vec_widen_umult_hi_v8hi" > { > emit_insn (gen_altivec_vmuleuh (ve, operands[1], operands[2])); > emit_insn (gen_altivec_vmulouh (vo, operands[1], operands[2])); > - emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], ve, vo)); > + emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0], ve, vo)); > } > else > { > emit_insn (gen_altivec_vmulouh (ve, operands[1], operands[2])); > emit_insn (gen_altivec_vmuleuh (vo, operands[1], operands[2])); > - emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], vo, ve)); > + emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0], ve, vo)); > } > DONE; > }) > @@ -3824,13 +3917,13 @@ (define_expand "vec_widen_umult_lo_v8hi" > { > emit_insn (gen_altivec_vmuleuh (ve, operands[1], operands[2])); > emit_insn (gen_altivec_vmulouh (vo, operands[1], operands[2])); > - emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], ve, vo)); > + emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0], ve, vo)); > } > else > { > emit_insn (gen_altivec_vmulouh (ve, operands[1], operands[2])); > emit_insn (gen_altivec_vmuleuh (vo, operands[1], operands[2])); > - emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], vo, ve)); > + emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0], ve, vo)); > } > DONE; > }) > @@ -3849,13 +3942,13 @@ (define_expand "vec_widen_smult_hi_v8hi" > { > emit_insn (gen_altivec_vmulesh (ve, operands[1], operands[2])); > emit_insn (gen_altivec_vmulosh (vo, operands[1], operands[2])); > - emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], ve, vo)); > + emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0], ve, 
vo)); > } > else > { > emit_insn (gen_altivec_vmulosh (ve, operands[1], operands[2])); > emit_insn (gen_altivec_vmulesh (vo, operands[1], operands[2])); > - emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], vo, ve)); > + emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0], ve, vo)); > } > DONE; > }) > @@ -3874,13 +3967,13 @@ (define_expand "vec_widen_smult_lo_v8hi" > { > emit_insn (gen_altivec_vmulesh (ve, operands[1], operands[2])); > emit_insn (gen_altivec_vmulosh (vo, operands[1], operands[2])); > - emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], ve, vo)); > + emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0], ve, vo)); > } > else > { > emit_insn (gen_altivec_vmulosh (ve, operands[1], operands[2])); > emit_insn (gen_altivec_vmulesh (vo, operands[1], operands[2])); > - emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], vo, ve)); > + emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0], ve, vo)); > } > DONE; > }) > diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc > index df491bee2ea..97da7706f63 100644 > --- a/gcc/config/rs6000/rs6000.cc > +++ b/gcc/config/rs6000/rs6000.cc > @@ -22941,29 +22941,17 @@ altivec_expand_vec_perm_const (rtx target, rtx op0, rtx op1, > {OPTION_MASK_ALTIVEC, > CODE_FOR_altivec_vpkuwum_direct, > {2, 3, 6, 7, 10, 11, 14, 15, 18, 19, 22, 23, 26, 27, 30, 31}}, > - {OPTION_MASK_ALTIVEC, > - BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghb_direct > - : CODE_FOR_altivec_vmrglb_direct, > + {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghb_direct_be, > {0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23}}, Before the culprit commit 0910c516a3d72af04, we have: { OPTION_MASK_ALTIVEC, (BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghb_direct : CODE_FOR_altivec_vmrglb_direct), { 0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23 } }, I think we should use: { OPTION_MASK_ALTIVEC, (BYTES_BIG_ENDIAN ? 
CODE_FOR_altivec_vmrghb_direct_be : CODE_FOR_altivec_vmrglb_direct_le), { 0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23 } }, here instead. Similar comment for those related below. > - {OPTION_MASK_ALTIVEC, > - BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghh_direct > - : CODE_FOR_altivec_vmrglh_direct, > + {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghh_direct_be, > {0, 1, 16, 17, 2, 3, 18, 19, 4, 5, 20, 21, 6, 7, 22, 23}}, > - {OPTION_MASK_ALTIVEC, > - BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghw_direct_v4si > - : CODE_FOR_altivec_vmrglw_direct_v4si, > + {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghw_direct_v4si_be, > {0, 1, 2, 3, 16, 17, 18, 19, 4, 5, 6, 7, 20, 21, 22, 23}}, > - {OPTION_MASK_ALTIVEC, > - BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglb_direct > - : CODE_FOR_altivec_vmrghb_direct, > + {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglb_direct_be, > {8, 24, 9, 25, 10, 26, 11, 27, 12, 28, 13, 29, 14, 30, 15, 31}}, > - {OPTION_MASK_ALTIVEC, > - BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglh_direct > - : CODE_FOR_altivec_vmrghh_direct, > + {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglh_direct_be, > {8, 9, 24, 25, 10, 11, 26, 27, 12, 13, 28, 29, 14, 15, 30, 31}}, > - {OPTION_MASK_ALTIVEC, > - BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglw_direct_v4si > - : CODE_FOR_altivec_vmrghw_direct_v4si, > + {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglw_direct_v4si_be, > {8, 9, 10, 11, 24, 25, 26, 27, 12, 13, 14, 15, 28, 29, 30, 31}}, > {OPTION_MASK_P8_VECTOR, > BYTES_BIG_ENDIAN ? CODE_FOR_p8_vmrgew_v4sf_direct > @@ -23146,9 +23134,15 @@ altivec_expand_vec_perm_const (rtx target, rtx op0, rtx op1, > > /* For little-endian, the two input operands must be swapped > (or swapped back) to ensure proper right-to-left numbering > - from 0 to 2N-1. */ > - if (swapped ^ !BYTES_BIG_ENDIAN > - && icode != CODE_FOR_vsx_xxpermdi_v16qi) > + from 0 to 2N-1. Excludes the vmrg[lh][bhw] and xxpermdi ops. 
*/ > + if (swapped ^ !BYTES_BIG_ENDIAN) > + if (!(icode == CODE_FOR_altivec_vmrghb_direct_be > + || icode == CODE_FOR_altivec_vmrglb_direct_be > + || icode == CODE_FOR_altivec_vmrghh_direct_be > + || icode == CODE_FOR_altivec_vmrglh_direct_be > + || icode == CODE_FOR_altivec_vmrghw_direct_v4si_be > + || icode == CODE_FOR_altivec_vmrglw_direct_v4si_be > + || icode == CODE_FOR_vsx_xxpermdi_v16qi)) > std::swap (op0, op1); IIUC, we don't need this part of change once we fix the operand order in the assembly for those LE "direct"s. BR, Kewen > if (imode != V16QImode) > { > diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md > index e226a93bbe5..2ae1bce131d 100644 > --- a/gcc/config/rs6000/vsx.md > +++ b/gcc/config/rs6000/vsx.md > @@ -4688,12 +4688,12 @@ (define_expand "vsx_xxmrghw_<mode>" > (const_int 1) (const_int 5)])))] > "VECTOR_MEM_VSX_P (<MODE>mode)" > { > - rtx (*fun) (rtx, rtx, rtx); > - fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrghw_direct_<mode> > - : gen_altivec_vmrglw_direct_<mode>; > - if (!BYTES_BIG_ENDIAN) > - std::swap (operands[1], operands[2]); > - emit_insn (fun (operands[0], operands[1], operands[2])); > + if (BYTES_BIG_ENDIAN) > + emit_insn ( > + gen_altivec_vmrghw_direct_v4si_be (operands[0], operands[1], operands[2])); > + else > + emit_insn ( > + gen_altivec_vmrglw_direct_v4si_le (operands[0], operands[1], operands[2])); > DONE; > } > [(set_attr "type" "vecperm")]) > @@ -4708,12 +4708,12 @@ (define_expand "vsx_xxmrglw_<mode>" > (const_int 3) (const_int 7)])))] > "VECTOR_MEM_VSX_P (<MODE>mode)" > { > - rtx (*fun) (rtx, rtx, rtx); > - fun = BYTES_BIG_ENDIAN ? 
gen_altivec_vmrglw_direct_<mode> > - : gen_altivec_vmrghw_direct_<mode>; > - if (!BYTES_BIG_ENDIAN) > - std::swap (operands[1], operands[2]); > - emit_insn (fun (operands[0], operands[1], operands[2])); > + if (BYTES_BIG_ENDIAN) > + emit_insn ( > + gen_altivec_vmrglw_direct_v4si_be (operands[0], operands[1], operands[2])); > + else > + emit_insn ( > + gen_altivec_vmrghw_direct_v4si_le (operands[0], operands[1], operands[2])); > DONE; > } > [(set_attr "type" "vecperm")]) > diff --git a/gcc/testsuite/g++.target/powerpc/pr106069.C b/gcc/testsuite/g++.target/powerpc/pr106069.C > new file mode 100644 > index 00000000000..2cde9b821e3 > --- /dev/null > +++ b/gcc/testsuite/g++.target/powerpc/pr106069.C > @@ -0,0 +1,120 @@ > +/* { dg-options "-O -fno-tree-forwprop -maltivec" } */ > +/* { dg-require-effective-target vmx_hw } */ > +/* { dg-do run } */ > + > +extern "C" void * > +memcpy (void *, const void *, unsigned long); > +typedef __attribute__ ((altivec (vector__))) unsigned native_simd_type; > + > +union > +{ > + native_simd_type V; > + int R[4]; > +} store_le_vec; > + > +struct S > +{ > + S () = default; > + S (unsigned B0) > + { > + native_simd_type val{B0}; > + m_simd = val; > + } > + void store_le (unsigned int out[]) > + { > + store_le_vec.V = m_simd; > + unsigned int x0 = store_le_vec.R[0]; > + memcpy (out, &x0, 4); > + } > + S rotl (unsigned int r) > + { > + native_simd_type rot{r}; > + return __builtin_vec_rl (m_simd, rot); > + } > + void operator+= (S other) > + { > + m_simd = __builtin_vec_add (m_simd, other.m_simd); > + } > + void operator^= (S other) > + { > + m_simd = __builtin_vec_xor (m_simd, other.m_simd); > + } > + static void transpose (S &B0, S B1, S B2, S B3) > + { > + native_simd_type T0 = __builtin_vec_mergeh (B0.m_simd, B2.m_simd); > + native_simd_type T1 = __builtin_vec_mergeh (B1.m_simd, B3.m_simd); > + native_simd_type T2 = __builtin_vec_mergel (B0.m_simd, B2.m_simd); > + native_simd_type T3 = __builtin_vec_mergel (B1.m_simd, B3.m_simd); > + B0 
= __builtin_vec_mergeh (T0, T1); > + B3 = __builtin_vec_mergel (T2, T3); > + } > + S (native_simd_type x) : m_simd (x) {} > + native_simd_type m_simd; > +}; > + > +void > +foo (unsigned int output[], unsigned state[]) > +{ > + S R00 = state[0]; > + S R01 = state[0]; > + S R02 = state[2]; > + S R03 = state[0]; > + S R05 = state[5]; > + S R06 = state[6]; > + S R07 = state[7]; > + S R08 = state[8]; > + S R09 = state[9]; > + S R10 = state[10]; > + S R11 = state[11]; > + S R12 = state[12]; > + S R13 = state[13]; > + S R14 = state[4]; > + S R15 = state[15]; > + for (int r = 0; r != 10; ++r) > + { > + R09 += R13; > + R11 += R15; > + R05 ^= R09; > + R06 ^= R10; > + R07 ^= R11; > + R07 = R07.rotl (7); > + R00 += R05; > + R01 += R06; > + R02 += R07; > + R15 ^= R00; > + R12 ^= R01; > + R13 ^= R02; > + R00 += R05; > + R01 += R06; > + R02 += R07; > + R15 ^= R00; > + R12 = R12.rotl (8); > + R13 = R13.rotl (8); > + R10 += R15; > + R11 += R12; > + R08 += R13; > + R09 += R14; > + R05 ^= R10; > + R06 ^= R11; > + R07 ^= R08; > + R05 = R05.rotl (7); > + R06 = R06.rotl (7); > + R07 = R07.rotl (7); > + } > + R00 += state[0]; > + S::transpose (R00, R01, R02, R03); > + R00.store_le (output); > +} > + > +unsigned int res[1]; > +unsigned main_state[]{1634760805, 60878, 2036477234, 6, > + 0, 825562964, 1471091955, 1346092787, > + 506976774, 4197066702, 518848283, 118491664, > + 0, 0, 0, 0}; > +int > +main () > +{ > + foo (res, main_state); > + if (res[0] != 0x41fcef98) > + __builtin_abort (); > +}
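For readers following the index vectors in the patch, here is a minimal sketch of what the endian-independent RTL means, assuming a plain C++ model of vec_concat and vec_select over four-element vectors. The names V4, V8, vec_concat, vec_select, merge_high and merge_low are illustrative only; they are not GCC functions, and the sketch says nothing about which hardware instruction is finally emitted on BE or LE.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for V4SI and V8SI; not GCC types.
using V4 = std::array<unsigned, 4>;
using V8 = std::array<unsigned, 8>;

// Model of RTL vec_concat: the elements of a, followed by the elements of b.
V8 vec_concat(const V4 &a, const V4 &b) {
  V8 r{};
  for (std::size_t i = 0; i < 4; ++i) {
    r[i] = a[i];
    r[i + 4] = b[i];
  }
  return r;
}

// Model of RTL vec_select with a four-entry parallel of element indices.
V4 vec_select(const V8 &x, const std::array<std::size_t, 4> &sel) {
  V4 r{};
  for (std::size_t i = 0; i < 4; ++i)
    r[i] = x[sel[i]];
  return r;
}

// Index vector [0 4 1 5], as used for the vmrghw expansion in the patch.
V4 merge_high(const V4 &a, const V4 &b) {
  return vec_select(vec_concat(a, b), {0, 4, 1, 5});
}

// Index vector [2 6 3 7], as used for the vmrglw expansion in the patch.
V4 merge_low(const V4 &a, const V4 &b) {
  return vec_select(vec_concat(a, b), {2, 6, 3, 7});
}
```

In this model, element 3 of merge_high (a, b) is b[1], which is the same rewrite the patch description shows combine performing on the nested vec_select: selecting [3] from the [0 4 1 5] result becomes selecting [1] from the second operand.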
* Re: [PATCH v4] rs6000: Fix incorrect RTL for Power LE when removing the UNSPECS [PR106069] 2022-08-16 6:53 ` Kewen.Lin @ 2022-08-17 6:23 ` Xionghu Luo 0 siblings, 0 replies; 10+ messages in thread From: Xionghu Luo @ 2022-08-17 6:23 UTC (permalink / raw) To: Kewen.Lin; +Cc: Xionghu Luo, gcc-patches, David Edelsohn, Segher Boessenkool On 2022/8/16 14:53, Kewen.Lin wrote: > Hi Xionghu, > > Thanks for the updated version of patch, some comments are inlined. > > on 2022/8/11 14:15, Xionghu Luo wrote: >> >> >> On 2022/8/11 01:07, Segher Boessenkool wrote: >>> On Wed, Aug 10, 2022 at 02:39:02PM +0800, Xionghu Luo wrote: >>>> On 2022/8/9 11:01, Kewen.Lin wrote: >>>>> I have some concern on those changed "altivec_*_direct", IMHO the suffix >>>>> "_direct" is normally to indicate the define_insn is mapped to the >>>>> corresponding hw insn directly. With this change, for example, >>>>> altivec_vmrghb_direct can be mapped into vmrghb or vmrglb, this looks >>>>> misleading. Maybe we can add the corresponding _direct_le and _direct_be >>>>> versions, both are mapped into the same insn but have different RTL >>>>> patterns. Looking forward to Segher's and David's suggestions. >>>> >>>> Thanks! Do you mean same RTL patterns with different hw insn? >>> >>> A pattern called altivec_vmrghb_direct_le should always emit a vmrghb >>> instruction, never a vmrglb instead. Misleading names are an expensive >>> problem. >>> >>> >> >> Thanks. Then on LE platforms, if user calls altivec_vmrghw,it will be >> expanded to RTL (vec_select (vec_concat (R0 R1 (0 4 1 5))), and >> finally matched to altivec_vmrglw_direct_v4si_le with ASM "vmrglw". >> For BE just strict forward, seems more clear :-), OK for master? >> >> >> [PATCH v3] rs6000: Fix incorrect RTL for Power LE when removing the UNSPECS [PR106069] >> >> v3: rename altivec_vmrghb_direct_le to altivec_vmrglb_direct_le to match >> the actual output ASM vmrglb. Likewise for all similar xxx_direct_le >> patterns. 
>> v2: Split the direct pattern to be and le with same RTL but different insn. >> >> The native RTL expression for vec_mrghw should be same for BE and LE as >> they are register and endian-independent. So both BE and LE need >> generate exactly same RTL with index [0 4 1 5] when expanding vec_mrghw >> with vec_select and vec_concat. >> >> (set (reg:V4SI 141) (vec_select:V4SI (vec_concat:V8SI >> (subreg:V4SI (reg:V16QI 139) 0) >> (subreg:V4SI (reg:V16QI 140) 0)) >> [const_int 0 4 1 5])) >> >> Then combine pass could do the nested vec_select optimization >> in simplify-rtx.c:simplify_binary_operation_1 also on both BE and LE: >> >> 21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel [0 4 1 5]) >> 24: {r151:SI=vec_select(r150:V4SI,parallel [const_int 3]);} >> >> => >> >> 21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel) >> 24: {r151:SI=vec_select(r146:V4SI,parallel [const_int 1]);} >> >> The endianness check need only once at ASM generation finally. >> ASM would be better due to nested vec_select simplified to simple scalar >> load. >> >> Regression tested pass for Power8{LE,BE}{32,64} and Power{9,10}LE{64} >> Linux(Thanks to Kewen). >> >> gcc/ChangeLog: >> >> PR target/106069 >> * config/rs6000/altivec.md (altivec_vmrghb_direct): Remove. >> (altivec_vmrghb_direct_be): New pattern for BE. >> (altivec_vmrglb_direct_le): New pattern for LE. >> (altivec_vmrghh_direct): Remove. >> (altivec_vmrghh_direct_be): New pattern for BE. >> (altivec_vmrglh_direct_le): New pattern for LE. >> (altivec_vmrghw_direct_<mode>): Remove. >> (altivec_vmrghw_direct_<mode>_be): New pattern for BE. >> (altivec_vmrglw_direct_<mode>_le): New pattern for LE. >> (altivec_vmrglb_direct): Remove. >> (altivec_vmrglb_direct_be): New pattern for BE. >> (altivec_vmrghb_direct_le): New pattern for LE. >> (altivec_vmrglh_direct): Remove. >> (altivec_vmrglh_direct_be): New pattern for BE. >> (altivec_vmrghh_direct_le): New pattern for LE. 
>> (altivec_vmrglw_direct_<mode>): Remove. >> (altivec_vmrglw_direct_<mode>_be): New pattern for BE. >> (altivec_vmrghw_direct_<mode>_le): New pattern for LE. >> * config/rs6000/rs6000.cc (altivec_expand_vec_perm_const): >> Adjust. >> * config/rs6000/vsx.md: Likewise. >> >> gcc/testsuite/ChangeLog: >> >> PR target/106069 >> * g++.target/powerpc/pr106069.C: New test. >> >> Signed-off-by: Xionghu Luo <xionghuluo@tencent.com> >> --- >> gcc/config/rs6000/altivec.md | 223 ++++++++++++++------ >> gcc/config/rs6000/rs6000.cc | 36 ++-- >> gcc/config/rs6000/vsx.md | 24 +-- >> gcc/testsuite/g++.target/powerpc/pr106069.C | 120 +++++++++++ >> 4 files changed, 305 insertions(+), 98 deletions(-) >> create mode 100644 gcc/testsuite/g++.target/powerpc/pr106069.C >> >> diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md >> index 2c4940f2e21..78245f470e9 100644 >> --- a/gcc/config/rs6000/altivec.md >> +++ b/gcc/config/rs6000/altivec.md >> @@ -1144,15 +1144,17 @@ (define_expand "altivec_vmrghb" >> (use (match_operand:V16QI 2 "register_operand"))] >> "TARGET_ALTIVEC" >> { >> - rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrghb_direct >> - : gen_altivec_vmrglb_direct; >> - if (!BYTES_BIG_ENDIAN) >> - std::swap (operands[1], operands[2]); >> - emit_insn (fun (operands[0], operands[1], operands[2])); >> + rtvec v = gen_rtvec (16, GEN_INT (0), GEN_INT (16), GEN_INT (1), GEN_INT (17), >> + GEN_INT (2), GEN_INT (18), GEN_INT (3), GEN_INT (19), >> + GEN_INT (4), GEN_INT (20), GEN_INT (5), GEN_INT (21), >> + GEN_INT (6), GEN_INT (22), GEN_INT (7), GEN_INT (23)); >> + rtx x = gen_rtx_VEC_CONCAT (V32QImode, operands[1], operands[2]); >> + x = gen_rtx_VEC_SELECT (V16QImode, x, gen_rtx_PARALLEL (VOIDmode, v)); >> + emit_insn (gen_rtx_SET (operands[0], x)); >> DONE; >> }) > > I think you can just call gen_altivec_vmrghb_direct_be and > gen_altivec_vmrghb_direct_le separately here. Similar for some other > define_expands. 
> >> >> -(define_insn "altivec_vmrghb_direct" >> +(define_insn "altivec_vmrghb_direct_be" >> [(set (match_operand:V16QI 0 "register_operand" "=v") >> (vec_select:V16QI >> (vec_concat:V32QI >> @@ -1166,27 +1168,46 @@ (define_insn "altivec_vmrghb_direct" >> (const_int 5) (const_int 21) >> (const_int 6) (const_int 22) >> (const_int 7) (const_int 23)])))] >> - "TARGET_ALTIVEC" >> + "TARGET_ALTIVEC && BYTES_BIG_ENDIAN" >> "vmrghb %0,%1,%2" >> [(set_attr "type" "vecperm")]) >> > > Could you move the following altivec_vmrghb_direct_le here? > Then readers can easily check the difference between be and > le for the same altivec_vmrghb_direct. > > Same comment applied for some other similar cases. > >> +(define_insn "altivec_vmrglb_direct_le" >> + [(set (match_operand:V16QI 0 "register_operand" "=v") >> + (vec_select:V16QI >> + (vec_concat:V32QI >> + (match_operand:V16QI 1 "register_operand" "v") >> + (match_operand:V16QI 2 "register_operand" "v")) >> + (parallel [(const_int 0) (const_int 16) >> + (const_int 1) (const_int 17) >> + (const_int 2) (const_int 18) >> + (const_int 3) (const_int 19) >> + (const_int 4) (const_int 20) >> + (const_int 5) (const_int 21) >> + (const_int 6) (const_int 22) >> + (const_int 7) (const_int 23)])))] >> + "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN" >> + "vmrglb %0,%2,%1" >> + [(set_attr "type" "vecperm")]) > > Could you update this pattern for assembly "vmrglb %0,%1,%2" > instead of "vmrglb %0,%2,%1"? I checked the previous md > before the culprit commit 0910c516a3d72af048, it emits > "vmrglb %0,%1,%2" for altivec_vmrglb_direct. > > Same comment applied for some other similar cases. > >> + >> (define_expand "altivec_vmrghh" >> [(use (match_operand:V8HI 0 "register_operand")) >> (use (match_operand:V8HI 1 "register_operand")) >> (use (match_operand:V8HI 2 "register_operand"))] >> "TARGET_ALTIVEC" >> { >> - rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? 
gen_altivec_vmrghh_direct >> - : gen_altivec_vmrglh_direct; >> - if (!BYTES_BIG_ENDIAN) >> - std::swap (operands[1], operands[2]); >> - emit_insn (fun (operands[0], operands[1], operands[2])); >> + rtvec v = gen_rtvec (8, GEN_INT (0), GEN_INT (8), GEN_INT (1), GEN_INT (9), >> + GEN_INT (2), GEN_INT (10), GEN_INT (3), GEN_INT (11)); >> + rtx x = gen_rtx_VEC_CONCAT (V16HImode, operands[1], operands[2]); >> + >> + x = gen_rtx_VEC_SELECT (V8HImode, x, gen_rtx_PARALLEL (VOIDmode, v)); >> + emit_insn (gen_rtx_SET (operands[0], x)); >> DONE; >> }) >> >> -(define_insn "altivec_vmrghh_direct" >> +(define_insn "altivec_vmrghh_direct_be" >> [(set (match_operand:V8HI 0 "register_operand" "=v") >> - (vec_select:V8HI >> + (vec_select:V8HI >> (vec_concat:V16HI >> (match_operand:V8HI 1 "register_operand" "v") >> (match_operand:V8HI 2 "register_operand" "v")) >> @@ -1194,26 +1215,38 @@ (define_insn "altivec_vmrghh_direct" >> (const_int 1) (const_int 9) >> (const_int 2) (const_int 10) >> (const_int 3) (const_int 11)])))] >> - "TARGET_ALTIVEC" >> + "TARGET_ALTIVEC && BYTES_BIG_ENDIAN" >> "vmrghh %0,%1,%2" >> [(set_attr "type" "vecperm")]) >> >> +(define_insn "altivec_vmrglh_direct_le" >> + [(set (match_operand:V8HI 0 "register_operand" "=v") >> + (vec_select:V8HI >> + (vec_concat:V16HI >> + (match_operand:V8HI 1 "register_operand" "v") >> + (match_operand:V8HI 2 "register_operand" "v")) >> + (parallel [(const_int 0) (const_int 8) >> + (const_int 1) (const_int 9) >> + (const_int 2) (const_int 10) >> + (const_int 3) (const_int 11)])))] >> + "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN" >> + "vmrglh %0,%2,%1" >> + [(set_attr "type" "vecperm")]) >> + >> (define_expand "altivec_vmrghw" >> [(use (match_operand:V4SI 0 "register_operand")) >> (use (match_operand:V4SI 1 "register_operand")) >> (use (match_operand:V4SI 2 "register_operand"))] >> "VECTOR_MEM_ALTIVEC_P (V4SImode)" >> { >> - rtx (*fun) (rtx, rtx, rtx); >> - fun = BYTES_BIG_ENDIAN ? 
gen_altivec_vmrghw_direct_v4si >> - : gen_altivec_vmrglw_direct_v4si; >> - if (!BYTES_BIG_ENDIAN) >> - std::swap (operands[1], operands[2]); >> - emit_insn (fun (operands[0], operands[1], operands[2])); >> + rtvec v = gen_rtvec (4, GEN_INT (0), GEN_INT (4), GEN_INT (1), GEN_INT (5)); >> + rtx x = gen_rtx_VEC_CONCAT (V8SImode, operands[1], operands[2]); >> + x = gen_rtx_VEC_SELECT (V4SImode, x, gen_rtx_PARALLEL (VOIDmode, v)); >> + emit_insn (gen_rtx_SET (operands[0], x)); >> DONE; >> }) >> >> -(define_insn "altivec_vmrghw_direct_<mode>" >> +(define_insn "altivec_vmrghw_direct_<mode>_be" >> [(set (match_operand:VSX_W 0 "register_operand" "=wa,v") >> (vec_select:VSX_W >> (vec_concat:<VS_double> >> @@ -1221,10 +1254,24 @@ (define_insn "altivec_vmrghw_direct_<mode>" >> (match_operand:VSX_W 2 "register_operand" "wa,v")) >> (parallel [(const_int 0) (const_int 4) >> (const_int 1) (const_int 5)])))] >> - "TARGET_ALTIVEC" >> + "TARGET_ALTIVEC && BYTES_BIG_ENDIAN" >> + "@ >> + xxmrghw %x0,%x1,%x2 >> + vmrghw %0,%1,%2" >> + [(set_attr "type" "vecperm")]) >> + >> +(define_insn "altivec_vmrglw_direct_<mode>_le" >> + [(set (match_operand:VSX_W 0 "register_operand" "=wa,v") >> + (vec_select:VSX_W >> + (vec_concat:<VS_double> >> + (match_operand:VSX_W 1 "register_operand" "wa,v") >> + (match_operand:VSX_W 2 "register_operand" "wa,v")) >> + (parallel [(const_int 0) (const_int 4) >> + (const_int 1) (const_int 5)])))] >> + "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN" >> "@ >> - xxmrghw %x0,%x1,%x2 >> - vmrghw %0,%1,%2" >> + xxmrglw %x0,%x2,%x1 >> + vmrglw %0,%2,%1" >> [(set_attr "type" "vecperm")]) >> >> (define_insn "*altivec_vmrghsf" >> @@ -1250,15 +1297,17 @@ (define_expand "altivec_vmrglb" >> (use (match_operand:V16QI 2 "register_operand"))] >> "TARGET_ALTIVEC" >> { >> - rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? 
gen_altivec_vmrglb_direct >> - : gen_altivec_vmrghb_direct; >> - if (!BYTES_BIG_ENDIAN) >> - std::swap (operands[1], operands[2]); >> - emit_insn (fun (operands[0], operands[1], operands[2])); >> + rtvec v = gen_rtvec (16, GEN_INT (8), GEN_INT (24), GEN_INT (9), GEN_INT (25), >> + GEN_INT (10), GEN_INT (26), GEN_INT (11), GEN_INT (27), >> + GEN_INT (12), GEN_INT (28), GEN_INT (13), GEN_INT (29), >> + GEN_INT (14), GEN_INT (30), GEN_INT (15), GEN_INT (31)); >> + rtx x = gen_rtx_VEC_CONCAT (V32QImode, operands[1], operands[2]); >> + x = gen_rtx_VEC_SELECT (V16QImode, x, gen_rtx_PARALLEL (VOIDmode, v)); >> + emit_insn (gen_rtx_SET (operands[0], x)); >> DONE; >> }) >> >> -(define_insn "altivec_vmrglb_direct" >> +(define_insn "altivec_vmrglb_direct_be" >> [(set (match_operand:V16QI 0 "register_operand" "=v") >> (vec_select:V16QI >> (vec_concat:V32QI >> @@ -1272,25 +1321,43 @@ (define_insn "altivec_vmrglb_direct" >> (const_int 13) (const_int 29) >> (const_int 14) (const_int 30) >> (const_int 15) (const_int 31)])))] >> - "TARGET_ALTIVEC" >> + "TARGET_ALTIVEC && BYTES_BIG_ENDIAN" >> "vmrglb %0,%1,%2" >> [(set_attr "type" "vecperm")]) >> >> +(define_insn "altivec_vmrghb_direct_le" >> + [(set (match_operand:V16QI 0 "register_operand" "=v") >> + (vec_select:V16QI >> + (vec_concat:V32QI >> + (match_operand:V16QI 1 "register_operand" "v") >> + (match_operand:V16QI 2 "register_operand" "v")) >> + (parallel [(const_int 8) (const_int 24) >> + (const_int 9) (const_int 25) >> + (const_int 10) (const_int 26) >> + (const_int 11) (const_int 27) >> + (const_int 12) (const_int 28) >> + (const_int 13) (const_int 29) >> + (const_int 14) (const_int 30) >> + (const_int 15) (const_int 31)])))] >> + "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN" >> + "vmrghb %0,%2,%1" >> + [(set_attr "type" "vecperm")]) >> + >> (define_expand "altivec_vmrglh" >> [(use (match_operand:V8HI 0 "register_operand")) >> (use (match_operand:V8HI 1 "register_operand")) >> (use (match_operand:V8HI 2 "register_operand"))] >> 
"TARGET_ALTIVEC" >> { >> - rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrglh_direct >> - : gen_altivec_vmrghh_direct; >> - if (!BYTES_BIG_ENDIAN) >> - std::swap (operands[1], operands[2]); >> - emit_insn (fun (operands[0], operands[1], operands[2])); >> + rtvec v = gen_rtvec (8, GEN_INT (4), GEN_INT (12), GEN_INT (5), GEN_INT (13), >> + GEN_INT (6), GEN_INT (14), GEN_INT (7), GEN_INT (15)); >> + rtx x = gen_rtx_VEC_CONCAT (V16HImode, operands[1], operands[2]); >> + x = gen_rtx_VEC_SELECT (V8HImode, x, gen_rtx_PARALLEL (VOIDmode, v)); >> + emit_insn (gen_rtx_SET (operands[0], x)); >> DONE; >> }) >> >> -(define_insn "altivec_vmrglh_direct" >> +(define_insn "altivec_vmrglh_direct_be" >> [(set (match_operand:V8HI 0 "register_operand" "=v") >> (vec_select:V8HI >> (vec_concat:V16HI >> @@ -1300,26 +1367,38 @@ (define_insn "altivec_vmrglh_direct" >> (const_int 5) (const_int 13) >> (const_int 6) (const_int 14) >> (const_int 7) (const_int 15)])))] >> - "TARGET_ALTIVEC" >> + "TARGET_ALTIVEC && BYTES_BIG_ENDIAN" >> "vmrglh %0,%1,%2" >> [(set_attr "type" "vecperm")]) >> >> +(define_insn "altivec_vmrghh_direct_le" >> + [(set (match_operand:V8HI 0 "register_operand" "=v") >> + (vec_select:V8HI >> + (vec_concat:V16HI >> + (match_operand:V8HI 1 "register_operand" "v") >> + (match_operand:V8HI 2 "register_operand" "v")) >> + (parallel [(const_int 4) (const_int 12) >> + (const_int 5) (const_int 13) >> + (const_int 6) (const_int 14) >> + (const_int 7) (const_int 15)])))] >> + "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN" >> + "vmrghh %0,%2,%1" >> + [(set_attr "type" "vecperm")]) >> + >> (define_expand "altivec_vmrglw" >> [(use (match_operand:V4SI 0 "register_operand")) >> (use (match_operand:V4SI 1 "register_operand")) >> (use (match_operand:V4SI 2 "register_operand"))] >> "VECTOR_MEM_ALTIVEC_P (V4SImode)" >> { >> - rtx (*fun) (rtx, rtx, rtx); >> - fun = BYTES_BIG_ENDIAN ? 
gen_altivec_vmrglw_direct_v4si >> - : gen_altivec_vmrghw_direct_v4si; >> - if (!BYTES_BIG_ENDIAN) >> - std::swap (operands[1], operands[2]); >> - emit_insn (fun (operands[0], operands[1], operands[2])); >> + rtvec v = gen_rtvec (4, GEN_INT (2), GEN_INT (6), GEN_INT (3), GEN_INT (7)); >> + rtx x = gen_rtx_VEC_CONCAT (V8SImode, operands[1], operands[2]); >> + x = gen_rtx_VEC_SELECT (V4SImode, x, gen_rtx_PARALLEL (VOIDmode, v)); >> + emit_insn (gen_rtx_SET (operands[0], x)); >> DONE; >> }) >> >> -(define_insn "altivec_vmrglw_direct_<mode>" >> +(define_insn "altivec_vmrglw_direct_<mode>_be" >> [(set (match_operand:VSX_W 0 "register_operand" "=wa,v") >> (vec_select:VSX_W >> (vec_concat:<VS_double> >> @@ -1327,10 +1406,24 @@ (define_insn "altivec_vmrglw_direct_<mode>" >> (match_operand:VSX_W 2 "register_operand" "wa,v")) >> (parallel [(const_int 2) (const_int 6) >> (const_int 3) (const_int 7)])))] >> - "TARGET_ALTIVEC" >> + "TARGET_ALTIVEC && BYTES_BIG_ENDIAN" >> + "@ >> + xxmrglw %x0,%x1,%x2 >> + vmrglw %0,%1,%2" >> + [(set_attr "type" "vecperm")]) >> + >> +(define_insn "altivec_vmrghw_direct_<mode>_le" >> + [(set (match_operand:VSX_W 0 "register_operand" "=wa,v") >> + (vec_select:VSX_W >> + (vec_concat:<VS_double> >> + (match_operand:VSX_W 1 "register_operand" "wa,v") >> + (match_operand:VSX_W 2 "register_operand" "wa,v")) >> + (parallel [(const_int 2) (const_int 6) >> + (const_int 3) (const_int 7)])))] >> + "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN" >> "@ >> - xxmrglw %x0,%x1,%x2 >> - vmrglw %0,%1,%2" >> + xxmrghw %x0,%x2,%x1 >> + vmrghw %0,%2,%1" >> [(set_attr "type" "vecperm")]) >> >> (define_insn "*altivec_vmrglsf" >> @@ -3699,13 +3792,13 @@ (define_expand "vec_widen_umult_hi_v16qi" >> { >> emit_insn (gen_altivec_vmuleub (ve, operands[1], operands[2])); >> emit_insn (gen_altivec_vmuloub (vo, operands[1], operands[2])); >> - emit_insn (gen_altivec_vmrghh_direct (operands[0], ve, vo)); >> + emit_insn (gen_altivec_vmrghh_direct_be (operands[0], ve, vo)); >> } >> else >> { 
>> emit_insn (gen_altivec_vmuloub (ve, operands[1], operands[2])); >> emit_insn (gen_altivec_vmuleub (vo, operands[1], operands[2])); >> - emit_insn (gen_altivec_vmrghh_direct (operands[0], vo, ve)); >> + emit_insn (gen_altivec_vmrghh_direct_le (operands[0], ve, vo)); > > Note that if you change assembly "vmrghh %0,%2,%1" to "vmrghh %0,%1,%2", > you need to change this to: > > emit_insn (gen_altivec_vmrghh_direct_le (operands[0], vo, ve)); > > Same comment applied for some other similar cases. > >> } >> DONE; >> }) >> @@ -3724,13 +3817,13 @@ (define_expand "vec_widen_umult_lo_v16qi" >> { >> emit_insn (gen_altivec_vmuleub (ve, operands[1], operands[2])); >> emit_insn (gen_altivec_vmuloub (vo, operands[1], operands[2])); >> - emit_insn (gen_altivec_vmrglh_direct (operands[0], ve, vo)); >> + emit_insn (gen_altivec_vmrglh_direct_be (operands[0], ve, vo)); >> } >> else >> { >> emit_insn (gen_altivec_vmuloub (ve, operands[1], operands[2])); >> emit_insn (gen_altivec_vmuleub (vo, operands[1], operands[2])); >> - emit_insn (gen_altivec_vmrglh_direct (operands[0], vo, ve)); >> + emit_insn (gen_altivec_vmrglh_direct_le (operands[0], ve, vo)); >> } >> DONE; >> }) >> @@ -3749,13 +3842,13 @@ (define_expand "vec_widen_smult_hi_v16qi" >> { >> emit_insn (gen_altivec_vmulesb (ve, operands[1], operands[2])); >> emit_insn (gen_altivec_vmulosb (vo, operands[1], operands[2])); >> - emit_insn (gen_altivec_vmrghh_direct (operands[0], ve, vo)); >> + emit_insn (gen_altivec_vmrghh_direct_be (operands[0], ve, vo)); >> } >> else >> { >> emit_insn (gen_altivec_vmulosb (ve, operands[1], operands[2])); >> emit_insn (gen_altivec_vmulesb (vo, operands[1], operands[2])); >> - emit_insn (gen_altivec_vmrghh_direct (operands[0], vo, ve)); >> + emit_insn (gen_altivec_vmrghh_direct_le (operands[0], ve, vo)); >> } >> DONE; >> }) >> @@ -3774,13 +3867,13 @@ (define_expand "vec_widen_smult_lo_v16qi" >> { >> emit_insn (gen_altivec_vmulesb (ve, operands[1], operands[2])); >> emit_insn (gen_altivec_vmulosb 
(vo, operands[1], operands[2])); >> - emit_insn (gen_altivec_vmrglh_direct (operands[0], ve, vo)); >> + emit_insn (gen_altivec_vmrglh_direct_be (operands[0], ve, vo)); >> } >> else >> { >> emit_insn (gen_altivec_vmulosb (ve, operands[1], operands[2])); >> emit_insn (gen_altivec_vmulesb (vo, operands[1], operands[2])); >> - emit_insn (gen_altivec_vmrglh_direct (operands[0], vo, ve)); >> + emit_insn (gen_altivec_vmrglh_direct_le (operands[0], ve, vo)); >> } >> DONE; >> }) >> @@ -3799,13 +3892,13 @@ (define_expand "vec_widen_umult_hi_v8hi" >> { >> emit_insn (gen_altivec_vmuleuh (ve, operands[1], operands[2])); >> emit_insn (gen_altivec_vmulouh (vo, operands[1], operands[2])); >> - emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], ve, vo)); >> + emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0], ve, vo)); >> } >> else >> { >> emit_insn (gen_altivec_vmulouh (ve, operands[1], operands[2])); >> emit_insn (gen_altivec_vmuleuh (vo, operands[1], operands[2])); >> - emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], vo, ve)); >> + emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0], ve, vo)); >> } >> DONE; >> }) >> @@ -3824,13 +3917,13 @@ (define_expand "vec_widen_umult_lo_v8hi" >> { >> emit_insn (gen_altivec_vmuleuh (ve, operands[1], operands[2])); >> emit_insn (gen_altivec_vmulouh (vo, operands[1], operands[2])); >> - emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], ve, vo)); >> + emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0], ve, vo)); >> } >> else >> { >> emit_insn (gen_altivec_vmulouh (ve, operands[1], operands[2])); >> emit_insn (gen_altivec_vmuleuh (vo, operands[1], operands[2])); >> - emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], vo, ve)); >> + emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0], ve, vo)); >> } >> DONE; >> }) >> @@ -3849,13 +3942,13 @@ (define_expand "vec_widen_smult_hi_v8hi" >> { >> emit_insn (gen_altivec_vmulesh (ve, operands[1], operands[2])); >> emit_insn (gen_altivec_vmulosh (vo, 
operands[1], operands[2])); >> - emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], ve, vo)); >> + emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0], ve, vo)); >> } >> else >> { >> emit_insn (gen_altivec_vmulosh (ve, operands[1], operands[2])); >> emit_insn (gen_altivec_vmulesh (vo, operands[1], operands[2])); >> - emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], vo, ve)); >> + emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0], ve, vo)); >> } >> DONE; >> }) >> @@ -3874,13 +3967,13 @@ (define_expand "vec_widen_smult_lo_v8hi" >> { >> emit_insn (gen_altivec_vmulesh (ve, operands[1], operands[2])); >> emit_insn (gen_altivec_vmulosh (vo, operands[1], operands[2])); >> - emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], ve, vo)); >> + emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0], ve, vo)); >> } >> else >> { >> emit_insn (gen_altivec_vmulosh (ve, operands[1], operands[2])); >> emit_insn (gen_altivec_vmulesh (vo, operands[1], operands[2])); >> - emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], vo, ve)); >> + emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0], ve, vo)); >> } >> DONE; >> }) >> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc >> index df491bee2ea..97da7706f63 100644 >> --- a/gcc/config/rs6000/rs6000.cc >> +++ b/gcc/config/rs6000/rs6000.cc >> @@ -22941,29 +22941,17 @@ altivec_expand_vec_perm_const (rtx target, rtx op0, rtx op1, >> {OPTION_MASK_ALTIVEC, >> CODE_FOR_altivec_vpkuwum_direct, >> {2, 3, 6, 7, 10, 11, 14, 15, 18, 19, 22, 23, 26, 27, 30, 31}}, >> - {OPTION_MASK_ALTIVEC, >> - BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghb_direct >> - : CODE_FOR_altivec_vmrglb_direct, >> + {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghb_direct_be, >> {0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23}}, > > Before the culprit commit 0910c516a3d72af04, we have: > > { OPTION_MASK_ALTIVEC, > (BYTES_BIG_ENDIAN ? 
CODE_FOR_altivec_vmrghb_direct > : CODE_FOR_altivec_vmrglb_direct), > { 0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23 } }, > > I think we should use: > > { OPTION_MASK_ALTIVEC, > (BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghb_direct_be > : CODE_FOR_altivec_vmrglb_direct_le), > { 0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23 } }, > > here instead. Similar comment for those related below. > >> - {OPTION_MASK_ALTIVEC, >> - BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghh_direct >> - : CODE_FOR_altivec_vmrglh_direct, >> + {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghh_direct_be, >> {0, 1, 16, 17, 2, 3, 18, 19, 4, 5, 20, 21, 6, 7, 22, 23}}, >> - {OPTION_MASK_ALTIVEC, >> - BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghw_direct_v4si >> - : CODE_FOR_altivec_vmrglw_direct_v4si, >> + {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghw_direct_v4si_be, >> {0, 1, 2, 3, 16, 17, 18, 19, 4, 5, 6, 7, 20, 21, 22, 23}}, >> - {OPTION_MASK_ALTIVEC, >> - BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglb_direct >> - : CODE_FOR_altivec_vmrghb_direct, >> + {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglb_direct_be, >> {8, 24, 9, 25, 10, 26, 11, 27, 12, 28, 13, 29, 14, 30, 15, 31}}, >> - {OPTION_MASK_ALTIVEC, >> - BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglh_direct >> - : CODE_FOR_altivec_vmrghh_direct, >> + {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglh_direct_be, >> {8, 9, 24, 25, 10, 11, 26, 27, 12, 13, 28, 29, 14, 15, 30, 31}}, >> - {OPTION_MASK_ALTIVEC, >> - BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglw_direct_v4si >> - : CODE_FOR_altivec_vmrghw_direct_v4si, >> + {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglw_direct_v4si_be, >> {8, 9, 10, 11, 24, 25, 26, 27, 12, 13, 14, 15, 28, 29, 30, 31}}, >> {OPTION_MASK_P8_VECTOR, >> BYTES_BIG_ENDIAN ? CODE_FOR_p8_vmrgew_v4sf_direct >> @@ -23146,9 +23134,15 @@ altivec_expand_vec_perm_const (rtx target, rtx op0, rtx op1, >> >> /* For little-endian, the two input operands must be swapped >> (or swapped back) to ensure proper right-to-left numbering >> - from 0 to 2N-1. 
*/
>> -      if (swapped ^ !BYTES_BIG_ENDIAN
>> -          && icode != CODE_FOR_vsx_xxpermdi_v16qi)
>> +         from 0 to 2N-1.  Excludes the vmrg[lh][bhw] and xxpermdi ops.  */
>> +      if (swapped ^ !BYTES_BIG_ENDIAN)
>> +        if (!(icode == CODE_FOR_altivec_vmrghb_direct_be
>> +              || icode == CODE_FOR_altivec_vmrglb_direct_be
>> +              || icode == CODE_FOR_altivec_vmrghh_direct_be
>> +              || icode == CODE_FOR_altivec_vmrglh_direct_be
>> +              || icode == CODE_FOR_altivec_vmrghw_direct_v4si_be
>> +              || icode == CODE_FOR_altivec_vmrglw_direct_v4si_be
>> +              || icode == CODE_FOR_vsx_xxpermdi_v16qi))
>>          std::swap (op0, op1);
>
> IIUC, we don't need this part of the change once we fix the operand order in
> the assembly for those LE "direct"s.
>
> BR,
> Kewen
>

Thanks. All the comments are addressed in v4.

v4: Update per comments.
v3: Rename altivec_vmrghb_direct_le to altivec_vmrglb_direct_le to match
the actual output ASM vmrglb. Likewise for all similar xxx_direct_le
patterns.
v2: Split the direct pattern into be and le variants with the same RTL but
different insns.

The native RTL expression for vec_mrghw should be the same for BE and LE,
as it is register- and endian-independent. So both BE and LE need to
generate exactly the same RTL, with index [0 4 1 5], when expanding
vec_mrghw with vec_select and vec_concat.

(set (reg:V4SI 141)
     (vec_select:V4SI (vec_concat:V8SI
                        (subreg:V4SI (reg:V16QI 139) 0)
                        (subreg:V4SI (reg:V16QI 140) 0))
                      [const_int 0 4 1 5]))

The combine pass can then do the nested vec_select optimization in
simplify-rtx.c:simplify_binary_operation_1 on both BE and LE:

21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel [0 4 1 5])
24: {r151:SI=vec_select(r150:V4SI,parallel [const_int 3]);}

=>

21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel)
24: {r151:SI=vec_select(r146:V4SI,parallel [const_int 1]);}

The endianness check is then needed only once, at final ASM generation.
The ASM is better because the nested vec_select is simplified to a simple
scalar load.
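
To make the index arithmetic above concrete, here is a minimal sketch in plain C++ (not GCC internals). It models vec_concat and vec_select on 4-lane vectors and shows why a scalar extract from the merged value can be rewritten as an extract from one of the inputs. All names here (V4, V8, Sel4, LaneSource, fold_nested_select, kMergeHighW, check_fold) are made up for illustration, and the fold only mirrors the combine simplification in spirit; it is not the code in simplify-rtx.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for V4SI / V8SI values; these are not GCC types.
using V4 = std::array<unsigned, 4>;
using V8 = std::array<unsigned, 8>;
using Sel4 = std::array<std::size_t, 4>;

// vec_concat: lanes 0..3 come from a, lanes 4..7 come from b.
V8 vec_concat(const V4& a, const V4& b) {
  return {a[0], a[1], a[2], a[3], b[0], b[1], b[2], b[3]};
}

// vec_select: result lane i is lane sel[i] of the concatenated input.
V4 vec_select(const V8& v, const Sel4& sel) {
  return {v[sel[0]], v[sel[1]], v[sel[2]], v[sel[3]]};
}

// Where a lane of the merged result really comes from.
struct LaneSource {
  int operand;       // 0 = first vec_concat input, 1 = second
  std::size_t lane;  // lane within that input
};

// Fold "lane k of vec_select (vec_concat (a, b), sel)" into a direct
// reference to one lane of a or b.
LaneSource fold_nested_select(const Sel4& sel, std::size_t k) {
  return sel[k] < 4 ? LaneSource{0, sel[k]} : LaneSource{1, sel[k] - 4};
}

// Selector [0 4 1 5] from the RTL shown in the commit message.
const Sel4 kMergeHighW = {0, 4, 1, 5};

// Sanity check: every lane of the merged vector equals the lane that the
// fold points at, so a scalar extract does not need the merge itself.
void check_fold(const V4& a, const V4& b) {
  const V4 merged = vec_select(vec_concat(a, b), kMergeHighW);
  for (std::size_t k = 0; k < 4; ++k) {
    const LaneSource src = fold_nested_select(kMergeHighW, k);
    assert(merged[k] == (src.operand == 0 ? a : b)[src.lane]);
  }
}
```

With this model, fold_nested_select (kMergeHighW, 3) yields operand 1, lane 1, which corresponds to insn 24 above being rewritten to select [const_int 1] from r146.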

Regression testing passed on Power8{LE,BE}{32,64} and Power{9,10}LE{32,64}
Linux.

gcc/ChangeLog:

	PR target/106069
	* config/rs6000/altivec.md (altivec_vmrghb_direct): Remove.
	(altivec_vmrghb_direct_be): New pattern for BE.
	(altivec_vmrghb_direct_le): New pattern for LE.
	(altivec_vmrghh_direct): Remove.
	(altivec_vmrghh_direct_be): New pattern for BE.
	(altivec_vmrghh_direct_le): New pattern for LE.
	(altivec_vmrghw_direct_<mode>): Remove.
	(altivec_vmrghw_direct_<mode>_be): New pattern for BE.
	(altivec_vmrghw_direct_<mode>_le): New pattern for LE.
	(altivec_vmrglb_direct): Remove.
	(altivec_vmrglb_direct_be): New pattern for BE.
	(altivec_vmrglb_direct_le): New pattern for LE.
	(altivec_vmrglh_direct): Remove.
	(altivec_vmrglh_direct_be): New pattern for BE.
	(altivec_vmrglh_direct_le): New pattern for LE.
	(altivec_vmrglw_direct_<mode>): Remove.
	(altivec_vmrglw_direct_<mode>_be): New pattern for BE.
	(altivec_vmrglw_direct_<mode>_le): New pattern for LE.
	* config/rs6000/rs6000.cc (altivec_expand_vec_perm_const): Adjust.
	* config/rs6000/vsx.md: Likewise.

gcc/testsuite/ChangeLog:

	PR target/106069
	* g++.target/powerpc/pr106069.C: New test.

Signed-off-by: Xionghu Luo <xionghuluo@tencent.com>
---
 gcc/config/rs6000/altivec.md                | 230 ++++++++++++++------
 gcc/config/rs6000/rs6000.cc                 |  24 +-
 gcc/config/rs6000/vsx.md                    |  28 ++-
 gcc/testsuite/g++.target/powerpc/pr106069.C | 120 ++++++++++
 4 files changed, 313 insertions(+), 89 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/powerpc/pr106069.C

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 2c4940f2e21..962df4657e6 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -1144,15 +1144,16 @@ (define_expand "altivec_vmrghb"
    (use (match_operand:V16QI 2 "register_operand"))]
   "TARGET_ALTIVEC"
 {
-  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ?
gen_altivec_vmrghb_direct - : gen_altivec_vmrglb_direct; - if (!BYTES_BIG_ENDIAN) - std::swap (operands[1], operands[2]); - emit_insn (fun (operands[0], operands[1], operands[2])); + if (BYTES_BIG_ENDIAN) + emit_insn ( + gen_altivec_vmrghb_direct_be (operands[0], operands[1], operands[2])); + else + emit_insn ( + gen_altivec_vmrglb_direct_le (operands[0], operands[2], operands[1])); DONE; }) -(define_insn "altivec_vmrghb_direct" +(define_insn "altivec_vmrghb_direct_be" [(set (match_operand:V16QI 0 "register_operand" "=v") (vec_select:V16QI (vec_concat:V32QI @@ -1166,7 +1167,25 @@ (define_insn "altivec_vmrghb_direct" (const_int 5) (const_int 21) (const_int 6) (const_int 22) (const_int 7) (const_int 23)])))] - "TARGET_ALTIVEC" + "TARGET_ALTIVEC && BYTES_BIG_ENDIAN" + "vmrghb %0,%1,%2" + [(set_attr "type" "vecperm")]) + +(define_insn "altivec_vmrghb_direct_le" + [(set (match_operand:V16QI 0 "register_operand" "=v") + (vec_select:V16QI + (vec_concat:V32QI + (match_operand:V16QI 2 "register_operand" "v") + (match_operand:V16QI 1 "register_operand" "v")) + (parallel [(const_int 8) (const_int 24) + (const_int 9) (const_int 25) + (const_int 10) (const_int 26) + (const_int 11) (const_int 27) + (const_int 12) (const_int 28) + (const_int 13) (const_int 29) + (const_int 14) (const_int 30) + (const_int 15) (const_int 31)])))] + "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN" "vmrghb %0,%1,%2" [(set_attr "type" "vecperm")]) @@ -1176,17 +1195,18 @@ (define_expand "altivec_vmrghh" (use (match_operand:V8HI 2 "register_operand"))] "TARGET_ALTIVEC" { - rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? 
gen_altivec_vmrghh_direct - : gen_altivec_vmrglh_direct; - if (!BYTES_BIG_ENDIAN) - std::swap (operands[1], operands[2]); - emit_insn (fun (operands[0], operands[1], operands[2])); + if (BYTES_BIG_ENDIAN) + emit_insn ( + gen_altivec_vmrghh_direct_be (operands[0], operands[1], operands[2])); + else + emit_insn ( + gen_altivec_vmrglh_direct_le (operands[0], operands[2], operands[1])); DONE; }) -(define_insn "altivec_vmrghh_direct" +(define_insn "altivec_vmrghh_direct_be" [(set (match_operand:V8HI 0 "register_operand" "=v") - (vec_select:V8HI + (vec_select:V8HI (vec_concat:V16HI (match_operand:V8HI 1 "register_operand" "v") (match_operand:V8HI 2 "register_operand" "v")) @@ -1194,7 +1214,21 @@ (define_insn "altivec_vmrghh_direct" (const_int 1) (const_int 9) (const_int 2) (const_int 10) (const_int 3) (const_int 11)])))] - "TARGET_ALTIVEC" + "TARGET_ALTIVEC && BYTES_BIG_ENDIAN" + "vmrghh %0,%1,%2" + [(set_attr "type" "vecperm")]) + +(define_insn "altivec_vmrghh_direct_le" + [(set (match_operand:V8HI 0 "register_operand" "=v") + (vec_select:V8HI + (vec_concat:V16HI + (match_operand:V8HI 2 "register_operand" "v") + (match_operand:V8HI 1 "register_operand" "v")) + (parallel [(const_int 4) (const_int 12) + (const_int 5) (const_int 13) + (const_int 6) (const_int 14) + (const_int 7) (const_int 15)])))] + "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN" "vmrghh %0,%1,%2" [(set_attr "type" "vecperm")]) @@ -1204,16 +1238,18 @@ (define_expand "altivec_vmrghw" (use (match_operand:V4SI 2 "register_operand"))] "VECTOR_MEM_ALTIVEC_P (V4SImode)" { - rtx (*fun) (rtx, rtx, rtx); - fun = BYTES_BIG_ENDIAN ? 
gen_altivec_vmrghw_direct_v4si - : gen_altivec_vmrglw_direct_v4si; - if (!BYTES_BIG_ENDIAN) - std::swap (operands[1], operands[2]); - emit_insn (fun (operands[0], operands[1], operands[2])); + if (BYTES_BIG_ENDIAN) + emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0], + operands[1], + operands[2])); + else + emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0], + operands[2], + operands[1])); DONE; }) -(define_insn "altivec_vmrghw_direct_<mode>" +(define_insn "altivec_vmrghw_direct_<mode>_be" [(set (match_operand:VSX_W 0 "register_operand" "=wa,v") (vec_select:VSX_W (vec_concat:<VS_double> @@ -1221,10 +1257,24 @@ (define_insn "altivec_vmrghw_direct_<mode>" (match_operand:VSX_W 2 "register_operand" "wa,v")) (parallel [(const_int 0) (const_int 4) (const_int 1) (const_int 5)])))] - "TARGET_ALTIVEC" + "TARGET_ALTIVEC && BYTES_BIG_ENDIAN" + "@ + xxmrghw %x0,%x1,%x2 + vmrghw %0,%1,%2" + [(set_attr "type" "vecperm")]) + +(define_insn "altivec_vmrghw_direct_<mode>_le" + [(set (match_operand:VSX_W 0 "register_operand" "=wa,v") + (vec_select:VSX_W + (vec_concat:<VS_double> + (match_operand:VSX_W 2 "register_operand" "wa,v") + (match_operand:VSX_W 1 "register_operand" "wa,v")) + (parallel [(const_int 2) (const_int 6) + (const_int 3) (const_int 7)])))] + "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN" "@ - xxmrghw %x0,%x1,%x2 - vmrghw %0,%1,%2" + xxmrghw %x0,%x1,%x2 + vmrghw %0,%1,%2" [(set_attr "type" "vecperm")]) (define_insn "*altivec_vmrghsf" @@ -1250,15 +1300,16 @@ (define_expand "altivec_vmrglb" (use (match_operand:V16QI 2 "register_operand"))] "TARGET_ALTIVEC" { - rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? 
gen_altivec_vmrglb_direct - : gen_altivec_vmrghb_direct; - if (!BYTES_BIG_ENDIAN) - std::swap (operands[1], operands[2]); - emit_insn (fun (operands[0], operands[1], operands[2])); + if (BYTES_BIG_ENDIAN) + emit_insn ( + gen_altivec_vmrglb_direct_be (operands[0], operands[1], operands[2])); + else + emit_insn ( + gen_altivec_vmrghb_direct_le (operands[0], operands[2], operands[1])); DONE; }) -(define_insn "altivec_vmrglb_direct" +(define_insn "altivec_vmrglb_direct_be" [(set (match_operand:V16QI 0 "register_operand" "=v") (vec_select:V16QI (vec_concat:V32QI @@ -1272,7 +1323,25 @@ (define_insn "altivec_vmrglb_direct" (const_int 13) (const_int 29) (const_int 14) (const_int 30) (const_int 15) (const_int 31)])))] - "TARGET_ALTIVEC" + "TARGET_ALTIVEC && BYTES_BIG_ENDIAN" + "vmrglb %0,%1,%2" + [(set_attr "type" "vecperm")]) + +(define_insn "altivec_vmrglb_direct_le" + [(set (match_operand:V16QI 0 "register_operand" "=v") + (vec_select:V16QI + (vec_concat:V32QI + (match_operand:V16QI 2 "register_operand" "v") + (match_operand:V16QI 1 "register_operand" "v")) + (parallel [(const_int 0) (const_int 16) + (const_int 1) (const_int 17) + (const_int 2) (const_int 18) + (const_int 3) (const_int 19) + (const_int 4) (const_int 20) + (const_int 5) (const_int 21) + (const_int 6) (const_int 22) + (const_int 7) (const_int 23)])))] + "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN" "vmrglb %0,%1,%2" [(set_attr "type" "vecperm")]) @@ -1282,15 +1351,16 @@ (define_expand "altivec_vmrglh" (use (match_operand:V8HI 2 "register_operand"))] "TARGET_ALTIVEC" { - rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? 
gen_altivec_vmrglh_direct - : gen_altivec_vmrghh_direct; - if (!BYTES_BIG_ENDIAN) - std::swap (operands[1], operands[2]); - emit_insn (fun (operands[0], operands[1], operands[2])); + if (BYTES_BIG_ENDIAN) + emit_insn ( + gen_altivec_vmrglh_direct_be (operands[0], operands[1], operands[2])); + else + emit_insn ( + gen_altivec_vmrghh_direct_le (operands[0], operands[2], operands[1])); DONE; }) -(define_insn "altivec_vmrglh_direct" +(define_insn "altivec_vmrglh_direct_be" [(set (match_operand:V8HI 0 "register_operand" "=v") (vec_select:V8HI (vec_concat:V16HI @@ -1300,7 +1370,21 @@ (define_insn "altivec_vmrglh_direct" (const_int 5) (const_int 13) (const_int 6) (const_int 14) (const_int 7) (const_int 15)])))] - "TARGET_ALTIVEC" + "TARGET_ALTIVEC && BYTES_BIG_ENDIAN" + "vmrglh %0,%1,%2" + [(set_attr "type" "vecperm")]) + +(define_insn "altivec_vmrglh_direct_le" + [(set (match_operand:V8HI 0 "register_operand" "=v") + (vec_select:V8HI + (vec_concat:V16HI + (match_operand:V8HI 2 "register_operand" "v") + (match_operand:V8HI 1 "register_operand" "v")) + (parallel [(const_int 0) (const_int 8) + (const_int 1) (const_int 9) + (const_int 2) (const_int 10) + (const_int 3) (const_int 11)])))] + "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN" "vmrglh %0,%1,%2" [(set_attr "type" "vecperm")]) @@ -1310,16 +1394,18 @@ (define_expand "altivec_vmrglw" (use (match_operand:V4SI 2 "register_operand"))] "VECTOR_MEM_ALTIVEC_P (V4SImode)" { - rtx (*fun) (rtx, rtx, rtx); - fun = BYTES_BIG_ENDIAN ? 
gen_altivec_vmrglw_direct_v4si - : gen_altivec_vmrghw_direct_v4si; - if (!BYTES_BIG_ENDIAN) - std::swap (operands[1], operands[2]); - emit_insn (fun (operands[0], operands[1], operands[2])); + if (BYTES_BIG_ENDIAN) + emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0], + operands[1], + operands[2])); + else + emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0], + operands[2], + operands[1])); DONE; }) -(define_insn "altivec_vmrglw_direct_<mode>" +(define_insn "altivec_vmrglw_direct_<mode>_be" [(set (match_operand:VSX_W 0 "register_operand" "=wa,v") (vec_select:VSX_W (vec_concat:<VS_double> @@ -1327,10 +1413,24 @@ (define_insn "altivec_vmrglw_direct_<mode>" (match_operand:VSX_W 2 "register_operand" "wa,v")) (parallel [(const_int 2) (const_int 6) (const_int 3) (const_int 7)])))] - "TARGET_ALTIVEC" + "TARGET_ALTIVEC && BYTES_BIG_ENDIAN" + "@ + xxmrglw %x0,%x1,%x2 + vmrglw %0,%1,%2" + [(set_attr "type" "vecperm")]) + +(define_insn "altivec_vmrglw_direct_<mode>_le" + [(set (match_operand:VSX_W 0 "register_operand" "=wa,v") + (vec_select:VSX_W + (vec_concat:<VS_double> + (match_operand:VSX_W 2 "register_operand" "wa,v") + (match_operand:VSX_W 1 "register_operand" "wa,v")) + (parallel [(const_int 0) (const_int 4) + (const_int 1) (const_int 5)])))] + "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN" "@ - xxmrglw %x0,%x1,%x2 - vmrglw %0,%1,%2" + xxmrglw %x0,%x1,%x2 + vmrglw %0,%1,%2" [(set_attr "type" "vecperm")]) (define_insn "*altivec_vmrglsf" @@ -3699,13 +3799,13 @@ (define_expand "vec_widen_umult_hi_v16qi" { emit_insn (gen_altivec_vmuleub (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmuloub (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrghh_direct (operands[0], ve, vo)); + emit_insn (gen_altivec_vmrghh_direct_be (operands[0], ve, vo)); } else { emit_insn (gen_altivec_vmuloub (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmuleub (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrghh_direct (operands[0], vo, ve)); + emit_insn 
(gen_altivec_vmrghh_direct_le (operands[0], vo, ve)); } DONE; }) @@ -3724,13 +3824,13 @@ (define_expand "vec_widen_umult_lo_v16qi" { emit_insn (gen_altivec_vmuleub (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmuloub (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrglh_direct (operands[0], ve, vo)); + emit_insn (gen_altivec_vmrglh_direct_be (operands[0], ve, vo)); } else { emit_insn (gen_altivec_vmuloub (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmuleub (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrglh_direct (operands[0], vo, ve)); + emit_insn (gen_altivec_vmrglh_direct_le (operands[0], vo, ve)); } DONE; }) @@ -3749,13 +3849,13 @@ (define_expand "vec_widen_smult_hi_v16qi" { emit_insn (gen_altivec_vmulesb (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmulosb (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrghh_direct (operands[0], ve, vo)); + emit_insn (gen_altivec_vmrghh_direct_be (operands[0], ve, vo)); } else { emit_insn (gen_altivec_vmulosb (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmulesb (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrghh_direct (operands[0], vo, ve)); + emit_insn (gen_altivec_vmrghh_direct_le (operands[0], vo, ve)); } DONE; }) @@ -3774,13 +3874,13 @@ (define_expand "vec_widen_smult_lo_v16qi" { emit_insn (gen_altivec_vmulesb (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmulosb (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrglh_direct (operands[0], ve, vo)); + emit_insn (gen_altivec_vmrglh_direct_be (operands[0], ve, vo)); } else { emit_insn (gen_altivec_vmulosb (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmulesb (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrglh_direct (operands[0], vo, ve)); + emit_insn (gen_altivec_vmrglh_direct_le (operands[0], vo, ve)); } DONE; }) @@ -3799,13 +3899,13 @@ (define_expand "vec_widen_umult_hi_v8hi" { emit_insn (gen_altivec_vmuleuh (ve, operands[1], 
operands[2])); emit_insn (gen_altivec_vmulouh (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], ve, vo)); + emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0], ve, vo)); } else { emit_insn (gen_altivec_vmulouh (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmuleuh (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], vo, ve)); + emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0], vo, ve)); } DONE; }) @@ -3824,13 +3924,13 @@ (define_expand "vec_widen_umult_lo_v8hi" { emit_insn (gen_altivec_vmuleuh (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmulouh (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], ve, vo)); + emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0], ve, vo)); } else { emit_insn (gen_altivec_vmulouh (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmuleuh (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], vo, ve)); + emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0], vo, ve)); } DONE; }) @@ -3849,13 +3949,13 @@ (define_expand "vec_widen_smult_hi_v8hi" { emit_insn (gen_altivec_vmulesh (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmulosh (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], ve, vo)); + emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0], ve, vo)); } else { emit_insn (gen_altivec_vmulosh (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmulesh (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], vo, ve)); + emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0], vo, ve)); } DONE; }) @@ -3874,13 +3974,13 @@ (define_expand "vec_widen_smult_lo_v8hi" { emit_insn (gen_altivec_vmulesh (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmulosh (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], 
ve, vo)); + emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0], ve, vo)); } else { emit_insn (gen_altivec_vmulosh (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmulesh (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], vo, ve)); + emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0], vo, ve)); } DONE; }) diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc index df491bee2ea..c6ccd40e089 100644 --- a/gcc/config/rs6000/rs6000.cc +++ b/gcc/config/rs6000/rs6000.cc @@ -22942,28 +22942,28 @@ altivec_expand_vec_perm_const (rtx target, rtx op0, rtx op1, CODE_FOR_altivec_vpkuwum_direct, {2, 3, 6, 7, 10, 11, 14, 15, 18, 19, 22, 23, 26, 27, 30, 31}}, {OPTION_MASK_ALTIVEC, - BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghb_direct - : CODE_FOR_altivec_vmrglb_direct, + BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghb_direct_be + : CODE_FOR_altivec_vmrglb_direct_le, {0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23}}, {OPTION_MASK_ALTIVEC, - BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghh_direct - : CODE_FOR_altivec_vmrglh_direct, + BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghh_direct_be + : CODE_FOR_altivec_vmrglh_direct_le, {0, 1, 16, 17, 2, 3, 18, 19, 4, 5, 20, 21, 6, 7, 22, 23}}, {OPTION_MASK_ALTIVEC, - BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghw_direct_v4si - : CODE_FOR_altivec_vmrglw_direct_v4si, + BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghw_direct_v4si_be + : CODE_FOR_altivec_vmrglw_direct_v4si_le, {0, 1, 2, 3, 16, 17, 18, 19, 4, 5, 6, 7, 20, 21, 22, 23}}, {OPTION_MASK_ALTIVEC, - BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglb_direct - : CODE_FOR_altivec_vmrghb_direct, + BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglb_direct_be + : CODE_FOR_altivec_vmrghb_direct_le, {8, 24, 9, 25, 10, 26, 11, 27, 12, 28, 13, 29, 14, 30, 15, 31}}, {OPTION_MASK_ALTIVEC, - BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglh_direct - : CODE_FOR_altivec_vmrghh_direct, + BYTES_BIG_ENDIAN ? 
CODE_FOR_altivec_vmrglh_direct_be + : CODE_FOR_altivec_vmrghh_direct_le, {8, 9, 24, 25, 10, 11, 26, 27, 12, 13, 28, 29, 14, 15, 30, 31}}, {OPTION_MASK_ALTIVEC, - BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglw_direct_v4si - : CODE_FOR_altivec_vmrghw_direct_v4si, + BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglw_direct_v4si_be + : CODE_FOR_altivec_vmrghw_direct_v4si_le, {8, 9, 10, 11, 24, 25, 26, 27, 12, 13, 14, 15, 28, 29, 30, 31}}, {OPTION_MASK_P8_VECTOR, BYTES_BIG_ENDIAN ? CODE_FOR_p8_vmrgew_v4sf_direct diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md index e226a93bbe5..80f84e9b141 100644 --- a/gcc/config/rs6000/vsx.md +++ b/gcc/config/rs6000/vsx.md @@ -4688,12 +4688,14 @@ (define_expand "vsx_xxmrghw_<mode>" (const_int 1) (const_int 5)])))] "VECTOR_MEM_VSX_P (<MODE>mode)" { - rtx (*fun) (rtx, rtx, rtx); - fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrghw_direct_<mode> - : gen_altivec_vmrglw_direct_<mode>; - if (!BYTES_BIG_ENDIAN) - std::swap (operands[1], operands[2]); - emit_insn (fun (operands[0], operands[1], operands[2])); + if (BYTES_BIG_ENDIAN) + emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0], + operands[1], + operands[2])); + else + emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0], + operands[2], + operands[1])); DONE; } [(set_attr "type" "vecperm")]) @@ -4708,12 +4710,14 @@ (define_expand "vsx_xxmrglw_<mode>" (const_int 3) (const_int 7)])))] "VECTOR_MEM_VSX_P (<MODE>mode)" { - rtx (*fun) (rtx, rtx, rtx); - fun = BYTES_BIG_ENDIAN ? 
gen_altivec_vmrglw_direct_<mode> - : gen_altivec_vmrghw_direct_<mode>; - if (!BYTES_BIG_ENDIAN) - std::swap (operands[1], operands[2]); - emit_insn (fun (operands[0], operands[1], operands[2])); + if (BYTES_BIG_ENDIAN) + emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0], + operands[1], + operands[2])); + else + emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0], + operands[2], + operands[1])); DONE; } [(set_attr "type" "vecperm")]) diff --git a/gcc/testsuite/g++.target/powerpc/pr106069.C b/gcc/testsuite/g++.target/powerpc/pr106069.C new file mode 100644 index 00000000000..2cde9b821e3 --- /dev/null +++ b/gcc/testsuite/g++.target/powerpc/pr106069.C @@ -0,0 +1,120 @@ +/* { dg-options "-O -fno-tree-forwprop -maltivec" } */ +/* { dg-require-effective-target vmx_hw } */ +/* { dg-do run } */ + +extern "C" void * +memcpy (void *, const void *, unsigned long); +typedef __attribute__ ((altivec (vector__))) unsigned native_simd_type; + +union +{ + native_simd_type V; + int R[4]; +} store_le_vec; + +struct S +{ + S () = default; + S (unsigned B0) + { + native_simd_type val{B0}; + m_simd = val; + } + void store_le (unsigned int out[]) + { + store_le_vec.V = m_simd; + unsigned int x0 = store_le_vec.R[0]; + memcpy (out, &x0, 4); + } + S rotl (unsigned int r) + { + native_simd_type rot{r}; + return __builtin_vec_rl (m_simd, rot); + } + void operator+= (S other) + { + m_simd = __builtin_vec_add (m_simd, other.m_simd); + } + void operator^= (S other) + { + m_simd = __builtin_vec_xor (m_simd, other.m_simd); + } + static void transpose (S &B0, S B1, S B2, S B3) + { + native_simd_type T0 = __builtin_vec_mergeh (B0.m_simd, B2.m_simd); + native_simd_type T1 = __builtin_vec_mergeh (B1.m_simd, B3.m_simd); + native_simd_type T2 = __builtin_vec_mergel (B0.m_simd, B2.m_simd); + native_simd_type T3 = __builtin_vec_mergel (B1.m_simd, B3.m_simd); + B0 = __builtin_vec_mergeh (T0, T1); + B3 = __builtin_vec_mergel (T2, T3); + } + S (native_simd_type x) : m_simd (x) {} + 
native_simd_type m_simd;
+};
+
+void
+foo (unsigned int output[], unsigned state[])
+{
+  S R00 = state[0];
+  S R01 = state[0];
+  S R02 = state[2];
+  S R03 = state[0];
+  S R05 = state[5];
+  S R06 = state[6];
+  S R07 = state[7];
+  S R08 = state[8];
+  S R09 = state[9];
+  S R10 = state[10];
+  S R11 = state[11];
+  S R12 = state[12];
+  S R13 = state[13];
+  S R14 = state[4];
+  S R15 = state[15];
+  for (int r = 0; r != 10; ++r)
+    {
+      R09 += R13;
+      R11 += R15;
+      R05 ^= R09;
+      R06 ^= R10;
+      R07 ^= R11;
+      R07 = R07.rotl (7);
+      R00 += R05;
+      R01 += R06;
+      R02 += R07;
+      R15 ^= R00;
+      R12 ^= R01;
+      R13 ^= R02;
+      R00 += R05;
+      R01 += R06;
+      R02 += R07;
+      R15 ^= R00;
+      R12 = R12.rotl (8);
+      R13 = R13.rotl (8);
+      R10 += R15;
+      R11 += R12;
+      R08 += R13;
+      R09 += R14;
+      R05 ^= R10;
+      R06 ^= R11;
+      R07 ^= R08;
+      R05 = R05.rotl (7);
+      R06 = R06.rotl (7);
+      R07 = R07.rotl (7);
+    }
+  R00 += state[0];
+  S::transpose (R00, R01, R02, R03);
+  R00.store_le (output);
+}
+
+unsigned int res[1];
+unsigned main_state[]{1634760805, 60878, 2036477234, 6,
+                      0, 825562964, 1471091955, 1346092787,
+                      506976774, 4197066702, 518848283, 118491664,
+                      0, 0, 0, 0};
+int
+main ()
+{
+  foo (res, main_state);
+  if (res[0] != 0x41fcef98)
+    __builtin_abort ();
+}
-- 
2.27.0

Thread overview: 10+ messages
2023-02-10  2:59 [PATCH v4] rs6000: Fix incorrect RTL for Power LE when removing the UNSPECS [PR106069] Xionghu Luo
2023-02-28  6:43 ` Ping: " Xionghu Luo
2023-03-30 19:30 ` Segher Boessenkool
2023-03-31  2:47 ` Xionghu Luo
2024-06-12  7:50 ` Kewen.Lin
2024-06-18 19:02 ` Peter Bergner
2024-06-19  7:28 ` Kewen.Lin
2024-06-18 20:31 ` Segher Boessenkool
2024-06-19  7:29 ` Kewen.Lin
-- strict thread matches above, loose matches on Subject: below --
2022-08-08  3:42 [PATCH] " Xionghu Luo
2022-08-09  3:01 ` Kewen.Lin
2022-08-10  6:39 ` [PATCH v2] " Xionghu Luo
2022-08-10 17:07 ` Segher Boessenkool
2022-08-11  6:15 ` Xionghu Luo
2022-08-16  6:53 ` Kewen.Lin
2022-08-17  6:23 ` [PATCH v4] " Xionghu Luo