public inbox for gcc-patches@gcc.gnu.org
From: "Kewen.Lin" <linkw@linux.ibm.com>
To: Xionghu Luo <yinyuefengyi@gmail.com>
Cc: segher@kernel.crashing.org, Xionghu Luo <xionghuluo@tencent.com>,
	gcc-patches@gcc.gnu.org, David Edelsohn <dje.gcc@gmail.com>
Subject: Re: [PATCH] rs6000: Fix incorrect RTL for Power LE when removing the UNSPECS [PR106069]
Date: Tue, 9 Aug 2022 11:01:05 +0800
Message-ID: <ec28ad09-f23a-3ffc-3025-f0f52d0e773d@linux.ibm.com>
In-Reply-To: <20220808034247.2618809-1-xionghuluo@tencent.com>

Hi Xionghu,

Thanks for the fix.

on 2022/8/8 11:42, Xionghu Luo wrote:
> The native RTL expressions for vec_mrghw should be the same for BE and LE, as
> they are register- and endian-independent.  So both BE and LE need to
> generate exactly the same RTL with index [0 4 1 5] when expanding vec_mrghw
> with vec_select and vec_concat.
> 
> (set (reg:V4SI 141) (vec_select:V4SI (vec_concat:V8SI
> 		   (subreg:V4SI (reg:V16QI 139) 0)
> 		   (subreg:V4SI (reg:V16QI 140) 0))
> 		   [const_int 0 4 1 5]))
> 
> Then combine pass could do the nested vec_select optimization
> in simplify-rtx.c:simplify_binary_operation_1 also on both BE and LE:
> 
> 21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel [0 4 1 5])
> 24: {r151:SI=vec_select(r150:V4SI,parallel [const_int 3]);}
> 
> =>
> 
> 21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel)
> 24: {r151:SI=vec_select(r146:V4SI,parallel [const_int 1]);}
> 
> The endianness check is then needed only once, at final ASM generation.
> The ASM is better because the nested vec_select is simplified to a simple
> scalar load.
> 
> Regression tested pass for Power8{LE,BE}{32,64} and Power{9,10}LE{32,64}

Sorry, there was no -m32 testing for LE.  I noticed the attachment in that PR
didn't include the test case (though the changelog lists it), so I re-tested
it, and nothing changed.  :)

> Linux(Thanks to Kewen), OK for master?  Or should we revert r12-4496 to
> restore to the UNSPEC implementation?
> 

I have some concerns about the changed "altivec_*_direct" patterns.  IMHO the
"_direct" suffix normally indicates that the define_insn maps directly to the
corresponding hw insn.  With this change, altivec_vmrghb_direct, for example,
can map to either vmrghb or vmrglb, which looks misleading.  Maybe we can add
corresponding _direct_le and _direct_be versions, which map to the same insn
but have different RTL patterns.  Looking forward to Segher's and David's
suggestions.

> gcc/ChangeLog:
> 	PR target/106069
> 	* config/rs6000/altivec.md (altivec_vmrghb): Emit same native
> 	RTL for BE and LE.
> 	(altivec_vmrghh): Likewise.
> 	(altivec_vmrghw): Likewise.
> 	(*altivec_vmrghsf): Adjust.
> 	(altivec_vmrglb): Likewise.
> 	(altivec_vmrglh): Likewise.
> 	(altivec_vmrglw): Likewise.
> 	(*altivec_vmrglsf): Adjust.
> 	(altivec_vmrghb_direct): Emit different ASM for BE and LE.
> 	(altivec_vmrghh_direct): Likewise.
> 	(altivec_vmrghw_direct_<mode>): Likewise.
> 	(altivec_vmrglb_direct): Likewise.
> 	(altivec_vmrglh_direct): Likewise.
> 	(altivec_vmrglw_direct_<mode>): Likewise.
> 	(vec_widen_smult_hi_v16qi): Adjust.
> 	(vec_widen_smult_lo_v16qi): Adjust.
> 	(vec_widen_umult_hi_v16qi): Adjust.
> 	(vec_widen_umult_lo_v16qi): Adjust.
> 	(vec_widen_smult_hi_v8hi): Adjust.
> 	(vec_widen_smult_lo_v8hi): Adjust.
> 	(vec_widen_umult_hi_v8hi): Adjust.
> 	(vec_widen_umult_lo_v8hi): Adjust.
> 	* config/rs6000/rs6000.cc (altivec_expand_vec_perm_const): Emit same
> 	native RTL for BE and LE.
> 	* config/rs6000/vsx.md (vsx_xxmrghw_<mode>): Likewise.
> 	(vsx_xxmrglw_<mode>): Likewise.
> 
> gcc/testsuite/ChangeLog:
> 	PR target/106069
> 	* gcc.target/powerpc/pr106069.C: New test.
> 
> Signed-off-by: Xionghu Luo <xionghuluo@tencent.com>
> ---
>  gcc/config/rs6000/altivec.md                | 122 ++++++++++++--------
>  gcc/config/rs6000/rs6000.cc                 |  36 +++---
>  gcc/config/rs6000/vsx.md                    |  16 +--
>  gcc/testsuite/gcc.target/powerpc/pr106069.C | 118 +++++++++++++++++++
>  4 files changed, 209 insertions(+), 83 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr106069.C
> 
> diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
> index 2c4940f2e21..8d9c0109559 100644
> --- a/gcc/config/rs6000/altivec.md
> +++ b/gcc/config/rs6000/altivec.md
> @@ -1144,11 +1144,7 @@ (define_expand "altivec_vmrghb"
>     (use (match_operand:V16QI 2 "register_operand"))]
>    "TARGET_ALTIVEC"
>  {
> -  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrghb_direct
> -						: gen_altivec_vmrglb_direct;
> -  if (!BYTES_BIG_ENDIAN)
> -    std::swap (operands[1], operands[2]);
> -  emit_insn (fun (operands[0], operands[1], operands[2]));
> +  emit_insn (gen_altivec_vmrghb_direct (operands[0], operands[1], operands[2]));
>    DONE;
>  })
>  
> @@ -1167,7 +1163,12 @@ (define_insn "altivec_vmrghb_direct"
>  		     (const_int 6) (const_int 22)
>  		     (const_int 7) (const_int 23)])))]
>    "TARGET_ALTIVEC"
> -  "vmrghb %0,%1,%2"
> +  {
> +     if (BYTES_BIG_ENDIAN)
> +      return "vmrghb %0,%1,%2";
> +    else
> +      return "vmrglb %0,%2,%1";
> + }
>    [(set_attr "type" "vecperm")])
>  
>  (define_expand "altivec_vmrghh"
> @@ -1176,11 +1177,7 @@ (define_expand "altivec_vmrghh"
>     (use (match_operand:V8HI 2 "register_operand"))]
>    "TARGET_ALTIVEC"
>  {
> -  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrghh_direct
> -						: gen_altivec_vmrglh_direct;
> -  if (!BYTES_BIG_ENDIAN)
> -    std::swap (operands[1], operands[2]);
> -  emit_insn (fun (operands[0], operands[1], operands[2]));
> +  emit_insn (gen_altivec_vmrghh_direct (operands[0], operands[1], operands[2]));
>    DONE;
>  })
>  
> @@ -1195,7 +1192,12 @@ (define_insn "altivec_vmrghh_direct"
>  		     (const_int 2) (const_int 10)
>  		     (const_int 3) (const_int 11)])))]
>    "TARGET_ALTIVEC"
> -  "vmrghh %0,%1,%2"
> +  {
> +     if (BYTES_BIG_ENDIAN)
> +      return "vmrghh %0,%1,%2";
> +    else
> +      return "vmrglh %0,%2,%1";
> + }
>    [(set_attr "type" "vecperm")])
>  
>  (define_expand "altivec_vmrghw"
> @@ -1204,12 +1206,8 @@ (define_expand "altivec_vmrghw"
>     (use (match_operand:V4SI 2 "register_operand"))]
>    "VECTOR_MEM_ALTIVEC_P (V4SImode)"
>  {
> -  rtx (*fun) (rtx, rtx, rtx);
> -  fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrghw_direct_v4si
> -			 : gen_altivec_vmrglw_direct_v4si;
> -  if (!BYTES_BIG_ENDIAN)
> -    std::swap (operands[1], operands[2]);
> -  emit_insn (fun (operands[0], operands[1], operands[2]));
> +  emit_insn (
> +    gen_altivec_vmrghw_direct_v4si (operands[0], operands[1], operands[2]));
>    DONE;
>  })
>  
[snip]
>    [(set_attr "type" "vecperm")])
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr106069.C b/gcc/testsuite/gcc.target/powerpc/pr106069.C
> new file mode 100644
> index 00000000000..56219a74692
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr106069.C

Since this is a C++ test case, it should be placed in gcc/testsuite/g++.target/powerpc/.

> @@ -0,0 +1,118 @@
> +/* { dg-do run } */

This case requires altivec, so it needs something like:

/* { dg-require-effective-target vmx_hw } */
/* { dg-options "-maltivec" } */

BR,
Kewen

> +
> +extern "C" void *
> +memcpy (void *, const void *, unsigned long);
> +typedef __attribute__ ((altivec (vector__))) unsigned native_simd_type;
> +
> +union
> +{
> +  native_simd_type V;
> +  int R[4];
> +} store_le_vec;
> +
> +struct S
> +{
> +  S () = default;
> +  S (unsigned B0)
> +  {
> +    native_simd_type val{B0};
> +    m_simd = val;
> +  }
> +  void store_le (unsigned int out[])
> +  {
> +    store_le_vec.V = m_simd;
> +    unsigned int x0 = store_le_vec.R[0];
> +    memcpy (out, &x0, 1);
> +  }
> +  S rotl (unsigned int r)
> +  {
> +    native_simd_type rot{r};
> +    return __builtin_vec_rl (m_simd, rot);
> +  }
> +  void operator+= (S other)
> +  {
> +    m_simd = __builtin_vec_add (m_simd, other.m_simd);
> +  }
> +  void operator^= (S other)
> +  {
> +    m_simd = __builtin_vec_xor (m_simd, other.m_simd);
> +  }
> +  static void transpose (S &B0, S B1, S B2, S B3)
> +  {
> +    native_simd_type T0 = __builtin_vec_mergeh (B0.m_simd, B2.m_simd);
> +    native_simd_type T1 = __builtin_vec_mergeh (B1.m_simd, B3.m_simd);
> +    native_simd_type T2 = __builtin_vec_mergel (B0.m_simd, B2.m_simd);
> +    native_simd_type T3 = __builtin_vec_mergel (B1.m_simd, B3.m_simd);
> +    B0 = __builtin_vec_mergeh (T0, T1);
> +    B3 = __builtin_vec_mergel (T2, T3);
> +  }
> +  S (native_simd_type x) : m_simd (x) {}
> +  native_simd_type m_simd;
> +};
> +
> +void
> +foo (unsigned int output[], unsigned state[])
> +{
> +  S R00 = state[0];
> +  S R01 = state[0];
> +  S R02 = state[2];
> +  S R03 = state[0];
> +  S R05 = state[5];
> +  S R06 = state[6];
> +  S R07 = state[7];
> +  S R08 = state[8];
> +  S R09 = state[9];
> +  S R10 = state[10];
> +  S R11 = state[11];
> +  S R12 = state[12];
> +  S R13 = state[13];
> +  S R14 = state[4];
> +  S R15 = state[15];
> +  for (int r = 0; r != 10; ++r)
> +    {
> +      R09 += R13;
> +      R11 += R15;
> +      R05 ^= R09;
> +      R06 ^= R10;
> +      R07 ^= R11;
> +      R07 = R07.rotl (7);
> +      R00 += R05;
> +      R01 += R06;
> +      R02 += R07;
> +      R15 ^= R00;
> +      R12 ^= R01;
> +      R13 ^= R02;
> +      R00 += R05;
> +      R01 += R06;
> +      R02 += R07;
> +      R15 ^= R00;
> +      R12 = R12.rotl (8);
> +      R13 = R13.rotl (8);
> +      R10 += R15;
> +      R11 += R12;
> +      R08 += R13;
> +      R09 += R14;
> +      R05 ^= R10;
> +      R06 ^= R11;
> +      R07 ^= R08;
> +      R05 = R05.rotl (7);
> +      R06 = R06.rotl (7);
> +      R07 = R07.rotl (7);
> +    }
> +  R00 += state[0];
> +  S::transpose (R00, R01, R02, R03);
> +  R00.store_le (output);
> +}
> +
> +unsigned int res[1];
> +unsigned main_state[]{1634760805, 60878,      2036477234, 6,
> +		      0,	  825562964,  1471091955, 1346092787,
> +		      506976774,  4197066702, 518848283,  118491664,
> +		      0,	  0,	      0,	  0};
> +int
> +main ()
> +{
> +  foo (res, main_state);
> +  if (res[0] != 0x41fcef98)
> +    __builtin_abort ();
> +}
