[PATCH 0/6 ver 4] ] Permute Class Operations

public inbox for gcc-patches@gcc.gnu.org

* [PATCH 0/6 ver 4]  ] Permute Class Operations
@ 2020-07-08 19:44 Carl Love
  0 siblings, 0 replies; 18+ messages in thread
From: Carl Love @ 2020-07-08 19:44 UTC (permalink / raw)
  To: segher, dje.gcc, gcc-patches, Will Schmidt

Segher:

The following is version 4 of the series of patches for the permute
class operations.  Per your request, I will send each patch as a reply
to this message so they are all in the same thread in your email box.  

Patches 1, 2, 3 and 4 just have minor fixes per your earlier comments.
However, the patches have been rebased onto the latest mainline tree
which required changing FUTURE to P10 in the code and test cases.  So I
am sending everything.  

Patch 5 has the changes for the f32bit_const_operand stuff that we
discussed at length.  It also has the changes with regard to the
xxspltidp instruction, which has undefined results for subnormal
inputs, that we also talked about.  See the comments in the V4 fixes
for the specific things that need reviewing and commenting on.

Patch 6 didn't get reviewed the last time as we discussed that the
whole series needed rebasing due to the FUTURE to P10 changes that had
gone into mainline.

The series has been retested on Power 9, and the testcases have been
run on mambo.  Everything seems to check out fine.

Please let me know if the series is acceptable for mainline.  Thanks.

                     Carl 


* Re: [PATCH 0/6 ver 4]  ] Permute Class Operations
  2020-07-08 19:59 Carl Love
  2020-07-09 17:38 ` will schmidt
@ 2020-07-15 20:07 ` Segher Boessenkool
  1 sibling, 0 replies; 18+ messages in thread
From: Segher Boessenkool @ 2020-07-15 20:07 UTC (permalink / raw)
  To: Carl Love; +Cc: dje.gcc, gcc-patches, Will Schmidt, Bill Schmidt

Hi!

On Wed, Jul 08, 2020 at 12:59:29PM -0700, Carl Love wrote:
> [PATCH 5/6] rs6000, Add vector splat builtin support

> +(define_insn "xxspltiw_v4si"
> +  [(set (match_operand:V4SI 0 "register_operand" "=wa")
> +	(unspec:V4SI [(match_operand:SI 1 "s32bit_cint_operand" "n")]
> +		     UNSPEC_XXSPLTIW))]
> + "TARGET_POWER10"
> + "xxspltiw %x0,%1"
> + [(set_attr "type" "vecsimple")])

Hrm, from the instruction description (in the ISA) this should be an
unsigned integer instead?  (GNU as doesn't care: it takes the low 32
bits of any integer, so it doesn't have to be either an s32 or a u32,
apparently).
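
To illustrate the point about the low 32 bits, here is a minimal
sketch in plain C.  The helper name low32 is made up for this example
and is not part of GCC or gas; it only models "keep the low 32 bits of
whatever integer was written", which is the behaviour described above.

```c
#include <stdint.h>

/* Hypothetical helper (not a GCC or binutils function): model an
   assembler that keeps only the low 32 bits of an immediate operand.
   Conversion to uint32_t is defined as reduction modulo 2^32.  */
static uint32_t
low32 (int64_t imm)
{
  return (uint32_t) imm;
}
```

With this model, -1 written as a signed value and 4294967295 written
as an unsigned value both yield the same 32-bit field, which is why
the choice between an s32 and a u32 predicate does not change the
encoded immediate.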

> +(define_insn "xxspltiw_v4sf_inst"
> +  [(set (match_operand:V4SF 0 "register_operand" "=wa")
> +	(unspec:V4SF [(match_operand:SI 1 "c32bit_cint_operand" "n")]
> +		     UNSPEC_XXSPLTIW))]
> + "TARGET_POWER10"
> + "xxspltiw %x0,%c1"
> + [(set_attr "type" "vecsimple")])

This will do exactly the same as just "%1"?  Or not?  (I.e. call
output_addr_const for that arg).  (We don't use %c anywhere else in the
port AFAICS, so let's not start that if there is no reason to).

> +(define_expand "xxspltidp_v2df"
> +  [(set (match_operand:V2DF 0 "register_operand" )
> +	(unspec:V2DF [(match_operand:SF 1 "const_double_operand")]
> +		     UNSPEC_XXSPLTID))]
> + "TARGET_POWER10"
> +{
> +  long value = rs6000_const_f32_to_i32 (operands[1]);
> +  emit_insn (gen_xxspltidp_v2df_inst (operands[0], GEN_INT (value)));
> +  DONE;
> +})
> +
> +(define_insn "xxspltidp_v2df_inst"
> +  [(set (match_operand:V2DF 0 "register_operand" "=wa")
> +	(unspec:V2DF [(match_operand:SI 1 "c32bit_cint_operand" "n")]
> +		     UNSPEC_XXSPLTID))]
> +  "TARGET_POWER10"
> +{
> +  /* Note, the xxspltidp gives undefined results if the operand is a single
> +     precision subnormal number. */
> +  int value = INTVAL (operands[1]);
> +
> +  if (((value & 0x7F800000) == 0) && ((value & 0x7FFFFF) != 0))
> +    /* value is subnormal */
> +    fprintf (stderr, "WARNING: Result for the xxspltidp instruction is undefined for subnormal input values.\n");
> +
> +  return "xxspltidp %x0,%c1";
> +}
> +  [(set_attr "type" "vecsimple")])

There are utility functions to print warnings.  But we shouldn't print
one at all here.  Instead, the insn shouldn't match at all with bad
inputs, or maybe give an actual error (although it is nicer if the
builtin handling code does that).
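
For reference, the subnormal test in the quoted pattern can be
isolated as a small predicate.  This is only a sketch under stated
assumptions: the name sf_bits_subnormal_p is invented here, and where
such a check would live (a predicate, the insn condition, or the
builtin expander) is exactly the open question above.

```c
#include <stdint.h>

/* Hypothetical helper (name invented for this sketch): return 1 if
   BITS is the IEEE 754 single-precision image of a subnormal number,
   i.e. the exponent field is all zeros and the fraction field is
   nonzero.  The sign bit is ignored, as in the quoted pattern.  */
static int
sf_bits_subnormal_p (uint32_t bits)
{
  return (bits & 0x7F800000u) == 0 && (bits & 0x007FFFFFu) != 0;
}
```

Zero (both fields zero) and the smallest normal number (exponent
field 1) are both rejected by this test; only the values in between
are flagged.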

> +(define_insn "xxsplti32dx_v4sf_inst"
> +  [(set (match_operand:V4SF 0 "register_operand" "=wa")
> +	(unspec:V4SF [(match_operand:V4SF 1 "register_operand" "0")
> +		      (match_operand:QI 2 "u1bit_cint_operand" "n")
> +		      (match_operand:SI 3 "s32bit_cint_operand" "n")]
> +		     UNSPEC_XXSPLTI32DX))]
> +  "TARGET_POWER10"
> +  "xxsplti32dx %x0,%2,%3"
> +   [(set_attr "type" "vecsimple")])

(a space too much indent here)

> +;; Return 1 if op is a unsigned 1-bit constant integer.
> +(define_predicate "u1bit_cint_operand"

"an unsigned"

> +long long
> +rs6000_const_f32_to_i32 (rtx operand)
> +{
> +  long long value;
> +  const struct real_value *rv = CONST_DOUBLE_REAL_VALUE (operand);
> +
> +  gcc_assert (GET_MODE (operand) == SFmode);
> +  REAL_VALUE_TO_TARGET_SINGLE (*rv, value);
> +  return value;
> +}

Can this just return "int"?  (Or "unsigned int"?)
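
As a host-side analogy only (this is not how GCC's
REAL_VALUE_TO_TARGET_SINGLE works internally), the single-precision
image of a value always fits in 32 bits, so a 32-bit return type is
wide enough.  The sketch below assumes the host float is IEEE 754
binary32; the function name f32_image is made up for the example.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: return the 32-bit image of a float.  Assumes
   the host float type is IEEE 754 binary32 (4 bytes).  memcpy is used
   to copy the object representation without aliasing problems.  */
static uint32_t
f32_image (float f)
{
  uint32_t bits;
  memcpy (&bits, &f, sizeof bits);
  return bits;
}
```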


The rest of the patch looks good.


Segher


* Re: [PATCH 0/6 ver 4]  ] Permute Class Operations
  2020-07-08 19:59 Carl Love
  2020-07-09 16:13 ` will schmidt
@ 2020-07-14 20:15 ` Segher Boessenkool
  1 sibling, 0 replies; 18+ messages in thread
From: Segher Boessenkool @ 2020-07-14 20:15 UTC (permalink / raw)
  To: Carl Love; +Cc: dje.gcc, gcc-patches, Will Schmidt, Bill Schmidt

Hi!

On Wed, Jul 08, 2020 at 12:59:19PM -0700, Carl Love wrote:
> [PATCH 4/6] rs6000, Add vector shift double builtin support

> 	* config/rs6000/altivec.h (vec_sldb, vec_srdb): New defines.
> 	* config/rs6000/altivec.md (UNSPEC_SLDB, UNSPEC_SRDB): New.
> 	(SLDB_LR): New attribute.
> 	(VSHIFT_DBL_LR): New iterator.
> 	(vs<SLDB_LR>db_<mode>): New define_insn.

You renamed these iterators / attributes?  Well, some of them, anyway.
Hrm.

> 	* config/rs6000/rs6000-call.c (P10_BUILTIN_VEC_SLDB,
> 	P10_BUILTIN_VEC_SRDB): New definitions.
> 	(rs6000_expand_ternop_builtin) [CODE_FOR_vsldb_v16qi,
> 	CODE_FOR_vsldb_v8hi, CODE_FOR_vsldb_v4si, CODE_FOR_vsldb_v2di,
> 	CODE_FOR_vsrdb_v16qi, CODE_FOR_vsrdb_v8hi, CODE_FOR_vsrdb_v4si,
> 	CODE_FOR_vsrdb_v2di}: Add clauses.

"]" (not "}").


Okay for trunk.  Thank you!


Segher


* Re: [PATCH 0/6 ver 4]  ] Permute Class Operations
  2020-07-08 19:59 Carl Love
  2020-07-09 16:02 ` will schmidt
@ 2020-07-13 14:30 ` Segher Boessenkool
  1 sibling, 0 replies; 18+ messages in thread
From: Segher Boessenkool @ 2020-07-13 14:30 UTC (permalink / raw)
  To: Carl Love; +Cc: dje.gcc, gcc-patches, Will Schmidt, Bill Schmidt

Hi!

On Wed, Jul 08, 2020 at 12:59:12PM -0700, Carl Love wrote:
> [PATCH 3/6] rs6000, Add vector replace builtin support

This is okay for trunk.  Thanks!


Segher


* Re: [PATCH 0/6 ver 4]  ] Permute Class Operations
  2020-07-09 16:02 ` will schmidt
@ 2020-07-13 12:41   ` Segher Boessenkool
  0 siblings, 0 replies; 18+ messages in thread
From: Segher Boessenkool @ 2020-07-13 12:41 UTC (permalink / raw)
  To: will schmidt; +Cc: Carl Love, dje.gcc, gcc-patches, Bill Schmidt

On Thu, Jul 09, 2020 at 11:02:39AM -0500, will schmidt wrote:
> > 	* config/rs6000/rs6000-call.c (P10_BUILTIN_VEC_REPLACE_ELT,
> > 	P10_BUILTIN_VEC_REPLACE_UN): New.
> 
> New what?

Just "New." is fine :-)


Segher


* Re: [PATCH 0/6 ver 4]  ] Permute Class Operations
  2020-07-08 19:59 Carl Love
  2020-07-09 15:44 ` will schmidt
@ 2020-07-13 12:04 ` Segher Boessenkool
  1 sibling, 0 replies; 18+ messages in thread
From: Segher Boessenkool @ 2020-07-13 12:04 UTC (permalink / raw)
  To: Carl Love; +Cc: dje.gcc, gcc-patches, Will Schmidt, Bill Schmidt

Hi!

On Wed, Jul 08, 2020 at 12:59:00PM -0700, Carl Love wrote:
> [PATCH 2/6] rs6000 Add vector insert builtin support

> +For little-endian,
> +the generated code will be semantically equivalent to vinsbrx, vinshrx,
> +or vinswrx instructions.  Similarly for big-endian it will be semantically
> +equivalent to vinsblx, vinshlx, vinswlx.

"This builtin generates vins[bhw]lx on BE" etc.?

>  Note that some
> +fairly anomalous results can be generated if the byte index is not aligned
> +on an element boundary for the sort of element being inserted. This is a
> +limitation of the bi-endian vector programming model.

Yeah, leave this out, like Will says?  (Also, two spaces after dot.)

> +For little-endian, the code generation will be semantically equivalent to
> +vins*lx, while for big-endian it will be semantically equivalent to vins*rx.

Similar / same as above.

This is okay for trunk (with that improved a bit, also the typos and
other doc things Will found).  Thanks!


Segher


* Re: [PATCH 0/6 ver 4]  ] Permute Class Operations
  2020-07-08 19:59 Carl Love
@ 2020-07-09 18:28 ` will schmidt
  0 siblings, 0 replies; 18+ messages in thread
From: will schmidt @ 2020-07-09 18:28 UTC (permalink / raw)
  To: Carl Love, segher, dje.gcc, gcc-patches

On Wed, 2020-07-08 at 12:59 -0700, Carl Love wrote:
> [PATCH 6/6] rs6000 Add vector blend, permute builtin support
> 
> ----------------------------------
> V4 Fixes:
> 
>    Rebased on mainline.  Changed FUTURE to P10.
> ---------
> 
> v3 fixes:
>    Replace spaces with tabs in ChangeLog description.
>    Fix implementation comments for define_expand "xxpermx" in file
>      gcc/config/rs6000/alitvec.md.
>    Fix minor typos in the comments for the changes in
> gcc/config/rs6000/rs6000-call.c.
> 
> --------------------
> v2 changes:
> 
>    Updated ChangeLog per comments.
> 
>    Updated implementation of the define_expand "xxpermx".
> 
>    Fixed the comments and check for 3-bit immediate field for the
> 	CODE_FOR_xxpermx check.
> 
>    gcc/doc/extend.texi:
> 	comment "Maybe it should say it is related to vsel/xxsel, but
> per
> 	bigger element?", added comment.  I took the description
> directly
> 	from spec.  Don't really don't want to mess with the approved
> 	description.
> 
>        fixed typo for Vector Permute Extendedextracth
> 
> ----------
> 
> GCC maintainers:
> 
> The following patch adds support for the vec_blendv and vec_permx
> builtins.
> 
> The patch has been compiled and tested on
> 
>   powerpc64le-unknown-linux-gnu (Power 9 LE)
> 
> with no regression errors.
> 
> The test cases were compiled on a Power 9 system and then tested on
> Mambo.
> 
>                          Carl Love
> 
> ---------------------------------------------------------------
> rs6000 RFC2609 vector blend, permute instructions
> 
> gcc/ChangeLog
> 
> 2020-07-06  Carl Love  <cel@us.ibm.com>
> 
> 	* config/rs6000/altivec.h (vec_blendv, vec_permx): Add define.
> 	* config/rs6000/altivec.md (UNSPEC_XXBLEND, UNSPEC_XXPERMX.):
> New
> 	unspecs.
> 	(VM3): New define_mode.
> 	(VM3_char): New define_attr.
> 	(xxblend_<mode> mode VM3): New define_insn.
> 	(xxpermx): New define_expand.
> 	(xxpermx_inst): New define_insn.
> 	* config/rs6000/rs6000-builtin.def (VXXBLEND_V16QI,
> VXXBLEND_V8HI,
> 	VXXBLEND_V4SI, VXXBLEND_V2DI, VXXBLEND_V4SF, VXXBLEND_V2DF):
> New
> 	BU_P10V_3 definitions.

> 	(XXBLENDBU_P10_OVERLOAD_3): New BU_P10_OVERLOAD_3 definition.

Extra noise in the parentheses; this should be just XXBLEND.

> 	(XXPERMX): New BU_P10_OVERLOAD_4 definition.


> 	* config/rs6000/rs6000-c.c
> (altivec_resolve_overloaded_builtin):
> 	(P10_BUILTIN_VXXPERMX): Add if case support.
> 	* config/rs6000/rs6000-call.c (P10_BUILTIN_VXXBLEND_V16QI,
> 	P10_BUILTIN_VXXBLEND_V8HI, P10_BUILTIN_VXXBLEND_V4SI,
> 	P10_BUILTIN_VXXBLEND_V2DI, P10_BUILTIN_VXXBLEND_V4SF,
> 	P10_BUILTIN_VXXBLEND_V2DF, P10_BUILTIN_VXXPERMX): Define
> 	overloaded arguments.
> 	(rs6000_expand_quaternop_builtin): Add if case for
> CODE_FOR_xxpermx.

s/if//

> 	(builtin_quaternary_function_type): Add v16uqi_type and
> xxpermx_type
> 	variables, add case statement for P10_BUILTIN_VXXPERMX.
> 	(builtin_function_type)[P10_BUILTIN_VXXBLEND_V16QI,
> 	P10_BUILTIN_VXXBLEND_V8HI, P10_BUILTIN_VXXBLEND_V4SI,
> 	P10_BUILTIN_VXXBLEND_V2DI]: Add case statements.

Add space after (builtin_function_type)
Reverse tense?
(b_f_t) Add case statements for P10_BUILTIN_..., P10_BUILTIN_...


> 	* doc/extend.texi: Add documentation for vec_blendv and
> vec_permx.
> 
> gcc/testsuite/ChangeLog
> 
> 2020-07-06  Carl Love  <cel@us.ibm.com>
> 	gcc.target/powerpc/vec-blend-runnable.c: New test.
> 	gcc.target/powerpc/vec-permute-ext-runnable.c: New test.
> ---
>  gcc/config/rs6000/altivec.h                   |   2 +
>  gcc/config/rs6000/altivec.md                  |  71 +++++
>  gcc/config/rs6000/rs6000-builtin.def          |  13 +
>  gcc/config/rs6000/rs6000-c.c                  |  27 +-
>  gcc/config/rs6000/rs6000-call.c               |  95 ++++++
>  gcc/doc/extend.texi                           |  63 ++++
>  .../gcc.target/powerpc/vec-blend-runnable.c   | 276 ++++++++++++++++
>  .../powerpc/vec-permute-ext-runnable.c        | 294
> ++++++++++++++++++
>  8 files changed, 835 insertions(+), 6 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-blend-
> runnable.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-permute-ext-
> runnable.c
> 
> diff --git a/gcc/config/rs6000/altivec.h
> b/gcc/config/rs6000/altivec.h
> index 126409c168b..e8fdeb31b0b 100644
> --- a/gcc/config/rs6000/altivec.h
> +++ b/gcc/config/rs6000/altivec.h
> @@ -708,6 +708,8 @@ __altivec_scalar_pred(vec_any_nle,
>  #define vec_splati(a)  __builtin_vec_xxspltiw (a)
>  #define vec_splatid(a) __builtin_vec_xxspltid (a)
>  #define vec_splati_ins(a, b, c)        __builtin_vec_xxsplti32dx (a,
> b, c)
> +#define vec_blendv(a, b, c)    __builtin_vec_xxblend (a, b, c)
> +#define vec_permx(a, b, c, d)  __builtin_vec_xxpermx (a, b, c, d)
> 
>  #define vec_gnb(a, b)	__builtin_vec_gnb (a, b)
>  #define vec_clrl(a, b)	__builtin_vec_clrl (a, b)
> diff --git a/gcc/config/rs6000/altivec.md
> b/gcc/config/rs6000/altivec.md
> index f6858b5bf2a..226cf121f12 100644
> --- a/gcc/config/rs6000/altivec.md
> +++ b/gcc/config/rs6000/altivec.md
> @@ -177,6 +177,8 @@
>     UNSPEC_XXSPLTIW
>     UNSPEC_XXSPLTID
>     UNSPEC_XXSPLTI32DX
> +   UNSPEC_XXBLEND
> +   UNSPEC_XXPERMX
>  ])
> 
>  (define_c_enum "unspecv"
> @@ -219,6 +221,21 @@
>  			   (KF "FLOAT128_VECTOR_P (KFmode)")
>  			   (TF "FLOAT128_VECTOR_P (TFmode)")])
> 
> +;; Like VM2, just do char, short, int, long, float and double
> +(define_mode_iterator VM3 [V4SI
> +			   V8HI
> +			   V16QI
> +			   V4SF
> +			   V2DF
> +			   V2DI])
> +
> +(define_mode_attr VM3_char [(V2DI "d")
> +			   (V4SI "w")
> +			   (V8HI "h")
> +			   (V16QI "b")
> +			   (V2DF  "d")
> +			   (V4SF  "w")])
> +
>  ;; Map the Vector convert single precision to double precision for
> integer
>  ;; versus floating point
>  (define_mode_attr VS_sxwsp [(V4SI "sxw") (V4SF "sp")])
> @@ -916,6 +933,60 @@
>    "xxsplti32dx %x0,%2,%3"
>     [(set_attr "type" "vecsimple")])
> 
> +(define_insn "xxblend_<mode>"
> +  [(set (match_operand:VM3 0 "register_operand" "=wa")
> +	(unspec:VM3 [(match_operand:VM3 1 "register_operand" "wa")
> +		     (match_operand:VM3 2 "register_operand" "wa")
> +		     (match_operand:VM3 3 "register_operand" "wa")]
> +		    UNSPEC_XXBLEND))]
> +  "TARGET_POWER10"
> +  "xxblendv<VM3_char> %x0,%x1,%x2,%x3"
> +  [(set_attr "type" "vecsimple")])
> +
> +(define_expand "xxpermx"
> +  [(set (match_operand:V2DI 0 "register_operand" "+wa")
> +	(unspec:V2DI [(match_operand:V2DI 1 "register_operand" "wa")
> +		      (match_operand:V2DI 2 "register_operand" "wa")
> +		      (match_operand:V16QI 3 "register_operand" "wa")
> +		      (match_operand:QI 4 "u8bit_cint_operand" "n")]
> +		     UNSPEC_XXPERMX))]
> +  "TARGET_POWER10"
> +{
> +  if (BYTES_BIG_ENDIAN)
> +    emit_insn (gen_xxpermx_inst (operands[0], operands[1],
> +				 operands[2], operands[3],
> +				 operands[4]));
> +  else
> +    {
> +      /* Reverse value of byte element indexes by XORing with 0xFF.
> +	 Reverse the 32-byte section identifier match by subracting
> bits [0:2]
> +	 of elemet from 7.  */
> +      int value = INTVAL (operands[4]);
> +      rtx vreg = gen_reg_rtx (V16QImode);
> +
> +      emit_insn (gen_xxspltib_v16qi (vreg, GEN_INT (-1)));
> +      emit_insn (gen_xorv16qi3 (operands[3], operands[3], vreg));
> +      value = 7 - value;
> +      emit_insn (gen_xxpermx_inst (operands[0], operands[2],
> +				   operands[1], operands[3],
> +				   GEN_INT (value)));
> +    }
> +
> +  DONE;
> +}
> +  [(set_attr "type" "vecsimple")])
> +
> +(define_insn "xxpermx_inst"
> +  [(set (match_operand:V2DI 0 "register_operand" "+v")
> +	(unspec:V2DI [(match_operand:V2DI 1 "register_operand" "v")
> +		      (match_operand:V2DI 2 "register_operand" "v")
> +		      (match_operand:V16QI 3 "register_operand" "v")
> +		      (match_operand:QI 4 "u3bit_cint_operand" "n")]
> +		     UNSPEC_XXPERMX))]
> +  "TARGET_POWER10"
> +  "xxpermx %x0,%x1,%x2,%x3,%4"
> +  [(set_attr "type" "vecsimple")])
> +
>  (define_expand "vstrir_<mode>"
>    [(set (match_operand:VIshort 0 "altivec_register_operand")
>  	(unspec:VIshort [(match_operand:VIshort 1
> "altivec_register_operand")]
> diff --git a/gcc/config/rs6000/rs6000-builtin.def
> b/gcc/config/rs6000/rs6000-builtin.def
> index ddfe287efc8..3d45354c573 100644
> --- a/gcc/config/rs6000/rs6000-builtin.def
> +++ b/gcc/config/rs6000/rs6000-builtin.def
> @@ -2756,6 +2756,15 @@ BU_P10V_1 (VXXSPLTID, "vxxspltidp", CONST,
> xxspltidp_v2df)
>  BU_P10V_3 (VXXSPLTI32DX_V4SI, "vxxsplti32dx_v4si", CONST,
> xxsplti32dx_v4si)
>  BU_P10V_3 (VXXSPLTI32DX_V4SF, "vxxsplti32dx_v4sf", CONST,
> xxsplti32dx_v4sf)
> 
> +BU_P10V_3 (VXXBLEND_V16QI, "xxblend_v16qi", CONST, xxblend_v16qi)
> +BU_P10V_3 (VXXBLEND_V8HI, "xxblend_v8hi", CONST, xxblend_v8hi)
> +BU_P10V_3 (VXXBLEND_V4SI, "xxblend_v4si", CONST, xxblend_v4si)
> +BU_P10V_3 (VXXBLEND_V2DI, "xxblend_v2di", CONST, xxblend_v2di)
> +BU_P10V_3 (VXXBLEND_V4SF, "xxblend_v4sf", CONST, xxblend_v4sf)
> +BU_P10V_3 (VXXBLEND_V2DF, "xxblend_v2df", CONST, xxblend_v2df)
> +
> +BU_P10V_4 (VXXPERMX, "xxpermx", CONST, xxpermx)
> +
>  BU_P10V_1 (VSTRIBR, "vstribr", CONST, vstrir_v16qi)
>  BU_P10V_1 (VSTRIHR, "vstrihr", CONST, vstrir_v8hi)
>  BU_P10V_1 (VSTRIBL, "vstribl", CONST, vstril_v16qi)
> @@ -2791,6 +2800,10 @@ BU_P10_OVERLOAD_1 (VSTRIL_P, "stril_p")
>  BU_P10_OVERLOAD_1 (XXSPLTIW, "xxspltiw")
>  BU_P10_OVERLOAD_1 (XXSPLTID, "xxspltid")
>  BU_P10_OVERLOAD_3 (XXSPLTI32DX, "xxsplti32dx")
> +
> +BU_P10_OVERLOAD_3 (XXBLEND, "xxblend")
> +BU_P10_OVERLOAD_4 (XXPERMX, "xxpermx")
> +
>  
>  /* 1 argument crypto functions.  */
>  BU_CRYPTO_1 (VSBOX,		"vsbox",	  CONST, crypto_vsbox_v2di)
> diff --git a/gcc/config/rs6000/rs6000-c.c b/gcc/config/rs6000/rs6000-
> c.c
> index cb7d34dcdb5..db6aecfad2d 100644
> --- a/gcc/config/rs6000/rs6000-c.c
> +++ b/gcc/config/rs6000/rs6000-c.c
> @@ -1800,22 +1800,37 @@ altivec_resolve_overloaded_builtin
> (location_t loc, tree fndecl,
>  	      unsupported_builtin = true;
>  	  }
>        }
> -    else if (fcode == P10_BUILTIN_VEC_XXEVAL)
> +    else if ((fcode == P10_BUILTIN_VEC_XXEVAL)
> +	    || (fcode == P10_BUILTIN_VXXPERMX))
>        {
> -	/* Need to special case __builtin_vec_xxeval because this takes
> -	   4 arguments, and the existing infrastructure handles no
> -	   more than three.  */
> +	signed char op3_type;
> +
> +	/* Need to special case the builins_xxeval because it takes
> +	   4 arguments, and the existing infrastructure handles
> three.  */

A couple of typos here, and this should probably add the xxpermx
reference.  So something like:

.. the builtins __builtin_vec_xxeval and __builtin_vec_xxpermx because
they require 4 arguments .. 


>  	if (nargs != 4)
>  	  {
> -	    error ("builtin %qs requires 4 arguments",
> -		   "__builtin_vec_xxeval");
> +	    if (fcode == P10_BUILTIN_VEC_XXEVAL)
> +	      error ("builtin %qs requires 4 arguments",
> +		     "__builtin_vec_xxeval");
> +	    else
> +	      error ("builtin %qs requires 4 arguments",
> +		     "__builtin_vec_xxpermx");
> +


May be able to compress this a bit, see what was done with the argument
checking for ALTIVEC_BUILTIN_VEC_ADDEC. 


>  	    return error_mark_node;
>  	  }
> +
> +	/* Set value for vec_xxpermx here as it is a constant.  */
> +	op3_type = RS6000_BTI_V16QI;
> +
>  	for ( ; desc->code == fcode; desc++)
>  	  {
> +	    if (fcode == P10_BUILTIN_VEC_XXEVAL)
> +	      op3_type = desc->op3;

I had to confirm the op3_type change below was proper.  Since the
only use of op3_type is within this sub-block, I'd say combine the
previous assignment with the logic here so it's clear that op3_type has
been properly set before the call into
rs6000_builtin_type_compatible().

something like 

	if (fcode == P10_BUILTIN_VEC_XXEVAL)
		op3_type = desc->op3;
	else  /* P10_BUILTIN_VXXPERMX */
		op3_type = RS6000_BTI_V16QI;

> +
>  	    if (rs6000_builtin_type_compatible (types[0], desc->op1)
>  		&& rs6000_builtin_type_compatible (types[1], desc->op2)
>  		&& rs6000_builtin_type_compatible (types[2], desc->op3)
> +		&& rs6000_builtin_type_compatible (types[2], op3_type)
>  		&& rs6000_builtin_type_compatible (types[3],
>  						   RS6000_BTI_UINTSI))
>  	      {

> diff --git a/gcc/config/rs6000/rs6000-call.c
> b/gcc/config/rs6000/rs6000-call.c
> index 06320279138..dc69d4873a0 100644
> --- a/gcc/config/rs6000/rs6000-call.c
> +++ b/gcc/config/rs6000/rs6000-call.c
> @@ -5563,6 +5563,39 @@ const struct altivec_builtin_types
> altivec_overloaded_builtins[] = {
>      RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI,
>      RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI },
> 
> +  /* The overloaded XXPERMX definitions are handled specially
> because the
> +     fourth unsigned char operand is not encoded in this table.  */
> +  {  P10_BUILTIN_VEC_XXPERMX, P10_BUILTIN_VXXPERMX,
> +     RS6000_BTI_V16QI, RS6000_BTI_V16QI, RS6000_BTI_V16QI,
> +     RS6000_BTI_unsigned_V16QI },
> +  {  P10_BUILTIN_VEC_XXPERMX, P10_BUILTIN_VXXPERMX,
> +     RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI,
> +     RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI },
> +  {  P10_BUILTIN_VEC_XXPERMX, P10_BUILTIN_VXXPERMX,
> +     RS6000_BTI_V8HI, RS6000_BTI_V8HI, RS6000_BTI_V8HI,
> +     RS6000_BTI_unsigned_V16QI },
> +  {  P10_BUILTIN_VEC_XXPERMX, P10_BUILTIN_VXXPERMX,
> +     RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V8HI,
> +     RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V16QI },
> +  {  P10_BUILTIN_VEC_XXPERMX, P10_BUILTIN_VXXPERMX,
> +     RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_V4SI,
> +     RS6000_BTI_unsigned_V16QI },
> +  {  P10_BUILTIN_VEC_XXPERMX, P10_BUILTIN_VXXPERMX,
> +     RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
> +     RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V16QI },
> +  {  P10_BUILTIN_VEC_XXPERMX, P10_BUILTIN_VXXPERMX,
> +     RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI,
> +     RS6000_BTI_unsigned_V16QI },
> +  {  P10_BUILTIN_VEC_XXPERMX, P10_BUILTIN_VXXPERMX,
> +     RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
> +     RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V16QI },
> +  {  P10_BUILTIN_VEC_XXPERMX, P10_BUILTIN_VXXPERMX,
> +     RS6000_BTI_V4SF, RS6000_BTI_V4SF, RS6000_BTI_V4SF,
> +     RS6000_BTI_unsigned_V16QI },
> +  {  P10_BUILTIN_VEC_XXPERMX, P10_BUILTIN_VXXPERMX,
> +     RS6000_BTI_V2DF, RS6000_BTI_V2DF, RS6000_BTI_V2DF,
> +     RS6000_BTI_unsigned_V16QI },
> +
>    { P10_BUILTIN_VEC_EXTRACTL, P10_BUILTIN_VEXTRACTBL,
>      RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V16QI,
>      RS6000_BTI_unsigned_V16QI, RS6000_BTI_UINTQI },
> @@ -5704,6 +5737,37 @@ const struct altivec_builtin_types
> altivec_overloaded_builtins[] = {
>    { P10_BUILTIN_VEC_XXSPLTI32DX, P10_BUILTIN_VXXSPLTI32DX_V4SF,
>      RS6000_BTI_V4SF, RS6000_BTI_V4SF, RS6000_BTI_UINTQI,
> RS6000_BTI_float },
> 
> +  {  P10_BUILTIN_VEC_XXBLEND, P10_BUILTIN_VXXBLEND_V16QI,
> +     RS6000_BTI_V16QI, RS6000_BTI_V16QI, RS6000_BTI_V16QI,
> +     RS6000_BTI_unsigned_V16QI },
> +  {  P10_BUILTIN_VEC_XXBLEND, P10_BUILTIN_VXXBLEND_V16QI,
> +     RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI,
> +     RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI },
> +  {  P10_BUILTIN_VEC_XXBLEND, P10_BUILTIN_VXXBLEND_V8HI,
> +     RS6000_BTI_V8HI, RS6000_BTI_V8HI, RS6000_BTI_V8HI,
> +     RS6000_BTI_unsigned_V8HI },
> +  {  P10_BUILTIN_VEC_XXBLEND, P10_BUILTIN_VXXBLEND_V8HI,
> +     RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V8HI,
> +     RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V8HI },
> +  {  P10_BUILTIN_VEC_XXBLEND, P10_BUILTIN_VXXBLEND_V4SI,
> +     RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_V4SI,
> +     RS6000_BTI_unsigned_V4SI },
> +  {  P10_BUILTIN_VEC_XXBLEND, P10_BUILTIN_VXXBLEND_V4SI,
> +     RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
> +     RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI },
> +  {  P10_BUILTIN_VEC_XXBLEND, P10_BUILTIN_VXXBLEND_V2DI,
> +     RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI,
> +     RS6000_BTI_unsigned_V2DI },
> +  {  P10_BUILTIN_VEC_XXBLEND, P10_BUILTIN_VXXBLEND_V2DI,
> +     RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
> +     RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI },
> +  {  P10_BUILTIN_VEC_XXBLEND, P10_BUILTIN_VXXBLEND_V4SF,
> +     RS6000_BTI_V4SF, RS6000_BTI_V4SF, RS6000_BTI_V4SF,
> +     RS6000_BTI_unsigned_V4SI },
> +  {  P10_BUILTIN_VEC_XXBLEND, P10_BUILTIN_VXXBLEND_V2DF,
> +     RS6000_BTI_V2DF, RS6000_BTI_V2DF, RS6000_BTI_V2DF,
> +     RS6000_BTI_unsigned_V2DI },
> +
>    { P10_BUILTIN_VEC_SRDB, P10_BUILTIN_VSRDB_V16QI,
>      RS6000_BTI_V16QI, RS6000_BTI_V16QI,
>      RS6000_BTI_V16QI, RS6000_BTI_UINTQI },
> @@ -10101,6 +10165,19 @@ rs6000_expand_quaternop_builtin (enum
> insn_code icode, tree exp, rtx target)
>  	  return CONST0_RTX (tmode);
>  	}
>      }
> +
> +  else if (icode == CODE_FOR_xxpermx)
> +    {
> +      /* Only allow 3-bit unsigned literals.  */
> +      STRIP_NOPS (arg3);
> +      if (TREE_CODE (arg3) != INTEGER_CST
> +	  || TREE_INT_CST_LOW (arg3) & ~0x7)
> +	{
> +	  error ("argument 4 must be an 3-bit unsigned literal");

s/an/a/

> +	  return CONST0_RTX (tmode);
> +	}
> +    }
> +
>    else if (icode == CODE_FOR_vreplace_elt_v4si
>  	   || icode == CODE_FOR_vreplace_elt_v4sf)
>     {
> @@ -13788,12 +13865,17 @@ builtin_quaternary_function_type
> (machine_mode mode_ret,
>    tree function_type = NULL;
> 
>    static tree v2udi_type = builtin_mode_to_type[V2DImode][1];
> +  static tree v16uqi_type = builtin_mode_to_type[V16QImode][1];
>    static tree uchar_type = builtin_mode_to_type[QImode][1];
> 
>    static tree xxeval_type =
>      build_function_type_list (v2udi_type, v2udi_type, v2udi_type,
>  			      v2udi_type, uchar_type, NULL_TREE);
> 
> +  static tree xxpermx_type =
> +    build_function_type_list (v2udi_type, v2udi_type, v2udi_type,
> +			      v16uqi_type, uchar_type, NULL_TREE);
> +
>    switch (builtin) {
> 
>    case P10_BUILTIN_XXEVAL:
> @@ -13805,6 +13887,15 @@ builtin_quaternary_function_type
> (machine_mode mode_ret,
>      function_type = xxeval_type;
>      break;
> 
> +  case P10_BUILTIN_VXXPERMX:
> +    gcc_assert ((mode_ret == V2DImode)
> +		&& (mode_arg0 == V2DImode)
> +		&& (mode_arg1 == V2DImode)
> +		&& (mode_arg2 == V16QImode)
> +		&& (mode_arg3 == QImode));
> +    function_type = xxpermx_type;
> +    break;
> +
>    default:
>      /* A case for each quaternary built-in must be provided
> above.  */
>      gcc_unreachable ();
> @@ -13986,6 +14077,10 @@ builtin_function_type (machine_mode
> mode_ret, machine_mode mode_arg0,
>      case P10_BUILTIN_VREPLACE_ELT_UV2DI:
>      case P10_BUILTIN_VREPLACE_UN_UV4SI:
>      case P10_BUILTIN_VREPLACE_UN_UV2DI:
> +    case P10_BUILTIN_VXXBLEND_V16QI:
> +    case P10_BUILTIN_VXXBLEND_V8HI:
> +    case P10_BUILTIN_VXXBLEND_V4SI:
> +    case P10_BUILTIN_VXXBLEND_V2DI:
>        h.uns_p[0] = 1;
>        h.uns_p[1] = 1;
>        h.uns_p[2] = 1;


ok


> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index e9aa06553aa..0e4d91a43f6 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -21200,6 +21200,69 @@ result.  The other words of argument 1 are
> unchanged.
> 
>  @findex vec_splati_ins
> 
> +Vector Blend Variable
> +
> +@smallexample
> +@exdent vector signed char vec_blendv (vector signed char, vector
> signed char,
> +vector unsigned char);
> +@exdent vector unsigned char vec_blendv (vector unsigned char,
> +vector unsigned char, vector unsigned char);
> +@exdent vector signed short vec_blendv (vector signed short,
> +vector signed short, vector unsigned short);
> +@exdent vector unsigned short vec_blendv (vector unsigned short,
> +vector unsigned short, vector unsigned short);
> +@exdent vector signed int vec_blendv (vector signed int, vector
> signed int,
> +vector unsigned int);
> +@exdent vector unsigned int vec_blendv (vector unsigned int,
> +vector unsigned int, vector unsigned int);
> +@exdent vector signed long long vec_blendv (vector signed long long,
> +vector signed long long, vector unsigned long long);
> +@exdent vector unsigned long long vec_blendv (vector unsigned long
> long,
> +vector unsigned long long, vector unsigned long long);
> +@exdent vector float vec_blendv (vector float, vector float,
> +vector unsigned int);
> +@exdent vector double vec_blendv (vector double, vector double,
> +vector unsigned long long);
> +@end smallexample
> +
> +Blend the first and second argument vectors according to the sign
> bits of the
> +corresponding elements of the third argument vector.  This is
> similar to the
> +vsel and xxsel instructions but for bigger elements.

@code{} around vsel,xxsel


> +
> +@findex vec_blendv
> +
> +Vector Permute Extended
> +
> +@smallexample
> +@exdent vector signed char vec_permx (vector signed char, vector
> signed char,
> +vector unsigned char, const int);
> +@exdent vector unsigned char vec_permx (vector unsigned char,
> +vector unsigned char, vector unsigned char, const int);
> +@exdent vector signed short vec_permx (vector signed short,
> +vector signed short, vector unsigned char, const int);
> +@exdent vector unsigned short vec_permx (vector unsigned short,
> +vector unsigned short, vector unsigned char, const int);
> +@exdent vector signed int vec_permx (vector signed int, vector
> signed int,
> +vector unsigned char, const int);
> +@exdent vector unsigned int vec_permx (vector unsigned int,
> +vector unsigned int, vector unsigned char, const int);
> +@exdent vector signed long long vec_permx (vector signed long long,
> +vector signed long long, vector unsigned char, const int);
> +@exdent vector unsigned long long vec_permx (vector unsigned long
> long,
> +vector unsigned long long, vector unsigned char, const int);
> +@exdent vector float (vector float, vector float, vector unsigned
> char,
> +const int);
> +@exdent vector double (vector double, vector double, vector unsigned
> char,
> +const int);
> +@end smallexample
> +
> +Perform a partial permute of the first two arguments, which form a
> 32-byte
> +section of an emulated vector up to 256 bytes wide, using the
> partial permute
> +control vector in the third argument.  The fourth argument
> (constrained to
> +values of 0-7) identifies which 32-byte section of the emulated
> vector is
> +contained in the first two arguments.
> +@findex vec_permx
> +
>  @smallexample
>  @exdent vector unsigned long long int
>  @exdent vec_pext (vector unsigned long long int, vector unsigned
> long long int)


ok

Glanced at tests, nothing jumped out at me there.  
<snip>

Thanks,
-Will




* Re: [PATCH 0/6 ver 4]  ] Permute Class Operations
  2020-07-08 19:59 Carl Love
@ 2020-07-09 17:38 ` will schmidt
  2020-07-15 20:07 ` Segher Boessenkool
  1 sibling, 0 replies; 18+ messages in thread
From: will schmidt @ 2020-07-09 17:38 UTC (permalink / raw)
  To: Carl Love, segher, dje.gcc, gcc-patches

On Wed, 2020-07-08 at 12:59 -0700, Carl Love wrote:
> [PATCH 5/6] rs6000, Add vector splat builtin support
> 
> ----------------------------------
> V4 Fixes:
> 
>    Rebased on mainline.  Changed FUTURE to P10.
>    define_predicate "s32bit_cint_operand" removed unnecessary cast in
>      definition.
>    Changed define_expand "xxsplti32dx_v4si" to use "0" for constraint
>      of operand 1.
>    Changed define_insn "xxsplti32dx_v4si_inst" to use "0" for constraint
>      of operand 1.
>    Removed define_predicate "f32bit_const_operand".  Use const_double_operand
>      instead.
> 
>    *** Please provide feedback for the following change:
>    define_insn "xxspltidp_v2df_inst": Added print statement to warn of
>    possible undefined behavior.  The xxspltidp instruction result is
>    undefined for subnormal inputs.  I added a test for subnormal input with
>    an fprintf to stderr to warn the "user" if the constant input is a subnormal
>    value.  I tried assert initially, but that causes GCC to exit ungracefully
>    with no information as to why.  I really didn't like that behavior.
>    A subnormal input is not really a fatal error but the "user" needs
>    to be told it is not a good idea.  Not sure if using an fprintf statement
>    in a define_insn is an acceptable thing either.  But it does give the
>    user the needed input and GCC exits normally.  Let me know if there
>    is a better option here.


Maybe this should be an RFC/PATCH then.  Put this request in the intro
paragraph, not in the V4 fixes blurb.


I certainly defer to other opinions here.
I don't see any other define_insn entries that emit warning printfs for
undefined results, so I'd lean towards dropping that part here.

You may be able to add some logic to the rs6000_expand_* functions over in
rs6000-call.c to handle this, though I think there are many cases where
user input can produce undefined results.
Perhaps just documenting this (extend.texi) is enough? 
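
If the check does move out of the define_insn, the test itself is
small.  A minimal sketch, assuming the constant is available as a
32-bit IEEE 754 single-precision bit pattern (biased exponent of zero
with a nonzero fraction means subnormal); the helper names are
hypothetical and are not existing GCC functions:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: nonzero if BITS encodes a subnormal
   single-precision value (exponent field 0, fraction nonzero).  */
static int
sf_bits_is_subnormal (uint32_t bits)
{
  uint32_t exponent = (bits >> 23) & 0xffu;
  uint32_t fraction = bits & 0x7fffffu;

  return exponent == 0 && fraction != 0;
}

/* Hypothetical convenience wrapper for a host float.  */
static int
float_is_subnormal (float f)
{
  uint32_t bits;

  memcpy (&bits, &f, sizeof bits);	/* Avoids aliasing problems.  */
  return sf_bits_is_subnormal (bits);
}
```

Zero (both signs) is deliberately not treated as subnormal here.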


<snip>
Nothing else below jumped out at me.

thanks, 
-Will





* Re: [PATCH 0/6 ver 4]  ] Permute Class Operations
  2020-07-08 19:59 Carl Love
@ 2020-07-09 16:13 ` will schmidt
  2020-07-14 20:15 ` Segher Boessenkool
  1 sibling, 0 replies; 18+ messages in thread
From: will schmidt @ 2020-07-09 16:13 UTC (permalink / raw)
  To: Carl Love, segher, dje.gcc, gcc-patches

On Wed, 2020-07-08 at 12:59 -0700, Carl Love wrote:
> [PATCH 4/6] rs6000, Add vector shift double builtin support
> 

Nothing popped out at me for this patch.
lgtm

thanks
-Will


> ----------------------------------
> V4 Fixes:
> 
>    Rebased on mainline.  Changed FUTURE to P10.
>    Changed SLDB_LR to SLDB_lr
>    Changed error ("argument 3 must be in the range 0 to 7"); to
>        error ("argument 3 must be a constant in the range 0 to 7");
> 
> -----------------------------------------------------------------
> V3 Fixes
> 	Replace spaces with tabs in ChangeLog.
> 	Minor edits to ChangeLog entry.
> 	Minor edits to vec_sldb description in gcc/doc/extend.texi.
> 
> ----------------------------------------------------
> v2 fixes:
> 
>  change logs redone
> 
>   gcc/config/rs6000/rs6000-call.c - added spaces before parenthesis
> around args.
> 
> -----------------------------------------------------------------
> GCC maintainers:
> 
> The following patch adds support for the vector shift double
> builtins.
> 
> The patch has been compiled and tested on
> 
>   powerpc64le-unknown-linux-gnu (Power 9 LE)
> 
> and Mambo with no regression errors.
> 
> Please let me know if this patch is acceptable for the mainline
> branch.
> 
> Thanks.
> 
>                          Carl Love
> 
> -------------------------------------------------------
> 
> gcc/ChangeLog
> 
> 2020-07-06  Carl Love  <cel@us.ibm.com>
> 
> 	* config/rs6000/altivec.h (vec_sldb, vec_srdb): New defines.
> 	* config/rs6000/altivec.md (UNSPEC_SLDB, UNSPEC_SRDB): New.
> 	(SLDB_lr): New attribute.
> 	(VSHIFT_DBL_LR): New iterator.
> 	(vs<SLDB_lr>db_<mode>): New define_insn.
> 	* config/rs6000/rs6000-builtin.def (VSLDB_V16QI, VSLDB_V8HI,
> 	VSLDB_V4SI, VSLDB_V2DI, VSRDB_V16QI, VSRDB_V8HI, VSRDB_V4SI,
> 	VSRDB_V2DI): New BU_P10V_3 definitions.
> 	(SLDB, SRDB): New BU_P10_OVERLOAD_3 definitions.
> 	* config/rs6000/rs6000-call.c (P10_BUILTIN_VEC_SLDB,
> 	P10_BUILTIN_VEC_SRDB): New definitions.
> 	(rs6000_expand_ternop_builtin) [CODE_FOR_vsldb_v16qi,
> 	CODE_FOR_vsldb_v8hi, CODE_FOR_vsldb_v4si, CODE_FOR_vsldb_v2di,
> 	CODE_FOR_vsrdb_v16qi, CODE_FOR_vsrdb_v8hi, CODE_FOR_vsrdb_v4si,
> 	CODE_FOR_vsrdb_v2di]: Add clauses.
> 	* doc/extend.texi: Add description for vec_sldb and vec_srdb.
> 
> gcc/testsuite/ChangeLog
> 
> 2020-07-06  Carl Love  <cel@us.ibm.com>
> 
> 	* gcc.target/powerpc/vec-shift-double-runnable.c:  New test
> file.
> ---
>  gcc/config/rs6000/altivec.h                   |   2 +
>  gcc/config/rs6000/altivec.md                  |  18 +
>  gcc/config/rs6000/rs6000-builtin.def          |  12 +
>  gcc/config/rs6000/rs6000-call.c               |  70 ++++
>  gcc/doc/extend.texi                           |  53 +++
>  .../powerpc/vec-shift-double-runnable.c       | 384
> ++++++++++++++++++
>  6 files changed, 539 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-shift-
> double-runnable.c
> 
> diff --git a/gcc/config/rs6000/altivec.h
> b/gcc/config/rs6000/altivec.h
> index 560c43cfc99..c202fcf25da 100644
> --- a/gcc/config/rs6000/altivec.h
> +++ b/gcc/config/rs6000/altivec.h
> @@ -703,6 +703,8 @@ __altivec_scalar_pred(vec_any_nle,
>  #define vec_inserth(a, b, c)   __builtin_vec_inserth (a, b, c)
>  #define vec_replace_elt(a, b, c)       __builtin_vec_replace_elt (a,
> b, c)
>  #define vec_replace_unaligned(a, b, c) __builtin_vec_replace_un (a,
> b, c)
> +#define vec_sldb(a, b, c)      __builtin_vec_sldb (a, b, c)
> +#define vec_srdb(a, b, c)      __builtin_vec_srdb (a, b, c)
> 
>  #define vec_gnb(a, b)	__builtin_vec_gnb (a, b)
>  #define vec_clrl(a, b)	__builtin_vec_clrl (a, b)
> diff --git a/gcc/config/rs6000/altivec.md
> b/gcc/config/rs6000/altivec.md
> index 749b2c42c14..c58fb3961e0 100644
> --- a/gcc/config/rs6000/altivec.md
> +++ b/gcc/config/rs6000/altivec.md
> @@ -172,6 +172,8 @@
>     UNSPEC_XXEVAL
>     UNSPEC_VSTRIR
>     UNSPEC_VSTRIL
> +   UNSPEC_SLDB
> +   UNSPEC_SRDB
>  ])
> 
>  (define_c_enum "unspecv"
> @@ -782,6 +784,22 @@
>    DONE;
>  })
> 
> +;; Map UNSPEC_SLDB to "l" and  UNSPEC_SRDB to "r".
> +(define_int_attr SLDB_lr [(UNSPEC_SLDB "l")
> +			  (UNSPEC_SRDB "r")])
> +
> +(define_int_iterator VSHIFT_DBL_LR [UNSPEC_SLDB UNSPEC_SRDB])
> +
> +(define_insn "vs<SLDB_lr>db_<mode>"
> + [(set (match_operand:VI2 0 "register_operand" "=v")
> +  (unspec:VI2 [(match_operand:VI2 1 "register_operand" "v")
> +	       (match_operand:VI2 2 "register_operand" "v")
> +	       (match_operand:QI 3 "const_0_to_12_operand" "n")]
> +	      VSHIFT_DBL_LR))]
> +  "TARGET_POWER10"
> +  "vs<SLDB_lr>dbi %0,%1,%2,%3"
> +  [(set_attr "type" "vecsimple")])
> +
>  (define_expand "vstrir_<mode>"
>    [(set (match_operand:VIshort 0 "altivec_register_operand")
>  	(unspec:VIshort [(match_operand:VIshort 1
> "altivec_register_operand")]
> diff --git a/gcc/config/rs6000/rs6000-builtin.def
> b/gcc/config/rs6000/rs6000-builtin.def
> index e22b3e4d53b..c6fdfadeda8 100644
> --- a/gcc/config/rs6000/rs6000-builtin.def
> +++ b/gcc/config/rs6000/rs6000-builtin.def
> @@ -2738,6 +2738,16 @@ BU_P10V_3 (VREPLACE_UN_V2DI,
> "vreplace_un_v2di", CONST, vreplace_un_v2di)
>  BU_P10V_3 (VREPLACE_UN_UV2DI, "vreplace_un_uv2di", CONST,
> vreplace_un_v2di)
>  BU_P10V_3 (VREPLACE_UN_V2DF, "vreplace_un_v2df", CONST,
> vreplace_un_v2df)
> 
> +BU_P10V_3 (VSLDB_V16QI, "vsldb_v16qi", CONST, vsldb_v16qi)
> +BU_P10V_3 (VSLDB_V8HI, "vsldb_v8hi", CONST, vsldb_v8hi)
> +BU_P10V_3 (VSLDB_V4SI, "vsldb_v4si", CONST, vsldb_v4si)
> +BU_P10V_3 (VSLDB_V2DI, "vsldb_v2di", CONST, vsldb_v2di)
> +
> +BU_P10V_3 (VSRDB_V16QI, "vsrdb_v16qi", CONST, vsrdb_v16qi)
> +BU_P10V_3 (VSRDB_V8HI, "vsrdb_v8hi", CONST, vsrdb_v8hi)
> +BU_P10V_3 (VSRDB_V4SI, "vsrdb_v4si", CONST, vsrdb_v4si)
> +BU_P10V_3 (VSRDB_V2DI, "vsrdb_v2di", CONST, vsrdb_v2di)
> +
>  BU_P10V_1 (VSTRIBR, "vstribr", CONST, vstrir_v16qi)
>  BU_P10V_1 (VSTRIHR, "vstrihr", CONST, vstrir_v8hi)
>  BU_P10V_1 (VSTRIBL, "vstribl", CONST, vstril_v16qi)
> @@ -2761,6 +2771,8 @@ BU_P10_OVERLOAD_3 (INSERTL, "insertl")
>  BU_P10_OVERLOAD_3 (INSERTH, "inserth")
>  BU_P10_OVERLOAD_3 (REPLACE_ELT, "replace_elt")
>  BU_P10_OVERLOAD_3 (REPLACE_UN, "replace_un")
> +BU_P10_OVERLOAD_3 (SLDB, "sldb")
> +BU_P10_OVERLOAD_3 (SRDB, "srdb")
> 
>  BU_P10_OVERLOAD_1 (VSTRIR, "strir")
>  BU_P10_OVERLOAD_1 (VSTRIL, "stril")
> diff --git a/gcc/config/rs6000/rs6000-call.c
> b/gcc/config/rs6000/rs6000-call.c
> index d5d294fd940..edc67fafd88 100644
> --- a/gcc/config/rs6000/rs6000-call.c
> +++ b/gcc/config/rs6000/rs6000-call.c
> @@ -5663,6 +5663,56 @@ const struct altivec_builtin_types
> altivec_overloaded_builtins[] = {
>    { P10_BUILTIN_VEC_REPLACE_UN, P10_BUILTIN_VREPLACE_UN_V2DF,
>      RS6000_BTI_V2DF, RS6000_BTI_V2DF, RS6000_BTI_double,
> RS6000_BTI_INTQI },
> 
> +  { P10_BUILTIN_VEC_SLDB, P10_BUILTIN_VSLDB_V16QI,
> +    RS6000_BTI_V16QI, RS6000_BTI_V16QI,
> +    RS6000_BTI_V16QI, RS6000_BTI_UINTQI },
> +  { P10_BUILTIN_VEC_SLDB, P10_BUILTIN_VSLDB_V16QI,
> +    RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI,
> +    RS6000_BTI_unsigned_V16QI, RS6000_BTI_UINTQI },
> +  { P10_BUILTIN_VEC_SLDB, P10_BUILTIN_VSLDB_V8HI,
> +    RS6000_BTI_V8HI, RS6000_BTI_V8HI,
> +    RS6000_BTI_V8HI, RS6000_BTI_UINTQI },
> +  { P10_BUILTIN_VEC_SLDB, P10_BUILTIN_VSLDB_V8HI,
> +    RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V8HI,
> +    RS6000_BTI_unsigned_V8HI, RS6000_BTI_UINTQI },
> +  { P10_BUILTIN_VEC_SLDB, P10_BUILTIN_VSLDB_V4SI,
> +    RS6000_BTI_V4SI, RS6000_BTI_V4SI,
> +    RS6000_BTI_V4SI, RS6000_BTI_UINTQI },
> +  { P10_BUILTIN_VEC_SLDB, P10_BUILTIN_VSLDB_V4SI,
> +    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
> +    RS6000_BTI_unsigned_V4SI, RS6000_BTI_UINTQI },
> +  { P10_BUILTIN_VEC_SLDB, P10_BUILTIN_VSLDB_V2DI,
> +    RS6000_BTI_V2DI, RS6000_BTI_V2DI,
> +    RS6000_BTI_V2DI, RS6000_BTI_UINTQI },
> +  { P10_BUILTIN_VEC_SLDB, P10_BUILTIN_VSLDB_V2DI,
> +    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
> +    RS6000_BTI_unsigned_V2DI, RS6000_BTI_UINTQI },
> +
> +  { P10_BUILTIN_VEC_SRDB, P10_BUILTIN_VSRDB_V16QI,
> +    RS6000_BTI_V16QI, RS6000_BTI_V16QI,
> +    RS6000_BTI_V16QI, RS6000_BTI_UINTQI },
> +  { P10_BUILTIN_VEC_SRDB, P10_BUILTIN_VSRDB_V16QI,
> +    RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI,
> +    RS6000_BTI_unsigned_V16QI, RS6000_BTI_UINTQI },
> +  { P10_BUILTIN_VEC_SRDB, P10_BUILTIN_VSRDB_V8HI,
> +    RS6000_BTI_V8HI, RS6000_BTI_V8HI,
> +    RS6000_BTI_V8HI, RS6000_BTI_UINTQI },
> +  { P10_BUILTIN_VEC_SRDB, P10_BUILTIN_VSRDB_V8HI,
> +    RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V8HI,
> +    RS6000_BTI_unsigned_V8HI, RS6000_BTI_UINTQI },
> +  { P10_BUILTIN_VEC_SRDB, P10_BUILTIN_VSRDB_V4SI,
> +    RS6000_BTI_V4SI, RS6000_BTI_V4SI,
> +    RS6000_BTI_V4SI, RS6000_BTI_UINTQI },
> +  { P10_BUILTIN_VEC_SRDB, P10_BUILTIN_VSRDB_V4SI,
> +    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
> +    RS6000_BTI_unsigned_V4SI, RS6000_BTI_UINTQI },
> +  { P10_BUILTIN_VEC_SRDB, P10_BUILTIN_VSRDB_V2DI,
> +    RS6000_BTI_V2DI, RS6000_BTI_V2DI,
> +    RS6000_BTI_V2DI, RS6000_BTI_UINTQI },
> +  { P10_BUILTIN_VEC_SRDB, P10_BUILTIN_VSRDB_V2DI,
> +    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
> +    RS6000_BTI_unsigned_V2DI, RS6000_BTI_UINTQI },
> +
>    { P10_BUILTIN_VEC_VSTRIL, P10_BUILTIN_VSTRIBL,
>      RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI, 0, 0 },
>    { P10_BUILTIN_VEC_VSTRIL, P10_BUILTIN_VSTRIBL,
> @@ -10063,6 +10113,26 @@ rs6000_expand_quaternop_builtin (enum
> insn_code icode, tree exp, rtx target)
>  	}
>     }
> 
> +  else if (icode == CODE_FOR_vsldb_v16qi
> +	   || icode == CODE_FOR_vsldb_v8hi
> +	   || icode == CODE_FOR_vsldb_v4si
> +	   || icode == CODE_FOR_vsldb_v2di
> +	   || icode == CODE_FOR_vsrdb_v16qi
> +	   || icode == CODE_FOR_vsrdb_v8hi
> +	   || icode == CODE_FOR_vsrdb_v4si
> +	   || icode == CODE_FOR_vsrdb_v2di)
> +   {
> +     /* Check whether the 3rd argument is an integer constant in the
> range
> +	0 to 7 inclusive.  */
> +     STRIP_NOPS (arg2);
> +     if (TREE_CODE (arg2) != INTEGER_CST
> +	 || !IN_RANGE (TREE_INT_CST_LOW (arg2), 0, 7))
> +	{
> +	  error ("argument 3 must be a constant in the range 0 to 7");
> +	  return CONST0_RTX (tmode);
> +	}
> +   }
> +
>    if (target == 0
>        || GET_MODE (target) != tmode
>        || ! (*insn_data[icode].operand[0].predicate) (target, tmode))
> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index b9cbd136316..1c39be37c1d 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -21112,6 +21112,59 @@ The programmer is responsible for
> understanding the endianness issues involved
>  with the first argument and the result.
>  @findex vec_replace_unaligned
> 
> +Vector Shift Left Double Bit Immediate
> +@smallexample
> +@exdent vector signed char vec_sldb (vector signed char, vector
> signed char,
> +const unsigned int);
> +@exdent vector unsigned char vec_sldb (vector unsigned char,
> +vector unsigned char, const unsigned int);
> +@exdent vector signed short vec_sldb (vector signed short, vector
> signed short,
> +const unsigned int);
> +@exdent vector unsigned short vec_sldb (vector unsigned short,
> +vector unsigned short, const unsigned int);
> +@exdent vector signed int vec_sldb (vector signed int, vector signed
> int,
> +const unsigned int);
> +@exdent vector unsigned int vec_sldb (vector unsigned int, vector
> unsigned int,
> +const unsigned int);
> +@exdent vector signed long long vec_sldb (vector signed long long,
> +vector signed long long, const unsigned int);
> +@exdent vector unsigned long long vec_sldb (vector unsigned long
> long,
> +vector unsigned long long, const unsigned int);
> +@end smallexample
> +
> +Shift the combined input vectors left by the amount specified by the
> low-order
> +three bits of the third argument, and return the leftmost remaining
> 128 bits.
> +Code using this instruction must be endian-aware.
> +
> +@findex vec_sldb
> +
> +Vector Shift Right Double Bit Immediate
> +
> +@smallexample
> +@exdent vector signed char vec_srdb (vector signed char, vector
> signed char,
> +const unsigned int);
> +@exdent vector unsigned char vec_srdb (vector unsigned char, vector
> unsigned char,
> +const unsigned int);
> +@exdent vector signed short vec_srdb (vector signed short, vector
> signed short,
> +const unsigned int);
> +@exdent vector unsigned short vec_srdb (vector unsigned short,
> vector unsigned short,
> +const unsigned int);
> +@exdent vector signed int vec_srdb (vector signed int, vector signed
> int,
> +const unsigned int);
> +@exdent vector unsigned int vec_srdb (vector unsigned int, vector
> unsigned int,
> +const unsigned int);
> +@exdent vector signed long long vec_srdb (vector signed long long,
> +vector signed long long, const unsigned int);
> +@exdent vector unsigned long long vec_srdb (vector unsigned long
> long,
> +vector unsigned long long, const unsigned int);
> +@end smallexample
> +
> +Shift the combined input vectors right by the amount specified by
> the low-order
> +three bits of the third argument, and return the remaining 128
> bits.  Code
> +using this built-in must be endian-aware.
> +
> +@findex vec_srdb
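
Not a review comment, just a reference model of the two descriptions
above.  This is a sketch in register bit order (hi is the most
significant half), so it says nothing about element numbering on LE
versus BE.  The type v128 and the function names are made up for
illustration, and the left/right definitions are my reading of the
text: keep the leftmost 128 bits of a||b after shifting left, or the
rightmost 128 bits after shifting right.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 128-bit value; hi holds the most significant 64 bits.  */
typedef struct { uint64_t hi, lo; } v128;

/* Model of vec_sldb: shift a||b left by sh bits, keep the leftmost
   128 bits.  The vacated low bits are filled from the top of b.  */
static v128
model_sldb (v128 a, v128 b, unsigned sh)
{
  v128 r;

  assert (sh <= 7);
  if (sh == 0)
    return a;			/* Avoid a shift by 64 below.  */
  r.hi = (a.hi << sh) | (a.lo >> (64 - sh));
  r.lo = (a.lo << sh) | (b.hi >> (64 - sh));
  return r;
}

/* Model of vec_srdb: shift a||b right by sh bits, keep the rightmost
   128 bits.  The vacated high bits are filled from the bottom of a.  */
static v128
model_srdb (v128 a, v128 b, unsigned sh)
{
  v128 r;

  assert (sh <= 7);
  if (sh == 0)
    return b;			/* Avoid a shift by 64 below.  */
  r.hi = (b.hi >> sh) | (a.lo << (64 - sh));
  r.lo = (b.lo >> sh) | (b.hi << (64 - sh));
  return r;
}
```

With sh = 7 this agrees with the first unsigned/signed char cases in
the runnable test in this patch when run on LE, where element 0 is the
least significant byte: the 160 at the top of the second operand comes
in as 160 >> 1 = 80 for vec_sldb, and the 100 at the bottom of the
first operand comes in as (100 << 1) & 0xff = 200 for vec_srdb.
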
> +
>  @smallexample
>  @exdent vector unsigned long long int
>  @exdent vec_pext (vector unsigned long long int, vector unsigned
> long long int)
> diff --git a/gcc/testsuite/gcc.target/powerpc/vec-shift-double-
> runnable.c b/gcc/testsuite/gcc.target/powerpc/vec-shift-double-
> runnable.c
> new file mode 100644
> index 00000000000..13213bd22ee
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/vec-shift-double-runnable.c
> @@ -0,0 +1,384 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target power10_hw } */
> +/* { dg-options "-mdejagnu-cpu=power10" } */
> +#include <altivec.h>
> +
> +#define DEBUG 0
> +
> +#if DEBUG
> +#include <stdio.h>
> +#endif
> +
> +extern void abort (void);
> +
> +int
> +main (int argc, char *argv [])
> +{
> +  int i;
> +
> +  vector signed char vresult_char;
> +  vector signed char expected_vresult_char;
> +  vector signed char src_va_char;
> +  vector signed char src_vb_char;
> +
> +  vector unsigned char vresult_uchar;
> +  vector unsigned char expected_vresult_uchar;
> +  vector unsigned char src_va_uchar;
> +  vector unsigned char src_vb_uchar;
> +
> +  vector short int vresult_sh;
> +  vector short int expected_vresult_sh;
> +  vector short int src_va_sh;
> +  vector short int src_vb_sh;
> +
> +  vector short unsigned int vresult_ush;
> +  vector short unsigned int expected_vresult_ush;
> +  vector short unsigned int src_va_ush;
> +  vector short unsigned int src_vb_ush;
> +
> +  vector int vresult_int;
> +  vector int expected_vresult_int;
> +  vector int src_va_int;
> +  vector int src_vb_int;
> +  int src_a_int;
> +
> +  vector unsigned int vresult_uint;
> +  vector unsigned int expected_vresult_uint;
> +  vector unsigned int src_va_uint;
> +  vector unsigned int src_vb_uint;
> +  unsigned int src_a_uint;
> +
> +  vector long long int vresult_llint;
> +  vector long long int expected_vresult_llint;
> +  vector long long int src_va_llint;
> +  vector long long int src_vb_llint;
> +  long long int src_a_llint;
> +
> +  vector unsigned long long int vresult_ullint;
> +  vector unsigned long long int expected_vresult_ullint;
> +  vector unsigned long long int src_va_ullint;
> +  vector unsigned long long int src_vb_ullint;
> +  unsigned int long long src_a_ullint;
> +
> +  /* Vector shift double left */
> +  src_va_char = (vector signed char) { 0, 2, 4, 6, 8, 10, 12, 14,
> +				       16, 18, 20, 22, 24, 26, 28, 30
> }; 
> +  src_vb_char = (vector signed char) { 10, 20, 30, 40, 50, 60, 70,
> 80, 90,
> +					100, 110, 120, 130, 140, 150,
> 160 };
> +  vresult_char = (vector signed char) { 0, 0, 0, 0, 0, 0, 0, 0,
> +					  0, 0, 0, 0, 0, 0, 0, 0 };
> +  expected_vresult_char = (vector signed char) { 80, 0, 1, 2, 3, 4,
> 5, 6, 7,
> +						 8, 9, 10, 11, 12, 13,
> 14 }; 
> +						 
> +  vresult_char = vec_sldb (src_va_char, src_vb_char, 7);
> +
> +  if (!vec_all_eq (vresult_char,  expected_vresult_char)) {
> +#if DEBUG
> +    printf("ERROR, vec_sldb (src_va_char_, src_vb_char, 7)\n");
> +    for(i = 0; i < 16; i++)
> +      printf(" vresult_char[%d] = %d, expected_vresult_char[%d] =
> %d\n",
> +	     i, vresult_char[i], i, expected_vresult_char[i]);
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  src_va_uchar = (vector unsigned char) { 0, 2, 4, 6, 8, 10, 12, 14,
> +					  16, 18, 20, 22, 24, 26, 28,
> 30 }; 
> +  src_vb_uchar = (vector unsigned char) { 0, 0, 0, 0, 0, 0, 0, 0,
> +					  0, 0, 0, 0, 0, 0, 0, 0 };
> +  vresult_uchar = (vector unsigned char) { 0, 0, 0, 0, 0, 0, 0, 0,
> +					   0, 0, 0, 0, 0, 0, 0, 0 };
> +  expected_vresult_uchar = (vector unsigned char) { 0, 0, 1, 2, 3,
> 4, 5, 6, 7,
> +						    8, 9, 10, 11, 12,
> 13, 14 };
> +						 
> +  vresult_uchar = vec_sldb (src_va_uchar, src_vb_uchar, 7);
> +
> +  if (!vec_all_eq (vresult_uchar,  expected_vresult_uchar)) {
> +#if DEBUG
> +    printf("ERROR, vec_sldb (src_va_uchar_, src_vb_uchar, 7)\n");
> +    for(i = 0; i < 16; i++)
> +      printf(" vresult_uchar[%d] = %d, expected_vresult_uchar[%d] =
> %d\n",
> +	     i, vresult_uchar[i], i, expected_vresult_uchar[i]);
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  src_va_sh = (vector short int) { 0, 2, 4, 6, 8, 10, 12, 14 };
> +  src_vb_sh = (vector short int) { 0, 0, 0, 0, 0, 0, 0, 0 };
> +  vresult_sh = (vector short int) { 0, 0, 0, 0, 0, 0, 0, 0 };
> +  expected_vresult_sh = (vector short int) { 0, 2*128, 4*128, 6*128,
> +					     8*128, 10*128, 12*128,
> 14*128 }; 
> +						 
> +  vresult_sh = vec_sldb (src_va_sh, src_vb_sh, 7);
> +
> +  if (!vec_all_eq (vresult_sh,  expected_vresult_sh)) {
> +#if DEBUG
> +    printf("ERROR, vec_sldb (src_va_sh_, src_vb_sh, 7)\n");
> +    for(i = 0; i < 8; i++)
> +      printf(" vresult_sh[%d] = %d, expected_vresult_sh[%d] = %d\n",
> +	     i, vresult_sh[i], i, expected_vresult_sh[i]);
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  src_va_ush = (vector short unsigned int) { 0, 2, 4, 6, 8, 10, 12,
> 14 };
> +  src_vb_ush = (vector short unsigned int) { 10, 20, 30, 40, 50, 60,
> 70, 80 };
> +  vresult_ush = (vector short unsigned int) { 0, 0, 0, 0, 0, 0, 0, 0
> };
> +  expected_vresult_ush = (vector short unsigned int) { 0, 2*128,
> 4*128, 6*128,
> +						       8*128, 10*128,
> 12*128,
> +						       14*128 }; 
> +						 
> +  vresult_ush = vec_sldb (src_va_ush, src_vb_ush, 7);
> +
> +  if (!vec_all_eq (vresult_ush,  expected_vresult_ush)) {
> +#if DEBUG
> +    printf("ERROR, vec_sldb (src_va_ush_, src_vb_ush, 7)\n");
> +    for(i = 0; i < 8; i++)
> +      printf(" vresult_ush[%d] = %d, expected_vresult_ush[%d] =
> %d\n",
> +	     i, vresult_ush[i], i, expected_vresult_ush[i]);
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  src_va_int = (vector signed int) { 0, 2, 3, 1 };
> +  src_vb_int = (vector signed int) { 0, 0, 0, 0 };
> +  vresult_int = (vector signed int) { 0, 0, 0, 0 };
> +  expected_vresult_int = (vector signed int) { 0, 2*128, 3*128,
> 1*128 }; 
> +						 
> +  vresult_int = vec_sldb (src_va_int, src_vb_int, 7);
> +
> +  if (!vec_all_eq (vresult_int,  expected_vresult_int)) {
> +#if DEBUG
> +    printf("ERROR, vec_sldb (src_va_int_, src_vb_int, 7)\n");
> +    for(i = 0; i < 4; i++)
> +      printf(" vresult_int[%d] = %d, expected_vresult_int[%d] =
> %d\n",
> +	     i, vresult_int[i], i, expected_vresult_int[i]);
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  src_va_uint = (vector unsigned int) { 0, 2, 4, 6 };
> +  src_vb_uint = (vector unsigned int) { 10, 20, 30, 40 };
> +  vresult_uint = (vector unsigned int) { 0, 0, 0, 0 };
> +  expected_vresult_uint = (vector unsigned int) { 0, 2*128, 4*128,
> 6*128 }; 
> +						 
> +  vresult_uint = vec_sldb (src_va_uint, src_vb_uint, 7);
> +
> +  if (!vec_all_eq (vresult_uint,  expected_vresult_uint)) {
> +#if DEBUG
> +    printf("ERROR, vec_sldb (src_va_uint_, src_vb_uint, 7)\n");
> +    for(i = 0; i < 4; i++)
> +      printf(" vresult_uint[%d] = %d, expected_vresult_uint[%d] =
> %d\n",
> +	     i, vresult_uint[i], i, expected_vresult_uint[i]);
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  src_va_llint = (vector signed long long int) { 5, 6 };
> +  src_vb_llint = (vector signed long long int) { 0, 0 };
> +  vresult_llint = (vector signed long long int) { 0, 0 };
> +  expected_vresult_llint = (vector signed long long int) { 5*128,
> 6*128 }; 
> +						 
> +  vresult_llint = vec_sldb (src_va_llint, src_vb_llint, 7);
> +
> +  if (!vec_all_eq (vresult_llint,  expected_vresult_llint)) {
> +#if DEBUG
> +    printf("ERROR, vec_sldb (src_va_llint_, src_vb_llint, 7)\n");
> +    for(i = 0; i < 2; i++)
> +      printf(" vresult_llint[%d] = %d, expected_vresult_llint[%d] =
> %d\n",
> +	     i, vresult_llint[i], i, expected_vresult_llint[i]);
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  src_va_ullint = (vector unsigned long long int) { 54, 26 };
> +  src_vb_ullint = (vector unsigned long long int) { 10, 20 };
> +  vresult_ullint = (vector unsigned long long int) { 0, 0 };
> +  expected_vresult_ullint = (vector unsigned long long int) {
> 54*128,
> +							      26*128
> }; 
> +						 
> +  vresult_ullint = vec_sldb (src_va_ullint, src_vb_ullint, 7);
> +
> +  if (!vec_all_eq (vresult_ullint,  expected_vresult_ullint)) {
> +#if DEBUG
> +    printf("ERROR, vec_sldb (src_va_ullint_, src_vb_ullint, 7)\n");
> +    for(i = 0; i < 2; i++)
> +      printf(" vresult_ullint[%d] = %d, expected_vresult_ullint[%d]
> = %d\n",
> +	     i, vresult_ullint[i], i, expected_vresult_ullint[i]);
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  /* Vector shift double right */
> +  src_va_char = (vector signed char) { 0, 2, 4, 6, 8, 10, 12, 14,
> +				       16, 18, 20, 22, 24, 26, 28, 30
> }; 
> +  src_vb_char = (vector signed char) { 10, 12, 14, 16, 18, 20, 22,
> 24, 26,
> +					28, 30, 32, 34, 36, 38, 40 };
> +  vresult_char = (vector signed char) { 0, 0, 0, 0, 0, 0, 0, 0,
> +					  0, 0, 0, 0, 0, 0, 0, 0 };
> +  expected_vresult_char = (vector signed char) { 24, 28, 32, 36, 40,
> 44, 48,
> +						 52, 56, 60, 64, 68,
> 72, 76,
> +						 80, 0 }; 
> +						 
> +  vresult_char = vec_srdb (src_va_char, src_vb_char, 7);
> +
> +  if (!vec_all_eq (vresult_char,  expected_vresult_char)) {
> +#if DEBUG
> +    printf("ERROR, vec_srdb (src_va_char_, src_vb_char, 7)\n");
> +    for(i = 0; i < 16; i++)
> +      printf(" vresult_char[%d] = %d, expected_vresult_char[%d] =
> %d\n",
> +	     i, vresult_char[i], i, expected_vresult_char[i]);
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  src_va_uchar = (vector unsigned char) { 100, 0, 0, 0, 0, 0, 0, 0,
> +					  0, 0, 0, 0, 0, 0, 0, 0 };
> +  src_vb_uchar = (vector unsigned char) { 0, 2, 4, 6, 8, 10, 12, 14,
> +					  16, 18, 20, 22, 24, 26, 28,
> 30 }; 
> +  vresult_uchar = (vector unsigned char) { 0, 0, 0, 0, 0, 0, 0, 0,
> +					   0, 0, 0, 0, 0, 0, 0, 0 };
> +  expected_vresult_uchar = (vector unsigned char) { 4, 8, 12, 16,
> 20, 24, 28,
> +						    32, 36, 40, 44, 48,
> 52,
> +						    56, 60, 200 };
> +						 
> +  vresult_uchar = vec_srdb (src_va_uchar, src_vb_uchar, 7);
> +
> +  if (!vec_all_eq (vresult_uchar,  expected_vresult_uchar)) {
> +#if DEBUG
> +    printf("ERROR, vec_srdb (src_va_uchar_, src_vb_uchar, 7)\n");
> +    for(i = 0; i < 16; i++)
> +      printf(" vresult_uchar[%d] = %d, expected_vresult_uchar[%d] =
> %d\n",
> +	     i, vresult_uchar[i], i, expected_vresult_uchar[i]);
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  src_va_sh = (vector short int) { 0, 0, 0, 0, 0, 0, 0, 0 };
> +  src_vb_sh = (vector short int) { 0, 2*128, 4*128, 6*128,
> +					     8*128, 10*128, 12*128,
> 14*128 };
> +  vresult_sh = (vector short int) { 0, 0, 0, 0, 0, 0, 0, 0 };
> +  expected_vresult_sh = (vector short int) { 0, 2, 4, 6, 8, 10, 12,
> 14 }; 
> +						 
> +  vresult_sh = vec_srdb (src_va_sh, src_vb_sh, 7);
> +
> +  if (!vec_all_eq (vresult_sh,  expected_vresult_sh)) {
> +#if DEBUG
> +    printf("ERROR, vec_srdb (src_va_sh_, src_vb_sh, 7)\n");
> +    for(i = 0; i < 8; i++)
> +      printf(" vresult_sh[%d] = %d, expected_vresult_sh[%d] = %d\n",
> +	     i, vresult_sh[i], i, expected_vresult_sh[i]);
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  src_va_ush = (vector short unsigned int) { 0, 20, 30, 40, 50, 60,
> 70, 80 };
> +  src_vb_ush = (vector short unsigned int) { 0, 2*128, 4*128, 6*128,
> +					     8*128, 10*128, 12*128,
> 14*128 };
> +  vresult_ush = (vector short unsigned int) { 0, 0, 0, 0, 0, 0, 0, 0
> };
> +  expected_vresult_ush = (vector short unsigned int) { 0, 2, 4, 6,
> 8, 10,
> +						       12, 14 }; 
> +						 
> +  vresult_ush = vec_srdb (src_va_ush, src_vb_ush, 7);
> +
> +  if (!vec_all_eq (vresult_ush,  expected_vresult_ush)) {
> +#if DEBUG
> +    printf("ERROR, vec_srdb (src_va_ush_, src_vb_ush, 7)\n");
> +    for(i = 0; i < 8; i++)
> +      printf(" vresult_ush[%d] = %d, expected_vresult_ush[%d] =
> %d\n",
> +	     i, vresult_ush[i], i, expected_vresult_ush[i]);
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  src_va_int = (vector signed int) { 0, 0, 0, 0 };
> +  src_vb_int = (vector signed int) { 0, 2*128, 3*128, 1*128 };
> +  vresult_int = (vector signed int) { 0, 0, 0, 0 };
> +  expected_vresult_int = (vector signed int) { 0, 2, 3, 1  }; 
> +						 
> +  vresult_int = vec_srdb (src_va_int, src_vb_int, 7);
> +
> +  if (!vec_all_eq (vresult_int,  expected_vresult_int)) {
> +#if DEBUG
> +    printf("ERROR, vec_srdb (src_va_int_, src_vb_int, 7)\n");
> +    for(i = 0; i < 4; i++)
> +      printf(" vresult_int[%d] = %d, expected_vresult_int[%d] =
> %d\n",
> +	     i, vresult_int[i], i, expected_vresult_int[i]);
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  src_va_uint = (vector unsigned int) { 0, 20, 30, 40 };
> +  src_vb_uint = (vector unsigned int) { 128, 2*128, 4*128, 6*128 };
> +  vresult_uint = (vector unsigned int) { 0, 0, 0, 0 };
> +  expected_vresult_uint = (vector unsigned int) { 1, 2, 4, 6 }; 
> +						 
> +  vresult_uint = vec_srdb (src_va_uint, src_vb_uint, 7);
> +
> +  if (!vec_all_eq (vresult_uint,  expected_vresult_uint)) {
> +#if DEBUG
> +    printf("ERROR, vec_srdb (src_va_uint_, src_vb_uint, 7)\n");
> +    for(i = 0; i < 4; i++)
> +      printf(" vresult_uint[%d] = %d, expected_vresult_uint[%d] =
> %d\n",
> +	     i, vresult_uint[i], i, expected_vresult_uint[i]);
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  src_va_llint = (vector signed long long int) { 0, 0 };
> +  src_vb_llint = (vector signed long long int) { 5*128, 6*128 };
> +  vresult_llint = (vector signed long long int) { 0, 0 };
> +  expected_vresult_llint = (vector signed long long int) { 5, 6 }; 
> +						 
> +  vresult_llint = vec_srdb (src_va_llint, src_vb_llint, 7);
> +
> +  if (!vec_all_eq (vresult_llint,  expected_vresult_llint)) {
> +#if DEBUG
> +    printf("ERROR, vec_srdb (src_va_llint_, src_vb_llint, 7)\n");
> +    for(i = 0; i < 2; i++)
> +      printf(" vresult_llint[%d] = %d, expected_vresult_llint[%d] =
> %d\n",
> +	     i, vresult_llint[i], i, expected_vresult_llint[i]);
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  src_va_ullint = (vector unsigned long long int) { 0, 0 };
> +  src_vb_ullint = (vector unsigned long long int) { 54*128, 26*128
> };
> +  vresult_ullint = (vector unsigned long long int) { 0, 0 };
> +  expected_vresult_ullint = (vector unsigned long long int) { 54, 26
> }; 
> +
> +  vresult_ullint = vec_srdb (src_va_ullint, src_vb_ullint, 7);
> +
> +  if (!vec_all_eq (vresult_ullint,  expected_vresult_ullint)) {
> +#if DEBUG
> +    printf("ERROR, vec_srdb (src_va_ullint_, src_vb_ullint, 7)\n");
> +    for(i = 0; i < 2; i++)
> +      printf(" vresult_ullint[%d] = %d, expected_vresult_ullint[%d]
> = %d\n",
> +	     i, vresult_ullint[i], i, expected_vresult_ullint[i]);
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  return 0;
> +}
> +
> +/* { dg-final { scan-assembler-times {\msldbi\M} 6 } } */
> +/* { dg-final { scan-assembler-times {\msrdbi\M} 6 } } */
> +
> +



* Re: [PATCH 0/6 ver 4]  ] Permute Class Operations
  2020-07-08 19:59 Carl Love
@ 2020-07-09 16:02 ` will schmidt
  2020-07-13 12:41   ` Segher Boessenkool
  2020-07-13 14:30 ` Segher Boessenkool
  1 sibling, 1 reply; 18+ messages in thread
From: will schmidt @ 2020-07-09 16:02 UTC (permalink / raw)
  To: Carl Love, segher, dje.gcc, gcc-patches

On Wed, 2020-07-08 at 12:59 -0700, Carl Love wrote:
> [PATCH 3/6] rs6000, Add vector replace builtin support
> 
> ----------------------------------
> V4 Fixes:
> 
>    Rebased on mainline.  Changed FUTURE to P10 in code and ChangeLog.
>    Set DEBUG to 0 in vec-replace-word-runnable.c test program.
>    Fixed too long lines in ChangeLog.
> 
> ----------------------------------
> V3 fixes:
>    Fixed bad word breaks in ChangeLog.
>    Replace spaces with tabs in ChangeLog.
> 
> ------------------------------------
> v2 fixes:
> 
> change log entries config/rs6000/vsx.md, config/rs6000/rs6000-builtin.def,
> config/rs6000/rs6000-call.c.
> 
> gcc/config/rs6000/rs6000-call.c: fixed if check for 3rd arg between 0 and 3
>                                  fixed if check for 3rd arg between 0 and 12
> 
> gcc/config/rs6000/vsx.md: removed REPLACE_ELT_atr definition and used
>                           VS_scalar instead.
>                           removed REPLACE_ELT_inst definition and used
> 			  <mode> instead
>                           fixed spelling mistake on Endianness.
>                           fixed indenting for vreplace_elt_<mode>
> 
> -----------------------------------
> 
> GCC maintainers:
> 
> The following patch adds support for builtins vec_replace_elt and
> vec_replace_unaligned.
> 
> The patch has been compiled and tested on
> 
>   powerpc64le-unknown-linux-gnu (Power 9 LE)
> 
> and mambo with no regression errors.
> 
> Please let me know if this patch is acceptable for the mainline
> branch.  Thanks.
> 
>                          Carl Love
> 
> -------------------------------------------------------
> 
> gcc/ChangeLog
> 
> 2020-07-06 Carl Love  <cel@us.ibm.com>
> 
> 	* config/rs6000/altivec.h: Add define for vec_replace_elt and
> 	vec_replace_unaligned.
> 	* config/rs6000/vsx.md (UNSPEC_REPLACE_ELT, UNSPEC_REPLACE_UN): New.

New unspecs.


> 	(REPLACE_ELT): New mode iterator.
> 	(REPLACE_ELT_atr, REPLACE_ELT_inst, REPLACE_ELT_char,
> 	REPLACE_ELT_sh, REPLACE_ELT_max): New mode attributes.


_atr and _inst are no longer in the patch.  


> 	(vreplace_un_<mode>, vreplace_elt_<mode>_inst): New.



> 	* config/rs6000/rs6000-builtin.def (VREPLACE_ELT_V4SI,
> 	VREPLACE_ELT_UV4SI, VREPLACE_ELT_V4SF, VREPLACE_ELT_UV2DI,
> 	VREPLACE_ELT_V2DF, VREPLACE_UN_V4SI, VREPLACE_UN_UV4SI,
> 	VREPLACE_UN_V4SF, VREPLACE_UN_V2DI, VREPLACE_UN_UV2DI,
> 	VREPLACE_UN_V2DF, (REPLACE_ELT, REPLACE_UN): New.

New builtin entries.

VREPLACE_ELT_V2DI is missing from list.


> 	* config/rs6000/rs6000-call.c (P10_BUILTIN_VEC_REPLACE_ELT,
> 	P10_BUILTIN_VEC_REPLACE_UN): New.

New what?

> 	(rs6000_expand_ternop_builtin): Add 3rd argument checks for

The diff suggests this is in rs6000_expand_quaternop_builtin()?


> 	CODE_FOR_vreplace_elt_v4si, CODE_FOR_vreplace_elt_v4sf,
> 	CODE_FOR_vreplace_un_v4si, CODE_FOR_vreplace_un_v4sf.
> 	(builtin_function_type) [P10_BUILTIN_VREPLACE_ELT_UV4SI,
> 	P10_BUILTIN_VREPLACE_ELT_UV2DI, P10_BUILTIN_VREPLACE_UN_UV4SI,
> 	P10_BUILTIN_VREPLACE_UN_UV2DI]: New cases.



> 	* doc/extend.texi: Add description for vec_replace_elt and
> 	vec_replace_unaligned builtins.
> 
> gcc/testsuite/ChangeLog
> 
> 2020-07-06 Carl Love  <cel@us.ibm.com>
> 
> 	* gcc.target/powerpc/vec-replace-word.c: Add new test.


s/Add new/New/

Nothing else jumped out at me below.
<snip>

Thanks
-Will




^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/6 ver 4]  ] Permute Class Operations
  2020-07-08 19:59 Carl Love
@ 2020-07-09 15:44 ` will schmidt
  2020-07-13 12:04 ` Segher Boessenkool
  1 sibling, 0 replies; 18+ messages in thread
From: will schmidt @ 2020-07-09 15:44 UTC (permalink / raw)
  To: Carl Love, segher, dje.gcc, gcc-patches

On Wed, 2020-07-08 at 12:59 -0700, Carl Love wrote:
> [PATCH 2/6] rs6000 Add vector insert builtin support
> 
> ------------------------------------
> V4 changes
>   Rebased on mainline.  Changed FUTURE to P10 as needed.
> 
> ------------------------------------
> V3 changes
> 
>   Replace spaces with tabs in ChangeLog
>   Ditto in gcc/config/rs6000/vsx.md.
>   Updated description for vec_insertl() builtin.
>   Cleaned up vec_insert description.
> 
> -----------------------------------------------------------------
> v2 changes
> 
> Fix change log entry for config/rs6000/altivec.h
> 
> Fix change log entry for config/rs6000/rs6000-builtin.def
> 
> Fix change log entry for config/rs6000/rs6000-call.c
> 
> vsx.md: Fixed if (BYTES_BIG_ENDIAN) else statements.
> Porting error from pu branch.
> 
> ---------------------------------------------------------------
> GCC maintainers:
> 
> This patch adds support for vec_insertl and vec_inserth builtins.
> 
> The patch has been compiled and tested on
> 
>   powerpc64le-unknown-linux-gnu (Power 9 LE)
> 
> and mambo with no regression errors.
> 
> Please let me know if this patch is acceptable for the mainline branch.
> 
> Thanks.
> 
>                          Carl Love
> 
> --------------------------------------------------------------
> gcc/ChangeLog
> 
> 2020-07-02  Carl Love  <cel@us.ibm.com>
> 
> 	* config/rs6000/altivec.h (vec_insertl, vec_inserth): New defines.
> 	* config/rs6000/rs6000-builtin.def (VINSERTGPRBL, VINSERTGPRHL,
> 	VINSERTGPRWL, VINSERTGPRDL, VINSERTVPRBL, VINSERTVPRHL, VINSERTVPRWL,
> 	VINSERTGPRBR, VINSERTGPRHR, VINSERTGPRWR, VINSERTGPRDR, VINSERTVPRBR,
> 	VINSERTVPRHR, VINSERTVPRWR): New builtins.
> 	(INSERTL, INSERTH): New builtins.
> 	* config/rs6000/rs6000-call.c (P10_BUILTIN_VEC_INSERTL,
> 	P10_BUILTIN_VEC_INSERTH): New overloaded definitions.
> 	(P10_BUILTIN_VINSERTGPRBL, P10_BUILTIN_VINSERTGPRHL,
> 	P10_BUILTIN_VINSERTGPRWL, P10_BUILTIN_VINSERTGPRDL,
> 	P10_BUILTIN_VINSERTVPRBL, P10_BUILTIN_VINSERTVPRHL,
> 	P10_BUILTIN_VINSERTVPRWL): Add case entries.
> 	* config/rs6000/vsx.md (define_c_enum): Add UNSPEC_INSERTL,
> 	UNSPEC_INSERTR.
> 	(define_expand): Add vinsertvl_<mode>, vinsertvr_<mode>,
> 	vinsertgl_<mode>, vinsertgr_<mode>, mode is VI2.
> 	(define_ins): vinsertvl_internal_<mode>, vinsertvr_internal_<mode>,
> 	vinsertgl_internal_<mode>, vinsertgr_internal_<mode>, mode VEC_I.
> 	* doc/extend.texi: Add documentation for vec_insertl, vec_inserth.
> 

ok

> gcc/testsuite/ChangeLog
> 
> 2020-07-02  Carl Love  <cel@us.ibm.com>
> 
> 	* gcc.target/powerpc/vec-insert-word-runnable.c: New test case.
> ---
>  gcc/config/rs6000/altivec.h                   |   2 +
>  gcc/config/rs6000/rs6000-builtin.def          |  18 +
>  gcc/config/rs6000/rs6000-call.c               |  51 +++
>  gcc/config/rs6000/vsx.md                      | 110 ++++++
>  gcc/doc/extend.texi                           |  71 ++++
>  .../powerpc/vec-insert-word-runnable.c        | 345 ++++++++++++++++++
>  6 files changed, 597 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-insert-word-runnable.c
> 
> diff --git a/gcc/config/rs6000/altivec.h b/gcc/config/rs6000/altivec.h
> index bb1524f4a67..0563853c03f 100644
> --- a/gcc/config/rs6000/altivec.h
> +++ b/gcc/config/rs6000/altivec.h
> @@ -699,6 +699,8 @@ __altivec_scalar_pred(vec_any_nle,
>  /* Overloaded built-in functions for ISA 3.1.  */
>  #define vec_extractl(a, b, c)	__builtin_vec_extractl (a, b, c)
>  #define vec_extracth(a, b, c)	__builtin_vec_extracth (a, b, c)
> +#define vec_insertl(a, b, c)   __builtin_vec_insertl (a, b, c)
> +#define vec_inserth(a, b, c)   __builtin_vec_inserth (a, b, c)
> 
>  #define vec_gnb(a, b)	__builtin_vec_gnb (a, b)
>  #define vec_clrl(a, b)	__builtin_vec_clrl (a, b)
> diff --git a/gcc/config/rs6000/rs6000-builtin.def b/gcc/config/rs6000/rs6000-builtin.def
> index 363656ec05c..e73d144c1cc 100644
> --- a/gcc/config/rs6000/rs6000-builtin.def
> +++ b/gcc/config/rs6000/rs6000-builtin.def
> @@ -2708,6 +2708,22 @@ BU_P10V_3 (VEXTRACTHR, "vextduhvhx", CONST, vextractrv8hi)
>  BU_P10V_3 (VEXTRACTWR, "vextduwvhx", CONST, vextractrv4si)
>  BU_P10V_3 (VEXTRACTDR, "vextddvhx", CONST, vextractrv2di)
> 
> +BU_P10V_3 (VINSERTGPRBL, "vinsgubvlx", CONST, vinsertgl_v16qi)
> +BU_P10V_3 (VINSERTGPRHL, "vinsguhvlx", CONST, vinsertgl_v8hi)
> +BU_P10V_3 (VINSERTGPRWL, "vinsguwvlx", CONST, vinsertgl_v4si)
> +BU_P10V_3 (VINSERTGPRDL, "vinsgudvlx", CONST, vinsertgl_v2di)
> +BU_P10V_3 (VINSERTVPRBL, "vinsvubvlx", CONST, vinsertvl_v16qi)
> +BU_P10V_3 (VINSERTVPRHL, "vinsvuhvlx", CONST, vinsertvl_v8hi)
> +BU_P10V_3 (VINSERTVPRWL, "vinsvuwvlx", CONST, vinsertvl_v4si)
> +
> +BU_P10V_3 (VINSERTGPRBR, "vinsgubvrx", CONST, vinsertgr_v16qi)
> +BU_P10V_3 (VINSERTGPRHR, "vinsguhvrx", CONST, vinsertgr_v8hi)
> +BU_P10V_3 (VINSERTGPRWR, "vinsguwvrx", CONST, vinsertgr_v4si)
> +BU_P10V_3 (VINSERTGPRDR, "vinsgudvrx", CONST, vinsertgr_v2di)
> +BU_P10V_3 (VINSERTVPRBR, "vinsvubvrx", CONST, vinsertvr_v16qi)
> +BU_P10V_3 (VINSERTVPRHR, "vinsvuhvrx", CONST, vinsertvr_v8hi)
> +BU_P10V_3 (VINSERTVPRWR, "vinsvuwvrx", CONST, vinsertvr_v4si)
> +
>  BU_P10V_1 (VSTRIBR, "vstribr", CONST, vstrir_v16qi)
>  BU_P10V_1 (VSTRIHR, "vstrihr", CONST, vstrir_v8hi)
>  BU_P10V_1 (VSTRIBL, "vstribl", CONST, vstril_v16qi)
> @@ -2727,6 +2743,8 @@ BU_P10_OVERLOAD_2 (XXGENPCVM, "xxgenpcvm")
> 
>  BU_P10_OVERLOAD_3 (EXTRACTL, "extractl")
>  BU_P10_OVERLOAD_3 (EXTRACTH, "extracth")
> +BU_P10_OVERLOAD_3 (INSERTL, "insertl")
> +BU_P10_OVERLOAD_3 (INSERTH, "inserth")
> 
>  BU_P10_OVERLOAD_1 (VSTRIR, "strir")
>  BU_P10_OVERLOAD_1 (VSTRIL, "stril")

ok

> diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
> index d3cf2de8878..820b361c0f6 100644
> --- a/gcc/config/rs6000/rs6000-call.c
> +++ b/gcc/config/rs6000/rs6000-call.c
> @@ -5576,6 +5576,28 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
>      RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
>      RS6000_BTI_unsigned_V2DI, RS6000_BTI_UINTQI },
> 
> +  { P10_BUILTIN_VEC_INSERTL, P10_BUILTIN_VINSERTGPRBL,
> +    RS6000_BTI_unsigned_V16QI, RS6000_BTI_UINTQI,
> +    RS6000_BTI_unsigned_V16QI, RS6000_BTI_UINTSI },
> +  { P10_BUILTIN_VEC_INSERTL, P10_BUILTIN_VINSERTGPRHL,
> +    RS6000_BTI_unsigned_V8HI, RS6000_BTI_UINTHI,
> +    RS6000_BTI_unsigned_V8HI, RS6000_BTI_UINTSI },
> +  { P10_BUILTIN_VEC_INSERTL, P10_BUILTIN_VINSERTGPRWL,
> +    RS6000_BTI_unsigned_V4SI, RS6000_BTI_UINTSI,
> +    RS6000_BTI_unsigned_V4SI, RS6000_BTI_UINTSI },
> +  { P10_BUILTIN_VEC_INSERTL, P10_BUILTIN_VINSERTGPRDL,
> +    RS6000_BTI_unsigned_V2DI, RS6000_BTI_UINTDI,
> +    RS6000_BTI_unsigned_V2DI, RS6000_BTI_UINTSI },
> + { P10_BUILTIN_VEC_INSERTL, P10_BUILTIN_VINSERTVPRBL,
> +    RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI,
> +    RS6000_BTI_unsigned_V16QI, RS6000_BTI_UINTQI },
> +  { P10_BUILTIN_VEC_INSERTL, P10_BUILTIN_VINSERTVPRHL,
> +    RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V8HI,
> +    RS6000_BTI_unsigned_V8HI, RS6000_BTI_UINTQI },
> +  { P10_BUILTIN_VEC_INSERTL, P10_BUILTIN_VINSERTVPRWL,
> +    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
> +    RS6000_BTI_unsigned_V4SI, RS6000_BTI_UINTQI },
> +
>    { P10_BUILTIN_VEC_EXTRACTH, P10_BUILTIN_VEXTRACTBR,
>      RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V16QI,
>      RS6000_BTI_unsigned_V16QI, RS6000_BTI_UINTQI },
> @@ -5589,6 +5611,28 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
>      RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
>      RS6000_BTI_unsigned_V2DI, RS6000_BTI_UINTQI },
> 
> +  { P10_BUILTIN_VEC_INSERTH, P10_BUILTIN_VINSERTGPRBR,
> +    RS6000_BTI_unsigned_V16QI, RS6000_BTI_UINTQI,
> +    RS6000_BTI_unsigned_V16QI, RS6000_BTI_UINTSI },
> +  { P10_BUILTIN_VEC_INSERTH, P10_BUILTIN_VINSERTGPRHR,
> +    RS6000_BTI_unsigned_V8HI, RS6000_BTI_UINTHI,
> +    RS6000_BTI_unsigned_V8HI, RS6000_BTI_UINTSI },
> +  { P10_BUILTIN_VEC_INSERTH, P10_BUILTIN_VINSERTGPRWR,
> +    RS6000_BTI_unsigned_V4SI, RS6000_BTI_UINTSI,
> +    RS6000_BTI_unsigned_V4SI, RS6000_BTI_UINTSI },
> +  { P10_BUILTIN_VEC_INSERTH, P10_BUILTIN_VINSERTGPRDR,
> +    RS6000_BTI_unsigned_V2DI, RS6000_BTI_UINTDI,
> +    RS6000_BTI_unsigned_V2DI, RS6000_BTI_UINTSI },
> +  { P10_BUILTIN_VEC_INSERTH, P10_BUILTIN_VINSERTVPRBR,
> +    RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI,
> +    RS6000_BTI_unsigned_V16QI, RS6000_BTI_UINTQI },
> +  { P10_BUILTIN_VEC_INSERTH, P10_BUILTIN_VINSERTVPRHR,
> +    RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V8HI,
> +    RS6000_BTI_unsigned_V8HI, RS6000_BTI_UINTQI },
> +  { P10_BUILTIN_VEC_INSERTH, P10_BUILTIN_VINSERTVPRWR,
> +    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
> +    RS6000_BTI_unsigned_V4SI, RS6000_BTI_UINTQI },
> +
>    { P10_BUILTIN_VEC_VSTRIL, P10_BUILTIN_VSTRIBL,
>      RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI, 0, 0 },
>    { P10_BUILTIN_VEC_VSTRIL, P10_BUILTIN_VSTRIBL,
> @@ -13788,6 +13832,13 @@ builtin_function_type (machine_mode mode_ret, machine_mode mode_arg0,
>      case P10_BUILTIN_VEXTRACTHR:
>      case P10_BUILTIN_VEXTRACTWR:
>      case P10_BUILTIN_VEXTRACTDR:
> +    case P10_BUILTIN_VINSERTGPRBL:
> +    case P10_BUILTIN_VINSERTGPRHL:
> +    case P10_BUILTIN_VINSERTGPRWL:
> +    case P10_BUILTIN_VINSERTGPRDL:
> +    case P10_BUILTIN_VINSERTVPRBL:
> +    case P10_BUILTIN_VINSERTVPRHL:
> +    case P10_BUILTIN_VINSERTVPRWL:
>        h.uns_p[0] = 1;
>        h.uns_p[1] = 1;
>        h.uns_p[2] = 1;

ok
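
For readers following the overload entries in the rs6000-call.c hunk above, here is a minimal sketch of how such a table can map argument types to a specific builtin. Everything in it is hypothetical: the enum names stand in for the RS6000_BTI_* and P10_BUILTIN_* identifiers, only three rows are modeled, and the lookup is an exact match, which may well be simpler than what GCC's real overload resolver does.

```c
#include <stddef.h>

/* Hypothetical stand-ins for RS6000_BTI_* type codes (assumption).  */
enum type_code { T_UCHAR, T_USHORT, T_UINT, T_V16QI, T_V8HI, T_V4SI };

/* Hypothetical stand-ins for P10_BUILTIN_VINSERT* ids (assumption).  */
enum builtin_id { B_NONE, B_VINSERTGPRBL, B_VINSERTGPRWL, B_VINSERTVPRBL };

/* One row: specific builtin, return type, then the three argument types,
   in the same order as the entries quoted in the patch.  */
struct overload_entry {
    enum builtin_id specific;
    enum type_code ret, arg1, arg2, arg3;
};

static const struct overload_entry insertl_table[] = {
    { B_VINSERTGPRBL, T_V16QI, T_UCHAR, T_V16QI, T_UINT  },
    { B_VINSERTGPRWL, T_V4SI,  T_UINT,  T_V4SI,  T_UINT  },
    { B_VINSERTVPRBL, T_V16QI, T_V16QI, T_V16QI, T_UCHAR },
};

/* Exact-match lookup; returns B_NONE when no row matches.  */
static enum builtin_id
resolve_insertl(enum type_code a1, enum type_code a2, enum type_code a3)
{
    for (size_t i = 0; i < sizeof insertl_table / sizeof insertl_table[0]; i++) {
        const struct overload_entry *e = &insertl_table[i];
        if (e->arg1 == a1 && e->arg2 == a2 && e->arg3 == a3)
            return e->specific;
    }
    return B_NONE;
}
```

The point of the sketch is only that the scalar (GPR) and vector (VPR) forms share one overloaded name and are told apart by the type of the first argument.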

> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index e9f89d43b3f..e9d45d1dcfd 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -349,6 +349,8 @@
>     UNSPEC_XXGENPCV
>     UNSPEC_EXTRACTL
>     UNSPEC_EXTRACTR
> +   UNSPEC_INSERTL
> +   UNSPEC_INSERTR
>    ])
> 
>  (define_int_iterator XVCVBF16	[UNSPEC_VSX_XVCVSPBF16
> @@ -3865,6 +3867,114 @@
>    "vext<du_or_d><wd>vrx %0,%1,%2,%3"
>    [(set_attr "type" "vecsimple")])
> 
> +(define_expand "vinsertvl_<mode>"
> +  [(set (match_operand:VI2 0 "altivec_register_operand")
> +	(unspec:VI2 [(match_operand:VI2 1 "altivec_register_operand")
> +		     (match_operand:VI2 2 "altivec_register_operand")
> +		     (match_operand:SI 3 "register_operand" "r")]
> +		    UNSPEC_INSERTL))]
> +  "TARGET_POWER10"
> +{
> +  if (BYTES_BIG_ENDIAN)
> +     emit_insn (gen_vinsertvl_internal_<mode> (operands[0], operands[3],
> +                                               operands[1], operands[2]));
> +   else
> +     emit_insn (gen_vinsertvr_internal_<mode> (operands[0], operands[3],
> +                                               operands[1], operands[2]));
> +   DONE;
> +})
> +
> +(define_insn "vinsertvl_internal_<mode>"
> +  [(set (match_operand:VEC_I 0 "altivec_register_operand" "=v")
> +	(unspec:VEC_I [(match_operand:SI 1 "register_operand" "r")
> +		       (match_operand:VEC_I 2 "altivec_register_operand" "v")
> +		       (match_operand:VEC_I 3 "altivec_register_operand" "0")]
> +		      UNSPEC_INSERTL))]
> +  "TARGET_POWER10"
> +  "vins<wd>vlx %0,%1,%2"
> +  [(set_attr "type" "vecsimple")])
> +
> +(define_expand "vinsertvr_<mode>"
> +  [(set (match_operand:VI2 0 "altivec_register_operand")
> +	(unspec:VI2 [(match_operand:VI2 1 "altivec_register_operand")
> +		     (match_operand:VI2 2 "altivec_register_operand")
> +		     (match_operand:SI 3 "register_operand" "r")]
> +		    UNSPEC_INSERTR))]
> +  "TARGET_POWER10"
> +{
> +  if (BYTES_BIG_ENDIAN)
> +     emit_insn (gen_vinsertvr_internal_<mode> (operands[0], operands[3],
> +                                               operands[1], operands[2]));
> +   else
> +     emit_insn (gen_vinsertvl_internal_<mode> (operands[0], operands[3],
> +                                               operands[1], operands[2]));
> +   DONE;
> +})
> +
> +(define_insn "vinsertvr_internal_<mode>"
> +  [(set (match_operand:VEC_I 0 "altivec_register_operand" "=v")
> +	(unspec:VEC_I [(match_operand:SI 1 "register_operand" "r")
> +		       (match_operand:VEC_I 2 "altivec_register_operand" "v")
> +		       (match_operand:VEC_I 3 "altivec_register_operand" "0")]
> +		      UNSPEC_INSERTR))]
> +  "TARGET_POWER10"
> +  "vins<wd>vrx %0,%1,%2"
> +  [(set_attr "type" "vecsimple")])
> +
> +(define_expand "vinsertgl_<mode>"
> +  [(set (match_operand:VI2 0 "altivec_register_operand")
> +	(unspec:VI2 [(match_operand:SI 1 "register_operand")
> +		     (match_operand:VI2 2 "altivec_register_operand")
> +		     (match_operand:SI 3 "register_operand")]
> +		    UNSPEC_INSERTL))]
> +  "TARGET_POWER10"
> +{
> +  if (BYTES_BIG_ENDIAN)
> +    emit_insn (gen_vinsertgl_internal_<mode> (operands[0], operands[3],
> +                                            operands[1], operands[2]));
> +  else
> +    emit_insn (gen_vinsertgr_internal_<mode> (operands[0], operands[3],
> +                                            operands[1], operands[2]));
> +  DONE;
> + })
> +
> +(define_insn "vinsertgl_internal_<mode>"
> + [(set (match_operand:VEC_I 0 "altivec_register_operand" "=v")
> +       (unspec:VEC_I [(match_operand:SI 1 "register_operand" "r")
> +		      (match_operand:SI 2 "register_operand" "r")
> +		      (match_operand:VEC_I 3 "altivec_register_operand" "0")]
> +		     UNSPEC_INSERTL))]
> + "TARGET_POWER10"
> + "vins<wd>lx %0,%1,%2"
> + [(set_attr "type" "vecsimple")])
> +
> +(define_expand "vinsertgr_<mode>"
> +  [(set (match_operand:VI2 0 "altivec_register_operand")
> +	(unspec:VI2 [(match_operand:SI 1 "register_operand")
> +		     (match_operand:VI2 2 "altivec_register_operand")
> +		     (match_operand:SI 3 "register_operand")]
> +		    UNSPEC_INSERTR))]
> +  "TARGET_POWER10"
> +{
> +  if (BYTES_BIG_ENDIAN)
> +    emit_insn (gen_vinsertgr_internal_<mode> (operands[0], operands[3],
> +                                            operands[1], operands[2]));
> +  else
> +    emit_insn (gen_vinsertgl_internal_<mode> (operands[0], operands[3],
> +                                            operands[1], operands[2]));
> +  DONE;
> + })
> +
> +(define_insn "vinsertgr_internal_<mode>"
> + [(set (match_operand:VEC_I 0 "altivec_register_operand" "=v")
> +   (unspec:VEC_I [(match_operand:SI 1 "register_operand" "r")
> +		  (match_operand:SI 2 "register_operand" "r")
> +		  (match_operand:VEC_I 3 "altivec_register_operand" "0")]
> +		 UNSPEC_INSERTR))]
> + "TARGET_POWER10"
> + "vins<wd>rx %0,%1,%2"
> + [(set_attr "type" "vecsimple")])
> +
>  ;; VSX_EXTRACT optimizations
>  ;; Optimize double d = (double) vec_extract (vi, <n>)
>  ;; Get the element into the top position and use XVCVSWDP/XVCVUWDP
> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index 0e65d542587..e643346a160 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -20991,6 +20991,77 @@ Perform a vector parallel bits deposit operation, as if implemented by
>  the @code{vpdepd} instruction.
>  @findex vec_pdep
> 
> +Vector Insert
> +
> +@smallexample
> +@exdent vector unsigned char
> +@exdent vec_insertl (unsigned char, vector unsigned char, unsigned int);
> +@exdent vector unsigned short
> +@exdent vec_insertl (unsigned short, vector unsigned short, unsigned int);
> +@exdent vector unsigned int
> +@exdent vec_insertl (unsigned int, vector unsigned int, unsigned int);
> +@exdent vector unsigned long long
> +@exdent vec_insertl (unsigned long long, vector unsigned long long,
> +unsigned int);
> +@exdent vector unsigned char
> +@exdent vec_insertl (vector unsigned char, vector unsigned char, unsigned int;
> +@exdent vector unsigned short
> +@exdent vec_insertl (vector unsigned short, vector unsigned short,
> +unsigned int);
> +@exdent vector unsigned int
> +@exdent vec_insertl (vector unsigned int, vector unsigned int, unsigned int);
> +@end smallexample
> +
> +Let src be the first argument, when the first argument is a scalar, or the
> +rightmost element of the left doubleword of the first argument, when the first
> +argument is a vector.  Insert the source into the destination at the position
> +given by the third argument, using natural element order in the second
> +argument.  The rest of the second argument is unchanged.  If the byte
> +index is greater than 14 for halfwords, greatere than 12 for words, or

greatere

> +greater than 8 for doublewords the result is undefined.   For little-endian,
> +the generated code will be semantically equivalent to vinsbrx, vinshrx,
> +or vinswrx instructions.  Similarly for big-endian it will be semantically

wrap those in @code

> +equivalent to vinsblx, vinshlx, vinswlx.  Note that some
> +fairly anomalous results can be generated if the byte index is not aligned
> +on an element boundary for the sort of element being inserted. This is a

s/sort/type/ ? 

> +limitation of the bi-endian vector programming model.

Not sure the limitation statement is useful for the description of the
builtin.
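
As an aside, the index limits quoted above (14 for halfwords, 12 for words, 8 for doublewords) are just 16 minus the element size. A rough portable model of that rule, assuming the byte index is an offset into a 16-byte memory image of the vector, might look like the sketch below. The function names are hypothetical, this is not the builtin or its implementation, and where the documentation says the result is undefined the model simply refuses the insert.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: largest byte index at which an element of SIZE
   bytes still fits in a 16-byte vector (15, 14, 12, 8 for 1, 2, 4, 8).  */
static unsigned model_max_index(size_t size)
{
    return 16u - (unsigned) size;
}

/* Hypothetical model, not the GCC builtin: copy SIZE bytes of SRC into the
   16-byte image VEC at byte offset INDEX, leaving all other bytes unchanged.
   Returns 0 on success, -1 if the element would not fit.  */
static int model_insert(uint8_t vec[16], const void *src, size_t size,
                        unsigned index)
{
    if (size == 0 || size > 8 || index > 16 - size)
        return -1;
    memcpy(vec + index, src, size);
    return 0;
}
```

A word inserted at index 4 overwrites bytes 4 through 7 only, and a word at index 13 is refused because it would run past byte 15.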


> +@findex vec_insertl
> +
> +@smallexample
> +@exdent vector unsigned char
> +@exdent vec_inserth (unsigned char, vector unsigned char, unsigned int);
> +@exdent vector unsigned short
> +@exdent vec_inserth (unsigned short, vector unsigned short, unsigned int);
> +@exdent vector unsigned int
> +@exdent vec_inserth (unsigned int, vector unsigned int, unsigned int);
> +@exdent vector unsigned long long
> +@exdent vec_inserth (unsigned long long, vector unsigned long long,
> +unsigned int);
> +@exdent vector unsigned char
> +@exdent vec_inserth (vector unsigned char, vector unsigned char, unsigned int);
> +@exdent vector unsigned short
> +@exdent vec_inserth (vector unsigned short, vector unsigned short,
> +unsigned int);
> +@exdent vector unsigned int
> +@exdent vec_inserth (vector unsigned int, vector unsigned int, unsigned int);
> +@end smallexample
> +
> +Let src be the first argument, when the first argument is a scalar, or the
> +rightmost element of the first argument, when the first argument is a vector.
> +Insert src into the second argument at the position identified by the third
> +argument, using opposite element order in the second argument, and leaving the
> +rest of the second argument unchanged.  If the byte index is greater than 14
> +for halfwords, 12 for words, or 8 for doublewords, the intrinsic will be
> +rejected. Note that the underlying hardware instruction uses the same register
> +for the second argument and the result, but this is hidden by the built-in.

If it's hidden, it probably doesn't need to be discussed here.  (A
comment on the builtin implementation would be appropriate).

> +For little-endian, the code generation will be semantically equivalent to
> +vins*lx, while for big-endian it will be semantically equivalent to vins*rx.

wrap in @code{}

> +Note that some fairly anomalous results can be generated if the byte index is
> +not aligned on an element boundary for the sort of element being inserted.
> +This is a limitation of the bi-endian vector programming model consistent with
> +the limitation on vec_perm, for example.
> +@findex vec_inserth
> +
>  @smallexample
>  @exdent vector unsigned long long int
>  @exdent vec_pext (vector unsigned long long int, vector unsigned long long int)
> diff --git a/gcc/testsuite/gcc.target/powerpc/vec-insert-word-runnable.c b/gcc/testsuite/gcc.target/powerpc/vec-insert-word-runnable.c
> new file mode 100644
> index 00000000000..8c2721aedfc
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/vec-insert-word-runnable.c


<snip>
ok.
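
For reference, the endian selection done by the vinsert*_<mode> expanders in the vsx.md hunk of this patch can be summarized by a small model. This is only a sketch: the function name and its integer flags are hypothetical, and it returns the mnemonic suffix ("lx" or "rx") rather than emitting RTL.

```c
#include <string.h>

/* Hypothetical model of the expanders: WANT_LEFT is nonzero for the
   vinsert?l_<mode> patterns and zero for vinsert?r_<mode>.  On big-endian
   the requested variant is used as-is; on little-endian it is swapped.  */
static const char *select_insert_variant(int want_left, int bytes_big_endian)
{
    int use_left = want_left ? bytes_big_endian : !bytes_big_endian;
    return use_left ? "lx" : "rx";
}
```

So a "left" insert becomes an "rx" instruction on little-endian, which matches the if (BYTES_BIG_ENDIAN) branches quoted in the patch.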




^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/6 ver 4]  ] Permute Class Operations
  2020-07-08 19:58 Carl Love
@ 2020-07-09 15:31 ` will schmidt
  0 siblings, 0 replies; 18+ messages in thread
From: will schmidt @ 2020-07-09 15:31 UTC (permalink / raw)
  To: Carl Love, segher, dje.gcc, gcc-patches

On Wed, 2020-07-08 at 12:58 -0700, Carl Love wrote:
> [PATCH 1/6] rs6000, Update support for vec_extract



The email subject needs to be updated too; it is at least correct
inline.  This applies here and to subsequent messages in the thread.


> 
> -------------------------
> V4 changes
> 	rebased onto mainline 7/2/2020
> 	Add iterator name to Change log
> 
> -------------------------------
> V3 changes
> 
>   Redo ChangeLog for code move.
>   Replace spaces with tabs in ChangeLog.
>   Replaced instruction names using * with the actual list of names.  For
> 	example vextdu*vrx with the explicit instruction names vextdubvrx,
> 	vextduhvrx, etc.
> -------------------------
> v2 changes
> 
> config/rs6000/altivec.md log entry for move from changed as suggested.
> 
> config/rs6000/vsx.md log entry for moved to here changed as suggested.
> 
> define_mode_iterator VI2 also moved, included in both change log entries
> 
> --------------------------------------------
> GCC maintainers:
> 
> Move the existing vector extract support in altivec.md to vsx.md
> so all of the vector insert and extract support is in the same file.
> 
> The patch also updates the name of the builtins and descriptions for the
> builtins in the documentation file so they match the approved builtin
> names and descriptions.
> 
> The patch does not make any functional changes.
> 
> Please let me know if the changes are acceptable for mainline.  Thanks.
> 
>                   Carl Love
> 
> ------------------------------------------------------
> 
> gcc/ChangeLog
> 
> 2020-07-06  Carl Love  <cel@us.ibm.com>
> 
> 	* config/rs6000/altivec.md: (UNSPEC_EXTRACTL, UNSPEC_EXTRACTR)
> 	(vextractl<mode>, vextractr<mode>)
> 	(vextractl<mode>_internal, vextractr<mode>_internal for mode VI2)
> 	(VI2): Move to ...
> 	* config/rs6000/vsx.md:	(UNSPEC_EXTRACTL, UNSPEC_EXTRACTR)
> 	(vextractl<mode>, vextractr<mode>)
> 	(vextractl<mode>_internal, vextractr<mode>_internal for mode VI2)
> 	(VI2):  ..here.
> 	* gcc/doc/extend.texi: Update documentation for vec_extractl.
> 	Replace builtin name vec_extractr with vec_extracth.  Update description
> 	of vec_extracth.
> ---
>  gcc/config/rs6000/altivec.md | 64 -----------------------------
>  gcc/config/rs6000/vsx.md     | 66 ++++++++++++++++++++++++++++++
>  gcc/doc/extend.texi          | 78 ++++++++++++++++++------------------
>  3 files changed, 105 insertions(+), 103 deletions(-)
> 
> diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
> index 2ce9227c765..749b2c42c14 100644
> --- a/gcc/config/rs6000/altivec.md
> +++ b/gcc/config/rs6000/altivec.md
> @@ -172,8 +172,6 @@
>     UNSPEC_XXEVAL
>     UNSPEC_VSTRIR
>     UNSPEC_VSTRIL
> -   UNSPEC_EXTRACTL
> -   UNSPEC_EXTRACTR
>  ])
> 
>  (define_c_enum "unspecv"
> @@ -184,8 +182,6 @@
>     UNSPECV_DSS
>    ])
> 
> -;; Like VI, defined in vector.md, but add ISA 2.07 integer vector ops
> -(define_mode_iterator VI2 [V4SI V8HI V16QI V2DI])
>  ;; Short vec int modes
>  (define_mode_iterator VIshort [V8HI V16QI])
>  ;; Longer vec int modes for rotate/mask ops
> @@ -786,66 +782,6 @@
>    DONE;
>  })
> 
> -(define_expand "vextractl<mode>"
> -  [(set (match_operand:V2DI 0 "altivec_register_operand")
> -	(unspec:V2DI [(match_operand:VI2 1 "altivec_register_operand")
> -		      (match_operand:VI2 2 "altivec_register_operand")
> -		      (match_operand:SI 3 "register_operand")]
> -		     UNSPEC_EXTRACTL))]
> -  "TARGET_POWER10"
> -{
> -  if (BYTES_BIG_ENDIAN)
> -    {
> -      emit_insn (gen_vextractl<mode>_internal (operands[0], operands[1],
> -					       operands[2], operands[3]));
> -      emit_insn (gen_xxswapd_v2di (operands[0], operands[0]));
> -    }
> -  else
> -    emit_insn (gen_vextractr<mode>_internal (operands[0], operands[2],
> -					     operands[1], operands[3]));
> -  DONE;
> -})
> -
> -(define_insn "vextractl<mode>_internal"
> -  [(set (match_operand:V2DI 0 "altivec_register_operand" "=v")
> -	(unspec:V2DI [(match_operand:VEC_I 1 "altivec_register_operand" "v")
> -		      (match_operand:VEC_I 2 "altivec_register_operand" "v")
> -		      (match_operand:SI 3 "register_operand" "r")]
> -		     UNSPEC_EXTRACTL))]
> -  "TARGET_POWER10"
> -  "vext<du_or_d><wd>vlx %0,%1,%2,%3"
> -  [(set_attr "type" "vecsimple")])
> -
> -(define_expand "vextractr<mode>"
> -  [(set (match_operand:V2DI 0 "altivec_register_operand")
> -	(unspec:V2DI [(match_operand:VI2 1 "altivec_register_operand")
> -		      (match_operand:VI2 2 "altivec_register_operand")
> -		      (match_operand:SI 3 "register_operand")]
> -		     UNSPEC_EXTRACTR))]
> -  "TARGET_POWER10"
> -{
> -  if (BYTES_BIG_ENDIAN)
> -    {
> -      emit_insn (gen_vextractr<mode>_internal (operands[0], operands[1],
> -					       operands[2], operands[3]));
> -      emit_insn (gen_xxswapd_v2di (operands[0], operands[0]));
> -    }
> -  else
> -    emit_insn (gen_vextractl<mode>_internal (operands[0], operands[2],
> -    					     operands[1], operands[3]));
> -  DONE;
> -})
> -
> -(define_insn "vextractr<mode>_internal"
> -  [(set (match_operand:V2DI 0 "altivec_register_operand" "=v")
> -	(unspec:V2DI [(match_operand:VEC_I 1 "altivec_register_operand" "v")
> -		      (match_operand:VEC_I 2 "altivec_register_operand" "v")
> -		      (match_operand:SI 3 "register_operand" "r")]
> -		     UNSPEC_EXTRACTR))]
> -  "TARGET_POWER10"
> -  "vext<du_or_d><wd>vrx %0,%1,%2,%3"
> -  [(set_attr "type" "vecsimple")])
> -
>  (define_expand "vstrir_<mode>"
>    [(set (match_operand:VIshort 0 "altivec_register_operand")
>  	(unspec:VIshort [(match_operand:VIshort 1 "altivec_register_operand")]
> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index 732a54842b6..e9f89d43b3f 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -347,6 +347,8 @@
>     UNSPEC_VSX_FIRST_MISMATCH_INDEX
>     UNSPEC_VSX_FIRST_MISMATCH_EOS_INDEX
>     UNSPEC_XXGENPCV
> +   UNSPEC_EXTRACTL
> +   UNSPEC_EXTRACTR
>    ])
> 
>  (define_int_iterator XVCVBF16	[UNSPEC_VSX_XVCVSPBF16
> @@ -355,6 +357,9 @@
>  (define_int_attr xvcvbf16       [(UNSPEC_VSX_XVCVSPBF16 "xvcvspbf16")
>  				 (UNSPEC_VSX_XVCVBF16SP "xvcvbf16sp")])
> 
> +;; Like VI, defined in vector.md, but add ISA 2.07 integer vector ops
> +(define_mode_iterator VI2 [V4SI V8HI V16QI V2DI])
> +
>  ;; VSX moves
> 
>  ;; The patterns for LE permuted loads and stores come before the general
> @@ -3799,6 +3804,67 @@
>  }
>    [(set_attr "type" "load")])
> 
> +;; ISA 3.1 extract
> +(define_expand "vextractl<mode>"
> +  [(set (match_operand:V2DI 0 "altivec_register_operand")
> +	(unspec:V2DI [(match_operand:VI2 1 "altivec_register_operand")
> +		      (match_operand:VI2 2 "altivec_register_operand")
> +		      (match_operand:SI 3 "register_operand")]
> +		     UNSPEC_EXTRACTL))]
> +  "TARGET_POWER10"
> +{
> +  if (BYTES_BIG_ENDIAN)
> +    {
> +      emit_insn (gen_vextractl<mode>_internal (operands[0], operands[1],
> +					       operands[2], operands[3]));
> +      emit_insn (gen_xxswapd_v2di (operands[0], operands[0]));
> +    }
> +  else
> +    emit_insn (gen_vextractr<mode>_internal (operands[0], operands[2],
> +					     operands[1], operands[3]));
> +  DONE;
> +})
> +
> +(define_insn "vextractl<mode>_internal"
> +  [(set (match_operand:V2DI 0 "altivec_register_operand" "=v")
> +	(unspec:V2DI [(match_operand:VEC_I 1 "altivec_register_operand" "v")
> +		      (match_operand:VEC_I 2 "altivec_register_operand" "v")
> +		      (match_operand:SI 3 "register_operand" "r")]
> +		     UNSPEC_EXTRACTL))]
> +  "TARGET_POWER10"
> +  "vext<du_or_d><wd>vlx %0,%1,%2,%3"
> +  [(set_attr "type" "vecsimple")])
> +
> +(define_expand "vextractr<mode>"
> +  [(set (match_operand:V2DI 0 "altivec_register_operand")
> +	(unspec:V2DI [(match_operand:VI2 1 "altivec_register_operand")
> +		      (match_operand:VI2 2 "altivec_register_operand")
> +		      (match_operand:SI 3 "register_operand")]
> +		     UNSPEC_EXTRACTR))]
> +  "TARGET_POWER10"
> +{
> +  if (BYTES_BIG_ENDIAN)
> +    {
> +      emit_insn (gen_vextractr<mode>_internal (operands[0], operands[1],
> +					       operands[2], operands[3]));
> +      emit_insn (gen_xxswapd_v2di (operands[0], operands[0]));
> +    }
> +  else
> +    emit_insn (gen_vextractl<mode>_internal (operands[0], operands[2],
> +					     operands[1], operands[3]));
> +  DONE;
> +})
> +
> +(define_insn "vextractr<mode>_internal"
> +  [(set (match_operand:V2DI 0 "altivec_register_operand" "=v")
> +	(unspec:V2DI [(match_operand:VEC_I 1 "altivec_register_operand" "v")
> +		      (match_operand:VEC_I 2 "altivec_register_operand" "v")
> +		      (match_operand:SI 3 "register_operand" "r")]
> +		     UNSPEC_EXTRACTR))]
> +  "TARGET_POWER10"
> +  "vext<du_or_d><wd>vrx %0,%1,%2,%3"
> +  [(set_attr "type" "vecsimple")])
> +
>  ;; VSX_EXTRACT optimizations
>  ;; Optimize double d = (double) vec_extract (vi, <n>)
>  ;; Get the element into the top position and use XVCVSWDP/XVCVUWDP
> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index ecd3661d257..0e65d542587 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -20927,6 +20927,9 @@ Perform a 128-bit vector gather  operation, as if implemented by the
>  integer value between 2 and 7 inclusive.
>  @findex vec_gnb
> 
> +
> +Vector Extract
> +
>  @smallexample
>  @exdent vector unsigned long long int
>  @exdent vec_extractl (vector unsigned char, vector unsigned char, unsigned int)
> @@ -20937,52 +20940,49 @@ integer value between 2 and 7 inclusive.
>  @exdent vector unsigned long long int
>  @exdent vec_extractl (vector unsigned long long, vector unsigned long long, unsigned int)
>  @end smallexample
> -Extract a single element from the vector formed by catenating this function's
> -first two arguments at the byte offset specified by this function's
> -third argument.  On big-endian targets, this function behaves as if
> -implemented by the @code{vextdubvlx}, @code{vextduhvlx},
> -@code{vextduwvlx}, or @code{vextddvlx} instructions, depending on the
> -types of the function's first two arguments.  On little-endian
> -targets, this function behaves as if implemented by the
> -@code{vextdubvrx}, @code{vextduhvrx},
> -@code{vextduwvrx}, or @code{vextddvrx} instructions.
> -The byte offset of the element to be extracted is calculated
> -by computing the remainder of dividing the third argument by 32.
> -If this reminader value is not a multiple of the vector element size,
> -or if its value added to the vector element size exceeds 32, the
> -result is undefined.
> +Extract an element from two concatenated vectors starting at the given byte index
> +in natural-endian order, and place it zero-extended in doubleword 1 of the result
> +according to natural element order.  If the byte index is out of range for the
> +data type, the intrinsic will be rejected.
> +For little-endian, this output will match the placement by the hardware
> +instruction, i.e., dword[0] in RTL notation.  For big-endian, an additional
> +instruction is needed to move it from the "left" doubleword to the  "right" one.
> +For little-endian, semantics matching the vextdubvrx, vextduhvrx,
> +vextduwvrx instruction will be generated, while for big-endian, semantics
> +matching the vextdubvlx, vextduhvlx, vextduwvlx instructions
> +will be generated.  Note that some fairly anomalous results can be generated if
> +the byte index is not aligned on an element boundary for the element being
> +extracted.  This is a limitation of the bi-endian vector programming model
> +consistent with the limitation on vec_perm, for example.
>  @findex vec_extractl
> 
>  @smallexample
>  @exdent vector unsigned long long int
> -@exdent vec_extractr (vector unsigned char, vector unsigned char, unsigned int)
> +@exdent vec_extracth (vector unsigned char, vector unsigned char, unsigned int)
>  @exdent vector unsigned long long int
> -@exdent vec_extractr (vector unsigned short, vector unsigned short, unsigned int)
> +@exdent vec_extracth (vector unsigned short, vector unsigned short,
> +unsigned int)
>  @exdent vector unsigned long long int
> -@exdent vec_extractr (vector unsigned int, vector unsigned int, unsigned int)
> +@exdent vec_extracth (vector unsigned int, vector unsigned int, unsigned int)
>  @exdent vector unsigned long long int
> -@exdent vec_extractr (vector unsigned long long, vector unsigned long long, unsigned int)
> -@end smallexample
> -Extract a single element from the vector formed by catenating this function's
> -first two arguments at the byte offset calculated by subtracting this
> -function's third argument from 31.  On big-endian targets, this
> -function behaves as if
> -implemented by the
> -@code{vextdubvrx}, @code{vextduhvrx},
> -@code{vextduwvrx}, or @code{vextddvrx} instructions, depending on the
> -types of the function's first two arguments.
> -On little-endian
> -targets, this function behaves as if implemented by the
> -@code{vextdubvlx}, @code{vextduhvlx},
> -@code{vextduwvlx}, or @code{vextddvlx} instructions.
> -The byte offset of the element to be extracted, measured from the
> -right end of the catenation of the two vector arguments, is calculated
> -by computing the remainder of dividing the third argument by 32.
> -If this reminader value is not a multiple of the vector element size,
> -or if its value added to the vector element size exceeds 32, the
> -result is undefined.
> -@findex vec_extractr
> -
> +@exdent vec_extracth (vector unsigned long long, vector unsigned long long,
> +unsigned int)
> +@end smallexample
> +Extract an element from two concatenated vectors starting at the given byte
> +index in opposite-endian order, and place it zero-extended in doubleword 1

opposite-endian ? 

> +according to natural element order.  If the byte index is out of range for the
> +data type, the intrinsic will be rejected.  For little-endian, this output
> +will match the placement by the hardware instruction, i.e., dword[0] in RTL

Should the 'hardware instruction' be replaced with the instruction
reference itself? 

> +notation.  For big-endian, an additional instruction is needed to move it
> +from the "left" doubleword to the "right" one.  For little-endian, semantics
> +matching the vextdubvlx, vextduhvlx, vextduwvlx instructions will be generated,

Should wrap the instruction references in @code{}


> +while for big-endian, semantics matching the vextdubvrx, vextduhvrx,
> +vextduwvrx instructions will be generated.  Note that some fairly anomalous
> +results can be generated if the byte index is not aligned on the
> +element boundary for the element being extracted.  This is a
> +limitation of the bi-endian vector programming model consistent with the
> +limitation on vec_perm, for example.

This reads awkwardly.  Maybe s/for example//?

wrap vec_perm reference in @code{}


> +@findex vec_extracth
>  @smallexample
>  @exdent vector unsigned long long int
>  @exdent vec_pdep (vector unsigned long long int, vector unsigned long long int)


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/6 ver 4]  ] Permute Class Operations
@ 2020-07-08 19:59 Carl Love
  2020-07-09 18:28 ` will schmidt
  0 siblings, 1 reply; 18+ messages in thread
From: Carl Love @ 2020-07-08 19:59 UTC (permalink / raw)
  To: segher, dje.gcc, gcc-patches, Will Schmidt

[PATCH 6/6] rs6000 Add vector blend, permute builtin support

----------------------------------
V4 Fixes:

   Rebased on mainline.  Changed FUTURE to P10.
---------

v3 fixes:
   Replace spaces with tabs in ChangeLog description.
   Fix implementation comments for define_expand "xxpermx" in file
     gcc/config/rs6000/altivec.md.
   Fix minor typos in the comments for the changes in gcc/config/rs6000/rs6000-call.c.

--------------------
v2 changes:

   Updated ChangeLog per comments.

   Updated implementation of the define_expand "xxpermx".

   Fixed the comments and check for 3-bit immediate field for the
	CODE_FOR_xxpermx check.

   gcc/doc/extend.texi:
	comment "Maybe it should say it is related to vsel/xxsel, but per
	bigger element?", added comment.  I took the description directly
	from the spec.  I really don't want to mess with the approved
	description.

       fixed typo for Vector Permute Extendedextracth

----------

GCC maintainers:

The following patch adds support for the vec_blendv and vec_permx
builtins.
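
As a rough illustration only (not code from the patch), the element
selection that vec_blendv performs can be modeled in scalar C for byte
elements.  The helper name blendv_bytes_model is hypothetical, and the
sketch assumes the rule exercised by the test cases in this patch: the
second argument's element is chosen where the sign bit of the
corresponding mask element is set, and the first argument's element
otherwise.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of a byte-wide blend; NOT the GCC builtin.
   result[i] takes b[i] when the most significant (sign) bit of mask[i]
   is set, and a[i] otherwise.  */
static void
blendv_bytes_model (const int8_t *a, const int8_t *b, const uint8_t *mask,
		    int8_t *result, size_t n)
{
  assert (a && b && mask && result);
  for (size_t i = 0; i < n; i++)
    result[i] = (mask[i] & 0x80) ? b[i] : a[i];
}
```

Only the top bit of each mask element matters in this model, which is
why the test vectors use 0 and 0x80 for the byte case.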

The patch has been compiled and tested on

  powerpc64le-unknown-linux-gnu (Power 9 LE)

with no regression errors.

The test cases were compiled on a Power 9 system and then tested on
Mambo.
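
The little-endian path of the define_expand "xxpermx" in this patch
XORs the control vector with 0xFF and replaces the literal with 7 minus
its value.  The sketch below shows why those two adjustments go
together.  It assumes each control byte holds the section identifier in
its upper 3 bits (bits [0:2] in the numbering used by the patch comment)
and the byte index in its lower 5 bits; the helper names are
hypothetical and are not part of GCC.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed control-byte layout: upper 3 bits select the 32-byte section,
   lower 5 bits select the byte within that section.  */
static unsigned
ctl_section (uint8_t ctl)
{
  return ctl >> 5;
}

static unsigned
ctl_index (uint8_t ctl)
{
  return ctl & 0x1f;
}

/* Little-endian adjustment of one control byte: XOR with 0xFF.  */
static uint8_t
le_adjust_control (uint8_t ctl)
{
  return (uint8_t) (ctl ^ 0xff);
}

/* Little-endian adjustment of the 3-bit section literal.  */
static int
le_adjust_section_literal (int uim)
{
  assert (uim >= 0 && uim <= 7);
  return 7 - uim;
}
```

Under this layout, XORing a control byte with 0xFF turns section s into
7 - s and byte index e into 31 - e, so a control byte that matched
literal s before the adjustment matches 7 - s afterwards.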

                         Carl Love

---------------------------------------------------------------
rs6000 RFC2609 vector blend, permute instructions

gcc/ChangeLog

2020-07-06  Carl Love  <cel@us.ibm.com>

	* config/rs6000/altivec.h (vec_blendv, vec_permx): Add define.
	* config/rs6000/altivec.md (UNSPEC_XXBLEND, UNSPEC_XXPERMX): New
	unspecs.
	(VM3): New define_mode_iterator.
	(VM3_char): New define_mode_attr.
	(xxblend_<mode> mode VM3): New define_insn.
	(xxpermx): New define_expand.
	(xxpermx_inst): New define_insn.
	* config/rs6000/rs6000-builtin.def (VXXBLEND_V16QI, VXXBLEND_V8HI,
	VXXBLEND_V4SI, VXXBLEND_V2DI, VXXBLEND_V4SF, VXXBLEND_V2DF): New
	BU_P10V_3 definitions.
	(XXBLEND): New BU_P10_OVERLOAD_3 definition.
	(XXPERMX): New BU_P10_OVERLOAD_4 definition.
	* config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin):
	(P10_BUILTIN_VXXPERMX): Add if case support.
	* config/rs6000/rs6000-call.c (P10_BUILTIN_VXXBLEND_V16QI,
	P10_BUILTIN_VXXBLEND_V8HI, P10_BUILTIN_VXXBLEND_V4SI,
	P10_BUILTIN_VXXBLEND_V2DI, P10_BUILTIN_VXXBLEND_V4SF,
	P10_BUILTIN_VXXBLEND_V2DF, P10_BUILTIN_VXXPERMX): Define
	overloaded arguments.
	(rs6000_expand_quaternop_builtin): Add if case for CODE_FOR_xxpermx.
	(builtin_quaternary_function_type): Add v16uqi_type and xxpermx_type
	variables, add case statement for P10_BUILTIN_VXXPERMX.
	(builtin_function_type)[P10_BUILTIN_VXXBLEND_V16QI,
	P10_BUILTIN_VXXBLEND_V8HI, P10_BUILTIN_VXXBLEND_V4SI,
	P10_BUILTIN_VXXBLEND_V2DI]: Add case statements.
	* doc/extend.texi: Add documentation for vec_blendv and vec_permx.

gcc/testsuite/ChangeLog

2020-07-06  Carl Love  <cel@us.ibm.com>
	* gcc.target/powerpc/vec-blend-runnable.c: New test.
	* gcc.target/powerpc/vec-permute-ext-runnable.c: New test.
---
 gcc/config/rs6000/altivec.h                   |   2 +
 gcc/config/rs6000/altivec.md                  |  71 +++++
 gcc/config/rs6000/rs6000-builtin.def          |  13 +
 gcc/config/rs6000/rs6000-c.c                  |  27 +-
 gcc/config/rs6000/rs6000-call.c               |  95 ++++++
 gcc/doc/extend.texi                           |  63 ++++
 .../gcc.target/powerpc/vec-blend-runnable.c   | 276 ++++++++++++++++
 .../powerpc/vec-permute-ext-runnable.c        | 294 ++++++++++++++++++
 8 files changed, 835 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-blend-runnable.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-permute-ext-runnable.c

diff --git a/gcc/config/rs6000/altivec.h b/gcc/config/rs6000/altivec.h
index 126409c168b..e8fdeb31b0b 100644
--- a/gcc/config/rs6000/altivec.h
+++ b/gcc/config/rs6000/altivec.h
@@ -708,6 +708,8 @@ __altivec_scalar_pred(vec_any_nle,
 #define vec_splati(a)  __builtin_vec_xxspltiw (a)
 #define vec_splatid(a) __builtin_vec_xxspltid (a)
 #define vec_splati_ins(a, b, c)        __builtin_vec_xxsplti32dx (a, b, c)
+#define vec_blendv(a, b, c)    __builtin_vec_xxblend (a, b, c)
+#define vec_permx(a, b, c, d)  __builtin_vec_xxpermx (a, b, c, d)
 
 #define vec_gnb(a, b)	__builtin_vec_gnb (a, b)
 #define vec_clrl(a, b)	__builtin_vec_clrl (a, b)
diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index f6858b5bf2a..226cf121f12 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -177,6 +177,8 @@
    UNSPEC_XXSPLTIW
    UNSPEC_XXSPLTID
    UNSPEC_XXSPLTI32DX
+   UNSPEC_XXBLEND
+   UNSPEC_XXPERMX
 ])
 
 (define_c_enum "unspecv"
@@ -219,6 +221,21 @@
 			   (KF "FLOAT128_VECTOR_P (KFmode)")
 			   (TF "FLOAT128_VECTOR_P (TFmode)")])
 
+;; Like VM2, just do char, short, int, long, float and double
+(define_mode_iterator VM3 [V4SI
+			   V8HI
+			   V16QI
+			   V4SF
+			   V2DF
+			   V2DI])
+
+(define_mode_attr VM3_char [(V2DI "d")
+			   (V4SI "w")
+			   (V8HI "h")
+			   (V16QI "b")
+			   (V2DF  "d")
+			   (V4SF  "w")])
+
 ;; Map the Vector convert single precision to double precision for integer
 ;; versus floating point
 (define_mode_attr VS_sxwsp [(V4SI "sxw") (V4SF "sp")])
@@ -916,6 +933,60 @@
   "xxsplti32dx %x0,%2,%3"
    [(set_attr "type" "vecsimple")])
 
+(define_insn "xxblend_<mode>"
+  [(set (match_operand:VM3 0 "register_operand" "=wa")
+	(unspec:VM3 [(match_operand:VM3 1 "register_operand" "wa")
+		     (match_operand:VM3 2 "register_operand" "wa")
+		     (match_operand:VM3 3 "register_operand" "wa")]
+		    UNSPEC_XXBLEND))]
+  "TARGET_POWER10"
+  "xxblendv<VM3_char> %x0,%x1,%x2,%x3"
+  [(set_attr "type" "vecsimple")])
+
+(define_expand "xxpermx"
+  [(set (match_operand:V2DI 0 "register_operand" "+wa")
+	(unspec:V2DI [(match_operand:V2DI 1 "register_operand" "wa")
+		      (match_operand:V2DI 2 "register_operand" "wa")
+		      (match_operand:V16QI 3 "register_operand" "wa")
+		      (match_operand:QI 4 "u8bit_cint_operand" "n")]
+		     UNSPEC_XXPERMX))]
+  "TARGET_POWER10"
+{
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (gen_xxpermx_inst (operands[0], operands[1],
+				 operands[2], operands[3],
+				 operands[4]));
+  else
+    {
+      /* Reverse value of byte element indexes by XORing with 0xFF.
+	 Reverse the 32-byte section identifier match by subtracting bits [0:2]
+	 of element from 7.  */
+      int value = INTVAL (operands[4]);
+      rtx vreg = gen_reg_rtx (V16QImode);
+
+      emit_insn (gen_xxspltib_v16qi (vreg, GEN_INT (-1)));
+      emit_insn (gen_xorv16qi3 (operands[3], operands[3], vreg));
+      value = 7 - value;
+      emit_insn (gen_xxpermx_inst (operands[0], operands[2],
+				   operands[1], operands[3],
+				   GEN_INT (value)));
+    }
+
+  DONE;
+}
+  [(set_attr "type" "vecsimple")])
+
+(define_insn "xxpermx_inst"
+  [(set (match_operand:V2DI 0 "register_operand" "+v")
+	(unspec:V2DI [(match_operand:V2DI 1 "register_operand" "v")
+		      (match_operand:V2DI 2 "register_operand" "v")
+		      (match_operand:V16QI 3 "register_operand" "v")
+		      (match_operand:QI 4 "u3bit_cint_operand" "n")]
+		     UNSPEC_XXPERMX))]
+  "TARGET_POWER10"
+  "xxpermx %x0,%x1,%x2,%x3,%4"
+  [(set_attr "type" "vecsimple")])
+
 (define_expand "vstrir_<mode>"
   [(set (match_operand:VIshort 0 "altivec_register_operand")
 	(unspec:VIshort [(match_operand:VIshort 1 "altivec_register_operand")]
diff --git a/gcc/config/rs6000/rs6000-builtin.def b/gcc/config/rs6000/rs6000-builtin.def
index ddfe287efc8..3d45354c573 100644
--- a/gcc/config/rs6000/rs6000-builtin.def
+++ b/gcc/config/rs6000/rs6000-builtin.def
@@ -2756,6 +2756,15 @@ BU_P10V_1 (VXXSPLTID, "vxxspltidp", CONST, xxspltidp_v2df)
 BU_P10V_3 (VXXSPLTI32DX_V4SI, "vxxsplti32dx_v4si", CONST, xxsplti32dx_v4si)
 BU_P10V_3 (VXXSPLTI32DX_V4SF, "vxxsplti32dx_v4sf", CONST, xxsplti32dx_v4sf)
 
+BU_P10V_3 (VXXBLEND_V16QI, "xxblend_v16qi", CONST, xxblend_v16qi)
+BU_P10V_3 (VXXBLEND_V8HI, "xxblend_v8hi", CONST, xxblend_v8hi)
+BU_P10V_3 (VXXBLEND_V4SI, "xxblend_v4si", CONST, xxblend_v4si)
+BU_P10V_3 (VXXBLEND_V2DI, "xxblend_v2di", CONST, xxblend_v2di)
+BU_P10V_3 (VXXBLEND_V4SF, "xxblend_v4sf", CONST, xxblend_v4sf)
+BU_P10V_3 (VXXBLEND_V2DF, "xxblend_v2df", CONST, xxblend_v2df)
+
+BU_P10V_4 (VXXPERMX, "xxpermx", CONST, xxpermx)
+
 BU_P10V_1 (VSTRIBR, "vstribr", CONST, vstrir_v16qi)
 BU_P10V_1 (VSTRIHR, "vstrihr", CONST, vstrir_v8hi)
 BU_P10V_1 (VSTRIBL, "vstribl", CONST, vstril_v16qi)
@@ -2791,6 +2800,10 @@ BU_P10_OVERLOAD_1 (VSTRIL_P, "stril_p")
 BU_P10_OVERLOAD_1 (XXSPLTIW, "xxspltiw")
 BU_P10_OVERLOAD_1 (XXSPLTID, "xxspltid")
 BU_P10_OVERLOAD_3 (XXSPLTI32DX, "xxsplti32dx")
+
+BU_P10_OVERLOAD_3 (XXBLEND, "xxblend")
+BU_P10_OVERLOAD_4 (XXPERMX, "xxpermx")
+
 \f
 /* 1 argument crypto functions.  */
 BU_CRYPTO_1 (VSBOX,		"vsbox",	  CONST, crypto_vsbox_v2di)
diff --git a/gcc/config/rs6000/rs6000-c.c b/gcc/config/rs6000/rs6000-c.c
index cb7d34dcdb5..db6aecfad2d 100644
--- a/gcc/config/rs6000/rs6000-c.c
+++ b/gcc/config/rs6000/rs6000-c.c
@@ -1800,22 +1800,37 @@ altivec_resolve_overloaded_builtin (location_t loc, tree fndecl,
 	      unsupported_builtin = true;
 	  }
       }
-    else if (fcode == P10_BUILTIN_VEC_XXEVAL)
+    else if ((fcode == P10_BUILTIN_VEC_XXEVAL)
+	    || (fcode == P10_BUILTIN_VXXPERMX))
       {
-	/* Need to special case __builtin_vec_xxeval because this takes
-	   4 arguments, and the existing infrastructure handles no
-	   more than three.  */
+	signed char op3_type;
+
+	/* Need to special case these builtins because they take
+	   4 arguments, and the existing infrastructure handles three.  */
 	if (nargs != 4)
 	  {
-	    error ("builtin %qs requires 4 arguments",
-		   "__builtin_vec_xxeval");
+	    if (fcode == P10_BUILTIN_VEC_XXEVAL)
+	      error ("builtin %qs requires 4 arguments",
+		     "__builtin_vec_xxeval");
+	    else
+	      error ("builtin %qs requires 4 arguments",
+		     "__builtin_vec_xxpermx");
+
 	    return error_mark_node;
 	  }
+
+	/* Set value for vec_xxpermx here as it is a constant.  */
+	op3_type = RS6000_BTI_V16QI;
+
 	for ( ; desc->code == fcode; desc++)
 	  {
+	    if (fcode == P10_BUILTIN_VEC_XXEVAL)
+	      op3_type = desc->op3;
+
 	    if (rs6000_builtin_type_compatible (types[0], desc->op1)
 		&& rs6000_builtin_type_compatible (types[1], desc->op2)
 		&& rs6000_builtin_type_compatible (types[2], desc->op3)
+		&& rs6000_builtin_type_compatible (types[2], op3_type)
 		&& rs6000_builtin_type_compatible (types[3],
 						   RS6000_BTI_UINTSI))
 	      {
diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index 06320279138..dc69d4873a0 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -5563,6 +5563,39 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI,
     RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI },
 
+  /* The overloaded XXPERMX definitions are handled specially because the
+     fourth unsigned char operand is not encoded in this table.  */
+  {  P10_BUILTIN_VEC_XXPERMX, P10_BUILTIN_VXXPERMX,
+     RS6000_BTI_V16QI, RS6000_BTI_V16QI, RS6000_BTI_V16QI,
+     RS6000_BTI_unsigned_V16QI },
+  {  P10_BUILTIN_VEC_XXPERMX, P10_BUILTIN_VXXPERMX,
+     RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI,
+     RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI },
+  {  P10_BUILTIN_VEC_XXPERMX, P10_BUILTIN_VXXPERMX,
+     RS6000_BTI_V8HI, RS6000_BTI_V8HI, RS6000_BTI_V8HI,
+     RS6000_BTI_unsigned_V16QI },
+  {  P10_BUILTIN_VEC_XXPERMX, P10_BUILTIN_VXXPERMX,
+     RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V8HI,
+     RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V16QI },
+  {  P10_BUILTIN_VEC_XXPERMX, P10_BUILTIN_VXXPERMX,
+     RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_V4SI,
+     RS6000_BTI_unsigned_V16QI },
+  {  P10_BUILTIN_VEC_XXPERMX, P10_BUILTIN_VXXPERMX,
+     RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
+     RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V16QI },
+  {  P10_BUILTIN_VEC_XXPERMX, P10_BUILTIN_VXXPERMX,
+     RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI,
+     RS6000_BTI_unsigned_V16QI },
+  {  P10_BUILTIN_VEC_XXPERMX, P10_BUILTIN_VXXPERMX,
+     RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
+     RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V16QI },
+  {  P10_BUILTIN_VEC_XXPERMX, P10_BUILTIN_VXXPERMX,
+     RS6000_BTI_V4SF, RS6000_BTI_V4SF, RS6000_BTI_V4SF,
+     RS6000_BTI_unsigned_V16QI },
+  {  P10_BUILTIN_VEC_XXPERMX, P10_BUILTIN_VXXPERMX,
+     RS6000_BTI_V2DF, RS6000_BTI_V2DF, RS6000_BTI_V2DF,
+     RS6000_BTI_unsigned_V16QI },
+
   { P10_BUILTIN_VEC_EXTRACTL, P10_BUILTIN_VEXTRACTBL,
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V16QI,
     RS6000_BTI_unsigned_V16QI, RS6000_BTI_UINTQI },
@@ -5704,6 +5737,37 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
   { P10_BUILTIN_VEC_XXSPLTI32DX, P10_BUILTIN_VXXSPLTI32DX_V4SF,
     RS6000_BTI_V4SF, RS6000_BTI_V4SF, RS6000_BTI_UINTQI, RS6000_BTI_float },
 
+  {  P10_BUILTIN_VEC_XXBLEND, P10_BUILTIN_VXXBLEND_V16QI,
+     RS6000_BTI_V16QI, RS6000_BTI_V16QI, RS6000_BTI_V16QI,
+     RS6000_BTI_unsigned_V16QI },
+  {  P10_BUILTIN_VEC_XXBLEND, P10_BUILTIN_VXXBLEND_V16QI,
+     RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI,
+     RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI },
+  {  P10_BUILTIN_VEC_XXBLEND, P10_BUILTIN_VXXBLEND_V8HI,
+     RS6000_BTI_V8HI, RS6000_BTI_V8HI, RS6000_BTI_V8HI,
+     RS6000_BTI_unsigned_V8HI },
+  {  P10_BUILTIN_VEC_XXBLEND, P10_BUILTIN_VXXBLEND_V8HI,
+     RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V8HI,
+     RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V8HI },
+  {  P10_BUILTIN_VEC_XXBLEND, P10_BUILTIN_VXXBLEND_V4SI,
+     RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_V4SI,
+     RS6000_BTI_unsigned_V4SI },
+  {  P10_BUILTIN_VEC_XXBLEND, P10_BUILTIN_VXXBLEND_V4SI,
+     RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
+     RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI },
+  {  P10_BUILTIN_VEC_XXBLEND, P10_BUILTIN_VXXBLEND_V2DI,
+     RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI,
+     RS6000_BTI_unsigned_V2DI },
+  {  P10_BUILTIN_VEC_XXBLEND, P10_BUILTIN_VXXBLEND_V2DI,
+     RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
+     RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI },
+  {  P10_BUILTIN_VEC_XXBLEND, P10_BUILTIN_VXXBLEND_V4SF,
+     RS6000_BTI_V4SF, RS6000_BTI_V4SF, RS6000_BTI_V4SF,
+     RS6000_BTI_unsigned_V4SI },
+  {  P10_BUILTIN_VEC_XXBLEND, P10_BUILTIN_VXXBLEND_V2DF,
+     RS6000_BTI_V2DF, RS6000_BTI_V2DF, RS6000_BTI_V2DF,
+     RS6000_BTI_unsigned_V2DI },
+
   { P10_BUILTIN_VEC_SRDB, P10_BUILTIN_VSRDB_V16QI,
     RS6000_BTI_V16QI, RS6000_BTI_V16QI,
     RS6000_BTI_V16QI, RS6000_BTI_UINTQI },
@@ -10101,6 +10165,19 @@ rs6000_expand_quaternop_builtin (enum insn_code icode, tree exp, rtx target)
 	  return CONST0_RTX (tmode);
 	}
     }
+
+  else if (icode == CODE_FOR_xxpermx)
+    {
+      /* Only allow 3-bit unsigned literals.  */
+      STRIP_NOPS (arg3);
+      if (TREE_CODE (arg3) != INTEGER_CST
+	  || TREE_INT_CST_LOW (arg3) & ~0x7)
+	{
+	  error ("argument 4 must be a 3-bit unsigned literal");
+	  return CONST0_RTX (tmode);
+	}
+    }
+
   else if (icode == CODE_FOR_vreplace_elt_v4si
 	   || icode == CODE_FOR_vreplace_elt_v4sf)
    {
@@ -13788,12 +13865,17 @@ builtin_quaternary_function_type (machine_mode mode_ret,
   tree function_type = NULL;
 
   static tree v2udi_type = builtin_mode_to_type[V2DImode][1];
+  static tree v16uqi_type = builtin_mode_to_type[V16QImode][1];
   static tree uchar_type = builtin_mode_to_type[QImode][1];
 
   static tree xxeval_type =
     build_function_type_list (v2udi_type, v2udi_type, v2udi_type,
 			      v2udi_type, uchar_type, NULL_TREE);
 
+  static tree xxpermx_type =
+    build_function_type_list (v2udi_type, v2udi_type, v2udi_type,
+			      v16uqi_type, uchar_type, NULL_TREE);
+
   switch (builtin) {
 
   case P10_BUILTIN_XXEVAL:
@@ -13805,6 +13887,15 @@ builtin_quaternary_function_type (machine_mode mode_ret,
     function_type = xxeval_type;
     break;
 
+  case P10_BUILTIN_VXXPERMX:
+    gcc_assert ((mode_ret == V2DImode)
+		&& (mode_arg0 == V2DImode)
+		&& (mode_arg1 == V2DImode)
+		&& (mode_arg2 == V16QImode)
+		&& (mode_arg3 == QImode));
+    function_type = xxpermx_type;
+    break;
+
   default:
     /* A case for each quaternary built-in must be provided above.  */
     gcc_unreachable ();
@@ -13986,6 +14077,10 @@ builtin_function_type (machine_mode mode_ret, machine_mode mode_arg0,
     case P10_BUILTIN_VREPLACE_ELT_UV2DI:
     case P10_BUILTIN_VREPLACE_UN_UV4SI:
     case P10_BUILTIN_VREPLACE_UN_UV2DI:
+    case P10_BUILTIN_VXXBLEND_V16QI:
+    case P10_BUILTIN_VXXBLEND_V8HI:
+    case P10_BUILTIN_VXXBLEND_V4SI:
+    case P10_BUILTIN_VXXBLEND_V2DI:
       h.uns_p[0] = 1;
       h.uns_p[1] = 1;
       h.uns_p[2] = 1;
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index e9aa06553aa..0e4d91a43f6 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -21200,6 +21200,69 @@ result.  The other words of argument 1 are unchanged.
 
 @findex vec_splati_ins
 
+Vector Blend Variable
+
+@smallexample
+@exdent vector signed char vec_blendv (vector signed char, vector signed char,
+vector unsigned char);
+@exdent vector unsigned char vec_blendv (vector unsigned char,
+vector unsigned char, vector unsigned char);
+@exdent vector signed short vec_blendv (vector signed short,
+vector signed short, vector unsigned short);
+@exdent vector unsigned short vec_blendv (vector unsigned short,
+vector unsigned short, vector unsigned short);
+@exdent vector signed int vec_blendv (vector signed int, vector signed int,
+vector unsigned int);
+@exdent vector unsigned int vec_blendv (vector unsigned int,
+vector unsigned int, vector unsigned int);
+@exdent vector signed long long vec_blendv (vector signed long long,
+vector signed long long, vector unsigned long long);
+@exdent vector unsigned long long vec_blendv (vector unsigned long long,
+vector unsigned long long, vector unsigned long long);
+@exdent vector float vec_blendv (vector float, vector float,
+vector unsigned int);
+@exdent vector double vec_blendv (vector double, vector double,
+vector unsigned long long);
+@end smallexample
+
+Blend the first and second argument vectors according to the sign bits of the
+corresponding elements of the third argument vector.  This is similar to the
+vsel and xxsel instructions but for bigger elements.
+
+@findex vec_blendv
+
+Vector Permute Extended
+
+@smallexample
+@exdent vector signed char vec_permx (vector signed char, vector signed char,
+vector unsigned char, const int);
+@exdent vector unsigned char vec_permx (vector unsigned char,
+vector unsigned char, vector unsigned char, const int);
+@exdent vector signed short vec_permx (vector signed short,
+vector signed short, vector unsigned char, const int);
+@exdent vector unsigned short vec_permx (vector unsigned short,
+vector unsigned short, vector unsigned char, const int);
+@exdent vector signed int vec_permx (vector signed int, vector signed int,
+vector unsigned char, const int);
+@exdent vector unsigned int vec_permx (vector unsigned int,
+vector unsigned int, vector unsigned char, const int);
+@exdent vector signed long long vec_permx (vector signed long long,
+vector signed long long, vector unsigned char, const int);
+@exdent vector unsigned long long vec_permx (vector unsigned long long,
+vector unsigned long long, vector unsigned char, const int);
+@exdent vector float vec_permx (vector float, vector float,
+vector unsigned char, const int);
+@exdent vector double vec_permx (vector double, vector double,
+vector unsigned char, const int);
+@end smallexample
+
+Perform a partial permute of the first two arguments, which form a 32-byte
+section of an emulated vector up to 256 bytes wide, using the partial permute
+control vector in the third argument.  The fourth argument (constrained to
+values of 0-7) identifies which 32-byte section of the emulated vector is
+contained in the first two arguments.
+@findex vec_permx
+
 @smallexample
 @exdent vector unsigned long long int
 @exdent vec_pext (vector unsigned long long int, vector unsigned long long int)
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-blend-runnable.c b/gcc/testsuite/gcc.target/powerpc/vec-blend-runnable.c
new file mode 100644
index 00000000000..1c701aefc6b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-blend-runnable.c
@@ -0,0 +1,276 @@
+/* { dg-do run } */
+/* { dg-require-effective-target power10_hw } */
+/* { dg-options "-mdejagnu-cpu=power10" } */
+#include <altivec.h>
+
+#define DEBUG 0
+
+#if DEBUG
+#include <stdio.h>
+#endif
+
+extern void abort (void);
+
+int
+main (int argc, char *argv [])
+{
+  int i;
+  vector signed char vsrc_a_char, vsrc_b_char;
+  vector signed char vresult_char;
+  vector signed char expected_vresult_char;
+
+  vector unsigned char vsrc_a_uchar, vsrc_b_uchar, vsrc_c_uchar;
+  vector unsigned char vresult_uchar;
+  vector unsigned char expected_vresult_uchar;
+
+  vector signed short vsrc_a_short, vsrc_b_short, vsrc_c_short;
+  vector signed short vresult_short;
+  vector signed short expected_vresult_short;
+
+  vector unsigned short vsrc_a_ushort, vsrc_b_ushort, vsrc_c_ushort;
+  vector unsigned short vresult_ushort;
+  vector unsigned short expected_vresult_ushort;
+
+  vector int vsrc_a_int, vsrc_b_int, vsrc_c_int;
+  vector int vresult_int;
+  vector int expected_vresult_int;
+
+  vector unsigned int vsrc_a_uint, vsrc_b_uint, vsrc_c_uint;
+  vector unsigned int vresult_uint;
+  vector unsigned int expected_vresult_uint;
+
+  vector long long int vsrc_a_ll, vsrc_b_ll, vsrc_c_ll;
+  vector long long int vresult_ll;
+  vector long long int expected_vresult_ll;
+
+  vector unsigned long long int vsrc_a_ull,  vsrc_b_ull,  vsrc_c_ull;
+  vector unsigned long long int vresult_ull;
+  vector unsigned long long int expected_vresult_ull;
+
+  vector float vresult_f;
+  vector float expected_vresult_f;
+  vector float vsrc_a_f, vsrc_b_f;
+
+  vector double vsrc_a_d, vsrc_b_d;
+  vector double vresult_d;
+  vector double expected_vresult_d;
+ 
+  /* Vector blend */
+  vsrc_c_uchar = (vector unsigned char) { 0, 0x80, 0, 0x80, 0, 0x80, 0, 0x80,
+					  0, 0x80, 0, 0x80, 0, 0x80, 0, 0x80 };
+
+  vsrc_a_char = (vector signed char) { -1, 3, 5, 7, 9, 11, 13, 15,
+                                       17, 19, 21, 23, 25, 27, 29, 31 };
+  vsrc_b_char = (vector signed char) { 2, -4, 6, 8, 10, 12, 14, 16,
+				       18, 20, 22, 24, 26, 28, 30, 32 };
+  vsrc_c_uchar = (vector unsigned char) { 0, 0x80, 0, 0x80, 0, 0x80, 0, 0x80,
+					  0, 0x80, 0, 0x80, 0, 0x80, 0, 0x80 };
+  vresult_char = (vector signed char) { 0, 0, 0, 0, 0, 0, 0, 0,
+					0, 0, 0, 0, 0, 0, 0, 0 };
+  expected_vresult_char = (vector signed char) { -1, -4, 5, 8,
+						 9, 12, 13, 16,
+						 17, 20, 21, 24,
+						 25, 28, 29, 32 };
+						 
+  vresult_char = vec_blendv (vsrc_a_char, vsrc_b_char, vsrc_c_uchar);
+
+  if (!vec_all_eq (vresult_char,  expected_vresult_char)) {
+#if DEBUG
+    printf("ERROR, vec_blendv (vsrc_a_char, vsrc_b_char, vsrc_c_uchar)\n");
+    for(i = 0; i < 16; i++)
+      printf(" vresult_char[%d] = %d, expected_vresult_char[%d] = %d\n",
+	     i, vresult_char[i], i, expected_vresult_char[i]);
+#else
+    abort();
+#endif
+  }
+
+  vsrc_a_uchar = (vector unsigned char) { 1, 3, 5, 7, 9, 11, 13, 15,
+					  17, 19, 21, 23, 25, 27, 29, 31 };
+  vsrc_b_uchar = (vector unsigned char) { 2, 4, 6, 8, 10, 12, 14, 16,
+					  18, 20, 22, 24, 26, 28, 30, 32 };
+  vsrc_c_uchar = (vector unsigned char) { 0, 0x80, 0, 0x80, 0, 0x80, 0, 0x80,
+					  0, 0x80, 0, 0x80, 0, 0x80, 0, 0x80 };
+  vresult_uchar = (vector unsigned char) { 0, 0, 0, 0, 0, 0, 0, 0,
+					   0, 0, 0, 0, 0, 0, 0, 0 };
+  expected_vresult_uchar = (vector unsigned char) { 1, 4, 5, 8,
+						    9, 12, 13, 16,
+						    17, 20, 21, 24,
+						    25, 28, 29, 32 };
+						 
+  vresult_uchar = vec_blendv (vsrc_a_uchar, vsrc_b_uchar, vsrc_c_uchar);
+
+  if (!vec_all_eq (vresult_uchar,  expected_vresult_uchar)) {
+#if DEBUG
+    printf("ERROR, vec_blendv (vsrc_a_uchar, vsrc_b_uchar, vsrc_c_uchar)\n");
+    for(i = 0; i < 16; i++)
+      printf(" vresult_uchar[%d] = %d, expected_vresult_uchar[%d] = %d\n",
+	     i, vresult_uchar[i], i, expected_vresult_uchar[i]);
+#else
+    abort();
+#endif
+  }
+
+  vsrc_a_short = (vector signed short) { -1, 3, 5, 7, 9, 11, 13, 15 };
+  vsrc_b_short = (vector signed short) { 2, -4, 6, 8, 10, 12, 14, 16 };
+  vsrc_c_ushort = (vector unsigned short) { 0, 0x8000, 0, 0x8000,
+					    0, 0x8000, 0, 0x8000 };
+  vresult_short = (vector signed short) { 0, 0, 0, 0, 0, 0, 0, 0 };
+  expected_vresult_short = (vector signed short) { -1, -4, 5, 8,
+						   9, 12, 13, 16 };
+
+  vresult_short = vec_blendv (vsrc_a_short, vsrc_b_short, vsrc_c_ushort);
+
+  if (!vec_all_eq (vresult_short,  expected_vresult_short)) {
+#if DEBUG
+    printf("ERROR, vec_blendv (vsrc_a_short, vsrc_b_short, vsrc_c_ushort)\n");
+    for(i = 0; i < 8; i++)
+      printf(" vresult_short[%d] = %d, expected_vresult_short[%d] = %d\n",
+	     i, vresult_short[i], i, expected_vresult_short[i]);
+#else
+    abort();
+#endif
+  }
+
+  vsrc_a_ushort = (vector unsigned short) { 1, 3, 5, 7, 9, 11, 13, 15 };
+  vsrc_b_ushort = (vector unsigned short) { 2, 4, 6, 8, 10, 12, 14, 16 };
+  vsrc_c_ushort = (vector unsigned short) { 0, 0x8000, 0, 0x8000,
+					    0, 0x8000, 0, 0x8000 };
+  vresult_ushort = (vector unsigned short) { 0, 0, 0, 0, 0, 0, 0, 0 };
+  expected_vresult_ushort = (vector unsigned short) { 1, 4, 5, 8,
+						      9, 12, 13, 16 };
+						 
+  vresult_ushort = vec_blendv (vsrc_a_ushort, vsrc_b_ushort, vsrc_c_ushort);
+
+  if (!vec_all_eq (vresult_ushort,  expected_vresult_ushort)) {
+#if DEBUG
+    printf("ERROR, vec_blendv (vsrc_a_ushort, vsrc_b_ushort, vsrc_c_ushort)\n");
+    for(i = 0; i < 8; i++)
+      printf(" vresult_ushort[%d] = %d, expected_vresult_ushort[%d] = %d\n",
+	     i, vresult_ushort[i], i, expected_vresult_ushort[i]);
+#else
+    abort();
+#endif
+  }
+
+  vsrc_a_int = (vector signed int) { -1, -3, -5, -7 };
+  vsrc_b_int = (vector signed int) { 2, 4, 6, 8 };
+  vsrc_c_uint = (vector unsigned int) { 0, 0x80000000, 0, 0x80000000};
+  vresult_int = (vector signed int) { 0, 0, 0, 0 };
+  expected_vresult_int = (vector signed int) { -1, 4, -5, 8 };
+						 
+  vresult_int = vec_blendv (vsrc_a_int, vsrc_b_int, vsrc_c_uint);
+
+  if (!vec_all_eq (vresult_int,  expected_vresult_int)) {
+#if DEBUG
+    printf("ERROR, vec_blendv (vsrc_a_int, vsrc_b_int, vsrc_c_uint)\n");
+    for(i = 0; i < 4; i++)
+      printf(" vresult_int[%d] = %d, expected_vresult_int[%d] = %d\n",
+	     i, vresult_int[i], i, expected_vresult_int[i]);
+#else
+    abort();
+#endif
+  }
+
+  vsrc_a_uint = (vector unsigned int) { 1, 3, 5, 7 };
+  vsrc_b_uint = (vector unsigned int) { 2, 4, 6, 8 };
+  vsrc_c_uint = (vector unsigned int) { 0, 0x80000000, 0, 0x80000000 };
+  vresult_uint = (vector unsigned int) { 0, 0, 0, 0 };
+  expected_vresult_uint = (vector unsigned int) { 1, 4, 5, 8 };
+						 
+  vresult_uint = vec_blendv (vsrc_a_uint, vsrc_b_uint, vsrc_c_uint);
+
+  if (!vec_all_eq (vresult_uint,  expected_vresult_uint)) {
+#if DEBUG
+    printf("ERROR, vec_blendv (vsrc_a_uint, vsrc_b_uint, vsrc_c_uint)\n");
+    for(i = 0; i < 4; i++)
+      printf(" vresult_uint[%d] = %d, expected_vresult_uint[%d] = %d\n",
+	     i, vresult_uint[i], i, expected_vresult_uint[i]);
+#else
+    abort();
+#endif
+  }
+
+  vsrc_a_ll = (vector signed long long int) { -1, -3 };
+  vsrc_b_ll = (vector signed long long int) { 2, 4,  };
+  vsrc_c_ull = (vector unsigned long long int) { 0, 0x8000000000000000ULL };
+  vresult_ll = (vector signed long long int) { 0, 0 };
+  expected_vresult_ll = (vector signed long long int) { -1, 4 };
+						 
+  vresult_ll = vec_blendv (vsrc_a_ll, vsrc_b_ll, vsrc_c_ull);
+
+  if (!vec_all_eq (vresult_ll,  expected_vresult_ll)) {
+#if DEBUG
+    printf("ERROR, vec_blendv (vsrc_a_ll, vsrc_b_ll, vsrc_c_ull)\n");
+    for(i = 0; i < 2; i++)
+      printf(" vresult_ll[%d] = %lld, expected_vresult_ll[%d] = %lld\n",
+	     i, vresult_ll[i], i, expected_vresult_ll[i]);
+#else
+    abort();
+#endif
+  }
+
+  vsrc_a_ull = (vector unsigned long long) { 1, 3 };
+  vsrc_b_ull = (vector unsigned long long) { 2, 4 };
+  vsrc_c_ull = (vector unsigned long long int) { 0, 0x8000000000000000ULL };
+  vresult_ull = (vector unsigned long long) { 0, 0 };
+  expected_vresult_ull = (vector unsigned long long) { 1, 4 };
+						 
+  vresult_ull = vec_blendv (vsrc_a_ull, vsrc_b_ull, vsrc_c_ull);
+
+  if (!vec_all_eq (vresult_ull,  expected_vresult_ull)) {
+#if DEBUG
+    printf("ERROR, vec_blendv (vsrc_a_ull, vsrc_b_ull, vsrc_c_ull)\n");
+    for(i = 0; i < 2; i++)
+      printf(" vresult_ull[%d] = %llu, expected_vresult_ull[%d] = %llu\n",
+	     i, vresult_ull[i], i, expected_vresult_ull[i]);
+#else
+    abort();
+#endif
+  }
+
+  vsrc_a_f = (vector float) { -1.0, -3.0, -5.0, -7.0 };
+  vsrc_b_f = (vector float) { 2.0, 4.0, 6.0, 8.0 };
+  vsrc_c_uint = (vector unsigned int) { 0, 0x80000000, 0, 0x80000000};
+  vresult_f = (vector float) { 0, 0, 0, 0 };
+  expected_vresult_f = (vector float) { -1, 4, -5, 8 };
+						 
+  vresult_f = vec_blendv (vsrc_a_f, vsrc_b_f, vsrc_c_uint);
+
+  if (!vec_all_eq (vresult_f,  expected_vresult_f)) {
+#if DEBUG
+    printf("ERROR, vec_blendv (vsrc_a_f, vsrc_b_f, vsrc_c_uint)\n");
+    for(i = 0; i < 4; i++)
+      printf(" vresult_f[%d] = %f, expected_vresult_f[%d] = %f\n",
+	     i, vresult_f[i], i, expected_vresult_f[i]);
+#else
+    abort();
+#endif
+  }
+
+  vsrc_a_d = (vector double) { -1.0, -3.0 };
+  vsrc_b_d = (vector double) { 2.0, 4.0 };
+  vsrc_c_ull = (vector unsigned long long int) { 0, 0x8000000000000000ULL };
+  vresult_d = (vector double) { 0, 0 };
+  expected_vresult_d = (vector double) { -1, 4 };
+						 
+  vresult_d = vec_blendv (vsrc_a_d, vsrc_b_d, vsrc_c_ull);
+
+  if (!vec_all_eq (vresult_d,  expected_vresult_d)) {
+#if DEBUG
+    printf("ERROR, vec_blendv (vsrc_a_d, vsrc_b_d, vsrc_c_ull)\n");
+    for(i = 0; i < 2; i++)
+      printf(" vresult_d[%d] = %f, expected_vresult_d[%d] = %f\n",
+	     i, vresult_d[i], i, expected_vresult_d[i]);
+#else
+    abort();
+#endif
+  }
+
+  return 0;
+}
+
+/* { dg-final { scan-assembler-times {\msplati\M} 6 } } */
+/* { dg-final { scan-assembler-times {\msrdbi\M} 6 } } */
+
+
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-permute-ext-runnable.c b/gcc/testsuite/gcc.target/powerpc/vec-permute-ext-runnable.c
new file mode 100644
index 00000000000..ed5aa97f74b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-permute-ext-runnable.c
@@ -0,0 +1,294 @@
+/* { dg-do run } */
+/* { dg-require-effective-target power10_hw } */
+/* { dg-options "-mdejagnu-cpu=power10" } */
+#include <altivec.h>
+
+#define DEBUG 0
+
+#if DEBUG
+#include <stdio.h>
+#endif
+
+extern void abort (void);
+
+int
+main (int argc, char *argv [])
+{
+  int i;
+  vector signed char vsrc_a_char, vsrc_b_char;
+  vector signed char vresult_char;
+  vector signed char expected_vresult_char;
+
+  vector unsigned char vsrc_a_uchar, vsrc_b_uchar, vsrc_c_uchar;
+  vector unsigned char vresult_uchar;
+  vector unsigned char expected_vresult_uchar;
+
+  vector signed short vsrc_a_short, vsrc_b_short, vsrc_c_short;
+  vector signed short vresult_short;
+  vector signed short expected_vresult_short;
+
+  vector unsigned short vsrc_a_ushort, vsrc_b_ushort, vsrc_c_ushort;
+  vector unsigned short vresult_ushort;
+  vector unsigned short expected_vresult_ushort;
+
+  vector int vsrc_a_int, vsrc_b_int, vsrc_c_int;
+  vector int vresult_int;
+  vector int expected_vresult_int;
+
+  vector unsigned int vsrc_a_uint, vsrc_b_uint, vsrc_c_uint;
+  vector unsigned int vresult_uint;
+  vector unsigned int expected_vresult_uint;
+
+  vector long long int vsrc_a_ll, vsrc_b_ll, vsrc_c_ll;
+  vector long long int vresult_ll;
+  vector long long int expected_vresult_ll;
+
+  vector unsigned long long int vsrc_a_ull,  vsrc_b_ull,  vsrc_c_ull;
+  vector unsigned long long int vresult_ull;
+  vector unsigned long long int expected_vresult_ull;
+
+  vector float vresult_f;
+  vector float expected_vresult_f;
+  vector float vsrc_a_f, vsrc_b_f;
+
+  vector double vsrc_a_d, vsrc_b_d;
+  vector double vresult_d;
+  vector double expected_vresult_d;
+ 
+  /* Vector permx */
+  vsrc_a_char = (vector signed char) { -1, 3, 5, 7, 9, 11, 13, 15,
+                                       17, 19, 21, 23, 25, 27, 29, 31 };
+  vsrc_b_char = (vector signed char) { 2, -4, 6, 8, 10, 12, 14, 16,
+				       18, 20, 22, 24, 26, 28, 30, 32 };
+  vsrc_c_uchar = (vector unsigned char) { 0x0, 0x7, 0, 0x5, 0, 0x3, 0, 0x1,
+					  0, 0x2, 0, 0x4, 0, 0x6, 0, 0x0 };
+  vresult_char = (vector signed char) { 0, 0, 0, 0, 0, 0, 0, 0,
+					0, 0, 0, 0, 0, 0, 0, 0 };
+  expected_vresult_char = (vector signed char) { -1, 15, -1, 11,
+						 -1, 7, -1, 3,
+						 -1, 5, -1, 9,
+						 -1, 13, -1, -1 };
+						 
+  vresult_char = vec_permx (vsrc_a_char, vsrc_b_char, vsrc_c_uchar, 0);
+
+  if (!vec_all_eq (vresult_char,  expected_vresult_char)) {
+#if DEBUG
+    printf("ERROR, vec_permx (vsrc_a_char, vsrc_b_char, vsrc_c_uchar)\n");
+    for(i = 0; i < 16; i++)
+      printf(" vresult_char[%d] = %d, expected_vresult_char[%d] = %d\n",
+	     i, vresult_char[i], i, expected_vresult_char[i]);
+#else
+    abort();
+#endif
+  }
+
+  vsrc_a_uchar = (vector unsigned char) { 1, 3, 5, 7, 9, 11, 13, 15,
+					  17, 19, 21, 23, 25, 27, 29, 31 };
+  vsrc_b_uchar = (vector unsigned char) { 2, 4, 6, 8, 10, 12, 14, 16,
+					  18, 20, 22, 24, 26, 28, 30, 32 };
+  vsrc_c_uchar = (vector unsigned char) { 0x0, 0x7, 0, 0x5, 0, 0x3, 0, 0x1,
+					  0, 0x2, 0, 0x4, 0, 0x6, 0, 0x0 };
+  vresult_uchar = (vector unsigned char) { 0, 0, 0, 0, 0, 0, 0, 0,
+					   0, 0, 0, 0, 0, 0, 0, 0 };
+  expected_vresult_uchar = (vector unsigned char) { 1, 15, 1, 11,
+						    1, 7, 1, 3,
+						    1, 5, 1, 9,
+						    1, 13, 1, 1 };
+						 
+  vresult_uchar = vec_permx (vsrc_a_uchar, vsrc_b_uchar, vsrc_c_uchar, 0);
+
+  if (!vec_all_eq (vresult_uchar,  expected_vresult_uchar)) {
+#if DEBUG
+    printf("ERROR, vec_permx (vsrc_a_uchar, vsrc_b_uchar, vsrc_c_uchar)\n");
+    for(i = 0; i < 16; i++)
+      printf(" vresult_uchar[%d] = %d, expected_vresult_uchar[%d] = %d\n",
+	     i, vresult_uchar[i], i, expected_vresult_uchar[i]);
+#else
+    abort();
+#endif
+  }
+
+  vsrc_a_short = (vector signed short int) { 1, -3, 5, 7, 9, 11, 13, 15 };
+  vsrc_b_short = (vector signed short int) { 2, 4, -6, 8, 10, 12, 14, 16 };
+  vsrc_c_uchar = (vector unsigned char) { 0x0, 0x1, 0x2, 0x3,
+					  0x4, 0x5, 0x2, 0x3,
+					  0x8, 0x9, 0x2, 0x3,
+					  0x1E, 0x1F, 0x2, 0x3 };
+  vresult_short = (vector signed short int) { 0, 0, 0, 0, 0, 0, 0, 0 };
+  expected_vresult_short = (vector signed short int) { 1, -3, 5, -3,
+						       9, -3, 16, -3 };
+						 
+  vresult_short = vec_permx (vsrc_a_short, vsrc_b_short, vsrc_c_uchar, 0);
+
+  if (!vec_all_eq (vresult_short,  expected_vresult_short)) {
+#if DEBUG
+    printf("ERROR, vec_permx (vsrc_a_short, vsrc_b_short, vsrc_c_uchar)\n");
+    for(i = 0; i < 8; i++)
+      printf(" vresult_short[%d] = %d, expected_vresult_short[%d] = %d\n",
+	     i, vresult_short[i], i, expected_vresult_short[i]);
+#else
+    abort();
+#endif
+  }
+
+  vsrc_a_ushort = (vector unsigned short int) { 1, 3, 5, 7, 9, 11, 13, 15 };
+  vsrc_b_ushort = (vector unsigned short int) { 2, 4, 6, 8, 10, 12, 14, 16 };
+  vsrc_c_uchar = (vector unsigned char) { 0x0, 0x1, 0x2, 0x3,
+					  0x4, 0x5, 0x2, 0x3,
+					  0x8, 0x9, 0x2, 0x3,
+					  0x1E, 0x1F, 0x2, 0x3 };
+  vresult_ushort = (vector unsigned short int) { 0, 0, 0, 0, 0, 0, 0, 0 };
+  expected_vresult_ushort = (vector unsigned short int) { 1, 3, 5, 3,
+							  9, 3, 16, 3 };
+						 
+  vresult_ushort = vec_permx (vsrc_a_ushort, vsrc_b_ushort, vsrc_c_uchar, 0);
+
+  if (!vec_all_eq (vresult_ushort,  expected_vresult_ushort)) {
+#if DEBUG
+    printf("ERROR, vec_permx (vsrc_a_ushort, vsrc_b_ushort, vsrc_c_uchar)\n");
+    for(i = 0; i < 8; i++)
+      printf(" vresult_ushort[%d] = %d, expected_vresult_ushort[%d] = %d\n",
+	     i, vresult_ushort[i], i, expected_vresult_ushort[i]);
+#else
+    abort();
+#endif
+  }
+
+  vsrc_a_int = (vector signed int) { 1, -3, 5, 7 };
+  vsrc_b_int = (vector signed int) { 2, 4, -6, 8 };
+  vsrc_c_uchar = (vector unsigned char) { 0x0, 0x1, 0x2, 0x3,
+					  0x4, 0x5, 0x6, 0x7,
+					  0x18, 0x19, 0x1A, 0x1B,
+					  0x1C, 0x1D, 0x1E, 0x1F };
+  vresult_int = (vector signed int) { 0, 0, 0, 0 };
+  expected_vresult_int = (vector signed int) { 1, -3, -6, 8 };
+						 
+  vresult_int = vec_permx (vsrc_a_int, vsrc_b_int, vsrc_c_uchar, 0);
+
+  if (!vec_all_eq (vresult_int,  expected_vresult_int)) {
+#if DEBUG
+    printf("ERROR, vec_permx (vsrc_a_int, vsrc_b_int, vsrc_c_uchar)\n");
+    for(i = 0; i < 4; i++)
+      printf(" vresult_int[%d] = %d, expected_vresult_int[%d] = %d\n",
+	     i, vresult_int[i], i, expected_vresult_int[i]);
+#else
+    abort();
+#endif
+  }
+
+  vsrc_a_uint = (vector unsigned int) { 1, 3, 5, 7 };
+  vsrc_b_uint = (vector unsigned int) { 10, 12, 14, 16 };
+  vsrc_c_uchar = (vector unsigned char) { 0x0, 0x1, 0x2, 0x3,
+					  0x4, 0x5, 0x6, 0x7,
+					  0x18, 0x19, 0x1A, 0x1B,
+					  0x1C, 0x1D, 0x1E, 0x1F };
+  vresult_uint = (vector unsigned int) { 0, 0, 0, 0 };
+  expected_vresult_uint = (vector unsigned int) { 1, 3, 14, 16 };
+						 
+  vresult_uint = vec_permx (vsrc_a_uint, vsrc_b_uint, vsrc_c_uchar, 0);
+
+  if (!vec_all_eq (vresult_uint,  expected_vresult_uint)) {
+#if DEBUG
+    printf("ERROR, vec_permx (vsrc_a_uint, vsrc_b_uint, vsrc_c_uchar)\n");
+    for(i = 0; i < 4; i++)
+      printf(" vresult_uint[%d] = %d, expected_vresult_uint[%d] = %d\n",
+	     i, vresult_uint[i], i, expected_vresult_uint[i]);
+#else
+    abort();
+#endif
+  }
+
+  vsrc_a_ll = (vector signed long long int) { 1, -3 };
+  vsrc_b_ll = (vector signed long long int) { 2, -4 };
+  vsrc_c_uchar = (vector unsigned char) { 0x0, 0x1, 0x2, 0x3,
+					  0x4, 0x5, 0x6, 0x7,
+					  0x18, 0x19, 0x1A, 0x1B,
+					  0x1C, 0x1D, 0x1E, 0x1F };
+  vresult_ll = (vector signed long long int) { 0, 0};
+  expected_vresult_ll = (vector signed long long int) { 1, -4 };
+						 
+  vresult_ll = vec_permx (vsrc_a_ll, vsrc_b_ll, vsrc_c_uchar, 0);
+
+  if (!vec_all_eq (vresult_ll,  expected_vresult_ll)) {
+#if DEBUG
+    printf("ERROR, vec_permx (vsrc_a_ll, vsrc_b_ll, vsrc_c_uchar)\n");
+    for(i = 0; i < 2; i++)
+      printf(" vresult_ll[%d] = %lld, expected_vresult_ll[%d] = %lld\n",
+	     i, vresult_ll[i], i, expected_vresult_ll[i]);
+#else
+    abort();
+#endif
+  }
+
+  vsrc_a_ull = (vector unsigned long long int) { 1, 3 };
+  vsrc_b_ull = (vector unsigned long long int) { 10, 12 };
+  vsrc_c_uchar = (vector unsigned char) { 0x0, 0x1, 0x2, 0x3,
+					  0x4, 0x5, 0x6, 0x7,
+					  0x18, 0x19, 0x1A, 0x1B,
+					  0x1C, 0x1D, 0x1E, 0x1F };
+  vresult_ull = (vector unsigned long long int) { 0, 0 };
+  expected_vresult_ull = (vector unsigned long long int) { 1, 12 };
+						 
+  vresult_ull = vec_permx (vsrc_a_ull, vsrc_b_ull, vsrc_c_uchar, 0);
+
+  if (!vec_all_eq (vresult_ull,  expected_vresult_ull)) {
+#if DEBUG
+    printf("ERROR, vec_permx (vsrc_a_ull, vsrc_b_ull, vsrc_c_uchar)\n");
+    for(i = 0; i < 2; i++)
+      printf(" vresult_ull[%d] = %llu, expected_vresult_ull[%d] = %llu\n",
+	     i, vresult_ull[i], i, expected_vresult_ull[i]);
+#else
+    abort();
+#endif
+  }
+
+  vsrc_a_f = (vector float) { -3.0, 5.0, 7.0, 9.0 };
+  vsrc_b_f = (vector float) { 2.0,  4.0, 6.0, 8.0  };
+  vsrc_c_uchar = (vector unsigned char) { 0x0, 0x1, 0x2, 0x3,
+					  0x4, 0x5, 0x6, 0x7,
+					  0x18, 0x19, 0x1A, 0x1B,
+					  0x1C, 0x1D, 0x1E, 0x1F };
+  vresult_f = (vector float) { 0.0, 0.0, 0.0, 0.0 };
+  expected_vresult_f = (vector float) { -3.0, 5.0, 6.0, 8.0 };
+						 
+  vresult_f = vec_permx (vsrc_a_f, vsrc_b_f, vsrc_c_uchar, 0);
+
+  if (!vec_all_eq (vresult_f,  expected_vresult_f)) {
+#if DEBUG
+    printf("ERROR, vec_permx (vsrc_a_f, vsrc_b_f, vsrc_c_uchar)\n");
+    for(i = 0; i < 4; i++)
+      printf(" vresult_f[%d] = %f, expected_vresult_f[%d] = %f\n",
+	     i, vresult_f[i], i, expected_vresult_f[i]);
+#else
+    abort();
+#endif
+  }
+
+  vsrc_a_d = (vector double) { 1.0, -3.0 };
+  vsrc_b_d = (vector double) { 2.0, -4.0 };
+  vsrc_c_uchar = (vector unsigned char) { 0x0, 0x1, 0x2, 0x3,
+					  0x4, 0x5, 0x6, 0x7,
+					  0x18, 0x19, 0x1A, 0x1B,
+					  0x1C, 0x1D, 0x1E, 0x1F };
+  vresult_d = (vector double) { 0.0, 0.0 };
+  expected_vresult_d = (vector double) { 1.0, -4.0 };
+						 
+  vresult_d = vec_permx (vsrc_a_d, vsrc_b_d, vsrc_c_uchar, 0);
+
+  if (!vec_all_eq (vresult_d,  expected_vresult_d)) {
+#if DEBUG
+    printf("ERROR, vec_permx (vsrc_a_d, vsrc_b_d, vsrc_c_uchar)\n");
+    for(i = 0; i < 2; i++)
+      printf(" vresult_d[%d] = %f, expected_vresult_d[%d] = %f\n",
+	     i, vresult_d[i], i, expected_vresult_d[i]);
+#else
+    abort();
+#endif
+  }
+
+  return 0;
+}
+
+/* { dg-final { scan-assembler-times {\mxxpermx\M} 6 } } */
+
+
-- 
2.17.1




* Re: [PATCH 0/6 ver 4]  ] Permute Class Operations
@ 2020-07-08 19:59 Carl Love
  2020-07-09 17:38 ` will schmidt
  2020-07-15 20:07 ` Segher Boessenkool
  0 siblings, 2 replies; 18+ messages in thread
From: Carl Love @ 2020-07-08 19:59 UTC (permalink / raw)
  To: segher, dje.gcc, gcc-patches, Will Schmidt

[PATCH 5/6] rs6000, Add vector splat builtin support

----------------------------------
V4 Fixes:

   Rebased on mainline.  Changed FUTURE to P10.
   define_predicate "s32bit_cint_operand" removed unnecessary cast in
     definition.
   Changed define_expand "xxsplti32dx_v4si" to use "0" for constraint
     of operand 1.
   Changed define_insn "xxsplti32dx_v4si_inst" to use "0" for constraint
     of operand 1.
   Removed define_predicate "f32bit_const_operand".  Use const_double_operand
     instead.

   *** Please provide feedback for the following change:
   In define_insn "xxspltidp_v2df_inst", added a print statement to warn of
   possible undefined behavior.  The xxspltidp instruction result is
   undefined for subnormal inputs.  I added a test for subnormal input with
   an fprintf to stderr to warn the "user" if the constant input is a subnormal
   value.  I tried assert initially, but that causes GCC to exit ungracefully
   with no information as to why.  I really didn't like that behavior.
   A subnormal input is not really a fatal error but the "user" needs
   to be told it is not a good idea.  I am not sure whether using an fprintf
   statement in a define_insn is acceptable either, but it does give the
   user the needed information and GCC exits normally.  Let me know if there
   is a better option here.
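
   For illustration only (this is not part of the patch), the subnormal
   test described above can be sketched in standalone C.  The helper name
   is_sf_subnormal is hypothetical; the masks are the ones used in the
   define_insn, applied to the 32-bit IEEE 754 single-precision pattern.

```c
#include <stdint.h>

/* Hypothetical helper, not part of the patch.  Return 1 if BITS is the
   bit pattern of a single-precision subnormal: the 8-bit exponent field
   is zero and the 23-bit fraction field is nonzero.  The sign bit is
   ignored, so negative subnormals are detected as well.  */
int
is_sf_subnormal (uint32_t bits)
{
  return (bits & 0x7F800000u) == 0 && (bits & 0x007FFFFFu) != 0;
}
```

   Under this sketch, zero (fraction field also zero) and the smallest
   normal value (exponent field equal to 1) are not flagged.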
--------------------
v3 fixes:
   Minor cleanup in the ChangeLog description.

-------------------------------------------------
v2 fixes:

  change log fixes
    gcc/config/rs6000/altivec.md changed name of define_insn and define_expand
    for vxxspltiw... to xxspltiw...
    Fixed spaces in gen_xxsplti32dx_v4sf_inst (operands[0], GEN_INT

    gcc/config/rs6000/rs6000-builtin.def propagated name changes above where
    they are used.

    Updated the s32bit_cint_operand, c32bit_cint_operand and
    f32bit_const_operand predicate definitions.

    Changed name of rs6000_constF32toI32 to rs6000_const_f32_to_i32, propagated
    name change as needed.  Replaced if test with gcc_assert().

    Fixed description of vec_splatid() in documentation.
-----------------------

GCC maintainers:

The following patch adds support for the vec_splati, vec_splatid and
vec_splati_ins builtins.
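
As a reading aid only, the intended element-level behavior of vec_splati
and vec_splati_ins, as described in the extend.texi text of this patch, can
be modeled with plain arrays.  The function names model_splati and
model_splati_ins are hypothetical, and the model assumes the element-order
view used by the documentation; it says nothing about how the instructions
number words in a register.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of vec_splati for the signed int case:
   every word of the result holds VALUE.  */
void
model_splati (int32_t result[4], int32_t value)
{
  for (int i = 0; i < 4; i++)
    result[i] = value;
}

/* Hypothetical scalar model of vec_splati_ins: write VALUE into word IDX
   (0 or 1) of each of the two doublewords of V, leaving the other word
   of each doubleword unchanged.  */
void
model_splati_ins (int32_t v[4], unsigned int idx, int32_t value)
{
  assert (idx <= 1);
  for (int dw = 0; dw < 2; dw++)
    v[2 * dw + idx] = value;
}
```

For example, applying model_splati_ins to { 2, 3, 4, 5 } with index 1 and
value 20 leaves { 2, 20, 4, 20 } in the array.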

This patch adds support for instructions that take a 32-bit immediate
value that represents a floating point value.  This support adds new
predicates and a support function to properly handle the immediate value.
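
The idea behind the support function can be sketched outside of GCC as
follows.  This is only an illustration on the host, assuming an IEEE 754
binary32 float; inside the compiler the patch uses
REAL_VALUE_TO_TARGET_SINGLE on the constant instead.  The name sf_bits is
hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical host-side analogue of the conversion: return the 32-bit
   pattern that encodes F, which is the form the instruction immediate
   takes.  memcpy is used to avoid aliasing problems.  */
uint32_t
sf_bits (float f)
{
  uint32_t bits;

  assert (sizeof bits == sizeof f);
  memcpy (&bits, &f, sizeof bits);
  return bits;
}
```

With this sketch, 23.0f maps to 0x41B80000 and -31.0f maps to 0xC1F80000,
the two floating-point constants used in the new test case.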

The patch has been compiled and tested on

  powerpc64le-unknown-linux-gnu (Power 9 LE)

with no regression errors.

The test case was compiled on a Power 9 system and then tested on
Mambo.

Please let me know if this patch is acceptable for the mainline
branch.  Thanks.

                         Carl Love
--------------------------------------------------------
gcc/ChangeLog

2020-07-06  Carl Love  <cel@us.ibm.com>

	* config/rs6000/altivec.h (vec_splati, vec_splatid, vec_splati_ins):
	Add defines.
	* config/rs6000/altivec.md (UNSPEC_XXSPLTIW, UNSPEC_XXSPLTID,
	UNSPEC_XXSPLTI32DX): New.
	(xxspltiw_v4si, xxspltiw_v4sf_inst, xxspltidp_v2df_inst,
	xxsplti32dx_v4si_inst, xxsplti32dx_v4sf_inst): New define_insn.
	(xxspltiw_v4sf, xxspltidp_v2df, xxsplti32dx_v4si,
	xxsplti32dx_v4sf): New define_expands.
	* config/rs6000/predicates.md (u1bit_cint_operand,
	s32bit_cint_operand, c32bit_cint_operand): New predicates.
	* config/rs6000/rs6000-builtin.def (VXXSPLTIW_V4SI, VXXSPLTIW_V4SF,
	VXXSPLTID): New definitions.
	(VXXSPLTI32DX_V4SI, VXXSPLTI32DX_V4SF): New BU_P10V_3
	definitions.
	(XXSPLTIW, XXSPLTID): New definitions.
	(XXSPLTI32DX): Add definitions.
	* config/rs6000/rs6000-call.c (P10_BUILTIN_VEC_XXSPLTIW,
	P10_BUILTIN_VEC_XXSPLTID, P10_BUILTIN_VEC_XXSPLTI32DX):
	New definitions.
	* config/rs6000/rs6000-protos.h (rs6000_const_f32_to_i32): New extern
	declaration.
	* config/rs6000/rs6000.c (rs6000_const_f32_to_i32): New function.
	* doc/extend.texi: Add documentation for vec_splati,
	vec_splatid, and vec_splati_ins.

gcc/testsuite/ChangeLog

2020-07-06  Carl Love  <cel@us.ibm.com>

	* gcc.target/powerpc/vec-splati-runnable.c: New test.
---
 gcc/config/rs6000/altivec.h                   |   3 +
 gcc/config/rs6000/altivec.md                  | 116 ++++++++++++++
 gcc/config/rs6000/predicates.md               |  15 ++
 gcc/config/rs6000/rs6000-builtin.def          |  12 ++
 gcc/config/rs6000/rs6000-call.c               |  19 +++
 gcc/config/rs6000/rs6000-protos.h             |   1 +
 gcc/config/rs6000/rs6000.c                    |  11 ++
 gcc/doc/extend.texi                           |  35 +++++
 .../gcc.target/powerpc/vec-splati-runnable.c  | 145 ++++++++++++++++++
 9 files changed, 357 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c

diff --git a/gcc/config/rs6000/altivec.h b/gcc/config/rs6000/altivec.h
index c202fcf25da..126409c168b 100644
--- a/gcc/config/rs6000/altivec.h
+++ b/gcc/config/rs6000/altivec.h
@@ -705,6 +705,9 @@ __altivec_scalar_pred(vec_any_nle,
 #define vec_replace_unaligned(a, b, c) __builtin_vec_replace_un (a, b, c)
 #define vec_sldb(a, b, c)      __builtin_vec_sldb (a, b, c)
 #define vec_srdb(a, b, c)      __builtin_vec_srdb (a, b, c)
+#define vec_splati(a)  __builtin_vec_xxspltiw (a)
+#define vec_splatid(a) __builtin_vec_xxspltid (a)
+#define vec_splati_ins(a, b, c)        __builtin_vec_xxsplti32dx (a, b, c)
 
 #define vec_gnb(a, b)	__builtin_vec_gnb (a, b)
 #define vec_clrl(a, b)	__builtin_vec_clrl (a, b)
diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index c58fb3961e0..f6858b5bf2a 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -174,6 +174,9 @@
    UNSPEC_VSTRIL
    UNSPEC_SLDB
    UNSPEC_SRDB
+   UNSPEC_XXSPLTIW
+   UNSPEC_XXSPLTID
+   UNSPEC_XXSPLTI32DX
 ])
 
 (define_c_enum "unspecv"
@@ -800,6 +803,119 @@
   "vs<SLDB_lr>dbi %0,%1,%2,%3"
   [(set_attr "type" "vecsimple")])
 
+(define_insn "xxspltiw_v4si"
+  [(set (match_operand:V4SI 0 "register_operand" "=wa")
+	(unspec:V4SI [(match_operand:SI 1 "s32bit_cint_operand" "n")]
+		     UNSPEC_XXSPLTIW))]
+ "TARGET_POWER10"
+ "xxspltiw %x0,%1"
+ [(set_attr "type" "vecsimple")])
+
+(define_expand "xxspltiw_v4sf"
+  [(set (match_operand:V4SF 0 "register_operand" "=wa")
+	(unspec:V4SF [(match_operand:SF 1 "const_double_operand" "n")]
+		     UNSPEC_XXSPLTIW))]
+ "TARGET_POWER10"
+{
+  long long value = rs6000_const_f32_to_i32 (operands[1]);
+  emit_insn (gen_xxspltiw_v4sf_inst (operands[0], GEN_INT (value)));
+  DONE;
+})
+
+(define_insn "xxspltiw_v4sf_inst"
+  [(set (match_operand:V4SF 0 "register_operand" "=wa")
+	(unspec:V4SF [(match_operand:SI 1 "c32bit_cint_operand" "n")]
+		     UNSPEC_XXSPLTIW))]
+ "TARGET_POWER10"
+ "xxspltiw %x0,%c1"
+ [(set_attr "type" "vecsimple")])
+
+(define_expand "xxspltidp_v2df"
+  [(set (match_operand:V2DF 0 "register_operand" )
+	(unspec:V2DF [(match_operand:SF 1 "const_double_operand")]
+		     UNSPEC_XXSPLTID))]
+ "TARGET_POWER10"
+{
+  long value = rs6000_const_f32_to_i32 (operands[1]);
+  emit_insn (gen_xxspltidp_v2df_inst (operands[0], GEN_INT (value)));
+  DONE;
+})
+
+(define_insn "xxspltidp_v2df_inst"
+  [(set (match_operand:V2DF 0 "register_operand" "=wa")
+	(unspec:V2DF [(match_operand:SI 1 "c32bit_cint_operand" "n")]
+		     UNSPEC_XXSPLTID))]
+  "TARGET_POWER10"
+{
+  /* Note, the xxspltidp gives undefined results if the operand is a single
+     precision subnormal number. */
+  int value = INTVAL (operands[1]);
+
+  if (((value & 0x7F800000) == 0) && ((value & 0x7FFFFF) != 0))
+    /* value is subnormal */
+    fprintf (stderr, "WARNING: Result for the xxspltidp instruction is undefined for subnormal input values.\n");
+
+  return "xxspltidp %x0,%c1";
+}
+  [(set_attr "type" "vecsimple")])
+
+(define_expand "xxsplti32dx_v4si"
+  [(set (match_operand:V4SI 0 "register_operand" "=wa")
+	(unspec:V4SI [(match_operand:V4SI 1 "register_operand" "0")
+		      (match_operand:QI 2 "u1bit_cint_operand" "n")
+		      (match_operand:SI 3 "s32bit_cint_operand" "n")]
+		     UNSPEC_XXSPLTI32DX))]
+ "TARGET_POWER10"
+{
+  int index = INTVAL (operands[2]);
+
+  if (!BYTES_BIG_ENDIAN)
+    index = 1 - index;
+
+   emit_insn (gen_xxsplti32dx_v4si_inst (operands[0], operands[1],
+					 GEN_INT (index), operands[3]));
+   DONE;
+}
+ [(set_attr "type" "vecsimple")])
+
+(define_insn "xxsplti32dx_v4si_inst"
+  [(set (match_operand:V4SI 0 "register_operand" "=wa")
+	(unspec:V4SI [(match_operand:V4SI 1 "register_operand" "0")
+		      (match_operand:QI 2 "u1bit_cint_operand" "n")
+		      (match_operand:SI 3 "s32bit_cint_operand" "n")]
+		     UNSPEC_XXSPLTI32DX))]
+  "TARGET_POWER10"
+  "xxsplti32dx %x0,%2,%3"
+  [(set_attr "type" "vecsimple")])
+
+(define_expand "xxsplti32dx_v4sf"
+  [(set (match_operand:V4SF 0 "register_operand" "=wa")
+	(unspec:V4SF [(match_operand:V4SF 1 "register_operand" "0")
+		      (match_operand:QI 2 "u1bit_cint_operand" "n")
+		      (match_operand:SF 3 "const_double_operand" "n")]
+		     UNSPEC_XXSPLTI32DX))]
+  "TARGET_POWER10"
+{
+  int index = INTVAL (operands[2]);
+  long value = rs6000_const_f32_to_i32 (operands[3]);
+  if (!BYTES_BIG_ENDIAN)
+    index = 1 - index;
+
+   emit_insn (gen_xxsplti32dx_v4sf_inst (operands[0], operands[1],
+					 GEN_INT (index), GEN_INT (value)));
+   DONE;
+})
+
+(define_insn "xxsplti32dx_v4sf_inst"
+  [(set (match_operand:V4SF 0 "register_operand" "=wa")
+	(unspec:V4SF [(match_operand:V4SF 1 "register_operand" "0")
+		      (match_operand:QI 2 "u1bit_cint_operand" "n")
+		      (match_operand:SI 3 "s32bit_cint_operand" "n")]
+		     UNSPEC_XXSPLTI32DX))]
+  "TARGET_POWER10"
+  "xxsplti32dx %x0,%2,%3"
+   [(set_attr "type" "vecsimple")])
+
 (define_expand "vstrir_<mode>"
   [(set (match_operand:VIshort 0 "altivec_register_operand")
 	(unspec:VIshort [(match_operand:VIshort 1 "altivec_register_operand")]
diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index 9762855d76d..e9f7f143159 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -214,6 +214,11 @@
   (and (match_code "const_int")
        (match_test "INTVAL (op) >= -16 && INTVAL (op) <= 15")))
 
+;; Return 1 if op is an unsigned 1-bit constant integer.
+(define_predicate "u1bit_cint_operand"
+  (and (match_code "const_int")
+       (match_test "INTVAL (op) >= 0 && INTVAL (op) <= 1")))
+
 ;; Return 1 if op is a unsigned 3-bit constant integer.
 (define_predicate "u3bit_cint_operand"
   (and (match_code "const_int")
@@ -272,6 +277,16 @@
        (match_test "(unsigned HOST_WIDE_INT)
 		    (INTVAL (op) + 0x8000) >= 0x10000")))
 
+;; Return 1 if op is a 32-bit constant signed integer
+(define_predicate "s32bit_cint_operand"
+  (and (match_code "const_int")
+       (match_test "(0x80000000 + UINTVAL (op)) >> 32 == 0")))
+
+;; Return 1 if op is a 32-bit constant unsigned integer
+(define_predicate "c32bit_cint_operand"
+  (and (match_code "const_int")
+       (match_test "((UINTVAL (op) >> 32) == 0)")))
+
 ;; Return 1 if op is a positive constant integer that is an exact power of 2.
 (define_predicate "exact_log2_cint_operand"
   (and (match_code "const_int")
diff --git a/gcc/config/rs6000/rs6000-builtin.def b/gcc/config/rs6000/rs6000-builtin.def
index c6fdfadeda8..ddfe287efc8 100644
--- a/gcc/config/rs6000/rs6000-builtin.def
+++ b/gcc/config/rs6000/rs6000-builtin.def
@@ -2748,6 +2748,14 @@ BU_P10V_3 (VSRDB_V8HI, "vsrdb_v8hi", CONST, vsrdb_v8hi)
 BU_P10V_3 (VSRDB_V4SI, "vsrdb_v4si", CONST, vsrdb_v4si)
 BU_P10V_3 (VSRDB_V2DI, "vsrdb_v2di", CONST, vsrdb_v2di)
 
+BU_P10V_1 (VXXSPLTIW_V4SI, "vxxspltiw_v4si", CONST, xxspltiw_v4si)
+BU_P10V_1 (VXXSPLTIW_V4SF, "vxxspltiw_v4sf", CONST, xxspltiw_v4sf)
+
+BU_P10V_1 (VXXSPLTID, "vxxspltidp", CONST, xxspltidp_v2df)
+
+BU_P10V_3 (VXXSPLTI32DX_V4SI, "vxxsplti32dx_v4si", CONST, xxsplti32dx_v4si)
+BU_P10V_3 (VXXSPLTI32DX_V4SF, "vxxsplti32dx_v4sf", CONST, xxsplti32dx_v4sf)
+
 BU_P10V_1 (VSTRIBR, "vstribr", CONST, vstrir_v16qi)
 BU_P10V_1 (VSTRIHR, "vstrihr", CONST, vstrir_v8hi)
 BU_P10V_1 (VSTRIBL, "vstribl", CONST, vstril_v16qi)
@@ -2779,6 +2787,10 @@ BU_P10_OVERLOAD_1 (VSTRIL, "stril")
 
 BU_P10_OVERLOAD_1 (VSTRIR_P, "strir_p")
 BU_P10_OVERLOAD_1 (VSTRIL_P, "stril_p")
+
+BU_P10_OVERLOAD_1 (XXSPLTIW, "xxspltiw")
+BU_P10_OVERLOAD_1 (XXSPLTID, "xxspltid")
+BU_P10_OVERLOAD_3 (XXSPLTI32DX, "xxsplti32dx")
 \f
 /* 1 argument crypto functions.  */
 BU_CRYPTO_1 (VSBOX,		"vsbox",	  CONST, crypto_vsbox_v2di)
diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index edc67fafd88..06320279138 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -5688,6 +5688,22 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_UINTQI },
 
+  { P10_BUILTIN_VEC_XXSPLTIW, P10_BUILTIN_VXXSPLTIW_V4SI,
+    RS6000_BTI_V4SI, RS6000_BTI_INTSI, 0, 0 },
+  { P10_BUILTIN_VEC_XXSPLTIW, P10_BUILTIN_VXXSPLTIW_V4SF,
+    RS6000_BTI_V4SF, RS6000_BTI_float, 0, 0 },
+
+  { P10_BUILTIN_VEC_XXSPLTID, P10_BUILTIN_VXXSPLTID,
+    RS6000_BTI_V2DF, RS6000_BTI_float, 0, 0 },
+
+  { P10_BUILTIN_VEC_XXSPLTI32DX, P10_BUILTIN_VXXSPLTI32DX_V4SI,
+    RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_UINTQI, RS6000_BTI_INTSI },
+  { P10_BUILTIN_VEC_XXSPLTI32DX, P10_BUILTIN_VXXSPLTI32DX_V4SI,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI, RS6000_BTI_UINTQI,
+    RS6000_BTI_UINTSI },
+  { P10_BUILTIN_VEC_XXSPLTI32DX, P10_BUILTIN_VXXSPLTI32DX_V4SF,
+    RS6000_BTI_V4SF, RS6000_BTI_V4SF, RS6000_BTI_UINTQI, RS6000_BTI_float },
+
   { P10_BUILTIN_VEC_SRDB, P10_BUILTIN_VSRDB_V16QI,
     RS6000_BTI_V16QI, RS6000_BTI_V16QI,
     RS6000_BTI_V16QI, RS6000_BTI_UINTQI },
@@ -14036,6 +14052,9 @@ builtin_function_type (machine_mode mode_ret, machine_mode mode_arg0,
     case ALTIVEC_BUILTIN_VSRH:
     case ALTIVEC_BUILTIN_VSRW:
     case P8V_BUILTIN_VSRD:
+    /* Vector splat immediate insert */
+    case P10_BUILTIN_VXXSPLTI32DX_V4SI:
+    case P10_BUILTIN_VXXSPLTI32DX_V4SF:
       h.uns_p[2] = 1;
       break;
 
diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index 5508484ba19..c6158874ce9 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -274,6 +274,7 @@ extern void rs6000_asm_output_dwarf_pcrel (FILE *file, int size,
 					   const char *label);
 extern void rs6000_asm_output_dwarf_datarel (FILE *file, int size,
 					     const char *label);
+extern long long rs6000_const_f32_to_i32 (rtx operand);
 
 /* Declare functions in rs6000-c.c */
 
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index fef72884b31..046adc02dfc 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -26767,6 +26767,17 @@ rs6000_invalid_conversion (const_tree fromtype, const_tree totype)
   return NULL;
 }
 
+long long
+rs6000_const_f32_to_i32 (rtx operand)
+{
+  long long value;
+  const struct real_value *rv = CONST_DOUBLE_REAL_VALUE (operand);
+
+  gcc_assert (GET_MODE (operand) == SFmode);
+  REAL_VALUE_TO_TARGET_SINGLE (*rv, value);
+  return value;
+}
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include "gt-rs6000.h"
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 1c39be37c1d..e9aa06553aa 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -21165,6 +21165,41 @@ using this built-in must be endian-aware.
 
 @findex vec_srdb
 
+Vector Splat
+
+@smallexample
+@exdent vector signed int vec_splati (const signed int);
+@exdent vector float vec_splati (const float);
+@end smallexample
+
+Splat a 32-bit immediate into a vector of words.
+
+@findex vec_splati
+
+@smallexample
+@exdent vector double vec_splatid (const float);
+@end smallexample
+
+Convert a single-precision floating-point value to double-precision and splat
+the result to a vector of double-precision floats.
+
+@findex vec_splatid
+
+@smallexample
+@exdent vector signed int vec_splati_ins (vector signed int,
+const unsigned int, const signed int);
+@exdent vector unsigned int vec_splati_ins (vector unsigned int,
+const unsigned int, const unsigned int);
+@exdent vector float vec_splati_ins (vector float, const unsigned int,
+const float);
+@end smallexample
+
+Argument 2 must be either 0 or 1.  Splat the value of argument 3 into the word
+identified by argument 2 of each doubleword of argument 1 and return the
+result.  The other words of argument 1 are unchanged.
+
+@findex vec_splati_ins
+
 @smallexample
 @exdent vector unsigned long long int
 @exdent vec_pext (vector unsigned long long int, vector unsigned long long int)
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c b/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
new file mode 100644
index 00000000000..a0ce456c6fd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
@@ -0,0 +1,145 @@
+/* { dg-do run } */
+/* { dg-require-effective-target power10_hw } */
+/* { dg-options "-mdejagnu-cpu=power10" } */
+#include <altivec.h>
+
+#define DEBUG 0
+
+#if DEBUG
+#include <stdio.h>
+#endif
+
+extern void abort (void);
+
+int
+main (int argc, char *argv [])
+{
+  int i;
+  vector int vsrc_a_int;
+  vector int vresult_int;
+  vector int expected_vresult_int;
+  int src_a_int = 13;
+
+  vector unsigned int vsrc_a_uint;
+  vector unsigned int vresult_uint;
+  vector unsigned int expected_vresult_uint;
+  unsigned int src_a_uint = 7;
+
+  vector float vresult_f;
+  vector float expected_vresult_f;
+  vector float vsrc_a_f;
+  float src_a_f = 23.0;
+
+  vector double vsrc_a_d;
+  vector double vresult_d;
+  vector double expected_vresult_d;
+ 
+  /* Vector splati word */
+  vresult_int = (vector signed int) { 1, 2, 3, 4 };
+  expected_vresult_int = (vector signed int) { -13, -13, -13, -13 }; 
+						 
+  vresult_int = vec_splati ( -13 );
+
+  if (!vec_all_eq (vresult_int,  expected_vresult_int)) {
+#if DEBUG
+    printf("ERROR, vec_splati (src_a_int)\n");
+    for(i = 0; i < 4; i++)
+      printf(" vresult_int[%d] = %d, expected_vresult_int[%d] = %d\n",
+	     i, vresult_int[i], i, expected_vresult_int[i]);
+#else
+    abort();
+#endif
+  }
+
+  vresult_f = (vector float) { 1.0, 2.0, 3.0, 4.0 };
+  expected_vresult_f = (vector float) { 23.0, 23.0, 23.0, 23.0 };
+						 
+  vresult_f = vec_splati (23.0f);
+
+  if (!vec_all_eq (vresult_f,  expected_vresult_f)) {
+#if DEBUG
+    printf("ERROR, vec_splati (src_a_f)\n");
+    for(i = 0; i < 4; i++)
+      printf(" vresult_f[%d] = %f, expected_vresult_f[%d] = %f\n",
+	     i, vresult_f[i], i, expected_vresult_f[i]);
+#else
+    abort();
+#endif
+  }
+
+  /* Vector splati double */
+  vresult_d = (vector double) { 2.0, 3.0 };
+  expected_vresult_d = (vector double) { -31.0, -31.0 };
+						 
+  vresult_d = vec_splatid (-31.0f);
+
+  if (!vec_all_eq (vresult_d,  expected_vresult_d)) {
+#if DEBUG
+    printf("ERROR, vec_splatid (-31.0f)\n");
+    for(i = 0; i < 2; i++)
+      printf(" vresult_d[%i] = %f, expected_vresult_d[%i] = %f\n",
+	     i, vresult_d[i], i, expected_vresult_d[i]);
+#else
+    abort();
+#endif
+  }
+
+  /* Vector splat immediate */
+  vsrc_a_int = (vector int) { 2, 3, 4, 5 };
+  vresult_int = (vector int) { 1, 1, 1, 1 };
+  expected_vresult_int = (vector int) { 2, 20, 4, 20 };
+						 
+  vresult_int = vec_splati_ins (vsrc_a_int, 1, 20);
+
+  if (!vec_all_eq (vresult_int,  expected_vresult_int)) {
+#if DEBUG
+    printf("ERROR, vec_splati_ins (vsrc_a_int, 1, 20)\n");
+    for(i = 0; i < 4; i++)
+      printf(" vresult_int[%i] = %d, expected_vresult_int[%i] = %d\n",
+	     i, vresult_int[i], i, expected_vresult_int[i]);
+#else
+    abort();
+#endif
+  }
+  
+  vsrc_a_uint = (vector unsigned int) { 4, 5, 6, 7 };
+  vresult_uint = (vector unsigned int) { 1, 1, 1, 1 };
+  expected_vresult_uint = (vector unsigned int) { 4, 40, 6, 40 };
+						 
+  vresult_uint = vec_splati_ins (vsrc_a_uint, 1, 40);
+
+  if (!vec_all_eq (vresult_uint,  expected_vresult_uint)) {
+#if DEBUG
+    printf("ERROR, vec_splati_ins (vsrc_a_uint, 1, 40)\n");
+    for(i = 0; i < 4; i++)
+      printf(" vresult_uint[%i] = %u, expected_vresult_uint[%i] = %u\n",
+	     i, vresult_uint[i], i, expected_vresult_uint[i]);
+#else
+    abort();
+#endif
+  }
+  
+  vsrc_a_f = (vector float) { 2.0, 3.0, 4.0, 5.0 };
+  vresult_f = (vector float) { 1.0, 1.0, 1.0, 1.0 };
+  expected_vresult_f = (vector float) { 2.0, 20.1, 4.0, 20.1 };
+						 
+  vresult_f = vec_splati_ins (vsrc_a_f, 1, 20.1f);
+
+  if (!vec_all_eq (vresult_f,  expected_vresult_f)) {
+#if DEBUG
+    printf("ERROR, vec_splati_ins (vsrc_a_f, 1, 20.1)\n");
+    for(i = 0; i < 4; i++)
+      printf(" vresult_f[%i] = %f, expected_vresult_f[%i] = %f\n",
+	     i, vresult_f[i], i, expected_vresult_f[i]);
+#else
+    abort();
+#endif
+  }
+  
+  return 0;
+}
+
+/* { dg-final { scan-assembler-times {\msplati\M} 6 } } */
+/* { dg-final { scan-assembler-times {\msrdbi\M} 6 } } */
+
+
-- 
2.17.1




* Re: [PATCH 0/6 ver 4]  ] Permute Class Operations
@ 2020-07-08 19:59 Carl Love
  2020-07-09 16:13 ` will schmidt
  2020-07-14 20:15 ` Segher Boessenkool
  0 siblings, 2 replies; 18+ messages in thread
From: Carl Love @ 2020-07-08 19:59 UTC (permalink / raw)
  To: segher, dje.gcc, gcc-patches, Will Schmidt

[PATCH 4/6] rs6000, Add vector shift double builtin support

----------------------------------
V4 Fixes:

   Rebased on mainline.  Changed FUTURE to P10.
   Changed SLDB_LR to SLDB_lr
   Changed error ("argument 3 must be in the range 0 to 7"); to
       error ("argument 3 must be a constant in the range 0 to 7");

-----------------------------------------------------------------
V3 Fixes:
	Replace spaces with tabs in ChangeLog.
	Minor edits to ChangeLog entry.
	Minor edits to vec_sldb description in gcc/doc/extend.texi.

----------------------------------------------------
v2 fixes:

 change logs redone

  gcc/config/rs6000/rs6000-call.c - added spaces before parenthesis around args.

-----------------------------------------------------------------
GCC maintainers:

The following patch adds support for the vector shift double builtins.

The patch has been compiled and tested on

  powerpc64le-unknown-linux-gnu (Power 9 LE)

and Mambo with no regression errors.

Please let me know if this patch is acceptable for the mainline branch.

Thanks.

                         Carl Love

-------------------------------------------------------

gcc/ChangeLog

2020-07-06  Carl Love  <cel@us.ibm.com>

	* config/rs6000/altivec.h (vec_sldb, vec_srdb): New defines.
	* config/rs6000/altivec.md (UNSPEC_SLDB, UNSPEC_SRDB): New.
	(SLDB_lr): New attribute.
	(VSHIFT_DBL_LR): New iterator.
	(vs<SLDB_lr>db_<mode>): New define_insn.
	* config/rs6000/rs6000-builtin.def (VSLDB_V16QI, VSLDB_V8HI,
	VSLDB_V4SI, VSLDB_V2DI, VSRDB_V16QI, VSRDB_V8HI, VSRDB_V4SI,
	VSRDB_V2DI): New BU_P10V_3 definitions.
	(SLDB, SRDB): New BU_P10_OVERLOAD_3 definitions.
	* config/rs6000/rs6000-call.c (P10_BUILTIN_VEC_SLDB,
	P10_BUILTIN_VEC_SRDB): New definitions.
	(rs6000_expand_ternop_builtin) [CODE_FOR_vsldb_v16qi,
	CODE_FOR_vsldb_v8hi, CODE_FOR_vsldb_v4si, CODE_FOR_vsldb_v2di,
	CODE_FOR_vsrdb_v16qi, CODE_FOR_vsrdb_v8hi, CODE_FOR_vsrdb_v4si,
	CODE_FOR_vsrdb_v2di]: Add clauses.
	* doc/extend.texi: Add description for vec_sldb and vec_srdb.

gcc/testsuite/ChangeLog

2020-07-06  Carl Love  <cel@us.ibm.com>

	* gcc.target/powerpc/vec-shift-double-runnable.c: New test file.
---
 gcc/config/rs6000/altivec.h                   |   2 +
 gcc/config/rs6000/altivec.md                  |  18 +
 gcc/config/rs6000/rs6000-builtin.def          |  12 +
 gcc/config/rs6000/rs6000-call.c               |  70 ++++
 gcc/doc/extend.texi                           |  53 +++
 .../powerpc/vec-shift-double-runnable.c       | 384 ++++++++++++++++++
 6 files changed, 539 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-shift-double-runnable.c
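
As a reading aid, the shift semantics described in the extend.texi hunk
of this patch can be modeled in plain C.  This is only a sketch under
stated assumptions: the type u128_model and the functions model_sldb and
model_srdb are hypothetical names used for illustration and are not part
of the patch.  The model treats each vector as two 64-bit halves with
"hi" as the leftmost doubleword, and it ignores how elements map onto
those halves on little-endian versus big-endian targets.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model type: one 128-bit vector viewed as two 64-bit
   halves, with hi being the leftmost (most significant) doubleword.  */
typedef struct { uint64_t hi, lo; } u128_model;

/* Model of vec_sldb (a, b, n): shift the 256-bit value a||b left by
   n bits (0 to 7) and keep the leftmost 128 bits.  */
static u128_model
model_sldb (u128_model a, u128_model b, unsigned int n)
{
  u128_model r = a;
  assert (n <= 7);
  if (n != 0)			/* Avoid an undefined shift by 64.  */
    {
      r.hi = (a.hi << n) | (a.lo >> (64 - n));
      r.lo = (a.lo << n) | (b.hi >> (64 - n));
    }
  return r;
}

/* Model of vec_srdb (a, b, n): shift the 256-bit value a||b right by
   n bits (0 to 7) and keep the rightmost 128 bits.  */
static u128_model
model_srdb (u128_model a, u128_model b, unsigned int n)
{
  u128_model r = b;
  assert (n <= 7);
  if (n != 0)			/* Avoid an undefined shift by 64.  */
    {
      r.hi = (b.hi >> n) | (a.lo << (64 - n));
      r.lo = (b.lo >> n) | (b.hi << (64 - n));
    }
  return r;
}
```

With a shift count of 7, the model reproduces the arithmetic used in the
long long test cases: { 5, 6 } shifted left becomes { 5*128, 6*128 }, and
{ 5*128, 6*128 } shifted right with a zero first argument becomes { 5, 6 }.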

diff --git a/gcc/config/rs6000/altivec.h b/gcc/config/rs6000/altivec.h
index 560c43cfc99..c202fcf25da 100644
--- a/gcc/config/rs6000/altivec.h
+++ b/gcc/config/rs6000/altivec.h
@@ -703,6 +703,8 @@ __altivec_scalar_pred(vec_any_nle,
 #define vec_inserth(a, b, c)   __builtin_vec_inserth (a, b, c)
 #define vec_replace_elt(a, b, c)       __builtin_vec_replace_elt (a, b, c)
 #define vec_replace_unaligned(a, b, c) __builtin_vec_replace_un (a, b, c)
+#define vec_sldb(a, b, c)      __builtin_vec_sldb (a, b, c)
+#define vec_srdb(a, b, c)      __builtin_vec_srdb (a, b, c)
 
 #define vec_gnb(a, b)	__builtin_vec_gnb (a, b)
 #define vec_clrl(a, b)	__builtin_vec_clrl (a, b)
diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 749b2c42c14..c58fb3961e0 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -172,6 +172,8 @@
    UNSPEC_XXEVAL
    UNSPEC_VSTRIR
    UNSPEC_VSTRIL
+   UNSPEC_SLDB
+   UNSPEC_SRDB
 ])
 
 (define_c_enum "unspecv"
@@ -782,6 +784,22 @@
   DONE;
 })
 
+;; Map UNSPEC_SLDB to "l" and  UNSPEC_SRDB to "r".
+(define_int_attr SLDB_lr [(UNSPEC_SLDB "l")
+			  (UNSPEC_SRDB "r")])
+
+(define_int_iterator VSHIFT_DBL_LR [UNSPEC_SLDB UNSPEC_SRDB])
+
+(define_insn "vs<SLDB_lr>db_<mode>"
+ [(set (match_operand:VI2 0 "register_operand" "=v")
+  (unspec:VI2 [(match_operand:VI2 1 "register_operand" "v")
+	       (match_operand:VI2 2 "register_operand" "v")
+	       (match_operand:QI 3 "const_0_to_12_operand" "n")]
+	      VSHIFT_DBL_LR))]
+  "TARGET_POWER10"
+  "vs<SLDB_lr>dbi %0,%1,%2,%3"
+  [(set_attr "type" "vecsimple")])
+
 (define_expand "vstrir_<mode>"
   [(set (match_operand:VIshort 0 "altivec_register_operand")
 	(unspec:VIshort [(match_operand:VIshort 1 "altivec_register_operand")]
diff --git a/gcc/config/rs6000/rs6000-builtin.def b/gcc/config/rs6000/rs6000-builtin.def
index e22b3e4d53b..c6fdfadeda8 100644
--- a/gcc/config/rs6000/rs6000-builtin.def
+++ b/gcc/config/rs6000/rs6000-builtin.def
@@ -2738,6 +2738,16 @@ BU_P10V_3 (VREPLACE_UN_V2DI, "vreplace_un_v2di", CONST, vreplace_un_v2di)
 BU_P10V_3 (VREPLACE_UN_UV2DI, "vreplace_un_uv2di", CONST, vreplace_un_v2di)
 BU_P10V_3 (VREPLACE_UN_V2DF, "vreplace_un_v2df", CONST, vreplace_un_v2df)
 
+BU_P10V_3 (VSLDB_V16QI, "vsldb_v16qi", CONST, vsldb_v16qi)
+BU_P10V_3 (VSLDB_V8HI, "vsldb_v8hi", CONST, vsldb_v8hi)
+BU_P10V_3 (VSLDB_V4SI, "vsldb_v4si", CONST, vsldb_v4si)
+BU_P10V_3 (VSLDB_V2DI, "vsldb_v2di", CONST, vsldb_v2di)
+
+BU_P10V_3 (VSRDB_V16QI, "vsrdb_v16qi", CONST, vsrdb_v16qi)
+BU_P10V_3 (VSRDB_V8HI, "vsrdb_v8hi", CONST, vsrdb_v8hi)
+BU_P10V_3 (VSRDB_V4SI, "vsrdb_v4si", CONST, vsrdb_v4si)
+BU_P10V_3 (VSRDB_V2DI, "vsrdb_v2di", CONST, vsrdb_v2di)
+
 BU_P10V_1 (VSTRIBR, "vstribr", CONST, vstrir_v16qi)
 BU_P10V_1 (VSTRIHR, "vstrihr", CONST, vstrir_v8hi)
 BU_P10V_1 (VSTRIBL, "vstribl", CONST, vstril_v16qi)
@@ -2761,6 +2771,8 @@ BU_P10_OVERLOAD_3 (INSERTL, "insertl")
 BU_P10_OVERLOAD_3 (INSERTH, "inserth")
 BU_P10_OVERLOAD_3 (REPLACE_ELT, "replace_elt")
 BU_P10_OVERLOAD_3 (REPLACE_UN, "replace_un")
+BU_P10_OVERLOAD_3 (SLDB, "sldb")
+BU_P10_OVERLOAD_3 (SRDB, "srdb")
 
 BU_P10_OVERLOAD_1 (VSTRIR, "strir")
 BU_P10_OVERLOAD_1 (VSTRIL, "stril")
diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index d5d294fd940..edc67fafd88 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -5663,6 +5663,56 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
   { P10_BUILTIN_VEC_REPLACE_UN, P10_BUILTIN_VREPLACE_UN_V2DF,
     RS6000_BTI_V2DF, RS6000_BTI_V2DF, RS6000_BTI_double, RS6000_BTI_INTQI },
 
+  { P10_BUILTIN_VEC_SLDB, P10_BUILTIN_VSLDB_V16QI,
+    RS6000_BTI_V16QI, RS6000_BTI_V16QI,
+    RS6000_BTI_V16QI, RS6000_BTI_UINTQI },
+  { P10_BUILTIN_VEC_SLDB, P10_BUILTIN_VSLDB_V16QI,
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI,
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_UINTQI },
+  { P10_BUILTIN_VEC_SLDB, P10_BUILTIN_VSLDB_V8HI,
+    RS6000_BTI_V8HI, RS6000_BTI_V8HI,
+    RS6000_BTI_V8HI, RS6000_BTI_UINTQI },
+  { P10_BUILTIN_VEC_SLDB, P10_BUILTIN_VSLDB_V8HI,
+    RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V8HI,
+    RS6000_BTI_unsigned_V8HI, RS6000_BTI_UINTQI },
+  { P10_BUILTIN_VEC_SLDB, P10_BUILTIN_VSLDB_V4SI,
+    RS6000_BTI_V4SI, RS6000_BTI_V4SI,
+    RS6000_BTI_V4SI, RS6000_BTI_UINTQI },
+  { P10_BUILTIN_VEC_SLDB, P10_BUILTIN_VSLDB_V4SI,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_UINTQI },
+  { P10_BUILTIN_VEC_SLDB, P10_BUILTIN_VSLDB_V2DI,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI,
+    RS6000_BTI_V2DI, RS6000_BTI_UINTQI },
+  { P10_BUILTIN_VEC_SLDB, P10_BUILTIN_VSLDB_V2DI,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_UINTQI },
+
+  { P10_BUILTIN_VEC_SRDB, P10_BUILTIN_VSRDB_V16QI,
+    RS6000_BTI_V16QI, RS6000_BTI_V16QI,
+    RS6000_BTI_V16QI, RS6000_BTI_UINTQI },
+  { P10_BUILTIN_VEC_SRDB, P10_BUILTIN_VSRDB_V16QI,
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI,
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_UINTQI },
+  { P10_BUILTIN_VEC_SRDB, P10_BUILTIN_VSRDB_V8HI,
+    RS6000_BTI_V8HI, RS6000_BTI_V8HI,
+    RS6000_BTI_V8HI, RS6000_BTI_UINTQI },
+  { P10_BUILTIN_VEC_SRDB, P10_BUILTIN_VSRDB_V8HI,
+    RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V8HI,
+    RS6000_BTI_unsigned_V8HI, RS6000_BTI_UINTQI },
+  { P10_BUILTIN_VEC_SRDB, P10_BUILTIN_VSRDB_V4SI,
+    RS6000_BTI_V4SI, RS6000_BTI_V4SI,
+    RS6000_BTI_V4SI, RS6000_BTI_UINTQI },
+  { P10_BUILTIN_VEC_SRDB, P10_BUILTIN_VSRDB_V4SI,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_UINTQI },
+  { P10_BUILTIN_VEC_SRDB, P10_BUILTIN_VSRDB_V2DI,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI,
+    RS6000_BTI_V2DI, RS6000_BTI_UINTQI },
+  { P10_BUILTIN_VEC_SRDB, P10_BUILTIN_VSRDB_V2DI,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_UINTQI },
+
   { P10_BUILTIN_VEC_VSTRIL, P10_BUILTIN_VSTRIBL,
     RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI, 0, 0 },
   { P10_BUILTIN_VEC_VSTRIL, P10_BUILTIN_VSTRIBL,
@@ -10063,6 +10113,26 @@ rs6000_expand_quaternop_builtin (enum insn_code icode, tree exp, rtx target)
 	}
    }
 
+  else if (icode == CODE_FOR_vsldb_v16qi
+	   || icode == CODE_FOR_vsldb_v8hi
+	   || icode == CODE_FOR_vsldb_v4si
+	   || icode == CODE_FOR_vsldb_v2di
+	   || icode == CODE_FOR_vsrdb_v16qi
+	   || icode == CODE_FOR_vsrdb_v8hi
+	   || icode == CODE_FOR_vsrdb_v4si
+	   || icode == CODE_FOR_vsrdb_v2di)
+   {
+     /* Check whether the 3rd argument is an integer constant in the range
+	0 to 7 inclusive.  */
+     STRIP_NOPS (arg2);
+     if (TREE_CODE (arg2) != INTEGER_CST
+	 || !IN_RANGE (TREE_INT_CST_LOW (arg2), 0, 7))
+	{
+	  error ("argument 3 must be a constant in the range 0 to 7");
+	  return CONST0_RTX (tmode);
+	}
+   }
+
   if (target == 0
       || GET_MODE (target) != tmode
       || ! (*insn_data[icode].operand[0].predicate) (target, tmode))
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index b9cbd136316..1c39be37c1d 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -21112,6 +21112,59 @@ The programmer is responsible for understanding the endianness issues involved
 with the first argument and the result.
 @findex vec_replace_unaligned
 
+Vector Shift Left Double Bit Immediate
+@smallexample
+@exdent vector signed char vec_sldb (vector signed char, vector signed char,
+const unsigned int);
+@exdent vector unsigned char vec_sldb (vector unsigned char,
+vector unsigned char, const unsigned int);
+@exdent vector signed short vec_sldb (vector signed short, vector signed short,
+const unsigned int);
+@exdent vector unsigned short vec_sldb (vector unsigned short,
+vector unsigned short, const unsigned int);
+@exdent vector signed int vec_sldb (vector signed int, vector signed int,
+const unsigned int);
+@exdent vector unsigned int vec_sldb (vector unsigned int, vector unsigned int,
+const unsigned int);
+@exdent vector signed long long vec_sldb (vector signed long long,
+vector signed long long, const unsigned int);
+@exdent vector unsigned long long vec_sldb (vector unsigned long long,
+vector unsigned long long, const unsigned int);
+@end smallexample
+
+Shift the combined input vectors left by the amount specified by the low-order
+three bits of the third argument, and return the leftmost remaining 128 bits.
+Code using this built-in must be endian-aware.
+
+@findex vec_sldb
+
+Vector Shift Right Double Bit Immediate
+
+@smallexample
+@exdent vector signed char vec_srdb (vector signed char, vector signed char,
+const unsigned int);
+@exdent vector unsigned char vec_srdb (vector unsigned char, vector unsigned char,
+const unsigned int);
+@exdent vector signed short vec_srdb (vector signed short, vector signed short,
+const unsigned int);
+@exdent vector unsigned short vec_srdb (vector unsigned short, vector unsigned short,
+const unsigned int);
+@exdent vector signed int vec_srdb (vector signed int, vector signed int,
+const unsigned int);
+@exdent vector unsigned int vec_srdb (vector unsigned int, vector unsigned int,
+const unsigned int);
+@exdent vector signed long long vec_srdb (vector signed long long,
+vector signed long long, const unsigned int);
+@exdent vector unsigned long long vec_srdb (vector unsigned long long,
+vector unsigned long long, const unsigned int);
+@end smallexample
+
+Shift the combined input vectors right by the amount specified by the low-order
+three bits of the third argument, and return the rightmost remaining 128 bits.
+Code using this built-in must be endian-aware.
+
+@findex vec_srdb
+
 @smallexample
 @exdent vector unsigned long long int
 @exdent vec_pext (vector unsigned long long int, vector unsigned long long int)
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-shift-double-runnable.c b/gcc/testsuite/gcc.target/powerpc/vec-shift-double-runnable.c
new file mode 100644
index 00000000000..13213bd22ee
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-shift-double-runnable.c
@@ -0,0 +1,384 @@
+/* { dg-do run } */
+/* { dg-require-effective-target power10_hw } */
+/* { dg-options "-mdejagnu-cpu=power10" } */
+#include <altivec.h>
+
+#define DEBUG 0
+
+#if DEBUG
+#include <stdio.h>
+#endif
+
+extern void abort (void);
+
+int
+main (int argc, char *argv [])
+{
+  int i;
+
+  vector signed char vresult_char;
+  vector signed char expected_vresult_char;
+  vector signed char src_va_char;
+  vector signed char src_vb_char;
+
+  vector unsigned char vresult_uchar;
+  vector unsigned char expected_vresult_uchar;
+  vector unsigned char src_va_uchar;
+  vector unsigned char src_vb_uchar;
+
+  vector short int vresult_sh;
+  vector short int expected_vresult_sh;
+  vector short int src_va_sh;
+  vector short int src_vb_sh;
+
+  vector short unsigned int vresult_ush;
+  vector short unsigned int expected_vresult_ush;
+  vector short unsigned int src_va_ush;
+  vector short unsigned int src_vb_ush;
+
+  vector int vresult_int;
+  vector int expected_vresult_int;
+  vector int src_va_int;
+  vector int src_vb_int;
+  int src_a_int;
+
+  vector unsigned int vresult_uint;
+  vector unsigned int expected_vresult_uint;
+  vector unsigned int src_va_uint;
+  vector unsigned int src_vb_uint;
+  unsigned int src_a_uint;
+
+  vector long long int vresult_llint;
+  vector long long int expected_vresult_llint;
+  vector long long int src_va_llint;
+  vector long long int src_vb_llint;
+  long long int src_a_llint;
+
+  vector unsigned long long int vresult_ullint;
+  vector unsigned long long int expected_vresult_ullint;
+  vector unsigned long long int src_va_ullint;
+  vector unsigned long long int src_vb_ullint;
+  unsigned int long long src_a_ullint;
+
+  /* Vector shift double left */
+  src_va_char = (vector signed char) { 0, 2, 4, 6, 8, 10, 12, 14,
+				       16, 18, 20, 22, 24, 26, 28, 30 }; 
+  src_vb_char = (vector signed char) { 10, 20, 30, 40, 50, 60, 70, 80, 90,
+					100, 110, 120, 130, 140, 150, 160 };
+  vresult_char = (vector signed char) { 0, 0, 0, 0, 0, 0, 0, 0,
+					  0, 0, 0, 0, 0, 0, 0, 0 };
+  expected_vresult_char = (vector signed char) { 80, 0, 1, 2, 3, 4, 5, 6, 7,
+						 8, 9, 10, 11, 12, 13, 14 }; 
+						 
+  vresult_char = vec_sldb (src_va_char, src_vb_char, 7);
+
+  if (!vec_all_eq (vresult_char,  expected_vresult_char)) {
+#if DEBUG
+    printf("ERROR, vec_sldb (src_va_char_, src_vb_char, 7)\n");
+    for(i = 0; i < 16; i++)
+      printf(" vresult_char[%d] = %d, expected_vresult_char[%d] = %d\n",
+	     i, vresult_char[i], i, expected_vresult_char[i]);
+#else
+    abort();
+#endif
+  }
+
+  src_va_uchar = (vector unsigned char) { 0, 2, 4, 6, 8, 10, 12, 14,
+					  16, 18, 20, 22, 24, 26, 28, 30 }; 
+  src_vb_uchar = (vector unsigned char) { 0, 0, 0, 0, 0, 0, 0, 0,
+					  0, 0, 0, 0, 0, 0, 0, 0 };
+  vresult_uchar = (vector unsigned char) { 0, 0, 0, 0, 0, 0, 0, 0,
+					   0, 0, 0, 0, 0, 0, 0, 0 };
+  expected_vresult_uchar = (vector unsigned char) { 0, 0, 1, 2, 3, 4, 5, 6, 7,
+						    8, 9, 10, 11, 12, 13, 14 };
+						 
+  vresult_uchar = vec_sldb (src_va_uchar, src_vb_uchar, 7);
+
+  if (!vec_all_eq (vresult_uchar,  expected_vresult_uchar)) {
+#if DEBUG
+    printf("ERROR, vec_sldb (src_va_uchar_, src_vb_uchar, 7)\n");
+    for(i = 0; i < 16; i++)
+      printf(" vresult_uchar[%d] = %d, expected_vresult_uchar[%d] = %d\n",
+	     i, vresult_uchar[i], i, expected_vresult_uchar[i]);
+#else
+    abort();
+#endif
+  }
+
+  src_va_sh = (vector short int) { 0, 2, 4, 6, 8, 10, 12, 14 };
+  src_vb_sh = (vector short int) { 0, 0, 0, 0, 0, 0, 0, 0 };
+  vresult_sh = (vector short int) { 0, 0, 0, 0, 0, 0, 0, 0 };
+  expected_vresult_sh = (vector short int) { 0, 2*128, 4*128, 6*128,
+					     8*128, 10*128, 12*128, 14*128 }; 
+						 
+  vresult_sh = vec_sldb (src_va_sh, src_vb_sh, 7);
+
+  if (!vec_all_eq (vresult_sh,  expected_vresult_sh)) {
+#if DEBUG
+    printf("ERROR, vec_sldb (src_va_sh_, src_vb_sh, 7)\n");
+    for(i = 0; i < 8; i++)
+      printf(" vresult_sh[%d] = %d, expected_vresult_sh[%d] = %d\n",
+	     i, vresult_sh[i], i, expected_vresult_sh[i]);
+#else
+    abort();
+#endif
+  }
+
+  src_va_ush = (vector short unsigned int) { 0, 2, 4, 6, 8, 10, 12, 14 };
+  src_vb_ush = (vector short unsigned int) { 10, 20, 30, 40, 50, 60, 70, 80 };
+  vresult_ush = (vector short unsigned int) { 0, 0, 0, 0, 0, 0, 0, 0 };
+  expected_vresult_ush = (vector short unsigned int) { 0, 2*128, 4*128, 6*128,
+						       8*128, 10*128, 12*128,
+						       14*128 }; 
+						 
+  vresult_ush = vec_sldb (src_va_ush, src_vb_ush, 7);
+
+  if (!vec_all_eq (vresult_ush,  expected_vresult_ush)) {
+#if DEBUG
+    printf("ERROR, vec_sldb (src_va_ush_, src_vb_ush, 7)\n");
+    for(i = 0; i < 8; i++)
+      printf(" vresult_ush[%d] = %d, expected_vresult_ush[%d] = %d\n",
+	     i, vresult_ush[i], i, expected_vresult_ush[i]);
+#else
+    abort();
+#endif
+  }
+
+  src_va_int = (vector signed int) { 0, 2, 3, 1 };
+  src_vb_int = (vector signed int) { 0, 0, 0, 0 };
+  vresult_int = (vector signed int) { 0, 0, 0, 0 };
+  expected_vresult_int = (vector signed int) { 0, 2*128, 3*128, 1*128 }; 
+						 
+  vresult_int = vec_sldb (src_va_int, src_vb_int, 7);
+
+  if (!vec_all_eq (vresult_int,  expected_vresult_int)) {
+#if DEBUG
+    printf("ERROR, vec_sldb (src_va_int_, src_vb_int, 7)\n");
+    for(i = 0; i < 4; i++)
+      printf(" vresult_int[%d] = %d, expected_vresult_int[%d] = %d\n",
+	     i, vresult_int[i], i, expected_vresult_int[i]);
+#else
+    abort();
+#endif
+  }
+
+  src_va_uint = (vector unsigned int) { 0, 2, 4, 6 };
+  src_vb_uint = (vector unsigned int) { 10, 20, 30, 40 };
+  vresult_uint = (vector unsigned int) { 0, 0, 0, 0 };
+  expected_vresult_uint = (vector unsigned int) { 0, 2*128, 4*128, 6*128 }; 
+						 
+  vresult_uint = vec_sldb (src_va_uint, src_vb_uint, 7);
+
+  if (!vec_all_eq (vresult_uint,  expected_vresult_uint)) {
+#if DEBUG
+    printf("ERROR, vec_sldb (src_va_uint_, src_vb_uint, 7)\n");
+    for(i = 0; i < 4; i++)
+      printf(" vresult_uint[%d] = %u, expected_vresult_uint[%d] = %u\n",
+	     i, vresult_uint[i], i, expected_vresult_uint[i]);
+#else
+    abort();
+#endif
+  }
+
+  src_va_llint = (vector signed long long int) { 5, 6 };
+  src_vb_llint = (vector signed long long int) { 0, 0 };
+  vresult_llint = (vector signed long long int) { 0, 0 };
+  expected_vresult_llint = (vector signed long long int) { 5*128, 6*128 }; 
+						 
+  vresult_llint = vec_sldb (src_va_llint, src_vb_llint, 7);
+
+  if (!vec_all_eq (vresult_llint,  expected_vresult_llint)) {
+#if DEBUG
+    printf("ERROR, vec_sldb (src_va_llint_, src_vb_llint, 7)\n");
+    for(i = 0; i < 2; i++)
+      printf(" vresult_llint[%d] = %lld, expected_vresult_llint[%d] = %lld\n",
+	     i, vresult_llint[i], i, expected_vresult_llint[i]);
+#else
+    abort();
+#endif
+  }
+
+  src_va_ullint = (vector unsigned long long int) { 54, 26 };
+  src_vb_ullint = (vector unsigned long long int) { 10, 20 };
+  vresult_ullint = (vector unsigned long long int) { 0, 0 };
+  expected_vresult_ullint = (vector unsigned long long int) { 54*128,
+							      26*128 }; 
+						 
+  vresult_ullint = vec_sldb (src_va_ullint, src_vb_ullint, 7);
+
+  if (!vec_all_eq (vresult_ullint,  expected_vresult_ullint)) {
+#if DEBUG
+    printf("ERROR, vec_sldb (src_va_ullint_, src_vb_ullint, 7)\n");
+    for(i = 0; i < 2; i++)
+      printf(" vresult_ullint[%d] = %llu, expected_vresult_ullint[%d] = %llu\n",
+	     i, vresult_ullint[i], i, expected_vresult_ullint[i]);
+#else
+    abort();
+#endif
+  }
+
+  /* Vector shift double right */
+  src_va_char = (vector signed char) { 0, 2, 4, 6, 8, 10, 12, 14,
+				       16, 18, 20, 22, 24, 26, 28, 30 }; 
+  src_vb_char = (vector signed char) { 10, 12, 14, 16, 18, 20, 22, 24, 26,
+					28, 30, 32, 34, 36, 38, 40 };
+  vresult_char = (vector signed char) { 0, 0, 0, 0, 0, 0, 0, 0,
+					  0, 0, 0, 0, 0, 0, 0, 0 };
+  expected_vresult_char = (vector signed char) { 24, 28, 32, 36, 40, 44, 48,
+						 52, 56, 60, 64, 68, 72, 76,
+						 80, 0 }; 
+						 
+  vresult_char = vec_srdb (src_va_char, src_vb_char, 7);
+
+  if (!vec_all_eq (vresult_char,  expected_vresult_char)) {
+#if DEBUG
+    printf("ERROR, vec_srdb (src_va_char_, src_vb_char, 7)\n");
+    for(i = 0; i < 16; i++)
+      printf(" vresult_char[%d] = %d, expected_vresult_char[%d] = %d\n",
+	     i, vresult_char[i], i, expected_vresult_char[i]);
+#else
+    abort();
+#endif
+  }
+
+  src_va_uchar = (vector unsigned char) { 100, 0, 0, 0, 0, 0, 0, 0,
+					  0, 0, 0, 0, 0, 0, 0, 0 };
+  src_vb_uchar = (vector unsigned char) { 0, 2, 4, 6, 8, 10, 12, 14,
+					  16, 18, 20, 22, 24, 26, 28, 30 }; 
+  vresult_uchar = (vector unsigned char) { 0, 0, 0, 0, 0, 0, 0, 0,
+					   0, 0, 0, 0, 0, 0, 0, 0 };
+  expected_vresult_uchar = (vector unsigned char) { 4, 8, 12, 16, 20, 24, 28,
+						    32, 36, 40, 44, 48, 52,
+						    56, 60, 200 };
+						 
+  vresult_uchar = vec_srdb (src_va_uchar, src_vb_uchar, 7);
+
+  if (!vec_all_eq (vresult_uchar,  expected_vresult_uchar)) {
+#if DEBUG
+    printf("ERROR, vec_srdb (src_va_uchar_, src_vb_uchar, 7)\n");
+    for(i = 0; i < 16; i++)
+      printf(" vresult_uchar[%d] = %d, expected_vresult_uchar[%d] = %d\n",
+	     i, vresult_uchar[i], i, expected_vresult_uchar[i]);
+#else
+    abort();
+#endif
+  }
+
+  src_va_sh = (vector short int) { 0, 0, 0, 0, 0, 0, 0, 0 };
+  src_vb_sh = (vector short int) { 0, 2*128, 4*128, 6*128,
+					     8*128, 10*128, 12*128, 14*128 };
+  vresult_sh = (vector short int) { 0, 0, 0, 0, 0, 0, 0, 0 };
+  expected_vresult_sh = (vector short int) { 0, 2, 4, 6, 8, 10, 12, 14 }; 
+						 
+  vresult_sh = vec_srdb (src_va_sh, src_vb_sh, 7);
+
+  if (!vec_all_eq (vresult_sh,  expected_vresult_sh)) {
+#if DEBUG
+    printf("ERROR, vec_srdb (src_va_sh_, src_vb_sh, 7)\n");
+    for(i = 0; i < 8; i++)
+      printf(" vresult_sh[%d] = %d, expected_vresult_sh[%d] = %d\n",
+	     i, vresult_sh[i], i, expected_vresult_sh[i]);
+#else
+    abort();
+#endif
+  }
+
+  src_va_ush = (vector short unsigned int) { 0, 20, 30, 40, 50, 60, 70, 80 };
+  src_vb_ush = (vector short unsigned int) { 0, 2*128, 4*128, 6*128,
+					     8*128, 10*128, 12*128, 14*128 };
+  vresult_ush = (vector short unsigned int) { 0, 0, 0, 0, 0, 0, 0, 0 };
+  expected_vresult_ush = (vector short unsigned int) { 0, 2, 4, 6, 8, 10,
+						       12, 14 }; 
+						 
+  vresult_ush = vec_srdb (src_va_ush, src_vb_ush, 7);
+
+  if (!vec_all_eq (vresult_ush,  expected_vresult_ush)) {
+#if DEBUG
+    printf("ERROR, vec_srdb (src_va_ush_, src_vb_ush, 7)\n");
+    for(i = 0; i < 8; i++)
+      printf(" vresult_ush[%d] = %d, expected_vresult_ush[%d] = %d\n",
+	     i, vresult_ush[i], i, expected_vresult_ush[i]);
+#else
+    abort();
+#endif
+  }
+
+  src_va_int = (vector signed int) { 0, 0, 0, 0 };
+  src_vb_int = (vector signed int) { 0, 2*128, 3*128, 1*128 };
+  vresult_int = (vector signed int) { 0, 0, 0, 0 };
+  expected_vresult_int = (vector signed int) { 0, 2, 3, 1  }; 
+						 
+  vresult_int = vec_srdb (src_va_int, src_vb_int, 7);
+
+  if (!vec_all_eq (vresult_int,  expected_vresult_int)) {
+#if DEBUG
+    printf("ERROR, vec_srdb (src_va_int_, src_vb_int, 7)\n");
+    for(i = 0; i < 4; i++)
+      printf(" vresult_int[%d] = %d, expected_vresult_int[%d] = %d\n",
+	     i, vresult_int[i], i, expected_vresult_int[i]);
+#else
+    abort();
+#endif
+  }
+
+  src_va_uint = (vector unsigned int) { 0, 20, 30, 40 };
+  src_vb_uint = (vector unsigned int) { 128, 2*128, 4*128, 6*128 };
+  vresult_uint = (vector unsigned int) { 0, 0, 0, 0 };
+  expected_vresult_uint = (vector unsigned int) { 1, 2, 4, 6 }; 
+						 
+  vresult_uint = vec_srdb (src_va_uint, src_vb_uint, 7);
+
+  if (!vec_all_eq (vresult_uint,  expected_vresult_uint)) {
+#if DEBUG
+    printf("ERROR, vec_srdb (src_va_uint_, src_vb_uint, 7)\n");
+    for(i = 0; i < 4; i++)
+      printf(" vresult_uint[%d] = %u, expected_vresult_uint[%d] = %u\n",
+	     i, vresult_uint[i], i, expected_vresult_uint[i]);
+#else
+    abort();
+#endif
+  }
+
+  src_va_llint = (vector signed long long int) { 0, 0 };
+  src_vb_llint = (vector signed long long int) { 5*128, 6*128 };
+  vresult_llint = (vector signed long long int) { 0, 0 };
+  expected_vresult_llint = (vector signed long long int) { 5, 6 }; 
+						 
+  vresult_llint = vec_srdb (src_va_llint, src_vb_llint, 7);
+
+  if (!vec_all_eq (vresult_llint,  expected_vresult_llint)) {
+#if DEBUG
+    printf("ERROR, vec_srdb (src_va_llint_, src_vb_llint, 7)\n");
+    for(i = 0; i < 2; i++)
+      printf(" vresult_llint[%d] = %lld, expected_vresult_llint[%d] = %lld\n",
+	     i, vresult_llint[i], i, expected_vresult_llint[i]);
+#else
+    abort();
+#endif
+  }
+
+  src_va_ullint = (vector unsigned long long int) { 0, 0 };
+  src_vb_ullint = (vector unsigned long long int) { 54*128, 26*128 };
+  vresult_ullint = (vector unsigned long long int) { 0, 0 };
+  expected_vresult_ullint = (vector unsigned long long int) { 54, 26 }; 
+
+  vresult_ullint = vec_srdb (src_va_ullint, src_vb_ullint, 7);
+
+  if (!vec_all_eq (vresult_ullint,  expected_vresult_ullint)) {
+#if DEBUG
+    printf("ERROR, vec_srdb (src_va_ullint_, src_vb_ullint, 7)\n");
+    for(i = 0; i < 2; i++)
+      printf(" vresult_ullint[%d] = %llu, expected_vresult_ullint[%d] = %llu\n",
+	     i, vresult_ullint[i], i, expected_vresult_ullint[i]);
+#else
+    abort();
+#endif
+  }
+
+  return 0;
+}
+
+/* { dg-final { scan-assembler-times {\mvsldbi\M} 6 } } */
+/* { dg-final { scan-assembler-times {\mvsrdbi\M} 6 } } */
+
+
-- 
2.17.1




* Re: [PATCH 0/6 ver 4]  ] Permute Class Operations
@ 2020-07-08 19:59 Carl Love
  2020-07-09 16:02 ` will schmidt
  2020-07-13 14:30 ` Segher Boessenkool
  0 siblings, 2 replies; 18+ messages in thread
From: Carl Love @ 2020-07-08 19:59 UTC (permalink / raw)
  To: segher, dje.gcc, gcc-patches, Will Schmidt

[PATCH 3/6] rs6000, Add vector replace builtin support

----------------------------------
V4 Fixes:

   Rebased on mainline.  Changed FUTURE to P10 in code and ChangeLog.
   Set DEBUG to 0 in vec-replace-word-runnable.c test program.
   Fixed too long lines in ChangeLog.

----------------------------------
V3 fixes:
   Fixed bad word breaks in ChangeLog.
   Replace spaces with tabs in ChangeLog.

------------------------------------
v2 fixes:

change log entries config/rs6000/vsx.md, config/rs6000/rs6000-builtin.def,
config/rs6000/rs6000-call.c.

gcc/config/rs6000/rs6000-call.c: fixed if check for 3rd arg between 0 and 3
                                 fixed if check for 3rd arg between 0 and 12

gcc/config/rs6000/vsx.md: removed REPLACE_ELT_atr definition and used
                          VS_scalar instead.
                          removed REPLACE_ELT_inst definition and used
			  <mode> instead
                          fixed spelling mistake on Endianness.
                          fixed indenting for vreplace_elt_<mode>

-----------------------------------

GCC maintainers:

The following patch adds support for builtins vec_replace_elt and
vec_replace_unaligned.

The patch has been compiled and tested on

  powerpc64le-unknown-linux-gnu (Power 9 LE)

and mambo with no regression errors.

Please let me know if this patch is acceptable for the mainline
branch.  Thanks.

                         Carl Love

-------------------------------------------------------

gcc/ChangeLog

2020-07-06  Carl Love  <cel@us.ibm.com>

	* config/rs6000/altivec.h: Add define for vec_replace_elt and
	vec_replace_unaligned.
	* config/rs6000/vsx.md (UNSPEC_REPLACE_ELT, UNSPEC_REPLACE_UN): New.
	(REPLACE_ELT): New mode iterator.
	(REPLACE_ELT_char, REPLACE_ELT_sh, REPLACE_ELT_max): New mode
	attributes.
	(vreplace_elt_<mode>, vreplace_un_<mode>, vreplace_elt_<mode>_inst):
	New.
	* config/rs6000/rs6000-builtin.def (VREPLACE_ELT_V4SI,
	VREPLACE_ELT_UV4SI, VREPLACE_ELT_V4SF, VREPLACE_ELT_V2DI,
	VREPLACE_ELT_UV2DI, VREPLACE_ELT_V2DF, VREPLACE_UN_V4SI,
	VREPLACE_UN_UV4SI, VREPLACE_UN_V4SF, VREPLACE_UN_V2DI,
	VREPLACE_UN_UV2DI, VREPLACE_UN_V2DF): New.
	(REPLACE_ELT, REPLACE_UN): New.
	* config/rs6000/rs6000-call.c (P10_BUILTIN_VEC_REPLACE_ELT,
	P10_BUILTIN_VEC_REPLACE_UN): New.
	(rs6000_expand_ternop_builtin): Add 3rd argument checks for
	CODE_FOR_vreplace_elt_v4si, CODE_FOR_vreplace_elt_v4sf,
	CODE_FOR_vreplace_un_v4si, CODE_FOR_vreplace_un_v4sf.
	(builtin_function_type) [P10_BUILTIN_VREPLACE_ELT_UV4SI,
	P10_BUILTIN_VREPLACE_ELT_UV2DI, P10_BUILTIN_VREPLACE_UN_UV4SI,
	P10_BUILTIN_VREPLACE_UN_UV2DI]: New cases.
	* doc/extend.texi: Add description for vec_replace_elt and
	vec_replace_unaligned builtins.

gcc/testsuite/ChangeLog

2020-07-06  Carl Love  <cel@us.ibm.com>

	* gcc.target/powerpc/vec-replace-word-runnable.c: Add new test.
---
 gcc/config/rs6000/altivec.h                   |   2 +
 gcc/config/rs6000/rs6000-builtin.def          |  16 +
 gcc/config/rs6000/rs6000-call.c               |  61 ++++
 gcc/config/rs6000/vsx.md                      |  60 ++++
 gcc/doc/extend.texi                           |  50 +++
 .../powerpc/vec-replace-word-runnable.c       | 289 ++++++++++++++++++
 6 files changed, 478 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-replace-word-runnable.c

diff --git a/gcc/config/rs6000/altivec.h b/gcc/config/rs6000/altivec.h
index 0563853c03f..560c43cfc99 100644
--- a/gcc/config/rs6000/altivec.h
+++ b/gcc/config/rs6000/altivec.h
@@ -701,6 +701,8 @@ __altivec_scalar_pred(vec_any_nle,
 #define vec_extracth(a, b, c)	__builtin_vec_extracth (a, b, c)
 #define vec_insertl(a, b, c)   __builtin_vec_insertl (a, b, c)
 #define vec_inserth(a, b, c)   __builtin_vec_inserth (a, b, c)
+#define vec_replace_elt(a, b, c)       __builtin_vec_replace_elt (a, b, c)
+#define vec_replace_unaligned(a, b, c) __builtin_vec_replace_un (a, b, c)
 
 #define vec_gnb(a, b)	__builtin_vec_gnb (a, b)
 #define vec_clrl(a, b)	__builtin_vec_clrl (a, b)
diff --git a/gcc/config/rs6000/rs6000-builtin.def b/gcc/config/rs6000/rs6000-builtin.def
index e73d144c1cc..e22b3e4d53b 100644
--- a/gcc/config/rs6000/rs6000-builtin.def
+++ b/gcc/config/rs6000/rs6000-builtin.def
@@ -2724,6 +2724,20 @@ BU_P10V_3 (VINSERTVPRBR, "vinsvubvrx", CONST, vinsertvr_v16qi)
 BU_P10V_3 (VINSERTVPRHR, "vinsvuhvrx", CONST, vinsertvr_v8hi)
 BU_P10V_3 (VINSERTVPRWR, "vinsvuwvrx", CONST, vinsertvr_v4si)
 
+BU_P10V_3 (VREPLACE_ELT_V4SI, "vreplace_v4si", CONST, vreplace_elt_v4si)
+BU_P10V_3 (VREPLACE_ELT_UV4SI, "vreplace_uv4si", CONST, vreplace_elt_v4si)
+BU_P10V_3 (VREPLACE_ELT_V4SF, "vreplace_v4sf", CONST, vreplace_elt_v4sf)
+BU_P10V_3 (VREPLACE_ELT_V2DI, "vreplace_v2di", CONST, vreplace_elt_v2di)
+BU_P10V_3 (VREPLACE_ELT_UV2DI, "vreplace_uv2di", CONST, vreplace_elt_v2di)
+BU_P10V_3 (VREPLACE_ELT_V2DF, "vreplace_v2df", CONST, vreplace_elt_v2df)
+
+BU_P10V_3 (VREPLACE_UN_V4SI, "vreplace_un_v4si", CONST, vreplace_un_v4si)
+BU_P10V_3 (VREPLACE_UN_UV4SI, "vreplace_un_uv4si", CONST, vreplace_un_v4si)
+BU_P10V_3 (VREPLACE_UN_V4SF, "vreplace_un_v4sf", CONST, vreplace_un_v4sf)
+BU_P10V_3 (VREPLACE_UN_V2DI, "vreplace_un_v2di", CONST, vreplace_un_v2di)
+BU_P10V_3 (VREPLACE_UN_UV2DI, "vreplace_un_uv2di", CONST, vreplace_un_v2di)
+BU_P10V_3 (VREPLACE_UN_V2DF, "vreplace_un_v2df", CONST, vreplace_un_v2df)
+
 BU_P10V_1 (VSTRIBR, "vstribr", CONST, vstrir_v16qi)
 BU_P10V_1 (VSTRIHR, "vstrihr", CONST, vstrir_v8hi)
 BU_P10V_1 (VSTRIBL, "vstribl", CONST, vstril_v16qi)
@@ -2745,6 +2759,8 @@ BU_P10_OVERLOAD_3 (EXTRACTL, "extractl")
 BU_P10_OVERLOAD_3 (EXTRACTH, "extracth")
 BU_P10_OVERLOAD_3 (INSERTL, "insertl")
 BU_P10_OVERLOAD_3 (INSERTH, "inserth")
+BU_P10_OVERLOAD_3 (REPLACE_ELT, "replace_elt")
+BU_P10_OVERLOAD_3 (REPLACE_UN, "replace_un")
 
 BU_P10_OVERLOAD_1 (VSTRIR, "strir")
 BU_P10_OVERLOAD_1 (VSTRIL, "stril")
diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index 820b361c0f6..d5d294fd940 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -5633,6 +5633,36 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
     RS6000_BTI_unsigned_V4SI, RS6000_BTI_UINTQI },
 
+  { P10_BUILTIN_VEC_REPLACE_ELT, P10_BUILTIN_VREPLACE_ELT_UV4SI,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
+    RS6000_BTI_UINTSI, RS6000_BTI_UINTQI },
+  { P10_BUILTIN_VEC_REPLACE_ELT, P10_BUILTIN_VREPLACE_ELT_V4SI,
+    RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_INTSI, RS6000_BTI_INTQI },
+  { P10_BUILTIN_VEC_REPLACE_ELT, P10_BUILTIN_VREPLACE_ELT_V4SF,
+    RS6000_BTI_V4SF, RS6000_BTI_V4SF, RS6000_BTI_float, RS6000_BTI_INTQI },
+  { P10_BUILTIN_VEC_REPLACE_ELT, P10_BUILTIN_VREPLACE_ELT_UV2DI,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
+    RS6000_BTI_UINTDI, RS6000_BTI_UINTQI },
+  { P10_BUILTIN_VEC_REPLACE_ELT, P10_BUILTIN_VREPLACE_ELT_V2DI,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_INTDI, RS6000_BTI_INTQI },
+  { P10_BUILTIN_VEC_REPLACE_ELT, P10_BUILTIN_VREPLACE_ELT_V2DF,
+    RS6000_BTI_V2DF, RS6000_BTI_V2DF, RS6000_BTI_double, RS6000_BTI_INTQI },
+
+  { P10_BUILTIN_VEC_REPLACE_UN, P10_BUILTIN_VREPLACE_UN_UV4SI,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
+    RS6000_BTI_UINTSI, RS6000_BTI_UINTQI },
+  { P10_BUILTIN_VEC_REPLACE_UN, P10_BUILTIN_VREPLACE_UN_V4SI,
+    RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_INTSI, RS6000_BTI_INTQI },
+  { P10_BUILTIN_VEC_REPLACE_UN, P10_BUILTIN_VREPLACE_UN_V4SF,
+    RS6000_BTI_V4SF, RS6000_BTI_V4SF, RS6000_BTI_float, RS6000_BTI_INTQI },
+  { P10_BUILTIN_VEC_REPLACE_UN, P10_BUILTIN_VREPLACE_UN_UV2DI,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
+    RS6000_BTI_UINTDI, RS6000_BTI_UINTQI },
+  { P10_BUILTIN_VEC_REPLACE_UN, P10_BUILTIN_VREPLACE_UN_V2DI,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_INTDI, RS6000_BTI_INTQI },
+  { P10_BUILTIN_VEC_REPLACE_UN, P10_BUILTIN_VREPLACE_UN_V2DF,
+    RS6000_BTI_V2DF, RS6000_BTI_V2DF, RS6000_BTI_double, RS6000_BTI_INTQI },
+
   { P10_BUILTIN_VEC_VSTRIL, P10_BUILTIN_VSTRIBL,
     RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI, 0, 0 },
   { P10_BUILTIN_VEC_VSTRIL, P10_BUILTIN_VSTRIBL,
@@ -10005,6 +10035,33 @@ rs6000_expand_quaternop_builtin (enum insn_code icode, tree exp, rtx target)
 	  return CONST0_RTX (tmode);
 	}
     }
+  else if (icode == CODE_FOR_vreplace_elt_v4si
+	   || icode == CODE_FOR_vreplace_elt_v4sf)
+   {
+     /* Check whether the 3rd argument is an integer constant in the range
+	0 to 3 inclusive.  */
+     STRIP_NOPS (arg2);
+     if (TREE_CODE (arg2) != INTEGER_CST
+	 || !IN_RANGE (TREE_INT_CST_LOW (arg2), 0, 3))
+	{
+	  error ("argument 3 must be in the range 0 to 3");
+	  return CONST0_RTX (tmode);
+	}
+   }
+
+  else if (icode == CODE_FOR_vreplace_un_v4si
+	   || icode == CODE_FOR_vreplace_un_v4sf)
+   {
+     /* Check whether the 3rd argument is an integer constant in the range
+	0 to 12 inclusive.  */
+     STRIP_NOPS (arg2);
+     if (TREE_CODE (arg2) != INTEGER_CST
+	 || !IN_RANGE (TREE_INT_CST_LOW (arg2), 0, 12))
+	{
+	  error ("argument 3 must be in the range 0 to 12");
+	  return CONST0_RTX (tmode);
+	}
+   }
 
   if (target == 0
       || GET_MODE (target) != tmode
@@ -13839,6 +13896,10 @@ builtin_function_type (machine_mode mode_ret, machine_mode mode_arg0,
     case P10_BUILTIN_VINSERTVPRBL:
     case P10_BUILTIN_VINSERTVPRHL:
     case P10_BUILTIN_VINSERTVPRWL:
+    case P10_BUILTIN_VREPLACE_ELT_UV4SI:
+    case P10_BUILTIN_VREPLACE_ELT_UV2DI:
+    case P10_BUILTIN_VREPLACE_UN_UV4SI:
+    case P10_BUILTIN_VREPLACE_UN_UV2DI:
       h.uns_p[0] = 1;
       h.uns_p[1] = 1;
       h.uns_p[2] = 1;
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index e9d45d1dcfd..5601dbaadad 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -351,6 +351,8 @@
    UNSPEC_EXTRACTR
    UNSPEC_INSERTL
    UNSPEC_INSERTR
+   UNSPEC_REPLACE_ELT
+   UNSPEC_REPLACE_UN
   ])
 
 (define_int_iterator XVCVBF16	[UNSPEC_VSX_XVCVSPBF16
@@ -362,6 +364,15 @@
 ;; Like VI, defined in vector.md, but add ISA 2.07 integer vector ops
 (define_mode_iterator VI2 [V4SI V8HI V16QI V2DI])
 
+;; Vector replace_elt iterator/attr for 32-bit and 64-bit elements
+(define_mode_iterator REPLACE_ELT [V4SI V4SF V2DI V2DF])
+(define_mode_attr REPLACE_ELT_char [(V4SI "w") (V4SF "w")
+				    (V2DI  "d") (V2DF "d")])
+(define_mode_attr REPLACE_ELT_sh [(V4SI "2") (V4SF "2")
+				  (V2DI  "3") (V2DF "3")])
+(define_mode_attr REPLACE_ELT_max [(V4SI "12") (V4SF "12")
+				   (V2DI  "8") (V2DF "8")])
+
 ;; VSX moves
 
 ;; The patterns for LE permuted loads and stores come before the general
@@ -3975,6 +3986,55 @@
  "vins<wd>rx %0,%1,%2"
  [(set_attr "type" "vecsimple")])
 
+(define_expand "vreplace_elt_<mode>"
+  [(set (match_operand:REPLACE_ELT 0 "register_operand")
+  (unspec:REPLACE_ELT [(match_operand:REPLACE_ELT 1 "register_operand")
+		       (match_operand:<VS_scalar> 2 "register_operand")
+		       (match_operand:QI 3 "const_0_to_3_operand")]
+		      UNSPEC_REPLACE_ELT))]
+ "TARGET_POWER10"
+{
+   int index;
+   /* Immediate value is the element index; convert it to a byte index and
+      adjust for endianness if needed.  */
+   if (BYTES_BIG_ENDIAN)
+     index = INTVAL (operands[3]) << <REPLACE_ELT_sh>;
+
+   else
+     index = <REPLACE_ELT_max> - (INTVAL (operands[3]) << <REPLACE_ELT_sh>);
+
+   emit_insn (gen_vreplace_elt_<mode>_inst (operands[0], operands[1],
+					    operands[2],
+					    GEN_INT (index)));
+   DONE;
+ }
+[(set_attr "type" "vecsimple")])
+
+(define_expand "vreplace_un_<mode>"
+ [(set (match_operand:REPLACE_ELT 0 "register_operand")
+ (unspec:REPLACE_ELT [(match_operand:REPLACE_ELT 1 "register_operand")
+		      (match_operand:<VS_scalar> 2 "register_operand")
+		      (match_operand:QI 3 "const_0_to_12_operand")]
+		     UNSPEC_REPLACE_UN))]
+ "TARGET_POWER10"
+{
+   /* Immediate value is the byte index in big-endian numbering.  */
+   emit_insn (gen_vreplace_elt_<mode>_inst (operands[0], operands[1],
+					    operands[2], operands[3]));
+   DONE;
+ }
+[(set_attr "type" "vecsimple")])
+
+(define_insn "vreplace_elt_<mode>_inst"
+ [(set (match_operand:REPLACE_ELT 0 "register_operand" "=v")
+  (unspec:REPLACE_ELT [(match_operand:REPLACE_ELT 1 "register_operand" "0")
+		       (match_operand:<VS_scalar> 2 "register_operand" "r")
+		       (match_operand:QI 3 "const_0_to_12_operand" "n")]
+		      UNSPEC_REPLACE_ELT))]
+ "TARGET_POWER10"
+ "vins<REPLACE_ELT_char> %0,%2,%3"
+ [(set_attr "type" "vecsimple")])
+
 ;; VSX_EXTRACT optimizations
 ;; Optimize double d = (double) vec_extract (vi, <n>)
 ;; Get the element into the top position and use XVCVSWDP/XVCVUWDP
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index e643346a160..b9cbd136316 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -21062,6 +21062,56 @@ This is a limitation of the bi-endian vector programming model consistent with
 the limitation on vec_perm, for example.
 @findex vec_inserth
 
+Vector Replace Element
+@smallexample
+@exdent vector signed int vec_replace_elt (vector signed int, signed int,
+const int);
+@exdent vector unsigned int vec_replace_elt (vector unsigned int,
+unsigned int, const int);
+@exdent vector float vec_replace_elt (vector float, float, const int);
+@exdent vector signed long long vec_replace_elt (vector signed long long,
+signed long long, const int);
+@exdent vector unsigned long long vec_replace_elt (vector unsigned long long,
+unsigned long long, const int);
+@exdent vector double vec_replace_elt (vector double, double, const int);
+@end smallexample
+The third argument (constrained to [0,3]) identifies the natural-endian
+element number of the first argument that will be replaced by the second
+argument to produce the result.  The other elements of the first argument will
+remain unchanged in the result.
+
+If it's desirable to insert a word at an unaligned position, use
+vec_replace_unaligned instead.
+
+@findex vec_replace_elt
+
+Vector Replace Unaligned
+@smallexample
+@exdent vector unsigned char vec_replace_unaligned (vector unsigned char,
+signed int, const int);
+@exdent vector unsigned char vec_replace_unaligned (vector unsigned char,
+unsigned int, const int);
+@exdent vector unsigned char vec_replace_unaligned (vector unsigned char,
+float, const int);
+@exdent vector unsigned char vec_replace_unaligned (vector unsigned char,
+signed long long, const int);
+@exdent vector unsigned char vec_replace_unaligned (vector unsigned char,
+unsigned long long, const int);
+@exdent vector unsigned char vec_replace_unaligned (vector unsigned char,
+double, const int);
+@end smallexample
+
+The second argument replaces a portion of the first argument to produce the
+result, with the rest of the first argument unchanged in the result.  The
+third argument identifies the byte index (using left-to-right, or big-endian
+order) where the high-order byte of the second argument will be placed, with
+the remaining bytes of the second argument placed naturally "to the right"
+of the high-order byte.
+
+The programmer is responsible for understanding the endianness issues involved
+with the first argument and the result.
+@findex vec_replace_unaligned
+
 @smallexample
 @exdent vector unsigned long long int
 @exdent vec_pext (vector unsigned long long int, vector unsigned long long int)
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-replace-word-runnable.c b/gcc/testsuite/gcc.target/powerpc/vec-replace-word-runnable.c
new file mode 100644
index 00000000000..94af2106482
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-replace-word-runnable.c
@@ -0,0 +1,289 @@
+/* { dg-do run } */
+/* { dg-require-effective-target power10_hw } */
+/* { dg-options "-mdejagnu-cpu=power10" } */
+
+#include <altivec.h>
+
+#define DEBUG 0
+
+#if DEBUG
+#include <stdio.h>
+#endif
+
+extern void abort (void);
+
+int
+main (int argc, char *argv [])
+{
+  int i;
+  unsigned char ch;
+  unsigned int index;
+
+  vector unsigned int vresult_uint;
+  vector unsigned int expected_vresult_uint;
+  vector unsigned int src_va_uint;
+  vector unsigned int src_vb_uint;
+  unsigned int src_a_uint;
+
+  vector int vresult_int;
+  vector int expected_vresult_int;
+  vector int src_va_int;
+  vector int src_vb_int;
+  int src_a_int;
+
+  vector unsigned long long int vresult_ullint;
+  vector unsigned long long int expected_vresult_ullint;
+  vector unsigned long long int src_va_ullint;
+  vector unsigned long long int src_vb_ullint;
+  unsigned int long long src_a_ullint;
+
+  vector long long int vresult_llint;
+  vector long long int expected_vresult_llint;
+  vector long long int src_va_llint;
+  vector long long int src_vb_llint;
+  long long int src_a_llint;
+
+  vector float vresult_float;
+  vector float expected_vresult_float;
+  vector float src_va_float;
+  float src_a_float;
+
+  vector double vresult_double;
+  vector double expected_vresult_double;
+  vector double src_va_double;
+  double src_a_double;
+
+  /* Vector replace 32-bit element */
+  src_a_uint = 345;
+  src_va_uint = (vector unsigned int) { 0, 1, 2, 3 };
+  vresult_uint = (vector unsigned int) { 0, 0, 0, 0 };
+  expected_vresult_uint = (vector unsigned int) { 0, 1, 345, 3 };
+						 
+  vresult_uint = vec_replace_elt (src_va_uint, src_a_uint, 2);
+
+  if (!vec_all_eq (vresult_uint,  expected_vresult_uint)) {
+#if DEBUG
+    printf("ERROR, vec_replace_elt (src_vb_uint, src_va_uint, index)\n");
+    for(i = 0; i < 4; i++)
+      printf(" vresult_uint[%d] = %d, expected_vresult_uint[%d] = %d\n",
+	     i, vresult_uint[i], i, expected_vresult_uint[i]);
+#else
+    abort();
+#endif
+  }
+
+  src_a_int = 234;
+  src_va_int = (vector int) { 0, 1, 2, 3 };
+  vresult_int = (vector int) { 0, 0, 0, 0 };
+  expected_vresult_int = (vector int) { 0, 234, 2, 3 };
+						 
+  vresult_int = vec_replace_elt (src_va_int, src_a_int, 1);
+
+  if (!vec_all_eq (vresult_int,  expected_vresult_int)) {
+#if DEBUG
+    printf("ERROR, vec_replace_elt (src_vb_int, src_va_int, index)\n");
+    for(i = 0; i < 4; i++)
+      printf(" vresult_int[%d] = %d, expected_vresult_int[%d] = %d\n",
+	     i, vresult_int[i], i, expected_vresult_int[i]);
+#else
+    abort();
+#endif
+  }
+  
+  src_a_float = 34.0;
+  src_va_float = (vector float) { 0.0, 10.0, 20.0, 30.0 };
+  vresult_float = (vector float) { 0.0, 0.0, 0.0, 0.0 };
+  expected_vresult_float = (vector float) { 0.0, 34.0, 20.0, 30.0 };
+						 
+  vresult_float = vec_replace_elt (src_va_float, src_a_float, 1);
+
+  if (!vec_all_eq (vresult_float,  expected_vresult_float)) {
+#if DEBUG
+    printf("ERROR, vec_replace_elt (src_vb_float, src_va_float, index)\n");
+    for(i = 0; i < 4; i++)
+      printf(" vresult_float[%d] = %f, expected_vresult_float[%d] = %f\n",
+	     i, vresult_float[i], i, expected_vresult_float[i]);
+#else
+    abort();
+#endif
+  }
+
+  /* Vector replace 64-bit element */
+  src_a_ullint = 456;
+  src_va_ullint = (vector unsigned long long int) { 0, 1 };
+  vresult_ullint = (vector unsigned long long int) { 0, 0 };
+  expected_vresult_ullint = (vector unsigned long long int) { 0, 456 };
+						 
+  vresult_ullint = vec_replace_elt (src_va_ullint, src_a_ullint, 1);
+
+  if (!vec_all_eq (vresult_ullint,  expected_vresult_ullint)) {
+#if DEBUG
+    printf("ERROR, vec_replace_elt (src_vb_ullint, src_va_ullint, index)\n");
+    for(i = 0; i < 2; i++)
+      printf(" vresult_ullint[%d] = %llu, expected_vresult_ullint[%d] = %llu\n",
+	     i, vresult_ullint[i], i, expected_vresult_ullint[i]);
+#else
+    abort();
+#endif
+  }
+
+  src_a_llint = 678;
+  src_va_llint = (vector long long int) { 0, 1 };
+  vresult_llint = (vector long long int) { 0, 0 };
+  expected_vresult_llint = (vector long long int) { 0, 678 };
+						 
+  vresult_llint = vec_replace_elt (src_va_llint, src_a_llint, 1);
+
+  if (!vec_all_eq (vresult_llint,  expected_vresult_llint)) {
+#if DEBUG
+    printf("ERROR, vec_replace_elt (src_vb_llint, src_va_llint, index)\n");
+    for(i = 0; i < 2; i++)
+      printf(" vresult_llint[%d] = %lld, expected_vresult_llint[%d] = %lld\n",
+	     i, vresult_llint[i], i, expected_vresult_llint[i]);
+#else
+    abort();
+#endif
+  }
+  
+  src_a_double = 678.0;
+  src_va_double = (vector double) { 0.0, 50.0 };
+  vresult_double = (vector double) { 0.0, 0.0 };
+  expected_vresult_double = (vector double) { 0.0, 678.0 };
+						 
+  vresult_double = vec_replace_elt (src_va_double, src_a_double, 1);
+
+  if (!vec_all_eq (vresult_double,  expected_vresult_double)) {
+#if DEBUG
+    printf("ERROR, vec_replace_elt (src_vb_double, src_va_double, index)\n");
+    for(i = 0; i < 2; i++)
+      printf(" vresult_double[%d] = %f, expected_vresult_double[%d] = %f\n",
+	     i, vresult_double[i], i, expected_vresult_double[i]);
+#else
+    abort();
+#endif
+  }
+
+
+  /* Vector replace 32-bit element, unaligned */
+  src_a_uint = 345;
+  src_va_uint = (vector unsigned int) { 1, 2, 0, 0 };
+  vresult_uint = (vector unsigned int) { 0, 0, 0, 0 };
+  /* Byte index 3 will overwrite part of elements 2 and 3 */
+  expected_vresult_uint = (vector unsigned int) { 1, 2, 345*256, 0 };
+						 
+  vresult_uint = vec_replace_unaligned (src_va_uint, src_a_uint, 3);
+
+  if (!vec_all_eq (vresult_uint,  expected_vresult_uint)) {
+#if DEBUG
+    printf("ERROR, vec_replace_unaligned (src_vb_uint, src_va_uint, index)\n");
+    for(i = 0; i < 4; i++)
+      printf(" vresult_uint[%d] = %d, expected_vresult_uint[%d] = %d\n",
+	     i, vresult_uint[i], i, expected_vresult_uint[i]);
+#else
+    abort();
+#endif
+  }
+
+  src_a_int = 234;
+  src_va_int = (vector int) { 1, 0, 3, 4 };
+  vresult_int = (vector int) { 0, 0, 0, 0 };
+  /* Byte index 7 will overwrite part of elements 1 and 2 */
+  expected_vresult_int = (vector int) { 1, 234*256, 0, 4 };
+						 
+  vresult_int = vec_replace_unaligned (src_va_int, src_a_int, 7);
+
+  if (!vec_all_eq (vresult_int,  expected_vresult_int)) {
+#if DEBUG
+    printf("ERROR, vec_replace_unaligned (src_vb_int, src_va_int, index)\n");
+    for(i = 0; i < 4; i++)
+      printf(" vresult_int[%d] = %d, expected_vresult_int[%d] = %d\n",
+	     i, vresult_int[i], i, expected_vresult_int[i]);
+#else
+    abort();
+#endif
+  }
+
+  src_a_float = 34.0;
+  src_va_float = (vector float) { 0.0, 10.0, 20.0, 30.0 };
+  vresult_float = (vector float) { 0.0, 0.0, 0.0, 0.0 };
+  expected_vresult_float = (vector float) { 0.0, 34.0, 20.0, 30.0 };
+						 
+  vresult_float = vec_replace_unaligned (src_va_float, src_a_float, 8);
+
+  if (!vec_all_eq (vresult_float,  expected_vresult_float)) {
+#if DEBUG
+    printf("ERROR, vec_replace_unaligned (src_vb_float, src_va_float, index)\n");
+    for(i = 0; i < 4; i++)
+      printf(" vresult_float[%d] = %f, expected_vresult_float[%d] = %f\n",
+	     i, vresult_float[i], i, expected_vresult_float[i]);
+#else
+    abort();
+#endif
+  }
+
+  /* Vector replace 64-bit element, unaligned  */
+  src_a_ullint = 456;
+  src_va_ullint = (vector unsigned long long int) { 0, 0x222 };
+  vresult_ullint = (vector unsigned long long int) { 0, 0 };
+  expected_vresult_ullint = (vector unsigned long long int) { 456*256,
+							      0x200 };
+						 
+  /* Byte index 7 overwrites the LSB of element 1 and most of element 0.  */
+  vresult_ullint = vec_replace_unaligned (src_va_ullint, src_a_ullint, 7);
+
+  if (!vec_all_eq (vresult_ullint,  expected_vresult_ullint)) {
+#if DEBUG
+    printf("ERROR, vec_replace_unaligned (src_vb_ullint, src_va_ullint, index)\n");
+    for(i = 0; i < 2; i++)
+      printf(" vresult_ullint[%d] = %llu, expected_vresult_ullint[%d] = %llu\n",
+	     i, vresult_ullint[i], i, expected_vresult_ullint[i]);
+#else
+    abort();
+#endif
+  }
+
+  src_a_llint = 678;
+  src_va_llint = (vector long long int) { 0, 0x101 };
+  vresult_llint = (vector long long int) { 0, 0 };
+  /* Byte index 7 overwrites the LSB of element 1 and most of element 0.  */
+  expected_vresult_llint = (vector long long int) { 678*256, 0x100 };
+						 
+  vresult_llint = vec_replace_unaligned (src_va_llint, src_a_llint, 7);
+
+  if (!vec_all_eq (vresult_llint,  expected_vresult_llint)) {
+#if DEBUG
+    printf("ERROR, vec_replace_unaligned (src_vb_llint, src_va_llint, index)\n");
+    for(i = 0; i < 2; i++)
+      printf(" vresult_llint[%d] = %lld, expected_vresult_llint[%d] = %lld\n",
+	     i, vresult_llint[i], i, expected_vresult_llint[i]);
+#else
+    abort();
+#endif
+  }
+  
+  src_a_double = 678.0;
+  src_va_double = (vector double) { 0.0, 50.0 };
+  vresult_double = (vector double) { 0.0, 0.0 };
+  expected_vresult_double = (vector double) { 0.0, 678.0 };
+						 
+  vresult_double = vec_replace_unaligned (src_va_double, src_a_double, 0);
+
+  if (!vec_all_eq (vresult_double,  expected_vresult_double)) {
+#if DEBUG
+    printf("ERROR, vec_replace_unaligned (src_vb_double, src_va_double, "
+	   "index)\n");
+    for(i = 0; i < 2; i++)
+      printf(" vresult_double[%d] = %f, expected_vresult_double[%d] = %f\n",
+	     i, vresult_double[i], i, expected_vresult_double[i]);
+#else
+    abort();
+#endif
+  }
+    
+  return 0;
+}
+
+/* { dg-final { scan-assembler-times {\mvinsw\M} 6 } } */
+/* { dg-final { scan-assembler-times {\mvinsd\M} 6 } } */
+
+
-- 
2.17.1




* Re: [PATCH 0/6 ver 4]  ] Permute Class Operations
@ 2020-07-08 19:59 Carl Love
  2020-07-09 15:44 ` will schmidt
  2020-07-13 12:04 ` Segher Boessenkool
  0 siblings, 2 replies; 18+ messages in thread
From: Carl Love @ 2020-07-08 19:59 UTC (permalink / raw)
  To: segher, dje.gcc, gcc-patches, Will Schmidt

[PATCH 2/6] rs6000 Add vector insert builtin support

------------------------------------
V4 changes
  Rebased on mainline.  Changed FUTURE to P10 as needed.

------------------------------------
V3 changes

  Replace spaces with tabs in ChangeLog.
  Ditto in gcc/config/rs6000/vsx.md.
  Updated description for vec_insertl() builtin.
  Cleaned up vec_insert description.

-----------------------------------------------------------------
v2 changes

Fix change log entry for config/rs6000/altivec.h

Fix change log entry for config/rs6000/rs6000-builtin.def

Fix change log entry for config/rs6000/rs6000-call.c

vsx.md: Fixed if (BYTES_BIG_ENDIAN) else statements.
Porting error from pu branch.

---------------------------------------------------------------
GCC maintainers:

This patch adds support for vec_insertl and vec_inserth builtins.

The patch has been compiled and tested on

  powerpc64le-unknown-linux-gnu (Power 9 LE)

and mambo with no regression errors.

Please let me know if this patch is acceptable for the mainline branch.

Thanks.

                         Carl Love

--------------------------------------------------------------
gcc/ChangeLog

2020-07-02  Carl Love  <cel@us.ibm.com>

	* config/rs6000/altivec.h (vec_insertl, vec_inserth): New defines.
	* config/rs6000/rs6000-builtin.def (VINSERTGPRBL, VINSERTGPRHL,
	VINSERTGPRWL, VINSERTGPRDL, VINSERTVPRBL, VINSERTVPRHL, VINSERTVPRWL,
	VINSERTGPRBR, VINSERTGPRHR, VINSERTGPRWR, VINSERTGPRDR, VINSERTVPRBR,
	VINSERTVPRHR, VINSERTVPRWR): New builtins.
	(INSERTL, INSERTH): New builtins.
	* config/rs6000/rs6000-call.c (P10_BUILTIN_VEC_INSERTL,
	P10_BUILTIN_VEC_INSERTH): New overloaded definitions.
	(P10_BUILTIN_VINSERTGPRBL, P10_BUILTIN_VINSERTGPRHL,
	P10_BUILTIN_VINSERTGPRWL, P10_BUILTIN_VINSERTGPRDL,
	P10_BUILTIN_VINSERTVPRBL, P10_BUILTIN_VINSERTVPRHL,
	P10_BUILTIN_VINSERTVPRWL): Add case entries.
	* config/rs6000/vsx.md (define_c_enum): Add UNSPEC_INSERTL,
	UNSPEC_INSERTR.
	(define_expand): Add vinsertvl_<mode>, vinsertvr_<mode>,
	vinsertgl_<mode>, vinsertgr_<mode>, mode is VI2.
	(define_insn): vinsertvl_internal_<mode>, vinsertvr_internal_<mode>,
	vinsertgl_internal_<mode>, vinsertgr_internal_<mode>, mode VEC_I.
	* doc/extend.texi: Add documentation for vec_insertl, vec_inserth.

gcc/testsuite/ChangeLog

2020-07-02  Carl Love  <cel@us.ibm.com>

	* gcc.target/powerpc/vec-insert-word-runnable.c: New test case.
---
 gcc/config/rs6000/altivec.h                   |   2 +
 gcc/config/rs6000/rs6000-builtin.def          |  18 +
 gcc/config/rs6000/rs6000-call.c               |  51 +++
 gcc/config/rs6000/vsx.md                      | 110 ++++++
 gcc/doc/extend.texi                           |  71 ++++
 .../powerpc/vec-insert-word-runnable.c        | 345 ++++++++++++++++++
 6 files changed, 597 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-insert-word-runnable.c

diff --git a/gcc/config/rs6000/altivec.h b/gcc/config/rs6000/altivec.h
index bb1524f4a67..0563853c03f 100644
--- a/gcc/config/rs6000/altivec.h
+++ b/gcc/config/rs6000/altivec.h
@@ -699,6 +699,8 @@ __altivec_scalar_pred(vec_any_nle,
 /* Overloaded built-in functions for ISA 3.1.  */
 #define vec_extractl(a, b, c)	__builtin_vec_extractl (a, b, c)
 #define vec_extracth(a, b, c)	__builtin_vec_extracth (a, b, c)
+#define vec_insertl(a, b, c)   __builtin_vec_insertl (a, b, c)
+#define vec_inserth(a, b, c)   __builtin_vec_inserth (a, b, c)
 
 #define vec_gnb(a, b)	__builtin_vec_gnb (a, b)
 #define vec_clrl(a, b)	__builtin_vec_clrl (a, b)
diff --git a/gcc/config/rs6000/rs6000-builtin.def b/gcc/config/rs6000/rs6000-builtin.def
index 363656ec05c..e73d144c1cc 100644
--- a/gcc/config/rs6000/rs6000-builtin.def
+++ b/gcc/config/rs6000/rs6000-builtin.def
@@ -2708,6 +2708,22 @@ BU_P10V_3 (VEXTRACTHR, "vextduhvhx", CONST, vextractrv8hi)
 BU_P10V_3 (VEXTRACTWR, "vextduwvhx", CONST, vextractrv4si)
 BU_P10V_3 (VEXTRACTDR, "vextddvhx", CONST, vextractrv2di)
 
+BU_P10V_3 (VINSERTGPRBL, "vinsgubvlx", CONST, vinsertgl_v16qi)
+BU_P10V_3 (VINSERTGPRHL, "vinsguhvlx", CONST, vinsertgl_v8hi)
+BU_P10V_3 (VINSERTGPRWL, "vinsguwvlx", CONST, vinsertgl_v4si)
+BU_P10V_3 (VINSERTGPRDL, "vinsgudvlx", CONST, vinsertgl_v2di)
+BU_P10V_3 (VINSERTVPRBL, "vinsvubvlx", CONST, vinsertvl_v16qi)
+BU_P10V_3 (VINSERTVPRHL, "vinsvuhvlx", CONST, vinsertvl_v8hi)
+BU_P10V_3 (VINSERTVPRWL, "vinsvuwvlx", CONST, vinsertvl_v4si)
+
+BU_P10V_3 (VINSERTGPRBR, "vinsgubvrx", CONST, vinsertgr_v16qi)
+BU_P10V_3 (VINSERTGPRHR, "vinsguhvrx", CONST, vinsertgr_v8hi)
+BU_P10V_3 (VINSERTGPRWR, "vinsguwvrx", CONST, vinsertgr_v4si)
+BU_P10V_3 (VINSERTGPRDR, "vinsgudvrx", CONST, vinsertgr_v2di)
+BU_P10V_3 (VINSERTVPRBR, "vinsvubvrx", CONST, vinsertvr_v16qi)
+BU_P10V_3 (VINSERTVPRHR, "vinsvuhvrx", CONST, vinsertvr_v8hi)
+BU_P10V_3 (VINSERTVPRWR, "vinsvuwvrx", CONST, vinsertvr_v4si)
+
 BU_P10V_1 (VSTRIBR, "vstribr", CONST, vstrir_v16qi)
 BU_P10V_1 (VSTRIHR, "vstrihr", CONST, vstrir_v8hi)
 BU_P10V_1 (VSTRIBL, "vstribl", CONST, vstril_v16qi)
@@ -2727,6 +2743,8 @@ BU_P10_OVERLOAD_2 (XXGENPCVM, "xxgenpcvm")
 
 BU_P10_OVERLOAD_3 (EXTRACTL, "extractl")
 BU_P10_OVERLOAD_3 (EXTRACTH, "extracth")
+BU_P10_OVERLOAD_3 (INSERTL, "insertl")
+BU_P10_OVERLOAD_3 (INSERTH, "inserth")
 
 BU_P10_OVERLOAD_1 (VSTRIR, "strir")
 BU_P10_OVERLOAD_1 (VSTRIL, "stril")
diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index d3cf2de8878..820b361c0f6 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -5576,6 +5576,28 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_UINTQI },
 
+  { P10_BUILTIN_VEC_INSERTL, P10_BUILTIN_VINSERTGPRBL,
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_UINTQI,
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_UINTSI },
+  { P10_BUILTIN_VEC_INSERTL, P10_BUILTIN_VINSERTGPRHL,
+    RS6000_BTI_unsigned_V8HI, RS6000_BTI_UINTHI,
+    RS6000_BTI_unsigned_V8HI, RS6000_BTI_UINTSI },
+  { P10_BUILTIN_VEC_INSERTL, P10_BUILTIN_VINSERTGPRWL,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_UINTSI,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_UINTSI },
+  { P10_BUILTIN_VEC_INSERTL, P10_BUILTIN_VINSERTGPRDL,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_UINTDI,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_UINTSI },
+  { P10_BUILTIN_VEC_INSERTL, P10_BUILTIN_VINSERTVPRBL,
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI,
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_UINTQI },
+  { P10_BUILTIN_VEC_INSERTL, P10_BUILTIN_VINSERTVPRHL,
+    RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V8HI,
+    RS6000_BTI_unsigned_V8HI, RS6000_BTI_UINTQI },
+  { P10_BUILTIN_VEC_INSERTL, P10_BUILTIN_VINSERTVPRWL,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_UINTQI },
+
   { P10_BUILTIN_VEC_EXTRACTH, P10_BUILTIN_VEXTRACTBR,
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V16QI,
     RS6000_BTI_unsigned_V16QI, RS6000_BTI_UINTQI },
@@ -5589,6 +5611,28 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_UINTQI },
 
+  { P10_BUILTIN_VEC_INSERTH, P10_BUILTIN_VINSERTGPRBR,
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_UINTQI,
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_UINTSI },
+  { P10_BUILTIN_VEC_INSERTH, P10_BUILTIN_VINSERTGPRHR,
+    RS6000_BTI_unsigned_V8HI, RS6000_BTI_UINTHI,
+    RS6000_BTI_unsigned_V8HI, RS6000_BTI_UINTSI },
+  { P10_BUILTIN_VEC_INSERTH, P10_BUILTIN_VINSERTGPRWR,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_UINTSI,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_UINTSI },
+  { P10_BUILTIN_VEC_INSERTH, P10_BUILTIN_VINSERTGPRDR,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_UINTDI,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_UINTSI },
+  { P10_BUILTIN_VEC_INSERTH, P10_BUILTIN_VINSERTVPRBR,
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI,
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_UINTQI },
+  { P10_BUILTIN_VEC_INSERTH, P10_BUILTIN_VINSERTVPRHR,
+    RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V8HI,
+    RS6000_BTI_unsigned_V8HI, RS6000_BTI_UINTQI },
+  { P10_BUILTIN_VEC_INSERTH, P10_BUILTIN_VINSERTVPRWR,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_UINTQI },
+
   { P10_BUILTIN_VEC_VSTRIL, P10_BUILTIN_VSTRIBL,
     RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI, 0, 0 },
   { P10_BUILTIN_VEC_VSTRIL, P10_BUILTIN_VSTRIBL,
@@ -13788,6 +13832,13 @@ builtin_function_type (machine_mode mode_ret, machine_mode mode_arg0,
     case P10_BUILTIN_VEXTRACTHR:
     case P10_BUILTIN_VEXTRACTWR:
     case P10_BUILTIN_VEXTRACTDR:
+    case P10_BUILTIN_VINSERTGPRBL:
+    case P10_BUILTIN_VINSERTGPRHL:
+    case P10_BUILTIN_VINSERTGPRWL:
+    case P10_BUILTIN_VINSERTGPRDL:
+    case P10_BUILTIN_VINSERTVPRBL:
+    case P10_BUILTIN_VINSERTVPRHL:
+    case P10_BUILTIN_VINSERTVPRWL:
       h.uns_p[0] = 1;
       h.uns_p[1] = 1;
       h.uns_p[2] = 1;
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index e9f89d43b3f..e9d45d1dcfd 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -349,6 +349,8 @@
    UNSPEC_XXGENPCV
    UNSPEC_EXTRACTL
    UNSPEC_EXTRACTR
+   UNSPEC_INSERTL
+   UNSPEC_INSERTR
   ])
 
 (define_int_iterator XVCVBF16	[UNSPEC_VSX_XVCVSPBF16
@@ -3865,6 +3867,114 @@
   "vext<du_or_d><wd>vrx %0,%1,%2,%3"
   [(set_attr "type" "vecsimple")])
 
+(define_expand "vinsertvl_<mode>"
+  [(set (match_operand:VI2 0 "altivec_register_operand")
+	(unspec:VI2 [(match_operand:VI2 1 "altivec_register_operand")
+		     (match_operand:VI2 2 "altivec_register_operand")
+		     (match_operand:SI 3 "register_operand" "r")]
+		    UNSPEC_INSERTL))]
+  "TARGET_POWER10"
+{
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (gen_vinsertvl_internal_<mode> (operands[0], operands[3],
+                                              operands[1], operands[2]));
+  else
+    emit_insn (gen_vinsertvr_internal_<mode> (operands[0], operands[3],
+                                              operands[1], operands[2]));
+  DONE;
+})
+
+(define_insn "vinsertvl_internal_<mode>"
+  [(set (match_operand:VEC_I 0 "altivec_register_operand" "=v")
+	(unspec:VEC_I [(match_operand:SI 1 "register_operand" "r")
+		       (match_operand:VEC_I 2 "altivec_register_operand" "v")
+		       (match_operand:VEC_I 3 "altivec_register_operand" "0")]
+		      UNSPEC_INSERTL))]
+  "TARGET_POWER10"
+  "vins<wd>vlx %0,%1,%2"
+  [(set_attr "type" "vecsimple")])
+
+(define_expand "vinsertvr_<mode>"
+  [(set (match_operand:VI2 0 "altivec_register_operand")
+	(unspec:VI2 [(match_operand:VI2 1 "altivec_register_operand")
+		     (match_operand:VI2 2 "altivec_register_operand")
+		     (match_operand:SI 3 "register_operand" "r")]
+		    UNSPEC_INSERTR))]
+  "TARGET_POWER10"
+{
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (gen_vinsertvr_internal_<mode> (operands[0], operands[3],
+                                              operands[1], operands[2]));
+  else
+    emit_insn (gen_vinsertvl_internal_<mode> (operands[0], operands[3],
+                                              operands[1], operands[2]));
+  DONE;
+})
+
+(define_insn "vinsertvr_internal_<mode>"
+  [(set (match_operand:VEC_I 0 "altivec_register_operand" "=v")
+	(unspec:VEC_I [(match_operand:SI 1 "register_operand" "r")
+		       (match_operand:VEC_I 2 "altivec_register_operand" "v")
+		       (match_operand:VEC_I 3 "altivec_register_operand" "0")]
+		      UNSPEC_INSERTR))]
+  "TARGET_POWER10"
+  "vins<wd>vrx %0,%1,%2"
+  [(set_attr "type" "vecsimple")])
+
+(define_expand "vinsertgl_<mode>"
+  [(set (match_operand:VI2 0 "altivec_register_operand")
+	(unspec:VI2 [(match_operand:SI 1 "register_operand")
+		     (match_operand:VI2 2 "altivec_register_operand")
+		     (match_operand:SI 3 "register_operand")]
+		    UNSPEC_INSERTL))]
+  "TARGET_POWER10"
+{
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (gen_vinsertgl_internal_<mode> (operands[0], operands[3],
+                                            operands[1], operands[2]));
+  else
+    emit_insn (gen_vinsertgr_internal_<mode> (operands[0], operands[3],
+                                            operands[1], operands[2]));
+  DONE;
+ })
+
+(define_insn "vinsertgl_internal_<mode>"
+ [(set (match_operand:VEC_I 0 "altivec_register_operand" "=v")
+       (unspec:VEC_I [(match_operand:SI 1 "register_operand" "r")
+		      (match_operand:SI 2 "register_operand" "r")
+		      (match_operand:VEC_I 3 "altivec_register_operand" "0")]
+		     UNSPEC_INSERTL))]
+ "TARGET_POWER10"
+ "vins<wd>lx %0,%1,%2"
+ [(set_attr "type" "vecsimple")])
+
+(define_expand "vinsertgr_<mode>"
+  [(set (match_operand:VI2 0 "altivec_register_operand")
+	(unspec:VI2 [(match_operand:SI 1 "register_operand")
+		     (match_operand:VI2 2 "altivec_register_operand")
+		     (match_operand:SI 3 "register_operand")]
+		    UNSPEC_INSERTR))]
+  "TARGET_POWER10"
+{
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (gen_vinsertgr_internal_<mode> (operands[0], operands[3],
+                                            operands[1], operands[2]));
+  else
+    emit_insn (gen_vinsertgl_internal_<mode> (operands[0], operands[3],
+                                            operands[1], operands[2]));
+  DONE;
+ })
+
+(define_insn "vinsertgr_internal_<mode>"
+ [(set (match_operand:VEC_I 0 "altivec_register_operand" "=v")
+   (unspec:VEC_I [(match_operand:SI 1 "register_operand" "r")
+		  (match_operand:SI 2 "register_operand" "r")
+		  (match_operand:VEC_I 3 "altivec_register_operand" "0")]
+		 UNSPEC_INSERTR))]
+ "TARGET_POWER10"
+ "vins<wd>rx %0,%1,%2"
+ [(set_attr "type" "vecsimple")])
+
 ;; VSX_EXTRACT optimizations
 ;; Optimize double d = (double) vec_extract (vi, <n>)
 ;; Get the element into the top position and use XVCVSWDP/XVCVUWDP
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 0e65d542587..e643346a160 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -20991,6 +20991,77 @@ Perform a vector parallel bits deposit operation, as if implemented by
 the @code{vpdepd} instruction.
 @findex vec_pdep
 
+Vector Insert
+
+@smallexample
+@exdent vector unsigned char
+@exdent vec_insertl (unsigned char, vector unsigned char, unsigned int);
+@exdent vector unsigned short
+@exdent vec_insertl (unsigned short, vector unsigned short, unsigned int);
+@exdent vector unsigned int
+@exdent vec_insertl (unsigned int, vector unsigned int, unsigned int);
+@exdent vector unsigned long long
+@exdent vec_insertl (unsigned long long, vector unsigned long long,
+unsigned int);
+@exdent vector unsigned char
+@exdent vec_insertl (vector unsigned char, vector unsigned char, unsigned int);
+@exdent vector unsigned short
+@exdent vec_insertl (vector unsigned short, vector unsigned short,
+unsigned int);
+@exdent vector unsigned int
+@exdent vec_insertl (vector unsigned int, vector unsigned int, unsigned int);
+@end smallexample
+
+Let src be the first argument, when the first argument is a scalar, or the
+rightmost element of the left doubleword of the first argument, when the first
+argument is a vector.  Insert the source into the destination at the position
+given by the third argument, using natural element order in the second
+argument.  The rest of the second argument is unchanged.  If the byte
+index is greater than 14 for halfwords, greater than 12 for words, or
+greater than 8 for doublewords, the result is undefined.  For little-endian,
+the generated code will be semantically equivalent to vinsbrx, vinshrx,
+or vinswrx instructions.  Similarly for big-endian it will be semantically
+equivalent to vinsblx, vinshlx, vinswlx.  Note that some
+fairly anomalous results can be generated if the byte index is not aligned
+on an element boundary for the sort of element being inserted. This is a
+limitation of the bi-endian vector programming model.
+@findex vec_insertl
+
+@smallexample
+@exdent vector unsigned char
+@exdent vec_inserth (unsigned char, vector unsigned char, unsigned int);
+@exdent vector unsigned short
+@exdent vec_inserth (unsigned short, vector unsigned short, unsigned int);
+@exdent vector unsigned int
+@exdent vec_inserth (unsigned int, vector unsigned int, unsigned int);
+@exdent vector unsigned long long
+@exdent vec_inserth (unsigned long long, vector unsigned long long,
+unsigned int);
+@exdent vector unsigned char
+@exdent vec_inserth (vector unsigned char, vector unsigned char, unsigned int);
+@exdent vector unsigned short
+@exdent vec_inserth (vector unsigned short, vector unsigned short,
+unsigned int);
+@exdent vector unsigned int
+@exdent vec_inserth (vector unsigned int, vector unsigned int, unsigned int);
+@end smallexample
+
+Let src be the first argument, when the first argument is a scalar, or the
+rightmost element of the first argument, when the first argument is a vector.
+Insert src into the second argument at the position identified by the third
+argument, using opposite element order in the second argument, and leaving the
+rest of the second argument unchanged.  If the byte index is greater than 14
+for halfwords, 12 for words, or 8 for doublewords, the intrinsic will be
+rejected. Note that the underlying hardware instruction uses the same register
+for the second argument and the result, but this is hidden by the built-in.
+For little-endian, the code generation will be semantically equivalent to
+vins*lx, while for big-endian it will be semantically equivalent to vins*rx.
+Note that some fairly anomalous results can be generated if the byte index is
+not aligned on an element boundary for the sort of element being inserted.
+This is a limitation of the bi-endian vector programming model consistent with
+the limitation on vec_perm, for example.
+@findex vec_inserth
+
 @smallexample
 @exdent vector unsigned long long int
 @exdent vec_pext (vector unsigned long long int, vector unsigned long long int)
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-insert-word-runnable.c b/gcc/testsuite/gcc.target/powerpc/vec-insert-word-runnable.c
new file mode 100644
index 00000000000..8c2721aedfc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-insert-word-runnable.c
@@ -0,0 +1,345 @@
+/* { dg-do run } */
+/* { dg-require-effective-target power10_hw } */
+/* { dg-options "-mdejagnu-cpu=power10" } */
+#include <altivec.h>
+
+#define DEBUG 0
+
+#if DEBUG
+#include <stdio.h>
+#endif
+
+extern void abort (void);
+
+int
+main (int argc, char *argv [])
+{
+  int i;
+  unsigned int index;
+  vector unsigned char vresult_ch;
+  vector unsigned char expected_vresult_ch;
+  vector unsigned char src_va_ch;
+  vector unsigned char src_vb_ch;
+  unsigned char src_a_ch;
+
+  vector unsigned short vresult_sh;
+  vector unsigned short expected_vresult_sh;
+  vector unsigned short src_va_sh;
+  vector unsigned short src_vb_sh;
+  unsigned short int src_a_sh;
+
+  vector unsigned int vresult_int;
+  vector unsigned int expected_vresult_int;
+  vector unsigned int src_va_int;
+  vector unsigned int src_vb_int;
+  unsigned int src_a_int;
+  
+  vector unsigned long long vresult_ll;
+  vector unsigned long long expected_vresult_ll;
+  vector unsigned long long src_va_ll;
+  unsigned long long int src_a_ll;
+
+  /* Vector insert, low index, from GPR */
+  src_a_ch = 79;
+  index = 2;
+  src_va_ch = (vector unsigned char) { 0, 1, 2, 3, 4, 5, 6, 7,
+				       8, 9, 10, 11, 12, 13, 14, 15 };
+  vresult_ch = (vector unsigned char) { 0, 0, 0, 0, 0, 0, 0, 0,
+					0, 0, 0, 0, 0, 0, 0, 0 };
+  expected_vresult_ch = (vector unsigned char) { 0, 1, 79, 3, 4, 5, 6, 7,
+				       8, 9, 10, 11, 12, 13, 14, 15 };
+						 
+  vresult_ch = vec_insertl (src_a_ch, src_va_ch, index);
+
+  if (!vec_all_eq (vresult_ch,  expected_vresult_ch)) {
+#if DEBUG
+    printf("ERROR, vec_insertl (src_a_ch, src_va_ch, index)\n");
+    for(i = 0; i < 16; i++)
+      printf(" vresult_ch[%d] = %d, expected_vresult_ch[%d] = %d\n",
+	     i, vresult_ch[i], i, expected_vresult_ch[i]);
+#else
+    abort();
+#endif
+  }
+
+  src_a_sh = 79;
+  index = 10;
+  src_va_sh = (vector unsigned short int) { 0, 1, 2, 3, 4, 5, 6, 7 };
+  vresult_sh = (vector unsigned short int) { 0, 0, 0, 0, 0, 0, 0, 0 };
+  expected_vresult_sh = (vector unsigned short int) { 0, 1, 2, 3,
+						      4, 79, 6, 7 };
+
+  vresult_sh = vec_insertl (src_a_sh, src_va_sh, index);
+
+  if (!vec_all_eq (vresult_sh,  expected_vresult_sh)) {
+#if DEBUG
+    printf("ERROR, vec_insertl (src_a_sh, src_va_sh, index)\n");
+    for(i = 0; i < 8; i++)
+      printf(" vresult_sh[%d] = %d, expected_vresult_sh[%d] = %d\n",
+	     i, vresult_sh[i], i, expected_vresult_sh[i]);
+#else
+    abort();
+#endif
+  }
+
+  src_a_int = 79;
+  index = 8;
+  src_va_int = (vector unsigned int) { 0, 1, 2, 3 };
+  vresult_int = (vector unsigned int) { 0, 0, 0, 0 };
+  expected_vresult_int = (vector unsigned int) { 0, 1, 79, 3 };
+
+  vresult_int = vec_insertl (src_a_int, src_va_int, index);
+
+  if (!vec_all_eq (vresult_int,  expected_vresult_int)) {
+#if DEBUG
+    printf("ERROR, vec_insertl (src_a_int, src_va_int, index)\n");
+    for(i = 0; i < 4; i++)
+      printf(" vresult_int[%d] = %d, expected_vresult_int[%d] = %d\n",
+	     i, vresult_int[i], i, expected_vresult_int[i]);
+#else
+    abort();
+#endif
+  }
+
+  src_a_ll = 79;
+  index = 8;
+  src_va_ll = (vector unsigned long long) { 0, 1 };
+  vresult_ll = (vector unsigned long long) { 0, 0 };
+  expected_vresult_ll = (vector unsigned long long) { 0, 79 };
+
+  vresult_ll = vec_insertl (src_a_ll, src_va_ll, index);
+
+  if (!vec_all_eq (vresult_ll,  expected_vresult_ll)) {
+#if DEBUG
+    printf("ERROR, vec_insertl (src_a_ll, src_va_ll, index)\n");
+    for(i = 0; i < 2; i++)
+      printf(" vresult_ll[%d] = %llu, expected_vresult_ll[%d] = %llu\n",
+	     i, vresult_ll[i], i, expected_vresult_ll[i]);
+#else
+    abort();
+#endif
+  }
+
+  /* Vector insert, low index, from vector */
+  index = 2;
+  src_va_ch = (vector unsigned char) { 0, 1, 2, 3, 4, 5, 6, 7,
+				       8, 9, 10, 11, 12, 13, 14, 15 };
+  src_vb_ch = (vector unsigned char) { 10, 11, 12, 13, 14, 15, 16, 17,
+				       18, 19, 20, 21, 22, 23, 24, 25 };
+  vresult_ch = (vector unsigned char) { 0, 0, 0, 0, 0, 0, 0, 0,
+					0, 0, 0, 0, 0, 0, 0, 0 };
+  expected_vresult_ch = (vector unsigned char) { 0, 1, 18, 3, 4, 5, 6, 7,
+				       8, 9, 10, 11, 12, 13, 14, 15 };
+						 
+  vresult_ch = vec_insertl (src_vb_ch, src_va_ch, index);
+
+  if (!vec_all_eq (vresult_ch,  expected_vresult_ch)) {
+#if DEBUG
+    printf("ERROR, vec_insertl (src_vb_ch, src_va_ch, index)\n");
+    for(i = 0; i < 16; i++)
+      printf(" vresult_ch[%d] = %d, expected_vresult_ch[%d] = %d\n",
+	     i, vresult_ch[i], i, expected_vresult_ch[i]);
+#else
+    abort();
+#endif
+  }
+
+  index = 4;
+  src_va_sh = (vector unsigned short) { 0, 1, 2, 3, 4, 5, 6, 7 };
+  src_vb_sh = (vector unsigned short) { 10, 11, 12, 13, 14, 15, 16, 17 };
+  vresult_sh = (vector unsigned short) { 0, 0, 0, 0, 0, 0, 0, 0 };
+  expected_vresult_sh = (vector unsigned short) { 0, 1, 14, 3, 4, 5, 6, 7 };
+						 
+  vresult_sh = vec_insertl (src_vb_sh, src_va_sh, index);
+
+  if (!vec_all_eq (vresult_sh,  expected_vresult_sh)) {
+#if DEBUG
+    printf("ERROR, vec_insertl (src_vb_sh, src_va_sh, index)\n");
+    for(i = 0; i < 8; i++)
+      printf(" vresult_sh[%d] = %d, expected_vresult_sh[%d] = %d\n",
+	     i, vresult_sh[i], i, expected_vresult_sh[i]);
+#else
+    abort();
+#endif
+  }
+
+  index = 8;
+  src_va_int = (vector unsigned int) { 0, 1, 2, 3 };
+  src_vb_int = (vector unsigned int) { 10, 11, 12, 13 };
+  vresult_int = (vector unsigned int) { 0, 0, 0, 0 };
+  expected_vresult_int = (vector unsigned int) { 0, 1, 12, 3 };
+						 
+  vresult_int = vec_insertl (src_vb_int, src_va_int, index);
+
+  if (!vec_all_eq (vresult_int,  expected_vresult_int)) {
+#if DEBUG
+    printf("ERROR, vec_insertl (src_vb_int, src_va_int, index)\n");
+    for(i = 0; i < 4; i++)
+      printf(" vresult_int[%d] = %d, expected_vresult_int[%d] = %d\n",
+	     i, vresult_int[i], i, expected_vresult_int[i]);
+#else
+    abort();
+#endif
+  }
+
+  /* Vector insert, high index, from GPR */
+  src_a_ch = 79;
+  index = 2;
+  src_va_ch = (vector unsigned char) { 0, 1, 2, 3, 4, 5, 6, 7,
+				       8, 9, 10, 11, 12, 13, 14, 15 };
+  vresult_ch = (vector unsigned char) { 0, 0, 0, 0, 0, 0, 0, 0,
+					0, 0, 0, 0, 0, 0, 0, 0 };
+  expected_vresult_ch = (vector unsigned char) { 0, 1, 2, 3, 4, 5, 6, 7,
+				       8, 9, 10, 11, 12, 79, 14, 15 };
+						 
+  vresult_ch = vec_inserth (src_a_ch, src_va_ch, index);
+
+  if (!vec_all_eq (vresult_ch,  expected_vresult_ch)) {
+#if DEBUG
+    printf("ERROR, vec_inserth (src_a_ch, src_va_ch, index)\n");
+    for(i = 0; i < 16; i++)
+      printf(" vresult_ch[%d] = %d, expected_vresult_ch[%d] = %d\n",
+	     i, vresult_ch[i], i, expected_vresult_ch[i]);
+#else
+    abort();
+#endif
+  }
+
+  src_a_sh = 79;
+  index = 10;
+  src_va_sh = (vector unsigned short int) { 0, 1, 2, 3, 4, 5, 6, 7 };
+  vresult_sh = (vector unsigned short int) { 0, 0, 0, 0, 0, 0, 0, 0 };
+  expected_vresult_sh = (vector unsigned short int) { 0, 1, 79, 3,
+						      4, 5, 6, 7 };
+
+  vresult_sh = vec_inserth (src_a_sh, src_va_sh, index);
+
+  if (!vec_all_eq (vresult_sh,  expected_vresult_sh)) {
+#if DEBUG
+    printf("ERROR, vec_inserth (src_a_sh, src_va_sh, index)\n");
+    for(i = 0; i < 8; i++)
+      printf(" vresult_sh[%d] = %d, expected_vresult_sh[%d] = %d\n",
+	     i, vresult_sh[i], i, expected_vresult_sh[i]);
+#else
+    abort();
+#endif
+  }
+
+  src_a_int = 79;
+  index = 8;
+  src_va_int = (vector unsigned int) { 0, 1, 2, 3 };
+  vresult_int = (vector unsigned int) { 0, 0, 0, 0 };
+  expected_vresult_int = (vector unsigned int) { 0, 79, 2, 3 };
+
+  vresult_int = vec_inserth (src_a_int, src_va_int, index);
+
+  if (!vec_all_eq (vresult_int,  expected_vresult_int)) {
+#if DEBUG
+    printf("ERROR, vec_inserth (src_a_int, src_va_int, index)\n");
+    for(i = 0; i < 4; i++)
+      printf(" vresult_int[%d] = %d, expected_vresult_int[%d] = %d\n",
+	     i, vresult_int[i], i, expected_vresult_int[i]);
+#else
+    abort();
+#endif
+  }
+
+  src_a_ll = 79;
+  index = 8;
+  src_va_ll = (vector unsigned long long) { 0, 1 };
+  vresult_ll = (vector unsigned long long) { 0, 0 };
+  expected_vresult_ll = (vector unsigned long long) { 79, 1 };
+
+  vresult_ll = vec_inserth (src_a_ll, src_va_ll, index);
+
+  if (!vec_all_eq (vresult_ll,  expected_vresult_ll)) {
+#if DEBUG
+    printf("ERROR, vec_inserth (src_a_ll, src_va_ll, index)\n");
+    for(i = 0; i < 2; i++)
+      printf(" vresult_ll[%d] = %llu, expected_vresult_ll[%d] = %llu\n",
+	     i, vresult_ll[i], i, expected_vresult_ll[i]);
+#else
+    abort();
+#endif
+  }
+
+  /* Vector insert, high index, from vector */
+  index = 2;
+  src_va_ch = (vector unsigned char) { 0, 1, 2, 3, 4, 5, 6, 7,
+				       8, 9, 10, 11, 12, 13, 14, 15 };
+  src_vb_ch = (vector unsigned char) { 10, 11, 12, 13, 14, 15, 16, 17,
+				       18, 19, 20, 21, 22, 23, 24, 25 };
+  vresult_ch = (vector unsigned char) { 0, 0, 0, 0, 0, 0, 0, 0,
+					0, 0, 0, 0, 0, 0, 0, 0 };
+  expected_vresult_ch = (vector unsigned char) { 0, 1, 2, 3, 4, 5, 6, 7,
+				       8, 9, 10, 11, 12, 18, 14, 15 };
+						 
+  vresult_ch = vec_inserth (src_vb_ch, src_va_ch, index);
+
+  if (!vec_all_eq (vresult_ch,  expected_vresult_ch)) {
+#if DEBUG
+    printf("ERROR, vec_inserth (src_vb_ch, src_va_ch, index)\n");
+    for(i = 0; i < 16; i++)
+      printf(" vresult_ch[%d] = %d, expected_vresult_ch[%d] = %d\n",
+	     i, vresult_ch[i], i, expected_vresult_ch[i]);
+#else
+    abort();
+#endif
+  }
+
+  index = 4;
+  src_va_sh = (vector unsigned short) { 0, 1, 2, 3, 4, 5, 6, 7 };
+  src_vb_sh = (vector unsigned short) { 10, 11, 12, 13, 14, 15, 16, 17 };
+  vresult_sh = (vector unsigned short) { 0, 0, 0, 0, 0, 0, 0, 0 };
+  expected_vresult_sh = (vector unsigned short) { 0, 1, 2, 3, 4, 14, 6, 7 };
+						 
+  vresult_sh = vec_inserth (src_vb_sh, src_va_sh, index);
+
+  if (!vec_all_eq (vresult_sh,  expected_vresult_sh)) {
+#if DEBUG
+    printf("ERROR, vec_inserth (src_vb_sh, src_va_sh, index)\n");
+    for(i = 0; i < 8; i++)
+      printf(" vresult_sh[%d] = %d, expected_vresult_sh[%d] = %d\n",
+	     i, vresult_sh[i], i, expected_vresult_sh[i]);
+#else
+    abort();
+#endif
+  }
+
+  index = 8;
+  src_va_int = (vector unsigned int) { 0, 1, 2, 3 };
+  src_vb_int = (vector unsigned int) { 10, 11, 12, 13 };
+  vresult_int = (vector unsigned int) { 0, 0, 0, 0 };
+  expected_vresult_int = (vector unsigned int) { 0, 12, 2, 3 };
+						 
+  vresult_int = vec_inserth (src_vb_int, src_va_int, index);
+
+  if (!vec_all_eq (vresult_int,  expected_vresult_int)) {
+#if DEBUG
+    printf("ERROR, vec_inserth (src_vb_int, src_va_int, index)\n");
+    for(i = 0; i < 4; i++)
+      printf(" vresult_int[%d] = %d, expected_vresult_int[%d] = %d\n",
+	     i, vresult_int[i], i, expected_vresult_int[i]);
+#else
+    abort();
+#endif
+  }
+  return 0;
+}
+
+/* { dg-final { scan-assembler {\mvinsblx\M} } } */
+/* { dg-final { scan-assembler {\mvinshlx\M} } } */
+/* { dg-final { scan-assembler {\mvinswlx\M} } } */
+/* { dg-final { scan-assembler {\mvinsdlx\M} } } */
+/* { dg-final { scan-assembler {\mvinsbvlx\M} } } */
+/* { dg-final { scan-assembler {\mvinshvlx\M} } } */
+/* { dg-final { scan-assembler {\mvinswvlx\M} } } */
+
+/* { dg-final { scan-assembler {\mvinsbrx\M} } } */
+/* { dg-final { scan-assembler {\mvinshrx\M} } } */
+/* { dg-final { scan-assembler {\mvinswrx\M} } } */
+/* { dg-final { scan-assembler {\mvinsdrx\M} } } */
+/* { dg-final { scan-assembler {\mvinsbvrx\M} } } */
+/* { dg-final { scan-assembler {\mvinshvrx\M} } } */
+/* { dg-final { scan-assembler {\mvinswvrx\M} } } */
+
-- 
2.17.1
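
For readers following the testcase above, the element placement it expects
can be modeled in portable C.  This is only a sketch under stated
assumptions: model_insertl and model_inserth are hypothetical helper names
that are not part of the patch, the byte index is assumed to be aligned on an
element boundary, and the layout assumed is the one the expected vectors in
vec-insert-word-runnable.c imply.  It says nothing about how the builtins
are implemented.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model (not part of the patch) of where the
   testcase expects an inserted element to land.  A 16-byte vector is
   modeled as a plain byte array; the source element is copied in with
   memcpy so the model does not depend on host byte order.  */

enum { VEC_BYTES = 16 };

/* Model of vec_insertl: the byte index counts from element 0.
   An index past 16 - size is outside what the documentation allows.  */
void
model_insertl (uint8_t vec[VEC_BYTES], const void *src, size_t size,
	       unsigned int byte_index)
{
  assert (byte_index + size <= VEC_BYTES);
  memcpy (vec + byte_index, src, size);
}

/* Model of vec_inserth: the byte index counts from the opposite end
   of the vector, so index 0 names the last element.  */
void
model_inserth (uint8_t vec[VEC_BYTES], const void *src, size_t size,
	       unsigned int byte_index)
{
  assert (byte_index + size <= VEC_BYTES);
  memcpy (vec + (VEC_BYTES - size - byte_index), src, size);
}
```

With a halfword source and byte index 10, the first helper writes element 5
and the second writes element 2, which matches the expected_vresult_sh
values in the GPR cases of the test.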




* Re: [PATCH 0/6 ver 4]  ] Permute Class Operations
@ 2020-07-08 19:58 Carl Love
  2020-07-09 15:31 ` will schmidt
  0 siblings, 1 reply; 18+ messages in thread
From: Carl Love @ 2020-07-08 19:58 UTC (permalink / raw)
  To: segher, dje.gcc, gcc-patches, Will Schmidt


[PATCH 1/6] rs6000, Update support for vec_extract

-------------------------
V4 changes
	rebased onto mainline 7/2/2020
	Add iterator name to Change log

-------------------------------
V3 changes

  Redo ChangeLog for code move.
  Replace spaces with tabs in ChangeLog.
  Replaced instruction names using * with the actual list of names.  For
	example vextdu*vrx with the explicit instruction names vextdubvrx,
	vextduhvrx, etc.
-------------------------
v2 changes

config/rs6000/altivec.md log entry (the "moved from" side) changed as suggested.

config/rs6000/vsx.md log entry (the "moved to here" side) changed as suggested.

define_mode_iterator VI2 also moved, included in both change log entries

--------------------------------------------
GCC maintainers:

Move the existing vector extract support in altivec.md to vsx.md
so all of the vector insert and extract support is in the same file.

The patch also updates the name of the builtins and descriptions for the
builtins in the documentation file so they match the approved builtin
names and descriptions.

The patch does not make any functional changes.

Please let me know if the changes are acceptable for mainline.  Thanks.

                  Carl Love

------------------------------------------------------

gcc/ChangeLog

2020-07-06  Carl Love  <cel@us.ibm.com>

	* config/rs6000/altivec.md (UNSPEC_EXTRACTL, UNSPEC_EXTRACTR)
	(vextractl<mode>, vextractr<mode>)
	(vextractl<mode>_internal, vextractr<mode>_internal for mode VI2)
	(VI2): Move to ...
	* config/rs6000/vsx.md (UNSPEC_EXTRACTL, UNSPEC_EXTRACTR)
	(vextractl<mode>, vextractr<mode>)
	(vextractl<mode>_internal, vextractr<mode>_internal for mode VI2)
	(VI2): ... here.
	* doc/extend.texi: Update documentation for vec_extractl.
	Replace builtin name vec_extractr with vec_extracth.  Update description
	of vec_extracth.
---
 gcc/config/rs6000/altivec.md | 64 -----------------------------
 gcc/config/rs6000/vsx.md     | 66 ++++++++++++++++++++++++++++++
 gcc/doc/extend.texi          | 78 ++++++++++++++++++------------------
 3 files changed, 105 insertions(+), 103 deletions(-)

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 2ce9227c765..749b2c42c14 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -172,8 +172,6 @@
    UNSPEC_XXEVAL
    UNSPEC_VSTRIR
    UNSPEC_VSTRIL
-   UNSPEC_EXTRACTL
-   UNSPEC_EXTRACTR
 ])
 
 (define_c_enum "unspecv"
@@ -184,8 +182,6 @@
    UNSPECV_DSS
   ])
 
-;; Like VI, defined in vector.md, but add ISA 2.07 integer vector ops
-(define_mode_iterator VI2 [V4SI V8HI V16QI V2DI])
 ;; Short vec int modes
 (define_mode_iterator VIshort [V8HI V16QI])
 ;; Longer vec int modes for rotate/mask ops
@@ -786,66 +782,6 @@
   DONE;
 })
 
-(define_expand "vextractl<mode>"
-  [(set (match_operand:V2DI 0 "altivec_register_operand")
-	(unspec:V2DI [(match_operand:VI2 1 "altivec_register_operand")
-		      (match_operand:VI2 2 "altivec_register_operand")
-		      (match_operand:SI 3 "register_operand")]
-		     UNSPEC_EXTRACTL))]
-  "TARGET_POWER10"
-{
-  if (BYTES_BIG_ENDIAN)
-    {
-      emit_insn (gen_vextractl<mode>_internal (operands[0], operands[1],
-					       operands[2], operands[3]));
-      emit_insn (gen_xxswapd_v2di (operands[0], operands[0]));
-    }
-  else
-    emit_insn (gen_vextractr<mode>_internal (operands[0], operands[2],
-					     operands[1], operands[3]));
-  DONE;
-})
-
-(define_insn "vextractl<mode>_internal"
-  [(set (match_operand:V2DI 0 "altivec_register_operand" "=v")
-	(unspec:V2DI [(match_operand:VEC_I 1 "altivec_register_operand" "v")
-		      (match_operand:VEC_I 2 "altivec_register_operand" "v")
-		      (match_operand:SI 3 "register_operand" "r")]
-		     UNSPEC_EXTRACTL))]
-  "TARGET_POWER10"
-  "vext<du_or_d><wd>vlx %0,%1,%2,%3"
-  [(set_attr "type" "vecsimple")])
-
-(define_expand "vextractr<mode>"
-  [(set (match_operand:V2DI 0 "altivec_register_operand")
-	(unspec:V2DI [(match_operand:VI2 1 "altivec_register_operand")
-		      (match_operand:VI2 2 "altivec_register_operand")
-		      (match_operand:SI 3 "register_operand")]
-		     UNSPEC_EXTRACTR))]
-  "TARGET_POWER10"
-{
-  if (BYTES_BIG_ENDIAN)
-    {
-      emit_insn (gen_vextractr<mode>_internal (operands[0], operands[1],
-					       operands[2], operands[3]));
-      emit_insn (gen_xxswapd_v2di (operands[0], operands[0]));
-    }
-  else
-    emit_insn (gen_vextractl<mode>_internal (operands[0], operands[2],
-    					     operands[1], operands[3]));
-  DONE;
-})
-
-(define_insn "vextractr<mode>_internal"
-  [(set (match_operand:V2DI 0 "altivec_register_operand" "=v")
-	(unspec:V2DI [(match_operand:VEC_I 1 "altivec_register_operand" "v")
-		      (match_operand:VEC_I 2 "altivec_register_operand" "v")
-		      (match_operand:SI 3 "register_operand" "r")]
-		     UNSPEC_EXTRACTR))]
-  "TARGET_POWER10"
-  "vext<du_or_d><wd>vrx %0,%1,%2,%3"
-  [(set_attr "type" "vecsimple")])
-
 (define_expand "vstrir_<mode>"
   [(set (match_operand:VIshort 0 "altivec_register_operand")
 	(unspec:VIshort [(match_operand:VIshort 1 "altivec_register_operand")]
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 732a54842b6..e9f89d43b3f 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -347,6 +347,8 @@
    UNSPEC_VSX_FIRST_MISMATCH_INDEX
    UNSPEC_VSX_FIRST_MISMATCH_EOS_INDEX
    UNSPEC_XXGENPCV
+   UNSPEC_EXTRACTL
+   UNSPEC_EXTRACTR
   ])
 
 (define_int_iterator XVCVBF16	[UNSPEC_VSX_XVCVSPBF16
@@ -355,6 +357,9 @@
 (define_int_attr xvcvbf16       [(UNSPEC_VSX_XVCVSPBF16 "xvcvspbf16")
 				 (UNSPEC_VSX_XVCVBF16SP "xvcvbf16sp")])
 
+;; Like VI, defined in vector.md, but add ISA 2.07 integer vector ops
+(define_mode_iterator VI2 [V4SI V8HI V16QI V2DI])
+
 ;; VSX moves
 
 ;; The patterns for LE permuted loads and stores come before the general
@@ -3799,6 +3804,67 @@
 }
   [(set_attr "type" "load")])
 
+;; ISA 3.1 extract
+(define_expand "vextractl<mode>"
+  [(set (match_operand:V2DI 0 "altivec_register_operand")
+	(unspec:V2DI [(match_operand:VI2 1 "altivec_register_operand")
+		      (match_operand:VI2 2 "altivec_register_operand")
+		      (match_operand:SI 3 "register_operand")]
+		     UNSPEC_EXTRACTL))]
+  "TARGET_POWER10"
+{
+  if (BYTES_BIG_ENDIAN)
+    {
+      emit_insn (gen_vextractl<mode>_internal (operands[0], operands[1],
+					       operands[2], operands[3]));
+      emit_insn (gen_xxswapd_v2di (operands[0], operands[0]));
+    }
+  else
+    emit_insn (gen_vextractr<mode>_internal (operands[0], operands[2],
+					     operands[1], operands[3]));
+  DONE;
+})
+
+(define_insn "vextractl<mode>_internal"
+  [(set (match_operand:V2DI 0 "altivec_register_operand" "=v")
+	(unspec:V2DI [(match_operand:VEC_I 1 "altivec_register_operand" "v")
+		      (match_operand:VEC_I 2 "altivec_register_operand" "v")
+		      (match_operand:SI 3 "register_operand" "r")]
+		     UNSPEC_EXTRACTL))]
+  "TARGET_POWER10"
+  "vext<du_or_d><wd>vlx %0,%1,%2,%3"
+  [(set_attr "type" "vecsimple")])
+
+(define_expand "vextractr<mode>"
+  [(set (match_operand:V2DI 0 "altivec_register_operand")
+	(unspec:V2DI [(match_operand:VI2 1 "altivec_register_operand")
+		      (match_operand:VI2 2 "altivec_register_operand")
+		      (match_operand:SI 3 "register_operand")]
+		     UNSPEC_EXTRACTR))]
+  "TARGET_POWER10"
+{
+  if (BYTES_BIG_ENDIAN)
+    {
+      emit_insn (gen_vextractr<mode>_internal (operands[0], operands[1],
+					       operands[2], operands[3]));
+      emit_insn (gen_xxswapd_v2di (operands[0], operands[0]));
+    }
+  else
+    emit_insn (gen_vextractl<mode>_internal (operands[0], operands[2],
+					     operands[1], operands[3]));
+  DONE;
+})
+
+(define_insn "vextractr<mode>_internal"
+  [(set (match_operand:V2DI 0 "altivec_register_operand" "=v")
+	(unspec:V2DI [(match_operand:VEC_I 1 "altivec_register_operand" "v")
+		      (match_operand:VEC_I 2 "altivec_register_operand" "v")
+		      (match_operand:SI 3 "register_operand" "r")]
+		     UNSPEC_EXTRACTR))]
+  "TARGET_POWER10"
+  "vext<du_or_d><wd>vrx %0,%1,%2,%3"
+  [(set_attr "type" "vecsimple")])
+
 ;; VSX_EXTRACT optimizations
 ;; Optimize double d = (double) vec_extract (vi, <n>)
 ;; Get the element into the top position and use XVCVSWDP/XVCVUWDP
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index ecd3661d257..0e65d542587 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -20927,6 +20927,9 @@ Perform a 128-bit vector gather  operation, as if implemented by the
 integer value between 2 and 7 inclusive.
 @findex vec_gnb
 
+
+Vector Extract
+
 @smallexample
 @exdent vector unsigned long long int
 @exdent vec_extractl (vector unsigned char, vector unsigned char, unsigned int)
@@ -20937,52 +20940,49 @@ integer value between 2 and 7 inclusive.
 @exdent vector unsigned long long int
 @exdent vec_extractl (vector unsigned long long, vector unsigned long long, unsigned int)
 @end smallexample
-Extract a single element from the vector formed by catenating this function's
-first two arguments at the byte offset specified by this function's
-third argument.  On big-endian targets, this function behaves as if
-implemented by the @code{vextdubvlx}, @code{vextduhvlx},
-@code{vextduwvlx}, or @code{vextddvlx} instructions, depending on the
-types of the function's first two arguments.  On little-endian
-targets, this function behaves as if implemented by the
-@code{vextdubvrx}, @code{vextduhvrx},
-@code{vextduwvrx}, or @code{vextddvrx} instructions.
-The byte offset of the element to be extracted is calculated
-by computing the remainder of dividing the third argument by 32.
-If this reminader value is not a multiple of the vector element size,
-or if its value added to the vector element size exceeds 32, the
-result is undefined.
+Extract an element from two concatenated vectors starting at the given byte index
+in natural-endian order, and place it zero-extended in doubleword 1 of the result
+according to natural element order.  If the byte index is out of range for the
+data type, the intrinsic will be rejected.
+For little-endian, this output will match the placement by the hardware
+instruction, i.e., dword[0] in RTL notation.  For big-endian, an additional
+instruction is needed to move it from the "left" doubleword to the "right" one.
+For little-endian, semantics matching the vextdubvrx, vextduhvrx,
+vextduwvrx instructions will be generated, while for big-endian, semantics
+matching the vextdubvlx, vextduhvlx, vextduwvlx instructions
+will be generated.  Note that some fairly anomalous results can be generated if
+the byte index is not aligned on an element boundary for the element being
+extracted.  This is a limitation of the bi-endian vector programming model
+consistent with the limitation on vec_perm, for example.
 @findex vec_extractl
 
 @smallexample
 @exdent vector unsigned long long int
-@exdent vec_extractr (vector unsigned char, vector unsigned char, unsigned int)
+@exdent vec_extracth (vector unsigned char, vector unsigned char, unsigned int)
 @exdent vector unsigned long long int
-@exdent vec_extractr (vector unsigned short, vector unsigned short, unsigned int)
+@exdent vec_extracth (vector unsigned short, vector unsigned short,
+unsigned int)
 @exdent vector unsigned long long int
-@exdent vec_extractr (vector unsigned int, vector unsigned int, unsigned int)
+@exdent vec_extracth (vector unsigned int, vector unsigned int, unsigned int)
 @exdent vector unsigned long long int
-@exdent vec_extractr (vector unsigned long long, vector unsigned long long, unsigned int)
-@end smallexample
-Extract a single element from the vector formed by catenating this function's
-first two arguments at the byte offset calculated by subtracting this
-function's third argument from 31.  On big-endian targets, this
-function behaves as if
-implemented by the
-@code{vextdubvrx}, @code{vextduhvrx},
-@code{vextduwvrx}, or @code{vextddvrx} instructions, depending on the
-types of the function's first two arguments.
-On little-endian
-targets, this function behaves as if implemented by the
-@code{vextdubvlx}, @code{vextduhvlx},
-@code{vextduwvlx}, or @code{vextddvlx} instructions.
-The byte offset of the element to be extracted, measured from the
-right end of the catenation of the two vector arguments, is calculated
-by computing the remainder of dividing the third argument by 32.
-If this reminader value is not a multiple of the vector element size,
-or if its value added to the vector element size exceeds 32, the
-result is undefined.
-@findex vec_extractr
-
+@exdent vec_extracth (vector unsigned long long, vector unsigned long long,
+unsigned int)
+@end smallexample
+Extract an element from two concatenated vectors starting at the given byte
+index in opposite-endian order, and place it zero-extended in doubleword 1
+according to natural element order.  If the byte index is out of range for the
+data type, the intrinsic will be rejected.  For little-endian, this output
+will match the placement by the hardware instruction, i.e., dword[0] in RTL
+notation.  For big-endian, an additional instruction is needed to move it
+from the "left" doubleword to the "right" one.  For little-endian, semantics
+matching the vextdubvlx, vextduhvlx, vextduwvlx instructions will be generated,
+while for big-endian, semantics matching the vextdubvrx, vextduhvrx,
+vextduwvrx instructions will be generated.  Note that some fairly anomalous
+results can be generated if the byte index is not aligned on the
+element boundary for the element being extracted.  This is a
+limitation of the bi-endian vector programming model consistent with the
+limitation on vec_perm, for example.
+@findex vec_extracth
 @smallexample
 @exdent vector unsigned long long int
 @exdent vec_pdep (vector unsigned long long int, vector unsigned long long int)
-- 
2.17.1
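
For readers without Power10 hardware, the extract description in the
extend.texi hunk above can be approximated by a small portable C model.
This is only a sketch and not part of the patch: model_extract is a
hypothetical helper, the concatenation is taken in array (memory) order,
and the element bytes are read most-significant first.  Both of those
ordering choices are assumptions made for illustration; the real
built-ins depend on target endianness as described in the documentation
text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model, not a GCC built-in.  Concatenates A and B into a
   32-byte buffer, reads WIDTH bytes starting at byte index IDX, and
   returns the value zero-extended to 64 bits through OUT.
   Assumptions: A comes first in the concatenation, and the element's
   bytes are interpreted most-significant byte first.
   Returns 0 on success, -1 if WIDTH is not an element size or the
   element would run past the end of the 32-byte concatenation.  */
static int
model_extract (const uint8_t a[16], const uint8_t b[16],
               unsigned idx, unsigned width, uint64_t *out)
{
  uint8_t cat[32];
  uint64_t value = 0;

  assert (out != NULL);

  /* Only byte, halfword, word and doubleword elements are modelled.  */
  if (width != 1 && width != 2 && width != 4 && width != 8)
    return -1;

  /* Reject an index that is out of range for the element size.  */
  if (idx > 32 - width)
    return -1;

  memcpy (cat, a, 16);
  memcpy (cat + 16, b, 16);

  /* Accumulate the element bytes; the upper bits stay zero.  */
  for (unsigned i = 0; i < width; i++)
    value = (value << 8) | cat[idx + i];

  *out = value;
  return 0;
}
```

The model accepts unaligned indexes, so it can also be used to see how an
element that straddles the two source vectors is assembled, which is the
case the documentation warns may give anomalous results on real targets.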




----------------------------------------------------------




end of thread, other threads:[~2020-07-15 20:07 UTC | newest]

Thread overview: 18+ messages
2020-07-08 19:44 [PATCH 0/6 ver 4] ] Permute Class Operations Carl Love
2020-07-08 19:58 Carl Love
2020-07-09 15:31 ` will schmidt
2020-07-08 19:59 Carl Love
2020-07-09 15:44 ` will schmidt
2020-07-13 12:04 ` Segher Boessenkool
2020-07-08 19:59 Carl Love
2020-07-09 16:02 ` will schmidt
2020-07-13 12:41   ` Segher Boessenkool
2020-07-13 14:30 ` Segher Boessenkool
2020-07-08 19:59 Carl Love
2020-07-09 16:13 ` will schmidt
2020-07-14 20:15 ` Segher Boessenkool
2020-07-08 19:59 Carl Love
2020-07-09 17:38 ` will schmidt
2020-07-15 20:07 ` Segher Boessenkool
2020-07-08 19:59 Carl Love
2020-07-09 18:28 ` will schmidt
