From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <will_schmidt@vnet.ibm.com>
Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com
 [148.163.156.1])
 by sourceware.org (Postfix) with ESMTPS id E234D384B821;
 Thu,  9 Jul 2020 15:31:21 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org E234D384B821
Received: from pps.filterd (m0098394.ppops.net [127.0.0.1])
 by mx0a-001b2d01.pphosted.com (8.16.0.42/8.16.0.42) with SMTP id
 069F268L053735; Thu, 9 Jul 2020 11:31:21 -0400
Received: from pps.reinject (localhost [127.0.0.1])
 by mx0a-001b2d01.pphosted.com with ESMTP id 325uqvjtfk-1
 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT);
 Thu, 09 Jul 2020 11:31:20 -0400
Received: from m0098394.ppops.net (m0098394.ppops.net [127.0.0.1])
 by pps.reinject (8.16.0.36/8.16.0.36) with SMTP id 069FNIk1175807;
 Thu, 9 Jul 2020 11:31:20 -0400
Received: from ppma04wdc.us.ibm.com (1a.90.2fa9.ip4.static.sl-reverse.com
 [169.47.144.26])
 by mx0a-001b2d01.pphosted.com with ESMTP id 325uqvjteu-1
 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT);
 Thu, 09 Jul 2020 11:31:20 -0400
Received: from pps.filterd (ppma04wdc.us.ibm.com [127.0.0.1])
 by ppma04wdc.us.ibm.com (8.16.0.42/8.16.0.42) with SMTP id 069FEX6s021609;
 Thu, 9 Jul 2020 15:31:19 GMT
Received: from b03cxnp08026.gho.boulder.ibm.com
 (b03cxnp08026.gho.boulder.ibm.com [9.17.130.18])
 by ppma04wdc.us.ibm.com with ESMTP id 325k2477ty-1
 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT);
 Thu, 09 Jul 2020 15:31:19 +0000
Received: from b03ledav002.gho.boulder.ibm.com
 (b03ledav002.gho.boulder.ibm.com [9.17.130.233])
 by b03cxnp08026.gho.boulder.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id
 069FVFXm26280382
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK);
 Thu, 9 Jul 2020 15:31:15 GMT
Received: from b03ledav002.gho.boulder.ibm.com (unknown [127.0.0.1])
 by IMSVA (Postfix) with ESMTP id 305C013604F;
 Thu,  9 Jul 2020 15:31:18 +0000 (GMT)
Received: from b03ledav002.gho.boulder.ibm.com (unknown [127.0.0.1])
 by IMSVA (Postfix) with ESMTP id 506B7136051;
 Thu,  9 Jul 2020 15:31:17 +0000 (GMT)
Received: from sig-9-65-252-120.ibm.com (unknown [9.65.252.120])
 by b03ledav002.gho.boulder.ibm.com (Postfix) with ESMTP;
 Thu,  9 Jul 2020 15:31:17 +0000 (GMT)
Message-ID: <77c96aea9332b714803ef58844e71f223b0d7521.camel@vnet.ibm.com>
Subject: Re: [PATCH 0/6 ver 4]  ] Permute Class Operations
From: will schmidt <will_schmidt@vnet.ibm.com>
To: Carl Love <cel@us.ibm.com>, segher@gcc.gnu.org, dje.gcc@gmail.com,
 gcc-patches@gcc.gnu.org
Date: Thu, 09 Jul 2020 10:31:16 -0500
In-Reply-To: <50a25bffa56dcbb951afb105b77d5ae16ee91d40.camel@us.ibm.com>
References: <50a25bffa56dcbb951afb105b77d5ae16ee91d40.camel@us.ibm.com>
Content-Type: text/plain; charset="UTF-8"
X-Mailer: Evolution 3.28.5 (3.28.5-8.el7) 
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
X-TM-AS-GCONF: 00
X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:6.0.235, 18.0.687
 definitions=2020-07-09_08:2020-07-09,
 2020-07-09 signatures=0
X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0
 adultscore=0
 lowpriorityscore=0 suspectscore=4 spamscore=0 mlxscore=0 bulkscore=0
 mlxlogscore=999 phishscore=0 clxscore=1015 malwarescore=0
 priorityscore=1501 impostorscore=0 classifier=spam adjust=0 reason=mlx
 scancount=1 engine=8.12.0-2006250000 definitions=main-2007090109
X-Spam-Status: No, score=-12.0 required=5.0 tests=BAYES_00, GIT_PATCH_0,
 KAM_ASCII_DIVIDERS, KAM_DMARC_STATUS, RCVD_IN_DNSWL_LOW, RCVD_IN_MSPIKE_H2,
 SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <http://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <http://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Thu, 09 Jul 2020 15:31:24 -0000

On Wed, 2020-07-08 at 12:58 -0700, Carl Love wrote:
> [PATCH 1/6] rs6000, Update support for vec_extract


The email subject needs to be updated too; the title is at least correct
inline.  This applies here and to the subsequent messages in the thread.


> 
> -------------------------
> V4 changes
> 	rebased onto mainline 7/2/2020
> 	Add iterator name to Change log
> 
> -------------------------------
> V3 changes
> 
>   Redo ChangeLog for code move.
>   Replace spaces with tabs in ChangeLog.
>   Replaced intruction names using * with the actual list of names.  For
> 	example vextdu*vrx with the explicit instruction names vextdubvrx,
> 	vextduhvrx, etc.
> -------------------------
> v2 changes
> 
> config/rs6000/altivec.md log entry for move from changed as suggested.
> 
> config/rs6000/vsx.md log entro for moved to here changed as suggested.
> 
> define_mode_iterator VI2 also moved, included in both change log entries
> 
> --------------------------------------------
> GCC maintainers:
> 
> Move the existing vector extract support in altivec.md to vsx.md
> so all of the vector insert and extract support is in the same file.
> 
> The patch also updates the name of the builtins and descriptions for the
> builtins in the documentation file so they match the approved builtin
> names and descriptions.
> 
> The patch does not make any functional changes.
> 
> Please let me know if the changes are acceptable for mainline.  Thanks.
> 
>                   Carl Love
> 
> ------------------------------------------------------
> 
> gcc/ChangeLog
> 
> 2020-07-06  Carl Love  <cel@us.ibm.com>
> 
> 	* config/rs6000/altivec.md: (UNSPEC_EXTRACTL, UNSPEC_EXTRACTR)
> 	(vextractl<mode>, vextractr<mode>)
> 	(vextractl<mode>_internal, vextractr<mode>_internal for mode VI2)
> 	(VI2): Move to ...
> 	* config/rs6000/vsx.md:	(UNSPEC_EXTRACTL, UNSPEC_EXTRACTR)
> 	(vextractl<mode>, vextractr<mode>)
> 	(vextractl<mode>_internal, vextractr<mode>_internal for mode VI2)
> 	(VI2):  ..here.
> 	* gcc/doc/extend.texi: Update documentation for vec_extractl.
> 	Replace builtin name vec_extractr with vec_extracth.  Update description
> 	of vec_extracth.
> ---
>  gcc/config/rs6000/altivec.md | 64 -----------------------------
>  gcc/config/rs6000/vsx.md     | 66 ++++++++++++++++++++++++++++++
>  gcc/doc/extend.texi          | 78 ++++++++++++++++++------------------
>  3 files changed, 105 insertions(+), 103 deletions(-)
> 
> diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
> index 2ce9227c765..749b2c42c14 100644
> --- a/gcc/config/rs6000/altivec.md
> +++ b/gcc/config/rs6000/altivec.md
> @@ -172,8 +172,6 @@
>     UNSPEC_XXEVAL
>     UNSPEC_VSTRIR
>     UNSPEC_VSTRIL
> -   UNSPEC_EXTRACTL
> -   UNSPEC_EXTRACTR
>  ])
> 
>  (define_c_enum "unspecv"
> @@ -184,8 +182,6 @@
>     UNSPECV_DSS
>    ])
> 
> -;; Like VI, defined in vector.md, but add ISA 2.07 integer vector ops
> -(define_mode_iterator VI2 [V4SI V8HI V16QI V2DI])
>  ;; Short vec int modes
>  (define_mode_iterator VIshort [V8HI V16QI])
>  ;; Longer vec int modes for rotate/mask ops
> @@ -786,66 +782,6 @@
>    DONE;
>  })
> 
> -(define_expand "vextractl<mode>"
> -  [(set (match_operand:V2DI 0 "altivec_register_operand")
> -	(unspec:V2DI [(match_operand:VI2 1 "altivec_register_operand")
> -		      (match_operand:VI2 2 "altivec_register_operand")
> -		      (match_operand:SI 3 "register_operand")]
> -		     UNSPEC_EXTRACTL))]
> -  "TARGET_POWER10"
> -{
> -  if (BYTES_BIG_ENDIAN)
> -    {
> -      emit_insn (gen_vextractl<mode>_internal (operands[0], operands[1],
> -					       operands[2], operands[3]));
> -      emit_insn (gen_xxswapd_v2di (operands[0], operands[0]));
> -    }
> -  else
> -    emit_insn (gen_vextractr<mode>_internal (operands[0], operands[2],
> -					     operands[1], operands[3]));
> -  DONE;
> -})
> -
> -(define_insn "vextractl<mode>_internal"
> -  [(set (match_operand:V2DI 0 "altivec_register_operand" "=v")
> -	(unspec:V2DI [(match_operand:VEC_I 1 "altivec_register_operand" "v")
> -		      (match_operand:VEC_I 2 "altivec_register_operand" "v")
> -		      (match_operand:SI 3 "register_operand" "r")]
> -		     UNSPEC_EXTRACTL))]
> -  "TARGET_POWER10"
> -  "vext<du_or_d><wd>vlx %0,%1,%2,%3"
> -  [(set_attr "type" "vecsimple")])
> -
> -(define_expand "vextractr<mode>"
> -  [(set (match_operand:V2DI 0 "altivec_register_operand")
> -	(unspec:V2DI [(match_operand:VI2 1 "altivec_register_operand")
> -		      (match_operand:VI2 2 "altivec_register_operand")
> -		      (match_operand:SI 3 "register_operand")]
> -		     UNSPEC_EXTRACTR))]
> -  "TARGET_POWER10"
> -{
> -  if (BYTES_BIG_ENDIAN)
> -    {
> -      emit_insn (gen_vextractr<mode>_internal (operands[0], operands[1],
> -					       operands[2], operands[3]));
> -      emit_insn (gen_xxswapd_v2di (operands[0], operands[0]));
> -    }
> -  else
> -    emit_insn (gen_vextractl<mode>_internal (operands[0], operands[2],
> -    					     operands[1], operands[3]));
> -  DONE;
> -})
> -
> -(define_insn "vextractr<mode>_internal"
> -  [(set (match_operand:V2DI 0 "altivec_register_operand" "=v")
> -	(unspec:V2DI [(match_operand:VEC_I 1 "altivec_register_operand" "v")
> -		      (match_operand:VEC_I 2 "altivec_register_operand" "v")
> -		      (match_operand:SI 3 "register_operand" "r")]
> -		     UNSPEC_EXTRACTR))]
> -  "TARGET_POWER10"
> -  "vext<du_or_d><wd>vrx %0,%1,%2,%3"
> -  [(set_attr "type" "vecsimple")])
> -
>  (define_expand "vstrir_<mode>"
>    [(set (match_operand:VIshort 0 "altivec_register_operand")
>  	(unspec:VIshort [(match_operand:VIshort 1 "altivec_register_operand")]
> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index 732a54842b6..e9f89d43b3f 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -347,6 +347,8 @@
>     UNSPEC_VSX_FIRST_MISMATCH_INDEX
>     UNSPEC_VSX_FIRST_MISMATCH_EOS_INDEX
>     UNSPEC_XXGENPCV
> +   UNSPEC_EXTRACTL
> +   UNSPEC_EXTRACTR
>    ])
> 
>  (define_int_iterator XVCVBF16	[UNSPEC_VSX_XVCVSPBF16
> @@ -355,6 +357,9 @@
>  (define_int_attr xvcvbf16       [(UNSPEC_VSX_XVCVSPBF16 "xvcvspbf16")
>  				 (UNSPEC_VSX_XVCVBF16SP "xvcvbf16sp")])
> 
> +;; Like VI, defined in vector.md, but add ISA 2.07 integer vector ops
> +(define_mode_iterator VI2 [V4SI V8HI V16QI V2DI])
> +
>  ;; VSX moves
> 
>  ;; The patterns for LE permuted loads and stores come before the general
> @@ -3799,6 +3804,67 @@
>  }
>    [(set_attr "type" "load")])
> 
> +;; ISA 3.1 extract
> +(define_expand "vextractl<mode>"
> +  [(set (match_operand:V2DI 0 "altivec_register_operand")
> +	(unspec:V2DI [(match_operand:VI2 1 "altivec_register_operand")
> +		      (match_operand:VI2 2 "altivec_register_operand")
> +		      (match_operand:SI 3 "register_operand")]
> +		     UNSPEC_EXTRACTL))]
> +  "TARGET_POWER10"
> +{
> +  if (BYTES_BIG_ENDIAN)
> +    {
> +      emit_insn (gen_vextractl<mode>_internal (operands[0], operands[1],
> +					       operands[2], operands[3]));
> +      emit_insn (gen_xxswapd_v2di (operands[0], operands[0]));
> +    }
> +  else
> +    emit_insn (gen_vextractr<mode>_internal (operands[0], operands[2],
> +					     operands[1], operands[3]));
> +  DONE;
> +})
> +
> +(define_insn "vextractl<mode>_internal"
> +  [(set (match_operand:V2DI 0 "altivec_register_operand" "=v")
> +	(unspec:V2DI [(match_operand:VEC_I 1 "altivec_register_operand" "v")
> +		      (match_operand:VEC_I 2 "altivec_register_operand" "v")
> +		      (match_operand:SI 3 "register_operand" "r")]
> +		     UNSPEC_EXTRACTL))]
> +  "TARGET_POWER10"
> +  "vext<du_or_d><wd>vlx %0,%1,%2,%3"
> +  [(set_attr "type" "vecsimple")])
> +
> +(define_expand "vextractr<mode>"
> +  [(set (match_operand:V2DI 0 "altivec_register_operand")
> +	(unspec:V2DI [(match_operand:VI2 1 "altivec_register_operand")
> +		      (match_operand:VI2 2 "altivec_register_operand")
> +		      (match_operand:SI 3 "register_operand")]
> +		     UNSPEC_EXTRACTR))]
> +  "TARGET_POWER10"
> +{
> +  if (BYTES_BIG_ENDIAN)
> +    {
> +      emit_insn (gen_vextractr<mode>_internal (operands[0], operands[1],
> +					       operands[2], operands[3]));
> +      emit_insn (gen_xxswapd_v2di (operands[0], operands[0]));
> +    }
> +  else
> +    emit_insn (gen_vextractl<mode>_internal (operands[0], operands[2],
> +					     operands[1], operands[3]));
> +  DONE;
> +})
> +
> +(define_insn "vextractr<mode>_internal"
> +  [(set (match_operand:V2DI 0 "altivec_register_operand" "=v")
> +	(unspec:V2DI [(match_operand:VEC_I 1 "altivec_register_operand" "v")
> +		      (match_operand:VEC_I 2 "altivec_register_operand" "v")
> +		      (match_operand:SI 3 "register_operand" "r")]
> +		     UNSPEC_EXTRACTR))]
> +  "TARGET_POWER10"
> +  "vext<du_or_d><wd>vrx %0,%1,%2,%3"
> +  [(set_attr "type" "vecsimple")])
> +
>  ;; VSX_EXTRACT optimizations
>  ;; Optimize double d = (double) vec_extract (vi, <n>)
>  ;; Get the element into the top position and use XVCVSWDP/XVCVUWDP
> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index ecd3661d257..0e65d542587 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -20927,6 +20927,9 @@ Perform a 128-bit vector gather  operation, as if implemented by the
>  integer value between 2 and 7 inclusive.
>  @findex vec_gnb
> 
> +
> +Vector Extract
> +
>  @smallexample
>  @exdent vector unsigned long long int
>  @exdent vec_extractl (vector unsigned char, vector unsigned char, unsigned int)
> @@ -20937,52 +20940,49 @@ integer value between 2 and 7 inclusive.
>  @exdent vector unsigned long long int
>  @exdent vec_extractl (vector unsigned long long, vector unsigned long long, unsigned int)
>  @end smallexample
> -Extract a single element from the vector formed by catenating this function's
> -first two arguments at the byte offset specified by this function's
> -third argument.  On big-endian targets, this function behaves as if
> -implemented by the @code{vextdubvlx}, @code{vextduhvlx},
> -@code{vextduwvlx}, or @code{vextddvlx} instructions, depending on the
> -types of the function's first two arguments.  On little-endian
> -targets, this function behaves as if implemented by the
> -@code{vextdubvrx}, @code{vextduhvrx},
> -@code{vextduwvrx}, or @code{vextddvrx} instructions.
> -The byte offset of the element to be extracted is calculated
> -by computing the remainder of dividing the third argument by 32.
> -If this reminader value is not a multiple of the vector element size,
> -or if its value added to the vector element size exceeds 32, the
> -result is undefined.
> +Extract an element from two concatenated vectors starting at the given byte index
> +in natural-endian order, and place it zero-extended in doubleword 1 of the result
> +according to natural element order.  If the byte index is out of range for the
> +data type, the intrinsic will be rejected.
> +For little-endian, this output will match the placement by the hardware
> +instruction, i.e., dword[0] in RTL notation.  For big-endian, an additional
> +instruction is needed to move it from the "left" doubleword to the  "right" one.
> +For little-endian, semantics matching the vextdubvrx, vextduhvrx,
> +vextduwvrx instruction will be generated, while for big-endian, semantics
> +matching the vextdubvlx, vextduhvlx, vextduwvlx instructions
> +will be generated.  Note that some fairly anomalous results can be generated if
> +the byte index is not aligned on an element boundary for the element being
> +extracted.  This is a limitation of the bi-endian vector programming model is
> +consistent with the limitation on vec_perm, for example.
>  @findex vec_extractl
> 
>  @smallexample
>  @exdent vector unsigned long long int
> -@exdent vec_extractr (vector unsigned char, vector unsigned char, unsigned int)
> +@exdent vec_extracth (vector unsigned char, vector unsigned char, unsigned int)
>  @exdent vector unsigned long long int
> -@exdent vec_extractr (vector unsigned short, vector unsigned short, unsigned int)
> +@exdent vec_extracth (vector unsigned short, vector unsigned short,
> +unsigned int)
>  @exdent vector unsigned long long int
> -@exdent vec_extractr (vector unsigned int, vector unsigned int, unsigned int)
> +@exdent vec_extracth (vector unsigned int, vector unsigned int, unsigned int)
>  @exdent vector unsigned long long int
> -@exdent vec_extractr (vector unsigned long long, vector unsigned long long, unsigned int)
> -@end smallexample
> -Extract a single element from the vector formed by catenating this function's
> -first two arguments at the byte offset calculated by subtracting this
> -function's third argument from 31.  On big-endian targets, this
> -function behaves as if
> -implemented by the
> -@code{vextdubvrx}, @code{vextduhvrx},
> -@code{vextduwvrx}, or @code{vextddvrx} instructions, depending on the
> -types of the function's first two arguments.
> -On little-endian
> -targets, this function behaves as if implemented by the
> -@code{vextdubvlx}, @code{vextduhvlx},
> -@code{vextduwvlx}, or @code{vextddvlx} instructions.
> -The byte offset of the element to be extracted, measured from the
> -right end of the catenation of the two vector arguments, is calculated
> -by computing the remainder of dividing the third argument by 32.
> -If this reminader value is not a multiple of the vector element size,
> -or if its value added to the vector element size exceeds 32, the
> -result is undefined.
> -@findex vec_extractr
> -
> +@exdent vec_extracth (vector unsigned long long, vector unsigned long long,
> +unsigned int)
> +@end smallexample
> +Extract an element from two concatenated vectors starting at the given byte
> +index in opposite-endian order, and place it zero-extended in doubleword 1

opposite-endian ? 

> +according to natural element order.  If the byte index is out of range for the
> +data type, the intrinsic will be rejected.  For little-endian, this output
> +will match the placement by the hardware instruction, i.e., dword[0] in RTL

Should 'the hardware instruction' be replaced with a reference to the
specific instruction itself?

> +notation.  For big-endian, an additional instruction is needed to move it
> +from the "left" doubleword to the "right" one.  For little-endian, semantics
> +matching the vextdubvlx, vextduhvlx, vextduwvlx instructions will be generated,

The instruction references should be wrapped in @code{}.


> +while for big-endian, semantics matching the vextdubvrx, vextduhvrx,
> +vextduwvrx instructions will be generated.  Note that some fairly anomalous
> +results can be generated if the byte index is not aligned on the
> +element boundary for the element being extracted.  This is a
> +limitation of the bi-endian vector programming model consistent with the
> +limitation on vec_perm, for example.

This reads awkwardly.  Maybe s/, for example// ?

Also wrap the vec_perm reference in @code{}.
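
For the @code{} suggestions, roughly what I have in mind is below.  This
is only a sketch and has not been run through makeinfo; the wording is
otherwise as in your patch, with the trailing "for example" dropped, and
the "..." stands for the unchanged text in between:

    matching the @code{vextdubvlx}, @code{vextduhvlx}, @code{vextduwvlx}
    instructions will be generated, while for big-endian, semantics
    matching the @code{vextdubvrx}, @code{vextduhvrx}, @code{vextduwvrx}
    instructions will be generated.
    ...
    limitation of the bi-endian vector programming model consistent with
    the limitation on @code{vec_perm}.

The same wrapping would apply to the instruction names in the
vec_extractl description earlier in the patch.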


> +@findex vec_extracth
>  @smallexample
>  @exdent vector unsigned long long int
>  @exdent vec_pdep (vector unsigned long long int, vector unsigned long long int)