Message-ID: <77c96aea9332b714803ef58844e71f223b0d7521.camel@vnet.ibm.com>
Subject: Re: [PATCH 0/6 ver 4] ] Permute Class Operations
From: will schmidt
To: Carl Love, segher@gcc.gnu.org, dje.gcc@gmail.com, gcc-patches@gcc.gnu.org
Date: Thu, 09 Jul 2020 10:31:16 -0500
In-Reply-To: <50a25bffa56dcbb951afb105b77d5ae16ee91d40.camel@us.ibm.com>
References: <50a25bffa56dcbb951afb105b77d5ae16ee91d40.camel@us.ibm.com>

On Wed, 2020-07-08 at 12:58 -0700, Carl Love wrote:
> [PATCH 1/6] rs6000, Update support for
vec_extract

The email subject needs to be updated too; it is at least correct inline.
This applies here and to the subsequent messages in the thread.

>
> -------------------------
> V4 changes
> rebased onto mainline 7/2/2020
> Add iterator name to Change log
>
> -------------------------------
> V3 changes
>
> Redo ChangeLog for code move.
> Replace spaces with tabs in ChangeLog.
> Replaced intruction names using * with the actual list of names. For
> example vextdu*vrx with the explicit instruction names vextdubvrx,
> vextduhvrx, etc.
> -------------------------
> v2 changes
>
> config/rs6000/altivec.md log entry for move from changed as suggested.
>
> config/rs6000/vsx.md log entro for moved to here changed as suggested.
>
> define_mode_iterator VI2 also moved, included in both change log entries
>
> --------------------------------------------
> GCC maintainers:
>
> Move the existing vector extract support in altivec.md to vsx.md
> so all of the vector insert and extract support is in the same file.
>
> The patch also updates the name of the builtins and descriptions for the
> builtins in the documentation file so they match the approved builtin
> names and descriptions.
>
> The patch does not make any functional changes.
>
> Please let me know if the changes are acceptable for mainline. Thanks.
>
> Carl Love
>
> ------------------------------------------------------
>
> gcc/ChangeLog
>
> 2020-07-06 Carl Love
>
> * config/rs6000/altivec.md: (UNSPEC_EXTRACTL, UNSPEC_EXTRACTR)
> (vextractl, vextractr)
> (vextractl_internal, vextractr_internal for mode VI2)
> (VI2): Move to ...
> * config/rs6000/vsx.md: (UNSPEC_EXTRACTL, UNSPEC_EXTRACTR)
> (vextractl, vextractr)
> (vextractl_internal, vextractr_internal for mode VI2)
> (VI2): ..here.
> * gcc/doc/extend.texi: Update documentation for vec_extractl.
> Replace builtin name vec_extractr with vec_extracth. Update description
> of vec_extracth.
> --- > gcc/config/rs6000/altivec.md | 64 ----------------------------- > gcc/config/rs6000/vsx.md | 66 ++++++++++++++++++++++++++++++ > gcc/doc/extend.texi | 78 ++++++++++++++++++------------------ > 3 files changed, 105 insertions(+), 103 deletions(-) > > diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md > index 2ce9227c765..749b2c42c14 100644 > --- a/gcc/config/rs6000/altivec.md > +++ b/gcc/config/rs6000/altivec.md > @@ -172,8 +172,6 @@ > UNSPEC_XXEVAL > UNSPEC_VSTRIR > UNSPEC_VSTRIL > - UNSPEC_EXTRACTL > - UNSPEC_EXTRACTR > ]) > > (define_c_enum "unspecv" > @@ -184,8 +182,6 @@ > UNSPECV_DSS > ]) > > -;; Like VI, defined in vector.md, but add ISA 2.07 integer vector ops > -(define_mode_iterator VI2 [V4SI V8HI V16QI V2DI]) > ;; Short vec int modes > (define_mode_iterator VIshort [V8HI V16QI]) > ;; Longer vec int modes for rotate/mask ops > @@ -786,66 +782,6 @@ > DONE; > }) > > -(define_expand "vextractl" > - [(set (match_operand:V2DI 0 "altivec_register_operand") > - (unspec:V2DI [(match_operand:VI2 1 "altivec_register_operand") > - (match_operand:VI2 2 "altivec_register_operand") > - (match_operand:SI 3 "register_operand")] > - UNSPEC_EXTRACTL))] > - "TARGET_POWER10" > -{ > - if (BYTES_BIG_ENDIAN) > - { > - emit_insn (gen_vextractl_internal (operands[0], operands[1], > - operands[2], operands[3])); > - emit_insn (gen_xxswapd_v2di (operands[0], operands[0])); > - } > - else > - emit_insn (gen_vextractr_internal (operands[0], operands[2], > - operands[1], operands[3])); > - DONE; > -}) > - > -(define_insn "vextractl_internal" > - [(set (match_operand:V2DI 0 "altivec_register_operand" "=v") > - (unspec:V2DI [(match_operand:VEC_I 1 "altivec_register_operand" "v") > - (match_operand:VEC_I 2 "altivec_register_operand" "v") > - (match_operand:SI 3 "register_operand" "r")] > - UNSPEC_EXTRACTL))] > - "TARGET_POWER10" > - "vextvlx %0,%1,%2,%3" > - [(set_attr "type" "vecsimple")]) > - > -(define_expand "vextractr" > - [(set (match_operand:V2DI 0 
"altivec_register_operand") > - (unspec:V2DI [(match_operand:VI2 1 "altivec_register_operand") > - (match_operand:VI2 2 "altivec_register_operand") > - (match_operand:SI 3 "register_operand")] > - UNSPEC_EXTRACTR))] > - "TARGET_POWER10" > -{ > - if (BYTES_BIG_ENDIAN) > - { > - emit_insn (gen_vextractr_internal (operands[0], operands[1], > - operands[2], operands[3])); > - emit_insn (gen_xxswapd_v2di (operands[0], operands[0])); > - } > - else > - emit_insn (gen_vextractl_internal (operands[0], operands[2], > - operands[1], operands[3])); > - DONE; > -}) > - > -(define_insn "vextractr_internal" > - [(set (match_operand:V2DI 0 "altivec_register_operand" "=v") > - (unspec:V2DI [(match_operand:VEC_I 1 "altivec_register_operand" "v") > - (match_operand:VEC_I 2 "altivec_register_operand" "v") > - (match_operand:SI 3 "register_operand" "r")] > - UNSPEC_EXTRACTR))] > - "TARGET_POWER10" > - "vextvrx %0,%1,%2,%3" > - [(set_attr "type" "vecsimple")]) > - > (define_expand "vstrir_" > [(set (match_operand:VIshort 0 "altivec_register_operand") > (unspec:VIshort [(match_operand:VIshort 1 "altivec_register_operand")] > diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md > index 732a54842b6..e9f89d43b3f 100644 > --- a/gcc/config/rs6000/vsx.md > +++ b/gcc/config/rs6000/vsx.md > @@ -347,6 +347,8 @@ > UNSPEC_VSX_FIRST_MISMATCH_INDEX > UNSPEC_VSX_FIRST_MISMATCH_EOS_INDEX > UNSPEC_XXGENPCV > + UNSPEC_EXTRACTL > + UNSPEC_EXTRACTR > ]) > > (define_int_iterator XVCVBF16 [UNSPEC_VSX_XVCVSPBF16 > @@ -355,6 +357,9 @@ > (define_int_attr xvcvbf16 [(UNSPEC_VSX_XVCVSPBF16 "xvcvspbf16") > (UNSPEC_VSX_XVCVBF16SP "xvcvbf16sp")]) > > +;; Like VI, defined in vector.md, but add ISA 2.07 integer vector ops > +(define_mode_iterator VI2 [V4SI V8HI V16QI V2DI]) > + > ;; VSX moves > > ;; The patterns for LE permuted loads and stores come before the general > @@ -3799,6 +3804,67 @@ > } > [(set_attr "type" "load")]) > > +;; ISA 3.1 extract > +(define_expand "vextractl" > + [(set 
(match_operand:V2DI 0 "altivec_register_operand") > + (unspec:V2DI [(match_operand:VI2 1 "altivec_register_operand") > + (match_operand:VI2 2 "altivec_register_operand") > + (match_operand:SI 3 "register_operand")] > + UNSPEC_EXTRACTL))] > + "TARGET_POWER10" > +{ > + if (BYTES_BIG_ENDIAN) > + { > + emit_insn (gen_vextractl_internal (operands[0], operands[1], > + operands[2], operands[3])); > + emit_insn (gen_xxswapd_v2di (operands[0], operands[0])); > + } > + else > + emit_insn (gen_vextractr_internal (operands[0], operands[2], > + operands[1], operands[3])); > + DONE; > +}) > + > +(define_insn "vextractl_internal" > + [(set (match_operand:V2DI 0 "altivec_register_operand" "=v") > + (unspec:V2DI [(match_operand:VEC_I 1 "altivec_register_operand" "v") > + (match_operand:VEC_I 2 "altivec_register_operand" "v") > + (match_operand:SI 3 "register_operand" "r")] > + UNSPEC_EXTRACTL))] > + "TARGET_POWER10" > + "vextvlx %0,%1,%2,%3" > + [(set_attr "type" "vecsimple")]) > + > +(define_expand "vextractr" > + [(set (match_operand:V2DI 0 "altivec_register_operand") > + (unspec:V2DI [(match_operand:VI2 1 "altivec_register_operand") > + (match_operand:VI2 2 "altivec_register_operand") > + (match_operand:SI 3 "register_operand")] > + UNSPEC_EXTRACTR))] > + "TARGET_POWER10" > +{ > + if (BYTES_BIG_ENDIAN) > + { > + emit_insn (gen_vextractr_internal (operands[0], operands[1], > + operands[2], operands[3])); > + emit_insn (gen_xxswapd_v2di (operands[0], operands[0])); > + } > + else > + emit_insn (gen_vextractl_internal (operands[0], operands[2], > + operands[1], operands[3])); > + DONE; > +}) > + > +(define_insn "vextractr_internal" > + [(set (match_operand:V2DI 0 "altivec_register_operand" "=v") > + (unspec:V2DI [(match_operand:VEC_I 1 "altivec_register_operand" "v") > + (match_operand:VEC_I 2 "altivec_register_operand" "v") > + (match_operand:SI 3 "register_operand" "r")] > + UNSPEC_EXTRACTR))] > + "TARGET_POWER10" > + "vextvrx %0,%1,%2,%3" > + [(set_attr "type" "vecsimple")]) > + 
> ;; VSX_EXTRACT optimizations > ;; Optimize double d = (double) vec_extract (vi, ) > ;; Get the element into the top position and use XVCVSWDP/XVCVUWDP > diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi > index ecd3661d257..0e65d542587 100644 > --- a/gcc/doc/extend.texi > +++ b/gcc/doc/extend.texi > @@ -20927,6 +20927,9 @@ Perform a 128-bit vector gather operation, as if implemented by the > integer value between 2 and 7 inclusive. > @findex vec_gnb > > + > +Vector Extract > + > @smallexample > @exdent vector unsigned long long int > @exdent vec_extractl (vector unsigned char, vector unsigned char, unsigned int) > @@ -20937,52 +20940,49 @@ integer value between 2 and 7 inclusive. > @exdent vector unsigned long long int > @exdent vec_extractl (vector unsigned long long, vector unsigned long long, unsigned int) > @end smallexample > -Extract a single element from the vector formed by catenating this function's > -first two arguments at the byte offset specified by this function's > -third argument. On big-endian targets, this function behaves as if > -implemented by the @code{vextdubvlx}, @code{vextduhvlx}, > -@code{vextduwvlx}, or @code{vextddvlx} instructions, depending on the > -types of the function's first two arguments. On little-endian > -targets, this function behaves as if implemented by the > -@code{vextdubvrx}, @code{vextduhvrx}, > -@code{vextduwvrx}, or @code{vextddvrx} instructions. > -The byte offset of the element to be extracted is calculated > -by computing the remainder of dividing the third argument by 32. > -If this reminader value is not a multiple of the vector element size, > -or if its value added to the vector element size exceeds 32, the > -result is undefined. > +Extract an element from two concatenated vectors starting at the given byte index > +in natural-endian order, and place it zero-extended in doubleword 1 of the result > +according to natural element order. 
If the byte index is out of range for the > +data type, the intrinsic will be rejected. > +For little-endian, this output will match the placement by the hardware > +instruction, i.e., dword[0] in RTL notation. For big-endian, an additional > +instruction is needed to move it from the "left" doubleword to the "right" one. > +For little-endian, semantics matching the vextdubvrx, vextduhvrx, > +vextduwvrx instruction will be generated, while for big-endian, semantics > +matching the vextdubvlx, vextduhvlx, vextduwvlx instructions > +will be generated. Note that some fairly anomalous results can be generated if > +the byte index is not aligned on an element boundary for the element being > +extracted. This is a limitation of the bi-endian vector programming model is > +consistent with the limitation on vec_perm, for example. > @findex vec_extractl > > @smallexample > @exdent vector unsigned long long int > -@exdent vec_extractr (vector unsigned char, vector unsigned char, unsigned int) > +@exdent vec_extracth (vector unsigned char, vector unsigned char, unsigned int) > @exdent vector unsigned long long int > -@exdent vec_extractr (vector unsigned short, vector unsigned short, unsigned int) > +@exdent vec_extracth (vector unsigned short, vector unsigned short, > +unsigned int) > @exdent vector unsigned long long int > -@exdent vec_extractr (vector unsigned int, vector unsigned int, unsigned int) > +@exdent vec_extracth (vector unsigned int, vector unsigned int, unsigned int) > @exdent vector unsigned long long int > -@exdent vec_extractr (vector unsigned long long, vector unsigned long long, unsigned int) > -@end smallexample > -Extract a single element from the vector formed by catenating this function's > -first two arguments at the byte offset calculated by subtracting this > -function's third argument from 31. 
On big-endian targets, this
> -function behaves as if
> -implemented by the
> -@code{vextdubvrx}, @code{vextduhvrx},
> -@code{vextduwvrx}, or @code{vextddvrx} instructions, depending on the
> -types of the function's first two arguments.
> -On little-endian
> -targets, this function behaves as if implemented by the
> -@code{vextdubvlx}, @code{vextduhvlx},
> -@code{vextduwvlx}, or @code{vextddvlx} instructions.
> -The byte offset of the element to be extracted, measured from the
> -right end of the catenation of the two vector arguments, is calculated
> -by computing the remainder of dividing the third argument by 32.
> -If this reminader value is not a multiple of the vector element size,
> -or if its value added to the vector element size exceeds 32, the
> -result is undefined.
> -@findex vec_extractr
> -
> +@exdent vec_extracth (vector unsigned long long, vector unsigned long long,
> +unsigned int)
> +@end smallexample
> +Extract an element from two concatenated vectors starting at the given byte
> +index in opposite-endian order, and place it zero-extended in doubleword 1

opposite-endian?

> +according to natural element order. If the byte index is out of range for the
> +data type, the intrinsic will be rejected. For little-endian, this output
> +will match the placement by the hardware instruction, i.e., dword[0] in RTL

Should 'hardware instruction' be replaced with the instruction reference
itself?

> +notation. For big-endian, an additional instruction is needed to move it
> +from the "left" doubleword to the "right" one. For little-endian, semantics
> +matching the vextdubvlx, vextduhvlx, vextduwvlx instructions will be generated,

The instruction references should be wrapped in @code{}.

> +while for big-endian, semantics matching the vextdubvrx, vextduhvrx,
> +vextduwvrx instructions will be generated. Note that some fairly anomalous
> +results can be generated if the byte index is not aligned on the
> +element boundary for the element being extracted. This is a
This is a > +limitation of the bi-endian vector programming model consistent with the > +limitation on vec_perm, for example. This reads akwardly. maybe s/for example// ? wrap vec_perm reference in @code{} > +@findex vec_extracth > @smallexample > @exdent vector unsigned long long int > @exdent vec_pdep (vector unsigned long long int, vector unsigned long long int)