Message-ID: <50a25bffa56dcbb951afb105b77d5ae16ee91d40.camel@us.ibm.com>
Subject: Re: [PATCH 0/6 ver 4] ] Permute Class Operations
From: Carl Love
To: segher@gcc.gnu.org, dje.gcc@gmail.com, gcc-patches@gcc.gnu.org, Will Schmidt
Date: Wed, 08 Jul 2020 12:58:48 -0700
List-Id: Gcc-patches mailing list

[PATCH 1/6] rs6000, Update support for vec_extract

-------------------------
V4 changes

   rebased onto mainline 7/2/2020
   Add iterator name to Change log
-------------------------------
V3 changes

   Redo ChangeLog for code move.
   Replace spaces with tabs in ChangeLog.
   Replaced instruction names using * with the actual list of names.
   For example vextdu*vrx with the explicit instruction names vextdubvrx,
   vextduhvrx, etc.
-------------------------
v2 changes

   config/rs6000/altivec.md log entry for move from changed as suggested.
   config/rs6000/vsx.md log entry for moved to here changed as suggested.
   define_mode_iterator VI2 also moved, included in both change log entries
--------------------------------------------
GCC maintainers:

Move the existing vector extract support in altivec.md to vsx.md
so all of the vector insert and extract support is in the same file.

The patch also updates the name of the builtins and descriptions for the
builtins in the documentation file so they match the approved builtin
names and descriptions.

The patch does not make any functional changes.

Please let me know if the changes are acceptable for mainline.  Thanks.

                  Carl Love

------------------------------------------------------

gcc/ChangeLog

2020-07-06  Carl Love

	* config/rs6000/altivec.md: (UNSPEC_EXTRACTL, UNSPEC_EXTRACTR)
	(vextractl, vextractr)
	(vextractl_internal, vextractr_internal for mode VI2)
	(VI2):  Move to ...
	* config/rs6000/vsx.md: (UNSPEC_EXTRACTL, UNSPEC_EXTRACTR)
	(vextractl, vextractr)
	(vextractl_internal, vextractr_internal for mode VI2)
	(VI2):  ..here.
	* gcc/doc/extend.texi: Update documentation for vec_extractl.
	Replace builtin name vec_extractr with vec_extracth.  Update description
	of vec_extracth.
---
 gcc/config/rs6000/altivec.md | 64 -----------------------------
 gcc/config/rs6000/vsx.md     | 66 ++++++++++++++++++++++++++++++
 gcc/doc/extend.texi          | 78 ++++++++++++++++++------------------
 3 files changed, 105 insertions(+), 103 deletions(-)

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 2ce9227c765..749b2c42c14 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -172,8 +172,6 @@
    UNSPEC_XXEVAL
    UNSPEC_VSTRIR
    UNSPEC_VSTRIL
-   UNSPEC_EXTRACTL
-   UNSPEC_EXTRACTR
 ])
 
 (define_c_enum "unspecv"
@@ -184,8 +182,6 @@
    UNSPECV_DSS
   ])
 
-;; Like VI, defined in vector.md, but add ISA 2.07 integer vector ops
-(define_mode_iterator VI2 [V4SI V8HI V16QI V2DI])
 ;; Short vec int modes
 (define_mode_iterator VIshort [V8HI V16QI])
 ;; Longer vec int modes for rotate/mask ops
@@ -786,66 +782,6 @@
   DONE;
 })
 
-(define_expand "vextractl"
-  [(set (match_operand:V2DI 0 "altivec_register_operand")
-        (unspec:V2DI [(match_operand:VI2 1 "altivec_register_operand")
-                      (match_operand:VI2 2 "altivec_register_operand")
-                      (match_operand:SI 3 "register_operand")]
-                     UNSPEC_EXTRACTL))]
-  "TARGET_POWER10"
-{
-  if (BYTES_BIG_ENDIAN)
-    {
-      emit_insn (gen_vextractl_internal (operands[0], operands[1],
-                                         operands[2], operands[3]));
-      emit_insn (gen_xxswapd_v2di (operands[0], operands[0]));
-    }
-  else
-    emit_insn (gen_vextractr_internal (operands[0], operands[2],
-                                       operands[1], operands[3]));
-  DONE;
-})
-
-(define_insn "vextractl_internal"
-  [(set (match_operand:V2DI 0 "altivec_register_operand" "=v")
-        (unspec:V2DI [(match_operand:VEC_I 1 "altivec_register_operand" "v")
-                      (match_operand:VEC_I 2 "altivec_register_operand" "v")
-                      (match_operand:SI 3 "register_operand" "r")]
-                     UNSPEC_EXTRACTL))]
-  "TARGET_POWER10"
-  "vextvlx %0,%1,%2,%3"
-  [(set_attr "type" "vecsimple")])
-
-(define_expand "vextractr"
-  [(set (match_operand:V2DI 0 "altivec_register_operand")
-        (unspec:V2DI [(match_operand:VI2 1 "altivec_register_operand")
-                      (match_operand:VI2 2 "altivec_register_operand")
-                      (match_operand:SI 3 "register_operand")]
-                     UNSPEC_EXTRACTR))]
-  "TARGET_POWER10"
-{
-  if (BYTES_BIG_ENDIAN)
-    {
-      emit_insn (gen_vextractr_internal (operands[0], operands[1],
-                                         operands[2], operands[3]));
-      emit_insn (gen_xxswapd_v2di (operands[0], operands[0]));
-    }
-  else
-    emit_insn (gen_vextractl_internal (operands[0], operands[2],
-                                       operands[1], operands[3]));
-  DONE;
-})
-
-(define_insn "vextractr_internal"
-  [(set (match_operand:V2DI 0 "altivec_register_operand" "=v")
-        (unspec:V2DI [(match_operand:VEC_I 1 "altivec_register_operand" "v")
-                      (match_operand:VEC_I 2 "altivec_register_operand" "v")
-                      (match_operand:SI 3 "register_operand" "r")]
-                     UNSPEC_EXTRACTR))]
-  "TARGET_POWER10"
-  "vextvrx %0,%1,%2,%3"
-  [(set_attr "type" "vecsimple")])
-
 (define_expand "vstrir_"
   [(set (match_operand:VIshort 0 "altivec_register_operand")
         (unspec:VIshort [(match_operand:VIshort 1 "altivec_register_operand")]
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 732a54842b6..e9f89d43b3f 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -347,6 +347,8 @@
    UNSPEC_VSX_FIRST_MISMATCH_INDEX
    UNSPEC_VSX_FIRST_MISMATCH_EOS_INDEX
    UNSPEC_XXGENPCV
+   UNSPEC_EXTRACTL
+   UNSPEC_EXTRACTR
   ])
 
 (define_int_iterator XVCVBF16 [UNSPEC_VSX_XVCVSPBF16
@@ -355,6 +357,9 @@
 (define_int_attr xvcvbf16 [(UNSPEC_VSX_XVCVSPBF16 "xvcvspbf16")
                            (UNSPEC_VSX_XVCVBF16SP "xvcvbf16sp")])
 
+;; Like VI, defined in vector.md, but add ISA 2.07 integer vector ops
+(define_mode_iterator VI2 [V4SI V8HI V16QI V2DI])
+
 ;; VSX moves
 
 ;; The patterns for LE permuted loads and stores come before the general
@@ -3799,6 +3804,67 @@
 }
   [(set_attr "type" "load")])
 
+;; ISA 3.1 extract
+(define_expand "vextractl"
+  [(set (match_operand:V2DI 0 "altivec_register_operand")
+        (unspec:V2DI [(match_operand:VI2 1 "altivec_register_operand")
+                      (match_operand:VI2 2 "altivec_register_operand")
+                      (match_operand:SI 3 "register_operand")]
+                     UNSPEC_EXTRACTL))]
+  "TARGET_POWER10"
+{
+  if (BYTES_BIG_ENDIAN)
+    {
+      emit_insn (gen_vextractl_internal (operands[0], operands[1],
+                                         operands[2], operands[3]));
+      emit_insn (gen_xxswapd_v2di (operands[0], operands[0]));
+    }
+  else
+    emit_insn (gen_vextractr_internal (operands[0], operands[2],
+                                       operands[1], operands[3]));
+  DONE;
+})
+
+(define_insn "vextractl_internal"
+  [(set (match_operand:V2DI 0 "altivec_register_operand" "=v")
+        (unspec:V2DI [(match_operand:VEC_I 1 "altivec_register_operand" "v")
+                      (match_operand:VEC_I 2 "altivec_register_operand" "v")
+                      (match_operand:SI 3 "register_operand" "r")]
+                     UNSPEC_EXTRACTL))]
+  "TARGET_POWER10"
+  "vextvlx %0,%1,%2,%3"
+  [(set_attr "type" "vecsimple")])
+
+(define_expand "vextractr"
+  [(set (match_operand:V2DI 0 "altivec_register_operand")
+        (unspec:V2DI [(match_operand:VI2 1 "altivec_register_operand")
+                      (match_operand:VI2 2 "altivec_register_operand")
+                      (match_operand:SI 3 "register_operand")]
+                     UNSPEC_EXTRACTR))]
+  "TARGET_POWER10"
+{
+  if (BYTES_BIG_ENDIAN)
+    {
+      emit_insn (gen_vextractr_internal (operands[0], operands[1],
+                                         operands[2], operands[3]));
+      emit_insn (gen_xxswapd_v2di (operands[0], operands[0]));
+    }
+  else
+    emit_insn (gen_vextractl_internal (operands[0], operands[2],
+                                       operands[1], operands[3]));
+  DONE;
+})
+
+(define_insn "vextractr_internal"
+  [(set (match_operand:V2DI 0 "altivec_register_operand" "=v")
+        (unspec:V2DI [(match_operand:VEC_I 1 "altivec_register_operand" "v")
+                      (match_operand:VEC_I 2 "altivec_register_operand" "v")
+                      (match_operand:SI 3 "register_operand" "r")]
+                     UNSPEC_EXTRACTR))]
+  "TARGET_POWER10"
+  "vextvrx %0,%1,%2,%3"
+  [(set_attr "type" "vecsimple")])
+
 ;; VSX_EXTRACT optimizations
 ;; Optimize double d = (double) vec_extract (vi, )
 ;; Get the element into the top position and use XVCVSWDP/XVCVUWDP
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index ecd3661d257..0e65d542587 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -20927,6 +20927,9 @@ Perform a 128-bit vector gather operation, as if implemented by the
 integer value between 2 and 7 inclusive.
 @findex vec_gnb
 
+
+Vector Extract
+
 @smallexample
 @exdent vector unsigned long long int
 @exdent vec_extractl (vector unsigned char, vector unsigned char, unsigned int)
@@ -20937,52 +20940,49 @@ integer value between 2 and 7 inclusive.
 @exdent vector unsigned long long int
 @exdent vec_extractl (vector unsigned long long, vector unsigned long long, unsigned int)
 @end smallexample
-Extract a single element from the vector formed by catenating this function's
-first two arguments at the byte offset specified by this function's
-third argument.  On big-endian targets, this function behaves as if
-implemented by the @code{vextdubvlx}, @code{vextduhvlx},
-@code{vextduwvlx}, or @code{vextddvlx} instructions, depending on the
-types of the function's first two arguments.  On little-endian
-targets, this function behaves as if implemented by the
-@code{vextdubvrx}, @code{vextduhvrx},
-@code{vextduwvrx}, or @code{vextddvrx} instructions.
-The byte offset of the element to be extracted is calculated
-by computing the remainder of dividing the third argument by 32.
-If this reminader value is not a multiple of the vector element size,
-or if its value added to the vector element size exceeds 32, the
-result is undefined.
+Extract an element from two concatenated vectors starting at the given byte index
+in natural-endian order, and place it zero-extended in doubleword 1 of the result
+according to natural element order.  If the byte index is out of range for the
+data type, the intrinsic will be rejected.
+For little-endian, this output will match the placement by the hardware
+instruction, i.e., dword[0] in RTL notation.  For big-endian, an additional
+instruction is needed to move it from the "left" doubleword to the "right" one.
+For little-endian, semantics matching the vextdubvrx, vextduhvrx,
+vextduwvrx instruction will be generated, while for big-endian, semantics
+matching the vextdubvlx, vextduhvlx, vextduwvlx instructions
+will be generated.  Note that some fairly anomalous results can be generated if
+the byte index is not aligned on an element boundary for the element being
+extracted.  This is a limitation of the bi-endian vector programming model is
+consistent with the limitation on vec_perm, for example.
 @findex vec_extractl
 
 @smallexample
 @exdent vector unsigned long long int
-@exdent vec_extractr (vector unsigned char, vector unsigned char, unsigned int)
+@exdent vec_extracth (vector unsigned char, vector unsigned char, unsigned int)
 @exdent vector unsigned long long int
-@exdent vec_extractr (vector unsigned short, vector unsigned short, unsigned int)
+@exdent vec_extracth (vector unsigned short, vector unsigned short,
+unsigned int)
 @exdent vector unsigned long long int
-@exdent vec_extractr (vector unsigned int, vector unsigned int, unsigned int)
+@exdent vec_extracth (vector unsigned int, vector unsigned int, unsigned int)
 @exdent vector unsigned long long int
-@exdent vec_extractr (vector unsigned long long, vector unsigned long long, unsigned int)
-@end smallexample
-Extract a single element from the vector formed by catenating this function's
-first two arguments at the byte offset calculated by subtracting this
-function's third argument from 31.  On big-endian targets, this
-function behaves as if
-implemented by the
-@code{vextdubvrx}, @code{vextduhvrx},
-@code{vextduwvrx}, or @code{vextddvrx} instructions, depending on the
-types of the function's first two arguments.
-On little-endian
-targets, this function behaves as if implemented by the
-@code{vextdubvlx}, @code{vextduhvlx},
-@code{vextduwvlx}, or @code{vextddvlx} instructions.
-The byte offset of the element to be extracted, measured from the
-right end of the catenation of the two vector arguments, is calculated
-by computing the remainder of dividing the third argument by 32.
-If this reminader value is not a multiple of the vector element size,
-or if its value added to the vector element size exceeds 32, the
-result is undefined.
-@findex vec_extractr
-
+@exdent vec_extracth (vector unsigned long long, vector unsigned long long,
+unsigned int)
+@end smallexample
+Extract an element from two concatenated vectors starting at the given byte
+index in opposite-endian order, and place it zero-extended in doubleword 1
+according to natural element order.  If the byte index is out of range for the
+data type, the intrinsic will be rejected.  For little-endian, this output
+will match the placement by the hardware instruction, i.e., dword[0] in RTL
+notation.  For big-endian, an additional instruction is needed to move it
+from the "left" doubleword to the "right" one.  For little-endian, semantics
+matching the vextdubvlx, vextduhvlx, vextduwvlx instructions will be generated,
+while for big-endian, semantics matching the vextdubvrx, vextduhvrx,
+vextduwvrx instructions will be generated.  Note that some fairly anomalous
+results can be generated if the byte index is not aligned on the
+element boundary for the element being extracted. This is a
+limitation of the bi-endian vector programming model consistent with the
+limitation on vec_perm, for example.
+@findex vec_extracth
 @smallexample
 @exdent vector unsigned long long int
 @exdent vec_pdep (vector unsigned long long int, vector unsigned long long int)
-- 
2.17.1

----------------------------------------------------------