Subject: Re: [PATCH 0/6 ver 4] ] Permute Class Operations
From: Carl Love
To: segher@gcc.gnu.org, dje.gcc@gmail.com, gcc-patches@gcc.gnu.org, Will Schmidt
Date: Wed, 08 Jul 2020 12:59:00 -0700
List-Id: Gcc-patches mailing list

[PATCH 2/6] rs6000 Add vector insert builtin support

------------------------------------
V4 changes
   Rebased on mainline.  Changed FUTURE to P10 as needed.
------------------------------------
V3 changes
   Replace spaces with tabs in ChangeLog.
   Ditto in gcc/config/rs6000/vsx.md.
   Updated description for vec_insertl() builtin.
   Cleaned up vec_insert description.
-----------------------------------------------------------------
v2 changes
   Fix change log entry for config/rs6000/altivec.h
   Fix change log entry for config/rs6000/rs6000-builtin.def
   Fix change log entry for config/rs6000/rs6000-call.c
   vsx.md: Fixed if (BYTES_BIG_ENDIAN) else statements.
   Porting error from pu branch.
---------------------------------------------------------------

GCC maintainers:

This patch adds support for vec_insertl and vec_inserth builtins.

The patch has been compiled and tested on

  powerpc64le-unknown-linux-gnu (Power 9 LE)

and mambo with no regression errors.

Please let me know if this patch is acceptable for the mainline branch.
Thanks.

                         Carl Love

--------------------------------------------------------------
gcc/ChangeLog

2020-07-02  Carl Love

	* config/rs6000/altivec.h (vec_insertl, vec_inserth): New defines.
	* config/rs6000/rs6000-builtin.def (VINSERTGPRBL, VINSERTGPRHL,
	VINSERTGPRWL, VINSERTGPRDL, VINSERTVPRBL, VINSERTVPRHL, VINSERTVPRWL,
	VINSERTGPRBR, VINSERTGPRHR, VINSERTGPRWR, VINSERTGPRDR, VINSERTVPRBR,
	VINSERTVPRHR, VINSERTVPRWR): New builtins.
	(INSERTL, INSERTH): New builtins.
	* config/rs6000/rs6000-call.c (P10_BUILTIN_VEC_INSERTL,
	P10_BUILTIN_VEC_INSERTH): New overloaded definitions.
	(P10_BUILTIN_VINSERTGPRBL, P10_BUILTIN_VINSERTGPRHL,
	P10_BUILTIN_VINSERTGPRWL, P10_BUILTIN_VINSERTGPRDL,
	P10_BUILTIN_VINSERTVPRBL, P10_BUILTIN_VINSERTVPRHL,
	P10_BUILTIN_VINSERTVPRWL): Add case entries.
	* config/rs6000/vsx.md (define_c_enum): Add UNSPEC_INSERTL,
	UNSPEC_INSERTR.
	(define_expand): Add vinsertvl_, vinsertvr_, vinsertgl_,
	vinsertgr_, mode is VI2.
	(define_insn): vinsertvl_internal_, vinsertvr_internal_,
	vinsertgl_internal_, vinsertgr_internal_, mode VEC_I.
	* doc/extend.texi: Add documentation for vec_insertl, vec_inserth.

gcc/testsuite/ChangeLog

2020-07-02  Carl Love

	* gcc.target/powerpc/vec-insert-word-runnable.c: New test case.
---
 gcc/config/rs6000/altivec.h                   |   2 +
 gcc/config/rs6000/rs6000-builtin.def          |  18 +
 gcc/config/rs6000/rs6000-call.c               |  51 +++
 gcc/config/rs6000/vsx.md                      | 110 ++++++
 gcc/doc/extend.texi                           |  71 ++++
 .../powerpc/vec-insert-word-runnable.c        | 345 ++++++++++++++++++
 6 files changed, 597 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-insert-word-runnable.c

diff --git a/gcc/config/rs6000/altivec.h b/gcc/config/rs6000/altivec.h
index bb1524f4a67..0563853c03f 100644
--- a/gcc/config/rs6000/altivec.h
+++ b/gcc/config/rs6000/altivec.h
@@ -699,6 +699,8 @@ __altivec_scalar_pred(vec_any_nle,
 /* Overloaded built-in functions for ISA 3.1.  */
 #define vec_extractl(a, b, c) __builtin_vec_extractl (a, b, c)
 #define vec_extracth(a, b, c) __builtin_vec_extracth (a, b, c)
+#define vec_insertl(a, b, c) __builtin_vec_insertl (a, b, c)
+#define vec_inserth(a, b, c) __builtin_vec_inserth (a, b, c)
 
 #define vec_gnb(a, b) __builtin_vec_gnb (a, b)
 #define vec_clrl(a, b) __builtin_vec_clrl (a, b)
diff --git a/gcc/config/rs6000/rs6000-builtin.def b/gcc/config/rs6000/rs6000-builtin.def
index 363656ec05c..e73d144c1cc 100644
--- a/gcc/config/rs6000/rs6000-builtin.def
+++ b/gcc/config/rs6000/rs6000-builtin.def
@@ -2708,6 +2708,22 @@ BU_P10V_3 (VEXTRACTHR, "vextduhvhx", CONST, vextractrv8hi)
 BU_P10V_3 (VEXTRACTWR, "vextduwvhx", CONST, vextractrv4si)
 BU_P10V_3 (VEXTRACTDR, "vextddvhx", CONST, vextractrv2di)
 
+BU_P10V_3 (VINSERTGPRBL, "vinsgubvlx", CONST, vinsertgl_v16qi)
+BU_P10V_3 (VINSERTGPRHL, "vinsguhvlx", CONST, vinsertgl_v8hi)
+BU_P10V_3 (VINSERTGPRWL, "vinsguwvlx", CONST, vinsertgl_v4si)
+BU_P10V_3 (VINSERTGPRDL, "vinsgudvlx", CONST, vinsertgl_v2di)
+BU_P10V_3 (VINSERTVPRBL, "vinsvubvlx", CONST, vinsertvl_v16qi)
+BU_P10V_3 (VINSERTVPRHL, "vinsvuhvlx", CONST, vinsertvl_v8hi)
+BU_P10V_3 (VINSERTVPRWL, "vinsvuwvlx", CONST, vinsertvl_v4si)
+
+BU_P10V_3 (VINSERTGPRBR, "vinsgubvrx", CONST, vinsertgr_v16qi)
+BU_P10V_3 (VINSERTGPRHR, "vinsguhvrx", CONST, vinsertgr_v8hi)
+BU_P10V_3 (VINSERTGPRWR, "vinsguwvrx", CONST, vinsertgr_v4si)
+BU_P10V_3 (VINSERTGPRDR, "vinsgudvrx", CONST, vinsertgr_v2di)
+BU_P10V_3 (VINSERTVPRBR, "vinsvubvrx", CONST, vinsertvr_v16qi)
+BU_P10V_3 (VINSERTVPRHR, "vinsvuhvrx", CONST, vinsertvr_v8hi)
+BU_P10V_3 (VINSERTVPRWR, "vinsvuwvrx", CONST, vinsertvr_v4si)
+
 BU_P10V_1 (VSTRIBR, "vstribr", CONST, vstrir_v16qi)
 BU_P10V_1 (VSTRIHR, "vstrihr", CONST, vstrir_v8hi)
 BU_P10V_1 (VSTRIBL, "vstribl", CONST, vstril_v16qi)
@@ -2727,6 +2743,8 @@ BU_P10_OVERLOAD_2 (XXGENPCVM, "xxgenpcvm")
 
 BU_P10_OVERLOAD_3 (EXTRACTL, "extractl")
 BU_P10_OVERLOAD_3 (EXTRACTH, "extracth")
+BU_P10_OVERLOAD_3 (INSERTL, "insertl")
+BU_P10_OVERLOAD_3 (INSERTH, "inserth")
 BU_P10_OVERLOAD_1 (VSTRIR, "strir")
 BU_P10_OVERLOAD_1 (VSTRIL, "stril")
 
diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index d3cf2de8878..820b361c0f6 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -5576,6 +5576,28 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_UINTQI },
 
+  { P10_BUILTIN_VEC_INSERTL, P10_BUILTIN_VINSERTGPRBL,
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_UINTQI,
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_UINTSI },
+  { P10_BUILTIN_VEC_INSERTL, P10_BUILTIN_VINSERTGPRHL,
+    RS6000_BTI_unsigned_V8HI, RS6000_BTI_UINTHI,
+    RS6000_BTI_unsigned_V8HI, RS6000_BTI_UINTSI },
+  { P10_BUILTIN_VEC_INSERTL, P10_BUILTIN_VINSERTGPRWL,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_UINTSI,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_UINTSI },
+  { P10_BUILTIN_VEC_INSERTL, P10_BUILTIN_VINSERTGPRDL,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_UINTDI,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_UINTSI },
+  { P10_BUILTIN_VEC_INSERTL, P10_BUILTIN_VINSERTVPRBL,
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI,
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_UINTQI },
+  { P10_BUILTIN_VEC_INSERTL, P10_BUILTIN_VINSERTVPRHL,
+    RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V8HI,
+    RS6000_BTI_unsigned_V8HI, RS6000_BTI_UINTQI },
+  { P10_BUILTIN_VEC_INSERTL, P10_BUILTIN_VINSERTVPRWL,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_UINTQI },
+
   { P10_BUILTIN_VEC_EXTRACTH, P10_BUILTIN_VEXTRACTBR,
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V16QI,
     RS6000_BTI_unsigned_V16QI, RS6000_BTI_UINTQI },
@@ -5589,6 +5611,28 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_UINTQI },
 
+  { P10_BUILTIN_VEC_INSERTH, P10_BUILTIN_VINSERTGPRBR,
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_UINTQI,
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_UINTSI },
+  { P10_BUILTIN_VEC_INSERTH, P10_BUILTIN_VINSERTGPRHR,
+    RS6000_BTI_unsigned_V8HI, RS6000_BTI_UINTHI,
+    RS6000_BTI_unsigned_V8HI, RS6000_BTI_UINTSI },
+  { P10_BUILTIN_VEC_INSERTH, P10_BUILTIN_VINSERTGPRWR,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_UINTSI,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_UINTSI },
+  { P10_BUILTIN_VEC_INSERTH, P10_BUILTIN_VINSERTGPRDR,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_UINTDI,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_UINTSI },
+  { P10_BUILTIN_VEC_INSERTH, P10_BUILTIN_VINSERTVPRBR,
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI,
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_UINTQI },
+  { P10_BUILTIN_VEC_INSERTH, P10_BUILTIN_VINSERTVPRHR,
+    RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V8HI,
+    RS6000_BTI_unsigned_V8HI, RS6000_BTI_UINTQI },
+  { P10_BUILTIN_VEC_INSERTH, P10_BUILTIN_VINSERTVPRWR,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_UINTQI },
+
   { P10_BUILTIN_VEC_VSTRIL, P10_BUILTIN_VSTRIBL,
     RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI, 0, 0 },
   { P10_BUILTIN_VEC_VSTRIL, P10_BUILTIN_VSTRIBL,
@@ -13788,6 +13832,13 @@ builtin_function_type (machine_mode mode_ret, machine_mode mode_arg0,
     case P10_BUILTIN_VEXTRACTHR:
     case P10_BUILTIN_VEXTRACTWR:
     case P10_BUILTIN_VEXTRACTDR:
+    case P10_BUILTIN_VINSERTGPRBL:
+    case P10_BUILTIN_VINSERTGPRHL:
+    case P10_BUILTIN_VINSERTGPRWL:
+    case P10_BUILTIN_VINSERTGPRDL:
+    case P10_BUILTIN_VINSERTVPRBL:
+    case P10_BUILTIN_VINSERTVPRHL:
+    case P10_BUILTIN_VINSERTVPRWL:
       h.uns_p[0] = 1;
       h.uns_p[1] = 1;
       h.uns_p[2] = 1;
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index e9f89d43b3f..e9d45d1dcfd 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -349,6 +349,8 @@
    UNSPEC_XXGENPCV
    UNSPEC_EXTRACTL
    UNSPEC_EXTRACTR
+   UNSPEC_INSERTL
+   UNSPEC_INSERTR
   ])
 
 (define_int_iterator XVCVBF16 [UNSPEC_VSX_XVCVSPBF16
@@ -3865,6 +3867,114 @@
   "vextvrx %0,%1,%2,%3"
   [(set_attr "type" "vecsimple")])
 
+(define_expand "vinsertvl_"
+  [(set (match_operand:VI2 0 "altivec_register_operand")
+        (unspec:VI2 [(match_operand:VI2 1 "altivec_register_operand")
+                     (match_operand:VI2 2 "altivec_register_operand")
+                     (match_operand:SI 3 "register_operand" "r")]
+        UNSPEC_INSERTL))]
+  "TARGET_POWER10"
+{
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (gen_vinsertvl_internal_ (operands[0], operands[3],
+                                        operands[1], operands[2]));
+  else
+    emit_insn (gen_vinsertvr_internal_ (operands[0], operands[3],
+                                        operands[1], operands[2]));
+  DONE;
+})
+
+(define_insn "vinsertvl_internal_"
+  [(set (match_operand:VEC_I 0 "altivec_register_operand" "=v")
+        (unspec:VEC_I [(match_operand:SI 1 "register_operand" "r")
+                       (match_operand:VEC_I 2 "altivec_register_operand" "v")
+                       (match_operand:VEC_I 3 "altivec_register_operand" "0")]
+        UNSPEC_INSERTL))]
+  "TARGET_POWER10"
+  "vinsvlx %0,%1,%2"
+  [(set_attr "type" "vecsimple")])
+
+(define_expand "vinsertvr_"
+  [(set (match_operand:VI2 0 "altivec_register_operand")
+        (unspec:VI2 [(match_operand:VI2 1 "altivec_register_operand")
+                     (match_operand:VI2 2 "altivec_register_operand")
+                     (match_operand:SI 3 "register_operand" "r")]
+        UNSPEC_INSERTR))]
+  "TARGET_POWER10"
+{
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (gen_vinsertvr_internal_ (operands[0], operands[3],
+                                        operands[1], operands[2]));
+  else
+    emit_insn (gen_vinsertvl_internal_ (operands[0], operands[3],
+                                        operands[1], operands[2]));
+  DONE;
+})
+
+(define_insn "vinsertvr_internal_"
+  [(set (match_operand:VEC_I 0 "altivec_register_operand" "=v")
+        (unspec:VEC_I [(match_operand:SI 1 "register_operand" "r")
+                       (match_operand:VEC_I 2 "altivec_register_operand" "v")
+                       (match_operand:VEC_I 3 "altivec_register_operand" "0")]
+        UNSPEC_INSERTR))]
+  "TARGET_POWER10"
+  "vinsvrx %0,%1,%2"
+  [(set_attr "type" "vecsimple")])
+
+(define_expand "vinsertgl_"
+  [(set (match_operand:VI2 0 "altivec_register_operand")
+        (unspec:VI2 [(match_operand:SI 1 "register_operand")
+                     (match_operand:VI2 2 "altivec_register_operand")
+                     (match_operand:SI 3 "register_operand")]
+        UNSPEC_INSERTL))]
+  "TARGET_POWER10"
+{
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (gen_vinsertgl_internal_ (operands[0], operands[3],
+                                        operands[1], operands[2]));
+  else
+    emit_insn (gen_vinsertgr_internal_ (operands[0], operands[3],
+                                        operands[1], operands[2]));
+  DONE;
+ })
+
+(define_insn "vinsertgl_internal_"
+  [(set (match_operand:VEC_I 0 "altivec_register_operand" "=v")
+        (unspec:VEC_I [(match_operand:SI 1 "register_operand" "r")
+                       (match_operand:SI 2 "register_operand" "r")
+                       (match_operand:VEC_I 3 "altivec_register_operand" "0")]
+        UNSPEC_INSERTL))]
+  "TARGET_POWER10"
+  "vinslx %0,%1,%2"
+  [(set_attr "type" "vecsimple")])
+
+(define_expand "vinsertgr_"
+  [(set (match_operand:VI2 0 "altivec_register_operand")
+        (unspec:VI2 [(match_operand:SI 1 "register_operand")
+                     (match_operand:VI2 2 "altivec_register_operand")
+                     (match_operand:SI 3 "register_operand")]
+        UNSPEC_INSERTR))]
+  "TARGET_POWER10"
+{
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (gen_vinsertgr_internal_ (operands[0], operands[3],
+                                        operands[1], operands[2]));
+  else
+    emit_insn (gen_vinsertgl_internal_ (operands[0], operands[3],
+                                        operands[1], operands[2]));
+  DONE;
+ })
+
+(define_insn "vinsertgr_internal_"
+  [(set (match_operand:VEC_I 0 "altivec_register_operand" "=v")
+        (unspec:VEC_I [(match_operand:SI 1 "register_operand" "r")
+                       (match_operand:SI 2 "register_operand" "r")
+                       (match_operand:VEC_I 3 "altivec_register_operand" "0")]
+        UNSPEC_INSERTR))]
+  "TARGET_POWER10"
+  "vinsrx %0,%1,%2"
+  [(set_attr "type" "vecsimple")])
+
 ;; VSX_EXTRACT optimizations
 ;; Optimize double d = (double) vec_extract (vi, )
 ;; Get the element into the top position and use XVCVSWDP/XVCVUWDP
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 0e65d542587..e643346a160 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -20991,6 +20991,77 @@ Perform a vector parallel bits deposit operation, as if implemented by the
 @code{vpdepd} instruction.
 @findex vec_pdep
 
+Vector Insert
+
+@smallexample
+@exdent vector unsigned char
+@exdent vec_insertl (unsigned char, vector unsigned char, unsigned int);
+@exdent vector unsigned short
+@exdent vec_insertl (unsigned short, vector unsigned short, unsigned int);
+@exdent vector unsigned int
+@exdent vec_insertl (unsigned int, vector unsigned int, unsigned int);
+@exdent vector unsigned long long
+@exdent vec_insertl (unsigned long long, vector unsigned long long,
+unsigned int);
+@exdent vector unsigned char
+@exdent vec_insertl (vector unsigned char, vector unsigned char, unsigned int);
+@exdent vector unsigned short
+@exdent vec_insertl (vector unsigned short, vector unsigned short,
+unsigned int);
+@exdent vector unsigned int
+@exdent vec_insertl (vector unsigned int, vector unsigned int, unsigned int);
+@end smallexample
+
+Let src be the first argument, when the first argument is a scalar, or the
+rightmost element of the left doubleword of the first argument, when the first
+argument is a vector.  Insert the source into the destination at the position
+given by the third argument, using natural element order in the second
+argument.  The rest of the second argument is unchanged.  If the byte
+index is greater than 14 for halfwords, greater than 12 for words, or
+greater than 8 for doublewords, the result is undefined.  For little-endian,
+the generated code will be semantically equivalent to vinsbrx, vinshrx,
+or vinswrx instructions.  Similarly for big-endian it will be semantically
+equivalent to vinsblx, vinshlx, vinswlx.  Note that some
+fairly anomalous results can be generated if the byte index is not aligned
+on an element boundary for the sort of element being inserted.  This is a
+limitation of the bi-endian vector programming model.
+@findex vec_insertl
+
+@smallexample
+@exdent vector unsigned char
+@exdent vec_inserth (unsigned char, vector unsigned char, unsigned int);
+@exdent vector unsigned short
+@exdent vec_inserth (unsigned short, vector unsigned short, unsigned int);
+@exdent vector unsigned int
+@exdent vec_inserth (unsigned int, vector unsigned int, unsigned int);
+@exdent vector unsigned long long
+@exdent vec_inserth (unsigned long long, vector unsigned long long,
+unsigned int);
+@exdent vector unsigned char
+@exdent vec_inserth (vector unsigned char, vector unsigned char, unsigned int);
+@exdent vector unsigned short
+@exdent vec_inserth (vector unsigned short, vector unsigned short,
+unsigned int);
+@exdent vector unsigned int
+@exdent vec_inserth (vector unsigned int, vector unsigned int, unsigned int);
+@end smallexample
+
+Let src be the first argument, when the first argument is a scalar, or the
+rightmost element of the first argument, when the first argument is a vector.
+Insert src into the second argument at the position identified by the third
+argument, using opposite element order in the second argument, and leaving the
+rest of the second argument unchanged.  If the byte index is greater than 14
+for halfwords, 12 for words, or 8 for doublewords, the intrinsic will be
+rejected.  Note that the underlying hardware instruction uses the same register
+for the second argument and the result, but this is hidden by the built-in.
+For little-endian, the code generation will be semantically equivalent to
+vins*lx, while for big-endian it will be semantically equivalent to vins*rx.
+Note that some fairly anomalous results can be generated if the byte index is
+not aligned on an element boundary for the sort of element being inserted.
+This is a limitation of the bi-endian vector programming model consistent with
+the limitation on vec_perm, for example.
+@findex vec_inserth
+
 @smallexample
 @exdent vector unsigned long long int
 @exdent vec_pext (vector unsigned long long int, vector unsigned long long int)
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-insert-word-runnable.c b/gcc/testsuite/gcc.target/powerpc/vec-insert-word-runnable.c
new file mode 100644
index 00000000000..8c2721aedfc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-insert-word-runnable.c
@@ -0,0 +1,345 @@
+/* { dg-do run } */
+/* { dg-require-effective-target power10_hw } */
+/* { dg-options "-mdejagnu-cpu=power10" } */
+#include
+
+#define DEBUG 0
+
+#ifdef DEBUG
+#include
+#endif
+
+extern void abort (void);
+
+int
+main (int argc, char *argv [])
+{
+  int i;
+  unsigned int index;
+  vector unsigned char vresult_ch;
+  vector unsigned char expected_vresult_ch;
+  vector unsigned char src_va_ch;
+  vector unsigned char src_vb_ch;
+  unsigned char src_a_ch;
+
+  vector unsigned short vresult_sh;
+  vector unsigned short expected_vresult_sh;
+  vector unsigned short src_va_sh;
+  vector unsigned short src_vb_sh;
+  unsigned short int src_a_sh;
+
+  vector unsigned int vresult_int;
+  vector unsigned int expected_vresult_int;
+  vector unsigned int src_va_int;
+  vector unsigned int src_vb_int;
+  unsigned int src_a_int;
+
+  vector unsigned long long vresult_ll;
+  vector unsigned long long expected_vresult_ll;
+  vector unsigned long long src_va_ll;
+  unsigned long long int src_a_ll;
+
+  /* Vector insert, low index, from GPR */
+  src_a_ch = 79;
+  index = 2;
+  src_va_ch = (vector unsigned char) { 0, 1, 2, 3, 4, 5, 6, 7,
+                                       8, 9, 10, 11, 12, 13, 14, 15 };
+  vresult_ch = (vector unsigned char) { 0, 0, 0, 0, 0, 0, 0, 0,
+                                        0, 0, 0, 0, 0, 0, 0, 0 };
+  expected_vresult_ch = (vector unsigned char) { 0, 1, 79, 3, 4, 5, 6, 7,
+                                                 8, 9, 10, 11, 12, 13, 14, 15 };
+
+  vresult_ch = vec_insertl (src_a_ch, src_va_ch, index);
+
+  if (!vec_all_eq (vresult_ch, expected_vresult_ch)) {
+#if DEBUG
+    printf("ERROR, vec_insertl (src_a_ch, src_va_ch, index)\n");
+    for(i = 0; i < 16; i++)
+      printf(" vresult_ch[%d] = %d, expected_vresult_ch[%d] = %d\n",
+             i, vresult_ch[i], i, expected_vresult_ch[i]);
+#else
+    abort();
+#endif
+  }
+
+  src_a_sh = 79;
+  index = 10;
+  src_va_sh = (vector unsigned short int) { 0, 1, 2, 3, 4, 5, 6, 7 };
+  vresult_sh = (vector unsigned short int) { 0, 0, 0, 0, 0, 0, 0, 0 };
+  expected_vresult_sh = (vector unsigned short int) { 0, 1, 2, 3,
+                                                      4, 79, 6, 7 };
+
+  vresult_sh = vec_insertl (src_a_sh, src_va_sh, index);
+
+  if (!vec_all_eq (vresult_sh, expected_vresult_sh)) {
+#if DEBUG
+    printf("ERROR, vec_insertl (src_a_sh, src_va_sh, index)\n");
+    for(i = 0; i < 8; i++)
+      printf(" vresult_sh[%d] = %d, expected_vresult_sh[%d] = %d\n",
+             i, vresult_sh[i], i, expected_vresult_sh[i]);
+#else
+    abort();
+#endif
+  }
+
+  src_a_int = 79;
+  index = 8;
+  src_va_int = (vector unsigned int) { 0, 1, 2, 3 };
+  vresult_int = (vector unsigned int) { 0, 0, 0, 0 };
+  expected_vresult_int = (vector unsigned int) { 0, 1, 79, 3 };
+
+  vresult_int = vec_insertl (src_a_int, src_va_int, index);
+
+  if (!vec_all_eq (vresult_int, expected_vresult_int)) {
+#if DEBUG
+    printf("ERROR, vec_insertl (src_a_int, src_va_int, index)\n");
+    for(i = 0; i < 4; i++)
+      printf(" vresult_int[%d] = %d, expected_vresult_int[%d] = %d\n",
+             i, vresult_int[i], i, expected_vresult_int[i]);
+#else
+    abort();
+#endif
+  }
+
+  src_a_ll = 79;
+  index = 8;
+  src_va_ll = (vector unsigned long long) { 0, 1 };
+  vresult_ll = (vector unsigned long long) { 0, 0 };
+  expected_vresult_ll = (vector unsigned long long) { 0, 79 };
+
+  vresult_ll = vec_insertl (src_a_ll, src_va_ll, index);
+
+  if (!vec_all_eq (vresult_ll, expected_vresult_ll)) {
+#if DEBUG
+    printf("ERROR, vec_insertl (src_a_ll, src_va_ll, index)\n");
+    for(i = 0; i < 2; i++)
+      printf(" vresult_ll[%d] = %d, expected_vresult_ll[%d] = %d\n",
+             i, vresult_ll[i], i, expected_vresult_ll[i]);
+#else
+    abort();
+#endif
+  }
+
+  /* Vector insert, low index, from vector */
+  index = 2;
+  src_va_ch = (vector unsigned char) { 0, 1, 2, 3, 4, 5, 6, 7,
+                                       8, 9, 10, 11, 12, 13, 14, 15 };
+  src_vb_ch = (vector unsigned char) { 10, 11, 12, 13, 14, 15, 16, 17,
+                                       18, 19, 20, 21, 22, 23, 24, 25 };
+  vresult_ch = (vector unsigned char) { 0, 0, 0, 0, 0, 0, 0, 0,
+                                        0, 0, 0, 0, 0, 0, 0, 0 };
+  expected_vresult_ch = (vector unsigned char) { 0, 1, 18, 3, 4, 5, 6, 7,
+                                                 8, 9, 10, 11, 12, 13, 14, 15 };
+
+  vresult_ch = vec_insertl (src_vb_ch, src_va_ch, index);
+
+  if (!vec_all_eq (vresult_ch, expected_vresult_ch)) {
+#if DEBUG
+    printf("ERROR, vec_insertl (src_vb_ch, src_va_ch, index)\n");
+    for(i = 0; i < 16; i++)
+      printf(" vresult_ch[%d] = %d, expected_vresult_ch[%d] = %d\n",
+             i, vresult_ch[i], i, expected_vresult_ch[i]);
+#else
+    abort();
+#endif
+  }
+
+  index = 4;
+  src_va_sh = (vector unsigned short) { 0, 1, 2, 3, 4, 5, 6, 7 };
+  src_vb_sh = (vector unsigned short) { 10, 11, 12, 13, 14, 15, 16, 17 };
+  vresult_sh = (vector unsigned short) { 0, 0, 0, 0, 0, 0, 0, 0 };
+  expected_vresult_sh = (vector unsigned short) { 0, 1, 14, 3, 4, 5, 6, 7 };
+
+  vresult_sh = vec_insertl (src_vb_sh, src_va_sh, index);
+
+  if (!vec_all_eq (vresult_sh, expected_vresult_sh)) {
+#if DEBUG
+    printf("ERROR, vec_insertl (src_vb_sh, src_va_sh, index)\n");
+    for(i = 0; i < 8; i++)
+      printf(" vresult_sh[%d] = %d, expected_vresult_sh[%d] = %d\n",
+             i, vresult_sh[i], i, expected_vresult_sh[i]);
+#else
+    abort();
+#endif
+  }
+
+  index = 8;
+  src_va_int = (vector unsigned int) { 0, 1, 2, 3 };
+  src_vb_int = (vector unsigned int) { 10, 11, 12, 13 };
+  vresult_int = (vector unsigned int) { 0, 0, 0, 0 };
+  expected_vresult_int = (vector unsigned int) { 0, 1, 12, 3 };
+
+  vresult_int = vec_insertl (src_vb_int, src_va_int, index);
+
+  if (!vec_all_eq (vresult_int, expected_vresult_int)) {
+#if DEBUG
+    printf("ERROR, vec_insertl (src_vb_int, src_va_int, index)\n");
+    for(i = 0; i < 4; i++)
+      printf(" vresult_int[%d] = %d, expected_vresult_int[%d] = %d\n",
+             i, vresult_int[i], i, expected_vresult_int[i]);
+#else
+    abort();
+#endif
+  }
+
+  /* Vector insert, high index, from GPR */
+  src_a_ch = 79;
+  index = 2;
+  src_va_ch = (vector unsigned char) { 0, 1, 2, 3, 4, 5, 6, 7,
+                                       8, 9, 10, 11, 12, 13, 14, 15 };
+  vresult_ch = (vector unsigned char) { 0, 0, 0, 0, 0, 0, 0, 0,
+                                        0, 0, 0, 0, 0, 0, 0, 0 };
+  expected_vresult_ch = (vector unsigned char) { 0, 1, 2, 3, 4, 5, 6, 7,
+                                                 8, 9, 10, 11, 12, 79, 14, 15 };
+
+  vresult_ch = vec_inserth (src_a_ch, src_va_ch, index);
+
+  if (!vec_all_eq (vresult_ch, expected_vresult_ch)) {
+#if DEBUG
+    printf("ERROR, vec_inserth (src_a_ch, src_va_ch, index)\n");
+    for(i = 0; i < 16; i++)
+      printf(" vresult_ch[%d] = %d, expected_vresult_ch[%d] = %d\n",
+             i, vresult_ch[i], i, expected_vresult_ch[i]);
+#else
+    abort();
+#endif
+  }
+
+  src_a_sh = 79;
+  index = 10;
+  src_va_sh = (vector unsigned short int) { 0, 1, 2, 3, 4, 5, 6, 7 };
+  vresult_sh = (vector unsigned short int) { 0, 0, 0, 0, 0, 0, 0, 0 };
+  expected_vresult_sh = (vector unsigned short int) { 0, 1, 79, 3,
+                                                      4, 5, 6, 7 };
+
+  vresult_sh = vec_inserth (src_a_sh, src_va_sh, index);
+
+  if (!vec_all_eq (vresult_sh, expected_vresult_sh)) {
+#if DEBUG
+    printf("ERROR, vec_inserth (src_a_sh, src_va_sh, index)\n");
+    for(i = 0; i < 8; i++)
+      printf(" vresult_sh[%d] = %d, expected_vresult_sh[%d] = %d\n",
+             i, vresult_sh[i], i, expected_vresult_sh[i]);
+#else
+    abort();
+#endif
+  }
+
+  src_a_int = 79;
+  index = 8;
+  src_va_int = (vector unsigned int) { 0, 1, 2, 3 };
+  vresult_int = (vector unsigned int) { 0, 0, 0, 0 };
+  expected_vresult_int = (vector unsigned int) { 0, 79, 2, 3 };
+
+  vresult_int = vec_inserth (src_a_int, src_va_int, index);
+
+  if (!vec_all_eq (vresult_int, expected_vresult_int)) {
+#if DEBUG
+    printf("ERROR, vec_inserth (src_a_int, src_va_int, index)\n");
+    for(i = 0; i < 4; i++)
+      printf(" vresult_int[%d] = %d, expected_vresult_int[%d] = %d\n",
+             i, vresult_int[i], i, expected_vresult_int[i]);
+#else
+    abort();
+#endif
+  }
+
+  src_a_ll = 79;
+  index = 8;
+  src_va_ll = (vector unsigned long long) { 0, 1 };
+  vresult_ll = (vector unsigned long long) { 0, 0 };
+  expected_vresult_ll = (vector unsigned long long) { 79, 1 };
+
+  vresult_ll = vec_inserth (src_a_ll, src_va_ll, index);
+
+  if (!vec_all_eq (vresult_ll, expected_vresult_ll)) {
+#if DEBUG
+    printf("ERROR, vec_inserth (src_a_ll, src_va_ll, index)\n");
+    for(i = 0; i < 2; i++)
+      printf(" vresult_ll[%d] = %d, expected_vresult_ll[%d] = %d\n",
+             i, vresult_ll[i], i, expected_vresult_ll[i]);
+#else
+    abort();
+#endif
+  }
+
+  /* Vector insert, left index, from vector */
+  index = 2;
+  src_va_ch = (vector unsigned char) { 0, 1, 2, 3, 4, 5, 6, 7,
+                                       8, 9, 10, 11, 12, 13, 14, 15 };
+  src_vb_ch = (vector unsigned char) { 10, 11, 12, 13, 14, 15, 16, 17,
+                                       18, 19, 20, 21, 22, 23, 24, 25 };
+  vresult_ch = (vector unsigned char) { 0, 0, 0, 0, 0, 0, 0, 0,
+                                        0, 0, 0, 0, 0, 0, 0, 0 };
+  expected_vresult_ch = (vector unsigned char) { 0, 1, 2, 3, 4, 5, 6, 7,
+                                                 8, 9, 10, 11, 12, 18, 14, 15 };
+
+  vresult_ch = vec_inserth (src_vb_ch, src_va_ch, index);
+
+  if (!vec_all_eq (vresult_ch, expected_vresult_ch)) {
+#if DEBUG
+    printf("ERROR, vec_inserth (src_vb_ch, src_va_ch, index)\n");
+    for(i = 0; i < 16; i++)
+      printf(" vresult_ch[%d] = %d, expected_vresult_ch[%d] = %d\n",
+             i, vresult_ch[i], i, expected_vresult_ch[i]);
+#else
+    abort();
+#endif
+  }
+
+  index = 4;
+  src_va_sh = (vector unsigned short) { 0, 1, 2, 3, 4, 5, 6, 7 };
+  src_vb_sh = (vector unsigned short) { 10, 11, 12, 13, 14, 15, 16, 17 };
+  vresult_sh = (vector unsigned short) { 0, 0, 0, 0, 0, 0, 0, 0 };
+  expected_vresult_sh = (vector unsigned short) { 0, 1, 2, 3, 4, 14, 6, 7 };
+
+  vresult_sh = vec_inserth (src_vb_sh, src_va_sh, index);
+
+  if (!vec_all_eq (vresult_sh, expected_vresult_sh)) {
+#if DEBUG
+    printf("ERROR, vec_inserth (src_vb_sh, src_va_sh, index)\n");
+    for(i = 0; i < 8; i++)
+      printf(" vresult_sh[%d] = %d, expected_vresult_sh[%d] = %d\n",
+             i, vresult_sh[i], i, expected_vresult_sh[i]);
+#else
+    abort();
+#endif
+  }
+
+  index = 8;
+  src_va_int = (vector unsigned int) { 0, 1, 2, 3 };
+  src_vb_int = (vector unsigned int) { 10, 11, 12, 13 };
+  vresult_int = (vector unsigned int) { 0, 0, 0, 0 };
+  expected_vresult_int = (vector unsigned int) { 0, 12, 2, 3 };
+
+  vresult_int = vec_inserth (src_vb_int, src_va_int, index);
+
+  if (!vec_all_eq (vresult_int, expected_vresult_int)) {
+#if DEBUG
+    printf("ERROR, vec_inserth (src_vb_int, src_va_int, index)\n");
+    for(i = 0; i < 4; i++)
+      printf(" vresult_int[%d] = %d, expected_vresult_int[%d] = %d\n",
+             i, vresult_int[i], i, expected_vresult_int[i]);
+#else
+    abort();
+#endif
+  }
+  return 0;
+}
+
+/* { dg-final { scan-assembler {\mvinsblx\M} } } */
+/* { dg-final { scan-assembler {\mvinshlx\M} } } */
+/* { dg-final { scan-assembler {\mvinswlx\M} } } */
+/* { dg-final { scan-assembler {\mvinsdlx\M} } } */
+/* { dg-final { scan-assembler {\mvinsbvlx\M} } } */
+/* { dg-final { scan-assembler {\mvinshvlx\M} } } */
+/* { dg-final { scan-assembler {\mvinswvlx\M} } } */
+
+/* { dg-final { scan-assembler {\mvinsbrx\M} } } */
+/* { dg-final { scan-assembler {\mvinshrx\M} } } */
+/* { dg-final { scan-assembler {\mvinswrx\M} } } */
+/* { dg-final { scan-assembler {\mvinsdrx\M} } } */
+/* { dg-final { scan-assembler {\mvinsbvrx\M} } } */
+/* { dg-final { scan-assembler {\mvinshvrx\M} } } */
+/* { dg-final { scan-assembler {\mvinswvrx\M} } } */
+
-- 
2.17.1