From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: [PATCH, rs6000] Add builtin support for vec_insert4b, vec_extract4b
From: Carl Love
To: Segher Boessenkool
Cc: gcc-patches@gcc.gnu.org, David Edelsohn, Bill Schmidt
Date: Wed, 14 Feb 2018 20:08:00 -0000
In-Reply-To: <20180206154734.GZ21977@gate.crashing.org>
References: <1517513515.3596.25.camel@us.ibm.com> <20180206154734.GZ21977@gate.crashing.org>
Message-Id: <1518638907.7508.30.camel@us.ibm.com>
X-SW-Source: 2018-02/txt/msg00863.txt.bz2

GCC maintainers:

Per Segher's comments on the first version of the patch, I split the
patch into two.  The first patch (this one) adds the ABI specified
vec_insert4b and vec_extract4b builtins.  It adds a runnable file to
test the ABI specified builtin instances.  Note, the runnable test file
does not test for illegal argument values, such as the const int index
argument being > 12 or of the wrong type.

Note, the rtl for vec_insert4b in vsx.md is a copy of the vec_vinsert4b
code with the name changed.  The rtl for vec_extract4b is new.

The second patch removes all of the non-ABI builtin support.

Additionally, I have addressed the other comments from Segher with
regard to formatting issues and rtl register specification.

This patch has been tested on:

  powerpc64le-unknown-linux-gnu (Power 8 LE)
  powerpc64le-unknown-linux-gnu (Power 9 LE)

with no regressions.

Let me know if the patch looks OK or not.  Thanks.

The patch should also be ported to GCC 7 so we are in compliance with
the ABI.

                  Carl Love
-----------------------------------------------------------------------

gcc/ChangeLog:

2018-02-13  Carl Love

	* config/rs6000/altivec.h: Add builtin names vec_extract4b
	vec_insert4b.
	* config/rs6000/rs6000-builtin.def: Add INSERT4B and EXTRACT4B
	definitions.
	* config/rs6000/rs6000-c.c: Add the definitions for
	P9V_BUILTIN_VEC_EXTRACT4B and P9V_BUILTIN_VEC_INSERT4B.
	* config/rs6000/rs6000.c (altivec_expand_builtin): Add
	P9V_BUILTIN_VEC_EXTRACT4B and P9V_BUILTIN_VEC_INSERT4B case
	statements.
	* config/rs6000/vsx.md: Add define_insn extract4b.  Add
	define_expand definition for insert4b and define_insn
	*insert4b_internal.
	* doc/extend.texi: Add documentation for vec_extract4b and
	vec_insert4b.

gcc/testsuite/ChangeLog:

2018-02-13  Carl Love

	* gcc.target/powerpc/builtins-7-p9-runnable.c: New runnable test
	file for the ABI definitions for vec_extract4b and vec_insert4b.
---
 gcc/config/rs6000/altivec.h                        |   2 +
 gcc/config/rs6000/rs6000-builtin.def               |   4 +
 gcc/config/rs6000/rs6000-c.c                       |   8 +
 gcc/config/rs6000/rs6000.c                         |   2 +
 gcc/config/rs6000/vsx.md                           |  41 +++++
 gcc/doc/extend.texi                                |   7 +
 .../gcc.target/powerpc/builtins-7-p9-runnable.c    | 169 +++++++++++++++++++++
 7 files changed, 233 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/builtins-7-p9-runnable.c

diff --git a/gcc/config/rs6000/altivec.h b/gcc/config/rs6000/altivec.h
index 684cb1990..3bce2ae39 100644
--- a/gcc/config/rs6000/altivec.h
+++ b/gcc/config/rs6000/altivec.h
@@ -435,6 +435,8 @@
 #define vec_vctzw __builtin_vec_vctzw
 #define vec_vextract4b __builtin_vec_vextract4b
 #define vec_vinsert4b __builtin_vec_vinsert4b
+#define vec_extract4b __builtin_vec_extract4b
+#define vec_insert4b __builtin_vec_insert4b
 #define vec_vprtyb __builtin_vec_vprtyb
 #define vec_vprtybd __builtin_vec_vprtybd
 #define vec_vprtybw __builtin_vec_vprtybw
diff --git a/gcc/config/rs6000/rs6000-builtin.def b/gcc/config/rs6000/rs6000-builtin.def
index 86604da46..420d12e29 100644
--- a/gcc/config/rs6000/rs6000-builtin.def
+++ b/gcc/config/rs6000/rs6000-builtin.def
@@ -2229,6 +2229,8 @@ BU_P9V_AV_2 (VEXTUWRX, "vextuwrx", CONST, vextuwrx)
 BU_P9V_VSX_2 (VEXTRACT4B, "vextract4b", CONST, vextract4b)
 BU_P9V_VSX_3 (VINSERT4B, "vinsert4b", CONST, vinsert4b)
 BU_P9V_VSX_3 (VINSERT4B_DI, "vinsert4b_di", CONST, vinsert4b_di)
+BU_P9V_VSX_3 (INSERT4B, "insert4b", CONST, insert4b)
+BU_P9V_VSX_2 (EXTRACT4B, "extract4b", CONST, extract4b)
 
 /* Hardware IEEE 128-bit floating point round to odd instrucitons added in ISA
    3.0 (power9).  */
@@ -2291,11 +2293,13 @@ BU_P9V_OVERLOAD_2 (XL_LEN_R, "xl_len_r")
 BU_P9V_OVERLOAD_2 (VEXTULX, "vextulx")
 BU_P9V_OVERLOAD_2 (VEXTURX, "vexturx")
 BU_P9V_OVERLOAD_2 (VEXTRACT4B, "vextract4b")
+BU_P9V_OVERLOAD_2 (EXTRACT4B, "extract4b")
 
 /* ISA 3.0 Vector scalar overloaded 3 argument functions */
 BU_P9V_OVERLOAD_3 (STXVL, "stxvl")
 BU_P9V_OVERLOAD_3 (XST_LEN_R, "xst_len_r")
 BU_P9V_OVERLOAD_3 (VINSERT4B, "vinsert4b")
+BU_P9V_OVERLOAD_3 (INSERT4B, "insert4b")
 
 /* Overloaded CMPNE support was implemented prior to Power 9,
    so is not mentioned here.  */
diff --git a/gcc/config/rs6000/rs6000-c.c b/gcc/config/rs6000/rs6000-c.c
index a68be511c..56e66db98 100644
--- a/gcc/config/rs6000/rs6000-c.c
+++ b/gcc/config/rs6000/rs6000-c.c
@@ -5433,6 +5433,8 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_INTDI, RS6000_BTI_V16QI, RS6000_BTI_UINTSI, 0 },
   { P9V_BUILTIN_VEC_VEXTRACT4B, P9V_BUILTIN_VEXTRACT4B,
     RS6000_BTI_INTDI, RS6000_BTI_unsigned_V16QI, RS6000_BTI_UINTSI, 0 },
+  { P9V_BUILTIN_VEC_EXTRACT4B, P9V_BUILTIN_EXTRACT4B,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V16QI, RS6000_BTI_INTSI, 0 },
   { P9V_BUILTIN_VEC_VEXTRACT_FP_FROM_SHORTH,
     P9V_BUILTIN_VEXTRACT_FP_FROM_SHORTH,
     RS6000_BTI_V4SF, RS6000_BTI_unsigned_V8HI, 0, 0 },
@@ -5492,6 +5494,12 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
   { P8V_BUILTIN_VEC_VGBBD, P8V_BUILTIN_VGBBD,
     RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI, 0, 0 },
 
+  { P9V_BUILTIN_VEC_INSERT4B, P9V_BUILTIN_INSERT4B,
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_V4SI,
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_INTSI },
+  { P9V_BUILTIN_VEC_INSERT4B, P9V_BUILTIN_INSERT4B,
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V4SI,
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_INTSI },
   { P9V_BUILTIN_VEC_VINSERT4B, P9V_BUILTIN_VINSERT4B,
     RS6000_BTI_V16QI, RS6000_BTI_V4SI,
     RS6000_BTI_V16QI, RS6000_BTI_UINTSI },
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 6a6801aad..f8d8b9687 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -15730,6 +15730,7 @@ altivec_expand_builtin (tree exp, rtx target, bool *expandedp)
 
     case P9V_BUILTIN_VEXTRACT4B:
     case P9V_BUILTIN_VEC_VEXTRACT4B:
+    case P9V_BUILTIN_VEC_EXTRACT4B:
       arg1 = CALL_EXPR_ARG (exp, 1);
       STRIP_NOPS (arg1);
 
@@ -15747,6 +15748,7 @@ altivec_expand_builtin (tree exp, rtx target, bool *expandedp)
     case P9V_BUILTIN_VINSERT4B:
     case P9V_BUILTIN_VINSERT4B_DI:
     case P9V_BUILTIN_VEC_VINSERT4B:
+    case P9V_BUILTIN_VEC_INSERT4B:
       arg2 = CALL_EXPR_ARG (exp, 2);
       STRIP_NOPS (arg2);
 
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 86efdced2..266923f98 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -5204,6 +5204,47 @@
 ;; Vector insert/extract word at arbitrary byte values.  Note, the little
 ;; endian version needs to adjust the byte number, and the V4SI element in
 ;; vinsert4b.
+(define_insn "extract4b"
+  [(set (match_operand:V2DI 0 "vsx_register_operand")
+        (unspec:V2DI [(match_operand:V16QI 1 "vsx_register_operand" "wa")
+                      (match_operand:QI 2 "const_0_to_12_operand" "n")]
+                     UNSPEC_XXEXTRACTUW))]
+  "TARGET_P9_VECTOR"
+{
+  if (!VECTOR_ELT_ORDER_BIG)
+    operands[2] = GEN_INT (12 - INTVAL (operands[2]));
+
+  return "xxextractuw %x0,%x1,%2";
+})
+
+(define_expand "insert4b"
+  [(set (match_operand:V16QI 0 "vsx_register_operand")
+        (unspec:V16QI [(match_operand:V4SI 1 "vsx_register_operand")
+                       (match_operand:V16QI 2 "vsx_register_operand")
+                       (match_operand:QI 3 "const_0_to_12_operand")]
+                      UNSPEC_XXINSERTW))]
+  "TARGET_P9_VECTOR"
+{
+  if (!VECTOR_ELT_ORDER_BIG)
+    {
+      rtx op1 = operands[1];
+      rtx v4si_tmp = gen_reg_rtx (V4SImode);
+      emit_insn (gen_vsx_xxpermdi_v4si_be (v4si_tmp, op1, op1, const1_rtx));
+      operands[1] = v4si_tmp;
+      operands[3] = GEN_INT (12 - INTVAL (operands[3]));
+    }
+})
+
+(define_insn "*insert4b_internal"
+  [(set (match_operand:V16QI 0 "vsx_register_operand" "=wa")
+        (unspec:V16QI [(match_operand:V4SI 1 "vsx_register_operand" "wa")
+                       (match_operand:V16QI 2 "vsx_register_operand" "0")
+                       (match_operand:QI 3 "const_0_to_12_operand" "n")]
+                      UNSPEC_XXINSERTW))]
+  "TARGET_P9_VECTOR"
+  "xxinsertw %x0,%x1,%3"
+  [(set_attr "type" "vecperm")])
+
 (define_expand "vextract4b"
   [(set (match_operand:DI 0 "gpc_reg_operand")
         (unspec:DI [(match_operand:V16QI 1 "vsx_register_operand")
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index cb9df971a..13dbac42e 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -19055,8 +19055,15 @@ vector int vec_vctzw (vector int);
 vector unsigned int vec_vctzw (vector int);
 
 long long vec_vextract4b (const vector signed char, const int);
+vector unsigned long long vec_extract4b (vector unsigned char,
+                                         const int);
+long long vec_extract4b (const vector signed char, const int);
 long long vec_vextract4b (const vector unsigned char, const int);
 
+vector unsigned char vec_insert4b (vector signed int, vector unsigned char,
+                                   const int);
+vector unsigned char vec_insert4b (vector unsigned int, vector unsigned char,
+                                   const int);
 vector signed char vec_insert4b (vector int, vector signed char, const int);
 vector unsigned char vec_insert4b (vector unsigned int, vector unsigned char,
                                    const int);
diff --git a/gcc/testsuite/gcc.target/powerpc/builtins-7-p9-runnable.c b/gcc/testsuite/gcc.target/powerpc/builtins-7-p9-runnable.c
new file mode 100644
index 000000000..137b46b05
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/builtins-7-p9-runnable.c
@@ -0,0 +1,169 @@
+/* { dg-do run { target { powerpc*-*-* && p9vector_hw } } } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-options "-mcpu=power9 -O2" } */
+
+#include <altivec.h>
+#define TRUE 1
+#define FALSE 0
+
+#ifdef DEBUG
+#include <stdio.h>
+#endif
+
+#define EXTRACT 0
+
+void abort (void);
+
+int result_wrong_ull (vector unsigned long long vec_expected,
+                      vector unsigned long long vec_actual)
+{
+  int i;
+
+  for (i = 0; i < 2; i++)
+    if (vec_expected[i] != vec_actual[i])
+      return TRUE;
+
+  return FALSE;
+}
+
+int result_wrong_uc (vector unsigned char vec_expected,
+                     vector unsigned char vec_actual)
+{
+  int i;
+
+  for (i = 0; i < 16; i++)
+    if (vec_expected[i] != vec_actual[i])
+      return TRUE;
+
+  return FALSE;
+}
+
+#ifdef DEBUG
+void print_ull (vector unsigned long long vec_expected,
+                vector unsigned long long vec_actual)
+{
+  int i;
+
+  printf("expected unsigned long long data\n");
+  for (i = 0; i < 2; i++)
+    printf(" %llu,", vec_expected[i]);
+
+  printf("\nactual unsigned long long data\n");
+  for (i = 0; i < 2; i++)
+    printf(" %llu,", vec_actual[i]);
+  printf("\n");
+}
+
+void print_uc (vector unsigned char vec_expected,
+               vector unsigned char vec_actual)
+{
+  int i;
+
+  printf("expected unsigned char data\n");
+  for (i = 0; i < 16; i++)
+    printf(" %d,", vec_expected[i]);
+
+  printf("\nactual unsigned char data\n");
+  for (i = 0; i < 16; i++)
+    printf(" %d,", vec_actual[i]);
+  printf("\n");
+}
+#endif
+
+#if EXTRACT
+vector unsigned long long
+vext (vector unsigned char *vc)
+{
+  return vextract_si_vchar (*vc, 5);
+}
+#endif
+
+int main()
+{
+  vector signed int vsi_arg;
+  vector unsigned char vec_uc_arg, vec_uc_result, vec_uc_expected;
+  vector unsigned long long vec_ull_result, vec_ull_expected;
+  unsigned long long ull_result, ull_expected;
+
+  vec_uc_arg = (vector unsigned char){1, 2, 3, 4,
+                                      5, 6, 7, 8,
+                                      9, 10, 11, 12,
+                                      13, 14, 15, 16};
+
+  vsi_arg = (vector signed int){0xA, 0xB, 0xC, 0xD};
+
+  vec_uc_expected = (vector unsigned char){0xC, 0, 0, 0,
+                                           5, 6, 7, 8,
+                                           9, 10, 11, 12,
+                                           13, 14, 15, 16};
+  /* Test vec_insert4b() */
+  /* Insert into char 0 location */
+  vec_uc_result = vec_insert4b (vsi_arg, vec_uc_arg, 0);
+
+  if (result_wrong_uc(vec_uc_expected, vec_uc_result))
+    {
+#ifdef DEBUG
+      printf("Error: vec_insert4b pos 0, result does not match expected result\n");
+      print_uc (vec_uc_expected, vec_uc_result);
+#else
+      abort();
+#endif
+    }
+
+  /* insert into char 4 location */
+  vec_uc_expected = (vector unsigned char){1, 2, 3, 4,
+                                           0xC, 0, 0, 0,
+                                           9, 10, 11, 12,
+                                           13, 14, 15, 16};
+  vec_uc_result = vec_insert4b (vsi_arg, vec_uc_arg, 4);
+
+  if (result_wrong_uc(vec_uc_expected, vec_uc_result))
+    {
+#ifdef DEBUG
+      printf("Error: vec_insert4b pos 4, result does not match expected result\n");
+      print_uc (vec_uc_expected, vec_uc_result);
+#else
+      abort();
+#endif
+    }
+
+  /* Test vec_extract4b() */
+  /* Extract 4b, from char 0 location */
+  vec_uc_arg = (vector unsigned char){10, 0, 0, 0,
+                                      20, 0, 0, 0,
+                                      30, 0, 0, 0,
+                                      40, 0, 0, 0};
+
+  vec_ull_expected = (vector unsigned long long){0, 10};
+  vec_ull_result = vec_extract4b(vec_uc_arg, 0);
+
+  if (result_wrong_ull(vec_ull_expected, vec_ull_result))
+    {
+#ifdef DEBUG
+      printf("Error: vec_extract4b pos 0, result does not match expected result\n");
+      print_ull (vec_ull_expected, vec_ull_result);
+#else
+      abort();
+#endif
+    }
+
+  /* Extract 4b, from char 12 location */
+  vec_uc_arg = (vector unsigned char){10, 0, 0, 0,
+                                      20, 0, 0, 0,
+                                      30, 0, 0, 0,
+                                      40, 0, 0, 0};
+
+  vec_ull_expected = (vector unsigned long long){0, 40};
+  vec_ull_result = vec_extract4b(vec_uc_arg, 12);
+
+  if (result_wrong_ull(vec_ull_expected, vec_ull_result))
+    {
+#ifdef DEBUG
+      printf("Error: vec_extract4b pos 12, result does not match expected result\n");
+      print_ull (vec_ull_expected, vec_ull_result);
+#else
+      abort();
+#endif
+    }
+}
-- 
2.11.0
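
A note on the index handling in the extract4b and insert4b patterns: both
take a byte index restricted to 0..12 (const_0_to_12_operand) and, when
the vector element order is not big endian, hand 12 minus that index to
the instruction.  The following is only a portable C sketch of that
mapping so it can be checked without Power 9 hardware.  The names
hw_byte_index and check_index_mapping, and the -1 error return, are
hypothetical and are not part of the patch; in GCC the range check and
the remapping happen at compile time inside the machine description.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper, not part of the patch.  Maps the byte index a
   caller passes to vec_extract4b/vec_insert4b onto the index that the
   patterns give to xxextractuw/xxinsertw.  Returns -1 for an index
   outside 0..12, mirroring the const_0_to_12_operand restriction.  */
static int
hw_byte_index (int abi_index, bool elt_order_big)
{
  if (abi_index < 0 || abi_index > 12)
    return -1;

  /* Same arithmetic as GEN_INT (12 - INTVAL (...)) in the patterns.  */
  return elt_order_big ? abi_index : 12 - abi_index;
}

/* Hypothetical self-check covering the indices used by the new test.  */
static void
check_index_mapping (void)
{
  assert (hw_byte_index (0, true) == 0);
  assert (hw_byte_index (0, false) == 12);
  assert (hw_byte_index (4, false) == 8);
  assert (hw_byte_index (12, false) == 0);
  assert (hw_byte_index (13, false) == -1);
}
```

Under this model, the little endian calls in the new test with positions
0, 4 and 12 correspond to instruction indices 12, 8 and 0.  Indices
outside 0..12 are the illegal-argument cases that the runnable test does
not cover.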