From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-473295-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 63154 invoked by alias); 14 Feb 2018 20:08:36 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 63143 invoked by uid 89); 14 Feb 2018 20:08:36 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-27.0 required=5.0 tests=AWL,BAYES_00,GIT_PATCH_0,GIT_PATCH_1,GIT_PATCH_2,GIT_PATCH_3,RCVD_IN_DNSWL_LOW,SPF_PASS autolearn=ham version=3.3.2 spammy=ported
X-HELO: mx0a-001b2d01.pphosted.com
Received: from mx0b-001b2d01.pphosted.com (HELO mx0a-001b2d01.pphosted.com) (148.163.158.5) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Wed, 14 Feb 2018 20:08:33 +0000
Received: from pps.filterd (m0098416.ppops.net [127.0.0.1])	by mx0b-001b2d01.pphosted.com (8.16.0.22/8.16.0.22) with SMTP id w1EK3lrx028094	for <gcc-patches@gcc.gnu.org>; Wed, 14 Feb 2018 15:08:32 -0500
Received: from e32.co.us.ibm.com (e32.co.us.ibm.com [32.97.110.150])	by mx0b-001b2d01.pphosted.com with ESMTP id 2g4te13mks-1	(version=TLSv1.2 cipher=AES256-SHA bits=256 verify=NOT)	for <gcc-patches@gcc.gnu.org>; Wed, 14 Feb 2018 15:08:31 -0500
Received: from localhost	by e32.co.us.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted	for <gcc-patches@gcc.gnu.org> from <cel@us.ibm.com>;	Wed, 14 Feb 2018 13:08:30 -0700
Received: from b03cxnp08025.gho.boulder.ibm.com (9.17.130.17)	by e32.co.us.ibm.com (192.168.1.132) with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted;	Wed, 14 Feb 2018 13:08:29 -0700
Received: from b03ledav005.gho.boulder.ibm.com (b03ledav005.gho.boulder.ibm.com [9.17.130.236])	by b03cxnp08025.gho.boulder.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id w1EK8Sxc13697306;	Wed, 14 Feb 2018 13:08:28 -0700
Received: from b03ledav005.gho.boulder.ibm.com (unknown [127.0.0.1])	by IMSVA (Postfix) with ESMTP id 7917DBE039;	Wed, 14 Feb 2018 13:08:28 -0700 (MST)
Received: from oc3304648336.ibm.com (unknown [9.70.82.121])	by b03ledav005.gho.boulder.ibm.com (Postfix) with ESMTP id 0B66FBE03E;	Wed, 14 Feb 2018 13:08:27 -0700 (MST)
Subject: [PATCH, rs6000] Add builtin support for vec_insert4b, vec_extract4b
From: Carl Love <cel@us.ibm.com>
To: Segher Boessenkool <segher@kernel.crashing.org>
Cc: gcc-patches@gcc.gnu.org, David Edelsohn <dje.gcc@gmail.com>,        Bill Schmidt <wschmidt@linux.vnet.ibm.com>
Date: Wed, 14 Feb 2018 20:08:00 -0000
In-Reply-To: <20180206154734.GZ21977@gate.crashing.org>
References: <1517513515.3596.25.camel@us.ibm.com>	 <20180206154734.GZ21977@gate.crashing.org>
Content-Type: text/plain; charset="UTF-8"
Mime-Version: 1.0
Content-Transfer-Encoding: 8bit
X-TM-AS-GCONF: 00
x-cbid: 18021420-0004-0000-0000-000013A8FC18
X-IBM-SpamModules-Scores:
X-IBM-SpamModules-Versions: BY=3.00008534; HX=3.00000241; KW=3.00000007; PH=3.00000004; SC=3.00000253; SDB=6.00989742; UDB=6.00502590; IPR=6.00769096; BA=6.00005830; NDR=6.00000001; ZLA=6.00000005; ZF=6.00000009; ZB=6.00000000; ZP=6.00000000; ZH=6.00000000; ZU=6.00000002; MB=3.00019554; XFM=3.00000015; UTC=2018-02-14 20:08:30
X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused
x-cbparentid: 18021420-0005-0000-0000-00008615727A
Message-Id: <1518638907.7508.30.camel@us.ibm.com>
X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:,, definitions=2018-02-14_08:,, signatures=0
X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 priorityscore=1501 malwarescore=0 suspectscore=0 phishscore=0 bulkscore=0 spamscore=0 clxscore=1015 lowpriorityscore=0 impostorscore=0 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1709140000 definitions=main-1802140234
X-IsSubscribed: yes
X-SW-Source: 2018-02/txt/msg00863.txt.bz2

GCC maintainers:

Per Segher's comments on the first version of the patch.  I split the
patch into two.  The first patch (this one) adds the ABI specified
vec_insert4b and vec_extract builtins. Â It adds a runnable file to test
the ABI specified builtin instances.  Note, the runnable test file does
not test for illegal argument values such as the const int second
argument > 12 or of the wrong type.

Note, the rtl for vec_insert4b in vsx.md is a copy of the vec_vinsert4b
code with the name changed.  The rtl for vec_extract4b is new.

The second patch removes all of the non-ABI builtin support.

Additionally, I have addressed the other comments from Segher with
regards to formatting issues and rtl register specification.

This patch has been tested on:

Â  powerpc64le-unknown-linux-gnu (Power 8 LE)
Â  powerpc64le-unknown-linux-gnu (Power 9 LE)

with no regressions.

Let me know if the patch looks OK or not.Â Thanks.

The patch should also be ported to GCC 7 so we are in compliance with
the ABI.

                           Carl Love

-----------------------------------------------------------------------

    gcc/ChangeLog:

    2018-02-13  Carl Love  <cel@us.ibm.com>

        * config/rs6000/altivec.h: Add builtin names vec_extract4b
        vec_insert4b.
        * config/rs6000/rs6000-builtin.def: Add INSERT4B and EXTRACT4B
        definitions.
        * config/rs6000/rs6000-c.c: Add the definitions for
        P9V_BUILTIN_VEC_EXTRACT4B and P9V_BUILTIN_VEC_INSERT4B.
        * config/rs6000/rs6000.c (altivec_expand_builtin): Add
        P9V_BUILTIN_EXTRACT4B and P9V_BUILTIN_INSERT4B case statements.
        * config/rs6000/vsx.md: Add define_insn extract4b.  Add define_expand
	definition for insert4b and define insn *insert3b_internal.
        * doc/extend.texi: Add documentation for vec_extract4b.

    gcc/testsuite/ChangeLog:

    2018-02-13  Carl Love  <cel@us.ibm.com>
        * gcc.target/powerpc/builtins-7-p9-runnable.c: New runnable test file
        for the ABI definitions for vec_extract4b and vec_insert4b.
---
 gcc/config/rs6000/altivec.h                        |   2 +
 gcc/config/rs6000/rs6000-builtin.def               |   4 +
 gcc/config/rs6000/rs6000-c.c                       |   8 +
 gcc/config/rs6000/rs6000.c                         |   2 +
 gcc/config/rs6000/vsx.md                           |  41 +++++
 gcc/doc/extend.texi                                |   7 +
 .../gcc.target/powerpc/builtins-7-p9-runnable.c    | 169 +++++++++++++++++++++
 7 files changed, 233 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/builtins-7-p9-runnable.c

diff --git a/gcc/config/rs6000/altivec.h b/gcc/config/rs6000/altivec.h
index 684cb1990..3bce2ae39 100644
--- a/gcc/config/rs6000/altivec.h
+++ b/gcc/config/rs6000/altivec.h
@@ -435,6 +435,8 @@
 #define vec_vctzw __builtin_vec_vctzw
 #define vec_vextract4b __builtin_vec_vextract4b
 #define vec_vinsert4b __builtin_vec_vinsert4b
+#define vec_extract4b __builtin_vec_extract4b
+#define vec_insert4b __builtin_vec_insert4b
 #define vec_vprtyb __builtin_vec_vprtyb
 #define vec_vprtybd __builtin_vec_vprtybd
 #define vec_vprtybw __builtin_vec_vprtybw
diff --git a/gcc/config/rs6000/rs6000-builtin.def b/gcc/config/rs6000/rs6000-builtin.def
index 86604da46..420d12e29 100644
--- a/gcc/config/rs6000/rs6000-builtin.def
+++ b/gcc/config/rs6000/rs6000-builtin.def
@@ -2229,6 +2229,8 @@ BU_P9V_AV_2 (VEXTUWRX, "vextuwrx",		CONST,	vextuwrx)
 BU_P9V_VSX_2 (VEXTRACT4B,   "vextract4b",	CONST,	vextract4b)
 BU_P9V_VSX_3 (VINSERT4B,    "vinsert4b",	CONST,	vinsert4b)
 BU_P9V_VSX_3 (VINSERT4B_DI, "vinsert4b_di",	CONST,	vinsert4b_di)
+BU_P9V_VSX_3 (INSERT4B,    "insert4b",		CONST,  insert4b)
+BU_P9V_VSX_2 (EXTRACT4B,   "extract4b", 	CONST,  extract4b)
 
 /* Hardware IEEE 128-bit floating point round to odd instrucitons added in ISA
    3.0 (power9).  */
@@ -2291,11 +2293,13 @@ BU_P9V_OVERLOAD_2 (XL_LEN_R,	"xl_len_r")
 BU_P9V_OVERLOAD_2 (VEXTULX,	"vextulx")
 BU_P9V_OVERLOAD_2 (VEXTURX,	"vexturx")
 BU_P9V_OVERLOAD_2 (VEXTRACT4B,	"vextract4b")
+BU_P9V_OVERLOAD_2 (EXTRACT4B,  "extract4b")
 
 /* ISA 3.0 Vector scalar overloaded 3 argument functions */
 BU_P9V_OVERLOAD_3 (STXVL,	"stxvl")
 BU_P9V_OVERLOAD_3 (XST_LEN_R,	"xst_len_r")
 BU_P9V_OVERLOAD_3 (VINSERT4B,	"vinsert4b")
+BU_P9V_OVERLOAD_3 (INSERT4B,    "insert4b")
 
 /* Overloaded CMPNE support was implemented prior to Power 9,
    so is not mentioned here.  */
diff --git a/gcc/config/rs6000/rs6000-c.c b/gcc/config/rs6000/rs6000-c.c
index a68be511c..56e66db98 100644
--- a/gcc/config/rs6000/rs6000-c.c
+++ b/gcc/config/rs6000/rs6000-c.c
@@ -5433,6 +5433,8 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_INTDI, RS6000_BTI_V16QI, RS6000_BTI_UINTSI, 0 },
   { P9V_BUILTIN_VEC_VEXTRACT4B, P9V_BUILTIN_VEXTRACT4B,
     RS6000_BTI_INTDI, RS6000_BTI_unsigned_V16QI, RS6000_BTI_UINTSI, 0 },
+  { P9V_BUILTIN_VEC_EXTRACT4B, P9V_BUILTIN_EXTRACT4B,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V16QI, RS6000_BTI_INTSI, 0 },
 
   { P9V_BUILTIN_VEC_VEXTRACT_FP_FROM_SHORTH, P9V_BUILTIN_VEXTRACT_FP_FROM_SHORTH,
     RS6000_BTI_V4SF, RS6000_BTI_unsigned_V8HI, 0, 0 },
@@ -5492,6 +5494,12 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
   { P8V_BUILTIN_VEC_VGBBD, P8V_BUILTIN_VGBBD,
     RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI, 0, 0 },
 
+  { P9V_BUILTIN_VEC_INSERT4B, P9V_BUILTIN_INSERT4B,
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_V4SI,
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_INTSI },
+  { P9V_BUILTIN_VEC_INSERT4B, P9V_BUILTIN_INSERT4B,
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V4SI,
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_INTSI },
   { P9V_BUILTIN_VEC_VINSERT4B, P9V_BUILTIN_VINSERT4B,
     RS6000_BTI_V16QI, RS6000_BTI_V4SI,
     RS6000_BTI_V16QI, RS6000_BTI_UINTSI },
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 6a6801aad..f8d8b9687 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -15730,6 +15730,7 @@ altivec_expand_builtin (tree exp, rtx target, bool *expandedp)
 
     case P9V_BUILTIN_VEXTRACT4B:
     case P9V_BUILTIN_VEC_VEXTRACT4B:
+    case P9V_BUILTIN_VEC_EXTRACT4B:
       arg1 = CALL_EXPR_ARG (exp, 1);
       STRIP_NOPS (arg1);
 
@@ -15747,6 +15748,7 @@ altivec_expand_builtin (tree exp, rtx target, bool *expandedp)
     case P9V_BUILTIN_VINSERT4B:
     case P9V_BUILTIN_VINSERT4B_DI:
     case P9V_BUILTIN_VEC_VINSERT4B:
+    case P9V_BUILTIN_VEC_INSERT4B:
       arg2 = CALL_EXPR_ARG (exp, 2);
       STRIP_NOPS (arg2);
 
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 86efdced2..266923f98 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -5204,6 +5204,47 @@
 ;; Vector insert/extract word at arbitrary byte values.  Note, the little
 ;; endian version needs to adjust the byte number, and the V4SI element in
 ;; vinsert4b.
+(define_insn "extract4b"
+  [(set (match_operand:V2DI 0 "vsx_register_operand")
+       (unspec:V2DI [(match_operand:V16QI 1 "vsx_register_operand" "wa")
+                     (match_operand:QI 2 "const_0_to_12_operand" "n")]
+                    UNSPEC_XXEXTRACTUW))]
+  "TARGET_P9_VECTOR"
+{
+  if (!VECTOR_ELT_ORDER_BIG)
+    operands[2] = GEN_INT (12 - INTVAL (operands[2]));
+
+  return "xxextractuw %x0,%x1,%2";
+})
+
+(define_expand "insert4b"
+  [(set (match_operand:V16QI 0 "vsx_register_operand")
+	(unspec:V16QI [(match_operand:V4SI 1 "vsx_register_operand")
+		       (match_operand:V16QI 2 "vsx_register_operand")
+		       (match_operand:QI 3 "const_0_to_12_operand")]
+		   UNSPEC_XXINSERTW))]
+  "TARGET_P9_VECTOR"
+{
+  if (!VECTOR_ELT_ORDER_BIG)
+    {
+      rtx op1 = operands[1];
+      rtx v4si_tmp = gen_reg_rtx (V4SImode);
+      emit_insn (gen_vsx_xxpermdi_v4si_be (v4si_tmp, op1, op1, const1_rtx));
+      operands[1] = v4si_tmp;
+      operands[3] = GEN_INT (12 - INTVAL (operands[3]));
+    }
+})
+
+(define_insn "*insert4b_internal"
+  [(set (match_operand:V16QI 0 "vsx_register_operand" "=wa")
+	(unspec:V16QI [(match_operand:V4SI 1 "vsx_register_operand" "wa")
+		       (match_operand:V16QI 2 "vsx_register_operand" "0")
+		       (match_operand:QI 3 "const_0_to_12_operand" "n")]
+		   UNSPEC_XXINSERTW))]
+  "TARGET_P9_VECTOR"
+  "xxinsertw %x0,%x1,%3"
+  [(set_attr "type" "vecperm")])
+
 (define_expand "vextract4b"
   [(set (match_operand:DI 0 "gpc_reg_operand")
 	(unspec:DI [(match_operand:V16QI 1 "vsx_register_operand")
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index cb9df971a..13dbac42e 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -19055,8 +19055,15 @@ vector int vec_vctzw (vector int);
 vector unsigned int vec_vctzw (vector int);
 
 long long vec_vextract4b (const vector signed char, const int);
+vector unsigned long long vec_extract4b (vector unsigned char,
+                                         const int);
+long long vec_extract4b (const vector signed char, const int);
 long long vec_vextract4b (const vector unsigned char, const int);
 
+vector unsigned char vec_insert4b (vector signed int, vector unsigned char,
+                                   const int);
+vector unsigned char vec_insert4b (vector unsigned int, vector unsigned char,
+                                   const int);
 vector signed char vec_insert4b (vector int, vector signed char, const int);
 vector unsigned char vec_insert4b (vector unsigned int, vector unsigned char,
                                    const int);
diff --git a/gcc/testsuite/gcc.target/powerpc/builtins-7-p9-runnable.c b/gcc/testsuite/gcc.target/powerpc/builtins-7-p9-runnable.c
new file mode 100644
index 000000000..137b46b05
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/builtins-7-p9-runnable.c
@@ -0,0 +1,169 @@
+/* { dg-do run { target { powerpc*-*-* && p9vector_hw } } } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-options "-mcpu=power9 -O2" } */
+
+#include <altivec.h>
+#define TRUE 1
+#define FALSE 0
+
+#ifdef DEBUG
+#include <stdio.h>
+#endif
+
+#define EXTRACT 0
+
+void abort (void);
+
+int result_wrong_ull (vector unsigned long long vec_expected,
+		      vector unsigned long long vec_actual)
+{
+  int i;
+
+  for (i = 0; i < 2; i++)
+    if (vec_expected[i] != vec_actual[i])
+      return TRUE;
+
+  return FALSE;
+}
+
+int result_wrong_uc (vector unsigned char vec_expected,
+		     vector unsigned char vec_actual)
+{
+  int i;
+
+  for (i = 0; i < 16; i++)
+    if (vec_expected[i] != vec_actual[i])
+      return TRUE;
+
+  return FALSE;
+}
+
+#ifdef DEBUG
+void print_ull (vector unsigned long long vec_expected,
+		vector unsigned long long vec_actual)
+{
+  int i;
+
+  printf("expected unsigned long long data\n");
+  for (i = 0; i < 2; i++)
+    printf(" %lld,", vec_expected[i]);
+
+  printf("\nactual signed char data\n");
+  for (i = 0; i < 2; i++)
+    printf(" %lld,", vec_actual[i]);
+  printf("\n");
+}
+
+void print_uc (vector unsigned char vec_expected,
+	       vector unsigned char vec_actual)
+{
+  int i;
+
+  printf("expected unsigned char data\n");
+  for (i = 0; i < 16; i++)
+    printf(" %d,", vec_expected[i]);
+
+  printf("\nactual unsigned char data\n");
+  for (i = 0; i < 16; i++)
+    printf(" %d,", vec_actual[i]);
+  printf("\n");
+}
+#endif
+
+#if EXTRACT
+vector unsigned long long
+vext (vector unsigned char *vc)
+{
+  return vextract_si_vchar (*vc, 5);
+}
+#endif
+
+int main()
+{
+   vector signed int vsi_arg;
+   vector unsigned char vec_uc_arg, vec_uc_result, vec_uc_expected;
+   vector unsigned long long vec_ull_result, vec_ull_expected;
+   unsigned long long ull_result, ull_expected;
+
+   vec_uc_arg = (vector unsigned char){1, 2, 3, 4,
+				       5, 6, 7, 8,
+				       9, 10, 11, 12,
+				       13, 14, 15, 16};
+
+   vsi_arg = (vector signed int){0xA, 0xB, 0xC, 0xD};
+
+   vec_uc_expected = (vector unsigned char){0xC, 0, 0, 0,
+					    5, 6, 7, 8,
+					    9, 10, 11, 12,
+					    13, 14, 15, 16};
+   /* Test vec_insert4b() */
+   /* Insert into char 0 location */
+   vec_uc_result = vec_insert4b (vsi_arg, vec_uc_arg, 0);
+
+   if (result_wrong_uc(vec_uc_expected, vec_uc_result))
+     {
+#ifdef DEBUG
+        printf("Error: vec_insert4b pos 0, result does not match expected result\n");
+	print_uc (vec_uc_expected, vec_uc_result);
+#else
+        abort();
+#endif
+      }
+
+   /* insert into char 4 location */
+   vec_uc_expected = (vector unsigned char){1, 2, 3, 4,
+					    0xC, 0, 0, 0,
+					    9, 10, 11, 12,
+					    13, 14, 15, 16};
+   vec_uc_result = vec_insert4b (vsi_arg, vec_uc_arg, 4);
+
+   if (result_wrong_uc(vec_uc_expected, vec_uc_result))
+     {
+#ifdef DEBUG
+        printf("Error: vec_insert4b pos 4, result does not match expected result\n");
+	print_uc (vec_uc_expected, vec_uc_result);
+#else
+        abort();
+#endif
+      }
+
+   /* Test vec_extract4b() */
+   /* Extract 4b, from char 0 location */
+   vec_uc_arg = (vector unsigned char){10, 0, 0, 0,
+				       20, 0, 0, 0,
+				       30, 0, 0, 0,
+				       40, 0, 0, 0};
+
+   vec_ull_expected = (vector unsigned long long){0, 10};
+   vec_ull_result = vec_extract4b(vec_uc_arg, 0);
+
+   if (result_wrong_ull(vec_ull_expected, vec_ull_result))
+     {
+#ifdef DEBUG
+        printf("Error: vec_extract4b pos 0, result does not match expected result\n");
+	print_ull (vec_ull_expected, vec_ull_result);
+#else
+        abort();
+#endif
+      }
+
+   /* Extract 4b, from char 12 location */
+   vec_uc_arg = (vector unsigned char){10, 0, 0, 0,
+				       20, 0, 0, 0,
+				       30, 0, 0, 0,
+				       40, 0, 0, 0};
+
+   vec_ull_expected = (vector unsigned long long){0, 40};
+   vec_ull_result = vec_extract4b(vec_uc_arg, 12);
+
+   if (result_wrong_ull(vec_ull_expected, vec_ull_result))
+     {
+#ifdef DEBUG
+        printf("Error: vec_extract4b pos 12, result does not match expected result\n");
+	print_ull (vec_ull_expected, vec_ull_result);
+#else
+        abort();
+#endif
+      }
+}
-- 
2.11.0