From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <christophe.lyon@linaro.org>
Received: from mail-wm1-x330.google.com (mail-wm1-x330.google.com
 [IPv6:2a00:1450:4864:20::330])
 by sourceware.org (Postfix) with ESMTPS id DFDA038515D8
 for <gcc-patches@gcc.gnu.org>; Fri, 30 Apr 2021 14:09:56 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org DFDA038515D8
Received: by mail-wm1-x330.google.com with SMTP id
 b19-20020a05600c06d3b029014258a636e8so1754858wmn.2
 for <gcc-patches@gcc.gnu.org>; Fri, 30 Apr 2021 07:09:56 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20161025;
 h=x-gm-message-state:from:to:subject:date:message-id:in-reply-to
 :references;
 bh=DBQ0bKjY1TGah3cVkjIcyWMBOHmJMq9hYw/OrVY20zY=;
 b=inugq6P/tOxOgBhUyyHaDOf4MGgtbTbwdUx9Vd9GyznacvIBpp7F4QRMrutsTNwUos
 sZTvfR8qxwQmBGWl1++9jRY06oF1RTThvyiEkLT4d7M/ApzwmzCwvdixivCoMQfyxfbX
 R8V7MvoXwN1MgUSlmlCHqOXEyrP1NnsT9EVYn2ve9fhyhzyRtkKlWBK/k99S4DVne8Dy
 HNpxr7Sf7GstfcPoWtnXaS2z+Gsd4+MyHS4brMs9A2b1F1L1qnwaOzrx7CIHt5sdorYD
 qKQwj2CMu6Vo6T6m6ahLYA42Xs0c3a+KVHcFeQRfQXTvXkGXVAF+j24rmK4gUBkraben
 t/sA==
X-Gm-Message-State: AOAM5328XutWWLXyGILd3vgFh12JEKkqNjHPNG13ZfSN0x3+IAPfrq/H
 +r+Njqfl4oQQjDTFWB9Hbih0ETPOwnRc1h97
X-Google-Smtp-Source: ABdhPJxa+gQ6KqIJ1StaI96Jzy1XmZkdoEXFZEthgFXjk/wSUeP/B3aYcR6rJU067J/I6zGPbYSCww==
X-Received: by 2002:a7b:cf12:: with SMTP id l18mr6238319wmg.37.1619791795369; 
 Fri, 30 Apr 2021 07:09:55 -0700 (PDT)
Received: from localhost.localdomain
 (static.42.136.251.148.clients.your-server.de. [148.251.136.42])
 by smtp.gmail.com with ESMTPSA id o9sm2345756wmh.19.2021.04.30.07.09.54
 for <gcc-patches@gcc.gnu.org>
 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128);
 Fri, 30 Apr 2021 07:09:55 -0700 (PDT)
From: Christophe Lyon <christophe.lyon@linaro.org>
To: gcc-patches@gcc.gnu.org
Subject: [PATCH 7/9] arm: Auto-vectorization for MVE: add __fp16 support to
 VCMP
Date: Fri, 30 Apr 2021 14:09:48 +0000
Message-Id: <1619791790-628-7-git-send-email-christophe.lyon@linaro.org>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1619791790-628-1-git-send-email-christophe.lyon@linaro.org>
References: <1619791790-628-1-git-send-email-christophe.lyon@linaro.org>
X-Spam-Status: No, score=-14.4 required=5.0 tests=BAYES_00, DKIM_SIGNED,
 DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_DNSWL_NONE,
 SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Fri, 30 Apr 2021 14:09:58 -0000

This patch adds __fp16 support to the previous patch that added vcmp
support with MVE. For this we update existing expanders to use VDQWH
iterator, and add a new expander vcond<VH_cvtto><mode>.  In the
process we need to create suitable iterators, and update v_cmp_result
as needed.

2021-04-26  Christophe Lyon  <christophe.lyon@linaro.org>

	gcc/
	* config/arm/iterators.md (V16): New iterator.
	(VH_cvtto): New iterator.
	(v_cmp_result): Added V4HF and V8HF support.
	* config/arm/vec-common.md (vec_cmp<mode><v_cmp_result>): Use VDQWH.
	(vcond<mode><mode>): Likewise.
	(vcond_mask_<mode><v_cmp_result>): Likewise.
	(vcond<VH_cvtto><mode>): New expander.

	gcc/testsuite/
	* gcc.target/arm/simd/mve-compare-3.c: New test with GCC vectors.
	* gcc.target/arm/simd/mve-vcmp-f16.c: New test for
	auto-vectorization.
---
 gcc/config/arm/iterators.md                       |  6 ++++
 gcc/config/arm/vec-common.md                      | 40 ++++++++++++++++-------
 gcc/testsuite/gcc.target/arm/simd/mve-compare-3.c | 38 +++++++++++++++++++++
 gcc/testsuite/gcc.target/arm/simd/mve-vcmp-f16.c  | 30 +++++++++++++++++
 4 files changed, 102 insertions(+), 12 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/arm/simd/mve-compare-3.c
 create mode 100644 gcc/testsuite/gcc.target/arm/simd/mve-vcmp-f16.c
diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index a128465..3042baf 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -231,6 +231,9 @@ (define_mode_iterator VU [V16QI V8HI V4SI])
 ;; Vector modes for 16-bit floating-point support.
 (define_mode_iterator VH [V8HF V4HF])
 
+;; Modes with 16-bit elements only.
+(define_mode_iterator V16 [V4HI V4HF V8HI V8HF])
+
 ;; 16-bit floating-point vector modes suitable for moving (includes BFmode).
 (define_mode_iterator VHFBF [V8HF V4HF V4BF V8BF])
 
@@ -571,6 +574,8 @@ (define_mode_attr V_cvtto [(V2SI "v2sf") (V2SF "v2si")
 ;; (Opposite) mode to convert to/from for vector-half mode conversions.
 (define_mode_attr VH_CVTTO [(V4HI "V4HF") (V4HF "V4HI")
 			    (V8HI "V8HF") (V8HF "V8HI")])
+(define_mode_attr VH_cvtto [(V4HI "v4hf") (V4HF "v4hi")
+			    (V8HI "v8hf") (V8HF "v8hi")])
 
 ;; Define element mode for each vector mode.
 (define_mode_attr V_elem [(V8QI "QI") (V16QI "QI")
@@ -720,6 +725,7 @@ (define_mode_attr V_cmp_result [(V8QI "V8QI") (V16QI "V16QI")
 (define_mode_attr v_cmp_result [(V8QI "v8qi") (V16QI "v16qi")
 				(V4HI "v4hi") (V8HI  "v8hi")
 				(V2SI "v2si") (V4SI  "v4si")
+				(V4HF "v4hi") (V8HF  "v8hi")
 				(DI   "di")   (V2DI  "v2di")
 				(V2SF "v2si") (V4SF  "v4si")])
 
diff --git a/gcc/config/arm/vec-common.md b/gcc/config/arm/vec-common.md
index 034b48b..3fd341c 100644
--- a/gcc/config/arm/vec-common.md
+++ b/gcc/config/arm/vec-common.md
@@ -366,8 +366,8 @@ (define_expand "vlshr<mode>3"
 (define_expand "vec_cmp<mode><v_cmp_result>"
   [(set (match_operand:<V_cmp_result> 0 "s_register_operand")
 	(match_operator:<V_cmp_result> 1 "comparison_operator"
-	  [(match_operand:VDQW 2 "s_register_operand")
-	   (match_operand:VDQW 3 "reg_or_zero_operand")]))]
+	  [(match_operand:VDQWH 2 "s_register_operand")
+	   (match_operand:VDQWH 3 "reg_or_zero_operand")]))]
   "ARM_HAVE_<MODE>_ARITH
    && !TARGET_REALLY_IWMMXT
    && (!<Is_float_mode> || flag_unsafe_math_optimizations)"
@@ -399,13 +399,13 @@ (define_expand "vec_cmpu<mode><mode>"
 ;; element-wise.
 
 (define_expand "vcond<mode><mode>"
-  [(set (match_operand:VDQW 0 "s_register_operand")
-	(if_then_else:VDQW
+  [(set (match_operand:VDQWH 0 "s_register_operand")
+	(if_then_else:VDQWH
 	  (match_operator 3 "comparison_operator"
-	    [(match_operand:VDQW 4 "s_register_operand")
-	     (match_operand:VDQW 5 "reg_or_zero_operand")])
-	  (match_operand:VDQW 1 "s_register_operand")
-	  (match_operand:VDQW 2 "s_register_operand")))]
+	    [(match_operand:VDQWH 4 "s_register_operand")
+	     (match_operand:VDQWH 5 "reg_or_zero_operand")])
+	  (match_operand:VDQWH 1 "s_register_operand")
+	  (match_operand:VDQWH 2 "s_register_operand")))]
   "ARM_HAVE_<MODE>_ARITH
    && !TARGET_REALLY_IWMMXT
    && (!<Is_float_mode> || flag_unsafe_math_optimizations)"
@@ -430,6 +430,22 @@ (define_expand "vcond<V_cvtto><mode>"
   DONE;
 })
 
+(define_expand "vcond<VH_cvtto><mode>"
+  [(set (match_operand:<VH_CVTTO> 0 "s_register_operand")
+	(if_then_else:<VH_CVTTO>
+	  (match_operator 3 "comparison_operator"
+	    [(match_operand:V16 4 "s_register_operand")
+	     (match_operand:V16 5 "reg_or_zero_operand")])
+	  (match_operand:<VH_CVTTO> 1 "s_register_operand")
+	  (match_operand:<VH_CVTTO> 2 "s_register_operand")))]
+  "ARM_HAVE_<MODE>_ARITH
+   && !TARGET_REALLY_IWMMXT
+   && (!<Is_float_mode> || flag_unsafe_math_optimizations)"
+{
+  arm_expand_vcond (operands, <V_cmp_result>mode);
+  DONE;
+})
+
 (define_expand "vcondu<mode><v_cmp_result>"
   [(set (match_operand:VDQW 0 "s_register_operand")
 	(if_then_else:VDQW
@@ -446,11 +462,11 @@ (define_expand "vcondu<mode><v_cmp_result>"
 })
 
 (define_expand "vcond_mask_<mode><v_cmp_result>"
-  [(set (match_operand:VDQW 0 "s_register_operand")
-        (if_then_else:VDQW
+  [(set (match_operand:VDQWH 0 "s_register_operand")
+        (if_then_else:VDQWH
           (match_operand:<V_cmp_result> 3 "s_register_operand")
-          (match_operand:VDQW 1 "s_register_operand")
-          (match_operand:VDQW 2 "s_register_operand")))]
+          (match_operand:VDQWH 1 "s_register_operand")
+          (match_operand:VDQWH 2 "s_register_operand")))]
   "ARM_HAVE_<MODE>_ARITH
    && !TARGET_REALLY_IWMMXT"
 {
diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-compare-3.c b/gcc/testsuite/gcc.target/arm/simd/mve-compare-3.c
new file mode 100644
index 0000000..76f81e8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/mve-compare-3.c
@@ -0,0 +1,38 @@
+/* { dg-do assemble } */
+/* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
+/* { dg-add-options arm_v8_1m_mve_fp } */
+/* { dg-additional-options "-O3 -funsafe-math-optimizations" } */
+
+/* float 16 tests.  */
+
+#ifndef ELEM_TYPE
+#define ELEM_TYPE __fp16
+#endif
+#ifndef INT_ELEM_TYPE
+#define INT_ELEM_TYPE __INT16_TYPE__
+#endif
+
+#define COMPARE(NAME, OP)			\
+  int_vec					\
+  cmp_##NAME##_reg (vec a, vec b)		\
+  {						\
+    return a OP b;				\
+  }
+
+typedef INT_ELEM_TYPE int_vec __attribute__((vector_size(16)));
+typedef ELEM_TYPE vec __attribute__((vector_size(16)));
+
+COMPARE (eq, ==)
+COMPARE (ne, !=)
+COMPARE (lt, <)
+COMPARE (le, <=)
+COMPARE (gt, >)
+COMPARE (ge, >=)
+
+/* eq, ne, lt, le, gt, ge.
+/* { dg-final { scan-assembler-times {\tvcmp.f16\teq, q[0-9]+, q[0-9]+\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcmp.f16\tne, q[0-9]+, q[0-9]+\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcmp.f16\tlt, q[0-9]+, q[0-9]+\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcmp.f16\tle, q[0-9]+, q[0-9]+\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcmp.f16\tgt, q[0-9]+, q[0-9]+\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcmp.f16\tge, q[0-9]+, q[0-9]+\n} 1 } } */
diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-vcmp-f16.c b/gcc/testsuite/gcc.target/arm/simd/mve-vcmp-f16.c
new file mode 100644
index 0000000..dbae2d1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/mve-vcmp-f16.c
@@ -0,0 +1,30 @@
+/* { dg-do assemble } */
+/* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
+/* { dg-add-options arm_v8_1m_mve_fp } */
+/* { dg-additional-options "-O3 -funsafe-math-optimizations" } */
+
+#include <stdint.h>
+
+#define NB 8
+
+#define FUNC(OP, NAME)							\
+  void test_ ## NAME ##_f (__fp16 * __restrict__ dest, __fp16 *a, __fp16 *b) { \
+    int i;								\
+    for (i=0; i<NB; i++) {						\
+      dest[i] = a[i] OP b[i];						\
+    }									\
+  }
+
+FUNC(==, vcmpeq)
+FUNC(!=, vcmpne)
+FUNC(<, vcmplt)
+FUNC(<=, vcmple)
+FUNC(>, vcmpgt)
+FUNC(>=, vcmpge)
+
+/* { dg-final { scan-assembler-times {\tvcmp.f16\teq, q[0-9]+, q[0-9]+\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcmp.f16\tne, q[0-9]+, q[0-9]+\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcmp.f16\tlt, q[0-9]+, q[0-9]+\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcmp.f16\tle, q[0-9]+, q[0-9]+\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcmp.f16\tgt, q[0-9]+, q[0-9]+\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcmp.f16\tge, q[0-9]+, q[0-9]+\n} 1 } } */
-- 
2.7.4