From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <vries@sourceware.org>
Received: by sourceware.org (Postfix, from userid 2205)
 id A159E3858003; Thu, 10 Feb 2022 08:02:22 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org A159E3858003
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset="utf-8"
From: Tom de Vries <vries@gcc.gnu.org>
To: gcc-cvs@gcc.gnu.org
Subject: [gcc r12-7163] nvptx: Improved support for HFMode including neghf2
 and abshf2
X-Act-Checkin: gcc
X-Git-Author: Roger Sayle <roger@nextmovesoftware.com>
X-Git-Refname: refs/heads/master
X-Git-Oldrev: bcbe280931535109544cae15b1e575dd53b5c647
X-Git-Newrev: 91a7e1daa7520489fafc0001d03c68bad4304f15
Message-Id: <20220210080222.A159E3858003@sourceware.org>
Date: Thu, 10 Feb 2022 08:02:22 +0000 (GMT)

https://gcc.gnu.org/g:91a7e1daa7520489fafc0001d03c68bad4304f15

commit r12-7163-g91a7e1daa7520489fafc0001d03c68bad4304f15
Author: Roger Sayle <roger@nextmovesoftware.com>
Date:   Thu Feb 3 09:07:22 2022 +0100

    nvptx: Improved support for HFMode including neghf2 and abshf2
    
    This patch adds more support for _Float16 (HFmode) to the nvptx backend.
    Currently negation, absolute value and floating point comparisons are
    implemented by promoting to float (SFmode).  This patch adds suitable
    define_insns to nvptx.md, most conditional on TARGET_SM53 (-misa=sm_53).
    This patch also adds support for HFmode fused multiply-add.
    
    One subtlety is that neghf2 and abshf2 are implemented by (HImode)
    bit manipulation operations to update the sign bit.  The NVidia PTX
    ISA documentation for neg.f16 and abs.f16 contains the caution
    "Future implementations may comply with the IEEE 754 standard by preserving
    the (NaN) payload and modifying only the sign bit".  Given the availability
    of suitable replacements, I thought it best to provide IEEE 754 compliant
    implementations.  If anyone observes a performance penalty from this
    choice I'm happy to provide a -ffast-math variant (or revisit this
    decision).
    
    This patch has been tested on nvptx-none hosted on x86_64-pc-linux-gnu
    (including newlib) with a make and make -k check with no new failures.
    
    gcc/ChangeLog:
    
            * config/nvptx/nvptx.md (*cmphf): New define_insn.
            (cstorehf4): New define_expand.
            (fmahf4): New define_insn.
            (neghf2): New define_insn.
            (abshf2): New define_insn.
    
    gcc/testsuite/ChangeLog:
    
            * gcc.target/nvptx/float16-3.c: New test case for neghf2.
            * gcc.target/nvptx/float16-4.c: New test case for abshf2.
            * gcc.target/nvptx/float16-5.c: New test case for fmahf4.
            * gcc.target/nvptx/float16-6.c: New test case for HFmode comparisons.
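
The sign-bit manipulation that the log message describes for neghf2 and
abshf2 can be sketched in plain C.  This is only an illustration, assuming
the IEEE 754 binary16 layout with the sign in bit 15; the helper names
f16_neg_bits and f16_abs_bits are hypothetical and do not appear in GCC.
The masks correspond to the immediates in the new patterns: as 16-bit
patterns, -32768 is 0x8000 and 32767 is 0x7fff.

```c
#include <stdint.h>

/* Hypothetical helpers, not part of GCC.  They operate on the raw 16-bit
   encoding of a binary16 value, where bit 15 holds the sign.  */

/* Negation flips only the sign bit, mirroring "xor.b16 %0, %1, -32768".
   Exponent and mantissa bits (including any NaN payload) are unchanged.  */
static uint16_t f16_neg_bits(uint16_t bits)
{
  return (uint16_t)(bits ^ 0x8000u);
}

/* Absolute value clears only the sign bit, mirroring
   "and.b16 %0, %1, 32767".  Again the NaN payload is preserved.  */
static uint16_t f16_abs_bits(uint16_t bits)
{
  return (uint16_t)(bits & 0x7fffu);
}
```

For example, 1.0 is encoded as 0x3c00, so f16_neg_bits (0x3c00) gives
0xbc00 (-1.0), and a NaN such as 0x7e01 keeps its payload bits under both
helpers, which is the behaviour the log message calls IEEE 754 compliant.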

Diff:
---
 gcc/config/nvptx/nvptx.md                  | 43 ++++++++++++++++++++++++++++++
 gcc/testsuite/gcc.target/nvptx/float16-3.c | 11 ++++++++
 gcc/testsuite/gcc.target/nvptx/float16-4.c | 11 ++++++++
 gcc/testsuite/gcc.target/nvptx/float16-5.c | 14 ++++++++++
 gcc/testsuite/gcc.target/nvptx/float16-6.c | 38 ++++++++++++++++++++++++++
 5 files changed, 117 insertions(+)

diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
index 7463603a0b0..e26d24ed650 100644
--- a/gcc/config/nvptx/nvptx.md
+++ b/gcc/config/nvptx/nvptx.md
@@ -783,6 +783,14 @@
   ""
   "%.\\tsetp%c1\\t%0, %2, %3;")
 
+(define_insn "*cmphf"
+  [(set (match_operand:BI 0 "nvptx_register_operand" "=R")
+	(match_operator:BI 1 "nvptx_float_comparison_operator"
+	   [(match_operand:HF 2 "nvptx_register_operand" "R")
+	    (match_operand:HF 3 "nvptx_nonmemory_operand" "RF")]))]
+  "TARGET_SM53"
+  "%.\\tsetp%c1\\t%0, %2, %3;")
+
 (define_insn "jump"
   [(set (pc)
 	(label_ref (match_operand 0 "" "")))]
@@ -973,6 +981,21 @@
   DONE;
 })
 
+(define_expand "cstorehf4"
+  [(set (match_operand:SI 0 "nvptx_register_operand")
+	(match_operator:SI 1 "nvptx_float_comparison_operator"
+	  [(match_operand:HF 2 "nvptx_register_operand")
+	   (match_operand:HF 3 "nvptx_nonmemory_operand")]))]
+  "TARGET_SM53"
+{
+  rtx reg = gen_reg_rtx (BImode);
+  rtx cmp = gen_rtx_fmt_ee (GET_CODE (operands[1]), BImode,
+			    operands[2], operands[3]);
+  emit_move_insn (reg, cmp);
+  emit_insn (gen_setccsi_from_bi (operands[0], reg));
+  DONE;
+})
+
 ;; Calls
 
 (define_insn "call_insn_<mode>"
@@ -1160,6 +1183,26 @@
   "TARGET_SM53"
   "%.\\tmul.f16\\t%0, %1, %2;")
 
+(define_insn "fmahf4"
+  [(set (match_operand:HF 0 "nvptx_register_operand" "=R")
+	(fma:HF (match_operand:HF 1 "nvptx_register_operand" "R")
+		(match_operand:HF 2 "nvptx_nonmemory_operand" "RF")
+		(match_operand:HF 3 "nvptx_nonmemory_operand" "RF")))]
+  "TARGET_SM53"
+  "%.\\tfma%#.f16\\t%0, %1, %2, %3;")
+
+(define_insn "neghf2"
+  [(set (match_operand:HF 0 "nvptx_register_operand" "=R")
+	(neg:HF (match_operand:HF 1 "nvptx_register_operand" "R")))]
+  ""
+  "%.\\txor.b16\\t%0, %1, -32768;")
+
+(define_insn "abshf2"
+  [(set (match_operand:HF 0 "nvptx_register_operand" "=R")
+	(abs:HF (match_operand:HF 1 "nvptx_register_operand" "R")))]
+  ""
+  "%.\\tand.b16\\t%0, %1, 32767;")
+
 (define_insn "exp2hf2"
   [(set (match_operand:HF 0 "nvptx_register_operand" "=R")
 	(unspec:HF [(match_operand:HF 1 "nvptx_register_operand" "R")]
diff --git a/gcc/testsuite/gcc.target/nvptx/float16-3.c b/gcc/testsuite/gcc.target/nvptx/float16-3.c
new file mode 100644
index 00000000000..914282aa1c3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/nvptx/float16-3.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -misa=sm_53 -mptx=6.3" } */
+
+_Float16 var;
+
+void neg()
+{
+  var = -var;
+}
+
+/* { dg-final { scan-assembler "xor.b16" } } */
diff --git a/gcc/testsuite/gcc.target/nvptx/float16-4.c b/gcc/testsuite/gcc.target/nvptx/float16-4.c
new file mode 100644
index 00000000000..b11f17a43ce
--- /dev/null
+++ b/gcc/testsuite/gcc.target/nvptx/float16-4.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -misa=sm_53 -mptx=6.3 -ffast-math" } */
+
+_Float16 var;
+
+void foo()
+{
+  var = (var < (_Float16)0.0) ? -var : var;
+}
+
+/* { dg-final { scan-assembler "and.b16" } } */
diff --git a/gcc/testsuite/gcc.target/nvptx/float16-5.c b/gcc/testsuite/gcc.target/nvptx/float16-5.c
new file mode 100644
index 00000000000..5fe15ecdf7e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/nvptx/float16-5.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -misa=sm_53 -mptx=6.3 -ffast-math" } */
+
+_Float16 a;
+_Float16 b;
+_Float16 c;
+_Float16 d;
+
+void foo()
+{
+  a = (_Float16)(b*c) + d;
+}
+
+/* { dg-final { scan-assembler "fma.rn.f16" } } */
diff --git a/gcc/testsuite/gcc.target/nvptx/float16-6.c b/gcc/testsuite/gcc.target/nvptx/float16-6.c
new file mode 100644
index 00000000000..8fe4fa3051f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/nvptx/float16-6.c
@@ -0,0 +1,38 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -misa=sm_53 -mptx=6.3" } */
+
+_Float16 x;
+_Float16 y;
+
+_Bool eq()
+{
+  return x == y;
+}
+
+_Bool ne()
+{
+  return x != y;
+}
+
+_Bool lt()
+{
+  return x < y;
+}
+
+_Bool le()
+{
+  return x <= y;
+}
+
+_Bool gt()
+{
+  return x > y;
+}
+
+_Bool ge()
+{
+  return x >= y;
+}
+
+/* { dg-final { scan-assembler-times "setp\.\[a-z\]*\.f16" 6 } } */
+/* { dg-final { scan-assembler-not "cvt.f32.f16" } } */