public inbox for gcc-patches@gcc.gnu.org
* [PATCH] nvptx: Improved support for HFMode including neghf2 and abshf2.
@ 2022-01-08 12:21 Roger Sayle
  2022-02-10  8:08 ` Tom de Vries
From: Roger Sayle @ 2022-01-08 12:21 UTC
  To: 'GCC Patches'



This patch adds more support for _Float16 (HFmode) to the nvptx backend.
Currently, negation, absolute value and floating-point comparisons are
implemented by promoting to float (SFmode).  This patch adds suitable
patterns to nvptx.md, most of them conditional on TARGET_SM53 (-misa=sm_53).
It also adds support for HFmode fused multiply-add.

One subtlety is that neghf2 and abshf2 are implemented as (HImode) bit
manipulation operations that update only the sign bit, rather than with
neg.f16 and abs.f16; these two patterns are therefore not conditional on
TARGET_SM53.  The NVIDIA PTX ISA documentation for neg.f16 and abs.f16
contains the caution "Future implementations may comply with the IEEE 754
standard by preserving the (NaN) payload and modifying only the sign bit",
i.e. current implementations are not guaranteed to do so.  Given the
availability of suitable replacements (xor.b16 and and.b16), I thought it
best to provide IEEE 754 compliant implementations.  If anyone observes a
performance penalty from this choice, I'm happy to provide a -ffast-math
variant (or revisit this decision).

This patch has been tested on nvptx-none hosted on x86_64-pc-linux-gnu
(including newlib) with make and make -k check, with no new failures.
Ok for mainline?


2022-01-08  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
	* config/nvptx/nvptx.md (*cmphf): New define_insn.
	(cstorehf4): New define_expand.
	(fmahf4): New define_insn.
	(neghf2): New define_insn.
	(abshf2): New define_insn.

gcc/testsuite/ChangeLog
	* gcc.target/nvptx/float16-3.c: New test case for neghf2.
	* gcc.target/nvptx/float16-4.c: New test case for abshf2.
	* gcc.target/nvptx/float16-5.c: New test case for fmahf4.
	* gcc.target/nvptx/float16-6.c: New test case for HFmode comparisons.


Thanks in advance,
Roger
--



diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
index ce74672..a6046d7 100644
--- a/gcc/config/nvptx/nvptx.md
+++ b/gcc/config/nvptx/nvptx.md
@@ -779,6 +779,14 @@
   ""
   "%.\\tsetp%c1\\t%0, %2, %3;")
 
+(define_insn "*cmphf"
+  [(set (match_operand:BI 0 "nvptx_register_operand" "=R")
+	(match_operator:BI 1 "nvptx_float_comparison_operator"
+	   [(match_operand:HF 2 "nvptx_register_operand" "R")
+	    (match_operand:HF 3 "nvptx_nonmemory_operand" "RF")]))]
+  "TARGET_SM53"
+  "%.\\tsetp%c1\\t%0, %2, %3;")
+
 (define_insn "jump"
   [(set (pc)
 	(label_ref (match_operand 0 "" "")))]
@@ -969,6 +977,21 @@
   DONE;
 })
 
+(define_expand "cstorehf4"
+  [(set (match_operand:SI 0 "nvptx_register_operand")
+	(match_operator:SI 1 "nvptx_float_comparison_operator"
+	  [(match_operand:HF 2 "nvptx_register_operand")
+	   (match_operand:HF 3 "nvptx_nonmemory_operand")]))]
+  "TARGET_SM53"
+{
+  rtx reg = gen_reg_rtx (BImode);
+  rtx cmp = gen_rtx_fmt_ee (GET_CODE (operands[1]), BImode,
+			    operands[2], operands[3]);
+  emit_move_insn (reg, cmp);
+  emit_insn (gen_setccsi_from_bi (operands[0], reg));
+  DONE;
+})
+
 ;; Calls
 
 (define_insn "call_insn_<mode>"
@@ -1156,6 +1179,26 @@
   "TARGET_SM53"
   "%.\\tmul.f16\\t%0, %1, %2;")
 
+(define_insn "fmahf4"
+  [(set (match_operand:HF 0 "nvptx_register_operand" "=R")
+	(fma:HF (match_operand:HF 1 "nvptx_register_operand" "R")
+		(match_operand:HF 2 "nvptx_nonmemory_operand" "RF")
+		(match_operand:HF 3 "nvptx_nonmemory_operand" "RF")))]
+  "TARGET_SM53"
+  "%.\\tfma%#.f16\\t%0, %1, %2, %3;")
+
+(define_insn "neghf2"
+  [(set (match_operand:HF 0 "nvptx_register_operand" "=R")
+	(neg:HF (match_operand:HF 1 "nvptx_register_operand" "R")))]
+  ""
+  "%.\\txor.b16\\t%0, %1, -32768;")
+
+(define_insn "abshf2"
+  [(set (match_operand:HF 0 "nvptx_register_operand" "=R")
+	(abs:HF (match_operand:HF 1 "nvptx_register_operand" "R")))]
+  ""
+  "%.\\tand.b16\\t%0, %1, 32767;")
+
 (define_insn "exp2hf2"
   [(set (match_operand:HF 0 "nvptx_register_operand" "=R")
 	(unspec:HF [(match_operand:HF 1 "nvptx_register_operand" "R")]
diff --git a/gcc/testsuite/gcc.target/nvptx/float16-3.c b/gcc/testsuite/gcc.target/nvptx/float16-3.c
new file mode 100644
index 0000000..914282a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/nvptx/float16-3.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -misa=sm_53 -mptx=6.3" } */
+
+_Float16 var;
+
+void neg()
+{
+  var = -var;
+}
+
+/* { dg-final { scan-assembler "xor.b16" } } */
diff --git a/gcc/testsuite/gcc.target/nvptx/float16-4.c b/gcc/testsuite/gcc.target/nvptx/float16-4.c
new file mode 100644
index 0000000..b11f17a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/nvptx/float16-4.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -misa=sm_53 -mptx=6.3 -ffast-math" } */
+
+_Float16 var;
+
+void foo()
+{
+  var = (var < (_Float16)0.0) ? -var : var;
+}
+
+/* { dg-final { scan-assembler "and.b16" } } */
diff --git a/gcc/testsuite/gcc.target/nvptx/float16-5.c b/gcc/testsuite/gcc.target/nvptx/float16-5.c
new file mode 100644
index 0000000..5fe15ec
--- /dev/null
+++ b/gcc/testsuite/gcc.target/nvptx/float16-5.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -misa=sm_53 -mptx=6.3 -ffast-math" } */
+
+_Float16 a;
+_Float16 b;
+_Float16 c;
+_Float16 d;
+
+void foo()
+{
+  a = (_Float16)(b*c) + d;
+}
+
+/* { dg-final { scan-assembler "fma.rn.f16" } } */
diff --git a/gcc/testsuite/gcc.target/nvptx/float16-6.c b/gcc/testsuite/gcc.target/nvptx/float16-6.c
new file mode 100644
index 0000000..8fe4fa3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/nvptx/float16-6.c
@@ -0,0 +1,38 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -misa=sm_53 -mptx=6.3" } */
+
+_Float16 x;
+_Float16 y;
+
+_Bool eq()
+{
+  return x == y;
+}
+
+_Bool ne()
+{
+  return x != y;
+}
+
+_Bool lt()
+{
+  return x < y;
+}
+
+_Bool le()
+{
+  return x <= y;
+}
+
+_Bool gt()
+{
+  return x > y;
+}
+
+_Bool ge()
+{
+  return x >= y;
+}
+
+/* { dg-final { scan-assembler-times "setp\.\[a-z\]*\.f16" 6 } } */
+/* { dg-final { scan-assembler-not "cvt.f32.f16" } } */


* Re: [PATCH] nvptx: Improved support for HFMode including neghf2 and abshf2.
  2022-01-08 12:21 [PATCH] nvptx: Improved support for HFMode including neghf2 and abshf2 Roger Sayle
@ 2022-02-10  8:08 ` Tom de Vries
From: Tom de Vries @ 2022-02-10  8:08 UTC
  To: Roger Sayle, 'GCC Patches'

On 1/8/22 13:21, Roger Sayle wrote:
> 
> This patch adds more support for _Float16 (HFmode) to the nvptx backend.
> Currently negation, absolute value and floating point comparisons are
> implemented by promoting to float (SFmode).  This patch adds suitable
> define_insns to nvptx.md, most conditional on TARGET_SM53 (-misa=sm_53).
> This patch also adds support for HFmode fused multiply-add.
> 
> One subtlety is that neghf2 and abshf2 are implemented by (HImode)
> bit manipulation operations to update the sign bit.  The NVidia PTX
> ISA documentation for neg.f16 and abs.f16 contains the caution
> "Future implementations may comply with the IEEE 754 standard by preserving
> the (NaN) payload and modifying only the sign bit".  Given the availability
> of suitable replacements, I thought it best to provide IEEE 754 compliant
> implementations.  If anyone observes a performance penalty from this
> choice I'm happy to provide a -ffast-math variant (or revisit this
> decision).
> 
> This patch has been tested on nvptx-none hosted on x86_64-pc-linux-gnu
> (including newlib) with a make and make -k check with no new failures.
> Ok for mainline?
> 

LGTM, applied.

Thanks,
- Tom


