public inbox for gcc-patches@gcc.gnu.org
* [PATCH] amdgcn: Add support for additional natively supported floating-point operations
@ 2022-09-08 20:38 Kwok Cheung Yeung
  2022-09-09  8:10 ` Andrew Stubbs
  2022-09-09 17:57 ` Joseph Myers
  0 siblings, 2 replies; 8+ messages in thread
From: Kwok Cheung Yeung @ 2022-09-08 20:38 UTC (permalink / raw)
  To: gcc-patches, Andrew Stubbs

[-- Attachment #1: Type: text/plain, Size: 1127 bytes --]

Hello

This patch adds support for some additional floating-point operations, 
in scalar and vector modes, which are natively supported by the AMD GCN 
instruction set, but haven't been implemented in GCC yet. With the 
exception of frexp, these implement standard RTL names, and should be 
utilised automatically by GCC.

The instructions for the transcendental functions are documented to have 
limited numerical precision, so they are only used if 
unsafe_math_optimizations are enabled for now.

The sin and cos instructions for some reason are scaled by 2*PI radians 
(i.e. 1.0 == 2*PI radians/360 degrees), so their inputs need to be 
scaled by 1/(2*PI) first. I've implemented this as an expander to two 
instructions - one to do the pre-scaling, one to do the sin/cos. 
1/(2*PI) is a builtin constant for GCN, but the syntax to use it in the 
LLVM assembler was wrong - now fixed.
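
As an illustration only (this is not code from the patch), the pre-scaling step can be sketched in plain C. The helper name `radians_to_turns` is hypothetical; the constant is the 1/(2*PI) value used for dconst1over2pi in the patch. The sketch assumes the hardware sin/cos take their argument in "turns", where 1.0 is one full circle:

```c
/* 1/(2*pi), the same decimal value the patch uses for dconst1over2pi. */
#define ONE_OVER_2PI 0.15915494309189532

/* Hypothetical helper: convert an angle in radians to "turns"
   (1.0 == 2*pi radians == 360 degrees).  The expander emits the
   equivalent multiply before the sin/cos instruction itself.  */
static inline float
radians_to_turns (float radians)
{
  return radians * (float) ONE_OVER_2PI;
}
```

With this model, a quarter circle (pi/2 radians) becomes 0.25 and a full circle becomes 1.0, which is the value the native instruction would then consume.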

I have also added some extra GCN-specific builtins to access the vector 
versions of some of these operations (to implement vectorized versions 
of library math routines) and to access the frexp operations.
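
For readers unfamiliar with the frexp/ldexp split, here is a small host-side sketch in standard C. It assumes the two GCN frexp instructions together follow the C library convention (x == mant * 2^exp with 0.5 <= |mant| < 1); that is an assumption to be checked against the ISA manual. The helper names are hypothetical and are not the builtins added by the patch:

```c
#include <math.h>

/* Exponent half of frexp: for x == m * 2^e with 0.5 <= |m| < 1, return e.  */
static inline int
frexp_exp (float x)
{
  int e;
  (void) frexpf (x, &e);
  return e;
}

/* Mantissa half of frexp: return m from the same decomposition.  */
static inline float
frexp_mant (float x)
{
  int e;
  return frexpf (x, &e);
}

/* ldexp is the inverse operation: it rebuilds m * 2^e.  */
static inline float
recombine (float mant, int exp)
{
  return ldexpf (mant, exp);
}
```

Splitting the operation into two single-result functions mirrors the way the patch exposes separate "_exp" and "_mant" builtins, since each instruction produces only one result.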

Okay for trunk?

Thanks

Kwok

[-- Attachment #2: 0001-amdgcn-Add-support-for-additional-natively-supported.patch --]
[-- Type: text/plain, Size: 15389 bytes --]

From 5592c4512212ba74a7a690821650ddcba05df848 Mon Sep 17 00:00:00 2001
From: Kwok Cheung Yeung <kcy@codesourcery.com>
Date: Thu, 8 Sep 2022 17:37:26 +0000
Subject: [PATCH] amdgcn: Add support for additional natively supported
 floating-point operations

This adds support for the following natively supported floating-point
operations, in scalar and vectorized modes:

floor, ceil, exp2*, log2*, sin*, cos*, ldexp, frexp

* These operations are single-precision float only and are only active
if unsafe_math_optimizations are enabled (due to potential numerical
precision issues).

2022-09-08  Kwok Cheung Yeung  <kcy@codesourcery.com>

	gcc/
	* config/gcn/gcn-builtins.def (FABSVF, LDEXPVF, LDEXPV, FREXPVF_EXP,
	FREXPVF_MANT, FREXPV_EXP, FREXPV_MANT): Add new builtins.
	* config/gcn/gcn-protos.h (gcn_dconst1over2pi): New prototype.
	* config/gcn/gcn-valu.md (MATH_UNOP_1OR2REG, MATH_UNOP_1REG,
	MATH_UNOP_TRIG): New iterators.
	(math_unop): New attributes.
	(<math_unop><mode>2, <math_unop><mode>2<exec>,
	<math_unop><mode>2, <math_unop><mode>2<exec>,
	<math_unop><mode>2_insn, <math_unop><mode>2<exec>_insn,
	ldexp<mode>3, ldexp<mode>3<exec>,
	frexp<mode>_exp2, frexp<mode>_mant2,
	frexp<mode>_exp2<exec>, frexp<mode>_mant2<exec>): New instructions.
	(<math_unop><mode>2, <math_unop><mode>2<exec>): New expanders.
	* config/gcn/gcn.cc (init_ext_gcn_constants): Update definition of
	dconst1over2pi.
	(gcn_dconst1over2pi): New.
	(gcn_builtin_type_index): Add entry for v64df type.
	(v64df_type_node): New.
	(gcn_init_builtin_types): Initialize v64df_type_node.
	(gcn_expand_builtin_1): Expand new builtins to instructions.
	(print_operand): Fix assembler output for 1/(2*PI) constant.
	* config/gcn/gcn.md (unspec): Add new entries.
---
 gcc/config/gcn/gcn-builtins.def |  35 ++++++
 gcc/config/gcn/gcn-protos.h     |   1 +
 gcc/config/gcn/gcn-valu.md      | 181 ++++++++++++++++++++++++++++++++
 gcc/config/gcn/gcn.cc           | 114 +++++++++++++++++++-
 gcc/config/gcn/gcn.md           |   4 +-
 5 files changed, 332 insertions(+), 3 deletions(-)

diff --git a/gcc/config/gcn/gcn-builtins.def b/gcc/config/gcn/gcn-builtins.def
index 54e4ea4e953..27691909925 100644
--- a/gcc/config/gcn/gcn-builtins.def
+++ b/gcc/config/gcn/gcn-builtins.def
@@ -59,6 +59,41 @@ DEF_BUILTIN (SQRTF, 3 /*CODE_FOR_sqrtf */,
 	     _A2 (GCN_BTI_SF, GCN_BTI_SF),
 	     gcn_expand_builtin_1)
 
+DEF_BUILTIN (FABSVF, 3 /*CODE_FOR_fabsvf */,
+	     "fabsvf", B_INSN,
+	     _A2 (GCN_BTI_V64SF, GCN_BTI_V64SF),
+	     gcn_expand_builtin_1)
+
+DEF_BUILTIN (LDEXPVF, 3 /*CODE_FOR_ldexpvf */,
+	     "ldexpvf", B_INSN,
+	     _A3 (GCN_BTI_V64SF, GCN_BTI_V64SF, GCN_BTI_V64SI),
+	     gcn_expand_builtin_1)
+
+DEF_BUILTIN (LDEXPV, 3 /*CODE_FOR_ldexpv */,
+	     "ldexpv", B_INSN,
+	     _A3 (GCN_BTI_V64DF, GCN_BTI_V64DF, GCN_BTI_V64SI),
+	     gcn_expand_builtin_1)
+
+DEF_BUILTIN (FREXPVF_EXP, 3 /*CODE_FOR_frexpvf_exp */,
+	     "frexpvf_exp", B_INSN,
+	     _A2 (GCN_BTI_V64SI, GCN_BTI_V64SF),
+	     gcn_expand_builtin_1)
+
+DEF_BUILTIN (FREXPVF_MANT, 3 /*CODE_FOR_frexpvf_mant */,
+	     "frexpvf_mant", B_INSN,
+	     _A2 (GCN_BTI_V64SF, GCN_BTI_V64SF),
+	     gcn_expand_builtin_1)
+
+DEF_BUILTIN (FREXPV_EXP, 3 /*CODE_FOR_frexpv_exp */,
+	     "frexpv_exp", B_INSN,
+	     _A2 (GCN_BTI_V64SI, GCN_BTI_V64DF),
+	     gcn_expand_builtin_1)
+
+DEF_BUILTIN (FREXPV_MANT, 3 /*CODE_FOR_frexpv_mant */,
+	     "frexpv_mant", B_INSN,
+	     _A2 (GCN_BTI_V64DF, GCN_BTI_V64DF),
+	     gcn_expand_builtin_1)
+
 DEF_BUILTIN (CMP_SWAP, -1,
 	    "cmp_swap", B_INSN,
 	    _A4 (GCN_BTI_UINT, GCN_BTI_VOIDPTR, GCN_BTI_UINT, GCN_BTI_UINT),
diff --git a/gcc/config/gcn/gcn-protos.h b/gcc/config/gcn/gcn-protos.h
index 38197b929fd..ca804609c09 100644
--- a/gcc/config/gcn/gcn-protos.h
+++ b/gcc/config/gcn/gcn-protos.h
@@ -54,6 +54,7 @@ extern int gcn_hard_regno_nregs (int regno, machine_mode mode);
 extern void gcn_hsa_declare_function_name (FILE *file, const char *name,
 					   tree decl);
 extern HOST_WIDE_INT gcn_initial_elimination_offset (int, int);
+extern REAL_VALUE_TYPE gcn_dconst1over2pi (void);
 extern bool gcn_inline_constant64_p (rtx, bool);
 extern bool gcn_inline_constant_p (rtx);
 extern int gcn_inline_fp_constant_p (rtx, bool);
diff --git a/gcc/config/gcn/gcn-valu.md b/gcc/config/gcn/gcn-valu.md
index 8c33ae0c717..3bfdf8213fc 100644
--- a/gcc/config/gcn/gcn-valu.md
+++ b/gcc/config/gcn/gcn-valu.md
@@ -2290,6 +2290,187 @@
   [(set_attr "type" "vop1")
    (set_attr "length" "8")])
 
+; These FP unops have f64, f32 and f16 versions.
+(define_int_iterator MATH_UNOP_1OR2REG
+  [UNSPEC_FLOOR UNSPEC_CEIL])
+
+; These FP unops only have f16/f32 versions.
+(define_int_iterator MATH_UNOP_1REG
+  [UNSPEC_EXP2 UNSPEC_LOG2])
+
+(define_int_iterator MATH_UNOP_TRIG
+  [UNSPEC_SIN UNSPEC_COS])
+
+(define_int_attr math_unop
+  [(UNSPEC_FLOOR "floor")
+   (UNSPEC_CEIL "ceil")
+   (UNSPEC_EXP2 "exp2")
+   (UNSPEC_LOG2 "log2")
+   (UNSPEC_SIN "sin")
+   (UNSPEC_COS "cos")])
+
+(define_insn "<math_unop><mode>2"
+  [(set (match_operand:FP 0 "register_operand"  "=  v")
+	(unspec:FP
+	  [(match_operand:FP 1 "gcn_alu_operand" "vSvB")]
+	  MATH_UNOP_1OR2REG))]
+  ""
+  "v_<math_unop>%i0\t%0, %1"
+  [(set_attr "type" "vop1")
+   (set_attr "length" "8")])
+
+(define_insn "<math_unop><mode>2<exec>"
+  [(set (match_operand:V_FP 0 "register_operand"  "=  v")
+	(unspec:V_FP
+	  [(match_operand:V_FP 1 "gcn_alu_operand" "vSvB")]
+	  MATH_UNOP_1OR2REG))]
+  ""
+  "v_<math_unop>%i0\t%0, %1"
+  [(set_attr "type" "vop1")
+   (set_attr "length" "8")])
+
+(define_insn "<math_unop><mode>2"
+  [(set (match_operand:FP_1REG 0 "register_operand"  "=  v")
+	(unspec:FP_1REG
+	  [(match_operand:FP_1REG 1 "gcn_alu_operand" "vSvB")]
+	  MATH_UNOP_1REG))]
+  "flag_unsafe_math_optimizations"
+  "v_<math_unop>%i0\t%0, %1"
+  [(set_attr "type" "vop1")
+   (set_attr "length" "8")])
+
+(define_insn "<math_unop><mode>2<exec>"
+  [(set (match_operand:V_FP_1REG 0 "register_operand"  "=  v")
+	(unspec:V_FP_1REG
+	  [(match_operand:V_FP_1REG 1 "gcn_alu_operand" "vSvB")]
+	  MATH_UNOP_1REG))]
+  "flag_unsafe_math_optimizations"
+  "v_<math_unop>%i0\t%0, %1"
+  [(set_attr "type" "vop1")
+   (set_attr "length" "8")])
+
+(define_insn "*<math_unop><mode>2_insn"
+  [(set (match_operand:FP_1REG 0 "register_operand"  "=  v")
+	(unspec:FP_1REG
+	  [(match_operand:FP_1REG 1 "gcn_alu_operand" "vSvB")]
+	  MATH_UNOP_TRIG))]
+  "flag_unsafe_math_optimizations"
+  "v_<math_unop>%i0\t%0, %1"
+  [(set_attr "type" "vop1")
+   (set_attr "length" "8")])
+
+(define_insn "*<math_unop><mode>2<exec>_insn"
+  [(set (match_operand:V_FP_1REG 0 "register_operand"  "=  v")
+	(unspec:V_FP_1REG
+	  [(match_operand:V_FP_1REG 1 "gcn_alu_operand" "vSvB")]
+	  MATH_UNOP_TRIG))]
+  "flag_unsafe_math_optimizations"
+  "v_<math_unop>%i0\t%0, %1"
+  [(set_attr "type" "vop1")
+   (set_attr "length" "8")])
+
+; Trigonometric functions need their input scaled by 1/(2*PI) first.
+
+(define_expand "<math_unop><mode>2"
+  [(set (match_dup 2)
+	(mult:FP_1REG
+	  (match_dup 3)
+	  (match_operand:FP_1REG 1 "gcn_alu_operand")))
+   (set (match_operand:FP_1REG 0 "register_operand")
+	(unspec:FP_1REG
+	  [(match_dup 2)]
+	  MATH_UNOP_TRIG))]
+  "flag_unsafe_math_optimizations"
+  {
+    operands[2] = gen_reg_rtx (<MODE>mode);
+    operands[3] = const_double_from_real_value (gcn_dconst1over2pi (),
+						<MODE>mode);
+  })
+
+(define_expand "<math_unop><mode>2<exec>"
+  [(set (match_dup 2)
+	(mult:V_FP_1REG
+	  (match_dup 3)
+	  (match_operand:V_FP_1REG 1 "gcn_alu_operand")))
+   (set (match_operand:V_FP_1REG 0 "register_operand")
+	(unspec:V_FP_1REG
+	  [(match_dup 2)]
+	  MATH_UNOP_TRIG))]
+  "flag_unsafe_math_optimizations"
+  {
+    operands[2] = gen_reg_rtx (<MODE>mode);
+    operands[3] =
+	gcn_vec_constant (<MODE>mode,
+			  const_double_from_real_value (gcn_dconst1over2pi (),
+							<SCALAR_MODE>mode));
+  })
+
+; Implement ldexp pattern
+
+(define_insn "ldexp<mode>3"
+  [(set (match_operand:FP 0 "register_operand"  "=v")
+	(unspec:FP
+	  [(match_operand:FP 1 "gcn_alu_operand" "vB")
+	   (match_operand:SI 2 "gcn_alu_operand" "vSvA")]
+	  UNSPEC_LDEXP))]
+  ""
+  "v_ldexp%i0\t%0, %1, %2"
+  [(set_attr "type" "vop3a")
+   (set_attr "length" "8")])
+
+(define_insn "ldexp<mode>3<exec>"
+  [(set (match_operand:V_FP 0 "register_operand"  "=v")
+	(unspec:V_FP
+	  [(match_operand:V_FP 1 "gcn_alu_operand" "vB")
+	   (match_operand:V64SI 2 "gcn_alu_operand" "vSvA")]
+	  UNSPEC_LDEXP))]
+  ""
+  "v_ldexp%i0\t%0, %1, %2"
+  [(set_attr "type" "vop3a")
+   (set_attr "length" "8")])
+
+; Implement frexp patterns
+
+(define_insn "frexp<mode>_exp2"
+  [(set (match_operand:SI 0 "register_operand" "=v")
+	(unspec:SI
+	  [(match_operand:FP 1 "gcn_alu_operand" "vB")]
+	  UNSPEC_FREXP_EXP))]
+  ""
+  "v_frexp_exp_i32%i1\t%0, %1"
+  [(set_attr "type" "vop1")
+   (set_attr "length" "8")])
+
+(define_insn "frexp<mode>_mant2"
+  [(set (match_operand:FP 0 "register_operand" "=v")
+	(unspec:FP
+	  [(match_operand:FP 1 "gcn_alu_operand" "vB")]
+	  UNSPEC_FREXP_MANT))]
+  ""
+  "v_frexp_mant%i1\t%0, %1"
+  [(set_attr "type" "vop1")
+   (set_attr "length" "8")])
+
+(define_insn "frexp<mode>_exp2<exec>"
+  [(set (match_operand:V64SI 0 "register_operand" "=v")
+	(unspec:V64SI
+	  [(match_operand:V_FP 1 "gcn_alu_operand" "vB")]
+	  UNSPEC_FREXP_EXP))]
+  ""
+  "v_frexp_exp_i32%i1\t%0, %1"
+  [(set_attr "type" "vop1")
+   (set_attr "length" "8")])
+
+(define_insn "frexp<mode>_mant2<exec>"
+  [(set (match_operand:V_FP 0 "register_operand" "=v")
+	(unspec:V_FP
+	  [(match_operand:V_FP 1 "gcn_alu_operand" "vB")]
+	  UNSPEC_FREXP_MANT))]
+  ""
+  "v_frexp_mant%i1\t%0, %1"
+  [(set_attr "type" "vop1")
+   (set_attr "length" "8")])
+
 ;; }}}
 ;; {{{ FP fused multiply and add
 
diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc
index 82667556512..eb822e20dd1 100644
--- a/gcc/config/gcn/gcn.cc
+++ b/gcc/config/gcn/gcn.cc
@@ -779,12 +779,20 @@ init_ext_gcn_constants (void)
   /* FIXME: this constant probably does not match what hardware really loads.
      Reality check it eventually.  */
   real_from_string (&dconst1over2pi,
-		    "0.1591549430918953357663423455968866839");
+		    "0.15915494309189532");
   real_convert (&dconst1over2pi, SFmode, &dconst1over2pi);
 
   ext_gcn_constants_init = 1;
 }
 
+REAL_VALUE_TYPE
+gcn_dconst1over2pi (void)
+{
+  if (!ext_gcn_constants_init)
+    init_ext_gcn_constants ();
+  return dconst1over2pi;
+}
+
 /* Return non-zero if X is a constant that can appear as an inline operand.
    This is 0, 0.5, -0.5, 1, -1, 2, -2, 4,-4, 1/(2*pi)
    Or a vector of those.
@@ -3605,6 +3613,7 @@ enum gcn_builtin_type_index
   GCN_BTI_SF,
   GCN_BTI_V64SI,
   GCN_BTI_V64SF,
+  GCN_BTI_V64DF,
   GCN_BTI_V64PTR,
   GCN_BTI_SIPTR,
   GCN_BTI_SFPTR,
@@ -3621,6 +3630,7 @@ static GTY(()) tree gcn_builtin_types[GCN_BTI_MAX];
 #define sf_type_node (gcn_builtin_types[GCN_BTI_SF])
 #define v64si_type_node (gcn_builtin_types[GCN_BTI_V64SI])
 #define v64sf_type_node (gcn_builtin_types[GCN_BTI_V64SF])
+#define v64df_type_node (gcn_builtin_types[GCN_BTI_V64DF])
 #define v64ptr_type_node (gcn_builtin_types[GCN_BTI_V64PTR])
 #define siptr_type_node (gcn_builtin_types[GCN_BTI_SIPTR])
 #define sfptr_type_node (gcn_builtin_types[GCN_BTI_SFPTR])
@@ -3710,6 +3720,7 @@ gcn_init_builtin_types (void)
   sf_type_node = float32_type_node;
   v64si_type_node = build_vector_type (intSI_type_node, 64);
   v64sf_type_node = build_vector_type (float_type_node, 64);
+  v64df_type_node = build_vector_type (double_type_node, 64);
   v64ptr_type_node = build_vector_type (unsigned_intDI_type_node
 					/*build_pointer_type
 					  (integer_type_node) */
@@ -3977,6 +3988,105 @@ gcn_expand_builtin_1 (tree exp, rtx target, rtx /*subtarget */ ,
 	emit_insn (gen_sqrtsf2 (target, arg));
 	return target;
       }
+    case GCN_BUILTIN_FABSVF:
+      {
+	if (ignore)
+	  return target;
+	rtx exec = gcn_full_exec_reg ();
+	rtx arg = force_reg (V64SFmode,
+			     expand_expr (CALL_EXPR_ARG (exp, 0), NULL_RTX,
+					  V64SFmode,
+					  EXPAND_NORMAL));
+	emit_insn (gen_absv64sf2_exec
+		   (target, arg, gcn_gen_undef (V64SFmode), exec));
+	return target;
+      }
+    case GCN_BUILTIN_LDEXPVF:
+      {
+	if (ignore)
+	  return target;
+	rtx exec = gcn_full_exec_reg ();
+	rtx arg1 = force_reg (V64SFmode,
+			      expand_expr (CALL_EXPR_ARG (exp, 0), NULL_RTX,
+					   V64SFmode,
+					   EXPAND_NORMAL));
+	rtx arg2 = force_reg (V64SImode,
+			      expand_expr (CALL_EXPR_ARG (exp, 1), NULL_RTX,
+					   V64SImode,
+					   EXPAND_NORMAL));
+	emit_insn (gen_ldexpv64sf3_exec
+		   (target, arg1, arg2, gcn_gen_undef (V64SFmode), exec));
+	return target;
+      }
+    case GCN_BUILTIN_LDEXPV:
+      {
+	if (ignore)
+	  return target;
+	rtx exec = gcn_full_exec_reg ();
+	rtx arg1 = force_reg (V64DFmode,
+			      expand_expr (CALL_EXPR_ARG (exp, 0), NULL_RTX,
+					   V64DFmode,
+					   EXPAND_NORMAL));
+	rtx arg2 = force_reg (V64SImode,
+			      expand_expr (CALL_EXPR_ARG (exp, 1), NULL_RTX,
+					   V64SImode,
+					   EXPAND_NORMAL));
+	emit_insn (gen_ldexpv64df3_exec
+		   (target, arg1, arg2, gcn_gen_undef (V64DFmode), exec));
+	return target;
+      }
+    case GCN_BUILTIN_FREXPVF_EXP:
+      {
+	if (ignore)
+	  return target;
+	rtx exec = gcn_full_exec_reg ();
+	rtx arg = force_reg (V64SFmode,
+			     expand_expr (CALL_EXPR_ARG (exp, 0), NULL_RTX,
+					  V64SFmode,
+					  EXPAND_NORMAL));
+	emit_insn (gen_frexpv64sf_exp2_exec
+		   (target, arg, gcn_gen_undef (V64SImode), exec));
+	return target;
+      }
+    case GCN_BUILTIN_FREXPVF_MANT:
+      {
+	if (ignore)
+	  return target;
+	rtx exec = gcn_full_exec_reg ();
+	rtx arg = force_reg (V64SFmode,
+			     expand_expr (CALL_EXPR_ARG (exp, 0), NULL_RTX,
+					  V64SFmode,
+					  EXPAND_NORMAL));
+	emit_insn (gen_frexpv64sf_mant2_exec
+		   (target, arg, gcn_gen_undef (V64SFmode), exec));
+	return target;
+      }
+    case GCN_BUILTIN_FREXPV_EXP:
+      {
+	if (ignore)
+	  return target;
+	rtx exec = gcn_full_exec_reg ();
+	rtx arg = force_reg (V64DFmode,
+			     expand_expr (CALL_EXPR_ARG (exp, 0), NULL_RTX,
+					  V64DFmode,
+					  EXPAND_NORMAL));
+	emit_insn (gen_frexpv64df_exp2_exec
+		   (target, arg, gcn_gen_undef (V64SImode), exec));
+	return target;
+      }
+    case GCN_BUILTIN_FREXPV_MANT:
+      {
+	if (ignore)
+	  return target;
+	rtx exec = gcn_full_exec_reg ();
+	rtx arg = force_reg (V64DFmode,
+			     expand_expr (CALL_EXPR_ARG (exp, 0), NULL_RTX,
+					  V64DFmode,
+					  EXPAND_NORMAL));
+	emit_insn (gen_frexpv64df_mant2_exec
+		   (target, arg, gcn_gen_undef (V64DFmode), exec));
+	return target;
+      }
     case GCN_BUILTIN_OMP_DIM_SIZE:
       {
 	if (ignore)
@@ -6476,7 +6586,7 @@ print_operand (FILE *file, rtx x, int code)
 	      str = "-4.0";
 	      break;
 	    case 248:
-	      str = "1/pi";
+	      str = "0.15915494";
 	      break;
 	    default:
 	      rtx ix = simplify_gen_subreg (GET_MODE (x) == DFmode
diff --git a/gcc/config/gcn/gcn.md b/gcc/config/gcn/gcn.md
index 7805e867901..a3c9523cd6d 100644
--- a/gcc/config/gcn/gcn.md
+++ b/gcc/config/gcn/gcn.md
@@ -82,7 +82,9 @@
   UNSPEC_GATHER
   UNSPEC_SCATTER
   UNSPEC_RCP
-  UNSPEC_FLBIT_INT])
+  UNSPEC_FLBIT_INT
+  UNSPEC_FLOOR UNSPEC_CEIL UNSPEC_SIN UNSPEC_COS UNSPEC_EXP2 UNSPEC_LOG2
+  UNSPEC_LDEXP UNSPEC_FREXP_EXP UNSPEC_FREXP_MANT])
 
 ;; }}}
 ;; {{{ Attributes
-- 
2.30.0.335.ge636282



* Re: [PATCH] amdgcn: Add support for additional natively supported floating-point operations
  2022-09-08 20:38 [PATCH] amdgcn: Add support for additional natively supported floating-point operations Kwok Cheung Yeung
@ 2022-09-09  8:10 ` Andrew Stubbs
  2022-09-09  9:15   ` Tobias Burnus
  2022-09-09 17:57 ` Joseph Myers
  1 sibling, 1 reply; 8+ messages in thread
From: Andrew Stubbs @ 2022-09-09  8:10 UTC (permalink / raw)
  To: Kwok Cheung Yeung, gcc-patches

On 08/09/2022 21:38, Kwok Cheung Yeung wrote:
> Hello
> 
> This patch adds support for some additional floating-point operations, 
> in scalar and vector modes, which are natively supported by the AMD GCN 
> instruction set, but haven't been implemented in GCC yet. With the 
> exception of frexp, these implement standard RTL names, and should be 
> utilised automatically by GCC.
> 
> The instructions for the transcendental functions are documented to have 
> limited numerical precision, so they are only used if 
> unsafe_math_optimizations are enabled for now.
> 
> The sin and cos instructions for some reason are scaled by 2*PI radians 
> (i.e. 1.0 == 2*PI radians/360 degrees), so their inputs need to be 
> scaled by 1/(2*PI) first. I've implemented this as an expander to two 
> instructions - one to do the pre-scaling, one to do the sin/cos. 
> 1/(2*PI) is a builtin constant for GCN, but the syntax to use it in the 
> LLVM assembler was wrong - now fixed.
> 
> I have also added some extra GCN-specific builtins to access the vector 
> versions of some of these operations (to implement vectorized versions 
> of library math routines) and to access the frexp operations.
> 
> Okay for trunk?

LGTM. I'm assuming you've checked the maths. :)

Andrew



* Re: [PATCH] amdgcn: Add support for additional natively supported floating-point operations
  2022-09-09  8:10 ` Andrew Stubbs
@ 2022-09-09  9:15   ` Tobias Burnus
  2022-09-09 10:16     ` Richard Biener
  0 siblings, 1 reply; 8+ messages in thread
From: Tobias Burnus @ 2022-09-09  9:15 UTC (permalink / raw)
  To: Andrew Stubbs, Kwok Cheung Yeung, gcc-patches, Richard Biener


[-- Attachment #1.1: Type: text/plain, Size: 1880 bytes --]

On 09.09.22 10:10, Andrew Stubbs wrote:
> On 08.09.22 22:38, Kwok Cheung Yeung wrote:
>> The instructions for the transcendental functions are documented to have
>> limited numerical precision, so they are only used if
>> unsafe_math_optimizations are enabled for now.

-funsafe-math-optimizations implies -fno-signed-zeros, -fno-trapping-math, -fassociative-math,
and -freciprocal-math. All of them reduce precision and may violate IEEE or ISO/language standards.

However, I think it is rather surprising to suddenly have a precision of only the
order of 100,000,000 ULP instead of the ~4 ULP one would expect. That's a precision loss of the
order of 10^8 or 2^29, which is huge!
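
To make the size of that loss concrete, here is a small sketch (an illustration under assumptions, not a measurement of the hardware). It models a "double" result that only carries a single-precision fraction by rounding through float, and counts the distance in double ULPs. A double has 52 fraction bits and a float has 23, so the gap is 52 - 23 = 29 bits; with round-to-nearest the error is at most 2^28 ULP. The function names are hypothetical:

```c
#include <stdint.h>
#include <string.h>

/* Distance between two positive finite doubles, counted in double ULPs.
   For same-sign finite values the bit patterns are ordered like the
   values, so the integer difference is the ULP count.  */
static inline uint64_t
ulp_distance (double a, double b)
{
  uint64_t ia, ib;
  memcpy (&ia, &a, sizeof ia);
  memcpy (&ib, &b, sizeof ib);
  return ia > ib ? ia - ib : ib - ia;
}

/* Model a result whose fraction only has single-precision accuracy.  */
static inline double
round_to_float_precision (double x)
{
  return (double) (float) x;
}
```

For example, taking 1.4142135623730951 (sqrt(2) rounded to double) and passing it through `round_to_float_precision` leaves it roughly 10^8 double ULPs away from the original value.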

For programs deliberately using double precision, it can be too much, even if they do not need
double precision in reality. (Weather forecast systems recently moved to single precision as the
quality is similar and the benefits of faster results/finer grids or longer forecast times prevail.)

As this behavior is highly surprising, I think it should be at least documented.

In https://gcc.gnu.org/PR105246 , I suggested a new flag (such as -mpermit-reduced-precision) to
make it possible to turn it on/off explicitly (it might still be enabled by -funsafe-math-optimizations);
alternatively, the result could be used as an initial guess which is then refined
in some iteration steps. (Both could also be combined to give the user the choice.)
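
The "initial guess plus refinement" idea can be sketched as a single Newton-Raphson step for the square root. This is only an illustration of the general technique; neither patch in this thread implements it, and the function names are hypothetical:

```c
/* One Newton-Raphson step for y ~= sqrt(x):  y' = (y + x / y) / 2.
   The relative error is roughly squared (and halved) by each step, so a
   guess that is good to single precision becomes good to roughly double
   precision after one iteration.  */
static inline double
sqrt_refine (double x, double y)
{
  return 0.5 * (y + x / y);
}

/* Absolute difference, written out to avoid depending on libm.  */
static inline double
abs_diff (double a, double b)
{
  return a > b ? a - b : b - a;
}
```

Starting from sqrt(2) rounded to single precision (error about 2.4e-8), one call to `sqrt_refine (2.0, guess)` brings the error down to the order of 1e-16. The cost is one division, one addition and one multiplication per step.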

While still being convinced that a flag makes more sense than just documenting it,
I have nonetheless attached a documentation attempt.

Thoughts?

Tobias


-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955

[-- Attachment #2: gcn-unsafe-math.diff --]
[-- Type: text/x-patch, Size: 904 bytes --]

GCN: Document in invoke.texi reduced precs with -funsafe-math-opt

gcc/ChangeLog:

	* doc/invoke.texi (AMD GCN Options): Document that
	-funsafe-math-optimizations implies single-precision
	results for some math intrinsics.

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 5c066219a7d..01011bd9f9b 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -20139,8 +20139,14 @@ purpose.  The default is @option{-m1reg-none}.
 @cindex AMD GCN Options
 
 These options are defined specifically for the AMD GCN port.
 
+Note that the @option{-funsafe-math-optimizations} option implies that
+for 64-bit floating-point numbers, the following operations yield results
+with only 23 bits instead of 52 bits for the fractional part of the
+floating-point number: @code{sqrt}, @code{exp2}, @code{log2}, @code{sin}
+and @code{cos}.
+
 @table @gcctabopt
 
 @item -march=@var{gpu}
 @opindex march


* Re: [PATCH] amdgcn: Add support for additional natively supported floating-point operations
  2022-09-09  9:15   ` Tobias Burnus
@ 2022-09-09 10:16     ` Richard Biener
  2022-09-09 12:20       ` GCN: Add -mlow-precision-sqrt for double-precision sqrt [PR105246] (was: Re: [PATCH] amdgcn: Add support for additional natively supported floating-point operations) Tobias Burnus
  2022-09-09 12:32       ` [PATCH] amdgcn: Add support for additional natively supported floating-point operations Stubbs, Andrew
  0 siblings, 2 replies; 8+ messages in thread
From: Richard Biener @ 2022-09-09 10:16 UTC (permalink / raw)
  To: Tobias Burnus; +Cc: Andrew Stubbs, Kwok Cheung Yeung, gcc-patches

On Fri, 9 Sep 2022, Tobias Burnus wrote:

> On 09.09.22 10:10, Andrew Stubbs wrote:
> On 08.09.22 22:38, Kwok Cheung Yeung wrote:
> The instructions for the transcendental functions are documented to have
> limited numerical precision, so they are only used if
> unsafe_math_optimizations are enabled for now.
> 
> -funsafe-math-optimizations implies -fno-signed-zeros, -fno-trapping-math,
> -fassociative-math,
> and -freciprocal-math. All of them reduce precision and may violate IEEE or
> ISO/language standards.
> 
> However, I think it is rather surprising to suddenly have a precision of
> only the order of 100,000,000 ULP instead of the ~4 ULP one would expect.
> That's a precision loss of the order of 10^8 or 2^29, which is huge!
> 
> For programs deliberately using double precision, it can be too much, even
> if they do not need double precision in reality. (Weather forecast systems
> recently moved to single precision as the quality is similar and the
> benefits of faster results/finer grids or longer forecast times prevail.)
> 
> As this behavior is highly surprising, I think it should be at least
> documented.
> 
> In https://gcc.gnu.org/PR105246 , I suggested a new flag (such as
> -mpermit-reduced-precision) to make it possible to turn it on/off
> explicitly (it might still be enabled by -funsafe-math-optimizations);
> alternatively, the result could be used as an initial guess which is then
> refined in some iteration steps. (Both could also be combined to give the
> user the choice.)
> 
> While still being convinced that a flag makes more sense than just documenting
> it,
> I have nonetheless attached a documentation attempt.
> 
> Thoughts?

I agree - for example powerpc has -mrecip= to control which instructions
to use (float/double rsqrt or inverse) and -mrecip-precision to
specify whether further iteration is done or not.

x86 has something similar but always performs Newton-Raphson iteration,
documenting 2 ulp instead of 0.5 ulp precision.

Your suggested huge reduction in precision isn't usually acceptable
and should always be explicitly enabled.

Richard.


* GCN: Add -mlow-precision-sqrt for double-precision sqrt [PR105246] (was: Re: [PATCH] amdgcn: Add support for additional natively supported floating-point operations)
  2022-09-09 10:16     ` Richard Biener
@ 2022-09-09 12:20       ` Tobias Burnus
  2022-09-09 12:40         ` Andrew Stubbs
  2022-09-09 12:32       ` [PATCH] amdgcn: Add support for additional natively supported floating-point operations Stubbs, Andrew
  1 sibling, 1 reply; 8+ messages in thread
From: Tobias Burnus @ 2022-09-09 12:20 UTC (permalink / raw)
  To: Andrew Stubbs; +Cc: Kwok Cheung Yeung, gcc-patches, Richard Biener

[-- Attachment #1: Type: text/plain, Size: 2031 bytes --]

On 09.09.22 12:16, Richard Biener wrote:
> On Fri, 9 Sep 2022, Tobias Burnus wrote:
>> -funsafe-math-optimizations implies -fno-signed-zeros, -fno-trapping-math,
>> -fassociative-math,
>> and -freciprocal-math. All of them reduce precision and may violate IEEE or
>> ISO/language standards.
>>
>> However, I think it is rather surprising to suddenly have a precision of
>> only the order of 100,000,000 ULP instead of the ~4 ULP one would expect.
>> That's a precision loss of the order of 10^8 or 2^29, which is huge!
>> ...
> I agree - for example powerpc has -mrecip= to control which instructions
> to use (float/double rsqrt or inverse) and -mrecip-precision to
> specify whether further iteration is done or not.
> [...]
> Your suggested huge reduction in precision isn't usually acceptable
> and should always be explicitly enabled.

First, I have to correct myself: Kwok's use of -funsafe-math-optimizations is
only about single-precision functions, which do not have this problem.

However, the pre-existing 'sqrt' problem is still real. It also applies
to reciprocal sqrt ("v_rsq"), but for whatever reason that is not used for GCN.

This patch now adds a command-line flag, off by default, to choose
whether this behavior is wanted. I used the same name as aarch64,
https://gcc.gnu.org/onlinedocs/gcc/AArch64-Options.html#index-mlow-precision-sqrt
(the latter also has -mlow-precision-recip-sqrt, which is not (yet)
sensible for GCN.)
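
The resulting decision can be summarised with a small stand-alone C model. This is only a sketch of the insn conditions in the attached gcn-valu.md change; the enum, struct and function names are hypothetical and do not exist in GCC:

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the machine mode and the two option flags.  */
enum fp_mode { MODE_SF, MODE_DF };

struct math_flags
{
  bool unsafe_math_optimizations;  /* -funsafe-math-optimizations */
  bool low_precision_sqrt;         /* -mlow-precision-sqrt */
};

/* Model of the new condition: the hardware sqrt is used only with unsafe
   math, and for double precision only if the new flag is given too.  */
static inline bool
use_hw_sqrt (enum fp_mode mode, struct math_flags flags)
{
  return flags.unsafe_math_optimizations
	 && (mode == MODE_SF || flags.low_precision_sqrt);
}
```

In this model, -mlow-precision-sqrt on its own never enables the instruction, which matches the note in the documentation that the option only has an effect together with -funsafe-math-optimizations.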

This patch was manually tested for all combinations and I also looked at
insn-recog.cc, given that it is my first .md patch; it seems to work
fine.

OK for mainline, or are there comments or more suggestions? I also
included some words for the release notes.

Tobias

[-- Attachment #2: gcn-low-prec-sqrt.diff --]
[-- Type: text/x-patch, Size: 3550 bytes --]

GCN: Add -mlow-precision-sqrt for double-precision sqrt [PR105246]

GCN's sqrt supports single and double precision; however, for either
the result has 23 bits for the fractional part of the floating-point
number. (For double precision: instead of 52 bits).

This now adds -mlow-precision-sqrt, using the same naming as aarch64.
Before, the hardware builtin sqrt was always used with
unsafe-math-optimizations; now it is only used for single precision. For
double precision, the new -mlow-precision-sqrt is explicitly
required in addition. As there is no rsqrt, this flag likewise
applies to 1/sqrt.

	PR target/105246

gcc/ChangeLog:

	* config/gcn/gcn.opt (mlow-precision-sqrt): New, off by default.
	* config/gcn/gcn-valu.md (sqrt, v_sqrt): Require it unless SFmode.
	* doc/invoke.texi (GCN): Add -mlow-precision-sqrt entry.

 gcc/config/gcn/gcn-valu.md |  6 ++++--
 gcc/config/gcn/gcn.opt     |  7 +++++++
 gcc/doc/invoke.texi        | 11 +++++++++++
 3 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/gcc/config/gcn/gcn-valu.md b/gcc/config/gcn/gcn-valu.md
index 8c33ae0c717..c7a0b562874 100644
--- a/gcc/config/gcn/gcn-valu.md
+++ b/gcc/config/gcn/gcn-valu.md
@@ -2276,7 +2276,8 @@ (define_insn "sqrt<mode>2<exec>"
   [(set (match_operand:V_FP 0 "register_operand"  "=  v")
 	(sqrt:V_FP
 	  (match_operand:V_FP 1 "gcn_alu_operand" "vSvB")))]
-  "flag_unsafe_math_optimizations"
+  "(flag_unsafe_math_optimizations
+    && (<MODE>mode == V64SFmode || flag_mlow_precision_sqrt))"
   "v_sqrt%i0\t%0, %1"
   [(set_attr "type" "vop1")
    (set_attr "length" "8")])
@@ -2285,7 +2286,8 @@ (define_insn "sqrt<mode>2"
   [(set (match_operand:FP 0 "register_operand"  "=  v")
 	(sqrt:FP
 	  (match_operand:FP 1 "gcn_alu_operand" "vSvB")))]
-  "flag_unsafe_math_optimizations"
+  "(flag_unsafe_math_optimizations
+    && (<MODE>mode == SFmode || flag_mlow_precision_sqrt))"
   "v_sqrt%i0\t%0, %1"
   [(set_attr "type" "vop1")
    (set_attr "length" "8")])
diff --git a/gcc/config/gcn/gcn.opt b/gcc/config/gcn/gcn.opt
index 9606aaf0b1a..a3f341f7eb1 100644
--- a/gcc/config/gcn/gcn.opt
+++ b/gcc/config/gcn/gcn.opt
@@ -77,6 +77,13 @@ mgang-private-size=
 Target RejectNegative Joined UInteger Var(gang_private_size_opt) Init(-1)
 Amount of local data-share (LDS) memory to reserve for gang-private variables.
 
+mlow-precision-sqrt
+Target Var(flag_mlow_precision_sqrt) Optimization
+Enable the square root approximation for 64bit double precision;
+this reduces precision of square root results to 23 bits for the
+fractional part of the floating-point number.
+It also implies low-precision reciprocal sqrt.
+
 Wopenacc-dims
 Target Var(warn_openacc_dims) Warning
 Warn about invalid OpenACC dimensions.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 5c066219a7d..fdd6e41cade 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -20192,6 +20192,17 @@ compiled code must match the device mode.  The default is @samp{-mno-xnack}.
 At present this option is a placeholder for support that is not yet
 implemented.
 
+@item -mlow-precision-sqrt
+@itemx -mno-low-precision-sqrt
+@opindex mlow-precision-sqrt
+@opindex mno-low-precision-sqrt
+Enable the square root approximation for 64bit double precision;
+this reduces precision of square root results to 23 bits for the
+fractional part of the floating-point number (2@sup{29} ULP).
+It also implies low-precision reciprocal sqrt.
+This option only has an effect if @option{-ffast-math} or
+@option{-funsafe-math-optimizations} is used as well.
+
 @end table
 
 @node ARC Options

[-- Attachment #3: www-gcn-low-prec.diff --]
[-- Type: text/x-patch, Size: 901 bytes --]

gcc-13/changes.html - GCN: document -mlow-precision-sqrt

diff --git a/htdocs/gcc-13/changes.html b/htdocs/gcc-13/changes.html
index 390193ca..d335eab3 100644
--- a/htdocs/gcc-13/changes.html
+++ b/htdocs/gcc-13/changes.html
@@ -179,6 +179,12 @@ a work-in-progress.</p>
   <li>Support for the Instinct MI200 series devices (<a
       href="https://gcc.gnu.org/onlinedocs/gcc/AMD-GCN-Options.html">
       <code>gfx90a</code></a>) has been added.</li>
+  <li>The <code>-mlow-precision-sqrt</code> option (disabled by default)
+      has been added to use the hardware <code>sqrt</code> also for
+      double-precision floating-point arguments; note that the result
+      has a much reduced accuracy of 2<sup>29</sup> ULP. This
+      option additionally requires <code>-funsafe-math-optimizations</code>
+      (implied by <code>-ffast-math</code>).</li>
 </ul>
 
 <!-- <h3 id="arc">ARC</h3> -->


* RE: [PATCH] amdgcn: Add support for additional natively supported floating-point operations
  2022-09-09 10:16     ` Richard Biener
  2022-09-09 12:20       ` GCN: Add -mlow-precision-sqrt for double-precision sqrt [PR105246] (was: Re: [PATCH] amdgcn: Add support for additional natively supported floating-point operations) Tobias Burnus
@ 2022-09-09 12:32       ` Stubbs, Andrew
  1 sibling, 0 replies; 8+ messages in thread
From: Stubbs, Andrew @ 2022-09-09 12:32 UTC (permalink / raw)
  To: Richard Biener, Burnus, Tobias; +Cc: Yeung, Kwok, gcc-patches

> -----Original Message-----
> I agree - for example powerpc has -mrecip= to control which instructions
> to use (float/double rsqrt or inverse) and -mrecip-precision to
> specify whether further iteration is done or not.
> 
> x86 has similar but does always perform newton raphson iteration,
> documenting 2 ulp instead of 0.5 ulp precision.
> 
> Your suggested huge reduction in precision isn't usually acceptable
> and should always be explicitly enabled.

There isn't a problem with *this* patch (although we do have existing accuracy issues thanks to previous documents lacking the information).

The "inaccurate" instructions are single-precision only, and therefore acceptable with -ffast-math.

Kwok intends to provide vectorized library calls for the double-precision and -fno-fast-math cases.

In general I want to avoid adding extra arch-specific options; partly because approximately no one will use them, and partly because the amdgcn compiler is almost always hidden behind an x86_64 compiler.

Andrew


* Re: GCN: Add -mlow-precision-sqrt for double-precision sqrt [PR105246] (was: Re: [PATCH] amdgcn: Add support for additional natively supported floating-point operations)
  2022-09-09 12:20       ` GCN: Add -mlow-precision-sqrt for double-precision sqrt [PR105246] (was: Re: [PATCH] amdgcn: Add support for additional natively supported floating-point operations) Tobias Burnus
@ 2022-09-09 12:40         ` Andrew Stubbs
  0 siblings, 0 replies; 8+ messages in thread
From: Andrew Stubbs @ 2022-09-09 12:40 UTC (permalink / raw)
  To: Tobias Burnus; +Cc: Kwok Cheung Yeung, gcc-patches, Richard Biener

On 09/09/2022 13:20, Tobias Burnus wrote:
> However, the pre-existing 'sqrt' problem is still real. It also applies 
> to reciprocal sqrt ("v_rsq"), but that's for whatever reason not used for GCN.
> 
> This patch now adds a commandline flag - off by default - to choose 
> whether this behavior is wanted. I did use the same name as aarch64, 
> https://gcc.gnu.org/onlinedocs/gcc/AArch64-Options.html#index-mlow-precision-sqrt (the latter also has -mlow-precision-recip-sqrt, which is not (yet) sensible for GCN.)
> 
> This patch was manually tested for all combinations and I also looked at 
> insn-recog.cc, given that it is my first .md patch – it seems to work 
> fine.
> 
> OK for mainline – or are there comments or more suggestions? I also 
> included some words for the release notes.

No, thank you.

I don't see any value in adding an option no one cares about (but we 
still have to maintain and test).

I think it will make sense to drop the double-precision insn definition 
and fall back to libm in that case.

Kwok is currently reviewing all the libm functions and can probably 
include this one.

Andrew


* Re: [PATCH] amdgcn: Add support for additional natively supported floating-point operations
  2022-09-08 20:38 [PATCH] amdgcn: Add support for additional natively supported floating-point operations Kwok Cheung Yeung
  2022-09-09  8:10 ` Andrew Stubbs
@ 2022-09-09 17:57 ` Joseph Myers
  1 sibling, 0 replies; 8+ messages in thread
From: Joseph Myers @ 2022-09-09 17:57 UTC (permalink / raw)
  To: Kwok Cheung Yeung; +Cc: gcc-patches, Andrew Stubbs

On Thu, 8 Sep 2022, Kwok Cheung Yeung wrote:

> The sin and cos instructions for some reason are scaled by 2*PI radians (i.e.
> 1.0 == 2*PI radians/360 degrees), so their inputs need to be scaled by
> 1/(2*PI) first. I've implemented this as an expander to two instructions - one

C2x has sinpi and cospi function families for sin(pi*x) and cos(pi*x); 
adding built-in functions / insn patterns for those functions would then 
allow those instructions to be used for those functions with scaling by 
1/2.

-- 
Joseph S. Myers
joseph@codesourcery.com


end of thread, other threads:[~2022-09-09 17:57 UTC | newest]

Thread overview: 8+ messages
2022-09-08 20:38 [PATCH] amdgcn: Add support for additional natively supported floating-point operations Kwok Cheung Yeung
2022-09-09  8:10 ` Andrew Stubbs
2022-09-09  9:15   ` Tobias Burnus
2022-09-09 10:16     ` Richard Biener
2022-09-09 12:20       ` GCN: Add -mlow-precision-sqrt for double-precision sqrt [PR105246] (was: Re: [PATCH] amdgcn: Add support for additional natively supported floating-point operations) Tobias Burnus
2022-09-09 12:40         ` Andrew Stubbs
2022-09-09 12:32       ` [PATCH] amdgcn: Add support for additional natively supported floating-point operations Stubbs, Andrew
2022-09-09 17:57 ` Joseph Myers
