public inbox for gcc-patches@gcc.gnu.org
From: Tamar Christina <Tamar.Christina@arm.com>
To: GCC Patches <gcc-patches@gcc.gnu.org>,
	"jakub@redhat.com"	<jakub@redhat.com>,
	"rguenther@suse.de" <rguenther@suse.de>,
	"law@redhat.com"	<law@redhat.com>
Cc: nd <nd@arm.com>
Subject: [PATCH] Optimise the fpclassify builtin to perform integer operations when possible
Date: Mon, 12 Sep 2016 16:21:00 -0000
Message-ID: <VI1PR0801MB2031BC0C70CCAD966A9B2933FFFF0@VI1PR0801MB2031.eurprd08.prod.outlook.com>

Hi All,

This patch adds an optimized route to the fpclassify builtin
for floating point numbers which are similar to IEEE-754 in format.

The goal is to make it faster by:
1. Trying to determine the most common case first
   (e.g. the float is a Normal number) and then the
   rest. The amount of code generated at -O2 is
   about the same (+/- 1 instruction), but the common
   case now exits first.
2. Using integer operations in the optimized path.

At a high level, the optimized path uses integer operations
to perform the following:

  if (exponent bits aren't all set or unset)
     return Normal;
  else if (no bits are set in the number after masking out
	   the sign bit)
     return Zero;
  else if (exponent has no bits set)
     return Subnormal;
  else if (mantissa has no bits set)
     return Infinite;
  else
     return NaN;

If the optimization can't be applied, the old
implementation is used as a fallback.
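
For illustration, the sequence above could be written in plain C for an
IEEE 754 binary64 double roughly as follows. This is only a sketch and not
the code in the patch: classify_double_bits is a made-up name, the constants
52, 0x7ff and 0x7fe assume the binary64 layout, and the patch builds the
equivalent expression as GENERIC trees instead.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not part of the patch: classify a double using
   only integer operations, assuming the IEEE 754 binary64 layout.  */
static int
classify_double_bits (double x)
{
  uint64_t bits;
  memcpy (&bits, &x, sizeof bits);	/* Well-defined type pun.  */

  /* The 11 exponent bits sit directly below the sign bit.  */
  unsigned exp = (unsigned) (bits >> 52) & 0x7ff;

  /* exp + 1 has a bit set in positions 1..10 unless exp is
     all zeros or all ones, so this catches every normal number.  */
  if ((exp + 1) & 0x7fe)
    return FP_NORMAL;

  /* Nothing set apart from the sign bit: +0.0 or -0.0.  */
  if ((bits & UINT64_C (0x7fffffffffffffff)) == 0)
    return FP_ZERO;

  /* Exponent all zeros with a non-zero mantissa.  */
  if (exp == 0)
    return FP_SUBNORMAL;

  /* Exponent all ones: the mantissa decides between INF and NaN.  */
  if ((bits & UINT64_C (0x000fffffffffffff)) == 0)
    return FP_INFINITE;
  return FP_NAN;
}
```

On a target where double is binary64 this should agree with fpclassify
from <math.h> for every input.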

A limitation of this new approach is that the exponent
of the floating-point type has to fit in 31 bits, and the type
has to have an IEEE-like format and IEEE-like values for NaN and INF
(e.g. for NaN and INF all bits of the exponent must be set).

To determine this IEEE likeness a new boolean, is_binary_ieee_compatible,
was added to real_format.
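
As a rough illustration of how the masks follow from the format
description, the sketch below derives them for float from its width and
precision, in the same way the patch derives them from real_format
(exponent bits = total bits - p, mantissa mask = (b << (p - 2)) - 1 with
b == 2). It assumes float is IEEE 754 binary32 with the sign in the top
bit; all names (F_WIDTH, f_exp_mask, float_exponent_field, ...) are
invented for the example.

```c
#include <float.h>
#include <limits.h>
#include <stdint.h>
#include <string.h>

/* Illustrative constants for float, assuming IEEE 754 binary32.  */
enum
{
  F_WIDTH = sizeof (float) * CHAR_BIT,	/* Total bits in the type.  */
  F_P = FLT_MANT_DIG,			/* Precision, incl. implicit bit.  */
  F_EXP_BITS = F_WIDTH - F_P,		/* Width of the exponent field.  */
  F_EXP_SHIFT = F_P - 1			/* Position of its lowest bit.  */
};

/* All exponent bits set, right-aligned.  */
static const uint32_t f_exp_mask = (UINT32_C (1) << F_EXP_BITS) - 1;

/* The p - 1 stored mantissa bits.  */
static const uint32_t f_mantissa_mask = (UINT32_C (2) << (F_P - 2)) - 1;

/* Everything except the sign bit.  */
static const uint32_t f_not_sign_mask = ~(UINT32_C (1) << (F_WIDTH - 1));

/* Return the raw (biased) exponent field of X.  */
static unsigned
float_exponent_field (float x)
{
  uint32_t bits;
  memcpy (&bits, &x, sizeof bits);
  return (bits >> F_EXP_SHIFT) & f_exp_mask;
}
```

For binary32 this gives an 8-bit exponent field starting at bit 23.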

Regression tests were run on aarch64-none-linux and arm-none-linux-gnueabi
with no regressions. x86 uses its own implementation rather than
the fpclassify builtin.

As an example, AArch64 now generates for classification of doubles:

f:
	fmov	x1, d0
	mov	w0, 7
	sbfx	x2, x1, 52, 11
	add	w3, w2, 1
	tst	w3, 0x07FE
	bne	.L1
	mov	w0, 13
	tst	x1, 0x7fffffffffffffff
	beq	.L1
	mov	w0, 11
	tbz	x2, 0, .L1
	tst	x1, 0xfffffffffffff
	mov	w0, 3
	mov	w1, 5
	csel	w0, w0, w1, ne

.L1:
	ret
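
The message does not show the source of f. The return values in the
assembly (7, 13, 11, 3, 5) are consistent with a call like the one below,
so treat this as a guess at the test case, not the actual one.
__builtin_fpclassify is a GCC builtin (Clang provides it too) whose first
five arguments are the values to return for NaN, infinite, normal,
subnormal and zero, in that order.

```c
/* Presumed source for the assembly above; the original is not shown.
   Requires a compiler that provides __builtin_fpclassify.  */
int
f (double d)
{
  /* Argument order: FP_NAN, FP_INFINITE, FP_NORMAL, FP_SUBNORMAL,
     FP_ZERO, followed by the value to classify.  */
  return __builtin_fpclassify (3, 5, 7, 11, 13, d);
}
```

Read this way, the first branch (bne .L1 with w0 = 7) is the normal-number
exit, and the final csel picks 3 (NaN) or 5 (infinite) depending on
whether any mantissa bit is set.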

No new tests are added, as existing tests already cover the functionality.
glibc benchmarks run against the builtin show a 31.3%
performance gain.

Ok for trunk?

Thanks,
Tamar

PS. I don't have commit rights, so if OK, can someone apply the patch for me?

gcc/
2016-08-25  Tamar Christina  <tamar.christina@arm.com>
	    Wilco Dijkstra  <wilco.dijkstra@arm.com>

	* gcc/builtins.c (fold_builtin_fpclassify): Added optimized version.
	* gcc/real.h (real_format): Added is_binary_ieee_compatible field.
	* gcc/real.c (ieee_single_format): Set is_binary_ieee_compatible flag.
	(mips_single_format): Likewise.
	(motorola_single_format): Likewise.
	(spu_single_format): Likewise.
	(ieee_double_format): Likewise.
	(mips_double_format): Likewise.
	(motorola_double_format): Likewise.
	(ieee_extended_motorola_format): Likewise.
	(ieee_extended_intel_96_format): Likewise.
	(ieee_extended_intel_128_format): Likewise.
	(ieee_extended_intel_96_round_53_format): Likewise.
	(ibm_extended_format): Likewise.
	(mips_extended_format): Likewise.
	(ieee_quad_format): Likewise.
	(mips_quad_format): Likewise.
	(vax_f_format): Likewise.
	(vax_d_format): Likewise.
	(vax_g_format): Likewise.
	(decimal_single_format): Likewise.
	(decimal_double_format): Likewise.
	(decimal_quad_format): Likewise.
	(ieee_half_format): Likewise.
	(arm_half_format): Likewise.
	(real_internal_format): Likewise.

[-- Attachment #2: gcc-public.patch --]
[-- Type: text/x-patch; name=gcc-public.patch, Size: 11013 bytes --]

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 1073e35b17b1bc1f6974c71c940bd9d82bbbfc0f..58bf129f9a0228659fd3b976d38d021d1d5bd6bb 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -7947,10 +7947,8 @@ static tree
 fold_builtin_fpclassify (location_t loc, tree *args, int nargs)
 {
   tree fp_nan, fp_infinite, fp_normal, fp_subnormal, fp_zero,
-    arg, type, res, tmp;
+    arg, type, res;
   machine_mode mode;
-  REAL_VALUE_TYPE r;
-  char buf[128];
 
   /* Verify the required arguments in the original call.  */
   if (nargs != 6
@@ -7970,14 +7968,143 @@ fold_builtin_fpclassify (location_t loc, tree *args, int nargs)
   arg = args[5];
   type = TREE_TYPE (arg);
   mode = TYPE_MODE (type);
-  arg = builtin_save_expr (fold_build1_loc (loc, ABS_EXPR, type, arg));
+  const real_format *format = REAL_MODE_FORMAT (mode);
+
+  /*
+  For IEEE 754 types:
+
+  fpclassify (x) ->
+       !((exp + 1) & (exp_mask & ~1)) // exponent bits all set or all unset
+	 ? ((x & ~sign_mask) == 0 ? FP_ZERO :
+	   ((exp & exp_mask) == exp_mask
+	      ? (mantissa == 0 ? FP_INFINITE : FP_NAN) :
+	      FP_SUBNORMAL)):
+       FP_NORMAL.
+
+  Otherwise
+
+  fpclassify (x) ->
+       isnan (x) ? FP_NAN :
+	(fabs (x) == Inf ? FP_INFINITE :
+	   (fabs (x) >= DBL_MIN ? FP_NORMAL :
+	     (x == 0 ? FP_ZERO : FP_SUBNORMAL))).
+  */
+
+  /* Check if the number that is being classified is close enough to IEEE 754
+     format to be able to go in the early exit code.  */
+  if (format->is_binary_ieee_compatible)
+    {
+      gcc_assert (format->b == 2);
+
+      const tree int_type = integer_type_node;
+      const int exp_bits  = (GET_MODE_SIZE (mode) * BITS_PER_UNIT) - format->p;
+      const int exp_mask  = (1 << exp_bits) - 1;
+
+      tree exp, specials, exp_bitfield,
+	   const_arg0, const_arg1, const0, const1,
+	   not_sign_mask, zero_check, mantissa_mask,
+	   mantissa_any_set, exp_lsb_set, mask_check;
+      tree int_arg_type, int_arg;
+
+      /* Re-interpret the float as a signed integer type
+	 with equal precision.  */
+      int_arg_type = build_nonstandard_integer_type (TYPE_PRECISION (type), 0);
+      int_arg = fold_build1_loc (loc, INDIRECT_REF, int_arg_type,
+		  fold_build1_loc (loc, NOP_EXPR,
+				   build_pointer_type (int_arg_type),
+		    fold_build1_loc (loc, ADDR_EXPR,
+				     build_pointer_type (type), arg)));
+
+      /* Extract exp bits from the float, where we expect the exponent to be.
+	 We create a new type because BIT_FIELD_REF does not allow you to
+	 extract fewer bits than the precision of the storage variable.  */
+      exp_bitfield = fold_build3_loc (loc, BIT_FIELD_REF,
+			build_nonstandard_integer_type (exp_bits, 0), int_arg,
+			build_int_cst (int_type, exp_bits),
+			build_int_cst (int_type, format->p - 1));
+
+      /* Convert the extracted exponent bits to a 32-bit int.
+	 This allows us to continue doing operations as int_type.  */
+      exp = fold_build1_loc (loc, NOP_EXPR, int_type, exp_bitfield);
+
+      /* Set up some often used constants.  */
+      const_arg0 = build_int_cst (int_arg_type, 0);
+      const_arg1 = build_int_cst (int_arg_type, 1);
+      const0 = build_int_cst (int_type, 0);
+      const1 = build_int_cst (int_type, 1);
+
+      /* 1) First check for 0 by masking out the sign bit.
+	 2) Then check whether the exponent has all bits set; if it does,
+	    the number is either NaN or INF, decided by the mantissa.
+	 3) Anything else is a subnormal number.  */
+
+      /* ~(1 << location_sign_bit).
+	 This creates a mask that can be used to mask out the sign bit.  */
+      not_sign_mask = fold_build1_loc (loc, BIT_NOT_EXPR, int_arg_type,
+			fold_build2_loc (loc, LSHIFT_EXPR, int_arg_type,
+			  const_arg1,
+			  build_int_cst (int_arg_type, format->signbit_rw)));
+
+      /* num & not_sign_mask == 0.
+	 This checks to see if the number is zero.  */
+      zero_check = fold_build2_loc (loc, EQ_EXPR, int_type, const_arg0,
+			 fold_build2_loc (loc, BIT_AND_EXPR, int_arg_type,
+			   int_arg, not_sign_mask));
+
+      /* b^(p-1) - 1, computed as (b << (p - 2)) - 1.
+	 This creates a mask to be used to check the mantissa value.  */
+      mantissa_mask = fold_build2_loc (loc, MINUS_EXPR, int_arg_type,
+			 fold_build2_loc (loc, LSHIFT_EXPR, int_arg_type,
+			    build_int_cst (int_arg_type, format->b),
+			    build_int_cst (int_arg_type, format->p - 2)),
+			 const_arg1);
+
+      /* num & mantissa_mask != 0.  */
+      mantissa_any_set = fold_build2_loc (loc, NE_EXPR, int_type, const_arg0,
+			    fold_build2_loc (loc, BIT_AND_EXPR, int_arg_type,
+			      mantissa_mask, int_arg));
+
+      /* (exp & 1) != 0.
+	 This check can be used to check if the exp is all 0 or all 1.
+	 At the point it is used the exp is either all 1 or 0, so checking
+	 one bit is enough to disambiguate between the two.  */
+      exp_lsb_set = fold_build2_loc (loc, NE_EXPR, int_type, const0,
+			    fold_build2_loc (loc, BIT_AND_EXPR, int_type,
+					     exp, const1));
+
+      /* Combine the values together.  */
+      specials = fold_build3_loc (loc, COND_EXPR, int_type, zero_check, fp_zero,
+		   fold_build3_loc (loc, COND_EXPR, int_type, exp_lsb_set,
+		    fold_build3_loc (loc, COND_EXPR, int_type, mantissa_any_set,
+		      HONOR_NANS (mode) ? fp_nan : fp_normal,
+		      HONOR_INFINITIES (mode) ? fp_infinite : fp_normal),
+		    fp_subnormal));
+
+      /* Top-level comparison for the most common case:
+	 check whether it's a normal real.  */
+
+      /* exp_mask & ~1.  */
+      mask_check = fold_build2_loc (loc, BIT_AND_EXPR, int_type,
+			  build_int_cst (int_type, exp_mask),
+			  fold_build1_loc (loc, BIT_NOT_EXPR, int_type,
+					   const1));
+
+      res = fold_build3_loc (loc, COND_EXPR, int_type,
+	       fold_build2_loc (loc, NE_EXPR, int_type, const0,
+		 /* (exp + 1) & mask_check.
+		    Check to see if exp is not all 0 or all 1.  */
+		 fold_build2_loc (loc, BIT_AND_EXPR, int_type,
+		   fold_build2_loc (loc, PLUS_EXPR, int_type, exp, const1),
+		     mask_check)),
+		   fp_normal, specials);
 
-  /* fpclassify(x) ->
-       isnan(x) ? FP_NAN :
-         (fabs(x) == Inf ? FP_INFINITE :
-	   (fabs(x) >= DBL_MIN ? FP_NORMAL :
-	     (x == 0 ? FP_ZERO : FP_SUBNORMAL))).  */
+      return res;
+    }
 
+  REAL_VALUE_TYPE r;
+  tree tmp;
+  char buf[128];
+  arg = builtin_save_expr (fold_build1_loc (loc, ABS_EXPR, type, arg));
   tmp = fold_build2_loc (loc, EQ_EXPR, integer_type_node, arg,
 		     build_real (type, dconst0));
   res = fold_build3_loc (loc, COND_EXPR, integer_type_node,
diff --git a/gcc/real.h b/gcc/real.h
index 59af580e78f2637be84f71b98b45ec6611053222..36ded57cf4db7c30c935bdb24219a167480f39c8 100644
--- a/gcc/real.h
+++ b/gcc/real.h
@@ -161,6 +161,15 @@ struct real_format
   bool has_signed_zero;
   bool qnan_msb_set;
   bool canonical_nan_lsbs_set;
+
+  /* This flag indicates whether the format can be used in the optimized
+     code paths for the __builtin_fpclassify function and friends.
+     The format has to have the same NaN and INF representation as normal
+     IEEE floats (e.g. exp must have all bits set), most significant bit must be
+     sign bit, followed by exp bits of at most 32 bits.  Lastly the floating
+     point number must be representable as an integer.  The base of the number
+     also must be base 2.  */
+  bool is_binary_ieee_compatible;
   const char *name;
 };
 
diff --git a/gcc/real.c b/gcc/real.c
index 66e88e2ad366f7848609d157074c80420d778bcf..a9ad63072b5d5803eb048d30af5546e0b458f857 100644
--- a/gcc/real.c
+++ b/gcc/real.c
@@ -3052,6 +3052,7 @@ const struct real_format ieee_single_format =
     true,
     true,
     false,
+    true,
     "ieee_single"
   };
 
@@ -3075,6 +3076,7 @@ const struct real_format mips_single_format =
     true,
     false,
     true,
+    true,
     "mips_single"
   };
 
@@ -3098,6 +3100,7 @@ const struct real_format motorola_single_format =
     true,
     true,
     true,
+    true,
     "motorola_single"
   };
 
@@ -3132,6 +3135,7 @@ const struct real_format spu_single_format =
     true,
     false,
     false,
+    false,
     "spu_single"
   };
 \f
@@ -3343,6 +3347,7 @@ const struct real_format ieee_double_format =
     true,
     true,
     false,
+    true,
     "ieee_double"
   };
 
@@ -3366,6 +3371,7 @@ const struct real_format mips_double_format =
     true,
     false,
     true,
+    true,
     "mips_double"
   };
 
@@ -3389,6 +3395,7 @@ const struct real_format motorola_double_format =
     true,
     true,
     true,
+    true,
     "motorola_double"
   };
 \f
@@ -3735,6 +3742,7 @@ const struct real_format ieee_extended_motorola_format =
     true,
     true,
     true,
+    false,
     "ieee_extended_motorola"
   };
 
@@ -3758,6 +3766,7 @@ const struct real_format ieee_extended_intel_96_format =
     true,
     true,
     false,
+    false,
     "ieee_extended_intel_96"
   };
 
@@ -3781,6 +3790,7 @@ const struct real_format ieee_extended_intel_128_format =
     true,
     true,
     false,
+    false,
     "ieee_extended_intel_128"
   };
 
@@ -3806,6 +3816,7 @@ const struct real_format ieee_extended_intel_96_round_53_format =
     true,
     true,
     false,
+    false,
     "ieee_extended_intel_96_round_53"
   };
 \f
@@ -3896,6 +3907,7 @@ const struct real_format ibm_extended_format =
     true,
     true,
     false,
+    false,
     "ibm_extended"
   };
 
@@ -3919,6 +3931,7 @@ const struct real_format mips_extended_format =
     true,
     false,
     true,
+    false,
     "mips_extended"
   };
 
@@ -4184,6 +4197,7 @@ const struct real_format ieee_quad_format =
     true,
     true,
     false,
+    false,
     "ieee_quad"
   };
 
@@ -4207,6 +4221,7 @@ const struct real_format mips_quad_format =
     true,
     false,
     true,
+    false,
     "mips_quad"
   };
 \f
@@ -4509,6 +4524,7 @@ const struct real_format vax_f_format =
     false,
     false,
     false,
+    false,
     "vax_f"
   };
 
@@ -4532,6 +4548,7 @@ const struct real_format vax_d_format =
     false,
     false,
     false,
+    false,
     "vax_d"
   };
 
@@ -4555,6 +4572,7 @@ const struct real_format vax_g_format =
     false,
     false,
     false,
+    false,
     "vax_g"
   };
 \f
@@ -4633,6 +4651,7 @@ const struct real_format decimal_single_format =
     true,
     true,
     false,
+    false,
     "decimal_single"
   };
 
@@ -4657,6 +4676,7 @@ const struct real_format decimal_double_format =
     true,
     true,
     false,
+    false,
     "decimal_double"
   };
 
@@ -4681,6 +4701,7 @@ const struct real_format decimal_quad_format =
     true,
     true,
     false,
+    false,
     "decimal_quad"
   };
 \f
@@ -4820,6 +4841,7 @@ const struct real_format ieee_half_format =
     true,
     true,
     false,
+    false,
     "ieee_half"
   };
 
@@ -4846,6 +4868,7 @@ const struct real_format arm_half_format =
     true,
     false,
     false,
+    false,
     "arm_half"
   };
 \f
@@ -4893,6 +4916,7 @@ const struct real_format real_internal_format =
     true,
     true,
     false,
+    false,
     "real_internal"
   };
 \f
