Hi All,

This is v3 of the patch which adds an optimized route to the fpclassify builtin
for floating point numbers which are similar to IEEE-754 in format.

The patch has been rewritten to do the lowering in GIMPLE instead of in a
fold. As part of the implementation, optimized versions of is_normal,
is_subnormal, is_nan, is_infinite and is_zero have been created. This patch
also introduces two new intrinsics, __builtin_iszero and __builtin_issubnormal.
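
For reference, the two new classes can be expressed with the standard
fpclassify macro. This is only a sketch of the intended meaning, assuming the
new builtins follow the usual definitions of "zero" and "subnormal"; the
helper names below are hypothetical and are not part of the patch.

```c
#include <assert.h>
#include <float.h>
#include <math.h>

/* Hypothetical stand-ins for the proposed builtins, written in terms of
   the standard fpclassify macro.  Assumption: the builtins return nonzero
   exactly when the argument belongs to the named class.  */
static int
iszero_sketch (double x)
{
  return fpclassify (x) == FP_ZERO;       /* True for +0.0 and -0.0.  */
}

static int
issubnormal_sketch (double x)
{
  return fpclassify (x) == FP_SUBNORMAL;  /* Nonzero, but below DBL_MIN.  */
}

/* Small self-check of the two helpers.  */
static void
check_new_predicates (void)
{
  assert (iszero_sketch (-0.0));
  assert (!iszero_sketch (DBL_MIN / 2));
  assert (issubnormal_sketch (DBL_MIN / 2));
  assert (!issubnormal_sketch (DBL_MIN));
}
```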

NOTE: the old code for ISNORMAL, ISSUBNORMAL, ISNAN and ISINFINITE had a
      special case for ibm_extended_format which dropped the second part
      of the number (which was being represented as two numbers internally).
      fpclassify did not have such a case. I have dropped it, as I am under
      the impression that the format is deprecated and so the optimization
      isn't as important. If this is wrong it would be easy to add it back in.

Should ISFINITE be changed as well? Also, should it be SUBNORMAL or DENORMAL?
And what should I do about documentation? I'm not sure how to document a new
builtin.

The goal is to make it faster by:
1. Trying to determine the most common case first
   (e.g. the float is a normal number) and then the
   rest. The amount of code generated at -O2 is
   about the same (+/- 1 instruction), but the code
   is much better.
2. Using integer operations in the optimized path.

At a high level, the optimized path uses integer operations
to perform the following checks in the given order:

  - normal
  - zero
  - nan
  - infinite
  - subnormal

The checks are ordered by how frequently each class of value is expected to
occur, most common first.
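
The checks listed above can be sketched in plain C as follows. This is an
illustration for IEEE-754 binary64 only, not the GIMPLE the patch emits; the
function names are hypothetical, and it assumes double is a 64-bit IEEE value
with the same byte order as uint64_t.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not taken from the patch: classify a binary64
   value with integer operations only, testing the classes in the order
   normal, zero, NaN, infinite, subnormal.  */
static int
classify_double_int (double x)
{
  uint64_t bits;
  memcpy (&bits, &x, sizeof bits);       /* Reinterpret the bits.  */

  uint64_t exp = (bits >> 52) & 0x7ff;   /* 11-bit biased exponent.  */

  /* Normal: the exponent is neither all zeros nor all ones.  Adding 1
     maps those two cases to 0x001 and 0x800, and the 0x7fe mask clears
     both, so any nonzero result means "normal".  */
  if ((exp + 1) & 0x7fe)
    return FP_NORMAL;

  uint64_t mag = bits << 1;              /* Shift out the sign bit.  */
  if (mag == 0)
    return FP_ZERO;

  /* Bit pattern of +Inf shifted left by one.  */
  const uint64_t inf = UINT64_C (0xffe0000000000000);
  if (mag > inf)
    return FP_NAN;                       /* All-ones exponent, fraction != 0.  */
  if (mag == inf)
    return FP_INFINITE;
  return FP_SUBNORMAL;                   /* Zero exponent, fraction != 0.  */
}

/* Cross-check the sketch against the library classification.  */
static void
check_against_libm (double x)
{
  assert (classify_double_int (x) == fpclassify (x));
}
```

Shifting out the sign bit first means one unsigned comparison against the
shifted infinity pattern is enough to separate NaN, infinity and subnormal.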

If the optimization can't be applied, a fall-back method is used which is
similar to the existing implementation using FP instructions. However, the
operations now follow the same order as described above, which means there
should be some slight benefit there as well.

A limitation of this new approach is that the exponent
of the floating point type has to fit in 32 bits, and the type
has to have an IEEE-like format and encodings for NaN and INF
(e.g. for NaN and INF all bits of the exponent must be set).

To record whether a format is IEEE-like in this sense, a new boolean field
(is_ieee_compatible) was added to real_format.

As an example, AArch64 now generates for classification of doubles:

f:
	fmov	x1, d0
	mov	w0, 7
	ubfx	x2, x1, 52, 11
	add	w2, w2, 1
	tst	w2, 2046
	bne	.L1
	lsl	x1, x1, 1
	mov	w0, 13
	cbz	x1, .L1
	mov	x2, -9007199254740992
	cmp	x1, x2
	mov	w0, 5
	mov	w3, 11
	csel	w0, w0, w3, eq
	mov	w1, 3
	csel	w0, w0, w1, ls
.L1:
	ret

and for the floating point version:

f:
	adrp	x2, .LC0
	fabs	d1, d0
	adrp	x1, .LC1
	mov	w0, 7
	ldr	d3, [x2, #:lo12:.LC0]
	ldr	d2, [x1, #:lo12:.LC1]
	fcmpe	d1, d3
	fccmpe	d1, d2, 2, ge
	bls	.L1
	fcmp	d0, #0.0
	mov	w0, 13
	beq	.L1
	fcmp	d1, d1
	bvs	.L5
	fcmpe	d1, d2
	mov	w0, 5
	mov	w1, 11
	csel	w0, w0, w1, gt
.L1:
	ret
.L5:
	mov	w0, 3
	ret
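
The floating point listing above corresponds roughly to the following C. This
is a hand-written reading of the assembly rather than code from the patch: the
function names are hypothetical, it is assumed that .LC0 and .LC1 hold DBL_MIN
and DBL_MAX, and it returns the <math.h> FP_* macros where the listing returns
the constants supplied by the caller.

```c
#include <assert.h>
#include <float.h>
#include <math.h>

/* Hypothetical helper: classify a double using FP comparisons only, in
   the order normal, zero, NaN, infinite, subnormal.  */
static int
classify_double_fp (double x)
{
  double a = fabs (x);

  /* Normal: magnitude within [DBL_MIN, DBL_MAX].  Both comparisons are
     false for NaN, so NaN falls through.  */
  if (a >= DBL_MIN && a <= DBL_MAX)
    return FP_NORMAL;
  if (x == 0.0)
    return FP_ZERO;                      /* Matches +0.0 and -0.0.  */
  if (a != a)
    return FP_NAN;                       /* Only NaN compares unequal to itself.  */

  /* What remains is either above DBL_MAX or below DBL_MIN.  */
  return a > DBL_MAX ? FP_INFINITE : FP_SUBNORMAL;
}

/* Cross-check the sketch against the library classification.  */
static void
check_fp_against_libm (double x)
{
  assert (classify_double_fp (x) == fpclassify (x));
}
```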

One new test checks that the integer version does not generate FP code.
Correctness is tested using the existing test code for fpclassify.

The glibc benchmarks were run against the built-in and show the following
performance gain on AArch64 using the integer code:

* zero: 0%
* inf/nan: 29%
* normal: 69.1%

On x86_64:

* zero: 0%
* inf/nan: 89.9%
* normal: 4.7%

Regression tests were run on aarch64-none-linux and arm-none-linux-gnueabi
with no regressions. x86_64 bootstrapped successfully as well.

Ok for trunk?

Thanks,
Tamar

gcc/
2016-11-11  Tamar Christina  <tamar.christina@arm.com>

	* gcc/builtins.c (fold_builtin_fpclassify): Removed.
	(expand_builtin): Added builtins to lowering list.
	(fold_builtin_n): Removed fold_builtin_varargs.
	(fold_builtin_varargs): Removed.
	* gcc/builtins.def (BUILT_IN_ISZERO, BUILT_IN_ISSUBNORMAL): Added.
	(fold_builtin_interclass_mathfn): Use get_min_float instead.
	* gcc/real.h (get_min_float): Added.
	* gcc/real.c (get_min_float): Added.
	* gcc/gimple-low.c (lower_stmt): Define BUILT_IN_FPCLASSIFY,
	CASE_FLT_FN (BUILT_IN_ISINF), BUILT_IN_ISINFD32, BUILT_IN_ISINFD64,
	BUILT_IN_ISINFD128, BUILT_IN_ISNAND32, BUILT_IN_ISNAND64,
	BUILT_IN_ISNAND128, BUILT_IN_ISNAN, BUILT_IN_ISNORMAL, BUILT_IN_ISZERO,
	BUILT_IN_ISSUBNORMAL.
	(lower_builtin_fpclassify, is_nan, is_normal, is_infinity): Added.
	(is_zero, is_subnormal, use_ieee_int_mode): Likewise.
	(lower_builtin_isnan, lower_builtin_isinfinite): Likewise.
	(lower_builtin_isnormal, lower_builtin_iszero): Likewise.
	(lower_builtin_issubnormal): Likewise.
	(emit_tree_cond, get_num_as_int, emit_tree_and_return_var): Added.
	* gcc/real.h (real_format): Added is_ieee_compatible field.
	* gcc/real.c (ieee_single_format): Set is_ieee_compatible flag.
	(mips_single_format): Likewise.
	(motorola_single_format): Likewise.
	(spu_single_format): Likewise.
	(ieee_double_format): Likewise.
	(mips_double_format): Likewise.
	(motorola_double_format): Likewise.
	(ieee_extended_motorola_format): Likewise.
	(ieee_extended_intel_128_format): Likewise.
	(ieee_extended_intel_96_round_53_format): Likewise.
	(ibm_extended_format): Likewise.
	(mips_extended_format): Likewise.
	(ieee_quad_format): Likewise.
	(mips_quad_format): Likewise.
	(vax_f_format): Likewise.
	(vax_d_format): Likewise.
	(vax_g_format): Likewise.
	(decimal_single_format): Likewise.
	(decimal_quad_format): Likewise.
	(ieee_half_format): Likewise.
	(mips_single_format): Likewise.
	(arm_half_format): Likewise.
	(real_internal_format): Likewise.

gcc/testsuite/
2016-11-11  Tamar Christina  <tamar.christina@arm.com>

	* gcc.target/aarch64/builtin-fpclassify.c: New codegen test.
	* gcc.dg/fold-notunord.c: Removed.