Hi All,

This is v3 of the patch, which adds an optimized route to the fpclassify
builtin for floating point numbers whose format is similar to IEEE-754.

The patch has been rewritten to do the work in GIMPLE instead of in a fold.
As part of the implementation, optimized versions of is_normal,
is_subnormal, is_nan, is_infinite and is_zero have been created. This patch
also introduces two new intrinsics, __builtin_iszero and
__builtin_issubnormal.

NOTE: the old code for ISNORMAL, ISSUBNORMAL, ISNAN and ISINFINITE had a
special case for ibm_extended_format which dropped the second part of the
number (which is represented internally as two numbers). fpclassify did
not have such a case. I have dropped the special case, as I am under the
impression that the format is deprecated and so the optimization isn't as
important. If this is wrong it would be easy to add it back in.

Open questions:

  - Should ISFINITE be changed as well?
  - Should it be SUBNORMAL or DENORMAL?
  - What should I do about documentation? I'm not sure how to document a
    new builtin.

The goal is to make it faster by:

1. Trying to determine the most common case first (e.g. the float is a
   normal number) and then the rest. The amount of code generated at -O2
   is about the same, +/- 1 instruction, but the code is much better.

2. Using integer operations in the optimized path.

At a high level, the optimized path uses integer operations to perform the
following checks in the given order:

  - normal
  - zero
  - nan
  - infinite
  - subnormal

The checks are ordered by how often each class of value is expected to
occur, most frequent first.

If the optimization can't be applied, a fall-back method is used which is
similar to the existing implementation using FP instructions. However, its
operations now follow the same order as described above, which means there
should be some slight benefit there as well.
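For readers who want to see the check order concretely, here is a minimal
sketch in portable C. It is only an illustration of the idea, not the GIMPLE
the patch emits. The names classify_double_int and classify_self_check are
made up for this example, and the code assumes that double is IEEE-754
binary64 and that integers and doubles share the same byte order.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

_Static_assert (sizeof (double) == sizeof (uint64_t),
		"sketch assumes a 64-bit double");

/* Hypothetical helper, not taken from the patch: classify a binary64
   value using integer operations only, testing the classes in the order
   normal, zero, nan, infinite, subnormal.  */
int
classify_double_int (double x)
{
  uint64_t bits;
  memcpy (&bits, &x, sizeof bits);	/* Reinterpret the bits as an integer.  */

  uint64_t exp = (bits >> 52) & 0x7ff;	/* 11-bit biased exponent.  */
  uint64_t no_sign = bits << 1;		/* Value with the sign bit shifted out.  */
  uint64_t inf = UINT64_C (0x7ff) << 53;	/* Infinity, sign shifted out.  */

  /* exp + 1 has bits 1..10 clear only when exp is all zeros or all ones,
     so one add and one mask separate normal numbers from the rest.  */
  if (((exp + 1) & 0x7fe) != 0)
    return FP_NORMAL;
  if (no_sign == 0)
    return FP_ZERO;
  if (no_sign > inf)			/* Exponent all ones, mantissa nonzero.  */
    return FP_NAN;
  if (no_sign == inf)
    return FP_INFINITE;
  return FP_SUBNORMAL;			/* Exponent zero, mantissa nonzero.  */
}

/* Compare the sketch against the C library's fpclassify on a few values.  */
void
classify_self_check (void)
{
  const double samples[] = { 1.0, -2.5, 0.0, -0.0, NAN, INFINITY,
			     -INFINITY, 0x1p-1074 };
  for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++)
    assert (classify_double_int (samples[i]) == fpclassify (samples[i]));
}
```

Calling classify_self_check () from a test program is enough to exercise all
five classes. The early return for normal numbers is the point of the
ordering: the expected common case costs a shift, a mask, an add and one
comparison.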
A limitation of this new approach is that the exponent of the floating
point type has to fit in 32 bits, and the type has to have an IEEE-like
format and IEEE-like values for NaN and INF (e.g. for NaN and INF all bits
of the exponent must be set). To determine this IEEE likeness a new boolean
was added to real_format.

As an example, AArch64 now generates the following for classification of
doubles:

f:
	fmov	x1, d0
	mov	w0, 7
	ubfx	x2, x1, 52, 11
	add	w2, w2, 1
	tst	w2, 2046
	bne	.L1
	lsl	x1, x1, 1
	mov	w0, 13
	cbz	x1, .L1
	mov	x2, -9007199254740992
	cmp	x1, x2
	mov	w0, 5
	mov	w3, 11
	csel	w0, w0, w3, eq
	mov	w1, 3
	csel	w0, w0, w1, ls
.L1:
	ret

and for the floating point version:

f:
	adrp	x2, .LC0
	fabs	d1, d0
	adrp	x1, .LC1
	mov	w0, 7
	ldr	d3, [x2, #:lo12:.LC0]
	ldr	d2, [x1, #:lo12:.LC1]
	fcmpe	d1, d3
	fccmpe	d1, d2, 2, ge
	bls	.L1
	fcmp	d0, #0.0
	mov	w0, 13
	beq	.L1
	fcmp	d1, d1
	bvs	.L5
	fcmpe	d1, d2
	mov	w0, 5
	mov	w1, 11
	csel	w0, w0, w1, gt
.L1:
	ret
.L5:
	mov	w0, 3
	ret

One new test checks that the integer version does not generate FP code;
correctness is tested using the existing test code for FP classify.

Glibc benchmarks were run against the built-in and show the following
performance gains on AArch64 using the integer code:

  * zero: 0%
  * inf/nan: 29%
  * normal: 69.1%

On x86_64:

  * zero: 0%
  * inf/nan: 89.9%
  * normal: 4.7%

Regression tests were run on aarch64-none-linux and arm-none-linux-gnueabi
with no regressions. x86_64 bootstrapped successfully as well.

Ok for trunk?

Thanks,
Tamar

gcc/
2016-11-11  Tamar Christina

	* gcc/builtins.c (fold_builtin_fpclassify): Removed.
	(expand_builtin): Added builtins to lowering list.
	(fold_builtin_n): Removed fold_builtin_varargs.
	(fold_builtin_varargs): Removed.
	* gcc/builtins.def (BUILT_IN_ISZERO, BUILT_IN_ISSUBNORMAL): Added.
	(fold_builtin_interclass_mathfn): Use get_min_float instead.
	* gcc/real.h (get_min_float): Added.
	* gcc/real.c (get_min_float): Added.
	* gcc/gimple-low.c (lower_stmt): Define BUILT_IN_FPCLASSIFY,
	CASE_FLT_FN (BUILT_IN_ISINF), BUILT_IN_ISINFD32, BUILT_IN_ISINFD64,
	BUILT_IN_ISINFD128, BUILT_IN_ISNAND32, BUILT_IN_ISNAND64,
	BUILT_IN_ISNAND128, BUILT_IN_ISNAN, BUILT_IN_ISNORMAL, BUILT_IN_ISZERO,
	BUILT_IN_ISSUBNORMAL.
	(lower_builtin_fpclassify, is_nan, is_normal, is_infinity): Added.
	(is_zero, is_subnormal, use_ieee_int_mode): Likewise.
	(lower_builtin_isnan, lower_builtin_isinfinite): Likewise.
	(lower_builtin_isnormal, lower_builtin_iszero): Likewise.
	(lower_builtin_issubnormal): Likewise.
	(emit_tree_cond, get_num_as_int, emit_tree_and_return_var): Added.
	* gcc/real.h (real_format): Added is_ieee_compatible field.
	* gcc/real.c (ieee_single_format): Set is_ieee_compatible flag.
	(mips_single_format): Likewise.
	(motorola_single_format): Likewise.
	(spu_single_format): Likewise.
	(ieee_double_format): Likewise.
	(mips_double_format): Likewise.
	(motorola_double_format): Likewise.
	(ieee_extended_motorola_format): Likewise.
	(ieee_extended_intel_128_format): Likewise.
	(ieee_extended_intel_96_round_53_format): Likewise.
	(ibm_extended_format): Likewise.
	(mips_extended_format): Likewise.
	(ieee_quad_format): Likewise.
	(mips_quad_format): Likewise.
	(vax_f_format): Likewise.
	(vax_d_format): Likewise.
	(vax_g_format): Likewise.
	(decimal_single_format): Likewise.
	(decimal_quad_format): Likewise.
	(ieee_half_format): Likewise.
	(mips_single_format): Likewise.
	(arm_half_format): Likewise.
	(real_internal_format): Likewise.

gcc/testsuite/
2016-11-11  Tamar Christina

	* gcc.target/aarch64/builtin-fpclassify.c: New codegen test.
	* gcc.dg/fold-notunord.c: Removed.