From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: by sourceware.org (Postfix, from userid 2153)
	id 6C3BB3858426; Sat, 2 Mar 2024 00:38:58 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 6C3BB3858426
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1709339938;
	bh=4vB3MKn1i+4H7hh59zABh0zI3LVZba39eLo6edrn//8=;
	h=From:To:Subject:Date:From;
	b=LXHyFR1sH5JPkPp7DRyqfxCum2JjL/HDYxpvtPIsZ/2nywXPzbvc++ZXodQ+mkcEX
	 wTw/X1m7xz0ejccpUjb40avqGi3ap/8vmTdygl6KUa+nMR/WWiik/EqkPdeMfrWO9J
	 kcSc/QWzdqO7hXPKN8FcNWc/ukhP2wVochHuVvuc=
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset="utf-8"
From: Jakub Jelinek
To: gcc-cvs@gcc.gnu.org
Subject: [gcc r13-8391] call-cdce: Add missing BUILT_IN_*F{32,64}X handling and improve BUILT_IN_*L [PR113993]
X-Act-Checkin: gcc
X-Git-Author: Jakub Jelinek
X-Git-Refname: refs/heads/releases/gcc-13
X-Git-Oldrev: 9de6ff5ec9a46951d2c71b5b32574a516a72b907
X-Git-Newrev: 856a66a672a1fd7feb2dee7e7aca21118016063f
Message-Id: <20240302003858.6C3BB3858426@sourceware.org>
Date: Sat, 2 Mar 2024 00:38:58 +0000 (GMT)
List-Id:

https://gcc.gnu.org/g:856a66a672a1fd7feb2dee7e7aca21118016063f

commit r13-8391-g856a66a672a1fd7feb2dee7e7aca21118016063f
Author: Jakub Jelinek
Date:   Thu Feb 22 10:19:15 2024 +0100

    call-cdce: Add missing BUILT_IN_*F{32,64}X handling and improve BUILT_IN_*L [PR113993]

    The following testcase ICEs because can_test_argument_range returns true
    for BUILT_IN_{COSH,SINH,EXP{,M1,2}}{F32X,F64X}, among many other
    builtins, but get_no_error_domain doesn't handle those.

    float32x_type_node, when supported in GCC, always has DFmode, so that
    case is easy (and call-cdce assumes that SFmode is IEEE float and DFmode
    is IEEE double).  So *F32X is simply handled by adding those cases next
    to *F64.

    float64x_type_node, when supported in GCC, by definition has a mode with
    larger precision and exponent range than DFmode, so it can be XFmode,
    TFmode or KFmode.

    I went through all the l/f128 suffixed builtins and verified that the
    float128_type_node no error domain range is actually identical to the
    Intel extended long double no error domain range.  That isn't
    surprising: both IEEE quad and Intel/Motorola extended have the same
    exponent range [-16381, 16384].  (Well, Motorola has -16382, probably
    because of different behavior for denormals, but that has nothing to do
    with get_no_error_domain, which is about large inputs overflowing into
    +-Inf or triggering NaN; denormals could in theory matter solely for
    sqrt, and even that is fine.)  In theory some target could have a
    different larger type, so for *F64X the code verifies that
    REAL_MODE_FORMAT (TYPE_MODE (float64x_type_node))->emax == 16384 and, if
    so, uses the *F128 domains; otherwise it falls back to the non-suffixed
    ones (aka *F64), which are certainly the conservative minimum.

    While at it, the patch also changes the *L suffixed cases to do pretty
    much the same.  The comment said that the function just assumes the
    *F64 ranges for *L, but that is unnecessarily conservative.  All we
    currently have for long double is:
    1) IEEE quad (emax 16384, *F128 ranges)
    2) XFmode Intel/Motorola extended (emax 16384, same as *F128 ranges)
    3) IBM extended (double double, emax 1024; the extra precision doesn't
       really help and the domains are the same as for *F64)
    4) same as double (*F64 again)
    So the patch also uses REAL_MODE_FORMAT (TYPE_MODE
    (long_double_type_node))->emax == 16384 checks for *L and either tail
    recurses into the *F128 case or, otherwise, into the non-suffixed (aka
    *F64) case.

    BUILT_IN_*F128X is not handled because no target has those, nothing
    seems to be on the horizon, and who knows what format would be used for
    it.  Thus, the only formats this gets wrong are probably VAX floats or
    something similar; I have no intent to look at those, as that is a
    preexisting issue.

    BTW, I'm surprised we don't have BUILT_IN_EXP10F{16,32,64,128,32X,64X,128X}
    builtins; glibc seems to have those (except, I think, *16 and *128x).

    2024-02-22  Jakub Jelinek

            PR tree-optimization/113993
            * tree-call-cdce.cc (get_no_error_domain): Handle
            BUILT_IN_{COSH,SINH,EXP{,M1,2}}{F32X,F64X}.

            * gcc.dg/tree-ssa/pr113993.c: New test.

    (cherry picked from commit 7ed800c9c94b57077ba5911974a63bc06a5e1c35)

Diff:
---
 gcc/testsuite/gcc.dg/tree-ssa/pr113993.c | 257 +++++++++++++++++++++++++++++++
 gcc/tree-call-cdce.cc                    |  23 ++-
 2 files changed, 278 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr113993.c b/gcc/testsuite/gcc.dg/tree-ssa/pr113993.c
new file mode 100644
index 00000000000..b7d492ffddb
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr113993.c
@@ -0,0 +1,257 @@
+/* PR tree-optimization/113993 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+/* { dg-add-options float32 } */
+/* { dg-add-options float64 } */
+/* { dg-add-options float128 } */
+/* { dg-add-options float32x } */
+/* { dg-add-options float64x } */
+/* { dg-final { scan-tree-dump-not "__builtin_\[a-z0-9\] \\\(\[^\n\r\]\\\);" "optimized" } } */
+
+void
+flt (float f1, float f2, float f3, float f4, float f5,
+     float f6, float f7, float f8, float f9, float f10)
+{
+  if (!(f1 >= -1.0f && f1 <= 1.0f)) __builtin_unreachable ();
+  __builtin_acosf (f1);
+  __builtin_asinf (f1);
+  if (!(f2 >= 1.0f && f2 <= __builtin_inff ())) __builtin_unreachable ();
+  __builtin_acoshf (f2);
+  if (!(f3 > -1.0f && f3 < 1.0f)) __builtin_unreachable ();
+  __builtin_atanhf (f3);
+  if (!(f4 > 0.0f && f4 < __builtin_inff ())) __builtin_unreachable ();
+  __builtin_logf (f4);
+  __builtin_log2f (f4);
+  __builtin_log10f (f4);
+  if (!(f5 > -1.0f && f5 < __builtin_inff ())) __builtin_unreachable ();
+  __builtin_log1pf (f5);
+  if (!(f6 >= 0.0f && f6 < __builtin_inff ())) __builtin_unreachable ();
+  __builtin_sqrtf (f6);
+#if __FLT_MANT_DIG__ == __FLT32_MANT_DIG__ && __FLT_MAX_EXP__ == __FLT32_MAX_EXP__
+  if (!(f7 > -89.0f && f7 < 89.0f)) __builtin_unreachable ();
+  __builtin_coshf (f7);
+  __builtin_sinhf (f7);
+  if (!(f8 > -__builtin_inff () && f8 < 88.0f)) __builtin_unreachable ();
+  __builtin_expf (f8);
+  if (!(f9 > -__builtin_inff () && f9 < 128.0f)) __builtin_unreachable ();
+  __builtin_exp2f (f9);
+  if (!(f10 > -__builtin_inff () && f10 < 38.0f)) __builtin_unreachable ();
+  __builtin_exp10f (f10);
+#endif
+}
+
+#if defined(__FLT16_MANT_DIG__) && 0 /* No library routines here, these don't actually fold away. */
+void
+flt16 (_Float16 f1, _Float16 f2, _Float16 f3, _Float16 f4, _Float16 f5,
+       _Float16 f6, _Float16 f7, _Float16 f8, _Float16 f9)
+{
+  if (!(f1 >= -1.0f16 && f1 <= 1.0f16)) __builtin_unreachable ();
+  __builtin_acosf16 (f1);
+  __builtin_asinf16 (f1);
+  if (!(f2 >= 1.0f16 && f2 <= __builtin_inff16 ())) __builtin_unreachable ();
+  __builtin_acoshf16 (f2);
+  if (!(f3 > -1.0f16 && f3 < 1.0f16)) __builtin_unreachable ();
+  __builtin_atanhf16 (f3);
+  if (!(f4 > 0.0f16 && f4 < __builtin_inff16 ())) __builtin_unreachable ();
+  __builtin_logf16 (f4);
+  __builtin_log2f16 (f4);
+  __builtin_log10f16 (f4);
+  if (!(f5 > -1.0f16 && f5 < __builtin_inff16 ())) __builtin_unreachable ();
+  __builtin_log1pf16 (f5);
+  if (!(f6 >= 0.0f16 && f6 < __builtin_inff16 ())) __builtin_unreachable ();
+  __builtin_sqrtf16 (f6);
+  if (!(f7 > -11.0f16 && f7 < 11.0f16)) __builtin_unreachable ();
+  __builtin_coshf16 (f7);
+  __builtin_sinhf16 (f7);
+  if (!(f8 > -__builtin_inff16 () && f8 < 11.0f16)) __builtin_unreachable ();
+  __builtin_expf16 (f8);
+  if (!(f9 > -__builtin_inff16 () && f9 < 16.0f16)) __builtin_unreachable ();
+  __builtin_exp2f16 (f9);
+}
+#endif
+
+#ifdef __FLT32_MANT_DIG__
+void
+flt32 (_Float32 f1, _Float32 f2, _Float32 f3, _Float32 f4, _Float32 f5,
+       _Float32 f6, _Float32 f7, _Float32 f8, _Float32 f9)
+{
+  if (!(f1 >= -1.0f32 && f1 <= 1.0f32)) __builtin_unreachable ();
+  __builtin_acosf32 (f1);
+  __builtin_asinf32 (f1);
+  if (!(f2 >= 1.0f32 && f2 <= __builtin_inff32 ())) __builtin_unreachable ();
+  __builtin_acoshf32 (f2);
+  if (!(f3 > -1.0f32 && f3 < 1.0f32)) __builtin_unreachable ();
+  __builtin_atanhf32 (f3);
+  if (!(f4 > 0.0f32 && f4 < __builtin_inff32 ())) __builtin_unreachable ();
+  __builtin_logf32 (f4);
+  __builtin_log2f32 (f4);
+  __builtin_log10f32 (f4);
+  if (!(f5 > -1.0f32 && f5 < __builtin_inff32 ())) __builtin_unreachable ();
+  __builtin_log1pf32 (f5);
+  if (!(f6 >= 0.0f32 && f6 < __builtin_inff32 ())) __builtin_unreachable ();
+  __builtin_sqrtf32 (f6);
+  if (!(f7 > -89.0f32 && f7 < 89.0f32)) __builtin_unreachable ();
+  __builtin_coshf32 (f7);
+  __builtin_sinhf32 (f7);
+  if (!(f8 > -__builtin_inff32 () && f8 < 88.0f32)) __builtin_unreachable ();
+  __builtin_expf32 (f8);
+  if (!(f9 > -__builtin_inff32 () && f9 < 128.0f32)) __builtin_unreachable ();
+  __builtin_exp2f32 (f9);
+}
+#endif
+
+void
+dbl (double f1, double f2, double f3, double f4, double f5,
+     double f6, double f7, double f8, double f9, double f10)
+{
+  if (!(f1 >= -1.0 && f1 <= 1.0)) __builtin_unreachable ();
+  __builtin_acos (f1);
+  __builtin_asin (f1);
+  if (!(f2 >= 1.0 && f2 <= __builtin_inf ())) __builtin_unreachable ();
+  __builtin_acosh (f2);
+  if (!(f3 > -1.0 && f3 < 1.0)) __builtin_unreachable ();
+  __builtin_atanh (f3);
+  if (!(f4 > 0.0 && f4 < __builtin_inf ())) __builtin_unreachable ();
+  __builtin_log (f4);
+  __builtin_log2 (f4);
+  __builtin_log10 (f4);
+  if (!(f5 > -1.0 && f5 < __builtin_inf ())) __builtin_unreachable ();
+  __builtin_log1p (f5);
+  if (!(f6 >= 0.0 && f6 < __builtin_inf ())) __builtin_unreachable ();
+  __builtin_sqrt (f6);
+#if __DBL_MANT_DIG__ == __FLT64_MANT_DIG__ && __DBL_MAX_EXP__ == __FLT64_MAX_EXP__
+  if (!(f7 > -710.0 && f7 < 710.0)) __builtin_unreachable ();
+  __builtin_cosh (f7);
+  __builtin_sinh (f7);
+  if (!(f8 > -__builtin_inf () && f8 < 709.0)) __builtin_unreachable ();
+  __builtin_exp (f8);
+  if (!(f9 > -__builtin_inf () && f9 < 1024.0)) __builtin_unreachable ();
+  __builtin_exp2 (f9);
+  if (!(f10 > -__builtin_inf () && f10 < 308.0)) __builtin_unreachable ();
+  __builtin_exp10 (f10);
+#endif
+}
+
+#ifdef __FLT64_MANT_DIG__
+void
+flt64 (_Float64 f1, _Float64 f2, _Float64 f3, _Float64 f4, _Float64 f5,
+       _Float64 f6, _Float64 f7, _Float64 f8, _Float64 f9)
+{
+  if (!(f1 >= -1.0f64 && f1 <= 1.0f64)) __builtin_unreachable ();
+  __builtin_acosf64 (f1);
+  __builtin_asinf64 (f1);
+  if (!(f2 >= 1.0f64 && f2 <= __builtin_inff64 ())) __builtin_unreachable ();
+  __builtin_acoshf64 (f2);
+  if (!(f3 > -1.0f64 && f3 < 1.0f64)) __builtin_unreachable ();
+  __builtin_atanhf64 (f3);
+  if (!(f4 > 0.0f64 && f4 < __builtin_inff64 ())) __builtin_unreachable ();
+  __builtin_logf64 (f4);
+  __builtin_log2f64 (f4);
+  __builtin_log10f64 (f4);
+  if (!(f5 > -1.0f64 && f5 < __builtin_inff64 ())) __builtin_unreachable ();
+  __builtin_log1pf64 (f5);
+  if (!(f6 >= 0.0f64 && f6 < __builtin_inff64 ())) __builtin_unreachable ();
+  __builtin_sqrtf64 (f6);
+  if (!(f7 > -710.0f64 && f7 < 710.0f64)) __builtin_unreachable ();
+  __builtin_coshf64 (f7);
+  __builtin_sinhf64 (f7);
+  if (!(f8 > -__builtin_inff64 () && f8 < 709.0f64)) __builtin_unreachable ();
+  __builtin_expf64 (f8);
+  if (!(f9 > -__builtin_inff64 () && f9 < 1024.0f64)) __builtin_unreachable ();
+  __builtin_exp2f64 (f9);
+}
+#endif
+
+#ifdef __FLT32X_MANT_DIG__
+void
+flt32x (_Float32x f1, _Float32x f2, _Float32x f3, _Float32x f4, _Float32x f5,
+        _Float32x f6, _Float32x f7, _Float32x f8, _Float32x f9)
+{
+  if (!(f1 >= -1.0f32x && f1 <= 1.0f32x)) __builtin_unreachable ();
+  __builtin_acosf32x (f1);
+  __builtin_asinf32x (f1);
+  if (!(f2 >= 1.0f32x && f2 <= __builtin_inff32x ())) __builtin_unreachable ();
+  __builtin_acoshf32x (f2);
+  if (!(f3 > -1.0f32x && f3 < 1.0f32x)) __builtin_unreachable ();
+  __builtin_atanhf32x (f3);
+  if (!(f4 > 0.0f32x && f4 < __builtin_inff32x ())) __builtin_unreachable ();
+  __builtin_logf32x (f4);
+  __builtin_log2f32x (f4);
+  __builtin_log10f32x (f4);
+  if (!(f5 > -1.0f32x && f5 < __builtin_inff32x ())) __builtin_unreachable ();
+  __builtin_log1pf32x (f5);
+  if (!(f6 >= 0.0f32x && f6 < __builtin_inff32x ())) __builtin_unreachable ();
+  __builtin_sqrtf32x (f6);
+#if __FLT32X_MANT_DIG__ == __FLT64_MANT_DIG__ && __FLT32X_MAX_EXP__ == __FLT64_MAX_EXP__
+  if (!(f7 > -710.0f32x && f7 < 710.0f32x)) __builtin_unreachable ();
+  __builtin_coshf32x (f7);
+  __builtin_sinhf32x (f7);
+  if (!(f8 > -__builtin_inff32x () && f8 < 709.0f32x)) __builtin_unreachable ();
+  __builtin_expf32x (f8);
+  if (!(f9 > -__builtin_inff32x () && f9 < 1024.0f32x)) __builtin_unreachable ();
+  __builtin_exp2f32x (f9);
+#endif
+}
+#endif
+
+#ifdef __FLT128_MANT_DIG__
+void
+flt128 (_Float128 f1, _Float128 f2, _Float128 f3, _Float128 f4, _Float128 f5,
+        _Float128 f6, _Float128 f7, _Float128 f8, _Float128 f9)
+{
+  if (!(f1 >= -1.0f128 && f1 <= 1.0f128)) __builtin_unreachable ();
+  __builtin_acosf128 (f1);
+  __builtin_asinf128 (f1);
+  if (!(f2 >= 1.0f128 && f2 <= __builtin_inff128 ())) __builtin_unreachable ();
+  __builtin_acoshf128 (f2);
+  if (!(f3 > -1.0f128 && f3 < 1.0f128)) __builtin_unreachable ();
+  __builtin_atanhf128 (f3);
+  if (!(f4 > 0.0f128 && f4 < __builtin_inff128 ())) __builtin_unreachable ();
+  __builtin_logf128 (f4);
+  __builtin_log2f128 (f4);
+  __builtin_log10f128 (f4);
+  if (!(f5 > -1.0f128 && f5 < __builtin_inff128 ())) __builtin_unreachable ();
+  __builtin_log1pf128 (f5);
+  if (!(f6 >= 0.0f128 && f6 < __builtin_inff128 ())) __builtin_unreachable ();
+  __builtin_sqrtf128 (f6);
+  if (!(f7 > -11357.0f128 && f7 < 11357.0f128)) __builtin_unreachable ();
+  __builtin_coshf128 (f7);
+  __builtin_sinhf128 (f7);
+  if (!(f8 > -__builtin_inff128 () && f8 < 11356.0f128)) __builtin_unreachable ();
+  __builtin_expf128 (f8);
+  if (!(f9 > -__builtin_inff128 () && f9 < 16384.0f128)) __builtin_unreachable ();
+  __builtin_exp2f128 (f9);
+}
+#endif
+
+#ifdef __FLT64X_MANT_DIG__
+void
+flt64x (_Float64x f1, _Float64x f2, _Float64x f3, _Float64x f4, _Float64x f5,
+        _Float64x f6, _Float64x f7, _Float64x f8, _Float64x f9)
+{
+  if (!(f1 >= -1.0f64x && f1 <= 1.0f64x)) __builtin_unreachable ();
+  __builtin_acosf64x (f1);
+  __builtin_asinf64x (f1);
+  if (!(f2 >= 1.0f64x && f2 <= __builtin_inff64x ())) __builtin_unreachable ();
+  __builtin_acoshf64x (f2);
+  if (!(f3 > -1.0f64x && f3 < 1.0f64x)) __builtin_unreachable ();
+  __builtin_atanhf64x (f3);
+  if (!(f4 > 0.0f64x && f4 < __builtin_inff64x ())) __builtin_unreachable ();
+  __builtin_logf64x (f4);
+  __builtin_log2f64x (f4);
+  __builtin_log10f64x (f4);
+  if (!(f5 > -1.0f64x && f5 < __builtin_inff64x ())) __builtin_unreachable ();
+  __builtin_log1pf64x (f5);
+  if (!(f6 >= 0.0f64x && f6 < __builtin_inff64x ())) __builtin_unreachable ();
+  __builtin_sqrtf64x (f6);
+#if __FLT64X_MAX_EXP__ == 16384
+  if (!(f7 > -11357.0f64x && f7 < 11357.0f64x)) __builtin_unreachable ();
+  __builtin_coshf64x (f7);
+  __builtin_sinhf64x (f7);
+  if (!(f8 > -__builtin_inff64x () && f8 < 11356.0f64x)) __builtin_unreachable ();
+  __builtin_expf64x (f8);
+  if (!(f9 > -__builtin_inff64x () && f9 < 16384.0f64x)) __builtin_unreachable ();
+  __builtin_exp2f64x (f9);
+#endif
+}
+#endif
diff --git a/gcc/tree-call-cdce.cc b/gcc/tree-call-cdce.cc
index 341b3b9be91..143975dd112 100644
--- a/gcc/tree-call-cdce.cc
+++ b/gcc/tree-call-cdce.cc
@@ -677,7 +677,7 @@ gen_conditions_for_pow (gcall *pow_call, vec conds,
    Since IEEE only sets minimum requirements for long double format,
    different long double formats exist under different implementations
    (e.g, 64 bit double precision (DF), 80 bit double-extended
-   precision (XF), and 128 bit quad precision (QF) ). For simplicity,
+   precision (XF), and 128 bit quad precision (TF) ). For simplicity,
    in this implementation, the computed bounds for long double assume
    64 bit format (DF), and are therefore conservative.
    Another assumption is that single precision float type is always SF mode,
@@ -727,6 +727,8 @@ get_no_error_domain (enum built_in_function fnc)
     case BUILT_IN_SINHL:
     case BUILT_IN_COSHF64:
     case BUILT_IN_SINHF64:
+    case BUILT_IN_COSHF32X:
+    case BUILT_IN_SINHF32X:
       /* cosh: (-710, +710) */
       return get_domain (-710, true, false,
                          710, true, false);
@@ -735,6 +737,11 @@ get_no_error_domain (enum built_in_function fnc)
       /* coshf128: (-11357, +11357) */
       return get_domain (-11357, true, false,
                          11357, true, false);
+    case BUILT_IN_COSHF64X:
+    case BUILT_IN_SINHF64X:
+      if (REAL_MODE_FORMAT (TYPE_MODE (float64x_type_node))->emax == 16384)
+        return get_no_error_domain (BUILT_IN_COSHF128);
+      return get_no_error_domain (BUILT_IN_COSH);
     /* Log functions: (0, +inf) */
     CASE_FLT_FN (BUILT_IN_LOG):
     CASE_FLT_FN_FLOATN_NX (BUILT_IN_LOG):
@@ -751,7 +758,7 @@ get_no_error_domain (enum built_in_function fnc)
     /* Exp functions. */
     case BUILT_IN_EXPF16:
     case BUILT_IN_EXPM1F16:
-      /* expf: (-inf, 11) */
+      /* expf16: (-inf, 11) */
       return get_domain (-1, false, false,
                          11, true, false);
     case BUILT_IN_EXPF:
@@ -767,6 +774,8 @@ get_no_error_domain (enum built_in_function fnc)
     case BUILT_IN_EXPM1L:
     case BUILT_IN_EXPF64:
     case BUILT_IN_EXPM1F64:
+    case BUILT_IN_EXPF32X:
+    case BUILT_IN_EXPM1F32X:
       /* exp: (-inf, 709) */
       return get_domain (-1, false, false,
                          709, true, false);
@@ -775,6 +784,11 @@ get_no_error_domain (enum built_in_function fnc)
       /* expf128: (-inf, 11356) */
       return get_domain (-1, false, false,
                          11356, true, false);
+    case BUILT_IN_EXPF64X:
+    case BUILT_IN_EXPM1F64X:
+      if (REAL_MODE_FORMAT (TYPE_MODE (float64x_type_node))->emax == 16384)
+        return get_no_error_domain (BUILT_IN_EXPF128);
+      return get_no_error_domain (BUILT_IN_EXP);
     case BUILT_IN_EXP2F16:
       /* exp2f16: (-inf, 16) */
       return get_domain (-1, false, false,
@@ -787,6 +801,7 @@ get_no_error_domain (enum built_in_function fnc)
     case BUILT_IN_EXP2:
     case BUILT_IN_EXP2L:
     case BUILT_IN_EXP2F64:
+    case BUILT_IN_EXP2F32X:
       /* exp2: (-inf, 1024) */
       return get_domain (-1, false, false,
                          1024, true, false);
@@ -794,6 +809,10 @@ get_no_error_domain (enum built_in_function fnc)
       /* exp2f128: (-inf, 16384) */
       return get_domain (-1, false, false,
                          16384, true, false);
+    case BUILT_IN_EXP2F64X:
+      if (REAL_MODE_FORMAT (TYPE_MODE (float64x_type_node))->emax == 16384)
+        return get_no_error_domain (BUILT_IN_EXP2F128);
+      return get_no_error_domain (BUILT_IN_EXP2);
     case BUILT_IN_EXP10F:
     case BUILT_IN_POW10F:
       /* exp10f: (-inf, 38) */