* Add option for whether ceil etc. can raise "inexact", adjust x86 conditions
From: Joseph Myers @ 2016-05-26 8:32 UTC
To: gcc-patches; +Cc: hubicka, ubizjak

In ISO C99/C11, the ceil, floor, round and trunc functions may or may
not raise the "inexact" exception for noninteger arguments.  Under TS
18661-1:2014, the C bindings for IEEE 754-2008, these functions are
prohibited from raising "inexact", in line with the general rule that
"inexact" is raised only when the infinitely precise mathematical result
of a function differs from the result after rounding to the target type.

GCC has no option to select TS 18661 requirements for not raising
"inexact" when expanding built-in versions of these functions inline.
Furthermore, even given such requirements, the conditions on the x86
insn patterns for these functions are unnecessarily restrictive.  I'd
like to make the out-of-line glibc versions follow the TS 18661
requirements; in the cases where this slows them down (the cases using
x87 floating point), it becomes more important for inline versions to
be used when the user does not care about "inexact".

This patch fixes these issues.  A new option
-fno-fp-int-builtin-inexact is added to request TS 18661 rules for
these functions; the default, -ffp-int-builtin-inexact, reflects that
such exceptions are allowed by C99 and C11.  (The intention is that if
C2x incorporates TS 18661-1, then the default would change in C2x
mode.)

The x86 built-ins for rint (x87, SSE2 and SSE4.1) are made
unconditionally available (no longer depending on
-funsafe-math-optimizations or -fno-trapping-math); "inexact" is
correct for noninteger arguments to rint.
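As an illustration of why "inexact" is the right result for rint, here is a minimal C sketch of the add-and-subtract-2^52 technique. To my understanding, ix86_expand_rint (which the rint<mode>2 expander uses for SSE2 when SSE4.1 is unavailable) is built on a variant of the same idea, but this is not its code. rint_sketch is a hypothetical name, and the sketch assumes the default round-to-nearest mode and no x87 extended-precision intermediates (FLT_EVAL_METHOD == 0).

```c
#include <math.h>   /* signbit */

/* Hypothetical sketch, not GCC's code: rint for double via the
   "add and subtract 2^52" technique.  Assumes round-to-nearest and
   FLT_EVAL_METHOD == 0 (no x87 extended-precision intermediates).  */
static double
rint_sketch (double x)
{
  const double two52 = 0x1p52;  /* every double >= 2^52 is an integer */
  int negative = signbit (x) != 0;
  double ax = negative ? -x : x;

  /* NaN, infinity or already integral: return X unchanged.  */
  if (!(ax < two52))
    return x;

  /* In [2^52, 2^53) the spacing of doubles is 1.0, so this addition
     rounds, and raises "inexact", exactly when AX is not an integer:
     the correct behavior for rint.  volatile keeps the compiler from
     combining the addition and the subtraction.  */
  volatile double sum = ax + two52;
  double r = sum - two52;       /* exact */

  /* Reapply the sign; rint (-0.3) is -0.0, not +0.0.  */
  return negative ? -r : r;
}
```

Under a directed rounding mode this sign-magnitude form would round negative arguments the wrong way, which is why the sketch states its rounding-mode assumption.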
For floor, ceil and trunc, the x87 and SSE2 built-ins are OK if
-ffp-int-builtin-inexact or -fno-trapping-math is in effect (they may
raise "inexact" for noninteger arguments); the SSE4.1 built-ins are made
to use ROUND_NO_EXC so that they do not raise "inexact", and so are OK
unconditionally.

Now, while there was no semantic reason for depending on
-funsafe-math-optimizations, the insn patterns had such a dependence
because they used gen_truncxf<mode>2_i387_noop to truncate back to
SFmode or DFmode after using frndint in XFmode.  In this case a no-op
truncation is safe because rounding to integer always produces an
exactly representable value (the same reason why IEEE semantics say it
shouldn't produce "inexact"); but of course that insn pattern isn't
safe in general, because it would also match cases where the truncation
is not in fact a no-op.  To allow frndint to be used for SFmode and
DFmode without that unsafe pattern, the relevant frndint patterns are
extended to SFmode and DFmode, or new SFmode and DFmode patterns are
added, so that the frndint operation can be represented in RTL as an
operation acting directly on SFmode or DFmode, without the extension
and the problematic truncation.

A generic test of the new option is added, as well as x86-specific
tests: execution tests (including the generic test run with different
x86 options) and scan-assembler tests verifying that functions that
should be inlined with different options are indeed inlined.

I think other architectures are OK for TS 18661-1 semantics already.
Considering those defining "ceil" patterns: aarch64, arm, rs6000 and
s390 use instructions that do not raise "inexact"; nvptx does not
support floating-point exceptions.  (This does mean the -f option in
fact only affects one architecture, but I think it should still be a -f
option; it's logically architecture-independent and is expected to be
affected by future -std options, so is similar to e.g.
-fexcess-precision=, which also does nothing on most architectures but
is implied by -std options.)

Bootstrapped with no regressions on x86_64-pc-linux-gnu.  OK to commit?

gcc:
2016-05-26  Joseph Myers  <joseph@codesourcery.com>

	PR target/71276
	PR target/71277
	* common.opt (ffp-int-builtin-inexact): New option.
	* doc/invoke.texi (-fno-fp-int-builtin-inexact): Document.
	* config/i386/i386.md (rintxf2): Do not test
	flag_unsafe_math_optimizations.
	(rint<mode>2_frndint): New define_insn.
	(rint<mode>2): Do not test flag_unsafe_math_optimizations for 387
	or !flag_trapping_math for SSE.  Just use gen_rint<mode>2_frndint
	for 387 instead of extending and truncating.
	(frndintxf2_<rounding>): Test flag_fp_int_builtin_inexact ||
	!flag_trapping_math instead of flag_unsafe_math_optimizations.
	Change to frndint<mode>2_<rounding>.
	(frndintxf2_<rounding>_i387): Likewise.  Change to
	frndint<mode>2_<rounding>_i387.
	(<rounding_insn>xf2): Likewise.
	(<rounding_insn><mode>2): Test flag_fp_int_builtin_inexact ||
	!flag_trapping_math instead of flag_unsafe_math_optimizations for
	x87.  Test TARGET_ROUND || !flag_trapping_math ||
	flag_fp_int_builtin_inexact instead of !flag_trapping_math for
	SSE.  Use ROUND_NO_EXC in constant operand of
	gen_sse4_1_round<mode>2.  Just use gen_frndint<mode>2_<rounding>
	for 387 instead of extending and truncating.

gcc/testsuite:
2016-05-26  Joseph Myers  <joseph@codesourcery.com>

	PR target/71276
	PR target/71277
	* gcc.dg/torture/builtin-fp-int-inexact.c,
	gcc.target/i386/387-builtin-fp-int-inexact.c,
	gcc.target/i386/387-rint-inline-1.c,
	gcc.target/i386/387-rint-inline-2.c,
	gcc.target/i386/sse2-builtin-fp-int-inexact.c,
	gcc.target/i386/sse2-rint-inline-1.c,
	gcc.target/i386/sse2-rint-inline-2.c,
	gcc.target/i386/sse4_1-builtin-fp-int-inexact.c,
	gcc.target/i386/sse4_1-rint-inline.c: New tests.
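The condition changes listed in the ChangeLog can be summarised as a small C model. This is a hypothetical sketch, not GCC code: enum fp_unit, can_inline_rint and can_inline_floor_ceil_trunc are invented names, the two bool parameters stand for flag_trapping_math and flag_fp_int_builtin_inexact, and TARGET_USE_FANCY_MATH_387 and mode checks are ignored.

```c
#include <stdbool.h>

/* Hypothetical model of the patched i386.md conditions; not GCC code.  */
enum fp_unit
{
  FP_X87,     /* frndint-based patterns */
  FP_SSE2,    /* SSE sequences without the SSE4.1 round insns */
  FP_SSE4_1   /* TARGET_ROUND: roundss/roundsd */
};

/* rint<mode>2: raising "inexact" for noninteger arguments is correct
   for rint, so after the patch no flag is consulted.  */
static bool
can_inline_rint (enum fp_unit unit)
{
  (void) unit;
  return true;
}

/* <rounding_insn><mode>2, i.e. floor, ceil and trunc.  */
static bool
can_inline_floor_ceil_trunc (enum fp_unit unit, bool trapping_math,
                             bool fp_int_builtin_inexact)
{
  /* A spurious "inexact" is acceptable if the user allowed it, or if
     code is compiled assuming floating-point exceptions are not
     user-visible (-fno-trapping-math).  */
  bool inexact_ok = fp_int_builtin_inexact || !trapping_math;

  switch (unit)
    {
    case FP_SSE4_1:
      /* ROUND_NO_EXC suppresses "inexact", so always OK.  */
      return true;
    case FP_X87:
    case FP_SSE2:
      /* These sequences may raise "inexact" for noninteger inputs.  */
      return inexact_ok;
    }
  return false;
}
```

With -ftrapping-math (GCC's default) plus -fno-fp-int-builtin-inexact, the model leaves only rint and the SSE4.1 forms inlinable, which matches what the *-rint-inline-2.c and sse4_1-rint-inline.c tests check.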
Index: gcc/common.opt
===================================================================
--- gcc/common.opt	(revision 236740)
+++ gcc/common.opt	(working copy)
@@ -1330,6 +1330,10 @@ Enum(fp_contract_mode) String(on) Value(FP_CONTRAC
 EnumValue
 Enum(fp_contract_mode) String(fast) Value(FP_CONTRACT_FAST)
 
+ffp-int-builtin-inexact
+Common Report Var(flag_fp_int_builtin_inexact) Optimization
+Allow built-in functions ceil, floor, round, trunc to raise \"inexact\" exceptions.
+
 ; Nonzero means don't put addresses of constant functions in registers.
 ; Used for compiling the Unix kernel, where strange substitutions are
 ; done on the assembly output.
Index: gcc/config/i386/i386.md
===================================================================
--- gcc/config/i386/i386.md	(revision 236740)
+++ gcc/config/i386/i386.md	(working copy)
@@ -15512,25 +15512,31 @@
   [(set (match_operand:XF 0 "register_operand" "=f")
 	(unspec:XF [(match_operand:XF 1 "register_operand" "0")]
 		   UNSPEC_FRNDINT))]
-  "TARGET_USE_FANCY_MATH_387
-   && flag_unsafe_math_optimizations"
+  "TARGET_USE_FANCY_MATH_387"
   "frndint"
   [(set_attr "type" "fpspc")
    (set_attr "znver1_decode" "vector")
    (set_attr "mode" "XF")])
 
+(define_insn "rint<mode>2_frndint"
+  [(set (match_operand:MODEF 0 "register_operand" "=f")
+	(unspec:MODEF [(match_operand:MODEF 1 "register_operand" "0")]
+		      UNSPEC_FRNDINT))]
+  "TARGET_USE_FANCY_MATH_387"
+  "frndint"
+  [(set_attr "type" "fpspc")
+   (set_attr "znver1_decode" "vector")
+   (set_attr "mode" "<MODE>")])
+
 (define_expand "rint<mode>2"
   [(use (match_operand:MODEF 0 "register_operand"))
    (use (match_operand:MODEF 1 "register_operand"))]
   "(TARGET_USE_FANCY_MATH_387
     && (!(SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH)
-	|| TARGET_MIX_SSE_I387)
-    && flag_unsafe_math_optimizations)
-   || (SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH
-       && !flag_trapping_math)"
+	|| TARGET_MIX_SSE_I387))
+   || (SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH)"
 {
-  if (SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH
-      && !flag_trapping_math)
+  if (SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH)
     {
       if (TARGET_ROUND)
 	emit_insn (gen_sse4_1_round<mode>2
@@ -15539,15 +15545,7 @@
 	ix86_expand_rint (operands[0], operands[1]);
     }
   else
-    {
-      rtx op0 = gen_reg_rtx (XFmode);
-      rtx op1 = gen_reg_rtx (XFmode);
-
-      emit_insn (gen_extend<mode>xf2 (op1, operands[1]));
-      emit_insn (gen_rintxf2 (op0, op1));
-
-      emit_insn (gen_truncxf<mode>2_i387_noop (operands[0], op0));
-    }
+    emit_insn (gen_rint<mode>2_frndint (operands[0], operands[1]));
   DONE;
 })
 
@@ -15770,13 +15768,13 @@
 			    (UNSPEC_FIST_CEIL "CEIL")])
 
 ;; Rounding mode control word calculation could clobber FLAGS_REG.
-(define_insn_and_split "frndintxf2_<rounding>"
-  [(set (match_operand:XF 0 "register_operand")
-	(unspec:XF [(match_operand:XF 1 "register_operand")]
+(define_insn_and_split "frndint<mode>2_<rounding>"
+  [(set (match_operand:X87MODEF 0 "register_operand")
+	(unspec:X87MODEF [(match_operand:X87MODEF 1 "register_operand")]
 		   FRNDINT_ROUNDING))
    (clobber (reg:CC FLAGS_REG))]
   "TARGET_USE_FANCY_MATH_387
-   && flag_unsafe_math_optimizations
+   && (flag_fp_int_builtin_inexact || !flag_trapping_math)
    && can_create_pseudo_p ()"
   "#"
   "&& 1"
@@ -15787,26 +15785,26 @@
   operands[2] = assign_386_stack_local (HImode, SLOT_CW_STORED);
   operands[3] = assign_386_stack_local (HImode, SLOT_CW_<ROUNDING>);
 
-  emit_insn (gen_frndintxf2_<rounding>_i387 (operands[0], operands[1],
-					     operands[2], operands[3]));
+  emit_insn (gen_frndint<mode>2_<rounding>_i387 (operands[0], operands[1],
+						 operands[2], operands[3]));
   DONE;
 }
   [(set_attr "type" "frndint")
    (set_attr "i387_cw" "<rounding>")
-   (set_attr "mode" "XF")])
+   (set_attr "mode" "<MODE>")])
 
-(define_insn "frndintxf2_<rounding>_i387"
-  [(set (match_operand:XF 0 "register_operand" "=f")
-	(unspec:XF [(match_operand:XF 1 "register_operand" "0")]
-		   FRNDINT_ROUNDING))
+(define_insn "frndint<mode>2_<rounding>_i387"
+  [(set (match_operand:X87MODEF 0 "register_operand" "=f")
+	(unspec:X87MODEF [(match_operand:X87MODEF 1 "register_operand" "0")]
+			 FRNDINT_ROUNDING))
    (use (match_operand:HI 2 "memory_operand" "m"))
    (use (match_operand:HI 3 "memory_operand" "m"))]
   "TARGET_USE_FANCY_MATH_387
-   && flag_unsafe_math_optimizations"
+   && (flag_fp_int_builtin_inexact || !flag_trapping_math)"
   "fldcw\t%3\n\tfrndint\n\tfldcw\t%2"
   [(set_attr "type" "frndint")
    (set_attr "i387_cw" "<rounding>")
-   (set_attr "mode" "XF")])
+   (set_attr "mode" "<MODE>")])
 
 (define_expand "<rounding_insn>xf2"
   [(parallel [(set (match_operand:XF 0 "register_operand")
@@ -15814,7 +15812,7 @@
 			     FRNDINT_ROUNDING))
 	      (clobber (reg:CC FLAGS_REG))])]
   "TARGET_USE_FANCY_MATH_387
-   && flag_unsafe_math_optimizations")
+   && (flag_fp_int_builtin_inexact || !flag_trapping_math)")
 
 (define_expand "<rounding_insn><mode>2"
   [(parallel [(set (match_operand:MODEF 0 "register_operand")
@@ -15824,16 +15822,17 @@
   "(TARGET_USE_FANCY_MATH_387
     && (!(SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH)
 	|| TARGET_MIX_SSE_I387)
-    && flag_unsafe_math_optimizations)
+    && (flag_fp_int_builtin_inexact || !flag_trapping_math))
    || (SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH
-       && !flag_trapping_math)"
+       && (TARGET_ROUND || !flag_trapping_math || flag_fp_int_builtin_inexact))"
 {
   if (SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH
-      && !flag_trapping_math)
+      && (TARGET_ROUND || !flag_trapping_math || flag_fp_int_builtin_inexact))
     {
       if (TARGET_ROUND)
 	emit_insn (gen_sse4_1_round<mode>2
-		   (operands[0], operands[1], GEN_INT (ROUND_<ROUNDING>)));
+		   (operands[0], operands[1], GEN_INT (ROUND_<ROUNDING>
+						       | ROUND_NO_EXC)));
       else if (TARGET_64BIT || (<MODE>mode != DFmode))
 	{
 	  if (ROUND_<ROUNDING> == ROUND_FLOOR)
@@ -15858,16 +15857,7 @@
 	}
     }
   else
-    {
-      rtx op0, op1;
-
-      op0 = gen_reg_rtx (XFmode);
-      op1 = gen_reg_rtx (XFmode);
-      emit_insn (gen_extend<mode>xf2 (op1, operands[1]));
-      emit_insn (gen_frndintxf2_<rounding> (op0, op1));
-
-      emit_insn (gen_truncxf<mode>2_i387_noop (operands[0], op0));
-    }
+    emit_insn (gen_frndint<mode>2_<rounding> (operands[0], operands[1]));
   DONE;
 })
 
Index: gcc/doc/invoke.texi
===================================================================
--- gcc/doc/invoke.texi	(revision 236740)
+++ gcc/doc/invoke.texi	(working copy)
@@ -370,9 +370,9 @@ Objective-C and Objective-C++ Dialects}.
 -flto-partition=@var{alg} -fmerge-all-constants @gol
 -fmerge-constants -fmodulo-sched -fmodulo-sched-allow-regmoves @gol
 -fmove-loop-invariants -fno-branch-count-reg @gol
--fno-defer-pop -fno-function-cse -fno-guess-branch-probability @gol
--fno-inline -fno-math-errno -fno-peephole -fno-peephole2 @gol
--fno-sched-interblock -fno-sched-spec -fno-signed-zeros @gol
+-fno-defer-pop -fno-fp-int-builtin-inexact -fno-function-cse @gol
+-fno-guess-branch-probability -fno-inline -fno-math-errno -fno-peephole @gol
+-fno-peephole2 -fno-sched-interblock -fno-sched-spec -fno-signed-zeros @gol
 -fno-toplevel-reorder -fno-trapping-math -fno-zero-initialized-in-bss @gol
 -fomit-frame-pointer -foptimize-sibling-calls @gol
 -fpartial-inlining -fpeel-loops -fpredictive-commoning @gol
@@ -8531,6 +8531,24 @@ The default is @option{-fno-signaling-nans}.
 This option is experimental and does not currently guarantee to
 disable all GCC optimizations that affect signaling NaN behavior.
 
+@item -fno-fp-int-builtin-inexact
+@opindex fno-fp-int-builtin-inexact
+Do not allow the built-in functions @code{ceil}, @code{floor},
+@code{round} and @code{trunc}, and their @code{float} and @code{long
+double} variants, to generate code that raises the ``inexact''
+floating-point exception for noninteger arguments.  ISO C99 and C11
+allow these functions to raise the ``inexact'' exception, but ISO/IEC
+TS 18661-1:2014, the C bindings to IEEE 754-2008, does not allow these
+functions to do so.
+
+The default is @option{-ffp-int-builtin-inexact}, allowing the
+exception to be raised.  This option does nothing unless
+@option{-ftrapping-math} is in effect.
+
+Even if @option{-fno-fp-int-builtin-inexact} is used, if the functions
+generate a call to a library function then the ``inexact'' exception
+may be raised if the library implementation does not follow TS 18661.
+
 @item -fsingle-precision-constant
 @opindex fsingle-precision-constant
 Treat floating-point constants as single precision instead of
Index: gcc/testsuite/gcc.dg/torture/builtin-fp-int-inexact.c
===================================================================
--- gcc/testsuite/gcc.dg/torture/builtin-fp-int-inexact.c	(nonexistent)
+++ gcc/testsuite/gcc.dg/torture/builtin-fp-int-inexact.c	(working copy)
@@ -0,0 +1,72 @@
+/* Test -fno-fp-int-builtin-inexact.  */
+/* { dg-do run } */
+/* { dg-options "-fno-fp-int-builtin-inexact" } */
+/* { dg-add-options c99_runtime } */
+/* { dg-require-effective-target fenv_exceptions } */
+
+#include <fenv.h>
+
+/* Define functions locally to ensure that if the calls are not
+   expanded inline, failures do not occur because of libm raising
+   "inexact".  */
+
+#define LOCAL_FN(NAME, TYPE)			\
+  __attribute__ ((noinline, noclone)) TYPE	\
+  NAME (TYPE x)					\
+  {						\
+    return x;					\
+  }
+
+#define LOCAL_FNS(NAME)				\
+  LOCAL_FN (NAME, double)			\
+  LOCAL_FN (NAME ## f, float)			\
+  LOCAL_FN (NAME ## l, long double)
+
+LOCAL_FNS (ceil)
+LOCAL_FNS (floor)
+LOCAL_FNS (round)
+LOCAL_FNS (trunc)
+
+extern void abort (void);
+extern void exit (int);
+
+#define TEST(FN, TYPE)				\
+  do						\
+    {						\
+      volatile TYPE a = 1.5, b;			\
+      b = FN (a);				\
+      if (fetestexcept (FE_INEXACT))		\
+	abort ();				\
+    }						\
+  while (0)
+
+#define FN_TESTS(FN)					\
+  do							\
+    {							\
+      TEST (__builtin_ ## FN, double);			\
+      TEST (__builtin_ ## FN ## f, float);		\
+      TEST (__builtin_ ## FN ## l, long double);	\
+    }							\
+  while (0)
+
+static void
+main_test (void)
+{
+  FN_TESTS (ceil);
+  FN_TESTS (floor);
+  FN_TESTS (round);
+  FN_TESTS (trunc);
+}
+
+/* This file may be included by architecture-specific tests.  */
+
+#ifndef ARCH_MAIN
+
+int
+main (void)
+{
+  main_test ();
+  exit (0);
+}
+
+#endif
Index: gcc/testsuite/gcc.target/i386/387-builtin-fp-int-inexact.c
===================================================================
--- gcc/testsuite/gcc.target/i386/387-builtin-fp-int-inexact.c	(nonexistent)
+++ gcc/testsuite/gcc.target/i386/387-builtin-fp-int-inexact.c	(working copy)
@@ -0,0 +1,7 @@
+/* Test -fno-fp-int-builtin-inexact for 387.  */
+/* { dg-do run } */
+/* { dg-options "-O2 -mfancy-math-387 -mfpmath=387 -fno-fp-int-builtin-inexact" } */
+/* { dg-add-options c99_runtime } */
+/* { dg-require-effective-target fenv_exceptions } */
+
+#include "../../gcc.dg/torture/builtin-fp-int-inexact.c"
Index: gcc/testsuite/gcc.target/i386/387-rint-inline-1.c
===================================================================
--- gcc/testsuite/gcc.target/i386/387-rint-inline-1.c	(nonexistent)
+++ gcc/testsuite/gcc.target/i386/387-rint-inline-1.c	(working copy)
@@ -0,0 +1,36 @@
+/* Test rint and related functions expanded inline for 387.  All
+   should be expanded when spurious "inexact" allowed.  */
+/* { dg-do compile } */
+/* { dg-options "-O2 -mfancy-math-387 -mfpmath=387 -ffp-int-builtin-inexact" } */
+/* { dg-add-options c99_runtime } */
+
+#define TEST(FN, TYPE)				\
+  do						\
+    {						\
+      volatile TYPE a = 1.5, b;			\
+      b = FN (a);				\
+    }						\
+  while (0)
+
+#define FN_TESTS(FN)					\
+  do							\
+    {							\
+      TEST (__builtin_ ## FN, double);			\
+      TEST (__builtin_ ## FN ## f, float);		\
+      TEST (__builtin_ ## FN ## l, long double);	\
+    }							\
+  while (0)
+
+void
+test (void)
+{
+  FN_TESTS (rint);
+  FN_TESTS (ceil);
+  FN_TESTS (floor);
+  FN_TESTS (trunc);
+}
+
+/* { dg-final { scan-assembler-not "\[ \t\]rint" } } */
+/* { dg-final { scan-assembler-not "\[ \t\]ceil" } } */
+/* { dg-final { scan-assembler-not "\[ \t\]floor" } } */
+/* { dg-final { scan-assembler-not "\[ \t\]trunc" } } */
Index: gcc/testsuite/gcc.target/i386/387-rint-inline-2.c
===================================================================
--- gcc/testsuite/gcc.target/i386/387-rint-inline-2.c	(nonexistent)
+++ gcc/testsuite/gcc.target/i386/387-rint-inline-2.c	(working copy)
@@ -0,0 +1,30 @@
+/* Test rint and related functions expanded inline for 387.  rint
+   should be expanded even when spurious "inexact" not allowed.  */
+/* { dg-do compile } */
+/* { dg-options "-O2 -mfancy-math-387 -mfpmath=387 -fno-fp-int-builtin-inexact" } */
+/* { dg-add-options c99_runtime } */
+
+#define TEST(FN, TYPE)				\
+  do						\
+    {						\
+      volatile TYPE a = 1.5, b;			\
+      b = FN (a);				\
+    }						\
+  while (0)
+
+#define FN_TESTS(FN)					\
+  do							\
+    {							\
+      TEST (__builtin_ ## FN, double);			\
+      TEST (__builtin_ ## FN ## f, float);		\
+      TEST (__builtin_ ## FN ## l, long double);	\
+    }							\
+  while (0)
+
+void
+test (void)
+{
+  FN_TESTS (rint);
+}
+
+/* { dg-final { scan-assembler-not "\[ \t\]rint" } } */
Index: gcc/testsuite/gcc.target/i386/sse2-builtin-fp-int-inexact.c
===================================================================
--- gcc/testsuite/gcc.target/i386/sse2-builtin-fp-int-inexact.c	(nonexistent)
+++ gcc/testsuite/gcc.target/i386/sse2-builtin-fp-int-inexact.c	(working copy)
@@ -0,0 +1,12 @@
+/* Test -fno-fp-int-builtin-inexact for SSE 2.  */
+/* { dg-do run } */
+/* { dg-options "-O2 -msse2 -mfpmath=sse -fno-fp-int-builtin-inexact" } */
+/* { dg-add-options c99_runtime } */
+/* { dg-require-effective-target fenv_exceptions } */
+/* { dg-require-effective-target sse2 } */
+
+#include "sse2-check.h"
+
+#define main_test sse2_test
+#define ARCH_MAIN
+#include "../../gcc.dg/torture/builtin-fp-int-inexact.c"
Index: gcc/testsuite/gcc.target/i386/sse2-rint-inline-1.c
===================================================================
--- gcc/testsuite/gcc.target/i386/sse2-rint-inline-1.c	(nonexistent)
+++ gcc/testsuite/gcc.target/i386/sse2-rint-inline-1.c	(working copy)
@@ -0,0 +1,36 @@
+/* Test rint and related functions expanded inline for SSE2.  All
+   should be expanded when spurious "inexact" allowed.  */
+/* { dg-do compile } */
+/* { dg-options "-O2 -msse2 -mfpmath=sse -ffp-int-builtin-inexact" } */
+/* { dg-add-options c99_runtime } */
+/* { dg-require-effective-target sse2 } */
+
+#define TEST(FN, TYPE)				\
+  do						\
+    {						\
+      volatile TYPE a = 1.5, b;			\
+      b = FN (a);				\
+    }						\
+  while (0)
+
+#define FN_TESTS(FN)					\
+  do							\
+    {							\
+      TEST (__builtin_ ## FN, double);			\
+      TEST (__builtin_ ## FN ## f, float);		\
+    }							\
+  while (0)
+
+void
+test (void)
+{
+  FN_TESTS (rint);
+  FN_TESTS (ceil);
+  FN_TESTS (floor);
+  FN_TESTS (trunc);
+}
+
+/* { dg-final { scan-assembler-not "\[ \t\]rint" } } */
+/* { dg-final { scan-assembler-not "\[ \t\]ceil" } } */
+/* { dg-final { scan-assembler-not "\[ \t\]floor" } } */
+/* { dg-final { scan-assembler-not "\[ \t\]trunc" } } */
Index: gcc/testsuite/gcc.target/i386/sse2-rint-inline-2.c
===================================================================
--- gcc/testsuite/gcc.target/i386/sse2-rint-inline-2.c	(nonexistent)
+++ gcc/testsuite/gcc.target/i386/sse2-rint-inline-2.c	(working copy)
@@ -0,0 +1,30 @@
+/* Test rint and related functions expanded inline for SSE2.  rint
+   should be expanded even when spurious "inexact" not allowed.  */
+/* { dg-do compile } */
+/* { dg-options "-O2 -msse2 -mfpmath=sse -fno-fp-int-builtin-inexact" } */
+/* { dg-add-options c99_runtime } */
+/* { dg-require-effective-target sse2 } */
+
+#define TEST(FN, TYPE)				\
+  do						\
+    {						\
+      volatile TYPE a = 1.5, b;			\
+      b = FN (a);				\
+    }						\
+  while (0)
+
+#define FN_TESTS(FN)					\
+  do							\
+    {							\
+      TEST (__builtin_ ## FN, double);			\
+      TEST (__builtin_ ## FN ## f, float);		\
+    }							\
+  while (0)
+
+void
+test (void)
+{
+  FN_TESTS (rint);
+}
+
+/* { dg-final { scan-assembler-not "\[ \t\]rint" } } */
Index: gcc/testsuite/gcc.target/i386/sse4_1-builtin-fp-int-inexact.c
===================================================================
--- gcc/testsuite/gcc.target/i386/sse4_1-builtin-fp-int-inexact.c	(nonexistent)
+++ gcc/testsuite/gcc.target/i386/sse4_1-builtin-fp-int-inexact.c	(working copy)
@@ -0,0 +1,12 @@
+/* Test -fno-fp-int-builtin-inexact for SSE 4.1.  */
+/* { dg-do run } */
+/* { dg-options "-O2 -msse4.1 -mfpmath=sse -fno-fp-int-builtin-inexact" } */
+/* { dg-add-options c99_runtime } */
+/* { dg-require-effective-target fenv_exceptions } */
+/* { dg-require-effective-target sse4 } */
+
+#include "sse4_1-check.h"
+
+#define main_test sse4_1_test
+#define ARCH_MAIN
+#include "../../gcc.dg/torture/builtin-fp-int-inexact.c"
Index: gcc/testsuite/gcc.target/i386/sse4_1-rint-inline.c
===================================================================
--- gcc/testsuite/gcc.target/i386/sse4_1-rint-inline.c	(nonexistent)
+++ gcc/testsuite/gcc.target/i386/sse4_1-rint-inline.c	(working copy)
@@ -0,0 +1,36 @@
+/* Test rint and related functions expanded inline for SSE4.1, even
+   when spurious "inexact" not allowed.  */
+/* { dg-do compile } */
+/* { dg-options "-O2 -msse4.1 -mfpmath=sse -fno-fp-int-builtin-inexact" } */
+/* { dg-add-options c99_runtime } */
+/* { dg-require-effective-target sse4 } */
+
+#define TEST(FN, TYPE)				\
+  do						\
+    {						\
+      volatile TYPE a = 1.5, b;			\
+      b = FN (a);				\
+    }						\
+  while (0)
+
+#define FN_TESTS(FN)					\
+  do							\
+    {							\
+      TEST (__builtin_ ## FN, double);			\
+      TEST (__builtin_ ## FN ## f, float);		\
+    }							\
+  while (0)
+
+void
+test (void)
+{
+  FN_TESTS (rint);
+  FN_TESTS (ceil);
+  FN_TESTS (floor);
+  FN_TESTS (trunc);
+}
+
+/* { dg-final { scan-assembler-not "\[ \t\]rint" } } */
+/* { dg-final { scan-assembler-not "\[ \t\]ceil" } } */
+/* { dg-final { scan-assembler-not "\[ \t\]floor" } } */
+/* { dg-final { scan-assembler-not "\[ \t\]trunc" } } */

-- 
Joseph S. Myers
joseph@codesourcery.com
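The argument in the patch description, that narrowing the XFmode frndint result back to SFmode or DFmode is a no-op only because the value was rounded to an integer, can be checked with a small C sketch. It assumes long double is at least as wide as float (true on common ABIs); trunc_via_long_double and narrowing_to_float_is_exact are hypothetical names, and the integer cast stands in for frndint with truncation.

```c
#include <stdbool.h>

/* Hypothetical helper, not GCC code: truncate F toward zero, computing
   in long double, much as the old expansion ran frndint in XFmode on an
   extended SFmode input.  Ignores the sign of a zero result.  */
static long double
trunc_via_long_double (float f)
{
  long double x = f;

  /* NaNs, infinities and values with magnitude >= 2^23 are returned
     unchanged: every finite float that large is already an integer.  */
  if (x != x || x >= 0x1p23L || x <= -0x1p23L)
    return x;
  return (long double) (long long) x;   /* |x| < 2^23, fits easily */
}

/* True if narrowing R to float loses nothing, i.e. an XF->SF
   truncation would really be a no-op for this value.  */
static bool
narrowing_to_float_is_exact (long double r)
{
  float back = (float) r;

  if (r != r)
    return back != back;                /* NaN stays NaN */
  return (long double) back == r;
}
```

Every value produced by rounding a float to an integer survives the narrowing, whereas an arbitrary long double (0.1, or even the integer 2^24 + 1) does not; that second case is why a general-purpose "noop" truncation pattern was unsafe.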
* Re: Add option for whether ceil etc. can raise "inexact", adjust x86 conditions
From: Uros Bizjak @ 2016-05-26 17:39 UTC
To: Joseph Myers; +Cc: gcc-patches, Jan Hubicka

On Thu, May 26, 2016 at 1:46 AM, Joseph Myers <joseph@codesourcery.com> wrote:
> In ISO C99/C11, the ceil, floor, round and trunc functions may or may
> not raise the "inexact" exception for noninteger arguments.
> [...]

x86 part is OK. (We can make several patterns anonymous, but this
should be part of possibly another cleanup patch.)

Thanks,
Uros.
> + > @item -fsingle-precision-constant > @opindex fsingle-precision-constant > Treat floating-point constants as single precision instead of > Index: gcc/testsuite/gcc.dg/torture/builtin-fp-int-inexact.c > =================================================================== > --- gcc/testsuite/gcc.dg/torture/builtin-fp-int-inexact.c (nonexistent) > +++ gcc/testsuite/gcc.dg/torture/builtin-fp-int-inexact.c (working copy) > @@ -0,0 +1,72 @@ > +/* Test -fno-fp-int-builtin-inexact. */ > +/* { dg-do run } */ > +/* { dg-options "-fno-fp-int-builtin-inexact" } */ > +/* { dg-add-options c99_runtime } */ > +/* { dg-require-effective-target fenv_exceptions } */ > + > +#include <fenv.h> > + > +/* Define functions locally to ensure that if the calls are not > + expanded inline, failures do not occur because of libm raising > + "inexact". */ > + > +#define LOCAL_FN(NAME, TYPE) \ > + __attribute__ ((noinline, noclone)) TYPE \ > + NAME (TYPE x) \ > + { \ > + return x; \ > + } > + > +#define LOCAL_FNS(NAME) \ > + LOCAL_FN (NAME, double) \ > + LOCAL_FN (NAME ## f, float) \ > + LOCAL_FN (NAME ## l, long double) > + > +LOCAL_FNS (ceil) > +LOCAL_FNS (floor) > +LOCAL_FNS (round) > +LOCAL_FNS (trunc) > + > +extern void abort (void); > +extern void exit (int); > + > +#define TEST(FN, TYPE) \ > + do \ > + { \ > + volatile TYPE a = 1.5, b; \ > + b = FN (a); \ > + if (fetestexcept (FE_INEXACT)) \ > + abort (); \ > + } \ > + while (0) > + > +#define FN_TESTS(FN) \ > + do \ > + { \ > + TEST (__builtin_ ## FN, double); \ > + TEST (__builtin_ ## FN ## f, float); \ > + TEST (__builtin_ ## FN ## l, long double); \ > + } \ > + while (0) > + > +static void > +main_test (void) > +{ > + FN_TESTS (ceil); > + FN_TESTS (floor); > + FN_TESTS (round); > + FN_TESTS (trunc); > +} > + > +/* This file may be included by architecture-specific tests. 
*/ > + > +#ifndef ARCH_MAIN > + > +int > +main (void) > +{ > + main_test (); > + exit (0); > +} > + > +#endif > Index: gcc/testsuite/gcc.target/i386/387-builtin-fp-int-inexact.c > =================================================================== > --- gcc/testsuite/gcc.target/i386/387-builtin-fp-int-inexact.c (nonexistent) > +++ gcc/testsuite/gcc.target/i386/387-builtin-fp-int-inexact.c (working copy) > @@ -0,0 +1,7 @@ > +/* Test -fno-fp-int-builtin-inexact for 387. */ > +/* { dg-do run } */ > +/* { dg-options "-O2 -mfancy-math-387 -mfpmath=387 -fno-fp-int-builtin-inexact" } */ > +/* { dg-add-options c99_runtime } */ > +/* { dg-require-effective-target fenv_exceptions } */ > + > +#include "../../gcc.dg/torture/builtin-fp-int-inexact.c" > Index: gcc/testsuite/gcc.target/i386/387-rint-inline-1.c > =================================================================== > --- gcc/testsuite/gcc.target/i386/387-rint-inline-1.c (nonexistent) > +++ gcc/testsuite/gcc.target/i386/387-rint-inline-1.c (working copy) > @@ -0,0 +1,36 @@ > +/* Test rint and related functions expanded inline for 387. All > + should be expanded when spurious "inexact" allowed. 
*/ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -mfancy-math-387 -mfpmath=387 -ffp-int-builtin-inexact" } */ > +/* { dg-add-options c99_runtime } */ > + > +#define TEST(FN, TYPE) \ > + do \ > + { \ > + volatile TYPE a = 1.5, b; \ > + b = FN (a); \ > + } \ > + while (0) > + > +#define FN_TESTS(FN) \ > + do \ > + { \ > + TEST (__builtin_ ## FN, double); \ > + TEST (__builtin_ ## FN ## f, float); \ > + TEST (__builtin_ ## FN ## l, long double); \ > + } \ > + while (0) > + > +void > +test (void) > +{ > + FN_TESTS (rint); > + FN_TESTS (ceil); > + FN_TESTS (floor); > + FN_TESTS (trunc); > +} > + > +/* { dg-final { scan-assembler-not "\[ \t\]rint" } } */ > +/* { dg-final { scan-assembler-not "\[ \t\]ceil" } } */ > +/* { dg-final { scan-assembler-not "\[ \t\]floor" } } */ > +/* { dg-final { scan-assembler-not "\[ \t\]trunc" } } */ > Index: gcc/testsuite/gcc.target/i386/387-rint-inline-2.c > =================================================================== > --- gcc/testsuite/gcc.target/i386/387-rint-inline-2.c (nonexistent) > +++ gcc/testsuite/gcc.target/i386/387-rint-inline-2.c (working copy) > @@ -0,0 +1,30 @@ > +/* Test rint and related functions expanded inline for 387. rint > + should be expanded even when spurious "inexact" not allowed. 
*/ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -mfancy-math-387 -mfpmath=387 -fno-fp-int-builtin-inexact" } */ > +/* { dg-add-options c99_runtime } */ > + > +#define TEST(FN, TYPE) \ > + do \ > + { \ > + volatile TYPE a = 1.5, b; \ > + b = FN (a); \ > + } \ > + while (0) > + > +#define FN_TESTS(FN) \ > + do \ > + { \ > + TEST (__builtin_ ## FN, double); \ > + TEST (__builtin_ ## FN ## f, float); \ > + TEST (__builtin_ ## FN ## l, long double); \ > + } \ > + while (0) > + > +void > +test (void) > +{ > + FN_TESTS (rint); > +} > + > +/* { dg-final { scan-assembler-not "\[ \t\]rint" } } */ > Index: gcc/testsuite/gcc.target/i386/sse2-builtin-fp-int-inexact.c > =================================================================== > --- gcc/testsuite/gcc.target/i386/sse2-builtin-fp-int-inexact.c (nonexistent) > +++ gcc/testsuite/gcc.target/i386/sse2-builtin-fp-int-inexact.c (working copy) > @@ -0,0 +1,12 @@ > +/* Test -fno-fp-int-builtin-inexact for SSE 2. */ > +/* { dg-do run } */ > +/* { dg-options "-O2 -msse2 -mfpmath=sse -fno-fp-int-builtin-inexact" } */ > +/* { dg-add-options c99_runtime } */ > +/* { dg-require-effective-target fenv_exceptions } */ > +/* { dg-require-effective-target sse2 } */ > + > +#include "sse2-check.h" > + > +#define main_test sse2_test > +#define ARCH_MAIN > +#include "../../gcc.dg/torture/builtin-fp-int-inexact.c" > Index: gcc/testsuite/gcc.target/i386/sse2-rint-inline-1.c > =================================================================== > --- gcc/testsuite/gcc.target/i386/sse2-rint-inline-1.c (nonexistent) > +++ gcc/testsuite/gcc.target/i386/sse2-rint-inline-1.c (working copy) > @@ -0,0 +1,36 @@ > +/* Test rint and related functions expanded inline for SSE2. All > + should be expanded when spurious "inexact" allowed. 
*/ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -msse2 -mfpmath=sse -ffp-int-builtin-inexact" } */ > +/* { dg-add-options c99_runtime } */ > +/* { dg-require-effective-target sse2 } */ > + > +#define TEST(FN, TYPE) \ > + do \ > + { \ > + volatile TYPE a = 1.5, b; \ > + b = FN (a); \ > + } \ > + while (0) > + > +#define FN_TESTS(FN) \ > + do \ > + { \ > + TEST (__builtin_ ## FN, double); \ > + TEST (__builtin_ ## FN ## f, float); \ > + } \ > + while (0) > + > +void > +test (void) > +{ > + FN_TESTS (rint); > + FN_TESTS (ceil); > + FN_TESTS (floor); > + FN_TESTS (trunc); > +} > + > +/* { dg-final { scan-assembler-not "\[ \t\]rint" } } */ > +/* { dg-final { scan-assembler-not "\[ \t\]ceil" } } */ > +/* { dg-final { scan-assembler-not "\[ \t\]floor" } } */ > +/* { dg-final { scan-assembler-not "\[ \t\]trunc" } } */ > Index: gcc/testsuite/gcc.target/i386/sse2-rint-inline-2.c > =================================================================== > --- gcc/testsuite/gcc.target/i386/sse2-rint-inline-2.c (nonexistent) > +++ gcc/testsuite/gcc.target/i386/sse2-rint-inline-2.c (working copy) > @@ -0,0 +1,30 @@ > +/* Test rint and related functions expanded inline for SSE2. rint > + should be expanded even when spurious "inexact" not allowed. 
*/ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -msse2 -mfpmath=sse -fno-fp-int-builtin-inexact" } */ > +/* { dg-add-options c99_runtime } */ > +/* { dg-require-effective-target sse2 } */ > + > +#define TEST(FN, TYPE) \ > + do \ > + { \ > + volatile TYPE a = 1.5, b; \ > + b = FN (a); \ > + } \ > + while (0) > + > +#define FN_TESTS(FN) \ > + do \ > + { \ > + TEST (__builtin_ ## FN, double); \ > + TEST (__builtin_ ## FN ## f, float); \ > + } \ > + while (0) > + > +void > +test (void) > +{ > + FN_TESTS (rint); > +} > + > +/* { dg-final { scan-assembler-not "\[ \t\]rint" } } */ > Index: gcc/testsuite/gcc.target/i386/sse4_1-builtin-fp-int-inexact.c > =================================================================== > --- gcc/testsuite/gcc.target/i386/sse4_1-builtin-fp-int-inexact.c (nonexistent) > +++ gcc/testsuite/gcc.target/i386/sse4_1-builtin-fp-int-inexact.c (working copy) > @@ -0,0 +1,12 @@ > +/* Test -fno-fp-int-builtin-inexact for SSE 4.1. */ > +/* { dg-do run } */ > +/* { dg-options "-O2 -msse4.1 -mfpmath=sse -fno-fp-int-builtin-inexact" } */ > +/* { dg-add-options c99_runtime } */ > +/* { dg-require-effective-target fenv_exceptions } */ > +/* { dg-require-effective-target sse4 } */ > + > +#include "sse4_1-check.h" > + > +#define main_test sse4_1_test > +#define ARCH_MAIN > +#include "../../gcc.dg/torture/builtin-fp-int-inexact.c" > Index: gcc/testsuite/gcc.target/i386/sse4_1-rint-inline.c > =================================================================== > --- gcc/testsuite/gcc.target/i386/sse4_1-rint-inline.c (nonexistent) > +++ gcc/testsuite/gcc.target/i386/sse4_1-rint-inline.c (working copy) > @@ -0,0 +1,36 @@ > +/* Test rint and related functions expanded inline for SSE4.1, even > + when spurious "inexact" not allowed. 
*/ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -msse4.1 -mfpmath=sse -fno-fp-int-builtin-inexact" } */ > +/* { dg-add-options c99_runtime } */ > +/* { dg-require-effective-target sse4 } */ > + > +#define TEST(FN, TYPE) \ > + do \ > + { \ > + volatile TYPE a = 1.5, b; \ > + b = FN (a); \ > + } \ > + while (0) > + > +#define FN_TESTS(FN) \ > + do \ > + { \ > + TEST (__builtin_ ## FN, double); \ > + TEST (__builtin_ ## FN ## f, float); \ > + } \ > + while (0) > + > +void > +test (void) > +{ > + FN_TESTS (rint); > + FN_TESTS (ceil); > + FN_TESTS (floor); > + FN_TESTS (trunc); > +} > + > +/* { dg-final { scan-assembler-not "\[ \t\]rint" } } */ > +/* { dg-final { scan-assembler-not "\[ \t\]ceil" } } */ > +/* { dg-final { scan-assembler-not "\[ \t\]floor" } } */ > +/* { dg-final { scan-assembler-not "\[ \t\]trunc" } } */ > > -- > Joseph S. Myers > joseph@codesourcery.com
* Re: Add option for whether ceil etc. can raise "inexact", adjust x86 conditions
  2016-05-26 17:39 ` Uros Bizjak
@ 2016-05-27 6:14 ` Jan Hubicka
  2016-05-27 9:03 ` Joseph Myers
  0 siblings, 1 reply; 24+ messages in thread
From: Jan Hubicka @ 2016-05-27 6:14 UTC
To: Uros Bizjak; +Cc: Joseph Myers, gcc-patches, Jan Hubicka

> > +ffp-int-builtin-inexact
> > +Common Report Var(flag_fp_int_builtin_inexact) Optimization
> > +Allow built-in functions ceil, floor, round, trunc to raise \"inexact\" exceptions.

When adding new codegen option which affects the correctness, it is also
necessary to update can_inline_edge_p and inline_call.
(In general it would be great if we had fewer such flags and more stuff
explicitly represented in IL. I am not sure how hard that would be here
and if it is worth the effort.)

Honza
> > +
> > ; Nonzero means don't put addresses of constant functions in registers.
> > ; Used for compiling the Unix kernel, where strange substitutions are
> > ; done on the assembly output.
> > Index: gcc/config/i386/i386.md > > =================================================================== > > --- gcc/config/i386/i386.md (revision 236740) > > +++ gcc/config/i386/i386.md (working copy) > > @@ -15512,25 +15512,31 @@ > > [(set (match_operand:XF 0 "register_operand" "=f") > > (unspec:XF [(match_operand:XF 1 "register_operand" "0")] > > UNSPEC_FRNDINT))] > > - "TARGET_USE_FANCY_MATH_387 > > - && flag_unsafe_math_optimizations" > > + "TARGET_USE_FANCY_MATH_387" > > "frndint" > > [(set_attr "type" "fpspc") > > (set_attr "znver1_decode" "vector") > > (set_attr "mode" "XF")]) > > > > +(define_insn "rint<mode>2_frndint" > > + [(set (match_operand:MODEF 0 "register_operand" "=f") > > + (unspec:MODEF [(match_operand:MODEF 1 "register_operand" "0")] > > + UNSPEC_FRNDINT))] > > + "TARGET_USE_FANCY_MATH_387" > > + "frndint" > > + [(set_attr "type" "fpspc") > > + (set_attr "znver1_decode" "vector") > > + (set_attr "mode" "<MODE>")]) > > + > > (define_expand "rint<mode>2" > > [(use (match_operand:MODEF 0 "register_operand")) > > (use (match_operand:MODEF 1 "register_operand"))] > > "(TARGET_USE_FANCY_MATH_387 > > && (!(SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH) > > - || TARGET_MIX_SSE_I387) > > - && flag_unsafe_math_optimizations) > > - || (SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH > > - && !flag_trapping_math)" > > + || TARGET_MIX_SSE_I387)) > > + || (SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH)" > > { > > - if (SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH > > - && !flag_trapping_math) > > + if (SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH) > > { > > if (TARGET_ROUND) > > emit_insn (gen_sse4_1_round<mode>2 > > @@ -15539,15 +15545,7 @@ > > ix86_expand_rint (operands[0], operands[1]); > > } > > else > > - { > > - rtx op0 = gen_reg_rtx (XFmode); > > - rtx op1 = gen_reg_rtx (XFmode); > > - > > - emit_insn (gen_extend<mode>xf2 (op1, operands[1])); > > - emit_insn (gen_rintxf2 (op0, op1)); > > - > > - emit_insn (gen_truncxf<mode>2_i387_noop 
(operands[0], op0)); > > - } > > + emit_insn (gen_rint<mode>2_frndint (operands[0], operands[1])); > > DONE; > > }) > > > > @@ -15770,13 +15768,13 @@ > > (UNSPEC_FIST_CEIL "CEIL")]) > > > > ;; Rounding mode control word calculation could clobber FLAGS_REG. > > -(define_insn_and_split "frndintxf2_<rounding>" > > - [(set (match_operand:XF 0 "register_operand") > > - (unspec:XF [(match_operand:XF 1 "register_operand")] > > +(define_insn_and_split "frndint<mode>2_<rounding>" > > + [(set (match_operand:X87MODEF 0 "register_operand") > > + (unspec:X87MODEF [(match_operand:X87MODEF 1 "register_operand")] > > FRNDINT_ROUNDING)) > > (clobber (reg:CC FLAGS_REG))] > > "TARGET_USE_FANCY_MATH_387 > > - && flag_unsafe_math_optimizations > > + && (flag_fp_int_builtin_inexact || !flag_trapping_math) > > && can_create_pseudo_p ()" > > "#" > > "&& 1" > > @@ -15787,26 +15785,26 @@ > > operands[2] = assign_386_stack_local (HImode, SLOT_CW_STORED); > > operands[3] = assign_386_stack_local (HImode, SLOT_CW_<ROUNDING>); > > > > - emit_insn (gen_frndintxf2_<rounding>_i387 (operands[0], operands[1], > > - operands[2], operands[3])); > > + emit_insn (gen_frndint<mode>2_<rounding>_i387 (operands[0], operands[1], > > + operands[2], operands[3])); > > DONE; > > } > > [(set_attr "type" "frndint") > > (set_attr "i387_cw" "<rounding>") > > - (set_attr "mode" "XF")]) > > + (set_attr "mode" "<MODE>")]) > > > > -(define_insn "frndintxf2_<rounding>_i387" > > - [(set (match_operand:XF 0 "register_operand" "=f") > > - (unspec:XF [(match_operand:XF 1 "register_operand" "0")] > > - FRNDINT_ROUNDING)) > > +(define_insn "frndint<mode>2_<rounding>_i387" > > + [(set (match_operand:X87MODEF 0 "register_operand" "=f") > > + (unspec:X87MODEF [(match_operand:X87MODEF 1 "register_operand" "0")] > > + FRNDINT_ROUNDING)) > > (use (match_operand:HI 2 "memory_operand" "m")) > > (use (match_operand:HI 3 "memory_operand" "m"))] > > "TARGET_USE_FANCY_MATH_387 > > - && flag_unsafe_math_optimizations" > > + && 
(flag_fp_int_builtin_inexact || !flag_trapping_math)" > > "fldcw\t%3\n\tfrndint\n\tfldcw\t%2" > > [(set_attr "type" "frndint") > > (set_attr "i387_cw" "<rounding>") > > - (set_attr "mode" "XF")]) > > + (set_attr "mode" "<MODE>")]) > > > > (define_expand "<rounding_insn>xf2" > > [(parallel [(set (match_operand:XF 0 "register_operand") > > @@ -15814,7 +15812,7 @@ > > FRNDINT_ROUNDING)) > > (clobber (reg:CC FLAGS_REG))])] > > "TARGET_USE_FANCY_MATH_387 > > - && flag_unsafe_math_optimizations") > > + && (flag_fp_int_builtin_inexact || !flag_trapping_math)") > > > > (define_expand "<rounding_insn><mode>2" > > [(parallel [(set (match_operand:MODEF 0 "register_operand") > > @@ -15824,16 +15822,17 @@ > > "(TARGET_USE_FANCY_MATH_387 > > && (!(SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH) > > || TARGET_MIX_SSE_I387) > > - && flag_unsafe_math_optimizations) > > + && (flag_fp_int_builtin_inexact || !flag_trapping_math)) > > || (SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH > > - && !flag_trapping_math)" > > + && (TARGET_ROUND || !flag_trapping_math || flag_fp_int_builtin_inexact))" > > { > > if (SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH > > - && !flag_trapping_math) > > + && (TARGET_ROUND || !flag_trapping_math || flag_fp_int_builtin_inexact)) > > { > > if (TARGET_ROUND) > > emit_insn (gen_sse4_1_round<mode>2 > > - (operands[0], operands[1], GEN_INT (ROUND_<ROUNDING>))); > > + (operands[0], operands[1], GEN_INT (ROUND_<ROUNDING> > > + | ROUND_NO_EXC))); > > else if (TARGET_64BIT || (<MODE>mode != DFmode)) > > { > > if (ROUND_<ROUNDING> == ROUND_FLOOR) > > @@ -15858,16 +15857,7 @@ > > } > > } > > else > > - { > > - rtx op0, op1; > > - > > - op0 = gen_reg_rtx (XFmode); > > - op1 = gen_reg_rtx (XFmode); > > - emit_insn (gen_extend<mode>xf2 (op1, operands[1])); > > - emit_insn (gen_frndintxf2_<rounding> (op0, op1)); > > - > > - emit_insn (gen_truncxf<mode>2_i387_noop (operands[0], op0)); > > - } > > + emit_insn (gen_frndint<mode>2_<rounding> (operands[0], 
operands[1])); > > DONE; > > }) > > > > Index: gcc/doc/invoke.texi > > =================================================================== > > --- gcc/doc/invoke.texi (revision 236740) > > +++ gcc/doc/invoke.texi (working copy) > > @@ -370,9 +370,9 @@ Objective-C and Objective-C++ Dialects}. > > -flto-partition=@var{alg} -fmerge-all-constants @gol > > -fmerge-constants -fmodulo-sched -fmodulo-sched-allow-regmoves @gol > > -fmove-loop-invariants -fno-branch-count-reg @gol > > --fno-defer-pop -fno-function-cse -fno-guess-branch-probability @gol > > --fno-inline -fno-math-errno -fno-peephole -fno-peephole2 @gol > > --fno-sched-interblock -fno-sched-spec -fno-signed-zeros @gol > > +-fno-defer-pop -fno-fp-int-builtin-inexact -fno-function-cse @gol > > +-fno-guess-branch-probability -fno-inline -fno-math-errno -fno-peephole @gol > > +-fno-peephole2 -fno-sched-interblock -fno-sched-spec -fno-signed-zeros @gol > > -fno-toplevel-reorder -fno-trapping-math -fno-zero-initialized-in-bss @gol > > -fomit-frame-pointer -foptimize-sibling-calls @gol > > -fpartial-inlining -fpeel-loops -fpredictive-commoning @gol > > @@ -8531,6 +8531,24 @@ The default is @option{-fno-signaling-nans}. > > This option is experimental and does not currently guarantee to > > disable all GCC optimizations that affect signaling NaN behavior. > > > > +@item -fno-fp-int-builtin-inexact > > +@opindex fno-fp-int-builtin-inexact > > +Do not allow the built-in functions @code{ceil}, @code{floor}, > > +@code{round} and @code{trunc}, and their @code{float} and @code{long > > +double} variants, to generate code that raises the ``inexact'' > > +floating-point exception for noninteger arguments. ISO C99 and C11 > > +allow these functions to raise the ``inexact'' exception, but ISO/IEC > > +TS 18661-1:2014, the C bindings to IEEE 754-2008, does not allow these > > +functions to do so. > > + > > +The default is @option{-ffp-int-builtin-inexact}, allowing the > > +exception to be raised. 
This option does nothing unless > > +@option{-ftrapping-math} is in effect. > > + > > +Even if @option{-fno-fp-int-builtin-inexact} is used, if the functions > > +generate a call to a library function then the ``inexact'' exception > > +may be raised if the library implementation does not follow TS 18661. > > + > > @item -fsingle-precision-constant > > @opindex fsingle-precision-constant > > Treat floating-point constants as single precision instead of > > Index: gcc/testsuite/gcc.dg/torture/builtin-fp-int-inexact.c > > =================================================================== > > --- gcc/testsuite/gcc.dg/torture/builtin-fp-int-inexact.c (nonexistent) > > +++ gcc/testsuite/gcc.dg/torture/builtin-fp-int-inexact.c (working copy) > > @@ -0,0 +1,72 @@ > > +/* Test -fno-fp-int-builtin-inexact. */ > > +/* { dg-do run } */ > > +/* { dg-options "-fno-fp-int-builtin-inexact" } */ > > +/* { dg-add-options c99_runtime } */ > > +/* { dg-require-effective-target fenv_exceptions } */ > > + > > +#include <fenv.h> > > + > > +/* Define functions locally to ensure that if the calls are not > > + expanded inline, failures do not occur because of libm raising > > + "inexact". 
*/ > > + > > +#define LOCAL_FN(NAME, TYPE) \ > > + __attribute__ ((noinline, noclone)) TYPE \ > > + NAME (TYPE x) \ > > + { \ > > + return x; \ > > + } > > + > > +#define LOCAL_FNS(NAME) \ > > + LOCAL_FN (NAME, double) \ > > + LOCAL_FN (NAME ## f, float) \ > > + LOCAL_FN (NAME ## l, long double) > > + > > +LOCAL_FNS (ceil) > > +LOCAL_FNS (floor) > > +LOCAL_FNS (round) > > +LOCAL_FNS (trunc) > > + > > +extern void abort (void); > > +extern void exit (int); > > + > > +#define TEST(FN, TYPE) \ > > + do \ > > + { \ > > + volatile TYPE a = 1.5, b; \ > > + b = FN (a); \ > > + if (fetestexcept (FE_INEXACT)) \ > > + abort (); \ > > + } \ > > + while (0) > > + > > +#define FN_TESTS(FN) \ > > + do \ > > + { \ > > + TEST (__builtin_ ## FN, double); \ > > + TEST (__builtin_ ## FN ## f, float); \ > > + TEST (__builtin_ ## FN ## l, long double); \ > > + } \ > > + while (0) > > + > > +static void > > +main_test (void) > > +{ > > + FN_TESTS (ceil); > > + FN_TESTS (floor); > > + FN_TESTS (round); > > + FN_TESTS (trunc); > > +} > > + > > +/* This file may be included by architecture-specific tests. */ > > + > > +#ifndef ARCH_MAIN > > + > > +int > > +main (void) > > +{ > > + main_test (); > > + exit (0); > > +} > > + > > +#endif > > Index: gcc/testsuite/gcc.target/i386/387-builtin-fp-int-inexact.c > > =================================================================== > > --- gcc/testsuite/gcc.target/i386/387-builtin-fp-int-inexact.c (nonexistent) > > +++ gcc/testsuite/gcc.target/i386/387-builtin-fp-int-inexact.c (working copy) > > @@ -0,0 +1,7 @@ > > +/* Test -fno-fp-int-builtin-inexact for 387. 
*/ > > +/* { dg-do run } */ > > +/* { dg-options "-O2 -mfancy-math-387 -mfpmath=387 -fno-fp-int-builtin-inexact" } */ > > +/* { dg-add-options c99_runtime } */ > > +/* { dg-require-effective-target fenv_exceptions } */ > > + > > +#include "../../gcc.dg/torture/builtin-fp-int-inexact.c" > > Index: gcc/testsuite/gcc.target/i386/387-rint-inline-1.c > > =================================================================== > > --- gcc/testsuite/gcc.target/i386/387-rint-inline-1.c (nonexistent) > > +++ gcc/testsuite/gcc.target/i386/387-rint-inline-1.c (working copy) > > @@ -0,0 +1,36 @@ > > +/* Test rint and related functions expanded inline for 387. All > > + should be expanded when spurious "inexact" allowed. */ > > +/* { dg-do compile } */ > > +/* { dg-options "-O2 -mfancy-math-387 -mfpmath=387 -ffp-int-builtin-inexact" } */ > > +/* { dg-add-options c99_runtime } */ > > + > > +#define TEST(FN, TYPE) \ > > + do \ > > + { \ > > + volatile TYPE a = 1.5, b; \ > > + b = FN (a); \ > > + } \ > > + while (0) > > + > > +#define FN_TESTS(FN) \ > > + do \ > > + { \ > > + TEST (__builtin_ ## FN, double); \ > > + TEST (__builtin_ ## FN ## f, float); \ > > + TEST (__builtin_ ## FN ## l, long double); \ > > + } \ > > + while (0) > > + > > +void > > +test (void) > > +{ > > + FN_TESTS (rint); > > + FN_TESTS (ceil); > > + FN_TESTS (floor); > > + FN_TESTS (trunc); > > +} > > + > > +/* { dg-final { scan-assembler-not "\[ \t\]rint" } } */ > > +/* { dg-final { scan-assembler-not "\[ \t\]ceil" } } */ > > +/* { dg-final { scan-assembler-not "\[ \t\]floor" } } */ > > +/* { dg-final { scan-assembler-not "\[ \t\]trunc" } } */ > > Index: gcc/testsuite/gcc.target/i386/387-rint-inline-2.c > > =================================================================== > > --- gcc/testsuite/gcc.target/i386/387-rint-inline-2.c (nonexistent) > > +++ gcc/testsuite/gcc.target/i386/387-rint-inline-2.c (working copy) > > @@ -0,0 +1,30 @@ > > +/* Test rint and related functions expanded inline for 387. 
rint > > + should be expanded even when spurious "inexact" not allowed. */ > > +/* { dg-do compile } */ > > +/* { dg-options "-O2 -mfancy-math-387 -mfpmath=387 -fno-fp-int-builtin-inexact" } */ > > +/* { dg-add-options c99_runtime } */ > > + > > +#define TEST(FN, TYPE) \ > > + do \ > > + { \ > > + volatile TYPE a = 1.5, b; \ > > + b = FN (a); \ > > + } \ > > + while (0) > > + > > +#define FN_TESTS(FN) \ > > + do \ > > + { \ > > + TEST (__builtin_ ## FN, double); \ > > + TEST (__builtin_ ## FN ## f, float); \ > > + TEST (__builtin_ ## FN ## l, long double); \ > > + } \ > > + while (0) > > + > > +void > > +test (void) > > +{ > > + FN_TESTS (rint); > > +} > > + > > +/* { dg-final { scan-assembler-not "\[ \t\]rint" } } */ > > Index: gcc/testsuite/gcc.target/i386/sse2-builtin-fp-int-inexact.c > > =================================================================== > > --- gcc/testsuite/gcc.target/i386/sse2-builtin-fp-int-inexact.c (nonexistent) > > +++ gcc/testsuite/gcc.target/i386/sse2-builtin-fp-int-inexact.c (working copy) > > @@ -0,0 +1,12 @@ > > +/* Test -fno-fp-int-builtin-inexact for SSE 2. */ > > +/* { dg-do run } */ > > +/* { dg-options "-O2 -msse2 -mfpmath=sse -fno-fp-int-builtin-inexact" } */ > > +/* { dg-add-options c99_runtime } */ > > +/* { dg-require-effective-target fenv_exceptions } */ > > +/* { dg-require-effective-target sse2 } */ > > + > > +#include "sse2-check.h" > > + > > +#define main_test sse2_test > > +#define ARCH_MAIN > > +#include "../../gcc.dg/torture/builtin-fp-int-inexact.c" > > Index: gcc/testsuite/gcc.target/i386/sse2-rint-inline-1.c > > =================================================================== > > --- gcc/testsuite/gcc.target/i386/sse2-rint-inline-1.c (nonexistent) > > +++ gcc/testsuite/gcc.target/i386/sse2-rint-inline-1.c (working copy) > > @@ -0,0 +1,36 @@ > > +/* Test rint and related functions expanded inline for SSE2. All > > + should be expanded when spurious "inexact" allowed. 
*/ > > +/* { dg-do compile } */ > > +/* { dg-options "-O2 -msse2 -mfpmath=sse -ffp-int-builtin-inexact" } */ > > +/* { dg-add-options c99_runtime } */ > > +/* { dg-require-effective-target sse2 } */ > > + > > +#define TEST(FN, TYPE) \ > > + do \ > > + { \ > > + volatile TYPE a = 1.5, b; \ > > + b = FN (a); \ > > + } \ > > + while (0) > > + > > +#define FN_TESTS(FN) \ > > + do \ > > + { \ > > + TEST (__builtin_ ## FN, double); \ > > + TEST (__builtin_ ## FN ## f, float); \ > > + } \ > > + while (0) > > + > > +void > > +test (void) > > +{ > > + FN_TESTS (rint); > > + FN_TESTS (ceil); > > + FN_TESTS (floor); > > + FN_TESTS (trunc); > > +} > > + > > +/* { dg-final { scan-assembler-not "\[ \t\]rint" } } */ > > +/* { dg-final { scan-assembler-not "\[ \t\]ceil" } } */ > > +/* { dg-final { scan-assembler-not "\[ \t\]floor" } } */ > > +/* { dg-final { scan-assembler-not "\[ \t\]trunc" } } */ > > Index: gcc/testsuite/gcc.target/i386/sse2-rint-inline-2.c > > =================================================================== > > --- gcc/testsuite/gcc.target/i386/sse2-rint-inline-2.c (nonexistent) > > +++ gcc/testsuite/gcc.target/i386/sse2-rint-inline-2.c (working copy) > > @@ -0,0 +1,30 @@ > > +/* Test rint and related functions expanded inline for SSE2. rint > > + should be expanded even when spurious "inexact" not allowed. 
*/ > > +/* { dg-do compile } */ > > +/* { dg-options "-O2 -msse2 -mfpmath=sse -fno-fp-int-builtin-inexact" } */ > > +/* { dg-add-options c99_runtime } */ > > +/* { dg-require-effective-target sse2 } */ > > + > > +#define TEST(FN, TYPE) \ > > + do \ > > + { \ > > + volatile TYPE a = 1.5, b; \ > > + b = FN (a); \ > > + } \ > > + while (0) > > + > > +#define FN_TESTS(FN) \ > > + do \ > > + { \ > > + TEST (__builtin_ ## FN, double); \ > > + TEST (__builtin_ ## FN ## f, float); \ > > + } \ > > + while (0) > > + > > +void > > +test (void) > > +{ > > + FN_TESTS (rint); > > +} > > + > > +/* { dg-final { scan-assembler-not "\[ \t\]rint" } } */ > > Index: gcc/testsuite/gcc.target/i386/sse4_1-builtin-fp-int-inexact.c > > =================================================================== > > --- gcc/testsuite/gcc.target/i386/sse4_1-builtin-fp-int-inexact.c (nonexistent) > > +++ gcc/testsuite/gcc.target/i386/sse4_1-builtin-fp-int-inexact.c (working copy) > > @@ -0,0 +1,12 @@ > > +/* Test -fno-fp-int-builtin-inexact for SSE 4.1. */ > > +/* { dg-do run } */ > > +/* { dg-options "-O2 -msse4.1 -mfpmath=sse -fno-fp-int-builtin-inexact" } */ > > +/* { dg-add-options c99_runtime } */ > > +/* { dg-require-effective-target fenv_exceptions } */ > > +/* { dg-require-effective-target sse4 } */ > > + > > +#include "sse4_1-check.h" > > + > > +#define main_test sse4_1_test > > +#define ARCH_MAIN > > +#include "../../gcc.dg/torture/builtin-fp-int-inexact.c" > > Index: gcc/testsuite/gcc.target/i386/sse4_1-rint-inline.c > > =================================================================== > > --- gcc/testsuite/gcc.target/i386/sse4_1-rint-inline.c (nonexistent) > > +++ gcc/testsuite/gcc.target/i386/sse4_1-rint-inline.c (working copy) > > @@ -0,0 +1,36 @@ > > +/* Test rint and related functions expanded inline for SSE4.1, even > > + when spurious "inexact" not allowed. 
*/ > > +/* { dg-do compile } */ > > +/* { dg-options "-O2 -msse4.1 -mfpmath=sse -fno-fp-int-builtin-inexact" } */ > > +/* { dg-add-options c99_runtime } */ > > +/* { dg-require-effective-target sse4 } */ > > + > > +#define TEST(FN, TYPE) \ > > + do \ > > + { \ > > + volatile TYPE a = 1.5, b; \ > > + b = FN (a); \ > > + } \ > > + while (0) > > + > > +#define FN_TESTS(FN) \ > > + do \ > > + { \ > > + TEST (__builtin_ ## FN, double); \ > > + TEST (__builtin_ ## FN ## f, float); \ > > + } \ > > + while (0) > > + > > +void > > +test (void) > > +{ > > + FN_TESTS (rint); > > + FN_TESTS (ceil); > > + FN_TESTS (floor); > > + FN_TESTS (trunc); > > +} > > + > > +/* { dg-final { scan-assembler-not "\[ \t\]rint" } } */ > > +/* { dg-final { scan-assembler-not "\[ \t\]ceil" } } */ > > +/* { dg-final { scan-assembler-not "\[ \t\]floor" } } */ > > +/* { dg-final { scan-assembler-not "\[ \t\]trunc" } } */ > > > > -- > > Joseph S. Myers > > joseph@codesourcery.com ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Add option for whether ceil etc. can raise "inexact", adjust x86 conditions

From: Joseph Myers @ 2016-05-27 9:03 UTC
To: Jan Hubicka; +Cc: Uros Bizjak, gcc-patches

On Thu, 26 May 2016, Jan Hubicka wrote:

> > > +ffp-int-builtin-inexact
> > > +Common Report Var(flag_fp_int_builtin_inexact) Optimization
> > > +Allow built-in functions ceil, floor, round, trunc to raise \"inexact\" exceptions.
>
> When adding new codegen option which affects the correctness, it is also
> necessary to update can_inline_edge_p and inline_call.

This patch version adds handling for the new option in those places.
Other changes: the default for the option is corrected so that
-ffp-int-builtin-inexact really is in effect by default, as intended;
the md.texi documentation for the patterns in question is updated to
describe how they are affected by this option.

Add option for whether ceil etc. can raise "inexact", adjust x86 conditions.

In ISO C99/C11, the ceil, floor, round and trunc functions may or may
not raise the "inexact" exception for noninteger arguments. Under TS
18661-1:2014, the C bindings for IEEE 754-2008, these functions are
prohibited from raising "inexact", in line with the general rule that
"inexact" is raised only when the mathematical infinite-precision
result of a function differs from the result after rounding to the
target type.

GCC has no option to select TS 18661 requirements for not raising
"inexact" when expanding built-in versions of these functions inline.
Furthermore, even given such requirements, the conditions on the x86
insn patterns for these functions are unnecessarily restrictive.
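What "not raising inexact" means for an out-of-line implementation can be illustrated with a minimal sketch, assuming `double` is IEEE 754 binary64 and `uint64_t` is available. Computing ceil purely by masking the bit pattern performs no floating-point arithmetic that could round, so it cannot raise "inexact". The function names `ceil_no_inexact` and `check_ceil_no_inexact` are hypothetical; this is not the glibc implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: ceil for IEEE binary64 using only integer
   operations on the bit pattern, so "inexact" is never raised.  */
static double
ceil_no_inexact (double x)
{
  uint64_t bits;
  memcpy (&bits, &x, sizeof bits);
  int unbiased_exp = (int) ((bits >> 52) & 0x7ff) - 1023;

  if (unbiased_exp < 0)
    {
      /* |x| < 1: +-0 is returned unchanged, negative values give -0.0
         and positive values give 1.0.  */
      if ((bits & 0x7fffffffffffffffULL) == 0)
        return x;
      return (bits >> 63) ? -0.0 : 1.0;
    }
  if (unbiased_exp == 1024)
    return x + x;   /* Inf or NaN; a signaling NaN raises only "invalid".  */
  if (unbiased_exp >= 52)
    return x;       /* Too large to have a fractional part.  */

  /* Mask of the mantissa bits that lie below the binary point.  */
  uint64_t frac_mask = 0x000fffffffffffffULL >> unbiased_exp;
  if ((bits & frac_mask) == 0)
    return x;       /* Already an integer.  */
  if (!(bits >> 63))
    bits += 0x0010000000000000ULL >> unbiased_exp;  /* Positive: add one unit.  */
  bits &= ~frac_mask;  /* Drop the fraction (truncation toward zero).  */
  memcpy (&x, &bits, sizeof x);
  return x;
}

/* Small self-check of representative cases.  */
static void
check_ceil_no_inexact (void)
{
  assert (ceil_no_inexact (1.5) == 2.0);
  assert (ceil_no_inexact (-1.5) == -1.0);
  assert (ceil_no_inexact (0.25) == 1.0);
  assert (ceil_no_inexact (3.0) == 3.0);
}
```

Negative noninteger inputs only need their fraction bits cleared; positive ones first get one unit added at the integer position, and a carry out of the mantissa into the exponent field produces the next power of two exactly.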
I'd like to make the out-of-line glibc versions follow the TS 18661
requirements; in the cases where this slows them down (the cases using
x87 floating point), that makes it more important for inline versions
to be used when the user does not care about "inexact".

This patch fixes these issues. A new option -fno-fp-int-builtin-inexact
is added to request TS 18661 rules for these functions; the default
-ffp-int-builtin-inexact reflects that such exceptions are allowed by
C99 and C11. (The intention is that if C2x incorporates TS 18661-1,
then the default would change in C2x mode.)

The x86 built-ins for rint (x87, SSE2 and SSE4.1) are made
unconditionally available (no longer depending on
-funsafe-math-optimizations or -fno-trapping-math); "inexact" is
correct for noninteger arguments to rint. For floor, ceil and trunc,
the x87 and SSE2 built-ins are OK if -ffp-int-builtin-inexact or
-fno-trapping-math is in effect (they may raise "inexact" for
noninteger arguments); the SSE4.1 built-ins are made to use
ROUND_NO_EXC so that they do not raise "inexact" and so are OK
unconditionally.

Now, while there was no semantic reason for depending on
-funsafe-math-optimizations, the insn patterns had such a dependence
because they used gen_truncxf<mode>2_i387_noop to truncate back to
SFmode or DFmode after using frndint in XFmode. In this case a no-op
truncation is safe, because rounding to integer always produces an
exactly representable value (the same reason why IEEE semantics say it
shouldn't produce "inexact"); but of course that insn pattern isn't
safe in general, because it would also match cases where the
truncation is not in fact a no-op.

To allow frndint to be used for SFmode and DFmode without that unsafe
pattern, the relevant frndint patterns are extended to SFmode and
DFmode, or new SFmode and DFmode patterns are added, so that the
frndint operation can be represented in RTL as an operation acting
directly on SFmode or DFmode, without the extension and the
problematic truncation.
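The x86 inlining rules for rint, floor, ceil and trunc can be summarized as a small decision function. This is an illustrative model only, assuming the rules for these four functions are exactly as stated in the patch description: the enum and function names are hypothetical and do not correspond to anything in GCC's i386 back end.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for illustration; not GCC internals.  */
enum fp_unit { UNIT_X87, UNIT_SSE2, UNIT_SSE4_1 };
enum int_fn { FN_RINT, FN_FLOOR, FN_CEIL, FN_TRUNC };

/* Model of whether the x86 built-in may be expanded inline after this
   patch, given -f[no-]fp-int-builtin-inexact and -f[no-]trapping-math.  */
static bool
can_expand_inline (enum int_fn fn, enum fp_unit unit,
                   bool fp_int_builtin_inexact, bool trapping_math)
{
  /* "inexact" is the correct behavior for rint on noninteger input,
     so rint is available unconditionally on every unit.  */
  if (fn == FN_RINT)
    return true;

  /* The SSE4.1 round instructions are used with ROUND_NO_EXC, so they
     never raise "inexact" and are always usable.  */
  if (unit == UNIT_SSE4_1)
    return true;

  /* The x87 and SSE2 expansions may raise "inexact" for noninteger
     arguments, which is acceptable only if explicitly allowed or if
     floating-point exceptions are not being tracked at all.  */
  return fp_int_builtin_inexact || !trapping_math;
}

/* Defaults (-ffp-int-builtin-inexact, -ftrapping-math) allow inlining;
   -fno-fp-int-builtin-inexact restricts only the x87/SSE2 cases.  */
static void
check_can_expand_inline (void)
{
  assert (can_expand_inline (FN_CEIL, UNIT_X87, true, true));
  assert (!can_expand_inline (FN_CEIL, UNIT_X87, false, true));
  assert (can_expand_inline (FN_CEIL, UNIT_SSE4_1, false, true));
}
```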
A generic test of the new option is added, as well as x86-specific
tests: execution tests (including the generic test run with different
x86 options) and scan-assembler tests verifying that functions that
should be inlined with different options are indeed inlined.

I think other architectures are OK for TS 18661-1 semantics already.
Considering those defining "ceil" patterns: aarch64, arm, rs6000 and
s390 use instructions that do not raise "inexact"; nvptx does not
support floating-point exceptions. (This does mean the -f option in
fact only affects one architecture, but I think it should still be a
-f option; it's logically architecture-independent and is expected to
be affected by future -std options, so it is similar to e.g.
-fexcess-precision=, which also does nothing on most architectures but
is implied by -std options.)

Bootstrapped with no regressions on x86_64-pc-linux-gnu. OK to commit?

gcc:
2016-05-26  Joseph Myers  <joseph@codesourcery.com>

	PR target/71276
	PR target/71277
	* common.opt (ffp-int-builtin-inexact): New option.
	* doc/invoke.texi (-fno-fp-int-builtin-inexact): Document.
	* doc/md.texi (floor@var{m}2, btrunc@var{m}2, round@var{m}2)
	(ceil@var{m}2): Document dependence on this option.
	* ipa-inline-transform.c (inline_call): Handle
	flag_fp_int_builtin_inexact.
	* ipa-inline.c (can_inline_edge_p): Likewise.
	* config/i386/i386.md (rintxf2): Do not test
	flag_unsafe_math_optimizations.
	(rint<mode>2_frndint): New define_insn.
	(rint<mode>2): Do not test flag_unsafe_math_optimizations for 387
	or !flag_trapping_math for SSE.  Just use gen_rint<mode>2_frndint
	for 387 instead of extending and truncating.
	(frndintxf2_<rounding>): Test flag_fp_int_builtin_inexact ||
	!flag_trapping_math instead of flag_unsafe_math_optimizations.
	Change to frndint<mode>2_<rounding>.
	(frndintxf2_<rounding>_i387): Likewise.  Change to
	frndint<mode>2_<rounding>_i387.
	(<rounding_insn>xf2): Likewise.
(<rounding_insn><mode>2): Test flag_fp_int_builtin_inexact || !flag_trapping_math instead of flag_unsafe_math_optimizations for x87. Test TARGET_ROUND || !flag_trapping_math || flag_fp_int_builtin_inexact instead of !flag_trapping_math for SSE. Use ROUND_NO_EXC in constant operand of gen_sse4_1_round<mode>2. Just use gen_frndint<mode>2_<rounding> for 387 instead of extending and truncating. gcc/testsuite: 2016-05-26 Joseph Myers <joseph@codesourcery.com> PR target/71276 PR target/71277 * gcc.dg/torture/builtin-fp-int-inexact.c, gcc.target/i386/387-builtin-fp-int-inexact.c, gcc.target/i386/387-rint-inline-1.c, gcc.target/i386/387-rint-inline-2.c, gcc.target/i386/sse2-builtin-fp-int-inexact.c, gcc.target/i386/sse2-rint-inline-1.c, gcc.target/i386/sse2-rint-inline-2.c, gcc.target/i386/sse4_1-builtin-fp-int-inexact.c, gcc.target/i386/sse4_1-rint-inline.c: New tests. Index: gcc/common.opt =================================================================== --- gcc/common.opt (revision 236794) +++ gcc/common.opt (working copy) @@ -1330,6 +1330,10 @@ Enum(fp_contract_mode) String(on) Value(FP_CONTRAC EnumValue Enum(fp_contract_mode) String(fast) Value(FP_CONTRACT_FAST) +ffp-int-builtin-inexact +Common Report Var(flag_fp_int_builtin_inexact) Init(1) Optimization +Allow built-in functions ceil, floor, round, trunc to raise \"inexact\" exceptions. + ; Nonzero means don't put addresses of constant functions in registers. ; Used for compiling the Unix kernel, where strange substitutions are ; done on the assembly output. 
Index: gcc/config/i386/i386.md =================================================================== --- gcc/config/i386/i386.md (revision 236794) +++ gcc/config/i386/i386.md (working copy) @@ -15515,25 +15515,31 @@ [(set (match_operand:XF 0 "register_operand" "=f") (unspec:XF [(match_operand:XF 1 "register_operand" "0")] UNSPEC_FRNDINT))] - "TARGET_USE_FANCY_MATH_387 - && flag_unsafe_math_optimizations" + "TARGET_USE_FANCY_MATH_387" "frndint" [(set_attr "type" "fpspc") (set_attr "znver1_decode" "vector") (set_attr "mode" "XF")]) +(define_insn "rint<mode>2_frndint" + [(set (match_operand:MODEF 0 "register_operand" "=f") + (unspec:MODEF [(match_operand:MODEF 1 "register_operand" "0")] + UNSPEC_FRNDINT))] + "TARGET_USE_FANCY_MATH_387" + "frndint" + [(set_attr "type" "fpspc") + (set_attr "znver1_decode" "vector") + (set_attr "mode" "<MODE>")]) + (define_expand "rint<mode>2" [(use (match_operand:MODEF 0 "register_operand")) (use (match_operand:MODEF 1 "register_operand"))] "(TARGET_USE_FANCY_MATH_387 && (!(SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH) - || TARGET_MIX_SSE_I387) - && flag_unsafe_math_optimizations) - || (SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH - && !flag_trapping_math)" + || TARGET_MIX_SSE_I387)) + || (SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH)" { - if (SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH - && !flag_trapping_math) + if (SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH) { if (TARGET_ROUND) emit_insn (gen_sse4_1_round<mode>2 @@ -15542,15 +15548,7 @@ ix86_expand_rint (operands[0], operands[1]); } else - { - rtx op0 = gen_reg_rtx (XFmode); - rtx op1 = gen_reg_rtx (XFmode); - - emit_insn (gen_extend<mode>xf2 (op1, operands[1])); - emit_insn (gen_rintxf2 (op0, op1)); - - emit_insn (gen_truncxf<mode>2_i387_noop (operands[0], op0)); - } + emit_insn (gen_rint<mode>2_frndint (operands[0], operands[1])); DONE; }) @@ -15773,13 +15771,13 @@ (UNSPEC_FIST_CEIL "CEIL")]) ;; Rounding mode control word calculation could clobber FLAGS_REG. 
-(define_insn_and_split "frndintxf2_<rounding>" - [(set (match_operand:XF 0 "register_operand") - (unspec:XF [(match_operand:XF 1 "register_operand")] +(define_insn_and_split "frndint<mode>2_<rounding>" + [(set (match_operand:X87MODEF 0 "register_operand") + (unspec:X87MODEF [(match_operand:X87MODEF 1 "register_operand")] FRNDINT_ROUNDING)) (clobber (reg:CC FLAGS_REG))] "TARGET_USE_FANCY_MATH_387 - && flag_unsafe_math_optimizations + && (flag_fp_int_builtin_inexact || !flag_trapping_math) && can_create_pseudo_p ()" "#" "&& 1" @@ -15790,26 +15788,26 @@ operands[2] = assign_386_stack_local (HImode, SLOT_CW_STORED); operands[3] = assign_386_stack_local (HImode, SLOT_CW_<ROUNDING>); - emit_insn (gen_frndintxf2_<rounding>_i387 (operands[0], operands[1], - operands[2], operands[3])); + emit_insn (gen_frndint<mode>2_<rounding>_i387 (operands[0], operands[1], + operands[2], operands[3])); DONE; } [(set_attr "type" "frndint") (set_attr "i387_cw" "<rounding>") - (set_attr "mode" "XF")]) + (set_attr "mode" "<MODE>")]) -(define_insn "frndintxf2_<rounding>_i387" - [(set (match_operand:XF 0 "register_operand" "=f") - (unspec:XF [(match_operand:XF 1 "register_operand" "0")] - FRNDINT_ROUNDING)) +(define_insn "frndint<mode>2_<rounding>_i387" + [(set (match_operand:X87MODEF 0 "register_operand" "=f") + (unspec:X87MODEF [(match_operand:X87MODEF 1 "register_operand" "0")] + FRNDINT_ROUNDING)) (use (match_operand:HI 2 "memory_operand" "m")) (use (match_operand:HI 3 "memory_operand" "m"))] "TARGET_USE_FANCY_MATH_387 - && flag_unsafe_math_optimizations" + && (flag_fp_int_builtin_inexact || !flag_trapping_math)" "fldcw\t%3\n\tfrndint\n\tfldcw\t%2" [(set_attr "type" "frndint") (set_attr "i387_cw" "<rounding>") - (set_attr "mode" "XF")]) + (set_attr "mode" "<MODE>")]) (define_expand "<rounding_insn>xf2" [(parallel [(set (match_operand:XF 0 "register_operand") @@ -15817,7 +15815,7 @@ FRNDINT_ROUNDING)) (clobber (reg:CC FLAGS_REG))])] "TARGET_USE_FANCY_MATH_387 - && 
flag_unsafe_math_optimizations") + && (flag_fp_int_builtin_inexact || !flag_trapping_math)") (define_expand "<rounding_insn><mode>2" [(parallel [(set (match_operand:MODEF 0 "register_operand") @@ -15827,16 +15825,17 @@ "(TARGET_USE_FANCY_MATH_387 && (!(SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH) || TARGET_MIX_SSE_I387) - && flag_unsafe_math_optimizations) + && (flag_fp_int_builtin_inexact || !flag_trapping_math)) || (SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH - && !flag_trapping_math)" + && (TARGET_ROUND || !flag_trapping_math || flag_fp_int_builtin_inexact))" { if (SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH - && !flag_trapping_math) + && (TARGET_ROUND || !flag_trapping_math || flag_fp_int_builtin_inexact)) { if (TARGET_ROUND) emit_insn (gen_sse4_1_round<mode>2 - (operands[0], operands[1], GEN_INT (ROUND_<ROUNDING>))); + (operands[0], operands[1], GEN_INT (ROUND_<ROUNDING> + | ROUND_NO_EXC))); else if (TARGET_64BIT || (<MODE>mode != DFmode)) { if (ROUND_<ROUNDING> == ROUND_FLOOR) @@ -15861,16 +15860,7 @@ } } else - { - rtx op0, op1; - - op0 = gen_reg_rtx (XFmode); - op1 = gen_reg_rtx (XFmode); - emit_insn (gen_extend<mode>xf2 (op1, operands[1])); - emit_insn (gen_frndintxf2_<rounding> (op0, op1)); - - emit_insn (gen_truncxf<mode>2_i387_noop (operands[0], op0)); - } + emit_insn (gen_frndint<mode>2_<rounding> (operands[0], operands[1])); DONE; }) Index: gcc/doc/invoke.texi =================================================================== --- gcc/doc/invoke.texi (revision 236794) +++ gcc/doc/invoke.texi (working copy) @@ -370,9 +370,9 @@ Objective-C and Objective-C++ Dialects}. 
-flto-partition=@var{alg} -fmerge-all-constants @gol -fmerge-constants -fmodulo-sched -fmodulo-sched-allow-regmoves @gol -fmove-loop-invariants -fno-branch-count-reg @gol --fno-defer-pop -fno-function-cse -fno-guess-branch-probability @gol --fno-inline -fno-math-errno -fno-peephole -fno-peephole2 @gol --fno-sched-interblock -fno-sched-spec -fno-signed-zeros @gol +-fno-defer-pop -fno-fp-int-builtin-inexact -fno-function-cse @gol +-fno-guess-branch-probability -fno-inline -fno-math-errno -fno-peephole @gol +-fno-peephole2 -fno-sched-interblock -fno-sched-spec -fno-signed-zeros @gol -fno-toplevel-reorder -fno-trapping-math -fno-zero-initialized-in-bss @gol -fomit-frame-pointer -foptimize-sibling-calls @gol -fpartial-inlining -fpeel-loops -fpredictive-commoning @gol @@ -8531,6 +8531,24 @@ The default is @option{-fno-signaling-nans}. This option is experimental and does not currently guarantee to disable all GCC optimizations that affect signaling NaN behavior. +@item -fno-fp-int-builtin-inexact +@opindex fno-fp-int-builtin-inexact +Do not allow the built-in functions @code{ceil}, @code{floor}, +@code{round} and @code{trunc}, and their @code{float} and @code{long +double} variants, to generate code that raises the ``inexact'' +floating-point exception for noninteger arguments. ISO C99 and C11 +allow these functions to raise the ``inexact'' exception, but ISO/IEC +TS 18661-1:2014, the C bindings to IEEE 754-2008, does not allow these +functions to do so. + +The default is @option{-ffp-int-builtin-inexact}, allowing the +exception to be raised. This option does nothing unless +@option{-ftrapping-math} is in effect. + +Even if @option{-fno-fp-int-builtin-inexact} is used, if the functions +generate a call to a library function then the ``inexact'' exception +may be raised if the library implementation does not follow TS 18661. 
+ @item -fsingle-precision-constant @opindex fsingle-precision-constant Treat floating-point constants as single precision instead of Index: gcc/doc/md.texi =================================================================== --- gcc/doc/md.texi (revision 236794) +++ gcc/doc/md.texi (working copy) @@ -5554,7 +5554,9 @@ This pattern is not allowed to @code{FAIL}. @item @samp{floor@var{m}2} Store the largest integral value not greater than operand 1 in operand 0. Both operands have mode @var{m}, which is a scalar or vector -floating-point mode. +floating-point mode. If @option{-ffp-int-builtin-inexact} is in +effect, the ``inexact'' exception may be raised for noninteger +operands; otherwise, it may not. This pattern is not allowed to @code{FAIL}. @@ -5562,7 +5564,9 @@ This pattern is not allowed to @code{FAIL}. @item @samp{btrunc@var{m}2} Round operand 1 to an integer, towards zero, and store the result in operand 0. Both operands have mode @var{m}, which is a scalar or -vector floating-point mode. +vector floating-point mode. If @option{-ffp-int-builtin-inexact} is +in effect, the ``inexact'' exception may be raised for noninteger +operands; otherwise, it may not. This pattern is not allowed to @code{FAIL}. @@ -5570,7 +5574,10 @@ This pattern is not allowed to @code{FAIL}. @item @samp{round@var{m}2} Round operand 1 to the nearest integer, rounding away from zero in the event of a tie, and store the result in operand 0. Both operands have -mode @var{m}, which is a scalar or vector floating-point mode. +mode @var{m}, which is a scalar or vector floating-point mode. If +@option{-ffp-int-builtin-inexact} is in effect, the ``inexact'' +exception may be raised for noninteger operands; otherwise, it may +not. This pattern is not allowed to @code{FAIL}. @@ -5578,7 +5585,9 @@ This pattern is not allowed to @code{FAIL}. @item @samp{ceil@var{m}2} Store the smallest integral value not less than operand 1 in operand 0. 
Both operands have mode @var{m}, which is a scalar or vector -floating-point mode. +floating-point mode. If @option{-ffp-int-builtin-inexact} is in +effect, the ``inexact'' exception may be raised for noninteger +operands; otherwise, it may not. This pattern is not allowed to @code{FAIL}. Index: gcc/ipa-inline-transform.c =================================================================== --- gcc/ipa-inline-transform.c (revision 236794) +++ gcc/ipa-inline-transform.c (working copy) @@ -369,6 +369,8 @@ inline_call (struct cgraph_edge *e, bool update_or != opt_for_fn (to->decl, flag_associative_math) || opt_for_fn (callee->decl, flag_reciprocal_math) != opt_for_fn (to->decl, flag_reciprocal_math) + || opt_for_fn (callee->decl, flag_fp_int_builtin_inexact) + != opt_for_fn (to->decl, flag_fp_int_builtin_inexact) || opt_for_fn (callee->decl, flag_errno_math) != opt_for_fn (to->decl, flag_errno_math)) { @@ -393,6 +395,8 @@ inline_call (struct cgraph_edge *e, bool update_or = opt_for_fn (callee->decl, flag_associative_math); opts.x_flag_reciprocal_math = opt_for_fn (callee->decl, flag_reciprocal_math); + opts.x_flag_fp_int_builtin_inexact + = opt_for_fn (callee->decl, flag_fp_int_builtin_inexact); opts.x_flag_errno_math = opt_for_fn (callee->decl, flag_errno_math); if (dump_file) Index: gcc/ipa-inline.c =================================================================== --- gcc/ipa-inline.c (revision 236794) +++ gcc/ipa-inline.c (working copy) @@ -429,6 +429,7 @@ can_inline_edge_p (struct cgraph_edge *e, bool rep || check_maybe_up (flag_signed_zeros) || check_maybe_down (flag_associative_math) || check_maybe_down (flag_reciprocal_math) + || check_maybe_down (flag_fp_int_builtin_inexact) /* Strictly speaking only when the callee contains function calls that may end up setting errno. 
*/ || check_maybe_up (flag_errno_math))) Index: gcc/testsuite/gcc.dg/torture/builtin-fp-int-inexact.c =================================================================== --- gcc/testsuite/gcc.dg/torture/builtin-fp-int-inexact.c (nonexistent) +++ gcc/testsuite/gcc.dg/torture/builtin-fp-int-inexact.c (working copy) @@ -0,0 +1,72 @@ +/* Test -fno-fp-int-builtin-inexact. */ +/* { dg-do run } */ +/* { dg-options "-fno-fp-int-builtin-inexact" } */ +/* { dg-add-options c99_runtime } */ +/* { dg-require-effective-target fenv_exceptions } */ + +#include <fenv.h> + +/* Define functions locally to ensure that if the calls are not + expanded inline, failures do not occur because of libm raising + "inexact". */ + +#define LOCAL_FN(NAME, TYPE) \ + __attribute__ ((noinline, noclone)) TYPE \ + NAME (TYPE x) \ + { \ + return x; \ + } + +#define LOCAL_FNS(NAME) \ + LOCAL_FN (NAME, double) \ + LOCAL_FN (NAME ## f, float) \ + LOCAL_FN (NAME ## l, long double) + +LOCAL_FNS (ceil) +LOCAL_FNS (floor) +LOCAL_FNS (round) +LOCAL_FNS (trunc) + +extern void abort (void); +extern void exit (int); + +#define TEST(FN, TYPE) \ + do \ + { \ + volatile TYPE a = 1.5, b; \ + b = FN (a); \ + if (fetestexcept (FE_INEXACT)) \ + abort (); \ + } \ + while (0) + +#define FN_TESTS(FN) \ + do \ + { \ + TEST (__builtin_ ## FN, double); \ + TEST (__builtin_ ## FN ## f, float); \ + TEST (__builtin_ ## FN ## l, long double); \ + } \ + while (0) + +static void +main_test (void) +{ + FN_TESTS (ceil); + FN_TESTS (floor); + FN_TESTS (round); + FN_TESTS (trunc); +} + +/* This file may be included by architecture-specific tests. 
*/ + +#ifndef ARCH_MAIN + +int +main (void) +{ + main_test (); + exit (0); +} + +#endif Index: gcc/testsuite/gcc.target/i386/387-builtin-fp-int-inexact.c =================================================================== --- gcc/testsuite/gcc.target/i386/387-builtin-fp-int-inexact.c (nonexistent) +++ gcc/testsuite/gcc.target/i386/387-builtin-fp-int-inexact.c (working copy) @@ -0,0 +1,7 @@ +/* Test -fno-fp-int-builtin-inexact for 387. */ +/* { dg-do run } */ +/* { dg-options "-O2 -mfancy-math-387 -mfpmath=387 -fno-fp-int-builtin-inexact" } */ +/* { dg-add-options c99_runtime } */ +/* { dg-require-effective-target fenv_exceptions } */ + +#include "../../gcc.dg/torture/builtin-fp-int-inexact.c" Index: gcc/testsuite/gcc.target/i386/387-rint-inline-1.c =================================================================== --- gcc/testsuite/gcc.target/i386/387-rint-inline-1.c (nonexistent) +++ gcc/testsuite/gcc.target/i386/387-rint-inline-1.c (working copy) @@ -0,0 +1,36 @@ +/* Test rint and related functions expanded inline for 387. All + should be expanded when spurious "inexact" allowed. 
*/ +/* { dg-do compile } */ +/* { dg-options "-O2 -mfancy-math-387 -mfpmath=387 -ffp-int-builtin-inexact" } */ +/* { dg-add-options c99_runtime } */ + +#define TEST(FN, TYPE) \ + do \ + { \ + volatile TYPE a = 1.5, b; \ + b = FN (a); \ + } \ + while (0) + +#define FN_TESTS(FN) \ + do \ + { \ + TEST (__builtin_ ## FN, double); \ + TEST (__builtin_ ## FN ## f, float); \ + TEST (__builtin_ ## FN ## l, long double); \ + } \ + while (0) + +void +test (void) +{ + FN_TESTS (rint); + FN_TESTS (ceil); + FN_TESTS (floor); + FN_TESTS (trunc); +} + +/* { dg-final { scan-assembler-not "\[ \t\]rint" } } */ +/* { dg-final { scan-assembler-not "\[ \t\]ceil" } } */ +/* { dg-final { scan-assembler-not "\[ \t\]floor" } } */ +/* { dg-final { scan-assembler-not "\[ \t\]trunc" } } */ Index: gcc/testsuite/gcc.target/i386/387-rint-inline-2.c =================================================================== --- gcc/testsuite/gcc.target/i386/387-rint-inline-2.c (nonexistent) +++ gcc/testsuite/gcc.target/i386/387-rint-inline-2.c (working copy) @@ -0,0 +1,30 @@ +/* Test rint and related functions expanded inline for 387. rint + should be expanded even when spurious "inexact" not allowed. 
*/ +/* { dg-do compile } */ +/* { dg-options "-O2 -mfancy-math-387 -mfpmath=387 -fno-fp-int-builtin-inexact" } */ +/* { dg-add-options c99_runtime } */ + +#define TEST(FN, TYPE) \ + do \ + { \ + volatile TYPE a = 1.5, b; \ + b = FN (a); \ + } \ + while (0) + +#define FN_TESTS(FN) \ + do \ + { \ + TEST (__builtin_ ## FN, double); \ + TEST (__builtin_ ## FN ## f, float); \ + TEST (__builtin_ ## FN ## l, long double); \ + } \ + while (0) + +void +test (void) +{ + FN_TESTS (rint); +} + +/* { dg-final { scan-assembler-not "\[ \t\]rint" } } */ Index: gcc/testsuite/gcc.target/i386/sse2-builtin-fp-int-inexact.c =================================================================== --- gcc/testsuite/gcc.target/i386/sse2-builtin-fp-int-inexact.c (nonexistent) +++ gcc/testsuite/gcc.target/i386/sse2-builtin-fp-int-inexact.c (working copy) @@ -0,0 +1,12 @@ +/* Test -fno-fp-int-builtin-inexact for SSE 2. */ +/* { dg-do run } */ +/* { dg-options "-O2 -msse2 -mfpmath=sse -fno-fp-int-builtin-inexact" } */ +/* { dg-add-options c99_runtime } */ +/* { dg-require-effective-target fenv_exceptions } */ +/* { dg-require-effective-target sse2 } */ + +#include "sse2-check.h" + +#define main_test sse2_test +#define ARCH_MAIN +#include "../../gcc.dg/torture/builtin-fp-int-inexact.c" Index: gcc/testsuite/gcc.target/i386/sse2-rint-inline-1.c =================================================================== --- gcc/testsuite/gcc.target/i386/sse2-rint-inline-1.c (nonexistent) +++ gcc/testsuite/gcc.target/i386/sse2-rint-inline-1.c (working copy) @@ -0,0 +1,36 @@ +/* Test rint and related functions expanded inline for SSE2. All + should be expanded when spurious "inexact" allowed. 
*/ +/* { dg-do compile } */ +/* { dg-options "-O2 -msse2 -mfpmath=sse -ffp-int-builtin-inexact" } */ +/* { dg-add-options c99_runtime } */ +/* { dg-require-effective-target sse2 } */ + +#define TEST(FN, TYPE) \ + do \ + { \ + volatile TYPE a = 1.5, b; \ + b = FN (a); \ + } \ + while (0) + +#define FN_TESTS(FN) \ + do \ + { \ + TEST (__builtin_ ## FN, double); \ + TEST (__builtin_ ## FN ## f, float); \ + } \ + while (0) + +void +test (void) +{ + FN_TESTS (rint); + FN_TESTS (ceil); + FN_TESTS (floor); + FN_TESTS (trunc); +} + +/* { dg-final { scan-assembler-not "\[ \t\]rint" } } */ +/* { dg-final { scan-assembler-not "\[ \t\]ceil" } } */ +/* { dg-final { scan-assembler-not "\[ \t\]floor" } } */ +/* { dg-final { scan-assembler-not "\[ \t\]trunc" } } */ Index: gcc/testsuite/gcc.target/i386/sse2-rint-inline-2.c =================================================================== --- gcc/testsuite/gcc.target/i386/sse2-rint-inline-2.c (nonexistent) +++ gcc/testsuite/gcc.target/i386/sse2-rint-inline-2.c (working copy) @@ -0,0 +1,30 @@ +/* Test rint and related functions expanded inline for SSE2. rint + should be expanded even when spurious "inexact" not allowed. 
*/ +/* { dg-do compile } */ +/* { dg-options "-O2 -msse2 -mfpmath=sse -fno-fp-int-builtin-inexact" } */ +/* { dg-add-options c99_runtime } */ +/* { dg-require-effective-target sse2 } */ + +#define TEST(FN, TYPE) \ + do \ + { \ + volatile TYPE a = 1.5, b; \ + b = FN (a); \ + } \ + while (0) + +#define FN_TESTS(FN) \ + do \ + { \ + TEST (__builtin_ ## FN, double); \ + TEST (__builtin_ ## FN ## f, float); \ + } \ + while (0) + +void +test (void) +{ + FN_TESTS (rint); +} + +/* { dg-final { scan-assembler-not "\[ \t\]rint" } } */ Index: gcc/testsuite/gcc.target/i386/sse4_1-builtin-fp-int-inexact.c =================================================================== --- gcc/testsuite/gcc.target/i386/sse4_1-builtin-fp-int-inexact.c (nonexistent) +++ gcc/testsuite/gcc.target/i386/sse4_1-builtin-fp-int-inexact.c (working copy) @@ -0,0 +1,12 @@ +/* Test -fno-fp-int-builtin-inexact for SSE 4.1. */ +/* { dg-do run } */ +/* { dg-options "-O2 -msse4.1 -mfpmath=sse -fno-fp-int-builtin-inexact" } */ +/* { dg-add-options c99_runtime } */ +/* { dg-require-effective-target fenv_exceptions } */ +/* { dg-require-effective-target sse4 } */ + +#include "sse4_1-check.h" + +#define main_test sse4_1_test +#define ARCH_MAIN +#include "../../gcc.dg/torture/builtin-fp-int-inexact.c" Index: gcc/testsuite/gcc.target/i386/sse4_1-rint-inline.c =================================================================== --- gcc/testsuite/gcc.target/i386/sse4_1-rint-inline.c (nonexistent) +++ gcc/testsuite/gcc.target/i386/sse4_1-rint-inline.c (working copy) @@ -0,0 +1,36 @@ +/* Test rint and related functions expanded inline for SSE4.1, even + when spurious "inexact" not allowed. 
*/ +/* { dg-do compile } */ +/* { dg-options "-O2 -msse4.1 -mfpmath=sse -fno-fp-int-builtin-inexact" } */ +/* { dg-add-options c99_runtime } */ +/* { dg-require-effective-target sse4 } */ + +#define TEST(FN, TYPE) \ + do \ + { \ + volatile TYPE a = 1.5, b; \ + b = FN (a); \ + } \ + while (0) + +#define FN_TESTS(FN) \ + do \ + { \ + TEST (__builtin_ ## FN, double); \ + TEST (__builtin_ ## FN ## f, float); \ + } \ + while (0) + +void +test (void) +{ + FN_TESTS (rint); + FN_TESTS (ceil); + FN_TESTS (floor); + FN_TESTS (trunc); +} + +/* { dg-final { scan-assembler-not "\[ \t\]rint" } } */ +/* { dg-final { scan-assembler-not "\[ \t\]ceil" } } */ +/* { dg-final { scan-assembler-not "\[ \t\]floor" } } */ +/* { dg-final { scan-assembler-not "\[ \t\]trunc" } } */ -- Joseph S. Myers joseph@codesourcery.com ^ permalink raw reply [flat|nested] 24+ messages in thread
* Ping Re: Add option for whether ceil etc. can raise "inexact", adjust x86 conditions

From: Joseph Myers @ 2016-06-02 11:54 UTC
To: Jan Hubicka; +Cc: Uros Bizjak, gcc-patches

Ping. This patch
<https://gcc.gnu.org/ml/gcc-patches/2016-05/msg02131.html> is pending
review (for the non-x86-specific parts).

-- 
Joseph S. Myers
joseph@codesourcery.com
* Re: Ping Re: Add option for whether ceil etc. can raise "inexact", adjust x86 conditions

From: Jan Hubicka @ 2016-06-02 12:00 UTC
To: Joseph Myers; +Cc: Jan Hubicka, Uros Bizjak, gcc-patches

> Ping. This patch
> <https://gcc.gnu.org/ml/gcc-patches/2016-05/msg02131.html> is pending
> review (for the non-x86-specific parts).

The inliner bits look fine to me. Of course it is easy to check whether the
function actually calls floor/ceil and thus is affected by this flag. Do you
expect this to matter? I.e. do you expect that codebases will mix both values
of this flag in one project and expect cross-module inlining to work with LTO?

(Dealing with codegen flags in the inliner is really painful. Basically I am
trying to do that on demand now: when I see that a flag blocks inlining in one
of the larger projects I test. We will need a better longer-term strategy
later, I suppose.)

Honza

>
> -- 
> Joseph S. Myers
> joseph@codesourcery.com
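Jan's remark that it is easy to check whether a function actually calls floor/ceil suggests a possible refinement: a mismatch in this flag only needs to block inlining when the callee uses one of the affected built-ins. The following is a minimal sketch of that idea only. The `fn_summary` structure (a list of called function names) is a hypothetical stand-in for GCC's call-graph data, none of these names are GCC APIs, and the patch itself instead adds the flag to the existing option checks in can_inline_edge_p and inline_call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-function summary (illustration only).  */
struct fn_summary
{
  bool fp_int_builtin_inexact;   /* -f[no-]fp-int-builtin-inexact */
  const char *const *calls;      /* NULL-terminated list of callee names */
};

/* Functions whose inline expansion depends on the flag.  */
static const char *const affected_fns[] = {
  "ceil", "ceilf", "ceill", "floor", "floorf", "floorl",
  "round", "roundf", "roundl", "trunc", "truncf", "truncl", NULL
};

static bool
calls_affected_builtin (const struct fn_summary *fn)
{
  for (const char *const *c = fn->calls; *c != NULL; c++)
    for (const char *const *a = affected_fns; *a != NULL; a++)
      if (strcmp (*c, *a) == 0)
        return true;
  return false;
}

/* A flag mismatch only matters if the callee uses an affected built-in.  */
static bool
flag_mismatch_blocks_inlining (const struct fn_summary *caller,
                               const struct fn_summary *callee)
{
  if (caller->fp_int_builtin_inexact == callee->fp_int_builtin_inexact)
    return false;
  return calls_affected_builtin (callee);
}

static void
check_flag_mismatch (void)
{
  static const char *const uses_ceil[] = { "ceil", NULL };
  static const char *const uses_memcpy[] = { "memcpy", NULL };
  struct fn_summary strict_caller = { false, uses_memcpy };
  struct fn_summary loose_ceil = { true, uses_ceil };
  struct fn_summary loose_other = { true, uses_memcpy };
  assert (flag_mismatch_blocks_inlining (&strict_caller, &loose_ceil));
  assert (!flag_mismatch_blocks_inlining (&strict_caller, &loose_other));
}
```

The trade-off is the one Jan alludes to: the extra scan costs a little compile time, but it stops an otherwise irrelevant flag difference from preventing cross-module inlining under LTO.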
* Re: Ping Re: Add option for whether ceil etc. can raise "inexact", adjust x86 conditions

From: Bernd Schmidt @ 2016-06-02 12:24 UTC
To: Jan Hubicka, Joseph Myers; +Cc: Uros Bizjak, gcc-patches

On 06/02/2016 02:00 PM, Jan Hubicka wrote:
>> Ping. This patch
>> <https://gcc.gnu.org/ml/gcc-patches/2016-05/msg02131.html> is pending
>> review (for the non-x86-specific parts).
> The inliner bits look fine to me.

In case that leaves anything unapproved, the remaining parts are OK too,
modulo one question - shouldn't this option be added to the set enabled by
-funsafe-math-optimizations? It looks like one pattern in i386.md used to be
enabled by this option and now is no longer.

Bernd
* Re: Ping Re: Add option for whether ceil etc. can raise "inexact", adjust x86 conditions

From: Joseph Myers @ 2016-06-02 12:29 UTC
To: Bernd Schmidt; +Cc: Jan Hubicka, Uros Bizjak, gcc-patches

On Thu, 2 Jun 2016, Bernd Schmidt wrote:

> On 06/02/2016 02:00 PM, Jan Hubicka wrote:
> > > Ping. This patch
> > > <https://gcc.gnu.org/ml/gcc-patches/2016-05/msg02131.html> is pending
> > > review (for the non-x86-specific parts).
> > The inliner bits look fine to me.
>
> In case that leaves anything unapproved, the remaining parts are OK too,
> modulo one question - shouldn't this option be added to the set enabled by
> -funsafe-math-optimizations? It looks like one pattern in i386.md used to be
> enabled by this option and now is no longer.

-funsafe-math-optimizations implies -fno-trapping-math, which causes this
option to have no effect (the difference between -ffp-int-builtin-inexact
and -fno-fp-int-builtin-inexact is only meaningful under -ftrapping-math,
since it relates to the raising of exceptions). The patterns testing this
option all test (flag_fp_int_builtin_inexact || !flag_trapping_math).

-- 
Joseph S. Myers
joseph@codesourcery.com
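Why the option is irrelevant under -fno-trapping-math follows directly from the shape of the pattern condition. A tiny C model makes the four cases explicit; the function names are hypothetical, and the two booleans simply stand for flag_fp_int_builtin_inexact and flag_trapping_math.

```c
#include <assert.h>
#include <stdbool.h>

/* Model of the condition tested by the patterns:
   flag_fp_int_builtin_inexact || !flag_trapping_math.  */
static bool
pattern_enabled (bool fp_int_builtin_inexact, bool trapping_math)
{
  return fp_int_builtin_inexact || !trapping_math;
}

/* True if flipping -f[no-]fp-int-builtin-inexact changes the outcome
   for the given -f[no-]trapping-math setting.  */
static bool
option_has_effect (bool trapping_math)
{
  return pattern_enabled (true, trapping_math)
         != pattern_enabled (false, trapping_math);
}

static void
check_option_has_effect (void)
{
  assert (option_has_effect (true));    /* -ftrapping-math (the default) */
  assert (!option_has_effect (false));  /* -fno-trapping-math, e.g. as
                                           implied by
                                           -funsafe-math-optimizations */
}
```

So adding the option to the set enabled by -funsafe-math-optimizations would change nothing: once trapping_math is false, the condition is already true.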
* Re: Ping Re: Add option for whether ceil etc. can raise "inexact", adjust x86 conditions
From: Joseph Myers @ 2016-06-02 12:32 UTC
To: Jan Hubicka; +Cc: Uros Bizjak, gcc-patches

On Thu, 2 Jun 2016, Jan Hubicka wrote:

> > Ping. This patch <https://gcc.gnu.org/ml/gcc-patches/2016-05/msg02131.html> is pending review (for the non-x86-specific parts).
> The inliner bits looks fine to me. Of course it is easy to check whether the function actually calls floor/ceil and thus is affected by this flag. Do you expect this to matter? I.e. do you expect that codebases will mix both values of this flag in one project and expect cross-module inlining to work with LTO?

I don't expect much use of this option until -fno-fp-int-builtin-inexact is implied by -std=c2x / -std=gnu2x (supposing TS 18661-1 gets integrated for the next major revision of the C standard - not the bug-fix revision due first). At that point people might start using those options (I'd still expect people to be consistent within one project, but maybe not for separately maintained libraries).

-- 
Joseph S. Myers
joseph@codesourcery.com

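Jan's remark that one could check whether a function actually calls floor/ceil suggests a finer-grained inlining rule. The sketch below is hypothetical: `struct fn_summary` and `can_inline_fp_int` are illustrative names, not GCC's data structures, and nothing here claims that GCC's `can_inline_edge_p` works this way.

```c
#include <stdbool.h>

/* Hypothetical per-function summary (not GCC's data structures). */
struct fn_summary {
    bool fp_int_builtin_inexact;  /* value of -f[no-]fp-int-builtin-inexact */
    bool calls_fp_int_builtin;    /* body calls ceil/floor/round/trunc */
};

/* A flag mismatch only blocks inlining when the callee actually
   uses one of the built-ins whose expansion the flag controls. */
static bool can_inline_fp_int(const struct fn_summary *caller,
                              const struct fn_summary *callee)
{
    if (caller->fp_int_builtin_inexact == callee->fp_int_builtin_inexact)
        return true;
    return !callee->calls_fp_int_builtin;
}
```

As Joseph notes, mixing both settings is only expected across separately maintained libraries, so the extra precision may rarely matter in practice.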
* Re: Add option for whether ceil etc. can raise "inexact", adjust x86 conditions
From: Martin Jambor @ 2017-08-15 14:11 UTC
To: Joseph Myers; +Cc: Jan Hubicka, Uros Bizjak, gcc-patches

Hi Joseph,

On Thu, May 26, 2016 at 09:02:02PM +0000, Joseph Myers wrote:
> On Thu, 26 May 2016, Jan Hubicka wrote:
>
> > > > +ffp-int-builtin-inexact
> > > > +Common Report Var(flag_fp_int_builtin_inexact) Optimization
> > > > +Allow built-in functions ceil, floor, round, trunc to raise \"inexact\" exceptions.
> >
> > When adding new codegen option which affects the correctness, it is also necessary to update can_inline_edge_p and inline_call.
>
> This patch version adds handling for the new option in those places. Other changes: the default for the option is corrected so that -ffp-int-builtin-inexact really is in effect by default as intended; md.texi documentation for the patterns in question is updated to describe how they are affected by this option.
>
>
> Add option for whether ceil etc. can raise "inexact", adjust x86 conditions.
>
> In ISO C99/C11, the ceil, floor, round and trunc functions may or may not raise the "inexact" exception for noninteger arguments. Under TS 18661-1:2014, the C bindings for IEEE 754-2008, these functions are prohibited from raising "inexact", in line with the general rule that "inexact" is only when the mathematical infinite precision result of a function differs from the result after rounding to the target type.
>
> GCC has no option to select TS 18661 requirements for not raising "inexact" when expanding built-in versions of these functions inline. Furthermore, even given such requirements, the conditions on the x86 insn patterns for these functions are unnecessarily restrictive. I'd like to make the out-of-line glibc versions follow the TS 18661 requirements; in the cases where this slows them down (the cases using x87 floating point), that makes it more important for inline versions to be used when the user does not care about "inexact".

Unfortunately, I have found out that this commit regresses the run-time of 538.imagick_r by about 5% on an AMD Ryzen machine and by 9% on a slightly older Intel machine when compiled with just -O2 (so with generic tuning).

The problem is that ImageMagick spends a lot of time calculating ceil and floor, and even with generic tuning, their library implementations can use the ifunc mechanism to execute an efficient SSE 4.1 implementation on the processors that have it, whereas the inline expansion cannot do so and is much bigger and much, much slower. To give you an idea, this is the profile before and after the change:

| Symbol                           | 237073 (before) | % of runtime | 237074 (after) | % of runtime | sample delta | after/before (%) |
|----------------------------------+-----------------+--------------+----------------+--------------+--------------+------------------|
| MorphologyApply                  |         1058932 |       52.88% |        1043194 |       46.65% |       -15738 |            98.51 |
| MeanShiftImage                   |          508088 |       25.50% |         833378 |       37.43% |       325290 |           164.02 |
| GetVirtualPixelsFromNexus        |          173354 |        8.70% |         168298 |        7.56% |        -5056 |            97.08 |
| SetPixelCacheNexusPixels.isra.10 |          114101 |        5.72% |         112790 |        5.07% |        -1311 |            98.85 |
| __ceil_sse41                     |           21404 |        1.07% |              0 |            0 |       -21404 |             0.00 |
| __floor_sse41                    |           19179 |        0.96% |              0 |            0 |       -19179 |             0.00 |

And all of the sample count increases in MeanShiftImage can be tracked down to the line in the source calculating

  if ((x-floor(x)) < (ceil(x)-x))

I am not sure what to do about this, to me it seems that the -ffp-int-builtin-inexact simply has a wrong default value, at least for x86_64, as it was added in order not to slow code down but does exactly that (all of the slowdown of course disappears when -fno-fp-int-builtin-inexact is used).

Or is the situation somehow more complex?

Martin

* Re: Add option for whether ceil etc. can raise "inexact", adjust x86 conditions
From: Joseph Myers @ 2017-08-15 14:52 UTC
To: Martin Jambor; +Cc: Jan Hubicka, Uros Bizjak, gcc-patches

On Tue, 15 Aug 2017, Martin Jambor wrote:

> I am not sure what to do about this, to me it seems that the -ffp-int-builtin-inexact simply has a wrong default value, at least for x86_64, as it was added in order not to slow code down but does exactly that (all of the slowdown of course disappears when -fno-fp-int-builtin-inexact is used).
>
> Or is the situation somehow more complex?

It's supposed to be that -ffp-int-builtin-inexact allows inexact to be raised, and is on by default, and -fno-fp-int-builtin-inexact is the nondefault option that disallows it from being raised and may result in slower code generation.

As I understand it, your issue is actually with inline SSE expansions of certain functions. Before my patch, those had !flag_trapping_math conditionals. My patch changed that to the logically correct (TARGET_ROUND || !flag_trapping_math || flag_fp_int_builtin_inexact), that being the conditions under which the expansion in question is correct. Your problem is that the expansion, though correct under those conditions, is slow compared to an IFUNC implementation of the library function.

Maybe that means that expansion should be disabled under some conditions where it is correct but suboptimal. It should be kept for TARGET_ROUND, because then it's expanding to a single instruction. But for !TARGET_ROUND, it's a tuning question (e.g. if tuning for a processor that would satisfy TARGET_ROUND, or for -mtune=generic, and building with recent-enough glibc, the expansion should be avoided as suboptimal, on the expectation that at runtime an IFUNC is likely to be available - or given the size of the generic SSE expansion, maybe it should be avoided more generally than that).

-- 
Joseph S. Myers
joseph@codesourcery.com

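The tuning rule Joseph outlines can be written as a small decision function. All names below are hypothetical; the booleans stand in for target, tuning and configure-time information that GCC holds in other forms, and the function is only a model of the proposal, not of any existing i386.md expander.

```c
#include <stdbool.h>

/* Hypothetical model of the proposed rule; none of these names exist in GCC. */
enum fp_int_strategy {
    EXPAND_ROUNDSD,        /* SSE4.1 enabled: a single ROUNDSD/ROUNDSS  */
    CALL_LIBRARY,          /* emit a call; glibc may dispatch via IFUNC */
    EXPAND_SSE2_SEQUENCE   /* long inline SSE2 sequence with fix-ups    */
};

struct codegen_ctx {
    bool target_sse4_1;           /* TARGET_SSE4_1 (formerly TARGET_ROUND)  */
    bool tune_cpu_has_sse4_1;     /* tuning CPU would satisfy TARGET_SSE4_1 */
    bool tune_generic;            /* -mtune=generic                         */
    bool glibc_has_sse41_ifuncs;  /* target glibc known to be recent enough */
    bool fp_int_builtin_inexact;  /* -ffp-int-builtin-inexact               */
    bool trapping_math;           /* -ftrapping-math                        */
};

static enum fp_int_strategy choose_floor_ceil_strategy(const struct codegen_ctx *c)
{
    /* ROUNDSD with ROUND_NO_EXC never raises "inexact": always usable. */
    if (c->target_sse4_1)
        return EXPAND_ROUNDSD;
    /* The SSE2 sequence may raise "inexact", so it is only correct when: */
    if (!(c->fp_int_builtin_inexact || !c->trapping_math))
        return CALL_LIBRARY;
    /* Correct, but probably slower than an SSE4.1 IFUNC at run time. */
    if ((c->tune_cpu_has_sse4_1 || c->tune_generic) && c->glibc_has_sse41_ifuncs)
        return CALL_LIBRARY;
    return EXPAND_SSE2_SEQUENCE;
}
```

Under default flags and -mtune=generic with a recent glibc, this model returns `CALL_LIBRARY`, which is the behaviour that avoids the imagick_r regression.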
* Re: Add option for whether ceil etc. can raise "inexact", adjust x86 conditions
From: Martin Jambor @ 2017-09-13 17:34 UTC
To: Joseph Myers; +Cc: Jan Hubicka, Uros Bizjak, gcc-patches

Hello,

I apologize for not coming back to this; I keep on getting distracted. Anyway...

On Tue, Aug 15, 2017 at 02:20:55PM +0000, Joseph Myers wrote:
> On Tue, 15 Aug 2017, Martin Jambor wrote:
>
> > I am not sure what to do about this, to me it seems that the -ffp-int-builtin-inexact simply has a wrong default value, at least for x86_64, as it was added in order not to slow code down but does exactly that (all of the slowdown of course disappears when -fno-fp-int-builtin-inexact is used).
> >
> > Or is the situation somehow more complex?
>
> It's supposed to be that -ffp-int-builtin-inexact allows inexact to be raised, and is on by default, and -fno-fp-int-builtin-inexact is the nondefault option that disallows it from being raised and may result in slower code generation.
>
> As I understand it, your issue is actually with inline SSE expansions of certain functions. Before my patch, those had !flag_trapping_math conditionals. My patch changed that to the logically correct (TARGET_ROUND || !flag_trapping_math || flag_fp_int_builtin_inexact), that being the conditions under which the expansion in question is correct. Your problem is that the expansion, though correct under those conditions, is slow compared to an IFUNC implementation of the library function.

...that is exactly right (modulo the fact that TARGET_ROUND has meanwhile become TARGET_SSE4_1).

>
> Maybe that means that expansion should be disabled under some conditions where it is correct but suboptimal. It should be kept for TARGET_ROUND, because then it's expanding to a single instruction. But for !TARGET_ROUND, it's a tuning question (e.g. if tuning for a processor that would satisfy TARGET_ROUND, or for -mtune=generic, and building with recent-enough glibc, the expansion should be avoided as suboptimal, on the expectation that at runtime an IFUNC is likely to be available - or given the size of the generic SSE expansion, maybe it should be avoided more generally than that).

This seems to me the best solution. SSE 4.1 is 11 years old, we should be tuning for it in generic tuning. That is also the reason why I do not think run-time checks for SSE 4.1 or an attempt at an internal IFUNC are a good idea (or justified effort).

I was just surprised by the glibc check. What would you consider a recent-enough glibc? Or is the check mainly necessary to ensure we are indeed using glibc and not some other libc (and thus something like what we do for TARGET_LIBC_PROVIDES_SSP would do)?

I will try to come up with a patch.

Thanks,

Martin

* Re: Add option for whether ceil etc. can raise "inexact", adjust x86 conditions
From: Joseph Myers @ 2017-09-13 17:47 UTC
To: Martin Jambor; +Cc: Jan Hubicka, Uros Bizjak, gcc-patches

On Wed, 13 Sep 2017, Martin Jambor wrote:

> I was just surprised by the glibc check, what would you consider a recent-enough glibc? Or is the check mainly necessary to ensure we are indeed using glibc and not some other libc (and thus something like we do for TARGET_LIBC_PROVIDES_SSP would do)?

It looks like SSE4.1 {ceil,floor,rint}{,f} were added in glibc commit ad0f5cad15f1c76faf3843b3e189dead2c05cfcc, nearbyint{,f} in 581d30e386b9567b973a65d0bc82af782ac078ed, so 2.15 or later for all those functions (the target glibc version is known when GCC is configured, whether from configure examining headers or from --with-glibc-version). glibc does not have SSE4.1 {trunc,roundeven}{,f} at present (missing trunc is <https://sourceware.org/bugzilla/show_bug.cgi?id=20142>).

-- 
Joseph S. Myers
joseph@codesourcery.com

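The 2.15 cut-off Joseph gives can be expressed with glibc's `__GLIBC__` and `__GLIBC_MINOR__` macros. This sketch checks the headers a program is compiled against, whereas GCC learns the target glibc version when it is configured. The helper names are hypothetical, and the check covers only ceil/floor/rint/nearbyint, since trunc and roundeven had no SSE4.1 variants at the time.

```c
#include <stdbool.h>
#include <stdio.h>   /* on glibc this also pulls in <features.h> */

/* True for glibc versions that, per the commits cited above, ship SSE4.1
   ceil/floor/rint/nearbyint variants (2.15 and later). */
static inline bool glibc_version_has_sse41_rounding(int major, int minor)
{
    return major > 2 || (major == 2 && minor >= 15);
}

/* Hypothetical helper: applies the check to the glibc headers this file
   is compiled against (GCC itself decides at configure time instead). */
static inline bool built_against_suitable_glibc(void)
{
#if defined(__GLIBC__) && defined(__GLIBC_MINOR__)
    return glibc_version_has_sse41_rounding(__GLIBC__, __GLIBC_MINOR__);
#else
    return false;  /* not glibc, or version unknown */
#endif
}
```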
* Re: Add option for whether ceil etc. can raise "inexact", adjust x86 conditions
From: Richard Biener @ 2017-09-14 10:04 UTC
To: Joseph Myers, Jan Hubicka, Uros Bizjak, gcc-patches

On Wed, Sep 13, 2017 at 7:34 PM, Martin Jambor <mjambor@suse.cz> wrote:
>> Maybe that means that expansion should be disabled under some conditions where it is correct but suboptimal. It should be kept for TARGET_ROUND, because then it's expanding to a single instruction. But for !TARGET_ROUND, it's a tuning question (e.g. if tuning for a processor that would satisfy TARGET_ROUND, or for -mtune=generic, and building with recent-enough glibc, the expansion should be avoided as suboptimal, on the expectation that at runtime an IFUNC is likely to be available - or given the size of the generic SSE expansion, maybe it should be avoided more generally than that).
>
> This seems to me the best solution. SSE 4.1 is 11 years old, we should be tuning for it in generic tuning. That is also the reason why I do not think run-time checks for SSE 4.1 or an attempt at an internal IFUNC are a good idea (or justified effort).

Well, it's of course the poor man's solution compared to providing our own ifunc-enabled libm ...

I would expect that for SSE 4.1 the PLT and call overhead is measurable and an inline run-time check would be quite a bit more efficient. As you have a testcase, would it be possible to measure that by hand-editing the assembly (or the benchmark source in case it is not Fortran...)?

The whole point of having the inline expansions was to have inline expansions, avoiding the need to spill the whole set of SSE regs around such calls.

> I was just surprised by the glibc check, what would you consider a recent-enough glibc? Or is the check mainly necessary to ensure we are indeed using glibc and not some other libc (and thus something like we do for TARGET_LIBC_PROVIDES_SSP would do)?
>
> I will try to come up with a patch.

I don't think this is the appropriate solution. Try disabling the inline expansion and run SPEC (without -march=sse4.1 of course).

I realize that doing the inline expansion with a runtime check is going to be quite tricky and the GCC-local IFUNC trick doesn't solve the inlining (but we might be able to avoid spilling with some IPA RA help and/or attributes?).

Richard.

* Re: Add option for whether ceil etc. can raise "inexact", adjust x86 conditions
From: Jan Hubicka @ 2017-09-14 16:50 UTC
To: Richard Biener; +Cc: Joseph Myers, Uros Bizjak, gcc-patches

>
> Well, it's of course the poor man's solution compared to providing our own ifunc-enabled libm ...

One benefit here would be that we could have our own calling convention for this. So for floor/ceil we may just declare registers to be preserved (as they are on all modern AVX-enabled CPUs), which would make the code size/speed tradeoffs more interesting.

Honza

* Re: Add option for whether ceil etc. can raise "inexact", adjust x86 conditions
From: Richard Biener @ 2017-08-15 15:01 UTC
To: Joseph Myers, Jan Hubicka, Uros Bizjak, gcc-patches

On Tue, Aug 15, 2017 at 3:52 PM, Martin Jambor <mjambor@suse.cz> wrote:
> And all of the sample count increases in MeanShiftImage can be tracked down to the line in the source calculating
>
>   if ((x-floor(x)) < (ceil(x)-x))
>
> I am not sure what to do about this, to me it seems that the -ffp-int-builtin-inexact simply has a wrong default value, at least for x86_64, as it was added in order not to slow code down but does exactly that (all of the slowdown of course disappears when -fno-fp-int-builtin-inexact is used).
>
> Or is the situation somehow more complex?

I suppose these days the big inline sequences for the rounding functions are no longer profitable for generic tuning (assuming 'generic' nowadays includes SSE41 support). Esp. floor/ceil includes jumpy compensation code.

Note that (x - floor(x)) < (ceil(x) - x) looks like some clever simplification might speed it up. Not that I can come up with sth off my head...

Richard.

* Re: Add option for whether ceil etc. can raise "inexact", adjust x86 conditions
From: Richard Biener @ 2017-08-15 16:10 UTC
To: Joseph Myers, Jan Hubicka, Uros Bizjak, gcc-patches

On Tue, Aug 15, 2017 at 4:21 PM, Richard Biener <richard.guenther@gmail.com> wrote:
> On Tue, Aug 15, 2017 at 3:52 PM, Martin Jambor <mjambor@suse.cz> wrote:
>> I am not sure what to do about this, to me it seems that the -ffp-int-builtin-inexact simply has a wrong default value, at least for x86_64, as it was added in order not to slow code down but does exactly that (all of the slowdown of course disappears when -fno-fp-int-builtin-inexact is used).
>>
>> Or is the situation somehow more complex?
>
> I suppose these days the big inline sequences for the rounding functions are no longer profitable for generic tuning (assuming 'generic' nowadays includes SSE41 support). Esp. floor/ceil includes jumpy compensation code.

Note other options are to inline

  if (__builtin_cpu_supports ("sse4.1"))
    ...
  else
    ...

or to emit a call to a (local? comdat?) __gcc_floor ifunc dispatcher and emit the ifunc math library ourselves (like we'd do with attribute(target(""))).

Not sure if we really can assume glibc is intelligent enough -- does it have non-SSE4.1 implementations for ceil/floor? Back when I implemented these SSE2 expansions, it used the generic C code, which was awfully slow...

Richard.
> Note that (x - floor(x)) < (ceil(x) - x) looks like some clever simplification > might speed it up. Not that I can come up with sth off my head... > > Richard. > >> Martin >> >> >>> >>> This patch fixes these issues. A new option >>> -fno-fp-int-builtin-inexact is added to request TS 18661 rules for >>> these functions; the default -ffp-int-builtin-inexact reflects that >>> such exceptions are allowed by C99 and C11. (The intention is that if >>> C2x incorporates TS 18661-1, then the default would change in C2x >>> mode.) >>> >>> The x86 built-ins for rint (x87, SSE2 and SSE4.1) are made >>> unconditionally available (no longer depending on >>> -funsafe-math-optimizations or -fno-trapping-math); "inexact" is >>> correct for noninteger arguments to rint. For floor, ceil and trunc, >>> the x87 and SSE2 built-ins are OK if -ffp-int-builtin-inexact or >>> -fno-trapping-math (they may raise "inexact" for noninteger >>> arguments); the SSE4.1 built-ins are made to use ROUND_NO_EXC so that >>> they do not raise "inexact" and so are OK unconditionally. >>> >>> Now, while there was no semantic reason for depending on >>> -funsafe-math-optimizations, the insn patterns had such a dependence >>> because of use of gen_truncxf<mode>2_i387_noop to truncate back to >>> SFmode or DFmode after using frndint in XFmode. In this case a no-op >>> truncation is safe because rounding to integer always produces an >>> exactly representable value (the same reason why IEEE semantics say it >>> shouldn't produce "inexact") - but of course that insn pattern isn't >>> safe because it would also match cases where the truncation is not in >>> fact a no-op. 
To allow frndint to be used for SFmode and DFmode >>> without that unsafe pattern, the relevant frndint patterns are >>> extended to SFmode and DFmode or new SFmode and DFmode patterns added, >>> so that the frndint operation can be represented in RTL as an >>> operation acting directly on SFmode or DFmode without the extension >>> and the problematic truncation. >>> >>> A generic test of the new option is added, as well as x86-specific >>> tests, both execution tests including the generic test with different >>> x86 options and scan-assembler tests verifying that functions that >>> should be inlined with different options are indeed inlined. >>> >>> I think other architectures are OK for TS 18661-1 semantics already. >>> Considering those defining "ceil" patterns: aarch64, arm, rs6000, s390 >>> use instructions that do not raise "inexact"; nvptx does not support >>> floating-point exceptions. (This does mean the -f option in fact only >>> affects one architecture, but I think it should still be a -f option; >>> it's logically architecture-independent and is expected to be affected >>> by future -std options, so is similar to e.g. -fexcess-precision=, >>> which also does nothing on most architectures but is implied by -std >>> options.) >>> >>> Bootstrapped with no regressions on x86_64-pc-linux-gnu. OK to >>> commit? >>> >>> gcc: >>> 2016-05-26 Joseph Myers <joseph@codesourcery.com> >>> >>> PR target/71276 >>> PR target/71277 >>> * common.opt (ffp-int-builtin-inexact): New option. >>> * doc/invoke.texi (-fno-fp-int-builtin-inexact): Document. >>> * doc/md.texi (floor@var{m}2, btrunc@var{m}2, round@var{m}2) >>> (ceil@var{m}2): Document dependence on this option. >>> * ipa-inline-transform.c (inline_call): Handle >>> flag_fp_int_builtin_inexact. >>> * ipa-inline.c (can_inline_edge_p): Likewise. >>> * config/i386/i386.md (rintxf2): Do not test >>> flag_unsafe_math_optimizations. >>> (rint<mode>2_frndint): New define_insn. 
>>> (rint<mode>2): Do not test flag_unsafe_math_optimizations for 387 >>> or !flag_trapping_math for SSE. Just use gen_rint<mode>2_frndint >>> for 387 instead of extending and truncating. >>> (frndintxf2_<rounding>): Test flag_fp_int_builtin_inexact || >>> !flag_trapping_math instead of flag_unsafe_math_optimizations. >>> Change to frndint<mode>2_<rounding>. >>> (frndintxf2_<rounding>_i387): Likewise. Change to >>> frndint<mode>2_<rounding>_i387. >>> (<rounding_insn>xf2): Likewise. >>> (<rounding_insn><mode>2): Test flag_fp_int_builtin_inexact || >>> !flag_trapping_math instead of flag_unsafe_math_optimizations for >>> x87. Test TARGET_ROUND || !flag_trapping_math || >>> flag_fp_int_builtin_inexact instead of !flag_trapping_math for >>> SSE. Use ROUND_NO_EXC in constant operand of >>> gen_sse4_1_round<mode>2. Just use gen_frndint<mode>2_<rounding> >>> for 387 instead of extending and truncating. >>> >>> gcc/testsuite: >>> 2016-05-26 Joseph Myers <joseph@codesourcery.com> >>> >>> PR target/71276 >>> PR target/71277 >>> * gcc.dg/torture/builtin-fp-int-inexact.c, >>> gcc.target/i386/387-builtin-fp-int-inexact.c, >>> gcc.target/i386/387-rint-inline-1.c, >>> gcc.target/i386/387-rint-inline-2.c, >>> gcc.target/i386/sse2-builtin-fp-int-inexact.c, >>> gcc.target/i386/sse2-rint-inline-1.c, >>> gcc.target/i386/sse2-rint-inline-2.c, >>> gcc.target/i386/sse4_1-builtin-fp-int-inexact.c, >>> gcc.target/i386/sse4_1-rint-inline.c: New tests. >>> ^ permalink raw reply [flat|nested] 24+ messages in thread
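Why do the SSE2 expansions raise "inexact", and where does the compensation code Richard mentions come from? Without SSE4.1 there is no single rounding instruction. Such sequences therefore typically rely on adding and subtracting 2^52, as in the C sketch below. It illustrates the general technique under two assumptions: IEEE double and round-to-nearest mode. It is not GCC's actual expander code, and `floor_2p52` is a hypothetical name. The `__builtin_` functions are GCC/Clang built-ins, used so that the sketch does not need libm.

```c
/* Hypothetical name.  Assumes IEEE double and round-to-nearest mode.  */
static double floor_2p52(double x)
{
    const double two52 = 4503599627370496.0;   /* 2^52 */
    double xa = __builtin_fabs(x);

    /* |x| >= 2^52 (including infinity) is already integral, and a NaN
       compares unordered; return such values unchanged.  isless does not
       raise "invalid" for a quiet NaN.  */
    if (!__builtin_isless(xa, two52))
        return x;

    /* Adding and subtracting 2^52 rounds xa to the nearest integer.  For a
       noninteger xa the addition is inexact: this is where the "inexact"
       flag comes from.  The volatile keeps the pair from being folded and
       forces rounding to double even with x87 excess precision.  */
    volatile double biased = xa + two52;
    double r = __builtin_copysign(biased - two52, x);

    /* Compensation: rounding to nearest may have gone up, so step down.  */
    if (r > x)
        r -= 1.0;
    return r;
}
```

For example, `floor_2p52(-0.3)` takes the compensation path. The value 0.3 rounds to 0, and copysign turns that into -0.0. Since -0.0 is greater than -0.3, the result becomes -1.0.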
* Re: Add option for whether ceil etc. can raise "inexact", adjust x86 conditions 2017-08-15 16:10 ` Richard Biener @ 2017-08-15 16:26 ` Richard Biener 2017-08-15 21:20 ` Uros Bizjak 0 siblings, 1 reply; 24+ messages in thread From: Richard Biener @ 2017-08-15 16:26 UTC (permalink / raw) To: Joseph Myers, Jan Hubicka, Uros Bizjak, gcc-patches On Tue, Aug 15, 2017 at 4:43 PM, Richard Biener <richard.guenther@gmail.com> wrote: > On Tue, Aug 15, 2017 at 4:21 PM, Richard Biener > <richard.guenther@gmail.com> wrote: >> On Tue, Aug 15, 2017 at 3:52 PM, Martin Jambor <mjambor@suse.cz> wrote: >>> Hi Joseph, >>> >>> On Thu, May 26, 2016 at 09:02:02PM +0000, Joseph Myers wrote: >>>> On Thu, 26 May 2016, Jan Hubicka wrote: >>>> >>>> > > > +ffp-int-builtin-inexact >>>> > > > +Common Report Var(flag_fp_int_builtin_inexact) Optimization >>>> > > > +Allow built-in functions ceil, floor, round, trunc to raise \"inexact\" exceptions. >>>> > >>>> > When adding new codegen option which affects the correctness, it is also >>>> > necessary to update can_inline_edge_p and inline_call. >>>> >>>> This patch version adds handling for the new option in those places. >>>> Other changes: the default for the option is corrected so that >>>> -ffp-int-builtin-inexact really is in effect by default as intended; >>>> md.texi documentation for the patterns in question is updated to >>>> describe how they are affected by this option. >>>> >>>> >>>> Add option for whether ceil etc. can raise "inexact", adjust x86 conditions. >>>> >>>> In ISO C99/C11, the ceil, floor, round and trunc functions may or may >>>> not raise the "inexact" exception for noninteger arguments. Under TS >>>> 18661-1:2014, the C bindings for IEEE 754-2008, these functions are >>>> prohibited from raising "inexact", in line with the general rule that >>>> "inexact" is only when the mathematical infinite precision result of a >>>> function differs from the result after rounding to the target type. 
>>>> >>>> GCC has no option to select TS 18661 requirements for not raising >>>> "inexact" when expanding built-in versions of these functions inline. >>>> Furthermore, even given such requirements, the conditions on the x86 >>>> insn patterns for these functions are unnecessarily restrictive. I'd >>>> like to make the out-of-line glibc versions follow the TS 18661 >>>> requirements; in the cases where this slows them down (the cases using >>>> x87 floating point), that makes it more important for inline versions >>>> to be used when the user does not care about "inexact". >>> >>> Unfortunately, I have found out that this commit regresses run-time of >>> 538.imagick_r by about 5% on an AMD Ryzen machine and by 9% on a >>> slightly older Intel machine when compiled with just -O2 (so with >>> generic tuning). >>> >>> The problem is that ImageMagick spends a lot time calculating ceil and >>> floor and even with with generic tuning their library implementations >>> can use the ifunc mechanism to execute an efficient SSE 4.1 >>> implementation on the processors that have it, whereas the inline >>> expansion cannot do so and is much bigger and much much slower. 
To >>> give you an idea, this is the profile before and after the change: >>> >>> | Symbol | 237073 | % of runtime | 237074 | % of runtime | sample delta | % sample delta | >>> |----------------------------------+---------+--------------+---------+--------------+--------------+----------------| >>> | MorphologyApply | 1058932 | 52.88% | 1043194 | 46.65% | -15738 | 98.51 | >>> | MeanShiftImage | 508088 | 25.50% | 833378 | 37.43% | 325290 | 164.02 | >>> | GetVirtualPixelsFromNexus | 173354 | 8.70% | 168298 | 7.56% | -5056 | 97.08 | >>> | SetPixelCacheNexusPixels.isra.10 | 114101 | 5.72% | 112790 | 5.07% | -1311 | 98.85 | >>> | __ceil_sse41 | 21404 | 1.07% | 0 | 0 | -21404 | 0.00 | >>> | __floor_sse41 | 19179 | 0.96% | 0 | 0 | -19179 | 0.00 | >>> >>> And all of the sample count increases in MeanShiftImage can be tracked >>> down to the line in the cource calculating >>> >>> if ((x-floor(x)) < (ceil(x)-x)) >>> >>> I am not sure what to do about this, to me it seems that the >>> -ffp-int-builtin-inexact simply has a wrong default value, at least >>> for x86_64, as it was added in order not to slow code down but does >>> exactly that (all of the slowdown of course disappears when >>> -fno-fp-int-builtin-inexact is used). >>> >>> Or is the situation somehow more complex? >> >> I suppose these days the big inline sequences for the rounding functions >> are no longer profitable for generic tuning (assuming 'generic' nowadays >> includes SSE41 support). Esp. floor/ceil includes jumpy compensation >> code. > > Note other options are to inline if (__builtin_cpu_supports > ("sse4.1")) ... else ... > or to emit a call to a (local? comdat?) __gcc_floor ifunc dispatcher and > emit the ifunc math library ourselves (like we'd do with attribute(target(""))). > > Not sure if we really can assume glibc is intelligent enough -- does it have > non-SSE4.1 implementations for ceil/floor? Back in time I implemented > these SSE2 expansions it used the generic C code which was awfully slow... 
Still uses sysdeps/ieee754/dbl-64/wordsize-64/s_ceil.c which is quite slow compared to using our inline sequence. So I'd try the "easy" way of expanding if (__builtin_cpu_supports ("sse4.1")) as the sse4.1 sequence is just a single instruction. The interesting part of the story will be to make sure we can emit that even if ! TARGET_ROUND ... Uros, any idea how to accomplish this? Or is the idea of a "local" ifunc better? Note the ABI boundary will be expensive but I guess the conditional sequence as well (and it will disturb RA even if predicted to have SSE 4.1). Richard. > Richard. > >> Note that (x - floor(x)) < (ceil(x) - x) looks like some clever simplification >> might speed it up. Not that I can come up with sth off my head... >> >> Richard. >> >>> Martin >>> >>> >>>> >>>> This patch fixes these issues. A new option >>>> -fno-fp-int-builtin-inexact is added to request TS 18661 rules for >>>> these functions; the default -ffp-int-builtin-inexact reflects that >>>> such exceptions are allowed by C99 and C11. (The intention is that if >>>> C2x incorporates TS 18661-1, then the default would change in C2x >>>> mode.) >>>> >>>> The x86 built-ins for rint (x87, SSE2 and SSE4.1) are made >>>> unconditionally available (no longer depending on >>>> -funsafe-math-optimizations or -fno-trapping-math); "inexact" is >>>> correct for noninteger arguments to rint. For floor, ceil and trunc, >>>> the x87 and SSE2 built-ins are OK if -ffp-int-builtin-inexact or >>>> -fno-trapping-math (they may raise "inexact" for noninteger >>>> arguments); the SSE4.1 built-ins are made to use ROUND_NO_EXC so that >>>> they do not raise "inexact" and so are OK unconditionally. >>>> >>>> Now, while there was no semantic reason for depending on >>>> -funsafe-math-optimizations, the insn patterns had such a dependence >>>> because of use of gen_truncxf<mode>2_i387_noop to truncate back to >>>> SFmode or DFmode after using frndint in XFmode. 
In this case a no-op >>>> truncation is safe because rounding to integer always produces an >>>> exactly representable value (the same reason why IEEE semantics say it >>>> shouldn't produce "inexact") - but of course that insn pattern isn't >>>> safe because it would also match cases where the truncation is not in >>>> fact a no-op. To allow frndint to be used for SFmode and DFmode >>>> without that unsafe pattern, the relevant frndint patterns are >>>> extended to SFmode and DFmode or new SFmode and DFmode patterns added, >>>> so that the frndint operation can be represented in RTL as an >>>> operation acting directly on SFmode or DFmode without the extension >>>> and the problematic truncation. >>>> >>>> A generic test of the new option is added, as well as x86-specific >>>> tests, both execution tests including the generic test with different >>>> x86 options and scan-assembler tests verifying that functions that >>>> should be inlined with different options are indeed inlined. >>>> >>>> I think other architectures are OK for TS 18661-1 semantics already. >>>> Considering those defining "ceil" patterns: aarch64, arm, rs6000, s390 >>>> use instructions that do not raise "inexact"; nvptx does not support >>>> floating-point exceptions. (This does mean the -f option in fact only >>>> affects one architecture, but I think it should still be a -f option; >>>> it's logically architecture-independent and is expected to be affected >>>> by future -std options, so is similar to e.g. -fexcess-precision=, >>>> which also does nothing on most architectures but is implied by -std >>>> options.) >>>> >>>> Bootstrapped with no regressions on x86_64-pc-linux-gnu. OK to >>>> commit? >>>> >>>> gcc: >>>> 2016-05-26 Joseph Myers <joseph@codesourcery.com> >>>> >>>> PR target/71276 >>>> PR target/71277 >>>> * common.opt (ffp-int-builtin-inexact): New option. >>>> * doc/invoke.texi (-fno-fp-int-builtin-inexact): Document. 
>>>> * doc/md.texi (floor@var{m}2, btrunc@var{m}2, round@var{m}2) >>>> (ceil@var{m}2): Document dependence on this option. >>>> * ipa-inline-transform.c (inline_call): Handle >>>> flag_fp_int_builtin_inexact. >>>> * ipa-inline.c (can_inline_edge_p): Likewise. >>>> * config/i386/i386.md (rintxf2): Do not test >>>> flag_unsafe_math_optimizations. >>>> (rint<mode>2_frndint): New define_insn. >>>> (rint<mode>2): Do not test flag_unsafe_math_optimizations for 387 >>>> or !flag_trapping_math for SSE. Just use gen_rint<mode>2_frndint >>>> for 387 instead of extending and truncating. >>>> (frndintxf2_<rounding>): Test flag_fp_int_builtin_inexact || >>>> !flag_trapping_math instead of flag_unsafe_math_optimizations. >>>> Change to frndint<mode>2_<rounding>. >>>> (frndintxf2_<rounding>_i387): Likewise. Change to >>>> frndint<mode>2_<rounding>_i387. >>>> (<rounding_insn>xf2): Likewise. >>>> (<rounding_insn><mode>2): Test flag_fp_int_builtin_inexact || >>>> !flag_trapping_math instead of flag_unsafe_math_optimizations for >>>> x87. Test TARGET_ROUND || !flag_trapping_math || >>>> flag_fp_int_builtin_inexact instead of !flag_trapping_math for >>>> SSE. Use ROUND_NO_EXC in constant operand of >>>> gen_sse4_1_round<mode>2. Just use gen_frndint<mode>2_<rounding> >>>> for 387 instead of extending and truncating. >>>> >>>> gcc/testsuite: >>>> 2016-05-26 Joseph Myers <joseph@codesourcery.com> >>>> >>>> PR target/71276 >>>> PR target/71277 >>>> * gcc.dg/torture/builtin-fp-int-inexact.c, >>>> gcc.target/i386/387-builtin-fp-int-inexact.c, >>>> gcc.target/i386/387-rint-inline-1.c, >>>> gcc.target/i386/387-rint-inline-2.c, >>>> gcc.target/i386/sse2-builtin-fp-int-inexact.c, >>>> gcc.target/i386/sse2-rint-inline-1.c, >>>> gcc.target/i386/sse2-rint-inline-2.c, >>>> gcc.target/i386/sse4_1-builtin-fp-int-inexact.c, >>>> gcc.target/i386/sse4_1-rint-inline.c: New tests. >>>> ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Add option for whether ceil etc. can raise "inexact", adjust x86 conditions 2017-08-15 16:26 ` Richard Biener @ 2017-08-15 21:20 ` Uros Bizjak 2017-08-16 10:51 ` Richard Biener 0 siblings, 1 reply; 24+ messages in thread From: Uros Bizjak @ 2017-08-15 21:20 UTC (permalink / raw) To: Richard Biener; +Cc: Joseph Myers, Jan Hubicka, gcc-patches On Tue, Aug 15, 2017 at 4:59 PM, Richard Biener <richard.guenther@gmail.com> wrote: > So I'd try the "easy" way of expanding if (__builtin_cpu_supports ("sse4.1")) > as the sse4.1 sequence is just a single instruction. The interesting part > of the story will be to make sure we can emit that even if ! TARGET_ROUND ... > > Uros, any idea how to accomplish this? Or is the idea of a "local" ifunc > better? Note the ABI boundary will be expensive but I guess the conditional > sequence as well (and it will disturb RA even if predicted to have SSE 4.1). TARGET_ROUND is just: /* SSE4.1 defines round instructions */ #define OPTION_MASK_ISA_ROUND OPTION_MASK_ISA_SSE4_1 #define TARGET_ISA_ROUND ((ix86_isa_flags & OPTION_MASK_ISA_ROUND) != 0) I don't remember the history around the #define; once upon a time it probably made sense, but nowadays it looks like it can simply be substituted with TARGET_SSE4_1. Uros.
* Re: Add option for whether ceil etc. can raise "inexact", adjust x86 conditions 2017-08-15 21:20 ` Uros Bizjak @ 2017-08-16 10:51 ` Richard Biener 2017-08-16 11:04 ` Uros Bizjak 0 siblings, 1 reply; 24+ messages in thread From: Richard Biener @ 2017-08-16 10:51 UTC (permalink / raw) To: Uros Bizjak; +Cc: Joseph Myers, Jan Hubicka, gcc-patches On Tue, Aug 15, 2017 at 9:21 PM, Uros Bizjak <ubizjak@gmail.com> wrote: > On Tue, Aug 15, 2017 at 4:59 PM, Richard Biener > <richard.guenther@gmail.com> wrote: > >> So I'd try the "easy" way of expanding if (__builtin_cpu_supports ("sse4.1")) >> as the sse4.1 sequence is just a single instruction. The interesting part >> of the story will be to make sure we can emit that even if ! TARGET_ROUND ... >> >> Uros, any idea how to accomplish this? Or is the idea of a "local" ifunc >> better? Note the ABI boundary will be expensive but I guess the conditional >> sequence as well (and it will disturb RA even if predicted to have SSE 4.1). > > TARGET_ROUND is just: > > /* SSE4.1 defines round instructions */ > #define OPTION_MASK_ISA_ROUND OPTION_MASK_ISA_SSE4_1 > #define TARGET_ISA_ROUND ((ix86_isa_flags & OPTION_MASK_ISA_ROUND) != 0) > > I don't remember the history around the #define, once upon a time > probably made sense, but nowadays it looks that it can be simply > substituted with TARGET_SSE4_1. Sure, but we want the backend to use a TARGET_ROUND-guarded define_insn when TARGET_ROUND is false, inside a runtime conditional that ensures TARGET_ROUND is satisfied. With ifuncs we would mark the function with a proper target attribute, but how do we do that within a function? Richard. > Uros.
* Re: Add option for whether ceil etc. can raise "inexact", adjust x86 conditions 2017-08-16 10:51 ` Richard Biener @ 2017-08-16 11:04 ` Uros Bizjak 2017-08-16 13:32 ` Uros Bizjak 0 siblings, 1 reply; 24+ messages in thread From: Uros Bizjak @ 2017-08-16 11:04 UTC (permalink / raw) To: Richard Biener; +Cc: Joseph Myers, Jan Hubicka, gcc-patches On Wed, Aug 16, 2017 at 12:43 PM, Richard Biener <richard.guenther@gmail.com> wrote: > On Tue, Aug 15, 2017 at 9:21 PM, Uros Bizjak <ubizjak@gmail.com> wrote: >> On Tue, Aug 15, 2017 at 4:59 PM, Richard Biener >> <richard.guenther@gmail.com> wrote: >> >>> So I'd try the "easy" way of expanding if (__builtin_cpu_supports ("sse4.1")) >>> as the sse4.1 sequence is just a single instruction. The interesting part >>> of the story will be to make sure we can emit that even if ! TARGET_ROUND ... >>> >>> Uros, any idea how to accomplish this? Or is the idea of a "local" ifunc >>> better? Note the ABI boundary will be expensive but I guess the conditional >>> sequence as well (and it will disturb RA even if predicted to have SSE 4.1). >> >> TARGET_ROUND is just: >> >> /* SSE4.1 defines round instructions */ >> #define OPTION_MASK_ISA_ROUND OPTION_MASK_ISA_SSE4_1 >> #define TARGET_ISA_ROUND ((ix86_isa_flags & OPTION_MASK_ISA_ROUND) != 0) >> >> I don't remember the history around the #define, once upon a time >> probably made sense, but nowadays it looks that it can be simply >> substituted with TARGET_SSE4_1. > > Sure but we want the backend to use a TARGET_ROUND guarded define_insn > when TARGET_ROUND is false but inside a runtime conditional ensuring that > TARGET_ROUND is satisfied. With doing this with ifuncs we'd mark the function > with a proper target attribute but within a function? How about something like what the intrinsic headers use? > Richard. > >> Uros.
* Re: Add option for whether ceil etc. can raise "inexact", adjust x86 conditions 2017-08-16 11:04 ` Uros Bizjak @ 2017-08-16 13:32 ` Uros Bizjak 2017-08-16 13:40 ` Richard Biener 0 siblings, 1 reply; 24+ messages in thread From: Uros Bizjak @ 2017-08-16 13:32 UTC (permalink / raw) To: Richard Biener; +Cc: Joseph Myers, Jan Hubicka, gcc-patches On Wed, Aug 16, 2017 at 12:48 PM, Uros Bizjak <ubizjak@gmail.com> wrote: > On Wed, Aug 16, 2017 at 12:43 PM, Richard Biener > <richard.guenther@gmail.com> wrote: >> On Tue, Aug 15, 2017 at 9:21 PM, Uros Bizjak <ubizjak@gmail.com> wrote: >>> On Tue, Aug 15, 2017 at 4:59 PM, Richard Biener >>> <richard.guenther@gmail.com> wrote: >>> >>>> So I'd try the "easy" way of expanding if (__builtin_cpu_supports ("sse4.1")) >>>> as the sse4.1 sequence is just a single instruction. The interesting part >>>> of the story will be to make sure we can emit that even if ! TARGET_ROUND ... >>>> >>>> Uros, any idea how to accomplish this? Or is the idea of a "local" ifunc >>>> better? Note the ABI boundary will be expensive but I guess the conditional >>>> sequence as well (and it will disturb RA even if predicted to have SSE 4.1). >>> >>> TARGET_ROUND is just: >>> >>> /* SSE4.1 defines round instructions */ >>> #define OPTION_MASK_ISA_ROUND OPTION_MASK_ISA_SSE4_1 >>> #define TARGET_ISA_ROUND ((ix86_isa_flags & OPTION_MASK_ISA_ROUND) != 0) >>> >>> I don't remember the history around the #define, once upon a time >>> probably made sense, but nowadays it looks that it can be simply >>> substituted with TARGET_SSE4_1. >> >> Sure but we want the backend to use a TARGET_ROUND guarded define_insn >> when TARGET_ROUND is false but inside a runtime conditional ensuring that >> TARGET_ROUND is satisfied. With doing this with ifuncs we'd mark the function >> with a proper target attribute but within a function? > > How about something intrinsic headers are using? (... somehow managed to press send too early ...) 
There we use the GCC push_options and GCC target pragmas. Maybe we also need a corresponding __ROUND__ macro defined by the compiler. Uros.
* Re: Add option for whether ceil etc. can raise "inexact", adjust x86 conditions 2017-08-16 13:32 ` Uros Bizjak @ 2017-08-16 13:40 ` Richard Biener 2017-08-16 14:01 ` Uros Bizjak 0 siblings, 1 reply; 24+ messages in thread From: Richard Biener @ 2017-08-16 13:40 UTC (permalink / raw) To: Uros Bizjak; +Cc: Joseph Myers, Jan Hubicka, gcc-patches On Wed, Aug 16, 2017 at 12:51 PM, Uros Bizjak <ubizjak@gmail.com> wrote: > On Wed, Aug 16, 2017 at 12:48 PM, Uros Bizjak <ubizjak@gmail.com> wrote: >> On Wed, Aug 16, 2017 at 12:43 PM, Richard Biener >> <richard.guenther@gmail.com> wrote: >>> On Tue, Aug 15, 2017 at 9:21 PM, Uros Bizjak <ubizjak@gmail.com> wrote: >>>> On Tue, Aug 15, 2017 at 4:59 PM, Richard Biener >>>> <richard.guenther@gmail.com> wrote: >>>> >>>>> So I'd try the "easy" way of expanding if (__builtin_cpu_supports ("sse4.1")) >>>>> as the sse4.1 sequence is just a single instruction. The interesting part >>>>> of the story will be to make sure we can emit that even if ! TARGET_ROUND ... >>>>> >>>>> Uros, any idea how to accomplish this? Or is the idea of a "local" ifunc >>>>> better? Note the ABI boundary will be expensive but I guess the conditional >>>>> sequence as well (and it will disturb RA even if predicted to have SSE 4.1). >>>> >>>> TARGET_ROUND is just: >>>> >>>> /* SSE4.1 defines round instructions */ >>>> #define OPTION_MASK_ISA_ROUND OPTION_MASK_ISA_SSE4_1 >>>> #define TARGET_ISA_ROUND ((ix86_isa_flags & OPTION_MASK_ISA_ROUND) != 0) >>>> >>>> I don't remember the history around the #define, once upon a time >>>> probably made sense, but nowadays it looks that it can be simply >>>> substituted with TARGET_SSE4_1. >>> >>> Sure but we want the backend to use a TARGET_ROUND guarded define_insn >>> when TARGET_ROUND is false but inside a runtime conditional ensuring that >>> TARGET_ROUND is satisfied. With doing this with ifuncs we'd mark the function >>> with a proper target attribute but within a function? 
> >> How about something intrinsic headers are using? > > (... somehow managed to press send too early ...) > > There we use GCC_push_options and GCC_target pragmas. Maybe we also need corresponding __ROUND__ define defined by the compiler. Those don't work inside a function. Remember I want to change the expander of ceil () to if (__builtin_cpu_supports ("sse4.1")) ceil_for_sse4.1 (); else ceil (); from the x86 target code that expands ceil for ! TARGET_ROUND. I suppose we could simply use a separate pattern for SSE 4.1 roundsd here (does it have to be an unspec? I suppose so, to prevent it from being generated by other means and to prevent code motion out of the conditional?) Or we could forgo the idea of inline conditional code and instead emit an ifunc dispatcher, a function with the sse4.1 instruction, and a call to the dispatcher ourselves. Richard. > Uros.
* Re: Add option for whether ceil etc. can raise "inexact", adjust x86 conditions 2017-08-16 13:40 ` Richard Biener @ 2017-08-16 14:01 ` Uros Bizjak 0 siblings, 0 replies; 24+ messages in thread From: Uros Bizjak @ 2017-08-16 14:01 UTC (permalink / raw) To: Richard Biener; +Cc: Joseph Myers, Jan Hubicka, gcc-patches On Wed, Aug 16, 2017 at 12:55 PM, Richard Biener <richard.guenther@gmail.com> wrote: > On Wed, Aug 16, 2017 at 12:51 PM, Uros Bizjak <ubizjak@gmail.com> wrote: >> On Wed, Aug 16, 2017 at 12:48 PM, Uros Bizjak <ubizjak@gmail.com> wrote: >>> On Wed, Aug 16, 2017 at 12:43 PM, Richard Biener >>> <richard.guenther@gmail.com> wrote: >>>> On Tue, Aug 15, 2017 at 9:21 PM, Uros Bizjak <ubizjak@gmail.com> wrote: >>>>> On Tue, Aug 15, 2017 at 4:59 PM, Richard Biener >>>>> <richard.guenther@gmail.com> wrote: >>>>> >>>>>> So I'd try the "easy" way of expanding if (__builtin_cpu_supports ("sse4.1")) >>>>>> as the sse4.1 sequence is just a single instruction. The interesting part >>>>>> of the story will be to make sure we can emit that even if ! TARGET_ROUND ... >>>>>> >>>>>> Uros, any idea how to accomplish this? Or is the idea of a "local" ifunc >>>>>> better? Note the ABI boundary will be expensive but I guess the conditional >>>>>> sequence as well (and it will disturb RA even if predicted to have SSE 4.1). >>>>> >>>>> TARGET_ROUND is just: >>>>> >>>>> /* SSE4.1 defines round instructions */ >>>>> #define OPTION_MASK_ISA_ROUND OPTION_MASK_ISA_SSE4_1 >>>>> #define TARGET_ISA_ROUND ((ix86_isa_flags & OPTION_MASK_ISA_ROUND) != 0) >>>>> >>>>> I don't remember the history around the #define, once upon a time >>>>> probably made sense, but nowadays it looks that it can be simply >>>>> substituted with TARGET_SSE4_1. >>>> >>>> Sure but we want the backend to use a TARGET_ROUND guarded define_insn >>>> when TARGET_ROUND is false but inside a runtime conditional ensuring that >>>> TARGET_ROUND is satisfied. 
With doing this with ifuncs we'd mark the function >>>> with a proper target attribute but within a function? >>> >>> How about something intrinsic headers are using? >> >> (... somehow managed to press send too early ...) >> >> There we use GCC_push_options and GCC_target pragmas. Maybe we also >> need corresponding __ROUND__ define defined by the compiler. > > Those don't work inside a function. Remember I want to change the expander > of ceil () to > > if (__builtin_cpu_supports ("sse4.1")) > ceil_for_sse4.1 (); > else > ceil (); > > from the x86 target code that expands ceil for ! TARGET_ROUND. I suppose > we could simply use a separate pattern for SSE 4.1 roundsd here (does it > have to be an unspec? I suppose so to prevent it from being generated by > other means and to prevent code motion out of the conditional?) > > Or forgo with the idea to use inline conditional code and emit an ifunc > dispatcher, a function with the sse4.1 instruction, and a call to the dispatcher > ourselves. Hm ... Maybe the way libatomic handles cmpxchg16b could serve as an example here. Uros.
end of thread, other threads:[~2017-09-14 16:50 UTC | newest] Thread overview: 24+ messages 2016-05-26 8:32 Add option for whether ceil etc. can raise "inexact", adjust x86 conditions Joseph Myers 2016-05-26 17:39 ` Uros Bizjak 2016-05-27 6:14 ` Jan Hubicka 2016-05-27 9:03 ` Joseph Myers 2016-06-02 11:54 ` Ping " Joseph Myers 2016-06-02 12:00 ` Jan Hubicka 2016-06-02 12:24 ` Bernd Schmidt 2016-06-02 12:29 ` Joseph Myers 2016-06-02 12:32 ` Joseph Myers 2017-08-15 14:11 ` Martin Jambor 2017-08-15 14:52 ` Joseph Myers 2017-09-13 17:34 ` Martin Jambor 2017-09-13 17:47 ` Joseph Myers 2017-09-14 10:04 ` Richard Biener 2017-09-14 16:50 ` Jan Hubicka 2017-08-15 15:01 ` Richard Biener 2017-08-15 16:10 ` Richard Biener 2017-08-15 16:26 ` Richard Biener 2017-08-15 21:20 ` Uros Bizjak 2017-08-16 10:51 ` Richard Biener 2017-08-16 11:04 ` Uros Bizjak 2017-08-16 13:32 ` Uros Bizjak 2017-08-16 13:40 ` Richard Biener 2017-08-16 14:01 ` Uros Bizjak