* pragma GCC optimize prevents inlining
From: Hashan Gayasri @ 2024-01-04 9:01 UTC
To: gcc-help

Hi,

I noticed that GCC doesn't inline functions that have extra optimization
options added via pragma GCC optimize after the pop_options statement.

If the optimizations are different between the caller and the callee, it seems
like GCC behaves as if the called function's definition is opaque and not
visible at all (as if it was defined in a different translation unit). It
doesn't even notice that the function doesn't have side effects unless
marked so explicitly.

I wanted the following to be optimized:

    #pragma GCC push_options
    #pragma GCC optimize ("-ffast-math")

    inline int64_t __attribute__ ((const)) RoundToNearestLong (double v)
    {
        assert(fegetround() == FE_TONEAREST);
        return std::lrint(v);
    }

    #pragma GCC pop_options

So that std::lrint uses the vcvtsd2si instruction on X86 with SSE2. It
does that but prevents the function from being inlined. I compiled with
-O3 -march=native -DNDEBUG.

If I used attribute always_inline, the inlined version didn't seem to
respect the additional optimization options I provided.
The const and pure function attributes helped in removing unused/no side effect
code paths but the function still got called.

Please advise if there's any workaround for this. Using intrinsics
works but is less than ideal.

(Also let me know if I should be sending this to a different mailing list)

Thanks in advance!

Best Regards,
Hashan Gayasri

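For reference, the intrinsic-based workaround mentioned above might look like the sketch below. It assumes an x86-64 target with SSE2 and falls back to std::llrint elsewhere. The name RoundToNearestLongIntrin is made up for illustration and does not come from the original post. _mm_cvtsd_si64 converts using the current MXCSR rounding mode, which is round-to-nearest-even by default, so the function carries no per-function optimization options and should not hit the inlining restriction described above.

```cpp
#include <cassert>
#include <cfenv>
#include <cmath>
#include <cstdint>
#if defined(__x86_64__) && defined(__SSE2__)
#include <emmintrin.h>
#endif

// Hypothetical name, for illustration only.
inline int64_t RoundToNearestLongIntrin(double v)
{
    // Same precondition as in the original post; compiled out with -DNDEBUG.
    assert(std::fegetround() == FE_TONEAREST);
#if defined(__x86_64__) && defined(__SSE2__)
    // Maps to (v)cvtsd2si, which honours the current rounding mode.
    return static_cast<int64_t>(_mm_cvtsd_si64(_mm_set_sd(v)));
#else
    // Portable fallback; also uses the current rounding mode.
    return static_cast<int64_t>(std::llrint(v));
#endif
}
```

The price is the one already noted in the post: the code becomes target-specific and needs a fallback path for other architectures.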
* Re: pragma GCC optimize prevents inlining
From: LIU Hao @ 2024-01-04 9:27 UTC
To: Hashan Gayasri, gcc-help

On 2024/1/4 17:01, Hashan Gayasri via Gcc-help wrote:
> I wanted the following to be optimized:
>
> (... ...)
>
> So that std::lrint uses the vcvtsd2si instruction on X86 with SSE2. It
> does that but prevents the function from being inlined. I compiled with
> -O3 -march=native -DNDEBUG.

Actually `-ffast-math` is overkill; `-fno-math-errno` is rarely harmful in practice, and can be
enabled globally:
(https://gcc.godbolt.org/z/hhfP6cYrr)

```
//#pragma GCC push_options
//#pragma GCC optimize ("-ffast-math")

inline int64_t __attribute__ ((const)) RoundToNearestLong (double v)
{
//    assert(fegetround() == FE_TONEAREST);
    return lrint(v);
}

//#pragma GCC pop_options

void
xgset(int64_t& r, double s)
  {
    r = RoundToNearestLong(s);
  }
```

results in

```
xgset(long&, double):
        vcvtsd2si       rax, xmm0
        mov     QWORD PTR [rdi], rax
        ret
```

--
Best regards,
LIU Hao

* Re: pragma GCC optimize prevents inlining
From: Hashan Gayasri @ 2024-01-05 0:56 UTC
To: LIU Hao; +Cc: gcc-help

Hi Hao,

Thanks for the suggestion! Yes, -fno-math-errno is definitely more suitable
for enabling globally than -ffast-math. While I don't remember using errno
for math functions in particular, it's used in non-math functions. So even
though it seems reasonable to enable globally, it's still a bit tricky to
validate that it doesn't cause any unintended side effects in a large
codebase with 3rd-party libs.

Another weird side effect I noticed is that GCC still doesn't inline the
function when the function is within a `pragma GCC optimize
("-fno-math-errno")` region and -ffast-math is enabled globally, even though
-fno-math-errno is a subset. If you enable both -ffast-math and
-fno-math-errno globally, the function gets inlined. I'm not sure if
improving that should be considered a bug-fix or a feature/enhancement.

Best Regards,
Hashan Gayasri

On Thu, 4 Jan 2024, 8:28 pm LIU Hao, <lh_mouse@126.com> wrote:

> On 2024/1/4 17:01, Hashan Gayasri via Gcc-help wrote:
> > I wanted the following to be optimized:
> >
> > (... ...)
> >
> > So that std::lrint uses the vcvtsd2si instruction on X86 with SSE2. It
> > does that but prevents the function from being inlined. I compiled with
> > -O3 -march=native -DNDEBUG.
>
> Actually `-ffast-math` is overkill; `-fno-math-errno` is rarely harmful
> in practice, and can be enabled globally:
> (https://gcc.godbolt.org/z/hhfP6cYrr)
>
> ```
> //#pragma GCC push_options
> //#pragma GCC optimize ("-ffast-math")
>
> inline int64_t __attribute__ ((const)) RoundToNearestLong (double v)
> {
> //    assert(fegetround() == FE_TONEAREST);
>     return lrint(v);
> }
>
> //#pragma GCC pop_options
>
> void
> xgset(int64_t& r, double s)
>   {
>     r = RoundToNearestLong(s);
>   }
> ```
>
> results in
> ```
> xgset(long&, double):
>         vcvtsd2si       rax, xmm0
>         mov     QWORD PTR [rdi], rax
>         ret
> ```
>
> --
> Best regards,
> LIU Hao

* Re: pragma GCC optimize prevents inlining
From: David Brown @ 2024-01-04 14:51 UTC
To: gcc-help

On 04/01/2024 10:01, Hashan Gayasri via Gcc-help wrote:
> Hi,
>
> I noticed that GCC doesn't inline functions that have extra optimization
> options added via pragma GCC optimize after the pop_options statement.
>
> If the optimizations are different between the caller and the callee, it seems
> like GCC behaves as if the called function's definition is opaque and not
> visible at all (as if it was defined in a different translation unit). It
> doesn't even notice that the function doesn't have side effects unless
> marked so explicitly.
>
> I wanted the following to be optimized:
>
> #pragma GCC push_options
> #pragma GCC optimize ("-ffast-math")
>
> inline int64_t __attribute__ ((const)) RoundToNearestLong (double v)
> {
>     assert(fegetround() == FE_TONEAREST);
>     return std::lrint(v);
> }
>
> #pragma GCC pop_options
>
>
> So that std::lrint uses the vcvtsd2si instruction on X86 with SSE2. It
> does that but prevents the function from being inlined. I compiled with
> -O3 -march=native -DNDEBUG.
>
> If I used attribute always_inline, the inlined version didn't seem to
> respect the additional optimization options I provided.
> The const and pure function attributes helped in removing unused/no side effect
> code paths but the function still got called.
>
>
> Please advise if there's any workaround for this. Using intrinsics
> works but is less than ideal.
>
> (Also let me know if I should be sending this to a different mailing list)
>
> Thanks in advance!
>
> Best Regards,
> Hashan Gayasri
>

This is a general limitation in GCC, as far as I know. I have come
across it myself (in my case it was the "-fwrapv" flag). As far as I
remember from a previous discussion long ago, there is no easy workaround.

* Re: pragma GCC optimize prevents inlining
From: Segher Boessenkool @ 2024-01-04 15:03 UTC
To: David Brown; +Cc: gcc-help

On Thu, Jan 04, 2024 at 03:51:23PM +0100, David Brown via Gcc-help wrote:
> This is a general limitation in GCC, as far as I know. I have come
> across it myself (in my case it was the "-fwrapv" flag). As far as I
> remember from a previous discussion long ago, there is no easy workaround.

What are the expected semantics? That depends on the use case, so on
what the user expects. If the compiler inlines the function and picks
either set of options, it may do something the user wanted to avoid.
Not good.

The user can always write exactly what the user wants, instead :-)

Maybe we could have an option -fallow-inlining-that-changes-semantics?
Not sure if people will actually find that useful, but at least they
cannot say they weren't warned if they use that ;-)


Segher

* Re: pragma GCC optimize prevents inlining
From: David Brown @ 2024-01-04 15:24 UTC
To: gcc-help

On 04/01/2024 16:03, Segher Boessenkool wrote:
> On Thu, Jan 04, 2024 at 03:51:23PM +0100, David Brown via Gcc-help wrote:
>> This is a general limitation in GCC, as far as I know. I have come
>> across it myself (in my case it was the "-fwrapv" flag). As far as I
>> remember from a previous discussion long ago, there is no easy workaround.
>
> What are the expected semantics? That depends on the use case, so on
> what the user expects. If the compiler inlines the function and picks
> either set of options, it may do something the user wanted to avoid.
> Not good.
>

Yes, I realise that's a key problem.

What I wanted in my own case was that this function:

    #pragma GCC push_options
    #pragma GCC optimize("-fwrapv")
    inline int32_t add(int32_t x, int32_t y) {
        return x + y;
    }
    #pragma GCC pop_options

would work exactly as though I had:

    inline int32_t add(int32_t x, int32_t y) {
        return (int32_t) ((uint32_t) x + (uint32_t) y);
    }

> The user can always write exactly what the user wants, instead :-)

In my case, I wrote it in that second version! But had "-fwrapv" worked
through inlining, it would have been convenient and neat, especially as
I had several related functions (for a wrapping-integer class).


More generally, I think the expected semantics are that the additional
options apply to code inside the function, and at the boundary you don't
care which set of options apply. So if you have normal floating point
code that sets "x", and then call an inline function with -ffast-math
using "x" as a parameter and returning "y", then the inlined code can
assume "x" is finite and not a NaN, and the later code can assume the
returned value "y" is similarly finite. If the calculations for "x"
produce a NaN, then the code will be UB - that's the programmer's fault.

>
> Maybe we could have an option -fallow-inlining-that-changes-semantics?
> Not sure if people will actually find that useful, but at least they
> cannot say they weren't warned if they use that ;-)
>
>
> Segher
>

* Re: pragma GCC optimize prevents inlining
From: Richard Earnshaw @ 2024-01-04 16:37 UTC
To: David Brown, gcc-help

On 04/01/2024 15:24, David Brown via Gcc-help wrote:
> On 04/01/2024 16:03, Segher Boessenkool wrote:
>> On Thu, Jan 04, 2024 at 03:51:23PM +0100, David Brown via Gcc-help wrote:
>>> This is a general limitation in GCC, as far as I know. I have come
>>> across it myself (in my case it was the "-fwrapv" flag). As far as I
>>> remember from a previous discussion long ago, there is no easy
>>> workaround.
>>
>> What are the expected semantics? That depends on the use case, so on
>> what the user expects. If the compiler inlines the function and picks
>> either set of options, it may do something the user wanted to avoid.
>> Not good.
>>
>
> Yes, I realise that's a key problem.
>
> What I wanted in my own case was that this function:
>
> #pragma GCC push_options
> #pragma GCC optimize("-fwrapv")
> inline int32_t add(int32_t x, int32_t y) {
>     return x + y;
> }
> #pragma GCC pop_options
>
> would work exactly as though I had:
>
> inline int32_t add(int32_t x, int32_t y) {
>     return (int32_t) ((uint32_t) x + (uint32_t) y);
> }
>
>> The user can always write exactly what the user wants, instead :-)
>
> In my case, I wrote it in that second version! But had "-fwrapv" worked
> through inlining, it would have been convenient and neat, especially as
> I had several related functions (for a wrapping-integer class).
>
>
> More generally, I think the expected semantics are that the additional
> options apply to code inside the function, and at the boundary you don't
> care which set of options apply. So if you have normal floating point
> code that sets "x", and then call an inline function with -ffast-math
> using "x" as a parameter and returning "y", then the inlined code can
> assume "x" is finite and not a NaN, and the later code can assume the
> returned value "y" is similarly finite. If the calculations for "x"
> produce a NaN, then the code will be UB - that's the programmer's fault.
>
>>
>> Maybe we could have an option -fallow-inlining-that-changes-semantics?
>> Not sure if people will actually find that useful, but at least they
>> cannot say they weren't warned if they use that ;-)
>>
>>
>> Segher
>>
>
>

The general problem here is that the AST does not carry enough
information to handle all these cases within a single function. To
merge wrap/no-wrap code, for example, we'd need separate codes for ADD,
SUBTRACT, MULTIPLY, etc. At present that 'state' is carried globally,
which makes merging impossible without losing the state information.

* Re: pragma GCC optimize prevents inlining
From: Florian Weimer @ 2024-01-09 13:38 UTC
To: Richard Earnshaw via Gcc-help; +Cc: David Brown, Richard Earnshaw

* Richard Earnshaw via Gcc-help:

> The general problem here is that the AST does not carry enough
> information to handle all these cases within a single function. To
> merge wrap/no-wrap code, for example, we'd need separate codes for
> ADD, SUBTRACT, MULTIPLY, etc.

__builtin_add_overflow et al. are already specified to implement
wraparound behavior on overflow, so that particular case should be
covered to some extent.

Thanks,
Florian

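The two hand-written alternatives to "-fwrapv" that came up in this subthread can be sketched as follows. The function names wrapping_add_cast and wrapping_add_builtin are invented for illustration. The first variant is the unsigned-conversion form from David Brown's message; the second uses __builtin_add_overflow, which GCC and Clang provide. Both are plain functions without per-function option pragmas, so nothing here should stand in the way of inlining.

```cpp
#include <cstdint>

// Wrapping add via unsigned arithmetic, which is defined to wrap modulo 2^32.
// The conversion back to int32_t is modular in C++20; before that it is
// implementation-defined, and GCC documents it as modular.
inline int32_t wrapping_add_cast(int32_t x, int32_t y)
{
    return static_cast<int32_t>(static_cast<uint32_t>(x) + static_cast<uint32_t>(y));
}

// Wrapping add via __builtin_add_overflow (GCC/Clang extension).
// The builtin stores the wrapped result and returns true if the
// mathematically exact sum did not fit in the result type.
inline int32_t wrapping_add_builtin(int32_t x, int32_t y, bool* overflowed = nullptr)
{
    int32_t result;
    const bool ovf = __builtin_add_overflow(x, y, &result);
    if (overflowed != nullptr)
        *overflowed = ovf;
    return result;
}
```

The builtin form additionally reports whether overflow happened, which the cast form cannot do without extra comparisons.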
* Re: pragma GCC optimize prevents inlining
From: Segher Boessenkool @ 2024-01-04 16:55 UTC
To: David Brown; +Cc: gcc-help

On Thu, Jan 04, 2024 at 04:24:20PM +0100, David Brown via Gcc-help wrote:
> In my case, I wrote it in that second version! But had "-fwrapv" worked
> through inlining, it would have been convenient and neat, especially as
> I had several related functions (for a wrapping-integer class).

Most things work on function basis; almost nothing works per RTL
instruction. There is no per-instruction representation for -fwrapv
in the RTL stream.

Things are even worse for -O2 vs. -O3 etc.

> More generally, I think the expected semantics are that the additional
> options apply to code inside the function, and at the boundary you don't
> care which set of options apply. So if you have normal floating point
> code that sets "x", and then call an inline function with -ffast-math
> using "x" as a parameter and returning "y", then the inlined code can
> assume "x" is finite and not a NaN, and the later code can assume the
> returned value "y" is similarly finite. If the calculations for "x"
> produce a NaN, then the code will be UB - that's the programmer's fault.

Yes, but that is only true for -ffast-math (which means "the user does
not care about correct results" anyway).

You would not typically want random nearby code to use -fwrapv as well,
for example.


Segher

* Re: pragma GCC optimize prevents inlining
From: David Brown @ 2024-01-05 14:24 UTC
To: gcc-help

On 04/01/2024 17:55, Segher Boessenkool wrote:
> On Thu, Jan 04, 2024 at 04:24:20PM +0100, David Brown via Gcc-help wrote:
>> In my case, I wrote it in that second version! But had "-fwrapv" worked
>> through inlining, it would have been convenient and neat, especially as
>> I had several related functions (for a wrapping-integer class).
>
> Most things work on function basis; almost nothing works per RTL
> instruction. There is no per-instruction representation for -fwrapv
> in the RTL stream.
>

Yes, I appreciate that. And I can also imagine that carrying such
option information in the AST to make this possible would be a
significant burden, and very rarely of benefit - so unless there is some
other important use-case then it is not a good trade-off.

> Things are even worse for -O2 vs. -O3 etc.
>
>> More generally, I think the expected semantics are that the additional
>> options apply to code inside the function, and at the boundary you don't
>> care which set of options apply. So if you have normal floating point
>> code that sets "x", and then call an inline function with -ffast-math
>> using "x" as a parameter and returning "y", then the inlined code can
>> assume "x" is finite and not a NaN, and the later code can assume the
>> returned value "y" is similarly finite. If the calculations for "x"
>> produce a NaN, then the code will be UB - that's the programmer's fault.
>
> Yes, but that is only true for -ffast-math (which means "the user does
> not care about correct results" anyway).

(Getting a little off-topic...

Um, that's not what "-ffast-math" means. It means "the user is using
floating point as a close approximation to real number arithmetic, and
promises to stick to numerically stable calculations". All my uses of
floating point are done with "-ffast-math", and I /do/ care that the
results are correct. But the definition of "correct" for my work is "as
close to the theoretical real number result as you can get with a
limited accuracy format, plus or minus small rounding errors".

For other people, full IEEE compliance, support for NaNs, and
bit-perfect repeatable results regardless of optimisations and target
details, are important for correctness. And that's fine, and it's great
that gcc supports both kinds of code - though I believe that
"-ffast-math" would actually be more appropriate for a large proportion
of programs.)

>
> You would not typically want random nearby code to use -fwrapv as well,
> for example.
>

No, not normally. (Some people would like C to work that way, but not me!)

>
> Segher
>

* Re: pragma GCC optimize prevents inlining
From: Segher Boessenkool @ 2024-01-05 15:00 UTC
To: David Brown; +Cc: gcc-help

On Fri, Jan 05, 2024 at 03:24:48PM +0100, David Brown via Gcc-help wrote:
> On 04/01/2024 17:55, Segher Boessenkool wrote:
> >Most things work on function basis; almost nothing works per RTL
> >instruction. There is no per-instruction representation for -fwrapv
> >in the RTL stream.
>
> Yes, I appreciate that. And I can also imagine that carrying such
> option information in the AST to make this possible would be a
> significant burden, and very rarely of benefit - so unless there is some
> other important use-case then it is not a good trade-off.

Things like -fwrapv and -ftrapv have semantics that naturally could be
done per-insn. Many things are not like that :-/

But even then, what is supposed to happen if some optimisation works on
a bunch of insns, some with -fwrapv (or -ftrapv) semantics and some not?
The only safe thing to do is to not allow any transformations on mixed
insns at all.

> >Yes, but that is only true for -ffast-math (which means "the user does
> >not care about correct results" anyway).
>
> (Getting a little off-topic...
>
> Um, that's not what "-ffast-math" means. It means "the user is using
> floating point as a close approximation to real number arithmetic, and
> promises to stick to numerically stable calculations". All my uses of
> floating point are done with "-ffast-math", and I /do/ care that the
> results are correct. But the definition of "correct" for my work is "as
> close to the theoretical real number result as you can get with a
> limited accuracy format, plus or minus small rounding errors".

-ffast-math is allowed to introduce any rounding error it wants. Which
can (in a loop for example) easily introduce unlimited rounding error,
bigger than the actual result. And this is not just theoretical either.

Yes, there is a lot of code where this doesn't matter, in practice. How
lucky do you feel today?

The only way to safely use -ffast-math is to inspect the generated
machine code. After each and every compilation you do. And everyone
who uses a different compiler version (or is on a different target,
etc.) has to do the same thing.

> For other people, full IEEE compliance, support for NaNs, and
> bit-perfect repeatable results regardless of optimisations and target
> details, are important for correctness. And that's fine, and it's great
> that gcc supports both kinds of code - though I believe that
> "-ffast-math" would actually be more appropriate for a large proportion
> of programs.)

Most people think that IEEE 754 was a huge step forward over wild west
floating point like we used decades ago.


Segher

* Re: pragma GCC optimize prevents inlining
From: David Brown @ 2024-01-05 15:53 UTC
To: gcc-help

On 05/01/2024 16:00, Segher Boessenkool wrote:
> On Fri, Jan 05, 2024 at 03:24:48PM +0100, David Brown via Gcc-help wrote:
>> On 04/01/2024 17:55, Segher Boessenkool wrote:
>>> Most things work on function basis; almost nothing works per RTL
>>> instruction. There is no per-instruction representation for -fwrapv
>>> in the RTL stream.
>>
>> Yes, I appreciate that. And I can also imagine that carrying such
>> option information in the AST to make this possible would be a
>> significant burden, and very rarely of benefit - so unless there is some
>> other important use-case then it is not a good trade-off.
>
> Things like -fwrapv and -ftrapv have semantics that naturally could be
> done per-insn. Many things are not like that :-/

Indeed.

>
> But even then, what is supposed to happen if some optimisation works on
> a bunch of insns, some with -fwrapv (or -ftrapv) semantics and some not?
> The only safe thing to do is to not allow any transformations on mixed
> insns at all.

Sometimes mixing would be possible, sometimes not. You can't mix "trap
on signed integer overflow" with "wrap on signed integer overflow" and
expect useful results. But you /can/ mix "wrap on signed integer
overflow" with "signed integer overflow is UB" - then you wrap.

But I can't imagine it's worth the GCC development time trying to figure
out what could work and what could not work, and implementing this,
unless someone is /really/ bored! After all, this can all be done by
hand using conversions to unsigned types, and the __builtin_*_overflow()
functions when needed.

>
>>> Yes, but that is only true for -ffast-math (which means "the user does
>>> not care about correct results" anyway).
>>
>> (Getting a little off-topic...
>>
>> Um, that's not what "-ffast-math" means. It means "the user is using
>> floating point as a close approximation to real number arithmetic, and
>> promises to stick to numerically stable calculations". All my uses of
>> floating point are done with "-ffast-math", and I /do/ care that the
>> results are correct. But the definition of "correct" for my work is "as
>> close to the theoretical real number result as you can get with a
>> limited accuracy format, plus or minus small rounding errors".
>
> -ffast-math is allowed to introduce any rounding error it wants. Which
> can (in a loop for example) easily introduce unlimited rounding error,
> bigger than the actual result. And this is not just theoretical either.
>

Normal maths mode can also lead to rounding errors that can build up -
the fact that rounding is carefully specified with IEEE does not mean
there are no errors (compared to the theoretical perfect real-number
calculation). It may be easier to get problems with -ffast-math, and
you may get them with smaller loop counts, but it is inevitable that any
finite approximation to real numbers will lead to errors, and that some
calculations will be numerically unstable. IEEE means that you can do
your testing on a fast PC and then deploy your code in a 16-bit
microcontroller and have identical stability - but it does not mean that
you don't get rounding errors.

> Yes, there is a lot of code where this doesn't matter, in practice. How
> lucky do you feel today?

I use gcc, so I feel pretty lucky :-)

The rounding errors in -ffast-math will be very similar to those in IEEE
mode, for normal numbers. The operations are the same - it all
translates to the same floating point cpu instructions, or the same
software floating point library calls. You don't have control of
rounding modes, so you have to assume that rounding will be the least
helpful of any FLT_ROUNDS setting - but you will not get worse than
that. This is a "quality of implementation" issue, rather than a
specified guarantee, but compiler users rely on good quality
implementation all the time. After all, there are no guarantees in the
C standards or in the gcc user manual that integer multiplication will
be done using efficient code rather than repeated addition in a loop.

-ffast-math allows some changes to the order of calculations, or
contracting of expressions, so you need to take that into account. But
then, you need to take it into account in the way you write your
expressions in IEEE mode too, and unless you put a lot of effort into
picking your expression ordering, the -ffast-math re-arrangements are as
likely to improve your results (in terms of the difference compared to
theoretical results) as they are to make them worse.

Basically, I assume that the GCC developers try to be sensible and
helpful, and do not go out of their way to generate intentionally bad
code for people who use one of their optimisation flags. I assume that
if "-ffast-math" and the associated sub-flags were as big a risk as you
are implying, they would have been removed from gcc or at least a big
red warning would be added to the manual. So far, I've been lucky!

>
> The only way to safely use -ffast-math is to inspect the generated
> machine code. After each and every compilation you do. And everyone
> who uses a different compiler version (or is on a different target,
> etc.) has to do the same thing.
>

I do actually check the generated code for some of what I do. I can't
say I have ever felt the need to check generated floating point code
because I worry about the correctness, but sometimes I do so to see if
I've got the efficiency I expect (this is not floating point specific).

And I also consider exact compiler versions and build flags as a part of
my projects - bit-perfect repeatable builds are important in my work, so
I don't change compiler versions or targets within a project without
very good reason and a great deal of checking and re-testing.

>> For other people, full IEEE compliance, support for NaNs, and
>> bit-perfect repeatable results regardless of optimisations and target
>> details, are important for correctness. And that's fine, and it's great
>> that gcc supports both kinds of code - though I believe that
>> "-ffast-math" would actually be more appropriate for a large proportion
>> of programs.)
>
> Most people think that IEEE 754 was a huge step forward over wild west
> floating point like we used decades ago.
>

Oh, sure - no doubts there. But it has plenty of features that are of
no use to me, in my work, and I am happy to ignore them and have gcc
generate the best code it can while ignoring things I don't need.

David

* Re: pragma GCC optimize prevents inlining
From: Segher Boessenkool @ 2024-01-05 18:19 UTC
To: David Brown; +Cc: gcc-help

On Fri, Jan 05, 2024 at 04:53:35PM +0100, David Brown via Gcc-help wrote:
> >-ffast-math is allowed to introduce any rounding error it wants. Which
> >can (in a loop for example) easily introduce unlimited rounding error,
> >bigger than the actual result. And this is not just theoretical either.
> >
>
> Normal maths mode can also lead to rounding errors that can build up -
> the fact that rounding is carefully specified with IEEE does not mean
> there are no errors (compared to the theoretical perfect real-number
> calculation).

That's not the point. A program can be perfectly fine, with bounded
errors and all, and then -ffast-math will typically completely destroy
all that, and replace all arithmetic by the equivalent of a dice roll.

That has nothing to do with the fact that all floating point arithmetic
is an approximation to real arithmetic (arithmetic on real numbers).
The semantics of 754 (or any other standard followed) make it clear what
the exact behaviour should be, and -ffast-math tells the compiler to
ignore that and do whatever instead. You cannot have reasonable
programs that way.

> The rounding errors in -ffast-math will be very similar to those in IEEE
> mode, for normal numbers.

No, not at all. Look at what -fassociative-math does, for example.
This can **and does** cause the loss of **all** bits of precision in
certain programs. This is not theoretical. This is real.

The -ffast-math flag can only reasonably be used with programs that did
not want any specific results anyway. It would be even faster (and just
as correct!) to always return 0.


Segher

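To make the -fassociative-math point concrete, here is a small sketch of why the grouping of a floating-point sum matters. The function names sum_left and sum_right are made up for illustration. Compiled without -ffast-math, the two groupings below give different results for the inputs 1e30, -1e30, 1.0: the left-to-right sum keeps the 1.0, while the other grouping absorbs it into -1e30 and returns 0.0. With -fassociative-math the compiler is permitted to treat the two as interchangeable; whether a given GCC version actually rewrites a particular expression is not something this sketch can promise.

```cpp
// (a + b) + c: the large terms cancel first, so a small c survives.
inline double sum_left(double a, double b, double c)
{
    const double t = a + b;
    return t + c;
}

// a + (b + c): a small c is rounded away when added to a large b,
// so every bit of c can be lost before the large terms cancel.
inline double sum_right(double a, double b, double c)
{
    const double t = b + c;
    return a + t;
}
```

This is the same effect discussed elsewhere in the thread for "(a + b) + c" versus "a + (b + c)": harmless when the operands have similar magnitudes, total loss of the small term when they do not.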
* Re: pragma GCC optimize prevents inlining 2024-01-05 18:19 ` Segher Boessenkool @ 2024-01-06 17:02 ` David Brown 2024-01-07 17:51 ` Segher Boessenkool 0 siblings, 1 reply; 17+ messages in thread From: David Brown @ 2024-01-06 17:02 UTC (permalink / raw) To: gcc-help On 05/01/2024 19:19, Segher Boessenkool wrote: > On Fri, Jan 05, 2024 at 04:53:35PM +0100, David Brown via Gcc-help wrote: >>> -ffast-math is allowed to introduce any rounding error it wants. Which >>> can (in a loop for example) easily introduce unlimited rounding error, >>> bigger than the actual result. And this is not just theoretical either. >>> >> >> Normal maths mode can also lead to rounding errors that can build up - >> the fact that rounding is carefully specified with IEEE does not mean >> there are no errors (compared to the theoretical perfect real-number >> calculation). > > That's not the point. A program can be perfectly fine, with bounded > errors and all, and then -ffast-math will typically completely destroy > all that, and replace all arithmetic by the equivalent of a dice roll. > The only difference between IEEE calculations and -ffast-math calculations is that with IEEE, the ordering and rounding is controlled and consistent. For any given /single/ arithmetic operation that is performed, each can have the same amount of rounding error or error due to the limited length of the mantissa. Agreed? If you have a /sequence/ of calculations using IEEE, then the order of the operations and the types of roundings and other errors will be defined and consistent. It won't change if you change options, optimisations, compilers, targets. It won't change if you make changes to the source code that should not affect the result. So if you do extensive and careful analysis about possible maximum errors, and wide-ranging testing of possible inputs, you be confident of the accuracy of the results despite the inherent rounding errors. 
If you have the same code, but use -ffast-math, then the order in which the calculations are done may change unexpectedly, or they can be combined or modified in certain ways. You don't have the consistency. If you do extensive worst-case analysis and testing, you can be confident in the accuracy of the results. The reality is that usually, people don't do any kind of serious analysis. Of course /some/ will do so, but most people will not. They will not think about the details - because they don't have to. They will not care whether they write "(a + b) + c" or "a + (b + c)", or which the compiler does first. It does not matter if the results are consistent - they are using types with far more accuracy than they need, and rounding on the least significant bits does not affect them. They don't care - and perhaps don't even know - how precision can be lost when adding or subtracting numbers with significantly different magnitudes - because they are not doing that, at least when taking into account the number of bits in the mantissa of a double. IEEE ordering is about consistency - it is not about correctness, or accuracy. Indeed, it is quite likely that associative re-arrangements under -ffast-math give results that are closer to the mathematically correct real maths calculations. (Some other optimisations, like multiplying by reciprocals for division, will likely be slightly less accurate.) I fully appreciate that consistency is often important, and can easily be more important than absolute accuracy. (I work with real-time systems - I have often had to explain the difference between "real time" and "fast".) No matter how you are doing your calculations, you should understand your requirements, and you should understand the limitations of floating point calculations - as IEEE or -ffast-math. It is reasonable to say that you shouldn't use -ffast-math unless you know it's okay for your needs, but I think that applies to any floating point work.
(Indeed, it is also true for integers - you should not use an integer type unless you are sure its range is suitable for your needs.) But it is simply wrong to suggest that -ffast-math is inaccurate and the results are a matter of luck, unless you also consider IEEE maths to be inaccurate and a matter of luck. > That has nothing to do with the fact that all floating point arithmetic > is an approximation to real arithmetic (arithmetic on real numbers). > The semantics of 754 (or any other standard followed) make it clear what > the exact behaviour should be, and -ffast-math tells the compiler to > ignore that and do whatever instead. You cannot have reasonable > programs that way. That's not what "-ffast-math" does. I really don't understand why you think that. It is arguably an insult to the GCC developers - do you really think they'd put in an option in the compiler that is not merely useless, but is deceptively dangerous and designed specifically to break people's code and give them incorrect results? "-ffast-math" is an important optimisation for a lot of code. It makes it a great deal easier for the compiler to use things like SIMD instructions for parallel calculations, since there is no need to track faults like overflows or NaN signals. It means the compiler can make better use of limited hardware - there is a /lot/ of floating point hardware around that is "-ffast-math" compatible but not IEEE compatible. That applies to many kinds of vector and SIMD units, graphics card units, embedded processors, and other systems that skip handling of infinities, NaNs, signals, traps, etc., in order to be smaller, cheaper, faster and lower power. The importance of these optimisations can be seen in that "-ffast-math" was included in the relatively new "-Ofast" flag. And the relatively new "__builtin_assoc_barrier()" function exists solely for use in "-ffast-math" mode (or at least, "-fassociative-math"). 
This shows to me that the GCC developers see "-ffast-math" as important, relevant and useful, even if it is not something all users want. > >> The rounding errors in -ffast-math will be very similar to those in IEEE >> mode, for normal numbers. > > No, not at all. Look at what -fassociative-math does, for example. > This can **and does** cause the loss of **all** bits of precision in > certain programs. This is not theoretical. This is real. a = 1e120; b = 2; x = (a + b) - a; IEEE rules will give "x" equal to 1e120 - mathematically /completely/ wrong. -ffast-math will give "x" equal to 2, which is mathematically precisely correct. Sometimes -fassociative-math will give you worse results, sometimes it will give you better results. /Not/ using it can, and will, lead to losing all bits of precision. That is equally real. The simple matter is that if you want good results from your floating point, you need to have calculations that are appropriate for your inputs - or inputs that are appropriate for your calculations. That applies /equally/ whether you use -ffast-math or not. > > The -ffast-math flag can only reasonably be used with programs that did > not want any specific results anyway. It would be even faster (and just > as correct!) to always return 0. > That is simply wrong. If you still don't understand what I am saying, then I think this mailing list is probably not the best place for such a discussion (unless others here want to chime in). There are no doubt appropriate forums where experts on floating point mathematics hang out, and can give far better explanations than I could - but I don't know where. This is not something that interests me enough - I know enough to be fully confident in the floating point I need for my own uses, and fully confident that "-ffast-math" gives me what I need with more efficient results than not using it would.
I know enough to know where my limits are, and when I would need a lot more thought and analysis, or outside help and advice. David ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: pragma GCC optimize prevents inlining 2024-01-06 17:02 ` David Brown @ 2024-01-07 17:51 ` Segher Boessenkool 2024-01-07 18:36 ` Gabriel Ravier 2024-01-08 15:53 ` David Brown 0 siblings, 2 replies; 17+ messages in thread From: Segher Boessenkool @ 2024-01-07 17:51 UTC (permalink / raw) To: David Brown; +Cc: gcc-help On Sat, Jan 06, 2024 at 06:02:45PM +0100, David Brown wrote: > On 05/01/2024 19:19, Segher Boessenkool wrote: > >That's not the point. A program can be perfectly fine, with bounded > >errors and all, and then -ffast-math will typically completely destroy > >all that, and replace all arithmetic by the equivalent of a dice roll. > > The only difference between IEEE calculations and -ffast-math > calculations is that with IEEE, the ordering and rounding is controlled > and consistent. No, that is not the only difference. '-ffast-math' Sets the options '-fno-math-errno', '-funsafe-math-optimizations', '-ffinite-math-only', '-fno-rounding-math', '-fno-signaling-nans', '-fcx-limited-range' and '-fexcess-precision=fast'. Many of those do much more than what you say, can result in the compiler generating completely different code. > For any given /single/ arithmetic operation that is > performed, each can have the same amount of rounding error or error due > to the limited length of the mantissa. Agreed? I don't understand what you mean to say even. > >>The rounding errors in -ffast-math will be very similar to those in IEEE > >>mode, for normal numbers. > > > >No, not at all. Look at what -fassociative-math does, for example. > >This can **and does** cause the loss of **all** bits of precision in > >certain programs. This is not theoretical. This is real. > > a = 1e120; > b = 2; > > x = (a + b) - a; > > IEEE rules will give "x" equal to 1e120 - mathematically /completely/ > wrong. -ffast-math will give "x" equal to 2, which is mathematically > precisely correct. The IEEE result is 0. Which is the **exactly correct** result. 
This is a computer program, not some formulas that you can manipulate at will. > >The -ffast-math flag can only reasonably be used with programs that did > >not want any specific results anyway. It would be even faster (and just > >as correct!) to always return 0. > > That is simply wrong. It is an exaggeration for dramatic effect, but it is fundamentally correct. Segher ^ permalink raw reply [flat|nested] 17+ messages in thread
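For reference, the disputed expression can be checked directly. The following is a minimal sketch, assuming IEEE 754 binary64 doubles and a build without -ffast-math (for example plain -O2); the function names are hypothetical and do not come from the thread. Since 2 is far smaller than half a unit in the last place of 1e120, the sum a + b rounds back to a, and the subtraction then yields 0. With -ffast-math (specifically -fassociative-math) the compiler is permitted to simplify the expression to b, so the same source may return 2 instead.

```cpp
#include <cassert>

// Evaluates (a + b) - a in the order written in the source.
// Assumption: compiled without -ffast-math / -fassociative-math,
// so the compiler may not re-associate the operands.
double add_then_subtract(double a, double b) {
    double sum = a + b;  // for a = 1e120, b = 2 the addend is absorbed: sum == a
    return sum - a;      // hence 0 under IEEE rules
}

// Hypothetical self-check for the values used in the thread.
void check_ieee_result() {
    assert(1e120 + 2.0 == 1e120);                  // b is lost in the addition
    assert(add_then_subtract(1e120, 2.0) == 0.0);  // the IEEE result is 0, not 1e120
}
```

Calling check_ieee_result() in a strict IEEE build passes silently; what the same code returns under -ffast-math depends on which simplifications the compiler chooses to apply.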
* Re: pragma GCC optimize prevents inlining 2024-01-07 17:51 ` Segher Boessenkool @ 2024-01-07 18:36 ` Gabriel Ravier 2024-01-08 15:53 ` David Brown 1 sibling, 0 replies; 17+ messages in thread From: Gabriel Ravier @ 2024-01-07 18:36 UTC (permalink / raw) To: Segher Boessenkool, David Brown; +Cc: gcc-help On 1/7/24 17:51, Segher Boessenkool wrote: > On Sat, Jan 06, 2024 at 06:02:45PM +0100, David Brown wrote: >> On 05/01/2024 19:19, Segher Boessenkool wrote: >>> That's not the point. A program can be perfectly fine, with bounded >>> errors and all, and then -ffast-math will typically completely destroy >>> all that, and replace all arithmetic by the equivalent of a dice roll. >> The only difference between IEEE calculations and -ffast-math >> calculations is that with IEEE, the ordering and rounding is controlled >> and consistent. > No, that is not the only difference. > > '-ffast-math' > Sets the options '-fno-math-errno', '-funsafe-math-optimizations', > '-ffinite-math-only', '-fno-rounding-math', '-fno-signaling-nans', > '-fcx-limited-range' and '-fexcess-precision=fast'. > > Many of those do much more than what you say, can result in the compiler > generating completely different code. > >> For any given /single/ arithmetic operation that is >> performed, each can have the same amount of rounding error or error due >> to the limited length of the mantissa. Agreed? > I don't understand what you mean to say even. > >>>> The rounding errors in -ffast-math will be very similar to those in IEEE >>>> mode, for normal numbers. >>> No, not at all. Look at what -fassociative-math does, for example. >>> This can **and does** cause the loss of **all** bits of precision in >>> certain programs. This is not theoretical. This is real. >> a = 1e120; >> b = 2; >> >> x = (a + b) - a; >> >> IEEE rules will give "x" equal to 1e120 - mathematically /completely/ >> wrong. -ffast-math will give "x" equal to 2, which is mathematically >> precisely correct. > The IEEE result is 0. 
Which is the **exactly correct** result. This is > a computer program, not some formulas that you can manipulate at will. That seems to be where the disagreement lies. Those that use -ffast-math with full knowledge of what it does are presumably acting with the intent that their program should indeed be treated as "some formulas you can manipulate at will". > >>> The -ffast-math flag can only reasonably be used with programs that did >>> not want any specific results anyway. It would be even faster (and just >>> as correct!) to always return 0. >> That is simply wrong. > It is an exaggeration for dramatic effect, but it is fundamentally > correct. > > > Segher ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: pragma GCC optimize prevents inlining 2024-01-07 17:51 ` Segher Boessenkool 2024-01-07 18:36 ` Gabriel Ravier @ 2024-01-08 15:53 ` David Brown 1 sibling, 0 replies; 17+ messages in thread From: David Brown @ 2024-01-08 15:53 UTC (permalink / raw) To: gcc-help On 07/01/2024 18:51, Segher Boessenkool wrote: > On Sat, Jan 06, 2024 at 06:02:45PM +0100, David Brown wrote: >> On 05/01/2024 19:19, Segher Boessenkool wrote: >>> That's not the point. A program can be perfectly fine, with bounded >>> errors and all, and then -ffast-math will typically completely destroy >>> all that, and replace all arithmetic by the equivalent of a dice roll. >> >> The only difference between IEEE calculations and -ffast-math >> calculations is that with IEEE, the ordering and rounding is controlled >> and consistent. > > No, that is not the only difference. > > '-ffast-math' > Sets the options '-fno-math-errno', '-funsafe-math-optimizations', > '-ffinite-math-only', '-fno-rounding-math', '-fno-signaling-nans', > '-fcx-limited-range' and '-fexcess-precision=fast'. > > Many of those do much more than what you say, can result in the compiler > generating completely different code. I know what these do - they are described in the gcc manual. And they are all good things for the kind of code I write. But I did not list them in my posts because it would take too much space to include them all, every time - I have just concentrated on a couple of points. > >> For any given /single/ arithmetic operation that is >> performed, each can have the same amount of rounding error or error due >> to the limited length of the mantissa. Agreed? > > I don't understand what you mean to say even. I mean that if you write "x = a + b;" for floating point types, you will, in general, get a rounding error. And the magnitude of the worst-case rounding error will be the same whether you are using IEEE rules or "-ffast-math" rules. 
With IEEE the rounding error will be consistent and predictable, and for some cases that is important - but it will not be less of a rounding error. > >>>> The rounding errors in -ffast-math will be very similar to those in IEEE >>>> mode, for normal numbers. >>> >>> No, not at all. Look at what -fassociative-math does, for example. >>> This can **and does** cause the loss of **all** bits of precision in >>> certain programs. This is not theoretical. This is real. >> >> a = 1e120; >> b = 2; >> >> x = (a + b) - a; >> >> IEEE rules will give "x" equal to 1e120 - mathematically /completely/ >> wrong. -ffast-math will give "x" equal to 2, which is mathematically >> precisely correct. > > The IEEE result is 0. Sorry, of course that is what the IEEE rules will give you. It does not help if I make silly mistakes like that! > Which is the **exactly correct** result. It is the exactly correct result for IEEE floating point. But 2 is the exactly correct result for modelling real number arithmetic. And for my own use - and I believe for the majority of cases when people use floating point - the aim of floating point in code is to model real number arithmetic as closely as practically possible in an efficient manner. Of course it is important, whether you use -ffast-math or not, to use appropriate numbers and appropriate calculations - trying to evaluate "(1e120 + 2) - 1e120" is never going to be a good idea. But the fact remains that - for the value of "right" and "wrong" that matters to most people - the IEEE rules will silently give you the wrong answer here. The -ffast-math rules might give the right answer, and might give the wrong answer. Occasionally "guaranteed wrong" is better than "sometimes wrong" - it can sometimes make debugging and regression testing easier. Most of the time, they are simply both bad. 
Can you give an example where -fassociative-math will, as you claim, give a result that loses /all/ bits of precision - while IEEE rules would give a precise answer? It does not have to be all bits - I'm happy with simply losing noticeably more bits with "-fassociative-math" than with IEEE rules. But I want it to use the important metric for correctness - closeness to the result using infinite precision real arithmetic - not just closeness to the artificial value required by IEEE rules. And I want it to be the result of realistic calculations with realistic numbers, using a small number of calculations. (Again, I appreciate that for some uses, predictable and consistent results are vital even if they do not match the real arithmetic, and IEEE rules are of great importance. I am not arguing that IEEE rules are bad - I am arguing that -ffast-math rules are good for some uses.) > This is > a computer program, not some formulas that you can manipulate at will. > I expect the compiler to manipulate things according to its rules. The gcc manual says how it can manipulate the floating point code I write if -ffast-math is enabled. If I have written floating point code where the results are equally good - by my requirements - when these manipulations are done, then -ffast-math is safe for me and gives me correct results. If, as you earlier suggested (exaggerating to try to make a point), the compiler could manipulate the code to simply return 0, then I would agree with you that the flag is dangerous and worse than useless. But fortunately, that is not the case. >>> The -ffast-math flag can only reasonably be used with programs that did >>> not want any specific results anyway. It would be even faster (and just >>> as correct!) to always return 0. >> >> That is simply wrong. > > It is an exaggeration for dramatic effect, but it is fundamentally > correct. > You have vastly more knowledge than I about the internals of gcc and how it works.
You also know vastly more about IEEE floating point rules than I do. And I expect you have worked on far more programs for which IEEE rules are important, because they have almost never been relevant to work I have done. But I have regularly used floating point maths in code, in real-life programs. I have regularly used -ffast-math, and seen how it makes my programs faster - sometimes a great deal faster. I have read the details of the -ffast-math flags in the gcc manual, and I know what they do and what manipulations of my code they allow. I know what I have to be aware of for my floating point code to give useful answers (i.e., answers that are at least as close to the real mathematical answers as they need to be for the way the results are used). I know that sticking to IEEE rules would /never/ give more useful answers for my needs - they would give equally useful answers, slower. Maybe there are some people who write floating point code where they get useful answers with IEEE rules, but enabling -ffast-math would result in useless results. I can't answer for other people - only for my own code. And I know that what you write about -ffast-math being as useless as "return 0" is not merely an exaggeration, it is pure FUD. And writing it detracts from the very real factors that are always important when writing floating point code, whether IEEE or -ffast-math, and it means you miss out on an opportunity to discuss where the real-world differences lie between the -ffast-math flags and full IEEE compatibility. David ^ permalink raw reply [flat|nested] 17+ messages in thread
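One commonly cited case relevant to the request above for an example is compensated (Kahan) summation, where the correction term is algebraically zero and only carries information under the evaluation order as written. The sketch below is illustrative and not taken from the thread; the function names are hypothetical, and it assumes IEEE 754 doubles and a build without -ffast-math. Under -fassociative-math the compiler is permitted to simplify (t - sum) - y to zero, which would reduce kahan_sum to the naive loop; whether a given GCC version actually does so is not something this sketch verifies.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Plain left-to-right summation: rounding errors accumulate.
double naive_sum(const std::vector<double>& xs) {
    double sum = 0.0;
    for (double x : xs) sum += x;
    return sum;
}

// Kahan (compensated) summation. The term c captures the rounding
// error of each addition and feeds it back into the next one.
double kahan_sum(const std::vector<double>& xs) {
    double sum = 0.0;
    double c = 0.0;              // running compensation
    for (double x : xs) {
        double y = x - c;        // apply the correction from the previous step
        double t = sum + y;      // low-order bits of y may be lost here
        c = (t - sum) - y;       // zero in real arithmetic; the lost bits in IEEE
        sum = t;
    }
    return sum;
}

// Absolute distance from the expected real-number result.
double abs_error(double got, double expected) {
    return std::fabs(got - expected);
}

// Hypothetical self-check: one million additions of 0.1, compared
// against the real-number result 100000.
void check_kahan_beats_naive() {
    std::vector<double> xs(1000000, 0.1);
    assert(abs_error(kahan_sum(xs), 1e5) < abs_error(naive_sum(xs), 1e5));
}
```

In a strict IEEE build the compensated sum lands much closer to 100000 than the naive one, using the closeness-to-real-arithmetic metric asked for above. The comparison is only meaningful as long as the compiler keeps the compensation term.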
end of thread, other threads:[~2024-01-09 13:38 UTC | newest]

Thread overview: 17+ messages
2024-01-04  9:01 pragma GCC optimize prevents inlining Hashan Gayasri
2024-01-04  9:27 ` LIU Hao
2024-01-05  0:56 ` Hashan Gayasri
2024-01-04 14:51 ` David Brown
2024-01-04 15:03 ` Segher Boessenkool
2024-01-04 15:24 ` David Brown
2024-01-04 16:37 ` Richard Earnshaw
2024-01-09 13:38 ` Florian Weimer
2024-01-04 16:55 ` Segher Boessenkool
2024-01-05 14:24 ` David Brown
2024-01-05 15:00 ` Segher Boessenkool
2024-01-05 15:53 ` David Brown
2024-01-05 18:19 ` Segher Boessenkool
2024-01-06 17:02 ` David Brown
2024-01-07 17:51 ` Segher Boessenkool
2024-01-07 18:36 ` Gabriel Ravier
2024-01-08 15:53 ` David Brown