* pragma GCC optimize prevents inlining
@ 2024-01-04 9:01 Hashan Gayasri
2024-01-04 9:27 ` LIU Hao
2024-01-04 14:51 ` David Brown
0 siblings, 2 replies; 17+ messages in thread
From: Hashan Gayasri @ 2024-01-04 9:01 UTC (permalink / raw)
To: gcc-help
Hi,
I noticed that GCC doesn't inline functions that have extra optimization
options added via pragma GCC optimize after the pop_options statement.
If the optimizations are different between the caller and the callee, seems
like gcc behaves as if the called function's definition is opaque and not
visible at all (as if it was declared in a different translation unit). It
doesn't even notice that the function doesn't have side effects unless
marked so explicitly.
I wanted the following to be optimized:
#pragma GCC push_options
#pragma GCC optimize ("-ffast-math")
inline int64_t __attribute__ ((const)) RoundToNearestLong (double v)
{
assert(fegetround() == FE_TONEAREST);
return std::lrint(v);
}
#pragma GCC pop_options
So that std::lrint uses the vcvtsd2si instruction on X86 with SSE2. It
does that, but the function is then no longer inlined. I compiled with
-O3 -march=native -DNDEBUG.
If I used attribute always_inline, the inlined version didn't seem to
respect the additional optimization options I provided.
const and pure function attributes helped in removing unused/no side effect
code paths but the function still got called.
Please advise if there is any workaround for this. Using intrinsics
works but is less than ideal.
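
For reference, the intrinsics route mentioned above could look roughly like the sketch below. It is only an illustration, assuming an x86-64 target with SSE2; the name `RoundToNearestLongIntrin` and the `std::llrint` fallback for other targets are choices made for this sketch and are not part of the original code.

```cpp
#include <cassert>
#include <cfenv>
#include <cmath>
#include <cstdint>

#if defined(__x86_64__) && defined(__SSE2__)
#include <emmintrin.h>  // _mm_set_sd, _mm_cvtsd_si64
#endif

// Hypothetical helper: converts a double to a 64-bit integer using the
// current rounding mode (round-to-nearest-even by default).
inline std::int64_t RoundToNearestLongIntrin(double v)
{
    // Same sanity check as in the original function; compiled out with -DNDEBUG.
    assert(std::fegetround() == FE_TONEAREST);
#if defined(__x86_64__) && defined(__SSE2__)
    // Maps to (v)cvtsd2si and never sets errno, so no per-function
    // optimization options are needed and normal inlining rules apply.
    return static_cast<std::int64_t>(_mm_cvtsd_si64(_mm_set_sd(v)));
#else
    // Portable fallback for other targets (an assumption of this sketch).
    return static_cast<std::int64_t>(std::llrint(v));
#endif
}
```

With the default rounding mode, `RoundToNearestLongIntrin(2.5)` yields 2 and `RoundToNearestLongIntrin(3.5)` yields 4, since ties round to the nearest even value.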
(Also let me know if I should be sending this to a different mailing list)
Thanks in advance!
Best Regards,
Hashan Gayasri
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: pragma GCC optimize prevents inlining
2024-01-04 9:01 pragma GCC optimize prevents inlining Hashan Gayasri
@ 2024-01-04 9:27 ` LIU Hao
2024-01-05 0:56 ` Hashan Gayasri
2024-01-04 14:51 ` David Brown
1 sibling, 1 reply; 17+ messages in thread
From: LIU Hao @ 2024-01-04 9:27 UTC (permalink / raw)
To: Hashan Gayasri, gcc-help
On 2024/1/4 17:01, Hashan Gayasri via Gcc-help wrote:
> I wanted the following to be to be optimized:
>
> (... ...)
>
> So that std::lrint uses the vcvtsd2si instruction on X86 with SSE2. It
> does that but prevents the instruction from being inlined. I complied with
> - O3 -march=native -DNDEBUG.
Actually `-ffast-math` is overkill; `-fno-math-errno` is not harmful in practice, and can be enabled
globally:
(https://gcc.godbolt.org/z/hhfP6cYrr)
```
//#pragma GCC push_options
//#pragma GCC optimize ("-ffast-math")
inline int64_t __attribute__ ((const)) RoundToNearestLong (double v)
{
// assert(fegetround() == FE_TONEAREST);
return lrint(v);
}
//#pragma GCC pop_options
void
xgset(int64_t& r, double s)
{
r = RoundToNearestLong(s);
}
```
results in
```
xgset(long&, double):
vcvtsd2si rax, xmm0
mov QWORD PTR [rdi], rax
ret
```
--
Best regards,
LIU Hao
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: pragma GCC optimize prevents inlining
2024-01-04 9:01 pragma GCC optimize prevents inlining Hashan Gayasri
2024-01-04 9:27 ` LIU Hao
@ 2024-01-04 14:51 ` David Brown
2024-01-04 15:03 ` Segher Boessenkool
1 sibling, 1 reply; 17+ messages in thread
From: David Brown @ 2024-01-04 14:51 UTC (permalink / raw)
To: gcc-help
On 04/01/2024 10:01, Hashan Gayasri via Gcc-help wrote:
> Hi,
>
> I noticed that GCC doesn't inline functions that have extra optimization
> options added via pragma GCC optimize after the pop_options statement.
>
> If the optimizations are different between the caller and the callee, seems
> like gcc behaves as if the called function's definition is opaque and not
> visible at all (as if it was declared in a different translation unit). It
> doesn't even notice that the function doesn't have side effects unless
> marked so explicitly.
>
> I wanted the following to be to be optimized:
>
> #pragma GCC push_options
> #pragma GCC optimize ("-ffast-math")
>
> inline int64_t __attribute__ ((const)) RoundToNearestLong (double v)
> {
> assert(fegetround() == FE_TONEAREST);
> return std::lrint(v);
> }
>
> #pragma GCC pop_options
>
>
> So that std::lrint uses the vcvtsd2si instruction on X86 with SSE2. It
> does that but prevents the instruction from being inlined. I complied with
> - O3 -march=native -DNDEBUG.
>
> If I used attribute always_inline, the inlined version didn't seem to
> respect the additional optimization options I provided.
> const and pure function attributes helped in removing unused/no side effect
> code paths but the function still got called.
>
>
> Please advice if there's any way work around for this. Using intrinsics
> works but less than ideal.
>
> (Also let me know if I should be sending this to a different mailing list)
>
> Thanks in advance!
>
> Best Regards,
> Hashan Gayasri
>
This is a general limitation in GCC, as far as I know. I have come
across it myself (in my case it was the "-fwrapv" flag). As far as I
remember from a previous discussion long ago, there is no easy workaround.
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: pragma GCC optimize prevents inlining
2024-01-04 14:51 ` David Brown
@ 2024-01-04 15:03 ` Segher Boessenkool
2024-01-04 15:24 ` David Brown
0 siblings, 1 reply; 17+ messages in thread
From: Segher Boessenkool @ 2024-01-04 15:03 UTC (permalink / raw)
To: David Brown; +Cc: gcc-help
On Thu, Jan 04, 2024 at 03:51:23PM +0100, David Brown via Gcc-help wrote:
> This is a general limitation in GCC, as far as I know. I have come
> across it myself (in my case it was the "-fwrapv" flag). As far as I
> remember from a previous discussion long ago, there is no easy workaround.
What are the expected semantics? That depends on the use case, so on
what the user expects. If the compiler inlines the function and picks
either set of options, it may do something the user wanted to avoid.
Not good.
The user can always write exactly what the user wants, instead :-)
Maybe we could have an option -fallow-inlining-that-changes-semantics?
Not sure if people will actually find that useful, but at least they
cannot say they weren't warned if they use that ;-)
Segher
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: pragma GCC optimize prevents inlining
2024-01-04 15:03 ` Segher Boessenkool
@ 2024-01-04 15:24 ` David Brown
2024-01-04 16:37 ` Richard Earnshaw
2024-01-04 16:55 ` Segher Boessenkool
0 siblings, 2 replies; 17+ messages in thread
From: David Brown @ 2024-01-04 15:24 UTC (permalink / raw)
To: gcc-help
On 04/01/2024 16:03, Segher Boessenkool wrote:
> On Thu, Jan 04, 2024 at 03:51:23PM +0100, David Brown via Gcc-help wrote:
>> This is a general limitation in GCC, as far as I know. I have come
>> across it myself (in my case it was the "-fwrapv" flag). As far as I
>> remember from a previous discussion long ago, there is no easy workaround.
>
> What are the expected semantics? That depends on the use case, so on
> what the user expects. If the compiler inlines the function and picks
> either set of options, it may do something the user wanted to avoid.
> Not good.
>
Yes, I realise that's a key problem.
What I wanted in my own case was that this function:
#pragma GCC push_options
#pragma GCC optimize("-fwrapv")
inline int32_t add(int32_t x, int32_t y) {
return x + y;
}
#pragma GCC pop_options
would work exactly as though I had :
inline int32_t add(int32_t x, int32_t y) {
return (int32_t) ((uint32_t) x + (uint32_t) y);
}
> The user can always write exactly what the user wants, instead :-)
In my case, I wrote it in that second version! But had "-fwrapv" worked
through inlining, it would have been convenient and neat, especially as
I had several related functions (for a wrapping-integer class).
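
A minimal sketch of what such hand-written wrapping helpers might look like is given below. The names `wrap_add`, `wrap_sub`, `wrap_mul` and `wrap_examples` are hypothetical, and the sketch assumes the usual platforms where `int` is 32 bits wide, so that `uint32_t` operands are not promoted to a signed type.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Do the arithmetic in uint32_t, where overflow is well-defined modular
// arithmetic, then convert back. The conversion back to int32_t is modular
// in C++20 and implementation-defined (modular in GCC) before that.
inline std::int32_t wrap_add(std::int32_t x, std::int32_t y) {
    return static_cast<std::int32_t>(static_cast<std::uint32_t>(x) +
                                     static_cast<std::uint32_t>(y));
}

inline std::int32_t wrap_sub(std::int32_t x, std::int32_t y) {
    return static_cast<std::int32_t>(static_cast<std::uint32_t>(x) -
                                     static_cast<std::uint32_t>(y));
}

inline std::int32_t wrap_mul(std::int32_t x, std::int32_t y) {
    return static_cast<std::int32_t>(static_cast<std::uint32_t>(x) *
                                     static_cast<std::uint32_t>(y));
}

// Usage examples, written as assertions.
inline void wrap_examples() {
    constexpr std::int32_t max = std::numeric_limits<std::int32_t>::max();
    constexpr std::int32_t min = std::numeric_limits<std::int32_t>::min();
    assert(wrap_add(max, 1) == min);  // wraps around instead of being UB
    assert(wrap_sub(min, 1) == max);
    assert(wrap_mul(max, 2) == -2);   // 0x7FFFFFFF * 2 == 0xFFFFFFFE
}
```

Because these functions need no special options, they are compiled with the same settings as their callers and can be inlined as usual.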
More generally, I think the expected semantics are that the additional
options apply to code inside the function, and at the boundary you don't
care which set of options apply. So if you have normal floating point
code that sets "x", and then call an inline function with -ffast-math
using "x" as a parameter and returning "y", then the inlined code can
assume "x" is finite and not a NaN, and the later code can assume the
returned value "y" is similarly finite. If the calculations for "x"
produce a NaN, then the code will be UB - that's the programmer's fault.
>
> Maybe we could have an option -fallow-inlining-that-changes-semantics?
> Not sure if people will actually find that useful, but at least they
> cannot say they weren't warned if they use that ;-)
>
>
> Segher
>
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: pragma GCC optimize prevents inlining
2024-01-04 15:24 ` David Brown
@ 2024-01-04 16:37 ` Richard Earnshaw
2024-01-09 13:38 ` Florian Weimer
2024-01-04 16:55 ` Segher Boessenkool
1 sibling, 1 reply; 17+ messages in thread
From: Richard Earnshaw @ 2024-01-04 16:37 UTC (permalink / raw)
To: David Brown, gcc-help
On 04/01/2024 15:24, David Brown via Gcc-help wrote:
> On 04/01/2024 16:03, Segher Boessenkool wrote:
>> On Thu, Jan 04, 2024 at 03:51:23PM +0100, David Brown via Gcc-help wrote:
>>> This is a general limitation in GCC, as far as I know. I have come
>>> across it myself (in my case it was the "-fwrapv" flag). As far as I
>>> remember from a previous discussion long ago, there is no easy
>>> workaround.
>>
>> What are the expected semantics? That depends on the use case, so on
>> what the user expects. If the compiler inlines the function and picks
>> either set of options, it may do something the user wanted to avoid.
>> Not good.
>>
>
> Yes, I realise that's a key problem.
>
> What I wanted in my own case was that this function:
>
> #pragma GCC push_options
> #pragma GCC optimize("-fwrapv")
> inline int32_t add(int32_t x, int32_t y) {
> return x + y;
> }
> #pragma GCC pop_options
>
> would work exactly as though I had :
>
> inline int32_t add(int32_t x, int32_t y) {
> return (int32_t) ((uint32_t) x + (uint32_t) y);
> }
>
>> The user can always write exactly what the user wants, instead :-)
>
> In my case, I wrote it in that second version! But had "-fwrapv" worked
> through inlining, it would have been convenient and neat, especially as
> I had several related functions (for a wrapping-integer class).
>
>
> More generally, I think the expected semantics are that the additional
> options apply to code inside the function, and at the boundary you don't
> care which set of options apply. So if you have normal floating point
> code that sets "x", and then call an inline function with -ffast-math
> using "x" as a parameter and returning "y", then the inlined could can
> assume "x" is finite and not a NaN, and the later code can assume the
> returned value "y" is similarly finite. If the calculations for "x"
> produce a NaN, then the code will be UB - that's the programmer's fault.
>
>>
>> Maybe we could have an option -fallow-inlining-that-changes-semantics?
>> Not sure if people will actually find that useful, but at least they
>> cannot say they weren't warned if they use that ;-)
>>
>>
>> Segher
>>
>
>
The general problem here is that the AST does not carry enough
information to handle all these cases within a single function. To
merge wrap/no-wrap code, for example, we'd need separate codes for ADD,
SUBTRACT, MULTIPLY, etc.
At present that 'state' is carried globally, which makes merging
impossible without losing the state information.
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: pragma GCC optimize prevents inlining
2024-01-04 15:24 ` David Brown
2024-01-04 16:37 ` Richard Earnshaw
@ 2024-01-04 16:55 ` Segher Boessenkool
2024-01-05 14:24 ` David Brown
1 sibling, 1 reply; 17+ messages in thread
From: Segher Boessenkool @ 2024-01-04 16:55 UTC (permalink / raw)
To: David Brown; +Cc: gcc-help
On Thu, Jan 04, 2024 at 04:24:20PM +0100, David Brown via Gcc-help wrote:
> In my case, I wrote it in that second version! But had "-fwrapv" worked
> through inlining, it would have been convenient and neat, especially as
> I had several related functions (for a wrapping-integer class).
Most things work on function basis; almost nothing works per RTL
instruction. There is no per-instruction representation for -fwrapv
in the RTL stream.
Things are even worse for -O2 vs. -O3 etc.
> More generally, I think the expected semantics are that the additional
> options apply to code inside the function, and at the boundary you don't
> care which set of options apply. So if you have normal floating point
> code that sets "x", and then call an inline function with -ffast-math
> using "x" as a parameter and returning "y", then the inlined could can
> assume "x" is finite and not a NaN, and the later code can assume the
> returned value "y" is similarly finite. If the calculations for "x"
> produce a NaN, then the code will be UB - that's the programmer's fault.
Yes, but that is only true for -ffast-math (which means "the user does
not care about correct results" anyway).
You would not typically want random nearby code to use -fwrapv as well,
for example.
Segher
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: pragma GCC optimize prevents inlining
2024-01-04 9:27 ` LIU Hao
@ 2024-01-05 0:56 ` Hashan Gayasri
0 siblings, 0 replies; 17+ messages in thread
From: Hashan Gayasri @ 2024-01-05 0:56 UTC (permalink / raw)
To: LIU Hao; +Cc: gcc-help
Hi Hao,
Thanks for the suggestion! Yes, -fno-math-errno is definitely more
suitable for enabling globally than -ffast-math.
While I don't particularly remember using errno for math functions,
it is used for non-math functions. So even though it seems
reasonable to enable it globally, it is still a bit tricky to validate that it
doesn't cause any unintended side effects in a large codebase with
third-party libs.
Another weird side effect I noticed is that GCC still doesn't inline the
function when it is within a `#pragma GCC optimize
("-fno-math-errno")` region and -ffast-math is enabled globally, even though
-fno-math-errno is a subset of -ffast-math. If you enable both -ffast-math
and -fno-math-errno globally, the function gets inlined.
I'm not sure if improving that should be considered a bug fix or a
feature/enhancement.
Best Regards,
Hashan Gayasri
On Thu, 4 Jan 2024, 8:28 pm LIU Hao, <lh_mouse@126.com> wrote:
> On 2024/1/4 17:01, Hashan Gayasri via Gcc-help wrote:
> > I wanted the following to be to be optimized:
> >
> > (... ...)
> >
> > So that std::lrint uses the vcvtsd2si instruction on X86 with SSE2. It
> > does that but prevents the instruction from being inlined. I complied
> with
> > - O3 -march=native -DNDEBUG.
>
> Actually `-ffast-math` is an overkill; `-fno-math-errno` isn't practically
> bad, and can be enabled
> globally:
> (https://gcc.godbolt.org/z/hhfP6cYrr)
>
> ```
> //#pragma GCC push_options
> //#pragma GCC optimize ("-ffast-math")
>
> inline int64_t __attribute__ ((const)) RoundToNearestLong (double v)
> {
> // assert(fegetround() == FE_TONEAREST);
> return lrint(v);
> }
>
> //#pragma GCC pop_options
>
> void
> xgset(int64_t& r, double s)
> {
> r = RoundToNearestLong(s);
> }
> ```
>
> results in
> ```
> xgset(long&, double):
> vcvtsd2si rax, xmm0
> mov QWORD PTR [rdi], rax
> ret
> ```
>
>
>
> --
> Best regards,
> LIU Hao
>
>
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: pragma GCC optimize prevents inlining
2024-01-04 16:55 ` Segher Boessenkool
@ 2024-01-05 14:24 ` David Brown
2024-01-05 15:00 ` Segher Boessenkool
0 siblings, 1 reply; 17+ messages in thread
From: David Brown @ 2024-01-05 14:24 UTC (permalink / raw)
To: gcc-help
On 04/01/2024 17:55, Segher Boessenkool wrote:
> On Thu, Jan 04, 2024 at 04:24:20PM +0100, David Brown via Gcc-help wrote:
>> In my case, I wrote it in that second version! But had "-fwrapv" worked
>> through inlining, it would have been convenient and neat, especially as
>> I had several related functions (for a wrapping-integer class).
>
> Most things work on function basis; almost nothing works per RTL
> instruction. There is no per-instruction representation for -fwrapv
> in the RTL stream.
>
Yes, I appreciate that. And I can also imagine that carrying such
option information in the AST to make this possible would be a
significant burden, and very rarely of benefit - so unless there is some
other important use-case then it is not a good trade-off.
> Things are even worse for -O2 vs. -O3 etc.
>
>> More generally, I think the expected semantics are that the additional
>> options apply to code inside the function, and at the boundary you don't
>> care which set of options apply. So if you have normal floating point
>> code that sets "x", and then call an inline function with -ffast-math
>> using "x" as a parameter and returning "y", then the inlined could can
>> assume "x" is finite and not a NaN, and the later code can assume the
>> returned value "y" is similarly finite. If the calculations for "x"
>> produce a NaN, then the code will be UB - that's the programmer's fault.
>
> Yes, but that is only true for -ffast-math (which means "the user does
> not care about correct results" anyway).
(Getting a little off-topic...
Um, that's not what "-ffast-math" means. It means "the user is using
floating point as a close approximation to real number arithmetic, and
promises to stick to numerically stable calculations". All my uses of
floating point are done with "-ffast-math", and I /do/ care that the
results are correct. But the definition of "correct" for my work is "as
close to the theoretical real number result as you can get with a
limited accuracy format, plus or minus small rounding errors".
For other people, full IEEE compliance, support for NaNs, and
bit-perfect repeatable results regardless of optimisations and target
details, are important for correctness. And that's fine, and it's great
that gcc supports both kinds of code - though I believe that
"-ffast-math" would actually be more appropriate for a large proportion
of programs.)
>
> You would not typically want random nearby code to use -fwrapv as well,
> for example.
>
No, not normally. (Some people would like C to work that way, but not me!)
>
> Segher
>
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: pragma GCC optimize prevents inlining
2024-01-05 14:24 ` David Brown
@ 2024-01-05 15:00 ` Segher Boessenkool
2024-01-05 15:53 ` David Brown
0 siblings, 1 reply; 17+ messages in thread
From: Segher Boessenkool @ 2024-01-05 15:00 UTC (permalink / raw)
To: David Brown; +Cc: gcc-help
On Fri, Jan 05, 2024 at 03:24:48PM +0100, David Brown via Gcc-help wrote:
> On 04/01/2024 17:55, Segher Boessenkool wrote:
> >Most things work on function basis; almost nothing works per RTL
> >instruction. There is no per-instruction representation for -fwrapv
> >in the RTL stream.
>
> Yes, I appreciate that. And I can also imagine that carrying such
> option information in the AST to make this possible would be a
> significant burden, and very rarely of benefit - so unless there is some
> other important use-case then it is not a good trade-off.
Things like -fwrapv and -ftrapv have semantics that naturally could be
done per-insn. Many things are not like that :-/
But even then, what is supposed to happen if some optimisation works on
a bunch of insns, some with -fwrapv (or -ftrapv) semantics and some not?
The only safe thing to do is to not allow any transformations on mixed
insns at all.
> >Yes, but that is only true for -ffast-math (which means "the user does
> >not care about correct results" anyway).
>
> (Getting a little off-topic...
>
> Um, that's not what "-ffast-math" means. It means "the user is using
> floating point as a close approximation to real number arithmetic, and
> promises to stick to numerically stable calculations". All my uses of
> floating point are done with "-ffast-math", and I /do/ care that the
> results are correct. But the definition of "correct" for my work is "as
> close to the theoretical real number result as you can get with a
> limited accuracy format, plus or minus small rounding errors".
-ffast-math is allowed to introduce any rounding error it wants. Which
can (in a loop for example) easily introduce unlimited rounding error,
bigger than the actual result. And this is not just theoretical either.
Yes, there is a lot of code where this doesn't matter, in practice. How
lucky do you feel today?
The only way to safely use -ffast-math is to inspect the generated
machine code. After each and every compilation you do. And everyone
who uses a different compiler version (or is on a different target,
etc.) has to do the same thing.
> For other people, full IEEE compliance, support for NaNs, and
> bit-perfect repeatable results regardless of optimisations and target
> details, are important for correctness. And that's fine, and it's great
> that gcc supports both kinds of code - though I believe that
> "-ffast-math" would actually be more appropriate for a large proportion
> of programs.)
Most people think that IEEE 754 was a huge step forward over wild west
floating point like we used decades ago.
Segher
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: pragma GCC optimize prevents inlining
2024-01-05 15:00 ` Segher Boessenkool
@ 2024-01-05 15:53 ` David Brown
2024-01-05 18:19 ` Segher Boessenkool
0 siblings, 1 reply; 17+ messages in thread
From: David Brown @ 2024-01-05 15:53 UTC (permalink / raw)
To: gcc-help
On 05/01/2024 16:00, Segher Boessenkool wrote:
> On Fri, Jan 05, 2024 at 03:24:48PM +0100, David Brown via Gcc-help wrote:
>> On 04/01/2024 17:55, Segher Boessenkool wrote:
>>> Most things work on function basis; almost nothing works per RTL
>>> instruction. There is no per-instruction representation for -fwrapv
>>> in the RTL stream.
>>
>> Yes, I appreciate that. And I can also imagine that carrying such
>> option information in the AST to make this possible would be a
>> significant burden, and very rarely of benefit - so unless there is some
>> other important use-case then it is not a good trade-off.
>
> Things like -fwrapv and -ftrapv have semantics that naturally could be
> done per-insn. Many things are not like that :-/
Indeed.
>
> But even then, what is supposed to happen if some optimisation works on
> a bunch of insns, some with -fwrapv (or -ftrapv) semantics and some not?
> The only safe thing to do is to not allow any transformations on mixed
> insns at all.
Sometimes mixing would be possible, sometimes not. You can't mix "trap
on signed integer overflow" with "wrap on signed integer overflow" and
expect useful results. But you /can/ mix "wrap on signed integer
overflow" with "signed integer overflow is UB" - then you wrap.
But I can't imagine it's worth the GCC development time trying to figure
out what could work and what could not work, and implementing this,
unless someone is /really/ bored! After all, this can all be done by
hand using conversions to unsigned types, and the __builtin_*_overflow()
functions (such as __builtin_add_overflow) when needed.
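
As a rough illustration of the by-hand approach, the sketch below uses `__builtin_add_overflow` (available in GCC 5 and later, and in Clang) to build a checked and a saturating addition. The function names `checked_add` and `saturating_add` are made up for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Returns true and stores the sum in `out` if x + y fits in int32_t.
// On overflow it returns false; `out` then holds the wrapped result.
inline bool checked_add(std::int32_t x, std::int32_t y, std::int32_t& out) {
    return !__builtin_add_overflow(x, y, &out);
}

// Clamps to the int32_t range instead of overflowing.
inline std::int32_t saturating_add(std::int32_t x, std::int32_t y) {
    std::int32_t result;
    if (__builtin_add_overflow(x, y, &result)) {
        // Overflow is only possible when both operands have the same sign,
        // so the sign of y tells us which bound was crossed.
        return y > 0 ? std::numeric_limits<std::int32_t>::max()
                     : std::numeric_limits<std::int32_t>::min();
    }
    return result;
}

// Usage examples, written as assertions.
inline void overflow_examples() {
    std::int32_t out = 0;
    assert(checked_add(1, 2, out) && out == 3);
    assert(!checked_add(std::numeric_limits<std::int32_t>::max(), 1, out));
    assert(saturating_add(std::numeric_limits<std::int32_t>::max(), 1) ==
           std::numeric_limits<std::int32_t>::max());
}
```

Each function states its overflow behaviour explicitly, so it does not depend on whether the surrounding code is built with -fwrapv, -ftrapv, or neither.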
>
>>> Yes, but that is only true for -ffast-math (which means "the user does
>>> not care about correct results" anyway).
>>
>> (Getting a little off-topic...
>>
>> Um, that's not what "-ffast-math" means. It means "the user is using
>> floating point as a close approximation to real number arithmetic, and
>> promises to stick to numerically stable calculations". All my uses of
>> floating point are done with "-ffast-math", and I /do/ care that the
>> results are correct. But the definition of "correct" for my work is "as
>> close to the theoretical real number result as you can get with a
>> limited accuracy format, plus or minus small rounding errors".
>
> -ffast-math is allowed to introduce any rounding error it wants. Which
> can (in a loop for example) easily introduce unlimited rounding error,
> bigger than the actual result. And this is not just theoretical either.
>
Normal maths mode can also lead to rounding errors that can build up -
the fact that rounding is carefully specified with IEEE does not mean
there are no errors (compared to the theoretical perfect real-number
calculation). It may be easier to get problems with -ffast-math, and
you may get them with smaller loop counts, but it is inevitable that any
finite approximation to real numbers will lead to errors, and that some
calculations will be numerically unstable. IEEE means that you can do
your testing on a fast PC and then deploy your code in a 16-bit
microcontroller and have identical stability - but it does not mean that
you don't get rounding errors.
> Yes, there is a lot of code where this doesn't matter, in practice. How
> lucky do you feel today?
I use gcc, so I feel pretty lucky :-)
The rounding errors in -ffast-math will be very similar to those in IEEE
mode, for normal numbers. The operations are the same - it all
translates to the same floating point cpu instructions, or the same
software floating point library calls. You don't have control of
rounding modes, so you have to assume that rounding will be the least
helpful of any FLT_ROUNDS setting - but you will not get worse than
that. This is a "quality of implementation" issue, rather than a
specified guarantee, but compiler users rely on good quality
implementation all the time. After all, there are no guarantees in the
C standards or in the gcc user manual that integer multiplication will
be done using efficient code rather than repeated addition in a loop.
-ffast-math allows some changes to the order of calculations, or
contracting of expressions, so you need to take that into account. But
then, you need to take it into account in the way you write your
expressions in IEEE mode too, and unless you put a lot of effort into
picking your expression ordering, the -ffast-math re-arrangements are as
likely to improve your results (in terms of the difference compared to
theoretical results) as they are to make them worse.
Basically, I assume that the GCC developers try to be sensible and
helpful, and do not go out of their way to generate intentionally bad
code for people who use one of their optimisation flags. I assume that
if "-ffast-math" and the associated sub-flags were as big a risk as you
are implying, they would have been removed from gcc or at least a big
red warning would be added to the manual. So far, I've been lucky!
>
> The only way to safely use -ffast-math is to inspect the generated
> machine code. After each and every compilation you do. And everyone
> who uses a different compiler version (or is on a different target,
> etc.) has to do the same thing.
>
I do actually check the generated code for some of what I do. I can't
say I have ever felt the need to check generated floating point code
because I worry about the correctness, but sometimes I do so to see if
I've got the efficiency I expect (this is not floating point specific).
And I also consider exact compiler versions and build flags as a part of
my projects - bit-perfect repeatable builds are important in my work, so
I don't change compiler versions or targets within a project without
very good reason and a great deal of checking and re-testing.
>> For other people, full IEEE compliance, support for NaNs, and
>> bit-perfect repeatable results regardless of optimisations and target
>> details, are important for correctness. And that's fine, and it's great
>> that gcc supports both kinds of code - though I believe that
>> "-ffast-math" would actually be more appropriate for a large proportion
>> of programs.)
>
> Most people think that IEEE 754 was a huge step forward over wild west
> floating point like we used decades ago.
>
Oh, sure - no doubts there. But it has plenty of features that are of
no use to me, in my work, and I am happy to ignore them and have gcc
generate the best code it can while ignoring things I don't need.
David
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: pragma GCC optimize prevents inlining
2024-01-05 15:53 ` David Brown
@ 2024-01-05 18:19 ` Segher Boessenkool
2024-01-06 17:02 ` David Brown
0 siblings, 1 reply; 17+ messages in thread
From: Segher Boessenkool @ 2024-01-05 18:19 UTC (permalink / raw)
To: David Brown; +Cc: gcc-help
On Fri, Jan 05, 2024 at 04:53:35PM +0100, David Brown via Gcc-help wrote:
> >-ffast-math is allowed to introduce any rounding error it wants. Which
> >can (in a loop for example) easily introduce unlimited rounding error,
> >bigger than the actual result. And this is not just theoretical either.
> >
>
> Normal maths mode can also lead to rounding errors that can build up -
> the fact that rounding is carefully specified with IEEE does not mean
> there are no errors (compared to the theoretical perfect real-number
> calculation).
That's not the point. A program can be perfectly fine, with bounded
errors and all, and then -ffast-math will typically completely destroy
all that, and replace all arithmetic by the equivalent of a dice roll.
That has nothing to do with the fact that all floating point arithmetic
is an approximation to real arithmetic (arithmetic on real numbers).
The semantics of 754 (or any other standard followed) make it clear what
the exact behaviour should be, and -ffast-math tells the compiler to
ignore that and do whatever instead. You cannot have reasonable
programs that way.
> The rounding errors in -ffast-math will be very similar to those in IEEE
> mode, for normal numbers.
No, not at all. Look at what -fassociative-math does, for example.
This can **and does** cause the loss of **all** bits of precision in
certain programs. This is not theoretical. This is real.
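
A small sketch of why reassociation matters: the two functions below compute the same sum with different grouping. The values and function names are chosen for illustration only, and the assertions hold only when the file is compiled with IEEE semantics, that is, without -ffast-math or -fassociative-math. With -fassociative-math the compiler is allowed to pick either grouping for either function.

```cpp
#include <cassert>

// Same three operands, two different evaluation orders.
inline double sum_left(double a, double b, double c)  { return (a + b) + c; }
inline double sum_right(double a, double b, double c) { return a + (b + c); }

// Usage example, written as assertions (valid under IEEE double arithmetic).
inline void reassociation_example() {
    const double a = 1e100, b = -1e100, c = 1.0;
    // (a + b) is exactly 0, so adding c gives 1.
    assert(sum_left(a, b, c) == 1.0);
    // (b + c) rounds back to b because 1.0 is far below the precision
    // of 1e100, so the final sum is 0 and c is lost entirely.
    assert(sum_right(a, b, c) == 0.0);
}
```

In this constructed case one grouping returns the exact answer and the other loses every significant bit of it.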
The -ffast-math flag can only reasonably be used with programs that did
not want any specific results anyway. It would be even faster (and just
as correct!) to always return 0.
Segher
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: pragma GCC optimize prevents inlining
2024-01-05 18:19 ` Segher Boessenkool
@ 2024-01-06 17:02 ` David Brown
2024-01-07 17:51 ` Segher Boessenkool
0 siblings, 1 reply; 17+ messages in thread
From: David Brown @ 2024-01-06 17:02 UTC (permalink / raw)
To: gcc-help
On 05/01/2024 19:19, Segher Boessenkool wrote:
> On Fri, Jan 05, 2024 at 04:53:35PM +0100, David Brown via Gcc-help wrote:
>>> -ffast-math is allowed to introduce any rounding error it wants. Which
>>> can (in a loop for example) easily introduce unlimited rounding error,
>>> bigger than the actual result. And this is not just theoretical either.
>>>
>>
>> Normal maths mode can also lead to rounding errors that can build up -
>> the fact that rounding is carefully specified with IEEE does not mean
>> there are no errors (compared to the theoretical perfect real-number
>> calculation).
>
> That's not the point. A program can be perfectly fine, with bounded
> errors and all, and then -ffast-math will typically completely destroy
> all that, and replace all arithmetic by the equivalent of a dice roll.
>
The only difference between IEEE calculations and -ffast-math
calculations is that with IEEE, the ordering and rounding is controlled
and consistent. For any given /single/ arithmetic operation that is
performed, each can have the same amount of rounding error or error due
to the limited length of the mantissa. Agreed?
If you have a /sequence/ of calculations using IEEE, then the order of
the operations and the types of roundings and other errors will be
defined and consistent. It won't change if you change options,
optimisations, compilers, targets. It won't change if you make changes
to the source code that should not affect the result.
So if you do extensive and careful analysis about possible maximum
errors, and wide-ranging testing of possible inputs, you can be confident of
the accuracy of the results despite the inherent rounding errors.
If you have the same code, but use -ffast-math, then the order in which the
calculations are done may change unexpectedly, or they can be combined
or modified in certain ways. You don't have the consistency.
If you do extensive worst-case analysis and testing, you can be
confident in the accuracy of the results.
The reality is that usually, people don't do any kind of serious
analysis. Of course /some/ will do so, but most people will not. They
will not think about the details - because they don't have to. They
will not care whether they write "(a + b) + c" or "a + (b + c)", or
which the compiler does first. It does not matter if the results are
consistent - they are using types with far more accuracy than they need,
and rounding on the least significant bits does not affect them. They
don't care - and perhaps don't even know - how precision can be lost
when adding or subtracting numbers with significantly different
magnitudes - because they are not doing that, at least when taking into
account the number of bits in the mantissa of a double.
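To make the point about ordering and magnitudes concrete, here is a minimal sketch. The function names `sum_left` and `sum_right` are made up for illustration, and the behaviour described assumes the code is compiled without -ffast-math, so that the additions are done in the order written.

```cpp
// Hypothetical helper: evaluates (a + b) + c in the order written.
double sum_left(double a, double b, double c)
{
    return (a + b) + c;
}

// Hypothetical helper: evaluates a + (b + c) in the order written.
double sum_right(double a, double b, double c)
{
    return a + (b + c);
}
```

With a = 1e30, b = -1e30 and c = 1.0, `sum_left` yields 1.0 because the two large values cancel first, while `sum_right` yields 0.0 because c is far below the precision of a double near 1e30 and is absorbed by b. With inputs of similar magnitude the two orderings typically agree to within the last bit or so.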
IEEE ordering is about consistency - it is not about correctness, or
accuracy. Indeed, it is quite likely that associative re-arrangements
under -ffast-math give results that are closer to the mathematically
correct real maths calculations. (Some other optimisations, like
multiplying by reciprocals for division, will likely be slightly less
accurate.) I fully appreciate that consistency is often important, and
can easily be more important than absolute accuracy. (I work with
real-time systems - I have often had to explain the difference between
"real time" and "fast".)
No matter how you are doing your calculations, you should understand
your requirements, and you should understand the limitations of floating
point calculations - as IEEE or -ffast-math. It is reasonable to say
that you shouldn't use -ffast-math unless you know it's okay for your
needs, but I think that applies to any floating point work. (Indeed, it
is also true for integers - you should not use an integer type unless
you are sure its range is suitable for your needs.)
But it is simply wrong to suggest that -ffast-math is inaccurate and the
results are a matter of luck, unless you also consider IEEE maths to be
inaccurate and a matter of luck.
> That has nothing to do with the fact that all floating point arithmetic
> is an approximation to real arithmetic (arithmetic on real numbers).
> The semantics of 754 (or any other standard followed) make it clear what
> the exact behaviour should be, and -ffast-math tells the compiler to
> ignore that and do whatever instead. You cannot have reasonable
> programs that way.
That's not what "-ffast-math" does. I really don't understand why you
think that. It is arguably an insult to the GCC developers - do you
really think they'd put in an option in the compiler that is not merely
useless, but is deceptively dangerous and designed specifically to break
people's code and give them incorrect results?
"-ffast-math" is an important optimisation for a lot of code. It makes
it a great deal easier for the compiler to use things like SIMD
instructions for parallel calculations, since there is no need to track
faults like overflows or NaN signals. It means the compiler can make
better use of limited hardware - there is a /lot/ of floating point
hardware around that is "-ffast-math" compatible but not IEEE
compatible. That applies to many kinds of vector and SIMD units,
graphics card units, embedded processors, and other systems that skip
handling of infinities, NaNs, signals, traps, etc., in order to be
smaller, cheaper, faster and lower power.
The importance of these optimisations can be seen in that "-ffast-math"
was included in the relatively new "-Ofast" flag. And the relatively
new "__builtin_assoc_barrier()" function exists solely for use in
"-ffast-math" mode (or at least, "-fassociative-math"). This shows to
me that the GCC developers see "-ffast-math" as important, relevant and
useful, even if it is not something all users want.
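As a rough sketch of how "__builtin_assoc_barrier()" might be used: the helper name `add_then_subtract` and the `HAVE_ASSOC_BARRIER` macro below are assumptions for illustration, not anything from the GCC sources. The built-in is only available in newer GCC releases, so the sketch falls back to a volatile temporary where the compiler does not report it.

```cpp
// Detect the built-in where the compiler can tell us about it.
// HAVE_ASSOC_BARRIER is a made-up macro name for this sketch.
#if defined(__has_builtin)
#  if __has_builtin(__builtin_assoc_barrier)
#    define HAVE_ASSOC_BARRIER 1
#  endif
#endif

// Hypothetical helper: computes (a + b) - a while asking the compiler
// not to reassociate the parenthesised sum with the subtraction.
double add_then_subtract(double a, double b)
{
#ifdef HAVE_ASSOC_BARRIER
    return __builtin_assoc_barrier(a + b) - a;
#else
    // Fallback: a volatile store forces the sum to be computed first.
    volatile double sum = a + b;
    return sum - a;
#endif
}
```

The intent is that the sum is computed as written even when -fassociative-math is in effect for the rest of the code, so selected expressions keep their ordering while everything around them can still be rearranged.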
>
>> The rounding errors in -ffast-math will be very similar to those in IEEE
>> mode, for normal numbers.
>
> No, not at all. Look at what -fassociative-math does, for example.
> This can **and does** cause the loss of **all** bits of precision in
> certain programs. This is not theoretical. This is real.
a = 1e120;
b = 2;
x = (a + b) - a;
IEEE rules will give "x" equal to 1e120 - mathematically /completely/
wrong. -ffast-math will give "x" equal to 2, which is mathematically
precisely correct.
Sometimes -fassociative-math will give you worse results, sometimes it
will give you better results. /Not/ using it can, and will, lead to
losing all bits of precision. That is equally real.
The simple matter is that if you want good results from your floating
point, you need to have calculations that are appropriate for your
inputs - or inputs that are appropriate for your calculations. That
applies /equally/ whether you use -ffast-math or not.
>
> The -ffast-math flag can only reasonably be used with programs that did
> not want any specific results anyway. It would be even faster (and just
> as correct!) to always return 0.
>
That is simply wrong.
If you still don't understand what I am saying, then I think this
mailing list is probably not the best place for such a discussion
(unless others here want to chime in). There are no doubt appropriate
forums where experts on floating point mathematics hang out, and can
give far better explanations than I could - but I don't know where.
This is not something that interests me enough - I know enough to be
fully confident in the floating point I need for my own uses, and fully
confident that "-ffast-math" gives me what I need with more efficient
results than not using it would. I know enough to know where my limits
are, and when I would need a lot more thought and analysis, or outside
help and advice.
David
* Re: pragma GCC optimize prevents inlining
2024-01-06 17:02 ` David Brown
@ 2024-01-07 17:51 ` Segher Boessenkool
2024-01-07 18:36 ` Gabriel Ravier
2024-01-08 15:53 ` David Brown
0 siblings, 2 replies; 17+ messages in thread
From: Segher Boessenkool @ 2024-01-07 17:51 UTC (permalink / raw)
To: David Brown; +Cc: gcc-help
On Sat, Jan 06, 2024 at 06:02:45PM +0100, David Brown wrote:
> On 05/01/2024 19:19, Segher Boessenkool wrote:
> >That's not the point. A program can be perfectly fine, with bounded
> >errors and all, and then -ffast-math will typically completely destroy
> >all that, and replace all arithmetic by the equivalent of a dice roll.
>
> The only difference between IEEE calculations and -ffast-math
> calculations is that with IEEE, the ordering and rounding is controlled
> and consistent.
No, that is not the only difference.
'-ffast-math'
Sets the options '-fno-math-errno', '-funsafe-math-optimizations',
'-ffinite-math-only', '-fno-rounding-math', '-fno-signaling-nans',
'-fcx-limited-range' and '-fexcess-precision=fast'.
Many of those do much more than what you say, can result in the compiler
generating completely different code.
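For code that needs to know which of these modes it was built in, GCC predefines the macros `__FAST_MATH__` and `__FINITE_MATH_ONLY__`. The sketch below only reports them; the function names are invented for illustration.

```cpp
// Hypothetical helper: true if the compiler defined __FAST_MATH__,
// which GCC does when -ffast-math is in effect.
bool compiled_with_fast_math()
{
#ifdef __FAST_MATH__
    return true;
#else
    return false;
#endif
}

// Hypothetical helper: true if -ffinite-math-only is in effect,
// i.e. the compiler may assume there are no NaNs or infinities.
bool compiled_with_finite_math_only()
{
#if defined(__FINITE_MATH_ONLY__) && __FINITE_MATH_ONLY__
    return true;
#else
    return false;
#endif
}
```

A typical use would be to skip or reject NaN-dependent code paths when `compiled_with_finite_math_only()` is true, since checks such as `std::isnan` may be optimised away in that mode.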
> For any given /single/ arithmetic operation that is
> performed, each can have the same amount of rounding error or error due
> to the limited length of the mantissa. Agreed?
I don't understand what you mean to say even.
> >>The rounding errors in -ffast-math will be very similar to those in IEEE
> >>mode, for normal numbers.
> >
> >No, not at all. Look at what -fassociative-math does, for example.
> >This can **and does** cause the loss of **all** bits of precision in
> >certain programs. This is not theoretical. This is real.
>
> a = 1e120;
> b = 2;
>
> x = (a + b) - a;
>
> IEEE rules will give "x" equal to 1e120 - mathematically /completely/
> wrong. -ffast-math will give "x" equal to 2, which is mathematically
> precisely correct.
The IEEE result is 0. Which is the **exactly correct** result. This is
a computer program, not some formulas that you can manipulate at will.
> >The -ffast-math flag can only reasonably be used with programs that did
> >not want any specific results anyway. It would be even faster (and just
> >as correct!) to always return 0.
>
> That is simply wrong.
It is an exaggeration for dramatic effect, but it is fundamentally
correct.
Segher
* Re: pragma GCC optimize prevents inlining
2024-01-07 17:51 ` Segher Boessenkool
@ 2024-01-07 18:36 ` Gabriel Ravier
2024-01-08 15:53 ` David Brown
1 sibling, 0 replies; 17+ messages in thread
From: Gabriel Ravier @ 2024-01-07 18:36 UTC (permalink / raw)
To: Segher Boessenkool, David Brown; +Cc: gcc-help
On 1/7/24 17:51, Segher Boessenkool wrote:
> On Sat, Jan 06, 2024 at 06:02:45PM +0100, David Brown wrote:
>> On 05/01/2024 19:19, Segher Boessenkool wrote:
>>> That's not the point. A program can be perfectly fine, with bounded
>>> errors and all, and then -ffast-math will typically completely destroy
>>> all that, and replace all arithmetic by the equivalent of a dice roll.
>> The only difference between IEEE calculations and -ffast-math
>> calculations is that with IEEE, the ordering and rounding is controlled
>> and consistent.
> No, that is not the only difference.
>
> '-ffast-math'
> Sets the options '-fno-math-errno', '-funsafe-math-optimizations',
> '-ffinite-math-only', '-fno-rounding-math', '-fno-signaling-nans',
> '-fcx-limited-range' and '-fexcess-precision=fast'.
>
> Many of those do much more than what you say, can result in the compiler
> generating completely different code.
>
>> For any given /single/ arithmetic operation that is
>> performed, each can have the same amount of rounding error or error due
>> to the limited length of the mantissa. Agreed?
> I don't understand what you mean to say even.
>
>>>> The rounding errors in -ffast-math will be very similar to those in IEEE
>>>> mode, for normal numbers.
>>> No, not at all. Look at what -fassociative-math does, for example.
>>> This can **and does** cause the loss of **all** bits of precision in
>>> certain programs. This is not theoretical. This is real.
>> a = 1e120;
>> b = 2;
>>
>> x = (a + b) - a;
>>
>> IEEE rules will give "x" equal to 1e120 - mathematically /completely/
>> wrong. -ffast-math will give "x" equal to 2, which is mathematically
>> precisely correct.
> The IEEE result is 0. Which is the **exactly correct** result. This is
> a computer program, not some formulas that you can manipulate at will.
That seems to be where the disagreement lies. Those that use -ffast-math
with full knowledge of what it does are presumably acting with the
intent that their program should indeed be treated as "some formulas you
can manipulate at will".
>
>>> The -ffast-math flag can only reasonably be used with programs that did
>>> not want any specific results anyway. It would be even faster (and just
>>> as correct!) to always return 0.
>> That is simply wrong.
> It is an exaggeration for dramatic effect, but it is fundamentally
> correct.
>
>
> Segher
* Re: pragma GCC optimize prevents inlining
2024-01-07 17:51 ` Segher Boessenkool
2024-01-07 18:36 ` Gabriel Ravier
@ 2024-01-08 15:53 ` David Brown
1 sibling, 0 replies; 17+ messages in thread
From: David Brown @ 2024-01-08 15:53 UTC (permalink / raw)
To: gcc-help
On 07/01/2024 18:51, Segher Boessenkool wrote:
> On Sat, Jan 06, 2024 at 06:02:45PM +0100, David Brown wrote:
>> On 05/01/2024 19:19, Segher Boessenkool wrote:
>>> That's not the point. A program can be perfectly fine, with bounded
>>> errors and all, and then -ffast-math will typically completely destroy
>>> all that, and replace all arithmetic by the equivalent of a dice roll.
>>
>> The only difference between IEEE calculations and -ffast-math
>> calculations is that with IEEE, the ordering and rounding is controlled
>> and consistent.
>
> No, that is not the only difference.
>
> '-ffast-math'
> Sets the options '-fno-math-errno', '-funsafe-math-optimizations',
> '-ffinite-math-only', '-fno-rounding-math', '-fno-signaling-nans',
> '-fcx-limited-range' and '-fexcess-precision=fast'.
>
> Many of those do much more than what you say, can result in the compiler
> generating completely different code.
I know what these do - they are described in the gcc manual. And they
are all good things for the kind of code I write. But I did not list
them in my posts because it would take too much space to include them
all, every time - I have just concentrated on a couple of points.
>
>> For any given /single/ arithmetic operation that is
>> performed, each can have the same amount of rounding error or error due
>> to the limited length of the mantissa. Agreed?
>
> I don't understand what you mean to say even.
I mean that if you write "x = a + b;" for floating point types, you
will, in general, get a rounding error. And the magnitude of the
worst-case rounding error will be the same whether you are using IEEE
rules or "-ffast-math" rules. With IEEE the rounding error will be
consistent and predictable, and for some cases that is important - but
it will not be less of a rounding error.
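A small sketch of the single-operation rounding error described here, assuming round-to-nearest mode and IEEE doubles. The helper `add_rounded` is a made-up name; the volatile store is there only to force the result to double precision on targets that use wider registers.

```cpp
#include <cfloat>

// Hypothetical helper: adds two doubles and rounds the result to double.
double add_rounded(double a, double b)
{
    volatile double r = a + b;  // store forces rounding to double
    return r;
}
```

`add_rounded(1.0, DBL_EPSILON / 2)` returns exactly 1.0, so the result is off by half a unit in the last place, which is the worst case for one addition. `add_rounded(1.0, DBL_EPSILON)` returns the next double above 1.0 with no error at all.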
>
>>>> The rounding errors in -ffast-math will be very similar to those in IEEE
>>>> mode, for normal numbers.
>>>
>>> No, not at all. Look at what -fassociative-math does, for example.
>>> This can **and does** cause the loss of **all** bits of precision in
>>> certain programs. This is not theoretical. This is real.
>>
>> a = 1e120;
>> b = 2;
>>
>> x = (a + b) - a;
>>
>> IEEE rules will give "x" equal to 1e120 - mathematically /completely/
>> wrong. -ffast-math will give "x" equal to 2, which is mathematically
>> precisely correct.
>
> The IEEE result is 0.
Sorry, of course that is what the IEEE rules will give you. It does not
help if I make silly mistakes like that!
> Which is the **exactly correct** result.
It is the exactly correct result for IEEE floating point. But 2 is the
exactly correct result for modelling real number arithmetic. And for my
own use - and I believe for the majority of cases when people use
floating point - the aim of floating point in code is to model real
number arithmetic as closely as practically possible in an efficient manner.
Of course it is important, whether you use -ffast-math or not, to use
appropriate numbers and appropriate calculations - trying to evaluate
"(1e120 + 2) - 1e120" is never going to be a good idea.
But the fact remains that - for the value of "right" and "wrong" that
matters to most people - the IEEE rules will silently give you the wrong
answer here. The -ffast-math rules might give the right answer, and
might give the wrong answer. Occasionally "guaranteed wrong" is better
than "sometimes wrong" - it can sometimes make debugging and regression
testing easier. Most of the time, they are simply both bad.
Can you give an example where -fassociative-math will, as you claim,
give a result that loses /all/ bits of precision - while IEEE rules
would give a precise answer? It does not have to be all bits - I'm
happy with simply losing noticeably more bits with "-fassociative-math"
than with IEEE rules. But I want it to use the important metric for
correctness - closeness to the result using infinite precision real
arithmetic - not just closeness to the artificial value required by IEEE
rules. And I want it to be the result of realistic calculations with
realistic numbers, using a small number of calculations.
(Again, I appreciate that for some uses, predictable and consistent
results are vital even if they do not match the real arithmetic, and
IEEE rules are of great importance. I am not arguing that IEEE rules
are bad - I am arguing that -ffast-math rules are good for some uses.)
> This is
> a computer program, not some formulas that you can manipulate at will.
>
I expect the compiler to manipulate things according to its rules. The
gcc manual says how it can manipulate the floating point code I write if
-ffast-math is enabled. If I have written floating point code where the
results are equally good - by my requirements - when these manipulations
are done, then -ffast-math is safe for me and gives me correct results.
If, as you earlier suggested (exaggerating to try to make a point), the
compiler could manipulate the code to simply return 0, then I would
agree with you that the flag is dangerous and worse than useless. But
fortunately, that is not the case.
>>> The -ffast-math flag can only reasonably be used with programs that did
>>> not want any specific results anyway. It would be even faster (and just
>>> as correct!) to always return 0.
>>
>> That is simply wrong.
>
> It is an exaggeration for dramatic effect, but it is fundamentally
> correct.
>
You have vastly more knowledge than I about the internals of gcc and how
it works. You also know vastly more about IEEE floating point rules
than I do. And I expect you have worked on far more programs for which
IEEE rules are important, because they have almost never been relevant
to work I have done.
But I have regularly used floating point maths in code, in real-life
programs. I have regularly used -ffast-math, and seen how it makes my
programs faster - sometimes a great deal faster. I have read the
details of the -ffast-math flags in the gcc manual, and I know what they
do and what manipulations of my code they allow. I know what I have to
be aware of for my floating point code to give useful answers (i.e.,
answers that are at least as close to the real mathematical answers as
they need to be for the way the results are used). I know that sticking
to IEEE rules would /never/ give more useful answers for my needs - they
would give equally useful answers, slower.
Maybe there are some people who write floating point code where they get
useful answers with IEEE rules, but enabling -ffast-math would result in
useless results. I can't answer for other people - only for my own code.
And I know that what you write about -ffast-math being as useless as
"return 0" is not merely an exaggeration, it is pure FUD. And writing
it detracts from the very real factors that are always important when
writing floating point code, whether IEEE or -ffast-math, and it means
you miss out on an opportunity to discuss where the real-world
differences lie between the -ffast-math flags and full IEEE compatibility.
David
* Re: pragma GCC optimize prevents inlining
2024-01-04 16:37 ` Richard Earnshaw
@ 2024-01-09 13:38 ` Florian Weimer
0 siblings, 0 replies; 17+ messages in thread
From: Florian Weimer @ 2024-01-09 13:38 UTC (permalink / raw)
To: Richard Earnshaw via Gcc-help; +Cc: David Brown, Richard Earnshaw
* Richard Earnshaw via Gcc-help:
> The general problem here is that the AST does not carry enough
> information to handle all these cases within a single function. To
> merge wrap/no-wrap code, for example, we'd need separate codes for
> ADD, SUBTRACT, MULTIPLY, etc.
__builtin_add_overflow et al. are already specified to implement
wraparound behavior on overflow, so that particular case should be
covered to some extent.
Thanks,
Florian