[Bug target/108922] New: fmod() 13x slowdown in gcc 4.8->4.9 dropping "fprem" and calling fmod()

public inbox for gcc-bugs@sourceware.org

* [Bug target/108922] New: fmod() 13x slowdown in gcc 4.8->4.9 dropping "fprem" and calling fmod()
@ 2023-02-24 12:22 jkratochvil at azul dot com
  2023-02-25  2:41 ` [Bug target/108922] fmod() 13x slowdown in gcc4.9 " pinskia at gcc dot gnu.org
                   ` (32 more replies)
  0 siblings, 33 replies; 34+ messages in thread
From: jkratochvil at azul dot com @ 2023-02-24 12:22 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922

            Bug ID: 108922
           Summary: fmod() 13x slowdown in gcc 4.8->4.9 dropping "fprem"
                    and calling fmod()
           Product: gcc
           Version: 12.2.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jkratochvil at azul dot com
  Target Milestone: ---

Created attachment 54528
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54528&action=edit
bench.cpp

This performance regression was introduced by:

[PATCH, i386]: Enable reminder{sd,df,xf} and fmod{sf,df,xf} only for
flag_finite_math_only.
https://gcc.gnu.org/pipermail/gcc-patches/2014-September/400104.html

https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=93ba85fdd253b4b9cf2b9e54e8e5969b1a3db098

Reproducible with the attached "bench.cpp":
g++ (GCC) 4.8.3 20140517 (prerelease)
real    0m0.329s
g++ (GCC) 4.9.3 20150207 (prerelease)
real    0m4.396s

The committer claims: "do not return NaN for infinities, but generate
invalid-arithmetic-operand exception." However, my attached testcase checks
that all the corner cases produce both the same result value and the same
exceptions.

The committer also claims "fixes ieee_2.f90 testsuite failure" but I have no
idea where to find this testsuite.

g++ (GCC) 4.4.7 20120313 (Red Hat 4.4.7-18)
/home/azul/t/zuc1182/fmod.C:7
  4005f8:       dd 44 24 30             fldl   0x30(%rsp)
  4005fc:       dd 44 24 38             fldl   0x38(%rsp)
  400600:       d9 c1                   fld    %st(1)
  400602:       d9 c1                   fld    %st(1)
  400604:       d9 f8                   fprem
  400606:       df e0                   fnstsw %ax
  400608:       f6 c4 04                test   $0x4,%ah
  40060b:       75 f7                   jne    400604 <main+0x34>
  40060d:       dd d9                   fstp   %st(1)
  40060f:       dd 5c 24 18             fstpl  0x18(%rsp)
  400613:       f2 0f 10 44 24 18       movsd  0x18(%rsp),%xmm0
  400619:       66 0f 2e c0             ucomisd %xmm0,%xmm0
^^^
Here it tests whether the result is NaN (ucomisd of a register with itself is
unordered only for NaN); if it is, it falls back to calling fmod().
But I do not think even that is needed; one could just use the "fprem" result.
  40061d:       7a 06                   jp     400625 <main+0x55>
  40061f:       74 2f                   je     400650 <main+0x80>
  400621:       d9 c9                   fxch   %st(1)
  400623:       eb 0b                   jmp    400630 <main+0x60>
  400625:       d9 c9                   fxch   %st(1)
  400627:       66 0f 1f 84 00 00 00    nopw   0x0(%rax,%rax,1)
  40062e:       00 00
  400630:       dd 5c 24 08             fstpl  0x8(%rsp)
  400634:       f2 0f 10 4c 24 08       movsd  0x8(%rsp),%xmm1
  40063a:       dd 5c 24 08             fstpl  0x8(%rsp)
  40063e:       f2 0f 10 44 24 08       movsd  0x8(%rsp),%xmm0
  400644:       e8 6f fe ff ff          callq  4004b8 <fmod@plt>
  400649:       eb 09                   jmp    400654 <main+0x84>
  40064b:       0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
  400650:       dd d8                   fstp   %st(0)
  400652:       dd d8                   fstp   %st(0)
  400654:       83 c3 01                add    $0x1,%ebx
  400657:       f2 0f 11 44 24 28       movsd  %xmm0,0x28(%rsp)
/home/azul/t/zuc1182/fmod.C:6
  40065d:       81 fb 00 e1 f5 05       cmp    $0x5f5e100,%ebx
  400663:       75 93                   jne    4005f8 <main+0x28>
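For illustration, the fprem loop at the top of that disassembly (fprem; fnstsw %ax; test $0x4,%ah; jne) can be written with GCC inline asm. This is a minimal sketch, not GCC's actual expansion: fprem_fmod is a hypothetical name, it assumes GCC or Clang on i386/x86-64 (elsewhere it simply calls libm fmod()), and it leaves out the NaN check and fmod() fallback shown above.

```c
#include <math.h>

/* Hypothetical helper: fmod() via the x87 fprem instruction.
   One fprem reduces the dividend's exponent by at most 63, so for large
   exponent differences it leaves a partial remainder and sets C2
   (bit 10 of the FPU status word, 0x400, i.e. "test $0x4,%ah" above).
   The loop repeats fprem until C2 is clear. */
static double fprem_fmod(double x, double y)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    long double r = x;      /* dividend, kept in st(0) */
    long double d = y;      /* divisor, kept in st(1)  */
    unsigned short sw;      /* FPU status word */
    do {
        __asm__ ("fprem\n\tfnstsw %%ax" : "+t"(r), "=a"(sw) : "u"(d));
    } while (sw & 0x400);
    /* The exact remainder of two doubles is itself a double,
       so this conversion is exact. */
    return (double) r;
#else
    return fmod(x, y);      /* non-x87 targets: use libm */
#endif
}
```

For example, fprem_fmod(0x1p100, 3.0) needs more than one fprem step, because the exponent difference (99) is above 63, and returns 1.0 (2^100 = 4^50, which leaves remainder 1 modulo 3).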

A similar issue may affect drem() (= remainder()) and the "fprem1" instruction.

I expect the same issue also affects fmodf(), dremf() and remainderf().
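The same pattern applies to remainder() with fprem1, which rounds the quotient to the nearest integer (ties to even) instead of truncating it. A minimal sketch under the same assumptions as any fprem example (hypothetical helper names, GCC/Clang on i386/x86-64, libm fallback elsewhere); the float and double variants simply widen to long double, which is safe because an IEEE remainder is always exactly representable in the format of its operands.

```c
#include <math.h>

/* Hypothetical helper: IEEE remainder() via x87 fprem1.  The result
   lies in [-|y|/2, |y|/2]; the C2 partial-remainder loop is the same
   as for fprem. */
static long double fprem1_remainderl(long double x, long double y)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    unsigned short sw;      /* FPU status word; 0x400 is C2 */
    do {
        __asm__ ("fprem1\n\tfnstsw %%ax" : "+t"(x), "=a"(sw) : "u"(y));
    } while (sw & 0x400);
    return x;
#else
    return remainderl(x, y);
#endif
}

/* Narrower variants: widen, compute, and narrow back (exact). */
static double fprem1_remainder(double x, double y)
{
    return (double) fprem1_remainderl(x, y);
}

static float fprem1_remainderf(float x, float y)
{
    return (float) fprem1_remainderl(x, y);
}
```

For instance, 7.0 / 2.0 = 3.5 rounds to the even quotient 4, so fprem1_remainder(7.0, 2.0) is -1.0, whereas fmod(7.0, 2.0) is 1.0.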

A separate question is why the glibc fmod() implementation does not simply use
"fprem" on i686/x86_64.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [Bug target/108922] fmod() 13x slowdown in gcc4.9 dropping "fprem" and calling fmod()
From: pinskia at gcc dot gnu.org @ 2023-02-25  2:41 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
>The committer also claims "fixes ieee_2.f90 testsuite failure" but I have no idea where to find this testsuite.


./testsuite/gfortran.dg/ieee/ieee_2.f90

* [Bug target/108922] fmod() 13x slowdown in gcc4.9 dropping "fprem" and calling fmod()
From: pinskia at gcc dot gnu.org @ 2023-02-25  2:47 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2023-02-25
             Status|UNCONFIRMED                 |WAITING

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
So the simple test is to run the full GCC bootstrap/test with all languages and
check whether the testcase fails. I suspect it will.

* [Bug target/108922] fmod() 13x slowdown in gcc4.9 dropping "fprem" and calling fmod()
From: amonakov at gcc dot gnu.org @ 2023-02-25  9:58 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922

Alexander Monakov <amonakov at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |amonakov at gcc dot gnu.org

--- Comment #3 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
> But my attached testcase tests that all the corner cases do have both
> the same result value and the same exceptions generated.

It seems you forgot to attach that testcase (bench.cpp does not cover corner
cases).

I guess Uros' claim was based on what Intel and AMD manuals specify rather than
observed behavior of CPUs.

* [Bug target/108922] fmod() 13x slowdown in gcc4.9 dropping "fprem" and calling fmod()
From: amonakov at gcc dot gnu.org @ 2023-02-25 10:24 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922

--- Comment #4 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
Plus, glibc does use fprem/fprem1 for fmodl/remainderl on x86_64, as well as
for {fmod,remainder,remquo}{,f,l} on i386, without any branches for corner
cases. So in practice CPUs apparently implement the expected behavior even
though the manual doesn't promise it.

The ieee_2.f90 testcase attempts to change the rounding mode. In 2014 it was
probably just "miscompiled".

* [Bug target/108922] fmod() 13x slowdown in gcc4.9 dropping "fprem" and calling fmod()
From: ubizjak at gmail dot com @ 2023-02-25 10:30 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922

--- Comment #5 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Alexander Monakov from comment #3)
> I guess Uros' claim was based on what Intel and AMD manuals specify rather
> than observed behavior of CPUs.

As a "committer", I really don't remember the reason for disabling the patterns,
but there is some analysis in the corresponding e-mail.

* [Bug target/108922] fmod() 13x slowdown in gcc4.9 dropping "fprem" and calling fmod()
From: ubizjak at gmail dot com @ 2023-02-25 10:49 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922

--- Comment #6 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Uroš Bizjak from comment #5)
> (In reply to Alexander Monakov from comment #3)
> > I guess Uros' claim was based on what Intel and AMD manuals specify rather
> > than observed behavior of CPUs.
> 
> As a "committer", I really don't remember the reason to disable the
> patterns, but there is some analysis in the corresponding e-mail.

Please see Table 3-31 (and Table 3-32) in the SDM [1]. If 'x' (AKA st(0)) is
infinity, no return value is specified, since an invalid-arithmetic-operand
exception is generated.

In that case the SDM declares the output *undefined*, whereas C99 specifies
NaN.

[1]
https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html

* [Bug target/108922] fmod() 13x slowdown in gcc4.9 dropping "fprem" and calling fmod()
From: amonakov at gcc dot gnu.org @ 2023-02-25 10:56 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922

--- Comment #7 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
I saw that. That's why I'm pointing out that glibc (and musl) use the
instruction without any additional checks: real CPUs produce the expected
result in st(0), even though the documentation makes no promise about the
content of st(0).

* [Bug target/108922] fmod() 13x slowdown in gcc4.9 dropping "fprem" and calling fmod()
From: jkratochvil at azul dot com @ 2023-02-26  6:39 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922

--- Comment #8 from Jan Kratochvil <jkratochvil at azul dot com> ---
(In reply to Andrew Pinski from comment #2)
> So the simple test is run the full GCC bootstrap/test with all languages and
> check if the testcase fails or not. I suspect it will.

It does not. Tested on Fedora 36 x86-64.

I tested only a revert of:
https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=93ba85fdd253b4b9cf2b9e54e8e5969b1a3db098

The revert makes it 13x faster, but the produced code still falls back to
calling glibc fmod() as shown in the disassembly in Comment 0.
If I use the "fprem" instruction directly it gets 15x faster, but I could not
find an easy way to patch GCC so that it no longer emits the call to fmod() at
all and produces only the "fprem" instruction.

(In reply to Alexander Monakov from comment #4)
> Plus, Glibc does use fprem/fprem1 for fmodl/remainderl on x86_64,

It is true that replacing fmod() with fmodl() makes it 5x faster (but only 5x).
There is still an infinity check, and I haven't found any real justification
for it in the glibc sources:
  if (__builtin_expect (isinf (x) || y == 0.0L, 0)
      && _LIB_VERSION != _IEEE_ && !isnan (y) && !isnan (x))
    /* fmod(+-Inf,y) or fmod(x,0) */
    return __kernel_standard_l (x, y, 227);

> The ieee_2.f90 testcase attempts to change rounding mode. It 2014 it
> probably just was "miscompiled".

The testsuite run did include "gfortran.dg/ieee/ieee_2.f90", and it shows no
regression.

* [Bug target/108922] fmod() 13x slowdown in gcc4.9 dropping "fprem" and calling fmod()
From: amonakov at gcc dot gnu.org @ 2023-02-26  8:01 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922

--- Comment #9 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
(In reply to Jan Kratochvil from comment #8)
> The revert makes it 13x faster. But the produced code still falls back to
> calling glibc fmod() as shown in the disassembly in Comment 0.
> If I use the "fprem" instruction directly it gets 15x faster - but I did not
> figure out some (easy) way for me how to patch GCC to no longer produce the
> call to fmod() at all and produce only the "fprem" instruction.

You just need to pass -fno-math-errno (the call is for setting errno, similar
to how gcc emits the sqrt() sequence).


> (In reply to Alexander Monakov from comment #4)
> > Plus, Glibc does use fprem/fprem1 for fmodl/remainderl on x86_64,
> 
> It is true replacing fmod() with fmodl() makes it 5x faster (but only 5x).
> There is still some infinity check and I haven't found any real
> justification in glibc sources for it:
>   if (__builtin_expect (isinf (x) || y == 0.0L, 0)
>       && _LIB_VERSION != _IEEE_ && !isnan (y) && !isnan (x))
>     /* fmod(+-Inf,y) or fmod(x,0) */
>     return __kernel_standard_l (x, y, 227);

This is for legacy/fancy error handling beyond setting IEEE exception flags.

* [Bug target/108922] fmod() 13x slowdown in gcc4.9 dropping "fprem" and calling fmod()
From: jkratochvil at azul dot com @ 2023-02-26 11:16 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922

--- Comment #10 from Jan Kratochvil <jkratochvil at azul dot com> ---
(In reply to Alexander Monakov from comment #9)
> You just need to pass -fno-math-errno (the call is for setting errno,
> similar to how gcc emits the sqrt() sequence).

True, thanks.


So I think the patch should be reverted, right? I expect that nowadays the
revert should come with a testcase.

* [Bug target/108922] fmod() 13x slowdown in gcc4.9 dropping "fprem" and calling fmod()
From: ubizjak at gmail dot com @ 2023-02-26 21:28 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922

--- Comment #11 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Jan Kratochvil from comment #8)

> It is true replacing fmod() with fmodl() makes it 5x faster (but only 5x).
> There is still some infinity check and I haven't found any real
> justification in glibc sources for it:
>   if (__builtin_expect (isinf (x) || y == 0.0L, 0)
>       && _LIB_VERSION != _IEEE_ && !isnan (y) && !isnan (x))
>     /* fmod(+-Inf,y) or fmod(x,0) */
>     return __kernel_standard_l (x, y, 227);

Using the following test:

--cut here--
#include <math.h>
#include <stdio.h>

long double
__attribute__((noinline))
test (long double x, long double y)
{
  return fmodl (x, y);
}

int
main ()
{
  long double x = INFINITY, y = 1.0;

  printf ("%Lf\n", test (x, y));
  return 0;
}
--cut here--

execution ends in:

            case 227:
                /* fmod(x,0) */
                exc.type = DOMAIN;
                exc.name = CSTR ("fmod");
                if (_LIB_VERSION == _SVID_)
                    exc.retval = x;
                else
                    exc.retval = zero/zero;
                if (_LIB_VERSION == _POSIX_)
                  __set_errno (EDOM);
                else if (!matherr(&exc)) {
                  if (_LIB_VERSION == _SVID_) {
                    (void) WRITE2("fmod:  DOMAIN error\n", 20);
                  }
                  __set_errno (EDOM);
                }
                break;

So, it doesn't execute fprem, but returns early with NaN.

* [Bug target/108922] fmod() 13x slowdown in gcc4.9 dropping "fprem" and calling fmod()
From: ubizjak at gmail dot com @ 2023-02-26 21:32 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922

--- Comment #12 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Jan Kratochvil from comment #8)

> The revert makes it 13x faster. But the produced code still falls back to
> calling glibc fmod() as shown in the disassembly in Comment 0.
> If I use the "fprem" instruction directly it gets 15x faster - but I did not
> figure out some (easy) way for me how to patch GCC to no longer produce the
> call to fmod() at all and produce only the "fprem" instruction.

Use the -ffinite-math-only option:

-ffinite-math-only
   Allow optimizations for floating-point arithmetic that assume that arguments
and results are not NaNs or +-Infs.

* [Bug target/108922] fmod() 13x slowdown in gcc4.9 dropping "fprem" and calling fmod()
From: jkratochvil at azul dot com @ 2023-02-26 23:36 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922

--- Comment #13 from Jan Kratochvil <jkratochvil at azul dot com> ---
(In reply to Uroš Bizjak from comment #12)
> (In reply to Jan Kratochvil from comment #8)
> 
> > The revert makes it 13x faster. But the produced code still falls back to
> > calling glibc fmod() as shown in the disassembly in Comment 0.
> > If I use the "fprem" instruction directly it gets 15x faster - but I did not
> > figure out some (easy) way for me how to patch GCC to no longer produce the
> > call to fmod() at all and produce only the "fprem" instruction.
> 
> Use -ffinite-math-only option:
> 
> -ffinite-math-only
>    Allow optimizations for floating-point arithmetic that assume that
> arguments and results are not NaNs or +-Infs.

That works for the Comment 0 reproducer, but I consider -ffinite-math-only
unsafe because of other calculations in the whole OpenJDK codebase. Java
semantics explicitly define operations on infinities, so those could produce
invalid results under that option.

To fully fix the performance (no "call fmod" at all), I find it better to use
-fno-math-errno. Nothing in OpenJDK should rely on errno from math operations.
But that option still requires reverting your patch.

The question is whether gcc can rely on the undocumented Intel behavior
described in Comment 7. glibc already relies on it anyway.

I have submitted this revert proposal only for the benefit of GCC. Neither I
nor my employer depends on it, as I have already submitted a fix for OpenJDK
using an asm "fprem" expression. Relying on a fix in GCC would not be
acceptable for OpenJDK, as it is still going to be built by old/existing
OSes/compilers for years: https://github.com/openjdk/jdk/pull/12508/files

* [Bug target/108922] fmod() 13x slowdown in gcc4.9 dropping "fprem" and calling fmod()
From: ubizjak at gmail dot com @ 2023-02-27  7:13 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922

--- Comment #14 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Jan Kratochvil from comment #13)
> The question is whether gcc can rely on the undocumented Intel behavior as
> described in Comment 7. glibc already relies on it anyway.

I don't think this is true, please see Comment #11.

* [Bug target/108922] fmod() 13x slowdown in gcc4.9 dropping "fprem" and calling fmod()
From: amonakov at gcc dot gnu.org @ 2023-02-27  8:17 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922

--- Comment #15 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
That is the fancy-error-handling path that is reached under _LIB_VERSION !=
_IEEE_. Before glibc-2.27, linking with -lieee would set _LIB_VERSION = _IEEE_,
and then glibc would use the fprem[1] instruction without any special-casing.

musl libc does not implement errno setting for math functions, and always uses
fprem directly; likewise for Apple libm:

https://github.com/apple-oss-distributions/Libm/blob/17a5f9daa3f5679f7536b26f133b40cc078753c3/Source/Intel/fmod.s

* [Bug target/108922] fmod() 13x slowdown in gcc4.9 dropping "fprem" and calling fmod()
From: jakub at gcc dot gnu.org @ 2023-02-27  8:31 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922

--- Comment #16 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Doesn't the SDM guarantee the right behavior though?
It is true that the FPREM results table says * and ** in certain spots (Table
3-31 in my copy), but then in the Invalid Arithmetic Operand Exception (#IA)
chapter (8.5.1.2 for me) I see Table 8-10, "Invalid Arithmetic Operations and
the Masked Responses to Them", and in there:

Condition                                Masked Response
Remainder instructions FPREM, FPREM1:    Return the QNaN floating-point
modulus (divisor) is 0 or dividend       indefinite; clear condition code
is ∞.                                    flag C2 to 0.

More questionable is the #Z case, where Table 8-11 just talks about:

Divide or reverse divide operation       Returns an ∞ signed with the exclusive
with a 0 divisor.                        OR of the sign of the two operands to
                                         the destination operand.

but FPREM does division too, so I hope it is covered too (but not listed
explicitly).
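The masked response quoted from Table 8-10 is what makes the usual "repeat while C2 is set" loop safe for these corner cases: the result is a NaN and C2 is cleared, so the loop exits after one step. A minimal sketch that executes a single fprem and reports both, assuming GCC or Clang on i386/x86-64 with the default (masked) x87 exception settings; fprem_once and struct fprem_step are hypothetical names, and on other targets the sketch falls back to fmodl() and reports c2 as 0.

```c
#include <math.h>

/* Result of one fprem execution: the value left in st(0) and whether
   C2 (status word bit 0x400, "partial remainder, repeat") was set. */
struct fprem_step {
    long double result;
    int c2;
};

/* Hypothetical helper: a single fprem, no loop. */
static struct fprem_step fprem_once(long double x, long double y)
{
    struct fprem_step s;
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    unsigned short sw;      /* FPU status word after fprem */
    __asm__ ("fprem\n\tfnstsw %%ax" : "+t"(x), "=a"(sw) : "u"(y));
    s.result = x;
    s.c2 = (sw & 0x400) != 0;
#else
    s.result = fmodl(x, y); /* no x87 status word on this target */
    s.c2 = 0;
#endif
    return s;
}
```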

* [Bug target/108922] fmod() 13x slowdown in gcc4.9 dropping "fprem" and calling fmod()
From: ubizjak at gmail dot com @ 2023-02-27  9:10 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922

--- Comment #17 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Jakub Jelinek from comment #16)
> Doesn't the SDM guarantee the right behavior though?

Indeed, this is what is missing from Table 3-31.

> It is true that the FPREM results table says * and ** in certain spots
> (Table 3-31 in my copy), but then in the Invalid Arithmetic Operand
> Exception (#IA) chapter (8.5.1.2 for me) I see Table 8-10 Invalid Arithmetic
> Operations and the Masked Responses to Them
> and in there:
> Condition                                Masked Response
> Remainder instructions FPREM, FPREM1:    Return the QNaN floating-point
> modulus (divisor) is 0 or dividend       indefinite; clear condition code
> is ∞.                                    flag C2 to 0.
> More questionable is the #Z case, where Table 8-11 just talks about
> Divide or reverse divide operation       Returns an ∞ signed with the exclusive
> with a 0 divisor.                        OR of the sign of the two operands to
>                                          the destination operand.
> but FPREM does division too, so I hope it is covered too (but not listed
> explicitly).

Table C-2 says that FPREM{,1} do not generate a #Z exception.

So, based on the above finding, should the insn condition be changed to
!flag_errno_math?

* [Bug target/108922] fmod() 13x slowdown in gcc4.9 dropping "fprem" and calling fmod()
From: jakub at gcc dot gnu.org @ 2023-02-27  9:17 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922

--- Comment #18 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
(In reply to Uroš Bizjak from comment #17)
> So, based on the above finding, should insn condition be changed to
> !flag_errno_math?

I'd say that it shouldn't be the backends' business to check flag_errno_math;
that belongs in the middle-end. The middle-end can either ignore the fmod
optab in that case (but aren't, say, hypot and others similar cases?), or it
could do the sqrt trick: use the optab inline even for flag_errno_math, then
use a comparison to detect whether it is one of the exceptional cases and call
the library function only in that case.

Of course, that is probably GCC 14 material, so a hack on the backend side
would be acceptable too.

* [Bug target/108922] fmod() 13x slowdown in gcc4.9 dropping "fprem" and calling fmod()
From: amonakov at gcc dot gnu.org @ 2023-02-27  9:32 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922

--- Comment #19 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
I get the feeling that you're ignoring me, but gcc-4.8.3 was already emitting a
helper fmod call for setting errno without any flag_errno_math checks in
i386.md, i.e. that logic was already in the middle-end, as mentioned in
comment #9.

* [Bug target/108922] fmod() 13x slowdown in gcc4.9 dropping "fprem" and calling fmod()
From: ubizjak at gmail dot com @ 2023-02-27  9:33 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922

--- Comment #20 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Jakub Jelinek from comment #16)

> More questionable is the #Z case, where Table 8-11 just talks about
>   Divide or reverse divide operation with a 0 divisor:
>     Returns an ∞ signed with the exclusive OR of the sign of the two
>     operands to the destination operand.
> but FPREM does division too, so I hope it is covered too (but not listed
> explicitly).

FYI, Table 3-30 (and 3-31) is wrong: executing fprem with st(0) == 1.0 and
st(1) == 0.0 raises the IA (invalid arithmetic operand) exception, not the Z
(divide-by-zero) exception.
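The same distinction can be observed from user code through `<cfenv>`. A minimal sketch, assuming an IEEE 754 platform whose fmod follows C Annex F (as glibc's does): a zero divisor or an infinite dividend should raise FE_INVALID and never FE_DIVBYZERO. On x86-64 with the default SSE math this may exercise the libm implementation rather than fprem itself; `fmod_raised` is a hypothetical helper name.

```cpp
#include <cfenv>
#include <cmath>

// Reports which of FE_INVALID / FE_DIVBYZERO a single fmod(x, y) call
// raised. The volatile copies keep the compiler from constant-folding
// the call.
int fmod_raised(double x, double y)
{
    volatile double vx = x, vy = y;
    std::feclearexcept(FE_ALL_EXCEPT);
    volatile double r = std::fmod(vx, vy);
    (void)r;
    return std::fetestexcept(FE_INVALID | FE_DIVBYZERO);
}
```

For example, `fmod_raised(1.0, 0.0)` is expected to return `FE_INVALID`, matching the IA (not Z) behavior reported above for fprem.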

* [Bug target/108922] fmod() 13x slowdown in gcc4.9 dropping "fprem" and calling fmod()
From: ubizjak at gmail dot com @ 2023-02-27  9:49 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922

--- Comment #21 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Alexander Monakov from comment #19)
> I get the feeling that you're ignoring me, but gcc-4.8.3 was already
> emitting a helper fmod call for setting errno without any flag_errno_math
> checks in i386.md, i.e. it was already in the middle-end. As was mentioned
> in comment #9.

When g:93ba85fdd253b4b9cf2b9e54e8e5969b1a3db098 is reverted, current mainline
does not emit anything that would handle errno (even with the -fmath-errno
flag explicitly set on the command line).

* [Bug target/108922] fmod() 13x slowdown in gcc4.9 dropping "fprem" and calling fmod()
From: amonakov at gcc dot gnu.org @ 2023-02-27 10:01 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922

--- Comment #22 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
Strange, comment #8 claims the opposite (unless Jan tested the revert on some
branch rather than on trunk).

* [Bug target/108922] fmod() 13x slowdown in gcc4.9 dropping "fprem" and calling fmod()
From: jkratochvil at azul dot com @ 2023-02-27 10:10 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922

--- Comment #23 from Jan Kratochvil <jkratochvil at azul dot com> ---
Created attachment 54542
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54542&action=edit
fmoderrno.cpp

(In reply to Uroš Bizjak from comment #21)
> When g:93ba85fdd253b4b9cf2b9e54e8e5969b1a3db098 is reverted, current
> mainline does not emit anything that would handle errno (even with
> -fmath-errno flag explicitly set at command line).

With 4341106354c6a463ce3628a4ef9c1a1d37193b59 (2023-02-25) plus a revert of
93ba85fdd253b4b9cf2b9e54e8e5969b1a3db098, fmoderrno.cpp gives:
fmod(1.0D, 0.0D)
g++ -o fmoderrno fmoderrno.C -O3 -Wall; ./fmoderrno
-nan errno=33=Numerical argument out of domain

* [Bug target/108922] fmod() 13x slowdown in gcc4.9 dropping "fprem" and calling fmod()
From: jkratochvil at azul dot com @ 2023-02-27 10:13 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922

--- Comment #24 from Jan Kratochvil <jkratochvil at azul dot com> ---
(In reply to Alexander Monakov from comment #22)
> Strange, comment #8 claims the opposite (unless Jan tested the revert not on
> trunk, but on some branch).

The testsuite was run on 4341106354c6a463ce3628a4ef9c1a1d37193b59 (2023-02-25)
with 93ba85fdd253b4b9cf2b9e54e8e5969b1a3db098 reverted.

* [Bug target/108922] fmod() 13x slowdown in gcc4.9 dropping "fprem" and calling fmod()
From: jakub at gcc dot gnu.org @ 2023-02-27 10:16 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922

--- Comment #25 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Note that the r215740 change was backported to the 4.8 branch in r215773, and
also to the 4.9 branch (which explains why I couldn't reproduce it on 4.8).
Anyway, in 4.7 I see fmodl being called in the *.optimized dump, and
expand_builtin_mathfn_2 used to add the expand_errno_check.
I bet that starting with GCC 6, fmod etc. are handled through internal
functions instead, and maybe the errno handling there is missing.

* [Bug target/108922] fmod() 13x slowdown in gcc4.9 dropping "fprem" and calling fmod()
From: ubizjak at gmail dot com @ 2023-02-27 10:31 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922

--- Comment #26 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Jan Kratochvil from comment #23)
> Created attachment 54542 [details]
> fmoderrno.cpp
> 
> (In reply to Uroš Bizjak from comment #21)
> > When g:93ba85fdd253b4b9cf2b9e54e8e5969b1a3db098 is reverted, current
> > mainline does not emit anything that would handle errno (even with
> > -fmath-errno flag explicitly set at command line).
> 
> With 4341106354c6a463ce3628a4ef9c1a1d37193b59 (=2023-02-25),
> 93ba85fdd253b4b9cf2b9e54e8e5969b1a3db098 reverted and fmoderrno.cpp I get:
> fmod(1.0D, 0.0D)
> g++ -o fmoderrno fmoderrno.C -O3 -Wall; ./fmoderrno
> -nan errno=33=Numerical argument out of domain

Ah, the compilation is different if the compiler finds "errno" mentioned in the
source.

* [Bug target/108922] fmod() 13x slowdown in gcc4.9 dropping "fprem" and calling fmod()
From: jakub at gcc dot gnu.org @ 2023-02-27 10:41 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rsandifo at gcc dot gnu.org

--- Comment #27 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
CCing Richard S., who removed expand_errno_check in
r6-4983-g883cabdecdb052865f.
From what I can see, can_test_argument_range doesn't handle FMOD (could it test
for x infinite or y zero?), though edom_only_function does.
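An argument-range check of the kind asked about here would test the operands before the call, rather than the result after it, and for fmod the condition involves both operands. A minimal sketch of that predicate, assuming C Annex F / POSIX semantics for when fmod reports a domain error; `fmod_domain_error` is a hypothetical name, not a GCC function.

```cpp
#include <cmath>

// Hypothetical predicate: true exactly when fmod(x, y) reports a domain
// error (EDOM / FE_INVALID), i.e. x is infinite or y is zero, provided
// neither operand is a NaN (NaN operands just propagate a NaN).
bool fmod_domain_error(double x, double y)
{
    if (std::isnan(x) || std::isnan(y))
        return false;
    return std::isinf(x) || y == 0.0;  // -0.0 == 0.0, so both zeros count
}
```

If such a test said "no domain error", the inline sequence could be used unconditionally; otherwise the library call would set errno.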

* [Bug target/108922] fmod() 13x slowdown in gcc4.9 dropping "fprem" and calling fmod()
From: ubizjak at gmail dot com @ 2023-02-27 10:46 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922

Uroš Bizjak <ubizjak at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |13.0
             Status|WAITING                     |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org      |ubizjak at gmail dot com

--- Comment #28 from Uroš Bizjak <ubizjak at gmail dot com> ---
I think we have cleared up all the questions here. I'll prepare the revert
later today.

On a related note, it would be nice if Intel corrected Tables 3-30 and 3-31
w.r.t. the Z exception.

* [Bug target/108922] fmod() 13x slowdown in gcc4.9 dropping "fprem" and calling fmod()
From: jakub at gcc dot gnu.org @ 2023-02-27 10:47 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922

--- Comment #29 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Note that fmod_optab is only used on i?86 (where, because of the commit
mentioned here, it was limited to finite math only) and rs6000 (which guards
it on unsafe math optimizations), so in both cases only in fast-math related
settings. Therefore it is quite possible that it got broken by those changes
without anyone noticing. Most of the builtins whose argument ranges are tested
take a single operand, and pow, which takes two, has special handling...

* [Bug target/108922] fmod() 13x slowdown in gcc4.9 dropping "fprem" and calling fmod()
From: ubizjak at gmail dot com @ 2023-02-27 10:58 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922

--- Comment #30 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Jakub Jelinek from comment #29)
> Note, fmod_optab is only used on i?86 (where because of the commit mentioned
> here it was limited to finite math only) and rs6000 (which guards it on
> unsafe math optimizations), so both in the fast-math related area only.
> Therefore it might be very well possible it got broken because of those
> changes without anyone noticing.  Most of the builtins for which ranges are
> tested are single operand and pow which has 2 has special handling...

Looking at r6-4983-g883cabdecdb052865f, fmod is handled here:

+/* Return true if CALL can produce a domain error (EDOM) but can never
+   produce a pole, range overflow or range underflow error (all ERANGE).
+   This means that we can tell whether a function would have set errno
+   by testing whether the result is a NaN.  */
+
+static bool
+edom_only_function (gcall *call)
+{
+  switch (DECL_FUNCTION_CODE (gimple_call_fndecl (call)))
+    {
+    CASE_FLT_FN (BUILT_IN_ACOS):
+    CASE_FLT_FN (BUILT_IN_ASIN):
+    CASE_FLT_FN (BUILT_IN_ATAN):
+    CASE_FLT_FN (BUILT_IN_COS):
+    CASE_FLT_FN (BUILT_IN_SIGNIFICAND):
+    CASE_FLT_FN (BUILT_IN_SIN):
+    CASE_FLT_FN (BUILT_IN_SQRT):
+    CASE_FLT_FN (BUILT_IN_FMOD):
+    CASE_FLT_FN (BUILT_IN_REMAINDER):
+      return true;
+
+    default:
+      return false;
+    }
+}

* [Bug target/108922] fmod() 13x slowdown in gcc4.9 dropping "fprem" and calling fmod()
From: cvs-commit at gcc dot gnu.org @ 2023-02-27 21:11 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922

--- Comment #31 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Uros Bizjak <uros@gcc.gnu.org>:

https://gcc.gnu.org/g:8020c9c42349f51f75239b9d35a2be41848a97bd

commit r13-6361-g8020c9c42349f51f75239b9d35a2be41848a97bd
Author: Uros Bizjak <ubizjak@gmail.com>
Date:   Mon Feb 27 22:10:01 2023 +0100

    i386: Do not constrain fmod and remainder patterns with flag_finite_math_only [PR108922]

    According to the Intel ISA manual, fprem and fprem1 return NaN when an
    invalid arithmetic exception is generated. This is documented in Table 8-10
    of the ISA manual and makes these two instructions fully IEEE compatible.

    The reverted patch was based on the data from Tables 3-30 and 3-31 of the
    Intel ISA manual, where the results for st(0) being infinity or st(1)
    being 0 are not specified.

    2023-02-27  Uroš Bizjak  <ubizjak@gmail.com>

    gcc/ChangeLog:

            PR target/108922
            Revert:
            * config/i386/i386.md (fmodxf3): Enable for flag_finite_math_only
            only.
            (fmod<mode>3): Ditto.
            (fpremxf4_i387): Ditto.
            (reminderxf3): Ditto.
            (reminder<mode>3): Ditto.
            (fprem1xf4_i387): Ditto.
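Judging from the pattern names in the ChangeLog, the revert covers two families: the fmod patterns (with fpremxf4_i387) and the remainder patterns (with fprem1xf4_i387). The two instructions differ in how the quotient is rounded: fprem truncates it toward zero, like C fmod, while fprem1 rounds it to nearest, like the IEEE 754 remainder operation. A small C++ sketch of the two semantics using `<cmath>`; `Remainders` and `both_remainders` are hypothetical names, and this models the library functions, not the i387 instructions themselves.

```cpp
#include <cmath>

struct Remainders {
    double truncated;  // std::fmod: quotient truncated toward zero (fprem-like)
    double nearest;    // std::remainder: quotient rounded to nearest,
                       // ties to even (fprem1-like)
};

// Hypothetical helper: computes both kinds of remainder of x by y.
Remainders both_remainders(double x, double y)
{
    return {std::fmod(x, y), std::remainder(x, y)};
}
```

For example, 7 / 2 = 3.5 truncates to 3, giving fmod(7, 2) == 1, but rounds to the even 4, giving remainder(7, 2) == -1. Both functions return NaN for a zero divisor or an infinite dividend.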

* [Bug target/108922] fmod() 13x slowdown in gcc4.9 dropping "fprem" and calling fmod()
From: ubizjak at gmail dot com @ 2023-02-27 21:13 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922

Uroš Bizjak <ubizjak at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #32 from Uroš Bizjak <ubizjak at gmail dot com> ---
Fixed by reverting g:4f2611b6e872.

* [Bug target/108922] fmod() 13x slowdown in gcc4.9 dropping "fprem" and calling fmod()
From: hjl.tools at gmail dot com @ 2023-02-28 16:51 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922

--- Comment #33 from H.J. Lu <hjl.tools at gmail dot com> ---
(In reply to Uroš Bizjak from comment #20)
> (In reply to Jakub Jelinek from comment #16)
> 
> > More questionable is the #Z case, where Table 8-11 just talks about
> >   Divide or reverse divide operation with a 0 divisor:
> >     Returns an ∞ signed with the exclusive OR of the sign of the two
> >     operands to the destination operand.
> > but FPREM does division too, so I hope it is covered too (but not listed
> > explicitly).
> 
> FYI, the table 3-30 (and 3-31) is wrong. Executing fprem when st(0) == 1.0
> and st(1) == 0.0 results in IA exception, not Z exception.

Thanks for bringing it up. It will be fixed in the next revision of the Intel
SDM.
