From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
 id ACE103858417; Fri, 11 Feb 2022 07:59:40 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org ACE103858417
From: "rguenth at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/103008] poor inlined builtin_fmod on x86_64
Date: Fri, 11 Feb 2022 07:59:40 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 11.2.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: rguenth at gcc dot gnu.org
X-Bugzilla-Status: NEW
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: 
Message-ID: <bug-103008-4-N5DsCLgg0x@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-103008-4@http.gcc.gnu.org/bugzilla/>
References: <bug-103008-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-BeenThere: gcc-bugs@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-bugs mailing list <gcc-bugs.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-bugs>,
 <mailto:gcc-bugs-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-bugs/>
List-Post: <mailto:gcc-bugs@gcc.gnu.org>
List-Help: <mailto:gcc-bugs-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-bugs>,
 <mailto:gcc-bugs-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Fri, 11 Feb 2022 07:59:40 -0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103008
--- Comment #12 from Richard Biener <rguenth at gcc dot gnu.org> ---
Just as a data point: on znver2, Uros' testcase shows

rguenther@ryzen:/tmp> gcc-11 t.c -Ofast -lm -march=znver2
rguenther@ryzen:/tmp> numactl --physcpubind=3 /usr/bin/time ./a.out
19.18user 0.00system 0:19.18elapsed 99%CPU (0avgtext+0avgdata 1528maxresident)k
0inputs+0outputs (0major+76minor)pagefaults 0swaps
rguenther@ryzen:/tmp> gcc-11 t.c -Ofast -lm -march=znver2 -fno-builtin-fmod
rguenther@ryzen:/tmp> numactl --physcpubind=3 /usr/bin/time ./a.out
19.26user 0.00system 0:19.26elapsed 99%CPU (0avgtext+0avgdata 1528maxresident)k
0inputs+0outputs (0major+76minor)pagefaults 0swaps
rguenther@ryzen:/tmp> gcc-11 t.c -Ofast -lm -march=znver2 -Dfmodf=_fmodf
rguenther@ryzen:/tmp> numactl --physcpubind=3 /usr/bin/time ./a.out
4.40user 0.00system 0:04.40elapsed 100%CPU (0avgtext+0avgdata 1528maxresident)k
0inputs+0outputs (0major+76minor)pagefaults 0swaps

that's with glibc 2.31.  So the _fmodf variant is more than four times
faster (4.40s vs. 19.18s user time).  But, as Joseph says, a general
expansion like that is probably a bad idea.

The specific case of blender using doubles and fmod (x, 1.) shows that
glibc is very much slower than x87 in the test below on znver2 but the
proposed inline is very very much faster.

Note that using modf(x, &tem) is more than three times as fast as
using fmod (x, 1.) with glibc 2.31.  While we have an optab for fmod
we don't have one for modf (which has an unfortunate pointer output API).
I'm not sure whether fmod (x, 1.) == modf (x, &tem).

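#include <math.h>

/* Candidate replacement, presumably selected with -Dfmod=_fmod as in
   the -Dfmodf=_fmodf run above.  Only valid for a divisor of 1., so
   the second argument is ignored.  */
double
__attribute__((noinline))
_fmod (double x, double)
{
  return x - trunc (x);
}

int
main ()
{

  double a, b;
  volatile double z;

  for (a = -1000.0; a < 1000.0; a += 0.01)
    for (b = -1000.0; b < 1000.0; b += 0.1)
      {
        /* volatile keeps the call from being folded or hoisted.  */
        volatile double tem = a;
        z = fmod (tem, 1.);
      }

  return 0;
}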

Note that replacing a call of fmod (x, 1.) with x - trunc (x) would
not be a simplification on GIMPLE, so that should possibly be done
at RTL expansion?  Replacing it with modf (x, &tem) would be OK,
I think (unfortunately modf doesn't seem to accept a NULL arg).
Both functions are part of C99 / POSIX, so replacing one with the
other should be generally OK.

Maybe there's a function that does not compute the integer part
as well.