From: "rguenth at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/103008] poor inlined builtin_fmod on x86_64
Date: Fri, 11 Feb 2022 07:59:40 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103008

--- Comment #12 from Richard Biener ---
Just as a data point, on znver2 Uros' testcase shows

rguenther@ryzen:/tmp> gcc-11 t.c -Ofast -lm -march=znver2
rguenther@ryzen:/tmp> numactl --physcpubind=3 /usr/bin/time ./a.out
19.18user 0.00system 0:19.18elapsed 99%CPU (0avgtext+0avgdata 1528maxresident)k
0inputs+0outputs (0major+76minor)pagefaults 0swaps
rguenther@ryzen:/tmp> gcc-11 t.c -Ofast -lm -march=znver2 -fno-builtin-fmod
rguenther@ryzen:/tmp> numactl --physcpubind=3 /usr/bin/time ./a.out
19.26user 0.00system 0:19.26elapsed 99%CPU (0avgtext+0avgdata 1528maxresident)k
0inputs+0outputs (0major+76minor)pagefaults 0swaps
rguenther@ryzen:/tmp> gcc-11 t.c -Ofast -lm -march=znver2 -Dfmodf=_fmodf
rguenther@ryzen:/tmp> numactl --physcpubind=3 /usr/bin/time ./a.out
4.40user 0.00system 0:04.40elapsed 100%CPU (0avgtext+0avgdata 1528maxresident)k
0inputs+0outputs (0major+76minor)pagefaults 0swaps

That's with glibc 2.31. So the _fmodf variant is more than four times as
fast (4.40s versus roughly 19.2s). But as Joseph says, a general expansion
like that is probably a bad idea.

For the specific case of Blender, which uses doubles and fmod (x, 1.), the
test below shows on znver2 that glibc is very much slower than x87, but the
proposed inline is far faster still. Note that using modf (x, &tem) is more
than three times as fast as using fmod (x, 1.) with glibc 2.31. While we
have an optab for fmod, we don't have one for modf (which has an unfortunate
pointer output API). I'm not sure whether fmod (x, 1.) == modf (x, &tem).

#include <math.h>

double __attribute__((noinline))
_fmod (double x, double)
{
  return x - trunc (x);
}

int main ()
{
  double a, b;
  volatile double z;
  for (a = -1000.0; a < 1000.0; a += 0.01)
    for (b = -1000.0; b < 1000.0; b += 0.1)
      {
        volatile double tem = a;
        z = fmod (tem, 1.);
      }
  return 0;
}

Note that replacing a call of fmod (x, 1.) with x - trunc (x) would not be a
simplification on GIMPLE, so possibly that should be done by RTL expansion?
Replacing it with modf (x, &tem) would be OK, I think (unfortunately modf
doesn't seem to accept a NULL argument). Both functions are part of C99 /
POSIX, so replacing one with the other should be generally OK. Maybe there's
a function that does not compute the integer part as well.