From: "rguenther at suse dot de"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/103008] poor inlined builtin_fmod on x86_64
Date: Mon, 14 Feb 2022 07:12:58 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 11.2.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: rguenther at suse dot de
X-Bugzilla-Status: NEW
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Content-Type: text/plain; charset="UTF-8"
List-Id: Gcc-bugs mailing list

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103008

--- Comment #16 from rguenther at suse dot de ---
On Fri, 11 Feb 2022, ubizjak at gmail dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103008
>
> --- Comment #13 from Uroš Bizjak ---
> (In reply to Richard Biener from comment #12)
> > Just as data-point on znver2 Uros testcase shows
> >
> > rguenther@ryzen:/tmp> gcc-11 t.c -Ofast -lm -march=znver2
> > rguenther@ryzen:/tmp> numactl --physcpubind=3 /usr/bin/time ./a.out
> > 19.18user 0.00system 0:19.18elapsed 99%CPU (0avgtext+0avgdata
> > 1528maxresident)k
> > 0inputs+0outputs (0major+76minor)pagefaults 0swaps
> > rguenther@ryzen:/tmp> gcc-11 t.c -Ofast -lm -march=znver2 -fno-builtin-fmod
>
> You should use -fno-builtin-fmodf in the above compile flags.

Oops, yes.  Then the glibc version is

22.53user 0.00system 0:22.53elapsed 99%CPU (0avgtext+0avgdata 1600maxresident)k
0inputs+0outputs (0major+77minor)pagefaults 0swaps

so indeed for float the x87 inline version is faster when benchmarked
this way.  For double it's

19.31user 0.00system 0:19.31elapsed 99%CPU (0avgtext+0avgdata 1536maxresident)k
0inputs+0outputs (0major+76minor)pagefaults 0swaps

vs.

18.47user 0.00system 0:18.47elapsed 99%CPU (0avgtext+0avgdata 1600maxresident)k
0inputs+0outputs (0major+77minor)pagefaults 0swaps

so glibc is a bit faster here while the x87 version is of course
similar.  Avoiding the libcall can of course avoid spilling SSE regs
around the call.

So what remains is really the special case in blender doing
fmod (x, 1.) which can eventually be optimized with SSE4.=