From: "rguenth at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/103008] poor inlined builtin_fmod on x86_64
Date: Mon, 14 Feb 2022 07:35:35 +0000
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 11.2.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: rguenth at gcc dot gnu.org
X-Bugzilla-Status: NEW
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103008

--- Comment #17 from Richard Biener ---
(In reply to Uroš Bizjak from comment #14)
> Created attachment 52428 [details]
> Proposed patch
>
> The attached patch implements:
>
> fmod (a, p) = a - trunc (a/p) * p
> drem (a, p) = a - roundeven (a/p) * p
>
> using SSE4 round instruction (and uses fnma when available).
>
> Timings with Polyhedron ac.f90 on IvyBridge-E, Fedora-34, glibc 2.33-21.fc34
>
> -Ofast:
> 6,150082000 seconds user
>
> -Ofast -mno-80387:
> 18,354654000 seconds user
>
> -Ofast -msse4:
> 5,722511000 seconds user

I fear this is a bit too much on the "unsafe" side.
Maybe we can go this way for float but use double arithmetic for the fmod to avoid the exponent issue? For double, can we do some cheap range checking and fall back to fmod() when not safe? That said, can we have a flag like -mrecip to control this?