From: "amonakov at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/108922] fmod() 13x slowdown in gcc4.9 dropping "fprem" and calling fmod()
Date: Sun, 26 Feb 2023 08:01:52 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922

--- Comment #9 from Alexander Monakov ---
(In reply to Jan Kratochvil from comment #8)
> The revert makes it 13x faster. But the produced code still falls back to
> calling glibc fmod() as shown in the disassembly in Comment 0.
> If I use the "fprem" instruction directly it gets 15x faster - but I did not
> figure out some (easy) way for me how to patch GCC to no longer produce the
> call to fmod() at all and produce only the "fprem" instruction.

You just need to pass -fno-math-errno (the call is for setting errno, similar
to how gcc emits the sqrt() sequence).

> (In reply to Alexander Monakov from comment #4)
> > Plus, Glibc does use fprem/fprem1 for fmodl/remainderl on x86_64,
>
> It is true replacing fmod() with fmodl() makes it 5x faster (but only 5x).
> There is still some infinity check and I haven't found any real
> justification in glibc sources for it:
>   if (__builtin_expect (isinf (x) || y == 0.0L, 0)
>       && _LIB_VERSION != _IEEE_ && !isnan (y) && !isnan (x))
>     /* fmod(+-Inf,y) or fmod(x,0) */
>     return __kernel_standard_l (x, y, 227);

This is for legacy/fancy error handling beyond setting IEEE exception flags.