From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
	id 39D93385842C; Sun, 26 Feb 2023 08:01:53 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 39D93385842C
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1677398513;
	bh=IIf/jXmfbrTpWlGhE83nWBz4+2wnbR9QOdEUjHgktQ0=;
	h=From:To:Subject:Date:In-Reply-To:References:From;
	b=qkjqxopayvN9yTaO2YGyWi6vEH16y7bKcfsvoUO0rVM02Thpe9u2fW8u+i4UAB92+
	 LdIr1vpnp+ldRhzGp1AiDGqnTavidxxe7XjXNW8M4Br17fARXg2bNMOOu11j61870z
	 3IK9s2qFyIYSvaq3h3f1/BWBe845nkTGNib38pXk=
From: "amonakov at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/108922] fmod() 13x slowdown in gcc4.9 dropping "fprem"
 and calling fmod()
Date: Sun, 26 Feb 2023 08:01:52 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 12.2.1
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: amonakov at gcc dot gnu.org
X-Bugzilla-Status: WAITING
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: 
Message-ID: <bug-108922-4-5OFp9e2jDf@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-108922-4@http.gcc.gnu.org/bugzilla/>
References: <bug-108922-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id: <gcc-bugs.sourceware.org>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922
--- Comment #9 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
(In reply to Jan Kratochvil from comment #8)
> The revert makes it 13x faster. But the produced code still falls back to
> calling glibc fmod() as shown in the disassembly in Comment 0.
> If I use the "fprem" instruction directly it gets 15x faster - but I did not
> figure out some (easy) way for me how to patch GCC to no longer produce the
> call to fmod() at all and produce only the "fprem" instruction.

You just need to pass -fno-math-errno (the call is for setting errno, similar
to how gcc emits the sqrt() sequence).


> (In reply to Alexander Monakov from comment #4)
> > Plus, Glibc does use fprem/fprem1 for fmodl/remainderl on x86_64,
>
> It is true replacing fmod() with fmodl() makes it 5x faster (but only 5x).
> There is still some infinity check and I haven't found any real
> justification in glibc sources for it:
>   if (__builtin_expect (isinf (x) || y == 0.0L, 0)
>       && _LIB_VERSION != _IEEE_ && !isnan (y) && !isnan (x))
>     /* fmod(+-Inf,y) or fmod(x,0) */
>     return __kernel_standard_l (x, y, 227);

This is for legacy/fancy error handling beyond setting IEEE exception flags.