From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
 id 590DC3858C20; Thu, 10 Feb 2022 18:09:21 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 590DC3858C20
From: "ubizjak at gmail dot com" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/103008] poor inlined builtin_fmod on x86_64
Date: Thu, 10 Feb 2022 18:09:21 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 11.2.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: ubizjak at gmail dot com
X-Bugzilla-Status: NEW
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: 
Message-ID: <bug-103008-4-qHaBeLIkKg@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-103008-4@http.gcc.gnu.org/bugzilla/>
References: <bug-103008-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-BeenThere: gcc-bugs@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-bugs mailing list <gcc-bugs.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-bugs>,
 <mailto:gcc-bugs-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-bugs/>
List-Post: <mailto:gcc-bugs@gcc.gnu.org>
List-Help: <mailto:gcc-bugs-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-bugs>,
 <mailto:gcc-bugs-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Thu, 10 Feb 2022 18:09:21 -0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103008
--- Comment #10 from Uroš Bizjak <ubizjak at gmail dot com> ---
FYI, the following testcase:

--cut here--
#include <math.h>

float
__attribute__((noinline))
_fmodf (float x, float y)
{
  return x - truncf (x/y) * y;
}

int
main ()
{

  float a, b;
  volatile float z;

  for (a = -1000.0f; a < 1000.0f; a += 0.01f)
    for (b = -1000.0f; b < 1000.0f; b += 0.1f)
      z = fmodf (a, b);

  return 0;
}
--cut here--

$ gcc -Ofast -lm fmod-bench.c

      22,127092116 seconds time elapsed

      22,125111000 seconds user
       0,000999000 seconds sys


$ gcc -Ofast -fno-builtin-fmodf -lm fmod-bench.c

      32,751589079 seconds time elapsed

      32,746156000 seconds user
       0,000999000 seconds sys


This indicates that the inlined x87 code is considerably faster than the
glibc fmodf call on my target (Ivybridge-E), on Fedora 34 with glibc 2.33.

For reference, when the above _fmodf is called, I get:

$ gcc -Ofast -lm fmod-bench.c

      10,706189749 seconds time elapsed

      10,704859000 seconds user
       0,000999000 seconds sys

$ gcc -Ofast -lm -msse4 fmod-bench.c

      11,391062747 seconds time elapsed

      11,390771000 seconds user
       0,000000000 seconds sys

So, considerably faster!

It looks like the problem with -ffast-math is not the inlined x87 code,
but the missing fmod transformation. As shown above, the SSE2 code for
truncf is on par with the SSE4 roundss instruction, so if the target can
provide optimized truncf code, fmodf should definitely be converted to
"a - trunc (a/p) * p".