Subject: [Bug target/103008] New: poor inlined builtin_fmod on x86_64
From: "fx at gnu dot org"
To: gcc-bugs@gcc.gnu.org
Date: Sat, 30 Oct 2021 18:51:33 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103008

            Bug ID: 103008
           Summary: poor inlined builtin_fmod on x86_64
           Product: gcc
           Version: 11.2.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: fx at gnu dot org
  Target Milestone: ---
            Target: x86_64

Created attachment 51706
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51706&action=edit
ggl.f90

This is from looking at a Fortran benchmark set, but the problem presumably isn't Fortran-specific.
One of the cases in that set (ac.f90) is bottlenecked on a random number routine (which may be rubbish, but it's there). It uses DMOD, which the tree dump shows is compiled to __builtin_fmod and inlined. However, the benchmark performance is still 50% worse with gfortran than with Intel ifort. If I replace DMOD with its definition, gfortran comes much closer to ifort.

I'll attach ggl.f90 (the original) and gglx.f90 (which avoids the call to the intrinsic), along with the assembler for each. The assembler is from GCC 11.2.0, run on SKX as

  gfortran -Ofast -march=native

(I note that fmod isn't inlined at all with -O3, which looks to me like a Fortran miss that I should report.)

I only take benchmarks seriously as far as understanding the results goes. Even so, at least with PDO, GCC is pretty much on a par with ifort on the bottom line of that set, despite #40770 and another poor case. :-)