From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
 id 015653851C25; Thu, 17 Sep 2020 20:46:33 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 015653851C25
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
 s=default; t=1600375594;
 bh=pvBFtAYHsGkNpS8mBwEb0YkAjJKu7E3ATEOgByG1Hxw=;
 h=From:To:Subject:Date:In-Reply-To:References:From;
 b=NPaGVvGmhlE0LL0824uo2UyKXK1i9pem5rhz0MD1nzDcG2B1aXf8Tcg6G5rbHZGh9
 S1xPbuPxDIofWs5CnB9HiR7blTeyCnNPECcmVNJYMpqJYomubbEJSKSrrU8gyR/yJE
 K+hsQQg+ywxinvitJ0fO5vZO937JG3UYQPrYaYco=
From: "mailboxnotfound at yahoo dot com" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/84201] 549.fotonik3d_r from SPEC2017 fails verification
 with recent Intel and AMD CPUs
Date: Thu, 17 Sep 2020 20:46:33 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 8.0
X-Bugzilla-Keywords: 
X-Bugzilla-Severity: normal
X-Bugzilla-Who: mailboxnotfound at yahoo dot com
X-Bugzilla-Status: NEW
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: cc
Message-ID: <bug-84201-4-rEEFycBi0m@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-84201-4@http.gcc.gnu.org/bugzilla/>
References: <bug-84201-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-BeenThere: gcc-bugs@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-bugs mailing list <gcc-bugs.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-bugs>,
 <mailto:gcc-bugs-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-bugs/>
List-Post: <mailto:gcc-bugs@gcc.gnu.org>
List-Help: <mailto:gcc-bugs-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-bugs>,
 <mailto:gcc-bugs-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Thu, 17 Sep 2020 20:46:34 -0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84201

john henning <mailboxnotfound at yahoo dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mailboxnotfound at yahoo dot com
--- Comment #12 from john henning <mailboxnotfound at yahoo dot com> ---
I contributed to the development of benchmark 549.fotonik3d_r.  The opinions
herein are my own, not necessarily SPEC's.

Martin Liška wrote in comment 9:

> adjusted tolerance for the test from 1e-10 to 1e-9

That change would have been highly desirable, if this problem had been found
prior to the release of CPU 2017.  Unfortunately, post-release, it is very
difficult to change a SPEC CPU benchmark, because of the philosophy of "no
moving targets".

To be clear, a rule-compliant SPEC CPU run is not allowed to change the
tolerance.
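
To illustrate why a factor of ten in the tolerance can decide between pass and
fail, here is a minimal sketch of a relative-tolerance check.  It is not SPEC's
validation tool: the function name and the sample values are hypothetical, and
it assumes the tolerance is applied as a relative error bound.

```python
def within_reltol(expected: float, actual: float, reltol: float) -> bool:
    """Return True if `actual` differs from `expected` by at most
    `reltol` times the magnitude of `expected`."""
    if expected == 0.0:
        # A relative bound is undefined at zero; require an exact match here.
        return actual == 0.0
    return abs(actual - expected) <= reltol * abs(expected)


# Hypothetical values: a result that is off by about 5 parts in 10^10.
expected = 1.0
actual = 1.0 + 5e-10

passes_strict = within_reltol(expected, actual, 1e-10)  # original tolerance
passes_loose = within_reltol(expected, actual, 1e-9)    # adjusted tolerance
print(passes_strict, passes_loose)
```

With these made-up numbers the result fails at 1e-10 and passes at 1e-9, which
is the kind of borderline difference discussed in this report.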

Why wasn't the problem found before release?

Although GCC was tested prior to release of CPU 2017, the circumstances that
lead to this problem were not encountered.  As Steve Ellcey wrote in comment 5,
the problem comes and goes with various optimizations:

> -Ofast fails
> -Ofast -fno-unsafe-math-optimizations works
> -Ofast -fno-tree-loop-vectorize works
> -O3 works

In addition, it appears that the problem:

  - Depends on the particular architecture. In my tests today, it disappears
when I remove -march=native

  - Does not happen in the presence of FDO (Feedback-Directed Optimization,
i.e. -fprofile-generate / -fprofile-use), typically used by "peak" tuning (see
https://www.spec.org/cpu2017/Docs/overview.html#Q16 for info on base vs. peak).

SPEC CPU Examples

SPEC CPU 2017 users who start from the Example config files for GCC in
$SPEC/config are unlikely to hit the problem because most of the GCC Example
config files use -O3 (not -Ofast) for base.  However, if a user modifies the
example to use

   -Ofast -march=native

then they will need to also add -fno-unsafe-math-optimizations or
-fno-tree-loop-vectorize.

Working around the problem

The same-for-all rule --
https://www.spec.org/cpu2017/Docs/runrules.html#BaseFlags -- says that all
benchmarks in a suite of a given language must use the same base flags.  Here
are several examples of how a config file could obey that rule while working
around the problem:

Option (a) - In base, avoid -Ofast for Fortran
   default=base:
      COPTIMIZE      = -Ofast -flto -march=native
      CXXOPTIMIZE    = -Ofast -flto -march=native
      FOPTIMIZE      = -O3    -flto -march=native

Option (b) - In base, avoid -march=native for Fortran
   default=base:
      COPTIMIZE      = -Ofast -flto -march=native
      CXXOPTIMIZE    = -Ofast -flto -march=native
      FOPTIMIZE      = -Ofast -flto

Option (c) - Turn off tree loop vectorizer for Fortran
   default=base:
      OPTIMIZE       = -Ofast -flto -march=native
   fprate,fpspeed=base:
      FOPTIMIZE      = -fno-tree-loop-vectorize

Option (d) - Turn off unsafe math for Fortran
   default=base:
      OPTIMIZE       = -Ofast -flto -march=native
   fprate,fpspeed=base:
      FOPTIMIZE      = -fno-unsafe-math-optimizations

Performance impact

The performance impact of the options above will be system dependent, and may
depend on how hard you exercise the system (e.g. one copy or many copies).
For a particular system tested by one particular person running only one copy,
here are the results of the above 4 options, normalized to option (a):

  Performance of the Fortran benchmarks in SPECrate2017 FP
  Normalized to option (a).  Higher is better

               503.    507.   521.  527.  549.     554.
               bwaves  cactu  wrf   cam4  fotonik  roms  geomean
 (a) O3          1.00   1.00  1.00  1.00   1.00    1.00    1.000
 (b) no native   1.31    .82  1.31  1.01    .93     .98    1.045
 (c) no vect     1.31   1.01   .94   .89    .90     .85     .973
 (d) no unsafe    .99   1.02  1.36  1.01   1.03    1.01    1.064

Given the above, at the moment option (d) seems best.
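
The geomean column can be approximately reproduced from the per-benchmark
ratios.  The following is a minimal sketch: the values are copied from the
table above, and the dictionary and function names are only illustrative.
Because the table's ratios are rounded to two decimals, the recomputed
geomeans may differ from the printed ones in the third decimal place.

```python
import math

# Normalized ratios copied from the table above, in column order:
# bwaves, cactu, wrf, cam4, fotonik, roms.
ratios = {
    "a": [1.00, 1.00, 1.00, 1.00, 1.00, 1.00],  # O3
    "b": [1.31, 0.82, 1.31, 1.01, 0.93, 0.98],  # no native
    "c": [1.31, 1.01, 0.94, 0.89, 0.90, 0.85],  # no vect
    "d": [0.99, 1.02, 1.36, 1.01, 1.03, 1.01],  # no unsafe
}


def geomean(values):
    """Geometric mean, computed via logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


results = {option: geomean(vals) for option, vals in ratios.items()}
best = max(results, key=results.get)

for option, value in results.items():
    print(f"({option}) geomean = {value:.3f}")
print(f"highest geomean: option ({best})")
```

The geometric mean is used here rather than the arithmetic mean because the
values are ratios; a 10% gain on one benchmark and a 10% loss on another
should roughly cancel.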

Next steps

I will add a summary of this discussion to
   https://www.spec.org/cpu2017/Docs/benchmarks/549.fotonik3d_r.html

Thank you Martin, Martin, Steve, and Richard for the clarity in this report.