From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-return-196331-listarch-gcc=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 78713 invoked by alias); 11 Jun 2018 10:05:40 -0000
Mailing-List: contact gcc-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc/>
List-Post: <mailto:gcc@gcc.gnu.org>
List-Help: <http://gcc.gnu.org/ml/>
Sender: gcc-owner@gcc.gnu.org
Received: (qmail 78687 invoked by uid 89); 11 Jun 2018 10:05:39 -0000
Authentication-Results: sourceware.org; auth=none
X-Spam-SWARE-Status: No, score=-1.9 required=5.0 tests=BAYES_00,SPF_PASS autolearn=ham version=3.3.2 spammy=instructed, highprecision, high-precision, codes
X-HELO: mx2.suse.de
Received: from mx2.suse.de (HELO mx2.suse.de) (195.135.220.15) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Mon, 11 Jun 2018 10:05:38 +0000
X-Amavis-Alert: BAD HEADER SECTION, Duplicate header field: "Cc"
Received: from relay1.suse.de (charybdis-ext-too.suse.de [195.135.220.254])	by mx2.suse.de (Postfix) with ESMTP id DED53AC53;	Mon, 11 Jun 2018 10:05:35 +0000 (UTC)
From: Martin Jambor <mjambor@suse.cz>
To: sellcey@cavium.com, Richard Biener <richard.guenther@gmail.com>, pmenzel+gcc.gnu.org@molgen.mpg.de
Cc: GCC Development <gcc@gcc.gnu.org>
Cc:
Subject: Re: How to get GCC on par with ICC?
In-Reply-To: <1528494436.3449.36.camel@cavium.com>
References: <CAFiYyc1rpbWREjd8AS3s-uCgsJVX7kqKh_tyjSTysTvpyjieVw@mail.gmail.com> <1528494436.3449.36.camel@cavium.com>
User-Agent: Notmuch/0.26 (https://notmuchmail.org) Emacs/25.3.1 (x86_64-suse-linux-gnu)
Date: Mon, 11 Jun 2018 14:50:00 -0000
Message-ID: <ri6vaapiqog.fsf@suse.cz>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-IsSubscribed: yes
X-SW-Source: 2018-06/txt/msg00132.txt.bz2

Hi Steve,

On Fri, Jun 08 2018, Steve Ellcey wrote:
> On Thu, 2018-06-07 at 12:01 +0200, Richard Biener wrote:
>>
>> When we do our own comparisons of GCC vs. ICC on benchmarks
>> like SPEC CPU 2006/2017 ICC doesn't have a big lead over GCC
>> (in fact it even trails in some benchmarks) unless you get to
>> "SPEC tricks" like data structure re-organization optimizations that
>> probably never apply in practice on real-world code (and people
>> should fix such things at the source level being pointed at them
>> via actually profiling their codes).
>
> Richard,
>
> I was wondering if you have any more details about these comparisons
> you have done that you can share?  Compiler versions, options used,
> hardware, etc.  Also, were there any tests that stood out in terms of
> icc outperforming GCC?

Mostly AMD Ryzen, GCC 8 vs ICC 18.  We were comparing a few combinations
of options.  When we compared ICC's and our -Ofast (with or without
native GCC march/mtune and a set of ICC options that hopefully generate
the best code for Ryzen), we found out that without LTO/IPO, GCC is
actually slightly ahead of ICC on integer benchmarks (both SPEC 2006 and
2017).

Floating-point results were a more mixed bag (mostly because ICC
performed surprisingly poorly without IPO on a few) but at least on SPEC
2017, they were clearly better... with a caveat, see my comment about
wrf below.

With LTO/IPO, ICC can perform a few memory-reorg tricks that push them
quite a bit ahead of us but I'm not convinced they can perform these
transformations on much source code that happens not to be a well known
benchmark.  So I'd recommend always looking at non-IPO numbers too.
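
For readers unfamiliar with the term, here is a minimal sketch of the
kind of data structure reorganization meant here, done by hand at the
source level.  It moves a rarely used ("cold") field out of a struct so
that the hot loop touches less memory.  The struct and function names
are hypothetical, and this only illustrates the general technique; it
is not a description of what ICC actually does.

```c
#include <stddef.h>

/* Original layout: every iteration of the hot loop pulls the cold
   label bytes through the cache along with the mass. */
struct particle {
    double mass;       /* hot: read in every iteration */
    char   label[56];  /* cold: rarely touched */
};

double total_mass_aos(const struct particle *p, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += p[i].mass;
    return sum;
}

/* Reorganized layout: the hot field lives in its own dense array. */
struct particles_split {
    double *mass;        /* n contiguous doubles */
    char  (*label)[56];  /* n labels, stored separately */
};

double total_mass_split(const struct particles_split *p, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += p->mass[i];
    return sum;
}
```

Both functions compute the same sum; only the memory layout differs,
which is why profiling usually points at such spots in real code.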

>
> I did a compare of SPEC 2017 rate using GCC 8.* (pre release) and
> a recent ICC (2018.0.128?) on my desktop (Xeon CPU E5-1650 v4).
> I used '-xHost -O3' for icc and '-march=native -mtune=native -O3'
> for gcc.

Please try with -Ofast too.  The main reason is that -O3 does not imply
-ffast-math and the performance gain from it is often very big (and I
suspect the 525.x264_r difference is because of that).  Alternatively,
if your own workloads require high-precision floating-point math, you
have to force ICC to use it to get a fair comparison.  -Ofast also turns
on -fno-protect-parens and -fstack-arrays, which also help a few
benchmarks a lot, but note that you may need to set a large stack ulimit
for them not to crash (but ICC does the same thing, as far as we know).
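
As a small illustration of why -ffast-math matters so much, consider a
floating-point reduction like the sketch below.  Under strict IEEE
semantics the additions have to happen in source order, which generally
keeps GCC from vectorizing the loop; with -Ofast (or -ffast-math) the
compiler is allowed to reassociate them.  The function name is made up
for the example, and how much this helps depends on the target and the
GCC version.

```c
#include <stddef.h>

/* Sum of squares of n doubles.  The accumulation into acc is a serial
   dependency chain unless the compiler may reorder the additions. */
double sum_of_squares(const double *x, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += x[i] * x[i];
    return acc;
}
```

Comparing the assembly from "gcc -O3 -march=native -S" with that from
"gcc -Ofast -march=native -S" should show whether the loop got
vectorized in each case.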

>
> The int rate numbers (running 1 copy only) were not too bad, GCC was
> only about 2% slower and only 525.x264_r seemed way slower with GCC.
> The fp rate numbers (again only 1 copy) showed a larger difference,
> around 20%.  521.wrf_r was more than twice as slow when compiled with
> GCC instead of ICC and 503.bwaves_r and 510.parest_r also showed
> significant slowdowns when compiled with GCC vs. ICC.
>

Keep in mind that when discussing FP benchmarks, the math library used
can be (almost) as important as the compiler.  In the case of 481.wrf,
we found that the GCC 8 + glibc 2.26 (so the "out-of-the-box" GNU)
performance is about 70% of ICC's.  When we just linked against AMD's
libm, we got to 83%. When we instructed GCC to generate calls to Intel's
SVML library and linked against it, we got to 91%.  Using both SVML and
AMD's libm, we achieved 93%.

That means that there is likely still 7% to be gained from more clever
optimizations in GCC but the real problem is in GNU libm.  And 481.wrf
is perhaps the most extreme example but definitely not the only one.

Martin