From: Martin Jambor
To: sellcey@cavium.com, Richard Biener, pmenzel+gcc.gnu.org@molgen.mpg.de
Cc: GCC Development
Subject: Re: How to get GCC on par with ICC?
In-Reply-To: <1528494436.3449.36.camel@cavium.com>
Date: Mon, 11 Jun 2018 14:50:00 -0000

Hi Steve,

On Fri, Jun 08 2018, Steve Ellcey wrote:
> On Thu, 2018-06-07 at 12:01 +0200, Richard Biener wrote:
>>
>> When we do our own comparisons of GCC vs.
>> ICC on benchmarks like SPEC CPU 2006/2017, ICC doesn't have a big
>> lead over GCC (in fact it even trails in some benchmarks) unless you
>> get to "SPEC tricks" like data structure re-organization
>> optimizations that probably never apply in practice on real-world
>> code (and people should fix such things at the source level being
>> pointed at them via actually profiling their codes).
>
> Richard,
>
> I was wondering if you have any more details about these comparisons
> you have done that you can share? Compiler versions, options used,
> hardware, etc. Also, were there any tests that stood out in terms of
> icc outperforming GCC?

Mostly AMD Ryzen, GCC 8 vs. ICC 18. We were comparing a few
combinations of options. When we compared ICC's -Ofast with ours (with
or without native GCC -march/-mtune, and with a set of ICC options that
hopefully generate the best code for Ryzen), we found that without
LTO/IPO, GCC is actually slightly ahead of ICC on integer benchmarks
(both SPEC 2006 and 2017). Floating-point results were more of a mixed
bag (mostly because ICC performed surprisingly poorly without IPO on a
few), but at least on SPEC 2017, ICC's results were clearly better...
with a caveat; see my comment about wrf below.

With LTO/IPO, ICC can perform a few memory-reorganization tricks that
push it quite a bit ahead of us, but I'm not convinced it can apply
these transformations to much source code that is not a well-known
benchmark. So I'd recommend always looking at non-IPO numbers too.

>
> I did a comparison of SPEC 2017 rate using GCC 8.* (pre-release) and
> a recent ICC (2018.0.128?) on my desktop (Xeon CPU E5-1650 v4).
> I used '-xHost -O3' for icc and '-march=native -mtune=native -O3'
> for gcc.

Please try with -Ofast too. The main reason is that -O3 does not imply
-ffast-math, and the performance gain from it is often very big (I
suspect the 525.x264_r difference is because of that).
Alternatively, if your own workloads require high-precision
floating-point math, you have to force ICC to use it as well to get a
fair comparison.

-Ofast also turns on -fno-protect-parens and -fstack-arrays (both
gfortran options), which also help a few benchmarks a lot, but note
that you may need to set a large stack ulimit for them not to crash
(but ICC does the same thing, as far as we know).

>
> The int rate numbers (running 1 copy only) were not too bad, GCC was
> only about 2% slower and only 525.x264_r seemed way slower with GCC.
> The fp rate numbers (again only 1 copy) showed a larger difference,
> around 20%. 521.wrf_r was more than twice as slow when compiled with
> GCC instead of ICC and 503.bwaves_r and 510.parest_r also showed
> significant slowdowns when compiled with GCC vs. ICC.
>

Keep in mind that when discussing FP benchmarks, the math library used
can be (almost) as important as the compiler. In the case of 481.wrf,
we found that the performance of GCC 8 + glibc 2.26 (i.e. the
"out-of-the-box" GNU toolchain) is about 70% of ICC's. When we just
linked against AMD's libm, we got to 83%. When we instructed GCC to
generate calls to Intel's SVML library and linked against it, we got to
91%. Using both SVML and AMD's libm, we achieved 93%. That means there
is likely still 7% to be gained from more clever optimizations in GCC,
but the real problem is in GNU libm. And 481.wrf is perhaps the most
extreme example, but definitely not the only one.

Martin