From: Roger Sayle
To: Scott Robert Ladd
cc: gcc@gcc.gnu.org
Date: Mon, 15 Mar 2004 01:55:00 -0000
Subject: Re: GCC viciously beaten by ICC in trig test!
In-Reply-To: <4054ED19.8020009@coyotegulch.com>

On Sun, 14 Mar 2004, Scott Robert Ladd wrote:
> Consider the following program, compiled and run on a Pentium 4
> (Northwood) system:
>
> #include

For a number of benchmarks, just this first line of source code above
is enough to lose the race for GCC against Intel when compiling on
Linux.  Consider the following:

  #include
  double doit(double a)
  {
    return sin(a) * sin(a);
  }

Compiling with gcc -O2 -ffast-math on Linux generates x86 code that's
significantly slower than Intel's compiler output.  However, commenting
out the "#include " corrects the situation and GCC can then generate
*exactly* the same sequence as icc.

The issue is that glibc's headers provide inline implementations for
sin and cos, and thereby override all of GCC's internal builtin
processing.  Once this is done, there's nothing tree-ssa, the
middle-end or the i386 backend can do to improve the code.  If GCC is
to have a hope of using "sincos" or SSE2 specific instruction
sequences, the "best intentions" of glibc's headers (will) have to be
neutralized first.
Perhaps fixincludes :>

For the interested, with "#include " GCC 3.3.3 generates:

foo:
        fldl    4(%esp)
        fld     %st(0)
#APP
        fsin
#NO_APP
        fxch    %st(1)
#APP
        fsin
#NO_APP
        fmulp   %st, %st(1)
        ret

Without it, the output with the same "-O2 -ffast-math
-fomit-frame-pointer" options is identical to the output from Intel
v7.0 (and presumably later):

foo:
        fldl    4(%esp)
        fsin
        fmul    %st(0), %st
        ret

Just another data point.  Avoiding that header may improve your
performance and influence the results of your "command line option"
experiments.

Roger