public inbox for gcc@gcc.gnu.org
* Better performance on older version of GCC
@ 2010-08-27 13:50 Corey Kasten
  2010-08-27 14:20 ` H.J. Lu
  2010-08-27 14:23 ` Nathan Froyd
  0 siblings, 2 replies; 9+ messages in thread
From: Corey Kasten @ 2010-08-27 13:50 UTC (permalink / raw)
  To: gcc

Hello all,

I have two computers with identical hardware but two different versions
of GCC. I have a processor- and memory-intensive benchmark program which
I compile on both systems, and I cannot understand why the system with
the older GCC version produces faster code.

System A has GCC version "4.1.2 20070925 (Red Hat 4.1.2-33)"
System B has GCC version "4.3.0 20080428 (Red Hat 4.3.0-8)"

I find that the executable compiled on system A runs faster (on both
systems) than the executable compiled on system B (on both systems), by
a factor of approximately 4. I have attempted to play with the
GCC optimizer flags and have not been able to get System B (with the
later GCC version) to compile code with any better performance. Could
someone please help figure this out?

Below is the GCC command I run on System A followed by the verbose
output:
gcc -v -Wall -DOFFLINE_WEIGHTS -DDOUBLEP -g bfbenchmark_threaded.c -lm
-lrt -lpthread -O3 -o bfbenchmark_threaded

---------------------------BEGIN OUTPUT---------------------------------
Using built-in specs.
Target: i386-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man
--infodir=/usr/share/info --enable-shared --enable-threads=posix
--enable-checking=release --with-system-zlib --enable-__cxa_atexit
--disable-libunwind-exceptions
--enable-languages=c,c++,objc,obj-c++,java,fortran,ada
--enable-java-awt=gtk --disable-dssi --enable-plugin
--with-java-home=/usr/lib/jvm/java-1.5.0-gcj-1.5.0.0/jre
--enable-libgcj-multifile --enable-java-maintainer-mode
--with-ecj-jar=/usr/share/java/eclipse-ecj.jar --with-cpu=generic
--host=i386-redhat-linux
Thread model: posix
gcc version 4.1.2 20070925 (Red Hat 4.1.2-33)
 /usr/libexec/gcc/i386-redhat-linux/4.1.2/cc1 -quiet -v
-DOFFLINE_WEIGHTS -DDOUBLEP bfbenchmark_threaded.c -quiet -dumpbase
bfbenchmark_threaded.c -mtune=generic -auxbase bfbenchmark_threaded -g
-O3 -Wall -version -o /tmp/ccvxPCd0.s
ignoring nonexistent directory
"/usr/lib/gcc/i386-redhat-linux/4.1.2/../../../../i386-redhat-linux/include"
#include "..." search starts here:
#include <...> search starts here:
 /usr/local/include
 /usr/lib/gcc/i386-redhat-linux/4.1.2/include
 /usr/include
End of search list.
GNU C version 4.1.2 20070925 (Red Hat 4.1.2-33) (i386-redhat-linux)
	compiled by GNU C version 4.1.2 20070925 (Red Hat 4.1.2-33).
GGC heuristics: --param ggc-min-expand=100 --param
ggc-min-heapsize=131072
Compiler executable checksum: ab322ce5b87a7c6c23d60970ec7b7b31
 as -V -Qy -o /tmp/ccU8kZL1.o /tmp/ccvxPCd0.s
GNU assembler version 2.17.50.0.18 (i386-redhat-linux) using BFD version
version 2.17.50.0.18-1 20070731
 /usr/libexec/gcc/i386-redhat-linux/4.1.2/collect2 --eh-frame-hdr
--build-id -m elf_i386 --hash-style=gnu
-dynamic-linker /lib/ld-linux.so.2 -o
bfbenchmark_threaded /usr/lib/gcc/i386-redhat-linux/4.1.2/../../../crt1.o /usr/lib/gcc/i386-redhat-linux/4.1.2/../../../crti.o /usr/lib/gcc/i386-redhat-linux/4.1.2/crtbegin.o -L/usr/lib/gcc/i386-redhat-linux/4.1.2 -L/usr/lib/gcc/i386-redhat-linux/4.1.2 -L/usr/lib/gcc/i386-redhat-linux/4.1.2/../../.. /tmp/ccU8kZL1.o -lm -lrt -lpthread -lgcc --as-needed -lgcc_s --no-as-needed -lc -lgcc --as-needed -lgcc_s --no-as-needed /usr/lib/gcc/i386-redhat-linux/4.1.2/crtend.o /usr/lib/gcc/i386-redhat-linux/4.1.2/../../../crtn.o
---------------------------END OUTPUT---------------------------------



Below is the GCC command I run on System B followed by the verbose
output:
gcc -v -Wall -DOFFLINE_WEIGHTS -DDOUBLEP -g bfbenchmark_threaded.c -lm
-lrt -lpthread -O3 -o bfbenchmark_threaded

---------------------------BEGIN OUTPUT---------------------------------
Using built-in specs.
Target: i386-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man
--infodir=/usr/share/info
--with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-bootstrap
--enable-shared --enable-threads=posix --enable-checking=release
--with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions
--enable-languages=c,c++,objc,obj-c++,java,fortran,ada
--enable-java-awt=gtk --disable-dssi --enable-plugin
--with-java-home=/usr/lib/jvm/java-1.5.0-gcj-1.5.0.0/jre
--enable-libgcj-multifile --enable-java-maintainer-mode
--with-ecj-jar=/usr/share/java/eclipse-ecj.jar
--disable-libjava-multilib --with-cpu=generic --build=i386-redhat-linux
Thread model: posix
gcc version 4.3.0 20080428 (Red Hat 4.3.0-8) (GCC) 
COLLECT_GCC_OPTIONS='-v' '-Wall' '-DOFFLINE_WEIGHTS' '-DDOUBLEP' '-g'
'-O3' '-o' 'bfbenchmark_threaded' '-mtune=generic'
 /usr/libexec/gcc/i386-redhat-linux/4.3.0/cc1 -quiet -v
-DOFFLINE_WEIGHTS -DDOUBLEP bfbenchmark_threaded.c -quiet -dumpbase
bfbenchmark_threaded.c -mtune=generic -auxbase bfbenchmark_threaded -g
-O3 -Wall -version -o /tmp/ccB4B5PI.s
ignoring nonexistent directory
"/usr/lib/gcc/i386-redhat-linux/4.3.0/include-fixed"
ignoring nonexistent directory
"/usr/lib/gcc/i386-redhat-linux/4.3.0/../../../../i386-redhat-linux/include"
#include "..." search starts here:
#include <...> search starts here:
 /usr/local/include
 /usr/lib/gcc/i386-redhat-linux/4.3.0/include
 /usr/include
End of search list.
GNU C (GCC) version 4.3.0 20080428 (Red Hat 4.3.0-8) (i386-redhat-linux)
	compiled by GNU C version 4.3.0 20080428 (Red Hat 4.3.0-8), GMP version
4.2.2, MPFR version 2.3.0-p2.
GGC heuristics: --param ggc-min-expand=100 --param
ggc-min-heapsize=131072
Compiler executable checksum: a6100d27c113f078654b8bcf6e8eb1d2
COLLECT_GCC_OPTIONS='-v' '-Wall' '-DOFFLINE_WEIGHTS' '-DDOUBLEP' '-g'
'-O3' '-o' 'bfbenchmark_threaded' '-mtune=generic'
 as -V -Qy -o /tmp/ccoiU9Dv.o /tmp/ccB4B5PI.s
GNU assembler version 2.18.50.0.6 (i386-redhat-linux) using BFD version
version 2.18.50.0.6-2 20080403
COMPILER_PATH=/usr/libexec/gcc/i386-redhat-linux/4.3.0/:/usr/libexec/gcc/i386-redhat-linux/4.3.0/:/usr/libexec/gcc/i386-redhat-linux/:/usr/lib/gcc/i386-redhat-linux/4.3.0/:/usr/lib/gcc/i386-redhat-linux/:/usr/libexec/gcc/i386-redhat-linux/4.3.0/:/usr/libexec/gcc/i386-redhat-linux/:/usr/lib/gcc/i386-redhat-linux/4.3.0/:/usr/lib/gcc/i386-redhat-linux/
LIBRARY_PATH=/usr/lib/gcc/i386-redhat-linux/4.3.0/:/usr/lib/gcc/i386-redhat-linux/4.3.0/:/usr/lib/gcc/i386-redhat-linux/4.3.0/../../../:/lib/:/usr/lib/
COLLECT_GCC_OPTIONS='-v' '-Wall' '-DOFFLINE_WEIGHTS' '-DDOUBLEP' '-g'
'-O3' '-o' 'bfbenchmark_threaded' '-mtune=generic'
 /usr/libexec/gcc/i386-redhat-linux/4.3.0/collect2 --eh-frame-hdr
--build-id -m elf_i386 --hash-style=gnu
-dynamic-linker /lib/ld-linux.so.2 -o
bfbenchmark_threaded /usr/lib/gcc/i386-redhat-linux/4.3.0/../../../crt1.o /usr/lib/gcc/i386-redhat-linux/4.3.0/../../../crti.o /usr/lib/gcc/i386-redhat-linux/4.3.0/crtbegin.o -L/usr/lib/gcc/i386-redhat-linux/4.3.0 -L/usr/lib/gcc/i386-redhat-linux/4.3.0 -L/usr/lib/gcc/i386-redhat-linux/4.3.0/../../.. /tmp/ccoiU9Dv.o -lm -lrt -lpthread -lgcc --as-needed -lgcc_s --no-as-needed -lc -lgcc --as-needed -lgcc_s --no-as-needed /usr/lib/gcc/i386-redhat-linux/4.3.0/crtend.o /usr/lib/gcc/i386-redhat-linux/4.3.0/../../../crtn.o

---------------------------END OUTPUT---------------------------------

Thanks in advance for your help,

Corey


* Re: Better performance on older version of GCC
  2010-08-27 13:50 Better performance on older version of GCC Corey Kasten
@ 2010-08-27 14:20 ` H.J. Lu
  2010-08-27 14:23 ` Nathan Froyd
  1 sibling, 0 replies; 9+ messages in thread
From: H.J. Lu @ 2010-08-27 14:20 UTC (permalink / raw)
  To: Corey Kasten; +Cc: gcc

On Fri, Aug 27, 2010 at 6:44 AM, Corey Kasten
<corey@materialintelligencellc.com> wrote:
> Hello all,
>
> I have two computers with two different versions of GCC. Otherwise the
> two systems have identical hardware. I have a processor and memory
> intensive benchmark program which I compile on both systems and I cannot
> understand why the system with older GCC version compiles faster code.
>
> System A has GCC version "4.1.2 20070925 (Red Hat 4.1.2-33)"
> System B has GCC version "4.3.0 20080428 (Red Hat 4.3.0-8)"
>
> I find that the executable compiled on system A runs faster (on both
> systems) than the executable compiled on system B (on both systems), by
> a factor of approximately 4. I have attempted to play with the
> GCC optimizer flags and have not been able to get System B (with the
> later GCC version) to compile code with any better performance. Could
> someone please help figure this out?
>

Can you try gcc 4.5.1?

-- 
H.J.


* Re: Better performance on older version of GCC
  2010-08-27 13:50 Better performance on older version of GCC Corey Kasten
  2010-08-27 14:20 ` H.J. Lu
@ 2010-08-27 14:23 ` Nathan Froyd
  2010-08-27 15:03   ` Corey Kasten
  1 sibling, 1 reply; 9+ messages in thread
From: Nathan Froyd @ 2010-08-27 14:23 UTC (permalink / raw)
  To: Corey Kasten; +Cc: gcc

On Fri, Aug 27, 2010 at 09:44:25AM -0400, Corey Kasten wrote:
> I find that the executable compiled on system A runs faster (on both
> systems) than the executable compiled on system B (on both systems), by
> a factor of approximately 4. I have attempted to play with the
> GCC optimizer flags and have not been able to get System B (with the
> later GCC version) to compile code with any better performance. Could
> someone please help figure this out?

It's almost impossible to tell what's going on without an actual
testcase.  You might not be able to provide the actual code, but you
could try distilling it down to something you could release.

-Nathan


* Re: Better performance on older version of GCC
  2010-08-27 14:23 ` Nathan Froyd
@ 2010-08-27 15:03   ` Corey Kasten
  2010-08-27 15:40     ` Richard Guenther
  0 siblings, 1 reply; 9+ messages in thread
From: Corey Kasten @ 2010-08-27 15:03 UTC (permalink / raw)
  To: Nathan Froyd; +Cc: gcc

[-- Attachment #1: Type: text/plain, Size: 1113 bytes --]

On Fri, 2010-08-27 at 06:50 -0700, Nathan Froyd wrote:
> On Fri, Aug 27, 2010 at 09:44:25AM -0400, Corey Kasten wrote:
> > I find that the executable compiled on system A runs faster (on both
> > systems) than the executable compiled on system B (on both systems), by
> > a factor of approximately 4. I have attempted to play with the
> > GCC optimizer flags and have not been able to get System B (with the
> > later GCC version) to compile code with any better performance. Could
> > someone please help figure this out?
> 
> It's almost impossible to tell what's going on without an actual
> testcase.  You might not be able to provide the actual code, but you
> could try distilling it down to something you could release.
> 
> -Nathan

Thanks for the reply Nathan.

I have attached an archive with the test case code. The code is built by
build.sh and outputs the number of microseconds to complete the
processing.
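
For readers without the attachment, a timing harness of this kind might
look roughly like the sketch below. It is not the code from the archive:
`mac_kernel`, `elapsed_us`, and `time_kernel_us` are hypothetical names,
and the complex multiply-accumulate loop is only an assumption about the
sort of work the benchmark does.

```c
#define _POSIX_C_SOURCE 199309L   /* expose clock_gettime under strict -std modes */
#include <assert.h>
#include <complex.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical kernel: complex multiply-accumulate over two arrays. */
double complex mac_kernel(const double complex *w,
                          const double complex *x, size_t n)
{
    double complex acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += w[i] * x[i];       /* the complex multiply is the hot operation */
    return acc;
}

/* Whole microseconds between two timespec readings (end must not precede start). */
long long elapsed_us(const struct timespec *start, const struct timespec *end)
{
    long long ns = (long long)(end->tv_sec - start->tv_sec) * 1000000000LL
                 + (end->tv_nsec - start->tv_nsec);
    return ns / 1000;
}

/* Run the kernel `reps` times and return the elapsed time in microseconds.
   The accumulated result is written to *out so the work cannot be
   optimized away. */
long long time_kernel_us(const double complex *w, const double complex *x,
                         size_t n, int reps, double complex *out)
{
    struct timespec t0, t1;
    double complex acc = 0.0;

    assert(reps > 0);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int r = 0; r < reps; r++)
        acc += mac_kernel(w, x, n);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    *out = acc;
    return elapsed_us(&t0, &t1);
}
```

On older glibc versions clock_gettime lives in librt, which would be
consistent with the -lrt on the command line in my first message.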

Compiling with GCC version "4.1.2 20070925 (Red Hat 4.1.2-33)" produces
code that runs in about 66% of the time taken by code from GCC version
"4.3.0 20080428 (Red Hat 4.3.0-8)".

Thanks

Corey

[-- Attachment #2: testbenchmark.100827.1050.tgz --]
[-- Type: application/x-compressed-tar, Size: 5022 bytes --]


* Re: Better performance on older version of GCC
  2010-08-27 15:03   ` Corey Kasten
@ 2010-08-27 15:40     ` Richard Guenther
  2010-08-27 16:29       ` Corey Kasten
  0 siblings, 1 reply; 9+ messages in thread
From: Richard Guenther @ 2010-08-27 15:40 UTC (permalink / raw)
  To: Corey Kasten; +Cc: Nathan Froyd, gcc

On Fri, Aug 27, 2010 at 5:02 PM, Corey Kasten
<corey@materialintelligencellc.com> wrote:
> Compiling with GCC version "4.1.2 20070925 (Red Hat 4.1.2-33)" produces
> code that runs in about 66% of the time taken by code from GCC version
> "4.3.0 20080428 (Red Hat 4.3.0-8)".

-fcx-limited-range or -fcx-fortran-rules.  4.3 now is more conforming than 4.1.
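
A minimal sketch of what "more conforming" means here, assuming a GCC
or Clang build with default flags (no -fcx-limited-range and no
-ffast-math). With -fcx-limited-range the compiler may skip the check
for a NaN + I*NaN result and use the textbook formula alone. The names
`make_complex`, `naive_mul`, and `builtin_recovers_infinity` are
hypothetical and not part of the test case.

```c
#include <assert.h>
#include <complex.h>
#include <math.h>
#include <string.h>

/* Build re + im*i directly; writing `re + im * I` can turn an infinite
   part into NaN because I is itself a complex value. */
static double complex make_complex(double re, double im)
{
    double parts[2] = { re, im };
    double complex z;
    assert(sizeof z == sizeof parts);  /* complex double is laid out as double[2] */
    memcpy(&z, parts, sizeof z);
    return z;
}

/* Textbook formula (a+bi)(c+di) = (ac-bd) + (ad+bc)i, with no NaN recovery.
   This is the kind of inline sequence -fcx-limited-range permits. */
double complex naive_mul(double complex x, double complex y)
{
    double a = creal(x), b = cimag(x);
    double c = creal(y), d = cimag(y);
    return make_complex(a * c - b * d, a * d + b * c);
}

/* Returns 1 if the built-in `*` yields an infinite result for
   (1 + 0i) * (inf + inf*i), where the textbook formula yields NaN + NaN*i. */
int builtin_recovers_infinity(void)
{
    /* volatile keeps the compiler from folding the products at compile time */
    volatile double one = 1.0, zero = 0.0, inf = INFINITY;
    double complex x = make_complex(one, zero);
    double complex y = make_complex(inf, inf);

    double complex naive = naive_mul(x, y);
    double complex full = x * y;

    /* 0 * inf is NaN, so both parts of the naive product are NaN */
    assert(isnan(creal(naive)) && isnan(cimag(naive)));
    return isinf(creal(full)) || isinf(cimag(full));
}
```

Built with default flags, builtin_recovers_infinity() is expected to
return 1. The extra work needed to make that happen is what the inline
sequence avoids.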

Richard.

> Thanks
>
> Corey
>


* Re: Better performance on older version of GCC
  2010-08-27 15:40     ` Richard Guenther
@ 2010-08-27 16:29       ` Corey Kasten
  2010-08-28  1:06         ` Xinliang David Li
  0 siblings, 1 reply; 9+ messages in thread
From: Corey Kasten @ 2010-08-27 16:29 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Nathan Froyd, gcc

On Fri, 2010-08-27 at 17:09 +0200, Richard Guenther wrote:
> -fcx-limited-range or -fcx-fortran-rules.  4.3 now is more conforming than 4.1.
> 
> Richard.
> 
> > Thanks
> >
> > Corey
> >

Richard,

-fcx-limited-range worked great on both my real benchmark and my test
archive. GCC didn't recognize -fcx-fortran-rules, but obviously I don't
need it.

Thanks so much,
Corey

  


* Re: Better performance on older version of GCC
  2010-08-27 16:29       ` Corey Kasten
@ 2010-08-28  1:06         ` Xinliang David Li
  2010-08-28  8:23           ` Andrew Pinski
  0 siblings, 1 reply; 9+ messages in thread
From: Xinliang David Li @ 2010-08-28  1:06 UTC (permalink / raw)
  To: Corey Kasten; +Cc: Richard Guenther, Nathan Froyd, gcc

Briefly looked at it -- the trunk gcc also regresses a lot compared to
the binary you attached. (To match your binary, also added
-mfpmath=387 -m32 options)

Two problems:

1) more register spills in the trunk version -- the old compiler seems
more effective in using fp stack registers;
2) the complex multiplication -- the old version emits an inline sequence
while the trunk version emits a call to the __muldc3 intrinsic.
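
To illustrate why the call costs more than the inline sequence, here is
a sketch of a multiplication that follows the structure of the example
in C99 Annex G.5.1: the textbook formula first, then a recovery path
taken only when both parts come out as NaN. `annex_g_style_mul`,
`make_complex`, `nan_to_zero`, and `box_infinity` are hypothetical
names; this illustrates the technique and is not libgcc's actual source.

```c
#include <assert.h>
#include <complex.h>
#include <math.h>
#include <string.h>

/* Build re + im*i directly; `re + im * I` can turn an infinite part into NaN. */
static double complex make_complex(double re, double im)
{
    double parts[2] = { re, im };
    double complex z;
    assert(sizeof z == sizeof parts);  /* complex double is laid out as double[2] */
    memcpy(&z, parts, sizeof z);
    return z;
}

/* Replace a NaN by a zero of the same sign. */
static double nan_to_zero(double v)
{
    return isnan(v) ? copysign(0.0, v) : v;
}

/* "Box" an infinite operand: infinite parts become +/-1, other parts +/-0. */
static void box_infinity(double *re, double *im)
{
    *re = copysign(isinf(*re) ? 1.0 : 0.0, *re);
    *im = copysign(isinf(*im) ? 1.0 : 0.0, *im);
}

double complex annex_g_style_mul(double complex z, double complex w)
{
    double a = creal(z), b = cimag(z);
    double c = creal(w), d = cimag(w);
    double ac = a * c, bd = b * d, ad = a * d, bc = b * c;
    double x = ac - bd, y = ad + bc;   /* fast path: the inline formula */

    if (isnan(x) && isnan(y)) {        /* slow path: try to recover an infinity */
        int recalc = 0;
        if (isinf(a) || isinf(b)) {    /* z is infinite */
            box_infinity(&a, &b);
            c = nan_to_zero(c);
            d = nan_to_zero(d);
            recalc = 1;
        }
        if (isinf(c) || isinf(d)) {    /* w is infinite */
            box_infinity(&c, &d);
            a = nan_to_zero(a);
            b = nan_to_zero(b);
            recalc = 1;
        }
        if (!recalc && (isinf(ac) || isinf(bd) || isinf(ad) || isinf(bc))) {
            /* an intermediate product overflowed */
            a = nan_to_zero(a);
            b = nan_to_zero(b);
            c = nan_to_zero(c);
            d = nan_to_zero(d);
            recalc = 1;
        }
        if (recalc) {
            x = INFINITY * (a * c - b * d);
            y = INFINITY * (a * d + b * c);
        }
    }
    return make_complex(x, y);
}
```

For finite inputs only the first four multiplies and two add/subtracts
run, but the function call and the NaN test are paid on every
multiplication.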

You can probably file a bug report on this.

Thanks,

David



* Re: Better performance on older version of GCC
  2010-08-28  1:06         ` Xinliang David Li
@ 2010-08-28  8:23           ` Andrew Pinski
  2010-08-28 10:08             ` Xinliang David Li
  0 siblings, 1 reply; 9+ messages in thread
From: Andrew Pinski @ 2010-08-28  8:23 UTC (permalink / raw)
  To: Xinliang David Li; +Cc: Corey Kasten, Richard Guenther, Nathan Froyd, gcc

On Fri, Aug 27, 2010 at 5:12 PM, Xinliang David Li <davidxl@google.com> wrote:
> Briefly looked at it -- the trunk gcc also regresses a lot compared to
> the binary you attached. (To match your binary, also added
> -mfpmath=387 -m32 options)
>
> Two problems:
>
> 1) more register spills in the trunk version -- the old compiler seems
> more effective in using fp stack registers;
> 2) the complex multiplication -- the old version emits an inline sequence
> while the trunk version emits a call to the __muldc3 intrinsic.

Neither of these seems like a real, reportable bug.  The first one
is due to -fexcess-precision=standard being the default in 4.5 and
above (see PR 323).  The second one is due to -fcx-limited-range no
longer being the default (I cannot remember the number of the bug that
changed that, though).

Thanks,
Andrew Pinski


* Re: Better performance on older version of GCC
  2010-08-28  8:23           ` Andrew Pinski
@ 2010-08-28 10:08             ` Xinliang David Li
  0 siblings, 0 replies; 9+ messages in thread
From: Xinliang David Li @ 2010-08-28 10:08 UTC (permalink / raw)
  To: Andrew Pinski; +Cc: Corey Kasten, Richard Guenther, Nathan Froyd, gcc

Right -- I missed Richard's previous email regarding the options.

Thanks,

David


