public inbox for gcc@gcc.gnu.org
* Performance measurements
@ 1998-06-24  2:28 Martin Kahlert
  1998-06-24  8:51 ` David S. Miller
                   ` (6 more replies)
From: Martin Kahlert @ 1998-06-24  2:28 UTC
  To: egcs; +Cc: axp-list

Hi,
I tried to compare different compilers on my numerical code.
To do so, I extracted an FPU-intensive function and wrapped it
in a loop, measuring the total execution time.

I will provide the source and the Makefile at the end of this mail.

I work on a Linux 2.0.34 SMP Kernel (libc5). My hardware is a
dual Pentium Pro 200MHz system with 128MB RAM.

pgcc is the Portland Group compiler and
tcc is the free TenDRA compiler system
(from http://alph.dera.gov.uk/TenDRA/ ).

Here are my results (output of "make"):

Compiler                              Result
pgcc                                  90.11 MFLOPS
gcc-2.7.2.1                           95.46 MFLOPS
gcc-2.7.2.1 without -malign-double    23.52 MFLOPS
egcs-2.91.42                          69.96 MFLOPS
tcc                                   92.33 MFLOPS
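For reference, the MFLOPS figures come from a fixed operation count in m.c: each call to trafo() adds 11n to ops (each of the n output values costs 6 multiplications and 5 additions), and main() makes 10000 calls with n = N = 1024. Converting two of the results back to CPU time:

$$
\text{ops} = 10^{4} \times 11 \times 1024 = 1.1264 \times 10^{8}
$$

$$
t_{\text{pgcc}} = \frac{1.1264 \times 10^{8}}{90.11 \times 10^{6}\,\text{s}^{-1}} \approx 1.25\ \text{s},
\qquad
t_{\text{not aligned}} = \frac{1.1264 \times 10^{8}}{23.52 \times 10^{6}\,\text{s}^{-1}} \approx 4.79\ \text{s}
$$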

- With gcc-2.7.2.1, -malign-double makes a dramatic difference
  (95.46 vs. 23.52 MFLOPS, roughly a factor of 4). With egcs it
  does not change the result.

- egcs seems to produce worse code than gcc-2.7.2.1 here
  (69.96 vs. 95.46 MFLOPS).
- In my experience, pgcc produces very good code and is very
  reliable. Its code usually runs about 25% faster than egcs code.
- For tcc, the result depends strongly on the order in which the
  variables are declared: if the declarations int i,j; and
  double *wksph=wksp+n/2; are placed in front of the one for c[],
  the result drops to 24.70 MFLOPS.
  For all other compilers, the order makes little difference.
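Both the -malign-double effect and the tcc declaration-order effect are consistent with doubles that end up only 4-byte aligned (the 24.70 MFLOPS tcc figure is close to the 23.52 MFLOPS unaligned gcc figure), although the numbers alone do not prove it. A quick way to check is to compile a small probe with each compiler and flag set. This is a sketch under that assumption; the names pad_probe, double_member_offset and stack_array_misalignment are hypothetical and not part of m.c.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical probe struct: a double placed right after a char. */
struct pad_probe {
    char c;
    double d;
};

/* Offset of the double member.  On i386 Linux this is typically 4
   without -malign-double and 8 with it; on x86-64 it is 8. */
size_t double_member_offset(void)
{
    return offsetof(struct pad_probe, d);
}

/* Address of a local double array modulo 8, mimicking c[] in trafo().
   A result of 4 means the array is only 4-byte aligned, so every
   8-byte load from it crosses an 8-byte boundary (and occasionally
   a cache-line boundary). */
unsigned stack_array_misalignment(void)
{
    volatile double c[6] = { 0.0 };
    return (unsigned)((uintptr_t)c % 8u);
}
```

Calling both from a small main() and printing the results under each compiler and flag set would show whether the slow builds line up with 4-byte-aligned doubles.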

Could anybody comment on that?

If anyone is interested in the asm code that pgcc produces, I
can send it off-list (388 lines of assembly are too long for
the list).


For the axp list: it would be very kind if anybody could provide
some numbers for a Linux Alpha (e.g. 533 MHz) for comparison.

Thanks in advance,
Martin


Here is the test file m.c:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <math.h>

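#define N 1024

/* ISO C <math.h> does not define PI, so provide it here. */
#ifndef PI
#define PI 3.14159265358979323846
#endif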

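/* One level of a periodic 6-tap filter transform of a[0..n-1]:
   low-pass outputs go to wksp[n/2..n-1], high-pass outputs to
   wksp[0..n/2-1], then the result is copied back into a.
   Adds the floating-point operation count to *ops. */
void trafo(double *a,const int n,double *wksp,double *const ops)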
{
 double c[]={ 3.52262918857095333469e-02,
             -8.54412738820266443041e-02,
             -1.35011020010254584323e-01,
              4.59877502118491543470e-01,
              8.06891509311092547385e-01,
              3.32670552950082631938e-01 };
 int i,j;
 double *wksph=wksp+n/2;

 if(n<6)
    return;

 for(i=j=0;i<n/2-2;i++,j+=2)
    {wksph[i]= c[0]*a[j]+c[1]*a[j+1]
     +c[2]*a[j+2]+c[3]*a[j+3]
         +c[4]*a[j+4]+c[5]*a[j+5];
     wksp[i] = c[5]*a[j]-c[4]*a[j+1]
         +c[3]*a[j+2]-c[2]*a[j+3]
         +c[1]*a[j+4]-c[0]*a[j+5];
    }
 wksph[i]= c[0]*a[j]+c[1]*a[j+1]
     +c[2]*a[j+2]+c[3]*a[j+3]
     +c[4]*a[0]+c[5]*a[1];
 wksp[i] = c[5]*a[j]-c[4]*a[j+1]
     +c[3]*a[j+2]-c[2]*a[j+3]
     +c[1]*a[0]-c[0]*a[1];
 i++;j+=2;
 wksph[i]= c[0]*a[j]+c[1]*a[j+1]
     +c[2]*a[0]+c[3]*a[1]
     +c[4]*a[2]+c[5]*a[3];
 wksp[i] = c[5]*a[j]-c[4]*a[j+1]
     +c[3]*a[0]-c[2]*a[1]
     +c[1]*a[2]-c[0]*a[3];

 memcpy(a,wksp,sizeof(double)*n);
 (*ops)+=((double)11)*n;  /* 6 mul + 5 add per output value */
 return;
}

int main(void)
{
 double *x,h,ops=0;
 int i;
 clock_t start;

 if(!(x=malloc(2*N*sizeof(double))))
    {
     fputs("out of memory\n",stderr);
     return EXIT_FAILURE;
    }

 for(i=0;i<N;i++)
    {
     h=(double)i;
     x[i]=sin(h*PI/(double)N);
    }

 start=clock();
 for(i=0;i<10000;i++)
     trafo(x,N,x+N,&ops);
 h=(double)(clock()-start)/(double)CLOCKS_PER_SEC;
 printf("%.2f MFLOPS\n",1.0e-6*ops/h);

 free(x);
 return 0;
}

And here is the Makefile:

EXECUTABLES = m.pgcc m.egcs-2.91.42 m.gcc-2.7.2.1 m.not_aligned m.tcc
GCC_OPTS = -O3 -malign-double -fomit-frame-pointer -Wall -malign-loops=2 -malign-jumps=2 -malign-functions=2
.PHONY: all clean test
all: $(EXECUTABLES) test
clean:
	rm -f  $(EXECUTABLES)
test:
	@echo "pgcc:"
	@./m.pgcc
	@echo "gcc-2.7.2.1:"
	@./m.gcc-2.7.2.1
	@echo "gcc-without double align:"
	@./m.not_aligned
	@echo "egcs-2.91.42:"
	@./m.egcs-2.91.42
	@echo "tcc:"
	@./m.tcc
m.pgcc: m.c
	pgcc -O2 -tp p6 -Mnoframe -o $@ $< -lm
m.gcc-2.7.2.1: m.c
	/usr/bin/gcc $(GCC_OPTS) -o $@ $< -lm
m.not_aligned: m.c
	/usr/bin/gcc -O3 -fomit-frame-pointer -Wall -malign-loops=2 -malign-jumps=2 -malign-functions=2 -o $@ $< -lm
m.egcs-2.91.42: m.c
	/sw/egcs/bin/gcc $(GCC_OPTS) -o $@ $< -lm
m.tcc: m.c
	tcc -Ysystem -O2 -o $@ $< -lm
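For anyone checking results from other compilers: trafo() is one level of a periodic wavelet transform with the 6-tap Daubechies (D6) filter; the coefficients in c[] are the D6 scaling coefficients in reverse order. A compact version with modular indexing, instead of the two hand-unrolled wrap-around steps, makes that structure explicit. It is a hypothetical helper (trafo_ref and c6 are made-up names) meant for checking output, not for speed, and it leaves out the ops counter.

```c
#include <string.h>

/* D6 coefficients, copied from c[] in m.c (hypothetical name c6). */
static const double c6[6] = {  3.52262918857095333469e-02,
                              -8.54412738820266443041e-02,
                              -1.35011020010254584323e-01,
                               4.59877502118491543470e-01,
                               8.06891509311092547385e-01,
                               3.32670552950082631938e-01 };

/* Hypothetical reference version of trafo(): one level of the periodic
   D6 transform of a[0..n-1] for even n >= 6.  Indices wrap with % n.
   Output layout matches trafo(): low-pass in wksp[n/2..n-1],
   high-pass in wksp[0..n/2-1], then copied back into a. */
void trafo_ref(double *a, int n, double *wksp)
{
    int i, k;

    if (n < 6)
        return;

    for (i = 0; i < n / 2; i++) {
        double lo = 0.0, hi = 0.0;
        for (k = 0; k < 6; k++) {
            double v = a[(2 * i + k) % n];
            lo += c6[k] * v;
            /* high-pass tap k is (-1)^k * c6[5-k] */
            hi += ((k & 1) ? -c6[5 - k] : c6[5 - k]) * v;
        }
        wksp[n / 2 + i] = lo;
        wksp[i] = hi;
    }
    memcpy(a, wksp, sizeof(double) * n);
}
```

Because the transform is orthogonal, it preserves the sum of squares of a[], which is a cheap sanity check on any build's output; a constant input gives sqrt(2) in the upper half and (nearly) zero in the lower half.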

* Re: Performance measurements
@ 1998-06-24 21:23 N8TM
From: N8TM @ 1998-06-24 21:23 UTC
  To: martin.kahlert; +Cc: egcs

In a message dated 6/24/98 11:05:35 AM Pacific Daylight Time,
martin.kahlert@mchp.siemens.de writes:

> I will provide the source and the Makefile

I tested this with RH5.0 (with libc5 and binutils-2.9.1 installed) on a
single Pentium II/233 with 64 MB RAM.

egcs-19980517       93.09 MFLOPS
gcc-not-aligned     76.63 MFLOPS
gcc-2.8.1          123.78 MFLOPS

I don't normally see gcc-2.8.1 running so much faster than egcs snapshots.
These speeds and yours are consistent with peak results I have obtained on
Livermore Kernels with g77.

* Re: Performance measurements
@ 1998-06-25  3:09 Christian Iseli
From: Christian Iseli @ 1998-06-25  3:09 UTC
  To: martin.kahlert, davem; +Cc: egcs, axp-list

> I couldn't resist, with current CVS egcs sources, on a 300Mhz
> UltraSparc w/512K L2 cache running Linux:
> 
> ? ./fpubench
> 142.58 MFLOPS

Ah well, guess I couldn't resist either...
On my Alpha box (600 MHz Alpha, 2 MB L2 cache, Aspen Durango II)
running RH Linux 5.0 with egcs 1.0.3, I get 191.71 MFLOPS.

Cheers,
					Christian

* Re: Performance measurements
@ 1998-06-25  6:50 Brad M. Garcia
From: Brad M. Garcia @ 1998-06-25  6:50 UTC
  To: egcs

I ran Martin's test program on my machine.
This should help answer Gerald's question about egcs 1.0.3a.

Single PPro 200, 128MB ram, 2.0.34 kernel, glibc 2 (RH5.0).

gcc2.7.2.3:                 77.68 MFLOPS
egcs1.0.3a:                 64.00 MFLOPS
gcc2.7.2.3, not aligned:    23.86 MFLOPS
egcs1.0.3a, not aligned:    61.55 MFLOPS


Brad Garcia
   ___/  __ /  __ /  ___/ "Being the Linux of digital media
  __/   /  /  / _/  __/    would be a very good life."
_/    ____/ _/ _| ____/      - Jean-Louis Gassee, CEO of Be, Inc.


* Re: Performance measurements
@ 1998-06-27 15:52 John Wehle
From: John Wehle @ 1998-06-27 15:52 UTC
  To: law; +Cc: egcs

> Here's some more info.  PPro200
> 
> egcs-1.0.3	 69.96
> today's sources	 70.40
> 
> Note I get 73.14 if I remove all the various -malign switches.
> 
> Not particularly good.    Can someone look into this?  It might
> be another case of double alignment losing badly.  I don't know
> x86 issues well enough.

Part of the problem is due to the loop optimizer turning:

(insn 73 71 74 (set (reg:DF 49)
        (mem/s:DF (plus:SI (plus:SI (mult:SI (reg/v:SI 29)
                        (const_int 8))
                    (reg/v:SI 21))
                (const_int 16)))) 74 {movdf+1} (nil)
    (nil))

(insn 74 73 75 (set (reg:DF 50)
        (mult:DF (reg:DF 48)
            (reg:DF 49))) 360 {ffshi_1+1} (nil)
    (nil))

into:

(insn 73 69 74 (set (reg:DF 49)
        (mem/s:DF (reg:SI 104))) -1 (nil)
    (nil))

(insn 74 73 75 (set (reg:DF 50)
        (mult:DF (reg:DF 48)
            (reg:DF 49))) -1 (nil)
    (nil))

which global register allocation turns into:

(insn 295 298 74 (set (reg:SI 2 %ecx)
        (mem:SI (plus:SI (reg:SI 7 %esp)
                (const_int 16)))) -1 (nil)
    (nil))

(insn:HI 74 295 75 (set (reg:DF 9 %st(1))
        (mult:DF (reg:DF 9 %st(1))
            (mem/s:DF (reg:SI 2 %ecx)))) 360 {ffshi_1+1} (nil)
    (nil))

because it had to spill (reg:SI 104), since the Intel 386 is a
register-poor machine.  Defining DONT_REDUCE_ADDR when building
egcs results in:

(insn:HI 74 379 75 (set (reg:DF 9 %st(1))
        (mult:DF (reg:DF 9 %st(1))
            (mem/s:DF (plus:SI (plus:SI (mult:SI (reg/v:SI 0 %eax)
                            (const_int 8))
                        (reg:SI 2 %ecx))
                    (const_int 16))))) 360 {ffshi_1+1} (nil)
    (nil))

after global register allocation.  The corresponding benchmark
results on a 233 MHz Pentium II running FreeBSD 3.0 are:

egcs-19980621 aout: 87.91 MFLOPS
egcs-19980621 elf: 86.85 MFLOPS

egcs-19980621 DONT_REDUCE_ADDR aout: 105.24 MFLOPS
egcs-19980621 DONT_REDUCE_ADDR elf: 105.24 MFLOPS
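At the source level, the transformation described above can be pictured with two versions of a trivial loop over a[j+2]. This is a hypothetical illustration (the function names are made up); gcc performs the rewrite on RTL, and whether the extra pointer is actually spilled depends on the compiler and on register pressure in the real loop, where trafo() uses six such addresses (a[j] through a[j+5]).

```c
/* Form A: a[j+2] is addressed through the loop index.  On i386 this
   fits a single base + index*8 + 16 addressing mode, as in the
   DONT_REDUCE_ADDR RTL.  Assumes n >= 6, as trafo() does. */
double sum_indexed(const double *a, int n, double c)
{
    double s = 0.0;
    int i, j;
    for (i = j = 0; i < n / 2 - 2; i++, j += 2)
        s += c * a[j + 2];
    return s;
}

/* Form B: the same loop after strength reduction by hand.  The
   pointer p plays the role of the new pseudo (reg:SI 104): it holds
   &a[j+2] and advances by two elements per iteration.  It is one more
   value that must live in a register, and on a register-poor machine
   it may be spilled and reloaded from the stack on every use. */
double sum_strength_reduced(const double *a, int n, double c)
{
    double s = 0.0;
    const double *p = a + 2;
    int i;
    for (i = 0; i < n / 2 - 2; i++, p += 2)
        s += c * *p;
    return s;
}
```

Both compute the same value; the difference is only in how many live registers the loop needs.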

Possible solutions:

  1) Don't call find_mem_givs if SMALL_REGISTER_CLASSES.

  2) Don't consider the giv if SMALL_REGISTER_CLASSES and it's a valid
     memory address for the machine.

  3) Consider the giv but don't take an action which will result in
     a new register (i.e. more registers than before) if
     SMALL_REGISTER_CLASSES and the giv is a valid memory address
     for the machine.

BTW, I'm pulling these solutions out of thin air, as I'm not up to speed
on how the loop optimizer works.

-- John
-------------------------------------------------------------------------
|   Feith Systems  |   Voice: 1-215-646-8000  |  Email: john@feith.com  |
|    John Wehle    |     Fax: 1-215-540-5495  |                         |
-------------------------------------------------------------------------



Thread overview: 18+ messages
1998-06-24  2:28 Performance measurements Martin Kahlert
1998-06-24  8:51 ` David S. Miller
1998-06-24 10:08 ` Gerald Pfeifer
1998-06-25  3:09 ` Jeffrey A Law
     [not found] ` <3590D5AE.167EB0E7@iis.fhg.de>
     [not found]   ` <19980624124843.A15248@keksy.mchp.siemens.de>
     [not found]     ` <3591031A.2781E494@iis.fhg.de>
     [not found]       ` <19980624170051.21290@haegar.physiol.med.tu-muenchen.de>
1998-06-25  3:09         ` Performance measurements (thanks and conclusion) Martin Kahlert
1998-06-26  1:05 ` Performance measurements Aubert Pierre
1998-06-27  2:25   ` Jeffrey A Law
1998-06-29 22:34 ` Rask Ingemann Lambertsen
1998-07-01  3:42   ` Nicholas Lee
1998-07-01 21:20 ` Marc Lehmann
1998-07-02  7:14   ` Craig Burley
1998-07-02 22:44     ` Marc Lehmann
1998-07-03  7:20       ` Toon Moene
1998-07-02 15:15   ` Joern Rennecke
1998-06-24 21:23 N8TM
1998-06-25  3:09 Christian Iseli
1998-06-25  6:50 Brad M. Garcia
1998-06-27 15:52 John Wehle
