best optimization under IRIX ?

public inbox for gcc@gcc.gnu.org

* best optimization under IRIX ?
@ 2000-09-05 20:26 Matthias Kurz
  2000-09-05 23:36 ` Tim Prince
  2000-09-06  3:17 ` Erik Mouw
  0 siblings, 2 replies; 8+ messages in thread
From: Matthias Kurz @ 2000-09-05 20:26 UTC
  To: gcc

Hi.

What are the optimization options that give the maximum speed on
a O2000/R10000 ?
Currently the native "cc" with "-32 -Ofast=IP27" generates code that runs
twice as fast as "gcc -mabi=n32 -O3 -mips3 -r4000". I see neither -mips4
nor -r10000 in the man page.

   (mk)

-- 
Matthias Kurz; Fuldastr. 3; D-28199 Bremen; VOICE +49 421 53 600 47


* Re: best optimization under IRIX ?
  2000-09-05 20:26 best optimization under IRIX ? Matthias Kurz
@ 2000-09-05 23:36 ` Tim Prince
  2000-09-06  6:16   ` Matthias Kurz
  2000-09-06  3:17 ` Erik Mouw
  1 sibling, 1 reply; 8+ messages in thread
From: Tim Prince @ 2000-09-05 23:36 UTC
  To: Matthias Kurz, gcc

You must add -funroll-loops or -funroll-all-loops; I usually got about 70%
of MipsPro 7.3 Fortran speed from g77, and gcc should be closer than that to
the speed of MipsPro C.  -O3 does interprocedural analysis only in a forward
direction within one file, while -Ofast does it at link time.  Also, you
must perform many of the usual optimizations in source code as gcc is much
more faithful to your code.
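
As a rough illustration of the kind of source-level change meant here, the
sketch below unrolls a dot product by hand. The function names and the
unroll factor of four are assumptions made for this example, not something
taken from this thread, and it does not claim to reproduce what
-funroll-loops or MipsPro actually emit.

```c
#include <stddef.h>

/* Straightforward dot product: one multiply-add per loop iteration. */
double dot_plain(const double *a, const double *b, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}

/* Hand-unrolled by four with independent accumulators (an illustrative
 * choice). The summation order differs from dot_plain, so floating-point
 * results may differ in the last bits. */
double dot_unrolled(const double *a, const double *b, size_t n)
{
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;

    /* Main loop: four elements per iteration. */
    for (; i + 4 <= n; i += 4) {
        s0 += a[i]     * b[i];
        s1 += a[i + 1] * b[i + 1];
        s2 += a[i + 2] * b[i + 2];
        s3 += a[i + 3] * b[i + 3];
    }

    /* Remainder loop: the last n % 4 elements. */
    for (; i < n; i++)
        s0 += a[i] * b[i];

    return (s0 + s1) + (s2 + s3);
}
```

Whether this helps at all depends on the loop and the machine, so it is
worth timing both versions before keeping the longer one.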

----- Original Message -----
From: "Matthias Kurz" <mk@baerlap.north.de>
To: <gcc@gcc.gnu.org>
Sent: Tuesday, September 05, 2000 8:20 PM
Subject: best optimization under IRIX ?

>
> Hi.
>
> What are the optimization options that give the maximum speed on
> a O2000/R10000 ?
> Currently the native "cc" with "-32 -Ofast=IP27" generates code that runs
> twice as fast as "gcc -mabi=n32 -O3 -mips3 -r4000". I see neither -mips4
> nor -r10000 in the man page.
>
>
>    (mk)
>
> --
> Matthias Kurz; Fuldastr. 3; D-28199 Bremen; VOICE +49 421 53 600 47


* Re: best optimization under IRIX ?
  2000-09-05 20:26 best optimization under IRIX ? Matthias Kurz
  2000-09-05 23:36 ` Tim Prince
@ 2000-09-06  3:17 ` Erik Mouw
  2000-09-06  7:26   ` Matthias Kurz
  1 sibling, 1 reply; 8+ messages in thread
From: Erik Mouw @ 2000-09-06  3:17 UTC
  To: mk; +Cc: gcc

On Wed, 6 Sep 2000 05:20:36 +0200, Matthias Kurz wrote:
> What are the optimization options that give the maximum speed on
> a O2000/R10000 ?

I usually use (on an Onyx2/R12000):
-O2 -mabi=n32 -mips4 -mcpu=r8000 -funroll-loops -fomit-frame-pointer

> Currently the native "cc" with "-32 -Ofast=IP27" generates code that runs
> twice as fast as "gcc -mabi=n32 -O3 -mips3 -r4000". 

Depends a lot on the source code. I have examples where gcc-2.95.2
generates code that runs three to four times as fast as the native MipsPRO
C compiler.

> I see neither -mips4 nor -r10000 in the man page.

The manpage is not up to date. Read the info files.


Erik

-- 
J.A.K. (Erik) Mouw, Information and Communication Theory Group, Department
of Electrical Engineering, Faculty of Information Technology and Systems,
Delft University of Technology, PO BOX 5031,  2600 GA Delft, The Netherlands
Phone: +31-15-2783635  Fax: +31-15-2781843  Email: J.A.K.Mouw@its.tudelft.nl
WWW: http://www-ict.its.tudelft.nl/~erik/




* Re: best optimization under IRIX ?
  2000-09-05 23:36 ` Tim Prince
@ 2000-09-06  6:16   ` Matthias Kurz
  0 siblings, 0 replies; 8+ messages in thread
From: Matthias Kurz @ 2000-09-06  6:16 UTC
  To: Tim Prince; +Cc: gcc

On Tue, Sep 05, 2000 at 11:38:45PM -0700, Tim Prince wrote:
> You must add -funroll-loops or -funroll-all-loops;

"-funroll-all-loops" gives another 7% speedup. Well, I tried it only
once... "-funroll-loops" shows no effect currently.

>                                                    I usually got about 70%
> of MipsPro 7.3 Fortran speed from g77, and gcc should be closer than that to
> the speed of MipsPro C.  -O3 does interprocedural analysis only in a forward
> direction within one file, while -Ofast does it at link time.  Also, you
> must perform many of the usual optimizations in source code as gcc is much
> more faithful to your code.

Maybe. (Un-?)fortunately it's not my code :)
That does not mean that I could do it better, only that I won't even try
to mess with the algorithms/ordering.

> ----- Original Message -----
> From: "Matthias Kurz" <mk@baerlap.north.de>
> > What are the optimization options that give the maximum speed on
> > a O2000/R10000 ?
> > Currently the native "cc" with "-32 -Ofast=IP27" generates code that runs
> > twice as fast as "gcc -mabi=n32 -O3 -mips3 -r4000". I see neither -mips4
> > nor -r10000 in the man page.

I'm using -n32 with cc btw. - but that's covered by -Ofast anyway.
           ^

   (mk)

-- 
Matthias Kurz; Fuldastr. 3; D-28199 Bremen; VOICE +49 421 53 600 47


* Re: best optimization under IRIX ?
  2000-09-06  3:17 ` Erik Mouw
@ 2000-09-06  7:26   ` Matthias Kurz
  2000-09-06 20:28     ` Tim Prince
  0 siblings, 1 reply; 8+ messages in thread
From: Matthias Kurz @ 2000-09-06  7:26 UTC
  To: Erik Mouw; +Cc: gcc

On Wed, Sep 06, 2000 at 01:06:29PM +0000, Erik Mouw wrote:
> On Wed, 6 Sep 2000 05:20:36 +0200, Matthias Kurz wrote:
> > What are the optimization options that give the maximum speed on
> > a O2000/R10000 ?
> 
> I usually use (on an Onyx2/R12000):
> -O2 -mabi=n32 -mips4 -mcpu=r8000 -funroll-loops -fomit-frame-pointer

No effect. Well, it's like -funroll-all-loops, just in the other direction:
somewhat slower. Hmmm, I hope I'm not doing something silly.
I will have to double-check.
This is gcc-2.95.2; maybe I'll try the current CVS later. Any hints for
configure options ?

> > Currently the native "cc" with "-32 -Ofast=IP27" generates code that runs
> > twice as fast as "gcc -mabi=n32 -O3 -mips3 -r4000". 
> 
> Depends a lot on the source code. I have examples where gcc-2.95.2
> generates code that runs three to four times as fast as the native MipsPRO
> C compiler.

Are you sure that no dnetc was running when you tested the MipsPRO
results ? One should shoot them... Or a screen saver ?
The cc code runs twice as fast on a R12000, again (did not try gcc,
i guess it will be also twice as fast). But then, the R12000/300 has 8MB
cache, while the R10000/180 has only 1MB.
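
One common way to make such comparisons less sensitive to background load
is to time several runs and keep the fastest. The sketch below does that
with the standard C clock() function, which reports processor time used by
the program. The names best_of and sample_workload are hypothetical, and
the workload is only a stand-in for the real program.

```c
#include <time.h>

/* CPU seconds consumed by one call of fn, measured with clock(). */
static double cpu_seconds(void (*fn)(void))
{
    clock_t start = clock();
    fn();
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Fastest of `runs` timed calls; returns -1.0 if runs < 1. */
double best_of(void (*fn)(void), int runs)
{
    double best = -1.0;
    for (int r = 0; r < runs; r++) {
        double t = cpu_seconds(fn);
        if (best < 0.0 || t < best)
            best = t;
    }
    return best;
}

/* Hypothetical stand-in workload; replace with the code under test. */
void sample_workload(void)
{
    volatile double sum = 0.0;
    for (int i = 0; i < 1000000; i++)
        sum += i * 0.5;
}
```

Calling best_of(sample_workload, 5) for a binary built with each compiler
gives two numbers that can be compared directly.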

> > I see neither -mips4 nor -r10000 in the man page.
> 
> The manpage is not up to date. Read the info files.

And break my fingers ? :-) I'll try.

   (mk)

-- 
Matthias Kurz; Fuldastr. 3; D-28199 Bremen; VOICE +49 421 53 600 47


* Re: best optimization under IRIX ?
  2000-09-06  7:26   ` Matthias Kurz
@ 2000-09-06 20:28     ` Tim Prince
  2000-09-07  4:56       ` Matthias Kurz
  0 siblings, 1 reply; 8+ messages in thread
From: Tim Prince @ 2000-09-06 20:28 UTC
  To: Matthias Kurz, Erik Mouw; +Cc: gcc

----- Original Message -----
From: "Matthias Kurz" <mk@baerlap.north.de>
To: "Erik Mouw" <J.A.K.Mouw@its.tudelft.nl>
Cc: <gcc@gcc.gnu.org>
Sent: Wednesday, September 06, 2000 7:20 AM
Subject: Re: best optimization under IRIX ?


> The cc code runs twice as fast on a R12000, again (did not try gcc,
> i guess it will be also twice as fast). But then, the R12000/300 has 8MB
> cache, while the R10000/180 has only 1MB.

I saw some cases where gcc ran somewhat better relative to MipsPro cc on the
r12k than on the r10k, just enough so to support the claim that r12k had
corrected some bottlenecks.  8MB cache?  Your problem will have to sit a
long time on a single processor to take advantage of that.  The r12k box I
had access to had 4MB L2 cache, and my processes were interrupted far too
often to take advantage of that.  Funny thing, my laptop does as well on
Livermore Kernels as that r12k did.  What a difference a year or two can
make.



* Re: best optimization under IRIX ?
  2000-09-06 20:28     ` Tim Prince
@ 2000-09-07  4:56       ` Matthias Kurz
  2000-09-07  5:47         ` Matthias Kurz
  0 siblings, 1 reply; 8+ messages in thread
From: Matthias Kurz @ 2000-09-07  4:56 UTC
  To: Tim Prince; +Cc: Erik Mouw, gcc

On Wed, Sep 06, 2000 at 08:30:30PM -0700, Tim Prince wrote:
> 
> ----- Original Message -----
> From: "Matthias Kurz" <mk@baerlap.north.de>
> 
> > The cc code runs twice as fast on a R12000, again (did not try gcc,
> > i guess it will be also twice as fast). But then, the R12000/300 has 8MB
> > cache, while the R10000/180 has only 1MB.
> 
> I saw some cases where gcc ran somewhat better relative to MipsPro cc on the
> r12k than on the r10k, just enough so to support the claim that r12k had
> corrected some bottlenecks.  8MB cache?  Your problem will have to sit a
> long time on a single processor to take advantage of that.

My (real world) test was running on "one" processor of a 4 processor box
that was more than 99% idle. Ok, not quite real world.
What do you mean by "gcc ran somewhat better relative to cc on the r12k" ?
That the speed difference between gcc and cc was smaller on the r12k than
on the r10k ?
I tried the gcc code on the R12000, to see whether the bigger cache helps.
But, while the cc code ran 50% faster on the R12000, the gcc one only got
a speedup of 35%. That's _one_ program and _one_ data set. One will have
to run many different programs with different data to get a picture. I
was only here to ask for "best" configure/compile options. Maybe in the
same time i could have tried the most permutations :)
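
To put the two percentages side by side, the small sketch below computes
how a speed ratio between two compilers changes when both binaries move to
a faster machine. It assumes "50% faster" means 1.5 times the throughput,
and the function name ratio_after is made up for this example.

```c
/* ratio_before: how many times faster compiler A's code is than
 * compiler B's on the old machine (1.0 means equal speed).
 * speedup_a, speedup_b: fractional gain of each binary on the new
 * machine, e.g. 0.50 for "50% faster".
 * Returns the A-to-B speed ratio on the new machine. */
double ratio_after(double ratio_before, double speedup_a, double speedup_b)
{
    return ratio_before * (1.0 + speedup_a) / (1.0 + speedup_b);
}
```

With the figures above, ratio_after(1.0, 0.50, 0.35) is 1.5 / 1.35, or
about 1.11, so in this one test the gap between cc and gcc grew by roughly
11% on the R12000 rather than shrinking.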

>                                                             The r12k box I
> had access to had 4MB L2 cache, and my processes were interrupted far too
> often to take advantage of that.  Funny thing, my laptop does as well on
> Livermore Kernels as that r12k did.  What a difference a year or two can
> make.

Well, going by the numbers, it would take an 808MHz PIII with 256k cache
to catch the R12000. I'm currently using a 550MHz one. But one would have
to compare fully utilized boxes with the same number of processors.
That's another story.

   (mk)

-- 
Matthias Kurz; Fuldastr. 3; D-28199 Bremen; VOICE +49 421 53 600 47


* Re: best optimization under IRIX ?
  2000-09-07  4:56       ` Matthias Kurz
@ 2000-09-07  5:47         ` Matthias Kurz
  0 siblings, 0 replies; 8+ messages in thread
From: Matthias Kurz @ 2000-09-07  5:47 UTC
  To: Tim Prince; +Cc: Erik Mouw, gcc

On Thu, Sep 07, 2000 at 01:52:59PM +0200, Matthias Kurz wrote:
> same time i could have tried the most permutations :)
Combinations. And one has to know all options :-)

   (mk)

-- 
Matthias Kurz; Fuldastr. 3; D-28199 Bremen; VOICE +49 421 53 600 47
