public inbox for gcc@gcc.gnu.org
* Bernd's reload patch installed
@ 1998-10-28  1:58 Jeffrey A Law
  1998-10-29  0:50 ` Ben Cheese Was: " Robert Lipe
  0 siblings, 1 reply; 8+ messages in thread
From: Jeffrey A Law @ 1998-10-28  1:58 UTC (permalink / raw)
  To: egcs

I've just installed Bernd's last patch to implement localized spilling in
reload.

This is a major design change for the reload pass that should be a noticeable
code generation improvement for targets with limited register sets.  See the
web page for a few more details about what Bernd's patches do.

The egcs tree has been more unstable than I'd like for a while now, so I'd like
to slow down development for a short period of time so that we can stabilize
the tree and try to shake out bugs buried in the massive reload work done by
Bernd and Joern.

Note the slowdown is meant to affect the optimization and code generation
aspects of the compiler, not the front-ends or supporting code like cpp,
fixincludes, etc.


Bernd -- thanks for all the contributions!

jeff


* Ben Cheese Was: Bernd's reload patch installed
  1998-10-28  1:58 Bernd's reload patch installed Jeffrey A Law
@ 1998-10-29  0:50 ` Robert Lipe
  1998-10-30  5:07   ` EGCS performance ? Sergio Ruocco
  0 siblings, 1 reply; 8+ messages in thread
From: Robert Lipe @ 1998-10-29  0:50 UTC (permalink / raw)
  To: egcs

Jeffrey A Law wrote:

> This is a major design change for the reload pass that should be a noticeable
> code generation improvement for targets with limited register sets.  See the

This is pretty unscientific, but I thought I'd report it anyway.
Developed for a Digi internal troll, I have a script that runs around
various ia32 compilers within reach and runs dhrystone on them with
options that are as similar as is practical.  I won't promise that
this dhrystone is "pristine" so don't compare the absolute numbers
to anything else.  This is meant only to be a relative performance
indicator.  Arguments about dhrystone being obsolete or a piece of crap
are beyond the scope of the immediate discussion. ;-)

# GCC = egcs 1.1b

gcc -o dry -mpentiumpro -O2 -finline-functions  -funroll-all-loops -fexpensive-optimizations  -DTIMES dhry_1.c dhry_2.c
Microseconds for one run through Dhrystone:    1.3
Dhrystones per Second:                      781860.8

# This is EGCS as of this morning.   

/play/negcs/gcc/xgcc -B/play/negcs/gcc/ -o dry -mpentiumpro -O2 -finline-functions  -funroll-all-loops -fexpensive-optimizations  -DTIMES dhry_1.c dhry_2.c
Microseconds for one run through Dhrystone:    1.2
Dhrystones per Second:                      829875.5

# This is the UDK target as of Oct 22.   This number had been as close
# to identical to the one just above as it could be, and had been hovering
# slightly below the 1.1b numbers for some time.  I think this means
# we can attribute the difference in these two numbers to something that's
# changed in the tree in the last six days.   

/play/tmp/7/gcc/xgcc -B/play/tmp/7/gcc/ -o dry -mpentiumpro -O2 -finline-functions  -funroll-all-loops -fexpensive-optimizations  -DTIMES dhry_1.c dhry_2.c
Microseconds for one run through Dhrystone:    1.3
Dhrystones per Second:                      774593.3

# This is the OpenServer native compiler.   No P6 optimization options 
# are available.

/bin/cc -o dry -belf -Kpentium -O2 -Khost -Kinline -Kloop_unroll -DTIMES dhry_1.c dhry_2.c
dhry_1.c:
dhry_2.c:
Microseconds for one run through Dhrystone:    1.0
Dhrystones per Second:                      974658.9

# This is the SVR5 compiler from SCO.
/udk/usr/ccs/bin/cc -o dry -Kpentium_pro -O2 -Khost -Kinline -Kloop_unroll -DTIMES dhry_1.c dhry_2.c
dhry_1.c:
dhry_2.c:
Microseconds for one run through Dhrystone:    0.9
Dhrystones per Second:                      1086956.5

# This is Intel's Optimizing Compiler 2.1.4.   It actually will do 
# interprocedural analysis across source files (!) when handed multiple 
# files on a single invocation like this.
icc -o dry -belf -ip -mem -O1 -pad -tp p6 -DTIMES dhry_1.c dhry_2.c
dhry_1.c:
dhry_2.c:
Microseconds for one run through Dhrystone:    0.6
Dhrystones per Second:                      1582278.5

# Same compiler without the i/p analysis.   It's about neck and neck with
# SCO's tools in "real world" use.   (Do you compile ImageMagick all on one
# command line?    Didn't think so. :-) 
icc -o dry -belf -mem -O1 -pad -tp p6 -DTIMES dhry_1.c dhry_2.c
dhry_1.c:
dhry_2.c:
Microseconds for one run through Dhrystone:    0.9
Dhrystones per Second:                      1088139.2
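
Since these figures are only meant as a relative indicator, it can help to
normalize them against one compiler. The following is a minimal sketch, not
part of the original script: the dictionary labels and the function name
`relative_to` are made up for illustration, and the scores are copied from
the runs above.

```python
# Dhrystones-per-second figures copied from the runs above.
# The labels are illustrative shorthand, not official product names.
SCORES = {
    "egcs 1.1b": 781860.8,
    "egcs snapshot (this morning)": 829875.5,
    "egcs UDK target (Oct 22)": 774593.3,
    "OpenServer cc": 974658.9,
    "SCO SVR5 cc": 1086956.5,
    "icc 2.1.4 (-ip)": 1582278.5,
    "icc 2.1.4 (no -ip)": 1088139.2,
}


def relative_to(baseline, scores=SCORES):
    """Return each score divided by the baseline compiler's score."""
    base = scores[baseline]
    return {name: value / base for name, value in scores.items()}


if __name__ == "__main__":
    ratios = relative_to("egcs 1.1b")
    # Print slowest first, fastest last.
    for name, ratio in sorted(ratios.items(), key=lambda kv: kv[1]):
        print(f"{name:32s} {ratio:5.2f}x")
```

Any other entry can be used as the baseline, for example
`relative_to("SCO SVR5 cc")`.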




> Bernd -- thanks for all the contributions!

Indeed.   Thanx to everyone.

RJL


* EGCS performance ?
  1998-10-29  0:50 ` Ben Cheese Was: " Robert Lipe
@ 1998-10-30  5:07   ` Sergio Ruocco
  1998-10-30 15:21     ` David Edelsohn
                       ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Sergio Ruocco @ 1998-10-30  5:07 UTC (permalink / raw)
  To: Robert Lipe

 RL> to anything else.  This is meant only to be a relative performance
 RL> indicator.  Arguments about dhrystone being obsolete or a piece of crap
 RL> are beyond the scope of the immediate discussion. ;-)
 
Ok, I agree that Dhry. is a quasi-meaningless test, and as far as the
topic of my posting is concerned, performance is only a component of
the overall quality of a compiler, however...
 
 RL> # GCC = egcs 1.1b                              781860.8
 RL> # This is EGCS as of this morning.             829875.5
 RL> # This is the SVR5 compiler from SCO.         1086956.5
 RL> # This is Intel's Optimizing Compiler 2.1.4.  1582278.5
 RL> # Same compiler without the i/p analysis.     1088139.2

...I was surprised to see a large performance gulf (?) in such a
simple test among EGCS and other "proprietary" compilers, and I'm
wondering what is causing these gaps:

- better intermediate optimizations (see the 50% improvement due
  only to i/p analysis)
...
- dedicated x86 schedulers
- embedded-auto-recognition of Dhrystone benchmark code... :-)

If someone could test with a better benchmark EGCS vs. other
compilers on different (RISC) architectures we could determine whether this
performance gap still exists, and how much is due to x86 scheduling
arcana (if they come on par) vs to machine-independent optimizations
(if the 30-200% performance gulf stays).
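
The percentages quoted above can be checked against the reported scores. This
is a rough sketch under stated assumptions: the helper `percent_faster` and
the constant names are hypothetical, and the numbers are the Dhrystones per
second values from Robert Lipe's message.

```python
# Scores (Dhrystones per second) taken from the quoted results.
EGCS_SNAPSHOT = 829875.5
SVR5_CC = 1086956.5
ICC_IP = 1582278.5
ICC_NO_IP = 1088139.2


def percent_faster(fast, slow):
    """Percentage by which score `fast` exceeds score `slow`."""
    return (fast / slow - 1.0) * 100.0


if __name__ == "__main__":
    print(f"icc -ip vs icc without -ip: {percent_faster(ICC_IP, ICC_NO_IP):.1f}%")
    print(f"SVR5 cc vs egcs snapshot:   {percent_faster(SVR5_CC, EGCS_SNAPSHOT):.1f}%")
    print(f"icc -ip vs egcs snapshot:   {percent_faster(ICC_IP, EGCS_SNAPSHOT):.1f}%")
```

On these figures the gain from i/p analysis works out to about 45%, and the
gap between the current egcs snapshot and the other compilers runs from
about 31% to about 91%.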

	Sergio Ruocco - ruoccos@comm2000.it



* Re: EGCS performance ?
  1998-10-30  5:07   ` EGCS performance ? Sergio Ruocco
  1998-10-30 15:21     ` David Edelsohn
@ 1998-10-30 15:21     ` Joe Buck
  1998-10-30 19:14     ` Robert Lipe
  1998-10-31  5:02     ` Jan Hubicka
  3 siblings, 0 replies; 8+ messages in thread
From: Joe Buck @ 1998-10-30 15:21 UTC (permalink / raw)
  To: Sergio Ruocco; +Cc: egcs

> Ok, I agree that Dhry. is a quasi-meaningless test, and as far as the
> topic of my posting is concerned, performance is only a component of
> the overall quality of a compiler, however...
>  
>  RL> # GCC = egcs 1.1b                              781860.8
>  RL> # This is EGCS as of this morning.             829875.5
>  RL> # This is the SVR5 compiler from SCO.         1086956.5
>  RL> # This is Intel's Optimizing Compiler 2.1.4.  1582278.5
>  RL> # Same compiler without the i/p analysis.     1088139.2
> 
> ...I was surprised to see a large performance gulf (?) in such a
> simple test among EGCS and other "proprietary" compilers, and I'm
> wondering what is causing these gaps:

Some compiler vendors work very hard at optimizations specific for
Dhrystone, which isn't hard to do -- this is one reason why it's
such a misleading benchmark.  The gcc team, in the past, was opposed
to this kind of cheating so I believe that some patches designed to
specifically improve Dhrystone have been rejected.

> - embedded-auto-recognition of Dhrystone benchmark code... :-)

Well, the cheating isn't quite *that* blatant.  It's more that the
vendors choose transformations that will help their Dhrystone score
even if it doesn't help any other programs.


* Re: EGCS performance ?
  1998-10-30  5:07   ` EGCS performance ? Sergio Ruocco
@ 1998-10-30 15:21     ` David Edelsohn
  1998-10-31 16:39       ` Joe Buck
  1998-10-30 15:21     ` Joe Buck
                       ` (2 subsequent siblings)
  3 siblings, 1 reply; 8+ messages in thread
From: David Edelsohn @ 1998-10-30 15:21 UTC (permalink / raw)
  To: Sergio Ruocco; +Cc: egcs

>>>>> "Sergio Ruocco" writes:


RL> # GCC = egcs 1.1b                              781860.8
RL> # This is EGCS as of this morning.             829875.5

RL> # This is Intel's Optimizing Compiler 2.1.4.  1582278.5
RL> # Same compiler without the i/p analysis.     1088139.2

Sergio> ...I was surprised to see a large performance gulf (?) in such a
Sergio> simple test among EGCS and other "proprietary" compilers, and I'm
Sergio> wondering what is causing these gaps:

Sergio> If someone could test with a better benchmark EGCS vs. other
Sergio> compilers on different (RISC) architectures we could determine whether this
Sergio> performance gap still exists, and how much is due to x86 scheduling
Sergio> arcana (if they come on par) vs to machine-independent optimizations
Sergio> (if the 30-200% performance gulf stays).

	I would not be too concerned about Intel's compiler gap.  Intel's
compiler is optimized to make standard benchmarks look good, not to
produce efficient code for representative user applications.  There are
significant optimization opportunities for benchmarks which will harm
performance on general user code.  I believe that GCC explicitly has a
policy of not implementing those types of optimizations.  Anybody who
chooses a compiler based on industry benchmarks as opposed to benchmarking
their particular task deserves what they get.

David


* Re: EGCS performance ?
  1998-10-30  5:07   ` EGCS performance ? Sergio Ruocco
  1998-10-30 15:21     ` David Edelsohn
  1998-10-30 15:21     ` Joe Buck
@ 1998-10-30 19:14     ` Robert Lipe
  1998-10-31  5:02     ` Jan Hubicka
  3 siblings, 0 replies; 8+ messages in thread
From: Robert Lipe @ 1998-10-30 19:14 UTC (permalink / raw)
  To: Sergio Ruocco, Robert Lipe

>  RL> # GCC = egcs 1.1b                              781860.8
>  RL> # This is EGCS as of this morning.             829875.5
>  RL> # This is the SVR5 compiler from SCO.         1086956.5
>  RL> # This is Intel's Optimizing Compiler 2.1.4.  1582278.5
>  RL> # Same compiler without the i/p analysis.     1088139.2
> 
> ...I was surprised to see a large performance gulf (?) in such a
> simple test among EGCS and other "proprietary" compilers, and I'm
> wondering what is causing these gaps:

That "gulf" has actually been closing in recent months.  I don't think that
anyone has reasonably claimed that EGCS/GCC generates the best code in the
industry for ia32 systems.

> - better intermediate optimizations (see the 50% improvement due
>   only to i/p analysis)
> ...
> - dedicated x86 schedulers
> - embedded-auto-recognition of Dhrystone benchmark code... :-)

Yes, the -ip stuff is a little tacky and not at all representative of
how people use compilers.  Personally, I suspect option #2 is the right
one.  The others you cite all have P6 optimizations and are highly
optimized for a single target, instead of being primarily a portable
compiler and only secondarily a generator of highly target-specific
optimized code.  After all, we're only now starting to really see P5 and
P6 pipeline optimizations in EGCS, right?

These were also compilers produced by groups that had access to
"Appendix H", simulators, logic analyzers, vtune-like substances, and
other things that were long out of reach of the average GCC developer.

> If someone could test with a better benchmark EGCS vs. other
> compilers on different (RISC) architectures we could determine whether this
> performance gap still exists, and how much is due to x86 scheduling
> arcana (if they come on par) vs to machine-independent optimizations
> (if the 30-200% performance gulf stays).

I'll extend the offer that if someone feels "shorted" by these results
and has a benchmark that they'd like run on the entire suite for ia32,
I'll do it.  (It has to be written in a standard language.  I'm not
going to untangle a bunch of asms or __extension stuff.)

Dhrystone aside, I've run them against larger programs (xv, ImageMagick,
etc.) and, while they all have some strengths and some weaknesses, the
relative ranking cited above (icc, udk/svr5 cc, osr5 cc, egcs/gcc)
pretty well matches my experience.

RJL



* Re: EGCS performance ?
  1998-10-30  5:07   ` EGCS performance ? Sergio Ruocco
                       ` (2 preceding siblings ...)
  1998-10-30 19:14     ` Robert Lipe
@ 1998-10-31  5:02     ` Jan Hubicka
  3 siblings, 0 replies; 8+ messages in thread
From: Jan Hubicka @ 1998-10-31  5:02 UTC (permalink / raw)
  To: Sergio Ruocco; +Cc: Robert Lipe

> ...I was surprised to see a large performance gulf (?) in such a
> simple test among EGCS and other "proprietary" compilers, and I'm
> wondering what is causing these gaps:
> 
> - better intermediate optimizations (see the 50% improvement due
>   only to i/p analysis)
> ...
> - dedicated x86 schedulers
> - embedded-auto-recognition of Dhrystone benchmark code... :-)
I've compared the assembly output from the Intel optimizing compiler and egcs,
and my conclusion is that egcs is not so bad.  The main slowdown seems to be
caused by the fact that Dhrystone uses mostly the char type.  GCC is generally
very bad at retyping variables.  On i386 another important point is that gcc
can use only the low register halves to hold char values, so it runs into
higher register pressure, which results in more spills.
You've reported a speedup with the latest snapshot; IMO it is caused by the
local spilling patches, which improved the code under this register pressure.

Function calls play an important role in Dhrystone.  What IOC does
is ignore push and use moves instead, saving the subl after the call.
That generally doesn't seem to help (it just makes the code longer and worse),
but in this special case it should be an advantage.
> 
> If someone could test with a better benchmark EGCS vs. other
> compilers on different (RISC) architectures we could determine whether this
I would expect better results on other architectures, because they don't have
the register halves handicap.
> performance gap still exists, and how much is due to x86 scheduling
I've done some tests with scheduling and it doesn't seem to help.
There is not much room for it:
the functions are small and cannot be scheduled very well.
One of the things that helps on Pentium is to set alignment to 16, as IOC does
(gcc uses 8).
Surprisingly, enabling haifa and -fschedule-insns has the same effect, but that
is probably not because of scheduling but because the code is reordered so that
it fits better in the cache, or so.
So I can improve the Dhrystone test from 130 000 to 203 000 by enabling haifa,
alignment and -fschedule-insns, and improve it to 215 000 using my patches.
(The small speedup from my patches generally isn't caused by scheduling changes,
but by modified instruction selection.)
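
For a sense of scale, the three scores above can be turned into relative
speedups. This is only a sketch: the constant and function names are made up
for illustration, and the three numbers are the ones reported in this message.

```python
# Dhrystone scores reported above (units as given in the message).
BASELINE = 130_000        # before the changes
WITH_FLAGS = 203_000      # haifa, alignment and -fschedule-insns
WITH_PATCHES = 215_000    # the above plus the additional patches


def speedup_percent(new, old):
    """Percentage improvement of score `new` over score `old`."""
    return (new - old) / old * 100.0


if __name__ == "__main__":
    print(f"flags vs baseline:   {speedup_percent(WITH_FLAGS, BASELINE):.1f}%")
    print(f"patches vs baseline: {speedup_percent(WITH_PATCHES, BASELINE):.1f}%")
    print(f"patches vs flags:    {speedup_percent(WITH_PATCHES, WITH_FLAGS):.1f}%")
```

Most of the gain (roughly 56%) comes from the flags; the patches add roughly
another 6% on top of that.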
> arcana (if they come on par) vs to machine-independent optimizations
> (if the 30-200% performance gulf stays).

Honza
> 
> 	Sergio Ruocco - ruoccos@comm2000.it

-- 
------------------------------------------------------------------------------
                   Have you browsed my www pages? Look at:
                       http://www.paru.cas.cz/~hubicka
      Koules-the game for Svgalib,X11 and OS/2,  Xonix-the game for X11
      czech documentation for linux index, original 2D computer art and
              funny 100 years old photos and articles are there!


* Re: EGCS performance ?
  1998-10-30 15:21     ` David Edelsohn
@ 1998-10-31 16:39       ` Joe Buck
  0 siblings, 0 replies; 8+ messages in thread
From: Joe Buck @ 1998-10-31 16:39 UTC (permalink / raw)
  To: David Edelsohn; +Cc: ruoccos, egcs

David E. writes:

[ re: special optimizations to boost benchmark scores ]

> I believe that GCC explicitly has a
> policy of not implementing those types of optimizations.  Anybody who
> chooses a compiler based on industry benchmarks as opposed to benchmarking
> their particular task deserves what they get.

Well, this depends on the quality of the benchmarks.  SPEC 95 isn't bad,
because most of the benchmarks are actual, widely used programs or
portions of them.  Dhrystone, on the other hand, is so easy to optimize
for in ways that don't improve real programs that it should never be
used for comparisons: you are only comparing how much effort the compiler
maker put into improving Dhrystone and learn little about the quality
of the compiler.

(I do think that we should take a look, though, if egcs version N+1 gets
a worse Dhrystone mark than version N, since we shouldn't go backwards
without finding out why).
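
A check like that could be as simple as the sketch below. The function name
`regressed` and the 2% noise tolerance are assumptions made for illustration
and are not an existing egcs procedure; the example scores are the egcs
figures posted earlier in this thread.

```python
def regressed(previous, current, tolerance=0.02):
    """Return True if `current` is more than `tolerance` (a fraction,
    assumed here to be 2%) below `previous`."""
    return current < previous * (1.0 - tolerance)


if __name__ == "__main__":
    egcs_1_1b = 781860.8       # egcs 1.1b
    udk_oct22 = 774593.3       # tree as of Oct 22
    snapshot = 829875.5        # tree after the reload patch

    for label, old, new in [
        ("1.1b -> Oct 22", egcs_1_1b, udk_oct22),
        ("Oct 22 -> snapshot", udk_oct22, snapshot),
    ]:
        status = "REGRESSION, investigate" if regressed(old, new) else "ok"
        print(f"{label:20s} {status}")
```

The tolerance exists because timings at this resolution are noisy; a drop
smaller than the tolerance is treated as noise rather than a regression.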

> David
> 


