public inbox for gcc@gcc.gnu.org
* Re: ridiculous amounts of padding
  1999-01-31 23:58 ` Joern Rennecke
@ 1999-01-31 23:58   ` Jeffrey A Law
  1999-01-31 23:58     ` Peter Barada
  1999-01-31 23:58     ` Joern Rennecke
  0 siblings, 2 replies; 37+ messages in thread
From: Jeffrey A Law @ 1999-01-31 23:58 UTC (permalink / raw)
  To: Joern Rennecke; +Cc: Zack Weinberg, egcs

  In message < 199901121529.PAA26847@phal.cygnus.co.uk >you write:
  > This sounds like a hard property to verify, and it won't be true very
  > often overall.
  > A different approach would be to only do the alignment if it reduces the
  > total number of cache lines the data item ends up in.  We need support
  > from the assembler with a .align directive that supports a maximum
  > alignment padding.  If we name the cache line size as CLS (32 for the
  > above target), and the data item's size as SIZE, the max padding is:
  > (SIZE - 1) & (CLS - 1)
The point behind the string alignments is not to improve cache behavior, but to
provide the strings on aligned addresses for memcpy, strcpy and other routines
that will examine the alignment at compile or runtime and possibly select a
more efficient loop if the alignment of the operands is suitable.
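A minimal C sketch of the kind of run-time check such a routine makes (generic
code for illustration only, not any particular libc; the function name and the
choice of long as the word type are assumptions):

#include <stddef.h>
#include <stdint.h>

/* Illustrative only: copy word-at-a-time when both operands happen to be
   word aligned, otherwise fall back to a plain byte loop. */
void *copy_checking_alignment(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    if ((((uintptr_t) d | (uintptr_t) s) % sizeof (long)) == 0) {
        long *dw = (long *) (void *) d;
        const long *sw = (const long *) (const void *) s;
        for (; n >= sizeof (long); n -= sizeof (long))
            *dw++ = *sw++;
        d = (unsigned char *) dw;
        s = (const unsigned char *) sw;
    }
    while (n--)                    /* tail bytes, or the unaligned case */
        *d++ = *s++;
    return dst;
}

With the compiler guaranteeing the alignment of the string constant, the
word-at-a-time branch can be taken more often, which is the benefit being
described above.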


jeff

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: ridiculous amounts of padding
  1999-01-31 23:58       ` Zack Weinberg
@ 1999-01-31 23:58         ` Jeffrey A Law
  1999-01-31 23:58           ` Zack Weinberg
  0 siblings, 1 reply; 37+ messages in thread
From: Jeffrey A Law @ 1999-01-31 23:58 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: Peter Barada, egcs

  In message < 199901121950.OAA15108@rabi.phys.columbia.edu >you write:
  > Alignment for efficient copying is a win here, but wouldn't 8 byte
  > alignment
  > suffice (to be able to use aligned 64 bit load/store instructions)?  I'm
  > pretty certain this is so for x86, not as sure for other platforms.
Depends on the str* mem* implementations available on the target system.

I've worked on some that would do things like 4X unrolled copies through
64-bit wide FP regs if the alignments allowed it.
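A rough C-level illustration of that kind of inner loop (the function name and
the 32-byte chunk size are only examples; real routines of this sort are
hand-written assembly and also handle alignment fix-up, overlap and the tail):

#include <stddef.h>

/* Sketch: 4x-unrolled copy moving 64-bit (double-sized) chunks, usable
   only when both pointers are known to be 8-byte aligned. */
static void copy64_by_4(void *dst, const void *src, size_t n)
{
    double *d = dst;
    const double *s = src;
    size_t i, chunks = n / (4 * sizeof (double));   /* 32 bytes per pass */

    for (i = 0; i < chunks; i++) {
        d[0] = s[0]; d[1] = s[1]; d[2] = s[2]; d[3] = s[3];
        d += 4;      s += 4;
    }
    /* the remaining n % 32 bytes would be copied with a byte loop */
}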

I'm not saying the alignment isn't over-aggressive, just that it's not as 
clear cut as you may think.

jeff

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: ridiculous amounts of padding
  1999-01-31 23:58           ` John Vickers
@ 1999-01-31 23:58             ` Joern Rennecke
  0 siblings, 0 replies; 37+ messages in thread
From: Joern Rennecke @ 1999-01-31 23:58 UTC (permalink / raw)
  To: John Vickers; +Cc: egcs

> Joern Rennecke wrote:
> > 
> > > Even a one-char string copy could be done faster as a word-move on some
> > > platforms.
> > 
> > But you may not do that unless you know that you may access the extra data.
> > So the applicability of that optimization seems to be pretty low.
> 
> If we've got a one-byte string with 4-byte alignment, the last two bytes
> in the 4-byte word are unlikely to fall off the end of a memory chip,
> or to lie in a different memory protection region :-).
> 
> Or are you thinking of segment limit registers (Ugh) ?

There might be something at the destination that would be clobbered.
And an aligned source operand could be in some I/O register that doesn't
span a full word, or has side effects when the full word is read (if the
processor / memory interface allows that distinction to be made).

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: ridiculous amounts of padding
  1999-01-31 23:58     ` Peter Barada
  1999-01-31 23:58       ` Zack Weinberg
@ 1999-01-31 23:58       ` Marc Espie
  1999-01-31 23:58       ` Nick Ing-Simmons
  2 siblings, 0 replies; 37+ messages in thread
From: Marc Espie @ 1999-01-31 23:58 UTC (permalink / raw)
  To: pbarada; +Cc: egcs

In article < 199901121833.NAA30076@hyper.wavemark.com > you write:

>>The point behind the string alignments is not to improve cache behavior, but to
>>provide the strings on aligned addresses for memcpy, strcpy and other routines
>>that will examine the alignment at compile or runtime and possibly select a
>>more efficient loop if the alignment of the operands is suitable.

>Aligning the strings won't help if the size of the strings is small
>enough that the string code is forced to use brute force.

>So don't align strings if their length is below some small
>constant.  This will dramatically cut down on the wasted space as a
>percent of the total string size. 

This is already what CONSTANT_ALIGNMENT() in i386.h does...

What would probably help is to stash small strings together so that the
alignment of big strings does not gobble memory.  I don't know whether this
is a good idea in general... there's a chance the programmer puts logically
linked strings together, and sorting them and scattering them over memory
may increase page faults on some systems.  But considering we're dealing
with constant strings, these are probably informative messages, so the
programmer probably didn't care too much about it.

Anyway, this looks like precisely the kind of optimization that -Os should
do.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: ridiculous amounts of padding
  1999-01-31 23:58 ridiculous amounts of padding Zack Weinberg
  1999-01-31 23:58 ` Alfred Perlstein
@ 1999-01-31 23:58 ` Joern Rennecke
  1999-01-31 23:58   ` Jeffrey A Law
  1 sibling, 1 reply; 37+ messages in thread
From: Joern Rennecke @ 1999-01-31 23:58 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: egcs

> Since everyone is discussing string functions...
> 
> Code of the form
> 
> puts("line 1");
> puts("line 2");
> puts("line 3");
> ...
> puts("line 96");
> 
> is compiled (at least on x86) into a series of calls and a bunch of string
> table entries.  Fine.  The problem is, in the string table we emit
> .align 32 directives between all the strings.  This wastes quite a lot of
> space.  It seems that this is because of a general policy of aligning
> .rodata items on 32 byte boundaries - I assume for cache reasons.  When we
> know, as in this case, that all the strings will be accessed in succession,
> I think we should be able to remove the padding.  (Kaveh's suggested
> collapse-to-one-call optimization would be good too.)

This sounds like a hard property to verify, and it won't be true very often
overall.
A different approach would be to only do the alignment if it reduces the
total number of cache lines the data item ends up in.  We need support
from the assembler: a .align directive that supports a maximum amount of
alignment padding.  If we name the cache line size as CLS (32 for the
above target), and the data item's size as SIZE, the max padding is:
(SIZE - 1) & (CLS - 1)

Thus, if you have a succession of short strings, you can expect to get only
a few, relatively small, paddings there.
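A small C sketch of that rule, assuming CLS = 32 as above: padding larger than
(SIZE - 1) & (CLS - 1) can never reduce the number of cache lines the item
spans, so that is the most the assembler should be allowed to insert.

#include <stdio.h>

#define CLS 32                      /* assumed cache line size */

/* cache lines covered by an item of the given size placed at offset */
static unsigned lines_spanned(unsigned offset, unsigned size)
{
    return (offset + size + CLS - 1) / CLS - offset / CLS;
}

int main(void)
{
    unsigned size = 20;                        /* example item size */
    unsigned max_pad = (size - 1) & (CLS - 1);
    unsigned offset;

    for (offset = 1; offset < CLS; offset++) {
        unsigned pad = CLS - offset;           /* padding needed to align */
        int helps = lines_spanned(offset, size) > lines_spanned(0, size);
        /* alignment reduces the line count exactly when pad <= max_pad */
        printf("offset %2u: pad %2u  helps=%d  pad<=max_pad=%d\n",
               offset, pad, helps, pad <= max_pad);
    }
    return 0;
}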

Of course, we still want to align the start of the section, to avoid
having a cache line that spans the text and rodata sections, since that
leads to poor performance on processors with separate data and instruction
caches.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: ridiculous amounts of padding
  1999-01-31 23:58     ` Peter Barada
  1999-01-31 23:58       ` Zack Weinberg
  1999-01-31 23:58       ` Marc Espie
@ 1999-01-31 23:58       ` Nick Ing-Simmons
  1999-01-31 23:58         ` Joern Rennecke
  2 siblings, 1 reply; 37+ messages in thread
From: Nick Ing-Simmons @ 1999-01-31 23:58 UTC (permalink / raw)
  To: pbarada; +Cc: egcs, law, zack, amylaar

Peter Barada <pbarada@wavemark.com> writes:
>>The point behind the string alignments is not to improve cache behavior, but to
>>provide the strings on aligned addresses for memcpy, strcpy and other routines
>>that will examine the alignment at compile or runtime and possibly select a
>>more efficient loop if the alignment of the operands is suitable.
>
>Aligning the strings won't help if the size of the strings is small
>enough that the string code is forced to use brute force.

Even a one-char string copy could be done faster as a word-move on some
platforms.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: ridiculous amounts of padding
  1999-01-31 23:58       ` Nick Ing-Simmons
@ 1999-01-31 23:58         ` Joern Rennecke
  1999-01-31 23:58           ` John Vickers
  0 siblings, 1 reply; 37+ messages in thread
From: Joern Rennecke @ 1999-01-31 23:58 UTC (permalink / raw)
  To: nik; +Cc: pbarada, egcs, law, zack, amylaar

> Even a one-char string copy could be done faster as a word-move on some
> platforms.

But you may not do that unless you know that you may access the extra data.
So the applicability of that optimization seems to be pretty low.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: ridiculous amounts of padding
  1999-01-31 23:58           ` Zack Weinberg
@ 1999-01-31 23:58             ` Jeffrey A Law
  0 siblings, 0 replies; 37+ messages in thread
From: Jeffrey A Law @ 1999-01-31 23:58 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: egcs

  In message < 199901140000.TAA19809@rabi.phys.columbia.edu >you write:
  > glibc has very aggressively optimized string functions, that do tricks like
  > that.  They do not appear to require more than 8 byte alignment on x86.
  > I'm not sure about other platforms.
Right.  And we shouldn't restrict ourselves to looking at glibc to determine
what or how to optimize.


  > For the case that prompted the question (cpplib.c:print_help()) we emit
  > 4468 bytes of string constants and 1228 bytes of padding.  That's a 27.5%
  > space increase.
Hardly typical of most code, I suspect, since all this code does is print a
bunch of strings; it has no other significant purpose.

jeff

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: ridiculous amounts of padding
  1999-01-31 23:58     ` Joern Rennecke
@ 1999-01-31 23:58       ` Jeffrey A Law
  0 siblings, 0 replies; 37+ messages in thread
From: Jeffrey A Law @ 1999-01-31 23:58 UTC (permalink / raw)
  To: Joern Rennecke; +Cc: zack, egcs

  In message < 199901121830.SAA27131@phal.cygnus.co.uk >you write:
  > Well, then it doesn't make sense to me to align the string to something
  > larger than the largest power of two that is smaller than or equal to the string size.
You're probably right.

jeff

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: ridiculous amounts of padding
  1999-01-31 23:58 ridiculous amounts of padding Zack Weinberg
@ 1999-01-31 23:58 ` Alfred Perlstein
  1999-01-31 23:58 ` Joern Rennecke
  1 sibling, 0 replies; 37+ messages in thread
From: Alfred Perlstein @ 1999-01-31 23:58 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: egcs

On Tue, 12 Jan 1999, Zack Weinberg wrote:

> 
> Since everyone is discussing string functions...
> 
> Code of the form
> 
> puts("line 1");
> puts("line 2");
> puts("line 3");
> ...
> puts("line 96");
> 
> is compiled (at least on x86) into a series of calls and a bunch of string
> table entries.  Fine.  The problem is, in the string table we emit
> .align 32 directives between all the strings.  This wastes quite a lot of
> space.  It seems that this is because of a general policy of aligning
> .rodata items on 32 byte boundaries - I assume for cache reasons.  When we
> know, as in this case, that all the strings will be accessed in succession,
> I think we should be able to remove the padding.  (Kaveh's suggested
> collapse-to-one-call optimization would be good too.)
> 
> I would rewrite it
> puts(	"line 1\n"
> 	"line 2\n"
> 	...
> 	"line 96");
> 
> but it has to compile on K+R compilers.

note: frag = sizeof(string) % cacheblock

ok,

I'm wondering why something isn't done to align strings in a smarter way:

by using the cache line size, strings could be sorted and then combined to
minimize cache fragmentation.

if a string is 33 bytes long and a cache line is 32 bytes, I don't consider
it _THAT_(*) significant whether the cache miss happens on the first character
or the last.

if you could then keep track of frags, you could try to pack strings much
better, matching them up so that each sits next to strings that complement
its frag size, to minimize data size.
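A toy calculation of the quantity such packing would try to minimize (a
32-byte line and a few short example strings are assumed): the number of cache
lines the string table touches with the strings packed back to back, versus
each string aligned to its own line boundary.

#include <stdio.h>
#include <string.h>

#define CLS 32                              /* assumed cache line size */

/* lines touched when the strings are laid out back to back */
static unsigned lines_packed(const char *const *s, int n)
{
    unsigned bytes = 0;
    int i;
    for (i = 0; i < n; i++)
        bytes += strlen(s[i]) + 1;          /* count the NUL too */
    return (bytes + CLS - 1) / CLS;
}

/* lines touched when every string starts on its own CLS boundary */
static unsigned lines_aligned(const char *const *s, int n)
{
    unsigned lines = 0;
    int i;
    for (i = 0; i < n; i++)
        lines += (unsigned) (strlen(s[i]) + 1 + CLS - 1) / CLS;
    return lines;
}

int main(void)
{
    const char *msgs[] = { "line 1", "line 2", "line 3", "line 96" };
    int n = (int) (sizeof msgs / sizeof msgs[0]);

    printf("packed: %u lines, aligned: %u lines\n",
           lines_packed(msgs, n), lines_aligned(msgs, n));
    return 0;
}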

weighting the sorting could also be done on reference (is this already
done?) so that instead of requiring a new cache block for each string, you
hope that the string packed beforehand would be accessed and therefore
bring in the initial frag from the previous string.

(*) this is a bad example though, because unless locality of reference is
taken into account, and the assumption that the beginning of a string is
probably accessed more, this could cause performance issues; but if the
previous string has been accessed this can give better cache utilization.

this is on the assumption that the reason for the 32 byte align is the
cache; if it's because of alignment speed, I hardly see the point of
using alignment on strings as they are accessed mostly at byte
granularity.

please excuse me if most of this has been done, but if someone thinks this is
a totally incorrect methodology, or that I'm thinking with totally
incorrect assumptions, please explain, as it makes sense to me.

-Alfred

> 
> zw
> 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: ridiculous amounts of padding
  1999-01-31 23:58         ` Joern Rennecke
@ 1999-01-31 23:58           ` John Vickers
  1999-01-31 23:58             ` Joern Rennecke
  0 siblings, 1 reply; 37+ messages in thread
From: John Vickers @ 1999-01-31 23:58 UTC (permalink / raw)
  To: Joern Rennecke, egcs

Joern Rennecke wrote:
> 
> > Even a one-char string copy could be done faster as a word-move on some
> > platforms.
> 
> But you may not do that unless you know that you may access the extra data.
> So the applicability of that optimization seems to be pretty low.

If we've got a one-byte string with 4-byte alignment, the last two bytes
in the 4-byte word are unlikely to fall off the end of a memory chip,
or to lie in a different memory protection region :-).

Or are you thinking of segment limit registers (Ugh) ?

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: ridiculous amounts of padding
  1999-01-31 23:58   ` Jeffrey A Law
@ 1999-01-31 23:58     ` Peter Barada
  1999-01-31 23:58       ` Zack Weinberg
                         ` (2 more replies)
  1999-01-31 23:58     ` Joern Rennecke
  1 sibling, 3 replies; 37+ messages in thread
From: Peter Barada @ 1999-01-31 23:58 UTC (permalink / raw)
  To: law; +Cc: amylaar, zack, egcs

>The point behind the string alignments is not to improve cache behavior, but to
>provide the strings on aligned addresses for memcpy, strcpy and other routines
>that will examine the alignment at compile or runtime and possibly select a
>more efficient loop if the alignment of the operands is suitable.

Aligning the strings won't help if the size of the strings is small
enough that the string code is forced to use brute force.

So don't align strings if their length is below some small
constant.  This will dramatically cut down on the wasted space as a
percent of the total string size. 

-- 
Peter Barada                             pbarada@wavemark.com

"Real men know that you should never attempt to accomplish with words
what you can do with a flame thrower" --Bruce Ferstein

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: ridiculous amounts of padding
  1999-01-31 23:58         ` Jeffrey A Law
@ 1999-01-31 23:58           ` Zack Weinberg
  1999-01-31 23:58             ` Jeffrey A Law
  0 siblings, 1 reply; 37+ messages in thread
From: Zack Weinberg @ 1999-01-31 23:58 UTC (permalink / raw)
  To: law; +Cc: egcs

On Tue, 12 Jan 1999 20:26:32 -0700, Jeffrey A Law wrote:
>
>
>  In message < 199901121950.OAA15108@rabi.phys.columbia.edu >you write:
>  > Alignment for efficient copying is a win here, but wouldn't 8 byte
>  > alignment
>  > suffice (to be able to use aligned 64 bit load/store instructions)?  I'm
>  > pretty certain this is so for x86, not as sure for other platforms.
>Depends on the str* mem* implementations available on the target system.
>
>I've worked on some that would do things like 4X unrolled copies through
>64bit wide FP regs if the alignments allowed it.
>
>I'm not saying the alignment isn't over-aggressive, just that it's not as 
>clear cut as you may think.

glibc has very aggressively optimized string functions that do tricks like
that.  They do not appear to require more than 8 byte alignment on x86.  I'm
not sure about other platforms.

For the case that prompted the question (cpplib.c:print_help()) we emit 4468
bytes of string constants and 1228 bytes of padding.  That's a 27.5% space
increase.

zw

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: ridiculous amounts of padding
  1999-01-31 23:58     ` Peter Barada
@ 1999-01-31 23:58       ` Zack Weinberg
  1999-01-31 23:58         ` Jeffrey A Law
  1999-01-31 23:58       ` Marc Espie
  1999-01-31 23:58       ` Nick Ing-Simmons
  2 siblings, 1 reply; 37+ messages in thread
From: Zack Weinberg @ 1999-01-31 23:58 UTC (permalink / raw)
  To: Peter Barada; +Cc: egcs

On Tue, 12 Jan 1999 13:33:59 -0500, Peter Barada wrote:
>
>>The point behind the string alignments is not to improve cache behavior, but
>>to provide the strings on aligned addresses for memcpy, strcpy and other
>>routines that will examine the alignment at compile or runtime and possibly
>>select a more efficient loop if the alignment of the operands is suitable.
>
>Aligning the strings won't help if the size of the strings is small
>enough that the string code is forced to use brute force.
>
>So don't align strings if their length is below some small
>constant.  This will dramatically cut down on the wasted space as a
>percent of the total string size. 

The case I'm concerned about has many strings all of which are between 75
and 80 characters long.  Aligning them on 32 byte boundaries wastes an
average of 16 bytes per string.

Alignment for efficient copying is a win here, but wouldn't 8 byte alignment
suffice (to be able to use aligned 64 bit load/store instructions)?  I'm
pretty certain this is so for x86, not as sure for other platforms.

zw

^ permalink raw reply	[flat|nested] 37+ messages in thread

* ridiculous amounts of padding
@ 1999-01-31 23:58 Zack Weinberg
  1999-01-31 23:58 ` Alfred Perlstein
  1999-01-31 23:58 ` Joern Rennecke
  0 siblings, 2 replies; 37+ messages in thread
From: Zack Weinberg @ 1999-01-31 23:58 UTC (permalink / raw)
  To: egcs

Since everyone is discussing string functions...

Code of the form

puts("line 1");
puts("line 2");
puts("line 3");
...
puts("line 96");

is compiled (at least on x86) into a series of calls and a bunch of string
table entries.  Fine.  The problem is, in the string table we emit
.align 32 directives between all the strings.  This wastes quite a lot of
space.  It seems that this is because of a general policy of aligning
.rodata items on 32 byte boundaries - I assume for cache reasons.  When we
know, as in this case, that all the strings will be accessed in succession,
I think we should be able to remove the padding.  (Kaveh's suggested
collapse-to-one-call optimization would be good too.)

I would rewrite it
puts(	"line 1\n"
	"line 2\n"
	...
	"line 96");

but it has to compile on K+R compilers.

zw

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: ridiculous amounts of padding
  1999-01-31 23:58   ` Jeffrey A Law
  1999-01-31 23:58     ` Peter Barada
@ 1999-01-31 23:58     ` Joern Rennecke
  1999-01-31 23:58       ` Jeffrey A Law
  1 sibling, 1 reply; 37+ messages in thread
From: Joern Rennecke @ 1999-01-31 23:58 UTC (permalink / raw)
  To: law; +Cc: amylaar, zack, egcs

> The point behind the string alignments is not to improve cache behavior, but to
> provide the strings on aligned addresses for memcpy, strcpy and other routines
> that will examine the alignment at compile or runtime and possibly select a
> more efficient loop if the alignment of the operands is suitable.

Well, then it doesn't make sense to me to align the string to something larger
than the largest power of two that is smaller than or equal to the string size.
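A sketch of that rule in C (the helper name is just for illustration, this is
not gcc code):

/* Cap the requested alignment at the largest power of two that does not
   exceed the string's size. */
static unsigned cap_string_alignment(unsigned size, unsigned preferred)
{
    unsigned a = 1;
    while (a * 2 <= size && a * 2 <= preferred)
        a *= 2;
    return a;               /* e.g. size 20, preferred 32  ->  16 */
}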

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: ridiculous amounts of padding
@ 1999-02-01 17:28 N8TM
  1999-02-28 22:53 ` N8TM
  0 siblings, 1 reply; 37+ messages in thread
From: N8TM @ 1999-02-01 17:28 UTC (permalink / raw)
  To: pcg, law; +Cc: egcs

In a message dated 2/1/99 8:36:31 AM Pacific Standard Time, pcg@goof.com
writes:

<< btw, did you benchmark 1.1.1 or the current snapshot? 1.1.1 seems to be
 slower than gcc-2.8 with every benchmark _I_ tried, and I guess that's what
 people test. The snapshots, however, are faster. >>

In most of my benchmarks, 1.1.1 was fastest with -Os.  Recent snapshots (not
this week's!) do as well or better than 1.1.1 at -Os, and often better at -O2
or -O3.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: ridiculous amounts of padding
  1999-01-31 20:05       ` Jeffrey A Law
@ 1999-02-01  8:07         ` Marc Lehmann
  1999-02-28 22:53           ` Marc Lehmann
  0 siblings, 1 reply; 37+ messages in thread
From: Marc Lehmann @ 1999-02-01  8:07 UTC (permalink / raw)
  To: law; +Cc: Marc Lehmann, egcs

On Sun, Jan 31, 1999 at 05:59:28PM -0700, Jeffrey A Law wrote:
>   > If you want a good and fast compiler and do not need the new C++ features
>   > of egcs, then by all means stay with gcc. If you really need speed, use
>   > pgcc, but -O6 -funroll-all-loops, pgcc isn't faster than gcc with -O3.

I was just told that his findings are now online (at
http://members.xoom.com/Alex_Maranda )

>   > Of course, that was his own (maybe limited) benchmark, but that's what
>   > counts for him. I also received quite a few similar reports, but these
>   > didn't have hard data in it, i.e. "nothing to report to egcs".
> Such is life.  The benchmarks I've run show just the opposite.

btw, did you benchmark 1.1.1 or the current snapshot? 1.1.1 seems to be
slower than gcc-2.8 with every benchmark _I_ tried, and I guess that's what
people test. The snapshots, however, are faster.

> benchmark it.  That's always the best indicator of performance -- running 
> your own code.

;)

> And such ports are interesting in that they tell us we need to continue to
> improve the compiler.  However, I'm much more likely to spend time working
> with someone that's going to take the time to help analyze the problem.  For
> example, Zack is taking the time to analyze the code and point out problems.
> And I'm more than happy to work with Zack to try and nail down these problems.

More happy than what? I'm quite happy when somebody at least comes with hard
data (i.e. something to reproduce). The problem (which is hopefully solved at
least in part) is the high-quality test suite versus the non-existent benchmark
suite.

Correct code is more important than fast code (and the egcs releases are of
high quality with respect to correctness!). I actually hope
to prove that the current codebase is superior. For example, it
often only takes a simple "-O3 -funroll-all-loops" to make code run much
faster than gcc-2.8 with the same options.

> And finally, with the ongoing issues with alignment of stack slots, I'm
> leery of any FP benchmarks.  It's too difficult to get reliable benchmark
> results with the FP numbers varying so wildly due to alignment issues.

It's not that difficult with all alignment options on
(-mstack-align-double, -marg-align-double) ;-> Anyway, a very good way to
improve this would be to -fschedule-insns only on fp-intensive code (on
x86).  (I do plan to implement this, next year or so.. ;)

> In fact, I was looking at the losing caller saves stuff today, and gee, fpppp
> started running about 50% slower than normal.  What was it tracked down to?
> A call earlier in call stack allowing 4 less bytes of space caused the main
> code to run a hell of a lot slower.

That's symptomatic of misaligned stuff. But it gets even worse: on
linux-kernel there is a discussion about code which runs twice as
fast with a different virtual memory configuration (linux doesn't do
cache-colouring).  Using the _same_ code, of course.

Anyway, wasn't I told recently on egcs that stack-alignment of double
variables is not really worth the effort (according to benchmarks? ;)

>   > Maybe you can point me to some of the smaller ones? The current benchmark
>   > suite is quite sensitive to small code changes (which was one of the
>   > goals), but it does not provide an adequate view of the performance on
>   > real world problems.
> gcc, sc, perl, compress, m88ksim.  None of which are particularly small, but
> I think all are available in various locations on the net.  Then you just have
> to come up with datasets.
> 
> I believe perl comes with a testsuite, if so, that could become the input
> data.  For gcc, select a target and generate a bunch of .i files (possibly
> the compiler itself) as input.
>
> Compress?  How about compressing the gcc tarball.  Obviously you have to
> pick one and stick with it, but that's not hard.
> 
> m88k sim?  Dunno.  Depends on how complete the simulator is.  One could feed
> the simulator small free benchmarks.  Since what you're measuring is the
> simulator, not the end benchmark.
> 
> John Wehle does benchmarking with craftychess, and I'm sure if we looked around

Interesting note: Robert Hyatt (Crafty's author) actually recommends "pgcc
-O" (which _should_ be almost the same as "egcs -O"). Veeery interesting.

> we could find some nontrivial fp intensive benchmarks.

My (and your) primary concerns with regard to the benchmark suite were
space constraints. If nobody minds the ("few") additional megabytes I'd be
happy to put in more complete tests. (thanks for the ideas!)

      -----==-                                             |
      ----==-- _                                           |
      ---==---(_)__  __ ____  __       Marc Lehmann      +--
      --==---/ / _ \/ // /\ \/ /       pcg@goof.com      |e|
      -=====/_/_//_/\_,_/ /_/\_\       XX11-RIPE         --+
    The choice of a GNU generation                       |
                                                         |

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: ridiculous amounts of padding
  1999-01-31 23:58 ` Marc Espie
  1999-01-31 23:58   ` Joe Buck
  1999-01-31 23:58   ` David Edelsohn
@ 1999-01-31 23:58   ` Jeffrey A Law
  1999-01-30 16:35     ` Marc Lehmann
  2 siblings, 1 reply; 37+ messages in thread
From: Jeffrey A Law @ 1999-01-31 23:58 UTC (permalink / raw)
  To: Marc.Espie; +Cc: john, egcs

  In message < 199901142216.XAA13874@quatramaran.ens.fr >you write:
  > the main offender... -Os improved the code size, -fno-gcse improved it some
  > more, relaxing constant string alignment improved almost *nothing*.
Just a note, global cse will make code larger in an attempt to make it
faster.  So this is not unexpected.

  > If someone would be interested in helping me tracking down code
  > differences, I am very interested...
I hope someone will.

  > It may well be that I'm mistaken, or that I'm missing something obvious.
  > But until someone corrects me (or I find a solution myself), OpenBSD will
  > have to stay with gcc 2.8.1 (as much as I would like a change, personally).
I'd be real surprised if in general gcc2.8 produced better code than egcs;
there may be localized issues of course, but I'd be real surprised if on a
global basis gcc-2.8 was better.

jeff

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: ridiculous amounts of padding
  1999-01-31 23:58 ` Marc Espie
  1999-01-31 23:58   ` Joe Buck
@ 1999-01-31 23:58   ` David Edelsohn
  1999-01-31 23:58     ` Joe Buck
  1999-01-31 23:58   ` Jeffrey A Law
  2 siblings, 1 reply; 37+ messages in thread
From: David Edelsohn @ 1999-01-31 23:58 UTC (permalink / raw)
  To: Marc.Espie; +Cc: john, egcs

>>>>> Marc Espie writes:

Marc> The main stumbling block is that code output by egcs under -O2 on 
Marc> i386-aout is *larger* than code output by gcc 2.8.1.

Marc> Part of it can be accounted for by stricter alignment issues (but not much
Marc> mind you).

Marc> Part of it can be traced to an embarrassing behavior of egcs, which tends
Marc> to leave stuff on the stack longer instead of using registers... same number
Marc> of cycles, larger code.

Marc> ... plus some differing passes, since egcs has changed. The gcse seems to be
Marc> the main offender... -Os improved the code size, -fno-gcse improved it some
Marc> more, relaxing constant string alignment improved almost *nothing*.

Marc> If someone would be interested in helping me tracking down code differences,
Marc> I am very interested...

Marc> This is a real problem: for performance freaks that care only about C
Marc> code quality, it seems that at least on i386, egcs is a bad idea: code is
Marc> larger, and looking at assembler fragments does not indicate a trade-off
Marc> between code size and efficiency.

	I think that EGCS needs to start looking more carefully at code
efficiency.  So far there has been a lot of work adding new optimization
passes and then squashing bugs before a release, without making the actual
benefit and efficiency of the resulting code, compared to previous versions
of GCC, a release-engineering quality-assurance requirement.

	Maybe this just isn't sexy, or we do not have enough volunteers
looking at this, or the focus on c-torture results is ignoring performance
results.

David

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: ridiculous amounts of padding
  1999-01-31 23:58 ` Marc Espie
@ 1999-01-31 23:58   ` Joe Buck
  1999-01-31 23:58     ` Marc Espie
  1999-01-31 23:58   ` David Edelsohn
  1999-01-31 23:58   ` Jeffrey A Law
  2 siblings, 1 reply; 37+ messages in thread
From: Joe Buck @ 1999-01-31 23:58 UTC (permalink / raw)
  To: Marc.Espie; +Cc: john, egcs

> OpenBSD is considering switching to egcs 1.1.1 (or a later release :) )
> from gcc 2.8.1.
> 
> The main stumbling block is that code output by egcs under -O2 on 
> i386-aout is *larger* than code output by gcc 2.8.1.

It may be that -Os is more appropriate for a kernel, where the code is
permanently in memory.

> Something like 30K for a 2MB kernel, which is somewhat large.

30K/2MB = 1.46%.  I would call this measurable, but not somewhat large.
The question is whether the marginally larger code is faster, or if it
is just worse.

> ... plus some differing passes, since egcs has changed. The gcse seems to be
> the main offender... -Os improved the code size, -fno-gcse improved it some
> more, relaxing constant string alignment improved almost *nothing*.

You can choose to recommend whatever flags you wish: e.g. -Os and -fno-gcse.

> This is a real problem: for performance freaks that care only about C
> code quality, it seems that at least on i386, egcs is a bad idea: code is
> larger, and looking at assembler fragments does not indicate a trade-off
> between code size and efficiency.

Unfortunately, we don't systematically run benchmarks, so developers can't
really tell when they are introducing performance regressions.

> It may well be that I'm mistaken, or that I'm missing something obvious.
> But until someone corrects me (or I find a solution myself), OpenBSD will
> have to stay with gcc 2.8.1 (as much as I would like a change, personally).

That's your call, though if any OpenBSD users want to use C++, you aren't
doing them any favors by giving them 2.8.1.  A third option is to help
make egcs better and then switch once 1.2 is available.



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: ridiculous amounts of padding
  1999-01-31 23:58   ` Joe Buck
@ 1999-01-31 23:58     ` Marc Espie
  0 siblings, 0 replies; 37+ messages in thread
From: Marc Espie @ 1999-01-31 23:58 UTC (permalink / raw)
  To: egcs

On Thu, Jan 14, 1999 at 02:34:35PM -0800, Joe Buck wrote:
> You can choose to recommend whatever flags you wish: e.g. -Os and -fno-gcse.
Well, even with -Os -fno-gcse, code is still larger.
And it is not very practical either, as the installation floppy includes
some code that comes from other areas. This basically would mean adding
some switches to Makefiles everywhere... if I find options that make code
fit, I am very tempted to disregard egcs's  documentation and override
OPTIMIZATION_OPTIONS for OpenBSD i386...

> > This is a real problem: for performance freaks that care only about C
> > code quality, it seems that at least on i386, egcs is a bad idea: code is
> > larger, and looking at assembler fragments does not indicate a trade-off
> > between code size and efficiency.

> Unfortunately, we don't systematically run benchmarks, so developers can't
> really tell when they are introducing performance regressions.

This is a problem.

> > It may well be that I'm mistaken, or that I'm missing something obvious.
> > But until someone corrects me (or I find a solution myself), OpenBSD will
> > have to stay with gcc 2.8.1 (as much as I would like a change, personally).

> That's your call, though if any OpenBSD users want to use C++, you aren't
> doing them any favors by giving them 2.8.1.  A third option is to help
> make egcs better and then switch once 1.2 is available.

Don't you think I know that!!!  I am a C++ user.  But if the floppy disk
no longer fits, I can't convince people to make the switch.
[Besides, I would say that the C compiler is still awfully important.
C++ is still a religious issue. Some people don't believe in it, so if
the C compiler is not better, they don't see any reason to switch, and
then the C++ community loses. You have splits, where not everybody uses
the same compiler. Bigger distributions. Smaller userbase. Duplication of
effort.]

On the other hand, I am perfectly willing to make egcs better in that area.
I can supply sample source code, I can supply example assembler output.
I have been looking very hard at egcs configuration files, to see whether
there was new functionality introduced that I missed (say, a new macro
that I need to define if I want to avoid substandard code).

But I don't know enough about the internals of egcs.

I'm also pretty new to the i386; I'm more used to m68k. I've given a few
code fragments to fellow OpenBSD developers, and what's come out is that
egcs produces larger code, apparently because it prefers leaving stuff
in memory to using registers when code speed is the same... but size
is definitely not the same.

I am not trying to start a flamewar here, just trying to get some help.

-- 
	Marc Espie		
|anime, sf, juggling, unicycle, acrobatics, comics...
|AmigaOS, OpenBSD, C++, perl, Icon, PostScript...
| `real programmers don't die, they just get out of beta'

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: ridiculous amounts of padding
  1999-01-31 23:58     ` Joe Buck
@ 1999-01-31 23:58       ` Gabriel Dos Reis
  1999-01-31 23:58         ` Joe Buck
  0 siblings, 1 reply; 37+ messages in thread
From: Gabriel Dos Reis @ 1999-01-31 23:58 UTC (permalink / raw)
  To: Joe Buck; +Cc: David Edelsohn, Marc.Espie, john, egcs

>>>>>  Joe Buck <jbuck@Synopsys.COM> wrote:

>> I think that EGCS needs to start looking more carefully at code
>> efficiency.  So far there has been a lot of work adding new optimization
>> passes and then squashing bugs before a release without the actual benefit
>> and efficiency of the resulting code compared to previous versions of GCC
>> as a release engineering quality assurance requirement.
>> 
>> Maybe this just isn't sexy or we do not have enough volunteers
>> looking at this or the focus on c-torture results is ignoring performance
>> results. 

> This is, I think, in part due to the fact that we distribute a test suite
> which lots of people run, but we don't have a systematic set of
> performance benchmarks.  If someone published, say, a full set of Spec95
> numbers based on each week's snapshot, I'm sure we'd start seeing
> attention paid to such matters.

As I'm concerned with number-crunching efficiency issues, I volunteer
for this task. I just need guidance. Anyone care to enlighten me?

-- Gaby

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: ridiculous amounts of padding
  1999-01-31 23:58 John Wehle
@ 1999-01-31 23:58 ` Marc Espie
  1999-01-31 23:58   ` Joe Buck
                     ` (2 more replies)
  0 siblings, 3 replies; 37+ messages in thread
From: Marc Espie @ 1999-01-31 23:58 UTC (permalink / raw)
  To: john; +Cc: egcs

In article < 199901140044.TAA11843@jwlab.FEITH.COM > you write:

>It's certainly worth considering having this type of thing controlled by
>-Os.  Many of the alignments done for performance result in an increase
>in size which may be undesirable in some situations.  BTW, in all fairness
>I would be surprised if the 27.5% space increase is representative of the
>total size change that the final executable experienced due to extra
>alignment.

In case you should care, I have a system on which such small differences
make a difference.

OpenBSD is considering switching to egcs 1.1.1 (or a later release :) )
from gcc 2.8.1.

The main stumbling block is that code output by egcs under -O2 on 
i386-aout is *larger* than code output by gcc 2.8.1.

Something like 30K for a 2MB kernel, which is somewhat large.

Part of it can be accounted for by stricter alignment issues (but not much
mind you).

Part of it can be traced to an embarrassing behavior of egcs, which tends
to leave stuff on the stack longer instead of using registers... same number
of cycles, larger code.

... plus some differing passes, since egcs has changed. The gcse seems to be
the main offender... -Os improved the code size, -fno-gcse improved it some
more, relaxing constant string alignment improved almost *nothing*.

If someone would be interested in helping me track down code differences,
I am very interested...

I already asked about this specific problem a few weeks ago; I've gotten next
to no feedback so far.

This is a real problem: for performance freaks that care only about C
code quality, it seems that at least on i386, egcs is a bad idea: code is
larger, and looking at assembler fragments does not indicate a trade-off
between code size and efficiency.

It may well be that I'm mistaken, or that I'm missing something obvious.
But until someone corrects me (or I find a solution myself), OpenBSD will
have to stay with gcc 2.8.1 (as much as I would like a change, personally).

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: ridiculous amounts of padding
  1999-01-31 23:58 N8TM
@ 1999-01-31 23:58 ` Zack Weinberg
  1999-01-31 23:58   ` Jeffrey A Law
  0 siblings, 1 reply; 37+ messages in thread
From: Zack Weinberg @ 1999-01-31 23:58 UTC (permalink / raw)
  To: N8TM; +Cc: egcs

On Wed, 13 Jan 1999 22:11:30 EST, N8TM@aol.com wrote:
>In a message dated 1/13/99 4:47:58 PM Pacific Standard Time, john@feith.com
>writes:
>
><< It's certainly worth considering having this type of thing controlled by
> -Os.  Many of the alignments done for performance result in an increase
> in size which may be undesirable in some situations.   >>
>In my tests (mostly g77), the difference in performance due to alignments is
>much greater than the difference between -Os and -O2, so I'd like to continue
>to improve the alignments available with -Os.
>
>If you're talking about saving space, turning off .p2align typically saves
>about 5% in code space but takes at least that much longer to run on a PPro,
>much more on a p2.  I haven't seen anyone seriously wanting to turn off
>.p2align, although a smarter compiler could figure out where it may be time
>critical and where it definitely isn't.

I'm fine with leaving the alignment rules as is for the general case.  I
think it might be a good idea to reduce alignment for the special case of
many strings allocated next to one another, but I never claimed to be an
expert in this sort of thing.

It is unfortunate that Kaveh's idea of collapsing multiple consecutive
printf/puts calls down to one can't be implemented at the moment.  I seem to
remember someone suggested doing the tree->rtl conversion on a per-block or
per-function basis to enable more optimization on the trees.  Is that a
non-starter?
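For reference, a conceptual sketch of the collapse being talked about (this is
not an existing optimization pass, just what the transformation would amount
to):

#include <stdio.h>

/* what the source says */
void help_before(void)
{
    puts("line 1");
    puts("line 2");
    puts("line 3");
}

/* what a collapsed call could look like: one call, one string constant,
   with the newlines puts() would have appended folded into the literal */
void help_after(void)
{
    fputs("line 1\n"
          "line 2\n"
          "line 3\n", stdout);
}

Done inside the compiler, the concatenation never appears in the source, which
sidesteps the K+R portability concern mentioned earlier in the thread.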

zw

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: ridiculous amounts of padding
  1999-01-31 23:58         ` Joe Buck
@ 1999-01-31 23:58           ` Jeffrey A Law
  0 siblings, 0 replies; 37+ messages in thread
From: Jeffrey A Law @ 1999-01-31 23:58 UTC (permalink / raw)
  To: Joe Buck; +Cc: Gabriel Dos Reis, egcs

  In message < 199901151755.JAA18790@yamato.synopsys.com >you write:
  > Gaby writes:
  > > As I'm concerned with number-crunching efficiency issues, I volunteer
  > > for this task. I just need guidance. Anyone care to enlighten me?
  > 
  > Well, unless you have a Spec95 license you can't run that.  However,
  > I've been thinking about this, and it seems that we could come up with
  > a "free software benchmark" that is similar in spirit to Spec95.
Yes.  And one can even use many of the same benchmarks since many are available
for free.  What you don't get are the datasets or the infrastructure for
building, testing and reporting information that spec provides.


  > First, a bit on benchmark philosophy.  Tests like dhrystone are no good
  > because they are artificial programs, and they don't produce any output
  > so a sufficiently good globally optimizing compiler could in principle
  > throw all the code away.  Spec, on the other hand, consists of real
  > programs -- see
Right.  And in Gaby's case, the best benchmarks would be the codes he cares
about.  That's actually true for everyone.

  > into optimized Sparc assembly code (the Spec people couldn't
  > prevent us from building the exact same version of gcc, but I guess
  > they own the input file that the test uses?).
Possibly.  Depends on what the input files are.  The input might be gcc itself;
that was the case for spec92.



jeff

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: ridiculous amounts of padding
@ 1999-01-31 23:58 N8TM
  1999-01-31 23:58 ` Zack Weinberg
  0 siblings, 1 reply; 37+ messages in thread
From: N8TM @ 1999-01-31 23:58 UTC (permalink / raw)
  To: john, zack; +Cc: egcs, law

In a message dated 1/13/99 4:47:58 PM Pacific Standard Time, john@feith.com
writes:

<< It's certainly worth considering having this type of thing controlled by
 -Os.  Many of the alignments done for performance result in an increase
 in size which may be undesirable in some situations.   >>
In my tests (mostly g77), the difference in performance due to alignments is
much greater than the difference between -Os and -O2, so I'd like to continue
to improve the alignments available with -Os.

If you're talking about saving space, turning off .p2align typically saves
about 5% in code space but takes at least that much longer to run on a PPro,
much more on a p2.  I haven't seen anyone seriously wanting to turn off
.p2align, although a smarter compiler could figure out where it may be time
critical and where it definitely isn't.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: ridiculous amounts of padding
@ 1999-01-31 23:58 John Wehle
  1999-01-31 23:58 ` Marc Espie
  0 siblings, 1 reply; 37+ messages in thread
From: John Wehle @ 1999-01-31 23:58 UTC (permalink / raw)
  To: zack; +Cc: egcs, law

> glibc has very aggressively optimized string functions, that do tricks like
> that.  They do not appear to require more than 8 byte alignment on x86.  I'm
> not sure about other platforms.

I implemented CONSTANT_ALIGNMENT following the recommendations of the
"Intel Architecture Optimization Manual" that objects over a certain
size should be aligned on 32 byte boundaries in order to reduce the
number of cache lines.
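A hypothetical sketch of that policy (illustrative numbers and names only, not
the actual i386.h CONSTANT_ALIGNMENT definition):

#define CACHE_LINE_BYTES 32

/* Hypothetical: constants at least one cache line long get bumped to
   cache-line alignment; smaller ones keep whatever alignment they had. */
static unsigned constant_alignment_policy(unsigned size_bytes,
                                          unsigned default_align)
{
    if (size_bytes >= CACHE_LINE_BYTES && default_align < CACHE_LINE_BYTES)
        return CACHE_LINE_BYTES;
    return default_align;
}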

> For the case that prompted the question (cpplib.c:print_help()) we emit 4468
> bytes of string constants and 1228 bytes of padding.  That's a 27.5% space
> increase.

It's certainly worth considering having this type of thing controlled by
-Os.  Many of the alignments done for performance result in an increase
in size which may be undesirable in some situations.  BTW, in all fairness
I would be surprised if the 27.5% space increase is representative of the
total size change that the final executable experienced due to extra
alignment.

-- John
-------------------------------------------------------------------------
|   Feith Systems  |   Voice: 1-215-646-8000  |  Email: john@feith.com  |
|    John Wehle    |     Fax: 1-215-540-5495  |                         |
-------------------------------------------------------------------------

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: ridiculous amounts of padding
  1999-01-30 16:35     ` Marc Lehmann
  1999-01-31 20:05       ` Jeffrey A Law
@ 1999-01-31 23:58       ` Marc Lehmann
  1 sibling, 0 replies; 37+ messages in thread
From: Marc Lehmann @ 1999-01-31 23:58 UTC (permalink / raw)
  To: egcs

[going through my mail folders, this is a bit late]

On Thu, Jan 14, 1999 at 06:51:41PM -0700, Jeffrey A Law wrote:
>   > If someone would be interested in helping me tracking down code
>   > differences, I am very interested...
> I hope someone will.

While I can't help with tracking down code size differences, the benchmark
suite records the stripped executable size of each benchmark it generates.
Although this data is not yet evaluated anywhere (it's not even in the
database), all reports come with this information.

If somebody tells me what to do with that data we can at least keep an eye
on it in the future.

>   > But until someone corrects me (or I find a solution myself), OpenBSD will
>   > have to stay with gcc 2.8.1 (as much as I would like a change, personally).
> I'd be real surprised if in general gcc2.8 produced better code than egcs;
> there may be localized issues of course, but I'd be real surprised if on a
> global basis gcc-2.8 was better.

Actually, that's exactly what I get to hear. Recently someone sent me
a report that he intends to publish (I'll post a URL here), in which
he compares gcc-2.8, egcs and pgcc using his number-crunching c++
program. His summary was:

If you want a good and fast compiler and do not need the new C++ features
of egcs, then by all means stay with gcc. If you really need speed, use
pgcc, but -O6 -funroll-all-loops, pgcc isn't faster than gcc with -O3.

Of course, that was his own (maybe limited) benchmark, but that's what
counts for him. I also received quite a few similar reports, but these
didn't have hard data in them, i.e. "nothing to report to egcs".

On Fri, Jan 15, 1999 at 03:19:26PM -0700, Jeffrey A Law wrote:
>   > Well, unless you have a Spec95 license you can't run that.  However,
>   > I've been thinking about this, and it seems that we could come up with
>   > a "free software benchmark" that is similar in spirit to Spec95.
> Yes.  And one can even use many of the same benchmarks since many are available
> for free.  What you don't get are the datasets or the infrastructure for
> building, testing and reporting information that spec provides.

Maybe you can point me to some of the smaller ones? The current benchmark
suite is quite sensitive to small code changes (which was one of the
goals), but it does not provide an adequate view of the performance on
real world problems.

      -----==-                                             |
      ----==-- _                                           |
      ---==---(_)__  __ ____  __       Marc Lehmann      +--
      --==---/ / _ \/ // /\ \/ /       pcg@goof.com      |e|
      -=====/_/_//_/\_,_/ /_/\_\       XX11-RIPE         --+
    The choice of a GNU generation                       |
                                                         |

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: ridiculous amounts of padding
  1999-01-31 23:58   ` David Edelsohn
@ 1999-01-31 23:58     ` Joe Buck
  1999-01-31 23:58       ` Gabriel Dos Reis
  0 siblings, 1 reply; 37+ messages in thread
From: Joe Buck @ 1999-01-31 23:58 UTC (permalink / raw)
  To: David Edelsohn; +Cc: Marc.Espie, john, egcs

> 	I think that EGCS needs to start looking more carefully at code
> efficiency.  So far there has been a lot of work adding new optimization
> passes and then squashing bugs before a release without the actual benefit
> and efficiency of the resulting code compared to previous versions of GCC
> as a release engineering quality assurance requirement.
> 
> 	Maybe this just isn't sexy or we do not have enough volunteers
> looking at this or the focus on c-torture results is ignoring performance
> results. 

This is, I think, in part due to the fact that we distribute a test suite
which lots of people run, but we don't have a systematic set of
performance benchmarks.  If someone published, say, a full set of Spec95
numbers based on each week's snapshot, I'm sure we'd start seeing
attention paid to such matters.




^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: ridiculous amounts of padding
@ 1999-01-31 23:58 John Wehle
  0 siblings, 0 replies; 37+ messages in thread
From: John Wehle @ 1999-01-31 23:58 UTC (permalink / raw)
  To: Marc.Espie; +Cc: egcs

> I'm also pretty new to the i386; I'm more used to m68k. I've given a few
> code fragments to fellow OpenBSD developers, and what's come out is that
> egcs produces larger code, apparently because it prefers leaving stuff
> in memory to using registers when code speed is the same... but size
> is definitely not the same.

You mentioned a similar problem involving regmove in December.

  1) Did the patch I mentioned help?

  2) Does the problem occur using current snapshots?

Code fragments where current egcs snapshots don't compile to efficient
instruction sequences (or when using -Os the instruction sequence is
overly large) are of interest, especially if older versions of the
compiler (or gcc 2.8) do a better job when using the same compiler
options.  Keep in mind that there are cases where egcs will produce
larger code in the interest of performance, and that it's not always
obvious from code inspection why it's being done.  -Os addresses some
of the speed versus size issues, though I'm sure it can be improved
(especially if someone can point out the exact failing :-).

> I am not trying to start a flamewar here, just trying to get some help.

There are certainly people here willing to help though at times it may
take a while for someone to get back to you due to other things being in
their queue.  Anything you can do to supply specifics (or small test cases)
which isolate the problem helps tremendously.

-- John
-------------------------------------------------------------------------
|   Feith Systems  |   Voice: 1-215-646-8000  |  Email: john@feith.com  |
|    John Wehle    |     Fax: 1-215-540-5495  |                         |
-------------------------------------------------------------------------

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: ridiculous amounts of padding
  1999-01-31 23:58 ` Zack Weinberg
@ 1999-01-31 23:58   ` Jeffrey A Law
  0 siblings, 0 replies; 37+ messages in thread
From: Jeffrey A Law @ 1999-01-31 23:58 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: N8TM, egcs

  In message < 199901150310.WAA23774@rabi.phys.columbia.edu >you write:
  > I'm fine with leaving the alignment rules as is for the general case.  I
  > think it might be a good idea to reduce alignment for the special case of
  > many strings allocated next to one another, but I never claimed to be an
  > expert in this sort of thing.
First, you can't ever assume that consecutive strings will end up consecutive
in the final image.  It's not uncommon for linkers to rearrange stuff in the
data and readonly data segments in an attempt to:

  * Delete useless "sethi"-type instructions for address calculations on RISC
    machines

  * Sort items based on size/alignment needs

  > It is unfortunate that Kaveh's idea of collapsing multiple consecutive
  > printf/puts calls down to one can't be implemented at the moment.  I seem
  > to remember someone suggested doing the tree->rtl conversion on a per-block or
  > per-function basis to enable more optimization on the trees.  Is that a
  > non-starter?
It's still a long term goal, but it's not a focus of work at the moment.
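
For readers who haven't followed the earlier discussion, here is a
hand-written sketch of the source-level effect such a pass would aim for
(this is only an illustration of the idea, not a description of any
existing or planned pass; the function names are arbitrary):

    #include <stdio.h>

    /* Before: three separate calls, each referencing its own string
       constant (and, on some targets, each constant gets its own
       alignment padding in the read-only data section).  */
    void report_before (void)
    {
      printf ("pass 1 complete\n");
      printf ("pass 2 complete\n");
      printf ("pass 3 complete\n");
    }

    /* After: what collapsing consecutive calls could produce -- one
       call, one longer string constant, less call overhead.  This is
       only valid because no format specifiers are involved.  */
    void report_after (void)
    {
      fputs ("pass 1 complete\n"
             "pass 2 complete\n"
             "pass 3 complete\n", stdout);
    }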

jeff

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: ridiculous amounts of padding
  1999-01-31 23:58       ` Gabriel Dos Reis
@ 1999-01-31 23:58         ` Joe Buck
  1999-01-31 23:58           ` Jeffrey A Law
  0 siblings, 1 reply; 37+ messages in thread
From: Joe Buck @ 1999-01-31 23:58 UTC (permalink / raw)
  To: Gabriel Dos Reis; +Cc: egcs

I wrote:
> > This is, I think, in part due to the fact that we distribute a test suite
> > which lots of people run, but we don't have a systematic set of
> > performance benchmarks.  If someone published, say, a full set of Spec95
> > numbers based on each week's snapshot, I'm sure we'd start seeing
> > attention paid to such matters.

Gaby writes:
> As I'm concerned with number-crunching efficiency issues, I volunteer
> for this task. I just need guidance. Anyone to enlighten me?

Well, unless you have a Spec95 license you can't run that.  However,
I've been thinking about this, and it seems that we could come up with
a "free software benchmark" that is similar in spirit to Spec95.

First, a bit on benchmark philosophy.  Tests like Dhrystone are no good
because they are artificial programs, and they don't produce any output,
so a sufficiently good globally optimizing compiler could in principle
throw all the code away.  Spec, on the other hand, consists of real
programs -- see

http://www.specbench.org/osg/cpu95/news/cpu95descr.html#product

for a description of the tests.  The idea is that if you score well
on Spec, then gcc, perl, JPEG compression, and scientific computation
all have to run fast (unless Spec goofs and puts in a test that can be
improved drastically by transformations that don't help most code that
much; Spec92 had a couple of these).
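
To make the dead-code point concrete, here is a deliberately artificial
kernel in the Dhrystone spirit (hand-written for illustration); since it
has no inputs and its result is never used, a whole-program optimizer is
entitled to evaluate or delete it entirely, so any time measured for it
tells you nothing:

    /* No input, no output: the call in main() is dead code.  */
    static int
    kernel (void)
    {
      int i, sum = 0;

      for (i = 0; i < 1000000; i++)
        sum += i % 7;
      return sum;
    }

    int
    main (void)
    {
      kernel ();   /* result discarded; a real benchmark must emit or
                      check its results so it can't be thrown away */
      return 0;
    }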

So what would a free software benchmark look like?  You'd need to
compile a set of commonly used free software applications, then
run real data through those applications and produce results (the
tests should be CPU-bound rather than disk-bound).  You'd need to
always use the same version of the programs.  For example, the
SPEC95 gcc test compiles a particular version of gcc with a Sparc
target, and the test is to turn a particular preprocessed C input
into optimized Sparc assembly code (the Spec people couldn't
prevent us from building the exact same version of gcc, but I guess
they own the input file that the test uses?).
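
As a very rough sketch of what the timing side of such a test could look
like (the input file name and options here are placeholders, not part of
any real benchmark definition; a real harness would also pin the compiler
version and verify the generated output):

    /* Measure the CPU time a child compiler spends turning one fixed,
       preprocessed input into assembly.  POSIX interfaces only.  */
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/times.h>
    #include <unistd.h>

    int
    main (void)
    {
      struct tms before, after;
      long hz = sysconf (_SC_CLK_TCK);

      times (&before);
      /* "input.i" stands in for whatever preprocessed source the
         benchmark settles on; -S keeps the assembler out of the timing. */
      if (system ("gcc -O2 -S input.i -o /dev/null") != 0)
        {
          fprintf (stderr, "compile failed\n");
          return 1;
        }
      times (&after);

      printf ("child CPU: %.2f seconds\n",
              (double) ((after.tms_cutime - before.tms_cutime)
                        + (after.tms_cstime - before.tms_cstime)) / hz);
      return 0;
    }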

So, what kinds of tests might we include?  I think it should be
applications that folks run a lot, with enough diversity to get
good coverage.

gcc, of course.
gzip
perl
some database application (PostgreSQL, maybe)
some Lisp (or Guile, or whatever) application

The GIMP: could we get smaller tests by making an environment where some
plug-in could run standalone?  (Or you could extract the computational
cores and get the image data from disk rather than shared memory.)

C++ tests:
groff, Octave, ?
The Stepanov tests (how much do you lose by using increasingly abstract
STL constructs).

The Fortran people could come up with good number-crunching tests.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: ridiculous amounts of padding
  1999-01-30 16:35     ` Marc Lehmann
@ 1999-01-31 20:05       ` Jeffrey A Law
  1999-02-01  8:07         ` Marc Lehmann
  1999-01-31 23:58       ` Marc Lehmann
  1 sibling, 1 reply; 37+ messages in thread
From: Jeffrey A Law @ 1999-01-31 20:05 UTC (permalink / raw)
  To: Marc Lehmann; +Cc: egcs

  In message < 19990131010811.O8047@cerebro.laendle >you write:
  > While I can't help with tracking down code size differences, the benchmark
  > suite records the stripped executable size of each benchmark it generates.
  > Although this data is not yet evaluated anywhere (it's not even in the
  > database), all reports come with this information.
Good.

  > Actually, that's exactly what I keep hearing. Recently someone sent me
  > a report that he intends to publish (I'll post a URL here), in which
  > he compares gcc-2.8, egcs and pgcc using his number-crunching C++
  > program. His summary was:
  > 
  > If you want a good and fast compiler and do not need the new C++ features
  > of egcs, then by all means stay with gcc. If you really need speed, use
  > pgcc; but without -O6 -funroll-all-loops, pgcc isn't faster than gcc with -O3.
  > 
  > Of course, that was his own (maybe limited) benchmark, but that's what
  > counts for him. I also received quite a few similar reports, but these
  > didn't have hard data in them, i.e. "nothing to report to egcs".
Such is life.  The benchmarks I've run show just the opposite.

I've always stated that everyone needs to take code they care about and 
benchmark it.  That's always the best indicator of performance -- running 
your own code.


And such reports are interesting in that they tell us we need to continue to
improve the compiler.  However, I'm much more likely to spend time working
with someone who's going to take the time to help analyze the problem.  For
example, Zack is taking the time to analyze the code and point out problems.
And I'm more than happy to work with Zack to try and nail down these problems.

And finally, with the ongoing issues with alignment of stack slots, I'm
leery of any FP benchmarks.  It's too difficult to get reliable benchmark
results with the FP numbers varying so wildly due to alignment issues.

In fact, I was looking at the losing caller-saves stuff today, and gee, fpppp
started running about 50% slower than normal.  What was it tracked down to?
A call earlier in the call stack taking 4 fewer bytes of stack space caused
the main code to run a hell of a lot slower.
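
For anyone who hasn't run into this, the effect can be reproduced with
something as small as the sketch below.  It is an illustration only: the
numbers depend entirely on the target ABI and on how the compiler keeps
the stack aligned, and PAD is just a hand-picked knob, not anything egcs
itself emits:

    /* On a target that only guarantees 4-byte stack alignment, changing
       how much stack the caller uses (PAD) can shift whether the callee's
       doubles land on 8-byte boundaries, with a large effect on FP speed. */
    #include <stdio.h>

    #ifndef PAD
    #define PAD 4                  /* try 0 vs. 4 and compare run times */
    #endif

    double
    work (void)
    {
      double a[256];               /* stack slots; alignment is luck here */
      double sum = 0.0;
      int i;

      for (i = 0; i < 256; i++)
        a[i] = (double) i;
      for (i = 0; i < 256; i++)
        sum += a[i] * a[i];
      return sum;
    }

    int
    main (void)
    {
      char pad[PAD + 1];           /* perturbs the caller's frame size */
      double sum = 0.0;
      int i;

      pad[0] = 1;
      for (i = 0; i < 100000; i++)
        sum += work () + pad[0];
      printf ("%f\n", sum);
      return 0;
    }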

  > > Yes.  And one can even use many of the same benchmarks since many are
  > > available for free.  What you don't get are the datasets or the
  > > infrastructure for building, testing and reporting information that spec
  > > provides.
  > 
  > Maybe you can point me to some of the smaller ones? The current benchmark
  > suite is quite sensitive to small code changes (which was one of the
  > goals), but it does not provide an adequate view of performance on
  > real-world problems.
gcc, sc, perl, compress, m88ksim.  None of which are particularly small, but
I think all are available in various locations on the net.  Then you just have
to come up with datasets.

I believe perl comes with a testsuite; if so, that could become the input
data.  For gcc, select a target and generate a bunch of .i files (possibly
from the compiler itself) as input.

Compress?  How about compressing the gcc tarball?  Obviously you have to
pick one and stick with it, but that's not hard.

m88ksim?  Dunno.  Depends on how complete the simulator is.  One could feed
the simulator small free benchmarks, since what you're measuring is the
simulator, not the end benchmark.

John Wehle does benchmarking with the Crafty chess program, and I'm sure if
we looked around we could find some nontrivial FP-intensive benchmarks.

jeff

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: ridiculous amounts of padding
  1999-01-31 23:58   ` Jeffrey A Law
@ 1999-01-30 16:35     ` Marc Lehmann
  1999-01-31 20:05       ` Jeffrey A Law
  1999-01-31 23:58       ` Marc Lehmann
  0 siblings, 2 replies; 37+ messages in thread
From: Marc Lehmann @ 1999-01-30 16:35 UTC (permalink / raw)
  To: egcs

[going through my mail folders, this is a bit late]

On Thu, Jan 14, 1999 at 06:51:41PM -0700, Jeffrey A Law wrote:
>   > If someone would be interested in helping me tracking down code
>   > differences, I am very interested...
> I hope someone will.

While I can't help with tracking down code size differences, the benchmark
suite records the stripped executable size of each benchmark it generates.
Although this data is not yet evaluated anywhere (it's not even in the
database), all reports come with this information.

If somebody tells me what to do with that data, we can at least keep an eye
on it in the future.

>   > But until someone corrects me (or I find a solution myself), OpenBSD will
>   > have to stay with gcc 2.8.1 (as much as I would like a change, personally).
> I'd be real surprised if in general gcc 2.8 produced better code than egcs;
> there may be localized issues of course, but I'd be real surprised if on a
> global basis gcc 2.8 was better.

Actually, that's exactly what I keep hearing. Recently someone sent me
a report that he intends to publish (I'll post a URL here), in which
he compares gcc-2.8, egcs and pgcc using his number-crunching C++
program. His summary was:

If you want a good and fast compiler and do not need the new C++ features
of egcs, then by all means stay with gcc. If you really need speed, use
pgcc; but without -O6 -funroll-all-loops, pgcc isn't faster than gcc with -O3.

Of course, that was his own (maybe limited) benchmark, but that's what
counts for him. I also received quite a few similar reports, but these
didn't have hard data in them, i.e. "nothing to report to egcs".

On Fri, Jan 15, 1999 at 03:19:26PM -0700, Jeffrey A Law wrote:
>   > Well, unless you have a Spec95 license you can't run that.  However,
>   > I've been thinking about this, and it seems that we could come up with
>   > a "free software benchmark" that is similar in spirit to Spec95.
> Yes.  And one can even use many of the same benchmarks since many are available
> for free.  What you don't get are the datasets or the infrastructure for
> building, testing and reporting information that spec provides.

Maybe you can point me to some of the smaller ones? The current benchmark
suite is quite sensitive to small code changes (which was one of the
goals), but it does not provide an adequate view of performance on
real-world problems.

      -----==-                                             |
      ----==-- _                                           |
      ---==---(_)__  __ ____  __       Marc Lehmann      +--
      --==---/ / _ \/ // /\ \/ /       pcg@goof.com      |e|
      -=====/_/_//_/\_,_/ /_/\_\       XX11-RIPE         --+
    The choice of a GNU generation                       |
                                                         |

^ permalink raw reply	[flat|nested] 37+ messages in thread

end of thread, other threads:[~1999-02-28 22:53 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
1999-01-31 23:58 ridiculous amounts of padding Zack Weinberg
1999-01-31 23:58 ` Alfred Perlstein
1999-01-31 23:58 ` Joern Rennecke
1999-01-31 23:58   ` Jeffrey A Law
1999-01-31 23:58     ` Peter Barada
1999-01-31 23:58       ` Zack Weinberg
1999-01-31 23:58         ` Jeffrey A Law
1999-01-31 23:58           ` Zack Weinberg
1999-01-31 23:58             ` Jeffrey A Law
1999-01-31 23:58       ` Marc Espie
1999-01-31 23:58       ` Nick Ing-Simmons
1999-01-31 23:58         ` Joern Rennecke
1999-01-31 23:58           ` John Vickers
1999-01-31 23:58             ` Joern Rennecke
1999-01-31 23:58     ` Joern Rennecke
1999-01-31 23:58       ` Jeffrey A Law
  -- strict thread matches above, loose matches on Subject: below --
1999-02-01 17:28 N8TM
1999-02-28 22:53 ` N8TM
1999-01-31 23:58 John Wehle
1999-01-31 23:58 John Wehle
1999-01-31 23:58 ` Marc Espie
1999-01-31 23:58   ` Joe Buck
1999-01-31 23:58     ` Marc Espie
1999-01-31 23:58   ` David Edelsohn
1999-01-31 23:58     ` Joe Buck
1999-01-31 23:58       ` Gabriel Dos Reis
1999-01-31 23:58         ` Joe Buck
1999-01-31 23:58           ` Jeffrey A Law
1999-01-31 23:58   ` Jeffrey A Law
1999-01-30 16:35     ` Marc Lehmann
1999-01-31 20:05       ` Jeffrey A Law
1999-02-01  8:07         ` Marc Lehmann
1999-02-28 22:53           ` Marc Lehmann
1999-01-31 23:58       ` Marc Lehmann
1999-01-31 23:58 N8TM
1999-01-31 23:58 ` Zack Weinberg
1999-01-31 23:58   ` Jeffrey A Law

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).